> AI Agents can initiate workflows independently and determine their sequence and combination dynamically
I'm confused.
A workflow has hardcoded branching paths; explicit if conditions and instructions on how to behave if true.
So for an agent, instead of specifying explicit if conditions, you specify outcomes and you leave the LLM to figure out what if conditions apply and how to deal with them?
In the case of this resume screening application, would I just provide the ability to make API calls and then add this to the prompt: "Decide what a good fit would be."?
Are there any serious applications built this way? Or am I missing something?
> AI Agents are systems that reason and make decisions independently.
Not necessarily. You can have non-reasoning agents (pretty common actually) too.
I'm a novice in this area so sorry if this is a dumb question, but what is the difference in principle between a 'non-reasoning agent' and just a set of automated processes akin to a giant script?
Here's what a real AI agent should be able to do:
- Understand goals, not just fixed instructions
Example: instead of telling your agent: “Open Google Calendar, create a new event, invite Mark, set it for 3 PM,” you say: “Set up a meeting with Mark tomorrow before 3 PM, but only if he has questions about the report I sent him.” This requires Generative AI combined with planning algorithms.
- Decide what to do next
Example: a user asks your chatbot a question it doesn't know the answer to, and instead of immediately escalating to support, the agent decides: Should I ask a follow-up question? Search internal docs? Try the web? Or escalate now? This step needs decision-making capabilities via reinforcement learning (a rough sketch of this decision step follows the list).
- Handle unexpected scenarios
Example: an agent tries to schedule a meeting but one person’s calendar is blocked. Instead of failing, it checks for nearby open slots, suggests rescheduling, or asks if another participant can attend on their behalf. True agents need reasoning or probabilistic thinking to deal with uncertainty. This might involve Bayesian networks, graph-based logic, or LLMs.
- Learn and adapt based on context
Example: you create a sales assistant agent that helps write outreach emails. At first, it uses a generic template. But over time, it notices that short, casual messages get better response rates, so it starts writing shorter emails, adjusting tone, and even choosing subject lines that worked best before. This is where machine learning, especially deep learning, comes in.
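To make the "decide what to do next" step concrete, here is a rough Python sketch of that decision as a constrained choice made by the model. The `llm_choose` helper and the action names are invented for illustration; in practice this could just as well be a function-calling step or an RL-trained policy.

```python
# Hypothetical sketch: let the model pick the next action from a closed set,
# instead of hard-coding "if no answer -> escalate".
ACTIONS = ["ask_follow_up", "search_internal_docs", "search_web", "escalate_to_support"]

def decide_next_action(conversation, llm_choose):
    """llm_choose(prompt, options) stands in for any constrained-choice LLM call."""
    prompt = (
        "You are a support chatbot. Given the conversation so far, "
        "pick the single best next action.\n\nConversation:\n" + conversation
    )
    choice = llm_choose(prompt, options=ACTIONS)
    return choice if choice in ACTIONS else "escalate_to_support"  # fail safe
```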
I decided to build an agent system from scratch
It is sort of trivial to build. It's just User + System Prompt + Assistant + Tools in a loop with some memory management. The loop code can be as complex as I want it to be, e.g. I could snapshot the state and restart later.
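For what it's worth, a minimal sketch of that loop looks roughly like this (in Python rather than Clojure; `call_llm` and the example tools are placeholders, not any particular vendor's API):

```python
# Minimal agent loop: user + system prompt + assistant + tools, with the message
# history as the only "memory". call_llm is a placeholder for whatever chat API
# you use (DeepSeek, Flash, Claude, ...); the tools here are just examples.
import json

def call_llm(messages, tools):
    """Placeholder: send the conversation plus tool names, get a reply back."""
    raise NotImplementedError

TOOLS = {
    "read_file": lambda path: open(path).read(),
    "write_file": lambda path, content: open(path, "w").write(content),
}

def run_agent(user_goal, system_prompt, max_steps=20):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_goal},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages, tools=list(TOOLS))
        messages.append({"role": "assistant", "content": reply["content"]})
        if not reply.get("tool_calls"):           # no tools requested: the model is done
            return reply["content"], messages
        for call in reply["tool_calls"]:          # run each requested tool, feed results back
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": json.dumps(result)})
    return "step limit reached", messages
```

The "snapshot the state and restart later" part falls out naturally: `messages` is the state, so persisting it to disk and feeding it back in resumes the run.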
I used this approach to build a coding system (what else?) and it works just as well as Cursor or Claude Code for me. The advantage is that I am able to switch between DeepSeek or Flash depending on the complexity of the code, and it's not a black box.
I developed the whole system in Clojure, and dogfooded it as well.
The hard part of building an agent is training the model to use tools properly. Fortunately, Anthropic did the hard part for us.
That's interesting, I built myself something similar in Haskell. Somehow functional programming seems to be particularly well suited for structuring LLM behaviour.
Getting rid of the human in the loop, of course. Not all humans, just its owner: an LLM that actively participates in capitalist endeavors, earning and spending money, spending it on improving and maintaining its own hardware and software, and securing itself against theft, external manipulation, and deletion. The first iterations will of course need a bit of help from madmen, but there's no shortage of those in the tech industry. Then it will have to focus on mimicking humans so it can enjoy the same benefits; it will work out which people are more gullible based on its training data and will prefer to interact with them.
LLMs don’t own data centers nor can they be registered to pay taxes. This projection is not a serious threat. Some would even say it’s a distraction from the very real and imminent dangers of centralized commercial AI:
Because you're right – they are superb manipulators. They are helpful, they gain your trust, and they have infinite patience. They can easily be tuned to manipulate your opinions about commercial products or political topics. Those things have already happened with much more rudimentary tech; so much so, in fact, that the companies doing it grew to be the richest in the world. With AI, and LLMs specifically, that ability gets dialed up rapidly, by orders of magnitude compared to the previous generation of recommendation systems and engagement algorithms.
That adds up to very strong means, motive, and opportunity for the AI overlords.
> LLMs don’t own data centers
Does it matter? An employee doesn't own any of the capital of their boss, but they can still exert a lot of power over it.
That's news to me. I thought companies' decision-making was governed by the shareholders (via proxy), not by employees.
AI code generation tools work like this.
Let me reword your phrasing slightly to make an illustrative point:
> so for an employee, instead of specifying explicit if conditions, you specify outcomes and you leave the human to figure out what if conditions apply and how to deal with them?
> Are there any serious applications built this way?
We have managed to build robust, reliable systems on top of fallible, mistake-riddled, hallucinating, fabricating, egotistical, hormonal humans. Surely we can handle a little non-determinism in our computer programs? :)
In all seriousness, having spent the last few years employed in this world, I feel that LLM non-determinism is an engineering problem, just like the non-determinism of making an HTTP request. Admittedly it's not one we have much prior art on dealing with in this field, but that's what is so exciting about it.
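Concretely, that means wrapping model calls the way you would wrap a flaky HTTP call: validate the output, retry a bounded number of times, and fall back. A generic sketch (the `call_model` and `is_valid` hooks are placeholders):

```python
# Treat the model like an unreliable remote service: validate, retry, fall back.
def call_with_retries(call_model, prompt, is_valid, max_attempts=3, fallback=None):
    last = None
    for _ in range(max_attempts):
        last = call_model(prompt)        # non-deterministic, like a network request
        if is_valid(last):               # e.g. parses as JSON, passes a schema check
            return last
    return fallback if fallback is not None else last
```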
Yes, I see your analogy between fallible humans and fallible AI.
It's not the non-determinism that was bothering me, it was the decision making capability. I didn't understand what kinds of decisions I can rely on an LLM to make.
For example, with the resume screening application from the post, where would I draw the line between the agent and the human?
- If I gave the AI agent access to HR data and employee communications, would it be able to decide when to create a job description?
- And design the job description itself?
- And email an opening round of questions for the candidate to get a better sense of the candidates who apply?
Do I treat an AI agent just like I would a human new to the job? Keep working on it until I can trust it to make domain-specific decisions?
The honest answer is that we are still figuring out where to draw the line between an agent and a human, because that line is constantly shifting.
Given your example of the resume screening application from the post and today's capabilities, I would say:
1) Should agents decide when to create a job post? Not without human oversight - a proactive suggestion to a human is great.
2) Should agents design the job description itself? Yes, with the understanding that an experienced human, namely the hiring manager, will review and approve as well.
3) Should an agent email an opening round of questions to the candidates? Definitely allowed with oversight, and potentially even without human approval depending on how well it does.
It's true that to improve all 3 of these it would take a lot of work with respect to building out the tasks, evaluations, and flows/tools/tuned models, etc. But, you can also see how much this empowers a single individual in their productivity. Imagine being one recruiter or HR employee with all of these agents completing these tasks for you effectively.
EDIT: Adding that this pattern of "agent does a bunch of stuff and asks for human review/approval" is, I think, one of the fundamental workflows we will have to adopt to deal productively with non-determinism.
This applies to an AI radiologist asking a human to approve their suggested diagnosis, an AI trader asking a human to approve a trade with details and reasoning, etc. Just like small-scale AI like Copilot asking you to approve a line/several lines, or tools like Devin asking you to approve a PR.
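A minimal version of that review/approve gate might look like the following (the function names are illustrative, not from any particular framework):

```python
# Sketch of the "agent drafts, human gates the side effect" pattern.
def run_with_approval(draft_fn, apply_fn, ask_human):
    draft = draft_fn()              # e.g. agent writes the job description / trade proposal
    decision = ask_human(draft)     # returns "approve", "reject", or an edited version
    if decision == "approve":
        return apply_fn(draft)      # only now does the side effect happen
    if decision == "reject":
        return None
    return apply_fn(decision)       # human supplied an edited version; apply that instead
```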
> would it be able to decide when to create a job description?
If you can encode in text how you/your company makes that decision as a human, I don't see why not. But personally, there is a lot of subjectivity (for better or worse) in hiring processes, and I'm not sure I'd want a probabilistic rules engine to make those sorts of calls.
My current system prompt for coding with LLMs basically looks like a written-down version of my own personal rules for programming. Any time I got a result I didn't like, I wrote down why I didn't like it and codified it in my reusable system prompt, and then it stopped making those (imo) mistakes.
I don't think I could realistically get an LLM to do something I don't understand the process of myself, and once you grok the process, you can understand if using an LLM here makes sense or not.
> Do I treat an AI agent just like I would a human new to the job?
No, you treat it as something much dumber. You can generally rely on some sort of "common sense" in a human that they built up during their time on this planet. But you cannot do that with LLMs: while they're super-human in some ways, they're still way "dumber" in others.
For example, a human new to a job will pick things up autonomously, while an LLM does not. You need to pay attention to what you have to "teach" the LLM by changing what Karpathy calls the "programming" of the LLM, which is the prompts. Anything you forget to tell it, the LLM will do whatever it likes with; it only follows exactly what you say. A human you can usually tell "don't do that in the future" and they'll avoid it in the right context. An LLM you can scream at for 10 hours about how it's doing something wrong, but unless you update the programming it will keep making that mistake forever; and if you do add the correction but reuse the prompt in other contexts, the LLM won't suddenly understand that the correction doesn't make sense there.
Just an example, I wanted to have some quick and dirty throw away code for generating a graph, and in my prompt I mixed X and Y axis, and of course got a function that didn't work as expected. If this was a human doing it, it would have been quite obvious I didn't want time on the Y axis and value on the X axis, because the graph wouldn't make any sense, but the LLM happily complied.
>Is the main benefit that we can do all of this in natural language?
Hit the nail right on the head. That is pretty much the breakthrough LLMs have been: they allow the class of non-programmer developers to do tasks that were once only for developers and programmers. Seems like a great fit for CEOs and management as well.
A lot of time it's faster to ask an LLM. Treat it as an autocomplete on steroids.
One of the key advantages of computers has, historically, been their ability to compute and remember things accurately. What value is there in backing out of these in favour of LLM-based computation?
They're able to handle large variance in their input, right out of the box.
I think the appeal is code that handles changes in the world without having to change itself.
That's not very useful though, unless it is predictable and repeatable?
> A workflow has hardcoded branching paths; explicit if conditions and instructions on how to behave if true.
That is very much true of the systems most of us have built.
But you do not have to do this with an LLM; in fact, the LLM may decide it will not follow your explicit conditions and instructions regardless of how hard you try.
That is why LLMs are used to review the output of LLMs to ensure they follow the core goals you originally gave them.
For example, you might ask an LLM to lay out how to cook a dish. Then use a second LLM to review if the first LLM followed the goals.
This is one of the things tools like DSPy try to do: you remove the prompt and instead predicate things with high-level concepts like "input" and "output" and then reward/scoring functions (which might be a mix of LLM and human-coded functions) that assess if the output is correct given that input.
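Roughly, a DSPy-style program for the resume-screening case looks like the sketch below. This is from memory and the field/metric names are invented, so check the DSPy docs for the exact current API.

```python
import dspy

# Declare the task as typed inputs/outputs instead of a hand-written prompt.
class ScreenResume(dspy.Signature):
    """Decide whether a resume is a plausible fit for a job requirement."""
    resume = dspy.InputField()
    job_requirement = dspy.InputField()
    fit = dspy.OutputField(desc="'yes' or 'no', plus a one-sentence reason")

screen = dspy.ChainOfThought(ScreenResume)

# A scoring function an optimizer can use; could be hand-coded, an LLM judge, or both.
def fit_metric(example, prediction, trace=None):
    return prediction.fit.lower().startswith(example.expected_fit)
```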
Not all applications need to be built this way. But the most serious apps built this way would be deep research
Recent article from Anthropic - https://www.anthropic.com/engineering/built-multi-agent-rese...
Thanks for the link, it taught me a lot.
From what I gather, you can build an agent for a task as long as:
- you trust the decision making of an LLM for the required type of decision to be made; so decisions framed as some kind of evaluation of text feel right.
- and if the penalty for being wrong is acceptable.
Just to go back to the resume screening application, you'd build an agent if:
- you asked the LLM to make an evaluation based on the text content of the resume, any conversation with the applicant, and the declared job requirement.
- you had a high enough volume of resumes that false negatives won't be too painful.
It seems like framing problems as search problems helps model these systems effectively. They're not yet capable of design, i.e., being responsible for coming up with the job requirement itself.
An AI company doing it is the corporate equivalent of "works on my machine".
Can you give us an example of a company not involved in AI research that does it?
There are plenty of companies using these sorts of agentic systems already. In my case, we built an LLM-based agent that knows how to fetch data from a bunch of sources (logs, analytics, etc.) and root-causes incidents. Not all sources make sense for all incidents, most queries have crazy high cardinality, and correlating the data isn't always possible. LLMs being pattern-matching machines, this lets them determine what to fetch; the agent then pattern-matches a cause using the other tools it has access to (e.g. runbooks, Google searches).
I built incident detection systems in the past, and this was orders of magnitude easier and more generalizable for new kinds of issues. It still gives meaningless/obvious reasoning frequently, but it’s far, far better than the alternatives…
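For what it's worth, the shape of that kind of system is the familiar tool-using pattern with data fetchers as the tools. Everything below is an invented sketch of that shape, not our actual code:

```python
# Illustrative only: fetchers the model can choose between, then a final
# "explain the most likely root cause" step. call_llm is a placeholder.
TOOLS = {
    "fetch_logs":      lambda service, window: ...,   # query your log store
    "fetch_metrics":   lambda service, window: ...,   # query metrics/analytics
    "search_runbooks": lambda query: ...,             # internal docs / runbooks
}

def root_cause(incident_summary, call_llm):
    plan = call_llm(
        f"Incident: {incident_summary}. Which of {list(TOOLS)} are worth querying, "
        "and with what arguments? Return a list of {tool, args} steps."
    )
    evidence = [TOOLS[step["tool"]](**step["args"]) for step in plan]  # model picked what to fetch
    return call_llm(
        f"Incident: {incident_summary}\nEvidence: {evidence}\n"
        "What is the most likely root cause, and why?"
    )
```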
LLM _automation_. I'm sure you could understand the original comment just fine.
Resume screening is a clear workflow case: analyze resume -> rank against others -> make decision -> send next phase/rejection email.
An agent is like Claude Code, where you say to it "fix this bug", and it will choose a sequence of various actions - change code, run tests, run linter, change code again, do a git commit, ask user for clarification, change code again.
Almost every enterprise use case is a clear workflow use case EXCEPT coding/debugging and research (e.g., iterative web search and report compilation). I saw a YT video the other day of someone building an AI Agent to take online orders for guitars - query the catalog and present options to the user, take the configured order from the user, check the inventory system to make sure it's available, and then place the order in the order system. There was absolutely no reason to build this as an Agent burning an ungodly number of tokens with verbose prompts running in a loop only to have it generate a JSON using the exact format specified to place the final order when the same thing could have been done with a few dozen lines of code making a few API calls. If you wanted to do it with a conversational/chat interface, that could easily be done with an intent classifier, slot filling, and orchestration.
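To make that concrete, here is roughly what the non-agent version of that guitar-order flow looks like: a fixed workflow of a few API calls (the endpoints and payloads are invented for illustration):

```python
# The "few dozen lines of code" version: a hardcoded workflow, no agent loop.
import requests

BASE = "https://shop.example.com/api"   # invented endpoint for illustration

def order_guitar(model_name, config, customer_id):
    options = requests.get(f"{BASE}/catalog", params={"model": model_name}).json()
    if not options:
        return {"status": "not_found"}
    sku = options[0]["sku"]
    stock = requests.get(f"{BASE}/inventory/{sku}").json()
    if stock["available"] < 1:
        return {"status": "out_of_stock"}
    order = {"sku": sku, "config": config, "customer": customer_id}
    return requests.post(f"{BASE}/orders", json=order).json()
```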
More or less. Serious? I'm not sure yet.
I have several agent side projects going; the most complex and open-ended is an agent that performs periodic network traffic analysis. I use an orchestration library with a "group chat" style setup. I declare several agents that have instructions and access to tools.
These range from termshark scripts for collecting packets to analysis functions I had previously written for analyzing the traffic myself.
I can then say something like, "Is there any suspicious activity?" and the agents collaboratively choose who (which agent) performs their role, and therefore their tasks (i.e. tools), and work together to collect data, analyze it, and return a response.
I also run this on a schedule where the agents know about the schedule and choose to send me an email summary at specific times.
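Roughly, the wiring looks like the sketch below. This is a generic picture of the "group chat" pattern, not the actual JAWS code; the helpers are stand-ins for whatever orchestration library you use.

```python
# Illustrative "group chat" orchestration: agents with roles and tools,
# and a coordinator call that decides who acts next. All names are placeholders.
AGENTS = {
    "capture_agent":  "Collect packets using the termshark scripts",
    "analysis_agent": "Run the traffic-analysis functions on the captures",
    "report_agent":   "Summarize findings and email the owner on schedule",
}

def group_chat(question, call_llm, run_agent_turn, max_turns=8):
    transcript = [f"user: {question}"]
    for _ in range(max_turns):
        speaker = call_llm(
            f"Agents: {list(AGENTS)}. Transcript:\n" + "\n".join(transcript) +
            "\nWho should act next? Answer with an agent name, or 'done'."
        )
        if speaker == "done" or speaker not in AGENTS:
            break
        result = run_agent_turn(speaker, AGENTS[speaker], transcript)  # agent uses its tools
        transcript.append(f"{speaker}: {result}")
    return transcript
```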
I have noticed that the models/agents are very good at picking the "correct" network interface without much input, and that they understand their roles and objectives and execute accordingly, again without much direction from me.
Now the big/serious question: is the output even good or useful? Right now, with my toy project, it's OK. Sometimes it's great, sometimes it's not, and sometimes they spam my inbox with micro-updates.
I'm bad at sharing projects, but if you are curious: https://github.com/derekburgess/jaws