Defining "AI Agents"
Who knows what an "AI agent" is? Amidst the industry hype, this is an (amateur) attempt at a somewhat rigorous definition.
I read with some amusement a TechCrunch article published last week titled "No one knows what the hell an AI agent is".
Quoting from the article:
But it also depends on how one defines “agents,” which is no easy task. Much like other AI-related jargon (e.g. “multimodal,” “AGI,” and “AI” itself), the terms “agent” and “agentic” are becoming diluted to the point of meaninglessness.
I don't think "multimodal" is being diluted, but the article's point stands. Many of these AI terms aren't well defined, yet everyone seems to be using them liberally.
Add other terms like "AI assistant", "AGI", "AI operator" (e.g. OpenAI's Operator) and "copilot" (with a lowercase "c", since there are so many copilots now) to the mix, and we have a recipe for confusion.
Definitions
How do we actually define something?
During my stint in the Singapore government, I had the privilege of working on legal and regulatory frameworks for cybersecurity. While putting together the inaugural Cybersecurity Act in 2018 (which has since been amended in 2024), we grappled with the definitions of "cybersecurity" and even "computer". We spent hours discussing various options, considering which entities and systems we wanted to be bound by the law, and comparing our definitions with those in similar legislation in other countries.
Defining a term in law requires an extraordinary amount of precision and rigor, because it affects people's lives and shapes how the law is interpreted and applied. A law may contain hundreds of clauses, every one of which will depend on the definitions. Definitions determine who falls within the law's jurisdiction, and who doesn't. This is why laws almost always begin with definitions.
With that perspective in mind, let us try to adopt a somewhat rigorous approach to define "AI Agents". Let us also consider several other terms that are often hyped up in the industry and media.
Defining: Agents
It may be easier to define an "AI agent" if we first define an "agent", without considering whether it is an AI agent, human agent, or somewhere in between.
The Oxford Languages dictionary defines an "agent" very simply as:
a person who acts on behalf of another person or group.
It also provides some examples of such agents:
- a person who manages business, financial, or contractual matters for an actor, performer, writer, etc.
- a person or company that provides a particular service, typically one that involves organizing transactions between two other parties - e.g. "a travel agent"
- a person who obtains information for a government or other official body, typically in secret - e.g. "a trained intelligence agent"
There are two key aspects to this definition: "acts" and "on behalf of".
"Acting"
Being able to "act" generally requires several capabilities:
- Being able to perceive an external state of affairs, i.e. information about the world.
- Being able to process such information.
- Being able to decide what actions to perform, based on such information. This is synonymous with the "reasoning" step in the ReAct prompting approach first described by Yao et al. Note that "reasoning" is probably more involved than "processing", which seems rather mechanical in nature.
- Being able to take those actions. Such actions will impact the external state of affairs, i.e. the world – note that this may span both physical and virtual realms. Actions include but are not limited to sending messages and other communications to other entities.
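The four capabilities above can be sketched as a minimal agent loop. This is purely illustrative; all class and method names here are hypothetical, not drawn from any real framework:

```python
# A minimal sketch of the perceive -> process -> decide -> act loop.
# All names and the toy "world" are invented for illustration.

class SimpleAgent:
    def perceive(self, world: dict) -> dict:
        # Observe: gather raw information about the external state of affairs.
        return {"temperature": world["temperature"]}

    def process(self, observation: dict) -> dict:
        # Process: turn raw observations into usable information.
        return {"is_cold": observation["temperature"] < 10}

    def decide(self, state: dict) -> str:
        # Decide: choose an action based on the processed information.
        return "turn_on_heater" if state["is_cold"] else "do_nothing"

    def act(self, action: str, world: dict) -> None:
        # Act: change the external state of affairs.
        if action == "turn_on_heater":
            world["temperature"] += 5

world = {"temperature": 5}
agent = SimpleAgent()
observation = agent.perceive(world)
state = agent.process(observation)
action = agent.decide(state)
agent.act(action, world)
print(action, world["temperature"])
```

Note that each step feeds the next, and that the final step mutates the world, which is what distinguishes acting from merely computing.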
This appears quite closely aligned with the OODA loop (observe, orient, decide, act) approach used in the U.S. Air Force:

The "processing" and "reasoning" capabilities set this apart from the simple stimulus-response model popular in psychology, which seems to emphasize the lack of need for conscious thought (i.e. processing or reasoning) between stimulus and response.
"On behalf of"
Acting "on behalf of" another person or group is a little more nuanced, and arguably requires the following:
- Being aligned with someone else's interests – let's call this the agent's "client". Note that "being aligned" with the client is not necessarily the same as "understanding" the client – we will probably require a separate article to define "understanding", especially in the context of machines!
- Being able to act autonomously, without being directed by the client at every step. This implies that the agent's actions may comprise multiple sequential steps over time. And since "acting" requires perception and other steps, this also implies that agents should be able to perceive, reason and take actions on their own, i.e. autonomously. Note that acting autonomously is different from acting completely independently – the latter suggests that there is no need for any direction from the client.
- Being able to maintain context over time. This implies that agents must have some form of "memory".
- Being able to communicate with the client. While it is possible that an agent may act for a client entirely independently, it is hard to imagine that the agent could be effective over time without ever communicating with the client. Acting autonomously still leaves room for some direction by the client. Furthermore, communication is essential if the agent is to remain aligned with the client's interests, since those interests may change over time. Note that while communication may take the form of language (either spoken or written), that is not a strict requirement (e.g. multimodal inputs and outputs).
Defining: AI
This is all well and good, but where does "AI" come in?
The capabilities that we have set out above for "acting" and for doing so "on behalf of" a client can be summarized as:
- Alignment with a client's interests
- Perception
- Memory
- Reasoning, which can be seen as a superset of processing. (For example, a computer can process "1 + 1", but that does not require any reasoning.)
- Autonomous action
- Communication
Computers 💻
These capabilities start to look suspiciously like those of a basic computer. The von Neumann computer architecture from 1945 (80 years ago!) looks like this:

Perception maps to input, action and communication map to output, memory maps to the memory unit, and reasoning maps to the Central Processing Unit comprising the control and arithmetic/logic units. The only delta seems to be alignment with a client's interests.
So one possible implementation of an agent is as a computer agent, i.e. a computer that acts on behalf of a client. But what about an "AI agent"? This raises the question: what is AI?
The most authoritative definition of AI, in my view, can be found in the seminal CS textbook Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig (2021).
Russell and Norvig start off by exploring how AI could be thought of as both thinking and acting, and doing so in a human and rational manner.

The book adopts the "rational agent" approach to AI, i.e. the bottom right quadrant:
An agent is just something that acts (agent comes from the Latin agere, to do). Of course, all computer programs do something, but computer agents are expected to do more: operate autonomously, perceive their environment, persist over a prolonged time period, adapt to change, and create and pursue goals. A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome.
As I described above, agents act on behalf of their clients. The definition of rational agents goes one step further: rational agents act to achieve the best expected outcome for their clients.
The rational-agent approach ... is more amenable to scientific development than are approaches based on human behavior or human thought. The standard of rationality is mathematically well defined and completely general, and can be “unpacked” to generate agent designs that provably achieve it. Human behavior, on the other hand, is well adapted for one specific environment and is defined by, well, the sum total of all the things that humans do. This book therefore concentrates on general principles of rational agents and on components for constructing them.
Much of the hype around AI agents in the wild today focuses on the "humanness" or "humanity" that modern-day agents are capable of. This is expected – we tend to see, and want to see, human likeness in things around us. (This is a phenomenon known as pareidolia.) But humanity is a rather difficult and possibly normative standard. Just as Russell and Norvig chose to use rational agents as a baseline for studying AI, it is probably safe for us to adopt a similar approach to studying AI agents – focus on their rationality, instead of their humanity.
This brings us to a possible definition of "AI agent":
An AI agent is a computer system that acts on behalf of a (human) client.
Note that I have enclosed the word "human" in parentheses. Strictly speaking, AI agents could also have their own AI agents – we could imagine multiple layers of AI agents, each with its own delegated level of responsibility. (I know, this sounds like Inception.) But given that we are humans, we probably want our AI agents to act on behalf of humans...
Here is a refined definition, with some help from LLMs:
An AI agent is a computer system that autonomously acts on behalf of a human client to maximize its client's utility over time. It perceives its environment, reasons about the best actions, executes them at the right times, and communicates with both its client and its surroundings. To function effectively, it must remember and recall information about its environment and objectives.
The economists among you will probably recognize the phrase "maximize its client's utility". Fun fact: the von Neumann-Morgenstern utility theorem which forms the basis for expected utility theory is from the same John von Neumann who created the foundational von Neumann architecture described at the beginning of this chapter!
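Expected utility maximization itself is simple to state: given a set of actions with uncertain outcomes, a rational agent picks the action whose probability-weighted utility is highest. Here is a toy sketch; the actions, outcomes and numbers are invented for illustration:

```python
# Toy expected-utility calculation: pick the action with the highest
# probability-weighted utility. All actions and numbers are invented.

actions = {
    # action: list of (probability, utility) pairs over possible outcomes
    "book_flight_now": [(0.9, 100), (0.1, -50)],  # likely fine, small risk
    "wait_for_deal":   [(0.5, 150), (0.5, -20)],  # bigger payoff, riskier
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # 0.9*100 + 0.1*(-50) = 85 beats 0.5*150 + 0.5*(-20) = 65
```

A rational agent in the Russell–Norvig sense is, at its core, a system that runs this kind of calculation over its available actions, with the utilities defined by its client's interests.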
Let's reason about AI agents
Let us put our new definition to the test by reasoning about it in the face of various terms and concepts often brought up in the hype around AI agents.
Chatbots 💬
Are "chatbots" agents? Probably not. They communicate a lot, but they do not execute actions – and certainly not on behalf of users.
A "customer service agent" is an entirely different construct though, insofar as it is able to autonomously take actions on behalf of either the organization or the user, to maximize their utility.
Agentic ________ (insert noun here)
There is a lot of buzz around "agentic systems" and "agentic AI". What is "agentic"?

Merriam-Webster defines "agentic" as a slang term (emphasis mine):
Agentic refers to someone or something capable of achieving outcomes independently (“functioning like an agent”) or possessing such ability, means, or power (“having agency"). It is especially used with a type of artificial intelligence (AI), often referred to as an AI agent, designed to execute complex tasks autonomously or with little human involvement. [...]
It adds on (emphasis also mine):
[...] agentic is based on the specific use of agent to refer to computer applications designed to automate certain tasks. Technological advances in the 2020s engineered AI agents programmed to carry out increasingly sophisticated forms of automation. The capabilities of such AI are often described in the terms of human agency: making decisions, taking actions, solving problems, reasoning, etc., on its own. An agentic AI, for example, might be designed to receive and resolve a customer service issue, such as refunding money or resetting a password, without any human oversight in any step of the process.
This all sounds good. To understand "agentic systems" and "agentic AI", we will need to define "AI agents", which we have done above. The dictionary definition above, though, fails to capture the importance of the agent–client relationship, i.e. that agents must act on behalf of their (human) clients.
Nvidia provides a clear definition of "agentic AI" in a blog post (Oct 2024), but also fails to capture the nuance around the agent-client relationship:
Agentic AI uses sophisticated reasoning and iterative planning to autonomously solve complex, multi-step problems.
Artificial General Intelligence (AGI)
I shudder to define AGI, given how difficult it has already been to define AI agents (~2,000 words so far in this article and counting). But let us consider how some industry giants have defined it, in the context of our definition of "AI agent".
OpenAI has been vocal about AGI and its associated safety issues for a long time. Notably, its mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity (blog post, Feb 2023).
Amazon has also been at the forefront of AGI work, having developed a huge AGI division (paywalled, sorry). Here is how it defines AGI:
Artificial general intelligence (AGI) is a field of theoretical AI research that attempts to create software with human-like intelligence and the ability to self-teach. The aim is for the software to be able to perform tasks that it is not necessarily trained or developed for.
Perhaps not to be outdone, Google also has a clear definition:
Artificial general intelligence (AGI) refers to the hypothetical intelligence of a machine that possesses the ability to understand or learn any intellectual task that a human being can. It is a type of artificial intelligence (AI) that aims to mimic the cognitive abilities of the human brain.
And here is IBM's definition:
Artificial general intelligence (AGI) is a hypothetical stage in the development of machine learning (ML) in which an artificial intelligence (AI) system can match or exceed the cognitive abilities of human beings across any task. It represents the fundamental, abstract goal of AI development: the artificial replication of human intelligence in a machine or software.
While these definitions differ in many areas, they seem to agree that AGI is a particular state or level of intelligence. Systems with AGI must be at least as smart as a human, in terms of being able to understand or perform a general set of tasks for which they were not trained.
Recall that we have chosen above not to focus on the "humanness" of thinking and acting, in favor of the "rationality" approach, just as Russell and Norvig have done for their book. So one could argue that whether a system is to be considered an "AI agent" and whether it meets the AGI bar are two completely orthogonal issues. In other words, AI agents could possess any level of intelligence. A computer system need not have reached AGI to be considered an AI agent.
AI Assistants
This is yet another term that overlaps heavily with "AI agents" in an imaginary Venn diagram, but the exact extent of that overlap is unclear.
A quick detour to Clippy 📎
AI assistants have certainly evolved over the years, and it is worth taking a (slight) historical detour to see how far we have come. One could argue that the "OG assistant" was none other than Clippy and friends:

While Clippy was much hated and derided, it remains a popular cultural icon today, perhaps one that harkens back (nostalgically) to simpler, more benign times.
Clippy was based on surprisingly sophisticated Bayesian networks, and its objective was to offer suggestions based on the users' actions. It certainly tried to perceive its environment and its client's (i.e. the user's) state of mind:

It would appear that Clippy met most of our requirements for an AI agent, with the possible exception of whether it reasoned about the best course of action. Given that Clippy used only probabilities (albeit probabilities resulting from fairly sophisticated Bayesian modeling) to decide what assistance to provide the user, we could argue that it processed but did not reason.
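To make the "processes but does not reason" point concrete, a Clippy-style decision can be reduced to a single Bayes update followed by a fixed probability threshold. This is not Clippy's actual model; the events and numbers below are invented for illustration:

```python
# A toy Bayes update: P(needs help | typed "Dear ...") via Bayes' rule,
# followed by a fixed threshold. All probabilities are invented.

p_needs_help = 0.2               # prior: P(user needs letter-writing help)
p_evidence_given_help = 0.7      # P(typed "Dear ..." | needs help)
p_evidence_given_no_help = 0.1   # P(typed "Dear ..." | doesn't need help)

# Total probability of the evidence, then Bayes' rule for the posterior.
p_evidence = (p_evidence_given_help * p_needs_help
              + p_evidence_given_no_help * (1 - p_needs_help))
posterior = p_evidence_given_help * p_needs_help / p_evidence

offer_help = posterior > 0.5     # purely mechanical: no reasoning involved
print(round(posterior, 3), offer_help)
```

The entire "decision" is arithmetic plus a threshold – nothing in it weighs alternative courses of action against the user's actual goals, which is roughly the gap between processing and reasoning.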
Perhaps this brings to light that reasoning is ultimately a subjective endeavor – the "best" course of action given any set of circumstances cannot easily be modeled with a set of equations.
Back to the present 🚀
Perhaps the best place to look for a clear definition for "AI Assistant" is Salesforce, given its huge and public investment in AI assistants via Agentforce (evidenced by the goal of hiring 1,000 salespersons). Salesforce has an entire page dedicated to AI assistants, and it states:
An AI assistant is a software application powered by artificial intelligence that understands natural language, processes commands, and performs specific tasks to assist its users. Whether voice-based — think Apple’s Siri, Amazon Alexa, or Samsung Bixby — or text-based, such as customer support chatbots, these digital tools help automate workflows, answer questions, and deliver accurate actions or responses.
This defines AI assistants as something much less lofty and much closer to the present than AGI. The natural language interface appears rather restrictive, but otherwise this definition seems quite aligned with our definition of AI agents at first glance.
Would we consider Siri or Alexa as AI agents though? Given how we use them today, I would argue not. They do not meet the bar for acting autonomously, and they certainly do not reason about the optimal actions that they should take to maximize the interests of their human clients over time.
But perhaps they simply do not have the opportunity to, since all we have done for the past ten years is to ask them about the weather and to play songs. Perhaps Alexa+ will change this status quo, one Prime user or $19.99/month at a time. It would be somewhat creepy though, for us to walk into a room and have Alexa say something like:
"Hey, I've noticed you always ask the weather ten minutes after you walk into your bedroom between 9pm and 10pm. So I've dug out the weather for tomorrow: 32F and a chance of snow. You're welcome!"
Or imagine if Siri says:
"You've been searching the web for cold and flu symptoms and remedies for the past hour. Would you like me to make an appointment for you to see your primary physician? The earliest appointment with Dr. James is this Friday at 2pm."
It will take some time for us to get familiar with the notion of our digital assistants being full-fledged AI agents, and actually use them as such!
Multi-agent systems
Unlike all of the other terms above, multi-agent systems constitute a well-studied interdisciplinary field with roots in computer science, game theory / economics, operations research, and other disciplines.
To dive into this, check out this excellent foundational book by Shoham and Leyton-Brown, published back in 2009 (free e-book available):

So... what is an AI agent?
Let us revisit our definition:
An AI agent is a computer system that autonomously acts on behalf of a human client to maximize its client's utility over time. It perceives its environment, reasons about the best actions, executes them at the right times, and communicates with both its client and its surroundings. To function effectively, it must remember and recall information about its environment and objectives.
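The definition maps naturally onto an interface, with one method per clause. This is a hypothetical sketch of my own, not a reference design from any framework:

```python
# Hypothetical interface implied by the definition above. Each method maps
# to one clause: perceive, reason (choose utility-maximizing actions),
# act, communicate, and remember. Method names are invented.

from abc import ABC, abstractmethod

class AIAgent(ABC):
    @abstractmethod
    def perceive(self, environment) -> object:
        """Observe the environment and the client's current state."""

    @abstractmethod
    def reason(self, observation) -> list:
        """Choose the actions expected to maximize the client's utility."""

    @abstractmethod
    def act(self, actions, environment) -> None:
        """Execute the chosen actions at the right times."""

    @abstractmethod
    def communicate(self, message) -> str:
        """Exchange information with the client and the surroundings."""

    @abstractmethod
    def remember(self, key, value) -> None:
        """Store context about the environment and objectives over time."""
```

If a system cannot plausibly implement all five methods, it probably falls short of the definition – which is exactly the test we applied to chatbots, Clippy, Siri and Alexa above.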
This is by no means perfect, but let's see how this holds up over time. As a next step, I will share some ideas for designing AI agents.
Please feel free to leave your comments below. Thank you for reading! 🙏