The Anatomy of an Autonomous Agent
A breakdown of autonomous agents: the future medium of interaction between us and our tools.
Modern automata…
Dreams of autonomous machines have been around for thousands of years. Shortly after we began using tools, we started to ponder the bridge between these functional objects and us, lifeforms. The automata of Greek mythology are one of many examples, with the tale of Talos, the giant bronze protector of Crete, told thousands of years ago. It seems that we see fleeting hints of life in the tools that we create.
Today, our fascination with autonomous machines continues. While we don't have giant metal men hurling boulders, we do have machines that feel lifelike. Modern language models, trained to mimic the complexities of human language, have broken the natural-language barrier, achieving remarkable feats through sophisticated statistical analysis.
The automata described in poems thousands of years ago are being realized today. But in what form? We call them autonomous agents, and they stand to be one of the many types of applications built on top of the language model platform.
A bit of background…
While conversational products like ChatGPT are more passive in their function, autonomous agents are active by nature. Simply put, an autonomous agent is an entity with the will to achieve specific goals, the perception to understand its environment, and the ability to independently make and execute decisions.
While we can expect a large degree of variability in the exact form of autonomous agents, there are some features which tie examples together:
Agency: agents have a sort of “will” where they are compelled to complete a given task, or pursue an overall goal.
Autonomy: they operate without continuous human intervention, maintaining a degree of independence.
Perception: agents perceive the world and are capable of receiving input from various sources.
Decision-Making: through different algorithms or “reasoning engines”, agents are capable of making decisions on their own.
Actuation: they are capable of affecting their environment through some form of actuator or other mechanism.
So essentially, you can think of an autonomous agent as a machine capable of problem-solving in a way similar to humans. In fact, by these rules, humans are themselves autonomous agents. You can see why these might be very useful mechanisms to have around, as such general-purpose tooling can automate many workflows.
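To make the list above concrete, here's one way those five features might map onto a single interface in Python. This is a rough sketch of my own, not a standard; the class and method names are purely illustrative.

```python
from abc import ABC, abstractmethod

class AutonomousAgent(ABC):
    """An illustrative interface tying the five features together."""

    def __init__(self, goal: str):
        self.goal = goal  # Agency: a will toward a task or overall goal.

    @abstractmethod
    def perceive(self, environment) -> object:
        """Perception: take in input from the environment."""

    @abstractmethod
    def decide(self, observation) -> object:
        """Decision-making: choose an action via some reasoning engine."""

    @abstractmethod
    def act(self, action, environment) -> None:
        """Actuation: affect the environment through some mechanism."""

    def run(self, environment, max_steps: int = 10) -> None:
        # Autonomy: the loop runs without continuous human intervention.
        for _ in range(max_steps):
            observation = self.perceive(environment)
            action = self.decide(observation)
            self.act(action, environment)
```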
Autonomous anatomy…
So how do these autonomous agents work under the hood? Well, it’s complicated. The exact specification of a system varies a great deal because of its modular design. An autonomous agent is more of a loose architecture than a specific technology, or even a type of application. So here we’ll focus on an overview, and leave specific use-case implementations for other posts.
There are five main elements of an autonomous agent, and they all work in service of the greater goal.
Environment…
When considering players in a game, it’s crucial to understand the arena. For us humans, the arena is the world we inhabit: from the sun's heat to the grass underfoot, to interactions with friends and starlight reaching us from light-years away.
Terms like agent and environment, borrowed from game theory and AI, are helpful but not all-encompassing. The environment is where the agent operates, the set of touchpoints it interacts with. For humans, this is a complex web of interactions, many of them unconscious, and hard to distill into a single dataset.
Modern mechanical agents, in contrast, interact with simpler environments. Software agents are confined to digital spaces, while robots, though they interact with the physical world, are limited to specific tasks.
Any line drawn around an arena is arbitrary, but useful for managing complexity. Even software agents can expand their arena through interactions with humans. Still, thinking carefully about an agent's environment, even a roughly estimated one, helps us design and optimize the agent itself.
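As a rough illustration of that line-drawing, here's a toy digital arena in Python. All names are hypothetical; the point is simply that everything the agent can perceive or influence passes through an explicit boundary.

```python
class Environment:
    """A toy digital arena: the line we draw around what an agent
    can observe and influence (all names hypothetical)."""

    def __init__(self):
        self.state = {"files": {}, "inbox": []}

    def observe(self) -> dict:
        # Touchpoints the agent can perceive: a snapshot of the arena.
        return {"files": dict(self.state["files"]),
                "inbox": list(self.state["inbox"])}

    def apply(self, action: dict) -> None:
        # Touchpoints the agent can influence.
        if action.get("type") == "write_file":
            self.state["files"][action["path"]] = action["content"]
```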
Perception…
The environment is great, but as far as the agent is concerned, it all starts with perception. But what is perception? Perception is what you’re doing right now, and at every moment of conscious life. It’s the light hitting your retinas, converted into electrical signals and sent through the optic nerve to your brain's occipital lobe. Or the air vibrations from your device's speakers resonating with your eardrums and converted into signals of their own.
You are actively taking in massive amounts of stimuli, which are eventually converted into a more manageable form. This data is cross-referenced with memory and other systems to allow for a more comprehensive understanding. Attention also plays a crucial role, determining which data should be focused on and what should be ignored. The attended data is presented to the conscious mind, which is you, right now, reading this.
Thus, perception is sensory information gathered, processed, and presented to the mind.
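For a software agent, that same pipeline might look something like the sketch below: raw environmental data is filtered by a crude form of attention and condensed into an observation a reasoning engine can handle. The event shapes and the `perceive` function are assumptions for illustration.

```python
def perceive(raw_events: list[dict], focus: str) -> str:
    # Attention: keep only the stimuli relevant to the current focus.
    relevant = [e for e in raw_events if focus in e.get("tags", [])]
    # Condense the stimuli into a more manageable form.
    lines = [f"- {e['description']}" for e in relevant]
    return "Observations:\n" + "\n".join(lines)

events = [
    {"description": "New email from a customer", "tags": ["inbox"]},
    {"description": "CPU usage at 40%", "tags": ["metrics"]},
]
print(perceive(events, focus="inbox"))
# Observations:
# - New email from a customer
```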
Memory…
While perception can largely be thought of as the intake and processing of environmental stimuli, that processing is shaped by a variety of factors. Memory, for example, changes the way we perceive the world. When something is out of place in a familiar environment, it catches our attention. When you talk to a friend in a busy restaurant, you can still understand them because your brain fills in the auditory blanks based on experience.
In fact, this is also how optical illusions work, as your memory and expectations affect how you process the visual input. In perception, memory comes into play by attaching weight to certain stimuli. While this takes many forms, the important thing to remember is that memory and perception are deeply intertwined.
In practice, this can mean filtering relevant information from a knowledge base and injecting it into the prompt of a language model. But memory influences perception as much as the other way around, so in practice it also means a mechanism for weighing the value of information and building it up over time.
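Here's a minimal sketch of that retrieve-and-inject idea, with a toy word-overlap score standing in for the embeddings and vector search a real system would use. All names are hypothetical.

```python
def relevance(memory_text: str, query: str) -> int:
    # Toy relevance score based on shared words; a real system would
    # use embeddings and vector similarity instead.
    return len(set(memory_text.lower().split()) & set(query.lower().split()))

class Memory:
    """A toy long-term memory: store snippets, retrieve the most relevant."""

    def __init__(self):
        self.items: list[str] = []

    def store(self, text: str) -> None:
        self.items.append(text)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        ranked = sorted(self.items, key=lambda m: relevance(m, query), reverse=True)
        return ranked[:k]

memory = Memory()
memory.store("The user prefers vegetarian recipes.")
memory.store("The user's favorite cuisine is Italian.")

task = "Suggest a recipe for dinner."
context = "\n".join(memory.retrieve(task, k=2))
prompt = f"Relevant memories:\n{context}\n\nTask: {task}"
```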
Agency…
This may not come as a surprise, but an agent has agency. This means that it has a will in a certain direction. As humans, we often feel compelled towards certain activities. This could mean inherent biological drives, or it could mean more abstract ideas like getting a job or making money. For mechanical agents, their agency comes entirely from their programming.
Agency affects perception as well. When you take in information from the world, the processing of that information is largely guided by what you deem important for the task at hand. If your goal is to make a sandwich, you’ll probably tune out the birds chirping outside the window, and focus your attention on bread, knives, and fillings instead.
In practice, agency typically means giving a task to a language model and asking it to generate a list of steps to complete that task.
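A minimal sketch of that pattern, with a hypothetical `call_llm` function standing in for whatever model API you use (canned here so the example runs end to end):

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model API call; returns a
    # canned plan here so the sketch is runnable.
    return "1. Find a recipe\n2. Check the pantry\n3. Write a shopping list"

def plan(goal: str) -> list[str]:
    prompt = (
        f"Goal: {goal}\n"
        "Break this goal into a short numbered list of concrete steps, "
        "one per line."
    )
    # Parse numbered lines like "1. Find a recipe" into plain steps.
    return [
        line.split(".", 1)[-1].strip()
        for line in call_llm(prompt).splitlines()
        if line.strip() and line.strip()[0].isdigit()
    ]

print(plan("Make dinner for four people"))
# ['Find a recipe', 'Check the pantry', 'Write a shopping list']
```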
Actuation…
Perhaps the most crucial aspect of an agent is its ability to actually make things happen. Agents must act. Being able to influence the environment is as critical as being able to perceive it. In fact, this closes the agent loop: the environment provides stimuli, the agent perceives them, and then acts to influence the environment in turn.
An agent’s influence becomes part of its environment, and therefore of its perception and future actions. This continuous loop is the same one we run ourselves. Of course, we’re all sorts of complicated, but the basics are the same. You type a word of your essay, see it appear on the screen, and then you can type the next one.
For an agent, acting on the world can take many forms. For an LLM-based agent, it largely comes down to what tools it has access to. Some agents have web browsers and code interpreters they can use to fulfill requests. Others might drive actions on physical robots.
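Putting the pieces together, here's a minimal sketch of the perceive-act loop with tool use. Again, `call_llm` and the tools are hypothetical stand-ins, canned so the loop runs end to end.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical model call, canned here so the loop is runnable.
    if "Result:" in prompt:
        return "done: summary written"
    return "search_web: autonomous agent frameworks"

tools = {
    "search_web": lambda query: f"(pretend search results for {query!r})",
}

def run_agent(goal: str, max_turns: int = 5) -> None:
    observation = f"Goal: {goal}"
    for _ in range(max_turns):
        decision = call_llm(
            f"{observation}\nTools: {', '.join(tools)}\n"
            "Reply '<tool>: <input>' or 'done: <answer>'."
        )
        name, _, argument = decision.partition(":")
        if name.strip() == "done":
            print(argument.strip())
            return
        # Actuation: the tool's effect becomes part of the environment,
        # and therefore of the agent's next perception.
        result = tools[name.strip()](argument.strip())
        observation = f"Goal: {goal}\nLast action: {decision}\nResult: {result}"

run_agent("Research agent frameworks and write a summary")
```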
Get ready…
We’re headed in an interesting direction. Businesses are striving to please their investors with the latest “AI” features. Foundation models are getting better and better, and industries are thirsty for the productivity boosts they promise. Language models feel like the future (partly thanks to effective marketing) and are set to become the next major platform for development.
Much like the iPhone App Store, in the coming years we’ll see a new wave of applications built on the latest technologies. I believe many of these applications will utilize autonomous agents. Imagine your research being automated, or your trips planned with ease. While there will certainly be noise to navigate, there is real value here.
As the industry moves in this direction, the ability to leverage language models will become a crucial skill. Programming will be as important as ever, but new skills such as prompt engineering will rise in popularity. But what is prompt engineering? Is it just a buzzword?
Fortunately, I’ve explored this topic in detail. Check out my post that separates the signal from the noise in prompt engineering.
The Real Value of Prompt Engineering: How to Leverage LLMs