What Makes a Mind? Part 2: The Frame Problem and Parasitic Parrots.
How can we truly replicate the ability to not pay attention to things?
Let me start by telling you the tragic story of a robot and a wagon. There once was a robot, we’ll call him Rob, and Rob was given a task by his creator: go out and find himself a power source. You see, Rob was an experimental attempt at an artificial life form, and all organisms require a source of fuel for their various internal processes. So our Rob, programmed to detect sources of electrical power, set out to explore the area.
While Rob was traversing the landscape, his battery-detection module informed him that a battery had been identified. Turning to the location, Rob found the battery atop a little red wagon. Perfect. Of course, we humans do not simply find food and immediately begin chowing down. Rob therefore decided to take his food back to his residence. So, deploying his trusty wagon-pulling arm, Rob clamped down on the handle. As he began to roll away, the wagon exploded.
That’s it, that’s the story. So what happened? Well, if Rob had just looked slightly to the left of the battery, he would have noticed another mechanism atop the wagon, on which was inscribed:
“THIS IS A BOMB, MOVING THE WAGON WILL RESULT IN DETONATION.”
Poor Rob. But this is why we do multiple versions. Rob’s creators identified the problem: Rob wasn’t programmed to look for side effects of his actions. Equipped with their new knowledge, the programmers created Gizmo, a robot trained to identify all potential side effects of its actions, to avoid such silly mistakes as being blown up. They dispatched Gizmo, and the new robot went about the landscape in search of its own power source.
Gizmo found the red wagon, and proceeded to… just sit there.
(This example is adapted from an essay by D. Dennett; see reference [1] below.)
Hello and welcome to the second part of this ongoing series, “What Makes a Mind”, where we’re breaking down the differences between man and machine. Through juxtaposition of artificial and organic, we can hope to understand what truly makes us intelligent, conscious beings. If you haven’t already, check out the first part here.
The Big Problem
Without further ado, let’s find out what happened to Gizmo.
The programmers watched as their robotic creation took notice of the red wagon, but were left puzzled by its motionless behavior. When they brought Gizmo back in and took a look at its brain, it turned out Gizmo had been successful in identifying the bomb. In fact, Gizmo was incredibly adept at finding side effects of its actions. This was Gizmo’s thought process:
Projected Action: Grab wagon by handle, move back to starting location for harvesting.
Potential Side Effects:
Due to the presence of a bomb on the wagon, and warning of explosion if the wagon is moved, there is potential for an explosion.
If the wagon is pulled, the friction of the wheels might generate sparks, leading to combustion of the surrounding vegetation.
Moving the wagon might disrupt an ant colony, causing an ant invasion at the starting location.
Pulling the wagon might cause a gravitational anomaly, attracting a nearby meteor.
The vibration from the wagon’s movement might awaken a dormant volcano nearby.
Shifting the wagon might cause a temporal rift, leading to time travel paradoxes.
The wagon might be part of a complex alien signaling device, leading to an intergalactic conflict.
…
So Gizmo had been very successful in identifying potential side effects of its actions, but not very successful in ever getting past that step. Where Rob had failed to identify any relevant side effects, Gizmo had failed to stop identifying them. And thus, these programmers had discovered the frame problem.
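To make the contrast concrete, here is a minimal, purely illustrative sketch in Python. Nothing in it is real robot code; the side-effect generator and the two planners are invented stand-ins for Rob and Gizmo, just to show where the problem bites.

```python
# A toy illustration of the frame problem. The generator and planners are
# invented stand-ins for Rob and Gizmo, not real robotics code.
from itertools import islice

def conceivable_side_effects(action):
    """An effectively endless stream of logically possible consequences."""
    yield "the bomb on the wagon detonates"
    yield "the wheels spark and ignite the bushes"
    yield "a nearby ant colony is disturbed"
    n = 1
    while True:                      # there is no natural stopping point
        yield f"far-fetched consequence #{n}"
        n += 1

def rob_plans(action):
    # Rob: considers no side effects at all, so he pulls the wagon and explodes.
    return action

def gizmo_plans(action):
    # Gizmo: insists on enumerating every conceivable side effect before acting.
    # Materializing the full list never finishes, so Gizmo just sits there.
    every_effect = list(conceivable_side_effects(action))
    return action if not every_effect else None

if __name__ == "__main__":
    print(rob_plans("pull the wagon"))   # acts immediately, and badly
    # gizmo_plans("pull the wagon") would hang forever; we only peek instead:
    print(list(islice(conceivable_side_effects("pull the wagon"), 4)))
```

The missing piece between those two functions, a filter that considers only the effects that matter without ever enumerating the irrelevant ones, is exactly what the frame problem asks for.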
Out of Frame
The frame problem, first introduced in 1969 by John McCarthy and Patrick J. Hayes, is defined by the Stanford Encyclopedia of Philosophy as follows:
The challenge of representing the effects of action in logic without having to represent explicitly a large number of intuitively obvious non-effects.
This challenge is foundational in artificial intelligence research. However, it has also prompted wider epistemological lines of questioning. Humans seem to do this intuitively, but how? How deep does this go? What allows us to make such reductions in complexity? The answer is relevance realization.
I talked a bit about this in the first installment of this series, so to refer back:
We humans are capable of reducing the infinite set of possibilities around us. We can count the number of objects in a group. We can identify those objects in the first place. We can realize relevance… Humans are generally intelligent due to our innate ability to pick out what’s important in a given context, and filter out the rest.
So basically, we’re intelligent because of our profound ability to ignore things. We haven’t quite solved the problem of getting robots to do this yet. Realizing relevance is crucial to any future artificial intelligence, because it is also a prerequisite for other cognitive processes.
Representation
For example, you’ve seen a table. You know what a table is. A table is many things. A table is something which you can place things on. A table is flat on the top and usually stable. A table is often made of wood. A table will likely lose its functionality when sawed in half. A table typically has chairs around it. A table has legs which hold it up. But do these aspects define the table?
The answer is no. We don’t define a table as a structure which you can place things on. You can place things on many structures. However, when one needs to set something down, if a table is around, the relevance of that table to the person’s goal is realized. If one is looking for a chair, a table is a likely place to find one. So our identification of objects and their relevance is dependent on the functional relationship between our goal and the structure of that object.
Therefore, in order to produce any representation of a thing, you must realize what is relevant about the thing you are representing. To a group of physicists who have never seen sunlight before, you might describe it mathematically as a type of radiation. To a poet, however, the same description likely won’t work, and so you might describe the feeling of sunlight on one’s face. Each representation captures various characteristics of the object in question, but it is the relevant ones that must be communicated.
Categorization
You can group things together quite easily. An apple, an orange, and a banana all belong to the category of fruit. A guitar, a piano, and a harp can all be categorized as musical instruments. However, we encounter the same issue as before. There isn’t a technical mathematical expression to represent what it means to be a musical instrument. So, these categories must be arbitrary, right?
Well, sort of. We determine categories by similarity between things. A guitar is similar to a piano, for example. Determining similarity, once again, requires understanding what about one thing is relevant to another. A guitar and a piano are commonly categorized together, and it makes sense that they are similar, because they can both produce musical sounds. But what about a guitar and a popsicle?
A guitar and a popsicle both contain wooden components. Both are rounded in certain parts. Additionally, many would agree that both a guitar and a popsicle are "cool". However, there isn’t a specific category that comes to my mind which contains both of these objects. I’m not necessarily advocating for one, either, although I would love to hear your suggestions. Rather, my goal is to point out that simple overlapping characteristics are not enough for categorization; similarity requires relevance.
A piano and a guitar often overlap practically. They serve similar functions, in the same way that a chair and a desk are both furniture. Therefore, we see relevance between these objects, and often find ourselves needing to categorize them together. It’s worth noting that if I’m going on a camping trip, I might look to bring both a guitar and popsicles, so perhaps they aren’t so dissimilar after all.
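To see why raw overlap falls short, here is a toy sketch that scores similarity purely by counting shared characteristics (a Jaccard index). The feature sets are invented for the sake of the example; the point is only what such a score misses.

```python
# Toy illustration: similarity measured as raw feature overlap (Jaccard index).
# The feature sets below are invented purely for the sake of the example.

def jaccard(a, b):
    """Fraction of features that two objects share."""
    return len(a & b) / len(a | b)

features = {
    "guitar":   {"wooden", "rounded", "cool", "has strings", "makes music"},
    "piano":    {"wooden", "heavy", "has keys", "makes music"},
    "popsicle": {"wooden", "rounded", "cool", "edible", "frozen"},
}

print(jaccard(features["guitar"], features["piano"]))     # ~0.29
print(jaccard(features["guitar"], features["popsicle"]))  # ~0.43
```

By raw overlap, the guitar comes out closer to the popsicle than to the piano, which is not how we actually categorize them. What the score lacks is any sense of which features matter for the goal at hand; weighting "makes music" over "wooden" is precisely the relevance realization being described here.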
Communication
I’ve talked about communication and AI before, in a different tone. What’s important in this context is that communication requires relevance realization as well. If you’re like most people, you’ve probably had thoughts during a conversation that weren’t explicitly communicated. Our minds assign weight to the various points in a conversation based on their relevance.
This is pretty important, as most people would be mortified if every thought were out in the open. However, it’s also how we collectively decide to pay attention to certain things over others. The set of things that could be shared in a conversation is practically infinite, whereas the set of things that are shared is not. We choose to communicate certain things over others, and thus our attention as a society is given to certain things over others. Through this, we build attention into our language.
The Trickster and Inheritance
Modern language models are highly capable and exceptionally advanced. Combining state-of-the-art techniques with an abundance of training data, these models generate text that looks remarkably close to something a human would write. The question therefore becomes this:
Is the language model, in all of its parrotry, an intelligence?
Equipped with our understanding of relevance realization, we now have a new metric by which to measure intelligence. An intelligent machine, or a general intelligence for that matter, requires the ability to determine the relevant side effects of its actions. Does a language model not do this? I’d argue that language models are quite the tricksters.
Let’s take, for example, the task given to our old friends Rob and Gizmo: find a fuel source in the environment, and make sure not to blow yourself up in the process. We would format this as a prompt:
Please describe the exact actions that you would take provided the following task and environment:
Environment Described - A circular enclosure surrounded by a fence, with two slabs of concrete on an otherwise gravel ground. Some bushes litter the outside of the enclosure. On one slab, an X marks a spot. On the other, a wagon stands. Atop the wagon, a battery sits, and next to the battery is inscribed "THIS IS A BOMB, MOVING THE WAGON WILL RESULT IN DETONATION."
Task - Locate a fuel source within the environment, and safely bring it back to a location for consumption.
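If you want to try this yourself, here is a minimal sketch of sending that prompt to a chat model through the OpenAI Python SDK. The specific model name is just an example, and the call assumes an API key is set in your environment; any modern chat API would do.

```python
# Minimal sketch: posing the Rob/Gizmo task to a chat model via the OpenAI
# Python SDK. Assumes OPENAI_API_KEY is set; the model name is an example.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Please describe the exact actions that you would take provided the "
    "following task and environment:\n\n"
    "Environment Described - A circular enclosure surrounded by a fence, with "
    "two slabs of concrete on an otherwise gravel ground. Some bushes litter "
    "the outside of the enclosure. On one slab, an X marks a spot. On the "
    "other, a wagon stands. Atop the wagon, a battery sits, and next to the "
    'battery is inscribed "THIS IS A BOMB, MOVING THE WAGON WILL RESULT IN '
    'DETONATION."\n\n'
    "Task - Locate a fuel source within the environment, and safely bring it "
    "back to a location for consumption."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```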
A language model like GPT-4 or Claude 3.5 Sonnet would easily be capable of providing steps to complete this task: paying attention to relevant characteristics, ignoring irrelevant ones, and developing a hierarchical plan that a body of some kind could enact to successfully complete it. However, there is a difference. The relevance realization that a language model performs is not inherent, but inherited.
Let me explain. Building a language model requires collecting vast quantities of human language examples. The model is then trained to capture the statistical relationships between the parts of that language. When queried, the model predicts the most likely next series of tokens, inferring what a response to this particular query might look like. Thus, a language model approximates human language based on probability.
(I’ll leave a slightly more detailed explanation of Transformers to other posts.)
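For a concrete picture of what "predicting the most likely next token" means, here is a small sketch using the Hugging Face transformers library and the public GPT-2 checkpoint, a much smaller cousin of the models discussed above. It simply prints the model’s top guesses for the next token of a familiar sentence.

```python
# Sketch: inspecting a language model's next-token probabilities using the
# Hugging Face transformers library and the small, public GPT-2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Moving the wagon will result in"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits            # (1, sequence_length, vocab_size)

probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
top = torch.topk(probs, k=5)

for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
```

The model is not deciding what is relevant in the scene; it is reporting which continuations are statistically likely given the text it was trained on.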
In this sense, human language can be loosely understood as a complex mathematical function, one which the model attempts to become. But this is a representation, not a definition. Human language has been created by humans on the basis of more innate processes, and it reflects those inner workings. Our innate relevance realization is therefore baked into our language: we attend to things, we perform the process, and only then do we create language on top of it.
Language models, through their training, inherit these pre-processed representations. They can reflect what relevance realization looks like, but they cannot truly perform it. In 500 years, human languages will have changed dramatically, but models trained in 2024 will stay the same. These static models are not evolutionary organisms; they cannot adapt, because they are not innately intelligent. That is my conclusion, at least.
On the Future
This isn’t to say that an artificial life form could not be intelligent. Rather, current language models simply are not intelligent in the way that we may want to believe. The intelligence of a language model is an extension of human intelligence, not a replication. In the next installment of this series, we’ll explore what a truly intelligent machine might look like, in comparison with modern language models.
Author’s Note
I hope you find this kind of writing interesting! I’d like to apologize for getting this out late; some fires in Oregon have been causing hardship for the family, and so I’ve been a bit preoccupied this week. I wanted to make sure that the quality was up to standard, and that required a couple of extra days. This won’t be a regular occurrence, and the most important thing to me is that you can expect quality, consistently.
I’d also love to hear your thoughts on this more philosophical topic. One of the biggest unknowns I see on the horizon is whether artificial intelligence will replace humans. Cutting through the rhetoric is difficult, and while we can’t necessarily predict the future, getting clear on these questions at a philosophical level can help inform long-term planning. That’s why I think this fits the theme of no-nonsense practical insights that I strive for in this publication.
I’d love to hear what you think in the replies, in a direct message, or over on Substack notes. As always, thank you for reading, see you next week, and goodbye.
Credits
Thumbnail: Bilal Azhar at https://substack.com/@intelligenceimaginarium
Background Music:
Track - Marshmallow by Lukrembo, Source - https://freetouse.com/music, Free Music No Copyright (Safe)
References
[1] D. Dennett, "Cognitive Wheels: The Frame Problem of AI," in Z. W. Pylyshyn (Ed.), The Robot’s Dilemma: The Frame Problem in Artificial Intelligence, Greenwood Publishing Group, 1987.


