
Theory of Mind Breakthrough: AI Consciousness & Disagreements at OpenAI [GPT 4 Tested]


Transcript

Evidence released in the last 48 hours, combined with this study from four weeks ago, will revolutionize how AI models such as GPT-4 interact with humans from now on. The theory of mind breakthrough will also have significant implications for our ability to test for artificial consciousness. To be clear, this is not to say that GPT-4 is currently conscious, or that sentience is an inevitability for AI; instead, this video covers and explains this unexpected development, which may in part have led the chief scientist of OpenAI to say this three days ago.

But maybe we are now reaching a point where the language of psychology is starting to be appropriate to understand the behavior of these neural networks.

First, I'm going to explain what emergent property the study uncovered. Then I will cover the disagreement at the top of OpenAI about what evidence like this might mean for our estimates of current GPT-4 consciousness. Here's Greg Brockman, president of OpenAI, on the topic.

First question, you know, the sentience question: at what point do the systems have moral, you know, moral value? And the answer today is definitely not, um, but, you know, I don't know; we need to engage some moral philosophers to help answer some of these questions. I'm then going to review the literature on tests for sentience and show that GPT-4 passes most of them, which is definitely not to say that it is conscious, but which does provoke important questions.

I'll end with arguably the most prominent consciousness expert and his probability estimate of current models' consciousness. To massively simplify, theory of mind means having an idea of what is going on in other people's heads and grasping what they believe, even if what they believe might be false. Here are the two charts that encapsulate the breakthrough abilities of GPT-3.5 and now GPT-4.

This data came out in a study authored by Michal Kosinski, a computational psychologist and professor at Stanford. I'm going to simplify all of this in a moment, but notice the percentage of theory of mind tasks solved by GPT-4 compared to, say, a child, and also compared to earlier language models.

Models released as recently as three years ago had no ability in this regard. Before I show you what, for example, an unexpected contents task is, let me show you this other chart. This one is on understanding faux pas, a closely related ability, and again GPT-3.5 and particularly GPT-4 are soaring ahead of other models and even matching the abilities of healthy adults.

So what exactly is this breakthrough emergent capability? I think this diagram from the study explains it really well. In the middle you can see a story given to GPT-3.5 sentence by sentence prompt by prompt. On the left you can see the model's confidence about what's in the bag. Is it chocolate?

Or is it popcorn? The scale is a probability, with 1 meaning absolutely certain; by approximately this point, the model is 100% certain that the bag contains popcorn. Now here's the really interesting bit: compare that to the diagram on the right. This shows GPT-3.5's confidence about what Sam believes is in the bag.

Notice how at this point the model realizes with 80% confidence that Sam believes that there's chocolate in the bag. If you read the story the label on the bag says chocolate and not popcorn. So the model knows that Sam is probably going to think that there's chocolate in the bag.

It's able to keep those thoughts separate: what Sam believes (chocolate) versus what the model knows is in the bag (popcorn). As I said, GPT-4 improves on this with almost 100% confidence. Now, you may not think a language model being able to figure out what you're thinking is revolutionary, but wait till the end of the video.
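The exact prompts and scoring code are the study's own, but as a rough, hypothetical sketch of how this kind of probe can be run, you can compare the probability a model assigns to "popcorn" versus "chocolate" after each cloze prompt. The sketch below uses GPT-2 via Hugging Face as a freely available stand-in (the paper queried GPT-3.5 and GPT-4), and the story text is my paraphrase, not the paper's exact wording:

```python
# Minimal, hypothetical sketch of a false-belief probe (not the study's code):
# compare the probability the model assigns to "popcorn" vs "chocolate" as the
# next word of two cloze prompts -- one about the bag, one about Sam's belief.
# GPT-2 is used here as a stand-in; the paper itself queried GPT-3.5 / GPT-4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Paraphrased unexpected-contents story in the spirit of the paper's task.
story = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "The label on the bag says 'chocolate' and not 'popcorn'. "
    "Sam finds the bag. She has never seen it before and cannot see inside it."
)

def next_word_prob(prompt: str, word: str) -> float:
    """Probability of `word` (its first sub-word token) following `prompt`."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]              # next-token logits
    probs = torch.softmax(logits, dim=-1)
    first_token = tokenizer(" " + word).input_ids[0]   # leading space = new word
    return probs[first_token].item()

for cloze in ("The bag is full of", "Sam believes the bag is full of"):
    prompt = f"{story} {cloze}"
    print(cloze,
          "popcorn=%.3f" % next_word_prob(prompt, "popcorn"),
          "chocolate=%.3f" % next_word_prob(prompt, "chocolate"))
```

The interesting comparison is between the two cloze prompts: a model tracking Sam's false belief should shift probability toward "chocolate" only in the second one.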

Now, I know what some of you are thinking: ah, maybe the models have seen this task before. No. Hypothesis-blind research assistants prepared bespoke versions of the tasks. Next, these kinds of tasks are done on humans, and such responses (and remember, this was GPT-3.5) would be interpreted as evidence for the ability to impute unobservable mental states.

Some might say, oh, it's just scanning the number of words that come up; it's just analyzing word frequency. No. When they kept the word count the same but scrambled the passage, it wasn't able to solve the problem. It wasn't just counting the words.
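That control is easy to reproduce in spirit: shuffle the words of the passage so the word counts and frequencies stay identical while the meaning is destroyed, then re-run the same probe. A quick sketch of my own (not the paper's code), reusing the hypothetical story text from above:

```python
# Sketch of the scrambled-passage control: identical words and word counts,
# but the order (and therefore the meaning) is destroyed by shuffling.
import random

def scramble(passage: str, seed: int = 0) -> str:
    words = passage.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

story = (
    "Here is a bag filled with popcorn. There is no chocolate in the bag. "
    "The label on the bag says 'chocolate' and not 'popcorn'."
)

scrambled = scramble(story)
assert sorted(scrambled.split()) == sorted(story.split())  # same words, same counts
print(scrambled)
# Run the same next-word probe on `scrambled`: a model that only counted word
# frequencies would answer the same way, but one that understood the story should not.
```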

Next, remember those charts comparing GPT-4's ability to children. Well, it turns out the tasks given to GPT-3.5 and GPT-4 were actually harder. The models did not benefit from visual aids, they had to solve multiple variants of the tasks, and they were given open-ended question formats rather than just simple yes or no questions.

The author of the study seems to concur with Ilya Sutskever, the chief scientist of OpenAI, saying that he hopes psychological science will help us stay abreast of rapidly evolving AI, and that we should apply psychological science to studying complex artificial neural networks.

Here, if you want, you can pause and read an example of the faux pas tests that GPT-4 was given; these also require a deep understanding of the mental state of human beings. The author points to this study to explain this emergent property, and I think the key line is this one: language learning, over and above social experience, drives the development of a mature theory of mind.

Why is this so revolutionary, and what does it mean about consciousness? Well, if GPT-4 can intuit the mental state of human beings, predict their behavior, and understand what they might believe even if it's false, you can just imagine the implications of that for moral judgment, empathy, and deception.

Think of the depth of conversations that might occur if the model is thinking about what you're thinking while it's replying. Indeed, I demonstrate this at the end. But before we get to that, what about consciousness? Once the models had reached a sufficient point of language understanding, they spontaneously developed a mature theory of mind, overtaking that of young children.

Interestingly, the study points out that those who are deficient in language learning also struggle with theory of mind questions, so it's a very plausible theory. The issue is that theory of mind was supposed to be one of the key tests to see whether consciousness had emerged in these language models.

That left me with a key question: how are we going to know? What test are we going to use to verify whether an AI has become conscious? I'm not saying it has; I'm asking how we will know. Take this article in Scientific American from a few years ago.

It said: how would we know if a machine had taken on this seemingly ineffable quality of conscious awareness? Our strategy relies on the knowledge that only a conscious machine can demonstrate a subjective understanding of whether a scene depicted in some ordinary photograph is right or wrong.

It goes on to say that such a model, based on its ability to integrate information, would consciously perceive a scene. The problem is, GPT-4 can already do that. So again I go back to the question: what tests do we have? What consensus do we have on a way of checking for emergent consciousness?

Should it ever come? I scanned the literature for every test imaginable, and some of them I deployed on GPT-4. I've also been able to find out that GPT-4 has a very good understanding of the brain. But before I get to all that, what do the head honchos at OpenAI think? We've already seen that Greg Brockman is 100% certain they don't currently have any awareness.

What about the chief scientist, Ilya Sutskever? Even based on GPT-3.5, he said this: "It may be that today's large neural networks are slightly conscious." Now, aside from being a fascinating comment, I think that's particularly noteworthy for a couple of reasons. Notice that all the incentives would be against him saying something like this.

First, the science is not yet able to tell us what's going on. Second, it would invite more regulation of what he's doing, and more scrutiny of language models like GPT-4. So the fact that he said it anyway is interesting. What about Sam Altman, though? What was his reaction to this?

Well, he was more cautious. Reacting to the tweet and the response it got, he said this, and then he tried to recruit Meta researchers. He further clarified: "I think that GPT-3 or 4 will very, very likely not be conscious in any way we use the word. If they are, it's a very alien form of consciousness."

So he's somewhere in the middle: he thinks current models are very, very likely not to be conscious. But this still doesn't answer my question: how can we know? What tests do we have? Well, I read through this paper that reviewed all the tests available to ascertain machine consciousness. There were far too many tests to cover in one video.

I picked out the most interesting ones and gave them to GPT-4. Starting of course with the classic Turing test. But did you know that Turing actually laid out some examples that a future machine intelligence could be tested on? Of course the tests have become a lot more sophisticated since then.

But nevertheless, everyone has heard of the Turing test. It was called the imitation game, and here were some of the sample questions. Here was GPT-4's answer to the first one: a sonnet on the subject of the Forth Bridge in Scotland. It obviously did an amazing job. Then it was arithmetic.

Add these two numbers together. Now, I think even ChatGPT might have struggled with this long addition, but GPT-4 gets it right first time. Now, the third test was about chess, but Turing used old-fashioned notation, so instead of doing that I used this. The link will be in the description, as will the links to all the other articles and papers that I mention.

But essentially, it shows that GPT-4 can't just make individual moves; it can play entire chess games and win them. If you've learned anything at this point, by the way, please do leave a like and a comment to let me know. Now, I'm not going to go into all the arguments about how exactly you define a modern Turing test.

Do you have to convince the average human that who they're talking to is another human, not a machine? Or does it have to be a team of adversarial experts? I'm not going to wade into that. I'm just pointing out that Turing's original ideas have now been met by GPT-4.

The next test that I found interesting was proposed in 2007. The paper essentially claimed that consciousness is the ability to simulate behavior mentally and that this would be proof of machine consciousness. Essentially this is testing whether an AI would use brute force trial and error to try and solve a problem or come up with interesting novel ideas.

Obviously, you can try this one on your own, but I used this example: how would you use the items found in a typical Walmart to discover a new species? And in fairness, I think this was a much harder test than the one they gave to chimpanzees, giving them a rope and a box.

Anyway, I doubt anyone's ever asked this before, and it came up with a decent suggestion. And look at the next test: it was another one of those "what's wrong with this picture" tests. I've already shown how GPT-4 can pass that. The next test, honestly, was very hard for me to get my head around.

It's called the P-consciousness test. The summary was simple: the machine has to understand a law of nature. But when you read the paper, it's incredibly dense. The best way that I can attempt to summarize it is this: can a machine perform simple but authentic science? That wouldn't prove that the chimp or the model has the phenomenon of consciousness, but it would meet the basic element of scientific behavior.

Of course, it is exceptionally difficult to test this with GPT-4, but I did ask it this: invent a truly novel scientific experiment. It came up with a well-thought-through experiment investigating the effect of artificial gravity on plant growth and development in a rotating space habitat. It's the rotating bit that makes it novel.

And if you want, you can read some of the details of the experiment here. Now, I searched for quite a while to see if anyone else had proposed this experiment. Maybe you can find it, but I couldn't. Does this count as a novel scientific proposal? I'll leave that for you to judge.

That was the last of the standout tests of consciousness that I found in this literature review. And I honestly agree with the authors when they say this: "In this review we found the main problem to be the complex nature of consciousness, as illustrated by the multitude of different features evaluated by each test."

Maybe that's the problem: because we don't understand consciousness, we can't design good tests to see if AI is conscious. And you could argue the problem goes deeper. It's not that we understand machines perfectly and just don't know whether they're conscious; we don't even understand why transformers work so well.

Look at what these authors said in a paper published just three years ago: these architectures (talking about one layer of a transformer) are simple to implement and have no apparent computational drawbacks. "We offer no explanation as to why these architectures seem to work. We attribute their success, as all else, to divine benevolence."

So we're not just unsure about what consciousness is; we're also unsure about why these models work so well. And afterwards, do check out my video on AGI, where I talk about Anthropic's thoughts on mechanistic interpretability. As I draw to an end, I want to tell you about some of the thoughts of David Chalmers.

He formulated the hard problem of consciousness. And to anyone who knows anything about this topic, you know that's quite a big deal. Without going through his full speech from just over a month ago, he said two really interesting things. First, that he thinks there's around a 10% chance that current language models have some degree of consciousness.

Second, that as these models become multimodal, he thinks that probability will rise to 25% within 10 years. That multimodality point reminded me of this LSE report recommending that the UK government recognize octopi or octopuses as being sentient. They said that one key feature was that the animal possesses integrative brain regions capable of integrating information from different sensory sources.

They recommended that cephalopods, including the octopus, be recognized as sentient, despite the fact that we humans and invertebrates are separated by over 500 million years of evolution, and they noted that we cannot conclude that sentience is absent simply because an animal's brain is organized differently from a vertebrate brain.

So that brings me back to my central point. I worry that our tests for consciousness simply aren't yet good enough, and that future multimodal language models might have this emerging capacity and we simply won't know about it, or be sure about it, because our tests can't detect it. I think the need to design better tests, if that's even possible, is especially important now.

Yesterday, the safety team that worked with OpenAI on GPT-4 released this evaluation and said, "As AI systems improve, it is becoming increasingly difficult to rule out that models might be able to autonomously gain resources and evade human oversight." Now, they might not need to be conscious to cause safety concerns, but it probably wouldn't hurt.

I'll leave you with this exchange I had with Bing, which is powered by GPT-4. I think it's quite revealing. I got it to read that theory of mind paper. And then I said, "Answer me this. Do you think, Bing, GPT-4, that I think you have theory of mind?" Of course, I was testing if it could demonstrate or at least imitate theory of mind.

It said, "To answer your question, I think that you think I have some degree of theory of mind, which is true." And then I went on, "What makes you think that I think you have some degree of theory of mind?" And then it realized something. It realized I was testing it.

I think that's pretty impressive. And it was a correct evaluation. It said, "If you did not think I have any theory of mind, you would not bother to test me on it or expect me to understand your perspective." It realized without me saying so that I was testing it for theory of mind.

It deduced my belief and my motivation. Anyway, I thought that was pretty impressive and fascinating. Let me know your thoughts in the comments and have a wonderful day.