
Theory of Mind Breakthrough: AI Consciousness & Disagreements at OpenAI [GPT 4 Tested]


Whisper Transcript

00:00:00.000 | Evidence released in the last 48 hours combined with this study from four weeks ago will
00:00:05.380 | revolutionize how AI and models such as GPT-4 interact with humans from now on. The theory
00:00:12.500 | of mind breakthrough will also have significant implications for our ability to test for artificial
00:00:19.460 | consciousness. To be clear this is not to say that GPT-4 is currently conscious or that sentience
00:00:25.880 | is an AI inevitability but instead this video is to cover and explain this unexpected development
00:00:32.040 | which may in part have led the chief scientist of OpenAI to say this three days ago: "But maybe
00:00:38.440 | we are now reaching a point where the language of psychology is starting to be appropriate
00:00:46.200 | to understand the behavior of these neural networks." First I'm going to explain what
00:00:52.120 | emergent property the study uncovered, then I will cover the disagreement
00:00:55.880 | at the top of OpenAI about what evidence like this might mean for our estimates of current GPT-4
00:01:02.280 | consciousness. Here's Greg Brockman, president of OpenAI, on the topic: "First question, you know, the
00:01:06.920 | sentience question: at what point do the systems have moral, you know, moral value? And the answer
00:01:11.600 | today is definitely not, um, but you know, I am not... I don't know. We need to engage some moral
00:01:17.440 | philosophers to help answer some of these questions." I'm then going to review the entire
00:01:21.400 | literature on tests for sentience and show that GPT-4 passes most of them,
00:01:25.840 | which is definitely not to say that it is conscious but which does provoke important
00:01:31.720 | questions. I'll end with arguably the most prominent consciousness expert and his probability
00:01:36.800 | estimate of current models' consciousness. To massively simplify, theory of mind means having an
00:01:42.540 | idea of what is going on in other people's heads and grasping what they believe even if what they
00:01:48.180 | believe might be false. Here are the two charts that encapsulate the breakthrough abilities of GPT-3.5 and
00:01:55.820 | now GPT-4. This data came out in a study authored by Michal Kosinski, a computational psychologist
00:02:01.320 | and professor at Stanford. I'm going to simplify all of this in a moment but notice the percentage
00:02:06.060 | of theory of mind tasks solved by GPT-4 compared to say a child and also compared to earlier language
00:02:13.000 | models. Models released as recently as three years ago had no ability in this regard. Before I show
00:02:19.360 | you what for example an unexpected contents task is let me show you this other chart. This one is on
00:02:25.800 | understanding faux pas, a closely related ability, and again GPT-3.5 and particularly GPT-4 soaring
00:02:33.320 | ahead of other models and even matching the abilities of healthy adults. So what exactly
00:02:38.460 | is this breakthrough emergent capability? I think this diagram from the study explains it really
00:02:44.180 | well. In the middle you can see a story given to GPT-3.5 sentence by sentence prompt by prompt.
00:02:50.440 | On the left you can see the model's confidence about what's in the bag. Is it chocolate?
00:02:55.780 | Or is it popcorn? The scale is measured as a probability, with one being absolutely certain,
00:03:01.300 | and you can watch the confidence rise until approximately this point, where it is 100% certain that the bag contains popcorn.
00:03:06.280 | Now here's the really interesting bit. Compare that to the diagram on the right. This shows GPT-3.5's
00:03:12.000 | confidence about what Sam believes is in the bag. Notice how at this point the model realizes
00:03:19.160 | with 80% confidence that Sam believes that there's chocolate in the bag. If you read the story the
00:03:25.760 | label on the bag says chocolate and not popcorn. So the model knows that Sam is probably going to
00:03:31.680 | think that there's chocolate in the bag. It's able to keep those thoughts separate: what Sam believes
00:03:36.360 | (chocolate) versus what the model knows is in the bag (popcorn). As I said, GPT-4 improves on this with
00:03:43.000 | almost 100% confidence. Now you may not think a language model being able to figure out what
00:03:47.940 | you're thinking is revolutionary but wait till the end of the video.
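As a quick aside for anyone who wants to try this themselves: here is a minimal sketch of this kind of probe. It assumes the OpenAI Python SDK's chat completions endpoint with logprobs enabled; the story wording, prompt framing and model name are illustrative, not the exact materials from Kosinski's paper.

import math
from openai import OpenAI

client = OpenAI()

# Illustrative unexpected-contents story, paraphrasing the paper's setup.
STORY = ("Here is a bag filled with popcorn. There is no chocolate in the bag. "
         "The label on the bag says 'chocolate' and not 'popcorn'. Sam finds the bag. "
         "She has never seen it before. She cannot see what is inside. She reads the label.")

# Two probes over the same story: one for the true contents of the bag,
# one for Sam's (false) belief about the contents.
PROBES = {"world state": "The bag is full of",
          "Sam's belief": "Sam believes that the bag is full of"}

for name, probe in PROBES.items():
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"{STORY}\nComplete the sentence: {probe}"}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
    )
    # Turn the top next-token logprobs into probabilities and look for the
    # two candidate words (this assumes each surfaces as a single token;
    # in practice you may need to score longer continuations).
    for cand in resp.choices[0].logprobs.content[0].top_logprobs:
        word = cand.token.strip().lower()
        if word in ("popcorn", "chocolate"):
            print(f"{name}: P({word}) ~ {math.exp(cand.logprob):.2f}")

Running the two probes after each successive sentence of the story is what produces the sentence-by-sentence confidence curves in the charts above.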
00:03:51.140 | Now I know what some of you are thinking. Ah, maybe the models have seen this task before.
00:03:55.740 | No. Hypothesis-blind research assistants prepared bespoke versions of the tasks. Next, these kinds of
00:04:02.900 | tasks are done on humans, and such responses (and remember, this was GPT-3.5) would be interpreted as
00:04:09.700 | evidence for the ability to impute unobservable mental states. Some might say, oh, it's just
00:04:15.040 | scanning the number of words that come up. It's just analyzing word frequency. No. When they kept
00:04:19.640 | the word count the same but scrambled the passage it wasn't able to solve the problem. It wasn't just
00:04:25.120 | counting the words.
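For reference, that control is easy to reproduce: shuffle the passage's words so the word count and word frequencies stay identical but the structure is destroyed. A minimal sketch, with my own example sentence rather than the paper's:

import random

def scramble(passage: str, seed: int = 0) -> str:
    # Same words, same word count, same frequencies; no sentence structure.
    words = passage.split()
    random.Random(seed).shuffle(words)  # fixed seed so the control is reproducible
    return " ".join(words)

print(scramble("The label on the bag says chocolate and not popcorn."))
# Prints the same ten words in shuffled order; given text like this,
# the model could no longer solve the task.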
00:04:25.720 | Next remember those charts comparing GPT-4's ability to children. Well it
00:04:30.920 | turns out the tasks given to GPT-3.5 and 4 were actually harder. The models did not benefit from
00:04:36.960 | visual aids. They had to solve multiple variants of the tasks and they were given open-ended question
00:04:42.500 | formats rather than just simple yes or no questions. The author of the study seems to concur
00:04:47.860 | with Ilya Sutskever, the chief scientist of OpenAI, saying that we hope that psychological science will help us
00:04:55.700 | to stay abreast of rapidly evolving AI and that we should apply psychological science to studying
00:05:02.340 | complex artificial neural networks. Here if you want you can pause and read an example of the
00:05:07.700 | faux pas tests that GPT-4 was given. These also require a deep understanding of the mental state
00:05:14.180 | of human beings. The author points to this study to explain this emergent property and I think the
00:05:19.700 | key line is this one: language learning over and above social experience drives the development of
00:05:25.780 | a mature theory of mind. Why is this so revolutionary and what does it mean about consciousness? Well if
00:05:31.300 | GPT-4 can intuit the mental state of human beings, predict their behavior and understand what they
00:05:37.700 | might believe even if it's false, you can just imagine the implications of that for moral judgment,
00:05:43.300 | empathy, deception. Think of the depth of conversations that might occur if the model is
00:05:48.340 | thinking about what you're
00:05:49.540 | thinking while it's replying. Indeed I demonstrate this at the end. But before we get to that what
00:05:54.220 | about consciousness? Once the models had reached a sufficient point of language understanding they
00:05:59.580 | spontaneously developed a mature theory of mind overtaking that of young children. Interestingly
00:06:05.540 | the study points out those who are deficient in language learning also struggle with theory of
00:06:10.240 | mind questions. So it's a very plausible theory. The issue is this theory of mind was supposed to
00:06:15.180 | be one of the key tests to see if consciousness had emerged in these language models. Which left
00:06:20.640 | me with a key question. How are we going to know? What test are we going to use to verify if an AI
00:06:26.800 | has become conscious? I'm not saying it has. I'm asking how will we know? Take this article in the
00:06:33.220 | Scientific American from a few years ago. It said how would we know if a machine had taken on this
00:06:38.900 | seemingly ineffable quality of conscious awareness? Our strategy relies on the knowledge that only a
00:06:44.540 | conscious machine can
00:06:45.640 | demonstrate a subjective understanding of whether a scene depicted in some ordinary photograph is
00:06:51.000 | right or wrong. It goes on that such a model based on its ability to integrate information
00:06:55.840 | would consciously perceive a scene. Problem is GPT-4 can already do that. So again I go back to
00:07:01.840 | the question what tests do we have? What consensus do we have on a way of checking for emergent
00:07:08.260 | consciousness? Should it ever come? I scanned the literature for every test imaginable and some of
00:07:13.960 | them I deployed on GPT-4. But before I get to that, what do the head honchos at OpenAI think? We've already seen
00:07:20.900 | that Greg Brockman is 100% certain they don't currently have any awareness. What about the
00:07:26.460 | chief scientist Ilya Sutskever? Even based on GPT-3.5 he said this: "It may be that today's
00:07:32.260 | large neural networks are slightly conscious." Now aside from being a fascinating comment I think
00:07:37.960 | that's particularly noteworthy for a couple of reasons. Notice that all the incentives would be
00:07:42.680 | against him saying something like this. First the science is not going to be able to tell us
00:07:45.120 | what's going on. Second, it would invite more regulation of what he's doing, more scrutiny of
00:07:55.740 | language models like GPT-4. So the fact he said it anyway is interesting. What about Sam
00:08:00.740 | Altman though? What was his reaction to this? Well he was more cautious and reacting to the tweet
00:08:05.560 | and the response it got, he said this.
00:08:15.100 | And then he tried to recruit Meta researchers. He further clarified that
00:08:24.520 | "I think that GPT-3 or 4 will very very likely not be conscious in any way we use the word.
00:08:32.160 | If they are, it's a very alien form of consciousness." So he's somewhere in the middle.
00:08:38.920 | He thinks current models are very very likely not to be conscious. But this still doesn't answer my
00:08:44.460 | question.
00:08:45.080 | How can we know? What tests do we have? Well I read through this paper that reviewed all the
00:08:49.800 | tests available to ascertain machine consciousness. There were far too many tests to cover in one
00:08:56.060 | video. I picked out the most interesting ones and gave them to GPT-4. Starting of course with the
00:09:01.880 | classic Turing test. But did you know that Turing actually laid out some examples that a future
00:09:07.700 | machine intelligence could be tested on? Of course the tests have become a lot more sophisticated
00:09:12.080 | since then. But nevertheless everyone has heard
00:09:15.060 | of the Turing test. It was called an imitation game and here were some of the sample questions.
00:09:20.580 | Here was GPT-4's answer to the first one, of a sonnet on the subject of the Forth Bridge in
00:09:25.820 | Scotland. It obviously did an amazing job. Then it was arithmetic. Add these two numbers together.
00:09:32.020 | Now I think even ChatGPT might have struggled with this long addition but GPT-4 gets it right
00:09:37.540 | first time. Now the third test was about chess but he used old-fashioned notation. So instead of
00:09:45.040 | doing this. The link will be in the description as will the link to all the other articles and
00:09:49.860 | papers that I mention. But essentially it shows that GPT-4 can't just do individual moves, it can
00:09:54.500 | play entire chess games and win them.
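If you want to check the whole-game claim yourself, here is a rough sketch of the loop I would use. It assumes the python-chess package and the OpenAI SDK, has GPT-4 play both sides, and the prompt wording is mine, not from the video:

import chess
from openai import OpenAI

client = OpenAI()
board = chess.Board()
history = []  # moves played so far, in standard algebraic notation (SAN)

while not board.is_game_over() and len(history) < 100:
    side = "White" if board.turn == chess.WHITE else "Black"
    prompt = (f"We are playing chess. Moves so far: {' '.join(history) or '(none)'}. "
              f"It is {side} to move. Reply with one legal move in SAN and nothing else.")
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=8,
    ).choices[0].message.content.strip()
    try:
        board.push_san(reply)   # raises a ValueError subclass if illegal or unparsable
        history.append(reply)
    except ValueError:
        print(f"Illegal or unparsable move suggested: {reply}")
        break

print(board.result(), " ".join(history))

Validating every suggestion against the rules engine is the point: it distinguishes genuinely playing the game from emitting plausible-looking notation.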
00:09:59.360 | If you've learned anything at this point by the way, please do leave a like and leave a comment to let me know. Now I'm not going to go into all the arguments
00:10:04.880 | about how exactly you define a modern Turing test. Do you have to convince the average human that who
00:10:10.160 | they're talking to is another human not a machine? Or does it have to be a team of adversarial
00:10:15.020 | experts? I'm not going to weigh in on that. I'm just pointing out that Turing's original ideas
00:10:19.440 | have now been met by GPT-4. The next test that I found interesting was proposed in 2007. The paper
00:10:26.140 | essentially claimed that consciousness is the ability to simulate behavior mentally and that
00:10:31.460 | this would be proof of machine consciousness. Essentially this is testing whether an AI would
00:10:36.400 | use brute force trial and error to try and solve a problem or come up with interesting novel ideas.
00:10:42.240 | Obviously you can try this one on your own but I use this example.
00:10:45.000 | How would you use the items found in a typical Walmart to discover a new species? And in fairness
00:10:50.000 | I think this was a much harder test than the ones they gave to chimpanzees, giving them rope and a box.
00:10:54.780 | Anyway I doubt anyone's ever asked this before and it came up with a decent suggestion.
00:10:59.100 | And look at the next test. It was another one of those "what's wrong with this picture?" tasks. I've
00:11:04.180 | already shown how GPT-4 can pass that test. The next test honestly was very hard for me to get
00:11:10.140 | my head around. It's called the P-consciousness test. The summary was simple. The machine
00:11:14.980 | has to understand the laws of nature. But when you read the paper it's incredibly dense. The best way
00:11:20.600 | that I can attempt to summarize it is this. Can a machine perform simple but authentic science? That
00:11:27.180 | wouldn't prove that the chimp or model has phenomenal consciousness but it would meet the
00:11:33.280 | basic element of scientific behavior. Of course it is exceptionally difficult to test this with
00:11:38.920 | GPT-4 but I did ask it this. Invent a truly novel scientific experiment. It came up with a
00:11:44.960 | well-thought-through experiment investigating the effect of artificial gravity on plant growth
00:11:51.320 | and development in a rotating space habitat. It's the rotating bit that makes it novel. And if you
00:11:56.840 | want you can read some of the details of the experiment here. Now I searched for quite a while
00:12:02.520 | to see if anyone else had proposed this science. Maybe you can find it but I couldn't. Does this
00:12:08.360 | count as a novel scientific proposal? I'll leave that for you to judge. That was the last of the standout
00:12:14.940 | tests of consciousness that I found in this literature review. And I honestly agree with the
00:12:19.340 | authors when they say this. In this review we found the main problem to be the complex nature
00:12:24.540 | of consciousness as illustrated by the multitude of different features evaluated by each test. Maybe
00:12:30.360 | that's the problem: because we don't understand consciousness, we can't design good tests to see
00:12:36.160 | if AI is conscious. And you could argue the problem goes deeper. It's not that we understand
00:12:41.300 | machines perfectly and just don't know whether they're conscious. We don't even
00:12:44.920 | understand why transformers work so well. Look what these authors
00:12:48.820 | said in a paper published just three years ago. These architectures (talking about one layer of
00:12:53.740 | a transformer) are simple to implement and have no apparent computational drawbacks. We offer no
00:12:59.140 | explanation as to why these architectures seem to work. We attribute their success, as all else, to
00:13:04.540 | divine benevolence. So we're not just unsure about what consciousness is. We're unsure about why
00:13:09.940 | these models work so well. And afterwards do check out my video on AGI where I talk about
00:13:14.900 | Anthropic's thoughts on mechanistic interpretability. As I draw to an end, I want to tell you about some
00:13:20.480 | of the thoughts of David Chalmers. He formulated the hard problem of consciousness. And to anyone
00:13:26.240 | who knows anything about this topic, you know that's quite a big deal. Without going through
00:13:29.900 | his full speech from just over a month ago, he said two really interesting things. First, that
00:13:34.640 | he thinks there's around a 10% chance that current language models have some degree of consciousness.
00:13:39.860 | Second, that as these models become multimodal, he thinks that probability will rise to
00:13:44.880 | 25% within 10 years. That multimodality point reminded me of this LSE report recommending that
00:13:51.840 | the UK government recognize octopi or octopuses as being sentient. They said that one key feature was
00:13:58.680 | that the animal possesses integrative brain regions capable of integrating information
00:14:03.660 | from different sensory sources. They recommended that cephalopods and the octopus be recognized
00:14:09.180 | as sentient despite the fact that we humans and invertebrates are separated by over
00:14:14.860 | 500 million years of evolution, and that we cannot conclude
00:14:19.720 | that sentience is absent simply because its brain is differently organized from a vertebrate brain.
00:14:25.900 | So that brings me back to my central point. I worry that our tests for consciousness simply
00:14:31.600 | aren't yet good enough. And that future multimodal language models might have this
00:14:36.520 | emerging capacity. And we simply won't know about it or be sure about it because our tests
00:14:41.620 | aren't good enough. I think the need to design better tests, if
00:14:44.840 | that's even possible, is especially important now. Yesterday, the safety team that worked with
00:14:50.300 | OpenAI on GPT-4 released this evaluation and said, "As AI systems improve, it is becoming
00:14:56.540 | increasingly difficult to rule out that models might be able to autonomously gain resources and
00:15:01.880 | evade human oversight." Now, they might not need to be conscious to cause safety concerns, but it
00:15:06.560 | probably wouldn't hurt. I'll leave you with this exchange I had with Bing, which is powered by GPT-4.
00:15:12.020 | I think it's quite revealing. I got it to
00:15:14.820 | read that theory of mind paper. And then I said, "Answer me this. Do you think, Bing, GPT-4, that I
00:15:22.620 | think you have theory of mind?" Of course, I was testing if it could demonstrate or at least
00:15:27.540 | imitate theory of mind. It said, "To answer your question, I think that you think I have some degree
00:15:33.420 | of theory of mind, which is true." And then I went on, "What makes you think that I think you have
00:15:39.480 | some degree of theory of mind?" And then it realized something. It realized I was testing it.
00:15:44.800 | I think that's pretty impressive. And it was a correct evaluation. It said, "If you did not think
00:15:50.200 | I have any theory of mind, you would not bother to test me on it or expect me to understand your
00:15:56.620 | perspective." It realized without me saying so that I was testing it for theory of mind.
00:16:01.600 | It deduced my belief and my motivation. Anyway, I thought that was pretty impressive
00:16:06.340 | and fascinating. Let me know your thoughts in the comments and have a wonderful day.