Turing Test: Can Machines Think?

00:00:00.000 | In this video, I propose to ask the question

00:00:02.280 | that was asked by Alan Turing almost 70 years ago

00:00:05.240 | in his paper, Computing Machinery and Intelligence.

00:00:09.060 | Can machines think?

00:00:10.780 | This is the first paper in a paper reading club

00:00:16.280 | that we started focused on artificial intelligence,

00:00:18.600 | but also including mathematics, physics,

00:00:21.080 | computer science, neuroscience,

00:00:22.360 | all of the scientific and engineering disciplines.

00:00:25.020 | On the surface, this is a philosophical paper,

00:00:27.440 | but really it's one of the most impactful, important

00:00:30.200 | first steps towards actually engineering

00:00:32.720 | intelligent systems by providing a test,

00:00:35.680 | a benchmark that we call today the Turing test

00:00:39.480 | of how we can actually know quantifiably

00:00:43.040 | that a system has become intelligent.

00:00:46.560 | So I'd like to talk about an overview of ideas

00:00:50.200 | in the paper, provide some of the objections

00:00:52.740 | inside the paper and external to the paper,

00:00:55.520 | consider some alternatives to the test proposed

00:00:58.320 | within the paper, and then finished with some takeaways.

00:01:03.100 | Like I said, the title of the paper

00:01:04.720 | was Computing Machinery and Intelligence,

00:01:07.080 | published almost 70 years ago in 1950,

00:01:10.240 | author Alan Turing.

00:01:12.400 | And to me, now we can argue about this,

00:01:15.120 | on the slide I say it's one of the most impactful papers.

00:01:17.720 | To me, it probably is the most impactful paper

00:01:20.620 | in the history of artificial intelligence

00:01:22.680 | while only being a philosophy paper.

00:01:26.320 | I think the number of researchers

00:01:29.160 | from inside computer science and from outside

00:01:31.680 | that has inspired, has made dream

00:01:34.400 | at a collective intelligence level of our species

00:01:38.440 | inspired that this is possible, I think is immeasurable.

00:01:42.320 | For all the major engineering breakthroughs

00:01:44.960 | and computer science breakthroughs and papers

00:01:47.920 | stretching all the way back to the 30s and 40s

00:01:50.600 | with even the work by Alan Turing with the Turing machine,

00:01:55.040 | some of the mathematical foundations of computer science

00:01:59.220 | to today with deep learning, a sequence of papers

00:02:02.680 | from the very practical Alex Ned paper

00:02:04.680 | to the back propagation papers.

00:02:06.000 | So all of these papers that underlie

00:02:09.400 | the actual successes of the field,

00:02:11.720 | I think the seed was planted.

00:02:15.120 | The dream was born with this paper.

00:02:19.800 | And it happens to have some of my favorite opening lines

00:02:23.520 | of any paper I've ever read.

00:02:25.600 | It goes, I propose to consider the question,

00:02:29.680 | can machines think?

00:02:31.720 | This should begin with the definitions

00:02:33.280 | of the meaning of the terms machine and think.

00:02:37.000 | The definition might be framed so as to reflect

00:02:39.880 | so far as possible the normal use of the words,

00:02:42.920 | but this attitude is dangerous.

00:02:44.960 | If the meaning of the words machine and think

00:02:47.360 | are to be found in examining

00:02:49.200 | how they're commonly used,

00:02:50.720 | it is difficult to escape the conclusion

00:02:52.520 | that the meaning and the answer to the question,

00:02:54.640 | can machines think is to be sought

00:02:57.240 | in a statistical survey such as a Gallup poll.

00:03:00.240 | But this is absurd.

00:03:02.040 | Instead of attempting such a definition,

00:03:03.960 | I shall replace the question by another,

00:03:06.320 | which is closely related to it

00:03:08.200 | and is expressed in relatively unambiguous terms.

00:03:12.040 | And he goes on to define the imitation game,

00:03:14.760 | the construction that we today call the Turing test,

00:03:18.160 | which goes like this.

00:03:19.640 | There's a human interrogator on one side of the wall,

00:03:23.580 | and there's two entities,

00:03:24.960 | one a machine, one a human on the other side.

00:03:29.320 | And the human interrogator communicates

00:03:32.080 | with the two entities on the other side of the wall

00:03:34.680 | by written word, by passing those back and forth.

00:03:38.280 | And after some time of this conversation,

00:03:40.280 | the human interrogator is tasked with making a decision,

00:03:43.280 | which of the other two entities is a human

00:03:46.040 | and which is a machine.

00:03:47.560 | I think this is a powerful leap of engineering,

00:03:50.480 | which is take an ambiguous but a profound question

00:03:54.040 | like can machines think and convert it into a concrete test

00:03:59.040 | that can serve as a benchmark of intelligence.

00:04:01.960 | But there's echoes in this question

00:04:04.720 | to some of the other profound questions that we often ask.

00:04:07.520 | So not only can machines think,

00:04:09.000 | but can machines be conscious?

00:04:11.120 | Can machines fall in love?

00:04:12.780 | Can machines create art, music, poetry?

00:04:17.080 | Can machines enjoy a delicious meal,

00:04:19.840 | a piece of chocolate cake?

00:04:22.000 | I think these are really, really important questions,

00:04:24.800 | but very difficult to ask

00:04:26.920 | when we're trying to create a non-human system

00:04:29.840 | that tries to achieve human level capabilities.

00:04:33.120 | So that's where Turing formulates this imitation game.

00:04:36.680 | And his prediction was that by the year 2000,

00:04:40.580 | or in 50 years since the paper,

00:04:44.040 | that a machine with 100 megabytes of storage

00:04:47.400 | will fool 30% of humans

00:04:49.240 | in a five-minute test of conversation.

00:04:52.020 | Another broader societal prediction he made,

00:04:56.480 | which I think is also interesting,

00:04:58.160 | is that people will no longer consider a phrase

00:05:00.800 | like thinking machine contradictory.

00:05:04.280 | So basically artificial intelligence at a human level

00:05:08.400 | becomes so commonplace that we would just take it

00:05:10.440 | for granted.

00:05:11.700 | And the other part that he goes at length

00:05:14.420 | towards the end of the paper to describe,

00:05:16.680 | which he believes that learning machines,

00:05:18.820 | or machine learning,

00:05:20.700 | will be a critical component of this success.

00:05:23.740 | I think it's also useful to break apart

00:05:25.940 | two implied claims within the paper,

00:05:28.860 | open claims, open questions.

00:05:31.940 | One is that the imitation game as Turing proposes

00:05:36.420 | is a good test of intelligence.

00:05:39.300 | And the second is that machines can actually pass this test.

00:05:44.300 | So when you say can machines think,

00:05:46.540 | you're both proposing an engineering benchmark

00:05:50.540 | for the word think,

00:05:52.480 | and raising the questions,

00:05:54.400 | can machines pass this benchmark?

00:05:57.860 | One of the perhaps tragic,

00:05:59.580 | but also exciting aspects of this whole area of work

00:06:03.580 | is that we still have a lot of work to do.

00:06:06.140 | So throughout this presentation,

00:06:08.300 | I will not only describe some of the ideas in the paper

00:06:10.780 | and outside of it in the years since,

00:06:12.820 | but also some of the open questions that remain,

00:06:16.480 | both at the philosophical, the psychological,

00:06:18.620 | and the technical levels.

00:06:20.020 | So here the open question stands,

00:06:21.800 | is it even possible to create a test of intelligence

00:06:25.380 | for artificial systems that will be convincing to us?

00:06:29.340 | Or will we always raise the bar?

00:06:31.400 | A corollary of that question is,

00:06:34.700 | looking at the prediction that Turing made

00:06:36.940 | that people will no longer find

00:06:38.620 | the phrase thinking machines contradictory,

00:06:41.440 | why do we still find that phrase contradictory?

00:06:45.620 | Why do we still think that computers

00:06:47.940 | are not at all intelligent?

00:06:50.100 | For many people, the game of chess

00:06:53.020 | was seen as the highest level of intelligence

00:06:55.100 | in these early days.

00:06:57.120 | In fact, we assign a lot of intelligence

00:07:00.540 | to Garry Kasparov for being one of the greatest,

00:07:03.300 | if not the greatest chess players of all time,

00:07:06.260 | as a human.

00:07:07.560 | Why do we not assign at least an inkling of that

00:07:10.900 | to IBM Deep Blue when it beat Garry Kasparov?

00:07:14.320 | Now, of course, you might start saying

00:07:16.460 | that it's a brute force algorithm,

00:07:18.020 | or in the case of AlphaGo and AlphaZero,

00:07:20.900 | you know how the learning mechanisms

00:07:22.940 | behind those algorithms work

00:07:24.340 | when they mastered the game of Go and the game of chess.

00:07:27.140 | And we'll get to some of those objections,

00:07:30.200 | but there's something deeply psychological

00:07:32.460 | within those objections

00:07:34.260 | that almost fear an artificial intelligence

00:07:37.780 | that passes the test.

00:07:39.740 | So the Turing test is very interesting

00:07:41.620 | as a thought experiment, as a philosophical construct,

00:07:45.100 | but it's also interesting as a real engineering test.

00:07:48.120 | And one of the implementations of it

00:07:50.020 | has been called the Lobner Prize,

00:07:52.440 | which has been running since 1991 to today.

00:07:55.660 | And the awards behind it, the award structure,

00:07:58.700 | is $25,000 for a system that you can use

00:08:03.740 | that using text alone passes the test,

00:08:06.180 | and $100,000 that uses other modalities

00:08:09.820 | like visual and auditory input.

00:08:12.420 | The rules of the competition have changed through the years,

00:08:14.460 | but they're currently are as follows.

00:08:16.340 | It's a 25-minute conversation,

00:08:19.140 | and in order to win, to pass the test,

00:08:21.540 | you have to fool 50% of the judges

00:08:24.180 | with which the system communicates.

00:08:26.780 | Mitsuko and Rose from Steve Warswick and Bruce Wilcox

00:08:32.340 | have been dominating the past 10 years,

00:08:34.540 | winning all but one of the years.

00:08:37.080 | Some details, Mitsuko and Rose are both

00:08:41.220 | mostly scripted, rule-based chatbots,

00:08:43.740 | so they're not end-to-end learning systems.

00:08:48.320 | I believe there is a little bit of machine learning,

00:08:50.660 | but as I understand, at the core, they're mostly scripted.

00:08:54.540 | And on the next slide, we'll go through a transcript

00:08:56.620 | of a conversation with Mitsuko to give you a sense

00:08:59.100 | of what those conversations are like.

00:09:01.020 | But I'd like to mention sort of a tragic note, really,

00:09:04.820 | that as far as I understand, the prize is no longer funded.

00:09:08.660 | But money aside, it shows that there's not a broad interest

00:09:13.380 | in this construction, both inside

00:09:15.940 | and outside the research community.

00:09:18.300 | It's really surprising to me that Google DeepMind,

00:09:21.600 | Facebook AI, major groups, major individual researchers

00:09:25.940 | in the field of AI have not taken on this challenge.

00:09:29.280 | I hope to actually be a small part of changing that

00:09:32.100 | a little bit, because I think there is a lot of value

00:09:35.060 | in considering the Turing test.

00:09:36.460 | I don't think it's a distraction.

00:09:38.020 | And I'll return to that point later on in this presentation.

00:09:41.660 | So you can pause and maybe read parts of this transcript.

00:09:45.580 | I'll highlight on the right, at the top of the right column,

00:09:48.660 | the human says, "You're a robot."

00:09:50.940 | Mitsuko answers, "How can a person be a robot?"

00:09:54.660 | Human, "Many people live lives akin to robots."

00:09:58.380 | Mitsuko, "How many, more than a million?"

00:10:01.140 | Human, "Yes."

00:10:03.180 | Mitsuko, "That seems like rather a lot."

00:10:06.020 | Human, "Do you think Southampton will beat Swansea?"

00:10:10.260 | Mitsuko, "I think so."

00:10:12.180 | Human, "Why?"

00:10:13.500 | Mitsuko, "No reason, it's just what I think."

00:10:16.140 | Human, "You must have a reason."

00:10:18.180 | Mitsuko, "Nope, no real reason."

00:10:20.780 | Human, "Which team do you support?"

00:10:23.340 | Mitsuko, "I support Leeds United."

00:10:26.340 | Human, "Leeds!"

00:10:29.500 | How's the season going?

00:10:31.140 | Mitsuko, "Yes, it is a city in the north of England.

00:10:34.820 | "About half a million people live there.

00:10:37.980 | "It is the same as ever."

00:10:39.520 | So you see in this transcript,

00:10:43.900 | it's just some little gap of inhumanity

00:10:46.780 | that becomes apparent.

00:10:48.380 | There's a lingering and a repeatability of points.

00:10:52.940 | There's certain gaps in understanding

00:10:56.260 | and ability to follow tangents, all those kinds of things.

00:10:59.180 | But it's still not clear to me as an open question

00:11:04.460 | how to make explicit where exactly

00:11:09.100 | the point of the failure of the test is.

00:11:11.780 | I believe that hasn't actually been really researched

00:11:15.220 | that well in these constructions.

00:11:17.540 | As opposed to decision making at the very end

00:11:20.660 | of a conversation, is this human or not,

00:11:22.900 | rather marking parts of a conversation

00:11:25.220 | as more or less human, like suspicious parts

00:11:28.140 | that make you wonder this is not human.

00:11:30.700 | I think that'll be really interesting to see

00:11:32.540 | if it's possible to make explicit what aspects

00:11:34.940 | of the conversation are the failure points.

00:11:37.460 | One of the times that the claim

00:11:39.280 | that the Turing test was passed,

00:11:41.140 | I think most famously was in 2014

00:11:43.700 | at an exhibition event that marked the 60th anniversary

00:11:46.700 | of Turing's death, Eugene Guzman fooled 33%

00:11:51.700 | of the event judges.

00:11:53.900 | And the method he used was to portray

00:11:55.940 | a 13-year-old Ukrainian boy that had a bunch

00:11:58.740 | of different personality quirks

00:12:00.100 | and obviously the language barrier,

00:12:02.220 | and had some humor and a constant sort of drive

00:12:05.180 | towards misdirecting the conversation back

00:12:08.620 | to the places where it was comfortable doing.

00:12:11.420 | So there's some criticism that you can make

00:12:13.000 | of this event due to some sort of smoke and mirrors,

00:12:16.820 | kind of the PR and marketing side of things

00:12:18.820 | that I think is always there with these kind

00:12:23.140 | of exhibition events.

00:12:24.900 | But setting that aside, I think the interesting lessons here

00:12:29.260 | is that the parameters, the rules of the actual engineering

00:12:33.260 | of the Turing test can determine whether it contains

00:12:38.060 | sort of the spirit of the Turing test,

00:12:40.620 | which is the test that captures the ability

00:12:44.260 | of an agent to have a deep, meaningful conversation.

00:12:48.660 | So in this case, you can argue that a few tricks were used

00:12:53.660 | to circumvent the need to have a deep,

00:12:57.980 | meaningful conversation.

00:12:59.460 | And 30% of judges were fooled without rigorous,

00:13:03.300 | thorough, transparent, open-domain testing.

00:13:06.260 | On the left is a transcript with Scott Aronson,

00:13:08.940 | the famed computer scientist,

00:13:10.300 | the quantum computing researcher.

00:13:13.380 | Talked to him on the podcast, brilliant guy.

00:13:15.980 | He posted some of the conversation that he had

00:13:18.200 | with Eugene, he was one of the judges,

00:13:20.420 | on his blog that I think is really interesting.

00:13:22.980 | So it shows that the judge, the interrogator,

00:13:26.660 | when they're an expert, they can drive,

00:13:29.500 | they can truly put the bot to the test.

00:13:32.500 | As Scott did, he really didn't allow the kind

00:13:35.940 | of misdirection that Eugene nonstop tried to do.

00:13:39.660 | And you can see that in the transcript.

00:13:41.700 | Scott refuses to take the misdirection.

00:13:44.540 | So as I mentioned, despite the waning,

00:13:48.060 | I guess, popularity of the Lobner Prize

00:13:50.260 | and the Turing Test idea in general,

00:13:53.400 | Google has published a paper and proposed a system

00:13:56.460 | called MINA that's a chatbot,

00:13:59.220 | that's an end-to-end deep learning system.

00:14:02.720 | The representational goal in the 2.6 billion parameters

00:14:06.180 | is to capture the conversational context well,

00:14:09.160 | to be able to generate the text

00:14:10.520 | that fits the conversational context well.

00:14:12.740 | Now, one interesting aspect of this,

00:14:14.600 | besides being a serious attempt

00:14:16.280 | at creating a learning-based system

00:14:18.420 | for open domain conversational agents,

00:14:21.760 | is that a new metric is proposed.

00:14:23.960 | And it's a two-part metric of sensibleness and specificity.

00:14:28.960 | Now, sensibleness is that a bot's responses

00:14:32.280 | have to make sense in context.

00:14:33.840 | They have to fit the context.

00:14:35.560 | Just to give you a sense,

00:14:36.520 | for humans, we have 97% sensibleness.

00:14:39.460 | So ability to match what we're saying to the context.

00:14:45.080 | Now, the reason you need another side of that metric

00:14:48.900 | is because you can be sensible,

00:14:52.020 | you can fit the context by being boring, by being generic,

00:14:55.320 | by making statements like, I don't know,

00:14:57.300 | or that's a good point.

00:14:58.880 | So these generic statements

00:15:00.180 | that fit a lot of different kinds of context.

00:15:02.460 | So the other side of the metric is specificity.

00:15:05.160 | Basically, the goal being there is don't be boring.

00:15:08.480 | It's to say something very specific to this context.

00:15:12.460 | So not only does it match the context,

00:15:14.980 | but it captures something very unique

00:15:17.180 | to this particular set of lines of conversation

00:15:21.980 | that form the context.

00:15:24.060 | I think it's fair to say that the beauty of the music,

00:15:28.820 | the humor, the wit of conversation

00:15:31.300 | comes from that ability to play with the specifics,

00:15:35.380 | the specificity metric.

00:15:37.220 | So both are really important.

00:15:39.400 | Humans achieve 86% sensibleness and specificity.

00:15:44.540 | Mina achieves 79% compared to Mitsuku, who achieves 56%.

00:15:49.520 | Now, take this all with a grain of salt.

00:15:52.140 | I want to be very careful here

00:15:53.980 | because there is also, not to throw shade,

00:15:57.580 | but it's closed source currently.

00:16:00.140 | And there's a little bit of a feeling

00:16:02.320 | of a PR marketing situation here.

00:16:05.500 | Naturally, perhaps the paper is made in such a way,

00:16:10.220 | the methodology and the results are made in such a way

00:16:12.580 | that benefit the way the learning framework was constructed.

00:16:16.380 | Now, I don't want to over-criticize that

00:16:18.740 | because I think there's still a lot of interesting ideas

00:16:21.460 | in this paper, but in terms of looking

00:16:23.340 | at the actual percentages of 86% human performance

00:16:26.980 | and 79% Mina performance, I think we're quite away

00:16:31.600 | from being able to make conclusive statements

00:16:34.060 | about a system achieving

00:16:35.780 | human-level conversational capabilities.

00:16:38.380 | So those plots should be taken with a grain of salt,

00:16:41.220 | but the actual content of the ideas,

00:16:43.260 | I think is really interesting.

00:16:44.780 | I think quite obviously the future, long-term,

00:16:49.240 | but hopefully short-term, is in learning end-to-end,

00:16:53.600 | learning-based approaches to open domain conversation.

00:16:57.020 | So just like Turing described, funny enough,

00:16:59.620 | 70 years ago in his paper that machine learning

00:17:02.220 | will be essential to success, I believe the same.

00:17:05.600 | It's a lot less interesting and revolutionary

00:17:07.940 | to think so today, but I believe

00:17:10.500 | that machine learning will also need

00:17:12.340 | to be a very central part of achieving

00:17:15.580 | human-level conversational capabilities.

00:17:18.300 | So let's talk through some objections.

00:17:20.320 | Nine of them are highlighted by Turing himself in his paper.

00:17:25.940 | Here I provide some informal, highly informal summaries.

00:17:30.940 | The first objection is religious,

00:17:33.460 | which connects thinking to, quote-unquote, the soul.

00:17:37.640 | And God, presumably, is the giver of the soul to humans.

00:17:42.640 | Now, Turing's response to that is God is all-powerful.

00:17:49.180 | There is no reason why he can't assign souls

00:17:53.780 | to anything biological or artificial.

00:17:58.220 | So it doesn't seem that whatever mechanism

00:18:01.400 | by which the soul arrives in the human

00:18:04.720 | cannot also be repeated for artificial creatures.

00:18:07.260 | The second objection is the, quote-unquote,

00:18:10.580 | head in the sand.

00:18:11.980 | It's a bit of a ridiculous one,

00:18:13.480 | but I think it's an important one

00:18:14.700 | because it keeps coming up often,

00:18:16.860 | even in today's context, highlighted by folks

00:18:19.340 | like Elon Musk, Stuart Russell, and so on.

00:18:22.420 | The head in the sand objection is that AGI is scary.

00:18:27.260 | So human-level and superhuman-level intelligence

00:18:29.820 | is kind of scary.

00:18:31.340 | Today we talk about existential threats.

00:18:34.140 | It seems like the world would be totally transformed

00:18:36.360 | if we have something like that.

00:18:37.900 | Then it could be transformed in a highly negative way.

00:18:40.440 | So let's not think about it

00:18:42.160 | because it's kind of seems far away.

00:18:45.000 | So it probably won't happen.

00:18:46.820 | So let's just not think about it.

00:18:49.000 | That's kind of the objection of the Turing test.

00:18:50.840 | It's so far away, it's not worthwhile

00:18:53.680 | to even think about a test for this intelligence

00:18:56.320 | or what human-level intelligence means

00:18:58.560 | or what superhuman-level intelligence means.

00:19:01.320 | The response, quite naturally,

00:19:02.640 | is that it doesn't matter how you feel about something

00:19:05.580 | and whether it's going to happen or not.

00:19:08.560 | So we kind of have to set our feelings aside

00:19:11.160 | and not allow fear or emotion to model our thinking

00:19:16.080 | or detract us from thinking about it at all.

00:19:19.680 | The third objection is from Gato's incompleteness theorem,

00:19:23.020 | saying there's limits to computation.

00:19:24.800 | This is the Roger Penrose line of thinking

00:19:28.080 | that basically if a machine is a computation system,

00:19:30.880 | there is limits to its capabilities

00:19:33.440 | in that it can never be a perfectly rational system.

00:19:37.440 | Turing's response to this

00:19:38.760 | is that humans are not rational either.

00:19:40.960 | They're flawed.

00:19:42.160 | Nowhere does it say that intelligence equals infallibility.

00:19:46.220 | In fact, it could probably be argued

00:19:47.720 | that fallibility is at the core of intelligence.

00:19:51.860 | The fourth objection is that consciousness

00:19:54.560 | may be required for intelligence.

00:19:56.920 | Turing's response to this is to separate

00:19:59.620 | whether something is conscious

00:20:00.980 | and whether something appears to be conscious.

00:20:03.200 | So the focus of the Turing test is how something appears.

00:20:06.240 | And so in some sense, humans, to us, as far as we know,

00:20:11.240 | only appear to be conscious.

00:20:12.880 | We can't prove that they're actually conscious,

00:20:14.960 | humans outside of ourselves.

00:20:17.080 | And so since humans only appear to be conscious,

00:20:20.920 | there's no reason to think that machines

00:20:22.720 | can't also appear to be conscious,

00:20:25.240 | and that's at the core of the Turing test.

00:20:27.320 | So the Turing test kind of skirts around the question

00:20:30.240 | whether something is or isn't intelligence,

00:20:32.220 | whether is or isn't conscious.

00:20:34.540 | The fundamental question is,

00:20:35.760 | does it appear to be intelligent?

00:20:37.400 | Does it appear to be conscious?

00:20:39.060 | So he actually doesn't respond to the idea

00:20:41.700 | that consciousness is or isn't required for intelligence.

00:20:44.820 | He just says that if it is,

00:20:46.460 | there's no reason why you can't fake it,

00:20:50.100 | and that will be sufficient

00:20:52.320 | to achieve the display of intelligence.

00:20:56.000 | The fifth objection is the Negative Nancy objection

00:21:01.000 | of machines will never be able to do X, whatever X is.

00:21:05.100 | You can make it love, joke, humor,

00:21:08.420 | understand or generate humor, eat, enjoy food,

00:21:13.420 | create art, music, poetry, and so on.

00:21:16.380 | So there's a lot of things we can put in that X

00:21:18.500 | that machines can never do.

00:21:20.080 | And basically highlighting our human intuition

00:21:24.060 | about the limitations of machines.

00:21:26.540 | Just like with the second objection,

00:21:28.240 | naturally the response here is that the objection

00:21:32.140 | that machines will never do X

00:21:33.640 | doesn't have any actual reasoning behind it.

00:21:37.260 | It is just a vapid opinion based on the world today,

00:21:42.260 | refusing to believe that the world of tomorrow

00:21:46.140 | will be different.

00:21:47.020 | The sixth objection, probably the most important,

00:21:51.700 | one of the most interesting,

00:21:53.180 | comes by way of Ada Lovelace, Lady Lovelace,

00:21:57.100 | the mother of computer science,

00:21:59.120 | with the basic idea that machines can only do

00:22:01.820 | what we program them to do.

00:22:03.820 | Now this is an objection that appears in many forms

00:22:06.940 | throughout, before Turing and after Turing.

00:22:09.540 | And I think it's a really important objection

00:22:11.340 | to think about.

00:22:12.380 | So in this particular case,

00:22:14.560 | I think Turing's response is quite shallow,

00:22:17.580 | but it is nevertheless pretty interesting,

00:22:19.940 | and we'll talk about it again later on.

00:22:23.260 | His response is, well, if machines can only do

00:22:26.740 | what we program them to do,

00:22:28.180 | we can rephrase that statement as saying,

00:22:30.900 | machines can't surprise us.

00:22:32.760 | And when you rephrase it that way,

00:22:35.400 | it becomes clear that machines actually surprise us

00:22:37.700 | all the time.

00:22:38.540 | A system that is sufficiently complex

00:22:41.140 | will no longer be one of which we have a solid intuition

00:22:45.900 | of how it behaves,

00:22:47.260 | even if we built all the individual pieces of code

00:22:49.820 | for those of you who have programmed things.

00:22:52.180 | So I've written a lot of programs.

00:22:54.280 | In the initial design stage,

00:22:56.900 | you have an intuition about how it should behave.

00:22:58.820 | There's a design, there's a plan,

00:23:00.820 | you know what the individual functions do.

00:23:02.980 | But as the piece of code grows,

00:23:05.180 | your ability to intuit exactly the mapping

00:23:09.340 | from input to output fades with the size of the code base,

00:23:14.220 | even if you understand everything about the code,

00:23:17.620 | and even if you set logical and syntactic bugs aside.

00:23:22.180 | The seventh objection looks to the brain

00:23:25.220 | and looks to the continuous analog nature

00:23:27.280 | of that particular neural network system.

00:23:30.840 | So Turing's response to that is,

00:23:33.900 | sure, the brain might be analog,

00:23:36.260 | and then digital computers are discrete,

00:23:39.580 | but if you have a big enough digital computer,

00:23:41.620 | it can sufficiently approximate the analog system,

00:23:45.700 | meaning to a sufficient degree

00:23:48.060 | that it would appear intelligent.

00:23:50.540 | The eighth objection is the free will objection, right?

00:23:55.300 | Is that when you have deterministic rules, laws, algorithms,

00:24:00.300 | they're going to result in predictable behavior.

00:24:05.100 | And this kind of exactly deterministic predictable behavior

00:24:09.980 | doesn't quite feel like the mind

00:24:14.540 | that we know us humans is possessing.

00:24:16.900 | This kind of feeling that underlies

00:24:19.740 | what's required for intelligence for a mind,

00:24:23.700 | I think is behind the Chinese room thought experiment

00:24:28.220 | that we'll talk about next.

00:24:29.780 | So Turing's response here is that

00:24:33.340 | humans very well could be a complex collection of rules.

00:24:38.340 | There's no indication that we're not,

00:24:40.180 | just because we don't understand

00:24:42.780 | or don't even have the tools to explore

00:24:44.860 | the kind of rules that underlie our brain,

00:24:48.860 | doesn't mean it's not just a collection

00:24:51.500 | of deterministic, perfectly predictable sets of rules.

00:24:56.500 | Objection number nine is kind of fun.

00:24:59.620 | Quite possibly Turing is trolling us,

00:25:01.960 | but more likely the ideas of mind reading,

00:25:04.820 | extrasensory perception, telepathy,

00:25:07.460 | were a little bit more popular in his time.

00:25:10.180 | So the objection here is what if mind reading

00:25:13.620 | was used to cheat the test?

00:25:15.820 | So basically if human to human communication

00:25:19.060 | through telepathy could be used,

00:25:22.420 | then a machine can't achieve that same kind

00:25:25.340 | of telepathic communication.

00:25:27.100 | And so that could be used to circumvent

00:25:30.900 | the effectiveness of the test.

00:25:32.780 | Now Turing's response to this is,

00:25:35.500 | well, you just have to design a room

00:25:38.520 | that not only protects you from being able to see,

00:25:40.780 | whether it's a robot or a human,

00:25:42.740 | but also design a telepathy proof room

00:25:47.740 | that prevents telepathic communication.

00:25:50.580 | Again, could be Turing trolling us,

00:25:53.940 | but I think more importantly,

00:25:55.120 | I think it's a nice illustration at the time,

00:25:57.240 | and even still today, that there's a lot of mystery

00:26:00.180 | about how our mind works.

00:26:01.980 | If you chuckle and completely laugh off

00:26:04.620 | the possibility of telepathic communication,

00:26:07.860 | I think you're assuming too much about your own knowledge

00:26:11.060 | about how our mind works.

00:26:13.580 | I think we know very little about how our mind works.

00:26:15.920 | It is true, we have very little scientific evidence

00:26:19.180 | of telepathic communication,

00:26:21.220 | but you shouldn't take the next leap

00:26:23.800 | and have a feeling like you understand

00:26:26.580 | that telepathic communication is impossible.

00:26:29.300 | You should nevertheless maintain an open mind.

00:26:31.980 | But as an objection,

00:26:33.640 | it doesn't seem to be a very effective one.

00:26:36.340 | I wanted to dedicate just one slide

00:26:38.500 | and probably the most famous objection to the Turing test

00:26:40.980 | proposed by John Searle in 1980

00:26:43.620 | in his paper "Minds, Brains, and Programs,"

00:26:47.260 | commonly known as the Chinese Room Thought Experiment.

00:26:50.820 | And it's kind of a combination of number four, number six,

00:26:53.660 | and number eight objections on the previous slide,

00:26:56.140 | which is the consciousness is required for intelligence,

00:27:01.100 | the Ada Lovelace objection that programs

00:27:03.460 | can only do what we program them to do,

00:27:06.580 | and the deterministic free will objection

00:27:10.060 | that deterministic rules lead to predictable behavior.

00:27:13.820 | And that doesn't seem to be like what the mind does.

00:27:16.620 | So there's echoes of all those objections

00:27:18.560 | that Turing anticipated all put together

00:27:21.100 | into the Chinese Room.

00:27:23.320 | As a small aside, it is now 6 a.m.

00:27:27.500 | I did not sleep last night,

00:27:28.700 | so this video is brought to you

00:27:31.780 | by this magic potion called Nitro Cold Brew,

00:27:36.380 | an excessively expensive canned beverage

00:27:41.700 | from Starbucks that fuels me

00:27:44.780 | this wonderful Saturday morning.

00:27:48.080 | Here's to you, dear friends.

00:27:50.800 | Okay, the Chinese Room involves following instructions

00:27:56.780 | of an algorithm.

00:27:57.900 | So there's a human sitting inside a room

00:28:00.220 | that doesn't know how to speak Chinese,

00:28:02.220 | but there's notes being passed to them

00:28:04.660 | inside the room from outside in Chinese,

00:28:07.860 | and all they do is follow a set of rules

00:28:10.140 | in order to respond to that language.

00:28:12.740 | So the idea is if the brain inside the system

00:28:17.740 | that passes the Turing test

00:28:21.300 | is simply following a set of rules

00:28:24.300 | that it's not truly understanding,

00:28:26.980 | it is not conscious, it does not have a mind,

00:28:30.100 | the objection is philosophical.

00:28:32.100 | So there's not, for my computer science engineering self,

00:28:37.100 | there's not enough meat in it

00:28:38.480 | to even make it that interesting.

00:28:40.580 | It's very human-centric,

00:28:41.860 | but allow us to explore it further.

00:28:45.360 | So the key argument is that programs,

00:28:48.980 | computational systems, are formal,

00:28:52.740 | and so they can capture syntactic structure.

00:28:55.580 | Minds, our brains, have mental content,

00:28:59.900 | so they can capture semantics.

00:29:01.860 | And so the claim that I think is the most important,

00:29:05.300 | the clearest in the paper,

00:29:06.740 | is that syntax by itself is neither constitutive of

00:29:11.000 | nor sufficient for semantics.

00:29:13.620 | So just because you can replicate the syntax of the language

00:29:16.980 | doesn't mean you can truly understand it.

00:29:19.980 | Now this is the same kind of criticism

00:29:21.580 | we hear of language models of today with transformers,

00:29:24.820 | that OpenAI's GP2 really doesn't understand the language,

00:29:28.700 | it's just mimicking the statistics of it so well

00:29:32.700 | that it can generate syntactically correct,

00:29:35.380 | and even have echoes of semantic structure

00:29:39.340 | that indicates some kind of understanding, but it doesn't.

00:29:43.320 | To me, that argument is not very interesting

00:29:45.300 | from an engineering perspective,

00:29:46.600 | because it just sounds like saying

00:29:48.940 | humans can understand things, humans are special,

00:29:52.900 | therefore machines cannot understand things.

00:29:56.680 | It's a very human-centric argument

00:29:59.260 | that's not allowing us to rigorously explore

00:30:02.680 | what exactly does understanding mean

00:30:06.620 | from a computational perspective.

00:30:08.140 | Or put in other words, if understanding, intelligence,

00:30:12.540 | consciousness, either one of those,

00:30:14.980 | is not achievable through computation,

00:30:18.020 | then where is the point that computation hits the wall?

00:30:21.900 | The most interesting open questions to me here

00:30:25.140 | are on the point of faking things, or mimicking,

00:30:27.760 | or the appearance of things.

00:30:29.440 | Does the mimicking of thinking equal thinking?

00:30:32.460 | Does the mimicking of consciousness equal consciousness?

00:30:35.020 | Does the mimicking of love equal love?

00:30:37.860 | This is something that I think a lot about,

00:30:40.460 | and depending on the day, go back and forth.

00:30:43.200 | But I tend to believe from an engineering perspective,

00:30:45.460 | I tend to agree with the spirit

00:30:47.900 | and the work of Alan Turing,

00:30:49.700 | in that at this time as engineers,

00:30:51.980 | we can only focus on building the appearance of thinking,

00:30:55.460 | the appearance of consciousness, the appearance of love.

00:30:58.780 | I think as we work towards creating that appearance,

00:31:01.980 | we'll actually begin to understand the fundamentals

00:31:06.620 | of what it means to be conscious,

00:31:08.140 | what it means to love, what it means to think.

00:31:12.760 | You may have even heard me say sometimes

00:31:14.600 | that the appearance of consciousness is consciousness.

00:31:18.800 | I think that's me being a little bit poetic,

00:31:21.780 | but I think from our perspective,

00:31:23.100 | from our exceptionally limited understanding,

00:31:27.180 | both problems are in the same direction.

00:31:31.500 | So it's not like if we focus on creating

00:31:33.300 | the appearance of consciousness,

00:31:34.500 | that's gonna lead us astray, in my personal view.

00:31:37.600 | It's going to lead us very far down the road

00:31:39.700 | of actually understanding,

00:31:40.940 | and maybe one day engineering consciousness.

00:31:44.380 | And now I'd like to talk about some alternatives

00:31:46.620 | and variations of the Turing test

00:31:48.180 | that I find quite interesting.

00:31:50.140 | So there's a lot of kind of natural variations

00:31:52.780 | and extensions to the Turing test.

00:31:54.980 | First, the total Turing test proposed in 1989.

00:31:59.980 | It extends the Turing test

00:32:01.660 | in the natural language conversation domain

00:32:04.420 | to perception, computer vision,

00:32:06.940 | and object manipulation of robotics.

00:32:09.100 | So it takes it into the physical world.

00:32:12.300 | The interesting question here to me

00:32:14.020 | is whether adding extra modalities

00:32:17.460 | like audio, visual, manipulation

00:32:21.580 | makes the test harder or easier.

00:32:24.200 | To me, it's very possible that a test

00:32:28.100 | with a narrow bandwidth of communication,

00:32:31.760 | such as the natural language communication

00:32:33.620 | of the Turing test is actually harder to pass

00:32:36.620 | than the one that includes other modalities.

00:32:38.820 | But anyway, one of the powerful things

00:32:42.580 | about the original Turing test is that it's so simple.

00:32:45.500 | The Lovelace test proposed in 2001

00:32:49.660 | builds on the Ada Lovelace objection

00:32:52.700 | to form the test that says the machine

00:32:55.100 | has to do something surprising

00:32:57.540 | that the creator or the person who's aware

00:33:02.100 | how the program was created cannot explain.

00:33:05.380 | So it should be truly surprised.

00:33:07.580 | There is also, in 2014, was proposed a Lovelace 2.0 test,

00:33:12.940 | which emphasizes a more constrained definition

00:33:15.660 | of what surprising is, 'cause it's very difficult

00:33:17.600 | to pin down, to formalize the idea of surprise

00:33:21.820 | and explain, right, in the original formulation

00:33:25.940 | of the Lovelace test.

00:33:27.300 | But with Lovelace 2.0, it emphasizes

00:33:29.820 | sort of creativity, art, so on.

00:33:33.660 | So it's more concrete than surprise,

00:33:35.540 | especially if you define constraints

00:33:38.260 | to which creative medium we're operating in.

00:33:41.860 | You basically have to create an impressive piece

00:33:45.300 | of artistic work.

00:33:46.600 | I think that's an interesting conception,

00:33:49.640 | but it takes us in the land that's much more,

00:33:53.180 | not less subjective than the original Turing test.

00:33:58.180 | But this brings us to the open

00:34:00.900 | and the very interesting question of surprise,

00:34:02.940 | which I think is really at the core

00:34:06.820 | of our conception of intelligence.

00:34:09.200 | I think it is true that our idea

00:34:12.540 | of what makes an intelligent machine

00:34:14.500 | is one that really surprises.

00:34:16.580 | So when we one day finally create a system

00:34:19.880 | of human-level or superhuman-level intelligence,

00:34:23.000 | we will surely be surprised.

00:34:25.680 | So we have to think, what kind of behavior

00:34:28.800 | is one that will surprise us to the core?

00:34:31.200 | To me, I have many examples in mind

00:34:34.620 | that I'll cover in future videos,

00:34:36.880 | but one certainly, one of the hardest ones is humor.

00:34:40.840 | And finally, the truly total Turing test proposed in 1998

00:34:45.840 | proposes an interesting philosophical idea

00:34:49.020 | that we should not judge the performance

00:34:52.600 | of an individual agent in an isolated context,

00:34:55.960 | but instead look at the body of work

00:34:58.200 | produced by a collection of intelligent agents

00:35:01.880 | throughout their evolution,

00:35:03.720 | with some constraints on the consistency

00:35:06.520 | underlying the evolutionary process.

00:35:09.600 | It's interesting to suggest that the way we conceive

00:35:15.000 | of intelligence amongst us humans

00:35:17.920 | is grounded in the long arc of history

00:35:20.880 | of the body of work we've created together.

00:35:23.280 | I don't find that argument convincing,

00:35:25.720 | but I do find the interesting question

00:35:27.560 | and the open question, the idea

00:35:31.000 | that we should measure systems

00:35:35.840 | not in the moment or particular five minute period

00:35:38.720 | or 20 minute period, but over a period of months and years,

00:35:43.440 | perhaps condensed in a simulated context.

00:35:46.560 | So really increase the scale

00:35:49.760 | at which we judge interactions

00:35:51.720 | by several orders of magnitude.

00:35:54.040 | That to me is a really interesting idea,

00:35:59.040 | you know, to judge alpha zero performance

00:36:01.520 | not on a single game of chess,

00:36:03.840 | but looking at millions of games

00:36:06.960 | and not looking at a million games

00:36:09.240 | for a static set of parameters,

00:36:11.240 | but looking at the millions of games played

00:36:14.640 | as the system was trained from scratch

00:36:17.520 | and became better and better and better.

00:36:19.680 | There's something about that full journey

00:36:23.520 | that may capture intelligence.

00:36:25.960 | So intelligence very well could be the journey,

00:36:30.000 | not the destination.

00:36:31.600 | I think there's something there.

00:36:32.800 | It's very imprecise in this construction,

00:36:35.720 | but it struck me as a very novel idea

00:36:39.880 | for benchmark not to measure instantaneous performance,

00:36:43.920 | but performance over time

00:36:45.240 | and the improvement of performance over time.

00:36:47.600 | It appears that there's something to that,

00:36:49.120 | but I can't quite make it concrete.

00:36:51.000 | And I'm not sure it's possible to formalize

00:36:52.760 | in the way that the original Turing test is formalized.

00:36:56.480 | Another kind of test is the Winograd Schema Challenge,

00:37:00.340 | which I think is really compelling in many ways.

00:37:03.280 | So first to explain it with an example,

00:37:05.840 | there's a sentence, really two sentences.

00:37:09.280 | Let's say the trophy doesn't fit into the brown suitcase

00:37:11.940 | because it's too small,

00:37:13.540 | and the trophy doesn't fit into the brown suitcase

00:37:15.880 | because it is too large.

00:37:17.680 | And the question is, what is too small?

00:37:20.280 | What is too large?

00:37:22.120 | The answer for the small, what is too small,

00:37:24.720 | is the suitcase is too small.

00:37:27.080 | The trophy doesn't fit into the brown suitcase

00:37:29.240 | because it is too small.

00:37:31.120 | And then the second question is, what is too large?

00:37:34.280 | The answer there is the trophy.

00:37:36.360 | The trophy doesn't fit into the brown suitcase

00:37:38.380 | because it is too large.

00:37:40.600 | The basic idea behind this challenge

00:37:42.620 | is the ambiguity in the sentence can only be resolved

00:37:47.380 | with common sense reasoning about ideas in this world.

00:37:51.200 | And so the strength of this test is it's quite clear,

00:37:56.160 | quite simple, and yet requires, at least in theory,

00:38:01.100 | this deep thing that we think makes us human,

00:38:05.960 | which is the ability to reason

00:38:07.940 | at the very basic level of common sense reasoning.

00:38:10.640 | The other nice thing is it can be a benchmark,

00:38:15.840 | like we're used to in the machine learning world,

00:38:18.080 | that doesn't require subjective human judges.

00:38:21.240 | There's literally a right answer.

00:38:23.420 | The weakness here that holds for other similar challenges

00:38:28.420 | in the space is that it's very difficult

00:38:30.780 | to come up with a large amount of questions.

00:38:33.720 | I mean, each one is handcrafted.

00:38:36.560 | And so that means you can't build a benchmark

00:38:40.340 | of millions or billions of questions.

00:38:43.340 | It has to be on a small scale.

00:38:45.860 | Variations of the Winograd scheme are included

00:38:50.420 | in some natural language benchmarks of today

00:38:54.860 | that people use in the machine learning context.

00:38:57.820 | The Amazon Alexa prize, I think,

00:39:00.500 | captures nicely the spirit of the Turing test.

00:39:03.900 | I think it's actually quite an amazing challenge

00:39:06.100 | and competition that uses voice conversation in the wild,

00:39:10.260 | so with real people, and they can use a,

00:39:13.420 | I think it's called a social bot skill

00:39:15.680 | on their Alexa devices.

00:39:17.340 | And I don't wanna wake up my own Alexa devices,

00:39:20.300 | but basically say her name and say, let's chat.

00:39:23.460 | And that brings up one of the bots involved in the challenge

00:39:26.540 | and then you can have a conversation.

00:39:28.420 | And then the bar that's to be reached is for you

00:39:32.700 | to have a 20 minute or longer conversation with the bot

00:39:36.860 | and for two thirds or more of the interactions

00:39:40.000 | to be that long.

00:39:41.280 | So the basic metric of successful interaction

00:39:45.120 | is the duration of the interaction.

00:39:47.420 | And as of today, we're still really,

00:39:49.580 | really far away from that.

00:39:51.180 | So why is this a good metric?

00:39:52.580 | And I do think it's a really powerful metric.

00:39:55.660 | As opposed to us judging the quality of conversation

00:39:57.940 | in retrospect, we speak with our actions.

00:40:01.340 | So a deep, meaningful conversation

00:40:04.100 | is one we don't want to leave.

00:40:05.940 | When we have other things contending for our time,

00:40:09.980 | when we make the choice to stay in that conversation,

00:40:12.900 | that's as powerful a signal as any

00:40:15.500 | to show that that conversation has content,

00:40:19.220 | has meaning, is enjoyable.

00:40:21.900 | I think that is what passing the Turing Test

00:40:25.060 | in its original spirit actually is.

00:40:28.540 | And I should mention that as of today,

00:40:30.700 | no team has even come close to passing the Turing Test

00:40:35.060 | as it is constructed by the Alexa Prize.

00:40:37.820 | There's several things that are really surprising

00:40:39.460 | about this challenge.

00:40:40.460 | One is that it's not a lot more popular

00:40:43.300 | and two, that Amazon chose to limit it to students only.

00:40:48.300 | I mean, almost making it an educational exercise

00:40:51.900 | as opposed to a moonshot challenge

00:40:54.620 | for our entire generation of researchers.

00:40:57.980 | I mentioned it before, but I'll say it again here

00:41:00.140 | that it's surprising to me that the biggest research lab

00:41:03.740 | and industry in academia have not focused on this problem,

00:41:08.140 | have not found the magic within the Turing Test problem

00:41:12.060 | and the Alexa Prize as it formulates,

00:41:15.180 | I believe, the spirit of the Turing Test quite well.

00:41:19.140 | A very different kind of test is the Hutter Prize

00:41:22.460 | started by Marcus Hutter,

00:41:24.220 | which I think is really fascinating

00:41:26.980 | on both the philosophical and mathematical angle.

00:41:29.460 | Underlying it is the idea that compression

00:41:33.460 | is strongly correlated with intelligence.

00:41:38.140 | Put another way, the ability to compress knowledge

00:41:42.020 | well requires intelligence.

00:41:44.380 | And the better you compress that knowledge,

00:41:46.380 | the more intelligent you are.

00:41:47.900 | I think this is a really compelling notion

00:41:51.180 | because then we can make explicit,

00:41:53.620 | we can quantify how intelligent you are

00:41:56.580 | by how well you're able to compress knowledge.

00:41:59.660 | As the prize webpage puts it,

00:42:02.740 | being able to compress well is closely related

00:42:05.300 | to acting intelligently,

00:42:07.060 | thus reducing the slippery concept of intelligence

00:42:10.020 | to hard file size numbers.

00:42:12.420 | So the task is to take one gigabyte of Wikipedia data

00:42:17.420 | and compress it down as much as possible.

00:42:20.860 | The current best is a 8.58 compression factor.

00:42:25.860 | So down from one gigabyte to 117 megabytes.

00:42:30.220 | And the award for each 1% improvement, you win 5,000 euros.

00:42:35.180 | I find this competition just amazing

00:42:37.780 | and fascinating on many levels.

00:42:40.100 | I think it's a really good formulation

00:42:43.020 | of an intelligence challenge, but it's not a test.

00:42:47.980 | That's one of its kind of limitations,

00:42:50.460 | at least in the poetic sense,

00:42:52.260 | that it doesn't set a bar

00:42:53.860 | beyond which we're really damn impressed.

00:42:56.700 | Meaning it's harder to set a bar,

00:42:59.300 | like the one formulated by the Turing test,

00:43:01.340 | beyond which we feel it would be human level intelligence.

00:43:04.620 | Now the bar that's set by the Turing,

00:43:06.260 | Alan Turing and others,

00:43:07.820 | Lobna Prize, Alexa Prize, are also arbitrary,

00:43:11.780 | but it feels like we're able to intuit a good bar

00:43:14.180 | in that context better than being able to intuit

00:43:17.700 | the kind of bar we need to set for the compression challenge.

00:43:21.260 | Another fascinating challenge

00:43:22.660 | is the abstraction and reasoning challenge

00:43:24.740 | put forth by Francois Chollet just a few months ago.

00:43:28.500 | So this is very exciting.

00:43:29.460 | It's actually ongoing as a competition on Kegel,

00:43:31.980 | I think with a deadline in May.

00:43:35.020 | It's a really, really interesting idea.

00:43:37.020 | I haven't internalized it fully yet,

00:43:39.380 | and perhaps we'll do a separate video

00:43:41.860 | on just this paper alone,

00:43:43.780 | and I'll talk to Francois, I'm sure, on the podcast

00:43:46.140 | and other contacts in the future about it.

00:43:48.140 | I think there's a lot of brilliant ideas here

00:43:50.820 | that I still have to kind of digest a little bit,

00:43:53.500 | but let me describe the high-level ideas

00:43:57.860 | behind this benchmark.

00:43:59.420 | So first of all, the name is abstraction reasoning corpus

00:44:03.140 | or challenge arc.

00:44:05.140 | The domain is in a grid world of patterns,

00:44:08.860 | not limited in size,

00:44:10.020 | but the grid world is filled with cells

00:44:13.780 | that can be of different colors.

00:44:15.700 | And the spirit of the set of tests that Francois proposes

00:44:19.380 | is to stay close to IQ tests,

00:44:22.060 | so psychometric intelligence tests

00:44:23.740 | that we use to measure the intelligence of human beings.

00:44:29.100 | Now, the Turing test is kind of at a higher level

00:44:33.580 | of natural language.

00:44:35.580 | In this construction of arc,

00:44:38.300 | it goes as close as possible

00:44:40.940 | to the very basic elements of reasoning,

00:44:44.980 | just like in the IQ test of patterns.

00:44:47.700 | It gets to the very core,

00:44:49.920 | such that we can then make explicit the priors,

00:44:55.620 | the concepts that we bring to the table of those tests.

00:44:58.780 | And if we can make them explicit,

00:45:00.700 | it reduces the test as close as possible

00:45:04.420 | to the measure of the system's ability to reason.

00:45:07.920 | Now, the concepts that are brought to this grid world,

00:45:11.180 | here's just a couple of example of priors

00:45:13.100 | that Francois shows in his paper.

00:45:15.780 | I recommend highly,

00:45:16.820 | called "On the Measure of Intelligence."

00:45:19.260 | Here, prior concept is not referring to a previous concept.

00:45:22.420 | It's referring to a prior set of knowledge

00:45:25.700 | that you bring to the table.

00:45:27.420 | So this first row of illustrations of the two grid worlds

00:45:31.220 | illustrates the idea of object persistence with noise.

00:45:34.420 | So we're able to understand that large objects,

00:45:39.420 | when there is some visual noise occluding

00:45:45.180 | our ability to see them,

00:45:47.580 | that they still exist in the world.

00:45:49.480 | And if that noise changes, the object is still unchanged.

00:45:53.580 | So that idea of object persistence in the world

00:45:57.900 | is a prior that we bring to the table

00:46:01.140 | of understanding this grid world.

00:46:02.700 | Another prior is on the left at the bottom

00:46:06.820 | is objects are defined by spatial contiguity.

00:46:11.820 | So objects in this grid world,

00:46:16.220 | when the cells are of the same color

00:46:18.540 | and they're touching each other,

00:46:20.120 | they're probably part of the same object.

00:46:22.180 | And if there's black cells that separate

00:46:25.420 | those groupings of cells,

00:46:26.980 | that means there's multiple objects.

00:46:28.660 | So this kind of spatial contiguity of colored cells

00:46:33.660 | define the entity of the object.

00:46:37.860 | And on the right at the bottom is the color-based contiguity,

00:46:42.780 | which means that even if the cells

00:46:44.760 | of different colors are touching,

00:46:46.780 | if their colors are different,

00:46:48.300 | that means it likely belongs to a different object.

00:46:51.280 | That's a basic prior.

00:46:52.820 | And there's a few others, by the way,

00:46:54.980 | just beautiful pictures in that paper

00:46:58.660 | that make you really think

00:47:00.660 | about the core elements of intelligence.

00:47:03.340 | I love that paper, worth looking at.

00:47:06.380 | There's a lot of interesting insights in there.

00:47:09.280 | Just to give you some examples of what the actual task

00:47:13.700 | for the machine in this test looks like,

00:47:15.940 | it's similar to the kind of task you would see in an IQ test.

00:47:18.700 | So here there's three pairings,

00:47:21.680 | and the task is for the fourth pairing of images

00:47:25.640 | to generate the grid world that fits the other three,

00:47:30.640 | that fits the generating pattern of the other three.

00:47:34.760 | So in this case, figure four from the paper,

00:47:37.960 | a task where the implicit goal

00:47:39.400 | is to complete a symmetrical pattern.

00:47:41.940 | The nature of the task is specified

00:47:43.600 | by the three input-output examples.

00:47:45.820 | The test taker must generate the output grid

00:47:47.920 | corresponding to the input grid

00:47:49.820 | of the test input bottom right.

00:47:52.200 | So here, what you're tasked with understanding

00:47:55.280 | in the first three pairings

00:47:57.540 | is that the input has a perfect global symmetry to it.

00:48:02.540 | And also that there's parts of the image that are missing

00:48:10.000 | that can be filled in order to complete

00:48:12.540 | that perfect symmetry.

00:48:14.360 | Now that's relying on another prior,

00:48:16.440 | another basic concept of symmetry,

00:48:19.080 | which I think underlies a lot of our understanding

00:48:22.160 | of visual patterns.

00:48:23.780 | Again, so the intelligence system

00:48:25.920 | has to have a good representation of symmetry

00:48:30.920 | in various contexts.

00:48:33.140 | This is fascinating and beautiful, beautiful images.

00:48:36.760 | Okay, another example, figure 10 from the paper,

00:48:39.840 | a task where the implicit goal is to count unique objects

00:48:43.240 | and select the objects that appears the most times.

00:48:46.320 | The actual task has more demonstration pairs

00:48:48.480 | than these three.

00:48:49.740 | So figure 10 here from the paper,

00:48:52.100 | a task where the implicit goal is to count unique objects

00:48:55.080 | and select the objects that appear the most times.

00:48:58.400 | So again, there's three pairings.

00:49:01.320 | You see in the first one, there's three blue objects.

00:49:04.520 | In the second one, there's four yellow objects.

00:49:06.880 | In the third one, there's three red objects.

00:49:09.280 | So you have to figure that out.

00:49:10.560 | And then the output is the grid cells capturing that object

00:49:15.440 | that appears the most times.

00:49:17.120 | And so apply that kind of reasoning

00:49:20.380 | to complete the output of the fourth pairing.

00:49:24.120 | One of the challenges for this kind of test

00:49:26.200 | is it's difficult to generate.

00:49:28.060 | But just like I said,

00:49:29.120 | I think there's a lot of really interesting technical

00:49:31.640 | and philosophical ideas here that are worth exploring.

00:49:35.680 | So let's quickly talk through a few takeaways.

00:49:37.980 | So zooming out, is the Turing test

00:49:42.160 | a good measure of intelligence

00:49:43.380 | and can it serve as an answer to the big ambiguous

00:49:48.100 | but profound philosophical question of can machines think?

00:49:52.060 | So first some notes

00:49:54.020 | on the underlying challenges of the Turing test.

00:49:57.280 | Let's talk about intelligence.

00:50:00.660 | So if we compare human behavior and intelligent behavior,

00:50:05.360 | it's clear that the Turing test hopes to capture

00:50:09.620 | the intelligent parts of human behavior.

00:50:13.180 | But if we're trying to really capture

00:50:16.140 | human level intelligence,

00:50:18.700 | it's also possible that we wanna capture

00:50:21.820 | the unintelligent, the irrational parts of human behavior.

00:50:25.500 | So it's an open question whether natural conversation

00:50:28.460 | is a test of intelligence or humanness.

00:50:33.460 | Because if it's a test of intelligence,

00:50:36.380 | it's focusing only on kind of rational systematic thinking.

00:50:40.740 | If it's a test of humanness,

00:50:42.620 | then you have to capture the full range of emotion,

00:50:45.720 | the mess, the irrationality, the laziness, the boredom,

00:50:50.180 | all the things that make us human

00:50:52.020 | and all the things that then project themselves

00:50:55.220 | into the way we carry on through conversation.

00:50:57.740 | As I mentioned in the previous objectives,

00:50:59.700 | the Turing test really focuses on the external appearances,

00:51:03.900 | not the internal processes.

00:51:06.100 | So like I said, from an engineering perspective,

00:51:09.340 | I think it's very difficult to create a test

00:51:12.460 | for internal processes for some of these concepts

00:51:15.580 | that we have a very poor understanding of,

00:51:17.820 | like intelligence, like consciousness.

00:51:20.380 | I think the best we can do right now

00:51:23.300 | in terms of quantifying and having a measure of something,

00:51:27.180 | we have to look at the external performance of the system

00:51:31.620 | as opposed to some properties of the internal processes.

00:51:34.940 | Another challenge for the Turing test,

00:51:37.980 | as Scott Aronson's conversation with Eugene Guzman indicates

00:51:41.940 | is that the skill of the interrogator

00:51:45.420 | is really important here.

00:51:47.100 | That's both on just the conversational skill

00:51:51.080 | of how much you can stretch

00:51:52.460 | and challenge the conversation with a bot,

00:51:54.860 | and two, on the human side of it,

00:51:58.100 | the ability of the interrogator to identify the humanness

00:52:01.620 | of both the human and the machine.

00:52:03.500 | So the ability to have a conversation

00:52:07.180 | that challenges the bot,

00:52:09.540 | and the ability to make the actual identification

00:52:12.300 | of human or machine.

00:52:13.500 | Those are both skills that are essential

00:52:17.740 | to the Turing test.

00:52:19.980 | Also, to me, it's really interesting,

00:52:21.660 | the anthropomorphization of human

00:52:25.500 | to inanimate object interaction,

00:52:29.060 | I think is really fascinating.

00:52:31.060 | And it's an open question

00:52:32.660 | whether in some construction of the Turing test,

00:52:35.220 | whether anthropomorphism is leveraged

00:52:37.740 | to convince the human,

00:52:39.180 | whether that's cheating the Turing test,

00:52:41.220 | or in fact, that's an essential element

00:52:44.220 | to convincing us humans that something is intelligent.

00:52:47.700 | Perhaps as a starting point,

00:52:49.660 | we have to anthropomorphize something

00:52:51.460 | before we allow it to be intelligent

00:52:54.980 | in our subjective judgment of its intelligence.

00:52:59.940 | And finally, another limitation of the Turing test

00:53:02.500 | that could be narrowly stated as

00:53:04.860 | why do we expect a bot to talk?

00:53:07.420 | What if it doesn't feel like talking?

00:53:11.020 | Does it still fail?

00:53:12.340 | I think a more general way to phrase that is

00:53:15.300 | why do we judge the performance of a system

00:53:18.940 | on such a narrow window of time?

00:53:21.740 | I think, as I mentioned before,

00:53:23.540 | there could be something interesting

00:53:25.460 | on expanding the window of time

00:53:28.820 | over which we analyze the intelligence of the system,

00:53:34.060 | looking not just at the average performance

00:53:37.140 | but the growth of its performance

00:53:38.740 | as it interacts with you as the individual.

00:53:41.700 | I think one key aspect of intelligence

00:53:44.540 | is a social aspect and a social connection,

00:53:49.540 | I think in part may require getting to know the person.

00:53:54.700 | And there's something to rethink in the Turing test

00:53:57.220 | that relies on us building a relationship

00:53:59.700 | with the person as part of the test.

00:54:02.140 | So you can think of it as kind of the ex machina Turing test

00:54:07.060 | where they spend a series of conversations together,

00:54:10.980 | several days together, all those kinds of things.

00:54:14.060 | That feels like an interesting extension

00:54:17.220 | of the Turing test which could reveal

00:54:19.700 | the significant limitation of the current construction

00:54:22.540 | of the Turing test which is a limited window of time,

00:54:25.020 | one time at the end interrogate or judgment of it

00:54:28.780 | whether it's human or machine.

00:54:31.700 | Now my view overall on the Turing test is that yes,

00:54:35.700 | something like the Turing test as originally constructed

00:54:39.780 | so the natural language conversation

00:54:44.100 | is close to the ultimate test of intelligence.

00:54:47.700 | And moreover, this is where I disagree.

00:54:49.900 | I think I disagree with Francois Chollet

00:54:51.900 | and other world-class researchers in the area,

00:54:55.580 | Stuart Russell and so on, that I think the Turing test

00:54:58.980 | is not a distraction for us to think about.

00:55:01.700 | It doesn't pull us away from actually making progress

00:55:05.700 | in the field.

00:55:06.780 | I think it keeps us honest.

00:55:08.540 | I think truly analyzing where we stand

00:55:12.140 | in natural language conversation will help us understand

00:55:15.700 | how far away we are.

00:55:17.040 | And more than that, I think there should be active research

00:55:21.340 | on this field.

00:55:22.180 | I think the Lubna Prize type of formulations,

00:55:24.220 | the Alexa Prize formulations should be more popular

00:55:27.260 | than they are and I think researchers should take them

00:55:30.220 | very seriously.

00:55:31.660 | Now that doesn't mean that the work of the ARC benchmark

00:55:36.500 | with the IQ test type of intelligent test

00:55:40.420 | is not also going to be fruitful, potentially very fruitful.

00:55:45.220 | But I think ultimately the real test

00:55:48.760 | of human-level intelligence will occur in something

00:55:52.900 | like the construction of the Turing test

00:55:55.660 | with natural language open domain conversation

00:55:58.580 | that results in deep, meaningful connection

00:56:02.100 | between human and machine.

00:56:04.460 | Zooming out a little bit, I think in general,

00:56:08.100 | I think AI researchers don't like and try to avoid

00:56:12.740 | the messiness of human beings as is captured

00:56:16.380 | by the human-robot interaction field and set of problems.

00:56:20.260 | I think more than just embracing the Turing test,

00:56:23.980 | I think we should embrace the messiness of the human being

00:56:27.780 | in all the different domains of computer vision,

00:56:31.700 | of natural language, of robotics, of autonomous vehicles.

00:56:36.700 | I've been a long-time advocate that semi-autonomous vehicles

00:56:40.480 | are here to stay for a long time.

00:56:41.980 | We're going to have to figure out

00:56:43.420 | the human-robot interaction problem

00:56:46.500 | and for that we have to embrace perceiving everything

00:56:49.620 | about the human inside the car,

00:56:51.540 | perceiving everything about the humans outside the car.

00:56:54.240 | As I mentioned, this presentation of the paper

00:56:57.880 | is actually part of our paper reading club

00:57:02.100 | focused on artificial intelligence

00:57:04.140 | where we discuss a couple of times a week

00:57:07.100 | on a Discord server called LexPlus AI Podcast

00:57:10.180 | that you're welcome to join.

00:57:12.220 | We have an amazing community of brilliant people there

00:57:14.600 | that discuss all kinds of topics

00:57:16.900 | in artificial intelligence and beyond.

00:57:19.200 | This particular illustration that I just love

00:57:22.300 | is from Will Scobie who is an illustrator

00:57:24.580 | from United Kingdom who is part of this Discord community

00:57:29.160 | so he contributed it.

00:57:30.940 | And in general, aside from the amazing conversations,

00:57:34.640 | I encourage and hope to see other members of the community

00:57:38.260 | contribute art, code, visualizations, slides,

00:57:43.260 | ideas for these kinds of videos.

00:57:45.820 | I'm really excited by the kind of conversations I've seen.

00:57:48.980 | If you're watching this video and wanna join in,

00:57:51.420 | click on the Discord link in the description on the slide.

00:57:54.440 | Join the conversation, new paper every week, it's fun.

00:57:58.920 | Just to give you a little sense of the ideas

00:58:02.020 | behind this AI paper reading club,

00:58:04.340 | like what the goals are.

00:58:05.860 | So what is it?

00:58:07.380 | I think the goal is to take a seminal paper in the field

00:58:11.360 | that doesn't just focus in on the specific

00:58:13.960 | sort of paragraph to paragraph, section to section analysis

00:58:17.700 | what the paper is saying, but actually use the paper

00:58:20.400 | to discuss the history, the big picture development

00:58:23.860 | of the field within the context of that paper.

00:58:26.860 | Now that could be philosophical papers

00:58:28.580 | like this Turing test paper,

00:58:30.220 | or it could be very specific papers in the field.

00:58:33.420 | Again, physics, mathematics, computer science,

00:58:35.900 | and probably quite a bit of deep learning.

00:58:38.340 | So the hope is to prioritize beautiful, powerful,

00:58:43.340 | impactful insights as opposed to full coverage

00:58:46.780 | of all the contents of the paper.

00:58:49.100 | And the actual meetings on Discord,

00:58:52.860 | hopefully are less one person presenting

00:58:55.900 | and more discussion.

00:58:57.360 | There's a lot of brilliant people, they're civil,

00:59:00.140 | so you can have 300, 400 people on voice chat,

00:59:03.820 | which is a really intimate setting.

00:59:05.540 | And yet people aren't interrupting each other,

00:59:08.220 | it's not chaos, it's quite an amazing community.

00:59:11.060 | The other goal I'd love to see

00:59:13.740 | is even if we cover technical papers,

00:59:16.020 | the goal is for it to be accessible to everyone.

00:59:19.380 | So both high school students,

00:59:21.820 | people outside of all of these fields in general,

00:59:24.780 | but also I'd love to make it useful

00:59:27.980 | to experts in the field, expert researchers.

00:59:31.300 | So avoid using technical jargon,

00:59:34.340 | but still try to discover insights that are new,

00:59:38.380 | that are interesting, that are important

00:59:40.180 | for the researchers in the field.

00:59:42.060 | That's what I would love to achieve here

00:59:44.340 | with this paper reading club.

00:59:45.980 | If you're interested, join in, listen in,

00:59:48.940 | or contribute to the conversation,

00:59:50.420 | suggest papers, suggest content, visualizations, code,

00:59:55.100 | all is welcome, it's an amazing community.

00:59:57.540 | Thanks for watching this excessively long presentation.

01:00:00.620 | If you have suggestions, let me know.

01:00:03.220 | Otherwise, hope to see you next time.

01:00:05.220 | (upbeat music)

01:00:07.820 | (upbeat music)

01:00:10.420 | (upbeat music)

01:00:13.020 | (upbeat music)

01:00:15.620 | (upbeat music)

01:00:18.220 | (upbeat music)

01:00:20.820 | [BLANK_AUDIO]

Turing Test: Can Machines Think?

Chapters