
Turing Test: Can Machines Think?


Chapters

0:00 Introduction
1:02 Paper opening lines
3:11 Paper overview
7:39 Loebner Prize
11:36 Eugene Goostman
13:43 Google's Meena
17:17 Objections to the Turing Test
17:29 Objection 1: Religious
18:07 Objection 2: "Heads in the Sand"
19:18 Objection 3: Gödel's Incompleteness Theorem
19:51 Objection 4: Consciousness
20:54 Objection 5: Machines will never do X
21:47 Objection 6: Ada Lovelace
23:22 Objection 7: Brain is analog
23:49 Objection 8: Determinism
24:55 Objection 9: Mind-reading
26:34 Chinese Room thought experiment
27:21 Coffee break
31:42 Turing Test extensions and alternatives
36:54 Winograd Schema Challenge
38:55 Alexa Prize
41:17 Hutter Prize
43:18 Francois Chollet's Abstraction and Reasoning Challenge (ARC)
49:32 Takeaways
56:51 Discord community
57:56 AI Paper Reading Club

Whisper Transcript

00:00:00.000 | In this video, I propose to ask the question
00:00:02.280 | that was asked by Alan Turing almost 70 years ago
00:00:05.240 | in his paper, Computing Machinery and Intelligence.
00:00:09.060 | Can machines think?
00:00:10.780 | This is the first paper in a paper reading club
00:00:16.280 | that we started focused on artificial intelligence,
00:00:18.600 | but also including mathematics, physics,
00:00:21.080 | computer science, neuroscience,
00:00:22.360 | all of the scientific and engineering disciplines.
00:00:25.020 | On the surface, this is a philosophical paper,
00:00:27.440 | but really it's one of the most impactful, important
00:00:30.200 | first steps towards actually engineering
00:00:32.720 | intelligent systems by providing a test,
00:00:35.680 | a benchmark that we call today the Turing test
00:00:39.480 | of how we can actually know quantifiably
00:00:43.040 | that a system has become intelligent.
00:00:46.560 | So I'd like to talk about an overview of ideas
00:00:50.200 | in the paper, provide some of the objections
00:00:52.740 | inside the paper and external to the paper,
00:00:55.520 | consider some alternatives to the test proposed
00:00:58.320 | within the paper, and then finish with some takeaways.
00:01:03.100 | Like I said, the title of the paper
00:01:04.720 | was Computing Machinery and Intelligence,
00:01:07.080 | published almost 70 years ago in 1950,
00:01:10.240 | author Alan Turing.
00:01:12.400 | And to me, now we can argue about this,
00:01:15.120 | on the slide I say it's one of the most impactful papers.
00:01:17.720 | To me, it probably is the most impactful paper
00:01:20.620 | in the history of artificial intelligence
00:01:22.680 | while only being a philosophy paper.
00:01:26.320 | I think the number of researchers
00:01:29.160 | from inside computer science and from outside
00:01:31.680 | that it has inspired, that it has made dream
00:01:34.400 | at a collective intelligence level of our species
00:01:38.440 | that this is possible, I think is immeasurable.
00:01:42.320 | For all the major engineering breakthroughs
00:01:44.960 | and computer science breakthroughs and papers
00:01:47.920 | stretching all the way back to the 30s and 40s
00:01:50.600 | with even the work by Alan Turing with the Turing machine,
00:01:55.040 | some of the mathematical foundations of computer science
00:01:59.220 | to today with deep learning, a sequence of papers
00:02:02.680 | from the very practical AlexNet paper
00:02:04.680 | to the backpropagation papers.
00:02:06.000 | So all of these papers that underlie
00:02:09.400 | the actual successes of the field,
00:02:11.720 | I think the seed was planted.
00:02:15.120 | The dream was born with this paper.
00:02:19.800 | And it happens to have some of my favorite opening lines
00:02:23.520 | of any paper I've ever read.
00:02:25.600 | It goes, I propose to consider the question,
00:02:29.680 | can machines think?
00:02:31.720 | This should begin with the definitions
00:02:33.280 | of the meaning of the terms machine and think.
00:02:37.000 | The definition might be framed so as to reflect
00:02:39.880 | so far as possible the normal use of the words,
00:02:42.920 | but this attitude is dangerous.
00:02:44.960 | If the meaning of the words machine and think
00:02:47.360 | are to be found in examining
00:02:49.200 | how they're commonly used,
00:02:50.720 | it is difficult to escape the conclusion
00:02:52.520 | that the meaning and the answer to the question,
00:02:54.640 | can machines think is to be sought
00:02:57.240 | in a statistical survey such as a Gallup poll.
00:03:00.240 | But this is absurd.
00:03:02.040 | Instead of attempting such a definition,
00:03:03.960 | I shall replace the question by another,
00:03:06.320 | which is closely related to it
00:03:08.200 | and is expressed in relatively unambiguous terms.
00:03:12.040 | And he goes on to define the imitation game,
00:03:14.760 | the construction that we today call the Turing test,
00:03:18.160 | which goes like this.
00:03:19.640 | There's a human interrogator on one side of the wall,
00:03:23.580 | and there's two entities,
00:03:24.960 | one a machine, one a human on the other side.
00:03:29.320 | And the human interrogator communicates
00:03:32.080 | with the two entities on the other side of the wall
00:03:34.680 | by written word, by passing those back and forth.
00:03:38.280 | And after some time of this conversation,
00:03:40.280 | the human interrogator is tasked with making a decision,
00:03:43.280 | which of the other two entities is a human
00:03:46.040 | and which is a machine.
00:03:47.560 | I think this is a powerful leap of engineering,
00:03:50.480 | which is to take an ambiguous but profound question
00:03:54.040 | like can machines think and convert it into a concrete test
00:03:59.040 | that can serve as a benchmark of intelligence.
00:04:01.960 | But there's echoes in this question
00:04:04.720 | to some of the other profound questions that we often ask.
00:04:07.520 | So not only can machines think,
00:04:09.000 | but can machines be conscious?
00:04:11.120 | Can machines fall in love?
00:04:12.780 | Can machines create art, music, poetry?
00:04:17.080 | Can machines enjoy a delicious meal,
00:04:19.840 | a piece of chocolate cake?
00:04:22.000 | I think these are really, really important questions,
00:04:24.800 | but very difficult to ask
00:04:26.920 | when we're trying to create a non-human system
00:04:29.840 | that tries to achieve human level capabilities.
00:04:33.120 | So that's where Turing formulates this imitation game.
00:04:36.680 | And his prediction was that by the year 2000,
00:04:40.580 | or 50 years after the paper,
00:04:44.040 | a machine with 100 megabytes of storage
00:04:47.400 | will fool 30% of humans
00:04:49.240 | in a five-minute test of conversation.
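Turing's paper states this prediction in terms of a storage capacity of about 10^9 bits, which is where the roughly 100 megabytes quoted here comes from. A quick sanity check of that conversion:

```python
# Turing predicted that machines with about 10^9 bits of storage
# would play the imitation game well by the year 2000. Converting
# bits to (decimal) megabytes shows why this is rounded to ~100 MB.
bits = 10**9
megabytes = bits / 8 / 1e6  # 8 bits per byte, 10^6 bytes per MB
print(f"{megabytes:.0f} MB")  # -> 125 MB
```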
00:04:52.020 | Another broader societal prediction he made,
00:04:56.480 | which I think is also interesting,
00:04:58.160 | is that people will no longer consider a phrase
00:05:00.800 | like thinking machine contradictory.
00:05:04.280 | So basically artificial intelligence at a human level
00:05:08.400 | becomes so commonplace that we would just take it
00:05:10.440 | for granted.
00:05:11.700 | And the other part that he goes at length
00:05:14.420 | towards the end of the paper to describe
00:05:16.680 | is his belief that learning machines,
00:05:18.820 | or machine learning,
00:05:20.700 | will be a critical component of this success.
00:05:23.740 | I think it's also useful to break apart
00:05:25.940 | two implied claims within the paper,
00:05:28.860 | open claims, open questions.
00:05:31.940 | One is that the imitation game as Turing proposes
00:05:36.420 | is a good test of intelligence.
00:05:39.300 | And the second is that machines can actually pass this test.
00:05:44.300 | So when you say can machines think,
00:05:46.540 | you're both proposing an engineering benchmark
00:05:50.540 | for the word think,
00:05:52.480 | and raising the question,
00:05:54.400 | can machines pass this benchmark?
00:05:57.860 | One of the perhaps tragic,
00:05:59.580 | but also exciting aspects of this whole area of work
00:06:03.580 | is that we still have a lot of work to do.
00:06:06.140 | So throughout this presentation,
00:06:08.300 | I will not only describe some of the ideas in the paper
00:06:10.780 | and outside of it in the years since,
00:06:12.820 | but also some of the open questions that remain,
00:06:16.480 | both at the philosophical, the psychological,
00:06:18.620 | and the technical levels.
00:06:20.020 | So here the open question stands,
00:06:21.800 | is it even possible to create a test of intelligence
00:06:25.380 | for artificial systems that will be convincing to us?
00:06:29.340 | Or will we always raise the bar?
00:06:31.400 | A corollary of that question is,
00:06:34.700 | looking at the prediction that Turing made
00:06:36.940 | that people will no longer find
00:06:38.620 | the phrase thinking machines contradictory,
00:06:41.440 | why do we still find that phrase contradictory?
00:06:45.620 | Why do we still think that computers
00:06:47.940 | are not at all intelligent?
00:06:50.100 | For many people, the game of chess
00:06:53.020 | was seen as the highest level of intelligence
00:06:55.100 | in these early days.
00:06:57.120 | In fact, we assign a lot of intelligence
00:07:00.540 | to Garry Kasparov for being one of the greatest,
00:07:03.300 | if not the greatest chess players of all time,
00:07:06.260 | as a human.
00:07:07.560 | Why do we not assign at least an inkling of that
00:07:10.900 | to IBM Deep Blue when it beat Garry Kasparov?
00:07:14.320 | Now, of course, you might start saying
00:07:16.460 | that it's a brute force algorithm,
00:07:18.020 | or in the case of AlphaGo and AlphaZero,
00:07:20.900 | you know how the learning mechanisms
00:07:22.940 | behind those algorithms work
00:07:24.340 | when they mastered the game of Go and the game of chess.
00:07:27.140 | And we'll get to some of those objections,
00:07:30.200 | but there's something deeply psychological
00:07:32.460 | within those objections
00:07:34.260 | that almost fear an artificial intelligence
00:07:37.780 | that passes the test.
00:07:39.740 | So the Turing test is very interesting
00:07:41.620 | as a thought experiment, as a philosophical construct,
00:07:45.100 | but it's also interesting as a real engineering test.
00:07:48.120 | And one of the implementations of it
00:07:50.020 | has been called the Loebner Prize,
00:07:52.440 | which has been running since 1991 to today.
00:07:55.660 | And the awards behind it, the award structure,
00:07:58.700 | is $25,000 for a system
00:08:03.740 | that passes the test using text alone,
00:08:06.180 | and $100,000 for one that uses other modalities
00:08:09.820 | like visual and auditory input.
00:08:12.420 | The rules of the competition have changed through the years,
00:08:14.460 | but currently they are as follows.
00:08:16.340 | It's a 25-minute conversation,
00:08:19.140 | and in order to win, to pass the test,
00:08:21.540 | you have to fool 50% of the judges
00:08:24.180 | with which the system communicates.
00:08:26.780 | Mitsuku and Rose, from Steve Worswick and Bruce Wilcox,
00:08:32.340 | have been dominating the past 10 years,
00:08:34.540 | winning all but one of the years.
00:08:37.080 | Some details, Mitsuku and Rose are both
00:08:41.220 | mostly scripted, rule-based chatbots,
00:08:43.740 | so they're not end-to-end learning systems.
00:08:48.320 | I believe there is a little bit of machine learning,
00:08:50.660 | but as I understand, at the core, they're mostly scripted.
00:08:54.540 | And on the next slide, we'll go through a transcript
00:08:56.620 | of a conversation with Mitsuku to give you a sense
00:08:59.100 | of what those conversations are like.
00:09:01.020 | But I'd like to mention sort of a tragic note, really,
00:09:04.820 | that as far as I understand, the prize is no longer funded.
00:09:08.660 | But money aside, it shows that there's not a broad interest
00:09:13.380 | in this construction, both inside
00:09:15.940 | and outside the research community.
00:09:18.300 | It's really surprising to me that Google DeepMind,
00:09:21.600 | Facebook AI, major groups, major individual researchers
00:09:25.940 | in the field of AI have not taken on this challenge.
00:09:29.280 | I hope to actually be a small part of changing that
00:09:32.100 | a little bit, because I think there is a lot of value
00:09:35.060 | in considering the Turing test.
00:09:36.460 | I don't think it's a distraction.
00:09:38.020 | And I'll return to that point later on in this presentation.
00:09:41.660 | So you can pause and maybe read parts of this transcript.
00:09:45.580 | I'll highlight on the right, at the top of the right column,
00:09:48.660 | the human says, "You're a robot."
00:09:50.940 | Mitsuku answers, "How can a person be a robot?"
00:09:54.660 | Human, "Many people live lives akin to robots."
00:09:58.380 | Mitsuku, "How many, more than a million?"
00:10:01.140 | Human, "Yes."
00:10:03.180 | Mitsuku, "That seems like rather a lot."
00:10:06.020 | Human, "Do you think Southampton will beat Swansea?"
00:10:10.260 | Mitsuku, "I think so."
00:10:12.180 | Human, "Why?"
00:10:13.500 | Mitsuku, "No reason, it's just what I think."
00:10:16.140 | Human, "You must have a reason."
00:10:18.180 | Mitsuku, "Nope, no real reason."
00:10:20.780 | Human, "Which team do you support?"
00:10:23.340 | Mitsuku, "I support Leeds United."
00:10:26.340 | Human, "Leeds!
00:10:29.500 | "How's the season going?"
00:10:31.140 | Mitsuku, "Yes, it is a city in the north of England.
00:10:34.820 | "About half a million people live there.
00:10:37.980 | "It is the same as ever."
00:10:39.520 | So you see in this transcript,
00:10:43.900 | it's just some little gap of inhumanity
00:10:46.780 | that becomes apparent.
00:10:48.380 | There's a lingering on and a repetition of points.
00:10:52.940 | There's certain gaps in understanding
00:10:56.260 | and ability to follow tangents, all those kinds of things.
00:10:59.180 | But it's still not clear to me as an open question
00:11:04.460 | how to make explicit where exactly
00:11:09.100 | the point of the failure of the test is.
00:11:11.780 | I believe that hasn't actually been really researched
00:11:15.220 | that well in these constructions.
00:11:17.540 | As opposed to decision making at the very end
00:11:20.660 | of a conversation, is this human or not,
00:11:22.900 | rather marking parts of a conversation
00:11:25.220 | as more or less human, like suspicious parts
00:11:28.140 | that make you wonder this is not human.
00:11:30.700 | I think that'll be really interesting to see
00:11:32.540 | if it's possible to make explicit what aspects
00:11:34.940 | of the conversation are the failure points.
00:11:37.460 | One of the times it was claimed
00:11:39.280 | that the Turing test was passed,
00:11:41.140 | I think most famously, was in 2014
00:11:43.700 | at an exhibition event that marked the 60th anniversary
00:11:46.700 | of Turing's death, when Eugene Goostman fooled 33%
00:11:51.700 | of the event judges.
00:11:53.900 | And the method he used was to portray
00:11:55.940 | a 13-year-old Ukrainian boy that had a bunch
00:11:58.740 | of different personality quirks
00:12:00.100 | and obviously the language barrier,
00:12:02.220 | and had some humor and a constant sort of drive
00:12:05.180 | towards misdirecting the conversation back
00:12:08.620 | to the places where it was comfortable.
00:12:11.420 | So there's some criticism that you can make
00:12:13.000 | of this event due to some sort of smoke and mirrors,
00:12:16.820 | kind of the PR and marketing side of things
00:12:18.820 | that I think is always there with these kinds
00:12:23.140 | of exhibition events.
00:12:24.900 | But setting that aside, I think the interesting lesson here
00:12:29.260 | is that the parameters, the rules of the actual engineering
00:12:33.260 | of the Turing test can determine whether it contains
00:12:38.060 | sort of the spirit of the Turing test,
00:12:40.620 | which is the test that captures the ability
00:12:44.260 | of an agent to have a deep, meaningful conversation.
00:12:48.660 | So in this case, you can argue that a few tricks were used
00:12:53.660 | to circumvent the need to have a deep,
00:12:57.980 | meaningful conversation.
00:12:59.460 | And 30% of judges were fooled without rigorous,
00:13:03.300 | thorough, transparent, open-domain testing.
00:13:06.260 | On the left is a transcript with Scott Aaronson,
00:13:08.940 | the famed computer scientist,
00:13:10.300 | the quantum computing researcher.
00:13:13.380 | Talked to him on the podcast, brilliant guy.
00:13:15.980 | He posted some of the conversation that he had
00:13:18.200 | with Eugene, he was one of the judges,
00:13:20.420 | on his blog that I think is really interesting.
00:13:22.980 | So it shows that the judge, the interrogator,
00:13:26.660 | when they're an expert, they can drive,
00:13:29.500 | they can truly put the bot to the test.
00:13:32.500 | As Scott did, he really didn't allow the kind
00:13:35.940 | of misdirection that Eugene nonstop tried to do.
00:13:39.660 | And you can see that in the transcript.
00:13:41.700 | Scott refuses to take the misdirection.
00:13:44.540 | So as I mentioned, despite the waning,
00:13:48.060 | I guess, popularity of the Lobner Prize
00:13:50.260 | and the Turing Test idea in general,
00:13:53.400 | Google has published a paper and proposed a system
00:13:56.460 | called MINA that's a chatbot,
00:13:59.220 | that's an end-to-end deep learning system.
00:14:02.720 | The representational goal in the 2.6 billion parameters
00:14:06.180 | is to capture the conversational context well,
00:14:09.160 | to be able to generate the text
00:14:10.520 | that fits the conversational context well.
00:14:12.740 | Now, one interesting aspect of this,
00:14:14.600 | besides being a serious attempt
00:14:16.280 | at creating a learning-based system
00:14:18.420 | for open domain conversational agents,
00:14:21.760 | is that a new metric is proposed.
00:14:23.960 | And it's a two-part metric of sensibleness and specificity.
00:14:28.960 | Now, sensibleness is that a bot's responses
00:14:32.280 | have to make sense in context.
00:14:33.840 | They have to fit the context.
00:14:35.560 | Just to give you a sense,
00:14:36.520 | for humans, we have 97% sensibleness.
00:14:39.460 | So ability to match what we're saying to the context.
00:14:45.080 | Now, the reason you need another side of that metric
00:14:48.900 | is because you can be sensible,
00:14:52.020 | you can fit the context by being boring, by being generic,
00:14:55.320 | by making statements like, I don't know,
00:14:57.300 | or that's a good point.
00:14:58.880 | So these generic statements
00:15:00.180 | that fit a lot of different kinds of context.
00:15:02.460 | So the other side of the metric is specificity.
00:15:05.160 | Basically, the goal being there is don't be boring.
00:15:08.480 | It's to say something very specific to this context.
00:15:12.460 | So not only does it match the context,
00:15:14.980 | but it captures something very unique
00:15:17.180 | to this particular set of lines of conversation
00:15:21.980 | that form the context.
00:15:24.060 | I think it's fair to say that the beauty of the music,
00:15:28.820 | the humor, the wit of conversation
00:15:31.300 | comes from that ability to play with the specifics,
00:15:35.380 | the specificity metric.
00:15:37.220 | So both are really important.
00:15:39.400 | Humans achieve 86% sensibleness and specificity.
00:15:44.540 | Meena achieves 79%, compared to Mitsuku, which achieves 56%.
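To make the two-part metric concrete, here is a minimal sketch of how a sensibleness-and-specificity average could be computed, assuming each bot response receives binary sensible/specific labels from human judges. The labels and the aggregation below are illustrative, not the exact crowd-sourcing protocol of the Meena paper.

```python
def ssa(labels):
    """labels: one (sensible, specific) pair of booleans per bot response.
    A response judged not sensible is counted as not specific either."""
    n = len(labels)
    sensibleness = sum(s for s, _ in labels) / n
    specificity = sum(s and p for s, p in labels) / n
    return (sensibleness + specificity) / 2  # simple average of the two rates

# Hypothetical judgments for five responses:
labels = [(True, True), (True, False), (False, False),
          (True, True), (True, False)]
print(f"SSA = {ssa(labels):.0%}")  # sensibleness 80%, specificity 40% -> 60%
```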
00:15:49.520 | Now, take this all with a grain of salt.
00:15:52.140 | I want to be very careful here
00:15:53.980 | because there is also, not to throw shade,
00:15:57.580 | but it's closed source currently.
00:16:00.140 | And there's a little bit of a feeling
00:16:02.320 | of a PR marketing situation here.
00:16:05.500 | Naturally, perhaps the paper is made in such a way,
00:16:10.220 | the methodology and the results are presented in such a way,
00:16:12.580 | that benefits the way the learning framework was constructed.
00:16:16.380 | Now, I don't want to over-criticize that
00:16:18.740 | because I think there's still a lot of interesting ideas
00:16:21.460 | in this paper, but in terms of looking
00:16:23.340 | at the actual percentages of 86% human performance
00:16:26.980 | and 79% Meena performance, I think we're quite a ways away
00:16:31.600 | from being able to make conclusive statements
00:16:34.060 | about a system achieving
00:16:35.780 | human-level conversational capabilities.
00:16:38.380 | So those plots should be taken with a grain of salt,
00:16:41.220 | but the actual content of the ideas,
00:16:43.260 | I think is really interesting.
00:16:44.780 | I think quite obviously the future, long-term,
00:16:49.240 | but hopefully short-term, is in end-to-end,
00:16:53.600 | learning-based approaches to open domain conversation.
00:16:57.020 | So just like Turing described, funny enough,
00:16:59.620 | 70 years ago in his paper that machine learning
00:17:02.220 | will be essential to success, I believe the same.
00:17:05.600 | It's a lot less interesting and revolutionary
00:17:07.940 | to think so today, but I believe
00:17:10.500 | that machine learning will also need
00:17:12.340 | to be a very central part of achieving
00:17:15.580 | human-level conversational capabilities.
00:17:18.300 | So let's talk through some objections.
00:17:20.320 | Nine of them are highlighted by Turing himself in his paper.
00:17:25.940 | Here I provide some informal, highly informal summaries.
00:17:30.940 | The first objection is religious,
00:17:33.460 | which connects thinking to, quote-unquote, the soul.
00:17:37.640 | And God, presumably, is the giver of the soul to humans.
00:17:42.640 | Now, Turing's response to that is God is all-powerful.
00:17:49.180 | There is no reason why he can't assign souls
00:17:53.780 | to anything biological or artificial.
00:17:58.220 | So there's no reason that whatever mechanism
00:18:01.400 | by which the soul arrives in the human
00:18:04.720 | cannot also be repeated for artificial creatures.
00:18:07.260 | The second objection is the, quote-unquote,
00:18:10.580 | head in the sand.
00:18:11.980 | It's a bit of a ridiculous one,
00:18:13.480 | but I think it's an important one
00:18:14.700 | because it keeps coming up often,
00:18:16.860 | even in today's context, highlighted by folks
00:18:19.340 | like Elon Musk, Stuart Russell, and so on.
00:18:22.420 | The head in the sand objection is that AGI is scary.
00:18:27.260 | So human-level and superhuman-level intelligence
00:18:29.820 | is kind of scary.
00:18:31.340 | Today we talk about existential threats.
00:18:34.140 | It seems like the world would be totally transformed
00:18:36.360 | if we have something like that.
00:18:37.900 | Then it could be transformed in a highly negative way.
00:18:40.440 | So let's not think about it
00:18:42.160 | because it's kind of seems far away.
00:18:45.000 | So it probably won't happen.
00:18:46.820 | So let's just not think about it.
00:18:49.000 | That's kind of the objection to the Turing test.
00:18:50.840 | It's so far away, it's not worthwhile
00:18:53.680 | to even think about a test for this intelligence
00:18:56.320 | or what human-level intelligence means
00:18:58.560 | or what superhuman-level intelligence means.
00:19:01.320 | The response, quite naturally,
00:19:02.640 | is that it doesn't matter how you feel about something
00:19:05.580 | and whether it's going to happen or not.
00:19:08.560 | So we kind of have to set our feelings aside
00:19:11.160 | and not allow fear or emotion to muddle our thinking
00:19:16.080 | or distract us from thinking about it at all.
00:19:19.680 | The third objection is from Gödel's incompleteness theorem,
00:19:23.020 | saying there are limits to computation.
00:19:24.800 | This is the Roger Penrose line of thinking
00:19:28.080 | that basically if a machine is a computation system,
00:19:30.880 | there are limits to its capabilities
00:19:33.440 | in that it can never be a perfectly rational system.
00:19:37.440 | Turing's response to this
00:19:38.760 | is that humans are not rational either.
00:19:40.960 | They're flawed.
00:19:42.160 | Nowhere does it say that intelligence equals infallibility.
00:19:46.220 | In fact, it could probably be argued
00:19:47.720 | that fallibility is at the core of intelligence.
00:19:51.860 | The fourth objection is that consciousness
00:19:54.560 | may be required for intelligence.
00:19:56.920 | Turing's response to this is to separate
00:19:59.620 | whether something is conscious
00:20:00.980 | and whether something appears to be conscious.
00:20:03.200 | So the focus of the Turing test is how something appears.
00:20:06.240 | And so in some sense, humans, to us, as far as we know,
00:20:11.240 | only appear to be conscious.
00:20:12.880 | We can't prove that they're actually conscious,
00:20:14.960 | humans outside of ourselves.
00:20:17.080 | And so since humans only appear to be conscious,
00:20:20.920 | there's no reason to think that machines
00:20:22.720 | can't also appear to be conscious,
00:20:25.240 | and that's at the core of the Turing test.
00:20:27.320 | So the Turing test kind of skirts around the question
00:20:30.240 | whether something is or isn't intelligent,
00:20:32.220 | whether it is or isn't conscious.
00:20:34.540 | The fundamental question is,
00:20:35.760 | does it appear to be intelligent?
00:20:37.400 | Does it appear to be conscious?
00:20:39.060 | So he actually doesn't respond to the idea
00:20:41.700 | that consciousness is or isn't required for intelligence.
00:20:44.820 | He just says that if it is,
00:20:46.460 | there's no reason why you can't fake it,
00:20:50.100 | and that will be sufficient
00:20:52.320 | to achieve the display of intelligence.
00:20:56.000 | The fifth objection is the Negative Nancy objection
00:21:01.000 | of machines will never be able to do X, whatever X is.
00:21:05.100 | You can make X love, jokes,
00:21:08.420 | understanding or generating humor, eating, enjoying food,
00:21:13.420 | creating art, music, poetry, and so on.
00:21:16.380 | So there's a lot of things we can put in that X
00:21:18.500 | that machines can never do.
00:21:20.080 | And basically highlighting our human intuition
00:21:24.060 | about the limitations of machines.
00:21:26.540 | Just like with the second objection,
00:21:28.240 | naturally the response here is that the objection
00:21:32.140 | that machines will never do X
00:21:33.640 | doesn't have any actual reasoning behind it.
00:21:37.260 | It is just a vapid opinion based on the world today,
00:21:42.260 | refusing to believe that the world of tomorrow
00:21:46.140 | will be different.
00:21:47.020 | The sixth objection, probably the most important,
00:21:51.700 | one of the most interesting,
00:21:53.180 | comes by way of Ada Lovelace, Lady Lovelace,
00:21:57.100 | the mother of computer science,
00:21:59.120 | with the basic idea that machines can only do
00:22:01.820 | what we program them to do.
00:22:03.820 | Now this is an objection that appears in many forms
00:22:06.940 | throughout, before Turing and after Turing.
00:22:09.540 | And I think it's a really important objection
00:22:11.340 | to think about.
00:22:12.380 | So in this particular case,
00:22:14.560 | I think Turing's response is quite shallow,
00:22:17.580 | but it is nevertheless pretty interesting,
00:22:19.940 | and we'll talk about it again later on.
00:22:23.260 | His response is, well, if machines can only do
00:22:26.740 | what we program them to do,
00:22:28.180 | we can rephrase that statement as saying,
00:22:30.900 | machines can't surprise us.
00:22:32.760 | And when you rephrase it that way,
00:22:35.400 | it becomes clear that machines actually surprise us
00:22:37.700 | all the time.
00:22:38.540 | A system that is sufficiently complex
00:22:41.140 | will no longer be one of which we have a solid intuition
00:22:45.900 | of how it behaves,
00:22:47.260 | even if we built all the individual pieces of code
00:22:49.820 | for those of you who have programmed things.
00:22:52.180 | So I've written a lot of programs.
00:22:54.280 | In the initial design stage,
00:22:56.900 | you have an intuition about how it should behave.
00:22:58.820 | There's a design, there's a plan,
00:23:00.820 | you know what the individual functions do.
00:23:02.980 | But as the piece of code grows,
00:23:05.180 | your ability to intuit exactly the mapping
00:23:09.340 | from input to output fades with the size of the code base,
00:23:14.220 | even if you understand everything about the code,
00:23:17.620 | and even if you set logical and syntactic bugs aside.
00:23:22.180 | The seventh objection looks to the brain
00:23:25.220 | and looks to the continuous analog nature
00:23:27.280 | of that particular neural network system.
00:23:30.840 | So Turing's response to that is,
00:23:33.900 | sure, the brain might be analog,
00:23:36.260 | and then digital computers are discrete,
00:23:39.580 | but if you have a big enough digital computer,
00:23:41.620 | it can sufficiently approximate the analog system,
00:23:45.700 | meaning to a sufficient degree
00:23:48.060 | that it would appear intelligent.
00:23:50.540 | The eighth objection is the free will objection, right?
00:23:55.300 | Is that when you have deterministic rules, laws, algorithms,
00:24:00.300 | they're going to result in predictable behavior.
00:24:05.100 | And this kind of exactly deterministic predictable behavior
00:24:09.980 | doesn't quite feel like the mind
00:24:14.540 | that we know us humans possess.
00:24:16.900 | This kind of feeling that underlies
00:24:19.740 | what's required for intelligence for a mind,
00:24:23.700 | I think is behind the Chinese room thought experiment
00:24:28.220 | that we'll talk about next.
00:24:29.780 | So Turing's response here is that
00:24:33.340 | humans very well could be a complex collection of rules.
00:24:38.340 | There's no indication that we're not,
00:24:40.180 | just because we don't understand
00:24:42.780 | or don't even have the tools to explore
00:24:44.860 | the kind of rules that underlie our brain,
00:24:48.860 | doesn't mean it's not just a collection
00:24:51.500 | of deterministic, perfectly predictable sets of rules.
00:24:56.500 | Objection number nine is kind of fun.
00:24:59.620 | Quite possibly Turing is trolling us,
00:25:01.960 | but more likely the ideas of mind reading,
00:25:04.820 | extrasensory perception, telepathy,
00:25:07.460 | were a little bit more popular in his time.
00:25:10.180 | So the objection here is what if mind reading
00:25:13.620 | was used to cheat the test?
00:25:15.820 | So basically if human-to-human communication
00:25:19.060 | through telepathy could be used,
00:25:22.420 | and a machine can't achieve that same kind
00:25:25.340 | of telepathic communication,
00:25:27.100 | then that could be used to circumvent
00:25:30.900 | the effectiveness of the test.
00:25:32.780 | Now Turing's response to this is,
00:25:35.500 | well, you just have to design a room
00:25:38.520 | that not only prevents you from being able to see
00:25:40.780 | whether it's a robot or a human,
00:25:42.740 | but is also a telepathy-proof room
00:25:47.740 | that prevents telepathic communication.
00:25:50.580 | Again, could be Turing trolling us,
00:25:53.940 | but I think more importantly,
00:25:55.120 | I think it's a nice illustration at the time,
00:25:57.240 | and even still today, that there's a lot of mystery
00:26:00.180 | about how our mind works.
00:26:01.980 | If you chuckle and completely laugh off
00:26:04.620 | the possibility of telepathic communication,
00:26:07.860 | I think you're assuming too much about your own knowledge
00:26:11.060 | about how our mind works.
00:26:13.580 | I think we know very little about how our mind works.
00:26:15.920 | It is true, we have very little scientific evidence
00:26:19.180 | of telepathic communication,
00:26:21.220 | but you shouldn't take the next leap
00:26:23.800 | and have a feeling like you understand
00:26:26.580 | that telepathic communication is impossible.
00:26:29.300 | You should nevertheless maintain an open mind.
00:26:31.980 | But as an objection,
00:26:33.640 | it doesn't seem to be a very effective one.
00:26:36.340 | I wanted to dedicate just one slide
00:26:38.500 | and probably the most famous objection to the Turing test
00:26:40.980 | proposed by John Searle in 1980
00:26:43.620 | in his paper "Minds, Brains, and Programs,"
00:26:47.260 | commonly known as the Chinese Room Thought Experiment.
00:26:50.820 | And it's kind of a combination of number four, number six,
00:26:53.660 | and number eight objections on the previous slide,
00:26:56.140 | which are that consciousness is required for intelligence,
00:27:01.100 | the Ada Lovelace objection that programs
00:27:03.460 | can only do what we program them to do,
00:27:06.580 | and the deterministic free will objection
00:27:10.060 | that deterministic rules lead to predictable behavior.
00:27:13.820 | And that doesn't seem to be like what the mind does.
00:27:16.620 | So there's echoes of all those objections
00:27:18.560 | that Turing anticipated all put together
00:27:21.100 | into the Chinese Room.
00:27:23.320 | As a small aside, it is now 6 a.m.
00:27:27.500 | I did not sleep last night,
00:27:28.700 | so this video is brought to you
00:27:31.780 | by this magic potion called Nitro Cold Brew,
00:27:36.380 | an excessively expensive canned beverage
00:27:41.700 | from Starbucks that fuels me
00:27:44.780 | this wonderful Saturday morning.
00:27:48.080 | Here's to you, dear friends.
00:27:50.800 | Okay, the Chinese Room involves following instructions
00:27:56.780 | of an algorithm.
00:27:57.900 | So there's a human sitting inside a room
00:28:00.220 | that doesn't know how to speak Chinese,
00:28:02.220 | but there's notes being passed to them
00:28:04.660 | inside the room from outside in Chinese,
00:28:07.860 | and all they do is follow a set of rules
00:28:10.140 | in order to respond to that language.
00:28:12.740 | So the idea is, if the brain inside the system
00:28:17.740 | that passes the Turing test
00:28:21.300 | is simply following a set of rules,
00:28:24.300 | then it's not truly understanding,
00:28:26.980 | it is not conscious, it does not have a mind.
00:28:30.100 | The objection is philosophical.
00:28:32.100 | So there's not, for my computer science engineering self,
00:28:37.100 | there's not enough meat in it
00:28:38.480 | to even make it that interesting.
00:28:40.580 | It's very human-centric,
00:28:41.860 | but allow us to explore it further.
00:28:45.360 | So the key argument is that programs,
00:28:48.980 | computational systems, are formal,
00:28:52.740 | and so they can capture syntactic structure.
00:28:55.580 | Minds, our brains, have mental content,
00:28:59.900 | so they can capture semantics.
00:29:01.860 | And so the claim that I think is the most important,
00:29:05.300 | the clearest in the paper,
00:29:06.740 | is that syntax by itself is neither constitutive of
00:29:11.000 | nor sufficient for semantics.
00:29:13.620 | So just because you can replicate the syntax of the language
00:29:16.980 | doesn't mean you can truly understand it.
00:29:19.980 | Now this is the same kind of criticism
00:29:21.580 | we hear of language models of today with transformers,
00:29:24.820 | that OpenAI's GPT-2 really doesn't understand the language,
00:29:28.700 | it's just mimicking the statistics of it so well
00:29:32.700 | that it can generate syntactically correct text,
00:29:35.380 | and even have echoes of semantic structure
00:29:39.340 | that indicates some kind of understanding, but it doesn't.
00:29:43.320 | To me, that argument is not very interesting
00:29:45.300 | from an engineering perspective,
00:29:46.600 | because it just sounds like saying
00:29:48.940 | humans can understand things, humans are special,
00:29:52.900 | therefore machines cannot understand things.
00:29:56.680 | It's a very human-centric argument
00:29:59.260 | that's not allowing us to rigorously explore
00:30:02.680 | what exactly does understanding mean
00:30:06.620 | from a computational perspective.
00:30:08.140 | Or put in other words, if understanding, intelligence,
00:30:12.540 | consciousness, either one of those,
00:30:14.980 | is not achievable through computation,
00:30:18.020 | then where is the point that computation hits the wall?
00:30:21.900 | The most interesting open questions to me here
00:30:25.140 | are on the point of faking things, or mimicking,
00:30:27.760 | or the appearance of things.
00:30:29.440 | Does the mimicking of thinking equal thinking?
00:30:32.460 | Does the mimicking of consciousness equal consciousness?
00:30:35.020 | Does the mimicking of love equal love?
00:30:37.860 | This is something that I think a lot about,
00:30:40.460 | and depending on the day, go back and forth.
00:30:43.200 | But I tend to believe from an engineering perspective,
00:30:45.460 | I tend to agree with the spirit
00:30:47.900 | and the work of Alan Turing,
00:30:49.700 | in that at this time as engineers,
00:30:51.980 | we can only focus on building the appearance of thinking,
00:30:55.460 | the appearance of consciousness, the appearance of love.
00:30:58.780 | I think as we work towards creating that appearance,
00:31:01.980 | we'll actually begin to understand the fundamentals
00:31:06.620 | of what it means to be conscious,
00:31:08.140 | what it means to love, what it means to think.
00:31:12.760 | You may have even heard me say sometimes
00:31:14.600 | that the appearance of consciousness is consciousness.
00:31:18.800 | I think that's me being a little bit poetic,
00:31:21.780 | but I think from our perspective,
00:31:23.100 | from our exceptionally limited understanding,
00:31:27.180 | both problems are in the same direction.
00:31:31.500 | So it's not like if we focus on creating
00:31:33.300 | the appearance of consciousness,
00:31:34.500 | that's gonna lead us astray, in my personal view.
00:31:37.600 | It's going to lead us very far down the road
00:31:39.700 | of actually understanding,
00:31:40.940 | and maybe one day engineering consciousness.
00:31:44.380 | And now I'd like to talk about some alternatives
00:31:46.620 | and variations of the Turing test
00:31:48.180 | that I find quite interesting.
00:31:50.140 | So there's a lot of kind of natural variations
00:31:52.780 | and extensions to the Turing test.
00:31:54.980 | First, the total Turing test proposed in 1989.
00:31:59.980 | It extends the Turing test
00:32:01.660 | in the natural language conversation domain
00:32:04.420 | to perception, computer vision,
00:32:06.940 | and object manipulation of robotics.
00:32:09.100 | So it takes it into the physical world.
00:32:12.300 | The interesting question here to me
00:32:14.020 | is whether adding extra modalities
00:32:17.460 | like audio, visual, manipulation
00:32:21.580 | makes the test harder or easier.
00:32:24.200 | To me, it's very possible that a test
00:32:28.100 | with a narrow bandwidth of communication,
00:32:31.760 | such as the natural language communication
00:32:33.620 | of the Turing test is actually harder to pass
00:32:36.620 | than the one that includes other modalities.
00:32:38.820 | But anyway, one of the powerful things
00:32:42.580 | about the original Turing test is that it's so simple.
00:32:45.500 | The Lovelace test proposed in 2001
00:32:49.660 | builds on the Ada Lovelace objection
00:32:52.700 | to form the test that says the machine
00:32:55.100 | has to do something surprising
00:32:57.540 | that the creator or the person who's aware
00:33:02.100 | how the program was created cannot explain.
00:33:05.380 | So it should be truly surprised.
00:33:07.580 | There is also, in 2014, was proposed a Lovelace 2.0 test,
00:33:12.940 | which emphasizes a more constrained definition
00:33:15.660 | of what surprising is, 'cause it's very difficult
00:33:17.600 | to pin down, to formalize the idea of surprise
00:33:21.820 | and explain, right, in the original formulation
00:33:25.940 | of the Lovelace test.
00:33:27.300 | But with Lovelace 2.0, it emphasizes
00:33:29.820 | sort of creativity, art, so on.
00:33:33.660 | So it's more concrete than surprise,
00:33:35.540 | especially if you define constraints
00:33:38.260 | to which creative medium we're operating in.
00:33:41.860 | You basically have to create an impressive piece
00:33:45.300 | of artistic work.
00:33:46.600 | I think that's an interesting conception,
00:33:49.640 | but it takes us in the land that's much more,
00:33:53.180 | not less subjective than the original Turing test.
00:33:58.180 | But this brings us to the open
00:34:00.900 | and the very interesting question of surprise,
00:34:02.940 | which I think is really at the core
00:34:06.820 | of our conception of intelligence.
00:34:09.200 | I think it is true that our idea
00:34:12.540 | of what makes an intelligent machine
00:34:14.500 | is one that really surprises.
00:34:16.580 | So when we one day finally create a system
00:34:19.880 | of human-level or superhuman-level intelligence,
00:34:23.000 | we will surely be surprised.
00:34:25.680 | So we have to think, what kind of behavior
00:34:28.800 | is one that will surprise us to the core?
00:34:31.200 | To me, I have many examples in mind
00:34:34.620 | that I'll cover in future videos,
00:34:36.880 | but one certainly, one of the hardest ones is humor.
00:34:40.840 | And finally, the truly total Turing test proposed in 1998
00:34:45.840 | proposes an interesting philosophical idea
00:34:49.020 | that we should not judge the performance
00:34:52.600 | of an individual agent in an isolated context,
00:34:55.960 | but instead look at the body of work
00:34:58.200 | produced by a collection of intelligent agents
00:35:01.880 | throughout their evolution,
00:35:03.720 | with some constraints on the consistency
00:35:06.520 | underlying the evolutionary process.
00:35:09.600 | It's interesting to suggest that the way we conceive
00:35:15.000 | of intelligence amongst us humans
00:35:17.920 | is grounded in the long arc of history
00:35:20.880 | of the body of work we've created together.
00:35:23.280 | I don't find that argument convincing,
00:35:25.720 | but I do find interesting the open question,
00:35:27.560 | the idea
00:35:31.000 | that we should measure systems
00:35:35.840 | not in the moment or a particular five-minute period
00:35:38.720 | or 20-minute period, but over a period of months and years,
00:35:43.440 | perhaps condensed in a simulated context.
00:35:46.560 | So really increase the scale
00:35:49.760 | at which we judge interactions
00:35:51.720 | by several orders of magnitude.
00:35:54.040 | That to me is a really interesting idea,
00:35:59.040 | you know, to judge alpha zero performance
00:36:01.520 | not on a single game of chess,
00:36:03.840 | but looking at millions of games
00:36:06.960 | and not looking at a million games
00:36:09.240 | for a static set of parameters,
00:36:11.240 | but looking at the millions of games played
00:36:14.640 | as the system was trained from scratch
00:36:17.520 | and became better and better and better.
00:36:19.680 | There's something about that full journey
00:36:23.520 | that may capture intelligence.
00:36:25.960 | So intelligence very well could be the journey,
00:36:30.000 | not the destination.
00:36:31.600 | I think there's something there.
00:36:32.800 | It's very imprecise in this construction,
00:36:35.720 | but it struck me as a very novel idea
00:36:39.880 | for a benchmark not to measure instantaneous performance,
00:36:43.920 | but performance over time
00:36:45.240 | and the improvement of performance over time.
00:36:47.600 | It appears that there's something to that,
00:36:49.120 | but I can't quite make it concrete.
00:36:51.000 | And I'm not sure it's possible to formalize
00:36:52.760 | in the way that the original Turing test is formalized.
00:36:56.480 | Another kind of test is the Winograd Schema Challenge,
00:37:00.340 | which I think is really compelling in many ways.
00:37:03.280 | So first to explain it with an example,
00:37:05.840 | there's a sentence, really two sentences.
00:37:09.280 | Let's say the trophy doesn't fit into the brown suitcase
00:37:11.940 | because it's too small,
00:37:13.540 | and the trophy doesn't fit into the brown suitcase
00:37:15.880 | because it is too large.
00:37:17.680 | And the question is, what is too small?
00:37:20.280 | What is too large?
00:37:22.120 | The answer for the small, what is too small,
00:37:24.720 | is the suitcase is too small.
00:37:27.080 | The trophy doesn't fit into the brown suitcase
00:37:29.240 | because it is too small.
00:37:31.120 | And then the second question is, what is too large?
00:37:34.280 | The answer there is the trophy.
00:37:36.360 | The trophy doesn't fit into the brown suitcase
00:37:38.380 | because it is too large.
00:37:40.600 | The basic idea behind this challenge
00:37:42.620 | is the ambiguity in the sentence can only be resolved
00:37:47.380 | with common sense reasoning about ideas in this world.
00:37:51.200 | And so the strength of this test is it's quite clear,
00:37:56.160 | quite simple, and yet requires, at least in theory,
00:38:01.100 | this deep thing that we think makes us human,
00:38:05.960 | which is the ability to reason
00:38:07.940 | at the very basic level of common sense reasoning.
00:38:10.640 | The other nice thing is it can be a benchmark,
00:38:15.840 | like we're used to in the machine learning world,
00:38:18.080 | that doesn't require subjective human judges.
00:38:21.240 | There's literally a right answer.
00:38:23.420 | The weakness here that holds for other similar challenges
00:38:28.420 | in the space is that it's very difficult
00:38:30.780 | to come up with a large number of questions.
00:38:33.720 | I mean, each one is handcrafted.
00:38:36.560 | And so that means you can't build a benchmark
00:38:40.340 | of millions or billions of questions.
00:38:43.340 | It has to be on a small scale.
00:38:45.860 | Variations of the Winograd schema are included
00:38:50.420 | in some natural language benchmarks of today
00:38:54.860 | that people use in the machine learning context.
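Because each schema has exactly one right answer, scoring needs no human judges. Here is a minimal sketch, with illustrative schema records and a hypothetical `predict` interface:

```python
# Each record pairs a sentence with its two candidate referents and the
# single correct answer; the two records form one Winograd schema pair.
schemas = [
    {"sentence": "The trophy doesn't fit into the brown suitcase "
                 "because it is too small.",
     "candidates": ("trophy", "suitcase"), "answer": "suitcase"},
    {"sentence": "The trophy doesn't fit into the brown suitcase "
                 "because it is too large.",
     "candidates": ("trophy", "suitcase"), "answer": "trophy"},
]

def accuracy(predict, schemas):
    """predict(sentence, candidates) must return one of the candidates."""
    correct = sum(predict(s["sentence"], s["candidates"]) == s["answer"]
                  for s in schemas)
    return correct / len(schemas)

# A trivial baseline that always picks the first candidate scores 50%
# on the symmetric pair, exactly as the schema construction intends.
print(accuracy(lambda sentence, candidates: candidates[0], schemas))  # 0.5
```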
00:38:57.820 | The Amazon Alexa Prize, I think,
00:39:00.500 | captures nicely the spirit of the Turing test.
00:39:03.900 | I think it's actually quite an amazing challenge
00:39:06.100 | and competition that uses voice conversation in the wild,
00:39:10.260 | so with real people, and they can use a,
00:39:13.420 | I think it's called a social bot skill
00:39:15.680 | on their Alexa devices.
00:39:17.340 | And I don't wanna wake up my own Alexa devices,
00:39:20.300 | but basically say her name and say, let's chat.
00:39:23.460 | And that brings up one of the bots involved in the challenge
00:39:26.540 | and then you can have a conversation.
00:39:28.420 | And then the bar that's to be reached is for you
00:39:32.700 | to have a 20 minute or longer conversation with the bot
00:39:36.860 | and for two thirds or more of the interactions
00:39:40.000 | to be that long.
00:39:41.280 | So the basic metric of successful interaction
00:39:45.120 | is the duration of the interaction.
00:39:47.420 | And as of today, we're still really,
00:39:49.580 | really far away from that.
00:39:51.180 | So why is this a good metric?
00:39:52.580 | And I do think it's a really powerful metric.
00:39:55.660 | As opposed to us judging the quality of conversation
00:39:57.940 | in retrospect, we speak with our actions.
00:40:01.340 | So a deep, meaningful conversation
00:40:04.100 | is one we don't want to leave.
00:40:05.940 | When we have other things contending for our time,
00:40:09.980 | when we make the choice to stay in that conversation,
00:40:12.900 | that's as powerful a signal as any
00:40:15.500 | to show that that conversation has content,
00:40:19.220 | has meaning, is enjoyable.
00:40:21.900 | I think that is what passing the Turing Test
00:40:25.060 | in its original spirit actually is.
00:40:28.540 | And I should mention that as of today,
00:40:30.700 | no team has even come close to passing the Turing Test
00:40:35.060 | as it is constructed by the Alexa Prize.
00:40:37.820 | There's several things that are really surprising
00:40:39.460 | about this challenge.
00:40:40.460 | One is that it's not a lot more popular
00:40:43.300 | and two, that Amazon chose to limit it to students only.
00:40:48.300 | I mean, almost making it an educational exercise
00:40:51.900 | as opposed to a moonshot challenge
00:40:54.620 | for our entire generation of researchers.
00:40:57.980 | I mentioned it before, but I'll say it again here
00:41:00.140 | that it's surprising to me that the biggest research lab
00:41:03.740 | and industry in academia have not focused on this problem,
00:41:08.140 | have not found the magic within the Turing Test problem
00:41:12.060 | and the Alexa Prize as it formulates,
00:41:15.180 | I believe, the spirit of the Turing Test quite well.
00:41:19.140 | A very different kind of test is the Hutter Prize
00:41:22.460 | started by Marcus Hutter,
00:41:24.220 | which I think is really fascinating
00:41:26.980 | on both the philosophical and mathematical angle.
00:41:29.460 | Underlying it is the idea that compression
00:41:33.460 | is strongly correlated with intelligence.
00:41:38.140 | Put another way, the ability to compress knowledge
00:41:42.020 | well requires intelligence.
00:41:44.380 | And the better you compress that knowledge,
00:41:46.380 | the more intelligent you are.
00:41:47.900 | I think this is a really compelling notion
00:41:51.180 | because then we can make explicit,
00:41:53.620 | we can quantify how intelligent you are
00:41:56.580 | by how well you're able to compress knowledge.
00:41:59.660 | As the prize webpage puts it,
00:42:02.740 | being able to compress well is closely related
00:42:05.300 | to acting intelligently,
00:42:07.060 | thus reducing the slippery concept of intelligence
00:42:10.020 | to hard file size numbers.
00:42:12.420 | So the task is to take one gigabyte of Wikipedia data
00:42:17.420 | and compress it down as much as possible.
00:42:20.860 | The current best is an 8.58 compression factor.
00:42:25.860 | So down from one gigabyte to 117 megabytes.
00:42:30.220 | And the award for each 1% improvement, you win 5,000 euros.
00:42:35.180 | I find this competition just amazing
00:42:37.780 | and fascinating on many levels.
00:42:40.100 | I think it's a really good formulation
00:42:43.020 | of an intelligence challenge, but it's not a test.
00:42:47.980 | That's one of its kind of limitations,
00:42:50.460 | at least in the poetic sense,
00:42:52.260 | that it doesn't set a bar
00:42:53.860 | beyond which we're really damn impressed.
00:42:56.700 | Meaning it's harder to set a bar,
00:42:59.300 | like the one formulated by the Turing test,
00:43:01.340 | beyond which we feel it would be human level intelligence.
00:43:04.620 | Now the bars that are set
00:43:06.260 | by Alan Turing and others,
00:43:07.820 | the Loebner Prize, the Alexa Prize, are also arbitrary,
00:43:11.780 | but it feels like we're able to intuit a good bar
00:43:14.180 | in that context better than being able to intuit
00:43:17.700 | the kind of bar we need to set for the compression challenge.
00:43:21.260 | Another fascinating challenge
00:43:22.660 | is the abstraction and reasoning challenge
00:43:24.740 | put forth by Francois Chollet just a few months ago.
00:43:28.500 | So this is very exciting.
00:43:29.460 | It's actually ongoing as a competition on Kaggle,
00:43:31.980 | I think with a deadline in May.
00:43:35.020 | It's a really, really interesting idea.
00:43:37.020 | I haven't internalized it fully yet,
00:43:39.380 | and perhaps we'll do a separate video
00:43:41.860 | on just this paper alone,
00:43:43.780 | and I'll talk to Francois, I'm sure, on the podcast
00:43:46.140 | and in other contexts in the future about it.
00:43:48.140 | I think there's a lot of brilliant ideas here
00:43:50.820 | that I still have to kind of digest a little bit,
00:43:53.500 | but let me describe the high-level ideas
00:43:57.860 | behind this benchmark.
00:43:59.420 | So first of all, the name is abstraction reasoning corpus
00:44:03.140 | or challenge arc.
00:44:05.140 | The domain is in a grid world of patterns,
00:44:08.860 | not limited in size,
00:44:10.020 | but the grid world is filled with cells
00:44:13.780 | that can be of different colors.
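For concreteness, here is roughly what an ARC task looks like on disk, assuming the JSON layout of the public fchollet/ARC repository: a "train" list of demonstration pairs plus a "test" list, where each grid is a list of rows of integers 0 through 9, one integer per colored cell. The tiny task below is invented for illustration.

```python
import json

# A toy two-color task (illustrative, not from the actual corpus):
task = json.loads("""
{
  "train": [
    {"input":  [[0, 1], [1, 0]],
     "output": [[1, 0], [0, 1]]}
  ],
  "test": [
    {"input":  [[0, 2], [2, 0]],
     "output": [[2, 0], [0, 2]]}
  ]
}
""")

for pair in task["train"]:
    print(pair["input"], "->", pair["output"])
```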
00:44:15.700 | And the spirit of the set of tests that Francois proposes
00:44:19.380 | is to stay close to IQ tests,
00:44:22.060 | so psychometric intelligence tests
00:44:23.740 | that we use to measure the intelligence of human beings.
00:44:29.100 | Now, the Turing test is kind of at a higher level
00:44:33.580 | of natural language.
00:44:35.580 | In this construction of ARC,
00:44:38.300 | it goes as close as possible
00:44:40.940 | to the very basic elements of reasoning,
00:44:44.980 | just like in the IQ test of patterns.
00:44:47.700 | It gets to the very core,
00:44:49.920 | such that we can then make explicit the priors,
00:44:55.620 | the concepts that we bring to the table of those tests.
00:44:58.780 | And if we can make them explicit,
00:45:00.700 | it reduces the test as close as possible
00:45:04.420 | to the measure of the system's ability to reason.
00:45:07.920 | Now, the concepts that are brought to this grid world,
00:45:11.180 | here's just a couple of example of priors
00:45:13.100 | that Francois shows in his paper.
00:45:15.780 | I recommend highly,
00:45:16.820 | called "On the Measure of Intelligence."
00:45:19.260 | Here, prior concept is not referring to a previous concept.
00:45:22.420 | It's referring to a prior set of knowledge
00:45:25.700 | that you bring to the table.
00:45:27.420 | So this first row of illustrations of the two grid worlds
00:45:31.220 | illustrates the idea of object persistence with noise.
00:45:34.420 | So we're able to understand that large objects,
00:45:39.420 | when there is some visual noise occluding
00:45:45.180 | our ability to see them,
00:45:47.580 | that they still exist in the world.
00:45:49.480 | And if that noise changes, the object is still unchanged.
00:45:53.580 | So that idea of object persistence in the world
00:45:57.900 | is a prior that we bring to the table
00:46:01.140 | of understanding this grid world.
00:46:02.700 | Another prior, on the left at the bottom,
00:46:06.820 | is that objects are defined by spatial contiguity.
00:46:11.820 | So objects in this grid world,
00:46:16.220 | when the cells are of the same color
00:46:18.540 | and they're touching each other,
00:46:20.120 | they're probably part of the same object.
00:46:22.180 | And if there's black cells that separate
00:46:25.420 | those groupings of cells,
00:46:26.980 | that means there's multiple objects.
00:46:28.660 | So this kind of spatial contiguity of colored cells
00:46:33.660 | define the entity of the object.
00:46:37.860 | And on the right at the bottom is the color-based contiguity,
00:46:42.780 | which means that even if the cells
00:46:44.760 | of different colors are touching,
00:46:46.780 | if their colors are different,
00:46:48.300 | that means it likely belongs to a different object.
00:46:51.280 | That's a basic prior.
00:46:52.820 | And there's a few others, by the way,
00:46:54.980 | just beautiful pictures in that paper
00:46:58.660 | that make you really think
00:47:00.660 | about the core elements of intelligence.
00:47:03.340 | I love that paper, worth looking at.
00:47:06.380 | There's a lot of interesting insights in there.
00:47:09.280 | Just to give you some examples of what the actual task
00:47:13.700 | for the machine in this test looks like,
00:47:15.940 | it's similar to the kind of task you would see in an IQ test.
00:47:18.700 | So here there's three pairings,
00:47:21.680 | and the task is for the fourth pairing of images
00:47:25.640 | to generate the grid world that fits the other three,
00:47:30.640 | that fits the generating pattern of the other three.
00:47:34.760 | So in this case, figure four from the paper,
00:47:37.960 | a task where the implicit goal
00:47:39.400 | is to complete a symmetrical pattern.
00:47:41.940 | The nature of the task is specified
00:47:43.600 | by the three input-output examples.
00:47:45.820 | The test taker must generate the output grid
00:47:47.920 | corresponding to the input grid
00:47:49.820 | of the test input bottom right.
00:47:52.200 | So here, what you're tasked with understanding
00:47:55.280 | in the first three pairings
00:47:57.540 | is that the input has a perfect global symmetry to it.
00:48:02.540 | And also that there are parts of the image that are missing
00:48:10.000 | that can be filled in order to complete
00:48:12.540 | that perfect symmetry.
00:48:14.360 | Now that's relying on another prior,
00:48:16.440 | another basic concept of symmetry,
00:48:19.080 | which I think underlies a lot of our understanding
00:48:22.160 | of visual patterns.
00:48:23.780 | Again, so the intelligence system
00:48:25.920 | has to have a good representation of symmetry
00:48:30.920 | in various contexts.
00:48:33.140 | This is fascinating and beautiful, beautiful images.
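To make the symmetry prior concrete, here is a toy sketch of one way to exploit it on this task family: recover each missing cell from its mirror image. The hole sentinel and the restriction to left-right symmetry are my simplifications; the actual figure-4 task involves a full global symmetry.

```python
def complete_symmetry(grid, hole=-1):
    """Fill holes in a grid assumed to be left-right symmetric.

    Each missing cell (marked with the `hole` sentinel, chosen here
    purely for illustration) is copied from its mirror image across
    the vertical axis.
    """
    w = len(grid[0])
    out = [row[:] for row in grid]
    for row in out:
        for x in range(w):
            if row[x] == hole and row[w - 1 - x] != hole:
                row[x] = row[w - 1 - x]
    return out

grid = [[1, 2, -1, 1],
        [3, -1, 4, 3]]
print(complete_symmetry(grid))  # -> [[1, 2, 2, 1], [3, 4, 4, 3]]
```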
00:48:36.760 | Okay, another example, figure 10 from the paper,
00:48:39.840 | a task where the implicit goal is to count unique objects
00:48:43.240 | and select the object that appears the most times.
00:48:46.320 | The actual task has more demonstration pairs
00:48:48.480 | than these three.
00:48:58.400 | So again, there's three pairings.
00:49:01.320 | You see in the first one, there's three blue objects.
00:49:04.520 | In the second one, there's four yellow objects.
00:49:06.880 | In the third one, there's three red objects.
00:49:09.280 | So you have to figure that out.
00:49:10.560 | And then the output is the grid cells capturing that object
00:49:15.440 | that appears the most times.
00:49:17.120 | And so apply that kind of reasoning
00:49:20.380 | to complete the output of the fourth pairing.
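In the same hedged spirit, here is a sketch of the reasoning this task seems to call for, reusing find_objects from the contiguity sketch above. The output convention (the cropped bounding box of one instance of the winning color) is my guess at the format, not taken from the paper.

```python
from collections import Counter

def most_frequent_object(grid):
    """Return a cropped copy of the object whose color occurs the most
    times as a distinct object (assumes find_objects is in scope)."""
    objects = find_objects(grid)
    counts = Counter(color for color, _ in objects)
    winner = counts.most_common(1)[0][0]
    # Take the first instance of the winning color and crop its bounding box.
    cellset = set(next(cells for color, cells in objects if color == winner))
    ys = [y for y, _ in cellset]
    xs = [x for _, x in cellset]
    return [[winner if (y, x) in cellset else 0
             for x in range(min(xs), max(xs) + 1)]
            for y in range(min(ys), max(ys) + 1)]

grid = [[1, 0, 1, 0, 2],
        [0, 0, 1, 0, 2]]
print(most_frequent_object(grid))  # color 1 forms two objects -> [[1]]
```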
00:49:24.120 | One of the challenges for this kind of test
00:49:26.200 | is that these tasks are difficult to generate.
00:49:28.060 | But just like I said,
00:49:29.120 | I think there's a lot of really interesting technical
00:49:31.640 | and philosophical ideas here that are worth exploring.
00:49:35.680 | So let's quickly talk through a few takeaways.
00:49:37.980 | So zooming out, is the Turing test
00:49:42.160 | a good measure of intelligence
00:49:43.380 | and can it serve as an answer to the big ambiguous
00:49:48.100 | but profound philosophical question of can machines think?
00:49:52.060 | So first some notes
00:49:54.020 | on the underlying challenges of the Turing test.
00:49:57.280 | Let's talk about intelligence.
00:50:00.660 | So if we compare human behavior and intelligent behavior,
00:50:05.360 | it's clear that the Turing test hopes to capture
00:50:09.620 | the intelligent parts of human behavior.
00:50:13.180 | But if we're trying to really capture
00:50:16.140 | human level intelligence,
00:50:18.700 | it's also possible that we wanna capture
00:50:21.820 | the unintelligent, the irrational parts of human behavior.
00:50:25.500 | So it's an open question whether natural conversation
00:50:28.460 | is a test of intelligence or humanness.
00:50:33.460 | Because if it's a test of intelligence,
00:50:36.380 | it's focusing only on kind of rational systematic thinking.
00:50:40.740 | If it's a test of humanness,
00:50:42.620 | then you have to capture the full range of emotion,
00:50:45.720 | the mess, the irrationality, the laziness, the boredom,
00:50:50.180 | all the things that make us human
00:50:52.020 | and all the things that then project themselves
00:50:55.220 | into the way we carry on through conversation.
00:50:57.740 | As I mentioned in the previous objections,
00:50:59.700 | the Turing test really focuses on the external appearances,
00:51:03.900 | not the internal processes.
00:51:06.100 | So like I said, from an engineering perspective,
00:51:09.340 | I think it's very difficult to create a test
00:51:12.460 | for internal processes for some of these concepts
00:51:15.580 | that we have a very poor understanding of,
00:51:17.820 | like intelligence, like consciousness.
00:51:20.380 | I think the best we can do right now
00:51:23.300 | in terms of quantifying and having a measure of something,
00:51:27.180 | we have to look at the external performance of the system
00:51:31.620 | as opposed to some properties of the internal processes.
00:51:34.940 | Another challenge for the Turing test,
00:51:37.980 | as Scott Aaronson's conversation with Eugene Goostman indicates,
00:51:41.940 | is that the skill of the interrogator
00:51:45.420 | is really important here.
00:51:47.100 | That's, one, the conversational skill
00:51:51.080 | of how much you can stretch
00:51:52.460 | and challenge the conversation with a bot,
00:51:54.860 | and two, on the human side of it,
00:51:58.100 | the ability of the interrogator to identify the humanness
00:52:01.620 | of both the human and the machine.
00:52:03.500 | So the ability to have a conversation
00:52:07.180 | that challenges the bot,
00:52:09.540 | and the ability to make the actual identification
00:52:12.300 | of human or machine.
00:52:13.500 | Those are both skills that are essential
00:52:17.740 | to the Turing test.
00:52:19.980 | Also, to me, the anthropomorphization
00:52:21.660 | of human-to-inanimate-object interaction
00:52:25.500 | is really fascinating.
00:52:31.060 | And it's an open question,
00:52:32.660 | when anthropomorphism is leveraged
00:52:35.220 | in some construction of the Turing test
00:52:37.740 | to convince the human,
00:52:39.180 | whether that's cheating the Turing test,
00:52:41.220 | or in fact an essential element
00:52:44.220 | to convincing us humans that something is intelligent.
00:52:47.700 | Perhaps as a starting point,
00:52:49.660 | we have to anthropomorphize something
00:52:51.460 | before we allow it to be intelligent
00:52:54.980 | in our subjective judgment of its intelligence.
00:52:59.940 | And finally, another limitation of the Turing test
00:53:02.500 | that could be narrowly stated as
00:53:04.860 | why do we expect a bot to talk?
00:53:07.420 | What if it doesn't feel like talking?
00:53:11.020 | Does it still fail?
00:53:12.340 | I think a more general way to phrase that is
00:53:15.300 | why do we judge the performance of a system
00:53:18.940 | on such a narrow window of time?
00:53:21.740 | I think, as I mentioned before,
00:53:23.540 | there could be something interesting
00:53:25.460 | on expanding the window of time
00:53:28.820 | over which we analyze the intelligence of the system,
00:53:34.060 | looking not just at the average performance
00:53:37.140 | but the growth of its performance
00:53:38.740 | as it interacts with you as the individual.
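One loose way to formalize "growth, not just a snapshot" is to score the system over successive sessions and report the trend alongside the mean; this metric is purely my illustration, not something proposed in Turing's paper.

```python
def performance_summary(scores):
    """Mean and least-squares slope of scores over successive sessions.

    A positive slope suggests the system is adapting to the individual
    over time. Assumes at least two sessions (otherwise the slope is
    undefined).
    """
    n = len(scores)
    mean_t = (n - 1) / 2                 # mean of session indices 0..n-1
    mean_s = sum(scores) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in enumerate(scores))
    var = sum((t - mean_t) ** 2 for t in range(n))
    return {"average": mean_s, "growth": cov / var}

print(performance_summary([0.4, 0.5, 0.55, 0.7]))
# -> {'average': 0.5375, 'growth': 0.095}
```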
00:53:41.700 | I think one key aspect of intelligence
00:53:44.540 | is the social aspect, and social connection
00:53:49.540 | may in part require getting to know the person.
00:53:54.700 | And there's something to rethink in the Turing test
00:53:57.220 | that relies on us building a relationship
00:53:59.700 | with the person as part of the test.
00:54:02.140 | So you can think of it as kind of the ex machina Turing test
00:54:07.060 | where they spend a series of conversations together,
00:54:10.980 | several days together, all those kinds of things.
00:54:14.060 | That feels like an interesting extension
00:54:17.220 | of the Turing test which could reveal
00:54:19.700 | the significant limitation of the current construction
00:54:22.540 | of the Turing test which is a limited window of time,
00:54:25.020 | with a one-time, at-the-end interrogator judgment
00:54:28.780 | of whether it's human or machine.
00:54:31.700 | Now my view overall on the Turing test is that yes,
00:54:35.700 | something like the Turing test as originally constructed
00:54:39.780 | that is, natural language conversation,
00:54:44.100 | is close to the ultimate test of intelligence.
00:54:47.700 | And moreover, this is where I disagree
00:54:49.900 | with Francois Chollet
00:54:51.900 | and other world-class researchers in the area,
00:54:55.580 | Stuart Russell and so on: I think the Turing test
00:54:58.980 | is not a distraction for us to think about.
00:55:01.700 | It doesn't pull us away from actually making progress
00:55:05.700 | in the field.
00:55:06.780 | I think it keeps us honest.
00:55:08.540 | I think truly analyzing where we stand
00:55:12.140 | in natural language conversation will help us understand
00:55:15.700 | how far away we are.
00:55:17.040 | And more than that, I think there should be active research
00:55:21.340 | in this field.
00:55:22.180 | I think the Loebner Prize type of formulations,
00:55:24.220 | the Alexa Prize formulations should be more popular
00:55:27.260 | than they are and I think researchers should take them
00:55:30.220 | very seriously.
00:55:31.660 | Now that doesn't mean that the work of the ARC benchmark
00:55:36.500 | with the IQ test type of intelligent test
00:55:40.420 | is not also going to be fruitful, potentially very fruitful.
00:55:45.220 | But I think ultimately the real test
00:55:48.760 | of human-level intelligence will occur in something
00:55:52.900 | like the construction of the Turing test
00:55:55.660 | with natural language open domain conversation
00:55:58.580 | that results in deep, meaningful connection
00:56:02.100 | between human and machine.
00:56:04.460 | Zooming out a little bit, I think in general
00:56:08.100 | AI researchers don't like and try to avoid
00:56:12.740 | the messiness of human beings, as is captured
00:56:16.380 | by the human-robot interaction field and its set of problems.
00:56:20.260 | I think more than just embracing the Turing test,
00:56:23.980 | I think we should embrace the messiness of the human being
00:56:27.780 | in all the different domains of computer vision,
00:56:31.700 | of natural language, of robotics, of autonomous vehicles.
00:56:36.700 | I've been a long-time advocate that semi-autonomous vehicles
00:56:40.480 | are here to stay for a long time.
00:56:41.980 | We're going to have to figure out
00:56:43.420 | the human-robot interaction problem
00:56:46.500 | and for that we have to embrace perceiving everything
00:56:49.620 | about the human inside the car,
00:56:51.540 | perceiving everything about the humans outside the car.
00:56:54.240 | As I mentioned, this presentation of the paper
00:56:57.880 | is actually part of our paper reading club
00:57:02.100 | focused on artificial intelligence
00:57:04.140 | where we discuss a couple of times a week
00:57:07.100 | on a Discord server called LexPlus AI Podcast
00:57:10.180 | that you're welcome to join.
00:57:12.220 | We have an amazing community of brilliant people there
00:57:14.600 | that discuss all kinds of topics
00:57:16.900 | in artificial intelligence and beyond.
00:57:19.200 | This particular illustration that I just love
00:57:22.300 | is from Will Scobie who is an illustrator
00:57:24.580 | from the United Kingdom who is part of this Discord community,
00:57:29.160 | so he contributed it.
00:57:30.940 | And in general, aside from the amazing conversations,
00:57:34.640 | I encourage and hope to see other members of the community
00:57:38.260 | contribute art, code, visualizations, slides,
00:57:43.260 | ideas for these kinds of videos.
00:57:45.820 | I'm really excited by the kind of conversations I've seen.
00:57:48.980 | If you're watching this video and wanna join in,
00:57:51.420 | click on the Discord link in the description on the slide.
00:57:54.440 | Join the conversation, new paper every week, it's fun.
00:57:58.920 | Just to give you a little sense of the ideas
00:58:02.020 | behind this AI paper reading club,
00:58:04.340 | like what the goals are.
00:58:05.860 | So what is it?
00:58:07.380 | I think the goal is to take a seminal paper in the field
00:58:11.360 | and not just focus on the specific
00:58:13.960 | paragraph-by-paragraph, section-by-section analysis
00:58:17.700 | of what the paper is saying, but actually use the paper
00:58:20.400 | to discuss the history, the big picture development
00:58:23.860 | of the field within the context of that paper.
00:58:26.860 | Now that could be philosophical papers
00:58:28.580 | like this Turing test paper,
00:58:30.220 | or it could be very specific papers in the field.
00:58:33.420 | Again, physics, mathematics, computer science,
00:58:35.900 | and probably quite a bit of deep learning.
00:58:38.340 | So the hope is to prioritize beautiful, powerful,
00:58:43.340 | impactful insights as opposed to full coverage
00:58:46.780 | of all the contents of the paper.
00:58:49.100 | And the actual meetings on Discord,
00:58:52.860 | hopefully are less one person presenting
00:58:55.900 | and more discussion.
00:58:57.360 | There are a lot of brilliant people, and they're civil,
00:59:00.140 | so you can have 300, 400 people on voice chat
00:59:03.820 | and it still feels like an intimate setting.
00:59:05.540 | People aren't interrupting each other,
00:59:08.220 | it's not chaos; it's quite an amazing community.
00:59:11.060 | The other goal I'd love to see
00:59:13.740 | is even if we cover technical papers,
00:59:16.020 | the goal is for it to be accessible to everyone.
00:59:19.380 | So both high school students,
00:59:21.820 | people outside of all of these fields in general,
00:59:24.780 | but also I'd love to make it useful
00:59:27.980 | to experts in the field, expert researchers.
00:59:31.300 | So avoid using technical jargon,
00:59:34.340 | but still try to discover insights that are new,
00:59:38.380 | that are interesting, that are important
00:59:40.180 | for the researchers in the field.
00:59:42.060 | That's what I would love to achieve here
00:59:44.340 | with this paper reading club.
00:59:45.980 | If you're interested, join in, listen in,
00:59:48.940 | or contribute to the conversation,
00:59:50.420 | suggest papers, suggest content, visualizations, code,
00:59:55.100 | all is welcome, it's an amazing community.
00:59:57.540 | Thanks for watching this excessively long presentation.
01:00:00.620 | If you have suggestions, let me know.
01:00:03.220 | Otherwise, hope to see you next time.
01:00:05.220 | (upbeat music)