
François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38


Chapters

0:00
11:43 Scale and Defining Intelligent Systems
15:23 Why an Intelligence Explosion Is Not Possible
29:22 Limits of Deep Learning
57:34 Can You Learn Rule-Based Models
58:06 The Field of Program Synthesis
63:07 Weight Agnostic Neural Networks
87:08 Algorithmic Bias
95:17 Turing Test

Whisper Transcript

00:00:00.000 | The following is a conversation with Francois Chollet.
00:00:03.760 | He's the creator of Keras, which is an open source deep learning
00:00:07.360 | library that is designed to enable
00:00:09.200 | fast, user-friendly experimentation
00:00:11.520 | with deep neural networks.
00:00:13.600 | It serves as an interface to several deep learning libraries,
00:00:16.720 | the most popular of which is TensorFlow.
00:00:19.040 | And it was integrated into the TensorFlow main code
00:00:22.000 | base a while ago.
00:00:24.120 | Meaning, if you want to create, train, and use neural networks,
00:00:28.640 | probably the easiest and most popular option
00:00:31.080 | is to use Keras inside TensorFlow.
00:00:34.800 | Aside from creating an exceptionally useful
00:00:37.280 | and popular library, Francois is also a world-class AI
00:00:41.040 | researcher and software engineer at Google.
00:00:44.560 | And he's definitely an outspoken, if not controversial,
00:00:48.160 | personality in the AI world, especially
00:00:51.480 | in the realm of ideas around the future
00:00:53.760 | of artificial intelligence.
00:00:55.920 | This is the Artificial Intelligence Podcast.
00:00:58.640 | If you enjoy it, subscribe on YouTube,
00:01:01.000 | give us five stars on iTunes, support on Patreon,
00:01:04.160 | or simply connect with me on Twitter at Lex Fridman,
00:01:07.160 | spelled F-R-I-D-M-A-N. And now, here's my conversation
00:01:12.200 | with Francois Chollet.
00:01:14.880 | You're known for not sugarcoating your opinions
00:01:17.280 | and speaking your mind about ideas in AI,
00:01:19.120 | especially on Twitter.
00:01:21.160 | It's one of my favorite Twitter accounts.
00:01:22.840 | So what's one of the more controversial ideas
00:01:26.360 | you've expressed online and gotten some heat for?
00:01:30.440 | How do you pick?
00:01:33.040 | How do I pick?
00:01:33.880 | Yeah, no, I think if you go through the trouble
00:01:36.920 | of maintaining a Twitter account,
00:01:39.600 | you might as well speak your mind.
00:01:41.880 | Otherwise, what's even the point of having a Twitter account?
00:01:44.600 | It's like having a nice car and just leaving it in the garage.
00:01:48.600 | Yeah, so what's one thing for which I got
00:01:52.280 | a lot of pushback?
00:01:53.600 | Perhaps, you know, that time I wrote something
00:01:56.680 | about the idea of intelligence explosion.
00:02:00.920 | And I was questioning the idea
00:02:04.520 | and the reasoning behind this idea.
00:02:06.840 | And I got a lot of pushback on that.
00:02:09.640 | I got a lot of flack for it.
00:02:11.840 | So yeah, so intelligence explosion,
00:02:13.600 | I'm sure you're familiar with the idea,
00:02:14.960 | but it's the idea that if you were to build
00:02:18.800 | general AI problem solving algorithms,
00:02:22.920 | well, the problem of building such an AI,
00:02:27.480 | that itself is a problem that could be solved by your AI.
00:02:30.520 | And maybe it could be solved better than what humans can do.
00:02:33.760 | So your AI could start tweaking its own algorithm,
00:02:36.840 | could start being a better version of itself.
00:02:39.520 | And so on, iteratively, in a recursive fashion.
00:02:43.240 | And so you would end up with an AI
00:02:47.320 | with exponentially increasing intelligence.
00:02:50.080 | That's right.
00:02:50.880 | And I was basically questioning this idea,
00:02:55.880 | first of all, because the notion of intelligence explosion
00:02:59.040 | uses an implicit definition of intelligence
00:03:02.200 | that doesn't sound quite right to me.
00:03:05.360 | It considers intelligence as a property of a brain
00:03:11.200 | that you can consider in isolation,
00:03:13.680 | like the height of a building, for instance.
00:03:15.760 | Right.
00:03:16.640 | But that's not really what intelligence is.
00:03:19.040 | Intelligence emerges from the interaction between a brain,
00:03:24.480 | a body, like embodied intelligence,
00:03:26.720 | and an environment.
00:03:28.320 | And if you're missing one of these pieces,
00:03:30.720 | then you cannot really define intelligence anymore.
00:05:33.840 | So just tweaking a brain to make it smarter and smarter
00:03:36.800 | doesn't actually make any sense to me.
00:03:39.120 | So first of all, you're crushing the dreams of many people.
00:03:42.560 | Right.
00:03:43.360 | So there's a, let's look at like Sam Harris.
00:03:46.000 | Actually, a lot of physicists, Max Tegmark,
00:03:48.720 | people who think the universe is an information processing
00:03:53.640 | system.
00:03:54.640 | Our brain is kind of an information processing system.
00:03:57.680 | So what's the theoretical limit?
00:04:00.080 | It doesn't make sense that there should be some--
00:04:04.800 | it seems naive to think that our own brain is somehow
00:04:08.160 | the limit of the capabilities of this information.
00:04:11.600 | I'm playing devil's advocate here.
00:04:13.640 | This information processing system.
00:04:15.600 | And then if you just scale it, if you're
00:04:18.080 | able to build something that's on par with the brain,
00:04:21.600 | the process that builds it just continues,
00:04:24.000 | and it'll improve exponentially.
00:04:26.360 | So that's the logic that's used, actually,
00:04:30.160 | by almost everybody that is worried
00:04:33.960 | about superhuman intelligence.
00:04:36.800 | Yeah.
00:04:37.320 | So you're trying to make--
00:04:39.080 | so most people who are skeptical of that are kind of like,
00:04:42.280 | this doesn't-- the thought process,
00:04:44.320 | this doesn't feel right.
00:04:46.600 | That's for me as well.
00:04:47.600 | So I'm more like, it doesn't--
00:04:51.360 | the whole thing is shrouded in mystery where you can't really
00:04:54.400 | say anything concrete, but you could
00:04:56.160 | say this doesn't feel right.
00:04:57.840 | This doesn't feel like that's how the brain works.
00:05:00.600 | And you're trying to, with your blog post,
00:05:02.320 | and now making it a little more explicit.
00:05:05.640 | So one idea is that the brain doesn't exist alone.
00:05:11.280 | It exists within the environment.
00:05:13.880 | So you can't exponentially--
00:05:16.320 | you would have to somehow exponentially
00:05:18.200 | improve the environment and the brain together,
00:05:21.080 | almost, in order to create something that's much smarter
00:05:26.600 | in some kind of--
00:05:28.520 | of course, we don't have a definition of intelligence.
00:05:30.720 | That's correct.
00:05:31.320 | That's correct.
00:05:31.920 | I don't think-- if you look at very smart people today,
00:05:34.880 | even humans, not even talking about AIs,
00:05:37.920 | I don't think their brain and the performance of their brain
00:05:41.200 | is the bottleneck to their expressed intelligence,
00:05:44.720 | to their achievements.
00:05:47.200 | You cannot just tweak one part of this system,
00:05:50.520 | like of this brain-body environment system,
00:05:53.440 | and expect the capabilities, like what emerges out
00:05:56.640 | of this system, to just explode exponentially.
00:06:00.840 | Because any time you improve one part
00:06:04.120 | of a system with many interdependencies like this,
00:06:07.280 | there's a new bottleneck that arises.
00:06:09.560 | And I don't think even today, for very smart people,
00:06:12.320 | their brain is not the bottleneck to the sort
00:06:15.400 | of problems they can solve.
00:06:17.600 | In fact, many very, very smart people today,
00:06:21.480 | they're not actually solving any big scientific problems.
00:06:23.800 | They're not Einstein.
00:06:24.840 | They're like Einstein, but in the patent clerk days.
00:06:29.840 | Like Einstein became Einstein because this
00:06:32.640 | was a meeting of a genius with a big problem at the right time.
00:06:39.480 | But maybe this meeting could have never happened.
00:06:42.520 | And then Einstein would have just been a patent clerk.
00:06:45.000 | And in fact, many people today are probably
00:06:48.000 | like genius level smart, but you wouldn't know,
00:06:52.280 | because they're not really expressing any of that.
00:06:54.840 | - Wow, that's brilliant.
00:06:55.680 | So we can think of the world, Earth, but also the universe
00:06:59.560 | as just as a space of problems.
00:07:02.760 | So all of these problems and tasks are roaming it,
00:07:05.200 | of various difficulty.
00:07:06.920 | And there's agents, creatures like ourselves
00:07:10.120 | and animals and so on that are also roaming it.
00:07:13.360 | And then you get coupled with a problem
00:07:16.480 | and then you solve it.
00:07:17.640 | But without that coupling, you can't demonstrate
00:07:20.880 | your quote unquote intelligence.
00:07:22.520 | - Exactly, intelligence is the meeting
00:07:24.480 | of great problem solving capabilities
00:07:27.480 | with a great problem.
00:07:28.720 | And if you don't have the problem,
00:07:30.520 | you don't really express an intelligence.
00:07:32.240 | All you're left with is potential intelligence,
00:07:34.720 | like the performance of your brain
00:07:36.240 | or how high your IQ is, which in itself is just a number.
00:07:41.240 | - So you mentioned problem solving capacity.
00:07:46.160 | What do you think of as problem solving capacity?
00:07:51.800 | Can you try to define intelligence?
00:07:55.160 | Like what does it mean to be more or less intelligent?
00:08:00.000 | Is it completely coupled to a particular problem
00:08:03.000 | or is there something a little bit more universal?
00:08:05.720 | - Yeah, I do believe all intelligence
00:08:07.440 | is specialized intelligence.
00:08:09.080 | Even human intelligence has some degree of generality.
00:08:12.200 | Well, all intelligent systems have some degree of generality
00:08:15.360 | but they're always specialized in one category of problems.
00:08:19.400 | So the human intelligence is specialized
00:08:21.880 | in the human experience and that shows at various levels.
00:08:25.560 | That shows in some prior knowledge that's innate
00:08:30.200 | that we have at birth.
00:08:32.040 | Knowledge about things like agents, goal-driven behavior,
00:08:37.040 | visual priors about what makes an object,
00:08:40.400 | priors about time and so on.
00:08:43.520 | That shows also in the way we learn.
00:08:45.320 | For instance, it's very, very easy for us
00:08:47.160 | to pick up language.
00:08:48.600 | It's very, very easy for us to learn certain things
00:08:52.040 | because we are basically hard-coded to learn them.
00:08:54.920 | And we are specialized in solving certain kinds of problems
00:08:58.280 | and we are quite useless when it comes
00:09:00.200 | to other kinds of problems.
00:09:01.440 | For instance, we are not really designed
00:09:06.160 | to handle very long-term problems.
00:09:08.800 | We have no capability of seeing the very long-term.
00:09:12.880 | We don't have very much working memory.
00:09:16.880 | - So how do you think about long-term?
00:09:20.080 | Do you think long-term planning,
00:09:21.360 | are we talking about scale of years, millennia?
00:09:24.880 | What do you mean by long-term we're not very good?
00:09:28.120 | - Well, human intelligence is specialized
00:09:29.720 | in the human experience.
00:09:30.720 | And human experience is very short.
00:09:32.600 | Like one lifetime is short.
00:09:34.240 | Even within one lifetime, we have a very hard time
00:09:38.080 | envisioning things on a scale of years.
00:09:41.160 | Like it's very difficult to project yourself
00:09:43.240 | at the scale of five years,
00:09:44.080 | at the scale of 10 years and so on.
00:09:46.160 | - Right.
00:09:47.000 | - We can solve only fairly narrowly scoped problems.
00:09:50.000 | So when it comes to solving bigger problems,
00:09:52.320 | larger scale problems,
00:09:53.760 | we are not actually doing it on an individual level.
00:09:56.360 | So it's not actually our brain doing it.
00:09:59.320 | We have this thing called civilization, right?
00:10:03.080 | Which is itself a sort of problem solving system,
00:10:06.640 | a sort of artificial intelligence system, right?
00:10:10.040 | And it's not running on one brain,
00:10:12.160 | it's running on a network of brains.
00:10:14.120 | In fact, it's running on much more
00:10:15.640 | than a network of brains.
00:10:16.800 | It's running on a lot of infrastructure,
00:10:20.120 | like books and computers and the internet
00:10:23.080 | and human institutions and so on.
00:10:25.840 | And that is capable of handling problems
00:10:30.280 | on a much greater scale than any individual human.
00:10:33.760 | If you look at computer science, for instance,
00:10:37.600 | that's an institution that solves problems
00:10:39.840 | and it is superhuman, right?
00:10:42.560 | It operates at a greater scale;
00:10:44.200 | it can solve much bigger problems
00:10:46.880 | than an individual human could.
00:10:49.080 | And science itself, science as a system,
00:10:51.360 | as an institution is a kind of artificially intelligent
00:10:55.720 | problem solving algorithm that is superhuman.
00:10:59.400 | - Yeah, it's a, at least computer science
00:11:02.800 | is like a theorem prover at a scale of thousands,
00:11:07.760 | maybe hundreds of thousands of human beings.
00:11:10.440 | At that scale, what do you think is an intelligent agent?
00:11:14.680 | So there's us humans at the individual level,
00:11:18.320 | there is millions, maybe billions of bacteria in our skin.
00:11:23.920 | There is, that's at the smaller scale.
00:11:26.440 | You can even go to the particle level
00:11:29.200 | as systems that behave, you can say intelligently
00:11:33.560 | in some ways, and then you can look at Earth
00:11:36.720 | as a single organism, you can look at our galaxy
00:11:39.240 | and even the universe as a single organism.
00:11:42.200 | Do you think, how do you think about scale
00:11:44.680 | and defining intelligent systems?
00:11:46.320 | And we're here at Google, there is millions of devices
00:11:50.480 | doing computation in a distributed way.
00:11:53.440 | How do you think about intelligence versus scale?
00:11:55.920 | - You can always characterize anything as a system.
00:11:59.480 | - Right.
00:12:00.680 | - I think people who talk about things
00:12:03.640 | like intelligence explosion tend to focus on one agent
00:12:07.440 | is basically one brain, like one brain considered
00:12:10.200 | in isolation, like a brain in a jar that's controlling
00:12:12.680 | a body in a very like top to bottom kind of fashion.
00:12:16.320 | And that body is pursuing goals into an environment.
00:12:19.520 | So it's a very hierarchical view.
00:12:20.720 | You have the brain at the top of the pyramid,
00:12:22.880 | then you have the body just plainly receiving orders
00:12:26.000 | and then the body is manipulating objects
00:12:27.640 | in an environment and so on.
00:12:28.920 | So everything is subordinate to this one thing,
00:12:32.960 | this epicenter, which is the brain.
00:12:34.720 | But in real life, intelligent agents
00:12:37.120 | don't really work like this, right?
00:12:39.240 | There is no strong delimitation between the brain
00:12:41.760 | and the body to start with.
00:12:43.400 | You have to look not just at the brain,
00:12:45.000 | but at the nervous system.
00:12:46.560 | But then the nervous system and the body
00:12:48.840 | are not really two separate entities.
00:12:50.760 | So you have to look at an entire animal as one agent,
00:12:53.960 | but then you start realizing as you observe an animal
00:12:57.000 | over any length of time, that a lot of the intelligence
00:13:02.000 | of an animal is actually externalized.
00:13:04.600 | That's especially true for humans.
00:13:06.240 | A lot of our intelligence is externalized.
00:13:08.880 | When you write down some notes,
00:13:10.360 | that is externalized intelligence.
00:13:11.960 | When you write a computer program,
00:13:14.000 | you are externalizing cognition.
00:13:16.000 | So it's externalized in books, it's externalized
00:13:18.280 | in computers, the internet, in other humans.
00:13:21.520 | It's externalized in language and so on.
00:13:25.400 | So there is no hard delimitation
00:13:30.400 | of what makes an intelligent agent.
00:13:32.640 | It's all about context.
00:13:33.880 | - Okay, but AlphaGo is better at Go
00:13:38.720 | than the best human player.
00:13:40.160 | There's levels of skill here.
00:13:44.960 | So do you think there's such a concept
00:13:49.960 | as intelligence explosion in a specific task?
00:13:54.720 | And then, well, yeah, do you think it's possible
00:13:58.560 | to have a category of tasks on which you do have something
00:14:02.040 | like an exponential growth of ability
00:14:05.000 | to solve that particular problem?
00:14:07.440 | - I think if you consider a specific vertical,
00:14:10.320 | it's probably possible to some extent
00:14:14.640 | I also don't think we have to speculate about it
00:14:17.640 | because we have real world examples
00:14:21.640 | of recursively self-improving intelligent systems.
00:14:25.440 | - Right.
00:14:26.280 | - So for instance, science is a problem solving system,
00:14:30.280 | a knowledge generation system,
00:14:31.960 | like a system that experiences the world in some sense
00:14:35.640 | and then gradually understands it and can act on it.
00:14:39.520 | And that system is superhuman
00:14:42.080 | and it is clearly recursively self-improving
00:14:45.560 | because science feeds into technology.
00:14:47.520 | Technology can be used to build better tools,
00:14:50.160 | better computers, better instrumentation and so on,
00:14:52.840 | which in turn can make science faster, right?
00:14:56.680 | So science is probably the closest thing we have today
00:15:00.520 | to a recursively self-improving superhuman AI.
00:15:04.720 | And you can just observe,
00:15:06.480 | is scientific progress today exploding,
00:15:10.280 | which itself is an interesting question.
00:15:12.760 | And you can use that as a basis to try to understand
00:15:15.520 | what will happen with a superhuman AI
00:15:17.840 | that has science-like behavior.
00:15:20.960 | - Let me linger on it a little bit more.
00:15:23.280 | What is your intuition why an intelligence explosion
00:15:27.560 | is not possible?
00:15:28.480 | Like taking the scientific,
00:15:30.880 | all the scientific revolutions,
00:15:33.200 | why can't we slightly accelerate that process?
00:15:38.080 | - So you can absolutely accelerate
00:15:41.200 | any problem-solving process.
00:15:43.120 | So recursive self-improvement is absolutely a real thing.
00:15:48.120 | But what happens with a recursively self-improving system
00:15:51.880 | is typically not explosion
00:15:53.680 | because no system exists in isolation.
00:15:56.480 | And so tweaking one part of the system
00:15:58.640 | means that suddenly another part of the system
00:16:00.880 | becomes a bottleneck.
00:16:02.160 | And if you look at science, for instance,
00:16:03.760 | which is clearly recursively self-improving,
00:16:06.800 | clearly a problem-solving system,
00:16:09.040 | scientific progress is not actually exploding.
00:16:12.000 | If you look at science,
00:16:13.520 | what you see is the picture of a system
00:16:16.480 | that is consuming an exponentially increasing
00:16:19.240 | amount of resources.
00:16:20.520 | But it's having a linear output
00:16:23.920 | in terms of scientific progress.
00:16:25.960 | And maybe that will seem like a very strong claim.
00:16:28.920 | Many people are actually saying that,
00:16:31.120 | scientific progress is exponential,
00:16:34.520 | but when they're claiming this,
00:16:36.120 | they're actually looking at indicators
00:16:38.400 | of resource consumption by science.
00:16:43.080 | For instance, the number of papers being published,
00:16:46.680 | the number of patents being filed and so on,
00:16:49.960 | which are just completely correlated
00:16:53.600 | with how many people are working on science today.
00:16:58.480 | So it's actually an indicator of resource consumption.
00:17:00.640 | But what you should look at is the output,
00:17:03.200 | is progress in terms of the knowledge that science generates
00:17:08.040 | in terms of the scope and significance
00:17:10.640 | of the problems that we solve.
00:17:12.520 | And some people have actually been trying to measure that.
00:17:16.720 | Like Michael Nielsen, for instance.
00:17:20.160 | He had a very nice paper,
00:17:21.920 | I think that was last year about it.
00:17:23.720 | So his approach to measure scientific progress
00:17:28.360 | was to look at the timeline of scientific discoveries
00:17:33.360 | over the past 100, 150 years.
00:17:37.160 | And for each measured discovery,
00:17:41.360 | ask a panel of experts
00:17:43.480 | to rate the significance of the discovery.
00:17:46.760 | And if the output of science as an institution
00:17:49.600 | were exponential, you would expect
00:17:51.640 | the temporal density of significance
00:17:56.600 | to go up exponentially.
00:17:58.160 | Maybe because there's a faster rate of discoveries,
00:18:00.960 | maybe because the discoveries are increasingly
00:18:03.560 | more important.
00:18:04.920 | And what actually happens if you plot
00:18:07.840 | this temporal density of significance measured in this way,
00:18:11.320 | is that you see very much a flat graph.
00:18:14.520 | You see a flat graph across all disciplines,
00:18:16.600 | across physics, biology, medicine, and so on.
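A minimal sketch of the kind of measurement described here, using made-up (year, expert-rated significance) pairs rather than Nielsen's actual data or binning; the point is only how a "temporal density of significance" could be tallied:

```python
from collections import defaultdict

# Hypothetical (year, expert-rated significance) pairs -- illustrative only,
# not Nielsen's dataset or methodology.
discoveries = [(1905, 9.5), (1915, 9.0), (1953, 8.7), (1964, 7.2), (1998, 6.9)]

def temporal_density(discoveries, bin_size=10):
    """Sum rated significance per time bin (e.g., per decade)."""
    density = defaultdict(float)
    for year, significance in discoveries:
        density[(year // bin_size) * bin_size] += significance
    return dict(sorted(density.items()))

# If scientific output were exponential, these per-decade totals would grow
# over time; the claim here is that they stay roughly flat.
print(temporal_density(discoveries))
```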
00:18:19.720 | And it actually makes a lot of sense if you think about it,
00:18:23.280 | because think about the progress of physics
00:18:26.000 | 110 years ago, right?
00:18:28.000 | It was a time of crazy change.
00:18:30.040 | Think about the progress of technology,
00:18:31.960 | you know, 170 years ago,
00:18:34.360 | when we started having, you know,
00:18:35.360 | replacing horses with cars,
00:18:37.560 | when we started having electricity and so on.
00:18:40.000 | It was a time of incredible change.
00:18:41.520 | And today is also a time of very, very fast change,
00:18:44.600 | but it would be an unfair characterization
00:18:48.040 | to say that today, technology and science
00:18:50.560 | are moving way faster than they did 50 years ago,
00:18:52.920 | 100 years ago.
00:18:54.360 | And if you do try to
00:18:58.240 | rigorously plot the temporal density
00:19:01.520 | of the significance,
00:19:04.840 | yeah, of significant ideas, sorry,
00:19:07.360 | you do see very flat curves.
00:19:09.720 | - That's fascinating.
00:19:10.560 | - And you can check out the paper that Michael Nielsen
00:19:13.800 | had about this idea.
00:19:16.000 | And so the way I interpret it is,
00:19:20.000 | as you make progress,
00:19:22.600 | you know, in a given field
00:19:24.200 | or in a given subfield of science,
00:19:26.120 | it becomes exponentially more difficult
00:19:28.680 | to make further progress.
00:19:30.440 | Like the very first person to work on information theory.
00:19:35.000 | If you enter a new field,
00:19:36.440 | and it's still the very early years,
00:19:37.920 | there's a lot of low hanging fruit you can pick.
00:19:41.160 | - That's right, yeah.
00:19:42.000 | - But the next generation of researchers
00:19:43.960 | is gonna have to dig much harder actually
00:19:48.120 | to make smaller discoveries,
00:19:50.160 | probably larger numbers of smaller discoveries.
00:19:52.640 | And to achieve the same amount of impact,
00:19:54.640 | you're gonna need a much greater headcount.
00:19:57.480 | And that's exactly the picture you're seeing with science,
00:20:00.040 | is that the number of scientists and engineers
00:20:03.760 | is in fact increasing exponentially.
00:20:06.520 | The amount of computational resources
00:20:08.400 | that are available to science
00:20:10.040 | is increasing exponentially and so on.
00:20:11.840 | So the resource consumption of science is exponential,
00:20:15.560 | but the output in terms of progress,
00:20:18.160 | in terms of significance is linear.
00:20:20.960 | And the reason why is because,
00:20:23.080 | and even though science is recursively self-improving,
00:20:25.960 | meaning that scientific progress
00:20:28.400 | turns into technological progress,
00:20:30.200 | which in turn helps science.
00:20:32.920 | If you look at computers, for instance,
00:20:35.240 | are a product of science,
00:20:37.760 | and computers are tremendously useful
00:20:40.320 | in speeding up science.
00:20:41.520 | The internet, same thing.
00:20:42.680 | The internet is a technology that's made possible
00:20:44.640 | by very recent scientific advances.
00:20:47.440 | And itself, because it enables scientists to network,
00:20:52.400 | to communicate, to exchange papers and ideas much faster,
00:20:55.520 | it is a way to speed up scientific progress.
00:20:57.400 | So even though you're looking
00:20:58.400 | at a recursively self-improving system,
00:21:01.400 | it is consuming exponentially more resources
00:21:04.080 | to produce the same amount of problem-solving, very much so.
00:21:09.080 | - So that's a fascinating way to paint it.
00:21:11.080 | And certainly that holds for the deep learning community.
00:21:14.920 | If you look at the temporal, what did you call it?
00:21:18.080 | The temporal density of significant ideas.
00:21:21.200 | If you look at in deep learning,
00:21:23.880 | I think, I'd have to think about that,
00:21:26.920 | but if you really look at significant ideas
00:21:29.000 | in deep learning, they might even be decreasing.
00:21:32.360 | - So I do believe the per paper significance is decreasing.
00:21:37.360 | But the amount of papers is still today
00:21:42.360 | exponentially increasing.
00:21:43.400 | So I think if you look at an aggregate,
00:21:45.840 | my guess is that you would see a linear progress.
00:21:48.840 | - Linear progress.
00:21:49.680 | - If you were to sum the significance of all papers,
00:21:53.800 | you would see a roughly linear progress.
00:21:58.600 | And in my opinion, it is not a coincidence
00:22:03.600 | that you're seeing linear progress in science
00:22:05.760 | despite exponential resource consumption.
00:22:07.680 | I think the resource consumption
00:22:10.280 | is dynamically adjusting itself to maintain linear progress
00:22:15.280 | because we as a community expect linear progress,
00:22:18.520 | meaning that if we start investing less
00:22:21.240 | and seeing less progress,
00:22:22.320 | it means that suddenly there are some lower hanging fruits
00:22:25.720 | that become available and someone's gonna step up
00:22:29.600 | and pick them.
00:22:31.240 | So it's very much like a market for discoveries and ideas.
00:22:36.240 | - But there's another fundamental part
00:22:38.680 | which you're highlighting,
00:22:39.760 | which is a hypothesis that science,
00:22:42.600 | or like the space of ideas,
00:22:45.120 | any one path you travel down,
00:22:48.120 | it gets exponentially more difficult
00:22:51.040 | to develop new ideas.
00:22:54.680 | And your sense is that's gonna hold
00:22:57.600 | across our mysterious universe.
00:23:01.480 | - Yes, well, exponential progress
00:23:03.320 | triggers exponential friction.
00:23:05.440 | So that if you tweak one part of the system,
00:23:07.400 | suddenly some other part becomes a bottleneck.
00:23:10.680 | For instance, let's say we develop some device
00:23:14.880 | that measures its own acceleration
00:23:17.160 | and then it has some engine
00:23:18.720 | and it outputs even more acceleration
00:23:20.800 | in proportion of its own acceleration
00:23:22.360 | and you drop it somewhere,
00:23:23.320 | it's not gonna reach infinite speed
00:23:25.240 | because it exists in a certain context.
00:23:27.880 | So the air around it is gonna generate friction.
00:23:31.000 | It's gonna block it at some top speed.
00:23:34.320 | And even if you were to consider the broader context
00:23:37.520 | and lift the bottleneck there,
00:23:39.880 | like the bottleneck of friction,
00:23:42.280 | then some other part of the system
00:23:45.200 | would start stepping in
00:23:46.440 | and creating exponential friction,
00:23:48.160 | maybe the speed of light, or whatever.
00:23:49.960 | And this definitely holds true
00:23:51.960 | when you look at the problem-solving algorithm
00:23:55.000 | that is being run by science as an institution,
00:23:58.200 | science as a system.
00:23:59.760 | As you make more and more progress,
00:24:02.080 | despite having this recursive self-improvement component,
00:24:05.920 | you are encountering exponential friction.
00:24:09.440 | Like the more researchers you have
00:24:11.600 | working on different ideas,
00:24:13.600 | the more overhead you have
00:24:15.000 | in terms of communication across researchers.
00:24:18.160 | If you look at,
00:24:19.000 | you were mentioning quantum mechanics, right?
00:24:23.040 | Well, if you want to start making significant discoveries
00:24:26.960 | today, significant progress in quantum mechanics,
00:24:29.800 | there is an amount of knowledge
00:24:31.880 | you have to ingest, which is huge.
00:24:34.160 | So there's a very large overhead
00:24:36.600 | to even start to contribute.
00:24:39.320 | There's a large amount of overhead
00:24:40.760 | to synchronize across researchers and so on.
00:24:44.120 | And of course, the significant practical experiments
00:24:47.520 | are going to require exponentially expensive equipment
00:24:52.240 | because the easier ones have already been run, right?
00:24:56.600 | - So in your sense,
00:24:59.360 | there's no way of escaping,
00:25:01.840 | there's no way of escaping this kind of friction
00:25:05.720 | with artificial intelligence systems.
00:25:09.840 | - Yeah, no, I think science is a very good way
00:25:12.800 | to model what would happen
00:25:14.080 | with a superhuman recursively self-improving AI.
00:25:17.760 | - That's your sense, I mean, the--
00:25:19.800 | - That's my intuition.
00:25:20.880 | It's not like a mathematical proof of anything.
00:25:24.680 | That's not my point.
00:25:25.720 | Like, I'm not trying to prove anything.
00:25:27.360 | I'm just trying to make an argument,
00:25:28.800 | to question the narrative of intelligence explosion,
00:25:32.120 | which is quite a dominant narrative.
00:25:33.800 | And you do get a lot of pushback if you go against it.
00:25:36.760 | Because, so for many people, right,
00:25:40.200 | AI is not just a subfield of computer science.
00:25:43.120 | It's more like a belief system.
00:25:44.920 | Like this belief that the world is headed towards an event,
00:25:49.560 | the singularity, past which, you know,
00:25:52.800 | AI will become, will go exponential very much
00:25:58.040 | and the world will be transformed
00:25:59.520 | and humans will become obsolete.
00:26:01.920 | And if you go against this narrative,
00:26:04.840 | because it is not really a scientific argument,
00:26:07.840 | but more of a belief system,
00:26:09.960 | it is part of the identity of many people.
00:26:12.200 | If you go against this narrative,
00:26:13.560 | it's like you're attacking the identity
00:26:15.320 | of people who believe in it.
00:26:16.480 | It's almost like saying God doesn't exist or something.
00:26:19.240 | - Right.
00:26:20.280 | - So you do get a lot of pushback
00:26:22.960 | if you try to question these ideas.
00:26:25.080 | - First of all, I believe most people,
00:26:27.640 | they might not be as eloquent or explicit as you're being,
00:26:30.280 | but most people in computer science
00:26:32.000 | and most people who actually have built
00:26:34.040 | anything that you could call AI, quote unquote,
00:26:37.400 | would agree with you.
00:26:39.160 | They might not be describing in the same kind of way.
00:26:41.600 | It's more, so the pushback you're getting
00:26:45.040 | is from people who get attached to the narrative
00:26:48.800 | from not from a place of science,
00:26:51.060 | but from a place of imagination.
00:26:53.440 | - That's correct, that's correct.
00:26:54.800 | - So why do you think that's so appealing?
00:26:56.960 | Because the usual dreams that people have
00:27:01.960 | when you create a superintelligent system
00:27:04.000 | past the singularity, that what people imagine
00:27:06.880 | is somehow always destructive.
00:27:08.620 | Do you have, if you were put on your psychology hat,
00:27:12.280 | what's, why is it so appealing to imagine
00:27:17.280 | the ways that all of human civilization will be destroyed?
00:27:20.800 | - I think it's a good story.
00:27:22.120 | You know, it's a good story.
00:27:23.160 | And very interestingly, it mirrors
00:27:26.480 | religious stories, right?
00:27:28.600 | Religious mythology.
00:27:30.600 | If you look at the mythology of most civilizations,
00:27:34.400 | it's about the world being headed towards some final events
00:27:38.320 | in which the world will be destroyed
00:27:40.520 | and some new world order will arise
00:27:42.840 | that will be mostly spiritual,
00:27:44.960 | like the apocalypse followed by paradise probably, right?
00:27:49.480 | It's a very appealing story on a fundamental level.
00:27:52.640 | And we all need stories.
00:27:54.600 | We all need stories to structure the way we see the world,
00:27:58.160 | especially at timescales that are beyond
00:28:01.760 | our ability to make predictions, right?
00:28:04.520 | - So on a more serious, non-exponential explosion question,
00:28:09.520 | do you think there will be a time
00:28:14.960 | when we'll create something like human-level intelligence
00:28:19.800 | or intelligent systems that will make you sit back
00:28:23.800 | and be just surprised at damn how smart this thing is?
00:28:28.480 | That doesn't require exponential growth
00:28:30.160 | or exponential improvement,
00:28:32.120 | but what's your sense of the timeline and so on
00:28:35.560 | that you'll be really surprised at certain capabilities?
00:28:41.040 | And we'll talk about limitations in deep learning.
00:28:42.520 | So when do you, do you think in your lifetime
00:28:44.440 | you'll be really damn surprised?
00:28:46.600 | - Around 2013, 2014, I was many times surprised
00:28:51.400 | by the capabilities of deep learning, actually.
00:28:53.680 | - Yeah.
00:28:54.520 | - That was before we had assessed exactly
00:28:55.840 | what deep learning could do and could not do.
00:28:57.840 | And it felt like a time of immense potential.
00:29:00.560 | And then we started narrowing it down,
00:29:03.040 | but I was very surprised.
00:29:04.320 | So I would say it has already happened.
00:29:07.120 | - Was there a moment, there must've been a day in there
00:29:10.800 | where your surprise was almost bordering on the belief
00:29:15.800 | of the narrative that we just discussed.
00:29:19.040 | Was there a moment, 'cause you've written quite eloquently
00:29:22.400 | about the limits of deep learning.
00:29:23.960 | Was there a moment that you thought
00:29:25.760 | that maybe deep learning is limitless?
00:29:27.720 | - No, I don't think I've ever believed this.
00:29:32.400 | What was really shocking is that it's worked.
00:29:35.240 | - It worked at all, yeah.
00:29:36.320 | - Yeah.
00:29:37.640 | But there's a big jump between being able to do
00:29:41.600 | really good computer vision and human level intelligence.
00:29:44.880 | So I don't think at any point I was under the impression
00:29:49.480 | that the results we got in computer vision
00:29:51.280 | meant that we were very close to human level intelligence.
00:29:54.120 | I don't think we were very close to human level intelligence.
00:29:56.080 | I do believe that there's no reason why we won't achieve it
00:30:00.400 | at some point.
00:30:01.800 | I also believe that, you know, it's the problem
00:30:06.440 | with talking about human level intelligence
00:30:08.600 | that implicitly you're considering like an axis
00:30:12.160 | of intelligence with different levels.
00:30:14.400 | But that's not really how intelligence works.
00:30:16.760 | Intelligence is very multidimensional.
00:30:19.520 | And so there's the question of capabilities,
00:30:22.520 | but there's also the question of being human-like.
00:30:25.600 | And it's two very different things.
00:30:27.080 | Like you can build potentially very advanced
00:30:29.760 | intelligent agents that are not human-like at all.
00:30:32.720 | And you can also build very human-like agents.
00:30:35.280 | And these are two very different things, right?
00:30:37.920 | - Right.
00:30:38.800 | Let's go from the philosophical to the practical.
00:30:42.280 | Can you give me a history of Keras
00:30:44.280 | and all the major deep learning frameworks
00:30:46.520 | that you kind of remember in relation to Keras
00:30:48.560 | and in general, TensorFlow, Theano, the old days.
00:30:52.080 | Can you give a brief overview, Wikipedia style history
00:30:55.440 | and your role in it before we return to AGI discussions?
00:30:59.160 | - Yeah, that's a broad topic.
00:31:00.720 | So I started working on Keras.
00:31:04.080 | It was not named Keras at the time.
00:31:06.240 | I actually picked the name like just the day
00:31:08.880 | I was going to release it.
00:31:10.240 | So I started working on it in February, 2015.
00:31:14.840 | And so at the time, there weren't too many people
00:31:17.280 | working on deep learning, maybe like fewer than 10,000.
00:31:20.400 | The software tooling was not really developed.
00:31:22.920 | So the main deep learning library was Caffe,
00:31:28.880 | which was mostly C++.
00:31:30.920 | - Why do you say Caffe was the main one?
00:31:32.840 | - Caffe was vastly more popular than Theano
00:31:36.080 | in late 2014, early 2015.
00:31:38.960 | Caffe was the one library that everyone was using
00:31:42.440 | for computer vision.
00:31:43.480 | - And computer vision was the most popular problem
00:31:46.200 | that you were working at the time.
00:31:47.040 | - Absolutely, like ConvNets was like the subfield
00:31:49.920 | of deep learning that everyone was working on.
00:31:53.160 | So myself, so in late 2014, I was actually interested
00:31:58.160 | in RNNs, in Recurrent Neural Networks,
00:32:01.760 | which was a very niche topic at the time.
00:32:05.800 | It really took off around 2016.
00:32:08.640 | And so I was looking for good tools.
00:32:11.560 | I had used Torch 7, I had used Theano,
00:32:14.800 | used Theano a lot in Kaggle competitions.
00:32:17.600 | I had used Caffe, and there was no good solution
00:32:24.320 | for RNNs at the time.
00:32:26.160 | There was no reusable open source implementation
00:32:28.800 | of an LSTM, for instance.
00:32:30.160 | So I decided to build my own.
00:32:33.080 | And at first, the pitch for that was,
00:32:35.600 | it was gonna be mostly around LSTM,
00:32:39.080 | Recurrent Neural Networks.
00:32:40.120 | It was gonna be in Python.
00:32:41.520 | An important decision at the time
00:32:44.480 | that was kind of not obvious is that the models
00:32:47.080 | would be defined via Python code,
00:32:50.520 | which was kind of like going against the mainstream
00:32:54.560 | at the time because Caffe, Pylearn2, and so on,
00:32:58.160 | like all the big libraries were actually going
00:33:00.800 | with the approach of having static configuration files
00:33:03.680 | in YAML to define models.
00:33:05.760 | So some libraries were using code to define models,
00:33:09.040 | like Torch 7, obviously, but that was not
00:33:11.080 | Python.
00:33:12.480 | Lasagne was like a Theano-based, very early library
00:33:16.880 | that was, I think, developed, I'm not sure exactly,
00:33:18.840 | probably late 2014.
00:33:20.440 | - It's Python as well.
00:33:21.360 | - It's Python as well.
00:33:22.200 | It was like on top of Theano.
00:33:24.680 | And so I started working on something,
00:33:28.360 | and the value proposition at the time was that
00:33:32.520 | not only that what I think was the first reusable
00:33:36.920 | open source implementation of LSTM,
00:33:40.400 | you could combine RNNs and ConvNets
00:33:44.440 | with the same library, which was not really possible before.
00:33:46.920 | Like Caffe was only doing ConvNets.
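A minimal sketch of the kind of model being described, combining a convolutional front end with an LSTM in one library; it uses today's tf.keras API and made-up shapes, not the original 2015 Keras code:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical task: classify short sequences of 64x64 RGB frames by running
# a small ConvNet over each frame and an LSTM over the resulting sequence.
model = tf.keras.Sequential([
    layers.TimeDistributed(layers.Conv2D(16, 3, activation="relu"),
                           input_shape=(None, 64, 64, 3)),
    layers.TimeDistributed(layers.GlobalAveragePooling2D()),
    layers.LSTM(32),
    layers.Dense(10, activation="softmax"),
])
model.summary()
```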
00:33:49.080 | And it was kind of easy to use because,
00:33:53.040 | so before I was using Theano, I was actually using Scikit-Learn
00:33:55.680 | and I loved Scikit-Learn for its usability.
00:33:58.320 | So I drew a lot of inspiration from Scikit-Learn
00:34:01.560 | when I made Keras.
00:34:02.400 | It's almost like Scikit-Learn for neural networks.
00:34:04.960 | - Yeah, the fit function.
00:34:06.680 | - Exactly, the fit function.
00:34:07.960 | Like reducing a complex training loop
00:34:10.800 | to a single function call, right?
00:34:12.880 | And of course, some people will say,
00:34:14.880 | this is hiding a lot of details,
00:34:16.320 | but that's exactly the point, right?
00:34:18.680 | The magic is the point.
00:34:20.280 | So it's magical, but in a good way.
00:34:22.640 | It's magical in the sense that it's delightful, right?
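A sketch of that "single function call" point with the current tf.keras API; the data here is random placeholder data, just to show that compile plus fit stands in for a hand-written training loop:

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for a real dataset.
x = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 10, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# compile + fit replaces a hand-written loop: batching, gradient computation,
# weight updates, metric tracking, and validation are all handled for you.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=3, batch_size=32, validation_split=0.1)
```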
00:34:25.560 | - Yeah, I'm actually quite surprised.
00:34:27.600 | I didn't know that it was born out of desire
00:34:29.560 | to implement RNNs and LSTMs.
00:34:32.440 | - It was.
00:34:33.280 | - That's fascinating.
00:34:34.120 | So you were actually one of the first people
00:34:36.000 | to really try to attempt
00:34:37.920 | to get the major architectures together.
00:34:40.960 | And it's also interesting,
00:34:42.720 | you made me realize that that was a design decision at all,
00:34:45.120 | is defining the model and code.
00:34:47.320 | Just, I'm putting myself in your shoes,
00:34:49.880 | whether the YAML, especially if Caffe was the most popular.
00:34:53.200 | - It was the most popular by far at the time.
00:34:56.040 | - If I were, yeah, I didn't like the YAML thing,
00:34:59.560 | but it makes more sense that you will put
00:35:02.840 | in a configuration file the definition of a model.
00:35:05.720 | That's an interesting gutsy move
00:35:07.200 | to stick with defining it in code.
00:35:10.040 | Just if you look back.
00:35:11.400 | - Other libraries were doing it as well,
00:35:13.480 | but it was definitely the more niche option.
00:35:16.320 | - Yeah, okay, Keras and then--
00:35:18.360 | - Keras, so I released Keras in March, 2015,
00:35:21.520 | and it got users pretty much from the start.
00:35:24.160 | So the deep learning community
00:35:25.080 | was very, very small at the time.
00:35:27.240 | Lots of people were starting to be interested in LSTM.
00:35:30.600 | So it was released at the right time
00:35:32.440 | because it was offering an easy to use LSTM implementation.
00:35:35.560 | Exactly at the time where lots of people started
00:35:37.680 | to be intrigued by the capabilities of RNNs for NLP.
00:35:42.280 | So it grew from there.
00:35:43.960 | Then I joined Google about six months later,
00:35:51.520 | and that was actually completely unrelated to Keras.
00:35:54.960 | I actually joined a research team
00:35:57.120 | working on image classification,
00:35:59.560 | mostly like computer vision.
00:36:00.720 | So I was doing computer vision research at Google initially.
00:36:03.680 | And immediately when I joined Google,
00:36:05.520 | I was exposed to the early internal version of TensorFlow.
00:36:10.520 | And the way it appeared to me at the time,
00:36:13.920 | and it was definitely the way it was at the time,
00:36:15.720 | is that this was an improved version of Theano.
00:36:20.720 | So I immediately knew I had to port Keras
00:36:24.720 | to this new TensorFlow thing.
00:36:26.800 | And I was actually very busy as a new Googler.
00:36:31.640 | So I had no time to work on that.
00:36:34.520 | But then in November, I think it was November 2015,
00:36:38.680 | TensorFlow got released.
00:36:41.240 | And it was kind of like my wake up call that,
00:36:44.720 | hey, I had to actually go and make it happen.
00:36:47.320 | So in December, I ported Keras to run on top of TensorFlow.
00:36:52.320 | But it was not exactly a port,
00:36:53.320 | it was more like a refactoring
00:36:55.280 | where I was abstracting away all the backend functionality
00:36:59.440 | into one module so that the same code base
00:37:02.320 | could run on top of multiple backends.
00:37:04.200 | So on top of TensorFlow or Theano.
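An illustrative sketch of that refactoring pattern, not the actual Keras backend code (the module and function names here are hypothetical): all low-level ops go through one backend module, so the rest of the code base never mentions TensorFlow or Theano directly.

```python
# backend.py -- hypothetical sketch, not the real Keras backend module.
import os

_BACKEND = os.environ.get("MYLIB_BACKEND", "tensorflow")

if _BACKEND == "tensorflow":
    import tensorflow as tf

    def matmul(a, b):
        return tf.matmul(a, b)

    def relu(x):
        return tf.nn.relu(x)
else:
    # A Theano-style backend would define the same function names here.
    raise NotImplementedError(f"Backend '{_BACKEND}' not wired up in this sketch")

# layers.py, models.py, etc. only ever call backend.matmul / backend.relu,
# so the same high-level code runs on whichever backend is selected.
```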
00:37:07.440 | And for the next year, Theano stayed as the default option.
00:37:12.440 | It was easier to use, somewhat less buggy.
00:37:20.400 | It was much faster, especially when it came to RNNs.
00:37:23.360 | But eventually, TensorFlow overtook it.
00:37:27.440 | - And TensorFlow, the early TensorFlow,
00:37:30.160 | has similar architectural decisions as Theano.
00:37:33.920 | So it was a natural transition.
00:37:37.400 | - Yeah, absolutely.
00:37:38.240 | - So what, I mean, that's still Keras
00:37:41.680 | as a side, almost fun project, right?
00:37:45.240 | - Yeah, so it was not my job assignment.
00:37:48.960 | I was doing it on the side.
00:37:52.160 | And even though it grew to have a lot of users
00:37:55.800 | for a deep learning library at the time, like throughout 2016,
00:37:59.560 | but I wasn't doing it as my main job.
00:38:02.440 | So things started changing in,
00:38:04.720 | I think it must have been maybe October 2016.
00:38:09.720 | So one year later.
00:38:11.280 | So Rajat, who was the lead on TensorFlow,
00:38:15.200 | basically showed up one day in our building
00:38:19.200 | where I was doing research and things like,
00:38:21.600 | so I did a lot of computer vision research,
00:38:24.600 | also collaborations with Christian Zugedi
00:38:27.560 | and deep learning for theorem proving.
00:38:29.640 | It was a really interesting research topic.
00:38:32.880 | And so Rajat was saying, "Hey, we saw Keras, we like it.
00:38:39.520 | We saw that you're at Google.
00:38:42.400 | Why don't you come over for like a quarter
00:38:45.280 | and work with us?"
00:38:47.280 | And I was like, "Yeah, that sounds like a great opportunity.
00:38:49.240 | Let's do it."
00:38:50.400 | And so I started working on integrating the Keras API
00:38:55.400 | into TensorFlow more tightly.
00:38:57.320 | So what followed up is a sort of like temporary
00:39:02.640 | TensorFlow only version of Keras
00:39:05.480 | that was in TensorFlow contrib for a while.
00:39:09.320 | And finally moved to TensorFlow Core.
00:39:12.200 | And I've never actually gotten back
00:39:15.360 | to my old team doing research.
00:39:17.560 | - Well, it's kind of funny that somebody like you
00:39:22.280 | who dreams of, or at least sees the power of AI systems
00:39:27.280 | that reason and theorem proving we'll talk about
00:39:31.640 | has also created a system that makes
00:39:34.600 | the most basic kind of Lego building
00:39:39.000 | that is deep learning, super accessible, super easy.
00:39:42.600 | So beautifully so.
00:39:43.800 | It's a funny irony that you're both,
00:39:47.200 | you're responsible for both things.
00:39:49.080 | But so TensorFlow 2.0 is kind of, there's a sprint.
00:39:53.960 | I don't know how long it'll take,
00:39:55.000 | but there's a sprint towards the finish.
00:39:56.920 | What do you look, what are you working on these days?
00:40:01.000 | What are you excited about?
00:40:02.120 | What are you excited about in 2.0?
00:40:04.200 | I mean, eager execution.
00:40:05.720 | There's so many things that just make it a lot easier
00:40:08.400 | to work.
00:40:09.720 | What are you excited about?
00:40:11.520 | And what's also really hard?
00:40:13.560 | What are the problems you have to kind of solve?
00:40:15.760 | - So I've spent the past year and a half
00:40:17.960 | working on TensorFlow 2.0.
00:40:20.840 | It's been a long journey.
00:40:22.880 | I'm actually extremely excited about it.
00:40:25.040 | I think it's a great product.
00:40:26.400 | It's a delightful product compared to TensorFlow 1.0.
00:40:29.320 | We've made huge progress.
00:40:31.400 | So on the Keras side, what I'm really excited about is that,
00:40:37.360 | so, you know, previously Keras has been this very easy
00:40:42.040 | to use high-level interface to do deep learning.
00:40:46.040 | But if you wanted to, you know,
00:40:50.760 | if you wanted a lot of flexibility,
00:40:53.280 | the Keras framework, you know,
00:40:55.600 | was probably not the optimal way to do things
00:40:58.640 | compared to just writing everything from scratch.
00:41:01.120 | So in some way, the framework was getting in the way.
00:41:04.920 | And in TensorFlow 2.0, you don't have this at all,
00:41:07.760 | actually, you have the usability of the high-level interface,
00:41:11.280 | but you have the flexibility of this lower-level interface.
00:41:14.720 | And you have this spectrum of workflows
00:41:17.120 | where you can get more or less usability
00:41:21.760 | and flexibility trade-offs depending on your needs, right?
00:41:26.760 | You can write everything from scratch
00:41:29.880 | and you get a lot of help doing so by, you know,
00:41:33.200 | subclassing models and writing some train loops
00:41:36.640 | using eager execution.
00:41:38.400 | It's very flexible.
00:41:39.320 | It's very easy to debug.
00:41:40.360 | It's very powerful.
00:41:41.360 | But all of this integrates seamlessly
00:41:44.960 | with higher-level features up to, you know,
00:41:47.800 | the classic Keras workflows,
00:41:49.400 | which are very scikit-learn-like
00:41:51.520 | and, you know, are ideal for a data scientist,
00:41:56.040 | machine learning engineer type of profile.
00:41:58.200 | So now you can have the same framework
00:42:00.800 | offering the same set of APIs
00:42:02.840 | that enable a spectrum of workflows
00:42:05.000 | that are more or less low-level, more or less high-level
00:42:08.560 | that are suitable for, you know,
00:42:10.440 | profiles ranging from researchers to data scientists
00:42:14.400 | and everything in between.
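A minimal sketch of the lower end of that spectrum in TensorFlow 2.0: a subclassed model plus a hand-written training step using eager execution and GradientTape; the shapes and data here are placeholders.

```python
import tensorflow as tf

class TinyClassifier(tf.keras.Model):
    """Subclassed model: full flexibility over how the forward pass is defined."""
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(64, activation="relu")
        self.out = tf.keras.layers.Dense(10)

    def call(self, x):
        return self.out(self.hidden(x))

model = TinyClassifier()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def train_step(x, y):
    # Eager execution: this runs op by op, so it is easy to step through and debug.
    with tf.GradientTape() as tape:
        logits = model(x)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# Placeholder batch.
x = tf.random.normal((32, 20))
y = tf.random.uniform((32,), maxval=10, dtype=tf.int32)
print(float(train_step(x, y)))
```

The same model could instead be compiled and trained with a single fit call, which is the higher-level end of the spectrum described here.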
00:42:15.560 | - Yeah, so that's super exciting.
00:42:16.960 | I mean, it's not just that,
00:42:18.400 | it's connected to all kinds of tooling.
00:42:21.680 | You can go on mobile, you can go with TensorFlow Lite,
00:42:24.520 | you can go in the cloud with serving and so on,
00:42:27.240 | and all is connected together.
00:42:28.960 | And some of the best software written ever
00:42:31.880 | is often done by one person, sometimes two.
00:42:37.240 | So with Google, you're now seeing sort of Keras
00:42:40.760 | having to be integrated in TensorFlow,
00:42:42.800 | which I'm sure has a ton of engineers working on it.
00:42:46.400 | So, and there's, I'm sure,
00:42:48.320 | a lot of tricky design decisions to be made.
00:42:52.160 | How does that process usually happen
00:42:54.400 | from at least your perspective?
00:42:56.720 | What are the debates like?
00:42:59.720 | Is there a lot of thinking,
00:43:04.160 | considering different options and so on?
00:43:06.840 | - Yes.
00:43:08.200 | So a lot of the time I spend at Google
00:43:12.600 | is actually discussing design discussions, right?
00:43:17.240 | Writing design docs,
00:43:18.560 | participating in design review meetings and so on.
00:43:22.040 | This is, you know, as important
00:43:23.720 | as actually writing the code.
00:43:25.200 | - Right.
00:43:26.040 | - So there's a lot of thoughts,
00:43:27.200 | there's a lot of thoughts and a lot of care
00:43:29.280 | that is taken in coming up with these decisions
00:43:34.120 | and taking into account all of our users
00:43:37.080 | because TensorFlow has this extremely diverse user base,
00:43:40.640 | right?
00:43:41.480 | It's not like just one user segment
00:43:43.040 | where everyone has the same needs.
00:43:45.400 | We have small scale production users,
00:43:47.560 | large scale production users.
00:43:49.440 | We have startups, we have researchers,
00:43:52.760 | you know, it's all over the place.
00:43:55.000 | And we have to cater to all of their needs.
00:43:57.480 | - If I just look at the standard debates of C++ or Python,
00:44:02.200 | there's some heated debates.
00:44:03.920 | Do you have those at Google?
00:44:05.920 | I mean, they're not heated in terms of emotionally,
00:44:08.000 | but there's probably multiple ways to do it right.
00:44:10.720 | So how do you arrive through those design meetings
00:44:13.960 | at the best way to do it?
00:44:15.360 | Especially in deep learning where the field is evolving
00:44:19.200 | as you're doing it.
00:44:20.800 | Is there some magic to it?
00:44:23.520 | There's some magic to the process?
00:44:25.240 | - I don't know if there's magic to the process,
00:44:28.200 | but there definitely is a process.
00:44:30.680 | So making design decisions
00:44:33.920 | about satisfying a set of constraints,
00:44:36.120 | but also trying to do so in the simplest way possible,
00:44:39.960 | because this is what can be maintained,
00:44:42.280 | this is what can be expanded in the future.
00:44:45.000 | So you don't want to naively satisfy the constraints
00:44:49.160 | by just, you know, for each capability you need available,
00:44:51.960 | you're gonna come up with one argument,
00:44:53.480 | a new API and so on.
00:44:54.760 | You want to design APIs that are modular
00:44:59.560 | and hierarchical so that they have an API surface
00:45:04.120 | that is as small as possible.
00:45:06.080 | And you want this modular hierarchical architecture
00:45:11.680 | to reflect the way that domain experts
00:45:14.600 | think about the problem.
00:45:16.440 | Because as a domain expert,
00:45:17.920 | when you are reading about a new API,
00:45:19.880 | you're reading a tutorial or some docs pages,
00:45:23.680 | you already have a way
00:45:26.400 | that you're thinking about the problem.
00:45:28.240 | You already have like certain concepts in mind
00:45:32.360 | and you're thinking about how they relate together.
00:45:35.720 | And when you're reading docs,
00:45:37.240 | you're trying to build as quickly as possible
00:45:40.360 | a mapping between the concepts featured in your API
00:45:45.320 | and the concepts in your mind.
00:45:46.840 | So you're trying to map your mental model
00:45:48.920 | as a domain expert to the way things work in the API.
00:45:53.640 | So you need an API and an underlying implementation
00:45:57.080 | that are reflecting the way people think about these things.
00:46:00.120 | - So you're minimizing the time it takes to do the mapping.
00:46:02.920 | - Yes, minimizing the time,
00:46:04.720 | the cognitive load there is
00:46:06.600 | in ingesting this new knowledge about your API.
00:46:10.960 | An API should not be self-referential
00:46:13.200 | or referring to implementation details.
00:46:15.560 | It should only be referring to domain specific concepts
00:46:19.200 | that people already understand.
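One small illustration of "modular and hierarchical" in Keras terms (an example reading of the principle, not a claim about how the team frames it internally): layers compose into models, and models themselves can be reused as building blocks, which mirrors how practitioners already think about the domain.

```python
import tensorflow as tf

# Layers are the small modules; models compose them; a whole model can be
# nested inside a larger one. The API exposes domain concepts (layer, model,
# optimizer, loss) rather than implementation machinery.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(16),
])
classifier = tf.keras.Sequential([
    encoder,  # a model reused as a building block
    tf.keras.layers.Dense(10, activation="softmax"),
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```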
00:46:21.400 | - Brilliant.
00:46:24.480 | So what's the future of Keras and TensorFlow look like?
00:46:27.560 | What does TensorFlow 3.0 look like?
00:46:29.640 | - So that's kind of too far in the future for me to answer,
00:46:33.680 | especially since I'm not even the one making these decisions.
00:46:37.800 | - Okay.
00:46:39.080 | - But so from my perspective,
00:46:41.240 | which is just one perspective
00:46:43.200 | among many different perspectives on the TensorFlow team,
00:46:46.040 | I'm really excited by developing even higher level APIs,
00:46:52.200 | higher level than Keras.
00:46:53.600 | I'm really excited by hyper parameter tuning,
00:46:56.480 | by automated machine learning, AutoML.
00:46:59.280 | I think the future is not just, you know,
00:47:03.200 | defining a model like you were assembling Lego blocks
00:47:07.600 | and then calling fit on it.
00:47:09.200 | It's more like an automagical model
00:47:13.640 | that would just look at your data
00:47:16.080 | and optimize the objective you're after, right?
00:47:19.040 | So that's what I'm looking into.
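A rough sketch of the kind of workflow being pointed at, assuming the later keras_tuner package and its RandomSearch API (not something that existed at the time of this conversation); the dataset names are placeholders.

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # The tuner, not the user, picks these hyperparameters.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(hp.Int("units", 32, 256, step=32), activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(hp.Float("lr", 1e-4, 1e-2, sampling="log")),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy", max_trials=10)
# x_train, y_train, x_val, y_val stand in for a real dataset:
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
# best_model = tuner.get_best_models(num_models=1)[0]
```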
00:47:23.040 | - Yeah, so you put the baby into a room with the problem
00:47:26.440 | and come back a few hours later
00:47:28.760 | with a fully solved problem.
00:47:30.960 | - Exactly, it's not like a box of Legos.
00:47:33.560 | It's more like the combination of a kid
00:47:35.920 | that's really good at Legos and a box of Legos
00:47:38.800 | and just building the thing on its own.
00:47:41.520 | - Very nice.
00:47:42.680 | So that's an exciting future.
00:47:44.120 | And I think there's a huge amount of applications
00:47:46.080 | and revolutions to be had under the constraints
00:47:50.760 | of the discussion we previously had.
00:47:52.640 | But what do you think are the current limits
00:47:56.000 | of deep learning?
00:47:57.480 | If we look specifically at these function approximators
00:48:02.480 | that try to generalize from data.
00:48:06.160 | So you've talked about local versus extreme generalization.
00:48:10.160 | You mentioned that neural networks don't generalize well,
00:48:13.280 | humans do.
00:48:14.560 | So there's this gap.
00:48:16.280 | So, and you've also mentioned that generalization,
00:48:19.880 | extreme generalization requires something like reasoning
00:48:22.360 | to fill those gaps.
00:48:23.960 | So how can we start trying to build systems like that?
00:48:27.560 | - Right, yeah, so this is by design, right?
00:48:30.600 | Deep learning models are like huge parametric models
00:48:35.080 | differentiable, so continuous,
00:48:39.440 | that go from an input space to an output space.
00:48:42.680 | And they're trained with gradient descent.
00:48:44.120 | So they're trained pretty much point by point.
00:48:47.200 | They're learning a continuous geometric morphing
00:48:50.520 | from an input vector space to an output vector space.
00:48:54.200 | And because this is done point by point,
00:48:59.000 | a deep neural network can only make sense
00:49:02.200 | of points in experience space that are very close
00:49:05.880 | to things that it has already seen in the training data.
00:49:08.560 | At best, it can do interpolation across points.
00:49:12.560 | But that means, you know,
00:49:15.640 | it means in order to train your network,
00:49:17.360 | you need a dense sampling of the input cross output space,
00:49:21.680 | almost a point by point sampling,
00:49:25.240 | which can be very expensive
00:49:26.560 | if you're dealing with complex real world problems
00:49:29.320 | like autonomous driving, for instance, or robotics.
00:49:33.240 | It's doable if you're looking at the subset
00:49:36.000 | of the visual space, but even then,
00:49:37.760 | it's still fairly expensive.
00:49:38.760 | You still need millions of examples.
00:49:40.920 | And it's only gonna be able to make sense of things
00:49:44.240 | that are very close to what it has seen before.
00:49:46.840 | And in contrast to that,
00:49:48.600 | well, of course you have human intelligence,
00:49:50.160 | but even if you're not looking at human intelligence,
00:49:53.200 | you can look at very simple rules, algorithms.
00:49:56.760 | If you have a symbolic rule,
00:49:58.040 | it can actually apply to a very, very large set of inputs
00:50:03.040 | because it is abstract.
00:50:04.840 | It is not obtained by doing a point by point mapping, right?
00:50:09.840 | For instance, if you try to learn a sorting algorithm
00:50:14.000 | using a deep neural network,
00:50:15.520 | well, you're very much limited to learning point by point
00:50:18.480 | what the sorted representation of this specific list
00:50:23.320 | is like, but instead you could have a very, very simple
00:50:28.320 | sorting algorithm written in a few lines.
00:50:31.920 | Maybe it's just two nested loops
00:50:34.400 | and it can process any list at all because it is abstract,
00:50:40.560 | because it is a set of rules.
00:50:42.240 | So deep learning is really like point by point
00:50:45.160 | geometric morphings, trained with gradient descent.
00:50:48.600 | And meanwhile, abstract rules can generalize much better.
00:50:53.600 | And I think the future is really to combine the two.
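As a toy illustration of the contrast just drawn: the explicit sorting rule below (two nested loops) applies to any list of any length, while a hypothetical network trained on (unsorted, sorted) pairs only learns an approximate, fixed-length mapping near its training distribution.

```python
# The abstract rule: a few lines of nested loops that sort ANY list, of any length.
def bubble_sort(xs):
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

print(bubble_sort([3, 1, 2]))           # works
print(bubble_sort([9.5, -2, 0, 7, 7]))  # also works, no retraining needed

# A network, by contrast, learns a point-by-point mapping: it needs dense
# supervision (many pairs of a FIXED length) and only interpolates near
# the examples it has seen. This is a hypothetical toy setup.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n, length = 10000, 5
x_train = np.random.rand(n, length)
y_train = np.sort(x_train, axis=1)

net = keras.Sequential([
    keras.Input(shape=(length,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(length),
])
net.compile(optimizer="adam", loss="mse")
net.fit(x_train, y_train, epochs=2, verbose=0)  # approximate, length-5-only "sorting"
```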
00:50:56.680 | - So how do we, do you think, combine the two?
00:50:59.680 | How do we combine good point by point functions
00:51:03.520 | with programs, which is what symbolic AI type systems do?
00:51:08.520 | - Yeah.
00:51:09.880 | - At which levels the combination happen?
00:51:12.080 | I mean, obviously we're jumping into the realm
00:51:15.160 | where there are no good answers.
00:51:17.360 | There are just kind of ideas and intuitions and so on.
00:51:20.760 | - Well, if you look at the really successful AI systems
00:51:23.520 | today, I think they are already hybrid systems
00:51:26.320 | that are combining symbolic AI with deep learning.
00:51:29.520 | For instance, successful robotics systems
00:51:32.520 | are already mostly model-based, rule-based,
00:51:36.400 | things like planning algorithms and so on.
00:51:39.400 | At the same time, they're using deep learning
00:51:42.200 | as perception modules.
00:51:43.840 | Sometimes they're using deep learning as a way to inject
00:51:47.200 | fuzzy intuition into a rule-based process.
00:51:50.920 | If you look at the system like in a self-driving car,
00:51:54.560 | it's not just one big end-to-end neural network,
00:51:57.240 | you know, that wouldn't work at all.
00:51:59.000 | Precisely because in order to train that,
00:52:00.760 | you would need a dense sampling of experience space
00:52:05.160 | when it comes to driving,
00:52:06.200 | which is completely unrealistic, obviously.
00:52:08.880 | Instead, the self-driving car is mostly symbolic,
00:52:13.280 | you know, it's software, it's programmed by hand.
00:52:18.360 | So it's mostly based on explicit models,
00:52:21.640 | in this case, mostly 3D models of the environment
00:52:25.840 | around the car, but it's interfacing with the real world
00:52:29.520 | using deep learning modules, right?
00:52:31.240 | - Right, so the deep learning there serves as a way
00:52:33.440 | to convert the raw sensory information
00:52:36.080 | to something usable by symbolic systems.
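Here is a schematic, purely hypothetical sketch of that hybrid pattern: a learned perception module turns raw sensor data into symbolic detections, and hand-written rules plan on top of them. All names, labels, and thresholds are made up for illustration and do not describe any real self-driving stack.

```python
# Hypothetical sketch of a hybrid system: a learned perception module feeding
# an explicit, rule-based planner. Names and values are placeholders only.
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    label: str        # e.g. "pedestrian", "lane_marking"
    distance_m: float

def perception_module(camera_frame) -> List[Detection]:
    # In a real system this would be a trained neural network converting
    # raw pixels into structured detections; here it returns a canned result.
    return [Detection("pedestrian", 8.0), Detection("lane_marking", 1.2)]

def symbolic_planner(detections: List[Detection]) -> str:
    # Explicit, hand-written rules operating on the symbolic output
    # of the perception module.
    for d in detections:
        if d.label == "pedestrian" and d.distance_m < 10.0:
            return "brake"
    return "keep_lane"

frame = object()  # stand-in for a raw camera frame
print(symbolic_planner(perception_module(frame)))  # -> "brake"
```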
00:52:38.360 | Okay, well, let's linger on that a little more.
00:52:42.400 | So dense sampling from input to output.
00:52:45.440 | You said it's obviously very difficult.
00:52:48.240 | Is it possible?
00:52:50.200 | - In the case of self-driving, you mean?
00:52:51.840 | - Let's say self-driving, right?
00:52:53.080 | Self-driving, for many people,
00:52:55.800 | let's not even talk about self-driving,
00:52:59.520 | let's talk about steering, so staying inside the lane.
00:53:05.080 | Lane following, yeah, it's definitely a problem
00:53:07.080 | you can solve with an end-to-end deep learning model,
00:53:08.920 | but that's like one small subset.
00:53:10.560 | - Hold on a second.
00:53:11.600 | I don't know why you're jumping from the extreme so easily,
00:53:14.520 | 'cause I disagree with you on that.
00:53:16.280 | I think, well, it's not obvious to me
00:53:21.000 | that you can solve lane following.
00:53:23.360 | - No, it's not obvious.
00:53:24.760 | I think it's doable.
00:53:25.840 | I think in general, you know, there are no hard limitations
00:53:30.840 | to what you can learn with a deep neural network,
00:53:33.680 | as long as the search space is rich enough,
00:53:38.680 | is flexible enough, and as long as you have
00:53:42.240 | this dense sampling of the input cross output space.
00:53:45.360 | The problem is that this dense sampling
00:53:47.720 | could mean anything from 10,000 examples
00:53:51.120 | to like trillions and trillions.
00:53:52.800 | - So that's my question.
00:53:54.360 | So what's your intuition?
00:53:56.200 | And if you could just give it a chance
00:53:58.720 | and think what kind of problems can be solved
00:54:01.840 | by getting a huge amounts of data
00:54:04.240 | and thereby creating a dense mapping.
00:54:08.000 | So let's think about natural language dialogue,
00:54:12.480 | the Turing test.
00:54:14.000 | Do you think the Turing test can be solved
00:54:17.000 | with a neural network alone?
00:54:21.120 | - Well, the Turing test is all about tricking people
00:54:24.440 | into believing they're talking to a human.
00:54:26.880 | And I don't think that's actually very difficult
00:54:29.040 | because it's more about exploiting human perception
00:54:34.040 | and not so much about intelligence.
00:54:37.520 | There's a big difference between mimicking
00:54:39.680 | intelligent behavior and actual intelligent behavior.
00:54:42.080 | - So, okay, let's look at maybe the Alexa Prize and so on,
00:54:45.360 | the different formulations of the natural language
00:54:47.480 | conversation that are less about mimicking
00:54:50.520 | and more about maintaining a fun conversation
00:54:52.800 | that lasts for 20 minutes.
00:54:54.720 | That's a little less about mimicking
00:54:56.200 | and that's more about, I mean,
00:54:58.160 | it's still mimicking,
00:54:59.080 | but it's more about being able to carry forward
00:55:01.440 | a conversation with all the tangents that happen
00:55:03.640 | in dialogue and so on.
00:55:05.080 | Do you think that problem is learnable
00:55:08.320 | with this kind of, with a neural network
00:55:11.960 | that does the point-to-point mapping?
00:55:14.520 | - So I think it would be very, very challenging
00:55:16.280 | to do this with deep learning.
00:55:17.800 | I don't think it's out of the question either.
00:55:21.480 | I wouldn't rule it out.
00:55:23.240 | - The space of problems that can be solved
00:55:25.400 | with a large neural network.
00:55:26.920 | What's your sense about the space of those problems?
00:55:30.040 | So useful problems for us.
00:55:32.560 | - In theory, it's infinite, right?
00:55:34.800 | You can solve any problem.
00:55:36.200 | In practice, while deep learning is a great fit
00:55:39.800 | for perception problems, in general,
00:55:42.360 | any problem which is not naturally amenable
00:55:47.360 | to explicit handcrafted rules
00:55:50.240 | or rules that you can generate by exhaustive search
00:55:53.480 | over some program space.
00:55:56.080 | So perception, artificial intuition,
00:55:59.360 | as long as you have a sufficient training dataset.
00:56:03.280 | - And that's the question.
00:56:04.280 | I mean, perception, there's interpretation
00:56:06.440 | and understanding of the scene,
00:56:08.440 | which seems to be outside the reach
00:56:10.320 | of current perception systems.
00:56:13.000 | So do you think larger networks will be able
00:56:15.960 | to start to understand the physics
00:56:18.320 | of the scene,
00:56:21.120 | the three-dimensional structure and relationships
00:56:23.400 | of objects in the scene and so on,
00:56:25.600 | or is that really where symbolic AI has to step in?
00:56:28.320 | - Well, it's always possible to solve these problems
00:56:34.400 | with deep learning.
00:56:36.840 | It's just extremely inefficient.
00:56:38.640 | A model would be, an explicit rule-based abstract model
00:56:42.080 | would be a far better, more compressed representation
00:56:45.960 | of physics than learning just this mapping
00:56:48.400 | between in this situation, this thing happens.
00:56:51.000 | If you change the situation slightly,
00:56:52.800 | then this other thing happens and so on.
00:56:54.800 | - Do you think it's possible to automatically generate
00:56:57.480 | the programs that would require that kind of reasoning?
00:57:02.200 | Or does it have to, so the way the expert systems fail,
00:57:05.360 | there's so many facts about the world
00:57:07.120 | had to be hand-coded in.
00:57:08.960 | Do you think it's possible to learn those logical statements
00:57:13.480 | that are true about the world and their relationships?
00:57:17.280 | Do you think, I mean, that's kind of what theorem proving
00:57:20.360 | at a basic level is trying to do, right?
00:57:22.680 | - Yeah, except it's much harder to formulate statements
00:57:26.160 | about the world compared to formulating
00:57:28.440 | mathematical statements.
00:57:30.320 | Statements about the world tend to be subjective.
00:57:32.880 | So can you learn rule-based models?
00:57:39.200 | - Yes.
00:57:40.440 | - Yes, definitely.
00:57:41.280 | That's the field of program synthesis.
00:57:43.600 | However, today we just don't really know how to do it.
00:57:48.000 | So it's very much a graph search or tree search problem.
00:57:52.400 | And so we are limited to the sort of tree search
00:57:56.480 | and graph search algorithms that we have today.
00:57:58.560 | Personally, I think genetic algorithms are very promising.
00:58:02.760 | - So it's almost like genetic programming.
00:58:04.360 | - Genetic programming, exactly.
00:58:05.600 | - Can you discuss the field of program synthesis?
00:58:08.840 | Like how many people are working and thinking about it?
00:58:13.360 | What, where we are in the history of program synthesis
00:58:17.920 | and what are your hopes for it?
00:58:20.760 | - Well, if it were deep learning, this would be like the '90s.
00:58:23.560 | (laughing)
00:58:24.600 | So meaning that we already have existing solutions.
00:58:29.160 | We are starting to have some basic understanding
00:58:34.160 | of what this is about,
00:58:35.520 | but it's still a field that is in its infancy.
00:58:37.960 | There are very few people working on it.
00:58:40.440 | There are very few real-world applications.
00:58:42.840 | So the one real-world application I'm aware of
00:58:47.640 | is Flash Fill in Excel.
00:58:50.800 | It's a way to automatically learn very simple programs
00:58:55.080 | to format cells in an Excel spreadsheet
00:58:58.240 | from a few examples.
00:59:00.280 | For instance, learning a way to format a date,
00:59:02.120 | things like that.
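To give a feel for what learning a tiny formatting program from examples can look like, here is a toy enumerative search over a made-up string-transformation DSL; it sketches the general idea of program synthesis from input/output pairs and is not how Flash Fill itself is implemented.

```python
# Toy enumerative program synthesis over a tiny, made-up string DSL, in the
# spirit of learning formatting programs from examples. Illustrative only;
# this is not the actual Flash Fill algorithm.
from itertools import product

# The DSL: a program is a short sequence of primitive string transformations.
PRIMITIVES = {
    "upper":   lambda s: s.upper(),
    "lower":   lambda s: s.lower(),
    "strip":   lambda s: s.strip(),
    "first3":  lambda s: s[:3],
    "add_dot": lambda s: s + ".",
}

def run(program, s):
    for name in program:
        s = PRIMITIVES[name](s)
    return s

def synthesize(examples, max_length=3):
    # Exhaustive search over all programs of up to max_length primitives.
    for length in range(1, max_length + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None

# Two input/output examples constrain the search enough here.
examples = [("january", "JAN."), ("october", "OCT.")]
print(synthesize(examples))  # first consistent program found, e.g. ('upper', 'first3', 'add_dot')
```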
00:59:02.960 | - Oh, that's fascinating.
00:59:03.800 | - Yeah.
00:59:04.640 | - You know, okay, that's a fascinating topic.
00:59:06.240 | I always wonder when I provide a few samples to Excel,
00:59:10.280 | what it's able to figure out.
00:59:12.640 | Like just giving it a few dates.
00:59:15.520 | What are you able to figure out
00:59:16.960 | from the pattern I just gave you?
00:59:18.440 | That's a fascinating question.
00:59:19.680 | And it's fascinating whether that's learnable patterns.
00:59:23.240 | And you're saying they're working on that.
00:59:24.880 | - Yeah.
00:59:25.720 | - How big is the toolbox currently?
00:59:28.120 | Are we completely in the dark?
00:59:29.360 | So if you said the '90s.
00:59:30.200 | - In terms of processes?
00:59:31.680 | No, so I would say,
00:59:35.160 | so maybe '90s is even too optimistic
00:59:37.760 | because by the '90s, you know,
00:59:38.920 | we already understood backprop.
00:59:41.000 | We already understood, you know,
00:59:42.040 | the engine of deep learning,
00:59:43.880 | even though we couldn't really see its potential quite yet.
00:59:47.240 | Today, I don't think we've found
00:59:48.440 | the engine of program synthesis.
00:59:50.360 | - So we're in the winter before backprop.
00:59:52.800 | - Yeah.
00:59:54.120 | In a way, yes.
00:59:55.680 | So I do believe program synthesis,
00:59:58.200 | in general, discrete search over rule-based models
01:00:02.040 | is gonna be a cornerstone of AI research
01:00:04.640 | in the next century, right?
01:00:06.640 | And that doesn't mean we're gonna drop deep learning.
01:00:10.120 | Deep learning is immensely useful.
01:00:11.800 | Like being able to learn a very flexible,
01:00:16.160 | adaptable parametric model.
01:00:18.040 | So in this sense,
01:00:19.080 | that's actually immensely useful.
01:00:20.240 | Like all it's doing is pattern recognition,
01:00:23.080 | but being good at pattern recognition,
01:00:24.800 | given lots of data, is just extremely powerful.
01:00:27.800 | So we are still gonna be working on deep learning
01:00:30.240 | and we're gonna be working on program synthesis.
01:00:31.800 | We're gonna be combining the two
01:00:33.240 | in increasingly automated ways.
01:00:35.080 | - So let's talk a little bit about data.
01:00:38.520 | You've tweeted.
01:00:40.120 | (laughs)
01:00:42.280 | About 10,000 deep learning papers have been written
01:00:45.200 | about hard coding priors about a specific task
01:00:48.200 | in a neural network architecture
01:00:49.640 | works better than a lack of a prior.
01:00:52.440 | Basically summarizing all these efforts,
01:00:55.120 | they put a name to an architecture,
01:00:56.960 | but really what they're doing is hard coding some priors
01:00:59.320 | that improve the performance of the system.
01:01:01.560 | But it gets straight to the point, and is probably true.
01:01:05.960 | So you say that you can always buy performance,
01:01:09.280 | buy in quotes performance by either training on more data,
01:01:12.960 | better data, or by injecting task information
01:01:15.480 | to the architecture of the pre-processing.
01:01:18.440 | However, this isn't informative
01:01:19.920 | about the generalization power of the techniques used,
01:01:22.200 | the fundamental ability to generalize.
01:01:24.240 | Do you think we can go far by coming up with better methods
01:01:28.320 | for this kind of cheating,
01:01:29.960 | for better methods of large-scale annotation of data?
01:01:33.560 | So building better priors.
01:01:35.240 | - If you had made it, it's not cheating anymore.
01:01:37.360 | - Right, I'm joking about the cheating,
01:01:39.480 | but large-scale, so basically I'm asking
01:01:43.080 | about something that hasn't, from my perspective,
01:01:48.280 | been researched too much is exponential improvement
01:01:53.280 | in annotation of data.
01:01:55.080 | Do you often think about-
01:01:58.200 | - I think it's actually been researched quite a bit.
01:02:00.840 | You just don't see publications about it
01:02:02.760 | because people who publish papers
01:02:05.840 | are gonna publish about known benchmarks.
01:02:07.920 | Sometimes they're gonna release a new benchmark.
01:02:09.800 | People who actually have real-world,
01:02:11.520 | large-scale deep learning problems,
01:02:13.520 | they're gonna spend a lot of resources
01:02:15.800 | into data annotation and good data annotation pipelines,
01:02:18.400 | but you don't see any papers about it.
01:02:19.680 | - That's interesting.
01:02:20.520 | So do you think, certainly resources,
01:02:22.720 | but do you think there's innovation happening?
01:02:24.840 | - Oh yeah, definitely.
01:02:28.840 | To clarify the point in the tweet,
01:02:28.840 | so machine learning in general
01:02:30.960 | is the science of generalization.
01:02:33.880 | You want to generate knowledge that can be reused
01:02:38.880 | across different datasets, across different tasks.
01:02:42.120 | And if instead you're looking at one dataset
01:02:45.400 | and then you are hard-coding knowledge
01:02:49.120 | about this task into your architecture,
01:02:51.440 | this is no more useful than training a network
01:02:55.840 | and then saying, "Oh, I found these weight values
01:02:58.360 | "perform well," right?
01:03:01.960 | So David Ha, I don't know if you know David,
01:03:05.720 | he had a paper the other day
01:03:07.520 | about weight-agnostic neural networks.
01:03:10.440 | And this was a very interesting paper
01:03:12.120 | because it really illustrates the fact that an architecture,
01:03:16.360 | even without weights, an architecture is knowledge
01:03:20.600 | about a task, it encodes knowledge.
01:03:23.720 | And when it comes to architectures
01:03:25.880 | that are handcrafted by researchers,
01:03:29.400 | in some cases it is very, very clear
01:03:32.640 | that all they are doing is artificially
01:03:35.880 | re-encoding the template that corresponds
01:03:39.480 | to the proper way to solve a task
01:03:44.000 | encoded in a given dataset.
01:03:45.160 | For instance, I don't know if you've looked at the bAbI dataset,
01:03:50.160 | which is about natural language question answering,
01:03:53.400 | it is generated by an algorithm.
01:03:55.400 | So this is a question-answer pairs
01:03:57.680 | that are generated by an algorithm.
01:03:59.280 | The algorithm is following a certain template.
01:04:01.520 | Turns out if you craft a network
01:04:04.080 | that literally encodes this template,
01:04:06.200 | you can solve this dataset with nearly 100% accuracy.
01:04:09.680 | But that doesn't actually tell you anything
01:04:12.280 | about how to solve question answering in general,
01:04:15.560 | which is the point.
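One way to see the point that an architecture by itself encodes knowledge: a convolutional layer hard-codes locality and translation invariance as a prior, while a dense layer on flattened pixels encodes no such assumption. The sketch below merely contrasts the two parameterizations on the same input; sizes are arbitrary.

```python
# Sketch: the architecture itself encodes a prior. A Conv2D layer hard-codes
# locality and translation invariance; a Dense layer on flattened pixels does not.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 3))

# Prior-laden parameterization: small filters shared across all spatial positions.
conv_branch = layers.Conv2D(16, kernel_size=3, activation="relu")(inputs)
conv_model = keras.Model(inputs, layers.GlobalAveragePooling2D()(conv_branch))

# Prior-free parameterization: every pixel connected to every unit.
dense_branch = layers.Dense(16, activation="relu")(layers.Flatten()(inputs))
dense_model = keras.Model(inputs, dense_branch)

# The weight counts make the difference visible: the convolutional model reuses
# one small set of filters everywhere, which is exactly the hard-coded
# knowledge about the task that the architecture carries.
conv_model.summary()
dense_model.summary()
```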
01:04:17.720 | - The question is just to linger on it,
01:04:19.440 | whether it's from the data side
01:04:20.880 | or from the size of the network.
01:04:22.920 | I don't know if you've read the blog post
01:04:25.040 | by Rich Sutton, "The Bitter Lesson,"
01:04:27.720 | where he says, "The biggest lesson that we can read
01:04:30.360 | "from 70 years of AI research is that general methods
01:04:33.480 | "that leverage computation are ultimately
01:04:35.240 | "the most effective."
01:04:37.160 | So as opposed to figuring out methods
01:04:39.760 | that can generalize effectively,
01:04:41.720 | do you think we can get pretty far
01:04:46.640 | by just having something that leverages computation
01:04:50.120 | and the improvement of computation?
01:04:51.560 | - Yeah, so I think Rich is making a very good point,
01:04:54.720 | which is that a lot of these papers,
01:04:56.840 | which are actually all about manually hard-coding
01:05:00.960 | prior knowledge about a task into some system,
01:05:04.080 | doesn't have to be deep learning architecture,
01:05:05.640 | but into some system, right?
01:05:07.040 | You know, these papers are not actually making any impact.
01:05:11.760 | Instead, what's making really long-term impact
01:05:14.840 | is very simple, very general systems
01:05:18.520 | that are really agnostic to all these tricks,
01:05:21.280 | because these tricks do not generalize.
01:05:23.360 | And of course, the one general and simple thing
01:05:27.480 | that you should focus on is that which leverages computation,
01:05:32.480 | because computation, the availability
01:05:36.200 | of large-scale computation has been increasing exponentially
01:05:39.400 | following Moore's law.
01:05:40.600 | So if your algorithm is all about exploiting this,
01:05:44.120 | then your algorithm is suddenly exponentially improving.
01:05:47.480 | So I think Rich is definitely right.
01:05:52.440 | However, you know, he's right about the past 70 years.
01:05:57.120 | He's like assessing the past 70 years.
01:05:59.560 | I am not sure that this assessment will still hold true
01:06:02.960 | for the next 70 years.
01:06:04.960 | It might to some extent, I suspect it will not,
01:06:08.600 | because the truth of his assessment
01:06:11.600 | is a function of the context, right?
01:06:14.600 | In which this research took place.
01:06:16.880 | And the context is changing.
01:06:18.400 | Like Moore's law might not be applicable anymore,
01:06:21.480 | for instance, in the future.
01:06:23.840 | And I do believe that, you know,
01:06:26.480 | when you tweak one aspect of a system,
01:06:31.440 | when you exploit one aspect of a system,
01:06:32.960 | some other aspects starts becoming the bottleneck.
01:06:36.480 | Let's say you have unlimited computation.
01:06:38.880 | Well, then data is the bottleneck.
01:06:41.520 | And I think we're already starting to be in a regime
01:06:44.640 | where our systems are so large in scale
01:06:47.160 | and so data-hungry, the data today
01:06:49.320 | and the quality of data and the scale of data
01:06:51.760 | is the bottleneck.
01:06:53.120 | And in this environment,
01:06:54.680 | the bitter lesson from Rich
01:06:58.200 | is it's not gonna be true anymore, right?
01:07:00.840 | So I think we are gonna move from a focus on scale,
01:07:05.840 | on computation scale, to a focus on data efficiency.
01:07:09.880 | - Data efficiency.
01:07:10.760 | So that's getting to the question of symbolic AI,
01:07:13.120 | but to linger on the deep learning approaches,
01:07:16.160 | do you have hope for either unsupervised learning
01:07:19.240 | or reinforcement learning,
01:07:20.920 | which are ways of being more data efficient
01:07:25.920 | in terms of the amount of data they need
01:07:29.600 | that required human annotation?
01:07:31.560 | - So unsupervised learning and reinforcement learning
01:07:34.240 | are frameworks for learning,
01:07:36.160 | but they're not like any specific technique.
01:07:39.000 | So usually when people say reinforcement learning,
01:07:41.200 | what they really mean is deep reinforcement learning,
01:07:43.240 | which is like one approach,
01:07:45.720 | which is actually very questionable.
01:07:47.440 | The question I was asking was unsupervised learning
01:07:50.920 | with deep neural networks and deep reinforcement learning.
01:07:54.640 | - Well, these are not really data efficient
01:07:56.840 | because you're still leveraging, you know,
01:07:58.360 | this huge parametric models,
01:08:00.040 | trying point by point with gradient descent.
01:08:02.440 | It is more efficient in terms of the number of annotations,
01:08:08.040 | the density of annotations you need.
01:08:09.440 | So the idea being to learn the latent space
01:08:13.280 | around which the data is organized
01:08:15.320 | and then map the sparse annotations into it.
01:08:18.720 | And sure, I mean, that's clearly a very good idea.
01:08:21.800 | It's not really a topic I would be working on,
01:08:26.080 | but it's clearly a good idea.
01:08:27.920 | - So it would get us to solve some problems that-
01:08:31.720 | - It will get us to incremental improvements
01:08:34.840 | in label data efficiency.
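A minimal sketch of the idea just described, with arbitrary shapes and label counts: learn a latent space from unlabeled data with an autoencoder, then map a handful of sparse annotations into that space by training a small classifier on the encoded labeled points.

```python
# Sketch: unsupervised representation learning plus sparse labels.
# 1) Learn a latent space from unlabeled data with an autoencoder.
# 2) Train a classifier on a small number of labeled points in that space.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x_unlabeled = np.random.rand(5000, 64).astype("float32")
x_labeled = np.random.rand(50, 64).astype("float32")    # only 50 annotations
y_labeled = np.random.randint(0, 2, size=(50,))

inputs = keras.Input(shape=(64,))
latent = layers.Dense(8, activation="relu", name="latent")(inputs)
reconstruction = layers.Dense(64)(latent)
autoencoder = keras.Model(inputs, reconstruction)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_unlabeled, x_unlabeled, epochs=3, verbose=0)

# Map the sparse annotations into the learned latent space.
encoder = keras.Model(inputs, latent)
classifier = keras.Sequential([keras.Input(shape=(8,)),
                               layers.Dense(1, activation="sigmoid")])
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
classifier.fit(encoder.predict(x_labeled, verbose=0), y_labeled, epochs=10, verbose=0)
```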
01:08:38.160 | - Do you have concerns about short-term
01:08:42.160 | or long-term threats from AI,
01:08:44.560 | from artificial intelligence?
01:08:46.000 | - Yes, definitely to some extent.
01:08:50.520 | - And what's the shape of those concerns?
01:08:52.800 | - This is actually something I've briefly written about,
01:08:56.840 | but the capabilities of deep learning technology
01:09:01.840 | can be used in many ways that are concerning
01:09:06.200 | from mass surveillance with things like facial recognition,
01:09:11.920 | in general, tracking lots of data about everyone
01:09:15.480 | and then being able to make sense of this data
01:09:18.960 | to do identification, to do prediction.
01:09:21.040 | That's concerning.
01:09:23.160 | That's something that's being very aggressively pursued
01:09:26.600 | by totalitarian states like China.
01:09:29.960 | One thing I am very much concerned about is that
01:09:34.680 | our lives are increasingly online,
01:09:40.680 | are increasingly digital, made of information,
01:09:43.280 | made of information consumption and information production
01:09:46.720 | or digital footprints, I would say.
01:09:51.880 | And if you absorb all of this data
01:09:56.360 | and you are in control of where you consume information,
01:10:01.360 | social networks and so on,
01:10:03.320 | recommendation engines,
01:10:07.000 | then you can build a sort of reinforcement loop
01:10:11.520 | for human behavior.
01:10:13.880 | You can observe the state of your mind at time T.
01:10:18.400 | You can predict how you would react
01:10:21.120 | to different pieces of content,
01:10:22.760 | how to get you to move your mind in a certain direction.
01:10:27.080 | And then you can feed you the specific piece of content
01:10:32.080 | that would move you in a specific direction.
01:10:35.800 | And you can do this at scale,
01:10:37.840 | at scale in terms of doing it continuously in real time.
01:10:44.960 | You can also do it at scale in terms of scaling this
01:10:47.960 | to many, many people, to entire populations.
01:10:50.400 | So potentially artificial intelligence,
01:10:53.920 | even in its current state,
01:10:55.680 | if you combine it with the internet,
01:10:58.560 | with the fact that we have,
01:11:00.720 | all of our lives are moving to digital devices
01:11:04.160 | and digital information consumption and creation,
01:11:06.800 | what you get is the possibility
01:11:10.640 | to achieve mass manipulation of behavior
01:11:14.520 | and mass psychological control.
01:11:16.880 | And this is a very real possibility.
01:11:18.600 | - Yeah, so you're talking about
01:11:20.080 | any kind of recommender system?
01:11:21.720 | - Yeah.
01:11:22.560 | - Let's look at the YouTube algorithm, Facebook,
01:11:26.160 | anything that recommends content you should watch next.
01:11:29.680 | And it's fascinating to think that there's some aspects
01:11:33.800 | of human behavior that you can,
01:15:36.640 | say, pose as a problem:
01:15:41.080 | does this person hold Republican beliefs or Democratic beliefs,
01:15:45.360 | and it's trivial, that's an objective function
01:11:50.240 | and you can optimize and you can measure
01:11:52.600 | and you can turn everybody into a Republican
01:11:54.360 | or everybody into a Democrat.
01:11:55.200 | - Absolutely, yeah.
01:11:56.040 | I do believe it's true.
01:11:57.880 | So the human mind is very,
01:12:02.000 | if you look at the human mind as a kind of computer program,
01:12:05.200 | it is a very large exploit surface, right?
01:12:07.600 | It has many, many vulnerabilities.
01:12:09.360 | - Exploit surfaces, yeah.
01:12:10.880 | - Ways you can control it.
01:12:13.520 | For instance, when it comes to your political beliefs,
01:12:16.600 | this is very much tied to your identity.
01:12:19.280 | So for instance, if I'm in control of your newsfeed
01:12:23.080 | on your favorite social media platforms,
01:12:26.000 | this is actually where you're getting your news from.
01:12:29.400 | And I can, of course I can choose to only show you news
01:12:33.680 | that will make you see the world in a specific way, right?
01:12:37.120 | But I can also,
01:12:38.360 | create incentives for you to post
01:12:43.280 | about some political beliefs.
01:12:44.640 | And then when I get you to express a statement,
01:12:47.960 | if it's a statement that me as the controller,
01:12:51.800 | I want to reinforce,
01:12:53.720 | I can just show it to people who will agree
01:12:55.560 | and they will like it.
01:12:56.880 | And that will reinforce the statement in your mind.
01:12:59.240 | If this is a statement I want you to,
01:13:01.600 | this is a belief I want you to abandon,
01:13:05.200 | I can, on the other hand, show it to opponents, right?
01:13:09.520 | Who will attack you.
01:13:10.560 | And then because they attack you at the very least,
01:13:12.800 | next time you will think twice about posting it.
01:13:16.760 | But maybe you will even, you know,
01:13:18.960 | stop believing this because you got pushback, right?
01:13:22.760 | So there are many ways in which
01:13:27.200 | social media platforms can potentially control your opinions.
01:13:30.520 | And today,
01:13:31.360 | so all of these things are already being controlled
01:13:36.920 | by AI algorithms.
01:13:38.240 | These algorithms do not have
01:13:39.960 | any explicit political goal today.
01:13:42.880 | While potentially they could,
01:13:44.800 | like if some totalitarian government
01:13:49.200 | takes over social media platforms
01:13:52.720 | and decides that now we are gonna use this
01:13:54.960 | not just for mass surveillance,
01:13:56.280 | but also for mass opinion control and behavior control,
01:13:59.160 | you know, very bad things could happen.
01:14:01.880 | But what's really fascinating
01:14:04.760 | and actually quite concerning is that
01:14:07.080 | even without an explicit intent to manipulate,
01:14:11.320 | you're already seeing very dangerous dynamics
01:14:14.840 | in terms of how these content recommendation
01:14:17.960 | algorithms behave.
01:14:19.760 | Because right now, the goal,
01:14:23.440 | the objective function of these algorithms
01:14:26.040 | is to maximize engagement, right?
01:14:28.640 | Which seems fairly innocuous at first, right?
01:14:32.480 | However, it is not because content
01:14:36.520 | that will maximally engage people, you know,
01:14:39.960 | get people to react in an emotional way,
01:14:43.000 | get people to click on something,
01:14:44.760 | it is very often content that, you know,
01:14:49.760 | is not healthy to the public discourse.
01:14:54.400 | For instance, fake news are far more likely
01:14:59.080 | to get you to click on them than real news,
01:15:01.360 | simply because they are not constrained to reality.
01:15:06.360 | So they can be as atrocious, as surprising,
01:15:11.440 | as good stories as you want,
01:15:13.800 | because they're artificial, right?
01:15:15.280 | - Yeah, to me, that's an exciting world
01:15:17.640 | because so much good can come.
01:15:19.600 | So there's an opportunity to educate people.
01:15:24.600 | You can balance people's worldview with other ideas.
01:15:29.640 | So there's so many objective functions,
01:15:33.880 | the space of objective functions
01:15:35.680 | that create better civilizations
01:15:37.960 | is large, arguably infinite.
01:15:40.640 | But there's also a large space that creates division
01:15:45.680 | and destruction, civil war, a lot of bad stuff.
01:15:50.680 | And the worry is, naturally,
01:15:56.200 | probably that space is bigger, first of all.
01:15:59.160 | And if we don't explicitly think about
01:16:01.520 | what kind of effects are going to be observed
01:16:06.520 | from different objective functions,
01:16:08.360 | then we're going to get into trouble.
01:16:10.240 | But the question is, how do we get into rooms
01:16:14.480 | and have discussions?
01:16:16.280 | So inside Google, inside Facebook, inside Twitter,
01:16:20.160 | and think about, okay, how can we drive up engagement
01:16:23.720 | and at the same time create a good society?
01:16:27.960 | Is it even possible to have
01:16:29.280 | that kind of philosophical discussion?
01:16:31.360 | - I think you can definitely try.
01:16:33.080 | So from my perspective,
01:16:34.800 | I would feel rather uncomfortable with companies
01:16:39.480 | that are in control of these news algorithms,
01:16:43.240 | with them making explicit decisions
01:16:45.720 | to manipulate people's opinions or behaviors,
01:16:50.440 | even if the intent is good,
01:16:52.680 | because that's a very totalitarian mindset.
01:16:55.240 | So instead, what I would like to see,
01:16:57.560 | it's probably never going to happen
01:16:58.880 | because it's not super realistic,
01:17:00.360 | but that's actually something I really care about.
01:17:02.560 | I would like all these algorithms
01:17:06.320 | to present configuration settings to their users
01:17:10.600 | so that the users can actually make the decision
01:17:13.880 | about how they want to be impacted
01:17:16.800 | by these information recommendation,
01:17:19.840 | content recommendation algorithms.
01:17:21.960 | For instance, as a user of something like YouTube
01:17:24.840 | or Twitter, maybe I want to maximize learning
01:17:28.320 | about a specific topic, right?
01:17:30.360 | So I want the algorithm to feed my curiosity, right?
01:17:35.360 | Which is in itself a very interesting problem.
01:17:38.680 | So instead of maximizing my engagement,
01:17:41.280 | it will maximize how fast and how much I'm learning.
01:17:44.720 | And it will also take into account the accuracy,
01:17:47.400 | hopefully, of the information I'm learning.
01:17:49.600 | So yeah, the user should be able to determine exactly
01:17:55.680 | how these algorithms are affecting their lives.
01:17:58.640 | I don't want actually any entity making decisions
01:18:03.600 | about in which direction
01:18:06.960 | they're going to try to manipulate me, right?
01:18:09.520 | I want technology.
01:18:11.760 | So AI, these algorithms are increasingly going to be
01:18:15.160 | our interface to a world
01:18:17.480 | that is increasingly made of information.
01:18:20.080 | And I want everyone to be in control of this interface,
01:18:25.080 | to interface with the world on their own terms.
01:18:29.120 | So if someone wants these algorithms
01:18:32.920 | to serve their own personal growth goals,
01:18:37.680 | they should be able to configure these algorithms
01:18:40.680 | in such a way.
01:18:41.880 | - Yeah, but so I know it's painful
01:18:44.960 | to have explicit decisions,
01:18:46.720 | but there is underlying explicit decisions,
01:18:51.120 | which is some of the most beautiful fundamental philosophy
01:18:54.960 | that we have before us, which is personal growth.
01:19:01.120 | If I want to watch videos from which I can learn,
01:19:04.600 | what does that mean?
01:19:08.000 | So if I have a checkbox that wants to emphasize learning,
01:19:11.840 | there's still an algorithm with explicit decisions in it
01:19:15.520 | that would promote learning.
01:19:17.800 | What does that mean for me?
01:19:19.080 | Like, for example, I've watched a documentary
01:19:20.720 | on flat earth theory, I guess.
01:19:23.840 | It was very, like, I learned a lot.
01:19:28.280 | I'm really glad I watched it.
01:19:29.880 | It was a friend recommended it to me.
01:19:31.720 | Not, 'cause I don't have such an allergic reaction
01:19:35.120 | to crazy people as my fellow colleagues do,
01:19:37.680 | but it was very eye-opening.
01:19:40.360 | And for others, it might not be.
01:19:42.200 | For others, they might just get turned off for that.
01:19:45.320 | Same with Republican and Democrat.
01:19:47.240 | And what, it's a non-trivial problem.
01:19:50.320 | And first of all, if it's done well,
01:19:53.000 | I don't think it's something that wouldn't happen,
01:19:56.600 | that YouTube wouldn't be promoting,
01:19:59.320 | or Twitter wouldn't be.
01:20:00.160 | It's just a really difficult problem.
01:20:02.320 | How do we do, how to give people control?
01:20:05.560 | - Well, it's mostly an interface design problem.
01:20:08.040 | - Right.
01:20:09.000 | - The way I see it, you want to create technology
01:20:11.080 | that's like a mentor or a coach or an assistant
01:20:16.080 | so that it's not your boss, right?
01:20:19.480 | You are in control of it.
01:20:22.600 | You are telling it what to do for you.
01:20:25.800 | And if you feel like it's manipulating you,
01:20:27.880 | it's not actually doing what you want.
01:20:31.880 | You should be able to switch to a different algorithm.
01:20:33.880 | - Right.
01:20:35.000 | So that's fine-tune control.
01:20:36.480 | And you kind of learn,
01:20:38.240 | you're trusting the human collaboration.
01:20:40.160 | I mean, that's how I see autonomous vehicles too,
01:20:41.960 | is giving as much information as possible,
01:20:44.560 | and you learn that dance yourself.
01:20:46.440 | Yeah, Adobe, I don't know if you use Adobe products
01:20:50.360 | for like Photoshop. - Yeah, I use Photoshop.
01:20:51.600 | - Yeah.
01:20:52.440 | They're trying to see if they can inject YouTube
01:20:55.080 | into their interface.
01:20:56.200 | Basically allowing it to show you all these videos
01:20:59.880 | that, 'cause everybody's confused about what to do
01:21:02.840 | with features, so basically teach people by linking to,
01:21:07.160 | and that way it's an assistant that shows,
01:21:09.480 | uses videos as a basic element of information.
01:21:12.600 | Okay, so what practically should people do
01:21:18.320 | to try to fight against abuses of these algorithms
01:21:24.040 | or algorithms that manipulate us?
01:21:27.400 | - Honestly, it's a very, very difficult problem
01:21:29.280 | because to start with,
01:21:30.120 | there is very little public awareness of these issues.
01:21:33.920 | Very few people would think there's anything wrong
01:21:38.520 | with their newsfeed algorithm,
01:21:39.720 | even though there is actually something wrong already,
01:21:42.040 | which is that it's trying to maximize engagement
01:21:44.480 | most of the time, which has very negative side effects.
01:21:49.880 | So ideally, so the very first thing is to stop
01:21:54.560 | trying to purely maximize engagement,
01:21:59.560 | trying to propagate content based on popularity, right?
01:22:04.560 | Instead, take into account the goals
01:22:11.000 | and the profiles of each user.
01:22:13.520 | So you will be, one example is for instance,
01:22:16.920 | when they look at topic recommendations on Twitter,
01:22:20.760 | it's like, you know, they have this news tab
01:22:24.480 | with such recommendations, it's always the worst garbage
01:22:28.440 | because it's content that appeals to the smallest
01:22:33.160 | common denominator to all Twitter users
01:22:35.200 | because they're trying to optimize,
01:22:37.080 | they're purely trying to optimize popularity,
01:22:39.040 | they're purely trying to optimize engagement,
01:22:41.320 | but that's not what I want.
01:22:42.960 | So they should put me in control of some setting
01:22:46.120 | so that I define what's the objective function
01:22:48.920 | that Twitter is going to be following
01:22:52.200 | to show me this content.
01:22:54.080 | And honestly, so this is all about interface design.
01:22:57.240 | And it's not realistic to give users control
01:23:00.480 | over a bunch of knobs that define the algorithm.
01:23:03.400 | Instead, we should purely put them in charge
01:23:06.720 | of defining the objective function.
01:23:09.360 | Like let the user tell us what they want to achieve,
01:23:13.200 | how they want this algorithm to impact their lives.
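As a sketch of what putting the user in charge of the objective function could look like, here is a hypothetical scoring function whose weights the user sets; every field, item, and weight is invented for illustration and is not any platform's actual API.

```python
# Hypothetical sketch: the user, not the platform, chooses the objective that
# ranks their feed. All fields, items, and weights are invented for illustration.
from dataclasses import dataclass

@dataclass
class Item:
    title: str
    predicted_engagement: float      # what feeds tend to optimize today
    predicted_learning_value: float
    predicted_accuracy: float

def score(item: Item, user_objective: dict) -> float:
    # The user-chosen objective is just a set of weights over predicted outcomes.
    return (user_objective.get("engagement", 0.0) * item.predicted_engagement
            + user_objective.get("learning", 0.0) * item.predicted_learning_value
            + user_objective.get("accuracy", 0.0) * item.predicted_accuracy)

items = [
    Item("Outrage bait", 0.9, 0.1, 0.3),
    Item("Lecture on program synthesis", 0.4, 0.9, 0.9),
]

# A user who wants the algorithm to feed their curiosity rather than their outrage:
my_objective = {"learning": 0.7, "accuracy": 0.3}
ranked = sorted(items, key=lambda it: score(it, my_objective), reverse=True)
print([it.title for it in ranked])  # the lecture ranks first under this objective
```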
01:23:15.200 | - So do you think it is that,
01:23:16.640 | or do they provide individual article
01:23:18.720 | by article reward structure where you give a signal,
01:23:21.560 | I'm glad I saw this, or I'm glad I didn't.
01:23:24.680 | - So like a Spotify type feedback mechanism,
01:23:28.480 | it works to some extent.
01:23:29.840 | I'm kind of skeptical about it
01:23:31.960 | because the only thing the algorithm can do
01:23:34.840 | is attempt to relate your choices
01:23:39.120 | with the choices of everyone else, which might, you know,
01:23:43.280 | if you have an average profile that works fine,
01:23:45.720 | I'm sure Spotify recommendations work fine
01:23:47.880 | if you just like mainstream stuff.
01:23:49.560 | If you don't, it can be, it's not optimal at all, actually.
01:23:53.960 | - It'll be an inefficient search
01:23:56.080 | for the part of the Spotify world that represents you.
01:24:00.800 | - So it's a tough problem,
01:24:02.960 | but do note that even a feedback system
01:24:07.960 | like what Spotify has does not give me control
01:24:10.880 | over what the algorithm is trying to optimize for.
01:24:14.960 | - Well, public awareness, which is what we're doing now,
01:24:19.320 | is a good place to start.
01:24:21.320 | Do you have concerns about long-term existential threats
01:24:25.920 | of artificial intelligence?
01:24:27.320 | - Well, as I was saying,
01:24:31.000 | our world is increasingly made of information.
01:24:33.360 | AI algorithms are increasingly gonna be our interface
01:24:36.200 | to this world of information,
01:24:37.840 | and somebody will be in control of these algorithms.
01:24:41.440 | And that puts us in kind of a bad situation, right?
01:24:45.920 | It has risks.
01:24:46.840 | It has risks coming from potentially large companies
01:24:52.200 | wanting to optimize their own goals,
01:24:54.960 | maybe profits, maybe something else.
01:24:57.120 | Also from governments who might want to use these algorithms
01:25:01.920 | as a means of control of the population.
01:25:04.680 | - Do you think there's existential threat
01:25:06.200 | that could arise from that?
01:25:07.400 | - So existential threat.
01:25:10.280 | So maybe you're referring to the singularity narrative
01:25:14.440 | where robots just take over.
01:25:16.720 | - Well, I don't, not Terminator robots,
01:25:19.360 | and I don't believe it has to be a singularity.
01:25:21.960 | We're just talking to, just like you said,
01:25:25.640 | the algorithm controlling masses of populations.
01:25:28.800 | The existential threat being,
01:25:31.960 | that we hurt ourselves, much like a nuclear war would hurt us.
01:25:37.680 | That kind of thing.
01:25:38.520 | I don't think that requires a singularity,
01:25:40.440 | that requires a loss of control over AI algorithms.
01:25:43.400 | - Yes.
01:25:44.520 | So I do agree there are concerning trends.
01:25:47.960 | Honestly, I wouldn't want to make any long-term predictions.
01:25:52.880 | I don't think today we really have the capability
01:25:56.880 | to see what the dangers of AI are gonna be in 50 years,
01:26:00.480 | in a hundred years.
01:26:02.280 | I do see that we are already faced with concrete
01:26:07.280 | and present dangers surrounding the negative side effects
01:26:12.320 | of content recommendation systems, of newsfeed algorithms,
01:26:15.720 | concerning algorithmic bias as well.
01:26:18.440 | So we are delegating more and more decision processes
01:26:24.400 | to algorithms.
01:26:25.840 | Some of those algorithms are handcrafted,
01:26:27.480 | some are learned from data,
01:26:30.080 | but we are delegating control.
01:26:32.680 | Sometimes it's a good thing, sometimes not so much.
01:26:37.040 | And there is in general very little supervision
01:26:40.200 | of this process, right?
01:26:41.640 | So we are still in this period of very fast change,
01:26:46.040 | even chaos, where society is restructuring itself,
01:26:51.040 | turning into an information society,
01:26:53.840 | which itself is turning into an increasingly automated
01:26:57.000 | information processing society.
01:26:59.000 | And well, yeah, I think the best we can do today
01:27:03.160 | is try to raise awareness around some of these issues.
01:27:06.640 | And I think we're actually making good progress.
01:27:08.280 | If you look at algorithmic bias, for instance,
01:27:11.720 | three years ago, even two years ago,
01:27:14.720 | very, very few people were talking about it.
01:27:17.000 | And now all the big companies are talking about it.
01:27:20.280 | They are often not in a very serious way,
01:27:22.320 | but at least it is part of the public discourse.
01:27:24.520 | You see people in Congress talking about it.
01:27:26.560 | And it all started from raising awareness.
01:27:31.560 | - Right.
01:27:32.800 | So in terms of alignment problem,
01:27:36.040 | trying to teach as we allow algorithms,
01:27:39.200 | just even recommender systems on Twitter,
01:27:41.480 | encoding human values and morals,
01:27:47.040 | decisions that touch on ethics,
01:27:50.160 | how hard do you think that problem is?
01:27:52.560 | How do we have loss functions in neural networks
01:27:57.200 | that have some component,
01:27:58.600 | some fuzzy components of human morals?
01:28:01.040 | - Well, I think this is really all about
01:28:04.720 | objective function engineering,
01:28:06.120 | which is probably going to be increasingly
01:28:08.760 | a topic of concern in the future.
01:28:10.520 | Like for now, we're just using very naive loss functions
01:28:14.680 | because the hard part is not actually
01:28:16.600 | what you're trying to minimize, it's everything else.
01:28:19.040 | But as the everything else
01:28:20.880 | is going to be increasingly automated,
01:28:22.920 | we're going to be focusing our human attention
01:28:27.120 | on increasingly high level components.
01:28:30.280 | Like what's actually driving the whole learning system,
01:28:32.640 | like the objective function.
01:28:33.920 | So loss function engineering is going to be,
01:28:36.880 | loss function engineer is probably going to be
01:28:38.480 | a job title in the future.
01:28:40.600 | - And then the tooling you're creating with Keras
01:28:42.680 | essentially takes care of all the details underneath.
01:28:47.000 | And basically the human expert is needed for exactly that.
01:28:52.000 | - That's the idea.
01:28:53.840 | Keras is the interface between the data you're collecting
01:28:57.560 | and the business goals.
01:28:59.000 | And your job as an engineer is going to be to express
01:29:02.440 | your business goals and your understanding of your business
01:29:05.360 | or your product, your system as a kind of loss function
01:29:09.760 | or a kind of set of constraints.
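A minimal sketch of this kind of objective function engineering in Keras: a custom loss that combines the primary objective with a penalty term standing in for a business constraint. The particular penalty (discouraging over-confident predictions) is just an assumed example of such a constraint.

```python
# Sketch of objective function engineering: a custom Keras loss combining the
# primary objective with a penalty standing in for a business constraint.
# The specific penalty (discouraging over-confident predictions) is illustrative.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def constrained_loss(y_true, y_pred):
    primary = keras.losses.binary_crossentropy(y_true, y_pred)
    # Hypothetical constraint term: penalize predictions far from 0.5,
    # i.e. discourage over-confidence.
    overconfidence = tf.reduce_mean(tf.square(y_pred - 0.5), axis=-1)
    return primary + 0.1 * overconfidence

x = np.random.rand(200, 10).astype("float32")
y = np.random.randint(0, 2, size=(200, 1)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss=constrained_loss, metrics=["accuracy"])
model.fit(x, y, epochs=3, verbose=0)
```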
01:29:11.760 | - Does the possibility of creating an AGI system
01:29:14.680 | excite you or scare you?
01:29:17.160 | Or bore you?
01:29:18.120 | - So intelligence can never really be general.
01:29:22.120 | You know, at best it can have some degree of generality
01:29:24.440 | like human intelligence.
01:29:26.400 | It also always has some specialization
01:29:29.040 | in the same way that human intelligence is specialized
01:29:31.960 | in a certain category of problems,
01:29:33.440 | is specialized in the human experience.
01:29:35.440 | And when people talk about AGI,
01:29:37.280 | I'm never quite sure if they're talking about
01:29:39.480 | very, very smart AI, so smart that it's even smarter
01:29:44.280 | than humans, or they're talking about human-like
01:29:47.200 | intelligence, because these are different things.
01:29:49.720 | - Let's say, presumably I'm impressing you today
01:29:53.280 | with my humanness.
01:29:54.800 | So imagine that I was in fact a robot.
01:29:58.400 | So what does that mean?
01:30:00.760 | I'm impressing you with natural language processing.
01:30:04.960 | Maybe if you weren't able to see me,
01:30:06.440 | maybe this is a phone call.
01:30:07.920 | So that kind of system.
01:30:09.080 | - Okay, so-- - Companion.
01:30:11.160 | - So that's very much about building human-like AI.
01:30:15.080 | And you're asking me, you know,
01:30:16.200 | is this an exciting perspective?
01:30:18.240 | - Yes.
01:30:19.480 | - I think so, yes.
01:30:20.640 | Not so much because of what artificial
01:30:26.160 | human-like intelligence could do,
01:30:28.000 | but, you know, from an intellectual perspective,
01:30:30.880 | I think if you could build truly human-like intelligence,
01:30:34.160 | that means you could actually understand human intelligence,
01:30:37.240 | which is fascinating, right?
01:30:39.880 | Human-like intelligence is gonna require emotions,
01:30:42.680 | it's gonna require consciousness,
01:30:44.400 | which is not things that would normally be required
01:30:47.080 | by an intelligent system.
01:30:49.720 | If you look at, you know, we were mentioning earlier,
01:30:51.880 | like science as superhuman problem-solving agent or system,
01:30:56.880 | it does not have consciousness, it doesn't have emotions.
01:31:02.120 | In general, so emotions, I see consciousness
01:31:05.320 | as being on the same spectrum as emotions.
01:31:07.680 | It is a component of the subjective experience
01:31:12.280 | that is meant very much to guide behavior generation, right?
01:31:17.280 | It's meant to guide your behavior.
01:31:20.840 | In general, human intelligence and animal intelligence
01:31:24.560 | has evolved for the purpose of behavior generation, right?
01:31:29.360 | Including in a social context,
01:31:30.680 | so that's why we actually need emotions,
01:31:32.520 | that's why we need consciousness.
01:31:34.960 | An artificial intelligence system
01:31:36.640 | developed in a different context may well never need them,
01:31:39.760 | may well never be conscious, like science.
01:31:43.120 | - Well, on that point, I would argue it's possible
01:31:46.000 | to imagine that there's echoes of consciousness in science
01:31:51.000 | when viewed as an organism, that science is consciousness.
01:31:54.560 | - So, I mean, how would you go about testing this hypothesis?
01:31:59.200 | How do you probe the subjective experience
01:32:02.960 | of an abstract system like science?
01:32:06.440 | - Well, the point is, probing any subjective experience
01:32:09.560 | is impossible, 'cause I'm not science, I'm Lex.
01:32:13.240 | So I can't probe another entity's subjective experience,
01:32:16.080 | any more than the bacteria on my skin can probe mine.
01:32:20.600 | - You're Lex, I can ask you questions
01:32:22.680 | about your subjective experience and you can answer me,
01:32:25.240 | and that's how I know you're conscious.
01:32:27.400 | - Yes, but that's because we speak the same language.
01:32:31.880 | You perhaps, we have to speak the language of science
01:32:35.080 | in order to ask it. - Honestly,
01:32:35.920 | I don't think consciousness, just like emotions
01:32:38.640 | of pain and pleasure, is something that inevitably arises
01:32:43.640 | from any sort of sufficiently
01:32:46.040 | intelligent information processing.
01:32:48.000 | It is a feature of the mind,
01:32:49.920 | and if you've not implemented it explicitly,
01:32:52.480 | it is not there.
01:32:54.000 | - So you think it's an emergent feature
01:32:56.960 | of a particular architecture.
01:32:59.040 | So do you think--
01:33:00.400 | - It's a feature in the same sense.
01:33:02.040 | So again, the subjective experience
01:33:04.200 | is all about guiding behavior.
01:33:07.560 | If the problems you're trying to solve
01:33:11.960 | don't really involve embodied agents,
01:33:15.240 | maybe in a social context, generating behavior
01:33:18.000 | and pursuing goals like this, then you don't really need it.
01:33:19.600 | And if you look at science,
01:33:20.840 | that's not really what's happening, even though it is.
01:33:23.040 | It is a form of artificial intelligence,
01:33:28.040 | in the sense that it is solving problems,
01:33:30.320 | it is accumulating knowledge,
01:33:32.120 | accumulating solutions and so on.
01:33:34.120 | So if you're not explicitly implementing
01:33:38.160 | a subjective experience,
01:33:39.560 | implementing certain emotions
01:33:42.440 | and implementing consciousness,
01:33:44.160 | it's not gonna just spontaneously emerge.
01:33:47.400 | - Yeah, but so for a system like,
01:33:50.200 | human-like intelligent system that has consciousness,
01:33:53.400 | do you think it needs to have a body?
01:33:56.000 | - Yes, definitely.
01:33:56.840 | I mean, it doesn't have to be a physical body, right?
01:33:59.800 | And there's not that much difference
01:34:01.360 | between a realistic simulation and the real world.
01:34:03.520 | - Oh, so there has to be something
01:34:04.800 | you have to preserve kind of thing.
01:34:06.480 | - Yes, but human-like intelligence
01:34:08.800 | can only arise in a human-like context.
01:34:11.960 | Intelligence in the start.
01:34:12.800 | - In other humans, in order for you to demonstrate
01:34:16.920 | that you have human-like intelligence essentially.
01:34:19.120 | - Yes.
01:34:20.320 | - So what kind of test and demonstration
01:34:25.320 | would be sufficient for you
01:34:28.280 | to demonstrate human-like intelligence?
01:34:31.120 | - Yeah.
01:34:31.960 | - I just out of curiosity,
01:34:32.800 | you've talked about in terms of theorem proving
01:34:35.680 | and program synthesis,
01:34:37.120 | I think you've written about
01:34:38.160 | that there's no good benchmarks for this.
01:34:40.600 | - Yeah.
01:34:41.440 | - That's one of the problems.
01:34:42.280 | So let's talk program synthesis.
01:34:46.480 | So what do you imagine is a good,
01:34:48.960 | I think it's related questions for human-like intelligence
01:34:51.560 | and for program synthesis.
01:34:53.240 | What's a good benchmark for either or both?
01:34:56.160 | - Right, so I mean, you're actually asking two questions,
01:34:59.440 | which is one is about qualifying intelligence
01:35:02.680 | and comparing the intelligence of an artificial system
01:35:07.120 | to the intelligence of a human.
01:35:08.680 | And the other is about the degree to which
01:35:12.040 | this intelligence is human-like.
01:35:13.680 | It's actually two different questions.
01:35:15.600 | So if you look, you mentioned earlier the Turing test.
01:35:19.320 | - Right.
01:35:20.160 | - Well, I actually don't like the Turing test
01:35:21.720 | because it's very lazy.
01:35:23.400 | It's all about completely bypassing the problem
01:35:26.720 | of defining and measuring intelligence.
01:35:28.680 | - Right.
01:35:29.520 | - And instead delegating to a human judge
01:35:32.640 | or a panel of human judges.
01:35:34.360 | So it's a total cop out, right?
01:35:37.480 | If you want to measure how human-like an agent is,
01:35:43.360 | I think you have to make it interact with other humans.
01:35:47.680 | Maybe it's not necessarily a good idea
01:35:49.840 | to have these other humans be the judges.
01:35:53.960 | Maybe you should just observe behavior
01:35:56.880 | and compare it to what a human would actually have done.
01:35:59.680 | When it comes to measuring how smart,
01:36:03.280 | how clever an agent is and comparing that
01:36:06.360 | to the degree of human intelligence.
01:36:11.240 | So we're already talking about two things, right?
01:36:13.680 | The degree, kind of like the magnitude of an intelligence
01:36:17.840 | and its direction, right?
01:36:20.520 | Like the norm of a vector and its direction.
01:36:23.400 | And the direction is like human likeness
01:36:26.960 | and the magnitude, the norm is intelligence.
01:36:31.960 | You could call it intelligence, right?
01:36:34.240 | - So the direction, your sense, the space of directions
01:36:38.840 | that are human-like is very narrow.
01:36:41.160 | - Yeah.
01:36:42.360 | So how would you measure the magnitude of intelligence
01:36:48.320 | in a system in a way that also enables you to compare it
01:36:51.920 | to that of a human?
01:36:54.800 | Well, if you look at different benchmarks
01:36:58.320 | for intelligence today, they're all too focused on skill
01:37:03.280 | at a given task.
01:37:04.320 | That's skill at playing chess, skill at playing Go,
01:37:07.640 | skill at playing Dota.
01:37:09.120 | And I think that's not the right way to go about it
01:37:14.480 | because you can always beat humans at one specific task.
01:37:19.360 | The reason why our skill at playing Go or juggling
01:37:23.680 | or anything is impressive is because we are expressing
01:37:26.160 | this skill within a certain set of constraints.
01:37:29.440 | If you remove the constraints, the constraints
01:37:32.080 | that we have one lifetime, that we have this body and so on,
01:37:35.880 | if you remove the context, if you have unlimited training data,
01:37:40.000 | if you can have access to, for instance,
01:37:41.960 | if you look at juggling, if you have no restriction
01:37:44.920 | on the hardware, then achieving arbitrary levels of skill
01:37:48.800 | is not very interesting and says nothing about
01:37:51.920 | the amount of intelligence you've achieved.
01:37:53.880 | So if you want to measure intelligence,
01:37:55.720 | you need to rigorously define what intelligence is,
01:37:59.920 | which in itself, it's a very challenging problem.
01:38:04.400 | - And do you think that's possible?
01:38:05.960 | - To define intelligence, yes, absolutely.
01:38:07.520 | I mean, you can provide, many people have provided
01:38:10.280 | some definition, I have my own definition.
01:38:13.520 | - Where does your definition begin if it doesn't end?
01:38:16.240 | - Well, I think intelligence is essentially the efficiency
01:38:21.240 | with which you turn experience into generalizable programs.
01:38:27.440 | So what that means is it's the efficiency
01:38:32.000 | with which you turn a sampling of experience space
01:38:36.360 | into the ability to process a larger chunk
01:38:41.360 | of experience space.
01:38:46.080 | So measuring skill can be one proxy,
01:38:51.080 | skill across many different tasks can be one proxy
01:38:53.600 | for measuring intelligence,
01:38:54.560 | but if you don't want to only measure skill,
01:38:57.360 | you should control for two things.
01:38:58.840 | You should control for the amount of experience
01:39:03.960 | that your system has and the priors that your system has.
01:39:08.960 | But if you control, if you look at two agents
01:39:12.080 | and you give them the same priors
01:39:13.960 | and you give them the same amount of experience,
01:39:17.200 | there is one of the agents that is going to learn programs,
01:39:22.200 | representations, something, a model,
01:39:24.840 | that will perform well on the larger chunk
01:39:27.800 | of experience space than the other,
01:39:29.560 | and that is the smarter agent.
01:39:32.320 | - Yeah, so if you fix the experience,
01:39:35.280 | it's about which agent generates better programs,
01:39:37.800 | better meaning more generalizable.
01:39:39.760 | That's really interesting.
01:39:40.680 | That's a very nice, clean definition of--
01:39:42.600 | - Oh, by the way, in this definition,
01:39:45.600 | it is already very obvious
01:39:47.440 | that intelligence has to be specialized,
01:39:49.560 | because you're talking about experience space
01:39:51.920 | and you're talking about segments of experience space.
01:39:54.240 | You're talking about priors
01:39:55.640 | and you're talking about experience.
01:39:57.320 | All of these things define the context
01:40:00.480 | in which intelligence emerges.
01:40:02.880 | And you can never look at the totality
01:40:07.560 | of experience space, right?
01:40:09.000 | So intelligence has to be specialized.
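A minimal sketch of this definition in Python, with all names invented for illustration: two hypothetical agents get the same priors and the same fixed budget of experience on a toy "experience space", and the smarter one, under this definition, is the one whose learned program generalizes to more of the held-out space.

```python
# Toy illustration (hypothetical, not from the conversation): fix the priors
# and the experience budget, then compare how far each agent's learned
# program generalizes over held-out experience space.
import random

def generalization_score(fit_agent, train_samples, test_samples):
    """Fit an agent on a fixed sample of experience space and score the
    program it produces on a much larger held-out chunk of that space."""
    program = fit_agent(train_samples)               # experience -> program
    hits = sum(program(x) == y for x, y in test_samples)
    return hits / len(test_samples)                  # fraction generalized

# Toy experience space: mapping integers to their parity.
space = [(x, x % 2) for x in range(10_000)]
random.seed(0)
random.shuffle(space)
train, test = space[:50], space[50:]                 # same experience budget for both agents

# Agent A memorizes its experience; Agent B stands in for an agent that
# has induced the underlying rule from the same 50 samples.
agent_a = lambda data: (lambda x, table=dict(data): table.get(x, 0))
agent_b = lambda data: (lambda x: x % 2)

print("memorizer:", generalization_score(agent_a, train, test))     # ~0.5
print("rule-inducer:", generalization_score(agent_b, train, test))  # 1.0
```

Under the definition above, the second agent is more intelligent on this slice of experience space: it turned the same sampling of experience into a more broadly applicable program.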
01:40:12.400 | - But it can be sufficiently large,
01:40:13.720 | the experience space, even though it's specialized.
01:40:16.240 | There's a certain point when the experience space
01:40:18.520 | is large enough to where it might as well be general.
01:40:22.120 | It feels general, it looks general.
01:40:23.960 | - Sure, I mean, it's very relative.
01:40:25.760 | Like, for instance, many people would say
01:40:27.440 | human intelligence is general.
01:40:29.440 | In fact, it is quite specialized.
01:40:31.600 | You know, we can definitely build systems
01:40:34.720 | that start from the same innate priors
01:40:37.240 | as what humans have at birth,
01:40:39.160 | because we already understand fairly well
01:40:42.400 | what sort of priors we have as humans.
01:40:44.640 | Like, many people have worked on this problem,
01:40:46.880 | most notably Elizabeth Spelke from Harvard,
01:40:51.160 | I don't know if you know her.
01:40:52.400 | She's worked a lot on what she calls core knowledge,
01:40:56.120 | and it is very much about trying to determine
01:40:59.160 | and describe what priors we are born with.
01:41:02.480 | - Like language skills and so on, all that kind of stuff.
01:41:04.800 | - Exactly.
01:41:05.640 | So we have some pretty good understanding
01:41:09.840 | of what priors we are born with.
01:41:11.480 | So we could, so I've actually been working on a benchmark
01:41:16.480 | for the past couple of years, you know, on and off.
01:41:18.760 | I hope to be able to release it at some point.
01:41:21.400 | The idea is to measure the intelligence of systems
01:41:26.880 | by controlling for priors,
01:41:28.760 | controlling for the amount of experience,
01:41:30.600 | and by assuming the same priors
01:41:33.760 | as what humans are born with,
01:41:34.920 | so that you can actually compare these scores
01:41:37.880 | to human intelligence,
01:41:39.680 | and you can actually have humans pass the same test
01:41:42.080 | in a way that's fair.
01:41:44.160 | And so importantly, such a benchmark
01:41:48.080 | should be such that
01:41:49.200 | any amount of practicing does not increase your score.
01:41:56.680 | So try to picture a game
01:41:58.560 | where no matter how much you play this game,
01:42:00.960 | that does not change your skill at the game.
01:42:05.280 | Can you picture that?
01:42:06.360 | - As a person who deeply appreciates practice,
01:42:11.640 | I cannot actually.
01:42:12.840 | I cannot, I cannot, I, yeah.
01:42:16.720 | - There's actually a very simple trick.
01:42:18.920 | So in order to come up with a task,
01:42:21.800 | so the only thing you can measure is skill at a task.
01:42:24.160 | - Yes.
01:42:25.000 | - So these tasks are gonna involve priors.
01:42:27.560 | - Yeah.
01:42:28.400 | - The trick is to know what they are,
01:42:30.600 | and to describe that.
01:42:32.360 | And then you make sure that this is the same set of priors
01:42:34.800 | as what humans start with.
01:42:36.280 | So you create a task that assumes these priors,
01:42:38.960 | that exactly documents these priors,
01:42:40.680 | so that the priors are made explicit,
01:42:42.520 | and there are no other priors involved.
01:42:44.520 | And then you generate a certain number of samples
01:42:49.200 | in experience space for this task, right?
01:42:52.040 | And this, for one task,
01:42:54.880 | assuming that the task is new for the agent passing it,
01:42:59.320 | that's one test of this definition of intelligence
01:43:04.320 | that we set up.
01:43:07.360 | And now you can scale that to many different tasks,
01:43:09.880 | that all, you know,
01:43:10.960 | each task should be new to the agent passing it, right?
01:43:14.480 | And also should be human, yeah,
01:43:15.920 | human interpretable and understandable,
01:43:17.480 | so that you can actually have a human pass the same test,
01:43:19.960 | and then you can compare the score of your machine
01:43:21.680 | and the score of your human.
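To make the shape of such a benchmark concrete, here is a small hypothetical sketch in Python of the structure described above, not the actual benchmark being built: every task documents the priors it assumes, supplies a handful of demonstration pairs as the fixed experience budget, and is scored on held-out test pairs that are new to whichever solver, human or machine, is taking the test. The Task and score_solver names are invented for illustration.

```python
# Hypothetical sketch of the benchmark structure described above; all names
# are invented for illustration.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]       # placeholder input/output representation

@dataclass
class Task:
    assumed_priors: List[str]            # documented priors, e.g. ["objectness", "counting"]
    demos: List[Tuple[Grid, Grid]]       # the fixed experience budget for this task
    tests: List[Tuple[Grid, Grid]]       # held-out samples of the task's experience space

Solver = Callable[[List[Tuple[Grid, Grid]], Grid], Grid]

def score_solver(solver: Solver, tasks: List[Task]) -> float:
    """Average per-task accuracy. Because every task is new to the solver and
    assumes only documented, human-shared priors, the same score can be
    compared between a machine and a human taking the same test."""
    per_task = []
    for task in tasks:
        hits = sum(solver(task.demos, x) == y for x, y in task.tests)
        per_task.append(hits / len(task.tests))
    return sum(per_task) / len(per_task) if per_task else 0.0
```

Because each task is unseen and its priors are made explicit, practicing on one task should not raise the score on the others, which is the property discussed above.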
01:43:22.840 | - Which could be a lot,
01:43:23.680 | you could even start with a task like MNIST,
01:43:26.960 | just as long as you start with the same set of priors.
01:43:28.640 | - Yeah, so the problem with MNIST,
01:43:30.600 | humans are already trained to recognize digits, right?
01:43:34.640 | And, but let's say we're considering objects
01:43:39.640 | that are not digits, some completely arbitrary patterns.
01:43:44.600 | Well, humans already come with visual priors
01:43:47.840 | about how to process that.
01:43:50.000 | So in order to make the game fair,
01:43:51.960 | you would have to isolate these priors and describe them,
01:43:56.200 | and then express them as computational rules.
01:43:58.520 | - Having worked a lot with vision science people,
01:44:01.440 | that's exceptionally difficult.
01:44:03.080 | A lot of progress has been made,
01:44:04.360 | there's been a lot of good tests,
01:44:05.920 | and basically reducing all of human vision
01:44:08.080 | into some good priors.
01:44:09.640 | I mean, we're still probably far away from that perfectly,
01:44:12.160 | but as a start for a benchmark,
01:44:14.440 | that's an exciting possibility.
01:44:15.920 | - Yeah, so Elizabeth Spelke actually lists
01:44:20.880 | objectness as one of the core knowledge priors.
01:44:24.280 | - Objectness, cool.
01:44:25.400 | - Objectness, yeah.
01:44:27.000 | So we have priors about objectness,
01:44:29.000 | like about the visual space, about time,
01:44:31.000 | about agents, about goal-oriented behavior.
01:44:34.480 | We have many different priors,
01:44:37.000 | but what's interesting is that,
01:44:38.920 | sure, we have this pretty diverse and rich set of priors,
01:44:44.520 | but it's also not that diverse, right?
01:44:48.280 | We are not born into this world
01:44:50.240 | with a ton of knowledge about the world,
01:44:52.520 | only with a small set of
01:44:55.760 | core knowledge.
01:44:59.800 | - Yeah, sorry, do you have a sense of how
01:45:02.840 | it feels to us humans that that set is not that large?
01:45:07.080 | But just even the nature of time
01:45:09.440 | that we kind of integrate pretty effectively
01:45:11.680 | through all of our perception, all of our reasoning,
01:45:14.600 | maybe how, you know,
01:45:16.840 | do you have a sense of how easy it is
01:45:18.320 | to encode those priors?
01:45:19.600 | Maybe it requires building a universe
01:45:23.280 | and then the human brain in order to encode those priors.
01:45:27.600 | Or do you have a hope that it can be listed
01:45:29.840 | like in an axiomatic sense?
01:45:30.680 | - I don't think so.
01:45:31.520 | You have to keep in mind that any knowledge
01:45:33.600 | about the world that we are born with
01:45:36.000 | is something that has to have been encoded
01:45:39.680 | into our DNA by evolution at some point.
01:45:43.120 | And DNA is a very, very low bandwidth medium.
01:45:47.520 | Like it's extremely long and expensive
01:45:50.640 | to encode anything into DNA,
01:45:52.040 | because first of all,
01:45:53.160 | you need some sort of evolutionary pressure
01:45:57.160 | to guide this writing process.
01:45:59.200 | And then, you know,
01:46:01.640 | the higher the level of information you're trying to write,
01:46:03.840 | the longer it's going to take.
01:46:05.360 | And the thing in the environment
01:46:11.560 | that you're trying to encode knowledge about
01:46:13.840 | has to be stable over this duration.
01:46:17.120 | So you can only encode into DNA things
01:46:19.680 | that constitute an evolutionary advantage.
01:46:22.760 | So this is actually a very small subset
01:46:25.240 | of all possible knowledge about the world.
01:46:27.080 | You can only encode things that are stable,
01:46:31.160 | that are true over very, very long periods of time,
01:46:33.760 | typically millions of years.
01:46:35.400 | For instance, we might have some visual prior
01:46:37.240 | about the shape of snakes, right?
01:46:40.320 | But what makes a face,
01:46:43.640 | what's the difference between a face and a non-face?
01:46:46.280 | But consider this interesting question.
01:46:49.800 | Do we have any innate sense of the visual difference
01:46:54.800 | between a male face and a female face?
01:46:58.480 | What do you think?
01:46:59.360 | For a human, I mean.
01:47:01.800 | - I would have to look back into evolutionary history
01:47:03.840 | when the genders emerged, but yeah, most likely.
01:47:08.400 | I mean, the faces of humans are quite different
01:47:10.440 | from the faces of great apes, right?
01:47:13.520 | - Yeah, that's interesting, but yeah.
01:47:17.520 | - You couldn't tell the face of a female chimpanzee
01:47:21.440 | from the face of a male chimpanzee, probably.
01:47:23.520 | - Yeah, and I don't think most humans have that ability at all.
01:47:26.280 | - So we do have innate knowledge of what makes a face,
01:47:30.880 | but it's actually impossible for us to have any
01:47:34.840 | DNA-encoded knowledge of the difference
01:47:36.800 | between a female human face and a male human face,
01:47:40.440 | because that knowledge, that information
01:47:44.960 | came up into the world actually very recently.
01:47:50.720 | If you look at the slowness of the process
01:47:54.480 | of encoding knowledge into DNA.
01:47:56.560 | - Yeah, so that's interesting.
01:47:57.520 | That's a really powerful argument.
01:47:59.280 | The DNA is a low-bandwidth medium,
01:48:00.800 | and it takes a long time to encode.
01:48:02.960 | That naturally creates a very efficient encoding.
01:48:05.360 | - And one important consequence of this is that,
01:48:09.720 | so yes, we are born into this world
01:48:12.200 | with a bunch of knowledge,
01:48:13.760 | sometimes a high level knowledge about the world,
01:48:15.880 | like the shape, the rough shape of a snake,
01:48:18.120 | or the rough shape of a face.
01:48:20.640 | But importantly, because this knowledge
01:48:22.760 | takes so long to write,
01:48:24.200 | almost all of this innate knowledge is shared
01:48:29.120 | with our cousins, with great apes, right?
01:48:33.240 | So it is not actually this innate knowledge
01:48:35.640 | that makes us special.
01:48:37.320 | - But to throw it right back at you
01:48:39.160 | from the earlier on in our discussion,
01:48:42.120 | that encoding might also include
01:48:46.240 | the entirety of the environment of Earth.
01:48:49.520 | - To some extent, so it can include things
01:48:53.200 | that are important to survival and reproduction,
01:48:56.320 | so for which there is some evolutionary pressure,
01:48:59.000 | and things that are stable, constant,
01:49:01.600 | over very, very, very long time periods.
01:49:04.960 | And honestly, it's not that much information.
01:49:07.240 | There's also, besides the bandwidth constraint
01:49:10.440 | and the constraints of the writing process,
01:49:15.360 | there's also memory constraints.
01:49:18.520 | Like DNA, the part of DNA that deals with the human brain,
01:49:22.320 | it's actually fairly small.
01:49:23.480 | It's like, you know, on the order of megabytes, right?
01:49:26.640 | There's not that much high level knowledge
01:49:28.640 | about the world you can encode.
01:49:31.200 | - That's quite brilliant and hopeful for a benchmark
01:49:35.200 | like the one you're referring to, of encoding priors.
01:49:38.960 | I actually look forward to, I'm skeptical
01:49:41.680 | whether you can do it in the next couple of years,
01:49:43.080 | but hopefully.
01:49:44.520 | - I've been working on it.
01:49:45.440 | So honestly, it's a very simple benchmark,
01:49:47.440 | and it's not like a big breakthrough or anything.
01:49:49.600 | It's more like a fun side project, right?
01:49:52.880 | - But these fun side projects, so was ImageNet.
01:49:56.360 | - These fun side projects could launch entire groups
01:50:00.760 | of efforts towards creating reasoning systems and so on.
01:50:05.160 | And I think-
01:50:06.000 | - Yeah, that's the goal.
01:50:06.840 | It's trying to measure strong generalization,
01:50:09.200 | to measure the strength of abstraction in our minds,
01:50:12.840 | well, in our minds and in artificial intelligence.
01:50:17.080 | - And if there's anything true about this science organism,
01:50:20.920 | it's that its individual cells love competition.
01:50:24.880 | And benchmarks encourage competition.
01:50:26.960 | So that's an exciting possibility.
01:50:29.640 | If you, do you think an AI winter is coming?
01:50:33.720 | And how do we prevent it?
01:50:35.440 | - Not really.
01:50:36.280 | So an AI winter is something that would occur
01:50:39.680 | when there's a big mismatch
01:50:41.360 | between how we are selling the capabilities of AI
01:50:44.800 | and the actual capabilities of AI.
01:50:47.400 | And today, so deep learning is creating a lot of value,
01:50:50.760 | and it will keep creating a lot of value in the sense that
01:50:54.760 | these models are applicable to a very wide range of problems
01:50:58.960 | that are relevant today.
01:51:00.040 | And we are only just getting started
01:51:02.160 | with applying these algorithms to every problem
01:51:05.240 | they could be solving.
01:51:06.360 | So deep learning will keep creating a lot of value
01:51:09.040 | for the time being.
01:51:10.280 | What's concerning, however, is that there's a lot of hype
01:51:14.920 | around deep learning and around AI.
01:51:16.280 | There are lots of people overselling the capabilities
01:51:20.040 | of these systems, not just the capabilities,
01:51:22.880 | but also overselling the fact that they might be
01:51:26.640 | more or less, you know, brain-like,
01:51:29.240 | like giving a kind of mystical aspect to
01:51:34.360 | these technologies,
01:51:35.720 | and also overselling the pace of progress,
01:51:39.360 | which, you know, it might look fast in the sense that
01:51:43.960 | we have this exponentially increasing number of papers.
01:51:46.760 | But again, that's just a simple consequence of the fact
01:51:51.720 | that we have ever more people coming into the field.
01:51:54.600 | It doesn't mean the progress is actually exponentially fast.
01:51:57.720 | Like, let's say you're trying to raise money
01:52:00.560 | for your startup or your research lab.
01:52:02.840 | You might want to tell, you know,
01:52:05.160 | a grandiose story to investors about how deep learning
01:52:09.160 | is just like the brain and how it can solve
01:52:11.640 | all these incredible problems like self-driving
01:52:14.360 | and robotics and so on.
01:52:15.880 | And maybe you can tell them that the field is progressing
01:52:18.280 | so fast and we are going to have AGI within 15 years
01:52:21.640 | or even 10 years.
01:52:23.000 | And none of this is true.
01:52:25.960 | And every time you're like saying these things
01:52:30.440 | and an investor or, you know, a decision maker believes them,
01:52:34.520 | well, this is like the equivalent of taking on
01:52:37.880 | credit card debt, but for trust, right?
01:52:42.520 | - Yeah.
01:52:43.080 | - And maybe this will, you know,
01:52:47.800 | this will be what enables you to raise a lot of money,
01:52:50.680 | but ultimately you are creating damage.
01:52:53.560 | You are damaging the field.
01:52:54.600 | - So that's the concern, that that debt
01:52:57.320 | is what happened with the other AI winters.
01:52:59.240 | You actually tweeted about this
01:53:02.600 | with autonomous vehicles, right?
01:53:04.120 | Almost every single company now has promised
01:53:07.320 | that they will have full autonomous vehicles by 2021, 2022.
01:53:11.640 | - That's a good example of the consequences
01:53:15.160 | of overhyping the capabilities of AI
01:53:17.960 | and the pace of progress.
01:53:19.080 | - So, because I work especially a lot recently in this area,
01:53:23.000 | I have a deep concern of what happens
01:53:25.720 | when all of these companies, after I've invested billions,
01:53:29.400 | have a meeting and say, how much did we actually,
01:53:31.960 | first of all, do we have an autonomous vehicle?
01:53:33.560 | The answer will definitely be no.
01:53:35.240 | And second would be, wait a minute,
01:53:37.880 | we've invested one, two, three, $4 billion into this
01:53:41.320 | and we made no profit.
01:53:43.160 | And the reaction to that may be going very hard
01:53:47.000 | in another direction that might impact
01:53:49.480 | even other industries.
01:53:50.280 | - And that's what we call an AI winter
01:53:52.280 | is when there is backlash,
01:53:53.480 | where no one believes any of these promises anymore
01:53:56.840 | because they've turned out to be big lies
01:53:58.680 | the first time around.
01:53:59.640 | And this will definitely happen to some extent
01:54:03.000 | for autonomous vehicles
01:54:04.600 | because the public and decision makers have been convinced
01:54:08.040 | that around 2015, they've been convinced
01:54:12.760 | by these people who are trying to raise money
01:54:14.760 | for their startups and so on,
01:54:16.200 | that L5 driving was coming in maybe 2016,
01:54:20.840 | maybe 2017, maybe 2018.
01:54:22.920 | Now in 2019, we're still waiting for it.
01:54:26.120 | And so I don't believe we are going to have
01:54:30.360 | a full-on AI winter because we have these technologies
01:54:33.400 | that are producing a tremendous amount of real value.
01:54:36.680 | - Yeah.
01:54:37.160 | - But there is also too much hype.
01:54:39.960 | So there will be some backlash,
01:54:41.640 | especially there will be backlash.
01:54:43.560 | So, you know, some startups are trying to sell
01:54:47.160 | the dream of AGI, right?
01:54:49.800 | And the fact that AGI is going to create infinite value,
01:54:53.720 | like AGI is like a free lunch.
01:54:55.720 | Like if you can develop an AI system
01:54:58.920 | that passes a certain threshold of IQ or something,
01:55:02.760 | then suddenly you have infinite value.
01:55:04.360 | And well, there are actually lots of investors
01:55:09.240 | buying into this idea.
01:55:11.240 | And, you know, they will wait maybe 10, 15 years
01:55:15.800 | and nothing will happen.
01:55:17.240 | And the next time around, well, maybe there will be
01:55:21.240 | a new generation of investors.
01:55:22.600 | No one will care.
01:55:23.400 | You know, human memory is fairly short after all.
01:55:26.920 | - I don't know about you, but because I've spoken about AGI
01:55:31.640 | sometimes poetically, like I get a lot of emails from people
01:55:35.480 | giving me, they're usually like large manifestos
01:55:39.960 | where they say to me that they have created an AGI system
01:55:46.120 | or they know how to do it.
01:55:47.160 | And there's a long write-up of how to do it.
01:55:48.920 | - I get a lot of these emails, yeah.
01:55:50.120 | - They feel a little bit like they're generated
01:55:53.560 | by an AI system, actually, but there's usually no diagram.
01:55:57.960 | - Maybe that's recursively self-improving AI.
01:56:00.760 | - Exactly.
01:56:01.320 | - It's like you have a transformer generating
01:56:03.160 | crank papers about AGI.
01:56:06.200 | - So the question is about, because you've been such a good,
01:56:09.560 | you have a good radar for crank papers.
01:56:12.600 | How do we know they're not onto something?
01:56:16.680 | How do I, so when you start to talk about AGI
01:56:22.280 | or anything like the reasoning benchmarks and so on,
01:56:24.760 | so something that doesn't have a benchmark,
01:56:27.080 | it's really difficult to know.
01:56:28.200 | I mean, I talked to Jeff Hawkins,
01:56:31.160 | who's really looking at neuroscience approaches to how,
01:56:35.160 | and there are some echoes of really interesting ideas
01:56:40.200 | in at least Jeff's case, which he's showing.
01:56:42.360 | How do you usually think about this?
01:56:45.000 | Like preventing yourself from being too narrow-minded
01:56:49.720 | and elitist about deep learning.
01:56:52.680 | It has to work on these particular benchmarks,
01:56:55.400 | otherwise it's trash.
01:56:56.360 | - Well, you know, the thing is intelligence
01:57:02.520 | does not exist in the abstract.
01:57:05.240 | Intelligence has to be applied.
01:57:07.160 | So if you don't have a benchmark,
01:57:08.440 | if you're not doing an improvement on some benchmark,
01:57:10.600 | maybe it's a new benchmark, right?
01:57:12.360 | Maybe it's not something we've been looking at before,
01:57:14.600 | but you do need a problem that you're trying to solve.
01:57:17.400 | You're not going to come up with a solution
01:57:19.080 | without a problem.
01:57:19.960 | - So you, general intelligence, I mean,
01:57:23.640 | you've clearly highlighted generalization.
01:57:25.480 | If you want to claim that you have an intelligence system,
01:57:30.040 | it should come with a benchmark.
01:57:31.080 | - It should, yes.
01:57:32.120 | It should display capabilities of some kind.
01:57:35.720 | It should show that it can create some form of value,
01:57:40.040 | even if it's a very artificial form of value.
01:57:42.760 | And that's also the reason why you don't actually
01:57:45.720 | need to care about telling which papers
01:57:48.600 | have actually some hidden potential and which do not.
01:57:52.040 | Because if there is a new technique
01:57:56.520 | that's actually creating value,
01:57:57.960 | you know, this is going to be brought to light very quickly
01:58:00.600 | because it's actually making a difference.
01:58:02.440 | So it's the difference between something that's ineffectual
01:58:04.920 | and something that is actually useful.
01:58:08.840 | And ultimately usefulness is our guide,
01:58:11.800 | not just in this field,
01:58:12.840 | but if you look at science in general,
01:58:14.920 | maybe there are many, many people over the years
01:58:16.920 | that have had some really interesting theories
01:58:19.880 | of everything, but they were just completely useless.
01:58:22.840 | And you don't actually need to tell the interesting theories
01:58:26.280 | from the useless theories.
01:58:28.040 | All you need is to see, you know,
01:58:30.200 | is this actually having an effect on something else?
01:58:33.880 | You know, is this actually useful?
01:58:35.400 | Is this making an impact or not?
01:58:36.760 | - That's beautifully put.
01:58:38.680 | I mean, the same applies to quantum mechanics,
01:58:41.000 | to a string theory, to the holographic principle.
01:58:43.480 | - Like we are doing deep learning because it works.
01:58:45.480 | You know, that's like before it started working,
01:58:48.280 | people, you know, considered people working
01:58:50.840 | on neural networks as cranks very much.
01:58:53.240 | Like, you know, no one was working on this anymore.
01:58:56.360 | And now it's working, which is what makes it valuable.
01:58:59.160 | It's not about being right, right?
01:59:01.160 | It's about being effective.
01:59:02.600 | - And nevertheless, the individual entities
01:59:04.360 | of the scientific mechanism,
01:59:06.600 | just like Yoshua Bengio, Yann LeCun,
01:59:09.240 | while being called cranks, stuck with it, right?
01:59:12.760 | - Yeah.
01:59:13.320 | - And so us individual agents,
01:59:15.480 | even if everyone's laughing at us,
01:59:17.000 | just stick with it because--
01:59:18.920 | - If you believe you have something,
01:59:20.040 | you should stick with it and see it through.
01:59:21.880 | - That's a beautiful, inspirational message to end on.
01:59:25.960 | Francois, thank you so much for talking today.
01:59:27.640 | That was amazing.
01:59:28.280 | - Thank you.
01:59:29.640 | (upbeat music)