François Chollet: Keras, Deep Learning, and the Progress of AI

00:00:00.000 | The following is a conversation with Francois Chollet.

00:00:03.760 | He's the creator of Keras, which is an open source deep learning

00:00:07.360 | library that is designed to enable

00:00:09.200 | fast, user-friendly experimentation

00:00:11.520 | with deep neural networks.

00:00:13.600 | It serves as an interface to several deep learning libraries,

00:00:16.720 | most popular of which is TensorFlow.

00:00:19.040 | And it was integrated into the TensorFlow main code

00:00:22.000 | base a while ago.

00:00:24.120 | Meaning, if you want to create, train, and use neural networks,

00:00:28.640 | probably the easiest and most popular option

00:00:31.080 | is to use Keras inside TensorFlow.

00:00:34.800 | Aside from creating an exceptionally useful

00:00:37.280 | and popular library, Francois is also a world-class AI

00:00:41.040 | researcher and software engineer at Google.

00:00:44.560 | And he's definitely an outspoken, if not controversial,

00:00:48.160 | personality in the AI world, especially

00:00:51.480 | in the realm of ideas around the future

00:00:53.760 | of artificial intelligence.

00:00:55.920 | This is the Artificial Intelligence Podcast.

00:00:58.640 | If you enjoy it, subscribe on YouTube,

00:01:01.000 | give us five stars on iTunes, support on Patreon,

00:01:04.160 | or simply connect with me on Twitter at Lex Friedman,

00:01:07.160 | spelled F-R-I-D-M-A-N. And now, here's my conversation

00:01:12.200 | with Francois Chollet.

00:01:14.880 | You're known for not sugarcoating your opinions

00:01:17.280 | and speaking your mind about ideas in AI,

00:01:19.120 | especially on Twitter.

00:01:21.160 | It's one of my favorite Twitter accounts.

00:01:22.840 | So what's one of the more controversial ideas

00:01:26.360 | you've expressed online and gotten some heat for?

00:01:30.440 | How do you pick?

00:01:33.040 | How do I pick?

00:01:33.880 | Yeah, no, I think if you go through the trouble

00:01:36.920 | of maintaining a Twitter account,

00:01:39.600 | you might as well speak your mind.

00:01:41.880 | Otherwise, what's even the point of having a Twitter account?

00:01:44.600 | It's like having a nice car and just leaving it in the garage.

00:01:48.600 | Yeah, so what's one thing for which I got

00:01:52.280 | a lot of pushback?

00:01:53.600 | Perhaps, you know, that time I wrote something

00:01:56.680 | about the idea of intelligence explosion.

00:02:00.920 | And I was questioning the idea

00:02:04.520 | and the reasoning behind this idea.

00:02:06.840 | And I got a lot of pushback on that.

00:02:09.640 | I got a lot of flack for it.

00:02:11.840 | So yeah, so intelligence explosion,

00:02:13.600 | I'm sure you're familiar with the idea,

00:02:14.960 | but it's the idea that if you were to build

00:02:18.800 | general AI problem solving algorithms,

00:02:22.920 | well, the problem of building such an AI,

00:02:27.480 | that itself is a problem that could be solved by your AI.

00:02:30.520 | And maybe it could be solved better than what humans can do.

00:02:33.760 | So your AI could start tweaking its own algorithm,

00:02:36.840 | could start being a better version of itself.

00:02:39.520 | And so on, iteratively, in a recursive fashion.

00:02:43.240 | And so you would end up with an AI

00:02:47.320 | with exponentially increasing intelligence.

00:02:50.080 | That's right.

00:02:50.880 | And I was basically questioning this idea,

00:02:55.880 | first of all, because the notion of intelligence explosion

00:02:59.040 | uses an implicit definition of intelligence

00:03:02.200 | that doesn't sound quite right to me.

00:03:05.360 | It considers intelligence as a property of a brain

00:03:11.200 | that you can consider in isolation,

00:03:13.680 | like the height of a building, for instance.

00:03:15.760 | Right.

00:03:16.640 | But that's not really what intelligence is.

00:03:19.040 | Intelligence emerges from the interaction between a brain,

00:03:24.480 | a body, like embodied intelligence,

00:03:26.720 | and an environment.

00:03:28.320 | And if you're missing one of these pieces,

00:03:30.720 | then you cannot really define intelligence anymore.

00:03:33.840 | So just tweaking a brain to make it smaller and smaller

00:03:36.800 | doesn't actually make any sense to me.

00:03:39.120 | So first of all, you're crushing the dreams of many people.

00:03:42.560 | Right.

00:03:43.360 | So there's a, let's look at like Sam Harris.

00:03:46.000 | Actually, a lot of physicists, Max Tegmark,

00:03:48.720 | people who think the universe is an information processing

00:03:53.640 | system.

00:03:54.640 | Our brain is kind of an information processing system.

00:03:57.680 | So what's the theoretical limit?

00:04:00.080 | It doesn't make sense that there should be some--

00:04:04.800 | it seems naive to think that our own brain is somehow

00:04:08.160 | the limit of the capabilities of this information.

00:04:11.600 | I'm playing devil's advocate here.

00:04:13.640 | This information processing system.

00:04:15.600 | And then if you just scale it, if you're

00:04:18.080 | able to build something that's on par with the brain,

00:04:21.600 | the process that builds it just continues,

00:04:24.000 | and it'll improve exponentially.

00:04:26.360 | So that's the logic that's used, actually,

00:04:30.160 | by almost everybody that is worried

00:04:33.960 | about superhuman intelligence.

00:04:36.800 | Yeah.

00:04:37.320 | So you're trying to make--

00:04:39.080 | so most people who are skeptical of that are kind of like,

00:04:42.280 | this doesn't-- the thought process,

00:04:44.320 | this doesn't feel right.

00:04:46.600 | That's for me as well.

00:04:47.600 | So I'm more like, it doesn't--

00:04:51.360 | the whole thing is shrouded in mystery where you can't really

00:04:54.400 | say anything concrete, but you could

00:04:56.160 | say this doesn't feel right.

00:04:57.840 | This doesn't feel like that's how the brain works.

00:05:00.600 | And you're trying to, with your blog post,

00:05:02.320 | and now making it a little more explicit.

00:05:05.640 | So one idea is that the brain doesn't exist alone.

00:05:11.280 | It exists within the environment.

00:05:13.880 | So you can't exponentially--

00:05:16.320 | you would have to somehow exponentially

00:05:18.200 | improve the environment and the brain together,

00:05:21.080 | almost, in order to create something that's much smarter

00:05:26.600 | in some kind of--

00:05:28.520 | of course, we don't have a definition of intelligence.

00:05:30.720 | That's correct.

00:05:31.320 | That's correct.

00:05:31.920 | I don't think-- if you look at very smart people today,

00:05:34.880 | even humans, not even talking about AIs,

00:05:37.920 | I don't think their brain and the performance of their brain

00:05:41.200 | is the bottleneck to their expressed intelligence,

00:05:44.720 | to their achievements.

00:05:47.200 | You cannot just tweak one part of this system,

00:05:50.520 | like of this brain-body environment system,

00:05:53.440 | and expect the capabilities, like what emerges out

00:05:56.640 | of this system, to just explode exponentially.

00:06:00.840 | Because any time you improve one part

00:06:04.120 | of a system with many interdependencies like this,

00:06:07.280 | there's a new bottleneck that arises.

00:06:09.560 | And I don't think even today, for very smart people,

00:06:12.320 | their brain is not the bottleneck to the sort

00:06:15.400 | of problems they can solve.

00:06:17.600 | In fact, many very, very smart people today,

00:06:21.480 | they're not actually solving any big scientific problems.

00:06:23.800 | They're not Einstein.

00:06:24.840 | They're like Einstein, but the patent clerk days.

00:06:29.840 | Like Einstein became Einstein because this

00:06:32.640 | was a meeting of a genius with a big problem at the right time.

00:06:39.480 | But maybe this meeting could have never happened.

00:06:42.520 | And then Einstein would have just been a patent clerk.

00:06:45.000 | And in fact, many people today are probably

00:06:48.000 | like genius level smart, but you wouldn't know,

00:06:52.280 | because they're not really expressing any of that.

00:06:54.840 | - Wow, that's brilliant.

00:06:55.680 | So we can think of the world, Earth, but also the universe

00:06:59.560 | as just as a space of problems.

00:07:02.760 | So all of these problems and tasks are roaming it,

00:07:05.200 | of various difficulty.

00:07:06.920 | And there's agents, creatures like ourselves

00:07:10.120 | and animals and so on that are also roaming it.

00:07:13.360 | And then you get coupled with a problem

00:07:16.480 | and then you solve it.

00:07:17.640 | But without that coupling, you can't demonstrate

00:07:20.880 | your quote unquote intelligence.

00:07:22.520 | - Exactly, intelligence is the meeting

00:07:24.480 | of great problem solving capabilities

00:07:27.480 | with a great problem.

00:07:28.720 | And if you don't have the problem,

00:07:30.520 | you don't really express an intelligence.

00:07:32.240 | All you're left with is potential intelligence,

00:07:34.720 | like the performance of your brain

00:07:36.240 | or how high your IQ is, which in itself is just a number.

00:07:41.240 | - So you mentioned problem solving capacity.

00:07:46.160 | What do you think of as problem solving capacity?

00:07:51.800 | Can you try to define intelligence?

00:07:55.160 | Like what does it mean to be more or less intelligent?

00:08:00.000 | Is it completely coupled to a particular problem

00:08:03.000 | or is there something a little bit more universal?

00:08:05.720 | - Yeah, I do believe all intelligence

00:08:07.440 | is specialized intelligence.

00:08:09.080 | Even human intelligence has some degree of generality.

00:08:12.200 | Well, all intelligence systems have some degree of generality

00:08:15.360 | but they're always specialized in one category of problems.

00:08:19.400 | So the human intelligence is specialized

00:08:21.880 | in the human experience and that shows at various levels.

00:08:25.560 | That shows in some prior knowledge that's innate

00:08:30.200 | that we have at birth.

00:08:32.040 | Knowledge about things like agents, goal-driven behavior,

00:08:37.040 | visual priors about what makes an object,

00:08:40.400 | priors about time and so on.

00:08:43.520 | That shows also in the way we learn.

00:08:45.320 | For instance, it's very, very easy for us

00:08:47.160 | to pick up language.

00:08:48.600 | It's very, very easy for us to learn certain things

00:08:52.040 | because we are basically hard-coded to learn them.

00:08:54.920 | And we are specialized in solving certain kinds of problem

00:08:58.280 | and we are quite useless when it comes

00:09:00.200 | to other kinds of problems.

00:09:01.440 | For instance, we are not really designed

00:09:06.160 | to handle very long-term problems.

00:09:08.800 | We have no capability of seeing the very long-term.

00:09:12.880 | We don't have very much working memory.

00:09:16.880 | - So how do you think about long-term?

00:09:20.080 | Do you think long-term planning,

00:09:21.360 | are we talking about scale of years, millennia?

00:09:24.880 | What do you mean by long-term we're not very good?

00:09:28.120 | - Well, human intelligence is specialized

00:09:29.720 | in the human experience.

00:09:30.720 | And human experience is very short.

00:09:32.600 | Like one lifetime is short.

00:09:34.240 | Even within one lifetime, we have a very hard time

00:09:38.080 | envisioning things on a scale of years.

00:09:41.160 | Like it's very difficult to project yourself

00:09:43.240 | at the scale of five years,

00:09:44.080 | at the scale of 10 years and so on.

00:09:46.160 | - Right.

00:09:47.000 | - We can solve only fairly narrowly scoped problems.

00:09:50.000 | So when it comes to solving bigger problems,

00:09:52.320 | larger scale problems,

00:09:53.760 | we are not actually doing it on an individual level.

00:09:56.360 | So it's not actually our brain doing it.

00:09:59.320 | We have this thing called civilization, right?

00:10:03.080 | Which is itself a sort of problem solving system,

00:10:06.640 | a sort of artificial intelligence system, right?

00:10:10.040 | And it's not running on one brain,

00:10:12.160 | it's running on a network of brains.

00:10:14.120 | In fact, it's running on much more

00:10:15.640 | than a network of brains.

00:10:16.800 | It's running on a lot of infrastructure,

00:10:20.120 | like books and computers and the internet

00:10:23.080 | and human institutions and so on.

00:10:25.840 | And that is capable of handling problems

00:10:30.280 | on a much greater scale than any individual human.

00:10:33.760 | If you look at computer science, for instance,

00:10:37.600 | that's an institution that solves problems

00:10:39.840 | and it is superhuman, right?

00:10:42.560 | I had to press on a greater scale,

00:10:44.200 | it can solve much bigger problems

00:10:46.880 | than an individual human could.

00:10:49.080 | And science itself, science as a system,

00:10:51.360 | as an institution is a kind of artificially intelligent

00:10:55.720 | problem solving algorithm that is superhuman.

00:10:59.400 | - Yeah, it's a, at least computer science

00:11:02.800 | is like a theorem prover at a scale of thousands,

00:11:07.760 | maybe hundreds of thousands of human beings.

00:11:10.440 | At that scale, what do you think is an intelligent agent?

00:11:14.680 | So there's us humans at the individual level,

00:11:18.320 | there is millions, maybe billions of bacteria in our skin.

00:11:23.920 | There is, that's at the smaller scale.

00:11:26.440 | You can even go to the particle level

00:11:29.200 | as systems that behave, you can say intelligently

00:11:33.560 | in some ways, and then you can look at Earth

00:11:36.720 | as a single organism, you can look at our galaxy

00:11:39.240 | and even the universe as a single organism.

00:11:42.200 | Do you think, how do you think about scale

00:11:44.680 | and defining intelligent systems?

00:11:46.320 | And we're here at Google, there is millions of devices

00:11:50.480 | doing computation in a distributed way.

00:11:53.440 | How do you think about intelligence versus scale?

00:11:55.920 | - You can always characterize anything as a system.

00:11:59.480 | - Right.

00:12:00.680 | - I think people who talk about things

00:12:03.640 | like intelligence explosion tend to focus on one agent

00:12:07.440 | is basically one brain, like one brain considered

00:12:10.200 | in isolation, like a brain, a jaw that's controlling

00:12:12.680 | a body in a very like top to bottom kind of fashion.

00:12:16.320 | And that body is pursuing goals into an environment.

00:12:19.520 | So it's a very hierarchical view.

00:12:20.720 | You have the brain at the top of the pyramid,

00:12:22.880 | then you have the body just plainly receiving orders

00:12:26.000 | and then the body is manipulating objects

00:12:27.640 | in an environment and so on.

00:12:28.920 | So everything is subordinate to this one thing,

00:12:32.960 | this epicenter, which is the brain.

00:12:34.720 | But in real life, intelligent agents

00:12:37.120 | don't really work like this, right?

00:12:39.240 | There is no strong delimitation between the brain

00:12:41.760 | and the body to start with.

00:12:43.400 | You have to look not just at the brain,

00:12:45.000 | but at the nervous system.

00:12:46.560 | But then the nervous system and the body

00:12:48.840 | are not really two separate entities.

00:12:50.760 | So you have to look at an entire animal as one agent,

00:12:53.960 | but then you start realizing as you observe an animal

00:12:57.000 | over any length of time, that a lot of the intelligence

00:13:02.000 | of an animal is actually externalized.

00:13:04.600 | That's especially true for humans.

00:13:06.240 | A lot of our intelligence is externalized.

00:13:08.880 | When you write down some notes,

00:13:10.360 | that is externalized intelligence.

00:13:11.960 | When you write a computer program,

00:13:14.000 | you are externalizing cognition.

00:13:16.000 | So it's externalized in books, it's externalized

00:13:18.280 | in computers, the internet, in other humans.

00:13:21.520 | It's externalized in language and so on.

00:13:25.400 | So there is no hard delimitation

00:13:30.400 | of what makes an intelligent agent.

00:13:32.640 | It's all about context.

00:13:33.880 | - Okay, but AlphaGo is better at Go

00:13:38.720 | than the best human player.

00:13:40.160 | There's levels of skill here.

00:13:44.960 | So do you think there's such a concept

00:13:49.960 | as intelligence explosion in a specific task?

00:13:54.720 | And then, well, yeah, do you think it's possible

00:13:58.560 | to have a category of tasks on which you do have something

00:14:02.040 | like an exponential growth of ability

00:14:05.000 | to solve that particular problem?

00:14:07.440 | - I think if you consider a specific vertical,

00:14:10.320 | it's probably possible to some extent

00:14:14.640 | I also don't think we have to speculate about it

00:14:17.640 | because we have real world examples

00:14:21.640 | of recursively self-improving intelligent systems.

00:14:25.440 | - Right.

00:14:26.280 | - So for instance, science is a problem solving system,

00:14:30.280 | a knowledge generation system,

00:14:31.960 | like a system that experiences the world in some sense

00:14:35.640 | and then gradually understands it and can act on it.

00:14:39.520 | And that system is superhuman

00:14:42.080 | and it is clearly recursively self-improving

00:14:45.560 | because science feeds into technology.

00:14:47.520 | Technology can be used to build better tools,

00:14:50.160 | better computers, better instrumentation and so on,

00:14:52.840 | which in turn can make science faster, right?

00:14:56.680 | So science is probably the closest thing we have today

00:15:00.520 | to a recursively self-improving superhuman AI.

00:15:04.720 | And you can just observe,

00:15:06.480 | is scientific progress today exploding,

00:15:10.280 | which itself is an interesting question.

00:15:12.760 | And you can use that as a basis to try to understand

00:15:15.520 | what will happen with a superhuman AI

00:15:17.840 | that has science-like behavior.

00:15:20.960 | - Let me linger on it a little bit more.

00:15:23.280 | What is your intuition why an intelligence explosion

00:15:27.560 | is not possible?

00:15:28.480 | Like taking the scientific,

00:15:30.880 | all the scientific revolutions,

00:15:33.200 | why can't we slightly accelerate that process?

00:15:38.080 | - So you can absolutely accelerate

00:15:41.200 | any problem-solving process.

00:15:43.120 | So recursive self-improvement is absolutely a real thing.

00:15:48.120 | But what happens with a recursively self-improving system

00:15:51.880 | is typically not explosion

00:15:53.680 | because no system exists in isolation.

00:15:56.480 | And so tweaking one part of the system

00:15:58.640 | means that suddenly another part of the system

00:16:00.880 | becomes a bottleneck.

00:16:02.160 | And if you look at science, for instance,

00:16:03.760 | which is clearly recursively self-improving,

00:16:06.800 | clearly a problem-solving system,

00:16:09.040 | scientific progress is not actually exploding.

00:16:12.000 | If you look at science,

00:16:13.520 | what you see is the picture of a system

00:16:16.480 | that is consuming an exponentially increasing

00:16:19.240 | amount of resources.

00:16:20.520 | But it's having a linear output

00:16:23.920 | in terms of scientific progress.

00:16:25.960 | And maybe that will seem like a very strong claim.

00:16:28.920 | Many people are actually saying that,

00:16:31.120 | scientific progress is exponential,

00:16:34.520 | but when they're claiming this,

00:16:36.120 | they're actually looking at indicators

00:16:38.400 | of resource consumption by science.

00:16:43.080 | For instance, the number of papers being published,

00:16:46.680 | the number of patents being filed and so on,

00:16:49.960 | which are just completely correlated

00:16:53.600 | with how many people are working on science today.

00:16:58.480 | So it's actually an indicator of resource consumption.

00:17:00.640 | But what you should look at is the output,

00:17:03.200 | is progress in terms of the knowledge that science generates

00:17:08.040 | in terms of the scope and significance

00:17:10.640 | of the problems that we solve.

00:17:12.520 | And some people have actually been trying to measure that.

00:17:16.720 | Like Michael Nielsen, for instance.

00:17:20.160 | He had a very nice paper,

00:17:21.920 | I think that was last year about it.

00:17:23.720 | So his approach to measure scientific progress

00:17:28.360 | was to look at the timeline of scientific discoveries

00:17:33.360 | over the past 100, 150 years.

00:17:37.160 | And for each measured discovery,

00:17:41.360 | ask a panel of experts

00:17:43.480 | to rate the significance of the discovery.

00:17:46.760 | And if the output of science as an institution

00:17:49.600 | were exponential, you would expect

00:17:51.640 | the temporal density of significance

00:17:56.600 | to go up exponentially.

00:17:58.160 | Maybe because there's a faster rate of discoveries,

00:18:00.960 | maybe because the discoveries are increasingly

00:18:03.560 | more important.

00:18:04.920 | And what actually happens if you plot

00:18:07.840 | this temporal density of significance measured in this way,

00:18:11.320 | is that you see very much a flat graph.

00:18:14.520 | You see a flat graph across all disciplines,

00:18:16.600 | across physics, biology, medicine, and so on.

00:18:19.720 | And it actually makes a lot of sense if you think about it,

00:18:23.280 | because think about the progress of physics

00:18:26.000 | 110 years ago, right?

00:18:28.000 | It was a time of crazy change.

00:18:30.040 | Think about the progress of technology,

00:18:31.960 | you know, 170 years ago,

00:18:34.360 | when we started having, you know,

00:18:35.360 | replacing horses with cars,

00:18:37.560 | when we started having electricity and so on.

00:18:40.000 | It was a time of incredible change.

00:18:41.520 | And today is also a time of very, very fast change,

00:18:44.600 | but it would be an unfair characterization

00:18:48.040 | to say that today, technology and science

00:18:50.560 | are moving way faster than they did 50 years ago,

00:18:52.920 | 100 years ago.

00:18:54.360 | And if you do try to

00:18:58.240 | rigorously plot the temporal density

00:19:01.520 | of the significance,

00:19:04.840 | yeah, of significance idea,

00:19:06.000 | of significance idea, sorry,

00:19:07.360 | you do see very flat curves.

00:19:09.720 | - That's fascinating.

00:19:10.560 | - And you can check out the paper that Michael Nielsen

00:19:13.800 | had about this idea.

00:19:16.000 | And so the way I interpret it is,

00:19:20.000 | as you make progress,

00:19:22.600 | you know, in a given field

00:19:24.200 | or in a given subfield of science,

00:19:26.120 | it becomes exponentially more difficult

00:19:28.680 | to make further progress.

00:19:30.440 | Like the very first person to work on information theory.

00:19:35.000 | If you enter a new field,

00:19:36.440 | and it's still the very early years,

00:19:37.920 | there's a lot of low hanging fruit you can pick.

00:19:41.160 | - That's right, yeah.

00:19:42.000 | - But the next generation of researchers

00:19:43.960 | is gonna have to dig much harder actually

00:19:48.120 | to make smaller discoveries,

00:19:50.160 | probably larger numbers, smaller discoveries.

00:19:52.640 | And to achieve the same amount of impact,

00:19:54.640 | you're gonna need a much greater headcount.

00:19:57.480 | And that's exactly the picture you're seeing with science,

00:20:00.040 | is that the number of scientists and engineers

00:20:03.760 | is in fact increasing exponentially.

00:20:06.520 | The amount of computational resources

00:20:08.400 | that are available to science

00:20:10.040 | is increasing exponentially and so on.

00:20:11.840 | So the resource consumption of science is exponential,

00:20:15.560 | but the output in terms of progress,

00:20:18.160 | in terms of significance is linear.

00:20:20.960 | And the reason why is because,

00:20:23.080 | and even though science is recursively self-improving,

00:20:25.960 | meaning that scientific progress

00:20:28.400 | turns into technological progress,

00:20:30.200 | which in turn helps science.

00:20:32.920 | If you look at computers, for instance,

00:20:35.240 | are a product of science,

00:20:37.760 | and computers are tremendously useful

00:20:40.320 | in speeding up science.

00:20:41.520 | The internet, same thing.

00:20:42.680 | The internet is a technology that's made possible

00:20:44.640 | by very recent scientific advances.

00:20:47.440 | And itself, because it enables scientists to network,

00:20:52.400 | to communicate, to exchange papers and ideas much faster,

00:20:55.520 | it is a way to speed up scientific progress.

00:20:57.400 | So even though you're looking

00:20:58.400 | at a recursively self-improving system,

00:21:01.400 | it is consuming exponentially more resources

00:21:04.080 | to produce the same amount of problem-solving very much.

00:21:09.080 | - So that's a fascinating way to paint it.

00:21:11.080 | And certainly that holds for the deep learning community.

00:21:14.920 | If you look at the temporal, what did you call it?

00:21:18.080 | The temporal density of significant ideas.

00:21:21.200 | If you look at in deep learning,

00:21:23.880 | I think, I'd have to think about that,

00:21:26.920 | but if you really look at significant ideas

00:21:29.000 | in deep learning, they might even be decreasing.

00:21:32.360 | - So I do believe the per paper significance is decreasing.

00:21:37.360 | But the amount of papers is still today

00:21:42.360 | exponentially increasing.

00:21:43.400 | So I think if you look at an aggregate,

00:21:45.840 | my guess is that you would see a linear progress.

00:21:48.840 | - Linear progress.

00:21:49.680 | - If you were to sum the significance of all papers,

00:21:53.800 | you would see a roughly linear progress.

00:21:58.600 | And in my opinion, it is not a coincidence

00:22:03.600 | that you're seeing linear progress in science

00:22:05.760 | despite exponential resource consumption.

00:22:07.680 | I think the resource consumption

00:22:10.280 | is dynamically adjusting itself to maintain linear progress

00:22:15.280 | because we as a community expect linear progress,

00:22:18.520 | meaning that if we start investing less

00:22:21.240 | and seeing less progress,

00:22:22.320 | it means that suddenly there are some lower hanging fruits

00:22:25.720 | that become available and someone's gonna step up

00:22:29.600 | and pick them.

00:22:31.240 | So it's very much like a market for discoveries and ideas.

00:22:36.240 | - But there's another fundamental part

00:22:38.680 | which you're highlighting,

00:22:39.760 | which is a hypothesis that science,

00:22:42.600 | or like the space of ideas,

00:22:45.120 | any one path you travel down,

00:22:48.120 | it gets exponentially more difficult

00:22:51.040 | to develop new ideas.

00:22:54.680 | And your sense is that's gonna hold

00:22:57.600 | across our mysterious universe.

00:23:01.480 | - Yes, well, exponential progress

00:23:03.320 | triggers exponential friction.

00:23:05.440 | So that if you tweak one part of the system,

00:23:07.400 | suddenly some other part becomes a bottleneck.

00:23:10.680 | For instance, let's say we develop some device

00:23:14.880 | that measures its own acceleration

00:23:17.160 | and then it has some engine

00:23:18.720 | and it outputs even more acceleration

00:23:20.800 | in proportion of its own acceleration

00:23:22.360 | and you drop it somewhere,

00:23:23.320 | it's not gonna reach infinite speed

00:23:25.240 | because it exists in a certain context.

00:23:27.880 | So the air around it is gonna generate friction.

00:23:31.000 | It's gonna block it at some top speed.

00:23:34.320 | And even if you were to consider the broader context

00:23:37.520 | and lift the bottleneck there,

00:23:39.880 | like the bottleneck of friction,

00:23:42.280 | then some other part of the system

00:23:45.200 | would start stepping in

00:23:46.440 | and creating exponential friction,

00:23:48.160 | maybe the speed of light, or whatever.

00:23:49.960 | And this definitely holds true

00:23:51.960 | when you look at the problem-solving algorithm

00:23:55.000 | that is being run by science as an institution,

00:23:58.200 | science as a system.

00:23:59.760 | As you make more and more progress,

00:24:02.080 | despite having this recursive self-improvement component,

00:24:05.920 | you are encountering exponential friction.

00:24:09.440 | Like the more researchers you have

00:24:11.600 | working on different ideas,

00:24:13.600 | the more overhead you have

00:24:15.000 | in terms of communication across researchers.

00:24:18.160 | If you look at,

00:24:19.000 | you were mentioning quantum mechanics, right?

00:24:23.040 | Well, if you want to start making significant discoveries

00:24:26.960 | today, significant progress in quantum mechanics,

00:24:29.800 | there is an amount of knowledge

00:24:31.880 | you have to ingest, which is huge.

00:24:34.160 | So there's a very large overhead

00:24:36.600 | to even start to contribute.

00:24:39.320 | There's a large amount of overhead

00:24:40.760 | to synchronize across researchers and so on.

00:24:44.120 | And of course, the significant practical experiments

00:24:47.520 | are going to require exponentially expensive equipment

00:24:52.240 | because the easier ones have already been run, right?

00:24:56.600 | - So in your senses,

00:24:59.360 | there's no way escaping,

00:25:01.840 | there's no way of escaping this kind of friction

00:25:05.720 | with artificial intelligence systems.

00:25:09.840 | - Yeah, no, I think science is a very good way

00:25:12.800 | to model what would happen

00:25:14.080 | with a superhuman recursively self-improving AI.

00:25:17.760 | - That's your sense, I mean, the--

00:25:19.800 | - That's my intuition.

00:25:20.880 | It's not like a mathematical proof of anything.

00:25:24.680 | That's not my point.

00:25:25.720 | Like, I'm not trying to prove anything.

00:25:27.360 | I'm just trying to make an argument,

00:25:28.800 | to question the narrative of intelligence explosion,

00:25:32.120 | which is quite a dominant narrative.

00:25:33.800 | And you do get a lot of pushback if you go against it.

00:25:36.760 | Because, so for many people, right,

00:25:40.200 | AI is not just a subfield of computer science.

00:25:43.120 | It's more like a belief system.

00:25:44.920 | Like this belief that the world is headed towards an event,

00:25:49.560 | the singularity, past which, you know,

00:25:52.800 | AI will become, will go exponential very much

00:25:58.040 | and the world will be transformed

00:25:59.520 | and humans will become obsolete.

00:26:01.920 | And if you go against this narrative,

00:26:04.840 | because it is not really a scientific argument,

00:26:07.840 | but more of a belief system,

00:26:09.960 | it is part of the identity of many people.

00:26:12.200 | If you go against this narrative,

00:26:13.560 | it's like you're attacking the identity

00:26:15.320 | of people who believe in it.

00:26:16.480 | It's almost like saying God doesn't exist or something.

00:26:19.240 | - Right.

00:26:20.280 | - So you do get a lot of pushback

00:26:22.960 | if you try to question these ideas.

00:26:25.080 | - First of all, I believe most people,

00:26:27.640 | they might not be as eloquent or explicit as you're being,

00:26:30.280 | but most people in computer science

00:26:32.000 | and most people who actually have built

00:26:34.040 | anything that you could call AI, quote unquote,

00:26:37.400 | would agree with you.

00:26:39.160 | They might not be describing in the same kind of way.

00:26:41.600 | It's more, so the pushback you're getting

00:26:45.040 | is from people who get attached to the narrative

00:26:48.800 | from not from a place of science,

00:26:51.060 | but from a place of imagination.

00:26:53.440 | - That's correct, that's correct.

00:26:54.800 | - So why do you think that's so appealing?

00:26:56.960 | Because the usual dreams that people have

00:27:01.960 | when you create a superintelligent system

00:27:04.000 | past the singularity, that what people imagine

00:27:06.880 | is somehow always destructive.

00:27:08.620 | Do you have, if you were put on your psychology hat,

00:27:12.280 | what's, why is it so appealing to imagine

00:27:17.280 | the ways that all of human civilization will be destroyed?

00:27:20.800 | - I think it's a good story.

00:27:22.120 | You know, it's a good story.

00:27:23.160 | And very interestingly, it mirrors

00:27:26.480 | religious stories, right?

00:27:28.600 | Religious mythology.

00:27:30.600 | If you look at the mythology of most civilizations,

00:27:34.400 | it's about the world being headed towards some final events

00:27:38.320 | in which the world will be destroyed

00:27:40.520 | and some new world order will arise

00:27:42.840 | that will be mostly spiritual,

00:27:44.960 | like the apocalypse followed by paradise probably, right?

00:27:49.480 | It's a very appealing story on a fundamental level.

00:27:52.640 | And we all need stories.

00:27:54.600 | We all need stories to structure the way we see the world,

00:27:58.160 | especially at timescales that are beyond

00:28:01.760 | our ability to make predictions, right?

00:28:04.520 | - So on a more serious, non-exponential explosion question,

00:28:09.520 | do you think there will be a time

00:28:14.960 | when we'll create something like human-level intelligence

00:28:19.800 | or intelligent systems that will make you sit back

00:28:23.800 | and be just surprised at damn how smart this thing is?

00:28:28.480 | That doesn't require exponential growth

00:28:30.160 | or exponential improvement,

00:28:32.120 | but what's your sense of the timeline and so on

00:28:35.560 | that you'll be really surprised at certain capabilities?

00:28:41.040 | And we'll talk about limitations in deep learning.

00:28:42.520 | So when do you, do you think in your lifetime

00:28:44.440 | you'll be really damn surprised?

00:28:46.600 | - Around 2013, 2014, I was many times surprised

00:28:51.400 | by the capabilities of deep learning, actually.

00:28:53.680 | - Yeah.

00:28:54.520 | - That was before we had assessed exactly

00:28:55.840 | what deep learning could do and could not do.

00:28:57.840 | And it felt like a time of immense potential.

00:29:00.560 | And then we started narrowing it down,

00:29:03.040 | but I was very surprised.

00:29:04.320 | So I would say it has already happened.

00:29:07.120 | - Was there a moment, there must've been a day in there

00:29:10.800 | where your surprise was almost bordering on the belief

00:29:15.800 | of the narrative that we just discussed.

00:29:19.040 | Was there a moment, 'cause you've written quite eloquently

00:29:22.400 | about the limits of deep learning.

00:29:23.960 | Was there a moment that you thought

00:29:25.760 | that maybe deep learning is limitless?

00:29:27.720 | - No, I don't think I've ever believed this.

00:29:32.400 | What was really shocking is that it's worked.

00:29:35.240 | - It worked at all, yeah.

00:29:36.320 | - Yeah.

00:29:37.640 | But there's a big jump between being able to do

00:29:41.600 | really good computer vision and human level intelligence.

00:29:44.880 | So I don't think at any point, I wasn't an impression

00:29:49.480 | that the results we got in computer vision

00:29:51.280 | meant that we were very close to human level intelligence.

00:29:54.120 | I don't think we were very close to human level intelligence.

00:29:56.080 | I do believe that there's no reason why we won't achieve it

00:30:00.400 | at some point.

00:30:01.800 | I also believe that, you know, it's the problem

00:30:06.440 | with talking about human level intelligence

00:30:08.600 | that implicitly you're considering like an axis

00:30:12.160 | of intelligence with different levels.

00:30:14.400 | But that's not really how intelligence works.

00:30:16.760 | Intelligence is very multidimensional.

00:30:19.520 | And so there's the question of capabilities,

00:30:22.520 | but there's also the question of being human-like.

00:30:25.600 | And it's two very different things.

00:30:27.080 | Like you can build potentially very advanced

00:30:29.760 | intelligent agents that are not human-like at all.

00:30:32.720 | And you can also build very human-like agents.

00:30:35.280 | And these are two very different things, right?

00:30:37.920 | - Right.

00:30:38.800 | Let's go from the philosophical to the practical.

00:30:42.280 | Can you give me a history of Keras

00:30:44.280 | and all the major deep learning frameworks

00:30:46.520 | that you kind of remember in relation to Keras

00:30:48.560 | and in general, TensorFlow, Theano, the old days.

00:30:52.080 | Can you give a brief overview, Wikipedia style history

00:30:55.440 | and your role in it before we return to AGI discussions?

00:30:59.160 | - Yeah, that's a broad topic.

00:31:00.720 | So I started working on Keras.

00:31:04.080 | It was a name Keras at the time.

00:31:06.240 | I actually picked the name like just the day

00:31:08.880 | I was going to release it.

00:31:10.240 | So I started working on it in February, 2015.

00:31:14.840 | And so at the time, there weren't too many people

00:31:17.280 | working on deep learning, maybe like fewer than 10,000.

00:31:20.400 | The software tooling was not really developed.

00:31:22.920 | So the main deep learning library was Caffe,

00:31:28.880 | which was mostly C++.

00:31:30.920 | - Why do you say Caffe was the main one?

00:31:32.840 | - Caffe was vastly more popular than Theano

00:31:36.080 | in late 2014, early 2015.

00:31:38.960 | Caffe was the one library that everyone was using

00:31:42.440 | for computer vision.

00:31:43.480 | - And computer vision was the most popular problem

00:31:46.200 | that you were working at the time.

00:31:47.040 | - Absolutely, like Covenants was like the subfield

00:31:49.920 | of deep learning that everyone was working on.

00:31:53.160 | So myself, so in late 2014, I was actually interested

00:31:58.160 | in RNNs, in Recurrent Neural Networks,

00:32:01.760 | which was a very niche topic at the time.

00:32:05.800 | It really took off around 2016.

00:32:08.640 | And so I was looking for good tools.

00:32:11.560 | I had used Torch 7, I had used Theano,

00:32:14.800 | used Theano a lot in Kaggle competitions.

00:32:17.600 | I had used Caffe, and there was no good solution

00:32:24.320 | for RNNs at the time.

00:32:26.160 | There was no reusable open source implementation

00:32:28.800 | of an LSTM, for instance.

00:32:30.160 | So I decided to build my own.

00:32:33.080 | And at first, the pitch for that was,

00:32:35.600 | it was gonna be mostly around LSTM,

00:32:39.080 | Recurrent Neural Networks.

00:32:40.120 | It was gonna be in Python.

00:32:41.520 | An important decision at the time

00:32:44.480 | that was kind of not obvious is that the models

00:32:47.080 | would be defined via Python code,

00:32:50.520 | which was kind of like going against the mainstream

00:32:54.560 | at the time because Caffe, PyLearn 2, and so on,

00:32:58.160 | like all the big libraries were actually going

00:33:00.800 | with the approach of having static configuration files

00:33:03.680 | in YAML to define models.

00:33:05.760 | So some libraries were using code to define models,

00:33:09.040 | like Torch 7, obviously, but that was not.

00:33:11.080 | Python.

00:33:12.480 | Lasagne was like a Theano-based, very early library

00:33:16.880 | that was, I think, developed, I'm not sure exactly,

00:33:18.840 | probably late 2014.

00:33:20.440 | - It's Python as well.

00:33:21.360 | - It's Python as well.

00:33:22.200 | It was like on top of Theano.

00:33:24.680 | And so I started working on something,

00:33:28.360 | and the value proposition at the time was that

00:33:32.520 | not only that what I think was the first reusable

00:33:36.920 | open source implementation of LSTM,

00:33:40.400 | you could combine RNNs and Covenants

00:33:44.440 | with the same library, which was not really possible before.

00:33:46.920 | Like Caffe was only doing Covenants.

00:33:49.080 | And it was kind of easy to use because,

00:33:53.040 | so before I was using Theano, I was actually using Scikit-Learn

00:33:55.680 | and I loved Scikit-Learn for its usability.

00:33:58.320 | So I drew a lot of inspiration from Scikit-Learn

00:34:01.560 | when I met Keras.

00:34:02.400 | It's almost like Scikit-Learn for neural networks.

00:34:04.960 | - Yeah, the fit function.

00:34:06.680 | - Exactly, the fit function.

00:34:07.960 | Like reducing a complex string loop

00:34:10.800 | to a single function call, right?

00:34:12.880 | And of course, some people will say,

00:34:14.880 | this is hiding a lot of details,

00:34:16.320 | but that's exactly the point, right?

00:34:18.680 | The magic is the point.

00:34:20.280 | So it's magical, but in a good way.

00:34:22.640 | It's magical in the sense that it's delightful, right?

00:34:25.560 | - Yeah, I'm actually quite surprised.

00:34:27.600 | I didn't know that it was born out of desire

00:34:29.560 | to implement RNNs and LSTMs.

00:34:32.440 | - It was.

00:34:33.280 | - That's fascinating.

00:34:34.120 | So you were actually one of the first people

00:34:36.000 | to really try to attempt

00:34:37.920 | to get the major architectures together.

00:34:40.960 | And it's also interesting,

00:34:42.720 | you made me realize that that was a design decision at all,

00:34:45.120 | is defining the model and code.

00:34:47.320 | Just, I'm putting myself in your shoes,

00:34:49.880 | whether the YAML, especially if CAFE was the most popular.

00:34:53.200 | - It was the most popular by far at the time.

00:34:56.040 | - If I were, yeah, I didn't like the YAML thing,

00:34:59.560 | but it makes more sense that you will put

00:35:02.840 | in a configuration file the definition of a model.

00:35:05.720 | That's an interesting gutsy move

00:35:07.200 | to stick with defining it in code.

00:35:10.040 | Just if you look back.

00:35:11.400 | - Other libraries were doing it as well,

00:35:13.480 | but it was definitely the more niche option.

00:35:16.320 | - Yeah, okay, Keras and then--

00:35:18.360 | - Keras, so I released Keras in March, 2015,

00:35:21.520 | and it got users pretty much from the start.

00:35:24.160 | So the deep learning community

00:35:25.080 | was very, very small at the time.

00:35:27.240 | Lots of people were starting to be interested in LSTM.

00:35:30.600 | So it was gonna release it at the right time

00:35:32.440 | because it was offering an easy to use LSTM implementation.

00:35:35.560 | Exactly at the time where lots of people started

00:35:37.680 | to be intrigued by the capabilities of RNN, RNN, so NLP.

00:35:42.280 | So it grew from there.

00:35:43.960 | Then I joined Google about six months later,

00:35:51.520 | and that was actually completely unrelated to Keras.

00:35:54.960 | I actually joined a research team

00:35:57.120 | working on image classification,

00:35:59.560 | mostly like computer vision.

00:36:00.720 | So I was doing computer vision research at Google initially.

00:36:03.680 | And immediately when I joined Google,

00:36:05.520 | I was exposed to the early internal version of TensorFlow.

00:36:10.520 | And the way it appeared to me at the time,

00:36:13.920 | and it was definitely the way it was at the time,

00:36:15.720 | is that this was an improved version of Theano.

00:36:20.720 | So I immediately knew I had to port Keras

00:36:24.720 | to this new TensorFlow thing.

00:36:26.800 | And I was actually very busy as a new Googler.

00:36:31.640 | So I had not time to work on that.

00:36:34.520 | But then in November, I think it was November 2015,

00:36:38.680 | TensorFlow got released.

00:36:41.240 | And it was kind of like my wake up call that,

00:36:44.720 | hey, I had to actually go and make it happen.

00:36:47.320 | So in December, I ported Keras to run on top of TensorFlow.

00:36:52.320 | But it was not exactly a port,

00:36:53.320 | it was more like a refactoring

00:36:55.280 | where I was abstracting away all the backend functionality

00:36:59.440 | into one module so that the same code base

00:37:02.320 | could run on top of multiple backends.

00:37:04.200 | So on top of TensorFlow or Theano.

00:37:07.440 | And for the next year, Theano stayed as the default option.

00:37:12.440 | It was easier to use, somewhat less buggy.

00:37:20.400 | It was much faster, especially when it came to Ornans.

00:37:23.360 | But eventually, TensorFlow overtook it.

00:37:27.440 | - And TensorFlow, the early TensorFlow,

00:37:30.160 | has similar architectural decisions as Theano.

00:37:33.920 | So it was a natural transition.

00:37:37.400 | - Yeah, absolutely.

00:37:38.240 | - So what, I mean, that's still Keras

00:37:41.680 | as a side, almost fun project, right?

00:37:45.240 | - Yeah, so it was not my job assignment.

00:37:48.960 | I was doing it on the side.

00:37:52.160 | And even though it grew to have a lot of users

00:37:55.800 | for a deep learning library at the time, like Stroud 2016,

00:37:59.560 | but I wasn't doing it as my main job.

00:38:02.440 | So things started changing in,

00:38:04.720 | I think it must have been maybe October 2016.

00:38:09.720 | So one year later.

00:38:11.280 | So Rajat, who was the lead on TensorFlow,

00:38:15.200 | basically showed up one day in our building

00:38:19.200 | where I was doing research and things like,

00:38:21.600 | so I did a lot of computer vision research,

00:38:24.600 | also collaborations with Christian Zugedi

00:38:27.560 | and deep learning for theorem proving.

00:38:29.640 | It was a really interesting research topic.

00:38:32.880 | And so Rajat was saying, "Hey, we saw Keras, we like it.

00:38:39.520 | We saw that you're at Google.

00:38:42.400 | Why don't you come over for like a quarter

00:38:45.280 | and work with us?"

00:38:47.280 | And I was like, "Yeah, that sounds like a great opportunity.

00:38:49.240 | Let's do it."

00:38:50.400 | And so I started working on integrating the Keras API

00:38:55.400 | into TensorFlow more tightly.

00:38:57.320 | So what followed up is a sort of like temporary

00:39:02.640 | TensorFlow only version of Keras

00:39:05.480 | that was in TensorFlow.com Trib for a while.

00:39:09.320 | And finally moved to TensorFlow Core.

00:39:12.200 | And I've never actually gotten back

00:39:15.360 | to my old team doing research.

00:39:17.560 | - Well, it's kind of funny that somebody like you

00:39:22.280 | who dreams of, or at least sees the power of AI systems

00:39:27.280 | that reason and theorem proving we'll talk about

00:39:31.640 | has also created a system that makes

00:39:34.600 | the most basic kind of Lego building

00:39:39.000 | that is deep learning, super accessible, super easy.

00:39:42.600 | So beautifully so.

00:39:43.800 | It's a funny irony that you're both,

00:39:47.200 | you're responsible for both things.

00:39:49.080 | But so TensorFlow 2.0 is kind of, there's a sprint.

00:39:53.960 | I don't know how long it'll take,

00:39:55.000 | but there's a sprint towards the finish.

00:39:56.920 | What do you look, what are you working on these days?

00:40:01.000 | What are you excited about?

00:40:02.120 | What are you excited about in 2.0?

00:40:04.200 | I mean, eager execution.

00:40:05.720 | There's so many things that just make it a lot easier

00:40:08.400 | to work.

00:40:09.720 | What are you excited about?

00:40:11.520 | And what's also really hard?

00:40:13.560 | What are the problems you have to kind of solve?

00:40:15.760 | - So I've spent the past year and a half

00:40:17.960 | working on TensorFlow 2.0.

00:40:20.840 | It's been a long journey.

00:40:22.880 | I'm actually extremely excited about it.

00:40:25.040 | I think it's a great product.

00:40:26.400 | It's a delightful product compared to TensorFlow 1.0.

00:40:29.320 | We've made huge progress.

00:40:31.400 | So on the Keras side, what I'm really excited about is that,

00:40:37.360 | so, you know, previously Keras has been this very easy

00:40:42.040 | to use high-level interface to do deep learning.

00:40:46.040 | But if you wanted to, you know,

00:40:50.760 | if you wanted a lot of flexibility,

00:40:53.280 | the Keras framework, you know,

00:40:55.600 | was probably not the optimal way to do things

00:40:58.640 | compared to just writing everything from scratch.

00:41:01.120 | So in some way, the framework was getting in the way.

00:41:04.920 | And in TensorFlow 2.0, you don't have this at all,

00:41:07.760 | actually, you have the usability of the high-level interface,

00:41:11.280 | but you have the flexibility of this lower-level interface.

00:41:14.720 | And you have this spectrum of workflows

00:41:17.120 | where you can get more or less usability

00:41:21.760 | and flexibility trade-offs depending on your needs, right?

00:41:26.760 | You can write everything from scratch

00:41:29.880 | and you get a lot of help doing so by, you know,

00:41:33.200 | subclassing models and writing some train loops

00:41:36.640 | using ego execution.

00:41:38.400 | It's very flexible.

00:41:39.320 | It's very easy to debug.

00:41:40.360 | It's very powerful.

00:41:41.360 | But all of this integrates seamlessly

00:41:44.960 | with higher-level features up to, you know,

00:41:47.800 | the classic Keras workflows,

00:41:49.400 | which are very scikit-learn-like

00:41:51.520 | and, you know, are ideal for a data scientist,

00:41:56.040 | machine learning engineer type of profile.

00:41:58.200 | So now you can have the same framework

00:42:00.800 | offering the same set of APIs

00:42:02.840 | that enable a spectrum of workflows

00:42:05.000 | that are more or less low-level, more or less high-level

00:42:08.560 | that are suitable for, you know,

00:42:10.440 | profiles ranging from researchers to data scientists

00:42:14.400 | and everything in between.

00:42:15.560 | - Yeah, so that's super exciting.

00:42:16.960 | I mean, it's not just that,

00:42:18.400 | it's connected to all kinds of tooling.

00:42:21.680 | You can go on mobile, you can go with TensorFlow Lite,

00:42:24.520 | you can go in the cloud with serving and so on,

00:42:27.240 | and all is connected together.

00:42:28.960 | And some of the best software written ever

00:42:31.880 | is often done by one person, sometimes two.

00:42:37.240 | So with Google, you're now seeing sort of Keras

00:42:40.760 | having to be integrated in TensorFlow,

00:42:42.800 | I'm sure has a ton of engineers working on.

00:42:46.400 | So, and there's, I'm sure,

00:42:48.320 | a lot of tricky design decisions to be made.

00:42:52.160 | How does that process usually happen

00:42:54.400 | from at least your perspective?

00:42:56.720 | What are the debates like?

00:42:59.720 | Is there a lot of thinking,

00:43:04.160 | considering different options and so on?

00:43:06.840 | - Yes.

00:43:08.200 | So a lot of the time I spend at Google

00:43:12.600 | is actually discussing design discussions, right?

00:43:17.240 | Writing design docs,

00:43:18.560 | participating in design review meetings and so on.

00:43:22.040 | This is, you know, as important

00:43:23.720 | as actually writing the code.

00:43:25.200 | - Right.

00:43:26.040 | - So there's a lot of thoughts,

00:43:27.200 | there's a lot of thoughts and a lot of care

00:43:29.280 | that is taken in coming up with these decisions

00:43:34.120 | and taking into account all of our users

00:43:37.080 | because TensorFlow has this extremely diverse user base,

00:43:40.640 | right?

00:43:41.480 | It's not like just one user segment

00:43:43.040 | where everyone has the same needs.

00:43:45.400 | We have small scale production users,

00:43:47.560 | large scale production users.

00:43:49.440 | We have startups, we have researchers,

00:43:52.760 | you know, it's all over the place.

00:43:55.000 | And we have to cater to all of their needs.

00:43:57.480 | - If I just look at the standard debates of C++ or Python,

00:44:02.200 | there's some heated debates.

00:44:03.920 | Do you have those at Google?

00:44:05.920 | I mean, they're not heated in terms of emotionally,

00:44:08.000 | but there's probably multiple ways to do it right.

00:44:10.720 | So how do you arrive through those design meetings

00:44:13.960 | at the best way to do it?

00:44:15.360 | Especially in deep learning where the field is evolving

00:44:19.200 | as you're doing it.

00:44:20.800 | Is there some magic to it?

00:44:23.520 | There's some magic to the process?

00:44:25.240 | - I don't know if there's magic to the process,

00:44:28.200 | but there definitely is a process.

00:44:30.680 | So making design decisions

00:44:33.920 | about satisfying a set of constraints,

00:44:36.120 | but also trying to do so in the simplest way possible,

00:44:39.960 | because this is what can be maintained,

00:44:42.280 | this is what can be expanded in the future.

00:44:45.000 | So you don't want to naively satisfy the constraints

00:44:49.160 | by just, you know, for each capability you need available,

00:44:51.960 | you're gonna come up with one argument,

00:44:53.480 | a new API and so on.

00:44:54.760 | You want to design APIs that are modular

00:44:59.560 | and hierarchical so that they have an API surface

00:45:04.120 | that is as small as possible.

00:45:06.080 | And you want this modular hierarchical architecture

00:45:11.680 | to reflect the way that domain experts

00:45:14.600 | think about the problem.

00:45:16.440 | Because as a domain expert,

00:45:17.920 | when you are reading about a new API,

00:45:19.880 | you're reading a tutorial or some docs pages,

00:45:23.680 | you already have a way

00:45:26.400 | that you're thinking about the problem.

00:45:28.240 | You already have like certain concepts in mind

00:45:32.360 | and you're thinking about how they relate together.

00:45:35.720 | And when you're reading docs,

00:45:37.240 | you're trying to build as quickly as possible

00:45:40.360 | a mapping between the concepts featured in your API

00:45:45.320 | and the concepts in your mind.

00:45:46.840 | So you're trying to map your mental model

00:45:48.920 | as a domain expert to the way things work in the API.

00:45:53.640 | So you need an API and an underlying implementation

00:45:57.080 | that are reflecting the way people think about these things.

00:46:00.120 | - So you're minimizing the time it takes to do the mapping.

00:46:02.920 | - Yes, minimizing the time,

00:46:04.720 | the cognitive load there is

00:46:06.600 | in ingesting this new knowledge about your API.

00:46:10.960 | An API should not be self-referential

00:46:13.200 | or referring to implementation details.

00:46:15.560 | It should only be referring to domain specific concepts

00:46:19.200 | that people already understand.

00:46:21.400 | - Brilliant.

00:46:24.480 | So what's the future of Keras and TensorFlow look like?

00:46:27.560 | What does TensorFlow 3.0 look like?

00:46:29.640 | - So that's kind of too far in the future for me to answer,

00:46:33.680 | especially since I'm not even the one making these decisions.

00:46:37.800 | - Okay.

00:46:39.080 | - But so from my perspective,

00:46:41.240 | which is just one perspective

00:46:43.200 | among many different perspectives on the TensorFlow team,

00:46:46.040 | I'm really excited by developing even higher level APIs,

00:46:52.200 | higher level than Keras.

00:46:53.600 | I'm really excited by hyper parameter tuning,

00:46:56.480 | by automated machine learning, AutoML.

00:46:59.280 | I think the future is not just, you know,

00:47:03.200 | defining a model like you were assembling Lego blocks

00:47:07.600 | and then collect fit on it.

00:47:09.200 | It's more like an automagical model

00:47:13.640 | that would just look at your data

00:47:16.080 | and optimize the objective you're after, right?

00:47:19.040 | So that's what I'm looking into.

00:47:23.040 | - Yeah, so you put the baby into a room with the problem

00:47:26.440 | and come back a few hours later

00:47:28.760 | with a fully solved problem.

00:47:30.960 | - Exactly, it's not like a box of Legos.

00:47:33.560 | It's more like the combination of a kid

00:47:35.920 | that's really good at Legos and a box of Legos

00:47:38.800 | and just building the thing on its own.

00:47:41.520 | - Very nice.

00:47:42.680 | So that's an exciting future.

00:47:44.120 | And I think there's a huge amount of applications

00:47:46.080 | and revolutions to be had under the constraints

00:47:50.760 | of the discussion we previously had.

00:47:52.640 | But what do you think are the current limits

00:47:56.000 | of deep learning?

00:47:57.480 | If we look specifically at these function approximators

00:48:02.480 | that tries to generalize from data.

00:48:06.160 | So you've talked about local versus extreme generalization.

00:48:10.160 | You mentioned that neural networks don't generalize well,

00:48:13.280 | humans do.

00:48:14.560 | So there's this gap.

00:48:16.280 | So, and you've also mentioned that generalization,

00:48:19.880 | extreme generalization requires something like reasoning

00:48:22.360 | to fill those gaps.

00:48:23.960 | So how can we start trying to build systems like that?

00:48:27.560 | - Right, yeah, so this is by design, right?

00:48:30.600 | Deep learning models are like huge parametric models

00:48:35.080 | differentiable, so continuous,

00:48:39.440 | that go from an input space to an output space.

00:48:42.680 | And they're trained with gradient descent.

00:48:44.120 | So they're trained pretty much point by point.

00:48:47.200 | They're learning a continuous geometric morphing

00:48:50.520 | from an input vector space to an output vector space.

00:48:54.200 | And because this is done point by point,

00:48:59.000 | a deep neural network can only make sense

00:49:02.200 | of points in experience space that are very close

00:49:05.880 | to things that it has already seen in the stream data.

00:49:08.560 | At best, it can do interpolation across points.

00:49:12.560 | But that means, you know,

00:49:15.640 | it means in order to train your network,

00:49:17.360 | you need a dense sampling of the input cross output space,

00:49:21.680 | almost a point by point sampling,

00:49:25.240 | which can be very expensive

00:49:26.560 | if you're dealing with complex real world problems

00:49:29.320 | like autonomous driving, for instance, or robotics.

00:49:33.240 | It's doable if you're looking at the subset

00:49:36.000 | of the visual space, but even then,

00:49:37.760 | it's still fairly expensive.

00:49:38.760 | You still need millions of examples.

00:49:40.920 | And it's only gonna be able to make sense of things

00:49:44.240 | that are very close to what it has seen before.

00:49:46.840 | And in contrast to that,

00:49:48.600 | well, of course you have human intelligence,

00:49:50.160 | but even if you're not looking at human intelligence,

00:49:53.200 | you can look at very simple rules, algorithms.

00:49:56.760 | If you have a symbolic rule,

00:49:58.040 | it can actually apply to a very, very large set of inputs

00:50:03.040 | because it is abstract.

00:50:04.840 | It is not obtained by doing a point by point mapping, right?

00:50:09.840 | For instance, if you try to learn a sorting algorithm

00:50:14.000 | using a deep neural network,

00:50:15.520 | well, you're very much limited to learning point by point

00:50:18.480 | what the sorted representation of this specific list

00:50:23.320 | is like, but instead you could have a very, very simple

00:50:28.320 | sorting algorithm written in a few lines.

00:50:31.920 | Maybe it's just two nested loops

00:50:34.400 | and it can process any list at all because it is abstract,

00:50:40.560 | because it is a set of rules.

00:50:42.240 | So deep learning is really like point by point

00:50:45.160 | geometric morphings, morphings, train, risk, and descent.

00:50:48.600 | And meanwhile, abstract rules can generalize much better.

00:50:53.600 | And I think the future is really to combine the two.

00:50:56.680 | - So how do we, do you think, combine the two?

00:50:59.680 | How do we combine good point by point functions

00:51:03.520 | with programs, which is what the symbolic AI type systems?

00:51:08.520 | - Yeah.

00:51:09.880 | - At which levels the combination happen?

00:51:12.080 | I mean, obviously we're jumping into the realm

00:51:15.160 | of where there's no good answers.

00:51:17.360 | You just kind of ideas and intuitions and so on.

00:51:20.760 | - Well, if you look at the really successful AI systems

00:51:23.520 | today, I think they are already hybrid systems

00:51:26.320 | that are combining symbolic AI with deep learning.

00:51:29.520 | For instance, successful robotics systems

00:51:32.520 | are already mostly model-based, rule-based,

00:51:36.400 | things like planning algorithms and so on.

00:51:39.400 | At the same time, they're using deep learning

00:51:42.200 | as perception modules.

00:51:43.840 | Sometimes they're using deep learning as a way to inject

00:51:47.200 | fuzzy intuition into a rule-based process.

00:51:50.920 | If you look at the system like in a self-driving car,

00:51:54.560 | it's not just one big end-to-end neural network,

00:51:57.240 | you know, that wouldn't work at all.

00:51:59.000 | Precisely because in order to train that,

00:52:00.760 | you would need a dense sampling of experience space

00:52:05.160 | when it comes to driving,

00:52:06.200 | which is completely unrealistic, obviously.

00:52:08.880 | Instead, the self-driving car is mostly symbolic,

00:52:13.280 | you know, it's software, it's programmed by hand.

00:52:18.360 | So it's mostly based on explicit models,

00:52:21.640 | in this case, mostly 3D models of the environment

00:52:25.840 | around the car, but it's interfacing with the real world

00:52:29.520 | using deep learning modules, right?

00:52:31.240 | - Right, so the deep learning there serves as a way

00:52:33.440 | to convert the raw sensory information

00:52:36.080 | to something usable by symbolic systems.

00:52:38.360 | Okay, well, let's linger on that a little more.

00:52:42.400 | So dense sampling from input to output.

00:52:45.440 | You said it's obviously very difficult.

00:52:48.240 | Is it possible?

00:52:50.200 | - In the case of self-driving, you mean?

00:52:51.840 | - Let's say self-driving, right?

00:52:53.080 | Self-driving, for many people,

00:52:55.800 | let's not even talk about self-driving,

00:52:59.520 | let's talk about steering, so staying inside the lane.

00:53:05.080 | Lane following, yeah, it's definitely a problem

00:53:07.080 | you can solve with an end-to-end deep learning model,

00:53:08.920 | but that's like one small subset.

00:53:10.560 | - Hold on a second.

00:53:11.600 | I don't know why you're jumping from the extreme so easily,

00:53:14.520 | 'cause I disagree with you on that.

00:53:16.280 | I think, well, it's not obvious to me

00:53:21.000 | that you can solve lane following.

00:53:23.360 | - No, it's not obvious.

00:53:24.760 | I think it's doable.

00:53:25.840 | I think in general, you know, there is no hard limitations

00:53:30.840 | to what you can learn with a deep neural network,

00:53:33.680 | as long as the search space is rich enough,

00:53:38.680 | is flexible enough, and as long as you have

00:53:42.240 | this dense sampling of the input cross output space.

00:53:45.360 | The problem is that this dense sampling

00:53:47.720 | could mean anything from 10,000 examples

00:53:51.120 | to like trillions and trillions.

00:53:52.800 | - So that's my question.

00:53:54.360 | So what's your intuition?

00:53:56.200 | And if you could just give it a chance

00:53:58.720 | and think what kind of problems can be solved

00:54:01.840 | by getting a huge amounts of data

00:54:04.240 | and thereby creating a dense mapping.

00:54:08.000 | So let's think about natural language dialogue,

00:54:12.480 | the Turing test.

00:54:14.000 | Do you think the Turing test can be solved

00:54:17.000 | with a neural network alone?

00:54:21.120 | - Well, the Turing test is all about tricking people

00:54:24.440 | into believing they're talking to a human.

00:54:26.880 | And I don't think that's actually very difficult

00:54:29.040 | because it's more about exploiting human perception

00:54:34.040 | and not so much about intelligence.

00:54:37.520 | There's a big difference between mimicking

00:54:39.680 | intelligent behavior and actual intelligent behavior.

00:54:42.080 | - So, okay, let's look at maybe the Alexa Prize and so on,

00:54:45.360 | the different formulations of the natural language

00:54:47.480 | conversation that are less about mimicking

00:54:50.520 | and more about maintaining a fun conversation

00:54:52.800 | that lasts for 20 minutes.

00:54:54.720 | That's a little less about mimicking

00:54:56.200 | and that's more about, I mean,

00:54:58.160 | it's still mimicking,

00:54:59.080 | but it's more about being able to carry forward

00:55:01.440 | a conversation with all the tangents that happen

00:55:03.640 | in dialogue and so on.

00:55:05.080 | Do you think that problem is learnable

00:55:08.320 | with this kind of, with a neural network

00:55:11.960 | that does the point-to-point mapping?

00:55:14.520 | - So I think it would be very, very challenging

00:55:16.280 | to do this with deep learning.

00:55:17.800 | I don't think it's out of the question either.

00:55:21.480 | I wouldn't rule it out.

00:55:23.240 | - The space of problems that can be solved

00:55:25.400 | with a large neural network.

00:55:26.920 | What's your sense about the space of those problems?

00:55:30.040 | So useful problems for us.

00:55:32.560 | - In theory, it's infinite, right?

00:55:34.800 | You can solve any problem.

00:55:36.200 | In practice, while deep learning is a great fit

00:55:39.800 | for perception problems, in general,

00:55:42.360 | any problem which is naturally amenable

00:55:47.360 | to explicit handcrafted rules

00:55:50.240 | or rules that you can generate by exhaustive search

00:55:53.480 | over some program space.

00:55:56.080 | So perception, artificial intuition,

00:55:59.360 | as long as you have a sufficient trained data set.

00:56:03.280 | - And that's the question.

00:56:04.280 | I mean, perception, there's interpretation

00:56:06.440 | and understanding of the scene,

00:56:08.440 | which seems to be outside the reach

00:56:10.320 | of current perception systems.

00:56:13.000 | So do you think larger networks will be able

00:56:15.960 | to start to understand the physics

00:56:18.320 | and the physics of the scene,

00:56:21.120 | the three-dimensional structure and relationships

00:56:23.400 | of objects in the scene and so on,

00:56:25.600 | or really that's where Symbolica has to step in?

00:56:28.320 | - Well, it's always possible to solve these problems

00:56:34.400 | with deep learning.

00:56:36.840 | It's just extremely inefficient.

00:56:38.640 | A model would be, an explicit rule-based abstract model

00:56:42.080 | would be a far better, more compressed representation

00:56:45.960 | of physics than learning just this mapping

00:56:48.400 | between in this situation, this thing happens.

00:56:51.000 | If you change the situation slightly,

00:56:52.800 | then this other thing happens and so on.

00:56:54.800 | - Do you think it's possible to automatically generate

00:56:57.480 | the programs that would require that kind of reasoning?

00:57:02.200 | Or does it have to, so the way the expert systems fail,

00:57:05.360 | there's so many facts about the world

00:57:07.120 | had to be hand-coded in.

00:57:08.960 | Do you think it's possible to learn those logical statements

00:57:13.480 | that are true about the world and their relationships?

00:57:17.280 | Do you think, I mean, that's kind of what theorem proving

00:57:20.360 | at a basic level is trying to do, right?

00:57:22.680 | - Yeah, except it's much harder to formulate statements

00:57:26.160 | about the world compared to formulating

00:57:28.440 | mathematical statements.

00:57:30.320 | Statements about the world tend to be subjective.

00:57:32.880 | So can you learn rule-based models?

00:57:39.200 | - Yes.

00:57:40.440 | - Yes, definitely.

00:57:41.280 | That's the field of program synthesis.

00:57:43.600 | However, today we just don't really know how to do it.

00:57:48.000 | So it's very much a graph search or tree search problem.

00:57:52.400 | And so we are limited to the sort of tree search

00:57:56.480 | and graph search algorithms that we have today.

00:57:58.560 | Personally, I think genetic algorithms are very promising.

00:58:02.760 | - So it's almost like genetic programming.

00:58:04.360 | - Genetic programming, exactly.

00:58:05.600 | - Can you discuss the field of program synthesis?

00:58:08.840 | Like how many people are working and thinking about it?

00:58:13.360 | What, where we are in the history of program synthesis

00:58:17.920 | and what are your hopes for it?

00:58:20.760 | - Well, if it were deep learning, this is like the '90s.

00:58:23.560 | (laughing)

00:58:24.600 | So meaning that we already have existing solutions.

00:58:29.160 | We are starting to have some basic understanding

00:58:34.160 | of what this is about,

00:58:35.520 | but it's still a field that is in its infancy.

00:58:37.960 | There are very few people working on it.

00:58:40.440 | There are very few real-world applications.

00:58:42.840 | So the one real-world application I'm aware of

00:58:47.640 | is Flash Fill in Excel.

00:58:50.800 | It's a way to automatically learn very simple programs

00:58:55.080 | to format cells in an Excel spreadsheet

00:58:58.240 | from a few examples.

00:59:00.280 | For instance, learning a way to format a date,

00:59:02.120 | things like that.

00:59:02.960 | - Oh, that's fascinating.

00:59:03.800 | - Yeah.

00:59:04.640 | - You know, okay, that's a fascinating topic.

00:59:06.240 | I always wonder when I provide a few samples to Excel,

00:59:10.280 | what it's able to figure out.

00:59:12.640 | Like just giving it a few dates.

00:59:15.520 | What are you able to figure out

00:59:16.960 | from the pattern I just gave you?

00:59:18.440 | That's a fascinating question.

00:59:19.680 | And it's fascinating whether that's learnable patterns.

00:59:23.240 | And you're saying they're working on that.

00:59:24.880 | - Yeah.

00:59:25.720 | - How big is the toolbox currently?

00:59:28.120 | Are we completely in the dark?

00:59:29.360 | So if you said the '90s.

00:59:30.200 | - In terms of processes?

00:59:31.680 | No, so I would say,

00:59:35.160 | so maybe '90s is even too optimistic

00:59:37.760 | because by the '90s, you know,

00:59:38.920 | we already understood backprop.

00:59:41.000 | We already understood, you know,

00:59:42.040 | the engine of deep learning,

00:59:43.880 | even though we couldn't really see its potential quite.

00:59:47.240 | Today, I don't think we've found

00:59:48.440 | the engine of program synthesis.

00:59:50.360 | - So we're in the winter before backprop.

00:59:52.800 | - Yeah.

00:59:54.120 | In a way, yes.

00:59:55.680 | So I do believe program synthesis,

00:59:58.200 | in general, discrete search over rule-based models

01:00:02.040 | is gonna be a cornerstone of AI research

01:00:04.640 | in the next century, right?

01:00:06.640 | And that doesn't mean we're gonna drop deep learning.

01:00:10.120 | Deep learning is immensely useful.

01:00:11.800 | Like being able to learn is a very flexible,

01:00:16.160 | adaptable parametric model.

01:00:18.040 | So it's got in this sense,

01:00:19.080 | that's actually immensely useful.

01:00:20.240 | Like all it's doing is pattern cognition,

01:00:23.080 | but being good at pattern cognition,

01:00:24.800 | given lots of delays is just extremely powerful.

01:00:27.800 | So we are still gonna be working on deep learning

01:00:30.240 | and we're gonna be working on program synthesis.

01:00:31.800 | We're gonna be combining the two

01:00:33.240 | in increasingly automated ways.

01:00:35.080 | - So let's talk a little bit about data.

01:00:38.520 | You've tweeted.

01:00:40.120 | (laughs)

01:00:42.280 | About 10,000 deep learning papers have been written

01:00:45.200 | about hard coding priors about a specific task

01:00:48.200 | in a neural network architecture

01:00:49.640 | works better than a lack of a prior.

01:00:52.440 | Basically summarizing all these efforts,

01:00:55.120 | they put a name to an architecture,

01:00:56.960 | but really what they're doing is hard coding some priors

01:00:59.320 | that improve the performance of the system.

01:01:01.560 | But which gets straight to the point is probably true.

01:01:05.960 | So you say that you can always buy performance,

01:01:09.280 | buy in quotes performance by either training on more data,

01:01:12.960 | better data, or by injecting task information

01:01:15.480 | to the architecture of the pre-processing.

01:01:18.440 | However, this isn't informative

01:01:19.920 | about the generalization power of the techniques used,

01:01:22.200 | the fundamental ability to generalize.

01:01:24.240 | Do you think we can go far by coming up with better methods

01:01:28.320 | for this kind of cheating,

01:01:29.960 | for better methods of large-scale annotation of data?

01:01:33.560 | So building better priors.

01:01:35.240 | - If you had made it, it's not cheating anymore.

01:01:37.360 | - Right, I'm joking about the cheating,

01:01:39.480 | but large-scale, so basically I'm asking

01:01:43.080 | about something that hasn't, from my perspective,

01:01:48.280 | been researched too much is exponential improvement

01:01:53.280 | in annotation of data.

01:01:55.080 | Do you often think about-

01:01:58.200 | - I think it's actually been researched quite a bit.

01:02:00.840 | You just don't see publications about it

01:02:02.760 | because people who publish papers

01:02:05.840 | are gonna publish about known benchmarks.

01:02:07.920 | Sometimes they're gonna release a new benchmark.

01:02:09.800 | People who actually have real-world,

01:02:11.520 | large-scale deep learning problems,

01:02:13.520 | they're gonna spend a lot of resources

01:02:15.800 | into data annotation and good data annotation pipelines,

01:02:18.400 | but you don't see any papers about it.

01:02:19.680 | - That's interesting.

01:02:20.520 | So do you think, certainly resources,

01:02:22.720 | but do you think there's innovation happening?

01:02:24.840 | - Oh yeah, definitely.

01:02:25.880 | To clarify the point in the twist,

01:02:28.840 | so machine learning in general

01:02:30.960 | is the science of generalization.

01:02:33.880 | You want to generate knowledge that can be reused

01:02:38.880 | across different datasets, across different tasks.

01:02:42.120 | And if instead you're looking at one dataset

01:02:45.400 | and then you are hard-coding knowledge

01:02:49.120 | about this task into your architecture,

01:02:51.440 | this is no more useful than training a network

01:02:55.840 | and then saying, "Oh, I found these weight values

01:02:58.360 | "perform well," right?

01:03:01.960 | So David Ha, I don't know if you know David,

01:03:05.720 | he had a paper the other day

01:03:07.520 | about weight-agnostic neural networks.

01:03:10.440 | And this was a very interesting paper

01:03:12.120 | because it really illustrates the fact that an architecture,

01:03:16.360 | even without weights, an architecture is knowledge

01:03:20.600 | about a task, it encodes knowledge.

01:03:23.720 | And when it comes to architectures

01:03:25.880 | that are handcrafted by researchers,

01:03:29.400 | in some cases it is very, very clear

01:03:32.640 | that all they are doing is artificially

01:03:35.880 | re-encoding the template that corresponds

01:03:39.480 | to the proper way to solve a task

01:03:44.000 | encoding a given dataset.

01:03:45.160 | For instance, I know if you've looked at the Baby dataset,

01:03:50.160 | which is about natural language question answering,

01:03:53.400 | it is generated by an algorithm.

01:03:55.400 | So this is a question-answer pairs

01:03:57.680 | that are generated by an algorithm.

01:03:59.280 | The algorithm is solving a certain template.

01:04:01.520 | Turns out if you craft a network

01:04:04.080 | that literally encodes this template,

01:04:06.200 | you can solve this dataset with nearly 100% accuracy.

01:04:09.680 | But that doesn't actually tell you anything

01:04:12.280 | about how to solve question answering in general,

01:04:15.560 | which is the point.

01:04:17.720 | - The question is just to linger on it,

01:04:19.440 | whether it's from the data side

01:04:20.880 | or from the size of the network.

01:04:22.920 | I don't know if you've read the blog post

01:04:25.040 | by Rich Sutton, "The Bitter Lesson,"

01:04:27.720 | where he says, "The biggest lesson that we can read

01:04:30.360 | "from 70 years of AI research is that general methods

01:04:33.480 | "that leverage computation are ultimately

01:04:35.240 | "the most effective."

01:04:37.160 | So as opposed to figuring out methods

01:04:39.760 | that can generalize effectively,

01:04:41.720 | do you think we can get pretty far

01:04:46.640 | by just having something that leverages computation

01:04:50.120 | and the improvement of computation?

01:04:51.560 | - Yeah, so I think Rich is making a very good point,

01:04:54.720 | which is that a lot of these papers,

01:04:56.840 | which are actually all about manually hard-coding

01:05:00.960 | prior knowledge about a task into some system,

01:05:04.080 | doesn't have to be deep learning architecture,

01:05:05.640 | but into some system, right?

01:05:07.040 | You know, these papers are not actually making any impact.

01:05:11.760 | Instead, what's making really long-term impact

01:05:14.840 | is very simple, very general systems

01:05:18.520 | that are really agnostic to all these tricks,

01:05:21.280 | because these tricks do not generalize.

01:05:23.360 | And of course, the one general and simple thing

01:05:27.480 | that you should focus on is that which leverages computation,

01:05:32.480 | because computation, the availability

01:05:36.200 | of large-scale computation has been increasing exponentially

01:05:39.400 | following Moore's law.

01:05:40.600 | So if your algorithm is all about exploiting this,

01:05:44.120 | then your algorithm is suddenly exponentially improving.

01:05:47.480 | So I think Rich is definitely right.

01:05:52.440 | However, you know, he's right about the past 70 years.

01:05:57.120 | He's like assessing the past 70 years.

01:05:59.560 | I am not sure that this assessment will still hold true

01:06:02.960 | for the next 70 years.

01:06:04.960 | It might to some extent, I suspect it will not,

01:06:08.600 | because the truth of his assessment

01:06:11.600 | is a function of the context, right?

01:06:14.600 | In which this research took place.

01:06:16.880 | And the context is changing.

01:06:18.400 | Like Moore's law might not be applicable anymore,

01:06:21.480 | for instance, in the future.

01:06:23.840 | And I do believe that, you know,

01:06:26.480 | when you tweak one aspect of a system,

01:06:31.440 | when you exploit one aspect of a system,

01:06:32.960 | some other aspects starts becoming the bottleneck.

01:06:36.480 | Let's say you have unlimited computation.

01:06:38.880 | Well, then data is the bottleneck.

01:06:41.520 | And I think we're already starting to be in a regime

01:06:44.640 | where our systems are so large in scale

01:06:47.160 | and so data-hungry, the data today

01:06:49.320 | and the quality of data and the scale of data

01:06:51.760 | is the bottleneck.

01:06:53.120 | And in this environment,

01:06:54.680 | the bitter lesson from Rich

01:06:58.200 | is it's not gonna be true anymore, right?

01:07:00.840 | So I think we are gonna move from a focus on a scale

01:07:05.840 | of a competition scale to focus on data efficiency.

01:07:09.880 | - Data efficiency.

01:07:10.760 | So that's getting to the question of symbolic AI,

01:07:13.120 | but to linger on the deep learning approaches,

01:07:16.160 | do you have hope for either unsupervised learning

01:07:19.240 | or reinforcement learning,

01:07:20.920 | which are ways of being more data efficient

01:07:25.920 | in terms of the amount of data they need

01:07:29.600 | that required human annotation?

01:07:31.560 | - So unsupervised learning and reinforcement learning

01:07:34.240 | are frameworks for learning,

01:07:36.160 | but they're not like any specific technique.

01:07:39.000 | So usually when people say reinforcement learning,

01:07:41.200 | what they really mean is deep reinforcement learning,

01:07:43.240 | which is like one approach,

01:07:45.720 | which is actually very questionable.

01:07:47.440 | The question I was asking was unsupervised learning

01:07:50.920 | with deep neural networks and deep reinforcement learning.

01:07:54.640 | - Well, these are not really data efficient

01:07:56.840 | because you're still leveraging, you know,

01:07:58.360 | this huge parametric models,

01:08:00.040 | trying point by point with gradient descent.

01:08:02.440 | It is more efficient in terms of the number of annotations,

01:08:08.040 | the density of annotations you need.

01:08:09.440 | So the idea being to learn the latent space

01:08:13.280 | around which the data is organized

01:08:15.320 | and then map the sparse annotations into it.

01:08:18.720 | And sure, I mean, that's clearly a very good idea.

01:08:21.800 | It's not really a topic I would be working on,

01:08:26.080 | but it's clearly a good idea.

01:08:27.920 | - So it would get us to solve some problems that-

01:08:31.720 | - It will get us to incremental improvements

01:08:34.840 | in label data efficiency.

01:08:38.160 | - Do you have concerns about short-term

01:08:42.160 | or long-term threats from AI,

01:08:44.560 | from artificial intelligence?

01:08:46.000 | - Yes, definitely to some extent.

01:08:50.520 | - And what's the shape of those concerns?

01:08:52.800 | - This is actually something I've briefly written about,

01:08:56.840 | but the capabilities of deep learning technology

01:09:01.840 | can be used in many ways that are concerning

01:09:06.200 | from mass surveillance with things like facial recognition,

01:09:11.920 | in general, tracking lots of data about everyone

01:09:15.480 | and then being able to making sense of this data

01:09:18.960 | to do identification, to do prediction.

01:09:21.040 | That's concerning.

01:09:23.160 | That's something that's being very aggressively pursued

01:09:26.600 | by totalitarian states like China.

01:09:29.960 | One thing I am very much concerned about is that

01:09:34.680 | our lives are increasingly online,

01:09:40.680 | are increasingly digital, made of information,

01:09:43.280 | made of information consumption and information production

01:09:46.720 | or digital footprints, I would say.

01:09:51.880 | And if you absorb all of this data

01:09:56.360 | and you are in control of where you consume information,

01:10:01.360 | social networks and so on,

01:10:03.320 | recommendation engines,

01:10:07.000 | then you can build a sort of reinforcement loop

01:10:11.520 | for human behavior.

01:10:13.880 | You can observe the state of your mind at time T.

01:10:18.400 | You can predict how you would react

01:10:21.120 | to different pieces of content,

01:10:22.760 | how to get you to move your mind in a certain direction.

01:10:27.080 | And then you can feed you the specific piece of content

01:10:32.080 | that would move you in a specific direction.

01:10:35.800 | And you can do this at scale,

01:10:37.840 | at scale in terms of doing it continuously in real time.

01:10:44.960 | You can also do it at scale in terms of scaling this

01:10:47.960 | to many, many people, to entire populations.

01:10:50.400 | So potentially artificial intelligence,

01:10:53.920 | even in its current state,

01:10:55.680 | if you combine it with the internet,

01:10:58.560 | with the fact that we have,

01:11:00.720 | all of our lives are moving to digital devices

01:11:04.160 | and digital information consumption and creation,

01:11:06.800 | what you get is the possibility

01:11:10.640 | to achieve mass manipulation of behavior

01:11:14.520 | and mass psychological control.

01:11:16.880 | And this is a very real possibility.

01:11:18.600 | - Yeah, so you're talking about

01:11:20.080 | any kind of recommender system?

01:11:21.720 | - Yeah.

01:11:22.560 | - Let's look at the YouTube algorithm, Facebook,

01:11:26.160 | anything that recommends content you should watch next.

01:11:29.680 | And it's fascinating to think that there's some aspects

01:11:33.800 | of human behavior that you can,

01:11:36.640 | say a problem of,

01:11:41.080 | is this person hold Republican beliefs or Democratic beliefs

01:11:45.360 | and it's a trivial, that's an objective function

01:11:50.240 | and you can optimize and you can measure

01:11:52.600 | and you can turn everybody into a Republican

01:11:54.360 | or everybody into a Democrat.

01:11:55.200 | - Absolutely, yeah.

01:11:56.040 | I do believe it's true.

01:11:57.880 | So the human mind is very,

01:12:02.000 | if you look at the human mind as a kind of computer program,

01:12:05.200 | it is a very large exploit surface, right?

01:12:07.600 | It has many, many vulnerabilities.

01:12:09.360 | - Exploit surfaces, yeah.

01:12:10.880 | - Ways you can control it.

01:12:13.520 | For instance, when it comes to your political beliefs,

01:12:16.600 | this is very much tied to your identity.

01:12:19.280 | So for instance, if I'm in control of your newsfeed

01:12:23.080 | on your favorite social media platforms,

01:12:26.000 | this is actually where you're getting your news from.

01:12:29.400 | And I can, of course I can choose to only show you news

01:12:33.680 | that will make you see the world in a specific way, right?

01:12:37.120 | But I can also,

01:12:38.360 | create incentives for you to post

01:12:43.280 | about some political beliefs.

01:12:44.640 | And then when I get you to express a statement,

01:12:47.960 | if it's a statement that me as the controller,

01:12:51.800 | I want to reinforce,

01:12:53.720 | I can just show it to people who will agree

01:12:55.560 | and they will like it.

01:12:56.880 | And that will reinforce the statement in your mind.

01:12:59.240 | If this is a statement I want you to,

01:13:01.600 | this is a belief I want you to abandon,

01:13:05.200 | I can, on the other hand, show it to opponents, right?

01:13:09.520 | Will attack you.

01:13:10.560 | And then because they attack you at the very least,

01:13:12.800 | next time you will think twice about posting it.

01:13:16.760 | But maybe you will even, you know,

01:13:18.960 | stop believing this because you got pushback, right?

01:13:22.760 | So there are many ways in which

01:13:27.200 | social media platforms can potentially control your opinions.

01:13:30.520 | And today,

01:13:31.360 | so all of these things are already being controlled

01:13:36.920 | by AI algorithms.

01:13:38.240 | These algorithms do not have

01:13:39.960 | any explicit political goal today.

01:13:42.880 | While potentially they could,

01:13:44.800 | like if some totalitarian government

01:13:49.200 | takes over social media platforms

01:13:52.720 | and decides that now we are gonna use this

01:13:54.960 | not just for mass surveillance,

01:13:56.280 | but also for mass opinion control and behavior control,

01:13:59.160 | you know, very bad things could happen.

01:14:01.880 | But what's really fascinating

01:14:04.760 | and actually quite concerning is that

01:14:07.080 | even without an explicit intent to manipulate,

01:14:11.320 | you're already seeing very dangerous dynamics

01:14:14.840 | in terms of how these content recommendation

01:14:17.960 | algorithms behave.

01:14:19.760 | Because right now, the goal,

01:14:23.440 | the objective function of these algorithms

01:14:26.040 | is to maximize engagement, right?

01:14:28.640 | Which seems fairly innocuous at first, right?

01:14:32.480 | However, it is not because content

01:14:36.520 | that will maximally engage people, you know,

01:14:39.960 | get people to react in an emotional way,

01:14:43.000 | get people to click on something,

01:14:44.760 | it is very often content that, you know,

01:14:49.760 | is not healthy to the public discourse.

01:14:54.400 | For instance, fake news are far more likely

01:14:59.080 | to get you to click on them than real news,

01:15:01.360 | simply because they are not constrained to reality.

01:15:06.360 | So they can be as atrocious, as surprising,

01:15:11.440 | as good stories as you want,

01:15:13.800 | because they're artificial, right?

01:15:15.280 | - Yeah, to me, that's an exciting world

01:15:17.640 | because so much good can come.

01:15:19.600 | So there's an opportunity to educate people.

01:15:24.600 | You can balance people's worldview with other ideas.

01:15:29.640 | So there's so many objective functions,

01:15:33.880 | the space of objective functions

01:15:35.680 | that create better civilizations

01:15:37.960 | is large, arguably infinite.

01:15:40.640 | But there's also a large space that creates division

01:15:45.680 | and destruction, civil war, a lot of bad stuff.

01:15:50.680 | And the worry is, naturally,

01:15:56.200 | probably that space is bigger, first of all.

01:15:59.160 | And if we don't explicitly think about

01:16:01.520 | what kind of effects are going to be observed

01:16:06.520 | from different objective functions,

01:16:08.360 | then we're going to get into trouble.

01:16:10.240 | But the question is, how do we get into rooms

01:16:14.480 | and have discussions?

01:16:16.280 | So inside Google, inside Facebook, inside Twitter,

01:16:20.160 | and think about, okay, how can we drive up engagement

01:16:23.720 | and at the same time create a good society?

01:16:27.960 | Is it even possible to have

01:16:29.280 | that kind of philosophical discussion?

01:16:31.360 | - I think you can definitely try.

01:16:33.080 | So from my perspective,

01:16:34.800 | I would feel rather uncomfortable with companies

01:16:39.480 | that are in control of these news algorithms,

01:16:43.240 | with them making explicit decisions

01:16:45.720 | to manipulate people's opinions or behaviors,

01:16:50.440 | even if the intent is good,

01:16:52.680 | because that's a very totalitarian mindset.

01:16:55.240 | So instead, what I would like to see,

01:16:57.560 | it's probably never going to happen

01:16:58.880 | because it's not super realistic,

01:17:00.360 | but that's actually something I really care about.

01:17:02.560 | I would like all these algorithms

01:17:06.320 | to present configuration settings to their users

01:17:10.600 | so that the users can actually make the decision

01:17:13.880 | about how they want to be impacted

01:17:16.800 | by these information recommendation,

01:17:19.840 | content recommendation algorithms.

01:17:21.960 | For instance, as a user of something like YouTube

01:17:24.840 | or Twitter, maybe I want to maximize learning

01:17:28.320 | about a specific topic, right?

01:17:30.360 | So I want the algorithm to feed my curiosity, right?

01:17:35.360 | Which is in itself a very interesting problem.

01:17:38.680 | So instead of maximizing my engagement,

01:17:41.280 | it will maximize how fast and how much I'm learning.

01:17:44.720 | And it will also take into account the accuracy,

01:17:47.400 | hopefully, of the information I'm learning.

01:17:49.600 | So yeah, the user should be able to determine exactly

01:17:55.680 | how these algorithms are affecting their lives.

01:17:58.640 | I don't want actually any entity making decisions

01:18:03.600 | about in which direction

01:18:06.960 | they're going to try to manipulate me, right?

01:18:09.520 | I want technology.

01:18:11.760 | So AI, these algorithms are increasingly going to be

01:18:15.160 | our interface to a world

01:18:17.480 | that is increasingly made of information.

01:18:20.080 | And I want everyone to be in control of this interface,

01:18:25.080 | to interface with the world on their own terms.

01:18:29.120 | So if someone wants these algorithms

01:18:32.920 | to serve their own personal growth goals,

01:18:37.680 | they should be able to configure these algorithms

01:18:40.680 | in such a way.

01:18:41.880 | - Yeah, but so I know it's painful

01:18:44.960 | to have explicit decisions,

01:18:46.720 | but there is underlying explicit decisions,

01:18:51.120 | which is some of the most beautiful fundamental philosophy

01:18:54.960 | that we have before us, which is personal growth.

01:19:01.120 | If I want to watch videos from which I can learn,

01:19:04.600 | what does that mean?

01:19:08.000 | So if I have a checkbox that wants to emphasize learning,

01:19:11.840 | there's still an algorithm with explicit decisions in it

01:19:15.520 | that would promote learning.

01:19:17.800 | What does that mean for me?

01:19:19.080 | Like, for example, I've watched a documentary

01:19:20.720 | on flat earth theory, I guess.

01:19:23.840 | It was very, like, I learned a lot.

01:19:28.280 | I'm really glad I watched it.

01:19:29.880 | It was a friend recommended it to me.

01:19:31.720 | Not, 'cause I don't have such an allergic reaction

01:19:35.120 | to crazy people as my fellow colleagues do,

01:19:37.680 | but it was very eye-opening.

01:19:40.360 | And for others, it might not be.

01:19:42.200 | For others, they might just get turned off for that.

01:19:45.320 | Same with Republican and Democrat.

01:19:47.240 | And what, it's a non-trivial problem.

01:19:50.320 | And first of all, if it's done well,

01:19:53.000 | I don't think it's something that wouldn't happen,

01:19:56.600 | that YouTube wouldn't be promoting,

01:19:59.320 | or Twitter wouldn't be.

01:20:00.160 | It's just a really difficult problem.

01:20:02.320 | How do we do, how to give people control?

01:20:05.560 | - Well, it's mostly an interface design problem.

01:20:08.040 | - Right.

01:20:09.000 | - The way I see it, you want to create technology

01:20:11.080 | that's like a mentor or a coach or an assistant

01:20:16.080 | so that it's not your boss, right?

01:20:19.480 | You are in control of it.

01:20:22.600 | You are telling it what to do for you.

01:20:25.800 | And if you feel like it's manipulating you,

01:20:27.880 | it's not actually doing what you want.

01:20:31.880 | You should be able to switch to a different algorithm.

01:20:33.880 | - Right.

01:20:35.000 | So that's fine-tune control.

01:20:36.480 | And you kind of learn,

01:20:38.240 | you're trusting the human collaboration.

01:20:40.160 | I mean, that's how I see autonomous vehicles too,

01:20:41.960 | is giving as much information as possible,

01:20:44.560 | and you learn that dance yourself.

01:20:46.440 | Yeah, Adobe, I don't know if you use Adobe products

01:20:50.360 | for like Photoshop. - Yeah, I use Photoshop.

01:20:51.600 | - Yeah.

01:20:52.440 | They're trying to see if they can inject YouTube

01:20:55.080 | into their interface.

01:20:56.200 | Basically allow you to show you all these videos

01:20:59.880 | that, 'cause everybody's confused about what to do

01:21:02.840 | with features, so basically teach people by linking to,

01:21:07.160 | and that way it's an assistant that shows,

01:21:09.480 | uses videos as a basic element of information.

01:21:12.600 | Okay, so what practically should people do

01:21:18.320 | to try to fight against abuses of these algorithms

01:21:24.040 | or algorithms that manipulate us?

01:21:27.400 | - Honestly, it's a very, very difficult problem

01:21:29.280 | because to start with,

01:21:30.120 | there is very little public awareness of these issues.

01:21:33.920 | Very few people would think there's anything wrong

01:21:38.520 | with their newsfeed algorithm,

01:21:39.720 | even though there is actually something wrong already,

01:21:42.040 | which is that it's trying to maximize engagement

01:21:44.480 | most of the time, which has very negative side effects.

01:21:49.880 | So ideally, so the very first thing is to stop

01:21:54.560 | trying to purely maximize engagement,

01:21:59.560 | try to propagate content based on popularity, right?

01:22:04.560 | Instead, take into account the goals

01:22:11.000 | and the profiles of each user.

01:22:13.520 | So you will be, one example is for instance,

01:22:16.920 | when they look at topic recommendations on Twitter,

01:22:20.760 | it's like, you know, they have this news tab

01:22:24.480 | with switch recommendations, it's always the worst garbage

01:22:28.440 | because it's content that appeals to the smallest

01:22:33.160 | common denominator to all Twitter users

01:22:35.200 | because they're trying to optimize,

01:22:37.080 | they're purely trying to optimize popularity,

01:22:39.040 | they're purely trying to optimize engagement,

01:22:41.320 | but that's not what I want.

01:22:42.960 | So they should put me in control of some setting

01:22:46.120 | so that I define what's the objective function

01:22:48.920 | that Twitter is going to be following

01:22:52.200 | to show me this content.

01:22:54.080 | And honestly, so this is all about interface design.

01:22:57.240 | And it's not realistic to give users control

01:23:00.480 | over a bunch of knobs that define algorithm.

01:23:03.400 | Instead, we should purely put them in charge

01:23:06.720 | of defining the objective function.

01:23:09.360 | Like let the user tell us what they want to achieve,

01:23:13.200 | how they want this algorithm to impact their lives.

01:23:15.200 | - So do you think it is that,

01:23:16.640 | or do they provide individual article

01:23:18.720 | by article reward structure where you give a signal,

01:23:21.560 | I'm glad I saw this, or I'm glad I didn't.

01:23:24.680 | - So like a Spotify type feedback mechanism,

01:23:28.480 | it works to some extent.

01:23:29.840 | I'm kind of skeptical about it

01:23:31.960 | because the only way the algorithm,

01:23:34.840 | the algorithm will attempt to relate your choices

01:23:39.120 | with the choices of everyone else, which might, you know,

01:23:43.280 | if you have an average profile that works fine,

01:23:45.720 | I'm sure Spotify accommodations work fine

01:23:47.880 | if you just like mainstream stuff.

01:23:49.560 | If you don't, it can be, it's not optimal at all, actually.

01:23:53.960 | - It'll be an inefficient search

01:23:56.080 | for the part of the Spotify world that represents you.

01:24:00.800 | - So it's a tough problem,

01:24:02.960 | but do note that even a feedback system

01:24:07.960 | like what Spotify has does not give me control

01:24:10.880 | over what the algorithm is trying to optimize for.

01:24:14.960 | - Well, public awareness, which is what we're doing now,

01:24:19.320 | is a good place to start.

01:24:21.320 | Do you have concerns about long-term existential threats

01:24:25.920 | of artificial intelligence?

01:24:27.320 | - Well, as I was saying,

01:24:31.000 | our world is increasingly made of information.

01:24:33.360 | AI algorithms are increasingly gonna be our interface

01:24:36.200 | to this world of information,

01:24:37.840 | and somebody will be in control of these algorithms.

01:24:41.440 | And that puts us in any kind of a bad situation, right?

01:24:45.920 | It has risks.

01:24:46.840 | It has risks coming from potentially large companies

01:24:52.200 | wanting to optimize their own goals,

01:24:54.960 | maybe profits, maybe something else.

01:24:57.120 | Also from governments who might want to use these algorithms

01:25:01.920 | as a means of control of the population.

01:25:04.680 | - Do you think there's existential threat

01:25:06.200 | that could arise from that?

01:25:07.400 | - So existential threat.

01:25:10.280 | So maybe you're referring to the singularity narrative

01:25:14.440 | where robots just take over.

01:25:16.720 | - Well, I don't, not Terminator robots,

01:25:19.360 | and I don't believe it has to be a singularity.

01:25:21.960 | We're just talking to, just like you said,

01:25:25.640 | the algorithm controlling masses of populations.

01:25:28.800 | The existential threat being,

01:25:31.960 | hurt ourselves much like a nuclear war would hurt ourselves.

01:25:37.680 | That kind of thing.

01:25:38.520 | I don't think that requires a singularity,

01:25:40.440 | that requires a loss of control over AI algorithms.

01:25:43.400 | - Yes.

01:25:44.520 | So I do agree there are concerning trends.

01:25:47.960 | Honestly, I wouldn't want to make any long-term predictions.

01:25:52.880 | I don't think today we really have the capability

01:25:56.880 | to see what the dangers of AI are gonna be in 50 years,

01:26:00.480 | in a hundred years.

01:26:02.280 | I do see that we are already faced with concrete

01:26:07.280 | and present dangers surrounding the negative side effects

01:26:12.320 | of content recombination systems, of newsfeed algorithms,

01:26:15.720 | concerning algorithmic bias as well.

01:26:18.440 | So we are delegating more and more decision processes

01:26:24.400 | to algorithms.

01:26:25.840 | Some of those algorithms are uncrafted,

01:26:27.480 | some are learned from data,

01:26:30.080 | but we are delegating control.

01:26:32.680 | Sometimes it's a good thing, sometimes not so much.

01:26:37.040 | And there is in general very little supervision

01:26:40.200 | of this process, right?

01:26:41.640 | So we are still in this period of very fast change,

01:26:46.040 | even chaos, where society is restructuring itself,

01:26:51.040 | turning into an information society,

01:26:53.840 | which itself is turning into an increasingly automated

01:26:57.000 | information processing society.

01:26:59.000 | And well, yeah, I think the best we can do today

01:27:03.160 | is try to raise awareness around some of these issues.

01:27:06.640 | And I think we're actually making good progress.

01:27:08.280 | If you look at algorithmic bias, for instance,

01:27:11.720 | three years ago, even two years ago,

01:27:14.720 | very, very few people were talking about it.

01:27:17.000 | And now all the big companies are talking about it.

01:27:20.280 | They are often not in a very serious way,

01:27:22.320 | but at least it is part of the public discourse.

01:27:24.520 | You see people in Congress talking about it.

01:27:26.560 | And it all started from raising awareness.

01:27:31.560 | - Right.

01:27:32.800 | So in terms of alignment problem,

01:27:36.040 | trying to teach as we allow algorithms,

01:27:39.200 | just even recommender systems on Twitter,

01:27:41.480 | encoding human values and morals,

01:27:47.040 | decisions that touch on ethics,

01:27:50.160 | how hard do you think that problem is?

01:27:52.560 | How do we have loss functions in neural networks

01:27:57.200 | that have some component,

01:27:58.600 | some fuzzy components of human morals?

01:28:01.040 | - Well, I think this is really all about

01:28:04.720 | objective function engineering,

01:28:06.120 | which is probably going to be increasingly

01:28:08.760 | a topic of concern in the future.

01:28:10.520 | Like for now, we're just using very naive loss functions

01:28:14.680 | because the hard part is not actually

01:28:16.600 | what you're trying to minimize, it's everything else.

01:28:19.040 | But as the everything else

01:28:20.880 | is going to be increasingly automated,

01:28:22.920 | we're going to be focusing our human attention

01:28:27.120 | on increasingly high level components.

01:28:30.280 | Like what's actually driving the whole learning system,

01:28:32.640 | like the objective function.

01:28:33.920 | So loss function engineering is going to be,

01:28:36.880 | loss function engineer is probably going to be

01:28:38.480 | a job title in the future.

01:28:40.600 | - And then the tooling you're creating with Keras

01:28:42.680 | essentially takes care of all the details underneath.

01:28:47.000 | And basically the human expert is needed for exactly that.

01:28:52.000 | - That's the idea.

01:28:53.840 | Keras is the interface between the data you're collecting

01:28:57.560 | and the business goals.

01:28:59.000 | And your job as an engineer is going to be to express

01:29:02.440 | your business goals and your understanding of your business

01:29:05.360 | or your product, your system as a kind of loss function

01:29:09.760 | or a kind of set of constraints.

01:29:11.760 | - Does the possibility of creating an AGI system

01:29:14.680 | excite you or scare you?

01:29:17.160 | Or bore you?

01:29:18.120 | - So intelligence can never really be general.

01:29:22.120 | You know, at best it can have some degree of generality

01:29:24.440 | like human intelligence.

01:29:26.400 | It's also always has some specialization

01:29:29.040 | in the same way that human intelligence is specialized

01:29:31.960 | in a certain category of problems,

01:29:33.440 | is specialized in the human experience.

01:29:35.440 | And when people talk about AGI,

01:29:37.280 | I'm never quite sure if they're talking about

01:29:39.480 | very, very smart AI, so smart that it's even smarter

01:29:44.280 | than humans, or they're talking about human-like

01:29:47.200 | intelligence, because these are different things.

01:29:49.720 | - Let's say, presumably I'm impressing you today

01:29:53.280 | with my humanness.

01:29:54.800 | So imagine that I was in fact a robot.

01:29:58.400 | So what does that mean?

01:30:00.760 | I'm impressing you with natural language processing.

01:30:04.960 | Maybe if you weren't able to see me,

01:30:06.440 | maybe this is a phone call.

01:30:07.920 | So that kind of system.

01:30:09.080 | - Okay, so-- - Companion.

01:30:11.160 | - So that's very much about building human-like AI.

01:30:15.080 | And you're asking me, you know,

01:30:16.200 | is this an exciting perspective?

01:30:18.240 | - Yes.

01:30:19.480 | - I think so, yes.

01:30:20.640 | Not so much because of what artificial

01:30:26.160 | human-like intelligence could do,

01:30:28.000 | but, you know, from an intellectual perspective,

01:30:30.880 | I think if you could build truly human-like intelligence,

01:30:34.160 | that means you could actually understand human intelligence,

01:30:37.240 | which is fascinating, right?

01:30:39.880 | Human-like intelligence is gonna require emotions,

01:30:42.680 | it's gonna require consciousness,

01:30:44.400 | which is not things that would normally be required

01:30:47.080 | by an intelligent system.

01:30:49.720 | If you look at, you know, we were mentioning earlier,

01:30:51.880 | like science as superhuman problem-solving agent or system,

01:30:56.880 | it does not have consciousness, it doesn't have emotions.

01:31:02.120 | In general, so emotions, I see consciousness

01:31:05.320 | as being on the same spectrum as emotions.

01:31:07.680 | It is a component of the subjective experience

01:31:12.280 | that is meant very much to guide behavior generation, right?

01:31:17.280 | It's meant to guide your behavior.

01:31:20.840 | In general, human intelligence and animal intelligence

01:31:24.560 | has evolved for the purpose of behavior generation, right?

01:31:29.360 | Including in a social context,

01:31:30.680 | so that's why we actually need emotions,

01:31:32.520 | that's why we need consciousness.

01:31:34.960 | An artificial intelligence system

01:31:36.640 | developed in a different context may well never need them,

01:31:39.760 | may well never be conscious, like science.

01:31:43.120 | - Well, on that point, I would argue it's possible

01:31:46.000 | to imagine that there's echoes of consciousness in science

01:31:51.000 | when viewed as an organism, that science is consciousness.

01:31:54.560 | - So, I mean, how would you go about testing this hypothesis?

01:31:59.200 | How do you probe the subjective experience

01:32:02.960 | of an abstract system like science?

01:32:06.440 | - Well, the point of probing any subjective experience

01:32:09.560 | is impossible, 'cause I'm not science, I'm Lex.

01:32:13.240 | So I can't probe another entity's,

01:32:16.080 | it's no more than bacteria on my skin.

01:32:20.600 | - You're Lex, I can ask you questions

01:32:22.680 | about your subjective experience and you can answer me,

01:32:25.240 | and that's how I know you're conscious.

01:32:27.400 | - Yes, but that's because we speak the same language.

01:32:31.880 | You perhaps, we have to speak the language of science

01:32:35.080 | in order to ask it. - Honestly,

01:32:35.920 | I don't think consciousness, just like emotions

01:32:38.640 | of pain and pleasure, is not something that inevitably arises

01:32:43.640 | from any sort of sufficiently

01:32:46.040 | intelligent information processing.

01:32:48.000 | It is a feature of the mind,

01:32:49.920 | and if you've not implemented it explicitly,

01:32:52.480 | it is not there.

01:32:54.000 | - So you think it's an emergent feature

01:32:56.960 | of a particular architecture.

01:32:59.040 | So do you think--

01:33:00.400 | - It's a feature in the same sense.

01:33:02.040 | So again, the subjective experience

01:33:04.200 | is all about guiding behavior.

01:33:07.560 | If the problems you're trying to solve

01:33:11.960 | don't really involve embodied agents,

01:33:15.240 | maybe in a social context, generating behavior

01:33:18.000 | and pursuing goals like this.

01:33:19.600 | And if you look at science,

01:33:20.840 | that's not really what's happening, even though it is.

01:33:23.040 | It is a form of artificial AI, artificial intelligence,

01:33:28.040 | in the sense that it is solving problems,

01:33:30.320 | it is accumulating knowledge,

01:33:32.120 | accumulating solutions and so on.

01:33:34.120 | So if you're not explicitly implementing

01:33:38.160 | a subjective experience,

01:33:39.560 | implementing certain emotions

01:33:42.440 | and implementing consciousness,

01:33:44.160 | it's not gonna just spontaneously emerge.

01:33:47.400 | - Yeah, but so for a system like,

01:33:50.200 | human-like intelligent system that has consciousness,

01:33:53.400 | do you think it needs to have a body?

01:33:56.000 | - Yes, definitely.

01:33:56.840 | I mean, it doesn't have to be a physical body, right?

01:33:59.800 | And there's not that much difference

01:34:01.360 | between a realistic simulation and the real world.

01:34:03.520 | - Oh, so there has to be something

01:34:04.800 | you have to preserve kind of thing.

01:34:06.480 | - Yes, but human-like intelligence

01:34:08.800 | can only arise in a human-like context.

01:34:11.960 | Intelligence in the start.

01:34:12.800 | - In other humans, in order for you to demonstrate

01:34:16.920 | that you have human-like intelligence essentially.

01:34:19.120 | - Yes.

01:34:20.320 | - So what kind of test and demonstration

01:34:25.320 | would be sufficient for you

01:34:28.280 | to demonstrate human-like intelligence?

01:34:31.120 | - Yeah.

01:34:31.960 | - I just out of curiosity,

01:34:32.800 | you've talked about in terms of theorem proving

01:34:35.680 | and program synthesis,

01:34:37.120 | I think you've written about

01:34:38.160 | that there's no good benchmarks for this.

01:34:40.600 | - Yeah.

01:34:41.440 | - That's one of the problems.

01:34:42.280 | So let's talk program synthesis.

01:34:46.480 | So what do you imagine is a good,

01:34:48.960 | I think it's related questions for human-like intelligence

01:34:51.560 | and for program synthesis.

01:34:53.240 | What's a good benchmark for either or both?

01:34:56.160 | - Right, so I mean, you're actually asking two questions,

01:34:59.440 | which is one is about qualifying intelligence

01:35:02.680 | and comparing the intelligence of an artificial system

01:35:07.120 | to the intelligence for human.

01:35:08.680 | And the other is about degree to which

01:35:12.040 | this intelligence is human-like.

01:35:13.680 | It's actually two different questions.

01:35:15.600 | So if you look, you mentioned earlier the Turing test.

01:35:19.320 | - Right.

01:35:20.160 | - Well, I actually don't like the Turing test

01:35:21.720 | because it's very lazy.

01:35:23.400 | It's all about completely bypassing the problem

01:35:26.720 | of defining and measuring intelligence.

01:35:28.680 | - Right.

01:35:29.520 | - And instead delegating to a human judge

01:35:32.640 | or a panel of human judges.

01:35:34.360 | So it's a total cop out, right?

01:35:37.480 | If you want to measure how human-like an agent is,

01:35:43.360 | I think you have to make it interact with other humans.

01:35:47.680 | Maybe it's not necessarily a good idea

01:35:49.840 | to have these other humans be the judges.

01:35:53.960 | Maybe you should just observe behavior

01:35:56.880 | and compare it to what a human would actually have done.

01:35:59.680 | When it comes to measuring how smart,

01:36:03.280 | how clever an agent is and comparing that

01:36:06.360 | to the degree of human intelligence.

01:36:11.240 | So we're already talking about two things, right?

01:36:13.680 | The degree, kind of like the magnitude of an intelligence

01:36:17.840 | and its direction, right?

01:36:20.520 | Like the norm of a vector and its direction.

01:36:23.400 | And the direction is like human likeness

01:36:26.960 | and the magnitude, the norm is intelligence.

01:36:31.960 | You could call it intelligence, right?

01:36:34.240 | - So the direction, your sense, the space of directions

01:36:38.840 | that are human-like is very narrow.

01:36:41.160 | - Yeah.

01:36:42.360 | So the way you would measure the magnitude of intelligence

01:36:48.320 | in a system in a way that also enables you to compare it

01:36:51.920 | to that of a human.

01:36:54.800 | Well, if you look at different benchmarks

01:36:58.320 | for intelligence today, they're all too focused on skill

01:37:03.280 | at a given task.

01:37:04.320 | That's skill at playing chess, skill at playing Go,

01:37:07.640 | skill at playing Dota.

01:37:09.120 | And I think that's not the right way to go about it

01:37:14.480 | because you can always be too human at one specific task.

01:37:19.360 | The reason why our skill at playing Go or juggling

01:37:23.680 | or anything is impressive is because we are expressing

01:37:26.160 | this skill within a certain set of constraints.

01:37:29.440 | If you remove the constraints, the constraints

01:37:32.080 | that we have one lifetime, that we have this body and so on,

01:37:35.880 | if you remove the context, if you have unlimited string data,

01:37:40.000 | if you can have access to, for instance,

01:37:41.960 | if you look at juggling, if you have no restriction

01:37:44.920 | on the hardware, then achieving arbitrary levels of skill

01:37:48.800 | is not very interesting and says nothing about

01:37:51.920 | the amount of intelligence you've achieved.

01:37:53.880 | So if you want to measure intelligence,

01:37:55.720 | you need to rigorously define what intelligence is,

01:37:59.920 | which in itself, it's a very challenging problem.

01:38:04.400 | - And do you think that's possible?

01:38:05.960 | - To define intelligence, yes, absolutely.

01:38:07.520 | I mean, you can provide, many people have provided

01:38:10.280 | some definition, I have my own definition.

01:38:13.520 | - Where does your definition begin if it doesn't end?

01:38:16.240 | - Well, I think intelligence is essentially the efficiency

01:38:21.240 | with which you turn experience into generalizable programs.

01:38:27.440 | So what that means is it's the efficiency

01:38:32.000 | with which you turn a sampling of experience space

01:38:36.360 | into the ability to process a larger chunk

01:38:41.360 | of experience space.

01:38:46.080 | So measuring skill can be one proxy,

01:38:51.080 | because many different tasks can be one proxy

01:38:53.600 | for measuring intelligence,

01:38:54.560 | but if you want to only measure skill,

01:38:57.360 | you should control for two things.

01:38:58.840 | You should control for the amount of experience

01:39:03.960 | that your system has and the priors that your system has.

01:39:08.960 | But if you control, if you look at two agents

01:39:12.080 | and you give them the same priors

01:39:13.960 | and you give them the same amount of experience,

01:39:17.200 | there is one of the agents that is going to learn programs,

01:39:22.200 | representations, something, a model,

01:39:24.840 | that will perform well on the larger chunk

01:39:27.800 | of experience space than the other,

01:39:29.560 | and that is the smaller agent.

01:39:32.320 | - Yeah, so if you fix the experience,

01:39:35.280 | which generate better programs,

01:39:37.800 | better meaning more generalizable.

01:39:39.760 | That's really interesting.

01:39:40.680 | That's a very nice, clean definition of--

01:39:42.600 | - Oh, by the way, in this definition,

01:39:45.600 | it is already very obvious

01:39:47.440 | that intelligence has to be specialized,

01:39:49.560 | because you're talking about experience space

01:39:51.920 | and you're talking about segments of experience space.

01:39:54.240 | You're talking about priors

01:39:55.640 | and you're talking about experience.

01:39:57.320 | All of these things define the context

01:40:00.480 | in which intelligence emerges.

01:40:02.880 | And you can never look at the totality

01:40:07.560 | of experience space, right?

01:40:09.000 | So intelligence has to be specialized.

01:40:12.400 | - But it can be sufficiently large,

01:40:13.720 | the experience space, even though it's specialized.

01:40:16.240 | There's a certain point when the experience space

01:40:18.520 | is large enough to where it might as well be general.

01:40:22.120 | It feels general, it looks general.

01:40:23.960 | - Sure, I mean, it's very relative.

01:40:25.760 | Like, for instance, many people would say

01:40:27.440 | human intelligence is general.

01:40:29.440 | In fact, it is quite specialized.

01:40:31.600 | You know, we can definitely build systems

01:40:34.720 | that start from the same innate priors

01:40:37.240 | as what humans have at birth,

01:40:39.160 | because we already understand fairly well

01:40:42.400 | what sort of priors we have as humans.

01:40:44.640 | Like, many people have worked on this problem,

01:40:46.880 | most notably Elisabeth Spelke from Harvard,

01:40:51.160 | I don't know if you know her.

01:40:52.400 | She's worked a lot on what she calls core knowledge,

01:40:56.120 | and it is very much about trying to determine

01:40:59.160 | and describe what priors we are born with.

01:41:02.480 | - Like language skills and so on, all that kind of stuff.

01:41:04.800 | - Exactly.

01:41:05.640 | So we have some pretty good understanding

01:41:09.840 | of what priors we are born with.

01:41:11.480 | So we could, so I've actually been working on a benchmark

01:41:16.480 | for the past couple of years, you know, on and off.

01:41:18.760 | I hope to be able to release it at some point.

01:41:21.400 | The idea is to measure the intelligence of systems

01:41:26.880 | by considering for priors,

01:41:28.760 | considering for amount of experience,

01:41:30.600 | and by assuming the same priors

01:41:33.760 | as what humans are born with,

01:41:34.920 | so that you can actually compare these scores

01:41:37.880 | to human intelligence,

01:41:39.680 | and you can actually have humans pass the same test

01:41:42.080 | in a way that's fair.

01:41:44.160 | And so importantly, such a benchmark

01:41:48.080 | should be such that

01:41:49.200 | any amount of practicing does not increase your score.

01:41:56.680 | So try to picture a game

01:41:58.560 | where no matter how much you play this game,

01:42:00.960 | that does not change your skill at the game.

01:42:05.280 | Can you picture that?

01:42:06.360 | - As a person who deeply appreciates practice,

01:42:11.640 | I cannot actually.

01:42:12.840 | I cannot, I cannot, I, yeah.

01:42:16.720 | - There's actually a very simple trick.

01:42:18.920 | So in order to come up with a task,

01:42:21.800 | so the only thing you can measure is skill at a task.

01:42:24.160 | - Yes.

01:42:25.000 | - So these tasks are gonna involve priors.

01:42:27.560 | - Yeah.

01:42:28.400 | - The trick is to know what they are,

01:42:30.600 | and to describe that.

01:42:32.360 | And then you make sure that this is the same set of priors

01:42:34.800 | as what humans start with.

01:42:36.280 | So you create a task that assumes these priors,

01:42:38.960 | that exactly documents these priors,

01:42:40.680 | so that the priors are made explicit,

01:42:42.520 | and there are no other priors involved.

01:42:44.520 | And then you generate a certain number of samples

01:42:49.200 | in experience space for this task, right?

01:42:52.040 | And this, for one task,

01:42:54.880 | assuming that the task is new for the agent passing it,

01:42:59.320 | that's one test of this definition of intelligence

01:43:04.320 | that we set up.

01:43:07.360 | And now you can scale that to many different tasks,

01:43:09.880 | that all, you know,

01:43:10.960 | each task should be new to the agent passing it, right?

01:43:14.480 | And also should be human, yeah,

01:43:15.920 | human interpretable and understandable,

01:43:17.480 | so that you can actually have a human pass the same test,

01:43:19.960 | and then you can compare the score of your machine

01:43:21.680 | and the score of your human.

01:43:22.840 | - Which could be a lot,

01:43:23.680 | you could even start a task like MNIST,

01:43:26.960 | just as long as you start with the same set of priors.

01:43:28.640 | - Yeah, so the problem with MNIST,

01:43:30.600 | humans are already trying to recognize digits, right?

01:43:34.640 | And, but let's say we're considering objects

01:43:39.640 | that are not digits, some completely arbitrary patterns.

01:43:44.600 | Well, humans already come with visual priors

01:43:47.840 | about how to process that.

01:43:50.000 | So in order to make the game fair,

01:43:51.960 | you would have to isolate these priors and describe them,

01:43:56.200 | and then express them as computational rules.

01:43:58.520 | - Having worked a lot with vision science people,

01:44:01.440 | that's exceptionally difficult.

01:44:03.080 | A lot of progress has been made,

01:44:04.360 | there's been a lot of good tests,

01:44:05.920 | and basically reducing all of human vision

01:44:08.080 | into some good priors.

01:44:09.640 | I mean, we're still probably far away from that perfectly,

01:44:12.160 | but as a start for a benchmark,

01:44:14.440 | that's an exciting possibility.

01:44:15.920 | - Yeah, so Elisabeth Belke actually lists

01:44:20.880 | objectness as one of the core knowledge priors.

01:44:24.280 | - Objectness, cool.

01:44:25.400 | - Objectness, yeah.

01:44:27.000 | So we have priors about objectness,

01:44:29.000 | like about the visual space, about time,

01:44:31.000 | about agents, about goal-oriented behavior.

01:44:34.480 | We have many different priors,

01:44:37.000 | but what's interesting is that,

01:44:38.920 | sure, we have this pretty diverse and rich set of priors,

01:44:44.520 | but it's also not that diverse, right?

01:44:48.280 | We are not born into this world

01:44:50.240 | with a ton of knowledge about the world,

01:44:52.520 | with only a small set of

01:44:55.760 | core knowledge.

01:44:59.800 | - Yeah, sorry, do you have a sense of how

01:45:02.840 | it feels to us humans that that set is not that large?

01:45:07.080 | But just even the nature of time

01:45:09.440 | that we kind of integrate pretty effectively

01:45:11.680 | through all of our perception, all of our reasoning,

01:45:14.600 | maybe how, you know,

01:45:16.840 | do you have a sense of how easy it is

01:45:18.320 | to encode those priors?

01:45:19.600 | Maybe it requires building a universe

01:45:23.280 | and then the human brain in order to encode those priors.

01:45:27.600 | Or do you have a hope that it can be listed

01:45:29.840 | like in axiomatic sense?

01:45:30.680 | - I don't think so.

01:45:31.520 | You have to keep in mind that any knowledge

01:45:33.600 | about the world that we are born with

01:45:36.000 | is something that has to have been encoded

01:45:39.680 | into our DNA by evolution at some point.

01:45:43.120 | And DNA is a very, very low bandwidth medium.

01:45:47.520 | Like it's extremely long and expensive

01:45:50.640 | to encode anything into DNA,

01:45:52.040 | because first of all,

01:45:53.160 | you need some sort of evolutionary pressure

01:45:57.160 | to guide this writing process.

01:45:59.200 | And then, you know,

01:46:01.640 | the higher level of information you're trying to write,

01:46:03.840 | the longer it's going to take.

01:46:05.360 | And the thing in the environment

01:46:11.560 | that you're trying to encode knowledge about

01:46:13.840 | has to be stable over this duration.

01:46:17.120 | So you can only encode into DNA things

01:46:19.680 | that constitute an evolutionary advantage.

01:46:22.760 | So this is actually a very small subset

01:46:25.240 | of all possible knowledge about the world.

01:46:27.080 | You can only encode things that are stable,

01:46:31.160 | that are true over very, very long periods of time,

01:46:33.760 | typically millions of years.

01:46:35.400 | For instance, we might have some visual prior

01:46:37.240 | about the shape of snakes, right?

01:46:40.320 | But all the, what makes a face,

01:46:43.640 | what's the difference between a face and an ant face?

01:46:46.280 | But consider this interesting question.

01:46:49.800 | Do we have any innate sense of the visual difference

01:46:54.800 | between a male face and a female face?

01:46:58.480 | What do you think?

01:46:59.360 | For a human, I mean.

01:47:01.800 | - I would have to look back into evolutionary history

01:47:03.840 | when the gender is emerged, but yeah, most.

01:47:08.400 | I mean, the faces of humans are quite different

01:47:10.440 | from the faces of great apes, great apes, right?

01:47:13.520 | - Yeah, that's interesting, but yeah.

01:47:17.520 | - You couldn't tell the face of a female chimpanzee

01:47:21.440 | from the face of a male chimpanzee, probably.

01:47:23.520 | - Yeah, and I don't think most humans have all that ability.

01:47:26.280 | - So we do have innate knowledge of what makes a face,

01:47:30.880 | but it's actually impossible for us to have any DNA

01:47:34.840 | encoding knowledge of the difference

01:47:36.800 | between a female human face and a male human face,

01:47:40.440 | because that knowledge, that information

01:47:44.960 | came up into the world actually very recently.

01:47:50.720 | If you look at the slowness of the process

01:47:54.480 | of encoding knowledge into DNA.

01:47:56.560 | - Yeah, so that's interesting.

01:47:57.520 | That's a really powerful argument.

01:47:59.280 | The DNA is a low bandwidth,

01:48:00.800 | and it takes a long time to encode.

01:48:02.960 | That naturally creates a very efficient encoding.

01:48:05.360 | - And one important consequence of this is that,

01:48:09.720 | so yes, we are born into this world

01:48:12.200 | with a bunch of knowledge,

01:48:13.760 | sometimes a high level knowledge about the world,

01:48:15.880 | like the shape, the rough shape of a snake,

01:48:18.120 | of the rough shape of face.

01:48:20.640 | But importantly, because this knowledge

01:48:22.760 | takes so long to write,

01:48:24.200 | almost all of this innate knowledge is shared

01:48:29.120 | with our cousins, with great apes, right?

01:48:33.240 | So it is not actually this innate knowledge

01:48:35.640 | that makes us special.

01:48:37.320 | - But to throw it right back at you

01:48:39.160 | from the earlier on in our discussion,

01:48:42.120 | that encoding might also include

01:48:46.240 | the entirety of the environment of Earth.

01:48:49.520 | - To some extent, so it can include things

01:48:53.200 | that are important to survival and production,

01:48:56.320 | so for which there is some evolutionary pressure,

01:48:59.000 | and things that are stable, constant,

01:49:01.600 | over very, very, very long time periods.

01:49:04.960 | And honestly, it's not that much information.

01:49:07.240 | There's also, besides the bandwidths constraint

01:49:10.440 | and the constraints of the writing process,

01:49:15.360 | there's also memory constraints.

01:49:18.520 | Like DNA, the part of DNA that deals with the human brain,

01:49:22.320 | it's actually fairly small.

01:49:23.480 | It's like, you know, on the order of megabytes, right?

01:49:26.640 | There's not that much high level knowledge

01:49:28.640 | about the world you can encode.

01:49:31.200 | - That's quite brilliant and hopeful for a benchmark

01:49:35.200 | of, that you're referring to, of encoding priors.

01:49:38.960 | I actually look forward to, I'm skeptical

01:49:41.680 | whether you can do it in the next couple of years,

01:49:43.080 | but hopefully.

01:49:44.520 | - I've been working on it.

01:49:45.440 | So honestly, it's a very simple benchmark,

01:49:47.440 | and it's not like a big breakthrough or anything.

01:49:49.600 | It's more like a fun side project, right?

01:49:52.880 | - But these fun, so is ImageNet.

01:49:56.360 | - These fun side projects could launch entire groups

01:50:00.760 | of efforts towards creating reasoning systems and so on.

01:50:05.160 | And I think-

01:50:06.000 | - Yeah, that's the goal.

01:50:06.840 | It's trying to measure strong generalization,

01:50:09.200 | to measure the strength of abstraction in our minds,

01:50:12.840 | well, in our minds and in artificial intelligence.

01:50:17.080 | - And if there's anything true about this science organism,

01:50:20.920 | it's individual cells love competition.

01:50:24.880 | So, and benchmarks encourage competition.

01:50:26.960 | So that's an exciting possibility.

01:50:29.640 | If you, do you think an AI winter is coming?

01:50:33.720 | And how do we prevent it?

01:50:35.440 | - Not really.

01:50:36.280 | So an AI winter is something that would occur

01:50:39.680 | when there's a big mismatch

01:50:41.360 | between how we are selling the capabilities of AI

01:50:44.800 | and the actual capabilities of AI.

01:50:47.400 | And today, so deep learning is creating a lot of value,

01:50:50.760 | and it will keep creating a lot of value in the sense that

01:50:54.760 | these models are applicable to a very wide range of problems

01:50:58.960 | that are relevant today.

01:51:00.040 | And we are only just getting started

01:51:02.160 | with applying algorithms to every problem

01:51:05.240 | that could be solving.

01:51:06.360 | So deep learning will keep creating a lot of value

01:51:09.040 | for the time being.

01:51:10.280 | What's concerning, however, is that there's a lot of hype

01:51:14.920 | around deep learning and around AI.

01:51:16.280 | There are lots of people are overselling the capabilities

01:51:20.040 | of these systems, not just the capabilities,

01:51:22.880 | but also overselling the fact that they might be

01:51:26.640 | more or less, you know, brain-like,

01:51:29.240 | like given a kind of a mystical aspect,

01:51:34.360 | these technologies,

01:51:35.720 | and also overselling the pace of progress,

01:51:39.360 | which, you know, it might look fast in the sense that

01:51:43.960 | we have this exponentially increasing number of papers.

01:51:46.760 | But again, that's just a simple consequence of the fact

01:51:51.720 | that we have ever more people coming into the field.

01:51:54.600 | It doesn't mean the progress is actually exponentially fast.

01:51:57.720 | Like, let's say you're trying to raise money

01:52:00.560 | for your startup or your research lab.

01:52:02.840 | You might want to tell, you know,

01:52:05.160 | a grandiose story to investors about how deep learning

01:52:09.160 | is just like the brain and how it can solve

01:52:11.640 | all these incredible problems like self-driving

01:52:14.360 | and robotics and so on.

01:52:15.880 | And maybe you can tell them that the field is progressing

01:52:18.280 | so fast and we are going to have AGI within 15 years

01:52:21.640 | or even 10 years.

01:52:23.000 | And none of this is true.

01:52:25.960 | And every time you're like saying these things

01:52:30.440 | and an investor or, you know, a decision maker believes them,

01:52:34.520 | well, this is like the equivalent of taking on

01:52:37.880 | credit card debt, but for trust, right?

01:52:42.520 | - Yeah.

01:52:43.080 | - And maybe this will, you know,

01:52:47.800 | this will be what enables you to raise a lot of money,

01:52:50.680 | but ultimately you are creating damage.

01:52:53.560 | You are damaging the field.

01:52:54.600 | - So that's the concern is that that debt,

01:52:57.320 | that's what happens with the other AI winters.

01:52:59.240 | Is the concern is you actually tweet about this

01:53:02.600 | with autonomous vehicles, right?

01:53:04.120 | There's almost every single company now have promised

01:53:07.320 | that they will have full autonomous vehicles by 2021, 2022.

01:53:11.640 | - That's a good example of the consequences

01:53:15.160 | of overhyping the capabilities of AI

01:53:17.960 | and the pace of progress.

01:53:19.080 | - So, because I work especially a lot recently in this area,

01:53:23.000 | I have a deep concern of what happens

01:53:25.720 | when all of these companies, after I've invested billions,

01:53:29.400 | have a meeting and say, how much did we actually,

01:53:31.960 | first of all, do we have an autonomous vehicle?

01:53:33.560 | The answer will definitely be no.

01:53:35.240 | And second would be, wait a minute,

01:53:37.880 | we've invested one, two, three, $4 billion into this

01:53:41.320 | and we made no profit.

01:53:43.160 | And the reaction to that may be going very hard

01:53:47.000 | in another direction that might impact

01:53:49.480 | even other industries.

01:53:50.280 | - And that's what we call an AI winter

01:53:52.280 | is when there is backlash,

01:53:53.480 | where no one believes any of these promises anymore

01:53:56.840 | because they've turned out to be big lies

01:53:58.680 | the first time around.

01:53:59.640 | And this will definitely happen to some extent

01:54:03.000 | for autonomous vehicles

01:54:04.600 | because the public and decision makers have been convinced

01:54:08.040 | that around 2015, they've been convinced

01:54:12.760 | by these people who are trying to raise money

01:54:14.760 | for their startups and so on,

01:54:16.200 | that L5 driving was coming in maybe 2016,

01:54:20.840 | maybe 2017, maybe 2018.

01:54:22.920 | Now in 2019, we're still waiting for it.

01:54:26.120 | And so I don't believe we are going to have

01:54:30.360 | a full-on AI winter because we have these technologies

01:54:33.400 | that are producing a tremendous amount of real value.

01:54:36.680 | - Yeah.

01:54:37.160 | - But there is also too much hype.

01:54:39.960 | So there will be some backlash,

01:54:41.640 | especially there will be backlash.

01:54:43.560 | So, you know, some startups are trying to sell

01:54:47.160 | the dream of AGI, right?

01:54:49.800 | And the fact that AGI is going to create infinite value,

01:54:53.720 | like AGI is like a free lunch.

01:54:55.720 | Like if you can develop an AI system

01:54:58.920 | that passes a certain threshold of IQ or something,

01:55:02.760 | then suddenly you have infinite value.

01:55:04.360 | And well, there are actually lots of investors

01:55:09.240 | buying into this idea.

01:55:11.240 | And, you know, they will wait maybe 10, 15 years

01:55:15.800 | and nothing will happen.

01:55:17.240 | And the next time around, well, maybe there will be

01:55:21.240 | a new generation of investors.

01:55:22.600 | No one will care.

01:55:23.400 | You know, human memory is fairly short after all.

01:55:26.920 | - I don't know about you, but because I've spoken about AGI

01:55:31.640 | sometimes poetically, like I get a lot of emails from people

01:55:35.480 | giving me, they're usually like large manifestos

01:55:39.960 | of they say to me that they have created an AGI system

01:55:46.120 | or they know how to do it.

01:55:47.160 | And there's a long write-up of how to do it.

01:55:48.920 | - I get a lot of these emails, yeah.

01:55:50.120 | - They're a little bit, feel like it's generated

01:55:53.560 | by an AI system actually, but there's usually no diagram.

01:55:57.960 | - Maybe that's recursively self-improving AI.

01:56:00.760 | - Exactly.

01:56:01.320 | - It's you have a transformer generating

01:56:03.160 | crank papers about AGI.

01:56:06.200 | - So the question is about, because you've been such a good,

01:56:09.560 | you have a good radar for crank papers.

01:56:12.600 | How do we know they're not onto something?

01:56:16.680 | How do I, so when you start to talk about AGI

01:56:22.280 | or anything like the reasoning benchmarks and so on,

01:56:24.760 | so something that doesn't have a benchmark,

01:56:27.080 | it's really difficult to know.

01:56:28.200 | I mean, I talked to Jeff Hawkins,

01:56:31.160 | who's really looking at neuroscience approaches to how,

01:56:35.160 | and there's some, there's echoes of really interesting ideas

01:56:40.200 | in at least Jeff's case, which he's showing.

01:56:42.360 | How do you usually think about this?

01:56:45.000 | Like preventing yourself from being too narrow-minded

01:56:49.720 | and elitist about deep learning.

01:56:52.680 | It has to work on these particular benchmarks,

01:56:55.400 | otherwise it's trash.

01:56:56.360 | - Well, you know, the thing is intelligence

01:57:02.520 | does not exist in the abstract.

01:57:05.240 | Intelligence has to be applied.

01:57:07.160 | So if you don't have a benchmark,

01:57:08.440 | if you're not doing an improvement on some benchmark,

01:57:10.600 | maybe it's a new benchmark, right?

01:57:12.360 | Maybe it's not something we've been looking at before,

01:57:14.600 | but you do need a problem that you're trying to solve.

01:57:17.400 | You're not going to come up with a solution

01:57:19.080 | without a problem.

01:57:19.960 | - So you, general intelligence, I mean,

01:57:23.640 | you've clearly highlighted generalization.

01:57:25.480 | If you want to claim that you have an intelligence system,

01:57:30.040 | it should come with a benchmark.

01:57:31.080 | - It should, yes.

01:57:32.120 | It should display capabilities of some kind.

01:57:35.720 | It should show that it can create some form of value,

01:57:40.040 | even if it's a very artificial form of value.

01:57:42.760 | And that's also the reason why you don't actually

01:57:45.720 | need to care about telling which papers

01:57:48.600 | have actually some hidden potential and which do not.

01:57:52.040 | Because if there is a new technique

01:57:56.520 | that's actually creating value,

01:57:57.960 | you know, this is going to be brought to light very quickly

01:58:00.600 | because it's actually making a difference.

01:58:02.440 | So it's the difference between something that's ineffectual

01:58:04.920 | and something that is actually useful.

01:58:08.840 | And ultimately usefulness is our guide,

01:58:11.800 | not just in this field,

01:58:12.840 | but if you look at science in general,

01:58:14.920 | maybe there are many, many people over the years

01:58:16.920 | that have had some really interesting theories

01:58:19.880 | of everything, but they were just completely useless.

01:58:22.840 | And you don't actually need to tell the interesting theories

01:58:26.280 | from the useless theories.

01:58:28.040 | All you need is to see, you know,

01:58:30.200 | is this actually having an effect on something else?

01:58:33.880 | You know, is this actually useful?

01:58:35.400 | Is this making an impact or not?

01:58:36.760 | - That's beautifully put.

01:58:38.680 | I mean, the same applies to quantum mechanics,

01:58:41.000 | to a string theory, to the holographic principle.

01:58:43.480 | - Like we are doing deep learning because it works.

01:58:45.480 | You know, that's like before it started working,

01:58:48.280 | people, you know, considered people working

01:58:50.840 | on neural networks as cranks very much.

01:58:53.240 | Like, you know, no one was working on this anymore.

01:58:56.360 | And now it's working, which is what makes it valuable.

01:58:59.160 | It's not about being right, right?

01:59:01.160 | It's about being effective.

01:59:02.600 | - And nevertheless, the individual entities

01:59:04.360 | of the scientific mechanism,

01:59:06.600 | just like Yoshua Banjo, Yan Likun,

01:59:09.240 | while being called cranks, stuck with it, right?

01:59:12.760 | - Yeah.

01:59:13.320 | - And so us individual agents,

01:59:15.480 | even if everyone's laughing at us,

01:59:17.000 | just stick with it because--

01:59:18.920 | - If you believe you have something,

01:59:20.040 | you should stick with it and see it through.

01:59:21.880 | - That's a beautiful, inspirational message to end on.

01:59:25.960 | Francois, thank you so much for talking today.

01:59:27.640 | That was amazing.

01:59:28.280 | - Thank you.

01:59:29.640 | (upbeat music)

01:59:29.800 | (upbeat music)

01:59:29.880 | [BLANK_AUDIO]

01:59:37.760 | [BLANK_AUDIO]

01:59:47.760 | (upbeat music)

François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38

Chapters