
Jim Keller: Moore's Law, Microprocessors, and First Principles | Lex Fridman Podcast #70


Chapters

0:00 Introduction
2:12 Difference between a computer and a human brain
3:43 Computer abstraction layers and parallelism
17:53 If you run a program multiple times, do you always get the same answer?
20:43 Building computers and teams of people
22:41 Start from scratch every 5 years
30:05 Moore's law is not dead
55:47 Is superintelligence the next layer of abstraction?
60:02 Is the universe a computer?
63:00 Ray Kurzweil and exponential improvement in technology
64:33 Elon Musk and Tesla Autopilot
80:51 Lessons from working with Elon Musk
88:33 Existential threats from AI
92:38 Happiness and the meaning of life

Whisper Transcript

00:00:00.000 | The following is a conversation with Jim Keller,
00:00:03.020 | legendary microprocessor engineer
00:00:05.560 | who has worked at AMD, Apple, Tesla, and now Intel.
00:00:10.160 | He's known for his work on AMD K7, K8, K12,
00:00:13.520 | and Zen microarchitectures, Apple A4 and A5 processors,
00:00:18.040 | and co-author of the specification
00:00:20.080 | for the x86-64 instruction set
00:00:23.040 | and HyperTransport Interconnect.
00:00:26.160 | He's a brilliant first principles engineer
00:00:28.440 | and out of the box thinker,
00:00:30.040 | and just an interesting and fun human being to talk to.
00:00:33.480 | This is the Artificial Intelligence Podcast.
00:00:36.480 | If you enjoy it, subscribe on YouTube,
00:00:38.840 | give it five stars on Apple Podcast,
00:00:40.840 | follow on Spotify, support it on Patreon,
00:00:43.500 | or simply connect with me on Twitter,
00:00:45.600 | Lex Fridman, spelled F-R-I-D-M-A-N.
00:00:49.560 | I recently started doing ads
00:00:51.040 | at the end of the introduction.
00:00:52.600 | I'll do one or two minutes after introducing the episode
00:00:55.560 | and never any ads in the middle
00:00:57.100 | that can break the flow of the conversation.
00:00:59.400 | I hope that works for you
00:01:00.780 | and doesn't hurt the listening experience.
00:01:04.060 | This show is presented by Cash App,
00:01:06.160 | the number one finance app in the App Store.
00:01:08.640 | I personally use Cash App to send money to friends,
00:01:11.440 | but you can also use it to buy, sell,
00:01:13.200 | and deposit Bitcoin in just seconds.
00:01:15.600 | Cash App also has a new investing feature.
00:01:18.480 | You can buy fractions of a stock, say $1 worth,
00:01:21.420 | no matter what the stock price is.
00:01:23.540 | Brokerage services are provided by Cash App Investing,
00:01:26.480 | a subsidiary of Square and member SIPC.
00:01:29.740 | I'm excited to be working with Cash App
00:01:32.040 | to support one of my favorite organizations called FIRST,
00:01:35.440 | best known for their FIRST Robotics and Lego competitions.
00:01:38.980 | They educate and inspire hundreds of thousands of students
00:01:42.240 | in over 110 countries
00:01:44.100 | and have a perfect rating at Charity Navigator,
00:01:46.720 | which means that donated money
00:01:48.000 | is used to maximum effectiveness.
00:01:50.760 | When you get Cash App from the App Store or Google Play
00:01:53.360 | and use code LEXPODCAST,
00:01:56.280 | you'll get $10 and Cash App will also donate $10 to FIRST,
00:02:00.300 | which again is an organization
00:02:02.140 | that I've personally seen inspire girls and boys
00:02:04.920 | to dream of engineering a better world.
00:02:08.060 | And now here's my conversation with Jim Keller.
00:02:11.480 | What are the differences and similarities
00:02:14.520 | between the human brain and a computer
00:02:17.200 | with the microprocessor at its core?
00:02:19.260 | Let's start with the philosophical question perhaps.
00:02:22.280 | - Well, since people don't actually understand
00:02:25.400 | how human brains work, I think that's true.
00:02:29.200 | - I think that's true.
00:02:30.560 | - So it's hard to compare them.
00:02:32.600 | Computers are, you know, there's really two things.
00:02:37.280 | There's memory and there's computation, right?
00:02:40.480 | And to date, almost all computer architectures
00:02:43.920 | are global memory, which is a thing, right?
00:02:47.600 | And then computation where you pull data
00:02:49.400 | and you do relatively simple operations on it
00:02:52.420 | and write data back.
00:02:53.900 | So it's decoupled in modern computers.
00:02:57.760 | And you think in the human brain,
00:02:59.840 | everything's a mesh, a mess that's combined together.
00:03:02.600 | - What people observe is there's, you know,
00:03:04.840 | some number of layers of neurons
00:03:06.500 | which have local and global connections.
00:03:09.120 | And information is stored in some distributed fashion.
00:03:13.720 | And people build things called neural networks in computers
00:03:18.280 | where the information is distributed
00:03:21.200 | in some kind of fashion.
00:03:22.840 | You know, there's a mathematics behind it.
00:03:25.520 | I don't know that the understanding of that is super deep.
00:03:29.200 | The computations we run on those
00:03:31.120 | are straightforward computations.
00:03:33.440 | I don't believe anybody has said
00:03:35.520 | a neuron does this computation.
00:03:37.880 | So to date, it's hard to compare them, I would say.
00:03:42.880 | - So let's get into the basics before we zoom back out.
00:03:48.800 | How do you build a computer from scratch?
00:03:51.020 | What is a microprocessor?
00:03:52.760 | What is a microarchitecture?
00:03:54.140 | What's an instruction set architecture?
00:03:56.640 | Maybe even as far back as what is a transistor?
00:03:59.460 | - So the special charm of computer engineering
00:04:05.040 | is there's a relatively good understanding
00:04:08.400 | of abstraction layers.
00:04:10.480 | So down at the bottom, you have atoms.
00:04:12.280 | And atoms get put together in materials
00:04:14.280 | like silicon or dope silicon or metal.
00:04:17.480 | And we build transistors on top of that.
00:04:21.000 | We build logic gates, right?
00:04:23.680 | And then functional units, like an adder, a subtractor,
00:04:27.400 | an instruction parsing unit.
00:04:28.760 | And then we assemble those into processing elements.
00:04:32.300 | Modern computers are built out of probably 10 to 20
00:04:37.200 | locally organized processing elements
00:04:40.920 | or coherent processing elements.
00:04:42.620 | And then that runs computer programs, right?
00:04:46.600 | So there's abstraction layers.
00:04:47.920 | And then software, there's an instruction set.
00:04:50.840 | You run, and then there's assembly language, C,
00:04:53.880 | C++, Java, JavaScript.
00:04:56.400 | There's abstraction layers,
00:04:58.680 | essentially from the atom to the data center, right?
00:05:02.520 | So when you build a computer,
00:05:05.060 | first there's a target, like what's it for?
00:05:08.520 | Like how fast does it have to be?
00:05:09.920 | Which today there's a whole bunch of metrics
00:05:12.180 | about what that is.
00:05:13.800 | And then in an organization of a thousand people
00:05:17.020 | who build a computer,
00:05:19.160 | there's lots of different disciplines
00:05:22.200 | that you have to operate on.
00:05:24.080 | Does that make sense?
00:05:25.480 | And so--
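As a rough sketch of the gate-to-functional-unit step described above, here is a toy ripple-carry adder built out of nothing but AND/OR/XOR "gates" in Python; the gate helpers and the 4-bit example are illustrative, not anything from the conversation.

```python
# Toy illustration of abstraction layers: logic gates -> full adder -> multi-bit adder.
def AND(a, b): return a & b
def OR(a, b):  return a | b
def XOR(a, b): return a ^ b

def full_adder(a, b, carry_in):
    """One-bit full adder built only from the gate functions above."""
    s1 = XOR(a, b)
    total = XOR(s1, carry_in)
    carry_out = OR(AND(a, b), AND(s1, carry_in))
    return total, carry_out

def ripple_adder(a_bits, b_bits):
    """Add two equal-length bit lists (LSB first) by chaining full adders."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

# 4-bit example: 5 (0101) + 3 (0011) = 8 (1000), bits listed LSB first.
bits, carry = ripple_adder([1, 0, 1, 0], [1, 1, 0, 0])
print(bits, carry)  # [0, 0, 0, 1] 0
```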
00:05:27.080 | - So there's a bunch of levels of abstraction.
00:05:29.380 | In an organization like Intel, and in your own vision,
00:05:35.720 | there's a lot of brilliance that comes in
00:05:37.560 | at every one of those layers.
00:05:39.680 | Some of it is science, some of it is engineering,
00:05:41.640 | some of it is art.
00:05:43.320 | What's the most, if you could pick favorites,
00:05:46.340 | what's the most important, your favorite layer
00:05:49.240 | on these layers of abstractions?
00:05:51.080 | Where does the magic enter this hierarchy?
00:05:53.920 | - I don't really care.
00:05:57.080 | That's the fun, you know, I'm somewhat agnostic to that.
00:06:00.720 | So I would say, for relatively long periods of time,
00:06:05.480 | instruction sets are stable.
00:06:08.020 | So the x86 instruction set, the ARM instruction set.
00:06:11.960 | - What's an instruction set?
00:06:13.320 | - So it says, how do you encode the basic operations?
00:06:16.080 | Load, store, multiply, add, subtract, conditional branch.
00:06:19.620 | There aren't that many interesting instructions.
00:06:23.800 | Like if you look at a program and it runs,
00:06:26.440 | 90% of the execution is on 25 opcodes, 25 instructions.
00:06:31.440 | And those are stable, right?
00:06:33.920 | - What does it mean, stable?
00:06:35.480 | - Intel architecture has been around for 25 years.
00:06:38.120 | - It works.
00:06:38.960 | - It works.
00:06:39.800 | And that's because the basics are defined a long time ago.
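To make "encode the basic operations" concrete, here is a toy 16-bit instruction encoding in Python; the field layout and opcode numbers are invented for illustration and are not the actual x86 or ARM format.

```python
# Toy instruction set: a made-up 16-bit encoding, not x86 or ARM.
# Format: 4-bit opcode | 4-bit dest register | 4-bit src1 | 4-bit src2.
OPCODES = {"load": 0x1, "store": 0x2, "add": 0x3, "sub": 0x4,
           "mul": 0x5, "branch_if_zero": 0x6}

def encode(op, rd, rs1, rs2):
    return (OPCODES[op] << 12) | (rd << 8) | (rs1 << 4) | rs2

def decode(word):
    op = {v: k for k, v in OPCODES.items()}[word >> 12]
    return op, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF

word = encode("add", 3, 1, 2)   # r3 = r1 + r2
print(hex(word), decode(word))  # 0x3312 ('add', 3, 1, 2)
```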
00:06:45.280 | Now, the way an old computer ran,
00:06:48.720 | is you fetched instructions and you executed them in order.
00:06:52.960 | Do the load, do the add, do the compare.
00:06:56.140 | The way a modern computer works,
00:06:58.880 | is you fetch large numbers of instructions, say 500.
00:07:03.280 | And then you find the dependency graph
00:07:06.240 | between the instructions.
00:07:07.920 | And then you execute in independent units,
00:07:12.320 | those little micrographs.
00:07:15.260 | So a modern computer, like people like to say,
00:07:17.740 | computers should be simple and clean.
00:07:20.700 | But it turns out the market for simple,
00:07:22.340 | complete, clean, slow computers is zero, right?
00:07:26.220 | We don't sell any simple, clean computers.
00:07:29.500 | Now you can, there's how you build it can be clean,
00:07:33.500 | but the computer people want to buy,
00:07:36.620 | that's say in a phone or a data center,
00:07:40.380 | fetches a large number of instructions,
00:07:42.620 | computes the dependency graph,
00:07:45.540 | and then executes it in a way that gets the right answers.
00:07:49.080 | - And optimize that graph somehow.
00:07:50.820 | - Yeah, they run deeply out of order.
00:07:53.460 | And then there's semantics around how memory ordering works
00:07:57.500 | and other things work.
00:07:58.340 | So the computer sort of has a bunch of bookkeeping tables.
00:08:01.900 | It says, what order should these operations finish in
00:08:05.420 | or appear to finish in?
00:08:07.740 | But to go fast, you have to fetch a lot of instructions
00:08:10.620 | and find all the parallelism.
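A minimal sketch of the fetch-window and dependency-graph idea: the five-instruction window below is invented, and the greedy "issue waves" are a stand-in for what real out-of-order hardware does with its bookkeeping tables.

```python
# Sketch of "found parallelism": build a register dependency graph over a small
# fetch window and group instructions into waves that could issue together.
# The instruction tuples (dest, sources) are invented for illustration.
window = [
    ("r1", ["r0"]),        # 0: load r1 <- [r0]
    ("r2", ["r0"]),        # 1: load r2 <- [r0+8]
    ("r3", ["r1", "r2"]),  # 2: add  r3 <- r1 + r2
    ("r4", ["r0"]),        # 3: load r4 <- [r0+16]
    ("r5", ["r3", "r4"]),  # 4: mul  r5 <- r3 * r4
]

# An instruction depends on the most recent earlier writer of each source register.
deps = []
for i, (dest, srcs) in enumerate(window):
    d = set()
    for s in srcs:
        for j in range(i - 1, -1, -1):
            if window[j][0] == s:
                d.add(j)
                break
    deps.append(d)

# Greedy issue waves: anything whose producers have already issued can go in parallel.
done, wave = set(), 0
while len(done) < len(window):
    ready = [i for i in range(len(window)) if i not in done and deps[i] <= done]
    print(f"cycle {wave}: issue {ready}")
    done |= set(ready)
    wave += 1
# cycle 0: issue [0, 1, 3]   cycle 1: issue [2]   cycle 2: issue [4]
```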
00:08:12.620 | Now there's a second kind of computer,
00:08:15.380 | which we call GPUs today.
00:08:17.480 | And I call it a difference.
00:08:19.540 | There's found parallelism,
00:08:20.940 | like you have a program
00:08:21.780 | with a lot of dependent instructions.
00:08:24.020 | You fetch a bunch and then you go figure out
00:08:26.020 | the dependency graph and you issue instructions out of order.
00:08:29.340 | That's because you have one serial narrative to execute,
00:08:32.900 | which in fact can be done out of order.
00:08:35.780 | - You call it a narrative?
00:08:37.020 | - Yeah.
00:08:37.860 | - Wow.
00:08:38.700 | - Yeah, so humans think in serial narrative.
00:08:40.700 | So read a book, right?
00:08:42.980 | There's a sentence after sentence after sentence
00:08:45.780 | and there's paragraphs.
00:08:46.860 | Now you could diagram that.
00:08:49.380 | Imagine you diagrammed it properly and you said,
00:08:51.820 | which sentences could be read in any order
00:08:56.260 | without changing the meaning, right?
00:08:59.060 | - That's a fascinating question to ask of a book, yeah.
00:09:02.540 | - Yeah, you could do that, right?
00:09:04.380 | So some paragraphs could be reordered,
00:09:06.300 | some sentences can be reordered.
00:09:08.420 | You could say, he is tall and smart and X, right?
00:09:13.420 | And it doesn't matter the order of tall and smart.
00:09:18.220 | But if you say the tall man is wearing a red shirt,
00:09:22.940 | what colors, you know, like you can create dependencies.
00:09:27.140 | And so GPUs on the other hand,
00:09:32.020 | run simple programs on pixels,
00:09:35.300 | but you're given a million of them.
00:09:36.860 | And the first order, the screen you're looking at
00:09:40.140 | doesn't care which order you do it in.
00:09:42.180 | So I call that given parallelism.
00:09:44.460 | Simple narratives around the large numbers of things
00:09:48.300 | where you can just say it's parallel
00:09:50.420 | because you told me it was.
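And a correspondingly small sketch of given parallelism: a made-up per-pixel operation where the program itself guarantees independence, so any execution order gives the same screen.

```python
# Given parallelism: per-pixel work where the order of evaluation cannot matter.
# The brighten() "shader" and the tiny four-pixel image are invented for illustration.
pixels = [(10, 20, 30), (40, 50, 60), (70, 80, 90), (100, 110, 120)]

def brighten(rgb):
    return tuple(min(255, c + 25) for c in rgb)

forward  = [brighten(p) for p in pixels]
backward = list(reversed([brighten(p) for p in reversed(pixels)]))
assert forward == backward  # same result in any order
```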
00:09:52.300 | - So found parallelism where the narrative is sequential
00:09:57.300 | but you discover like little pockets of parallelism versus--
00:10:01.780 | - Turns out large pockets of parallelism.
00:10:03.980 | - Large, so how hard is it to discover?
00:10:05.860 | - Well, how hard is it?
00:10:06.940 | That's just transistor count, right?
00:10:08.820 | So once you crack the problem, you say,
00:10:11.140 | here's how you fetch 10 instructions at a time.
00:10:13.460 | Here's how you calculated the dependencies between them.
00:10:16.340 | Here's how you describe the dependencies.
00:10:18.500 | Here's, you know, these are pieces, right?
00:10:20.640 | - So once you describe the dependencies,
00:10:25.580 | then it's just a graph,
00:10:27.580 | sort of it's an algorithm that finds,
00:10:30.180 | well, what is that?
00:10:31.940 | I'm sure there's a graph theory,
00:10:33.580 | theoretical answer here that's solvable.
00:10:35.780 | In general, programs, modern programs
00:10:40.700 | that human beings write,
00:10:42.220 | how much found parallelism is there in them?
00:10:45.060 | - About 10x.
00:10:45.900 | - What does 10x mean?
00:10:47.140 | - So if you execute it in order--
00:10:49.700 | - Versus, yeah.
00:10:51.500 | - You would get what's called cycles per instruction
00:10:53.900 | and it would be about, you know, three instructions,
00:10:58.180 | three cycles per instruction
00:10:59.980 | because of the latency of the operations and stuff.
00:11:02.740 | And in a modern computer,
00:11:04.460 | it executes at, like, 0.2, 0.25 cycles per instruction.
00:11:08.660 | So it's about, we today find 10x.
00:11:11.780 | And there's two things.
00:11:12.960 | One is the found parallelism in the narrative, right?
00:11:17.300 | And the other is the predictability of the narrative, right?
00:11:21.320 | So certain operations, they do a bunch of calculations
00:11:25.480 | and if greater than one, do this, else do that.
00:11:29.740 | That decision is predicted in modern computers
00:11:33.140 | to high 90% accuracy.
00:11:36.220 | So branches happen a lot.
00:11:38.700 | So imagine you have a decision
00:11:40.380 | to make every six instructions,
00:11:41.740 | which is about the average, right?
00:11:43.740 | But you want to fetch 500 instructions,
00:11:45.460 | figure out the graph and execute them all in parallel.
00:11:48.440 | That means you have, let's say,
00:11:51.580 | if you fetch 600 instructions and it's every six,
00:11:54.980 | you have to fetch,
00:11:56.060 | you have to predict 99 out of 100 branches correctly
00:11:59.380 | for that window to be effective.
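Working those quoted figures through (an editorial gloss, assuming the branch-every-six-instructions average and an independent per-branch prediction accuracy p):

```latex
% In-order vs. out-of-order throughput, from the cycles-per-instruction figures above:
\frac{3\ \text{cycles/instruction}}{0.25\ \text{cycles/instruction}} \approx 12\times

% A 600-instruction window with a branch every 6 instructions holds roughly
% 600 / 6 = 100 branches. If each is predicted correctly with probability p,
% the whole window survives with probability p^{100}:
0.99^{100} \approx 0.37, \qquad 0.999^{100} \approx 0.90
```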
00:12:02.340 | - Okay, so parallelism,
00:12:04.660 | you can't parallelize branches, or you can?
00:12:07.580 | - No, you can predict--
00:12:09.100 | - What does predict a branch mean?
00:12:11.460 | - So imagine you do a computation over and over,
00:12:13.580 | you're in a loop.
00:12:14.940 | So while n is greater than one, do.
00:12:19.420 | And you go through that loop a million times.
00:12:21.220 | So every time you look at the branch,
00:12:22.640 | you say, it's probably still greater than one.
00:12:25.740 | - And you're saying you could do that accurately.
00:12:27.820 | - Very accurately.
00:12:28.660 | Modern computer--
00:12:29.480 | - My mind is blown.
00:12:30.320 | How the heck do you do that?
00:12:31.460 | Wait a minute.
00:12:32.580 | - Well, you want to know?
00:12:33.780 | This is really sad.
00:12:35.500 | 20 years ago, you simply recorded
00:12:38.700 | which way the branch went last time
00:12:40.620 | and predicted the same thing.
00:12:42.780 | - Right.
00:12:43.620 | - Okay.
00:12:44.460 | - What's the accuracy of that?
00:12:46.140 | - 85%.
00:12:48.100 | So then somebody said, hey, let's keep a couple of bits
00:12:51.780 | and have a little counter.
00:12:53.080 | So when it predicts one way,
00:12:54.980 | we count up, and then it pins.
00:12:56.720 | So say you have a three bit counter.
00:12:58.060 | So you count up and then you count down.
00:13:00.760 | And if it's, you can use the top bit as the sign bit.
00:13:03.260 | So you have a signed two-bit number.
00:13:05.020 | So if it's greater than one, you predict taken
00:13:07.460 | and less than one, you predict not taken, right?
00:13:11.460 | Or less than zero, whatever the thing is.
00:13:14.100 | And that got us to 92%.
00:13:16.100 | - Oh.
00:13:17.300 | - Okay, now it gets better.
00:13:19.540 | This branch depends on how you got there.
00:13:22.900 | So if you came down the code one way,
00:13:25.540 | you're talking about Bob and Jane, right?
00:13:28.420 | And then said, does Bob like Jane?
00:13:30.460 | It went one way.
00:13:31.300 | But if you're talking about Bob and Jill,
00:13:32.900 | does Bob like Jane?
00:13:33.940 | You go a different way, right?
00:13:35.800 | So that's called history.
00:13:36.920 | So you take the history and a counter.
00:13:38.900 | That's cool.
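A minimal sketch of the two-bit saturating counter just described; the branch address and the million-iteration loop trace are invented for illustration.

```python
# Two-bit saturating counter branch predictor, as described above.
# Counter values 0-3: predict "not taken" for 0-1, "taken" for 2-3; the counter
# pins (saturates) at 0 and 3 so a single surprise doesn't flip the prediction.
class TwoBitPredictor:
    def __init__(self):
        self.counters = {}  # branch address -> counter, starts weakly not-taken

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2  # True means "predict taken"

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)

# A loop branch taken a million times that falls through once at the end
# is only mispredicted twice in total.
p = TwoBitPredictor()
wrong = 0
for taken in [True] * 1_000_000 + [False]:
    wrong += (p.predict(0x400) != taken)
    p.update(0x400, taken)
print(wrong)  # 2: the very first iteration and the final loop exit
# Real predictors also hash in recent branch history ("how you got there") and,
# as described next, modern ones layer neural-network-like pattern matchers on top.
```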
00:13:41.300 | But that's not how anything works today.
00:13:43.380 | They use something that looks a little
00:13:45.060 | like a neural network.
00:13:46.260 | So modern, you take all the execution flows
00:13:52.220 | and then you do basically deep pattern recognition
00:13:56.060 | of how the program is executing.
00:13:58.460 | And you do that multiple different ways.
00:14:03.660 | And you have something that chooses what the best result is.
00:14:06.660 | There's a little supercomputer inside the computer.
00:14:10.380 | - That's trying to predict branching.
00:14:11.220 | - That calculates which way branches go.
00:14:14.260 | So the effective window that is worth finding graphs
00:14:17.340 | in gets bigger.
00:14:18.260 | - Why was that gonna make me sad?
00:14:21.820 | 'Cause that's amazing.
00:14:22.860 | - It's amazingly complicated.
00:14:24.380 | - Oh, well.
00:14:25.220 | - Well, here's the funny thing.
00:14:27.060 | So to get to 85% took a thousand bits.
00:14:31.700 | To get to 99% takes tens of megabits.
00:14:37.720 | So this is one of those, to get the result,
00:14:42.700 | to get from a window of say 50 instructions to 500,
00:14:47.700 | it took three orders of magnitude
00:14:49.500 | or four orders of magnitude more bits.
00:14:52.420 | - Now, if you get the prediction of a branch wrong,
00:14:55.460 | what happens then?
00:14:56.300 | - You flush the pipe.
00:14:57.380 | - You flush the pipe, so it's just the performance cost.
00:14:59.540 | - But it gets even better.
00:15:01.420 | So we're starting to look at stuff that says,
00:15:03.860 | so they executed down this path
00:15:05.820 | and then you had two ways to go,
00:15:09.260 | but far, far away there's something
00:15:11.860 | that doesn't matter which path you went.
00:15:14.660 | So you took the wrong path, you executed a bunch of stuff.
00:15:20.580 | Then you had the misprediction, you backed it up,
00:15:22.420 | but you remembered all the results you already calculated.
00:15:25.500 | Some of those are just fine.
00:15:27.660 | Like if you read a book and you misunderstand a paragraph,
00:15:30.260 | your understanding of the next paragraph
00:15:32.500 | is sometimes invariant to that misunderstanding.
00:15:35.740 | Sometimes it depends on it.
00:15:37.580 | - And you can kind of anticipate that invariance.
00:15:43.260 | - Yeah, well, you can keep track
00:15:45.540 | of whether the data changed.
00:15:47.380 | And so when you come back to a piece of code,
00:15:49.220 | should you calculate it again or do the same thing?
00:15:51.860 | - Okay, how much of this is art
00:15:53.340 | and how much of it is science?
00:15:55.620 | 'Cause it sounds pretty complicated.
00:15:59.060 | - Well, how do you describe a situation?
00:16:00.620 | So imagine you come to a point in the road
00:16:02.580 | where you have to make a decision.
00:16:05.140 | And you have a bunch of knowledge about which way to go,
00:16:07.020 | maybe you have a map.
00:16:08.900 | So you wanna go the shortest way
00:16:11.540 | or do you wanna go the fastest way
00:16:13.140 | or you wanna take the nicest road.
00:16:14.780 | So there's some set of data.
00:16:17.820 | So imagine you're doing something complicated
00:16:19.620 | like building a computer
00:16:20.900 | and there's hundreds of decision points
00:16:24.340 | all with hundreds of possible ways to go.
00:16:27.720 | And the ways you pick interact in a complicated way.
00:16:30.880 | - Right.
00:16:33.420 | - And then you have to pick the right spot.
00:16:35.660 | - Right, so that's-- - Or that's art or science,
00:16:36.900 | I don't know.
00:16:37.740 | - You avoided the question,
00:16:38.900 | you just described the Robert Frost problem
00:16:41.340 | of "Road Less Taken."
00:16:42.640 | - Described the Robert Frost problem?
00:16:45.700 | (laughs)
00:16:47.460 | - That's what we do as computer designers,
00:16:49.460 | it's all poetry.
00:16:50.420 | - Okay. - Great.
00:16:51.440 | Yeah, I don't know how to describe that
00:16:54.180 | because some people are very good
00:16:56.420 | at making those intuitive leaps.
00:16:57.940 | It seems like there's combinations of things.
00:17:00.560 | Some people are less good at it
00:17:02.180 | but they're really good at evaluating the alternatives.
00:17:05.580 | Right, and everybody has a different way to do it.
00:17:09.260 | And some people can't make those leaps
00:17:11.880 | but they're really good at analyzing it.
00:17:14.300 | So when you see computers are designed by teams of people
00:17:16.900 | who have very different skill sets
00:17:19.300 | and a good team has lots of different kinds of people.
00:17:24.300 | I suspect you would describe some of them as artistic.
00:17:27.220 | - Right. - But not very many.
00:17:30.460 | - Unfortunately, or fortunately.
00:17:32.100 | - Unfortunately.
00:17:32.940 | (laughs)
00:17:33.780 | Well, you know, computer design's hard,
00:17:36.500 | it's 99% perspiration.
00:17:39.500 | - And-- - The 1% inspiration
00:17:42.060 | is really important.
00:17:44.140 | - But you still need the 99.
00:17:45.900 | - Yeah, you gotta do a lot of work.
00:17:47.340 | And then there are interesting things to do
00:17:50.780 | at every level of that stack.
00:17:52.780 | - At the end of the day,
00:17:55.700 | if you run the same program multiple times,
00:17:58.860 | does it always produce the same result?
00:18:00.860 | Is there some room for fuzziness there?
00:18:04.740 | - That's a math problem.
00:18:06.740 | So if you run a correct C program,
00:18:08.580 | the definition is every time you run it,
00:18:11.460 | you get the same answer.
00:18:12.460 | - Yeah, well, that's a math statement.
00:18:14.460 | - But that's a language definitional statement.
00:18:17.420 | So for years when people did,
00:18:19.780 | when we first did 3D acceleration of graphics,
00:18:22.900 | you could run the same scene multiple times
00:18:27.260 | and get different answers.
00:18:28.740 | - Right.
00:18:29.740 | - Right, and then some people thought that was okay,
00:18:32.340 | and some people thought it was a bad idea.
00:18:34.540 | And then when the HPC world used GPUs for calculations,
00:18:39.220 | they thought it was a really bad idea, okay?
00:18:42.100 | Now, in modern AI stuff,
00:18:44.380 | people are looking at networks
00:18:48.060 | where the precision of the data is low enough
00:18:51.020 | that the data is somewhat noisy.
00:18:53.620 | And the observation is the input data is unbelievably noisy.
00:18:57.220 | So why should the calculation be not noisy?
00:19:00.180 | And people have experimented with algorithms
00:19:02.140 | that say can get faster answers by being noisy.
00:19:05.900 | Like as a network starts to converge,
00:19:08.220 | if you look at the computation graph,
00:19:09.540 | it starts out really wide and then it gets narrower.
00:19:12.140 | And you can say, is that last little bit that important?
00:19:14.460 | Or should I start the graph on the next rev
00:19:17.700 | before we whittle it all the way down to the answer?
00:19:20.780 | Right, so you can create algorithms that are noisy.
00:19:24.060 | Now, if you're developing something
00:19:25.460 | and every time you run it, you get a different answer,
00:19:27.420 | it's really annoying.
00:19:29.300 | And so most people think even today,
00:19:33.940 | every time you run the program, you get the same answer.
00:19:36.740 | - No, I know, but the question is,
00:19:38.340 | that's the formal definition of a programming language.
00:19:42.380 | - There is a definition of languages
00:19:44.500 | that don't get the same answer, but people who use those.
00:19:47.400 | You always want something 'cause you get a bad answer
00:19:50.780 | and then you're wondering, is it because
00:19:53.260 | of something in the algorithm or because of this?
00:19:55.340 | And so everybody wants a little switch
00:19:56.780 | that says no matter what, do it deterministically.
00:20:00.260 | And it's really weird 'cause almost everything
00:20:02.380 | going into modern calculations is noisy.
00:20:05.300 | So why do the answers have to be so clear?
00:20:07.620 | - Right, so where do you stand?
00:20:09.620 | - I design computers for people who run programs.
00:20:12.500 | So if somebody says, I want a deterministic answer,
00:20:17.100 | most people want that.
00:20:18.380 | - Can you deliver a deterministic answer,
00:20:20.180 | I guess is the question.
00:20:21.340 | - Yeah, hopefully, sure.
00:20:24.380 | What people don't realize is you get a deterministic answer
00:20:27.260 | even though the execution flow is very undeterministic.
00:20:31.100 | So you run this program 100 times,
00:20:33.100 | it never runs the same way twice, ever.
00:20:36.100 | And the answer, it arrives at the same answer.
00:20:37.980 | - But it gets the same answer every time.
00:20:39.220 | - It's just amazing.
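A small illustration of why reordered execution could, in principle, change an answer: floating-point addition is not associative, which is at the root of the early GPU and HPC nondeterminism mentioned above. The specific values are chosen only to make the rounding visible.

```python
# Floating-point addition is not associative, so summing the same numbers in a
# different order (as a parallel or reordered machine might) can round differently.
a = sum([0.1, 0.2, 0.3])   # ((0.1 + 0.2) + 0.3) -> 0.6000000000000001
b = sum([0.3, 0.2, 0.1])   # ((0.3 + 0.2) + 0.1) -> 0.6
print(a == b)              # False: same data, different summation order

# The classic large/small mix: small terms can be absorbed entirely.
print((1e16 + 1.0) + 1.0)  # 1e16 (each 1.0 is lost in rounding)
print(1e16 + (1.0 + 1.0))  # 1.0000000000000002e16
```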
00:20:42.020 | Okay, you've achieved in the eyes of many people,
00:20:47.020 | a legend status as a chip architect.
00:20:53.020 | What design creation are you most proud of?
00:20:56.420 | Perhaps because it was challenging, because of its impact,
00:21:00.660 | or because of the set of brilliant ideas
00:21:03.100 | that were involved in bringing it to life.
00:21:06.820 | - I find that description odd.
00:21:10.100 | And I have two small children, and I promise you,
00:21:12.580 | they think it's hilarious.
00:21:15.940 | - This question, I do it for them.
00:21:18.340 | - I'm really interested in building computers.
00:21:22.420 | And I've worked with really, really smart people.
00:21:27.620 | I'm not unbelievably smart.
00:21:30.020 | I'm fascinated by how they go together,
00:21:32.100 | both as a thing to do, and as an endeavor that people do.
00:21:37.100 | - How people and computers go together?
00:21:40.020 | - Yeah, like how people think and build a computer.
00:21:43.000 | And I find sometimes that the best computer architects
00:21:47.780 | aren't that interested in people,
00:21:49.220 | or the best people managers aren't that good
00:21:51.780 | at designing computers.
00:21:53.260 | - So the whole stack of human beings is fascinating.
00:21:56.860 | So the managers, the individual engineers.
00:21:58.940 | - Yeah, yeah, so yeah, I said I realized
00:22:01.460 | after a lot of years of building computers,
00:22:03.660 | we sort of build them out of transistors,
00:22:05.180 | logic gates, functional units, computational elements,
00:22:08.560 | that you could think of people the same way.
00:22:10.740 | So people are functional units.
00:22:12.620 | And then you could think of organizational design
00:22:14.540 | as a computer architectural problem.
00:22:16.900 | And then it was like, oh, that's super cool,
00:22:19.300 | 'cause the people are all different,
00:22:20.680 | just like the computational elements are all different.
00:22:23.660 | And they like to do different things.
00:22:25.580 | And so I had a lot of fun reframing
00:22:29.180 | how I think about organizations.
00:22:31.300 | - Just like with computers,
00:22:34.140 | we were saying execution paths,
00:22:35.980 | you can have a lot of different paths
00:22:37.340 | that end up at the same good destination.
00:22:41.620 | So what have you learned about the human abstractions,
00:22:45.820 | from individual functional human units
00:22:48.860 | to the broader organization?
00:22:51.900 | What does it take to create something special?
00:22:55.020 | - Well, most people don't think simple enough.
00:23:00.300 | - All right, so do you know the difference
00:23:01.660 | between a recipe and the understanding?
00:23:04.140 | There's probably a philosophical description of this.
00:23:09.180 | So imagine you're gonna make a loaf of bread.
00:23:11.500 | The recipe says, get some flour, add some water,
00:23:14.060 | add some yeast, mix it up, let it rise,
00:23:16.820 | put it in a pan, put it in the oven.
00:23:19.420 | It's a recipe.
00:23:20.280 | Understanding bread, you can understand biology,
00:23:24.740 | supply chains, grain grinders, yeast,
00:23:30.260 | physics, thermodynamics.
00:23:34.380 | There's so many levels of understanding there.
00:23:37.220 | And then when people build and design things,
00:23:40.220 | they frequently are executing some stack of recipes.
00:23:43.660 | And the problem with that is the recipes
00:23:46.900 | all have limited scope.
00:23:48.900 | Like if you have a really good recipe book
00:23:50.640 | for making bread, it won't tell you anything
00:23:52.300 | about how to make an omelet.
00:23:53.700 | But if you have a deep understanding of cooking,
00:23:59.220 | then bread, omelets, sandwich,
00:24:03.100 | there's a different way of viewing everything.
00:24:07.740 | And most people, when you get to be an expert at something,
00:24:12.260 | you're hoping to achieve deeper understanding,
00:24:16.420 | not just a large set of recipes to go execute.
00:24:19.960 | And it's interesting to walk groups of people
00:24:22.820 | because executing recipes is unbelievably efficient
00:24:27.620 | if it's what you want to do.
00:24:29.180 | If it's not what you want to do, you're really stuck.
00:24:33.500 | And that difference is crucial.
00:24:36.580 | And everybody has a balance of, let's say,
00:24:39.480 | deeper understanding of recipes.
00:24:40.940 | And some people are really good at recognizing
00:24:43.740 | when the problem is to understand something deeply.
00:24:46.460 | Deeply.
00:24:47.700 | Does that make sense?
00:24:49.060 | - It totally makes sense.
00:24:50.540 | Does every stage of development,
00:24:52.780 | deep understanding on the team needed?
00:24:55.540 | - Well, this goes back to the art versus science question.
00:24:58.620 | - Sure.
00:24:59.460 | - If you constantly unpack everything
00:25:01.220 | for deeper understanding, you never get anything done.
00:25:04.220 | And if you don't unpack understanding when you need to,
00:25:06.900 | you'll do the wrong thing.
00:25:08.460 | And then at every juncture, like human beings
00:25:12.020 | are these really weird things because everything you tell
00:25:15.100 | them has a million possible outputs.
00:25:17.100 | And then they all interact in a hilarious way.
00:25:20.640 | And then having some intuition about what do you tell them,
00:25:24.260 | what do you do, when do you intervene, when do you not,
00:25:26.700 | it's complicated.
00:25:28.740 | - Right, so--
00:25:29.780 | - It's essentially computationally unsolvable.
00:25:33.180 | - Yeah, it's an intractable problem, sure.
00:25:35.340 | Humans are a mess.
00:25:37.980 | But with deep understanding, do you mean also
00:25:42.980 | sort of fundamental questions of things like
00:25:47.340 | what is a computer?
00:25:49.940 | Or why?
00:25:53.700 | Like the why question is why are we even building this?
00:25:57.460 | Like of purpose?
00:25:58.780 | Or do you mean more like going towards
00:26:02.180 | the fundamental limits of physics,
00:26:04.300 | sort of really getting into the core of the science?
00:26:07.300 | - Well, in terms of building a computer,
00:26:09.540 | think a little simpler.
00:26:11.360 | So common practice is you build a computer.
00:26:14.620 | And then when somebody says, I want to make it 10% faster,
00:26:17.760 | you'll go in and say, all right,
00:26:19.220 | I need to make this buffer bigger.
00:26:20.820 | And maybe I'll add an ad unit.
00:26:22.980 | Or I have this thing that's three instructions wide,
00:26:25.340 | I'm gonna make it four instructions wide.
00:26:27.580 | And what you see is each piece
00:26:30.460 | gets incrementally more complicated.
00:26:32.700 | And then at some point you hit this limit.
00:26:37.060 | Like adding another feature or buffer
00:26:39.020 | doesn't seem to make it any faster.
00:26:41.160 | And then people say, well, that's because
00:26:42.740 | it's a fundamental limit.
00:26:45.360 | And then somebody else will look at it and say,
00:26:46.900 | well, actually the way you divided the problem up
00:26:49.380 | and the way that different features are interacting
00:26:51.940 | is limiting you and it has to be rethought, rewritten.
00:26:54.960 | So then you refactor it and rewrite it.
00:26:58.100 | And what people commonly find is the rewrite
00:27:00.900 | is not only faster, but half as complicated.
00:27:03.540 | - From scratch?
00:27:04.380 | - Yes.
00:27:05.200 | - So how often in your career,
00:27:07.340 | but just have you seen as needed,
00:27:09.700 | maybe more generally to just throw the whole thing out?
00:27:14.220 | - This is where I'm on one end of it,
00:27:16.980 | every three to five years.
00:27:19.080 | - Which end are you on?
00:27:21.060 | - Rewrite more often.
00:27:22.700 | - Rewrite, and three to five years is?
00:27:25.180 | - If you want to really make a lot of progress
00:27:26.940 | on computer architecture, every five years
00:27:28.900 | you should do one from scratch.
00:27:30.460 | - So where does the x86-64 standard come in?
00:27:36.900 | How often do you?
00:27:38.740 | - I wrote the, I was the co-author of that spec in '98.
00:27:42.340 | That's 20 years ago.
00:27:43.860 | - Yeah, so that's still around.
00:27:45.820 | - The instruction set itself has been extended
00:27:48.240 | quite a few times.
00:27:49.140 | - Yes.
00:27:49.980 | - And instruction sets are less interesting
00:27:52.460 | than the implementation underneath.
00:27:54.740 | There's been, on x86 architecture,
00:27:57.500 | Intel's designed a few, AMD's designed a few,
00:27:59.940 | very different architectures.
00:28:02.460 | And I don't want to go into too much of the detail
00:28:06.460 | about how often, but there's a tendency
00:28:10.560 | to rewrite it every 10 years,
00:28:12.500 | and it really should be every five.
00:28:15.100 | - So you're saying you're an outlier in that sense?
00:28:17.500 | - Rewrite more often.
00:28:18.900 | Rewrite more often.
00:28:20.060 | - Well, and here's the problem.
00:28:20.900 | - Isn't that scary?
00:28:22.100 | - Yeah, of course.
00:28:23.660 | Well, scary to who?
00:28:25.180 | - To everybody involved, because like you said,
00:28:28.140 | repeating the recipe is efficient.
00:28:30.620 | Companies want to make money,
00:28:33.820 | well, no, individual engineers want to succeed,
00:28:36.340 | so you want to incrementally improve,
00:28:39.000 | increase the buffer from three to four.
00:28:41.260 | - Well, this is where you get into
00:28:43.340 | diminishing return curves.
00:28:45.420 | I think Steve Jobs said this, right?
00:28:46.920 | So every, you have a project,
00:28:48.980 | and you start here, and it goes up,
00:28:50.580 | and you have diminishing return.
00:28:52.380 | And to get to the next level, you have to do a new one,
00:28:54.780 | and the initial starting point will be lower
00:28:57.660 | than the old optimization point, but it'll get higher.
00:29:01.860 | So now you have two kinds of fear,
00:29:03.580 | short-term disaster and long-term disaster.
00:29:07.560 | - And you're-- - So grown-ups.
00:29:09.580 | - Grown-ups.
00:29:10.460 | - Right, like, you know, people with a quarter-by-quarter
00:29:13.820 | business objective are terrified about changing everything.
00:29:17.880 | And people who are trying to run a business
00:29:21.080 | or build a computer for a long-term objective
00:29:24.000 | know that the short-term limitations
00:29:26.560 | block them from the long-term success.
00:29:29.360 | So if you look at leaders of companies
00:29:32.760 | that had really good long-term success,
00:29:35.200 | every time they saw that they had to redo something,
00:29:37.640 | they did.
00:29:39.000 | - And so somebody has to speak up?
00:29:41.060 | - Or you do multiple projects in parallel.
00:29:43.080 | Like, you optimize the old one while you build a new one.
00:29:46.720 | But the marketing guys are always like,
00:29:48.480 | promise me that the new computer
00:29:49.960 | is faster on every single thing.
00:29:52.720 | And the computer architect says,
00:29:53.920 | well, the new computer will be faster on the average.
00:29:56.740 | But there's a distribution of results and performance,
00:29:59.480 | and you'll have some outliers that are slower.
00:30:01.920 | And that's very hard, 'cause they have one customer
00:30:03.760 | who cares about that one.
00:30:05.320 | - So speaking of the long-term, for over 50 years now,
00:30:09.000 | Moore's Law has served, for me and millions of others,
00:30:12.960 | as an inspiring beacon of what kind of amazing future
00:30:16.680 | brilliant engineers can build.
00:30:18.160 | I'm just making your kids laugh all of today.
00:30:21.880 | - Yeah, that's great.
00:30:23.520 | - So first, in your eyes, what is Moore's Law,
00:30:27.560 | if you could define for people who don't know?
00:30:29.860 | - Well, the simple statement was, from Gordon Moore,
00:30:34.300 | was double the number of transistors every two years.
00:30:37.920 | Something like that.
00:30:39.360 | And then my operational model is,
00:30:43.280 | we increase the performance of computers by 2x
00:30:46.880 | every two or three years.
00:30:48.560 | And it's wiggled around substantially over time.
00:30:51.480 | And also, in how we deliver performance has changed.
00:30:55.220 | - So, right, so the--
00:30:59.000 | - The foundational idea was 2x the transistors
00:31:01.680 | every two years.
00:31:02.960 | The current cadence is something like,
00:31:05.800 | they call it a shrink factor.
00:31:08.000 | Like .6 every two years, which is not .5.
00:31:11.920 | - But that's referring strictly, again,
00:31:13.800 | to the original definition of just--
00:31:14.640 | - Yeah, of transistor count.
00:31:16.660 | - A shrink factor, just getting 'em smaller,
00:31:18.400 | smaller, smaller.
00:31:19.240 | - Well, it's for a constant chip area.
00:31:21.760 | If you make the transistors smaller by .6,
00:31:24.200 | then you get one over .6 more transistors.
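Reading that 0.6 as a per-generation area scale factor (an editorial assumption), the arithmetic is:

```latex
% For a fixed chip area, if each transistor's footprint shrinks to a fraction s of
% its previous area each generation, the transistor count grows by 1/s:
N_{\text{new}} = \frac{N_{\text{old}}}{s}, \qquad s = 0.6 \;\Rightarrow\; \tfrac{1}{0.6} \approx 1.67\times \ \text{per two years}
% versus the classic cadence of s = 0.5, i.e. 2x every two years.
```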
00:31:27.200 | - So can you linger on it a little longer?
00:31:29.160 | What's the broader, what do you think should be
00:31:31.680 | the broader definition of Moore's Law?
00:31:33.920 | When you mentioned how you think of performance,
00:31:37.920 | just broadly, what's a good way to think about Moore's Law?
00:31:41.480 | - Well, first of all, I've been aware of Moore's Law
00:31:46.200 | for 30 years.
00:31:47.220 | - In which sense?
00:31:49.080 | - Well, I've been designing computers for 40.
00:31:52.920 | - You're just watching it before your eyes, kind of thing.
00:31:55.440 | - Well, and somewhere where I became aware of it,
00:31:58.160 | I was also informed that Moore's Law was gonna die
00:32:00.440 | in 10 to 15 years.
00:32:02.240 | And I thought that was true at first,
00:32:03.920 | but then after 10 years, it was gonna die
00:32:05.840 | in 10 to 15 years.
00:32:07.480 | And then at one point, it was gonna die in five years,
00:32:09.760 | and then it went back up to 10 years,
00:32:11.320 | and at some point, I decided not to worry
00:32:13.440 | about that particular prognostication
00:32:16.680 | for the rest of my life, which is fun.
00:32:19.640 | And then I joined Intel, and everybody said,
00:32:21.560 | Moore's Law is dead.
00:32:22.840 | And I thought, that's sad, 'cause it's the Moore's Law
00:32:24.600 | company, and it's not dead, and it's always been gonna die.
00:32:29.200 | And humans, like these apocryphal kind of statements,
00:32:33.360 | like, we'll run out of food, or we'll run out of air,
00:32:36.280 | or run out of room, or run out of something.
00:32:39.960 | - Right, but it's still incredible that it's lived
00:32:42.520 | for as long as it has.
00:32:44.640 | And yes, there's many people who believe now
00:32:47.640 | that Moore's Law is dead.
00:32:50.200 | - You know, they can join the last 50 years
00:32:52.840 | of people who had the same idea.
00:32:53.680 | - Yeah, there's a long tradition.
00:32:55.400 | But why do you think, if you can try to understand it,
00:33:00.400 | why do you think it's not dead currently?
00:33:03.880 | - Well, first, let's just think,
00:33:05.680 | people think Moore's Law is one thing,
00:33:07.080 | transistors get smaller.
00:33:09.160 | But actually, under the sheets, there's literally
00:33:10.760 | thousands of innovations, and almost all those innovations
00:33:14.120 | have their own diminishing return curves.
00:33:17.360 | So if you graph it, it looks like a cascade
00:33:19.400 | of diminishing return curves.
00:33:21.440 | I don't know what to call that.
00:33:22.680 | But the result is an exponential curve.
00:33:26.480 | Well, at least it has been.
00:33:27.920 | So, and we keep inventing new things.
00:33:30.920 | So if you're an expert in one of the things
00:33:32.960 | on a diminishing return curve, right,
00:33:35.920 | and you can see its plateau,
00:33:38.480 | you will probably tell people, "Well, this is done."
00:33:42.240 | Meanwhile, some other pile of people
00:33:43.640 | are doing something different.
00:33:46.400 | So that's just normal.
00:33:48.280 | So then there's the observation of how small
00:33:51.320 | could a switching device be?
00:33:54.080 | So a modern transistor is something like
00:33:55.760 | a thousand by a thousand by a thousand atoms, right?
00:33:59.920 | And you get quantum effects down around two to 10 atoms.
00:34:04.680 | So you can imagine a transistor as small
00:34:06.680 | as 10 by 10 by 10.
00:34:08.240 | So that's a million times smaller.
00:34:12.080 | And then the quantum computational people
00:34:14.480 | are working away at how to use quantum effects.
00:34:18.320 | - A thousand by a thousand by a thousand.
00:34:21.920 | - Atoms.
00:34:22.760 | - That's a really clean way of putting it.
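The "million times smaller" figure is just the ratio of the two atom counts quoted above:

```latex
\frac{1000 \times 1000 \times 1000}{10 \times 10 \times 10} \;=\; \frac{10^{9}}{10^{3}} \;=\; 10^{6}
```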
00:34:26.640 | - Well, a fin, like a modern transistor,
00:34:28.840 | if you look at the fin, it's like 120 atoms wide,
00:34:32.040 | but we can make that thinner.
00:34:33.320 | And then there's a gate wrapped around it,
00:34:35.680 | and then there's spacing.
00:34:36.600 | There's a whole bunch of geometry.
00:34:38.760 | And a competent transistor designer
00:34:42.000 | could count the atoms in every single direction.
00:34:46.780 | (laughing)
00:34:47.980 | Like there's techniques now to already put down atoms
00:34:50.460 | in a single atomic layer.
00:34:51.980 | And you can place atoms if you want to.
00:34:55.820 | It's just, you know, from a manufacturing process,
00:34:59.580 | if placing an atom takes 10 minutes
00:35:01.300 | and you need to put 10 to the 23rd atoms together
00:35:05.620 | to make a computer, it would take a long time.
00:35:08.780 | So the methods are both shrinking things
00:35:13.340 | and then coming up with effective ways
00:35:15.060 | to control what's happening.
00:35:17.900 | - Manufacture stably and cheaply.
00:35:20.060 | - Yeah.
00:35:21.660 | So the innovation stack's pretty broad.
00:35:23.540 | You know, there's equipment, there's optics,
00:35:26.020 | there's chemistry, there's physics,
00:35:27.580 | there's material science, there's metallurgy.
00:35:31.040 | There's lots of ideas about when you put
00:35:32.660 | different materials together, how do they interact?
00:35:34.500 | Are they stable?
00:35:35.540 | Is it stable over temperature?
00:35:37.180 | Like are they repeatable?
00:35:40.540 | You know, there's like literally thousands
00:35:43.540 | of technologies involved.
00:35:45.020 | - But just for the shrinking, you don't think
00:35:46.980 | we're quite yet close to the fundamental limits of physics.
00:35:50.980 | - I did a talk on Moore's Law and I asked
00:35:52.540 | for a roadmap to a path of 100 and after two weeks,
00:35:56.580 | they said we only got to 50.
00:35:58.900 | - 100 what, sorry?
00:35:59.740 | - 100x shrink.
00:36:00.580 | - 100x shrink?
00:36:01.940 | We only got to 50?
00:36:02.780 | - To 50 and I said, why don't you give it
00:36:03.940 | another two weeks?
00:36:05.120 | Well, here's the thing about Moore's Law, right?
00:36:09.660 | So I believe that the next 10 or 20 years
00:36:14.180 | of shrinking is gonna happen, right?
00:36:16.360 | Now, as a computer designer, you have two stances.
00:36:20.940 | You think it's going to shrink, in which case
00:36:23.040 | you're designing and thinking about architecture
00:36:26.180 | in a way that you'll use more transistors.
00:36:29.020 | Or conversely, not be swamped by the complexity
00:36:32.860 | of all the transistors you get, right?
00:36:36.140 | You have to have a strategy, you know?
00:36:39.300 | - So you're open to the possibility and waiting
00:36:42.100 | for the possibility of a whole new army
00:36:44.180 | of transistors ready to work.
00:36:45.940 | - I'm expecting--
00:36:47.260 | - Expecting.
00:36:48.100 | - More transistors every two or three years
00:36:50.380 | by a number large enough that how you think
00:36:53.580 | about design, how you think about architecture
00:36:55.580 | has to change.
00:36:57.200 | Like imagine you build buildings out of bricks
00:37:01.100 | and every year the bricks are half the size
00:37:03.260 | or every two years.
00:37:05.860 | Well, if you kept building bricks the same way,
00:37:08.100 | you know, so many bricks per person per day,
00:37:11.260 | the amount of time to build a building
00:37:13.580 | would go up exponentially.
00:37:14.980 | - Right.
00:37:16.660 | - Right.
00:37:17.480 | But if you said, I know that's coming,
00:37:19.140 | so now I'm going to design equipment
00:37:21.140 | that moves bricks faster, uses them better,
00:37:23.420 | 'cause maybe you're getting something out
00:37:24.540 | of the smaller bricks, more strength, thinner walls,
00:37:27.500 | you know, less material, efficiency out of that.
00:37:30.320 | So once you have a roadmap with what's going to happen,
00:37:33.220 | transistors, we're going to get more of them,
00:37:36.500 | then you design all this collateral around it
00:37:38.740 | to take advantage of it and also to cope with it.
00:37:42.420 | Like that's the thing people don't understand,
00:37:43.740 | it's like if I didn't believe in Moore's law
00:37:46.100 | and then Moore's law transistors showed up,
00:37:48.720 | my design teams were all drowned.
00:37:50.500 | - So what's the hardest part of this flood
00:37:56.160 | of new transistors?
00:37:57.300 | I mean, even if you just look historically
00:37:59.440 | throughout your career, what's the thing,
00:38:03.700 | what fundamentally changes when you add more transistors
00:38:06.940 | in the task of designing an architecture?
00:38:09.980 | - Well, there's two constants, right?
00:38:12.500 | One is people don't get smarter.
00:38:14.140 | - By the way, there's some science showing
00:38:17.300 | that we do get smarter because of nutrition, whatever.
00:38:20.300 | Sorry to bring that up.
00:38:22.060 | - The Flynn effect.
00:38:22.880 | - Yes.
00:38:23.720 | - Yeah, I'm familiar with it.
00:38:24.560 | Nobody understands it, nobody knows if it's still going on.
00:38:26.260 | So that's a--
00:38:27.140 | - Or whether it's real or not, but yeah.
00:38:29.140 | - I sort of--
00:38:31.260 | - Anyway, but not exponentially.
00:38:32.100 | - I would believe for the most part,
00:38:33.460 | people aren't getting much smarter.
00:38:35.500 | - The evidence doesn't support it.
00:38:36.820 | - That's right.
00:38:37.660 | - And then teams can't grow that much.
00:38:40.060 | - Right.
00:38:40.900 | - Right, so human beings, we're really good in teams of 10,
00:38:45.060 | up to teams of 100, they can know each other.
00:38:48.140 | Beyond that, you have to have organizational boundaries.
00:38:50.800 | So you're kind of, you have,
00:38:51.900 | those are pretty hard constraints, right?
00:38:54.640 | So then you have to divide and conquer.
00:38:56.380 | Like as the designs get bigger,
00:38:57.900 | you have to divide it into pieces.
00:38:59.700 | The power of abstraction layers is really high.
00:39:03.180 | We used to build computers out of transistors.
00:39:06.100 | Now we have a team that turns transistors into logic cells
00:39:08.860 | and another team that turns them into functional units
00:39:10.660 | and another one that turns them into computers, right?
00:39:13.140 | So we have abstraction layers in there.
00:39:16.040 | And you have to think about when do you shift gears on that?
00:39:21.040 | We also use faster computers to build faster computers.
00:39:24.280 | So some algorithms run twice as fast on new computers,
00:39:27.780 | but a lot of algorithms are N squared.
00:39:30.420 | So, you know, a computer with twice as many transistors
00:39:33.580 | in it might take four times as long to run.
00:39:36.500 | So you have to refactor the software.
00:39:39.340 | Like simply using faster computers
00:39:41.020 | to build bigger computers doesn't work.
00:39:43.020 | So you have to think about all these things.
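Making that scaling point explicit (an editorial gloss on the numbers above):

```latex
% If the design grows with the transistor budget, n \to 2n, an O(n^2) design tool
% does 4x the work; a machine that is only 2x faster then still takes
\frac{(2n)^2 / n^2}{2} = \frac{4}{2} = 2\times \ \text{as long per run, unless the software is refactored.}
```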
00:39:46.260 | - So in terms of computing performance
00:39:47.860 | and the exciting possibility
00:39:49.260 | that more powerful computers bring,
00:39:51.560 | is shrinking the thing we've just been talking about,
00:39:55.180 | one of the, for you,
00:39:57.500 | one of the biggest exciting possibilities
00:39:59.860 | of advancement in performance,
00:40:01.500 | or is there other directions
00:40:02.820 | that you're interested in?
00:40:03.900 | Like in the direction of sort of enforcing given parallelism
00:40:08.900 | or like doing massive parallelism
00:40:12.180 | in terms of many, many CPUs,
00:40:15.020 | you know, stacking CPUs on top of each other,
00:40:17.660 | that kind of parallelism or any kind of parallelism?
00:40:20.780 | - Well, think about it in a different way.
00:40:22.220 | So old computers, you know, slow computers,
00:40:25.220 | you said A equal B plus C times D.
00:40:28.500 | Pretty simple, right?
00:40:30.580 | And then we made faster computers with vector units
00:40:33.460 | and you can do proper equations and matrices, right?
00:40:38.460 | And then modern like AI computations
00:40:41.060 | or like convolutional neural networks,
00:40:43.380 | where you convolve one large data set against another.
00:40:47.060 | And so there's sort of this hierarchy of mathematics,
00:40:51.100 | you know, from simple equation to linear equations
00:40:54.020 | to matrix equations to deeper kind of computation.
00:40:58.740 | And the data sets are getting so big
00:41:00.580 | that people are thinking of data as a topology problem.
00:41:04.340 | You know, data is organized in some immense shape.
00:41:07.940 | And then the computation,
00:41:09.340 | which sort of wants to get data from that immense shape
00:41:12.900 | and do some computation on it.
00:41:15.300 | So what computers have allowed people to do
00:41:18.100 | is have algorithms go much, much further.
00:41:21.380 | So that paper you referenced, the Sutton paper,
00:41:26.620 | they talked about, you know, like when AI started,
00:41:29.100 | it was apply rule sets to something.
00:41:31.860 | That's a very simple computational situation.
00:41:35.780 | And then when they did the first chess thing,
00:41:37.820 | they solved deep searches.
00:41:39.860 | So have a huge database of moves and results, deep search,
00:41:44.660 | but it's still just a search, right?
00:41:48.140 | Now we take large numbers of images
00:41:51.140 | and we use it to train these weight sets
00:41:54.380 | that we convolve across.
00:41:56.260 | It's a completely different kind of phenomena.
00:41:58.900 | We call that AI.
00:41:59.940 | Now they're doing the next generation.
00:42:02.420 | And if you look at it,
00:42:03.780 | they're going up this mathematical graph, right?
00:42:07.540 | And then computations, both computation and data sets
00:42:11.180 | support going up that graph.
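A plain-Python sketch of that hierarchy, from a scalar expression to an elementwise vector version to a small one-dimensional convolution; the data and the three-tap kernel are invented for illustration.

```python
# The "hierarchy of mathematics" described above, in plain Python.

# Scalar: a = b + c * d
b, c, d = 2.0, 3.0, 4.0
a = b + c * d                                        # 14.0

# Vector: the same expression applied elementwise across whole arrays
B, C, D = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]
A = [bi + ci * di for bi, ci, di in zip(B, C, D)]    # [29.0, 42.0, 57.0]

# Convolution: slide a small weight kernel across a signal and sum the products
def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

print(conv1d([1, 2, 3, 4, 5], [1, 0, -1]))           # [-2, -2, -2]
```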
00:42:13.940 | - Yeah, the kind of computation that might,
00:42:15.460 | I mean, I would argue that all of it is still a search,
00:42:18.700 | right?
00:42:19.980 | Just like you said, a topology problem of data sets,
00:42:22.780 | you're searching the data sets for valuable data.
00:42:27.020 | And also the actual optimization of neural networks
00:42:29.980 | is a kind of search for the--
00:42:33.060 | - I don't know, if you had looked at the inner layers
00:42:34.780 | of finding a cat, it's not a search.
00:42:39.100 | It's a set of endless projections.
00:42:41.100 | So, you know, a projection,
00:42:42.740 | here's a shadow of this phone, right?
00:42:45.660 | And then you can have a shadow of that on the something
00:42:47.700 | and a shadow on that of something.
00:42:49.260 | If you look in the layers, you'll see,
00:42:51.420 | this layer actually describes pointy ears
00:42:53.580 | and round eyeness and fuzziness.
00:42:55.540 | And, but the computation to tease out the attributes
00:43:00.540 | is not search.
00:43:03.700 | - Right, I mean--
00:43:04.540 | - Like the inference part might be search,
00:43:05.980 | but the training's not search.
00:43:07.460 | - Okay, well--
00:43:08.300 | - And then in deep networks, they look at layers
00:43:10.740 | and they don't even know it's represented.
00:43:13.140 | And yet if you take the layers out, it doesn't work.
00:43:16.620 | - Okay, so--
00:43:17.460 | - So I don't think it's search.
00:43:18.940 | - All right, well--
00:43:19.780 | - But you'd have to talk to a mathematician
00:43:21.020 | about what that actually is.
00:43:22.980 | - Well, we could disagree, but the,
00:43:25.780 | it's just semantics, I think, it's not,
00:43:28.180 | but it's certainly not--
00:43:29.020 | - I would say it's absolutely not semantics, but--
00:43:31.900 | - Okay.
00:43:32.740 | All right, well, if you wanna go there.
00:43:35.580 | So optimization to me is search,
00:43:39.020 | and we're trying to optimize the ability
00:43:42.940 | of a neural network to detect cat ears.
00:43:45.820 | And the difference between chess
00:43:49.020 | and the space, the incredibly multi-dimensional,
00:43:54.020 | 100,000 dimensional space that neural networks
00:43:57.340 | are trying to optimize over
00:43:58.740 | is nothing like the chessboard database.
00:44:02.220 | So it's a totally different kind of thing.
00:44:04.780 | Okay, in that sense, you can say--
00:44:06.220 | - Yeah, yeah.
00:44:07.060 | - It loses the meaning.
00:44:07.900 | - I can see how you might say, if you,
00:44:11.220 | the funny thing is, is the difference between
00:44:14.060 | given search space and found search space.
00:44:16.500 | - Right, exactly.
00:44:17.340 | - Yeah, maybe that's a different way to describe it.
00:44:18.180 | - That's a beautiful way to put it, okay.
00:44:19.980 | But you're saying, what's your sense
00:44:21.700 | in terms of the basic mathematical operations
00:44:24.820 | and the architectures, computer hardware
00:44:27.780 | that enables those operations?
00:44:29.920 | Do you see the CPUs of today still being
00:44:33.020 | a really core part of executing
00:44:36.020 | those mathematical operations?
00:44:37.620 | - Yes.
00:44:38.540 | Well, the operations continue to be add, subtract,
00:44:42.300 | load, store, compare, and branch.
00:44:44.020 | It's remarkable.
00:44:46.140 | So it's interesting that the building blocks
00:44:48.860 | of computers or transistors, under that, atoms.
00:44:52.780 | So you got atoms, transistors, logic gates, computers,
00:44:55.940 | right, functional units of computers.
00:44:58.420 | The building blocks of mathematics at some level
00:45:01.060 | are things like adds and subtracts and multiplies,
00:45:04.460 | but the space mathematics can describe
00:45:08.400 | is, I think, essentially infinite.
00:45:11.300 | But the computers that run the algorithms
00:45:14.100 | are still doing the same things.
00:45:16.660 | Now, a given algorithm might say,
00:45:19.020 | I need sparse data, or I need 32-bit data,
00:45:21.940 | or I need a convolution operation
00:45:26.340 | that naturally takes eight-bit data,
00:45:28.980 | multiplies it, and sums it up a certain way.
00:45:31.660 | So the data types in TensorFlow imply an optimization set,
00:45:36.660 | but when you go right down and look at the computers,
00:45:40.460 | it's and and or gates doing adds and multiplies.
00:45:44.060 | Like, that hasn't changed much.
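As a rough aside on the point that the operations haven't changed, here is a minimal sketch of a toy machine whose only instructions are the six listed above: add, subtract, load, store, compare, and branch. It is purely illustrative, not any real instruction set.

```python
# A toy register machine with only the operations mentioned above:
# add, subtract, load, store, compare, and branch. Not a real ISA.

def run(program, memory):
    regs = {"r0": 0, "r1": 0, "flag": False}
    pc = 0  # program counter
    while pc < len(program):
        op, *args = program[pc]
        if op == "load":      # load rd, addr
            regs[args[0]] = memory[args[1]]
        elif op == "store":   # store rs, addr
            memory[args[1]] = regs[args[0]]
        elif op == "add":     # add rd, rs
            regs[args[0]] += regs[args[1]]
        elif op == "sub":     # sub rd, rs
            regs[args[0]] -= regs[args[1]]
        elif op == "cmp":     # cmp rs, imm -> sets flag if rs < imm
            regs["flag"] = regs[args[0]] < args[1]
        elif op == "branch":  # branch target, taken only if flag is set
            if regs["flag"]:
                pc = args[0]
                continue
        pc += 1
    return memory

# Sum memory[0..3] into memory[4] using only those operations.
mem = [3, 1, 4, 1, 0]
prog = [
    ("load", "r0", 4),
    ("load", "r1", 0), ("add", "r0", "r1"),
    ("load", "r1", 1), ("add", "r0", "r1"),
    ("load", "r1", 2), ("add", "r0", "r1"),
    ("load", "r1", 3), ("add", "r0", "r1"),
    ("store", "r0", 4),
]
print(run(prog, mem)[4])  # -> 9
```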
00:45:46.220 | Now, the quantum researchers think
00:45:48.580 | they're gonna change that radically,
00:45:49.980 | and then there's people who think about analog computing,
00:45:52.260 | 'cause you look in the brain,
00:45:53.140 | and it seems to be more analogish.
00:45:54.960 | You know, that maybe there's a way
00:45:57.140 | to do that more efficiently.
00:45:59.100 | But we have a million X on computation,
00:46:03.480 | and I don't know the relationship
00:46:07.780 | between computational, let's say, intensity
00:46:10.980 | and ability to hit mathematical abstractions.
00:46:14.420 | I don't know any ways to describe that,
00:46:17.380 | but just like you saw in AI,
00:46:19.780 | you went from rule sets to simple search
00:46:22.980 | to complex search to, say, found search.
00:46:26.420 | Like, those are orders of magnitude
00:46:29.180 | more computation to do.
00:46:30.700 | And as we get the next two orders of magnitude,
00:46:33.900 | like a friend, Raja Koduri, said,
00:46:36.620 | every order of magnitude changes the computation.
00:46:40.140 | - Fundamentally changes what the computation is doing.
00:46:42.700 | - Yeah.
00:46:43.540 | Oh, you know the expression,
00:46:45.660 | the difference in quantity is the difference in kind.
00:46:48.300 | You know, the difference between ant and anthill, right?
00:46:53.020 | Or neuron and brain.
00:46:54.640 | You know, there's this indefinable place
00:46:58.880 | where the quantity changed the quality, right?
00:47:02.500 | And we've seen that happen in mathematics multiple times,
00:47:04.980 | and my guess is it's gonna keep happening.
00:47:08.560 | - So, in your sense, is it, yeah,
00:47:09.980 | if you focus head down and shrinking the transistor.
00:47:14.860 | - Well, it's not just head down,
00:47:15.700 | and we're aware of the software stacks
00:47:18.360 | that are running the computational loads,
00:47:20.400 | and we're kind of pondering,
00:47:22.060 | what do you do with a petabyte of memory
00:47:24.500 | that wants to be accessed in a sparse way
00:47:27.100 | and have, you know, the kind of calculations
00:47:29.360 | AI programmers want?
00:47:31.780 | So, there's a dialogue and interaction,
00:47:34.740 | but when you go in the computer chip,
00:47:38.100 | you know, you find adders and subtractors and multipliers.
00:47:41.540 | - So, if you zoom out then with,
00:47:44.860 | as you mentioned, Rich Sutton,
00:47:46.920 | the idea that most of the development
00:47:49.300 | in the last many decades in AI research
00:47:51.540 | came from just leveraging computation
00:47:54.320 | and just simple algorithms
00:47:57.900 | waiting for the computation to improve.
00:48:00.040 | - Well, software guys have a thing that they call
00:48:02.740 | the problem of early optimization.
00:48:06.220 | - Right.
00:48:07.060 | - So, you write a big software stack,
00:48:09.140 | and if you start optimizing,
00:48:10.660 | like, the first thing you write,
00:48:12.380 | the odds of that being the performance limiter is low.
00:48:15.420 | But when you get the whole thing working,
00:48:16.900 | can you make it 2X faster by optimizing the right things?
00:48:19.780 | Sure.
00:48:21.020 | While you're optimizing that,
00:48:22.500 | could you have written a new software stack,
00:48:24.300 | which would have been a better choice?
00:48:25.980 | Maybe.
00:48:27.100 | Now you have creative tension.
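One way to read the "optimize the right things" point in code: get the whole stack working, profile it, and only then chase the 2x. The pipeline stages below (parse, transform, serialize) are made-up placeholders, not a real library.

```python
# Sketch: measure the whole pipeline first, then optimize whatever the
# profile says actually dominates the runtime.
import cProfile
import pstats

def parse(data):      return [int(x) for x in data]
def transform(rows):  return [x * x for x in rows for _ in range(200)]
def serialize(rows):  return ",".join(map(str, rows))

def pipeline(data):
    return serialize(transform(parse(data)))

if __name__ == "__main__":
    cProfile.run("pipeline([str(i) for i in range(2000)])", "pipeline.prof")
    # The top cumulative entries point at the 2x worth chasing.
    pstats.Stats("pipeline.prof").sort_stats("cumulative").print_stats(5)
```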
00:48:30.300 | - But the whole time as you're doing the writing,
00:48:33.140 | that's the software we're talking about.
00:48:34.860 | The hardware underneath gets faster and faster.
00:48:36.820 | - This goes back to the Moore's Law.
00:48:38.140 | If Moore's Law is gonna continue,
00:48:39.980 | then your AI research should expect that to show up,
00:48:44.980 | and then you make a slightly different set of choices
00:48:47.900 | than if we've hit the wall, nothing's gonna happen,
00:48:51.380 | and from here it's just us rewriting algorithms.
00:48:55.020 | Like, betting on Moore's Law's death has been
00:48:56.500 | a failed strategy for the last 30 years.
00:48:59.180 | - So, can you just linger on it?
00:49:02.020 | I think you've answered it,
00:49:04.540 | but I'll just ask the same dumb question
00:49:06.460 | over and over.
00:49:07.300 | So, why do you think Moore's Law is not going to die?
00:49:12.300 | Which is the most promising, exciting possibility
00:49:15.740 | of why it won't die in the next five, 10 years?
00:49:18.060 | So, is it the continued shrinking of the transistor,
00:49:20.700 | or is it another S-curve that steps in,
00:49:24.020 | and it totally sort of--
00:49:25.580 | - Well, shrinking the transistor
00:49:27.540 | is literally thousands of innovations.
00:49:30.240 | - Right, so there's stacks of S-curves in there.
00:49:33.340 | - There's a whole bunch of S-curves
00:49:34.860 | just kind of running their course
00:49:36.540 | and being reinvented and new things.
00:49:40.500 | You know, the semiconductor fabricators
00:49:44.700 | and technologists have all announced
00:49:46.300 | what's called nanowires,
00:49:47.460 | so they took a fin which had a gate around it
00:49:51.180 | and turned that into little wires
00:49:52.700 | so you have better control of that,
00:49:53.940 | and they're smaller,
00:49:55.380 | and then from there, there's some obvious steps
00:49:57.260 | about how to shrink that.
00:49:59.380 | So, the metallurgy around wire stacks and stuff
00:50:03.660 | has very obvious abilities to shrink,
00:50:07.140 | and there's a whole combination of things there to do.
00:50:10.980 | - Your sense is that we're gonna get a lot
00:50:13.460 | of this innovation from just that shrinking.
00:50:15.700 | - Yeah, like a factor of 100, it's a lot.
00:50:19.420 | - Yeah, I would say.
00:50:20.500 | That's incredible.
00:50:22.140 | And it's totally unknown.
00:50:23.740 | - It's only 10 or 15 years.
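As plain arithmetic, a factor of 100 spread over roughly 12 years (the middle of that 10-to-15-year window) comes out to a steady compounding rate:

```latex
% 100x over ~12 years as a steady rate, and the implied doubling time.
100^{1/12} \approx 1.47 \ \text{per year}, \qquad
\text{doubling time} \approx \frac{12}{\log_2 100} \approx 1.8 \ \text{years}.
```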
00:50:25.140 | - Now, you're smart, and you might know,
00:50:26.420 | but to me, it's totally unpredictable
00:50:28.180 | of what that 100x would bring
00:50:29.740 | in terms of the nature of the computation
00:50:33.340 | that people would be doing.
00:50:34.460 | - Yeah, you're familiar with Bell's Law.
00:50:37.300 | So, for a long time, it was mainframes,
00:50:39.420 | minis, workstation, PC, mobile.
00:50:42.500 | Moore's Law drove faster, smaller computers.
00:50:45.380 | And then, when we were thinking about Moore's Law,
00:50:49.540 | Raja Koduri said, "Every 10x generates a new computation."
00:50:53.300 | So, scalar, vector, matrix, topological computation.
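The same multiply-accumulate work, written at three of those levels in NumPy, just to make the scalar-to-vector-to-matrix progression concrete (topological computation is left out, since it is far less settled):

```python
# Scalar, vector, and matrix views of the same multiply-accumulate work.
import numpy as np

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)

# Scalar: one multiply-add at a time, as a plain CPU loop would do it.
def scalar_dot(x, y):
    total = 0.0
    for i in range(len(x)):
        total += float(x[i]) * float(y[i])
    return total

# Vector: a whole row at once, the shape SIMD units operate on.
vector_dot = float(np.dot(a[0], b[:, 0]))

# Matrix: whole operands at once, the shape matrix engines accelerate.
c = a @ b

# All three agree (up to float rounding) on the first output element.
print(scalar_dot(a[0], b[:, 0]), vector_dot, float(c[0, 0]))
```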
00:51:01.100 | And if you go look at the industry trends,
00:51:03.860 | there was mainframes and minicomputers and PCs,
00:51:07.380 | and then the internet took off,
00:51:08.900 | and then we got mobile devices,
00:51:10.740 | and now we're building 5G wireless
00:51:12.700 | with one millisecond latency.
00:51:14.780 | And people are starting to think about the smart world
00:51:17.140 | where everything knows you, recognizes you.
00:51:21.220 | The transformations are gonna be unpredictable.
00:51:27.420 | - How does it make you feel that you're one
00:51:29.900 | of the key architects of this kind of future?
00:51:34.900 | So, we're not talking about the architects
00:51:37.180 | of the high-level people who build the Angry Bird apps.
00:51:42.180 | - What's wrong with Angry Bird apps?
00:51:44.740 | Who knows?
00:51:45.580 | Maybe that's the whole point of the universe.
00:51:47.180 | - I'm gonna take a stand at that,
00:51:48.820 | and the attention-distracting nature of mobile phones.
00:51:52.780 | I'll take a stand.
00:51:53.780 | But anyway, in terms of--
00:51:55.260 | - I don't think that matters much.
00:51:57.580 | - The side effects of smartphones
00:52:01.260 | or the attention-distraction, which part?
00:52:03.700 | - Well, who knows where this is all leading?
00:52:06.140 | It's changing so fast.
00:52:07.420 | - Wait, so back to the--
00:52:08.260 | - My parents used to yell at my sisters
00:52:09.740 | for hiding in the closet with a wired phone
00:52:11.420 | with a dial on it.
00:52:13.100 | Stop talking to your friends all day.
00:52:14.660 | - Right.
00:52:15.740 | - Now my wife yells at my kids
00:52:17.220 | for talking to their friends all day on text.
00:52:20.380 | I don't know, it looks the same to me.
00:52:21.740 | - It's always, it echoes of the same thing.
00:52:23.380 | Okay, but you are one of the key people
00:52:26.660 | architecting the hardware of this future.
00:52:29.140 | How does that make you feel?
00:52:30.500 | Do you feel responsible?
00:52:31.780 | Do you feel excited?
00:52:34.900 | - So we're in a social context,
00:52:38.100 | so there's billions of people on this planet.
00:52:40.900 | There are literally millions of people
00:52:42.860 | working on technology.
00:52:44.420 | I feel lucky to be doing what I do
00:52:49.860 | and getting paid for it, and there's an interest in it.
00:52:52.820 | But there's so many things going on in parallel.
00:52:56.140 | Like the actions are so unpredictable.
00:52:58.340 | If I wasn't here, somebody else would do it.
00:53:01.180 | The vectors of all these different things
00:53:03.420 | are happening all the time.
00:53:04.860 | You know, there's a, I'm sure some philosopher
00:53:10.260 | or meta-philosophers, you know,
00:53:11.820 | wondering about how we transform our world.
00:53:14.020 | - So you can't deny the fact that these tools,
00:53:19.140 | whether, that these tools are changing our world.
00:53:24.140 | - That's right.
00:53:25.260 | - So do you think it's changing for the better?
00:53:28.420 | - I read this thing recently, it said
00:53:31.740 | the two disciplines with the highest GRE scores
00:53:35.420 | in college are physics and philosophy.
00:53:38.420 | And they're both sort of trying to answer the question,
00:53:41.780 | why is there anything?
00:53:42.900 | And the philosophers are on the kind of theological side,
00:53:47.740 | and the physicists are obviously on the material side.
00:53:52.660 | And there's 100 billion galaxies with 100 billion stars.
00:53:56.980 | It seems, well, repetitive at best.
00:54:00.140 | So, you know, there's, on our way to 10 billion people.
00:54:06.020 | I mean, it's hard to say what it's all for,
00:54:08.180 | if that's what you're asking.
00:54:09.580 | - Yeah, I guess I am.
00:54:11.260 | - Things do tend to significantly increase in complexity.
00:54:15.020 | And I'm curious about how computation,
00:54:21.300 | like our world, our physical world,
00:54:23.940 | inherently generates mathematics.
00:54:25.880 | It's kind of obvious, right?
00:54:26.840 | So we have XYZ coordinates.
00:54:28.820 | You take a sphere, you make it bigger,
00:54:30.100 | you get a surface that, you know, grows by R squared.
00:54:34.060 | Like it generally generates mathematics,
00:54:36.380 | and the mathematicians and the physicists
00:54:38.700 | have been having a lot of fun talking
00:54:39.940 | to each other for years.
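The sphere example written out: surface area goes with the square of the radius, so doubling the radius quadruples the area.

```latex
A(R) = 4\pi R^{2}, \qquad \frac{A(2R)}{A(R)} = \frac{4\pi (2R)^{2}}{4\pi R^{2}} = 4.
```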
00:54:41.260 | And computation has been, let's say, relatively pedestrian.
00:54:46.060 | Like computation in terms of mathematics
00:54:48.540 | has been doing binary algebra,
00:54:52.020 | while those guys have been gallivanting
00:54:54.460 | through the other realms of possibility, right?
00:54:58.020 | Now, recently, the computation lets you do
00:55:01.820 | mathematical computations that are sophisticated enough
00:55:06.540 | that nobody understands how the answers came out, right?
00:55:10.060 | - Machine learning.
00:55:10.900 | - Machine learning.
00:55:11.740 | - Yeah, yeah.
00:55:12.560 | - It used to be, you get data set,
00:55:14.260 | you guess at a function.
00:55:16.780 | The function is considered physics
00:55:18.900 | if it's predictive of new functions, new data sets.
00:55:22.140 | Modern, you can take a large data set
00:55:28.020 | with no intuition about what it is
00:55:29.980 | and use machine learning to find a pattern
00:55:31.820 | that has no function, right?
00:55:34.260 | And it can arrive at results that I don't know
00:55:37.580 | if they're completely mathematically describable.
00:55:39.980 | So computation has kind of done something interesting
00:55:44.160 | compared to A equal B plus C.
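A toy version of that contrast: the "guess a function" route fits a closed form you can write down, while a generic pattern-finder predicts well without ever producing one. The data, the linear guess, and the k-nearest-neighbor choice here are illustrative assumptions, not anything from the conversation.

```python
# "Guess a function" vs. "find a pattern with no function."
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 300)
y = np.sin(2 * x) + 0.3 * x**2 + rng.normal(0, 0.1, 300)  # unknown process

# Physics-style: guess y ~ a*x + b and fit it by least squares.
a, b = np.polyfit(x, y, 1)

# ML-style: average the 10 nearest observed points; no formula comes out.
def knn_predict(x_query, k=10):
    nearest = np.argsort(np.abs(x - x_query))[:k]
    return y[nearest].mean()

x_test = 1.3
print("guessed line :", a * x_test + b)
print("found pattern:", knn_predict(x_test))
print("true value   :", np.sin(2 * x_test) + 0.3 * x_test**2)
```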
00:55:47.220 | - There's something reminiscent of that step
00:55:49.660 | from the basic operations of addition
00:55:53.660 | to taking a step towards neural networks
00:55:56.680 | that's reminiscent of what life on Earth
00:55:59.020 | at its origins was doing.
00:56:01.080 | Do you think we're creating sort of the next step
00:56:03.460 | in our evolution in creating artificial intelligence systems
00:56:07.600 | that will--
00:56:08.440 | - I don't know.
00:56:09.260 | I mean, there's so much in the universe already,
00:56:11.060 | it's hard to say.
00:56:12.660 | - Where we stand in this whole thing.
00:56:14.060 | - Are human beings working on additional abstraction layers
00:56:17.460 | and possibilities?
00:56:18.460 | Yeah, it appears so.
00:56:20.300 | Does that mean that human beings don't need dogs?
00:56:23.020 | You know, no.
00:56:24.140 | Like there's so many things
00:56:25.940 | that are all simultaneously interesting and useful.
00:56:30.420 | - Well, you've seen, throughout your career,
00:56:32.460 | you've seen greater and greater level abstractions
00:56:35.140 | built in artificial machines, right?
00:56:39.540 | Do you think, when you look at humans,
00:56:41.260 | do you think that the look of all life on Earth
00:56:44.020 | as a single organism building this thing,
00:56:46.860 | this machine with greater and greater levels of abstraction,
00:56:49.860 | do you think humans are the peak,
00:56:52.680 | the top of the food chain
00:56:54.100 | in this long arc of history on Earth?
00:56:58.380 | Or do you think we're just somewhere in the middle?
00:57:00.500 | Are we the basic functional operations of a CPU?
00:57:05.220 | Are we the C++ program, the Python program?
00:57:09.260 | Are we the neural network?
00:57:10.460 | Like somebody's, you know, people have calculated
00:57:12.900 | like how many operations does the brain do?
00:57:14.900 | And something, you know, I've seen the number 10
00:57:17.020 | to the 18th a bunch of times, arrived at in different ways.
00:57:20.620 | So could you make a computer
00:57:21.980 | that did 10 to the 20th operations?
00:57:23.820 | - Yes. - Sure.
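For rough scale on those exponents, a back-of-the-envelope with an assumed round number for per-chip throughput (not a spec for any particular part):

```python
# Scale check on the exponents above; 1e14 ops/s per accelerator is an
# assumed illustrative figure, not a measured one.
brain_ops_per_s = 1e18   # the figure quoted in the conversation
chip_ops_per_s = 1e14    # assumed per-accelerator throughput

print(f"chips to reach 1e18 ops/s: {brain_ops_per_s / chip_ops_per_s:,.0f}")  # 10,000
print(f"chips to reach 1e20 ops/s: {1e20 / chip_ops_per_s:,.0f}")             # 1,000,000
```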
00:57:25.300 | - So you think-- - We're gonna do that.
00:57:27.060 | Now, is there something magical
00:57:29.420 | about how brains compute things?
00:57:31.620 | I don't know.
00:57:32.980 | You know, my personal experience is interesting
00:57:35.260 | 'cause, you know, you think you know how you think
00:57:37.780 | and then you have all these ideas
00:57:39.020 | and you can't figure out how they happened.
00:57:41.500 | And if you meditate, you know,
00:57:44.100 | like what you can be aware of is interesting.
00:57:48.660 | So I don't know if brains are magical or not.
00:57:50.900 | You know, the physical evidence says no.
00:57:54.780 | Lots of people's personal experience says yes.
00:57:57.820 | So what would be funny is if brains are magical
00:58:01.300 | and yet we can make brains with more computation.
00:58:04.620 | You know, I don't know what to say about that, but.
00:58:07.060 | - Well, do you think magic is an emergent phenomena?
00:58:10.460 | What-- - It could be.
00:58:12.060 | I have no explanation for it.
00:58:13.820 | I'm an engineer. - Let me ask Jim Keller
00:58:15.020 | of what in your view is consciousness?
00:58:17.740 | - What's consciousness?
00:58:20.620 | - Yeah, like what, you know, consciousness, love,
00:58:25.500 | things that are these deeply human things
00:58:27.700 | that seems to emerge from our brain.
00:58:29.560 | Is that something that we'll be able to make,
00:58:33.580 | encode in chips that get faster and faster
00:58:37.220 | and faster and faster?
00:58:38.060 | - That's like a 10 hour conversation.
00:58:39.860 | Nobody really knows.
00:58:41.020 | - Can you summarize it in a couple of sentences?
00:58:44.020 | - Many people have observed that organisms run
00:58:48.860 | at lots of different levels, right?
00:58:51.500 | If you had two neurons, somebody said
00:58:52.860 | you'd have one sensory neuron and one motor neuron, right?
00:58:56.900 | So we move towards things and away from things
00:58:58.820 | and we have physical integrity and safety or not, right?
00:59:03.180 | And then if you look at the animal kingdom,
00:59:05.660 | you can see brains that are a little more complicated
00:59:08.340 | and at some point there's a planning system
00:59:10.300 | and then there's an emotional system
00:59:11.980 | that's happy about being safe
00:59:14.380 | or unhappy about being threatened, right?
00:59:17.220 | And then our brains have massive numbers of structures,
00:59:21.660 | you know, like planning and movement and thinking
00:59:24.940 | and feeling and drives and emotions.
00:59:27.940 | And we seem to have multiple layers of thinking systems.
00:59:31.140 | And we have a brain, a dream system
00:59:32.820 | that nobody understands whatsoever,
00:59:35.260 | which I find completely hilarious.
00:59:37.500 | And you can think in a way that those systems
00:59:42.500 | are more independent and you can observe,
00:59:46.540 | you know, the different parts of yourself can observe them.
00:59:49.540 | I don't know which one's magical.
00:59:51.380 | I don't know which one's not computational.
00:59:56.740 | - Is it possible that it's all computation?
00:59:58.860 | - Probably.
01:00:00.060 | Is there a limit to computation?
01:00:01.500 | I don't think so.
01:00:03.180 | - Do you think the universe is a computer?
01:00:05.300 | - I don't know, it seems to be.
01:00:07.420 | It's a weird kind of computer
01:00:09.540 | because if it was a computer, right?
01:00:12.580 | Like when they do calculations on how much
01:00:15.340 | computation it takes to describe quantum effects,
01:00:18.380 | it's unbelievably high.
01:00:20.900 | So if it was a computer,
01:00:22.180 | wouldn't you have built it out of something
01:00:23.540 | that was easier to compute?
01:00:25.060 | Right, that's a funny, it's a funny system.
01:00:29.580 | But then the simulation guys have pointed out
01:00:31.300 | that the rules are kind of interesting.
01:00:32.700 | Like when you look really close, it's uncertain.
01:00:35.100 | And the speed of light says you can only look so far
01:00:37.660 | and things can't be simultaneous
01:00:39.180 | except for the odd entanglement problem
01:00:41.220 | where they seem to be.
01:00:42.540 | Like the rules are all kind of weird.
01:00:45.100 | And somebody said physics is like having 50 equations
01:00:48.860 | with 50 variables to define 50 variables.
01:00:52.020 | Like, you know, it's, you know,
01:00:55.220 | like physics itself has been a shit show
01:00:56.980 | for thousands of years.
01:00:59.020 | It seems odd when you get to the corners of everything.
01:01:01.780 | You know, it's either uncomputable
01:01:03.660 | or undefinable or uncertain.
01:01:07.180 | - It's almost like the designers of the simulation
01:01:09.380 | are trying to prevent us from understanding it perfectly.
01:01:12.820 | - But also the things that require calculations
01:01:16.140 | require so much calculation
01:01:17.740 | that our idea of the universe of a computer is absurd
01:01:20.820 | because every single little bit of it
01:01:23.100 | takes all the computation in the universe to figure out.
01:01:26.300 | So that's a weird kind of computer.
01:01:28.100 | You know, you say the simulation is running in the computer
01:01:30.900 | which has by definition infinite computation.
01:01:34.500 | - Not infinite.
01:01:35.460 | Oh, you mean if the universe is infinite?
01:01:37.700 | - Yeah, well, every little piece of our universe
01:01:40.700 | seems to take infinite computation to figure out.
01:01:43.260 | - Just a lot.
01:01:44.220 | - Well, a lot's a pretty big number.
01:01:46.060 | Compute this little teeny spot takes all the mass
01:01:50.340 | in the local one light year by one light year space.
01:01:53.460 | It's close enough to infinite.
01:01:54.940 | - Oh, it's a heck of a computer if it is one.
01:01:56.660 | - I know, it's a weird description
01:02:00.020 | 'cause the simulation description seems to break
01:02:03.140 | when you look closely at it.
01:02:04.940 | But the rules of the universe seem to imply something's up.
01:02:07.900 | That seems a little arbitrary.
01:02:10.900 | - The universe, the whole thing, the laws of physics,
01:02:14.980 | it just seems like how did it come out to be the way it is?
01:02:19.980 | - Well, lots of people talk about that.
01:02:22.660 | Like I said, the two smartest groups of humans
01:02:24.500 | are working on the same problem.
01:02:26.220 | - From different sides.
01:02:27.060 | - Different aspects and they're both complete failures.
01:02:30.060 | So that's kind of cool.
01:02:31.520 | - They might succeed eventually.
01:02:34.260 | - Well, after 2,000 years, the trend isn't good.
01:02:38.180 | - Oh, 2,000 years is nothing in the span
01:02:40.180 | of the history of the universe.
01:02:41.540 | So we have some time.
01:02:43.380 | - But the next 1,000 years doesn't look good either.
01:02:45.980 | - That's what everybody says at every stage.
01:02:48.940 | But with Moore's Law, as you've just described,
01:02:51.420 | not being dead, the exponential growth of technology,
01:02:55.240 | the future seems pretty incredible.
01:02:57.740 | - Well, it'll be interesting, that's for sure.
01:02:59.620 | - That's right.
01:03:00.460 | So what are your thoughts on Ray Kurzweil's sense
01:03:03.860 | that exponential improvement in technology
01:03:05.900 | will continue indefinitely?
01:03:07.580 | Is that how you see Moore's Law?
01:03:10.860 | Do you see Moore's Law more broadly
01:03:13.100 | in the sense that technology of all kinds
01:03:16.900 | has a way of stacking S-curves on top of each other
01:03:21.240 | where it'll be exponential
01:03:23.140 | and then we'll see all kinds of--
01:03:24.540 | - What does an exponential of a million mean?
01:03:27.660 | That's a pretty amazing number.
01:03:29.440 | And that's just for a local little piece of silicon.
01:03:32.200 | Now let's imagine you say decided to get
01:03:35.780 | 1,000 tons of silicon to collaborate in one computer
01:03:41.500 | at a million times the density.
01:03:43.180 | Like now you're talking, I don't know,
01:03:46.860 | 10 to the 20th more computation power
01:03:49.820 | than our current already unbelievably fast computers.
01:03:53.880 | Like nobody knows what that's gonna mean.
01:03:55.780 | The sci-fi guys call it computronium.
01:03:58.980 | Like when a local civilization turns
01:04:01.620 | the nearby star into a computer.
01:04:03.840 | Like I don't know if that's true.
01:04:06.720 | - So just even when you shrink a transistor,
01:04:10.280 | - That's only one dimension.
01:04:12.580 | - The ripple effects of that.
01:04:14.180 | - Like people tend to think about computers
01:04:15.960 | as a cost problem, right?
01:04:17.640 | So computers are made out of silicon
01:04:19.340 | and minor amounts of metals.
01:04:21.980 | And you know, this and that.
01:04:24.780 | None of those things cost any money.
01:04:26.900 | Like there's plenty of sand.
01:04:28.720 | Like you could just turn the beach
01:04:31.140 | and a little bit of ocean water into computers.
01:04:33.340 | So all the cost is in the equipment to do it.
01:04:36.700 | And the trend on equipment is once you figure out
01:04:39.420 | how to build the equipment, the trend of cost is zero.
01:04:41.820 | Elon said first you figure out what configuration
01:04:45.900 | you want the atoms in and then how to put them there.
01:04:49.820 | Right?
01:04:50.660 | - Yeah.
01:04:51.480 | - But here's the, you know, his great insight is
01:04:54.900 | people are how constrained.
01:04:56.500 | I have this thing, I know how it works.
01:04:58.700 | And then little tweaks to that will generate something
01:05:02.300 | as opposed to what do I actually want
01:05:05.140 | and then figure out how to build it.
01:05:07.060 | It's a very different mindset.
01:05:09.280 | And almost nobody has it, obviously.
01:05:11.360 | - Well, let me ask on that topic.
01:05:15.780 | You were one of the key early people
01:05:18.060 | in the development of autopilot,
01:05:20.180 | at least in the hardware side.
01:05:21.640 | Elon Musk believes that autopilot and vehicle autonomy,
01:05:25.500 | if you just look at that problem,
01:05:26.700 | can follow this kind of exponential improvement.
01:05:29.500 | In terms of the how question that we're talking about,
01:05:32.620 | there's no reason why it can't.
01:05:34.700 | What are your thoughts on this particular space
01:05:37.300 | of vehicle autonomy and your part of it
01:05:42.300 | and Elon Musk's and Tesla's vision for--
01:05:45.260 | - Well, the computer you need to build is straightforward.
01:05:48.780 | And you could argue, well, does it need to be
01:05:51.140 | two times faster or five times or 10 times?
01:05:53.600 | But that's just a matter of time or price in the short run.
01:05:58.440 | So that's not a big deal.
01:06:00.240 | You don't have to be especially smart to drive a car.
01:06:03.300 | So it's not like a super hard problem.
01:06:05.740 | I mean, the big problem with safety is attention,
01:06:07.940 | which computers are really good at, not skills.
01:06:11.120 | - Well, let me push back on one.
01:06:15.260 | You say everything you said is correct,
01:06:17.160 | but we as humans tend to take for granted
01:06:22.160 | how incredible our vision system is.
01:06:27.900 | - You can drive a car with 20/50 vision
01:06:30.620 | and you can train a neural network to extract
01:06:33.060 | the distance of any object and the shape of any surface
01:06:36.460 | from a video and data.
01:06:38.540 | - Yeah, but-- - It's really simple.
01:06:40.180 | - No, it's not simple.
01:06:42.140 | - That's a simple data problem.
01:06:44.380 | - It's not simple.
01:06:46.340 | It's because it's not just detecting objects,
01:06:50.460 | it's understanding the scene and it's being able to do it
01:06:53.720 | in a way that doesn't make errors.
01:06:56.580 | So the beautiful thing about the human vision system
01:07:00.020 | and our entire brain around the whole thing
01:07:02.600 | is we're able to fill in the gaps.
01:07:05.540 | It's not just about perfectly detecting cars,
01:07:08.200 | it's inferring the occluded cars.
01:07:09.960 | It's trying to, it's understanding the physics--
01:07:12.800 | - I think that's mostly a data problem.
01:07:14.580 | So you think what data would compute
01:07:17.700 | with improvement of computation,
01:07:19.220 | with improvement in collection--
01:07:20.740 | - Well, there is a, you know, when you're driving a car
01:07:22.660 | and somebody cuts you off, your brain has theories
01:07:24.760 | about why they did it.
01:07:26.140 | You know, they're a bad person, they're distracted,
01:07:28.660 | they're dumb, you know, you can listen to yourself.
01:07:31.820 | - Right.
01:07:32.820 | - So, you know, if you think that narrative is important
01:07:37.020 | to be able to successfully drive a car,
01:07:38.820 | then current autopilot systems can't do it.
01:07:41.620 | But if cars are ballistic things with tracks
01:07:44.360 | and probabilistic changes of speed and direction,
01:07:47.340 | and roads are fixed and given by the way,
01:07:50.220 | they don't change dynamically, right?
01:07:53.280 | You can map the world really thoroughly.
01:07:56.340 | You can place every object really thoroughly, right?
01:08:01.340 | You can calculate trajectories of things really thoroughly.
01:08:04.780 | Right?
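A minimal sketch of that "ballistic" framing: given an observed position and velocity for another car, extrapolating where it will be a couple of seconds out is a small calculation. The constant-velocity model and the numbers are illustrative only; real systems use richer motion models with uncertainty.

```python
# Constant-velocity extrapolation of another car's position.
import numpy as np

def predict(position, velocity, horizon_s, dt=0.1):
    """Predicted (x, y) positions every dt seconds out to horizon_s."""
    steps = int(horizon_s / dt)
    t = (np.arange(1, steps + 1) * dt)[:, None]  # column of future times
    return position + t * velocity               # p(t) = p0 + v * t

track = predict(position=np.array([30.0, 3.5]),   # 30 m ahead, one lane over
                velocity=np.array([-2.0, -1.0]),  # closing and drifting toward us
                horizon_s=2.0)
print(track[-1])  # predicted position in 2 s: [26.   1.5]
```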
01:08:06.900 | - But everything you said about really thoroughly
01:08:09.860 | has a different degree of difficulty.
01:08:13.340 | - And you could say at some point,
01:08:15.100 | computer autonomous systems will be way better
01:08:17.620 | at things that humans are lousy at.
01:08:20.020 | Like, they'll be better at attention,
01:08:22.480 | they'll always remember there was a pothole in the road
01:08:25.060 | that humans keep forgetting about.
01:08:27.380 | They'll remember that this set of roads
01:08:29.460 | has these weirdo lines on it
01:08:31.220 | that the computers figured out once.
01:08:32.780 | And especially if they get updates
01:08:35.180 | so if somebody changes a given,
01:08:37.960 | like the key to robots and stuff,
01:08:40.660 | somebody said is to maximize the givens.
01:08:42.880 | Right?
01:08:44.740 | - Right.
01:08:45.560 | - So having a robot pick up this bottle cap
01:08:47.920 | is way easier if you put a red dot on the top,
01:08:50.060 | 'cause otherwise you have to figure out,
01:08:52.640 | if you wanna do a certain thing with it,
01:08:54.800 | maximize the givens is the thing.
01:08:57.120 | And autonomous systems are happily maximizing the givens.
01:09:00.200 | Like humans, when you drive someplace new,
01:09:04.120 | you remember it 'cause you're processing it the whole time.
01:09:06.880 | And after the 50th time you drove to work,
01:09:08.880 | you get to work, you don't know how you got there.
01:09:11.240 | Right?
01:09:12.080 | You're on autopilot.
01:09:13.680 | Right?
01:09:14.760 | Autonomous cars are always on autopilot.
01:09:17.720 | But the cars have no theories about why they got cut off
01:09:20.340 | or why they're in traffic.
01:09:22.080 | - So that's--
01:09:22.920 | - They also never stop paying attention.
01:09:24.680 | - Right.
01:09:25.520 | So I tend to believe you do have to have theories,
01:09:27.960 | meta models of other people,
01:09:29.960 | especially with pedestrian and cyclists,
01:09:31.380 | but also with other cars.
01:09:32.800 | So everything you said is actually essential to driving.
01:09:37.800 | Driving is a lot more complicated than people realize,
01:09:41.720 | I think.
01:09:42.560 | So sort of to push back slightly, but--
01:09:44.800 | - So to cut into traffic, right?
01:09:46.480 | - Yep.
01:09:47.320 | - You can't just wait for a gap.
01:09:48.440 | You have to be somewhat aggressive.
01:09:50.120 | You'd be surprised how simple a calculation for that is.
01:09:53.800 | - I may be on that particular point, but there's--
01:09:56.240 | Maybe I actually have to push back.
01:10:00.320 | I would be surprised.
01:10:01.600 | You know what?
01:10:02.440 | I'll say where I stand.
01:10:03.260 | I would be very surprised,
01:10:04.280 | but I think you might be surprised how complicated it is.
01:10:09.280 | - I tell people, it's like progress disappoints
01:10:11.960 | in the short run and surprises in the long run.
01:10:13.920 | - It's very possible.
01:10:14.920 | Yeah.
01:10:15.760 | - I suspect in 10 years, it'll be just taken for granted.
01:10:18.960 | - Yeah, probably.
01:10:19.840 | But you're probably right.
01:10:21.520 | Now it looks like--
01:10:22.360 | - It's gonna be a $50 solution that nobody cares about.
01:10:25.040 | It's like GPS is like, wow, GPS is,
01:10:27.240 | we have satellites in space
01:10:29.440 | that tell you where your location is.
01:10:31.120 | It was a really big deal.
01:10:32.040 | Now everything has a GPS in it.
01:10:33.480 | - Yeah, that's true.
01:10:34.320 | But I do think that systems that involve human behavior
01:10:38.880 | are more complicated than we give them credit for.
01:10:40.800 | So we can do incredible things with technology
01:10:43.520 | that don't involve humans, but when you--
01:10:45.600 | - I think humans are less complicated than people
01:10:48.840 | frequently ascribe.
01:10:50.560 | - Maybe I--
01:10:51.400 | - We tend to operate out of large numbers of patterns
01:10:53.720 | and just keep doing it over and over.
01:10:55.800 | - But I can't trust you because you're a human.
01:10:58.040 | That's something a human would say.
01:11:00.760 | But my hope is on the point you've made is,
01:11:04.560 | even if, no matter who's right,
01:11:07.240 | I'm hoping that there's a lot of things
01:11:10.640 | that humans aren't good at
01:11:11.840 | that machines are definitely good at.
01:11:13.440 | Like you said, attention and things like that.
01:11:15.600 | Well, they'll be so much better
01:11:17.660 | that the overall picture of safety and autonomy
01:11:20.960 | will be obviously cars will be safer,
01:11:22.840 | even if they're not as good at it.
01:11:24.680 | - I'm a big believer in safety.
01:11:26.360 | I mean, there are already the current safety systems
01:11:29.600 | like cruise control that doesn't let you run into people
01:11:32.000 | and lane keeping.
01:11:33.320 | There are so many features that you just look at the Pareto
01:11:36.280 | of accidents and knocking off like 80% of them
01:11:39.560 | is super doable.
01:11:42.440 | - Just to linger on the autopilot team
01:11:44.640 | and the efforts there,
01:11:45.820 | it seems to be that there's a very intense scrutiny
01:11:51.680 | by the media and the public in terms of safety,
01:11:54.280 | the pressure, the bar put before autonomous vehicles.
01:11:57.960 | What are your sort of as a person there
01:12:01.720 | working on the hardware and trying to build a system
01:12:03.860 | that builds a safe vehicle and so on,
01:12:07.200 | what was your sense about that pressure?
01:12:08.940 | Is it unfair?
01:12:09.880 | Is it expected of new technology?
01:12:12.280 | - Yeah, it seems reasonable.
01:12:13.500 | I was interested, I talked to both American
01:12:15.400 | and European regulators,
01:12:17.240 | and I was worried that the regulations
01:12:21.200 | would write into the rules technology solutions
01:12:25.080 | like modern brake systems imply hydraulic brakes.
01:12:30.000 | So if you read the regulations
01:12:32.120 | to meet the letter of the law for brakes,
01:12:35.040 | it sort of has to be hydraulic, right?
01:12:37.760 | And the regulator said,
01:12:39.320 | they're interested in the use cases,
01:12:42.020 | like a head-on crash, an offset crash,
01:12:44.320 | don't hit pedestrians, don't run into people,
01:12:47.040 | don't leave the road, don't run a red light or a stoplight.
01:12:50.360 | They were very much into the scenarios.
01:12:53.120 | And they had all the data about which scenarios
01:12:56.880 | injured or killed the most people.
01:12:59.280 | And for the most part, those conversations were like,
01:13:04.000 | what's the right thing to do to take the next step?
01:13:08.760 | Now Elon's very interested also in the benefits
01:13:11.960 | of autonomous driving or freeing people's time
01:13:14.120 | and attention as well as safety.
01:13:16.480 | And I think that's also an interesting thing,
01:13:20.320 | but building autonomous systems so they're safe
01:13:25.120 | and safer than people,
01:13:27.360 | since the goal is to be 10x safer than people,
01:13:30.120 | having the bar be safer than people
01:13:32.160 | and scrutinizing accidents seems philosophically correct.
01:13:37.160 | So I think that's a good thing.
01:13:40.760 | - It's different than the things you worked at,
01:13:47.360 | the Intel, AMD, Apple, with autopilot chip design
01:13:51.560 | and hardware design.
01:13:53.400 | What are interesting or challenging aspects
01:13:55.300 | of building this specialized kind of computing system
01:13:57.880 | in the automotive space?
01:13:59.300 | - I mean, there's two tricks to building
01:14:01.600 | like an automotive computer.
01:14:02.740 | One is the software team, the machine learning team
01:14:07.280 | is developing algorithms that are changing fast.
01:14:10.640 | So as you're building the accelerator,
01:14:14.240 | you have this worry or intuition
01:14:16.880 | that the algorithms will change enough
01:14:18.480 | that the accelerator will be the wrong one.
01:14:21.760 | And there's a generic thing,
01:14:24.560 | which is if you build a really good general purpose computer
01:14:27.200 | say its performance is one,
01:14:29.800 | and then GPU guys will deliver about 5x the performance
01:14:34.260 | for the same amount of silicon,
01:14:35.680 | because instead of discovering parallelism,
01:14:37.600 | you're given parallelism.
01:14:39.200 | And then special accelerators get another two to 5x
01:14:43.680 | on top of a GPU, because you say,
01:14:46.080 | I know the math is always eight bit integers
01:14:49.000 | into 32 bit accumulators,
01:14:51.120 | and the operations are the subset
01:14:53.000 | of mathematical possibilities.
01:14:55.160 | So, AI accelerators have a claimed performance benefit
01:15:00.160 | over GPUs because in the narrow math space,
01:15:05.060 | you're nailing the algorithm.
01:15:07.080 | Now, you still try to make it programmable,
01:15:10.000 | but the AI field is changing really fast.
01:15:13.240 | So there's a little creative tension there of,
01:15:17.240 | I want the acceleration afforded by specialization
01:15:20.560 | without being over specialized
01:15:22.100 | so that the new algorithm is so much more effective
01:15:25.540 | that you'd have been better off on a GPU.
01:15:27.880 | So there's a tension there.
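A concrete version of "eight-bit integers into 32-bit accumulators," the narrow-math multiply-accumulate that such an accelerator fixes in hardware; written in NumPy here purely for illustration.

```python
# int8 x int8 products summed into an int32 accumulator, so many small
# products can be accumulated without overflowing the 8-bit range.
import numpy as np

rng = np.random.default_rng(0)
activations = rng.integers(-128, 128, size=256, dtype=np.int8)
weights     = rng.integers(-128, 128, size=256, dtype=np.int8)

# Widen to 32 bits before multiplying so the accumulation is exact:
# 256 products of at most 128*128 fit easily in an int32 accumulator.
acc = np.sum(activations.astype(np.int32) * weights.astype(np.int32),
             dtype=np.int32)
print(acc, acc.dtype)
```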
01:15:29.920 | To build a good computer for an application like automotive,
01:15:34.360 | there's all kinds of sensor inputs and safety processors
01:15:37.540 | and a bunch of stuff.
01:15:39.060 | So one of Elon's goals was to make it super affordable.
01:15:42.160 | So every car gets an autopilot computer.
01:15:44.800 | So some of the recent startups you look at,
01:15:46.440 | and they have a server in the trunk,
01:15:48.440 | because they're saying,
01:15:49.280 | I'm gonna build this autopilot computer
01:15:50.640 | that replaces the driver.
01:15:52.480 | So their cost budget's 10 or $20,000.
01:15:55.160 | And Elon's constraint was, I'm gonna put one in every car,
01:15:58.720 | whether people buy autonomous driving or not.
01:16:01.640 | So the cost constraint he had in mind was great.
01:16:05.200 | And to hit that, you had to think about the system design.
01:16:08.320 | That's complicated, it's fun.
01:16:09.840 | You know, it's like, it's craftsman's work.
01:16:12.520 | Like a violin maker, right?
01:16:14.240 | You can say Stradivarius is this incredible thing,
01:16:16.760 | the musicians are incredible.
01:16:18.460 | But the guy making the violin, you know,
01:16:20.440 | picked wood and sanded it, and then he cut it,
01:16:23.960 | you know, and he glued it, you know,
01:16:25.920 | and he waited for the right day
01:16:27.880 | so that when he put the finish on it,
01:16:29.480 | it didn't, you know, do something dumb.
01:16:31.620 | That's craftsman's work, right?
01:16:33.840 | You may be a genius craftsman
01:16:35.480 | 'cause you have the best techniques
01:16:36.800 | and you discover a new one,
01:16:38.800 | but most engineering is craftsman's work.
01:16:41.920 | And humans really like to do that.
01:16:44.280 | You know the expression-- - Smart humans.
01:16:45.920 | - No, everybody.
01:16:46.760 | - All humans. - I don't know.
01:16:47.880 | I used to, I dug ditches when I was in college.
01:16:50.340 | I got really good at it, satisfying.
01:16:52.600 | - Yeah.
01:16:53.440 | Digging ditches is also craftsman work.
01:16:55.440 | - Yeah, of course.
01:16:56.920 | So there's an expression called complex mastery behavior.
01:17:00.880 | So when you're learning something,
01:17:02.040 | that's fun 'cause you're learning something.
01:17:04.060 | When you do something and it's rote and simple,
01:17:05.720 | it's not that satisfying.
01:17:06.680 | But if the steps that you have to do are complicated
01:17:10.360 | and you're good at 'em, it's satisfying to do them.
01:17:13.480 | And then if you're intrigued by it all,
01:17:16.840 | as you're doing them, you sometimes learn new things
01:17:19.500 | that you can raise your game.
01:17:21.560 | But craftsman's work is good.
01:17:23.720 | And engineers, like engineering is complicated enough
01:17:27.040 | that you have to learn a lot of skills
01:17:28.760 | and then a lot of what you do is then craftsman's work,
01:17:32.320 | which is fun.
01:17:33.440 | Autonomous driving, building a very
01:17:35.400 | resource-constrained computer,
01:17:37.800 | so a computer has to be cheap enough
01:17:39.500 | to put in every single car,
01:17:41.080 | that essentially boils down to craftsman's work.
01:17:45.040 | It's engineering, it's--
01:17:45.880 | - Yeah, you know, there's thoughtful decisions
01:17:47.660 | and problems to solve and trade-offs to make.
01:17:50.560 | Do you need 10 camera input ports or eight?
01:17:52.480 | You know, you're building for the current car
01:17:54.480 | or the next one.
01:17:56.000 | You know, how do you do the safety stuff?
01:17:57.880 | You know, there's a whole bunch of details.
01:18:00.600 | But it's fun.
01:18:01.420 | It's not like I'm building a new type of neural network
01:18:04.740 | which has a new mathematics and a new computer to work.
01:18:08.020 | You know, that's, like there's more invention than that.
01:18:11.480 | But the reduction to practice,
01:18:14.100 | once you pick the architecture, you look inside
01:18:16.100 | and what do you see?
01:18:17.060 | Adders and multipliers and memories and, you know,
01:18:20.340 | the basics.
01:18:21.180 | So computers is always this weird set of abstraction layers
01:18:25.580 | of ideas and thinking that reduction to practice
01:18:29.300 | is transistors and wires and, you know, pretty basic stuff.
01:18:33.740 | And that's an interesting phenomenon.
01:18:37.060 | By the way, like factory work,
01:18:38.820 | like lots of people think factory work
01:18:40.580 | is rote assembly stuff.
01:18:42.260 | I've been on the assembly line.
01:18:44.140 | Like the people who work there really like it.
01:18:46.260 | It's a really great job.
01:18:47.820 | It's really complicated.
01:18:48.740 | Putting cars together is hard, right?
01:18:50.860 | And the car is moving and the parts are moving
01:18:53.420 | and sometimes the parts are damaged
01:18:54.940 | and you have to coordinate putting all the stuff together
01:18:57.520 | and people are good at it.
01:18:59.060 | They're good at it.
01:19:00.340 | And I remember one day I went to work
01:19:01.740 | and the line was shut down for some reason
01:19:03.940 | and some of the guys sitting around were really bummed
01:19:06.740 | 'cause they had reorganized a bunch of stuff
01:19:09.220 | and they were gonna hit a new record
01:19:10.700 | for the number of cars built that day
01:19:12.740 | and they were all gung-ho to do it.
01:19:14.140 | And these were big, tough buggers.
01:19:15.840 | (Lex laughs)
01:19:17.780 | But what they did was complicated and you couldn't do it.
01:19:20.180 | - Yeah, and I mean--
01:19:21.340 | - Well, after a while you could,
01:19:22.740 | but you'd have to work your way up
01:19:24.180 | 'cause, you know, like putting the bright,
01:19:27.180 | what's called the brights,
01:19:28.660 | the trim on a car on a moving assembly line
01:19:32.620 | where it has to be attached 25 places
01:19:34.620 | in a minute and a half is unbelievably complicated.
01:19:38.160 | And human beings can do it, it's really good.
01:19:42.500 | I think that's harder than driving a car, by the way.
01:19:45.280 | - Putting together, working--
01:19:47.060 | - Working in a factory.
01:19:48.580 | - Two smart people can disagree.
01:19:51.420 | - Yay.
01:19:52.260 | - I think driving a car--
01:19:54.460 | - Well, we'll get you in the factory someday
01:19:56.140 | and then we'll see how you do.
01:19:56.980 | - No, not for us humans driving a car is easy.
01:19:59.540 | I'm saying building a machine
01:20:01.740 | that drives a car is not easy.
01:20:04.540 | Okay, driving a car is easy for humans
01:20:07.460 | because we've been evolving for billions of years.
01:20:10.900 | - To drive cars, yeah, I noticed that.
01:20:13.300 | The paleolithic cars are super cool.
01:20:15.640 | - Oh, now you join the rest of the internet in mocking me.
01:20:19.860 | - Okay.
01:20:20.700 | (Lex laughs)
01:20:21.520 | I wasn't mocking, I was just, you know,
01:20:23.820 | intrigued by your anthropology.
01:20:26.860 | - Yeah, it's--
01:20:27.700 | - I'll have to go dig into that.
01:20:28.980 | - There's some inaccuracies there, yes.
01:20:31.100 | Okay, but in general,
01:20:33.580 | (Lex laughs)
01:20:35.380 | what have you learned in terms of,
01:20:39.660 | thinking about passion, craftsmanship,
01:20:44.060 | tension, chaos, you know--
01:20:47.260 | - Jesus.
01:20:48.100 | - The whole mess of it,
01:20:50.900 | what have you learned, have taken away from your time
01:20:54.260 | working with Elon Musk, working at Tesla,
01:20:57.020 | which is known to be a place of chaos,
01:21:00.860 | innovation, craftsmanship, and all those things.
01:21:03.700 | - I really like the way he thought.
01:21:05.360 | Like, you think you have an understanding
01:21:07.700 | about what first principles of something is,
01:21:10.020 | and then you talk to Elon about it,
01:21:11.660 | and you didn't scratch the surface.
01:21:13.900 | You know, he has a deep belief
01:21:17.420 | that no matter what you do, it's a local maximum.
01:21:19.860 | Right, and I had a friend,
01:21:21.740 | he invented a better electric motor,
01:21:24.260 | and it was a lot better than what we were using.
01:21:26.980 | And one day he came by, he said,
01:21:28.060 | "You know, I'm a little disappointed,
01:21:30.020 | "'cause this is really great,
01:21:31.840 | "and you didn't seem that impressed."
01:21:33.300 | And I said, "You know, when the super intelligent aliens
01:21:36.420 | "come, are they gonna be looking for you?"
01:21:38.940 | Like, where is he?
01:21:39.780 | The guy who built the motor.
01:21:41.140 | Probably not.
01:21:43.220 | But doing interesting work that's both innovative,
01:21:49.440 | and let's say craftsman's work on the current thing,
01:21:51.840 | it's really satisfying, and it's good.
01:21:54.220 | And that's cool.
01:21:55.140 | And then Elon was good at taking everything apart.
01:21:59.060 | Like, what's the deep first principle?
01:22:01.680 | Oh, no, what's really, no, what's really?
01:22:03.980 | You know, that ability to look at it without assumptions,
01:22:08.980 | and 'how' constraints is super wild.
01:22:13.140 | You know, he built a rocket ship,
01:22:15.380 | and electric car, and everything.
01:22:19.740 | And that's super fun, and he's into it, too.
01:22:21.860 | Like, when they first landed two SpaceX rockets, at Tesla
01:22:26.140 | we had a video projector in the big room,
01:22:28.000 | and like 500 people came down,
01:22:29.840 | and when they landed, everybody cheered,
01:22:31.300 | and some people cried.
01:22:32.660 | It was so cool.
01:22:33.720 | All right, but how did you do that?
01:22:36.260 | Well, it was super hard.
01:22:39.400 | And then people say, "Well, it's chaotic."
01:22:42.180 | Really?
01:22:43.020 | To get out of all your assumptions?
01:22:44.580 | You think that's not gonna be unbelievably painful?
01:22:48.140 | And is Elon tough?
01:22:50.040 | Yeah, probably.
01:22:51.400 | Do people look back on it and say,
01:22:53.200 | "Boy, I'm really happy I had that experience
01:22:57.440 | "to go take apart that many layers of assumptions?"
01:23:01.800 | Sometimes super fun, sometimes painful.
01:23:05.360 | So it could be emotionally and intellectually painful,
01:23:07.900 | that whole process of just stripping away assumptions?
01:23:10.860 | Yeah, imagine 99% of your thought process
01:23:13.320 | is protecting your self-conception.
01:23:16.540 | And 98% of that's wrong.
01:23:18.660 | Now you got the math right.
01:23:21.500 | How do you think you're feeling
01:23:23.620 | when you get back into that one bit that's useful,
01:23:26.800 | and now you're open and you have the ability
01:23:28.540 | to do something different?
01:23:30.640 | I don't know if I got the math right.
01:23:33.660 | It might be 99.9, but it ain't 50.
01:23:37.420 | Imagining that 50% is hard enough.
01:23:44.200 | Now for a long time I've suspected you could get better.
01:23:47.040 | Like you can think better, you can think more clearly,
01:23:50.720 | you can take things apart.
01:23:52.040 | And there's lots of examples of that, people who do that.
01:23:56.400 | And Elon is an example of that.
01:24:01.000 | Apparently. You are an example.
01:24:02.160 | So-- I don't know if I am.
01:24:04.480 | I'm fun to talk to.
01:24:05.520 | Certainly. I've learned a lot of stuff.
01:24:08.600 | Right. Well, here's the other thing
01:24:09.880 | is like, I joke, like I read books,
01:24:13.000 | and people think, oh, you read books.
01:24:14.600 | Well, no, I've read a couple of books a week for 55 years.
01:24:19.600 | Well, maybe 50, 'cause I didn't learn to read
01:24:22.600 | until I was eight or something.
01:24:24.680 | And it turns out when people write books,
01:24:28.480 | they often take 20 years of their life
01:24:31.240 | where they passionately did something,
01:24:33.280 | reduced it to 200 pages.
01:24:36.080 | That's kind of fun.
01:24:37.480 | And then you go online and you can find out
01:24:39.800 | who wrote the best books and who like, you know,
01:24:42.420 | that's kind of wild.
01:24:43.360 | So there's this wild selection process,
01:24:45.200 | and then you can read it,
01:24:46.040 | and for the most part, understand it.
01:24:48.600 | And then you can go apply it.
01:24:51.920 | Like I went to one company, I thought,
01:24:53.400 | I haven't managed much before.
01:24:55.080 | So I read 20 management books,
01:24:57.280 | and I started talking to them,
01:24:58.720 | and basically compared to all the VPs running around,
01:25:01.400 | I'd read 19 more management books than anybody else.
01:25:05.400 | (laughing)
01:25:07.080 | Wasn't even that hard.
01:25:08.600 | And half the stuff worked, like first time.
01:25:11.160 | It wasn't even rocket science.
01:25:12.660 | - But at the core of that is questioning the assumptions,
01:25:16.960 | or sort of entering, thinking first principles thinking,
01:25:21.760 | sort of looking at the reality of the situation,
01:25:24.880 | and using that knowledge, applying that knowledge.
01:25:28.200 | - Yeah, so I would say my brain has this idea
01:25:31.360 | that you can question first assumptions.
01:25:34.260 | But I can go days at a time and forget that,
01:25:38.280 | and you have to kind of circle back to that observation.
01:25:41.440 | - Because it is emotionally challenging.
01:25:45.120 | - Well, it's hard to just keep it front and center,
01:25:47.280 | 'cause you operate on so many levels all the time,
01:25:50.380 | and getting this done takes priority,
01:25:53.440 | or being happy takes priority,
01:25:56.480 | or screwing around takes priority.
01:25:59.360 | Like how you go through life is complicated.
01:26:03.040 | And then you remember, oh yeah,
01:26:04.360 | I could really think first principles.
01:26:06.480 | Oh shit, that's tiring.
01:26:08.260 | But you do for a while, and that's kind of cool.
01:26:12.760 | - So just as a last question in your sense,
01:26:16.200 | from the big picture, from the first principles,
01:26:19.480 | do you think, you kind of answered it already,
01:26:21.520 | but do you think autonomous driving
01:26:24.320 | is something we can solve on a timeline of years?
01:26:28.720 | So one, two, three, five, 10 years,
01:26:32.240 | as opposed to a century?
01:26:33.880 | - Yeah, definitely.
01:26:35.400 | - Just to linger on it a little longer,
01:26:37.400 | where's the confidence coming from?
01:26:40.080 | Is it the fundamentals of the problem,
01:26:42.600 | the fundamentals of building the hardware and the software?
01:26:46.360 | - As a computational problem,
01:26:48.760 | understanding ballistics, roles, topography,
01:26:53.280 | it seems pretty solvable.
01:26:56.520 | I mean, and you can see this,
01:26:57.960 | like speech recognition for a long time,
01:27:00.240 | people are doing frequency and domain analysis,
01:27:02.720 | and all kinds of stuff,
01:27:04.360 | and that didn't work at all, right?
01:27:07.280 | And then they did deep learning about it,
01:27:09.360 | and it worked great.
01:27:10.400 | And it took multiple iterations.
01:27:14.320 | And, you know, autonomous driving
01:27:17.400 | is way past the frequency analysis point.
01:27:19.840 | You know, use radar, don't run into things.
01:27:23.880 | And the data gathering is going up,
01:27:25.440 | and the computation is going up,
01:27:26.840 | and the algorithm understanding is going up,
01:27:28.600 | and there's a whole bunch of problems
01:27:30.000 | getting solved like that.
01:27:31.960 | - The data side is really powerful,
01:27:33.480 | but I disagree with both you and Elon.
01:27:35.720 | I'll tell Elon once again, as I did before,
01:27:38.560 | that when you add human beings into the picture,
01:27:42.360 | it's no longer a ballistics problem.
01:27:45.680 | It's something more complicated,
01:27:47.480 | but I could be very well proven wrong.
01:27:50.360 | - Cars are highly damped in terms of rate of change.
01:27:53.040 | Like the steering system's really slow
01:27:56.640 | compared to a computer.
01:27:57.640 | The acceleration, the acceleration's really slow.
01:28:01.000 | - Yeah, on a certain time scale.
01:28:02.840 | On a ballistics time scale, but human behavior,
01:28:05.000 | I don't know.
01:28:05.840 | I shouldn't say--
01:28:08.200 | - Human beings are really slow too.
01:28:09.800 | Weirdly, we operate, you know,
01:28:11.320 | half a second behind reality.
01:28:13.960 | Nobody really understands that one either.
01:28:15.320 | It's pretty funny.
01:28:16.440 | - Yeah, yeah.
01:28:18.160 | We very well could be surprised.
01:28:23.600 | And I think with the rate of improvement
01:28:25.160 | in all aspects on both the compute
01:28:26.880 | and the software and the hardware,
01:28:29.680 | there's gonna be pleasant surprises all over the place.
01:28:32.720 | - Mm-hmm.
01:28:33.560 | - Speaking of unpleasant surprises,
01:28:36.720 | many people have worries about a singularity
01:28:39.520 | in the development of AI.
01:28:41.680 | Forgive me for such questions.
01:28:43.160 | - Yeah.
01:28:44.440 | - When AI improves exponentially
01:28:46.040 | and reaches a point of superhuman level
01:28:48.360 | general intelligence,
01:28:49.800 | you know, beyond the point, there's no looking back.
01:28:53.320 | Do you share this worry of existential threats
01:28:56.120 | from artificial intelligence,
01:28:57.360 | from computers becoming superhuman level intelligent?
01:29:01.920 | - No, not really.
01:29:03.400 | You know, like we already have a very stratified society.
01:29:07.520 | And then if you look at the whole animal kingdom
01:29:09.400 | of capabilities and abilities and interests,
01:29:12.560 | and, you know, smart people have their niche,
01:29:15.280 | and, you know, normal people have their niche,
01:29:17.760 | and craftsmen have their niche,
01:29:19.640 | and, you know, animals have their niche.
01:29:22.560 | I suspect that the domains of interest
01:29:26.040 | for things that are, you know, astronomically different,
01:29:29.480 | like the whole something got 10 times smarter than us
01:29:32.320 | and wanted to track us all down because what?
01:29:34.720 | We like to have coffee at Starbucks?
01:29:36.960 | Like, it doesn't seem plausible.
01:29:38.920 | No, is there an existential problem
01:29:40.720 | that how do you live in a world
01:29:42.560 | where there's something way smarter than you,
01:29:44.120 | and you base your kind of self-esteem
01:29:46.440 | on being the smartest local person?
01:29:48.920 | Well, there's what, 0.1% of the population who thinks that?
01:29:52.560 | 'Cause the rest of the population's been dealing with it
01:29:54.880 | since they were born.
01:29:56.760 | So the breadth of possible experience
01:30:00.960 | that can be interesting is really big.
01:30:03.680 | And, you know, superintelligence seems likely,
01:30:09.840 | although we still don't know if we're magical,
01:30:14.200 | but I suspect we're not,
01:30:16.320 | and it seems likely that it'll create possibilities
01:30:18.800 | that are interesting for us,
01:30:20.920 | and its interests will be interesting for whatever it is.
01:30:26.840 | It's not obvious why it
01:30:28.920 | would somehow wanna fight over some square foot of dirt
01:30:32.400 | or, you know, whatever the usual fears are about.
01:30:37.400 | - So you don't think it'll inherit
01:30:39.000 | some of the darker aspects of human nature?
01:30:41.320 | - Depends on how you think reality's constructed.
01:30:45.240 | So for whatever reason, human beings are in, let's say,
01:30:50.240 | creative tension and opposition
01:30:52.320 | with both our good and bad forces.
01:30:55.400 | Like, there's lots of philosophical understanding of that.
01:30:58.300 | Right?
01:31:00.480 | I don't know why that would be different.
01:31:03.200 | - So you think the evil is necessary for the good?
01:31:06.720 | I mean, the tension.
01:31:08.200 | - I don't know about evil,
01:31:09.120 | but like we live in a competitive world
01:31:11.640 | where your good is somebody else's, you know, evil.
01:31:16.640 | You know, there's the malignant part of it,
01:31:19.320 | but that seems to be self-limiting,
01:31:22.760 | although occasionally it's super horrible.
01:31:26.300 | - But yes, there's a debate over ideas
01:31:30.000 | and some people have different beliefs
01:31:32.360 | and that debate itself is a process
01:31:34.600 | of arriving at something--
01:31:37.560 | - Yeah, and why wouldn't that continue?
01:31:39.360 | - Yeah.
01:31:40.200 | But you don't think that whole process
01:31:43.160 | will leave humans behind in a way that's painful?
01:31:46.140 | Emotionally painful, yes, for the 0.1%.
01:31:50.440 | There'll be--
01:31:51.280 | - Isn't it already painful for a large percentage
01:31:53.240 | of the population?
01:31:54.080 | And it is.
01:31:54.900 | I mean, society does have a lot of stress in it,
01:31:57.880 | about the 1% and about this and about that,
01:32:00.680 | but you know, everybody has a lot of stress in their life
01:32:03.760 | about what they find satisfying
01:32:05.360 | and, you know, "know yourself" seems to be the proper dictum
01:32:09.760 | and pursue something that makes your life meaningful
01:32:14.240 | seems proper.
01:32:15.220 | And there's so many avenues on that.
01:32:18.720 | Like, there's so much unexplored space
01:32:21.120 | at every single level.
01:32:22.560 | You know, I'm somewhat of--
01:32:27.320 | my nephew called me a jaded optimist.
01:32:29.600 | (laughing)
01:32:31.840 | - There's a beautiful tension in that label.
01:32:37.160 | But if you were to look back at your life
01:32:40.960 | and could relive a moment, a set of moments,
01:32:45.800 | because they were the happiest times of your life
01:32:49.240 | outside of family, what would that be?
01:32:52.580 | - I don't wanna relive any moments.
01:32:56.680 | I like that.
01:32:58.040 | I like that situation where you have some amount of optimism
01:33:01.360 | and then the anxiety of the unknown.
01:33:04.840 | - So you love the unknown, the mystery of it.
01:33:10.120 | - I don't know about the mystery.
01:33:11.240 | It sure gets your blood pumping.
01:33:12.940 | - What do you think is the meaning of this whole thing?
01:33:17.100 | Of life on this pale blue dot?
01:33:20.620 | - It seems to be what it does.
01:33:23.900 | Like the universe, for whatever reason,
01:33:29.280 | makes atoms, which make us, and we do stuff.
01:33:32.820 | And we figure out things and we explore things.
01:33:37.120 | - That's just what it is?
01:33:39.840 | - It's not "just."
01:33:41.600 | - Yeah, it is.
01:33:43.520 | Jim, I don't think there's a better place to end it.
01:33:46.920 | It's a huge honor.
01:33:48.180 | - Well, that was super fun.
01:33:51.200 | - Thank you so much for talking today.
01:33:52.540 | - All right, great.
01:33:54.080 | - Thanks for listening to this conversation
01:33:56.200 | and thank you to our presenting sponsor, Cash App.
01:33:59.360 | Download it, use code LexPodcast, you'll get $10
01:34:03.080 | and $10 will go to FIRST, a STEM education nonprofit
01:34:06.440 | that inspires hundreds of thousands of young minds
01:34:09.280 | to become future leaders and innovators.
01:34:12.200 | If you enjoy this podcast, subscribe on YouTube,
01:34:15.000 | give it five stars on Apple Podcast, follow on Spotify,
01:34:18.280 | support on Patreon or simply connect with me on Twitter.
01:34:22.320 | And now let me leave you with some words of wisdom
01:34:24.800 | from Gordon Moore.
01:34:26.880 | If everything you try works, you aren't trying hard enough.
01:34:30.920 | Thank you for listening and hope to see you next time.
01:34:34.720 | (upbeat music)