Jim Keller: Moore's Law, Microprocessors, and First Principles | Lex Fridman Podcast #70
Chapters
0:00 Introduction
2:12 Difference between a computer and a human brain
3:43 Computer abstraction layers and parallelism
17:53 If you run a program multiple times, do you always get the same answer?
20:43 Building computers and teams of people
22:41 Start from scratch every 5 years
30:05 Moore's law is not dead
55:47 Is superintelligence the next layer of abstraction?
60:02 Is the universe a computer?
63:00 Ray Kurzweil and exponential improvement in technology
64:33 Elon Musk and Tesla Autopilot
80:51 Lessons from working with Elon Musk
88:33 Existential threats from AI
92:38 Happiness and the meaning of life
The following is a conversation with Jim Keller, 00:00:05.560 |
who has worked at AMD, Apple, Tesla, and now Intel. 00:00:13.520 |
and Zen microarchitectures, Apple A4 and A5 processors, 00:00:30.040 |
and just an interesting and fun human being to talk to. 00:00:52.600 |
I'll do one or two minutes after introducing the episode 00:01:08.640 |
I personally use Cash App to send money to friends, 00:01:18.480 |
You can buy fractions of a stock, say $1 worth, 00:01:23.540 |
Brokerage services are provided by Cash App Investing, 00:01:32.040 |
to support one of my favorite organizations called FIRST, 00:01:35.440 |
best known for their FIRST Robotics and Lego competitions. 00:01:38.980 |
They educate and inspire hundreds of thousands of students 00:01:44.100 |
and have a perfect rating at Charity Navigator, 00:01:50.760 |
When you get Cash App from the App Store or Google Play 00:01:56.280 |
you'll get $10 and Cash App will also donate $10 to FIRST, 00:02:02.140 |
that I've personally seen inspire girls and boys 00:02:08.060 |
And now here's my conversation with Jim Keller. 00:02:19.260 |
Let's start with the philosophical question perhaps. 00:02:22.280 |
- Well, since people don't actually understand 00:02:32.600 |
Computers are, you know, there's really two things. 00:02:37.280 |
There's memory and there's computation, right? 00:02:40.480 |
And to date, almost all computer architectures 00:02:49.400 |
and you do relatively simple operations on it 00:02:59.840 |
everything's a mesh, a mess that's combined together. 00:03:09.120 |
And information is stored in some distributed fashion. 00:03:13.720 |
And people build things called neural networks in computers 00:03:25.520 |
I don't know that the understanding of that is super deep. 00:03:37.880 |
So to date, it's hard to compare them, I would say. 00:03:42.880 |
- So let's get into the basics before we zoom back out. 00:03:56.640 |
Maybe even as far back as what is a transistor? 00:03:59.460 |
- So the special charm of computer engineering 00:04:23.680 |
And then functional units, like an adder, a subtractor, 00:04:28.760 |
And then we assemble those into processing elements. 00:04:32.300 |
Modern computers are built out of probably 10 to 20 00:04:47.920 |
And then software, there's an instruction set. 00:04:50.840 |
You run, and then there's assembly language, C, 00:04:58.680 |
essentially from the atom to the data center, right? 00:05:13.800 |
And then in an organization of a thousand people 00:05:27.080 |
- So there's a bunch of levels of abstraction. 00:05:29.380 |
In an organization like Intel, and in your own vision, 00:05:39.680 |
Some of it is science, some of it is engineering, 00:05:43.320 |
What's the most, if you could pick favorites, 00:05:46.340 |
what's the most important, your favorite layer 00:05:57.080 |
That's the fun, you know, I'm somewhat agnostic to that. 00:06:00.720 |
So I would say, for relatively long periods of time, 00:06:08.020 |
So the x86 instruction set, the ARM instruction set. 00:06:13.320 |
- So it says, how do you encode the basic operations? 00:06:16.080 |
Load, store, multiply, add, subtract, conditional branch. 00:06:19.620 |
There aren't that many interesting instructions. 00:06:26.440 |
90% of the execution is on 25 opcodes, 25 instructions. 00:06:35.480 |
- Intel architecture has been around for 25 years. 00:06:39.800 |
And that's because the basics are defined a long time ago. 00:06:48.720 |
is you fetched instructions and you executed them in order. 00:06:58.880 |
is you fetch large numbers of instructions, say 500. 00:07:15.260 |
So a modern computer, like people like to say, 00:07:22.340 |
computers should be simple and clean, but it turns out the market for simple, clean, slow computers is zero, right? 00:07:29.500 |
Now you can, there's how you build it can be clean, 00:07:45.540 |
and then executes it in a way that gets the right answers. 00:07:53.460 |
And then there's semantics around how memory ordering works 00:07:58.340 |
So the computer sort of has a bunch of bookkeeping tables. 00:08:01.900 |
It says, what order should these operations finish in 00:08:07.740 |
But to go fast, you have to fetch a lot of instructions 00:08:26.020 |
the dependency graph and you issue instructions out of order. 00:08:29.340 |
That's because you have one serial narrative to execute, 00:08:42.980 |
There's a sentence after sentence after sentence 00:08:49.380 |
Imagine you diagrammed it properly and you said, 00:08:59.060 |
- That's a fascinating question to ask of a book, yeah. 00:09:08.420 |
You could say, he is tall and smart and X, right? 00:09:13.420 |
And it doesn't matter the order of tall and smart. 00:09:18.220 |
But if you say the tall man is wearing a red shirt, 00:09:22.940 |
what colors, you know, like you can create dependencies. 00:09:36.860 |
And the first order, the screen you're looking at 00:09:44.460 |
Simple narratives around the large numbers of things 00:09:52.300 |
- So found parallelism where the narrative is sequential 00:09:57.300 |
but you discover like little pockets of parallelism versus-- 00:10:11.140 |
here's how you fetch 10 instructions at a time. 00:10:13.460 |
Here's how you calculated the dependencies between them. 00:10:51.500 |
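As a sketch of that dependency bookkeeping (a toy model, not any real microarchitecture): each instruction names the register it writes and the registers it reads, and anything whose inputs are already known can issue in the same cycle.

```python
# Toy out-of-order issue over a window of fetched instructions.
window = [
    ("r1", ["r0"]),         # r1 = f(r0)
    ("r2", ["r0"]),         # r2 = g(r0)      -- independent of r1
    ("r3", ["r1", "r2"]),   # r3 = h(r1, r2)  -- must wait for both
    ("r4", ["r0"]),         # r4 = k(r0)      -- independent again
]

ready = {"r0"}              # values known at the start
pending = list(window)
cycle = 0
while pending:
    # Everything whose sources are ready issues this cycle, in parallel.
    issued = [ins for ins in pending if all(src in ready for src in ins[1])]
    print(f"cycle {cycle}: issue {[dst for dst, _ in issued]}")
    ready.update(dst for dst, _ in issued)
    pending = [ins for ins in pending if ins not in issued]
    cycle += 1
# cycle 0: issue ['r1', 'r2', 'r4']
# cycle 1: issue ['r3']
```

Four serial-looking instructions complete in two cycles; that is the "found parallelism."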
- You would get what's called cycles per instruction 00:10:53.900 |
and it would be about, you know, three instructions, 00:10:59.980 |
because of the latency of the operations and stuff. 00:11:04.460 |
execute it, but like 0.2, 0.25 cycles per instruction. 00:11:12.960 |
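Using the round numbers in this exchange, the jump from about 3 cycles per instruction to about 0.25 is roughly a 12x gain in instructions completed per cycle:

```python
old_cpi, new_cpi = 3.0, 0.25
print(1 / old_cpi)        # ~0.33 IPC: in-order, one instruction at a time
print(1 / new_cpi)        # 4 IPC: wide, out-of-order execution
print(old_cpi / new_cpi)  # 12.0 -- 12x more instructions per cycle
```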
One is the found parallelism in the narrative, right? 00:11:17.300 |
And the other is the predictability of the narrative, right? 00:11:21.320 |
So certain operations, they do a bunch of calculations 00:11:25.480 |
and if greater than one, do this, else do that. 00:11:29.740 |
That decision is predicted in modern computers 00:11:45.460 |
figure out the graph and execute them all in parallel. 00:11:51.580 |
if you fetch 600 instructions and it's every six, 00:11:56.060 |
you have to predict 99 out of 100 branches correctly 00:12:11.460 |
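A quick back-of-envelope check of that claim, assuming for simplicity that branch predictions are independent: a 600-instruction window with a branch every six instructions holds about 100 branches, so the chance the whole window is on the correct path is the per-branch accuracy raised to the 100th power.

```python
for accuracy in (0.90, 0.99, 0.999):
    print(f"{accuracy}: {accuracy ** 100:.3f}")
# 0.9:   0.000  -- hopeless, the window is almost always wrong
# 0.99:  0.366  -- marginal
# 0.999: 0.905  -- workable
```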
- So imagine you do a computation over and over, 00:12:19.420 |
And you go through that loop a million times. 00:12:22.640 |
you say, it's probably still greater than one. 00:12:25.740 |
- And you're saying you could do that accurately. 00:12:48.100 |
So then somebody said, hey, let's keep a couple of bits 00:13:00.760 |
And then you can use the top bit as the sign bit. 00:13:05.020 |
So if it's greater than one, you predict taken 00:13:07.460 |
and less than one, you predict not taken, right? 00:13:52.220 |
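That is the classic two-bit saturating counter predictor. A minimal sketch of the scheme as just described, for illustration only:

```python
class TwoBitPredictor:
    def __init__(self):
        self.counter = 2            # 0..3, start weakly "taken"

    def predict(self):
        return self.counter >= 2    # top bit set -> predict taken

    def update(self, taken):
        # Saturating: the counter never moves below 0 or above 3.
        self.counter = min(3, self.counter + 1) if taken else max(0, self.counter - 1)

p = TwoBitPredictor()
hits = 0
for taken in [True] * 999 + [False]:   # a loop branch: taken 999x, then exit
    hits += (p.predict() == taken)
    p.update(taken)
print(hits / 1000)                     # 0.999 -- wrong only on the loop exit
```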
and then you do basically deep pattern recognition 00:14:03.660 |
And you have something that chooses what the best result is. 00:14:06.660 |
There's a little supercomputer inside the computer. 00:14:14.260 |
So the effective window that it's worth finding graphs in 00:14:14.260 |
to get from a window of say 50 instructions to 500, 00:14:52.420 |
- Now, if you get the prediction of a branch wrong, 00:14:57.380 |
- You flush the pipe, so it's just the performance cost. 00:15:01.420 |
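The cost of those flushes is easy to model. A hypothetical effective-CPI calculation (the 15-cycle flush penalty here is an assumption for illustration, not a quoted figure):

```python
base_cpi    = 0.25    # ideal CPI with perfect prediction
branch_freq = 1 / 6   # a branch every ~6 instructions
flush_cost  = 15      # cycles lost per mispredict (illustrative)

for miss_rate in (0.10, 0.01, 0.001):
    cpi = base_cpi + branch_freq * miss_rate * flush_cost
    print(f"{miss_rate}: effective CPI {cpi:.4f}")
# 0.1   -> 0.5000  (half the performance gone to flushes)
# 0.01  -> 0.2750
# 0.001 -> 0.2525
```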
So we're starting to look at stuff that says, 00:15:14.660 |
So you took the wrong path, you executed a bunch of stuff. 00:15:20.580 |
Then you hit the misprediction, you backed it up, 00:15:22.420 |
but you remembered all the results you already calculated. 00:15:27.660 |
Like if you read a book and you misunderstand a paragraph, 00:15:32.500 |
sometimes it's invariant to their understanding. 00:15:37.580 |
- And you can kind of anticipate that invariance. 00:15:47.380 |
And so when you come back to a piece of code, 00:15:49.220 |
should you calculate it again or do the same thing? 00:16:05.140 |
And you have a bunch of knowledge about which way to go, 00:16:17.820 |
So imagine you're doing something complicated 00:16:27.720 |
And the ways you pick interact in a complicated way. 00:16:35.660 |
- Right, so that's-- - Or that's art or science, 00:16:57.940 |
It seems like there's combinations of things. 00:17:02.180 |
but they're really good at evaluating the alternatives. 00:17:05.580 |
Right, and everybody has a different way to do it. 00:17:14.300 |
So when you see computers are designed by teams of people 00:17:19.300 |
and a good team has lots of different kinds of people. 00:17:24.300 |
I suspect you would describe some of them as artistic. 00:18:14.460 |
- But that's a language definitional statement. 00:18:19.780 |
when we first did 3D acceleration of graphics, 00:18:29.740 |
- Right, and then some people thought that was okay, 00:18:34.540 |
And then when the HPC world used GPUs for calculations, 00:18:48.060 |
where the precision of the data is low enough 00:18:53.620 |
And the observation is the input data is unbelievably noisy. 00:19:02.140 |
that say can get faster answers by being noisy. 00:19:09.540 |
it starts out really wide and then it gets narrower. 00:19:12.140 |
And you can say, is that last little bit that important? 00:19:17.700 |
before we whittle it all the way down to the answer? 00:19:20.780 |
Right, so you can create algorithms that are noisy. 00:19:25.460 |
and every time you run it, you get a different answer, 00:19:33.940 |
every time you run the program, you get the same answer. 00:19:38.340 |
that's the formal definition of a programming language. 00:19:44.500 |
that don't get the same answer, but people who use those. 00:19:47.400 |
You always want something 'cause you get a bad answer 00:19:53.260 |
of something in the algorithm or because of this? 00:19:56.780 |
that says no matter what, do it deterministically. 00:20:00.260 |
And it's really weird 'cause almost everything 00:20:09.620 |
- I design computers for people who run programs. 00:20:12.500 |
So if somebody says, I want a deterministic answer, 00:20:24.380 |
What people don't realize is you get a deterministic answer 00:20:27.260 |
even though the execution flow is very nondeterministic. 00:20:36.100 |
And the answer, it arrives at the same answer. 00:20:42.020 |
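One concrete reason this takes deliberate work: floating-point addition is not associative, so if hardware sums the same values in a different order from run to run, the answer drifts unless the machine pins down the reduction order. A minimal illustration:

```python
import random

values = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

a = sum(values)                 # one summation order
shuffled = values[:]
random.shuffle(shuffled)
b = sum(shuffled)               # same values, different order

print(a == b)                   # usually False
print(abs(a - b))               # tiny but nonzero rounding difference
```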
Okay, you've achieved in the eyes of many people, 00:20:56.420 |
Perhaps because it was challenging, because of its impact, 00:21:10.100 |
And I have two small children, and I promise you, 00:21:18.340 |
- I'm really interested in building computers. 00:21:22.420 |
And I've worked with really, really smart people. 00:21:32.100 |
both as a thing to do, and as an endeavor that people do. 00:21:40.020 |
- Yeah, like how people think and build a computer. 00:21:43.000 |
And I find sometimes that the best computer architects 00:21:53.260 |
- So the whole stack of human beings is fascinating. 00:22:05.180 |
logic gates, functional units, computational elements, 00:22:12.620 |
And then you could think of organizational design 00:22:20.680 |
just like the computational elements are all different. 00:22:41.620 |
So what have you learned about the human abstractions, 00:22:51.900 |
What does it take to create something special? 00:22:55.020 |
- Well, most people don't think simple enough. 00:23:04.140 |
There's probably a philosophical description of this. 00:23:09.180 |
So imagine you're gonna make a loaf of bread. 00:23:11.500 |
The recipe says, get some flour, add some water, 00:23:20.280 |
Understanding bread, you can understand biology, 00:23:34.380 |
There's so many levels of understanding there. 00:23:37.220 |
And then when people build and design things, 00:23:40.220 |
they frequently are executing some stack of recipes. 00:23:53.700 |
But if you have a deep understanding of cooking, 00:24:03.100 |
there's a different way of viewing everything. 00:24:07.740 |
And most people, when you get to be an expert at something, 00:24:12.260 |
you're hoping to achieve deeper understanding, 00:24:16.420 |
not just a large set of recipes to go execute. 00:24:19.960 |
And it's interesting to walk groups of people 00:24:22.820 |
because executing recipes is unbelievably efficient 00:24:29.180 |
If it's not what you want to do, you're really stuck. 00:24:40.940 |
And some people are really good at recognizing 00:24:43.740 |
when the problem is to understand something deeply. 00:24:55.540 |
- Well, this goes back to the art versus science question. 00:25:01.220 |
for deeper understanding, you never get anything done. 00:25:04.220 |
And if you don't unpack understanding when you need to, 00:25:08.460 |
And then at every juncture, like human beings 00:25:12.020 |
are these really weird things because everything you tell 00:25:17.100 |
And then they all interact in a hilarious way. 00:25:20.640 |
And then having some intuition about what do you tell them, 00:25:24.260 |
what do you do, when do you intervene, when do you not, 00:25:29.780 |
- It's essentially computationally unsolvable. 00:25:37.980 |
But with deep understanding, do you mean also 00:25:53.700 |
Like the why question is why are we even building this? 00:26:04.300 |
sort of really getting into the core of the science? 00:26:14.620 |
And then when somebody says, I want to make it 10% faster, 00:26:22.980 |
Or I have this thing that's three instructions wide, 00:26:45.360 |
And then somebody else will look at it and say, 00:26:46.900 |
well, actually the way you divided the problem up 00:26:49.380 |
and the way that different features are interacting 00:26:51.940 |
is limiting you and it has to be rethought, rewritten. 00:27:09.700 |
maybe more generally to just throw the whole thing out? 00:27:25.180 |
- If you want to really make a lot of progress 00:27:38.740 |
- I wrote the, I was the co-author of that spec in '98. 00:27:45.820 |
- The instruction set itself has been extended 00:27:57.500 |
Intel's designed a few, AMD's designed a few, 00:28:02.460 |
And I don't want to go into too much of the detail 00:28:15.100 |
- So you're saying you're an outlier in that sense? 00:28:25.180 |
- To everybody involved, because like you said, 00:28:33.820 |
well, no, individual engineers want to succeed, 00:28:52.380 |
And to get to the next level, you have to do a new one, 00:28:57.660 |
than the old optimization point, but it'll get higher. 00:29:10.460 |
- Right, like, you know, people with a quarter-by-quarter 00:29:13.820 |
business objective are terrified about changing everything. 00:29:21.080 |
or build a computer for a long-term objective 00:29:35.200 |
every time they saw that they had to redo something, 00:29:43.080 |
Like, you optimize the old one while you build a new one. 00:29:53.920 |
well, the new computer will be faster on the average. 00:29:56.740 |
But there's a distribution of results and performance, 00:29:59.480 |
and you'll have some outliers that are slower. 00:30:01.920 |
And that's very hard, 'cause they have one customer 00:30:05.320 |
- So speaking of the long-term, for over 50 years now, 00:30:09.000 |
Moore's Law has served, for me and millions of others, 00:30:12.960 |
as an inspiring beacon of what kind of amazing future 00:30:18.160 |
I'm just making your kids laugh all of today. 00:30:23.520 |
- So first, in your eyes, what is Moore's Law, 00:30:27.560 |
if you could define for people who don't know? 00:30:29.860 |
- Well, the simple statement was, from Gordon Moore, 00:30:34.300 |
was double the number of transistors every two years. 00:30:43.280 |
we increase the performance of computers by 2x 00:30:48.560 |
And it's wiggled around substantially over time. 00:30:51.480 |
And also, in how we deliver performance has changed. 00:30:59.000 |
- The foundational idea was 2x the transistors 00:31:29.160 |
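That compounding is worth seeing as arithmetic: doubling every two years means growth by a factor of 2^(years/2).

```python
# Moore's law compounding: transistor count doubles every two years.
for years in (10, 20, 50):
    print(f"{years} years -> {2 ** (years / 2):,.0f}x the transistors")
# 10 years -> 32x
# 20 years -> 1,024x
# 50 years -> 33,554,432x
```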
What's the broader, what do you think should be 00:31:33.920 |
When you mentioned how you think of performance, 00:31:37.920 |
just broadly, what's a good way to think about Moore's Law? 00:31:41.480 |
- Well, first of all, I've been aware of Moore's Law 00:31:49.080 |
- Well, I've been designing computers for 40. 00:31:52.920 |
- You're just watching it before your eyes, kind of thing. 00:31:55.440 |
- Well, and somewhere where I became aware of it, 00:31:58.160 |
I was also informed that Moore's Law was gonna die 00:32:07.480 |
And then at one point, it was gonna die in five years, 00:32:22.840 |
And I thought, that's sad, 'cause it's the Moore's Law 00:32:24.600 |
company, and it's not dead, and it's always been gonna die. 00:32:29.200 |
And humans, like these apocalyptic kind of statements, 00:32:33.360 |
like, we'll run out of food, or we'll run out of air, 00:32:39.960 |
- Right, but it's still incredible that it's lived 00:32:55.400 |
But why do you think, if you can try to understand it, 00:33:09.160 |
But actually, under the sheets, there's literally 00:33:10.760 |
thousands of innovations, and almost all those innovations 00:33:38.480 |
you will probably tell people, "Well, this is done." 00:33:55.760 |
a thousand by a thousand by a thousand atoms, right? 00:33:59.920 |
And you get quantum effects down around two to 10 atoms. 00:34:14.480 |
are working away at how to use quantum effects. 00:34:28.840 |
if you look at the fin, it's like 120 atoms wide, 00:34:28.840 |
could count the atoms in every single direction. 00:34:42.000 |
Like there's techniques now to already put down atoms 00:34:55.820 |
It's just, you know, from a manufacturing process, 00:35:01.300 |
and you need to put 10 to the 23rd atoms together 00:35:05.620 |
to make a computer, it would take a long time. 00:35:27.580 |
there's material science, there's metallurgy. 00:35:32.660 |
different materials together, how do they interact? 00:35:45.020 |
- But just for the shrinking, you don't think 00:35:46.980 |
we're quite yet close to the fundamental limits of physics. 00:35:52.540 |
for a roadmap to a path of 100x, and after two weeks, 00:36:05.120 |
Well, here's the thing about Moore's Law, right? 00:36:16.360 |
Now, as a computer designer, you have two stances. 00:36:20.940 |
You think it's going to shrink, in which case 00:36:23.040 |
you're designing and thinking about architecture 00:36:29.020 |
Or conversely, not be swamped by the complexity 00:36:39.300 |
- So you're open to the possibility and waiting 00:36:53.580 |
about design, how you think about architecture 00:36:57.200 |
Like imagine you build buildings out of bricks 00:37:05.860 |
Well, if you kept building bricks the same way, 00:37:24.540 |
of the smaller bricks, more strength, thinner walls, 00:37:27.500 |
you know, less material, efficiency out of that. 00:37:30.320 |
So once you have a roadmap with what's going to happen, 00:37:33.220 |
transistors, we're going to get more of them, 00:37:36.500 |
then you design all this collateral around it 00:37:38.740 |
to take advantage of it and also to cope with it. 00:37:42.420 |
Like that's the thing people don't understand, 00:37:50.500 |
- So what's the hardest part of this influx 00:38:03.700 |
what fundamentally changes when you add more transistors 00:38:17.300 |
that we do get smarter because of nutrition, whatever. 00:38:24.560 |
Nobody understands it, nobody knows if it's still going on. 00:38:40.900 |
- Right, so human beings, we're really good in teams of 10, 00:38:45.060 |
up to teams of 100, they can know each other. 00:38:48.140 |
Beyond that, you have to have organizational boundaries. 00:38:59.700 |
The power of abstraction layers is really high. 00:39:03.180 |
We used to build computers out of transistors. 00:39:06.100 |
Now we have a team that turns transistors into logic cells 00:39:08.860 |
and another team that turns them into functional units 00:39:10.660 |
and another one that turns them into computers, right? 00:39:16.040 |
And you have to think about when do you shift gears on that? 00:39:21.040 |
We also use faster computers to build faster computers. 00:39:24.280 |
So some algorithms run twice as fast on new computers, 00:39:30.420 |
So, you know, a computer with twice as many transistors 00:39:51.560 |
is shrinking the thing we've just been talking about, 00:40:03.900 |
Like in the direction of sort of enforcing given parallelism 00:40:15.020 |
you know, stacking CPUs on top of each other, 00:40:17.660 |
that kind of parallelism or any kind of parallelism? 00:40:30.580 |
And then we made faster computers with vector units 00:40:33.460 |
and you can do proper equations and matrices, right? 00:40:43.380 |
where you convolve one large data set against another. 00:40:47.060 |
And so there's sort of this hierarchy of mathematics, 00:40:51.100 |
you know, from simple equation to linear equations 00:40:54.020 |
to matrix equations to deeper kind of computation. 00:41:00.580 |
that people are thinking of data as a topology problem. 00:41:04.340 |
You know, data is organized in some immense shape. 00:41:09.340 |
which sort of wants to get data from an immense shape 00:41:21.380 |
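The scalar-to-vector-to-matrix hierarchy above maps directly onto code. A small illustrative sketch (NumPy is used here purely for convenience):

```python
import numpy as np

a, b = np.random.rand(1000), np.random.rand(1000)

# Scalar computation: one multiply-add per step, like early CPUs.
s = 0.0
for i in range(len(a)):
    s += a[i] * b[i]

# Vector computation: the whole array at once, like a vector unit.
v = np.dot(a, b)

# Matrix computation: one large data set convolved against another,
# the shape of work GPUs and AI accelerators are built for.
A, B = np.random.rand(256, 256), np.random.rand(256, 256)
M = A @ B
```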
So that paper you referenced, the Sutton paper, 00:41:26.620 |
they talked about, you know, like when AI started, 00:41:31.860 |
That's a very simple computational situation. 00:41:39.860 |
So have a huge database of moves and results, deep search, 00:41:56.260 |
It's a completely different kind of phenomena. 00:42:03.780 |
they're going up this mathematical graph, right? 00:42:07.540 |
And then computations, both computation and data sets 00:42:15.460 |
I mean, I would argue that all of it is still a search, 00:42:19.980 |
Just like you said, a topology problem of data sets, 00:42:22.780 |
you're searching the data sets for valuable data. 00:42:27.020 |
And also the actual optimization of neural networks 00:42:33.060 |
- I don't know, if you had looked at the inner layers 00:42:45.660 |
And then you can have a shadow of that on the something 00:42:55.540 |
And, but the computation to tease out the attributes 00:43:08.300 |
- And then in deep networks, they look at layers 00:43:13.140 |
And yet if you take the layers out, it doesn't work. 00:43:29.020 |
- I would say it's absolutely not semantics, but-- 00:43:49.020 |
and the space, the incredibly multi-dimensional, 00:43:54.020 |
100,000 dimensional space that neural networks 00:44:11.220 |
the funny thing is, is the difference between 00:44:17.340 |
- Yeah, maybe that's a different way to describe it. 00:44:21.700 |
in terms of the basic mathematical operations 00:44:38.540 |
Well, the operations continue to be add, subtract, 00:44:48.860 |
of computers or transistors, under that, atoms. 00:44:52.780 |
So you got atoms, transistors, logic gates, computers, 00:44:58.420 |
The building blocks of mathematics at some level 00:45:01.060 |
are things like adds and subtracts and multiplies, 00:45:31.660 |
So the data types in TensorFlow imply an optimization set, 00:45:36.660 |
but when you go right down and look at the computers, 00:45:40.460 |
it's AND and OR gates doing adds and multiplies. 00:45:49.980 |
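That bottom layer can be made concrete. Here is a sketch, for illustration only, of a full adder built from AND, OR, and XOR operations, then a ripple-carry adder built from full adders, the gates-to-functional-units layering described earlier:

```python
def full_adder(a, b, carry_in):
    s = a ^ b ^ carry_in                                     # XORs: sum bit
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)    # ANDs/OR: carry
    return s, carry_out

def ripple_add(x, y, width=8):
    carry, result = 0, 0
    for i in range(width):                  # one full adder per bit position
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

print(ripple_add(25, 17))  # 42 -- an adder built from nothing but gates
```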
and then there's people who think about analog computing, 00:46:10.980 |
and ability to hit mathematical abstractions. 00:46:30.700 |
And as we get the next two orders of magnitude, 00:46:36.620 |
every order of magnitude changes the computation. 00:46:40.140 |
- Fundamentally changes what the computation is doing. 00:46:45.660 |
the difference in quantity is the difference in kind. 00:46:48.300 |
You know, the difference between ant and anthill, right? 00:46:58.880 |
where the quantity changed the quality, right? 00:47:02.500 |
And we've seen that happen in mathematics multiple times, 00:47:09.980 |
if you focus head down on shrinking the transistor. 00:47:09.980 |
you know, you find adders and subtractors and multipliers. 00:48:00.040 |
- Well, software guys have a thing that they call it 00:48:12.380 |
the odds of that being the performance limiter is low. 00:48:16.900 |
can you make it 2X faster by optimizing the right things? 00:48:30.300 |
- But the whole time as you're doing the writing, 00:48:34.860 |
The hardware underneath gets faster and faster. 00:48:39.980 |
then your AI research should expect that to show up, 00:48:44.980 |
and then you make a slightly different set of choices, 00:48:47.900 |
then we've hit the wall, nothing's gonna happen, 00:48:51.380 |
and from here, it's just us rewriting algorithms. 00:48:56.500 |
for the last 30 years of Moore's Law's death. 00:49:07.300 |
So, why do you think Moore's Law is not going to die? 00:49:12.300 |
Which is the most promising, exciting possibility 00:49:15.740 |
of why it won't die in the next five, 10 years? 00:49:18.060 |
So, is it the continued shrinking of the transistor, 00:49:30.240 |
- Right, so there's stacks of S-curves in there. 00:49:47.460 |
so they took a fin which had a gate around it 00:49:55.380 |
and then from there, there's some obvious steps 00:49:59.380 |
So, the metallurgy around wire stacks and stuff 00:50:07.140 |
and there's a whole combination of things there to do. 00:50:45.380 |
And then, when we were thinking about Moore's Law, 00:50:49.540 |
Raja Koduri said, "Every 10x generates a new computation." 00:50:49.540 |
So, scalar, vector, matrix, topological computation. 00:51:03.860 |
there was mainframes and minicomputers and PCs, 00:51:14.780 |
And people are starting to think about the smart world 00:51:21.220 |
The transformations are gonna be unpredictable. 00:51:29.900 |
of the key architects of this kind of future? 00:51:37.180 |
of the high-level people who build the Angry Birds apps. 00:51:37.180 |
Maybe that's the whole point of the universe. 00:51:48.820 |
and the attention-distracting nature of mobile phones. 00:52:17.220 |
for talking to their friends all day on text. 00:52:38.100 |
so there's billions of people on this planet. 00:52:49.860 |
and getting paid for it, and there's an interest in it. 00:52:52.820 |
But there's so many things going on in parallel. 00:53:04.860 |
You know, there's a, I'm sure some philosopher 00:53:14.020 |
- So you can't deny the fact that these tools, 00:53:19.140 |
whether, that these tools are changing our world. 00:53:25.260 |
- So do you think it's changing for the better? 00:53:31.740 |
the two disciplines with the highest GRE scores 00:53:38.420 |
And they're both sort of trying to answer the question, 00:53:42.900 |
And the philosophers are on the kind of theological side, 00:53:47.740 |
and the physicists are obviously on the material side. 00:53:52.660 |
And there's 100 billion galaxies with 100 billion stars. 00:54:00.140 |
So, you know, there's, on our way to 10 billion people. 00:54:11.260 |
- Things do tend to significantly increase in complexity. 00:54:30.100 |
you get a surface that, you know, grows by R squared. 00:54:30.100 |
And computation has been, let's say, relatively pedestrian. 00:54:54.460 |
through the other realms of possibility, right? 00:55:01.820 |
mathematical computations that are sophisticated enough 00:55:06.540 |
that nobody understands how the answers came out, right? 00:55:18.900 |
if it's predictive of new functions, new data sets. 00:55:34.260 |
And it can arrive at results that I don't know 00:55:37.580 |
if they're completely mathematically describable. 00:55:39.980 |
So computation has kind of done something interesting 00:56:01.080 |
Do you think we're creating sort of the next step 00:56:03.460 |
in our evolution in creating artificial intelligence systems 00:56:09.260 |
I mean, there's so much in the universe already, 00:56:14.060 |
- Are human beings working on additional abstraction layers 00:56:20.300 |
Does that mean that human beings don't need dogs? 00:56:25.940 |
that are all simultaneously interesting and useful. 00:56:32.460 |
you've seen greater and greater level abstractions 00:56:41.260 |
do you think that the look of all life on Earth 00:56:46.860 |
this machine with greater and greater levels of abstraction, 00:56:58.380 |
Or do you think we're just somewhere in the middle? 00:57:00.500 |
Are we the basic functional operations of a CPU? 00:57:10.460 |
Like somebody's, you know, people have calculated 00:57:14.900 |
And something, you know, I've seen the number 10 00:57:17.020 |
to the 18th a bunch of times, arrived at in different ways. 00:57:32.980 |
You know, my personal experience is interesting 00:57:35.260 |
'cause, you know, you think you know how you think 00:57:44.100 |
like what you can be aware of is interesting. 00:57:48.660 |
So I don't know if brains are magical or not. 00:57:54.780 |
Lots of people's personal experience says yes. 00:57:57.820 |
So what would be funny is if brains are magical 00:58:01.300 |
and yet we can make brains with more computation. 00:58:04.620 |
You know, I don't know what to say about that, but. 00:58:07.060 |
- Well, do you think magic is an emergent phenomena? 00:58:20.620 |
- Yeah, like what, you know, consciousness, love, 00:58:29.560 |
Is that something that we'll be able to make, 00:58:41.020 |
- Can you summarize it in a couple of sentences? 00:58:44.020 |
- Many people have observed that organisms run 00:58:52.860 |
you'd have one sensory neuron and one motor neuron, right? 00:58:56.900 |
So we move towards things and away from things 00:58:58.820 |
and we have physical integrity and safety or not, right? 00:59:05.660 |
you can see brains that are a little more complicated 00:59:17.220 |
And then our brains have massive numbers of structures, 00:59:21.660 |
you know, like planning and movement and thinking 00:59:27.940 |
And we seem to have multiple layers of thinking systems. 00:59:37.500 |
And you can think in a way that those systems 00:59:46.540 |
you know, the different parts of yourself can observe them. 01:00:15.340 |
how much calculation it takes to describe quantum effects 01:00:29.580 |
But then the simulation guys have pointed out 01:00:32.700 |
Like when you look really close, it's uncertain. 01:00:35.100 |
And the speed of light says you can only look so far 01:00:45.100 |
And somebody said physics is like having 50 equations 01:00:59.020 |
It seems odd when you get to the corners of everything. 01:01:07.180 |
- It's almost like the designers of the simulation 01:01:09.380 |
are trying to prevent us from understanding it perfectly. 01:01:12.820 |
- But also the things that require calculations 01:01:17.740 |
that our idea of the universe as a computer is absurd 01:01:23.100 |
takes all the computation in the universe to figure out. 01:01:28.100 |
You know, you say the simulation is running in the computer 01:01:30.900 |
which has by definition infinite computation. 01:01:37.700 |
- Yeah, well, every little piece of our universe 01:01:40.700 |
seems to take infinite computation to figure out. 01:01:46.060 |
Compute this little teeny spot takes all the mass 01:01:50.340 |
in the local one light year by one light year space. 01:01:54.940 |
- Oh, it's a heck of a computer if it is one. 01:02:00.020 |
'cause the simulation description seems to break 01:02:04.940 |
But the rules of the universe seem to imply something's up. 01:02:10.900 |
- The universe, the whole thing, the laws of physics, 01:02:14.980 |
it just seems like how did it come out to be the way it is? 01:02:22.660 |
Like I said, the two smartest groups of humans 01:02:27.060 |
- Different aspects and they're both complete failures. 01:02:34.260 |
- Well, after 2,000 years, the trend isn't good. 01:02:43.380 |
- But the next 1,000 years doesn't look good either. 01:02:48.940 |
But with Moore's Law, as you've just described, 01:02:51.420 |
not being dead, the exponential growth of technology, 01:02:57.740 |
- Well, it'll be interesting, that's for sure. 01:03:00.460 |
So what are your thoughts on Ray Kurzweil's sense 01:03:16.900 |
has a way of stacking S-curves on top of each other 01:03:24.540 |
- What does an exponential of a million mean? 01:03:29.440 |
And that's just for a local little piece of silicon. 01:03:35.780 |
1,000 tons of silicon to collaborate in one computer 01:03:49.820 |
than our current already unbelievably fast computers. 01:04:31.140 |
and a little bit of ocean water into computers. 01:04:33.340 |
So all the cost is in the equipment to do it. 01:04:36.700 |
And the trend on equipment is once you figure out 01:04:39.420 |
how to build the equipment, the trend of cost is zero. 01:04:41.820 |
Elon said first you figure out what configuration 01:04:45.900 |
you want the atoms in and then how to put them there. 01:04:51.480 |
- But here's the, you know, his great insight is 01:04:58.700 |
And then little tweaks to that will generate something 01:05:21.640 |
Elon Musk believes that autopilot and vehicle autonomy, 01:05:26.700 |
can follow this kind of exponential improvement. 01:05:29.500 |
In terms of the how question that we're talking about, 01:05:34.700 |
What are your thoughts on this particular space 01:05:45.260 |
- Well, the computer you need to build is straightforward. 01:05:48.780 |
And you could argue, well, does it need to be 01:05:53.600 |
But that's just a matter of time or price in the short run. 01:06:00.240 |
You don't have to be especially smart to drive a car. 01:06:05.740 |
I mean, the big problem with safety is attention, 01:06:07.940 |
which computers are really good at, not skills. 01:06:30.620 |
and you can train a neural network to extract 01:06:33.060 |
the distance of any object and the shape of any surface 01:06:46.340 |
It's because it's not just detecting objects, 01:06:50.460 |
it's understanding the scene and it's being able to do it 01:06:56.580 |
So the beautiful thing about the human vision system 01:07:05.540 |
It's not just about perfectly detecting cars, 01:07:09.960 |
It's trying to, it's understanding the physics-- 01:07:20.740 |
- Well, there is a, you know, when you're driving a car 01:07:22.660 |
and somebody cuts you off, your brain has theories 01:07:26.140 |
You know, they're a bad person, they're distracted, 01:07:28.660 |
they're dumb, you know, you can listen to yourself. 01:07:32.820 |
- So, you know, if you think that narrative is important 01:07:44.360 |
and probabilistic changes of speed and direction, 01:07:56.340 |
You can place every object really thoroughly, right? 01:08:01.340 |
You can calculate trajectories of things really thoroughly. 01:08:06.900 |
- But everything you said about really thoroughly 01:08:15.100 |
computer autonomous systems will be way better 01:08:22.480 |
they'll always remember there was a pothole in the road 01:08:57.120 |
And autonomous systems are happily maximizing the givens. 01:09:04.120 |
you remember it 'cause you're processing it the whole time. 01:09:08.880 |
you get to work, you don't know how you got there. 01:09:17.720 |
But the cars have no theories about why they got cut off 01:09:25.520 |
So I tend to believe you do have to have theories, 01:09:32.800 |
So everything you said is actually essential to driving. 01:09:37.800 |
Driving is a lot more complicated than people realize, 01:09:50.120 |
You'd be surprised how simple a calculation for that is. 01:09:53.800 |
- I may be on that particular point, but there's-- 01:10:04.280 |
but I think you might be surprised how complicated it is. 01:10:09.280 |
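For flavor, the "simple calculation" side of that exchange is something like a constant-velocity extrapolation. The sketch below is purely illustrative, not anyone's production autopilot code:

```python
def predict_position(p0, p1, dt_observed, dt_ahead):
    # Estimate velocity from two timestamped positions, then extrapolate.
    vx = (p1[0] - p0[0]) / dt_observed
    vy = (p1[1] - p0[1]) / dt_observed
    return (p1[0] + vx * dt_ahead, p1[1] + vy * dt_ahead)

# An object moved from (0, 0) to (1.0, 0.2) meters in 0.1 s.
# Where will it be half a second from now?
print(predict_position((0.0, 0.0), (1.0, 0.2), 0.1, 0.5))  # (6.0, 1.2)
```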
- I tell people, it's like progress disappoints 01:10:11.960 |
in the short run and surprises in the long run. 01:10:15.760 |
- I suspect in 10 years, it'll be just taken for granted. 01:10:22.360 |
- It's gonna be a $50 solution that nobody cares about. 01:10:34.320 |
But I do think that systems that involve human behavior 01:10:38.880 |
are more complicated than we give them credit for. 01:10:40.800 |
So we can do incredible things with technology 01:10:45.600 |
- I think humans are less complicated than people 01:10:51.400 |
- We tend to operate out of large numbers of patterns 01:10:55.800 |
- But I can't trust you because you're a human. 01:11:13.440 |
Like you said, attention and things like that. 01:11:17.660 |
that the overall picture of safety and autonomy 01:11:26.360 |
I mean, there are already the current safety systems 01:11:29.600 |
like cruise control that doesn't let you run into people 01:11:33.320 |
There are so many features that you just look at the Pareto 01:11:36.280 |
of accidents and knocking off like 80% of them 01:11:45.820 |
it seems to be that there's a very intense scrutiny 01:11:51.680 |
by the media and the public in terms of safety, 01:11:54.280 |
the pressure, the bar put before autonomous vehicles. 01:12:01.720 |
working on the hardware and trying to build a system 01:12:21.200 |
would write into the rules technology solutions 01:12:25.080 |
like modern brake systems imply hydraulic brakes. 01:12:44.320 |
don't hit pedestrians, don't run into people, 01:12:47.040 |
don't leave the road, don't run a red light or a stoplight. 01:12:53.120 |
And they had all the data about which scenarios 01:12:59.280 |
And for the most part, those conversations were like, 01:13:04.000 |
what's the right thing to do to take the next step? 01:13:08.760 |
Now Elon's very interested also in the benefits 01:13:11.960 |
of autonomous driving or freeing people's time 01:13:16.480 |
And I think that's also an interesting thing, 01:13:20.320 |
but building autonomous systems so they're safe 01:13:27.360 |
since the goal is to be 10x safer than people, 01:13:32.160 |
and scrutinizing accidents seems philosophically correct. 01:13:40.760 |
- It's different than the things you worked at, 01:13:47.360 |
the Intel, AMD, Apple, with autopilot chip design 01:13:55.300 |
of building this specialized kind of computing system 01:14:02.740 |
One is the software team, the machine learning team 01:14:07.280 |
is developing algorithms that are changing fast. 01:14:24.560 |
which is if you build a really good general purpose computer 01:14:29.800 |
and then GPU guys will deliver about 5x the performance 01:14:39.200 |
And then special accelerators get another two to 5x 01:14:55.160 |
So, AI accelerators have a claim performance benefit 01:15:13.240 |
So there's a little creative tension there of, 01:15:17.240 |
I want the acceleration afforded by specialization 01:15:22.100 |
so that the new algorithm is so much more effective 01:15:29.920 |
To build a good computer for an application like automotive, 01:15:34.360 |
there's all kinds of sensor inputs and safety processors 01:15:39.060 |
So one of Elon's goals was to make it super affordable. 01:15:55.160 |
And Elon's constraint was, I'm gonna put one in every car, 01:15:58.720 |
whether people buy autonomous driving or not. 01:16:01.640 |
So the cost constraint he had in mind was great. 01:16:05.200 |
And to hit that, you had to think about the system design. 01:16:14.240 |
You can say Stradivarius is this incredible thing, 01:16:20.440 |
picked wood and sanded it, and then he cut it, 01:16:47.880 |
I used to, I dug ditches when I was in college. 01:16:56.920 |
So there's an expression called complex mastery behavior. 01:17:04.060 |
When you do something and it's rote and simple, 01:17:06.680 |
But if the steps that you have to do are complicated 01:17:10.360 |
and you're good at 'em, it's satisfying to do them. 01:17:16.840 |
as you're doing them, you sometimes learn new things 01:17:23.720 |
And engineers, like engineering is complicated enough 01:17:28.760 |
and then a lot of what you do is then craftsman's work, 01:17:41.080 |
that essentially boils down to craftsman's work. 01:17:45.880 |
- Yeah, you know, there's thoughtful decisions 01:17:47.660 |
and problems to solve and trade-offs to make. 01:17:52.480 |
You know, you're building for the current car 01:18:01.420 |
It's not like I'm building a new type of neural network 01:18:04.740 |
which has a new mathematics and a new computer to work. 01:18:08.020 |
You know, that's, like there's more invention than that. 01:18:14.100 |
once you pick the architecture, you look inside 01:18:17.060 |
Adders and multipliers and memories and, you know, 01:18:21.180 |
So computers is always this weird set of abstraction layers 01:18:25.580 |
of ideas and thinking that reduction to practice 01:18:29.300 |
is transistors and wires and, you know, pretty basic stuff. 01:18:44.140 |
Like the people who work there really like it. 01:18:50.860 |
And the car is moving and the parts are moving 01:18:54.940 |
and you have to coordinate putting all the stuff together 01:19:03.940 |
and some of the guys sitting around were really bummed 01:19:17.780 |
But what they did was complicated and you couldn't do it. 01:19:34.620 |
in a minute and a half is unbelievably complicated. 01:19:38.160 |
And human beings can do it, it's really good. 01:19:42.500 |
I think that's harder than driving a car, by the way. 01:19:56.980 |
- No, not for us humans driving a car is easy. 01:20:07.460 |
because we've been evolving for billions of years. 01:20:15.640 |
- Oh, now you join the rest of the internet in mocking me. 01:20:50.900 |
what have you learned, have taken away from your time 01:21:00.860 |
innovation, craftsmanship, and all those things. 01:21:17.420 |
that no matter what you do, it's a local maximum. 01:21:24.260 |
and it was a lot better than what we were using. 01:21:33.300 |
And I said, "You know, when the super intelligent aliens 01:21:43.220 |
But doing interesting work that's both innovative, 01:21:49.440 |
and let's say craftsman's work on the current thing, 01:21:55.140 |
And then Elon was good at taking everything apart. 01:22:03.980 |
You know, that ability to look at it without assumptions, 01:22:21.860 |
Like, when they first landed two SpaceX rockets, at Tesla, 01:22:44.580 |
You think that's not gonna be unbelievably painful? 01:22:57.440 |
"to go take apart that many layers of assumptions?" 01:23:05.360 |
So it could be emotionally and intellectually painful, 01:23:07.900 |
that whole process of just stripping away assumptions? 01:23:23.620 |
when you get back into that one bit that's useful, 01:23:44.200 |
Now for a long time I've suspected you could get better. 01:23:47.040 |
Like you can think better, you can think more clearly, 01:23:52.040 |
And there's lots of examples of that, people who do that. 01:24:14.600 |
Well, no, I've read a couple of books a week for 55 years. 01:24:19.600 |
Well, maybe 50, 'cause I didn't learn to read 01:24:39.800 |
who wrote the best books and who like, you know, 01:24:58.720 |
and basically compared to all the VPs running around, 01:25:01.400 |
I'd read 19 more management books than anybody else. 01:25:12.660 |
- But at the core of that is questioning the assumptions, 01:25:16.960 |
or sort of entering, thinking first principles thinking, 01:25:21.760 |
sort of looking at the reality of the situation, 01:25:24.880 |
and using that knowledge, applying that knowledge. 01:25:28.200 |
- Yeah, so I would say my brain has this idea 01:25:38.280 |
and you have to kind of circle back to that observation. 01:25:45.120 |
- Well, it's hard to just keep it front and center, 01:25:47.280 |
'cause you operate on so many levels all the time, 01:26:08.260 |
But you do for a while, and that's kind of cool. 01:26:16.200 |
from the big picture, from the first principles, 01:26:19.480 |
do you think, you kind of answered it already, 01:26:24.320 |
is something we can solve on a timeline of years? 01:26:42.600 |
the fundamentals of building the hardware and the software? 01:27:00.240 |
people are doing frequency-domain analysis, 01:27:38.560 |
that when you add human beings into the picture, 01:27:50.360 |
- Cars are highly damped in terms of rate of change. 01:27:57.640 |
The acceleration, the acceleration's really slow. 01:28:02.840 |
On a ballistics time scale, but human behavior, 01:28:29.680 |
there's gonna be pleasant surprises all over the place. 01:28:49.800 |
you know, beyond the point, there's no looking back. 01:28:53.320 |
Do you share this worry of existential threats 01:28:57.360 |
from computers becoming superhuman level intelligent? 01:29:03.400 |
You know, like we already have a very stratified society. 01:29:07.520 |
And then if you look at the whole animal kingdom 01:29:12.560 |
and, you know, smart people have their niche, 01:29:15.280 |
and, you know, normal people have their niche, 01:29:26.040 |
for things that, you know, astronomically different, 01:29:29.480 |
like the whole something got 10 times smarter than us 01:29:32.320 |
and wanted to track us all down because what? 01:29:42.560 |
where there's something way smarter than you, 01:29:48.920 |
Well, there's what, 0.1% of the population who thinks that? 01:29:52.560 |
'Cause the rest of the population's been dealing with it 01:30:03.680 |
And, you know, superintelligence seems likely, 01:30:09.840 |
although we still don't know if we're magical, 01:30:16.320 |
and it seems likely that it'll create possibilities 01:30:20.920 |
and its interests will be interesting for whatever it is. 01:30:28.920 |
would somehow wanna fight over some square foot of dirt 01:30:32.400 |
or, you know, whatever the usual fears are about. 01:30:41.320 |
- Depends on how you think reality's constructed. 01:30:45.240 |
So for whatever reason, human beings are in, let's say, 01:30:55.400 |
Like, there's lots of philosophical understanding of that. 01:31:03.200 |
- So you think the evil is necessary for the good? 01:31:11.640 |
where your good is somebody else's, you know, evil. 01:31:43.160 |
will leave humans behind in a way that's painful? 01:31:51.280 |
- Isn't it already painful for a large percentage 01:31:54.900 |
I mean, society does have a lot of stress in it, 01:31:57.880 |
about the 1% and about to this and about to that, 01:32:00.680 |
but you know, everybody has a lot of stress in their life 01:32:05.360 |
and you know, know yourself seems to be the proper dictum 01:32:09.760 |
and pursue something that makes your life meaningful 01:32:45.800 |
because there were the happiest times of your life 01:32:58.040 |
I like that situation where you have some amount of optimism 01:33:04.840 |
- So you love the unknown, the mystery of it. 01:33:12.940 |
- What do you think is the meaning of this whole thing? 01:33:29.280 |
makes atoms which makes us which we do stuff. 01:33:32.820 |
And we figure out things and we explore things. 01:33:43.520 |
Jim, I don't think there's a better place to end it 01:33:56.200 |
and thank you to our presenting sponsor, Cash App. 01:33:59.360 |
Download it, use code LexPodcast, you'll get $10 01:34:03.080 |
and $10 will go to FIRST, a STEM education nonprofit 01:34:06.440 |
that inspires hundreds of thousands of young minds 01:34:12.200 |
If you enjoy this podcast, subscribe on YouTube, 01:34:15.000 |
give it five stars on Apple Podcast, follow on Spotify, 01:34:18.280 |
support on Patreon or simply connect with me on Twitter. 01:34:22.320 |
And now let me leave you with some words of wisdom 01:34:26.880 |
If everything you try works, you aren't trying hard enough. 01:34:30.920 |
Thank you for listening and hope to see you next time.