Jim Keller: Abstraction Layers from the Atom to the Data Center | AI Podcast Clips
- So let's get into the basics before we zoom back out. 00:00:14.460 |
Maybe even as far back as what is a transistor? 00:00:17.280 |
- So the special charm of computer engineering 00:00:30.080 |
and atoms get put together in materials like silicon 00:00:33.280 |
or doped silicon or metal, and we build transistors. 00:00:41.500 |
And then functional units, like an adder, a subtractor, 00:00:45.240 |
an instruction parsing unit, and then we assemble those 00:00:50.120 |
Modern computers are built out of probably 10 to 20 00:01:04.440 |
So there's abstraction layers, and then software, 00:01:09.560 |
and then there's assembly language, C, C++, Java, JavaScript. 00:01:14.240 |
There's abstraction layers, essentially from the atom 00:01:34.840 |
who build a computer, there's lots of different disciplines 00:01:44.120 |
- So there's a bunch of levels of abstraction. 00:01:47.380 |
In an organization like Intel, and in your own vision, 00:01:57.500 |
Some of it is science, some of it is engineering, 00:02:01.160 |
What's the most, if you could pick favorites, 00:02:04.160 |
what's the most important, your favorite layer 00:02:14.920 |
That's the fun, you know, I'm somewhat agnostic to that. 00:02:18.560 |
So I would say for relatively long periods of time, 00:02:25.840 |
So the x86 instruction set, the ARM instruction set. 00:02:31.160 |
- So it says, how do you encode the basic operations? 00:02:33.920 |
Load, store, multiply, add, subtract, conditional branch. 00:02:37.420 |
There aren't that many interesting instructions. 00:02:44.240 |
90% of the execution is on 25 opcodes, 25 instructions. 00:02:53.280 |
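Purely as an illustration of what "encode the basic operations" means, here is a toy interpreter with a handful of opcodes like the ones named above. This is a sketch, not x86 or ARM; the opcode names, register count, and instruction format are invented for the example.

```python
# A toy machine with a few basic operations: load, store, add, subtract,
# multiply, and a conditional branch. Each instruction is (op, a, b, c).
def run(program, memory):
    regs = [0] * 8          # small register file
    pc = 0                  # program counter
    while pc < len(program):
        op, a, b, c = program[pc]
        if op == "load":     regs[a] = memory[b]          # ra = mem[b]
        elif op == "store":  memory[b] = regs[a]          # mem[b] = ra
        elif op == "add":    regs[a] = regs[b] + regs[c]
        elif op == "sub":    regs[a] = regs[b] - regs[c]
        elif op == "mul":    regs[a] = regs[b] * regs[c]
        elif op == "blt":    # branch to instruction c if ra < rb
            if regs[a] < regs[b]:
                pc = c
                continue
        pc += 1
    return regs, memory

regs, mem = run(
    [("load", 0, 0, 0),      # r0 = mem[0]
     ("load", 1, 1, 0),      # r1 = mem[1]
     ("add",  2, 0, 1),      # r2 = r0 + r1
     ("store", 2, 2, 0)],    # mem[2] = r2
    [3, 4, 0])
print(mem)                   # [3, 4, 7]
```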
- Intel architecture has been around for 25 years. 00:03:03.100 |
Now, the way an old computer ran is you fetched instructions 00:03:32.200 |
So a modern computer, like people like to say, 00:03:40.180 |
simple, clean, slow computers is zero, right? 00:03:47.360 |
Now, how you build it can be clean, 00:04:03.360 |
and then executes it in a way that gets the right answers. 00:04:11.280 |
And then there's semantics around how memory ordering works 00:04:16.160 |
So the computer sort of has a bunch of bookkeeping tables 00:04:19.720 |
that says what order do these operations finish in 00:04:25.560 |
But to go fast, you have to fetch a lot of instructions 00:04:38.760 |
like you have a program with a lot of dependent instructions. 00:04:43.880 |
you find the dependency graph and you issue instructions out of order. 00:04:47.200 |
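To make "find the dependency graph and issue out of order" concrete, here is a toy list scheduler. It has nothing to do with how real out-of-order hardware is built, and the instruction format (destination register, source registers) is made up for the example.

```python
# A rough sketch of out-of-order issue: an instruction can issue once every
# earlier instruction that produces one of its sources has already issued.
# Each entry is (dest, sources). WAR/WAW hazards are ignored in this toy.
def schedule(instrs):
    issued = set()
    waves = []                      # each wave = instructions issued together
    while len(issued) < len(instrs):
        wave = []
        for i, (dest, srcs) in enumerate(instrs):
            if i in issued:
                continue
            # blocked by any earlier, not-yet-issued producer of a source
            deps = [j for j in range(i)
                    if instrs[j][0] in srcs and j not in issued]
            if not deps:
                wave.append(i)
        issued.update(wave)
        waves.append(wave)
    return waves

# r1 = load, r2 = load, r3 = r1 + r2, r4 = load: the three loads have no
# mutual dependencies, so they can issue together ahead of the add.
print(schedule([("r1", []), ("r2", []), ("r3", ["r1", "r2"]), ("r4", [])]))
# -> [[0, 1, 3], [2]]
```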
That's because you have one serial narrative to execute, 00:04:56.560 |
- So yeah, so humans think in serial narrative. 00:05:00.760 |
There's a sentence after sentence after sentence 00:05:07.160 |
Imagine you diagrammed it properly and you said, 00:05:17.780 |
- That's a fascinating question to ask of a book. 00:05:26.220 |
You could say, he is tall and smart and X, right? 00:05:31.220 |
And it doesn't matter the order of tall and smart. 00:05:36.020 |
But if you say the tall man is wearing a red shirt, 00:05:40.720 |
what colors, you know, like you can create dependencies. 00:05:54.660 |
And the first order, the screen you're looking at 00:06:02.260 |
Simple narratives around the large numbers of things 00:06:10.100 |
- So found parallelism where the narrative is sequential 00:06:15.480 |
but you discover like little pockets of parallelism 00:06:28.940 |
here's how you fetch 10 instructions at a time, 00:06:31.260 |
here's how you calculate the dependencies between them, 00:07:09.300 |
- You would get what's called cycles per instruction 00:07:11.700 |
and it would be about, you know, three cycles per instruction, 00:07:17.780 |
because of the latency of the operations and stuff. 00:07:22.260 |
executes it like 0.25 cycles per instruction. 00:07:30.780 |
One is the found parallelism in the narrative, right? 00:07:35.140 |
And the other is the predictability of the narrative, right? 00:07:39.140 |
So certain operations, they do a bunch of calculations 00:07:43.280 |
and if greater than one, do this, else do that. 00:07:46.640 |
That decision is predicted in modern computers 00:08:03.240 |
figure out the graph and execute them all in parallel. 00:08:09.380 |
if you fetch 600 instructions and there's a branch every six, 00:08:13.860 |
you have to predict 99 out of a hundred branches correctly 00:08:20.140 |
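Back-of-the-envelope arithmetic behind the "99 out of a hundred" claim (the exact figures here are illustrative): with a branch roughly every six instructions, a 600-instruction window holds about 100 branches in flight, and the whole window is only on the correct path if every one of them was predicted right.

```python
# Probability that an entire speculative window is on the correct path,
# for a few per-branch prediction accuracies.
branches_in_flight = 600 // 6
for per_branch_accuracy in (0.95, 0.99, 0.999):
    window_survives = per_branch_accuracy ** branches_in_flight
    print(per_branch_accuracy, round(window_survives, 3))
# 0.95 -> ~0.006, 0.99 -> ~0.366, 0.999 -> ~0.905
```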
- Okay, so parallelism, you can't parallelize branches 00:08:29.260 |
- So imagine you do a computation over and over, 00:08:37.220 |
And you go through that loop a million times. 00:08:40.440 |
you say, it's probably still greater than one. 00:08:43.540 |
- And you're saying you could do that accurately. 00:09:05.900 |
So then somebody said, hey, let's keep a couple of bits 00:09:10.860 |
So when it predicts one way, we count up and then it pins. 00:09:18.540 |
And you can use the top bit as the sign bit. 00:09:22.820 |
So if it's greater than one, you predict taken 00:09:25.260 |
and less than one, you predict not taken, right? 00:09:48.240 |
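A sketch of the two-bit saturating counter just described: count up on taken, down on not taken, pin at the ends, and let the top bit pick the prediction. The class name and starting value are arbitrary choices for the example.

```python
# Classic two-bit saturating counter branch predictor.
class TwoBitPredictor:
    def __init__(self):
        self.counter = 1                 # 0..3, start weakly not-taken

    def predict(self):
        return self.counter >= 2         # top bit set -> predict taken

    def update(self, taken):
        if taken:
            self.counter = min(self.counter + 1, 3)   # pin at 3
        else:
            self.counter = max(self.counter - 1, 0)   # pin at 0

# A loop branch that is taken over and over, then falls through once:
# after a couple of iterations the counter is pinned and stays right.
p = TwoBitPredictor()
for outcome in [True, True, True, True, False, True]:
    print(p.predict(), outcome)
    p.update(outcome)
```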
It went one way, but if you're talking about Bob and Jill, 00:09:56.700 |
That's cool, but that's not how anything works today. 00:10:09.020 |
and then you do basically deep pattern recognition 00:10:21.500 |
and you have something that chooses what the best result is. 00:10:24.500 |
There's a little supercomputer inside the computer. 00:10:32.120 |
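The "something that chooses" idea in miniature, reusing the TwoBitPredictor sketch above for the component predictors: one more saturating counter learns which of two predictors to trust for a branch. Real modern predictors (perceptron- and TAGE-style designs) are far more elaborate; this only shows the selection step.

```python
# A tiny tournament-style chooser between two predictors.
class Chooser:
    def __init__(self, pred_a, pred_b):
        self.a, self.b = pred_a, pred_b
        self.select = 1                  # 0..3: low trusts a, high trusts b

    def predict(self):
        return self.b.predict() if self.select >= 2 else self.a.predict()

    def update(self, taken):
        # Compare the two predictions made before the outcome was known,
        # then nudge the selector toward whichever predictor was right.
        a_right = self.a.predict() == taken
        b_right = self.b.predict() == taken
        if b_right and not a_right:
            self.select = min(self.select + 1, 3)
        elif a_right and not b_right:
            self.select = max(self.select - 1, 0)
        self.a.update(taken)
        self.b.update(taken)
```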
So the effective window that is worth finding graphs in 00:10:58.900 |
To get the result, to get from a window of say 00:11:02.860 |
50 instructions to 500, it took three orders of magnitude 00:11:09.380 |
- Now if you get the prediction of a branch wrong, 00:11:15.180 |
- You flush the pipe, so it's just the performance cost. 00:11:19.260 |
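A rough illustration of why the flush is "just a performance cost" but still hurts. The penalty and branch-frequency numbers below are assumed for the sketch, not taken from the conversation: effective cycles per instruction degrade quickly as prediction accuracy drops.

```python
# Effective CPI with a branch misprediction penalty folded in.
base_cpi = 0.25             # the well-predicted, out-of-order case
branch_fraction = 1 / 6     # roughly one branch per six instructions
flush_penalty = 15          # cycles to refill the pipe (illustrative)
for accuracy in (0.99, 0.95, 0.90):
    cpi = base_cpi + branch_fraction * (1 - accuracy) * flush_penalty
    print(accuracy, round(cpi, 3))
# 0.99 -> 0.275, 0.95 -> 0.375, 0.90 -> 0.5
```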
So we're starting to look at stuff that says, 00:11:27.060 |
but far, far away there's something that doesn't matter 00:11:31.460 |
So you took the wrong path, you executed a bunch of stuff. 00:11:37.080 |
Then you had the mispredicting, you backed it up, 00:11:40.220 |
but you remembered all the results you already calculated. 00:11:45.460 |
Like if you read a book and you misunderstand a paragraph, 00:11:50.300 |
Sometimes it's invariant to their understanding. 00:11:55.380 |
- And you can kind of anticipate that invariance. 00:12:06.100 |
to a piece of code, should you calculate it again 00:12:22.940 |
And you have a bunch of knowledge about which way to go. 00:12:35.620 |
So imagine you're doing something complicated 00:12:45.500 |
And the ways you pick interact in a complicated way. 00:12:53.460 |
- Right, so that's--
- Or is that art or science? 00:13:19.980 |
but they're really good at evaluating the alternatives. 00:13:23.380 |
Right, and everybody has a different way to do it. 00:13:32.100 |
So when you see computers are designed by teams of people 00:13:37.060 |
and a good team has lots of different kinds of people. 00:13:42.060 |
I suspect you would describe some of them as artistic. 00:14:32.300 |
- Well, that's a language definitional statement. 00:14:35.260 |
So for years when we first did 3D acceleration of graphics, 00:14:52.340 |
And then when the HPC world used GPUs for calculations, 00:15:05.940 |
where the precision of the data is low enough 00:15:11.460 |
And the observation is the input data is unbelievably noisy. 00:15:20.000 |
that say can get faster answers by being noisy. 00:15:27.380 |
it starts out really wide and then it gets narrower. 00:15:29.940 |
And you can say, is that last little bit that important? 00:15:35.500 |
before we whittle it all the way down to the answer? 00:15:38.560 |
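One way to see "is that last little bit important?" on noisy data. This is a sketch assuming NumPy is available; the array size and noise model are arbitrary. Rounding the inputs to half precision barely moves an aggregate answer relative to the noise already present in the data.

```python
import numpy as np

rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 1.0, 1_000_000).astype(np.float32)

full = noisy.mean()
half = noisy.astype(np.float16).astype(np.float32).mean()
print(full, half, abs(full - half))   # the gap is far below the data's noise
```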
Right, so you can create algorithms that are noisy. 00:15:43.240 |
and every time you run it, you get a different answer, 00:15:51.740 |
every time you run the program, you get the same answer. 00:15:56.160 |
that's the formal definition of a programming language. 00:16:02.320 |
that don't get the same answer, but people who use those. 00:16:05.180 |
You always want something 'cause you get a bad answer 00:16:11.040 |
of something in the algorithm or because of this? 00:16:13.160 |
And so everybody wants a little switch that says, 00:16:18.080 |
And it's really weird 'cause almost everything 00:16:27.400 |
- I design computers for people who run programs. 00:16:30.280 |
So if somebody says, I want a deterministic answer, 00:16:41.880 |
What people don't realize is you get a deterministic answer 00:16:45.080 |
even though the execution flow is very nondeterministic. 00:16:53.880 |
- And the answer, it arrives at the same answer.
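As a footnote to that distinction, here is one common way "same program, different answer" actually shows up: floating-point addition is not associative, so summing the same values in a different order (as a parallel runtime might) changes the last few bits. An out-of-order core, by contrast, reorders work internally but still commits the architecturally defined result, so the program's answer stays deterministic.

```python
# Floating-point sums depend on evaluation order.
import random

values = [random.uniform(-1, 1) for _ in range(100_000)]

in_order = sum(values)
shuffled = values[:]
random.shuffle(shuffled)
reordered = sum(shuffled)

print(in_order == reordered)        # usually False
print(abs(in_order - reordered))    # tiny, but nonzero
```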