
Moore's Law is Not Dead (Jim Keller) | AI Podcast Clips


Chapters

0:20 What Is Moore's Law
1:25 Broader Definition of Moore's Law
14:52 Building Blocks of Mathematics
19:40 Nanowires

Transcript

For over 50 years now, Moore's Law has served for me and millions of others as an inspiring beacon of what kind of amazing future brilliant engineers can build. I'm just making your kids laugh all day today. That's great. So first, in your eyes, what is Moore's Law, if you could define it for people who don't know?

Well, the simple statement, from Gordon Moore, was: double the number of transistors every two years, something like that. And then my operational model is, we increase the performance of computers by 2x every two or three years. And it's wiggled around substantially over time. And also, how we deliver performance has changed.

Right. But the foundational idea was 2x the transistors every two years. The current cadence is something like, they call it a shrink factor, like .6 every two years, which is not .5. But that's referring strictly, again, to the original definition of just-- Yeah, of transistor count. A shrink factor, just getting them smaller and smaller and smaller.

Well, it's for a constant chip area. If you make the transistors smaller by .6, then you get one over .6 more transistors. So can you linger on that a little longer? What do you think should be the broader definition of Moore's Law? You mentioned how you think of performance; just broadly, what's a good way to think about Moore's Law?
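(A minimal worked version of the shrink arithmetic above, added as an editorial illustration; it takes the quoted 0.6 as an area shrink per generation, which is what the "one over .6" phrasing implies.)

```latex
% Shrink arithmetic for a fixed chip area, with area shrink factor s per generation:
%   N_new = N_old / s
\[
  s = 0.5 \;\Rightarrow\; \frac{N_\mathrm{new}}{N_\mathrm{old}} = \frac{1}{0.5} = 2\times
  \quad \text{(the original Moore cadence)}
\]
\[
  s = 0.6 \;\Rightarrow\; \frac{N_\mathrm{new}}{N_\mathrm{old}} = \frac{1}{0.6} \approx 1.67\times
  \quad \text{(the current cadence quoted above)}
\]
```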

Well, first of all, I've been aware of Moore's Law for 30 years. In which sense? Well, I've been designing computers for 40. You're just watching it before your eyes, kind of thing. Well, somewhere around when I became aware of it, I was also informed that Moore's Law was going to die in 10 to 15 years.

And I thought that was true at first. But then after 10 years, it was going to die in 10 to 15 years. And then at one point, it was going to die in five years. And then it went back up to 10 years. And at some point, I decided not to worry about that particular prognostication for the rest of my life, which is fun.

And then I joined Intel, and everybody said Moore's Law is dead. And I thought, that's sad, because it's the Moore's Law company. And it's not dead. And it's always been going to die. And humans like these apocryphal kind of statements, like, we'll run out of food, or we'll run out of air, or run out of room, or run out of something.

Right. But it's still incredible that it's lived for as long as it has. And yes, there's many people who believe now that Moore's Law is dead. You know, they can join the last 50 years of people who had the same idea. Yeah, there's a long tradition. But why do you think, if you can try to understand it, why do you think it's not dead currently?

Well, first, let's just think, people think Moore's Law is one thing. Transistors get smaller. But actually, under the sheet, there's literally thousands of innovations. And almost all those innovations have their own diminishing return curves. So if you graph it, it looks like a cascade of diminishing return curves. I don't know what to call that.

But the result is an exponential curve. Well, at least it has been. And we keep inventing new things. So if you're an expert in one of the things on a diminishing return curve, right, and you can see its plateau, you will probably tell people, well, this is done. Meanwhile, some other pile of people are doing something different.

So that's just normal. So then there's the observation of how small could a switching device be? So a modern transistor is something like a thousand by a thousand by a thousand atoms, right? And you get quantum effects down around two to ten atoms. So you can imagine a transistor as small as 10 by 10 by 10.

So that's a million times smaller. And then the quantum computational people are working away at how to use quantum effects. So a thousand by a thousand by a thousand atoms. That's a really clean way of putting it. Well, a fin, like a modern transistor, if you look at the fin, it's like 120 atoms wide.

But we can make that thinner. And then there's a gate wrapped around it. And then there's spacing. There's a whole bunch of geometry. And a competent transistor designer can count the atoms in every single direction. Like there are techniques now to put down atoms in a single atomic layer.

And you can place atoms if you want to. It's just from a manufacturing process, placing an atom takes 10 minutes. And you need to put 10 to the 23rd atoms together to make a computer. It would take a long time. So the methods are both shrinking things and then coming up with effective ways to control what's happening.
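(Putting rough numbers on the two claims above, as an editorial illustration using the figures quoted in the conversation.)

```latex
% "A million times smaller": volume ratio of the two transistor sizes quoted above.
\[
  \frac{1000 \times 1000 \times 1000}{10 \times 10 \times 10}
  = \frac{10^{9}}{10^{3}} = 10^{6}
\]
% "It would take a long time": placing 10^23 atoms at 10 minutes per atom.
\[
  10^{23}\ \text{atoms} \times 10\ \text{min}
  = 10^{24}\ \text{min} \approx 1.9 \times 10^{18}\ \text{years}
\]
```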

Manufacture stably and cheaply. Yeah. So the innovation stack's pretty broad. There's equipment, there's optics, there's chemistry, there's physics, there's material science, there's metallurgy. There's lots of ideas about when you put different materials together, how do they interact? Are they stable? Are they stable over temperature? Like are they repeatable?

There's literally thousands of technologies involved. - But just for the shrinking, you don't think we're quite yet close to the fundamental limits of physics? - I did a talk on Moore's Law and I asked for a roadmap to a path of 100. And after two weeks, they said we only got to 50.

- 100 what, sorry? - 100x shrink. - A 100x shrink. - We only got to 50, and I said, "Why don't you give it another two weeks?" Well, here's the thing about Moore's Law. So I believe that the next 10 or 20 years of shrinking is gonna happen.

Now as a computer designer, you have two stances. You think it's going to shrink, in which case you're designing and thinking about architecture in a way that you'll use more transistors. Or conversely, not be swamped by the complexity of all the transistors you get. You have to have a strategy.

- You're open to the possibility and waiting for the possibility of a whole new army of transistors ready to work. - I'm expecting more transistors every two or three years by a number large enough that how you think about design, how you think about architecture has to change. Like imagine you build buildings out of bricks and every year the bricks are half the size or every two years.

Well, if you kept building bricks the same way, so many bricks per person per day, the amount of time to build a building would go up exponentially. But if you said, "I know that's coming, so now I'm gonna design equipment that moves bricks faster, uses them better," because maybe you're getting something out of the smaller bricks, more strength, thinner walls, less material, efficiency out of that.

So once you have a roadmap with what's gonna happen, transistors, we're gonna get more of them, then you design all this collateral around it to take advantage of it and also to cope with it. That's the thing people don't understand. If I didn't believe in Moore's Law and then the Moore's Law transistors showed up, my design teams would all be drowned.

- So what's the hardest part of this influx of new transistors? I mean, even if you just look historically throughout your career, what's the thing, what fundamentally changes when you add more transistors in the task of designing an architecture? - There's two constants, right? One is people don't get smarter.

- By the way, there's some science showing that we do get smarter because of nutrition, whatever. - Yeah. - Sorry to bring that up. - The Flynn effect. - Yes. - Yeah, I'm familiar with it. Nobody understands it. Nobody knows if it's still going on. - Or whether it's real or not.

- Yeah, I sort of-- - Anyway, but not exponentially. - I would believe for the most part people aren't getting much smarter. - The evidence doesn't support it. That's right. - And then teams can't grow that much. So human beings, we're really good in teams of 10, up to teams of 100, they can know each other.

Beyond that, you have to have organizational boundaries. So you're kind of, you have, those are pretty hard constraints, right? So then you have to divide and conquer. Like as the designs get bigger, you have to divide it into pieces. You know, the power of abstraction layers is really high.

We used to build computers out of transistors. Now we have a team that turns transistors into logic cells and another team that turns them into functional units and another one that turns them into computers, right? So we have abstraction layers in there. And you have to think about when do you shift gears on that?
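(A toy sketch of those abstraction layers, added editorially and not from the conversation: a "transistor-level" primitive, logic cells built from it, and a functional unit built from the cells. The names and the NAND-only construction are illustrative choices.)

```python
# Toy abstraction stack: primitive -> logic cells -> functional unit.

def nand(a: int, b: int) -> int:
    """The 'transistor-level' primitive everything else is built from."""
    return 0 if (a and b) else 1

def xor_(a, b):                      # logic cells built only from NAND
    c = nand(a, b)
    return nand(nand(a, c), nand(b, c))

def and_(a, b):
    return nand(nand(a, b), nand(a, b))

def or_(a, b):
    return nand(nand(a, a), nand(b, b))

def full_adder(a, b, cin):           # a logic cell: one-bit add with carry
    s1 = xor_(a, b)
    return xor_(s1, cin), or_(and_(a, b), and_(s1, cin))

def ripple_add(x: int, y: int, width: int = 8) -> int:
    """A functional unit: an adder built from full-adder cells."""
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total

# Each layer only needs to know about the layer directly below it.
assert ripple_add(23, 42) == 65
```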

We also use faster computers to build faster computers. So some algorithms run twice as fast on new computers, but a lot of algorithms are N squared. So a design with twice as many transistors in it might take four times as long to run through the tools. So you have to refactor the software.
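(A worked version of that scaling point, as an editorial illustration: assume the design tools scale as N squared in the transistor count N, and the machines running them get 2x faster per generation.)

```latex
% Tool runtime ~ N^2.  Next generation: N -> 2N, machines 2x faster.
\[
  \text{work} \propto (2N)^{2} = 4N^{2}
  \qquad
  \text{wall-clock} \propto \frac{4N^{2}}{2} = 2N^{2}
\]
% Four times the work, and still twice the wall-clock even on the faster
% machine -- hence refactoring the software rather than just re-hosting it.
```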

Like simply using faster computers to build bigger computers doesn't work. So you have to think about all these things. So in terms of computing performance and the exciting possibilities that more powerful computers bring, is shrinking, the thing we've just been talking about, for you one of the biggest exciting possibilities of advancement in performance? Or are there other directions that you're interested in?

Like in the direction of sort of enforcing given parallelism, or like doing massive parallelism in terms of many, many CPUs, stacking CPUs on top of each other, that kind of parallelism, or any kind of parallelism? Well, think about it in a different way. So old computers, slow computers, you said A equals B plus C times D.

Pretty simple, right? And then we made faster computers with vector units and you can do proper equations and matrices, right? And then modern like AI computations or like convolutional neural networks where you convolve one large dataset against another. And so there's sort of this hierarchy of mathematics, from simple equation to linear equations, to matrix equations, to deeper kind of computation.
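(A small sketch of that hierarchy, added editorially: a scalar expression, a matrix equation, and a convolution of one dataset against another. NumPy is an assumed, illustrative choice.)

```python
import numpy as np

# Scalar: "A equals B plus C times D" -- what old, slow computers did.
b, c, d = 2.0, 3.0, 4.0
a = b + c * d

# Matrix equation: what vector units and matrix hardware make cheap.
B = np.random.rand(64, 64)
C = np.random.rand(64, 64)
A = B @ C

# Convolution: slide one dataset (a 3x3 kernel) across another (an image),
# the core operation of a convolutional neural network.
image = np.random.rand(128, 128)
kernel = np.random.rand(3, 3)
out = np.zeros((126, 126))
for i in range(126):
    for j in range(126):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
```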

And the datasets are getting so big that people are thinking of data as a topology problem. Data is organized in some immense shape. And then the computation sort of wants to get data from that immense shape and do some computation on it. So what computers have allowed people to do is have algorithms go much, much further.

So that paper you referenced, the Sutton paper, they talked about, like when AI started, it was apply rule sets to something. That's a very simple computational situation. And then when they did the first chess thing, they solved it with deep searches. So you have a huge database of moves and results, deep search, but it's still just search.

Now we take large numbers of images and we use them to train these weight sets that we convolve across. It's a completely different kind of phenomenon. We call that AI. And now they're doing the next generation. And if you look at it, they're going up this mathematical graph, right? And both computation and datasets support going up that graph.

- Yeah, the kind of computation that might, I mean, I would argue that all of it is still a search, right? Just like you said, a topology problem of datasets, you're searching the datasets for valuable data. And also the actual optimization of neural networks is a kind of search for the-- - I don't know, if you had looked at the inner layers of finding a cat, it's not a search.

It's a set of endless projections. So projection, here's a shadow of this phone, right? And then you can have a shadow of that on something, and a shadow of that on something else. If you look in the layers, you'll see this layer actually describes pointy ears and round-eyedness and fuzziness, but the computation to tease out the attributes is not search.

- Right, I mean-- - Like the inference part might be search, but the training is not search. And then in deep networks, they look at layers and they don't even know what's represented. And yet if you take the layers out, it doesn't work. - Okay, so-- - So I don't think it's search.

- All right, well-- - But you have to talk to a mathematician about what that actually is. - Well, we could disagree, but it's just semantics, I think. It's not, but it's certainly not-- - I would say it's absolutely not semantics, but-- - Okay, all right, well, if you wanna go there.

So optimization to me is search, and we're trying to optimize the ability of a neural network to detect cat ears. And the difference between chess and the space, the incredibly multi-dimensional, 100,000 dimensional space that neural networks are trying to optimize over is nothing like the chess board database. So it's a totally different kind of thing.

Okay, in that sense, you can say-- - Yeah, yeah. - It loses the meaning. - I can see how you might say. The funny thing is, it's the difference between given search space and found search space. - Right, exactly. - Yeah, maybe that's a different way to describe it.

- That's a beautiful way to put it. - Okay. - But you're saying, what's your sense in terms of the basic mathematical operations and the architectures, hardware that enables those operations? Do you see the CPUs of today still being a really core part of executing those mathematical operations? - Yes.

Well, the operations continue to be add, subtract, load, store, compare, and branch. It's remarkable. So it's interesting that the building blocks of computers are transistors, and under that, atoms. So you got atoms, transistors, logic gates, computers, functional units of computers. The building blocks of mathematics at some level are things like adds and subtracts and multiplies, but the space mathematics can describe is, I think, essentially infinite.

But the computers that run the algorithms are still doing the same things. Now, a given algorithm might say, "I need sparse data," or, "I need 32-bit data," or, "I need a convolution operation that naturally takes 8-bit data, multiplies it, and sums it up a certain way." So the data types in TensorFlow imply an optimization set, but when you go right down and look at the computers, it's AND and OR gates doing adds and multiplies.
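(An editorial sketch of that last point: one output element of an 8-bit convolution is, underneath, just a string of integer multiplies and adds. This is illustrative plain Python, not TensorFlow's actual kernel code.)

```python
def int8_dot(xs, ws):
    """Multiply 8-bit values pairwise and sum them into a wider accumulator."""
    acc = 0                                            # accumulator wider than 8 bits
    for x, w in zip(xs, ws):
        assert -128 <= x <= 127 and -128 <= w <= 127   # int8 range
        acc += x * w                                   # one multiply, one add per element
    return acc

# One output element of a convolution is this dot product of an image patch
# against the kernel weights (values here are made up for illustration):
patch   = [12, -7, 33, 0, 101, -64, 5, 9, 27]
weights = [3, 1, -2, 4, 0, -1, 2, 2, -3]
print(int8_dot(patch, weights))
```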

That hasn't changed much. Now, the quantum researchers think they're going to change that radically, and then there's people who think about analog computing, because you look in the brain and it seems to be more analog-ish, that maybe there's a way to do that more efficiently. We have a million X on computation, and I don't know the relationship between computational, let's say, intensity and ability to hit mathematical abstractions.

I don't know any ways to describe that, but just like you saw in AI, you went from rule sets to simple search to complex search to, say, found search. Those are orders of magnitude more computation to do. And as we get the next two orders of magnitude, like a friend, Raja Koduri, said, "Every order of magnitude changes the computation." - Fundamentally changes what the computation is doing.

- Yeah. Oh, you know the expression, "The difference in quantity is the difference in kind." You know, the difference between ant and anthill, right? Or neuron and brain. There's this indefinable place where the quantity changed the quality, right? And we've seen that happen in mathematics multiple times, and my guess is it's gonna keep happening.

- So, in your sense, is it, yeah, if you focus head down and shrinking the transistor... - Well, it's not just head down. We're aware of the software stacks that are running the computational loads, and we're kind of pondering what do you do with a petabyte of memory that wants to be accessed in a sparse way and have the kind of calculations AI programmers want.

So there's a dialogue interaction, but when you go in the computer chip, you find adders and subtractors and multipliers. - So if you zoom out then with, as you mentioned, Rich Sutton, the idea that most of the development in the last many decades in AI research came from just leveraging computation and just simple algorithms waiting for the computation to improve.

- Well, software guys have a thing that they call the problem of early optimization. So you write a big software stack, and if you start optimizing the first thing you write, the odds of that being the performance limiter is low. But when you get the whole thing working, can you make it 2x faster by optimizing the right things?

Sure. While you're optimizing that, could you have written a new software stack, which would have been a better choice? Maybe. Now you have creative tension. - But the whole time as you're doing the writing, that's the software we're talking about. The hardware underneath gets faster and faster. - Well, this goes back to the Moore's Law.

If Moore's Law is going to continue, then your AI research should expect that to show up, and then you make a slightly different set of choices than if we've hit the wall, nothing's going to happen, and from here it's just us rewriting algorithms. That seems like a failed strategy for the last 30 years of predicting Moore's Law's death.

- So can you just linger on it? I think you've answered it, but I'll just ask the same dumb question over and over. So why do you think Moore's Law is not going to die? Which is the most promising, exciting possibility of why it won't die in the next five, 10 years?

So is it the continued shrinking of the transistor, or is it another S-curve that steps in and it totally sort of- - Well, shrinking the transistor is literally thousands of innovations. - Right. So there's stacks of S-curves in there. There's a whole bunch of S-curves just kind of running their course and being reinvented and new things.

The semiconductor fabricators and technologists have all announced what's called nanowires. So they took a fin which had a gate around it and turned that into little wires so you have better control of that and they're smaller. And then from there, there are some obvious steps about how to shrink that.

The metallurgy around wire stacks and stuff has very obvious abilities to shrink. And there's a whole combination of things there to do. - Your sense is that we're going to get a lot of this innovation from just that shrinking. - Yeah, like a factor of a hundred, it's a lot.

- Yeah, I would say that's incredible. And it's totally unknown. - It's only 10 or 15 years. - Now you're smart and you might know, but to me it's totally unpredictable what that hundred X would bring in terms of the nature of the computation that people would be--

- Yeah, you're familiar with Bell's law. So for a long time it was mainframes, minis, workstation, PC, mobile. Moore's Law drove faster, smaller computers. And then when we were thinking about Moore's Law, Raja Koduri said every 10x generates a new computation. So scalar, vector, matrix, topological computation. And if you go look at the industry trends, there was mainframes and then mini computers and then PCs.

And then the internet took off and then we got mobile devices and now we're building 5G wireless with one millisecond latency. And people are starting to think about the smart world where everything knows you, recognizes you, like the transformations are going to be like unpredictable. - How does it make you feel that you're one of the key architects of this kind of future?

So we're not talking about the architects of the high level, people who build the Angry Birds apps and Snapchat. - Angry Birds apps, who knows? Maybe that's the whole point of the universe. - I'm going to take a stand on that, and on the attention-distracting nature of mobile phones.

I'll take a stand. But anyway, in terms of-- - I don't think that matters much. - The side effects of smartphones or the attention distraction, which part? - Well, who knows where this is all leading? It's changing so fast. - Wait, so back to-- - My parents used to yell at my sisters for hiding in the closet with a wired phone with a dial on it.

Stop talking to your friends all day. - Right. - And now my wife yells at my kids for talking to their friends all day on text. It looks the same to me. - It's always, it echoes the same thing. But you are one of the key people architecting the hardware of this future.

How does that make you feel? Do you feel responsible? Do you feel excited? - So we're in a social context, so there's billions of people on this planet. There are literally millions of people working on technology. I feel lucky to be doing what I do and getting paid for it, and there's an interest in it.

But there's so many things going on in parallel. Like the actions are so unpredictable. If I wasn't here, somebody else would do it. The vectors of all these different things are happening all the time. I'm sure there's some philosopher or meta-philosopher wondering about how we transform our world.

- So you can't deny the fact that these tools are changing our world. - That's right. - So do you think it's changing for the better? - I read this thing recently, it said the two disciplines with the highest GRE scores in college are physics and philosophy. And they're both sort of trying to answer the question, why is there anything?

And the philosophers are on the theological side, and the physicists are obviously on the material side. And there's 100 billion galaxies with 100 billion stars. It seems, well, repetitive at best. And we're on our way to 10 billion people. It's hard to say what it's all for, if that's what you're asking.

- Yeah, I guess I am. - Things do tend to significantly increase in complexity. And I'm curious about how computation, like our world, our physical world, inherently generates mathematics. It's kind of obvious, right? So we have XYZ coordinates; you take a sphere, you make it bigger, you get a surface that grows by R squared.
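(The sphere example above, written out as an editorial aside.)

```latex
% Surface area of a sphere grows with the square of the radius:
\[
  A(R) = 4\pi R^{2}
  \qquad\Rightarrow\qquad
  A(2R) = 4\pi (2R)^{2} = 4\,A(R)
\]
```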

Like it generally generates mathematics and the mathematicians and the physicists have been having a lot of fun talking to each other for years. And computation has been, let's say, relatively pedestrian. Like computation in terms of mathematics has been doing binary algebra, while those guys have been gallivanting through the other realms of possibility, right?

Now recently, the computation lets you do mathematical computations that are sophisticated enough that nobody understands how the answers came out, right? - Machine learning. - Machine learning. It used to be, you get a data set, you guess at a function, and the function is considered physics if it's predictive of new functions, new data sets.

Modern, you can take a large data set with no intuition about what it is and use machine learning to find a pattern that has no function, right? And it can arrive at results that I don't know if they're completely mathematically describable. So computation has kind of done something interesting compared to A equals B plus C.
