Ep17. Welcome Jensen Huang | BG2 w/ Bill Gurley & Brad Gerstner
Chapters
0:00 Introduction
1:50 The Evolution of AGI and Personal Assistants
6:03 NVIDIA's Competitive Moat
15:51 The Future of Inference and Training in AI
19:01 Building the AI Infrastructure
31:35 Inventing a New Market in an AI Future
38:40 The Impact of OpenAI
43:25 The Future of AI Models
51:21 Distributed Computing and Inference Scaling
55:54 Inference Time Reasoning and Its Importance
60:46 AI's Role in Growing Business and Improving Productivity
68:00 Ensuring Safe AI Development
72:31 The Balance of Open Source and Closed Source AI
00:00:00.000 |
what they achieved is singular, never been done before. 00:00:06.900 |
that's easily the fastest supercomputer on the planet. 00:00:46.300 |
- Yeah, I got my ugly glasses on just like you. 00:00:52.360 |
- There's something only your family could love. 00:01:07.220 |
where we're gonna debate all the consequences of AI, 00:01:11.940 |
And I couldn't think of anybody better, really, 00:01:16.380 |
- As both a shareholder, as a thought partner, 00:01:18.980 |
kicking ideas back and forth, you really make us smarter. 00:01:39.660 |
So I thought we would kick it off with a thought experiment 00:01:51.100 |
If I think of AGI as that colloquial assistant in my pocket. 00:02:06.960 |
When you look at the rate of change in the world today, 00:02:19.940 |
And that assistant will get better over time. 00:02:25.780 |
That's the beauty of technology as we know it. 00:02:28.260 |
So I think in the beginning it'll be quite useful, 00:02:33.260 |
And then it gets more and more perfect over time, 00:02:39.560 |
"The only thing that really matters is rate of change." 00:02:47.140 |
is the fastest rate of change we've ever seen 00:03:06.620 |
because we drove the marginal cost of computing down 00:03:19.100 |
We did it by one, introducing accelerated computing, 00:03:22.080 |
taking work that is not very effective on CPUs 00:03:29.480 |
We did it by inventing new numerical precisions. 00:03:33.320 |
We did it by new architectures, inventing a tensor core. 00:03:45.880 |
and scaling things up with NVLink and InfiniBand, 00:03:59.040 |
led to a super Moore's law rate of innovation. 00:04:07.840 |
we went from human programming to machine learning. 00:04:13.900 |
is that machine learning can learn pretty fast, 00:04:17.800 |
And so as we reformulated the way we distribute computing, 00:04:22.400 |
we did a lot of parallelism of all kinds, right? 00:04:30.480 |
And we became good at inventing new algorithms 00:04:45.820 |
if you look at the way Moore's law was working, 00:04:51.680 |
- It was pre-compiled, it was shrink-wrapped, 00:04:55.480 |
And the hardware underneath was growing at Moore's law rate. 00:04:59.640 |
Now we've got the whole stack growing, right? 00:05:23.540 |
And as a result, the computing capacity necessary 00:05:26.780 |
is increasing by a factor of four every year. 00:05:32.000 |
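That compounding rate is worth making concrete. Here is a quick illustrative sketch; the 4x-per-year figure comes from the conversation above, while the ten-year horizon and the Moore's-law comparison are assumptions added for illustration:

```python
# Illustrative sketch: compound a 4x-per-year growth in required
# computing capacity (the rate described above) against a classic
# Moore's-law pace of roughly 2x every two years.
# The ten-year horizon is a hypothetical choice for illustration.

def compound(factor: float, periods: float) -> float:
    """Total growth multiple after `periods` compounding periods."""
    return factor ** periods

years = 10
full_stack = compound(4, years)       # 4x per year -> 4**10
moores_law = compound(2, years / 2)   # 2x every two years -> 2**5

print(f"{years} years at 4x/yr: {full_stack:,.0f}x")
print(f"{years} years of Moore's law: {moores_law:,.0f}x")
```

The gap between the two curves is the point: over a decade, 4x per year is roughly a million-fold, versus about 32x for the classic cadence.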
- But now we're seeing scaling with post-training, 00:05:38.740 |
- And so people used to think that pre-training 00:05:46.520 |
The idea that all of human thinking is one shot 00:05:52.720 |
And so there must be a concept of fast thinking, 00:05:55.280 |
and slow thinking, and reasoning, and reflection, 00:06:05.000 |
one of the most misunderstood things about NVIDIA 00:06:19.900 |
But the truth is you've been spending the past decade 00:06:22.320 |
building the full stack from the GPU, to the CPU, 00:06:25.260 |
to the networking, and especially the software 00:06:27.740 |
and libraries that enable applications to run on NVIDIA. 00:06:34.000 |
But when you think about NVIDIA's moat today, right? 00:06:42.440 |
or smaller than it was three to four years ago? 00:06:53.560 |
and many still do, that you designed a better chip, 00:06:57.440 |
it has more flops, has more flips, and flops, 00:07:00.440 |
and bits, and bytes, you know what I'm saying? 00:07:22.040 |
It is old thinking in the sense that the software 00:07:33.940 |
to improve the system is just making faster and faster chips. 00:07:43.400 |
Machine learning is not about just the software. 00:07:50.020 |
It's about, in fact, the flywheel of machine learning 00:07:54.580 |
So how do you think about enabling this flywheel 00:07:59.100 |
on the one hand, and enabling data scientists 00:08:02.820 |
and researchers to be productive in this flywheel? 00:08:06.340 |
And that flywheel starts at the very, very beginning. 00:08:13.340 |
that it takes AI to curate data to teach an AI. 00:08:40.740 |
and all kinds of different ways of curating data, 00:08:43.920 |
presenting data to, and so before you even get the training, 00:08:47.460 |
you've got massive amounts of data processing involved. 00:08:57.580 |
But don't forget, before PyTorch, there's a huge amount of work. 00:09:11.620 |
a computing architecture that helps you take this flywheel 00:09:16.660 |
It's not one slice of an application, training. 00:09:30.140 |
instead of thinking about, how do I make Excel faster? 00:09:35.580 |
That was kind of the old days, isn't that right? 00:09:40.940 |
And this flywheel has a whole bunch of different steps. 00:09:43.780 |
And there's nothing easy about machine learning, 00:09:46.440 |
There's nothing easy about what OpenAI does, or X does, 00:09:51.780 |
I mean, there's nothing easy about what they do. 00:09:56.420 |
this is really what you ought to be thinking about. 00:10:13.740 |
I didn't really accelerate the entire process by that much. 00:10:25.060 |
can you really materially improve that cycle time. 00:10:33.580 |
is really, in the end, what causes the exponential rise. 00:10:37.380 |
And so what I'm trying to say is that our perspective about, 00:10:48.100 |
And notice, I've been talking about this flywheel-- 00:11:23.860 |
- And people are only thinking about text models today. 00:11:27.940 |
- But the future is, you know, this video models, 00:11:40.500 |
Language models are gonna be involved in everything. 00:11:43.540 |
It took the industry enormous technology and effort 00:11:56.180 |
- I don't mean to be overly simplistic about this, 00:12:00.460 |
we hear it all the time from investors, right? 00:12:10.260 |
What I hear you saying is that in a combinatorial system, 00:12:16.380 |
So I heard you say that our advantage is greater today 00:12:26.340 |
Is that, you know, when you think about, for example, 00:12:33.220 |
Who had a dominant mode, a dominant position in the stack 00:12:38.900 |
Perhaps just, you know, again, boil it down a little bit. 00:12:41.980 |
You know, compare, contrast your competitive advantage 00:12:53.420 |
Intel is extraordinary because they were probably 00:13:01.940 |
at manufacturing, process engineering, manufacturing, 00:13:15.420 |
And designing the chip and architecting the chip 00:13:29.540 |
Our company is a little different in the sense that, 00:13:34.220 |
and we recognize this, that in fact, parallel processing 00:13:38.820 |
doesn't require every transistor to be excellent. 00:13:42.140 |
Serial processing requires every transistor to be excellent. 00:13:45.180 |
Parallel processing requires lots and lots of transistors 00:13:50.780 |
I'd rather have 10 times more transistors, 20% slower, 00:14:04.340 |
single-threaded processing and parallel processing 00:14:13.260 |
We want to be very good, as good as we can be. 00:14:16.420 |
But our world is really about much better going up. 00:14:19.780 |
Parallel computing, parallel processing is hard 00:14:22.300 |
because every single algorithm requires a different way 00:14:27.300 |
of refactoring and re-architecting the algorithm 00:14:32.500 |
What people don't realize is that you can have 00:14:39.060 |
You could take software and compile down to the ISA. 00:14:42.140 |
That's not possible in accelerated computing. 00:14:45.580 |
The company who comes up with the architecture 00:14:52.900 |
because of our domain-specific library called cuDNN. 00:14:58.660 |
because it's one layer underneath PyTorch and TensorFlow 00:15:03.180 |
and back in the old days, Caffe and Theano, and now Triton. 00:15:08.180 |
There's a whole bunch of different frameworks. 00:15:16.580 |
we have a domain-specific library called Quantum, 00:15:24.380 |
- Industry-specific algorithms that sit below 00:15:28.260 |
that PyTorch layer that everybody's focused on. 00:15:33.300 |
- If we didn't invent that, no application on top could work. 00:15:50.900 |
- There's all this attention now on inference, finally. 00:15:55.420 |
But I remember two years ago, Brad and I had dinner with you 00:16:01.740 |
"Do you think your moat will be as strong in inference 00:16:08.100 |
- Yeah, and I'm sure I said it would be greater. 00:16:32.300 |
- It was inference, training is inferencing at scale. 00:16:50.260 |
You could still go and optimize it for other architectures, 00:17:08.500 |
you want your best new gear to be used for training, 00:17:13.500 |
which leaves behind gear that you used yesterday. 00:17:27.780 |
behind the new infrastructure that's CUDA compatible. 00:17:32.820 |
about making sure that we're compatible throughout, 00:17:42.820 |
into continuously reinventing new algorithms, 00:17:48.740 |
the Hopper architecture is two, three, four times better 00:17:54.180 |
so that infrastructure continues to be really effective. 00:18:02.340 |
notice it helps every single install base that we have. 00:18:06.820 |
Hopper is better for it, Ampere is better for it, 00:18:13.380 |
that they had just decommissioned the Volta infrastructure 00:18:18.900 |
And so I think we leave behind this trail of install base. 00:18:23.380 |
Just like all computing, install base matters. 00:18:27.500 |
we're on-prem and all the way out to the edge. 00:18:42.980 |
And so I think this idea of architecture compatibility 00:18:52.900 |
I think the install base is really important for inference. 00:19:03.540 |
these large language models and the new architectures of it, 00:19:07.020 |
we're able to think about how do we create architectures 00:19:11.820 |
that's excellent at inference someday when the time comes. 00:19:15.260 |
And so we've been thinking about iterative models 00:19:20.900 |
and how do we create very interactive inference experiences 00:19:30.820 |
and have to go off and think about it for a while. 00:19:32.220 |
You want it to interact with you quite quickly. 00:19:47.020 |
And so you want to optimize for this time to first token. 00:19:52.020 |
And time to first token is insanely hard to do actually, 00:19:57.020 |
because time to first token requires a lot of bandwidth. 00:20:07.340 |
And so you need an infinite amount of bandwidth, 00:20:11.860 |
in order to achieve just a few millisecond response time. 00:20:15.620 |
And so that architecture is really hard to do. 00:20:18.460 |
And we invented a Grace Blackwell NVLink for that. 00:20:22.500 |
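To see why time to first token and fast interaction are bandwidth problems, here is a rough back-of-envelope sketch; the model size, precision, and bandwidth figures below are hypothetical illustrations, not NVIDIA's numbers:

```python
# Back-of-envelope sketch of why fast token generation is
# bandwidth-bound: each generated token must stream roughly the
# full set of model weights through memory. All figures below
# (model size, precision, bandwidths) are hypothetical.

def ms_per_token(params: float, bytes_per_param: float,
                 bandwidth_gbps: float) -> float:
    """Lower-bound milliseconds per token from weight-streaming alone."""
    model_bytes = params * bytes_per_param
    return model_bytes / (bandwidth_gbps * 1e9) * 1e3

# Hypothetical 70B-parameter model stored at 1 byte per parameter:
for bw in (2_000, 8_000, 30_000):  # aggregate memory bandwidth, GB/s
    print(f"{bw:>6} GB/s -> ~{ms_per_token(70e9, 1, bw):.1f} ms/token")
```

Even in this simplified model, cutting per-token latency to a few milliseconds requires tens of terabytes per second of aggregate bandwidth, which is the kind of pressure that motivates scaled-up interconnects.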
In the spirit of time, I have more questions about that, 00:20:33.980 |
So, you know, I was at a dinner with Andy Jassy earlier. 00:20:38.420 |
- See, now we don't have to worry about the time. 00:20:42.900 |
And Andy said, you know, we've got Trainium, you know, 00:20:57.820 |
and will remain a huge and important partner for us. 00:21:09.620 |
that are going to go after targeted application, 00:21:14.300 |
maybe, you know, Trainium at Amazon, you know, 00:21:23.620 |
Do any of those things change that dynamic, right? 00:21:35.820 |
- We're trying to accomplish different things. 00:21:39.820 |
is build a computing platform for this new world, 00:21:44.060 |
this generative AI world, this agentic AI world. 00:21:46.860 |
We're trying to create, you know, as you know, 00:22:02.660 |
the way that you process software from CPUs to GPU, 00:22:12.700 |
And so software tools to artificial intelligence. 00:22:27.940 |
And this is really the complexity of what we do. 00:22:39.540 |
the data center is now the unit of computing. 00:22:54.380 |
And we're trying to build a new one every year. 00:23:01.020 |
We're trying to build a brand new one every single year. 00:23:04.980 |
we deliver two or three times more performance. 00:23:12.580 |
we improve the energy efficiency by two or three times. 00:23:32.260 |
at the pace that we're doing is incredibly hard. 00:23:40.100 |
and instead of selling it as an infrastructure, 00:23:59.140 |
We have to get all of our architectural libraries, 00:24:07.060 |
We get our security system integrated into theirs. 00:24:09.460 |
We get our networking integrated into theirs. 00:24:24.420 |
It's madness that you're trying to do this every year. 00:24:33.660 |
Clark's just back from Taipei, and Korea, and Japan, 00:24:50.180 |
- Yeah, when you break it down systematically, 00:25:04.060 |
that the entire ecosystem of electronics today 00:25:10.780 |
to build, ultimately, this cube of a computer 00:25:14.500 |
integrated into all of these different ecosystems, 00:25:20.420 |
So, there's obviously APIs, and methodologies, 00:25:44.020 |
- When the time comes, all these things in Taiwan, 00:25:48.820 |
they're gonna land somewhere in Azure's data center, 00:25:54.580 |
- Someone just calls an OpenAI API and it just works. 00:26:21.100 |
People use it in robotic systems now and humanoid robots. 00:26:33.420 |
- Clark, I don't want you to leave the impression 00:26:38.100 |
What I meant by that when relating to your ASIC 00:26:48.260 |
- As a company, we want to be situationally aware, 00:26:55.940 |
of everything around our company and our ecosystem. 00:26:59.180 |
I'm aware of all the people doing alternative things 00:27:03.180 |
and sometimes it's adversarial to us, sometimes it's not. 00:27:10.380 |
But that doesn't change what the purpose of the company is. 00:27:26.860 |
We're not trying to take any share from anybody. 00:27:34.140 |
not one day does this company talk about market share, 00:27:39.260 |
All we're talking about is how do we create the next thing? 00:27:45.540 |
In that flywheel, how can we do a better job for people? 00:27:49.100 |
How do we take that flywheel that used to take about a year, 00:27:58.020 |
And so we're thinking about all these different things, 00:28:09.620 |
The only question is whether that mission is necessary. 00:28:32.180 |
and you're about to decide how to become a company. 00:29:06.980 |
we present our roadmap to them years in advance. 00:29:45.260 |
you said recently that the demand for Blackwell is insane. 00:29:48.940 |
You said one of the hardest parts of your job 00:30:02.020 |
But critics say this is just a moment in time, right? 00:30:13.140 |
I think about the start of '23 when we were having dinner. 00:30:17.140 |
The forecast for NVIDIA at that dinner in January of '23 00:30:47.780 |
because we had folks like Mustafa from Inflection 00:30:57.420 |
"Well, if you can't pencil out investing in our companies, 00:31:12.980 |
these 25 analysts were so focused on the crypto winter 00:31:16.540 |
that they couldn't get their head around an imagination 00:31:28.940 |
that it's going to be that way for as far as you can, 00:31:33.140 |
Of course, the future is unknown and unknowable, 00:31:38.220 |
that this isn't going to be the Cisco-like situation 00:31:55.660 |
what are the first principles of what we're doing? 00:32:06.940 |
The way that computing will be done in the future 00:32:16.380 |
Word, Excel, PowerPoint, Photoshop, Premier, you know, 00:32:21.380 |
AutoCAD, you give me your favorite application 00:32:29.020 |
I promise you it will be highly machine-learned 00:32:34.620 |
and on top of that, you're gonna have machines, 00:32:40.620 |
And so we know this for a fact at this point, right? 00:32:43.980 |
We've reinvented computing, we're not going back. 00:32:52.620 |
What software can write is gonna be different. 00:33:02.220 |
- Now the question, therefore, is what happens? 00:33:05.220 |
And so let's go back and let's just take a look 00:33:09.100 |
So we have a trillion dollars worth of computers 00:33:23.380 |
And we just know that we have a trillion dollars 00:33:25.020 |
worth of data centers that we have to modernize. 00:33:28.340 |
if we were to have a trajectory over the next four 00:33:46.340 |
You have $50 billion of CapEx you'd like to spend. 00:33:50.580 |
Option A, option B, build CapEx for the future. 00:33:57.180 |
- Now you already have the CapEx of the past. 00:34:07.380 |
Let's just take $50 billion, put it into generative AI. 00:34:13.780 |
- Now, how much of that 50 billion would you put in? 00:34:19.860 |
of infrastructure behind me that's of the past. 00:34:23.180 |
And so now you just, I just reasoned about it 00:34:26.900 |
from the perspective of somebody thinking about it 00:34:28.620 |
from first principles and that's what they're doing. 00:34:34.060 |
So now we have a trillion dollars worth of capacity 00:34:37.660 |
What about, you know, call it $150 billion into it. 00:34:42.020 |
So we have a trillion dollars in infrastructure 00:34:44.980 |
to go build over the next four or five years. 00:34:48.700 |
is that the way that software is written is different 00:34:53.580 |
but how software is gonna be used is different. 00:34:59.460 |
- We're gonna have digital employees in our company. 00:35:01.140 |
In your inbox, you have all these little dots 00:35:04.860 |
In the future, there's gonna be icons of AIs. 00:35:10.660 |
I'm gonna be, I'm no longer gonna program computers 00:35:19.580 |
Now this is no different than me talking to my, 00:35:21.660 |
you know, this morning, I wrote a bunch of emails 00:35:41.740 |
And I wanna be clear about what the outcome should be, 00:35:45.460 |
But I leave enough ambiguous space on, you know, 00:35:51.700 |
- It's no different than how I prompt an AI today. 00:36:03.660 |
This new infrastructure are going to be AI factories 00:36:10.380 |
And they're gonna be running all the time, 24/7. 00:36:14.020 |
- We're gonna have 'em for all of our companies 00:36:21.740 |
So there's a whole layer of computing fabric, 00:36:45.420 |
and the architecture for the AI factory is the same. 00:36:55.060 |
You at least have a trillion of new AI workloads coming on. 00:36:59.220 |
- Give or take, you'll do 125 billion in revenue this year. 00:37:02.860 |
You know, there was, at one point somebody told you 00:37:04.820 |
the company would never be worth more than a billion. 00:37:07.100 |
As you sit here today, is there any reason, right, 00:37:10.540 |
if you're only 125 billion out of a multi-trillion TAM, 00:37:14.580 |
that you're not going to have 2X the revenue, 00:37:16.780 |
3X the revenue in the future that you have today? 00:37:27.740 |
everything is, you know, companies are only limited 00:37:39.580 |
And so the question is, what is our fish pond? 00:37:46.380 |
And this is the reason why market makers think 00:37:50.180 |
about that future, creating that new fish pond. 00:37:54.180 |
It's hard to figure this out looking backwards 00:38:04.740 |
- Yeah, and so, you know, I think the good fortune 00:38:07.660 |
that our company has is that since the very beginning 00:38:13.860 |
That market, and people don't realize this anymore, 00:38:16.140 |
but, you know, we were at ground zero 00:38:23.620 |
- We largely invented this market and all the ecosystem 00:38:27.220 |
and all the graphics card ecosystem, we invented all that. 00:38:30.380 |
And so the need to invent a new market to go serve it later 00:38:40.060 |
And speaking to somebody who's invented a new market, 00:38:42.860 |
you know, let's shift gears a little bit to models 00:39:03.300 |
- Reports are that they'll do 5 billion-ish of revenue 00:39:22.700 |
which we estimate is twice the amount Google had 00:39:25.420 |
at the time of its IPO. - Is that right, okay, wow. 00:39:27.260 |
- And if you look at the multiple of the business, 00:39:33.780 |
which is about the multiple of Google and Meta 00:39:37.660 |
When you think about a company that had zero revenue, 00:39:59.620 |
kind of public awareness and usage around AI. 00:40:02.740 |
- Well, this is one of the most consequential companies 00:40:34.420 |
you know, really believe that the timing matters. 00:40:40.220 |
The one thing that I know is that AI is gonna have 00:41:09.460 |
climate tech researchers, material researchers, 00:41:13.540 |
physical sciences, astrophysicists, quantum chemists. 00:41:33.540 |
And you go deep in there and you talk to the people 00:41:41.860 |
And you take those data points and you come back 00:42:00.220 |
Right now, ag tech, material tech, climate tech, 00:42:04.700 |
you pick your tech, you pick your field of science. 00:42:08.220 |
They are advancing, AI is helping them advancing their work 00:42:16.540 |
every university, unbelievable, isn't that right? 00:42:20.900 |
- It is absolutely going to somehow transform business. 00:42:40.700 |
ChatGPT triggered, it's completely incredible. 00:42:40.700 |
And I love their velocity and their singular purpose 00:43:01.020 |
that can finance the next frontier of models, right? 00:43:04.860 |
And I think there's a growing consensus in Silicon Valley 00:43:16.460 |
And so early on here, we had a lot of model companies, 00:43:19.300 |
Character and Inflection and Cohere and Mistral 00:43:19.300 |
those companies can build the escape velocity 00:43:31.140 |
on the economic engine that can continue funding 00:43:38.380 |
that's why you're seeing the consolidation, right? 00:43:40.820 |
OpenAI clearly has hit that escape velocity. 00:43:40.820 |
It's not clear to me that many of these other companies can. 00:43:48.820 |
Is that a fair kind of review of the state of things 00:43:53.940 |
this consolidation like we have in lots of other markets 00:44:03.860 |
- First of all, there's a different fundamental difference 00:44:09.860 |
between a model and artificial intelligence, right? 00:44:21.620 |
- And so, and artificial intelligence is a capability, 00:44:29.940 |
- The artificial intelligence for self-driving cars 00:44:37.660 |
which is related to the artificial intelligence 00:44:42.460 |
- And so, you have to understand the taxonomy of-- 00:44:57.540 |
all you have to do is replace the word model with GPU. 00:45:06.900 |
that there's a fundamental difference between GPU, 00:45:10.140 |
graphics chip or GPU, versus accelerated computing. 00:45:15.140 |
And accelerated computing is a different thing 00:45:18.460 |
than the work that we do with AI infrastructure. 00:45:33.380 |
Somebody who's really, really good at building GPUs 00:45:35.740 |
have no clue how to be an accelerated computing company. 00:45:38.620 |
I can, there are a whole lot of people who build GPUs. 00:45:52.860 |
but they're not accelerated computing companies. 00:46:00.620 |
accelerators that does application acceleration, 00:46:03.980 |
but that's different than an accelerated computing company. 00:46:06.300 |
And so for example, a very specialized AI application. 00:46:20.860 |
And so, you've got to decide where you want to be. 00:46:23.620 |
There's opportunities probably in all these different areas, 00:46:27.420 |
you have to be mindful of the shifting of the ecosystem 00:46:32.740 |
Recognizing what's a feature versus a product. 00:46:46.140 |
that has the money, the smarts, the ambition. 00:47:00.060 |
They went to Memphis and built a large coherent super cluster 00:47:07.580 |
- So first, three points don't make a line, okay. 00:47:29.300 |
You know, first talk to us a little bit about X 00:47:32.060 |
and their ambitions and what they've achieved. 00:47:33.580 |
But also, are we already at the age of clusters 00:47:46.540 |
acknowledgement of achievement where it's deserved. 00:47:56.860 |
that's ready for NVIDIA to have our gear there, 00:48:05.420 |
had it all hooked up, and it did its first training. 00:48:11.780 |
- That first part, just building a massive factory, 00:48:30.380 |
there's only one person in the world who could do that. 00:48:33.260 |
- I mean, Elon is singular in this understanding 00:48:35.860 |
of engineering and construction and large systems 00:48:58.580 |
And from the moment that we decided to get to go, 00:49:06.300 |
our networking team, our infrastructure computing team, 00:49:08.940 |
the software team, all of the preparation advance, 00:49:12.140 |
then all of the infrastructure, all of the logistics 00:49:17.540 |
and the amount of technology and equipment that came in 00:49:23.740 |
and computing infrastructure and all that technology, 00:49:41.700 |
but it's also kind of nice to just take a step back 00:49:43.820 |
and just, do you know how many days 19 days is? 00:49:48.500 |
And the amount of technology, if you were to see it, 00:49:52.580 |
All of the wiring and the networking and, you know, 00:49:58.020 |
than networking hyperscale data centers, okay? 00:50:05.900 |
And just getting this mountain of technology integrated 00:50:11.060 |
Yeah, so I think what Elon and the X team did, 00:50:14.700 |
and I'm really appreciative that he acknowledges 00:50:24.300 |
But what they achieved is singular, never been done before. 00:50:31.740 |
that's easily the fastest supercomputer on the planet 00:50:54.820 |
- What's the credit of the NVIDIA platform, right? 00:51:02.460 |
And of course there's a whole bunch of X algorithms 00:51:05.380 |
and X framework and X stack and things like that. 00:51:08.580 |
And we got a ton of integration we have to do, 00:51:18.500 |
But you answered that question by starting off saying, 00:51:20.940 |
yes, 200,000 to 300,000 GPU clusters are here, right? 00:51:46.220 |
My sense is that distributed training will have to work. 00:51:58.340 |
and distributed, asynchronous distributed computing 00:52:06.180 |
And I'm very enthusiastic and very optimistic about that. 00:52:17.260 |
the scaling law used to be about pre-training. 00:52:25.220 |
- Post-training has now scaled up incredibly. 00:52:33.820 |
And then now inference scaling has gone through the roof. 00:52:38.820 |
- The idea that a model, before it answers your answer, 00:52:42.460 |
had already done internal inference 10,000 times, 00:52:50.980 |
it's probably done reinforcement learning on that, 00:52:58.500 |
it looked up some information, isn't that right? 00:53:14.500 |
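The math being gestured at here compounds multiplicatively: total inference demand scales with both usage growth and the number of internal reasoning steps per answer. A hypothetical sketch, where both factors are illustrative assumptions rather than measured numbers:

```python
# Hypothetical sketch of how inference-time reasoning compounds demand:
# total inference compute scales multiplicatively with usage growth and
# with internal reasoning steps per answer. Both inputs below are
# illustrative assumptions, not measured figures.

def demand_multiple(usage_growth: float, steps_per_answer: float) -> float:
    """Relative compute vs. a one-step, baseline-usage system."""
    return usage_growth * steps_per_answer

# 10x more queries, each reasoning internally 10,000 times before
# answering (as described above):
print(demand_multiple(10, 10_000))  # 100000.0
```

That multiplicative structure is why inference scaling "goes through the roof": neither factor has to be extreme on its own for the product to be enormous.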
if you just did that math and you compound it with, 00:53:37.580 |
how do we architect it from a data center perspective? 00:53:42.020 |
are there data centers that are gigawatts at a time, 00:54:07.620 |
because NVIDIA is just scaling up or scaling out, 00:54:12.940 |
It's not such that you're only dependent on a world 00:54:17.140 |
where there's a 500,000 or a million GPU cluster. 00:54:24.740 |
you'll have written the software to enable that. 00:54:52.180 |
And so all the model parallelism that's being done, 00:54:59.980 |
all of that stuff is because we did the early work. 00:55:17.620 |
- But first, I think it's cool that they named o1 after the O-1 visa, 00:55:23.460 |
Which is about recruiting the world's best and brightest, 00:55:26.940 |
you know, and bringing them to the United States. 00:55:29.020 |
It's something I know we're both deeply passionate about. 00:55:32.380 |
So I love the idea that building a model that thinks 00:55:40.100 |
Is an homage to the fact that it's these people 00:55:43.900 |
who come to the United States by way of immigration 00:55:49.340 |
bring their collective intelligence to the United States. 00:55:55.340 |
- You know, it was spearheaded by our friend, 00:55:57.900 |
He worked on Pluribus and Cicero when he was at Meta. 00:56:04.180 |
as a totally new vector of scaling intelligence, 00:56:15.260 |
a lot of intelligence can't be done a priori. 00:56:28.300 |
I mean, just, you know, out of order execution 00:56:32.500 |
And so a lot of things that can only be done in runtime. 00:56:40.340 |
or you think about it from an intelligence perspective, 00:56:54.420 |
Sometimes just a quick answer is good enough. 00:57:04.420 |
- You know, depending on the nature of the usage 00:57:13.860 |
So I could totally imagine me sending off a prompt 00:57:27.860 |
"what's your best answer and reason about it for me." 00:57:30.580 |
And so I think the segmentation of intelligence 00:57:42.540 |
And then there'll be some that take five minutes, you know. 00:57:46.940 |
- And the intelligence layer that roots those questions 00:57:55.900 |
I was coaching my son for his AP history test. 00:58:00.740 |
And it was like having the world's best AP history teacher. 00:58:31.540 |
That's the part that most people have, you know, 00:58:48.180 |
And so, you know, everybody's so hyper-focused on NVIDIA 00:58:51.740 |
as kind of like doing training on bigger models. 00:58:57.060 |
Isn't it the case that your revenue, if it's 50/50 today, 00:59:00.420 |
you're going to do way more inference in the future. 00:59:03.780 |
- And then, I mean, training will always be important, 00:59:06.180 |
but just the growth of inference is going to be way larger 00:59:11.220 |
- It's almost impossible to conceive otherwise. 00:59:18.060 |
- But the goal is so that you can be productive 00:59:24.740 |
- Are you already using chain of reasoning and, you know, 00:59:49.420 |
AI software engineers, AI verification engineers. 00:59:52.100 |
And we build them all inside because, you know, 00:59:59.140 |
use the opportunity to explore the technology ourselves. 01:00:01.660 |
- You know, when I walked into the building today, 01:00:12.140 |
flat organizations that can execute quickly, smaller teams. 01:00:16.380 |
You know, NVIDIA is in a league of its own, really, 01:00:20.580 |
you know, at about 4 million of revenue per employee, 01:00:23.860 |
about 2 million of profits or free cash flow per employee. 01:00:30.660 |
that really has unleashed creativity and innovation 01:00:37.780 |
You've broken the mold on kind of functional management. 01:00:40.140 |
Everybody likes to talk about all of your direct reports. 01:00:49.220 |
that's going to continue to allow you to be hyper-creative 01:01:14.660 |
with a hundred million, you know, AI assistants. 01:01:29.340 |
that are just generally good at doing things. 01:01:31.820 |
We'll also have, our inbox is gonna be full of directories 01:01:34.620 |
of AIs that we work with that we know are really good, 01:01:39.580 |
And so AIs will recruit other AIs to solve problems. 01:01:43.340 |
AIs will be in, you know, Slack channels with each other. 01:01:54.220 |
Some of 'em are digital and AI, some of 'em are biological. 01:01:57.460 |
And I'm hoping some of 'em even in mechatronics. 01:01:57.460 |
You just described a company that's producing the output 01:02:14.860 |
Now, you didn't say I was gonna get rid of all my employees. 01:02:20.100 |
in the organization, but the output of that organization, 01:02:28.380 |
AI will change every job. 01:02:28.380 |
into either better earnings or better growth or both. 01:03:08.860 |
- And when that happens, the next email from the CEO 01:03:19.420 |
because we have more ideas than we can explore, 01:03:21.940 |
and we need people to help us think through it 01:03:25.980 |
And so the automation part of it, AI can help us do. 01:03:30.660 |
Obviously, it's gonna help us think through it as well, 01:03:33.540 |
but it's still gonna require us to go figure out 01:03:39.340 |
What problems does this company have to go solve? 01:03:46.740 |
And so as a result, we're gonna hire more people 01:03:55.780 |
obviously we have more ideas today than 200 years ago. 01:04:01.420 |
and even though we're automating like crazy underneath. 01:04:16.460 |
of the automation and the technology of the last 200 years. 01:04:22.500 |
from Adam Smith and Schumpeter's creative destruction, 01:04:26.700 |
you can look at charted GDP growth per person 01:04:41.820 |
And then in the 2000s, it slowed down to about 1.8%. 01:04:58.100 |
And a lot of people have debated the reasoning for this, 01:05:02.980 |
and we're going to leverage and manufacture intelligence, 01:05:06.500 |
then isn't it the case that we're on the verge 01:05:08.660 |
of a dramatic expansion in terms of human productivity? 01:05:14.060 |
And of course, you know, we live in this world, 01:05:21.300 |
either as isolated of a case as an individual researcher. 01:05:21.300 |
now explore science at such an extraordinary scale 01:05:39.260 |
Or that we're designing chips that are so incredible 01:05:44.260 |
at such a high pace and the chip complexities 01:06:01.940 |
because we're using AI and supercomputers to help us. 01:06:05.340 |
The number of employees is growing barely linearly. 01:06:14.940 |
I can spot check it in a whole bunch of different industries. 01:06:30.620 |
is to generalize what is it that we're observing 01:06:33.740 |
and whether this could manifest in other industries. 01:07:00.260 |
- And when I reflect on that, that's my life. 01:07:09.220 |
is because they're world-class at what they do. 01:07:17.700 |
And I have no trouble prompt engineering them. 01:08:03.380 |
You mentioned the tragedy going on in the Middle East. 01:08:18.300 |
about safe AI, about coordination with Washington. 01:08:25.820 |
Do we have a sufficient level of coordination? 01:08:30.340 |
the way we beat the bad AIs is we make the good AIs better. 01:08:41.140 |
that this is a positive net benefit for humanity, 01:08:47.140 |
leaving us in this dystopian world without purpose? 01:09:05.300 |
And the reason for that is because as we know, 01:09:08.620 |
artificial intelligence and large language models 01:09:57.020 |
for vectorization or graphing or whatever it is, 01:10:16.300 |
- That we're building, everybody all over the industry, 01:10:16.300 |
the methodologies, the red teaming, the process, 01:10:30.260 |
All of that, all of the harnesses that are being built 01:10:34.100 |
at the velocity that's been built is incredible. 01:10:51.100 |
around best practices with respect to these critical matters. 01:10:56.460 |
And so that's under-celebrated, under-understood. 01:11:37.020 |
where some of the regulation ought to be done, 01:11:40.380 |
most of the regulation ought to be done at the applications. 01:11:47.180 |
All of the different, all of the different ecosystems 01:11:47.180 |
that already regulate applications of technology 01:11:50.540 |
now have to regulate the application of technology 01:12:07.140 |
don't overlook the overwhelming amount of regulation 01:12:10.540 |
in the world that is going to have to be activated 01:12:13.260 |
for AI, and don't rely on just one universal, 01:12:23.820 |
all of these different agencies were created. 01:12:27.220 |
all these different regulatory bodies were created. 01:12:32.180 |
- I'd get in trouble with my partner, Bill Gurley, 01:12:34.420 |
if I didn't go back to the open source point. 01:12:37.540 |
You guys launched a very important, very large, 01:12:46.260 |
- Obviously, Meta is making significant contributions 01:13:10.580 |
is that, you know, having that open source model 01:13:13.300 |
and also having closed source models, you know, 01:13:21.980 |
does that create the healthy tension for safety? 01:13:26.220 |
Open source versus closed source is related to safety, 01:13:39.740 |
that are, that are the engines of an economic model. 01:14:06.700 |
how would all these different fields of science 01:14:16.420 |
using open source models, create domain-specific AIs. 01:14:25.420 |
and so you have to have that open source model 01:14:30.340 |
So financial services, healthcare, transportation, 01:15:18.660 |
that infinite loop, that loop is, you know, questionable. 01:15:18.660 |
is kind of like, you get a super smart person, 01:15:33.100 |
You know, what comes out is probably not a smarter person. 01:15:51.580 |
- And so the idea that you can have AI models 01:15:54.940 |
exchanging, interacting, going back and forth, 01:16:04.060 |
kind of intuitively suggests it makes sense, yeah. 01:16:11.580 |
340B is the best model in the world for reward systems. 01:16:26.220 |
Irrespective of how great somebody else's model is, 01:16:51.500 |
Your journey is unlikely and incredible at the same time. 01:16:56.940 |
- You survived, like just surviving the early days 01:17:16.820 |
how long can you sustain what you're doing today? 01:17:31.420 |
And is there something else that you would rather be doing? 01:17:36.260 |
- Is this a question about the last hour and a half? 01:17:45.780 |
I couldn't imagine anything else I'd rather be doing. 01:17:53.500 |
I don't think it's right to leave the impression 01:18:07.500 |
Was that ever an expectation that it was fun all the time? 01:18:19.940 |
I take our contribution and our moment in time 01:18:51.460 |
The real question is how long can I be relevant? 01:18:56.820 |
And that only matters, that piece of information, 01:19:09.140 |
I'm not saying this simply because of our topic today. 01:19:15.020 |
to stay relevant and continue to learn because of AI. 01:19:19.620 |
I use it, I don't know, but I'm sure you guys do. 01:19:28.060 |
There's not one question that even if I know the answer, 01:19:57.980 |
You know, boy, you guys, it's completely revolutionary. 01:20:02.460 |
And that's just, you know, I'm an information worker. 01:20:10.220 |
that AI will have on society is pretty extraordinary. 01:20:10.220 |
you and I have been at this for a few decades. 01:20:37.900 |
- It's the most consequential moment of our careers. 01:20:51.660 |
That's going to optimistically and safely lead this forward.