
Ep18. Jensen Recap - Competitive Moat, X.AI, Smart Assistant | BG2 w/ Bill Gurley & Brad Gerstner


Chapters

0:00 Introduction and Initial Reactions to Jensen
4:32 NVIDIA's Position in Accelerated Compute
5:11 CUDA and NVIDIA’s Competitive Moat
12:53 Challenges to NVIDIA’s Competitive Advantage
18:22 Future Outlook on Inference
24:46 The Insatiable Demand for AI and Hardware
27:12 Elon Musk and X.AI
31:47 Scaling AI Models and Clusters
34:17 Economic Models and Funding in AI
39:08 The Future of AI Pricing and Consumption Models
42:25 Memory, Actions, and Intelligent Agents
47:08 The Role of AI in Business Productivity
51:03 Open vs. Closed Models in AI Development

Transcript

00:00:00.000 | You may also be running up against the,
00:00:02.380 | even for the Mag 7, the size of CapEx deployment,
00:00:07.380 | where their CFOs start to talk at higher levels.
00:00:10.440 | - For sure, totally.
00:00:12.700 | (upbeat music)
00:00:15.280 | Sonny, Bill, great to see you guys.
00:00:26.980 | - Good to see you.
00:00:28.060 | - Good to be back.
00:00:29.540 | - Thanks, man, it's great to have you.
00:00:31.100 | We literally just finished two days
00:00:32.780 | of the Altimeter annual meetings.
00:00:34.860 | I mean, we had hundreds of investors, CEOs, founders,
00:00:38.540 | and the theme was scaling intelligence to AGI.
00:00:43.120 | We had Nikesh talking about enterprise AI.
00:00:45.740 | We had Rene Haas talking about AI at the edge.
00:00:48.660 | We had Noam Brown talking about, you know,
00:00:51.500 | the Strawberry and O1 model and inference time reasoning.
00:00:54.140 | We had Sonny talking about, you know, accelerating inference.
00:00:58.300 | And of course, we kicked off with Jensen
00:01:01.020 | talking about the future of compute.
00:01:03.460 | You know, I did the Jensen talk with my partner, Clark Tang,
00:01:07.100 | who covers the compute layer and the public side.
00:01:09.540 | We recorded it on Friday.
00:01:11.260 | We'll be releasing it as part of this pod.
00:01:14.020 | And man, was it dense.
00:01:15.460 | I mean, he was, you know, he was on fire.
00:01:18.620 | He told me, I asked him at the beginning of the pod,
00:01:20.240 | "What do you want to do?"
00:01:21.080 | He said, "Grip it and rip it."
00:01:22.260 | And we did.
00:01:23.540 | 90 minutes, we went deep.
00:01:25.560 | I shared it with you guys.
00:01:26.860 | We've all listened to it.
00:01:28.000 | I learned so much playing it back
00:01:30.740 | that I just thought it made sense for us to unpack it,
00:01:34.060 | right, to really, to really analyze it,
00:01:36.980 | see what we agree with, what we may disagree with,
00:01:39.460 | things we want to further explore.
00:01:40.860 | Sonny, any high level reactions to it?
00:01:44.820 | Yeah, you know, first, it's the first time
00:01:47.380 | I've really seen him in a format
00:01:49.340 | where you got all that information out in one setting,
00:01:52.620 | 'cause you kind of get the, you get the tidbits.
00:01:54.820 | And the ones that really struck with me was when he said,
00:01:57.740 | "NVIDIA's not a GPU company.
00:01:59.340 | "They're an accelerated compute company."
00:02:01.880 | I think the next one, you know, which you'll touch on
00:02:05.120 | is where he really said,
00:02:06.060 | "The data center's the unit of compute."
00:02:09.440 | I thought that was massive.
00:02:11.760 | And, you know, sort of just closing out
00:02:13.240 | when he talked about, he thinks about using
00:02:16.680 | and already utilizing so much AI within NVIDIA
00:02:20.120 | and how that's a superpower for them to accelerate
00:02:22.680 | over everyone they're competing with.
00:02:24.000 | I thought those were kind of really awesome points in him,
00:02:27.320 | you know, eating the dog food, as they say.
00:02:29.540 | It is incredible, you know,
00:02:31.380 | there's this thing we'll talk about later,
00:02:33.300 | but he said he thinks they can 3X, you know,
00:02:36.820 | the top line of the business
00:02:38.740 | while only adding 25% more humans
00:02:41.820 | because they can have 100,000 autonomous agents
00:02:45.380 | doing things like building the software,
00:02:47.620 | doing the security,
00:02:49.140 | and that he becomes really a prompt agent,
00:02:51.200 | not only for his human direct reports,
00:02:53.540 | but also for these agents, which, you know,
00:02:56.620 | really is mind-boggling.
00:02:58.360 | Bill, anything stand out for you?
00:03:00.600 | Well, one, I mean, you should be pleased
00:03:03.240 | that you were able to get his time.
00:03:05.620 | You know, this is, at points in time,
00:03:09.360 | the largest market cap company in the world, if not number two.
00:03:13.980 | And so it was so, I think, kind of him
00:03:16.840 | to sit down with you for so long.
00:03:18.320 | And during the pod, he kept saying,
00:03:20.600 | "I can stay as long as you want."
00:03:22.360 | I was like, "Doesn't he have something to be doing?"
00:03:24.800 | (laughing)
00:03:26.900 | He's incredibly generous and-
00:03:28.620 | It's fantastic.
00:03:29.940 | But my other big, I mean, I had two big takeaways.
00:03:33.380 | One, I mean, it's obvious that this guy's, you know,
00:03:37.540 | rolling on all cylinders here, right?
00:03:39.700 | Like you have a company at a 3.3 trillion market cap
00:03:44.700 | that's still growing over 100% a year.
00:03:47.540 | And the margins are insane.
00:03:49.620 | I mean, 65% operating margins.
00:03:52.380 | There's only like five companies in the S&P 500
00:03:55.200 | at that level.
00:03:56.120 | And they certainly aren't growing at this pace.
00:03:59.120 | And when you bring up that point about getting more done
00:04:02.960 | on the increment with fewer employees,
00:04:04.920 | where's this gonna go?
00:04:06.360 | Like 80% operating margins?
00:04:08.720 | I mean, that would be unprecedented.
00:04:10.040 | There's a lot that's already here that's unprecedented,
00:04:13.300 | but obviously Wall Street is fully aware
00:04:17.200 | of the unbelievable performance of this company.
00:04:21.120 | And, you know, the multiples reflected
00:04:23.580 | and the market cap reflects it,
00:04:25.760 | but it's super powerful how they're executing.
00:04:29.200 | And you can see the confidence in every answer
00:04:31.880 | that he gives.
00:04:32.960 | We spent about a third of the pod
00:04:35.060 | on NVIDIA's competitive moat,
00:04:37.120 | really trying to break it down,
00:04:38.440 | really trying to understand this idea
00:04:40.720 | of systems level advantages,
00:04:42.600 | the combinatorial advantages that he has in the business.
00:04:46.460 | Because I think when I talk to people
00:04:48.680 | around the investment community,
00:04:50.080 | despite how well it's covered, Bill, right?
00:04:52.640 | There's still this idea that it's just a GPU
00:04:55.680 | and that somebody is gonna build a better chip.
00:04:57.840 | They're gonna come along and displace the business.
00:05:00.360 | And so when he said, again,
00:05:02.920 | it can sound like marketing speak, Sonny,
00:05:05.000 | when somebody says it's not a GPU company,
00:05:07.620 | it's an accelerated compute company.
00:05:10.080 | You know, we showed this chart
00:05:12.600 | where you can see kind of the NVIDIA full stack.
00:05:15.240 | And he talked about how he just built layer after layer
00:05:18.200 | after layer of the stack, you know,
00:05:20.280 | over the course of the last decade and a half.
00:05:23.600 | But when he said that, Sonny,
00:05:25.200 | I know you had a reaction to it, right?
00:05:27.440 | Even though you know, it's not just a GPU company,
00:05:30.560 | when he really broke it down,
00:05:32.640 | it seemed like, you know, he did break new territory here.
00:05:36.600 | Yeah, like what was great to hear from him
00:05:39.680 | and really, you know, positive for, you know,
00:05:42.120 | folks thinking about where NVIDIA
00:05:44.000 | lives in the stack right now,
00:05:45.640 | is he kind of got into details
00:05:48.080 | and then the sub details below CUDA.
00:05:50.280 | And he really started going into what they're doing
00:05:53.840 | very particularly on mathematical operations
00:05:56.280 | to accelerate their partners
00:05:58.000 | and how they work really closely with their partners.
00:06:00.520 | You know, all the cloud service providers
00:06:03.040 | to basically build these functions
00:06:04.560 | so that they can further accelerate workloads.
00:06:06.840 | The other little nuance that I picked up in there,
00:06:09.320 | he didn't focus purely on LLMs.
00:06:11.760 | He talked in that particular area
00:06:14.080 | about how they're doing that
00:06:15.320 | for a lot of traditional models
00:06:17.640 | and even newer models are being deployed for AI.
00:06:20.200 | And I think just really showed
00:06:22.400 | how they're partnering much closer on the software layer
00:06:25.680 | than the hardware layer alone.
00:06:27.800 | Right, I mean, in fact, you know,
00:06:29.600 | he talked about, you know, the CUDA library
00:06:32.760 | now has over 300 industry specific
00:06:35.440 | acceleration algorithms, right?
00:06:37.720 | Where they deeply learn the industry, right?
00:06:40.520 | So whether this is synthetic biology
00:06:42.480 | or this is image generation, or this is autonomous driving,
00:06:46.480 | they learn the needs of that industry
00:06:48.520 | and then they accelerate the particular workloads.
00:06:51.080 | And that for me was also one of the key things.
00:06:54.760 | This idea that every workload
00:06:57.600 | is moving from kind of this deterministic,
00:07:01.680 | you know, handmade workload
00:07:04.840 | to something that's really driven by machine learning
00:07:07.840 | and really infused with AI
00:07:09.360 | and therefore benefits from acceleration.
00:07:11.720 | Even something as ubiquitous as data processing.
00:07:15.240 | - Yeah, and I shared this code sample with Bill
00:07:17.840 | as, you know, we were just preparing for this pod
00:07:20.480 | and, you know, I knew Bill processed it right away
00:07:23.840 | and then ran it, which was,
00:07:25.480 | it really showed like every piece of code
00:07:27.440 | that's out there now that's related to,
00:07:29.560 | or not every piece, many of the pieces
00:07:31.360 | have this like sort of, if device equals CUDA,
00:07:34.240 | do X, and if it's not, do Y.
00:07:36.320 | And that's the level of impact they're having
00:07:38.280 | across the, you know, entire ecosystem of services
00:07:41.560 | and apps that are being built that are related to AI.
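
For reference, a minimal sketch of the pattern being described, assuming PyTorch (where much of this ecosystem code lives); the model and tensor shapes are purely illustrative:

```python
import torch

# The ubiquitous device check: take the accelerated path when an NVIDIA GPU
# (CUDA) is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(1024, 1024).to(device)  # hypothetical model
batch = torch.randn(32, 1024, device=device)    # hypothetical input batch
output = model(batch)                           # runs on the GPU if present, CPU if not
```
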
00:07:43.800 | Bill, I don't know what you thought
00:07:44.760 | when you saw that piece.
00:07:46.880 | - Yeah, I mean, I think there's a question
00:07:51.000 | for the long-term that relates to CUDA.
00:07:54.360 | And I wanna go back to the system point you made later,
00:07:56.880 | Brad, but while we're on CUDA,
00:07:58.760 | is what percentage of developers will touch CUDA
00:08:04.680 | and is that number going up or down?
00:08:07.280 | And I could see arguments on both sides.
00:08:09.920 | You could say the A models are gonna get
00:08:12.400 | more and more hyper-specialized and performance matters
00:08:15.880 | so much that the one, the models that matter the most,
00:08:19.440 | the deployments that matter the most,
00:08:21.160 | they're gonna get as close to the metal as possible
00:08:24.200 | and then CUDA is gonna matter.
00:08:26.080 | The other side you can make is,
00:08:28.320 | those optimizations are gonna live in PyTorch,
00:08:31.120 | they're gonna live in other tools like that.
00:08:33.320 | And the marginal developer's not gonna need to know that.
00:08:37.080 | And I don't, I could make both arguments,
00:08:40.000 | but I think it's an interesting question going forward.
00:08:42.680 | - I mean, I just asked ChatGPT
00:08:44.880 | how many CUDA developers there are today
00:08:46.360 | just to be on top of it.
00:08:47.440 | Three million CUDA developers, right?
00:08:50.120 | And a lot more that touch CUDA that aren't specifically
00:08:54.000 | kind of developing on CUDA.
00:08:55.160 | So it is one of these things
00:08:56.520 | that has become pretty ubiquitous.
00:08:58.640 | And his point was, it's not just CUDA, of course.
00:09:01.000 | It's really full stack, all the way from data ingestion,
00:09:05.480 | all the way through kind of the post-training.
00:09:07.680 | - I think I'm on the latter of your points, Bill.
00:09:09.600 | I think there's gonna be fewer people touching that.
00:09:12.240 | And I do think that's a point where
00:09:14.360 | the moat is not as strong over the longer term, as you say.
00:09:17.880 | And think about like, you know, the way,
00:09:19.760 | the analogy that I would go with is like,
00:09:21.800 | think about the number of iPhone, iOS,
00:09:23.840 | like developers working at Apple building that
00:09:26.240 | versus the number of app developers, right?
00:09:28.200 | And I think you're gonna have a, you know,
00:09:30.280 | 10 to one or a hundred to one ratio of people
00:09:32.640 | building at layers above
00:09:34.480 | versus people building down closer to the bare metal.
00:09:36.920 | - That'd be something to watch.
00:09:38.000 | We can ask more people over time.
00:09:40.520 | Obviously it's a big lock today, for sure.
00:09:43.240 | - You know, and I think, Bill, to your point,
00:09:44.840 | you know, I reached out to Gavin.
00:09:46.720 | Actually, before I did the interview,
00:09:48.280 | Gavin Baker, who's a good buddy
00:09:50.120 | and who obviously knows the space incredibly well,
00:09:52.480 | has followed it at a deeper level
00:09:55.160 | for a longer period of time than I have.
00:09:57.640 | And, you know, like when I asked him
00:09:59.960 | about the competitive advantage,
00:10:02.280 | he really said a lot of the competitive advantages
00:10:04.800 | around this algorithmic diversity
00:10:06.680 | and innovation and why CUDA matters.
00:10:09.640 | He said, if the world standardizes
00:10:11.760 | on transformers on PyTorch,
00:10:14.480 | then it's less relevant for GPUs, you know,
00:10:18.640 | in that environment.
00:10:19.480 | Like if you have a lot of standardization, right,
00:10:22.000 | then advantage goes to the custom ASICs.
00:10:25.960 | But I'll tell you this, you know,
00:10:27.400 | and I've had this conversation with a lot of people.
00:10:29.960 | When I asked Jensen, I pushed him on, you know, custom ASICs.
00:10:33.080 | I was like, hey, you know, you've got, you know,
00:10:35.280 | accelerated inference coming from Meta
00:10:37.240 | with their MTIA chip.
00:10:38.480 | You know, you've got Inferentia and Trainium,
00:10:40.480 | you know, coming.
00:10:41.320 | He's like, yeah, Brad, like they're, you know,
00:10:43.880 | they're my biggest partners.
00:10:44.960 | I actually share my three to five-year roadmap with them.
00:10:47.640 | Yes, they're going to have these point solutions
00:10:50.320 | that are going to do these very specific tasks.
00:10:52.880 | But at the end of the day,
00:10:54.280 | the vast majority of the workloads in the world
00:10:56.840 | that are machine learning and AI infused
00:10:59.000 | are gonna run on NVIDIA.
00:11:00.360 | And the more people I talk to,
00:11:01.840 | the more I'm convinced that that's the case,
00:11:04.280 | despite the fact that there'll be a lot of other winners,
00:11:06.800 | including Groq and Cerebras, et cetera.
00:11:09.320 | And they're acquiring companies.
00:11:11.200 | They're moving up the stack.
00:11:12.360 | They're trying to do more optimization at higher levels.
00:11:15.600 | So they want to extend, obviously, what CUDA is doing.
00:11:19.360 | Don't go to inference yet.
00:11:20.480 | That's a whole nother story.
00:11:22.680 | I'm actually on that bit about the deep integrations.
00:11:26.320 | Because, you know, really that's a playbook
00:11:28.840 | that I think Microsoft really had done well
00:11:32.400 | for a long time in enterprise software.
00:11:34.440 | And you really haven't seen that in hardware ever.
00:11:37.120 | You know, if you go back to say Cisco or the PC era,
00:11:40.320 | or, you know, the cloud era,
00:11:41.320 | you didn't see that deep level integration.
00:11:42.880 | Now, Microsoft pulled it off with Azure.
00:11:44.960 | And when I heard him talking,
00:11:46.360 | all I could think about was, man, that was really smart.
00:11:49.880 | What he's done is he's gotten together,
00:11:51.800 | really understand what the use cases are,
00:11:53.960 | and build an organization
00:11:55.360 | that deeply integrates into his customers,
00:11:58.080 | and does it so well all the way up into his roadmap
00:12:00.920 | that he's much more deeply embedded than anyone else is.
00:12:04.920 | When I heard that part,
00:12:06.280 | I kind of gave him a real tip of the hat on that one.
00:12:08.760 | But what did you, you know, Brad, what was your take on that?
00:12:12.040 | You and I had this conversation
00:12:13.840 | after we first listened to it.
00:12:15.760 | And, you know, if you really telescope out,
00:12:19.200 | you know, he talks as a systems level engineer, right?
00:12:23.200 | Even if you hear like people, you know,
00:12:25.360 | people went to Harvard Business School and say,
00:12:26.760 | how can this guy possibly have 60 direct reports, right?
00:12:30.040 | But how many direct reports does Elon have, right?
00:12:32.760 | These systems level, and he said,
00:12:34.720 | I have situational awareness, right?
00:12:37.960 | I'm a prompt engineer to the best people in the world
00:12:41.080 | at these specific tasks.
00:12:42.560 | I think when I look at this,
00:12:44.120 | the thing that I deeply underappreciated
00:12:46.320 | a year and a half ago about this company
00:12:48.400 | was the systems level thinking, right?
00:12:51.440 | That these are, that he spent years thinking about
00:12:54.640 | how to embed this competitive advantage,
00:12:56.840 | and how it really, it goes all the way from power,
00:12:59.920 | all the way through application.
00:13:01.800 | And every day they're launching these new things
00:13:04.120 | to further embed themselves in the ecosystem.
00:13:06.160 | But I did hear from somebody over the last two days who,
00:13:09.880 | you know, Rene Haas, the CEO of Arm, right?
00:13:13.320 | Rene was also at our event and he's a huge Jensen fan.
00:13:17.800 | He worked eight years at NVIDIA
00:13:20.040 | before becoming the CEO of Arm in 2013.
00:13:23.000 | And he said, listen, nobody is going to assault
00:13:25.440 | the NVIDIA castle head on, right?
00:13:27.840 | Like the mainframe of AI, right?
00:13:30.880 | Is entrenched and it's going to become a lot bigger,
00:13:33.720 | at least as far as the eye can see.
00:13:36.760 | He said, however, if you think about
00:13:39.560 | where we're interacting with AI today, right?
00:13:43.440 | On these devices, on edge devices.
00:13:46.360 | He's like, our installed base at Arm
00:13:48.360 | is 300 billion devices.
00:13:52.160 | And increasingly a lot more of this compute
00:13:56.240 | can run closer to the edge.
00:13:58.920 | If you think about an orthogonal competitor, right?
00:14:03.400 | Again, if he has a deep competitive moat in the cloud,
00:14:06.360 | what's the orthogonal competitor?
00:14:07.640 | The orthogonal competitor peels off
00:14:09.960 | a lot of the AI on the edge.
00:14:11.400 | And I think Arm's incredibly well positioned to do that.
00:14:14.280 | Clearly NVIDIA has got Arm embedded now
00:14:17.000 | in a lot of their, you know,
00:14:19.080 | in a lot of their Grace Blackwell, et cetera.
00:14:22.040 | But that to me would be one area.
00:14:23.680 | Like if you looked out and you said,
00:14:24.920 | where can their competitive advantage,
00:14:27.240 | you know, be challenged a little bit?
00:14:28.560 | I don't think they necessarily have
00:14:30.320 | the same level of advantage on the edge
00:14:32.680 | as they have in the cloud.
00:14:34.960 | - You started the pod by saying, you know,
00:14:37.320 | everyone's heard this in the investment community.
00:14:39.320 | It's not a GPU company, it's a systems company.
00:14:41.600 | And I, in my brain, I think had thought,
00:14:45.120 | oh, well, they've got four in a box
00:14:46.960 | instead of, you know, just one GPU or eight in a box.
00:14:50.720 | At the time I was listening to the podcast
00:14:53.000 | you did with Jensen, I was reading this
00:14:55.440 | Neo cloud playbook and anatomy post by Dylan Patel.
00:14:59.400 | - Yes, that was a good one.
00:15:00.840 | - He goes into extreme detail about the architecture
00:15:05.840 | of some of the larger systems, you know,
00:15:08.640 | like the one at X.AI that we're going to talk about
00:15:11.920 | that was just deployed, which I think is 100,000 nodes
00:15:15.440 | or something like that.
00:15:16.880 | And it literally changed my opinion
00:15:20.000 | of exactly what's going on in the world.
00:15:21.960 | And actually answered a lot of questions I had,
00:15:25.040 | but it appears to me that NVIDIA's competitive advantage
00:15:29.920 | is strongest where the size of the system is largest,
00:15:34.880 | which is another way of saying what Rene said,
00:15:37.040 | it's flipping it on its head.
00:15:38.680 | It's not to say it's weak on the edge,
00:15:42.080 | but it's super powerful
00:15:44.000 | when you put a whole bunch of them together.
00:15:45.920 | That's when the networking piece thrives.
00:15:48.440 | That's where NVLink thrives.
00:15:50.200 | That's where CUDA really comes alive
00:15:52.600 | in the biggest systems that are out there.
00:15:54.920 | And some of the questions that answered for me was,
00:15:58.800 | one, why is demand so high at the high end
00:16:02.400 | and why are nodes available on the internet,
00:16:05.560 | you know, single nodes available on the internet
00:16:07.560 | for at or below cost?
00:16:09.360 | And this starts to get at that,
00:16:10.720 | 'cause you can do things with the large systems
00:16:13.560 | that you just can't do with a single node.
00:16:16.320 | And so those two things can be simultaneously true.
00:16:19.520 | Why was NVIDIA so interested in CoreWeave existing?
00:16:23.480 | Now, I understand like if the biggest systems
00:16:27.320 | are where the biggest competitive advantage is,
00:16:29.640 | you need as many of these big system companies
00:16:32.160 | as you can possibly have.
00:16:33.880 | And there may be, if that trajectory remains true,
00:16:38.880 | you could have an evolution
00:16:40.560 | where customer concentration increases for NVIDIA over time
00:16:45.160 | rather than going the other way.
00:16:46.880 | Depending on how, you know,
00:16:48.080 | if Sam's right that they're gonna spend a hundred billion
00:16:51.520 | or whatever on a single model,
00:16:52.800 | there's only so many places
00:16:54.080 | they're gonna be able to afford that.
00:16:56.720 | But a lot of stuff started to make sense to me
00:16:59.200 | that didn't before.
00:17:00.360 | And I clearly underestimated the scale
00:17:04.240 | of what it meant to be a non-GPU company,
00:17:06.800 | to be a system company.
00:17:08.200 | This goes way, way up.
00:17:11.080 | - Yeah, and you know, again, Bill,
00:17:13.600 | you touched on something that I think
00:17:15.160 | is really important here.
00:17:17.280 | And this is this question of whether their competitive moat
00:17:21.640 | is also as powerful in training as it is in inference, right?
00:17:25.760 | Because I think that there's a lot of doubt
00:17:29.720 | as to whether their competitive moat
00:17:31.400 | is as strong in inference.
00:17:32.520 | But, you know, let's just-
00:17:35.160 | - You wanna flip to that?
00:17:36.680 | - Well, no, but I asked him if it was as strong.
00:17:40.560 | - No, I heard you.
00:17:41.560 | - He actually said it was greater, right?
00:17:44.840 | - I heard him.
00:17:45.680 | - To me, you know, when you think about that,
00:17:48.840 | in the first instance, right,
00:17:51.000 | I think it didn't make a lot of sense.
00:17:52.800 | But then when you really started thinking about it,
00:17:54.520 | he said there's a trail of infra
00:17:55.840 | behind the infrastructure that's already out there
00:17:58.240 | that is CUDA compatible and can be amortized
00:18:01.360 | for all this inference.
00:18:02.440 | And so he, like for example,
00:18:04.520 | referenced that OpenAI had just decommissioned Volta.
00:18:07.320 | So it's like this massive installed base.
00:18:10.080 | And when they improve their algorithms,
00:18:11.920 | when they improve their frameworks,
00:18:13.280 | when they improve their CUDA libraries,
00:18:15.720 | it's all backward compatible.
00:18:17.840 | So Hopper gets better and Ampere gets better
00:18:20.040 | and Volta gets better.
00:18:21.520 | That combined with the fact that he said
00:18:23.760 | everything in the world today
00:18:25.120 | is becoming highly machine learned, right?
00:18:27.960 | Almost everything that we do,
00:18:29.080 | he said almost every single application,
00:18:30.880 | Word, Excel, PowerPoint, Photoshop, AutoCAD,
00:18:33.720 | like it all will run on these modern systems.
00:18:38.720 | Sonny, do you buy that?
00:18:40.840 | Do you buy that, you know, when people go to replace,
00:18:43.880 | you know, compute,
00:18:44.720 | they're gonna replace it on these modern systems?
00:18:46.840 | So when I was listening to it, I was buying it.
00:18:49.920 | But then when I, he said one thing
00:18:51.920 | that kept resonating in my mind,
00:18:53.280 | which he said inference is going to be
00:18:55.960 | a billion times larger than training.
00:18:59.160 | And if you kind of double click into that,
00:19:01.920 | these old systems aren't gonna be sufficient enough, right?
00:19:05.360 | If you're gonna have that much more demand,
00:19:07.200 | that much more workload, which I think we all agree,
00:19:10.000 | then how is it that these old systems,
00:19:12.920 | which are being decommissioned from training
00:19:14.680 | are gonna be sufficient?
00:19:15.920 | So I think that's where that argument didn't hold,
00:19:18.480 | just didn't hold strong enough for me.
00:19:20.320 | If that grows as fast as he says it is,
00:19:22.520 | as fast as, you know, you guys have seen it in their numbers,
00:19:25.760 | then it's gonna be a lot more net new
00:19:28.520 | inference related, you know, deployments.
00:19:31.800 | And there, I don't think that that argument
00:19:34.680 | holds on the transfer from older hardware to newer hardware.
00:19:38.200 | - Well, you said something pretty casually there, right?
00:19:42.560 | Let's underscore this, right?
00:19:44.320 | We were talking about Strawberry and the O1 preview,
00:19:46.960 | and he said there's a whole new vector
00:19:48.440 | of scaling intelligence, inference time reasoning, right?
00:19:52.080 | That's not gonna be single shot,
00:19:53.960 | but it's going to be lots of agent to agent interactions,
00:19:58.760 | thinking time as Noam Brown likes to say, right?
00:20:02.400 | And he said as a consequence of that,
00:20:04.920 | inference is going to 100X, 1,000X, a million X,
00:20:08.440 | maybe even a billion X.
00:20:10.560 | And that in and of itself, right,
00:20:13.160 | to me was, you know, kind of a wow moment.
00:20:16.560 | 40% of their revenues are already inference.
00:20:19.680 | And I said, over time, does your inference
00:20:21.840 | become a higher percentage of your revenue mix?
00:20:24.760 | And he said, of course, right?
00:20:26.600 | But again, I think conventional wisdom
00:20:28.720 | is all around the size of clusters
00:20:30.480 | and the size of training.
00:20:31.680 | And if models don't keep getting bigger,
00:20:33.880 | then their relevance will dissipate.
00:20:35.920 | But he's basically saying every single workload
00:20:39.120 | is gonna benefit from acceleration, right?
00:20:41.520 | It's gonna be an inference workload,
00:20:43.360 | and the number of inference interactions
00:20:45.360 | is gonna explode higher.
00:20:46.800 | Yeah, one technical detail,
00:20:49.400 | which is you need bigger clusters
00:20:51.920 | if you're training bigger models.
00:20:54.080 | But if you're running bigger models,
00:20:55.640 | you don't need bigger clusters.
00:20:56.760 | It can be distributed, right?
00:20:59.080 | And so I think what we're gonna see here
00:21:02.200 | is that the larger clusters will continue to get deployed,
00:21:05.520 | and as Bill said, they'll get deployed for folks,
00:21:07.640 | maybe a limited number of folks that need to deploy it
00:21:09.800 | for a hundred billion dollar runs
00:21:11.680 | or even bigger than that.
00:21:12.960 | But you'll see inference clusters be large,
00:21:16.760 | but not as large as a training clusters
00:21:18.400 | and be a lot more distributed
00:21:19.880 | because you don't need it to be all in the same place.
00:21:22.080 | And I think that's what'll be really interesting.
00:21:24.200 | It was interesting.
00:21:25.200 | He simplified it even more than you did there, Brad.
00:21:28.840 | He said, think about a human.
00:21:30.960 | How much time do you spend learning versus doing?
00:21:34.440 | And he used that analogy as to why
00:21:37.360 | this was gonna be so great.
00:21:38.640 | But I, in a little different way than Sonny,
00:21:43.240 | I thought the argument
00:21:45.080 | that the reason we're gonna be great at inference
00:21:47.080 | is 'cause there's so much of our old stuff laying around
00:21:50.200 | wasn't super solid.
00:21:52.120 | In other words, what if some other company,
00:21:56.680 | Sonny's or some other one decided to optimize inference?
00:22:01.680 | It wasn't an argument for optimization.
00:22:04.160 | It was an argument for cost advantage
00:22:07.400 | because it might be fully distributed or whatever.
00:22:10.240 | And of course, if you had maybe poked him back on that,
00:22:14.000 | he might've had another answer about why for optimization,
00:22:18.440 | but there are clearly gonna be people,
00:22:21.280 | whether it's other chips companies,
00:22:24.760 | some of these accelerator companies,
00:22:26.600 | there are gonna be people working on inference optimization,
00:22:30.000 | which may include edge techniques.
00:22:31.560 | I think some of the accelerators may look like AI CDNs,
00:22:36.080 | if you will, and they're gonna be buying stuff
00:22:37.800 | closer to the customer.
00:22:39.120 | So all TBD, but just the argument
00:22:42.960 | that you've got it left over didn't seem super solid to me.
00:22:46.480 | - And the three fastest companies in inference right now
00:22:49.240 | are not NVIDIA.
00:22:50.240 | - Right, so who are they, Sonny?
00:22:52.760 | Show it, we'll post the leaderboard.
00:22:54.880 | - Yeah, it's a combination of Groq,
00:22:57.520 | Cerebras, and SambaNova, right?
00:22:59.680 | Those are three companies that are not NVIDIA
00:23:02.040 | that are on the leaderboards of all the models that they run.
00:23:05.400 | - You're talking about performance.
00:23:06.640 | Performance is what you're talking about.
00:23:07.480 | - Performance, yeah. - Yeah.
00:23:09.040 | - Yeah, and I would argue even price.
00:23:11.600 | - Yeah.
00:23:12.440 | - And make the argument, why are they faster?
00:23:15.480 | Why are they cheaper in your mind?
00:23:18.240 | But yet, notwithstanding that fact,
00:23:20.840 | NVIDIA is gonna do, let's call it,
00:23:22.680 | 50 or 60 billion of inference this year,
00:23:25.520 | and these companies are still just getting started, right?
00:23:29.400 | Why is their inference business?
00:23:30.840 | Is it just because of installed base?
00:23:33.400 | - Yeah, I think it's a combination of installed base,
00:23:35.560 | and I think it's because that inference market
00:23:37.560 | is growing so incredibly fast.
00:23:39.280 | I think if you're making this decision even 18 months ago,
00:23:42.840 | it would be a really difficult decision
00:23:44.520 | to buy any of those three companies,
00:23:46.200 | because your primary workload was training,
00:23:48.280 | and the first part of this pod,
00:23:50.000 | we talked about how they have such a strong tie-in,
00:23:53.440 | integration to getting training done properly.
00:23:55.880 | I think when it comes to inference,
00:23:57.040 | you can see all the non-NVIDIA folks
00:23:59.640 | can get the models up and running right away.
00:24:01.320 | There is no tie-in to CUDA that's required to go faster,
00:24:04.840 | that's required to get the models running, right?
00:24:06.920 | Obviously, none of the three companies run CUDA,
00:24:09.480 | and so that moat doesn't exist around inference.
00:24:12.760 | - Yeah, CUDA's less relevant in inference.
00:24:15.280 | That's another point worth making.
00:24:18.080 | But I wanted to say one other thing
00:24:19.640 | to what Sonny just said.
00:24:20.720 | If you go back to the early internet days,
00:24:24.240 | and this is just an argument
00:24:25.800 | that optimization takes a while,
00:24:28.280 | all of the startups were running on Oracle and Sun.
00:24:32.880 | Every single (beep) one of them
00:24:34.320 | were running on Oracle and Sun,
00:24:35.960 | and five years later, they were all running on Linux
00:24:38.560 | and MySQL, like in five years.
00:24:41.600 | And so, and it was literally,
00:24:43.480 | it went from 100% to 3%,
00:24:47.080 | and I'm not making that projection
00:24:50.040 | that that's gonna happen here,
00:24:51.120 | but you did have a wholesale shift as the industry,
00:24:56.120 | they went from developing and building it
00:24:59.400 | for the first time to optimizing,
00:25:01.320 | which are really two separate motions.
00:25:03.720 | - It seems to me, I pulled up this chart, right,
00:25:06.000 | that we shared, we made, Bill,
00:25:08.040 | way earlier this year for the pod,
00:25:10.160 | which showed the trillion dollars of new AI workloads
00:25:14.720 | expected over the next four to five years,
00:25:16.960 | and the trillion dollars
00:25:18.360 | of effectively data center replacement.
00:25:21.240 | And I just wanted to get his updated
00:25:23.240 | kind of reaction or forecast,
00:25:24.800 | now that he's had six more months to think about
00:25:28.320 | whether or not he thinks that's achievable.
00:25:31.720 | And what I heard him say was,
00:25:33.760 | "Yes, the data center replacement's
00:25:35.520 | gonna look exactly like that."
00:25:37.720 | Of course, he's just making his best educated guess,
00:25:41.520 | but he seemed to suggest
00:25:42.640 | that the AI workloads could be even bigger, right?
00:25:45.400 | Like that once he saw Strawberry in '01,
00:25:48.480 | that he thought the amount of compute
00:25:51.120 | that was gonna be required to power this,
00:25:53.120 | and the more people I talk to,
00:25:54.960 | the more I get that same sense,
00:25:57.920 | there is this insatiable demand.
00:25:59.720 | So maybe we just touch on this.
00:26:01.800 | He goes on CNBC and he says, "The demand is insane," right?
00:26:06.800 | And I kept trying to push on that.
00:26:08.640 | I was like, "Yeah, but what about MTIA?
00:26:11.600 | What about custom inference?
00:26:13.880 | What about all these other factors?
00:26:16.200 | What if models stop getting so big?"
00:26:19.040 | I said, "Will any of that change the equation?"
00:26:21.920 | And he consistently pushed back and said,
00:26:25.360 | "You still don't understand the amount of demand
00:26:27.840 | in the world because all compute is changing," right?
00:26:32.800 | I thought he had one nuance, that answer,
00:26:35.400 | which was when you asked him that, he said,
00:26:38.480 | "Look, if you have to replace
00:26:40.920 | some amount of infrastructure,"
00:26:42.680 | whatever the number was, was really big,
00:26:44.240 | "and you're part of that,
00:26:46.280 | and you're a CIO somewhere tasked with doing this,
00:26:50.280 | what are you gonna do?
00:26:51.320 | What are you gonna replace it with?
00:26:52.360 | It's accelerated compute."
00:26:53.880 | And then immediately, once you make that choice,
00:26:56.920 | 'cause you're not going to traditional compute,
00:26:58.920 | then NVIDIA is your number one choice.
00:27:00.920 | So I thought he kind of tied that back together
00:27:02.920 | in that like, are you really gonna get yourself in trouble
00:27:06.120 | by having something else there,
00:27:07.440 | or are you just gonna go to NVIDIA?
00:27:09.160 | When he said it, I didn't wanna say that, Bill,
00:27:12.600 | but it felt like the old IBM argument.
00:27:14.520 | Yeah, look, I mean, one thing, Brad,
00:27:16.560 | is this company's public.
00:27:18.040 | When a private company says, "Oh, the demand's insane,"
00:27:21.200 | you know, I immediately get skeptical.
00:27:23.600 | This company's doing 30 billion a quarter,
00:27:26.040 | growing 122%, like, the demand is insane.
00:27:30.600 | Like, we can see it.
00:27:33.040 | There's no doubt about it.
00:27:34.600 | And part of that demand was a conversation
00:27:38.520 | about Elon and x.ai and what they did.
00:27:42.760 | And I thought it was also just incredibly fascinating, right?
00:27:46.400 | I thought it was funny.
00:27:47.360 | I asked him a question about the dinner
00:27:48.760 | that he and Elon and Larry Ellison apparently had.
00:27:51.760 | And he's like, you know, just because that dinner occurred
00:27:55.080 | and they ended up with 100,000 H100s,
00:27:57.560 | don't necessarily connect the dots.
00:28:01.040 | But listen, he confirmed that his mind was blown by Elon.
00:28:07.720 | And he said he is an N of one superhuman
00:28:11.360 | that could possibly pull off,
00:28:13.240 | that could energize a data center,
00:28:15.320 | that could liquid cool a data center.
00:28:17.560 | And he said, what would take somebody else years
00:28:21.120 | to get permitted, to get energized,
00:28:23.040 | to get liquid cooled, to get stood up,
00:28:25.040 | that x.ai did in 19 days, you know?
00:28:29.400 | And you could just tell the immense respect
00:28:32.520 | that he had for Elon.
00:28:34.280 | It's clear, you know,
00:28:35.840 | he said it's the single largest coherent supercomputer
00:28:38.840 | in the world today, that it's gonna get bigger.
00:28:42.160 | And if you believe that the future of AI
00:28:44.360 | is tied closely together with the systems engineering
00:28:48.280 | on the hardware side, you know,
00:28:50.320 | what hit me in that moment was,
00:28:53.000 | that's a huge, huge advantage for Elon.
00:28:56.040 | Yeah, I think he, I forgot the exact number,
00:28:58.600 | but like he talked about how many thousands of miles
00:29:01.400 | of cabling that were just in there.
00:29:04.200 | As part of the task.
00:29:05.960 | Look, you know, coming to it from a bit, you know,
00:29:10.080 | doing a lot of that ourselves right now,
00:29:11.840 | building data centers, standing them up,
00:29:14.000 | racking and stacking, you know, our nodes,
00:29:17.360 | it's impressive.
00:29:18.200 | It's impressive to do something at that scale in 19 days.
00:29:21.680 | You know, it doesn't even include
00:29:24.080 | how quickly they built that data center.
00:29:25.880 | I think it's all happened, you know, within 2024.
00:29:29.520 | And so that's part of the advantage.
00:29:32.040 | The interesting thing there is he didn't touch on it
00:29:35.040 | as much as what, when he talked about it,
00:29:37.200 | doing the integration with cloud service providers.
00:29:39.920 | What I'd love to kind of double click into is,
00:29:42.400 | because, you know, Elon is in a unique situation
00:29:44.640 | where he's obviously bought this cluster.
00:29:46.440 | He has a ton of respect for NVIDIA,
00:29:48.160 | but he, you know, is building his own chip,
00:29:49.720 | building their own clusters with Tesla.
00:29:51.760 | So I wonder how much, you know,
00:29:55.000 | cross correlation or information there is for them,
00:29:58.000 | for them to be able to do that at scale.
00:30:00.200 | And, you know, you guys look at this.
00:30:01.560 | What have you kind of seen on their clusters?
00:30:04.960 | I don't really have a lot of data
00:30:06.560 | on the non-NVIDIA clusters that they have.
00:30:09.840 | I'm sure Freedom, my team, does.
00:30:11.480 | I just don't have it off the top of my head.
00:30:13.080 | If we have it, you know, I'll pull a chart and I'll show it.
00:30:15.560 | Sonny, you said you now think the XAI cluster
00:30:18.880 | is the largest NVIDIA cluster alive today?
00:30:21.560 | I'm saying, 'cause I believe Jensen said it in the pod,
00:30:24.320 | that he said it's the largest supercomputer in the world.
00:30:26.840 | Yeah, I mean, I just want to spend 30 seconds
00:30:30.520 | on what you said, Brad, about Elon.
00:30:32.080 | I'm staring out my window at the Gigafactory in Austin
00:30:35.540 | that was also built in record time.
00:30:37.880 | Starlink's insane.
00:30:39.520 | When we were walking in Diablo, I just kept thinking,
00:30:42.120 | "You know who I'd love to reimagine this place?
00:30:45.040 | Elon," right?
00:30:46.320 | And I don't, the world should study
00:30:49.840 | how he can do infrastructure fast,
00:30:52.640 | because if that could be cloned, it would be so valuable.
00:30:56.600 | Not really relevant to this podcast, but worth noting.
00:30:59.720 | The other thing that I thought about on the Elon thing,
00:31:03.200 | and this also, where these pieces coming together,
00:31:07.000 | my mind about these large clusters
00:31:09.520 | and how important that was to NVIDIA,
00:31:12.760 | he got allocation, right?
00:31:14.720 | This is supposed to be like the hottest company,
00:31:17.920 | the hottest product backed up for years on demand.
00:31:22.920 | And he walks in and takes what equates,
00:31:25.560 | sounds, looks like about 10% of the quarter's availability.
00:31:30.240 | And in my mind, I'm thinking that's because,
00:31:34.680 | hey, if there's another company
00:31:36.160 | that's gonna develop these big ones,
00:31:38.280 | I'm gonna let him to the front of the line.
00:31:40.960 | And that speaks to what's happening in Malaysia
00:31:43.920 | and the Middle East, and any one of these people
00:31:47.520 | that are gonna get excited, he's gonna spend time with them,
00:31:50.680 | put them at the front of the line.
00:31:52.200 | You know, I'll tell you, I pushed him on this.
00:31:55.240 | I said, you know, Elon's gonna,
00:31:57.840 | you know, rumor is that he's gonna get another 100,000,
00:32:01.280 | you know, H200s, add them to this cluster.
00:32:04.120 | I said, are we already at the phase
00:32:06.280 | of two and 300,000 cluster scale?
00:32:09.520 | And he said, yes.
00:32:11.000 | And then I said, and will we go to 500,000 a million?
00:32:14.520 | And he's like, yes.
00:32:15.960 | Now, I think these things, Bill,
00:32:18.400 | are already being planned and built.
00:32:20.960 | And what he said is beyond that, beyond that,
00:32:24.400 | he said, you start bumping up
00:32:25.960 | against the limitations of base power.
00:32:28.920 | Like, can you find something that can be energized
00:32:32.120 | to power a single cluster?
00:32:34.000 | And he said, we're gonna have
00:32:35.120 | to develop distributed training.
00:32:38.400 | And he said, but just like with Megatron
00:32:41.360 | that we developed to enable what is occurring today,
00:32:45.480 | we're working on the distributed stuff
00:32:48.120 | because we know we're gonna have to decompose
00:32:50.480 | these clusters at some point
00:32:52.240 | in order to continue scaling them.
00:32:54.200 | - You may also be running up against the,
00:32:56.600 | even for the Mag 7, the size of CapEx deployment,
00:33:01.600 | where their CFOs start to talk at higher levels.
00:33:04.640 | - For sure, totally.
00:33:07.240 | - And there's a super interesting article
00:33:09.560 | in the information just now where,
00:33:11.960 | it came out today where Sam Altman is questioning
00:33:15.680 | whether Microsoft's willing to put up the money
00:33:18.880 | and build a cluster.
00:33:20.120 | And it may have been,
00:33:22.000 | that may have been kind of triggered
00:33:23.920 | by Elon's comments or Elon's willingness to do it at X.AI.
00:33:28.920 | - What I will say on like the size of the models,
00:33:31.600 | like we're gonna push into this really interesting realm
00:33:34.040 | where obviously we can have bigger
00:33:35.640 | and bigger training clusters.
00:33:37.320 | That naturally imposes that the models
00:33:39.720 | are bigger and bigger.
00:33:40.880 | But what you can't do is you can't take a single,
00:33:43.160 | like you can train a model across a distributed site
00:33:46.120 | and it may just take you a month longer
00:33:48.760 | because you have to move traffic around.
00:33:50.920 | And so instead of taking three months,
00:33:52.040 | it takes you four months.
00:33:53.280 | But you can't really run a model across a distributed site
00:33:55.920 | 'cause that inferences in like real time thing.
00:33:58.680 | And so we do, we're not pushing it there,
00:34:01.440 | but when you start to get to models,
00:34:02.640 | it became way too big to run in single locations.
00:34:05.440 | That may be a problem that we wanna be aware of
00:34:07.520 | and we wanna keep in our minds as well.
00:34:10.560 | - On this question of scaling our way to intelligence.
00:34:15.560 | One of the things I asked Noam Brown today
00:34:20.040 | in our fireside chat,
00:34:21.960 | he made very clear his perspective,
00:34:24.480 | although he's working on inference time reasoning,
00:34:26.800 | which is a totally different vector
00:34:28.600 | and a breakthrough vector at OpenAI,
00:34:31.040 | which we ought to spend a little bit of time talking about.
00:34:33.320 | He said, now there are these two vectors, right?
00:34:37.240 | That again are multiplicative in terms of the path to AGI.
00:34:41.400 | He's like, make no mistake about it,
00:34:43.520 | like we're still seeing big advantages
00:34:45.680 | to scaling bigger models, right?
00:34:47.400 | We have the data, we have the synthetic data,
00:34:50.000 | we're going to build those bigger models
00:34:51.760 | and we have an economic engine that can fund it, right?
00:34:54.960 | Don't forget this company is over 4 billion in revenue,
00:34:58.680 | scaling probably most people think to 10 billion plus
00:35:01.680 | in revenue over the course of the next year.
00:35:04.040 | They just raised 6.5 billion,
00:35:05.560 | they got a $4 billion line of credit from Citigroup.
00:35:08.760 | So among the independent players, Bill, right?
00:35:12.120 | Like Microsoft can choose whether or not
00:35:13.960 | they're going to fund it,
00:35:14.840 | but I don't think it's a question of whether or not
00:35:16.560 | they're gonna have the funding.
00:35:17.920 | At this point, they've achieved escape velocity.
00:35:20.120 | I think for a lot of the other independent players,
00:35:22.520 | there's a real question
00:35:23.520 | whether they have the economic model
00:35:26.160 | to continue to fund the activity.
00:35:28.120 | So they have to find a proxy
00:35:29.840 | because I don't think a lot of venture capitalists
00:35:31.600 | are going to write multi-billion dollar checks
00:35:33.680 | into the players that haven't yet
00:35:35.280 | caught lightning in a bottle.
00:35:37.600 | That would be my guess.
00:35:39.680 | I mean, you know, I just think it's hard.
00:35:43.000 | You know, listen, at the end of the day,
00:35:44.320 | we're economic animals, you know, and I've said before,
00:35:48.360 | you know, if you look at the forward multiple,
00:35:49.960 | most of us underwrote to on open AI,
00:35:52.040 | it was about 15 times forward earnings, right?
00:35:54.480 | If Chad GPT wasn't doing what it was doing,
00:35:56.440 | if the revenue wasn't doing what it was doing, right,
00:35:58.880 | this would have meant massively dilutive to the company.
00:36:01.360 | It would have been very hard to raise the money.
00:36:03.200 | I think if Mistral or all these other companies
00:36:05.800 | want to raise that money, I think it'd be very difficult,
00:36:08.080 | but you know, you never, I mean, you know,
00:36:09.840 | there's still a lot of money out there, so it's possible,
00:36:12.160 | but I think this is, you know, you should-
00:36:13.880 | - You said 15 times earnings, I think you meant revenue.
00:36:16.160 | - Oh, 15 times revenue, for sure.
00:36:18.760 | Which I said, you know, when Google went public,
00:36:21.480 | it was about 13 or 14 times revenue
00:36:23.680 | and Meta was like 13 or 14 times revenue.
00:36:26.360 | So I do think we're on the precipice
00:36:29.520 | of a lot of this consolidation among the new entrants.
00:36:31.760 | What I think is so interesting about X is, you know,
00:36:34.920 | when I was pushing him on this model consolidation,
00:36:37.680 | pushing Jensen on it, he was like, listen,
00:36:40.320 | with Elon, you have somebody with the ambition,
00:36:42.800 | with the capability, with the know-how, with the money,
00:36:45.880 | right, with the brands, with the businesses.
00:36:48.720 | So I think a lot of times when we're talking about AI today,
00:36:52.160 | we oftentimes talk about open AI,
00:36:54.440 | but a lot of people quickly then go
00:36:56.200 | into all of the other model companies.
00:36:58.080 | I think X is often left out of the conversation.
00:37:01.520 | And one of the things I took away
00:37:03.840 | from this conversation with Jensen is, again,
00:37:07.000 | if scaling these data centers
00:37:09.920 | is a key competitive advantage to winning an AI, right,
00:37:15.120 | like you absolutely cannot count out X.AI in this battle.
00:37:19.800 | They're certainly going to have to figure out, you know,
00:37:21.480 | something with the consumer that's going to have a flywheel
00:37:23.920 | like ChatGPT or something with the enterprise.
00:37:26.520 | But in terms of standing it up,
00:37:28.000 | building the model, having the compute,
00:37:30.000 | I think they're, you know,
00:37:32.040 | going to be one of the three or four in the game.
00:37:34.720 | You touched on maybe wanting to close out
00:37:37.960 | on the strawberry-like models.
00:37:41.280 | You know, one thing we don't have exposure to,
00:37:46.040 | but we can guess at is cost.
00:37:48.360 | And that chart that they showed when they released Strawberry,
00:37:52.240 | the X-axis was logarithmic.
00:37:56.280 | So the cost of a search with the new preview model
00:38:01.280 | is probably costing them 20X or 30X
00:38:06.160 | what it does to do a normal ChatGPT search.
00:38:09.760 | - Which I think is fractions of a penny.
00:38:12.560 | - But figuring out which, and it also takes longer.
00:38:15.360 | So figuring out which problems it's acceptable,
00:38:19.400 | and Jensen gave a few examples for it,
00:38:21.760 | to take more time and cost more
00:38:24.120 | and to get the cost benefit right for that type of result
00:38:28.080 | is something we're going to have to figure out,
00:38:29.560 | like which problems tilt to that place.
00:38:32.600 | - Right, and you know,
00:38:33.680 | the one thing I feel good about there,
00:38:35.360 | and again, I'm speculating,
00:38:37.760 | I don't have information from OpenAI on this,
00:38:40.720 | but what we know is that the cost of inference
00:38:42.640 | has fallen by 90% over the course of last year.
00:38:44.760 | What we, you know, what Sonny has told us
00:38:46.840 | and other people, you know, in the field have told us
00:38:49.400 | that inference is going to drop by another 90%
00:38:52.280 | over the course of the next, you know, period of months.
00:38:56.160 | - If you're facing logarithmic needs,
00:38:58.600 | you're going to need that.
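
For reference, a rough back-of-envelope on how those two forces net out, using only the figures mentioned in the conversation; the 25x multiple is an assumed midpoint of the "20X or 30X" range, and the 90% annual reduction is the speakers' estimate, not a disclosed number:

```python
# Relative cost of an O1-style reasoning query vs. a plain ChatGPT query,
# and how successive ~90% inference-cost drops compound against it.
reasoning_multiple = 25      # assumed midpoint of the 20-30x range mentioned above
annual_reduction = 0.90      # "fallen by 90%... another 90%" per the conversation

cost_now = reasoning_multiple                                        # ~25x a cheap query
cost_after_one_drop = cost_now * (1 - annual_reduction)              # ~2.5x
cost_after_two_drops = cost_after_one_drop * (1 - annual_reduction)  # ~0.25x

print(round(cost_after_one_drop, 2), round(cost_after_two_drops, 2))  # 2.5 0.25
```

In other words, two compounding 90% reductions roughly cancel the 20-30X premium of inference-time reasoning, which is the trade-off being gestured at here.
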
00:39:00.440 | - Right, and you know,
00:39:01.880 | and here's what I also think happens, Bill,
00:39:04.960 | is in this chain of reasoning,
00:39:07.600 | you're going to build intelligence
00:39:09.200 | into the chain of reasoning, right?
00:39:11.600 | So that, you know, you're going to optimize
00:39:14.200 | where you send these, you know,
00:39:16.080 | each of these inference interactions,
00:39:18.040 | you're going to batch them,
00:39:19.440 | you're going to take more time with,
00:39:21.120 | because it's just a time money trade-off, right?
00:39:24.080 | At the end of the day.
00:39:25.480 | I also think that we're in the very earliest innings
00:39:28.880 | as to how we're going to think
00:39:31.640 | about pricing these models, right?
00:39:33.360 | So if we think about this in terms of System 1,
00:39:35.560 | System 2 level thinking, right?
00:39:38.080 | System 1 being, you know,
00:39:39.400 | what's the capital of France, right?
00:39:41.480 | You're going to be able to do that for fractions of a penny
00:39:44.240 | using pretty simple models on chat GPT, right?
00:39:49.000 | When you want to do something more complex,
00:39:50.960 | if you're a scientist and you want to use O1
00:39:53.120 | as your research partner, right?
00:39:55.000 | You may end up paying it by the hour
00:39:56.600 | and relative to the cost of an actual research partner,
00:39:59.480 | it may be really cheap, right?
00:40:01.400 | So I think there are going to be consumption models,
00:40:04.240 | you know, for this.
00:40:05.080 | I think we haven't even scratched the surface
00:40:07.680 | to think about how that's going to be priced,
00:40:09.280 | but I totally agree with you
00:40:11.680 | that it's going to be priced very differently.
00:40:13.280 | Again, I think this puts,
00:40:14.920 | I think OpenAI has suggested, you know,
00:40:18.840 | that the O1 full model may even be released yet this year,
00:40:23.160 | right?
00:40:24.360 | One of the things that I'm kind of waiting to see
00:40:27.600 | is I think, you know, listen,
00:40:29.200 | having known Noam Brown for quite a while now,
00:40:32.000 | he's an N of one, right?
00:40:33.920 | And he wasn't the only one working on this
00:40:35.720 | for sure at OpenAI, but, you know, listen,
00:40:39.360 | whether it was pluribus or winning at the game of diplomas,
00:40:42.920 | he's been thinking about this for a decade, right?
00:40:46.080 | It was his major breakthrough
00:40:47.680 | on how to win the game of six-handed poker.
00:40:50.280 | And so he brought this to OpenAI.
00:40:52.720 | I think they have a real lead here,
00:40:55.120 | which leads me back to this question, Bill,
00:40:57.480 | you and I talk about all the time,
00:40:59.680 | which is memory and actions, right?
00:41:02.280 | And so I have to tell you this funny thing
00:41:04.880 | that occurred at our investor day.
00:41:08.440 | So I had Nikesh on stage and, you know,
00:41:11.040 | obviously Nikesh, you know,
00:41:12.560 | was instrumental at Google for a decade.
00:41:14.760 | And so I wanted to talk to him about both consumer AI
00:41:16.920 | as well as enterprise AI.
00:41:18.680 | And I asked him, I said, I want to make a wager with you.
00:41:21.440 | I knew of course he would take a bet.
00:41:23.840 | And I said, I want to make a wager with you.
00:41:26.280 | Over, under, I'll set the line at two years
00:41:29.800 | until we have an agent that has memory and can take action.
00:41:33.720 | And the canonical use case, of course,
00:41:35.720 | that I used was that I could tell my agent,
00:41:38.560 | book me the Mercer Hotel next Tuesday
00:41:40.640 | in New York at the lowest price.
00:41:42.680 | And I said, over, under, you know,
00:41:44.800 | two years on getting that done.
00:41:46.640 | I said, I'll start 5,000 bucks, I'll take the under.
00:41:51.640 | He snap calls me, he says, I'll take the over.
00:41:55.320 | And he said, but only if you 10X the bet.
00:41:58.320 | And of course we're doing it for a good cause.
00:42:02.080 | So I had to call him because I, you know,
00:42:04.600 | I can't not step up to a good cause.
00:42:09.080 | So we're taking the opposite sides of that trade.
00:42:11.520 | Now, what was interesting is over the course
00:42:13.400 | of the next couple of days,
00:42:14.720 | I asked some other friends who took the stage, you know,
00:42:18.280 | where they would come down on the same bet, right?
00:42:22.600 | Our friend, Stanley Tang took the under.
00:42:25.240 | A friend from Apple, who will remain nameless,
00:42:29.080 | kind of took the over.
00:42:30.080 | And then Noam Brown, who was there, pleaded the fifth.
00:42:34.240 | He says, I know the answer, so I can't say.
00:42:36.640 | And so, yeah, it was kind of provocative.
00:42:40.560 | And I, you know, I texted Nikesh and I said,
00:42:43.320 | I think you better get your checkbook ready.
00:42:46.320 | You know, so coming back to that, Bill,
00:42:49.560 | you know, Strawberry O1's an incredible breakthrough,
00:42:52.000 | something that thinks,
00:42:52.960 | so this whole new vector of intelligence,
00:42:55.920 | but it kind of makes us forget
00:42:58.080 | about the thing you and I focus so much on,
00:42:59.920 | which was memory and actions, right?
00:43:02.520 | And I think that we are on the real precipice
00:43:06.240 | of not only can these models think, you know,
00:43:09.040 | and spend more time thinking,
00:43:10.640 | not only can they give us fewer hallucinations, you know,
00:43:14.680 | with just scaled compute, but I also think,
00:43:17.440 | I mean, you already see the makings of this.
00:43:19.120 | I mean, use these things today.
00:43:20.360 | They already remember quite a bit.
00:43:22.840 | So I think they're sliding this into the experience,
00:43:26.760 | but I think we're going to have the ability
00:43:28.360 | to take simple actions.
00:43:29.520 | And I think this metaphor that people had in their minds,
00:43:32.280 | that they were going to have to build deep APIs
00:43:34.840 | and deep integrations to everybody,
00:43:37.800 | I don't think is the way this is going to play out.
00:43:40.040 | And let me just--
00:43:41.480 | What do you think is going to play out?
00:43:42.760 | Well, I mean, the Easter egg
00:43:44.000 | that I thought got dropped last week
00:43:45.640 | is they did this event on, you know, their voice API, right?
00:43:50.560 | And it's literally your GPT calling a human
00:43:53.680 | on the telephone and placing an order.
00:43:55.760 | So why the hell can't my GPT just call up the Mercer Hotel
00:43:59.600 | and say, "Brad Gerstner would like to make a reservation.
00:44:02.240 | Here's his credit card number,"
00:44:03.360 | and pass along the information?
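A minimal sketch of the phone-booking flow Brad describes, assuming hypothetical stand-in helpers (place_call, say, hear, parse_price) rather than any real voice API; the point is only the orchestration logic: state the request, hear a quoted price, and stop before handing over payment if the quote blows past a cap.

```python
# Hypothetical sketch of an assistant booking a room by phone.
# place_call / say / hear are stand-ins, not a real voice API.
from dataclasses import dataclass


@dataclass
class ReservationRequest:
    guest: str
    hotel: str
    night: str
    max_price_usd: float


def place_call(hotel: str) -> None:
    print(f"[dialing {hotel}...]")


def say(line: str) -> None:
    print(f"assistant: {line}")


def hear() -> str:
    # In a real system this would be speech-to-text of the clerk's reply;
    # here we fake a quoted price so the sketch runs end to end.
    return "We can do that night for 550 dollars."


def parse_price(reply: str) -> float:
    digits = [tok for tok in reply.split() if tok.replace(".", "").isdigit()]
    return float(digits[0]) if digits else float("inf")


def book_by_phone(req: ReservationRequest) -> bool:
    place_call(req.hotel)
    say(f"{req.guest} would like a room for {req.night}. What is your best rate?")
    quoted = parse_price(hear())
    if quoted > req.max_price_usd:
        say("That's above our limit, thank you anyway.")
        return False
    say("Great, please hold the room; payment details will follow securely.")
    return True


if __name__ == "__main__":
    print(book_by_phone(ReservationRequest("Brad Gerstner", "Mercer Hotel",
                                           "next Tuesday", 800.0)))
```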
00:44:05.400 | There is a reason for that.
00:44:06.560 | I mean, look, scrapers and form fillers have existed
00:44:11.400 | for how long, Sonny?
00:44:12.640 | 15 years?
00:44:13.800 | Like, you could write an agent to go fill out
00:44:17.320 | and book at the Mercer Hotel 15 years ago.
00:44:20.320 | There's nothing impossible about that.
00:44:22.760 | It's the corner cases and, like, the hallucination
00:44:26.360 | when your credit card gets charged 10 grand.
00:44:29.120 | Like, you just can't have failure.
00:44:31.680 | And how you architect this
00:44:34.240 | so that there's not failure and there's trust,
00:44:36.600 | I'm sure you could demo this tomorrow.
00:44:38.960 | I have zero doubt you could demo it tomorrow.
00:44:41.400 | Could you provide it at scale in a trustworthy way
00:44:44.520 | where people are allocating their credit cards to it?
00:44:47.480 | That might take a little longer.
00:44:48.560 | Okay, so over-under, Bill, on two years.
00:44:51.800 | I mean, I'm gonna get to action either way.
00:44:54.840 | But what's the test?
00:44:55.960 | The demo?
00:44:56.800 | I think you can do it today.
00:44:57.640 | No, not the cheesy demo you just said.
00:44:59.720 | I'm talking about a release that allows me,
00:45:02.520 | you know, at scale to book a hotel.
00:45:04.760 | Where it's spending your credit card?
00:45:06.600 | And not just you, but everybody, full release?
00:45:09.600 | Yeah, we'll call it a full release,
00:45:11.120 | just because I know that's the only way
00:45:12.440 | I can entice you to take the bet.
00:45:16.240 | Which today is October 8th, 2024.
00:45:20.360 | I mean, Sonny, you already know what he's gonna say.
00:45:22.960 | You'll take the over, right, Bill?
00:45:25.280 | Yeah, yes.
00:45:26.120 | Okay, so Bill's in the Nikesh camp.
00:45:27.560 | Sonny, where do you come down?
00:45:28.640 | Over-under on two years.
00:45:30.040 | No, don't start hedging, Bill.
00:45:31.720 | Don't start hedging.
00:45:32.560 | Go ahead, Sonny.
00:45:33.400 | I already said it, demo today.
00:45:34.400 | It's 15 years ago you could do that.
00:45:36.920 | Let me comment on what you're worried about, Bill.
00:45:39.320 | And I think people are still working
00:45:41.840 | their way through it.
00:45:42.800 | You don't need a single agent right now
00:45:44.960 | to book the Mercer and deal with all the scraping stuff
00:45:47.280 | you're talking about.
00:45:48.200 | You can have a thousand agents working together.
00:45:50.400 | You can have one that's making sure
00:45:51.600 | that the credit card charge is not too big.
00:45:53.480 | You can have another one to make sure
00:45:54.760 | that the address is right.
00:45:55.800 | You can have another one checking against your calendar.
00:45:57.880 | And so all of that's free.
00:45:59.600 | So I'm on the under, and Brad,
00:46:01.320 | I'll even go under one year.
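A minimal sketch of the checking pattern Sonny describes: several independent verifier agents, each with veto power over a proposed booking. All names, thresholds, and data here are hypothetical illustrations, not any product's API.

```python
# Hypothetical sketch: a booking only executes if every checker agent approves.
from dataclasses import dataclass
from datetime import date


@dataclass
class ProposedBooking:
    hotel: str
    check_in: date
    guest_address: str
    total_charge_usd: float


def charge_checker(b: ProposedBooking, max_usd: float = 1_000.0) -> bool:
    # Rejects anything that looks like the $10k mischarge Bill worries about.
    return 0 < b.total_charge_usd <= max_usd


def address_checker(b: ProposedBooking, expected_address: str) -> bool:
    return b.guest_address.strip().lower() == expected_address.strip().lower()


def calendar_checker(b: ProposedBooking, busy_dates: set[date]) -> bool:
    return b.check_in not in busy_dates


def approve(b: ProposedBooking, expected_address: str, busy: set[date]) -> bool:
    # Every checker has veto power; the booking proceeds only if all agree.
    return all([
        charge_checker(b),
        address_checker(b, expected_address),
        calendar_checker(b, busy),
    ])


if __name__ == "__main__":
    booking = ProposedBooking(
        hotel="Mercer Hotel",
        check_in=date(2024, 10, 15),
        guest_address="123 Example St, New York, NY",
        total_charge_usd=650.0,
    )
    print(approve(booking, "123 Example St, New York, NY", busy=set()))
```

The design point, on Sonny's framing, is simply separation of duties: the agent that drafts the booking never gets to approve its own credit card charge.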
00:46:03.400 | Wow, wow.
00:46:05.240 | So we got a little side action, you and I, Sonny.
00:46:08.240 | I'm not gonna go under a year,
00:46:09.760 | but I think we could have limited releases in a year.
00:46:13.200 | But Sonny, you and I now have action with Bill.
00:46:16.920 | What do you want, Bill, a thousand bucks?
00:46:18.880 | Sure. To a good cause?
00:46:19.880 | Okay, a thousand bucks each to a good cause.
00:46:22.120 | And I'll just assume, Sonny,
00:46:23.920 | that we'll get action from Nikesh as well.
00:46:26.080 | And you know our friend Stanley Tang
00:46:27.680 | is definitely in the tank for some.
00:46:29.600 | So we're gonna give some good money to a good cause.
00:46:32.960 | And listen, I think this is the trillion dollar question.
00:46:35.800 | I know we're all focused on scaling models,
00:46:39.080 | and I know we're all focused on the compute layer,
00:46:41.200 | but what really transforms people's lives,
00:46:44.160 | what really disrupts 10 Blue Links,
00:46:47.360 | what really disrupts the entire architecture
00:46:49.920 | of the app ecosystem,
00:46:52.200 | is that when we have an intelligent assistant
00:46:54.560 | that we can interact with that gets smarter over time,
00:46:57.160 | that has memory and could take actions.
00:46:59.360 | And when I see the combination of advanced voice mode,
00:47:03.360 | voice-to-voice API,
00:47:05.480 | Strawberry O1 thinking combined with scaling intelligence,
00:47:09.720 | I just think this is going to go a lot faster
00:47:12.320 | than most of us think.
00:47:13.160 | Now, listen, they may pull on the reins, right?
00:47:16.040 | They may slow down the release schedule in order,
00:47:19.080 | you know, for a lot of business reasons.
00:47:20.800 | That's harder to predict.
00:47:22.720 | But I think the technology, I mean, even Noam said,
00:47:25.600 | I thought it was gonna take us much, much longer
00:47:28.400 | to see the results that we have seen.
00:47:30.720 | Can I hit on one other thing?
00:47:32.400 | This is, you know, we started the pod
00:47:34.360 | a little bit talking about it.
00:47:35.880 | I just wanna get your impression, Bill.
00:47:38.360 | This idea that Jensen can scale the business
00:47:40.840 | two or three times with, you know,
00:47:42.680 | increasing the head count by, you know, 20 or 25%, right?
00:47:47.280 | We know that Meta's done that
00:47:48.720 | over the course of the last two years.
00:47:51.240 | And you and I've talked about,
00:47:52.400 | are we on the eve of just massive productivity boom
00:47:55.960 | and massive margin expansion
00:47:58.840 | like we've never seen before, right?
00:48:01.240 | Nikesh said, we ought to be able to get 20 or 30%,
00:48:04.760 | you know, productivity gains out of everybody
00:48:06.640 | in the business.
00:48:07.880 | - First of all, I think NVIDIA is a very special company.
00:48:10.680 | And it's a company that, even if it's a systems company,
00:48:15.680 | it's an IP company.
00:48:17.280 | And the demand is growing at such a rate
00:48:20.160 | that they don't need more designers
00:48:22.680 | or more developer engineers to create incremental revenue.
00:48:27.680 | That's happening on its own.
00:48:29.280 | And so their operating margins are record levels.
00:48:32.920 | For the majority of companies, you know,
00:48:36.640 | I've always just held this belief that, you know,
00:48:39.480 | you evolve with your tools.
00:48:41.040 | And the real answer is the companies
00:48:44.920 | that don't deploy these things
00:48:46.480 | are gonna go out of business.
00:48:47.600 | - Yeah.
00:48:48.440 | - And so I think margins get competed away
00:48:51.360 | in many, many cases.
00:48:52.560 | I think it's ridiculous to imagine,
00:48:55.800 | oh, every company goes to 60% operating margin.
00:48:58.120 | - No, no, no, no.
00:48:59.280 | I mean, listen, Delta Airlines
00:49:01.320 | is going to do all of these things with AI
00:49:04.600 | and immediately because it's in a commodity market,
00:49:07.160 | it'll get competed away by Southwest and United.
00:49:09.320 | Bad industries remain bad industries.
00:49:11.720 | - Yeah, yeah, yeah.
00:49:12.960 | So, but there might be some, you know, that figure it out.
00:49:16.920 | And I have another theory that I always keep in mind,
00:49:21.920 | which is hyper growth tends to delay
00:49:25.960 | what you learned in microeconomics class.
00:49:28.600 | You know, I remember when I was a PC analyst
00:49:31.040 | and there were five public PC companies
00:49:33.240 | all growing 100%.
00:49:35.200 | And so in moments of hyper growth,
00:49:38.000 | you will have margins that may or may not be durable.
00:49:41.280 | And you'll have a number of participants in a market
00:49:44.360 | that may or may not be durable
00:49:46.640 | during periods of hyper growth.
00:49:48.840 | - I have two more things on my mind, Sonny.
00:49:51.080 | Do you have any reactions to that?
00:49:52.680 | I mean, I just have to get to a couple of these topics.
00:49:55.280 | - No, like-
00:49:56.480 | - There's gonna be a Lex Fridman-length podcast
00:50:00.240 | once you've finished the interview.
00:50:03.040 | - No, look, I really, you know,
00:50:06.080 | been thinking a lot about Jensen's point in the pod about,
00:50:09.160 | you know, how much AI they're using internally for design,
00:50:12.040 | design verification for all those pieces, right?
00:50:14.520 | And I think, you know, it's not 30%.
00:50:17.680 | I actually think sort of that's an underestimate.
00:50:21.160 | I think you're talking, you know,
00:50:22.960 | multiple hundreds of percent improvement
00:50:24.880 | in productivity gains.
00:50:26.560 | And the only issue is that not every company
00:50:28.920 | can grasp that that quickly.
00:50:31.320 | And so, you know,
00:50:33.560 | I think he was kind of holding some cards back
00:50:35.960 | at that point when he made that comment.
00:50:37.840 | And it really got me thinking about like,
00:50:39.400 | how much are they doing there
00:50:40.880 | that they don't want everybody to know about?
00:50:42.960 | And you kind of see it now in the model development
00:50:46.280 | because they, you know,
00:50:47.440 | if you've noticed the last couple of weeks,
00:50:48.800 | they put some models out there
00:50:50.280 | that are models trained on their own
00:50:51.680 | and they don't get as much noise as, you know,
00:50:54.440 | ones from Meta and, you know,
00:50:56.800 | the other players that are out there,
00:50:59.080 | but they're really doing a lot more than we think.
00:51:01.760 | And they, I think they have their arms
00:51:04.320 | around a lot of these very, very difficult problems.
00:51:07.120 | Brad, why did they put their own model up?
00:51:09.680 | Well, it's related to this topic of open versus closed.
00:51:13.040 | So Bill, you know, I hope you're proud of me.
00:51:16.200 | You know, I went back and I said,
00:51:17.520 | I have to ask this question. I do.
00:51:20.000 | Right, and you know, I thought Jensen, you know,
00:51:23.160 | I thought he gave a great answer, which is like, listen,
00:51:25.680 | we're gonna have companies that for economic reasons,
00:51:28.320 | right, push the boundary toward AGI
00:51:30.800 | or whatever they're doing.
00:51:32.240 | And it makes sense to have a closed model
00:51:34.000 | that can be the best and they can monetize.
00:51:36.840 | But the world's not gonna develop with just closed models.
00:51:39.200 | We're gonna, you know, he's like,
00:51:40.240 | it's both open and closed.
00:51:42.640 | And, you know, he said, because open,
00:51:45.160 | he's like, it's absolutely a condition required.
00:51:47.720 | It's gonna be the vast majority
00:51:49.040 | of the models in the industry.
00:51:50.440 | He's like, right now, if we didn't have open source,
00:51:52.360 | how would you have all these different fields in science,
00:51:55.000 | you know, be able to be activated on AI?
00:51:57.120 | He talked about Llama models exploding higher.
00:52:00.040 | And then with respect to his own open source model,
00:52:02.440 | which I thought was really interesting.
00:52:04.480 | He said, we focused on, right,
00:52:07.400 | a specific capability.
00:52:10.840 | And the capability that we were focused on
00:52:13.520 | is how to agentically use this model
00:52:16.280 | to make your model smarter, faster, right?
00:52:19.240 | So it's almost like a training coaching model
00:52:21.600 | that he built.
00:52:22.680 | And so I think for them, it makes perfect sense
00:52:25.160 | why they may, you know, put that out into the world.
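A toy sketch of that "coaching model" idea as described here: a separate scorer rates candidate responses, and only the highest-rated pairs are kept as training data for the model being improved. The score() heuristic below is a stand-in for a real reward model, not NVIDIA's; everything in it is assumed for illustration.

```python
# Hypothetical sketch: use a scorer ("coach") to filter synthetic training data.
def score(prompt: str, response: str) -> float:
    # Toy stand-in for a learned reward model's rating in [0, 1].
    return min(len(response) / 100.0, 1.0)


def filter_for_training(pairs, threshold=0.7):
    """Keep only (prompt, response) pairs the coach rates highly."""
    return [(p, r) for p, r in pairs if score(p, r) >= threshold]


candidates = [
    ("Explain CUDA graphs.",
     "CUDA graphs let you capture a sequence of GPU operations once and "
     "replay it with much lower launch overhead on later runs."),
    ("Explain CUDA graphs.", "idk"),
]
print(filter_for_training(candidates))  # keeps the first pair, drops the second
```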
00:52:29.400 | But I also, you know, a lot of times
00:52:31.240 | the open versus closed debate, you know,
00:52:34.200 | gets hijacked into this conversation
00:52:36.680 | about safety and security.
00:52:38.600 | And, you know, and I think he said, you know,
00:52:40.280 | listen, these two things are related,
00:52:41.800 | but they're not the same thing.
00:52:43.640 | You know, one of the things he commented on that is just,
00:52:46.440 | he said, there's so much coordination going on
00:52:49.120 | on the safety and security level.
00:52:50.560 | Like we have so many agents
00:52:52.640 | and so much activity going on, on making sure, you know,
00:52:56.480 | just look at what Meta's doing, you know, on this.
00:53:00.120 | He's like, I think that's one thing that's under-celebrated
00:53:02.840 | that even in the absence of any, you know,
00:53:05.680 | platonic guardian sort of regulation, right?
00:53:09.080 | Without any top-down,
00:53:10.320 | you already have an extraordinary amount of effort
00:53:14.040 | going in by all of these companies into AI safety
00:53:18.080 | and security that I thought was,
00:53:20.280 | I thought was a really important comment.
00:53:22.520 | Thanks for jumping in, guys, kicking this one around.
00:53:25.120 | It was a special one to--
00:53:26.480 | - Yeah, congrats on having that opportunity.
00:53:28.520 | That's pretty, that's pretty unique.
00:53:30.880 | - And now we got a little wager.
00:53:32.640 | So I mean, listen, I am so looking forward
00:53:36.760 | to like doing a live booking at the Mercer on the pod, right?
00:53:41.760 | And then Sonny, we can just drop the money from the sky.
00:53:45.640 | We can just collect, we can just collect, exactly, exactly.
00:53:49.880 | Good to see you guys.
00:53:50.720 | We'll talk soon.
00:53:51.560 | - All right, peace. - Take care.
00:53:53.240 | (upbeat music)
00:53:55.840 | - As a reminder to everybody,
00:54:04.280 | just our opinions, not investment advice.