
AI Semiconductor Landscape feat. Dylan Patel | BG2 w/ Bill Gurley & Brad Gerstner


Chapters

0:00 Intro
1:50 Dylan Patel Backstory
2:36 SemiAnalysis Backstory
4:18 Google's AI Workload
6:58 NVIDIA's Edge
10:59 NVIDIA's Incremental Differentiation
13:12 Potential Vulnerabilities for NVIDIA
17:18 The Shift to GPUs: What It Means for Data Centers
22:29 AI Pre-training Scaling Challenges
29:43 If Pretraining Is Dead, Why Bigger Clusters?
34:00 Synthetic Data Generation
36:26 Hyperscaler CapEx
38:12 Pre-training and Inference-Time Reasoning
41:00 Cisco Comparison to NVIDIA
44:11 Inference-time Compute
53:18 The Future of AI Models and Market Dynamics
60:58 Evolving Memory Technology
66:46 Chip Competition
67:18 AMD
70:35 Google’s TPU
74:56 Cerebras and Groq
77:33 Predictions for 2025 and 2026

Whisper Transcript

00:00:00.000 | is scaling dead, then why is Mark Zuckerberg
00:00:01.900 | building a two gigawatt data center in Louisiana?
00:00:04.140 | Why is Amazon building these multi-gigawatt data centers?
00:00:07.900 | Why is Google, why is Microsoft
00:00:09.140 | building multiple gigawatt data centers,
00:00:10.820 | plus buying billions and billions of dollars of fiber
00:00:14.600 | to connect them together because they think,
00:00:16.100 | hey, I need to win on scale,
00:00:17.860 | so let me just connect all the data centers together
00:00:19.580 | with super high bandwidth
00:00:20.580 | so then I can make them act like one data center, right?
00:00:23.140 | Towards one job, right?
00:00:24.380 | So this whole, like, is scaling over narrative
00:00:28.020 | falls on its face when you see
00:00:30.060 | what the people who know the best are spending on.
00:00:32.840 | (upbeat music)
00:00:35.420 | - Great to be here.
00:00:45.640 | Psyched you both are in the shop today.
00:00:47.860 | Dylan, this is one of the things
00:00:49.940 | we've been talking about all year,
00:00:51.080 | which is, you know,
00:00:51.920 | how the world of compute is radically changing.
00:00:53.940 | So Bill, why don't you tell folks
00:00:55.940 | who Dylan is, and let's get started.
00:00:59.300 | - Yeah, we're thrilled to have Dylan Patel with us
00:01:01.340 | from SemiAnalysis.
00:01:02.460 | Dylan has quickly built,
00:01:05.140 | I think, the most respected research group
00:01:07.740 | on global semiconductor industry.
00:01:09.700 | And so what we thought we'd do today
00:01:11.940 | is dive deep on the intersection, I think,
00:01:15.700 | between everything Dylan knows
00:01:17.900 | from a technical perspective about the architectures
00:01:20.480 | that are out there, about the scaling,
00:01:22.540 | about the key players in the market globally,
00:01:25.220 | the supply chain, and the best and the brightest
00:01:28.180 | of people we know are all listening
00:01:30.420 | and reading Dylan's work.
00:01:32.160 | And then connect it to some of the business issues
00:01:34.940 | that our audience cares about,
00:01:36.900 | and see where it comes out.
00:01:38.820 | What I was hoping to do is kind of get a moment in time
00:01:42.780 | snapshot of all the semiconductor activity
00:01:46.080 | that relates to this big AI wave,
00:01:48.540 | and try and put it in perspective.
00:01:50.600 | - Dylan, how'd you get into this?
00:01:52.900 | - So when I was eight, my Xbox broke,
00:01:54.620 | and I have immigrant parents.
00:01:56.220 | I grew up in rural Georgia,
00:01:57.860 | so I didn't have much to do besides be a nerd,
00:02:00.940 | and I couldn't tell them I broke my Xbox.
00:02:03.000 | I had to open it up, short the temperature sensor,
00:02:05.300 | and that was the way to fix it.
00:02:06.500 | I didn't know what I was doing at the time,
00:02:08.100 | but then I stayed on those forums.
00:02:09.580 | And then I became a forum warrior, right?
00:02:11.220 | You know, you see those people in the comments
00:02:12.840 | always yelling at you, Brad.
00:02:14.540 | You know, it's like, that was me, right?
00:02:17.140 | As a child, and you didn't know I was a child then,
00:02:18.980 | but you know, it was just like, you know,
00:02:21.660 | arguing with people online as a child,
00:02:23.380 | and then being passionate.
00:02:25.460 | As soon as I started making money,
00:02:26.420 | I was reading earnings from semiconductor companies,
00:02:28.220 | and investing in them, you know, with my internship money,
00:02:30.900 | and yeah, reading technical stuff as well, of course,
00:02:34.740 | and then working a little bit, and then, yeah.
00:02:36.620 | - And just tell us, give us a quick thumbnail
00:02:38.740 | on SemiAnalysis today.
00:02:40.260 | Like, what is the business?
00:02:41.420 | - Yeah, so today we are a semiconductor research firm,
00:02:44.460 | AI research firm.
00:02:45.420 | We service companies.
00:02:46.960 | Our biggest customers are all hyperscalers,
00:02:49.060 | the largest semiconductor companies,
00:02:51.100 | private equity, as well as hedge funds,
00:02:52.660 | and we sell data around where every data center
00:02:55.300 | in the world is, what the power is in each quarter,
00:02:57.580 | how the build-outs are going.
00:02:59.460 | We sell data around fabs.
00:03:01.180 | We track all 1,500 fabs in the world.
00:03:03.100 | For your purposes, only 50 of them matter,
00:03:04.740 | but like, you know, all 1,500 fabs around the world.
00:03:07.140 | Same thing with the supply chain of like,
00:03:09.460 | whether it be cables, or servers, or boards,
00:03:12.660 | or transformer substation equipment.
00:03:14.740 | We try and track all of this on a very number-driven basis,
00:03:17.740 | as well as forecasting.
00:03:19.300 | And then we do consulting around those areas.
00:03:21.220 | - Yeah, so I mean, you know, Bill,
00:03:23.260 | you and I just talked about this.
00:03:24.980 | I mean, for Altimeter,
00:03:27.100 | our team talks with Dylan and Dylan's team all the time.
00:03:29.980 | I think you're right.
00:03:31.620 | He's quickly emerged, really just through hustle, hard work,
00:03:35.020 | doing the grindy stuff that matters,
00:03:38.300 | as, I think, you know, a benchmark
00:03:40.260 | for what's going on in the semiconductor industry.
00:03:42.460 | We're at this, you know, I suggested,
00:03:44.500 | we're two years into this, maybe, you know, this build-out,
00:03:47.380 | and it's been hyper-kinetic.
00:03:49.700 | And one of the things Bill and I are talking about
00:03:51.740 | is we enter the end of 2024 taking a deep breath,
00:03:56.300 | thinking about '25, '26, and beyond,
00:03:59.020 | because a lot of things are changing,
00:04:00.620 | and there's a lot of debates.
00:04:02.220 | And it's gonna have consequence for trillions of dollars
00:04:05.300 | of value in the public markets, in the private markets,
00:04:09.020 | how the hyperscalers are investing,
00:04:11.420 | and where we go from here.
00:04:12.380 | So Bill, why don't you take us a little bit
00:04:15.140 | through the start of the questions?
00:04:17.260 | - Well, so I think if you're gonna talk about AI
00:04:19.740 | and semiconductors, there's only one place to start,
00:04:21.900 | which is to talk about NVIDIA broadly.
00:04:25.420 | Dylan, what percentage of global AI workloads
00:04:28.660 | do you think are on NVIDIA chips right now?
00:04:31.460 | - So I would say if you ignored Google, it'd be over 98%.
00:04:35.860 | But then when you bring Google into the mix,
00:04:37.500 | it's actually more like 70, 'cause Google is really
00:04:40.820 | that large a percentage of AI workloads,
00:04:43.300 | especially production workloads.
00:04:45.340 | You know, they have less--
00:04:46.180 | - Production, you mean in-house workloads for Google?
00:04:48.820 | - Production as in things that are making money.
00:04:51.020 | Things that are making money, they're actually probably,
00:04:52.820 | it's probably even less than 70%, right?
00:04:54.420 | 'Cause you think about Google Search and Google Ads
00:04:56.700 | are the two largest, you know,
00:04:58.380 | two of the largest AI-driven businesses in the world, right?
00:05:02.020 | You know, the only things that are even comparable
00:05:03.620 | are like TikTok and Meta's, right?
00:05:06.780 | - And those Google workloads, I think it's important
00:05:09.260 | just to kind of frame this, those are running
00:05:12.980 | on Google's proprietary chips.
00:05:15.900 | They're non-LLM workloads, correct?
00:05:20.460 | - So Google's production workloads for non-LLM and LLM
00:05:25.460 | run on their internal silicon.
00:05:27.940 | And I think one of the interesting things is, yes,
00:05:30.260 | you know, everyone will say Google dropped the ball
00:05:32.100 | on transformers and LLMs, right?
00:05:33.900 | How did OpenAI do GPT, right?
00:05:35.860 | And not Google, but Google was running transformers
00:05:39.060 | even in their search workload since 2018, 2019.
00:05:42.820 | The advent of BERT, which was one of the most well-known,
00:05:46.580 | most popular transformers before we got to the GPT madness,
00:05:50.100 | has been in their production search workloads for years.
00:05:54.780 | So they run transformers on their own
00:05:56.980 | in their search and ads business as well.
00:05:59.020 | - Going back to this number, you used 98%.
00:06:03.860 | If you just look at, I guess, workloads people
00:06:07.540 | are purchasing to do work on their own.
00:06:09.660 | So you take the captives out, you're at 98, right?
00:06:12.900 | This is a dominant landslide at this moment in time.
00:06:17.700 | - Back to Google for a second.
00:06:18.700 | They also are one of the big customers of Nvidia.
00:06:21.260 | - They do buy a number of GPUs.
00:06:24.580 | They buy some for, you know,
00:06:25.820 | some YouTube video-related workloads,
00:06:27.460 | internal workload, right?
00:06:28.300 | So not everything internal is, like, a GPU, right?
00:06:33.300 | They do buy some for some other internal workloads,
00:06:35.780 | but by and large, their GPU purchases are for Google Cloud
00:06:40.020 | to then rent out to customers.
00:06:42.220 | Because they are, while they do have some customers
00:06:45.300 | for their internal silicon externally, such as Apple,
00:06:49.460 | the vast majority of their external rental business for AI
00:06:53.980 | in terms of cloud business is still GPUs.
00:06:55.940 | - And that's the Nvidia GPUs.
00:06:57.740 | - Correct, Nvidia GPUs.
00:06:59.020 | - Why are they so dominant?
00:07:00.780 | Why is Nvidia so dominant?
00:07:03.140 | - So I like to think of it as like a three-headed dragon,
00:07:06.100 | right?
00:07:06.940 | I would say every semiconductor company in the world
00:07:10.340 | sucks at software except for Nvidia, right?
00:07:12.180 | So there's software.
00:07:13.020 | There's of course hardware.
00:07:14.500 | People don't realize that Nvidia is actually just much better
00:07:17.060 | at hardware than most people.
00:07:18.580 | They get to the newest technologies first and fastest
00:07:20.700 | because they drive like crazy
00:07:23.540 | towards hitting certain production goals, targets.
00:07:25.860 | They get chips out faster than other people
00:07:28.060 | from thought, design to deployed.
00:07:31.540 | And then the networking side of things, right?
00:07:33.700 | They bought Mellanox and they've driven really hard
00:07:35.860 | with the networking side of things.
00:07:38.580 | So those three things kind of combined
00:07:40.460 | to make a three-headed dragon
00:07:42.020 | that no other semiconductor company can do alone.
00:07:45.100 | - Yeah, I'd call out a piece you did, Dylan,
00:07:47.180 | where you helped everyone visualize the complexity
00:07:50.660 | of one of these modern cutting edge Nvidia deployments
00:07:54.660 | that involves the racks, the memory, the networking,
00:07:58.380 | the size and scale of the whole thing.
00:08:01.220 | Super helpful.
00:08:02.100 | - I mean, there's this comparison oftentimes
00:08:05.380 | between companies that are truly standalone chip companies.
00:08:08.420 | They're not systems companies.
00:08:09.900 | They're not infrastructure companies and Nvidia.
00:08:13.100 | But I think one of the things that's deeply underappreciated
00:08:15.900 | is the level of competitive moats that Nvidia has.
00:08:19.580 | You know, software is becoming a bigger and bigger component
00:08:23.620 | of squeezing efficiencies and, you know,
00:08:25.660 | total cost of operation out of these infrastructures.
00:08:29.020 | So talk to us a little bit about that schema, you know,
00:08:32.300 | that Bill's referring to,
00:08:33.860 | like there are many different layers of systems architecture
00:08:37.340 | and how that's differentiated from maybe, you know,
00:08:40.220 | a custom ASIC or an AMD.
00:08:43.220 | - Right, so when you look broadly at the GPU, right,
00:08:46.900 | no one buys one chip for running an AI workload, right?
00:08:50.660 | Models have far exceeded that, right?
00:08:53.700 | You look at, you know, today's leading edge models
00:08:57.140 | like GPT-4 was, you know,
00:08:59.140 | over a trillion parameters, right?
00:09:00.900 | A trillion parameters is over a terabyte of memory.
00:09:04.420 | You can't get a chip with that capacity.
00:09:07.180 | A chip can't have enough performance to serve that model,
00:09:09.340 | even if it had enough memory capacity.
00:09:11.340 | So therefore you must tie many chips together.
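(As rough arithmetic behind that claim, assuming 16-bit weights, i.e. 2 bytes per parameter, and an illustrative ~80 GB of HBM per GPU; neither figure is quoted in the conversation:)

```latex
\[
10^{12}\ \text{params} \times 2\ \tfrac{\text{bytes}}{\text{param}} = 2\ \text{TB of weights},
\qquad
\frac{2000\ \text{GB}}{80\ \text{GB/GPU}} \approx 25\ \text{GPUs}
\]
% ...and that is weights alone, before activations or KV cache.
```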
00:09:14.780 | And so what's interesting is that Nvidia has seen that
00:09:19.020 | and built an architecture that has many chips networked
00:09:21.740 | together really well called NVLink.
00:09:24.100 | But funnily enough and the thing that many ignore
00:09:27.060 | is that Google actually did this alongside Broadcom,
00:09:30.660 | you know, and they did it before Nvidia, right?
00:09:33.020 | You know, today everyone's freaking out about,
00:09:34.660 | or not freaking out,
00:09:35.500 | but like everyone's like very excited
00:09:36.780 | about Nvidia's Blackwell system, right?
00:09:38.740 | It is a rack of GPUs.
00:09:40.740 | That is the purchased unit, right?
00:09:41.980 | It's not one server, it's not one chip, it's a rack.
00:09:44.300 | And this rack, it weighs three tons
00:09:46.060 | and it has thousands and thousands of cables
00:09:48.180 | and all these things that Jensen will probably tell you,
00:09:49.820 | right, extremely complex.
00:09:52.220 | Interestingly, Google did something very similar in 2018,
00:09:55.420 | right, with the TPU.
00:09:57.140 | Now they couldn't do it alone, right?
00:09:58.380 | They know the software.
00:09:59.860 | They know what the compute element needs to be,
00:10:02.140 | but they couldn't do a lot of the other difficult things
00:10:05.460 | like package design, like networking.
00:10:08.180 | And so they had to work with other vendors
00:10:10.060 | like Broadcom to do this.
00:10:11.940 | And because Google had such a unified vision
00:10:14.540 | of where AI models were headed,
00:10:16.340 | they actually were able to build this system,
00:10:18.980 | this system architecture that was optimized for AI, right?
00:10:22.460 | Whereas at the time, NVIDIA was like,
00:10:24.260 | "Well, how big do we go?"
00:10:26.580 | I'm sure they could have tried to scale up bigger,
00:10:28.140 | but what they saw as the primary workloads
00:10:30.140 | didn't require scaling to that degree, right?
00:10:32.720 | Now everyone sort of sees this
00:10:34.140 | and they're running towards it,
00:10:35.340 | but NVIDIA has got Blackwell coming now.
00:10:37.980 | Competitors like AMD and others
00:10:42.140 | have had to make acquisitions recently
00:10:42.140 | to help them get into the system design, right?
00:10:44.260 | Because building a chip is one thing,
00:10:46.900 | but building many chips that connect together,
00:10:49.260 | cooling them appropriately, networking them together,
00:10:51.660 | making sure that it's reliable at that scale
00:10:54.100 | is a whole host of problems
00:10:55.660 | that semiconductor companies don't have the engineers for.
00:10:59.140 | - Where would you say NVIDIA has been investing the most
00:11:02.660 | in incremental differentiation?
00:11:05.600 | - I would say for differentiating,
00:11:08.860 | NVIDIA has primarily focused on supply chain things,
00:11:13.860 | which might sound like,
00:11:15.380 | "Oh, well like, yeah, they're just like ordering stuff."
00:11:17.180 | No, no, no, no.
00:11:18.020 | You have to work deeply with the supply chain
00:11:19.940 | to build the next generation technology
00:11:22.700 | so that you can bring it to market
00:11:24.020 | before anyone else does, right?
00:11:25.740 | Because if NVIDIA stands still,
00:11:27.580 | they will be eaten up, right?
00:11:29.740 | They're sort of the Andy Grove,
00:11:32.220 | only the paranoid will survive.
00:11:33.900 | Jensen is probably the most paranoid man in the world, right?
00:11:37.100 | He's known for many years,
00:11:39.260 | since before the LLM craze,
00:11:41.340 | that all of his biggest customers were building AI chips, right?
00:11:43.900 | Before the LLM craze, his main competitors were like,
00:11:46.660 | "Oh, we should make GPUs."
00:11:48.500 | And yet he stays on top
00:11:50.340 | because he's bringing to market technologies at volume
00:11:54.220 | that no one else can, right?
00:11:56.380 | And so whether it be in networking,
00:11:58.000 | whether it be in optics,
00:11:59.700 | whether it be in water cooling, right?
00:12:02.480 | Whether it be in all sorts of other power delivery,
00:12:05.940 | all these things, he's bringing to market technologies
00:12:08.400 | that no one else has,
00:12:09.700 | and he has to work with the supply chain
00:12:11.580 | and teach those supply chain companies,
00:12:13.360 | and they're helping, obviously,
00:12:14.380 | they have their own capabilities,
00:12:15.540 | to build things that don't exist today.
00:12:17.740 | And NVIDIA is trying to do this on an annual cadence now.
00:12:20.900 | - That's incredible, yeah.
00:12:21.740 | - Blackwell, Blackwell Ultra, Rubin, Rubin Ultra,
00:12:23.580 | they're going so fast,
00:12:25.180 | they're driving so many changes every year.
00:12:27.700 | Of course, people are gonna be like,
00:12:28.520 | "Oh, no, there are some delays in Blackwell."
00:12:30.580 | Yeah, of course, look how hard
00:12:31.940 | you're driving the supply chain.
00:12:33.540 | - Is that part, like how big a part
00:12:35.620 | of the competitive advantage
00:12:37.580 | is the fact that they're now on this annual cadence, right?
00:12:40.860 | Because it seems like by going there,
00:12:43.100 | it almost precludes their competitors from catching up,
00:12:46.740 | because even if you skate to where Blackwell is, right,
00:12:49.580 | you're already on next generation within 12 months.
00:12:52.540 | He's already planning two or three generations ahead
00:12:55.060 | because it's only two to three years ahead.
00:12:57.560 | - Well, the funny thing is a lot of people at NVIDIA
00:12:59.700 | will say Jensen doesn't plan more than a year,
00:13:01.780 | year and a half out,
00:13:02.780 | because they change things
00:13:05.160 | and they'll deploy them out that fast, right?
00:13:07.460 | Every other semiconductor company takes years to deploy,
00:13:10.500 | you know, to make architecture changes.
00:13:13.260 | - You said if they stand still,
00:13:14.740 | they would have competition,
00:13:16.940 | like what would be their area of vulnerability
00:13:21.740 | or what would have to play out in the market
00:13:24.580 | for other alternatives to take more share of the workload?
00:13:29.420 | - Yeah, so the main thing for NVIDIA is, you know,
00:13:33.020 | "Hey, this workload is this big," right?
00:13:35.100 | It's well over a hundred billion dollars of spend
00:13:38.980 | for the biggest customers.
00:13:39.960 | They have multiple customers
00:13:40.820 | that are spending billions of dollars.
00:13:42.820 | I can hire enough engineers to figure out
00:13:46.260 | how to run my model on other hardware, right?
00:13:48.680 | Now, maybe I can't figure out how to train
00:13:49.860 | on other hardware, but I can figure out
00:13:50.820 | how to run it for inference on other hardware.
00:13:52.980 | So NVIDIA's moat in inference
00:13:55.340 | is actually a lot smaller on software,
00:13:58.780 | but it's a lot bigger on,
00:13:59.940 | "Hey, they just have the best hardware."
00:14:01.260 | Now, what does the best hardware mean?
00:14:02.540 | It means capital costs and it means operation costs
00:14:04.740 | and then it means performance, right?
00:14:05.980 | Performance, TCO.
00:14:08.060 | And NVIDIA's whole moat here is,
00:14:11.460 | if they stand still, their performance TCO doesn't grow.
00:14:14.720 | But interestingly, they are growing it, right?
00:14:16.020 | Like with Blackwell, not only is it way, way, way faster,
00:14:19.620 | anywhere from 10 to 15 times
00:14:20.940 | on really large models for inference,
00:14:22.740 | because they've optimized it for very large language models,
00:14:25.760 | they've also decided,
00:14:26.920 | "Hey, we're going to cut our margin somewhat, too,
00:14:28.940 | because I'm competing with Amazon's, you know,
00:14:32.340 | chip and TPU and AMD and all these things."
00:14:35.060 | They've decided to cut their margin too.
00:14:36.720 | So, between all these things,
00:14:38.140 | they've decided that they need to push performance TCO,
00:14:41.820 | not 2X every two years, right?
00:14:43.980 | You know, Moore's law, right?
00:14:45.620 | They've decided they need to push performance TCO 5X,
00:14:49.960 | maybe every year, right?
00:14:51.220 | At least that's what Blackwell is
00:14:52.380 | and we'll see what Rubin does.
00:14:53.580 | But, you know, 5X plus in a single year
00:14:56.020 | for performance TCO is an insane pace, right?
00:14:59.780 | And then you stack on top, like,
00:15:01.300 | "Hey, AI models are actually getting a lot better
00:15:03.260 | for the same size."
00:15:04.420 | The cost for delivering LLMs is tanking,
00:15:07.240 | which is going to induce demand, right?
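(A hedged way to formalize "performance TCO" as it's used here; the cost decomposition is illustrative, not SemiAnalysis's actual model:)

```latex
% Useful throughput per unit of total cost of ownership (illustrative):
\[
\text{Perf/TCO} =
\frac{\text{tokens served per second}}
     {\frac{\text{capex}}{\text{useful life}} + \text{power} + \text{cooling} + \text{ops}}
\]
% "5X per year" then means this ratio, not raw FLOPs, improving 5x.
```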
00:15:09.440 | - So, just to clarify one thing you said,
00:15:11.840 | or at least restate it to make sure,
00:15:14.160 | I think when you said the software is more important
00:15:16.720 | for training, you meant CUDA is more of a differentiator
00:15:20.680 | on training than it is on inference.
00:15:22.680 | - So, I think a lot of people in the investor community,
00:15:25.120 | you know, call CUDA, which is just like one layer
00:15:27.560 | for all of NVIDIA software.
00:15:28.880 | There's a lot of layers of software,
00:15:30.520 | but for simplicity's sake, you know,
00:15:32.720 | regarding networking or what runs on switches
00:15:35.300 | or what runs on, you know, all sorts of things,
00:15:37.580 | fleet management stuff, all sorts of stuff
00:15:39.020 | that NVIDIA makes that we'll just call CUDA
00:15:41.180 | for simplicity's sake.
00:15:43.020 | But all of this software is stupendously difficult
00:15:47.180 | to replicate.
00:15:48.020 | In fact, no one else has deployments to do that
00:15:51.220 | besides the hyperscalers, right?
00:15:52.780 | And a few thousand GPUs is like a Microsoft
00:15:55.020 | inference cluster, right?
00:15:55.940 | It's not a training cluster.
00:15:57.860 | So, when you talk about, "Hey, what is the difficulty here?"
00:16:01.460 | On training, this is users constantly experimenting, right?
00:16:05.400 | Researchers saying, "Hey, let's try this.
00:16:07.000 | Let's try that.
00:16:07.840 | Let's try this.
00:16:08.660 | Let's try that."
00:16:09.500 | I don't have time to optimize
00:16:10.640 | and hand wring out the performance.
00:16:12.160 | I rely on NVIDIA's performance to be quite good
00:16:15.160 | with existing software stacks with very little effort, right?
00:16:19.120 | But when I go to inference,
00:16:21.000 | Microsoft is deploying five, six models
00:16:24.240 | across how many billions of revenue, right?
00:16:26.760 | So, all of OpenAI's revenue,
00:16:28.200 | plus whatever they have on copilot and all that.
00:16:30.080 | - 10 billion of inference revenue.
00:16:31.380 | - Yeah, so they have $10 billion of revenue here
00:16:34.020 | and they're deploying five models, right?
00:16:35.820 | GPT-4, 4o, you know, 4o mini,
00:16:39.540 | and now, you know, the reasoning model.
00:16:41.340 | Yeah, the reasoning models, o1, and yeah.
00:16:44.100 | So, it's like they're deploying very few models
00:16:46.620 | and those change, what, every six months, right?
00:16:49.260 | So, every six months they get a new model
00:16:51.460 | and they deploy that.
00:16:52.300 | So, within that timeframe,
00:16:53.380 | you can hand wring out the performance.
00:16:55.140 | And so, Microsoft has deployed GPT-style models
00:16:58.860 | on other competitors' hardware, such as AMD,
00:17:02.440 | and some of their own, but mostly AMD.
00:17:04.840 | And so, they can wring that out with software
00:17:06.920 | because they can spend dozens, hundreds, or thousands of engineer hours working this out
00:17:14.640 | because it's such a unified sort of workload, right?
00:17:18.600 | - I wanna get you to comment on this chart.
00:17:20.440 | This is a chart we showed earlier in the year
00:17:23.320 | that I think was kind of a moment for me with Jensen
00:17:28.000 | when he was in, I think, the Middle East.
00:17:30.640 | And for the first time he said,
00:17:32.680 | not only are we gonna have a trillion dollars
00:17:34.840 | of new AI workloads over the course of the next four years,
00:17:38.880 | he said, but we're also going to have a trillion dollars
00:17:42.200 | of CPU replacement, of data center replacement workloads
00:17:47.040 | over the course of the next four years.
00:17:48.420 | So, that's an effort to model it out.
00:17:49.960 | And I, you know, we referenced it on the pod with him
00:17:53.560 | and he seemed to indicate
00:17:55.160 | that it was directionally correct, right?
00:17:56.800 | That he still believes that it's not just about,
00:18:00.400 | because there's a lot of fuss in the world about,
00:18:03.400 | you know, pre-training and what if pre-training
00:18:05.720 | doesn't continue apace.
00:18:07.360 | And it seemed to suggest that there was a lot of AI workloads
00:18:12.080 | that had nothing to do with pre-training
00:18:14.160 | that they're working on,
00:18:15.000 | but also that they had all of this data center replacement.
00:18:17.880 | Do you buy that?
00:18:18.700 | I've heard a lot of people push back
00:18:20.280 | on the data center replacement and say,
00:18:22.040 | there's no way people are gonna, you know,
00:18:23.960 | rebuild a CPU data center with a bunch of NVIDIA GPUs.
00:18:27.720 | It just doesn't make any sense.
00:18:29.440 | But his argument is that an increasing number
00:18:32.920 | of these applications, even things like Excel
00:18:35.360 | and PowerPoint are becoming machine learning applications
00:18:39.240 | and require accelerated compute.
00:18:41.720 | - NVIDIA has been pushing non-AI workloads
00:18:43.880 | for accelerators for a very long time, right?
00:18:46.180 | Professional visualization, right?
00:18:48.000 | Pixar uses a ton of GPUs, right?
00:18:49.600 | To make every movie, you know,
00:18:51.800 | all these Siemens engineering applications, right?
00:18:54.200 | All these things do use GPUs, right?
00:18:56.800 | I would say they're a drop in the bucket
00:18:59.360 | compared to, you know, AI.
00:19:01.960 | The other aspect I would say is,
00:19:03.880 | and this is sort of a bit contentious
00:19:05.200 | with your chart, I think,
00:19:06.160 | but IBM mainframes sell more volume
00:19:09.480 | and revenue every single cycle, right?
00:19:11.880 | So, you know, yes, no one in the bay uses mainframes
00:19:14.640 | or talks about mainframes, but they're still growing, right?
00:19:18.960 | And so, like, I would say the same applies to CPUs, right?
00:19:23.320 | To classic workloads.
00:19:24.760 | Just because, you know, AI is here
00:19:26.280 | doesn't mean web serving is like gonna slow down
00:19:28.060 | or databasing is gonna slow down.
00:19:29.560 | Now, what does happen is that line is like this
00:19:33.680 | and the AI line is like this.
00:19:35.920 | And furthermore, right?
00:19:38.000 | Like when you talk about what, you know,
00:19:40.080 | hey, these applications, they're now AI, right?
00:19:42.040 | You know, Excel with Copilot or Word with Copilot,
00:19:44.200 | or whatever, right?
00:19:45.680 | They're still gonna have all of those classic operations.
00:19:48.320 | You don't get rid of what you used to have, right?
00:19:50.680 | Southwest doesn't stop booking flights.
00:19:52.560 | They just run AI analytics on top of their flights
00:19:54.440 | to maybe, you know, do pricing better or whatever, right?
00:19:57.200 | So I would say that still happens,
00:19:59.240 | but there is an element of replacement
00:20:01.300 | that is sort of misunderstood, right?
00:20:03.480 | Which is given how much people are deploying,
00:20:06.520 | how tight the supply chains for data centers are.
00:20:10.160 | Data centers take longer,
00:20:11.360 | they're longer time supply chains, unfortunately, right?
00:20:14.360 | Which is why you see things like what Elon's doing.
00:20:16.440 | But when you think about that,
00:20:18.040 | well, how can I get power then, right?
00:20:19.920 | So you can do what CoreWeave is doing
00:20:21.680 | and go to crypto mining companies
00:20:23.220 | and just like clear them out
00:20:24.360 | and put a bunch of GPUs in them, right?
00:20:26.280 | Retrofit the data center,
00:20:27.120 | put GPUs in them like they're doing in Texas.
00:20:30.160 | Or you can do what some of these other folks are doing,
00:20:32.860 | which is, hey, well, my depreciation for CPU servers
00:20:37.340 | has gone from three years to six years
00:20:39.240 | in just a handful of years, why?
00:20:41.360 | Because Intel's progress has been this, right?
00:20:43.480 | So in reality, like the old Intel CPU
00:20:45.280 | is not that much better.
00:20:46.120 | But all of a sudden over the last couple of years,
00:20:48.440 | AMD's burst onto the scene,
00:20:50.160 | ARM CPUs have burst onto the scene,
00:20:52.360 | Intel's started to right the ship.
00:20:54.480 | Now I can upgrade the most common ones:
00:20:57.100 | the plurality of Amazon CPUs in their data centers
00:21:00.340 | are 24-core Intel CPUs
00:21:03.600 | that were manufactured from 2015 to 2020,
00:21:06.280 | more or less the same architecture,
00:21:08.360 | the same 24-core CPU.
00:21:09.700 | I can buy a 128- or 192-core CPU today
00:21:14.880 | where each CPU core is higher performance.
00:21:17.720 | And well, if I just replace like six servers with one,
00:21:21.600 | I've basically invented power out of thin air, right?
00:21:24.300 | I mean, like, you know, in effect,
00:21:25.680 | because these old servers, which are six plus years old,
00:21:28.520 | or even older, you know, can just be deprecated.
00:21:31.680 | So with CapEx of new servers,
00:21:34.500 | I can replace these old servers.
00:21:35.960 | And now, you know, every time I do that,
00:21:37.960 | I can throw another AI server in there, right?
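(A back-of-the-envelope version of that consolidation math; the wattages below are illustrative assumptions, not Dylan's figures:)

```python
# Replace several old CPU servers with one new high-core-count server
# and see how much rack power is freed up for AI servers.
servers_replaced = 6     # "replace like six servers with one"
old_server_watts = 400   # assumed draw of a 24-core 2015-2020 server
new_server_watts = 900   # assumed draw of a 128/192-core server

freed_watts = servers_replaced * old_server_watts - new_server_watts
print(f"~{freed_watts} W freed per consolidation")  # ~1500 W
```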
00:21:39.760 | So this is sort of the, yes, there is some replacement.
00:21:42.320 | I still need more total capacity,
00:21:44.160 | but that total capacity can be served by fewer machines,
00:21:47.200 | maybe, if I buy new ones.
00:21:49.560 | And generally the market is not gonna shrink,
00:21:51.160 | it's still gonna grow, just nowhere close to what AI is.
00:21:53.800 | And AI is causing this behavior of,
00:21:56.240 | I need to replace so I can get power.
00:21:58.320 | - Okay, Bill, this reminds me of a point Satya made
00:22:01.400 | on the pod last week that I've seen replayed
00:22:03.400 | a bunch of times, and I think is fairly misunderstood.
00:22:06.880 | He said last week on the pod
00:22:08.720 | that he was power and data center constrained,
00:22:11.560 | not chip constrained.
00:22:13.080 | What I think it was, was more an assessment
00:22:16.560 | of the real bottleneck, which is data centers and power,
00:22:20.480 | as opposed to GPUs, because GPUs have come online.
00:22:24.480 | And so I think the case you just made,
00:22:27.960 | I think helps to clarify that.
00:22:29.920 | - Well, before we dive into the alternatives to NVIDIA,
00:22:33.400 | I thought we would hit on this pre-training scaling debate
00:22:38.400 | that you wrote about in your last piece, Dylan,
00:22:41.080 | and we've already talked about quite a bit,
00:22:43.520 | but why don't you give us your view
00:22:46.440 | of what's going on there.
00:22:47.840 | I think Ilya was the one, the most credible
00:22:51.040 | AI specialist, that brought this up,
00:22:55.440 | and then it got repeated and cross-analyzed quite a bit.
00:22:59.040 | - And Bill, just to repeat what it is,
00:23:02.000 | I think Ilya said, data's the fossil fuel of AI,
00:23:06.880 | that we've consumed all the fossil fuel
00:23:09.160 | because we only have but one internet.
00:23:11.840 | And so the huge gains we got from pre-training
00:23:14.920 | are not gonna be repeated.
00:23:16.240 | - And some experts had predicted that data,
00:23:19.840 | that data would run out a year or two ago.
00:23:23.720 | So it wasn't like out of nowhere
00:23:27.680 | that that argument came to light.
00:23:29.280 | Anyway, let's hear what Dylan has to say.
00:23:31.240 | - So pre-training scaling laws are pretty simple, right?
00:23:34.760 | You get more compute, and then I throw it at a model,
00:23:37.240 | and it'll get better.
00:23:38.120 | Now what is that?
00:23:38.960 | That breaks out into two axes, right?
00:23:40.360 | Data and parameters, right?
00:23:42.280 | The bigger the model, the more data, the better.
00:23:44.320 | And there's actually an optimal ratio, right?
00:23:46.120 | So Google published a paper called Chinchilla,
00:23:48.840 | which gives the optimal ratio of data to parameters,
00:23:53.240 | i.e. model size, and that's the scaling law.
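(A sketch of the Chinchilla result being referenced, using the fitted constants from Hoffmann et al., 2022 rather than anything stated in the conversation:)

```latex
% Fitted loss for N parameters trained on D tokens (compute C ~ 6ND):
\[
L(N, D) = E + \frac{A}{N^{0.34}} + \frac{B}{D^{0.28}},
\qquad C \approx 6ND
\]
% Minimizing L at fixed C gives roughly D^{*} \approx 20\,N^{*},
% i.e. about twenty training tokens per parameter.
```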
00:23:55.280 | Now what happens when the data runs out?
00:23:56.720 | Well, I don't really get much more data,
00:23:59.600 | but I keep growing the size of the model
00:24:01.120 | because my budget for compute keeps growing.
00:24:03.960 | This is a bit not fair, though, right?
00:24:06.320 | We have barely, barely, barely tapped video data, right?
00:24:10.800 | So there is a significant amount of data that's not tapped.
00:24:13.000 | It's just video data is so much more information
00:24:17.200 | than written data, right?
00:24:19.160 | And so therefore, you're throwing that away.
00:24:20.400 | But I think that's part of the,
00:24:23.280 | there's a bit of misunderstanding there.
00:24:24.960 | But more importantly, text is the most efficient domain,
00:24:28.200 | right?
00:24:29.040 | Humans generally, yes, a picture paints a thousand words,
00:24:32.040 | but if I write a hundred words,
00:24:35.760 | you can probably figure out what I mean faster, right?
00:24:38.760 | - And the transcripts of most of those videos were already--
00:24:38.760 | - Yeah, the transcripts of many of those videos
00:24:40.440 | are in there already.
00:24:43.040 | But regardless, the data is like a big axis.
00:24:47.040 | Now, the problem is this is only pre-training, right?
00:24:50.240 | Quote, pre.
00:24:51.920 | Training a model is more than just the pre-training, right?
00:24:54.760 | There's many elements of it.
00:24:56.600 | And so people have been talking about,
00:24:57.920 | hey, inference time compute.
00:24:59.480 | Yeah, that's important, right?
00:25:00.480 | You can continue to scale models
00:25:01.640 | if you figure out how to make them think
00:25:03.240 | and recursively be like, oh, that's not right.
00:25:05.240 | Let me think this way.
00:25:06.080 | Oh, that's not right.
00:25:06.920 | That, let me, you know, much like, you know,
00:25:08.560 | you don't hire an intern and say,
00:25:10.240 | hey, what's the answer to X?
00:25:11.840 | Or you don't hire a PhD and say,
00:25:12.800 | hey, what's the answer to X?
00:25:13.800 | You're like, go work on this.
00:25:14.920 | And then they come back and bring something to you.
00:25:16.320 | So inference time compute is important,
00:25:18.120 | but really what's more important is,
00:25:20.040 | as we continue to get more and more compute,
00:25:22.400 | can we improve models if data is run out?
00:25:24.760 | And the answer is you can create data
00:25:27.040 | out of thin air almost, right?
00:25:29.440 | In certain domains, right?
00:25:31.040 | And so this is the whole,
00:25:32.480 | the debate around scaling laws is
00:25:34.480 | how can we create data, right?
00:25:37.440 | And so what is Ilya's company doing?
00:25:40.040 | Most likely.
00:25:40.880 | What is Mira's company doing?
00:25:42.320 | Most likely.
00:25:43.160 | Mira Murati, former CTO of OpenAI.
00:25:46.720 | What are, you know, all these companies focused on?
00:25:49.680 | Take OpenAI, right?
00:25:52.200 | They have Noam Brown,
00:25:53.040 | who's like sort of one of the big reasoning people
00:25:54.880 | on roadshows, just going and speaking everywhere,
00:25:56.960 | basically, right?
00:25:57.880 | What are they doing, right?
00:25:59.880 | They're saying, hey, we can still improve these models.
00:26:02.480 | Yes, spending compute at inference time is important,
00:26:04.840 | but what do we do at training time?
00:26:06.640 | 'Cause you can't just tell a model,
00:26:08.000 | think more and it gets better.
00:26:09.040 | You have to do a lot at training time.
00:26:10.520 | And so what that is, is I take the model,
00:26:13.280 | I take an objective function I have, right?
00:26:15.880 | What is the square root of 81, right?
00:26:18.600 | Now, if I told you the square,
00:26:19.720 | ask many people what's the square root of 81,
00:26:21.960 | many could answer,
00:26:23.000 | but I bet many people could answer
00:26:24.720 | if they thought about it more,
00:26:25.680 | like almost, you know, a lot more people, right?
00:26:27.440 | Maybe it's a simplistic problem.
00:26:28.840 | But you say, hey, let's have the existing model do that.
00:26:31.280 | Let's have it just run every possible,
00:26:34.200 | you know, not every possible,
00:26:35.040 | many permutations of this.
00:26:36.960 | Start off with say five,
00:26:38.200 | and then anytime it's unsure branch into multiple.
00:26:40.600 | So you start out,
00:26:41.440 | you have hundreds of quote unquote rollouts or trajectories
00:26:45.120 | of generated data.
00:26:46.120 | Most of this is garbage, right?
00:26:48.360 | You prune it down to,
00:26:49.640 | hey, only these paths got to the right answer.
00:26:52.080 | Okay, now I feed that and that is now new training data.
00:26:55.040 | And so I do this with every possible area
00:26:57.640 | where I can do functional verification.
00:26:59.640 | Functional verification, i.e.,
00:27:01.240 | hey, this code compiles.
00:27:02.560 | Hey, this unit test that I have in my code base,
00:27:05.720 | how can I generate the solution?
00:27:07.080 | How can I generate the function?
00:27:08.200 | Okay, now, and you do this over,
00:27:10.600 | and over, and over, and over again,
00:27:12.160 | across many, many, many different domains
00:27:14.160 | where you can functionally prove it's real.
00:27:16.320 | You generate all this data,
00:27:17.320 | you throw away the vast, vast majority of it,
00:27:19.360 | but you now have some chains of thought
00:27:22.600 | that you can train the model on,
00:27:24.160 | which then it will learn how to do that more effectively,
00:27:26.720 | and it generalizes outside of it, right?
00:27:28.520 | And so this is the whole domain.
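(A minimal toy sketch of the rollout-and-prune loop just described, reusing the square-root-of-81 example; the random sampler is a stand-in for actual LLM generation:)

```python
import random

def generate_rollouts(n=100):
    # Stand-in for sampling n chain-of-thought trajectories from a model;
    # here each "rollout" just proposes a candidate answer.
    return [random.randint(1, 20) for _ in range(n)]

def verify(proposed: int) -> bool:
    # Functional verification: exact, objective grading.
    # For code, this would be "compiles and passes the unit tests."
    return proposed * proposed == 81

rollouts = generate_rollouts(n=100)
# Most rollouts are garbage; prune to the paths that reached the answer.
training_data = [r for r in rollouts if verify(r)]
print(f"kept {len(training_data)} of {len(rollouts)} rollouts as new training data")
```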
00:27:29.880 | Now, when you talk about scaling laws,
00:27:32.600 | its point of diminishing returns
00:27:34.640 | is kind of not proven yet, by the way, right?
00:27:38.040 | Because it's more so,
00:27:39.440 | hey, the scaling laws are on a log-log axis,
00:27:41.920 | i.e., it takes 10x more investment
00:27:44.240 | to get the next iteration.
00:27:45.800 | Well, 10x more investment, you know,
00:27:47.680 | going from 30 million to 300 million,
00:27:50.840 | 300 million to 3 billion is relevant,
00:27:52.800 | but when Sam wants to go from 3 billion to 30 billion,
00:27:56.320 | it's a little difficult to raise that money, right?
00:27:58.680 | That's why, you know, the most recent rounds
00:28:00.400 | are a bit like, ooh, crap,
00:28:01.920 | we can't spend 30 billion on the next run.
00:28:04.520 | And so the question is, well, that's just one axis.
00:28:07.640 | Where have we gone on synthetic data?
00:28:09.480 | Oh, we're still like very early days, right?
00:28:11.720 | We've spent tens of millions of dollars, maybe,
00:28:14.280 | on synthetic data.
00:28:15.400 | - With synthetic data,
00:28:18.080 | you used a qualifier in certain domains.
00:28:20.840 | When they released o1,
00:28:22.560 | it also had a qualifier like that in certain domains.
00:28:26.240 | I'm just saying those two scaling axes
00:28:28.760 | do better in certain domains
00:28:30.400 | and aren't as applicable in others,
00:28:32.200 | and we have to figure that out.
00:28:33.600 | - Yeah, I think one of the interesting things about AI
00:28:35.880 | is that first in 2022, 2023,
00:28:39.960 | with the release of diffusion models,
00:28:41.720 | with the release of text models, people were like,
00:28:43.040 | oh, wow, artists are the ones that are the most,
00:28:45.000 | you know, out of luck, not technical jobs.
00:28:47.840 | Actually, these things suck at technical jobs.
00:28:49.720 | But with this new axis of synthetic data
00:28:53.680 | and test-time compute,
00:28:55.520 | actually, where are the areas where we can teach the model?
00:28:57.960 | We can't teach it what good art is
00:29:00.360 | because we have no way to functionally prove
00:29:02.800 | what good art is.
00:29:03.640 | We can teach it to write really good software.
00:29:06.360 | We can teach it how to do mathematical proofs.
00:29:09.040 | We can teach it how to engineer systems
00:29:11.080 | because there are, while there are trade-offs,
00:29:13.400 | and this is not like, it's not just a one-zero thing,
00:29:15.880 | especially on engineering systems,
00:29:17.440 | this is something you can functionally verify.
00:29:19.280 | Does this work or not?
00:29:20.360 | Or is this correct or not?
00:29:21.200 | - You grade the output
00:29:22.040 | and then the model can iterate more often.
00:29:24.240 | - Exactly.
00:29:25.080 | - Which goes back to the AlphaGo thing
00:29:26.960 | and why that was a sandbox that could allow
00:29:30.600 | for novel moves and plays
00:29:35.320 | 'cause you could traverse it and run synthetically.
00:29:39.280 | You could just let it create and create and create.
00:29:42.000 | - Putting on my investor hat, public investor hat here,
00:29:46.200 | there is a lot of tension in the world
00:29:48.800 | over NVIDIA as we look forward at 2025
00:29:53.120 | and this question of pre-training, right?
00:29:56.520 | And if in fact, we've seen,
00:29:59.440 | we plucked 90% of the low-hanging fruit
00:30:01.840 | that comes from pre-training,
00:30:03.240 | then do people really need to buy bigger clusters?
00:30:05.360 | And I think there's a view in the world,
00:30:07.160 | particularly post Ilya's comments,
00:30:09.480 | that, no, the 90% benefit of pre-training is gone.
00:30:13.520 | But then I look at the comments out of Hock Tan this week,
00:30:18.400 | during their earnings call,
00:30:19.480 | that all the hyperscalers are building
00:30:21.360 | these million XPU clusters.
00:30:24.280 | I look at the commentary out of xAI
00:30:27.560 | that they're gonna build 200 or 300,000 GPU clusters,
00:30:31.960 | Meta reportedly building much bigger clusters,
00:30:34.880 | Microsoft building much bigger clusters.
00:30:37.000 | How do you square those two things, right?
00:30:39.440 | If everybody's right and pre-training's dead,
00:30:42.080 | then why is everybody building much bigger clusters?
00:30:44.920 | - So the scaling, right, goes back to
00:30:48.000 | what's the optimal ratio?
00:30:49.240 | What's the, how do we continue to grow, right?
00:30:50.880 | Just blindly growing parameter count
00:30:52.720 | when we don't have any more data,
00:30:54.040 | or the data is very hard to get at,
00:30:55.560 | i.e. because it's video data,
00:30:57.960 | wouldn't give you so many gains.
00:30:59.720 | And then there's also the axis, if it's a log chart, right?
00:31:01.720 | You need 10X more to get the next jump, right?
00:31:04.000 | So when you look at both of those,
00:31:05.760 | oh, crap, like I need to invest 10X more.
00:31:07.880 | And I might not get the full gain
00:31:09.480 | because I don't have the data.
00:31:10.480 | But the data generation side,
00:31:13.080 | we are so early days with this, right?
00:31:15.240 | - So the point is I'm still gonna squeak out enough gain
00:31:18.760 | that it's a positive return,
00:31:20.840 | particularly when you look at the competitive dynamic,
00:31:24.480 | you know, our models versus our competitor models.
00:31:26.760 | So it's a rational decision
00:31:28.800 | to go from 100,000 to 200,000 or 300,000,
00:31:32.040 | even if, you know, the kind of big one-time gain
00:31:35.040 | in pre-training is behind us.
00:31:36.800 | - Or rather it's exponentially more,
00:31:38.560 | it's logarithmically more expensive to do that gain.
00:31:40.600 | - Correct. - Right?
00:31:41.440 | So it's still there.
00:31:42.280 | Like the gain is still there,
00:31:43.120 | but like the sort of whole, like Orion has failed
00:31:46.000 | sort of narrative around OpenAI's model
00:31:48.880 | and they didn't release Orion, right?
00:31:50.280 | They released O1, which is sort of a different axis.
00:31:53.360 | It's partially because, hey, this is, you know,
00:31:55.680 | because of these like data issues,
00:31:57.040 | but it's partially because they did not scale 10X, right?
00:31:59.920 | 'Cause scaling 10X from four to this was actually like--
00:32:03.720 | - I think this is Gavin's point, right?
00:32:05.520 | - Well, I would also, let's go to Gavin a second.
00:32:07.760 | One of the reasons this became controversial, I think,
00:32:11.120 | is Dario and Sam had prior to this moment,
00:32:16.120 | at least the way I heard them,
00:32:22.800 | made it sound like they were just gonna build
00:32:25.400 | the next biggest thing and get the same amount of gain.
00:32:29.240 | They had left that impression.
00:32:31.320 | And so we get to this place, as you described it,
00:32:33.760 | it's not quite like that.
00:32:35.320 | And then people go, "Oh, what does that mean?"
00:32:36.880 | Like it causes them to raise their head up.
00:32:38.600 | - I think they have never said the Chinchilla scaling laws
00:32:42.640 | were what delivers us, you know, AGI, right?
00:32:45.560 | They've had scaling.
00:32:46.440 | Scaling is, you need a lot of compute.
00:32:48.840 | And guess what?
00:32:49.760 | If you have to generate a ton of data
00:32:53.080 | and throw away most of it because,
00:32:54.640 | hey, only some of the paths are good,
00:32:56.680 | you're spending a ton of compute at train time, right?
00:32:59.800 | And this is sort of the axis that is like,
00:33:01.840 | we may actually see models improve faster
00:33:04.520 | in the next six months to a year
00:33:06.120 | than we saw them improve in the last year.
00:33:07.880 | Because there's this new axis of synthetic data generation
00:33:11.960 | and the amount of compute we can throw at it is,
00:33:14.640 | we're still right here in the scaling law, right?
00:33:17.200 | We're not here.
00:33:18.040 | We haven't pushed it to billions of dollars
00:33:20.400 | spent on synthetic data generation,
00:33:22.680 | functional verification, reasoning training.
00:33:24.600 | We've only spent millions, tens of millions of dollars,
00:33:27.240 | right?
00:33:28.080 | So what happens when we scale that up?
00:33:29.040 | So there is a new axis of spending money.
00:33:31.720 | And then there's, of course, test time compute as well,
00:33:33.560 | i.e. spending time at inference to get better and better.
00:33:36.000 | So it's possible.
00:33:37.440 | And in fact, many people at these labs believe
00:33:39.760 | that the next year of gains
00:33:40.800 | or the next six months of gains will be faster
00:33:42.880 | because they've unlocked this new axis
00:33:44.960 | through a new methodology, right?
00:33:47.520 | And it's still scale, right?
00:33:49.160 | Because this requires stupendous amounts of compute.
00:33:51.880 | You're generating so much more data than exist on the web,
00:33:54.920 | and then you're throwing away most of it,
00:33:56.240 | but you're generating so much data
00:33:57.920 | that you have to run the model constantly, right?
00:34:00.760 | - What domains do you think are most applicable
00:34:05.720 | with this approach?
00:34:06.840 | Like where were synthetic data be most effective?
00:34:11.680 | And maybe you could do both a pro and a con,
00:34:16.200 | like a scenario where it's gonna be really good
00:34:18.560 | and one where it wouldn't work as well.
00:34:20.240 | - Yeah, so I think that goes back to the point around
00:34:23.280 | what can we functionally verify is true or not?
00:34:26.160 | What can I grade and it's not subjective?
00:34:28.480 | What class can you take in college
00:34:31.360 | and you get the card, you get the thing back
00:34:33.520 | and you're like, "Oh, this is BS."
00:34:34.920 | Or you're like, "Dang, I messed that up," right?
00:34:36.560 | There's certain classes where you can--
00:34:37.400 | - There's like a determinism of grading the output.
00:34:42.240 | - Right, exactly.
00:34:43.080 | So if it can be functionally verified, amazing.
00:34:46.400 | If it has to be judged, right?
00:34:47.720 | So there's sort of two ways to judge an output, right?
00:34:49.880 | There is, without using humans, right?
00:34:52.320 | This is sort of the whole scale AI, right?
00:34:54.640 | What were they initially doing?
00:34:56.240 | They were using a ton of manpower
00:34:59.000 | to create good data, right?
00:35:01.000 | Label data.
00:35:01.880 | But now, humans don't scale for this level of data, right?
00:35:05.280 | Humans post on the internet every day
00:35:06.640 | and we've already tapped that out, right?
00:35:08.000 | Kind of more or less on a text domain.
00:35:09.600 | - So what are domains that work?
00:35:11.360 | - So these are domains where, hey, in Google,
00:35:14.840 | when they push data to any of their services,
00:35:17.920 | they have tons of unit tests.
00:35:19.360 | These unit tests make sure everything's working.
00:35:21.360 | Well, why can't I have the LLM
00:35:23.520 | just generate a ton of outputs
00:35:25.160 | and then use those unit tests to grade those outputs, right?
00:35:27.520 | Because it's pass or fail, right?
00:35:28.880 | It's not subjective.
00:35:30.040 | And then you can also grade these outputs in other ways.
00:35:31.920 | Like, oh, it takes this long to run
00:35:33.120 | versus this long to run.
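(A compact sketch of that unit-test grading idea; the hard-coded candidates stand in for LLM-generated code:)

```python
# Grade generated code by executing it against a known unit test.
candidates = [
    "def square_root(x):\n    return x ** 0.5",  # should pass
    "def square_root(x):\n    return x / 2",     # should fail
]

def passes_unit_test(source: str) -> bool:
    scope = {}
    try:
        exec(source, scope)                     # does it even run?
        return scope["square_root"](81) == 9.0  # the unit test itself
    except Exception:
        return False

graded = [src for src in candidates if passes_unit_test(src)]
print(f"{len(graded)} of {len(candidates)} outputs pass")  # 1 of 2
```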
00:35:33.960 | So then you have various ways to grade.
00:35:34.960 | There's other areas such as like, hey, image generation.
00:35:37.960 | Well, actually it's harder to say
00:35:39.160 | which image looks more beautiful to you versus me.
00:35:42.720 | I might like some sunsets and flowers
00:35:45.880 | and you might like the beach, right?
00:35:47.280 | You can't really argue what is good there.
00:35:49.600 | So there's no functional verification.
00:35:51.480 | There is only subjective, right?
00:35:53.880 | So the objective nature of this is where,
00:35:55.840 | so where do we have objective grading, right?
00:35:57.960 | We have that in code.
00:35:58.800 | We have that in math.
00:35:59.640 | We have that in engineering.
00:36:01.040 | And while these can be complex, like, hey,
00:36:04.400 | engineering is not just, this is the best solution.
00:36:06.400 | It's, hey, given all the resources we've had
00:36:08.400 | and given all these trade-offs,
00:36:09.360 | we think this is the best trade-off, right?
00:36:11.040 | That's usually what engineering ends up being.
00:36:13.080 | Well, I can still look at all these axes, right?
00:36:16.920 | Whereas in subjective things, right?
00:36:19.560 | Like, hey, what's the best way to write this email
00:36:22.000 | or what's the best way to negotiate with this person?
00:36:24.080 | That's difficult, right?
00:36:25.520 | That's not something that is objective.
00:36:26.920 | - What are you hearing from the hyperscalers?
00:36:28.480 | I mean, they're all out there saying,
00:36:29.680 | our CapEx is going up next year.
00:36:31.120 | We're building larger clusters.
00:36:32.680 | You know, is that in fact happening?
00:36:35.320 | Like, what's happening out there?
00:36:37.040 | - Yeah, so I think when you look
00:36:38.800 | at the streets estimates for CapEx,
00:36:40.520 | they're all far too low, you know,
00:36:42.720 | based on a few factors, right?
00:36:45.080 | So when we track every data center in the world
00:36:46.960 | and it's insane how much, especially Microsoft
00:36:51.120 | and now Meta and Amazon and, you know, and many others,
00:36:54.800 | right, but those guys specifically are spending
00:36:57.640 | on data center capacity.
00:36:58.680 | And as that power comes online,
00:36:59.960 | which you can track pretty easily
00:37:00.920 | if you look at all of the different regulatory filings
00:37:03.840 | and use satellite imagery, all these things that we do,
00:37:06.040 | you can see that, hey, they're going to have
00:37:07.720 | this much data center capacity, right?
00:37:09.840 | - Right.
00:37:10.680 | So it's accelerating.
00:37:12.760 | - What are you going to fill in there, right?
00:37:14.480 | It turns out, to fill it up, you know,
00:37:17.080 | you can make some estimates around how much power
00:37:18.640 | each GPU is, all in, everything, right?
00:37:21.240 | Satya said he's going to slow down that a little bit,
00:37:22.960 | but they've signed deals for next year rentals, right?
00:37:25.360 | For some, in some of these cases, right?
00:37:27.040 | - And it's part of the reason he said is he expects
00:37:29.440 | his cloud revenue in the first half of next year
00:37:31.160 | to accelerate, because he said we're going to have
00:37:32.680 | a lot more data center capacity
00:37:34.160 | and we're currently capacity constrained.
00:37:36.000 | So, you know, like again,
00:37:37.960 | going back to the, is scaling dead?
00:37:39.480 | Then why is Mark Zuckerberg building
00:37:40.800 | a two gigawatt data center in Louisiana?
00:37:42.520 | - Right.
00:37:43.360 | - Why is, why is Amazon building
00:37:45.360 | these multi gigawatt data centers?
00:37:46.560 | Why is Google, why is Microsoft building
00:37:48.280 | multiple gigawatt data centers,
00:37:49.480 | plus buying billions and billions of dollars of fiber
00:37:53.280 | to connect them together because they think,
00:37:54.760 | hey, I need to win on scale.
00:37:56.520 | So let me just connect all the data centers together
00:37:58.240 | with super high bandwidth.
00:37:59.240 | So then I can make them act like one data center, right?
00:38:01.680 | - Right.
00:38:02.520 | - Towards one job, right?
00:38:03.360 | So this whole, like,
00:38:05.040 | is scaling over narrative falls on its face
00:38:08.080 | when you see what the people who know the best
00:38:10.480 | are spending on, right?
00:38:12.400 | - You talked a lot at the beginning
00:38:13.560 | about NVIDIA's differentiation
00:38:15.360 | around these large coherent clusters
00:38:17.960 | that are used in pre-training.
00:38:20.440 | Can you see anything, like, I guess one,
00:38:23.640 | someone might be super bullish on inference
00:38:25.840 | and keep building out a data center,
00:38:28.040 | but they might have thought they were gonna go
00:38:31.600 | from 100,000 nodes to 200 to 400
00:38:34.840 | and might not be doing that right now
00:38:37.040 | if this pre-training thing is real.
00:38:39.560 | Are you seeing anything that gives you
00:38:41.840 | any visibility on that dimension?
00:38:44.240 | - So when you think about training a neural network, right,
00:38:47.160 | it is doing a forwards pass and a backwards pass, right?
00:38:49.560 | Forwards pass is generating the data, basically,
00:38:52.240 | and it's half as much compute as the backwards pass,
00:38:54.840 | which is updating the weights.
00:38:56.920 | When you look at this new paradigm
00:38:58.520 | of synthetic data generation,
00:39:00.640 | grading the outputs, and then training the model,
00:39:03.400 | you are going to do many, many, many forward passes
00:39:05.960 | before you do a backwards pass.
00:39:06.920 | What is serving a user?
00:39:07.920 | That's also just a forwards pass.
00:39:09.600 | So it turns out that there is a lot of inference
00:39:13.240 | in training, right?
00:39:14.440 | In fact, there's more inference in training
00:39:16.560 | than there is updating the model weights
00:39:18.480 | because you have to generate hundreds of possibilities
00:39:21.800 | and then, oh, you only train on a couple of them, right?
00:39:24.200 | So that paradigm is very relevant.
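
A quick sketch of that arithmetic, using the standard approximation that a transformer forward pass costs roughly 2P FLOPs per token and the backward pass roughly twice that; the model size, candidate count, and token lengths below are illustrative assumptions, not figures from the conversation:

```python
# Back-of-envelope: how much of a synthetic-data / rejection-sampling
# training loop is actually inference. All inputs are illustrative.

PARAMS = 70e9                              # hypothetical model size (70B)
FWD_FLOPS_PER_TOK = 2 * PARAMS             # ~2P FLOPs/token, forward pass
BWD_FLOPS_PER_TOK = 2 * FWD_FLOPS_PER_TOK  # backward pass ~= 2x the forward

candidates_per_prompt = 100    # generate hundreds of possibilities...
kept_per_prompt = 2            # ...but only train on a couple of them
tokens_per_sample = 1_000

gen_flops = candidates_per_prompt * tokens_per_sample * FWD_FLOPS_PER_TOK
# Training on the kept samples needs a forward AND a backward pass:
train_flops = kept_per_prompt * tokens_per_sample * (
    FWD_FLOPS_PER_TOK + BWD_FLOPS_PER_TOK)

print(f"inference share of the loop: {gen_flops / (gen_flops + train_flops):.0%}")
# -> ~94%: nearly all the FLOPs in this loop are forward passes
```
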
00:39:27.560 | The other paradigm I would say that is very relevant
00:39:29.400 | is when you're training a model,
00:39:33.760 | do you necessarily need to be co-located
00:39:36.560 | for every single aspect of it, right?
00:39:38.880 | And this is-- - And what's the answer?
00:39:40.560 | - The answer is, depends on what you're doing.
00:39:42.680 | If you're in the pre-training paradigm,
00:39:44.600 | then, yeah,
00:39:46.000 | you need it to be co-located, right?
00:39:48.160 | You need everything to be in one spot.
00:39:49.960 | Yeah, why did Microsoft in Q1 and Q2
00:39:52.640 | sign these massive fiber deals, right?
00:39:54.760 | And why are they building multiple similar-sized
00:39:57.920 | data centers in Wisconsin and Atlanta and Texas
00:40:01.080 | and so on and so forth, right, and Arizona?
00:40:02.920 | Why are they doing that?
00:40:03.920 | Because they already see the research is there
00:40:06.400 | for being able to split the workload more appropriately,
00:40:08.920 | which is, hey, this data center, it's not serving users.
00:40:11.680 | It's running inference.
00:40:12.520 | It's just running inference
00:40:13.840 | and then throwing away most of the output
00:40:15.720 | because some of the output is good
00:40:18.040 | because I'm grading it, right?
00:40:19.000 | And it's doing, they're doing this
00:40:20.320 | while they're also updating the model in other areas.
00:40:22.200 | So the whole paradigm of training,
00:40:25.640 | pre-training is not slowing down.
00:40:27.640 | It's just logarithmically more expensive
00:40:30.480 | for each generation, for each incremental improvement.
00:40:32.240 | - So people are finding other ways to--
00:40:33.840 | - But there's other ways to not just continue this,
00:40:36.840 | but hey, I don't need a logarithmic increase in spend
00:40:40.480 | to get the next generation of improvement.
00:40:42.440 | In fact, through this reasoning, training, and inference,
00:40:46.560 | I can get that logarithmic improvement in the model
00:40:48.960 | without ever spending that.
00:40:50.880 | Now I'm gonna do both, right?
00:40:52.640 | Because this is, because each model jump
00:40:55.280 | has unlocked huge value, right?
00:40:57.080 | - I mean, you know, the thing that I think is
00:40:59.840 | so interesting, you know, I hear Cramer on CNBC
00:41:02.440 | this morning, you know, and they're talking about,
00:41:04.760 | is this Cisco from 2000?
00:41:06.840 | I was in Omaha, Bill, Sunday night for dinner.
00:41:10.200 | You know, they're obviously big investors and utilities
00:41:13.040 | and they're watching what's going on
00:41:14.640 | in the data center build out.
00:41:15.880 | And they're like, is this Cisco from 2000?
00:41:18.400 | So I had my team pull up a chart for Cisco, you know, 2000,
00:41:22.400 | and we'll show it on the pod.
00:41:24.120 | But, you know, they peaked at like 120 PE, right?
00:41:29.000 | And, you know, if you look at the fall off
00:41:31.760 | that occurred in revenue and in EBITDA, you know,
00:41:34.640 | and then the 70% compression
00:41:36.640 | in the price to earnings multiple, right?
00:41:39.000 | So the price to earnings multiple went from 120
00:41:41.320 | down to something closer to 30.
00:41:43.480 | And so I said to, you know, in this dinner conversation,
00:41:47.120 | I said, well, NVIDIA's, you know, PE today is 30.
00:41:51.600 | It's not 120, right?
00:41:53.400 | So you would have to think that there would be 70% PE
00:41:56.360 | compression from here or that their revenue
00:41:58.600 | was gonna fall by 70% or that their earnings
00:42:01.040 | were gonna fall by 70%, you know,
00:42:03.120 | in order to have a Cisco-like event,
00:42:05.280 | we all have post-traumatic stress about that.
00:42:07.200 | I mean, hell, you know, I lived through that too.
00:42:09.560 | Nobody wants to repeat that.
00:42:11.200 | But when people make that comparison,
00:42:12.840 | it strikes me as uninformed, right?
00:42:15.640 | It's not to say that there can't be a pullback,
00:42:17.800 | but given what you just told us
00:42:19.320 | about the build-out next year,
00:42:21.120 | given what you told us about scaling laws continuing,
00:42:23.960 | you know, what do you think when you hear, you know,
00:42:27.520 | the Cisco comparison when people are talking about NVIDIA?
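
For reference, the multiple-compression arithmetic in that comparison works out as below; the P/E figures are the rough, round numbers from the conversation, not precise market data:

```python
# Price = P/E multiple x earnings. Cisco-2000-style drawdown math,
# using the round figures quoted above.

cisco_peak_pe, cisco_trough_pe = 120, 30
compression = 1 - cisco_trough_pe / cisco_peak_pe
print(f"Cisco multiple compression: {compression:.0%}")  # 75%, "~70%" in round terms

# Starting from a ~30 P/E, the same percentage hit from multiple
# compression alone would push NVIDIA toward a single-digit multiple
# (or require a comparable collapse in revenue or earnings):
nvidia_pe = 30
print(f"implied trough multiple: {nvidia_pe * (1 - compression):.1f}x")  # 7.5x
```
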
00:42:30.840 | - Yeah, so I think there's a couple of things
00:42:33.720 | that are not fair, right?
00:42:34.680 | Cisco's revenue, a lot of it was funded
00:42:37.240 | through private credit investments
00:42:41.040 | into building out telecom infrastructure, right?
00:42:44.040 | When we look at NVIDIA's revenue sources,
00:42:46.760 | very little of it is private credit, right?
00:42:49.800 | And in some cases, yes, it's private credit,
00:42:51.840 | like CoreWeave, right?
00:42:53.520 | But CoreWeave is just backstopped by Microsoft.
00:42:55.360 | There is significant amounts of, like, difference
00:42:57.480 | in, like, what is the source of the capital, right?
00:42:59.680 | The other thing is, at the peak of the dot-com,
00:43:02.160 | you know, especially once you inflation-adjust it,
00:43:04.640 | the private capital entering the space
00:43:08.240 | was much larger than it is today, right?
00:43:11.720 | As much as people say the venture markets are going crazy,
00:43:14.000 | throwing these huge valuations at, you know,
00:43:16.400 | all these companies, and we were just talking about this
00:43:18.560 | before the show, but, like, hey, the venture markets,
00:43:21.960 | the private markets, have not even tapped in, right?
00:43:24.280 | Guess what?
00:43:25.120 | Private market money, like in the Middle East,
00:43:27.080 | in these sovereign wealth funds, it's not coming in yet.
00:43:29.600 | Has barely come in, right?
00:43:31.160 | Why wouldn't there be a lot more spend
00:43:33.560 | from them as well, right?
00:43:35.280 | And so there is a significant amount of,
00:43:37.800 | the difference of capital, the source is positive cash flows
00:43:41.000 | of the most profitable companies that have ever lived
00:43:43.120 | or ever existed in humanity
00:43:44.880 | versus credit speculatively spent, right?
00:43:47.920 | So I think that is like a big aspect.
00:43:50.080 | That also gives it a bit of a knob, right?
00:43:52.200 | These companies that are profitable
00:43:53.520 | will be a bit more rational.
00:43:55.760 | - I think corporate America is investing more in AI
00:43:59.000 | and with more conviction than they did
00:44:01.480 | even in the internet wave also.
00:44:03.480 | - Maybe we can switch a little bit.
00:44:04.800 | You've mentioned inference time reasoning a few times now.
00:44:08.560 | It's clearly a new vector of scaling intelligence.
00:44:11.800 | And I read some of your analysis recently
00:44:14.400 | about how inference time reasoning
00:44:17.040 | is way more compute intensive, right?
00:44:20.120 | Than simply pre-train, you know, scaling pre-training.
00:44:23.360 | Why don't you walk us through,
00:44:24.600 | we have a really interesting graph here
00:44:26.680 | about why that's the case that we'll post as well.
00:44:30.600 | But why don't you walk us through first,
00:44:32.320 | just kind of what inference time reasoning is
00:44:34.800 | from the perspective of compute consumption,
00:44:37.720 | why it's so much more compute intensive.
00:44:40.240 | And so leading to the conclusion
00:44:42.360 | that if this is in fact going to continue to scale
00:44:46.080 | as a new vector of intelligence,
00:44:50.240 | it looks like it's gonna be even more compute intensive
00:44:52.720 | than what came before it.
00:44:54.480 | - Yeah, so pre-training may be slowing down
00:44:56.200 | or it's too expensive,
00:44:57.560 | but there's these other aspects of synthetic data generation
00:44:59.680 | and inference time compute.
00:45:01.040 | Inference time compute is,
00:45:03.440 | on the surface sounds amazing, right?
00:45:05.520 | I don't need to spend more training a model.
00:45:07.400 | But when you think about it for a second,
00:45:09.720 | this is actually very, very,
00:45:12.320 | this is not the way you want to scale.
00:45:13.760 | You only do that because you have to, right?
00:45:16.240 | The way, because, all right, think about it.
00:45:17.840 | GPT-4 was trained with hundreds of billions of dollars
00:45:21.280 | and it's generating billions of dollars of revenue.
00:45:24.320 | - Hundreds of millions of dollars.
00:45:25.600 | - Hundreds of millions of dollars to train GPT-4.
00:45:27.720 | And it's generating billions of dollars of revenue.
00:45:29.800 | So when you say like, "Hey, Microsoft's CapEx is nuts."
00:45:32.560 | Sure, but their spend on GPT-4 was very reasonable
00:45:37.320 | relative to the ROI they're getting out of it, right?
00:45:39.880 | Now, when you say, "Hey, I want the next gain."
00:45:42.440 | If I just spend sort of a large amount of capital
00:45:47.080 | and train a better model, awesome.
00:45:48.760 | But if I don't have to spend that large amount of capital
00:45:51.360 | and I deploy this better model without,
00:45:54.840 | at the time of revenue generation,
00:45:56.560 | rather than ahead of time when I'm training the model,
00:45:58.720 | this also sounds awesome.
00:46:00.360 | But this comes with this big trade-off, right?
00:46:03.880 | When you're running reasoning, right?
00:46:05.720 | You're having the model generate a lot.
00:46:09.320 | And then the answer is only a portion of that, right?
00:46:11.400 | Today, when you open up ChatGPT, use GPT-4, 4o,
00:46:15.400 | you say something, you get a response.
00:46:17.440 | You send something, you get a response, whatever it is, right?
00:46:20.760 | All of the stuff that's being generated
00:46:22.480 | is being sent to you.
00:46:23.640 | Now you're having this reasoning phase, right?
00:46:26.040 | And OpenAI doesn't wanna show you,
00:46:27.600 | but there's some open source Chinese models
00:46:29.480 | like Alibaba and DeepSeek.
00:46:31.720 | They've released some open source models,
00:46:33.000 | which are not quite as good as OpenAI, of course,
00:46:34.480 | but they show you what that reasoning looks like
00:46:37.160 | if you want to.
00:46:38.000 | And OpenAI has released some examples.
00:46:39.080 | It generates tons of things.
00:46:40.520 | It's like, it sometimes switches
00:46:42.200 | between Chinese and English, right?
00:46:43.520 | Like whatever it is, it's thinking, right?
00:46:45.360 | It's churning.
00:46:46.200 | It's like this, this, this, this.
00:46:47.160 | Oh, should I do it this way?
00:46:48.000 | Should I break it down in these steps?
00:46:49.440 | And then it comes out with an answer, right?
00:46:51.520 | Now, on the surface, awesome.
00:46:53.280 | I didn't have to spend any more on R&D or capital, right?
00:46:56.880 | I'm saying this in the loose terms.
00:46:57.960 | They don't treat training models as R&D,
00:47:01.600 | I think, on Microsoft's financials.
00:47:03.420 | But they don't have to treat this,
00:47:05.320 | they don't have this R&D ahead of time, right?
00:47:07.800 | You get it at spend time.
00:47:09.000 | But think about what that means, right?
00:47:11.840 | If for you, right, for example,
00:47:13.400 | one simple thing that we've done a lot of tests on is,
00:47:16.620 | hey, generate me this code, right?
00:47:19.460 | Like make this function.
00:47:21.040 | Great.
00:47:21.880 | I describe the function in a few hundred words.
00:47:24.660 | I get back a response that's a thousand words.
00:47:27.640 | Awesome.
00:47:28.960 | And I'm paying per token.
00:47:30.560 | When I do this with O1 or any other reasoning model,
00:47:35.740 | I'm sending the same request, right?
00:47:35.740 | A few hundred tokens.
00:47:36.580 | I'm paying for that.
00:47:37.500 | I'm getting the same response, roughly a thousand tokens.
00:47:40.060 | But in the middle, there was 10,000 tokens of it thinking.
00:47:43.180 | Now, what does that 10,000 tokens of thinking actually mean?
00:47:46.620 | It means, well, the model's spitting out
00:47:47.980 | 10 times as many tokens.
00:47:49.260 | Well, if Microsoft's generating,
00:47:51.020 | call it $10 billion of inference revenue,
00:47:53.180 | and their margins on that are good.
00:47:55.580 | They've stated this, right?
00:47:56.980 | They're anywhere from 50 to 70%,
00:47:59.820 | depending on how you count the OpenAI profit share.
00:48:03.040 | You know, anywhere from 50 to 70% gross margins.
00:48:05.440 | Their cost for that is a few billion dollars
00:48:07.840 | for $10 billion of revenue.
00:48:09.040 | - Right.
00:48:09.880 | - If, now, obviously the better model
00:48:11.960 | gets to charge more, right?
00:48:13.240 | So O1 does charge a lot more,
00:48:14.880 | but you're now increasing your cost from,
00:48:17.320 | hey, I outputted a thousand tokens
00:48:18.680 | to I outputted 11,000 tokens.
00:48:20.740 | I've 10X'd my spend to generate,
00:48:23.780 | now, not the same thing, right?
00:48:24.840 | It's higher quality.
00:48:25.680 | - Correct.
00:48:26.960 | - And that's only part of it.
00:48:29.200 | That's deceptively simple.
00:48:30.320 | It's not just 10X, right?
00:48:31.640 | Because if you go look at O1,
00:48:32.720 | despite it being the same model architecture as GPT-4o,
00:48:37.080 | it actually costs significantly more per token as well.
00:48:39.880 | And that's because of, you know,
00:48:41.040 | sort of this chart that we're looking at here, right?
00:48:43.020 | - Right.
00:48:43.860 | - And this chart shows, hey, what is GPT-4o, right?
00:48:46.300 | If I'm generating, you know, call it a thousand tokens,
00:48:48.680 | right, and that's what GPT-4o on the bottom right is,
00:48:51.420 | or Llama 405B, this is an open model,
00:48:53.560 | so it's easier to simulate, you know,
00:48:55.400 | the exact metrics of it.
00:48:56.920 | But, you know, if I'm doing that,
00:48:58.360 | I'm keeping my users, you know,
00:49:00.820 | experience of the model constant,
00:49:03.240 | i.e. the number of tokens they're getting at the speed,
00:49:05.520 | then, you know, when I ask it a question,
00:49:07.400 | it generates the unit, it generates the code,
00:49:10.040 | whatever it is, I can group together many users' requests.
00:49:14.640 | I can group together over 256 users' requests
00:49:18.160 | on one NVIDIA server for Llama 405B, right?
00:49:21.800 | Like, you know, a $300,000 server or so.
00:49:23.740 | When I do this with O1, right,
00:49:26.080 | because it's doing that thinking phase of 10,000 tokens, right,
00:49:28.920 | this is basically the whole context length thing.
00:49:31.000 | Context length is not free, right?
00:49:32.920 | Context length or sequence length
00:49:34.360 | means that it has to calculate attention,
00:49:36.280 | the attention mechanism, i.e. it spends a lot of memory
00:49:38.900 | on generating this KB cache
00:49:40.760 | and reading this KB cache constantly.
00:49:42.660 | Now the maximum batch size, i.e. concurrent users I can have
00:49:46.880 | is a fraction of that, one-fourth to one-fifth
00:49:50.800 | the number of users can currently use the server.
00:49:52.960 | So not only do I need to generate 10X as many tokens,
00:49:55.860 | each token that's generated is four to five X less users.
00:50:01.840 | So the cost increase is stupendous
00:50:04.840 | when you think about a single user.
00:50:06.480 | Cost increase for a single token
00:50:08.080 | to be generated is four to five X,
00:50:09.560 | but then I'm generating 10X as many tokens.
00:50:11.160 | So you could argue the cost increase is 50X
00:50:13.880 | for an O1 style model on input to output.
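
The two multipliers being stacked there can be made explicit; the 1,000 and 10,000 token counts and the four-to-five-x batch shrink are the round numbers from the discussion above:

```python
# Rough serving-cost multiplier for an O1-style reasoning query
# versus a plain completion, per the arithmetic above.

visible_output_toks = 1_000    # what the user receives either way
reasoning_toks = 10_000        # hidden "thinking" tokens in the middle
token_multiplier = (visible_output_toks + reasoning_toks) / visible_output_toks

batch_shrink = 4.5             # long contexts cut concurrent users ~4-5x,
                               # so each generated token costs ~4-5x more

print(f"~{token_multiplier * batch_shrink:.0f}x cost per query")  # -> ~50x
```
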
00:50:17.240 | - I knew it was 10X 'cause it was on the original O1 release,
00:50:20.120 | but with the log scale, I didn't know.
00:50:21.960 | - And it just requires you to have,
00:50:24.840 | again, to service the same number of customers,
00:50:27.800 | you have to have multiples more compute.
00:50:30.560 | - Well, there's good news and bad news here, Brad,
00:50:32.960 | which I think is what Dylan's telling us.
00:50:35.320 | If you're just selling NVIDIA hardware
00:50:37.680 | and they remain the architecture
00:50:39.400 | and this is our scaling path,
00:50:41.240 | you're gonna consume way more of it.
00:50:43.020 | - Correct, but the margins for the people
00:50:45.480 | who are generating on the other end,
00:50:47.360 | unless they can pass it on to the end consumer
00:50:49.800 | are gonna compress.
00:50:51.520 | And the thing is you can pass it on to the end consumer
00:50:53.960 | because, hey, it's not really like,
00:50:56.560 | oh, it's X percent better on this benchmark.
00:50:58.480 | It's, it literally could not do this before
00:51:01.280 | and now it can, right?
00:51:03.040 | And so--
00:51:03.860 | - And they're running a test right now
00:51:04.700 | where they're 10X-ing what they're charging,
00:51:06.880 | the end consumer, you know.
00:51:08.120 | - And it's 10X per token, right?
00:51:10.280 | Remember, they're also paying for 10X as many tokens, right?
00:51:12.600 | So it's actually, you know,
00:51:13.880 | the consumer is paying 50X more per query,
00:51:17.640 | but they're getting value out of it
00:51:19.080 | because now all of a sudden,
00:51:20.560 | it can pass certain benchmarks like SWE-bench, right?
00:51:23.640 | Software Engineering Benchmark, right?
00:51:25.440 | Which is just a benchmark for generating,
00:51:26.960 | like, you know, decent code, right?
00:51:28.940 | There's front-end web development, right?
00:51:30.340 | What do you pay front-end web developers?
00:51:32.120 | What do you pay back-end developers?
00:51:34.460 | Versus, hey, what if they use O1?
00:51:36.560 | How much more code can they output?
00:51:37.960 | How much more can they output?
00:51:39.120 | Yes, the queries are expensive,
00:51:40.800 | but they're nothing close to the human, right?
00:51:42.840 | And so each level of productivity gain I get,
00:51:45.720 | each level of capabilities jump
00:51:47.320 | is a whole new class of tasks that it can do, right?
00:51:50.600 | And therefore, I can charge for that, right?
00:51:52.360 | So this is the whole, like, axes of,
00:51:54.600 | yes, I spend a lot more to get the same output,
00:51:57.200 | but you're not getting the same output with this model.
00:51:59.520 | - Are we overestimating or underestimating
00:52:02.560 | end-demand enterprise-level demand for the O1 model?
00:52:06.920 | What are you hearing?
00:52:07.760 | - So I would say the O1 style model is so early days,
00:52:11.440 | people don't even, like, get it, right?
00:52:12.920 | O1 is like, they just crack the code and they're doing it,
00:52:15.880 | but guess what?
00:52:16.720 | Right now on, you know, some of the anonymous benchmarks,
00:52:20.360 | there are, you know, it's called LMSYS,
00:52:22.040 | which is like an arena where different LLMs
00:52:24.120 | get to, like, compete, sort of, and people vote on them.
00:52:27.360 | There's a Google model that is doing reasoning right now,
00:52:30.720 | and it's not released yet,
00:52:31.920 | but it's going to be released soon enough, right?
00:52:34.240 | Anthropic is going to release a reasoning model.
00:52:36.520 | These people are going to one-up each other,
00:52:38.080 | and also they've spent so little compute
00:52:39.880 | on reasoning right now in terms of training time.
00:52:42.320 | And they see a very clear path to spending a lot more,
00:52:45.700 | i.e. jumping up the scaling laws.
00:52:47.640 | Oh, I only spent $10 million.
00:52:49.200 | Well, wait, that means I can jump up
00:52:50.760 | two to three logarithms in scaling like that,
00:52:53.620 | because I've already got the compute.
00:52:55.720 | You know, I can go from $10 million to $1 billion
00:52:57.840 | to $10 billion on reasoning in such quick succession.
00:53:01.160 | And so the performance improvements
00:53:03.760 | we'll get out of these models is humongous, right?
00:53:06.720 | In the coming, you know, six months to a year
00:53:09.320 | in certain benchmarks where you have functional verifiers.
00:53:13.080 | - Quick question, and we promised
00:53:15.100 | we'd go to these alternatives,
00:53:16.260 | so we'll have to get there eventually.
00:53:17.660 | But if you go back, we've used this internet wave
00:53:22.180 | comparison multiple times.
00:53:24.360 | When all of the venture-backed companies
00:53:26.980 | got started on the internet,
00:53:28.180 | they were all on Oracle and Sun.
00:53:30.900 | And five years later, they weren't on Oracle or Sun.
00:53:34.140 | And some have argued it went from a development sandbox
00:53:40.580 | world to an optimization world.
00:53:40.580 | Is that likely to happen?
00:53:42.220 | Is there an equivalency here or not?
00:53:44.820 | And if you could touch on why the backend
00:53:49.820 | is so steep and cheap, like, you know,
00:53:56.180 | you just go a model, you know, behind,
00:53:59.860 | and, like, the price you can save
00:53:59.860 | by just backing up a little bit is nutty.
00:54:03.300 | - Yeah, yeah, so today, right,
00:54:05.820 | like O1 is stupendously expensive.
00:54:07.740 | You drop down to 4o, it's a lot cheaper.
00:54:09.380 | You jump down to 4o mini, it's so cheap.
00:54:11.260 | Why? Because now, with 4o mini,
00:54:13.740 | I'm competing against Llama,
00:54:15.420 | and I'm competing against DeepSeek,
00:54:16.740 | I'm competing against Mistral,
00:54:17.940 | I'm competing against Alibaba,
00:54:19.420 | and I'm competing against tons of companies.
00:54:21.100 | - So you think those are market-clearing prices?
00:54:23.620 | - I think, and in addition, right,
00:54:26.020 | there is also the problem of inferencing
00:54:28.100 | a small model is quite easy, right?
00:54:30.620 | I can run Llama 70B on one AMD GPU.
00:54:34.380 | I can run Llama 70B on one Nvidia GPU,
00:54:37.820 | and soon enough I'll be able to run it
00:54:38.860 | on one of Amazon's Trainium chips, right?
00:54:41.620 | I can sort of run this model on a single chip.
00:54:44.100 | This is a very easy, I won't say very easy problem,
00:54:46.340 | it's still hard, but it's quite a bit easier problem
00:54:49.060 | than running this complex reasoning
00:54:50.980 | or this very large model, right?
00:54:52.780 | And so there is that difference, right?
00:54:54.980 | There's also the fact that, hey,
00:54:56.420 | there's literally 15 different companies out there
00:54:59.460 | offering API inferences, inference APIs,
00:55:02.780 | on Llama, and Alibaba, and DeepSeek, and Mistral,
00:55:05.140 | like these different models, right?
00:55:06.460 | - You're talking about Cerebras, and Groq,
00:55:08.220 | and, you know, Fireworks, and all these others.
00:55:10.340 | - Yeah, Fireworks, Together, you know,
00:55:12.740 | all the companies that aren't using their own hardware.
00:55:14.740 | Now, of course, Groq and Cerebras
00:55:15.780 | are doing their own hardware and doing this as well.
00:55:17.540 | But the market, the margins here are bad, right?
00:55:22.300 | You know, sort of, we had this whole thing
00:55:23.740 | about the inference race to the bottom
00:55:25.380 | when Mistral released their Mixtral model,
00:55:27.540 | which was like very revolutionary, sort of late last year,
00:55:30.440 | because it was such a level of performance
00:55:33.780 | that didn't exist in the open source,
00:55:36.260 | that it drove pricing down so fast, right?
00:55:39.380 | Because everyone's competing for API.
00:55:41.100 | What am I, as an API provider, providing you,
00:55:43.900 | like, why don't you switch from mine to his, why?
00:55:46.220 | Because, well, there's no, it's pretty fungible, right?
00:55:48.820 | I'm still getting the same tokens on the same model.
00:55:50.580 | And so, the margins for these guys is much lower.
00:55:52.820 | So, Microsoft's earning 50 to 70% gross margins
00:55:55.860 | on OpenAI models, and that's with the profit share
00:55:58.060 | they get, or the share that they give OpenAI, right?
00:56:00.580 | Or, you know, Anthropic, similarly,
00:56:02.540 | in their most recent round,
00:56:03.380 | they were showing, like, 70% gross margins.
00:56:05.980 | But that's because they have this model.
00:56:07.900 | You step down to here, no one uses this model from,
00:56:12.580 | you know, a lot less people use it from OpenAI or Anthropic,
00:56:15.260 | because they can just, like, take the weights from Llama,
00:56:19.380 | put it on their own server, or vice versa.
00:56:21.060 | Go to one of the many competitive API providers,
00:56:23.620 | some of them being venture-funded,
00:56:24.900 | some of them, you know, and losing money, right?
00:56:27.180 | So, there's all this competition here.
00:56:28.780 | So, not only are you saying, I'm taking a step back,
00:56:31.300 | and it's an easier problem, and so, therefore,
00:56:33.700 | like, if the model's 10x smaller,
00:56:35.220 | it's, like, 15x cheaper to run.
00:56:37.220 | On top of that, I'm removing that gross margin.
00:56:39.700 | And so, it's not 15x cheaper to run,
00:56:41.620 | it's 30x cheaper to run.
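
That step-down arithmetic is worth making explicit; the 15x serving-cost gap and the 50% margin are the round figures from above, used illustratively:

```python
# Stepping down a model tier: illustrative cost stack, not measured data.

big_serve_cost = 1.0
small_serve_cost = big_serve_cost / 15   # ~10x smaller model, ~15x cheaper to serve

margin = 0.5                             # low end of the 50-70% frontier margin range
frontier_price = big_serve_cost / (1 - margin)  # price you pay the frontier lab
commodity_price = small_serve_cost       # competitive open-model APIs price near cost

print(f"~{frontier_price / commodity_price:.0f}x cheaper")  # -> ~30x
```
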
00:56:43.140 | And so, this is sort of the beauty of, like,
00:56:45.660 | well, is everything commodity?
00:56:47.140 | No, but, like, there is a huge chase to, like,
00:56:50.140 | if you're deploying it in services,
00:56:51.580 | that's gonna be, this is great for you, A.
00:56:53.740 | B, you have to have the best model,
00:56:56.620 | or you're no one if you're one of the labs, right?
00:56:58.580 | And so, you see a lot of struggles for the companies
00:57:00.980 | that were trying to build the best models, but failing.
00:57:03.060 | - And arguably, not only do you have to have the best model,
00:57:05.420 | you actually have to have an enterprise or a consumer
00:57:08.420 | willing to pay for the best model, right?
00:57:10.660 | Because at the end of the day, you know,
00:57:12.540 | the best model implies that somebody's willing to pay you
00:57:14.980 | these high margins, right?
00:57:16.460 | And that's either an enterprise or a consumer.
00:57:18.340 | So, I think, you know, you're quickly narrowing down
00:57:21.860 | to just a handful of folks who will be able to compete,
00:57:25.060 | you know, in that market.
00:57:26.020 | - I think on the model side, yes.
00:57:27.340 | I think on the who's willing to pay for these models is,
00:57:31.180 | I think a lot more people will pay for the best model, right?
00:57:34.460 | When we use models internally, right?
00:57:36.060 | We have language models go through every regulatory
00:57:39.060 | filing and permit to look at data center stuff
00:57:41.180 | and pull that out and tell us where to look
00:57:42.540 | and where to not to.
00:57:43.540 | And we just use the best model because it's so cheap, right?
00:57:47.260 | Like, the data that I'm getting out of it,
00:57:48.900 | the value I'm getting out of it is so much higher.
00:57:50.860 | - What model are you using?
00:57:52.140 | - We're using Anthropic, actually, right now,
00:57:53.580 | Claude 3.5 Sonnet, the new Sonnet.
00:57:56.660 | And so, just because O1 is a lot better on certain tasks,
00:58:00.220 | but not necessarily regulatory filings and permitting
00:58:02.820 | and things like that,
00:58:03.780 | because the cost of errors is so much higher, right?
00:58:06.060 | Same with a developer, right?
00:58:07.980 | If I can increase a developer
00:58:09.180 | who makes $300,000 a year here in the Bay by 20%,
00:58:12.820 | that's a lot.
00:58:13.660 | If I can take a team of 100 developers
00:58:16.340 | and use 75 or 50 to do the same job,
00:58:19.380 | or I can ship twice as much code,
00:58:21.380 | this is so worth using the most expensive model
00:58:23.700 | because O1, as expensive as it is relative to 4o,
00:58:27.420 | it's still super cheap, right?
00:58:29.380 | The cost for intelligence is so high in society, right?
00:58:32.860 | That's why intelligent jobs are the most high-paying jobs.
00:58:36.380 | White-collar jobs, right, are the most high-paying jobs.
00:58:38.220 | If you can bring down the cost of intelligence
00:58:40.580 | or augment intelligence,
00:58:42.140 | then there's a high market clearing price for that,
00:58:44.060 | which is why I think that sort of the,
00:58:46.300 | oh, yes, O1 is expensive,
00:58:47.980 | and people will always gravitate to
00:58:49.820 | what's the cheapest thing at a certain level of intelligence,
00:58:52.060 | but each time we break a new level of intelligence,
00:58:54.100 | it's not just, oh, we've got a few more tasks we can do.
00:58:57.580 | I think it grows the set of tasks
00:58:59.100 | that can be done dramatically.
00:59:00.700 | Very few people could use GPT-2 and 3, right?
00:59:03.860 | A lot of people can use GPT-4.
00:59:05.740 | When we get to that quality of jump
00:59:07.500 | that we see for the next generation,
00:59:09.260 | the amount of people that can use it,
00:59:11.020 | the tasks that it can do, balloons out,
00:59:13.420 | and therefore the amount of sort of white-collar jobs
00:59:15.740 | that it can augment increased productivity on will grow,
00:59:18.220 | and therefore the market clearing price for that token
00:59:20.020 | will be very high.
00:59:20.860 | - That's super interesting.
00:59:21.700 | I could make the other argument
00:59:22.540 | that someone that's in a high-volume,
00:59:25.380 | you know, just replacing tons of customer service calls
00:59:28.660 | or whatever might be tempted to minimize the spend--
00:59:33.660 | - Absolutely.
00:59:35.980 | - And maximize the amount of value add
00:59:37.980 | they build around this thing,
00:59:40.020 | database writes and reads.
00:59:41.580 | - Absolutely.
00:59:42.420 | So one of the funny things I like to,
00:59:43.860 | the calculations we did is,
00:59:45.460 | if you take one quarter of NVIDIA shipments,
00:59:47.700 | and you said all of them are gonna inference LLAMA 7B,
00:59:50.980 | you can give every single person on Earth
00:59:53.340 | 100 tokens per minute, right?
00:59:56.660 | Or sorry, 100 tokens per second.
00:59:58.300 | You can give every single person on Earth
00:59:59.620 | 100 tokens per second,
01:00:00.900 | which is like absurd.
01:00:02.500 | - Yeah.
01:00:03.340 | - You know, so like,
01:00:04.180 | if we're just deploying LLAMA 7B quality models,
01:00:07.500 | we've so overbuilt, it's not even funny.
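
That claim is easy to frame as a parametric back-of-envelope. Every input below is a loose assumption (fleet size, per-GPU throughput, utilization), and the output swings by an order of magnitude depending on them, but the overcapacity conclusion for 7B-class models holds across reasonable choices:

```python
# Parametric sanity check of the "tokens for everyone" claim.
# All inputs are assumptions to tune, not measured figures.

gpus = 1_500_000          # assumed ~ one quarter of NVIDIA shipments
flops_per_gpu = 2e15      # ~2 PFLOP/s low-precision peak, assumed
utilization = 0.4         # assumed achievable inference utilization
params = 7e9              # Llama-7B-class model
flops_per_token = 2 * params

fleet_tokens_per_sec = gpus * flops_per_gpu * utilization / flops_per_token
print(f"~{fleet_tokens_per_sec / 8e9:.0f} tokens/sec per person on Earth")
```
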
01:00:10.500 | Now, if we're deploying things that can like augment
01:00:13.260 | engineers and increase productivity
01:00:16.780 | and help us build robotics or AV or whatever else faster,
01:00:21.020 | then that's a very different like calculation, right?
01:00:24.140 | And so that's sort of the whole thing.
01:00:25.380 | Like, yes, small models are there,
01:00:26.620 | but like, they're so easy to run.
01:00:27.860 | - And it may just, both these things may be true.
01:00:30.700 | - Right, we're gonna have tons of small models
01:00:32.100 | running everywhere, but the compute cost of them is so low.
01:00:34.020 | - Yeah, fair enough.
01:00:34.980 | - Bill and I were talking about this earlier
01:00:36.540 | with respect to the hard drives,
01:00:38.580 | you know, that you used to cover.
01:00:39.980 | But if you look at the memory market,
01:00:41.540 | it's been one of these boom or bust markets.
01:00:43.740 | The idea was you would always, you know,
01:00:45.700 | sell these things when they're nearing peak.
01:00:48.340 | You know, you always buy them at trough.
01:00:49.860 | You don't own them anywhere, you know, in between.
01:00:52.900 | They trade at very low earnings multiples.
01:00:55.300 | I'm talking about Hynex and I'm talking about,
01:00:57.580 | you know, Micron.
01:00:58.740 | As you think about the shift toward inference time compute,
01:01:01.820 | it seems that the memory demanded of these chips,
01:01:05.060 | and Jensen has talked a lot about this,
01:01:07.260 | just is on a secular shift higher, right?
01:01:11.060 | Because if they're doing these passes, you know,
01:01:13.940 | and you're running, like you said,
01:01:15.300 | 10 or a hundred or a thousand passes
01:01:17.660 | for inference time reasoning,
01:01:19.260 | you just have to have more and more memory
01:01:20.900 | as this context length expands.
01:01:23.020 | So, you know, talk to us a little bit about, you know,
01:01:25.820 | kind of how you think about the memory market.
01:01:27.620 | - Yeah, so, you know, to sort of like set the stage
01:01:30.820 | a little bit more is reasoning models
01:01:34.580 | output thousands and thousands of tokens.
01:01:36.900 | And when we're looking at transformers,
01:01:40.540 | attention, right, like holy grail of transformers,
01:01:43.460 | i.e. how it like understands the entire context
01:01:46.980 | grows dramatically and the KV cache,
01:01:49.460 | i.e. the memory that is keeping track
01:01:53.020 | of how, what this like context means
01:01:55.540 | is growing quadratically, right?
01:01:57.780 | And therefore, if I go from a context length of 10 to 100,
01:02:01.340 | it's not just a 10X, it's much more, right?
01:02:03.260 | And so you treat that, right?
01:02:05.020 | Like today's reasoning models,
01:02:06.500 | they'll think 10,000 tokens, 20,000 tokens.
01:02:09.460 | When we get to, hey,
01:02:10.700 | what is complex reasoning gonna look like?
01:02:12.380 | Models are going to get to the point
01:02:13.980 | where they're thinking for hundreds of thousands of tokens.
01:02:16.500 | And then this is all one chain of thought
01:02:18.580 | or it might be some search,
01:02:19.700 | but it's gonna be thinking a lot
01:02:21.660 | and this KV cache is gonna balloon, right?
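
To put numbers on that, here is a per-user KV-cache sizing sketch for a hypothetical Llama-70B-style configuration; the layer, head, and precision values are assumptions. Strictly, the cache itself grows linearly with context while attention work over a whole generation grows roughly quadratically, but either way long thinking traces crowd out concurrent users:

```python
# Per-request KV-cache memory for an assumed Llama-70B-like config.

n_layers, n_kv_heads, head_dim = 80, 8, 128   # assumed GQA configuration
bytes_per_elem = 2                            # fp16/bf16

def kv_cache_bytes(seq_len: int) -> int:
    # 2x for keys and values, cached at every layer
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

for toks in (1_000, 10_000, 100_000):
    print(f"{toks:>7} tokens -> {kv_cache_bytes(toks) / 1e9:.1f} GB per user")
# 10x the thinking length means 10x the cache per user, so far fewer
# users fit in the same HBM.
```
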
01:02:24.020 | - You're saying memory could grow faster than GPU compute.
01:02:26.700 | - And it objectively is when you look
01:02:28.460 | at the cost of goods sold of NVIDIA,
01:02:32.100 | their highest cost of goods sold is not TSMC,
01:02:34.460 | which is a thing that people don't realize.
01:02:36.620 | It's actually HBM memory, primarily SK Hynix.
01:02:39.180 | - That may be a "for now" also, but.
01:02:40.980 | - Yeah, so there's three memory companies out there, right?
01:02:43.620 | There's Samsung, SK Hynix, and Micron.
01:02:46.540 | NVIDIA has majority used SK Hynix.
01:02:48.900 | And this is like a big shift in the memory market as a whole
01:02:51.540 | 'cause historically it has been a commodity, right?
01:02:53.980 | I.e. it's fungible.
01:02:55.980 | Whether I buy from Samsung or SK Hynix or Micron or.
01:02:58.540 | - Is the socket replaceable?
01:03:00.340 | - Yeah, and even now,
01:03:02.300 | Samsung is getting really, really hit hard
01:03:04.180 | because there's a Chinese memory maker, CXMT,
01:03:07.980 | and their memory is not as good as the latest, but it's fine.
01:03:10.220 | And in low end memory, it's fungible.
01:03:12.780 | And therefore, the price of low end memory
01:03:14.280 | has fallen a lot. - Right.
01:03:15.940 | - In HBM, Samsung has almost no share, right?
01:03:19.660 | Especially at NVIDIA.
01:03:21.460 | And so this is hitting Samsung really hard, right?
01:03:25.060 | Despite them being the largest memory maker in the world,
01:03:28.260 | everyone's always like, if you said memory,
01:03:29.500 | it's like, yeah, Samsung's a little bit ahead in tech
01:03:31.260 | and their margins are a little bit better
01:03:32.940 | and they're killing it, right?
01:03:33.820 | But now it's not quite the case
01:03:35.020 | because on the low end, they're getting a little bit hit.
01:03:36.780 | And on the high end, they can't break in
01:03:38.380 | or they keep trying, but they keep failing.
01:03:40.960 | On the flip side, you have companies like SK Hynix
01:03:43.500 | and Micron who are converting significant amounts
01:03:47.380 | of their capacity of sort of commodity DRAM to HBM.
01:03:50.940 | Now, HBM is still fungible, right?
01:03:52.960 | In that if someone hits a certain level of technology,
01:03:56.260 | they can swap out Micron to Hynix, right?
01:03:58.080 | So it's fungible in that sense, right?
01:04:00.180 | It's a commodity in that sense,
01:04:01.300 | but because reasoning requires so much more memory,
01:04:04.980 | and in the cost of goods sold from an H100 to Blackwell,
01:04:08.000 | the percentage of cost going to HBM has grown faster
01:04:11.860 | than the percentage going to leading edge silicon.
01:04:15.640 | You've got this big shift or dynamic going on.
01:04:18.380 | And this applies not just to NVIDIA's GPUs,
01:04:20.900 | but it applies to the hyperscalers GPUs as well, right?
01:04:23.740 | Or accelerators like the TPU, Amazon Trainium, et cetera.
01:04:26.780 | - And SK Hynix has higher gross margins this year
01:04:29.340 | than memory companies have historically had.
01:04:31.300 | - Correct, correct.
01:04:32.140 | If you listen to Jensen at least describe it,
01:04:34.300 | not all memory is created equal, right?
01:04:37.740 | And so it's not only that the product
01:04:39.860 | is more differentiated today,
01:04:41.100 | there's more software associated with the product today,
01:04:43.460 | but it's also how it's integrated into the overall system,
01:04:47.800 | right?
01:04:48.640 | And going back to the supply chain question,
01:04:50.480 | it sounds like it's all commodity.
01:04:52.420 | It just seems to me that at least
01:04:54.060 | there's a question out there.
01:04:55.500 | Is it structurally changing?
01:04:56.860 | We know the secular curve is up and to the right.
01:04:59.580 | - I'm hearing you say maybe.
01:05:00.740 | It may be differentiated enough to not be a commodity.
01:05:03.420 | - It may be.
01:05:04.260 | And I think another thing to point out is
01:05:07.380 | funnily enough, the gross margins on HBM
01:05:09.900 | have not been fantastic.
01:05:11.260 | - Right.
01:05:12.100 | - They've been good, but they haven't been fantastic.
01:05:13.580 | Actually, regular memory, high-end like server memory
01:05:16.540 | that is not HBM is actually higher gross margin than HBM.
01:05:19.020 | And the reason for this is because
01:05:20.780 | NVIDIA is pushing the memory makers so hard, right?
01:05:23.660 | They want the faster, newer generation of memory,
01:05:25.580 | faster and faster and faster for HBM,
01:05:28.100 | but not necessarily like everyone else for servers.
01:05:32.140 | Now, what has this, like, meant?
01:05:35.580 | It's that, hey, even though Samsung may achieve level four,
01:05:37.660 | right, or level three or whatever they had previously,
01:05:37.660 | they can't reach what Hynix is at now.
01:05:39.620 | What are the competitors doing, right?
01:05:41.020 | What is AMD and Amazon saying?
01:05:42.900 | AMD explicitly has a better inference GPU
01:05:46.820 | because they give you more memory, right?
01:05:48.980 | They give you more memory and more memory bandwidth.
01:05:50.580 | That's literally the only reason AMD's GPU
01:05:52.580 | is even considered better.
01:05:53.940 | - On chip?
01:05:55.060 | - HBM memory.
01:05:55.900 | - Okay.
01:05:56.740 | - Which is on package.
01:05:57.560 | - Okay.
01:05:58.400 | - Right?
01:05:59.220 | Specifically, yeah.
01:06:00.060 | And then when we look at Amazon,
01:06:02.460 | their whole thing at reInvent,
01:06:03.900 | if you really talk to them,
01:06:05.140 | when they announced Trainium 2,
01:06:06.260 | and our whole post about it and our analysis of it
01:06:08.060 | is like supply chain wise,
01:06:09.900 | this is, you squint your eyes,
01:06:11.780 | this looks like an Amazon Basics TPU, right?
01:06:14.260 | It's decent, right?
01:06:15.460 | But it's really cheap, A.
01:06:17.060 | And B, it gives you the most HBM capacity per dollar
01:06:20.700 | and most HBM memory bandwidth per dollar
01:06:22.860 | of any chip on the market.
01:06:24.300 | And therefore, it actually makes sense
01:06:25.900 | for certain applications to use.
01:06:27.700 | And so this is like a real, real shift.
01:06:29.660 | Like, hey, we maybe can't design as well as NVIDIA,
01:06:33.080 | but we can put more memory on the package, right?
01:06:34.740 | Now, this is just only one vector of like,
01:06:36.700 | there's a multi-vector problem here.
01:06:38.300 | They don't have the networking nearly as good.
01:06:39.780 | They don't have the software nearly as good.
01:06:41.260 | Their compute elements are not nearly as good.
01:06:43.640 | By golly, they've got more memory bandwidth per dollar.
01:06:46.540 | - Well, this is where we wanted to go
01:06:47.980 | before we run out of time,
01:06:49.100 | is just to talk about these alternatives,
01:06:50.820 | which you just started doing.
01:06:52.380 | So despite all the amazing reasons
01:06:55.540 | why no one would seemingly wanna pick a fight with NVIDIA,
01:07:00.220 | many are trying, right?
01:07:02.020 | And I even hear people talk about trying
01:07:04.520 | that haven't tried yet.
01:07:05.480 | Like OpenAI is constantly talking about their own chip.
01:07:08.420 | How are these other players doing?
01:07:11.760 | Like, how would you handicap?
01:07:12.920 | Let's start with AMD just because they're a standalone
01:07:15.920 | company, and then we'll go to some of the internal program.
01:07:18.200 | - Yeah, so AMD is competing well
01:07:20.840 | because silicon engineering-wise, they're amazing, right?
01:07:24.140 | They're competitive, potentially.
01:07:25.600 | - They kicked Intel's ass.
01:07:26.520 | - But, yeah, they kicked Intel's ass,
01:07:28.080 | but that's like, you know, stealing candy from a baby.
01:07:30.360 | - They started way down here.
01:07:32.700 | Over a 20-year period, it was pretty (beep) amazing.
01:07:35.600 | - So AMD is really good, but they're missing software.
01:07:38.100 | AMD has no clue how to do software, I think.
01:07:40.500 | They've got very few developers on it.
01:07:42.780 | They won't spend the money to build a GPU cluster
01:07:46.340 | for themselves so that they can develop software, right?
01:07:50.440 | Which is like insane, right?
01:07:51.540 | Like NVIDIA, you know, the top 500 supercomputer list
01:07:54.860 | is not relevant because most of the biggest supercomputers
01:07:57.420 | like Elon's and Microsoft's and so on and so forth,
01:07:59.460 | they're not on there.
01:08:00.700 | But NVIDIA has multiple supercomputers
01:08:03.440 | on the top 500 supercomputer list.
01:08:05.640 | And they use them fully internally
01:08:07.380 | to develop software, network software,
01:08:09.180 | whether it be network software or compute software,
01:08:10.780 | inference software, all these things.
01:08:12.700 | You know, test all these changes they make.
01:08:14.660 | And then roll out pushes, you know,
01:08:16.740 | where if XAI is mad because of, you know,
01:08:19.660 | software's not working, NVIDIA will push it the next day
01:08:22.020 | or two days later, like clockwork, right?
01:08:24.840 | Because there's tons of things that break constantly
01:08:26.440 | when you're training models.
01:08:28.820 | AMD doesn't do that, right?
01:08:30.020 | And I don't know why they won't spend the money
01:08:31.500 | on a big cluster.
01:08:33.460 | The other thing is they have no idea
01:08:34.780 | how to do system level design.
01:08:36.060 | They've always lived in the world of,
01:08:38.020 | I'm competing with Intel,
01:08:39.060 | so if I make a better chip than Intel, then I'm great.
01:08:41.480 | Because software, x86, it's x86, everything's fungible.
01:08:45.460 | - I mean, NVIDIA doesn't keep it a secret
01:08:47.220 | that they're a systems company.
01:08:48.460 | So presumably they've read that in their--
01:08:50.260 | - Yeah, and so they bought this systems company
01:08:52.020 | called ZT Systems.
01:08:53.180 | But they're, you know, behind on the whole rack scale architecture,
01:08:57.180 | which Google deployed in 2018 with the TPU v3.
01:09:00.420 | - Are there any hyperscalers that are so interested
01:09:04.680 | in AMD being successful that they're co-developing
01:09:07.460 | with them?
01:09:08.300 | - So the hyperscalers all have their own custom
01:09:11.100 | silicon efforts, but they also are helping AMD
01:09:14.020 | a lot in different ways, right?
01:09:15.060 | So Meta and Microsoft are helping them with software, right?
01:09:18.500 | Not enough that like AMD is like caught up
01:09:20.700 | or anything close to it.
01:09:21.900 | They're helping AMD a lot
01:09:22.900 | with what they should even do, right?
01:09:24.240 | So the other thing that people recognize is,
01:09:26.820 | if I have the best engineering team in the world,
01:09:28.620 | that doesn't tell me what the problem is, right?
01:09:30.380 | The problem has this, this, this, this.
01:09:32.220 | It's got these trade-offs.
01:09:33.620 | AMD doesn't know software development.
01:09:35.260 | It doesn't know model development.
01:09:36.540 | It doesn't know what inference economics look like.
01:09:39.260 | And so how do they know what trade-offs to make?
01:09:41.340 | Do I push this lever on the chip a bit harder,
01:09:43.500 | which then makes me have to back off on this?
01:09:45.260 | Or what exactly do I do, right?
01:09:47.580 | The hyperscalers are helping,
01:09:48.700 | but not enough that AMD is on the same timelines as NVIDIA.
01:09:52.580 | - How successful will AMD be in the next year on AI revenue
01:09:56.980 | and what kind of sockets might they succeed in?
01:10:00.620 | - Yes, I think they'll have a lot less success
01:10:03.580 | with Microsoft than they did this year.
01:10:05.520 | And they'll have less success with Meta
01:10:10.660 | than they did this year.
01:10:11.500 | And it's because like the regulations make it so
01:10:13.860 | actually AMD's GPU is like quite good for China
01:10:16.580 | because of the way they shaped it.
01:10:18.260 | But generally I think AMD will do okay.
01:10:21.580 | They'll profit from the market.
01:10:23.500 | They just won't like go gangbusters like people are hoping.
01:10:26.740 | And they won't be a,
01:10:28.620 | their share of total revenue will fall next year.
01:10:31.300 | - Okay.
01:10:32.260 | - But they will still do really well, right?
01:10:35.820 | Billions of dollars of revenue is nothing to scoff at.
01:10:35.820 | - Let's go with the Google TPU.
01:10:37.260 | You earlier stated that it's got the second most workloads.
01:10:42.260 | It seems like by a lot, like it's firmly in second place.
01:10:46.420 | - Yeah, so this is where the whole systems
01:10:49.080 | and infrastructure thing matters a lot more.
01:10:51.920 | Each individual TPU is not that impressive.
01:10:55.360 | It's impressive, right?
01:10:56.520 | It's got good networking.
01:10:57.420 | It's got good, you know, architecture, et cetera.
01:11:00.000 | It's got okay memory, right?
01:11:01.320 | Like it's not that impressive on its own.
01:11:03.540 | But when you say, hey, if I'm spending X amount of money
01:11:07.280 | and then what's my system, Google's TPU looks amazing, right?
01:11:10.280 | So Google's engineered it for things
01:11:11.560 | that Nvidia maybe has not focused on as much, right?
01:11:14.800 | So actually their interconnects between chips
01:11:17.240 | is arguably competitive, if not better in certain aspects,
01:11:20.440 | worse in other aspects than Nvidia's.
01:11:22.000 | Because they've been doing this with Broadcom,
01:11:23.680 | you know, the world leader in networking,
01:11:26.040 | you know, building a chip with them.
01:11:27.640 | And since 2018, they've had this scale up, right?
01:11:30.120 | Nvidia's talking about GB200, NVL72,
01:11:33.360 | TPUs go to 8,000 today, right?
01:11:36.640 | And while it's not a switch, it's a point to point,
01:11:39.320 | you know, it's a little bit,
01:11:40.360 | there's some technical nuances there.
01:11:42.440 | So it's not just like those numbers
01:11:44.120 | are not all you should look at, but this is important.
01:11:46.720 | The other aspect is,
01:11:47.800 | Google's brought in water cooling for years, right?
01:11:50.800 | Nvidia only just realized
01:11:51.760 | they needed water cooling on this generation.
01:11:53.560 | And Google's brought in a level of reliability
01:11:58.000 | that Nvidia GPUs don't have.
01:12:00.360 | You know, the dirty secret is to go ask people
01:12:02.120 | what the reliability rate of GPUs is in the cloud
01:12:04.800 | or in a deployment.
01:12:05.800 | It's like, oh God, it's not, they're reliable-ish,
01:12:08.960 | like, but like, especially initially,
01:12:10.520 | you have to pull out like 5% of them.
01:12:12.560 | - Why has TPU not been more commercially successful
01:12:15.400 | outside of Google?
01:12:16.360 | - I think Google keeps a lot of their software internal
01:12:21.200 | when they should just have it be open,
01:12:22.720 | 'cause like, who cares?
01:12:23.920 | You know, like that's one aspect of it.
01:12:26.840 | You know, there's a lot of software that DeepMind uses
01:12:28.840 | that just is not available to Google Cloud.
01:12:32.840 | - Even their Google Cloud offering relative to AWS
01:12:36.000 | had that bias.
01:12:37.400 | - Yeah, yeah.
01:12:39.240 | Number two, the pricing of it is sort of,
01:12:43.680 | it's not that it's egregious on list price,
01:12:46.480 | like list price of a GPU at Google Cloud is also egregious.
01:12:51.160 | But you as a person know when I go rent a GPU,
01:12:55.040 | you know, I tell Google like, hey, like, you know,
01:12:56.600 | blah, blah, blah, you're like, okay,
01:12:57.760 | you can get through the first round of negotiations,
01:12:59.440 | get it down.
01:13:00.280 | But then you're like, well, look at this offer from Oracle
01:13:02.320 | or from Microsoft or from Amazon or from CoreWeave
01:13:05.080 | or one of the 80 Neo clouds that exist.
01:13:07.880 | And Google might not match like many of these companies,
01:13:10.200 | but like, they'll go down because they, you know,
01:13:11.720 | and then you're like, oh, well, like,
01:13:13.160 | what's the market clearing price for a,
01:13:14.720 | if I wanted an H100 for two years or a year,
01:13:18.320 | oh yeah, I could get it for like two bucks.
01:13:19.760 | - Right.
01:13:20.600 | - A little bit over versus like the $4 quoted, right?
01:13:23.440 | Whereas a TPU it's here,
01:13:24.760 | you don't know that you can get here.
01:13:26.200 | And so people see the list price and they're like, eh.
01:13:28.520 | - Do you think that'll change?
01:13:30.120 | - I don't see any reason why it would.
01:13:32.240 | And so number three is sort of,
01:13:34.400 | Google is better off using all of their TPUs internally.
01:13:37.280 | Microsoft rents very few GPUs by the way, right?
01:13:40.120 | They actually get far more profit
01:13:42.440 | from using their GPUs for internal workloads
01:13:44.560 | or using them for inference
01:13:46.160 | because the gross margin on selling tokens is 50 to 70%.
01:13:49.920 | Right?
01:13:50.760 | The gross margin on selling a GPU server
01:13:52.520 | is lower than that, right?
01:13:53.880 | So while it is like good gross margin,
01:13:55.400 | it's like, you know, it's--
01:13:56.680 | - And they've said out of the 10 billion that they've quoted,
01:13:59.000 | none of that's coming from external renting of GPUs.
01:14:02.360 | - If Gemini becomes hyper competitive as an API,
01:14:08.120 | then you indirectly will have third parties
01:14:11.360 | using the Google TPU, is that accurate?
01:14:13.800 | - Yeah, absolutely.
01:14:14.640 | Ads, search, Gemini applications,
01:14:18.640 | all of these things use TPUs.
01:14:20.200 | So it's not that like, you know, that you're not using,
01:14:22.640 | every YouTube video you upload is going through a TPU, right?
01:14:25.200 | Like, you know, it goes through other chips as well
01:14:27.480 | that they've made themselves custom chips for YouTube.
01:14:29.160 | But like, there's so much that touches a TPU,
01:14:31.600 | but you indirectly would never rent it, right?
01:14:35.440 | And that's therefore like,
01:14:36.840 | when you look at the market of renters,
01:14:38.800 | there's only one company that accounts for over 70%
01:14:41.720 | of Google's revenue from TPUs as far as I understand,
01:14:43.840 | and that's Apple, right?
01:14:45.000 | And I think there's a whole long story
01:14:46.520 | around why Apple hates Nvidia.
01:14:48.000 | But, you know, that may be a story for another time, but--
01:14:51.920 | - You just did a super deep piece on Tranium.
01:14:55.080 | Why don't you do the Amazon version
01:14:58.320 | of what you just did with Google?
01:14:59.960 | - Yeah, so funnily enough, Amazon's chip is,
01:15:03.760 | I call it, the Amazon Basics TPU, right?
01:15:06.840 | And the reason I call it that is because,
01:15:08.800 | yes, it uses more silicon, yes, it uses more memory,
01:15:11.760 | yes, the network is like somewhat comparable to TPUs,
01:15:15.760 | right, it's a four by four by four torus.
01:15:18.160 | They just do it in a less efficient way in terms of,
01:15:23.160 | you know, hey, they're spending a lot more
01:15:25.360 | on active cables, right?
01:15:27.120 | Because they're working with Marvell and Alchip
01:15:30.200 | on their own chips versus working with Broadcom,
01:15:32.320 | the leader in networking, who then can use passive cables,
01:15:34.760 | right, 'cause their SerDes are so strong.
01:15:37.000 | Like there's other things here, their SerDes speed is lower,
01:15:40.280 | they spend more silicon area, like there's all these things
01:15:42.760 | about the Tranium that are, you know,
01:15:45.880 | you could look at it and be like, wow,
01:15:47.000 | this would suck if it was a merchant silicon thing,
01:15:48.800 | but it doesn't because Amazon's
01:15:52.200 | not paying Broadcom margins, right?
01:15:53.760 | They're paying lower margins.
01:15:55.800 | They're not paying the margins on the HBM,
01:15:58.080 | they're paying lower margins in general, right?
01:16:01.040 | They're paying the margins to Marvell on HBM.
01:16:03.240 | You know, there's all these different things they do
01:16:04.720 | to crush the price down to where their Amazon Basics TPU,
01:16:08.920 | the Trainium 2, right, is very, very cost-effective
01:16:12.280 | to the end customer and themselves
01:16:14.040 | in terms of HBM per dollar, memory bandwidth per dollar,
01:16:16.840 | and it has this world size of 64.
01:16:19.040 | Now, Amazon can't do it in one rack,
01:16:21.040 | it actually requires them two racks to do 64,
01:16:23.400 | and the bandwidth between each chip
01:16:24.680 | is much slower than Nvidia's rack,
01:16:26.480 | and their memory per chip is lower than Nvidia's,
01:16:29.920 | and their memory bandwidth per chip is lower than Nvidia,
01:16:32.040 | but you're not paying north of $40,000 per chip
01:16:37.040 | for the server, you're paying significantly less, right?
01:16:40.360 | $5,000 per chip, right?
01:16:41.840 | Like, you know, it's like such a gulf, right, for Amazon,
01:16:44.440 | and then they pass that on to the customer, right,
01:16:45.720 | 'cause when you buy an Nvidia GPU.
01:16:46.960 | So there is legitimate use cases,
01:16:50.400 | and because of this, right, Amazon and Anthropic
01:16:58.200 | have decided to make a 400,000 Trainium supercomputer, right?
01:16:58.200 | 400,000 chips, right, going back to the whole
01:16:59.960 | of scaling laws dead, no,
01:17:01.480 | they're making a 400,000 chip system
01:17:04.240 | because they truly believe in this, right?
01:17:06.520 | And 400,000 chips in one location
01:17:08.840 | is not useful for serving inference, right?
01:17:11.980 | It's useful for making better models, right?
01:17:14.040 | You want your inference to be more distributed than that.
01:17:16.880 | So this is a huge, huge investment for them,
01:17:20.600 | and while technically it's not that impressive,
01:17:25.120 | there are some impressive aspects
01:17:26.300 | that I kind of glossed over,
01:17:28.200 | it is so cheap and so cost-effective
01:17:30.240 | that I think it's a decent play for Amazon.
01:17:33.560 | - Maybe just wrapping this up,
01:17:35.040 | I wanna shift a little bit
01:17:36.880 | to kind of what you see happening in '25 and '26, right?
01:17:41.040 | For example, over the last 30 days, right,
01:17:43.940 | we've seen Broadcom, you know, explode higher,
01:17:46.920 | Nvidia trade off a lot.
01:17:49.240 | I think there's about a 40% separation
01:17:51.320 | over the last 30 days, you know,
01:17:53.120 | with Broadcom being this play on custom ASICs,
01:17:55.840 | you know, people questioning whether or not
01:17:57.760 | Nvidia's got a lot of new competition, pre-training,
01:18:01.400 | you know, not improving at the rate that it was before.
01:18:08.100 | Look into your crystal ball for '25, '26.
01:18:08.100 | What are you talking to clients about,
01:18:10.080 | you know, in terms of what you think are
01:18:13.440 | kind of the things that are most misunderstood,
01:18:15.520 | best ideas, you know, in the spaces that you cover?
01:18:20.320 | - So I think a couple of the things are, you know,
01:18:23.200 | hey, Broadcom does have multiple custom ASIC wins, right?
01:18:26.000 | It's not just Google here.
01:18:27.760 | Meta's ramping up mostly still for recommendation systems,
01:18:30.840 | but their custom chips are gonna get better.
01:18:33.400 | You know, there's other players like OpenAI
01:18:36.720 | who are making a chip, right?
01:18:38.840 | You know, there's Apple who are not quite making
01:18:40.720 | the whole chip with Broadcom,
01:18:42.720 | but a small portion of it will be made with Broadcom, right?
01:18:45.440 | You know, there's a lot of wins they have, right?
01:18:47.660 | Now, these won't all hit in '25.
01:18:50.040 | Some of them will hit in '26.
01:18:51.560 | And it's, you know, it's a custom ASIC,
01:18:53.360 | so like it could be a failure and not be good,
01:18:56.120 | like Microsoft's and therefore never ramp,
01:18:58.280 | or it could be really good and like,
01:19:00.480 | or at least, you know, good price to performance
01:19:03.320 | like Amazon's and it could ramp a lot, right?
01:19:05.040 | So there are risks here,
01:19:07.000 | but Broadcom has that custom ASIC business, one.
01:19:10.040 | And two, really importantly,
01:19:11.880 | the networking side is so, so important, right?
01:19:14.240 | Yes, NVIDIA is selling a lot of networking equipment,
01:19:16.920 | but when people make their own ASIC,
01:19:20.960 | what are they gonna do, right?
01:19:21.920 | Yes, they could go to Broadcom or not,
01:19:23.600 | but they also need to network
01:19:25.600 | many of these chips together.
01:19:29.760 | They could go to Marvell or many other competitors
01:19:31.840 | out there like Alchip or GUC.
01:19:35.920 | Broadcom is really well positioned
01:19:38.680 | to make the competitor to NVSwitch,
01:19:40.600 | which many would argue is one of NVIDIA's
01:19:43.080 | biggest competitive advantages on a hardware basis
01:19:45.520 | versus everyone else.
01:19:46.840 | And Broadcom is making a competitor to that,
01:19:49.680 | that they will seed to the market, right?
01:19:51.480 | Multiple companies will be using that.
01:19:53.360 | AMD, for one, will be using that competitor to NVSwitch,
01:19:57.080 | but they're not making it themselves
01:19:58.080 | 'cause they don't have the skills, right?
01:19:59.640 | They're going to Broadcom to get it made, right?
01:20:01.520 | - So make a call for us
01:20:03.680 | as you think about the semis market today.
01:20:07.000 | You've got ARM, Broadcom, you've got NVIDIA,
01:20:09.720 | you've got AMD, et cetera.
01:20:11.680 | Does the whole market continue to elevate
01:20:16.200 | as we head into '25 and '26?
01:20:16.200 | Who's best positioned from current levels to do well?
01:20:19.720 | Who's most, you know, overestimated?
01:20:22.000 | Who's least, who's most underestimated?
01:20:24.840 | - I think, I bought Broadcom long-term,
01:20:28.160 | but like in the next six months,
01:20:30.200 | there is a bit of a slowdown in Google TPU purchases
01:20:32.600 | because they have no data center space.
01:20:34.000 | They want more.
01:20:34.840 | They just literally have no data center space to put them.
01:20:36.760 | So we can actually, you know, see that
01:20:39.600 | there's a bit of a pause, but people may look past that.
01:20:42.880 | Beyond that, right, the question is like,
01:20:44.600 | who wins what custom ASIC deals, right?
01:20:46.800 | Is Marvell going to win future generations?
01:20:48.960 | Is Broadcom going to win future generations?
01:20:51.040 | How big are these generations going to be?
01:20:52.840 | Are the hyperscalers going to be able to internalize
01:20:54.720 | more and more of this or no, right?
01:20:56.040 | Like it's no secret Google's trying to leave Broadcom.
01:20:58.920 | They could succeed or they could fail, right?
01:21:02.260 | It's not just like--
01:21:03.100 | - Broaden out beyond Broadcom.
01:21:04.800 | I'm talking NVIDIA and everybody else.
01:21:06.520 | Like, you know, we've had these two massive years, right,
01:21:10.280 | of tailwinds behind this sector.
01:21:12.420 | Is 2025 a year of consolidation?
01:21:15.680 | Do you think it's another year that the sector does well?
01:21:18.980 | Just kind of--
01:21:20.040 | - Yeah, I think the plans for hyperscalers
01:21:22.360 | are pretty firm on,
01:21:24.640 | they're going to spend a crapload more next year, right?
01:21:26.760 | And therefore the ecosystem of networking players,
01:21:29.560 | of ASIC vendors, of systems vendors is going to do well,
01:21:33.560 | whether it be NVIDIA or Marvell or Broadcom or AMD,
01:21:36.320 | or, you know, generally, you know, some better than others.
01:21:39.020 | The real question that people should be looking out to
01:21:41.420 | is 2026, does the spend continue, right?
01:21:45.840 | The growth rate for NVIDIA
01:21:46.900 | is going to be stupendous next year, right?
01:21:48.560 | And that's going to drag the entire component supply chain up.
01:21:51.000 | It's going to bring so many people with them.
01:21:52.840 | But 2026 is like where the reckoning comes, right?
01:21:55.800 | You know, will people keep spending like this?
01:22:00.280 | And it all points to:
01:22:01.960 | will the models continue to get better?
01:22:03.560 | Because if they don't continue to get better
01:22:05.200 | (in my opinion, they'll actually get better faster next year),
01:22:07.920 | then there will be a big, you know,
01:22:09.580 | sort of clearing event, right?
01:22:11.620 | But that's not next year, right?
01:22:13.420 | You know, the other aspect I would say
01:22:14.660 | is there is consolidation in the Neo cloud market, right?
01:22:17.700 | There are 80 Neo clouds that we're tracking,
01:22:19.580 | that we talk to, that we see how many GPUs they have, right?
01:22:23.300 | The problem is nowadays,
01:22:25.400 | if you look at rental prices for H100s,
01:22:27.700 | they're tanking, right?
01:22:29.180 | Not just at these Neo clouds, right?
01:22:31.060 | You used to have to, you know,
01:22:32.820 | do four-year deals and prepay 25%.
01:22:35.420 | You'd raise a venture round and you'd buy a cluster
01:22:38.320 | and that's about it, right?
01:22:39.160 | You'd rent one cluster, right?
01:22:40.400 | Nowadays, you can get three-month, six-month deals
01:22:43.320 | at way better pricing than even
01:22:45.480 | the four-year, three-year deals
01:22:47.000 | that you used to have for Hopper, right?
01:22:49.160 | And on top of that, it's not just through the Neo clouds,
01:22:51.520 | Amazon's pricing for, you know, on-demand GPUs is falling.
01:22:54.480 | Now it's still, like, really expensive
01:22:56.480 | relatively, but pricing is falling really fast.
01:22:59.420 | 80 Neo clouds are not gonna survive.
01:23:01.560 | Maybe five to 10 will.
01:23:04.160 | And that's because five of those are sovereign, right?
01:23:07.420 | And then the other five are
01:23:08.860 | actually, like, market competitive.
01:23:09.700 | - What percentage of the industry AI revenues
01:23:13.260 | have come from those Neo clouds that may not survive?
01:23:16.140 | - Yeah, so roughly you can say hyperscalers
01:23:19.040 | are 50-ish percent of revenue, 50 to 60%.
01:23:22.620 | And the rest of it is Neo cloud/sovereign AI
01:23:26.360 | because enterprise purchases of GPU clusters
01:23:28.860 | are still quite low and it ends up being better for them
01:23:31.460 | to just, like, outsource it to Neo clouds
01:23:33.940 | when they can get through the security,
01:23:35.860 | which they can for certain companies,
01:23:37.160 | like CoreWeave even.
01:23:38.600 | - Is there a scenario in 2026
01:23:42.400 | where you see industry volumes actually down versus 2025,
01:23:47.400 | or Nvidia volumes actually down meaningfully from 2025?
01:23:55.200 | - So when you look at custom ASIC designs that are coming,
01:23:59.180 | as well as Nvidia's chips that are coming,
01:24:01.960 | the revenue content in each chip is exploding.
01:24:06.960 | The cost to make Blackwell is north of 2x
01:24:09.960 | the cost to make Hopper, right?
01:24:11.720 | So Nvidia can make the same,
01:24:13.560 | obviously they're cutting margins a little bit,
01:24:15.120 | but Nvidia can ship the same volumes
01:24:17.240 | and still grow a ton, right?
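A toy model of that point; the only figure taken from the conversation is the "north of 2x" build cost, while the volumes and margins are made up for illustration:

    # Flat unit volume, ~2x+ cost per chip, slightly trimmed gross margin:
    # revenue still grows a lot because dollar content per chip explodes.
    units = 1_000_000                            # assumed flat unit volume
    hopper_cost, hopper_gm = 10_000, 0.75        # assumed COGS and gross margin
    blackwell_cost, blackwell_gm = 22_000, 0.70  # "north of 2x" cost, margin trimmed
    hopper_rev = units * hopper_cost / (1 - hopper_gm)  # ASP = cost / (1 - GM)
    blackwell_rev = units * blackwell_cost / (1 - blackwell_gm)
    print(f"revenue growth at flat volume: {blackwell_rev / hopper_rev:.2f}x")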
01:24:18.680 | - So rather than unit volumes,
01:24:20.920 | is there a scenario where industry revenues are down in '26
01:24:25.800 | or Nvidia revenues are down in '26?
01:24:28.920 | - The reckoning is, do models continue
01:24:32.600 | to get much better, faster?
01:24:34.200 | And will hyperscalers,
01:24:36.720 | are they okay with taking their free cash flow to zero?
01:24:38.600 | I think they are, by the way.
01:24:40.600 | I think Meta and Microsoft may even take
01:24:42.920 | their free cash flows close to zero and just spend.
01:24:46.700 | But then that's only if models continue to get better,
01:24:49.320 | that's A.
01:24:50.160 | And then B, are we going to have this huge influx
01:24:52.400 | of capital from people we haven't had it yet from?
01:24:55.000 | The Middle East, the sovereign wealth funds in Singapore
01:24:57.960 | and the Nordics and Canadian pension funds and all these folks,
01:25:01.800 | they can write really big checks.
01:25:04.160 | They haven't, but they could.
01:25:06.200 | And if things continue to get better,
01:25:09.060 | I truly do believe that OpenAI and XAI and Anthropic
01:25:13.680 | will continue to raise more and more money
01:25:15.560 | and keep this game going of not just,
01:25:17.920 | "Hey, where's the revenue for OpenAI?
01:25:19.960 | Well, it's 8 billion and it might double or whatever,
01:25:22.160 | or even more next year."
01:25:23.840 | And that's their spend, no, no, no.
01:25:25.160 | Like they have to raise more money
01:25:26.360 | to spend significantly more.
01:25:28.040 | And that keeps the engine rolling
01:25:29.240 | because once one of them spends,
01:25:31.040 | Elon is forcing everyone to spend more, actually, right?
01:25:33.660 | With his cluster and his plans,
01:25:36.680 | because everybody's like, "Well, we can't get outscaled
01:25:38.480 | by Elon, we have to spend more."
01:25:40.400 | Right?
01:25:41.240 | And so there's sort of a game of chicken there too.
01:25:42.280 | We're like, "Oh, they're buying this?
01:25:44.360 | We have to match them or go bigger
01:25:46.060 | because it is a game of scale."
01:25:47.360 | So, you know, in sort of Pascal's wager sense, right?
01:25:50.760 | If I underspend, that's just the worst scenario ever.
01:25:53.400 | And I'm like the worst CEO ever
01:25:54.760 | of the most profitable business ever.
01:25:56.200 | But if I overspend, yeah, shareholders will be mad,
01:25:59.360 | but it's fine, right?
01:26:00.440 | It's, you know, $20 billion, $50 billion.
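A minimal sketch of that asymmetric-payoff logic, with entirely made-up utilities and probability:

    # "Pascal's wager" for capex: underspending while scaling keeps working
    # is catastrophic; overspending when it doesn't is a bounded loss.
    p_scaling_works = 0.5                   # assumed, purely illustrative
    payoff = {
        ("underspend", True):  -100,  # outscaled in the defining platform shift
        ("underspend", False):    0,  # saved the capex
        ("overspend",  True):    50,  # won the scale game
        ("overspend",  False):  -20,  # tens of billions written off
    }
    for action in ("underspend", "overspend"):
        ev = (p_scaling_works * payoff[(action, True)]
              + (1 - p_scaling_works) * payoff[(action, False)])
        print(f"{action}: expected payoff {ev:+.0f}")

Under these assumptions overspending dominates, which is exactly the game-of-chicken dynamic being described.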
01:26:02.400 | You can paint that either way though,
01:26:03.760 | 'cause if that becomes the reasoning for doing it,
01:26:06.320 | the probability of overshooting goes up.
01:26:10.240 | For sure.
01:26:11.080 | And every bubble ever we overshoot.
01:26:12.840 | And you know, to me,
01:26:16.320 | you said it all hangs on models improving.
01:26:20.200 | I would take it a step further, you know,
01:26:22.600 | and go back to what Satya said to us last week.
01:26:26.320 | It all comes down ultimately to the revenues
01:26:29.560 | that are generated by the people
01:26:31.240 | who are making the purchases of the GPUs, right?
01:26:34.400 | Like he said last week,
01:26:36.360 | I'm gonna buy a certain amount every single year,
01:26:40.200 | and it's going to be related to the revenues
01:26:42.520 | that I'm able to generate in that year
01:26:45.080 | or the next couple of years.
01:26:46.280 | So like, they're not gonna spend way ahead
01:26:49.400 | of where those revenues are.
01:26:51.120 | So he's looking at what, you know,
01:26:52.840 | he had 10 billion in revenues this year.
01:26:55.080 | He knows the growth rate associated
01:26:57.080 | with those inference revenues,
01:26:58.520 | and they're making, he and Amy are making some forecast
01:27:01.480 | as to what they can afford to spend.
01:27:03.040 | I think Zuckerberg's doing the same thing.
01:27:04.880 | I think Sundar's doing the same thing.
01:27:07.000 | And so if you assume they're acting rationally,
01:27:09.560 | it's not just the models improving,
01:27:11.840 | it's also the rate of adoption of the underlying,
01:27:15.080 | you know, enterprises who are using their services.
01:27:17.640 | It's the rate of adoption of consumers
01:27:19.640 | and what consumers are willing to pay
01:27:21.680 | to use ChatGPT or to use Claude
01:27:24.720 | or to use these other services.
01:27:26.520 | So, you know, if you think that infrastructure expenses
01:27:30.120 | are going to grow at 30% a year,
01:27:32.320 | then I think you have to believe
01:27:33.880 | that the underlying inference revenues, right,
01:27:36.600 | both on the consumer side and the enterprise side
01:27:38.880 | are gonna grow somewhere in that range as well.
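A tiny compounding sketch of that constraint, with indexed, illustrative numbers only: if infrastructure spend grows 30% a year while inference revenue grows meaningfully slower, the spend-to-revenue ratio blows out within a few years:

    # Spend-to-revenue ratio under mismatched growth rates (indexed to 100).
    spend, revenue = 100.0, 100.0
    spend_growth, revenue_growth = 0.30, 0.15  # assumed rates for illustration
    for year in range(1, 6):
        spend *= 1 + spend_growth
        revenue *= 1 + revenue_growth
        print(f"year {year}: spend/revenue = {spend / revenue:.2f}")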
01:27:41.760 | - There is definitely an element of spend ahead though,
01:27:43.640 | right? - For sure.
01:27:44.480 | - And it's point in time spend versus, you know,
01:27:46.320 | what do I think revenue will be
01:27:47.800 | for the next five years for the server, right?
01:27:49.640 | So I think there is an element of that for sure,
01:27:51.480 | but absolutely, right?
01:27:53.080 | Models, the whole point is models getting better
01:27:56.120 | is what generates more revenue, right?
01:27:58.120 | And it gets deployed.
01:27:59.080 | So I think that's, I'm in agreement,
01:28:00.880 | but people are definitely spending ahead of what's charted.
01:28:05.600 | - Fair enough. - Well, that's what
01:28:06.440 | makes it spicy.
01:28:08.040 | You know, it's fun to have you here.
01:28:09.720 | I mean, you know, a fellow analyst,
01:28:12.000 | you guys do a lot of digging.
01:28:13.840 | Congratulations on the success of your business.
01:28:17.120 | You know, I think you add a lot of important information
01:28:21.560 | to the entire ecosystem.
01:28:22.920 | You know, one of the things I think
01:28:24.080 | about the wall of worry, Bill,
01:28:25.720 | is the fact that we're all talking about
01:28:27.600 | and looking for, right, the bubble.
01:28:29.920 | Sometimes that's what prevents the bubble
01:28:31.720 | from actually happening.
01:28:32.880 | But, you know, as both an investor and an analyst,
01:28:36.800 | you know, I look at this and I say,
01:28:39.120 | there are definitely people out there who are spending
01:28:41.920 | who don't have commensurate revenues, to your point.
01:28:45.120 | They're spending way ahead.
01:28:47.880 | On the other hand, and frankly, you know,
01:28:50.240 | we heard that from Satya last week.
01:28:51.640 | He said, listen, I've got the revenues.
01:28:53.640 | I've said what my revenues are.
01:28:55.480 | I haven't heard that from everybody else, right?
01:28:58.280 | And so it'll be interesting to see in 2025
01:29:02.680 | who shows up with the revenues.
01:29:04.080 | I think you already see some of these smaller
01:29:06.720 | second and third tier model companies changing business models,
01:29:09.800 | falling to the side, no longer engaged in the arms race,
01:29:14.120 | you know, of investment here.
01:29:16.120 | I think that's part of the creative destructive process,
01:29:19.040 | but it's been fun having you on.
01:29:20.640 | - Yeah, thank you so much, Dylan.
01:29:21.720 | I really appreciate it.
01:29:22.560 | - Yeah, fun having you here in person, Bill.
01:29:25.240 | And until next year.
01:29:27.320 | - Awesome, thank you.
01:29:28.160 | - Take care.
01:29:29.000 | (upbeat music)
01:29:31.560 | - As a reminder to everybody,
01:29:39.480 | just our opinions, not investment advice.