You may also be running up against, even for the Mag 7, the size of CapEx deployment, where their CFOs start to talk at higher levels. - For sure, totally. (upbeat music) Sonny, Bill, great to see you guys. - Good to see you. - Good to be back.
- Thanks, man, it's great to have you. We literally just finished two days of the Altimeter annual meetings. I mean, we had hundreds of investors, CEOs, founders, and the theme was scaling intelligence to AGI. We had Nikesh talking about enterprise AI. We had Rene Haas talking about AI at the edge.
We had Noam Brown talking about, you know, the Strawberry and O1 model and inference time reasoning. We had Sonny talking about, you know, accelerating inference. And of course, we kicked off with Jensen talking about the future of compute. You know, I did the Jensen talk with my partner, Clark Tang, who covers the compute layer and the public side.
We recorded it on Friday. We'll be releasing it as part of this pod. And man, was it dense. I mean, he was, you know, he was on fire. He told me, I asked him at the beginning of the pod, "What do you want to do?" He said, "Grip it and rip it." And we did.
90 minutes, we went deep. I shared it with you guys. We've all listened to it. I learned so much playing it back that I just thought it made sense for us to unpack it, right, to really, to really analyze it, see what we agree with, what we may disagree with, things we want to further explore.
Sonny, any high-level reactions to it? Yeah, you know, first, it's the first time I've really seen him in a format where you got all that information out in one sitting, 'cause usually you just get the tidbits. And the first one that really stuck with me was when he said, "NVIDIA's not a GPU company.
"They're an accelerated compute company." I think the next one, you know, which you'll touch on is where he really said, "The data center's the unit of compute." I thought that was massive. And, you know, sort of just closing out when he talked about, he thinks about using and already utilizing so much AI within NVIDIA and how that's a superpower for them to accelerate over everyone they're competing with.
I thought those were really awesome points, and him, you know, eating his own dog food, as they say. It is incredible, you know, there's this thing we'll talk about later, but he said he thinks they can 3X, you know, the top line of the business while only adding 25% more humans because they can have 100,000 autonomous agents doing things like building the software, doing the security, and that he becomes really a prompt engineer, not only for his human direct reports, but also for these agents, which, you know, really is mind-boggling.
Bill, anything stand out for you? Well, one, I mean, you should be pleased that you were able to get his time. You know, this is, at points in time, the largest market cap company in the world, if not number two. And so it was, I think, kind of him to sit down with you for so long.
And during the pod, he kept saying, "I can stay as long as you want." I was like, "Doesn't he have something to be doing?" (laughing) He's incredibly generous. - It's fantastic. - But I had two big takeaways. One, I mean, it's obvious that this guy's, you know, firing on all cylinders here, right?
Like you have a company at a 3.3 trillion market cap that's still growing over 100% a year. And the margins are insane. I mean, 65% operating margins. There's only like five companies in the S&P 500 at that level. And they certainly aren't growing at this pace. And when you bring up that point about getting more done on the increment with fewer employees, where's this gonna go?
Like 80% operating margins? I mean, that would be unprecedented. There's a lot that's already here that's unprecedented, but obviously Wall Street is fully aware of the unbelievable performance of this company. And, you know, the multiple reflects it and the market cap reflects it, but it's super powerful how they're executing.
And you can see the confidence in every answer that he gives. We spent about a third of the pod on NVIDIA's competitive moat, really trying to break it down, really trying to understand this idea of systems level advantages, the combinatorial advantages that he has in the business. Because I think when I talk to people around the investment community, despite how well it's covered, Bill, right?
There's still this idea that it's just a GPU and that somebody is gonna build a better chip. They're gonna come along and displace the business. And so when he said, again, it can sound like marketing speak, Sonny, when somebody says it's not a GPU company, it's an accelerated compute company.
You know, we showed this chart where you can see kind of the NVIDIA full stack. And he talked about how he just built layer after layer after layer of the stack, you know, over the course of the last decade and a half. But when he said that, Sonny, I know you had a reaction to it, right?
Even though you know it's not just a GPU company, when he really broke it down, it seemed like, you know, he did break new ground here. Yeah, what was great to hear from him, and really positive for folks thinking about where NVIDIA lives in the stack right now, is he kind of got into the details, and then the sub-details below CUDA.
And he really started going into what they're doing very particularly on mathematical operations to accelerate their partners and how they work really closely with their partners. You know, all the cloud service providers to basically build these functions so that they can further accelerate workloads. The other little nuance that I picked up in there, he didn't focus purely on LLMs.
He talked in that particular area about how they're doing that for a lot of traditional models and even newer models are being deployed for AI. And I think just really showed how they're partnering much closer on the software layer than the hardware layer alone. Right, I mean, in fact, you know, he talked about, you know, the CUDA library now has over 300 industry specific acceleration algorithms, right?
Where they deeply learn the industry, right? So whether this is synthetic biology or this is image generation, or this is autonomous driving, they learn the needs of that industry and then they accelerate the particular workloads. And that for me was also one of the key things. This idea that every workload is moving from kind of this deterministic, you know, handmade workload to something that's really driven by machine learning and really infused with AI and therefore benefits from acceleration.
Even something as ubiquitous as data processing. - Yeah, and I shared this code sample with Bill as, you know, we were just preparing for this pod, and I knew Bill processed it right away and then ran it. It really showed that every piece of code out there now that's related to AI, or not every piece, many of the pieces, have this sort of, if device equals CUDA, do X, and if not, do Y.
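For anyone who hasn't seen the kind of snippet Sonny is describing, here is a minimal sketch of that pattern, assuming a PyTorch-style workload; the model and tensor below are illustrative stand-ins, not the actual sample they shared:

```python
import torch

# The ubiquitous pattern: prefer the CUDA device when it's available,
# otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(1024, 1024).to(device)   # illustrative model
batch = torch.randn(32, 1024, device=device)     # illustrative input batch

with torch.no_grad():
    out = model(batch)  # runs on the GPU if CUDA is present, on the CPU if not

print(f"ran on: {out.device}")
```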
And that's the level of impact they're having across the, you know, entire ecosystem of services and apps that are being built that are related to AI. Bill, I don't know what you thought when you saw that piece. - Yeah, I mean, I think there's a question for the long-term that relates to CUDA.
And I wanna go back to the system point you made later, Brad, but while we're on CUDA, the question is what percentage of developers will touch CUDA, and is that number going up or down? And I could see arguments on both sides. You could say the AI models are gonna get more and more hyper-specialized and performance matters so much that the models that matter the most, the deployments that matter the most, are gonna get as close to the metal as possible, and then CUDA is gonna matter.
The other side you can make is that those optimizations are gonna live in PyTorch, they're gonna live in other tools like that, and the marginal developer's not gonna need to know that. I could make both arguments, but I think it's an interesting question going forward. - I mean, I just asked ChatGPT how many CUDA developers there are today just to be on top of it.
Three million CUDA developers, right? And a lot more that touch CUDA that aren't specifically kind of developing on CUDA. So it is one of these things that has become pretty ubiquitous. And his point was, it's not just CUDA, of course. It's really full stack, all the way from data ingestion, all the way through kind of the post-training.
- I think I'm on the latter of your two points, Bill. I think there's gonna be fewer people touching that. And I do think that's a point where the moat is not as strong longer term, as you say. And the analogy that I would go with is, think about the number of iOS developers working at Apple building that versus the number of app developers, right?
And I think you're gonna have a, you know, 10 to one or a hundred to one ratio of people building at layers above versus people building down closer to the bare metal. - That'd be something to watch. We can ask more people over time. Obviously it's a big lock-in today, for sure.
- You know, and I think, Bill, to your point, you know, I reached out to Gavin. Actually, before I did the interview, Gavin Baker, who's a good buddy and who obviously knows the space incredibly well, has followed it at a deeper level for a longer period of time than I have.
And, you know, like when I asked him about the competitive advantage, he really said a lot of the competitive advantages around this algorithmic diversity and innovation and why CUDA matters. He said, if the world standardizes on transformers on PyTorch, then it's less relevant for GPUs, you know, in that environment.
Like if you have a lot of standardization, right, then advantage goes to the custom ASICs. But I'll tell you this, you know, and I've had this conversation with a lot of people. When I asked Jensen, I pushed him on, you know, custom ASICs. I was like, hey, you know, you've got, you know, accelerated inference coming from Meta with their MTIA chip.
You know, you've got Inferentia and Trainium, you know, coming. He's like, yeah, Brad, like they're, you know, they're my biggest partners. I actually share my three to five-year roadmap with them. Yes, they're going to have these point solutions that are going to do these very specific tasks. But at the end of the day, the vast majority of the workloads in the world that are machine learning and AI infused are gonna run on NVIDIA.
And the more people I talk to, the more I'm convinced that that's the case, despite the fact that there'll be a lot of other winners, including Groq and Cerebras, et cetera. And they're acquiring companies. They're moving up the stack. They're trying to do more optimization at higher levels. So they want to extend, obviously, what CUDA is doing.
Don't go to inference yet. That's a whole other story. I actually want to stay on that bit about the deep integrations. Yes. Because, you know, really that's a playbook that I think Microsoft ran really well for a long time in enterprise software. And you really haven't seen that in hardware ever.
You know, if you go back to, say, Cisco or the PC era, or, you know, the cloud era, you didn't see that deep level of integration. Now, Microsoft pulled it off with Azure. And when I heard him talking, all I could think about was, man, that was really smart. What he's done is he's gone out, really understood what the use cases are, and built an organization that deeply integrates into his customers, and done it so well, all the way up into his roadmap, that he's much more deeply embedded than anyone else is.
When I heard that part, I kind of gave him a real tip of the hat on that one. But what did you, you know, Brad, what was your take on that? You and I had this conversation after we first listened to it. And, you know, if you really telescope out, you know, he talks as a systems level engineer, right?
Even if you hear people, you know, people who went to Harvard Business School, say, how can this guy possibly have 60 direct reports, right? But how many direct reports does Elon have, right? These are systems-level thinkers, and he said, I have situational awareness, right? I'm a prompt engineer to the best people in the world at these specific tasks.
I think when I look at this, the thing that I deeply underappreciated a year and a half ago about this company was the systems level thinking, right? That these are, that he spent years thinking about how to embed this competitive advantage, and how it really, it goes all the way from power, all the way through application.
And every day they're launching these new things to further embed themselves in the ecosystem. But I did hear from somebody over the last two days, you know, Rene Haas, the CEO of Arm, right? Rene was also at our event and he's a huge Jensen fan. He worked eight years at NVIDIA before joining Arm in 2013, and he became Arm's CEO in 2022.
And he said, listen, nobody is going to assault the NVIDIA castle head on, right? Like the mainframe of AI, right? Is entrenched and it's going to become a lot bigger, at least as far as the eye can see. He said, however, if you think about where we're interacting with AI today, right?
On these devices, on edge devices. He's like, our installed base at Arm is 300 billion devices. And increasingly a lot more of this compute can run closer to the edge. If you think about an orthogonal competitor, right? Again, if he has a deep competitive moat in the cloud, what's the orthogonal competitor?
The orthogonal competitor peels off a lot of the AI on the edge. And I think Arm's incredibly well positioned to do that. Clearly NVIDIA has got Arm embedded now in a lot of their, you know, in a lot of their Grace Blackwell, et cetera. But that to me would be one area.
Like if you looked out and you said, where can their competitive advantage, you know, be challenged a little bit? I don't think they necessarily have the same level of advantage on the edge as they have in the cloud. - You started the pod by saying, you know, everyone's heard this in the investment community.
It's not a GPU company, it's a systems company. And I, in my brain, I think had thought, oh, well, they've got four in a box instead of, you know, just one GPU or eight in a box. At the time I was listening to the podcast you did with Jensen, I was reading this Neo cloud playbook and anatomy post by Dylan Patel.
- Yes, that was a good one. - He goes into extreme detail about the architecture of some of the larger systems, you know, like the one that X.AI that we're going to talk about that was just deployed, which I think is 100,000 nodes or something like that. And it literally changed my opinion of exactly what's going on in the world.
And actually answered a lot of questions I had, but it appears to me that NVIDIA's competitive advantage is strongest where the size of the system is largest, which is another way of saying what Renee said, it's flipping it on its head. It's not to say it's weak on the edge, but it's super powerful when you put a whole bunch of them together.
That's when the networking piece thrives. That's where NVLink thrives. That's where CUDA really comes alive, in the biggest systems that are out there. And one of the questions that answered for me was, why is demand so high at the high end, and why are single nodes available on the internet at or below cost?
And this starts to get at that, 'cause you can do things with the large systems that you just can't do with a single node. And so those two things can be simultaneously true. Why was NVIDIA so interested in CoreWeave existing? Now, I understand like if the biggest systems are where the biggest competitive advantage is, you need as many of these big system companies as you can possibly have.
And there may be, if that trajectory remains true, you could have an evolution where customer concentration increases for NVIDIA over time rather than going the other way. Depending on how, you know, if Sam's right that they're gonna spend a hundred billion or whatever on a single model, there's only so many places they're gonna be able to afford that.
But a lot of stuff started to make sense to me that didn't before. And I clearly underestimated the scale of what it meant to be a non-GPU company, to be a system company. This goes way, way up. - Yeah, and you know, again, Bill, you touched on something that I think is really important here.
And this is this question of whether their competitive mode is also as powerful in training as it is in inference, right? Because I think that there's a lot of doubt as to whether their competitive mode is as strong as inference. But, you know, let's just- - You wanna flip to that?
- Well, no, but I asked him if it was as strong. - No, I heard you. - He actually said it was greater, right? - I heard him. - To me, you know, when you think about that, in the first instance, right, I think it didn't make a lot of sense.
But then when you really start thinking about it, he said there's a trail of infrastructure already out there that is CUDA compatible and can be amortized for all this inference. And he, for example, referenced that OpenAI had just decommissioned Volta. So it's like this massive installed base.
And when they improve their algorithms, when they improve their frameworks, when they improve their CUDA libraries, it's all backward compatible. So Hopper gets better and Ampere gets better and Volta gets better. That combined with the fact that he said everything in the world today is becoming highly machine learned, right?
Almost everything that we do, he said almost every single application, Word, Excel, PowerPoint, Photoshop, AutoCAD, like it all will run on these modern systems. Sonny, do you buy that? Do you buy that, you know, when people go to replace, you know, compute, they're gonna replace it on these modern systems?
So when I was listening to it, I was buying it. But then he said one thing that kept resonating in my mind, which is that inference is going to be a billion times larger than training. And if you kind of double click into that, these old systems aren't gonna be sufficient, right?
If you're gonna have that much more demand, that much more workload, which I think we all agree, then how is it that these old systems, which are being decommissioned from training are gonna be sufficient? So I think that's where that argument didn't hold, just didn't hold strong enough for me.
If that grows as fast as he says it will, as fast as, you know, you guys have seen in their numbers, then there are gonna be a lot more net new, inference-related deployments. And there, I don't think that argument holds on the transfer from older hardware to newer hardware.
- Well, you said something pretty casually there, right? Let's underscore this, right? We were talking about Strawberry and the o1 preview, and he said there's a whole new vector of scaling intelligence, inference time reasoning, right? That's not gonna be single shot, but it's going to be lots of agent-to-agent interactions, thinking time as Noam Brown likes to say, right?
And he said as a consequence of that, inference is going to 100X, 1,000X, a million X, maybe even a billion X. And that in and of itself, right, to me was, you know, kind of a wow moment. 40% of their revenues are already inference. And I said, over time, does your inference become a higher percentage of your revenue mix?
And he said, of course, right? But again, I think conventional wisdom is all around the size of clusters and the size of training. And if models don't keep getting bigger, then their relevance will dissipate. But he's basically saying every single workload is gonna benefit from acceleration, right? It's gonna be an inference workload, and the number of inference interactions is gonna explode higher.
Yeah, one technical detail, which is you need bigger clusters if you're training bigger models. But if you're running bigger models, you don't need bigger clusters. It can be distributed, right? And so I think what we're gonna see here is that the larger clusters will continue to get deployed, and as Bill said, they'll get deployed for folks, maybe a limited number of folks that need to deploy it for a hundred billion dollar runs or even bigger than that.
But you'll see inference clusters be large, but not as large as the training clusters, and be a lot more distributed, because you don't need it all to be in the same place. And I think that's what'll be really interesting. It was interesting. He simplified it even more than you did there, Brad.
He said, think about a human. How much time do you spend learning versus doing? And he used that analogy as to why this was gonna be so great. But, in a little different way than Sonny, I thought the argument that the reason we're gonna be great at inference is 'cause there's so much of our old stuff lying around wasn't super solid.
In other words, what if some other company, Sonny's or some other one decided to optimize inference? It wasn't an argument for optimization. It was an argument for cost advantage because it might be fully distributed or whatever. And of course, if you had maybe poked him back on that, he might've had another answer about why for optimization, but there are clearly gonna be people, whether it's other chips companies, some of these accelerator companies, there are gonna be people working on inference optimization, which may include edge techniques.
I think some of the accelerators may look like AI CDNs, if you will, and they're gonna be buying stuff closer to the customer. So all TBD, but just the argument that you've got it left over didn't seem super solid to me. - And the three fastest companies in inference right now are not NVIDIA.
- Right, so who are they, Sonny? Show it, we'll post the leaderboard. - Yeah, it's a combination of Groq, Cerebras, and SambaNova, right? Those are three companies that are not NVIDIA that are on the leaderboards of all the models that they run. - You're talking about performance. Performance is what you're talking about.
- Performance, yeah. - Yeah. - Yeah, and I would argue even price. - Yeah. - And make the argument, why are they faster? Why are they cheaper in your mind? But yet, notwithstanding that fact, NVIDIA is gonna do, let's call it, 50 or 60 billion of inference this year, and these companies are still just getting started, right?
Why is their inference business so big? Is it just because of installed base? - Yeah, I think it's a combination of installed base, and I think it's because that inference market is growing so incredibly fast. I think if you were making this decision even 18 months ago, it would have been a really difficult decision to buy from any of those three companies, because your primary workload was training, and in the first part of this pod, we talked about how they have such a strong tie-in, such integration, to getting training done properly.
I think when it comes to inference, you can see all the non-NVIDIA folks can get the models up and running right away. There is no tie-in to CUDA that's required to go faster, that's required to get the models running, right? Obviously, none of the three companies run CUDA, and so that moat doesn't exist around inference.
- Yeah, CUDA's less relevant in inference. That's another point worth making. But I wanted to say one other thing to what Sonny just said. If you go back to the early internet days, and this is just an argument that optimization takes a while, all of the startups were running on Oracle and Sun.
Every single (beep) one of them was running on Oracle and Sun, and five years later, they were all running on Linux and MySQL, like in five years. And it literally went from 100% to 3%. I'm not making the projection that that's gonna happen here, but you did have a wholesale shift as the industry went from developing and building it for the first time to optimizing, which are really two separate motions.
- It seems to me, I pulled up this chart, right, that we shared, we made, Bill, way earlier this year for the pod, which showed the trillion dollars of new AI workloads expected over the next four to five years, and the trillion dollars of effectively data center replacement. And I just wanted to get his updated kind of reaction or forecast, now that he's had six more months to think about whether or not he thinks that's achievable.
And what I heard him say was, "Yes, the data center replacement's gonna look exactly like that." Of course, he's just making his best educated guess, but he seemed to suggest that the AI workloads could be even bigger, right? Like that once he saw Strawberry and o1, he realized how much more compute was gonna be required to power this. And the more people I talk to, the more I get that same sense, there is this insatiable demand.
So maybe we just touch on this. He goes on CNBC and he says, "The demand is insane," right? And I kept trying to push on that. I was like, "Yeah, but what about MTIA? What about custom inference? What about all these other factors? What if models stop getting so big?" I said, "Will any of that change the equation?" And he consistently pushed back and said, "You still don't understand the amount of demand in the world because all compute is changing," right?
I thought he had one nuance to that answer, which was when you asked him that, he said, "Look, if you have to replace some amount of infrastructure," whatever the number was, it was really big, "and you're part of that, and you're a CIO somewhere tasked with doing this, what are you gonna do?
What are you gonna replace it with? It's accelerated compute." And then immediately, once you make that choice, 'cause you're not going to traditional compute, then NVIDIA is your number one choice. So I thought he kind of tied that back together in that like, are you really gonna get yourself in trouble by having something else there, or are you just gonna go to NVIDIA?
When he said it, I didn't wanna say that, Bill, but it felt like the old IBM argument. Yeah, look, I mean, one thing, Brad, is this company's public. When a private company says, "Oh, the demand's insane," you know, I immediately get skeptical. This company's doing 30 billion a quarter, growing 122%, like, the demand is insane.
Like, we can see it. There's no doubt about it. And part of that demand was a conversation about Elon and x.ai and what they did. And I thought it was also just incredibly fascinating, right? I thought it was funny. I asked him a question about the dinner that he and Elon and Larry Ellison apparently had.
And he's like, you know, just because that dinner occurred and they ended up with 100,000 H100s, don't necessarily connect the dots. But listen, he confirmed that his mind was blown by Elon. And he said he is an N of one superhuman that could possibly pull off, that could energize a data center, that could liquid cool a data center.
And he said, what would take somebody else years to get permitted, to get energized, to get liquid cooled, to get stood up, that x.ai did in 19 days, you know? And you could just tell the immense respect that he had for Elon. It's clear, you know, he said it's the single largest coherent supercomputer in the world today, that it's gonna get bigger.
And if you believe that the future of AI is tied closely together with the systems engineering on the hardware side, you know, what hit me in that moment was, that's a huge, huge advantage for Elon. Yeah, I forget the exact number, but he talked about how many thousands of miles of cabling were just in there.
As part of the task. Look, you know, coming at it from doing a lot of that ourselves right now, building data centers, standing them up, racking and stacking, you know, our nodes, it's impressive. It's impressive to do something at that scale in 19 days. And that doesn't even include how quickly they built that data center.
I think it's all happened, you know, within 2024. And so that's part of the advantage. The interesting thing there is he didn't touch on it as much as when he talked about doing the integration with cloud service providers. What I'd love to kind of double click into is, you know, Elon is in a unique situation where he's obviously bought this cluster.
He has a ton of respect for NVIDIA, but he, you know, is building his own chip, building their own clusters with Tesla. So I wonder how much, you know, cross-correlation or information sharing there is for them to be able to do that at scale. And, you know, you guys look at this.
What have you kind of seen on their clusters? I don't really have a lot of data on the non-NVIDIA clusters that they have. I'm sure Freedom, my team, does. I just don't have it off the top of my head. If we have it, you know, I'll pull a chart and I'll show it.
Sonny, you said you now think the xAI cluster is the largest NVIDIA cluster alive today? I'm saying, 'cause I believe Jensen said it in the pod, that it's the largest supercomputer in the world. Yeah, I mean, I just want to spend 30 seconds on what you said, Brad, about Elon.
I'm staring out my window at the Gigafactory in Austin that was also built in record time. Starlink's insane. When we were walking in Diablo, I just kept thinking, "You know who I'd love to reimagine this place? Elon," right? And I don't, the world should study how he can do infrastructure fast, because if that could be cloned, it would be so valuable.
Not really relevant to this podcast, but worth noting. The other thing that I thought about on the Elon thing, and this is also where these pieces came together in my mind about these large clusters and how important they are to NVIDIA, is he got allocation, right? This is supposed to be the hottest company, the hottest product, backed up for years on demand.
And he walks in and takes what sounds like about 10% of the quarter's availability. And in my mind, I'm thinking that's because, hey, if there's another company that's gonna develop these big ones, I'm gonna let him go to the front of the line. And that speaks to what's happening in Malaysia and the Middle East; any one of these people that are gonna get excited, he's gonna spend time with them, put them at the front of the line.
You know, I'll tell you, I pushed him on this. I said, you know, the rumor is Elon's gonna get another 100,000, you know, H200s and add them to this cluster. I said, are we already at the phase of 200,000 and 300,000 GPU cluster scale? And he said, yes.
And then I said, and will we go to 500,000 a million? And he's like, yes. Now, I think these things, Bill, are already being planned and built. And what he said is beyond that, beyond that, he said, you start bumping up against the limitations of base power. Like, can you find something that can be energized to power a single cluster?
And he said, we're gonna have to develop distributed training. And he said, but just like with Megatron, which we developed to enable what's occurring today, we're working on the distributed stuff because we know we're gonna have to decompose these clusters at some point in order to continue scaling them.
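To put the power constraint he's describing in rough perspective, here's a back-of-envelope sketch; the per-GPU draw and datacenter overhead are assumed figures for illustration, not numbers from the interview:

```python
# Rough, illustrative estimate of facility power for very large GPU clusters.
# Assumptions (not from the interview): ~1.2 kW per GPU including server
# overhead, and a facility PUE of ~1.3 for cooling and power distribution.
KW_PER_GPU = 1.2
PUE = 1.3

for gpus in (100_000, 300_000, 500_000, 1_000_000):
    megawatts = gpus * KW_PER_GPU * PUE / 1_000
    print(f"{gpus:>9,} GPUs -> roughly {megawatts:,.0f} MW of facility power")
```

Even with generous assumptions, the million-GPU case lands in the gigawatt range, which is why he frames distributed training as the eventual escape hatch.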
- You may also be running up against, even for the Mag 7, the size of CapEx deployment, where their CFOs start to talk at higher levels. - For sure, totally. - And there's a super interesting article in The Information that came out today where Sam Altman is questioning whether Microsoft's willing to put up the money and build a cluster.
And it may have been, that may have been kind of triggered by Elon's comments or Elon's willingness to do it at xAI. - What I will say on the size of the models is that we're gonna push into this really interesting realm where obviously we can have bigger and bigger training clusters.
That naturally implies the models get bigger and bigger. And you can train a model across distributed sites; it may just take you a month longer because you have to move traffic around. So instead of taking three months, it takes you four months.
But you can't really run a model across distributed sites, because inference is a real-time thing. So we're not pushing it there yet, but when models get way too big to run in a single location, that may be a problem we wanna be aware of and keep in our minds as well.
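A quick, purely illustrative way to see Sonny's distinction between training and serving across sites; every number here is an assumption, not a measurement:

```python
# Training can absorb cross-site communication as a one-time schedule hit,
# but interactive inference cannot absorb it on every token.
training_months_single_site = 3.0
cross_site_overhead = 1.0 / 3.0        # Sonny's "three months becomes four months"
print(f"distributed training run: ~{training_months_single_site * (1 + cross_site_overhead):.0f} months instead of 3")

cross_site_rtt_ms = 60.0               # assumed round trip between distant sites
per_token_budget_ms = 30.0             # assumed interactive latency budget per token
print(f"a single {cross_site_rtt_ms:.0f} ms cross-site hop already exceeds a "
      f"{per_token_budget_ms:.0f} ms per-token budget: {cross_site_rtt_ms > per_token_budget_ms}")
```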
- On this question of scaling our way to intelligence: when I asked Noam Brown about it today in our fireside chat, he made his perspective very clear, even though he's working on inference time reasoning, which is a totally different vector and a breakthrough vector at OpenAI, and which we ought to spend a little bit of time talking about.
He said, now there are these two vectors, right? That again are multiplicative in terms of the path to AGI. He's like, make no mistake about it, like we're still seeing big advantages to scaling bigger models, right? We have the data, we have the synthetic data, we're going to build those bigger models and we have an economic engine that can fund it, right?
Don't forget this company is over 4 billion in revenue, scaling probably most people think to 10 billion plus in revenue over the course of the next year. They just raised 6.5 billion, they got a $4 billion line of credit from Citigroup. So among the independent players, Bill, right? Like Microsoft can choose whether or not they're going to fund it, but I don't think it's a question of whether or not they're gonna have the funding.
At this point, they've achieved escape velocity. I think for a lot of the other independent players, there's a real question whether they have the economic model to continue to fund the activity. So they have to find a proxy because I don't think a lot of venture capitalists are going to write multi-billion dollar checks into the players that haven't yet caught lightning in a bottle.
That would be my guess. I mean, you know, I just think it's hard. You know, listen, at the end of the day, we're economic animals, you know, and I've said before, if you look at the forward multiple most of us underwrote to on OpenAI, it was about 15 times forward earnings, right?
If ChatGPT wasn't doing what it was doing, if the revenue wasn't doing what it was doing, right, this would have been massively dilutive to the company. It would have been very hard to raise the money. I think if Mistral or all these other companies want to raise that money, I think it'd be very difficult, but you know, you never know, I mean, there's still a lot of money out there, so it's possible, but I think this is, you know, you should- - You said 15 times earnings, I think you meant revenue.
- Oh, 15 times revenue, for sure. Which I said, you know, when Google went public, it was about 13 or 14 times revenue and Meta was like 13 or 14 times revenue. So I do think we're on the precipice of a lot of this consolidation among the new entrants.
What I think is so interesting about X is, you know, when I was pushing him on this model consolidation, pushing Jensen on it, he was like, listen, with Elon, you have somebody with the ambition, with the capability, with the know-how, with the money, right, with the brands, with the businesses.
So I think a lot of times when we're talking about AI today, we oftentimes talk about OpenAI, but a lot of people quickly then go into all of the other model companies. I think xAI is often left out of the conversation. And one of the things I took away from this conversation with Jensen is, again, if scaling these data centers is a key competitive advantage to winning in AI, right, like you absolutely cannot count out xAI in this battle.
They're certainly going to have to figure out, you know, something with the consumer that's going to have a flywheel like ChatGPT or something with the enterprise. But in terms of standing it up, building the model, having the compute, I think they're, you know, going to be one of the three or four in the game.
You touched on maybe wanting to close out on the strawberry-like models. You know, one thing we don't have exposure to, but we can guess at is cost. And that chart that they showed when they released Strawberry, the X-axis was logarithmic. So the cost of a search with the new preview model is probably costing them 20X or 30X what it does to do a normal ChatGPT search.
- Which I think is fractions of a penny. - But figuring out which, and it also takes longer. So figuring out which problems it's acceptable, and Jensen gave a few examples for it, to take more time and cost more and to get the cost benefit right for that type of result is something we're going to have to figure out, like which problems tilt to that place.
- Right, and you know, the one thing I feel good about there, and again, I'm speculating, I don't have information from OpenAI on this, but what we know is that the cost of inference has fallen by 90% over the course of the last year. And what Sonny and other people in the field have told us is that inference is going to drop by another 90% over the, you know, coming months.
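As a rough sanity check on how those two curves interact, take Bill's ~30X per-query step-up for a reasoning-style model and two successive ~90% declines in inference cost; all dollar figures here are assumptions for illustration only:

```python
# Illustrative arithmetic: can falling inference costs absorb reasoning-style queries?
base_query_cost = 0.002        # assume ~0.2 cents for a standard ChatGPT-style query
reasoning_multiplier = 30      # Bill's rough 20-30x estimate for an o1-style query
cost_decline = 0.90            # the ~90% per-period decline discussed above

cost_now = base_query_cost * reasoning_multiplier
cost_after_one = cost_now * (1 - cost_decline)
cost_after_two = cost_after_one * (1 - cost_decline)

print(f"reasoning query today:      ${cost_now:.4f}")
print(f"after one 90% decline:      ${cost_after_one:.4f}")
print(f"after two 90% declines:     ${cost_after_two:.5f}")
print(f"today's standard query:     ${base_query_cost:.4f}")
```

Under those assumptions, two rounds of 90% declines more than pay back the 30X step-up.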
- If you're facing logarithmically scaling needs, you're going to need that. - Right, and here's what I also think happens, Bill: you're going to build intelligence into the chain of reasoning, right? So that, you know, you're going to optimize where you send each of these inference interactions, you're going to batch them, you're going to take more time with some of them, because it's just a time-money trade-off, right?
At the end of the day. I also think that we're in the very earliest innings as to how we're going to think about pricing these models, right? So if we think about this in terms of System 1, System 2 level thinking, right? System 1 being, you know, what's the capital of France, right?
You're going to be able to do that for fractions of a penny using pretty simple models on ChatGPT, right? When you want to do something more complex, if you're a scientist and you want to use o1 as your research partner, right? You may end up paying for it by the hour, and relative to the cost of an actual research partner, it may be really cheap, right?
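One way to picture that System 1 / System 2 split is a simple router that sends cheap lookups to a small, fast model and reserves the slower reasoning model for hard problems; a hypothetical sketch, with model names, costs, and latencies invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    est_cost_usd: float
    est_latency_s: float

def route_request(prompt: str, needs_deep_reasoning: bool) -> Route:
    """Hypothetical System 1 / System 2 router; all names and figures are made up."""
    if needs_deep_reasoning:
        # System 2: slower, pricier, chain-of-thought style model
        return Route(model="reasoning-large", est_cost_usd=0.06, est_latency_s=20.0)
    # System 1: cheap, fast model for simple lookups
    return Route(model="small-fast", est_cost_usd=0.0005, est_latency_s=0.5)

print(route_request("What's the capital of France?", needs_deep_reasoning=False))
print(route_request("Propose a synthesis route for this molecule.", needs_deep_reasoning=True))
```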
So I think there are going to be consumption models, you know, for this. I think we haven't even scratched the surface of thinking about how that's going to be priced, but I totally agree with you that it's going to be priced very differently. Again, I think OpenAI has suggested, you know, that the full o1 model may yet be released this year, right?
One of the things that I'm kind of waiting to see is, I think, you know, listen, having known Noam Brown for quite a while now, he's an N of one, right? And he wasn't the only one working on this for sure at OpenAI, but, you know, listen, whether it was Pluribus or winning at the game of Diplomacy, he's been thinking about this for a decade, right?
It was his major breakthrough on how to win the game of six-handed poker. And so he brought this to OpenAI. I think they have a real lead here, which leads me back to this question, Bill, you and I talk about all the time, which is memory and actions, right?
And so I have to tell you this funny thing that occurred at our investor day. So I had Nikesh on stage and, you know, obviously Nikesh, you know, was instrumental at Google for a decade. And so I wanted to talk to him about both consumer AI as well as enterprise AI.
And I asked him, I said, I want to make a wager with you. I knew of course he would take a bet. And I said, I want to make a wager with you. Over, under, I'll set the line at two years until we have an agent that has memory and can take action.
And the canonical use case, of course, that I used was that I could tell my agent, book me the Mercer Hotel next Tuesday in New York at the lowest price. And I said, over, under, you know, two years on getting that done. I said, I'll start 5,000 bucks, I'll take the under.
He snap calls me, he says, I'll take the over. And he said, but only if you 10X the bet. And of course we're doing it for a good cause. So I had to call him because I, you know, I can't not step up to a good cause. So we're taking the opposite sides of that trade.
Now, what was interesting is over the course of the next couple of days, I asked some other friends who took the stage, you know, where they would come down on the same bet, right? Our friend, Stanley Tang took the under. A friend from Apple, who will remain nameless, kind of took the over.
And then Noam Brown, who was there, pleaded the fifth. He says, I know the answer, so I can't say. And so, yeah, it was kind of provocative. And I, you know, I texted Nikesh and I said, I think you better get your checkbook ready. You know, so coming back to that, Bill, you know, Strawberry, o1, is an incredible breakthrough, something that thinks, so this whole new vector of intelligence, but it kind of makes us forget about the thing you and I focus so much on, which is memory and actions, right?
And I think that we are on the real precipice of not only these models being able to think, you know, spend more time thinking, and give us fewer hallucinations with scaled compute, but, I mean, you already see the makings of this. I mean, use these things today.
They already remember quite a bit. So I think they're sliding this into the experience, but I think we're going to have the ability to take simple actions. And I think this metaphor that people had in their minds, that they were going to have to build deep APIs and deep integrations to everybody, I don't think is the way this is going to play out.
And let me just-- What do you think is going to play out? Well, I mean, the Easter egg that I thought got dropped last week is they did this event on, you know, their voice API, right? And it's literally your GPT calling a human on the telephone and placing an order.
So why the hell can't my GPT just call up the Mercer Hotel and say, "Brad Gerstner would like to make a reservation. Here's his credit card number," and pass along the information? There is a reason for that. I mean, look, scrapers and form fillers have existed for how long, Sonny?
15 years? Like, you could write an agent to go fill out and book at the Mercer Hotel 15 years ago. There's nothing impossible about that. It's the corner cases and, like, the hallucination when your credit card gets charged 10 grand. Like, you just can't have failure. And how you architect this so that there's not failure and there's trust, I'm sure you could demo this tomorrow.
I have zero doubt you could demo it tomorrow. Could you provide it at scale in a trustworthy way where people are allocating their credit cards to it? That might take a little longer. Okay, so over-under, Bill, on two years. I mean, I'm gonna get to action either way. But what's the test?
The demo? I think you can do it today. No, not the cheesy demo you just said. I'm talking about a release that allows me, you know, at scale to book a hotel. Where it's spending your credit card? And not just you, but everybody, full release? Yeah, we'll call it a full release, just because I know that's the only way I can entice you to take the bet.
Which today is October 8th, 2024. I mean, Sonny, you already know what he's gonna say. You'll take the over, right, Bill? Yeah, yes. Okay, so Bill's in the Nikesh camp. Sonny, where do you come down? Over-under on two years. No, don't start hedging, Bill. Don't start hedging. Go ahead, Sonny.
I already said it, you could demo it today. Like you said, 15 years ago you could do that. Let me comment on what you're worried about, Bill. And I think people are still working their way through it. You don't need a single agent right now to book the Mercer and deal with all the scraping stuff you're talking about.
You can have a thousand agents working together. You can have one that's making sure that the credit card charge is not too big. You can have another one to make sure that the address is right. You can have another one checking against your calendar. And so all of that's free.
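A toy sketch of the multi-agent guardrail pattern Sonny is describing, where one agent proposes the booking and independent checkers validate the charge, the hotel, and the calendar before anything is committed; every function and value here is hypothetical:

```python
# Hypothetical sketch: one agent proposes a booking, independent checker agents
# must all approve before anything is charged.
def propose_booking():
    return {"hotel": "Mercer Hotel", "date": "next Tuesday", "charge_usd": 450}

def charge_is_reasonable(b, max_usd=1_000):
    return b["charge_usd"] <= max_usd      # catches the $10k-hallucination case

def hotel_matches_request(b):
    return b["hotel"] == "Mercer Hotel"    # stand-in for a details check

def calendar_is_free(b):
    return True                            # stand-in for a calendar lookup

booking = propose_booking()
checks = [charge_is_reasonable(booking), hotel_matches_request(booking), calendar_is_free(booking)]

if all(checks):
    print("all checker agents approved; committing booking:", booking)
else:
    print("a checker agent rejected the booking; nothing was charged")
```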
So I'm on the under, and Brad, I'll even go under one year. Wow, wow. So we got a little side action, you and I, Sonny. I'm not gonna go under a year, but I think we could have limited releases in a year. But Sonny, you and I now have action with Bill.
What do you want, Bill, a thousand bucks? Sure. To a good cause? Okay, a thousand bucks each to a good cause. And I'll just assume, Sonny, that we'll get action from Nikesh as well. And you know our friend Stanley Tang is definitely in the tank for some. So we're gonna give some good money to a good cause.
And listen, I think this is the trillion dollar question. I know we're all focused on scaling models, and I know we're all focused on the compute layer, but what really transforms people's lives, what really disrupts 10 Blue Links, what really disrupts the entire architecture of the app ecosystem, is that when we have an intelligent assistant that we can interact with that gets smarter over time, that has memory and could take actions.
And when I see the combination of advanced voice mode, the voice-to-voice API, and Strawberry, o1-style thinking, combined with scaling intelligence, I just think this is going to go a lot faster than most of us think. Now, listen, they may pull on the reins, right? They may slow down the release schedule, you know, for a lot of business reasons.
That's harder to predict. But I think the technology, I mean, even Noam said, I thought it was gonna take us much, much longer to see the results that we have seen. Can I hit on one other thing? This is, you know, we started the pod a little bit talking about it.
I just wanna get your impression, Bill. This idea that Jensen can scale the business two or three times while, you know, increasing the head count by only 20 or 25%, right? We know that Meta's done that over the course of the last two years. And you and I've talked about, are we on the eve of a massive productivity boom and massive margin expansion like we've never seen before, right?
Nikesh said we ought to be able to get 20 or 30%, you know, productivity gains out of everybody in the business. - First of all, I think NVIDIA is a very special company. And it's a company that, even if it's a systems company, is an IP company. And the demand is growing at such a rate that they don't need more designers or more development engineers to create incremental revenue.
That's happening on its own. And so their operating margins are at record levels. For the majority of companies, you know, I've always just held this belief that, you know, you evolve with your tools. And the real answer is the companies that don't deploy these things are gonna go out of business.
- Yeah. - And so I think margins get competed away in many, many cases. I think it's ridiculous to imagine, oh, every company goes to 60% operating margin. - No, no, no, no. I mean, listen, Delta Airlines is going to do all of these things with AI, and immediately, because it's in a commodity market, it'll get competed away by Southwest and United.
Bad industries remain bad industries. - Yeah, yeah, yeah. So, but there might be some, you know, that figure it out. And I have another theory that I always keep in mind, which is hyper growth tends to delay what you learned in microeconomics class. You know, I remember when I was a PC analyst and there were five public PC companies all growing 100%.
And so in moments of hyper growth, you will have margins that may or may not be durable. And you'll have a number of participants in a market that may or may not be durable during periods of hyper growth. - I have two more things on my mind, Sonny. Do you have any reactions to that?
I mean, I just have to get to a couple of these topics. - No, like- - There's gonna be a Lex Fridman-length podcast once you've finished the interview. - No, look, I've really, you know, been thinking a lot about Jensen's point in the pod about, you know, how much AI they're using internally for design, design verification, all those pieces, right?
And I think, you know, it's not 30%. I actually think sort of that's an underestimate. I think you're talking, you know, multiple hundreds of percent improvement in productivity gains. And the only issue is that not every company can grasp that that quickly. And so, you know, I think he was kind of holding some cards back at that point when he made that comment.
And it really got me thinking about like, how much are they doing there that they don't want everybody to know about? And you kind of see it now in the model development because they, you know, if you've noticed the last couple of weeks, they put some models out there that are models trained on their own and they don't get as much noise as, you know, ones from Meta and, you know, the other players that are out there, but they're really doing a lot more than we think.
And they, I think they have their arms around a lot of these very, very difficult problems. Brad, why did they put their own model up? Well, it's related to this topic of open versus closed. So Bill, you know, I hope you're proud of me. You know, I went back and I said, I have to ask this question.
I do. Right, and you know, I thought Jensen, you know, I thought he gave a great answer, which is like, listen, we're gonna have companies that for economic reasons, right, push the boundary toward AGI or whatever they're doing. And it makes sense to have a closed model that can be the best and they can monetize.
But the world's not gonna develop with just closed models. He's like, it's both open and closed. And, you know, he said open is absolutely a required condition; it's gonna be the vast majority of the models in the industry. He's like, right now, if we didn't have open source, how would you have all these different fields in science, you know, be able to be activated on AI?
He talked about Llama models exploding higher. And then with respect to his own open source model, which I thought was really interesting, he said, we focused on a specific capability. And the capability that we were focused on is how to agentically use this model to make your model smarter, faster, right?
So it's almost like a training coaching model that he built. And so I think for them, it makes perfect sense why they may, you know, put that out into the world. But I also, you know, a lot of times the open versus closed debate, you know, gets hijacked into this conversation about safety and security.
And, you know, and I think he said, you know, listen, these two things are related, but they're not the same thing. You know, one of the things he commented on that is just, he said, there's so much coordination going on on the safety and security level. Like we have so many agents and so much activity going on, on making sure, you know, just look at what Meta's doing, you know, on this.
He's like, I think that's one thing that's under-celebrated: even in the absence of any, you know, platonic guardian sort of regulation, without any top-down mandate, you already have an extraordinary amount of effort going into AI safety and security by all of these companies. I thought that was a really important comment.
Thanks for jumping in, guys, kicking this one around. It was a special one to-- - Yeah, congrats on having that opportunity. That's pretty, that's pretty unique. - And now we got a little wager. So I mean, listen, I am so looking forward to like doing a live booking at the Mercer on the pod, right?
And then Sonny, we can just drop the money from the sky. We can just collect, we can just collect, exactly, exactly. Good to see you guys. We'll talk soon. - All right, peace. - Take care. (upbeat music) - As a reminder to everybody, just our opinions, not investment advice.