What's next for photonics-powered data centers and AI ft. Lightmatter's Nick Harris
00:00:00.000 |
It's my privilege to introduce the first speaker, Nick Harris, who is CEO of Lightmatter. 00:00:08.740 |
We all know that a lot of AI progress has been driven by scaling laws and training very large models. 00:00:15.900 |
Nick and his company, Lightmatter, are key players in that, and he's building very, very 00:00:21.060 |
large data centers, hundreds of thousands of GPUs, maybe millions of nodes someday, that 00:00:27.420 |
will be coming online soon and hopefully powering hyperscalers and the next generation of AI models. 00:00:37.140 |
Thank you, Sequoia team, for the invite, Shaun Maguire for putting my name up, and Konstantin. 00:00:44.420 |
I have to say, the talks at Sequoia, I've attended two events now, have really been world class. 00:00:50.980 |
Sequoia is able to pull together some of the most interesting people in the world. 00:00:56.200 |
So yeah, let's talk about Lightmatter and the future of the data center. 00:01:01.320 |
One of the things that I thought was incredibly exciting from earlier today was seeing Sora. 00:01:07.380 |
And the example that was very near to my heart was looking at what happened as you scaled the compute. 00:01:16.480 |
It went from this sort of goofy mush of like some kind of furry thing to the physics of 00:01:22.700 |
a dog with a hat on and a person and their hair flowing. 00:01:27.000 |
And this is the difference that the amount of compute you have makes on the power of the model. 00:01:34.180 |
So let's go ahead and talk about the future of the data center. 00:01:41.460 |
Here's a very rough estimate of the capital expenditure for the supercomputers that are being built today. 00:01:48.960 |
So let's start here at the bottom: 4,000 GPUs, something like $150 million to deploy 00:01:55.460 |
this kind of system; 10,000, we're looking at about $400 million; 60,000, $4 billion. 00:02:05.600 |
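A quick back-of-envelope check on those figures (this is my own illustrative arithmetic using the rough numbers quoted above, not data from the talk): the implied cost per GPU actually rises with cluster size, which is consistent with the later point that the interconnect, not the chips alone, is where much of the cost and value sits.

```python
# Rough deployment costs quoted in the talk: GPUs -> total capex (USD).
clusters = {4_000: 150e6, 10_000: 400e6, 60_000: 4e9}

# Implied cost per GPU grows with scale: networking, power, and
# facilities grow faster than linearly in cluster size.
for gpus, capex in clusters.items():
    print(f"{gpus:>6} GPUs: ${capex / 1e9:.2f}B total, ${capex / gpus:,.0f} per GPU")
```

Running this shows roughly $37.5K per GPU at 4,000 GPUs versus about $67K per GPU at 60,000, so the "wiring" overhead per unit nearly doubles across that range.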
It turns out that the power of AI models and AI in general scales very much with the amount of compute. 00:02:14.320 |
And the spend for these systems is astronomical. 00:02:16.820 |
And if you look at what's coming next, what's the next point here, 10 billion, 20 billion, 00:02:23.220 |
there's going to be an enormous amount of pressure on companies to deliver a return 00:02:30.540 |
But we know that AGI is potentially out there, at least we suspect it is, if you spend enough. 00:02:46.580 |
We're not getting more performance out of computer chips. 00:02:49.860 |
Jensen had a GTC announcement yesterday, I believe, where he showed a chip that was twice as big as the last one. 00:02:57.000 |
And that's sort of what we're doing in terms of scaling today. 00:03:00.580 |
So the core technology that's driven Moore's Law and Dennard scaling, which has made computers 00:03:06.660 |
faster and cheaper, democratized computing for the world, and made this AGI hunt possible, is slowing down. 00:03:15.860 |
So at Lightmatter, what we're doing is looking at how you continue scaling. 00:03:20.140 |
And everything we do is centered around light. 00:03:22.940 |
We're using light to move the data between the chips, which allows you to scale much 00:03:27.180 |
bigger so that you can get to 100,000 nodes, a million nodes, and beyond. 00:03:32.940 |
We're trying to figure out what's required to get to AGI, what's required to get to these next generations of models. 00:03:41.940 |
So this is kind of what a present day supercomputer looks like. 00:03:45.980 |
You'll have racks of networking gear, and you'll have racks of computing gear. 00:03:51.180 |
And there are a lot of interconnections when you're inside one of the computing racks. 00:03:55.980 |
But then you kind of get spaghetti, a few links over to the networking racks, and this 00:04:00.300 |
very weak sort of interconnectivity across these clusters. 00:04:04.820 |
And what that means is that when you map a computation like an AI training workload onto 00:04:09.020 |
these supercomputers, you're basically having to slice and dice it so that big pieces of 00:04:14.680 |
it fit in the tightly interconnected clusters. 00:04:17.380 |
You have a really hard time getting really good per-unit performance scaling 00:04:21.820 |
as you get to 50,000 GPUs running a workload. 00:04:26.380 |
So I would basically tell you that 1,000 GPUs is not just 1,000 GPUs. 00:04:32.220 |
It really depends how you wire these together. 00:04:35.780 |
And that wiring is where a significant amount of the value is. 00:04:46.300 |
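The slicing-and-dicing point can be made concrete with a toy bandwidth model. This is my own illustration, not Lightmatter's numbers: the link speeds and gradient size below are assumptions. In a ring all-reduce, the step that synchronizes gradients across GPUs during training, the slowest link in the ring sets the pace for the entire cluster, so a few weak rack-to-rack links dominate no matter how fast the in-rack links are.

```python
def allreduce_seconds(n_gpus: int, grad_bytes: float, slowest_link_bps: float) -> float:
    """Time for a ring all-reduce of grad_bytes over n_gpus.

    Standard bandwidth-term approximation: each GPU sends and receives
    roughly 2 * (N - 1) / N times the gradient size, gated by the
    slowest link in the ring.
    """
    return 2 * (n_gpus - 1) / n_gpus * grad_bytes / slowest_link_bps

GRAD = 350e9  # assumed: fp16 gradients for a ~175B-parameter model

# If the ring stays on fast in-rack links vs. crossing weak uplinks
# (both bandwidth figures are illustrative assumptions):
in_rack = allreduce_seconds(50_000, GRAD, 900e9)    # NVLink-class links
cross_rack = allreduce_seconds(50_000, GRAD, 50e9)  # shared rack uplinks
print(f"{cross_rack / in_rack:.0f}x slower when the ring crosses weak links")
```

The slowdown is simply the ratio of the two bandwidths (18x here), which is why workloads get partitioned to keep the heavy communication inside the tightly wired clusters.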
And what if we scaled the compute to be 100 times larger? 00:04:50.980 |
And what if, instead of the spaghetti we have linking everything together, we had something better? 00:04:56.900 |
What if we deleted all of the networking equipment in the data center? 00:05:00.380 |
This is the future that we're building at Lightmatter. 00:05:02.880 |
We're looking at how you scale these AI supercomputers to get to the next model. 00:05:07.820 |
It's going to be super expensive, and it's going to require fundamentally new technologies. 00:05:17.500 |
And this is how all GPUs and switches are going to be built. 00:05:21.020 |
We work with companies like AMD, Intel, NVIDIA, Qualcomm, places like this. 00:05:25.900 |
And we put their chips on top of our optical interconnect substrate. 00:05:30.480 |
It's the foundation for how AI computing will make progress. 00:05:34.420 |
It will reduce the energy consumption of these clusters dramatically. 00:05:39.380 |
And it will enable scaling to a million nodes and beyond. 00:05:43.720 |
This is how you get to wafer scale, the biggest chips in the world.