
What's next for photonics-powered data centers and AI ft. Lightmatter's Nick Harris


Transcript

It's my privilege to introduce the first speaker, Nick Harris, who is CEO of Lightmatter. We all know that a lot of AI progress has been driven by scaling laws and training very large foundation models. Nick and his company, Lightmatter, are key players in that, and he's building very, very large data centers, hundreds of thousands of GPUs, maybe millions of nodes someday, that will be coming online soon and hopefully powering hyperscalers' next generation of AI models for all of us to build on.

So let me hand over to Nick. All right. Thank you, Sequoia team, for the invite, Shaun Maguire for putting my name up, and Konstantine for accepting the talk. I have to say the talks at Sequoia, and I've attended two events now, have really been world class. Sequoia is able to pull together some of the most interesting people in the world.

So yeah, let's talk about Lightmatter and the future of the data center. One of the things that I thought was incredibly exciting from earlier today was seeing Sora. And the example that was very near to my heart was looking at what happens as you scale the amount of compute in the AI model.

It went from this sort of goofy mush of some kind of furry thing to the physics of a dog with a hat on, and a person with their hair flowing. And that's the difference the amount of compute you have makes in the power of AI models.

So let's go ahead and talk about the future of the data center. This is pretty wild: a very rough estimate of the capital expenditure for the supercomputers that are used to train AI models. Let's start at the bottom: at 4,000 GPUs, it's something like $150 million to deploy this kind of system; at 10,000, we're looking at about $400 million; at 60,000, about $4 billion.

This is an insane amount of money. It turns out that the power of AI models, and of AI in general, scales very much with the amount of compute that you have. And the spend for these systems is astronomical. If you look at what's coming next, what's the next point here, $10 billion, $20 billion, there's going to be an enormous amount of pressure on companies to deliver a return on this investment.
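To make those rough numbers concrete, here is a minimal sketch in Python that back-computes the implied cost per deployed GPU from the figures quoted above. The dollar amounts are the talk's ballpark estimates, and the extrapolation at the end is an assumption for illustration, not data.

```python
# Rough, illustrative arithmetic only: the dollar figures are the ballpark
# estimates quoted in the talk, not vendor pricing or measured data.
cluster_costs = {        # GPUs -> estimated deployment cost (USD)
    4_000: 150e6,
    10_000: 400e6,
    60_000: 4e9,
}

for gpus, cost in cluster_costs.items():
    per_gpu = cost / gpus
    print(f"{gpus:>6,} GPUs: ~${cost / 1e9:.2f}B total, ~${per_gpu / 1e3:.0f}k per deployed GPU")

# Even at a flat ~$67k per deployed GPU, a cluster of a few hundred thousand
# GPUs lands in the $10B-$20B range mentioned above.
```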

But we know that AGI is potentially out there, or at least we suspect it is, if you spend enough money. And this comes at a very challenging time. My background is in physics. I love computers. And I'll tell you that scaling is over. We're not getting more performance out of computer chips.

Jensen had a GTC announcement yesterday, I believe, where he showed a chip that was twice as big for twice the performance. And that's sort of what we're doing in terms of scaling today. So the core technologies behind Moore's Law and Dennard scaling, which made computers faster and cheaper, democratized computing for the world, and made this AGI hunt that we're on possible, are coming to an end.

So at Lightmatter, what we're doing is looking at how you continue scaling. Everything we do is centered around light. We're using light to move data between chips, which allows you to scale systems to be much bigger, so that you can get to 100,000 nodes, a million nodes, and beyond.

We're trying to figure out what's required to get to AGI, what's required to get to these next-generation models. So this is what a present-day supercomputer looks like. You'll have racks of networking gear, and you'll have racks of computing gear. And there are a lot of interconnections inside one of the computing racks.

But then you get a spaghetti of just a few links over to the networking racks, and this very weak interconnectivity between these clusters. What that means is that when you map a computation like an AI training workload onto these supercomputers, you basically have to slice and dice it so that big pieces of it fit within the tightly interconnected clusters.

You have a really hard time getting good per-unit performance scaling as you get to 50,000 GPUs running a workload. So I would basically tell you that 1,000 GPUs is not just 1,000 GPUs. It really depends on how you wire them together. And that wiring is where a significant amount of the value is.
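A toy model helps illustrate why "1,000 GPUs is not just 1,000 GPUs." The sketch below assumes near-perfect scaling inside a tightly interconnected cluster and charges a fixed synchronization overhead for every additional weakly connected cluster a job is sliced across; the cluster size and overhead values are made-up assumptions for illustration, not measurements of any real system.

```python
# Toy model: perfect scaling inside a tightly interconnected cluster, plus a
# fixed synchronization overhead per additional weakly connected cluster.
# cluster_size and overhead_per_cluster are illustrative assumptions only.

def scaling_efficiency(total_gpus, cluster_size=1_000, overhead_per_cluster=0.02):
    """Fraction of ideal throughput delivered once cross-cluster sync is paid."""
    clusters = -(-total_gpus // cluster_size)          # ceiling division
    comm_overhead = overhead_per_cluster * (clusters - 1)
    return 1.0 / (1.0 + comm_overhead)

for n in (1_000, 10_000, 50_000):
    eff = scaling_efficiency(n)
    print(f"{n:>6,} GPUs -> ~{eff:.0%} of ideal, behaves like ~{int(n * eff):,} GPUs")
```

Under these made-up numbers, a 50,000-GPU job behaves like roughly 25,000 ideally wired GPUs, which is the kind of gap better interconnect is meant to close.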

These are present-day data centers. What if we deleted all the networking racks? What if we deleted all of these? What if we scaled the compute to be 100 times larger? And what if, instead of the spaghetti we have linking everything together, we had an all-to-all interconnect?

What if we deleted all of the networking equipment in the data center? This is the future that we're building at Lightmatter. We're looking at how you use these AI supercomputers to get to the next model. It's going to be super expensive, and it's going to require fundamentally new technologies.
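For a sense of what "all-to-all" implies at the topology level, here is a small sketch that simply counts the direct node-to-node paths an all-to-all fabric needs, since every pair of nodes communicates in one hop instead of routing through layers of packet switches. The node counts are arbitrary examples, not a claim about any specific system.

```python
# All-to-all: every pair of nodes gets a direct path (one hop), so the number
# of links grows as n * (n - 1) / 2. Node counts below are arbitrary examples.

def all_to_all_links(n_nodes: int) -> int:
    return n_nodes * (n_nodes - 1) // 2

for n in (64, 1_024, 100_000):
    print(f"{n:>7,} nodes -> {all_to_all_links(n):,} direct links, 1 hop between any pair")
```

At 100,000 nodes the link count is on the order of five billion, which is one way to see why this kind of connectivity calls for a different physical technology than today's cabling and switch tiers.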

And this is the core technology. This is called Passage. This is how all GPUs and switches are going to be built. We work with companies like AMD, Intel, NVIDIA, Qualcomm, places like this, and we put their chips on top of our optical interconnect substrate. It's the foundation for how AI computing will make progress.

It will reduce the energy consumption of these clusters dramatically. And it will enable scaling to a million nodes and beyond. This is how you get to wafer scale, the biggest chips in the world. And this is how you get to AGI. Thank you, Nick.