back to index

What's next for photonics-powered data centers and AI ft. Lightmatter's Nick Harris


Whisper Transcript | Transcript Only Page

00:00:00.000 | It's my privilege to introduce the first speaker, Nick Harris, who is CEO of Light Matter.
00:00:08.740 | We all know that a lot of AI progress has been driven by scaling laws and training very
00:00:13.580 | large foundation models.
00:00:15.900 | Nick and his company, Light Matter, is a key player in that, and he's building very, very
00:00:21.060 | large data centers, hundreds of thousands of GPUs, maybe millions of nodes someday that
00:00:27.420 | will be coming online soon and hopefully powering hyperscalers, next generation of AI models
00:00:33.020 | for all of us to build on.
00:00:35.140 | So let me hand over to Nick.
00:00:36.140 | All right.
00:00:37.140 | Thank you, Sequoia team, for the invite, Sean McGuire for putting my name up, and Konstantin
00:00:42.700 | for accepting the talk.
00:00:44.420 | I have to say the talks at Sequoia, I've attended two events now, really been world class.
00:00:50.980 | Sequoia is able to pull together some of those interesting people in the world.
00:00:56.200 | So yeah, let's talk about Light Matter and the future of the data center.
00:01:01.320 | One of the things that I thought was incredibly exciting from earlier today was seeing Sora.
00:01:07.380 | And the example that was very near to my heart was looking at what happened as you scaled
00:01:13.780 | the amount of compute in the AI model.
00:01:16.480 | It went from this sort of goofy munch of like some kind of furry thing to the physics of
00:01:22.700 | a dog with a hat on and a person and their hair flowing.
00:01:27.000 | And this is the difference that the amount of compute you have makes on the power of
00:01:31.740 | AI models.
00:01:34.180 | So let's go ahead and talk about the future of the data center.
00:01:38.820 | So this is pretty wild.
00:01:41.460 | Very rough estimate on sort of the capital expenditure for the supercomputers that are
00:01:46.500 | used to train AI models.
00:01:48.960 | So let's start here in the bottom, so 4,000 GPUs, something like $150 million to deploy
00:01:55.460 | this kind of system, 10,000, we're looking at about 400 million, 60,000, 4 billion.
00:02:03.380 | This is an insane amount of money.
00:02:05.600 | It turns out that the power of AI models and AI in general scales very much with the amount
00:02:12.320 | of compute that you have.
00:02:14.320 | And the spend for these systems is astronomical.
00:02:16.820 | And if you look at what's coming next, what's the next point here, 10 billion, 20 billion,
00:02:23.220 | there's going to be an enormous amount of pressure on companies to deliver a return
00:02:28.220 | on this investment.
00:02:30.540 | But we know that the AGI is potentially out there, at least we suspect it is, if you spend
00:02:35.760 | enough money.
00:02:37.260 | But this comes at a very challenging time.
00:02:40.940 | My background is in physics.
00:02:42.420 | I love computers.
00:02:44.420 | And I'll tell you that scaling is over.
00:02:46.580 | We're not getting more performance out of computer chips.
00:02:49.860 | Jensen had GTC announcement yesterday, I believe, where he showed a chip that was twice as big
00:02:55.660 | for twice the performance.
00:02:57.000 | And that's sort of what we're doing in terms of scaling today.
00:03:00.580 | So the core technology that's driven Moore's Law and Dennard's scaling that make computers
00:03:06.660 | faster and cheaper and has democratized computing for the world and made this AGI hunt that
00:03:12.580 | we're on possible is coming to an end.
00:03:15.860 | So at Light Matter, what we're doing is we're looking at how do you continue scaling.
00:03:20.140 | And everything we do is centered around light.
00:03:22.940 | We're using light to move the data between the chips, allow you to scale it to be much
00:03:27.180 | bigger so that you can get to 100,000 nodes, a million nodes and beyond.
00:03:32.940 | Try to figure out what's required to get to AGI, what's required to get to these next
00:03:38.220 | gen models.
00:03:41.940 | So this is kind of what a present day supercomputer looks like.
00:03:45.980 | You'll have racks of networking gear, and you'll have racks of computing gear.
00:03:51.180 | And there are a lot of interconnections when you're inside one of the computing racks.
00:03:55.980 | But then you kind of get a spaghetti, a few links over to the networking racks and this
00:04:00.300 | very weak sort of interconnectivity in these clusters.
00:04:04.820 | And what that means is that when you map a computation like an AI training workload onto
00:04:09.020 | these supercomputers, you're basically having to slice and dice it so that big pieces of
00:04:14.680 | it fit in the tightly interconnected clusters.
00:04:17.380 | You're having a really hard time scaling, getting really good unit performance scaling
00:04:21.820 | as you get to 50,000 GPUs running a workload.
00:04:26.380 | So I would basically tell you that 1,000 GPUs is not just 1,000 GPUs.
00:04:32.220 | It really depends how you wire these together.
00:04:35.780 | And that wiring is where a significant amount of the value is.
00:04:40.140 | This is present day data centers.
00:04:41.740 | What if we deleted all the networking racks?
00:04:44.260 | What if we deleted all of these?
00:04:46.300 | And what if we scaled the compute to be 100 times larger?
00:04:50.980 | And what if instead of the spaghetti we have linking everything together, what if we had
00:04:55.500 | an all to all interconnect?
00:04:56.900 | What if we deleted all of the networking equipment in the data center?
00:05:00.380 | This is the future that we're building at Light Matter.
00:05:02.880 | We're looking at how you get these AI supercomputers to get to the next model.
00:05:07.820 | It's going to be super expensive, and it's going to require fundamentally new technologies.
00:05:13.860 | And this is the core technology.
00:05:15.700 | This is called passage.
00:05:17.500 | And this is how all GPUs and switches are going to be built.
00:05:21.020 | We work with companies like AMD, Intel, NVIDIA, Qualcomm, places like this.
00:05:25.900 | And we put their chips on top of our optical interconnect substrate.
00:05:30.480 | It's the foundation for how AI computing will make progress.
00:05:34.420 | It will reduce the energy consumption of these clusters dramatically.
00:05:39.380 | And it will enable scaling to a million nodes and beyond.
00:05:43.720 | This is how you get to wafer scale, the biggest chips in the world.
00:05:47.260 | And this is how you get to AGI.
00:05:48.260 | [APPLAUSE]
00:05:49.260 | Thank you, Nick.
00:05:49.260 | [APPLAUSE]
00:05:52.260 | Thank you, Nick.
00:05:54.300 | [BLANK_AUDIO]