
Why We Don’t Need More Data Centers - Dr. Jasper Zhang, Hyperbolic



00:00:15.000 | Nice meeting you guys.
00:00:16.500 | Great to be here.
00:00:17.840 | And I'm here to present Hyperbolic, which
00:00:20.320 | is an AI cloud for developers.
00:00:22.400 | And so my topic is why we don't need more data centers.
00:00:28.320 | It's like a very eye-catching title.
00:00:31.320 | But what I want to clarify is I still
00:00:33.820 | think building data centers is important.
00:00:36.420 | But just building data centers alone cannot solve the problem.
00:00:41.640 | So wait, before we get started, let me introduce myself.
00:00:47.320 | I'm Jasper.
00:00:47.820 | I'm the CEO and co-founder of Hyperbolic.
00:00:50.440 | I did my math PhD at UC Berkeley, finished my PhD in two years,
00:00:54.200 | which made me the fastest person in the history of Berkeley.
00:00:57.480 | And then I also won a few gold medals.
00:00:59.460 | So after that, I worked at Citadel Securities,
00:01:02.260 | trying to use AI and machine learning to predict the market
00:01:04.780 | as a key strategy.
00:01:05.940 | So I've always had a passion for how
00:01:08.300 | to make things very efficient and how
00:01:10.860 | to help you save money.
00:01:12.240 | Because everyone knows that compute is actually
00:01:15.020 | one of the biggest costs for your company
00:01:17.140 | or for your startup.
00:01:18.960 | Usually, if you want to rent 1,000 GPUs,
00:01:21.580 | it will cost you millions of dollars per year.
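
(A rough back-of-envelope check on that "millions of dollars per year" figure; the $2 per GPU-hour rate below is an illustrative assumption, not a price quoted in the talk.)

```python
# Back-of-envelope: annual cost of renting 1,000 GPUs around the clock.
# The hourly rate is an illustrative assumption, not a quoted price.
gpu_count = 1_000
hourly_rate_usd = 2.0          # assumed $/GPU-hour for an H100-class GPU
hours_per_year = 24 * 365      # 8,760 hours

annual_cost = gpu_count * hourly_rate_usd * hours_per_year
print(f"${annual_cost / 1e6:.1f}M per year")   # -> $17.5M per year
```
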
00:01:24.480 | And we think that these problems should
00:01:26.760 | be solved not just by building more data centers,
00:01:30.240 | but by actually building a GPU marketplace.
00:01:33.120 | So let's get started with the problem that we're facing.
00:01:36.880 | First, I think--
00:01:39.460 | so everyone knows that AI is going to integrate
00:01:41.700 | with everything in the future.
00:01:43.720 | And every company will be an AI company.
00:01:46.540 | So the demand for GPUs as well as data centers is exploding.
00:01:51.380 | So according to McKinsey, by 2030, we'll need 4x more data center capacity built
00:01:58.780 | in one quarter of the time it took to build what we have today.
00:02:04.060 | But what if I tell you that you actually don't need
00:02:07.140 | that many data centers?
00:02:09.240 | You actually need another solution.
00:02:11.820 | So we can break down the demand first.
00:02:15.200 | Right now, the current capacity for data centers is 55 gigawatts.
00:02:22.220 | In the median scenario, we're going to see a 22% annual growth
00:02:28.220 | rate in demand.
00:02:29.640 | So in 2030, we're going to need 219 gigawatts.
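
(A quick sanity check on those figures: compounding today's capacity at the quoted growth rate roughly reproduces the 219 GW number; the seven-year horizon from 2023 to 2030 is my assumption.)

```python
# Compound growth from today's data center capacity at the quoted 22% rate.
current_capacity_gw = 55
annual_growth = 0.22
years = 7                      # assumed horizon: 2023 -> 2030

projected_gw = current_capacity_gw * (1 + annual_growth) ** years
print(f"{projected_gw:.0f} GW")   # -> ~221 GW, close to the 219 GW cited
```
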
00:02:38.300 | However, there are a lot of challenges in building data centers.
00:02:43.280 | So first, everyone knows Stargate.
00:02:46.440 | For the first Stargate data center,
00:02:49.780 | it takes more than a billion dollars to build.
00:02:52.860 | And then also, it's very slow to connect a data center
00:02:56.000 | to the electrical grid.
00:02:58.160 | For example, right now, the wait list is like seven years.
00:03:02.280 | So you need to wait seven years to connect a 100-megawatt
00:03:05.420 | facility to the electrical grid in Northern Virginia.
00:03:11.720 | And then, it's also consuming a lot of energy.
00:03:17.360 | So currently, just GPUs and data centers account for 4%
00:03:21.320 | of the total electricity consumption in the US.
00:03:24.700 | And also, it's not very environmentally sustainable.
00:03:28.500 | If you look at the numbers, that's crazy CO2 emissions
00:03:32.640 | annually.
00:03:34.520 | And even if we deliver all the data centers
00:03:39.380 | on time, there is still a data center supply deficit
00:03:43.540 | of more than 15 gigawatts in the US alone by 2030.
00:03:48.720 | And so it means that just building data centers
00:03:52.660 | cannot solve the problem.
00:03:54.740 | On the other hand, we think the GPU utilization
00:04:00.500 | is actually pretty low.
00:04:01.980 | So according to Deloitte, GPUs sit idle 80% of the time
00:04:08.380 | for enterprises and companies.
00:04:11.920 | According to SemiAnalysis, there exist 100-plus GPU clouds.
00:04:17.240 | So we can see how fragmented this space is.
00:04:20.740 | A lot of you guys need GPUs, but you can't find them.
00:04:25.000 | Or you are going to pay an extremely high price.
00:04:28.420 | On the other hand, there are a lot of GPUs sitting idle in data centers
00:04:32.500 | or in different clouds.
00:04:34.240 | And so naturally, the solution that we think we should build is
00:04:39.880 | a GPU marketplace, or aggregation layer, that
00:04:43.160 | aggregates different data centers and GPU providers to solve the problem
00:04:48.100 | for GPU users.
00:04:50.280 | So it doesn't necessarily need to be Hyperbolic,
00:04:52.480 | but I'll just use Hyperbolic as an example to show you.
00:04:57.760 | So I can just share what we're trying to solve.
00:05:03.040 | So we're building this global orchestration layer.
00:05:06.100 | We invented software called HyperDOS, which is short for Hyperbolic
00:05:10.420 | Distributed Operating System.
00:05:13.520 | So basically, it's Kubernetes-like software.
00:05:17.280 | Any cluster, as long as it installs our software, within five
00:05:22.460 | minutes that data center becomes a cluster in our network.
00:05:27.280 | And on the other side, users can rent GPUs in whatever way
00:05:32.300 | they want.
00:05:32.960 | They can use spot instances.
00:05:36.540 | They can go on-demand.
00:05:37.720 | They can reserve long-term.
00:05:39.180 | Or they can also host models on top.
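
(To make the four rental modes concrete, here is a purely hypothetical client sketch; the types, field names, and modes below are illustrative, not Hyperbolic's actual API.)

```python
# Hypothetical marketplace client -- illustrative only, not a real API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GpuRequest:
    gpu_type: str                         # e.g. "H100"
    count: int
    mode: str                             # "spot" | "on-demand" | "reserved" | "hosted-model"
    duration_hours: Optional[int] = None  # only meaningful for "reserved"

def submit(request: GpuRequest) -> str:
    """Pretend to hand the request to the orchestration layer and return a job id."""
    assert request.mode in {"spot", "on-demand", "reserved", "hosted-model"}
    return f"job-{request.gpu_type}-{request.mode}-{request.count}"

# The four consumption patterns mentioned in the talk:
jobs = [
    submit(GpuRequest("H100", 8, "spot")),                          # cheapest, preemptible
    submit(GpuRequest("H100", 8, "on-demand")),                     # pay as you go
    submit(GpuRequest("H100", 64, "reserved", duration_hours=720)), # month-long reservation
    submit(GpuRequest("H100", 2, "hosted-model")),                  # serve a model on top
]
print(jobs)
```
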
00:05:41.680 | And so we see that there are several benefits.
00:05:49.080 | One, we kind of solve the matching problem of compute.
00:05:56.340 | And then second, GPUs become commodities.
00:05:59.560 | So you don't need to spend too much time waiting for a data center.
00:06:02.860 | You just buy them on the marketplace.
00:06:04.780 | And then third, you can have different options.
00:06:09.160 | And so we do some math modeling.
00:06:13.640 | I mean, I don't have time to kind of put down the math
00:06:16.180 | in the slides.
00:06:16.900 | But this is our conclusion, right?
00:06:19.120 | Basically, we can reduce the cost by 50% to 75%.
00:06:25.080 | Even if you look at the current--
00:06:27.340 | we're running a beta version of our marketplace right now.
00:06:30.940 | And our GPU cost for an H100 is $0.99 per hour.
00:06:36.480 | But if you look at Google, for example,
00:06:39.160 | they have on-demand GPUs.
00:06:40.600 | It's like $11.
00:06:42.080 | Then there's Lambda.
00:06:42.980 | They have like $2 or $3.
00:06:44.380 | But on average, by aggregating more supply
00:06:48.900 | and then having a uniform distribution channel,
00:06:53.180 | you can drastically reduce the price.
00:06:57.020 | The theory behind that is queuing theory.
00:07:00.800 | Basically, it's the M/M/c model.
00:07:04.180 | Probably next time, if you're going to watch my talk,
00:07:06.580 | I will share more of the math behind it.
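
(A minimal sketch of the pooling argument behind the M/M/c claim, not the speaker's actual model: the Erlang C formula gives the probability that an arriving request has to wait, and pooling many small GPU clouds into one marketplace drives that probability down at the same utilization.)

```python
# Erlang C: probability an arriving job must wait in an M/M/c queue.
from math import factorial

def erlang_c(servers: int, offered_load: float) -> float:
    assert offered_load < servers, "system must be stable (load < servers)"
    top = (offered_load ** servers / factorial(servers)) * servers / (servers - offered_load)
    bottom = sum(offered_load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom

# Ten fragmented clouds with 10 GPUs each, each at 80% utilization...
print(f"fragmented: P(wait) = {erlang_c(10, 8):.2f}")
# ...versus one aggregated pool of 100 GPUs at the same 80% utilization.
print(f"aggregated: P(wait) = {erlang_c(100, 80):.3f}")
```
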
00:07:09.440 | But yeah, and then you can just save time
00:07:11.240 | on vetting your suppliers.
00:07:12.380 | Because if you think about--
00:07:15.400 | I mean, how many people here are founders
00:07:17.820 | or need to acquire GPUs?
00:07:21.840 | So are you frustrated when you are trying to talk to--
00:07:26.120 | how many suppliers are you talking to?
00:07:28.140 | If you have talked to more than five, raise your hands.
00:07:32.500 | Are you frustrated when you're trying
00:07:34.140 | to have five sales calls and trying to figure out which GPUs
00:07:40.760 | are good? Frustrated, yeah? Good, yeah.
00:07:42.720 | That's good, yeah.
00:07:43.360 | So basically, by having this uniform platform,
00:07:47.580 | like founders or startups or companies
00:07:50.260 | no longer need to vet different data centers.
00:07:53.140 | They just pick the one that has a high rating
00:07:57.820 | or the best price.
00:07:59.440 | We're also going to do benchmarking
00:08:01.840 | on the performance of the GPUs.
00:08:10.160 | Sorry.
00:08:10.700 | All right.
00:08:13.980 | So sorry.
00:08:14.820 | Somehow the graph didn't show.
00:08:17.520 | Give me one sec.
00:08:22.860 | So basically, we can think about a use case example.
00:08:34.200 | So let's say you are a startup and you want 1,000 GPUs
00:08:39.120 | at the beginning.
00:08:39.960 | So usually, you will just reserve these 1,000 GPUs for a year.
00:08:43.820 | You think, I might need to use these GPUs for training.
00:08:48.060 | And later on, I want to do inference.
00:08:49.840 | And so you run some training jobs.
00:08:52.340 | And then after three months, you realize, OK, now,
00:08:56.180 | running those experiments showed me it was a bad idea.
00:09:00.940 | And now I need 1,000 more GPUs just for a month, right?
00:09:05.040 | And then after six months, you finish your training job.
00:09:10.920 | And then you realize that now I only
00:09:12.700 | need 500 GPUs for hosting my model.
00:09:16.280 | But I still have 500 GPUs left.
00:09:19.460 | So in the Hyperbolic case, unlike the traditional one,
00:09:23.900 | you basically can say, OK, I will rent 1,000 GPUs
00:09:27.380 | for a year at the beginning.
00:09:29.140 | But then in month three, I can say,
00:09:34.280 | I just rent an extra 1,000 GPUs for just a month.
00:09:40.920 | And then in month six, then I can say, OK,
00:09:44.720 | I can release my idle GPUs on Hyperbolic and try to sell them
00:09:49.380 | to other people that need them.
00:09:52.580 | But if you just use some traditional cloud,
00:09:55.320 | then you need to rent 1,000 GPUs at the beginning.
00:09:57.920 | And then in month three, you need to rent another 1,000 GPUs
00:10:02.240 | for a full year, usually.
00:10:03.720 | And if you calculate the cost, compare that,
00:10:06.660 | and then also think about the price difference you will have,
00:10:12.340 | you can reduce the cost from $43.8 million to $6.9 million.
00:10:19.000 | So it's like 6x saving.
00:10:21.360 | And you also help other people to get cheaper GPUs too,
00:10:24.520 | because you can release those idle GPUs to other people.
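
(One possible reconstruction of that comparison, assuming $2.50 per GPU-hour for a traditional year-long reservation and the $0.99 marketplace rate mentioned earlier; the talk only states the final figures, so this is a sketch that lands in the same ballpark.)

```python
# Reconstruction of the cost comparison under assumed prices.
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12 months

def cost(gpu_months: float, price_per_hour: float) -> float:
    return gpu_months * HOURS_PER_MONTH * price_per_hour

# Traditional cloud: a 1,000-GPU year-long reservation, plus a second
# 1,000-GPU year-long reservation forced on you in month three.
traditional = cost(2 * 1_000 * 12, 2.50)

# Marketplace: 1,000 GPUs for the first six months, an extra 1,000 for one
# burst month, then only the 500 GPUs still needed for the last six months.
marketplace = cost(1_000 * 6 + 1_000 * 1 + 500 * 6, 0.99)

print(f"traditional: ${traditional / 1e6:.1f}M")    # -> $43.8M
print(f"marketplace: ${marketplace / 1e6:.1f}M")    # -> ~$7.2M, near the $6.9M cited
print(f"saving: {traditional / marketplace:.1f}x")  # -> ~6x
```
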
00:10:27.900 | And so this is how we think we're going to increase
00:10:34.660 | productivity.
00:10:36.340 | People only think about saving, but actually,
00:10:39.760 | that's not the whole picture for GPUs, right?
00:10:42.580 | By the scaling laws, we know that the more compute you spend,
00:10:46.620 | the better quality your model will be.
00:10:49.700 | So it's not just about saving your cost by 6x.
00:10:53.980 | It's more about with the same budget,
00:10:56.720 | you will increase your productivity by 6x.
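
(The scaling-law claim is usually written as a compute power law, e.g. the Kaplan et al. 2020 form below; the speaker doesn't say which variant he has in mind.)

```latex
% Compute scaling law: loss falls as a power law in training compute C.
% C_c and \alpha_C are fitted constants, with \alpha_C small and positive.
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
% With a fixed dollar budget, 6x cheaper compute means 6x more C,
% which lowers the achievable loss by a factor of 6^{\alpha_C}.
```
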
00:11:00.260 | And imagine how many startups that used to
00:11:05.300 | only rely on OpenAI and Anthropic, those closed AI
00:11:08.800 | models.
00:11:09.540 | But now suddenly, their money becomes more valuable,
00:11:13.680 | and they can rent as many GPUs as they want for their own training.
00:11:19.080 | And so the next step that we think the GPU marketplace will
00:11:23.040 | evolve into is an all-in-one platform for different AI
00:11:29.140 | workloads.
00:11:30.020 | Because what people really want is not just GPUs.
00:11:33.940 | They want to run their different AI jobs, right?
00:11:37.060 | So you will have AI inference, online inference, offline inference,
00:11:42.040 | and then you will also have training jobs.
00:11:47.080 | And so, yeah, here are some takeaways.
00:11:52.840 | Basically, we don't think we should just focus on building
00:11:58.320 | data centers.
00:11:58.980 | We also need to do smart allocation of the resources.
00:12:03.200 | And then second, we can reduce your costs by building a GPU
00:12:09.220 | marketplace.
00:12:10.260 | And lastly, I think just focusing on building data centers
00:12:14.520 | is not very sustainable.
00:12:15.920 | It costs a lot of energy and takes a lot of land.
00:12:20.420 | We should rather reuse and recycle that idle compute
00:12:24.620 | by sending it to others.
00:12:27.860 | So if you're interested in trying it out,
00:12:31.520 | you can come to our website.
00:12:34.520 | The left QR code is the current product
00:12:37.580 | that we have, which is a marketplace.
00:12:39.280 | But then we're also launching our business tier
00:12:41.100 | and enterprise tier that give you production-ready GPUs
00:12:44.760 | with 99.5% reliability.
00:12:47.400 | All right.
00:12:47.820 | Thanks.
00:12:48.200 | Awesome.
00:12:53.460 | So I actually got--
00:12:55.260 | I'm curious.
00:12:56.040 | Can you tell us more about the Hyperbolic OS?
00:12:59.520 | How exactly does that work?
00:13:01.260 | Because I know a lot of times you have a data center,
00:13:03.760 | you have a cluster, a set of GPUs.
00:13:05.260 | How does it actually work to connect it to Hyperbolic itself?
00:13:09.760 | Yeah.
00:13:10.260 | Yeah.
00:13:10.260 | So basically, this is a hybrid--
00:13:13.260 | HyperDOS is like a Kubernetes agent.
00:13:15.820 | So you just install that in your cluster,
00:13:19.360 | as long as you have Kubernetes.
00:13:21.540 | I mean, most data centers have Kubernetes.
00:13:24.520 | But then even for your MacBook or for your other PC,
00:13:27.940 | you can just install it to kind
00:13:30.700 | of become a Kubernetes-ready machine.
00:13:34.120 | And so basically, now you kind of have--
00:13:38.060 | we have terminology in-house.
00:13:40.340 | We call our Hyperbolic server Monarch.
00:13:45.580 | And then we have different Barons.
00:13:48.840 | So it's like a hierarchical model.
00:13:50.740 | So different Barons, they own different compute.
00:13:54.160 | And then every time when a user wants to rent GPU,
00:13:58.120 | they will talk to our Monarch server.
00:14:00.160 | And the Monarch server will send a request to the Baron.
00:14:04.140 | And then the Baron will basically provision the machines
00:14:07.540 | and set up the SSH instance for customers to access.
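
(Roughly, the request flow described above looks like the following; the class and method names are hypothetical, not actual HyperDOS internals.)

```python
# Hypothetical sketch of the Monarch/Baron flow -- names are illustrative.
from dataclasses import dataclass

@dataclass
class Baron:
    """Supplier-side agent that owns the GPUs in one cluster."""
    cluster: str
    free_gpus: int

    def provision(self, count: int) -> str:
        # Carve out the machines and hand back SSH access for the customer.
        assert count <= self.free_gpus, "not enough idle GPUs in this cluster"
        self.free_gpus -= count
        return f"ssh customer@{self.cluster}.example  # {count} GPUs provisioned"

class Monarch:
    """Central server users talk to; it routes each request to a Baron."""
    def __init__(self, barons: list[Baron]):
        self.barons = barons

    def rent(self, count: int) -> str:
        baron = next(b for b in self.barons if b.free_gpus >= count)
        return baron.provision(count)

monarch = Monarch([Baron("dc-east", 32), Baron("dc-west", 128)])
print(monarch.rent(64))   # routed to dc-west, which has enough idle GPUs
```
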
00:14:11.380 | Yeah.
00:14:11.880 | And then we'll see you next time.