
Why We Don’t Need More Data Centers - Dr. Jasper Zhang, Hyperbolic



00:00:15.000 | Nice meeting you guys.
00:00:16.500 | Great to be here.
00:00:17.840 | And I'm here to present Hyperbolic, which
00:00:20.320 | is an AI cloud for developers.
00:00:22.400 | And so my topic is why we don't need more data centers.
00:00:28.320 | It's like a very eye-catching title.
00:00:31.320 | But what I want to clarify is I still
00:00:33.820 | think building data centers is important.
00:00:36.420 | But just building data centers alone cannot solve the problem.
00:00:41.640 | So wait, before we get started, let me introduce myself.
00:00:47.320 | I'm Jasper.
00:00:47.820 | I'm the CEO and co-founder of Hyperbolic.
00:00:50.440 | I did my math PhD at UC Berkeley, finished my PhD in two years,
00:00:54.200 | which made me the fastest person in the history of Berkeley.
00:00:57.480 | And then I also won a few gold medals.
00:00:59.460 | So after that, I worked at Citadel Securities,
00:01:02.260 | trying to use AI and machine learning to predict the market
00:01:04.780 | as a key strategy.
00:01:05.940 | So I've always had a passion for how
00:01:08.300 | to make things very efficient and how
00:01:10.860 | to help you save money.
00:01:12.240 | Because everyone knows that compute is actually
00:01:15.020 | one of the biggest costs for your company
00:01:17.140 | or for your startup.
00:01:18.960 | Usually, if you want to rent 1,000 GPUs,
00:01:21.580 | it will cost you millions of dollars per year.
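
(A rough back-of-envelope check on that "millions of dollars per year" figure; the $2 per GPU-hour rate below is an illustrative assumption, not a price quoted in the talk.)

```python
# Back-of-envelope: annual cost of renting 1,000 GPUs around the clock.
# The hourly rate is an illustrative assumption, not a quoted price.
gpu_count = 1_000
hourly_rate_usd = 2.0          # assumed $/GPU-hour for an H100-class GPU
hours_per_year = 24 * 365      # 8,760 hours

annual_cost = gpu_count * hourly_rate_usd * hours_per_year
print(f"${annual_cost / 1e6:.1f}M per year")   # -> $17.5M per year
```
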
00:01:24.480 | And we think that these problems should
00:01:26.760 | be solved not just by building more data centers,
00:01:30.240 | but by actually building a GPU marketplace.
00:01:33.120 | So let's get started with the problem that we're facing.
00:01:36.880 | First, I think--
00:01:39.460 | so everyone knows that AI is going to integrate
00:01:41.700 | with everything in the future.
00:01:43.720 | And every company will be an AI company.
00:01:46.540 | So the demand for GPUs as well as data centers is exploding.
00:01:51.380 | So according to McKinsey, by 2030, we'll need 4x more data center capacity built
00:01:58.780 | in one quarter of the time it took to build what we have today.
00:02:04.060 | But what if I tell you that you actually don't need
00:02:07.140 | that many data centers?
00:02:09.240 | You actually need another solution.
00:02:11.820 | So we can break down the demand first.
00:02:15.200 | Right now, the current capacity for data centers is 55 gigawatts.
00:02:22.220 | In the median scenario, we're going to see a 22% annual growth
00:02:28.220 | rate in demand.
00:02:29.640 | So in 2030, we're going to need 219 gigawatts.
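
(A quick sanity check on those figures: compounding today's capacity at the quoted growth rate roughly reproduces the 219 GW number; the seven-year horizon from 2023 to 2030 is my assumption.)

```python
# Compound growth from today's data center capacity at the quoted 22% rate.
current_capacity_gw = 55
annual_growth = 0.22
years = 7                      # assumed horizon: 2023 -> 2030

projected_gw = current_capacity_gw * (1 + annual_growth) ** years
print(f"{projected_gw:.0f} GW")   # -> ~221 GW, close to the 219 GW cited
```
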
00:02:38.300 | However, there are a lot of challenges in building data centers.
00:02:43.280 | So first, everyone knows Stargate.
00:02:46.440 | For the first Stargate data center,
00:02:49.780 | it takes more than a billion dollars to build.
00:02:52.860 | And then also, it's very slow to connect a data center
00:02:56.000 | to the electrical grid.
00:02:58.160 | For example, right now, the wait list is like seven years.
00:03:02.280 | So you need to wait seven years to connect a 100-megawatt
00:03:05.420 | facility to the electrical grid in Northern Virginia.
00:03:11.720 | And then, it's also consuming a lot of energy.
00:03:17.360 | So currently, just GPUs and data centers account for 4%
00:03:21.320 | of the total electricity consumption in the US.
00:03:24.700 | And also, it's not very environmentally sustainable.
00:03:28.500 | If you look at the numbers, that's crazy CO2 emissions
00:03:32.640 | annually.
00:03:34.520 | And even if we deliver all the data centers
00:03:39.380 | on time, there is still a data center supply deficit
00:03:43.540 | of more than 15 gigawatts in the US alone by 2030.
00:03:48.720 | And so it means that just building data centers
00:03:52.660 | cannot solve the problem.
00:03:54.740 | On the other hand, we think the GPU utilization
00:04:00.500 | is actually pretty low.
00:04:01.980 | So according to Deloitte, GPUs sit idle 80% of the time
00:04:08.380 | for enterprises and companies.
00:04:11.920 | According to SemiAnalysis, there exist 100-plus GPU clouds.
00:04:17.240 | So we can see how fragmented this space is.
00:04:20.740 | A lot of you guys need GPUs, but you can't find them.
00:04:25.000 | Or you are going to pay an extremely high price.
00:04:28.420 | On the other hand, there are a lot of GPUs sitting idle in data centers
00:04:32.500 | or in different clouds.
00:04:34.240 | And so naturally, the solution that we think we should build is
00:04:39.880 | a GPU marketplace, or aggregation layer, that
00:04:43.160 | aggregates different data centers and GPU providers to solve the problem
00:04:48.100 | for GPU users.
00:04:50.280 | So it doesn't necessarily need to be Hyperbolic,
00:04:52.480 | but I'll just use Hyperbolic as an example to show you.
00:04:57.760 | So I can just share what we're trying to solve.
00:05:03.040 | So we're building this global orchestration layer.
00:05:06.100 | We invented software called HyperDOS, which is short for Hyperbolic
00:05:10.420 | Distributed Operating System.
00:05:13.520 | So basically, it's Kubernetes-like software.
00:05:17.280 | Any cluster, as long as it installs our software, within five
00:05:22.460 | minutes that data center becomes a cluster in our network.
00:05:27.280 | And on the other side, users can rent GPUs in whatever way
00:05:32.300 | they want.
00:05:32.960 | They can use spot instances.
00:05:36.540 | They can go on-demand.
00:05:37.720 | They can reserve long-term.
00:05:39.180 | Or they can also host models on top.
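
(To make the four rental modes concrete, here is a purely hypothetical client sketch; the types, field names, and modes below are illustrative, not Hyperbolic's actual API.)

```python
# Hypothetical marketplace client -- illustrative only, not a real API.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GpuRequest:
    gpu_type: str                         # e.g. "H100"
    count: int
    mode: str                             # "spot" | "on-demand" | "reserved" | "hosted-model"
    duration_hours: Optional[int] = None  # only meaningful for "reserved"

def submit(request: GpuRequest) -> str:
    """Pretend to hand the request to the orchestration layer and return a job id."""
    assert request.mode in {"spot", "on-demand", "reserved", "hosted-model"}
    return f"job-{request.gpu_type}-{request.mode}-{request.count}"

# The four consumption patterns mentioned in the talk:
jobs = [
    submit(GpuRequest("H100", 8, "spot")),                          # cheapest, preemptible
    submit(GpuRequest("H100", 8, "on-demand")),                     # pay as you go
    submit(GpuRequest("H100", 64, "reserved", duration_hours=720)), # month-long reservation
    submit(GpuRequest("H100", 2, "hosted-model")),                  # serve a model on top
]
print(jobs)
```
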
00:05:41.680 | And so we see that there are several benefits.
00:05:49.080 | One, we kind of solve the matching problem of compute.
00:05:56.340 | And then second, GPUs become commodities.
00:05:59.560 | So you don't need to spend too much time waiting for a data center.
00:06:02.860 | You just buy them on the marketplace.
00:06:04.780 | And then third, you can have different options.
00:06:09.160 | And so we do some math modeling.
00:06:13.640 | I mean, I don't have time to kind of put down the math
00:06:16.180 | in the slides.
00:06:16.900 | But this is our conclusion, right?
00:06:19.120 | Basically, we can reduce the cost by 50% to 75%.
00:06:25.080 | Even if you look at the current--
00:06:27.340 | we're running a beta version of our marketplace right now.
00:06:30.940 | And our GPU cost for an H100 is $0.99 per hour.
00:06:36.480 | But if you look at Google, for example,
00:06:39.160 | they have on-demand GPUs.
00:06:40.600 | It's like $11.
00:06:42.080 | Then there's Lambda.
00:06:42.980 | They have like $2 or $3.
00:06:44.380 | But on average, by aggregating more supply
00:06:48.900 | and then having a uniform distribution channel,
00:06:53.180 | you can drastically reduce the price.
00:06:57.020 | The theory behind that is queuing theory.
00:07:00.800 | Basically, it's the M/M/c model.
00:07:04.180 | Probably next time, if you're going to watch my talk,
00:07:06.580 | I will share more of the math behind it.
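
(A minimal sketch of the pooling argument behind the M/M/c claim, not the speaker's actual model: the Erlang C formula gives the probability that an arriving request has to wait, and pooling many small GPU clouds into one marketplace drives that probability down at the same utilization.)

```python
# Erlang C: probability an arriving job must wait in an M/M/c queue.
from math import factorial

def erlang_c(servers: int, offered_load: float) -> float:
    assert offered_load < servers, "system must be stable (load < servers)"
    top = (offered_load ** servers / factorial(servers)) * servers / (servers - offered_load)
    bottom = sum(offered_load ** k / factorial(k) for k in range(servers)) + top
    return top / bottom

# Ten fragmented clouds with 10 GPUs each, each at 80% utilization...
print(f"fragmented: P(wait) = {erlang_c(10, 8):.2f}")
# ...versus one aggregated pool of 100 GPUs at the same 80% utilization.
print(f"aggregated: P(wait) = {erlang_c(100, 80):.3f}")
```
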
00:07:09.440 | But yeah, and then you can just save time
00:07:11.240 | on vetting your suppliers.
00:07:12.380 | Because if you think about--
00:07:15.400 | I mean, how many people here are founders
00:07:17.820 | or need to acquire GPUs?
00:07:21.840 | So are you frustrated when you are trying to talk to--
00:07:26.120 | how many suppliers are you talking to?
00:07:28.140 | If you have talked to more than five, raise your hands.
00:07:32.500 | Are you frustrated when you're trying
00:07:34.140 | to have five sales calls and trying to figure out which GPUs
00:07:40.760 | are good? Frustrated, yeah? Good, yeah.
00:07:42.720 | That's good, yeah.
00:07:43.360 | So basically, by having this uniform platform,
00:07:47.580 | like founders or startups or companies
00:07:50.260 | no longer need to vet different data centers.
00:07:53.140 | They just pick the one that has a high rating
00:07:57.820 | or the best price.
00:07:59.440 | We're also going to do benchmarking
00:08:01.840 | on the performance of the GPUs.
00:08:10.160 | Sorry.
00:08:10.700 | All right.
00:08:13.980 | So sorry.
00:08:14.820 | Somehow the graph didn't show.
00:08:17.520 | Give me one sec.
00:08:22.860 | So basically, we can think about a use case example.
00:08:34.200 | So let's say you are a startup and you want 1,000 GPUs
00:08:39.120 | at the beginning.
00:08:39.960 | So usually, you will just reserve these 1,000 GPUs for a year.
00:08:43.820 | You think, I might need to use these GPUs for training.
00:08:48.060 | And later on, I want to do inference.
00:08:49.840 | And so you run some training jobs.
00:08:52.340 | And then after three months, you realize, OK, now,
00:08:56.180 | running those experiments showed me it was a bad idea.
00:09:00.940 | And now I need 1,000 more GPUs just for a month, right?
00:09:05.040 | And then after six months, you finish your training job.
00:09:10.920 | And then you realize that now I only
00:09:12.700 | need 500 GPUs for hosting my model.
00:09:16.280 | But I still have 500 GPUs left.
00:09:19.460 | So in the Hyperbolic case, unlike the traditional one,
00:09:23.900 | you basically can say, OK, I will rent 1,000 GPUs
00:09:27.380 | for a year at the beginning.
00:09:29.140 | But then in month three, I can say,
00:09:34.280 | I just rent an extra 1,000 GPUs for just a month.
00:09:40.920 | And then in month six, then I can say, OK,
00:09:44.720 | I can release my idle GPUs on Hyperbolic and try to sell them
00:09:49.380 | to other people that need them.
00:09:52.580 | But if you just use some traditional cloud,
00:09:55.320 | then you need to rent 1,000 GPUs at the beginning.
00:09:57.920 | And then in month three, you need to rent another 1,000 GPUs
00:10:02.240 | for a full year, usually.
00:10:03.720 | And if you calculate the cost, compare that,
00:10:06.660 | and then also think about the price difference you will have,
00:10:12.340 | you can reduce the cost from $43.8 million to $6.9 million.
00:10:19.000 | So it's like 6x saving.
00:10:21.360 | And you also help other people to get cheaper GPUs too,
00:10:24.520 | because you can release those idle GPUs to other people.
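
(One possible reconstruction of that comparison, assuming $2.50 per GPU-hour for a traditional year-long reservation and the $0.99 marketplace rate mentioned earlier; the talk only states the final figures, so this is a sketch that lands in the same ballpark.)

```python
# Reconstruction of the cost comparison under assumed prices.
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12 months

def cost(gpu_months: float, price_per_hour: float) -> float:
    return gpu_months * HOURS_PER_MONTH * price_per_hour

# Traditional cloud: a 1,000-GPU year-long reservation, plus a second
# 1,000-GPU year-long reservation forced on you in month three.
traditional = cost(2 * 1_000 * 12, 2.50)

# Marketplace: 1,000 GPUs for the first six months, an extra 1,000 for one
# burst month, then only the 500 GPUs still needed for the last six months.
marketplace = cost(1_000 * 6 + 1_000 * 1 + 500 * 6, 0.99)

print(f"traditional: ${traditional / 1e6:.1f}M")    # -> $43.8M
print(f"marketplace: ${marketplace / 1e6:.1f}M")    # -> ~$7.2M, near the $6.9M cited
print(f"saving: {traditional / marketplace:.1f}x")  # -> ~6x
```
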
00:10:27.900 | And so this is how we think we're going to increase
00:10:34.660 | productivity.
00:10:36.340 | People only think about saving, but actually,
00:10:39.760 | that's not the whole picture for GPUs, right?
00:10:42.580 | By the scaling laws, we know that the more compute you spend,
00:10:46.620 | the better quality your model will be.
00:10:49.700 | So it's not just about saving your cost by 6x.
00:10:53.980 | It's more about with the same budget,
00:10:56.720 | you will increase your productivity by 6x.
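
(The scaling-law claim is usually written as a compute power law, e.g. the Kaplan et al. 2020 form below; the speaker doesn't say which variant he has in mind.)

```latex
% Compute scaling law: loss falls as a power law in training compute C.
% C_c and \alpha_C are fitted constants, with \alpha_C small and positive.
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
% With a fixed dollar budget, 6x cheaper compute means 6x more C,
% which lowers the achievable loss by a factor of 6^{\alpha_C}.
```
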
00:11:00.260 | And imagine how many startups that used to
00:11:05.300 | only rely on OpenAI and Anthropic, those closed AI
00:11:08.800 | models.
00:11:09.540 | But now suddenly, their money becomes more valuable,
00:11:13.680 | and they can rent as many GPUs as they want for their own training.
00:11:19.080 | And so the next step that we think the GPU marketplace will
00:11:23.040 | evolve into is an all-in-one platform for different AI
00:11:29.140 | workloads.
00:11:30.020 | Because what people really want is not just GPUs.
00:11:33.940 | They want to run their different AI jobs, right?
00:11:37.060 | So you will have AI inference, online inference, offline inference,
00:11:42.040 | and then you will also have training jobs.
00:11:47.080 | And so, yeah, here are some takeaways.
00:11:52.840 | Basically, we don't think we should just focus on building
00:11:58.320 | data centers.
00:11:58.980 | We also need to do smart allocation of the resources.
00:12:03.200 | And then second, we can reduce your costs by building a GPU
00:12:09.220 | marketplace.
00:12:10.260 | And lastly, I think just focusing on building data centers
00:12:14.520 | is not very sustainable.
00:12:15.920 | It costs a lot of energy and takes a lot of land.
00:12:20.420 | We should rather reuse and recycle that idle compute
00:12:24.620 | by sending it to others.
00:12:27.860 | So if you're interested in trying it out,
00:12:31.520 | you can come to our website.
00:12:34.520 | The left QR code is the current product
00:12:37.580 | that we have, which is a marketplace.
00:12:39.280 | But then we're also launching our business tier
00:12:41.100 | and enterprise tier that give you production-ready GPUs
00:12:44.760 | with 99.5% reliability.
00:12:47.400 | All right.
00:12:47.820 | Thanks.
00:12:48.200 | Awesome.
00:12:53.460 | So I actually got--
00:12:55.260 | I'm curious.
00:12:56.040 | Can you tell us more about the Hyperbolic OS?
00:12:59.520 | How exactly does that work?
00:13:01.260 | Because I know a lot of times you have a data center,
00:13:03.760 | you have a cluster, a set of GPUs.
00:13:05.260 | How does it actually work to connect it to Hyperbolic itself?
00:13:09.760 | Yeah.
00:13:10.260 | Yeah.
00:13:10.260 | So basically, this is a hybrid--
00:13:13.260 | HyperDOS is like a Kubernetes agent.
00:13:15.820 | So you just install that in your cluster,
00:13:19.360 | as long as you have Kubernetes.
00:13:21.540 | I mean, most data centers have Kubernetes.
00:13:24.520 | But then even for your MacBook or for your other PC,
00:13:27.940 | you can just install it to kind
00:13:30.700 | of become a Kubernetes-ready machine.
00:13:34.120 | And so basically, now you kind of have--
00:13:38.060 | we have terminology in-house.
00:13:40.340 | We call our Hyperbolic server Monarch.
00:13:45.580 | And then we have different Barons.
00:13:48.840 | So it's like a hierarchical model.
00:13:50.740 | So different Barons, they own different compute.
00:13:54.160 | And then every time when a user wants to rent GPU,
00:13:58.120 | they will talk to our Monarch server.
00:14:00.160 | And the Monarch server will send a request to the Baron.
00:14:04.140 | And then the Baron will basically provision the machines
00:14:07.540 | and set up the SSH instance for customers to access.
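
(Roughly, the request flow described above looks like the following; the class and method names are hypothetical, not actual HyperDOS internals.)

```python
# Hypothetical sketch of the Monarch/Baron flow -- names are illustrative.
from dataclasses import dataclass

@dataclass
class Baron:
    """Supplier-side agent that owns the GPUs in one cluster."""
    cluster: str
    free_gpus: int

    def provision(self, count: int) -> str:
        # Carve out the machines and hand back SSH access for the customer.
        assert count <= self.free_gpus, "not enough idle GPUs in this cluster"
        self.free_gpus -= count
        return f"ssh customer@{self.cluster}.example  # {count} GPUs provisioned"

class Monarch:
    """Central server users talk to; it routes each request to a Baron."""
    def __init__(self, barons: list[Baron]):
        self.barons = barons

    def rent(self, count: int) -> str:
        baron = next(b for b in self.barons if b.free_gpus >= count)
        return baron.provision(count)

monarch = Monarch([Baron("dc-east", 32), Baron("dc-west", 128)])
print(monarch.rent(64))   # routed to dc-west, which has enough idle GPUs
```
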
00:14:11.380 | Yeah.
00:14:11.880 | And then we'll see you next time.