Nice meeting you guys. Great to be here. I'm here to present Hyperbolic, which is an AI cloud for developers. My topic is why we don't need more data centers. It's a very eye-catching title, but what I want to clarify is that I still think building data centers is important.
It's just that building data centers alone can't solve the problem. So before we get started, let me introduce myself. I'm Jasper, CEO and co-founder of Hyperbolic. I did my math PhD at UC Berkeley and finished in two years, which made me the fastest PhD in Berkeley's history.
I also won a few gold medals. After that, I worked at Citadel Securities, using AI and machine learning to predict the market as a quant strategy. So I've always had a passion for making things efficient and for helping you save money.
Because everyone knows that compute is one of the biggest costs for your company or your startup: renting 1,000 GPUs will typically cost you millions of dollars per year. We think this problem should be solved not just by building more data centers, but by building a GPU marketplace.
So let's start with the problem we're facing. Everyone knows that AI is going to integrate with everything in the future, and every company will be an AI company, so the demand for GPUs and for data centers is exploding. According to McKinsey, by 2030 we'll need 4x more data center capacity, built in a quarter of the time it took to build what exists today.
But what if I told you that you don't actually need that many data centers, and that you need a different solution instead? Let's break down the demand first. Current data center capacity is 55 gigawatts, and in the median scenario demand grows about 22% per year, so by 2030 we'll need 219 gigawatts. The compounding checks out, as shown below.
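A quick sanity check on those numbers, assuming the 55 GW baseline is the 2023 figure, which the talk doesn't state explicitly:

```latex
% Implied compound growth from 55 GW to 219 GW over seven years:
\left(\tfrac{219}{55}\right)^{1/7} \approx 1.22
\quad\Longrightarrow\quad \text{about 22\% per year}
```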
However, there are a lot of challenges in building data centers. First, everyone knows Stargate: the first Stargate data center takes more than a billion dollars to build. Second, connecting a data center to the electrical grid is very slow. Right now, for example, the wait list to connect a 100-megawatt facility to the grid in Northern Virginia is about seven years. Third, it consumes a lot of energy: GPUs and data centers already account for 4% of total electricity consumption in the US.
It's also not very environmentally sustainable: if you look at the numbers, the annual CO2 emissions are enormous. And even if we deliver every planned data center on time, there will still be a data center supply deficit of more than 15 gigawatts in the US alone by 2030.
That means just building data centers can't solve the problem. On the other hand, GPU utilization is actually pretty low: according to Deloitte, GPUs at enterprises sit idle 80% of the time. And according to SemiAnalysis, there are 100-plus GPU clouds, which shows you how fragmented this space is.
A lot of you need GPUs but can't find them, or have to pay an extremely high price, while at the same time a lot of GPUs sit idle in data centers and in different clouds. So the natural solution, we think, is a GPU marketplace: an aggregation layer that brings together different data centers and GPU providers to solve the problem for GPU users.
It doesn't necessarily have to be Hyperbolic; I'm just using Hyperbolic as the example, so I can share what we're trying to solve. We're building this global orchestration layer. We invented software called HyperDOS, short for Hyperbolic Distributed Operating System. Basically, it's Kubernetes-like software.
Any cluster that installs our software becomes, within five minutes, a cluster in our network; a rough sketch of what that install could look like is below. On the other side, users can rent GPUs in whichever way they want: spot instances, on-demand, or long-term reservations.
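To make the "install our software and join the network" step concrete, here is a minimal sketch using the official Kubernetes Python client, deploying an agent as a DaemonSet so one agent pod runs on every node. The image name, namespace, and environment variable are hypothetical illustrations, not Hyperbolic's actual artifacts.

```python
from kubernetes import client, config

# Connect using the cluster's local kubeconfig.
config.load_kube_config()

# A DaemonSet schedules one agent pod per node, so every GPU node
# in the cluster reports in to the orchestration layer.
agent = client.V1DaemonSet(
    metadata=client.V1ObjectMeta(name="hyperdos-agent", namespace="hyperdos"),
    spec=client.V1DaemonSetSpec(
        selector=client.V1LabelSelector(match_labels={"app": "hyperdos-agent"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "hyperdos-agent"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="agent",
                        image="hyperbolic/hyperdos-agent:latest",  # hypothetical image
                        env=[client.V1EnvVar(
                            name="HYPERBOLIC_API_URL",             # hypothetical setting
                            value="https://api.hyperbolic.xyz",
                        )],
                    )
                ]
            ),
        ),
    ),
)

client.CoreV1Api().create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name="hyperdos")))
client.AppsV1Api().create_namespaced_daemon_set(namespace="hyperdos", body=agent)
```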
They can also host models on top. We see several benefits from this. First, we solve the matching problem for compute. Second, GPUs become commodities: you don't need to wait for a data center to be built, you just buy capacity on the marketplace.
And third, you have different options. We did some math modeling; I don't have time to put the math on the slides, but the conclusion is that we can cut costs by 50% to 75%. We're running a beta version of our marketplace right now, and our H100 price is $0.99 per hour. For comparison, Google's on-demand GPUs run around $11 per hour, and Lambda is around $2 to $3. On average, by aggregating more supply behind a uniform distribution channel, you can drastically reduce the price.
The theory behind this is queuing theory, specifically the M/M/c model. Next time, if you watch my talk again, I'll share more of the math, but the core intuition is sketched below.
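Since the slides omit the math, here is the standard M/M/c result the model presumably builds on; this is my reconstruction of the pooling intuition, not Hyperbolic's exact derivation.

```latex
% GPU rentals as an M/M/c queue: requests arrive at rate \lambda,
% each of the c pooled GPUs serves at rate \mu, and utilization is
% \rho = \lambda / (c\mu). The Erlang C formula gives the probability
% that an arriving request has to wait for a GPU:
P_{\text{wait}} \;=\;
  \frac{\dfrac{(c\rho)^{c}}{c!\,(1-\rho)}}
       {\displaystyle\sum_{k=0}^{c-1}\frac{(c\rho)^{k}}{k!}
        \;+\; \frac{(c\rho)^{c}}{c!\,(1-\rho)}}
```

For a fixed wait-time target, the sustainable utilization ρ climbs toward 1 as c grows, so one large pooled marketplace can keep its GPUs far busier than a hundred small fragmented clouds. That higher utilization is what lets the price per GPU-hour drop.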
And you save the time you'd otherwise spend vetting suppliers. How many people here are founders or need to acquire GPUs? If you've talked to more than five suppliers, raise your hands. Are you frustrated when you have to sit through five sales calls just to figure out which GPUs are actually good? Yeah, exactly.
So with a uniform platform, founders, startups, and companies no longer need to vet different data centers. They just pick the provider with the highest rating or the best price. We're also going to benchmark GPU performance.
All right, sorry, the graph didn't show for a second. So let's walk through a use case. Say you're a startup and you want 1,000 GPUs at the beginning. Usually, you'd just reserve those 1,000 GPUs for a year.
You think: I'll use these GPUs for training, and later I'll do inference. So you run some training jobs. After three months, those experiments give you a new idea, and now you need 1,000 more GPUs for just a month.
After six months, your training job finishes, and you realize you only need 500 GPUs to host your model, leaving the other 500 idle. On Hyperbolic, you can say: I'll rent 1,000 GPUs at the start.
Then in month three: I'll rent an extra 1,000 GPUs for just a month. And in month six: I'll release my idle GPUs on Hyperbolic and sell that time to other people who need it. But on a traditional cloud, you have to rent 1,000 GPUs at the beginning,
and then in month three you typically have to rent another 1,000 GPUs for a full year. If you work out and compare the costs, factoring in the price difference, you go from $43.8 million down to $6.9 million. That's roughly a 6x saving.
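The slide's exact inputs aren't given in the talk, so here is a back-of-the-envelope reconstruction under assumed prices: about $2.50 per GPU-hour for a year-long reservation on a traditional cloud (the Lambda range quoted earlier) against the marketplace's $0.99. Under those assumptions the totals land near the slide's figures.

```python
HOURS_PER_YEAR = 8760
HOURS_PER_MONTH = HOURS_PER_YEAR / 12

# Traditional cloud (assumed $2.50/GPU-hr, 1-year minimum commitment):
# 1,000 GPUs reserved up front, plus 1,000 more reserved for a full
# year at month 3 because there is no one-month option.
traditional = 2.50 * HOURS_PER_YEAR * (1000 + 1000)

# Marketplace (assumed $0.99/GPU-hr, rent and release at any time):
#   months 0-3:  1,000 GPUs (training)
#   month  3-4:  2,000 GPUs (one month of extra experiments)
#   months 4-6:  1,000 GPUs (finish training)
#   months 6-12:   500 GPUs (inference; the other 500 released)
gpu_months = 1000 * 3 + 2000 * 1 + 1000 * 2 + 500 * 6  # = 10,000
marketplace = 0.99 * gpu_months * HOURS_PER_MONTH

print(f"traditional: ${traditional / 1e6:.1f}M")         # $43.8M
print(f"marketplace: ${marketplace / 1e6:.1f}M")         # ~$7.2M vs the slide's $6.9M
print(f"saving:      {traditional / marketplace:.1f}x")  # ~6x
```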
And you help other people get cheaper GPUs too, because the idle GPUs you release go to someone who needs them. That's how we think we'll increase productivity. People only think about the savings, but for GPUs that's not the whole story: by the scaling laws, the more compute you spend, the better your model gets.
So it's not just about cutting your cost by 6x. With the same budget, you can increase your productivity by 6x. And imagine how many startups today have to rely only on OpenAI and Anthropic's closed models; suddenly their money becomes more valuable, and they can rent as many GPUs as they want for their own training.
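For context on the scaling-law claim, here is one published empirical form, from Kaplan et al. (2020), not anything specific to Hyperbolic: test loss falls as a power law in training compute, so a several-fold larger compute budget buys a predictable quality gain.

```latex
% L is test loss, C is training compute, and C_c, \alpha_C are
% fitted constants (Kaplan et al. report \alpha_C on the order of 0.05).
L(C) \approx \left(\frac{C_{c}}{C}\right)^{\alpha_{C}}
```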
And the next step we think a GPU marketplace will naturally evolve into is an all-in-one platform for different AI workloads, because what people really want isn't GPUs as such; they want to run their AI jobs. So you'll have online inference, offline inference, and training jobs; a minimal example of what the inference side looks like for a user is sketched below.
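For instance, if the marketplace exposes an OpenAI-compatible inference endpoint (Hyperbolic does; the exact base URL and model ID here are illustrative assumptions), online inference from the user's side is only a few lines:

```python
from openai import OpenAI

# Point the standard OpenAI client at the marketplace endpoint.
client = OpenAI(
    base_url="https://api.hyperbolic.xyz/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

reply = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct",  # illustrative model ID
    messages=[{"role": "user", "content": "Hello from the marketplace"}],
)
print(reply.choices[0].message.content)
```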
So, some takeaways. First, we don't think we should just focus on building data centers; we also need smart allocation of the resources we already have. Second, building a GPU marketplace can cut your costs. And lastly, focusing only on building data centers is not very sustainable: it consumes a lot of energy and takes a lot of land, and we'd do better to reuse and recycle idle compute by routing it to others. If you're interested in trying it out, come to our website. The left QR code is our current product, the marketplace.
We're also launching business and enterprise tiers that give you production-ready GPUs with 99.5% reliability. All right. Thanks.
Awesome. I'm curious: can you tell us more about HyperDOS? How exactly does that work? Because a lot of times you have a data center with a cluster of GPUs. How does connecting it to Hyperbolic actually work?
Yeah. So basically, HyperDOS is like a Kubernetes agent. You just install it in your cluster, as long as you have Kubernetes, and most data centers do. Even your MacBook or another PC can install it and become a Kubernetes-ready machine.
And then we have some in-house terminology: we call our Hyperbolic server the Monarch, and there are different Barons, so it's like a monarchy model. Different Barons own different compute. Every time a user wants to rent a GPU, they talk to our Monarch server, the Monarch sends a request to a Baron, and the Baron provisions the machines and sets up an SSH instance for the customer to access.
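To make that flow concrete, here is a rough sketch of the Monarch-to-Baron handoff as described; everything beyond the names Monarch and Baron (classes, fields, matching logic) is my own illustration, not Hyperbolic's actual code.

```python
from dataclasses import dataclass

@dataclass
class SshInstance:
    host: str
    port: int
    user: str

class Baron:
    """A supplier that owns the compute in one cluster."""
    def __init__(self, name: str, free_gpus: int):
        self.name = name
        self.free_gpus = free_gpus

    def provision(self, num_gpus: int) -> SshInstance:
        # Carve out the machines and hand back SSH access.
        self.free_gpus -= num_gpus
        return SshInstance(host=f"{self.name}.example.com", port=22, user="ubuntu")

class Monarch:
    """Central server that matches rental requests to Barons."""
    def __init__(self, barons: list[Baron]):
        self.barons = barons

    def rent(self, num_gpus: int) -> SshInstance:
        # Pick the first Baron with enough idle capacity; a real
        # matcher would also weigh price, rating, and locality.
        for baron in self.barons:
            if baron.free_gpus >= num_gpus:
                return baron.provision(num_gpus)
        raise RuntimeError("no Baron has enough free GPUs")

monarch = Monarch([Baron("dc-east", free_gpus=512), Baron("dc-west", free_gpus=64)])
ssh = monarch.rent(128)
print(f"ssh {ssh.user}@{ssh.host} -p {ssh.port}")
```

Yeah. And then we'll see you next time.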