$125B for Superintelligence? 3 Models Coming, Sutskever's Secret SSI, & Data Centers (in space)...
Chapters
0:00 Intro
1:06 SSI, Safe Superintelligence (Sutskever)
3:45 Grok-3 (Colossus) + Altman Concerned
5:36 CharacterAI + Foundation Models
6:26 125B Supercomputers + 5-10GW
8:28 ‘GPT-6’ Scale
9:07 Zuckerberg on Exponentials and Doubt
9:42 Strawberry/Orion + Connections + Weights
11:39 Data Centers in Space (and the sea)
12:45 Distributed Training + SemiAnalysis Report w/ Gemini 2
17:34 Climate Change Pledges?
00:00:00.000 |
"Superintelligence just got valued at $5 billion." 00:00:06.840 |
"led by none other than the reclusive Ilya Sutskova 00:00:13.200 |
with this detail-free tweet in just the last few hours. 00:00:17.560 |
And today, we also got more news of Gemini 2, Grok 3, 00:00:22.120 |
and not just one, but two new $125 billion data centers. 00:00:37.140 |
Even to the point of making data centers in space, 00:00:49.900 |
If the scaling hypothesis believers are right, 00:00:57.640 |
as the biggest waste of resources in human history. 00:01:16.520 |
He's definitely still alive and working on AI. 00:01:19.960 |
If you haven't heard of Safe Superintelligence, 00:01:22.560 |
don't worry, it's actually only three months old, 00:01:25.720 |
but as mentioned earlier, valued at $5 billion. 00:01:37.400 |
what will those $1 billion in funds be used for? 00:01:40.960 |
Well, it's that key theme you'll see throughout this video 00:01:46.060 |
The funds will be used to acquire computing power. 00:01:52.920 |
so we can pretty much trust it's spot on with its details. 00:02:12.840 |
By the way, they're gonna do that with a team 00:02:22.480 |
like Sequoia Capital and Daniel Gross, who is a co-founder. 00:02:28.560 |
who is clearly the key person in this venture, 00:02:31.320 |
hasn't given any real detail about his approach, 00:02:34.600 |
but he did sprinkle some hints into this article. 00:02:43.520 |
And he said, "Everyone just says scaling hypothesis. 00:02:46.500 |
Everyone neglects to ask, 'What are we scaling?'" 00:02:49.740 |
But he went on, "Some people can work really long hours, 00:02:52.720 |
and they'll just go down the same path faster. 00:02:58.600 |
then it becomes possible for you to do something special." 00:03:01.440 |
Before people get completely carried away, though, 00:03:05.800 |
to some of the claims that Sutskever has made before. 00:03:13.120 |
which just over a year ago set themselves the deadline 00:03:32.540 |
but it doesn't strike me as being a quarter of the way 00:03:44.300 |
Now, naturally, those weren't the only grandiose visions 00:03:52.520 |
claiming to have the most powerful AI training system 00:03:56.320 |
He mentions it soon having around 200,000 H100 equivalents, 00:04:09.640 |
or that it's not really the computing power that matters, 00:04:17.400 |
more credence because of the capabilities of Grok 2. 00:04:20.920 |
Grok 2, the frontier model produced by Musk's xAI team, 00:04:38.780 |
that that claim shouldn't be immediately dismissed. 00:04:41.640 |
And that's from this report yesterday in The Information. 00:04:45.680 |
Now, first, it did caveat that that 100,000 chip cluster, 00:04:59.160 |
and more about power constraints in a moment. 00:05:02.920 |
OpenAI CEO Sam Altman has told some Microsoft executives 00:05:09.640 |
could soon have more access to computing power 00:05:19.940 |
that you might be starting to wonder something. 00:05:24.740 |
Isn't there supposed to be some secret sauce 00:05:37.820 |
that have tried to build their own foundation models, 00:05:50.700 |
but couldn't make their own foundation models work. 00:05:53.140 |
You may also recall efforts by Adept AI and Inflection, 00:06:01.100 |
were snapped up by the likes of Google and Microsoft. 00:06:04.660 |
In short, people are trying things as alternatives 00:06:16.900 |
the Orca series of models and the Phi family of models, 00:06:21.380 |
And that might be why companies are betting everything 00:06:38.660 |
And you might've thought from the title of this video 00:06:41.020 |
that there's a singular $125 billion supercomputer, 00:06:50.140 |
via officials that would know about such investments. 00:06:53.620 |
Namely, the source is the Commissioner of Commerce, 00:06:56.700 |
Josh Teigen, who said that two separate companies 00:07:00.220 |
approached him and the governor of North Dakota 00:07:05.340 |
These would initially consume around 500 megawatts 00:07:11.380 |
with plans to scale up to five or 10 gigawatts of power 00:07:20.380 |
So for context, here is an excellent diagram from Epoch AI. 00:07:24.620 |
Five gigawatts of power allocated to a single training run 00:07:28.100 |
would put the power constraint just above this line here. 00:07:31.460 |
Now, given that it's expected that these other constraints 00:07:36.580 |
that would give us just over 10,000 times more compute 00:07:49.740 |
that will be the equivalent of a GPT-6 training run. 00:07:53.540 |
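To make that power-to-compute arithmetic concrete, here is a rough back-of-envelope sketch in Python. Every constant in it (the GPT-4 compute baseline, chip efficiency, utilization, run length) is an illustrative assumption rather than a reported figure, so treat the output as order-of-magnitude only.

    # Back-of-envelope: training compute supported by a multi-gigawatt campus.
    # Every constant below is an illustrative assumption, not a reported figure.
    GPT4_FLOP      = 2e25             # commonly cited GPT-4 training compute estimate
    POWER_W        = 10e9             # top of the stated range: 10 gigawatts
    FLOP_PER_JOULE = 2e12             # next-gen accelerator efficiency (assumption)
    UTILIZATION    = 0.4              # realized fraction of peak FLOPs (assumption)
    RUN_SECONDS    = 120 * 24 * 3600  # a ~120-day training run

    total_flop = POWER_W * FLOP_PER_JOULE * UTILIZATION * RUN_SECONDS
    print(f"Run compute: {total_flop:.1e} FLOP")           # ~8e28 FLOP
    print(f"vs GPT-4:    {total_flop / GPT4_FLOP:,.0f}x")  # thousands of x, the same
                                                           # ballpark as the 10,000x figure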
Now, yes, I know that there are quote leaks like this one 00:08:02.220 |
but my adage for such leaks is don't trust and verify. 00:08:06.300 |
And also the number of parameters that goes into a model 00:08:10.020 |
or the number of tweakable knobs, if you like, 00:08:19.580 |
Chinchilla scaling laws have long since been left behind 00:08:22.860 |
and we are massively ramping up the amount of data 00:08:28.300 |
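For reference, the Chinchilla paper (Hoffmann et al., 2022) put the compute-optimal ratio at roughly 20 training tokens per parameter, with total compute approximated as C ≈ 6·N·D. This quick sketch shows how far past that ratio recent models have gone; the Llama 3 8B figure of roughly 15 trillion tokens is used as the example.

    # Chinchilla rule of thumb: ~20 tokens per parameter is compute-optimal,
    # with training compute approximated as C = 6 * N * D.
    def training_flop(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    N = 8e9                # an 8B-parameter model
    chinchilla_D = 20 * N  # ~1.6e11 tokens would be "Chinchilla-optimal"
    actual_D = 15e12       # e.g. Llama 3 8B's ~15T training tokens

    print(f"optimal:      {chinchilla_D:.1e} tokens, "
          f"{training_flop(N, chinchilla_D):.1e} FLOP")
    print(f"over-trained: {actual_D:.1e} tokens, "
          f"{training_flop(N, actual_D):.1e} FLOP "
          f"({actual_D / chinchilla_D:.0f}x the optimal data)")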
But before we all get too lost in the numbers, 00:08:32.100 |
I'm saying that with the amount of money that's being spent 00:08:35.220 |
and the amount of power that's being provisioned, 00:08:37.740 |
people are factoring in models up to the scale 00:08:46.900 |
compare the performance of the original ChatGPT 00:08:55.180 |
Claude 5.5 Sonnet would be quite interesting to behold. 00:09:00.180 |
Now, just to emphasize how much this scaling is a bet 00:09:09.860 |
- One of the trickiest things in the world 00:09:11.940 |
to plan around is when you have an exponential curve, 00:09:16.620 |
And I think it's likely enough that it will keep going, 00:09:21.300 |
that it is worth investing the tens or 100 billion plus 00:09:30.740 |
you're going to get some really amazing things 00:09:33.020 |
that are just going to make amazing products. 00:09:35.260 |
But I don't think anyone in the industry can really tell you 00:09:38.980 |
that it will continue scaling at that rate for sure. 00:09:42.020 |
- And you may have noticed that I've barely mentioned 00:09:43.900 |
OpenAI's successor language models and new verifier approaches. 00:09:48.260 |
Those approaches previously labeled Q* or Strawberry 00:09:51.900 |
throw in a bit of an X factor over the coming months. 00:09:55.060 |
According to this article from, again, The Information, 00:10:09.060 |
Interestingly, the only hint they gave of its capabilities 00:10:21.980 |
You've got to create four groups of four words 00:10:28.660 |
If you feed in these puzzles to GPT-4o as text, 00:10:33.300 |
it usually can get one or two sets of four words. 00:10:37.980 |
But then what will happen is it will get stuck. 00:10:40.620 |
And even if you prompt it to try different arrangements 00:10:45.500 |
it'll still predict the same things again and again. 00:10:52.140 |
to get language models out of their local minima 00:11:01.780 |
I'm gonna wait to test it on SimpleBench to find out. 00:11:13.700 |
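As a sketch of how you could probe that failure mode yourself: resample the same Connections-style prompt several times at non-zero temperature and compare the groupings. This assumes the standard OpenAI Python client with an API key in the environment, and the puzzle words below are invented for illustration, not taken from a real puzzle.

    # If the model is truly stuck in a "local minimum", even temperature 1.0
    # keeps producing the same grouping across repeated samples.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    PUZZLE = ("Create four groups of four related words from: bass, flounder, "
              "salmon, trout, bugle, drum, flute, harp, anchor, host, brief, "
              "stand, cardinal, scarlet, crimson, ruby. Explain each group.")

    for i in range(5):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": PUZZLE}],
            temperature=1.0,  # non-zero, so repeated runs *could* differ
        )
        # Print each attempt; if the model is stuck, the groupings repeat
        # even when the surface wording varies between samples.
        print(f"--- attempt {i + 1} ---")
        print(resp.choices[0].message.content)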
Proper evaluations of language models are absolutely crucial 00:11:17.660 |
as is clearly visualizing the differences between them. 00:11:20.980 |
You'd also ideally want your toolkit to be lightweight 00:11:28.460 |
So in addition to their free courses and guides, 00:11:31.580 |
do check out Weave using the link you can see on screen, 00:11:34.860 |
which will also of course be in the description. 00:11:37.340 |
Now though, for what some of you have been waiting for, 00:11:41.300 |
that one startup is attempting to build data centers 00:11:50.020 |
We are resorting to putting the data centers into space. 00:11:53.980 |
This company, Lumen Orbit, is a Y Combinator startup 00:12:12.620 |
before we all go wild about data centers in space. 00:12:18.700 |
Microsoft tried to build data centers underwater. 00:12:22.580 |
The idea was that the sea could help cool the data center 00:12:29.340 |
And even though it was described as largely a success, 00:12:34.620 |
from an operational or practical perspective. 00:12:42.180 |
but it strikes me that the cost of maintaining things 00:12:48.380 |
that's not exactly gonna stop us reaching GPT-6 scale models. 00:12:53.720 |
Well, we do have the option of geographically distributing 00:13:02.260 |
Microsoft found they more or less had to do that. 00:13:06.820 |
on a GPT-6 training cluster project was asked, 00:13:10.740 |
"Why not just co-locate the cluster in one region?" 00:13:19.260 |
that's roughly the size of that Colossus project 00:13:23.020 |
"in a single state without bringing down the power grid." 00:13:29.140 |
it's the power that's the constraining factor. 00:13:32.100 |
And also possibly water, but more on that in a future video. 00:13:37.420 |
then the clusters don't all have to be in the same place, 00:13:45.900 |
to cut a long story short is where we seem to be heading. 00:13:49.820 |
According to a report out just today from SemiAnalysis, 00:13:53.540 |
Google, OpenAI, and Anthropic are already executing plans 00:13:59.980 |
from one site to multiple data center campuses. 00:14:07.980 |
across multiple data centers, so it can be done. 00:14:12.460 |
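For a sense of the core primitive behind that kind of multi-site training, here is a minimal sketch of synchronous data parallelism with a gradient all-reduce, using PyTorch's standard distributed API. Real cross-campus systems use hierarchical or asynchronous variants that, as discussed later, the labs no longer publish; this only shows the basic step.

    # Minimal synchronous data parallelism: every worker averages its
    # gradients with all others before the optimizer step.
    # Launch with: torchrun --nproc_per_node=2 this_script.py
    import torch
    import torch.distributed as dist

    def synced_step(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor):
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        # Average gradients across all workers. Inside one campus this rides
        # the fast interconnect; between campuses it would cross long-haul
        # links, which is why geography and power become the constraints.
        world = dist.get_world_size()
        for p in model.parameters():
            if p.grad is not None:
                dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
                p.grad /= world
        return loss

    if __name__ == "__main__":
        dist.init_process_group(backend="gloo")  # use "nccl" on GPU clusters
        model = torch.nn.Linear(16, 1)
        loss = synced_step(model, torch.randn(8, 16), torch.randn(8, 1))
        print(f"rank {dist.get_rank()}: loss {loss.item():.4f}")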
there was this hidden gem in the third paragraph. 00:14:15.380 |
Again, this article was from today, and it said, 00:14:17.860 |
"Google's existing models lag behind OpenAI and Anthropic 00:14:22.780 |
"in terms of synthetic data, reinforcement learning, 00:14:26.420 |
"But the impending release of Gemini 2 will change this." 00:14:34.060 |
It seems like we will get Gemini 2 and Grok 3 00:14:38.780 |
And as we heard earlier, the Strawberry system 00:14:52.060 |
I want you to forget all the names and the fruit 00:14:59.180 |
to the performance of language models is their scale, 00:15:02.580 |
we should find out that fact by the end of this year. 00:15:13.820 |
If the data centers are getting to the kind of scale 00:15:17.100 |
where we need satellite pictures to assess how big they are 00:15:20.620 |
and that doesn't produce true artificial intelligence, 00:15:23.940 |
then, well, do we have to rely on Ilya Sutskever? 00:15:36.380 |
So if it doesn't, you could expect a reflection of that 00:15:46.140 |
then the fact that models will be increasingly interconnected 00:15:51.540 |
to a kind of interesting philosophical moment. 00:16:00.260 |
Now, of course, it almost goes without saying 00:16:06.100 |
in getting this all set up and running smoothly. 00:16:08.940 |
Billions of man-hours' worth of problems to be solved, for sure. 00:16:13.300 |
And that's why companies, it seems, are clamming up 00:16:16.380 |
about how they're solving these hardware issues. 00:16:19.620 |
The publishing of methods has effectively stopped. 00:16:22.060 |
When OpenAI and others tell the hardware industry 00:16:24.420 |
about these issues, they are very vague and high level 00:16:27.180 |
so as not to reveal any of their distributed systems tricks. 00:16:30.620 |
To be clear, SemiAnalysis says these techniques 00:16:35.780 |
as both can be thought of as compute efficiency. 00:16:38.580 |
Here, then, is the central claim from SemiAnalysis. 00:16:42.500 |
There is a camp that feels AI capabilities have stagnated 00:16:57.900 |
The word "only" there is, of course, an opinion 00:17:02.980 |
Some, of course, believe that no amount of scaling 00:17:08.420 |
I have my thoughts, but honestly, I'm somewhat agnostic. 00:17:11.940 |
I genuinely want to know how these future models 00:17:16.460 |
I go into a ton of detail about what I'm creating 00:17:24.540 |
I released this video on that Epoch AI research. 00:17:33.340 |
It came about halfway through the 20,000 word report, 00:17:39.980 |
I just find it really quite poignant and interesting 00:17:42.740 |
to see what these behemoth companies will do, 00:17:54.860 |
if it turns out that the scaling hypothesis is true 00:18:16.780 |
but there was one quote that I found interesting 00:18:41.020 |
Anyway, Sam Altman is, according to the TSMC CEO, 00:18:46.380 |
And maybe even these $125 billion data centers 00:18:52.300 |
It's indubitable that a mountain has been identified 00:18:57.100 |
and that the AI industry is trying to climb it. 00:19:02.820 |
whether they're even heading in the right direction,