AGI: (gets close), Humans: ‘Who Gets to Own it?’

Chapters
0:00 Intro
1:37 AGI Inches Closer
4:26 ‘Super-Exponential’
5:58 Musk Bid
7:34 Luxury Goods and Land
9:05 ‘Benefits All Humanity’
12:52 ‘National Security’
14:21 S1
20:33 Final thoughts
00:00:00.000 |
The world may be waking up to the fact that intelligence will be automated sooner than 00:00:05.680 |
anyone could have imagined a few years ago, but it is still sleeping when it comes to 00:00:11.280 |
who gets the spoils. Just today, the Vice President of America said that AI will never 00:00:17.120 |
replace workers and only boost productivity. Then again, Sam Altman, CEO of OpenAI, wrote 00:00:23.520 |
just yesterday that he could see labor losing its power to capital. And RAND, the famous 00:00:30.080 |
think tank, put out a paper just the other day that said that the world isn't ready 00:00:34.200 |
for the "job losses and societal unrest" that it thinks might accompany a more general 00:00:40.320 |
artificial intelligence. But even if labor does lose, capital can't decide who gets 00:00:45.840 |
the money. Just today, Musk and co. challenged Sam Altman and Microsoft for control of OpenAI 00:00:53.500 |
itself. And of course, there are always papers like this one from Stanford suggesting that 00:00:58.720 |
the reasoning enhancements needed to bring a model to frontier capability are achievable 00:01:03.960 |
for just $20, which makes me think you guys can afford AGI after all. Meanwhile, Dario 00:01:09.600 |
Amodei, CEO of Anthropic, makers of Claude, says that time is running out to control the 00:01:15.400 |
AGI itself. "I just think that when the day inevitably comes that we must confront 00:01:20.920 |
the full automation of intelligence, I just hope we are a little more unified, let's 00:01:26.600 |
say, than we are now." There is too much to cover as always, so let's just cut it 00:01:30.880 |
down to the 7 most interesting developments, using the Sam Altman essay as the jumping 00:01:36.280 |
off point for each. First off, he gives his 5th or maybe 15th different definition for 00:01:42.440 |
AGI, but this time it's "We mean it to be a system that can tackle increasingly 00:01:47.640 |
complex problems at human level in many fields." Well, under that definition, we are getting 00:01:53.880 |
awfully close. Take coding, where we heard in December that the O3 model was the 175th 00:02:01.400 |
highest-ranked coder by Codeforces Elo. Now, that might not mean much to many people, 00:02:06.680 |
but just yesterday in Japan, Sam Altman said they now have internally the 50th highest 00:02:12.120 |
scoring competitor. We're clearly well beyond imitation learning. These systems, O1, O3, 00:02:17.480 |
O4, they're not copying those top 50 competitors in coding. They are trying things out 00:02:22.440 |
themselves and teaching themselves through reinforcement learning what works. We are 00:02:26.440 |
not capped at the human level and that applies to way more than just coding. I've been 00:02:30.580 |
using deep research from OpenAI on the pro tier this week to at least suggest diagnoses 00:02:37.420 |
for a relative and a doctor I know said that it found things that she wouldn't have thought 00:02:43.960 |
of. Of course, it does hallucinate fairly frequently, but it also thinks of things you 00:02:48.040 |
might not have thought of. And remember, this is O3 searching maybe 20 sources. What about 00:02:53.960 |
O5 searching 500? And you might say, well, knowing stuff is cool, but white-collar workers 00:02:59.360 |
actually take actions on their computers. Well, Karina Nguyen from OpenAI has this to 00:03:05.760 |
say. On tasks, they're saturating all the benchmarks. 00:03:09.260 |
And post-training itself is not hitting the wall. Basically, we went from like raw data 00:03:15.620 |
sets from pre-trained models to infinite amount of tasks that you can teach the model in the 00:03:23.760 |
post-training world via reinforcement learning. So any task, for example, like how to search 00:03:30.700 |
the web, how to use the computer, how to write, well, like all sorts of tasks that you like 00:03:37.900 |
trying to teach the model, all the different skills. And that's why we think like there's 00:03:42.340 |
no data wall or whatever, because there will be infinite amount of tasks. And that's how 00:03:48.340 |
the model becomes extremely super intelligent. And we're actually getting saturated on all benchmarks. 00:03:54.740 |
So I think the bottleneck is actually in evaluations. 00:03:57.860 |
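To make Karina's point about teaching models an endless supply of tasks via reinforcement learning a little more concrete, here is a minimal, hypothetical sketch in Python of the general recipe she is gesturing at: sample an attempt from the model, score it with an automatic verifier, and reinforce whatever scored well. Every name and the toy task below are my own placeholders, not anything from OpenAI.

import random

def verifier(task, attempt):
    # Returns 1.0 if the attempt solves the task, else 0.0.
    # For code this would run unit tests; for a web task, it would check the final page state.
    return 1.0 if attempt == task["answer"] else 0.0

def reinforce_step(policy_params, task, attempt, reward, lr=1e-2):
    # Placeholder for a policy-gradient update; a real system would backpropagate
    # log-probability times advantage through the model's weights.
    return policy_params

def training_loop(policy, policy_params, task_generator, steps=10):
    for _ in range(steps):
        task = task_generator()            # tasks can be generated endlessly
        attempt = policy(task)             # the model tries the task itself
        reward = verifier(task, attempt)   # automatic, verifiable signal
        policy_params = reinforce_step(policy_params, task, attempt, reward)
    return policy_params

# Toy usage: the "tasks" are single-digit additions and the verifier just checks the sum.
def make_addition_task():
    a, b = random.randint(0, 9), random.randint(0, 9)
    return {"prompt": f"{a}+{b}=?", "answer": a + b}

noisy_policy = lambda task: task["answer"] if random.random() < 0.5 else 0
training_loop(noisy_policy, policy_params={}, task_generator=make_addition_task)

The point of the sketch is simply that the reward comes from an automatic check, not from human-labelled data, which is why the "no data wall" argument holds for checkable tasks.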
And there's a reason I can believe that even though their current Operator system, only 00:04:01.900 |
available on Pro for $200 a month, is quite jank. It's because tasks like buying something 00:04:07.680 |
online or filling out a spreadsheet are mostly verifiable. And whenever you hear verifiable 00:04:12.780 |
or checkable, think ready to be absolutely eaten by reinforcement learning. Just like 00:04:18.060 |
domains like code, where you can see the impact of enhanced reinforcement learning from O1 00:04:23.980 |
Preview to O3. Next is the investment that must go in to make all of this happen. And 00:04:28.740 |
Sam Altman had this to say later on in the essay, "The scaling laws that predict intelligence 00:04:34.060 |
improvements have been accurate over many orders of magnitude. Give or take the intelligence 00:04:39.260 |
of an AI model roughly equals the log of the resources used to train and run it." So think 00:04:43.980 |
of that as 10xing the resources you put in to get one incremental step forward in intelligence. 00:04:50.500 |
Doesn't sound super impressive until you read the third point. And I agree with this point. 00:04:54.700 |
The socioeconomic value of linearly increasing intelligence, each increment, is super exponential. 00:05:01.140 |
In short, if someone could somehow double the intelligence of 03, it wouldn't be worth 00:05:06.020 |
4x more to me, and I think to many people, it would be worth way, way more than that. 00:05:10.460 |
It would be super exponential. He goes on, "A consequence of this is that we see no reason 00:05:15.820 |
for the exponentially increasing investment to stop in the near future." In other words, 00:05:20.340 |
if AI will always pay you back tenfold for what you invest in it, why ever stop investing? 00:05:26.820 |
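As a back-of-the-envelope illustration of those two claims, and to be clear the exact functions below are my own made-up stand-ins rather than anything from the essay: if intelligence tracks the log of resources, each step up costs ten times more compute, but if the value of each step grows super-exponentially, the spend can keep paying for itself.

import math

def intelligence(resources):
    # Claim: intelligence roughly equals log10(resources), so +1 step costs 10x the compute.
    return math.log10(resources)

def value(intel):
    # Claim: socioeconomic value grows super-exponentially in intelligence.
    # Any function growing faster than 10**intel makes the argument; this is one arbitrary choice.
    return 10 ** (intel ** 1.5)

for resources in [1e3, 1e4, 1e5, 1e6]:
    i = intelligence(resources)
    print(f"resources 10^{i:.0f}, intelligence {i:.1f}, value-to-cost ratio {value(i) / resources:,.0f}")

Under those made-up curves, the value-to-cost ratio keeps climbing with every tenfold increase in spend, which is exactly the "why ever stop investing?" point.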
Many forget this, but less than two years ago, Sam Altman himself said that his grand 00:05:31.060 |
idea is that OpenAI will capture much of the world's wealth through the creation of AGI, 00:05:36.780 |
and then redistribute it to the people. We're talking figures like not just 100 billion, 00:05:41.700 |
but a trillion, or even 100 trillion. That's coming from him. He only adds that, if AGI does create 00:05:47.140 |
all that wealth, he's not sure how the company will redistribute it. To give you a sense 00:05:51.260 |
of scale, as you head towards 100 trillion, you're talking about the scale of the entire 00:05:56.460 |
labor force of the planet. And that, of course, brings us to others who don't want him to 00:06:01.980 |
have that control, or maybe want that control for themselves. As you may have heard about, 00:06:07.500 |
Elon Musk has bid almost 100 billion for OpenAI, or at least it's a bid for the non-profit 00:06:14.200 |
which currently controls OpenAI. To save you reading half a dozen reports, essentially 00:06:19.500 |
it looks like Sam Altman and OpenAI have valued that non-profit's stake in OpenAI at around 00:06:26.380 |
$40 billion. That leaves plenty of equity left for Microsoft and OpenAI itself, including 00:06:32.380 |
its employees. However, if Musk and others have valued that stake at $100 billion, then 00:06:39.380 |
it might be very difficult in court for Altman and co. to say it's worth only $40 billion. 00:06:45.820 |
So even if they reject, as it seems like they have done, Musk's offer, it forces them 00:06:51.180 |
to potentially dilute the stake owned by Microsoft and the employees. Altman said to the employees 00:06:57.380 |
at OpenAI that these are just tactics to try and weaken us because we're making great 00:07:02.300 |
progress. The non-profit behind OpenAI could also reject the offer because it thinks that 00:07:07.780 |
AGI wouldn't be safe in the hands of Musk. At this point, I just can't resist doing 00:07:12.480 |
a quick plug for a mini documentary I released on my Patreon just yesterday. It actually 00:07:18.100 |
covers the origin stories of DeepMind, OpenAI, the tussle with Musk and Anthropic and how 00:07:24.300 |
the founding vision of each of those AGI labs went awry. This time, by the way, I used a 00:07:29.780 |
professional video editor and the early reviews seemed to be good. All the shenanigans that 00:07:34.820 |
are going on with the non-profit at OpenAI seem worthy of an entire video on their own. 00:07:41.260 |
So for now, I'm going to move on to the next point. 00:07:44.060 |
Sam Altman predicted that with the advent of AGI, the price of many goods will eventually 00:07:49.300 |
fall dramatically. It seems like one way to assuage people who lose their job or see their 00:07:54.700 |
wages drop is that, well, at least your TV is cheaper. But he did say the price of luxury 00:08:00.460 |
goods and land may rise even more dramatically. Now, I don't know what you think, but I live 00:08:07.380 |
in London and the price of land is already pretty dramatic. So who knows what it will 00:08:13.100 |
be after AGI. But just on that luxury goods point, I think Sam Altman might have one particular 00:08:19.340 |
luxury good in mind. Yesterday in London, Sam Altman was asked about their hardware 00:08:24.860 |
device designed in part by Jony Ive from Apple. And he said, it's incredible. It really 00:08:31.360 |
is. I'm proud of it. And it's just a year away. Yes, by the way, I did apply to be at 00:08:36.380 |
that event, but you had to have certain org IDs, which I didn't. One thing that might 00:08:40.960 |
not be a luxury device are smaller language models. In leaked audio of that same event, 00:08:47.760 |
he apparently said, well, one idea would be we put out O3 and then open source O3 mini. 00:08:53.420 |
We put out O4 and open source O4 mini. He added, this is not a decision, but directionally 00:08:59.900 |
you could imagine us saying this. Take all of that for what it is worth. 00:09:04.620 |
The next jumping off point comes in the first sentence actually of this essay, which is 00:09:09.740 |
that the mission of OpenAI is to ensure that AGI benefits all of humanity. Not that they 00:09:15.640 |
make AGI, but that they make an AGI that benefits all of humanity. Now, originally when they 00:09:20.680 |
were founded, which I covered in the documentary, the charter was that they make AGI that benefits 00:09:26.200 |
all of humanity, unencumbered by the need for a financial return. That last bit's gone, 00:09:31.540 |
but we still have that it benefits all of humanity. Not most of humanity, by the way, 00:09:35.940 |
benefits all of humanity. I really don't know how they are going to achieve that when they 00:09:41.660 |
themselves admit that the vast majority of human labor might soon become redundant. Even 00:09:46.980 |
if they somehow got a benevolent policy implemented in the US to make sure that everyone 00:09:53.140 |
was looked after, how could you ensure that for other nations? 00:09:56.420 |
After watching Yoshua Bengio, one of the godfathers of AI, and I'll show you the clip in a second, 00:10:01.220 |
I did have this thought. It seems to me if a nation got to AGI or super intelligence 00:10:06.380 |
one month, three months, six months before another one, it's not the most likely that 00:10:11.500 |
they would use that advantage to just wipe out other nations. I think more likely would 00:10:16.860 |
be to wipe out the economies of other nations. The US might automate the economy of say China 00:10:23.460 |
or China, the US, and then take that wealth and distribute it amongst its people. And 00:10:28.580 |
Yoshua Bengio thinks that that might even apply at the level of companies. 00:10:32.500 |
I can see from the declarations that are made and, you know, what, you know, logically these 00:10:37.540 |
people would do is that the people who control these systems, like, say, OpenAI potentially, 00:10:44.780 |
they're not going to continue just selling the access to their AI. They're going to give 00:10:50.980 |
access to, you know, a lower grade AI. They're going to keep the really powerful ones for 00:10:55.940 |
themselves and they're going to build companies that are going to compete with the non-AI, 00:11:01.100 |
you know, systems that exist. And they're going to basically wipe out the economies 00:11:05.360 |
of all the other countries which don't have these superintelligent systems. So it's, you 00:11:13.160 |
know, you say it's, you wrote it's not existential, but I think it is existential for countries 00:11:18.660 |
who don't build up to this kind of level of AI. And it's an emergency because it's going 00:11:26.340 |
to take at least several years, even with the coalition of the willing to bridge that. 00:11:32.900 |
And just very quickly, because he mentioned competitor companies, I can't help but mention 00:11:38.000 |
Gemini 2 Pro and Flash from Google, new models from Google DeepMind. There's also of course 00:11:43.720 |
Gemini Thinking, which replicates the kind of reasoning traces of say O3 Mini or DeepSeek 00:11:48.720 |
R1. Now straight off the benchmark results of these models are decent, but not stratospheric. 00:11:55.000 |
For the most part, we're not talking O3 or DeepSeek R1 levels. On SimpleBench we're rate 00:11:59.480 |
limited, but it seems like the scores of both the Thinking Mode and Gemini 2 Pro will gravitate 00:12:05.520 |
around the same level as the "Gemini Experimental 1206". But I will say this, I know it's kind 00:12:11.280 |
of niche. Gemini is amazing at quickly reading vast amounts of PDFs and other files. No, 00:12:18.760 |
its transcription accuracy of audio, which I've tested, isn't going to be at the level of, say, 00:12:23.200 |
AssemblyAI; and no, its coding is no O3 and its "deep research" button is no Deep Research, 00:12:30.080 |
but the Gemini series are great at extracting text from files and they are incredibly cheap. 00:12:36.040 |
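If you want to try that PDF use case yourself, here is a minimal sketch using Google's google-generativeai Python SDK. The model name, file and prompt are my own choices, and the SDK surface may have changed since, so treat it as a starting point rather than a recipe.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")              # supply your own key

pdf = genai.upload_file("some_long_report.pdf")      # upload the file via the File API
model = genai.GenerativeModel("gemini-2.0-flash")    # assumed model name; cheap and fast

response = model.generate_content(
    [pdf, "Extract the full text of this document as plain text, preserving headings."]
)
print(response.text)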
So I'm quite impressed. And I do suspect, as ChatGPT just recently overtook Twitter to 00:12:41.200 |
become the sixth most visited site and slowly starts closing in on Google, that Google will 00:12:46.600 |
invest more and more and more to ensure that Gemini 3 is state of the art. Next, Altman 00:12:52.400 |
wrote about a likely path that he sees: AI being used by authoritarian governments 00:12:57.400 |
to control their population through mass surveillance and loss of autonomy. And that remark brings 00:13:03.060 |
me to the RAND paper that for some reason I read in full, because they're worried by 00:13:08.400 |
not just mass surveillance by authoritarian dictatorships, but other threats to quote 00:13:13.640 |
national security. Wonder weapons, systemic shifts in power, kind of talked about that 00:13:18.560 |
earlier with, say, China automating the economy of the US, non-experts empowered to develop 00:13:23.520 |
weapons of mass destruction, artificial entities with agency, think O6 kind of coming alive 00:13:29.160 |
and instability. This is RAND again, which has been around for over 75 years and is not 00:13:35.160 |
known for dramatic statements. Again, I would ask though that if the US does a quote large 00:13:40.280 |
national effort to ensure that they obtain a decisive AI enabled wonder weapon before 00:13:46.240 |
China, say three months before, six months before, then what? Are you really going to 00:13:50.080 |
use it to then disable the tech sector of China? For me, the real admission comes towards 00:13:55.800 |
the end of this paper where they say the US is not well positioned to realise the ambitious 00:14:02.160 |
economic benefits of AGI without widespread unemployment and accompanying societal unrest. 00:14:08.580 |
And I still remember the days when Altman used to say in interviews, it's just around 00:14:12.280 |
two years ago, he said stuff like, if AGI produces the kind of inequality that he thinks 00:14:17.960 |
it will, people won't take it anymore. Let's now though, get to some signs that 00:14:22.440 |
AGI might not even be controlled by countries or even companies. For less than $50 worth 00:14:29.260 |
of compute time, of course, not counting research time, but for around apparently $20 worth 00:14:34.400 |
of compute time, affordable for all of you guys, Stanford produced S1. Now, yes, of course, 00:14:40.360 |
they did utilise an open-weight base model, Qwen 2.5 32B Instruct, 00:14:46.100 |
but the headline is with just a thousand questions worth of data, they could bring that tiny 00:14:52.360 |
model to being competitive with O1. This is in science, GPQA and competition level mathematics. 00:14:59.840 |
The key methodology was, well, whenever the model wanted to stop, they forced it to continue 00:15:05.520 |
by adding "wait", literally the token "Wait", multiple times to the model's generation 00:15:11.520 |
when it tried to end. Imagine you're sitting in an exam and every time 00:15:14.560 |
you think you've come to an answer and you're ready to write it down, a voice in your head 00:15:18.880 |
says, "wait". That's kind of what happened, until the student, or you, had taken a set amount 00:15:25.240 |
of time on the problem. Appropriately then, this is called test time scaling, scaling 00:15:30.640 |
up the amount of tokens spent to answer each question. I've reviewed the questions, by 00:15:35.520 |
the way, in the MATH-500 benchmark, and they are tough. So to get 95%, at least on the hard 00:15:41.000 |
ones, the level five ones, is impressive. Likewise, of course, 00:15:45.840 |
to get beyond 60% in GPQA diamond, which roughly matches the level of PhDs in those domains. 00:15:52.920 |
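Here, roughly, is what that "wait" trick, which the S1 paper calls budget forcing, looks like when sketched in Python. The model interface below is a placeholder of my own; the real implementation works at the token level, suppressing the end-of-thinking delimiter and appending "Wait" until a thinking budget has been spent.

def budget_forced_answer(generate, question, min_thinking_tokens=200, max_waits=6):
    # `generate(prompt)` stands in for the model: it returns a chunk of reasoning
    # plus the answer it would give if it were allowed to stop now.
    thinking, answer = "", None
    for _ in range(max_waits + 1):
        chunk, answer = generate(question + thinking)
        thinking += chunk
        if len(thinking.split()) >= min_thinking_tokens:
            break                  # budget reached: accept the answer
        thinking += " Wait,"       # otherwise force the model to reconsider
    return thinking, answer

# Toy usage with a fake "model" that produces a little more reasoning on every call.
def fake_model(prompt):
    return ("let me check that again, " * 10, "42")

thinking, answer = budget_forced_answer(fake_model, "Hard competition maths question?")
print(len(thinking.split()), "thinking 'tokens' ->", answer)

Plot accuracy against that thinking budget and you get the test-time scaling curve they were after.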
To recap, this is an off the shelf open weights model trained with just a thousand questions 00:15:58.000 |
and reasoning traces. There were some famed professors in this Stanford team and their 00:16:02.760 |
goal, by the way, was to replicate this chart on the right, which came in September from 00:16:07.920 |
OpenAI. Now we kind of already know that the more pre-training you do and post-training 00:16:12.640 |
with reinforcement learning you do, the better the performance will be. But what about time 00:16:17.460 |
taken to actually answer questions, test time compute? That's the chart they wanted to 00:16:22.040 |
replicate. Going back to the S1 paper, they say, "despite the large number of O1 replication 00:16:26.760 |
attempts, none have openly replicated a clear test-time scaling behavior", and look how they 00:16:32.520 |
have done so. I'm going to simplify their approach a little bit because it's the finding 00:16:36.560 |
that I'm more interested in, but essentially they sourced 59,000 tough questions. Physics 00:16:42.640 |
Olympiads, astronomy, competition-level mathematics, and AGIEval. I remember covering that paper 00:16:47.800 |
like almost two years ago on this channel. They got Gemini Thinking, the one that outputs 00:16:52.160 |
thinking tokens like DeepSeek R1 does, to generate reasoning traces and answers for each of those 00:16:57.840 |
59,000 examples. Now they could have just trained on all of those examples, but that 00:17:04.120 |
did not offer substantial gains over just picking a thousand of them. Just a thousand 00:17:08.860 |
examples in say your domain to get a small model to be a true reasoner. Then of course 00:17:14.200 |
get it to think for a while with that "wait" trick. How did they filter down from 59,000 examples 00:17:19.360 |
to 1,000, by the way? First, decontaminate: you don't want any questions that you're 00:17:23.520 |
going to use to then test the model of course. Remove examples that rely on images that aren't 00:17:28.600 |
found in the question, for example, and other formatting stuff. But more interestingly, 00:17:34.040 |
difficulty and diversity. This is the kind of diversity that even JD Vance would get 00:17:38.320 |
behind. On difficulty, they got smaller models to try those questions. And if those smaller 00:17:42.880 |
models got the questions right, they excluded them. They must be too easy. On diversity, 00:17:48.000 |
they wanted to cover as many topics as possible from mathematics and science, for example. 00:17:53.960 |
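Condensed into code, the filtering pipeline as I've just described it looks roughly like the sketch below; the field names and helper functions are placeholders of mine, not the S1 authors' actual implementation.

import random
from collections import defaultdict

def filter_s1_style(questions, eval_sets, small_models, target_size=1000):
    # 1) Decontaminate: drop anything overlapping the benchmarks you will test on.
    pool = [q for q in questions if not any(q["text"] in s for s in eval_sets)]
    # 2) Drop malformed items, e.g. questions that refer to images which aren't included.
    pool = [q for q in pool if not q.get("needs_missing_image", False)]
    # 3) Difficulty: if a smaller model already gets it right, it is too easy, so exclude it.
    pool = [q for q in pool
            if not any(model(q["text"]) == q["answer"] for model in small_models)]
    # 4) Diversity: spread the final picks across domains (roughly 20 each from about 50 domains).
    by_domain = defaultdict(list)
    for q in pool:
        by_domain[q["domain"]].append(q)
    per_domain = max(1, target_size // max(1, len(by_domain)))
    selected = []
    for qs in by_domain.values():
        selected.extend(random.sample(qs, min(per_domain, len(qs))))
    return selected[:target_size]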
They ended up with around 20 questions from 50 different domains. They then fine-tuned 00:17:59.120 |
that base model on those thousand examples with the reasoning traces from Gemini. And 00:18:03.360 |
if you're wondering about DeepSeek R1, they fine-tuned with 800,000 examples. Actually, 00:18:09.760 |
you can see that in this chart on the right here. Again, it wasn't just about fine-tuning. 00:18:14.360 |
Each time the model would try to stop, they said "wait", sometimes two, four, or six times, 00:18:20.440 |
to keep boosting performance. Basically, it forces the model to check its own output and 00:18:25.200 |
see if it can improve it. Notice "wait" is fairly neutral. You're not telling the model 00:18:25.200 |
that it's wrong. You're saying, wait, maybe do we need to check that? They also tried 00:18:33.160 |
scaling up majority voting or self-consistency, and it didn't quite have the same slope. Suffice 00:18:38.980 |
to say though, if anyone watching is in any confusion, getting these kinds of scores in 00:18:43.440 |
GPQA, Google-Proof Q&A, and competition-level mathematics, is insane. 00:18:49.400 |
Incredibly impressive. Of course, if you took this same model and tested it in a different 00:18:53.400 |
domain, it would likely perform relatively poorly. Also, side note, when they say open 00:18:57.960 |
data, they mean those thousand examples that they fine-tuned the base model on. The actual 00:19:02.800 |
base model doesn't have open data. So it's not truly open data. As in, we don't know 00:19:07.440 |
everything that went into the base model. Everything that Qwen 2.5, 32 billion parameters, 00:19:12.920 |
was trained on. Interestingly, they would have gone further, but the actual context 00:19:17.000 |
window of the underlying language model constrains it. And Karpathy, in his excellent ChatGPT 00:19:22.560 |
video this week, talked about how it's an open research question about how to extend 00:19:27.800 |
the context window suitably at the frontier. It's a three and a half hour video, but it's 00:19:31.920 |
a definite recommend from me. Actually, speaking of Karpathy, his reaction to this very paper 00:19:36.880 |
was "cute idea, reminds me of let's think step-by-step trick. That's where you told 00:19:41.960 |
the model to think step-by-step so it spent more tokens to reason first before giving 00:19:46.160 |
you an answer. Here, by saying wait, we're forcing the model to think for longer. Both 00:19:50.320 |
lean, he said, on the language prior to steer the thoughts." And speaking of spending 00:19:54.240 |
your time well by watching a Karpathy video, I would argue you can spend your money pretty 00:19:59.680 |
well by researching which, say, charity to give to through GiveWell. They are the sponsors 00:20:05.480 |
of this video, but I've actually been using them for, I think, 13 years. They have incredibly 00:20:10.780 |
rigorous methodology, backed by 60,000+ hours of research each year on which charities save 00:20:17.800 |
the most lives, essentially. The one that I've gone for, for actually all of those 00:20:22.000 |
13 years, is the Against Malaria Foundation, I think started in the UK. Anyway, do check 00:20:26.520 |
out GiveWell, the links are in the description, and you can even put in where you first heard 00:20:31.080 |
of them. So obviously, you could put, say, AI Explained. But alas, we are drawing to 00:20:35.040 |
the end, so I've got one more point from the Sam Altman essay that I wanted to get 00:20:38.600 |
to. In previous essays, he's talked about the value of labour going to zero. Now he 00:20:42.720 |
just talks about the balance of power between capital and labour getting messed up. But 00:20:47.120 |
interestingly, he adds, this may require early intervention. Now, OpenAI have funded studies 00:20:52.320 |
into UBI with, let's say, mixed results, so it's interesting he doesn't specifically 00:20:56.860 |
advocate for universal basic income. He just talks about early intervention, then talks 00:21:00.920 |
about compute budgets and being open to strange-sounding ideas. But I would say, if AGI is coming in 00:21:06.400 |
two to five years, then the quote "early intervention" would have to happen, say, 00:21:10.480 |
now? I must confess, though, at this stage, that I feel like we desperately need preparation 00:21:15.880 |
for what's coming, but it's quite hard to actually specifically say what I'm advocating 00:21:20.360 |
the preparation be. Then we get renewed calls just today from the CEO of Anthropic, Dario 00:21:25.160 |
Amodei, about how AI will become a country of geniuses in a data centre, possibly by 00:21:30.400 |
2026 or 2027, and almost certainly no later than 2030. He said that governments are not 00:21:36.800 |
doing enough to hold the big AI labs to account and measure risks and, at the next international 00:21:43.600 |
summit – there was one just this week – we should not repeat this missed opportunity. 00:21:48.080 |
These issues should be at the top of the agenda. The advance of AI presents major new global 00:21:53.200 |
challenges. We must move faster and with greater clarity to confront them. I mean, I'm sold 00:21:58.760 |
and I think many of you are, that change is coming very rapidly and sooner than the vast 00:22:04.360 |
majority of people on the planet think. The question for me that I'll have to reflect 00:22:08.120 |
on is, well, what are we going to do about it? Let me know what you think in the comments 00:22:12.560 |
but above all, thank you so much for watching to the end and have a wonderful day.