AGI: (gets close), Humans: ‘Who Gets to Own it?’

Chapters
0:00 Intro
1:37 AGI Inches Closer
4:26 ‘Super-Exponential’
5:58 Musk Bid
7:34 Luxury Goods and Land
9:05 ‘Benefits All Humanity’
12:52 ‘National Security’
14:21 S1
20:33 Final thoughts
00:00:00.000 |
The world may be waking up to the fact that intelligence will be automated sooner than 00:00:05.680 |
anyone could have imagined a few years ago, but it is still sleeping when it comes to 00:00:11.280 |
who gets the spoils. Just today, the Vice President of America said that AI will never 00:00:17.120 |
replace workers and only boost productivity. Then again, Sam Altman, CEO of OpenAI, wrote 00:00:23.520 |
just yesterday that he could see labor losing its power to capital. And RAND, the famous 00:00:30.080 |
think tank, put out a paper just the other day that said that the world isn't ready 00:00:34.200 |
for the "job losses and societal unrest" that it thinks might accompany a more general 00:00:40.320 |
artificial intelligence. But even if labor does lose, capital can't decide who gets 00:00:45.840 |
the money. Just today, Musk and co. challenged Sam Altman and Microsoft for control of OpenAI 00:00:53.500 |
itself. And of course, there are always papers like this one from Stanford suggesting that 00:00:58.720 |
the reasoning enhancements needed to bring a model to frontier capability are achievable 00:01:03.960 |
for just $20, which makes me think you guys can afford AGI after all. Meanwhile, Dario 00:01:09.600 |
Amodei, CEO of Anthropic, makers of Claude, says that time is running out to control the 00:01:15.400 |
AGI itself. "I just think that when the day inevitably comes that we must confront 00:01:20.920 |
the full automation of intelligence, I just hope we are a little more unified, let's 00:01:26.600 |
say, than we are now." There is too much to cover as always, so let's just cut it 00:01:30.880 |
down to the 7 most interesting developments, using the Sam Altman essay as the jumping 00:01:36.280 |
off point for each. First off, he gives his 5th or maybe 15th different definition for 00:01:42.440 |
AGI, but this time it's "We mean it to be a system that can tackle increasingly 00:01:47.640 |
complex problems at human level in many fields." Well, under that definition, we are getting 00:01:53.880 |
awfully close. Take coding, where we heard in December that the O3 model was the 175th 00:02:01.400 |
highest-ranked coder by Codeforces Elo. Now, that might not mean much to many people, 00:02:06.680 |
but just yesterday in Japan, Sam Altman said they now have internally the 50th highest 00:02:12.120 |
scoring competitor. We're clearly well beyond imitation learning. These systems, O1, O3, 00:02:17.480 |
O4, they're not copying those top 50 competitors in coding. They are trying things out 00:02:22.440 |
themselves and teaching themselves through reinforcement learning what works. We are 00:02:26.440 |
not capped at the human level and that applies to way more than just coding. I've been 00:02:30.580 |
using deep research from OpenAI on the pro tier this week to at least suggest diagnoses 00:02:37.420 |
for a relative and a doctor I know said that it found things that she wouldn't have thought 00:02:43.960 |
of. Of course, it does hallucinate fairly frequently, but it also thinks of things you 00:02:48.040 |
might not have thought of. And remember, this is O3 searching maybe 20 sources. What about 00:02:53.960 |
O5 searching 500? And you might say, well, knowing stuff is cool, but white-collar workers 00:02:59.360 |
actually take actions on their computers. Well, Karina Nguyen from OpenAI has this to 00:03:05.760 |
say. On tasks, they're saturating all the benchmarks. 00:03:09.260 |
And post-training itself is not hitting the wall. Basically, we went from like raw data 00:03:15.620 |
sets from pre-trained models to infinite amount of tasks that you can teach the model in the 00:03:23.760 |
post-training world via reinforcement learning. So any task, for example, like how to search 00:03:30.700 |
the web, how to use the computer, how to write, well, like all sorts of tasks that you like 00:03:37.900 |
trying to teach the model, all the different skills. And that's why we think like there's 00:03:42.340 |
no data wall or whatever, because there will be infinite amount of tasks. And that's how 00:03:48.340 |
the model becomes extremely super intelligent. And we're actually getting saturated on all benchmarks. 00:03:54.740 |
So I think the bottleneck is actually in evaluations. 00:03:57.860 |
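To make Karina's point about teaching models an endless supply of tasks via reinforcement learning a little more concrete, here is a minimal, hypothetical sketch in Python of the general recipe she is gesturing at: sample an attempt from the model, score it with an automatic verifier, and reinforce whatever scored well. Every name and the toy task below are my own placeholders, not anything from OpenAI.

import random

def verifier(task, attempt):
    # Returns 1.0 if the attempt solves the task, else 0.0.
    # For code this would run unit tests; for a web task, it would check the final page state.
    return 1.0 if attempt == task["answer"] else 0.0

def reinforce_step(policy_params, task, attempt, reward, lr=1e-2):
    # Placeholder for a policy-gradient update; a real system would backpropagate
    # log-probability times advantage through the model's weights.
    return policy_params

def training_loop(policy, policy_params, task_generator, steps=10):
    for _ in range(steps):
        task = task_generator()            # tasks can be generated endlessly
        attempt = policy(task)             # the model tries the task itself
        reward = verifier(task, attempt)   # automatic, verifiable signal
        policy_params = reinforce_step(policy_params, task, attempt, reward)
    return policy_params

# Toy usage: the "tasks" are single-digit additions and the verifier just checks the sum.
def make_addition_task():
    a, b = random.randint(0, 9), random.randint(0, 9)
    return {"prompt": f"{a}+{b}=?", "answer": a + b}

noisy_policy = lambda task: task["answer"] if random.random() < 0.5 else 0
training_loop(noisy_policy, policy_params={}, task_generator=make_addition_task)

The point of the sketch is simply that the reward comes from an automatic check, not from human-labelled data, which is why the "no data wall" argument holds for checkable tasks.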
And there's a reason I can believe that even though their current Operator system, only 00:04:01.900 |
available on Pro for $200 a month, is quite jank. It's because tasks like buying something 00:04:07.680 |
online or filling out a spreadsheet are mostly verifiable. And whenever you hear verifiable 00:04:12.780 |
or checkable, think ready to be absolutely eaten by reinforcement learning. Just like 00:04:18.060 |
domains like code, where you can see the impact of enhanced reinforcement learning from O1 00:04:23.980 |
Preview to O3. Next is the investment that must go in to make all of this happen. And 00:04:28.740 |
Sam Altman had this to say later on in the essay, "The scaling laws that predict intelligence 00:04:34.060 |
improvements have been accurate over many orders of magnitude. Give or take the intelligence 00:04:39.260 |
of an AI model roughly equals the log of the resources used to train and run it." So think 00:04:43.980 |
of that as 10xing the resources you put in to get one incremental step forward in intelligence. 00:04:50.500 |
Doesn't sound super impressive until you read the third point. And I agree with this point. 00:04:54.700 |
The socioeconomic value of linearly increasing intelligence, each increment, is super exponential. 00:05:01.140 |
In short, if someone could somehow double the intelligence of 03, it wouldn't be worth 00:05:06.020 |
4x more to me, and I think to many people, it would be worth way, way more than that. 00:05:10.460 |
It would be super exponential. He goes on, "A consequence of this is that we see no reason 00:05:15.820 |
for the exponentially increasing investment to stop in the near future." In other words, 00:05:20.340 |
if AI will always pay you back tenfold for what you invest in it, why ever stop investing? 00:05:26.820 |
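As a back-of-the-envelope illustration of those two claims, and to be clear the exact functions below are my own made-up stand-ins rather than anything from the essay: if intelligence tracks the log of resources, each step up costs ten times more compute, but if the value of each step grows super-exponentially, the spend can keep paying for itself.

import math

def intelligence(resources):
    # Claim: intelligence roughly equals log10(resources), so +1 step costs 10x the compute.
    return math.log10(resources)

def value(intel):
    # Claim: socioeconomic value grows super-exponentially in intelligence.
    # Any function growing faster than 10**intel makes the argument; this is one arbitrary choice.
    return 10 ** (intel ** 1.5)

for resources in [1e3, 1e4, 1e5, 1e6]:
    i = intelligence(resources)
    print(f"resources 10^{i:.0f}, intelligence {i:.1f}, value-to-cost ratio {value(i) / resources:,.0f}")

Under those made-up curves, the value-to-cost ratio keeps climbing with every tenfold increase in spend, which is exactly the "why ever stop investing?" point.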
Many forget this, but less than two years ago, Sam Altman himself said that his grand 00:05:31.060 |
idea is that OpenAI will capture much of the world's wealth through the creation of AGI, 00:05:36.780 |
and then redistribute it to the people. We're talking figures like not just 100 billion, 00:05:41.700 |
but a trillion, or even 100 trillion. That's coming from him. He only adds that, if AGI does create 00:05:47.140 |
all that wealth, he's not sure how the company will redistribute it. To give you a sense 00:05:51.260 |
of scale, as you head towards 100 trillion, you're talking about the scale of the entire 00:05:56.460 |
labor force of the planet. And that, of course, brings us to others who don't want him to 00:06:01.980 |
have that control, or maybe want that control for themselves. As you may have heard about, 00:06:07.500 |
Elon Musk has bid almost 100 billion for OpenAI, or at least it's a bid for the non-profit 00:06:14.200 |
which currently controls OpenAI. To save you reading half a dozen reports, essentially 00:06:19.500 |
it looks like Sam Altman and OpenAI have valued that non-profit's stake in OpenAI at around 00:06:26.380 |
$40 billion. That leaves plenty of equity left for Microsoft and OpenAI itself, including 00:06:32.380 |
its employees. However, if Musk and others have valued that stake at $100 billion, then 00:06:39.380 |
it might be very difficult in court for Altman and co. to say it's worth only $40 billion. 00:06:45.820 |
So even if they reject, as it seems like they have done, Musk's offer, it forces them 00:06:51.180 |
to potentially dilute the stake owned by Microsoft and the employees. Altman said to the employees 00:06:57.380 |
at OpenAI that these are just tactics to try and weaken us because we're making great 00:07:02.300 |
progress. The non-profit behind OpenAI could also reject the offer because it thinks that 00:07:07.780 |
AGI wouldn't be safe in the hands of Musk. At this point, I just can't resist doing 00:07:12.480 |
a quick plug for a mini documentary I released on my Patreon just yesterday. It actually 00:07:18.100 |
covers the origin stories of DeepMind, OpenAI, the tussle with Musk and Anthropic and how 00:07:24.300 |
the founding vision of each of those AGI labs went awry. This time, by the way, I used a 00:07:29.780 |
professional video editor and the early reviews seemed to be good. All the shenanigans that 00:07:34.820 |
are going on with the non-profit at OpenAI seem worthy of an entire video on their own. 00:07:41.260 |
So for now, I'm going to move on to the next point. 00:07:44.060 |
Sam Altman predicted that with the advent of AGI, the price of many goods will eventually 00:07:49.300 |
fall dramatically. It seems like one way to assuage people who lose their job or see their 00:07:54.700 |
wages drop is that, well, at least your TV is cheaper. But he did say the price of luxury 00:08:00.460 |
goods and land may rise even more dramatically. Now, I don't know what you think, but I live 00:08:07.380 |
in London and the price of land is already pretty dramatic. So who knows what it will 00:08:13.100 |
be after AGI. But just on that luxury goods point, I think Sam Altman might have one particular 00:08:19.340 |
luxury good in mind. Yesterday in London, Sam Altman was asked about their hardware 00:08:24.860 |
device designed in part by Jony Ive from Apple. And he said, it's incredible. It really 00:08:31.360 |
is. I'm proud of it. And it's just a year away. Yes, by the way, I did apply to be at 00:08:36.380 |
that event, but you had to have certain org IDs, which I didn't. One thing that might 00:08:40.960 |
not be a luxury device are smaller language models. In leaked audio of that same event, 00:08:47.760 |
he apparently said, well, one idea would be we put out O3 and then open source O3 mini. 00:08:53.420 |
We put out O4 and open source O4 mini. He added, this is not a decision, but directionally 00:08:59.900 |
you could imagine us saying this. Take all of that for what it is worth. 00:09:04.620 |
The next jumping off point comes in the first sentence actually of this essay, which is 00:09:09.740 |
that the mission of OpenAI is to ensure that AGI benefits all of humanity. Not that they 00:09:15.640 |
make AGI, but that they make an AGI that benefits all of humanity. Now, originally when they 00:09:20.680 |
were founded, which I covered in the documentary, the charter was that they make AGI that benefits 00:09:26.200 |
all of humanity, unencumbered by the need for a financial return. That last bit's gone, 00:09:31.540 |
but we still have that it benefits all of humanity. Not most of humanity, by the way, 00:09:35.940 |
benefits all of humanity. I really don't know how they are going to achieve that when they 00:09:41.660 |
themselves admit that the vast majority of human labor might soon become redundant. Even 00:09:46.980 |
if they somehow got a benevolent policy implemented in the US to make sure that everyone 00:09:53.140 |
was looked after, how could you ensure that for other nations? 00:09:56.420 |
After watching Yoshua Bengio, one of the godfathers of AI, and I'll show you the clip in a second, 00:10:01.220 |
I did have this thought. It seems to me if a nation got to AGI or super intelligence 00:10:06.380 |
one month, three months, six months before another one, it's not the most likely that 00:10:11.500 |
they would use that advantage to just wipe out other nations. I think more likely would 00:10:16.860 |
be to wipe out the economies of other nations. The US might automate the economy of say China 00:10:23.460 |
or China, the US, and then take that wealth and distribute it amongst its people. And 00:10:28.580 |
Yoshua Bengio thinks that that might even apply at the level of companies. 00:10:32.500 |
I can see from the declarations that are made and, you know, what, you know, logically these 00:10:37.540 |
people would do is that the people who control these systems, like, say, OpenAI potentially, 00:10:44.780 |
they're not going to continue just selling the access to their AI. They're going to give 00:10:50.980 |
access to, you know, a lower grade AI. They're going to keep the really powerful ones for 00:10:55.940 |
themselves and they're going to build companies that are going to compete with the non-AI, 00:11:01.100 |
you know, systems that exist. And they're going to basically wipe out the economies 00:11:05.360 |
of all the other countries which don't have these superintelligent systems. So it's, you 00:11:13.160 |
know, you say it's, you wrote it's not existential, but I think it is existential for countries 00:11:18.660 |
who don't build up to this kind of level of AI. And it's an emergency because it's going 00:11:26.340 |
to take at least several years, even with the coalition of the willing to bridge that. 00:11:32.900 |
And just very quickly, because he mentioned competitor companies, I can't help but mention 00:11:38.000 |
Gemini 2 Pro and Flash from Google, new models from Google DeepMind. There's also of course 00:11:43.720 |
Gemini Thinking, which replicates the kind of reasoning traces of say O3 Mini or DeepSeek 00:11:48.720 |
R1. Now straight off the benchmark results of these models are decent, but not stratospheric. 00:11:55.000 |
For the most part, we're not talking O3 or DeepSeek R1 levels. On SimpleBench we're rate 00:11:59.480 |
limited, but it seems like the scores of both the Thinking Mode and Gemini 2 Pro will gravitate 00:12:05.520 |
around the same level as the "Gemini Experimental 1206". But I will say this, I know it's kind 00:12:11.280 |
of niche. Gemini is amazing at quickly reading vast amounts of PDFs and other files. No, 00:12:18.760 |
its transcription accuracy of audio, which I've tested, isn't going to be at the level of, say, 00:12:23.200 |
AssemblyAI; and no, its coding is no O3 and its "deep research" button is no Deep Research, 00:12:30.080 |
but the Gemini series are great at extracting text from files and they are incredibly cheap. 00:12:36.040 |
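If you want to try that PDF use case yourself, here is a minimal sketch using Google's google-generativeai Python SDK. The model name, file and prompt are my own choices, and the SDK surface may have changed since, so treat it as a starting point rather than a recipe.

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")              # supply your own key

pdf = genai.upload_file("some_long_report.pdf")      # upload the file via the File API
model = genai.GenerativeModel("gemini-2.0-flash")    # assumed model name; cheap and fast

response = model.generate_content(
    [pdf, "Extract the full text of this document as plain text, preserving headings."]
)
print(response.text)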
So I'm quite impressed. And I do suspect, as ChatGPT just recently overtook Twitter to 00:12:41.200 |
become the sixth most visited site and slowly starts closing in on Google, that Google will 00:12:46.600 |
invest more and more and more to ensure that Gemini 3 is state of the art. Next, Altman 00:12:52.400 |
wrote about a likely path that he sees: AI being used by authoritarian governments 00:12:57.400 |
to control their population through mass surveillance and loss of autonomy. And that remark brings 00:13:03.060 |
me to the RAND paper that for some reason I read in full, because they're worried by 00:13:08.400 |
not just mass surveillance by authoritarian dictatorships, but other threats to quote 00:13:13.640 |
national security. Wonder weapons, systemic shifts in power, kind of talked about that 00:13:18.560 |
earlier with, say, China automating the economy of the US, non-experts empowered to develop 00:13:23.520 |
weapons of mass destruction, artificial entities with agency, think O6 kind of coming alive 00:13:29.160 |
and instability. This is RAND again, which has been around for over 75 years and is not 00:13:35.160 |
known for dramatic statements. Again, I would ask though that if the US does a quote large 00:13:40.280 |
national effort to ensure that they obtain a decisive AI enabled wonder weapon before 00:13:46.240 |
China, say three months before, six months before, then what? Are you really going to 00:13:50.080 |
use it to then disable the tech sector of China? For me, the real admission comes towards 00:13:55.800 |
the end of this paper where they say the US is not well positioned to realise the ambitious 00:14:02.160 |
economic benefits of AGI without widespread unemployment and accompanying societal unrest. 00:14:08.580 |
And I still remember the days when Altman used to say in interviews, it's just around 00:14:12.280 |
two years ago, he said stuff like, if AGI produces the kind of inequality that he thinks 00:14:17.960 |
it will, people won't take it anymore. Let's now though, get to some signs that 00:14:22.440 |
AGI might not even be controlled by countries or even companies. For less than $50 worth 00:14:29.260 |
of compute time, of course, not counting research time, but for around apparently $20 worth 00:14:34.400 |
of compute time, affordable for all of you guys, Stanford produced S1. Now, yes, of course, 00:14:40.360 |
they did utilise an open-weight base model, Qwen 2.5 32B Instruct, 00:14:46.100 |
but the headline is with just a thousand questions worth of data, they could bring that tiny 00:14:52.360 |
model to being competitive with O1. This is in science, GPQA and competition level mathematics. 00:14:59.840 |
The key methodology was, well, whenever the model wanted to stop, they forced it to continue 00:15:05.520 |
by adding "wait", literally the token "Wait", multiple times to the model's generation 00:15:11.520 |
when it tried to end. Imagine you're sitting in an exam and every time 00:15:14.560 |
you think you've come to an answer and you're ready to write it down, a voice in your head 00:15:18.880 |
says, "wait". That's kind of what happened, until the student, or you, had taken a set amount 00:15:25.240 |
of time on the problem. Appropriately then, this is called test time scaling, scaling 00:15:30.640 |
up the amount of tokens spent to answer each question. I've reviewed the questions, by 00:15:35.520 |
the way, in the MATH-500 benchmark, and they are tough. So to get 95%, at least on the hard 00:15:41.000 |
ones, the level five ones, is impressive. Likewise, of course, 00:15:45.840 |
to get beyond 60% in GPQA diamond, which roughly matches the level of PhDs in those domains. 00:15:52.920 |
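Here, roughly, is what that "wait" trick, which the S1 paper calls budget forcing, looks like when sketched in Python. The model interface below is a placeholder of my own; the real implementation works at the token level, suppressing the end-of-thinking delimiter and appending "Wait" until a thinking budget has been spent.

def budget_forced_answer(generate, question, min_thinking_tokens=200, max_waits=6):
    # `generate(prompt)` stands in for the model: it returns a chunk of reasoning
    # plus the answer it would give if it were allowed to stop now.
    thinking, answer = "", None
    for _ in range(max_waits + 1):
        chunk, answer = generate(question + thinking)
        thinking += chunk
        if len(thinking.split()) >= min_thinking_tokens:
            break                  # budget reached: accept the answer
        thinking += " Wait,"       # otherwise force the model to reconsider
    return thinking, answer

# Toy usage with a fake "model" that produces a little more reasoning on every call.
def fake_model(prompt):
    return ("let me check that again, " * 10, "42")

thinking, answer = budget_forced_answer(fake_model, "Hard competition maths question?")
print(len(thinking.split()), "thinking 'tokens' ->", answer)

Plot accuracy against that thinking budget and you get the test-time scaling curve they were after.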
To recap, this is an off the shelf open weights model trained with just a thousand questions 00:15:58.000 |
and reasoning traces. There were some famed professors in this Stanford team and their 00:16:02.760 |
goal, by the way, was to replicate this chart on the right, which came in September from 00:16:07.920 |
OpenAI. Now we kind of already know that the more pre-training you do and post-training 00:16:12.640 |
with reinforcement learning you do, the better the performance will be. But what about time 00:16:17.460 |
taken to actually answer questions, test time compute? That's the chart they wanted to 00:16:22.040 |
replicate. Going back to the S1 paper, they say, "despite the large number of O1 replication 00:16:26.760 |
attempts, none have openly replicated a clear test-time scaling behavior", and look how they 00:16:32.520 |
have done so. I'm going to simplify their approach a little bit because it's the finding 00:16:36.560 |
that I'm more interested in, but essentially they sourced 59,000 tough questions. Physics 00:16:42.640 |
Olympiads, astronomy, competition-level mathematics, and AGIEval. I remember covering that paper 00:16:47.800 |
like almost two years ago on this channel. They got Gemini Thinking, the one that outputs 00:16:52.160 |
thinking tokens like DeepSeek R1 does, to generate reasoning traces and answers for each of those 00:16:57.840 |
59,000 examples. Now they could have just trained on all of those examples, but that 00:17:04.120 |
did not offer substantial gains over just picking a thousand of them. Just a thousand 00:17:08.860 |
examples in say your domain to get a small model to be a true reasoner. Then of course 00:17:14.200 |
get it to think for a while with that "wait" trick. How did they filter down from 59,000 examples 00:17:19.360 |
to 1,000, by the way? First, decontaminate: you don't want any questions that you're 00:17:23.520 |
going to use to then test the model of course. Remove examples that rely on images that aren't 00:17:28.600 |
found in the question, for example, and other formatting stuff. But more interestingly, 00:17:34.040 |
difficulty and diversity. This is the kind of diversity that even JD Vance would get 00:17:38.320 |
behind. On difficulty, they got smaller models to try those questions. And if those smaller 00:17:42.880 |
models got the questions right, they excluded them. They must be too easy. On diversity, 00:17:48.000 |
they wanted to cover as many topics as possible from mathematics and science, for example. 00:17:53.960 |
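Condensed into code, the filtering pipeline as I've just described it looks roughly like the sketch below; the field names and helper functions are placeholders of mine, not the S1 authors' actual implementation.

import random
from collections import defaultdict

def filter_s1_style(questions, eval_sets, small_models, target_size=1000):
    # 1) Decontaminate: drop anything overlapping the benchmarks you will test on.
    pool = [q for q in questions if not any(q["text"] in s for s in eval_sets)]
    # 2) Drop malformed items, e.g. questions that refer to images which aren't included.
    pool = [q for q in pool if not q.get("needs_missing_image", False)]
    # 3) Difficulty: if a smaller model already gets it right, it is too easy, so exclude it.
    pool = [q for q in pool
            if not any(model(q["text"]) == q["answer"] for model in small_models)]
    # 4) Diversity: spread the final picks across domains (roughly 20 each from about 50 domains).
    by_domain = defaultdict(list)
    for q in pool:
        by_domain[q["domain"]].append(q)
    per_domain = max(1, target_size // max(1, len(by_domain)))
    selected = []
    for qs in by_domain.values():
        selected.extend(random.sample(qs, min(per_domain, len(qs))))
    return selected[:target_size]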
They ended up with around 20 questions from 50 different domains. They then fine-tuned 00:17:59.120 |
that base model on those thousand examples with the reasoning traces from Gemini. And 00:18:03.360 |
if you're wondering about DeepSeek R1, they fine-tuned with 800,000 examples. Actually, 00:18:09.760 |
you can see that in this chart on the right here. Again, it wasn't just about fine-tuning. 00:18:14.360 |
Each time the model would try to stop, they said "wait", sometimes two, four, or six times, 00:18:20.440 |
to keep boosting performance. Basically, it forces the model to check its own output and 00:18:25.200 |
see if it can improve it. Notice "wait" is fairly neutral. You're not telling the model 00:18:25.200 |
that it's wrong. You're saying, wait, maybe do we need to check that? They also tried 00:18:33.160 |
scaling up majority voting or self-consistency, and it didn't quite have the same slope. Suffice 00:18:38.980 |
to say though, if anyone watching is in any confusion, getting these kinds of scores in 00:18:43.440 |
GPQA, Google-Proof Q&A, and competition-level mathematics, is insane. 00:18:49.400 |
Incredibly impressive. Of course, if you took this same model and tested it in a different 00:18:53.400 |
domain, it would likely perform relatively poorly. Also, side note, when they say open 00:18:57.960 |
data, they mean those thousand examples that they fine-tuned the base model on. The actual 00:19:02.800 |
base model doesn't have open data. So it's not truly open data. As in, we don't know 00:19:07.440 |
everything that went into the base model. Everything that Qwen 2.5, 32 billion parameters, 00:19:12.920 |
was trained on. Interestingly, they would have gone further, but the actual context 00:19:17.000 |
window of the underlying language model constrains it. And Karpathy, in his excellent ChatGPT 00:19:22.560 |
video this week, talked about how it's an open research question about how to extend 00:19:27.800 |
the context window suitably at the frontier. It's a three and a half hour video, but it's 00:19:31.920 |
a definite recommend from me. Actually, speaking of Karpathy, his reaction to this very paper 00:19:36.880 |
was "cute idea, reminds me of let's think step-by-step trick. That's where you told 00:19:41.960 |
the model to think step-by-step so it spent more tokens to reason first before giving 00:19:46.160 |
you an answer. Here, by saying wait, we're forcing the model to think for longer. Both 00:19:50.320 |
lean, he said, on the language prior to steer the thoughts." And speaking of spending 00:19:54.240 |
your time well by watching a Karpathy video, I would argue you can spend your money pretty 00:19:59.680 |
well by researching which, say, charity to give to through GiveWell. They are the sponsors 00:20:05.480 |
of this video, but I've actually been using them for, I think, 13 years. They have incredibly 00:20:10.780 |
rigorous methodology, backed by 60,000+ hours of research each year on which charities save 00:20:17.800 |
the most lives, essentially. The one that I've gone for, for actually all of those 00:20:22.000 |
13 years, is the Against Malaria Foundation, I think started in the UK. Anyway, do check 00:20:26.520 |
out GiveWell, the links are in the description, and you can even put in where you first heard 00:20:31.080 |
of them. So obviously, you could put, say, AI Explained. But alas, we are drawing to 00:20:35.040 |
the end, so I've got one more point from the Sam Altman essay that I wanted to get 00:20:38.600 |
to. In previous essays, he's talked about the value of labour going to zero. Now he 00:20:42.720 |
just talks about the balance of power between capital and labour getting messed up. But 00:20:47.120 |
interestingly, he adds, this may require early intervention. Now, OpenAI have funded studies 00:20:52.320 |
into UBI with, let's say, mixed results, so it's interesting he doesn't specifically 00:20:56.860 |
advocate for universal basic income. He just talks about early intervention, then talks 00:21:00.920 |
about compute budgets and being open to strange-sounding ideas. But I would say, if AGI is coming in 00:21:06.400 |
two to five years, then the quote "early intervention" would have to happen, say, 00:21:10.480 |
now? I must confess, though, at this stage, that I feel like we desperately need preparation 00:21:15.880 |
for what's coming, but it's quite hard to actually specifically say what I'm advocating 00:21:20.360 |
the preparation be. Then we get renewed calls just today from the CEO of Anthropic, Dario 00:21:25.160 |
Amodei, about how AI will become a country of geniuses in a data centre, possibly by 00:21:30.400 |
2026 or 2027, and almost certainly no later than 2030. He said that governments are not 00:21:36.800 |
doing enough to hold the big AI labs to account and measure risks and, at the next international 00:21:43.600 |
summit – there was one just this week – we should not repeat this missed opportunity. 00:21:48.080 |
These issues should be at the top of the agenda. The advance of AI presents major new global 00:21:53.200 |
challenges. We must move faster and with greater clarity to confront them. I mean, I'm sold 00:21:58.760 |
and I think many of you are, that change is coming very rapidly and sooner than the vast 00:22:04.360 |
majority of people on the planet think. The question for me that I'll have to reflect 00:22:08.120 |
on is, well, what are we going to do about it? Let me know what you think in the comments 00:22:12.560 |
but above all, thank you so much for watching to the end and have a wonderful day.