
AGI: (gets close), Humans: ‘Who Gets to Own it?’


Chapters

0:00 Intro
1:37 AGI Inches Closer
4:26 ‘Super-Exponential’
5:58 Musk Bid
7:34 Luxury Goods and Land
9:05 ‘Benefits All Humanity’
12:52 ‘National Security’
14:21 s1
20:33 Final thoughts


00:00:00.000 | The world may be waking up to the fact that intelligence will be automated sooner than
00:00:05.680 | anyone could have imagined a few years ago, but it is still sleeping when it comes to
00:00:11.280 | who gets the spoils. Just today, the Vice President of America said that AI will never
00:00:17.120 | replace workers and only boost productivity. Then again, Sam Altman, CEO of OpenAI, wrote
00:00:23.520 | just yesterday that he could see labor losing its power to capital. And RAND, the famous
00:00:30.080 | think tank, put out a paper just the other day that said that the world isn't ready
00:00:34.200 | for the "job losses and societal unrest" that it thinks might accompany a more general
00:00:40.320 | artificial intelligence. But even if labor does lose, capital can't decide who gets
00:00:45.840 | the money. Just today, Musk and co. challenged Sam Altman and Microsoft for control of OpenAI
00:00:53.500 | itself. And of course, there are always papers like this one from Stanford suggesting that
00:00:58.720 | the reasoning enhancements needed to bring a model to frontier capability are achievable
00:01:03.960 | for just $20, which makes me think you guys can afford AGI after all. Meanwhile, Dario
00:01:09.600 | Amodei, CEO of Anthropic, makers of Claude, says that time is running out to control the
00:01:15.400 | AGI itself. "I just think that when the day inevitably comes that we must confront
00:01:20.920 | the full automation of intelligence, I just hope we are a little more unified, let's
00:01:26.600 | say, than we are now." There is too much to cover as always, so let's just cut it
00:01:30.880 | down to the 7 most interesting developments, using the Sam Altman essay as the jumping
00:01:36.280 | off point for each. First off, he gives his 5th or maybe 15th different definition for
00:01:42.440 | AGI, but this time it's "We mean it to be a system that can tackle increasingly
00:01:47.640 | complex problems at human level in many fields." Well, under that definition, we are getting
00:01:53.880 | awfully close. Take coding, where we heard in December that the o3 model was the 175th
00:02:01.400 | highest-ranked coder by Codeforces Elo. Now, that might not mean much to many people,
00:02:06.680 | but just yesterday in Japan, Sam Altman said they now have internally the 50th highest
00:02:12.120 | scoring competitor. We're clearly well beyond imitation learning. These systems, o1, o3,
00:02:17.480 | o4, they're not copying those top 50 competitors, say, in coding. They are trying things out
00:02:22.440 | themselves and teaching themselves through reinforcement learning what works. We are
00:02:26.440 | not capped at the human level and that applies to way more than just coding. I've been
00:02:30.580 | using deep research from OpenAI on the pro tier this week to at least suggest diagnoses
00:02:37.420 | for a relative and a doctor I know said that it found things that she wouldn't have thought
00:02:43.960 | of. Of course, it does hallucinate fairly frequently, but it also thinks of things you
00:02:48.040 | might not have thought of. And remember, this is o3 searching maybe 20 sources. What about
00:02:53.960 | o5 searching 500? And you might say, well, knowing stuff is cool, but white-collar workers
00:02:59.360 | actually take actions on their computers. Well, Karina Nguyen from OpenAI has this to
00:03:05.760 | say. On tasks, they're saturating all the benchmarks.
00:03:09.260 | And post-training itself is not hitting the wall. Basically, we went from like raw data
00:03:15.620 | sets from pre-trained models to infinite amount of tasks that you can teach the model in the
00:03:23.760 | post-training world via reinforcement learning. So any task, for example, like how to search
00:03:30.700 | the web, how to use the computer, how to write, well, like all sorts of tasks that you like
00:03:37.900 | trying to teach the model, all the different skills. And that's why we think like there's
00:03:42.340 | no data wall or whatever, because there will be infinite amount of tasks. And that's how
00:03:48.340 | the model becomes extremely superintelligent. And we are actually getting saturated on all benchmarks.
00:03:54.740 | So I think the bottleneck is actually in evaluations.
00:03:57.860 | And there's a reason I can believe that, even though their current Operator system, only
00:04:01.900 | available on Pro for $200 a month, is quite jank. It's because tasks like buying something
00:04:07.680 | online or filling out a spreadsheet are mostly verifiable. And whenever you hear verifiable
00:04:12.780 | or checkable, think: ready to be absolutely eaten by reinforcement learning. Just like
00:04:18.060 | domains like code, where you can see the impact of enhanced reinforcement learning from
00:04:23.980 | o1-preview to o3.
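To make "verifiable" concrete, here is a minimal sketch, assuming a coding task with unit tests; this is my own toy illustration, not OpenAI's training setup. The reward is simply whether the model's attempt passes the checks, which is exactly the kind of binary signal reinforcement learning can optimise at scale.

```python
import subprocess
import sys
import tempfile

def verifiable_reward(candidate_code: str, test_code: str) -> float:
    """Toy reward for RL on a verifiable coding task: 1.0 if the candidate
    passes the unit tests, else 0.0. Illustrative only; a real pipeline
    would sandbox execution and shape rewards far more carefully."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code + "\n")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=10)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0

# The "environment" checks whatever the policy proposes:
candidate = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(verifiable_reward(candidate, tests))  # -> 1.0
```

Filling out a spreadsheet or completing a purchase flow can be scored the same way: did the final state match the specification or not.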
00:04:28.740 | Next is the investment that must go into making all of this happen. And Sam Altman had this to say later on in the essay: "The scaling laws that predict intelligence
00:04:34.060 | improvements have been accurate over many orders of magnitude. Give or take the intelligence
00:04:39.260 | of an AI model roughly equals the log of the resources used to train and run it." So think
00:04:43.980 | of that as 10xing the resources you put in to get one incremental step forward in intelligence.
00:04:50.500 | Doesn't sound super impressive until you read the third point. And I agree with this point.
00:04:54.700 | The socioeconomic value of linearly increasing intelligence, each increment, is super exponential.
00:05:01.140 | In short, if someone could somehow double the intelligence of o3, it wouldn't be worth
00:05:06.020 | 4x more to me, and I think to many people, it would be worth way, way more than that.
00:05:10.460 | It would be super exponential. He goes on, "A consequence of this is that we see no reason
00:05:15.820 | for the exponentially increasing investment to stop in the near future." In other words,
00:05:20.340 | if AI will always pay you back tenfold for what you invest in it, why ever stop investing?
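To see the shape of that argument in numbers, here is a tiny sketch with made-up figures; the log base and the super-exponential value curve are my assumptions, not Altman's. Each step of intelligence costs 10x more resources, but is assumed to be worth far more than 10x as much.

```python
import math

def intelligence(resources: float) -> float:
    # Altman's framing: intelligence ~ log of the resources used to train and run the model.
    return math.log10(resources)

def value(intel: float) -> float:
    # Any curve convex enough makes the point; the 1.5 exponent is arbitrary.
    return 10 ** (intel ** 1.5)

prev = None
for resources in (1e6, 1e7, 1e8, 1e9):
    i = intelligence(resources)
    v = value(i)
    note = "" if prev is None else f"  ({v / prev:,.0f}x the previous level)"
    print(f"resources {resources:.0e} -> intelligence {i:.0f}, value ~{v:.1e}{note}")
    prev = v
```

If every 10x of investment buys a step that is worth more than 10x, the rational move is to keep investing, which is exactly the conclusion he draws.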
00:05:26.820 | Many forget this, but less than two years ago, Sam Altman himself said that his grand
00:05:31.060 | idea is that OpenAI will capture much of the world's wealth through the creation of AGI,
00:05:36.780 | and then redistribute it to the people. We're talking figures like not just 100 billion,
00:05:41.700 | but a trillion, or even 100 trillion. That's coming from him. He only adds that if AGI does create
00:05:47.140 | all that wealth, he's not sure how the company will redistribute it. To give you a sense
00:05:51.260 | of scale, as you head towards 100 trillion, you're talking about the scale of the entire
00:05:56.460 | labor force of the planet. And that, of course, brings us to others who don't want him to
00:06:01.980 | have that control, or maybe want that control for themselves. As you may have heard about,
00:06:07.500 | Elon Musk has bid almost 100 billion for OpenAI, or at least it's a bid for the non-profit
00:06:14.200 | which currently controls OpenAI. To save you reading half a dozen reports, essentially
00:06:19.500 | it looks like Sam Altman and OpenAI have valued that non-profit's stake in OpenAI at around
00:06:26.380 | $40 billion. That leaves plenty of equity for Microsoft and OpenAI itself, including
00:06:32.380 | its employees. However, if Musk and others have valued that stake at $100 billion, then
00:06:39.380 | it might be very difficult in court for Altman and co. to say it's worth only $40 billion.
00:06:45.820 | So even if they reject, as it seems like they have done, Musk's offer, it forces them
00:06:51.180 | to potentially dilute the stake owned by Microsoft and the employees. Altman said to the employees
00:06:57.380 | at OpenAI that these are just tactics to try and weaken us because we're making great
00:07:02.300 | progress. The non-profit behind OpenAI could also reject the offer because it thinks that
00:07:07.780 | AGI wouldn't be safe in the hands of Musk. At this point, I just can't resist doing
00:07:12.480 | a quick plug for a mini documentary I released on my Patreon just yesterday. It actually
00:07:18.100 | covers the origin stories of DeepMind, OpenAI, the tussle with Musk and Anthropic and how
00:07:24.300 | the founding vision of each of those AGI labs went awry. This time, by the way, I used a
00:07:29.780 | professional video editor and the early reviews seemed to be good. All the shenanigans that
00:07:34.820 | are going on with the non-profit at OpenAI seem worthy of an entire video on their own.
00:07:41.260 | So for now, I'm going to move on to the next point.
00:07:44.060 | Sam Altman predicted that with the advent of AGI, the price of many goods will eventually
00:07:49.300 | fall dramatically. It seems like one way to assuage people who lose their job or see their
00:07:54.700 | wages drop is that, well, at least your TV is cheaper. But he did say the price of luxury
00:08:00.460 | goods and land may rise even more dramatically. Now, I don't know what you think, but I live
00:08:07.380 | in London and the price of land is already pretty dramatic. So who knows what it will
00:08:13.100 | be after AGI. But just on that luxury goods point, I think Sam Altman might have one particular
00:08:19.340 | luxury good in mind. Yesterday in London, Sam Altman was asked about their hardware
00:08:24.860 | device designed in part by Jony Ive, formerly of Apple. And he said, it's incredible. It really
00:08:31.360 | is. I'm proud of it. And it's just a year away. Yes, by the way, I did apply to be at
00:08:36.380 | that event, but you had to have certain org IDs, which I didn't. One thing that might
00:08:40.960 | not be a luxury device are smaller language models. In leaked audio of that same event,
00:08:47.760 | he apparently said, well, one idea would be we put out o3 and then open-source o3-mini.
00:08:53.420 | We put out o4 and open-source o4-mini. He added, this is not a decision, but directionally
00:08:59.900 | you could imagine us saying this. Take all of that for what it is worth.
00:09:04.620 | The next jumping off point comes in the first sentence actually of this essay, which is
00:09:09.740 | that the mission of OpenAI is to ensure that AGI benefits all of humanity. Not that they
00:09:15.640 | make AGI, that they make an AGI that benefits all of humanity. Now, originally when they
00:09:20.680 | were founded, which I covered in the documentary, the charter was that they make AGI that benefits
00:09:26.200 | all of humanity unencumbered by the need for a financial return. But that last bit's gone,
00:09:31.540 | but we still have that it benefits all of humanity. Not most of humanity, by the way,
00:09:35.940 | benefits all of humanity. I really don't know how they are going to achieve that when they
00:09:41.660 | themselves admit that the vast majority of human labor might soon become redundant. Even
00:09:46.980 | if they somehow got a benevolent policy implemented in the US to make sure that everyone
00:09:53.140 | was looked after, how could you ensure that for other nations?
00:09:56.420 | After watching Yoshua Bengio, one of the godfathers of AI, and I'll show you the clip in a second,
00:10:01.220 | I did have this thought. It seems to me if a nation got to AGI or super intelligence
00:10:06.380 | one month, three months, six months before another one, it's not the most likely that
00:10:11.500 | they would use that advantage to just wipe out other nations. I think more likely would
00:10:16.860 | be to wipe out the economies of other nations. The US might automate the economy of say China
00:10:23.460 | or China, the US, and then take that wealth and distribute it amongst its people. And
00:10:28.580 | Yoshua Bengio thinks that that might even apply at the level of companies.
00:10:32.500 | I can see from the declarations that are made and, you know, what, you know, logically these
00:10:37.540 | people would do is that the people who control these systems, like say open AI potentially,
00:10:44.780 | they're not going to continue just selling the access to their AI. They're going to give
00:10:50.980 | access to, you know, a lower grade AI. They're going to keep the really powerful ones for
00:10:55.940 | themselves and they're going to build companies that are going to compete with the non-AI,
00:11:01.100 | you know, systems that exist. And they're going to basically wipe out the economies
00:11:05.360 | of all the other countries which don't have these superintelligent systems. So it's, you
00:11:13.160 | know, you say it's, you wrote it's not existential, but I think it is existential for countries
00:11:18.660 | who don't build up to this kind of level of AI. And it's an emergency because it's going
00:11:26.340 | to take at least several years, even with the coalition of the willing to bridge that.
00:11:32.900 | And just very quickly, I can't help, because he mentioned competitor companies to mention
00:11:38.000 | Gemini 2 Pro and Flash from Google, new models from Google DeepMind. There's also of course
00:11:43.720 | Gemini Thinking, which replicates the kind of reasoning traces of, say, o3-mini or DeepSeek
00:11:48.720 | R1. Now, straight off, the benchmark results of these models are decent, but not stratospheric.
00:11:55.000 | For the most part, we're not talking o3 or DeepSeek R1 levels. On SimpleBench we're rate
00:11:59.480 | limited, but it seems like the scores of both the Thinking Mode and Gemini 2 Pro will gravitate
00:12:05.520 | around the same level as the "Gemini Experimental 1206". But I will say this, I know it's kind
00:12:11.280 | of niche. Gemini is amazing at quickly reading vast amounts of PDFs and other files. No,
00:12:18.760 | its transcription accuracy on audio, which I've tested, isn't going to be at the level of, say,
00:12:23.200 | AssemblyAI; and no, its coding is no o3, and its "deep research" button is no Deep Research;
00:12:30.080 | but the Gemini series are great at extracting text from files, and they are incredibly cheap.
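For what it's worth, that file-reading workflow looks roughly like this with Google's google-generativeai Python SDK; the model name, file name and prompt are my own choices, and the SDK surface may have changed since this was recorded.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio (assumption)

# Upload a PDF via the File API, then ask a cheap Gemini model to pull the text out.
pdf = genai.upload_file("quarterly_report.pdf")  # hypothetical local file
model = genai.GenerativeModel("gemini-2.0-flash")  # model name may differ over time
response = model.generate_content(
    [pdf, "Extract the full text of this document as plain text, keeping the headings."]
)
print(response.text)
```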
00:12:36.040 | So I'm quite impressed. And I do suspect, as ChatGPT just recently overtook Twitter to
00:12:41.200 | become the sixth most visited site and slowly starts closing in on Google, that Google will
00:12:46.600 | invest more and more and more to ensure that Gemini 3 is state of the art. Next, Altman
00:12:52.400 | wrote about a likely path that he sees: AI being used by authoritarian governments
00:12:57.400 | to control their population through mass surveillance and loss of autonomy. And that remark brings
00:13:03.060 | me to the RAND paper that for some reason I read in full, because they're worried by
00:13:08.400 | not just mass surveillance by authoritarian dictatorships, but other threats to quote
00:13:13.640 | national security: wonder weapons, systemic shifts in power, kind of talked about that
00:13:18.560 | earlier with, say, China automating the economy of the US, non-experts empowered to develop
00:13:23.520 | weapons of mass destruction, artificial entities with agency, think o6 kind of coming alive,
00:13:29.160 | and instability. This is RAND again, which has been around for over 75 years and is not
00:13:35.160 | known for dramatic statements. Again, I would ask, though: if the US does mount a, quote, large
00:13:40.280 | national effort to ensure that they obtain a decisive AI-enabled wonder weapon before
00:13:46.240 | China, say three months before, six months before, then what? Are you really going to
00:13:50.080 | use it to then disable the tech sector of China? For me, the real admission comes towards
00:13:55.800 | the end of this paper where they say the US is not well positioned to realise the ambitious
00:14:02.160 | economic benefits of AGI without widespread unemployment and accompanying societal unrest.
00:14:08.580 | And I still remember the days when Altman used to say in interviews, it's just around
00:14:12.280 | two years ago, he said stuff like, if AGI produces the kind of inequality that he thinks
00:14:17.960 | it will, people won't take it anymore. Let's now though, get to some signs that
00:14:22.440 | AGI might not even be controlled by countries or even companies. For less than $50 worth
00:14:29.260 | of compute time, of course, not counting research time, but for around apparently $20 worth
00:14:34.400 | of compute time, affordable for all of you guys, Stanford produced S1. Now, yes, of course,
00:14:40.360 | they did utilise an open-weight base model, Qwen 2.5 Instruct at 32 billion parameters,
00:14:46.100 | but the headline is that with just a thousand questions' worth of data, they could bring that tiny
00:14:52.360 | model to being competitive with o1. This is in science (GPQA) and competition-level mathematics.
00:14:59.840 | The key methodology was, well, whenever the model wanted to stop, they forced it to continue
00:15:05.520 | by adding "Wait", literally the token "Wait", multiple times to the model's generation
00:15:11.520 | when it tried to end. Imagine you're sitting in an exam and every time
00:15:14.560 | you think you've come to an answer and you're ready to write it down, a voice in your head
00:15:18.880 | says "wait". That's kind of what happened, until the student, or you, had taken a set amount
00:15:25.240 | of time on the problem. Appropriately then, this is called test-time scaling: scaling
00:15:30.640 | up the number of tokens spent to answer each question. I've reviewed the questions, by
00:15:35.520 | the way, in the MATH-500 benchmark, and they are tough. The hard ones, the level-five
00:15:41.000 | ones, certainly are, so to get 95% on that benchmark is impressive. Likewise, of course,
00:15:45.840 | to get beyond 60% in GPQA Diamond, which roughly matches the level of PhDs in those domains.
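As a sketch of that "Wait" trick, often called budget forcing, here is roughly the control loop, based on my reading of the s1 paper rather than the authors' released code; the stand-in fake_generate function, the word-count budget and the prompts are all assumptions for illustration.

```python
MIN_THINKING_TOKENS = 200       # minimum "thinking" budget before we let the model stop
MAX_FORCED_CONTINUATIONS = 4    # how many times we are willing to say "Wait"

def fake_generate(prompt: str) -> str:
    """Stand-in for a call to the underlying reasoning model (s1 fine-tuned
    Qwen 2.5 32B Instruct); a real call would generate until a stop token."""
    return " ...some partial reasoning about the problem..."

def reason_with_budget(question: str) -> str:
    trace = ""
    for _ in range(MAX_FORCED_CONTINUATIONS + 1):
        trace += fake_generate(question + trace)      # model reasons, then tries to stop
        if len(trace.split()) >= MIN_THINKING_TOKENS:
            break                                     # budget spent: allow it to stop
        trace += " Wait,"                             # force it to reconsider and continue
    # Finally ask for the answer, conditioned on the lengthened reasoning trace.
    return fake_generate(question + trace + "\nFinal answer:")

print(reason_with_budget("How many primes are there below 100?"))
```

Scaling the thinking budget up or down is then just a matter of changing MIN_THINKING_TOKENS, which is what produces the test-time scaling curve.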
00:15:52.920 | To recap, this is an off the shelf open weights model trained with just a thousand questions
00:15:58.000 | and reasoning traces. There were some famed professors in this Stanford team and their
00:16:02.760 | goal, by the way, was to replicate this chart on the right, which came in September from
00:16:07.920 | OpenAI. Now we kind of already know that the more pre-training you do and post-training
00:16:12.640 | with reinforcement learning you do, the better the performance will be. But what about time
00:16:17.460 | taken to actually answer questions, test time compute? That's the chart they wanted to
00:16:22.040 | replicate. Going back to the S1 paper, they say, despite the large number of o1 replication
00:16:26.760 | attempts, none have openly replicated a clear test-time scaling behavior, and look how they
00:16:32.520 | have done so. I'm going to simplify their approach a little bit because it's the finding
00:16:36.560 | that I'm more interested in, but essentially they sourced 59,000 tough questions. Physics
00:16:42.640 | Olympiads, astronomy, competition-level mathematics, and AGIEval. I remember covering that paper
00:16:47.800 | like almost two years ago on this channel. They got Gemini Thinking, the one that outputs
00:16:52.160 | thinking tokens like DeepSeek R1 does, to generate reasoning traces and answers for each of those
00:16:57.840 | 59,000 examples. Now they could have just trained on all of those examples, but that
00:17:04.120 | did not offer substantial gains over just picking a thousand of them. Just a thousand
00:17:08.860 | examples in, say, your domain to get a small model to be a true reasoner. Then of course
00:17:14.200 | get it to think for a while with that "Wait" trick. How did they filter down from 59,000 examples
00:17:19.360 | to 1,000, by the way? First, decontaminate: you don't want any questions that you're
00:17:23.520 | going to use to then test the model, of course. Remove examples that rely on images that aren't
00:17:28.600 | found in the question, for example, and other formatting stuff. But more interestingly,
00:17:34.040 | difficulty and diversity. This is the kind of diversity that even JD Vance would get
00:17:38.320 | behind. On difficulty, they got smaller models to try those questions. And if those smaller
00:17:42.880 | models got the questions right, they excluded them. They must be too easy. On diversity,
00:17:48.000 | they wanted to cover as many topics as possible from mathematics and science, for example.
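Roughly, that difficulty-and-diversity selection could be sketched like this; the field names and the uniform per-domain sampling are my own simplification of the paper's procedure, not their released code.

```python
import random
from collections import defaultdict

# Each candidate is assumed to look like:
# {"question": "...", "domain": "physics", "solved_by_small_models": False}
def select_core_set(candidates: list[dict],
                    per_domain: int = 20,
                    target: int = 1000) -> list[dict]:
    # Difficulty filter: drop anything a smaller model already solves (too easy).
    hard = [q for q in candidates if not q["solved_by_small_models"]]

    # Diversity: bucket the remaining questions by domain...
    by_domain = defaultdict(list)
    for q in hard:
        by_domain[q["domain"]].append(q)

    # ...then take a handful from as many domains as possible.
    selected = []
    for questions in by_domain.values():
        selected.extend(random.sample(questions, min(per_domain, len(questions))))
    random.shuffle(selected)
    return selected[:target]

# select_core_set(fifty_nine_thousand_candidates) -> ~1,000 hard, diverse questions
```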
00:17:53.960 | They ended up with around 20 questions from 50 different domains. They then fine-tuned
00:17:59.120 | that base model on those thousand examples with the reasoning traces from Gemini. And
00:18:03.360 | if you're wondering about DeepSeek R1, they fine-tuned with 800,000 examples. Actually,
00:18:09.760 | you can see that in this chart on the right here. Again, it wasn't just about fine-tuning:
00:18:14.360 | each time the model would try to stop, they said "wait", sometimes two, four, or six times,
00:18:20.440 | to keep boosting performance. Basically, it forces the model to check its own output and
00:18:25.200 | see if it can improve it. Notice "wait" is fairly neutral. You're not telling the model
00:18:29.200 | that it's wrong. You're saying: wait, maybe we need to check that? They also tried
00:18:33.160 | scaling up majority voting or self-consistency, and it didn't quite have the same slope. Suffice
00:18:38.980 | to say though, if anyone watching is in any confusion, getting these kinds of scores in
00:18:43.440 | GPQA, Google Proof Question and Answer, and competition level mathematics, it's insane.
00:18:49.400 | Incredibly impressive. Of course, if you took this same model and tested it in a different
00:18:53.400 | domain, it would likely perform relatively poorly. Also, side note, when they say open
00:18:57.960 | data, they mean those thousand examples that they fine-tuned the base model on. The actual
00:19:02.800 | base model doesn't have open data. So it's not truly open data. As in, we don't know
00:19:07.440 | everything that went into the base model. Everything that Qwen 2.5, 32 billion parameters,
00:19:12.920 | was trained on. Interestingly, they would have gone further, but the actual context
00:19:17.000 | window of the underlying language model constrains it. And Karpathy, in his excellent ChatGPT
00:19:22.560 | video this week, talked about how it's an open research question about how to extend
00:19:27.800 | the context window suitably at the frontier. It's a three and a half hour video, but it's
00:19:31.920 | a definite recommend from me. Actually, speaking of Karpathy, his reaction to this very paper
00:19:36.880 | was "cute idea, reminds me of let's think step-by-step trick. That's where you told
00:19:41.960 | the model to think step-by-step so it spent more tokens to reason first before giving
00:19:46.160 | you an answer. Here, by saying wait, we're forcing the model to think for longer. Both
00:19:50.320 | lean, he said, on the language prior to steer the thoughts." And speaking of spending
00:19:54.240 | your time well by watching a Karpathy video, I would argue you can spend your money pretty
00:19:59.680 | well by researching which, say, charity to give to through GiveWell. They are the sponsors
00:20:05.480 | of this video, but I've actually been using them for, I think, 13 years. They have incredibly
00:20:10.780 | rigorous methodology, backed by 60,000+ hours of research each year on which charities save
00:20:17.800 | the most lives, essentially. The one that I've gone for, for actually all of those
00:20:22.000 | 13 years, is the Against Malaria Foundation, I think started in the UK. Anyway, do check
00:20:26.520 | out GiveWell, the links are in the description, and you can even put in where you first heard
00:20:31.080 | of them. So obviously, you could put, say, AI Explained. But alas, we are drawing to
00:20:35.040 | the end, so I've got one more point from the Sam Altman essay that I wanted to get
00:20:38.600 | to. In previous essays, he's talked about the value of labour going to zero. Now he
00:20:42.720 | just talks about the balance of power between capital and labour getting messed up. But
00:20:47.120 | interestingly, he adds, this may require early intervention. Now, OpenAI have funded studies
00:20:52.320 | into UBI with, let's say, mixed results, so it's interesting he doesn't specifically
00:20:56.860 | advocate for universal basic income. He just talks about early intervention, then talks
00:21:00.920 | about compute budgets and being open to strange-sounding ideas. But I would say, if AGI is coming in
00:21:06.400 | two to five years, then the quote "early intervention" would have to happen, say,
00:21:10.480 | now? I must confess, though, at this stage, that I feel like we desperately need preparation
00:21:15.880 | for what's coming, but it's quite hard to actually specifically say what I'm advocating
00:21:20.360 | the preparation be. Then we get renewed calls just today from the CEO of Anthropic, Dario
00:21:25.160 | Amodei, about how AI will become a country of geniuses in a data centre, possibly by
00:21:30.400 | 2026 or 2027, and almost certainly no later than 2030. He said that governments are not
00:21:36.800 | doing enough to hold the big AI labs to account and measure risks and, at the next international
00:21:43.600 | summit – there was one just this week – we should not repeat this missed opportunity.
00:21:48.080 | These issues should be at the top of the agenda. The advance of AI presents major new global
00:21:53.200 | challenges. We must move faster and with greater clarity to confront them. I mean, I'm sold
00:21:58.760 | and I think many of you are, that change is coming very rapidly and sooner than the vast
00:22:04.360 | majority of people on the planet think. The question for me that I'll have to reflect
00:22:08.120 | on is, well, what are we going to do about it? Let me know what you think in the comments
00:22:12.560 | but above all, thank you so much for watching to the end and have a wonderful day.