Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out

Chapters
0:00 Introduction
1:13 Pro Cost and OpenAI Operator
4:00 Agent Benchmarks Being Targeted
7:48 Fast Take-off, Altman
8:48 Altman Flip-Flops
10:02 DeepSeek R1 First Reaction
Progress in AI is increasingly hidden behind closed doors, but we can still see which particular AI agent benchmarks the leading labs are targeting, and I'll give you the highlights of two papers which I tried to find interesting, but I just couldn't. Meanwhile, though, Sam Altman markedly changes gear on take-off speeds, in other words, on how fast superintelligence is coming, and the newly released DeepSeek R1 proves that open-source models aren't that far behind.
The numbers, to be honest, will give you the best gauge of what is coming, and I don't just mean the cost of o3 when it is released, which apparently will still be $200 on the Pro tier. Given that they are already losing money with o1 Pro, it does kind of make you wonder about the economics. No, I more mean the numbers behind the Operator system that OpenAI looks set to be releasing quite soon.
I'll get to the two relevant papers in a moment, but if the o-series from OpenAI has proven anything, it's that the benchmarks labs target don't stay unbeaten for long. So is that why, yesterday, we got this headline in Axios? I won't read out the whole article, but I'm just going to give you the two or three highlights.
The claim is that OpenAI, in the coming weeks, will announce a breakthrough that unleashes, quote, "PhD-level super agents" to do complex human tasks, and Sam Altman has scheduled a closed-door briefing for US government officials on the 30th of January. There's not much other information in the article beyond this: "Several OpenAI staff have been telling friends that they are both jazzed and spooked by recent progress."
We do know that OpenAI are hiring aggressively for a multi-agent research team, with roles that involve complex environments with multiple agents. This is something that they are marching towards this year. One White House National Security Advisor, meanwhile, reportedly spoke about AI with an urgency that was rarely heard during his decade-plus in public life.
Early reporting suggests there are things the first version of this computer-use Operator agent won't be able to do, although I doubt OpenAI would release a model that could.
As we enter this year of AI agents doing our work for us, two benchmarks keep coming up. What kind of tasks are involved in WebVoyager and OSWorld?
Take one of the example tasks: it is pretty cool that the agent could do that, and it's the kind of request that would take me quite a while to type out. I mean, I guess I could speak that to the agent instead, asking for something that, say, has at least a four-star rating based on user reviews, and then checking if it's giving you something that meets your criteria. I could well imagine listing a bunch of criteria like that and handing the rest over.
Definitely not a long-horizon task in a complex environment, but how about this example task, quote: "I illegally downloaded an episode of Friends, but I don't know how to remove the subtitles." For reference, it takes me a day, sometimes two, to edit these videos in Descript.
Why can't existing agents already crush the simpler tasks? Well, apparently more than 75% of their clicks go astray. Oh, and also, they were attracted by advertisement content, and watching, helpless, as an agent clicks on an ad is quite something.
Now, I know the flaws of agents can seem silly sometimes, as if we're years and years away from usable agents, but progress can surprise you. I created over 200 pages' worth of mathematics puzzles to benchmark early AI models like the original ChatGPT.
Obviously, there had been incremental progress before that, but even tougher challenges like this one, o1 Pro aced.
So I guess I'm saying that I feel like we will go further, faster, than many people expect.
And I agree with one of the lead researchers on the o-series of models, who said that it can be hard to feel the AGI until you see an AI surpass top humans in a domain you care deeply about. Think of the writer behind Taxi Driver, Paul Schrader, who said that AI came up with better script ideas than his own.
And I don't think that's necessarily contradictory with the same researcher's other recent comment: "Lots of vague AI hype on social media these days. There are good reasons to be optimistic, but plenty of unsolved research problems remain."
Now to Sam Altman himself, who has reversed his position on fast take-off timelines.
- What's something you've rethought recently on AI?
- That a fast take-off is more possible than I thought a couple of years ago; not decades away, but something that's in, like, a small number of years.
- What do you think is the worst advice people are given?
That answer is a marked change from what he thought just 18 months ago or so.
But the way people define the start of the take-off varies so much that you would hope we would have clearer communication from these companies. And honestly, it is hard to keep up sometimes with the changing opinions of the CEOs of these AI labs.
18 months ago, he personally implored Congress to regulate AI, and I covered that at the time. Now, though, we have this very corporate economic blueprint from OpenAI. In short, it implores the US government to back the American AI industry, and it's promised that OpenAI would never facilitate their tools being used to threaten or coerce other states.
Meanwhile, that principle doesn't always seem to be upheld in practice. The Anthropic CEO, notably, chose not to make such a donation.
I really think we need to do something in 2025. We have moved on from a world where companies used to take six to eight months to safety-test a model before release. These days, speaking to official safety testers and others, the timelines sound dramatically shorter.
And no, open source does not feel like a year behind anymore. DeepSeek R1 was announced literally an hour and a half ago, so this is very much a first reaction, but I have digested some of the benchmark results. I have said many times that official benchmarks tell us less than they used to, so we will test R1 with our own benchmark and see which model performs best.
Its chain of thought stands out: it repeatedly says, wait, no, wait, I'm gonna do this. Given the model is open, it will be very interesting to see how quickly others replicate that. Oh, and it sometimes thinks in its chain of thought in Chinese.
You know the model that I'm actually looking forward to the most? That would be Claude 4 Sonnet.
I spent about 50 hours over the last 10 days or so working on a coding project with a colleague, and there was one critical task that we needed an LLM to do. And o1 Pro simply couldn't get the hang of it.
On a lighter note, there are some podcasts that I genuinely listen to and really learn a lot from; one episode in particular I had on while on a long walk in London. Yes, by the way, they also have a YouTube channel that I know some of you have already checked out and liked.
Thank you also to everyone who has participated. Lots more to say on that front in another video.
For me, as ever, the truth lies somewhere in between. Thank you so much for watching, and have a wonderful day.