Was GPT-5 Underwhelming? OpenAI Co-founder Leaves, Figure02 Arrives, Character.AI Gutted, GPT-5 2025
00:00:00.000 |
As the GPT-5 release date sails further into the horizon, OpenAI leadership splinters. 00:00:07.600 |
Meanwhile, other AI labs ship incrementally smarter models, 00:00:12.640 |
and smaller AGI efforts like Character AI are swallowed by the Google whale, 00:00:19.040 |
leaving in my eyes just four companies remaining in contention for having the most capable models 00:00:24.640 |
out there. But if we get agile, autonomous humanoid robots like Figure02 that could soon 00:00:29.840 |
count to 50 in one breath with ChatGPT Advanced Voice, maybe many will take that as a win for 2024. 00:00:37.600 |
But let's start with the leadership of OpenAI, and Greg Brockman is going on a leave of absence 00:00:44.000 |
through to the end of the year, calling it his first time to relax since co-founding the company. 00:00:50.320 |
He goes on, "The mission is far from complete though. We still have a safe AGI to build." 00:00:56.080 |
Now, of course, any one person taking time away might well be for personal reasons, 00:01:00.640 |
but we have the full-on departure to Anthropic of another one of the co-founders, 00:01:06.080 |
John Schulman. The reason that he, as the former head of alignment at OpenAI, 00:01:10.240 |
jumped ship to Anthropic, I think is given away in this sentence. He said he wants to do research 00:01:15.840 |
alongside people deeply engaged with the topics he's most interested in. Now, it is pretty hard 00:01:21.760 |
to read that as saying anything other than he wasn't working with people who were deeply engaged 00:01:28.240 |
with the kind of alignment work that he was working on. Just for those who don't know, 00:01:31.600 |
alignment is this attempt to align machine values with human values, but it follows the departure of 00:01:38.000 |
the previous head of alignment, Jan Leike, and before that, the previous co-head of alignment 00:01:43.440 |
and co-founder, Ilya Sutskever. I'm definitely starting to notice a trend of departures among 00:01:48.960 |
co-founders at OpenAI, so I've enlisted ChatGPT to count the former heads of alignment at OpenAI. 00:01:55.920 |
Now, obviously, I was somewhat joking there. I think there have been fewer than 10 00:02:08.720 |
former heads of alignment, but that advanced voice mode does seem cool. 00:02:13.040 |
I mean, even if OpenAI models aren't actually getting smarter and arguably with GPT-4o are 00:02:18.320 |
getting slightly dumber, that advanced voice mode is incredibly lifelike and I could see 00:02:23.840 |
hundreds of millions of people using it, at least when it comes out, which seems to be on the 00:02:28.960 |
never-never. And here's some even more important context. OpenAI back in May said that they had recently 00:02:35.680 |
begun training their next frontier model. Now here we are on August the 6th. It would almost certainly 00:02:42.240 |
have finished training by now, so those key players at OpenAI would have a rough sense 00:02:47.600 |
for its capabilities. All of these departures after they've trained their latest frontier model 00:02:53.600 |
seems strange. And then we got this about OpenAI's so-called dev day, which is starting in October 00:03:00.160 |
and running through to November. I am sure it will be fascinating, but they released this particular 00:03:05.520 |
nugget. While we know developers are waiting for our next big model, which we shared has begun 00:03:11.280 |
training earlier this year in May, these events will focus on advancements in the API and our 00:03:17.840 |
dev tools. Taking that at face value would imply that "GPT-5" will not come before November 21st. 00:03:25.840 |
More likely, it means that GPT-5 wouldn't even come before the end of the year, because why would 00:03:29.920 |
you release a model just after you've invited a load of devs to play about with your tools? 00:03:35.200 |
Now, yes, Sam Altman has recently claimed in the Washington Post that more advances will soon 00:03:40.560 |
follow and will usher in a decisive period in the story of human society. But in recent months, 00:03:47.040 |
it would be hard to say that OpenAI have produced much in the way of decisive progress. And of 00:03:53.200 |
course, all of that comes as Elon Musk is suing Sam Altman and OpenAI yet again for what he says 00:04:01.120 |
is lying and perfidy. The lawsuit calls the original OpenAI a spurious venture. The language 00:04:07.840 |
used throughout this 86-page document is hardly subtle. Musk claims that Sam Altman is running a 00:04:14.080 |
long con. His perfidy and deceit are of Shakespearean proportions. It does go on and on, 00:04:20.400 |
but the basic accusation is that Sam Altman was motivated by greed and Elon Musk just wanted to 00:04:27.120 |
have something more open to compete versus Google. I think Musk and others may raise an eyebrow when 00:04:33.760 |
later in the article, Sam Altman said that making sure open source models are readily available 00:04:38.560 |
to developers in other nations will further bolster our advantage, talking of the US. 00:04:43.920 |
And the question that the article fundamentally raises is who will control the future of AI? 00:04:50.640 |
Well, as of today, it looks less and less likely to be Sam Altman. OpenAI are certainly good at 00:04:57.600 |
productizing AI and the advanced voice mode, as we saw, is great. SearchGPT could make them 00:05:03.600 |
some money and Sora's coming out at some point, presumably this year. And at least at the moment, 00:05:08.880 |
the Figure02 robot is using an OpenAI vision-language model. We'll get back to that in 00:05:14.800 |
just a moment, but in terms of raw intelligence, OpenAI feel like they're falling behind. 00:05:20.720 |
The Llama 3 405-billion-parameter model is already smarter than GPT-4o and Zuckerberg 00:05:28.400 |
has recently committed to 10 times more computing power to train Llama 4. Or to put it another way, 00:05:34.720 |
the next OpenAI model would have to be significantly better than GPT-4o just to 00:05:39.760 |
catch up to the current state of the art, let alone the state of the art when Llama 4 comes out. 00:05:44.400 |
And of course, in the meantime, even just this year, we might be getting Claude 3.5 00:05:48.640 |
Opus from Anthropic or even Claude 4. Simply put, a year is a long time in AI and the debate has 00:05:54.960 |
moved on. Even the White House are now encouraging open source competition to the likes of OpenAI. 00:06:01.200 |
Sam Altman, meanwhile, is still warning about people stealing key intellectual property such 00:06:07.120 |
as model weights. Now do forgive me for pointing out that one of the modules on my Coursera course 00:06:12.880 |
is about that difference between open source and open weights. Super grateful, of course, 00:06:17.840 |
for those 10 reviewers who have kindly left reviews for this course. 00:06:22.000 |
But don't get me wrong, it's not like Meta is having it all its own way. Do you remember those 00:06:26.800 |
Tom Brady, Paris Hilton chatbots that all the youngsters were apparently going to be using? 00:06:32.080 |
I think each celebrity was paid something like $5 million for a few hours of recordings. Well, 00:06:37.760 |
apparently they're now being scrapped and none of those AI chatbots amassed a particularly big 00:06:43.040 |
following. But nor are things going particularly well for the smaller AGI labs like Character AI. 00:06:49.760 |
Their product was or is an array of chatbots but they were also aiming at AGI and training 00:06:56.400 |
their own foundation models. Obviously, the leaders of Character AI must have been 00:07:00.800 |
somewhat disappointed by those new foundation models because essentially they've been bought 00:07:05.200 |
out or hollowed out by Google. Not actually buying a rival company but taking its key talent and IP. 00:07:12.160 |
But at this point you might be starting to notice somewhat of a trend. If the incrementally greater 00:07:18.240 |
intelligence of new models were down to obscure tricks or arcane knowledge, then you'd expect 00:07:24.240 |
smaller labs like Character AI to be doing as well as the biggest labs. But if it's all about 00:07:29.680 |
sheer scale of data and compute, you'd expect the leaders to be increasingly, well, Meta and Google. 00:07:36.960 |
And that is more or less what we're seeing though of course measuring that intelligence is quite 00:07:41.040 |
hard. We do have the LMSys chatbot arena leaderboard in which the new version of Gemini 1.5 Pro takes 00:07:48.160 |
the lead at almost 1300 Elo. But if you'll notice we have GPT-4o Mini coming third, ahead indeed of 00:07:55.680 |
Claude 3.5 Sonnet, which in my own benchmark is far and away ahead. If we just relied on these Elo 00:08:03.040 |
rankings, you'd think, well, if OpenAI can come up with a tiny model doing almost as well as the rest, 00:08:09.520 |
they must be doing amazingly. But LMSys recently did something great, which was release a batch 00:08:14.720 |
of raw data showing comparisons between the models and which one won. And I looked through 00:08:20.480 |
the dozens of examples and one trend emerged. Claude 3.5 Sonnet essentially refused more requests 00:08:27.840 |
than GPT-4o Mini. Even when both models couldn't perform a task like creating an image natively, 00:08:33.840 |
GPT-4o Mini gave it, I guess, more of a go. It at least described the image that it would create. 00:08:39.520 |
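As an aside on why refusals matter so much here: arena-style leaderboards like LMSys derive Elo-style ratings from exactly these pairwise votes, so a model that refuses and loses the vote sheds rating points regardless of its underlying capability. Here is a minimal sketch of a standard Elo update; the K-factor of 32 and the example ratings are illustrative assumptions, not LMSys's actual method (which fits a Bradley-Terry model to the vote data):

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a, r_b, score_a, k=32):
    """Return updated ratings after one head-to-head vote.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1 - score_a) - (1 - e_a))
    return new_a, new_b

# A refusal that loses the vote costs the higher-rated model points:
new_a, new_b = elo_update(1300, 1250, 0.0)
```

With these assumed numbers, one lost vote moves roughly 18 rating points from the higher-rated model to the lower-rated one, which is how systematic refusals can drag an otherwise capable model down the table.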
Or in this example, when the models were asked a political question, GPT-4o Mini gave a response, 00:08:46.320 |
whereas Claude 3.5 Sonnet just apologized and said it wouldn't provide analysis. Now, 00:08:50.800 |
I have noticed myself that Claude 3.5 Sonnet is more sensitive than any other model, but that's 00:08:56.640 |
not a sign of lacking intelligence. So to the extent that we're going to call language modeling 00:09:01.600 |
intelligence, Claude 3.5 Sonnet is far more capable than GPT-4o Mini. But this reticence to answer 00:09:08.400 |
certain questions could explain the leaderboard rankings. Now, yes, to anyone following the 00:09:12.960 |
channel, I have been testing the new version of Gemini 1.5 Pro on my Simple Bench. The final 00:09:18.960 |
scores will be presented on a website that I'm hoping to release before the next video, 00:09:23.840 |
but in the early testing, it performs slightly worse than 3.5 Sonnet, but far better than other 00:09:29.920 |
models. So in that sense, this leaderboard position could be far more justified, at least 00:09:34.800 |
than GPT-4o Mini. For those who haven't heard of my new reasoning benchmark, humans score over 90% 00:09:40.400 |
quite easily, whereas models like the new Gemini 1.5 Pro version score around 25%. It would, however, 00:09:47.360 |
be somewhat hypey to say that this is another step toward AGI, as one of the co-founders of 00:09:53.600 |
Google DeepMind recently said. But whether you think that new Gemini 1.5 Pro is the best or 00:09:59.440 |
Claude 3.5 Sonnet, certainly more and more people are now shifting their API spend away from OpenAI. 00:10:06.400 |
OpenAI letting other labs have the lead for a couple of weeks could be a timing issue, 00:10:11.280 |
but a couple of months just makes it seem like they don't have a reply. But as I mentioned at 00:10:16.320 |
the start, at least OpenAI's models are the ones being chosen by Figure02. These humanoid robots 00:10:22.560 |
have an onboard mic and speaker, so hopefully you could chat to them like you would the new 00:10:27.600 |
OpenAI advanced voice mode. In other words, seamlessly with very low latency. Brett Adcock, 00:10:33.120 |
the founder of Figure, said that the default user interface to our robot will be speech. 00:10:38.960 |
Apparently, the robot can work for around 20 hours straight, which is even more than me reading the 00:10:44.800 |
latest AI papers. Its hands have 16 degrees of freedom and apparently human equivalent strength. 00:10:51.120 |
So honestly, even though it can speak back to you, you might not want to speak back to it. 00:10:56.480 |
Now, apparently these Figure02 robots can perform certain tasks autonomously 00:11:01.120 |
and self-correct and that data flywheel will be in effect. And one might well say that the argument 00:11:07.760 |
that ubiquitous robot assistants will arrive before Artificial General Intelligence looks 00:11:14.000 |
more plausible than ever. And speaking of a data flywheel, you may already know that AI labs, 00:11:19.680 |
including OpenAI, have used Weights & Biases, this video's sponsor, to track frontier machine 00:11:25.760 |
learning experiments. But what you might not know is that Weights & Biases now have Weave, 00:11:30.480 |
a lightweight toolkit to confidently iterate on LLM applications. They also produce free 00:11:36.080 |
prompt and LLM agent courses on their website. And if you didn't know that, you can let them 00:11:40.880 |
know that you came from me by using my customized link. And the link is in the description. 00:11:46.640 |
So in short, the vibe is shifting, but what isn't changing is my gratitude for you watching all the 00:11:53.360 |
way to the end. If you're keen to carry on the conversation with me personally, I'd love to see 00:11:57.920 |
you over on AI Insiders on Patreon. But to everyone watching, have a wonderful day.