Back to Index

Was GPT-5 Underwhelming? OpenAI Co-founder Leaves, Figure02 Arrives, Character.AI Gutted, GPT-5 2025


Transcript

As the GPT-5 release date sails further into the horizon, OpenAI leadership splinters. Meanwhile, other AI labs ship incrementally smarter models, and smaller AGI efforts like Character AI are swallowed by the Google whale, leaving in my eyes just four companies remaining in contention for having the most capable models out there.

But if we get agile, autonomous humanoid robots like Figure 02 that could soon count to 50 in one breath with ChatGPT Advanced Voice, maybe many will take that as a win for 2024. But let's start with the leadership of OpenAI: Greg Brockman is going on a leave of absence through to the end of the year, calling it his first time to relax since co-founding the company.

He goes on, "The mission is far from complete though. We still have a safe AGI to build." Now, of course, any one person taking time away might well be for personal reasons, but we also have the full-on departure to Anthropic of another one of the co-founders, John Schulman. The reason that he, as the former head of alignment at OpenAI, jumped ship to Anthropic is, I think, given away in this sentence.

He said he wants to do research alongside people deeply engaged with the topics he's most interested in. Now, it is pretty hard to read that as saying anything other than he wasn't working with people who were deeply engaged with the kind of alignment work that he was working on.

Just for those who don't know, alignment is the attempt to align machine values with human values. But his exit follows the departure of the previous head of alignment, Jan Leike, and before that, the previous co-head of alignment and co-founder, Ilya Sutskever. I'm definitely starting to notice a trend of departures among co-founders at OpenAI, so I've enlisted ChatGPT to count the former heads of alignment at OpenAI.

Now, obviously, I was somewhat joking there. I think there have been fewer than 10 former heads of alignment, but that advanced voice mode does seem cool. I mean, even if OpenAI models aren't actually getting smarter, and arguably with GPT-4o are getting slightly dumber, that advanced voice mode is incredibly lifelike, and I could see hundreds of millions of people using it, at least when it comes out, which seems to be on the never never.

And here's some even more important context. OpenAI back in May said that they had recently begun training its next frontier model. Now here we are on August the 6th. It would almost certainly have finished training by now, so those key players at OpenAI would have a rough sense for its capabilities.

All of these departures after they've trained their latest frontier model seem strange. And then we got this about OpenAI's so-called DevDay, which is starting in October and running through to November. I am sure it will be fascinating, but they released this particular nugget: "While we know developers are waiting for our next big model, which we shared began training earlier this year in May, these events will focus on advancements in the API and our dev tools."

Taking that at face value would imply that GPT-5 will not come before November 21st. More likely, it means that GPT-5 wouldn't even come before the end of the year, because why would you release a model just after you've invited a load of devs to play about with your tools?

Now, yes, Sam Altman has recently claimed in the Washington Post that more advances will soon follow and will usher in a decisive period in the story of human society. But in recent months, it would be hard to say that OpenAI have produced much in the way of decisive progress.

And of course, all of that comes as Elon Musk is once again suing Sam Altman and OpenAI for what he says is lying and perfidy. The lawsuit calls the original OpenAI a spurious venture. The language used throughout this 86-page document is hardly subtle. Musk claims that Sam Altman is running a long con.

His perfidy and deceit are of Shakespearean proportions. It does go on and on, but the basic accusation is that Sam Altman was motivated by greed, and Elon Musk just wanted to have something more open to compete with Google. I think Musk and others may raise an eyebrow when, later in the article, Sam Altman said that making sure open source models are readily available to developers in other nations will further bolster our advantage, speaking of the US.

And the question that the article fundamentally raises is: who will control the future of AI? Well, as of today, it looks less and less likely to be Sam Altman. OpenAI are certainly good at productizing AI, and the advanced voice mode, as we saw, is great. SearchGPT could make them some money, and Sora is coming out at some point, presumably this year.

And at least at the moment, the Figure 02 robot is using an OpenAI vision language model. We'll get back to that in just a moment, but in terms of raw intelligence, OpenAI feel like they're falling behind. The Llama 3.1 405-billion-parameter model is already smarter than GPT-4o, and Zuckerberg has recently committed 10 times more computing power to train Llama 4.

Or to put it another way, the next OpenAI model would have to be significantly better than GPT-4o just to catch up to the current state of the art, let alone the state of the art when Llama 4 comes out. And of course, in the meantime, even just this year, we might be getting Claude 3.5 Opus from Anthropic, or even Claude 4.

Simply put, a year is a long time in AI, and the debate has moved on. Even the White House are now encouraging open source competition to the likes of OpenAI. Sam Altman, meanwhile, is still warning about people stealing key intellectual property such as model weights. Now do forgive me for pointing out that one of the modules on my Coursera course is about that difference between open source and open weights.

Super grateful, of course, for those 10 reviewers who have kindly left reviews for this course. But don't get me wrong, it's not like Meta is having it all its own way. Do you remember those Tom Brady, Paris Hilton chatbots that all the youngsters were apparently going to be using?

I think each celebrity was paid something like $5 million for a few hours of recordings. Well, apparently they're now being scrapped and none of those AI chatbots amassed a particularly big following. But nor are things going particularly well for the smaller AGI labs like Character AI. Their product was or is an array of chatbots but they were also aiming at AGI and training their own foundation models.

Obviously, the leaders of Character AI must have been somewhat disappointed by those new foundation models, because essentially they've been bought out, or hollowed out, by Google, which didn't actually buy the rival company but took its key talent and IP. But at this point you might be starting to notice somewhat of a trend.

If the incrementally greater intelligence of new models were down to obscure tricks or arcane knowledge, then you'd expect smaller labs like Character AI to be doing as well as the biggest labs. But if it's all about sheer scale of data and compute, you'd expect the leaders to be increasingly, well, Meta and Google.

And that is more or less what we're seeing, though of course measuring that intelligence is quite hard. We do have the LMSYS Chatbot Arena leaderboard, in which the new version of Gemini 1.5 Pro takes the lead at almost 1300 Elo. But you'll notice we have GPT-4o Mini coming third, ahead, indeed, of Claude 3.5 Sonnet, which in my own benchmark is far and away ahead.

If we just relied on these Elo rankings, you'd think, well, if OpenAI can come up with a tiny model doing almost as well as the rest, they must be doing amazingly. But LMSYS recently did something great, which was release a batch of raw data showing comparisons between the models and which one won.
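As an aside, leaderboard scores like these are computed from exactly that kind of pairwise battle data. Here's a minimal sketch in Python with hypothetical model names, using classic sequential Elo updates; note that LMSYS actually fits a Bradley-Terry model over all battles at once, so treat this only as an illustration of the idea:

```python
def elo_ratings(battles, k=32, base=1000):
    """Compute ratings from (winner, loser) pairs via sequential Elo updates.

    battles: list of (winner, loser) tuples from head-to-head comparisons.
    k: update step size; base: starting rating for unseen models.
    """
    ratings = {}
    for winner, loser in battles:
        ra = ratings.setdefault(winner, base)
        rb = ratings.setdefault(loser, base)
        # Expected score of the winner under the Elo logistic model
        expected = 1 / (1 + 10 ** ((rb - ra) / 400))
        # Winner gains, loser loses, the same amount (total rating is conserved)
        ratings[winner] = ra + k * (1 - expected)
        ratings[loser] = rb - k * (1 - expected)
    return ratings

# Hypothetical battle log: model-a beats b and c, model-b beats c
battles = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
print(elo_ratings(battles))
```

One consequence of this setup, relevant to what follows: a model that refuses to answer tends to lose its battles, which drags its rating down regardless of its underlying capability.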

And I looked through the dozens of examples, and one trend emerged: Claude 3.5 Sonnet essentially refused more requests than GPT-4o Mini. Even when both models couldn't perform a task natively, like creating an image, GPT-4o Mini gave it, I guess, more of a go. It at least described the image that it would create.

Or in this example, when the models were asked a political question, GPT-4o Mini gave a response, whereas Claude 3.5 Sonnet just apologized and said it wouldn't provide analysis. Now, I have noticed myself that Claude 3.5 Sonnet is more sensitive than any other model, but that's not a sign of lacking intelligence.

So to the extent that we're going to call language modeling intelligence, Claude 3.5 Sonnet is far more capable than GPT-4o Mini, but this reticence to answer certain questions could explain the leaderboard rankings. Now, yes, for anyone following the channel, I have been testing the new version of Gemini 1.5 Pro on my Simple Bench.

The final scores will be presented on a website that I'm hoping to release before the next video, but in early testing it performs slightly worse than Claude 3.5 Sonnet, though far better than other models. So in that sense, its leaderboard position could be far more justified, at least compared to GPT-4o Mini's.

For those who haven't heard of my new reasoning benchmark, humans score over 90% quite easily, whereas models like the new Gemini 1.5 Pro version score around 25%. It would, however, be somewhat hypey to say that this is another step toward AGI, as one of the co-founders of Google DeepMind recently said.

But whether you think that new Gemini 1.5 Pro is the best or Claude 3.5 Sonnet, certainly more and more people are now shifting their API spend away from OpenAI. OpenAI letting other labs have the lead for a couple of weeks could be a timing issue, but a couple of months just makes it seem like they don't have a reply.

But as I mentioned at the start, at least OpenAI's models are the ones being chosen by Figure 02. These humanoid robots have an onboard mic and speaker, so hopefully you could chat to them like you would the new OpenAI advanced voice mode. In other words, seamlessly, with very low latency.

Brett Adcock, the founder of Figure, said that the default user interface to our robot will be speech. Apparently, the robot can work for around 20 hours straight, which is even more than me reading the latest AI papers. Its hands have 16 degrees of freedom and apparently human equivalent strength.

So honestly, even though it can speak back to you, you might not want to talk back to it. Now, apparently these Figure 02 robots can perform certain tasks autonomously and self-correct, and that data flywheel will be in effect. And one might well say that the argument that ubiquitous robot assistants will arrive before artificial general intelligence looks more plausible than ever.

And speaking of a data flywheel, you may already know that AI labs, including OpenAI, have used Weights & Biases, this video's sponsor, to track frontier machine learning experiments. But what you might not know is that Weights & Biases now have Weave, a lightweight toolkit to confidently iterate on LLM applications.

They also produce free prompt and LLM agent courses on their website. And if you didn't know that, you can let them know that you came from me by using my customized link. And the link is in the description. So in short, the vibe is shifting, but what isn't changing is my gratitude for you watching all the way to the end.

If you're keen to carry on the conversation with me personally, I'd love to see you over on AI Insiders on Patreon. But to everyone watching, have a wonderful day.