
Was GPT-5 Underwhelming? OpenAI Co-founder Leaves, Figure02 Arrives, Character.AI Gutted, GPT-5 2025


Whisper Transcript | Transcript Only Page

00:00:00.000 | As the GPT-5 release date sails further into the horizon, OpenAI leadership splinters.
00:00:07.600 | Meanwhile, other AI labs ship incrementally smarter models,
00:00:12.640 | and smaller AGI efforts like Character AI are swallowed by the Google whale,
00:00:19.040 | leaving in my eyes just four companies remaining in contention for having the most capable models
00:00:24.640 | out there. But if we get agile, autonomous humanoid robots like Figure 02 that could soon
00:00:29.840 | count to 50 in one breath with ChatGPT Advanced Voice, maybe many will take that as a win for 2024.
00:00:37.600 | But let's start with the leadership of OpenAI, and Greg Brockman is going on a leave of absence
00:00:44.000 | through to the end of the year, calling it his first time to relax since co-founding the company.
00:00:50.320 | He goes on, "The mission is far from complete though. We still have a safe AGI to build."
00:00:56.080 | Now, of course, any one person taking time away might well be for personal reasons,
00:01:00.640 | but we have the full-on departure to Anthropic of another one of the co-founders,
00:01:06.080 | John Schulman. The reason that he, as the former head of alignment at OpenAI,
00:01:10.240 | jumped ship to Anthropic, I think is given away in this sentence. He said he wants to do research
00:01:15.840 | alongside people deeply engaged with the topics he's most interested in. Now, it is pretty hard
00:01:21.760 | to read that as saying anything other than he wasn't working with people who were deeply engaged
00:01:28.240 | with the kind of alignment work that he was working on. Just for those who don't know,
00:01:31.600 | alignment is this attempt to align machine values with human values, but it follows the departure of
00:01:38.000 | the previous head of alignment, Jan Leike, and before that, the previous co-head of alignment
00:01:43.440 | and co-founder, Ilya Sutskever. I'm definitely starting to notice a trend of departures among
00:01:48.960 | co-founders at OpenAI, so I've enlisted ChatGPT to count the former heads of alignment at OpenAI.
00:01:55.920 | Now, obviously, I was somewhat joking there. I think there have been fewer than 10
00:02:08.720 | former heads of alignment, but that advanced voice mode does seem cool.
00:02:13.040 | I mean, even if OpenAI models aren't actually getting smarter and arguably with GPT-4o are
00:02:18.320 | getting slightly dumber, that advanced voice mode is incredibly lifelike and I could see
00:02:23.840 | hundreds of millions of people using it, at least when it comes out, which seems to be on the
00:02:28.960 | never-never. And here's some even more important context. OpenAI back in May said that they had recently
00:02:35.680 | begun training their next frontier model. Now here we are on August the 6th. It would almost certainly
00:02:42.240 | have finished training by now, so those key players at OpenAI would have a rough sense
00:02:47.600 | for its capabilities. All of these departures after they've trained their latest frontier model
00:02:53.600 | seems strange. And then we got this about OpenAI's so-called dev day, which is starting in October
00:03:00.160 | and running through to November. I am sure it will be fascinating, but they released this particular
00:03:05.520 | nugget. While we know developers are waiting for our next big model, which we shared has begun
00:03:11.280 | training earlier this year in May, these events will focus on advancements in the API and our
00:03:17.840 | dev tools. Taking that at face value would imply that "GPT-5" will not come before November 21st.
00:03:25.840 | More likely, it means that GPT-5 wouldn't even come before the end of the year, because why would
00:03:29.920 | you release a model just after you've invited a load of devs to play about with your tools?
00:03:35.200 | Now, yes, Sam Altman has recently claimed in the Washington Post that more advances will soon
00:03:40.560 | follow and will usher in a decisive period in the story of human society. But in recent months,
00:03:47.040 | it would be hard to say that OpenAI have produced much in the way of decisive progress. And of
00:03:53.200 | course, all of that comes as Elon Musk is suing Sam Altman and OpenAI yet again for what he says
00:04:01.120 | is lying and perfidy. The lawsuit calls the original OpenAI a spurious venture. The language
00:04:07.840 | used throughout this 86-page document is hardly subtle. Musk claims that Sam Altman is doing a
00:04:14.080 | long con. His perfidy and deceit are of Shakespearean proportions. It does go on and on,
00:04:20.400 | but the basic accusation is that Sam Altman was motivated by greed and Elon Musk just wanted to
00:04:27.120 | have something more open to compete versus Google. I think Musk and others may raise an eyebrow when
00:04:33.760 | later in the article, Sam Altman said making sure open source models are readily available
00:04:38.560 | to developers in other nations will further bolster our advantage, talking of the US.
00:04:43.920 | And the question that the article fundamentally raises is who will control the future of AI?
00:04:50.640 | Well, as of today, it looks less and less likely to be Sam Altman. OpenAI are certainly good at
00:04:57.600 | productizing AI and the advanced voice mode, as we saw, is great. SearchGPT could make them
00:05:03.600 | some money and Sora's coming out at some point, presumably this year. And at least at the moment,
00:05:08.880 | the Figure 02 robot is using an OpenAI vision-language model. We'll get back to that in
00:05:14.800 | just a moment, but in terms of raw intelligence, OpenAI feel like they're falling behind.
00:05:20.720 | The Llama 3 405-billion-parameter model is already smarter than GPT-4o and Zuckerberg
00:05:28.400 | has recently committed to 10 times more computing power to train Llama 4. Or to put it another way,
00:05:34.720 | the next OpenAI model would have to be significantly better than GPT-4o just to
00:05:39.760 | catch up to the current state of the art, let alone the state of the art when Llama 4 comes out.
00:05:44.400 | And of course, in the meantime, even just this year, we might be getting Claude 3.5
00:05:48.640 | Opus from Anthropic or even Claude 4. Simply put, a year is a long time in AI and the debate has
00:05:54.960 | moved on. Even the White House are now encouraging open source competition to the likes of OpenAI.
00:06:01.200 | Sam Altman, meanwhile, is still warning about people stealing key intellectual property such
00:06:07.120 | as model weights. Now do forgive me for pointing out that one of the modules on my Coursera course
00:06:12.880 | is about that difference between open source and open weights. Super grateful, of course,
00:06:17.840 | for those 10 reviewers who have kindly left reviews for this course.
00:06:22.000 | But don't get me wrong, it's not like Meta is having it all its own way. Do you remember those
00:06:26.800 | Tom Brady, Paris Hilton chatbots that all the youngsters were apparently going to be using?
00:06:32.080 | I think each celebrity was paid something like $5 million for a few hours of recordings. Well,
00:06:37.760 | apparently they're now being scrapped and none of those AI chatbots amassed a particularly big
00:06:43.040 | following. But nor are things going particularly well for the smaller AGI labs like Character AI.
00:06:49.760 | Their product was or is an array of chatbots but they were also aiming at AGI and training
00:06:56.400 | their own foundation models. Obviously, the leaders of Character AI must have been
00:07:00.800 | somewhat disappointed by those new foundation models because essentially they've been bought
00:07:05.200 | out or hollowed out by Google. Not actually buying a rival company but taking its key talent and IP.
00:07:12.160 | But at this point you might be starting to notice somewhat of a trend. If the incrementally greater
00:07:18.240 | intelligence of new models were down to obscure tricks or arcane knowledge, then you'd expect
00:07:24.240 | smaller labs like Character AI to be doing as well as the biggest labs. But if it's all about
00:07:29.680 | sheer scale of data and compute, you'd expect the leaders to be increasingly, well, Meta and Google.
00:07:36.960 | And that is more or less what we're seeing though of course measuring that intelligence is quite
00:07:41.040 | hard. We do have the LMSys chatbot arena leaderboard in which the new version of Gemini 1.5 Pro takes
00:07:48.160 | the lead at almost 1300 Elo. But if you'll notice we have GPT-4o Mini coming third, ahead indeed of
00:07:55.680 | Claude 3.5 Sonnet, which in my own benchmark is far and away ahead. If we just relied on these Elo
00:08:03.040 | rankings, you'd think, well, if OpenAI can come up with a tiny model doing almost as good as the rest,
00:08:09.520 | they must be doing amazingly. But LMSys recently did something great, which was release a batch
00:08:14.720 | of raw data showing comparisons between the models and which one won. And I looked through
00:08:20.480 | the dozens of examples and one trend emerged. Claude 3.5 Sonnet essentially refused more requests
00:08:27.840 | than GPT-4o Mini. Even when both models couldn't perform a task like creating an image natively,
00:08:33.840 | GPT-4o Mini gave it, I guess, more of a go. It at least described the image that it would create.
00:08:39.520 | Or in this example, when the models were asked a political question, GPT-4o Mini gave a response,
00:08:46.320 | whereas Claude 3.5 Sonnet just apologized and said it wouldn't provide analysis. Now,
00:08:50.800 | I have noticed myself that Claude 3.5 Sonnet is more sensitive than any other model, but that's
00:08:56.640 | not a sign of lacking intelligence. So to the extent that we're going to call language modeling
00:09:01.600 | intelligence, Claude 3.5 Sonnet is far more capable than GPT-4o Mini. But this reticence to answer
00:09:08.400 | certain questions could explain the leaderboard rankings. Now, yes, to anyone following the
00:09:12.960 | channel, I have been testing the new version of Gemini 1.5 Pro on my Simple Bench. The final
00:09:18.960 | scores will be presented on a website that I'm hoping to release before the next video,
00:09:23.840 | but in the early testing, it performs slightly worse than 3.5 Sonnet, but far better than other
00:09:29.920 | models. So in that sense, this leaderboard position could be far more justified, at least
00:09:34.800 | than GPT-4o Mini. For those who haven't heard of my new reasoning benchmark, humans score over 90%
00:09:40.400 | quite easily, whereas models like the new Gemini 1.5 Pro version score around 25%. It would, however,
00:09:47.360 | be somewhat hypey to say that this is another step toward AGI, as one of the co-founders of
00:09:53.600 | Google DeepMind recently said. But whether you think that new Gemini 1.5 Pro is the best or
00:09:59.440 | Claude 3.5 Sonnet, certainly more and more people are now shifting their API spend away from OpenAI.
00:10:06.400 | OpenAI letting other labs have the lead for a couple of weeks could be a timing issue,
00:10:11.280 | but a couple of months just makes it seem like they don't have a reply. But as I mentioned at
00:10:16.320 | the start, at least OpenAI's models are the ones being chosen by Figure 02. These humanoid robots
00:10:22.560 | have an onboard mic and speaker, so hopefully you could chat to them like you would the new
00:10:27.600 | OpenAI advanced voice mode. In other words, seamlessly with very low latency. Brett Adcock,
00:10:33.120 | the founder of Figure, said that the default user interface to our robot will be speech.
00:10:38.960 | Apparently, the robot can work for around 20 hours straight, which is even more than me reading the
00:10:44.800 | latest AI papers. Its hands have 16 degrees of freedom and apparently human equivalent strength.
00:10:51.120 | So honestly, even though it can speak back to you, you might not want to speak back to it.
00:10:56.480 | Now, apparently these Figure 02 robots can perform certain tasks autonomously
00:11:01.120 | and self-correct and that data flywheel will be in effect. And one might well say that the argument
00:11:07.760 | that ubiquitous robot assistants will arrive before Artificial General Intelligence looks
00:11:14.000 | more plausible than ever. And speaking of a data flywheel, you may already know that AI labs,
00:11:19.680 | including OpenAI, have used Weights & Biases, this video's sponsor, to track frontier machine
00:11:25.760 | learning experiments. But what you might not know is that Weights & Biases now have Weave,
00:11:30.480 | a lightweight toolkit to confidently iterate on LLM applications. They also produce free
00:11:36.080 | prompt and LLM agent courses on their website. And if you didn't know that, you can let them
00:11:40.880 | know that you came from me by using my customized link. And the link is in the description.
00:11:46.640 | So in short, the vibe is shifting, but what isn't changing is my gratitude for you watching all the
00:11:53.360 | way to the end. If you're keen to carry on the conversation with me personally, I'd love to see
00:11:57.920 | you over on AI Insiders on Patreon. But to everyone watching, have a wonderful day.