Back to Index

An ‘AI Bubble’? What Altman Actually Said, the Facts, and Nano Banana


Chapters

0:00 Introduction
1:14 Sam Altman Clarification
2:30 Media Calls a Bubble (for the tenth time)
3:40 MIT and McKinsey Analysed
8:21 Incremental Progress Deceptive
12:07 Reasoning Breakthroughs
15:31 CEOs might not know their products
17:25 But did stocks go down?
17:31 Media is Contradictory of course

Transcript

To some, the release a few hours ago of the Google image-editing upgrade, codenamed Nano Banana, will already be proof that we're not in an AI bubble. You can see the mostly accurate new shadows and the attention to detail, like the little metal parts on the bench. In AI Studio, though, I asked for one flipper against its head, holding an iPhone, and things weren't quite as good.
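If you want to try that kind of edit yourself rather than take my word for it, here's roughly how you'd call the model from Python with the google-genai SDK. Treat it as a minimal sketch: the model id, filenames and prompt are my assumptions, and the response handling follows the public docs at the time of writing.

```python
# Minimal sketch of editing an image with Gemini's image model ("Nano Banana").
# Assumptions: `pip install google-genai pillow`, GEMINI_API_KEY is set, and the
# model id below is the current image-editing preview (check the docs).
import os
from PIL import Image
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
source = Image.open("bench_photo.png")  # hypothetical input image

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",  # assumed "Nano Banana" model id
    contents=["Add soft afternoon shadows; keep the metal bench fittings intact.", source],
)

# Edited image bytes come back as inline-data parts alongside any text parts.
for part in response.candidates[0].content.parts:
    if getattr(part, "inline_data", None):
        with open("edited.png", "wb") as f:
            f.write(part.inline_data.data)
```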

You could say it's a true Photoshop replacement if you don't look too closely. But on the AI bubble point, for those of a more factual inclination, there is the rest of the video. Because this is about the coming months and years of our lives, and the CEO behind ChatGPT did seem to say that AI is a bubble.

And there was indeed an MIT study published in the last few days that claimed that the vast majority of AI investments were yielding zero return. Stocks did go down. Context got lost. But I've read a dozen papers, studies and articles released on the topic of an AI bubble in the last couple of weeks.

So I've prepared eight points to consider as we all evaluate whether AI models have indeed prompted us into a bubble. And yes, have no fear, Nano Banana will return. Point number one is that Sam Altman didn't actually say he thought AI was a bubble. That was an editorialized summary by the Verge journalist of his statement that investors are, quote, overexcited about AI.

Sam Altman said, 'When bubbles happen, smart people get overexcited about a kernel of truth.' On AI, he said, 'Are we in a phase where investors as a whole are overexcited about AI? My opinion is yes. Is AI the most important thing to happen in a very long time?' He added, 'My opinion is also yes.'

We'll get to whether CEOs like Sam Altman actually understand their tools or can be trusted in their statements. But let's first have some context about his observation of investor overexcitement. You've got to remember that Sam Altman's former chief scientist at OpenAI, Ilya Sutskever (who once fired Altman), then left to form Safe Superintelligence, valued at $32 billion with no product.

Maybe that's why Sam Altman thinks there's investor overexcitement. Then there's Sam Altman's former CTO, Mira Murati, who was briefly his replacement as CEO when he was fired, and who now runs Thinking Machines Lab, valued at $12 billion, also with no public product. I wouldn't be surprised if that's also what he's referring to when he talks about investor overexcitement.

Point two is that the media can't claim to have foreseen an AI bubble if they have predicted a bubble every month for years. Let's take the Wall Street Journal, which a year ago reported that OpenAI's revenue was at least $2 billion and that the company thought it could double that amount by 2025, this year.

They added that that is still a far cry from the revenue needed to justify OpenAI's now nearly $90 billion valuation. Now remember that: a prediction of $2 billion going to $4 billion this year. And here's the Washington Post, earlier, pooh-poohing the coverage and excitement about OpenAI hitting 100 million users.

The headline was about an AI hype bubble, and they said that this was just website visits, not official monthly active users. Okay, well, what's the update as of August 2025? OpenAI has hit 700 million weekly active users, so much for 100 million being overblown, and it's hit $12 billion in annualized revenue, so much for that $4 billion projection.

I do appreciate some of the analogies to the dot-com bubble, but while pets.com is a dead link today, I don't think chat.com will be 20 years from now. Point three, there have been three recent studies that have added fuel to the claim that AI is a bubble. Yes, I've read them all, but for your own sanity, I'm going to massively condense the findings into a series of pithy comments, as is my style.

The first is a McKinsey study, cited in the New York Times, finding that 8 in 10 enterprises are not seeing measurably increased profit as a result of company-wide AI projects. The detail there is important: company-wide AI projects. First problem: the actual study cited was done during the pre-reasoning paradigm of AI, in mid-2024.

Problem two: the whole McKinsey study read as more of an advert for the case studies where AI did deliver a big profit boost, which coincidentally were ones where enterprises had worked with McKinsey. Of possible relevance here is the fact that AI consulting is certainly boosting McKinsey's profits; see this Wall Street Journal article on that work making up 40% of McKinsey's revenue.

The second most quoted study in the headlines was one from MIT, which was more thorough and seemingly more recent. They surveyed 153 senior leaders at 52 organizations. Now it does say in fairness that just 5% of enterprise projects are getting all of the value and the vast majority make no impact, positive or negative, on the bottom line.

The ones that do well, apparently, are those that focus on business outcomes rather than software benchmarks. But the far more interesting part is the more nuanced point that the study is making, which the headlines miss. I think you guys will quite like this, because the paper says that official enterprise initiatives (that's why I emphasized company-wide projects earlier) remain stuck on the wrong side of the GenAI divide.

Employees, on the other hand, are already crossing it through personal AI tools. I think the next sentence is crucial. This shadow AI often delivers better return on investment than formal initiatives. Translated, when people like you use your own AI workflows to boost productivity, you become more reluctant to use the company AI tools.

Furthermore, this shadow usage creates a feedback loop. Employees know what good AI feels like, making them less tolerant of static enterprise tools. In short, the benefits you do get from using AI in your work is often invisible in the data. Bosses might ignore that when it comes to looking at who to hire and fire.

Or, in more formal terms, the Wall Street Journal puts it like this, in an article titled 'AI's Overlooked $97 Billion Contribution to the Economy'. And that, by the way, is for 2024; it might be closer to a trillion dollars this year. What the hell is this invisible surplus? Well, it's what you would be willing to pay to access AI, above and beyond what you actually pay.

Think, in other words, of what you would pay for your current tools not to be taken away from you. That might be slightly more than the $0 or $20 a month. That's the consumer surplus that we're talking about. Add that up for an entire economy and you do get $100 billion, maybe hundreds of billions of dollars.
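To make that arithmetic concrete, here's a toy version of the calculation; every number below is invented for illustration, not taken from the WSJ piece.

```python
# Toy consumer-surplus sum: (what you'd pay) minus (what you do pay), across users.
# All figures are hypothetical, purely to show how the scale is reached.
segments = [
    # (users, willingness_to_pay_per_month_usd, price_paid_per_month_usd)
    (300_000_000, 15, 0),   # free-tier users who'd pay something not to lose access
    (30_000_000, 60, 20),   # subscribers who'd pay more than the $20 they do pay
]

annual_surplus = sum(users * (wtp - paid) * 12 for users, wtp, paid in segments)
print(f"Toy annual consumer surplus: ${annual_surplus / 1e9:.0f} billion")
# ~ $68 billion with these made-up inputs; modest per-user gaps times hundreds of
# millions of users is how you get into the hundred-billion-dollar range.
```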

Back to the bubble point: that means if your definition of AI progress was insane GDP acceleration, as one of the only remaining OG members of OpenAI's board, Adam D'Angelo, predicted would occur (he expected up to 50% growth per year), well, if that's your definition, then that perception is a bubble that has burst.

Before we leave that MIT study behind, though, two last quick clarifications. I think it's worth the time, given the millions of people who read the headlines that came out of it. The first is that if you dig into the appendices, you'll see it was asking about projects beginning in January 2024.

You may remember that it was September of 2024 that the reasoning paradigm burst onto the scene with o1-preview from OpenAI. Coding with AI a year ago was pretty awful, but now it's a completely different story. Still not perfect, but way better. And yes, I do say that in the full knowledge of that METR report showing that, on massive code bases with early-2025 models, the impact is pretty mixed for coders.

If you're curious about that, check out my interview with the lead author of that study on Patreon. Second quick clarification, the authors of the MIT study were encouraging the use of an AI agent framework, NANDA, so had some reason to suggest businesses were not yet deriving tremendous value, a bit like McKinsey.

The study did make brilliant points about the current lack of memory and adaptability in real time of models, but offered its framework as a potential solution to those very problems. Just something to bear in mind when you read the headlines. Fourth bit of context on the AI bubble question, which is on the nature of incremental progress.

If you look at each week's progress in AI, it can seem like small steps forward. Somewhat akin to this new Helix walking demo from humanoid robot maker Figure AI: slow, but somewhat ominously inevitable, progress. Everything seems incremental, but on the other hand, if you were only shown the progress in AI at the end of each year, I think you'd be less inclined to think of AI as a bubble.

And on that year-end point, I don't just mean on benchmarks like my own, SimpleBench, which Elon Musk seems to have picked up on as a marker of Grok 4's improvement. You might not know, but SimpleBench is basically a private test of logical reasoning, and each generation of models does outperform the last.

Except for Claude 3.7 Sonnet randomly, but anyway. No, I'm talking about things like the MMMU, testing models' ability to navigate charts, tables, and technical diagrams at almost expert level. You could literally just take a snapshot at the end of each year. Let me try and do this. It's the bloody American dates.

What is that? 23rd of July? Okay, so we've got, what is that? 38%. Then we go forward a year, and we're on 68%. Forward a year, and we are on 83%. I notice on the left, it says Ensemble of Human Experts, Medium, 82.6, so we're ahead of that, and Top Human Experts, 88.6.
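Laying those snapshots out, roughly one per year as read off the leaderboard (dates approximate), next to the human baselines quoted there:

```python
# MMMU scores as read off the leaderboard in the video; dates are approximate.
snapshots = [("2023", 38.0), ("2024", 68.0), ("2025", 83.0)]
human_medium, human_top = 82.6, 88.6   # expert baselines quoted on the leaderboard

for (y0, s0), (y1, s1) in zip(snapshots, snapshots[1:]):
    print(f"{y0} -> {y1}: +{s1 - s0:.1f} points")
best = snapshots[-1][1]
print(f"vs medium expert ensemble: {best - human_medium:+.1f}")
print(f"vs top human experts:      {best - human_top:+.1f}")
# Output: +30.0, then +15.0; +0.4 above the medium ensemble, -5.6 below top experts.
```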

Trust me, I know some of these benchmarks are brittle, but if I only gave you those three data points, you'd be like, hmm, wow, that's pretty good. To be honest, I kind of think of benchmarks as a bit like being photos of a music concert. They can give you a snapshot in time of model performance, not really an immersive experience of it.

What would you have said this time last year about Genie 3, which enables you to step into and explore your favourite paintings, as one commentator said? Sometimes I think we're all so dopamine addicted to the next release, that if two weeks go by without something major, we're like, man, AI winter, it's all over.

I'm not even going to get to the near-expert-level speech transcription we now have in 2025, Veo 3, lifelike speech generation, song generation, or even real-world impact. Well, I kind of lied, I am going to briefly touch on real-world impact. Already, systems like AlphaEvolve from Google have saved Google 0.7% of their worldwide compute resources.

The key was that the language model could get feedback, so it kind of knew when it was hallucinating or when it was on to something good, and it could iterate rapidly. There was so much real-world impact from its automated solutions that the bottleneck actually became manual experimentation. When I looked back for this video, I couldn't believe that that paper was from May; it felt like years ago.
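AlphaEvolve's actual code isn't public in this form, so here is only my own toy sketch of the propose-score-keep loop the paper describes, with a random mutation stub standing in for the LLM's proposed edits and a made-up objective standing in for the real compute measurements.

```python
# Toy evolutionary loop in the spirit of "propose, measure, keep what improves".
# In AlphaEvolve the feedback came from actually evaluating candidate programs;
# here a simple numeric objective stands in for that measurement.
import random

def evaluate(schedule: list[float]) -> float:
    # Hypothetical cost to minimise (stand-in for measured compute cost).
    return sum((x - 0.7) ** 2 for x in schedule)

def propose_variant(parent: list[float]) -> list[float]:
    # Stand-in for the LLM proposing an edited candidate.
    child = parent[:]
    child[random.randrange(len(child))] += random.gauss(0, 0.1)
    return child

best = [random.random() for _ in range(8)]
best_cost = evaluate(best)
for _ in range(2000):
    candidate = propose_variant(best)
    cost = evaluate(candidate)
    if cost < best_cost:            # the feedback signal: keep only measurable wins
        best, best_cost = candidate, cost

print(f"Final cost after 2000 proposals: {best_cost:.4f}")
```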

And stepping back, there's just so much to take in each week that we even ignored the official passing of the Turing test in March. Humans literally could not tell, in written conversations, whether they were speaking to another human or the recently retired GPT-4.5. Speaking of seeing the human behind the screen, that reminds me of something, and I'm going to do something that I very, very rarely do on the channel, which is talk about something that isn't all about AI, just for maybe 30 seconds, because the other day I met up with 11 young medical students on electives from Palestine.

They were incredible people, and most of them are using language models, by the way, to revise and become better doctors. They are from the West Bank, so they do currently have access to basic resources like the internet and food, which is not the case in Gaza. So massive shout out to the Jenin lads who might be watching, and any of you guys that I might see at the next protest march in London.

Now for point five, and if you thought that so far in this video, every bit of AI that I've mentioned was just a flashy demo, well, then you have to have an answer for the reasoning breakthroughs of the last 10 months. Because in mid-2024, it appeared to be the academic consensus that models couldn't reason.

What you're seeing is a classic Blocks World challenge, in which the goal is to get the red block on top of the blue block. And people said, well, yes, models could memorize word sequences to solve such basic Lego-like scenarios. The data backed that up, and you can see Gemini 1.5 Pro and GPT-4o scoring around half or less on this challenge.

Back then, though, if you switched the words around, so the logic was the same but the words didn't really make sense anymore, model performance dropped off a cliff, as you can see. That's the Mystery Blocks World challenge. This was taken as pretty nailed-on proof that language models couldn't reason and would never be able to reason.
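To make the word-swapping idea concrete, here's a toy version of the obfuscation: rename the domain vocabulary consistently so the surface words become nonsense while the underlying logic stays identical. The nonsense mapping is mine, not the paper's.

```python
# Toy "mystery" obfuscation: consistent renaming preserves the logic of the task
# while removing any memorised surface wording. Vocabulary below is hypothetical.
MAPPING = {
    "pick up": "sprock", "put down": "unsprock",
    "block": "flarn", "table": "grommet", "stack": "quiddle",
    "red": "zib", "blue": "vop",
}

def obfuscate(prompt: str) -> str:
    out = prompt.lower()
    for word, nonsense in MAPPING.items():
        out = out.replace(word, nonsense)
    return out

print(obfuscate("Pick up the red block from the table and stack it on the blue block."))
# -> "sprock the zib flarn from the grommet and quiddle it on the vop flarn."
```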

This study was cited by, among others, Yann LeCun. Here's an example, by the way, of this more abstract challenge, where you have to understand the patterns behind the words, not just memorize the next word. Don't know about you guys, but I would find this really quite hard. Then, somewhat out of the blue for the authors, one of whom I interviewed, along comes o1-preview, which gets almost 53% on this Mystery Blocks World.

The authors had to rename language models to language reasoning models, because they could, in fact, decode such jumbled abstractions. Likewise, this challenge, ARC-AGI-1, which I did a video on back in the autumn, in which models couldn't predict what would come next in this sequence. It held out for almost six years, and even, by the way, held out against o1-preview, but fell to o3, the precursor to GPT-5.

The author of that benchmark says it's not just brute-forcing it: these capabilities are new territory, and they demand serious scientific attention. François Chollet, by the way, now believes that fully human-level AI will arrive by 2030. But the point is this: we've come a long way since ChatGPT in November 2022, and it's easier to point to flaws in current models than to design a benchmark that will last even 18 months at the current rate of AI progress.

If you think you've found a slam dunk thing that AI can't do, make a benchmark of it and see if it lasts 18 months. Because whatever layer of abstraction you add, LLMs seem able to climb up to it eventually. Now, it's true that deriving new physics or inventing brilliant new literary genres may require a mountain of such layers, something that's a real uphill climb to incentivise in training.

The rich air, you could call it, of genius-level data gets sparser, I think, the further up you go. There's another problem, though. These language models are preternatural pattern finders, but their intelligence is not perfectly analogous to human intelligence, and it is not as efficiently derived. Thank God a baby doesn't demand the same kind of energy, or at least wattage, as the frontier models do to train.

And prompting models at inference time draws on the same resources as training them and experimenting with new ways to train them. It all comes from the same compute, so AI labs have competing demands on bottlenecked resources. Serving more people sometimes literally comes at the cost of making a smarter model.

But while we're on Sam Altman, let me turn to point number six, which is that the CEOs of these labs have so much to do these days that I wouldn't be surprised if they have somewhat lost touch with the models they're creating. And it's not just Sam Altman who oscillates between saying that AI might generate OpenAI $100 trillion and saying that there's investor overexcitement, as you saw at the start.

Between saying that he feels the AGI with every release and admitting that the GPT-5 rollout was a fiasco. You know what, though? It's not only him. You've got Sundar Pichai, the CEO of Google, who I distinctly remember saying that progress would be slower in 2025 than it was in 2024.

But then Google DeepMind went beast mode after that and started releasing eye-opening models almost weekly from around, I'd say, mid-June. He clearly didn't know what was coming, or he would never have said that. Then there's the CEO of Anthropic, Dario Amodei, who said recently, 'I get really angry when someone's like, this guy's a doomer, he wants to slow things down.'

Hmm, why would people say that he wants to slow things down? Well, let's look at a transcript of one of his earliest interviews on the subject, in 2023. Sorry for the transcript text, but Amodei repeatedly said that he didn't want AI acceleration. You may not know this, but Anthropic had a model, Claude, before ChatGPT and could have released it first.

Just after that decision, he said, yeah, we didn't want a big, loud public release that might accelerate things so fast that the ecosystem might not know how to handle it. Later on, he said, given the rate at which the technology is progressing, there's a worrying aspect. On balance, I'm glad that we weren't the ones who fired that starting gun.

I could go on and on about the changes in sentiment of the CEOs, but let's just cut to it. My only point is that I would forgive a lot of people for thinking that the CEOs would be the most informed about these AI models, but they're often not.

Pay more attention to the top researchers at these companies, not necessarily the executives of the companies. Seventh, if you remember, I started this video by saying that stocks went down and that's what those articles were about. Well, that's true, but they're back up again. Stocks are going to do what stocks do.

Which brings us to my eighth and final point, because even researchers don't know how many layers of abstraction an LLM can be made to think in. And so the leadership of those companies definitely doesn't know. And the media are running around like headless chickens: the very same outlets that argue that AI is a bubble report that the job impacts will soon be grave.

Of course, I get that these news organizations are not monolithic, but there's somewhat of a contrast in those two points. But honestly, even for those of us who pay attention, for every benchmark that shows an incredible new ability at increasing accuracy, like being able to predict the future (you can't cheat on that; see Prophet Arena, which currently has GPT-5 High as the best at predictions),

there are papers at the same time, like this one, exposing how basic visual tricks fool even the best LLMs of today. Will the gold-medal-accumulating unreleased reasoning model of OpenAI suffer from the same hallucinations? Or perhaps the forthcoming Gemini 3? We shall probably know by the autumn, which may bring us a new step change in performance as last autumn did, or may not.

I would certainly be skeptical of anyone who says they are certain either way. Anyway, that's enough from me. Thank you so much for watching. Let me know in the comments if you think AI is a bubble. I'm off to play with the image editing tool from Gemini. Nano Banana.

Have a wonderful day.