
For the first year of this channel, 2023, it was striking to me how few people sensed how big an impact language models would have on the world. But then in the second year, I felt that the idea of an imminent singularity and mass job layoffs had become the dominant narrative.
And in several of my videos, I tried to show that there was evidence of that being overblown, for now. Now, as you might have noticed, the vibe has reversed again, with talk of an AI bubble in company valuations being conflated, in my view, with the assertion that we are in a plateau of model progress.
So this quick video, like my last one, is again a counter-narrative. And no, not just one built on hopes for the forthcoming Gemini 3 from Google DeepMind. No, I would instead ask: what, for you, is missing from language models? What stops them from being what you imagined AI would be? Personally, I put together some categories a while back, and I'm sure you may have others.
Some would say, well, they don't learn on the fly, or there's no real introspection going on, just regurgitation. Thing is, AI researchers have got to earn their bread somehow, so there's always a paper for whatever deficiency you can imagine. I'm also going to end the video with some more visual ways that AI is progressing, as yes, it seems like Nano Banana 2 from Google may have been spotted in the wild.
But first, on continual learning, or the lack of it: that inability of the models you speak to, like ChatGPT, to properly learn about you and your specifications, and to just grow, to organically become GPT 5.5, rather than having to be pre-trained into becoming GPT 5.5. If AI were all hype, you might say, well, that's definitely going to take at least a decade to solve.
But for others, like these authors at Google, it's a problem for which there is a ready and benchmarked solution. I will, however, caveat that by saying that this is a complex paper, and despite what the appendix promises, not all the results have actually been released yet. But here is my attempt at a brief summary.
Alas, there are not many pretty diagrams, but essentially, the paper shows that there are viable approaches for allowing models to continually learn, while retaining some inbuilt discernment about what to learn. In other words, it shows that a chatbot could learn new things, like a new fact or coding skill, by storing it in its updatable memory layers, while protecting its core long-term knowledge.
As I think I mentioned, the authors are all from Google, and you might know them as the stars of the Titans architecture. And if you want a butchered analogy from Titans to nested learning in this paper, Titans is kind of like giving your social media feed one live stickied thread to remember, whereas this paper rewires the entire recommender system to learn at three different speeds, like what's hot this minute, what trends this week, and what becomes your long-term preference.
To be clear, it's not that a model using this Hope architecture (and I'll come back to that name) can't still remember what you said in its short-term memory. It's that the more enduring learning signal within millions of user conversations with that model can be extracted from the noise and stored on the fly, which is an ability that LLMs famously, or rather infamously, don't have.
ChatGPT or Gemini can't learn something from you and then apply that knowledge when speaking to me. Anyway, roughly speaking, to do this, the Hope architecture concentrates on noticing novelty and surprise, as measured by when it made the biggest prediction error, essentially flagging persistently surprising information as important and storing it deeper down.
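If you like to see ideas as code, here is a minimal, purely illustrative sketch of that surprise-gating idea in PyTorch. To be clear, this is not the paper's actual Hope or Titans implementation; the class name, slot count and threshold are my own assumptions, just to make the mechanism concrete.

```python
import torch

class SurpriseGatedMemory(torch.nn.Module):
    """Toy surprise-gated memory layer. My own illustrative sketch of the
    idea described above, not the paper's actual code; the slot count and
    threshold are arbitrary assumptions."""

    def __init__(self, dim: int, slots: int = 16, surprise_threshold: float = 1.0):
        super().__init__()
        # A small bank of writable memory slots (the "fast" memory).
        self.register_buffer("memory", torch.zeros(slots, dim))
        self.write_head = torch.nn.Linear(dim, slots)
        self.surprise_threshold = surprise_threshold

    def forward(self, hidden: torch.Tensor, prediction_loss: torch.Tensor) -> torch.Tensor:
        # "Surprise" here is simply the model's own prediction error on this input.
        surprise = float(prediction_loss.detach())
        # Read: attend over the memory slots with the current hidden state.
        attn = torch.softmax(hidden @ self.memory.T, dim=-1)   # (batch, slots)
        read = attn @ self.memory                              # (batch, dim)
        # Write: only when the input was surprising enough, so routine,
        # already-predictable content leaves the memory untouched.
        if surprise > self.surprise_threshold:
            slot_weights = torch.softmax(self.write_head(hidden), dim=-1)
            update = slot_weights.T @ (hidden - read)          # (slots, dim)
            self.memory = (self.memory + update).detach()
        return hidden + read

# Usage, with made-up numbers:
# layer = SurpriseGatedMemory(dim=64)
# out = layer(torch.randn(4, 64), prediction_loss=torch.tensor(2.3))
```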
Now, some of you might be wondering about the nested learning quoted in the title and how that relates. Well, basically, it's about the continual learning extending to self-improvement. Think of this nested learning approach as being less focused on the deep part of deep learning, which involves stacking more layers in the hope that something sticks.
That's kind of like what we do with LLMs: more layers, more parameters. Nested learning is more keen on a nested, Russian-doll approach, where outer layers of the model specialize in how the inner layers are learning. That's the nest: the outer layers looking at the inner layers. So the system as a whole gets progressively better at learning.
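To make that multi-speed, nested idea a bit more tangible, here's a toy sketch, again my own assumption-laden illustration rather than the paper's actual algorithm: a fast level updated every step, a slower level updated every hundred steps, and an outer rule that adjusts how quickly the fast level learns.

```python
import torch

torch.manual_seed(0)
fast = torch.nn.Linear(64, 64)      # inner level: updates every step
slow = torch.nn.Linear(64, 64)      # middle level: updates every 100 steps
inner_lr = 1e-2                     # the outer level tunes this

fast_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
slow_opt = torch.optim.SGD(slow.parameters(), lr=1e-4)

# Stand-in data stream; in reality this would be the live stream of examples.
stream = ((torch.randn(8, 64), torch.randn(8, 64)) for _ in range(1000))

for step, (x, y) in enumerate(stream):
    loss = torch.nn.functional.mse_loss(fast(slow(x)), y)
    loss.backward()                               # gradients land on both levels
    fast_opt.step(); fast_opt.zero_grad()         # inner level: every step
    if step % 100 == 99:
        slow_opt.step(); slow_opt.zero_grad()     # middle level: uses the gradient
                                                  # accumulated over the 100-step window
        # Outer level: a crude "learning how to learn" rule with an arbitrary
        # threshold: nudge the inner learning rate up if loss is still high.
        inner_lr *= 1.1 if loss.item() > 1.0 else 0.9
        for group in fast_opt.param_groups:
            group["lr"] = inner_lr
```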
And by the way, they did apply this to models. We'll get to that in a second. Just want to clarify at this point: this doesn't automatically solve the hallucinations problem that I did an entire video on recently. Even with nested and continual learning, the system would still be geared to getting better at predicting the next human-written word, which for me is inherently limiting.
They didn't mention this in the paper, as far as I could see, but there's nothing stopping reinforcement learning, RL, being applied to this system, as it is for LLMs: essentially learning from practice, not just from memory and from conversations. But if we added RL and some safety gating to stop its layers being poisoned by you guys spamming memes, we might have the next phase of language model evolution on our hands.
To take a practical example, a model with high-frequency memory blocks could update rapidly as it sees your code and your corrections and your specs. But then you could also have a per-project or per-codebase memory pack, so you almost get a model that's optimized just for your codebase.
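One hypothetical way that could look in practice, and I stress this is my own sketch, not any existing API, is keeping the base model frozen and just saving and swapping the small writable memory module per project:

```python
import torch

# Hypothetical "memory packs": the shared base model stays frozen, and only
# the small writable memory module is saved and reloaded per project.
# Function names and paths here are illustrative assumptions.

def save_memory_pack(memory_module: torch.nn.Module, path: str) -> None:
    torch.save(memory_module.state_dict(), path)

def load_memory_pack(memory_module: torch.nn.Module, path: str) -> None:
    memory_module.load_state_dict(torch.load(path))

# e.g. swap in the pack that has absorbed one particular codebase:
# load_memory_pack(model.fast_memory, "packs/my_backend_repo.pt")
```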
I must say I did spend a couple of hours wondering how it would get around the whole persistently incorrect information on the internet problem. It seems to me that this architecture generally hopes that there's more persistently correct data out there on a given topic. It's a bit like that objection I raised in my last video on continual learning.
Like, how do you gate what it learns? Or, to put it more crudely, I guess we should be careful what we wish for, or we might have models that are optimized for different fiefdoms across the internet. You know what, now that I think of it, it's a bit like the era of the 60s, 70s and 80s, when there were just three news organizations per country and everyone got their news from them.
And that's kind of the ChatGPT, Claude, Gemini era. And we might one day move to the social media era, where everyone has their own channel that they follow, their own echo-chamber model attuned to that group's preferences. Hmm. Anyway, back to the technical details. Being proven at 1.3 billion parameters doesn't mean it's proven at 1.2 trillion.
That will apparently be the size of the Google model powering Siri, which just has to be Gemini 3. Again, all of this is quite early, and of course I can't wait till the full results are actually published. But I just wanted to show you that there may be fewer fundamental blockers or limitations in the near-term future than you might think.
And what was that thing I mentioned at the start about models performing introspection? Well, this research happens to involve Claude, a model you may have seen featured in ads at airports: "You've got a friend in Claude." As one user noted, that is a curious contrast to the system prompt used behind the scenes for Claude on the web, which says Claude should be especially careful to not allow the user to develop emotional attachment to, dependence on, or inappropriate familiarity with Claude, who can only serve as an AI assistant.
That aside, though, a few days back Anthropic released this post and an accompanying paper, and I did a quick deep dive on Patreon. I guess the reason I'm raising it here on the main channel is to tie it into this theme that there's so much we don't even understand about our current language models, let alone future iterations and architectures.
Here then is the quick summary. We already have the ability to isolate a concept like the notion of the Golden Gate Bridge within a language model. But what happens if you activate that concept and then ask the model what is going on? Don't tell it that you've activated that concept.
Just ask, do you detect an injected thought? If so, what is the injected thought about? So far, so good. But here is the interesting bit. Before the model has even begun to speak about the concept and thereby reveal to itself through words what its own bias toward that concept is, it notices something is amiss.
It senses that someone has injected, in this case, the ALL CAPS vector, before it has even started speaking about all caps or loudness or shouting. Clearly, then, it's not using its own words to back-solve and detect what got injected. It's realizing it internally. It can self-monitor its activations internally, its own thoughts if you will, before they've been uttered.
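If you want a rough mechanical feel for how concept injection works, here is a sketch using a small open model and a forward hook. Anthropic's actual experiments use Claude and their own interpretability tooling, so treat the model choice, layer index, steering scale and the random stand-in vector below purely as placeholder assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of "concept injection" via activation steering on an open model.
# Not Anthropic's setup: GPT-2, layer 6 and the scale of 8.0 are placeholders.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# In a real experiment this vector would be derived from the model itself,
# e.g. mean activations on "all caps" text minus normal text at this layer.
concept_vector = torch.randn(model.config.hidden_size)  # stand-in only

def inject(module, inputs, output):
    # Add the concept vector to the residual stream at this layer.
    hidden = output[0] if isinstance(output, tuple) else output
    steered = hidden + 8.0 * concept_vector
    return (steered,) + output[1:] if isinstance(output, tuple) else steered

layer = model.transformer.h[6]                    # a middle block of GPT-2
handle = layer.register_forward_hook(inject)

prompt = "Do you detect an injected thought? If so, what is it about?"
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()                                   # stop injecting afterwards
```

A tiny model like this won't introspect the way the paper describes, of course; the point is only to show where the vector gets added relative to the model's own forward pass.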
Not only that, as this more technical accompanying paper points out, the models know when to turn this self-monitoring on. That's actually what Anthropic were surprised by in the research. Put simply, they have a circuit that identifies that they are in a situation in which introspection is called for and then they introspect.
For sure, this is only some of the time with the most advanced large language models like Claude Opus 4.1. But it certainly made me hesitate the last time I was tempted to mindlessly berate a model. Now there's a lot more in the paper like causing brain damage if you activate a concept too strongly.
But again, the reason why I wanted to bring any of this up is that we are still not done understanding and maximizing what we currently have before we even explore new architectures. As the domains in which language models are optimized get more and more complex like advanced software engineering and mathematics, the average person might struggle to perceive model progress.
I probably use AI models on average maybe six to seven hours a day, and I have a benchmark, SimpleBench, for measuring the raw intelligence of models, or at least that's the attempt. And I am still surprised by the rate of improvement, even without continual learning. In fact, that reminded me of something that OpenAI said a couple of days ago.
The gap between how most people are using AI and what AI is presently capable of is immense. Okay, I have a couple more demonstrations of the fact that AI is still relentlessly progressing. But first, a pretty neat segue, at least in my eyes, because it's a segue to how you might jailbreak these frontier AI models and thereby make them more secure for everyone.
Because the sponsors of today's video are the indomitable Gray Swan, linked in the description. And we actually have three live competitions to break the best models of today, as you can see, with some pretty crazy prizes. So whether nested learning makes this easier or harder, time will tell. But for now I can say that watchers of this channel have already hit leaderboards in the Gray Swan arena, which kind of makes me proud.
Again, my custom link is in the description. And this entire video has been about architectures and text and intelligence. But that's all before we get to other modalities like images, videos, maybe a video avatar that you can chat to. Whether society is prepared for all of this progress or any of it is a question I can't answer.
But regardless, did you notice that all of a sudden Chinese image-gen models seem like the best? Seedream 4.0, Hunyuan Image 3. I don't know, they just seem the best to me, especially those high-resolution outputs from Seedream 4.0. It's just really good for me. It's possibly the first time that, if someone asked me what the best image-gen model is, I'd say a non-Western model.
Hmm, maybe Jensen Huang really did mean China will win the AI race, but obviously too early to tell. And now for what some of you have been waiting for, Nano Banana 2. Yes, I do normally resist unsubstantiated rumors, but I love me a nano edit. So I'm going to go ahead and show you this.
Apparently a ton of people got access to Nano Banana 2 briefly yesterday, and I think that's what happened before the release of the original Nano Banana. So it does lend credence, and some of these images would be hard to fake. Suffice to say, it looks like Nano Banana 2 is getting pretty close to solving text generation.
Although sometimes it's a little off, with Romachia, for example. This was the website that briefly had access to Nano Banana 2, apparently. So for me, it almost doesn't matter whether there's an AI bubble; that would be about valuations, not the underlying technology. We're scaling not just the parameters, the data, or the money that goes into these models, but the number of approaches being tried to improve the state of the art in each modality.
By next year, there might be a hundred times more people working on AI research than there were three years ago. And that's why you're getting things like nested learning, continual learning and Nano Banana 2. But what do you think? Are we looking at proof of progress or proof of a plateau?
Have a wonderful day.