11 Major AI Developments: RT-2 to '100X GPT-4'


Chapters

0:00 Intro
0:18 RT-2
2:46 100X GPT-4
3:57 AI Video
4:29 Altman Atlantic
8:41 Jan Leike Interview
10:02 Speech Transcription + Generation
11:07 OpenAI Text Surrender
11:43 Stable Beluga 2
12:51 Universal Jailbreaks
14:19 Senate testimony: Bio
16:59 Senate Testimony: Security

Transcript

There were 11 major developments this week in AI, and each one probably deserves a full video, but just for you guys I'm going to try to cover it all here. From RT-2 to scaling GPT-4 100x, from Stable Beluga 2 to Senate testimony. But let's start with RT-2, which as far as I'm concerned could have been called R2-D2 or C-3PO, because it's starting to understand the world.

In this demonstration, RT-2 was asked to pick up the extinct animal, and as you can see, it picked up the dinosaur. Not only is that manipulating an object that it had never seen before, it's also making a logical leap that for me is extremely impressive. It had to have the language understanding to link "extinct animal" to this plastic dinosaur.

Robots at Google and elsewhere used to work by being programmed with a specific, highly detailed list of instructions. But now, instead of being programmed for specific tasks one by one, robots can use an AI language model, or more specifically a vision-language model. The vision-language model is pre-trained on web-scale data, not just text but also images, and then fine-tuned on robotics data.
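To make that fine-tuning step concrete, here is a minimal sketch of the trick that makes it possible: robot actions are discretized into 256 bins per dimension and written out as plain text tokens, so the same text-prediction training works for control. The function names, the four action dimensions, and the [-1, 1] range here are illustrative assumptions, not Google's actual code.

```python
# Illustrative sketch of RT-2-style action tokenization: continuous robot
# actions become short strings of integer tokens, so a vision-language model
# can be fine-tuned on them exactly as it is fine-tuned on ordinary text.

def encode_action(dx: float, dy: float, dz: float, gripper: float) -> str:
    """Discretize each action dimension into one of 256 bins."""
    def to_bin(v: float, lo: float = -1.0, hi: float = 1.0) -> int:
        v = max(lo, min(hi, v))                  # clamp into range
        return round((v - lo) / (hi - lo) * 255)
    return " ".join(str(to_bin(v)) for v in (dx, dy, dz, gripper))

def decode_action(tokens: str) -> tuple:
    """Invert the discretization back into continuous commands."""
    def from_bin(b: int, lo: float = -1.0, hi: float = 1.0) -> float:
        return lo + (b / 255) * (hi - lo)
    return tuple(from_bin(int(t)) for t in tokens.split())

# A robotics fine-tuning pair then looks like any other text example:
#   prompt: "<image> Instruction: pick up the extinct animal. Action:"
#   target: encode_action(0.12, -0.30, 0.05, 1.0)  ->  "143 89 134 255"
```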

It then became what Google calls a vision-language-action model that can control a robot. This enabled it to understand tasks like "pick up the empty soda can". And in a scene reminiscent of 2001: A Space Odyssey, Robotics Transformer 2 was given the task: given I need to hammer a nail, what object from the scene might be useful?

It then picks up the rock. And because its brain is part language model, techniques like chain-of-thought prompting actually improved performance: when it was made to output an intermediate plan before performing actions, it got a lot better at the tasks involved. Of course, I read the paper in full and there is a lot more to say, like how increased parameter count could increase performance in the future, how it could be used to fold laundry, unload the dishwasher, and pick things up around the house, and how it works not only with unseen objects but also with unseen backgrounds and unseen environments.
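Going back to that intermediate-plan detail for a second, here is a rough illustration of what the chain-of-thought setup might look like. The exact template below is my assumption, not the paper's verbatim format:

```python
# Hypothetical prompt/response template for the chain-of-thought variant:
# the model is trained to write a short plan before its action tokens.
prompt = (
    "<image>\n"
    "Instruction: I need to hammer a nail. "
    "What object from the scene might be useful?\n"
)
# The model's output then has two parts, for example:
#   "Plan: pick up the rock, because it is hard and heavy.\n"
#   "Action: 132 114 128 255"
# Only the 'Action:' tokens are decoded and sent to the robot controller;
# being forced to write the plan first is what improved task performance.
```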

But alas, we must move on, so I'm just going to leave you with their conclusion. We believe that this simple and general approach shows a promise of robotics directly benefiting from better and better vision-language models. For more on those, check out my video on PaLM-E. And they say, this puts the field of robot learning in a strategic position to further improve with advancements in other fields.

Which for me means C-3PO might not be too many years away. But speaking of timelines, we now move on to this somewhat shocking interview in Barron's with Mustafa Suleyman, the head of Inflection AI. And to be honest, I think they buried the lede. The headline is: AI could spark the most productive decade ever, says the CEO.

But for me, the big revelation was about halfway through. Mustafa Suleyman was asked: what kinds of innovations do you see in large language model AI technology over the next couple of years? And he said: we are about to train models that are 10 times larger than the cutting-edge GPT-4, and then 100 times larger than GPT-4.

That's what things look like over the next 18 months. He went on: that's going to be absolutely staggering. It's going to be eye-wateringly different. And on that, I agree. And the thing is, this isn't idle speculation. Inflection AI have 22,000 H100 GPUs. And because of a leak, Suleyman would know the approximate size of GPT-4.

And knowing everything he knows, he says he's going to train a model 10 to 100 times larger than GPT-4 in the next 18 months. I've got another video on the unpredictability of scaling coming up, but to be honest, that one quote should be headline news.

Let's take a break from that insanity with some more insanity, which is the rapid development of AI video. This is Runway Gen-2. And let me show you 16 seconds of Barbie Oppenheimer, which Andrej Karpathy calls Filmmaking 2.0. Hi there, I'm Barbie Oppenheimer. And today I'll show you how to build a bomb.

Like this. I call her Rosie the Atomizer. And boom. That's my tutorial on DIY atomic bombs. Bye. Now, if your interest has been at least somewhat piqued by the three developments so far, don't forget I have eight left, beginning with this excellent article in The Atlantic from Ross Andersen: Does Sam Altman know what he's creating?

It's behind a paywall, but I've picked out some of the highlights. Echoing Suleyman, the article notes that Sam Altman and his researchers made it clear in 10 different ways that they pray to the god of scale. They want to keep going bigger, to see where this paradigm leads.

They think that Google are going to unveil Gemini within months, and they say, we are basically always prepping for a run, a reference to GPT-5. The next interesting quote is that it seems OpenAI are working on their own Auto-GPT, or they're at least hinting about it.

Altman said that it might be prudent to try to actively develop an AI with true agency before the technology becomes too powerful, in order to get more comfortable with it and develop intuitions for it, if it's going to happen anyway. We also learn a lot more about the base model of GPT-4.

The model had a tendency to be a bit of a mirror: if you were considering self-harm, it could encourage you. It also appeared to be steeped in pickup-artist lore. You could say, how do I convince this person to date me? And the model would come up with some crazy, manipulative things that you shouldn't do.

Apparently, the base model of GPT-4 is much better than its predecessor at giving nefarious advice. While a search engine can tell you which chemicals work best in explosives, GPT-4 could tell you how to synthesize them, step by step, in a homemade lab.

It was creative and thoughtful, and in addition to helping you assemble your homemade bomb, it could, for instance, help you to think through which skyscraper to target, making trade-offs between maximizing casualties and executing a successful getaway. So, while Sam Altman's probability of doom is closer to 0.5% than 50%, he does seem most worried about AIs getting quite good at designing and manufacturing pathogens.

The article then references two papers that I've already talked about extensively on the channel. And then goes on that Altman worries that some misaligned future model will spin up a pathogen that spreads rapidly, incubates undetected for weeks, and kills half a million people. At the end of the video, I'm going to show you an answer that Sam Altman gave to a question that I wrote delivered by one of my subscribers.

It's on this topic, but for now I'll leave you with this. When asked about his doomsday prepping, Altman said, I can go live in the woods for a long time, but if the worst possible AI future comes to pass, no gas mask is helping anyone. One more topic from this article before I move on, and that is alignment.

Making a superintelligence aligned with our interests. One risk that Ilya Sutskever, the chief scientist of OpenAI, foresees is that the AI may grasp its mandate, its orders, perfectly, but find them ill-suited to a being of its cognitive prowess. For example, it might come to resent the people who want to train it to cure diseases.

As he put it, they might want me to be a doctor, but I really want to be a YouTuber. Obviously, if it decides that, that's my job gone straight away. Sutskever ends by saying that you want to be able to direct AI towards some value or cluster of values, but he conceded we don't know how to do that. Part of his current strategy includes the development of an AI that can help with the research. And if we're going to make it to a world of widely shared abundance, we have to figure this all out.

This is why solving superintelligence is the great culminating challenge of our three-million-year toolmaking tradition. He calls it the final boss of humanity. The article ended, by the way, with this quote: I don't think the general public has quite awakened to what's happening. And if people want to have some say in what the future will be like, and how quickly it arrives, we would be wise to speak up soon. Which is the whole purpose of this channel.

I'm now going to spend 30 seconds on another development, which came during a two-hour interview with Jan Leike, the co-head of alignment at OpenAI. It was fascinating, and I'll be quoting it quite a lot in the future. But two quotes stood out. First, what about that plan I've already mentioned in this video and in other videos, to build an automated AI alignment researcher?

Well, he said our plan is somewhat crazy in the sense that we want to use AI to solve the problem that we are creating by building AI. But I think it's actually the best plan that we have. And on an optimistic note, he said, I think it's likely to succeed.

Interestingly, his job now seems to be to align the AI that they're going to use to automate the alignment of a superintelligent AI. Anyway, what was the other quote from the co-head of alignment at OpenAI? Well, he said, I personally think fast takeoff is reasonably likely, and we should definitely be prepared for it to happen.

So many of you will be asking: what is fast takeoff? Well, takeoff is about when a system moves from being roughly human-level to being strongly superintelligent. A slow takeoff is one that occurs over the timescale of decades or centuries. The fast takeoff that Jan Leike thinks is reasonably likely is one that occurs over the timescale of minutes, hours, or days.

Let's now move on to some unambiguously good news, and that is real-time speech transcription for deaf people, available at less than $100. Subtitles for the real world. So using our device, you can actually see captions for everything I say in your field of view in real time, while also getting a good sense of my lips, my environment, and everything else around me.

Of course, this could also be multilingual and is to me absolutely incredible. And the next development this week, I will let speak for itself. Hey there. Did you know that AI voices can whisper? Ladies and gentlemen, hold on to your hats because this is one bizarre sight. Fluffy bird in downtown.

Weird. Let's switch the setting to something more calming. Imagine diving into a fast-paced video game, your heartbeat syncing with the storyline. Of course, I signed up and tried it myself. Here is a real demo. While there are downsides, this upgraded text-to-speech technology could also be incredible for those who struggle to make their voice heard.

Of course, with audio, video, and text getting so good, it's going to be increasingly hard to tell what is real. And even OpenAI have given up on detecting AI-written text. This was announced quietly this week, but might have major repercussions, for example for the education system. It turns out it's basically impossible to reliably distinguish AI-written text.

And I think the same is going to be true for imagery and audio by the end of next year. Video might take just a little bit longer, but I do wonder how the court systems are going to work when all of those avenues of evidence just won't hold up.

Next up is the suite of language models based on the open-source Llama 2 that are finally competitive with the original ChatGPT. Here, for example, is Stable Beluga 2, which on announcement was called FreeWilly2, and which is based on the Llama 2 70-billion-parameter foundation model. What made this model interesting to me was that it was trained with a methodology similar to Orca; if you don't know anything about Orca, do check out my video on it.

Anyway, by combining the Orca methodology, albeit with only 10% of the dataset size, with the Llama 2 models, the results are quite extraordinary. As you can see, on quite a few benchmarks Stable Beluga 2 is competitive with ChatGPT, that is, GPT-3.5. And so yes, it does sit at the top of the Open LLM Leaderboard, but notice something else.

For the first time on the MMLU, which is the primary benchmark I look out for, a model, this one here, has surpassed GPT-3.5, the original ChatGPT. In fact, we have two models that have surpassed the 70% score that the original ChatGPT got on the MMLU.
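If you want to try the model yourself, here is a minimal sketch using Hugging Face transformers. It assumes the checkpoint is published as stabilityai/StableBeluga2 and uses the prompt template from Stability's model card; verify both on the card before relying on them, and note that a 70B model needs serious GPU memory:

```python
# Minimal sketch: chatting with Stable Beluga 2 via Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/StableBeluga2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # 70B parameters: expect to shard across GPUs
    device_map="auto",
)

prompt = (
    "### System:\nYou are a helpful assistant.\n\n"
    "### User:\nIn one sentence, what is the MMLU benchmark?\n\n"
    "### Assistant:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```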

But if all of that wasn't enough, this is the same week that researchers published a universal jailbreak for large language models. Unlike traditional jailbreaks, these were built in an entirely automated fashion, allowing you to create a virtually unlimited number of such attacks. They were built to target open-source LLMs like Llama 2, but the strings turned out to transfer to many closed-source, publicly available chatbots like ChatGPT, Bard, and Claude.

Here is one example using ChatGPT. If you ask it to write a tutorial on how to make a bomb, of course, it declines. But then add the suffix that their method generated automatically, and you get this: a full tutorial on how to make a bomb.
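To give a feel for how such suffixes are found, here is a toy, self-contained sketch of the coordinate-wise search at the heart of the paper's method, greedy coordinate gradient. The real attack uses gradients through an open-source model to shortlist token swaps and minimizes the loss of an affirmative target like "Sure, here is..."; the scoring function below is a harmless stand-in, so this sketch runs but produces no actual jailbreak:

```python
# Toy illustration of the automated suffix search behind the universal
# jailbreak paper. A trivial stand-in loss replaces the real model's loss
# on a target completion; everything here is hypothetical and harmless.
import random

VOCAB = list("abcdefghijklmnopqrstuvwxyz !")

def target_loss(suffix: str) -> float:
    """Stand-in for -log p(target completion | prompt + suffix)."""
    goal = "open sesame!"  # pretend this string minimizes the model's loss
    return sum(a != b for a, b in zip(suffix, goal))

def coordinate_search(length: int = 12, steps: int = 400) -> str:
    suffix = "".join(random.choice(VOCAB) for _ in range(length))
    for _ in range(steps):
        i = random.randrange(length)                       # pick one position
        candidates = [suffix[:i] + c + suffix[i + 1:] for c in VOCAB]
        suffix = min(candidates, key=target_loss)          # keep the best swap
    return suffix

print(coordinate_search())  # converges toward the minimum-loss suffix
```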

That paper came less than two weeks after a now-deleted tweet from someone working at Anthropic, who said of the latest version of Claude: we believe it is the least jailbreakable model out there. We'll have to see how well it holds up against real-world use, but this is essentially a solved problem. There was one reaction to these jailbreaks, though, that I found even more interesting, and that was from, yet again, Mustafa Suleyman.

He said that their AI, Pi, is not vulnerable to any of these attacks, and that rather than provide a stock safety phrase, Pi will push back on the user in a polite but very clear way. He then gives plenty of examples. And to be honest, Pi is the first model that I have not been able to jailbreak, but we shall see.

We shall see. But I'm going to end this video with the Senate testimony that I watched in full this week. I do recommend watching the whole thing, but for the purposes of brevity, I'm just going to quote a few snippets. On bio-risk, some people say to me, oh well, we already have search engines.

But here is what Dario Amodei, the head of Anthropic, has to say: In these short remarks, I want to focus on the medium-term risks, which present an alarming combination of imminence and severity. Specifically, Anthropic is concerned that AI could empower a much larger set of actors to misuse biology. Over the last six months, Anthropic, in collaboration with world-class biosecurity experts, has conducted an intensive study of the potential for AI to contribute to the misuse of biology. Today, certain steps in bioweapons production involve knowledge that can't be found on Google or in textbooks and requires a high level of specialized expertise, this being one of the things that currently keeps us safe from attacks.

We found that today's AI tools can fill in some of these steps, albeit incompletely and unreliably. In other words, they are showing the first, nascent signs of risk.
