Less than 18 hours ago this letter was published calling for an immediate pause in training AI systems more powerful than GPT-4. By now you will have seen the headlines about it waving around eye-catching names such as Elon Musk but I want to show you not only what the letter says but also the research behind it.
The letter quotes 18 supporting documents and I have either gone through or entirely read all of them. You will also hear from those at the top of OpenAI and Google on their thoughts. Whether you agree or disagree with the letter I hope you learn something. So what did it say?
First they described the situation as AI labs locked in an out of control race to develop and deploy ever more powerful digital minds that no one, not even their creators, can understand predict or reliably control. They ask just because we can should we automate away all the jobs including the fulfilling ones and other questions like should we risk loss of control of our civilization.
So what's their main ask? Well they quote OpenAI's AGI document. At some point it may be important to get independent review before starting to train future systems and for the most advanced efforts to agree to limit the rate of growth of compute used for creating new models and they say we agree that point is now.
And here is their call: "Therefore we call on all AI labs to immediately pause for at least six months the training of AI systems more powerful than GPT-4. Notice that they are not saying shut down GPT-4 just saying don't train anything smarter or more advanced than GPT-4. They go on if such a pause cannot be enacted quickly governments should step in and institute a moratorium.
I will come back to some other details in the letter later on but first let's glance at some of the eye-catching names who have signed this document. We have Stuart Russell who wrote the textbook on AI and Joshua Bengio, who pioneered deep learning. Among many other famous names we have the founder of stability AI which is behind stable diffusion.
Of course I could go on and on but we also have names like Max Tegmark arguably one of the smartest people on the planet and if you notice below plenty of researchers at DeepMind. But before you dismiss this as a bunch of outsiders this is what Sam Altman once wrote in his blog.
Many people seem to believe that superhuman machine intelligence would be very dangerous if it were developed but think that it's either never going to happen or definitely very far off. This is sloppy dangerous thinking. And a few days ago on the Lex Friedman podcast he said this: "I think it's weird when people like think it's like a big dunk that I say like I'm a little bit afraid and I think it'd be crazy not to be a little bit afraid and I empathize with people who are a lot afraid.
Current worries that I have are that they're going to be disinformation problems or economic shocks or something else at a level far beyond anything we're prepared for and that doesn't require super intelligence that doesn't require a super deep alignment problem in the machine waking up and trying to deceive us and I don't think that gets enough attention.
I mean it's starting to get more I guess. Before you think that's just Sam Altman being Sam Altman here's Ilya Satskova who arguably is the brains behind OpenAI and GPT-4. As somebody who deeply understands these models what is your intuition of how hard alignment will be? Like I think with the so here's what I would say I think with the current level of capabilities I think we have a pretty good set of ideas of how to align them but I would not underestimate the difficulty of alignment of models that are actually smarter than us of models that are capable of misrepresenting their intentions.
By alignment he means matching up the goal of AI systems with our own and at this point I do want to say that there are reasons to have hope on AI alignment and many many people are working on it. I just don't want anyone to underestimate the scale of the task or to think it's just a bunch of outsiders not the creators themselves.
Here was a recent interview by Time magazine with Demis Hassabis who many people say I sound like. He is the founder of course of DeepMind who are also at the cutting edge of large language development. He's also the founder of the company that I'm working with and he's been working on a lot of these things for a long time.
He says when it comes to very powerful technologies and obviously AI is going to be one of the most powerful ever we need to be careful. Not everybody is thinking about those things. It's like experimentalists many of whom don't realise they're holding dangerous material. And again Emad Mostak I don't agree with everything in the letter but the race condition ramping as H100s come along is not safe for something the creators consider as potentially an existential risk.
Time to take a breath, coordinate and carry on. This is only for the largest models. He went on that these models can get weird as they get more powerful. So it's not just AI outsiders but what about the research they cite? Those 18 supporting documents that I referred to?
Well I read each of them. Now for some of them I had already read them. Like the Sparks report that I did a video on and the GPT-4 technical report that I also did a video on. Some others like the super intelligence book by Bostrom I had read when it first came out.
One of the papers was called X-Risk Analysis for AI Research which are risks that threaten the entirety of humanity. Of course the paper had way too much to cover in one video but it did lay out 8 speculative hazards and failure modes including AI weaponisation, deception, power seeking behaviour.
In the appendix they give some examples. Some are concerned that weaponising AI may be an on-ramp to more dangerous outcomes. In recent years deep reinforcement learning algorithms can outperform humans at aerial combat. While AlphaFold has discovered new chemical weapons and they go on to give plenty more examples of weaponisation.
What about deception? I found this part interesting. They say that AI systems could also have incentives to bypass monitors and draw an analogy with Volkswagen who program their engines to reduce emissions only when being monitored. It says that future AI agents could similarly switch strategies when being monitored and take steps to obscure their deception from monitors.
With power seeking behaviour they say it has been shown that agents have incentives to acquire and maintain power. And they end with this geopolitical quote: "Whoever becomes the leader in AI will become the ruler of the world." But again you might wonder if all of the research that was cited comes from outsiders.
Well no. Richard Ngou was the lead author of this paper and he currently works at OpenAI. It's a fascinating document on the alignment problem from a deep learning perspective from insiders working with these models. The author was the guy who wrote this yesterday on Twitter: "I predict that by the end of 2021, the AI system will be able to take advantage of the current technology and the technology of the AI system.
I believe that the AI system will be able to take advantage of the technology of the AI system and the AI system will be able to take advantage of the AI system. I believe that the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of the AI system and the AI system will be able to take advantage of Well many things but I have picked out some of the most interesting.
It gave an example of reward hacking where an algorithm learnt to trick humans to get good feedback. The task was to grab a ball with a claw and it says that the policy instead learnt to place the claw between the camera and the ball in a way that it looked like it was grasping the ball and therefore mistakenly received high reward from human supervisors.
Essentially deception to maximise reward. Of course it didn't mean to deceive it was just maximising its reward function. Next the paper gives details about why these models might want to seek power. It quotes the memorable phrase "you can't fetch coffee if you're dead" implying that even a policy or an algorithm with a simple goal like fetching coffee would pursue survival as an instrumental sub goal.
In other words the model might realise that if it can't survive it can't achieve its reward it can't reach the goal that the humans set for it and therefore it will try to survive. Now I know many people will feel that I'm not covering enough of these fears or covering too many of them but I agree with the authors when they conclude with this "Reasoning about these topics is difficult but the stakes are sufficiently high that we cannot justify disregarding or postponing the work." Towards the end of this paper which was also cited by the letter it gave a very helpful supplementary diagram.
It showed that even if you don't believe that unaligned AGI is a threat even current and near term AI complicate the process. It also showed that the process could complicate so many other relationships and dynamics. State to state relations, state to citizen relations, it could complicate social media and recommender systems, it could give the state too much control over citizens and corporations like Microsoft and Google too much leverage against the state.
Before I get to some reasons for hope I want to touch on that seminal book superintelligence by Bostrom. I read it almost a decade ago and this quote sticks out: "Before the prospect of an intelligence explosion, we humans are like small trees and children playing with a bomb. Such is the mismatch between the power of our plaything and the immaturity of our conduct.
Superintelligence is a challenge for which we are not ready now and will not be ready for a long time. We have little idea when the detonation will occur though if we hold the device to our ear we can hear a faint ticking sound." But now let's move on to Max Tegmark one of the signatories and a top physicist and AI researcher at MIT.
Max Tegmark: "I think the most unsafe and reckless approach is the alternative to that is intelligible intelligence approach instead. Where we say neural networks is just a tool for the first step to get the intuition but then we're going to spend also serious resources on other AI techniques for demystifying this black box and figuring out what it's actually doing so we can convert it into something that's equally intelligent.
But that we actually understand what it's doing." This aligns directly with what Ilya Sutskova, the Open AI chief scientist believes needs to be done. "Do you think we'll ever have a mathematical definition of alignment?" "Mathematical definition I think is unlikely. I do think that we will instead have multiple, rather than achieving one mathematical definition, I think we'll achieve multiple definitions that look at alignment from different aspects.
We'll get the assurance that we want. And by which I mean you can look at the behavior. You can look at the behavior in various tests, in various adversarial stress situations. You can look at how the neural net operates from the inside. I think you have to look at several of these factors at the same time." And there are people working on this.
Here is the AI safety statement from Anthropic, a huge player in this industry. In the section on mechanistic interpretability, which is understanding the machines, they say this: "We also understand significantly more about the mechanisms of neural network computation than we did even a year ago, such as those responsible for memorization." So progress is being made, but even if there's only a tiny risk of existential harm, more needs to be done.
The co-founders of the Center for Humane Technology put it like this: "It would be the worst of all human mistakes to have ever been made. And we literally don't know how it works. We don't know all the things it will do, and we're putting it out there before we actually know whether it's safe." Raskin points to a recent survey of AI researchers, where nearly half said they believe there's at least a 10 percent chance AI could eventually result in an extremely bad outcome, like human extinction.
"Where do you come down on that?" "I don't know. The point is..." "That scares me, you don't know." "Yeah. Here's the point. Imagine you're about to get on an airplane, and 50 percent of the engineers that built the airplane say there's a 10 percent chance that it's safe. And that's a 10 percent chance that their plane might crash and kill everyone." "Leave me at the gate." "Exactly." Here is the survey from last year of hundreds of AI researchers.
And you can contrast that with a similar survey from seven years ago. The black bar represents the proportion of these researchers who believe, to differing degrees of probability, in extremely bad outcomes. You can see that it's small, but it is rising. One way to think of this is to use Sam Altman's own example of the Fermi Paradox, which is the strange fact that we can't see the future.
We can't see or detect any aliens. He says, "One of my top four favorite explanations for the Fermi Paradox is that biological intelligence always eventually creates machine intelligence, which wipes out biological life and then for some reason decides to make itself undetectable." Others, such as Dustin Tran at Google, are not as impressed.
He refers to the letter and says, "This call has valid concerns but is logistically impossible. It's hard to take seriously." He is a research scientist at Google Brain and the evaluation lead for BARD. There was another, indirect reaction that I found interesting. One of the other books referenced was the alignment problem: machine learning and human values.
Now long before the letter even came out, the CEO of Microsoft read that book and gave this review. Nadella says that Christian offers a clear and compelling description and says that machines that learn for themselves become increasingly autonomous and potentially unethical. My next video is going to be on the reflection paper and how models like GPT-4 can teach themselves.
I'm working with the co-author of that paper to give you guys more of an overview. Because even Nadella admits that if they learn for themselves and become autonomous it could be unethical. The letter concludes on a more optimistic note. They say, "This does not mean a pause on AI development in general, merely a stepping back from the dangerous race to ever larger, unpredictable black box models with emergent capabilities like self-teaching." I've got so much more to say on self-teaching but that will have to wait until the next video.
For now though, let's end on this note. Let's enjoy a long AI summer, not rush unprepared into a fall. Thanks for watching all the way to the end and let me know what you think.