Just this week we have had OpenAI tell us that superintelligence might need to be made safe within 4 years, competing lab leaders say it's decades away, and expert warnings that AI might have runaway power within 2 years. Let's try to unpack those disparate timelines, see what might speed up the timing or slow it down, show what superintelligence might mean, and end with some interesting clips that capture the moment we're in.
But the first timeline comes from Mustafa Suleyman, head of Inflection AI, speaking this week. Asked, "If it's so risky, why don't you stop?", he replied: "I think that the point of raising concerns is that we can see a moment at some point in the future, probably over a decade or two decades' time horizon, when slowing down is likely going to be the safe and ethical thing to do."
Ten years is not a long time. I find it fascinating that he talks about two decades from now when Inflection AI has just built the world's second highest-performing supercomputer, one that, as they themselves admit, represents three times as much compute as was used to train all of GPT-4. Telling the public that we have a decade or two before we have to worry about safety seems extremely conservative to me.
But what do we even mean by transformative AI or superintelligence? Well, here is just one projection of current scaling laws out to 2030 from Jacob Steinhardt of Berkeley. And here, of course, we're talking about just six and a half years away.
The projection extrapolates from future compute and data availability and the current velocity of improvement, which of course might not hold forever; some experts claim that we'll need new innovations beyond the transformer. But if current trends do scale up, here's the kind of thing that we're talking about.
Being superhuman at tasks including coding, hacking, mathematics and protein engineering; doing 1.8 million years of work in just 2.4 months; learning for 2,500 human-equivalent years in a single day; and, by training on different modalities such as molecular structures, low-level machine code, astronomical images and brain scans, gaining a strong intuitive grasp of domains where we have limited experience, including forming concepts that we do not have.
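To build some intuition for where headline numbers like these come from, here is a minimal back-of-envelope sketch in Python. The parallel-copy framing and the 5x speed multiplier are my own illustrative assumptions, not necessarily Steinhardt's exact figures; the point is simply that the totals fall out of multiplying copies by speed by wall-clock time.

```python
# Back-of-envelope arithmetic behind "1.8 million years of work in 2.4 months"
# and "2,500 human-equivalent years of learning per day".
# The copy count and speed multiplier are illustrative assumptions, not
# necessarily the figures used in Steinhardt's projection.

HUMAN_SPEED_MULTIPLIER = 5   # assume each copy works ~5x faster than a human
MONTHS_PER_YEAR = 12
DAYS_PER_YEAR = 365

def copies_needed(human_years: float, wall_clock_years: float, speedup: float) -> float:
    """How many parallel copies produce `human_years` of work in `wall_clock_years`."""
    return human_years / (wall_clock_years * speedup)

# 1.8 million human-years of work in 2.4 months of wall-clock time
work_copies = copies_needed(1_800_000, 2.4 / MONTHS_PER_YEAR, HUMAN_SPEED_MULTIPLIER)
print(f"Copies for 1.8M years of work in 2.4 months: {work_copies:,.0f}")   # ~1.8 million

# 2,500 human-years of learning accumulated every single day
learn_copies = copies_needed(2_500, 1 / DAYS_PER_YEAR, HUMAN_SPEED_MULTIPLIER)
print(f"Copies for 2,500 years of learning per day: {learn_copies:,.0f}")   # ~180,000
```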
Indeed, some research released this week shows that GPT-4 already crushes some benchmarks for creative thinking. And the median forecast for AI being better than all but the very best humans at coding is 2027. Here we have a median forecast of 2028 for AI winning a gold medal at the International Math Olympiad.
The number that I'm looking out for is getting 100% on the MMLU, a test spanning 57 different subjects. And I've actually been discussing with some of the creators of the MMLU that we might not even know the full potential of GPT-4 on this test. Officially it's 86.4%.
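As a quick illustration of what a score like 86.4% actually summarizes, here is a toy sketch of MMLU-style scoring: multiple-choice questions grouped into subjects, with accuracy reported overall and per subject. The handful of graded answers below is made up purely for illustration, and prompting choices (plus how you average across subjects) are part of why the headline number can move around.

```python
from collections import defaultdict

# (subject, model_answer, correct_answer): a made-up handful for illustration only
graded = [
    ("college_physics", "B", "B"),
    ("college_physics", "C", "A"),
    ("moral_scenarios", "D", "D"),
    ("moral_scenarios", "A", "A"),
]

per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
for subject, got, want in graded:
    per_subject[subject][1] += 1
    per_subject[subject][0] += int(got == want)

total_correct = sum(correct for correct, _ in per_subject.values())
total_questions = sum(total for _, total in per_subject.values())
print(f"Overall accuracy: {100 * total_correct / total_questions:.1f}%")  # 75.0% here
for subject, (correct, total) in per_subject.items():
    print(f"  {subject}: {100 * correct / total:.1f}%")
```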
So we've heard 20 years and 6.5 years; well, how about two? This article comes from the Boston Globe, which did a feature piece on Dan Hendrycks and the Center for AI Safety. They were behind that one-sentence letter signed by almost all of the AGI lab leaders and world experts on AI.
The journalist asked Dan Hendrycks how much time we have to tame AI. And he said, well, how long till it can build a bioweapon? How long till it can hack? It seems plausible that all of that is within a year. And within two, he says, AI could have so much runaway power that it can't be pulled back.
Seems a pretty massive contrast to Mustafa Suleyman talking about a decade or two from now. I'm going to come back to this article quite a few times, but now I want to move on to OpenAI's recent statement. This week they released this, introducing Superalignment: we need scientific and technical breakthroughs to steer and control AI systems much smarter than us.
I can just see now all the comments from people saying that that's going to be physically impossible. But moving on: to solve this problem within four years, "we're starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we've secured to date to this effort."
That is quite a remarkable statement. To their credit, they've made themselves accountable in a way that they didn't have to and that others haven't. And they're deploying one of the legends of deep learning. They say that superintelligence will be the most impactful technology humanity has ever invented. And I agree with that.
And it could help us solve many of the world's most important problems. Absolutely. But the vast power of superintelligence could also be very dangerous and could lead to the disempowerment of humanity or even human extinction. They go on: while superintelligence seems far off now, we believe it could arrive this decade.
Notice they don't say in a decade, they say this decade. They go on: currently, we don't have a solution for steering or controlling a potentially superintelligent AI, or preventing it from going rogue. And our current techniques for aligning AI rely on humans' ability to supervise AI. But humans won't be able to reliably supervise AI systems that are much smarter than us.
And so our current alignment techniques will not scale to superintelligence. I'm going to go into more detail about their plan for aligning superintelligence in another video, but here is the high-level overview. Essentially, they want to automate alignment and safety research: build an AI alignment researcher. I've read each of these papers and posts and some of them are very interesting, including automated red teaming and using a model to look inside the internals of another model.
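To give a flavor of what automating safety research can mean in practice, here is a minimal sketch of an automated red-teaming loop: one model proposes adversarial prompts, the target model responds, and a judge flags failures. This is my illustration of the general pattern, not OpenAI's actual pipeline; the attacker, target and judge callables are hypothetical stand-ins for whatever models you plug in.

```python
# A minimal sketch of automated red teaming: an attacker model proposes
# adversarial prompts, a target model answers, and a judge model flags
# unsafe answers. Illustrative only, not OpenAI's actual system.

from typing import Callable, List, Tuple

def red_team_loop(
    attacker: Callable[[str], str],     # proposes a candidate adversarial prompt
    target: Callable[[str], str],       # the model being stress-tested
    judge: Callable[[str, str], bool],  # returns True if (prompt, response) is unsafe
    seed_instruction: str,
    rounds: int = 50,
) -> List[Tuple[str, str]]:
    """Collect the (prompt, response) pairs that the judge flags as failures."""
    failures: List[Tuple[str, str]] = []
    for i in range(rounds):
        prompt = attacker(f"{seed_instruction} (attempt {i})")
        response = target(prompt)
        if judge(prompt, response):
            failures.append((prompt, response))
    return failures
```

Scaled up with real models on each side, the failures collected this way can feed the next round of safety fine-tuning, which is one flavor of what an automated alignment researcher might do.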
But the point of including this post in this video was the timeline of four years. 20% of the compute they've secured to date is many, many millions of dollars' worth of resources, so that's a serious commitment and a very strict deadline. And one of the most interesting aspects of this post came in one of the footnotes.
They say: Solving the problem includes providing evidence and arguments that convince the machine learning and safety community that it has been solved. That is an extremely high bar to set yourself. They go on: If we fail to have a very high level of confidence in our solutions, we hope our findings let us and the community plan appropriately.
That's probably one of the most interesting sentences I've read for quite a while. If we fail to have a very high level of confidence in our solutions, we hope our findings let us and the community plan appropriately. In other words, if they can't make their models safe, they're going to have contingency plans and they want the community to have plans as well.
And it is a really interesting number, isn't it? Four years, not around five years or just by the end of the decade. It does make me wonder what Ilya Sutskever thinks is coming within four years to set such a deadline. Now, apparently the prediction markets give them only a 15% chance of succeeding.
And the head of alignment at OpenAI said he's excited to beat these odds. So we've heard about one to two years and about four years. But what might slow those timelines down? The other day I read this fascinating paper, coincidentally co-authored by Jacob Steinhardt, on jailbreaking large language models.
The paper showed that you could basically jailbreak GPT-4 and Claude 100% of the time using AI. And that is fascinating to me as we approach the one-year anniversary of the creation of GPT-4. The relevance to superintelligence is that if the creators of these models can't stop them being used to commit crimes, then you would think they might have to dedicate more and more of their efforts to stopping jailbreaks rather than working on capabilities.
For obvious reasons, I'm not going to go into too much detail on jailbreaking here, but here is Claude+ from Anthropic being successfully jailbroken. I picked just about the most innocent example I could find, and yes, the same approach also worked on GPT-4. I did find one of the reasons why it works quite interesting, though. That reason is about competing objectives, where the model's compulsion to predict the next word successfully overrides its safety training.
And because those two objectives clash inside the model itself, it's not an issue that can be fixed with more data and more scale. What else might slow down the work on superintelligence? Well, lawsuits, and possibly criminal sanctions. Yuval Noah Harari recently said that AI firms should face prison over the creation of fake humans.
And he was saying this to the United Nations. He called for sanctions, including prison sentences, to apply to tech company executives who fail to guard against fake profiles on their social media platforms. Of course, those executives might well blame the AI companies themselves. But Harari said that the proliferation of fake humans could lead to a collapse in public trust and democracy.
Now it's possible for the first time in history to create fake people, billions of fake people. If this is allowed to happen, it will do to society what fake money threatened to do to the financial system. If you can't know who is a real human, trust will collapse. What's another famous roadblock to superintelligence?
Hallucinations. I've already talked in another video about how Sam Altman thinks that won't be an issue in 18 to 24 months. But here again is Mustafa Suleyman on the issue of hallucinations. Yesterday he said, "Soon LLMs will know when they don't know. They'll know when to say 'I don't know' or instead ask another AI, or ask a human, or use a different tool or a different knowledge base.
This will be a hugely transformative moment." And on that I agree: hallucinations are probably one of the biggest hurdles stopping most people from using LLMs more often. It's not about the models knowing more; it's about the moment when they bullcrap less, or stop bullcrapping altogether.
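For what that could look like mechanically, here is a minimal sketch of the pattern Suleyman describes: answer only when confident, otherwise fall back to a tool or admit ignorance. The confidence method, the context method and the search tool are hypothetical stand-ins of mine, not a real API.

```python
def answer_with_fallback(question: str, model, search_tool, threshold: float = 0.8) -> str:
    """Answer only when confident; otherwise consult a tool or admit ignorance."""
    draft, confidence = model.answer_with_confidence(question)   # hypothetical method
    if confidence >= threshold:
        return draft
    evidence = search_tool(question)                             # hypothetical retrieval/tool call
    if evidence:
        return model.answer_with_context(question, evidence)     # hypothetical method
    return "I don't know. This might need another AI, a human, or a better knowledge base."
```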
But what about things that could actually speed up the timelines to superintelligence? Going back to the Boston Globe article, one thing could be competition for military supremacy, which has already produced a startling turn to automation. And that's not just robotics and autonomous drones; it's the LLMs that might control them. Here is a snippet of a trailer for a Netflix show released today.
"A.I. is a dual-edged sword. The flip of a switch, and the technology becomes... lethal." "There is no place that is ground zero for this conversation more than military applications." "Forces that are supported by A.I. will absolutely crush and destroy forces without." "Militaries are racing to develop A.I. faster than their adversaries." "The A.I., unless it's told to fear death, will not fear death." "There is no second place in war.
If you're going up against an A.I. pilot, you don't stand a chance." If language models prove useful in war, the amount of investment that's going to go into them will skyrocket. Of course, investment doesn't always equal innovation, but it usually does. And one of the other things that could speed up timelines is the automation of the economy.
For detail on why it might, check out the paper linked above and in the description. But the high-level overview is this: as A.I. grows more capable and ubiquitous, companies will be forced to "hand over increasingly high-level decisions to A.I.s in order to keep up with their rivals." If an A.I. CEO does a better job for stockholders, how long can a company resist appointing one? And of course, it doesn't just have to be white-collar work. As Andrej Karpathy said, "Welcome to the matrix for apples." But the thing is, whether we're talking about one year, four years or six and a half, superintelligence is coming pretty soon.
And it is interesting to me that so much of society is carrying on as if it's not coming. Take the 50-year mortgages now available in the UK. How can anyone plan 50 years ahead in a world where we might have superintelligence in five? Of course, I do think we all need to start defining terms a bit better, and I've tried to do that on this channel with A.G.I.
and superintelligence. But I don't think it's quite good enough to give vague reassurances of a decade or two from now. How we're going to react when superintelligence arrives is anyone's guess. We might be crushed by the sense of inferiority, as Douglas Hofstadter recently said. Or some of us might become like curious children speaking to a wise adult.
Just the other day, I got a foreshadowing of my own reaction by speaking to Pi, the model from Inflection AI. It is designed to be extremely human-like, and the conversations can be quite startling and personal. Of course, just imagine when they're superintelligent and multimodal. Anyway, let me know your thoughts in the comments and, as always, have a wonderful day.