
OpenAI Flip-Flops and '10% Chance of Outperforming Humans in Every Task by 2027' - 3K AI Researchers


Transcript

If you blinked this week, you might have missed all four of the developments we're going to investigate today. From OpenAI's flip-flop on whether to maximize engagement, to doubts at the top of that company as to whether they should even be building superintelligence, and from the heating up of the battle to control quality data and the share of what Sam Altman calls the $100 trillion AI future, to this - the startling findings from the biggest survey of AI researchers to date.

I'll cover the highlights of this 38-page paper, but let's start with something I noticed about the GPT store. That's the store where you can basically create your own version of ChatGPT based on your own data, your own custom instructions, that kind of thing. Various companies are getting involved and creating their own GPTs, and I'll talk about some of the GPTs that I've looked at.

But it was actually this short paragraph in the announcement of the GPT store that I want to focus on first. It might seem like a small deal now, but in a few months or at the latest a few years, this short paragraph could have major ramifications. In the paragraph, OpenAI are describing the opportunity for builders to create their own GPTs and what they'll get out of it.

Basically, how can builders monetize those GPTs that they're creating? Well, it seems like the way OpenAI want builders to be rewarded for their GPTs is through engagement. Here's the key sentence: as a first step, US-only builders will be paid based on user engagement with their GPTs. In other words, get your users to use your GPT as much as possible for as long as possible.

And you're probably thinking, "Okay, Philip, what's the big deal with that sentence?" Well, here's Sam Altman testifying before Congress mid last year. - Companies whose revenues depend upon volume of use, screen time, intensity of use, design these systems in order to maximize the engagement of all users, including children, with perverse results in many cases.

And what I would humbly advise you is that you get way ahead of this issue. - First of all, I think we try to design systems that do not maximize for engagement. In fact, we're so short on GPUs, the less people use our products, the better. But we're not an advertising-based model.

We're not trying to get people to use it more and more. - So what might explain this flip-flop? Well, I think it's Character AI. Their platform, full of addictive chatbots, is apparently catching up to ChatGPT in the US. As of just a few months ago, they had 4.2 million monthly active users in the US compared to 6 million monthly active US users for ChatGPT's mobile apps.

It's also well known that people spend longer with Character AI than they do on ChatGPT. There's also, of course, competition from Inflection AI, who have Pi, their "personal intelligence". It's basically a chatbot that kind of wants to be your friend, and it's soon going to be powered by Inflection 2, their most powerful LLM.

Then there's Meta's universe of characters where you can, quote, "chat to your favorite celebrity" via a chatbot. So that commitment not to maximize engagement does seem to be waning. Indeed, when Sam Altman speculated for fun about which GPTs would be doing best by the end of the day, he had his own theory.

There are incredibly useful GPTs in the store, but probably "everything is waifus" runs away with it. I'm not too familiar with that, but I believe it's like AI boyfriends and AI girlfriends. Just before we move on from the GPT store, there's an experience that happened to me quite a few times.

I would test out one of the GPTs, like Write For Me, and that GPT says it can do relevant and precise word counts. I tried multiple times and every time it failed on word counts. I'm not trying to pick on one GPT, but very often using one of the GPTs wasn't really an improvement on just using GPT-4.

There was, however, one notable exception to that, which was the Consensus GPT. Obviously, this is not sponsored, but by giving relevant links that I could then follow up with, it was genuinely helpful and better than GPT-4 base. This is for the specific task of looking up scientific research on a given topic.

But now I'm going to give you a little dose of nostalgia. It's almost a year to the day since I started AI Explained, and one of my very first videos that week was a review of the original Consensus app. That video is unlisted now because it's simply not relevant anymore, but I do talk about the one-year anniversary of the channel and how it's changed on my latest podcast episode on AI Insiders for Patreon.

There's even a link to the unlisted video if you want to see what AI Explained was like back in the day. But there was a much more cryptic announcement from OpenAI at the same time as the GPT store that I'm not sure was fully intentional. The reason I say that is that it was leaked/announced by someone not from OpenAI.

It seems a bit like an inadvertent launch because a few minutes later, Greg Brockman put out a hastily written tweet, which he then edited into this one. Greg Brockman is the president and co-founder of OpenAI. The update was about GPT-4 learning from your chats, carrying what it learns between chats, and improving over time by remembering details and preferences.

You're then allowed to reset your GPT's memory or turn off this feature. I don't know about you, but it reminds me directly of my previous video on what Mamba would allow. If you're curious about that architecture, or selective state space models in general, then check out that video. But the point for this video is this.

We don't know if this announcement was intentional and we don't know what powers it. It could of course just be storing your conversations to load into the context window, but it feels more significant than that. As Brockman says, they are still experimenting, but hoping to roll this out more broadly over upcoming weeks.
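Just to make that "storing your conversations to load into the context window" possibility concrete, here's a minimal sketch of what such a memory feature could look like under that reading: save a few remembered facts between sessions and prepend them to the next prompt. Everything here, the file name, the helper functions, the example facts, is a hypothetical illustration of the idea, not OpenAI's actual implementation.

```python
import json
from pathlib import Path

# Hypothetical local store for remembered facts (illustration only).
MEMORY_FILE = Path("user_memory.json")


def load_memory() -> list[str]:
    """Return previously saved facts, or an empty list on first run."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []


def save_memory(facts: list[str]) -> None:
    """Persist the facts so the next chat session can reload them."""
    MEMORY_FILE.write_text(json.dumps(facts, indent=2))


def remember(facts: list[str], new_fact: str) -> list[str]:
    """Add a detail or preference if it isn't already stored."""
    if new_fact not in facts:
        facts.append(new_fact)
    return facts


def build_context(facts: list[str], user_message: str) -> str:
    """Prepend remembered facts to the prompt that would go to the model."""
    memory_block = "\n".join(f"- {fact}" for fact in facts)
    return f"Known about this user:\n{memory_block}\n\nUser: {user_message}"


if __name__ == "__main__":
    facts = load_memory()
    facts = remember(facts, "prefers concise responses")
    save_memory(facts)
    print(build_context(facts, "Where did we leave off on my last project?"))
```

If it really is something this simple, the value isn't in the mechanism; it's in the accumulated detail about you, which is where the moat argument comes in.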

And as the investor Daniel Gross says, it's the early signs of a moat more significant than compute flops. He shared this screenshot where you can ask GPT, "What do you know about me? Where did we leave off on my last project? And remember that I like concise responses." So why would that be a moat?

Well, just like we saw with user engagement for GPTs, it makes models more addictive, more customized to you. Why go to a different AI model, a different LLM, if this one from ChatGPT knows who you are, knows what you like, knows what projects you like to work on and what code style you use, or, more controversially, feels a bit more like your friend, remembers your birthday, that kind of thing?

If you think that's far-fetched, by the way, remember that photorealistic video avatars are coming soon as well. And of course, the natural end point of this is that these chatbots become as intelligent as or more intelligent than you. After all, that's what OpenAI have said they're working towards all along.

They want to build superintelligence, the "super" meaning beyond: beyond human intelligence. In this fairly recent interview with the Financial Times, Sam Altman said he splits his time between researching how to build superintelligence and ways to build up the computing power to do so. Also on his checklist is figuring out how to make it safe and figuring out the benefits.

And I promise I'm getting to a point soon enough here, but OpenAI have also admitted that by building AGI and superintelligence, they're going to replace human work. Indeed, their definition of AGI on their website is a system that outperforms humans at most economically valuable work.

And that's just AGI, remember, not even superintelligence. Indeed, in a blog post on superintelligence, Sam Altman and Greg Brockman said they believe it would actually be risky and difficult to stop the creation of superintelligence. Yes, it might automate human labor, but the upsides are, quote, "so tremendous", and so many people are racing to build it.

It's inherently part of the technological path that we are on. Now, you're probably wondering why I'm bringing up these recent quotes. To recap: they're building superintelligence, they have to build superintelligence, and yes, it will mean the automation of human labor. Well, here's why I bring all of that up.

It seems like there might be the first inkling of second thoughts about that plan. Just four days ago, one of the key figures at OpenAI, Andrej Karpathy, said this: the best possible thing we can be is not e/acc or EA, but all about intelligence amplification.

In other words, we should not seek to build superintelligent, godlike entities that replace humans. So that's no superintelligence, and no replacement of humans. It's about tools being the bicycle for the mind, things that empower all humans, not just a top percentile. And it's pretty hard to disagree with that.

It seems like a wonderful vision to me too, but it was quite fascinating to see Sam Altman retweet, or repost, that tweet. The obvious questions are: are you trying to build superintelligence or are you not? Are you trying to replace human labor or are you not?

You can't keep describing your latest advancements as tools, but then admit that they are going to replace human labor and eventually be more intelligent than all of us. I mean, I do think almost everyone could get behind this vision from Andrej Karpathy. It just seems to contradict some of the other stuff that OpenAI have put out.

Also, if I had a chance to ask Karpathy or Altman, I would say, how do you draw the line between a tool and something that replaces us? Making an AI model more intelligent indeed makes it a better tool, but it also makes us one step closer to being replaced in the labor market.

Giving an AI more agency and independence will make it a better assistant, but again, it makes it one step closer to being able to replace your job. Indeed, another key figure at OpenAI, Richard Ngo, thinks we'll continue hearing that LLMs are just tools and lack any intentions or goals until well after it's clearly false.

I guess I'm seeking clarity on what divides a tool from a replacement. And if OpenAI are trying to be on the same page about this, they need to state clearly where that dividing line is. Now, just quickly before we get to that AI researcher prediction paper, there's one more thing from OpenAI that I find hard to reconcile.

Last month, they did a deal with Axel Springer, a publishing house, to create new financial opportunities that support a sustainable future for journalism. And as of two days ago, it was revealed that they're in talks with CNN, Fox, and Time to get more content. And finally, The Information revealed what OpenAI is typically offering publishers.

It's between $1 million and $5 million annually. And what we also learned from this article is that the battle to control data is not just being waged by OpenAI, or even OpenAI and Google. Apple has actually launched into the fray and is trying to strike deals with publishers for the use of their content.

But the difference with Apple is that they want to be able to use content for future AI products in any way the company deems necessary. That could include, for example, imitating the style of a particular publisher or journal, or developing a model that acts as the world's newspaper.

Remember, you can customize models now, so you could have your own Fox News, your own MSNBC. You could have, for example, your own personalized AI video avatar giving you exactly and only the news that you want. What that does to society is a question for another day. But Apple are apparently offering up to $50 million for those kinds of rights.

So what's so hard to reconcile then? Well, remember, this is all about creating new financial opportunities that support a sustainable future for journalism. But Sam Altman has already said that his grand idea is that OpenAI will capture much of the world's wealth through the creation of AGI and then redistribute this wealth.

And he's talked about figures like $1 trillion and $100 trillion. In a world where OpenAI, Google and Apple are creating $100 trillion worth of wealth and profit, it seems like they would have gobbled up independent journalism, or at least the major profits of independent journalism. Indeed, $100 trillion is about the size of the entire global GDP.

And that kind of makes sense, right? If AGI or superintelligence can do the tasks of any human being, it would make sense to equate it with the size of the global economy. How that fits in with a sustainable future for journalism, I'll have to work that one out. But it's time at last for AI researchers to weigh in themselves on all of these debates.

Thousands of AI authors submitted their predictions for this paper. It's predictions about everything. Timelines, safety, economic impact, everything. And of course, as you might expect, I've read the paper in full and I'm going to give you only the most juicy bits. This paper, by the way, came out just a week ago.

But let's start with the very first paragraph. These results, by the way, come from a survey of 2,778 AI researchers. So here's the first prediction. If science continues undisrupted, the chance of unaided machines outperforming humans in every possible task was estimated at 10% by 2027. And yes, at 50% by 2047.

But let's focus on that 10% by 2027. That's a one-in-ten chance that all human tasks are potentially automatable three years from now. Now, there is one quick caveat that I'm going to add to that on behalf of the paper. Even if there is one model out there that can wire a house and solve a math competition, that doesn't mean there are instantly billions of such models.

If we're taking embodiment into account, the mass manufacturing of all of those models would take a lot longer than just a few months or years. But nevertheless, sit back and just read that prediction from AI researchers again. It's easy to get lost in the noise and the news, and next week there's going to be another model, another development, but a 10% chance that in three, maybe three and a bit, years every human task will be potentially automatable, unaided, is pretty insane.

These estimates, as they say, are typically earlier than they were when these researchers were surveyed last year. Now, you may have noticed that this sentence uses the word "outperforming", and there's a later sentence in the first paragraph talking about tasks being "fully automatable". And I'll come back to that in a moment.

And when I say a moment, I mean literally right now, because in the paper there was an incredibly stark and unjustified difference between the predictions for high-level machine intelligence, that's all human tasks, and the full automation of labor, that's human jobs, with the full automation of labor being predicted for the 2100s.

And of course, on hearing that, many of you will be asking: wait, what's high-level machine intelligence then, and what counts as all human tasks? Well, here is how they define high-level machine intelligence for this survey. High-level machine intelligence is achieved when unaided machines can accomplish every task, every task, better and more cheaply than human workers.

So not just better, but also more cheaply; think feasibility, not adoption. The only caveat is that we're assuming here that human scientific activity continues without major negative disruption. But the date by which there is apparently a 50% chance of this happening is 2047, down 13 years from 2060. So how on earth would it take us from 2047 until the 2100s to go from a machine that can accomplish every human task better and more cheaply to the full automation of labor?

Like literally how long do they think it's going to take to manufacture these robots? And don't forget the manufacturing of these embodied AIs is going to be assisted presumably by the high level machine intelligences. It's not like manufacturing is going to continue at the pace it always has. Each factory would have its own AGI helping it speed up production.

To be honest, the main result of this survey for me is that it shows that AI researchers are really not good at thinking through their predictions. Later on in the paper, the authors admit that this discrepancy between high-level machine intelligence and the full automation of all labor is surprising.

And their guesses are maybe it was the framing effect of the question, or maybe it's that caveat about the continuation of scientific progress. Maybe respondents expect major disruption to scientific progress. And there's something else from the paper that I found kind of amusing. What is the last job that AI researchers think will be fully automated?

Being an AI researcher. That's at 2063. Of course, that's the 50% prediction; some people think much earlier, some much later. Now, number one, I think it's kind of funny that AI researchers think what they do is going to be harder to automate than anything else. But number two, look at the timeline: 2063.

That's 40 years or so from now. But now think back to OpenAI, who have the goal of solving superintelligence alignment in four years. And the key point is this: their method of doing so is to automate machine learning safety research. That's right, build an AI that can do the safety research for them.

Indeed, one of the things they're working on is trying to align the model that's going to solve alignment. But the point is, look at their timeline. They think it's possible, indeed it's their goal, to create this automated alignment AI researcher in four years. They call it the first automated alignment researcher.

Now, yes, it's quite possible that they miss this deadline and it takes five years or 10 years, or maybe they do it faster than four years. But look at the kind of numbers we're talking about: four years, five years, 10 years. Now, I get that alignment research isn't all of AI research and there's a lot more to it, but these four-year, five-year, 10-year goals seem a big stretch from a 40-year expectation for the automation of AI research.

One of these two dates is going to be pretty dramatically wrong. Now for a few more interesting little highlights before we close out. The paper found that subtle differences in the way certain questions were asked radically changed the results. Apparently, if you ask people a question like "will we have superintelligence by 2050?"

You get much more conservative answers than if you ask them to give the year by which we'll have a 50% chance of superintelligence. In a related study, when people were asked about the chances of AI going wrong, those asked for a percentage said 5%, while those asked for a fraction said one in 15 million. Bear in mind that 5% is one in 20, so those two framings differ by a factor of roughly 750,000.

Now, of course, these weren't the same people, otherwise they'd be pretty bad at mathematics. These were different test groups given different versions of questions. One tranche of the researchers might get one style of question, another tranche gets a different style. But I guess the take home here is that human psychology and anchoring is still playing an immense role when we talk about timelines and predictions.

Here's another fascinating highlight. The researchers were asked about the possibility of an intelligence explosion. This is the question: some people have argued the following. If AI systems do nearly all research and development, improvements in AI will accelerate the pace of technological progress, including further progress in AI. Over a short period, less than five years, this feedback loop could cause technological progress to become more than an order of magnitude faster.

That's a period of less than five years in which technological progress becomes 10 times or more faster. Now, before you see the results, let's reflect. That would be a crazy world. It's already hard to keep up with AI. And I say that as someone who does it for a living.
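For a sense of scale, here's a toy calculation of my own (these numbers are my illustration, not the survey's): for progress to end up more than an order of magnitude faster within five years, the pace only has to compound by roughly 58.5% per year, because 1.585 raised to the fifth power is about 10.

```python
# Toy illustration (my numbers, not the survey's): how fast progress would have to
# compound each year to be running 10x faster within five years.
rate = 1.0  # relative pace of technological progress; 1.0 = today's pace
annual_factor = 10 ** (1 / 5)  # ~1.585x per year compounds to 10x over five years

for year in range(1, 6):
    rate *= annual_factor
    print(f"Year {year}: progress running at {rate:.1f}x today's pace")
```

Whether AI doing nearly all R&D would actually deliver that kind of compounding is exactly what the respondents were being asked to judge.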

Now imagine research and progress being 10 times or more faster. So is it 5% or 10% of AI researchers who think that's possible? No, it's a majority of respondents. In 2023, it's 24% who think there's an even chance of that happening, 20% who think it's likely, and 9% who think it's quite likely.

That's 53% of AI researchers who give at least an even chance to this accelerating feedback loop, a proto-singularity, if you like. At that point, it would almost be worth live-streaming my channel rather than creating videos, because it would be like every minute there's a new bit of news. Apparently, 86% of the researchers, and I would phrase it as only 86%, were worried about deepfakes.

To count, by the way, they had to view it as at least a substantial concern. I would show those guys this in January of 2024. These were made in Blender and Cinema 4D, but they look ridiculously lifelike to me. If 14% of researchers don't think that deepfakes will be at least a substantial concern, I don't know what to say to them.

And the vast majority of respondents, 70%, thought that AI safety research should be prioritised more than it currently is. An equally clear message is that timelines are getting shorter. The red line is this year's predictions and the blue line is last year's, and lines further to the left mean more proximate, closer-at-hand predictions.

This is for high-level machine intelligence. Now, I made a detailed AGI timeline of my own for AI Insiders, but suffice it to say I am well to the left of that red line. And just one final point before we leave this survey. Some people in the comments, I am sure, will point out that it only achieved a 15% response rate, similar to previous years.

Thing is, they even gave prizes for people to respond to the survey, so they tried their best. It seems most people just don't want to spend the time answering surveys, and to be honest, I can understand that. And that response rate is in line with other surveys of a similar size.

But I can't end on that technicality in the week of CES 2024 in Las Vegas. Now, I can talk about more of my highlights if you like, but there's one device that stood out to me. It's ridiculously expensive, but it shows AI can be insanely useful and fun when we want it to be.

It's these almost-five-grand, AI-powered binoculars that can identify birds while you're looking through them. I just think that's insane and super fun. And I'd get it if you dropped off a couple of zeros. Anyway, we have covered a lot in this video. I am really curious to hear what you think.

Thank you so much for watching and have a wonderful day.