
Sora 2 is out, and for some it will seem like a slop-feed Shoggoth; for others, a glimpse at a genuinely generalist AI. Visually, some will see it as the best that has ever come out of text-to-video, while others will see barely an improvement on Veo 3 from Google. It's up to you, of course, but I will try to focus on six elements you might not have caught from the viral release video and announcement.
So is it all a distraction, though, from physical-science breakthroughs like those promised by Periodic Labs, or from the coding supremacy of Claude Sonnet 4.5? Depends who you ask, but let's get started. First, a quick one: one detail that many may have missed is that there are actually two Sora 2s. OpenAI said ChatGPT Pro users will be able to use their experimental, higher-quality Sora 2 Pro, initially on Sora.com and then in the app as well.
But my question is: where did all the best demos come from, the ones you're going to see in this video and which went viral? Could it be that most of those were Sora 2 Pro, and that what most people will access is just the normal Sora 2? These things are incredibly expensive to run, don't forget, and OpenAI do eventually have to make a profit.
And then there's the rollout. According to the Sora 2 system card, that invitation system, which is a bit janky, is actually deliberate, to slow things down. That's maybe also why it's only for the US and Canada initially, iOS only, freemium but with limits that will actually decrease as new users join, and with no API, though apparently that's promised in the coming weeks.
All of that is deliberate and part of the safety-focused iterative rollout strategy. Then there are the inevitable comparisons, initially between Sora 1 and Sora 2, but I'm going to throw in Veo 3 demos too for reference, made via Veo 3 Preview in Gemini online and Veo 3 Quality with Google Flow. Now note that one of the leads for Sora 2 said that the model is intelligent in a way that we haven't seen with a video model before.
So they're claiming it has the best world model, you could say. Image-to-video and video-to-video are not yet allowed, although we'll get to cameos later. All of this raises the inevitable question of which model is the very best for video generation, and comparisons are really hard to state definitively.
As I've said earlier on, we don't even know whether this is Sora 2 Pro or Sora 2, and even Veo 3 has preview-quality versions, fast versions, and the main Veo 3. Also, I have seen credible leaks that Veo 3.1 is apparently going to be released in the coming week or so.
I'm also going to make a point that I think is fairly significant, which is about models generally. I'm not just talking about Sora and Veo, but even LLMs like Gemini and ChatGPT: they are unbelievably, fundamentally dependent on the data sets on which they're trained. So just because, for one particular prompt, say of a gymnast, one model is clearly better than the other doesn't mean it's better all around.
It might just have more training data in that domain. Take this game generation of Cyberpunk from Sora 2. Now, I've never played that game, but clearly, according to reports, they must have taken plenty of gameplay videos and tutorials from that game and fed them into the training data. Sora 2 can also generate anime much better than Veo 3, apparently, but again, think training data.
"You better keep that wheel steady because everyone's gunning for us." "I found out output tokens cost more than input tokens." "Yeah, apparently my words aren't worth as much as the model's." "Input tokens are the cheap seats." Then there are questions of copyright. "Transformer, that's the power of my stand." "Sydney Bing." "And, ready, begin." But that is going to have to be for another video.
I will note that certain claims I've seen online about Sora 2 mastering physics are really overstated. Take this video in particular. This was touted by one of the leads on Sora 2 as an exemplar of Sora 2 understanding physics. I'm not sure about you but the physics in this one seems more video gamey than real.
Incredible realism, but more like that of a video game. Look how he bounces off the hoop. Now, what about that almost-social-media app that OpenAI are launching, called Sora? Sam Altman said last night that he could easily imagine the degenerate case of AI video generation: one that ends up with us all being sucked into a reinforcement-learning-optimized slop feed.
Well, clearly OpenAI wanted to distinguish their app from Vibes by Meta, which was widely panned. Let me know what you think, but for many there will be nothing less vibey in the current climate than a couple of billionaires like Zuckerberg and Wang announcing the launch of a new form of social media full of, quote, AI slop. But putting vibes to one side for a moment, I do think it's a little more nuanced than that, and to OpenAI's credit they are starting with some decent differentiators.
There will be no infinite scroll for under 18s. Users will be nudged to create rather than consume. There will be watermarks both visible and invisible on all videos as well as strict opt-ins for your likeness being used. Inputs will be classified and then potentially blocked and outputs will go through a reasoning model to see whether they should be blocked.
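That input-then-output check amounts to a two-stage pipeline, which can be sketched roughly as follows. To be clear, everything here is a hypothetical illustration: the category names, function names, and keyword rules are my own stand-ins, not OpenAI's actual classifiers.

```python
# Hypothetical sketch of a two-stage moderation pipeline (not OpenAI's
# actual system): classify the prompt before generation, then have a
# second check review the finished output before display.

BLOCKED_CATEGORIES = {"graphic_violence", "unconsented_likeness"}

def classify_input(prompt: str) -> set:
    """Stand-in for a trained prompt classifier: flags categories by keyword."""
    flags = set()
    if "violence" in prompt.lower():
        flags.add("graphic_violence")
    return flags

def review_output(description: str) -> bool:
    """Stand-in for the reasoning-model review of the generated video.
    Returns True if the output may be displayed."""
    return "blood" not in description.lower()

def moderate(prompt: str, generated_description: str) -> bool:
    # Stage 1: block flagged requests before any generation happens.
    if classify_input(prompt) & BLOCKED_CATEGORIES:
        return False
    # Stage 2: inspect the finished output before it is shown.
    return review_output(generated_description)
```

In the real system, both stages would presumably be models, an input classifier plus a reasoning model over outputs, rather than keyword rules; the point is only the two gates.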
Like I said, you can't just input an image and output a video, or go from video to video, so that's blocked, and these categories are also blocked from display. So if you were hoping for some wrongdoing, you'll have to look elsewhere. Which brings me to the cameo feature, which is unique, at the moment at least, to OpenAI's Sora app.
For this feature you can't just upload a video of yourself otherwise you get a bunch of deepfakes but you have to record yourself saying things that OpenAI get you to say. That kind of proves that you are who you are and then you can insert your likeness into any new video or existing video.
This is at the moment a unique feature available for Sora 2. That's why you've been seeing all that Sam Altman content and the intention is that no one can take your likeness and make a video of you without your permission and even if one of your invited friends does that you can then delete the ones you don't like.
Given how low the bar is at the moment for deepfakes I actually commend them for setting some standards. But the real master plan can be found in Sam Altman's blog post from just 18 hours ago and that has plenty of juicy details you may have missed. They were clearly very hesitant about launching a social media app and you could see the hesitation on their faces when some of the leads for Sora 2 were announcing it.
First, apparently there are going to be periodic checks in the app on how Sora is impacting users' mood and well-being. I presume a lot of people are going to spam thumbs-up just to avoid being blocked out of the app. But then comes the centerpiece promise, which is big if true.
They will have a rule such that the majority of users, looking back on the past six months, should feel that their life is better for using Sora than it would have been if they hadn't. If that's not the case, they're going to make, quote, significant changes. Then, in brackets, and this is key:
If we can't fix it, we would discontinue offering the service. Almost like a guarantee given to forestall the criticism that they knew would be inevitable about launching a social media app. And by the way, you can direct message other people so it is social media. Taken at face value, this means that Sora does have to be net beneficial for humanity to continue.
However, let's just say that if you look at the track record, not every promise issued by OpenAI has been fully upheld. Just to take one example, the CEO of OpenAI said at its launch that, in setting up this Manhattan Project for AI called OpenAI, they would obviously comply with and aggressively support all regulation.
They now employ a whole bunch of lobbyists who are partly responsible for blocking certain pieces of regulation. My prediction is that this promise will be quietly forgotten. Now, I say all that, but I must confess that with Sora 2 and this app, my feelings are about as mixed as they possibly could be.
You will very likely be able to find me sending memes of me in certain activities to some of my friends. I think there's going to be huge entertainment value and even some practical utility. As Will DePue, one of the leads for Sora 2, said, one of the biggest bottlenecks in science at the moment is good simulators for RL.
But then we can imagine elders, and eventually ourselves, falling for this kind of slop and not being able to believe anything. Sam Altman even admits, "If you just truly want to doomscroll and be angry, then okay, we'll help you with that." But that's quite a big zoom-out.
For now, my take is that a social media app is actually quite a clever way to build a moat in an environment that doesn't have many at the moment. It is so easy to flip from Sora 2 and just use Veo 3, or soon Veo 3.1, or maybe Kling 2.5, which has just been announced.
When Seedream becomes a video generator, you could just hop to that. How do you get people to stay using your video generator? How does OpenAI make a profit? Well, if you're locked into a social media app, and all your friends are on it, and you want to use your own or their likeness but not have others use yours, well, then you have the Sora app.
So I think it quite cleverly locks you into their system. OpenAI did also claim in the launch video that Sora 2 is a step towards a generalist agent. And I get that they have to say that because the company mission is officially, we're literally building AGI. So everything has to get wrapped up into that vision.
But Sora 2 seems more like a side quest that might add XP, but isn't directly on course. Much more on course for me would be something like Periodic Labs. I mentioned in my last video how exploration and experimentation is one of the last big blockers toward a singularity, if you will.
Even if you solve hallucinations and the data problem and the modeling problem, models are still passive, they're not exploring the world. Well, Periodic Labs want them to automate science, run experiments autonomously. I interviewed one of the founders who came from Google DeepMind a little while ago for a Patreon video.
And another of their founders, William Fedus, came from OpenAI; I believe he was partly behind ChatGPT. This story, I realize, is almost the polar opposite of Sora 2, because it's immensely physical and in the real world. It's also not available immediately, but the idea, roughly speaking, is this.
If we want to, for example, come up with a room-temperature superconductor or better solar batteries, then there are a few bottlenecks. First is running enough experiments. Well, what if we could have deep learning systems predict what an experiment will yield, and then have, say, humanoid robots conduct those experiments autonomously?
That might remove one bottleneck. Then what about those terabytes and terabytes of data generated by existing experiments that LLMs can't use? What if a lab collected all of that data in an LLM-friendly format, which could then be fed into the latest model? Finally, I think we all know that there are just thousands and thousands of papers out there that we're never going to get around to reading.
So what about an AI model optimized for literature review? It could find from the literature what the most promising experiments to run are. Anyway, the big reveal is that Periodic Labs, with $300 million in funding, is going to work on all of those. Why even bring this up? Well, partly to contrast with Sora 2 and claims of being a generalist agent, but also, I guess, for those people who think all of AI is bad and it's nothing but slop.
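To make that "LLM-friendly format" bottleneck concrete, here is a minimal sketch of flattening a heterogeneous experiment record into one JSONL line of readable text per experiment. The schema and field names are my own assumptions for illustration, not Periodic Labs' actual format.

```python
import json

def to_llm_record(experiment: dict) -> str:
    """Serialize one experiment as a single JSONL line of plain text."""
    conditions = ", ".join(f"{k}={v}" for k, v in experiment["conditions"].items())
    summary = (f"Experiment {experiment['id']}: {experiment['goal']}. "
               f"Conditions: {conditions}. Result: {experiment['result']}.")
    return json.dumps({"id": experiment["id"], "text": summary})

# Hypothetical materials-science record with mixed numeric conditions.
record = {
    "id": "exp-001",
    "goal": "measure candidate superconductor resistivity",
    "conditions": {"temperature_K": 293, "pressure_GPa": 1.0},
    "result": "resistivity 0.4 ohm-cm, no transition observed",
}
line = to_llm_record(record)
```

One line per experiment means millions of records could stream straight into a training or retrieval pipeline, which is roughly what "LLM-friendly" would have to mean in practice.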
In fairness, those results aren't going to come overnight. So in the meantime, let's talk about job opportunities that you could apply for even today. The sponsors of today's video are 80,000 Hours, and in particular their job board, which you can access through a link in the description. These are jobs that are available around the world, both remote and in person, and you can see the list is updated daily.
The focus is on positive impact, and as you can see, it spans from entry-level to senior roles. Again, if you're curious, check out the link in the description. The obvious thing to say about Sora 2 is that, the moment it's out, it will forever be the worst that AI ever is at video generation.
Likewise, Claude Sonnet 4.5, which is claimed to be the best coding model in the world, although they don't fully have the stats to back that up, is, I guess, the worst that coding via an LLM is ever going to be. By the way, just on that point about them not backing it up: I do get that they have benchmarks showing that it's the best, but then there'll be other benchmarks showing that Codex is the best.
So where's the definitive proof that it is, on all metrics, the best coding model? But that's for another discussion. I've been testing Claude Sonnet 4.5 for quite a few days, and to everyone's amazement, we actually got an early result on SimpleBench, and yes, this is with thinking enabled: it was 54%.
Big step up from Claude Sonnet 4, and it does feel in the ballpark of Claude Opus 4.1 when I'm doing coding. On one benchmark at least, SWE-bench Verified, it even beats Opus 4.1, and you might say, "Well, that's also a model from Anthropic, so what's the big deal?" It's like five times cheaper.
You try using Opus 4.1 in Cursor, and you really do have to get the checkbook out. For me, this just goes to show that a few months after each new breakthrough in AI, there is a breakthrough in price, wherein the earlier breakthrough suddenly becomes as cheap as the models that came before it.
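For a rough sense of that "five times cheaper" claim, here is a back-of-envelope calculation using Anthropic's published per-million-token list prices at the time of writing ($3 in / $15 out for Sonnet 4.5, $15 in / $75 out for Opus 4.1); treat the exact figures as an assumption and check current pricing.

```python
# Per-million-token list prices in USD (assumed, at time of writing).
PRICES = {
    "sonnet-4.5": {"in": 3.00, "out": 15.00},
    "opus-4.1": {"in": 15.00, "out": 75.00},
}

def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost for a job with the given input/output token counts."""
    p = PRICES[model]
    return (input_tokens * p["in"] + output_tokens * p["out"]) / 1_000_000

# A long coding session: 2M input tokens, 200k output tokens.
sonnet_cost = job_cost("sonnet-4.5", 2_000_000, 200_000)  # $9.00
opus_cost = job_cost("opus-4.1", 2_000_000, 200_000)      # $45.00
```

At these list prices the ratio works out to exactly five, whatever the input/output mix, since both rates scale together.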
Or, to bring that back to video: there will likely be a video generation model released by some Chinese company that is as good as Sora 2, with fewer filters, and way, way cheaper, in, say, three to six months. Before we end, though, a quick word on the future, because it's almost a given that in a few years there will be a button on your TV remote that you could press to add your stored face as a selected character in any show that you're watching.
That is coming. It's just a matter of whether it's two years or four years away. Suddenly, Netflix will be all about you. But then, here's what I've been thinking about, and forgive me for the digression, but we already have models that pass the written Turing test. As in, you can't distinguish that you're talking to a model, not a human.
And then Sora 2 is much closer to passing the visual one. It's not there, despite the hype posts, unless you're visually impaired, especially gullible, or just see a couple of seconds at a glance. But I think we have to admit we are getting closer and closer to passing the visual Turing test, not being able to tell that the video we're watching is real or fake.
But what happens after we pass the visual Turing test, and then the audio Turing test, and then a somatosensory test, so that we feel artificial worlds in our nervous systems and can literally touch them? You can think of each of our senses as a benchmark that we're getting closer to crushing.
What happens when we have models that can create entire worlds from scratch in real time, that are indistinguishable from reality, according to every sense we humans have? If we can be fooled visually, why not with audio or with touch or taste? When that happens, we might look back to Sora 2 as one step along that fascinating, exciting, and treacherous path.
Let me know what you think, thank you so much for watching, and have a wonderful day! "What's up everyone, welcome to Sora 2, you finally made it! I'm so, so excited to see you here. I've been waiting all week for this moment, and it's real now, you're actually here.
Yeah, those GPUs behind me are literally on fire, it's fine, we'll deal with that later, right? Knowledge is not a destination, it's a companion for the road. And when we walk together, we learn that every question opens more doors than it closes."