OpenAI's Sora
00:00:00.000 |
There are occasionally these moments in AI, particularly in the past few years, 00:00:05.680 |
where I see something that is completely unexpected and just seems so incredibly 00:00:15.920 |
mind-blowing and far beyond where I would expect AI to be. Now the first time I had that 00:00:23.360 |
mind-blown feeling with AI was when I was getting ready to do a talk with some engineers from OpenAI 00:00:32.320 |
and they showed me what would have been prompt engineering at the time. I don't think it was 00:00:36.880 |
called prompt engineering back then, but we prompt engineered GPT-3 to do RAG and answer questions 00:00:43.360 |
in different ways, and it was really far beyond where I thought AI was at that point. And now 00:00:53.120 |
OpenAI have released something called Sora and it's just insane. I haven't had a good look at it 00:01:00.640 |
yet. I wanted to, given how rare these moments are where I'm truly just mind blown, I saw one video 00:01:10.160 |
from this and thought okay now it's time to turn the camera on and just take a look together and 00:01:17.200 |
see what it's like. So Sora is an AI model that can create realistic and imaginative scenes from 00:01:25.040 |
text instructions. And then this: all videos on this page were generated directly by Sora 00:01:31.200 |
without modification. I'm sure these are some of the best videos they generated but nonetheless, 00:01:37.040 |
so this background image here is already pretty cool. Let's keep going. This is a video where I 00:01:46.080 |
started watching and thought oh wow this is kind of insane. So this is AI generated. 00:01:52.880 |
I mean it's insane. So there was the odd thing that was weird but it's just incredible how good 00:02:02.320 |
like the background is perfect. It doesn't seem to, there's nothing weird going on. The mood, 00:02:08.640 |
like the person is actually moving through the scene which you don't usually get with AI generated 00:02:16.560 |
videos. It's usually kind of like they're moving slightly and you know the background is maybe 00:02:21.280 |
moving a little bit but this is like insane. The amount of detail and the amount of movement is 00:02:25.600 |
just, I don't even know. I noticed earlier that the legs get a little weird around, 00:02:35.760 |
it's kind of like the hands in Stable Diffusion. Now it's legs. Around here the leg kind of swaps, 00:02:41.600 |
it's super weird. Oh look at that, her left leg became her right leg which is interesting. It's 00:02:50.800 |
hard to even notice that but this is just insane. I even, like you look at here and you look at the 00:03:00.240 |
jacket and it has these, you know, these four buttons here. You go ahead, I'm just trying to 00:03:04.640 |
find anything that's kind of odd but even here it's like the same four buttons, same jacket, 00:03:10.640 |
maybe oh this is kind of long and big here. Okay so this grew over the video but like, gosh I'm 00:03:19.680 |
really like pointing out these very minor little things. It's just insane. Then you look at the 00:03:26.320 |
prompt and it's not, you know, it's relatively short, it's a paragraph. I mean it's nothing 00:03:31.440 |
crazy, and that paragraph of text produced this. Gosh, I'm gonna have so much good stock video for 00:03:40.240 |
my videos now. This one I saw briefly, I thought this was less impressive but I mean it's so good. 00:03:46.240 |
And then this one, guy with a woolly, it's just like a, I don't even know, it's a bit weird and 00:03:52.640 |
kind of all over the place but it's really pretty cool. I mean look at the detail on the guy and 00:04:01.600 |
then this as well, like it just looks real, no? Am I, like it just looks real. The guy looks real, 00:04:11.440 |
like there's nothing here that doesn't look legit. Photorealistic video of pirate ships. 00:04:23.280 |
How good is that? Like looking at this, would I think it is AI? Like the boat is being weird and 00:04:37.360 |
kind of going a little crazy but would I, if this wasn't on OpenAI's website, would I have 00:04:42.400 |
looked at this and thought this is an AI video? I don't, I'm pretty sure I wouldn't. Okay so then 00:04:47.760 |
Sora is becoming available to red teamers, so it's still very restricted, like it isn't released yet. 00:04:53.520 |
We're also granting access to a number of visual artists, designers and filmmakers to 00:04:57.200 |
gain feedback on how to advance the model to be most helpful for creative professionals. 00:05:01.440 |
How insane, look at this. So this, historical footage of California during the gold rush, 00:05:12.560 |
I mean the prompt is tiny but I would never, if you'd have shown me this two days, a day ago, 00:05:20.480 |
I would be like oh look at this cool, like how did they film this? Were they on a balloon? 00:05:25.120 |
I would have no idea. Things like the human eye are, because we're, I think biologically we're 00:05:36.320 |
so able to, like we know what an eye looks like more than anything else, right? We can read an eye 00:05:43.040 |
and understand it so well and the fact that I look at this and I don't think I can tell that it's not 00:05:53.280 |
real. And it's probably the feature that I as a human should be able to distinguish from 00:06:02.560 |
reality the easiest of anything. That should be like the hardest thing to convince me. Maybe it 00:06:09.600 |
moves a bit weirdly, but the eyeball I mean, but really, I mean it's insane. I mean this is going 00:06:21.520 |
to be like the new AI generated photos where you, after a little while, this is interesting. 00:06:28.960 |
So like the people here are tiny and then all of a sudden these people are huge. 00:06:34.080 |
Oh wow, this is, I mean it's not supposed to be like that of course, but it's interesting. 00:06:43.040 |
Yeah, this, like creating multiple shots within a single generated video where, 00:06:49.040 |
you know, the characters remain the same. How, I mean I don't know how 00:06:57.200 |
they can, it's just so impressive. Like it's kind of weird here, like the guys, 00:07:01.440 |
yeah, the perspective is strange. Yeah, that's interesting. So the perspective seems to mess up 00:07:10.400 |
more, more often. That's like a strange thing that it has going on here. Also in the earlier 00:07:17.280 |
video in Lagos in Nigeria, the perspectives were kind of messed up. Same here. Oh my gosh, 00:07:23.440 |
look at this. How cool is that? It looks like a film. There's some weird stuff going on here, 00:07:31.600 |
I feel. Like what is, this guy has like three trainers on or something. So, so it has its 00:07:40.000 |
weaknesses. It may struggle with accurately simulating the physics of a complex scene 00:07:44.080 |
and may not understand specific instances of cause and effect. For example, a person might 00:07:50.880 |
take a bite out of a cookie, but afterward the cookie might not have a bite mark. The model may 00:07:56.480 |
also confuse spatial details of a prompt. For example, mixing up left and right and may struggle 00:08:01.600 |
with precise descriptions of events that take place over time, like following a specific camera 00:08:07.440 |
trajectory. Yeah, this is interesting. So yeah, let's have a look at these. This is interesting. 00:08:15.840 |
Oh yeah, so like people and animals just appear. Oh yeah, that's interesting. 00:08:23.520 |
Oh, look at that. That's so cool though, at the same time. 00:08:28.240 |
Archaeologists discover a generic plastic chair in the desert, excavating and dusting 00:08:36.400 |
it with great care. So here it's like they are, 00:08:43.840 |
what's happening here? So here they're taking, what is this? How, how insane. 00:08:55.200 |
So, and then the guy's hands as well, like messed up. But this is, I mean, despite how weird it is, 00:09:06.000 |
it looks so, it feels like I'm watching a, I don't know, like a dream or something. 00:09:13.280 |
And then the actual people themselves, they're pretty impressive. And this is the first version, 00:09:21.200 |
this is just insane. Okay, so the red teaming is so that people will basically stress test 00:09:28.800 |
the model, make sure it's not going to do anything weird, and they'll probably go over the top with 00:09:34.000 |
it, but what can you do? Oh, I imagine how good all the UFO videos will be now. That's exciting. 00:09:42.560 |
Oh, it's going to be so difficult to know what's real anymore. All right, 00:09:48.480 |
so Sora is a diffusion model. It generates videos by starting off with one that looks like static 00:09:53.440 |
noise. Oh wow, it does the same. How? So it starts with static noise and gradually removes the noise 00:10:00.080 |
over many steps. I think it's the same as Stable Video Diffusion. Then, okay, transformer 00:10:08.320 |
architecture. I think they all, I mean, I assume the other ones did that as well with the way that 00:10:14.640 |
they encode the text. They represent videos and images as collections of smaller units called patches. 00:10:20.640 |
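As a rough illustration of what "patches" means here (the patch sizes and layout below are my own assumptions for the sketch, not details OpenAI has published), carving a video tensor into flattened spacetime patches looks something like this:

```python
import numpy as np

def patchify(video, pt=2, ph=16, pw=16):
    """Split a video tensor (frames, height, width, channels)
    into flattened spacetime patches, transformer-token style."""
    t, h, w, c = video.shape
    assert t % pt == 0 and h % ph == 0 and w % pw == 0
    # carve the video into (pt x ph x pw) blocks
    blocks = video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
    # group the three block indices together, then the within-block axes
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    # flatten each block into one token vector
    return blocks.reshape(-1, pt * ph * pw * c)

# a tiny fake clip: 8 frames of 32x32 RGB
clip = np.zeros((8, 32, 32, 3), dtype=np.float32)
tokens = patchify(clip)
print(tokens.shape)  # (16, 1536): 4*2*2 = 16 patches, 2*16*16*3 = 1536 values each
```

Each row of `tokens` then plays the same role for a video transformer that a text token plays for a language model.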
Again, that's, I think, similar to before. It uses the recaptioning technique from DALL·E 3, 00:10:28.640 |
which involves generating highly descriptive captions for the visual training data. 00:10:33.200 |
As a result, the model is able to follow the user's text instructions and generate a video 00:10:36.960 |
more faithfully. Sora serves as a foundation for models which can understand and simulate the real world. 00:10:42.240 |
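The "start from static noise and remove it over many steps" idea can be sketched as a toy loop. The denoiser below is a stand-in I made up for illustration (in the real model a trained network predicts the noise to subtract at each step), so this shows only the shape of the process:

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t, target):
    # Stand-in for the learned denoiser: nudge the sample a
    # fraction of the way toward a clean target. A real diffusion
    # model predicts and removes noise with a trained network instead.
    return x + (target - x) / t

def generate(shape, steps=50):
    target = np.ones(shape)           # pretend "clean video" the model converges to
    x = rng.standard_normal(shape)    # start from pure static noise
    for t in range(steps, 0, -1):     # iteratively remove noise over many steps
        x = denoise_step(x, t, target)
    return x

sample = generate((4, 8, 8))              # tiny (frames, height, width) volume
print(float(np.abs(sample - 1.0).max()))  # 0.0: the last step lands exactly on the target
```

At `t=1` the update `x + (target - x) / 1` returns the target exactly, which is why the toy loop finishes with zero residual noise.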
Yeah, it's not bad. How insane is that? Okay. I don't, yeah, I don't have much more to say. 00:10:58.880 |
That's pretty impressive. It seems like it's probably going to be a while before we can do 00:11:04.480 |
anything with it. I'm very curious to see where the other open source video generation models end 00:11:11.680 |
up. They, I mean, they've seemed like the ones that were ahead for a long time. And then 00:11:17.280 |
obviously OpenAI just, they share this, but this is really very, very interesting. 00:11:25.440 |
I mean, that's all I have. So thank you. Thank you for watching and see you later. Bye.