OpenAI's Sora
00:00:00.000 |
There are occasionally these moments in AI, particularly in the past few years, 00:00:05.680 |
where I see something that is completely unexpected and just seems so incredibly 00:00:15.920 |
mind-blowing and far beyond where I would expect AI to be. Now the first time I had that 00:00:23.360 |
mind-blown feeling with AI was when I was getting ready to do a talk with some engineers from OpenAI 00:00:32.320 |
and they showed me what would have been prompt engineering at the time. I don't think it was 00:00:36.880 |
called prompt engineering back then, but we prompt engineered GPT-3 to do RAG and answer questions 00:00:43.360 |
in different ways, and it was really far beyond where I thought AI was at that point. And now 00:00:53.120 |
OpenAI have released something called Sora and it's just insane. I haven't had a good look at it 00:01:00.640 |
yet. I wanted to, given how rare these moments are where I'm truly just mind blown, I saw one video 00:01:10.160 |
from this and thought okay now it's time to turn the camera on and just take a look together and 00:01:17.200 |
see what it's like. So Sora is an AI model that can create realistic and imaginative scenes from 00:01:25.040 |
text instructions. And then this: all videos on this page were generated directly by Sora 00:01:31.200 |
without modification. I'm sure these are some of the best videos they generated but nonetheless, 00:01:37.040 |
so this background image here is already pretty cool. Let's keep going. This is a video where I 00:01:46.080 |
started watching and thought oh wow this is kind of insane. So this is AI generated. 00:01:52.880 |
I mean it's insane. So there was the odd thing that was weird but it's just incredible how good 00:02:02.320 |
like the background is perfect. It doesn't seem to, there's nothing weird going on. The mood, 00:02:08.640 |
like the person is actually moving through the scene which you don't usually get with AI generated 00:02:16.560 |
videos. It's usually kind of like they're moving slightly and you know the background is maybe 00:02:21.280 |
moving a little bit but this is like insane. The amount of detail and the amount of movement is 00:02:25.600 |
just, I don't even know. I noticed earlier that the legs get a little weird around, 00:02:35.760 |
it's kind of like the hands in Stable Diffusion. Now it's legs. Around here the leg kind of swaps, 00:02:41.600 |
it's super weird. Oh look at that, her left leg became her right leg which is interesting. It's 00:02:50.800 |
hard to even notice that but this is just insane. I even, like you look at here and you look at the 00:03:00.240 |
jacket and it has these, you know, these four buttons here. You go ahead, I'm just trying to 00:03:04.640 |
find anything that's kind of odd but even here it's like the same four buttons, same jacket, 00:03:10.640 |
maybe oh this is kind of long and big here. Okay so this grew over the video but like, gosh I'm 00:03:19.680 |
really like pointing out these very minor little things. It's just insane. Then you look at the 00:03:26.320 |
prompt and it's not, you know, it's relatively short, it's a paragraph. I mean it's nothing 00:03:31.440 |
crazy, and that paragraph of text produced this. Gosh, I'm gonna have so much good stock video for 00:03:40.240 |
my videos now. This one I saw briefly, I thought this was less impressive but I mean it's so good. 00:03:46.240 |
And then this one, guy with a woolly, it's just like a, I don't even know, it's a bit weird and 00:03:52.640 |
kind of all over the place but it's really pretty cool. I mean look at the detail on the guy and 00:04:01.600 |
then this as well, like it just looks real, no? Am I, like it just looks real. The guy looks real, 00:04:11.440 |
like there's nothing here that doesn't look legit. Photorealistic video of pirate ships. 00:04:23.280 |
How good is that? Like looking at this, would I think it is AI? Like the boat is being weird and 00:04:37.360 |
kind of going a little crazy but would I, if this wasn't on OpenAI's website, would I have 00:04:42.400 |
looked at this and thought this is an AI video? I don't, I'm pretty sure I wouldn't. Okay so then 00:04:47.760 |
Sora is becoming available to red teamers, so it's still very restricted, like it isn't released yet. 00:04:53.520 |
We're also granting access to a number of visual artists, designers and filmmakers to 00:04:57.200 |
gain feedback on how to advance the model to be most helpful for creative professionals. 00:05:01.440 |
How insane, look at this. So this, historical footage of California during the gold rush, 00:05:12.560 |
I mean the prompt is tiny but I would never, if you'd have shown me this two days, a day ago, 00:05:20.480 |
I would be like oh look at this cool, like how did they film this? Were they on a balloon? 00:05:25.120 |
I would have no idea. Things like the human eye are, because we're, I think biologically we're 00:05:36.320 |
so able to, like we know what an eye looks like more than anything else, right? We can read an eye 00:05:43.040 |
and understand it so well and the fact that I look at this and I don't think I can tell that it's not 00:05:53.280 |
real. And it's probably the feature that I as a human should be able to distinguish from 00:06:02.560 |
reality the easiest of anything. That should be like the hardest thing to convince me. Maybe it 00:06:09.600 |
moves a bit weirdly, but the eyeball I mean, but really, I mean it's insane. I mean this is going 00:06:21.520 |
to be like the new AI generated photos where you, after a little while, this is interesting. 00:06:28.960 |
So like the people here are tiny and then all of a sudden these people are huge. 00:06:34.080 |
Oh wow, this is, I mean it's not supposed to be like that of course, but it's interesting. 00:06:43.040 |
Yeah, this, like creating multiple shots within a single generated video where, 00:06:49.040 |
you know, the characters remain the same. How, I mean I don't know how 00:06:57.200 |
they can, it's just so impressive. Like it's kind of weird here, like the guys, 00:07:01.440 |
yeah, the perspective is strange. Yeah, that's interesting. So the perspective seems to mess up 00:07:10.400 |
more, more often. That's like a strange thing that it has going on here. Also in the earlier 00:07:17.280 |
video in Lagos in Nigeria, the perspectives were kind of messed up. Same here. Oh my gosh, 00:07:23.440 |
look at this. How cool is that? It looks like a film. There's some weird stuff going on here, 00:07:31.600 |
I feel. Like what is, this guy has like three trainers on or something. So, so it has its 00:07:40.000 |
weaknesses. It may struggle with accurately simulating the physics of a complex scene 00:07:44.080 |
and may not understand specific instances of cause and effect. For example, a person might 00:07:50.880 |
take a bite out of a cookie, but afterward the cookie might not have a bite mark. The model may 00:07:56.480 |
also confuse spatial details of a prompt. For example, mixing up left and right and may struggle 00:08:01.600 |
with precise descriptions of events that take place over time, like following a specific camera 00:08:07.440 |
trajectory. Yeah, this is interesting. So yeah, let's have a look at these. This is interesting. 00:08:15.840 |
Oh yeah, so like people and animals just appear. Oh yeah, that's interesting. 00:08:23.520 |
Oh, look at that. That's so cool though, at the same time. 00:08:28.240 |
Archaeologists discover a generic plastic chair in the desert, excavating and dusting 00:08:36.400 |
it with great care. So here it's like they are, 00:08:43.840 |
what's happening here? So here they're taking, what is this? How, how insane. 00:08:55.200 |
So, and then the guy's hands as well, like messed up. But this is, I mean, despite how weird it is, 00:09:06.000 |
it looks so, it feels like I'm watching a, I don't know, like a dream or something. 00:09:13.280 |
And then the actual people themselves, they're pretty impressive. And this is the first version, 00:09:21.200 |
this is just insane. Okay, so the red teaming is so that people will basically stress test 00:09:28.800 |
the model, make sure it's not going to do anything weird, and they'll probably go over the top with 00:09:34.000 |
it, but what can you do? Oh, I imagine how good all the UFO videos will be now. That's exciting. 00:09:42.560 |
Oh, it's going to be so difficult to know what's real anymore. All right, 00:09:48.480 |
so Sora is a diffusion model. It generates videos by starting off with one that looks like static 00:09:53.440 |
noise. Oh wow, it does the same. How? So it starts with static noise and gradually removes the noise 00:10:00.080 |
over many steps. I think it's the same as Stable Video Diffusion. Then, okay, transformer 00:10:08.320 |
architecture. I think they all, I mean, I assume the other ones did that as well with the way that 00:10:14.640 |
they encode the text. They represent videos and images as collections of smaller units called patches. 00:10:20.640 |
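As a rough illustration of what "patches" means here (the patch sizes and layout below are my own assumptions for the sketch, not details OpenAI has published), carving a video tensor into flattened spacetime patches looks something like this:

```python
import numpy as np

def patchify(video, pt=2, ph=16, pw=16):
    """Split a video tensor (frames, height, width, channels)
    into flattened spacetime patches, transformer-token style."""
    t, h, w, c = video.shape
    assert t % pt == 0 and h % ph == 0 and w % pw == 0
    # carve the video into (pt x ph x pw) blocks
    blocks = video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
    # group the three block indices together, then the within-block axes
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    # flatten each block into one token vector
    return blocks.reshape(-1, pt * ph * pw * c)

# a tiny fake clip: 8 frames of 32x32 RGB
clip = np.zeros((8, 32, 32, 3), dtype=np.float32)
tokens = patchify(clip)
print(tokens.shape)  # (16, 1536): 4*2*2 = 16 patches, 2*16*16*3 = 1536 values each
```

Each row of `tokens` then plays the same role for a video transformer that a text token plays for a language model.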
Again, that's, I think, similar to before. It uses the recaptioning technique from DALL·E 3, 00:10:28.640 |
which involves generating highly descriptive captions for the visual training data. 00:10:33.200 |
As a result, the model is able to follow the user's text instructions and generate a video 00:10:36.960 |
more faithfully. Sora serves as a foundation for models which can understand and simulate the real world. 00:10:42.240 |
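The "start from static noise and remove it over many steps" idea can be sketched as a toy loop. The denoiser below is a stand-in I made up for illustration (in the real model a trained network predicts the noise to subtract at each step), so this shows only the shape of the process:

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t, target):
    # Stand-in for the learned denoiser: nudge the sample a
    # fraction of the way toward a clean target. A real diffusion
    # model predicts and removes noise with a trained network instead.
    return x + (target - x) / t

def generate(shape, steps=50):
    target = np.ones(shape)           # pretend "clean video" the model converges to
    x = rng.standard_normal(shape)    # start from pure static noise
    for t in range(steps, 0, -1):     # iteratively remove noise over many steps
        x = denoise_step(x, t, target)
    return x

sample = generate((4, 8, 8))              # tiny (frames, height, width) volume
print(float(np.abs(sample - 1.0).max()))  # 0.0: the last step lands exactly on the target
```

At `t=1` the update `x + (target - x) / 1` returns the target exactly, which is why the toy loop finishes with zero residual noise.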
Yeah, it's not bad. How insane is that? Okay. I don't, yeah, I don't have much more to say. 00:10:58.880 |
That's pretty impressive. It seems like it's probably going to be a while before we can do 00:11:04.480 |
anything with it. I'm very curious to see where the other open source video generation models end 00:11:11.680 |
up. They, I mean, they've seemed like the ones that were ahead for a long time. And then 00:11:17.280 |
obviously OpenAI just, they share this, but this is really very, very interesting. 00:11:25.440 |
I mean, that's all I have. So thank you. Thank you for watching and see you later. Bye.