If you haven't seen it, Sora is the text-to-video model from OpenAI. And this, if you weren't studying it, would look like a major feature film.

The traditional approach for rendering video is that you create three-dimensional objects, then you have a rendering engine that renders those objects, and then you have a system that defines where the camera goes. That's how you get the visuals you use to generate a 2D movie like this. Sora doesn't do that. This was a trained model.
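To make that contrast concrete, here is a minimal, self-contained Python sketch of the classic pipeline being described: hand-placed 3D geometry, a system that moves the camera, and a perspective projection that turns each camera position into a 2D frame. The cube, camera path, and focal length are illustrative stand-ins, not anything from a real engine.

```python
import numpy as np

# A toy version of the classic pipeline: 3D objects -> camera -> 2D frames.
# Real engines add meshes, materials, lighting, and rasterization on top.

# Eight vertices of a unit cube centered at the origin (the "3D objects").
cube = np.array(
    [[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)],
    dtype=float,
)

def render_frame(points, camera_pos, focal_length=2.0):
    """Project 3D points into 2D image coordinates for one camera position."""
    # Translate the scene into camera space; in this convention, points in
    # front of the camera end up with positive z.
    cam_space = points - camera_pos
    z = cam_space[:, 2]
    # Perspective divide: farther points land closer to the image center.
    return focal_length * cam_space[:, :2] / z[:, None]

# "A system that defines where the camera goes": a simple dolly move.
frames = []
for t in np.linspace(0.0, 1.0, 24):                # 24 frames of animation
    camera = np.array([0.0, 0.0, -5.0 + 2.0 * t])  # camera slides toward the cube
    frames.append(render_frame(cube, camera))

print(len(frames), "frames;", frames[0].shape, "projected 2D points per frame")
```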
So how would you train a model to do this without having a 3D space? The compute necessary to define each of those objects and place them in 3D space is practically impossible today. My guess is that OpenAI used a tool like Unreal Engine 5 and generated tons and tons of video content, then tagged it and labeled it. And they were then able to use that to train this model that can, for whatever reason we don't understand, do this.

You're referring to synthetic training data.
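As a purely hypothetical sketch of what that guess might look like, the loop below pairs each rendered clip with a caption and metadata. `render_clip` is a placeholder for batch-driving an engine offline, not a real Unreal Engine API; the point is that the engine already knows the ground truth of everything it renders, so the tagging and labeling can be produced alongside the pixels.

```python
from dataclasses import dataclass

@dataclass
class TrainingClip:
    video_path: str  # rendered frames written to disk
    caption: str     # the text label a text-to-video model conditions on
    metadata: dict   # anything else the engine knows for free

def render_clip(scene: str, camera_move: str, out_path: str) -> str:
    """Placeholder: imagine this drives the engine and writes a video file."""
    return out_path

def generate_dataset(n_clips: int) -> list[TrainingClip]:
    # Illustrative scene and camera descriptions, not real engine assets.
    scenes = ["city street at dusk", "forest river", "desert highway"]
    moves = ["slow dolly in", "orbit left", "handheld pan"]
    dataset = []
    for i in range(n_clips):
        scene = scenes[i % len(scenes)]
        move = moves[i % len(moves)]
        path = render_clip(scene, move, f"clips/{i:06d}.mp4")
        # The label is derived from what the engine was told to render,
        # so every clip arrives already tagged.
        dataset.append(TrainingClip(
            video_path=path,
            caption=f"{scene}, {move}",
            metadata={"scene": scene, "camera": move, "synthetic": True},
        ))
    return dataset

print(len(generate_dataset(9)), "labeled synthetic clips")
```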