It's so nice to see a generative media track at this AI engineering conference this year. I work at a company called fal.ai, and we call ourselves a generative media platform. That term has been around for a while, but we kind of owned it and made it the way we describe our company: a generative media platform.
The way we define it, at least, is generative video, audio, or image. Our company serves all of these kinds of models on our inference engine, and we're partnering with some closed-source model providers as well. I've been doing this for a couple of years, but it is a really, really new market.
Throughout this talk, I'm going to walk you through how we got here, a little bit of the history, and what's next. I remember in 2022 when Sam Altman started tweeting about DALL-E 2. I was working from home; it was the end of COVID. I know COVID lasted a little longer in San Francisco, but I remember sitting on the floor, unable to believe my eyes.
People were tweeting prompts at him and he was tweeting back pretty high-definition images of the incredible things people were asking for. Looking back, it obviously all looks kind of low quality, but at the time I thought it was the most incredible technology ever.
And I was in the industry. I knew what was going on, not as much as today, but at that time I thought OpenAI was so far ahead of anything else, and that it was going to be so, so hard for anyone else to catch up to this technology. This was one of the biggest WTF moments of my life.
But then you could tell me, hey Gorkem, this was all going to happen. There were other AI waves before this last big one. There was the GAN breakthrough, where people did similar things using GANs. DeepDream from Google went through a phase. And there was even a viral consumer AI application of that called Prisma.
People uploaded their selfies and could restyle their avatars. But the capabilities and applications of that technology were nowhere near what generative media can be used for today. And it's not only the previous AI wave: generative media, or creating art with computers, has been around
pretty much as long as computers have been around. This is a recreation of a project by Harold Cohen: he built these massive machines to draw on huge canvases and create art similar to the way a human would draw. And then we have computer graphics and generative graphics, things like that.
Throughout the years, people have tried to generate visuals and art using whatever computing technology was available. Right after Sam Altman's tweets, the playing field evened out really, really quickly. DALL-E 2 was April 6. Right after that, Midjourney released their initial model in beta as a Discord bot. And very quickly after that, Stable Diffusion was open-sourced, which was a huge, huge thing.
People were now able to run technology similar to DALL-E 2 at home, on their own GPUs, and they started building services around it. Then SDXL came out, and now there are many different image models, open and closed source. Most recently, FLUX was released early last summer.
With the playing field evening out like this, the marginal cost of creation is approaching zero. And I'm very careful when I choose my words here: I'm not saying the marginal cost of creativity, I'm saying the marginal cost of creation. Storytelling is still really important. Creativity is still really important.
But once you have those in place, the cost of creating the next new thing is approaching zero. We believe this is going to have huge impacts on many different industries and markets. Anything from social media, advertising, marketing, and fashion to, obviously, film, gaming, and e-commerce is going to be transformed by generative media.
And this transformation is going to continue until all content, one way or another, is touched by AI. If you've been following along, software has been eating media all along. YouTube, basically just from ads, is generating more revenue than any other media company except Disney. That's pretty remarkable.
And Disney's revenue clearly includes non-media revenue: they have parks, cruise ships, other things. So it's not a stretch to say that YouTube right now is one of the highest-revenue media companies in the world, and it's happening through ads. And whenever the ad industry is impacted by technology, it usually grows in volume.
We believe the same thing is going to happen with generative media and ads. We believe the ad industry is going to be one of the first industries to be impacted at large scale by generative media, and that the size of the industry is going to increase. It's actually really striking:
since 2000, overall ad spend has kept increasing, and the industry has grown about three times. But essentially all of that growth happened in software-driven, digital ads. We believe something similar is going to happen with AI-driven ads: the ad industry is going to grow, and most of that growth is going to come from AI.
There are several different ways this can happen. We believe ads themselves will become hyper-personalized. That might mean generating many different versions of the same ad, maybe for 10,000 different demographics, really quickly. Or it can mean an ad targeted at a specific individual: if you're coming from a certain website, the ad can be generated on the fly.
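To make that concrete, here is a minimal sketch of what per-segment ad generation could look like. It assumes a hosted text-to-image endpoint reachable over HTTP; the endpoint URL, model id, auth header, and response fields below are illustrative assumptions rather than fal.ai's exact API.

```python
# Hypothetical sketch: generate one ad image per audience segment by filling a
# shared prompt template, then calling a hosted text-to-image endpoint.
# The endpoint URL, model id, API key variable, and response fields are assumptions.
import os
import requests

ENDPOINT = "https://fal.run/fal-ai/flux/dev"   # assumed model endpoint
API_KEY = os.environ["FAL_KEY"]                # assumed auth environment variable

PROMPT_TEMPLATE = (
    "Product photo of a running shoe on a city street, "
    "styled for {audience}, {season} color palette, ad-ready composition"
)

segments = [
    {"audience": "college students", "season": "autumn"},
    {"audience": "marathon runners", "season": "summer"},
    {"audience": "casual weekend walkers", "season": "winter"},
]

for seg in segments:
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Key {API_KEY}"},
        json={"prompt": PROMPT_TEMPLATE.format(**seg)},
        timeout=120,
    )
    resp.raise_for_status()
    # Field names below are assumed; real APIs differ.
    image_url = resp.json()["images"][0]["url"]
    print(seg["audience"], "->", image_url)
```

The same loop extends naturally from three segments to thousands by fanning the requests out concurrently.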
Ads could also be interactive in the ways I just mentioned, with content generated on the fly, and that could mean many different things across the industry. Another reason I think generative media fits the ad industry so well is its appetite for abundant content. For example, I'm probably not going to watch a blockbuster movie every single day.
Even if we had a thousand more movies this year, I'd have to sit down and watch a movie a day to get through them all, and I probably won't be able to do that. But with ads, there can be essentially unlimited content. Every time I grab my phone, I see ads.
On TV, there are ads all the time. And it doesn't matter if each ad is different; maybe there needs to be some consistency, but the ad industry can absorb a lot more content, and things can get a lot more creative. We were actually a little ahead of our time here.
Last year, we did an ad promo with A24's movie Civil War, and it was one of those interactive ideas I was talking about. If you've seen the movie, it's about an imaginary civil war in the US, and they had a campaign built around these little green toy soldiers.
They created a live marketing site where you could upload a selfie, and we generated a little toy soldier from your selfie and your description. They put it up in Times Square, so people were able to see their own faces on these little green toy soldiers. AI is going to help us create experiences like this that are interactive and personalized.
The other trend we're watching really closely is e-commerce. If you've been paying attention, e-commerce has been taking roughly an additional percentage point of US retail every year. This is a trend that's happening with or without AI, and we believe generative media is going to play a big, big part in e-commerce's growth as well.
There are already many companies trying to redefine how people shop online, and because online shopping is very visual, AI can add a lot of interactivity to the experience. In fact, it's one of the earliest product-market fits I've seen in generative media, and it's been emerging for a couple of months, maybe a year:
virtual try-on is one of the clearest product-market fits I see in the AI industry. Many retailers and e-commerce websites are adopting this technology, and many startups are being built on it. I believe this is going to be everywhere: every retailer, every e-commerce website is a potential generative media user.
And then there is video. When Sam Altman tweeted about DALL-E 2, I thought OpenAI was so far ahead that no one would be able to catch up, and people caught up incredibly fast. This time, he did the same trick with Sora, which was announced about a year and a half ago.
Maybe Sora was even more impressive than DALL-E 2 in terms of how far ahead it looked. But this time around, I was simply excited that researchers at OpenAI were able to do things like this, because from past experience I knew that if it's possible in one place, people are going to be able to do similar things elsewhere.
So I was incredibly excited when Sam Altman started tweeting about Sora, because I knew that very soon a technology like this was going to be everywhere. In fact, it has started happening. This is a little snapshot of our company's revenue, which I think is a good proxy for the entire market.
Back in October, we barely had any video model usage on the platform. By February, it went all the way up to 18 percent. I didn't get time to update the slide, but I looked yesterday, and it's around 30 percent today. So it is growing really fast, even though it's expensive and even though it still doesn't work as well as it eventually will.
Video models are going to completely take over the generative media market, and I have some predictions about how much bigger the video market is going to be compared to the image market. It's rough math, but we believe video models are about 20x more compute-intensive. Let's say video is also 5x more engaging, and it's going to impact more industries because it's going to be more broadly useful.
When all is said and done, we believe the video generation market is going to be 100x to 250x bigger than the image generation market. And we're just scratching the surface here: I believe the image generation market still has a ton of growth ahead of it over the next couple of years as well.
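As a sanity check, here is that back-of-the-envelope arithmetic spelled out. The 20x and 5x multipliers are the rough assumptions from the talk; the 1x to 2.5x "broader industry reach" factor is my own illustrative placeholder to bridge the gap from 100x to 250x.

```python
# Back-of-the-envelope version of the video-vs-image market estimate.
# All multipliers are rough assumptions, not measured numbers.
compute_multiplier = 20      # video assumed ~20x more compute-intensive per generation
engagement_multiplier = 5    # video assumed ~5x more engaging / consumed

baseline = compute_multiplier * engagement_multiplier  # 20 * 5 = 100x

# Assumed extra factor for video reaching industries that images do not.
reach_low, reach_high = 1.0, 2.5

print(f"low estimate:  {baseline * reach_low:.0f}x the image generation market")
print(f"high estimate: {baseline * reach_high:.0f}x the image generation market")
```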
But video is growing much, much faster than that, and when all is said and done, it's going to be a much bigger market. And video models are leveling up as well. You've probably all seen the newest model from Google DeepMind, Veo 3. New capabilities keep being added to these video models.
First it was consistency, and now, with sound, the things people are generating are really incredible. Every time a new capability is added, it unlocks a different use case in the industry. It's not on our platform yet, but I'm very curious to see how people are going to start creating with Veo 3 and what use cases it's going to unlock in the ad industry or the e-commerce industry.
That's going to be very interesting to see. So where is the video market going? We believe there is still so, so much to improve. We're going to have faster and cheaper video generation until it basically becomes real time: generating one second of video in one second, so you can stream generated content to the user.
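As a toy illustration of what "real time" means here, the sketch below generates video in one-second chunks and checks whether each chunk was produced faster than it plays back. The generate_chunk function is a hypothetical placeholder standing in for a real video model call.

```python
# Toy sketch of real-time generated video: each one-second chunk must be produced
# at least as fast as it plays back, or the stream stalls behind the viewer.
# generate_chunk() is a hypothetical placeholder, not a real model call.
import time

CHUNK_SECONDS = 1.0  # one second of video per generation step

def generate_chunk(prompt: str) -> bytes:
    time.sleep(0.8)          # simulate model latency
    return b"\x00" * 1024    # stand-in for encoded video bytes

def stream(prompt: str, total_seconds: int) -> None:
    for i in range(total_seconds):
        start = time.monotonic()
        chunk = generate_chunk(prompt)
        elapsed = time.monotonic() - start
        status = "real-time" if elapsed <= CHUNK_SECONDS else "falling behind playback"
        print(f"chunk {i}: {len(chunk)} bytes in {elapsed:.2f}s ({status})")
        # In a real system, the chunk would be pushed to the client player here.

stream("a walk through a neon-lit city at night", total_seconds=3)
```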
This is going to have very different implications for how people interact with the technology. Everything potentially becomes interactive, and the line between games and movies gets blurred. How is this going to impact social apps? How is it going to impact live events? If you play Fortnite or similar games, you know people are already holding live events there.
Is it going to become more lifelike? Will our parents, people who aren't used to playing video games, become part of this experience? I'm really curious about the future of this technology. And image models aren't done yet either; there have been a lot of improvements to image models in the past couple of months as well.
FLUX Kontext and GPT-4o introduced new editing capabilities and better text rendering. At one point, people thought, okay, maybe this is as good as image models are going to get. But with these new releases and capabilities, image generation is opening up to more use cases in the industry. Whenever we see a technological shift like this, we see more mature players in the industry picking up these technologies.
We believe something similar is going to happen with FLUX Kontext and GPT-4o: they're going to blend into more of the enterprise use cases people are pursuing. And that's pretty much it. We are hiring, so please visit our website, fal.ai/careers. We're hiring machine learning engineers, inference engineers, product engineers, all sorts of positions.
I'll be hanging around for the rest of the day, so find me and talk to me. I'd love to discuss anything related to generative media or the industry in general. Thank you so much.