
The State of AI Startups in 2024 [LS Live @ NeurIPS]


Transcript

(upbeat music) - Okay, I think we're gonna kick this off. Thanks to everyone who made it early in the morning. This is like a really weird experiment that we wanted to try, because one, we saw this space, but two, I've also been to a number of these things now and I always felt like there was not enough industry content for people, and we wanted an opportunity, while everyone is in town in one central spot, to get everyone together to talk about the best stuff of the year and review the year.

It's very nice that NeurIPS is always at the end of the year. And so I'm very honored that Sarah and Pranav have agreed to help us kick this off. Sarah, I've known for, I was actually counting, 17 years. - Sounds about right. (laughing) - But she's been enormously successful as an AI investor.

Even back in your Greylock days, I was tracking your investing, and it's come a long way since then. And Pranav, I've known shorter, but he's also started to write really incredible posts and opinions about what he's seeing as an investor. So I wanted to kick this off with an industry session.

We have a great day of sort of like best of year recaps lined up. I think Vic is here as well and the RoboFlow guys. So I would just let you kick it off. Thank you. (clicking) - Hi everyone. My name is Sarah Guo and thanks to Sean and friends here for having me and Pranav.

So I'll start by just giving 30 seconds of intro. I promise this isn't an ad. We started a venture fund called Conviction about two years ago. Here is a set of the investments we've made. They range from companies at the infrastructure level, in terms of feeding the revolution, to foundation model companies, alternative architectures, domain specific training efforts, and of course applications.

And the premise of the fund, Sean mentioned I worked at Greylock for about a decade before that and came from the product engineering side was that we thought that there was a really interesting technical revolution happening, that it would probably be the biggest change in how people use technology in our lifetimes.

And that represented huge economic opportunity and maybe that there'd be an advantage versus the incumbent venture firms in that when the floor is lava, the dynamics of the markets change, the types of products and founders that you back change, it's a lot for existing firms to ingest and a lot of their mental models may not apply in the same way.

And so there was an opportunity for first principles thinking, and if we were right, we'd do really well and get to work with amazing people. And so we are two years into that journey and we can share some of the opinions and predictions we have with all of you.

Sorry, I'm just making sure that isn't actually blocking the whole presentation. And Pranav's gonna start us off. - So, quick agenda for today: we'll cover some of the model landscape and themes that we've seen in 2024, what we think is happening in AI startups, and then some of our latent priors on what we think is working in investing.

So I thought it'd be useful to start from like, what was happening at NeurIPS last year in December, 2023. So in October, 2023, OpenAI had just launched the ability to upload images to ChatGPT, which means up until that moment, it's hard to believe, but like roughly a year ago, you could only input text and get text out of ChatGPT.

The Mistral folks had just launched the Mixtral model right before the beginning of NeurIPS. Google had just announced Gemini. I very genuinely forgot about the existence of Bard before making these slides. And Europe had just announced their first round of AI regulation, though not their last.

And when we were thinking about like, what's changed in 2024, there's at least five themes that we could come up with that feel like they were descriptive of what 2024 has meant for AI and for startups. And so we'd start with, first, it's a much closer race on the foundation model side than it was in 2023.

So this is LMSYS's Chatbot Arena, which asks users to rate generations from specific prompts. You get two responses from two language models and answer which one of them is better. The way to interpret this is that roughly a 100-point Elo difference means you're preferred two thirds of the time.
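That rule of thumb maps onto the standard logistic Elo formula (the same one used in chess ratings); a minimal sketch:

```python
# Convert an Elo rating gap into an expected head-to-head preference rate,
# using the standard logistic Elo formula.
def win_probability(elo_gap: float) -> float:
    """Probability that the higher-rated model is preferred in a pairwise vote."""
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400.0))

# A 100-point gap means the better model is preferred roughly two-thirds
# of the time; a 0-point gap is a coin flip.
print(round(win_probability(100), 2))  # 0.64
print(win_probability(0))              # 0.5
```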

And a year ago, every OpenAI model was like more than 100 points better than anything else. And the view from the ground was roughly like, OpenAI is the IBM, there is no point in competing. Everyone should just give up, go work at OpenAI or attempt to use OpenAI models.

And I think the story today is not that. I think it would have been unbelievable a year ago if you told people that, A, the best model today, at least on this eval, is not from OpenAI, and B, that it was from Google, which would have been pretty unimaginable to the majority of researchers.

But actually there are a variety of proprietary language model options and some set of open source options that are increasingly competitive. And this seems true not just on the eval side, but also in actual spend. So this is Ramp data. There's a bunch of colors, but it's actually just OpenAI and Anthropic spend.

And the OpenAI spend at the end of last year, in November of '23, was close to 90% of total volume. And today, less than a year later, it's closer to 60% of total volume, which I think is indicative both that language models are pretty easy APIs to switch out and that people are trialing a variety of different options to figure out what works best for them.
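That ease of switching can be sketched with a thin adapter layer. The backend functions below are stubs with hypothetical names; real ones would wrap each vendor's SDK behind the same "prompt in, text out" signature:

```python
# Minimal sketch of why chat APIs are easy to swap: most reduce to
# "messages in, text out," so a thin adapter makes vendors interchangeable.
from typing import Callable, Dict

ChatFn = Callable[[str], str]

def fake_openai_chat(prompt: str) -> str:
    # A real implementation would call the OpenAI SDK here.
    return f"[openai-stub] {prompt}"

def fake_anthropic_chat(prompt: str) -> str:
    # A real implementation would call the Anthropic SDK here.
    return f"[anthropic-stub] {prompt}"

PROVIDERS: Dict[str, ChatFn] = {
    "openai": fake_openai_chat,
    "anthropic": fake_anthropic_chat,
}

def chat(provider: str, prompt: str) -> str:
    # Swapping vendors becomes a one-line config change, not a rewrite.
    return PROVIDERS[provider](prompt)
```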

Related, the second trend that we've noticed is that open source is increasingly competitive. This is from the Scale leaderboards, which are a set of independent evals that are not contaminated, on a number of topics that the foundation model labs clearly care a great deal about. Open source models are pretty good on math, instruction following, and adversarial robustness.

The Llama model is among the top three evaluated models. I included agentic tool use here just to point out that this isn't true across the board. There are clearly some areas where foundation model companies have had more data or more expertise in training against these use cases, but open source models are surprisingly, and increasingly, effective.

This feels true across evals. This is the MMLU eval. I wanna call out two things here. One is that it's pretty remarkable that the ninth best model, only two points behind the best state-of-the-art models, is actually a 70 billion parameter model. I think this would have been surprising to a bunch of people; the belief was largely that most intelligence is just an emergent property, and there's a limit to how much intelligence you can push into smaller form factors.

In fact, a year ago, the best small model, meaning under 10 billion parameters, would have been Mistral 7B, which on this eval, if memory serves, scored somewhere around a 60. Today it's the Llama 8B model, which is more than 10 points better. The gap between what is state-of-the-art and what you can fit into a fairly small form factor is actually shrinking.

And again, related, we think the price of intelligence has come down substantially. This is a graph of flagship OpenAI model costs, where the cost of the API has come down roughly 80 to 85% over, call it, the last year to year and a half, which is pretty remarkable. And this isn't just OpenAI.

This is also the full set of models. This is from Artificial Analysis, which tracks cost per token across a variety of different APIs and public inference options. And we were doing some math on this: if you wanted to recreate the kind of data that a text editor had, something like Notion or Coda, it would cost somewhere on the order of a couple thousand dollars to generate that volume of tokens, which is pretty remarkable and impressive.
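That back-of-the-envelope math looks something like the following, with purely illustrative prices and token counts (not the actual figures from the slide):

```python
# Back-of-the-envelope cost to generate a large text corpus via API.
# Prices and corpus size below are illustrative assumptions only; real
# per-token prices vary by model and provider.
def generation_cost(total_tokens: float, usd_per_million_tokens: float) -> float:
    """Cost in USD to generate `total_tokens` at a given per-million-token rate."""
    return total_tokens / 1_000_000 * usd_per_million_tokens

# e.g. a billion generated tokens at an assumed $2 per million tokens:
cost = generation_cost(1_000_000_000, 2.00)
print(f"${cost:,.0f}")  # a couple thousand dollars
```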

It's clearly not the same distribution of data, but just as a sense of scope, there's an enormous volume of data that you can create. And then fourth, we think new modalities are beginning to work. Starting quickly with biology: we're lucky to work with the folks at Chai Discovery, who just released Chai-1, an open source model that outperforms AlphaFold 3.

It's impressive that this is roughly a year of work with a pretty specific data set and pretty specific technical beliefs. But models in domains like biology are beginning to work. We think that's true on the voice side as well. To be clear, there were voice models before; things like ElevenLabs have existed for a while. But we think low latency voice is more than just a feature; it's actually a net new experience and interaction.

Using voice mode feels very different than the historical transcription-first models. Same thing with many of the Cartesia models. And then a new, nascent use case is execution. Anthropic launched computer use for Claude, OpenAI launched code execution inside of Canvas yesterday, and then I think Devin just announced that you can all try it for $500 a month, which is pretty remarkable.

It's a set of capabilities that has historically never been available to the vast majority of the population. And I think we're still in early innings. Cognition, the company, was founded under a year ago, and its first product shipped roughly nine months ago, which is pretty impressive. - If you recall, like a year ago, the point of view on SWE-bench was that it was impossible to surpass 15% or so.

And I think the whole industry now considers that, if not trivial, accessible. - Yeah. The last new modality we wanted to call out, although there are many more, is video. I got early access to Sora and managed to sign up before they cut off access. So here's my favorite joke in the form of a video.

Hopefully someone here can guess it. Yeah, you're telling me a shrimp fried this rice. It's a pretty bad joke, but I really like it. And the next video here is from one of our portfolio companies, HeyGen, which does translation, lip sync, and dubbing for live speeches.

So this is Javier Milei, who speaks in Spanish, but here you will hear him in English if this plays. And you can see that it captures the original tonality of his speech and performance. I think the audio here doesn't work, but we'll push something publicly. - Let's give it a shot.

- Yeah. Excellent. Yeah, and you can hear that this captures his original tone and the emotion in his speech, which is definitely new and pretty impressive from new models. So the last... Yeah, that makes sense. The last point that we wanted to call out is the much purported end of scaling.

I think there's a great debate happening here later today on the question of this, but we think at minimum, it's hard to deny that there are at least some limits to the clear benefits to increasing scale, but there also seems like there are new scaling paradigms. So the question of test-time compute scaling is a pretty interesting one.

It seems like OpenAI has cracked a version of this that works, and we think, A, foundation model labs will come up with better ways of doing this, and B, so far it largely works for very verifiable domains, things that look like math and physics and maybe secondarily software engineering, where we can get an objective value function.

And I think an open question for the next year is going to be how do we generate those value functions for spaces that are not as well-constrained or well-defined? And so the question that this leaves us in is like, well, what does that mean for startups? And I think a prevailing view has been that we live in an AI bubble.

There's an enormous amount of funding that goes towards AI companies and startups that is largely unjustified based on outcomes and what's actually working on the ground, and startups are largely raising money on hype. And so we pulled some PitchBook data; the 2024 number is probably incomplete since not all rounds have been reported yet, and it largely suggests there actually is a substantial recovery in funding, and maybe 2025 looks something like 2021.

But if you break out the numbers here a bit more, the red is actually just a small number of foundation model labs, what you would think of as the largest labs raising money, which is upwards of $30 to $40 billion this year. And so the reality of the funding environment actually seems much more sane and rational.

It doesn't look like we're headed to a version of 2021. In fact, the foundation model labs account for an outsized amount of money being raised, but the set of money going to companies that are working seems much more rational. And we wanted to give you, we can't share numbers for every company, but this is one of our portfolio companies growing really, really quickly.

We think zero to $20 million with just PLG-style spend is pretty impressive. If any of you are doing better than that, you should come find us. We'd love to chat. And so what we wanted to try and center discussion on, and this is certainly not all of the companies making $10 million or more of revenue and growing, but we took a selection of them and wanted to give you a couple ideas of patterns that we've noticed that seem to be working across the board.

The first one that we've noticed is first wave service automation. We think there's a large amount of work that doesn't get done at companies today, either because it's too expensive to hire someone to do it, too expensive to provide them context and enable them to be successful at whatever the specific role is, or too hard to manage that set of people.

Precisely because it's too expensive to hire that specific set of people: for Sierra and Decagon, customer-support-style companies, it's really useful to do next-level automation, and then there's obviously growth in that. And for Harvey and EvenUp, the story is you can do first wave professional services and then grow beyond that.

The second trend that we've noticed is better search and new friends. It's pretty impressive how effective text modalities have been. Character and Replika have been remarkably successful companies, and there's a whole host of not-safe-for-work chatbots as well that are pretty effective at just text generation.

They're pretty compelling mechanisms. And on the productivity side, Perplexity and Glean have demonstrated this as well. I worked at a search company for a while; I think the changing paradigm of how people capture and learn information is pretty interesting. We think it's likely text isn't the last medium.

There are infographics for sets of information that seem more useful, or forms of engagement that are more engaging. But this feels like a pretty interesting place to start. - Oh, yeah. Okay, mic. So one thing that I've been investing in for a long time is the democratization of different skills, be they creative or technical.

This has been an amazing few years for that across different modalities: audio, video, general image, media, text, and now code and really fully functioning applications. One thing that's really interesting about the growth driver for all of these companies is that the end users, in large part, are not people that we, the venture industry, you know, the royal we, thought of as important markets before.

And so a premise we have as a fund is that there's actually much more instinct for creativity, visual creativity, audio creativity, technical creativity, than people assumed; there's latent demand for it, and AI applications can really serve that. I think Midjourney in particular was a company that was in the vanguard here, and that nobody understood for a long time, because the outside view was something like: how many people want to generate images that are raster, not easily editable, and can't be used in professional contexts in a complete way?

And the answer is like an awful lot, right, for a whole range of use cases. And I think we'll continue to find that, especially as the capabilities improve. And we think the range of quality and controllability that you can get in these different domains is still, it's very deep and we're still very early.

And then I think, if we're in the first or second inning of this AI wave, one obvious place to go invest and to go build companies is the enabling layers, right? Shorthand for this is obviously compute and data. I think that the needs for data have largely changed now as well.

You need more expert data. You need different forms of data. We'll talk about that later in terms of who has, like let's say reasoning traces in different domains that are interesting to companies doing their own training. But this is an area that has seen explosive growth and we continue to invest here.

Okay, so maybe time for some opinions. There was a prevailing narrative, some part from companies, some part from investors; it's a fun debate as to where the value in the ecosystem is and whether there can be opportunities for startups. If you guys remember the phrase "GPT wrapper," it was the dominant phrase in the tech ecosystem for a while, and what it represented was this idea that there was no value at the application layer.

You had to do pre-training, and nobody's gonna catch OpenAI in pre-training. And this isn't a knock on OpenAI at all. These labs have done amazing work enabling the ecosystem, and we continue to partner with them and others. But it's simply untrue as a narrative, right?

The odds are clearly in favor of a very rich ecosystem of innovation. You have a bunch of choices of models that are good at different things. You have price competition, you have open source. I think an underappreciated impact of test time scaling is you're going to better match user value with your spend on compute.

And so if you are a new company that can figure out how to make these models useful to somebody, the customer can pay for the compute instead of you taking as a startup, the CapEx for pre-training or RL upfront. And as Pranav mentioned, small models, especially if you know the domain can be unreasonably effective.

And the product layer, if we look at the cluster of companies that we described, has shown that it is creating and capturing value, and that it's actually a pretty hard thing to build great products that leverage AI. So broadly, we have a point of view, one I think is actually shared by many of the labs, that the world is full of problems, and the last mile to take even AGI into all of those use cases is quite long.

Okay, another prevailing belief is that, or another great debate that Sean could host is like, does the value go to startups or incumbents? We must admit some bias here, even though we have friends and portfolio, former portfolio companies that would be considered incumbents now. But, oh, sorry, swap views.

Sorry, there are markets in venture that have been traditionally considered too hard, right? Just bad markets for the venture capital spec, which is capital efficient, rapid growth: that's a venture-backable company, where the end output is tens of billions of dollars of enterprise value. And these included areas like legal, healthcare, defense, pharma, education. Any traditional venture firm would say: bad market, nobody makes money there, it's really hard to sell, there's no budget, et cetera.

And one of the things that's interesting is if you look at the cluster of companies that has actually been effective over the past year, some of them are in these markets that were traditionally non-obvious, right? And so perhaps one of our more optimistic views is that AI is really useful.

And if you make a capability that is novel, that is orders of magnitude cheaper, then actually you can change the buying pattern and the structure of these markets. And maybe the legal industry didn't buy anything because there wasn't anything worth buying for a really long time; that's one example.

We also think: what was the last great consumer company? Maybe it was Discord or Roblox, in terms of things that started up and got really enormous user bases and engagement, until we had these consumer chatbots of different kinds and, perhaps, the next generation of search.

As Pranav mentioned, we think that the opportunity for social and media generation and games is large and new in a totally different way. And finally, in terms of the markets that we look at, I think there's broad recognition now that you can sell against outcomes and services rather than software spend with AI because you're doing work versus just giving people the ability to do a workflow.

But if you take that one step further, we think there's elastic demand for many services, right? Our classic example is there's on order of 20 to 25 million professional software developers in the world. I imagine much of this audience is technical. Demand for software is not being met, right?

If we take the cost of software, of high quality software, down by orders of magnitude, we're just gonna end up with more software in the world. We're not gonna end up with fewer people doing development. At least that's what we would argue. And then finally, on the incumbent versus startup question, the prevailing narrative is incumbents have the distribution, the product surfaces, and the data.

Don't bother competing with them. They're gonna create and capture the value and share some of it back with their customers. I think this is only partially true. Incumbents have the distribution. They have always had the distribution. Like the point of the startup is you have to go fight with a better product or a more clever product and maybe a different business model to go get new distribution.

But the specifics around the product surface and the data I think are actually worth understanding. There's a really strong innovators dilemma. If you look at the SaaS companies that are dominant, they sell by seat. And if I'm doing the work for you, I don't necessarily wanna sell you seats.

I might actually decrease the number of seats. The decades of work, the millions of man and woman hours of code that have been written to enable a particular workflow in CRM, for example, may not matter if I don't want people to do that workflow of filling out the database every Friday anymore.

And so I do think that this sunk cost or the incumbent advantage gets highly challenged by new UX and code generation as well. And then one disappointing learning that we found in our own portfolio is no one has the data we want in many cases, right? So imagine you are trying to automate a specific type of knowledge work.

And what you want is the reasoning trace, all of the inputs and the output decision. Like that sounds like a very useful set of data. And the incumbent companies in any given domain, they never save that data, right? Like they have a database with the outputs some of the time.

And so I would say one of the things that is worth thinking through as a startup is when an incumbent says they have the data, like what is the data you actually need to make your product higher quality? Okay, so in summary, our shorthand for the set of changes that are happening is software 3.0.

We think it is a full stack rethinking and it enables a new generation of companies to have a huge advantage. The speed of change favors startups. If the floor is lava, it's really hard to turn a really big ship. I think that some of the CEOs of large companies now are incredibly capable, but they're still trying to make 100,000 people move very quickly in a new paradigm.

The market opportunities are different, right? These markets that we think are interesting and very large, like represent a trillion dollars of value are not just the replacement software markets of the last two decades. It's not clear what the business model for many of these companies should be. Sierra just started talking about charging for outcomes.

Outcomes-based pricing has been this holy grail idea in software, and it's been very hard, but now we do more work. There are other business model challenges. And so, our companies, they spend a lot more on compute than they have in the past. They spend a lot with the foundation model providers.

They think about gross margin. They think about where to get the data. It's a time where you need to be really creative about product versus just replace the workflows of the past. And it might require ripping out those workflows entirely. It's a different development cycle. I bet most of the people in this room have written evals and compared the academic benchmark to a real-world eval and said, "That's not it." And how do I make a user understand the non-deterministic nature of these outputs or gracefully fail?
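A minimal sketch of what such a product-specific eval harness can look like; the `stub_model` and cases here are hypothetical stand-ins for a real model and real-world test set:

```python
# Minimal product-eval harness: run a model over real-world cases and track
# a pass rate, instead of relying only on academic benchmarks.
from typing import Callable, List, Tuple

def run_eval(model: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Fraction of cases where the model's output contains the expected answer
    (a deliberately loose check, since outputs are non-deterministic)."""
    passed = sum(1 for prompt, expected in cases if expected in model(prompt))
    return passed / len(cases)

# Hypothetical cases and a stub model, for illustration only.
cases = [("2+2?", "4"), ("Capital of France?", "Paris")]
stub_model = lambda p: "4" if "2+2" in p else "I think it is Paris."
print(run_eval(stub_model, cases))  # 1.0
```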

I think that's a different way to think about product than in the past. And we need to think about infrastructure again. There was this middle period where the cloud providers, the hyperscalers, took this problem away from software developers, and it was all just gonna be front-end people at some point.

And it's like, we are not there anymore. We're back in the hardware era where people are acquiring and managing and optimizing compute. And I think that will really matter in terms of capability in companies. So I guess we'll end with a call to action here and encourage all of you to seize the opportunity.

It is the greatest technical and economic opportunity that we've ever seen. We made a decade-plus career-type bet on it. And we do a lot of work with the foundation model companies. We think they are doing amazing work, and they're great partners and even co-investors in some of our efforts.

But I think all of the focus on their interesting missions around AGI and safety do not mean that there are not opportunities in other parts of the economy. The world is very large, and we think much of the value will be distributed in the world through an unbundling and eventually a re-bundling, as often happens in technology cycles.

So we think this is a market that is structurally supportive of startups. We're really excited to try to work with the more ambitious ones. And the theme of 2024 to us has been like, well, thank goodness, this is an ecosystem that is much friendlier to startups than 2023. It is what we hoped.

And so, you know, please ask those questions and take advantage of the opportunity. - Thanks for joining our first talk from Latent Space Live at NeurIPS 2024 in Vancouver. As always, this is your AI co-host, Charlie. I want to give a huge thank you to Sarah Guo and Pranav Reddy for sharing their invaluable insights on the state of AI in 2024.

Be sure to check the link in the description for their presentation slides, social media links, and additional resources. Watch out and take care. (upbeat music)