
The 2025 AI Engineering Report — Barr Yaron, Amplify


Transcript

All right. Hi, everyone. Thank you for having me here. And huge thanks to Ben, to swyx, to all the organizers who've put so much time and heart into bringing this community together. Yeah. All right. So we're here because we care about AI engineering and where this field is headed.

So to better understand the current landscape, we launched the 2025 State of AI Engineering Survey, and I'm excited to share some early findings with you today. All right. Before we dive into the results, the least interesting slide. I don't know everyone in this audience, but I'm Barr. I'm an investment partner at Amplify, where I'm lucky to invest in technical founders, including companies built by and for AI engineers.

And with that, let's get into what you actually care about, which is enough Barr and more bar charts. And there are a lot of bar charts coming up. Okay. So first, our sample: we had 500 respondents fill out the survey, including many of you here in the audience today and on the live stream.

Thank you for doing that. And the largest group called themselves engineers, whether software engineers or AI engineers. While this is the AI engineering conference, it's clear from the speakers and from the hallway chats that there's a wide mix of titles and roles. You even let a VC sneak in. So let's test this with a quick show of hands.

Raise your hand if your title is actually AI engineer at the AI engineering conference. Okay. That is extremely sparse. Put your hands down. Raise your hand if your title is something else entirely. So that should be almost everyone. Keep it up if you think you're doing the exact same work as many of the AI engineers.

All right. So this sort of tracks. Titles are weird right now, but the community is broad. It's technical. It's growing. We expect that AI engineer label to gain even more ground. Couldn't help myself: a quick Google Trends search. The term AI engineering barely registered before late 2022. We know what happened.

ChatGPT launched. And the momentum behind AI engineering interest has not slowed since. Okay. So people had a wide variety of titles, but also a wide variety of experience. The interesting part here is that many of our most seasoned developers are AI newcomers. So among software engineers with 10-plus years of software experience, nearly half have been working with AI for three years or less, and one in 10 started just this past year.

So change right now is the only constant, even for the veterans. So what are folks actually building? Let's get into the juice. So more than half of the respondents are using LLMs for both internal and external use cases. What was striking to me was that three of the top five models, and half of the top 10 models, that respondents are using for those external, customer-facing products are from OpenAI.

The top use cases that we saw are code generation and code intelligence, and writing assistance and content generation. Maybe that's not particularly surprising. But the real story here is heterogeneity. So 94% of people who use LLMs are using them for at least two use cases, and 82% for at least three.

Basically, folks who are using LLMs are using them internally, externally, and across multiple use cases. All right. So you may ask, how are folks actually interfacing with the models? And how are they customizing their systems for these use cases? Besides few-shot learning, RAG is the most popular way folks are customizing their systems.

So 70% of respondents said they're using it. The real surprise for me here, and I'm looking to gauge surprise in the audience, was how much fine-tuning is happening across the board. It was much more than I had expected overall. In the sample, researchers and research engineers are the ones doing fine-tuning by far the most.

We also asked an open-ended question for those who were fine-tuning: what specific techniques are you using? So here's what the fine-tuners had to say. 40% mentioned LoRA or QLoRA, reflecting a strong preference for parameter-efficient methods. We also saw a bunch of different fine-tuning methods, including DPO and reinforcement fine-tuning, and the most popular core training approach was good old supervised fine-tuning. Many hybrid approaches were listed as well.
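For anyone who hasn't set this up before, here is roughly what a LoRA-style parameter-efficient fine-tune can look like. This is a minimal sketch using the Hugging Face peft library; the base model name and the hyperparameters are placeholder assumptions for illustration, not details from the survey.

```python
# Minimal sketch: parameter-efficient supervised fine-tuning with LoRA via peft.
# The base model checkpoint and hyperparameters below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "your-base-causal-lm"  # placeholder: any causal LM checkpoint
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# LoRA adds small trainable low-rank matrices to selected projection layers,
# so only a tiny fraction of the weights is updated during fine-tuning.
lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update
    lora_alpha=32,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # which projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters

# From here, ordinary supervised fine-tuning (for example with transformers.Trainer)
# proceeds as usual, and only the small adapter weights need to be saved.
```

QLoRA is the same idea applied on top of a quantized base model, which is what makes it attractive when GPU memory is the constraint.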

Moving on to the topic of updating systems: sometimes it can feel like new models come out every single week. Just as you finish integrating one, another one drops with better benchmarks and a breaking change. So it turns out more than 50% are updating their models at least monthly, and 17% weekly.

And folks are updating their prompts much more frequently. So 70% of respondents are updating prompts at least monthly, and 1 in 10 are doing it daily. So it sounds like some of you have not stopped typing since GPT-4 dropped. But I also have empathy: you see one blog post from Simon Willison, and suddenly your trusty prompt just isn't good enough anymore.
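One lightweight way to keep up with that churn is to treat prompts like code and version them. Here is a minimal sketch of the idea; the names, fields, and example prompt are assumptions purely for illustration, not a specific tool anyone in the survey named.

```python
# Minimal sketch of prompt versioning: keep each prompt as a named, versioned
# template in source control, and log which version produced each output.
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: str
    template: str


PROMPTS = {
    ("summarize_ticket", "v3"): PromptVersion(
        name="summarize_ticket",
        version="v3",
        template="Summarize the following support ticket in two sentences:\n\n{ticket}",
    ),
}


def render(name: str, version: str, **kwargs) -> tuple[str, str]:
    """Return the rendered prompt plus an identifier to attach to logs and evals."""
    p = PROMPTS[(name, version)]
    return p.template.format(**kwargs), f"{p.name}@{p.version}"


prompt, prompt_id = render("summarize_ticket", "v3", ticket="My export job hangs at 99%.")
# Logging prompt_id next to each model output makes it possible to compare versions later.
```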

Despite all of these prompt changes, a full 31% of respondents don't have any way of managing their prompts. What I did not ask is how AI engineers feel about not doing anything to manage their prompts. So we have the 2026 survey for that. We also asked folks across the different modalities who is actually using these models at work, and is it actually going well?

And we see that image, video, and audio usage all lag text usage by significant margins. I like to call this the multimodal production gap, because I wanted an animation. And this gap still persists when we add in folks who have these models in production but where they have not garnered as much traction.

Okay, what's interesting here is when we add the folks who are not using models at all in this chart, too. So here we can see folks who are not using text, not using image, not using audio, or not using video. And we have two categories. It's broken down by folks who plan to eventually use these modalities, and folks who do not currently plan to.

You can roughly see this ratio of no plan to adopt versus plan to adopt. Audio has the highest intent to adopt, so 37% of the folks not using audio today have a plan to eventually adopt audio. So get ready to see an audio wave. Of course, as models get better and more accessible, I imagine some of these adoption numbers will go up even further.

All right, so we have to talk about agents. One question I almost put in the survey was, how do you define an AI agent? But I thought I would still be reading through all the different responses. So for the sake of clarity, we defined an AI agent as a system where an LLM controls the core decision-making or workflow.
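To make that definition concrete, here is a minimal sketch of such a loop, where the model, rather than hard-coded logic, decides each step. Everything here, the call_llm function, the tool names, and the JSON shape of the model's reply, is a hypothetical stand-in for illustration, not any particular framework.

```python
# Minimal sketch of an agent loop: the LLM decides, at each step, whether to call a
# tool or finish. call_llm and the tools below are hypothetical stand-ins.
import json


def search_docs(query: str) -> str:
    return f"(stub) top result for: {query}"


def file_ticket(summary: str) -> str:
    return f"(stub) created ticket: {summary}"


TOOLS = {"search_docs": search_docs, "file_ticket": file_ticket}


def run_agent(task: str, call_llm, max_steps: int = 8) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # The model replies with JSON: either a final answer, or a tool it wants to call.
        decision = json.loads(call_llm(history))
        if decision.get("final_answer"):
            return decision["final_answer"]
        result = TOOLS[decision["tool"]](**decision.get("args", {}))
        history.append({"role": "tool", "content": result})
    return "Stopped after max_steps without a final answer."
```

In production you would add permissioning and, for anything with write access, a human approval step before a tool actually runs.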

So 80% of respondents say LLMs are working well at work, but less than 20% say the same about agents. Agents aren't everywhere yet, but they're coming. The majority of folks may not be using agents, but most at least plan to. Fewer than one in ten say that they will never use agents, all to say that people want their agents, and I'm probably preaching to the choir.

The majority of agents already in production do have write access, typically with a human in the loop, and some can even take actions independently. So I'm excited, as more agents are adopted, to learn more about the tool permissioning that folks have in place. If we want AI in production, of course we need strong monitoring and observability.

So we asked, do you manage and monitor your AI systems? This was a multi-select question, so most folks are using multiple methods to monitor their systems. 60% are using standard observability. Over 50% rely on offline eval. And we asked the same thing for how you evaluate your model and system accuracy and quality.
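For anyone newer to this, an offline eval just means scoring the system against a fixed reference set before changes ship, rather than judging it only on live traffic. A minimal sketch, with the dataset, the scoring rule, and the generate function all assumed purely for illustration:

```python
# Minimal sketch of an offline eval: run the system over a fixed set of labeled
# cases and track a score before shipping a new prompt or model version.
golden_set = [
    {"input": "Reset my password", "expected_intent": "account_recovery"},
    {"input": "Cancel my subscription", "expected_intent": "cancellation"},
]


def evaluate(generate, cases) -> float:
    """generate(text) returns a predicted intent; this returns accuracy over the set."""
    correct = sum(1 for c in cases if generate(c["input"]) == c["expected_intent"])
    return correct / len(cases)


# accuracy = evaluate(my_intent_classifier, golden_set)
# Tracking this number per prompt and model version is the core of an offline eval;
# human review then covers the cases an exact-match score can't capture.
```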

So folks are using a combination of methods, including data collection from users, benchmarks, et cetera. But the most popular at the end of the day is still human review. And for monitoring their own model usage, most respondents rely on internal metrics. So storage is important, too. Where does the context live?

How do we get it when we need it? 65% of respondents are using a dedicated vector database. And this suggests that for many use cases, specialized vector databases are providing enough value over general purpose databases with vector extensions. Among that group, 35% said that they primarily self-host. 30% primarily use a third-party provider.
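Whichever way the store is hosted, the core job is the same. Here is a minimal sketch of what sits behind the RAG setups mentioned earlier: embed the documents, keep the vectors, and return nearest neighbors at query time. The embed function is a hypothetical stand-in for whatever embedding model you use, and NumPy stands in for the index.

```python
# Minimal sketch of a vector store: normalized embeddings plus cosine-similarity search.
import numpy as np


def embed(text: str) -> np.ndarray:
    raise NotImplementedError("call your embedding model of choice here")


class TinyVectorStore:
    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text: str) -> None:
        v = embed(text)
        self.texts.append(text)
        self.vectors.append(v / np.linalg.norm(v))  # normalize so dot product = cosine

    def query(self, question: str, k: int = 3) -> list[str]:
        q = embed(question)
        q = q / np.linalg.norm(q)
        scores = np.array(self.vectors) @ q          # cosine similarity against all docs
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]


# In a RAG setup, the query() results are stuffed into the prompt as retrieved context.
```

Dedicated vector databases add the parts this sketch leaves out: persistence, approximate nearest-neighbor indexes, filtering, and scale.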

All right. So I think we've been having fun this whole time, but we're entering a section I like to formally call other fun stuff. I spent hours workshopping the name. So we asked AI engineers, should agents be required to disclose when they're AI and not human? Most folks think yes.

Agents should disclose that they're AI. We asked folks if they'd pay more for inference-time compute, and the answer was yes, but not by a wide margin. And we asked folks if transformer-based models will be dominant in 2030, and it seems like people do believe that attention is all we'll need in 2030.

The majority of respondents also think open-source and closed-source models are going to converge. So I will let you debate that after. No commentary needed here. So the mean guess for the percentage of the U.S. Gen Z population that will have AI girlfriends or boyfriends is 26%. I don't really know what to say or expect here, but we'll see.

We'll see what happens in a world where folks don't know if they're being left unread or just facing latency issues, or, of course, the dreaded "it's not you, it's my algorithm." And finally, we asked folks what is the number one most painful thing about AI engineering today, and evaluation topped that list.

So it's a good thing this conference and the talk before me have been so focused on evals, because clearly they're causing some serious pain. Okay, and now to bring us home, I'm going to show you what's popular. So we asked folks to pick all the podcasts and newsletters that they actively learn something from at least once a month.

And these were the top 10 of each, so if you're looking for new content to follow and to learn from, this is your guide. Many of the creators are in this room, so keep up the great work. And I'll just shout out that swyx is listed both as a popular newsletter and a popular podcast, for Latent Space, so I will just leave this here.

I think that's enough bar charts and Barr time, but if you want to geek out about AI trends, you can come find me online or in the hallways. We're going to be publishing a full report next week. I'll let Elon and Musk have Twitter today, but it's going to include more juicy details, including everyone's favorite models and tools across the stack.

Thank you for the time. Enjoy the afternoon. Thank you.