The 2025 AI Engineering Report — Barr Yaron, Amplify

And huge thanks to Ben, to SWIX, to all the organizers who've put so much time and heart into bringing this community together.
So we're here because we care about AI engineering. So to better understand the current landscape, we launched the 2025 State of AI Engineering Survey, and I'm excited to share some early findings with you today.
All right. Before we dive into the results, the least interesting slide.
I don't know everyone in this audience, but I'm Barr, from Amplify, where I'm lucky to invest in technical founders, including companies built by and for AI engineers.
And with that, let's get into what you actually care about.
So first, our sample: we had 500 respondents fill out the survey, including many of you here in the audience today and on the live stream.
And the largest group called themselves engineers, but it's clear from the speakers, from the hallway chats, that this community is broader than any one title. So let's test this with a quick show of hands.
Raise your hand if your title is actually AI engineer. Raise your hand if your title is something else entirely. Keep it up if you think you're doing the exact same work as an AI engineer.
Titles are weird right now, but the community is broad. We expect that AI engineer label to gain even more ground.
The term AI engineering barely registered before late 2022, and the momentum of interest in AI engineering has not slowed since.
The interesting part here is that many of our most seasoned developers are new to AI. Among software engineers with 10-plus years of software experience, nearly half have been working with AI for three years or less. So change right now is the only constant, even for the veterans.
So more than half of the respondents are using LLMs for both internal and external use cases. What was striking to me was that three out of the top five models, and half of the top 10 models, that respondents are using for those external, customer-facing products are from OpenAI.
The top use cases that we saw are code generation and code intelligence. So 94% of people who use LLMs are using them for at least two use cases. Basically, folks who are using LLMs are using them internally, externally, and for more than one thing.
So you may ask, how are folks actually interfacing with the models? And how are they customizing their systems for these use cases? Besides few-shot learning, RAG is the most popular way folks are customizing their systems.
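To make that pattern concrete, here is a toy sketch of RAG-style customization: word-overlap scoring stands in for a real embedding model, and call_llm() is a hypothetical placeholder for whatever model API you actually use.

```python
# Toy sketch of the retrieve-then-generate pattern (RAG).
# Word-overlap scoring stands in for a real embedding model, and
# call_llm() is a hypothetical placeholder for your model API.

DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Pacific, Monday through Friday.",
    "Enterprise plans include single sign-on and audit logs.",
]

def score(query: str, doc: str) -> int:
    """Count shared words between the query and a document (toy retrieval)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents that overlap most with the query."""
    return sorted(DOCS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Stuff retrieved context into the prompt before asking the model."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("What is the refund policy?"))
    # response = call_llm(build_prompt(...))  # hypothetical model call
```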
The real surprise for me here, and I'm looking to gauge surprise in the audience, was how much fine-tuning is happening across the board. It was much more than I had expected overall.
In the sample, we have researchers and we have research engineers, who are the ones most likely to be fine-tuning. We also asked an open-ended question for those who were fine-tuning: 40% mentioned LoRA or QLoRA, reflecting a strong preference for parameter-efficient methods. We also saw a bunch of different fine-tuning methods, including DPO and reinforcement fine-tuning, and the most popular core training approach was good old supervised fine-tuning.
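As an illustration of what that parameter-efficient preference looks like in practice, here is a minimal LoRA setup sketch assuming the Hugging Face transformers and peft libraries; the checkpoint name and target modules are placeholders that vary by model architecture.

```python
# Minimal LoRA setup sketch using Hugging Face transformers + peft.
# The model name and target_modules are placeholders; they vary by architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder checkpoint

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which projections get adapters (model-specific)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically a small fraction of the base model's weights

# From here, supervised fine-tuning proceeds with a normal training loop or Trainer,
# updating only the small adapter weights rather than the full model.
```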
Moving on to updating systems: sometimes it can feel like new models come out every single week. Just as you finish integrating one, another one drops with better benchmarks and a breaking change. So it turns out more than 50% are updating their models at least monthly, and 17% weekly. And folks are updating their prompts much more frequently: 70% of respondents are updating prompts at least monthly, and 1 in 10 are doing it daily.
So it sounds like some of you have not stopped typing since GPT-4 dropped. But I also have empathy: you see one blog post from Simon Willison, and suddenly your trusty prompt just isn't good enough anymore. Despite all of these prompt changes, a full 31% of respondents don't have any way of managing their prompts. What I did not ask is how AI engineers feel about not doing anything to manage their prompts.
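One lightweight way to manage prompts, offered purely as an illustrative sketch rather than anything the survey prescribes, is a small versioned registry instead of inline string literals:

```python
# Sketch of a tiny prompt registry: every change creates a new version,
# so prompt edits are tracked instead of silently overwriting each other.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: int
    template: str
    created_at: str

class PromptRegistry:
    def __init__(self):
        self._prompts: dict[str, list[PromptVersion]] = {}

    def register(self, name: str, template: str) -> PromptVersion:
        versions = self._prompts.setdefault(name, [])
        pv = PromptVersion(
            name=name,
            version=len(versions) + 1,
            template=template,
            created_at=datetime.now(timezone.utc).isoformat(),
        )
        versions.append(pv)
        return pv

    def latest(self, name: str) -> PromptVersion:
        return self._prompts[name][-1]

registry = PromptRegistry()
registry.register("summarize", "Summarize the following text:\n{text}")
registry.register("summarize", "Summarize the text below in three bullet points:\n{text}")
print(registry.latest("summarize").version)  # -> 2
```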
We also asked folks across the different modalities who is actually using these models at work, and is it actually going well? And we see that image, video, and audio usage all lag text usage by significant margins. I like to call this the multimodal production gap, because I wanted an animation. And this gap still persists when we add in folks who have these models in production, but have not garnered as much traction.
Okay, what's interesting here is when we add the folks who are not using models at all to this chart, too. So here we can see folks who are not using text, not using image, not using audio, or not using video. It's broken down by folks who plan to eventually use these modalities and folks who do not currently plan to. You can roughly see this ratio of no plan to adopt versus plan to adopt. Audio has the highest intent to adopt: 37% of the folks not using audio today have a plan to eventually adopt audio. Of course, as models get better and more accessible, I imagine some of these adoption numbers will go up even further.
One question I almost put in the survey was how do you define an AI agent, but I thought I would still be reading through different responses. So for the sake of clarity, we defined an AI agent as a system where an LLM controls the core decision-making or workflow. So 80% of respondents say LLMs are working well at work, but less than 20% say the same about agents.
Agents aren't everywhere yet, but they're coming. The majority of folks may not be using agents, but most at least plan to. Fewer than one in ten say that they will never use agents. All to say, people want their agents, and I'm probably preaching to the choir. The majority of agents already in production do have write access, typically with a human in the loop, and some can even take actions independently. So I'm excited, as more agents are adopted, to learn more about the tool permissioning that folks put in place.
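As a rough sketch of that agent definition, with a human approval gate in front of write access, consider the toy loop below; decide() is a hypothetical stand-in for the model's choice of next step, and the tools are placeholders.

```python
# Sketch of the survey's agent definition: an LLM controls the core workflow,
# here with a human approval gate before any write action is executed.
# decide() is a hypothetical stand-in for a model call that picks the next step.

READ_TOOLS = {"search_docs": lambda q: f"results for {q!r}"}
WRITE_TOOLS = {"create_ticket": lambda summary: f"created ticket: {summary}"}

def decide(goal: str, history: list[str]) -> tuple[str, str]:
    """Placeholder for the LLM's decision; returns (tool_name, argument)."""
    return ("create_ticket", goal) if history else ("search_docs", goal)

def run_agent(goal: str, max_steps: int = 4) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        tool, arg = decide(goal, history)
        if tool in WRITE_TOOLS:
            # Human in the loop: write access requires explicit approval.
            if input(f"Approve {tool}({arg!r})? [y/N] ").strip().lower() != "y":
                history.append(f"{tool} denied by reviewer")
                break
            history.append(WRITE_TOOLS[tool](arg))
            break
        history.append(READ_TOOLS[tool](arg))
    return history

# run_agent("customer reports a billing error")
```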
If we want AI in production, of course we need strong monitoring and observability. So we asked, how do you manage and monitor your AI systems? This was a multi-select question, and most folks are using multiple methods to monitor their systems.
And we asked the same thing about how you evaluate your model and system accuracy and quality. So folks are using a combination of methods, including data collection from users, benchmarks, et cetera. But the most popular at the end of the day is still human review.
And for monitoring their own model usage, most respondents rely on internal metrics.
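A toy sketch of that combination, automatic checks first with anything uncertain routed to a human review queue, might look like the following; the checks themselves are deliberately simplistic placeholders.

```python
# Toy eval harness: cheap automatic checks first, and anything that fails
# lands in a human review queue, since human review remains the final arbiter.

def auto_checks(answer: str, expected_keywords: list[str]) -> bool:
    """Cheap automatic check: did the answer mention the expected keywords?"""
    return all(k.lower() in answer.lower() for k in expected_keywords)

def evaluate(samples: list[dict]) -> dict:
    passed, review_queue = 0, []
    for s in samples:
        if auto_checks(s["answer"], s["expected_keywords"]):
            passed += 1
        else:
            review_queue.append(s)  # route to human review
    return {"auto_pass_rate": passed / len(samples), "needs_human_review": review_queue}

samples = [
    {"question": "Refund window?", "answer": "Returns are accepted within 30 days.",
     "expected_keywords": ["30 days"]},
    {"question": "Support hours?", "answer": "We are always available.",
     "expected_keywords": ["9am", "5pm"]},
]
print(evaluate(samples))  # the second sample lands in the human review queue
```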
65% of respondents are using a dedicated vector database. And this suggests that for many use cases, specialized vector databases are providing enough value over general-purpose databases with vector extensions. Among that group, 35% said that they primarily self-host.
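For a concrete picture, here is a minimal sketch of using a dedicated vector database, with the chromadb client as one example; exact API details differ by version and by vendor.

```python
# Minimal sketch of using a dedicated vector database (chromadb as one example).
import chromadb

client = chromadb.Client()  # in-memory instance; many teams self-host or use a managed service
collection = client.create_collection(name="docs")

collection.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "Our refund policy allows returns within 30 days of purchase.",
        "Support hours are 9am to 5pm Pacific, Monday through Friday.",
        "Enterprise plans include single sign-on and audit logs.",
    ],
)

# The database handles embedding and nearest-neighbor search for us.
results = collection.query(query_texts=["How long do I have to return an item?"], n_results=1)
print(results["documents"][0])
```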
So I think we've been having fun this whole time, but we're entering a section I like to formally call other fun stuff.
So we asked AI engineers, should agents be required to disclose when they're AI and not human? We asked folks if they'd pay more for inference-time compute, and the answer was yes, but not by a wide margin. And we asked folks if transformer-based models will be dominant in 2030, and it seems like people do believe that attention is all we'll need in 2030. The majority of respondents also think open-source and closed-source models are going to converge.
So the mean guess for the percentage of the U.S. Gen Z population that will have AI girlfriends or boyfriends is 26%. I don't really know what to say or expect here, but we'll see. We'll see what happens in a world where folks don't know if they're being left on read or just facing latency issues, or, of course, the dreaded "it's not you, it's my algorithm."
And finally, we asked folks what is the number one most painful thing about AI engineering today, and evaluation topped that list. So it's a good thing this conference and the talk before me have been so focused on evals, because clearly they're causing some serious pain.
Okay, and now to bring us home, I'm going to show you what's popular. So we asked folks to pick all the podcasts and newsletters that they actively learn something from at least once a month. And these were the top 10 of each, so if you're looking for new content to follow and to learn from, this is your guide. Many of the creators are in this room, so keep up the great work.
And I'll just shout out that SWIX is listed under both popular newsletters and popular podcasts for Latent Space, so I will just leave this here. I think that's enough bar charts and Barr time, but if you want to geek out about AI trends, you can come find me online or in the hallways. We're going to be publishing a full report next week. I'll let Elon Musk have Twitter today, but it's going to include more juicy details, including everyone's favorite models and tools across the stack.