The 2025 AI Engineering Report — Barr Yaron, Amplify

And huge thanks to Ben, to SWIX, to all the organizers who've put so much time and heart into bringing this community together.
So we're here because we care about AI engineering. So to better understand the current landscape, we launched the 2025 State of AI Engineering Survey, and I'm excited to share some early findings with you today.
All right. Before we dive into the results, the least interesting slide.
I don't know everyone in this audience, but I'm Barr, from Amplify, where I'm lucky to invest in technical founders, including companies built by and for AI engineers.
And with that, let's get into what you actually care about.
So first, our sample: we had 500 respondents fill out the survey, including many of you here in the audience today and on the live stream.
And the largest group called themselves engineers, but it's clear from the speakers, from the hallway chats, that this community is broader than any one title. So let's test this with a quick show of hands.
Raise your hand if your title is actually AI engineer. Raise your hand if your title is something else entirely. Keep it up if you think you're doing the exact same work as an AI engineer.
Titles are weird right now, but the community is broad. We expect that AI engineer label to gain even more ground.
The term AI engineering barely registered before late 2022, and the momentum of interest in AI engineering has not slowed since.
The interesting part here is that many of our most seasoned developers are new to AI. Among software engineers with 10-plus years of software experience, nearly half have been working with AI for three years or less. So change right now is the only constant, even for the veterans.
So more than half of the respondents are using LLMs for both internal and external use cases. What was striking to me was that three out of the top five models, and half of the top 10 models, that respondents are using for those external, customer-facing products are from OpenAI.
The top use cases that we saw are code generation and code intelligence. So 94% of people who use LLMs are using them for at least two use cases. Basically, folks who are using LLMs are using them internally, externally, and for more than one thing.
So you may ask, how are folks actually interfacing with the models? And how are they customizing their systems for these use cases? Besides few-shot learning, RAG is the most popular way folks are customizing their systems.
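To make that pattern concrete, here is a toy sketch of RAG-style customization: word-overlap scoring stands in for a real embedding model, and call_llm() is a hypothetical placeholder for whatever model API you actually use.

```python
# Toy sketch of the retrieve-then-generate pattern (RAG).
# Word-overlap scoring stands in for a real embedding model, and
# call_llm() is a hypothetical placeholder for your model API.

DOCS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Pacific, Monday through Friday.",
    "Enterprise plans include single sign-on and audit logs.",
]

def score(query: str, doc: str) -> int:
    """Count shared words between the query and a document (toy retrieval)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents that overlap most with the query."""
    return sorted(DOCS, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Stuff retrieved context into the prompt before asking the model."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("What is the refund policy?"))
    # response = call_llm(build_prompt(...))  # hypothetical model call
```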
The real surprise for me here, and I'm looking to gauge surprise in the audience, was how much fine-tuning is happening across the board. It was much more than I had expected overall.
In the sample, we have researchers and we have research engineers, who are the ones most likely to be fine-tuning. We also asked an open-ended question for those who were fine-tuning: 40% mentioned LoRA or QLoRA, reflecting a strong preference for parameter-efficient methods. We also saw a bunch of different fine-tuning methods, including DPO and reinforcement fine-tuning, and the most popular core training approach was good old supervised fine-tuning.
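As an illustration of what that parameter-efficient preference looks like in practice, here is a minimal LoRA setup sketch assuming the Hugging Face transformers and peft libraries; the checkpoint name and target modules are placeholders that vary by model architecture.

```python
# Minimal LoRA setup sketch using Hugging Face transformers + peft.
# The model name and target_modules are placeholders; they vary by architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder checkpoint

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which projections get adapters (model-specific)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically a small fraction of the base model's weights

# From here, supervised fine-tuning proceeds with a normal training loop or Trainer,
# updating only the small adapter weights rather than the full model.
```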
Moving on to updating systems: sometimes it can feel like new models come out every single week. Just as you finish integrating one, another one drops with better benchmarks and a breaking change. So it turns out more than 50% are updating their models at least monthly, and 17% weekly. And folks are updating their prompts much more frequently: 70% of respondents are updating prompts at least monthly, and 1 in 10 are doing it daily.
So it sounds like some of you have not stopped typing since GPT-4 dropped. But I also have empathy: you see one blog post from Simon Willison, and suddenly your trusty prompt just isn't good enough anymore. Despite all of these prompt changes, a full 31% of respondents don't have any way of managing their prompts. What I did not ask is how AI engineers feel about not doing anything to manage their prompts.
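One lightweight way to manage prompts, offered purely as an illustrative sketch rather than anything the survey prescribes, is a small versioned registry instead of inline string literals:

```python
# Sketch of a tiny prompt registry: every change creates a new version,
# so prompt edits are tracked instead of silently overwriting each other.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: int
    template: str
    created_at: str

class PromptRegistry:
    def __init__(self):
        self._prompts: dict[str, list[PromptVersion]] = {}

    def register(self, name: str, template: str) -> PromptVersion:
        versions = self._prompts.setdefault(name, [])
        pv = PromptVersion(
            name=name,
            version=len(versions) + 1,
            template=template,
            created_at=datetime.now(timezone.utc).isoformat(),
        )
        versions.append(pv)
        return pv

    def latest(self, name: str) -> PromptVersion:
        return self._prompts[name][-1]

registry = PromptRegistry()
registry.register("summarize", "Summarize the following text:\n{text}")
registry.register("summarize", "Summarize the text below in three bullet points:\n{text}")
print(registry.latest("summarize").version)  # -> 2
```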
We also asked folks across the different modalities who is actually using these models at work, and is it actually going well? And we see that image, video, and audio usage all lag text usage by significant margins. I like to call this the multimodal production gap, because I wanted an animation. And this gap still persists when we add in folks who have these models in production, but have not garnered as much traction.
Okay, what's interesting here is when we add the folks who are not using models at all to this chart, too. So here we can see folks who are not using text, not using image, not using audio, or not using video. It's broken down by folks who plan to eventually use these modalities and folks who do not currently plan to. You can roughly see this ratio of no plan to adopt versus plan to adopt. Audio has the highest intent to adopt: 37% of the folks not using audio today have a plan to eventually adopt audio. Of course, as models get better and more accessible, I imagine some of these adoption numbers will go up even further.
One question I almost put in the survey was how do you define an AI agent, but I thought I would still be reading through different responses. So for the sake of clarity, we defined an AI agent as a system where an LLM controls the core decision-making or workflow. So 80% of respondents say LLMs are working well at work, but less than 20% say the same about agents.
Agents aren't everywhere yet, but they're coming. The majority of folks may not be using agents, but most at least plan to. Fewer than one in ten say that they will never use agents. All to say, people want their agents, and I'm probably preaching to the choir. The majority of agents already in production do have write access, typically with a human in the loop, and some can even take actions independently. So I'm excited, as more agents are adopted, to learn more about the tool permissioning that folks put in place.
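As a rough sketch of that agent definition, with a human approval gate in front of write access, consider the toy loop below; decide() is a hypothetical stand-in for the model's choice of next step, and the tools are placeholders.

```python
# Sketch of the survey's agent definition: an LLM controls the core workflow,
# here with a human approval gate before any write action is executed.
# decide() is a hypothetical stand-in for a model call that picks the next step.

READ_TOOLS = {"search_docs": lambda q: f"results for {q!r}"}
WRITE_TOOLS = {"create_ticket": lambda summary: f"created ticket: {summary}"}

def decide(goal: str, history: list[str]) -> tuple[str, str]:
    """Placeholder for the LLM's decision; returns (tool_name, argument)."""
    return ("create_ticket", goal) if history else ("search_docs", goal)

def run_agent(goal: str, max_steps: int = 4) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        tool, arg = decide(goal, history)
        if tool in WRITE_TOOLS:
            # Human in the loop: write access requires explicit approval.
            if input(f"Approve {tool}({arg!r})? [y/N] ").strip().lower() != "y":
                history.append(f"{tool} denied by reviewer")
                break
            history.append(WRITE_TOOLS[tool](arg))
            break
        history.append(READ_TOOLS[tool](arg))
    return history

# run_agent("customer reports a billing error")
```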
If we want AI in production, of course we need strong monitoring and observability. So we asked, how do you manage and monitor your AI systems? This was a multi-select question, and most folks are using multiple methods to monitor their systems.
And we asked the same thing about how you evaluate your model and system accuracy and quality. So folks are using a combination of methods, including data collection from users, benchmarks, et cetera. But the most popular at the end of the day is still human review.
And for monitoring their own model usage, most respondents rely on internal metrics.
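A toy sketch of that combination, automatic checks first with anything uncertain routed to a human review queue, might look like the following; the checks themselves are deliberately simplistic placeholders.

```python
# Toy eval harness: cheap automatic checks first, and anything that fails
# lands in a human review queue, since human review remains the final arbiter.

def auto_checks(answer: str, expected_keywords: list[str]) -> bool:
    """Cheap automatic check: did the answer mention the expected keywords?"""
    return all(k.lower() in answer.lower() for k in expected_keywords)

def evaluate(samples: list[dict]) -> dict:
    passed, review_queue = 0, []
    for s in samples:
        if auto_checks(s["answer"], s["expected_keywords"]):
            passed += 1
        else:
            review_queue.append(s)  # route to human review
    return {"auto_pass_rate": passed / len(samples), "needs_human_review": review_queue}

samples = [
    {"question": "Refund window?", "answer": "Returns are accepted within 30 days.",
     "expected_keywords": ["30 days"]},
    {"question": "Support hours?", "answer": "We are always available.",
     "expected_keywords": ["9am", "5pm"]},
]
print(evaluate(samples))  # the second sample lands in the human review queue
```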
65% of respondents are using a dedicated vector database. And this suggests that for many use cases, specialized vector databases are providing enough value over general-purpose databases with vector extensions. Among that group, 35% said that they primarily self-host.
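For a concrete picture, here is a minimal sketch of using a dedicated vector database, with the chromadb client as one example; exact API details differ by version and by vendor.

```python
# Minimal sketch of using a dedicated vector database (chromadb as one example).
import chromadb

client = chromadb.Client()  # in-memory instance; many teams self-host or use a managed service
collection = client.create_collection(name="docs")

collection.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "Our refund policy allows returns within 30 days of purchase.",
        "Support hours are 9am to 5pm Pacific, Monday through Friday.",
        "Enterprise plans include single sign-on and audit logs.",
    ],
)

# The database handles embedding and nearest-neighbor search for us.
results = collection.query(query_texts=["How long do I have to return an item?"], n_results=1)
print(results["documents"][0])
```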
So I think we've been having fun this whole time, but we're entering a section I like to formally call other fun stuff.
So we asked AI engineers, should agents be required to disclose when they're AI and not human? We asked folks if they'd pay more for inference-time compute, and the answer was yes, but not by a wide margin. And we asked folks if transformer-based models will be dominant in 2030, and it seems like people do believe that attention is all we'll need in 2030. The majority of respondents also think open-source and closed-source models are going to converge.
So the mean guess for the percentage of the U.S. Gen Z population that will have AI girlfriends or boyfriends is 26%. I don't really know what to say or expect here, but we'll see. We'll see what happens in a world where folks don't know if they're being left on read or just facing latency issues, or, of course, the dreaded "it's not you, it's my algorithm."
And finally, we asked folks what is the number one most painful thing about AI engineering today, and evaluation topped that list. So it's a good thing this conference and the talk before me have been so focused on evals, because clearly they're causing some serious pain.
Okay, and now to bring us home, I'm going to show you what's popular. So we asked folks to pick all the podcasts and newsletters that they actively learn something from at least once a month. And these were the top 10 of each, so if you're looking for new content to follow and to learn from, this is your guide. Many of the creators are in this room, so keep up the great work.
And I'll just shout out that SWIX is listed under both popular newsletters and popular podcasts for Latent Space, so I will just leave this here. I think that's enough bar charts and Barr time, but if you want to geek out about AI trends, you can come find me online or in the hallways. We're going to be publishing a full report next week. I'll let Elon Musk have Twitter today, but it's going to include more juicy details, including everyone's favorite models and tools across the stack.