
The 2025 AI Engineering Report — Barr Yaron, Amplify



00:00:28.600 | All right. Hi, everyone.
00:00:30.240 | Thank you for having me here.
00:00:32.840 | And huge thanks to Ben, to swyx, to all the organizers
00:00:36.340 | who've put so much time and heart into bringing this community together.
00:00:39.940 | Yeah.
00:00:42.580 | All right.
00:00:45.880 | So we're here because we care about AI engineering
00:00:48.520 | and where this field is headed.
00:00:50.460 | So to better understand the current landscape,
00:00:52.720 | we launched the 2025 State of AI Engineering Survey,
00:00:57.200 | and I'm excited to share some early findings with you today.
00:01:00.040 | All right. Before we dive into the results, the least interesting slide.
00:01:07.040 | I don't know everyone in this audience, but I'm Barr.
00:01:10.340 | I'm an investment partner at Amplify,
00:01:12.820 | where I'm lucky to invest in technical founders,
00:01:15.680 | including companies built by and for AI engineers.
00:01:18.660 | And with that, let's get into what you actually care about,
00:01:22.420 | which is enough Barr and more bar charts.
00:01:24.760 | And there are a lot of bar charts coming up.
00:01:28.600 | Okay.
00:01:30.400 | So first, our sample: we had 500 respondents fill out the survey,
00:01:35.440 | including many of you here in the audience today and on the live stream.
00:01:39.340 | Thank you for doing that.
00:01:42.140 | And the largest group called themselves engineers,
00:01:45.520 | whether software engineers or AI engineers.
00:01:48.720 | While this is the AI engineering conference,
00:01:51.520 | it's clear from the speakers and from the hallway chats
00:01:54.560 | that there's a wide mix of titles and roles.
00:01:56.920 | You even let a VC sneak in.
00:02:00.260 | So let's test this with a quick show of hands.
00:02:02.460 | Raise your hand if your title is actually AI engineer
00:02:06.160 | at the AI engineering conference.
00:02:08.140 | Okay. That is extremely sparse.
00:02:10.700 | Raise your hand.
00:02:14.300 | Put your hands down.
00:02:15.040 | Raise your hand if your title is something else entirely.
00:02:17.740 | So that should be almost everyone.
00:02:20.640 | Keep it up if you think you're doing the exact same work
00:02:23.880 | as many of the AI engineers.
00:02:25.680 | All right.
00:02:28.080 | So this sort of tracks.
00:02:29.680 | Titles are weird right now, but the community is broad.
00:02:32.160 | It's technical.
00:02:32.880 | It's growing.
00:02:33.920 | We expect that AI engineer label to gain even more ground.
00:02:37.360 | Couldn't help myself.
00:02:38.960 | Quick Google Trends search:
00:02:40.620 | the term "AI engineering" barely registered before late 2022.
00:02:44.260 | We know what happened.
00:02:46.000 | ChatGPT launched.
00:02:47.240 | And the momentum of AI engineering interest has not slowed since.
00:02:50.200 | Okay.
00:02:51.300 | So people had a wide variety of titles,
00:02:53.300 | but also a wide variety of experience.
00:02:56.440 | The interesting part here is that many of our most seasoned developers
00:02:59.880 | are AI newcomers.
00:03:01.620 | So among software engineers with 10-plus years of software experience,
00:03:05.920 | nearly half have been working with AI for three years or less,
00:03:09.360 | and one in 10 started just this past year.
00:03:11.980 | So change right now is the only constant, even for the veterans.
00:03:15.560 | So what are folks actually building?
00:03:19.040 | Let's get into the juice.
00:03:20.340 | So more than half of the respondents are using LLMs for both internal
00:03:25.240 | and external use cases.
00:03:26.740 | What was striking to me was that three out of the top five models
00:03:30.940 | and half of the top 10 models that respondents are using for those external,
00:03:35.740 | customer-facing use cases are from OpenAI.
00:03:38.780 | The top use cases that we saw are code generation and code intelligence,
00:03:44.860 | and writing assistance and content generation.
00:03:46.920 | Maybe that's not particularly surprising.
00:03:48.960 | But the real story here is heterogeneity.
00:03:51.600 | So 94% of people who use LLMs are using them for at least two use cases,
00:03:56.960 | and 82% are using them for at least three.
00:03:59.540 | Basically, folks who are using LLMs are using them internally, externally,
00:04:03.800 | and across multiple use cases.
00:04:06.140 | All right.
00:04:07.040 | So you may ask, how are folks actually interfacing with the models?
00:04:10.840 | And how are they customizing their systems for these use cases?
00:04:16.140 | Besides few-shot learning, RAG is the most popular way folks are customizing
00:04:21.080 | their systems.
00:04:22.180 | So 70% of respondents said they're using it.
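For anyone newer to the pattern, here is a minimal sketch of what RAG reduces to: embed a corpus, retrieve the chunks most similar to the query, and prepend them to the prompt. The embed function below is a deterministic toy stand-in so the sketch runs end to end, not any particular vendor's API.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in for a real embedding model (hypothetical); deterministic
    # within a run so the example executes without external services.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)

def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    # Cosine similarity between the query and each chunk; keep the top k.
    q = embed(query)
    sims = np.array([q @ embed(d) for d in docs])
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

def rag_prompt(query: str, docs: list[str]) -> str:
    # The retrieved chunks become context the model grounds its answer in.
    context = "\n\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```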
00:04:25.560 | The real surprise for me here, I'm looking to gauge surprise in the audience,
00:04:30.660 | was how much fine-tuning is happening across the board.
00:04:34.560 | It was much more than I had expected overall.
00:04:37.940 | In the sample, the researchers and research engineers are the ones
00:04:41.600 | doing fine-tuning by far the most.
00:04:44.180 | We also asked an open-ended question for those who were fine-tuning.
00:04:47.980 | What specific techniques are you using?
00:04:49.980 | So here's what the fine-tuners had to say.
00:04:52.780 | 40% mentioned LoRA or QLoRA, reflecting a strong preference for parameter-efficient methods.
00:04:59.820 | We also saw a bunch of different fine-tuning methods, including DPO and reinforcement fine-tuning,
00:05:06.020 | and the most popular core training approach was good old supervised fine-tuning.
00:05:12.360 | Many hybrid approaches were listed as well.
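Since LoRA and QLoRA came up so often, here is roughly what that setup looks like with Hugging Face's peft library; the base model and hyperparameters below are illustrative choices, not recommendations from the survey.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# Illustrative base model; swap in whatever you are actually fine-tuning.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA freezes the base weights and trains small low-rank adapter matrices.
lora = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # adapter rank
    lora_alpha=16,                        # adapter scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()        # typically well under 1% of weights
```

QLoRA layers 4-bit quantization of the frozen base weights on top of the same adapter idea, which is what makes fine-tuning larger models on a single GPU feasible.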
00:05:17.380 | Moving on to updating systems: sometimes it can feel like new models come out every single week.
00:05:25.980 | Just as you finished integrating one, another one drops with better benchmarks and a breaking change.
00:05:31.320 | So it turns out more than 50% are updating their models at least monthly, 17% weekly.
00:05:39.860 | And folks are updating their prompts much more frequently.
00:05:43.600 | So 70% of respondents are updating prompts at least monthly, and 1 in 10 are doing it daily.
00:05:49.980 | So it sounds like some of you have not stopped typing since GPT-4 dropped.
00:05:53.740 | But I also have empathy: you see one blog post from Simon Willison, and suddenly your trusty prompt just isn't good enough anymore.
00:06:05.420 | Despite all of these prompt changes, a full 31% of respondents don't have any way of managing their prompts.
00:06:14.020 | What I did not ask is how AI engineers feel about not doing anything to manage their prompts.
00:06:20.360 | So we have the 2026 survey for that.
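To be fair, managing prompts doesn't require a dedicated platform. A minimal sketch, assuming a simple files-in-git layout, is just to treat prompts as versioned, templated artifacts instead of string literals scattered through the codebase:

```python
from pathlib import Path

# Assumed layout: prompts/<name>/<version>.txt, checked into version control
# alongside the code that uses them, so every change is a reviewable diff.
PROMPT_DIR = Path("prompts")

def load_prompt(name: str, version: str) -> str:
    return (PROMPT_DIR / name / f"{version}.txt").read_text()

def render(template: str, **variables: str) -> str:
    return template.format(**variables)

# Usage: pin the version you actually evaluated, not "whatever is latest".
prompt = render(load_prompt("summarize", "v3"),
                document="...", audience="engineers")
```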
00:06:24.360 | We also asked folks across the different modalities who is actually using these models at work, and is it actually going well?
00:06:33.280 | And we see that image, video, and audio usage all lag text usage by significant margins.
00:06:39.620 | I like to call this the multimodal production gap, because I wanted an animation.
00:06:47.020 | And this gap still persists when we add in folks who have these models in production but haven't gotten as much traction with them.
00:06:56.740 | Okay, what's interesting here is when we add the folks who are not using models at all in this chart, too.
00:07:08.080 | So here we can see folks who are not using text, not using image, not using audio, or not using video.
00:07:15.400 | And we have two categories.
00:07:16.800 | It's broken down by folks who plan to eventually use these modalities, and folks who do not currently plan to.
00:07:24.060 | You can roughly see this ratio of no plan to adopt versus plan to adopt.
00:07:29.800 | Audio has the highest intent to adopt: 37% of the folks not using audio today have a plan to eventually adopt it.
00:07:39.040 | So get ready to see an audio wave.
00:07:41.880 | Of course, as models get better and more accessible, I imagine some of these adoption numbers will go up even further.
00:07:47.720 | All right, so we have to talk about agents.
00:07:51.160 | One question I almost put in the survey was "how do you define an AI agent?", but I figured I would still be reading through the different responses.
00:07:58.900 | So for the sake of clarity, we defined an AI agent as a system where an LLM controls the core decision-making or workflow.
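Under that definition, the smallest agent is a loop in which the model picks the next step. A sketch of that shape, where call_llm and the tool registry are stand-ins for whatever model and tools you actually run:

```python
import json

def call_llm(messages: list[dict]) -> dict:
    # Hypothetical stand-in: the model returns either
    # {"action": <tool name>, "args": {...}} or {"final": <answer>}.
    raise NotImplementedError("plug in your model call here")

TOOLS = {
    # Toy tool; real ones would hit search indexes, databases, APIs, etc.
    "search": lambda query: f"top results for {query!r}",
}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = call_llm(messages)   # the LLM controls the workflow
        if "final" in decision:
            return decision["final"]
        result = TOOLS[decision["action"]](**decision["args"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"
```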
00:08:06.900 | So 80% of respondents say LLMs are working well at work, but less than 20% say the same about agents.
00:08:15.640 | Agents aren't everywhere yet, but they're coming.
00:08:19.640 | The majority of folks may not be using agents, but most at least plan to.
00:08:24.580 | Fewer than one in ten say that they will never use agents. All to say: people want their agents, and I'm probably preaching to the choir.
00:08:32.060 | The majority of agents already in production do have write access, typically with a human in the loop, and some can even take actions independently.
00:08:43.260 | So, as more agents are adopted, I'm excited to learn more about the tool permissions that folks grant them.
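One common shape for that human in the loop is an approval gate in front of any tool that can mutate state, while read-only tools pass straight through. A sketch, with the tool classification below as an assumption:

```python
# Illustrative classification; anything that mutates state needs sign-off.
WRITE_TOOLS = {"send_email", "update_record"}

def approve(tool: str, args: dict) -> bool:
    # Stand-in for a real review step: a Slack approval, a review queue, etc.
    return input(f"Allow {tool} with {args}? [y/N] ").strip().lower() == "y"

def execute(tool: str, fn, args: dict) -> str:
    # Read-only tools run immediately; write tools wait for a human.
    if tool in WRITE_TOOLS and not approve(tool, args):
        return "rejected by human reviewer"
    return fn(**args)
```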
00:08:51.000 | If we want AI in production, of course we need strong monitoring and observability.
00:08:56.880 | So we asked: how do you manage and monitor your AI systems?
00:09:00.140 | This was a multi-select question, so most folks are using multiple methods to monitor their systems.
00:09:05.880 | 60% are using standard observability.
00:09:08.780 | Over 50% rely on offline eval.
00:09:12.320 | And we asked the same thing for how you evaluate your model and system accuracy and quality.
00:09:18.160 | So folks are using a combination of methods, including data collection from users, benchmarks, et cetera.
00:09:24.360 | But the most popular at the end of the day is still human review.
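For concreteness, an offline eval in its simplest form is a fixed golden set replayed against the system with a programmatic check per case, with human review covering whatever the checks can't. A minimal sketch, where run_system is a hypothetical stub for the pipeline under test:

```python
def run_system(user_input: str) -> str:
    # Hypothetical stub for the LLM pipeline under test.
    raise NotImplementedError("plug in your pipeline here")

# A small golden set, checked into version control and replayed on every change.
CASES = [
    {"input": "What is 2 + 2?", "check": lambda out: "4" in out},
    {"input": "Capital of France?", "check": lambda out: "Paris" in out},
]

def run_evals() -> float:
    # Fraction of cases whose check passes; track this number over time.
    passed = sum(case["check"](run_system(case["input"])) for case in CASES)
    return passed / len(CASES)
```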
00:09:28.760 | And for monitoring their own model usage, most respondents rely on internal metrics.
00:09:34.980 | So storage is important, too.
00:09:37.240 | Where does the context live?
00:09:38.380 | How do we get it when we need it?
00:09:40.120 | 65% of respondents are using a dedicated vector database.
00:09:44.420 | And this suggests that for many use cases, specialized vector databases are providing enough value over
00:09:53.540 | general-purpose databases with vector extensions.
00:09:53.540 | Among that group, 35% said that they primarily self-host.
00:09:58.340 | 30% primarily use a third-party provider.
00:10:02.480 | All right.
00:10:03.480 | So I think we've been having fun this whole time, but we're entering a section I like to formally call other fun stuff.
00:10:08.980 | I spent hours workshopping the name.
00:10:11.860 | So we asked AI engineers, should agents be required to disclose when they're AI and not human?
00:10:18.740 | Most folks think yes.
00:10:20.740 | Agents should disclose that they're AI.
00:10:22.740 | We asked folks if they'd pay more for inference-time compute, and the answer was yes, but not by a wide margin.
00:10:28.740 | And we asked folks if transformer-based models will be dominant in 2030, and it seems like people do believe that attention is all we'll need in 2030.
00:10:37.120 | The majority of respondents also think open-source and closed-source models are going to converge.
00:10:42.620 | So I will let you debate that after.
00:10:45.500 | No commentary needed here.
00:10:48.000 | So the mean guess for the percentage of the U.S. Gen Z population that will have AI girlfriends or boyfriends is 26%.
00:10:57.880 | I don't really know what to say or expect here, but we'll see.
00:11:01.840 | We'll see what happens in a world where folks don't know if they're being left on read or just facing latency issues, or, of course, the dreaded "it's not you, it's my algorithm."
00:11:16.220 | And finally, we asked folks what is the number one most painful thing about AI engineering today, and evaluation topped that list.
00:11:24.220 | So it's a good thing this conference and the talk before me has been so focused on evals because clearly they're causing some serious pain.
00:11:31.720 | Okay, and now to bring us home, I'm going to show you what's popular.
00:11:34.600 | So we asked folks to pick all the podcasts and newsletters that they actively learn something from at least once a month.
00:11:41.680 | And these were the top 10 of each, so if you're looking for new content to follow and to learn from, this is your guide.
00:11:47.180 | Many of the creators are in this room, so keep up the great work.
00:11:51.680 | And I'll just shout out that swyx is listed under both popular newsletters and popular podcasts for Latent Space, so I will just leave this here.
00:11:59.680 | I think that's enough bar charts and Barr time, but if you want to geek out about AI trends, you can come find me online or in the hallways.
00:12:11.680 | We're going to be publishing a full report next week.
00:12:14.680 | I'll let Elon and Musk have Twitter today, but it's going to include more juicy details including everyone's favorite models and tools across the stack.
00:12:23.680 | Thank you for the time.
00:12:25.680 | Enjoy the afternoon.
00:12:26.680 | Thank you.
00:12:27.680 | Thank you.