Designing AI-Intensive Applications

00:00:05.000 | - Okay.

00:00:18.240 | Hi everyone, welcome to the conference, how are you doing?

00:00:20.960 | Excellent.

00:00:23.760 | Usually I open these conferences with a small little talk

00:00:26.680 | to introduce what's going on

00:00:28.600 | and then give you a little update

00:00:30.000 | on where the state of AI engineering is

00:00:31.680 | and how we put together the conference for you.

00:00:34.940 | This is one of those combined talks.

00:00:37.540 | I'm trying to answer every single question you have

00:00:39.480 | about the conference, about AI news,

00:00:42.080 | about where this is all going,

00:00:44.400 | and we'll just dive right in.

00:00:47.060 | Okay, so, 3,000 of you, all of you registered last minute.

00:00:51.680 | Thank you for that stress.

00:00:53.060 | I actually can't quantify this.

00:00:55.480 | I call this the Gini coefficient for the AI organizer stress.

00:01:01.480 | This is compared to last year.

00:01:02.820 | It is, please just buy tickets earlier.

00:01:05.180 | Like, I mean, you know you're going to come, just do it.

00:01:07.200 | Okay.

00:01:09.420 | We also like to use this conference

00:01:11.820 | as a way to track the evolution of AI engineering.

00:01:14.760 | That's, those are the tracks for last year.

00:01:16.320 | We've just doubled every single track for you.

00:01:19.620 | So, basically, it's basically, you know, like double the value

00:01:22.820 | for whatever you get here, and I think, like, you know,

00:01:26.420 | I think this is as much concurrency as we want to do.

00:01:29.080 | Like, I know, I hear that people have decision fatigue

00:01:31.740 | and all that totally, but also we try to cover all of AI,

00:01:34.540 | so deal with it.

00:01:35.740 | We also pride ourselves in doing well by being more responsive

00:01:43.120 | than other conferences like NeurIPS and being more technical

00:01:45.700 | than other conferences like TED or whatever, what have you.

00:01:49.000 | So, we asked you what you wanted to hear about.

00:01:51.180 | These are the surveys.

00:01:52.380 | We tried all sorts of things.

00:01:53.460 | We tried computer using agents.

00:01:55.400 | We tried AI and crypto.

00:01:58.120 | It's always a fun one.

00:01:59.320 | And, but you guys told us what you wanted, and we put it in there.

00:02:03.120 | For more data, we would actually like you to finish

00:02:06.800 | out our survey; the survey's not done.

00:02:08.300 | So, if you want to head to that URL, we will present the results

00:02:11.800 | in full tomorrow.

00:02:12.760 | We would love all of you to fill it out so we can get a representative

00:02:15.600 | sample of what you want, and they'll inform us next year.

00:02:19.000 | Okay.

00:02:21.480 | You know, I think the other thing about AI engineering is that we also

00:02:24.520 | have been innovating as engineers, right?

00:02:26.100 | We're the first conference to have an MCP.

00:02:28.140 | We're the first conference to have an MCP talk accepted by MCP.

00:02:32.600 | Shout out to Sam Julien from Writer for working with us on the official

00:02:39.100 | chatbot, and Quinn and John from Daily for working with us on the official

00:02:43.060 | VoiceBot, as well as Elizabeth Treichen from Vapi.

00:02:45.940 | I need to give her a shout out because she originally helped us prototype

00:02:49.540 | the VoiceBot as well.

00:02:51.280 | So, we're trying to constantly improve the experience.

00:02:53.380 | The other thing I think I want to emphasize as well is, like,

00:02:57.940 | these are the talks that I give.

00:02:59.460 | In 2023, the very first AIE, I talked about the three types of AI engineer.

00:03:06.560 | In 2024, I talked about how AI engineering was becoming more multidisciplinary,

00:03:11.760 | and that's why we started the World's Fair with multiple tracks.

00:03:14.940 | In 2025, in New York, we talked about the evolution and the focus on agent engineering.

00:03:20.180 | So, where are we now in sort of June of 2025?

00:03:24.320 | That's where we're going to focus on.

00:03:25.640 | I think we've come a long way, regardless.

00:03:27.400 | Like, you know, people used to make fun of AI engineering, and I anticipated this.

00:03:30.900 | We used to be low status.

00:03:32.500 | People just derided GPT wrappers.

00:03:34.500 | And look at all the GPT wrappers now.

00:03:36.500 | All of you are rich.

00:03:38.500 | So, we're going to hear from some of these folks in the room, and thank you for sponsoring,

00:03:46.600 | as well.

00:03:47.600 | But, you know, I think the other thing that's also super interesting is that, like, you should--

00:03:51.600 | the consistent lesson that we hear is to not overcomplicate things.

00:03:54.600 | From Anthropic on the Latent Space podcast, we hear from Erik Schluntz about how they beat

00:04:00.600 | SWE-bench with just a very simple scaffold.

00:04:03.200 | Same about deep research from Greg Brockman, who you're going to hear later on in the sort

00:04:07.700 | of closing keynotes.

00:04:08.700 | As well as AMP code.

00:04:09.700 | Where's the AMP folks here?

00:04:10.700 | AMP?

00:04:11.700 | AMP?

00:04:12.700 | AMP?

00:04:13.700 | I think they're probably back in the other room.

00:04:14.700 | But also, you know, there's a sort of emperor has no clothes.

00:04:17.700 | Like, there's-- it's still a very early field, and I think the AI engineers in the room, like,

00:04:21.700 | should be very encouraged by that.

00:04:22.700 | Like, there's still a lot of alpha to mine.

00:04:28.200 | If you watch back all the way to the start of this conference, we actually compared this

00:04:31.500 | moment a lot to the time when sort of physics was in full bloom.

00:04:35.500 | Like, this is the Solvay conference in 1927 when Einstein, Marie Curie, and all the other

00:04:39.600 | household names in physics were all gathered together, and that's what we're trying to do

00:04:43.100 | for this conference.

00:04:44.100 | We've gathered the entire-- the best sort of AI engineers in the world and researchers

00:04:49.400 | and all that to build and push the frontier forward.

00:04:54.700 | The thesis is that there's-- this is the time.

00:04:56.500 | This is the right time to do it.

00:04:58.000 | I said that two and a half years ago, still true today.

00:05:01.600 | But I think, like, there's a very specific time when, like, basically, what people did

00:05:05.800 | in that time of the formation of an industry is that they set out all the basic ideas that

00:05:10.300 | then lasted for the rest of that industry.

00:05:12.700 | So this is the standard model in physics, and there was a very specific period in time

00:05:16.100 | from, like, the '40s to the '70s where they figured it all out, and the next 50 years

00:05:20.400 | we haven't really changed the standard model.

00:05:22.400 | So the question that I want to phrase here is, what is the standard model in AI engineering?

00:05:27.700 | Right?

00:05:28.700 | We have standard models in the rest of engineering, right?

00:05:30.400 | Everyone knows ETL.

00:05:32.200 | Everyone knows MVC.

00:05:33.400 | Everyone knows CRUD.

00:05:34.400 | Everyone knows MapReduce.

00:05:36.400 | And I've used those things in, like, building AI applications.

00:05:39.700 | And, like, it's pretty much like, yes, RAG is there, but I heard RAG is dead.

00:05:43.700 | I don't know.

00:05:44.700 | You guys can tell me.

00:05:46.500 | One day it's, like, long context killed RAG.

00:05:48.500 | The other day, fine-tuning kills RAG.

00:05:50.300 | I don't know.

00:05:51.000 | But I don't think-- I definitely don't think it's the full answer.

00:05:53.500 | So what other standard models might emerge to help us guide our thinking?

00:05:57.700 | And that's really what I want to push you guys to.

00:05:59.800 | So there are a few candidates, standard models in AI engineering.

00:06:02.700 | I'll pick out a few of these.

00:06:03.800 | I don't have time to talk about all of them.

00:06:05.600 | But definitely listen to the DSPy talk from Omar later, tomorrow.

00:06:10.600 | So we're going to cover a few of these.

00:06:13.200 | So first is the LLM OS.

00:06:15.100 | This is one of the earliest standard models, basically,

00:06:18.400 | from Karpathy in 2023.

00:06:20.560 | I have updated it for 2025 for multimodality,

00:06:24.160 | for the standard set of tools that have come out,

00:06:27.340 | as well as MCP, which has become the default protocol

00:06:31.900 | for connecting with the outside world.

00:06:35.100 | Second one would be the LLM SDLC, Software Development Lifecycle.

00:06:39.600 | I have two versions of this, one with the intersecting concerns

00:06:42.800 | of all the tooling that you buy.

00:06:44.000 | By the way, this is all on the Latent Space blog, if you want.

00:06:46.400 | And I'll tweet out the slides, so--

00:06:49.300 | and it's live stream, so whatever.

00:06:52.140 | But I think, for me, the most interesting insight

00:06:54.300 | and the aha moment when I was talking to Ankur of Braintrust,

00:06:57.700 | who is going to be keynoting tomorrow, is that, you know,

00:07:01.100 | the early parts of the SDLC are increasingly commodity, right?

00:07:04.900 | LLMs kind of free, you know, monitoring kind of free,

00:07:09.800 | and RAG kind of free.

00:07:11.300 | Obviously, there's-- it's just free tier for all of them,

00:07:13.300 | and you only get-- start paying.

00:07:14.800 | But like, when you start to make real money from your customers,

00:07:18.200 | it's when you start to do evals, and you start to add in security,

00:07:20.900 | orchestration, and do real work.

00:07:22.300 | That is real hard engineering work.

00:07:24.300 | And I think that's-- those are the tracks that we've added this year.

00:07:27.300 | And I'm very proud to, you know, I guess, push AI engineering along

00:07:30.800 | from demos into production, which is what everyone always wants.

00:07:34.400 | Another form of standard model is building effective agents.

00:07:37.400 | Our last conference, we had Barry, one of the co-authors

00:07:40.500 | of Building Effective Agents from Anthropic,

00:07:42.000 | give an extremely popular talk about this.

00:07:44.600 | I think that this is now at least the received wisdom

00:07:48.100 | for how to build an agent.

00:07:49.700 | And I think, like, that is one definition.

00:07:52.900 | OpenAI has a different definition.

00:07:54.500 | And I think we're just going to continue to iterate.

00:07:56.700 | I think Dominic yesterday released another improvement

00:07:59.500 | on the Agents SDK, which builds upon the Swarm concept

00:08:02.500 | that OpenAI is pushing.

00:08:03.400 | The way that I approach sort of the agent standard model

00:08:09.200 | has been very different.

00:08:10.200 | So you can refer to my talk from the previous conference on that.

00:08:13.100 | I'm basically trying to do a descriptive top-down model

00:08:18.700 | of what people use, the words people use to describe agents,

00:08:22.500 | like intent, you know, control flow, memory, planning, and tool use.

00:08:28.900 | So there's all these, like, really, really interesting things.

00:08:31.600 | But I think that the thing that really got me is, like,

00:08:34.800 | I don't actually use all of that to build AI News.

00:08:36.900 | By the way, who here reads AI News?

00:08:38.200 | I don't know if there's, like-- yeah?

00:08:39.500 | Oh, my god, like, there's half of you.

00:08:41.200 | Thanks.

00:08:42.300 | It's a really good tool I built for myself.

00:08:44.800 | And, you know, hopefully now over 70,000 people

00:08:48.000 | are reading along as well.

00:08:49.500 | And the thing that really got me was Soumith.

00:08:52.700 | At the last conference, you know, he's the lead of PyTorch.

00:08:55.300 | And he says he reads AI News, he loves it, but it is not an agent.

00:08:58.000 | And I was like, what do you mean it's not an agent?

00:08:59.500 | I call it an agent, and you should call it an agent.

00:09:01.900 | But he's right.

00:09:03.000 | It's actually-- it's actually--

00:09:06.100 | I'm going to talk a little bit about that.

00:09:07.400 | But, like, why does it still deliver value,

00:09:10.100 | even though it's, like, a workflow?

00:09:11.400 | And, like, you know, is that still interesting to people?

00:09:13.300 | Right?

00:09:13.800 | Like, why do we not brand every single track here--

00:09:16.800 | voice agents, you know, like, workflow agents,

00:09:21.300 | computer use agents?

00:09:22.200 | Like, why is every single track in this conference not an agent?

00:09:25.100 | Well, I think, basically, we want to deliver value

00:09:28.800 | instead of arguable terminology.

00:09:30.600 | So the assertion that I have is that it's really about human input

00:09:34.400 | versus valuable AI output.

00:09:37.100 | And you can sort of make a mental model of this

00:09:39.400 | and track the ratio of this, and that's more interesting

00:09:41.700 | than arguing about definitions of workflow versus agents.

00:09:44.900 | So, for example, in the Copilot era,

00:09:47.000 | you had sort of, like, a debounced input of, like,

00:09:50.000 | every few characters that you type, maybe you'll do an autocomplete.

00:09:52.800 | In ChatGPT, every few queries that you type,

00:09:55.300 | you'll maybe output a responding query.

00:09:58.100 | It starts to get more interesting with the reasoning models

00:10:00.300 | with, like, a 1 to 10 ratio.

00:10:01.900 | And then, obviously, with, like, the new agents,

00:10:04.100 | now it's, like, more sort of deep research, NotebookLM.

00:10:06.600 | By the way, Raiza Martin, also speaking on the product management

00:10:09.800 | track, she's incredible at talking about the story of NotebookLM.

00:10:15.600 | The other really interesting angle,

00:10:17.400 | if you want to take this mental model to stretch it,

00:10:20.100 | is the 0 to 1, the ambient agents.

00:10:22.000 | With no human input, what kind of interesting AI output can you get?

00:10:26.200 | So, to me, that's a more useful discussion

00:10:29.000 | about input versus output than what is a workflow?

00:10:31.200 | What is an agent?

00:10:31.800 | How agentic is your thing versus not?
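
A minimal sketch of tracking that ratio, assuming you can count human inputs and valuable AI outputs per session. The names `Usage` and `output_per_input` are hypothetical and not from the talk; only the 1-to-10 (reasoning models) and 0-to-1 (ambient agents) figures come from it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Counts for one session (hypothetical structure, for illustration)."""
    human_inputs: int   # prompts, keystroke bursts, clicks, etc.
    ai_outputs: int     # outputs the user actually found valuable


def output_per_input(usage: Usage) -> float:
    """Valuable AI outputs per human input.

    With zero human input (the ambient case) the ratio is unbounded,
    so it is reported as infinity when any output was produced.
    """
    if usage.human_inputs == 0:
        return math.inf if usage.ai_outputs > 0 else 0.0
    return usage.ai_outputs / usage.human_inputs


# The two ratios named in the talk.
reasoning_model = output_per_input(Usage(human_inputs=1, ai_outputs=10))
ambient_agent = output_per_input(Usage(human_inputs=0, ai_outputs=1))
```

Comparing this number across features is one way to have the input-versus-output discussion without first settling what counts as an agent.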

00:10:35.800 | Talking about AI News, so, you know,

00:10:37.800 | it is like a bunch of scripts in a trench coat.

00:10:42.000 | And I realized I've written it three times.

00:10:43.600 | I've written it for the Discord scrape.

00:10:45.600 | I've written it for the Reddit scrape.

00:10:46.700 | I've written it for the Twitter scrape.

00:10:48.100 | And basically, it's just-- it's always the same process.

00:10:50.100 | You scrape it, you plan, you recursively summarize,

00:10:52.700 | you format, and you evaluate.

00:10:54.300 | And yeah, that's the three kids in a trench coat.

00:10:58.300 | And that's really what it is.
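
A minimal sketch of the "recursively summarize" step, assuming the scraped items are plain strings. The function names, the `fan_in` parameter, and the `summarize` callable (which would wrap a model call in practice) are hypothetical; the talk does not describe the actual implementation.

```python
from typing import Callable, List, Sequence


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive groups of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def recursive_summarize(
    texts: Sequence[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 4,
) -> str:
    """Summarize groups of texts, then summarize the summaries,
    repeating until a single summary remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each pass shrinks the list")
    if not texts:
        return ""
    level = list(texts)
    while True:
        # One pass: every group of up to `fan_in` texts becomes one summary.
        level = [summarize(group) for group in chunk(level, fan_in)]
        if len(level) == 1:
            return level[0]
```

Passing `summarize` in as a callable keeps the tree-shaped control flow testable with a stub before any model is wired in.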

00:10:59.800 | I run it every day, and we improve it a little bit,

00:11:01.900 | but then I'm also running this conference.

00:11:04.300 | So if you generalize it, that actually starts to become

00:11:07.300 | an interesting model for building AI-intensive applications,

00:11:11.300 | where you start to make thousands of AI calls

00:11:14.400 | to serve a particular purpose.

00:11:17.100 | So you sync, you plan, and you sort of parallel process.

00:11:20.600 | You analyze and sort of reduce that down to--

00:11:23.400 | from many to one.

00:11:24.900 | And then you deliver the contents to the user,

00:11:29.300 | and then you evaluate.

00:11:30.400 | And to me, like, that conveniently forms an acronym,

00:11:32.900 | S-P-A-D-E, which is really nice.
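
The five stages can be sketched as one function that takes each stage as a callable. This is an illustration under assumptions, not the AI News code: `run_spade`, `SpadeResult`, and the split of "analyze" into a parallel per-batch step plus a `combine` step are hypothetical, and a thread pool stands in for whatever "parallel process" means in the real system.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SpadeResult:
    report: str    # what was delivered to the user
    score: float   # result of the evaluation stage


def run_spade(
    sync: Callable[[], List[str]],
    plan: Callable[[List[str]], List[List[str]]],
    analyze: Callable[[List[str]], str],
    combine: Callable[[List[str]], str],
    deliver: Callable[[str], str],
    evaluate: Callable[[str], float],
    max_workers: int = 8,
) -> SpadeResult:
    # Sync: pull the latest source material (scrapes, feeds, etc.).
    raw = sync()
    # Plan: decide how to split the material into units of work.
    batches = plan(raw)
    # Analyze: process batches in parallel (in practice, many AI calls),
    # then reduce the many partial results down to one.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(analyze, batches))  # map() keeps input order
    merged = combine(partials)
    # Deliver: format the result for the user.
    report = deliver(merged)
    # Evaluate: score what was delivered.
    return SpadeResult(report=report, score=evaluate(report))
```

Each stage can be swapped independently, so the same skeleton could serve a Discord, Reddit, or Twitter scrape by changing only `sync` and `plan`.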

00:11:36.500 | There's also sort of interesting AI engineering elements

00:11:39.100 | that fit in there.

00:11:40.300 | So you can process all these into a knowledge graph.

00:11:42.600 | You can turn these into, like, structured outputs.

00:11:45.800 | And you can generate code as well.

00:11:47.200 | So for example, you know, ChatGPT with Canvas,

00:11:50.800 | or Claude with artifacts, is a way of just delivering the output

00:11:55.300 | as a code artifact instead of just text output.

00:11:57.900 | And I think it's like a really interesting way

00:11:59.500 | to think about this.

00:12:00.100 | So this is my mental model so far.

00:12:02.600 | I wish I had the space to go into it.

00:12:04.200 | But ask me later.

00:12:05.200 | This is what I'm developing right now.

00:12:06.500 | But I think what I would really emphasize is, you know,

00:12:09.900 | I think, like, there's all sorts of interesting ways

00:12:11.800 | to think about what the standard model is,

00:12:14.100 | and whether it's useful for you in taking your application

00:12:17.600 | to the next step of, like, how do I add more intelligence

00:12:20.300 | to this in a way that's useful and not annoying?

00:12:22.800 | And for me, this is it.

00:12:24.500 | OK, so I've thrown a bunch of standard models in here.

00:12:27.700 | But that's just my current hypothesis.

00:12:29.800 | I want you at this conference, in all your conversations

00:12:32.300 | with each other and with the speakers,

00:12:33.800 | to think about what the new standard model for AI

00:12:35.800 | engineering is.

00:12:36.600 | What can everyone use to improve their applications?

00:12:39.100 | And I guess, ultimately, build products

00:12:41.300 | that people want to use, which is what Laurie mentioned

00:12:44.100 | at the start.

00:12:44.600 | So I'm really excited about this conference.

00:12:47.500 | It's such an honor and a joy to get it together for you guys.

00:12:51.200 | And I hope you enjoy the rest of the conference.

00:12:52.900 | Thank you so much.

Designing AI-Intensive Applications - swyx
