
The emerging skillset of wielding coding agents — Beyang Liu, Sourcegraph / Amp


Chapters

0:14 Current Discourse: The talk begins by acknowledging the polarized debate on AI's role in coding. While some elite programmers are skeptical, many developers find significant value in AI tools, suggesting a disconnect between top-tier and mainstream experience. Liu frames the discussion by referencing opinions from figures like Jonathan Blow and Eric S. Raymond, highlighting the varied perspectives in the field.
3:1 Paradigm Shift: The most significant mistake developers make is using new agents with old mental models. Liu emphasizes that we are in a "step function transition" in model capabilities, meaning that strategies from even six months ago are already outdated for leveraging the full power of today's agents.
5:6 GPT-3 Era (2022): This era was defined by text completion models. The primary application was "copilot" or "autocomplete," where the AI would suggest the next few lines of code based on the preceding context.
5:24 ChatGPT Era (2023): The introduction of instruct-tuned models like GPT-3.5 led to the rise of chatbots. In the coding world, this manifested as "ragbots," which combined a chat interface with a retrieval engine to answer questions about a codebase.
6:11 Agent Era (Present): The current era is defined by models capable of tool use and autonomous operation. This requires a new application architecture where the agent can directly edit files, run commands, and interact with external services to accomplish a goal.
7:27 Autonomous Edits
9:55 Unix Philosophy
10:24 New Applications
13:15 The Task
14:30 Tool Use
15:53 Sub-Agents
17:56 Planning & Execution
19:46 Nuanced Problem Solving
23:21 Detailed Prompts
24:21 Feedback Loops
28:3 Code Understanding
28:36 Code Reviews
30:35 Micromanagement
30:46 Under-prompting
31:52 Parallel Agents
33:18 High-Ceiling Skill

Whisper Transcript | Transcript Only Page

00:00:00.000 | My name is Beyang. I'm the CTO and co-founder of a company called Sourcegraph. We build developer
00:00:19.500 | tools and today I want to share with you some of the observations and insights that we've had on
00:00:24.900 | the sort of like emerging skill set of how to wield coding agents. That sound good to everyone?
00:00:30.900 | All right, cool. Okay, so let's check in on the agent discourse. I don't know if you all saw this,
00:00:39.900 | but a couple days ago there were some spicy tweets about the efficacy of AI coding agents or inefficacy,
00:00:47.900 | depending on your perspective. So Jonathan Blow, who is a really talented developer, he basically
00:00:54.900 | single-handedly coded up the indie game Braid, if you're familiar with that. So he's kind of like
00:00:59.900 | god tier status in terms of coding ability. Retweeted Alex Albert, who is also someone I
00:01:05.900 | respect and admire a lot, works at Anthropic, basically claiming that all the hype around coding
00:01:10.900 | agents and code AI in general was just hype, right? There's no substance there. And then there were
00:01:17.900 | some responses, and there was kind of a spectrum of responses too. We had some other big names in the
00:01:22.900 | developer world, like Jessie Frazelle. She was one of the early contributors and maintainers of Docker. She's
00:01:27.900 | also really legit. She said basically something to the effect of like, I think you're right, but you're in
00:01:34.900 | like the top 0.01% of programmers, Jonathan, for the rest of us down here in this room that aren't on
00:01:40.900 | Mount Olympus. It actually helps a lot, but not super helpful if you're really, really good.
00:01:45.900 | But then we also had folks like Eric S. Raymond, who is like one of the fathers of open source,
00:01:50.900 | who had a very spicy
00:01:55.900 | reply. He's basically like, look, I consider myself to be pretty decent at programming, and these things
00:02:02.160 | help a lot. And then kind of my favorite one of these was actually the top Hacker News post
00:02:09.540 | that was written by Thomas Ptacek, who is a really legit security engineer. Some of you may have seen
00:02:18.660 | this trending. He was basically taking the opposite view of like, you know, there's some really smart
00:02:22.980 | people there who are very AI skeptical, but they're nuts. Like, these things are really useful.
00:02:26.680 | So I'm guessing if you're at this conference, you probably lean toward coding agents are
00:02:33.220 | substantively useful, and there's something there. I don't know. Just a guess. But I think even within
00:02:40.720 | this room, there's probably a spectrum of best practices and opinions about like where agents
00:02:48.940 | are good. You know, whether they're restricted to like small edits or like front end applications or
00:02:56.020 | weekend vibe coding, whether they actually work on your production code base. And I think this is just
00:03:02.200 | indicative of the dynamic technical landscape that we're in right now. And a couple months back, I read
00:03:07.760 | this blog post from this guy, Geoff Huntley. So Geoff was a senior engineer at Canva at the time. And his role
00:03:15.240 | at Canva is really interesting. He basically went around interviewing all the developers inside of
00:03:21.480 | Canva using AI tools like Cursor and other things and seeing how they're using it. And he basically came
00:03:27.060 | to the conclusion that like most people were holding it wrong, which is really interesting. And he came up
00:03:32.300 | with the blog post about like all the different anti-patterns that he was seeing. But my summation
00:03:37.540 | of that blog post is that the number one mistake that people are making with coding agents right now is they're
00:03:45.080 | trying to use coding agents the same way they were using AI coding tools six months ago. And therefore, they're
00:03:52.480 | probably wrong, which is kind of crazy because normally, if you're, you know, using a tool,
00:03:57.720 | the best practices don't change in six months. Typically, the things that you learn that are good will
00:04:04.420 | still be like present and, you know, topical and relevant six months down the line. But I think we're in a
00:04:12.660 | really interesting moment in time right now. And, you know, why the sudden change? I think it's because
00:04:18.900 | of this step function transition that we've experienced in model capabilities in the past six months. So, you know,
00:04:26.700 | we've, we've all been around since the dawn of generative AI, the ancient year of 2022, right? November 2022 was
00:04:36.700 | when ChatGPT launched, right? And every year now, you know, this is now the year three, you know, three
00:04:43.260 | after ChatGPT, right? We're now living in the AI future. But I think there's already been kind of like three
00:04:48.900 | distinct waves or eras largely driven by the evolution of frontier model capabilities. And the
00:04:56.700 | model capabilities really dictate the ideal architecture that, that becomes dominant at the
00:05:02.680 | application layer. So in the GPT-3 era, all the models were text completion models, which meant all
00:05:09.280 | the applications that people were building were these like copilots or autocomplete tools. So the
00:05:13.540 | dominant UX paradigm was like, you type some stuff, it types some stuff, you type some more, and that's
00:05:19.000 | how you would interact. And then ChatGPT came along with GPT-3.5, which was InstructTuned to interact
00:05:25.980 | like a chat bot. And suddenly people realized like, oh, it's not just completing the next thing
00:05:31.960 | I'm talking about, I can actually ask it questions like I can a human now. And then some other people
00:05:36.980 | came along, we were part of this crowd, we realized like, hey, you know what's even better than just
00:05:42.260 | like asking questions? You can actually copy paste stuff into the chat and say like, here's some code
00:05:47.320 | for my code base, use that as an example, and pattern match against that. And that helps it generate,
00:05:52.960 | you know, a little bit better code or less fake code or less hallucinated code than it did before.
00:05:58.940 | And that basically meant that everyone at the application layer was building a rag bot in 2023.
00:06:04.920 | So like a chat bot plus a rag retrieval engine. But now, I think we've entered a new era.
00:06:12.900 | And I'm not sure if everyone realizes it, or maybe this is, I don't know, like who agrees with this statement?
00:06:17.940 | Like who thinks it's a real paradigm shift? Okay, and then who here is like, ah, that's a bunch of
00:06:22.900 | bullshit. Anyone? Feel free to, I like, okay, okay, so maybe I'm, maybe I could just skip this slide.
00:06:28.880 | So we're now living in the era of agents, and the new model capabilities really dictate a new
00:06:35.720 | application architecture. And so one of the things that we realized at Sourcegraph is, you know,
00:06:40.580 | a lot of the existing tools in the market were designed for the era of GPT-4 and Claude 3.
00:06:46.860 | So a lot of the application stuff, features, and UX and UI was really built around the capabilities,
00:06:53.400 | or in some cases, the limitations of the chat-based LLMs. And so if we were going to design a coding agent
00:07:00.780 | from the ground up to unleash the capabilities of tool using LLMs, agentic LLMs, what would that look like?
00:07:07.580 | Okay, so here are my spicy takes. These are controversial design decisions that I think are better to make
00:07:16.140 | in the age of agents. And many of these go against the best practices that kind of emerged in the chatbot era.
00:07:23.820 | Okay, so number one is the agent should just make edits to your files. It shouldn't ask you at every turn,
00:07:30.620 | like, hey, you know, I want to make this change. Should I apply it? If it's asking you and it's wrong,
00:07:34.620 | it's already done the wrong thing and it's wasted your time. Humans need to get more out of the inner loop
00:07:41.420 | and more kind of like on top of the loop, like still steering it and guiding it, but
00:07:45.420 | less, you know, micromanaging and managing every change. Second thing is, do we still need a thick
00:07:52.300 | client to manipulate the LLMs? Like, do we still need a forked VS code? That's like the salty way of
00:07:58.860 | saying this, right? The VS code fork became the culmination of the AI coding application, I think,
00:08:04.780 | for the chatbot era. But there's this question of like, you know, if the contract of an agent is you
00:08:10.940 | ask it to do stuff and then it does stuff, do you really still need all that UI built around like
00:08:15.340 | context management and applying the proposed change in the code base? Or can you just ask it to do stuff
00:08:21.260 | and expect it to do the right thing? Third, I think we're going to move beyond the choose your own model
00:08:28.140 | phase. So I think in the chatbot era, it was very easy to swap models in and out. And you'd like, oh,
00:08:33.580 | you know, a new model came along, let me swap it out and see how well it attends to the context that
00:08:38.220 | my retrieval engine fetches. In the agentic world, there's a much deeper coupling because the LLM that
00:08:44.780 | you're using essentially becomes the brains of these agentic chains. And so it's much harder to rip
00:08:49.660 | and replace. And I think a lot of people in this room who have tried mixing and matching, you know,
00:08:53.900 | different models in the context of agents have found that, you know, swapping out a different model and
00:08:59.420 | expecting similar results doesn't really work. A lot of the LLMs out there aren't even
00:09:04.620 | good at the basics of tool use yet. So it's very difficult to just replace the brains.
00:09:10.700 | Four is I think we're going to move past the era of fixed pricing. Agents eat up a lot of tokens.
00:09:16.380 | And so they look expensive relative to chatbots. But the comparison that more and more people are
00:09:23.100 | making is how much human time is it saving? So they're still cheap relative to human time saved.
00:09:28.460 | And the fixed pricing model actually introduces a perverse incentive now where it's like selling gym
00:09:33.420 | memberships, right? Like if I sold you a membership to my chatbot and you're now paying me, you know,
00:09:38.140 | 20 bucks a month, my incentive now is to push the inference cost as low as possible. And the easiest
00:09:43.260 | way to do that is to use dumber models. But dumber models just waste more of your time.
00:09:48.140 | Sorry, this is a long list. Hopefully it's not too tedious. But I think these are important points.
00:09:56.540 | The second-to-last point I'll make is I think the Unix philosophy is going to be more powerful here
00:10:00.940 | than vertical integration. So in developer tools, the ability to use simple tools in ways that compose
00:10:07.500 | well with other interesting tools is really powerful. And so I think, especially with agents where there's
00:10:13.420 | less of a need to create like a lot of UI around it, you're going to start to see more command-driven
00:10:19.020 | tools, command line tools, and things like that. And then last but not least is, you know, we had an
00:10:25.260 | existing RAG chat coding assistant. Maybe some of you have used it. It was called Cody. It still exists.
00:10:31.180 | We're still supporting it. It's still in heavy use across, you know, many Fortune 500 companies. But we
00:10:36.380 | decided to build a new application from the ground up for the agentic world because we didn't want to be
00:10:43.900 | constrained by all the assumptions and constraints that we built into the application layer for the previous
00:10:50.700 | generation of LLMs. And one analogy I like to draw here is with the early days of the internet, right?
00:11:00.460 | Like in the early days of the internet, the way people, you know, jumped into the web was
00:11:05.820 | using an interface on the left. This was before like most people knew what the internet was about, what it was capable of,
00:11:11.340 | and that was the right interface for the first generation of the internet because like what can you do with the internet?
00:11:15.580 | Well, like there's a bunch of different things. You can look at like trending celebrities. You can, you know, buy
00:11:20.620 | automobiles. You can look at movie reviews, all these things you might not have thought of. And so it's
00:11:24.940 | useful to have in front of you. But at some point it gets a little tedious like clicking through all the
00:11:29.420 | different hyperlinks and navigating your way through. And then the real power of the web was sort of
00:11:34.700 | unleashed by just like the one simple text box where you just like type what you're looking for and you get
00:11:40.540 | to it. And I think, you know, with agentic UIs, that's what we should be striving for both in developer
00:11:47.500 | tools and in a lot of different application paradigms. Okay, so what does that look like in practice?
00:11:52.940 | So when we went to design this thing, our coding agent is called AMP. And AMP has two clients and this
00:12:00.060 | is what they look like. So both are like very, very bare bones. A lot of people, you know, look at this
00:12:04.860 | and like, what is this? It's just a text box. What can I do with it? And that was by design, you know,
00:12:10.780 | that for all the reasons I just mentioned. One client is just a simple VS code extension
00:12:16.380 | that allows us to take advantage of some nice things that you get in VS code, like being able to view
00:12:21.980 | diffs. That's really important in the agentic coding world. I often joke that I now use that
00:12:27.500 | view more than the editor view. And the second client was a CLI. So just stripping things down to bare bones. It
00:12:34.700 | has access to all the same tools as the VS code extension does, but it's just something that you
00:12:39.500 | can invoke in your command line. You can also script it, compose it with other tools.
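To make the composability point concrete, here is a hypothetical sketch of what scripting such a CLI could look like. The `amp` command name and its accepting a prompt as an argument are illustrative assumptions, not documented behavior:

```
# Hypothetical: feed a failing test log to the agent as part of the prompt.
npm test 2>&1 | tail -n 50 | amp "fix the failing test shown in this log"

# Hypothetical: compose with ordinary Unix tools -- one agent run per changed file.
git diff --name-only main | xargs -I{} amp "add missing doc comments to {}"
```

The point is that a bare command-line tool slots into pipelines and scripts in a way a GUI never can.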
00:12:45.180 | Okay. So what does this actually look like in practice? I want to do something a little bit
00:12:51.980 | risky here, which is in the past I've done a lot of like, you know, hey, here's me building a simple
00:12:58.140 | app, like those sorts of demos. But I actually want to show off like where we think this is most useful,
00:13:02.700 | which is like, hey, I'm working on an application that has real users. Let me actually make a contribution
00:13:08.620 | to that code base, given all, with all the like existing constraints. And so I actually want to,
00:13:14.220 | I'm just going to code a little bit. Well, I don't even know how far we're going to get.
00:13:16.780 | But this is, this is AMP. This is VS code running AMP in the sidebar, and it's open to the AMP code base.
00:13:24.780 | And what I want to do is implement like a simple change to this application. So the change that I'm going
00:13:33.100 | to make is AMP has a server component. And the server exists as a way to provide the LLM inference
00:13:41.100 | endpoint. It also provides like team functionality. We have a way to share like what different teams
00:13:46.780 | are doing, what different users are doing with AI. So you can kind of learn from other users. There's
00:13:50.460 | leaderboard. It's fun. But there's also these things called connectors, which allow AMP to talk to
00:13:55.340 | external services. So our issue tracker is linear. And so I've integrated linear into AMP here, but I'm
00:14:01.180 | kind of annoyed because it's using this generic like network icon. And I would really like to customize
00:14:05.660 | this icon such that when you plug in the linear MCP endpoint, it uses a more appropriate icon like a
00:14:11.180 | checkbox or something issue-y. So I've already filed this as a linear issue. And I'm just going to ask,
00:14:18.060 | can you find the linear issue about customizing the linear connector icon? Then implement it.
00:14:30.380 | So what this will do is it has access to a set of tools. I can go over here to the tool panel
00:14:37.340 | and see what tools it has access to. Some are local, some are built in. It's got the standard tools
00:14:42.860 | like read and edit file or run bash command. You can also plug in things like Playwright and Postgres
00:14:48.220 | via MCP. And then linear is also plugged in through this. So we're basically talking to the linear API
00:14:55.020 | through the MCP server. And what this will do is it will use the linear issues API. And it will search
00:15:05.340 | the issues. It found 50 issues. And the one that I was referring to is at the top here. So add a special
00:15:10.380 | icon for the linear connector. And now it's going to go and implement the thing for me.
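For reference, MCP servers like the Playwright, Postgres, and Linear ones mentioned here are commonly declared in a small JSON config. The exact file location and key names vary by client, so treat this shape as an illustrative assumption rather than AMP's actual format:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "linear": {
      "command": "npx",
      "args": ["mcp-remote", "https://mcp.linear.app/sse"]
    }
  }
}
```

Once declared, each server's tools show up in the agent's tool panel alongside the built-in ones, and the agent decides on its own when to call them.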
00:15:18.460 | And one thing to note here is it's just making these tool calls on its own. I'm not asking it
00:15:24.140 | to use specific tools. We've also tried to make the information that you see minimal. So you don't
00:15:31.980 | need to see all the API tool calls that it's making underneath the hood or crowd out the transcript with
00:15:37.260 | a bunch of things. Most of the time, we just want to keep it simple. Because the contract we want to
00:15:42.380 | provide to users is like, the feedback loops here are more robust. And you don't have to micromanage
00:15:48.460 | this as much. Another thing I want to point out here is the search tool that this is using is actually
00:15:55.020 | a sub-agent. So it's actually spinning off a sub-agent loop that uses a form of agentic search. It has
00:16:00.780 | access to a bunch of different search tools. Keyword search, regex search, looking at file names.
00:16:09.340 | If you want to inspect what it's doing, you can click the expand thing and see what different path
00:16:14.140 | it's taking, what files it's reading, what things it uncovered. But again, by default, we think this is
00:16:19.020 | an implementation detail. And hopefully, it should just surface the right thing. So it's working. It's
00:16:24.940 | gathering context. Another thing I want to call out in this interface is, as we've gotten more feedback,
00:16:31.100 | we've kind of designed this thing to be more multi-threaded. So there's a quick keyboard shortcut that
00:16:36.060 | allows you to quickly tab through the different threads that you're running. And it's a common
00:16:39.340 | paradigm in our user community to be running more than one of these things at a time.
00:16:43.660 | And it takes a little bit to get used to the context switching. Developers hate context switching.
00:16:49.740 | We like to be in flow, in focus. Typically, what we see here is the secondary thread will either be
00:16:59.660 | something that's a lot shallower, so that you can quickly page back to the main thread. Or what I like
00:17:03.740 | to do is, while the agent is working, I actually like to understand the code at a deeper level myself,
00:17:09.420 | so I can better understand what it's going to do. So I could ask something like, "Can you show me how
00:17:15.900 | connectors and connections work in AMP?" I can ask it to draw a picture of that.
00:17:24.860 | So we'll kick that thread off in parallel. We'll check back in on what this guy is doing.
00:17:28.620 | So it's found -- it's read a bunch of files. It's read some front-end files. Our front-end is written
00:17:34.940 | in Svelte. And as you can see, it's being fairly thoughtful about reading the appropriate files
00:17:40.620 | before it actually goes and does the work. And we find that this is really important to make the feedback
00:17:46.700 | cycles more robust. Otherwise, the anti-pattern is you just get into the weeds of steering it manually.
00:17:53.180 | It's also got this to-do list thing at the bottom that helps it structure and plan out the longer-term
00:18:00.860 | tasks, so that it doesn't go immediately dive into the code. There's a classic mistake that human
00:18:05.340 | developers make too, where you dive into the code too early, and then you get lost in the weeds, and then
00:18:09.100 | it takes a while to dig yourself out. Okay, so it's making some changes. One other thing that I like
00:18:17.340 | to point out here is -- you know, I mentioned that I use the diff view in VS Code now probably more than
00:18:22.620 | the editor view. VS Code actually has a really nice diff view. I have a hot keyed, so I can open it up
00:18:28.380 | quickly. And most of my time in VS Code now is spent just reviewing the changes it makes. And I actually
00:18:35.500 | like this a lot better than GitHub PRs or Git diff on the command line. Just because it's in the editor,
00:18:41.340 | you can see the whole file, and jump to definition even works. So, yeah. We'll just wait a little bit
00:18:49.180 | for it to do its thing. I actually think it's probably made -- looks like it's getting there. It's probably
00:18:59.900 | just running, like, tests. Let's see if we go back here, if it's updated the icon at all.
00:19:05.740 | Okay, so it hasn't gotten there yet, but I think it's on the right track.
00:19:13.420 | The question was, does it write its own tests? Yes, it typically writes its own tests. And if it doesn't,
00:19:23.420 | you can prompt it to do so. So, it's doing a lot of things. It's reading a lot of files. It's making
00:19:29.660 | these edits incrementally and then checking the diagnostics. And then now let's see if it works.
00:19:35.100 | Okay, cool. So, you see here the icon has been updated. And this is without me really steering it
00:19:40.220 | in any fashion. Notice here on this page that this icon didn't update, though. And so this is actually
00:19:48.140 | not surprising to me because this change -- as many changes in production code bases are -- often more
00:19:53.740 | nuanced than it seems at the surface. So, in this case, the reason it's not getting it here is because
00:19:58.620 | this is the admin page. And the piece of data we need to know -- we need to read in order to tell
00:20:07.820 | that this is a linear MCP rather than a generic MCP is actually part of the config. We have to look at
00:20:13.900 | the endpoint of the MCP URL. In order to do that, you have to read the config. But the config might
00:20:18.860 | also contain secrets. It doesn't contain secrets in this case, but might contain secrets in other cases.
00:20:22.620 | So, we actually prohibit those secrets from being sent to non-admin pages. So, it's not surprising to me
00:20:28.620 | that the first pass, it didn't get that right. But let's see if it can get -- I'll just nudge it a little
00:20:32.940 | a bit. So, like, I noticed that the icon changed on admin connections, but not on settings.
00:20:41.820 | Can you investigate why?
00:20:49.980 | And in the interest of time, we'll check back on this later. How about that?
00:20:54.060 | We'll let it run and we'll see if it can find its way to the right solution there.
00:20:59.260 | Is it okay if I go a little bit over since we started a little bit? Okay, cool.
00:21:05.020 | Is it okay with you all if I go a little bit over? Okay. Are you still having fun?
00:21:08.620 | Okay, cool. So, that was like a brief demo of just like the interaction patterns and the UI.
00:21:14.700 | We try to keep it really minimal. We've released this to like a small group so far. The sign-up is
00:21:21.340 | now publicly open. It's been open for about two weeks, but we haven't done a lot of like marketing
00:21:25.580 | around it. And that's kind of been intentional because we're really trying to design this for
00:21:30.540 | where we think the puck is going. And so, we've done a lot to curate this community of people who are
00:21:35.900 | trying to experiment with LLMs and figure out like how the interaction paradigms are going to change
00:21:41.980 | over the next six to 12 months. And so, our user community is really people who are like spending
00:21:46.620 | nights and weekends a lot of time with this thing to see what they can get it to do. And so, actually,
00:21:53.340 | one of the most insightful things and actually the main topic of this talk is lessons that we've learned
00:21:58.700 | from just like looking at what our power users are doing and seeing what interesting behavior patterns
00:22:05.660 | they're kind of like implementing. And so, like the average spend for agents is growing. It's a lot
00:22:13.100 | more than the average spend was for chatbots or autocomplete. But one other interesting thing that
00:22:17.340 | we've noticed among the user base is that there's a huge variance in terms of how much people use this
00:22:24.140 | thing. To the point where there's like an upper echelon of users that are spending like thousands of
00:22:31.020 | dollars per month just in inference costs. And at first, we're like, this has got to be abuse, right?
00:22:37.660 | Like someone out there has poked around, found some way to exploit the inference endpoint, and is using it to power
00:22:44.220 | some Chinese AI girlfriend or whatever. But actually, no, when we spoke to the people using it,
00:22:51.900 | we actually found that they were doing real things. And we're like, that's interesting. What the hell are
00:22:55.580 | you doing? And from those insights and the conversations, we basically have encapsulated a series of best
00:23:01.580 | practices or emergent power user patterns for how the very dominant users, the most active users are using
00:23:13.420 | this thing. And this has informed our product design process as well. So one of the the first changes
00:23:17.980 | that we made was we noticed that a lot of the power users were writing very long prompts. It was
00:23:25.260 | not like the simple kind of like Google style, like three keywords and just like read my mind and expect
00:23:30.300 | something good to happen. They actually wanted to write a lot of detail because they realized that LLMs are
00:23:36.060 | actually quite programmable. If you give them a lot of context, they will follow those instructions and get
00:23:40.940 | a little bit further than if you just give them like a one line sentence. And so we made the default
00:23:45.180 | behavior of the enter key in the AMP input just insert a newline. So you have to hit command-enter to submit.
00:23:50.940 | And this throws a lot of the new users off because they're like, wait a minute, why isn't it just enter?
00:23:54.460 | Like, you know, if I'm in like cursor or whatever, it's just enter. That's easy. That's intuitive.
00:23:58.300 | But actually, what we want to push users to do is to write those longer prompts because that actually yields
00:24:03.500 | better results. And I think that's one of the things that prevents people who are still in the kind of like chat LLM
00:24:09.340 | mode from unlocking some of the, you know, cool stuff that agents can do.
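As an illustration of the contrast being described (the details and file references here are made up for the example, not taken from the demo):

```
Under-prompted:
  fix the connector icon

Detailed:
  On the settings page, MCP connectors all show a generic network icon.
  When the connector's endpoint URL points at Linear, show a Linear-style
  icon instead. Follow the existing pattern for plumbing non-secret config
  fields to the client -- never send the whole config, since it can contain
  secrets. Run the front-end tests before you finish, and show me the diff
  for the config-plumbing change.
```

The detailed version names the constraint (secrets), the pattern to follow, and the feedback loop (tests), which is exactly the context the agent cannot reliably guess on its own.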
00:24:15.420 | Another thing that people do very intentionally is direct the agent to look at relevant context and
00:24:24.700 | feedback mechanisms. So, you know, context was very important in the chat bot era. It's still important
00:24:30.140 | in the agentic era. Now, agents do have a good amount of like built-in knowledge for how to use tools to
00:24:35.820 | acquire context. Like you saw that before when it was using the search tool to find different things.
00:24:39.900 | And it was executing the test and linters to see if the code was valid.
00:24:46.460 | But there's still some cases, especially in production code bases, where it's like, oh,
00:24:51.340 | we do things in a very specific way that are kind of like out of distribution. And so like some like less
00:25:00.140 | agentically inclined users at that point will just give up. They're like, ah, you know, agents aren't
00:25:03.340 | capable of working with like backend code yet. But what we've noticed is the power user like, actually,
00:25:07.820 | let me try to just tell it how to run, you know, the build in this particular sub directory, run the
00:25:12.620 | tests. And that helps it complete the feedback loop so that it can get the validation to get further.
00:25:20.220 | Feedback loops are going to be a big theme of this talk. So another like dominant paradigm here is
00:25:26.700 | constructing these like frontend feedback loops. So like a really common formula is you have the playwright
00:25:31.580 | mcp server and then there's a thing called storybook, which is basically a way to encapsulate or
00:25:36.780 | componentize a lot of your front-end components. It makes it very easy to test individual components without
00:25:42.220 | loading your entire app. And you know, you probably should have been doing this anyways as a human
00:25:46.940 | developer because you get a fast feedback loop. You make a change, see it reflected instantly,
00:25:50.540 | you get the auto reload and then go back to your editor. But with agents, you kind of notice it more
00:25:55.180 | because you're no longer like in the weeds doing the thing. You're like, oh, you're almost like the
00:25:59.260 | developer experience engineer for your agent. It's like, how can I make it loop back faster? And so what
00:26:03.740 | the agent will do is like, you know, make the code change, use Playwright to open up the page in the
00:26:08.380 | browser, snapshot it, and then loop back on itself. And it does that via Storybook because it's much faster
00:26:14.140 | than reloading the entire app.
00:26:17.740 | You put Playwright as a tool for you?
00:26:19.740 | Yes. So, it's one of the default recommended tools. So, it's right here.
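The change-render-inspect loop described above can be sketched generically. Everything below is a stand-in: in the real setup the agent drives the Playwright MCP server against a Storybook story, not plain Python callables.

```python
def feedback_loop(apply_edit, render_snapshot, looks_correct, max_iters=5):
    """Generic agent feedback loop: apply an edit, snapshot the
    isolated component, and repeat until the snapshot passes
    inspection or we give up and hand back to the human."""
    for attempt in range(max_iters):
        apply_edit(attempt)             # e.g. modify the component source
        snapshot = render_snapshot()    # e.g. Playwright screenshot of the story
        if looks_correct(snapshot):     # e.g. the model inspects the image
            return attempt + 1          # iterations it took to converge
    return None                         # did not converge within budget
```

The tighter `render_snapshot` is (one Storybook story versus a full app reload), the cheaper every pass through this loop becomes, which is exactly why componentizing the frontend pays off for agents.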
00:26:24.780 | And actually, it looks like that run completed. I wonder if -- looks like it did approximately the
00:26:35.020 | right thing. Sorry, just to jump out of the slides here a little bit. So, now you can see like the icon
00:26:40.060 | is customized on the settings page, not just the admin page. And if you look at how it did that,
00:26:45.660 | I think it did the right thing. So, if you look at the diff, it actually looked at the surrounding code
00:26:51.020 | and it was like, oh, there is an existing mechanism for plumbing non-secret parts of the config through
00:26:56.620 | to the UI. Let me use that as a reference point. And it actually plumbed exactly that field through
00:27:05.020 | to the front end. So, now if I add additional fields to the MCP config that do contain secrets,
00:27:09.980 | only the whitelisted fields go through. So, it will still only send the endpoint URL over to the client,
00:27:15.100 | basically what it needs to make that icon customization. So, yeah. I know it's not a super
00:27:22.540 | visually impressive change, but a lot of such changes in messy production code bases are like that,
00:27:27.100 | and it's cool to see the agent be able to tease out that nuance.
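The pattern the agent reused can be sketched like this (the field names are hypothetical; the idea is that only explicitly whitelisted, non-secret config fields ever reach the client):

```python
# Hypothetical whitelist of config fields that are safe to expose to the UI.
CLIENT_SAFE_FIELDS = {"endpoint_url", "icon_url"}

def client_config(server_config):
    """Strip the server-side MCP config down to whitelisted fields,
    so secrets (API keys, tokens) never reach the front end."""
    return {k: v for k, v in server_config.items() if k in CLIENT_SAFE_FIELDS}
```

Adding a new field to the server config changes nothing client-side until someone deliberately adds it to the whitelist, which is the nuance the agent picked up from the surrounding code.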
00:27:32.460 | Okay. I know we're a little bit over time. Can people mind if I keep going? Okay, cool.
00:27:41.180 | There's some additional tips and tricks. Most of this talk is just sharing what we've learned from
00:27:49.740 | our power users. So, another thing that we've noticed is this prevailing narrative that agents
00:27:55.180 | are going to make programmers lazy. It's going to make it so we don't really understand what's going
00:27:59.580 | on in the code, so we're going to ship more slop. But we've actually found the inverse happen with the
00:28:04.860 | power users. They're actually using agents to better understand the code. And so, this is a really good
00:28:10.140 | onboarding tool. We just hired this guy, Tyler Bruno. He's a very precocious young developer.
00:28:15.420 | He's actually still in college, but he's working full-time in addition to taking classes. So, really
00:28:20.860 | bright, but also a bit green. He's been using AMP to just quickly ramp up on how the different pieces
00:28:26.860 | connect together. He can draw diagrams and point you to specific pieces of the code. It's really good at
00:28:32.540 | accelerating that. And then a corollary to this is we all do a form of onboarding to new code whenever we do a
00:28:38.940 | code review. By definition, code review is new code. And oftentimes, it's new code that contains
00:28:45.020 | bugs or is hard to understand or is a bit of a slog. And so, rather than just ignore the code that the
00:28:52.460 | AI generates and just commit it blindly, we find that our user base is actually using this tool to do more
00:28:58.300 | thorough code reviews. So, I've adopted this practice myself where if I have to review a very large diff,
00:29:04.140 | the first thing I do is ask the agent to consume the diff and generate a high-level summary so I can
00:29:08.940 | have a high-level awareness. And then I ask it, "Hey, if you were a smart senior dev, what's the entry
00:29:14.860 | point into this PR?" Because often half the battle is just finding the right entry point. And psychologically,
00:29:21.660 | I often put off code reviews because I'm like, "Oh, it's going to be a pain, and it's going to take
00:29:26.220 | forever just to figure out where I should start reviewing it. So, I'll just do it tomorrow." But this thing,
00:29:30.540 | just like it helps lower that activation energy and make code reviews more thorough and actually,
00:29:35.500 | dare I say, like a little bit fun and enjoyable now.
00:29:38.620 | Sub-agents are also a thing. So, we implemented the search tool as a sub-agent in the very beginning,
00:29:47.180 | but we're seeing more and more use cases emerge for sub-agents. And the general best practice with
00:29:51.500 | sub-agents is that they often are useful for longer, more complex tasks because the sub-agent allows you
00:29:58.300 | to essentially preserve the context window. So, like, the quality of the LLM's output will degrade as the context fills up.
00:30:06.140 | You know, Sonnet 4 has a context window of 200K, but we see degradation typically around like 120 or 130K.
00:30:12.700 | And by the time you hit 170K tokens, you start to see more kind of like off-the-rails and crazy behavior.
00:30:19.260 | But sub-agents allow you to encapsulate the context used up by a specific sub-task, like implementing a small feature,
00:30:25.980 | such that it doesn't pollute the main agent.
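A sketch of that delegation pattern (the API here is hypothetical; the point is that the sub-agent's working context is thrown away and only a short summary returns to the parent):

```python
def run_subagent(task, run_model):
    """Run one sub-task in a fresh context window; the parent
    never sees the tokens the sub-agent burned along the way."""
    sub_context = [task]              # fresh window, not the parent's
    result = run_model(sub_context)   # may consume 100K+ tokens in here
    return result["summary"]         # parent receives a short digest only

def main_agent(tasks, run_model):
    """The parent's context grows by just one summary per sub-task,
    keeping it well below the degradation zone."""
    context = []
    for task in tasks:
        context.append(run_subagent(task, run_model))
    return context
```

This is the same trick as the search sub-agent: the expensive exploration happens in a disposable window, and only the conclusion is worth the parent's tokens.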
00:30:28.780 | Okay. So, that was a quick tour of a lot of best practices. Just to recap, like the anti-practices,
00:30:35.980 | the common anti-patterns are just like micromanaging the agent, like using it like you would a chatbot,
00:30:41.180 | where you have to kind of like steer it at every interaction or review every edit it's making.
00:30:45.660 | Another common anti-pattern is just like under-prompting, so not giving it enough detail.
00:30:51.020 | Like LLMs, their knowledge comes from two places. It either comes from their training data or from the
00:30:57.180 | context that you give it. And so, you know, it's fine if you do a five-word prompt if you're coding up
00:31:03.980 | like a 3D Flappy Bird game from scratch, because that's well represented in the training set. They're
00:31:08.860 | really good at that. They're trained to do that. But if you're trying to make a subtle, nuanced change to
00:31:14.460 | your large existing code base, you should be giving it all the details that you would give a colleague
00:31:19.020 | on the team to point them in the right direction. And then last but not least, like agents are not a
00:31:25.500 | vehicle to like TLDR the code. If anything, they're the opposite. You should be using them to do much more
00:31:30.700 | thorough code reviews more quickly. As the human, you're still ultimately responsible for the code
00:31:36.540 | that you ship, and you shouldn't view this as a human replacement. It's really a tool that you can
00:31:40.620 | wield to make yourself 10 or 100x more effective. Last tidbit. So, one of the things that we've noticed
00:31:48.140 | among the very, very, very top 1% of the 1% is this inclination to run multiple of these things
00:31:57.740 | in parallel. Geoff Huntley, who wrote that blog post that I showed earlier, he started putting out these
00:32:05.580 | Twitter streams. They're about four hours long each. And it's basically just, he's working on a compiler
00:32:13.580 | on the side. And what he does is he constructs prompts for three or four different agents to work on
00:32:21.100 | different parts of the compiler. And he's gotten to the point where he's prompting it such that he feels
00:32:26.380 | confident enough in the feedback loops where he just hits enter, lets it run, and then he goes
00:32:30.220 | to sleep. And then this thing just runs on Twitter for a while. And I think he's doing this to spread
00:32:35.260 | the word. It's like, hey, you can use this for serious engineering. Compilers are not some vibe-coded
00:32:43.180 | weekend project. They're real tech. They're difficult to build. And it is possible to use agents for code like
00:32:51.100 | this. But it has to be a very intentional skill that you practice. And so I think it's cool. I think
00:32:57.740 | there's a lot of people thinking in terms of agent fleets and where the world is going. But I do think
00:33:02.220 | that the way that we'll actually get there is by building these composable building blocks that allow
00:33:07.020 | people like Geoff to go and combine them and come up with interesting UIs. I think this is just running
00:33:13.180 | in like Tmux or some window manager. Okay, so the takeaways I just want to leave you with is, one,
00:33:19.500 | contrary to what some might say, and there's a lot of smart senior developers out there who think AI is
00:33:25.820 | overhyped, and maybe parts of it are. But I think coding agents are very real. And it is, I think,
00:33:32.060 | a high-ceiling skill. It's like, I think we will probably invest in learning how to use these things
00:33:38.380 | in the same way that we invest in learning how best to use our editor or our programming language of choice.
00:33:43.500 | And I think the only way you can learn this stuff is by doing it and then sharing it out with others.
00:33:49.180 | And one of the reasons we built the thread sharing mechanism in AMP is to help encourage knowledge
00:33:54.780 | dissemination so that like if you discover an interesting way of using it, you can share that out with your team.
00:33:59.740 | But yeah, that's it. If you want to kind of like see a recap of the best practices in this talk,
00:34:06.540 | we've actually put out like an AMP owner's manual that guides new users how to best use it.
00:34:11.260 | I'll also be around afterwards. We have a booth in the main expo hall. I'm supposed to say, too,
00:34:17.420 | if you stop by the booth, we'll give you like $10 in free credits. So if anything you saw here was of interest to you
00:34:23.820 | and you want to try this out, stop by and say hi.
00:34:26.540 | I noticed you still type "can you," and then you correct your typos, which I guess you said you shouldn't do.
00:34:38.620 | Yeah, it's part habit and it's part paranoia that in like a live demo setting, there will be some typo
00:34:47.180 | token that will trigger off-the-rails behavior. But I think that was more of a concern that I
00:34:52.140 | learned in like 2023 when it actually mattered. Because like these days, LLMs are more and more
00:34:57.180 | typo-robust, I would say.
00:34:59.260 | Yeah.