The emerging skillset of wielding coding agents — Beyang Liu, Sourcegraph / Amp

Chapters
0:14 Current Discourse: The talk begins by acknowledging the polarized debate on AI's role in coding. While some elite programmers are skeptical, many developers find significant value in AI tools, suggesting a disconnect between top-tier and mainstream experience. Liu frames the discussion by referencing opinions from figures like Jonathan Blow and Eric S. Raymond, highlighting the varied perspectives in the field.
3:01 Paradigm Shift: The most significant mistake developers make is using new agents with old mental models. Liu emphasizes that we are in a "step function transition" in model capabilities, meaning that strategies from even six months ago are already outdated for leveraging the full power of today's agents.
5:06 GPT-3 Era (2022): This era was defined by text completion models. The primary application was "copilot" or "autocomplete," where the AI would suggest the next few lines of code based on the preceding context.
5:24 ChatGPT Era (2023): The introduction of instruct-tuned models like GPT-3.5 led to the rise of chatbots. In the coding world, this manifested as "ragbots," which combined a chat interface with a retrieval engine to answer questions about a codebase.
6:11 Agent Era (Present): The current era is defined by models capable of tool use and autonomous operation. This requires a new application architecture where the agent can directly edit files, run commands, and interact with external services to accomplish a goal.
7:27 Autonomous Edits
9:55 Unix Philosophy
10:24 New Applications
13:15 The Task
14:30 Tool Use
15:53 Sub-Agents
17:56 Planning & Execution
19:46 Nuanced Problem Solving
23:21 Detailed Prompts
24:21 Feedback Loops
28:03 Code Understanding
28:36 Code Reviews
30:35 Micromanagement
30:46 Under-prompting
31:52 Parallel Agents
33:18 High-Ceiling Skill
My name is Beyang. I'm the CTO and co-founder of a company called Sourcegraph. We build developer tools, and today I want to share with you some of the observations and insights that we've had on the emerging skill set of how to wield coding agents. Does that sound good to everyone?
All right, cool. Okay, so let's check in on the agent discourse. I don't know if you all saw this, but a couple days ago there were some spicy tweets about the efficacy of AI coding agents, or inefficacy, depending on your perspective. So Jonathan Blow, who is a really talented developer (he basically single-handedly coded up the indie game Braid, if you're familiar with that, so he's kind of god-tier status in terms of coding ability), retweeted Alex Albert, who is also someone I respect and admire a lot and who works at Anthropic, basically claiming that all the hype around coding agents and code AI in general was just hype, right? There's no substance there.

And then there were some responses, and there was a spectrum of responses too. We had some other big names in the developer world, like Jessie Frazelle. She was one of the early contributors and maintainers of Docker; she's also really legit. She said something to the effect of: I think you're right, but you're in the top 0.01% of programmers, Jonathan. For the rest of us down here who aren't on Mount Olympus, it actually helps a lot. So it helps a lot, just not if you're really, really good. Then we also had folks like Eric S. Raymond, who is one of the fathers of open source, and he had a very spicy reply. He basically said: look, I consider myself to be pretty decent at programming, and these things help a lot. And then kind of my favorite of these was actually the top Hacker News post, written by Thomas Ptacek, who is a really legit security engineer. Some of you may have seen it trending. He was basically taking the opposite view: there are some really smart people out there who are very AI-skeptical, but they're nuts. These things are really useful.
So I'm guessing that if you're at this conference, you probably lean toward coding agents being substantively useful, that there's something there. I don't know, just a guess. But I think even within this room, there's probably a spectrum of best practices and opinions about where agents are good: whether they're restricted to small edits, or front-end applications, or weekend vibe coding, or whether they actually work on your production code base. And I think this is just indicative of the dynamic technical landscape that we're in right now.

A couple months back, I read this blog post from this guy, Geoff Huntley. Geoff was a senior engineer at Canva at the time, and his role at Canva was really interesting: he basically went around interviewing all the developers inside of Canva who were using AI tools like Cursor and other things and seeing how they were using them. And he basically came to the conclusion that most people were holding it wrong, which is really interesting, and he wrote a blog post about all the different anti-patterns he was seeing. My summation of that blog post is that the number one mistake people are making with coding agents right now is that they're trying to use coding agents the same way they were using AI coding tools six months ago, and therefore they're probably doing it wrong. Which is kind of crazy, because normally, if you're using a tool, the best practices don't change in six months. Typically, the things you learn that are good will still be topical and relevant six months down the line. But I think we're in a really interesting moment in time right now. And why the sudden change? I think it's because of this step-function transition that we've experienced in model capabilities in the past six months.
So, you know, we've all been around since the dawn of generative AI, the ancient year of 2022, right? November 2022 was when ChatGPT launched, and we're now in year three after ChatGPT. We're now living in the AI future. But I think there have already been three distinct waves or eras, largely driven by the evolution of frontier model capabilities. And the model capabilities really dictate the ideal architecture that becomes dominant at the application layer.

So in the GPT-3 era, all the models were text completion models, which meant all the applications people were building were these copilots or autocomplete tools. The dominant UX paradigm was: you type some stuff, it types some stuff, you type some more, and that's how you would interact. And then ChatGPT came along with GPT-3.5, which was instruct-tuned to interact like a chatbot, and suddenly people realized: oh, it's not just completing the next thing I'm typing, I can actually ask it questions like I would a human. And then some other people came along (we were part of this crowd) and realized: hey, you know what's even better than just asking questions? You can actually copy-paste stuff into the chat and say, here's some code from my code base, use that as an example and pattern-match against it. And that helps it generate a little bit better code, or less fake code, less hallucinated code, than it did before. And that basically meant that everyone at the application layer was building a RAG bot in 2023, a chatbot plus a retrieval engine. But now, I think we've entered a new era.
And I'm not sure if everyone realizes it. Or maybe... I don't know, who agrees with this statement? Who thinks it's a real paradigm shift? Okay, and who here thinks, ah, that's a bunch of bullshit? Anyone? Feel free to... okay, okay, so maybe I can just skip this slide.

So we're now living in the era of agents, and the new model capabilities really dictate a new application architecture. And one of the things we asked ourselves at Sourcegraph is this: a lot of the existing tools in the market were designed for the era of GPT-4 and Claude 3, so a lot of the application features and UX and UI were really built around the capabilities, or in some cases the limitations, of chat-based LLMs. If we were going to design a coding agent from the ground up to unleash the capabilities of tool-using, agentic LLMs, what would that look like?
Okay, so here are my spicy takes. These are controversial design decisions that I think are better to make in the age of agents, and many of them go against the best practices that emerged in the chatbot era.

Number one: the agent should just make edits to your files. It shouldn't ask you at every turn, hey, I want to make this change, should I apply it? If it's asking you and it's wrong, it's already done the wrong thing and it's wasted your time. Humans need to get out of the inner loop and be more on top of the loop: still steering it and guiding it, but doing less micromanaging of every change.

Second: do we still need a thick client to manipulate the LLMs? Do we still need a forked VS Code? That's the salty way of saying this, right? The VS Code fork became the culmination of the AI coding application for the chatbot era, I think. But there's this question: if the contract of an agent is that you ask it to do stuff and then it does stuff, do you really still need all that UI built around context management and applying the proposed change in the code base? Or can you just ask it to do stuff and expect it to do the right thing?

Third, I think we're going to move beyond the choose-your-own-model phase. In the chatbot era, it was very easy to swap models in and out: oh, a new model came along, let me swap it in and see how well it attends to the context that my retrieval engine fetches. In the agentic world, there's a much deeper coupling, because the LLM you're using essentially becomes the brains of these agentic chains, and so it's much harder to rip and replace. And I think a lot of people in this room who have tried mixing and matching different models in the context of agents have found that you can't just swap in a different model and expect similar results. A lot of the LLMs out there aren't even good at the basics of tool use yet, so it's very difficult to just replace the brains.
Fourth, I think we're going to move past the era of fixed pricing. Agents eat up a lot of tokens, so they look expensive relative to chatbots. But the comparison that more and more people are making is: how much human time is it saving? They're still cheap relative to the human time saved. And the fixed pricing model actually introduces a perverse incentive, where it's like selling gym memberships. If I sold you a membership to my chatbot and you're now paying me 20 bucks a month, my incentive is to push the inference cost as low as possible, and the easiest way to do that is to use dumber models. But dumber models just waste more of your time.
Sorry, this is a long list. Hopefully it's not too tedious, but I think these are important points. The second-to-last point I'll make is that I think the Unix philosophy is going to be more powerful here than vertical integration. In developer tools, the ability to use simple tools in ways that compose well with other interesting tools is really powerful. And so, especially with agents, where there's less of a need to build a lot of UI around them, you're going to start to see more command-driven tools, command-line tools, and things like that.

And then last but not least: we had an existing RAG chat coding assistant. Maybe some of you have used it. It's called Cody. It still exists, we're still supporting it, and it's still in heavy use across many Fortune 500 companies. But we decided to build a new application from the ground up for the agentic world, because we didn't want to be constrained by all the assumptions and constraints that we built into the application layer for the previous generation of LLMs.
One analogy I like to draw here is the early days of the internet. Back then, the way people jumped into the web was using an interface like the one on the left. This was before most people knew what the internet was about or what it was capable of, and that was the right interface for the first generation of the internet, because what can you do with the internet? Well, there's a bunch of different things: you can look at trending celebrities, buy automobiles, read movie reviews, all these things you might not have thought of, and so it's useful to have that laid out in front of you. But at some point it gets a little tedious clicking through all the different hyperlinks and navigating your way through. And then the real power of the web was unleashed by just the one simple text box, where you type what you're looking for and you get to it. And I think, with agentic UIs, that's what we should be striving for, both in developer tools and in a lot of different application paradigms. Okay, so what does that look like in practice?
So when we went to design this thing... our coding agent is called Amp, and Amp has two clients, and this is what they look like. Both are very, very bare-bones. A lot of people look at this and go: what is this? It's just a text box. What can I do with it? And that was by design, for all the reasons I just mentioned. One client is a simple VS Code extension, which lets us take advantage of some nice things you get in VS Code, like being able to view diffs. That's really important in the agentic coding world; I often joke that I now use that view more than the editor view. And the second is a CLI, stripping things down to the bare bones. It has access to all the same tools as the VS Code extension does, but it's something you can invoke from your command line. You can also script it and compose it with other tools.
Okay. So what does this actually look like in practice? I want to do something a little bit risky here. In the past I've done a lot of "here's me building a simple app" sorts of demos, but I actually want to show off where we think this is most useful, which is: I'm working on an application that has real users, let me actually make a contribution to that code base, with all the existing constraints. So I'm just going to code a little bit. Well, I don't even know how far we're going to get.

This is Amp. This is VS Code running Amp in the sidebar, and it's open to the Amp code base. What I want to do is implement a simple change to this application. The change I'm going to make: Amp has a server component, and the server exists to provide the LLM inference endpoint. It also provides team functionality; we have a way to share what different teams and different users are doing with AI, so you can learn from other users. There's a leaderboard, it's fun. But there are also these things called connectors, which allow Amp to talk to external services. Our issue tracker is Linear, and so I've integrated Linear into Amp here, but I'm kind of annoyed because it's using this generic network icon. I would really like to customize this icon, such that when you plug in the Linear MCP endpoint, it uses a more appropriate icon, like a checkbox or something issue-y. So I've already filed this as a Linear issue, and I'm just going to ask: can you find the Linear issue about customizing the Linear connector icon? Then implement it.
So what this will do is... it has access to a set of tools. I can go over here to the tool panel and see what tools it has access to. Some are local, some are built in. It's got the standard tools like read file and edit file, or run a bash command. You can also plug in things like Playwright and Postgres via MCP, and Linear is also plugged in through this, so we're basically talking to the Linear API through the MCP server. And what this will do is use the Linear issues API and search the issues. It found 50 issues, and the one I was referring to is at the top here: add a special icon for the Linear connector. And now it's going to go and implement the thing for me.
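As an aside, here's a rough sketch of what a tool looks like from the agent's point of view. The interface name and fields below are illustrative assumptions, not Amp's actual internals; the general shape (a name, a description, a JSON-Schema parameter spec, and a handler) mirrors how tool-calling LLM APIs and MCP servers expose tools.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical shape of a tool the agent can call; names are illustrative,
// not Amp's real implementation. The model only sees { name, description,
// parameters }; when it emits a tool call, the harness validates the
// arguments against the schema and runs the handler.
interface ToolDefinition {
  name: string;
  description: string;                  // shown to the model so it knows when to call this
  parameters: Record<string, unknown>;  // JSON Schema describing the arguments
  run: (args: Record<string, unknown>) => Promise<string>;
}

const readFileTool: ToolDefinition = {
  name: "read_file",
  description: "Read a file from the workspace and return its contents.",
  parameters: {
    type: "object",
    properties: {
      path: { type: "string", description: "Workspace-relative file path" },
    },
    required: ["path"],
  },
  // The returned string is appended to the conversation as the tool result.
  run: async (args) => readFile(String(args.path), "utf8"),
};
```

An MCP server like the Linear one exposes its tools in essentially the same shape (name, description, input schema), just over a protocol rather than in-process.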
One thing to note here is that it's just making these tool calls on its own; I'm not asking it to use specific tools. We've also tried to make the information that you see minimal, so you don't need to see all the API tool calls it's making underneath the hood, or crowd out the transcript with a bunch of things. Most of the time we just want to keep it simple, because the contract we want to provide to users is: the feedback loops here are more robust, and you don't have to micromanage this as much.

Another thing I want to point out is that the search tool it's using is actually a sub-agent. It's spinning off a sub-agent loop that uses a form of agentic search. It has access to a bunch of different search tools: keyword search, grep, looking at file names. If you want to inspect what it's doing, you can click to expand it and see what path it's taking, what files it's reading, what it uncovered. But again, by default we think this is an implementation detail, and hopefully it should just surface the right thing. So it's working, it's gathering context.
Another thing I want to call out in this interface is, as we've gotten more feedback, we've kind of designed this thing to be more multi-threaded. So there's a quick keyboard shortcut that allows you to quickly tab through the different threads that you're running, and it's a common paradigm in our user community to be running more than one of these things at a time. It takes a little bit to get used to the context switching. Developers hate context switching; we like to be in flow, in focus. Typically, what we see here is the secondary thread will either be something that's a lot shallower, so that you can quickly page back to the main thread. Or, what I like to do is, while the agent is working, I actually like to understand the code at a deeper level myself, so I can better understand what it's going to do. So I could ask something like, "Can you show me how connectors and connections work in Amp?" and ask it to draw a picture of that. So we'll kick that thread off in parallel, and we'll check back in on what this guy is doing.
So it's found... it's read a bunch of files. It's read some front-end files; our front end is written in Svelte. And as you can see, it's being fairly thoughtful about reading the appropriate files before it actually goes and does the work. We find that this is really important for making the feedback cycles more robust; otherwise, the anti-pattern is that you just get into the weeds of steering it manually. It's also got this to-do list at the bottom that helps it structure and plan out the longer-term task, so that it doesn't immediately dive into the code. That's a classic mistake human developers make too: you dive into the code too early, you get lost in the weeds, and then it takes a while to dig yourself out.

Okay, so it's making some changes. One other thing I'd like to point out: I mentioned that I use the diff view in VS Code now probably more than the editor view. VS Code actually has a really nice diff view. I have it hot-keyed, so I can open it up quickly, and most of my time in VS Code now is spent just reviewing the changes it makes. I actually like this a lot better than GitHub PRs or git diff on the command line, just because it's in the editor, you can see the whole file, and jump-to-definition even works. So, yeah, we'll just wait a little bit for it to do its thing. I actually think it's probably... looks like it's getting there. It's probably just running tests. Let's see, if we go back here, whether it's updated the icon at all. Okay, so it hasn't gotten there yet, but I think it's on the right track.
The question was: does it write its own tests? Yes, it typically writes its own tests, and if it doesn't, you can prompt it to do so. So, it's doing a lot of things. It's reading a lot of files, it's making these edits incrementally and then checking the diagnostics. And now let's see if it works. Okay, cool. So you see here the icon has been updated, and this is without me really steering it in any fashion.

Notice on this page, though, that this icon didn't update. And this is actually not surprising to me, because this change, as many changes in production code bases are, is more nuanced than it seems at the surface. In this case, the reason it's not getting it here is that this isn't the admin page. The piece of data we need to read in order to tell that this is a Linear MCP rather than a generic MCP is actually part of the config: we have to look at the endpoint URL of the MCP. To do that, you have to read the config. But the config might also contain secrets. It doesn't contain secrets in this case, but it might in other cases, so we actually prohibit those secrets from being sent to non-admin pages. So it's not surprising to me that on the first pass it didn't get that right. But let's see if it can; I'll just nudge it a little bit: I noticed that the icon changed on admin connections, but not on settings.

And in the interest of time, we'll check back on this later. How about that? We'll let it run and see if it can find its way to the right solution. Is it okay if I go a little bit over, since we started a little bit late? Okay, cool. Is it okay with you all if I go a little bit over? Okay. Are you still having fun?
Okay, cool. So that was a brief demo of the interaction patterns and the UI. We try to keep it really minimal. We've released this to a small group so far. The sign-up is now publicly open (it's been open for about two weeks), but we haven't done a lot of marketing around it, and that's been intentional, because we're really trying to design this for where we think the puck is going. So we've done a lot to curate this community of people who are trying to experiment with LLMs and figure out how the interaction paradigms are going to change over the next six to twelve months. Our user community is really people who are spending nights and weekends, a lot of time, with this thing to see what they can get it to do.

And actually, one of the most insightful things, and the main topic of this talk, is the lessons we've learned from looking at what our power users are doing and seeing what interesting behavior patterns they're implementing. The average spend for agents is growing; it's a lot more than the average spend was for chatbots or autocomplete. But one other interesting thing we've noticed among the user base is that there's a huge variance in how much people use this thing, to the point where there's an upper echelon of users spending thousands of dollars per month just in inference costs. And at first we thought, this has got to be abuse, right? Someone out there has poked around, found some way to exploit the inference endpoint, and is using it to power some Chinese AI girlfriend or whatever. But actually, no. When we spoke to the people using it, we found that they were doing real things. And we were like, that's interesting, what the hell are you doing? And from those insights and conversations, we've basically encapsulated a series of best practices, or emergent power-user patterns, for how the most active users are using this thing. And this has informed our product design process as well.
So one of the first changes we made: we noticed that a lot of the power users were writing very long prompts. It was not the simple Google-style three keywords, just read my mind and expect something good to happen. They actually wanted to write a lot of detail, because they realized that LLMs are quite programmable: if you give them a lot of context, they will follow those instructions and get a bit further than if you just give them a one-line sentence. And so we made the default behavior of the Enter key in the Amp input just a newline; you have to hit Cmd+Enter to submit. This throws a lot of new users off, because they're like, wait a minute, why isn't it just Enter? If I'm in Cursor or whatever, it's just Enter. That's easy, that's intuitive. But what we actually want to push users to do is write those longer prompts, because that yields better results. And I think that's one of the things that prevents people who are still in the chat-LLM mode from unlocking some of the cool stuff that agents can do.
Another thing that people do very intentionally is direct the agent to look at relevant context and feedback mechanisms. Context was very important in the chatbot era, and it's still important in the agentic era. Now, agents do have a good amount of built-in knowledge for how to use tools to acquire context; you saw that before when it was using the search tool to find different things, and it was executing the tests and linters to see if the code was valid. But there are still some cases, especially in production code bases, where it's like: we do things in a very specific way that's kind of out of distribution. Some less agentically inclined users will just give up at that point: ah, agents aren't capable of working with backend code yet. But what we've noticed is that the power users say: actually, let me try just telling it how to run the build in this particular subdirectory and how to run the tests. And that helps it complete the feedback loop, so it can get the validation it needs to get further.
Feedback loops are going to be a big theme of this talk. Another dominant paradigm here is constructing these front-end feedback loops. A really common formula is: you have the Playwright MCP server, and then there's a thing called Storybook, which is basically a way to encapsulate or componentize a lot of your front-end components. It makes it very easy to test individual components without loading your entire app. And you probably should have been doing this anyway as a human developer, because you get a fast feedback loop: you make a change, see it reflected instantly with the auto-reload, and go back to your editor. But with agents you notice it more, because you're no longer in the weeds doing the thing yourself; you're almost like the developer experience engineer for your agent. How can I make it loop back faster? So what the agent will do is make the code change, use Playwright to open up the page in the browser, snapshot it, and then loop back on itself. And it does that via Storybook, because it's much faster.
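To make that loop concrete, here's a hypothetical Storybook story for the connector icon from the demo. The component name, props, and URLs are made up for illustration; the point is just that an isolated story gives the agent (driving a browser through the Playwright MCP server) a small, fast target to render and screenshot instead of booting the whole app.

```typescript
// ConnectorIcon.stories.ts: hypothetical component and props, standard Storybook CSF format.
// An isolated story lets the agent load one component in the browser, take a screenshot
// via Playwright, and verify its change without starting the full application.
import ConnectorIcon from "./ConnectorIcon.svelte";

export default {
  title: "Connectors/ConnectorIcon",
  component: ConnectorIcon,
};

// One story per case the icon logic should distinguish; URLs are placeholders.
export const GenericConnector = {
  args: { endpointUrl: "https://example.com/mcp" },
};

export const LinearConnector = {
  args: { endpointUrl: "https://linear.example/mcp" },
};
```

With stories like these, the agent's loop becomes: edit the component, reload just the story, screenshot it, and compare against what the task asked for.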
[In response to an audience question:] Yes, so it's one of the default recommended tools; it's right here. And actually, it looks like that run completed. I wonder if... looks like it did approximately the right thing. Sorry, just to jump out of the slides here a little bit. So now you can see the icon is customized on the settings page, not just the admin page. And if you look at how it did that, I think it did the right thing. If you look at the diff, it actually looked at the surrounding code and went: oh, there is an existing mechanism for plumbing non-secret parts of the config through to the UI, let me use that as a reference point. And it plumbed exactly that field through to the front end. So now, if I add additional fields to the MCP config that do contain secrets, only this field is whitelisted, so it will still only send the endpoint URL over to the client, which is basically what it needs to make that icon customization. So, yeah. I know it's not a super impressive visual change, but a lot of changes in messy production code bases are like that, and it's cool to see the agent be able to tease out that nuance.
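The whitelisting idea the agent picked up on can be sketched roughly like this. The field names, types, and function names here are invented for illustration and certainly differ from Amp's actual code; the property that matters is that the client only ever receives explicitly allowed, non-secret config fields, and the icon is derived from the endpoint URL alone.

```typescript
// Hypothetical sketch of "plumb only non-secret config fields to the UI".
// All names are illustrative, not Amp's actual implementation.
interface McpConnectorConfig {
  url: string;                       // endpoint URL: safe to expose, used to pick the icon
  headers?: Record<string, string>;  // may contain auth tokens: never sent to the client
}

// Only the fields listed here are ever forwarded to non-admin pages.
interface ClientConnectorConfig {
  url: string;
}

function toClientConfig(config: McpConnectorConfig): ClientConnectorConfig {
  // Copy whitelisted fields explicitly; any new (possibly secret) field is dropped by default.
  return { url: config.url };
}

// The client can then choose an icon from the endpoint alone.
function iconForEndpoint(url: string): string {
  return new URL(url).hostname.includes("linear") ? "linear" : "generic-network";
}
```

The nice part of a whitelist like this is that the default is safe: a field has to be copied explicitly before it ever reaches the client.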
Okay. I know we're a little bit over time. Do people mind if I keep going? Okay, cool. There are some additional tips and tricks; most of this talk is just sharing what we've learned from our power users.

Another thing we've noticed: there's this prevailing narrative that agents are going to make programmers lazy, that we won't really understand what's going on in the code, so we're going to ship more slop. But we've actually found the inverse happening with the power users. They're using agents to better understand the code. It's a really good onboarding tool. We just hired this guy, Tyler Bruno. He's a very precocious young developer; he's actually still in college, but he's working full-time in addition to taking classes. So, really bright, but also a bit green. He's been using Amp to quickly ramp up on how the different pieces connect together. It can draw diagrams and point you to specific pieces of the code; it's really good at accelerating that.

And a corollary to this: we all do a form of onboarding to new code whenever we do a code review. By definition, code review is new code, and oftentimes it's new code that contains bugs, or is hard to understand, or is a bit of a slog. And so, rather than just ignoring the code the AI generates and committing it blindly, we find that our user base is actually using this tool to do more thorough code reviews. I've adopted this practice myself: if I have to review a very large diff, the first thing I do is ask the agent to consume the diff and generate a high-level summary, so I have a high-level awareness. And then I ask it: hey, if you were a smart senior dev, what's the entry point into this PR? Because often half the battle is just finding the right entry point. And psychologically, I often put off code reviews because I figure it's going to be a pain and it's going to take forever just to figure out where I should start, so I'll just do it tomorrow. But this thing helps lower that activation energy and makes code reviews more thorough and, dare I say, a little bit fun and enjoyable now.
Sub-agents are also a thing. We implemented the search tool as a sub-agent from the very beginning, but we're seeing more and more use cases emerge for sub-agents. The general best practice is that they're often useful for longer, more complex tasks, because the sub-agent allows you to essentially preserve the context window. The quality of the LLM's output degrades as the context fills up: Sonnet 4 has a context window of 200K tokens, but we see degradation typically starting around 120 or 130K, and by the time you hit 170K tokens you start to see more off-the-rails, crazy behavior. Sub-agents let you encapsulate the context used up by a specific sub-task, like implementing a small feature.
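Here's a rough sketch of that idea. The names and signatures are invented, not Amp's implementation; the property that matters is that the sub-agent accumulates its own transcript of file contents and tool results, and only a compact summary flows back into the parent's context window.

```typescript
// Illustrative sketch of a sub-agent; all names are invented, not Amp's internals.
interface Message {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Stand-in for the real agent loop: it would call the LLM, execute tool calls,
// and append every file read and command output to `history`.
async function runAgentLoop(task: string, history: Message[]): Promise<Message[]> {
  history.push({ role: "user", content: task });
  // ... LLM + tool-calling loop would run here ...
  return history;
}

// Stand-in for a summarization step (could itself be one more LLM call).
async function summarize(history: Message[]): Promise<string> {
  return `Sub-task finished after ${history.length} steps.`;
}

async function runSubAgent(parentHistory: Message[], task: string): Promise<void> {
  const subHistory: Message[] = [];            // fresh, private context window
  const finished = await runAgentLoop(task, subHistory);
  const summary = await summarize(finished);
  // Only the compact result re-enters the parent's context; the bulky
  // intermediate tool output stays inside the sub-agent's transcript.
  parentHistory.push({ role: "tool", content: summary });
}
```

The agentic search tool from the demo follows the same pattern: the sub-agent does the keyword and grep exploration, and the main thread gets back the files and findings it surfaced.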
Okay. So that was a quick tour of a lot of best practices. Just to recap the common anti-patterns: one is micromanaging the agent, using it like you would a chatbot, where you steer it at every interaction or review every edit it makes. Another common anti-pattern is under-prompting, not giving it enough detail. An LLM's knowledge comes from two places: its training data and the context you give it. So it's fine to use a five-word prompt if you're coding up a 3D Flappy Bird game from scratch, because that's well represented in the training set; they're really good at that, they're trained to do that. But if you're trying to make a subtle, nuanced change to your large existing code base, you should be giving it all the details you would give a colleague on the team to point them in the right direction. And last but not least, agents are not a vehicle to TL;DR the code. If anything, they're the opposite: you should be using them to do much more thorough code reviews, more quickly. The human is still ultimately responsible for the code that ships, and you shouldn't view this as a human replacement. It's really a tool that you can wield to make yourself 10 or 100x more effective.
Last tidbit. One of the things we've noticed among the very, very top 1% of the 1% is this inclination to run multiple of these things in parallel. Geoff Huntley, who wrote that blog post I showed earlier, started putting out these streams on Twitter; they're about four hours long each. He's working on a compiler on the side, and what he does is construct prompts for three or four different agents to work on different parts of the compiler. He's gotten to the point where he's prompting them such that he feels confident enough in the feedback loops that he just hits enter, lets them run, and goes to sleep, and the whole thing just runs on the stream for a while. And I think he's doing this to spread the word: hey, you can use this for serious engineering. Compilers are not some vibe-coded weekend project; they're real tech, they're difficult to build, and it is possible to use agents for code like this. But it has to be a very intentional skill that you practice. So I think it's cool. There are a lot of people thinking in terms of agent fleets and where the world is going, but I do think the way we'll actually get there is by building composable building blocks that allow people like Geoff to combine them and come up with interesting UIs. I think this is just running in tmux or some window manager.
Okay, so the takeaways I want to leave you with. One: contrary to what some might say (and there are a lot of smart senior developers out there who think AI is overhyped, and maybe parts of it are), I think coding agents are very real. And two: this is a high-ceiling skill. I think we will probably invest in learning how to use these things the same way we invest in learning how best to use our editor or our programming language of choice. And the only way you can learn this stuff is by doing it and then sharing it with others. One of the reasons we built the thread-sharing mechanism in Amp is to help encourage knowledge dissemination, so that if you discover an interesting way of using it, you can share that with your team.

But yeah, that's it. If you want to see a recap of the best practices in this talk, we've put out an Amp owner's manual that guides new users on how to best use it. I'll also be around afterwards; we have a booth in the main expo hall. I'm supposed to say, too, that if you stop by the booth, we'll give you $10 in free credits. So if anything you saw here was of interest to you and you want to try this out, stop by and say hi.
[Audience question:] I noticed you still type "can you" and then correct your typos, which I guess you said you shouldn't do.

Yeah, it's part habit, and it's part paranoia that in a live demo setting there will be some typo token that triggers off-the-rails behavior. But I think that was more of a concern I learned in 2023, when it actually mattered, because these days LLMs are more and more...