
The emerging skillset of wielding coding agents — Beyang Liu, Sourcegraph / Amp


Chapters

0:14 Current Discourse: The talk begins by acknowledging the polarized debate on AI's role in coding. While some elite programmers are skeptical, many developers find significant value in AI tools, suggesting a disconnect between top-tier and mainstream experience. Liu frames the discussion by referencing opinions from figures like Jonathan Blow and Eric S. Raymond, highlighting the varied perspectives in the field.
3:1 Paradigm Shift: The most significant mistake developers make is using new agents with old mental models. Liu emphasizes that we are in a "step function transition" in model capabilities, meaning that strategies from even six months ago are already outdated for leveraging the full power of today's agents.
5:6 GPT-3 Era (2022): This era was defined by text completion models. The primary application was "copilot" or "autocomplete," where the AI would suggest the next few lines of code based on the preceding context.
5:24 ChatGPT Era (2023): The introduction of instruct-tuned models like GPT-3.5 led to the rise of chatbots. In the coding world, this manifested as "ragbots," which combined a chat interface with a retrieval engine to answer questions about a codebase.
6:11 Agent Era (Present): The current era is defined by models capable of tool use and autonomous operation. This requires a new application architecture where the agent can directly edit files, run commands, and interact with external services to accomplish a goal.
7:27 Autonomous Edits
9:55 Unix Philosophy
10:24 New Applications
13:15 The Task
14:30 Tool Use
15:53 Sub-Agents
17:56 Planning & Execution
19:46 Nuanced Problem Solving
23:21 Detailed Prompts
24:21 Feedback Loops
28:3 Code Understanding
28:36 Code Reviews
30:35 Micromanagement
30:46 Under-prompting
31:52 Parallel Agents
33:18 High-Ceiling Skill

Whisper Transcript | Transcript Only Page

00:00:00.000 | My name is Beyang. I'm the CTO and co-founder of a company called Sourcegraph. We build developer
00:00:19.500 | tools and today I want to share with you some of the observations and insights that we've had on
00:00:24.900 | the sort of like emerging skill set of how to wield coding agents. That sound good to everyone?
00:00:30.900 | All right, cool. Okay, so let's check in on the agent discourse. I don't know if you all saw this,
00:00:39.900 | but a couple days ago there were some spicy tweets about the efficacy of AI coding agents or inefficacy,
00:00:47.900 | depending on your perspective. So Jonathan Blow, who is a really talented developer, he basically
00:00:54.900 | single-handedly coded up the indie game Braid, if you're familiar with that. So he's kind of like
00:00:59.900 | god tier status in terms of coding ability. Retweeted Alex Albert, who is also someone I
00:01:05.900 | respect and admire a lot, works at Anthropic, basically claiming that all the hype around coding
00:01:10.900 | agents and code AI in general was just hype, right? There's no substance there. And then there were
00:01:17.900 | some responses, and there was kind of a spectrum of responses too. We had some other big names in the
00:01:22.900 | developer world, like Jessie Frazelle. She was one of the early contributors and maintainers of Docker. She's
00:01:27.900 | also really legit. She said basically something to the effect of like, I think you're right, but you're in
00:01:34.900 | like the top 0.01% of programmers, Jonathan, for the rest of us down here in this room that aren't on
00:01:40.900 | Mount Olympus. It actually helps a lot, but not super helpful if you're really, really good.
00:01:45.900 | But then we also had folks like Eric S. Raymond, who is like one of the fathers of open source,
00:01:50.900 | who had a very spicy
00:01:55.900 | reply. He's basically like, look, I consider myself to be pretty decent at programming, and these things
00:02:02.160 | help a lot. And then kind of my favorite one of these was actually the top Hacker News post
00:02:09.540 | that was written by Thomas Ptacek, who is a really legit security engineer. Some of you may have seen
00:02:18.660 | this trending. He was basically taking the opposite view of like, you know, there's some really smart
00:02:22.980 | people there who are very AI skeptical, but they're nuts. Like, these things are really useful.
00:02:26.680 | So I'm guessing if you're at this conference, you probably lean toward coding agents are
00:02:33.220 | substantively useful, and there's something there. I don't know. Just a guess. But I think even within
00:02:40.720 | this room, there's probably a spectrum of best practices and opinions about like where agents
00:02:48.940 | are good. You know, whether they're restricted to like small edits or like front end applications or
00:02:56.020 | weekend vibe coding, whether they actually work on your production code base. And I think this is just
00:03:02.200 | indicative of the dynamic technical landscape that we're in right now. And a couple months back, I read
00:03:07.760 | this blog post from this guy, Geoff Huntley. So Geoff was a senior engineer at Canva at the time. And his role
00:03:15.240 | at Canva is really interesting. He basically went around interviewing all the developers inside of
00:03:21.480 | Canva using AI tools like Cursor and other things and seeing how they're using it. And he basically came
00:03:27.060 | to the conclusion that like most people were holding it wrong, which is really interesting. And he came up
00:03:32.300 | with the blog post about like all the different anti-patterns that he was seeing. But my summation
00:03:37.540 | of that blog post is that the number one mistake that people are making with coding agents right now is they're
00:03:45.080 | trying to use coding agents the same way they were using AI coding tools six months ago. And therefore, they're
00:03:52.480 | probably wrong, which is kind of crazy because normally, if you're, you know, using a tool,
00:03:57.720 | the best practices don't change in six months. Typically, the things that you learn that are good will
00:04:04.420 | still be like present and, you know, topical and relevant six months down the line. But I think we're in a
00:04:12.660 | really interesting moment in time right now. And, you know, why the sudden change? I think it's because
00:04:18.900 | of this step function transition that we've experienced in model capabilities in the past six months. So, you know,
00:04:26.700 | we've, we've all been around since the dawn of generative AI, the ancient year of 2022, right? November 2022 was
00:04:36.700 | when ChatGPT launched, right? And every year now, you know, this is now the year three, you know, three
00:04:43.260 | after ChatGPT, right? We're now living in the AI future. But I think there's already been kind of like three
00:04:48.900 | distinct waves or eras largely driven by the evolution of frontier model capabilities. And the
00:04:56.700 | model capabilities really dictate the ideal architecture that, that becomes dominant at the
00:05:02.680 | application layer. So in the GPT-3 era, all the models were text completion models, which meant all
00:05:09.280 | the applications that people were building were these like copilots or autocomplete tools. So the
00:05:13.540 | dominant UX paradigm was like, you type some stuff, it types some stuff, you type some more, and that's
00:05:19.000 | how you would interact. And then ChatGPT came along with GPT-3.5, which was InstructTuned to interact
00:05:25.980 | like a chat bot. And suddenly people realized like, oh, it's not just completing the next thing
00:05:31.960 | I'm talking about, I can actually ask it questions like I can a human now. And then some other people
00:05:36.980 | came along, we were part of this crowd, we realized like, hey, you know what's even better than just
00:05:42.260 | like asking questions? You can actually copy paste stuff into the chat and say like, here's some code
00:05:47.320 | for my code base, use that as an example, and pattern match against that. And that helps it generate,
00:05:52.960 | you know, a little bit better code or less fake code or less hallucinated code than it did before.
00:05:58.940 | And that basically meant that everyone at the application layer was building a rag bot in 2023.
00:06:04.920 | So like a chat bot plus a rag retrieval engine. But now, I think we've entered a new era.
00:06:12.900 | And I'm not sure if everyone realizes it, or maybe this is, I don't know, like who agrees with this statement?
00:06:17.940 | Like who thinks it's a real paradigm shift? Okay, and then who here is like, ah, that's a bunch of
00:06:22.900 | bullshit. Anyone? Feel free to, I like, okay, okay, so maybe I'm, maybe I could just skip this slide.
00:06:28.880 | So we're now living in the era of agents, and the new model capabilities really dictate a new
00:06:35.720 | application architecture. And so one of the things that we realized at Sourcegraph is, you know,
00:06:40.580 | a lot of the existing tools in the market were designed for the era of GPT-4 and Claude 3.
00:06:46.860 | So a lot of the application stuff, features, and UX and UI was really built around the capabilities,
00:06:53.400 | or in some cases, the limitations of the chat-based LLMs. And so if we were going to design a coding agent
00:07:00.780 | from the ground up to unleash the capabilities of tool using LLMs, agentic LLMs, what would that look like?
00:07:07.580 | Okay, so here are my spicy takes. These are controversial design decisions that I think are better to make
00:07:16.140 | in the age of agents. And many of these go against the best practices that kind of emerged in the chatbot era.
00:07:23.820 | Okay, so number one is the agent should just make edits to your files. It shouldn't ask you at every turn,
00:07:30.620 | like, hey, you know, I want to make this change. Should I apply it? If it's asking you and it's wrong,
00:07:34.620 | it's already done the wrong thing and it's wasted your time. Humans need to get more out of the inner loop
00:07:41.420 | and more kind of like on top of the loop, like still steering it and guiding it, but
00:07:45.420 | less, you know, micromanaging and managing every change. Second thing is, do we still need a thick
00:07:52.300 | client to manipulate the LLMs? Like, do we still need a forked VS code? That's like the salty way of
00:07:58.860 | saying this, right? The VS code fork became the culmination of the AI coding application, I think,
00:08:04.780 | for the chatbot era. But there's this question of like, you know, if the contract of an agent is you
00:08:10.940 | ask it to do stuff and then it does stuff, do you really still need all that UI built around like
00:08:15.340 | context management and applying the proposed change in the code base? Or can you just ask it to do stuff
00:08:21.260 | and expect it to do the right thing? Third, I think we're going to move beyond the choose your own model
00:08:28.140 | phase. So I think in the chatbot era, it was very easy to swap models in and out. And you'd like, oh,
00:08:33.580 | you know, a new model came along, let me swap it out and see how well it attends to the context that
00:08:38.220 | my retrieval engine fetches. In the agentic world, there's a much deeper coupling because the LLM that
00:08:44.780 | you're using essentially becomes the brains of these agentic chains. And so it's much harder to rip
00:08:49.660 | and replace. And I think a lot of people in this room who have tried mixing and matching, you know,
00:08:53.900 | different models in the context of agents have found that, you know, swapping out a different model and
00:08:59.420 | expecting similar results doesn't really work. A lot of the LLMs out there aren't even
00:09:04.620 | good at the basics of tool use yet. So it's very difficult to just replace the brains.
00:09:10.700 | Four is I think we're going to move past the era of fixed pricing. Agents eat up a lot of tokens.
00:09:16.380 | And so they look expensive relative to chatbots. But the comparison that more and more people are
00:09:23.100 | making is how much human time is it saving? So they're still cheap relative to human time saved.
00:09:28.460 | And the fixed pricing model actually introduces a perverse incentive now where it's like selling gym
00:09:33.420 | memberships, right? Like if I sold you a membership to my chatbot and you're now paying me, you know,
00:09:38.140 | 20 bucks a month, my incentive now is to push the inference cost as low as possible. And the easiest
00:09:43.260 | way to do that is to use dumber models. But dumber models just waste more of your time.
00:09:48.140 | Sorry, this is a long list. Hopefully it's not too tedious. But I think these are important points.
00:09:56.540 | The second-to-last point I'll make is I think the Unix philosophy is going to be more powerful here
00:10:00.940 | than vertical integration. So in developer tools, the ability to use simple tools in ways that compose
00:10:07.500 | well with other interesting tools is really powerful. And so I think, especially with agents where there's
00:10:13.420 | less of a need to create like a lot of UI around it, you're going to start to see more command-driven
00:10:19.020 | tools, command line tools, and things like that. And then last but not least is, you know, we had an
00:10:25.260 | existing RAG chat coding assistant. Maybe some of you have used it. It was called Cody. It still exists.
00:10:31.180 | We're still supporting it. It's still in heavy use across, you know, many Fortune 500 companies. But we
00:10:36.380 | decided to build a new application from the ground up for the agentic world because we didn't want to be
00:10:43.900 | constrained by all the assumptions and constraints that we built into the application layer for the previous
00:10:50.700 | generation of LLMs. And one analogy I like to draw here is with the early days of the internet, right?
00:11:00.460 | Like in the early days of the internet, the way people, you know, jumped into the web was
00:11:05.820 | using an interface on the left. This was before like most people knew what the internet was about, what it was capable of,
00:11:11.340 | and that was the right interface for the first generation of the internet because like what can you do with the internet?
00:11:15.580 | Well, like there's a bunch of different things. You can look at like trending celebrities. You can, you know, buy
00:11:20.620 | automobiles. You can look at movie reviews, all these things you might not have thought of. And so it's
00:11:24.940 | useful to have in front of you. But at some point it gets a little tedious like clicking through all the
00:11:29.420 | different hyperlinks and navigating your way through. And then the real power of the web was sort of
00:11:34.700 | unleashed by just like the one simple text box where you just like type what you're looking for and you get
00:11:40.540 | to it. And I think, you know, with agentic UIs, that's what we should be striving for both in developer
00:11:47.500 | tools and in a lot of different application paradigms. Okay, so what does that look like in practice?
00:11:52.940 | So when we went to design this thing, our coding agent is called AMP. And AMP has two clients and this
00:12:00.060 | is what they look like. So both are like very, very bare bones. A lot of people, you know, look at this
00:12:04.860 | and like, what is this? It's just a text box. What can I do with it? And that was by design, you know,
00:12:10.780 | that for all the reasons I just mentioned. One client is just a simple VS code extension
00:12:16.380 | that allows us to take advantage of some nice things that you get in VS code, like being able to view
00:12:21.980 | diffs. That's really important in the agentic coding world. I often joke that I now use that
00:12:27.500 | view more than the editor view. And the second client was a CLI. So just stripping things down to bare bones. It
00:12:34.700 | has access to all the same tools as the VS code extension does, but it's just something that you
00:12:39.500 | can invoke in your command line. You can also script it, compose it with other tools.
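To make the composability point concrete, here is a hypothetical sketch of what scripting such a CLI could look like. The `amp` command name and its accepting a prompt as an argument are illustrative assumptions, not documented behavior:

```
# Hypothetical: feed a failing test log to the agent as part of the prompt.
npm test 2>&1 | tail -n 50 | amp "fix the failing test shown in this log"

# Hypothetical: compose with ordinary Unix tools -- one agent run per changed file.
git diff --name-only main | xargs -I{} amp "add missing doc comments to {}"
```

The point is that a bare command-line tool slots into pipelines and scripts in a way a GUI never can.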
00:12:45.180 | Okay. So what does this actually look like in practice? I want to do something a little bit
00:12:51.980 | risky here, which is in the past I've done a lot of like, you know, hey, here's me building a simple
00:12:58.140 | app, like those sorts of demos. But I actually want to show off like where we think this is most useful,
00:13:02.700 | which is like, hey, I'm working on an application that has real users. Let me actually make a contribution
00:13:08.620 | to that code base, given all, with all the like existing constraints. And so I actually want to,
00:13:14.220 | I'm just going to code a little bit. Well, I don't even know how far we're going to get.
00:13:16.780 | But this is, this is AMP. This is VS code running AMP in the sidebar, and it's open to the AMP code base.
00:13:24.780 | And what I want to do is implement like a simple change to this application. So the change that I'm going
00:13:33.100 | to make is AMP has a server component. And the server exists as a way to provide the LLM inference
00:13:41.100 | endpoint. It also provides like team functionality. We have a way to share like what different teams
00:13:46.780 | are doing, what different users are doing with AI. So you can kind of learn from other users. There's
00:13:50.460 | leaderboard. It's fun. But there's also these things called connectors, which allow AMP to talk to
00:13:55.340 | external services. So our issue tracker is linear. And so I've integrated linear into AMP here, but I'm
00:14:01.180 | kind of annoyed because it's using this generic like network icon. And I would really like to customize
00:14:05.660 | this icon such that when you plug in the linear MCP endpoint, it uses a more appropriate icon like a
00:14:11.180 | checkbox or something issue-y. So I've already filed this as a linear issue. And I'm just going to ask,
00:14:18.060 | can you find the linear issue about customizing the linear connector icon? Then implement it.
00:14:30.380 | So what this will do is it has access to a set of tools. I can go over here to the tool panel
00:14:37.340 | and see what tools it has access to. Some are local, some are built in. It's got the standard tools
00:14:42.860 | like read and edit file or run bash command. You can also plug in things like Playwright and Postgres
00:14:48.220 | via MCP. And then linear is also plugged in through this. So we're basically talking to the linear API
00:14:55.020 | through the MCP server. And what this will do is it will use the linear issues API. And it will search
00:15:05.340 | the issues. It found 50 issues. And the one that I was referring to is at the top here. So add a special
00:15:10.380 | icon for the linear connector. And now it's going to go and implement the thing for me.
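For reference, MCP servers like the Playwright, Postgres, and Linear ones mentioned here are commonly declared in a small JSON config. The exact file location and key names vary by client, so treat this shape as an illustrative assumption rather than AMP's actual format:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "linear": {
      "command": "npx",
      "args": ["mcp-remote", "https://mcp.linear.app/sse"]
    }
  }
}
```

Once declared, each server's tools show up in the agent's tool panel alongside the built-in ones, and the agent decides on its own when to call them.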
00:15:18.460 | And one thing to note here is it's just making these tool calls on its own. I'm not asking it
00:15:24.140 | to use specific tools. We've also tried to make the information that you see minimal. So you don't
00:15:31.980 | need to see all the API tool calls that it's making underneath the hood or crowd out the transcript with
00:15:37.260 | a bunch of things. Most of the time, we just want to keep it simple. Because the contract we want to
00:15:42.380 | provide to users is like, the feedback loops here are more robust. And you don't have to micromanage
00:15:48.460 | this as much. Another thing I want to point out here is the search tool that this is using is actually
00:15:55.020 | a sub-agent. So it's actually spinning off a sub-agent loop that uses a form of agentic search. It has
00:16:00.780 | access to a bunch of different search tools. Keyword search, regex search, looking at file names.
00:16:09.340 | If you want to inspect what it's doing, you can click the expand thing and see what different path
00:16:14.140 | it's taking, what files it's reading, what things it uncovered. But again, by default, we think this is
00:16:19.020 | an implementation detail. And hopefully, it should just surface the right thing. So it's working. It's
00:16:24.940 | gathering context. Another thing I want to call out in this interface is, as we've gotten more feedback,
00:16:31.100 | we've kind of designed this thing to be more multi-threaded. So there's a quick keyboard shortcut that
00:16:36.060 | allows you to quickly tab through the different threads that you're running. And it's a common
00:16:39.340 | paradigm in our user community to be running more than one of these things at a time.
00:16:43.660 | And it takes a little bit to get used to the context switching. Developers hate context switching.
00:16:49.740 | We like to be in flow, in focus. Typically, what we see here is the secondary thread will either be
00:16:59.660 | something that's a lot shallower, so that you can quickly page back to the main thread. Or what I like
00:17:03.740 | to do is, while the agent is working, I actually like to understand the code at a deeper level myself,
00:17:09.420 | so I can better understand what it's going to do. So I could ask something like, "Can you show me how
00:17:15.900 | connectors and connections work in AMP?" I can ask it to draw a picture of that.
00:17:24.860 | So we'll kick that thread off in parallel. We'll check back in on what this guy is doing.
00:17:28.620 | So it's found -- it's read a bunch of files. It's read some front-end files. Our front-end is written
00:17:34.940 | in Svelte. And as you can see, it's being fairly thoughtful about reading the appropriate files
00:17:40.620 | before it actually goes and does the work. And we find that this is really important to make the feedback
00:17:46.700 | cycles more robust. Otherwise, the anti-pattern is you just get into the weeds of steering it manually.
00:17:53.180 | It's also got this to-do list thing at the bottom that helps it structure and plan out the longer-term
00:18:00.860 | tasks, so that it doesn't go immediately dive into the code. There's a classic mistake that human
00:18:05.340 | developers make too, where you dive into the code too early, and then you get lost in the weeds, and then
00:18:09.100 | it takes a while to dig yourself out. Okay, so it's making some changes. One other thing that I like
00:18:17.340 | to point out here is -- you know, I mentioned that I use the diff view in VS Code now probably more than
00:18:22.620 | the editor view. VS Code actually has a really nice diff view. I have a hot keyed, so I can open it up
00:18:28.380 | quickly. And most of my time in VS Code now is spent just reviewing the changes it makes. And I actually
00:18:35.500 | like this a lot better than GitHub PRs or Git diff on the command line. Just because it's in the editor,
00:18:41.340 | you can see the whole file, and jump to definition even works. So, yeah. We'll just wait a little bit
00:18:49.180 | for it to do its thing. I actually think it's probably made -- looks like it's getting there. It's probably
00:18:59.900 | just running, like, tests. Let's see if we go back here, if it's updated the icon at all.
00:19:05.740 | Okay, so it hasn't gotten there yet, but I think it's on the right track.
00:19:13.420 | The question was, does it write its own tests? Yes, it typically writes its own tests. And if it doesn't,
00:19:23.420 | you can prompt it to do so. So, it's doing a lot of things. It's reading a lot of files. It's making
00:19:29.660 | these edits incrementally and then checking the diagnostics. And then now let's see if it works.
00:19:35.100 | Okay, cool. So, you see here the icon has been updated. And this is without me really steering it
00:19:40.220 | in any fashion. Notice here on this page that this icon didn't update, though. And so this is actually
00:19:48.140 | not surprising to me because this change -- as many changes in production code bases are -- often more
00:19:53.740 | nuanced than it seems at the surface. So, in this case, the reason it's not getting it here is because
00:19:58.620 | this is the admin page. And the piece of data we need to know -- we need to read in order to tell
00:20:07.820 | that this is a linear MCP rather than a generic MCP is actually part of the config. We have to look at
00:20:13.900 | the endpoint of the MCP URL. In order to do that, you have to read the config. But the config might
00:20:18.860 | also contain secrets. It doesn't contain secrets in this case, but might contain secrets in other cases.
00:20:22.620 | So, we actually prohibit those secrets from being sent to non-admin pages. So, it's not surprising to me
00:20:28.620 | that the first pass, it didn't get that right. But let's see if it can get -- I'll just nudge it a little
00:20:32.940 | a bit. So, like, I noticed that the icon changed on admin connections, but not on settings.
00:20:41.820 | Can you investigate why?
00:20:49.980 | And in the interest of time, we'll check back on this later. How about that?
00:20:54.060 | We'll let it run and we'll see if it can find its way to the right solution there.
00:20:59.260 | Is it okay if I go a little bit over since we started a little bit? Okay, cool.
00:21:05.020 | Is it okay with you all if I go a little bit over? Okay. Are you still having fun?
00:21:08.620 | Okay, cool. So, that was like a brief demo of just like the interaction patterns and the UI.
00:21:14.700 | We try to keep it really minimal. We've released this to like a small group so far. The sign-up is
00:21:21.340 | now publicly open. It's been open for about two weeks, but we haven't done a lot of like marketing
00:21:25.580 | around it. And that's kind of been intentional because we're really trying to design this for
00:21:30.540 | where we think the puck is going. And so, we've done a lot to curate this community of people who are
00:21:35.900 | trying to experiment with LLMs and figure out like how the interaction paradigms are going to change
00:21:41.980 | over the next six to 12 months. And so, our user community is really people who are like spending
00:21:46.620 | nights and weekends a lot of time with this thing to see what they can get it to do. And so, actually,
00:21:53.340 | one of the most insightful things and actually the main topic of this talk is lessons that we've learned
00:21:58.700 | from just like looking at what our power users are doing and seeing what interesting behavior patterns
00:22:05.660 | they're kind of like implementing. And so, like the average spend for agents is growing. It's a lot
00:22:13.100 | more than the average spend was for chatbots or autocomplete. But one other interesting thing that
00:22:17.340 | we've noticed among the user base is that there's a huge variance in terms of how much people use this
00:22:24.140 | thing. To the point where there's like an upper echelon of users that are spending like thousands of
00:22:31.020 | dollars per month just in inference costs. And at first, we're like, this has got to be abuse, right?
00:22:37.660 | Like someone out there has poked around, found some way to exploit the inference endpoint, and is using it to power
00:22:44.220 | some Chinese AI girlfriend or whatever. But actually, no, when we spoke to the people using it,
00:22:51.900 | we actually found that they were doing real things. And we're like, that's interesting. What the hell are
00:22:55.580 | you doing? And from those insights and the conversations, we basically have encapsulated a series of best
00:23:01.580 | practices or emergent power user patterns for how the very dominant users, the most active users are using
00:23:13.420 | this thing. And this has informed our product design process as well. So one of the the first changes
00:23:17.980 | that we made was we noticed that a lot of the power users were writing very long prompts. It was
00:23:25.260 | not like the simple kind of like Google style, like three keywords and just like read my mind and expect
00:23:30.300 | something good to happen. They actually wanted to write a lot of detail because they realized that LLMs are
00:23:36.060 | actually quite programmable. If you give them a lot of context, they will follow those instructions and get
00:23:40.940 | a little bit further than if you just give them like a one line sentence. And so we made the default
00:23:45.180 | behavior of the enter key in the AMP input just insert a newline. So you have to hit command-enter to submit.
00:23:50.940 | And this throws a lot of the new users off because they're like, wait a minute, why isn't it just enter?
00:23:54.460 | Like, you know, if I'm in like cursor or whatever, it's just enter. That's easy. That's intuitive.
00:23:58.300 | But actually, what we want to push users to do is to write those longer prompts because that actually yields
00:24:03.500 | better results. And I think that's one of the things that prevents people who are still in the kind of like chat LLM
00:24:09.340 | mode from unlocking some of the, you know, cool stuff that agents can do.
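As an illustration of the contrast being described (the details and file references here are made up for the example, not taken from the demo):

```
Under-prompted:
  fix the connector icon

Detailed:
  On the settings page, MCP connectors all show a generic network icon.
  When the connector's endpoint URL points at Linear, show a Linear-style
  icon instead. Follow the existing pattern for plumbing non-secret config
  fields to the client -- never send the whole config, since it can contain
  secrets. Run the front-end tests before you finish, and show me the diff
  for the config-plumbing change.
```

The detailed version names the constraint (secrets), the pattern to follow, and the feedback loop (tests), which is exactly the context the agent cannot reliably guess on its own.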
00:24:15.420 | Another thing that people do very intentionally is direct the agent to look at relevant context and
00:24:24.700 | feedback mechanisms. So, you know, context was very important in the chat bot era. It's still important
00:24:30.140 | in the agentic era. Now, agents do have a good amount of like built-in knowledge for how to use tools to
00:24:35.820 | acquire context. Like you saw that before when it was using the search tool to find different things.
00:24:39.900 | And it was executing the test and linters to see if the code was valid.
00:24:46.460 | But there's still some cases, especially in production code bases, where it's like, oh,
00:24:51.340 | we do things in a very specific way that are kind of like out of distribution. And so like some like less
00:25:00.140 | agentically inclined users at that point will just give up. They're like, ah, you know, agents aren't
00:25:03.340 | capable of working with like backend code yet. But what we've noticed is the power user like, actually,
00:25:07.820 | let me try to just tell it how to run, you know, the build in this particular sub directory, run the
00:25:12.620 | tests. And that helps it complete the feedback loop so that it can get the validation to get further.
00:25:20.220 | Feedback loops are going to be a big theme of this talk. So another like dominant paradigm here is
00:25:26.700 | constructing these like frontend feedback loops. So like a really common formula is you have the playwright
00:25:31.580 | mcp server and then there's a thing called storybook, which is basically a way to encapsulate or
00:25:36.780 | componentize a lot of your front-end components. It makes it very easy to test individual components without
00:25:42.220 | loading your entire app. And you know, you probably should have been doing this anyways as a human
00:25:46.940 | developer because you get a fast feedback loop. You make a change, see it reflected instantly,
00:25:50.540 | you get the auto reload and then go back to your editor. But with agents, you kind of notice it more
00:25:55.180 | because you're no longer like in the weeds doing the thing. You're like, oh, you're almost like the
00:25:59.260 | developer experience engineer for your agent. It's like, how can I make it loop back faster? And so what
00:26:03.740 | the agent will do is like, you know, make the code change, use Playwright to open up the page in the
00:26:08.380 | browser, snapshot it, and then loop back on itself. And it does that via Storybook because it's much faster
00:26:14.140 | than reloading the entire app.
00:26:17.740 | You put Playwright as a tool for you?
00:26:19.740 | Yes. So, it's one of the default recommended tools. So, it's right here.
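The change-render-inspect loop described above can be sketched generically. Everything below is a stand-in: in the real setup the agent drives the Playwright MCP server against a Storybook story, not plain Python callables.

```python
def feedback_loop(apply_edit, render_snapshot, looks_correct, max_iters=5):
    """Generic agent feedback loop: apply an edit, snapshot the
    isolated component, and repeat until the snapshot passes
    inspection or we give up and hand back to the human."""
    for attempt in range(max_iters):
        apply_edit(attempt)             # e.g. modify the component source
        snapshot = render_snapshot()    # e.g. Playwright screenshot of the story
        if looks_correct(snapshot):     # e.g. the model inspects the image
            return attempt + 1          # iterations it took to converge
    return None                         # did not converge within budget
```

The tighter `render_snapshot` is (one Storybook story versus a full app reload), the cheaper every pass through this loop becomes, which is exactly why componentizing the frontend pays off for agents.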
00:26:24.780 | And actually, it looks like that run completed. I wonder if -- looks like it did approximately the
00:26:35.020 | right thing. Sorry, just to jump out of the slides here a little bit. So, now you can see like the icon
00:26:40.060 | is customized on the settings page, not just the admin page. And if you look at how it did that,
00:26:45.660 | I think it did the right thing. So, if you look at the diff, it actually looked at the surrounding code
00:26:51.020 | and it was like, oh, there is an existing mechanism for plumbing non-secret parts of the config through
00:26:56.620 | to the UI. Let me use that as a reference point. And it actually plumbed exactly that field through
00:27:05.020 | to the front end. So, now if I add additional fields to the MCP config that do contain secrets,
00:27:09.980 | only the whitelisted fields go through. So, it will still only send the endpoint URL over to the client,
00:27:15.100 | basically what it needs to make that icon customization. So, yeah. I know it's not a super
00:27:22.540 | visually impressive change, but a lot of such changes in messy production code bases are like that,
00:27:27.100 | and it's cool to see the agent be able to tease out that nuance.
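The pattern the agent reused can be sketched like this (the field names are hypothetical; the idea is that only explicitly whitelisted, non-secret config fields ever reach the client):

```python
# Hypothetical whitelist of config fields that are safe to expose to the UI.
CLIENT_SAFE_FIELDS = {"endpoint_url", "icon_url"}

def client_config(server_config):
    """Strip the server-side MCP config down to whitelisted fields,
    so secrets (API keys, tokens) never reach the front end."""
    return {k: v for k, v in server_config.items() if k in CLIENT_SAFE_FIELDS}
```

Adding a new field to the server config changes nothing client-side until someone deliberately adds it to the whitelist, which is the nuance the agent picked up from the surrounding code.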
00:27:32.460 | Okay. I know we're a little bit over time. Can people mind if I keep going? Okay, cool.
00:27:41.180 | There's some additional tips and tricks. Most of this talk is just sharing what we've learned from
00:27:49.740 | our power users. So, another thing that we've noticed is this prevailing narrative that agents
00:27:55.180 | are going to make programmers lazy. It's going to make it so we don't really understand what's going
00:27:59.580 | on in the code, so we're going to ship more slop. But we've actually found the inverse happen with the
00:28:04.860 | power users. They're actually using agents to better understand the code. And so, this is a really good
00:28:10.140 | onboarding tool. We just hired this guy, Tyler Bruno. He's a very precocious young developer.
00:28:15.420 | He's actually still in college, but he's working full-time in addition to taking classes. So, really
00:28:20.860 | bright, but also a bit green. He's been using AMP to just quickly ramp up on how the different pieces
00:28:26.860 | connect together. He can draw diagrams and point you to specific pieces of the code. It's really good at
00:28:32.540 | accelerating that. And then a corollary to this is we all do a form of onboarding to new code whenever we do a
00:28:38.940 | code review. By definition, code review is new code. And oftentimes, it's new code that contains
00:28:45.020 | bugs or is hard to understand or is a bit of a slog. And so, rather than just ignore the code that the
00:28:52.460 | AI generates and just commit it blindly, we find that our user base is actually using this tool to do more
00:28:58.300 | thorough code reviews. So, I've adopted this practice myself where if I have to review a very large diff,
00:29:04.140 | the first thing I do is ask the agent to consume the diff and generate a high-level summary so I can
00:29:08.940 | have a high-level awareness. And then I ask it, "Hey, if you were a smart senior dev, what's the entry
00:29:14.860 | point into this PR?" Because often half the battle is just finding the right entry point. And psychologically,
00:29:21.660 | I often put off code reviews because I'm like, "Oh, it's going to be a pain, and it's going to take
00:29:26.220 | forever just to figure out where I should start reviewing it. So, I'll just do it tomorrow." But this thing,
00:29:30.540 | just like it helps lower that activation energy and make code reviews more thorough and actually,
00:29:35.500 | dare I say, like a little bit fun and enjoyable now.
00:29:38.620 | Sub-agents are also a thing. So, we implemented the search tool as a sub-agent in the very beginning,
00:29:47.180 | but we're seeing more and more use cases emerge for sub-agents. And the general best practice with
00:29:51.500 | sub-agents is that they often are useful for longer, more complex tasks because the sub-agent allows you
00:29:58.300 | to essentially preserve the context window. So, like, the quality of the LLM's output will degrade as the context fills up.
00:30:06.140 | You know, Sonnet 4 has a context window of 200K, but we see degradation typically around like 120 or 130K.
00:30:12.700 | And by the time you hit 170K tokens, you start to see more kind of like off-the-rails and crazy behavior.
00:30:19.260 | But sub-agents allow you to encapsulate the context used up by a specific sub-task, like implementing a small feature,
00:30:25.980 | such that it doesn't pollute the main agent.
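A sketch of that delegation pattern (the API here is hypothetical; the point is that the sub-agent's working context is thrown away and only a short summary returns to the parent):

```python
def run_subagent(task, run_model):
    """Run one sub-task in a fresh context window; the parent
    never sees the tokens the sub-agent burned along the way."""
    sub_context = [task]              # fresh window, not the parent's
    result = run_model(sub_context)   # may consume 100K+ tokens in here
    return result["summary"]         # parent receives a short digest only

def main_agent(tasks, run_model):
    """The parent's context grows by just one summary per sub-task,
    keeping it well below the degradation zone."""
    context = []
    for task in tasks:
        context.append(run_subagent(task, run_model))
    return context
```

This is the same trick as the search sub-agent: the expensive exploration happens in a disposable window, and only the conclusion is worth the parent's tokens.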
00:30:28.780 | Okay. So, that was a quick tour of a lot of best practices. Just to recap, like the anti-practices,
00:30:35.980 | the common anti-patterns are just like micromanaging the agent, like using it like you would a chatbot,
00:30:41.180 | where you have to kind of like steer it at every interaction or review every edit it's making.
00:30:45.660 | Another common anti-pattern is just like under-prompting, so not giving it enough detail.
00:30:51.020 | Like LLMs, their knowledge comes from two places. It either comes from their training data or from the
00:30:57.180 | context that you give it. And so, you know, it's fine if you do a five-word prompt if you're coding up
00:31:03.980 | like a 3D Flappy Bird game from scratch, because that's well represented in the training set. They're
00:31:08.860 | really good at that. They're trained to do that. But if you're trying to make a subtle, nuanced change to
00:31:14.460 | your large existing code base, you should be giving it all the details that you would give a colleague
00:31:19.020 | on the team to point them in the right direction. And then last but not least, like agents are not a
00:31:25.500 | vehicle to like TLDR the code. If anything, they're the opposite. You should be using them to do much more
00:31:30.700 | thorough code reviews more quickly. As the human, you're still ultimately responsible for the code
00:31:36.540 | that you ship, and you shouldn't view this as a human replacement. It's really a tool that you can
00:31:40.620 | wield to make yourself 10 or 100x more effective. Last tidbit. So, one of the things that we've noticed
00:31:48.140 | among the very, very, very top 1% of the 1% is this inclination to run multiple of these things
00:31:57.740 | in parallel. Geoff Huntley, who wrote that blog post that I showed earlier, he started putting out these
00:32:05.580 | Twitter streams. They're about four hours long each. And it's basically just, he's working on a compiler
00:32:13.580 | on the side. And what he does is he constructs prompts for three or four different agents to work on
00:32:21.100 | different parts of the compiler. And he's gotten to the point where he's prompting it such that he feels
00:32:26.380 | confident enough in the feedback loops where he just hits enter, lets it run, and then he goes
00:32:30.220 | to sleep. And then this thing just runs on Twitter for a while. And I think he's doing this to spread
00:32:35.260 | the word. It's like, hey, you can use this for serious engineering. Compilers are not some vibe-coded
00:32:43.180 | weekend project. They're real tech. They're difficult to build. And it is possible to use agents for code like
00:32:51.100 | this. But it has to be a very intentional skill that you practice. And so I think it's cool. I think
00:32:57.740 | there's a lot of people thinking in terms of agent fleets and where the world is going. But I do think
00:33:02.220 | that the way that we'll actually get there is by building these composable building blocks that allow
00:33:07.020 | people like Geoff to go and combine them and come up with interesting UIs. I think this is just running
00:33:13.180 | in like Tmux or some window manager. Okay, so the takeaways I just want to leave you with is, one,
00:33:19.500 | contrary to what some might say, and there's a lot of smart senior developers out there who think AI is
00:33:25.820 | overhyped, and maybe parts of it are. But I think coding agents are very real. And it is, I think,
00:33:32.060 | a high-ceiling skill. It's like, I think we will probably invest in learning how to use these things
00:33:38.380 | in the same way that we invest in learning how best to use our editor or our programming language of choice.
00:33:43.500 | And I think the only way you can learn this stuff is by doing it and then sharing it out with others.
00:33:49.180 | And one of the reasons we built the thread sharing mechanism in AMP is to help encourage knowledge
00:33:54.780 | dissemination so that like if you discover an interesting way of using it, you can share that out with your team.
00:33:59.740 | But yeah, that's it. If you want to kind of like see a recap of the best practices in this talk,
00:34:06.540 | we've actually put out like an AMP owner's manual that guides new users how to best use it.
00:34:11.260 | I'll also be around afterwards. We have a booth in the main expo hall. I'm supposed to say, too,
00:34:17.420 | if you stop by the booth, we'll give you like $10 in free credits. So if anything you saw here was of interest to you
00:34:23.820 | and you want to try this out, stop by and say hi.
00:34:26.540 | I noticed you still type "can you," and then you correct your typos, which I guess you said you shouldn't do.
00:34:38.620 | Yeah, it's part habit and it's part paranoia that in like a live demo setting, there will be some typo
00:34:47.180 | token that will trigger off-the-rails behavior. But I think that was more of a concern that I
00:34:52.140 | learned in like 2023 when it actually mattered. Because like these days, LLMs are more and more
00:34:57.180 | typo-robust, I would say.
00:34:59.260 | Yeah.