The Making of Devin by Cognition AI: Scott Wu

00:00:00.000 |
I'm Scott from Cognition AI, and I'm going to tell you guys a little bit about, you know, the early makings of Devin. 00:00:20.300 |
We're still super, super early on, and also a little bit about kind of the space as a whole and what's coming next. 00:00:26.660 |
You know, I thought it would be nice to start with the demo first. 00:00:29.360 |
It sounds like some of you guys have already seen some of the videos, but I brought a nice custom one here for the World's Fair today, so I'll just show that quickly. 00:00:38.300 |
And here I basically said, hey, Devin -- this was this morning, by the way, this is a huge scramble -- I said, hey, Devin, I want you to build a mobile-friendly website to play the name game. 00:00:47.560 |
So I have a lot of trouble memorizing names and faces, I don't know about you guys, but I basically just said, you know, 00:00:53.320 |
here's a -- here's a TSV file of a bunch of names and faces, these are all the speakers here at the World's Fair this week. 00:01:00.020 |
And I said, can you set up the game so that you show two different random faces and then show the names of one of them and have me guess which one is which, right? 00:01:08.020 |
And I gave kind of a few instructions on how the game should work. 00:01:11.020 |
And so Devin is a fully autonomous software engineer. 00:01:14.720 |
And what that means is Devin has access to all the same tools that a human software engineer would have when it was building -- when they were building something like this. 00:01:23.720 |
And so the first thing that Devin is going to do is Devin is going to make a plan. 00:01:26.720 |
And you can see here, you know, kind of a basic plan coming out. 00:01:30.220 |
One of the interesting things about this is the plan changes a lot over time. 00:01:33.720 |
And so, you know, as you get new information or new feedback, you update your plan accordingly with that, too. 00:01:40.420 |
After that, Devin is basically just running this the same way that a human would. 00:01:43.220 |
And so if you can take a look, you know, Devin makes a new directory for the name game website, starts a new React app, you know, all the same primitives, works on building it out and building the code, you know, reads the TSV file to take a look at what's going on here. 00:01:56.720 |
And it's just kind of generally working through it and jumping through. 00:02:00.720 |
It comes out and deploys this first version after some minutes. 00:02:08.720 |
I mean, it shows -- it's still showing the names. 00:02:10.720 |
And I think maybe I didn't quite specify that exactly. 00:02:13.720 |
But, you know, you can click the name and got that correct. 00:02:17.720 |
And so I just went ahead and just gave it some more feedback in plain English. 00:02:20.720 |
And so I said, hey, you know, can you hide the two names until I click on the answer? 00:02:23.720 |
And also, can you probably restyle the play again button? 00:02:26.720 |
It's like, you know, somehow it's a little off on this page. 00:02:29.720 |
And I kept going and just kind of gave it more and more feedback over time. 00:02:34.720 |
And I also asked it, hey, can you add a streak counter as well? 00:02:36.720 |
You know, can you keep track of how many I got correct and, you know, reset to zero? 00:02:42.720 |
And the website it ultimately deployed was this one right here. 00:02:52.720 |
And so, you know, if I were to -- for example, if I got this one wrong on purpose, 00:02:58.720 |
then you would see the streak would reset to zero, you know, and it would go on. 00:03:01.720 |
And so I actually played this game and learned the names of everyone, 00:03:10.720 |
I think it was something like 170 speakers here at the World's Fair this week. 00:03:15.720 |
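The game rules described in the demo (two random faces, one name, a streak that resets on a wrong guess) can be sketched roughly as follows. This is only an illustration, not the code Devin produced: the TSV column names `name` and `image_url`, and every class and function name here, are assumptions made for the example.

```python
import csv
import io
import random
from dataclasses import dataclass


@dataclass
class Speaker:
    name: str
    image_url: str  # hypothetical column; the talk does not show the TSV layout


@dataclass
class Round:
    faces: tuple    # the two Speaker objects shown side by side
    target: Speaker  # the speaker whose name is displayed


def load_speakers(tsv_text: str) -> list:
    """Parse TSV text with a header row containing `name` and `image_url`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [Speaker(row["name"], row["image_url"]) for row in reader]


def new_round(speakers: list, rng: random.Random) -> Round:
    """Pick two distinct speakers and choose one of them as the name to match."""
    first, second = rng.sample(speakers, 2)
    return Round(faces=(first, second), target=rng.choice((first, second)))


class NameGame:
    """Tracks the streak: +1 on a correct guess, back to zero on a wrong one."""

    def __init__(self, speakers: list, seed=None):
        self.speakers = speakers
        self.rng = random.Random(seed)
        self.streak = 0

    def guess(self, round_: Round, picked_index: int) -> bool:
        correct = round_.faces[picked_index] is round_.target
        self.streak = self.streak + 1 if correct else 0
        return correct
```

A front end would call `new_round` to decide what to render and `guess` when a face is clicked, keeping the names hidden until the answer is revealed.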
So, you know, this is kind of a cool example. 00:03:18.720 |
But, you know, I want to highlight how different the world is 00:03:25.720 |
You know, if you can just explain exactly what you want in plain English 00:03:30.720 |
And so, you know, this is obviously kind of a toy use case. 00:03:34.720 |
But we use Devin all the time ourselves when we're building Devin, actually. 00:03:37.720 |
And by the way, obviously, I didn't make this website myself. 00:03:41.720 |
I just said, hey, can you build me this website with the QR code and whatever? 00:03:45.720 |
But, you know, here's a quick example of Devin that we're using ourselves in production. 00:03:52.720 |
And so, you know, if you take a quick look here, for example, 00:03:54.720 |
there's this whole search bar and there's all the sessions 00:03:59.720 |
Devin actually made that in the Devin repository. 00:04:04.720 |
And Bryce was asking, hey, Devin, can you go into the Devin sessions list? 00:04:09.720 |
And so there's a few features about this in particular 00:04:11.720 |
that are obviously tuned for working in a production code base. 00:04:15.720 |
You can see here that Devin started from a snapshot. 00:04:18.720 |
So we have a machine instance loaded where it's cloned from. 00:04:23.720 |
So it knows like a lot of the details about our repositories. 00:04:25.720 |
And then it's also just able to generally work within our Git environment. 00:04:29.720 |
So you'll see it just make a PR and interact with all those same tools. 00:04:32.720 |
And so I'll just kind of go through this quickly. 00:04:40.720 |
And again, you're just giving feedback in plain English, right? 00:04:43.720 |
You know, now could you add a magnifying glass and make it idiomatic? 00:04:47.720 |
You know, use Phosphor, you know, it's up to you, right? 00:04:52.720 |
And Bryce says, oh, by the way, no need to test. 00:04:55.720 |
And Devin says, by the way, I'm dealing with a bit of an issue 00:04:58.720 |
You know, it's just like you're working with another engineer, right? 00:05:03.720 |
And, you know, it kind of builds it all and gets the PR. 00:05:06.720 |
And this is, you know, the search bar, right? 00:05:07.720 |
And similarly, you know, a lot of the API integrations that Devin has were built by Devin. 00:05:12.720 |
You know, a lot of our own internal dashboards and metrics tracking within Devin were actually 00:05:22.720 |
And it's been kind of a fun one to see, like, Devin building the company with the company as 00:05:29.720 |
Yeah, I want to talk a little bit about, you know, our journey so far and about what's happening 00:05:35.720 |
And so, you know, we got started back in November. 00:05:43.720 |
And it was basically just -- a lot of us had already, like, lived together at that point. 00:05:48.720 |
You know, we'd all had our own journeys in AI. 00:05:50.720 |
And we just knew that we wanted to build something together. 00:05:52.720 |
And we obviously knew that we wanted to do something in code and build a coding agent. 00:05:58.720 |
After that, there was another hacker house in New York. 00:06:00.720 |
Then there was another hacker house in the Bay Area. 00:06:02.720 |
And so we were actually -- we've been going back and forth between New York and the Bay 00:06:07.720 |
I think at this point, we are now going to, like, settle in the Bay. 00:06:09.720 |
But it's been going back and forth and getting, like, a slightly bigger Airbnb each time 00:06:13.720 |
because the team also gets a little bit bigger. 00:06:18.720 |
And, you know, this is a particular question that I'm really passionate about, which is, 00:06:24.720 |
you know, language models have been pretty big. 00:06:29.720 |
And, you know, the first wave of generative AI is what I generally call these text completion products. 00:06:35.720 |
And, you know, that makes a lot of natural sense if you think about it, 00:06:38.720 |
that obviously the interface of a language model is text completion. 00:06:42.720 |
You give it a prefix and it completes the suffix from there. 00:06:44.720 |
And so if you think about ChatGPT, if you think about a lot of these Q&A products, 00:06:48.720 |
if you think about, you know, writing marketing copy or answering customer support, 00:06:53.720 |
or even GitHub Copilot and Cursor and products like that, you know, obviously very -- you know, 00:06:58.720 |
a lot of these are really great products and very natural use case where you have the prefix 00:07:02.720 |
so far and you're asking the model to complete what's next in the suffix. 00:07:09.720 |
And I think we're entering this new wave where, you know, we're going beyond that 00:07:13.720 |
and actually introducing some amount of autonomous decision making. 00:07:16.720 |
And obviously, you know, that's typically referred to in our space as agents. 00:07:20.720 |
And, you know, there's all sorts of new things that you unlock. 00:07:24.720 |
There's a lot higher bar of consistency that you require. 00:07:26.720 |
But there's new things that you unlock with that. 00:07:28.720 |
And so it's been an interesting one because it's both a very deep core capabilities question 00:07:34.720 |
of getting Devin to solve these, but also a pretty interesting product design problem 00:07:38.720 |
because I think that the UX of agents is something that's extremely new. 00:07:49.720 |
And so the idea of teaching AI to code is, you know, one of the coolest things 00:07:54.720 |
But beyond that, I think there's a few particular reasons that code 00:08:01.720 |
You know, one is that obviously there's so much more to being a software engineer 00:08:06.720 |
A lot of the work that you can do is, you know, you're going to be looking into a bug. 00:08:10.720 |
You're going to be looking at the different files of the code base. 00:08:12.720 |
Maybe you're going to be running this or that command. 00:08:14.720 |
Maybe you're going to be pulling up documentation. 00:08:16.720 |
Maybe you're going to run the front end yourself to reproduce the bug. 00:08:22.720 |
All of this work here obviously is, you know, that's what software engineering is, right? 00:08:27.720 |
More so than just typing the code in the file, which leads very naturally to an agentic workflow. 00:08:33.720 |
You know, another part which I think is closely related is the ability to iterate 00:08:38.720 |
And so what I mean by that is, you know, if you were given an entire production code base 00:08:43.720 |
and you were told, hey, this has this one bug. 00:08:48.720 |
You know, and let's say it's like thousands of files and, you know, hundreds of thousands 00:08:54.720 |
I mean, it would be pretty tough honestly for most humans. 00:08:57.720 |
It's also going to be quite tough for AIs as well. 00:09:00.720 |
And obviously the way that we do this in practice is, you know, you go and add print statements. 00:09:07.720 |
You know, you jump back and forth between different files. 00:09:10.720 |
Each of these things that you're doing, you know, you're making a decision 00:09:14.720 |
and then you're running actual code to find out what happened. 00:09:19.720 |
And it just gives you a much cleaner path to solve the problem in front of you. 00:09:22.720 |
And similarly, you know, that kind of lends very well to agents. 00:09:25.720 |
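The decide, run, observe cycle described here can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Devin's implementation: `agent_loop`, `run_command`, and `scripted_policy` are hypothetical names, and the policy is a hard-coded stand-in for whatever model would actually choose the next step.

```python
import subprocess
import sys
from typing import Callable, Optional

# A policy looks at the history of (command, output) pairs and returns the
# next command to run, or None once it decides the task is finished.
Policy = Callable[[list], Optional[list]]


def run_command(command: list, timeout: float = 30.0) -> str:
    """Run a command and return its combined stdout and stderr as feedback."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    return result.stdout + result.stderr


def agent_loop(policy: Policy, max_steps: int = 10) -> list:
    """Alternate between deciding on a command and observing what it printed."""
    history = []
    for _ in range(max_steps):
        command = policy(history)
        if command is None:
            break
        history.append((command, run_command(command)))
    return history


def scripted_policy(history: list) -> Optional[list]:
    """Hard-coded stand-in for a model: reproduce a bug, then try a fix."""
    if not history:
        # Step 1: run the failing code to see the actual error.
        return [sys.executable, "-c", "print(1 / 0)"]
    last_output = history[-1][1]
    if "ZeroDivisionError" in last_output:
        # Step 2: the observed traceback informs the next attempt.
        return [sys.executable, "-c", "print('guarded:', 0)"]
    return None  # nothing left to do


if __name__ == "__main__":
    for command, output in agent_loop(scripted_policy):
        print(command[-1], "->", output.strip().splitlines()[-1])
```

The point of the sketch is that each decision is followed by real execution, so the next decision is based on observed output rather than on a guess about what the code does.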
And the last thing I just want to mention is, you know, 00:09:27.720 |
how fast model agentic capabilities are improving. 00:09:31.720 |
And so, you know, two years ago, like, even something as simple as this name game demo, 00:09:38.720 |
And, you know, you think about where things are going 00:09:41.720 |
and where things are going to be two years from now. 00:09:43.720 |
I think there's a lot of, you know, the data, the right training, and so on, 00:09:47.720 |
that's really, really rapidly improving in the space. 00:09:54.720 |
And then, you know, again, beyond the capabilities problem, 00:09:56.720 |
there's actually a really deep UX problem as well. 00:09:58.720 |
And at a high level, you know, I think what's kind of happening here is, 00:10:01.720 |
when we're building agents, and I think all of us in the space are quite new to agents, 00:10:05.720 |
you know, the immediate first things, I think, to map to are, you know, 00:10:09.720 |
how we use software today, and also how we talk with other humans, right? 00:10:12.720 |
And so, you know, I mean, even a lot of the features in Devin are essentially looking 00:10:18.720 |
You know, you can see their computer, and you can see what commands they're running, 00:10:22.720 |
The thing is, I think an agent is actually pretty different from both. 00:10:25.720 |
You know, there's a lot of nuances and details of parallel work, information gathering, 00:10:30.720 |
how it manages context, et cetera, et cetera, that are super, super different. 00:10:35.720 |
And it's actually a quite deep problem from a product perspective as well. 00:10:38.720 |
And just to give you guys a bit of a sense of that, like, here's just kind of a short list 00:10:43.720 |
of some of the features that we've built into the product. 00:10:45.720 |
And so, you know, obviously there's Devin being able to use the shell, you know, edit code, browse the web. 00:10:52.720 |
You know, being able to fork and rollback sessions, you know, being able to handle integrations 00:10:56.720 |
with Slack and GitHub, being able to handle playbooks, to store machine snapshots, 00:11:01.720 |
to keep track of secrets, you know, to be able to work with the right tools for verification. 00:11:05.720 |
You know, all of this is part of the actual product iteration, right? 00:11:08.720 |
Which is, you know, on its own, I think, already an incredibly, incredibly dense problem. 00:11:12.720 |
And I think, honestly, we're going to see actually a lot more iteration with that over time. 00:11:17.720 |
And I just wanted to show kind of a new feature which we just recently shipped, 00:11:22.720 |
which is the ability to use Devin's machine, which is kind of -- again, 00:11:25.720 |
it's the kind of thing that's not always -- there's not necessarily a very close parallel in, 00:11:31.720 |
you know, in the software that we have today, right? 00:11:34.720 |
But the ability to just have a VS Code Live Share in Devin's machine. 00:11:37.720 |
And, you know, if you want to collaborate with Devin and say, hey, 00:11:40.720 |
oh, there's these couple lines, like, you know, you should make this edit. 00:11:45.720 |
And you can just talk with Devin and do that, right? 00:11:47.720 |
So there's a lot more room to go and a lot to iterate on in the space. 00:11:51.720 |
One of the other things I wanted to mention, too, is just how much, you know, 00:11:57.720 |
So you guys saw, like, a simple example of Devin building the search bar. 00:12:00.720 |
But, you know, we actually handle tasks in a much more async way now. 00:12:04.720 |
One of the cool kind of features of Devin, I'd say, is, you know, if, as an engineer, 00:12:09.720 |
you're working on, let's say, four different tasks today, you know, 00:12:17.720 |
You have four Devins that are all running in parallel. 00:12:19.720 |
And it's kind of turning every engineer into an engineering manager, 00:12:23.720 |
You know, I think the Devins are very, like, enthusiastic interns, is what I'd say. 00:12:29.720 |
You know, they're -- obviously, they don't know everything. 00:12:33.720 |
But, you know, you're kind of working with each of them 00:12:38.720 |
I mean, this is literally from earlier today. 00:12:40.720 |
But, you know, we were talking about some particular feature 00:12:45.720 |
In this case, it was a pretty simple thing of changing the color. 00:12:47.720 |
But it's just as simple as just saying in Slack, in the conversation, 00:12:50.720 |
hey, @Devin, can you just change this thing? 00:12:55.720 |
And so, you know, we've had a lot of occasions where we're, you know, 00:13:02.720 |
Because, you know, you can tell Devin exactly what you want Devin to do. 00:13:05.720 |
You just don't have your whole computer with you. 00:13:08.720 |
But, you know, being able to just kind of describe what you want to Devin 00:13:11.720 |
and then being able to review the code afterward actually works really well. 00:13:17.720 |
You know, I think this is a really important question. 00:13:20.720 |
And obviously, I think the technology is extremely early today. 00:13:24.720 |
But, you know, where do these things go in a few years? 00:13:26.720 |
And also, what happens with software engineering? 00:13:28.720 |
I think there's been a lot of uncertainty about that question. 00:13:31.720 |
And, you know, as we're using Devin more and more, I think one of the big things 00:13:35.720 |
that we see actually is -- this is kind of obvious perhaps, 00:13:38.720 |
but Devin is not the one that decides what to do or what to build, you know. 00:13:42.720 |
And there's this core part of software engineering. 00:13:44.720 |
The way I describe it is like software engineers everywhere, you know, 00:13:50.720 |
And the first job is basically problem solving with code. 00:13:53.720 |
You know, you're given a problem and you're breaking down exactly what is the solution 00:13:58.720 |
You know, what is the architecture that you're going to use? 00:14:00.720 |
What are all the flows and the details and the edge cases that might come up? 00:14:04.720 |
And kind of architecting your exact solution. 00:14:06.720 |
And then the second part is once you have that, you know, you're dealing with debugging 00:14:11.720 |
or implementing different functions or writing unit tests 00:14:14.720 |
or all of the other things that kind of go into this implementation 00:14:17.720 |
of something that you know you want to do, right? 00:14:20.720 |
And, you know, I think right now the average software engineer is probably 00:14:23.720 |
spending like 10 or 20 percent of the time on that first thinking part 00:14:26.720 |
and they're spending 80 or 90 percent of the time on that implementation part. 00:14:29.720 |
And, you know, what we really see is Devin actually just frees you up 00:14:35.720 |
And I think the future of Devin, again, it's very, very early, 00:14:38.720 |
but I think as Devin gets better we're going to see more of that 00:14:40.720 |
where Devin just frees up the implementation for you 00:14:42.720 |
where you don't have to go figure out how to set up Kubernetes. 00:14:45.720 |
You know, you don't have to go like debug all these like APIs that are broken. 00:14:48.720 |
You know, you don't have to go like deal with version changes or migrations 00:14:51.720 |
or all of these other things that, you know, take up a lot of time 00:14:56.720 |
But you actually are spending all your time on figuring out 00:15:00.720 |
You know, it's a little more like a mix between, you know, a technical architect 00:15:07.720 |
And so, you know, I think software engineering, the job 00:15:10.720 |
that we call software engineering is going to change. 00:15:12.720 |
But I think practically like there's actually going to be way more 00:15:17.720 |
And I think there's a lot of precedent for that too, you know. 00:15:20.720 |
Programming back then, you know, used to mean punch cards. 00:15:23.720 |
And then after that it used to mean assembly, you know. 00:15:26.720 |
And then after that it used to mean C, right? 00:15:28.720 |
And, you know, as these things have gone on, I mean, 00:15:30.720 |
most people aren't using punch cards anymore. 00:15:32.720 |
But there's actually way more programmers than before, right? 00:15:35.720 |
And I think one of the things that's easy to underestimate 00:15:38.720 |
is just how much more code there is to write. 00:15:41.720 |
And, you know, it's funny to think about, I think, 00:15:43.720 |
because obviously we all love software here in this room. 00:15:46.720 |
I would say I think software has been the number one driver of progress 00:15:52.720 |
And yet despite that, I think, you know, our demand for software to be built 00:15:56.720 |
is actually probably a lot more than 10x what we're currently getting. 00:16:02.720 |
we get to open up the power of software engineering to a lot more people. 00:16:05.720 |
And every single software engineer gets to be 5 or 10x more effective. 00:16:08.720 |
But we actually do a lot more software engineering. 00:16:14.720 |
But, yeah, we'd love to open the floor if there's any questions. 00:16:40.720 |
Every week we've been letting on more and more people. 00:16:42.720 |
We've also been sizing up with our enterprise customers. 00:16:49.720 |
But we'd love to get you guys access as soon as possible. 00:17:01.720 |
like, the Devin has access to running the code wall in the back 00:17:03.720 |
and, like, how are you going to be going to now 00:17:12.720 |
It has a machine that's basically instantiated 00:17:19.720 |
hey, I need you to debug this particular thing, 00:17:21.720 |
it'll just pull it up itself and then, you know, reproduce it, 00:17:41.720 |
with all of these simpler tasks getting solved, what happens to, you know, 00:17:45.720 |
all the junior engineers or the interns who obviously need to learn how to code? 00:17:50.720 |
what happens honestly is I think that demand is going to just keep rising with supply. 00:18:09.720 |
And I think the training process is going to change a little bit. 00:18:11.720 |
But, you know, I think a lot of these core fundamentals of -- you know, if you think of someone as -- 00:18:15.720 |
when you say someone's a really great engineer, typically you don't mean that they type really fast, 00:18:20.720 |
You typically mean that they have a really great understanding of problems. 00:18:23.720 |
You know, they know all the different architectures. 00:18:28.720 |
And so those are the fundamentals that I think are always going to matter. 00:18:32.720 |
And I think -- basically I think interns or junior engineers are going to get more exposed 00:18:37.720 |
to getting to use those fundamentals earlier and earlier. 00:18:50.720 |
So someone asked, what are the biggest challenges to realizing the vision of a future SWE? 00:18:55.720 |
I mean, it's basically everything, as you can imagine. 00:19:06.720 |
You know, and I think all of these things -- one of the cool things, I think, is just how 00:19:13.720 |
And so, you know, obviously we're going to do our best work on it. 00:19:15.720 |
But, you know, every new hardware release is amazing for it. 00:19:19.720 |
You know, every new foundation model that comes out is amazing. 00:19:22.720 |
You know, every new piece of agentic research. 00:19:24.720 |
And I think this is the kind of thing where -- I think there will be a lot of different optimizations 00:19:30.720 |
that come in different parts of the stack that make this agentic flow better and better and better.