The Making of Devin by Cognition AI: Scott Wu

I'm Scott from Cognition AI, and I'm going to tell you guys a little bit about, you know, the early makings of Devin, and also a little bit about kind of the space as a whole and what's coming next. We're still super, super early on. You know, I thought it would be nice to start with the demo first.

It sounds like some of you guys have already seen some of the videos, but I brought a nice custom one here for the World's Fair today, so I'll just show that quickly. And here I basically said, hey, Devin -- this was this morning, by the way, this is a huge scramble -- I said, hey, Devin, I want you to build a mobile-friendly website to play the name game.

So I have a lot of trouble memorizing names and faces, I don't know about you guys, but I basically just said, you know, here's a -- here's a TSV file of a bunch of names and faces, these are all the speakers here at the World's Fair this week. And I said, can you set up the game so that you show two different random faces and then show the name of one of them and have me guess which one is which, right?

And I gave kind of a few instructions on how the game should work. And so Devin is a fully autonomous software engineer. And what that means is Devin has access to all the same tools that a human software engineer would have when it was building -- when they were building something like this.

And so the first thing that Devin is going to do is Devin is going to make a plan. And you can see here, you know, kind of a basic plan coming out. One of the interesting things about this is the plan changes a lot over time. And so, you know, as you get new information or new feedback, you update your plan accordingly with that, too.

After that, Devin is basically just running this the same way that a human would. And so if you can take a look, you know, Devin makes a new directory for the name game website, starts a new React app, you know, all the same primitives, works on building it out and building the code, you know, reads the TSV file to take a look at what's going on here.

And it's just kind of generally working through it and jumping through. It comes out and deploys this first version after some minutes. And I'll just pull this up quickly. So that's what this looks like. It's close but not quite there, right? I mean, it shows -- it's still showing the names.

And I think maybe I didn't quite specify that exactly. But, you know, you can click the name, and I got that correct. And so I just went ahead and just gave it some more feedback in plain English. And so I said, hey, you know, can you hide the two names until I click on the answer?

And also, can you properly restyle the play again button? It's like, you know, somehow it's a little off on this page. And I kept going and just kind of gave it more and more feedback over time. And I also asked it, hey, can you add a streak counter as well?

You know, can you keep track of how many I got correct and, you know, reset to zero? You know, a few of these other things. And the website it ultimately deployed was this one right here. And so this is Justine, for example. Keeps track of my streak. And you can see it's kind of ramping it up.

And so, you know, if I were to -- for example, if I got this one wrong on purpose, then you would see the streak would reset to zero, you know, and it would go on. And so I actually played this game and learned the names of everyone, which was super helpful, by the way.
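The game rules described in the demo (two random faces, one name, a streak that resets on a miss) can be sketched in a few lines. This is only an illustrative sketch, not the code Devin wrote: the site in the demo was a React app, and the TSV column names `name` and `photo_url` as well as the helper names below are assumptions made for the example.

```python
import csv
import io
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Speaker:
    name: str
    photo_url: str


@dataclass(frozen=True)
class Round:
    faces: tuple   # the two speakers whose faces are shown
    target: Speaker  # the speaker whose name is shown


def load_speakers(tsv_text: str) -> list:
    """Parse TSV text; the 'name' and 'photo_url' headers are assumed."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [Speaker(row["name"], row["photo_url"]) for row in reader]


def new_round(speakers: list, rng: random.Random) -> Round:
    """Pick two distinct speakers and choose one of them as the name to guess."""
    first, second = rng.sample(speakers, 2)
    return Round(faces=(first, second), target=rng.choice((first, second)))


def update_streak(streak: int, current: Round, picked: Speaker) -> int:
    """Increment the streak on a correct pick, reset it to zero on a miss."""
    return streak + 1 if picked == current.target else 0
```

A front end would call `new_round` for each question, hide both names until a face is clicked, and feed the clicked speaker into `update_streak`.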

And, you know, you guys can play it too. It's right here if you want to try it out. This has all the speakers. I think it was something like 170 speakers here at the World's Fair this week. So, you know, this is kind of a cool example. But, you know, I want to highlight how different the world is if software engineering is just this easy.

You know, if you can just explain exactly what you want in plain English and get that out. And so, you know, this is obviously kind of a toy use case. And it's perhaps useful. But we use Devin all the time ourselves when we're building Devin, actually. And by the way, obviously, I didn't make this website myself.

I just said, hey, can you build me this website with the QR code and whatever? And Devin built that too. But, you know, here's a quick example of Devin that we're using ourselves in production. And so, you know, if you take a quick look here, for example, there's this whole search bar and there's all the sessions and you can search across sessions, right?

Devin actually made that in the Devin repository. You can see here Bryce is on our team. And Bryce was asking, hey, Devin, can you go into the Devin sessions list? Create a search bar component. Here's what I need you to do. And so there's a few features about this in particular that are obviously tuned for working in a production code base.

You can see here that Devin started from a snapshot. So we have a machine instance loaded where it's cloned from. It has a playbook. So it knows like a lot of the details about our repositories. And then it's also just able to generally work within our Git environment. So you'll see it just make a PR and interact with all those same tools.

And so I'll just kind of go through this quickly. So yeah, Devin says absolutely. You know, makes the first pull request. Bryce continues. And again, you're just giving feedback in plain English, right? And you say, hey, this is a great start. You know, now could you add a magnifying glass and make it idiomatic?

You know, use phosphor, you know, it's up to you, right? And Devin says, yeah, sure, I'll build that. And Bryce says, oh, by the way, no need to test. You know, I trust you. And Devin says, by the way, I'm dealing with a bit of an issue with the login process.

You know, it's just like you're working with another engineer, right? And Devin says -- and Bryce says, okay, bro. And, you know, it kind of builds it all and gets the PR. And this PR was actually merged. And this is, you know, the search bar, right? And similarly, you know, a lot of the API integrations that Devin has were built by Devin.

You know, a lot of our own internal dashboards and metrics tracking within Devin were actually also built by Devin. And it's been kind of a fun one to see, like, Devin building the company with the company as well. So cool. Yeah, I want to talk a little bit about, you know, our journey so far and about what's happening in the space as well.

And so, you know, we got started back in November. So it's been about seven months now. It's kind of funny. We started in a hacker house in Burlingame. And it was basically just -- a lot of us had already, like, lived together at that point. You know, we'd all had our own journeys in AI.

And we just knew that we wanted to build something together. And we obviously knew that we wanted to do something in code and build a coding agent. And so there was that hacker house in the Bay Area. After that, there was another hacker house in New York. Then there was another hacker house in the Bay Area.

And so we were actually -- we've been going back and forth between New York and the Bay for basically the last seven months. I think at this point, we are now going to, like, settle in the Bay. But it's been going back and forth and getting, like, a slightly bigger Airbnb each time because the team also gets a little bit bigger.

But, you know, why Devin in particular? And, you know, this is a particular question that I'm really passionate about, which is, you know, language models have been pretty big. I think that's fair to say. And, you know, the first wave of generative AI is what I generally call these text completion products.

Right? And, you know, that makes a lot of natural sense if you think about it, that obviously the interface of a language model is text completion. Right? You give it a prefix and it completes the suffix from there. And so if you think about ChatGPT, if you think about a lot of these Q&A products, if you think about, you know, writing marketing copy or answering customer support, or even GitHub Copilot and Cursor and products like that, you know, obviously very -- you know, a lot of these are really great products and very natural use case where you have the prefix so far and you're asking the model to complete what's next in the suffix.

Right? And it does that for you. And that's a tool that's useful. Right? And I think we're entering this new wave where, you know, we're going beyond that and actually introducing some amount of autonomous decision making. And obviously, you know, that's typically referred to in our space as agents.

Right? And, you know, there's all sorts of new things that you unlock. Right? There's a lot higher bar of consistency that you require. But there's new things that you unlock with that. And so it's been an interesting one because it's both a very deep core capabilities question of getting Devin to solve these, but also a pretty interesting product design problem because I think that the UX of agents is something that's extremely new.

And then why code in particular? You know, a few different things. Obviously, we're all coding nerds as well. You know, we're all engineers. And so the idea of teaching AI to code is, you know, one of the coolest things that we could think of. But beyond that, I think there's a few particular reasons that code with agents works especially well.

You know, one is that obviously there's so much more to being a software engineer than typing the code. Right? A lot of the work that you can do is, you know, you're going to be looking into a bug. You're going to be looking at the different files of the code base.

Maybe you're going to be running this or that command. Maybe you're going to be pulling up documentation. Maybe you're going to run the front end yourself to reproduce the bug. You know, you look at this thing. You make this edit. You try it again. All of this work here obviously is, you know, that's what software engineering is, right?

More so than just typing the code in the file, which leads very naturally to an agentic workflow. You know, another part which I think is closely related is the ability to iterate with code feedback. And so what I mean by that is, you know, if you were given an entire production code base and you were told, hey, this has this one bug.

I need you to fix it. Here's the bug. You know, and let's say it's like thousands of files and, you know, hundreds of thousands of lines of code. I mean, it would be pretty tough honestly for most humans. It's also going to be quite tough for AIs as well.

And obviously the way that we do this in practice is, you know, you go and add print statements. You pull up the logs. You check the monitoring. You know, you jump back and forth between different files. You try and diagnose it, right? Each of these things that you're doing, you know, you're making a decision and then you're running actual code to find out what happened.

And from that, you're able to iterate. And it just gives you a much cleaner path to solve the problem in front of you. And similarly, you know, that kind of lends very well to agents. And the last thing I just want to mention is, you know, how fast model agentic capabilities are improving.

And so, you know, two years ago, like, even something as simple as this name game demo, I think, would have been almost unthinkable. And, you know, you think about where things are going and where things are going to be two years from now. I think there's a lot of, you know, the data, the right training, and so on, that's really, really rapidly improving in the space.
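To make the earlier point about iterating with code feedback concrete, here is a minimal sketch of the loop: make a decision, run actual code, read what happened, and try again. It is an assumption-laden illustration, not how Devin is implemented. In particular, the list of candidate fixes is supplied up front as a stand-in for whatever an engineer or an agent would come up with after reading the previous failure, and the function names are hypothetical.

```python
import subprocess
import sys


def run_check(code: str) -> subprocess.CompletedProcess:
    """Run a snippet in a fresh Python process and capture its output."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=30,
    )


def iterate(candidates: list) -> tuple:
    """Try candidates in order; stop at the first one that exits cleanly.

    Returns (winning_code_or_None, log), where each log entry is
    (attempt_number, exit_code, last_line_of_stderr).
    """
    log = []
    for attempt, code in enumerate(candidates, start=1):
        result = run_check(code)
        stderr_lines = result.stderr.strip().splitlines()
        last_line = stderr_lines[-1] if stderr_lines else ""
        # The observation is what a real loop would use to choose the next step.
        log.append((attempt, result.returncode, last_line))
        if result.returncode == 0:
            return code, log
    return None, log
```

The point of the sketch is the shape of the loop: every attempt produces a real observation (an exit code and an error message) instead of a guess about what the code would do.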

And then, you know, again, beyond the capabilities problem, there's actually a really deep UX problem as well. And at a high level, you know, I think what's kind of happening here is, when we're building agents, and I think all of us in the space are quite new to agents, you know, the immediate first things, I think, to map to are, you know, how we use software today, and also how we talk with other humans, right?

And so, you know, I mean, even a lot of the features in Devin are essentially looking over your own intern's shoulder. You know, you can see their computer, and you can see what commands they're running, and things like that. The thing is, I think an agent is actually pretty different from both.

You know, there's a lot of nuances and details of parallel work, information gathering, how it manages context, et cetera, et cetera, that are super, super different. And it's actually a quite deep problem from a product perspective as well. And just to give you guys a bit of a sense of that, like, here's just kind of a short list of some of the features that we've built into the product.

And so, you know, obviously there's Devin being able to use the shell, you know, edit code, browse the web. But there's all these other things, right? You know, being able to fork and rollback sessions, you know, being able to handle integrations with Slack and GitHub, being able to handle playbooks, to store machine snapshots, to keep track of secrets, you know, to be able to work with the right tools for verification.

You know, all of this is part of the actual product iteration, right? Which is, you know, on its own, I think, already an incredibly, incredibly dense problem. And I think, honestly, we're going to see actually a lot more iteration with that over time. And I just wanted to show kind of a new feature which we just recently shipped, which is the ability to use Devin's machine, which is kind of -- again, it's the kind of thing that's not always -- there's not necessarily a very close parallel in, you know, in the software that we have today, right?

But the ability to just have a VS Code Live Share in Devin's machine. And, you know, if you want to collaborate with Devin and say, hey, oh, there's these couple lines, like, you know, you should make this edit. I went ahead and did that edit for you. And you can just talk with Devin and do that, right?

So there's a lot more room to go and a lot to iterate on in the space. One of the other things I wanted to mention, too, is just how much, you know, we've seen it changing our own workflow. So you guys saw, like, a simple example of Devin building the search bar.

But, you know, we actually handle tasks in a much more async way now. One of the cool kind of features of Devin, I'd say, is, you know, if, as an engineer, you're working on, let's say, four different tasks today, you know, you just give one to Devin number one.

You give the second one to Devin number two. The third one to Devin number three. You have four Devins that are all running in parallel. And it's kind of turning every engineer into an engineering manager, is almost how I'd describe it. You know, I think the Devins are very, like, enthusiastic interns, is what I'd say.

I mean, they try very hard. You know, they're -- obviously, they don't know everything. They get little things wrong. They ask a lot of questions. But, you know, you're kind of working with each of them and having them iterate. And so here's just kind of a fun example. I mean, this is literally from earlier today.

But, you know, we were talking about some particular feature and about what we wanted to build. In this case, it was a pretty simple thing of changing the color. But it's just as simple as just saying in Slack, in the conversation, hey, @Devin, can you just change this thing?

And then Devin goes and makes the PR. And then you hit merge, you know. And so, you know, we've had a lot of occasions where we're, you know, in the gym or in the car or something. And now you can actually write code. Because, you know, you can tell Devin exactly what you want Devin to do.

You just don't have your whole computer with you. You can't type everything. But, you know, being able to just kind of describe what you want to Devin and then being able to review the code afterward actually works really well. So what's next? You know, I think this is a really important question.

And obviously, I think the technology is extremely early today. But, you know, where do these things go in a few years? And also, what happens with software engineering? I think there's been a lot of uncertainty about that question. And, you know, as we're using Devin more and more, I think one of the big things that we see actually is -- this is kind of obvious perhaps, but Devin is not the one that decides what to do or what to build, you know.

And there's this core part of software engineering. The way I describe it is like software engineers everywhere, you know, are doing really two jobs at once, right? And the first job is basically problem solving with code. You know, you're given a problem and you're breaking down exactly what is the solution you're going to build.

You know, what is the architecture that you're going to use? What are all the flows and the details and the edge cases that might come up? And kind of architecting your exact solution. And then the second part is once you have that, you know, you're dealing with debugging or implementing different functions or writing unit tests or all of the other things that kind of go into this implementation of something that you know you want to do, right?

And, you know, I think right now the average software engineer is probably spending like 10 or 20 percent of the time on that first thinking part and they're spending 80 or 90 percent of the time on that implementation part. And, you know, what we really see is Devin actually just frees you up to do more of the first part, you know.

And I think the future of Devin, again, it's very, very early, but I think as Devin gets better we're going to see more of that where Devin just frees up the implementation for you where you don't have to go figure out how to set up Kubernetes. You know, you don't have to go like debug all these like APIs that are broken.

You know, you don't have to go like deal with version changes or migrations or all of these other things that, you know, take up a lot of time in software engineering, right? But you actually are spending all your time on figuring out how to solve the problems in front of you.

You know, it's a little more like a mix between, you know, a technical architect and a product manager almost, right? And so, you know, I think software engineering, the job that we call software engineering is going to change. But I think practically like there's actually going to be way more software engineers than ever, you know.

And I think there's a lot of precedent for that too, you know. Programming back then, you know, used to mean punch cards. And then after that it used to mean assembly, you know. And then after that it used to mean C, right? And, you know, as these things have gone on, I mean, most people aren't using punch cards anymore.

But there's actually way more programmers than before, right? And I think one of the things that's easy to underestimate is just how much more code there is to write. And, you know, it's funny to think about, I think, because obviously we all love software here in this room. I would say I think software has been the number one driver of progress in the world in the last 40 or 50 years.

And yet despite that, I think, you know, our demand for software to be built is actually probably a lot more than 10x what we're currently getting. And so, you know, I think what happens is we get to open up the power of software engineering to a lot more people.

And every single software engineer gets to be 5 or 10x more effective. But we actually do a lot more software engineering. Cool. Yeah, so that's all I had. But, yeah, we'd love to open the floor if there's any questions. Yeah, right here in the front. Great question. Great question.

So we've been ramping up access. Every week we've been letting on more and more people. We've also been sizing up with our enterprise customers. We have a lot of wait lists to get through. So we're doing it as fast as we can. But we'd love to get you guys access as soon as possible.

Yeah. Yeah. All the way in the back over there. Yeah, in the red. [Audience question, partly inaudible, about how much access Devin has to running the code.] Yeah, exactly.

So in our code base, for example, Devin has all the setup that it needs. It has a machine that's basically instantiated where it can run the dev environment. It can run the server. It can run the front end. And so if it's -- if you're asking it, hey, I need you to debug this particular thing, it'll just pull it up itself and then, you know, reproduce it, and then it'll debug it and try it again.

Yeah. Exactly. Any other questions? Right here? Yeah. Sorry. So someone asked, you know, with all of these simpler tasks getting solved, what happens to, you know, all the junior engineers or the interns who obviously need to learn how to code?

You know, I think -- what happens honestly is I think that demand is going to just keep rising with supply. And I think the training process is going to change a little bit. But, you know, I think a lot of these core fundamentals of -- you know, if you think of someone as -- when you say someone's a really great engineer, typically you don't mean that they type really fast, although maybe they do that too, right?

You typically mean that they have a really great understanding of problems. You know, they know all the different architectures. They never miss an edge case. Stuff like that, right? And so those are the fundamentals that I think are always going to matter. And I think -- basically I think interns or junior engineers are going to get more exposed to getting to use those fundamentals earlier and earlier.

Yeah. Yeah. Okay. So someone asked, what are the biggest challenges to realizing the vision of a future suite? You know, there's a lot. I mean, it's basically everything, as you can imagine. I mean, there's speed. There's consistency. There's access. There's integrations. There's the right product UX. You know, and I think all of these things -- one of the cool things, I think, is just how much of a rising tide there is everywhere.

And so, you know, obviously we're going to do our best work on it. But, you know, every new hardware release is amazing for it. You know, every new foundation model that comes out is amazing. You know, every new piece of agentic research. And I think this is the kind of thing where -- I think there will be a lot of different optimizations that come in different parts of the stack that make this agentic flow better and better and better.

But, yeah, it won't just be one small thing. But I think it will be pretty fast. That's all the time we had. So thank you so much. Thank you guys so much.