Building Replit Agent v2

00:00:05.000 |
I'm thrilled to announce our next speaker from Replit. 00:00:11.960 |
They've made programming accessible to anyone. 00:00:15.000 |
They've revolutionized how we write and deploy software. 00:00:26.760 |
And so I'd like to welcome my friend Michele, 00:00:28.880 |
president at Replit, to the stage for our fireside chat. 00:00:57.000 |
You guys launched V2 of Replit Agent six months ago? 00:01:03.220 |
MICHELE: Early access at the end of February, GA in late March. 00:01:06.960 |
INTERVIEWER: And I've heard nothing but fantastic things 00:01:10.340 |
And so if people haven't tried out Replit Agent in the last two 00:01:18.440 |
MICHELE: I think the shortest possible summary 00:01:20.440 |
is autonomy, the level of autonomy that it showcases compared 00:01:25.440 |
If you tried V1 starting from September last year, 00:01:28.440 |
you recall that it was working autonomously for a couple of 00:01:32.440 |
MICHELE: And right now it's not uncommon to see it 00:01:37.160 |
And what I mean by running is not spinning its 00:01:39.620 |
wheels, but rather doing useful work and accomplishing 00:01:44.380 |
And it took a lot of re-architecting and also thanks 00:01:50.100 |
And things we learned, to be honest, like shipping things 00:01:55.000 |
I think we learned a lot of tweaks to make the agent overall 00:01:58.820 |
INTERVIEWER: Are you able to share any of those tweaks? 00:02:03.920 |
MICHELE: I would say I usually have two pillars, which, 00:02:07.020 |
by the way, I'm going to reiterate what you just 00:02:15.420 |
Otherwise, especially as your agent becomes more advanced, 00:02:18.800 |
you have less and less idea of whether you're introducing 00:02:26.860 |
I mean, as you know, we use LangSmith pretty thoroughly. 00:02:31.800 |
And I think we are all learning, as a field, how to do 00:02:36.060 |
It's a completely different animal compared to how we built 00:02:40.800 |
INTERVIEWER: One of the things that I'd love to hear more 00:02:44.800 |
about: when we did a separate fireside chat, maybe in December, 00:02:49.060 |
we talked about the human-in-the-loop experience and how 00:02:54.020 |
Now you're saying these agents are more autonomous. 00:02:58.760 |
Has that changed, or is it just present in a different way? 00:03:02.460 |
There is this constant tension between wanting to put the human 00:03:05.420 |
in the loop so that you can break the agentic flow and make sure 00:03:08.680 |
that in case it's going sideways, the human can bring it back 00:03:12.800 |
But at the same time, what we're experiencing from our users 00:03:15.580 |
is that when the agent is actually working correctly, 00:03:21.840 |
And the bar keeps rising, basically on a monthly basis. 00:03:24.460 |
The more we can get done, maybe it takes a week for the user 00:03:27.340 |
to get used to that, and then they just want more. 00:03:29.800 |
So I think the strategy that we're following at the moment 00:03:32.980 |
is we try to push notifications also to other platforms. 00:03:37.840 |
We have a mobile app, for instance, that basically 00:03:40.000 |
allows us to bring the user's attention back. 00:03:43.420 |
But at the same time, there is always a chat available 00:04:01.860 |
And I'm trying to build a product that makes both of them happy. 00:04:11.500 |
On the topic of users, how are people using Replit Agent? 00:04:21.580 |
Who are the users that you're thinking of targeting? 00:04:25.760 |
So starting from early February, we finally opened our free tier. 00:04:29.880 |
So everyone can use Replit just by creating an account. 00:04:33.360 |
And we are on track to create roughly 1 million applications 00:04:36.840 |
So that's the level of scale that we reach today. 00:04:40.440 |
A lot of them are just testing what agents can do. 00:04:43.940 |
And I think it's the same high that we got when we were younger, 00:04:47.660 |
when we wrote our first piece of code and actually saw it running. 00:04:50.820 |
That's what a lot of people are chasing when first trying the agent. 00:04:53.540 |
Like realizing that you can actually build software even without having any coding background. 00:04:58.840 |
At the same time, some of them get hooked. 00:05:01.080 |
And they realize, oh, I can build what I need for my business. 00:05:06.540 |
And that's when they start to work on much more ambitious applications. 00:05:10.260 |
So I think one of the key differences of our product is the fact that it's not used mostly 00:05:15.600 |
to create simple landing pages or prototypes, but rather people find value in very long trajectories. 00:05:21.980 |
I've seen people spending hundreds of hours on a single project with Replit, writing 00:05:27.240 |
absolutely no lines of code, just making progress with the agent. 00:05:30.800 |
That is, first of all, a great technical challenge because it makes things much harder for several 00:05:36.140 |
different reasons, and the people that are spending so much time, they are usually either 00:05:46.020 |
There is this concept of unbundling SaaS, the idea that, why would 00:05:51.660 |
I spend seven figures buying a very expensive SaaS when I only need two features? 00:05:56.640 |
I'm going to rather rebuild it and deploy it internally in the company. 00:06:00.480 |
So this is one direction that I see a lot more companies working on. 00:06:04.480 |
And at the same time, also personalized applications for professionals or even people that have 00:06:11.840 |
their own hobby and they want to build software based on that. 00:06:20.120 |
For people who have agents and are maybe starting with agents on the lower end of autonomy and 00:06:26.820 |
are thinking of letting it run now for 10, 15 minutes like you are, how did you have the 00:06:33.340 |
When was the point where you were like, okay, we can bring the human out of the loop and 00:06:38.060 |
Was that based on feedback from users, internal testing, metrics? 00:06:42.520 |
What did that process to get that confidence look like? 00:06:47.460 |
Even before we launched V1, we had a prototype of it since early 2024. 00:06:52.320 |
So we have always been trying to make it work. 00:06:55.920 |
And the moment we find the right unlocks, which are partially due to what the frontier labs are working 00:07:02.460 |
And at the same time, it's also due to how good the scaffold that we're building is. 00:07:06.800 |
The moment it works well enough, then that's when we start to feel we should launch this. 00:07:10.340 |
We should put it at least in front of a small alpha users cohort. 00:07:15.000 |
What happened with V2 is that we re-architected it to best leverage the latest models out there, 00:07:22.520 |
and then we started to use it a lot internally. 00:07:25.060 |
And we started with an approach that was a bit more similar to V1, so we were more cautious. 00:07:33.900 |
So we wanted to say, "Okay, how far can we take this? 00:07:37.660 |
And it turns out that it exceeded our expectations. 00:07:40.680 |
So the confidence, in all honesty, as usual, came during the early access program where we 00:07:49.180 |
We asked users just through social to go and try it. 00:07:52.460 |
And then we received exceedingly positive feedback. 00:07:55.720 |
And then as a team, we rushed to basically go to GA as soon as possible. 00:08:02.160 |
Are you able to share what models you all are using or how generally you think of the model 00:08:09.820 |
We are heavy users of the Sonnet models, especially 3.7, as it unlocks a new level of autonomy for 00:08:18.000 |
So I see overall the industry pointing in that direction, like the latest Gemini 2.5 Pro is 00:08:27.120 |
And I do believe that Frontier Labs are realizing that there is a lot of value in allowing companies 00:08:33.820 |
like ours and all your customers to create much more advanced, agentic workflows compared 00:08:42.240 |
So I wouldn't be surprised if in the next few months we are going to see all the top models 00:08:46.760 |
exposing tools and being post-trained in such a way that allows you to have much more autonomy 00:08:54.280 |
And do you let users choose what model is used under the hood, or is that hidden? 00:09:02.020 |
No, we are very opinionated, and it's also product choice. 00:09:06.660 |
In all honesty, there are platforms where, of course, you can pick your model. 00:09:11.960 |
We use Cursor internally at Replit, for example, to develop parts of it. 00:09:15.240 |
So I think it's great to have a model selector and get the best possible performance from 00:09:20.760 |
the different models available on the market. 00:09:22.760 |
In our case, it would be a fairly big challenge to allow you to switch models. 00:09:31.760 |
3.7 is kind of like the foundation, the main building block for the IQ of the agent. 00:09:38.280 |
But we also use a lot of other models to do a lot of accessory functions. 00:09:43.280 |
Especially when we can trade off latency for performance, then we go with flash models or 00:09:52.280 |
So we don't give you that optionality, because it would be very hard for us to even maintain 00:10:01.800 |
We go very deep into the rabbit hole of optimizing the prompts. 00:10:04.560 |
It would be very hard for me to go from n=1 to n=3 prompt sets. 00:10:11.800 |
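For readers building something similar, here is a minimal, hypothetical sketch of the single-model policy Michele describes: one pinned frontier model for the core agent loop, cheaper fast models for accessory work. The model IDs, task kinds, and the pick_model helper are illustrative assumptions, not Replit's actual code.

```python
# Hypothetical sketch of a fixed model-routing policy (not Replit's actual code).
# Model IDs and task kinds are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class ModelChoice:
    name: str
    max_tokens: int

# One opinionated mapping: the "IQ" of the agent comes from a single frontier model,
# while accessory functions (summarizing logs, naming files, quick classifications)
# trade quality for latency and cost.
ROUTES = {
    "core_agent": ModelChoice("claude-3-7-sonnet", max_tokens=8192),
    "summarize":  ModelChoice("gemini-flash",      max_tokens=1024),
    "classify":   ModelChoice("gemini-flash",      max_tokens=256),
}

def pick_model(task_kind: str) -> ModelChoice:
    """Return the pinned model for a task kind; fall back to the core model."""
    return ROUTES.get(task_kind, ROUTES["core_agent"])

if __name__ == "__main__":
    print(pick_model("summarize"))  # fast, cheap accessory model
    print(pick_model("planning"))   # falls back to the core frontier model
```

With a single prompt set tuned against one core model, adding a model selector would multiply the prompt-maintenance burden, which is the n=1 to n=3 problem mentioned above.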
Do you use any open source models as well as part of this, or is it mostly foundation 00:10:18.600 |
At this point, it's mostly foundation models. 00:10:21.100 |
We definitely spent some time testing DeepSeek, and I'm very bullish on it over time. 00:10:27.580 |
The reason why we're not investing too much time today fine-tuning or exploring open source 00:10:33.020 |
models at length is because, again, the labs are moving at a completely different pace compared 00:10:40.320 |
I think back in the days when we got to know each other, maybe there was a new leap every 00:10:45.360 |
Now it's probably happening every couple of months. 00:10:47.960 |
So it's better to explore what you can do today with Frontier Labs. 00:10:52.660 |
And then eventually, when things slow down, if they will ever slow down, by the way, or if 00:10:56.660 |
there is a reason for us to take an open source model, fine-tune it, and perhaps try to optimize 00:11:02.200 |
some of the key actions that our agent takes, then I'd be happy to spend time there. 00:11:08.000 |
But for now, it's already very frantic, as it is. 00:11:12.040 |
You've mentioned kind of like the trade-off between cost and latency, and then there's 00:11:18.080 |
How do you think about that now, and how have you thought about that over time? 00:11:23.820 |
Because RepliAgent, I feel like, at least based on what I see on Twitter, has exploded 00:11:30.080 |
And so was there a moment-- like, I think everyone kind of has some fear when they launch 00:11:36.120 |
Like, if this becomes really popular, like, it's going to bankrupt me. 00:11:40.120 |
And so did you guys have that fear as you started to see things take off? 00:11:43.920 |
I still have that fear, so it doesn't change much, trust me. 00:11:47.960 |
So I think I went on a podcast, probably in early October last year, of course, saying 00:11:53.420 |
that the three dimensions you want to optimize are performance, cost, and latency. 00:11:59.960 |
And for me, performance and cost are almost at the same level in terms of importance. 00:12:05.360 |
And then, already back in the V1 days, I was using latency as a far third. 00:12:11.780 |
It doesn't change much today with V2, if anything, that gap has become even wider. 00:12:17.740 |
It runs for so long, and possibly that was the scariest bet we did when we launched it, 00:12:24.020 |
especially when we put it on and we made it GA. 00:12:27.300 |
And the reason is, we were already not emphasizing too much the latency component, but we strongly 00:12:33.260 |
believe that it's far more important for the agent to get done what people want, and especially 00:12:38.940 |
for the ICP that we have in mind, which is non-technical people. 00:12:42.880 |
So we went almost like one order of magnitude in terms of additional latency. 00:12:47.500 |
And the reaction has been fairly non-controversial, I think, and maybe for the first week we heard 00:12:52.900 |
some people being shocked about the amount of time it was taking, but the moment you realize 00:12:57.260 |
how much more it gets done, and the amount of headaches that it solves for you, because you 00:13:04.060 |
Even if you debug it with the agent, with an older version of the agent, you have to know 00:13:09.120 |
Right now, it's not the case anymore, oftentimes. 00:13:11.060 |
So do you see people modifying the code manually still, or is it completely hands-off? 00:13:18.740 |
We have an internal metric, and it's one of my North Stars, to be honest. 00:13:22.680 |
We try to track how often people go back into our editor, which, by the way, we have been 00:13:26.880 |
hiding in the product since we launched Agent V1. 00:13:33.920 |
The main product, for those who didn't know Replit before we launched the agent, was an editor 00:13:39.920 |
We started by still showing you the file tree, then now it's hidden by default, and then it 00:13:44.920 |
takes some effort to get in front of the editor. 00:13:47.920 |
We started where, I think, one user out of four were actually still editing the code, especially 00:13:55.620 |
I think as of today, we've arrived at a point where it's one out of ten doing that. 00:14:00.320 |
And my goal is, eventually, it should be like zero users willing to put their hands on the 00:14:06.100 |
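As a rough illustration of that north-star metric, here is a hypothetical sketch that computes the fraction of active projects where the user opened the editor; the event names and record shapes are made up for the example.

```python
# Hypothetical sketch of a "manual edit rate" metric: of all active projects in a
# window, what fraction had the user open the editor? Event shapes are illustrative.
def editor_open_rate(events) -> float:
    opened, active = set(), set()
    for e in events:
        active.add(e["project"])
        if e["type"] == "editor_opened":
            opened.add(e["project"])
    return len(opened) / len(active) if active else 0.0

events = [
    {"project": "p1", "type": "agent_checkpoint"},
    {"project": "p1", "type": "editor_opened"},
    {"project": "p2", "type": "agent_checkpoint"},
    {"project": "p3", "type": "agent_checkpoint"},
]

print(f"{editor_open_rate(events):.0%} of projects saw a manual editor open")  # ~33%
```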
One of the cool features of Replit that I remember from before Agent was kind of like 00:14:15.300 |
When people build agents, is there a collaborative aspect to it, or is it mostly kind of like—sorry, 00:14:20.120 |
when people build apps with Agent, is it mostly one person using the agent, or is it sometimes 00:14:24.620 |
collaborative as well, with several people interacting with the agent? 00:14:28.320 |
So for our consumers around the world, yes, most of them, I think, are just single-player 00:14:34.300 |
experience, especially more like in a business and enterprise setting. 00:14:40.040 |
We bring them in in a team so everyone can see each other's projects. 00:14:46.560 |
Now, we have a giant lock as of now, for reasons I'm happy to explain. 00:14:51.560 |
But, you know, we see oftentimes in the chat logs that there are several people sending, basically, 00:14:58.080 |
The challenge why it's still hard to run a lot of agents in parallel is not that much on the 00:15:03.560 |
Like, we have everything it takes to run multiple instances because we already run at scale, so 00:15:10.380 |
The real challenge is how do you merge all the different, you know, patches, basically PRs 00:15:16.380 |
that the agent creates, which is a non-trivial problem still, even for frontier AI models. 00:15:22.380 |
Like, merge conflicts are hard, unfortunately. 00:15:25.200 |
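To make the merge problem concrete, here is a hypothetical sketch that uses plain git to dry-run whether each agent-produced patch still applies cleanly before merging; the repository path and patch filenames are placeholders, and actual conflict resolution is out of scope.

```python
# Hypothetical sketch: before merging work from parallel agent runs, check whether
# each agent-produced patch still applies cleanly to the current tree with plain git.
# Conflict *resolution* is the hard part and is not attempted here.
import subprocess

def patch_applies_cleanly(repo_dir: str, patch_path: str) -> bool:
    """Dry-run `git apply --check`; a non-zero exit code signals a conflict."""
    result = subprocess.run(
        ["git", "apply", "--check", patch_path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

patches = ["agent_a.patch", "agent_b.patch"]  # illustrative filenames
for p in patches:
    status = "clean" if patch_applies_cleanly(".", p) else "conflict, needs resolution"
    print(p, "->", status)
```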
You mentioned earlier that there's some app for using Repl.it and getting notifications. 00:15:34.960 |
Where I'm going with this is when this agent's running for, like, 10, 15 minutes, how does 00:15:39.020 |
it-- like, what are the communication patterns you're seeing? 00:15:42.020 |
Are they just keeping the browser open and looking there? 00:15:46.020 |
Is it this app that sends them a push-- like, what are you seeing being helpful there? 00:15:51.400 |
And has that changed as the agent gets longer and longer running? 00:15:55.960 |
So with V1, most of the users were in front of the screen all the time, because the feedback 00:16:05.120 |
And I think there was also quite a bit to learn from what the agent was doing. 00:16:12.900 |
If you're curious, you can basically expand every single action it does. 00:16:16.580 |
If you want, you can see the output of every single tool we run. 00:16:21.560 |
So there is a subset of users that are using the agent not only because they want to build 00:16:27.080 |
something, but also because they want to speedrun their learning experience. 00:16:30.780 |
It teaches you how to build 0 to 1 apps in possibly the best possible way. 00:16:35.960 |
There are also users that absolutely don't care, and they just launch, they submit a prompt, 00:16:41.520 |
and then they go back, maybe they go to it, and then they go back and check Replit. 00:16:45.200 |
To make sure that the loop is a bit tighter, the Replit mobile app, that is available both 00:16:50.880 |
in App Store and Android, sends you notifications when the agent wants your feedback. 00:16:56.160 |
And the vision that we have for the next release is to send you even fewer notifications. 00:17:01.880 |
And the idea is, right now, one of the bottlenecks, at least for us, is the fact that we rely solely 00:17:10.560 |
But, as you know, more and more progress is happening on the computer use side. 00:17:16.560 |
You know, Anthropic launched that back in late October, if I recall correctly. 00:17:20.240 |
OpenAI fast-followed, and open source is also catching up. 00:17:23.740 |
You know, I see Hugging Face launched something similar a week ago. 00:17:26.740 |
That is something that we are actively working on to remove even, you know, this additional 00:17:33.240 |
Because a lot of the time what we ask you to test is fairly trivial. 00:17:37.240 |
So, like, it's data input and clicking around a very simple interface. 00:17:42.700 |
I expect us to be able to do that with computer use very soon. 00:17:46.920 |
Bring it into the product, and then jump from, say, ten minutes of autonomy to one hour of autonomy. 00:17:53.040 |
That is my target, you know, for V3, hopefully in a few months. 00:17:56.400 |
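As a sketch of the kind of trivial check being described (data input and clicking around a simple interface), here is what an automated version might look like with Playwright; the URL, selectors, and expected text are placeholders, and this is not how Replit implements it.

```python
# Hypothetical sketch of an automated UI smoke test (fill a form, click, assert),
# using Playwright instead of asking the user to verify by hand.
from playwright.sync_api import sync_playwright

def smoke_test(url: str) -> bool:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.fill("#name", "Test User")       # simple data input
        page.click("button[type=submit]")     # click around the interface
        ok = page.is_visible("text=Saved")    # did the app do the right thing?
        browser.close()
        return ok

if __name__ == "__main__":
    print("passed" if smoke_test("http://localhost:3000") else "failed")
```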
How do you think about, there's kind of like testing, but then there's also making sure that 00:18:03.220 |
And oftentimes we're bad communicators, and don't specify everything up front. 00:18:07.380 |
How do you think about getting all of that specification? 00:18:10.360 |
Do you have something like deep research, where it kind of grills the user back and forth at 00:18:17.240 |
So we are changing the planning experience as we speak, and we're going to launch it very 00:18:23.800 |
It's hard to reconcile how most of the users have been trained by products like ChatGPT, 00:18:29.920 |
and actually how we expect them to use a coding agent, or in general any agent. 00:18:34.080 |
Because if you have a complicated task that you want to express, let's say in the case of 00:18:38.320 |
building software, you basically want to submit a PRD, that's what like every PM is capable 00:18:47.020 |
Or what they do is they write a two-line prompt, they throw it into Claude, they get back 00:18:51.800 |
a long PRD, and then they expect the agent to pedantically follow every single item in that PRD. 00:19:00.760 |
The challenge here is to make both kinds of people happy: those that love to use it as a chatbot, so that they 00:19:13.200 |
You know, we did a course with Andrew Ng, who's going to be on stage in a few hours, 00:19:16.720 |
just to tell people if you want to use it that way, it's important that you split your main 00:19:21.320 |
goal into subtasks, and basically you submit them sequentially. 00:19:25.320 |
But at the same time, I would love to reach a point where we go through each subtask in isolation, 00:19:31.640 |
And maybe we only ask for feedback after, say, one hour, and then it's up to you as a user to 00:19:35.900 |
find out if you accomplished everything that you wanted. 00:19:38.820 |
But I think there is so much that can be done autonomously that maybe brings, say, 90% close 00:19:45.340 |
And then when we get their attention back, we basically ask them to polish the user experience 00:19:53.600 |
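A minimal sketch of that "split the goal into subtasks and submit them sequentially" advice, with run_agent as a stand-in for whatever agent call you use:

```python
# Hypothetical sketch of sequential subtask submission. `run_agent` is a stand-in
# for an actual agent invocation; here it just echoes the prompt.
def run_agent(prompt: str) -> str:
    return f"[agent completed]: {prompt}"

goal = "Build an internal tool to track customer feedback"
subtasks = [
    "Set up the data model for feedback items",
    "Build the submission form",
    "Build a dashboard listing and filtering feedback",
    "Add authentication for team members",
]

for i, task in enumerate(subtasks, start=1):
    result = run_agent(f"Step {i}/{len(subtasks)} toward: {goal}\nTask: {task}")
    print(result)  # review (or auto-check) each step before moving on
```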
You mentioned observability and thinking about that early on. 00:19:59.300 |
What have you learned as Replit Agent has gone crazy viral? 00:20:03.360 |
That observability is even harder than expected, regardless of the fact that you guys are building 00:20:17.300 |
So first of all, this feels a bit like back in the days when we were discussing what is the 00:20:25.520 |
With the tools, you know, one size does not fit all in this case. 00:20:30.200 |
And there is the Datadog-style observability that is still very useful. 00:20:35.640 |
Like you want to have aggregates, you want to have dashboards that tell you you're failing 00:20:39.700 |
to use this tool 50% of the time, and then ring an alert and go ahead and fix it. 00:20:45.460 |
At the same time, something like LangSmith is extremely important because unfortunately we're still 00:20:50.640 |
at the kind of like assembly era of debugging for agents. 00:20:54.320 |
I think you would agree with me because when you are trying to understand why the agent has 00:21:00.640 |
made, you know, the wrong choice or is going sideways, your last resort is to actually read 00:21:06.820 |
the entire input and the generated output and try to figure out why certain choices 00:21:12.780 |
So it's much more effort to debug compared to an advanced distributed system, in my humble opinion. 00:21:21.560 |
You have something that looks like a step debugger, but rather than showing you the state in memory, 00:21:25.920 |
you need to read 100,000 tokens and figure out what's wrong. 00:21:30.240 |
So I think we are at the early stages of observability. 00:21:33.440 |
But what I recommend everyone who starts to really think of building an agent or like any 00:21:38.040 |
agentic workflow is invest in observability from day one. 00:21:42.360 |
Otherwise, you're going to be lost immediately and you're probably going to give up because 00:21:46.220 |
you're going to think it's impossible to pull this off. 00:21:48.720 |
And I hope that we are proof and many other companies are proof that it's not impossible. 00:21:55.180 |
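As a small illustration of the Datadog-style, aggregate side of that observability (per-tool failure rates feeding an alert), here is a hypothetical sketch; the run records and threshold are made up, and trace-level debugging of the LangSmith kind is a separate concern.

```python
# Hypothetical sketch of aggregate observability: compute per-tool failure rates
# from run records and flag any tool failing more than a threshold.
from collections import Counter

def failing_tools(runs, threshold: float = 0.5):
    total, failed = Counter(), Counter()
    for r in runs:
        total[r["tool"]] += 1
        if not r["ok"]:
            failed[r["tool"]] += 1
    return {t: failed[t] / total[t] for t in total if failed[t] / total[t] >= threshold}

runs = [
    {"tool": "edit_file", "ok": True},
    {"tool": "edit_file", "ok": False},
    {"tool": "run_shell", "ok": True},
    {"tool": "edit_file", "ok": False},
]

for tool, rate in failing_tools(runs).items():
    print(f"ALERT: {tool} failing {rate:.0%} of the time")  # e.g. edit_file at 67%
```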
who do you see kind of being the best-- who debugs these agents? 00:22:01.640 |
I mean, you guys are building a technical product. 00:22:03.360 |
So presumably everyone has some product sense and product feel for it. 00:22:07.740 |
But is there a particular persona that spends the majority of their time in LangSmith looking 00:22:14.120 |
at logs, or who has the best kind of like skill or knack or intuition for that? 00:22:18.440 |
Given the size of Replit today, we are like barely 75 people across the entire company. 00:22:26.140 |
The way we work is everyone does a bit of everything. 00:22:28.500 |
So even if you're an AI engineer and you are the person who has been optimizing the prompts, 00:22:32.440 |
but there is a page and something is broken, most of the people in the technical team are capable 00:22:38.700 |
of going all the way from almost the product surface to the metal. 00:22:42.420 |
Now, what makes it a bit more challenging for Replit is that we own the entire stack. 00:22:47.440 |
So we have the execution plane where we orchestrate all the containers. 00:22:52.020 |
We have the control plane, which is basically like a combination of our agent code base, 00:22:56.940 |
LangGraph-style orchestration, and all the way to the product. 00:23:00.940 |
So it's important, unfortunately, as of now, to be capable of reading the traces all the way down. 00:23:08.840 |
You know, even one of the tools we invoke, maybe the interface is correct, but it could 00:23:17.160 |
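For readers unfamiliar with what "LangGraph-style orchestration" in a control plane can look like, here is a minimal, illustrative LangGraph sketch of a plan/act loop; the state shape, node names, and stopping condition are assumptions for the example, not Replit's graph.

```python
# Minimal illustrative plan/act loop with LangGraph (not Replit's actual graph).
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    task: str
    steps_done: int

def plan(state: AgentState) -> AgentState:
    # In a real agent this would call a model to produce a plan.
    return {**state}

def act(state: AgentState) -> AgentState:
    # In a real agent this would execute a tool call and observe the result.
    return {**state, "steps_done": state["steps_done"] + 1}

def should_continue(state: AgentState) -> str:
    # Arbitrary stopping condition for the sketch: three action steps.
    return "act" if state["steps_done"] < 3 else END

graph = StateGraph(AgentState)
graph.add_node("plan", plan)
graph.add_node("act", act)
graph.set_entry_point("plan")
graph.add_edge("plan", "act")
graph.add_conditional_edges("act", should_continue)

app = graph.compile()
print(app.invoke({"task": "build a todo app", "steps_done": 0}))
```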
We've talked a bit about the journey from v1 to v2, and maybe to close us off, what's coming 00:23:24.920 |
What are some things that are on the roadmap that we can expect? 00:23:29.020 |
You know, I expect us to bring computer use, or in general, like making it easier to test applications. 00:23:35.120 |
At the same time, I'm also very bullish on bringing in software testing in the loop. 00:23:40.480 |
The beauty of building a coding agent is that code is far more observable, and there are 00:23:46.760 |
way more tools that you can apply on code to test if it's correct or not. 00:23:51.840 |
And last but not least, I want to work even further on test-time compute, where, 00:23:57.840 |
as of today, we already use a fair amount of tokens, as you know. 00:24:03.180 |
But definitely we want to explore both sampling and parallelism. 00:24:07.020 |
We see this especially at the beginning: a lot of our users open several projects in 00:24:12.380 |
parallel and do the initial build, so that they can see which one matches their UI taste 00:24:18.460 |
I imagine taking this concept and carrying it along the entire trajectory, where you sample, 00:24:22.880 |
and then you rank and pick the best solution for the problem. 00:24:26.320 |
So this will be like for our high spenders, but it definitely helps you to get better performance.
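A minimal sketch of that "sample in parallel, then rank and pick the best" idea; generate_candidate and score are stand-ins for a full agent build and a real evaluator (tests, human taste, or an LLM judge).

```python
# Hypothetical best-of-n sketch: generate several candidates in parallel,
# score them, and keep the best one. Stand-in generation and scoring only.
from concurrent.futures import ThreadPoolExecutor
import random

def generate_candidate(prompt: str, seed: int) -> str:
    random.seed(seed)
    return f"candidate #{seed} for: {prompt} (quality {random.random():.2f})"

def score(candidate: str) -> float:
    # Parse the fake quality number back out; a real scorer would run tests or a judge.
    return float(candidate.rsplit(" ", 1)[-1].rstrip(")"))

prompt = "landing page for a bakery"
with ThreadPoolExecutor(max_workers=4) as pool:
    candidates = list(pool.map(lambda s: generate_candidate(prompt, s), range(4)))

best = max(candidates, key=score)
print("picked:", best)
```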