Building Replit Agent v2

00:00:05.000 |
I'm thrilled to announce our next speaker from Replit. 00:00:11.960 |
They've made programming accessible to anyone. 00:00:15.000 |
They've revolutionized how we write and deploy software. 00:00:26.760 |
And so I'd like to welcome my friend Michele, 00:00:28.880 |
president at Replit, to the stage for our fireside chat. 00:00:57.000 |
You guys launched V2 of Replit Agent six months ago? 00:01:03.220 |
MICHELE: Early access at the end of February, GA in late March. 00:01:06.960 |
INTERVIEWER: And I've heard nothing but fantastic things 00:01:10.340 |
And so if people haven't tried out Replit Agent in the last two 00:01:18.440 |
MICHELE: I think the shortest possible summary 00:01:20.440 |
is autonomy, the level of autonomy that it showcases compared 00:01:25.440 |
If you tried V1 starting from September last year, 00:01:28.440 |
you recall that it was working autonomously for a couple of 00:01:32.440 |
MICHELE: And right now it's not uncommon to see it 00:01:37.160 |
And what I mean by running is not spinning its 00:01:39.620 |
wheels, but rather doing useful work and accomplishing 00:01:44.380 |
And it took a lot of re-architecting and also thanks 00:01:50.100 |
And things we learned, to be honest, like shipping things 00:01:55.000 |
I think we learned a lot of tweaks to make the agent overall 00:01:58.820 |
INTERVIEWER: Are you able to share any of those tweaks? 00:02:03.920 |
MICHELE: I would say I usually have two pillars, which, 00:02:07.020 |
by the way, I'm going to reiterate what you just 00:02:15.420 |
Otherwise, especially as your agent becomes more advanced, 00:02:18.800 |
you have less and less idea of whether you're introducing 00:02:26.860 |
I mean, as you know, we use LangSmith pretty thoroughly. 00:02:31.800 |
And I think we are all learning, as a field, how to do 00:02:36.060 |
It's a completely different animal compared to how we built 00:02:40.800 |
INTERVIEWER: One of the things that I'd love to hear more 00:02:44.800 |
about: when we did a separate fireside chat, maybe in December, 00:02:49.060 |
we talked about the human-in-the-loop experience and how 00:02:54.020 |
Now you're saying these agents are more autonomous. 00:02:58.760 |
Has that changed, or is it just present in a different way? 00:03:02.460 |
There is this constant tension between wanting to put the human 00:03:05.420 |
in the loop so that you can break the agentic flow and make sure 00:03:08.680 |
that in case it's going sideways, the human can bring it back 00:03:12.800 |
But at the same time, what we're experiencing from our users 00:03:15.580 |
is that when the agent is actually working correctly, 00:03:21.840 |
And the bar keeps rising, basically on a monthly basis. 00:03:24.460 |
The more we can get done, maybe it takes a week for the user 00:03:27.340 |
to get used to that, and then they just want more. 00:03:29.800 |
So I think the strategy that we're following at the moment 00:03:32.980 |
is we try to push notifications also to other platforms. 00:03:37.840 |
We have a mobile app, for instance, that basically 00:03:40.000 |
allows us to bring the user's attention back. 00:03:43.420 |
But at the same time, there is always a chat available 00:04:01.860 |
And I'm trying to build a product that makes both of them happy. 00:04:11.500 |
On the topic of users, how are people using Replit Agent? 00:04:21.580 |
Who are the users that you're thinking of targeting? 00:04:25.760 |
So starting from early February, we finally opened our free tier. 00:04:29.880 |
So everyone can use Replit just by creating an account. 00:04:33.360 |
And we are on track to create roughly 1 million applications 00:04:36.840 |
So that's the level of scale that we reach today. 00:04:40.440 |
A lot of them are just testing what agents can do. 00:04:43.940 |
And I think it's the same high that we got when we were younger, 00:04:47.660 |
when we wrote our first piece of code and actually saw it running. 00:04:50.820 |
That's what a lot of people are chasing when first trying the agent. 00:04:53.540 |
Like realizing that you can actually build software even without having any coding background. 00:04:58.840 |
At the same time, some of them get hooked. 00:05:01.080 |
And they realize, oh, I can build what I need for my business. 00:05:06.540 |
And that's when they start to work on much more ambitious applications. 00:05:10.260 |
So I think one of the key differences of our product is the fact that it's not used mostly 00:05:15.600 |
to create simple landing pages or prototypes, but rather people find value in very long trajectories. 00:05:21.980 |
I've seen people spending hundreds of hours on a single project with Replit, writing 00:05:27.240 |
absolutely no lines of code, just making progress with the agent. 00:05:30.800 |
That is, first of all, a great technical challenge because it makes things much harder for several 00:05:36.140 |
different reasons, and the people that are spending so much time, they are usually either 00:05:46.020 |
There is this concept of unbundling SaaS, the idea that, why would 00:05:51.660 |
I spend seven figures buying a very expensive SaaS when I only need two features? 00:05:56.640 |
I'm going to rather rebuild it and deploy it internally in the company. 00:06:00.480 |
So this is one direction that I see a lot more companies working on. 00:06:04.480 |
And at the same time, also personalized applications for professionals or even people that have 00:06:11.840 |
their own hobby and they want to build software based on that. 00:06:20.120 |
For people who have agents and are maybe starting with agents on the lower end of autonomy and 00:06:26.820 |
are thinking of letting it run now for 10, 15 minutes like you are, how did you have the 00:06:33.340 |
When was the point where you were like, okay, we can bring the human out of the loop and 00:06:38.060 |
Was that based on feedback from users, internal testing, metrics? 00:06:42.520 |
What did that process to get that confidence look like? 00:06:47.460 |
Even before we launched V1, we had a prototype of it since early 2024. 00:06:52.320 |
So we have always been trying to make it work. 00:06:55.920 |
And the moment we find the right unlocks, which are partially due to what the frontier labs are working 00:07:02.460 |
And at the same time, it's also due to how good the scaffold that we're building is. 00:07:06.800 |
The moment it works well enough, then that's when we start to feel we should launch this. 00:07:10.340 |
We should put it at least in front of a small alpha users cohort. 00:07:15.000 |
What happened with V2 is that we re-architected it to best leverage the latest models out there, 00:07:22.520 |
and then we started to use it a lot internally. 00:07:25.060 |
And we started with an approach that was a bit more similar to V1, so we were more cautious. 00:07:33.900 |
So we wanted to say, "Okay, how far can we take this? 00:07:37.660 |
And it turns out that it exceeded our expectations. 00:07:40.680 |
So the confidence, in all honesty, as usual, came during the early access program where we 00:07:49.180 |
We asked users just through social to go and try it. 00:07:52.460 |
And then we received exceedingly positive feedback. 00:07:55.720 |
And then as a team, we rushed to basically go to GA as soon as possible. 00:08:02.160 |
Are you able to share what models you all are using or how generally you think of the model 00:08:09.820 |
We are heavy users of the Sonnet models, especially 3.7, as it unlocks a new level of autonomy for 00:08:18.000 |
So I see overall the industry pointing in that direction, like the latest Gemini 2.5 Pro is 00:08:27.120 |
And I do believe that Frontier Labs are realizing that there is a lot of value in allowing companies 00:08:33.820 |
like ours and all your customers to create much more advanced, agentic workflows compared 00:08:42.240 |
So I wouldn't be surprised if in the next few months we are going to see all the top models 00:08:46.760 |
exposing tools and being post-trained in such a way that allows you to have much more autonomy 00:08:54.280 |
And do you let users choose what model is used under the hood, or is that hidden? 00:09:02.020 |
No, we are very opinionated, and it's also product choice. 00:09:06.660 |
In all honesty, there are platforms where, of course, you can pick your model. 00:09:11.960 |
We use Cursor internally at Replit, for example, to develop parts of it. 00:09:15.240 |
So I think it's great to have a model selector and get the best possible performance from 00:09:20.760 |
the different models available on the market. 00:09:22.760 |
In our case, it would be a fairly big challenge to allow you to switch models. 00:09:31.760 |
3.7 is kind of like the foundation, the main building block for the IQ of the agent. 00:09:38.280 |
But we also use a lot of other models to do a lot of accessory functions. 00:09:43.280 |
Especially when we can trade off latency for performance, then we go with flash models or 00:09:52.280 |
So we don't give you that optionality, because it would be very hard for us to even maintain 00:10:01.800 |
We go very deep into the rabbit hole of optimizing the prompts. 00:10:04.560 |
It would be very hard for me to go from n=1 to n=3 prompt sets. 00:10:11.800 |
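For readers building something similar, here is a minimal, hypothetical sketch of the single-model policy Michele describes: one pinned frontier model for the core agent loop, cheaper fast models for accessory work. The model IDs, task kinds, and the pick_model helper are illustrative assumptions, not Replit's actual code.

```python
# Hypothetical sketch of a fixed model-routing policy (not Replit's actual code).
# Model IDs and task kinds are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class ModelChoice:
    name: str
    max_tokens: int

# One opinionated mapping: the "IQ" of the agent comes from a single frontier model,
# while accessory functions (summarizing logs, naming files, quick classifications)
# trade quality for latency and cost.
ROUTES = {
    "core_agent": ModelChoice("claude-3-7-sonnet", max_tokens=8192),
    "summarize":  ModelChoice("gemini-flash",      max_tokens=1024),
    "classify":   ModelChoice("gemini-flash",      max_tokens=256),
}

def pick_model(task_kind: str) -> ModelChoice:
    """Return the pinned model for a task kind; fall back to the core model."""
    return ROUTES.get(task_kind, ROUTES["core_agent"])

if __name__ == "__main__":
    print(pick_model("summarize"))  # fast, cheap accessory model
    print(pick_model("planning"))   # falls back to the core frontier model
```

With a single prompt set tuned against one core model, adding a model selector would multiply the prompt-maintenance burden, which is the n=1 to n=3 problem mentioned above.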
Do you use any open source models as well as part of this, or is it mostly foundation 00:10:18.600 |
At this point, it's mostly foundation models. 00:10:21.100 |
We definitely spent some time testing DeepSeek, and I'm very bullish on it over time. 00:10:27.580 |
The reason why we're not investing too much time today fine-tuning or exploring open source 00:10:33.020 |
models at length is because, again, the labs are moving at a completely different pace compared 00:10:40.320 |
I think back in the days when we got to know each other, maybe there was a new leap every 00:10:45.360 |
Now it's probably happening every couple of months. 00:10:47.960 |
So it's better to explore what you can do today with Frontier Labs. 00:10:52.660 |
And then eventually, when things slow down, if they will ever slow down, by the way, or if 00:10:56.660 |
there is a reason for us to take an open source model, fine-tune it, and perhaps try to optimize 00:11:02.200 |
some of the key actions that our agent takes, then I'd be happy to spend time there. 00:11:08.000 |
But for now, it's already very frantic, as it is. 00:11:12.040 |
You've mentioned kind of like the trade-off between cost and latency, and then there's 00:11:18.080 |
How do you think about that now, and how have you thought about that over time? 00:11:23.820 |
Because RepliAgent, I feel like, at least based on what I see on Twitter, has exploded 00:11:30.080 |
And so was there a moment-- like, I think everyone kind of has some fear when they launch 00:11:36.120 |
Like, if this becomes really popular, like, it's going to bankrupt me. 00:11:40.120 |
And so did you guys have that fear as you started to see things take off? 00:11:43.920 |
I still have that fear, so it doesn't change much, trust me. 00:11:47.960 |
So I think I went on a podcast, probably in early October last year, of course, saying 00:11:53.420 |
that the three dimensions you want to optimize are performance, cost, and latency. 00:11:59.960 |
And for me, performance and cost are almost at the same level in terms of importance. 00:12:05.360 |
And then, already back in the V1 days, I was using latency as a far third. 00:12:11.780 |
It doesn't change much today with V2, if anything, that gap has become even wider. 00:12:17.740 |
It runs for so long, and possibly that was the scariest bet we did when we launched it, 00:12:24.020 |
especially when we put it on and we made it GA. 00:12:27.300 |
And the reason is, we were already not emphasizing too much the latency component, but we strongly 00:12:33.260 |
believe that it's far more important for the agent to get done what people want, and especially 00:12:38.940 |
for the ICP that we have in mind, which is non-technical people. 00:12:42.880 |
So we went almost like one order of magnitude in terms of additional latency. 00:12:47.500 |
And the reaction has been fairly non-controversial, I think, and maybe for the first week we heard 00:12:52.900 |
some people being shocked about the amount of time it was taking, but the moment you realize 00:12:57.260 |
how much more it gets done, and the amount of headaches that it solves for you, because you 00:13:04.060 |
Even if you debug it with the agent, with an older version of the agent, you have to know 00:13:09.120 |
Right now, it's not the case anymore, oftentimes. 00:13:11.060 |
So do you see people modifying the code manually still, or is it completely hands-off? 00:13:18.740 |
We have an internal metric, and it's one of my North Stars, to be honest. 00:13:22.680 |
We try to track how often people go back into our editor, which, by the way, we have been 00:13:26.880 |
hiding in the product since we launched Agent V1. 00:13:33.920 |
The main product, for those who didn't know Replit before we launched the agent, was an editor 00:13:39.920 |
We started by still showing you the file tree, then now it's hidden by default, and then it 00:13:44.920 |
takes some effort to get in front of the editor. 00:13:47.920 |
We started where, I think, one user out of four were actually still editing the code, especially 00:13:55.620 |
I think as of today, we've arrived at a point where it's one out of ten doing that. 00:14:00.320 |
And my goal is, eventually, it should be like zero users willing to put their hands on the 00:14:06.100 |
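As a rough illustration of that north-star metric, here is a hypothetical sketch that computes the fraction of active projects where the user opened the editor; the event names and record shapes are made up for the example.

```python
# Hypothetical sketch of a "manual edit rate" metric: of all active projects in a
# window, what fraction had the user open the editor? Event shapes are illustrative.
def editor_open_rate(events) -> float:
    opened, active = set(), set()
    for e in events:
        active.add(e["project"])
        if e["type"] == "editor_opened":
            opened.add(e["project"])
    return len(opened) / len(active) if active else 0.0

events = [
    {"project": "p1", "type": "agent_checkpoint"},
    {"project": "p1", "type": "editor_opened"},
    {"project": "p2", "type": "agent_checkpoint"},
    {"project": "p3", "type": "agent_checkpoint"},
]

print(f"{editor_open_rate(events):.0%} of projects saw a manual editor open")  # ~33%
```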
One of the cool features of Replit that I remember from before Agent was kind of like 00:14:15.300 |
When people build agents, is there a collaborative aspect to it, or is it mostly kind of like—sorry, 00:14:20.120 |
when people build apps with Agent, is it mostly one person using the agent, or is it sometimes 00:14:24.620 |
collaborative as well, with several people interacting with the agent? 00:14:28.320 |
So for our consumers around the world, yes, most of them, I think, are just single-player 00:14:34.300 |
experience, especially more like in a business and enterprise setting. 00:14:40.040 |
We bring them in in a team so everyone can see each other's projects. 00:14:46.560 |
Now, we have a giant lock as of now, for reasons I'm happy to explain. 00:14:51.560 |
But, you know, we see oftentimes in the chat logs that there are several people sending, basically, 00:14:58.080 |
The challenge why it's still hard to run a lot of agents in parallel is not that much on the 00:15:03.560 |
Like, we have everything it takes to run multiple instances because we already run at scale, so 00:15:10.380 |
The real challenge is how do you merge all the different, you know, patches, basically PRs 00:15:16.380 |
that the agent creates, which is a non-trivial problem still, even for frontier AI models. 00:15:22.380 |
Like, merge conflicts are hard, unfortunately. 00:15:25.200 |
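To make the merge problem concrete, here is a hypothetical sketch that uses plain git to dry-run whether each agent-produced patch still applies cleanly before merging; the repository path and patch filenames are placeholders, and actual conflict resolution is out of scope.

```python
# Hypothetical sketch: before merging work from parallel agent runs, check whether
# each agent-produced patch still applies cleanly to the current tree with plain git.
# Conflict *resolution* is the hard part and is not attempted here.
import subprocess

def patch_applies_cleanly(repo_dir: str, patch_path: str) -> bool:
    """Dry-run `git apply --check`; a non-zero exit code signals a conflict."""
    result = subprocess.run(
        ["git", "apply", "--check", patch_path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0

patches = ["agent_a.patch", "agent_b.patch"]  # illustrative filenames
for p in patches:
    status = "clean" if patch_applies_cleanly(".", p) else "conflict, needs resolution"
    print(p, "->", status)
```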
You mentioned earlier that there's some app for using Repl.it and getting notifications. 00:15:34.960 |
Where I'm going with this is when this agent's running for, like, 10, 15 minutes, how does 00:15:39.020 |
it-- like, what are the communication patterns you're seeing? 00:15:42.020 |
Are they just keeping the browser open and looking there? 00:15:46.020 |
Is it this app that sends them a push-- like, what are you seeing being helpful there? 00:15:51.400 |
And has that changed as the agent gets longer and longer running? 00:15:55.960 |
So with V1, most of the users were in front of the screen all the time, because the feedback 00:16:05.120 |
And I think there was also quite a bit to learn from what the agent was doing. 00:16:12.900 |
If you're curious, you can basically expand every single action it does. 00:16:16.580 |
If you want, you can see the output of every single tool we run. 00:16:21.560 |
So there is a subset of users that are using the agent not only because they want to build 00:16:27.080 |
something, but also because they want to speedrun their learning experience. 00:16:30.780 |
It teaches you how to build 0 to 1 apps in possibly the best possible way. 00:16:35.960 |
There are also users that absolutely don't care, and they just launch, they submit a prompt, 00:16:41.520 |
and then they go back, maybe they go to it, and then they go back and check Replit. 00:16:45.200 |
To make sure that the loop is a bit tighter, the Replit mobile app, that is available both 00:16:50.880 |
in App Store and Android, sends you notifications when the agent wants your feedback. 00:16:56.160 |
And the vision that we have for the next release is to send you even fewer notifications. 00:17:01.880 |
And the idea is, right now, one of the bottlenecks, at least for us, is the fact that we rely solely 00:17:10.560 |
But, as you know, more and more progress is happening on the computer use side. 00:17:16.560 |
You know, Anthropic launched that back in late October, if I recall correctly. 00:17:20.240 |
OpenAI fast-followed, and open source is also catching up. 00:17:23.740 |
You know, I see Hugging Face launched something similar a week ago. 00:17:26.740 |
That is something that we are actively working on to remove even, you know, this additional 00:17:33.240 |
Because a lot of the time what we ask you to test is fairly trivial. 00:17:37.240 |
So, like, it's data input and clicking around a very simple interface. 00:17:42.700 |
I expect us to be able to do that with computer use very soon. 00:17:46.920 |
Bring it into the product, and then jump from, say, ten minutes of autonomy to one hour of autonomy. 00:17:53.040 |
That is my target, you know, for V3, hopefully in a few months. 00:17:56.400 |
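As a sketch of the kind of trivial check being described (data input and clicking around a simple interface), here is what an automated version might look like with Playwright; the URL, selectors, and expected text are placeholders, and this is not how Replit implements it.

```python
# Hypothetical sketch of an automated UI smoke test (fill a form, click, assert),
# using Playwright instead of asking the user to verify by hand.
from playwright.sync_api import sync_playwright

def smoke_test(url: str) -> bool:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        page.fill("#name", "Test User")       # simple data input
        page.click("button[type=submit]")     # click around the interface
        ok = page.is_visible("text=Saved")    # did the app do the right thing?
        browser.close()
        return ok

if __name__ == "__main__":
    print("passed" if smoke_test("http://localhost:3000") else "failed")
```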
How do you think about, there's kind of like testing, but then there's also making sure that 00:18:03.220 |
And oftentimes we're bad communicators, and don't specify everything up front. 00:18:07.380 |
How do you think about getting all of that specification? 00:18:10.360 |
Do you have something like deep research, where it kind of grills the user back and forth at 00:18:17.240 |
So we are changing the planning experience as we speak, and we're going to launch it very 00:18:23.800 |
It's hard to reconcile how most of the users have been trained by products like ChatGPT, 00:18:29.920 |
and actually how we expect them to use a coding agent, or in general any agent. 00:18:34.080 |
Because if you have a complicated task that you want to express, let's say in the case of 00:18:38.320 |
building software, you basically want to submit a PRD, that's what like every PM is capable 00:18:47.020 |
Or what they do is they write a two-line prompt, they throw it into Claude, they get back 00:18:51.800 |
a long PRD, and then they expect the agent to pedantically follow every single item in that PRD. 00:19:00.760 |
The challenge here is to make both kinds of people happy: those that love to use it as a chatbot, so that they 00:19:13.200 |
You know, we did a course with Andrew Ng, who's going to be on stage in a few hours, 00:19:16.720 |
just to tell people if you want to use it that way, it's important that you split your main 00:19:21.320 |
goal into subtasks, and basically you submit them sequentially. 00:19:25.320 |
But at the same time, I would love to reach a point where we go through each subtask in isolation, 00:19:31.640 |
And maybe we only ask for feedback after, say, one hour, and then it's up to you as a user to 00:19:35.900 |
find out if you accomplished everything that you wanted. 00:19:38.820 |
But I think there is so much that can be done autonomously that maybe brings, say, 90% close 00:19:45.340 |
And then when we get their attention back, we basically ask them to polish the user experience 00:19:53.600 |
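A minimal sketch of that "split the goal into subtasks and submit them sequentially" advice, with run_agent as a stand-in for whatever agent call you use:

```python
# Hypothetical sketch of sequential subtask submission. `run_agent` is a stand-in
# for an actual agent invocation; here it just echoes the prompt.
def run_agent(prompt: str) -> str:
    return f"[agent completed]: {prompt}"

goal = "Build an internal tool to track customer feedback"
subtasks = [
    "Set up the data model for feedback items",
    "Build the submission form",
    "Build a dashboard listing and filtering feedback",
    "Add authentication for team members",
]

for i, task in enumerate(subtasks, start=1):
    result = run_agent(f"Step {i}/{len(subtasks)} toward: {goal}\nTask: {task}")
    print(result)  # review (or auto-check) each step before moving on
```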
You mentioned observability and thinking about that early on. 00:19:59.300 |
What have you learned as Replit Agent has gone crazy viral? 00:20:03.360 |
That observability is even harder than expected, regardless of the fact that you guys are building 00:20:17.300 |
So first of all, this feels a bit like back in the days when we were discussing what is the 00:20:25.520 |
With the tools, you know, one size does not fit all in this case. 00:20:30.200 |
And there is the Datadog-style observability that is still very useful. 00:20:35.640 |
Like you want to have aggregates, you want to have dashboards that tell you you're failing 00:20:39.700 |
to use this tool 50% of the time, and then ring an alert and go ahead and fix it. 00:20:45.460 |
At the same time, something like LangSmith is extremely important because unfortunately we're still 00:20:50.640 |
at the kind of like assembly era of debugging for agents. 00:20:54.320 |
I think you would agree with me because when you are trying to understand why the agent has 00:21:00.640 |
made, you know, the wrong choice or is going sideways, your last resort is to actually read 00:21:06.820 |
the entire input and the generated output and try to figure out why certain choices 00:21:12.780 |
So it's much more effort to debug compared to an advanced distributed system, in my humble opinion. 00:21:21.560 |
You have something that looks like a step debugger, but rather than showing you the state in memory, 00:21:25.920 |
you need to read 100,000 tokens and figure out what's wrong. 00:21:30.240 |
So I think we are at the early stages of observability. 00:21:33.440 |
But what I recommend everyone who starts to really think of building an agent or like any 00:21:38.040 |
agentic workflow is invest in observability from day one. 00:21:42.360 |
Otherwise, you're going to be lost immediately and you're probably going to give up because 00:21:46.220 |
you're going to think it's impossible to pull this off. 00:21:48.720 |
And I hope that we are proof and many other companies are proof that it's not impossible. 00:21:55.180 |
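As a small illustration of the Datadog-style, aggregate side of that observability (per-tool failure rates feeding an alert), here is a hypothetical sketch; the run records and threshold are made up, and trace-level debugging of the LangSmith kind is a separate concern.

```python
# Hypothetical sketch of aggregate observability: compute per-tool failure rates
# from run records and flag any tool failing more than a threshold.
from collections import Counter

def failing_tools(runs, threshold: float = 0.5):
    total, failed = Counter(), Counter()
    for r in runs:
        total[r["tool"]] += 1
        if not r["ok"]:
            failed[r["tool"]] += 1
    return {t: failed[t] / total[t] for t in total if failed[t] / total[t] >= threshold}

runs = [
    {"tool": "edit_file", "ok": True},
    {"tool": "edit_file", "ok": False},
    {"tool": "run_shell", "ok": True},
    {"tool": "edit_file", "ok": False},
]

for tool, rate in failing_tools(runs).items():
    print(f"ALERT: {tool} failing {rate:.0%} of the time")  # e.g. edit_file at 67%
```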
who do you see kind of being the best-- who debugs these agents? 00:22:01.640 |
I mean, you guys are building a technical product. 00:22:03.360 |
So presumably everyone has some product sense and product feel for it. 00:22:07.740 |
But is there a particular persona that spends the majority of their time in LangSmith looking 00:22:14.120 |
at logs, or who has the best kind of like skill or knack or intuition for that? 00:22:18.440 |
Given the size of Replit today, we are like barely 75 people across the entire company. 00:22:26.140 |
The way we work is everyone does a bit of everything. 00:22:28.500 |
So even if you're an AI engineer and you are the person who has been optimizing the prompts, 00:22:32.440 |
but there is a page and something is broken, most of the people in the technical team are capable 00:22:38.700 |
of going all the way from almost the product surface to the metal. 00:22:42.420 |
Now, what makes it a bit more challenging for Replit is that we own the entire stack. 00:22:47.440 |
So we have the execution plane where we orchestrate all the containers. 00:22:52.020 |
We have the control plane, which is basically like a combination of our agent code base, 00:22:56.940 |
LangGraph-style orchestration, and all the way to the product. 00:23:00.940 |
So it's important, unfortunately, as of now, to be capable of reading the traces all the way down. 00:23:08.840 |
You know, even one of the tools we invoke, maybe the interface is correct, but it could 00:23:17.160 |
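For readers unfamiliar with what "LangGraph-style orchestration" in a control plane can look like, here is a minimal, illustrative LangGraph sketch of a plan/act loop; the state shape, node names, and stopping condition are assumptions for the example, not Replit's graph.

```python
# Minimal illustrative plan/act loop with LangGraph (not Replit's actual graph).
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    task: str
    steps_done: int

def plan(state: AgentState) -> AgentState:
    # In a real agent this would call a model to produce a plan.
    return {**state}

def act(state: AgentState) -> AgentState:
    # In a real agent this would execute a tool call and observe the result.
    return {**state, "steps_done": state["steps_done"] + 1}

def should_continue(state: AgentState) -> str:
    # Arbitrary stopping condition for the sketch: three action steps.
    return "act" if state["steps_done"] < 3 else END

graph = StateGraph(AgentState)
graph.add_node("plan", plan)
graph.add_node("act", act)
graph.set_entry_point("plan")
graph.add_edge("plan", "act")
graph.add_conditional_edges("act", should_continue)

app = graph.compile()
print(app.invoke({"task": "build a todo app", "steps_done": 0}))
```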
We've talked a bit about the journey from v1 to v2, and maybe to close us off, what's coming 00:23:24.920 |
What are some things that are on the roadmap that we can expect? 00:23:29.020 |
You know, I expect us to bring computer use, or in general, like making it easier to test applications. 00:23:35.120 |
At the same time, I'm also very bullish on bringing in software testing in the loop. 00:23:40.480 |
The beauty of building a coding agent is that code is far more observable, and there are 00:23:46.760 |
way more tools that you can apply on code to test if it's correct or not. 00:23:51.840 |
And last but not least, I want to work even further on test-time compute, where, 00:23:57.840 |
as of today, we already use a fair amount of tokens, as you know. 00:24:03.180 |
But definitely we want to explore both sampling and parallelism. 00:24:07.020 |
We see this especially at the beginning: a lot of our users open several projects in 00:24:12.380 |
parallel and do the initial build, so that they can see which one matches their UI taste 00:24:18.460 |
I imagine taking this concept and carrying it along the entire trajectory, where you sample, 00:24:22.880 |
and then you rank and pick the best solution for the problem. 00:24:26.320 |
So this will be like for our high spenders, but it definitely helps you to get better performance.
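A minimal sketch of that "sample in parallel, then rank and pick the best" idea; generate_candidate and score are stand-ins for a full agent build and a real evaluator (tests, human taste, or an LLM judge).

```python
# Hypothetical best-of-n sketch: generate several candidates in parallel,
# score them, and keep the best one. Stand-in generation and scoring only.
from concurrent.futures import ThreadPoolExecutor
import random

def generate_candidate(prompt: str, seed: int) -> str:
    random.seed(seed)
    return f"candidate #{seed} for: {prompt} (quality {random.random():.2f})"

def score(candidate: str) -> float:
    # Parse the fake quality number back out; a real scorer would run tests or a judge.
    return float(candidate.rsplit(" ", 1)[-1].rstrip(")"))

prompt = "landing page for a bakery"
with ThreadPoolExecutor(max_workers=4) as pool:
    candidates = list(pool.map(lambda s: generate_candidate(prompt, s), range(4)))

best = max(candidates, key=score)
print("picked:", best)
```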