
Building Replit Agent v2


Whisper Transcript

00:00:05.000 | I'm thrilled to announce our next speaker from Replit.
00:00:09.600 | You all are probably familiar with Replit.
00:00:11.960 | They've made programming accessible to anyone.
00:00:15.000 | They've revolutionized how we write, deploy,
00:00:17.820 | and collaborate on code, empowering
00:00:21.000 | a community of over 30 million developers
00:00:23.680 | to build more efficiently than ever before.
00:00:26.760 | And so I'd like to welcome my friend Michele,
00:00:28.880 | president at Replit, to the stage for our fireside chat.
00:00:32.600 | Welcome, Michele.
00:00:33.880 | Thanks for coming.
00:00:39.840 | Thanks for being here.
00:00:48.640 | MICHELE CATASTA: Thanks for inviting me.
00:00:49.680 | Excited about it.
00:00:50.920 | INTERVIEWER: So I think most people are probably
00:00:52.820 | familiar with Replit and what they do.
00:00:57.000 | You guys launched V2 of Replit Agent six months ago?
00:01:00.720 | MICHELE CATASTA: Oh, I wish.
00:01:01.720 | Two months ago.
00:01:02.000 | INTERVIEWER: Two months ago.
00:01:02.720 | OK.
00:01:03.220 | MICHELE CATASTA: Early access end of February, GA late March.
00:01:06.960 | INTERVIEWER: And I've heard nothing but fantastic things
00:01:09.760 | about it.
00:01:10.340 | And so if people haven't tried out Replit Agent in the last two
00:01:13.720 | months, what is new?
00:01:15.440 | What is different?
00:01:17.440 | Why should they try it out?
00:01:18.440 | MICHELE CATASTA: I think the shortest possible summary
00:01:20.440 | is autonomy, the level of autonomy that it showcases compared
00:01:24.440 | to V1.
00:01:25.440 | If you tried V1 starting from September last year,
00:01:28.440 | you recall that it was working autonomously for a couple of
00:01:31.440 | minutes at most.
00:01:32.440 | And right now it's not uncommon to see it
00:01:35.260 | running for 10, 15 minutes.
00:01:37.160 | And what I mean by running is not spinning the
00:01:39.620 | wheels, but rather doing useful work and accomplishing
00:01:42.620 | what the user wants.
00:01:44.380 | And it took a lot of re-architecting, and it's also thanks
00:01:48.200 | to the new models that are coming out.
00:01:50.100 | And things we learned, to be honest, like shipping things
00:01:52.560 | in production teaches you a lot.
00:01:55.000 | I think we learned a lot of tweaks to make the agent overall
00:01:57.320 | better in these months.
00:01:58.820 | INTERVIEWER: Are you able to share any of those tweaks?
00:02:01.240 | MICHELE CATASTA: Yeah.
00:02:02.420 | Where do I start from?
00:02:03.920 | I would say I usually have two pillars, which,
00:02:07.020 | by the way, I'm going to reiterate what you just
00:02:09.180 | explained during your keynote.
00:02:11.120 | On one end, investing early in evaluations--
00:02:14.280 | extremely important.
00:02:15.420 | Otherwise, especially the more advanced your agent becomes,
00:02:18.800 | the less idea you have of whether you're introducing
00:02:20.960 | regressions or actually making progress.
00:02:23.380 | And the other one is observability.
00:02:25.640 | We can go deep in there.
00:02:26.860 | I mean, as you know, we use LangSmith pretty thoroughly.
00:02:30.300 | We also use another set of tools.
00:02:31.800 | And I think we are all learning, as a field, how to do
00:02:34.680 | observability on agents.
00:02:36.060 | It's a completely different animal compared to how we built
00:02:38.860 | distributed systems in the past decades.
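
(To make the evals pillar concrete, here is a minimal regression-harness sketch in Python. The task list, grader, and fake agent builds are illustrative placeholders, not Replit's actual setup: the idea is simply to replay a fixed task suite against two agent builds and compare pass rates, so a regression shows up as a number instead of a hunch.)

```python
# A minimal regression harness, illustrative only: replay a fixed task
# suite against two agent builds and compare pass rates.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalTask:
    prompt: str                   # what the user asks the agent to do
    check: Callable[[str], bool]  # grader over the agent's final output

def pass_rate(agent: Callable[[str], str], tasks: list[EvalTask]) -> float:
    # Fraction of tasks whose output the grader accepts.
    passed = sum(task.check(agent(task.prompt)) for task in tasks)
    return passed / len(tasks)

def has_regressed(baseline: Callable[[str], str],
                  candidate: Callable[[str], str],
                  tasks: list[EvalTask], tolerance: float = 0.02) -> bool:
    # Flag the candidate build if it loses more than `tolerance` pass rate.
    return pass_rate(candidate, tasks) < pass_rate(baseline, tasks) - tolerance

# Toy usage: two fake "agents" stand in for real agent builds.
tasks = [EvalTask("build a todo app", check=lambda out: "todo" in out)]
old_build = lambda prompt: "todo app scaffolded"  # passes the grader
new_build = lambda prompt: "blank app"            # fails: a regression
print(has_regressed(old_build, new_build, tasks))  # -> True
```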
00:02:40.800 | INTERVIEWER: One of the things that I'd love to hear more
00:02:44.800 | about: when we did a separate fireside chat, maybe in December,
00:02:49.060 | we talked about the human-in-the-loop experience and how
00:02:52.060 | that was important at the time.
00:02:54.020 | Now you're saying these agents are more autonomous.
00:02:56.260 | How do you think about that?
00:02:58.760 | Has that changed, or is it just present in a different way?
00:03:01.300 | MICHELE CATASTA: Yeah, you're spot on.
00:03:02.460 | There is this constant tension between wanting to put the human
00:03:05.420 | in the loop so that you can break the agentic flow and make sure
00:03:08.680 | that in case it's going sideways, the human can bring it back
00:03:11.800 | on track.
00:03:12.800 | But at the same time, what we're experiencing from our users
00:03:15.580 | is that when the agent is actually working correctly,
00:03:18.640 | they don't want to be bothered.
00:03:19.840 | They just want it to get things done.
00:03:21.840 | And the bar keeps rising basically on a monthly basis.
00:03:24.460 | The more we can get done, it maybe takes a week for the user
00:03:27.340 | to get used to that, and then they just want more.
00:03:29.800 | So I think the strategy that we're following at the moment
00:03:32.980 | is we try to push notifications to other platforms as well.
00:03:37.840 | We have a mobile app, for instance, that basically
00:03:40.000 | allows us to bring the user's attention back.
00:03:43.420 | But at the same time, there is always a chat available
00:03:46.160 | where you can ask the agent to stop.
00:03:47.740 | You can ask it to do different work,
00:03:49.780 | even while it's actually working.
00:03:52.420 | So it depends, I think, on the user profile.
00:03:54.500 | Some users tend to be more trusting,
00:03:57.280 | and delegate agency to the agent.
00:03:59.340 | And some others are more hands-on.
00:04:01.860 | And I'm trying to build a product that makes both of them happy.
00:04:05.700 | But I think, overall, we are all going
00:04:07.840 | towards more autonomy over time.
00:04:09.420 | And I think that's the winning recipe.
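
(A minimal sketch of the "chat is always available" pattern described above, assuming a simple queue between the UI and the agent: the loop makes progress autonomously but drains user messages between steps, so a human can stop or redirect it mid-run. All names here are illustrative, not Replit's implementation.)

```python
# Illustrative interruptible agent loop: autonomous by default, but a
# human can stop or re-steer it mid-run through a message queue.
import queue

def agent_loop(plan_steps, user_messages: queue.Queue):
    for step in plan_steps:
        # Check for human input without blocking the agent's progress.
        try:
            msg = user_messages.get_nowait()
        except queue.Empty:
            msg = None
        if msg == "stop":
            print("Agent stopped by user.")
            return
        elif msg:  # any other message re-steers the plan
            print(f"Re-planning around user feedback: {msg}")
        print(f"Executing: {step}")

# Usage: the UI thread puts messages on the queue while the loop runs.
msgs = queue.Queue()
msgs.put("use a dark theme instead")
agent_loop(["scaffold app", "build UI", "wire database"], msgs)
```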
00:04:11.500 | INTERVIEWER: On the topic of users, how are people using Replit Agent?
00:04:17.340 | What types of things are they building?
00:04:20.380 | What are their backgrounds?
00:04:21.580 | Who are the users that you're thinking of targeting?
00:04:25.300 | MICHELE CATASTA: Yeah.
00:04:25.760 | So starting from early February, we finally opened our free tier.
00:04:29.880 | So everyone can use Replit just by creating an account.
00:04:33.360 | And we are on track to create roughly 1 million applications
00:04:36.220 | per month.
00:04:36.840 | So that's the level of scale that we reach today.
00:04:40.440 | A lot of them are just testing what agents can do.
00:04:43.940 | And I think it's the same high that we got when we were younger,
00:04:47.660 | when we wrote our first piece of code and actually saw it running.
00:04:50.820 | That's what a lot of people are chasing when first trying the agent.
00:04:53.540 | Like realizing that you can actually build software even without having any coding background.
00:04:58.840 | At the same time, some of them get hooked.
00:05:01.080 | And they realize, oh, I can build what I need for my business.
00:05:04.040 | I can build something that I need at work.
00:05:06.540 | And that's when they start to work on much more ambitious applications.
00:05:10.260 | So I think one of the key differences of our product is the fact that it's not used mostly
00:05:15.600 | to create simple landing pages or prototypes, but rather people find value in very long trajectories.
00:05:21.980 | I've seen people spending hundreds of hours on a single project with Replit, writing
00:05:27.240 | absolutely no lines of code, just making progress with the agent.
00:05:30.800 | That is, first of all, a great technical challenge because it makes things much harder for several
00:05:36.140 | different reasons, and the people that are spending so much time are usually
00:05:40.780 | building internal tools in companies.
00:05:44.300 | There's something I'm very excited about.
00:05:46.020 | There is this concept of unbundling SaaS that even Paul Graham talks about, the idea that why would
00:05:51.660 | I spend seven figures buying a very expensive SaaS when I need only two features?
00:05:56.640 | I'm going to rather rebuild it and deploy it internally in the company.
00:06:00.480 | So this is one direction that I see a lot more companies working on.
00:06:04.480 | And at the same time, also personalized applications for professionals or even people that have
00:06:11.840 | their own hobby and they want to build software based on that.
00:06:14.980 | So that's the general landscape today.
00:06:17.120 | INTERVIEWER: That's awesome.
00:06:20.120 | For people who have agents and are maybe starting with agents on the lower end of autonomy and
00:06:26.820 | are thinking of letting it run now for 10, 15 minutes like you are, how did you have the
00:06:31.040 | confidence to let it do that?
00:06:33.340 | When was the point where you were like, okay, we can bring the human out of the loop and
00:06:37.060 | we can start letting it run?
00:06:38.060 | Was that based on feedback from users, internal testing, metrics?
00:06:42.520 | What did that process to get that confidence look like?
00:06:44.740 | MICHELE CATASTA: I would say a lot of internal testing.
00:06:47.460 | Even before we launched V1, we had a prototype of it since early 2024.
00:06:52.320 | So we have always been trying to make it work.
00:06:55.920 | And the moment we find the right unlocks, which partially are due to what Frontier Labs are working
00:07:01.180 | on, so the new models that they give us.
00:07:02.460 | And at the same time, it's also due to how good the scaffold that we're building is.
00:07:06.800 | The moment it works well enough, then that's when we start to feel we should launch this.
00:07:10.340 | We should put it at least in front of a small alpha users cohort.
00:07:15.000 | What happened with V2 is that we re-architected it to best leverage the latest models out there,
00:07:22.520 | and then we started to use it a lot internally.
00:07:25.060 | And we started with an approach that was a bit more similar to V1, so we were more cautious.
00:07:31.780 | And then we just gave more leash.
00:07:33.900 | So we wanted to say, "Okay, how far can we take this?
00:07:36.000 | How good is it going to work?"
00:07:37.660 | And it turns out that it exceeded our expectations.
00:07:40.680 | So the confidence, in all honesty, as usual, came during the early access program where we
00:07:47.720 | launched it as an opt-in.
00:07:49.180 | We asked users just through social to go and try it.
00:07:52.460 | And then we received exceedingly positive feedback.
00:07:55.720 | And then as a team, we rushed to basically go to GA as soon as possible.
00:07:59.340 | INTERVIEWER: So you've mentioned models a few times.
00:08:02.160 | Are you able to share what models you all are using or how generally you think of the model
00:08:07.780 | landscape out there?
00:08:09.820 | MICHELE CATASTA: We are heavy users of the Sonnet models, especially since 3.7, as it unlocked a new level of autonomy for
00:08:17.000 | coding agents.
00:08:18.000 | So I see overall the industry pointing in that direction, like the latest Gemini 2.5 Pro is
00:08:23.820 | also following a very similar philosophy.
00:08:27.120 | And I do believe that Frontier Labs are realizing that there is a lot of value in allowing companies
00:08:33.820 | like ours and all your customers to create much more advanced, agentic workflows compared
00:08:41.240 | to the past.
00:08:42.240 | So I wouldn't be surprised if in the next few months we are going to see all the top models
00:08:46.760 | exposing tools and being post-trained in such a way that allows you to have much more autonomy
00:08:52.280 | than before.
00:08:54.280 | INTERVIEWER: And do you let users choose what model is used under the hood, or is that hidden?
00:09:02.020 | MICHELE CATASTA: No, we are very opinionated, and it's also a product choice.
00:09:06.660 | In all honesty, there are platforms where, of course, you can pick your model.
00:09:11.960 | We use Cursor internally at Replit, for example, to develop parts of it.
00:09:15.240 | So I think it's great to have a model selector and get the best possible performance from
00:09:20.760 | the different models available on the market.
00:09:22.760 | In our case, it would be a fairly big challenge to allow you to switch models.
00:09:27.360 | We use multiple models, by the way.
00:09:28.760 | INTERVIEWER: In one run of the agent?
00:09:30.760 | MICHELE CATASTA: Yeah.
00:09:31.760 | 3.7 is kind of like the foundation, the main building block for the IQ of the agent.
00:09:38.280 | But we also use a lot of other models to do a lot of accessory functions.
00:09:43.280 | Especially when we can trade off latency for performance, then we go with flash models or
00:09:50.280 | with smaller models in general.
00:09:52.280 | So we don't give you that optionality, because it would be very hard for us to even maintain
00:09:58.600 | several different prompts.
00:09:59.800 | Yeah.
00:10:00.800 | If you think about it.
00:10:01.800 | We go very deep into the rabbit hole of optimizing the prompts.
00:10:04.560 | It would be very hard for me to go from n=1 to n=3 prompt sets.
00:10:08.800 | It would be quite a lot of work for now.
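
(A toy illustration of the multi-model split Michele describes: one frontier model as the agent's core IQ, smaller flash-class models for accessory functions where latency can be traded for raw capability. The routing table and model names are hypothetical, not Replit's actual configuration.)

```python
# Illustrative model routing: frontier model for core reasoning, cheaper
# flash-class models for latency-sensitive accessory calls.
FRONTIER = "frontier-large"   # core reasoning and code editing
FLASH = "small-flash"         # cheap, fast accessory calls

ROUTES = {
    "code_edit": FRONTIER,       # the main building block of the agent
    "plan_next_step": FRONTIER,
    "commit_message": FLASH,     # latency-sensitive, low-stakes
    "summarize_logs": FLASH,     # accessory function
}

def pick_model(task_kind: str) -> str:
    # Default to the frontier model when a task kind is unknown.
    return ROUTES.get(task_kind, FRONTIER)

print(pick_model("summarize_logs"))  # -> small-flash
```

One reason a setup like this makes a user-facing model selector painful, as noted above, is that every prompt set has to be maintained per model: n models means n versions of every carefully tuned prompt.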
00:10:11.800 | INTERVIEWER: Do you use any open source models as well as part of this, or is it mostly foundation
00:10:17.600 | models?
00:10:18.600 | MICHELE CATASTA: At this point, it's mostly foundation models.
00:10:21.100 | We definitely spent some time testing DeepSeek, and I'm very bullish on them over time.
00:10:27.580 | The reason why we're not investing too much time today fine-tuning or exploring open source
00:10:33.020 | models at length is because, again, the labs are moving at a completely different pace compared
00:10:39.040 | even to one year ago.
00:10:40.320 | I think back in the days when we got to know each other, maybe there was a new leap every
00:10:44.360 | six to nine months.
00:10:45.360 | Now it's probably happening every couple of months.
00:10:47.960 | So it's better to explore what you can do today with Frontier Labs.
00:10:52.660 | And then eventually, when things slow down, if they will ever slow down, by the way, or if
00:10:56.660 | there is a reason for us to take an open source model, fine-tune it, and perhaps try to optimize
00:11:02.200 | some of the key actions that our agent takes, then I'd be happy to spend time there.
00:11:08.000 | But for now, it's already very frantic, as it is.
00:11:12.040 | INTERVIEWER: You've mentioned kind of like the trade-off between cost and latency, and then there's
00:11:16.040 | also kind of like performance there.
00:11:17.080 | MICHELE CATASTA: And performance, yeah.
00:11:18.080 | INTERVIEWER: How do you think about that now, and how have you thought about that over time?
00:11:23.820 | Because Replit Agent, I feel like, at least based on what I see on Twitter, has exploded
00:11:27.960 | recently.
00:11:30.080 | And so was there a moment-- like, I think everyone kind of has some fear when they launch
00:11:35.120 | an agent or some AI application.
00:11:36.120 | Like, if this becomes really popular, like, it's going to bankrupt me.
00:11:40.120 | And so did you guys have that fear as you started to see things take off?
00:11:43.920 | MICHELE CATASTA: I still have that fear, so it doesn't change much, trust me.
00:11:47.960 | So I think I went on a podcast, probably in early October last year, of course, saying
00:11:53.420 | that the three dimensions you want to optimize are performance, cost, and latency.
00:11:59.960 | And for me, performance and cost are almost at the same level in terms of importance.
00:12:05.360 | And then, already back in the V1 days, I was using latency as a far third.
00:12:11.780 | It doesn't change much today with V2, if anything, that gap has become even wider.
00:12:16.120 | INTERVIEWER: Because it runs for so long.
00:12:17.740 | MICHELE CATASTA: It runs for so long, and possibly that was the scariest bet we made when we launched it,
00:12:24.020 | especially when we turned it on and made it GA.
00:12:27.300 | And the reason is, we were already not emphasizing too much the latency component, but we strongly
00:12:33.260 | believe that it's far more important for the agent to get done what people want, and especially
00:12:38.940 | for the ICP that we have in mind, which is non-technical people.
00:12:42.880 | So we went almost like one order of magnitude in terms of additional latency.
00:12:47.500 | And the reaction has been fairly non-controversial, I think, and maybe for the first week we heard
00:12:52.900 | some people being shocked about the amount of time it was taking, but the moment you realize
00:12:57.260 | how much more it gets done, and the amount of headaches that it solves for you, because you
00:13:02.440 | don't have to go and try to debug.
00:13:04.060 | Even if you debug it with the agent, with an older version of the agent, you have to know
00:13:08.120 | what to ask.
00:13:09.120 | Right now, it's not the case anymore, oftentimes.
00:13:11.060 | INTERVIEWER: So do you see people modifying the code manually still, or is it completely hands-off?
00:13:17.740 | MICHELE CATASTA: It's a great question.
00:13:18.740 | We have an internal metric, and it's one of my North Stars, to be honest.
00:13:22.680 | We try to track how often people go back into our editor, which, by the way, we have been
00:13:26.880 | hiding in the product since we launched Agent V1.
00:13:29.680 | INTERVIEWER: I mean, that was the main product.
00:13:31.300 | That was the goal.
00:13:32.920 | MICHELE CATASTA: Yeah, exactly.
00:13:33.920 | The main product, for those who didn't know Replit before we launched the agent, was an editor
00:13:37.980 | in the cloud.
00:13:39.920 | We started by still showing you the file tree, then now it's hidden by default, and then it
00:13:44.920 | takes some effort to get in front of the editor.
00:13:47.920 | We started where, I think, one user out of four was actually still editing the code, especially
00:13:53.920 | the more professional ones.
00:13:55.620 | I think as of today, we've arrived at a point where it's one out of ten doing that.
00:14:00.320 | And my goal is, eventually, it should be like zero users willing to put their hands on the
00:14:05.100 | code.
00:14:06.100 | INTERVIEWER: One of the cool features of Replit that I remember from before Agent was kind of like
00:14:11.700 | the multiplayer collaboration thing as well.
00:14:15.300 | When people build apps with Agent, is it mostly one person using the agent, or is there sometimes
00:14:20.120 | a collaborative aspect as well, with multiple people interacting with the agent?
00:14:28.320 | MICHELE CATASTA: So for our consumers around the world, yes, most of them, I think, are just a single-player
00:14:34.300 | experience. But more in a business and enterprise setting,
00:14:40.040 | we bring them in as a team so everyone can see each other's projects.
00:14:43.740 | And we see them using the agent together.
00:14:46.560 | Now, we have a giant lock as of now, for reasons I'm happy to explain.
00:14:51.560 | But, you know, we see oftentimes in the chat logs that there are several people sending, basically,
00:14:56.560 | prompts to the agent.
00:14:58.080 | The challenge why it's still hard to run a lot of agents in parallel is not that much on the
00:15:02.560 | infrastructure side.
00:15:03.560 | Like, we have everything it takes to run multiple instances because we already run at scale, so
00:15:08.380 | that wouldn't be such a big leap.
00:15:10.380 | The real challenge is how do you merge all the different, you know, patches, basically PRs
00:15:16.380 | that the agent creates, which is a non-trivial problem still, even for AI frontier models.
00:15:22.380 | Like, merge conflicts are hard, unfortunately.
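
(A small sketch of the "giant lock" idea, assuming one lock per project: agent runs on the same project serialize, so collaborators never produce conflicting patches that would need an AI-driven merge. Illustrative only, not Replit's actual infrastructure.)

```python
# Illustrative per-project lock: trade parallelism for never having to
# merge agent-generated diffs from concurrent collaborators.
import threading

_project_locks: dict[str, threading.Lock] = {}
_registry_lock = threading.Lock()

def run_agent_exclusively(project_id: str, prompt: str):
    with _registry_lock:
        lock = _project_locks.setdefault(project_id, threading.Lock())
    with lock:  # only one agent run mutates this project at a time
        print(f"[{project_id}] agent working on: {prompt}")

# Several users can submit prompts; runs on the same project serialize.
threads = [threading.Thread(target=run_agent_exclusively, args=("proj-1", p))
           for p in ("add login page", "fix navbar")]
for t in threads: t.start()
for t in threads: t.join()
```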
00:15:25.200 | INTERVIEWER: You mentioned earlier that there's an app for using Replit and getting notifications.
00:15:34.960 | Where I'm going with this is when this agent's running for, like, 10, 15 minutes, how does
00:15:39.020 | it-- like, what are the communication patterns you're seeing?
00:15:41.020 | How do the users know when it's done?
00:15:42.020 | Are they just keeping the browser open and looking there?
00:15:45.020 | Do you have, like, Slack notifications?
00:15:46.020 | Is it this app that sends them a push-- like, what are you seeing being helpful there?
00:15:51.400 | And has that changed as the agent gets longer and longer running?
00:15:54.960 | MICHELE CATASTA: Yeah.
00:15:55.960 | So with V1, most of the users were in front of the screen all the time, because the feedback
00:16:02.840 | loop was relatively short.
00:16:05.120 | And I think there was also quite a bit to learn from what the agent was doing.
00:16:09.520 | It's still the case today.
00:16:11.640 | It's fairly verbose.
00:16:12.900 | If you're curious, you can basically expand every single action it does.
00:16:16.580 | If you want, you can see the output of every single tool we run.
00:16:19.360 | We try to be as transparent as possible.
00:16:21.560 | So there is a subset of users that are using the agent not only because they want to build
00:16:27.080 | something, but also because they want to speedrun their learning experience.
00:16:30.780 | It teaches you how to build 0 to 1 apps in possibly the best possible way.
00:16:35.960 | There are also users that absolutely don't care, and they just submit a prompt,
00:16:41.520 | and then they go away, maybe they go do something else, and then they come back and check Replit.
00:16:45.200 | To make sure that the loop is a bit tighter, the Replit mobile app, that is available both
00:16:50.880 | on the App Store and Android, sends you notifications when the agent wants your feedback.
00:16:56.160 | And the vision that we have for the next release is to send you even fewer notifications.
00:17:01.880 | And the idea is, right now, one of the bottlenecks, at least for us, is the fact that we rely solely
00:17:09.060 | on humans for testing.
00:17:10.560 | But, as you know, more and more progress is happening on the computer use side.
00:17:16.560 | You know, Anthropic launched that back in late October, if I recall correctly.
00:17:20.240 | OpenAI fast-followed, and open source is also catching up.
00:17:23.740 | You know, I saw Hugging Face launched something similar a week ago.
00:17:26.740 | That is something that we are actively working on to remove even, you know, this additional
00:17:32.240 | hurdle from the user.
00:17:33.240 | Because a lot of the time what we ask you to test is fairly trivial.
00:17:37.240 | So, like, it's data input and clicking around a very simple interface.
00:17:42.700 | I expect us to be able to do that with computer use very soon.
00:17:46.920 | Bring it into the product, and then jump from, say, ten minutes of autonomy to one hour of autonomy.
00:17:53.040 | That is my target, you know, for V3, hopefully in a few months.
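
(A hedged sketch of that testing loop. `computer_use_verify` and `scenarios_needing_human` are hypothetical names, with the verifier stubbed to always fail; the point is only the control flow of escalating to the human just the checks a computer-use model could not verify itself.)

```python
# Illustrative: let a computer-use model attempt the trivial checks
# (data entry, clicking around a simple interface) before asking a human.
def computer_use_verify(app_url: str, scenario: str) -> bool:
    # Hypothetical stand-in for a real computer-use API call that would
    # drive a browser through `scenario` and report success. Stubbed here.
    return False

def scenarios_needing_human(app_url: str, scenarios: list[str]) -> list[str]:
    # Only the checks the model could not verify get escalated to a person.
    return [s for s in scenarios if not computer_use_verify(app_url, s)]

print(scenarios_needing_human("https://example.test",
                              ["sign up with a test account",
                               "add an item and see it listed"]))
```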
00:17:56.400 | INTERVIEWER: How do you think about, there's kind of like testing, but then there's also making sure that
00:18:00.660 | it's doing what the human actually wanted.
00:18:03.220 | And oftentimes we're bad communicators, and don't specify everything up front.
00:18:07.380 | How do you think about getting all of that specification?
00:18:10.360 | Do you have something like deep research, where it kind of grills the user back and forth at
00:18:14.500 | the start?
00:18:15.500 | Or how do you think about that?
00:18:17.240 | MICHELE CATASTA: So we are changing the planning experience as we speak, and we're going to launch it very
00:18:21.800 | soon.
00:18:23.800 | It's hard to reconcile how most of the users have been trained by products like ChatGPT,
00:18:29.920 | and actually how we expect them to use a coding agent, or in general any agent.
00:18:34.080 | Because if you have a complicated task that you want to express, let's say in the case of
00:18:38.320 | building software, you basically want to submit a PRD, that's what like every PM is capable
00:18:42.480 | of doing.
00:18:43.480 | Very few people are willing to do that.
00:18:47.020 | Or what they do is they write a two-line prompt, they throw it into Claude, they get back
00:18:51.800 | a long PRD, and then they expect the agent to pedantically follow every single item in that PRD.
00:18:57.480 | We're not there yet.
00:19:00.760 | The challenge here is to make both groups happy, starting with people that love to use it as a chatbot, where they
00:19:08.480 | do basically one single task at a time.
00:19:11.160 | And we put some effort in training.
00:19:13.200 | You know, we did a course with Andrew Ng, who's going to be on stage in a few hours,
00:19:16.720 | just to tell people if you want to use it that way, it's important that you split your main
00:19:21.320 | goal into subtasks, and basically you submit them sequentially.
00:19:25.320 | But at the same time, I would love to reach a point where we go through each subtask in isolation,
00:19:30.240 | we get things done.
00:19:31.640 | And maybe we only ask for feedback after, say, one hour, and then it's up to you as a user to
00:19:35.900 | find out if you accomplished everything that you wanted.
00:19:38.820 | But I think there is so much that can be done autonomously that maybe brings, say, 90% close
00:19:43.820 | to what the user wants.
00:19:45.340 | And then when we get their attention back, we basically ask them to polish the user experience
00:19:49.720 | and finesse exactly what they want.
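
(The subtask-splitting workflow described above, as a toy sketch: decompose the goal, execute subtasks sequentially, and batch the request for human feedback at the end. In practice `decompose` would be an LLM call; everything here is illustrative.)

```python
# Illustrative decompose-then-execute loop with batched feedback.
def decompose(goal: str) -> list[str]:
    # In practice an LLM call; a fixed example keeps this runnable.
    return [f"{goal}: data model", f"{goal}: backend routes", f"{goal}: UI"]

def run_sequentially(goal: str, execute):
    # Work through each subtask in isolation, then review everything once.
    results = [execute(subtask) for subtask in decompose(goal)]
    return results  # hand back to the user for one batched review

print(run_sequentially("inventory tracker", execute=lambda s: f"done: {s}"))
```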
00:19:53.600 | INTERVIEWER: You mentioned observability and thinking about that early on.
00:19:59.300 | What have you learned as Replit Agent has gone crazy viral?
00:20:03.360 | MICHELE CATASTA: That observability is even harder than expected, regardless of the fact that you guys are building
00:20:09.480 | something awesome with LangSmith.
00:20:11.360 | INTERVIEWER: What are the hardest parts?
00:20:13.080 | Give us some product ideas.
00:20:17.300 | MICHELE CATASTA: So first of all, this feels a bit like back in the days when we were discussing what is the
00:20:23.620 | best possible architecture for databases.
00:20:25.520 | The truth is, you know, one size does not fit all in this case.
00:20:30.200 | And there is the Datadog-style observability that is still very useful.
00:20:35.640 | Like you want to have aggregates, you want to have dashboards that tell you you're failing
00:20:39.700 | to use this tool 50% of the time, and then ring an alert and go ahead and fix it.
00:20:45.460 | At the same time, something like LangSmith is extremely important because unfortunately we're still
00:20:50.640 | at the kind of like assembly era of debugging for agents.
00:20:54.320 | I think you would agree with me because when you are trying to understand why the agent has
00:21:00.640 | made, you know, the wrong choice or is going sideways, your last resort is to actually read
00:21:06.820 | the entire input and the generated output and try to figure out why certain choices
00:21:11.640 | have been made.
00:21:12.780 | So it's much more effort to debug compared to an advanced distributed system, in my humble opinion.
00:21:19.820 | Like aggregates are not enough.
00:21:21.560 | You have something that looks like a step debugger, but rather than showing you the state in memory,
00:21:25.920 | you need to read 100,000 tokens and figure out what's wrong.
00:21:30.240 | So I think we are at the early stages of observability.
00:21:33.440 | But what I recommend everyone who starts to really think of building an agent or like any
00:21:38.040 | agentic workflow is invest in observability from day one.
00:21:42.360 | Otherwise, you're going to be lost immediately and you're probably going to give up because
00:21:46.220 | you're going to think it's impossible to pull this off.
00:21:48.720 | And I hope that we are proof and many other companies are proof that it's not impossible.
00:21:52.800 | It's just really hard as we speak.
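
(A minimal sketch of the aggregate, Datadog-style layer Michele mentions, assuming each tool call reports success or failure: count outcomes per tool and alert when a failure rate crosses a threshold, like the 50% example above. Trace-level debugging would live in a tool like LangSmith; this covers only the aggregate side, and all names are illustrative.)

```python
# Illustrative aggregate monitoring: per-tool failure rates with an alert
# threshold, the dashboard-and-alarm half of agent observability.
from collections import defaultdict

counts = defaultdict(lambda: {"ok": 0, "fail": 0})

def record_tool_call(tool: str, ok: bool):
    counts[tool]["ok" if ok else "fail"] += 1

def failing_tools(threshold: float = 0.5) -> list[str]:
    alerts = []
    for tool, c in counts.items():
        total = c["ok"] + c["fail"]
        if total and c["fail"] / total >= threshold:
            alerts.append(tool)
    return alerts

record_tool_call("apply_patch", ok=False)
record_tool_call("apply_patch", ok=False)
record_tool_call("apply_patch", ok=True)
print(failing_tools())  # -> ['apply_patch'], failure rate 2/3
```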
00:21:55.180 | INTERVIEWER: Who do you see kind of being the best-- who debugs these agents?
00:22:00.640 | Is it everyone on the team?
00:22:01.640 | I mean, you guys are building a technical product.
00:22:03.360 | So presumably everyone has some product sense and product feel for it.
00:22:07.740 | But is there a particular persona that spends the majority of their time in LangSmith looking
00:22:14.120 | at logs, or who has the best kind of skill or knack or intuition for that?
00:22:18.440 | MICHELE CATASTA: Given the size of Replit today, we are barely 75 people across the entire company.
00:22:26.140 | The way we work is everyone does a bit of everything.
00:22:28.500 | So even if you're an AI engineer and you're the person who has been optimizing the prompts,
00:22:32.440 | if there is a page and something is broken, most of the people on the technical team are capable
00:22:38.700 | of going all the way from the product surface down to the metal.
00:22:42.420 | Now, what makes it a bit more challenging for Replit is that we own the entire stack.
00:22:47.440 | So we have the execution plane where we orchestrate all the containers.
00:22:52.020 | We have the control plane, which is basically like a combination of our agent code base,
00:22:56.940 | LangGraph-style orchestration, and all the way to the product.
00:23:00.940 | So it's important, unfortunately, as of now, to be capable of reading the traces all the way down.
00:23:07.120 | Those problems can happen anywhere.
00:23:08.840 | You know, even one of the tools we invoke, maybe the interface is correct, but it could
00:23:13.420 | be that the binary of the tool is broken.
00:23:17.160 | INTERVIEWER: We've talked a bit about the journey from V1 to V2, and maybe to close us off, what's coming
00:23:23.920 | in V3?
00:23:24.920 | What are some things that are on the roadmap that we can expect?
00:23:27.520 | MICHELE CATASTA: So I hinted at one of them.
00:23:29.020 | You know, I expect us to bring in computer use, or in general, like making it easier to test applications.
00:23:35.120 | At the same time, I'm also very bullish on bringing software testing into the loop.
00:23:39.480 | Yeah.
00:23:40.480 | The beauty of building a coding agent is that code is far more observable, and there are
00:23:46.760 | way more tools that you can apply on code to test if it's correct or not.
00:23:51.840 | And last but not least, I want to work even further on test-time compute, where,
00:23:57.840 | as of today, we already use a fair amount of tokens, as you know.
00:24:03.180 | But definitely we want to explore both sampling and parallelism.
00:24:07.020 | So we see this especially at the beginning: a lot of our users open several projects in
00:24:12.380 | parallel and do the initial build, so that they can see which one matches their UI taste
00:24:17.460 | the best.
00:24:18.460 | I imagine taking this concept and carrying it along the entire trajectory, where you sample,
00:24:22.880 | and then you rank and pick the best solution for the problem.
00:24:26.320 | So this will be like for our high spenders, but it definitely helps you to get better performance.
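
(A toy sketch of that sample-then-rank idea: draw several candidate solutions in parallel, score them, keep the best. The sampler and ranker are placeholders for what would be model calls; only the best-of-n control flow is the point here.)

```python
# Illustrative best-of-n test-time compute: sample candidates in
# parallel, rank them, return the highest-scoring one.
from concurrent.futures import ThreadPoolExecutor

def sample_candidate(prompt: str, seed: int) -> str:
    return f"solution-{seed} for {prompt}"   # placeholder for an LLM sample

def rank(candidate: str) -> float:
    return float(len(candidate))             # placeholder scoring function

def best_of_n(prompt: str, n: int = 4) -> str:
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda i: sample_candidate(prompt, i),
                                   range(n)))
    return max(candidates, key=rank)

print(best_of_n("build a landing page"))
```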
00:24:30.680 | INTERVIEWER: Awesome.
00:24:31.680 | Well, I'm looking forward to all of those.
00:24:35.380 | Thank you, Michele, for joining me.
00:24:37.320 | MICHELE CATASTA: Thank you.
00:24:38.320 | INTERVIEWER: Let's give Michele a big round of applause.
00:24:39.320 | MICHELE CATASTA: Thank you.