LangChain Interrupt 2025: Building and Scaling an AI Agent During Hypergrowth – Sherwood Callaway

00:00:23.660 |
And today, I'm joined by Keith Burnett of Grove, who 00:00:26.120 |
is the product manager for the Alice product. 00:00:28.660 |
Now, 11x, for those of you who are unfamiliar, 00:00:32.480 |
is a company that's building digital workers. 00:00:44.120 |
I want to take everybody back to September 2024, which is, 00:00:47.780 |
for most people, not that long ago, but for us covers a lot of company history. 00:00:55.040 |
We had just announced our Series A, and then our Series B. 00:01:00.840 |
With all this chaos going on, we relocated our team and company 00:01:04.480 |
from London to San Francisco, to a beautiful new office. 00:01:11.820 |
And at the same time, we were also growing rapidly. 00:01:18.880 |
And during all this chaos, we chose this moment to rebuild our core product from scratch. 00:01:26.340 |
And the reason we did that is we truly felt at the time, 00:01:30.340 |
and we're even more sure now, that agents are the future. 00:01:34.580 |
So in today's talk, I want to first tell you why we felt the need 00:01:38.540 |
to rebuild. Hopefully, everyone's probably already in agreement that agents are the future. 00:01:44.100 |
Then I'll tell you how we built this enterprise-grade AI SDR in just three months. 00:01:47.820 |
Then I'm going to talk about one of the new challenges that we 00:01:49.740 |
experienced, which was finding the right agent architecture. 00:01:52.860 |
And I'll wrap up with some reflections on building agents and some closing thoughts. 00:01:59.780 |
Why did we feel like we needed to rebuild our core product from scratch at such a critical moment? 00:02:04.020 |
Well, to understand that question, we first need to understand Alice 1. 00:02:11.780 |
The main thing that you would do with Alice 1 was create these custom AI-powered outreach campaigns. 00:02:16.780 |
And there were five steps involved in campaign creation. 00:02:21.780 |
In the first step, you define your audience. That's when you identify the people that you'd like to sell to. 00:02:24.780 |
And then the second step, you describe your offer. 00:02:27.780 |
This is the products or services that you're trying to sell. 00:02:31.780 |
Then the third and fourth step, you construct your sequence and also tweak the AI-powered messaging. 00:02:37.780 |
And finally, when everything's to your liking, you move on to the last step, 00:02:42.780 |
And that's when Alice will get to work: sourcing leads that match your ICP, researching them, 00:02:46.780 |
writing those customized emails, and in general just executing the sequence that you've built. 00:02:53.780 |
Now Alice 1 was a big success by a lot of different metrics. 00:02:58.780 |
But we wouldn't really consider her a true digital worker. 00:03:05.780 |
There was a lot of clicking and configuration involved, more than you would probably expect from a digital worker. 00:03:07.780 |
And you also probably saw there was a lot of manual input, especially on that offer stage. 00:03:16.780 |
We weren't doing deep research or scraping the web or anything like that. 00:03:21.780 |
And downstream that would lead to relatively uninspiring personalization in our emails. 00:03:26.780 |
And on top of that, Alice wasn't able to handle replies automatically. 00:03:31.780 |
She wasn't able to answer customers' questions. 00:03:34.780 |
And finally, there was no real self-learning. 00:03:40.780 |
Meanwhile, while we were building Alice 1, the industry was evolving around us. 00:03:53.780 |
We got function calling in the OpenAI API. 00:03:56.780 |
Then in January of 2024, we got a more production-ready agent framework in the form of LangGraph. 00:04:07.780 |
And finally in September, we got the Replit agent, which for us was the first example of a truly mind-blowing agentic software product. 00:04:16.780 |
And just to double-click into the Replit agent a little bit, this really blew our minds. 00:04:23.780 |
First, it showed us that agents were going to be really helpful. 00:04:31.780 |
So with that in mind, we developed a new vision for Alice, centered on seven agentic capabilities. 00:04:38.780 |
We believed that users should mostly interact with Alice through chat, just as they would interact with a human team member. 00:04:44.780 |
Secondly, users should be able to upload internal documents, 00:04:47.780 |
their websites, and meeting recordings to a knowledge base. 00:04:53.780 |
Third, we should use an AI agent for lead sourcing that actually considers the quality and fit of each lead rather than a dumb filter search. 00:05:03.780 |
Number four, we should do deep research on every lead. 00:05:06.780 |
And that should lead to number five, which is true personalization in those emails. 00:05:11.780 |
And then finally, we believe that Alice should be able to handle inbound messages automatically, answering questions and booking meetings. 00:05:21.780 |
And number seven, she should be self-learning, incorporating the insights from all of the campaigns she's running to optimize the performance of your account. 00:05:29.780 |
And with that in place, the stage was set to rebuild Alice from scratch. 00:05:33.780 |
And in short, this was a pretty aggressive push for the company. 00:05:36.780 |
It took us just three months from the first commit to migrating our last business customer. 00:05:41.780 |
We initially staffed just two engineers on building the agent. 00:05:44.780 |
After developing the POC, we drew in more resources. 00:05:47.780 |
We had one product manager, our very own Keith. 00:05:51.780 |
And we had about 300 customers that needed to be migrated from our original platform to the new one. 00:05:59.780 |
And yet, our go-to-market team was really not slowing down. 00:06:01.780 |
There were a few key decisions that we made at the outset of this project. 00:06:06.780 |
The first is that we wanted to start from scratch. 00:06:08.780 |
We didn't want Alice 2 to be encumbered by Alice 1 and its legacy. 00:06:08.780 |
The second is that we were already taking on a lot of risk with some unfamiliar technologies, like the agent and the knowledge base. 00:06:17.780 |
We didn't want to add additional risk through technologies that we didn't understand. 00:06:28.780 |
And number three, we wanted to leverage vendors as much as possible to move really quickly. 00:06:32.780 |
We didn't want to be building non-essential components. 00:06:32.780 |
I won't go into too much detail here, but I thought it would be interesting to see. 00:06:42.780 |
And here are some of the vendors that we chose to leverage and work with. 00:06:46.780 |
I can't go into detail on every one of these vendors, 00:06:49.780 |
but they were all essential to moving as quickly as we did. 00:06:54.780 |
Of course, one of the most important vendors we chose to work with was LangChain. 00:06:59.780 |
And we knew that they were going to be a really good partner from the start. 00:07:05.780 |
They were a clear leader in AI developer tools and AI infrastructure. 00:07:11.780 |
They had cloud hosting and observability, so we knew we were going to be able to get into production, 00:07:16.780 |
and that once our agent was in production, we would understand how it was performing. 00:07:24.780 |
And then LangChain also had TypeScript support, which was important to us as a TypeScript shop. 00:07:30.780 |
And last but not least, the customer support from the LangChain team was just incredible. 00:07:36.780 |
They ramped us up on LangGraph, on the LangChain ecosystem, and on agents in general. 00:07:43.780 |
In terms of the products that we used today, we used pretty much the entire suite. 00:07:48.780 |
And now I want to talk you through one of the main challenges that we encountered while building Alice 2, 00:07:48.780 |
which was finding the right agent architecture. 00:07:55.780 |
And you'll remember that the main feature of Alice was campaign creation. 00:08:02.780 |
So we wanted Alice, or Alice's agents, to guide users through campaign creation in the same way that the Replit agent would guide you through creating a web app. 00:08:10.780 |
We tried three different architectures for this. 00:08:17.780 |
We started with a ReAct agent, then a workflow, and then finally we ended up on a multi-agent system. 00:08:20.780 |
So now I'm going to talk you through each of these, how it works in detail, and why it didn't work for our use case, until we arrive at multi-agent. 00:08:30.780 |
Well, React is a JavaScript framework for building user interfaces, but that's not what I mean here. 00:08:35.780 |
I mean the ReAct model of an AI agent, which I think other people have talked about earlier today. 00:08:39.780 |
This is a model that was invented by Google researchers back in 2022. 00:08:45.780 |
And basically what these researchers observed is that if you include reasoning traces in the conversation context, 00:08:52.780 |
the agent performs better than it otherwise would. 00:08:54.780 |
And with a React agent, the execution loop is split into three parts. 00:08:58.780 |
There's reasoning, where the agent thinks about what to do. 00:09:02.780 |
There's action, where the agent actually takes action, for example, performing a tool call. 00:09:07.780 |
And then finally there's observe, where the agent observes the new state of the world after performing the action. 00:09:16.780 |
But as I mentioned, reasoning traces lead to better performance in the agent. 00:09:22.780 |
This was our implementation of a ReAct agent. 00:09:24.780 |
It consists of just one node and 10 or 20 tools. 00:09:29.780 |
And it's not very impressive looking, I know. 00:09:31.780 |
But this simplicity is actually one of the main benefits of the React architecture in my opinion. 00:09:38.780 |
Well, there are lots of different things that need to happen in campaign creation. 00:09:44.780 |
We need to insert new DB entities, draft emails, and so on. 00:09:50.780 |
The ReAct loop I mentioned on the previous slide, that's implemented inside of the assistant node. 00:09:56.780 |
And when the assistant actually performs an action, that is manifested in the form of a tool call. 00:10:05.780 |
One thing to note about the ReAct agent is that it runs completions for every turn. 00:10:09.780 |
So if the user says hello, and then they say I'd like to create a campaign, that would be two turns, 00:10:14.780 |
and the ReAct agent runs completions each time. 00:10:19.780 |
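(For illustration, here is a rough sketch of what a single-node ReAct-style agent like this might look like in LangGraph's TypeScript SDK. The tool names and wiring are hypothetical, not 11x's actual code, and exact APIs may vary by LangGraph.js version.)

```ts
import { ChatOpenAI } from "@langchain/openai";
import { StateGraph, MessagesAnnotation, START } from "@langchain/langgraph";
import { ToolNode, toolsCondition } from "@langchain/langgraph/prebuilt";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Hypothetical campaign-creation tools; the real agent had 10 or 20 of these.
const createCampaign = tool(async ({ name }) => `Created campaign "${name}"`, {
  name: "create_campaign",
  description: "Insert a new campaign entity into the database.",
  schema: z.object({ name: z.string() }),
});
const draftEmail = tool(async ({ leadName }) => `Drafted an email for ${leadName}`, {
  name: "draft_email",
  description: "Draft a personalized outreach email for a lead.",
  schema: z.object({ leadName: z.string() }),
});
const tools = [createCampaign, draftEmail];

const model = new ChatOpenAI({ model: "gpt-4o" }).bindTools(tools);

// One "assistant" node: a single completion per step that reasons and picks tools.
const assistant = async (state: typeof MessagesAnnotation.State) => {
  const response = await model.invoke(state.messages);
  return { messages: [response] };
};

const graph = new StateGraph(MessagesAnnotation)
  .addNode("assistant", assistant)
  .addNode("tools", new ToolNode(tools))
  .addEdge(START, "assistant")
  // If the last message contains tool calls, act; otherwise end the turn.
  .addConditionalEdges("assistant", toolsCondition)
  // After acting, observe the tool results and reason again.
  .addEdge("tools", "assistant")
  .compile();
```

The conditional edge is what closes the reason-act-observe loop: each completion either triggers tool calls or ends the turn.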
Here are some of the tools that we implemented ourselves and attached to our agent. 00:10:27.780 |
So we didn't use an MCP server or any third-party tool registries. 00:10:32.780 |
A few things I want to tell you about tools before we can move on. 00:10:35.780 |
The first is that tools are necessary to take action. 00:10:37.780 |
So any time you want your agent to do anything on the outside world, for example, 00:10:42.780 |
call an API or write a file, you're going to need a tool to do that. 00:10:47.780 |
They're also necessary to access information beyond the context window. 00:10:51.780 |
If you think about it, what your agent knows is limited to three things. 00:10:54.780 |
The conversation context, the prompts, and the model weights. 00:10:58.780 |
If you want it to know anything beyond that, you need to give it a tool, for example, a web search tool. 00:11:03.780 |
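(As a hedged example, a web search tool in LangChain's TypeScript tool interface might look roughly like this; the search endpoint below is a placeholder, not a real service.)

```ts
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Placeholder search call; in practice this would hit a real search API
// (Tavily, Serper, an internal service, etc.).
async function searchTheWeb(query: string): Promise<string> {
  const res = await fetch(`https://example.com/search?q=${encodeURIComponent(query)}`);
  return res.text();
}

// Exposing search as a tool lets the agent reach information that is not in
// the conversation context, the prompts, or the model weights.
const webSearch = tool(async ({ query }) => searchTheWeb(query), {
  name: "web_search",
  description: "Search the web and return raw results for a query.",
  schema: z.object({ query: z.string().describe("The search query") }),
});
```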
And that's essentially the concept behind ReAct. 00:11:09.780 |
It's one of the easiest and safest ways to get started with an agentic system. 00:11:14.780 |
And last but not least, tools are preferable over skills. 00:11:20.780 |
Essentially, if you think about it, if someone asks you to do something like perform a complex calculation, 00:11:26.780 |
you can do that either through a tool, like a calculator, or maybe you have the skill, the mental arithmetic, to perform that calculation yourself. 00:11:33.780 |
And in general, it's better to use a tool than to use a skill. 00:11:38.780 |
Because this minimizes the amount of tokens you're using in the context to accomplish that task. 00:11:45.780 |
What are the strengths of the React architecture? 00:11:50.780 |
We basically never needed to revise our agent structure later on. 00:11:54.780 |
It was also great at handling arbitrary user inputs over multiple turns. 00:11:58.780 |
This is because the graph runs a completion on every turn. 00:12:01.780 |
It allows the user to say something in step three that's related to step one without the agent being confused. 00:12:12.780 |
But it had some issues. For example, the ReAct agent is kind of bad when you attach a lot of tools. 00:12:18.780 |
And as you know, what sometimes happens when you do that is the agent will struggle with which tool to call and in what order. 00:12:26.780 |
And this would sometimes lead to infinite loops, with the agent repeatedly trying to accomplish some part of campaign creation but not succeeding. 00:12:33.780 |
If one of these infinite loops ran for a while, we would get a recursion limit error, 00:12:38.780 |
which is effectively the agent falling over and throwing an error. 00:12:42.780 |
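(Roughly, the recursion limit shows up like this in LangGraph.js; the limit value and error handling below are illustrative, assuming a compiled `graph` like the earlier sketch.)

```ts
import { GraphRecursionError } from "@langchain/langgraph";
import { HumanMessage } from "@langchain/core/messages";

// `graph` is a compiled ReAct-style agent like the one sketched earlier.
try {
  const result = await graph.invoke(
    { messages: [new HumanMessage("Create a campaign for fintech CFOs")] },
    { recursionLimit: 25 }, // the default cap on super-steps; tune as needed
  );
  console.log(result.messages.at(-1)?.content);
} catch (err) {
  if (err instanceof GraphRecursionError) {
    // The agent bounced between reasoning and tool calls without finishing.
    console.error("Agent hit the recursion limit before completing the task.");
  } else {
    throw err;
  }
}
```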
And also the outputs that we were getting from this version of the agent were relatively mediocre. 00:12:47.780 |
The audiences, the sequences, the emails, they just weren't that good. 00:12:52.780 |
And our hypothesis here was that it's because there's just one agent, and really just one set of prompts, responsible for doing the entire campaign creation process. 00:13:00.780 |
It wasn't really good at any particular part of it. 00:13:06.780 |
In our case, we chose to add structure, which took the form of a workflow. 00:13:12.780 |
Now, a workflow is defined by Anthropic as a system where LLMs and tools are orchestrated through predefined code paths. 00:13:18.780 |
This definition and this screenshot both come from an excellent blog post by Anthropic called Building Effective Agents. 00:13:27.780 |
Importantly, workflows are different from agents. 00:13:30.780 |
And this is one of the things that the agent community has been debating a lot on Twitter, for example. 00:13:35.780 |
That's the reason why we have the term agentic for describing a system, as opposed to agent. 00:13:40.780 |
A system could be agentic without being an agent per se. 00:13:50.780 |
In fact, with predefined code paths, the LLM is not choosing how to orchestrate the work. 00:13:55.780 |
The LLM calls just happen along those predefined code paths. 00:13:59.780 |
And last but not least, workflows are not really a new technology. 00:14:04.780 |
and the most famous form is probably the data orchestration tool Airflow. 00:14:14.780 |
This was our implementation of a workflow campaign creation agent. 00:14:18.780 |
It's obviously a lot more complex than our ReAct agent was. 00:14:25.780 |
These stages correspond to different steps of campaign creation that I mentioned before. 00:14:30.780 |
Interestingly, this graph, unlike the ReAct agent, doesn't run a completion on every turn. 00:14:36.780 |
It only runs through once for the entire campaign creation process. 00:14:40.780 |
And the way that we get user input or feedback at certain points within the graph execution 00:14:45.780 |
is through something called interrupts, which is a LangGraph feature. 00:14:50.780 |
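(A minimal sketch of this kind of workflow graph with an interrupt, in LangGraph's TypeScript SDK; the node names and prompts are hypothetical, and the interrupt API is shown as I understand it, so check the current LangGraph docs.)

```ts
import { ChatOpenAI } from "@langchain/openai";
import { StateGraph, Annotation, MemorySaver, interrupt, START, END } from "@langchain/langgraph";

const model = new ChatOpenAI({ model: "gpt-4o" });

// Hypothetical workflow state: one field per campaign-creation artifact.
const CampaignState = Annotation.Root({
  offer: Annotation<string>,
  audience: Annotation<string>,
  emails: Annotation<string>,
});

const defineAudience = async (state: typeof CampaignState.State) => {
  const res = await model.invoke(`Describe the ideal target audience for: ${state.offer}`);
  // Pause the graph here and wait for the user to approve or edit the audience.
  const feedback = interrupt({ proposedAudience: res.content });
  return { audience: String(feedback) };
};

const writeEmails = async (state: typeof CampaignState.State) => {
  const res = await model.invoke(
    `Write a three-step email sequence for ${state.audience} about ${state.offer}.`,
  );
  return { emails: String(res.content) };
};

// Predefined code path: audience first, then emails; the LLM never picks the route.
const workflow = new StateGraph(CampaignState)
  .addNode("define_audience", defineAudience)
  .addNode("write_emails", writeEmails)
  .addEdge(START, "define_audience")
  .addEdge("define_audience", "write_emails")
  .addEdge("write_emails", END)
  .compile({ checkpointer: new MemorySaver() }); // interrupts require a checkpointer
```

Execution pauses at the interrupt and resumes when the graph is re-invoked with a Command carrying the user's feedback.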
There were a number of strengths involved with the workflow architecture. 00:14:53.780 |
It basically solved all the problems that we saw with the ReAct agent. 00:14:57.780 |
For one, we no longer had issues with tool calling, because we just didn't have tools. 00:15:02.780 |
We had specialized nodes instead, like a write-email node. 00:15:07.780 |
And we've also got a clearly defined execution flow with a fixed number of steps. 00:15:11.780 |
So no more infinite loops, no more recursion errors. 00:15:18.780 |
The email sequences that we were getting from this version of the agent were much better. 00:15:22.780 |
And that's because we forced the agent to go through these particular steps. 00:15:26.780 |
But the workflow architecture did have issues. 00:15:31.780 |
For one, our front-end campaign creation experience was now coupled with the architecture of our agent. 00:15:36.780 |
And we would have to change that architecture and that graph structure any time we wanted to make changes to the campaign creation experience. 00:15:46.780 |
It also didn't support jumping around within the campaign creation flow. 00:15:49.780 |
That's because the graph doesn't run a completion on every turn. 00:15:51.780 |
When you get to step three and you stop to collect feedback at that step, 00:15:56.780 |
you can really only respond to what's happening in step three. 00:16:02.780 |
So really, workflows were not going to work for our use case. 00:16:08.780 |
Well, after some soul searching, we came across a blog post by LangChain 00:16:14.780 |
that explains how to build a customer support agent using a multi-agent architecture. 00:16:19.780 |
And this was the model that gave us the insight that we needed for our use case. 00:16:23.780 |
A multi-agent system is a hierarchical approach to building an AI agent. 00:16:29.780 |
In this pattern, there's one supervisor and there are many sub-agents that are specialized. 00:16:34.780 |
And the supervisor is responsible for interacting with the user and delegating tasks to sub-agents. 00:16:40.780 |
And the sub-agents will then fulfill those tasks and report back to the supervisor when they're done. 00:16:46.780 |
And we really devoured this blog post by LangChain. 00:16:49.780 |
We went a little crazy in the process but ultimately found a version of this that works for our use case. 00:16:57.780 |
We have a multi-agent graph that accomplishes nearly all of campaign creation. 00:17:06.780 |
And you can see here at the top is our supervisor node. 00:17:14.780 |
We have a sub-agent that generates something called a positioning report, 00:17:17.780 |
which is how you should position your product or service for this particular audience. 00:17:25.780 |
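(A hedged sketch of the supervisor pattern in LangGraph's TypeScript SDK. The sub-agent names echo the ones mentioned in the talk, but the code itself is hypothetical, not 11x's implementation.)

```ts
import { ChatOpenAI } from "@langchain/openai";
import { StateGraph, MessagesAnnotation, Annotation, START, END } from "@langchain/langgraph";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const model = new ChatOpenAI({ model: "gpt-4o" });

// A toy tool so the sub-agents have something to call.
const researchCompany = tool(async ({ domain }) => `Notes about ${domain}...`, {
  name: "research_company",
  description: "Research a company from its website domain.",
  schema: z.object({ domain: z.string() }),
});

// Specialized sub-agents; each one is itself a small ReAct-style agent.
const positioningAgent = createReactAgent({ llm: model, tools: [researchCompany] });
const emailAgent = createReactAgent({ llm: model, tools: [researchCompany] });

// Parent state: the shared message history plus the supervisor's routing decision.
const SupervisorState = Annotation.Root({
  ...MessagesAnnotation.spec,
  next: Annotation<string>,
});

const routeSchema = z.object({
  next: z.enum(["positioning", "write_emails", "finish"]),
});

// The supervisor talks to the user and delegates work to sub-agents.
const supervisor = async (state: typeof SupervisorState.State) => {
  const decision = await model.withStructuredOutput(routeSchema).invoke([
    { role: "system", content: "Decide which sub-agent should act next, or finish." },
    ...state.messages,
  ]);
  return { next: decision.next };
};

const graph = new StateGraph(SupervisorState)
  .addNode("supervisor", supervisor)
  .addNode("positioning", positioningAgent)
  .addNode("write_emails", emailAgent)
  .addEdge(START, "supervisor")
  .addConditionalEdges("supervisor", (state: typeof SupervisorState.State) => state.next, {
    positioning: "positioning",
    write_emails: "write_emails",
    finish: END,
  })
  // Sub-agents report back to the supervisor when they are done.
  .addEdge("positioning", "supervisor")
  .addEdge("write_emails", "supervisor")
  .compile();
```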
And this multi-agent architecture, it gave us the best of both worlds. 00:17:30.780 |
We got the flexibility of the react agent and then we got the performance of the workflow. 00:17:35.780 |
Now I want to share a couple of reflections on building agents from this experience. 00:17:43.780 |
The first is that all of that structure and scaffolding can provide performance gains in the short term, 00:17:47.780 |
but over the long term it locks you into a structure that can be counterproductive. 00:17:51.780 |
And related to this is that new models can really change everything. 00:17:56.780 |
Anjan at Replit told us this about the Replit agent. 00:17:59.780 |
He said it wasn't really working until Sonnet 3.5 came out. 00:18:02.780 |
Then they dropped it in and everything was magic. 00:18:06.780 |
It's also useful to think about whether your agent has a human counterpart or human co-workers. 00:18:11.780 |
In our case, we had different mental models. 00:18:13.780 |
We thought of the agent as a user flow within our product, or as a directed graph. 00:18:19.780 |
Those were the wrong mental models, and they led us to implement the wrong architectures. 00:18:24.780 |
We should also break big tasks down into small tasks. 00:18:27.780 |
In our case, the big task was the campaign creation. 00:18:29.780 |
But there were small tasks like writing an email within that. 00:18:32.780 |
And it became easier to implement the agent once we broke it down into smaller tasks. 00:18:42.780 |
Just give it the right tools and tell it how to use them. 00:18:46.780 |
And then last but not least, don't forget about prompt engineering. 00:18:49.780 |
It's easy to forget that your agent is just a series of LLM calls within a while loop. 00:18:54.780 |
If your agent isn't performing well, you should consider going back and doing some prompt engineering. 00:18:59.780 |
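(To make that concrete, here is a minimal, framework-free sketch of an agent as LLM calls inside a while loop, using LangChain's TypeScript chat model interface; the tool and prompts are made up for illustration.)

```ts
import { ChatOpenAI } from "@langchain/openai";
import { AIMessage, BaseMessage, HumanMessage, SystemMessage, ToolMessage } from "@langchain/core/messages";
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// One toy tool so the loop has something to call.
const lookupLeads = tool(async ({ company }) => `Found 3 leads at ${company}`, {
  name: "lookup_leads",
  description: "Look up leads at a company.",
  schema: z.object({ company: z.string() }),
});

const model = new ChatOpenAI({ model: "gpt-4o" }).bindTools([lookupLeads]);

// An agent with the framework stripped away: LLM calls inside a while loop.
// The system prompt is where much of the behavior actually lives.
const messages: BaseMessage[] = [
  new SystemMessage("You are Alice, an AI SDR. Use the provided tools to build campaigns."),
  new HumanMessage("Find some leads at Acme Corp."),
];

while (true) {
  const response = (await model.invoke(messages)) as AIMessage;
  messages.push(response);
  if (!response.tool_calls?.length) break; // no tool calls left: the agent is done
  for (const call of response.tool_calls) {
    const output = await lookupLeads.invoke(call.args as { company: string });
    messages.push(new ToolMessage({ content: String(output), tool_call_id: call.id! }));
  }
}
```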
And I wish we had time for a demo, but I don't. 00:19:06.780 |
But I do have this QR code, and I'll leave this up for a moment. 00:19:08.780 |
If you're not able to get to it now, the slides will be available afterwards. 00:19:14.780 |
Alice 2 went live in January, and the results are pretty exciting. 00:19:18.780 |
She's now sourced close to 2 million leads. 00:19:23.780 |
We've sent close to 3 million messages and generated about 21,000 replies. 00:19:27.780 |
Over the last month or so, the reply rate is around 2%, which is roughly on par with a human SDR. 00:19:35.780 |
And we're starting to see that climb as we implement self-learning and some other optimizations. 00:19:41.780 |
In terms of future plans, we're excited to integrate Alice and Julian, our two digital workers, 00:19:45.780 |
so that these two agents can engage leads across multiple channels, both inbound and outbound. 00:19:53.780 |
We've done some work here already, and I wish I were able to talk more about it. 00:19:56.780 |
And then finally, we're really excited about applications of new technologies like computer use and memory and reinforcement learning. 00:20:04.780 |
Yeah, if any of this sounds exciting to you, 00:20:08.780 |
if you're sick of building software and want to build digital workers instead, 00:20:11.780 |
come find me, or anyone else from 11x, after the talk. 00:20:47.780 |
Next, let's discuss lessons learned from building agents at Box. 00:20:53.780 |
I'm Ben Kus, and I'm here to talk about the lessons that we learned at Box, working on agents and agentic architectures. 00:21:16.780 |
We have a particular focus on large enterprises. 00:21:24.780 |
We have over 100,000 companies as customers. 00:21:31.780 |
In many of these companies, we are actually the first AI that they started to deploy across the company. 00:21:38.780 |
Partially because large enterprises are cautious about AI, and we were lucky enough to already have their trust. 00:21:43.780 |
So when we do AI, we're always thinking of enterprise-grade. 00:21:47.780 |
Now, when we went to do AI on content, typically we would think about it in all these different ways, 00:21:53.780 |
where we had kind of standard RAG stuff, kind of doing Q&A across a bunch of documents, 00:21:59.780 |
searching and doing deep research across a whole corpus of data. 00:22:04.780 |
And then data extraction is also one of the features that we have. 00:22:07.780 |
So we do extracting structured information from unstructured data. 00:22:10.780 |
In addition to things like AI-powered workflows, like being able to do, like, 00:22:14.780 |
insurance summary generation, and this kind of thing. 00:22:20.780 |
But today, to talk about our journey, I'd like to talk about the middle one here, data extraction. 00:22:26.780 |
We've been integrating AI into our products since 2023. 00:22:34.780 |
And I chose this one partially because, on this list, it's the least agentic-looking type of functionality. 00:22:40.780 |
There's no, like, chatbot associated with it. 00:22:44.780 |
So this is an interesting lesson that we've learned. 00:22:47.780 |
So if you don't know much about data extraction, 00:22:50.780 |
the idea behind it is that many companies have an awful lot of unstructured data. 00:22:55.780 |
Probably 90% of the data in the world is unstructured, not sitting in a structured system. 00:23:01.780 |
And so companies always want to get data out of their unstructured data. 00:23:05.780 |
So this is what we call data extraction, or metadata extraction. 00:23:08.780 |
And it's a common request for many companies. 00:23:12.780 |
And there's a whole industry here that you've probably never heard of called IDP, intelligent document processing. 00:23:17.780 |
And it's really oriented around machine learning-based systems where you train a model, 00:23:23.780 |
you need a bunch of data scientists and do this kind of interaction. 00:23:26.780 |
But it really didn't work that well historically. 00:23:29.780 |
Many companies would only automate things that are extremely high-scale. 00:23:38.780 |
Because if you changed the format of anything, it would just kind of break. 00:23:42.780 |
So, when generative AI came out, this was, like, a gift for anybody who works on structured data extraction. 00:23:49.780 |
Because you could actually just use the power of LLMs to be able to pull out structured data. 00:24:06.780 |
And then be able to get to a really high-quality set of extraction results. 00:24:21.780 |
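(As a rough illustration of that idea, here is a minimal structured-extraction sketch in TypeScript using an LLM with a schema; the field names are hypothetical, and this is not Box's implementation.)

```ts
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

// Hypothetical schema for fields a customer might want out of a contract.
const contractFields = z.object({
  counterparty: z.string().describe("Legal name of the other party"),
  effectiveDate: z.string().describe("Contract effective date, ISO 8601"),
  renewalTerms: z.string().describe("Renewal or termination terms, summarized"),
});

const model = new ChatOpenAI({ model: "gpt-4o" });

// Ask the model to return only the structured fields, grounded in the document text.
export async function extractContractFields(documentText: string) {
  return model.withStructuredOutput(contractFields).invoke([
    { role: "system", content: "Extract the requested fields from the document." },
    { role: "user", content: documentText },
  ]);
}
```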
And we got to the point where we were saying, like, this works on any kind of document. 00:24:27.780 |
And so, it truly became one of the foundations of our AI content features. 00:24:43.780 |
When we started telling our customers, just give us any data and we'll be able to extract the things you want. 00:24:49.780 |
And so, they were like, oh, I've never been able to process this thing before. 00:24:52.780 |
This 300-page document that was loaded down with content. 00:25:01.780 |
We built the concept of, like, an enterprise-grade RAG pipeline where we were able to get the data out. 00:25:08.780 |
But then, they were like, okay, it turns out OCR doesn't work that well in certain cases. 00:25:13.780 |
People can cross things out, or you have to deal with different languages. 00:25:18.780 |
Then, we had this challenge where some people were like, okay, I don't want just 20 pieces of data, I want hundreds. 00:25:27.780 |
And that just kind of, like, overwhelms the attention of the model when it's trying to pull all those things out for you. 00:25:32.780 |
And then, people in this world, they're used to things like confidence scores. 00:25:36.780 |
They're like, well, how do I know what's right? 00:25:39.780 |
And, of course, generative AI doesn't have confidence scores the way traditional ML models do. 00:25:42.780 |
So, we had to, like, start doing things like, oh, we'll run an LLM as a judge. 00:25:56.780 |
And this was, like, our moment of hitting the trough of disillusionment.
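(A hedged sketch of the LLM-as-a-judge idea mentioned above, for approximating a confidence score on an extracted value; the schema and prompts are illustrative, not Box's actual approach.)

```ts
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

// What the judge returns: a rough, model-estimated stand-in for a confidence score.
const judgeSchema = z.object({
  supported: z.boolean().describe("Is the extracted value supported by the document?"),
  confidence: z.number().min(0).max(1).describe("Judge's confidence in the value"),
});

const judge = new ChatOpenAI({ model: "gpt-4o" }).withStructuredOutput(judgeSchema);

// Second pass: grade an extracted value against the source text it came from.
export async function scoreExtraction(documentText: string, field: string, value: string) {
  return judge.invoke([
    { role: "system", content: "You are grading a data extraction against the source document." },
    { role: "user", content: `Document:\n${documentText}\n\nField: ${field}\nExtracted value: ${value}` },
  ]);
}
```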