LangChain Interrupt 2025: Building and Scaling an AI Agent During Hypergrowth – Sherwood Callaway


Transcript

They built their agents, Alice and Julian, on LangGraph and our platform, and they'll share their lessons learned. So let's give them a warm welcome. Hey, everyone. How's it going? My name's Sherwood. I'm one of the tech leads here at 11x, and I'm the engineering lead for our Alice product. And today I'm joined by Keith, who is the product manager for the Alice product.

Now, 11x, for those of you who are unfamiliar, is a company that's building digital workers. We have two digital workers today. The first is Alice; she's our AI SDR. And the second is Julian, our AI voice agent. We've got more workers on the way. I want to take everybody back to September 2024, which for most people is not long ago, but in our company's history is an eternity.

We had just crossed $10 million ARR. We had just announced our Series A, and then our Series B landed just 15 days later. With all this chaos going on, we relocated our team and company from London to San Francisco, to a beautiful new office with our beautiful new CTO. And we did it all rapidly, because we're 11x.

And during all this chaos, we chose this moment to rebuild our core product from the ground up. And the reason we did that is that we truly felt at the time, and we're still sure today, that agents were the future. So in today's talk, I want to first tell you why we felt the need to rebuild our core product from scratch.

Hopefully everyone's already in agreement that agents are the future. Then I'll tell you how we did it: we built this enterprise-grade AI SDR in just three months. Then I'm going to talk about one of the new challenges that we experienced, which was finding the right agent architecture.

And I'll wrap up with some reflections on building agents and some closing thoughts. Let's start with the decision to rebuild. Why did we feel like we needed to rebuild our core product from scratch at such a critical moment? Well, to answer that question, we first need to understand Alice 1.

Alice 1 was our original AI SDR product. The main thing that users did with Alice was create these custom AI-powered outreach campaigns. And there were five steps involved in campaign creation. The first step is defining your audience. That's when you identify the people that you'd like to sell to.

Then in the second step, you describe your offer. This is the product or service that you'd like to sell. Then in the third and fourth steps, you construct your sequence and tweak the AI-powered messaging. And finally, when everything's to your liking, you move on to the last step, which is launching the campaign.

And that's when Alice will start sourcing leads that match your ICP, researching them, writing those customized emails, and in general just executing the sequence that you've built for every lead that enters the campaign. Now, Alice 1 was a big success by a lot of different metrics. But we wouldn't really consider her a true digital worker.

And that's for a lot of reasons. For one, there was a lot of button clicking. More than you would probably expect from a digital worker. And you also probably saw there was a lot of manual input, especially on that offer stage. Our lead research was also relatively basic. We weren't doing deep research or scraping the web or anything like that.

And downstream, that would lead to relatively uninspiring personalization in our emails. On top of that, Alice wasn't able to handle replies automatically; she wasn't able to answer customers' questions. And finally, there was no real self-learning: she wasn't getting better over time. Meanwhile, while we were building Alice 1, the industry was evolving around us.

In March of 2023, we got GPT-4. We got the first Claude model. And we got the first agent frameworks. Later that year, we got Claude 2, and we got function calling in the OpenAI API. Then in January of 2024, we got a more production-ready agent framework in the form of LangGraph.

In March, we got Claude 3. And then in May, we got GPT-4o. And finally, in September, we got the Replit agent, which for us was the first example of a truly mind-blowing agentic software product. And just to double-click into the Replit agent a little bit: this really blew our minds.

It convinced us of two things. First, that agents were going to be really powerful; they could build an entire app from scratch. And second, that they're here today; they're ready for production. So with that in mind, we developed a new vision for Alice, centered on seven agentic capabilities. The first one was chat.

We believed that users should mostly interact with Alice through chat, the same way they would interact with a human teammate. Second, users should be able to upload internal documents, their websites, and meeting recordings to a knowledge base, and in doing so, train Alice. Third, we should use an AI agent for lead sourcing that actually considers the quality and fit of each lead, rather than a dumb filtered search.

Number four, we should do deep research on every lead. And that should lead to number five, which is true personalization in those emails. And then finally, we believe that Alice should be able to handle inbound messages automatically, answering questions and booking meetings. Also, she should be self-learning. She should incorporate the insights from all of the campaigns she's running to optimize the performance of your account.

So that was our vision. And with that in place, the stage was set to rebuild Alice from scratch. In short, this was a pretty aggressive push for the company. It took us just three months from the first commit to migrating our last business customer. We initially staffed just two engineers on building the agent.

After developing the POC, we drew in more resources. We had one product manager, our one and only Keith. And we had about 300 customers that needed to be migrated from the original platform to the new one, and that number was growing by the day. And yet our go-to-market team was really not slowing down.

There were a few key decisions that we made at the outset of this project. The first is that we wanted to start from scratch. We didn't want Alice 2 to be encumbered by Alice 1 and its tech debt. So: new repo, new infrastructure, new team. We also didn't want to reinvent the wheel.

We were going to be taking on a lot of risk with some unfamiliar technologies, like the agent and the knowledge base. We didn't want to add additional risk through technologies that we didn't understand. So we chose a very vanilla tech stack. And number three, we wanted to leverage vendors as much as possible to move really quickly.

We didn't want to be building non-essential components. This is the tech stack that we went with. I won't go into too much detail here, but I thought it would be interesting to see. And here are some of the vendors that we chose to leverage and work with. I can't go into detail on every one of the vendors, but they were all essential to shipping this quickly.

Of course, one of the most important vendors we chose to work with is LangChain. And we knew that they were going to be a really good partner from the start. LangChain was a very natural choice for us. They were a clear leader in AI dev tools and AI infrastructure.

They had an agent framework ready to go. They had cloud hosting and observability, so we knew we were going to be able to get into production, and that once the agent was in production, we would understand how it was performing. We also had some familiarity from Alice 1: we were using the core SDK with Alice 1.

And then LangChain also had TypeScript support, which is important to us as a TypeScript shop. And last but not least, the customer support from the LangChain team was just incredible. It really felt like an extension of our team. They ramped us up on LangGraph, on the LangChain ecosystem, and on agents in general.

We are so grateful to them for that help. In terms of the products that we use today, we use pretty much the entire suite. And now I want to talk you through one of the main challenges that we encountered while building Alice 2, which was finding the right agent architecture.

And you'll remember that the main feature of Alice was campaign creation. So we wanted the Alice agent to guide users through campaign creation in the same way that the Replit agent guides you through creating a web app. We tried three different architectures for this. The first was ReAct. The second was a workflow.

And then finally, we landed on a multi-agent system. So now I'm going to talk you through each of these: how it works in detail, and why it didn't work for our use case, until we arrived at multi-agent. Let's start with ReAct. Well, React is a JavaScript framework for building user interfaces, but that's not what I mean here.

I mean the ReAct model of an AI agent, which I think other people have talked about earlier today. This is a model that was introduced by Google researchers back in 2022, and it stands for Reason and Act. And basically, what these researchers observed is that if you include reasoning traces in the conversation context, the agent performs better than it otherwise would.

And with a ReAct agent, the execution loop is split into three parts. There's reason, where the agent thinks about what to do. There's act, where the agent actually takes an action, for example, performing a tool call. And then finally there's observe, where the agent observes the new state of the world after performing the action.

And I guess ReAct wasn't a perfect name, since the observe step doesn't show up in it. But as I mentioned, reasoning traces lead to better performance in the agent. This is our implementation of a ReAct agent. It consists of just one node and 10 or 20 tools. It's not very impressive looking, I know. But this simplicity is actually one of the main benefits of the ReAct architecture, in my opinion.

So why do we have so many tools? Well, there are lots of different things that need to happen in campaign creation. We need to fetch leads from our database. We need to insert new DB entities and draft emails. And all of those things become a tool. The ReAct loop I mentioned on the previous slide is implemented inside of the assistant node.

And when the assistant actually performs an action, that's manifested in the form of a tool call, which is executed by the tool node. One thing to note about the ReAct agent is that it runs completions on every turn. So if the user says hello, and then says they'd like to create a campaign, that would be two turns, and the ReAct agent runs completions each time. That's going to become relevant later.
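
To make that concrete, here's a minimal sketch (not our production code) of that one-node-plus-tools shape in LangGraph's TypeScript SDK. The model choice and the fetch_leads tool are illustrative stand-ins.

```typescript
import { AIMessage } from "@langchain/core/messages";
import { tool } from "@langchain/core/tools";
import { MessagesAnnotation, StateGraph, START, END } from "@langchain/langgraph";
import { ToolNode } from "@langchain/langgraph/prebuilt";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

// One stand-in for the 10-20 real campaign-creation tools.
const fetchLeads = tool(
  async ({ icp }) => `Leads matching "${icp}": ...`, // imagine a real DB query
  {
    name: "fetch_leads",
    description: "Fetch leads from the database that match an ICP.",
    schema: z.object({ icp: z.string() }),
  }
);
const tools = [fetchLeads];

const model = new ChatOpenAI({ model: "gpt-4o" }).bindTools(tools);

// Reason: one completion decides what to do next.
async function assistant(state: typeof MessagesAnnotation.State) {
  return { messages: [await model.invoke(state.messages)] };
}

// Act or finish: route to the tool node if the model requested a tool call.
function route(state: typeof MessagesAnnotation.State) {
  const last = state.messages[state.messages.length - 1] as AIMessage;
  return last.tool_calls?.length ? "tools" : END;
}

export const agent = new StateGraph(MessagesAnnotation)
  .addNode("assistant", assistant)
  .addNode("tools", new ToolNode(tools)) // Observe: tool results re-enter state
  .addEdge(START, "assistant")
  .addConditionalEdges("assistant", route, ["tools", END])
  .addEdge("tools", "assistant") // loop back and reason again
  .compile();
```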

Here are some of the tools that we implemented and attached to our agent. Unfortunately, Alice 2 predated MCP, so we didn't use an MCP server or any third-party tool registries. A few things I want to tell you about tools before we move on.

The first is that tools are necessary to take action. Any time you want your agent to do anything in the outside world, for example, call an API or write a file, you're going to need a tool to do that. They're also necessary to access information beyond the context window.

If you think about it, what your agent knows is limited to three things: the conversation context, the prompts, and the model weights. If you want it to know anything beyond that, you need to give it a tool, for example, a web search tool. And that's essentially the concept behind RAG. Tools can also be used to call other agents.

This is one of the easiest and safest ways to get started with a multi-agent system. And last but not least, tools are preferable over skills. This is a framework I came up with. Essentially, if someone asks you to do something like perform a complex calculation, you can do that either through a tool, like a calculator, or through a skill, like the mental math you'd use to perform that calculation yourself. And in general, it's better to use a tool than to use a skill, because this minimizes the number of tokens you're spending in context to accomplish the task.
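
As a quick sketch of the idea, here's what the calculator-as-a-tool version can look like; mathjs is just one convenient evaluator.

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import { evaluate } from "mathjs";

// Tool over skill: the model spends a few tokens emitting a tool call,
// and deterministic code does the actual computation.
export const calculator = tool(
  async ({ expression }) => String(evaluate(expression)),
  {
    name: "calculator",
    description: "Evaluate an arithmetic expression, e.g. '(17 * 43) + 5'.",
    schema: z.object({
      expression: z.string().describe("A plain arithmetic expression"),
    }),
  }
);
```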

So what are the strengths of the ReAct architecture? Well, I mentioned the main one already: it's extremely simple. We basically never needed to revise our agent structure later on.

It was also great at handling arbitrary user inputs over multiple turns. This is because the graph runs a completion on every turn. It allows the user to say something in step three that's related to step one without the agent getting confused; it's actually robust to that. So that was a great strength of this architecture.

But it had some issues. For example, a ReAct agent behaves kind of badly when it's attached to a lot of tools. As you know, what sometimes happens when you do that is the agent struggles with which tool to call, and in what order. And this would sometimes lead to infinite loops, with the agent repeatedly trying to accomplish some part of campaign creation and not succeeding.

When those infinite loops ran on for a while, we would get a recursion limit error, which is effectively the agent falling flat on its face.
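
For what it's worth, LangGraph exposes this as a configurable step budget. A sketch of guarding a run, reusing the agent graph sketched earlier:

```typescript
import { HumanMessage } from "@langchain/core/messages";
import { GraphRecursionError } from "@langchain/langgraph";

try {
  await agent.invoke(
    { messages: [new HumanMessage("Create a campaign for fintech CTOs")] },
    { recursionLimit: 50 } // max graph steps per run; the default is 25
  );
} catch (err) {
  if (err instanceof GraphRecursionError) {
    // The agent looped without finishing, i.e. it fell on its face.
  }
}
```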

On top of that, the outputs that we were getting from this version of the agent were relatively mediocre. The audiences, the sequences, the emails: they just weren't that good. And our hypothesis was that because there's just one agent, and really just one set of prompts, responsible for the entire campaign creation process, it wasn't really good at any particular part of it. So what could we do? How could we address these issues? In our case, we chose to add structure, which is to say, a workflow.

Now, a workflow is defined by Anthropic as a system where LLMs and tools are orchestrated through predefined code paths. This definition and this screenshot both come from an excellent blog post by Anthropic called Building Effective Agents; I highly recommend checking it out. Importantly, workflows are different from agents.

And this is one of the things that the agent community has been debating a lot, on Twitter for example. That's the reason we have the term "agentic" for describing a system: a system can be agentic without being an agent per se.

Workflows are highly structured. In fact, with predefined code paths, the LLM is not choosing how to orchestrate the work; the LLM calls just happen along those predefined code paths. And last but not least, workflows are not really a new technology. They've been around a long time in other forms, and the most famous form is probably the DAG in data engineering tools like Airflow.

This was our implementation of a workflow campaign creation agent. It's obviously a lot more complex than our ReAct agent. We now have 15 nodes, split across five different stages. These stages correspond to the steps of campaign creation that I mentioned before. Interestingly, this graph, unlike the ReAct agent, doesn't run a completion on every turn.

It only runs through once for the entire campaign creation process. And the way that we get user input or feedback at certain points during graph execution is through something called interrupts, which is a LangGraph feature.
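
Here's a heavily condensed sketch of that shape: specialized nodes along fixed edges, with LangGraph's interrupt() pausing for user feedback. The real graph had 15 nodes across five stages; the state fields and prompts below are illustrative.

```typescript
import {
  Annotation,
  MemorySaver,
  StateGraph,
  START,
  END,
  interrupt,
} from "@langchain/langgraph";

// A sliver of the campaign state; the real graph tracked much more.
const CampaignState = Annotation.Root({
  audience: Annotation<string>,
  emails: Annotation<string[]>,
});

async function defineAudience(state: typeof CampaignState.State) {
  const draft = "VP Sales at Series B fintechs"; // imagine a completion here
  // Pause the run, surface the draft, and resume with the user's feedback.
  const approved = interrupt({ question: "Does this audience look right?", draft });
  return { audience: approved as string };
}

async function writeEmails(state: typeof CampaignState.State) {
  // A specialized node: one prompt that only writes emails.
  return { emails: [`Hi! Reaching out to ${state.audience}...`] };
}

// Fixed edges: the LLM never chooses the path through the graph.
export const campaignWorkflow = new StateGraph(CampaignState)
  .addNode("define_audience", defineAudience)
  .addNode("write_emails", writeEmails)
  .addEdge(START, "define_audience")
  .addEdge("define_audience", "write_emails")
  .addEdge("write_emails", END)
  .compile({ checkpointer: new MemorySaver() }); // interrupts require a checkpointer
```

Resuming after an interrupt is done by re-invoking the graph with a Command carrying the user's answer.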

There were a number of strengths to the workflow architecture. It basically solved all the problems that we'd seen with ReAct. For one, we no longer had issues with tool calling, because we just didn't have tools. We replaced them with nodes: specialized nodes, like a write-email node. And we also got a clearly defined execution flow with a fixed number of steps.

So no more infinite loops, no more recursion errors. On top of that, we got much better outputs. The email sequences that we were getting from this version of the agent were much better, and that's because we forced the agent to go through these particular steps. But the workflow architecture did have issues.

For one, it was extremely complex. And now our front-end campaign creation experience was coupled to the architecture of our agent, and we would have to change that architecture and that graph structure any time we made changes to the campaign creation experience. It was super laborious and annoying.

It also didn't support jumping around within the campaign creation flow. That's because the graph doesn't run a completion on every turn. When you get to step three and you pause at an interrupt to collect feedback on that step, you can really only respond to what's happening in step three. You can't jump back to step one.

So really, workflows were not going to work for our use case. What else could we try? Well, after some soul searching, we came across a blog post by LangChain that explains how to build a customer support agent using a multi-agent architecture. And this was the model that gave us the insight we needed for our use case.

A multi-agent system takes a hierarchical approach to building an AI agent. In this pattern, there's one supervisor, and there are many sub-agents that are specialized. The supervisor is responsible for interacting with the user and delegating tasks to sub-agents. The sub-agents then fulfill those tasks and report back to the supervisor when they're done.

And we really devoured this blog post by LangChain. We went a little crazy in the process, but ultimately found a version of this that works for our use case. And here's what that looks like. We have a multi-agent graph that accomplishes all of campaign creation except for audience creation, which we kept separate for various reasons.

And you can see here at the top is our supervisor node, close to the start of the graph. And then we have four specialist sub-agents. We have a researcher. We have something that generates what we call a positioning report, which is how you should position your product or service for a particular lead.

Then we have a LinkedIn message writer. And finally, we have an email writer. And this multi-agent architecture gave us the best of both worlds: we got the flexibility of the ReAct agent, and we got the performance of the workflow.
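
Here's a rough sketch of that supervisor pattern, again in LangGraph's TypeScript SDK. The specialist bodies are stubs, and the routing prompt is illustrative.

```typescript
import { AIMessage } from "@langchain/core/messages";
import { Annotation, MessagesAnnotation, StateGraph, START, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

// Shared message state, plus a routing field the supervisor writes.
const TeamState = Annotation.Root({
  ...MessagesAnnotation.spec,
  next: Annotation<string>,
});

// One completion picks the next specialist, or __end__ when the task is done.
const router = new ChatOpenAI({ model: "gpt-4o" }).withStructuredOutput(
  z.object({ next: z.enum(["researcher", "email_writer", "__end__"]) })
);

async function supervisor(state: typeof TeamState.State) {
  const { next } = await router.invoke([
    { role: "system", content: "Delegate to the next specialist, or __end__ if done." },
    ...state.messages,
  ]);
  return { next };
}

// Stubs: in a real system, each specialist is its own prompt or sub-graph.
async function researcher(state: typeof TeamState.State) {
  return { messages: [new AIMessage("Research findings: ...")] };
}
async function emailWriter(state: typeof TeamState.State) {
  return { messages: [new AIMessage("Drafted emails: ...")] };
}

export const team = new StateGraph(TeamState)
  .addNode("supervisor", supervisor)
  .addNode("researcher", researcher)
  .addNode("email_writer", emailWriter)
  .addEdge(START, "supervisor")
  // The supervisor delegates to whichever specialist it chose, or ends the run.
  .addConditionalEdges("supervisor", (s) => s.next, ["researcher", "email_writer", END])
  // Specialists always report back to the supervisor when they finish.
  .addEdge("researcher", "supervisor")
  .addEdge("email_writer", "supervisor")
  .compile();
```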

Now I want to share a couple of reflections on building agents from this experience. The first is that simplicity is key. All of that structure and scaffolding can provide performance gains in the short term, but over the long term it locks you into a structure that can become counterproductive. And related to this is that new models can really change everything. Amjad at Replit told us this about the Replit agent.

He said it wasn't really working until Sonnet 3.5 came out; then they dropped it in and everything was magic. And that's really true. It's also useful to act as though your agent is a human worker, or as though it has human co-workers. In our case, we had different mental models: we thought of the agent as a user flow within our product, or as a directed graph.

Those were the wrong mental models, and they led us to implement the wrong architectures. We should also break big tasks down into small tasks. In our case, the big task was campaign creation, but there were small tasks, like writing an email, within it. And it became easier to implement the agent once we broke it down into those smaller tasks.

Tools are preferable over skills. So don't try to make your agent too smart. Just give it the right tools and tell it how to use them. And then last but not least, don't forget about prompt engineering. It's easy to forget that your agent is just a series of LLM calls within a while loop.

If your agent isn't performing well, you should consider going back and doing some prompt engineering. I wish we had time for a demo, but I don't. I do have this QR code, though, and I'll leave it up for a moment. If you're not able to get to it now, the slides will be available afterwards.

You can check out what we've done. Alice 2 went live in January, and the results are pretty exciting. She's now sourced close to 2 million leads. I think these numbers are a bit out of date. We've sent close to 3 million messages and generated about 21,000 replies. Over the last month or so, the reply rate is around 2%, which is roughly on par with a human SDR.

And we're starting to see that climb as we implement self-learning and some other optimizations. In terms of future plans, we're excited to integrate Alice and Julian, our two digital workers, so that they can engage leads across multiple channels, both inbound and outbound. We're also really excited about self-learning.

We've done some work here that I wasn't able to talk about today. And then finally, we're really excited about applications of new technologies like computer use, memory, and reinforcement learning. And yeah, if any of this sounds exciting to you: I know some of you are sick of building regular software and would rather build digital workers.

Come find me, or message me, or anyone else from 11x here today; we'd love to chat. Cool. Thank you, guys. Cheers. All right. Next up, we're welcoming our next presenter to the stage.

It's Box. Box is coming to the main stage to discuss their lessons learned from building agents. So welcome. Hello. Hi, I'm Ben Kus, and I'm here to talk about the lessons that we learned at Box working on agents and agentic architectures. If you don't know Box, we're a B2B company.

Many people know us for content sharing and the management of unstructured content. We have a particular appeal to large enterprises, big companies across the Fortune 500. We have over 100,000 companies as customers, and those customers have given us their trust. In many of these companies, we are actually the first AI that they started to deploy across the company.

That's partially because large enterprises are wary of AI, and we were lucky enough to already have their trust. So when we do AI, we're always thinking enterprise-grade. Now, when we went to do AI on content, we would typically think about it in all these different ways. We had kind of standard RAG stuff, doing Q&A across a bunch of documents, and searching and doing deep research across a whole corpus of data.

And then data extraction is another one of the features that we have: extracting structured information from unstructured data. In addition to things like AI-powered workflows, like being able to do insurance summary generation and that kind of thing. But today, to talk about our journey, I'd like to talk about the middle one here, data extraction.

We'll talk about how, since we started integrating AI into our products in 2023, we evolved to be more agentic. And I chose this one partly because, on this list, it's the least agentic-looking type of functionality. There's no chatbot associated with it.

So this is an interesting lesson that we've learned. If you don't know much about data extraction, the idea behind it is that many companies have an awful lot of unstructured data. Probably 90% of the data in the world is not in a structured system; it's unstructured.

And there's a lot of very useful data in it. So companies always want to get data out of their unstructured data. This is what we call metadata extraction, or data extraction. And it's a common request from many companies. And there's a whole industry here that you've probably never heard of called IDP, intelligent document processing.

And it's really oriented around machine learning-based systems where you train a model; you need a bunch of data scientists and that kind of investment. But it really didn't work that well historically. Many companies would only automate things that were extremely high-scale. It just wasn't very common in the industry.

And also, it would break all the time. Very brittle. Because if you changed the format of anything, it would just stop working. So when GenAI came out, this was like a gift for anybody who works on structured data, because you could actually just use the power of LLMs to pull out structured data.

So, for us, we started with this architecture. Really straightforward: take a document, take the fields we're looking for, do some pre-processing, run some OCR, and then send it all to a large language model to get high-quality structured results out the other end.
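
To make that concrete, here's a minimal sketch of that first-generation shape. I'm using LangChain's structured-output helper and made-up invoice fields purely for illustration; the talk doesn't specify Box's actual stack.

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

// The fields the customer wants pulled out of an unstructured document.
const InvoiceFields = z.object({
  vendor: z.string(),
  totalAmount: z.string(),
  dueDate: z.string(),
});

const extractor = new ChatOpenAI({ model: "gpt-4o" }).withStructuredOutput(InvoiceFields);

// `ocrText` is the output of the upstream pre-processing and OCR steps.
export async function extractFields(ocrText: string) {
  return extractor.invoke(
    `Extract the requested fields from this document:\n\n${ocrText}`
  );
}
```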

This is amazing. When we did this, we initially deployed it: ten million pages with the first customers, and everything was working. And we got to the point where we were saying, like, this is generic document extraction. This is amazing. And it was truly built on the basics of AI on content.

And so, this was great. It was kind of like: generative AI solves it. We did it. I thought. But then we started to get the problems. We had started telling our customers, just give us any data and we'll be able to extract the things you want. And they did. And so they were like, oh, I've never been able to process this thing before.

This 300-page document that wouldn't fit in the context windows of the time. And we were like, okay, no problem, we'll pre-process more. We built the concept of an enterprise RAG pipeline that let us get the data out. So, okay, solved that. But then they were like, okay, it turns out OCR doesn't work that well in certain cases.

People can cross things out, or you have to handle different languages. So we had to start to solve that. Then we had this challenge where some people were like, okay, I want not just 20 pieces of data but, like, 200 or 500 different pieces. And that just kind of overwhelms the attention of the model when it tries to pull those things out for you, especially on complex documents.

And then, people in this world are used to things like confidence scores. They're like, well, how do I know what's right? What's your confidence score? And of course, generative AI doesn't have confidence scores the way classic ML models do. So we had to start doing things like, oh, we'll run an LLM as a judge, something like the sketch below.
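
A minimal sketch of that judge pass (the prompt and grading schema here are illustrative):

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { z } from "zod";

// Second pass: a judge model grades the extraction after the fact,
// standing in for the confidence scores classic ML systems provide.
const judge = new ChatOpenAI({ model: "gpt-4o" }).withStructuredOutput(
  z.object({ accurate: z.boolean(), explanation: z.string() })
);

export async function judgeExtraction(ocrText: string, extracted: unknown) {
  return judge.invoke(
    `Document:\n${ocrText}\n\nExtracted fields:\n${JSON.stringify(extracted)}\n\n` +
      `Do the extracted fields accurately reflect the document?`
  );
}
```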

And it'll tell you, after the extraction is done, whether things look accurate. And customers were like, okay, sure, but if it says it's wrong, why did you return it to me at all? You know it's wrong. So we ended up with all these challenges. And this was, like, our moment of the trough of disillusionment, going from somewhere high to somewhere low.

Because, you know, it wasn't working out.