Alice 2: Building and Scaling an AI Agent During HyperGrowth | 11x | LangChain Interrupt

Hi everyone, how's it going? My name is Sherwood, I'm one of the tech leads here at 11x. I lead engineering for our Alice products, and today I'm joined by Keith, our head of growth, who is the product manager for this Alice project. Now 11x, for those of you who are unfamiliar, is a company that's building digital workers. We have two digital workers today: the first is Alice, she's our AI SDR, and the second is Julian, he's an AI voice agent, and we've got more workers on the way.

I want to take everybody back to September 2024. For most people that's not long ago; for us, you know, it's half the company's history. We had just crossed $10 million ARR, we had just announced our Series A, and then our Series B was announced 15 days later. With all this chaos going on, we relocated our whole team and company from London to San Francisco, to our beautiful new office with our beautiful new CTO, and, you know, at the same time we also bought a rocket, because we're 11x. And during all this chaos we chose this moment to rebuild our core product from the ground up. The reason we did that is because we truly felt at the time, and it proved to be true, that agents were the future.

So in today's talk I want to first tell you why we felt the need to rebuild Alice from scratch (hopefully everyone is probably in agreement about agents being the future). Then I'll tell you how we did it, how we built this enterprise-grade AI SDR in just three months. Then I want to talk you through one of the challenges that we experienced, which was finding the right agent architecture, and I'll wrap up with some reflections on building agents and some closing thoughts.

So let's start with the decision to rebuild.
Why did we feel like we needed to rebuild our core product from scratch at such a critical moment? Well, to understand that question you first need to understand Alice 1. Alice 1 was our original AI SDR product, and the main thing you could do with Alice was create these custom AI-powered outreach campaigns. There were five steps involved in campaign creation. The first step is defining your audience; that's when you identify the people that you'd like to sell to. In the second step you describe your offer; this is the product or service that you're hoping to sell. Then in the third and fourth steps you construct your sequence and tweak the AI-generated messaging. And finally, when everything is to your liking, you move on to the last step, which is launching the campaign. That's when Alice will begin sourcing leads that match your ICP, researching them, writing those customized emails, and in general just executing the sequence that you've built for every lead that enters the campaign.
Now Alice 1 was a big success by a lot of different metrics, but we wouldn't really consider her a true digital worker, and that's for a lot of reasons. For one, there was a lot of button clicking, more than you would probably expect of a digital worker, and there was also a lot of manual input, especially on that offers page. Our lead research was also relatively basic; we weren't doing deep research or scraping the web or anything like that, and downstream that would lead to relatively uninspiring personalization in our emails. On top of that, Alice wasn't able to handle replies automatically; she wasn't able to answer customers' questions. And finally, there was no real self-learning component: she wasn't getting better over time.
Meanwhile, while we were building Alice 1, the industry was evolving around us. In March of 2023 we got GPT-4, we got the first Claude model, and we got the first agent frameworks. Later that year we got Claude 2 and we got function calling in the OpenAI API. Then in January of 2024 we got a more production-ready agent framework in the form of LangGraph. In March we got Claude 3, in May we got GPT-4o, and finally in September we got the Replit Agent, which for us was the first example of a truly mind-blowing agentic software product. And just to double-click into the Replit Agent a little bit: it really blew our minds. It convinced us of two things. First, that agents were going to be really powerful; they could build entire apps from scratch. And second, that they're here today; they're ready for production.
So with that in mind, we developed a new vision for Alice centered on seven agentic capabilities. The first one was chat: we believe that users should mostly interact with Alice through chat, the way they would interact with a human team member. Secondly, users should be able to upload internal documents, their websites, and meeting recordings to a knowledge base, and in doing so they would train Alice. Third, we should use an AI agent for lead sourcing that actually considers the quality and fit of each lead, rather than a dumb filter search. Number four, we should do deep research on every lead, and that should lead to number five, which is true personalization in those emails. Then finally, we believe that Alice should be able to handle inbound messages automatically, answering questions and booking meetings, and also that she should be self-learning: she should incorporate the insights from all of the campaigns she's running to optimize the performance of your account. So that was our vision.
And with that in place, we set about rebuilding Alice from scratch. In short, this was a pretty aggressive push for the company. It took us just three months from the first commit to migrating our last business customer. We initially staffed just two engineers on building the agent; after developing the POC we brought in more resources. We had one project manager, our one and only Keith, and we had about 300 customers that needed to be migrated from our original platform to the new one, a number that was growing by the day; our go-to-market team was just really not slowing down.
There were a few key decisions that we made at the outset of this project. The first is that we wanted to start from scratch: we didn't want Alice 2 to be encumbered by Alice 1 in any way, so new repo, new infrastructure, new team. We also didn't want to reinvent the wheel. We were going to be taking on a lot of risk with some unfamiliar technologies, like the agent and the knowledge base, and we didn't want to add additional risk through technologies that we didn't understand, so we chose a very vanilla tech stack. And number three, we wanted to leverage vendors as much as possible to move really quickly; we didn't want to be building non-essential components. This is the tech stack that we went with; I won't go into too much detail here, but I thought people would be interested to see it. And here are some of the vendors that we chose to leverage and work with. I can't go into detail on every one of these vendors, but they were all essential to our success, and I wanted to shout out everyone that has been helpful.
everyone out that that has been useful of course one of the most important 00:06:51.800 |
vendors we chose to work with was Langchain and we knew that we were going to 00:06:55.360 |
need a really good partner from the start if we're gonna pull this off Langchain was a 00:06:58.780 |
very natural choice for us they were a clear industry leader in AI dev tools 00:07:02.800 |
and AI infrastructure they had an agent framework ready to go that agent framework 00:07:06.940 |
had cloud hosting and observability so we knew we were going to be able to get 00:07:10.060 |
product get to production and that once our agent was in production we would 00:07:13.400 |
understand how it's performing we also had some familiarity from Alice one we were 00:07:17.440 |
using the the core SDK with Alice one and then Langchain also had TypeScript 00:07:22.360 |
support which is important to us as a TypeScript shop and last but not least the 00:07:26.460 |
customer support from the Langchain team was just incredible they really felt 00:07:30.060 |
like an extension of our team they ramped us up on Lang graph and the Langchain 00:07:33.060 |
ecosystem and on agents in general and we are so grateful to them for that help in 00:07:38.840 |
terms of the products that we use today we use pretty much the entire suite and now 00:07:44.560 |
Now I want to talk you through one of the main challenges that we encountered while building Alice 2, which was finding the right agent architecture. You'll remember the main feature of Alice was campaign creation, so we wanted the Alice agent to guide users through campaign creation the same way the Replit Agent would guide you through creating an app. We tried three different architectures for this: the first was ReAct, the second was a workflow, and finally we ended up on a multi-agent system. So now I want to talk you through each of these, how it works in detail, and why it didn't work for our use case, until we arrived at multi-agent.
Let's start with ReAct. Well, React is a JavaScript framework for building user interfaces, but that's not what I mean here. I mean the ReAct model of an AI agent, which I think other people have talked about earlier today. This is a model that was introduced by Google researchers back in 2022, and it stands for Reason and Act. Basically, what these researchers observed is that if you include reasoning traces in the conversation context, the agent performs better than it otherwise would. With a ReAct agent, the execution loop is split into three parts: there's reason, where the agent thinks about what to do; there's act, where the agent actually takes an action, for example performing a tool call; and then finally there's observe, where the agent observes the new state of the world after performing the action (I guess "ReAct-O" wouldn't have been a very good name). But as I mentioned, reasoning traces lead to better performance from the agent.
This is our implementation of a ReAct agent. It consists of just one node and 10 or 20 tools. It's not very impressive looking, I know, but this simplicity is actually one of the main benefits of the ReAct architecture, in my opinion. Why do we have so many tools? Well, there are lots of different things that need to happen in campaign creation: we need to fetch leads from our database, we need to insert new DB entities and draft emails, and all of those things become a tool. The ReAct loop that I mentioned on the previous slide is implemented inside of the assistant node, and when the assistant actually performs an action, that is manifested in the form of a tool call, which is then executed by the tool node. One thing to note about the ReAct agent is that it runs to completion for every turn: if the user says hello and then says "I'd like to create a campaign," that's two turns, and the ReAct agent runs to completion each time. That's going to become relevant later. Here are some of the tools that we implemented and attached to our agent. Unfortunately, Alice 2 predated MCP, so we didn't use an MCP server or any third-party tool registries.
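As a rough illustration of what a single-node ReAct setup looks like (not our production code), here's a minimal TypeScript sketch using LangGraph's prebuilt createReactAgent helper; the fetch_leads tool, its schema, and the model choice are hypothetical stand-ins:

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { tool } from "@langchain/core/tools";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { z } from "zod";

// Hypothetical campaign-creation tool; real tools would wrap our own APIs.
const fetchLeads = tool(
  async ({ icp, limit }) => {
    // ...query the lead database here...
    return JSON.stringify([{ name: "Ada Lovelace", title: "VP Engineering" }]);
  },
  {
    name: "fetch_leads",
    description: "Fetch leads from the database that match an ideal customer profile.",
    schema: z.object({ icp: z.string(), limit: z.number().default(10) }),
  }
);

// One assistant node plus a tool node, wired up by the prebuilt helper.
const agent = createReactAgent({
  llm: new ChatOpenAI({ model: "gpt-4o" }),
  tools: [fetchLeads /* , draftEmail, insertCampaign, ... */],
});

// Each user turn runs the ReAct loop to completion.
const result = await agent.invoke({
  messages: [{ role: "user", content: "Create a campaign targeting VPs of Engineering." }],
});
```

The prebuilt helper wires up essentially the two nodes I described: an assistant node that runs the reason-act loop and a tool node that executes whatever tool calls it emits.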
A few things I want to tell you about tools before we move on. The first is that tools are necessary to take action: any time you want your agent to do anything in the outside world, for example call an API or write a file, you're going to need a tool to do that. They're also necessary to access information beyond the context window. If you think about it, what your agent knows is limited to three things: the conversation context, the prompt, and the model weights. If you want it to know anything beyond that, you need to give it a tool, for example a web search tool, and that's essentially the concept behind RAG. Tools can also be used to call other agents; this is one of the easiest and simplest ways to get started with a multi-agent system (see the sketch below). And last but not least, tools are preferable over skills. This is a framework I came up with: if someone asked you to do something like perform a complex calculation, you could do that either through a tool, like a calculator, or maybe you have the skill of mental arithmetic required to perform that calculation. In general, it's better to use a tool than a skill, because this minimizes the number of tokens you're using in the context to accomplish that goal.
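On that "tools can call other agents" point, here's a hedged sketch of the pattern: a sub-agent wrapped behind an ordinary tool interface. The research_lead tool and the sub-agent are hypothetical; the shape is what matters, because the parent agent only ever sees a tool call and a string result.

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { tool } from "@langchain/core/tools";
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { z } from "zod";

// A hypothetical research sub-agent (its own tools are omitted for brevity).
const researchAgent = createReactAgent({
  llm: new ChatOpenAI({ model: "gpt-4o" }),
  tools: [],
});

// Expose the sub-agent to a parent agent as just another tool.
const researchLead = tool(
  async ({ leadName }) => {
    const out = await researchAgent.invoke({
      messages: [{ role: "user", content: `Research this lead: ${leadName}` }],
    });
    // Return only the final answer so the parent's context stays small.
    return out.messages[out.messages.length - 1].content as string;
  },
  {
    name: "research_lead",
    description: "Run the research sub-agent on a single lead and return its findings.",
    schema: z.object({ leadName: z.string() }),
  }
);
```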
What are the strengths of the ReAct architecture? Well, I mentioned one already: it is extremely simple. We basically never needed to revise our agent structure later on. It was also great at handling arbitrary user inputs over multiple turns. Because the graph runs to completion each time, the user can say something in step three that's related to step one without the agent getting confused; it's actually robust to that, and that was a great advantage. But it had some issues. For example, the ReAct agent was kind of bad at tools. We had attached a lot of tools, and as you know, what can sometimes happen when you do that is the agent struggles with which tool to call and in what order. This would sometimes lead to infinite loops, where the agent is repeatedly trying to accomplish some part of campaign creation but not succeeding. And when those infinite loops went on for a while, we would get a recursion limit error, which is effectively the agent equivalent of a stack overflow (see the snippet below).
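For what it's worth, LangGraph lets you cap and catch that failure mode. A sketch like the following (assuming the agent graph from the earlier example) bounds the loop for a run and handles the recursion error rather than letting it bubble up:

```typescript
import { GraphRecursionError } from "@langchain/langgraph";

// `agent` is the ReAct graph from the earlier sketch.
try {
  await agent.invoke(
    { messages: [{ role: "user", content: "Build me a campaign." }] },
    { recursionLimit: 25 } // cap the number of reason-act iterations for this run
  );
} catch (err) {
  if (err instanceof GraphRecursionError) {
    // The agent looped without finishing; surface a friendly failure instead of crashing.
    console.error("Agent hit the recursion limit before completing the task.");
  } else {
    throw err;
  }
}
```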
Also, the outputs that we were getting from this version of the agent were relatively mediocre. The audiences, the sequences, the emails: they just weren't that good. Our hypothesis was that because there's just one agent, and really one set of prompts, responsible for the entire campaign creation process, it wasn't really good at any particular part of it.
So what can we do? How can we address these issues? In our case, we chose to add structure, which led us to implementing a workflow. A workflow is defined by Anthropic as a system where LLMs and tools are orchestrated through predefined code paths. The screenshot and quote here both come from an excellent blog post by Anthropic called "Building Effective Agents"; I highly recommend checking it out. Importantly, workflows are different from agents, and this is one of the things the agent community has been debating a lot, on Twitter for example. It's the reason we sometimes describe a system as "agentic" as opposed to calling it an agent: a system could be agentic but not necessarily an agent per se. Workflows are highly structured, as you probably inferred from that "predefined code paths" piece. The LLM is not choosing how to orchestrate the code; the LLMs are just being called within these predefined code paths. And last but not least, workflows are not really a new technology. We've had them for a long time in other forms, and the most famous form is probably the data engineering tool Airflow.
This was our implementation of a workflow campaign-creation agent. It's obviously a lot more complex than our ReAct agent. We now have 15 nodes, split across five different stages, and these stages correspond to the different steps of campaign creation that I mentioned before. Interestingly, this graph, unlike the ReAct agent, doesn't run to completion for every turn. It only runs to completion once for the entire campaign creation process, and the way we get user input or feedback at certain points within the graph execution is through something called node interrupts, which is a LangGraph feature.
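To sketch the shape of this (heavily simplified, with made-up state fields and node names rather than our real 15-node graph), a predefined-path workflow that pauses for user feedback via LangGraph's interrupt mechanism looks roughly like this:

```typescript
import { StateGraph, Annotation, MemorySaver, interrupt, START, END } from "@langchain/langgraph";

// Simplified workflow state (hypothetical fields, not our real schema).
const CampaignState = Annotation.Root({
  audience: Annotation<string>(),
  offer: Annotation<string>(),
  emailDraft: Annotation<string>(),
});

// Each step is a plain node; the LLM calls live inside these node functions.
const defineAudience = async (_state: typeof CampaignState.State) => {
  // ...call an LLM to propose an audience here...
  return { audience: "VPs of Engineering at Series B startups" };
};

const reviewAudience = async (state: typeof CampaignState.State) => {
  // Pause the graph and wait for the user's feedback on this step.
  const feedback = interrupt({ proposedAudience: state.audience });
  return { audience: typeof feedback === "string" ? feedback : state.audience };
};

const writeEmail = async (state: typeof CampaignState.State) => {
  // ...call an LLM to draft an email for this audience and offer...
  return { emailDraft: `Hi, quick note about ${state.offer} for ${state.audience}...` };
};

const workflow = new StateGraph(CampaignState)
  .addNode("defineAudience", defineAudience)
  .addNode("reviewAudience", reviewAudience)
  .addNode("writeEmail", writeEmail)
  .addEdge(START, "defineAudience")
  .addEdge("defineAudience", "reviewAudience")
  .addEdge("reviewAudience", "writeEmail")
  .addEdge("writeEmail", END)
  .compile({ checkpointer: new MemorySaver() }); // interrupts require a checkpointer
```

Execution stops at the interrupt; resuming is done by invoking the graph again on the same thread with a Command carrying the resume value, which is why the compile step needs a checkpointer.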
There were a number of strengths to the workflow architecture. It basically solved all of the problems we observed with ReAct. For one, we no longer had issues with tools, because we just didn't have tools; we replaced them with specialized nodes, like a write-email node. We also had a clearly defined execution flow with a fixed number of steps, so no more infinite loops and no more recursion limit errors. On top of that, we got much better outputs: the emails and sequences we were getting from this version of the agent were much better, and that's because you force the agent to go through these particular steps. But the workflow architecture did have issues. For one, it was extremely complex, and our front-end campaign creation experience was now coupled to the architecture of our agent; we would have to change that graph structure any time we wanted to change the campaign creation experience. It also didn't support jumping around within the campaign creation flow, and that's because the graph doesn't run to completion every time. When you get to step 3 and you stop, using a node interrupt to collect feedback on that step, you can really only respond to what's happening in step 3; you can't jump back to step 1.
So clearly workflows were not going to be it for our use case. What else could we try? Well, after some soul-searching, we came across a blog post by LangChain that explained how to build a customer support agent using a multi-agent architecture, and this is the blog post that gave us the insight we needed for our use case. A multi-agent system is a hierarchical approach to building an AI agent. In this pattern there is one supervisor and there are many specialized sub-agents. The supervisor is responsible for interacting with the user and for routing tasks to sub-agents; the sub-agents then fulfill those tasks and escalate back to the supervisor when they're complete. We really devoured this blog post by LangChain. We went a little crazy in the process, but ultimately found a version of this that worked for our use case. Here's what that looks like: we have a multi-agent graph that accomplishes all of campaign creation except for audience creation. You can see at the top is our supervisor node, close to the start of the graph, and then we have four specialist sub-agents. We have a researcher; we have something that generates a positioning report, which is how you should position your product or service for this particular lead; then we have a LinkedIn message writer; and finally we have an email writer.
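As a rough sketch of the supervisor pattern (with only two stubbed specialists and made-up prompts, not our actual four sub-agents), the wiring looks something like this:

```typescript
import { StateGraph, Annotation, MessagesAnnotation, START, END } from "@langchain/langgraph";
import { ChatOpenAI } from "@langchain/openai";
import { AIMessage, HumanMessage } from "@langchain/core/messages";
import { z } from "zod";

const llm = new ChatOpenAI({ model: "gpt-4o" });

// Conversation state plus a routing field that the supervisor writes to.
const SupervisorState = Annotation.Root({
  ...MessagesAnnotation.spec,
  next: Annotation<string>(),
});

// The supervisor looks at the conversation and picks the next specialist, or finishes.
const supervisor = async (state: typeof SupervisorState.State) => {
  const decision = await llm
    .withStructuredOutput(z.object({ next: z.enum(["researcher", "email_writer", "FINISH"]) }))
    .invoke([
      { role: "system", content: "Route campaign-creation work to researcher, email_writer, or FINISH." },
      ...state.messages,
    ]);
  return { next: decision.next };
};

// Specialist sub-agents, stubbed here; in practice each is its own agent or chain.
const researcher = async (_state: typeof SupervisorState.State) => ({
  messages: [new AIMessage({ content: "Research notes on the lead...", name: "researcher" })],
});
const emailWriter = async (_state: typeof SupervisorState.State) => ({
  messages: [new AIMessage({ content: "Drafted a personalized email.", name: "email_writer" })],
});

const graph = new StateGraph(SupervisorState)
  .addNode("supervisor", supervisor)
  .addNode("researcher", researcher)
  .addNode("email_writer", emailWriter)
  .addEdge(START, "supervisor")
  // The supervisor routes; each specialist reports back to the supervisor when done.
  .addConditionalEdges("supervisor", (s) => (s.next === "FINISH" ? END : s.next))
  .addEdge("researcher", "supervisor")
  .addEdge("email_writer", "supervisor")
  .compile();

// Usage: await graph.invoke({ messages: [new HumanMessage("Create a campaign for fintech CTOs.")] });
```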
And this multi-agent architecture gave us the best of both worlds: we got the flexibility of the ReAct agent and the performance of the workflow.

Now I want to share a couple of reflections on building agents from this experience. The first is that simplicity is key. All of that structure and scaffolding can provide performance gains in the short term, but over the long term it locks you into a structure that can be counterproductive. Related to this: a new model release can really change everything. Amjad from Replit told us this about the Replit Agent; he said it wasn't really working until Sonnet 3.5 came out, and then they dropped it in and everything was magic, and that's really true. It's also useful to think of your agent as a human co-worker or a team of co-workers. In our case we had different mental models: we thought of the agent as a user flow within our product, or as a directed graph, and those were the wrong mental models, and they led us to implement the wrong architecture. You should also break big tasks down into small tasks. In our case the big task was campaign creation, but there were small tasks, like writing an email, within that, and it became easier to implement the agent once we broke it down into those smaller component tasks. Tools are preferable over skills: don't try to make your agent too smart; just give it the right tools and tell it how to use them. And last but not least, don't forget about prompt engineering. It's easy to forget that your agent is just a series of LLM calls within a while loop (see the sketch below). If your agent isn't performing well, consider going back and doing some prompt engineering; that might unlock the performance you're looking for.
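Just to underline that "LLM calls in a while loop" point, here's what the skeleton of an agent really is, stripped of frameworks; the tool execution is stubbed and the details are illustrative only:

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { AIMessageChunk, BaseMessage, HumanMessage, ToolMessage } from "@langchain/core/messages";

// A bare-bones agent: an LLM call inside a while loop, executing tool calls
// until the model stops asking for them.
const model = new ChatOpenAI({ model: "gpt-4o" }).bindTools([/* your tools here */]);

async function runAgent(messages: BaseMessage[]): Promise<BaseMessage[]> {
  while (true) {
    const response = (await model.invoke(messages)) as AIMessageChunk;
    messages.push(response);
    if (!response.tool_calls?.length) return messages; // final answer, no more work
    for (const call of response.tool_calls) {
      // Look up and execute the matching tool here; stubbed for the sketch.
      const output = `(result of ${call.name})`;
      messages.push(new ToolMessage({ content: output, tool_call_id: call.id ?? "" }));
    }
  }
}

// Usage: await runAgent([new HumanMessage("Draft an outreach email.")]);
```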
I wish we had time for a demo, but I don't, so I do have this QR code. I'll leave it up for a moment; if you're not able to get to it now, the slides will be available afterwards. Alice 2 went live in January, and the results have been pretty exciting. She's now sourced close to 2 million leads (I think these numbers are a little out of date), we've sent close to 3 million messages, and we've generated about 21,000 replies. Over the last month or so, her reply rate is around 2%, which is on par with a human SDR, and we're starting to see that climb as we implement self-learning and some other optimizations. In terms of future plans, we're excited to integrate Alice and Julian, our voice agent, so that these two agents can engage leads across multiple channels, on both inbound and outbound. We're also really excited about self-learning; we've done some work here, but I wasn't able to talk more about it. And finally, we're really excited about applications of new technologies like computer use, memory, and reinforcement learning.

Yeah, if any of this sounds exciting and you're sick of building software and want to build digital workers, message myself or, you know, anyone at 11x. We need like 11 times as many people, like 11 times as fast. So this is a bit of a pitch, but please, we need a lot of people to, you know, build the future.