
Alice 2: Building and Scaling an AI Agent During HyperGrowth | 11x | LangChain Interrupt



00:00:00.000 | everyone how's it going my name is Sherwood I am one of the tech leads here
00:00:14.620 | at 11x I lead engineering for our Alice products and today I'm joined by Keith
00:00:18.760 | our head of growth who is the product manager for this Alice project now
00:00:24.360 | 11x for those of you who are unfamiliar is a company that's building digital
00:00:27.900 | workers we have two digital workers today the first is Alice she's our AI SDR and
00:00:32.640 | the second is Julian he's an AI voice agent and we've got more workers on the
00:00:36.420 | way I want to take everybody back to September 2024 it's for most people not
00:00:42.580 | long ago for us you know it's half the company's history we just crossed 10
00:00:46.860 | million ARR we just announced our Series A then our Series B released 15 days
00:00:53.220 | later with all this chaos going on we relocated our whole team and company from
00:00:58.520 | London to San Francisco to our beautiful new office with our beautiful new CTO and
00:01:05.840 | you know at the same time we also bought a rocket because we're 11x and you
00:01:13.140 | know during all this chaos we chose this moment to rebuild our core product from
00:01:19.380 | the ground up and the reason we did that is because we truly felt at the time and
00:01:24.360 | it proved to be true that agents were the future so in today's talk I want to first
00:01:29.760 | tell you why we felt the need to rebuild Alice from scratch hopefully I think
00:01:33.300 | everyone is probably in agreement about agents being the future then I'll tell you
00:01:37.020 | how we did it we built this enterprise grade AI SDR in just three months then I
00:01:42.000 | want to talk you through one of the challenges that we experienced which was
00:01:44.500 | finding the right agent architecture and I'll wrap up with some reflections on
00:01:48.480 | building agents and some closing thoughts so let's start with the decision to
00:01:52.800 | rebuild why did we feel like we needed to rebuild our core product from scratch at
00:01:56.520 | such a critical moment well to understand that question you first need to
00:02:01.260 | understand Alice one and Alice one was our original AI SDR product the main thing
00:02:07.020 | that you could do with Alice was create these custom AI powered outreach campaigns
00:02:11.100 | and there were five steps involved in campaign creation the first step is
00:02:15.320 | defining your audience that's when you identify the people that you'd like to
00:02:18.380 | sell to and in the second step you describe your offer this is the product
00:02:23.000 | or service that you're hoping to sell then in the third and fourth steps you
00:02:27.980 | construct your sequence and also tweak the AI generated messaging and finally
00:02:32.260 | when everything is to your liking you move on to the last step which is you
00:02:35.240 | launch the campaign and that's when Alice will begin sourcing leads that
00:02:38.540 | match your ICP researching them writing those customized emails and in general just
00:02:44.100 | executing the sequence that you've built for every lead that enters the campaign
00:02:46.700 | now Alice one was a big success by a lot of different metrics but we wouldn't
00:02:53.420 | really consider her a true digital worker and that's for a lot of reasons for one
00:02:57.320 | there was a lot of button clicking more than you would probably expect of a
00:03:01.220 | digital worker and you also probably saw there was a lot of manual input
00:03:05.080 | especially on that offers page our lead research was also relatively basic we
00:03:11.240 | weren't doing deep research or scraping the web or anything like that and
00:03:16.220 | downstream that would lead to relatively uninspiring personalization in our emails
00:03:20.660 | and then on top of that Alice wasn't able to handle replies automatically she
00:03:26.480 | wasn't able to answer customers' questions and finally there was no real
00:03:31.220 | self-learning component she wasn't getting better over time meanwhile while we
00:03:36.380 | were building Alice one the industry was evolving around us in March of 2023 we
00:03:41.700 | got GPT-4 we got the first Claude model and we got the first agent frameworks then
00:03:47.140 | later that year we got Claude 2 and we got function calling in the OpenAI API then
00:03:53.200 | in January of 2024 we got a more production ready agent framework in the form of
00:03:56.960 | LangGraph then in March we got Claude 3 in May we got GPT-4o and finally
00:04:04.580 | in September we got the Replit agent which for us was the first example of a
00:04:08.940 | truly mind-blowing agentic software product and just to double click into the
00:04:12.760 | Replit agent a little bit this really blew our minds it convinced us of two things
00:04:17.600 | first that agents were going to be really powerful they could build entire apps from
00:04:21.680 | scratch and second that they're here today they're ready for production so with
00:04:27.620 | that in mind we developed a new vision for Alice centered on seven agentic
00:04:31.180 | capabilities and the first one was chat we believe that users should mostly
00:04:35.060 | interact with Alice through chat the way they would interact with a human team
00:04:37.940 | member secondly users should be able to upload internal documents their
00:04:43.160 | websites meeting recordings to a knowledge base and in doing so they would
00:04:46.400 | train Alice third we should use an AI agent for lead sourcing that actually
00:04:51.900 | considers the quality and fit of each lead rather than a dumb filter
00:04:57.140 | search number four we should do deep research on every lead and that should
00:05:01.340 | lead to number five which is true personalization in those emails and then
00:05:06.140 | finally we believe that Alice should be able to handle inbound messages
00:05:10.760 | automatically answering questions and booking meetings also she should be
00:05:15.140 | self-learning she should incorporate the insights from all of the campaigns she's
00:05:18.460 | running to optimize the performance of your account so that was our vision and
00:05:23.720 | with that in place we set about to rebuild Alice from scratch and in
00:05:28.800 | short this was a pretty aggressive push for the company it took us just three
00:05:32.360 | months from the first commit to migrating our last business customer we
00:05:36.560 | initially staffed just two engineers on building the agent after developing the POC we
00:05:40.860 | brought in more resources we had one project manager our one and only Keith
00:05:45.620 | and we had about 300 customers that needed to be migrated from our original
00:05:50.280 | platform to the new one and that was growing by the day our go-to-market
00:05:54.120 | team was just really not slowing down there were a few key decisions that we
00:05:59.400 | made at the outset of this project the first is that we wanted to start from
00:06:02.760 | scratch we didn't want Alice 2 to be encumbered by Alice one in any way so new
00:06:06.840 | repo new infrastructure new team we also didn't want to reinvent the wheel we
00:06:12.000 | were going to be taking on a lot of risk with some unfamiliar technologies like
00:06:14.760 | the agent and the knowledge base we didn't want to add additional risk
00:06:18.420 | through technologies that we didn't understand so we chose a very vanilla tech
00:06:21.480 | stack and number three we wanted to leverage vendors as much as possible to
00:06:26.580 | move really quickly we didn't want to be building non-essential components this is
00:06:31.840 | the tech stack that we went with I won't go into too much detail here but I
00:06:34.520 | thought people would be interested to see and here are some of the vendors that we
00:06:39.040 | chose to leverage and work with I can't go into detail with every one of
00:06:43.180 | these vendors but they were all essential to our success and I wanted to shout
00:06:46.360 | everyone out that has been useful of course one of the most important
00:06:51.800 | vendors we chose to work with was LangChain and we knew that we were going to
00:06:55.360 | need a really good partner from the start if we're gonna pull this off LangChain was a
00:06:58.780 | very natural choice for us they were a clear industry leader in AI dev tools
00:07:02.800 | and AI infrastructure they had an agent framework ready to go that agent framework
00:07:06.940 | had cloud hosting and observability so we knew we were going to be able to get
00:07:10.060 | to production and that once our agent was in production we would
00:07:13.400 | understand how it's performing we also had some familiarity from Alice one we were
00:07:17.440 | using the core SDK with Alice one and then LangChain also had TypeScript
00:07:22.360 | support which is important to us as a TypeScript shop and last but not least the
00:07:26.460 | customer support from the LangChain team was just incredible they really felt
00:07:30.060 | like an extension of our team they ramped us up on LangGraph and the LangChain
00:07:33.060 | ecosystem and on agents in general and we are so grateful to them for that help in
00:07:38.840 | terms of the products that we use today we use pretty much the entire suite and now
00:07:44.560 | I want to talk you through one of the main challenges that we
00:07:47.400 | encountered while building Alice 2 which was finding the
00:07:51.240 | right agent architecture and you'll remember the main feature of Alice was
00:07:56.420 | campaign creation so we wanted the Alice agent to guide users through
00:08:01.000 | campaign creation the same way that a Replit agent would guide you through creating
00:08:04.680 | an app we tried three different architectures for this the first was
00:08:09.840 | ReAct the second was a workflow and then finally we ended on a multi agent
00:08:14.760 | system so now I want to talk you through each of these how it works in detail and
00:08:19.360 | why it didn't work for our use case until we arrived at multi agent let's start
00:08:24.340 | with ReAct well React is a JavaScript framework for building user interfaces
00:08:28.420 | but that's not what I mean here I mean the ReAct model of an AI agent which I
00:08:32.680 | think other people have talked about earlier today this is a model that was
00:08:35.920 | invented by Google researchers back in 2022 and it stands for reason and act and
00:08:41.360 | basically what these researchers observed is if you include reasoning traces in the
00:08:46.120 | conversation context the agent performs better than it otherwise would and with
00:08:50.540 | a ReAct agent the execution loop is split into three parts there's reasoning where
00:08:55.480 | the agent thinks about what to do there's action where the agent actually takes action
00:08:59.860 | for example performing a tool call and then finally there's observe where the
00:09:04.120 | agent observes the new state of the world after performing the action and I
00:09:08.180 | guess ReActO wasn't a very good name but as I mentioned reasoning traces lead to
00:09:14.120 | better performance in the agent this is our implementation of a ReAct agent it
00:09:19.580 | consists of just one node and 10 or 20 tools and it's not very impressive
00:09:25.280 | looking I know but this simplicity is actually one of the main benefits of the
00:09:29.000 | ReAct architecture in my opinion well why do we have so many tools well there are
00:09:34.060 | lots of different things that need to happen in campaign creation we need to fetch leads from our
00:09:38.040 | database we need to insert new DB entities and draft emails and all of those
00:09:42.900 | things become a tool the ReAct loop that I mentioned on the previous slide that's
00:09:48.180 | implemented inside of the assistant node and when the assistant actually performs
00:09:53.040 | an action that is manifested in the form of a tool call which is then
00:09:56.760 | executed by the tool node one thing to note about the ReAct agent is that it runs
00:10:02.140 | to completion for every turn so if the user says hello and then they say I'd like to
00:10:06.720 | create a campaign that would be two turns and the ReAct agent runs to completion each time
00:10:10.680 | that's going to become relevant later and here are some of the tools that we implemented
00:10:16.860 | and attached to our agent unfortunately Alice 2 predated MCP and so we didn't use an MCP server
00:10:23.160 | or any third-party tool registries and a few things I want to tell you about tools before we can
00:10:28.860 | move on the first is that tools are necessary to take action so anytime you want your agent to do anything in the outside world for example
00:10:36.580 | call an API or write a file you're going to need a tool to do that they're also necessary to access information beyond the context window if you think about it what your agent
00:10:46.720 | knows is limited to three things the conversation context the prompt and the model weights and if you want it to know anything beyond that you need to give it a tool for example a web search tool and that's essentially the concept behind RAG
00:10:59.720 | tools can also be used to call other agents this is one of the easiest and simplest ways to get started with a multi-agent system and
00:11:09.860 | last but not least tools are preferable over skills so this is a framework I came up with essentially if you think about it if someone asked you to do something like
00:11:17.860 | perform a complex calculation you can do that either through a tool like a calculator or maybe you have the skill of the mental arithmetic that's required to perform that calculation and in general it's better
00:11:29.580 | to use a tool than to use a skill because this minimizes the number of tokens you're using in the context to accomplish that goal
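As a rough illustration of the points about tools above, here is a minimal Python sketch of a tool registry and a dispatcher that executes a tool call. It is not 11x's implementation: the `Tool` class, the `fetch_leads` and `draft_email` functions, the in-memory `_LEADS` list, and `execute_tool_call` are all hypothetical names chosen for this example, loosely modeled on the lead-fetching and email-drafting tools mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    """A named capability the agent can invoke (hypothetical structure)."""
    name: str
    description: str  # shown to the model so it can decide when to call the tool
    run: Callable[..., Any]


# Hypothetical in-memory stand-in for a leads database.
_LEADS: List[Dict[str, str]] = [
    {"name": "Ada", "title": "VP Sales"},
    {"name": "Grace", "title": "Engineer"},
]


def fetch_leads(title_contains: str) -> List[Dict[str, str]]:
    """Return leads whose title contains the keyword (case-insensitive)."""
    needle = title_contains.lower()
    return [lead for lead in _LEADS if needle in lead["title"].lower()]


def draft_email(lead_name: str, offer: str) -> str:
    """Produce a trivial template email; a real tool would call an LLM."""
    return f"Hi {lead_name}, I thought you might be interested in {offer}."


TOOLS: Dict[str, Tool] = {
    tool.name: tool
    for tool in [
        Tool("fetch_leads", "Look up leads whose title matches a keyword.", fetch_leads),
        Tool("draft_email", "Draft an outreach email for one lead.", draft_email),
    ]
}


def execute_tool_call(name: str, arguments: Dict[str, Any]) -> Any:
    """Run the named tool, returning an error string for unknown tools."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name].run(**arguments)
```

Returning an error string for an unknown tool, rather than raising, is one possible design choice: it lets the agent see the failure as an observation and try something else.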
00:11:37.980 | what are the strengths of the ReAct architecture well I mentioned one already that is that it is extremely simple we basically never needed to revise our agent structure later on
00:11:48.340 | it was also great at handling arbitrary user inputs over multiple turns this is because the graph is running to completion each time it allows the user to say something
00:11:59.440 | in step three that's related to step one without the agent getting confused it's actually robust to that so that was a great advantage
00:12:07.840 | but it had some issues for example the ReAct agent was kind of bad at tools we had attached a lot of tools and as you know what can sometimes happen when you do that is the agent will struggle with which tool to call and in what order
00:12:20.840 | And this would sometimes lead to infinite loops where the agent is repeatedly trying to accomplish some part of campaign creation but not succeeding
00:12:27.440 | And when those infinite loops would go on for a while we would get a recursion limit error which is effectively the agent equivalent of a stack overflow
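The reason, act, observe loop and the recursion limit described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not LangGraph's API: the "LLM" is replaced by a hand-written policy function, and `run_turn`, `RecursionLimitError`, `scripted_policy`, and `stuck_policy` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Tuple


class RecursionLimitError(RuntimeError):
    """Illustrative name: raised when one turn exceeds its step budget."""


# A policy stands in for the LLM: it reads the trace so far and returns
# (thought, tool_name), with tool_name set to None once it is ready to answer.
Policy = Callable[[List[str]], Tuple[str, Optional[str]]]


def run_turn(policy: Policy, tools: Dict[str, Callable[[], str]],
             user_message: str, max_steps: int = 5) -> List[str]:
    """Run one user turn to completion, or fail after max_steps iterations."""
    trace = [f"user: {user_message}"]
    for _ in range(max_steps):
        thought, tool_name = policy(trace)        # reason
        trace.append(f"reason: {thought}")
        if tool_name is None:                     # the agent decided it is done
            return trace
        observation = tools[tool_name]()          # act
        trace.append(f"act: {tool_name}")
        trace.append(f"observe: {observation}")   # observe
    raise RecursionLimitError(f"no final answer after {max_steps} steps")


def scripted_policy(trace: List[str]) -> Tuple[str, Optional[str]]:
    """Call the tool once, then finish."""
    if not any(line.startswith("observe:") for line in trace):
        return "I need leads first", "fetch_leads"
    return "I have leads, ready to reply", None


def stuck_policy(trace: List[str]) -> Tuple[str, Optional[str]]:
    """Never finishes: models the infinite tool-calling loop from the talk."""
    return "let me try fetching leads again", "fetch_leads"


TOOLS: Dict[str, Callable[[], str]] = {"fetch_leads": lambda: "3 leads found"}
```

With `scripted_policy` the turn ends after one tool call; with `stuck_policy` the loop hits the step budget and raises, which is the behavior the speaker compares to a stack overflow.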
00:12:36.240 | and also the outputs that we were getting from this version of the agent were relatively mediocre the audiences the sequences the emails
00:12:44.640 | They just weren't that good and our hypothesis here was that because there's just one agent and really like one set of prompts that is responsible for doing the entire campaign creation process
00:12:54.840 | It wasn't really good at any particular point
00:12:57.040 | So what can we do like how can we address these issues in our case we chose to add structure which led us to implementing a workflow
00:13:05.140 | A workflow is defined by Anthropic as a system where LLMs and tools are orchestrated through predefined code paths
00:13:13.240 | This screenshot and quote both come from an excellent blog post by Anthropic called Building Effective Agents highly recommend checking it out
00:13:19.780 | I shamelessly lifted it
00:13:21.380 | Importantly workflows are different from agents and this is one of the things that the agent community has been debating a lot on Twitter for example
00:13:29.700 | It's the reason why we have the term agentic for sometimes describing a system as opposed to agent
00:13:35.080 | A system could be agentic but not necessarily an agent per se
00:13:38.820 | Workflows are highly structured as you probably inferred from that predefined code paths piece
00:13:46.620 | The LLM is not choosing how to orchestrate the code the LLMs are just being called within these predefined code paths
00:13:53.180 | And last but not least workflows are not really a new technology. We've had them for a long time in other forms
00:13:59.360 | And the most famous form is probably the data engineering tool Airflow
00:14:02.460 | And the clicker is okay there we go
00:14:08.200 | This was our implementation of a workflow campaign creation agent
00:14:12.620 | It's obviously a lot more complex than our ReAct agent
00:14:15.760 | We now have 15 nodes. They're split across five different stages and these stages correspond to the different steps of campaign creation that I mentioned before
00:14:25.000 | Interestingly this graph unlike the ReAct agent doesn't run to completion for every turn
00:14:30.740 | It only runs to completion once for the entire campaign creation process and the way that we get user input or feedback
00:14:36.480 | at certain points within the graph execution is through the use of something called node interrupts which is a LangGraph feature
00:14:42.220 | There were a number of strengths involved with the workflow architecture
00:14:47.960 | It basically solved all of the problems we observed with ReAct for one we no longer had issues with tools because we just didn't have tools we've replaced them now with these specialized nodes like a write email node
00:14:59.700 | And we've also got a clearly defined execution flow with a fixed number of steps so no more infinite loops no more recursion limit errors
00:15:08.700 | On top of that we got much better outputs the emails and sequences that we were getting from this version of the agent were much better and that's because you force the agent to go through these particular steps
00:15:19.380 | But the workflow architecture did have issues for one it was extremely complex and now our front-end campaign creation flow experience was coupled with the architecture of our agent and we would have to change that architecture and that graph structure anytime we wanted to make changes to the campaign creation experience
00:15:36.540 | So super laborious and annoying
00:15:39.480 | It also didn't support jumping around within the campaign creation flow
00:15:43.680 | And that's because the graph doesn't run to completion every time when you get to step 3 and you stop using a node interrupt to collect feedback on that step
00:15:51.460 | You can really only respond to what's happening in step 3 you can't jump back to step 1
00:15:56.880 | So clearly workflows were not going to be it for our use case. What else can we try?
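The workflow limitation described above, a single run through fixed stages that pauses for feedback, can be sketched with a Python generator. This is an analogy under stated assumptions rather than LangGraph's node interrupt mechanism: `campaign_workflow` and `run_with_feedback` are hypothetical names, and the five stage names follow the campaign creation steps listed earlier in the talk.

```python
from typing import Dict, Generator, List, Tuple

# The five campaign creation steps from the talk, in their fixed order.
STAGES: List[str] = ["audience", "offer", "sequence", "messaging", "launch"]


def campaign_workflow() -> Generator[str, str, Dict[str, str]]:
    """Walk the stages once, pausing at each one to collect user feedback."""
    results: Dict[str, str] = {}
    for stage in STAGES:
        # Pause here (the stand-in for an interrupt) until the caller
        # resumes the workflow with feedback for this stage.
        feedback = yield stage
        results[stage] = feedback
    return results


def run_with_feedback(feedback_by_stage: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Drive the workflow to completion, answering each pause in order."""
    flow = campaign_workflow()
    visited: List[str] = []
    stage = next(flow)  # run until the first pause
    try:
        while True:
            visited.append(stage)
            stage = flow.send(feedback_by_stage[stage])  # resume with feedback
    except StopIteration as done:
        return visited, done.value
```

Because the generator only moves forward, feedback sent while it is paused at `sequence` can only be recorded against `sequence`; there is no way to reopen `audience` without restarting the whole run, which mirrors the jumping-around problem the speaker describes.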
00:16:04.820 | Well after some soul-searching we came across a blog post by LangChain that explained how to build a customer support agent using a multi agent architecture and this is the blog post that gave us the insight that we needed for our use case
00:16:17.760 | A multi agent system is a hierarchical approach to building an AI agent in this pattern
00:16:25.320 | There's one supervisor and there are many sub agents that are specialized and the supervisor is responsible for interacting with the user and for routing tasks to sub agents
00:16:34.540 | The sub agents will then fulfill those tasks and they'll escalate back to the supervisor when they're complete
00:16:40.260 | And we really devoured this blog post by LangChain
00:16:44.440 | We went a little crazy in the process but ultimately found a version of this that worked for our use case
00:16:49.820 | And here's what that looks like we have a multi agent graph that accomplishes all of campaign creation except for audience creation
00:16:58.540 | Which we kept separate for different reasons
00:17:00.540 | And you can see here at the top is our supervisor node. It's close to the start of the graph and then we have four specialist sub agents we have a researcher
00:17:08.440 | We have something that generates something called a positioning report which is how you should position your product or service for this particular lead
00:17:15.260 | Then we have a LinkedIn message writer and finally we have an email writer
00:17:20.540 | And this multi agent architecture it gave us the best of both worlds
00:17:25.540 | We got the flexibility of the ReAct agent and then we got the performance of the workflow
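The supervisor pattern described above can be sketched as a router in front of specialist functions. This is a toy illustration, not the graph shown in the talk: in a real system the routing decision would be made by an LLM, whereas here `route` uses simple keyword matching as a labeled stand-in. The four sub agent names follow the talk; the function names and return strings are hypothetical.

```python
from typing import Callable, Dict


def researcher(task: str) -> str:
    return f"[research notes] {task}"


def positioning_report(task: str) -> str:
    return f"[positioning report] {task}"


def linkedin_writer(task: str) -> str:
    return f"[LinkedIn message] {task}"


def email_writer(task: str) -> str:
    return f"[email draft] {task}"


# The four specialist sub agents named in the talk.
SUB_AGENTS: Dict[str, Callable[[str], str]] = {
    "researcher": researcher,
    "positioning_report": positioning_report,
    "linkedin_writer": linkedin_writer,
    "email_writer": email_writer,
}


def route(user_message: str) -> str:
    """Pick a sub agent. Keyword matching is an assumption standing in
    for the supervisor's LLM-based routing decision."""
    text = user_message.lower()
    if "linkedin" in text:
        return "linkedin_writer"
    if "email" in text:
        return "email_writer"
    if "position" in text:
        return "positioning_report"
    return "researcher"


def supervisor(user_message: str) -> str:
    """Handle one user turn: route the task, run the specialist, report back."""
    name = route(user_message)
    result = SUB_AGENTS[name](user_message)  # sub agent completes, then escalates back
    return f"{name}: {result}"
```

Every turn passes through the supervisor, so the user can ask about the email after discussing the LinkedIn message and then go back again, while each specialist keeps its own narrow job. That is one way to read the "best of both worlds" claim.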
00:17:30.380 | Now I want to share a couple reflections on building agents from this experience and the first is that
00:17:36.580 | Simplicity is key all of that structure and scaffolding it can provide performance gains in the short term
00:17:42.260 | But over the long term it locks you into a structure that can be counterproductive and related to this is that a new model release can really change everything
00:17:50.260 | Amjad from Replit told us this about the Replit agent he said it wasn't really working until Sonnet 3.5 came out and then they dropped it in and everything was magic and that's really true
00:18:00.020 | It's also useful to think of your agent as a human co-worker or a team of co-workers
00:18:05.620 | In our case we had different mental models. We thought that the agent was a user flow within
00:18:12.260 | Our product or a directed graph and those were the wrong mental models and they led us to implement the wrong architecture
00:18:16.980 | You should also break big tasks down into small tasks in our case the big task was the campaign creation
00:18:24.180 | But there were small tasks like writing an email within that and it became easier to implement the agent once we broke it down into the smaller component tasks
00:18:31.140 | Tools are preferable over skills don't try to make your agent too smart
00:18:36.500 | Just give it the right tools and tell it how to use them and
00:18:39.380 | Then last but not least don't forget about prompt engineering
00:18:43.380 | It's easy to forget that your agent is just a series of LLM calls within a while loop
00:18:48.420 | If your agent isn't performing well, you should consider going back and doing some prompt engineering that might unlock the performance you're looking for
00:18:56.020 | And I wish we had time for a demo, but I don't but I do have this QR code
00:19:01.620 | I'll leave this up for a moment if you're not able to get to it now the slides will be available afterwards
00:19:05.740 | You can check out what we've built
00:19:07.740 | And Alice 2 went live in January and now the results have been pretty exciting. She's now sourced close to 2 million leads
00:19:16.380 | I think these numbers are a little out of date and we've sent close to 3 million messages and generated about 21,000 replies
00:19:22.460 | Over the last month or so her reply rate is around 2%
00:19:27.180 | Which is on par with a human SDR and we're starting to see that climb as we implement self-learning and some other optimizations
00:19:35.660 | In terms of future plans, we're excited to integrate Alice and Julian our voice agent so that these two agents can
00:19:42.460 | Engage leads across multiple channels on both inbound and outbound
00:19:46.540 | We're also really excited about self-learning we've done some work here, but I wasn't able to talk more about it
00:19:50.860 | And then finally we're really excited about applications of new technologies like computer use and memory and reinforcement learning
00:19:59.420 | Yeah, if any of this you know sounds exciting and you're sick of building software and you want to build digital you know workers
00:20:05.980 | Message Sherwood myself or you know anyone at 11x we need like 11 times as many people like 11 times as fast
00:20:11.980 | So this is a bit of a shill but like please please like we need a lot of people to you know to build the future
00:20:18.140 | Cool, thanks everyone guys cheers Harrison
00:20:21.020 | Thank you