
Building Alice’s Brain: an AI Sales Rep that Learns Like a Human - Sherwood & Satwik, 11x


Whisper Transcript

00:00:00.000 | Okay, thanks everyone for coming today. So today's talk is called "Building Alice's
00:00:20.360 | Brain: How We Built an AI Sales Rep That Learns Like a Human." My name is Sherwood. I am one of
00:00:26.640 | the tech leads here at 11x. I lead engineering for our Alice product and I'm
00:00:30.840 | joined by my colleague, Satwik. So 11x, for those of you who are unfamiliar, is a
00:00:36.240 | company that's building digital workers for the go-to-market organization. We have
00:00:40.380 | two digital workers today. We have Alice, who is our AI SDR, and then we also have
00:00:44.340 | Julian, who is our voice agent, and we have more workers on the way. Today we're
00:00:49.980 | gonna be talking about Alice specifically and actually Alice's brain or the
00:00:54.780 | knowledge base, which is effectively her brain. So let's start from the basics.
00:00:59.540 | What is an SDR? Well an SDR is a sales development representative, if you're not
00:01:04.400 | familiar. I know that's a room full of engineers, so I thought I would start with
00:01:07.320 | the basics. And this is essentially an entry-level sales role. This is the kind
00:01:11.220 | of job that you might get right out of school. And your responsibilities basically
00:01:15.420 | boil down to three things. First, you're sourcing leads. These are people that you'd
00:01:19.680 | like to sell to. Then you're contacting them or engaging them across channels.
00:01:23.400 | And finally, you're booking meetings with those people. So your goal here is to
00:01:27.900 | generate positive replies and meetings booked. These are the two key metrics for
00:01:32.100 | an SDR. And a lot of an SDR's job boils down to writing emails like the one that you
00:01:38.120 | see in front of you right now. This is actually an email that Alice has written, and
00:01:41.440 | it's an example of the type of work output that Alice has. Alice sends about 50,000 of
00:01:49.160 | these emails in a given day. And that's in comparison to a human SDR who would send 20 to
00:01:54.860 | 50. And Alice is now running campaigns for about 300 different business organizations. So before
00:02:04.160 | we go any further, I want to define some terms. Because since we work at 11x, we have our customers,
00:02:09.120 | but then our customers also have their customers. So things get a little confusing.
00:02:12.840 | Today, we'll be using the term seller to refer to the company that is selling
00:02:16.800 | something through Alice that is our customer. And then we'll be using the term
00:02:20.280 | lead to refer to the person who's being sold to. And here's what that looks like as
00:02:25.260 | a diagram. You can see the seller is pushing context about their business. These
00:02:29.620 | are the products that they sell or the case studies that they have that they can
00:02:33.660 | reference in emails. They're pushing that to Alice. And then Alice is then using that to
00:02:38.100 | personalize emails for each of the leads that she contacts. So there are two
00:02:42.900 | requirements that Alice needs to meet in order to succeed in her role. The first is that
00:02:47.460 | she needs to know the seller, the products, the services, the case studies, the pain
00:02:52.020 | points, the value props, the ICP. And the second is that she needs to know the lead,
00:02:56.460 | their role, their responsibilities, what they care about, what other solutions they've
00:03:00.660 | tried, pain points that they might be experiencing, the company they work for. And today, we're going to be really focused on knowing the seller.
00:03:07.100 | So in the old version of our product, the seller would be responsible for pushing context about their business
00:03:16.460 | to Alice. And they did so through a manual experience called the library. And here you can see what that looks like, where the library shows all of the different products and offers that are available for this business that Alice can then reference when she writes emails.
00:03:31.100 | The user would have to enter details about every individual product and service and all of the pain points and solutions and value props associated with them in our dashboard,
00:03:40.460 | including these detailed descriptions. And those descriptions were important to get right because they actually get included in Alice's context when she writes the emails.
00:03:50.460 | Then later on, during campaign creation, this is what it looks like to create a campaign. And you can see we have a lead in the top left. And the user
00:03:59.820 | is selecting the different offers that they've defined from the library in the top right. And these are the offers that Alice has access to when she's generating her emails.
00:04:07.820 | We had a lot of problems with this user experience. And the first one was it was just extremely tedious. It was a really bad and cumbersome user experience.
00:04:17.180 | The user had to enter a lot of information. And that created this onboarding friction where users couldn't actually run campaigns until they had filled out their library.
00:04:27.420 | And finally, the emails that we were generating using this approach were just suboptimal. Users would have to choose between too few offers,
00:04:35.420 | which meant that you'd have irrelevant offers for a given lead, or too many offers, which meant that you have all of that stuff in the context window.
00:04:42.620 | And Alice just wasn't as smart when she wrote those emails.
00:04:47.020 | So how can we address this? Well, we had an idea, which is that instead of the seller being responsible for pushing context about the business to Alice,
00:04:55.580 | we could flip things around so that Alice can proactively pull all of the context about the seller into her system,
00:05:02.620 | and then use whatever is most relevant when writing those emails. And that's effectively what we accomplished with the knowledge base,
00:05:08.940 | which we'll tell you more about in just a moment.
00:05:10.620 | So for the rest of the talk, we're going to first do a high-level overview of the knowledge base and how it works.
00:05:17.820 | Then we will do a deep dive on the pipeline, the different steps in our RAG system pipeline.
00:05:22.780 | Then after that, we will talk through the user experience of the knowledge base.
00:05:27.340 | And we will wrap up with some lessons from this project and future plans.
00:05:31.820 | So let's start out with an overview.
00:05:35.500 | All right. So overview, what is knowledge base, right?
00:05:38.700 | It's basically a way for us to kind of get closer to a human experience.
00:05:43.740 | Like if you're training a human SDR, you would kind of get them in, and then you will basically dump a bunch
00:05:49.580 | of documents on them, and then they ramp up throughout a period of like weeks or months.
00:05:54.140 | And you can basically check in on their progress.
00:05:57.980 | And similar to that, knowledge base is basically a centralized repository on our platform for the seller
00:06:04.220 | info. And then users can kind of come in, dump all their source material, and then we are able to
00:06:09.260 | reference that information at the time of message generation.
00:06:11.980 | Now, what resources do SDRs care about? Here's a little glimpse into that.
00:06:17.020 | Marketing materials, case studies, sales calls, press releases, you know, and a bunch of other stuff.
00:06:22.380 | Now, how do we bucket these into categories that we're actually going to parse?
00:06:28.620 | Well, we created documents and images, websites, and then media, audio, video. And you're going to see
00:06:33.980 | why that's important. So here's an overview of what the architecture looks like. It starts off with
00:06:40.060 | the user uploading something, any document or resource in the client, and then we save it to our S3 bucket,
00:06:46.380 | and then send it to the backend, which then, you know, creates a bunch of resources in our DB, and then
00:06:52.540 | kicks off a bunch of jobs depending on the resource type and the vendor selected. Now, the vendors are
00:06:57.660 | asynchronously doing the parsing. Once they're done, they send a webhook to us, which we consume via
00:07:02.300 | ingest. And then once we've consumed that webhook, we take that parsed artifact that we get back from the
00:07:11.100 | vendors, and then we store it in our DB, and then at the same time, upsert it to Pinecone and embed it.
00:07:18.540 | And then eventually, once we store it in the local DB, we have like a UI update, and then eventually,
00:07:23.820 | our agent can query Pinecone, our vector DB, for that stored information that we just put in.
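To make that webhook-consumption step concrete, here is a minimal sketch of what the ingest handler could look like. This is not 11x's actual backend; the endpoint name, payload shape, and in-memory store are assumptions for illustration, written in Python with FastAPI.

```python
# A minimal sketch (not 11x's actual backend) of the ingest step: a vendor
# finishes parsing asynchronously and calls our webhook with the markdown
# artifact, which we persist and hand off to chunking/embedding.
# Endpoint name and payload shape are assumptions for illustration.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
PARSED_ARTIFACTS: dict[str, str] = {}  # stand-in for the real database

class ParseWebhook(BaseModel):
    resource_id: str   # our DB id for the uploaded resource
    vendor: str        # e.g. "llamaparse", "firecrawl", "cloudglue"
    markdown: str      # parsed artifact returned by the vendor

@app.post("/webhooks/parsing-complete")
def handle_parsing_webhook(payload: ParseWebhook) -> dict:
    # 1. Store the parsed markdown alongside the resource record.
    PARSED_ARTIFACTS[payload.resource_id] = payload.markdown
    # 2. Kick off chunking + embedding + upsert to the vector DB
    #    (covered later in the talk); here it's just a placeholder log.
    print(f"queueing chunking for {payload.resource_id} from {payload.vendor}")
    # 3. The UI can now reflect that Alice has "learned" this resource.
    return {"ok": True, "resource_id": payload.resource_id}
```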
00:07:30.460 | So now that we have a high level of understanding of how the knowledge base works, let's dig into each
00:07:35.580 | individual step in the pipeline. There are five different steps in the pipeline. The first is parsing,
00:07:41.260 | then there's chunking, then there's storage, then there's retrieval, and finally, we have visualization,
00:07:47.740 | which sounds a little untraditional, but we'll cover it in a moment. So let's start with
00:07:52.620 | parsing. What is parsing? I think that we probably all take this for granted, but it's worth defining.
00:07:58.700 | Parsing is the process of converting a non-text resource into text. And the reason that this is
00:08:03.500 | necessary is because, as we all know, language models, they speak text. So in order to make information
00:08:09.660 | that is represented in a different form, like a PDF, or an MP4 file, or an image, legible or useful
00:08:17.020 | to the LLM, we need to first convert it to text. And so one way of thinking about parsing is it's the
00:08:21.980 | process of making non-text information legible to a large language model. And we do have multimodal models
00:08:27.660 | that are one solution to this, but there are lots of restrictions on multimodal models
00:08:31.660 | that make parsing still relevant. So to illustrate that, we have the five different document types or
00:08:38.060 | resource types that we mentioned a moment ago, going through our parsing process and coming out as
00:08:42.700 | markdown, which is a type of text that, as we all know, contains some structural information
00:08:47.900 | and formatting, which is actually semantically meaningful and useful.
00:08:51.660 | Let's talk about the process of how we implemented parsing. And the short answer is that we did not.
00:08:57.980 | We didn't want to build this from scratch. And we had a few different reasons for doing this.
00:09:02.540 | The first is that you just saw that we had five different resource types and a lot of different
00:09:06.380 | file types within each of them. We thought it was going to be too many, and we thought it was going
00:09:09.820 | to be too much work. We wanted to get to market quickly. The last reason was that we just weren't that
00:09:15.420 | confident in the outcome. There are vendors who dedicate their entire company to building an
00:09:19.820 | effective parsing system for a specific resource type. We didn't want our team to have to become
00:09:25.180 | specialists in parsing for each one of these resource types and to build a parsing system for
00:09:29.980 | that. We thought that maybe if we tried to do this, the outcome actually just wouldn't be that
00:09:34.460 | successful. So we chose to work with a vendor, and here are a bunch of the vendors that we came across.
00:09:41.260 | You can find 10 or 20 or 50 with just a quick Google search, but these are some of the leaders
00:09:45.900 | that we evaluated. And in order to make a decision, we came up with three specific
00:09:52.380 | requirements. The first was that we needed support for our necessary resource types. That goes without
00:09:57.900 | saying. We also wanted markdown output. And then finally, we wanted this vendor to support webhooks. We
00:10:03.740 | wanted to be able to receive that output in a convenient manner. A few things that we didn't
00:10:09.100 | consider to start out with. Accuracy. Crazy. We didn't consider accuracy. We didn't consider either
00:10:16.060 | accuracy or comprehensiveness. Our assumption here was that most of the vendors that are leaders in the
00:10:21.180 | market are going to be within a reasonable band of accuracy and comprehensiveness. And accuracy would
00:10:26.300 | refer to whether or not the extracted output actually matches the original resource. Comprehensiveness
00:10:32.620 | on the other hand is the amount of extracted information that is available in the final
00:10:38.460 | output. The last thing that we didn't really consider was cost, to be honest. And this was because
00:10:44.060 | the system was pre-production. We didn't have real production data yet and we didn't know what our usage
00:10:49.500 | would be. And so we figured we would come back and optimize cost once we had real
00:10:54.460 | usage data. So on to our final selections. For documents and images, we chose to work with LlamaParse,
00:11:01.980 | which is a LlamaIndex product. I think Jerry was up here earlier today. And the reasons that we chose to
00:11:07.660 | work with LlamaParse was first, it supported the most number of file types out of any document parsing
00:11:13.100 | solution we could find. And second, their support was really great. Jerry and his team were quick
00:11:18.620 | to get in a Slack channel with us. I think within just a couple of hours of us doing an initial
00:11:22.540 | evaluation. And with LlamaParse, we're able to turn documents like this PDF of an 11x sales deck
00:11:30.540 | into a markdown file, like the one you see on the right. For websites, we chose to work with FireCrawl.
00:11:36.460 | The other main vendor that we were considering was Tavily. And this is actually not really a major knock on
00:11:40.940 | Tavily. We chose to work with FireCrawl because first, we were familiar. We had already worked with
00:11:45.580 | them on a previous project. And secondly, Tavily's Crawl endpoint, which is the endpoint that we would
00:11:50.940 | have needed for this project, was still in development at the time. So it wasn't something we could
00:11:54.700 | actually use. And similar to LlamaParse with FireCrawl, we are able to take a website like this Brex
00:12:02.620 | homepage that you see here and turn it into another markdown document.
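Before moving on to audio and video, here is a rough sketch of what the document and website parsing calls might look like, using the llama_parse and firecrawl Python SDKs. The exact parameter names and return shapes vary by SDK version, and the file path and URL are placeholders.

```python
# Rough sketch of the parsing calls for documents and websites. Exact
# parameter names and return shapes are assumptions and vary by SDK version.
from llama_parse import LlamaParse
from firecrawl import FirecrawlApp

# Documents and images -> markdown via LlamaParse.
parser = LlamaParse(api_key="LLAMA_CLOUD_API_KEY", result_type="markdown")
docs = parser.load_data("./sales_deck.pdf")          # placeholder file path
deck_markdown = "\n\n".join(d.text for d in docs)

# Websites -> markdown via FireCrawl.
crawler = FirecrawlApp(api_key="FIRECRAWL_API_KEY")
page = crawler.scrape_url("https://www.brex.com")    # example from the talk
# Depending on SDK version, the result is a dict with a "markdown" key or an
# object with a .markdown attribute.
homepage_markdown = page["markdown"] if isinstance(page, dict) else page.markdown
```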
00:12:05.340 | Then we have audio and video. And for audio and video, we chose to work with a newer upstart vendor
00:12:11.980 | called Cloud Glue. And the reasons that we chose to work with Cloud Glue were first,
00:12:16.300 | they supported both audio and video, not just audio. And second, they were actually capable
00:12:21.980 | of extracting information from the video itself, as opposed to just transcribing the video and giving
00:12:27.180 | us back a markdown file that contains the transcript of the audio.
00:12:30.460 | And so with Cloud Glue, we were able to turn YouTube videos and MP4 files and other video formats into
00:12:41.980 | markdown like you see on the right. So now that everything is markdown, we move on to the next
00:12:41.980 | step, which is chunking. All right. Markdown. Let's go. Now, basically, we have a blob of markdown,
00:12:48.060 | right? And we want to kind of break it down into like semantic entities that we can embed and put it in our
00:12:54.860 | vectorDB. At the same time, we want to protect the structure of the markdown because it contains some meaning
00:13:02.300 | inherently, like something's a title versus something's a paragraph. There is inherent meaning behind that.
00:13:07.420 | So we're splitting these long blobs of text, like 10-page documents into chunks that we can eventually
00:13:14.300 | retrieve after we've embedded and stored them in a vectorDB, right? And now, basically, we can, like,
00:13:21.020 | take all of this and, like, we're thinking about how we can, you know, split our long document into chunks,
00:13:28.380 | right? So chunking strategies. You have various things that you can do. You can split on tokens. You can split on
00:13:34.700 | sentences. You can also split on markdown headers, right? And then you can do, like, LLM calls and have an LLM
00:13:41.660 | split your document into chunks, you know, or any combination of the above. Now, what you want to ask yourself when
00:13:48.540 | you're deciding on a chunking strategy is, like, what kind of logical units am I trying to preserve in my
00:13:53.980 | data, right? What do I eventually want to extract during my retrieval, right? What strategy will keep
00:14:00.140 | them intact? And at the same time, you're able to successfully embed them and store them in whatever
00:14:04.940 | DB you want. So, and then, should I try a different strategy for different resource types? We have, like,
00:14:11.260 | we have to deal with PDFs, PowerPoints, videos, right? And then, eventually, what kinds of queries or retrieval
00:14:17.820 | strategies am I expecting? And then, we ended up with, like, a combination of all three of
00:14:24.940 | the things that we mentioned. So, we split on markdown headers, and then we kind of do a waterfall.
00:14:30.060 | Because we want our, like, records in our vector DB to be a certain token count, we split
00:14:36.540 | on markdown headers, and then we split on sentences, and then eventually we split on tokens. And then,
00:14:41.740 | yeah, it's, like, worked well for us for all types of documents. And it has successfully preserved our
00:14:47.660 | markdown chunks that we can kind of cleanly show in the UI. And it also prevents super long chunks,
00:14:53.180 | which are, you know, diluting the meaning behind your document if you end up with that.
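Here is a simplified sketch of that waterfall in Python. The token budget and the word-count tokenizer are stand-ins; the talk doesn't specify 11x's actual limits or tokenizer.

```python
# A simplified sketch of the waterfall chunking described above: split on
# markdown headers first, fall back to sentences, then hard-split on tokens.
# Word count stands in for a real tokenizer to keep the example dependency-free.
import re

MAX_TOKENS = 256  # illustrative budget per chunk

def approx_tokens(text: str) -> int:
    return len(text.split())

def split_tokens(text: str) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + MAX_TOKENS]) for i in range(0, len(words), MAX_TOKENS)]

def split_sentences(text: str) -> list[str]:
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for s in sentences:
        candidate = f"{current} {s}".strip()
        if approx_tokens(candidate) <= MAX_TOKENS:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single oversized sentence falls through to the token splitter.
            current = "" if approx_tokens(s) > MAX_TOKENS else s
            if not current:
                chunks.extend(split_tokens(s))
    if current:
        chunks.append(current)
    return chunks

def chunk_markdown(markdown: str) -> list[str]:
    # Level 1: split on markdown headers so titles stay attached to their body.
    sections = re.split(r"\n(?=#{1,6}\s)", markdown)
    chunks: list[str] = []
    for section in sections:
        if approx_tokens(section) <= MAX_TOKENS:
            chunks.append(section.strip())
        else:
            # Levels 2 and 3: waterfall down to sentences, then raw tokens.
            chunks.extend(split_sentences(section))
    return [c for c in chunks if c]
```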
00:14:56.940 | Okay, so we have split all of our markdown into individual chunks. It's now time to put those
00:15:02.540 | chunks somewhere. We're going to store them. Let's talk about storage technologies. So, for storage
00:15:07.340 | technologies, I'm sure everyone is, like, here for the RAG section. So, they think that we're using a
00:15:11.420 | vector database. We actually are using a vector database. But to be pedantic, RAG is retrieval augmented
00:15:16.460 | generation. So, we all know that anytime you're retrieving context from an external source, whether it's a graph
00:15:22.780 | database, or Elasticsearch, or a file in the file system, that also qualifies as RAG. Some of the
00:15:30.140 | other options you can use for RAG, I just mentioned a graph database, document databases, relational databases,
00:15:37.020 | key value stores. You could even use object storage like S3. In our case, we did use a vector database,
00:15:43.660 | and that's because we wanted to do some similarity search, which is what vector databases are built for
00:15:49.100 | and optimized for. Once again, we had a lot of options to choose from. This is not a complete or
00:15:56.380 | exhaustive list. In the end, we chose to work with a company called Pinecone. And the reason that we
00:16:02.300 | chose to work with Pinecone was first, it was a well-known solution. We were kind of new to the space,
00:16:06.860 | and we thought probably can't go wrong going with the market leader. It was cloud hosted, so our team
00:16:13.260 | wouldn't have to spin up any additional infrastructure. It was really easy to get started. They had great getting
00:16:18.140 | started guides and SDKs. They had embedding models bundled with the solution. So for a vector database,
00:16:25.100 | typically you have to embed the information before it goes into the database. That would require the use
00:16:29.500 | of a third party or an external vector, excuse me, embedding model. And with Pinecone, we didn't
00:16:35.020 | actually have to go find another embedding model provider or host our own embedding model. We just
00:16:39.100 | used the one that they provide. And last but not least, their customer support was awesome. They got on a lot of
00:16:44.300 | calls with us, helped us analyze different vector database options, and think through graph databases
00:16:50.300 | and graph rag, whether that made sense for us.
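As a rough illustration of the storage step, here is a sketch using Pinecone's Python SDK with its bundled, hosted embedding models. The model name, index name, namespace, and metadata fields are assumptions, not 11x's actual configuration.

```python
# A minimal sketch of the storage step, assuming Pinecone's Python SDK with
# its hosted embedding models. Names and metadata fields are illustrative.
from pinecone import Pinecone

pc = Pinecone(api_key="PINECONE_API_KEY")
index = pc.Index("knowledge-base")

chunks = [
    {"id": "resource-42-chunk-0", "text": "# Alice\nAlice is an AI SDR that ..."},
    {"id": "resource-42-chunk-1", "text": "## Case study\nAcme booked 3x more meetings ..."},
]

# Embed with Pinecone's hosted model so we don't run our own embedding service.
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[c["text"] for c in chunks],
    parameters={"input_type": "passage"},
)

# Upsert vectors with the raw chunk text as metadata so it can be shown in the UI.
index.upsert(
    vectors=[
        {"id": c["id"], "values": e["values"], "metadata": {"text": c["text"]}}
        for c, e in zip(chunks, embeddings)
    ],
    namespace="seller-123",
)
```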
00:16:53.260 | So retrieval, the RAG part of the RAG workflow that we just built, right? You'll see that there's
00:17:00.620 | actually an evolution of different RAG techniques over the last year. We started off with just traditional
00:17:05.500 | RAG, which is basically just pulling information and then enriching your system prompt for an
00:17:11.020 | LLM API call, right? And then eventually that turned into an agentic RAG form where
00:17:15.740 | now you have all these tools for information retrieval. And then you attach those tools to
00:17:22.460 | whatever agentic flow that you have, and then it calls the tool as a part of its larger flow, right?
00:17:29.420 | Now something we were seeing emerge in the last couple of months is deep research RAG, where now you have
00:17:35.100 | these deep research agents which are coming up with a plan, and then they execute it. And the plan may contain one or
00:17:39.020 | many steps of retrieval, right? These deep research agents can go broad or deep depending on the context needs and
00:17:47.740 | they can evaluate whether or not they want to do more retrieval.
00:17:51.740 | We ended up building a deep research agent.
00:17:53.740 | We actually used a company called Letta. Letta is a cloud agent provider and they're really easy to build with.
00:18:01.740 | So how it works, basically we pass in the lead information to our agent and then it basically
00:18:08.140 | comes up with a plan, the plan contains one or more context retrieval steps, and then eventually,
00:18:13.500 | you know, does the tool call, summarizes the results, and then generates an answer for us in a nice,
00:18:21.180 | clean Q&A manner, right? And then this is kind of what it looks like for a system with two questions that we ask.
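Here is a stripped-down sketch of that plan-then-retrieve flow. It is not Letta's API; the planning and answering helpers stand in for LLM calls, and only the Pinecone query mirrors the retrieval step described above.

```python
# A stripped-down sketch of the deep-research retrieval loop described above.
# This is NOT Letta's API -- just the shape of the flow: plan questions,
# retrieve chunks per question, then assemble a Q&A summary.
from pinecone import Pinecone

pc = Pinecone(api_key="PINECONE_API_KEY")
index = pc.Index("knowledge-base")

def plan_questions(lead: dict) -> list[str]:
    # In the real system an agent plans these; here they're hard-coded examples.
    return [
        f"What products do we sell that are relevant to a {lead['title']}?",
        "Which case studies or testimonials can we reference?",
    ]

def retrieve_chunks(question: str, namespace: str, top_k: int = 5) -> list[str]:
    # Embed the question with the same hosted model used at upsert time.
    query_vec = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[question],
        parameters={"input_type": "query"},
    )[0]["values"]
    results = index.query(
        vector=query_vec, top_k=top_k, include_metadata=True, namespace=namespace
    )
    return [m["metadata"]["text"] for m in results["matches"]]

def research(lead: dict, namespace: str) -> list[dict]:
    qa = []
    for question in plan_questions(lead):
        chunks = retrieve_chunks(question, namespace)
        # An LLM would summarize the chunks into an answer; we just join them here.
        qa.append({"question": question, "answer": "\n".join(chunks)})
    return qa
```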
00:18:27.660 | Now on to visualization, the most mysterious part of the pipeline. So what does visualization have to do
00:18:35.500 | with a RAG or ETL pipeline? For more context, our customers are trusting Alice to represent their business.
00:18:43.420 | They really want to know that Alice knows her stuff, that she actually knows the products that they sell,
00:18:47.660 | and she's not going to lie about case studies or testimonials or make things up about the pain points
00:18:52.140 | that they address. So how can we reassure them? In our case, we came up with a solution,
00:18:57.340 | which is to let users peek into Alice's brain. Get ready. This is what that looks like.
00:19:04.140 | We have an interactive 3D visualization of the knowledge base available in the product.
00:19:10.140 | What we've done here is taken all of the vectors from our Pinecone vector database and collapsed,
00:19:16.780 | or actually, excuse me, I think the correct term is projected them down to just three dimensions.
00:19:20.780 | So we're going to render them as nodes in three-dimensional space.
00:19:23.500 | UMAP.
00:19:23.900 | UMAP. Using UMAP. And once the nodes are visible in this space, you can click on any
00:19:31.100 | given node to view the associated chunk. This is one of the ways that, for example,
00:19:35.340 | our sales team or support team will demonstrate Alice's knowledge.
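As a small illustration of the projection behind this view, here is a sketch using umap-learn; the random array stands in for the chunk vectors actually stored in Pinecone.

```python
# A sketch of the projection behind the 3D view: take the stored embedding
# vectors (here a random stand-in array) and project them down to 3 dimensions
# with UMAP so each chunk can be rendered as a node in 3D space.
import numpy as np
import umap

# Stand-in for vectors fetched from the vector DB: 500 chunks, 1024-dim embeddings.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(500, 1024))

# Project down to 3 dimensions; each row becomes an (x, y, z) node position.
coords_3d = umap.UMAP(n_components=3, random_state=0).fit_transform(vectors)
print(coords_3d.shape)  # (500, 3)
```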
00:19:38.060 | Now, what does it look like in the actual UI, right? Basically, you start off with this nice little modal,
00:19:45.180 | you know, you drop in your URLs, your web pages, your documents, your videos,
00:19:49.420 | and then you click learn. And then it kind of shows up nicely in the UI. You have all the resources
00:19:54.540 | there. And then you have the ability to interrogate Alice about what she knows of your knowledge base,
00:20:00.220 | right? It's a really nice agent that we built, again, using Letta. And here's what it looks like
00:20:05.980 | in the campaign creation flow. You see that on the left-hand side, we have the knowledge base content
00:20:11.660 | showing up as a nice Q&A where you can click on the questions. And it shows you a drop down of the chunks
00:20:16.300 | that we retrieve. And these were used as a part of the messaging flow.
00:20:19.660 | Now, with that, we have achieved our goal. Our agent is closer to a human than being an email-o-tron,
00:20:27.420 | right? We are now basically emulating how you onboard a human SDR. You dump in a bunch
00:20:35.420 | of context and they just know. So in conclusion, the knowledge base was a pretty revolutionary project
00:20:41.660 | for our product and really changed the user experience and also leveled up our team a lot.
00:20:45.740 | We learned a lot of lessons. It was hard to create this slide, but there are just three that I want to
00:20:50.780 | highlight for you today. The first was that RAG is complex. It was a lot harder than we thought it was
00:20:56.300 | going to be. There were a lot of micro decisions made along the way. A lot of different technologies we had
00:21:00.940 | to evaluate. Supporting different resource types was hard. Hopefully you all have a better appreciation
00:21:05.340 | of how complicated RAG can be. The second lesson was that you should first get to production before
00:21:12.060 | benchmarking and then you can improve. And the idea here is that with all of those decisions and vendors
00:21:16.940 | to evaluate, it can be hard to get started. So we recommend just getting something in production that
00:21:21.980 | satisfies the product requirements and then establishing some real benchmarks which you can use to iterate and
00:21:26.860 | improve. And the last learning here was that you should lean on vendors. You guys are all going to
00:21:31.820 | be buying solutions and they're going to be fighting for your business. Make them work for it. Make them
00:21:36.620 | teach you about the different offerings and why their solution is better. And so our future plans are to
00:21:43.740 | first track and address hallucinations in our emails. Evaluate parsing vendors on accuracy and comprehensiveness,
00:21:49.820 | those metrics that we identified earlier. Experiment with hybrid RAG, the introduction of a graph database,
00:21:55.740 | alongside our vector database, and finally to just focus on reducing costs across our entire pipeline.
00:22:01.740 | And if any of this sounds interesting to you, we are hiring. So please reach out to either Satwik or myself.
00:22:07.180 | And thank you all for coming today.
00:22:13.020 | We'll see you next time.