Okay, thanks everyone for coming today. So today's talk is called "Building Alice's Brain: How We Built an AI Sales Rep That Learns Like a Human." My name is Sherwood. I am one of the tech leads here at 11x. I lead engineering for our Alice product and I'm joined by my colleague, Sotwick.
So 11x, for those of you who are unfamiliar, is a company that's building digital workers for the go-to-market organization. We have two digital workers today. We have Alice, who is our AI SDR, and then we also have Julian, who is our voice agent, and we have more workers on the way.
Today we're gonna be talking about Alice specifically, and actually Alice's brain, or the knowledge base, which is effectively her brain. So let's start from the basics. What is an SDR? Well, an SDR is a sales development representative, if you're not familiar. I know this is a room full of engineers, so I thought I would start with the basics.
And this is essentially an entry-level sales role. This is the kind of job that you might get right out of school. And your responsibilities basically boil down to three things. First, you're sourcing leads. These are people that you'd like to sell to. Then you're contacting them or engaging them across channels.
And finally, you're booking meetings with those people. So your goal here is to generate positive replies and meetings booked. These are the two key metrics for an SDR. And a lot of an SDR's job boils down to writing emails like the one that you see in front of you right now.
This is actually an email that Alice has written, and it's an example of the type of work output that Alice has. Alice sends about 50,000 of these emails in a given day. And that's in comparison to a human SDR who would send 20 to 50. And Alice is now running campaigns for about 300 different business organizations.
So before we go any further, I want to define some terms. Because since we work at 11x, we have our customers, but then our customers also have their customers. So things get a little confusing. Today, we'll be using the term seller to refer to the company that is selling something through Alice, that is, our customer.
And then we'll be using the term lead to refer to the person who's being sold to. And here's what that looks like as a diagram. You can see the seller is pushing context about their business. These are the products that they sell or the case studies that they have that they can reference in emails.
They're pushing that to Alice. And then Alice is using that to personalize emails for each of the leads that she contacts. So there are two requirements that Alice needs to meet in order to succeed in her role. The first is that she needs to know the seller: the products, the services, the case studies, the pain points, the value props, the ICP.
And the second is that she needs to know the lead: their role, their responsibilities, what they care about, what other solutions they've tried, pain points that they might be experiencing, the company they work for. And today, we're going to be really focused on knowing the seller. So in the old version of our product, the seller was responsible for pushing context about their business to Alice.
And they did so through a manual experience called the library. Here you can see what that looks like: the library shows all of the different products and offers that are available for this business, which Alice can then reference when she writes emails. The user would have to enter details about every individual product and service, along with all of the pain points, solutions, and value props associated with them, in our dashboard, including these detailed descriptions.
And those descriptions were important to get right, because they actually get included in Alice's context when she writes the emails. Then later on, during campaign creation, this is what it looks like to create a campaign. And you can see we have a lead in the top left.
And the user is selecting the different offers that they've defined from the library in the top right. And these are the offers that Alice has access to when she's generating her emails. We had a lot of problems with this user experience. And the first one was it was just extremely tedious.
It was a really bad and cumbersome user experience. The user had to enter a lot of information. And that created onboarding friction, where users couldn't actually run campaigns until they had filled out their library. And finally, the emails that we were generating using this approach were just suboptimal.
Users had to choose between too few offers, which meant you'd have irrelevant offers for a given lead, or too many offers, which meant all of that stuff ended up in the context window and Alice just wasn't as smart when she wrote those emails.
So how can we address this? Well, we had an idea, which is that instead of the seller being responsible for pushing context about the business to Alice, we could flip things around so that Alice can proactively pull all of the context about the seller into her system, and then use whatever is most relevant when writing those emails.
And that's effectively what we accomplished with the knowledge base, which we'll tell you more about in just a moment. So for the rest of the talk, we're going to first do a high-level overview of the knowledge base and how it works. Then we will do a deep dive on the pipeline, the different steps in our RAG system pipeline.
Then after that, we will talk through the user experience of the knowledge base. And we will wrap up with some lessons from this project and future plans. So let's start out with an overview. All right. So, overview: what is the knowledge base? It's basically a way for us to get closer to a human experience.
If you're training a human SDR, you bring them in, you dump a bunch of documents on them, and they ramp up over a period of weeks or months, and you can check in on their progress. Similar to that, the knowledge base is a centralized repository on our platform for the seller info.
And then users can kind of come in, dump all their source material, and then we are able to reference that information at the time of message generation. Now, what resources do SDRs care about? Here's a little glimpse into that. Marketing materials, case studies, sales calls, press releases, you know, and a bunch of other stuff.
Now, how do we bucket these into categories that we're actually going to parse? Well, we created documents and images, websites, and then media, meaning audio and video. And you're going to see why that's important. So here's an overview of what the architecture looks like. It starts off with the user uploading something, any document or resource, in the client. We save it to our S3 bucket and send it to the backend, which creates a bunch of resources in our DB and then kicks off a bunch of jobs depending on the resource type and the vendor selected.
The vendors do the parsing asynchronously. Once they're done, they send a webhook to us, which we consume via ingest. Once we've consumed that webhook, we take the parsed artifact that we get back from the vendor, store it in our DB, and at the same time embed it and upsert it to Pinecone, as in the sketch below.
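As a rough illustration of that webhook-consumption step, here's a minimal sketch. The endpoint path, payload fields, and helpers like save_parsed_artifact and enqueue_embedding_job are hypothetical stand-ins, not our actual backend.

```python
# Hypothetical sketch of consuming a vendor's "parsing complete" webhook.
from fastapi import FastAPI, Request

app = FastAPI()

def save_parsed_artifact(resource_id: str, markdown: str) -> None:
    """Stand-in for the DB write that stores the parsed markdown."""
    ...

def enqueue_embedding_job(resource_id: str) -> None:
    """Stand-in for kicking off the chunk -> embed -> upsert job."""
    ...

@app.post("/webhooks/parsing-complete")
async def handle_parsing_complete(request: Request):
    payload = await request.json()
    resource_id = payload["resource_id"]   # which uploaded resource this belongs to
    markdown = payload["markdown"]         # the parsed artifact from the vendor

    # Persist the artifact so the UI can reflect the new state,
    # then hand off the embedding work to an async job.
    save_parsed_artifact(resource_id, markdown)
    enqueue_embedding_job(resource_id)
    return {"status": "ok"}
```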
Once everything is stored in the local DB, the UI updates, and eventually our agent can query Pinecone, our vector DB, for that stored information we just put in. So now that we have a high-level understanding of how the knowledge base works, let's dig into each individual step in the pipeline.
There are five different steps in the pipeline. The first is parsing, then there's chunking, then there's storage, then there's retrieval, and finally we have visualization, which sounds a little untraditional, but we'll cover it in a moment. So let's start with parsing. What is parsing? I think we probably all take this for granted, but it's worth defining.
Parsing is the process of converting a non-text resource into text. And the reason this is necessary is because, as we all know, language models speak text. So in order to make information that is represented in a different form, like a PDF, or an MP4 file, or an image, legible or useful to the LLM, we need to first convert it to text.
And so one way of thinking about parsing is that it's the process of making non-text information legible to a large language model. We do have multimodal models that are one solution to this, but there are enough restrictions on multimodal models that parsing is still relevant.
So to illustrate that, here are the five different resource types that we mentioned a moment ago going through our parsing process and coming out as markdown, which is a type of text that, as we all know, contains structural information and formatting that is semantically meaningful and useful.
Let's talk about how we implemented parsing. The short answer is that we did not; we didn't want to build this from scratch. And we had a few different reasons for that. The first is that you just saw that we had five different resource types and a lot of different file types within each of them.
We thought it was going to be too many, and we thought it was going to be too much work. We wanted to get to market quickly. The last reason was that we just weren't that confident in the outcome. There are vendors who dedicate their entire company to building an effective parsing system for a specific resource type.
We didn't want our team to have to become specialists in parsing for each one of these resource types and to build a parsing system for each. We thought that if we tried to do this, the outcome just wouldn't be that successful. So we chose to work with a vendor, and here are a bunch of the vendors that we came across.
You can find 10 or 20 or 50 with just a quick Google search, but these are some of the leaders that we evaluated. And in order to make a decision, we came up with three specific requirements. The first was that we needed support for our necessary resource types.
That goes without saying. We also wanted markdown output. And then finally, we wanted this vendor to support webhooks; we wanted to be able to receive that output in a convenient manner. A few things that we didn't consider to start out with: accuracy. Crazy, right? We didn't consider accuracy or comprehensiveness.
Our assumption here was that most of the vendors that are leaders in the market are going to be within a reasonable band of accuracy and comprehensiveness. Accuracy refers to whether or not the extracted output actually matches the original resource. Comprehensiveness, on the other hand, is the amount of extracted information that makes it into the final output.
The last thing that we didn't really consider was cost, to be honest. And this was because the system was pre-production. We didn't have real production data yet and we didn't know what our usage would be. So we figured we would come back and optimize cost once we had real usage data.
So on to our final selections. For documents and images, we chose to work with LlamaParse, which is a LlamaIndex product. I think Jerry was up here earlier today. And the reasons that we chose to work with LlamaParse were, first, it supported the largest number of file types of any document parsing solution we could find.
And second, their support was really great. Jerry and his team were quick to get in a Slack channel with us, I think within just a couple of hours of us doing an initial evaluation. And with LlamaParse, we're able to turn documents like this PDF of an 11x sales deck into a markdown file, like the one you see on the right.
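To give a concrete feel for this, here's a minimal sketch of parsing a document to markdown with LlamaParse's Python SDK. The API key and file name are placeholders, and in our pipeline results actually come back via webhooks rather than this synchronous call, so treat the exact parameters as version-dependent.

```python
# Minimal LlamaParse sketch: PDF in, markdown out. Key and file name are placeholders.
from llama_parse import LlamaParse

parser = LlamaParse(
    api_key="llx-...",        # LlamaCloud API key
    result_type="markdown",   # request markdown rather than plain text
)

# Parse a sales deck; the result is a list of documents whose .text is markdown.
documents = parser.load_data("11x_sales_deck.pdf")
markdown = "\n\n".join(doc.text for doc in documents)
print(markdown[:500])
```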
For websites, we chose to work with FireCrawl. The other main vendor that we were considering was Tavily, and this is actually not a major knock on Tavily. We chose to work with FireCrawl because, first, we were familiar with them; we had already worked with them on a previous project.
And secondly, Tavily's Crawl endpoint, which is the endpoint we would have needed for this project, was still in development at the time, so it wasn't something we could actually use. And similar to LlamaParse, with FireCrawl we are able to take a website like this Brex homepage that you see here and turn it into another markdown document.
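As a rough sketch of that, here's what fetching a page as markdown with FireCrawl's Python SDK can look like. The SDK's parameter and response shapes have changed between versions, so this is illustrative rather than copy-paste ready; for a whole site you'd use their crawl endpoint rather than a single-page scrape.

```python
# Illustrative FireCrawl sketch: fetch one page and ask for markdown output.
from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key="fc-...")   # placeholder key

# Depending on SDK version this may instead be params={"formats": ["markdown"]}
# with a dict response; crawl_url would walk an entire site rather than one page.
page = app.scrape_url("https://www.brex.com", formats=["markdown"])
print(page.markdown[:500])
```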
Then we have audio and video. And for audio and video, we chose to work with a newer upstart vendor called Cloud Glue. And the reasons that we chose to work with Cloud Glue were first, they supported both audio and video, not just audio. And second, they were actually capable of extracting information from the video itself, as opposed to just transcribing the video and giving us back a markdown file that contains the transcript of the audio.
And so with Cloud Glue, we were able to turn YouTube videos and MP4 files and other video formats into markdown like you see on the right. So now that everything is markdown, we move on to the next step, which is chunking. All right. Markdown. Let's go. Now, basically, we have a blob of markdown, right?
And we want to break it down into semantic entities that we can embed and put into our vector DB. At the same time, we want to preserve the structure of the markdown, because it contains meaning inherently: something being a title versus a paragraph carries inherent meaning.
So we're splitting these long blobs of text, like 10-page documents, into chunks that we can eventually retrieve after we've embedded and stored them in a vector DB. So the question becomes: how do we split a long document into chunks?
So, chunking strategies. There are various things you can do. You can split on tokens. You can split on sentences. You can also split on markdown headers. You can make LLM calls and have an LLM split your document into chunks, or use any combination of the above.
Now, what you want to ask yourself when you're deciding on a chunking strategy is: what kinds of logical units am I trying to preserve in my data? What do I eventually want to extract during retrieval? What strategy will keep those units intact while still letting me embed them and store them in whatever DB I want?
Should I use a different strategy for different resource types? We have to deal with PDFs, PowerPoints, videos. And finally, what kinds of queries or retrieval strategies am I expecting? We ended up with a combination of all the things we mentioned.
So we do a kind of waterfall: because we want the records in our vector DB to stay under a certain token count, we split on markdown headers first, then we split on sentences, and then eventually we split on tokens. That has worked well for us for all types of documents, and here's roughly what it looks like.
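Here's a simplified, self-contained sketch of that waterfall. It uses a rough whitespace token count and plain regexes rather than a real tokenizer or production splitters, so the budget and helpers are illustrative only.

```python
# Simplified waterfall chunking: markdown headers -> sentences -> tokens.
# Token counting is a rough whitespace approximation, not a real tokenizer.
import re

MAX_TOKENS = 512

def rough_token_count(text: str) -> int:
    return len(text.split())

def split_on_headers(markdown: str) -> list[str]:
    # Split at every markdown header, keeping the header with its section body.
    parts = re.split(r"(?m)^(?=#{1,6}\s)", markdown)
    return [p.strip() for p in parts if p.strip()]

def split_on_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def split_on_tokens(text: str, size: int) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def pack(pieces: list[str], budget: int) -> list[str]:
    # Greedily merge small pieces so chunks stay close to (but under) the budget.
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = (current + "\n\n" + piece).strip()
        if rough_token_count(candidate) <= budget:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

def waterfall_chunk(markdown: str, budget: int = MAX_TOKENS) -> list[str]:
    chunks: list[str] = []
    for section in split_on_headers(markdown):
        if rough_token_count(section) <= budget:
            chunks.append(section)        # header-level split was enough
            continue
        for piece in pack(split_on_sentences(section), budget):
            if rough_token_count(piece) <= budget:
                chunks.append(piece)      # sentence-level split was enough
            else:
                chunks.extend(split_on_tokens(piece, budget))  # last resort
    return chunks
```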
It has successfully preserved markdown chunks that we can cleanly show in the UI, and it also prevents super long chunks, which dilute the meaning of your document if you end up with them. Okay, so we have split all of our markdown into individual chunks.
It's now time to put those chunks somewhere. We're going to store them. Let's talk about storage technologies. I'm sure everyone is here for the RAG section, so they assume that we're using a vector database. We actually are using a vector database. But to be pedantic, RAG is retrieval-augmented generation.
So, as we all know, anytime you're retrieving context from an external source, whether it's a graph database, or Elasticsearch, or a file in the file system, that also qualifies as RAG. Some of the other options you can use for RAG: I just mentioned a graph database, but also document databases, relational databases, key-value stores.
You could even use object storage like S3. In our case, we did use a vector database, and that's because we wanted to do similarity search, which is what vector databases are built for and optimized for. Once again, we had a lot of options to choose from. This is not a complete or exhaustive list.
In the end, we chose to work with a company called Pinecone. And the reasons that we chose to work with Pinecone were, first, it was a well-known solution. We were kind of new to the space, and we thought we probably couldn't go wrong going with the market leader. It was cloud hosted, so our team wouldn't have to spin up any additional infrastructure.
It was really easy to get started; they had great getting-started guides and SDKs. They had embedding models bundled with the solution. For a vector database, you typically have to embed the information before it goes into the database, which would require the use of a third-party or external embedding model.
With Pinecone, we didn't have to go find another embedding model provider or host our own embedding model; we just used the one that they provide. And last but not least, their customer support was awesome. They got on a lot of calls with us, helped us analyze different vector database options, and helped us think through graph databases and graph RAG and whether that made sense for us.
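To make that concrete, here's a minimal sketch of embedding chunks with a Pinecone-hosted model and upserting them. The index name, namespace, model choice, ID scheme, chunk text, and metadata fields are all placeholders rather than our production setup, and the exact client calls depend on the SDK version.

```python
# Illustrative Pinecone sketch: embed chunks with a hosted model, then upsert.
from pinecone import Pinecone

pc = Pinecone(api_key="pc-...")        # placeholder key
index = pc.Index("knowledge-base")     # placeholder index name

chunks = [
    "## Products\nAcme sells expense management software for finance teams...",
    "## Case study\nA 200-person startup cut approval times from days to hours...",
]

# Embed with a model hosted by Pinecone, so no separate embedding provider is needed.
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=chunks,
    parameters={"input_type": "passage"},
)

vectors = [
    {
        "id": f"resource-123-chunk-{i}",
        "values": emb.values,
        "metadata": {"resource_id": "resource-123", "text": chunk},
    }
    for i, (chunk, emb) in enumerate(zip(chunks, embeddings))
]
index.upsert(vectors=vectors, namespace="seller-acme")
```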
So, retrieval: the RAG part of the RAG workflow that we just built. You'll see that there's actually been an evolution of different RAG techniques over the last year. We started off with just traditional RAG, where you're pulling information and then enriching your system prompt for an LLM API call.
Eventually that turned into agentic RAG, where now you have tools for information retrieval. You attach those tools to whatever agentic flow you have, and it calls the tools as part of its larger flow. Something we've seen emerge in the last couple of months is deep research RAG, where you have deep research agents which come up with a plan and then execute it.
And the plan may contain one or many steps of retrieval. These deep research agents can go broad or deep depending on the context needs, and they can evaluate whether or not they want to do more retrieval. We ended up building a deep research agent. We actually used a company called Letta.
Letta is a cloud agent provider, and they're really easy to build with. How it works: we pass in the lead information to our agent, it comes up with a plan, the plan contains one or many context retrieval steps, and then eventually it does the tool calls, summarizes the results, and generates an answer for us in a nice, clean Q&A manner.
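As a sketch of what one of those retrieval tools could look like: this is a generic function, not Letta's actual tool interface, and the index, namespace, and model names are placeholders carried over from the storage sketch above.

```python
# Generic retrieval tool a deep research agent could call as one step of its plan.
from pinecone import Pinecone

pc = Pinecone(api_key="pc-...")
index = pc.Index("knowledge-base")

def search_knowledge_base(question: str, seller_namespace: str, top_k: int = 5) -> list[str]:
    """Return the stored chunks most similar to the agent's question."""
    query_vector = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=[question],
        parameters={"input_type": "query"},
    )[0].values

    results = index.query(
        vector=query_vector,
        top_k=top_k,
        namespace=seller_namespace,
        include_metadata=True,
    )
    return [match.metadata["text"] for match in results.matches]

# The agent might call this several times while executing its plan, e.g.:
# search_knowledge_base("Which case studies mention finance teams?", "seller-acme")
```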
And this is what that looks like for a system with two questions that we ask. Now, on to visualization, the most mysterious part of the pipeline. So what does visualization have to do with a RAG or ETL pipeline? For more context, our customers are trusting Alice to represent their business.
They really want to know that Alice knows her stuff, that she actually knows the products that they sell, and she's not going to lie about case studies or testimonials or make things up about the pain points that they address. So how can we reassure them? In our case, we came up with a solution, which is to let users peek into Alice's brain.
Get ready. This is what that looks like. We have an interactive 3D visualization of the knowledge base available in the product. What we've done here is taken all of the vectors from our Pinecone vector database and collapsed, or actually, the correct term is projected, them down to just three dimensions.
So we render them as nodes in three-dimensional space, using UMAP for the projection. And once the nodes are visible in this space, you can click on any given node to view the associated chunk. This is one of the ways that, for example, our sales team or support team will demonstrate Alice's knowledge.
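For the curious, the projection itself is only a few lines with the umap-learn library. How the vectors get exported out of Pinecone is elided here; assume they are already sitting in a NumPy array.

```python
# Minimal sketch of projecting stored embedding vectors down to 3D with UMAP.
import numpy as np
import umap  # pip install umap-learn

# Assume the vectors were already exported from the vector DB into an array
# of shape (num_chunks, embedding_dim).
vectors = np.load("knowledge_base_vectors.npy")

reducer = umap.UMAP(n_components=3, metric="cosine", random_state=42)
coords_3d = reducer.fit_transform(vectors)  # shape: (num_chunks, 3)

# Each row becomes a node's (x, y, z) position in the 3D scene, keyed by chunk
# id so that clicking a node can show the associated chunk text.
```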
Now, what does it look like in the actual UI? Basically, you start off with this nice little modal: you drop in your URLs, your web pages, your documents, your videos, and then you click Learn. And then it shows up nicely in the UI; you have all the resources there.
And then you have the ability to interrogate Alice about what she knows of your knowledge base. It's a really nice agent that we built, again, using Letta. And here's what it looks like in the campaign creation flow. You see that on the left-hand side, we have the knowledge base content showing up as a nice Q&A where you can click on the questions.
And it shows you a dropdown of the chunks that we retrieve, and these are used as part of the messaging flow. Now, with that, we have achieved our goal. Our agent is closer to a human than to an email-o-tron. We are now basically emulating how you onboard a human SDR.
You dump in a bunch of context and they just know. So in conclusion, the knowledge base was a pretty revolutionary project for our product and really changed the user experience and also leveled up our team a lot. We learned a lot of lessons. It was hard to create this slide, but there are just three that I want to highlight for you today.
The first was that RAG is complex. It was a lot harder than we thought it was going to be. There were a lot of micro decisions made along the way, and a lot of different technologies we had to evaluate. Supporting different resource types was hard. Hopefully you all have a better appreciation of how complicated RAG can be.
The second lesson was that you should first get to production before benchmarking and then you can improve. And the idea here is that with all of those decisions and vendors to evaluate, it can be hard to get started. So we recommend just getting something in production that satisfies the product requirements and then establishing some real benchmarks which you can use to iterate and improve.
And the last learning here was that you should lean on vendors. You guys are all going to be buying solutions and they're going to be fighting for your business. Make them work for it. Make them teach you about the different offerings and why their solution is better. And so our future plans are to first track and address hallucinations in our emails.
Evaluate parsing vendors on accuracy and comprehensiveness, those metrics that we identified earlier. Experiment with hybrid RAG, introducing a graph database alongside our vector database. And finally, focus on reducing costs across our entire pipeline. And if any of this sounds interesting to you, we are hiring.
So please reach out to either Sotwick or myself. And thank you all for coming today. We'll see you next time.