BotDojo Launch: Enhancing AI Assistants with Evaluations and Synthetic Data

PAUL HENRY: So, hello, my name is Paul Henry. I'm the founder of BotDojo. And as a previous CTO, I was working with teams deploying LLM applications for hundreds of thousands of customers. And like many of you know, it's super easy to hook up a vector database with an LLM over the weekend, but really hard to get it production ready.

And so that's what we do. We are an AI-enablement company, and we let companies deploy AI to prod. Live demo time. All right, so today I'm going to show you a demo of our product. We're going to take synthetic data that we're going to generate, and we're going to combine it with evaluations to see how we can improve the performance of a chatbot.

Or at least that's what I hope happens. All right. So I'm going to open up our template of our chatbot. And we have live customers that are using this template. It's kind of battle tested. And so let's test it out. "How do I create a vector index in BotDojo?"

OK. And as you can see, all the little nodes are lighting up as they execute. We're taking the question. We're looking at the chat history. We're going to the vector database to retrieve the information. And then we're answering it with an AI model. So if I pull this up, you can kind of see in our low-code editor, this is the prompt that we're sending to the LLM.
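The steps described here (take the question, look at the chat history, retrieve from the vector database, answer with a model) can be sketched roughly as follows. This is a minimal illustration, not BotDojo's implementation: the bag-of-words `embed` function stands in for a real embedding model, `llm` is any callable that maps a prompt to text, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, index, k=2):
    """Return the k documents most similar to the question."""
    query_vec = embed(question)
    ranked = sorted(index, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]


def build_prompt(question, history, documents):
    """Assemble chat history, numbered context passages, and the question."""
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(documents))
    past = "\n".join(history) if history else "(none)"
    return f"Chat history:\n{past}\n\nContext:\n{context}\n\nQuestion: {question}"


def run_flow(question, history, index, llm):
    """Run retrieval, then generation; keep the prompt and documents as a trace."""
    documents = retrieve(question, index)
    prompt = build_prompt(question, history, documents)
    return {"answer": llm(prompt), "documents": documents, "prompt": prompt}
```

Returning the prompt and the retrieved documents alongside the answer is one simple way to make each step inspectable after the fact.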

We're getting the results out here. And we also support JSON schema. So if the model supports JSON output, like Grok, Claude, and all that stuff, then we just conform to that. One key thing is you can pull a trace of each node and see exactly what we sent to the LLM, what came from the retriever, the exact data, which has been super useful for debugging apps.
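As a rough illustration of what conforming to a JSON schema means for model output, the sketch below checks a raw model response against a small subset of JSON Schema (required keys and top-level property types only). It is a hand-rolled check written for this example, not a full validator and not BotDojo's code; the `answer` and `citations` field names in the usage note are assumptions.

```python
import json

# Mapping from JSON Schema type names to Python types (subset only).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def matches_type(value, type_name):
    """True if value has the given JSON Schema type."""
    # bool is a subclass of int in Python, so exclude it from numeric types.
    if type_name in ("number", "integer") and isinstance(value, bool):
        return False
    return isinstance(value, TYPE_MAP[type_name])


def conforms(raw_output, schema):
    """Check that raw_output is a JSON object with the required, correctly typed keys."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    if any(key not in data for key in schema.get("required", [])):
        return False
    for key, spec in schema.get("properties", {}).items():
        if key in data and not matches_type(data[key], spec["type"]):
            return False
    return True
```

For example, with a schema that requires a string `answer` and an array `citations`, a response of `{"answer": "...", "citations": []}` passes, while plain text or a response missing `citations` does not.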

All right, and cool, we have an image. It's got citations. We should ship it. That was supposed to be a joke, but all right. So this is where evaluations come in. So I'm going to demonstrate the evaluations that I previously ran. So we have a feature in BotDojo called batches, which allows you to run a whole bunch of questions through your chatbot or your AI flow and run evaluations to kind of see how things are doing.

So if you can see this, we have five evaluations that we ran. There's a little bit of red. That's because we don't have enough information from our vector database. It also checks for things like hallucinations. So let's try to fix that. So I'm going to clone this batch.
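The batch idea described here, running many questions through a flow and scoring each result with several evaluations, might look something like the sketch below. Everything in it is an assumption made for illustration: `flow` is any callable returning a dict with `answer` and `documents`, and the two evaluators are deliberately crude stand-ins (real hallucination checks are usually model-based).

```python
def run_batch(questions, flow, evaluators):
    """Run every question through the flow and score each result."""
    rows = []
    for question in questions:
        result = flow(question)
        scores = {name: check(question, result) for name, check in evaluators.items()}
        rows.append({"question": question, "result": result, "scores": scores})
    return rows


def pass_rates(rows):
    """Fraction of rows that passed each evaluation."""
    if not rows:
        return {}
    names = rows[0]["scores"].keys()
    return {name: sum(row["scores"][name] for row in rows) / len(rows) for name in names}


def has_context(question, result):
    """Fails when retrieval returned nothing to answer from."""
    return len(result["documents"]) > 0


def is_grounded(question, result):
    """Crude hallucination proxy: every answer word appears in the retrieved text."""
    context_words = set(" ".join(result["documents"]).lower().split())
    return all(word in context_words for word in result["answer"].lower().split())
```

A row that fails `has_context` corresponds to the "red" case mentioned above, where the index does not hold enough information for the question.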

I'm going to rename it "with generated data". I'm going to increase the throughput a little bit because of time. And I don't have enough time to generate all the data for this demo. So the previous run was filtering out the generated data. And so I'm going to remove the filter that we're passing into the flow so it takes in the generated data.

You can also change the model and all that kind of stuff to see how it performs. All right, so while that guy is running, I'm going to open up another flow. And so this is the actual flow that we used to generate that synthetic data. And so let me run this one real quick.

And so this particular flow takes in multiple inputs. And so I'm going to paste in some JSON from a previous run. And what this is going to do is kind of a trick that's been working well for customers, where you extract questions and answers from support tickets.

So these are live agents talking with customers. And you use this as test data to send through your chatbot. And we take relevant information from the existing index and we have it write a document. And so it uses the same writing style. And then we do an inline evaluation where we check to see if the document has enough information to answer the question.
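The synthetic data flow described above (extract a question and answer from a support ticket, pull related material from the existing index, have a model write a document, then keep it only if it passes an inline check) can be sketched as below. The four callables are placeholders for model-backed steps, and the `generated` flag is an assumed metadata field that a later run could filter on; none of these names come from BotDojo.

```python
def generate_synthetic_docs(tickets, extract_qa, retrieve, write_document, can_answer, index):
    """Turn support tickets into index documents, keeping only those that pass the check.

    extract_qa(ticket) -> (question, answer)
    retrieve(question) -> list of related passages from the existing index
    write_document(question, answer, context) -> document text
    can_answer(document, question) -> bool, the inline evaluation
    """
    added = []
    for ticket in tickets:
        question, answer = extract_qa(ticket)
        context = retrieve(question)
        document = write_document(question, answer, context)
        # Inline evaluation: skip documents that cannot answer their own question.
        if can_answer(document, question):
            index.append({"text": document, "generated": True})
            added.append(document)
    return added
```

Passing the retrieved context into `write_document` is what lets the generated document pick up the writing style of the existing material.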

And then we also have a code node here, where a lot of times when you're using these low-code editors, there's situations where you have 40,000 different boxes. And so when you have to write code, we support TypeScript and soon Python. But you can see that, hey, we're getting the information and we're writing into the vector index.

All right. Running out of time. Okay, let me go back to the support chatbot. Moment of truth. So I'm going to compare the batch that we ran before with the new stuff in 20 seconds. Oh, shh. You do it 15 times and it doesn't work. 10, 9. We're also hiring.

So if you're an AI engineer, help us fix this. All right, there it comes. Okay. All right. One second left. It's all green. So it improved the, you know, it measurably improved something. So thank you. Botdojo.com. Check us out. Thanks.
