BotDojo Launch: Enhancing AI Assistants with Evaluations and Synthetic Data

00:00:00.120 | .

00:00:13.680 | PAUL HENRY: So, hello, my name is Paul Henry.

00:00:15.600 | I'm the founder of Bibe Dojo.

00:00:17.200 | And as a previous CTO, I was working with teams

00:00:20.520 | deploying LLMs applications for hundreds of thousands

00:00:24.040 | of customers.

00:00:25.480 | And like many of you guys know, it's

00:00:26.800 | super easy to hook up a vector database with an LLM

00:00:29.960 | over the weekend, but really hard to get it production ready.

00:00:33.680 | And so that's what we do.

00:00:34.880 | We are an AI-enablement company, and we

00:00:37.800 | let companies deploy AI to prod.

00:00:41.440 | Live demo time.

00:00:43.840 | All right, so today I'm going to show you a demo of our product.

00:00:46.920 | We're going to take synthetic data that we're going to generate,

00:00:50.440 | and we're going to combine it with evaluations

00:00:52.320 | to see how we can improve the performance of a chatbot.

00:00:56.320 | Or at least that's what I hope happens.

00:00:59.000 | All right.

00:01:00.680 | So I'm going to open up our template of our chatbot.

00:01:04.840 | And we have customers live that are using this template.

00:01:07.180 | It's kind of battle tested.

00:01:09.340 | And so let's test it out.

00:01:10.340 | How do I create a vector index in bot dojo?

00:01:23.720 | OK.

00:01:24.220 | And as you can see, all the little no's

00:01:26.240 | are lighting up as they execute.

00:01:28.160 | We're taking the question.

00:01:29.280 | We're looking at the chat history.

00:01:30.900 | We're going to the vector database to retrieve the information.

00:01:34.480 | And then we're answering it with an AI model.

00:01:36.900 | So if I pull this up, you can kind of see in our low-code editor,

00:01:40.780 | this is the prompt that we're sending to the LLM.

00:01:43.120 | We're getting the results out here.

00:01:45.340 | And we also support JSON schema.

00:01:48.580 | So if the model supports JSON output,

00:01:51.760 | like Grok, Claude, and all that stuff,

00:01:55.420 | then we just conform to that.

00:01:59.060 | One key thing is you can pull a trace of each node

00:02:03.420 | and see exactly what we sent to the LLM,

00:02:06.060 | what came from the retriever, the exact data,

00:02:08.820 | which has been super useful for debugging apps.

00:02:12.000 | All right, and cool, we have an image.

00:02:14.100 | It's got citations.

00:02:15.620 | We should ship it.

00:02:17.780 | That was supposed to be a joke, but all right.

00:02:21.520 | So this is where evaluations come in.

00:02:23.420 | So I'm going to demonstrate the evaluations

00:02:26.900 | that I previously ran.

00:02:28.680 | So we have a feature in Bot Dojo called batches,

00:02:31.900 | which allow you to run a whole bunch of questions

00:02:33.860 | through your chatbot or your AI flow

00:02:36.680 | and run evaluations to kind of see how things are doing.

00:02:40.080 | So if you can see this, we have a few five evaluations

00:02:43.580 | that we ran.

00:02:44.340 | There's a little bit of red.

00:02:46.160 | That's because we don't have enough information

00:02:48.280 | from our vector database.

00:02:49.820 | It also checks for things like hallucinations.

00:02:52.400 | So let's try to fix that.

00:02:54.840 | So I'm going to clone this batch.

00:02:57.820 | I'm going to rename it with generated data.

00:03:02.960 | I'm going to increase the throughput a little bit

00:03:04.820 | because of time.

00:03:06.320 | And I don't have enough time to generate all the data

00:03:10.200 | for this demo.

00:03:11.000 | So the previous ran was filtering out the generated data.

00:03:15.180 | And so I'm going to remove the filter that we're passing

00:03:17.840 | into the flow so it takes in the generated data.

00:03:21.620 | You can also change the model and all that kind of stuff

00:03:24.280 | to see how it performs.

00:03:26.600 | All right, so while that guy is running,

00:03:29.740 | I'm going to open up another flow.

00:03:32.600 | And so this is the actual flow that we generated that synthetic data.

00:03:37.540 | And so let me run this one real quick.

00:03:43.520 | And so this particular flow takes in multiple inputs.

00:03:46.460 | And so I'm going to paste in some JSON from a previous run.

00:03:52.520 | And what this is going to do is kind of a trick that's been working well for customers

00:03:56.240 | is where you take, you extract questions and answers from support tickets.

00:04:00.500 | So these are live agents talking with customers.

00:04:02.720 | And you use this as a test data to send it through your chat bot.

00:04:07.840 | And we take relevant information from the existing index

00:04:11.240 | and we have it write a document.

00:04:12.660 | And so it uses the same writing style.

00:04:16.600 | And then we do an inline evaluation to where we check to see

00:04:22.140 | if the document has enough information to answer the question.

00:04:24.500 | And then we also have a code node here where a lot of times

00:04:28.460 | when you're using these low code editors,

00:04:30.360 | there's situations where you have 40,000 different boxes.

00:04:34.180 | And so when you have to do write code, we support Tyscript and soon Python.

00:04:39.660 | But you can see that, hey, we're getting the information

00:04:42.360 | and we're right into the vector index.

00:04:43.920 | All right.

00:04:45.860 | Running out of time.

00:04:47.540 | Okay, let me go back to the support chat bot.

00:04:50.740 | Moment of truth.

00:04:52.580 | So I'm going to compare the batch that we ran before

00:04:56.420 | with the new stuff in 20 seconds.

00:05:00.060 | Oh, shh.

00:05:02.200 | You do it 15 times and it doesn't work.

00:05:07.620 | 10, 9.

00:05:10.800 | We're also hiring.

00:05:11.680 | So if you're an AI engineer, help us fix this.

00:05:15.840 | All right, there it comes.

00:05:16.640 | Okay.

00:05:17.640 | All right.

00:05:18.460 | One second left.

00:05:19.320 | It's all green.

00:05:20.160 | So it improved the, you know,

00:05:21.720 | it measuredly improved something.

00:05:22.940 | So thank you.

00:05:24.260 | Botdojo.com.

00:05:26.240 | Check us out.

00:05:26.760 | Thanks.

00:05:27.080 | Thanks.

00:05:27.160 | Bye.

00:05:27.960 | Bye.

00:05:28.760 | Bye.

00:05:29.560 | Bye.

00:05:30.060 | Bye.

00:05:30.860 | Bye.

00:05:31.860 | Bye.

00:05:32.860 | Bye.

00:05:33.860 | Bye.

00:05:34.360 | Bye.

00:05:34.860 | Bye.

00:05:35.860 | Bye.

00:05:36.860 | Bye.

00:05:37.860 | Bye.

00:05:38.860 | Bye.

00:05:39.860 | Bye.

00:05:41.860 | you

00:05:42.360 | We'll be right back.