
BotDojo Launch: Enhancing AI Assistants with Evaluations and Synthetic Data



00:00:13.680 | PAUL HENRY: So, hello, my name is Paul Henry.
00:00:15.600 | I'm the founder of BotDojo.
00:00:17.200 | And as a previous CTO, I was working with teams
00:00:20.520 | deploying LLM applications for hundreds of thousands
00:00:24.040 | of customers.
00:00:25.480 | And like many of you guys know, it's
00:00:26.800 | super easy to hook up a vector database with an LLM
00:00:29.960 | over the weekend, but really hard to get it production ready.
00:00:33.680 | And so that's what we do.
00:00:34.880 | We are an AI-enablement company, and we
00:00:37.800 | let companies deploy AI to prod.
00:00:41.440 | Live demo time.
00:00:43.840 | All right, so today I'm going to show you a demo of our product.
00:00:46.920 | We're going to take synthetic data that we're going to generate,
00:00:50.440 | and we're going to combine it with evaluations
00:00:52.320 | to see how we can improve the performance of a chatbot.
00:00:56.320 | Or at least that's what I hope happens.
00:00:59.000 | All right.
00:01:00.680 | So I'm going to open up our template of our chatbot.
00:01:04.840 | And we have customers live that are using this template.
00:01:07.180 | It's kind of battle tested.
00:01:09.340 | And so let's test it out.
00:01:10.340 | How do I create a vector index in BotDojo?
00:01:24.220 | And as you can see, all the little nodes
00:01:26.240 | are lighting up as they execute.
00:01:28.160 | We're taking the question.
00:01:29.280 | We're looking at the chat history.
00:01:30.900 | We're going to the vector database to retrieve the information.
00:01:34.480 | And then we're answering it with an AI model.
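The four steps just described (take the question, look at the chat history, retrieve from the vector database, answer with a model) can be sketched roughly as follows. This is an illustrative sketch only, not BotDojo's implementation: the bag-of-words `embed` function stands in for a real embedding model, `retrieve` and `build_prompt` are hypothetical names, the sample documents are made up, and the final model call is left out, so the sketch stops at the prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, history, passages):
    """Assemble the text that would be sent to the LLM."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    past = "\n".join(history)
    return (
        "Answer using only the numbered context and cite it.\n"
        f"Context:\n{context}\n"
        f"Chat history:\n{past}\n"
        f"Question: {question}"
    )


# Made-up sample documents, for illustration only.
docs = [
    "To create a vector index, open the Indexes page and click New.",
    "Billing is handled monthly.",
]
question = "How do I create a vector index?"
prompt = build_prompt(question, [], retrieve(question, docs, k=1))
print(prompt)
```

Numbering the retrieved passages in the prompt is one simple way to let the model cite its sources, which is what makes the citations in the demo answer possible.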
00:01:36.900 | So if I pull this up, you can kind of see in our low-code editor,
00:01:40.780 | this is the prompt that we're sending to the LLM.
00:01:43.120 | We're getting the results out here.
00:01:45.340 | And we also support JSON schema.
00:01:48.580 | So if the model supports JSON output,
00:01:51.760 | like Grok, Claude, and all that stuff,
00:01:55.420 | then we just conform to that.
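To make "conform to a JSON schema" concrete, here is a minimal sketch of checking a model's raw output against a schema. The schema, the field names `answer` and `citations`, and the `conforms` helper are all assumptions made for illustration; the talk does not show the real schema, and a production system would more likely rely on a full JSON Schema validator than on this hand-rolled check.

```python
import json

# Hypothetical schema for a chatbot answer; the real one is not shown in the talk.
ANSWER_SCHEMA = {
    "type": "object",
    "required": ["answer", "citations"],
    "properties": {
        "answer": {"type": "string"},
        "citations": {"type": "array"},
    },
}

# Map JSON Schema type names to Python types (only the ones used here).
PY_TYPES = {"string": str, "array": list, "object": dict}


def conforms(raw_output, schema):
    """Check required keys and top-level property types of a JSON string."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, PY_TYPES[schema["type"]]):
        return False
    for key in schema["required"]:
        if key not in data:
            return False
    for key, rule in schema["properties"].items():
        if key in data and not isinstance(data[key], PY_TYPES[rule["type"]]):
            return False
    return True


good = '{"answer": "Open the index page.", "citations": ["doc-1"]}'
bad = '{"answer": 42}'
print(conforms(good, ANSWER_SCHEMA), conforms(bad, ANSWER_SCHEMA))  # True False
```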
00:01:59.060 | One key thing is you can pull a trace of each node
00:02:03.420 | and see exactly what we sent to the LLM,
00:02:06.060 | what came from the retriever, the exact data,
00:02:08.820 | which has been super useful for debugging apps.
00:02:12.000 | All right, and cool, we have an image.
00:02:14.100 | It's got citations.
00:02:15.620 | We should ship it.
00:02:17.780 | That was supposed to be a joke, but all right.
00:02:21.520 | So this is where evaluations come in.
00:02:23.420 | So I'm going to demonstrate the evaluations
00:02:26.900 | that I previously ran.
00:02:28.680 | So we have a feature in BotDojo called batches,
00:02:31.900 | which allow you to run a whole bunch of questions
00:02:33.860 | through your chatbot or your AI flow
00:02:36.680 | and run evaluations to kind of see how things are doing.
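A batch of this kind can be pictured as a loop that sends each question through the flow and scores the answer with each evaluator. The sketch below assumes that shape; `run_batch`, `pass_rates`, the fake flow, and the two evaluators are hypothetical stand-ins, and real evaluators (such as a hallucination check) would typically call a model rather than inspect strings.

```python
def run_batch(flow, questions, evaluators):
    """Run every question through the flow and score each answer."""
    rows = []
    for question in questions:
        answer = flow(question)
        scores = {name: check(question, answer) for name, check in evaluators.items()}
        rows.append({"question": question, "answer": answer, "scores": scores})
    return rows


def pass_rates(rows):
    """Fraction of rows that passed each evaluator."""
    names = rows[0]["scores"].keys() if rows else []
    return {n: sum(r["scores"][n] for r in rows) / len(rows) for n in names}


# Stand-in flow that only knows one topic.
def fake_flow(question):
    if "vector index" in question:
        return "Create it from the index page [1]."
    return "I don't know."


# Stand-in evaluators; each returns True (pass) or False (fail).
evaluators = {
    "answered": lambda q, a: a != "I don't know.",
    "has_citation": lambda q, a: "[" in a,
}

questions = ["How do I create a vector index?", "How do I reset my password?"]
rows = run_batch(fake_flow, questions, evaluators)
rates = pass_rates(rows)
print(rates)
```

Keeping the per-question rows as well as the aggregate rates is what allows two batches to be compared side by side later.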
00:02:40.080 | So if you can see this, we have five evaluations
00:02:43.580 | that we ran.
00:02:44.340 | There's a little bit of red.
00:02:46.160 | That's because we don't have enough information
00:02:48.280 | from our vector database.
00:02:49.820 | It also checks for things like hallucinations.
00:02:52.400 | So let's try to fix that.
00:02:54.840 | So I'm going to clone this batch.
00:02:57.820 | I'm going to rename it with generated data.
00:03:02.960 | I'm going to increase the throughput a little bit
00:03:04.820 | because of time.
00:03:06.320 | And I don't have enough time to generate all the data
00:03:10.200 | for this demo.
00:03:11.000 | So the previous run was filtering out the generated data.
00:03:15.180 | And so I'm going to remove the filter that we're passing
00:03:17.840 | into the flow so it takes in the generated data.
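The filter being removed here can be thought of as a metadata restriction on which index records the retriever may use. A minimal sketch, assuming each record carries a `source` field with values such as `docs` and `generated` (both the field name and the values are hypothetical):

```python
def search(index, source_filter=None):
    """Return records whose 'source' is allowed; None means no filtering."""
    if source_filter is None:
        return list(index)
    return [record for record in index if record["source"] in source_filter]


# Toy index: one original document and one synthetic one.
index = [
    {"id": "doc-1", "source": "docs"},
    {"id": "doc-2", "source": "generated"},
]

baseline = search(index, source_filter={"docs"})  # earlier batch: generated data excluded
with_generated = search(index)                    # cloned batch: filter removed
print(len(baseline), len(with_generated))  # 1 2
```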
00:03:21.620 | You can also change the model and all that kind of stuff
00:03:24.280 | to see how it performs.
00:03:26.600 | All right, so while that guy is running,
00:03:29.740 | I'm going to open up another flow.
00:03:32.600 | And so this is the actual flow that we used to generate that synthetic data.
00:03:37.540 | And so let me run this one real quick.
00:03:43.520 | And so this particular flow takes in multiple inputs.
00:03:46.460 | And so I'm going to paste in some JSON from a previous run.
00:03:52.520 | And what this is going to do is kind of a trick that's been working well for customers,
00:03:56.240 | where you extract questions and answers from support tickets.
00:04:00.500 | So these are live agents talking with customers.
00:04:02.720 | And you use this as test data to send through your chatbot.
00:04:07.840 | And we take relevant information from the existing index
00:04:11.240 | and we have it write a document.
00:04:12.660 | And so it uses the same writing style.
00:04:16.600 | And then we do an inline evaluation to where we check to see
00:04:22.140 | if the document has enough information to answer the question.
00:04:24.500 | And then we also have a code node here where a lot of times
00:04:28.460 | when you're using these low code editors,
00:04:30.360 | there's situations where you have 40,000 different boxes.
00:04:34.180 | And so when you have to write code, we support TypeScript and soon Python.
00:04:39.660 | But you can see that, hey, we're getting the information
00:04:42.360 | and we're writing into the vector index.
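The synthetic-data flow described above (extract a question and answer from a support ticket, pull related material from the existing index, write a document, run an inline evaluation, then write to the index) can be sketched as below. Every function here is a hypothetical stand-in: in the demo these steps are model calls inside BotDojo nodes, whereas this sketch uses templates and word overlap so it can run on its own. The ticket text is made up, and matching the existing writing style is not modeled.

```python
import re


def words(text):
    """Lowercased word set, used for the toy overlap checks below."""
    return set(re.findall(r"\w+", text.lower()))


def extract_qa(ticket):
    """Stand-in for an LLM step: first customer line is Q, first agent line is A."""
    question = next(text for who, text in ticket if who == "customer")
    answer = next(text for who, text in ticket if who == "agent")
    return question, answer


def write_document(question, answer, related):
    """Stand-in for the LLM writing step; here just a template."""
    return f"{question}\n{answer}\nSee also: {'; '.join(related)}"


def has_enough_info(document, answer, threshold=0.8):
    """Inline evaluation stand-in: share of answer words present in the document."""
    needed = words(answer)
    if not needed:
        return False
    return len(needed & words(document)) / len(needed) >= threshold


def generate(ticket, index):
    """Turn one ticket into a document and add it to the index if it passes."""
    question, answer = extract_qa(ticket)
    related = [doc for doc in index if words(doc) & words(question)]
    document = write_document(question, answer, related)
    if has_enough_info(document, answer):
        index.append(document)  # write into the (toy) vector index
    return question, document


# Made-up ticket and index contents, for illustration only.
ticket = [
    ("customer", "How do I rename a batch?"),
    ("agent", "Open the batch and edit its title."),
]
index = ["A batch runs many questions through a flow."]
question, document = generate(ticket, index)
print(len(index))  # 2
```

The extracted question doubles as test data for the chatbot, while the generated document fills the gap in the index that caused the failing evaluations.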
00:04:43.920 | All right.
00:04:45.860 | Running out of time.
00:04:47.540 | Okay, let me go back to the support chatbot.
00:04:50.740 | Moment of truth.
00:04:52.580 | So I'm going to compare the batch that we ran before
00:04:56.420 | with the new stuff in 20 seconds.
00:05:00.060 | Oh, shh.
00:05:02.200 | You do it 15 times and it doesn't work.
00:05:07.620 | 10, 9.
00:05:10.800 | We're also hiring.
00:05:11.680 | So if you're an AI engineer, help us fix this.
00:05:15.840 | All right, there it comes.
00:05:16.640 | Okay.
00:05:17.640 | All right.
00:05:18.460 | One second left.
00:05:19.320 | It's all green.
00:05:20.160 | So it improved the, you know,
00:05:21.720 | it measurably improved something.
00:05:22.940 | So thank you.
00:05:24.260 | Botdojo.com.
00:05:26.240 | Check us out.
00:05:26.760 | Thanks.