BotDojo Launch: Enhancing AI Assistants with Evaluations and Synthetic Data

00:00:13.680 |
PAUL HENRY: So, hello, my name is Paul Henry. 00:00:17.200 |
And as a previous CTO, I was working with teams 00:00:20.520 |
deploying LLM applications for hundreds of thousands 00:00:26.800 |
super easy to hook up a vector database with an LLM 00:00:29.960 |
over the weekend, but really hard to get it production ready. 00:00:43.840 |
All right, so today I'm going to show you a demo of our product. 00:00:46.920 |
We're going to take synthetic data that we're going to generate, 00:00:50.440 |
and we're going to combine it with evaluations 00:00:52.320 |
to see how we can improve the performance of a chatbot. 00:01:00.680 |
So I'm going to open up our template of our chatbot. 00:01:04.840 |
And we have customers live that are using this template. 00:01:30.900 |
We're going to the vector database to retrieve the information. 00:01:34.480 |
And then we're answering it with an AI model. 00:01:36.900 |
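The retrieve-then-answer flow described here can be sketched roughly as follows. This is a minimal illustration, not BotDojo's implementation: the `Doc` type, `embed`, `retrieve`, and `buildPrompt` are hypothetical names, the word-count "embedding" stands in for a real embedding model, and the in-memory array stands in for a vector database. The final LLM call is left out; the sketch stops at the prompt that would be sent.

```typescript
// Hypothetical sketch of a retrieve-then-answer flow; not BotDojo's API.
type Doc = { id: string; text: string };

// Toy embedding: word counts stand in for a real embedding model.
function embed(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

function norm(v: Map<string, number>): number {
  let sumSquares = 0;
  v.forEach((count) => { sumSquares += count * count; });
  return Math.sqrt(sumSquares);
}

// Cosine similarity between two sparse word-count vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  a.forEach((count, word) => { dot += count * (b.get(word) ?? 0); });
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Stand-in for the vector database lookup: return the k most similar docs.
function retrieve(question: string, index: Doc[], k: number): Doc[] {
  const q = embed(question);
  return index
    .map((doc) => ({ doc, score: cosine(q, embed(doc.text)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((x) => x.doc);
}

// Assemble the prompt that would be sent to the LLM.
function buildPrompt(question: string, docs: Doc[]): string {
  const context = docs.map((d) => `[${d.id}] ${d.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const index: Doc[] = [
  { id: "kb-1", text: "To reset your password, open Settings and choose Reset Password." },
  { id: "kb-2", text: "Invoices are emailed on the first day of each month." },
];
const question = "How do I reset my password?";
const topDocs = retrieve(question, index, 1);
const ragPrompt = buildPrompt(question, topDocs);
console.log(ragPrompt);
```

Keeping retrieval and prompt assembly as separate functions makes it possible to inspect what the retriever returned independently of what the model answered.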
So if I pull this up, you can kind of see in our low-code editor, 00:01:40.780 |
this is the prompt that we're sending to the LLM. 00:01:59.060 |
One key thing is you can pull a trace of each node 00:02:06.060 |
what came from the retriever, the exact data, 00:02:08.820 |
which has been super useful for debugging apps. 00:02:17.780 |
That was supposed to be a joke, but all right. 00:02:28.680 |
So we have a feature in BotDojo called batches, 00:02:31.900 |
which allows you to run a whole bunch of questions 00:02:36.680 |
and run evaluations to kind of see how things are doing. 00:02:40.080 |
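As a rough illustration of the batch idea, the sketch below runs a list of questions through a chatbot function and scores each answer with a set of evaluators. Everything here is assumed for the example: `TestCase`, `Evaluator`, `runBatch`, `passRates`, and the canned `chatbot` stub are hypothetical and do not reflect BotDojo's actual API, and the string-matching evaluators are simple stand-ins for whatever checks the product really runs.

```typescript
// Hypothetical sketch of a batch run with evaluations; names are illustrative only.
type TestCase = { question: string; expected: string };
type Evaluator = { name: string; check: (answer: string, testCase: TestCase) => boolean };
type BatchResult = { question: string; answer: string; passed: Record<string, boolean> };

// Stand-in for the deployed chatbot flow.
function chatbot(question: string): string {
  const canned: Record<string, string> = {
    "How do I reset my password?": "Open Settings and choose Reset Password.",
  };
  return canned[question] ?? "I don't have enough information to answer that.";
}

const evaluators: Evaluator[] = [
  // Did the bot give a substantive answer rather than declining?
  { name: "answered", check: (answer) => !answer.includes("don't have enough information") },
  // Does the answer mention the expected key phrase (case-insensitive)?
  {
    name: "mentions_expected",
    check: (answer, tc) => answer.toLowerCase().includes(tc.expected.toLowerCase()),
  },
];

// Run every test case through the chatbot and apply every evaluator.
function runBatch(cases: TestCase[]): BatchResult[] {
  return cases.map((tc) => {
    const answer = chatbot(tc.question);
    const passed: Record<string, boolean> = {};
    for (const ev of evaluators) passed[ev.name] = ev.check(answer, tc);
    return { question: tc.question, answer, passed };
  });
}

// Fraction of test cases passing each evaluator.
function passRates(results: BatchResult[]): Record<string, number> {
  const rates: Record<string, number> = {};
  for (const ev of evaluators) {
    const passedCount = results.filter((r) => r.passed[ev.name]).length;
    rates[ev.name] = results.length === 0 ? 0 : passedCount / results.length;
  }
  return rates;
}

const results = runBatch([
  { question: "How do I reset my password?", expected: "Reset Password" },
  { question: "Can I export my data to CSV?", expected: "CSV" },
]);
const rates = passRates(results);
console.log(rates);
```

Reporting a pass rate per evaluator, rather than one overall score, makes it easier to tell a retrieval gap (the bot declines) from an answer-quality problem.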
So if you can see this, we have a few five evaluations 00:02:46.160 |
That's because we don't have enough information 00:02:49.820 |
It also checks for things like hallucinations. 00:03:02.960 |
I'm going to increase the throughput a little bit 00:03:06.320 |
And I don't have enough time to generate all the data 00:03:11.000 |
So the previous run was filtering out the generated data. 00:03:15.180 |
And so I'm going to remove the filter that we're passing 00:03:17.840 |
into the flow so it takes in the generated data. 00:03:21.620 |
You can also change the model and all that kind of stuff 00:03:32.600 |
And so this is the actual flow that we used to generate that synthetic data. 00:03:43.520 |
And so this particular flow takes in multiple inputs. 00:03:46.460 |
And so I'm going to paste in some JSON from a previous run. 00:03:52.520 |
And what this does, and it's kind of a trick that's been working well for customers, 00:03:56.240 |
is extract questions and answers from support tickets. 00:04:00.500 |
So these are live agents talking with customers. 00:04:02.720 |
And you use this as test data to send through your chatbot. 00:04:07.840 |
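One way to picture turning support tickets into test data is the sketch below. It is only an assumption about how such a step might look: the `Ticket` and `TicketMessage` shapes are invented for the example, and the extraction rule (first customer message as the question, last agent message as the reference answer) is a naive heuristic swapped in for illustration. The talk does not say how the extraction is actually done, and a real flow would likely use a model for this step.

```typescript
// Hypothetical ticket shape; real ticket exports will differ.
type TicketMessage = { role: "customer" | "agent"; text: string };
type Ticket = { id: string; messages: TicketMessage[] };
type TicketTestCase = { ticketId: string; question: string; referenceAnswer: string };

// Naive heuristic (an assumption, not the method from the talk):
// the first customer message is the question, the last agent message is the answer.
function ticketToTestCase(ticket: Ticket): TicketTestCase | null {
  const firstCustomer = ticket.messages.find((m) => m.role === "customer");
  const agentMessages = ticket.messages.filter((m) => m.role === "agent");
  const lastAgent = agentMessages[agentMessages.length - 1];
  // Skip tickets that never got an agent reply or have no customer message.
  if (!firstCustomer || !lastAgent) return null;
  return {
    ticketId: ticket.id,
    question: firstCustomer.text.trim(),
    referenceAnswer: lastAgent.text.trim(),
  };
}

// Build a test set, dropping tickets that cannot be converted.
function buildTestSet(tickets: Ticket[]): TicketTestCase[] {
  const cases: TicketTestCase[] = [];
  for (const ticket of tickets) {
    const tc = ticketToTestCase(ticket);
    if (tc !== null) cases.push(tc);
  }
  return cases;
}

const tickets: Ticket[] = [
  {
    id: "t1",
    messages: [
      { role: "customer", text: " How do I change my billing email? " },
      { role: "agent", text: "Which account is this for?" },
      { role: "customer", text: "The main one." },
      { role: "agent", text: "Go to Billing, then Contacts, and edit the email there." },
    ],
  },
  { id: "t2", messages: [{ role: "customer", text: "Hello?" }] },
];
const testSet = buildTestSet(tickets);
console.log(testSet);
```

The resulting question and reference-answer pairs can then be sent through the chatbot as a batch and compared against what the live agent said.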
And we take relevant information from the existing index 00:04:16.600 |
And then we do an inline evaluation where we check to see 00:04:22.140 |
if the document has enough information to answer the question. 00:04:24.500 |
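The inline check described here, whether a document has enough information to answer a question, could look something like the following. This is a sketch under stated assumptions: the `Judge` type stands in for an LLM call, the prompt wording and the YES/NO protocol are invented for the example, and `fakeJudge` is a keyword stub so the code runs without a model. A real judge call would normally be asynchronous.

```typescript
// Hypothetical sketch of an inline "is this document sufficient?" evaluation.
type Judge = (prompt: string) => string; // stand-in for an LLM call

// Build the grading prompt; the wording is illustrative only.
function buildSufficiencyPrompt(question: string, doc: string): string {
  return [
    "Does the document contain enough information to answer the question?",
    "Reply with exactly YES or NO.",
    `Question: ${question}`,
    `Document: ${doc}`,
  ].join("\n");
}

// Parse the verdict strictly: anything that does not start with YES counts as insufficient.
function hasEnoughInformation(question: string, doc: string, judge: Judge): boolean {
  const verdict = judge(buildSufficiencyPrompt(question, doc)).trim().toUpperCase();
  return verdict.startsWith("YES");
}

// Fake judge for demonstration only: says YES when the document line mentions "refund".
const fakeJudge: Judge = (prompt) => (/Document: .*refund/i.test(prompt) ? "YES" : "NO");

console.log(
  hasEnoughInformation(
    "How long do refunds take?",
    "Refunds are issued within 5 business days.",
    fakeJudge,
  ),
);
```

Treating any unclear verdict as insufficient is a deliberately conservative choice: a question without a confirmed supporting document is flagged rather than silently passed.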
And then we also have a code node here where a lot of times 00:04:30.360 |
there's situations where you have 40,000 different boxes. 00:04:34.180 |
And so when you have to write code, we support TypeScript and soon Python. 00:04:39.660 |
But you can see that, hey, we're getting the information 00:04:47.540 |
Okay, let me go back to the support chatbot. 00:04:52.580 |
So I'm going to compare the batch that we ran before 00:05:11.680 |
So if you're an AI engineer, help us fix this.