back to indexStreamlit for ML #2 - ML Models and APIs
Chapters
0:0 Intro
0:47 Creating the Vector DB
8:56 Implementing Retrieval
00:00:00.000 |
Okay in the last video we had a look at how to build what you can see on the screen right now a very 00:00:10.680 |
What we want to do in this video is go through how we actually 00:00:21.400 |
Open domain Q&A system that we're going to put together here. So I 00:00:27.800 |
Said before there are a few components to open the main Q&A. We're going to stick the first two for now 00:00:33.120 |
so the vector database, which we're going to use pinecone for and 00:00:36.960 |
The retriever model which we are going to download 00:00:40.640 |
From hugging face model hub and we're going to use the sentence transformers library to actually implement that 00:00:46.920 |
now the first thing we are going to want to do is 00:00:51.000 |
Create our vector database or our index now to do that. There are 00:00:59.120 |
Parts or three steps we need to take first. We need to download our data 00:01:03.160 |
We're going to be using the squad data set from hugging face data sets 00:01:10.840 |
encode those paragraphs or what we call context into context vectors and 00:01:16.420 |
We use sentence transformers and a retriever model for that. And then the next part is 00:01:23.080 |
uploading or pushing all of those vectors into our 00:01:32.440 |
We're just going to very quickly go through that code because it is a lot and I don't want to focus on it too much 00:01:47.240 |
Script I'm going to maybe zoom out a little bit so you can see 00:01:51.960 |
So the first thing we do is import everything you don't need tqdm here 00:01:57.260 |
But you can pip install tqdm if you do want to use that 00:02:03.520 |
From data sets. So this is hugging face data sets. You will need to install this. So that is just a 00:02:15.280 |
We're going to first initialize our retriever model, so we're using the pinecone MP net retrievers or to 00:02:22.280 |
Retrieve model. So this is a retriever model that is based on the MP net model from Microsoft 00:02:30.040 |
and it's been trained on the spot to data set and 00:02:33.080 |
First we need to do is initialize our connection to 00:02:41.840 |
Store all of our vectors now to do that. You do need an API key 00:02:46.320 |
So I wouldn't I wouldn't write it in your code, but I'm going to just do that 00:02:51.900 |
For the sake of simplicity here. So I'm going to go to this app dot pinecone dio and this is free 00:03:05.840 |
And then you will have to sign up. So you create an account 00:03:09.040 |
I already have one so I don't need to worry about that and 00:03:12.800 |
I have this default API key over here. Like I could use that and 00:03:21.880 |
We can see the key if we want. I want to zoom in a little bit. I'll Chris it's a little bit bigger and 00:03:31.720 |
and we can see the value though, we just copy or we just press over here and copy that across and 00:03:40.800 |
Okay. Now this script by the way, I will leave a link to this in the description 00:03:47.560 |
So you can just download it instead of writing it all out 00:03:49.800 |
Because this isn't essential to our app. It's just how we build 00:04:06.400 |
We have that we have the cloud environment that we're using there 00:04:13.080 |
We want to check if the index already exists, so I'm going to create a secure index now actually you can see in mind 00:04:20.600 |
I already have it because I've run this code already 00:04:26.120 |
Already exists, so it's not going to create a new index and instead it's just going to connect that index here 00:04:33.520 |
right, so we've just connected with or we created our index our vector database index and 00:04:40.200 |
Now what I want to do is I'm going to switch back to our 00:04:44.960 |
Data, and I'm going to run through that. So I'm going to load the data search and the squad data set from 00:04:56.440 |
Now the I'm going to use a validation split because the model has been trained on the training data, but squad 00:05:03.400 |
I want to make it at least a little bit hard, so we're going to use a validation split that it hasn't seen before 00:05:16.760 |
Zoom out a little bit here and squad death. We're using this filter, so this is all hugging face data sets 00:05:27.040 |
And then we're encoding it so this model don't encode so this is our sentence transformer 00:05:34.080 |
Sentence vectors for our context and we're converting these to list because we are going to be pushing these through an API request 00:05:42.560 |
To pinecone game. We need a list not a numpy array of what you can get going to get an error 00:05:48.360 |
Okay, then back to the pinecone side of things. We want to create a list of 00:05:58.280 |
It's basically a list of tuples and those tuples include the ID of 00:06:03.160 |
Each context so there's a unique ID for each context. We want the vector or the encoding the context vector 00:06:11.560 |
And then we also have this dictionary here now. This is metadata so metadata and pinecone is like any other 00:06:18.800 |
Information about your vectors that you want to include and this is really good if you want to use metadata filtering which is super 00:06:28.520 |
Sand I definitely want to you know leave the option open later on. I'm not sure if we'll use it or not 00:06:35.060 |
We'll probably put something in there. Just so we can play around with it 00:06:41.560 |
That creates the format that we need to upset everything which means just like push or upload everything to pinecone 00:06:52.280 |
It just makes things a little bit easier on the on the API request rather than sending everything at once. Okay? 00:07:11.480 |
So let's switch back to our app here. Let's view it 00:07:17.560 |
So first, let's just remove this. We don't need that. Okay, it's a will automatically reload 00:07:39.400 |
Just copy it and then we'll remove what we don't need in a minute 00:07:44.000 |
So we don't need we do need sentence transformers 00:07:51.080 |
We do need pinecone. So actually here. We're initializing our our retriever model 00:07:58.240 |
It's the same as what we did before. So we do want to keep that in there 00:08:04.120 |
API key again just saw this somewhere else or if you are using 00:08:09.840 |
Streamlit cloud they have like a secrets management 00:08:13.760 |
System and it's something we'll look at in the future for sure. But for now, I'm just putting in here 00:08:19.440 |
So we have our API key environment and we're just doing the same thing we did before but actually we don't want to create an index 00:08:28.240 |
We're assuming we've already created an index if web in our app, so we're just going to connect to it 00:08:39.200 |
Like the back end part of our app. I've smart part that's going to handle the open the main Q&A 00:08:44.960 |
But it's going to be a little bit slow and we we will have a look at how to solve that 00:08:50.720 |
pretty soon, but for now, what we're going to do is actually just implement this and 00:08:56.480 |
We're going to actually query and see what we we return 00:09:04.360 |
we won't see anything change in our app now other than the fact that it takes longer to load because it's 00:09:15.880 |
The slowness here and then obviously connecting to pinecone also take takes a second as well 00:09:31.440 |
Now I actually want to do is I want to say okay if the query is not empty because by default it is empty 00:09:38.680 |
That's why we add that in there. So I'm going to actually remove this 00:09:54.200 |
Pinecone for whatever is in that query. So the first thing we need to do is create our context vector 00:10:06.880 |
Especially if you use vice before they tend to use this and I say I said context vector. I mean query vector 00:10:18.520 |
We need to put this in square brackets and we have query. Okay, and then we're going to convert that to a list 00:10:48.760 |
Context and we're going to solve these in XC. So like query like context vector 00:11:09.760 |
And we're going to pass XQ. So our query vector and we're gonna say how many results we want to return now 00:11:20.360 |
Streamlet a little like a slider bar to decide how many we would like to return but for now we will hard code it 00:11:28.920 |
another thing that we want to include here is we want to tell pinecone to 00:11:34.200 |
Return the metadata because by default it will not return metadata 00:11:40.480 |
Return metadata equals true. So these are like the extra little bits. I mentioned before so included our 00:11:49.600 |
Like the topic Wikipedia topic that the context is coming from and also the text itself 00:11:56.960 |
okay, so we're going to return the relevant context and 00:12:00.320 |
Then we're gonna loop through each of those now 00:12:04.080 |
When we do this, there's a particular format that we need to follow 00:12:13.960 |
our context are actually going to be sword so for context in XC results and 00:12:24.040 |
Results is going to return a list and we just want the the first item in that list 00:12:29.360 |
The reason it returns a list is because if you are a querying pinecone with multiple queries 00:12:34.960 |
It will turn a list of you know, your answers for each query 00:12:40.080 |
But in this case, we are only ever going to query with one 00:12:44.160 |
query vector so we always enter a position zero here and then in there we will have 00:13:02.640 |
For context in there. All we're going to do is write st dot right 00:13:12.600 |
then we want to go into the metadata that we were returning and 00:13:15.840 |
We have title and text here. We don't want title we want text 00:13:21.240 |
Okay, so let's say that and check that actually works 00:13:25.200 |
so again, this is going to take a lot while to load because we're 00:13:28.880 |
Initializing like the full pipeline of our vector database and the true model 00:13:34.800 |
So every time we run this it's downloading the full tree model, which takes quite a bit of time 00:13:39.600 |
Okay, so this is just rerun our app and now I can say who are the Normans? 00:13:50.080 |
Again trying to reload everything so it's gonna take a while. We're going to fix this in the next video 00:13:57.080 |
We should be returning five context and if we scroll down we can see we have these five paragraphs 00:14:04.960 |
Context so we could maybe we can inspect the element 00:14:21.040 |
Pretty horrific to look at so if I zoom up we can see each one of these is 00:14:45.080 |
Video, so we're now we have these back end working and the next one 00:14:50.960 |
What we'll do is fix this issue with it taking forever to load reload everything every time 00:15:05.000 |
Thank you very much for watching and I will see you in the next one. Bye