
Streamlit for ML #2 - ML Models and APIs


Chapters

0:00 Intro
0:47 Creating the Vector DB
8:56 Implementing Retrieval

Whisper Transcript

00:00:00.000 | Okay, in the last video we had a look at how to build what you can see on the screen right now, a very
00:00:06.720 | simple sort of interface using Streamlit. Now,
00:00:10.680 | what we want to do in this video is go through how we actually
00:00:15.520 | build the smart part
00:00:18.640 | behind the
00:00:21.400 | open-domain Q&A system that we're going to put together here. So as I
00:00:27.800 | said before, there are a few components to open-domain Q&A. We're going to stick to the first two for now,
00:00:33.120 | so the vector database, which we're going to use Pinecone for, and
00:00:36.960 | the retriever model, which we are going to download
00:00:40.640 | from the Hugging Face model hub, and we're going to use the sentence-transformers library to actually implement that.
00:00:46.920 | Now, the first thing we are going to want to do is
00:00:51.000 | create our vector database, or our index. Now, to do that, there are
00:00:57.640 | three
00:00:59.120 | parts or three steps we need to take. First, we need to download our data.
00:01:03.160 | We're going to be using the SQuAD dataset from Hugging Face datasets.
00:01:07.720 | Then we want to encode those
00:01:10.840 | paragraphs, or what we call contexts, into context vectors, and
00:01:16.420 | we use sentence-transformers and a retriever model for that. And then the next part is
00:01:23.080 | uploading or pushing all of those vectors into our
00:01:27.720 | Pinecone vector database.
00:01:29.720 | so to do all of that
00:01:32.440 | We're just going to very quickly go through that code because it is a lot and I don't want to focus on it too much
00:01:39.680 | So here we have the
00:01:47.240 | script. I'm going to maybe zoom out a little bit so you can see.
00:01:51.960 | So the first thing we do is import everything. You don't need tqdm here,
00:01:57.260 | but you can pip install tqdm if you do want to use that.
00:01:59.960 | So we are
00:02:03.520 | importing from datasets, so this is Hugging Face datasets. You will need to install this, so that is just a
00:02:09.840 | pip install
00:02:12.520 | datasets.
00:02:15.280 | We're going to first initialize our retriever model, so we're using the pinecone MPNet
00:02:22.280 | retriever model. So this is a retriever model that is based on the MPNet model from Microsoft,
00:02:30.040 | and it's been trained on the SQuAD 2 dataset. And
00:02:33.080 | the first thing we need to do is initialize our connection to
00:02:38.500 | Pinecone. So this is where we're going to
00:02:41.840 | store all of our vectors. Now, to do that, you do need an API key.
00:02:46.320 | So I wouldn't write it in your code, but I'm going to just do that
00:02:51.900 | for the sake of simplicity here. So I'm going to go to app.pinecone.io, and this is free,
00:02:58.400 | By the way, you don't have to pay anything
00:03:00.400 | so we just go to
00:03:02.760 | app.pinecone.io,
00:03:05.840 | And then you will have to sign up. So you create an account
00:03:09.040 | I already have one so I don't need to worry about that and
00:03:12.800 | I have this default API key over here. Like I could use that and
00:03:19.000 | Yeah, I'm just going to use that so
00:03:21.880 | We can see the key if we want. Let me zoom in a little bit, make it a little bit bigger, and
00:03:28.360 | so we can see that,
00:03:31.720 | and we can see the value there. We just press over here and copy that across, and
00:03:38.640 | Then I'm just going to paste in here
00:03:40.800 | Okay. Now this script by the way, I will leave a link to this in the description
00:03:47.560 | So you can just download it instead of writing it all out
00:03:49.800 | because this isn't essential to our app. It's just how we
00:03:54.280 | encode all of our contexts and
00:03:57.720 | actually
00:04:00.520 | store them in our vector database.
00:04:06.400 | So we have that, we have the cloud environment that we're using there.
00:04:09.840 | Let me switch this back to the app.
00:04:13.080 | We want to check if the index already exists; if not, we're going to create a new index. Now, actually, you can see in mine
00:04:20.600 | I already have it, because I've run this code already,
00:04:23.480 | so the QA index
00:04:26.120 | already exists. So it's not going to create a new index, and instead it's just going to connect to that index here.
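Putting that setup together, a minimal sketch of this part of the indexing script might look like the following. It assumes the legacy pinecone-client interface shown in the video (newer Pinecone SDKs use a different client object), a guessed model ID for the MPNet retriever, a placeholder API key, and an example environment name, so treat those as stand-ins rather than exact values.

```python
# Indexing script, part 1: retriever model + Pinecone index (a sketch, not the exact video code)
import pinecone
from sentence_transformers import SentenceTransformer

# retriever based on Microsoft's MPNet, fine-tuned on SQuAD 2 (model ID assumed)
model = SentenceTransformer("pinecone/mpnet-retriever-squad2")

# in practice, load the key from an env var or a secrets store rather than hard-coding it
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")  # environment is an example value

# create the index only if it doesn't exist yet, then connect to it
if "qa-index" not in pinecone.list_indexes():
    pinecone.create_index(
        "qa-index",
        dimension=model.get_sentence_embedding_dimension(),  # 768 for MPNet
        metric="cosine",
    )
index = pinecone.Index("qa-index")
```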
00:04:33.520 | Right, so we've just connected to, or created, our index, our vector database index, and
00:04:40.200 | now what I want to do is I'm going to switch back to our
00:04:44.960 | data, and I'm going to run through that. So I'm going to load the dataset, so the SQuAD dataset, from
00:04:53.440 | Hugging Face.
00:04:56.440 | Now, I'm going to use the validation split, because the model has been trained on the training data of SQuAD, and
00:05:03.400 | I want to make it at least a little bit hard, so we're going to use the validation split that it hasn't seen before.
00:05:09.160 | I'm removing any
00:05:12.080 | duplicate
00:05:14.760 | contexts in there, so
00:05:16.760 | I'll zoom out a little bit here, and on squad_dev we're using this filter. So this is all Hugging Face datasets
00:05:24.160 | syntax here.
00:05:27.040 | And then we're encoding it, so model.encode, so this is our sentence transformer.
00:05:32.080 | We're encoding it to create a load of
00:05:34.080 | sentence vectors for our contexts, and we're converting these to a list, because we are going to be pushing these through an API request
00:05:42.560 | to Pinecone, and we need a list, not a NumPy array, or we are going to get an error.
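The loading, deduplication, and encoding step could be sketched roughly as below, continuing from the snippet above (it reuses `model`). The filter-based deduplication follows what is described in the video, but the exact helper function is an illustration.

```python
# Indexing script, part 2: load SQuAD, keep unique contexts, encode them
from datasets import load_dataset

# validation split, since the retriever was trained on the training split
squad_dev = load_dataset("squad", split="validation")

# keep only the first row for each unique context paragraph
seen = set()
def first_occurrence(row):
    if row["context"] in seen:
        return False
    seen.add(row["context"])
    return True

squad_dev = squad_dev.filter(first_occurrence)

# encode the context paragraphs and convert to plain Python lists,
# because the Pinecone API expects lists rather than NumPy arrays
embeddings = model.encode(squad_dev["context"], show_progress_bar=True).tolist()
```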
00:05:48.360 | Okay, then back to the pinecone side of things. We want to create a list of
00:05:58.280 | It's basically a list of tuples and those tuples include the ID of
00:06:03.160 | Each context so there's a unique ID for each context. We want the vector or the encoding the context vector
00:06:11.560 | And then we also have this dictionary here now. This is metadata, and metadata in Pinecone is like any other
00:06:18.800 | information about your vectors that you want to include, and this is really good if you want to use metadata filtering, which is super
00:06:26.520 | powerful in Pinecone.
00:06:28.520 | And I definitely want to, you know, leave the option open later on. I'm not sure if we'll use it or not.
00:06:35.060 | We'll probably put something in there, just so we can play around with it.
00:06:41.560 | That creates the format that we need to upsert everything, which just means push or upload everything to Pinecone.
00:06:47.840 | So then I do that in chunks of 50 at a time.
00:06:52.280 | It just makes things a little bit easier on the API requests, rather than sending everything at once. Okay?
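And the upsert step might look roughly like this, continuing from the two sketches above (it reuses `index`, `squad_dev`, and `embeddings`). The metadata fields mirror the title and text mentioned in the video, and the batch size of 50 matches the chunking described; the tuple format follows the legacy pinecone-client interface.

```python
# Indexing script, part 3: upsert (id, vector, metadata) tuples in batches of 50

# metadata enables filtering later and lets us return the title and text with each match
upserts = [
    (row["id"], emb, {"title": row["title"], "text": row["context"]})
    for row, emb in zip(squad_dev, embeddings)
]

# upsert in chunks of 50 to keep each API request small
batch_size = 50
for i in range(0, len(upserts), batch_size):
    index.upsert(vectors=upserts[i : i + batch_size])
```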
00:06:59.480 | So that's like how we create the index
00:07:02.880 | so now what we're going to do is actually
00:07:05.800 | integrate that a little bit in our app.
00:07:11.480 | So let's switch back to our app here. Let's view it
00:07:17.560 | So first, let's just remove this. We don't need that. Okay, it will automatically reload.
00:07:26.720 | The first thing we want to do here is initialize
00:07:29.720 | the pinecone connection, so I'm going to
00:07:33.560 | just take
00:07:36.680 | Let's just take this part of the code
00:07:39.400 | Just copy it and then we'll remove what we don't need in a minute
00:07:44.000 | So, we don't need... we do need sentence-transformers
00:07:47.800 | in a minute; we don't need datasets;
00:07:51.080 | we do need Pinecone. So actually here, we're initializing our retriever model.
00:07:58.240 | It's the same as what we did before. So we do want to keep that in there,
00:08:01.480 | and again, the
00:08:04.120 | API key, again, ideally store this somewhere else, or if you are using
00:08:09.840 | Streamlit Cloud, they have like a secrets management
00:08:13.760 | system, and it's something we'll look at in the future for sure. But for now, I'm just putting it in here.
00:08:19.440 | So we have our API key and environment, and we're just doing the same thing we did before, but actually we don't want to create an index.
00:08:28.240 | We're assuming we've already created the index when we're in our app, so we're just going to connect to it.
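So the app-side initialization might look roughly like this. It is a sketch with the same assumptions as before (legacy pinecone-client interface, guessed model ID, hard-coded key only for brevity), and the text input label is just a placeholder for whatever was set up in the previous video.

```python
# app.py: initialize retriever and connect to the existing Pinecone index (sketch)
import pinecone
import streamlit as st
from sentence_transformers import SentenceTransformer

# same retriever as in the indexing script (model ID assumed)
model = SentenceTransformer("pinecone/mpnet-retriever-squad2")

pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")  # example values
# the index was already created by the indexing script, so here we only connect to it
index = pinecone.Index("qa-index")

# the query box from the previous video; the label is a placeholder
query = st.text_input("Ask a question", "")
```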
00:08:32.600 | okay, so
00:08:35.120 | with that we've
00:08:37.120 | kind of set up the
00:08:39.200 | like the back-end part of our app, the smart part that's going to handle the open-domain Q&A.
00:08:44.960 | But it's going to be a little bit slow, and we will have a look at how to solve that
00:08:50.720 | pretty soon, but for now, what we're going to do is actually just implement this, and
00:08:56.480 | we're going to actually query and see what we return.
00:09:01.600 | So I'm gonna save this
00:09:04.360 | we won't see anything change in our app now other than the fact that it takes longer to load because it's
00:09:09.400 | downloading the
00:09:13.480 | retriever model. That's the main part of the
00:09:15.880 | slowness here, and then obviously connecting to Pinecone also takes a second as well.
00:09:24.080 | We're not going to deal with how slow it is now,
00:09:26.120 | but we will fix that pretty soon, and
00:09:31.440 | what I actually want to do now is say, okay, if the query is not empty, because by default it is empty.
00:09:38.680 | That's why we add that in there. So I'm going to actually remove this,
00:09:41.840 | and then, if it is not empty, so if query
00:09:46.600 | is not equal to nothing,
00:09:49.520 | we're going to
00:09:52.400 | query
00:09:54.200 | Pinecone for whatever is in that query. So the first thing we need to do is create our context vector
00:09:59.480 | So I'm gonna write XQ
00:10:01.480 | Just shorthand for context vector
00:10:03.640 | It's pretty standard
00:10:06.880 | especially if you've used Faiss before, they tend to use this. And I said context vector; I mean query vector.
00:10:13.440 | So we're going to do model
00:10:16.800 | encode and
00:10:18.520 | We need to put this in square brackets and we have query. Okay, and then we're going to convert that to a list
00:10:24.280 | Okay, so this is going to create our
00:10:28.280 | Query vector. Let's write it down
00:10:30.320 | create query vector
00:10:33.320 | And then the next thing we want to do is
00:10:37.040 | Query pinecone with this query vector
00:10:42.320 | To do that
00:10:44.160 | We want to write
00:10:46.160 | First, let's get the relevant
00:10:48.760 | contexts, and we're going to store these in xc. So xc, like context vector, is a
00:10:58.280 | similar thing to the
00:11:00.280 | query vector that we used with xq,
00:11:03.600 | But this time we're gonna write
00:11:07.160 | index dot query
00:11:09.760 | and we're going to pass xq, so our query vector, and we're going to say how many results we want to return. Now,
00:11:18.120 | later on we're going to use
00:11:20.360 | Streamlit, like a little slider, to decide how many we would like to return, but for now we will hard-code it.
00:11:28.920 | Another thing that we want to include here is we want to tell Pinecone to
00:11:34.200 | return the metadata, because by default it will not return metadata,
00:11:40.480 | so include_metadata equals true. So these are like the extra little bits I mentioned before, so we included our
00:11:46.800 | title, so
00:11:49.600 | like the Wikipedia topic that the context is coming from, and also the text itself.
00:11:56.960 | okay, so we're going to return the relevant context and
00:12:00.320 | Then we're gonna loop through each of those now
00:12:04.080 | When we do this, there's a particular format that we need to follow
00:12:13.960 | Our contexts are actually going to be stored... so for context in xc, results, and
00:12:24.040 | results is going to return a list, and we just want the first item in that list.
00:12:29.360 | The reason it returns a list is because, if you are querying Pinecone with multiple queries,
00:12:34.960 | it will return a list of, you know, your answers for each query.
00:12:40.080 | But in this case, we are only ever going to query with one
00:12:44.160 | query vector, so we always enter at position zero here, and then in there we will have
00:12:52.880 | all of our
00:12:54.800 | returned matches
00:12:56.640 | inside this matches
00:12:58.640 | key, so
00:13:02.640 | for each context in there, all we're going to do is write st.write,
00:13:09.160 | context, and
00:13:12.600 | then we want to go into the metadata that we are returning, and
00:13:15.840 | we have title and text here. We don't want title, we want text.
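Put together, the query-handling block might look roughly like this, continuing from the app sketch above (it assumes `model`, `index`, and `query` are defined there). The `queries=[...]` argument and the `results[0]['matches']` response shape follow the legacy client behaviour described in the video; newer Pinecone clients take a single `vector` and return `matches` directly, so adjust accordingly.

```python
# continues the app sketch above: `model`, `index`, and `query` are defined there
if query != "":
    # create the query vector (xq); Pinecone expects plain lists, not NumPy arrays
    xq = model.encode([query]).tolist()

    # get relevant contexts: top 5 matches, including their metadata
    xc = index.query(queries=xq, top_k=5, include_metadata=True)

    # we only sent one query vector, so take position 0 and loop over its matches
    for context in xc["results"][0]["matches"]:
        st.write(context["metadata"]["text"])
```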
00:13:21.240 | Okay, so let's save that and check that it actually works.
00:13:25.200 | So again, this is going to take a little while to load, because we're
00:13:28.880 | initializing, like, the full pipeline of our vector database and the retriever model.
00:13:34.800 | So every time we run this, it's downloading the full retriever model, which takes quite a bit of time.
00:13:39.600 | Okay, so this has just rerun our app, and now I can say, who are the Normans?
00:13:50.080 | Again, it's trying to reload everything, so it's going to take a while. We're going to fix this in the next video.
00:13:57.080 | We should be returning five contexts, and if we scroll down we can see we have these five paragraphs.
00:14:02.960 | Now, each one of these paragraphs is a single
00:14:04.960 | context, so maybe we can inspect the element.
00:14:10.880 | Okay, so we can see down here.
00:14:21.040 | It's pretty horrific to look at, but if I zoom in we can see each one of these is
00:14:29.160 | a single one of our contexts, right?
00:14:34.880 | These here. Cool, so I
00:14:40.600 | think that's it for this
00:14:45.080 | video. So we now have the back end working, and in the next one,
00:14:50.960 | what we'll do is fix this issue with it taking forever to reload everything every time,
00:14:56.120 | which is actually super easy
00:14:58.800 | but will make a big difference to our app.
00:15:05.000 | Thank you very much for watching and I will see you in the next one. Bye