
Cohere AI's LLM for Semantic Search in Python


Chapters

0:00 Semantic search with Cohere LLM and Pinecone
0:45 Architecture overview
4:06 Getting code and prerequisites install
4:50 Cohere and Pinecone API keys
6:12 Initialize Cohere, get data, create embeddings
7:43 Creating Pinecone vector index
10:37 Querying with Cohere and Pinecone
12:56 Testing a few queries
14:35 Final notes

Transcript

Today we are going to take a look at how to build a semantic search tool using Cohere's Embed API endpoint and Pinecone's Vector Database. We'll be using Cohere's large language model to embed sentences or paragraphs into a vector space and then we'll be using Pinecone's Vector Database to actually search through that vector space and retrieve relevant answers to our particular queries based on the semantics of those queries rather than just keyword matching.

Now both of these services together are a powerful combination, and they make building this sort of tool incredibly easy, as we'll see. But before we start building it, let's take a look at what the overall architecture will look like. We start with our data: a load of text, split into sentences or roughly paragraph-sized chunks depending on what we're trying to do. We feed those chunks into Cohere's embedding endpoint, which passes them through a large language model and encodes each chunk of text into a single vector.
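As a rough sketch of that embedding step: the call below follows the pattern used later in the notebook, but `cohere.Client.embed` parameters (`model="small"`, `truncate="LEFT"`) vary between SDK versions, so treat this as illustrative. The live API call is guarded behind an environment variable so the snippet runs without credentials; the small helper just reports the shape of the result.

```python
import os


def embedding_shape(embeds):
    """(number of vectors, dimensionality) of a list-of-lists embedding matrix."""
    return (len(embeds), len(embeds[0]) if embeds else 0)


# Live call, guarded so the sketch runs without credentials. The model name
# "small" and truncate="LEFT" follow the video; check your SDK version.
if os.environ.get("COHERE_API_KEY"):
    import cohere

    co = cohere.Client(os.environ["COHERE_API_KEY"])
    chunks = [
        "What caused the 1929 Great Depression?",
        "Why did the world enter a global depression in 1929?",
    ]
    embeds = co.embed(texts=chunks, model="small", truncate="LEFT").embeddings
    print(embedding_shape(embeds))  # one vector per chunk
```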

Now we're going to have a thousand of these chunks of text; they're quite small questions from the TREC dataset, so we'll end up with a thousand of these vectors. Once we have them, we put them into Pinecone. And whether they're stored in Pinecone or just sitting by themselves, the way to think about these vectors is that they are points in a vector space.
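Putting those vectors into Pinecone happens later in the video as an "upsert" of (id, vector, metadata) records in batches. A pure-Python sketch of that record structure, with the `index.upsert` call shown commented since it needs a live index:

```python
def build_records(embeds, texts):
    """Pair each embedding with a unique string ID and its plain-text metadata."""
    ids = [str(i) for i in range(len(embeds))]
    meta = [{"text": t} for t in texts]
    return list(zip(ids, embeds, meta))


def batched(records, batch_size=128):
    """Yield batches of records so individual upsert calls stay small."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]


# With a live Pinecone index object, the upsert loop looks like:
# for batch in batched(build_records(embeds, trec["text"])):
#     index.upsert(vectors=batch)
```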

If two of these chunks of text are semantically similar, i.e. they have a similar meaning, they will be very close together in that vector space, whereas two sentences with very dissimilar meanings will be very far apart. Pinecone is the database that stores all of these vectors, and it also allows us to search through them very efficiently: we can literally store millions, tens of millions, even billions of vectors in here and search through them incredibly fast.
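That "close together in vector space" idea can be made concrete with cosine similarity, which is also the metric the index uses later in the video. A minimal pure-Python sketch of scoring a query vector against indexed vectors and taking the top k (a real index does this far more efficiently than the brute-force loop here):

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, lower as meanings diverge."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_k(xq, index_vectors, k=3):
    """Return the indices of the k vectors most similar to the query vector xq."""
    order = sorted(range(len(index_vectors)),
                   key=lambda i: cosine(xq, index_vectors[i]),
                   reverse=True)
    return order[:k]
```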

Now all of this together is what we call indexing, and on the other side of it we have the querying phase. When we're making queries, let's say we have a little search box (this input can be anything we like, but imagine a Google-style search box) and our users enter a query. That query goes to Cohere first, through the same large language model, and what we get back is a single query vector; in code we usually refer to it as xq. We pass that over to Pinecone and say: with this query vector, what are the top_k most similar already-indexed vectors? With top_k equal to 3, Pinecone returns the three most similar items. But obviously we can't read or understand what those vectors mean, they're just numbers, so we need to find the metadata that was attached to them. The metadata contains the original text, so we retrieve that text and return it all to our user. That's what we're going to build. It probably looks more complicated in this chart than it actually is; it's incredibly easy, as we'll see, and it won't take long to go through.

So let's get started. We're going to be using this guide on Pinecone, docs.pinecone.io/docs/cohere. We won't go through that page itself; instead we open the notebook in Colab. It starts with a little chart that is pretty much exactly what I just explained, just a little simpler. Coming down, we need to install a few prerequisites: up here we have Cohere and Pinecone (you know why we're using those, I just mentioned it) and we also have Hugging Face Datasets, which is where we're going to source the dataset that we'll be indexing and then querying against later on.

Next we need to sign up for API keys for Cohere and Pinecone, and both of these we can get completely free; Cohere has a trial amount that we can query with. Clicking Cohere takes us to os.cohere.ai, which will redirect you straight to the dashboard if you're already logged in. If this is your first time using Cohere, you'll need to go to the top right and create an account or log in. Once you've done that, go to Settings on the left, then API Keys, and you should have a trial key there; I copy my default key and paste it into the notebook. For the Pinecone key, the link takes us to app.pinecone.io. If you're already logged in you'll see your console; otherwise you'll see a login or create-account page, so go ahead and do that and you'll be redirected. If it's your first time using Pinecone the index list will be empty (you can see I have a few indexes in here already). Head over to API Keys on the left, copy the default API key, and paste that into the notebook as well. So now I have my Cohere and Pinecone keys in there.

The first thing we want to do is create our embeddings. For that we need the Cohere embed endpoint and some data, so let's get both. Here we initialize our connection to Cohere using the API key from before, and then we use Hugging Face Datasets to download the first 1,000 rows of the TREC dataset; the questions themselves are stored within the text feature, which we can see below. After this we call the Cohere embed endpoint, passing all 1,000 items at once; the client automatically batches them and iterates through everything so we don't overload the API with requests. We use the small model, and we set truncation so that over-long inputs keep everything on the left. Then we extract the embeddings from what's returned. Running this is super fast, maybe a second. Looking at the dimensionality of what comes back: we have 1,000 vectors, each of 1,024 dimensions, which is just the output dimensionality of Cohere's small large language model.

With those embeddings all created, we can move on to initializing our vector index, which is where we're going to store everything. We initialize our connection to Pinecone using the API key from before. We're going to use this index name; you can change it to whatever you want, but I'd recommend keeping it descriptive so you don't get confused if you have multiple indexes later on. If this is your first time using Pinecone, or your first time through this notebook, the create-index step will run here. If you've already run the notebook and the index exists in your Pinecone account, the check says: if that index name already exists, don't create it again; otherwise it needs to be created. Within that create_index call we have the index name, the dimensionality (the 1,024 we saw before, from the Cohere small model), and cosine similarity as the similarity metric. After that we just go ahead and connect to the index, so let's run that.

Once that's all done, we move on to actually upserting everything: adding all of those vectors, the relevant metadata, and some unique IDs to our index. The format is a big list where each record's content sits within a tuple containing a unique ID, a vector that we created from Cohere's embed endpoint, and the metadata, which is just the plain-text version of the information. So coming down here, we create that structure: a zipped list of unique IDs (just a count over the embeddings we created with Cohere), the embeddings themselves, and the metadata, which we build here. Metadata is always in dictionary format, and we just have a text key whose value is the original plain text of our data. Up here we're using a batch size of 128, so that we're not overloading the API calls we make to Pinecone; we upsert in batches of 128. So we can run that. At the end we describe the index statistics, which lets us see the vector count and confirm that we have upserted everything into our vector index, and we can see here that we have. From here we can also check the dimensionality of our index, which should align with the model's output dimensionality, again the 1,024 we saw before.

Okay, so with that we've actually done all of the indexing stage of our workflow, so we can cross that part off; the indexing part is all done. Now all we need to do is the querying part: we take our plain-text query to Cohere, from Cohere we create the query embedding, we query Pinecone with it, and we return some vectors and the metadata attached to those vectors to the user. To do that, it's pretty much the same process again. Our query is "What caused the 1929 Great Depression?". We do exactly what we did before with the TREC data: we use Cohere embed with the small model, truncating to the left, and we get the embedding. We can also print the shape here, so let me run this. The shape is just one vector this time, which is our query vector, and it's still 1,024-dimensional because we're using that small Cohere large language model. From there we query Pinecone with this, saying we want to return the top 10 most similar results and we want to include the metadata, which holds the plain text, so that we can actually read the results. The response is readable as-is, but let's clean it up a little more: coming down here and running this, we loop through each of the returned matches and print out the score, rounded so it's easier to read, alongside the metadata text.

Printing all of those out, we get something like this. The top two results have much higher similarity scores than the rest, and they are indeed far more relevant to our question; they would clearly count as duplicates of it, or at least very similar. The rest are not directly relevant, but I think most of them occur within the same sort of time era, so it's interesting that the model manages to identify some sort of relationship there and return those. Of course, within that 1,000-question TREC dataset these are the only two items that refer to the Great Depression; the rest, as you can see, are not directly relevant. So they're very good results.

Now let's adjust this a little bit. I mentioned before that we're searching based on the meaning of these queries, not the keywords, so what we're going to do is replace the keyword "depression" with the incorrect term "recession". Although it's incorrect, we as humans would understand that it means the same thing: someone is trying to ask about that specific event. And indeed we can see that the results are pretty much exactly the same. The similarity scores drop a bit because we're using that incorrect term, but nonetheless the top two are identified as exactly the same, and I think most of the rest are also the same, just in a slightly different order, with a couple perhaps dropped from the top.

In this case we still share a lot of similar keywords: maybe "recession" is closely associated with "depression", "major" with "great", and so on. So let's modify this even more, dropping all of those similar words and being more descriptive as well: "Why was there a long-term economic downturn in the early 20th century?". This is worded very differently from the results we would expect to find, and yet again we see those same two right at the top, with the rest of the results also very similar. Interestingly, I think because we're using more descriptive language here, it manages to identify those two as being more similar than it did with the previous query where we used the incorrect term; you can see the similarity score with the incorrect term is lower than it is here.

So you can see already how easy it is to build a pretty high-performing semantic search engine using very little code and literally no prior knowledge of this technology: all we do is make a few API calls to Cohere and a few API calls to Pinecone, and we have this semantic search tool. Now that's it for this video. I hope it has been useful and interesting, but for now, thank you very much for watching, and I'll see you again in the next one. Bye.
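As a recap of the querying flow described above, here is a hedged sketch in code. The index name and the "us-east1-gcp" environment value are assumptions, and exact method names and parameters differ between Cohere and Pinecone client versions, so treat the guarded section as illustrative rather than definitive; the small format_matches helper mirrors the score-rounding loop from the video.

```python
import os


def format_matches(matches):
    """Mirror the video's result loop: rounded score plus the plain-text metadata."""
    return [(round(m["score"], 2), m["metadata"]["text"]) for m in matches]


# Live calls, guarded so the sketch runs without credentials.
if os.environ.get("COHERE_API_KEY") and os.environ.get("PINECONE_API_KEY"):
    import cohere
    import pinecone

    co = cohere.Client(os.environ["COHERE_API_KEY"])
    pinecone.init(api_key=os.environ["PINECONE_API_KEY"],
                  environment="us-east1-gcp")  # assumed environment name
    index = pinecone.Index("cohere-pinecone-trec")  # assumed index name

    query = "What caused the 1929 Great Depression?"
    xq = co.embed(texts=[query], model="small", truncate="LEFT").embeddings[0]
    res = index.query(vector=xq, top_k=10, include_metadata=True)
    for score, text in format_matches(res["matches"]):
        print(score, text)
```

Swapping the query text for the "recession" or "long-term economic downturn" rewordings from the video exercises the same code path; only the query string changes.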
