OpenAI's New GPT 3.5 Embedding Model for Semantic Search
Chapters
0:00 Intro
0:30 Semantic search with OpenAI GPT architecture
3:43 Getting started with OpenAI embeddings in Python
4:12 Initializing connection to OpenAI API
5:49 Creating OpenAI embeddings with ada
7:24 Initializing the Pinecone vector index
9:04 Getting dataset from Hugging Face to embed and index
10:03 Populating vector index with embeddings
12:01 Semantic search querying
15:09 Deleting the environment
15:23 Final notes
Today, we're going to have a look at how we can use OpenAI's new text embedding model, 00:00:09.200 |
to essentially search through loads of documents and do it in a super easy way. 00:00:15.600 |
So we really don't need to know that much about what is going on behind the scenes here. 00:00:20.200 |
We can just kind of get going with it and get really impressive results super quickly. 00:00:25.900 |
So to start, let's just have a quick look at how all this is going to look. 00:00:30.100 |
It's very similar, if you've followed any of these videos, to the architecture we normally use. 00:00:42.700 |
We start with our data, which is going to be over here. And we're going to take that 00:00:47.900 |
and we're going to use the new ada-002 model to embed it. 00:00:56.200 |
Okay, so what we have in here are sentences, some text goes through like this. 00:01:04.400 |
And what we're doing here is creating meaningful embeddings. 00:01:07.500 |
So, for example, two sentences that have a very similar meaning will end up close together within a vector space, 00:01:14.400 |
because that's what we're converting them into: vectors within that vector space. 00:01:20.600 |
And of course, we know that OpenAI, when they do something, they do it pretty well. 00:01:25.200 |
So the expectation here is that the ada-002 model is going to be pretty good at creating these dense vector representations. 00:01:33.200 |
So from that, we're going to get our embeddings. 00:01:37.000 |
I'm going to just have them in this little square here. 00:01:40.000 |
What we're going to do with those is we're going to take them over into Pinecone, 00:01:47.000 |
which is essentially where this vector space will live. 00:01:52.300 |
So we have our vector database here and they're going to go into there like that. 00:02:00.500 |
Okay, so this process here is what we would refer to as indexing. 00:02:06.300 |
Okay, we're taking all of our data and we're indexing it within Pinecone using the ada-002 model. 00:02:11.900 |
Now, there's another step to this whole pipeline that we haven't spoken about and that is querying. 00:02:19.900 |
So querying is literally when we do a search. 00:02:22.800 |
So let's say some random person comes along and they're like, I want to know about this. 00:02:28.200 |
We don't know what they're asking about. It's a mystery, but they have this query. 00:02:34.200 |
What we do with that query is we take it into ada-002 and embed it. 00:02:47.100 |
And we're going to take that over to Pinecone here, and we're going to say: Pinecone, return the top k, 00:03:01.000 |
let's say 5, the top k most relevant vectors that we have already indexed. 00:03:07.300 |
So we return those. Now we have five of these vectors. 00:03:12.900 |
They're all in here, 1, 2, 3, 4, 5 and we return them to the user. 00:03:19.300 |
Okay, but when we return them to the user, we're actually not going to return the vectors because it's just numbers. 00:03:26.600 |
We're going to return the text that those vectors were embedded with. 00:03:31.900 |
Okay, and that is how we will build our system. 00:03:36.900 |
Now it's actually super simple. This chart probably makes it look way more complicated than it actually is. 00:03:44.000 |
So we're going to be working from this example here. 00:03:50.300 |
We're going to open this in Colab and just work through. 00:03:56.200 |
So we get started by just installing any prerequisites that we have. 00:04:00.600 |
So we want to install the Pinecone Client, OpenAI and datasets. 00:04:13.400 |
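For reference, that install cell looks roughly like this; a sketch assuming Colab and the package names in use at the time of recording:

```python
# Install the three dependencies for this walkthrough
# (pre-v1 openai client and pinecone-client 2.x era)
!pip install -qU pinecone-client openai datasets
```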
So come down here and first thing we're going to need to do is create our embeddings. 00:04:17.200 |
Now to do that, we need to initialize our connection to OpenAI and for that we need these two keys. 00:04:25.100 |
So we need a organization key and we need our secret API key. 00:04:31.800 |
We go to beta.openai.com, and you'll need to log in, so you can log in at the top right. 00:04:37.800 |
I've already logged in so I can go over, click on my profile and I can click view API keys. 00:04:43.900 |
Okay, and the first page you come to here is the secret key. 00:04:48.000 |
Now, you can't copy this one; it's already been created. 00:04:51.800 |
So what you need to do is create a new secret key. 00:04:54.900 |
So I will do that and then you just copy your key here. 00:04:58.100 |
Then with that secret key, you need to paste it into here. 00:05:02.800 |
I have mine stored in a variable called API key. 00:05:08.300 |
We go over to settings and then in here we'll also find our organization ID. 00:05:14.100 |
So we need to copy that and that will go in here. 00:05:17.900 |
And I have mine stored in another variable called org key. 00:05:24.300 |
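Put together, the initialization looks roughly like this; a sketch assuming the pre-v1 `openai` Python client, with `api_key` and `org_key` as the variables just mentioned:

```python
import openai

# Both values come from the OpenAI dashboard
openai.organization = org_key  # organization ID from the Settings page
openai.api_key = api_key       # secret key from the View API keys page

# List every model/engine available to this account
engines = openai.Engine.list()
```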
Now I can run this and what we'll do is we'll get a list of all the models that are available 00:05:33.000 |
So you can see we have this big list, which we retrieved with this OpenAI engine list call. 00:05:37.400 |
So we're just seeing everything in there, and I don't know if maybe ada is at the bottom, maybe not. 00:05:45.300 |
So I'm not going to search through it, but we'll see which model we're using here. 00:05:48.800 |
So this is a new model from OpenAI and it's much cheaper to use 00:05:54.300 |
and the performance is supposedly much greater. 00:05:57.700 |
So we'll go ahead and we'll try this one out. 00:06:00.700 |
So, text-embedding-ada-002. And just as an example, this is how we would create our embedding. 00:06:07.300 |
So OpenAI embedding create and then we can pass multiple things to embed here. 00:06:13.100 |
So here we have two sentences, and that means we will end up outputting two vector embeddings. 00:06:18.900 |
And then for the model, we just pass the model that we'd like to use. 00:06:21.300 |
So this one. Okay, so we run that and if it worked correctly, you should see that we have these vectors in here. 00:06:28.400 |
Okay, and some little bits of information in there. 00:06:32.800 |
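The call being described looks roughly like this; a sketch using the pre-v1 `openai` client with two illustrative sentences:

```python
MODEL = "text-embedding-ada-002"

# Two inputs in, two vector embeddings out
res = openai.Embedding.create(
    input=[
        "Sample document text goes here",
        "there will be several phrases in each batch",
    ],
    engine=MODEL,
)
```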
Now one thing that I would like to demonstrate here is, okay, are these vectors, 00:06:38.700 |
do they have the same dimensionality and what is that dimensionality? 00:06:42.000 |
Now they're output by the same model, so we would expect them to have the same dimensionality. 00:06:46.400 |
So we're just checking the response. We have data, zero and embedding. 00:06:51.700 |
So essentially what we have in here, if I scroll up a bit, you'll be able to see that. 00:06:58.500 |
Okay, so we have data, we're going for the first item in the list and we're looking at embedding. 00:07:03.400 |
Great. Now print those out and we should see that we get 1536, 00:07:08.200 |
which is the embedding dimensionality of the new ada model. 00:07:11.300 |
Now what I want to do is extract those into a list, which is what we're going to be doing later. 00:07:15.700 |
So we can extract those and see that we do in fact have two of those. 00:07:19.400 |
And again, we can check the dimensionality there as well. 00:07:23.300 |
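As a sketch, the dimensionality check and the extraction into a list look something like this, reusing the `res` response from above:

```python
# Check the dimensionality of the first returned embedding
print(len(res['data'][0]['embedding']))  # 1536

# Extract all embeddings into a plain list of vectors
embeds = [record['embedding'] for record in res['data']]
print(len(embeds), len(embeds[0]))  # 2 vectors, 1536 dims each
```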
So now what we need to do is initialize a Pinecone instance. 00:07:28.700 |
And this is where we're going to store all of our vectors. 00:07:31.800 |
So for that, we need to head over to app.pinecone.io. 00:07:36.600 |
So let me open that over here. You will need to sign up if this is your first time. 00:07:42.200 |
Again, you should come through to a page that looks kind of like this. 00:07:45.100 |
So I have James's default project up here. You will have your name followed by default project. 00:07:50.400 |
Now, we don't want to create our first index here. 00:07:53.400 |
We're going to be doing that in Python. What we do need is the API keys. 00:07:57.400 |
So I'm going to just take one of these. I have my default API key here. 00:08:00.800 |
I'm going to copy it here and we're going to paste it into the notebook. 00:08:04.000 |
So I've stored mine in a variable called Pinecone key. 00:08:09.000 |
So I can run that. And what this will do is initialize our connection to Pinecone. 00:08:13.400 |
It will check if there is an index called OpenAI within our project. 00:08:18.200 |
So within this space here, we don't have any so it doesn't exist. 00:08:23.300 |
If it doesn't exist, it will be created and it will use this dimension here. 00:08:27.800 |
So this dimension is the 1536 that we saw earlier. And then we'll connect to that index. 00:08:33.100 |
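That logic, sketched against the pinecone-client 2.x API (the `environment` value here is illustrative; yours is shown next to your API key in the console):

```python
import pinecone

pinecone.init(api_key=pinecone_key, environment="us-east1-gcp")

# Create the 'openai' index only if it doesn't exist yet,
# using the 1536 dimensionality we saw earlier
if "openai" not in pinecone.list_indexes():
    pinecone.create_index("openai", dimension=len(embeds[0]))

# Connect to the index
index = pinecone.Index("openai")
```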
So let's run that. And if we navigate back to the page here, the app.pinecone.io, 00:08:39.700 |
we can refresh and we should see that we have an index here. 00:08:44.200 |
It was initializing and now it's ready. So we can see all the details there. 00:08:49.000 |
We see the dimensionality, the pod types we're using, metrics and so on. 00:08:53.800 |
So these are just the default values there. But yes, we do want to be using the cosine metric. 00:08:58.900 |
And you can change the pod type depending on what you're wanting to do. 00:09:02.400 |
So back in our code, let's go ahead and begin populating that index. 00:09:07.100 |
So to populate the index, we obviously need some data. 00:09:10.500 |
We're just going to use a very small dataset: 1,000 questions from the TREC dataset. 00:09:15.800 |
So let's load that. This we are getting from Hugging Face data sets. 00:09:21.500 |
So if we actually go to huggingface.co/datasets/trec, 00:09:25.900 |
we'll see the data set that we are downloading, which is this here. 00:09:31.500 |
Okay. I think in total there's maybe 5,000-ish examples in there. 00:09:38.400 |
We're just going to use the first 1,000 to make things really fast as we're walking through this example. 00:09:43.200 |
Okay. And yes, we can see we have text, coarse label, 00:09:46.200 |
fine label. All we really care about here is actually the text. 00:09:49.900 |
Okay. And we can have a look at the first one. 00:09:53.000 |
How did serfdom develop in and then leave Russia? 00:09:56.600 |
And we can also compare that over to here and we see that it's actually exactly the same. 00:10:01.300 |
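That load step, as a sketch with the Hugging Face `datasets` library (column names as shown in the video):

```python
from datasets import load_dataset

# First 1,000 questions from the TREC question-classification dataset
trec = load_dataset("trec", split="train[:1000]")

print(trec[0]["text"])  # How did serfdom develop in and then leave Russia ?
```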
Okay, cool. So now what we want to do is we're going to create a vector embedding for each one of these samples. 00:10:08.700 |
So well, let's walk through the logic of doing that. 00:10:12.000 |
So we're going to be doing that in a loop. We're going to be doing it in batches of 32. 00:10:15.700 |
And what we're going to do is extract the start position of the batch, 00:10:19.200 |
which is I and the end position of that batch. 00:10:21.900 |
And we're going to get all of the text within that batch. 00:10:26.500 |
So this here should actually be i_end. So we get all the text within the batch. 00:10:31.500 |
We get all the IDs, which is just a count. You can use actual IDs if you want. 00:10:37.600 |
And then what we're going to do is we're going to create our embeddings using the OpenAI endpoint that we used before. 00:10:42.500 |
So we have our inputs, which is our batch of text. 00:10:45.500 |
We have the engine, which is the ada-002 model. 00:10:48.700 |
And then here, we're just reformatting those embeddings into a format that we can then take and upsert into Pinecone. 00:10:56.800 |
Also, later on, when we're serving or when we're querying, 00:11:01.500 |
we don't want to see these vectors, because they don't make sense to us. 00:11:08.200 |
So to make that easy, what we're going to do is pair each vector with metadata. 00:11:12.500 |
So the metadata is literally just that text that we want to see. 00:11:16.400 |
That will basically just be some metadata attached to each one of our vectors. 00:11:20.600 |
And it means that when we're querying, we can just return that and read the actual text rather than looking at the vectors. 00:11:26.900 |
I mean, that's it. So we zip all this together. 00:11:29.000 |
So each record is going to be a unique ID, the vector embedding, and the attached metadata. 00:11:38.100 |
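Put together, the indexing loop being described looks roughly like this; a sketch reusing `trec`, `index`, and `MODEL` from the earlier cells (`tqdm` just draws the progress bar):

```python
from tqdm.auto import tqdm

batch_size = 32
for i in tqdm(range(0, len(trec["text"]), batch_size)):
    # start and end positions of this batch
    i_end = min(i + batch_size, len(trec["text"]))
    # the text and (count-based) IDs for this batch
    lines_batch = trec["text"][i:i_end]
    ids_batch = [str(n) for n in range(i, i_end)]
    # embed the whole batch in one API call
    res = openai.Embedding.create(input=lines_batch, engine=MODEL)
    embeds = [record["embedding"] for record in res["data"]]
    # pair each vector with its original text as metadata
    meta = [{"text": line} for line in lines_batch]
    # upsert (id, vector, metadata) records into Pinecone
    to_upsert = zip(ids_batch, embeds, meta)
    index.upsert(vectors=list(to_upsert))
```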
So we can run that. It should be pretty quick. 00:11:41.200 |
Okay. Yep. It's like 13 seconds, really fast. 00:11:44.600 |
Okay. 14 seconds total. Really super fast for a thousand items. 00:11:49.600 |
That's pretty insane. So, that is the indexing portion of our app done. 00:11:59.200 |
So we can kind of cross that off. Now what we need to focus on is the querying. 00:12:03.200 |
So how do we do querying? It's actually really easy. 00:12:05.800 |
So we have a query. I'm going to say what caused the 1929 Great Depression? 00:12:10.700 |
We're kind of limited in number of questions we can ask here because we do only have 1,000 examples indexed. 00:12:19.800 |
So we're going to be limited on what we can actually ask here. 00:12:22.800 |
But this is still pretty good for just demonstrating this workflow. 00:12:28.500 |
So let's run this. Basically, we're doing the exact same thing for the query that we did with the lines of the TREC dataset before. 00:12:36.700 |
So we're just embedding it using the ada-002 model. 00:12:40.300 |
In this case, we just have one string input there. 00:12:43.900 |
And then what we do is in that response, we're going to have data, we're going to retrieve the first item. 00:12:49.700 |
There's just one item in there anyway. And we want the embedding from that. 00:12:53.600 |
And that, if I take a look at it, will be a 1536-dimensional vector. 00:13:01.700 |
And then what we can do is we pass that to index.query, like so. 00:13:08.900 |
OK, so we can remove those square brackets there. Top k equals 5, include metadata. 00:13:16.300 |
We do want to include this. This is going to return the text, the original text back to us. 00:13:21.400 |
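As a sketch, the querying step looks something like this:

```python
query = "What caused the 1929 Great Depression?"

# Embed the query with the same ada-002 model used for indexing
xq = openai.Embedding.create(input=query, engine=MODEL)["data"][0]["embedding"]

# Retrieve the five most similar indexed vectors, with their metadata
res = index.query([xq], top_k=5, include_metadata=True)
```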
So let's see, are we returning questions that are similar to the question we asked? 00:13:26.700 |
OK, so why did the world enter a global depression in 1929 when it was the Great Depression? 00:13:33.200 |
I don't know what is with the weird formatting here. 00:13:36.800 |
And then it's talking about some other things that are maybe somewhat related. 00:13:40.100 |
I'm not really sure, or just things from around that sort of time era. 00:13:44.600 |
But we can see that the score here, the similarity, does drop really quickly when we come down to these. 00:13:51.400 |
Because they're actually not that relevant. They're just kind of within the same context, I suppose. 00:13:58.200 |
It's clearly returning the correct question that we would expect it to, based on the question we asked. 00:14:04.100 |
OK, so we can also format that a little bit nicer. 00:14:06.900 |
So here, just run that. We can see a little bit easier to read than in this sort of response format that we had up here. 00:14:14.000 |
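A minimal way to print the matches more readably, assuming the response structure above:

```python
# Show each match's similarity score next to its original question text
for match in res["matches"]:
    print(f"{match['score']:.2f}: {match['metadata']['text']}")
```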
Now let's make it a little bit harder. We're just going to replace the correct term, depression, with the incorrect term, recession. 00:14:19.800 |
And see if it still understands our query, because this is where a lexical search, 00:14:24.000 |
so where you're searching by keywords, would fail. 00:14:26.100 |
In this case, we should see, hopefully, that it does not fail. 00:14:28.500 |
So, replicating the same logic again. 00:14:32.400 |
And we can see that, yes, the similarity is slightly lower, because we're using a different word. 00:14:37.100 |
But it's still returning the relevant question as our first example there. 00:14:42.100 |
OK, that is pretty cool. Now let's make it even harder. 00:14:46.500 |
Why was there a long-term economic downturn in the early 20th century? 00:14:50.600 |
Is it going to figure out that this is what we're talking about? 00:14:53.400 |
That we're talking about the global depression of 1929? 00:14:57.800 |
And yes, it does. And the similarity is actually pretty good there. 00:15:01.300 |
So despite not really sharing any of the same words, 00:15:03.900 |
it actually manages to identify that this is talking about the same thing, which is pretty good. 00:15:09.700 |
Now with that done, we can finish with this example. 00:15:12.700 |
So one thing you might need to do here is head over to Pinecone console, 00:15:18.200 |
and you can just go ahead and delete the index, or you can do it in code. Completely up to you. 00:15:23.400 |
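In code, that teardown is a single call with pinecone-client 2.x:

```python
# Delete the index to free up resources once you're done
pinecone.delete_index("openai")
```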
Great. So that's it for this walkthrough and example. 00:15:27.500 |
I hope this has been useful. It's really cool to see OpenAI's new embedding model. 00:15:32.600 |
And from what I've heard, the performance, although not as clear from this example, is really good. 00:15:38.800 |
And as you have seen from this example, it's super easy to use. 00:15:42.100 |
So a few lines of code, and we have this really cool, really high-performance, 00:15:46.800 |
semantic search example with OpenAI and Pinecone, and we don't really need to worry about anything. 00:15:53.000 |
It's just super easy to do. So I hope this has all been interesting and useful. 00:15:57.800 |
Thank you very much for watching, and I'll see you again in the next one. Bye.