
OpenAI's New GPT 3.5 Embedding Model for Semantic Search


Chapters

0:30 Semantic search with OpenAI GPT architecture
3:43 Getting started with OpenAI embeddings in Python
4:12 Initializing connection to OpenAI API
5:49 Creating OpenAI embeddings with ada
7:24 Initializing the Pinecone vector index
9:04 Getting dataset from Hugging Face to embed and index
10:03 Populating vector index with embeddings
12:01 Semantic search querying
15:09 Deleting the environment
15:23 Final notes

Whisper Transcript

00:00:00.000 | Today, we're going to have a look at how we can use OpenAI's new text embedding model,
00:00:05.400 | creatively named text-embedding-ada-002,
00:00:09.200 | to essentially search through loads of documents and do it in a super easy way.
00:00:15.600 | So we really don't need to know that much about what is going on behind the scenes here.
00:00:20.200 | We can just kind of get going with it and get really impressive results super quickly.
00:00:25.900 | So to start, let's just have a quick look at how all this is going to look.
00:00:30.100 | It's very similar, if you follow any of these videos, very similar architecture to what we normally use.
00:00:35.400 | We start with our data source,
00:00:42.700 | which is going to be over here. And we're going to take that
00:00:47.900 | and we're going to use the new ada-002 model to embed these.
00:00:56.200 | Okay, so what we have in here are sentences, some text goes through like this.
00:01:04.400 | And what we're doing here is creating meaningful embeddings.
00:01:07.500 | So for example, two sentences that have a very similar meaning within a vector space,
00:01:14.400 | because that's what we're converting them into, vectors within that vector space,
00:01:17.900 | they will be located very closely together.
00:01:20.600 | And of course, we know that OpenAI, when they do something, they do it pretty well.
00:01:25.200 | So the expectation here is that the ada-002 model is going to be pretty good at creating these dense vector representations.
00:01:33.200 | So from that, we're going to get our embeddings.
00:01:37.000 | I'm going to just have them in this little square here.
00:01:40.000 | What we're going to do with those is we're going to take them over into Pinecone,
00:01:44.800 | which is going to be our vector database.
00:01:47.000 | So this is essentially where this vector space will live.
00:01:52.300 | So we have our vector database here and they're going to go into there like that.
00:02:00.500 | Okay, so this process here is what we would refer to as indexing.
00:02:06.300 | Okay, we're taking all of our data and we're indexing it within Pinecone using the ada-002 model.
00:02:11.900 | Now, there's another step to this whole pipeline that we haven't spoken about and that is querying.
00:02:19.900 | So querying is literally when we do a search.
00:02:22.800 | So let's say some random person comes along and they're like, I want to know about this.
00:02:28.200 | We don't know what they're asking about. It's a mystery, but they have this query.
00:02:32.900 | They've passed it to us.
00:02:34.200 | What we do with that query is we take it into ada-002.
00:02:38.400 | We embed it to create a query vector.
00:02:42.300 | So it's going to be a smaller box called xq.
00:02:47.100 | And we're going to take that over to Pinecone here, and we're going to say to Pinecone, return the top k.
00:02:55.500 | So top k is going to be a number.
00:02:57.400 | Let's say we say 3 or 5.
00:03:01.000 | Let's say 5, return the top k most relevant vectors that we have already indexed.
00:03:07.300 | So we return those. Now we have five of these vectors.
00:03:12.900 | They're all in here, 1, 2, 3, 4, 5 and we return them to the user.
00:03:19.300 | Okay, but when we return them to the user, we're actually not going to return the vectors because it's just numbers.
00:03:24.700 | It won't make any sense.
00:03:26.600 | We're going to return the text that those vectors were embedded with.
00:03:31.900 | Okay, and that is how we will build our system.
00:03:36.900 | Now it's actually super simple. This chart probably makes it look way more complicated than it actually is.
00:03:42.200 | Let's take a look at the code.
00:03:44.000 | So we're going to be working from this example here.
So that's docs.pinecone.io/docs/openai.
00:03:50.300 | We're going to open this in Colab and just work through.
00:03:56.200 | So we get started by just installing any prerequisites that we have.
00:04:00.600 | So we want to install the Pinecone Client, OpenAI and datasets.
00:04:05.900 | So we'll go ahead and run that.
00:04:10.800 | Okay, that will take a moment. Okay, great.
00:04:13.400 | So come down here and first thing we're going to need to do is create our embeddings.
00:04:17.200 | Now to do that, we need to initialize our connection to OpenAI and for that we need these two keys.
00:04:25.100 | So we need an organization key and we need our secret API key.
00:04:29.200 | So to get that, we'll head over here.
00:04:31.800 | We go to beta.openai.com, and you'll need to log in, so you can log in at the top right.
00:04:37.800 | I've already logged in so I can go over, click on my profile and I can click view API keys.
00:04:43.900 | Okay, and the first page you come to here is the secret key.
00:04:48.000 | Now here you can't copy this. It's already been created.
00:04:51.800 | So what you need to do is create a new secret key.
00:04:54.900 | So I will do that and then you just copy your key here.
00:04:58.100 | Then with that secret key, you need to paste it into here.
00:05:02.800 | I have mine stored in a variable called API key.
00:05:05.700 | Then we return to the OpenAI page.
00:05:08.300 | We go over to settings and then in here we'll also find our organization ID.
00:05:14.100 | So we need to copy that and that will go in here.
00:05:17.900 | And I have mine stored in another variable called org key.
00:05:22.700 | So I will copy that.
00:05:24.300 | Now I can run this and what we'll do is we'll get a list of all the models that are available
00:05:30.400 | as long as we've authenticated correctly.
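(A minimal sketch of that initialization, using the pre-1.0 openai Python library the video is based on; ORG_KEY and API_KEY are placeholder variables:)

```python
import openai

# both values are placeholders; use your own keys
openai.organization = ORG_KEY  # organization ID from the settings page
openai.api_key = API_KEY       # secret key from the API keys page

# list the available models to confirm we've authenticated correctly
models = openai.Engine.list()
```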
00:05:33.000 | So you can see we have this big list, which we retrieved with that OpenAI engine list call.
00:05:37.400 | So we're just seeing everything in there, and I don't know if maybe ada is at the bottom, maybe not.
00:05:45.300 | So I'm not going to search through it, but we'll see which model we're using here.
00:05:48.800 | So this is a new model from OpenAI and it's much cheaper to use
00:05:54.300 | and the performance is supposedly much greater.
00:05:57.700 | So we'll go ahead and we'll try this one out.
00:06:00.700 | So text-embedding-ada-002, and just as an example, this is how we would create our embedding.
00:06:07.300 | So OpenAI embedding create and then we can pass multiple things to embed here.
00:06:13.100 | So here we have two sentences, and that means we will end up outputting two vector embeddings.
00:06:18.900 | And then for the model, we just pass the model that we'd like to use.
00:06:21.300 | So this one. Okay, so we run that and if it worked correctly, you should see that we have these vectors in here.
00:06:28.400 | Okay, and some little bits of information in there.
00:06:31.100 | So it's pretty cool.
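(Roughly, that embedding call looks like this; the two input sentences are just placeholders:)

```python
res = openai.Embedding.create(
    input=[
        "Sample document text goes here",
        "there will be several phrases in each batch"
    ],
    engine="text-embedding-ada-002"
)
```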
00:06:32.800 | Now one thing that I would like to demonstrate here is, okay, are these vectors,
00:06:38.700 | do they have the same dimensionality and what is that dimensionality?
00:06:42.000 | Now they're output by the same model, so we would expect them to have the same dimensionality.
00:06:46.400 | So we're just checking the response. We have data, zero and embedding.
00:06:51.700 | So essentially what we have in here, if I scroll up a bit, you'll be able to see that.
00:06:58.500 | Okay, so we have data, we're going for the first item in the list and we're looking at embedding.
00:07:03.400 | Great. Now print those out and we should see that we get 1536,
00:07:08.200 | which is the embedding dimensionality of the new ada model.
00:07:11.300 | Now what I want to do is extract those into a list, which is what we're going to be doing later.
00:07:15.700 | So we can extract those and see that we do in fact have two of those.
00:07:19.400 | And again, we can check the dimensionality there as well.
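(Sketching those checks against the response from the previous call:)

```python
# each input sentence produced one 1536-dimensional vector
print(len(res['data'][0]['embedding']))  # 1536

# extract the embeddings into plain Python lists
embeds = [record['embedding'] for record in res['data']]
print(len(embeds))     # 2, one vector per input sentence
print(len(embeds[0]))  # 1536
```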
00:07:23.300 | So now what we need to do is initialize a Pinecone instance.
00:07:28.700 | And this is where we're going to store all of our vectors.
00:07:31.800 | So for that, we need to head over to app.pinecone.io.
00:07:36.600 | So let me open that over here. You will need to sign up if this is your first time.
00:07:42.200 | Again, you should come through to a page that looks kind of like this.
00:07:45.100 | So I have James's default project up here. You will have your name followed by default project.
00:07:50.400 | And what we're going to do is we don't want to create our first index.
00:07:53.400 | We're going to be doing that in Python. What we do need is the API keys.
00:07:57.400 | So I'm going to just take one of these. I have my default API key here.
00:08:00.800 | I'm going to copy it here and we're going to paste it into the notebook.
00:08:04.000 | So I've stored mine in a variable called Pinecone key.
00:08:09.000 | So I can run that. And what this will do is initialize our connection to Pinecone.
00:08:13.400 | It will check if there is an index called OpenAI within our project.
00:08:18.200 | So within this space here, we don't have any so it doesn't exist.
00:08:23.300 | If it doesn't exist, it will be created and it will use this dimension here.
00:08:27.800 | So this dimension is a 1536 that we saw earlier. And then we'll connect to that index.
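(A minimal sketch using the pre-3.0 pinecone-client API from the video; PINECONE_KEY is a placeholder, and the environment value is an assumption that depends on your project:)

```python
import pinecone

pinecone.init(
    api_key=PINECONE_KEY,
    environment="us-east1-gcp"  # assumption: check your project's environment in the console
)

# create the index if it doesn't already exist, then connect to it
if "openai" not in pinecone.list_indexes():
    pinecone.create_index("openai", dimension=len(embeds[0]))  # 1536; cosine metric is the default
index = pinecone.Index("openai")
```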
00:08:33.100 | So let's run that. And if we navigate back to the page here, the app.pinecone.io,
00:08:39.700 | we can refresh and we should see that we have an index here.
00:08:44.200 | It was initializing and now it's ready. So we can see all the details there.
00:08:49.000 | We see the dimensionality, the pod types we're using, metrics and so on.
00:08:53.800 | So these are just default values there. But yes, we do want to be using cosine, and then there's the pod type.
00:08:58.900 | You can change the pod type depending on what you're wanting to do.
00:09:02.400 | So back in our code, let's go ahead and begin populating that index.
00:09:07.100 | So to populate the index, we obviously need some data.
00:09:10.500 | We're just going to use a very small dataset, 1,000 questions from the TREC dataset.
00:09:15.800 | So let's load that. This we are getting from Hugging Face data sets.
00:09:21.500 | So if we actually go to huggingface.co/datasets/trec,
00:09:25.900 | we'll see the data set that we are downloading, which is this here.
00:09:31.500 | Okay. I think in total there's maybe 5,000-ish examples in there.
00:09:38.400 | We're just going to use the first 1,000 to make things really fast as we're walking through this example.
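(Loading that with Hugging Face datasets is one call:)

```python
from datasets import load_dataset

# first 1,000 questions from the TREC dataset
trec = load_dataset("trec", split="train[:1000]")
```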
00:09:43.200 | Okay. And yes, we can see we have text, coarse label,
00:09:46.200 | fine label. All we really care about here is actually the text.
00:09:49.900 | Okay. And we can have a look at the first one.
00:09:53.000 | How did serfdom develop in and then leave Russia?
00:09:56.600 | And we can also compare that over to here and we see that it's actually exactly the same.
00:10:01.300 | Okay, cool. So now what we want to do is we're going to create a vector embedding for each one of these samples.
00:10:08.700 | So well, let's walk through the logic of doing that.
00:10:12.000 | So we're going to be doing that in a loop. We're going to be doing it in batches of 32.
00:10:15.700 | And what we're going to do is extract the start position of the batch,
00:10:19.200 | which is I and the end position of that batch.
00:10:21.900 | And we're going to get all of the text within that batch.
00:10:26.500 | So this should actually be i_end. So we get all the text within the batch.
00:10:31.500 | We get all the IDs, which is just a count. You can use actual IDs if you want.
00:10:35.800 | For this example, it's not really needed.
00:10:37.600 | And then what we're going to do is we're going to create our embeddings using the OpenAI endpoint that we used before.
00:10:42.500 | So we have our inputs, which is our batch of text.
00:10:45.500 | We have the engine, which is the ada-002 model.
00:10:48.700 | And then here, we're just reformatting those embeddings into a format that we can then take and upsert into Pinecone.
00:10:56.800 | Also, later on when we're serving or querying,
00:11:01.500 | we don't want to just see these vectors, because they don't make any sense to us.
00:11:05.900 | We want to see the original text.
00:11:08.200 | So to make that easy, what we're going to do is pair our metadata.
00:11:12.500 | So the metadata is literally just that text that we want to see.
00:11:16.400 | That will basically just be some metadata attached to each one of our vectors.
00:11:20.600 | And it means that when we're querying, we can just return that and read the actual text rather than looking at the vectors.
00:11:26.900 | I mean, that's it. So we zip all this together.
00:11:29.000 | So each record is going to be a unique ID, the vector embedding, and the attached metadata.
00:11:35.400 | And then we upsert all that into Pinecone.
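(Putting that loop together, roughly; tqdm is just a progress bar, and the batch logic follows the description above:)

```python
from tqdm.auto import tqdm

batch_size = 32
for i in tqdm(range(0, len(trec['text']), batch_size)):
    i_end = min(i + batch_size, len(trec['text']))   # end position of this batch
    lines_batch = trec['text'][i:i_end]              # all the text within the batch
    ids_batch = [str(n) for n in range(i, i_end)]    # IDs are just a count
    # create the embeddings for the whole batch in one call
    res = openai.Embedding.create(input=lines_batch, engine="text-embedding-ada-002")
    embeds = [record['embedding'] for record in res['data']]
    # pair the original text with each vector as metadata
    meta = [{'text': line} for line in lines_batch]
    # each record is (unique ID, vector embedding, attached metadata)
    to_upsert = zip(ids_batch, embeds, meta)
    index.upsert(vectors=list(to_upsert))
```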
00:11:38.100 | So we can run that. It should be pretty quick.
00:11:41.200 | Okay. Yep. It's like 13 seconds, really fast.
00:11:44.600 | Okay. 14 seconds total. Really super fast for a thousand items.
00:11:49.600 | That's pretty insane. So with that, the indexing portion of our app is done.
00:11:55.600 | So all of this in green is now complete.
00:11:59.200 | So we can kind of cross that off. Now what we need to focus on is the querying.
00:12:03.200 | So how do we do querying? It's actually really easy.
00:12:05.800 | So we have a query. I'm going to say what caused the 1929 Great Depression?
00:12:10.700 | We're kind of limited in number of questions we can ask here because we do only have 1,000 examples indexed.
00:12:16.900 | In a realistic use case, you'd probably have millions or more indexed.
00:12:19.800 | So we're going to be limited on what we can actually ask here.
00:12:22.800 | But this is still good enough to demonstrate the workflow.
00:12:28.500 | So let's run this. Basically, we're doing the exact same thing for the query that we did with the lines of the TREC dataset before.
00:12:36.700 | So we're just embedding it using the ada-002 model.
00:12:40.300 | In this case, we just have one string input there.
00:12:43.900 | And then what we do is in that response, we're going to have data, we're going to retrieve the first item.
00:12:49.700 | There's just one item in there anyway. And we want the embedding from that.
00:12:53.600 | And, if I take a look at this, that will be a 1536-dimensional vector.
00:13:01.700 | And then what we can do is we pass that to index.query, like so.
00:13:08.900 | OK, so we can remove those square brackets there. Top k equals 5, include metadata.
00:13:16.300 | We do want to include this. This is going to return the text, the original text back to us.
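(The whole query step, sketched with the same era APIs:)

```python
query = "What caused the 1929 Great Depression?"

# embed the query to create the query vector xq
xq = openai.Embedding.create(input=query, engine="text-embedding-ada-002")['data'][0]['embedding']

# return the top 5 most similar indexed vectors, with their metadata
res = index.query(xq, top_k=5, include_metadata=True)
```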
00:13:21.400 | So let's see, are we returning questions that are similar to the question we asked?
00:13:26.700 | OK, so the top result is "Why did the world enter a global depression in 1929?", which is the Great Depression.
00:13:33.200 | I don't know what is with the weird formatting here.
00:13:36.800 | And then it's talking about some other things that are maybe somewhat related.
00:13:40.100 | I'm not really sure, or just things from around that sort of time era.
00:13:44.600 | But we can see that the score here, the similarity, does drop really quickly when we come down to these.
00:13:51.400 | Because they're actually not that relevant. They're just kind of within the same context, I suppose.
00:13:56.200 | So that's pretty cool.
00:13:58.200 | It's clearly returning the correct question that we would expect it to, based on the question we asked.
00:14:04.100 | OK, so we can also format that a little bit nicer.
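(For example, something like:)

```python
for match in res['matches']:
    print(f"{match['score']:.2f}: {match['metadata']['text']}")
```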
00:14:06.900 | So here, just run that. We can see a little bit easier to read than in this sort of response format that we had up here.
00:14:14.000 | Now let's make it a little bit harder. We're just going to replace the correct term "depression" with the incorrect term "recession".
00:14:19.800 | And see if it still understands our query, because this is where a lexical search,
00:14:24.000 | so where you're searching by keywords, would fail.
00:14:26.100 | In this case, we should see, hopefully, that it does not fail.
00:14:28.500 | So we replicate the same logic again.
00:14:32.400 | And we can see that, yes, the similarity is slightly lower, because we're using a different word.
00:14:37.100 | But it's still returning the relevant question as our first example there.
00:14:42.100 | OK, that is pretty cool. Now let's make it even harder.
00:14:46.500 | Why was there a long-term economic downturn in the early 20th century?
00:14:50.600 | Is it going to figure out that this is what we're talking about?
00:14:53.400 | That we're talking about the global depression of 1929?
00:14:57.800 | And yes, it does. And the similarity is actually pretty good there.
00:15:01.300 | So despite not really sharing any of the same words,
00:15:03.900 | it actually manages to identify that this is talking about the same thing, which is pretty good.
00:15:09.700 | Now with that done, we can finish with this example.
00:15:12.700 | So one thing you might need to do here is head over to Pinecone console,
00:15:18.200 | and you can just go ahead and delete the index, or you can do it in code. Completely up to you.
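(In code, deleting the index is a single call:)

```python
pinecone.delete_index("openai")
```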
00:15:23.400 | Great. So that's it for this walkthrough and example.
00:15:27.500 | I hope this has been useful. It's really cool to see OpenAI's new embedding model.
00:15:32.600 | And from what I've heard, the performance, although not as clear from this example, is really good.
00:15:38.800 | And as you have seen from this example, it's super easy to use.
00:15:42.100 | So a few lines of code, and we have this really cool, really high-performance,
00:15:46.800 | semantic search example with OpenAI and Pinecone, and we don't really need to worry about anything.
00:15:53.000 | It's just super easy to do. So I hope this has all been interesting and useful.
00:15:57.800 | Thank you very much for watching, and I'll see you again in the next one. Bye.