How to build next-level Q&A with OpenAI
Chapters
0:00 Intro
4:07 Embedding the Q&A
10:08 Creating the index
11:54 Querying
17:13 Outro
Today, I want to show you a demo of how you can do generative question answering using OpenAI and Pinecone.
Now recently, we hosted a workshop between OpenAI and Pinecone, and people had the opportunity to submit questions.
And also, if you would like to ask questions yourself, you can just go to this web address up here.
So one of those that I quite liked was: how do I define a PyTorch tensor of all zeros?
So we come down here; we have the style setting at the moment, so it's a paragraph per question.
Here I've just removed the filters for Streamlit, HuggingFace, and TensorFlow.
So we're only pulling information from the PyTorch forums, but we can add those back in, and it should be able to figure it out anyway.
But when we know we're looking at PyTorch, why not add that in there?
So there are a few ways to create a PyTorch tensor of all zeros.
Oh, okay, so we get a slightly different answer this time, interesting.
One way is to use the PyTorch function torch.zeros.
And then another is using the PyTorch function torch.zeros_like.
So it's giving you two options here and also describing how they vary, which is really cool.
So now we can also have a look at the sources.
If we look at the sources, we'll see that they're all just coming from PyTorch now.
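For reference, here is a minimal sketch of the two approaches the model is describing:

```python
import torch

# Option 1: torch.zeros builds a tensor of zeros with the shape you ask for.
a = torch.zeros(2, 3)       # shape (2, 3), all zeros

# Option 2: torch.zeros_like copies the shape, dtype, and device of an
# existing tensor and fills the new one with zeros.
b = torch.rand(2, 3)
c = torch.zeros_like(b)     # same shape and dtype as b, all zeros
```

That difference, explicit shape versus shape copied from another tensor, is exactly the variation the answer is pointing at.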
And what I want to do is just say, okay, can you summarize what OpenAI CLIP is?
So: OpenAI CLIP is a contrastive language-image pre-training model that sees pairs of images and text and returns a matrix of cosine similarities between the text and each image.
It's written in PyTorch, uses BCE loss, and, well, yeah, that's really cool.
And then I also want to summarize the problems that people have.
So essentially this is going to summarize the problems that it sees in the context.
So when people are asking questions, it's going to summarize the problems that people are having with that particular topic of OpenAI CLIP.
At first it just says this page was originally published on that site.
Actually, since we're just pulling from PyTorch, that's probably not a good idea.
We should probably include HuggingFace in there, maybe TensorFlow as well.
So: how to use the CLIP model for image search and style transfer, how to fine-tune the CLIP model, and so on.
And the questions generally seek to find out whether the CLIP model can be used for those sorts of tasks.
And you can see where it's coming from here: OpenAI CLIP for image search, style transfer, and so on.
So yeah, that's, I think, a really cool example of OpenAI's embedding and generation models in action.
So I want to just show you a little bit of how that works.
So over on the left here, we have the dataset indexing stage.
This starts with scraping data from different websites.
So we use the forum websites (PyTorch, for example, here) and also all the ones that you saw over here: Streamlit, PyTorch, HuggingFace, and TensorFlow.
So I scraped all of those, and that resulted in these four JSON lines files.
Now, let me show you what they look like.
We just have these lines with all this information.
We have the docs that it's coming from, the category, which is vision in this case, the thread that it's coming from (like the topic of the conversation in the PyTorch forums), the actual link to it, the question, and then a load of answers, basically, or just responses to the question.
So we have those, and this is what you end up seeing here.
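To make that concrete, here is a hypothetical record in the shape just described; the exact key names in the real files may differ:

```python
import json

# One illustrative record; the field names follow the description above
# (docs, category, thread, link, question, answers) but are assumptions.
record = {
    "docs": "pytorch",
    "category": "vision",
    "thread": "How to normalize input images?",
    "href": "https://discuss.pytorch.org/t/...",
    "question": "How should I normalize images before training?",
    "answers": ["You can use torchvision.transforms.Normalize ..."],
}

# A JSON lines file is just one JSON object per line.
with open("pytorch.jsonl", encoding="utf-8") as f:
    threads = [json.loads(line) for line in f]
```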
So here, I've just concatenated them all together, and then we return over to here and feed them into OpenAI's embedding endpoint.
If we just go over to the API and into the docs, we can come down to embeddings; this tells us about the embeddings, how they work, and how to create them.
So yeah, if you want to have a look at that, you can do, but we're going to see it in the code anyway.
So switching back over to the code, if we come down a little bit: here, I'm just cleaning up the data.
And one thing you can do is this: the tokenizer for GPT-3, the embedding model, is similar to the GPT-2 tokenizer.
So we can actually download that tokenizer from Transformers and check roughly how many tokens we're going to have in what we're feeding through to the OpenAI API.
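As a rough sketch of that token check, using the GPT-2 tokenizer from Hugging Face Transformers as the stand-in:

```python
from transformers import GPT2TokenizerFast

# GPT-2's tokenizer is close enough to GPT-3's to give a rough token
# count before sending text to the OpenAI API.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "How do I create a PyTorch tensor of all zeros?"
n_tokens = len(tokenizer(text)["input_ids"])
print(n_tokens)
```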
So we'll go down, and then here we can go ahead and actually get some embeddings.
All we're doing here is using OpenAI's embeddings endpoint.
And you do need an account for all of this, so I'm just pointing that out.
You also need an API key, which you can get here.
So yeah, if you need that as well, that's where you go.
So this is just our ordered data from before, and then we also have our embeddings that we've just created.
And then we're just saving them to this embeddings pickle file.
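That step might look something like this; a minimal sketch assuming the pre-1.0 openai Python client, with an illustrative engine name and file name:

```python
import pickle
import openai

openai.api_key = "YOUR_API_KEY"  # from your OpenAI account settings

# Embed a batch of thread texts; the engine name here is illustrative.
res = openai.Embedding.create(
    input=["first thread text", "second thread text"],
    engine="text-search-curie-doc-001",
)
embeddings = [r["embedding"] for r in res["data"]]

# Save the embeddings locally so we don't have to re-create them later.
with open("embeddings.pkl", "wb") as f:
    pickle.dump(embeddings, f)
```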
Now, at that point, we move over to the index initialization step.
So we take everything we've created: we got all our text, came through here, created the embeddings, and now we're at this stage.
This is the Pinecone vector database, and we're just putting all of our embeddings into there.
So we first load our data and check that it's actually there; it looks okay.
Now, this is changing pretty soon, so by the time you're watching this it might have already changed, but the current max size limit for metadata in Pinecone is 5 kilobytes.
So what we need to do is just check: is the text field that we're feeding in here going over that limit?
What we can do is either trim down, like truncate, those text fields, or we can just keep the text locally and then map the IDs to that text when results are fed back to us.
Now, in this case, I'm just going to map them to IDs.
So we're not going to put our text data in Pinecone, just the vectors.
But obviously, when the most relevant vectors are returned back to us, we need a way to understand what that text is so that we can feed it into our OpenAI generation model.
So we need a way to go from those IDs to the original text, which is all we're doing here.
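The size check itself is simple; a sketch using the hypothetical record fields from the earlier example:

```python
LIMIT_BYTES = 5_000  # roughly the 5KB metadata cap mentioned above

# Find records whose text would blow past the metadata limit if we
# tried to store it in Pinecone directly.
too_big = [
    t for t in threads
    if len(" ".join(t["answers"]).encode("utf-8")) > LIMIT_BYTES
]
print(f"{len(too_big)} records exceed the limit")
```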
So I'm creating an ID for each record, which is literally just a number, nothing special.
And then after we've created those IDs, I'm just initializing the index.
We do need to make sure that we're using the cosine metric and also the correct dimensionality.
So we have our embeddings that we've created up here, and here we're just saying: use the dimensionality of those embeddings.
I'm not sure exactly what it is for the query model; it might be something like 2,048-dimensional vectors.
So we do that, and then we just connect to our index after we've created it here.
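A sketch of that initialization, assuming the pinecone-client of that era; the index name and environment here are illustrative:

```python
import pinecone

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-west1-gcp")

index_name = "beyond-search-openai"  # illustrative

# Cosine similarity, with the dimensionality taken from the embeddings
# we just created rather than hard-coded.
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=len(embeddings[0]),
        metric="cosine",
    )

index = pinecone.Index(index_name)
```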
At that point, we can then begin to populate the index with the embeddings that we've created.
So what we're doing is including the ID, then the embedding, and then the metadata.
The metadata is pretty important if we want to do any filtering; we can see here we've got docs, category, and thread, and also the link.
That's really important for our app, because we're including all of this information in the sources section.
If we have a look here, we can see we have these sources: the docs, followed by the category, followed by the name of the thread.
And if we click on one here, it will take us through to the thread.
So all that metadata that we specified there is used to build that sources section.
And then finally, as I mentioned before, Pinecone isn't going to, at this point, handle storing the full text for us.
So all I do is create another dataset, which contains the ID-to-text mappings, and then just save that to a JSON file; nothing crazy there.
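Putting those two steps together, a sketch of the upsert and the local ID-to-text mapping, again reusing the hypothetical field names from earlier:

```python
import json

ids = [str(i) for i in range(len(threads))]

# Upsert (id, vector, metadata) tuples; the metadata is what powers the
# filters and the sources section in the app.
to_upsert = [
    (
        ids[i],
        embeddings[i],
        {
            "docs": threads[i]["docs"],
            "category": threads[i]["category"],
            "thread": threads[i]["thread"],
            "href": threads[i]["href"],
        },
    )
    for i in range(len(threads))
]

# Send the vectors up in batches.
for i in range(0, len(to_upsert), 100):
    index.upsert(vectors=to_upsert[i : i + 100])

# Pinecone only stores the vectors and metadata, so keep an ID -> text
# map locally to recover the original passages at query time.
mappings = {ids[i]: " ".join(threads[i]["answers"]) for i in range(len(threads))}
with open("mapping.json", "w", encoding="utf-8") as f:
    json.dump(mappings, f)
```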
OK, so what we just went through was this bit here, which seems like very little when you look at the diagram.
So we're now kind of done with the indexing part.
We only need to repeat it if we're adding more data to our tool.
The next part, querying, is what the users are going to be doing.
So we enter this search term, "how to use GradientTape in TensorFlow", as I mentioned before.
And if we were going to go through that in code, it would look like this.
OK, so we load the mappings that we created earlier.
We connect to OpenAI, and then we use the embeddings endpoint to actually create our embeddings.
So we define a function that is going to use OpenAI to create the query embedding.
Then we use that query embedding to retrieve the most relevant contexts from Pinecone.
And then we feed them into the generation model and retrieve an answer.
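A sketch of that retrieval step, with an illustrative query-side engine name and assuming the same client versions as before:

```python
def retrieve_contexts(question, top_k=5):
    # Embed the query with the query-side embedding engine (illustrative name).
    res = openai.Embedding.create(
        input=[question],
        engine="text-search-curie-query-001",
    )
    xq = res["data"][0]["embedding"]

    # Pull the most similar records from Pinecone.
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)

    # Map the returned IDs back to the original text.
    return [mappings[m["id"]] for m in res["matches"]]
```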
Then we go through all of the contexts we've retrieved, and we create one really big string of context, which you can see here, actually.
So you see, we have the created context for "how do I use a GradientTape in TensorFlow?".
This isn't the question that we're going to ask the generative model; this is just all of the source context that we're going to attach to it.
So we're going to say: based on all this information here, answer a particular question.
And we can see here the sort of prompts we're going to use.
So, "conservative Q&A": answer the question based on the context below, and if the question can't be answered based on the context, say "I don't know".
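That conservative Q&A prompt might be assembled something like this; a sketch where the separators and exact wording are approximations of what's shown on screen:

```python
def create_prompt(question, contexts):
    # Join the retrieved passages into one big context string.
    context_str = "\n\n###\n\n".join(contexts)
    # Instruct the model to admit when the context doesn't contain the answer.
    return (
        "Answer the question based on the context below, and if the "
        "question can't be answered based on the context, say "
        '"I don\'t know".\n\n'
        f"Context: {context_str}\n\n---\n\n"
        f"Question: {question}\nAnswer:"
    )
```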
So I just want to show you what that might look like in the actual demo.
So let's restrict everything to Streamlit, and we'll ask about OpenAI CLIP.
Then we just change the style to conservative Q&A and go down.
So that's how the generative model is reading the questions, literally reading them and producing this pretty intelligent answer based on what we've given it.
But the reason it said it didn't know is because the answer was not found within the context.
There was nothing in the Streamlit docs that we filtered down to that said anything about OpenAI CLIP.
And then, as well as that, we also pass in the start of the answer.
We prompted OpenAI's generative model to actually answer it by ending the prompt with "Answer:".
So it was basically being told: finish this sentence.
But you get these really incredible responses.
OK, and then there are just a few more examples here, actually.
So you see, "GPT-2 strengths and weaknesses", and it says, "I don't know".
So there's obviously not enough in there for it to do anything.
There's a really good one here, which is to extract key libraries and tools.
So: "embedding models which embed images and text".
And let's make sure we have everything in there.
OK, and we get this really cool list of all the embedding models that you would use for extracting, or which can embed, images and text.
So that is, I think, pretty much all of it.
There's a little more code, I think, if we go up a little bit.
So we're just creating the context here, which is what we covered above.
And then we're just saying: go to the OpenAI completion endpoint.
And that's going to complete the question, or the text, that we've given to it.
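Tying the earlier sketches together, the final call might look like this; the model name and parameters here are illustrative, not necessarily what the demo used:

```python
question = "How do I use GradientTape in TensorFlow?"
contexts = retrieve_contexts(question)

# Complete the prompt; the model continues from the trailing "Answer:".
res = openai.Completion.create(
    engine="text-davinci-002",
    prompt=create_prompt(question, contexts),
    max_tokens=400,
    temperature=0.0,
)
answer = res["choices"][0]["text"].strip()
print(answer)
```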
And yeah, this sort of thing almost blows me away a little bit with how impressive it is, because it seems almost genuinely intelligent.
You ask a question, and it gives you a genuinely intelligent response that is pulled from all these different sources and formatted in a particular way based on what you asked for.
And yeah, these generative models are definitely something I want to read about more, because they're clearly really impressive and very useful.
I hope this has been interesting, to see this demo and go through how everything works.
And if you are interested in trying it out, you can do.
You just go to share.streamlit.io, then pinecone-io, playground, beyond-search-openai.
And if you'd like to take a look at the code for that as well, you can do.
You just head over to this URL here, and you have all the code I just showed you, and also the data as well, if you'd like that.
I hope this has been useful, and I will see you again in the next one.