
How to build next-level Q&A with OpenAI


Chapters

0:00 Intro
4:07 Embedding the QA
10:08 Creating the index
11:54 Querying
17:13 Outro

Whisper Transcript

00:00:00.000 | Today, I want to show you a demo of how you can use generative question answering using
00:00:10.800 | Pinecone and OpenAI.
00:00:13.480 | Now recently, we hosted a workshop between OpenAI and Pinecone, and people had the opportunity
00:00:20.440 | to ask questions themselves as well.
00:00:23.560 | And also, if you would like to ask questions, you can just go to this web address up here.
00:00:30.160 | And they asked some really good questions.
00:00:32.320 | So one of those that I quite liked was, how do I define a PyTorch tensor of all zeros?
00:00:48.020 | So we come down here, we have the style option; at the moment, it's set to paragraph about question.
00:00:53.840 | We'll leave it at that for now.
00:00:56.300 | I've just removed the filters here for Streamlit, HuggingFace, and TensorFlow.
00:01:03.100 | So we're only pulling information from the PyTorch forums, but we can add those back
00:01:07.760 | in, and it should be able to figure things out anyway.
00:01:10.280 | But when we know we're looking at PyTorch, why not add that in there?
00:01:17.080 | Then we get this.
00:01:18.300 | So there are a few ways to create a PyTorch tensor of all zeros.
00:01:22.800 | I don't know why it's re-running there.
00:01:25.880 | Oh, okay, so we get a slightly different answer, interesting.
00:01:32.120 | One way is to use the PyTorch function torch.zeros.
00:01:35.780 | Okay, yeah, that's cool.
00:01:39.680 | And then another way is to use the PyTorch function torch.zeros_like.
00:01:46.480 | And then it explains what it does as well.
00:01:48.980 | So that's quite cool.
00:01:49.980 | So it's giving you two options here and also describing how they vary, which is really
00:01:59.760 | nice.
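For reference, here's a minimal sketch of those two approaches (my own example, not the app's output):

```python
import torch

# torch.zeros builds a zero tensor from an explicit shape.
a = torch.zeros(3, 4)

# torch.zeros_like builds a zero tensor matching another tensor's
# shape, dtype, and device.
b = torch.zeros_like(a)
```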
00:02:01.920 | So now we can also have a look at the sources.
00:02:04.680 | So if we look at the sources, we'll see that they're all just coming from PyTorch now.
00:02:10.200 | Let's ask another question.
00:02:12.080 | So this one's not even really a question.
00:02:13.980 | I'm just going to say OpenAI CLIP.
00:02:18.560 | And what I want to do is just say, okay, can you summarize what OpenAI CLIP is?
00:02:27.640 | So we'll come down here.
00:02:28.640 | Let's see what it, see what it returns.
00:02:31.640 | Cool.
00:02:32.640 | So OpenAI CLIP is a contrastive language-image pre-training model that sees pairs of images
00:02:38.540 | and text and returns a matrix of cosine similarities between the text and each image.
00:02:44.560 | Okay, that's cool.
00:02:47.080 | So it's written in PyTorch, uses BCE loss -- well, yeah, that's really cool.
00:02:56.600 | So that's just a paragraph.
00:02:58.020 | And then, next, I want to summarize the problems that people have.
00:03:03.160 | So essentially this is going to summarize the problems that it sees in the context.
00:03:06.880 | So when people are asking questions, it's going to summarize those problems that people
00:03:10.280 | are having, given that particular topic of OpenAI CLIP.
00:03:15.740 | So we'll come down.
00:03:17.240 | It just says this page was originally published on that site.
00:03:20.080 | Let me rerun
00:03:21.220 | and see if it comes up with anything.
00:03:26.160 | Actually, since we're only pulling from PyTorch, it's probably not a good idea.
00:03:31.080 | We should probably include HuggingFace in there, maybe TensorFlow as well.
00:03:35.920 | Okay.
00:03:38.120 | So how to use the CLIP model for image search and style transfer, how to fine-tune the CLIP model,
00:03:45.240 | how to train the model on medical data.
00:03:48.560 | And the questions generally seek to find out whether the CLIP model can be used to generate
00:03:53.440 | replies to English language text input.
00:03:57.420 | So that's really cool.
00:03:58.420 | And you can see where it's coming from here.
00:03:59.840 | So OpenAI CLIP for image search, style transfer, and so on.
00:04:05.960 | So yeah, that's, I think, a really cool example of OpenAI's embedding and generation models
00:04:16.360 | used in unison with Pinecone as well.
00:04:20.000 | So I want to just show you a little bit of how that works.
00:04:23.900 | So over on the left here, we have the indexing data set stage.
00:04:29.800 | So this starts with scraping data from different websites.
00:04:34.480 | So we use the forum website, so PyTorch, for example, here, and also all the ones that
00:04:41.760 | you saw over here, so Streamlit, PyTorch, HuggingFace, and TensorFlow.
00:04:48.000 | So I scraped all of those, and that resulted in these four JSON line files.
00:04:56.720 | Now, let me show you what they look like.
00:05:00.440 | And we just have these lines of all this information.
00:05:02.680 | We have the thread, so we have, okay, the docs that it's coming from, the category, which
00:05:09.920 | is vision in this case, the thread that it's coming from, so like the topic of the conversation
00:05:19.680 | in the PyTorch forums, the actual link to it, the question, and then we have a load
00:05:24.880 | of answers, basically, or just responses to the question.
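One of those lines looks roughly like this (the field names here are my assumption from what's described, not the exact schema):

```python
import json

# One record per line; the field values are illustrative.
line = (
    '{"docs": "pytorch", "category": "vision", '
    '"thread": "How to create a tensor of zeros?", '
    '"href": "https://discuss.pytorch.org/t/...", '
    '"question": "How do I ...?", '
    '"context": "You can use torch.zeros ..."}'
)
record = json.loads(line)
print(record["docs"], record["category"], record["thread"])
```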
00:05:28.920 | So we have those, and this is what you end up seeing here.
00:05:31.800 | So here, I've just concatenated them all together, and then we return over to here, and we feed
00:05:37.920 | all those into an OpenAI embedding model.
00:05:41.680 | So if we just went over here to the API, go to docs, and we can come down, if we come
00:05:51.080 | to embeddings, this talks to us about the embeddings, and how they work, and how to
00:05:58.400 | actually use them.
00:06:01.120 | So yeah, I mean, if you want to have a look at that, you can do, but we're going to see
00:06:05.120 | this in a moment anyway.
00:06:06.900 | So switching back over to code, if we come down a little bit, so here, I'm just cleaning
00:06:13.400 | it up.
00:06:14.400 | And one thing worth knowing is that the tokenizer for GPT-3 and the embedding models is similar
00:06:22.160 | to the GPT-2 tokenizer.
00:06:24.420 | So we can actually download that tokenizer from Transformers, and check roughly how many
00:06:32.200 | tokens we're going to have in what we're feeding through to the OpenAI API.
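That check might look something like this (a sketch, assuming the Hugging Face transformers library):

```python
from transformers import GPT2TokenizerFast

# GPT-3-era models use a tokenizer close to GPT-2's, so this gives
# a rough token count before we send anything to the OpenAI API.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "How do I create a PyTorch tensor of all zeros?"
n_tokens = len(tokenizer(text)["input_ids"])
print(n_tokens)  # rough count of tokens the API will see
```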
00:06:39.800 | So we'll go down, and then here, we can go ahead and actually get some embeddings.
00:06:44.960 | So all we're doing here is using the get_embedding helper, which comes from -- if I just write
00:06:53.120 | this in -- so it's a pip install openai.
00:06:58.080 | And you do need an account for this, so I'm just pointing that out.
00:07:02.440 | You also need an API key, which you can get here.
00:07:11.380 | So yeah, if you need that as well, you go there.
00:07:14.320 | And then you can see everything.
00:07:15.760 | So this is just our ordered data from before, and then we also have our embeddings that
00:07:20.160 | we just created using that endpoint.
00:07:23.280 | And then we're just saving them to this embeddings pickle file.
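Roughly, that step looks like this (a sketch using the OpenAI Python client of the time; the engine name is my assumption, chosen because the index we'll see shortly uses 4096-dimensional vectors, which matches the curie-sized search models):

```python
import pickle

import openai
from openai.embeddings_utils import get_embedding

openai.api_key = "OPENAI_API_KEY"  # from your OpenAI account

# `texts` holds the concatenated thread texts from the JSON lines files.
texts = ["first thread text...", "second thread text..."]

# Embed each record; the engine name here is an assumption.
embeddings = [
    get_embedding(t, engine="text-search-curie-doc-001") for t in texts
]

# Save everything locally, e.g. to a pickle file.
with open("embeddings.pkl", "wb") as fp:
    pickle.dump({"texts": texts, "embeddings": embeddings}, fp)
```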
00:07:27.120 | Now, at that point, we move over to the index initialization step.
00:07:31.200 | So over in our visual, that looks like this.
00:07:36.120 | So we just take all those we've created, got all our text, come through here, created the
00:07:42.120 | embeddings, come over here, and now we're at this stage.
00:07:45.520 | So the Pinecone VectorDB, we're just putting all of our embeddings into there.
00:07:50.440 | So let's have a look at how we do that.
00:07:54.040 | So we first load our data, check that it's actually there, looks OK.
00:08:02.120 | So the current-- this is changing pretty soon.
00:08:05.920 | So maybe by the time you're watching this, it might have already changed.
00:08:09.960 | But the max size limit for metadata in Pinecone is 5 kilobytes.
00:08:14.680 | So what we need to do is just check, OK, is the text field that we're feeding in here
00:08:19.960 | greater than that?
00:08:21.360 | If so, we need to clean up a little bit.
00:08:24.680 | And yes, it is.
00:08:27.480 | So in this case, a lot of them are too big.
00:08:32.200 | So what we can do is either trim down, like truncate those text fields, or we can just
00:08:38.180 | keep the text locally and then just map the IDs to that text when we're feeding them back
00:08:46.040 | in. Now, in this case, I'm just going to map them to IDs.
00:08:49.680 | So imagine -- we're not going to put our text data in Pinecone, just the vectors.
00:08:56.560 | But we also include an ID.
00:08:58.520 | But obviously, when we're returning the most relevant vectors back to us, we need a way
00:09:05.480 | to understand what that text is so that we can feed it into our OpenAI GPT or generation
00:09:11.520 | model.
00:09:13.680 | So we need a way to go from those IDs to the original text, which is all we're doing here.
00:09:20.600 | So I'm getting the IDs, or I'm creating an ID, which is literally just numbers, nothing
00:09:27.560 | special.
00:09:28.560 | It doesn't have to be anything special.
00:09:29.560 | It just needs to be unique.
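Something like this (a sketch, reusing the `texts` list from the embedding step):

```python
# IDs don't need to be anything special, just unique; stringified
# counters are enough.
ids = [str(i) for i in range(len(texts))]
```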
00:09:33.740 | And then after we've created those IDs, I'm just initializing the index.
00:09:38.400 | You can do this before or after if you want.
00:09:41.800 | It doesn't really matter.
00:09:43.920 | We do need to make sure that we're using the cosine metric and also using the correct dimensionality.
00:09:52.400 | So we have our embeddings that we've created up here.
00:09:56.560 | And here, we're just saying, OK, use the dimensionality that is of those embeddings.
00:10:02.400 | So I think-- I'm not sure how much it is for the query model.
00:10:04.800 | It might be something like 2,048-dimensional vectors.
00:10:08.400 | I'm not sure.
00:10:09.800 | Actually, we can check how it looks.
00:10:13.960 | So go to the Pinecone console, to the openai index.
00:10:19.400 | OK, so it's 4,096 dimensions.
00:10:26.400 | Cool.
00:10:27.880 | So we do that, and then we just connect to our index after we've created it here.
00:10:34.880 | So create our index.
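In code, that looks roughly like this (a sketch using the Pinecone client of the time; the environment name is an assumption):

```python
import pinecone

pinecone.init(api_key="PINECONE_API_KEY", environment="us-west1-gcp")

# Dimensionality must match the embedding model (4096 here), and we
# use the cosine similarity metric.
if "openai" not in pinecone.list_indexes():
    pinecone.create_index("openai", dimension=4096, metric="cosine")

# Connect to the index once it's created.
index = pinecone.Index("openai")
```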
00:10:36.320 | At that point, we can then begin to populate the index with the embeddings that we've created
00:10:41.960 | from OpenAI.
00:10:43.760 | So that's what I'm doing here.
00:10:44.960 | We're doing it in batches of 32.
00:10:48.240 | So what we're doing is including the ID and then the embedding.
00:10:53.080 | And then we're also including metadata.
00:10:54.720 | So this is pretty important if we want to do any filtering or-- we can see here we've
00:11:00.640 | got docs, category, and thread, and also the link.
00:11:04.880 | That's really important for our app, because we're including all of this information in
00:11:08.600 | the sources section that we saw before.
00:11:10.200 | If we have a look here, we can see we have these sources.
00:11:14.040 | So we have the docs, followed by the category, followed by the name of the thread.
00:11:20.400 | And also, if we click on it here, it will take us through to the thread.
00:11:26.200 | So all that metadata that we specified there is used to build that sources section there.
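So the upsert step looks roughly like this (a sketch; `records` is assumed to hold the parsed JSON lines data from earlier, and the metadata field names mirror what the sources section needs):

```python
# Upsert in batches of 32 as (id, vector, metadata) tuples.
batch_size = 32
for i in range(0, len(ids), batch_size):
    i_end = min(i + batch_size, len(ids))
    batch = [
        (
            ids[j],
            embeddings[j],
            {
                "docs": records[j]["docs"],  # e.g. "pytorch"
                "category": records[j]["category"],
                "thread": records[j]["thread"],
                "href": records[j]["href"],
            },
        )
        for j in range(i, i_end)
    ]
    index.upsert(vectors=batch)
```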
00:11:33.680 | And then finally-- so I mentioned before, Pinecone isn't going to, at this point, handle
00:11:38.440 | all of that text information.
00:11:40.480 | So all I do is create another data set, which contains the ID to text mappings.
00:11:49.120 | And then I just save that to a JSON file; nothing crazy there.
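A sketch of that mapping step (the filename is just illustrative):

```python
import json

# Keep an ID -> text mapping locally so we can recover the original
# text for whatever IDs Pinecone returns at query time.
id_to_text = dict(zip(ids, texts))

with open("mappings.json", "w") as fp:
    json.dump(id_to_text, fp)
```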
00:11:55.760 | OK, so what we just went through was this bit here, which seems like very little
00:12:06.100 | for what we just spoke about.
00:12:08.760 | But yeah, that's that bit.
00:12:11.020 | So we're now kind of done with the indexing part.
00:12:13.720 | So we can cross that off.
00:12:15.240 | We're done with indexing everything.
00:12:17.240 | We only need to repeat this again if we're adding more data to our tool.
00:12:22.960 | And then we'll go over to the querying bit.
00:12:26.560 | So here, we're going to query.
00:12:29.080 | This is what the users are going to be doing.
00:12:31.040 | So we enter this search term, how to use Gradient Tape in TensorFlow, as I mentioned before.
00:12:37.040 | And if we were going to go through that, it would look like this.
00:12:45.040 | OK, so we have the mappings that we have here.
00:12:52.160 | So that's the ID to text.
00:12:54.420 | We connect to OpenAI, and then we use this get_embedding helper to actually create our embeddings.
00:13:01.880 | We don't run that here, though.
00:13:05.400 | We load our Pinecone index over here.
00:13:11.280 | And then we just define a function that is going to use OpenAI to create the query embedding.
00:13:20.080 | And then we use that query embedding to retrieve the most relevant context from Pinecone.
00:13:26.960 | And then we feed them back into the generation model and retrieve an answer.
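Sketching that flow (the query-side engine name is my assumption, paired with the doc-side model above):

```python
from openai.embeddings_utils import get_embedding

query = "How do I use Gradient Tape in TensorFlow?"

# Embed the query with the query-side model (an assumption).
xq = get_embedding(query, engine="text-search-curie-query-001")

# Retrieve the most relevant records from Pinecone.
res = index.query(vector=xq, top_k=5, include_metadata=True)

# Map the returned IDs back to their original text.
contexts = [id_to_text[m["id"]] for m in res["matches"]]
```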
00:13:34.960 | OK, so get the embedding.
00:13:38.920 | Where is that?
00:13:39.920 | So here, you can see.
00:13:41.280 | So this one here is the query model.
00:13:43.720 | You can see that there.
00:13:47.520 | And then we're querying Pinecone here.
00:13:52.120 | And then we're just going through all of those that we've got, and we're creating a really
00:13:57.320 | big string of context, which you can see here, actually.
00:14:00.880 | So you see, we have createContext, how do I use a Gradient Tape in TensorFlow.
00:14:07.440 | And this creates the context.
00:14:10.360 | This isn't the answer, and it isn't the question that we're going to ask the generative
00:14:16.080 | model either.
00:14:17.080 | This is just all of the source context that we're going to attach to it.
00:14:21.200 | So we're going to say, OK, based on all this information here, answer a particular question.
00:14:27.640 | So let's go down.
00:14:31.600 | And we can see, OK, here are the sort of questions we're going to ask.
00:14:36.760 | So conservative Q&A, answer the question based on the context below.
00:14:42.920 | And if the question can't be answered based on the context, say, I don't know, OK?
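Put together, the "conservative Q&A" prompt looks something like this (the exact wording is an approximation, reusing `contexts` and `query` from above):

```python
# Join the retrieved contexts and frame the instruction described above.
prompt = (
    "Answer the question based on the context below, and if the "
    "question can't be answered based on the context, say "
    "\"I don't know\".\n\n"
    "Context:\n"
    + "\n\n---\n\n".join(contexts)
    + f"\n\nQuestion: {query}\nAnswer:"
)
```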
00:14:49.680 | And then we go on.
00:14:51.100 | So I just want to show you what that might look like in the actual demo.
00:14:58.120 | So let's go with -- let's restrict everything to Streamlit, and we'll ask about OpenAI CLIP,
00:15:06.920 | maybe.
00:15:07.920 | Yeah, let's do that.
00:15:09.800 | And then we just change to conservative Q&A and go down.
00:15:15.960 | And we see that we get I don't know, right?
00:15:18.520 | So that's how the generative model is kind of reading the questions, literally reading
00:15:26.160 | them and kind of producing this pretty intelligent answer based on what we've given it.
00:15:33.800 | It's following the instructions really well.
00:15:36.740 | So it had this I don't know.
00:15:39.440 | But the reason it didn't know is because what it was given there was not found within the
00:15:45.560 | context, OK?
00:15:47.240 | There was nothing in the Streamlit docs that we filtered down to that said anything about
00:15:51.840 | OpenAI CLIP.
00:15:53.580 | And then as well as that, we also pass in the "Answer:" prompt.
00:15:57.800 | And then we prompted OpenAI's generative model to actually answer that, by completing it.
00:16:04.200 | So we were basically saying, finish this sentence.
00:16:07.920 | That's all we're really doing here.
00:16:09.800 | But you get these really incredible responses.
00:16:14.600 | OK, and then there's just a few examples here, actually.
00:16:21.680 | So you see, well, GPT-2 strengths and weaknesses, and it says, I don't know.
00:16:26.560 | So there's obviously not enough in there for it to do anything.
00:16:29.800 | There's a really good one here, which is the extract key libraries and tools.
00:16:34.280 | So embedding models, which embed images and text.
00:16:38.960 | OK, let me copy this.
00:16:43.440 | And what was it?
00:16:44.960 | Was it bullet points?
00:16:45.960 | Oh, no, extract key libraries and tools.
00:16:48.840 | Let's do that.
00:16:51.200 | And let's make sure we have everything in there.
00:16:54.120 | OK, and we get this really cool sort of list of all the embedding models that you would
00:17:03.400 | use for extracting or which can embed images and text.
00:17:10.160 | So that's really, really impressive.
00:17:13.060 | So that is, I think, pretty much all of that.
00:17:17.560 | There's a little more code, I think, if we go up a little bit.
00:17:23.080 | So I kind of skipped over this.
00:17:25.160 | So we're just creating a context here, which is what we covered above.
00:17:31.040 | And then we're just saying, OK, go into the OpenAI completion endpoint.
00:17:36.640 | And that's going to complete the question or the text that we've given to it.
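That call might look like this (a sketch against the Completion endpoint of the time; the engine name and parameters are assumptions):

```python
import openai

res = openai.Completion.create(
    engine="text-davinci-002",  # engine choice is an assumption
    prompt=prompt,              # the context + question prompt from before
    temperature=0.0,            # keep answers grounded in the context
    max_tokens=256,
)
answer = res["choices"][0]["text"].strip()
print(answer)
```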
00:17:42.680 | So that's really cool.
00:17:44.080 | And yeah, no, this sort of thing just almost blows me away a little bit with how impressive
00:17:52.520 | it is, because it seems almost genuinely intelligent.
00:17:58.280 | And you ask a question, and it gives you a genuinely intelligent response that is kind
00:18:04.320 | of pulled from all these different sources and formatted in a particular way based on
00:18:09.360 | what you've asked.
00:18:10.360 | It is, for me, incredibly interesting.
00:18:14.000 | And yeah, definitely something I want to read about more, these sort of generative models,
00:18:22.480 | because it's clearly really impressive and very useful.
00:18:29.000 | So that's it for this video.
00:18:31.320 | I hope this has been interesting to see this demo and go through how everything works.
00:18:41.080 | But yeah, let me know what you think.
00:18:46.440 | And if you are interested in trying it out, you can do.
00:18:49.160 | You just go to share.streamlit.io, then pinecone-io, playground, beyond-search-openai, and
00:18:56.120 | so on.
00:18:57.880 | And if you'd like to take a look at the code for that as well, you can do.
00:19:06.600 | What you do is head over to this URL here, and then you have all the code I just showed
00:19:11.840 | you in here, and also the data as well, if you'd like that.
00:19:17.640 | So yeah, I think it's been really cool.
00:19:20.320 | So thank you very much for watching.
00:19:21.880 | I hope this has been useful, and I will see you again in the next one.