How to build next-level Q&A with OpenAI
Chapters
0:00 Intro
4:07 Embedding the Q&A
10:08 Creating the index
11:54 Querying
17:13 Outro
Today, I want to show you a demo of how you can do generative question answering using OpenAI and Pinecone.
Now recently, we hosted a workshop between OpenAI and Pinecone, and people had the opportunity to submit questions.
And also, if you would like to ask questions yourself, you can just go to this web address up here.
So one of those that I quite liked was: how do I define a PyTorch tensor of all zeros?
So we come down here; we have the style setting at the moment, so it's a paragraph per question.
Here I've just removed the filters for Streamlit, HuggingFace, and TensorFlow.
So we're only pulling information from the PyTorch forums, but we can add those back in, and it should be able to figure it out anyway.
But when we know we're looking at PyTorch, why not add that in there?
So there are a few ways to create a PyTorch tensor of all zeros.
Oh, okay, so we get a slightly different answer this time, interesting.
One way is to use the PyTorch function torch.zeros.
And then another is using the PyTorch function torch.zeros_like.
So it's giving you two options here and also describing how they vary, which is really cool.
So now we can also have a look at the sources.
If we look at the sources, we'll see that they're all just coming from PyTorch now.
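For reference, here is a minimal sketch of the two approaches the model is describing:

```python
import torch

# Option 1: torch.zeros builds a tensor of zeros with the shape you ask for.
a = torch.zeros(2, 3)       # shape (2, 3), all zeros

# Option 2: torch.zeros_like copies the shape, dtype, and device of an
# existing tensor and fills the new one with zeros.
b = torch.rand(2, 3)
c = torch.zeros_like(b)     # same shape and dtype as b, all zeros
```

That difference, explicit shape versus shape copied from another tensor, is exactly the variation the answer is pointing at.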
And what I want to do is just say, okay, can you summarize what OpenAI CLIP is?
So: OpenAI CLIP is a contrastive language-image pre-training model that sees pairs of images and text and returns a matrix of cosine similarities between the text and each image.
It's written in PyTorch, uses BCE loss, and, well, yeah, that's really cool.
And then I also want to summarize the problems that people have.
So essentially this is going to summarize the problems that it sees in the context.
So when people are asking questions, it's going to summarize the problems that people are having with that particular topic of OpenAI CLIP.
At first it just says this page was originally published on that site.
Actually, since we're just pulling from PyTorch, that's probably not a good idea.
We should probably include HuggingFace in there, maybe TensorFlow as well.
So: how to use the CLIP model for image search and style transfer, how to fine-tune the CLIP model, and so on.
And the questions generally seek to find out whether the CLIP model can be used for those sorts of tasks.
And you can see where it's coming from here: OpenAI CLIP for image search, style transfer, and so on.
So yeah, that's, I think, a really cool example of OpenAI's embedding and generation models in action.
So I want to just show you a little bit of how that works.
So over on the left here, we have the dataset indexing stage.
This starts with scraping data from different websites.
So we use the forum websites (PyTorch, for example, here) and also all the ones that you saw over here: Streamlit, PyTorch, HuggingFace, and TensorFlow.
So I scraped all of those, and that resulted in these four JSON lines files.
Now, let me show you what they look like.
We just have these lines with all this information.
We have the docs that it's coming from, the category, which is vision in this case, the thread that it's coming from (like the topic of the conversation in the PyTorch forums), the actual link to it, the question, and then a load of answers, basically, or just responses to the question.
So we have those, and this is what you end up seeing here.
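To make that concrete, here is a hypothetical record in the shape just described; the exact key names in the real files may differ:

```python
import json

# One illustrative record; the field names follow the description above
# (docs, category, thread, link, question, answers) but are assumptions.
record = {
    "docs": "pytorch",
    "category": "vision",
    "thread": "How to normalize input images?",
    "href": "https://discuss.pytorch.org/t/...",
    "question": "How should I normalize images before training?",
    "answers": ["You can use torchvision.transforms.Normalize ..."],
}

# A JSON lines file is just one JSON object per line.
with open("pytorch.jsonl", encoding="utf-8") as f:
    threads = [json.loads(line) for line in f]
```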
So here, I've just concatenated them all together, and then we return over to here and feed them into OpenAI's embedding endpoint.
If we just go over to the API and into the docs, we can come down to embeddings; this tells us about the embeddings, how they work, and how to create them.
So yeah, if you want to have a look at that, you can do, but we're going to see it in the code anyway.
So switching back over to the code, if we come down a little bit: here, I'm just cleaning up the data.
And one thing you can do is this: the tokenizer for GPT-3, the embedding model, is similar to the GPT-2 tokenizer.
So we can actually download that tokenizer from Transformers and check roughly how many tokens we're going to have in what we're feeding through to the OpenAI API.
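As a rough sketch of that token check, using the GPT-2 tokenizer from Hugging Face Transformers as the stand-in:

```python
from transformers import GPT2TokenizerFast

# GPT-2's tokenizer is close enough to GPT-3's to give a rough token
# count before sending text to the OpenAI API.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "How do I create a PyTorch tensor of all zeros?"
n_tokens = len(tokenizer(text)["input_ids"])
print(n_tokens)
```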
So we'll go down, and then here we can go ahead and actually get some embeddings.
All we're doing here is using OpenAI's embeddings endpoint.
And you do need an account for all of this, so I'm just pointing that out.
You also need an API key, which you can get here.
So yeah, if you need that as well, that's where you go.
So this is just our ordered data from before, and then we also have our embeddings that we've just created.
And then we're just saving them to this embeddings pickle file.
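That step might look something like this; a minimal sketch assuming the pre-1.0 openai Python client, with an illustrative engine name and file name:

```python
import pickle
import openai

openai.api_key = "YOUR_API_KEY"  # from your OpenAI account settings

# Embed a batch of thread texts; the engine name here is illustrative.
res = openai.Embedding.create(
    input=["first thread text", "second thread text"],
    engine="text-search-curie-doc-001",
)
embeddings = [r["embedding"] for r in res["data"]]

# Save the embeddings locally so we don't have to re-create them later.
with open("embeddings.pkl", "wb") as f:
    pickle.dump(embeddings, f)
```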
Now, at that point, we move over to the index initialization step.
So we take everything we've created: we got all our text, came through here, created the embeddings, and now we're at this stage.
This is the Pinecone vector database, and we're just putting all of our embeddings into there.
So we first load our data and check that it's actually there; it looks okay.
Now, this is changing pretty soon, so by the time you're watching this it might have already changed, but the current max size limit for metadata in Pinecone is 5 kilobytes.
So what we need to do is just check: is the text field that we're feeding in here going over that limit?
What we can do is either trim down, like truncate, those text fields, or we can just keep the text locally and then map the IDs to that text when results are fed back to us.
Now, in this case, I'm just going to map them to IDs.
So we're not going to put our text data in Pinecone, just the vectors.
But obviously, when the most relevant vectors are returned back to us, we need a way to understand what that text is so that we can feed it into our OpenAI generation model.
So we need a way to go from those IDs to the original text, which is all we're doing here.
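The size check itself is simple; a sketch using the hypothetical record fields from the earlier example:

```python
LIMIT_BYTES = 5_000  # roughly the 5KB metadata cap mentioned above

# Find records whose text would blow past the metadata limit if we
# tried to store it in Pinecone directly.
too_big = [
    t for t in threads
    if len(" ".join(t["answers"]).encode("utf-8")) > LIMIT_BYTES
]
print(f"{len(too_big)} records exceed the limit")
```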
So I'm creating an ID for each record, which is literally just a number, nothing special.
And then after we've created those IDs, I'm just initializing the index.
We do need to make sure that we're using the cosine metric and also the correct dimensionality.
So we have our embeddings that we've created up here, and here we're just saying: use the dimensionality of those embeddings.
I'm not sure exactly what it is for the query model; it might be something like 2,048-dimensional vectors.
So we do that, and then we just connect to our index after we've created it here.
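A sketch of that initialization, assuming the pinecone-client of that era; the index name and environment here are illustrative:

```python
import pinecone

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-west1-gcp")

index_name = "beyond-search-openai"  # illustrative

# Cosine similarity, with the dimensionality taken from the embeddings
# we just created rather than hard-coded.
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=len(embeddings[0]),
        metric="cosine",
    )

index = pinecone.Index(index_name)
```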
At that point, we can then begin to populate the index with the embeddings that we've created.
So what we're doing is including the ID, then the embedding, and then the metadata.
The metadata is pretty important if we want to do any filtering; we can see here we've got docs, category, and thread, and also the link.
That's really important for our app, because we're including all of this information in the sources section.
If we have a look here, we can see we have these sources: the docs, followed by the category, followed by the name of the thread.
And if we click on one here, it will take us through to the thread.
So all that metadata that we specified there is used to build that sources section.
And then finally, as I mentioned before, Pinecone isn't going to, at this point, handle storing the full text for us.
So all I do is create another dataset, which contains the ID-to-text mappings, and then just save that to a JSON file; nothing crazy there.
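Putting those two steps together, a sketch of the upsert and the local ID-to-text mapping, again reusing the hypothetical field names from earlier:

```python
import json

ids = [str(i) for i in range(len(threads))]

# Upsert (id, vector, metadata) tuples; the metadata is what powers the
# filters and the sources section in the app.
to_upsert = [
    (
        ids[i],
        embeddings[i],
        {
            "docs": threads[i]["docs"],
            "category": threads[i]["category"],
            "thread": threads[i]["thread"],
            "href": threads[i]["href"],
        },
    )
    for i in range(len(threads))
]

# Send the vectors up in batches.
for i in range(0, len(to_upsert), 100):
    index.upsert(vectors=to_upsert[i : i + 100])

# Pinecone only stores the vectors and metadata, so keep an ID -> text
# map locally to recover the original passages at query time.
mappings = {ids[i]: " ".join(threads[i]["answers"]) for i in range(len(threads))}
with open("mapping.json", "w", encoding="utf-8") as f:
    json.dump(mappings, f)
```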
OK, so what we just went through was this bit here, which seems like very little when you look at the diagram.
So we're now kind of done with the indexing part.
We only need to repeat it if we're adding more data to our tool.
The next part, querying, is what the users are going to be doing.
So we enter this search term, "how to use GradientTape in TensorFlow", as I mentioned before.
And if we were going to go through that in code, it would look like this.
OK, so we load the mappings that we created earlier.
We connect to OpenAI, and then we use the embeddings endpoint to actually create our embeddings.
So we define a function that is going to use OpenAI to create the query embedding.
Then we use that query embedding to retrieve the most relevant contexts from Pinecone.
And then we feed them into the generation model and retrieve an answer.
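A sketch of that retrieval step, with an illustrative query-side engine name and assuming the same client versions as before:

```python
def retrieve_contexts(question, top_k=5):
    # Embed the query with the query-side embedding engine (illustrative name).
    res = openai.Embedding.create(
        input=[question],
        engine="text-search-curie-query-001",
    )
    xq = res["data"][0]["embedding"]

    # Pull the most similar records from Pinecone.
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)

    # Map the returned IDs back to the original text.
    return [mappings[m["id"]] for m in res["matches"]]
```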
Then we go through all of the contexts we've retrieved, and we create one really big string of context, which you can see here, actually.
So you see, we have the created context for "how do I use a GradientTape in TensorFlow?".
This isn't the question that we're going to ask the generative model; this is just all of the source context that we're going to attach to it.
So we're going to say: based on all this information here, answer a particular question.
And we can see here the sort of prompts we're going to use.
So, "conservative Q&A": answer the question based on the context below, and if the question can't be answered based on the context, say "I don't know".
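That conservative Q&A prompt might be assembled something like this; a sketch where the separators and exact wording are approximations of what's shown on screen:

```python
def create_prompt(question, contexts):
    # Join the retrieved passages into one big context string.
    context_str = "\n\n###\n\n".join(contexts)
    # Instruct the model to admit when the context doesn't contain the answer.
    return (
        "Answer the question based on the context below, and if the "
        "question can't be answered based on the context, say "
        '"I don\'t know".\n\n'
        f"Context: {context_str}\n\n---\n\n"
        f"Question: {question}\nAnswer:"
    )
```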
So I just want to show you what that might look like in the actual demo.
So let's restrict everything to Streamlit, and we'll ask about OpenAI CLIP.
Then we just change the style to conservative Q&A and go down.
So that's how the generative model is reading the questions, literally reading them and producing this pretty intelligent answer based on what we've given it.
But the reason it said it didn't know is because the answer was not found within the context.
There was nothing in the Streamlit docs that we filtered down to that said anything about OpenAI CLIP.
And then, as well as that, we also pass in the start of the answer.
We prompted OpenAI's generative model to actually answer it by ending the prompt with "Answer:".
So it was basically being told: finish this sentence.
But you get these really incredible responses.
OK, and then there are just a few more examples here, actually.
So you see, "GPT-2 strengths and weaknesses", and it says, "I don't know".
So there's obviously not enough in there for it to do anything.
There's a really good one here, which is to extract key libraries and tools.
So: "embedding models which embed images and text".
And let's make sure we have everything in there.
OK, and we get this really cool list of all the embedding models that you would use for extracting, or which can embed, images and text.
So that is, I think, pretty much all of it.
There's a little more code, I think, if we go up a little bit.
So we're just creating the context here, which is what we covered above.
And then we're just saying: go to the OpenAI completion endpoint.
And that's going to complete the question, or the text, that we've given to it.
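Tying the earlier sketches together, the final call might look like this; the model name and parameters here are illustrative, not necessarily what the demo used:

```python
question = "How do I use GradientTape in TensorFlow?"
contexts = retrieve_contexts(question)

# Complete the prompt; the model continues from the trailing "Answer:".
res = openai.Completion.create(
    engine="text-davinci-002",
    prompt=create_prompt(question, contexts),
    max_tokens=400,
    temperature=0.0,
)
answer = res["choices"][0]["text"].strip()
print(answer)
```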
And yeah, this sort of thing almost blows me away a little bit with how impressive it is, because it seems almost genuinely intelligent.
You ask a question, and it gives you a genuinely intelligent response that is pulled from all these different sources and formatted in a particular way based on what you asked for.
And yeah, these generative models are definitely something I want to read about more, because they're clearly really impressive and very useful.
I hope this has been interesting, to see this demo and go through how everything works.
And if you are interested in trying it out, you can do.
You just go to share.streamlit.io, then pinecone-io, playground, beyond-search-openai.
And if you'd like to take a look at the code for that as well, you can do.
You just head over to this URL here, and you have all the code I just showed you, and also the data as well, if you'd like that.
I hope this has been useful, and I will see you again in the next one.