Back to Index

How to build next-level Q&A with OpenAI


Chapters

0:00 Intro
4:07 Embedding the QA
10:08 Creating the index
11:54 Querying
17:13 Outro

Transcript

Today, I want to show you a demo of how you can do generative question answering using Pinecone and OpenAI. Now recently, we hosted a workshop between OpenAI and Pinecone, and people had the opportunity to ask questions themselves as well. And if you would like to ask questions too, you can just go to this web address up here.

And they asked some really good questions. So one that I quite liked was, how do I define a PyTorch tensor of all zeros? So we come down here; we have the style option at the moment set to paragraph for the question, and we'll leave it at that for now. I've also just removed the filters here for Streamlit, Hugging Face, and TensorFlow.

So we're only pulling information from the PyTorch forums, but we can add those back in, and it should be able to figure it out anyway. But when we know we're looking at PyTorch, why not add that in there? Then we get this: so there are a few ways to create a PyTorch tensor of all zeros.

I don't know why it's re-running there. Oh, okay, so we get a slightly different answer, interesting. One way is to use the PyTorch function torch.zeros. Okay, yeah, that's cool. And another way is using the PyTorch function torch.zeros_like. And then it explains what each one does as well.

So that's quite cool. So it's giving you two options here and also describing how they vary, which is really nice. So now we can also have a look at the sources. So if we look at the sources, we'll see that they're all just coming from PyTorch now. Let's ask another question.

So this one's not even really a question. I'm just going to say OpenAI CLIP. And what I want to do is just say, okay, can you summarize what OpenAI CLIP is? So we'll come down here and see what it returns. Cool. So OpenAI CLIP is a contrastive language-image pre-training model that sees pairs of images and text and returns a matrix of cosine similarities between the text and each image.

Okay, that's cool. So it's written in PyTorch, uses BCE loss, and, well, yeah, that's really cool. So that's just a paragraph. And then we can also summarize the problems that people have. So essentially this is going to summarize the problems that it sees in the context.

So when people are asking questions, it's going to summarize the problems that people are having around that particular topic of OpenAI CLIP. So we'll come down. It just says this page was originally published on that site. Can I rerun it and see if it comes up with anything? Actually, since we're only pulling from PyTorch, that's probably not a good idea.

We should probably include Hugging Face in there, maybe TensorFlow as well. Okay. So: how to use the CLIP model for image search and style transfer, how to fine-tune the CLIP model, how to train the model on medical data. And the questions generally seek to find out whether the CLIP model can be used to generate replies to English-language text input.

So that's really cool. And you can see where it's coming from here, so OpenAI CLIP for image search, style transfer, and so on. So yeah, that's, I think, a really cool example of OpenAI's embedding and generation models being used in unison with Pinecone. So I want to just show you a little bit of how that works.

So over on the left here, we have the dataset indexing stage. So this starts with scraping data from different websites. So we use the forum websites, PyTorch, for example, here, and also all the ones that you saw over here, so Streamlit, PyTorch, Hugging Face, and TensorFlow. So I scraped all of those, and that resulted in these four JSON lines files.

Now, let me show you what they look like. We just have these lines of all this information. We have the docs it's coming from, the category, which is vision in this case, the thread that it's coming from, so like the topic of the conversation in the PyTorch forums, the actual link to it, the question, and then we have a load of answers, basically, or just responses to the question.
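As a rough sketch, reading one of those JSON lines files might look something like this; the file name and field names here are assumptions based on what's visible on screen:

```python
import json

# Load the scraped forum threads from a JSON lines file (one record per line).
# Field names like "docs", "category", "thread", "link", "question" and
# "answers" are assumptions based on the fields mentioned in the video.
threads = []
with open("pytorch.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        threads.append(json.loads(line))

print(threads[0]["thread"], threads[0]["link"])
```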

So we have those, and this is what you end up seeing here. So here, I've just concatenated them all together, and then we come back over to here, and we feed all of those into an OpenAI embedding model. So if we just go over here to the API and go to the docs, we can come down to embeddings, and this tells us about the embeddings, how they work, and how to actually use them.

So yeah, if you want to have a look at that, you can do, but we're going to see this in a moment anyway. So switching back over to the code, if we come down a little bit, here I'm just cleaning it up. And one thing worth knowing is that the tokenizer for GPT-3, the embedding model, is similar to the GPT-2 tokenizer.

So we can actually download the tokenizer from Transformers and check roughly how many tokens we're going to be feeding through to the OpenAI API. So we'll go down, and then here we can go ahead and actually get some embeddings. So all we're doing here is using the get_embedding utility, which comes from the openai library, so it's a pip install openai.
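Going back to the token check for a second, a minimal sketch of that with the GPT-2 tokenizer from Hugging Face Transformers might look like this:

```python
from transformers import GPT2TokenizerFast

# GPT-3 uses a tokenizer very similar to GPT-2's, so this gives a rough
# estimate of how many tokens each piece of text will cost against the API.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "How do I create a PyTorch tensor of all zeros?"
print(len(tokenizer.encode(text)))
```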

And you do need an account for this, so I'm just pointing that out. You also need an API key, which you can get here. So yeah, if you need that as well, you can go there. And then you can see everything: this is just our ordered data from before, and then we also have the embeddings that we just created using that endpoint.
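As a sketch, creating those embeddings with the pre-v1 openai Python client (the one in use at the time of the video) might look like this. The engine name is an assumption; the video only shows later that the vectors are 4,096-dimensional, which matches the Curie search models of that era:

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; use your own key

# Embed the concatenated forum texts in small batches. The engine name is
# an assumption, not confirmed in the video; the "text" field is also assumed.
texts = [t["text"] for t in threads]
embeddings = []
for i in range(0, len(texts), 32):
    res = openai.Embedding.create(
        input=texts[i:i + 32], engine="text-search-curie-doc-001"
    )
    embeddings.extend(record["embedding"] for record in res["data"])

print(len(embeddings), len(embeddings[0]))
```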

And then we're just saving them to this query embeddings pickle file. Now, at that point, we move over to the index initialization step. So over in our visual, that looks like this: we take all of those, we've got all our text, come through here, create the embeddings, come over here, and now we're at this stage.

So the Pinecone vector DB: we're just putting all of our embeddings into there. So let's have a look at how we do that. We first load our data and check that it's actually there; it looks OK. Now, the current limit here is changing pretty soon, so by the time you're watching this, it might have already changed.

But the max size limit for metadata in Pinecone is 5 kilobytes. So what we need to do is just check: is the text field that we're feeding in here greater than that? If so, we need to clean it up a little bit. And yes, in this case a lot of them are too big.
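As a minimal sketch, that check might look like this, assuming the combined text sits in a "text" field as above:

```python
LIMIT_BYTES = 5 * 1024  # Pinecone's metadata size limit at the time of the video

# Count how many records have a text field too large to store as metadata.
too_big = [t for t in threads if len(t["text"].encode("utf-8")) > LIMIT_BYTES]
print(f"{len(too_big)} of {len(threads)} records exceed the metadata limit")
```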

So what we can do is either trim those text fields down, like truncate them, or we can just keep the text locally and map the IDs back to that text when we're feeding them back in. Now, in this case, I'm just going to map them to IDs. So imagine we're not going to put our text data in Pinecone, just the vectors.

But we also include an ID. But obviously, when we're returning the most relevant vectors back to us, we need a way to understand what that text is so that we can feed it into our OpenAI GPT or generation model. So we need a way to go from those IDs to the original text, which is all we're doing here.

So I'm getting the IDs, or I'm creating an ID, which is literally just numbers, nothing special. It doesn't have to be anything special. It just needs to be unique. And then after we've created those IDs, I'm just initializing the index. You can do this before or after if you want.

It doesn't really matter. We do need to make sure that we're using the cosine metric and also the correct dimensionality. So we have our embeddings that we've created up here, and here we're just saying, OK, use the dimensionality of those embeddings. I'm not sure how much it is for the query model off the top of my head.

It might be something like 2,048-dimensional vectors, I'm not sure. Actually, we can check what it looks like. So go to the Pinecone console, OpenAI... OK, so it's 4,096 dimensions. Cool. So we do that, and then we just connect to our index after we've created it here. So we create our index.
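A minimal sketch of that, using the Pinecone client as it existed around the time of the video (the API key, environment, and index name are placeholders):

```python
import pinecone

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="us-west1-gcp")

index_name = "openai-qa"  # hypothetical index name

# Create the index with cosine similarity, taking the dimensionality
# directly from the embeddings we created (4,096 in the video).
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=len(embeddings[0]), metric="cosine")

# Connect to the index so we can upsert and query.
index = pinecone.Index(index_name)
```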

At that point, we can then begin to populate the index with the embeddings that we've created from OpenAI. So that's what I'm doing here, in batches of 32. So what we're doing is including the ID and then the embedding, and then we're also including metadata. This is pretty important if we want to do any filtering; we can see here we've got docs, category, and thread, and also the link.
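Roughly, that batched upsert looks like the sketch below. The metadata keys mirror the ones just mentioned (docs, category, thread, link), but the surrounding variable and field names are assumptions:

```python
# Upsert (id, vector, metadata) tuples in batches of 32.
batch_size = 32
for i in range(0, len(embeddings), batch_size):
    i_end = min(i + batch_size, len(embeddings))
    batch = []
    for j in range(i, i_end):
        metadata = {
            "docs": threads[j]["docs"],
            "category": threads[j]["category"],
            "thread": threads[j]["thread"],
            "href": threads[j]["link"],
        }
        # IDs just need to be unique strings; plain numbers work fine.
        batch.append((str(j), embeddings[j], metadata))
    index.upsert(vectors=batch)
```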

That's really important for our app, because we're including all of this information in the sources section that we saw before. If we have a look here, we can see we have these sources. So we have the docs, followed by the category, followed by the name of the thread. And also, if we click on it here, it will take us through to the thread.

So all that metadata that we specified there is used to build that sources section there. And then finally-- so I mentioned before, Pinecone isn't going to, at this point, handle all of that text information. So all I do is create another data set, which contains the ID to text mappings.
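A minimal sketch of that ID-to-text mapping (the field and file names are assumptions):

```python
import json

# Map each vector ID back to its original text so we can rebuild the
# context after querying Pinecone.
id_to_text = {str(j): threads[j]["text"] for j in range(len(threads))}

with open("mappings.json", "w", encoding="utf-8") as f:
    json.dump(id_to_text, f)
```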

And then we just save that to a JSON file, as in the sketch above; nothing crazy there. OK, so what we just went through was this bit here, which seems like very little for what we just spoke about. But yeah, that's that bit. So we're now pretty much done with the indexing part.

So we can cross that off; we're finished with indexing everything. We only need to repeat that again if we're adding more data to our tool. And then we'll go over to the querying bit. So here, we're going to query. This is what the users are going to be doing. So we enter this search term, how to use GradientTape in TensorFlow, as I mentioned before.

And if we were going to go through that, it would look like this. OK, so we have the mappings here, so that's the ID-to-text mapping. We connect to OpenAI, and then we use this get_embedding function to actually create our embeddings. We don't run that here, though.

We load our Pinecone index over here. And then we just define a function that is going to use OpenAI to create the query embedding. And then we use that query embedding to retrieve the most relevant context from Pinecone. And then we feed them back into the generation model and retrieve an answer.

OK, so get the embedding. Where is that? So here, you can see that this one here is the query model; you can see that there. And then we're querying Pinecone here. And then we're just going through all of the results that we've got, and we're creating one really big string of context, which you can see here, actually.

So you see, we have create_context, with "how do I use a GradientTape in TensorFlow". And this creates the context. This isn't the answer, and it isn't the question that we're going to ask the next generative model; this is just all of the source context that we're going to give it.
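Roughly, that create_context function might look like the sketch below. The query-side engine name is an assumption, as before, and the separator between contexts is just one reasonable choice:

```python
def create_context(question, index, id_to_text, top_k=5):
    # Embed the query with the query-side embedding model (assumed name).
    res = openai.Embedding.create(
        input=[question], engine="text-search-curie-query-001"
    )
    query_vector = res["data"][0]["embedding"]

    # Retrieve the most relevant records from Pinecone.
    results = index.query(vector=query_vector, top_k=top_k, include_metadata=True)

    # Map the returned IDs back to their original text and join everything
    # into one big context string.
    contexts = [id_to_text[match["id"]] for match in results["matches"]]
    return "\n\n###\n\n".join(contexts)
```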

So we're going to say, OK, based on all this information here, answer a particular question. So let's go down, and we can see, OK, here are the sorts of prompts we're going to use. So, conservative Q&A: answer the question based on the context below, and if the question can't be answered based on the context, say "I don't know", OK?
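The conservative Q&A prompt then gets built on top of the retrieved context, roughly like this sketch (the exact wording is paraphrased from what's shown on screen):

```python
def build_prompt(question, context):
    # The prompt instructs the model to answer only from the context, and
    # ends with "Answer:" so the completion model finishes the sentence.
    return (
        "Answer the question based on the context below, and if the question "
        "can't be answered based on the context, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```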

And then we go on. So I just want to show you what that might look like in the actual demo. So let's restrict everything to Streamlit, and we'll ask about OpenAI CLIP, maybe. Yeah, let's do that. And then we just change to conservative Q&A and go down.

And we see that we get "I don't know", right? So that's how the generative model is reading the questions, literally reading them and producing this pretty intelligent answer based on what we've given it. It's following the instructions really well. So it gave us this "I don't know".

But the reason it didn't know is because what it was asked about was not found within the context, OK? There was nothing in the Streamlit docs that we filtered down to that said anything about OpenAI CLIP. And then, as well as that, we also pass in "Answer:" at the end, and we prompt OpenAI's generative model to actually answer by completing it.

So it was basically being told: finish this sentence. That's all we're really doing here. But you get these really incredible responses. OK, and then there are just a few examples here, actually. So you see GPT-2 strengths and weaknesses, and it says, I don't know. So there's obviously not enough in there for it to do anything.

There's a really good one here, which is to extract key libraries and tools. So: embedding models which embed images and text. OK, let me copy this. And what was it? Was it bullet points? Oh, no, extract key libraries and tools. Let's do that. And let's make sure we have everything in there.

OK, and we get this really cool sort of list of all the embedding models that you would use for extracting or which can embed images and text. So that's really, really impressive. So that is, I think, pretty much all of that. There's a little more code, I think, if we go up a little bit.

So I kind of skipped over this. So we're just creating a context here, which is what we covered above. And then we're just saying, OK, go into the OpenAI completion endpoint. And that's going to complete the question or the text that we've given to it. So that's really cool.
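As a sketch, that completion call with the pre-v1 openai client might look like this; the model name and generation parameters are assumptions, since the video doesn't show them:

```python
def answer_question(question, index, id_to_text):
    context = create_context(question, index, id_to_text)
    prompt = build_prompt(question, context)

    # Ask the completion endpoint to continue the prompt after "Answer:".
    res = openai.Completion.create(
        engine="text-davinci-002",  # assumed model, not confirmed in the video
        prompt=prompt,
        temperature=0.0,
        max_tokens=400,
    )
    return res["choices"][0]["text"].strip()
```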

And yeah, no, this sort of thing just almost blows me away a little bit with how impressive it is, because it seems almost genuinely intelligent. And you ask a question, and it gives you a genuinely intelligent response that is kind of pulled from all these different sources and formatted in a particular way based on what you've asked.

It is, for me, incredibly interesting. And yeah, definitely something I want to read about more, these sort of generative models, because it's clearly really impressive and very useful. So that's it for this video. I hope this has been interesting to see this demo and go through how everything works.

But yeah, let me know what you think. And if you are interested in trying it out, you can do. You just go to share.streamlit.io, then Pinecone IO, Playground, Beyond Search OpenAI, and so on. And if you'd like to take a look at the code for that as well, you can do.

What you do is head over to this URL here, and then you have all the code I just showed you in here, and also the data as well, if you'd like that. So yeah, I think it's been really cool. So thank you very much for watching. I hope this has been useful, and I will see you again in the next one.

Bye.