Generative Question-Answering with OpenAI's GPT-3.5 and Davinci
Chapters
0:00 Generative Question Answering with OpenAI
0:56 Example App for Generative QA
5:02 OpenAI Pinecone Stack Architecture
7:18 Dealing with LLM Hallucination
9:28 Indexing all data
13:47 Querying
16:37 Generation of answers
21:25 Testing some generative question answers
22:55 Final notes
So what I mean by generative question answering is something like this: imagine you go into a store and you ask the attendant, where is this particular item? Or maybe you're not entirely sure what the item is, so you describe it: where can I find those things that taste kind of like cherries but look like strawberries? And hopefully the attendant will be able to help. They will say, okay, you just need to go down to aisle two and they'll be on the left, or they'll take you there. That is basically what generative question answering is. You're asking something a question in natural language, and that something is generating natural language back to you as an answer. Now, I think the easiest way to explain this idea of generative question answering is to show you.
So here's an example app. It's built on a relatively small amount of data from ML community forums, like the Hugging Face and Streamlit forums, but it is enough for us to demonstrate the idea. So what I'm going to do here is ask a question, something like: what are the differences between TensorFlow and PyTorch? We can limit the amount of data that it's going to pull from the knowledge base; in this case, it's going to return five items of information.
And we get an answer. So, they are two of the most popular deep learning frameworks. PyTorch is a Python-based library developed by Facebook, while TensorFlow is a library developed by Google. Both frameworks are open source and have large communities. The main difference is that PyTorch is more intuitive and easy to use, while TensorFlow is more powerful. PyTorch is better suited for rapid prototyping and research, while TensorFlow is better for production and deployment. And then it even encourages us to learn both frameworks to get the most out of our deep learning projects.
If we have a look at where this information is coming from, we can click through to the sources, and it will take us to those original discussions. So we're not just relying on the generated answer actually being the case; we can check. Another example: asking about gradient tape in TensorFlow, the answer explains that it is used to calculate the gradients of a computation with respect to its input variables. We create a gradient tape and then record the forward pass of the model; computing the gradients can then be done by calling the tape.gradient method. And then we continue with some more information, like how we can use it to calculate higher-order derivatives.
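As a quick, minimal sketch of what that generated answer is describing (my own example, not code from the video):

```python
import tensorflow as tf

x = tf.Variable(3.0)

# record the forward pass on the tape
with tf.GradientTape() as tape:
    y = x ** 2

# gradient of y with respect to the input variable x: dy/dx = 2x = 6.0
dy_dx = tape.gradient(y, x)
print(dy_dx)  # tf.Tensor(6.0, shape=(), dtype=float32)
```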
But let's have a look at how we would actually build this. Now, our generative question answering pipeline is going to contain two primary items: a knowledge base and the AI models themselves. The knowledge base, we can think of it as the model's long-term memory. Going back to the earlier analogy, the shop assistant has been working in that store for a while, so they already know where everything is. Something in their brain is translating our question into a search over that long-term memory, and then returning that information back to us in natural language. In our pipeline, we are going to have an embedding model here; when a user asks a question, we feed that into the embedding model, and out of it we get a query vector that essentially represents the semantic meaning of that question.
The knowledge base is going to be the Pinecone vector database. And within that Pinecone vector database, we have many vectors representing chunks of text, or just information that has already been indexed. So we essentially have a ton of indexed items like this. And then we introduce this query vector into here, and we retrieve the indexed vectors that are most similar to it. And then we pass all of those to another model. So this is going to be a Davinci text generation model. Because each returned record carries its original text as metadata, we're going to translate the results back into text, giving us chunks of text that contain relevant information to our particular query. And we're going to format them in a certain way, and we'll dive into that a little more later on, so that Davinci will be able to complete our question. Essentially, we're saying: here is a question, answer that based on the following information. And then that's where we pass in the information we retrieved from the knowledge base.
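In rough pseudocode, the flow looks like this (every name here is a hypothetical placeholder; the concrete calls come later in the video):

```python
# 1. the embedding model turns the natural-language question into a query vector
xq = embedding_model.encode(question)

# 2. the knowledge base (Pinecone) returns the most similar indexed vectors,
#    along with the original text stored as metadata
matches = knowledge_base.query(xq, top_k=5)
contexts = [m.metadata["text"] for m in matches]

# 3. the Davinci generation model answers the question given those contexts
answer = generation_model.complete(format_prompt(question, contexts))
```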
Now, one question you might have is: why not just generate the answer from the query directly? The problem is that the model may simply not know the answer, or we can suffer from something called hallucination. Now, hallucination is where a text generation model is basically giving text that seems like it's true, but actually isn't. These models are trained to produce convincing language, but they aren't necessarily tied to being truthful. They can produce an answer which seems very plausible, using scientific terminology and so on, but in reality, all of that may be completely false. So by adding in this component of a knowledge base, and instructing the model to pull information from that knowledge base and then answer the question based on it, we are forcing the model to almost be truthful, at least as long as our knowledge base actually does contain facts. So whereas a typical language generation model may know nothing about our particular domain, by giving it this knowledge from a particular domain, we can get answers based on that knowledge that we've extracted. So there are a few good reasons for doing this.
For the first step, indexing all of our data, we take each record and we're going to feed it into the embedding model. There's a notebook that walks through all of this, so if you'd like to understand it in more detail you can go through that; I'm just going to go through it pretty quickly. Of course, we're going to be loading this dataset here, ML Q&A, which is just a set of question-answering threads pulled from several ML community forums, like the Hugging Face, PyTorch, and a few other forums. One of the preprocessing steps is removing excessively long items from there. Then, for each record, what we're going to do is include quite a lot of information. So we're going to include the thread title or topic, the question, and the answer. And you can kind of see, like, thread title is this, and you have the answer a little further on, which is here. So this is actually not that much data for this model. Next, you'll need an OpenAI API key, and I will show you how to get that just in case you don't know. You go to the OpenAI site and you create an account if you don't already have one. And then you need to go up to your profile on the top right and view your API keys, creating a new secret key if you don't have any of these saved elsewhere. Once you have the key, you just paste it into here and then continue running.
So we have the embedding model here, text-embedding-ada-002. You'll also need a Pinecone API key; after signing up, you just go into the console and find the API keys on the left. We create the index and connect to the index. After that, we just need to index everything. So we're going through our text here in these batches of 128, encoding everything in those batches of 128. We also attach the metadata, so that's going to be the plain text associated with each one of our, what will be, vector embeddings. So we just clean them up from the response format here and upsert everything. And then we have these 6,000 vectors in Pinecone, ready to query. Now, usually you probably want way more items than this, but this is, I think, good enough for this example.
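A minimal sketch of that indexing step, assuming the pre-v1 openai and pinecone-client libraries from the time of this video; names like "gen-qa", the environment string, and the records list of prepared text chunks are placeholders:

```python
import openai
import pinecone

openai.api_key = "OPENAI_API_KEY"
pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")

# text-embedding-ada-002 returns 1536-dimensional vectors
if "gen-qa" not in pinecone.list_indexes():
    pinecone.create_index("gen-qa", dimension=1536, metric="cosine")
index = pinecone.Index("gen-qa")

batch_size = 128
for i in range(0, len(records), batch_size):
    batch = records[i:i + batch_size]
    res = openai.Embedding.create(input=batch, engine="text-embedding-ada-002")
    embeds = [r["embedding"] for r in res["data"]]  # clean up the response format
    ids = [str(i + j) for j in range(len(batch))]
    meta = [{"text": text} for text in batch]       # keep the plain text as metadata
    index.upsert(vectors=list(zip(ids, embeds, meta)))
```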
So querying, this is the next step, but not the final step. We have a user query, and what we're going to do is going to look very similar to what we already did. We're going to pass that into the text embedding model. That gives us our query vector, which we'll call xq. Now, Pinecone has metadata attached to each of our vectors, so when we return, let's say, three of these vectors, we also get their plain text back with them. But then later on, we're also going to pass that text to the generation model, which produces a more intelligent answer using all of the contexts that we've returned.
Here, we're going to be using this making-queries notebook. Now, I'm actually going to run through this one, because we're going to be moving on to the generation steps, and I want to go through it in a little more detail. So I'm going to run this, which is just the prerequisites, and then we'll move on to initializing everything. So this is setting the variable called Pinecone key. Once we're connected to the index, we should see that there are already vectors in there from the indexing we did before. Then we take our question, what are the differences between PyTorch and TensorFlow?, and create the query embedding; here, we're just extracting the query vector from the response. And then we use that query vector to query Pinecone. So index.query, and I'm going to return the top three items. And we can see here, we have this general discussion returned, among others. So within that, we should have a few relevant contexts that we can then feed into the generation model as, basically, sources of truth for whatever it is it's saying.
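That querying step looks roughly like this, again assuming the pre-v1 openai and pinecone-client interfaces (the index object is the one connected during indexing):

```python
query = "What are the differences between PyTorch and TensorFlow?"

# embed the query and extract the query vector xq from the response
res = openai.Embedding.create(input=[query], engine="text-embedding-ada-002")
xq = res["data"][0]["embedding"]

# return the top three most similar records, including their plain-text metadata
res = index.query(xq, top_k=3, include_metadata=True)
contexts = [match["metadata"]["text"] for match in res["matches"]]
```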
Now, the generation stage is actually very simple. Essentially, we take the contexts, with our question followed onto the end of it, or appended onto the end of it. At the start, we're also going to have, like, a little statement, which is basically telling the generation model what to do with everything that follows. And the instruction is going to be something like: answer the following question given the following context. The question itself is actually going to be at the end of this here. So this is what we returned from Pinecone up here. Okay, so this text that you can just about see here; it's within the metadata, it's a context. And then you can see the prompt that we're building: answer the question based on the context below. That's where we feed in the data that we retrieved. Now, that data retrieved is obviously a few items, so we join all those with basically these separators here. And then, after we have put that data in there, we add the question. And then, because this generation model is basically just taking forward the text that we have so far and continuing it, to keep with the format of context then question, we end the prompt with "Answer:". And then GPT is going to just complete this; it's going to continue, it's going to generate text.
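Here is a sketch of that prompt construction; the separator and instruction wording follow what's described above, though the notebook's exact strings may differ slightly:

```python
prompt = (
    "Answer the question based on the context below.\n\n"
    "Context:\n"
    + "\n\n---\n\n".join(contexts)  # join the retrieved contexts with separators
    + f"\n\nQuestion: {query}\nAnswer:"
)
```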
Now, this is going to be basically the final structure of the prompt. But we also need to limit the prompt length, so that we're not putting too many of these contexts in here. So here, what we would do is go through the contexts one at a time. So essentially we're going to start with context one, then we're going to try context two, context three, and so on. So we're going to start with just the first context, then try with the first two contexts, and the first three contexts, building the prompt each time. The exception here is if the context that we've gone up to pushes the prompt over the limit. So let's say we get to the third context here and the prompt becomes too long; then we just take the first two contexts, right? And then actually, after that, we should have a break, so we stop there. Otherwise, if we get up to all of the contexts without hitting the limit, we just use all of them. Okay, at that point, we move on to the completion endpoint.
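A sketch of that limiting logic, assuming a hypothetical character budget (the notebook's actual limit and counting method may differ):

```python
def build_prompt(contexts, query):
    return (
        "Answer the question based on the context below.\n\n"
        "Context:\n" + "\n\n---\n\n".join(contexts)
        + f"\n\nQuestion: {query}\nAnswer:"
    )

limit = 3750  # assumed prompt budget in characters

# try one context, then two, then three, stopping before the prompt exceeds the limit
for i in range(1, len(contexts) + 1):
    if len(build_prompt(contexts[:i], query)) >= limit:
        prompt = build_prompt(contexts[:i - 1], query)
        break
else:
    # all of the contexts fit, so use every one of them
    prompt = build_prompt(contexts, query)
```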
Then, for the completion call, we set max_tokens, or how many tokens should the model generate. There is also a stop parameter, if you did want to stop on a particular character; like, maybe you wanted to stop on a full stop, you could set that here.
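The call itself, under the same pre-v1 openai assumptions (the max_tokens value here is just an example):

```python
res = openai.Completion.create(
    engine="text-davinci-003",  # the Davinci text generation model
    prompt=prompt,
    max_tokens=400,             # how many tokens the model may generate
    temperature=0.0,
    # stop="."                  # optional: stop generation on a particular character
)
answer = res["choices"][0]["text"].strip()
```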
So we run this, and now we can check what has been generated. And it says, okay, if you ask me for my personal opinion, I find TensorFlow more convenient in the industry, as prototyping, deployment, and scalability are easier, while PyTorch is more Pythonic and easier to implement complex stuff with. So we see it's pretty aligned to the answer we got earlier on.
Now let's try something different. What we're going to do is actually modify the prompt. So I'm going to say: give an exhaustive summary and answer based on the question using the context below. So here, I'm asking for an exhaustive summary and then an answer.
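Concretely, that just swaps the instruction line at the top of the prompt we built earlier:

```python
prompt = (
    "Give an exhaustive summary and answer based on the question "
    "using the context below.\n\n"
    "Context:\n" + "\n\n---\n\n".join(contexts)
    + f"\n\nQuestion: {query}\nAnswer:"
)
```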
So let's come down here, and yeah, we can see it's longer. It says they are two of the most popular major machine learning libraries. TensorFlow is maintained and released by Google, while PyTorch is maintained and released by Facebook. TensorFlow is more convenient in the industry, and so on. So we're still getting the same information in there, just in a longer format. And it also even says PyTorch is more handy in research, and it mentions TensorFlow.js being preferred over the Python equivalent as it can run on the client side too. So in reality, the fact that you do have TensorFlow.js is a genuine advantage for TensorFlow, but maybe it's not explained very well in this generation.
So that's it for this walkthrough on generative question answering using Pinecone and OpenAI's embedding models, and also generation models. Like, we can build some insane things with this, and it can go far beyond what I've shown here. And this is something that we will definitely explore more in the future. I hope all this has been interesting and useful.