
Generative Question-Answering with OpenAI's GPT-3.5 and Davinci


Chapters

0:00 Generative Question Answering with OpenAI
0:56 Example App for Generative QA
5:02 OpenAI Pinecone Stack Architecture
7:18 Dealing with LLM Hallucination
9:28 Indexing all data
13:47 Querying
16:37 Generation of answers
21:25 Testing some generative question answers
22:55 Final notes

Whisper Transcript

00:00:00.000 | Today we're gonna take a look at how to build
00:00:01.880 | a generative question answering app
00:00:04.420 | using OpenAI and Pinecone.
00:00:07.880 | So what I mean by generative question answering
00:00:10.240 | is imagine you go down to a grocery store
00:00:14.760 | and you go to one of the attendants
00:00:16.680 | and you ask them where is this particular item?
00:00:21.400 | Or maybe you're not entirely sure what the item is
00:00:24.440 | and you say, okay, where are those things
00:00:28.560 | that taste kind of like cherries but look like strawberries?
00:00:32.680 | And hopefully the attendant will be able to say,
00:00:34.720 | ah, you mean cherry strawberries.
00:00:36.600 | And they will say, okay, you just need to go down to aisle two
00:00:39.600 | and they'll be on the left or they'll take you there.
00:00:42.600 | That is basically what generative question answering is.
00:00:45.840 | You're asking something, a question in natural language,
00:00:50.840 | and that something is generating natural language back to you
00:00:55.040 | that answers your question.
00:00:56.600 | Now, I think one of the best ways to explain
00:00:59.160 | this idea of generative question answering is to show you.
00:01:03.400 | So we don't have that much data behind this,
00:01:06.000 | there's a little bit over 6,000 examples
00:01:09.440 | from the Hugging Face, PyTorch, TensorFlow
00:01:12.520 | and Streamlit forums, but it is enough for us
00:01:15.680 | to get some interesting answers from this.
00:01:19.000 | So what I'm gonna do here is I'm going to say,
00:01:23.120 | I want to get a paragraph about a question.
00:01:25.440 | And I'm gonna say, what is the difference
00:01:29.520 | or what are the differences between TensorFlow and PyTorch?
00:01:34.520 | And I'm going to ask it this question.
00:01:37.000 | We can limit the amount of data that it's going to pull from
00:01:40.640 | by adjusting top k. At the moment,
00:01:42.280 | it's going to return five items of information.
00:01:44.920 | But I think that should be enough for this.
00:01:47.280 | So let's go and get this.
00:01:49.480 | So they are two of the most popular deep learning frameworks,
00:01:53.360 | PyTorch is a Python-based library developed by Facebook,
00:01:56.160 | while TensorFlow is a library developed by Google.
00:01:58.160 | Both frameworks are open source, large community.
00:02:01.040 | Main difference is that PyTorch is more intuitive
00:02:03.680 | and easy to use, while TensorFlow is more powerful
00:02:08.360 | and has more features.
00:02:10.160 | PyTorch is better suited for rapid programming and research,
00:02:12.640 | while TensorFlow is better for production and deployment.
00:02:16.760 | PyTorch is more Pythonic,
00:02:18.840 | whereas TensorFlow is more convenient
00:02:20.760 | in the industry for prototyping, deployment
00:02:22.840 | and scalability.
00:02:24.240 | And then it even encourages us to learn both frameworks
00:02:27.560 | to get the most out of our deep learning projects.
00:02:30.040 | If we have a look at where this information is coming from,
00:02:32.360 | it's mostly TensorFlow.
00:02:33.720 | So maybe there's some bias there,
00:02:35.840 | and we can actually click on these
00:02:37.040 | and it will take us to those original discussions.
00:02:40.240 | So we can actually see where this model
00:02:43.440 | is getting this information from.
00:02:45.200 | So we're not just relying on this actually being the case.
00:02:48.920 | So another thing we can ask here
00:02:50.880 | is something that is perhaps more factual
00:02:54.400 | rather than opinionated,
00:02:56.360 | is how do I use gradient tape in TensorFlow?
00:03:01.360 | And let's see what it says.
00:03:04.400 | Again, we get this.
00:03:05.240 | So it's a powerful tool in TensorFlow
00:03:06.680 | that allows us to calculate gradients,
00:03:08.280 | of a computation with respect to its input variables.
00:03:11.200 | And then to use gradient tape,
00:03:13.400 | you first need to create a tape object
00:03:15.680 | and then record the forward pass of the model.
00:03:17.960 | So after the forward pass is recorded,
00:03:19.640 | you can then calculate gradients
00:03:21.120 | with respect to those variables.
00:03:23.240 | This can be done by calling the tape gradient method.
00:03:26.080 | And then we continue with some more information
00:03:29.160 | like how it can be used to calculate higher-order derivatives
00:03:32.800 | and then, and then, and so on.
00:03:34.840 | So that's pretty cool.
00:03:36.920 | But let's have a look at how we would actually build this.
00:03:40.120 | Now, when we're building generative
00:03:42.280 | question answering systems with AI,
00:03:44.640 | naturally we just replace the person
00:03:47.960 | with some sort of AI pipeline.
00:03:51.720 | Now, our generative question answering pipeline
00:03:55.120 | using AI is going to contain two primary items,
00:03:58.200 | a knowledge base and the AI models themselves.
00:04:01.600 | So the knowledge base,
00:04:03.520 | we can think of it as a model's long-term memory.
00:04:07.560 | Okay, so in the shop assistant example,
00:04:10.280 | the shop assistant has been working in that store for a while.
00:04:13.520 | They have a long-term memory
00:04:15.680 | of roughly where different items are,
00:04:17.520 | maybe they're actually very good
00:04:19.000 | and know exactly where different items are.
00:04:21.680 | And they just have this large knowledge base
00:04:26.080 | of everything in that store.
00:04:29.040 | And when we ask that question,
00:04:31.360 | we are producing our natural language prompt
00:04:34.840 | or query to that person.
00:04:36.920 | Something in their brain is translating that language
00:04:41.800 | into some sort of concept or idea.
00:04:44.520 | And they're looking at their knowledge base
00:04:47.000 | based on their past experience,
00:04:49.520 | attaching that query to some nearby points
00:04:54.520 | within their knowledge base,
00:04:56.240 | and then returning that information back to us
00:04:58.520 | by generating some natural language
00:05:00.520 | that explains what we are looking for.
00:05:02.560 | So if we take a look at how this will look
00:05:05.200 | for our AI pipeline,
00:05:08.080 | we are going to have an embedding model here,
00:05:11.360 | and we have our query, our question, here,
00:05:14.840 | and we feed that into the embedding model.
00:05:17.120 | We're going to be using the Ada-002 model,
00:05:19.760 | which is a GPT 3.5 embedding model.
00:05:23.920 | That's going to translate that query
00:05:27.160 | into some sort of numerical representation
00:05:30.160 | that essentially represents the semantic meaning
00:05:33.800 | behind our query.
00:05:34.720 | So we get a vector in vector space.
00:05:37.240 | Now we then pass that to the knowledge base.
00:05:39.160 | The knowledge base is going to be Pinecone Vector Database.
00:05:42.760 | And from there, within that Pinecone Vector Database,
00:05:46.400 | just like the shop assistant,
00:05:47.480 | we're going to have essentially
00:05:49.440 | what we can think of as past experiences,
00:05:51.640 | or just information that has already been indexed.
00:05:55.800 | It's already been encoded by the GPT 3.5 embedding model
00:05:59.200 | and stored within Pinecone.
00:06:01.080 | So we essentially have a ton of index items like this.
00:06:06.080 | And then we introduce this query vector into here.
00:06:10.600 | So maybe it comes here,
00:06:12.280 | and we're going to say, just return,
00:06:14.280 | let's say in this example,
00:06:15.360 | we have a few more points around here,
00:06:17.280 | just return the top three items.
00:06:19.440 | So in this case, we're going to return this,
00:06:22.680 | this, and this.
00:06:24.160 | So the top three most similar items.
00:06:26.360 | And then we pass all of those to another model.
00:06:29.000 | So this is going to be another OpenAI model.
00:06:32.320 | So this is going to be a DaVinci text generation model.
00:06:36.000 | And here, these data points,
00:06:37.880 | we actually, right here,
00:06:39.400 | we're going to translate them back into text.
00:06:42.040 | So there'll be like natural language texts
00:06:44.000 | that contain relevant information to our particular query.
00:06:46.880 | We are going to take our query as well.
00:06:49.480 | We're going to bring that over here.
00:06:50.880 | We're going to feed that in there.
00:06:52.400 | And we're going to format them in a certain way.
00:06:54.040 | And we'll dive into that a little more later on.
00:06:56.400 | DaVinci will be able to complete our questions.
00:07:01.400 | So we're going to ask a question.
00:07:03.200 | We have our original query.
00:07:04.960 | We're going to say,
00:07:05.800 | answer that based on the following information.
00:07:08.800 | And then that's where we pass in the information
00:07:10.960 | we've returned from Pinecone.
00:07:12.640 | And it will be able to answer our questions
00:07:15.960 | with incredible accuracy.
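
For reference, the whole pipeline just described can be sketched in a few lines of Python. This is an illustrative sketch, not the exact notebook code: it assumes `openai.api_key` and `pinecone.init(...)` have already been set up, that each Pinecone record keeps its plain text under a `text` metadata field, and that the prompt wording is representative rather than verbatim.

```python
import openai
import pinecone

def answer(query: str, index: pinecone.Index, top_k: int = 5) -> str:
    # 1) Encode the query with the GPT-3.5-era embedding model (ada-002).
    xq = openai.Embedding.create(
        input=query, engine="text-embedding-ada-002"
    )["data"][0]["embedding"]

    # 2) Retrieve the most similar indexed passages, including their plain text.
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)
    contexts = [m["metadata"]["text"] for m in res["matches"]]

    # 3) Ask the DaVinci generation model to answer using only the retrieved text.
    prompt = (
        "Answer the question based on the context below.\n\n"
        "Context:\n" + "\n\n---\n\n".join(contexts)
        + f"\n\nQuestion: {query}\nAnswer:"
    )
    completion = openai.Completion.create(
        engine="text-davinci-003", prompt=prompt, temperature=0, max_tokens=400
    )
    return completion["choices"][0]["text"].strip()
```
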
00:07:18.040 | Now, one question we might have here
00:07:20.600 | is why not just generate the answer from the query directly?
00:07:25.600 | And that can work in some cases.
00:07:28.360 | And it will actually work
00:07:29.360 | in a lot of general knowledge use cases.
00:07:31.640 | Like if we ask something like,
00:07:33.520 | who was the first man on the moon?
00:07:34.920 | It's probably going to say Neil Armstrong
00:07:36.680 | without struggling too much.
00:07:38.960 | That's a very obvious fact.
00:07:41.480 | But if we ask it more specific questions,
00:07:44.000 | then it can struggle
00:07:46.080 | or we can suffer from something called hallucination.
00:07:49.560 | Now, hallucination is where a text generation model
00:07:53.160 | is basically giving text that seems like it's true,
00:07:57.720 | but it's actually not true.
00:07:59.000 | And that is just a natural consequence
00:08:02.000 | of what these models do.
00:08:03.600 | They generate human language.
00:08:05.600 | They do it very well,
00:08:07.040 | but they aren't necessarily tied to being truthful.
00:08:12.040 | They can just make something up.
00:08:14.000 | And that can be quite problematic.
00:08:15.560 | Imagine like in the medical domain
00:08:17.760 | and this model may begin generating text
00:08:21.120 | about a patient's diagnosis,
00:08:23.520 | which seems very plausible using scientific terminology
00:08:27.240 | and statistics and so on.
00:08:29.520 | But in reality, all of that may be completely false.
00:08:32.600 | It's just making it up.
00:08:34.040 | So by adding in this component of a knowledge base
00:08:38.880 | and forcing the model to pull information
00:08:41.320 | from the knowledge base and then answer the question
00:08:43.400 | based on the information we pulled
00:08:45.320 | from that knowledge base,
00:08:46.760 | we are forcing the model to almost be truthful
00:08:51.160 | and base its answers on actual facts,
00:08:55.400 | as long as our knowledge base actually does contain facts.
00:08:58.720 | And at the same time,
00:09:00.160 | we can also use it for domain adaptation.
00:09:02.360 | So whereas a typical language generation model
00:09:06.200 | may not have a very good understanding
00:09:09.040 | of a particular domain,
00:09:10.600 | we can actually help it out
00:09:13.200 | by giving it this knowledge from a particular domain
00:09:16.200 | and then asking it to answer a question
00:09:17.840 | based on that knowledge that we've extracted.
00:09:20.480 | So there are a few good reasons for doing this.
00:09:24.120 | Now to do this,
00:09:25.040 | we are going to essentially do three things.
00:09:28.680 | We first need to index all of our data.
00:09:31.160 | So we're going to take all of our text,
00:09:32.960 | we're going to feed it into the embedding model
00:09:35.800 | and that will create our vectors.
00:09:37.960 | Let's just say they're here
00:09:39.760 | and we pass them into Pinecone.
00:09:41.760 | And this is the indexing stage.
00:09:44.160 | Now to perform this indexing stage,
00:09:47.280 | we're going to come over to this repo here,
00:09:49.720 | so Pinecone I/O samples.
00:09:51.960 | And at the moment,
00:09:52.840 | these notebooks are stored here,
00:09:54.960 | but I will leave a link on this video
00:09:58.240 | that will keep updated to these notebooks
00:10:00.800 | just in case they do change in the future.
00:10:02.760 | We come to here and this is our notebook.
00:10:05.960 | Now we can also open it in Colab.
00:10:08.120 | So I'm going to do that over here.
00:10:09.800 | Again, I'm going to make sure
00:10:10.880 | there's a link in this video right now
00:10:12.840 | so that you can click that link
00:10:14.480 | and just go straight to this.
00:10:16.480 | So the first thing we need to do
00:10:17.560 | is install our prerequisites,
00:10:19.440 | OpenAI, Pinecone client, and datasets.
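
For reference, a minimal install cell would look something like this (package names inferred from the libraries mentioned; pin versions if you need reproducibility):

```python
# OpenAI client, Pinecone client, and Hugging Face datasets.
!pip install -qU openai pinecone-client datasets
```
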
00:10:21.640 | Now I have a few videos
00:10:23.680 | that go through basically this process.
00:10:25.840 | So if you'd like to understand this in more detail,
00:10:30.320 | I will link to one of those videos.
00:10:32.800 | But right now,
00:10:33.640 | I'm just going to go through it pretty quickly.
00:10:35.160 | So first is Dataprep.
00:10:37.200 | We do this all the time.
00:10:38.160 | Of course, we're going to be loading this dataset here,
00:10:41.320 | MLQA, which is just a set of question answering
00:10:45.200 | based on ML questions
00:10:46.800 | across a few different forums.
00:10:50.120 | So like Hugging Face, PyTorch, and a few others.
00:10:53.320 | I'm not going to go through this,
00:10:54.640 | but all we're doing essentially
00:10:56.000 | is removing excessively long items from there.
00:10:59.440 | Now, this bit is probably more important.
00:11:02.920 | So when we're creating these embeddings,
00:11:05.000 | what we're going to do is include quite a lot of information.
00:11:07.240 | So we're going to include the thread title or topic,
00:11:10.280 | the question that was asked
00:11:11.520 | and the answer that we've been given.
00:11:14.600 | So we do that here.
00:11:16.360 | And you can kind of see like thread title is this,
00:11:19.240 | and then you have the question asked,
00:11:22.160 | and then later on,
00:11:23.560 | you have the answer a little further on, which is here.
00:11:28.400 | So we're going to throw all of that
00:11:30.760 | into our Ada-002 embedding model,
00:11:34.400 | and it can embed a ton of text.
00:11:35.880 | I think it's up to around 10 pages of text.
00:11:38.680 | So this is actually not that much for this model.
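
A rough sketch of that preparation step is below. The dataset identifier and column names here are hypothetical placeholders standing in for the ML Q&A forum data used in the notebook; only the overall shape (filter out long records, then concatenate thread title, question, and answer into one text field) reflects what was described.

```python
from datasets import load_dataset

# Hypothetical dataset ID and columns; the real notebook loads an ML Q&A set
# scraped from the Hugging Face, PyTorch, TensorFlow and Streamlit forums.
data = load_dataset("your-org/ml-qa-forums", split="train")

# Drop excessively long records (assumed character threshold).
data = data.filter(lambda x: len(x["answer"]) < 4000)

# Combine thread title, question and answer into the single text field that
# will be fed to the ada-002 embedding model.
data = data.map(lambda x: {
    "text": f"Thread title: {x['thread_title']}\n\n"
            f"Question: {x['question']}\n\n"
            f"Answer: {x['answer']}"
})
```
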
00:11:41.880 | So to actually embed the things,
00:11:43.160 | we need an OpenAI API key.
00:11:45.720 | I will show you how to do that just in case you don't know.
00:11:48.520 | So we go to beta.openai.com/signup,
00:11:52.680 | and you create an account if you don't already have one.
00:11:54.840 | I do, so I'm going to log in.
00:11:57.240 | And then you need to go up to your profile on the top right,
00:12:01.320 | click View API Keys.
00:12:03.040 | And you may need to create a new secret key
00:12:05.960 | if you don't have any of these saved elsewhere.
00:12:08.200 | Now, once you do have those,
00:12:09.400 | you just paste it into here and then continue running.
00:12:12.160 | If you run this,
00:12:13.920 | basically you should get a list of these
00:12:15.800 | if you have authenticated.
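
With the key pasted in, authenticating with the (pre-1.0) openai Python client and sanity-checking it looks roughly like this:

```python
import openai

openai.api_key = "OPENAI_API_KEY"  # paste your secret key here

# If the key is valid, this returns a populated list of engines rather than
# raising an authentication error.
print(openai.Engine.list())
```
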
00:12:17.360 | Yeah, let's go through these a bit quicker.
00:12:19.120 | So we have the embedding model here, Ada-002.
00:12:22.680 | Create the embeddings here, so on and so on.
00:12:26.240 | Here, we're creating our Pinecone index.
00:12:29.720 | So we're going to call it OpenAI ML QA.
00:12:32.480 | We need to get an API key again for that,
00:12:36.160 | which we can find at pinecone.io.
00:12:39.800 | And then you just go into the left here after signing up
00:12:43.480 | and then copy your API key.
00:12:46.120 | And then of course, put it in here.
00:12:47.680 | We create the index and connect to the index.
00:12:52.240 | Okay, so that's initializing everything.
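
Put together, the Pinecone setup described here is roughly the following. The environment string is an assumption, so use whatever your own Pinecone project shows; the index name matches the one mentioned above, and ada-002 embeddings are 1536-dimensional.

```python
import pinecone

pinecone.init(
    api_key="PINECONE_API_KEY",   # copied from the Pinecone console
    environment="us-west1-gcp"    # assumption: use your project's environment
)

index_name = "openai-ml-qa"

# Create the index once; ada-002 vectors have 1536 dimensions.
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536, metric="cosine")

# Connect to the index for upserts and queries.
index = pinecone.Index(index_name)
```
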
00:12:54.760 | After that, we just need to index everything.
00:12:56.680 | So we're going through our text here in these batches of 128.
00:13:01.640 | We are encoding everything here.
00:13:04.760 | So we're encoding everything in these batches of 128.
00:13:07.640 | We're getting relevant metadata,
00:13:09.360 | so that's going to be the plain text associated
00:13:11.560 | with each one of our, what will be vector embeddings.
00:13:15.840 | And we also have the embeddings.
00:13:18.280 | So we just clean them up from the response format here
00:13:22.680 | and we pull those together.
00:13:24.240 | So the IDs are just unique IDs,
00:13:26.920 | which is actually just a count in this case,
00:13:28.800 | embeddings themselves and the metadata.
00:13:31.360 | Then we add all those to Pinecone.
00:13:33.080 | Okay, pretty straightforward.
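
In code, that indexing loop is roughly the following sketch. It assumes the combined title-question-answer strings live in a list called `texts`; the batch size of 128, the count-based IDs, and the plain-text metadata match what was just described.

```python
batch_size = 128

for i in range(0, len(texts), batch_size):
    batch = texts[i:i + batch_size]

    # Embed the whole batch with ada-002 in a single call.
    res = openai.Embedding.create(input=batch, engine="text-embedding-ada-002")
    embeds = [record["embedding"] for record in res["data"]]

    # IDs are just the running count; metadata keeps the plain text so it can
    # be handed to the generation model at query time.
    ids = [str(n) for n in range(i, i + len(batch))]
    metadata = [{"text": text} for text in batch]

    # Upsert (id, vector, metadata) tuples into Pinecone.
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
```
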
00:13:35.600 | And then we have these 6,000 vectors in Pinecone
00:13:39.960 | that we can then use as our knowledge base.
00:13:42.480 | Now, usually you probably want way more items than this,
00:13:46.120 | but this is, I think, good enough for this example.
00:13:48.640 | Now, after everything is indexed,
00:13:51.280 | we move on to querying.
00:13:53.480 | So this is the next step, not the final step.
00:13:56.440 | So we have a user query and what we're going to do is,
00:13:59.840 | it's going to look very similar to what we already did.
00:14:02.520 | We're going to pass that into the text-embedding-
00:14:05.400 | ada-002 model.
00:14:07.000 | So this is the GPT 3.5 embedding model.
00:14:12.000 | That gives us our query vector, which we'll call XQ.
00:14:16.360 | And then we pass that to Pinecone.
00:14:17.640 | Now, Pinecone has metadata attached to each of our vectors.
00:14:22.640 | So let's say we return three of these vectors.
00:14:26.280 | We also have the metadata attached to it.
00:14:28.720 | Okay, and we return that to ourselves.
00:14:32.400 | So this is the querying set.
00:14:34.360 | But then later on, we're also going to pass that
00:14:36.240 | onto the generation model,
00:14:38.040 | which will produce a more intelligent answer
00:14:41.360 | using all of the contexts that we've returned.
00:14:43.520 | So to begin making queries,
00:14:44.720 | we're going to be using this zero making queries notebook
00:14:47.840 | here.
00:14:48.680 | And again, I'm going to open it up in Colab.
00:14:52.400 | Now I'm actually going to run through this one
00:14:53.800 | because we're going to be moving on to generation stuff,
00:14:56.200 | which I haven't covered much before.
00:14:57.680 | And I want to go through it in a little more detail.
00:14:59.560 | So I'm going to run this, which is just the prerequisites.
00:15:03.120 | And then we'll move on to initializing everything.
00:15:05.240 | So we have the Pinecone vector database,
00:15:08.240 | the embedding and the generation models.
00:15:10.560 | First, we start with Pinecone.
00:15:12.720 | I'm going to run it again
00:15:13.560 | because I need to connect to the index.
00:15:16.240 | So this is saved in the variable called Pinecone key.
00:15:20.440 | And here, we're just connecting to the index
00:15:22.680 | and we're going to describe the index stats.
00:15:24.360 | So because we already populated it,
00:15:26.040 | we already did the indexing,
00:15:27.200 | we should see that there are already values in there
00:15:30.320 | or vectors.
00:15:31.320 | And we can see that here.
00:15:32.280 | So we have the 6,000 in there.
00:15:34.560 | And here, we're going to need
00:15:36.040 | to initialize the OpenAI models.
00:15:38.880 | For that, we need an OpenAI key.
00:15:41.240 | And again, I've saved that in a variable
00:15:43.000 | called OpenAI key.
00:15:44.720 | Okay.
00:15:45.560 | Great.
00:15:48.360 | So the embedding model,
00:15:50.640 | let's see how we actually query with that.
00:15:52.280 | So we initialize the name of it there.
00:15:54.520 | So text-embedding-ada-002.
00:15:56.320 | This needs to be the same as the model
00:15:57.960 | that we use during indexing.
00:15:59.920 | And I'm going to say, like we saw earlier,
00:16:02.080 | what are the differences between PyTorch and TensorFlow?
00:16:05.200 | Let's ask that question.
00:16:06.400 | So here, we're creating our query vector.
00:16:09.280 | Here, we're just extracting it from the response.
00:16:11.320 | And then we use that query vector to query Pinecone.
00:16:13.720 | So index.query, I'm going to return top three items.
00:16:17.400 | Let's see what it returns.
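
Those few cells boil down to something like this sketch; the only hard requirement is that the embedding model matches the one used at indexing time.

```python
query = "What are the differences between PyTorch and TensorFlow?"

# Embed the query with the same model used during indexing.
res = openai.Embedding.create(input=query, engine="text-embedding-ada-002")
xq = res["data"][0]["embedding"]

# Retrieve the three most similar records along with their plain-text metadata.
matches = index.query(vector=xq, top_k=3, include_metadata=True)
for match in matches["matches"]:
    print(match["score"], match["metadata"]["text"][:200])
```
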
00:16:19.200 | And we can see here, we have this general discussion.
00:16:21.400 | I think this post might help you.
00:16:22.640 | Come down, it's PyTorch versus TensorFlow.
00:16:25.480 | So yeah, probably pretty relevant.
00:16:27.040 | And we have three of those.
00:16:28.240 | So we come down a little more.
00:16:29.280 | We have another one here.
00:16:31.480 | This is on about TensorFlow.js in this case.
00:16:34.400 | And so on.
00:16:36.720 | So within that, we should have a few relevant contexts
00:16:41.040 | that we can then feed into the generation model
00:16:44.880 | as basically sources of truth for whatever it is it's saying.
00:16:49.760 | Now, the generation stage is actually very simple.
00:16:52.000 | We have our contexts down here.
00:16:54.800 | We are feeding them up into here.
00:16:56.640 | We're taking our query.
00:16:58.320 | So we're going to have our query question.
00:17:00.760 | And then we're going to have these contexts
00:17:02.240 | followed onto the end of it or appended onto the end of it.
00:17:05.400 | Now, actually, before this,
00:17:06.840 | we're also going to have like a little statement,
00:17:09.440 | which is basically telling the generation model
00:17:13.680 | how to or what to do based on instruction.
00:17:17.760 | And the instruction is going to be something like,
00:17:19.720 | answer the following question given the following context.
00:17:24.720 | And I should actually, so this question
00:17:28.480 | is actually going to be at the end of this here.
00:17:31.840 | Okay, so we're going to have the instruction
00:17:34.040 | followed by the context.
00:17:36.200 | So the ground truth of information
00:17:38.640 | that we returned from our vector database,
00:17:41.440 | and then we're going to have the question.
00:17:42.960 | Now let's see what that actually looks like.
00:17:45.520 | So come down to here, we have a limit
00:17:48.360 | on how much we're going to be feeding
00:17:49.920 | into the generation model.
00:17:51.440 | We're just setting that there.
00:17:53.320 | We have the context.
00:17:54.320 | So this is what we returned from Pinecone up here.
00:17:57.400 | Okay, so this text that you can just about see here.
00:18:01.120 | Okay, so it's within the metadata, it's a context
00:18:03.480 | from the matches that we got there.
00:18:05.920 | And then you can see a prompt that we're building.
00:18:07.840 | Okay, so we're going to say,
00:18:09.160 | answer the question based on the context below.
00:18:12.400 | So then we have context.
00:18:14.080 | That's where we feed in our data that we retrieved.
00:18:17.920 | Now that data retrieved is obviously a few items.
00:18:21.160 | So we join all those with basically these separators here.
00:18:25.000 | And then after we have put that data in there,
00:18:28.880 | we have the question, that's our query.
00:18:32.200 | And then because this generation model is basically
00:18:34.880 | just taking forward the text that we have so far
00:18:37.480 | and continuing it, to keep with the format
00:18:41.560 | of context, question, answer, we end the prompt with 'Answer:'.
00:18:46.560 | And then GPT is going to just complete this query.
00:18:51.040 | It's going to continue, it's going to generate text.
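
So the final prompt handed to DaVinci has roughly this shape (a sketch; the notebook's exact instruction wording may differ slightly):

```python
contexts = [m["metadata"]["text"] for m in matches["matches"]]

prompt = (
    "Answer the question based on the context below.\n\n"
    "Context:\n"
    + "\n\n---\n\n".join(contexts)       # retrieved passages, separated
    + f"\n\nQuestion: {query}\nAnswer:"  # the model completes after "Answer:"
)
```
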
00:18:53.320 | Now this is going to be basically the final structure,
00:18:56.480 | but this is not exactly how we built it.
00:18:59.240 | We actually need to remove this bit here
00:19:02.080 | because we're going to, in the middle here,
00:19:05.000 | we're going to use a bit of logic
00:19:06.240 | so that we're not putting too many of these contexts in here.
00:19:09.440 | So here, what we would do is we'll go through
00:19:12.240 | all of our contexts one at a time.
00:19:14.640 | So essentially we're going to start with context one,
00:19:17.040 | then we're going to try context two, context three,
00:19:20.680 | one at a time.
00:19:21.560 | So we're going to start with just the first context
00:19:23.880 | and then we move on in the second iteration
00:19:26.920 | with the first two contexts and the first three contexts
00:19:29.960 | and so on and so on until we have
00:19:32.280 | all of those contexts in there.
00:19:33.800 | The exception here is if the context that we've gone up to
00:19:38.440 | exceeds the limit that we've set here.
00:19:41.680 | At that point, we would say, okay,
00:19:43.720 | the prompt is that number minus one.
00:19:46.880 | So let's say we get to the third context here,
00:19:49.240 | but it's too big.
00:19:50.200 | Then we just take the first two contexts, right?
00:19:53.080 | And then actually after that, we should have a break
00:19:56.480 | and that produces our prompt.
00:19:58.520 | Otherwise, if we get up to all of the contexts,
00:20:01.360 | we just join them all together like this.
00:20:03.600 | Okay, at that point, we move on to the completion point.
00:20:06.920 | So the text generation part.
00:20:09.960 | Now for text generation,
00:20:11.160 | we use OpenAI Completion Create.
00:20:13.520 | We're using the text-davinci-003 model.
00:20:16.720 | We're passing our prompt, temperature,
00:20:18.280 | so like the randomness in the generation.
00:20:21.880 | We say it's zero because we kind of want it
00:20:23.800 | to be pretty accurate.
00:20:26.880 | If we want more interesting answers,
00:20:29.360 | then we can increase the temperature.
00:20:31.480 | We have maximum tokens there.
00:20:33.200 | So how many things should the model generate
00:20:37.440 | or how many tokens should the model generate?
00:20:39.720 | And then a few other generation variables.
00:20:43.360 | You can read up on these in the OpenAI docs.
00:20:47.560 | Okay, one thing here,
00:20:48.760 | if you did want to stop on a particular character,
00:20:51.160 | like maybe you wanted to stop on a full stop,
00:20:53.680 | you can do so by specifying that there.
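
Altogether the completion call looks roughly like this; temperature zero matches the walkthrough, while the token cap and the remaining parameters are typical defaults rather than confirmed values.

```python
res = openai.Completion.create(
    engine="text-davinci-003",  # the DaVinci text generation model
    prompt=prompt,
    temperature=0,              # keep the answer deterministic and factual
    max_tokens=400,             # assumed cap on generated tokens
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None,                  # e.g. "." to stop at the first full stop
)

# The generated answer, with any leading/trailing whitespace stripped.
print(res["choices"][0]["text"].strip())
```
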
00:20:56.960 | So we run this and now we can check what has been generated.
00:20:59.840 | So we have here,
00:21:01.840 | so we go response choices, zero text,
00:21:05.160 | and then we just strip that text
00:21:06.600 | so maybe space on either end.
00:21:08.600 | And it says, okay, if you ask me for my personal opinion,
00:21:10.720 | I find TensorFlow more convenient in the industry,
00:21:13.760 | prototyping, deployment, scalability is easier,
00:21:16.560 | and PyTorch more handy in research,
00:21:19.080 | more Pythonic and easier to implement complex stuff.
00:21:22.440 | So we see it's pretty aligned to the answer we got earlier on.
00:21:25.760 | Now let's try another one.
00:21:26.880 | So again, what are the differences
00:21:28.400 | between PyTorch and TensorFlow?
00:21:30.520 | And what we're going to do is actually modify the prompt.
00:21:33.520 | So I'm going to say, give an exhaustive summary
00:21:35.600 | and answer based on the question using the context below.
00:21:39.920 | So actually give an exhaustive summary
00:21:42.240 | and answer based on the question.
00:21:44.480 | So we have a slightly different prompt.
00:21:45.680 | Let's see what we get.
00:21:46.880 | So before it was a pretty short answer.
00:21:48.880 | This, I'm saying exhaustive summary and then answer.
00:21:52.160 | So it should be longer, I would expect.
00:21:54.400 | So let's come down here and yeah, we can see it's longer.
00:21:57.320 | We see PyTorch and TensorFlow,
00:21:59.360 | two of the most popular major machine learning libraries.
00:22:02.280 | TensorFlow is maintained and released by Google
00:22:05.120 | while PyTorch is maintained and released by Facebook.
00:22:07.600 | TensorFlow is more convenient in this industry,
00:22:09.640 | prototyping, deployment, and scalability.
00:22:12.040 | So we're still getting the same information in there,
00:22:15.200 | but it's being generated in a different way.
00:22:17.760 | And it also even says PyTorch is more handy in research,
00:22:21.280 | it's more Pythonic, and so on and so on.
00:22:23.480 | It also includes this here,
00:22:24.960 | TensorFlow.js has several unique advantages
00:22:26.960 | over its Python equivalent, as it can run on the client side too.
00:22:31.640 | So I suppose it's saying TensorFlow.js
00:22:33.880 | versus TensorFlow in Python,
00:22:36.160 | which is not directly related to PyTorch.
00:22:39.920 | But as far as I know,
00:22:40.760 | there isn't a JavaScript version of PyTorch.
00:22:43.440 | So in reality, the fact that you do have TensorFlow.js
00:22:47.120 | is in itself an advantage.
00:22:49.320 | So I can see why that's been pulled in,
00:22:52.000 | but maybe it's not explained very well in this generation.
00:22:55.800 | Okay, so that's it for this example
00:22:58.720 | on generative question answering using Pinecone
00:23:03.360 | and OpenAI's embedding models and also generation models.
00:23:08.280 | Now, as you can see,
00:23:09.760 | this is incredibly easy to put together
00:23:12.520 | and incredibly powerful.
00:23:14.280 | Like we can build some insane things with this
00:23:18.640 | and it can go far beyond what I've shown here
00:23:21.520 | with a few different prompts,
00:23:24.080 | like asking for a bullet point list
00:23:26.120 | that explains what steps you need to take
00:23:28.120 | in order to do something
00:23:30.040 | is one really cool example that I like.
00:23:32.560 | And this is something that we will definitely explore
00:23:35.520 | more in the future,
00:23:36.920 | but for now, that's it for this video.
00:23:40.240 | I hope all this has been interesting and useful.
00:23:43.440 | So thank you very much for watching
00:23:45.600 | and I will see you again in the next one.