Generative Question-Answering with OpenAI's GPT-3.5 and Davinci
Chapters
0:00 Generative Question Answering with OpenAI
0:56 Example App for Generative QA
5:02 OpenAI Pinecone Stack Architecture
7:18 Dealing with LLM Hallucination
9:28 Indexing all data
13:47 Querying
16:37 Generation of answers
21:25 Testing some generative question answers
22:55 Final notes
So what I mean by generative question answering is something like this: imagine you go into a store and you ask the attendant, where is this particular item? Or maybe you're not entirely sure what the item is, so you describe it: where can I find those things that taste kind of like cherries but look like strawberries? And hopefully the attendant will be able to help. They will say, okay, you just need to go down to aisle two and they'll be on the left, or they'll take you there. That is basically what generative question answering is. You're asking something a question in natural language, and that something is generating natural language back to you as an answer. Now, I think the easiest way to explain this idea of generative question answering is to show you.
So here's an example app. It's built on a relatively small amount of data from ML community forums, like the Hugging Face and Streamlit forums, but it is enough for us to demonstrate the idea. So what I'm going to do here is ask a question, something like: what are the differences between TensorFlow and PyTorch? We can limit the amount of data that it's going to pull from the knowledge base; in this case, it's going to return five items of information.
And we get an answer. So, they are two of the most popular deep learning frameworks. PyTorch is a Python-based library developed by Facebook, while TensorFlow is a library developed by Google. Both frameworks are open source and have large communities. The main difference is that PyTorch is more intuitive and easy to use, while TensorFlow is more powerful. PyTorch is better suited for rapid prototyping and research, while TensorFlow is better for production and deployment. And then it even encourages us to learn both frameworks to get the most out of our deep learning projects.
If we have a look at where this information is coming from, we can click through to the sources, and it will take us to those original discussions. So we're not just relying on the generated answer actually being the case; we can check. Another example: asking about gradient tape in TensorFlow, the answer explains that it is used to calculate the gradients of a computation with respect to its input variables. We create a gradient tape and then record the forward pass of the model; computing the gradients can then be done by calling the tape.gradient method. And then we continue with some more information, like how we can use it to calculate higher-order derivatives.
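As a quick, minimal sketch of what that generated answer is describing (my own example, not code from the video):

```python
import tensorflow as tf

x = tf.Variable(3.0)

# record the forward pass on the tape
with tf.GradientTape() as tape:
    y = x ** 2

# gradient of y with respect to the input variable x: dy/dx = 2x = 6.0
dy_dx = tape.gradient(y, x)
print(dy_dx)  # tf.Tensor(6.0, shape=(), dtype=float32)
```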
But let's have a look at how we would actually build this. Now, our generative question answering pipeline is going to contain two primary items: a knowledge base and the AI models themselves. The knowledge base, we can think of it as the model's long-term memory. Going back to the earlier analogy, the shop assistant has been working in that store for a while, so they already know where everything is. Something in their brain is translating our question into a search over that long-term memory, and then returning that information back to us in natural language. In our pipeline, we are going to have an embedding model here; when a user asks a question, we feed that into the embedding model, and out of it we get a query vector that essentially represents the semantic meaning of that question.
The knowledge base is going to be the Pinecone vector database. And within that Pinecone vector database, we have many vectors representing chunks of text, or just information that has already been indexed. So we essentially have a ton of indexed items like this. And then we introduce this query vector into here, and we retrieve the indexed vectors that are most similar to it. And then we pass all of those to another model. So this is going to be a Davinci text generation model. Because each returned record carries its original text as metadata, we're going to translate the results back into text, giving us chunks of text that contain relevant information to our particular query. And we're going to format them in a certain way, and we'll dive into that a little more later on, so that Davinci will be able to complete our question. Essentially, we're saying: here is a question, answer that based on the following information. And then that's where we pass in the information we retrieved from the knowledge base.
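In rough pseudocode, the flow looks like this (every name here is a hypothetical placeholder; the concrete calls come later in the video):

```python
# 1. the embedding model turns the natural-language question into a query vector
xq = embedding_model.encode(question)

# 2. the knowledge base (Pinecone) returns the most similar indexed vectors,
#    along with the original text stored as metadata
matches = knowledge_base.query(xq, top_k=5)
contexts = [m.metadata["text"] for m in matches]

# 3. the Davinci generation model answers the question given those contexts
answer = generation_model.complete(format_prompt(question, contexts))
```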
Now, one question you might have is: why not just generate the answer from the query directly? The problem is that the model may simply not know the answer, or we can suffer from something called hallucination. Now, hallucination is where a text generation model is basically giving text that seems like it's true, but actually isn't. These models are trained to produce convincing language, but they aren't necessarily tied to being truthful. They can produce an answer which seems very plausible, using scientific terminology and so on, but in reality, all of that may be completely false. So by adding in this component of a knowledge base, and instructing the model to pull information from that knowledge base and then answer the question based on it, we are forcing the model to almost be truthful, at least as long as our knowledge base actually does contain facts. So whereas a typical language generation model may know nothing about our particular domain, by giving it this knowledge from a particular domain, we can get answers based on that knowledge that we've extracted. So there are a few good reasons for doing this.
For the first step, indexing all of our data, we take each record and we're going to feed it into the embedding model. There's a notebook that walks through all of this, so if you'd like to understand it in more detail you can go through that; I'm just going to go through it pretty quickly. Of course, we're going to be loading this dataset here, ML Q&A, which is just a set of question-answering threads pulled from several ML community forums, like the Hugging Face, PyTorch, and a few other forums. One of the preprocessing steps is removing excessively long items from there. Then, for each record, what we're going to do is include quite a lot of information. So we're going to include the thread title or topic, the question, and the answer. And you can kind of see, like, thread title is this, and you have the answer a little further on, which is here. So this is actually not that much data for this model. Next, you'll need an OpenAI API key, and I will show you how to get that just in case you don't know. You go to the OpenAI site and you create an account if you don't already have one. And then you need to go up to your profile on the top right and view your API keys, creating a new secret key if you don't have any of these saved elsewhere. Once you have the key, you just paste it into here and then continue running.
So we have the embedding model here, text-embedding-ada-002. You'll also need a Pinecone API key; after signing up, you just go into the console and find the API keys on the left. We create the index and connect to the index. After that, we just need to index everything. So we're going through our text here in these batches of 128, encoding everything in those batches of 128. We also attach the metadata, so that's going to be the plain text associated with each one of our, what will be, vector embeddings. So we just clean them up from the response format here and upsert everything. And then we have these 6,000 vectors in Pinecone, ready to query. Now, usually you probably want way more items than this, but this is, I think, good enough for this example.
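A minimal sketch of that indexing step, assuming the pre-v1 openai and pinecone-client libraries from the time of this video; names like "gen-qa", the environment string, and the records list of prepared text chunks are placeholders:

```python
import openai
import pinecone

openai.api_key = "OPENAI_API_KEY"
pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")

# text-embedding-ada-002 returns 1536-dimensional vectors
if "gen-qa" not in pinecone.list_indexes():
    pinecone.create_index("gen-qa", dimension=1536, metric="cosine")
index = pinecone.Index("gen-qa")

batch_size = 128
for i in range(0, len(records), batch_size):
    batch = records[i:i + batch_size]
    res = openai.Embedding.create(input=batch, engine="text-embedding-ada-002")
    embeds = [r["embedding"] for r in res["data"]]  # clean up the response format
    ids = [str(i + j) for j in range(len(batch))]
    meta = [{"text": text} for text in batch]       # keep the plain text as metadata
    index.upsert(vectors=list(zip(ids, embeds, meta)))
```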
So querying, this is the next step, but not the final step. We have a user query, and what we're going to do is going to look very similar to what we already did. We're going to pass that into the text embedding model. That gives us our query vector, which we'll call xq. Now, Pinecone has metadata attached to each of our vectors, so when we return, let's say, three of these vectors, we also get their plain text back with them. But then later on, we're also going to pass that text to the generation model, which produces a more intelligent answer using all of the contexts that we've returned.
Here, we're going to be using this making-queries notebook. Now, I'm actually going to run through this one, because we're going to be moving on to the generation steps, and I want to go through it in a little more detail. So I'm going to run this, which is just the prerequisites, and then we'll move on to initializing everything. So this is setting the variable called Pinecone key. Once we're connected to the index, we should see that there are already vectors in there from the indexing we did before. Then we take our question, what are the differences between PyTorch and TensorFlow?, and create the query embedding; here, we're just extracting the query vector from the response. And then we use that query vector to query Pinecone. So index.query, and I'm going to return the top three items. And we can see here, we have this general discussion returned, among others. So within that, we should have a few relevant contexts that we can then feed into the generation model as, basically, sources of truth for whatever it is it's saying.
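That querying step looks roughly like this, again assuming the pre-v1 openai and pinecone-client interfaces (the index object is the one connected during indexing):

```python
query = "What are the differences between PyTorch and TensorFlow?"

# embed the query and extract the query vector xq from the response
res = openai.Embedding.create(input=[query], engine="text-embedding-ada-002")
xq = res["data"][0]["embedding"]

# return the top three most similar records, including their plain-text metadata
res = index.query(xq, top_k=3, include_metadata=True)
contexts = [match["metadata"]["text"] for match in res["matches"]]
```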
Now, the generation stage is actually very simple. Essentially, we take the contexts, with our question followed onto the end of it, or appended onto the end of it. At the start, we're also going to have, like, a little statement, which is basically telling the generation model what to do with everything that follows. And the instruction is going to be something like: answer the following question given the following context. The question itself is actually going to be at the end of this here. So this is what we returned from Pinecone up here. Okay, so this text that you can just about see here; it's within the metadata, it's a context. And then you can see the prompt that we're building: answer the question based on the context below. That's where we feed in the data that we retrieved. Now, that data retrieved is obviously a few items, so we join all those with basically these separators here. And then, after we have put that data in there, we add the question. And then, because this generation model is basically just taking forward the text that we have so far and continuing it, to keep with the format of context then question, we end the prompt with "Answer:". And then GPT is going to just complete this; it's going to continue, it's going to generate text.
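Here is a sketch of that prompt construction; the separator and instruction wording follow what's described above, though the notebook's exact strings may differ slightly:

```python
prompt = (
    "Answer the question based on the context below.\n\n"
    "Context:\n"
    + "\n\n---\n\n".join(contexts)  # join the retrieved contexts with separators
    + f"\n\nQuestion: {query}\nAnswer:"
)
```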
Now, this is going to be basically the final structure of the prompt. But we also need to limit the prompt length, so that we're not putting too many of these contexts in here. So here, what we would do is go through the contexts one at a time. So essentially we're going to start with context one, then we're going to try context two, context three, and so on. So we're going to start with just the first context, then try with the first two contexts, and the first three contexts, building the prompt each time. The exception here is if the context that we've gone up to pushes the prompt over the limit. So let's say we get to the third context here and the prompt becomes too long; then we just take the first two contexts, right? And then actually, after that, we should have a break, so we stop there. Otherwise, if we get up to all of the contexts without hitting the limit, we just use all of them. Okay, at that point, we move on to the completion endpoint.
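A sketch of that limiting logic, assuming a hypothetical character budget (the notebook's actual limit and counting method may differ):

```python
def build_prompt(contexts, query):
    return (
        "Answer the question based on the context below.\n\n"
        "Context:\n" + "\n\n---\n\n".join(contexts)
        + f"\n\nQuestion: {query}\nAnswer:"
    )

limit = 3750  # assumed prompt budget in characters

# try one context, then two, then three, stopping before the prompt exceeds the limit
for i in range(1, len(contexts) + 1):
    if len(build_prompt(contexts[:i], query)) >= limit:
        prompt = build_prompt(contexts[:i - 1], query)
        break
else:
    # all of the contexts fit, so use every one of them
    prompt = build_prompt(contexts, query)
```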
Then, for the completion call, we set max_tokens, or how many tokens should the model generate. There is also a stop parameter, if you did want to stop on a particular character; like, maybe you wanted to stop on a full stop, you could set that here.
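The call itself, under the same pre-v1 openai assumptions (the max_tokens value here is just an example):

```python
res = openai.Completion.create(
    engine="text-davinci-003",  # the Davinci text generation model
    prompt=prompt,
    max_tokens=400,             # how many tokens the model may generate
    temperature=0.0,
    # stop="."                  # optional: stop generation on a particular character
)
answer = res["choices"][0]["text"].strip()
```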
So we run this, and now we can check what has been generated. And it says, okay, if you ask me for my personal opinion, I find TensorFlow more convenient in the industry, as prototyping, deployment, and scalability are easier, while PyTorch is more Pythonic and easier to implement complex stuff with. So we see it's pretty aligned to the answer we got earlier on.
Now let's try something different. What we're going to do is actually modify the prompt. So I'm going to say: give an exhaustive summary and answer based on the question using the context below. So here, I'm asking for an exhaustive summary and then an answer.
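Concretely, that just swaps the instruction line at the top of the prompt we built earlier:

```python
prompt = (
    "Give an exhaustive summary and answer based on the question "
    "using the context below.\n\n"
    "Context:\n" + "\n\n---\n\n".join(contexts)
    + f"\n\nQuestion: {query}\nAnswer:"
)
```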
So let's come down here, and yeah, we can see it's longer. It says they are two of the most popular major machine learning libraries. TensorFlow is maintained and released by Google, while PyTorch is maintained and released by Facebook. TensorFlow is more convenient in the industry, and so on. So we're still getting the same information in there, just in a longer format. And it also even says PyTorch is more handy in research, and it mentions TensorFlow.js being preferred over the Python equivalent as it can run on the client side too. So in reality, the fact that you do have TensorFlow.js is a genuine advantage for TensorFlow, but maybe it's not explained very well in this generation.
So that's it for this walkthrough on generative question answering using Pinecone and OpenAI's embedding models, and also generation models. Like, we can build some insane things with this, and it can go far beyond what I've shown here. And this is something that we will definitely explore more in the future. I hope all this has been interesting and useful.