
GPT-4: Superpower results with search


Chapters

0:00 Why GPT-4 can fail - hallucinations
1:50 What we can do with retrieval augmentation
2:16 How retrieval augmentation works
7:41 Scraping docs for LLMs
10:01 Preprocessing and chunking text for GPT-4
13:24 Creating embeddings with text-embedding-ada-002
14:58 Creating the Pinecone vector database
19:24 Retrieving relevant docs with semantic search
20:23 GPT-4 generated answers
23:44 GPT-4 with augmentation vs. GPT-4 without
25:34 Building powerful tools is almost too easy

Whisper Transcript

00:00:00.000 | Today we're going to take a look at an example walkthrough app that is going to show us how
00:00:07.040 | to alleviate two of the biggest problems with GPT-4, GPT-3.5 and other large language models.
00:00:13.680 | Those two things that I'm talking about are their ability to very convincingly make things up,
00:00:19.520 | which we call hallucinations, and also their inability to contain up-to-date information.
00:00:27.600 | So most of the models we're dealing with at the moment, they haven't seen any world data
00:00:33.440 | since September 2021. That's where their training data cuts off.
00:00:39.920 | So they're pretty outdated. So what we are going to be able to do with the approach I'm going to
00:00:45.360 | show you is take a question like, how do I use the LLMChain in LangChain? Now, LangChain is a very
00:00:53.200 | recent Python library. So most of these models, their training data cutoff is at September 2021.
00:00:59.920 | They have no idea about LangChain. LLMChain is a particular object within that library.
00:01:06.160 | If I asked GPT-4, how do I do this? The answer isn't very good. So the answer is that LLMChain
00:01:14.240 | in LangChain is an ambiguous term, lacks context. It could refer to a language model. So it did
00:01:19.680 | manage to get that, which is kind of cool. It could be a blockchain technology. This is the
00:01:24.800 | answer that I seem to see in GPT models quite a lot, that this is some sort of blockchain technology.
00:01:30.720 | So assuming that LLMChain refers to a language model and LangChain refers to a blockchain
00:01:38.640 | technology, then it gives you instructions on how to use it. This is just completely false.
00:01:44.240 | This isn't useful in any way to us whatsoever. So this isn't good. With the approach I'm going
00:01:51.680 | to show you, we will get this answer. To use the LLMChain in LangChain, follow these steps.
00:01:56.880 | Import necessary libraries. Create and initialize your LLM. Create a prompt template.
00:02:05.200 | Import the LLMChain. Initialize your LLMChain and then run it. That's exactly how
00:02:11.680 | you do it. So what we're going to cover in this video is how to make that happen. So the question
00:02:16.800 | now is, what are we doing? What are we going to do? Now, as I mentioned, large language models,
00:02:22.240 | they kind of exist in a vacuum. They don't have any sort of external stimuli to the world. They
00:02:28.960 | just have their own internal memory, which was built during the training of this large
00:02:36.080 | language model. That is kind of all they have. And it's pretty powerful. I mean, you've seen
00:02:41.600 | ChatGPT, now GPT-4, like the things that they can do is incredible, right? Their general knowledge
00:02:49.840 | of the world is very, very good. It's just not up to date and it's not always reliable. Sometimes
00:02:55.840 | they just make things up. So what we want to do is give the large language model access to the
00:03:02.800 | outside world. Now, how do we do that? Well, we're going to use a few different components here. The
00:03:10.720 | main component is what we call a vector database. We're going to be using the Pinecone
00:03:16.960 | vector database for that. Essentially, you can think of this as within your brain,
00:03:25.040 | you kind of have your long-term memory somewhere in there. You can think of Pinecone as your kind
00:03:33.040 | of long-term memory storage. The large language model is maybe more like your short-term
00:03:40.240 | memory. Maybe it's also like the neocortex, which kind of runs your brain or performs all these
00:03:48.800 | logical calculations within your brain. That is kind of how we can think of these two components
00:03:54.720 | and how they relate to each other. Then we're also going to, okay, so let's say we take a query.
00:04:01.280 | We're going to take this query down here. Typically, we just put that query straight
00:04:05.600 | into the large language model. Instead, now what we're going to do is we're going to have
00:04:09.840 | another large language model that has been built for embeddings. Now, an embedding,
00:04:15.920 | you can think of embeddings as kind of like the language of language models. That's kind of what
00:04:24.240 | they are. These vectors, they basically create a representation, a numerical representation
00:04:32.240 | of language. It's probably better if I draw that out. You have this embedding model here.
00:04:38.880 | Given your query, it's going to map that into essentially what is a vector space. It's going
00:04:46.000 | to put it here based on the meaning of that query. We create this vector embedding. Then we take it to
00:04:54.080 | Pinecone. In Pinecone, we already have many of these vector embeddings that we created beforehand.
00:05:01.040 | Let's say this is kind of inside Pinecone. There's all of these different vectors everywhere.
00:05:11.120 | They all represent a piece of information. What we're doing here is we're taking that,
00:05:16.320 | we're putting it in here. We're saying, okay, which are the vectors that are nearest to our
00:05:23.120 | query vector? Maybe it's this one, this one, and this one. Then we return those. Those three items,
00:05:31.600 | they come out to here. We have our vectors. They are connected to some piece of text,
00:05:40.160 | relevant text to whatever our query is. We then take our query, bring it up here,
00:05:48.480 | and we feed it into the large language model alongside these pieces of information that we
00:05:54.800 | just retrieved. Now the large language model has some sort of connection to the
00:06:02.880 | outside world in the form of this vector database, which is retrieving relevant information based on
00:06:08.800 | a particular query. That's what we're going to implement. I think that's enough for this kind of
00:06:17.040 | abstract visual for this. Let's just jump straight into the code. I will leave a link to this
00:06:23.440 | notebook so you can follow along. It will be somewhere near the top of the video right now.
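
As a rough, self-contained toy of the flow just described (the fake embedder and tiny in-memory "database" below are illustrative stand-ins, not the real components built later in the video):

    import hashlib
    import numpy as np

    def toy_embed(text: str) -> np.ndarray:
        # Deterministic pseudo-embedding; a real system uses a model
        # like text-embedding-ada-002 instead.
        seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
        v = np.random.default_rng(seed).normal(size=8)
        return v / np.linalg.norm(v)

    docs = [
        "LLMChain combines a prompt template with an LLM.",
        "Agents use tools to take actions.",
        "Memory lets chains remember prior messages.",
    ]
    doc_vectors = np.stack([toy_embed(d) for d in docs])

    def retrieve(query: str, top_k: int = 2) -> list:
        sims = doc_vectors @ toy_embed(query)      # dot-product similarity
        return [docs[i] for i in np.argsort(-sims)[:top_k]]

    query = "How do I use the LLMChain?"
    context = "\n---\n".join(retrieve(query))
    augmented_query = f"{context}\n\nQuestion: {query}"  # this goes to the LLM
    print(augmented_query)
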
00:06:28.720 | There are a few things that we need to import here or install. We're going to be using Beautiful
00:06:36.400 | Soup. You saw the question before. It is about a particular Python library. Where do we get
00:06:44.960 | the information about the Python library? We just go to their docs.
00:06:49.120 | We go to langchain.readthedocs.io, and they have a lot. Everything we need is here. It has
00:07:02.080 | guides. It has code. It has everything. All we're going to do is just scrape the website.
00:07:09.840 | Obviously, that website, their doc site is pretty up to date for the library. We can just keep
00:07:17.920 | something that goes through and maybe updates it every half a day or every day, depending on
00:07:24.720 | how up to date you need this thing. We're going to be using Beautiful Soup,
00:07:30.960 | tiktoken, OpenAI, LangChain, and the pinecone-client. I'm going to go through all of these later as we
00:07:37.600 | come to that. I don't want to take too long going through how we're getting our data and so on,
00:07:45.040 | because it's going to vary depending on what it is you're actually doing. I'll just show you very
00:07:49.840 | quickly. I'm using requests. I'm getting the different web pages. We come to here, and I'm
00:07:56.160 | basically just identifying all the links that are to the same site, langchain.readthedocs.io. I'm
00:08:02.240 | getting all the links on each page that direct to another page on the site. Then I'm also just
00:08:10.640 | getting the main content from that page. You can see here, the front page, "Welcome to LangChain,
00:08:17.920 | content is getting started," modules. It's super messy, and I'm sure 100% you can do better than
00:08:24.320 | what I'm doing here. This is really quick code. Most of this, even the preprocessing,
00:08:31.200 | data scraping side of things, this is all mostly ChatGPT, not even me. This is just pulled
00:08:38.960 | together really quickly, and we get this pretty messy input. But large language models are really
00:08:48.000 | good at processing text, so I don't actually need anything more than this, which is pretty insane.
00:08:53.680 | I'm just taking this and putting it into a function here, scrape. We have a URL,
00:08:59.600 | which is just a string, and we go through and we extract everything we need there.
00:09:03.680 | Then here, I'm setting up that loop to go through all the pages that we find,
00:09:09.360 | and just scrape everything, and we add everything to data here. You can see if we scroll up, there's
00:09:16.960 | a few 404s where it can't find a webpage. Now, this might just be that I'm calling a wrongly
00:09:25.920 | formatted URL or something else. I'm not sure, but I'm not too worried. It's just a pretty quick
00:09:33.520 | run-through here. All I want is that we have a decent amount of data in here, and we do.
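
A minimal sketch of that scrape function might look like the following; the link filter and text extraction are assumptions about the site's layout, not the exact notebook code:

    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    def scrape(url: str) -> dict:
        """Fetch one docs page, returning its text plus same-site links."""
        res = requests.get(url)
        res.raise_for_status()
        soup = BeautifulSoup(res.text, "html.parser")
        # keep only links that stay on the docs site
        links = [
            urljoin(url, a["href"])
            for a in soup.find_all("a", href=True)
            if "langchain.readthedocs.io" in urljoin(url, a["href"])
        ]
        text = soup.get_text(separator="\n")  # messy, but fine for an LLM
        return {"url": url, "text": text, "links": links}
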
00:09:38.160 | Let's have a look at what one of those looks like. Data, this is the third page that we scraped.
00:09:46.240 | Yeah, it's really messy. It's hard to read. I think there's code in here.
00:09:52.080 | Yeah, there's code and everything in here. It's hard, but it's fine. It works. We don't
00:10:00.400 | really need much more, but it is very long. There are token limits to GPT-4. The model we're going
00:10:07.520 | to be using has an 8k token limit. There will be a new model with a 32k token limit, but we don't
00:10:13.600 | want to necessarily use that full token limit because it's expensive. They charge you per token.
00:10:19.360 | We don't want to just pass in a full page of text like this. It's better if we chunk it into smaller
00:10:25.520 | chunks, which allows us to be more concise in the information that we're feeding into GPT-4 later on,
00:10:33.040 | and also save money. You don't want to just throw in everything you have.
00:10:36.880 | What I'm going to do is we're going to split everything into not 1,000 token chunks,
00:10:43.520 | actually we want it a little bit lower, so 500 token chunks. Now, here I'm actually using
00:10:50.000 | LangChain. They have a really nice text splitter function here. Let me walk you through this,
00:10:56.560 | because this I think most of us are going to need to do when we're working with text data.
00:11:01.520 | We want to take our big chunks of text and we want to split it into smaller chunks.
00:11:07.440 | How do we do that? Well, first we want to get the OpenAI, because we're using OpenAI models here,
00:11:13.920 | we want to get the OpenAI tiktoken tokenizer to count the number of tokens that we have in a
00:11:19.120 | chunk. That's what we're doing here. We're setting up this counting function, which will check the
00:11:24.640 | length of our text. We're going to pass that into this function here. What is this function?
00:11:30.400 | This is called the recursive character text splitter. What this is going to do is it's going
00:11:35.360 | to try first to separate your text into roughly 500 token chunks using this character string.
00:11:44.000 | Double new lines. If it can't find that, it's going to try single new line. If it can't do that,
00:11:50.160 | it will try space. If it can't do that, it's going to split wherever it can. This is probably one of
00:11:58.400 | the better, in my opinion, options for splitting your text into chunks. It works really well.
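
Put together, the token counter and splitter look roughly like this; the chunk overlap value is my assumption, not something fixed by the video:

    import tiktoken
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    # GPT-4 and text-embedding-ada-002 both use the cl100k_base encoding
    tokenizer = tiktoken.get_encoding("cl100k_base")

    def tiktoken_len(text: str) -> int:
        return len(tokenizer.encode(text, disallowed_special=()))

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,                       # ~500-token chunks
        chunk_overlap=20,                     # small overlap between chunks
        length_function=tiktoken_len,
        separators=["\n\n", "\n", " ", ""],   # double newline first, then fall back
    )
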
00:12:04.320 | With this text, it's probably not even that ideal. I don't even know if we have new lines in this.
00:12:11.120 | This is probably just mostly going to split on spaces. It works. We don't need to worry about it
00:12:19.520 | too much. Cool. We process our data into chunks using that approach. We have this here. We're
00:12:27.680 | going through all of our data. We split everything. We are getting the text records. I don't know if
00:12:36.960 | -- do we have an example? Yeah, here. If we come to the format of this, we have the URL and we also
00:12:44.640 | have the text. That's why we're pulling in this text here. Because we now have multiple chunks
00:12:54.000 | for each page, we need to create a separate chunk for each one of those, but we still want to
00:13:01.680 | include the URL. We create a unique ID for each chunk. We have that chunk of text that we got
00:13:08.960 | from here. We have the number chunks. Each page is going to have 5, 6, 7 or so chunks. We also
00:13:16.720 | have the URL for the page. We can link back to that at a later point if we wanted to do that.
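
So each scraped page turns into several records, shaped something like this (field names are illustrative):

    from uuid import uuid4

    records = []
    for page in data:  # each page: {"url": ..., "text": ...}
        chunks = text_splitter.split_text(page["text"])
        for i, chunk in enumerate(chunks):
            records.append({
                "id": str(uuid4()),  # unique ID per chunk
                "text": chunk,
                "chunk": i,          # position within the page
                "url": page["url"],  # to link answers back to the source
            })
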
00:13:23.200 | All right. Cool. Then we initialize our embedding model here. We're using the API directly.
00:13:32.080 | What we're doing here is using an embedding model. Now, embeddings are pretty cheap. I don't
00:13:40.560 | remember the exact pricing, but it's really hard to spend a lot of money when you're embedding
00:13:44.560 | things with this model. I wouldn't worry too much about the cost on this side of things. It's more
00:13:51.440 | when you get to GPT-4 later on where it starts to get a bit more expensive.
00:13:54.400 | This is just an example. How do we create our embeddings? We have openai.Embedding.create.
00:14:04.160 | We pass in the text embedding model. You also need your OpenAI API key. For that, you need to go
00:14:11.120 | to platform.openai.com. Let me double check that. You'd come to the platform here. You'd go
00:14:26.640 | up to your profile on the top right and you just click view API keys. That's it. Then we run that
00:14:33.840 | and we'll get a response that has this. We have object, data, model, usage. We want to go into
00:14:40.080 | data and then we get our embeddings like this. This is embedding zero, and this is
00:14:46.240 | embedding one, because we passed two sentences there. Each one of those is this dimensionality.
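
With the original pre-1.0 OpenAI Python client, which is what notebooks of this era used, the call looks something like:

    import openai

    openai.api_key = "YOUR_OPENAI_API_KEY"  # from platform.openai.com
    embed_model = "text-embedding-ada-002"

    res = openai.Embedding.create(
        input=["Sample document text goes here", "there will be several phrases"],
        engine=embed_model,
    )
    # one embedding per input sentence, each 1536-dimensional
    print(len(res["data"]), len(res["data"][0]["embedding"]))  # 2 1536
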
00:14:51.520 | This is important for initializing our vector database or our vector index. Let's move on to
00:15:00.480 | that. We get to here. We need to initialize our connection to Pinecone. For this, you do need to
00:15:10.400 | sign up for an account. You can get a free API key. To do that, you need to go to app.pinecone.io.
00:15:19.840 | We should find ourselves here. You'll probably end up in this, like, it will say your name,
00:15:26.880 | default project, and you just go to API keys. You press copy and you would paste it into here.
00:15:36.720 | You also need the environment. The environment is not necessarily going to be this. I should
00:15:41.280 | just remove that. The environment is whatever you have here and this will change. It depends on when
00:15:47.600 | you sign up, among other things. So, yeah, that will vary. Don't rely on what I put here, which
00:15:54.240 | was us-west1-gcp. It can change. It also depends if you already have a project that you
00:16:00.640 | set up with a particular environment, then, of course, it's going to be whichever environment
00:16:04.560 | you chose there. All right. After that, we check if the index already exists. If this is your first
00:16:14.000 | time walking through this with me, then it probably won't exist. So, the index is this
00:16:19.680 | gpt-4-langchain-docs one. You can see if I go into mine, it will be there, right? Because I just created it
00:16:26.960 | before recording this. So, I do have that in there. So, this would not run. All right. But
00:16:33.040 | the important thing is we have our index name. You can rename it to whatever you want. I'm just using
00:16:38.800 | this because it's descriptive. I'm not going to forget what it is. The dimension is where we need
00:16:43.200 | that 1536, which we've got up here. So, the dimensionality of our vectors. That's important.
00:16:50.480 | And then the metric, we're using dot product. With text-embedding-ada-002, you should be able to use
00:16:56.160 | either dot product or cosine. We're just going with dot product there. And then after
00:17:03.600 | this, we'll create our index. Then we're connecting to our index. Okay? So, this is GRPCIndex. You
00:17:09.200 | can also use just Index. But this is kind of more reliable, faster, and so on. And then after you've
00:17:17.040 | connected, you can view your index stats. Now, the first time you run this, you should see that
00:17:23.120 | the total vector count is zero, right? Because it's empty.
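
With the pinecone-client of that era (newer clients use a different interface), the setup is roughly:

    import pinecone

    pinecone.init(
        api_key="YOUR_PINECONE_API_KEY",  # from app.pinecone.io
        environment="YOUR_ENVIRONMENT",   # shown next to your key; varies by account
    )

    index_name = "gpt-4-langchain-docs"

    if index_name not in pinecone.list_indexes():
        pinecone.create_index(
            index_name,
            dimension=1536,       # must match text-embedding-ada-002's output
            metric="dotproduct",
        )

    index = pinecone.GRPCIndex(index_name)
    print(index.describe_index_stats())  # total_vector_count is 0 on a fresh index
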
00:17:29.520 | Then, after we've done that, we move on to populating the index. To populate the index, we're going
00:17:36.880 | to do it in batches of 100. All right? So, we'll create 100 embeddings and add all of those to
00:17:43.840 | Pinecone in a batch of 100. Okay? So, what we're going to do is we loop through our dataset,
00:17:50.160 | through all of the chunks that we have with this batch size. We find the end of the batch.
00:17:55.120 | So, the initial one should be like zero to 100. Right? We take our metadata information there.
00:18:05.040 | We get the IDs from that. We get the text from that. And then what we do is we create our
00:18:12.240 | embeddings. Now, that should work. But sometimes there are issues, like when you have like a rate
00:18:19.040 | limit error or something along those lines. So, I just had a really simple try except statement
00:18:26.080 | in here to just try again. Okay. Cool. After that, we've got our embeddings. Okay. That's good.
00:18:34.960 | And we can move on to so, we clean up our metadata here. So, within our metadata,
00:18:41.680 | we only want the text. Maybe the chunk. I don't think we really even need the chunk,
00:18:46.320 | but I'm just putting it in there. And the URL, I think that's important. Like, if we're returning
00:18:51.280 | results to a user, it can be nice to direct them to where those results are coming from.
00:18:57.760 | Right? It helps a user have trust in whatever you're sort of spitting out, rather than not
00:19:06.000 | knowing where this information is coming from. Right? And then we add all of that to our vector
00:19:12.240 | index. So, we have our IDs, the embeddings, and the metadata for that batch of 100 items. And
00:19:18.240 | then we just loop through, keep going, in batches of 100.
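
Sketched out, that indexing loop looks something like this; the bare except-and-retry is deliberately crude, as in the video:

    batch_size = 100

    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        ids = [r["id"] for r in batch]
        texts = [r["text"] for r in batch]
        try:
            res = openai.Embedding.create(input=texts, engine=embed_model)
        except Exception:
            # simple retry for transient errors such as rate limits
            res = openai.Embedding.create(input=texts, engine=embed_model)
        embeds = [rec["embedding"] for rec in res["data"]]
        # keep only the metadata worth returning with search results
        meta = [{"text": r["text"], "chunk": r["chunk"], "url": r["url"]}
                for r in batch]
        index.upsert(vectors=zip(ids, embeds, meta))
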
00:19:25.760 | Once that is all done, we get to move on to what I think is the cool part. So, how do I use the
00:19:33.520 | LLMChain in LangChain? I think we can just run this. Okay. And let's have a look at the responses. Now, this
00:19:42.640 | is kind of messy. Here we go. So, I'm returning five responses. Now, if you see the first one
00:19:49.280 | here, this is not that relevant. Okay. The top one that we have here. Okay. Fine. Come on to the
00:19:56.800 | next one. This is talking a little bit about large language models. I don't think it necessarily
00:20:03.200 | mentions LLMChain here. Fine. Move on to the next one. Now, we get something. Right? LLMChain, combined
00:20:10.240 | chains. It's talking about what chains are, why we use them. It talks about the prompt template,
00:20:16.240 | which is a part of the LLMChain. And it talks a little bit more about the LLMChain. Right. So,
00:20:21.600 | that's the sort of information we want. But, I mean, there's so much information here. Do we
00:20:25.520 | really want to give all of this to a user? No, I don't think so. Right? We want to basically
00:20:32.720 | give this information to a large language model, which is going to use it to give a more concise
00:20:37.760 | and useful answer to the user. So, to do that, we create this sort of format here for our query.
00:20:48.240 | Right? So, this is just adding in that information that we got up here
00:20:51.200 | into our query. And we can have a look at what it looks like. So, augmented query.
00:20:55.920 | Right? It's actually kind of messy. Let me print it. Maybe that will be better.
00:21:07.040 | Right. So, you can kind of see, I mean, these are just single lines. It's really messy. But we
00:21:13.360 | separate each example with like these three dashes and a few new lines. And then we have all this.
00:21:20.880 | And then we ask, we put our query at the end. How do I use the LLMChain in LangChain? All right.
00:21:27.680 | That is our new augmented query. We have all this external information from the world. And then we
00:21:34.400 | have our query. Before, it was just this. Right? Now, we have all this other information that we
00:21:39.680 | can feed into the model.
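
The retrieval and augmentation step, roughly (the exact separator strings between chunks are my guess at the format shown):

    query = "How do I use the LLMChain in LangChain?"

    # embed the query with the same model used for the docs
    res = openai.Embedding.create(input=[query], engine=embed_model)
    xq = res["data"][0]["embedding"]

    # fetch the five most similar chunks with their metadata
    result = index.query(xq, top_k=5, include_metadata=True)
    contexts = [m["metadata"]["text"] for m in result["matches"]]

    # join the retrieved chunks, then append the original question
    augmented_query = "\n\n---\n\n".join(contexts) + "\n\n-----\n\n" + query
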
00:21:48.400 | Now, GPT-4, at least in its current state, is a chat model. So, we need to use a chat completion endpoint like we would have done with GPT-3.5
00:21:55.040 | Turbo. And with those, we have kind of like the system message that primes the model. Right? So,
00:22:02.000 | I'm going to say you are a QnA bot. You are highly intelligent. That answers your questions
00:22:07.120 | based on the information provided. So, this is important. Based on the information provided by
00:22:11.840 | the user above each question. Right? Now, this information isn't actually provided by the user.
00:22:19.120 | But as far as our AI bot knows, it is. Because it's coming in through a user prompt. Right?
00:22:25.040 | If the information cannot be found in the information provided by the user,
00:22:30.640 | you truthfully say that I do not know. Okay? I don't know the answer to this. Right? So,
00:22:36.800 | this is to try and avoid hallucination where it makes things up. Right? Because we kind of don't
00:22:41.920 | want that. It doesn't fully fix that problem. But it does help a lot. So, we pass in that primer.
00:22:49.200 | And then we pass in our augmented query. We're also going to do this. Actually, let me run this.
00:22:55.680 | We're also going to do this here. So, we're going to display the response nicely
00:22:59.360 | with markdown. So, what we'll see with GPT-4 is that it's going to kind of format everything
00:23:06.320 | nicely for us. Which is great. But obviously, just printing it out doesn't look that good.
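
The generation step, again with the pre-1.0 OpenAI client (the primer text is paraphrased from the video):

    from IPython.display import Markdown, display

    primer = (
        "You are a Q&A bot, a highly intelligent system that answers user "
        "questions based on the information provided by the user above each "
        "question. If the information cannot be found in the information "
        "provided by the user, you truthfully say 'I don't know'."
    )

    res = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": primer},
            {"role": "user", "content": augmented_query},
        ],
    )

    # GPT-4 tends to answer in markdown, so render it rather than printing raw text
    display(Markdown(res["choices"][0]["message"]["content"]))
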
00:23:10.880 | So, we use this. Okay? And let's run. And we get this. Okay. So, to use the LLMChain in
00:23:18.800 | LangChain, follow these steps: import necessary classes. I think these all look correct. OpenAI,
00:23:25.520 | temperature 0.9. Now, all this looks pretty good. I'd say the only thing missing is
00:23:32.000 | that you need to add in your OpenAI API key. But otherwise, this looks
00:23:40.800 | perfect. Right? So, I mean, that's really cool. Okay. That's great. But maybe a question that
00:23:47.760 | at least I would have is how does this compare to not feeding in all of that extra information
00:23:52.480 | that we got from the vector database? All right. We can try. All right. So, let's do the same
00:23:57.920 | thing again. This time, we're not using the augmented query. We're just using the query.
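
That comparison is just the same chat completion call with the bare query, something like:

    res = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": primer},
            {"role": "user", "content": query},  # no retrieved context this time
        ],
    )
    print(res["choices"][0]["message"]["content"])  # -> "I don't know."
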
00:24:01.920 | And we just get "I don't know." Right? Because we set the system up beforehand with
00:24:09.760 | the system message to not answer and just say I don't know if it doesn't have the information
00:24:15.840 | contained within the information that we pass within the user prompt. Okay. So, that's good.
00:24:22.160 | It's working. But what if we didn't have the "I don't know" part? Would it maybe
00:24:27.360 | just answer the question? Maybe we're kind of limiting it here. So, I've added this new system
00:24:33.120 | message: you are a Q&A bot, a highly intelligent system that answers user questions. It doesn't say
00:24:37.920 | anything about saying "I don't know." Let's try. Okay. Cool. So, LangChain hasn't provided any
00:24:46.160 | public documentation on LLMChain nor is there a known technology called LLMChain in their library.
00:24:52.720 | To better assist you, could you provide more information or context about LLMChain in LangChain?
00:24:57.120 | Okay. Meanwhile, if you are referring to LangChain, a blockchain-based decentralized AI language
00:25:03.920 | model... you know, I keep getting this answer from GPT and I have no idea if it's actually a
00:25:11.920 | real thing or it's just like completely made up. I assume it must be because it keeps telling me
00:25:16.160 | this. But yeah, I mean, obviously, this is wrong. This isn't what we're going for. It says here if
00:25:21.920 | you're looking for help with a specific language chain or model in NLP, like this is kind of
00:25:29.040 | relevant, but it's not. It clearly doesn't know what we're talking about. It's just making guesses.
00:25:34.320 | This is just an example of where we would use this system. As you saw, it's pretty easy to
00:25:42.480 | set up. There's nothing complicated going on here. We're just kind of calling this API,
00:25:46.240 | calling this API, and all of a sudden we have this insanely powerful tool that we can use
00:25:53.520 | to build really cool things. It's getting stupidly easy to create these sort of systems
00:26:00.400 | that are incredibly powerful. I think it shows there are so many startups that are doing this
00:26:06.000 | sort of thing. But at least for me, what I find most interesting here is that I can take this,
00:26:12.240 | I can integrate into some sort of tooling or process that is specific to what I need to do,
00:26:18.720 | and it can just help me be more productive and help me do things faster. I think that's probably,
00:26:25.760 | at least for me right now, that's the most exciting bit. Then of course, for anyone working
00:26:31.120 | in the company or any founders working on their startup and so on, these sort of technologies
00:26:36.880 | are like rocket fuel. The things you can do in such a short amount of time are insane. Anyway,
00:26:44.960 | I'm going to leave it there. I hope this video has been interesting and helpful.
00:26:49.760 | Thank you very much for watching, and I will see you again in the next one. Bye.
00:26:59.840 | [END]