
GPT-4: Superpower results with search


Chapters

0:00 Why GPT-4 can fail - hallucinations
1:50 What we can do with retrieval augmentation
2:16 How retrieval augmentation works
7:41 Scraping docs for LLMs
10:01 Preprocessing and chunking text for GPT-4
13:24 Creating embeddings with text-embedding-ada-002
14:58 Creating the Pinecone vector database
19:24 Retrieving relevant docs with semantic search
20:23 GPT-4 generated answers
23:44 GPT-4 with augmentation vs. GPT-4 without
25:34 Building powerful tools is almost too easy

Transcript

Today we're going to take a look at an example walkthrough app that is going to show us how to alleviate two of the biggest problems with GPT-4, GPT-3.5 and other large language models. Those two things that I'm talking about are their ability to very convincingly make things up, which we call hallucinations, and also their inability to contain up-to-date information.

So most of the models we're dealing with at the moment, they haven't seen any world information, world data since September 2021. That's where their training data cuts off. So they're pretty outdated. So what we are going to be able to do with the approach I'm going to show you is take a question like, how do I use the LLMChain in LangChain?

Now, LangChain is a very recent Python library. So most of these models, with their training data cutoff at September 2021, have no idea about LangChain. LLMChain is a particular object within that library. If I ask GPT-4, how do I do this? The answer isn't very good.

So the answer is that LLMChain in LangChain is an ambiguous term that lacks context. It could refer to a language model. So it did manage to get that, which is kind of cool. It could be a blockchain technology. This is the answer that I seem to see from GPT models quite a lot, that this is some sort of blockchain technology.

So assuming that LLMChain refers to a language model and LangChain refers to a blockchain technology, it then gives you instructions on how to use it. This is just completely false. This isn't useful to us in any way whatsoever. So this isn't good. With the approach I'm going to show you, we will get this answer.

To use the LLMChain in LangChain, follow these steps. Import necessary libraries. Do this. Create, initialize your LLM. Create a prompt template. Import the LLMChain. Initialize your LLMChain and then run your LLMChain. That's exactly how you do it. So what we're going to cover in this video is how to make that happen.

So the question now is, what are we doing? What are we going to do? Now, as I mentioned, large language models, they kind of exist in a vacuum. They don't have any sort of external stimuli to the world. They just have their own internal memory, which was built during the training of this large language model.

That is kind of all they have. And it's pretty powerful. I mean, you've seen ChatGPT, now GPT-4; the things that they can do are incredible, right? Their general knowledge of the world is very, very good. It's just not up to date and it's not always reliable. Sometimes they just make things up.

So what we want to do is give the large language model access to the outside world. Now, how do we do that? Well, we're going to use a few different components here. The main component is what we call a vector database. We're going to be using the Pinecone vector database for that.

Essentially, you can think of this as within your brain, you kind of have your long-term memory somewhere in there. You can think of Pinecone as your kind of long-term memory storage. The large language model is maybe like your short-term memory. Maybe it's also like the neocortex, which kind of runs your brain or performs all these logical calculations within your brain.

That is kind of how we can think of these two components and how they relate to each other. Then we're also going to, okay, so let's say we take a query. We're going to take this query down here. Typically, we just put that query straight into the large language model.

Instead, now what we're going to do is we're going to have another large language model that has been built for embeddings. Now, an embedding, you can think of embeddings as kind of like the language of language models. That's kind of what they are. These vectors, they basically create a representation, a numerical representation of language.

It's probably better if I draw that out. You have this embedding model here. Given your query, it's going to map that into essentially what is a vector space. It's going to put it here based on the meaning of that query. We create this vector embedding. Then we take it to Pinecone.

In Pinecone, we already have many of these vector embeddings that we created beforehand. Let's say this is kind of inside Pinecone. There's all of these different vectors everywhere. They all represent a piece of information. What we're doing here is we're taking that, we're putting it in here.

We're saying, okay, which are the vectors that are nearest to our query vector? Maybe it's this one, this one, and this one. Then we return those. Those three items, they come out to here. We have our vectors. They are connected to some piece of text, relevant text to whatever our query is.

We then take our query, bring it up here, and we feed it into the large language model alongside these pieces of information that we just retrieved. Now the large language model has a way to, it has some sort of connection to the outside world in the form of this vector database, which is retrieving relevant information based on a particular query.

That's what we're going to implement. I think that's enough for this kind of abstract visual for this. Let's just jump straight into the code. I will leave a link to this notebook so you can follow along. It will be somewhere near the top of the video right now. There are a few things that we need to import here or install.

We're going to be using Beautiful Soup. You saw the question before. It is about a particular Python library. Where do we get the information about the Python library? We just go to their docs. We go to langchain.readthedocs.io, and they have a lot. Everything we need is here.

It has guides. It has code. It has everything. All we're going to do is just scrape the website. Obviously, that website, their docs site, is pretty up to date for the library. We can just set up something that goes through and updates it every half a day or every day, depending on how up to date you need this thing.

We're going to be using Beautiful Soup, tiktoken, OpenAI, LangChain, and the Pinecone client. I'm going to go through all of these later as we come to them. I don't want to take too long going through how we're getting our data and so on, because it's going to vary depending on what it is you're actually doing.

I'll just show you very quickly. I'm using requests. I'm getting the different web pages. We come to here, and I'm basically just identifying all the links that are to the same site, langchain.readthedocs.io. I'm getting all the links on each page that direct to another page on the site. Then I'm also just getting the main content from that page.

You can see here, the front page, "Welcome to LangChain," contents, getting started, modules. It's super messy, and I'm sure 100% you can do better than what I'm doing here. This is really quick code. Most of this, even the preprocessing and data scraping side of things, is mostly ChatGPT, not even me.

This is just pulled together really quickly, and we get this pretty messy input. But large language models are really good at processing text, so I don't actually need anything more than this, which is pretty insane. I'm just taking this and putting it into a function here, scrape. We have a URL, which is just a string, and we go through and we extract everything we need there.
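To make that concrete, here's a minimal sketch of what a scrape function like that might look like, using requests and Beautiful Soup. The function name and the way links and text are extracted are assumptions for illustration, not the notebook's exact code.

```python
import requests
from bs4 import BeautifulSoup

def scrape(url: str) -> dict:
    # Fetch one docs page; return its visible text plus same-site links to crawl next.
    res = requests.get(url)
    if res.status_code != 200:
        # e.g. the 404s mentioned below; just return an empty record
        return {"url": url, "text": "", "links": []}
    soup = BeautifulSoup(res.text, "html.parser")
    # Collect links that point back into the same docs site.
    links = [
        a["href"] for a in soup.find_all("a", href=True)
        if a["href"].startswith("https://langchain.readthedocs.io") or a["href"].startswith("/")
    ]
    # Grab all visible text from the page (messy, but good enough for an LLM).
    text = soup.get_text(separator="\n")
    return {"url": url, "text": text, "links": links}
```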

Then here, I'm setting up that loop to go through all the pages that we find, and just scrape everything, and we add everything to data here. You can see if we scroll up, there's a few 404s where it can't find a webpage. Now, this might just be that I'm calling the wrongly formatted URL or something else.

I'm not sure, but I'm not too worried. It's just a pretty quick run-through here. All I want is that we have a decent amount of data in here, and we do. Let's have a look at what one of those looks like. Data, this is the third page that we scraped.

Yeah, it's really messy. It's hard to read. I think there's code in here. Yeah, there's code and everything in here. It's hard, but it's fine. It works. We don't really need much more, but it is very long. There are token limits to GPT-4. The model we're going to be using has an 8K token limit.

There will be a new model with a 32K token limit, but we don't necessarily want to use that full token limit because it's expensive. They charge you per token. We don't want to just pass in a full page of text like this. It's better if we chunk it into smaller chunks, which allows us to be more concise in the information that we're feeding into GPT-4 later on, and also save money.

You don't want to just throw in everything you have. What I'm going to do is we're going to split everything into not 1,000 token chunks, actually we want it a little bit lower, so 500 token chunks. Now, here I'm actually using LangChain. They have a really nice text splitter function here.

Let me walk you through this, because this is something I think most of us are going to need to do when we're working with text data. We want to take our big chunks of text and we want to split them into smaller chunks. How do we do that? Well, first, because we're using OpenAI models here, we want to get the OpenAI tiktoken tokenizer to count the number of tokens that we have in a chunk.

That's what we're doing here. We're setting up this counting function, which will check the length of our text. We're going to pass that into this function here. What is this function? This is called the recursive character text splitter. What this is going to do is it's going to try first to separate your text into roughly 500 token chunks using this character string.

Double new lines. If it can't find that, it's going to try single new line. If it can't do that, it will try space. If it can't do that, it's going to split wherever it can. This is probably one of the better, in my opinion, options for splitting your text into chunks.

It works really well. With this text, it's probably not even that ideal. I don't even know if we have new lines in this. This is probably just mostly going to split on spaces. It works. We don't need to worry about it too much. Cool. We process our data into chunks using that approach.
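As a rough sketch, the splitter setup described here looks something like the following. The chunk overlap value and the tiktoken encoding name are assumptions; adjust them to whatever the model you're using expects.

```python
import tiktoken
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Count length in tokens (not characters), using an OpenAI tokenizer.
tokenizer = tiktoken.get_encoding("cl100k_base")  # assumed encoding for this model family

def tiktoken_len(text: str) -> int:
    return len(tokenizer.encode(text, disallowed_special=()))

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,                       # ~500-token chunks, as discussed above
    chunk_overlap=20,                     # small overlap between chunks (an assumption)
    length_function=tiktoken_len,
    separators=["\n\n", "\n", " ", ""],   # try double newline, then newline, space, anything
)

# Example: split the third scraped page into chunks.
page_chunks = text_splitter.split_text(data[2]["text"])
```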

We have this here. We're going through all of our data. We split everything. We are getting the text records. I don't know if -- do we have an example? Yeah, here. If we come to the format of this, we have the URL and we also have the text. That's why we're pulling in this text here.

Because we now have multiple chunks for each page, we need to create a separate chunk for each one of those, but we still want to include the URL. We create a unique ID for each chunk. We have that chunk of text that we got from here. We have the number chunks.
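Putting that together, building the per-chunk records might look roughly like this; the field names mirror what's described here (a unique ID, the chunk text, the chunk number, and the page URL), but the exact structure is an assumption.

```python
from uuid import uuid4

records = []
for page in data:                                    # `data` is the list of scraped pages
    page_chunks = text_splitter.split_text(page["text"])
    for i, chunk in enumerate(page_chunks):
        records.append({
            "id": str(uuid4()),                      # unique ID for each chunk
            "text": chunk,                           # the chunk of text itself
            "chunk": i,                              # which chunk of the page this is
            "url": page["url"],                      # so we can link back to the source page
        })
```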

Each page is going to have 5, 6, 7 or so chunks. We also have the URL for the page. We can link back to that at a later point if we wanted to do that. All right. Cool. Then we initialize our embedding model here. We're using the API directly.

What we're doing here is using an embedding model. Now, embeddings are pretty cheap. I don't remember the exact pricing, but it's really hard to spend a lot of money when you're embedding things with this model. I wouldn't worry too much about the cost on this side of things. It's more when you get to GPT-4 later on that it starts to get a bit more expensive.

This is just an example. How do we create our embeddings? We have openai.Embedding.create. We pass in the text embedding model. You also need your OpenAI key. For that, you need to go to platform.openai.com. Let me double check that. You'd come to the platform here.

You'd go up to your profile on the top right and you just click view API keys. That's it. Then we run that and we'll get a response that has this. We have object, data, model, usage. We want to go into data and then we get our embeddings like this.
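As a sketch, creating embeddings through the API looks roughly like this; it uses the older openai-python 0.x interface that the walkthrough is based on, and the two input sentences are placeholders.

```python
import openai

openai.api_key = "OPENAI_API_KEY"   # from platform.openai.com -> View API keys

embed_model = "text-embedding-ada-002"

# Embed two example sentences in a single call.
res = openai.Embedding.create(
    input=[
        "Sample document text goes here",
        "there will be several phrases in each batch",
    ],
    engine=embed_model,
)

# res["data"] holds one embedding per input sentence.
print(len(res["data"]))                    # -> 2
print(len(res["data"][0]["embedding"]))    # -> 1536, the dimensionality we need later
```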

This is embedding zero, and this is embedding one, because we passed two sentences there. Each one of those has this dimensionality, 1536. This is important for initializing our vector database or our vector index. Let's move on to that. We get to here. We need to initialize our connection to Pinecone.

For this, you do need to sign up for an account. You can get a free API key. To do that, you need to go to app.pinecone.io. We should find ourselves here. You'll probably end up in this, like, it will say your name, default project, and you just go to API keys.

You press copy and you would paste it into here. You also need the environment. The environment is not necessarily going to be this. I should just remove that. The environment is whatever you have here and this will change. It depends on when you sign up, among other things. So, yeah, that will vary.

Don't rely on what I put here, which was us-west1-gcp. It can change. It also depends: if you already have a project that you set up with a particular environment, then, of course, it's going to be whichever environment you chose there. All right. After that, we check if the index already exists.

If this is your first time walking through this with me, then it probably won't exist. So, the index is this gpt-4-langchain-docs one. You can see if I go into mine, it will be there, right? Because I just created it before recording this. So, I do have that in there.

So, this would not run. All right. But the important thing is we have our index name. You can rename it to whatever you want. I'm just using this because it's descriptive. I'm not going to forget what it is. The dimension is where we need that 1536, which we've got up here.

So, the dimensionality of our vectors. That's important. And then the metric, we're using dot product. So, text-embedding-ada-002, you should be able to use it with either dot product or cosine. We're just going with dot product there. And then, after this, we'll create our index. Then we're connecting to our index.
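A minimal sketch of that index setup, using the older pinecone-client interface from around the time of this walkthrough (current client versions differ); the index name and credentials are placeholders.

```python
import pinecone

# Connect to Pinecone; key and environment come from app.pinecone.io,
# and the environment string varies by account, as noted above.
pinecone.init(
    api_key="PINECONE_API_KEY",
    environment="PINECONE_ENVIRONMENT",
)

index_name = "gpt-4-langchain-docs"

# Only create the index if it doesn't already exist.
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=1536,        # must match text-embedding-ada-002's output size
        metric="dotproduct",
    )

# Connect to the index; GRPCIndex is generally faster and more reliable than Index.
index = pinecone.GRPCIndex(index_name)
index.describe_index_stats()   # total_vector_count will be 0 on a fresh index
```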

Okay? So, this is gRPC index. You can also use just index. But this is kind of more reliable, faster, and so on. And then after you've connected, you can view your index stats. Now, the first time you run this, you should see that the total vector count is zero, right?

Because it's empty. Then, you know, after we've done that, we move on to populating the index. To populate the index, we will do this, right? So, we're going to do it in batches of 100. All right? So, we'll create 100 embeddings and add all of those to Pinecone in a batch of 100.

Okay? So, what we're going to do is we loop through our dataset, through all of the chunks that we have with this batch size. We find the end of the batch. So, the initial one should be like zero to 100. Right? We take our metadata information there. We get the IDs from that.

We get the text from that. And then what we do is we create our embeddings. Now, that should work. But sometimes there are issues, like when you have like a rate limit error or something along those lines. So, I just had a really simple try except statement in here to just try again.

Okay. Cool. After that, we've got our embeddings. Okay. That's good. And we can move on. So, we clean up our metadata here. Within our metadata, we only want the text. Maybe the chunk. I don't think we really even need the chunk, but I'm just putting it in there.

And the URL, I think that's important. Like, if we're returning results to a user, it can be nice to direct them to where those results are coming from. Right? It helps a user have trust in whatever you're sort of spitting out, rather than not knowing where this information is coming from.

Right? And then we add all of that to our vector index. So, we have our IDs, the embeddings, and the metadata for that batch of 100 items. And then we just loop through, keep going, keep going, in batches of 100. Right. Once that is all done, we get to move on to what I think is the cool part.
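Before moving on, here's that embed-and-upsert loop sketched out; it assumes `records` is the list of chunk dictionaries built earlier, and the retry logic is deliberately simplistic.

```python
import time
from tqdm.auto import tqdm

batch_size = 100   # embed and upsert 100 chunks at a time

for i in tqdm(range(0, len(records), batch_size)):
    i_end = min(len(records), i + batch_size)        # end of this batch
    batch = records[i:i_end]
    ids_batch = [r["id"] for r in batch]
    texts = [r["text"] for r in batch]

    # Create embeddings for the batch, retrying once if the API call fails
    # (for example on a rate limit error).
    try:
        res = openai.Embedding.create(input=texts, engine=embed_model)
    except Exception:
        time.sleep(5)
        res = openai.Embedding.create(input=texts, engine=embed_model)
    embeds = [record["embedding"] for record in res["data"]]

    # Keep only the metadata fields we want returned at query time.
    metadata = [{"text": r["text"], "chunk": r["chunk"], "url": r["url"]} for r in batch]

    # Upsert (id, vector, metadata) tuples into Pinecone.
    index.upsert(vectors=list(zip(ids_batch, embeds, metadata)))
```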

Right? So, how do I use the LLMChain in LangChain? I think we can just run this. Okay. And let's have a look at the responses. Now, this is kind of messy. Here we go. So, I'm returning five responses. Now, if you see the first one here, this is not that relevant.
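Under the hood, that retrieval step looks roughly like this: embed the question with the same model, then ask Pinecone for the closest vectors.

```python
query = "how do I use the LLMChain in LangChain?"

# Embed the query with the same embedding model used for the docs.
res = openai.Embedding.create(input=[query], engine=embed_model)
xq = res["data"][0]["embedding"]

# Retrieve the five most similar chunks, including their metadata (text, chunk, url).
res = index.query(xq, top_k=5, include_metadata=True)
for match in res["matches"]:
    print(match["score"], match["metadata"]["url"])
```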

Okay. The top one that we have here. Okay. Fine. Come on to the next one. This is talking a little bit about large language models. I don't think it necessarily mentions LLMChain here. Fine. Move on to the next one. Now, we get something. Right? LLM chains, combining chains. It's talking about what chains are, why we use them.

It talks about the prompt template, which is a part of the LLMChain. And it talks a little bit more about the LLMChain. Right. So, that's the sort of information we want. But, I mean, there's so much information here. Do we really want to give all of this to a user?

No, I don't think so. Right? We want to basically give this information to a large language model, which is going to use it to give a more concise and useful answer to the user. So, to do that, we create this sort of format here for our query. Right? So, this is just adding in that information that we got up here into our query.

And we can have a look at what it looks like. So, augmented query. Right? It's actually kind of messy. Let me print it. Maybe that will be better. Right. So, you can kind of see, I mean, these are just single lines. It's really messy. But we separate each example with like these three dashes and a few new lines.

And then we have all this. And then we ask, we put our query at the end. How do I use the LLMChain in LangChain? All right. That is our new augmented query. We have all this external information from the world. And then we have our query. Before, it was just this.
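A sketch of how that augmented query gets assembled; the exact separator strings are just what's described above (dashes and a few new lines), nothing special.

```python
# Pull the retrieved chunk texts out of the query response.
contexts = [match["metadata"]["text"] for match in res["matches"]]

# Join the retrieved context, separated by dashes, and put the question at the end.
augmented_query = "\n\n---\n\n".join(contexts) + "\n\n-----\n\n" + query

print(augmented_query)
```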

Right? Now, we have all this other information that we can feed into the model. Right? Now, GPT-4, at least in its current state, is a chat model. Okay? So, we need to use a chat completion endpoint like we would have done with GPT-3.5 Turbo. And with those, we have kind of like the system message that primes the model.

Right? So, I'm going to say: you are a Q&A bot, a highly intelligent system that answers user questions based on the information provided. So, this is important: based on the information provided by the user above each question. Right? Now, this information isn't actually provided by the user. But as far as our AI bot knows, it is.

Because it's coming in through a user prompt. Right? If the information cannot be found in the information provided by the user, you truthfully say that I do not know. Okay? I don't know the answer to this. Right? So, this is to try and avoid hallucination where it makes things up.

Right? Because we kind of don't want that. It doesn't fully fix that problem. But it does help a lot. So, we pass in that primer. And then we pass in our augmented query. We're also going to do this here. Actually, let me run this.

So, we're going to display the response nicely with markdown. So, what we'll see with GPT-4 is that it's going to kind of format everything nicely for us. Which is great. But obviously, just printing it out doesn't look that good. So, we use this. Okay? And let's run. And we get this.
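Putting the primer and the augmented query together, the call looks roughly like this, again using the older openai-python chat completion interface.

```python
from IPython.display import Markdown, display

# System message that primes GPT-4 to answer only from the provided information.
primer = (
    "You are Q&A bot. A highly intelligent system that answers user questions "
    "based on the information provided by the user above each question. "
    "If the information can not be found in the information provided by the user "
    "you truthfully say \"I don't know\"."
)

res = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": primer},
        {"role": "user", "content": augmented_query},
    ],
)

# GPT-4 tends to return nicely formatted markdown, so render it as such in the notebook.
display(Markdown(res["choices"][0]["message"]["content"]))
```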

Okay. So, to use the LLMChain in LangChain, follow these steps: import necessary classes. I think these all look correct. OpenAI, temperature 0.9. Now, all this looks pretty good. I'd say the only thing missing is probably the fact that you need to add in your OpenAI API key.

But otherwise, this looks perfect. Right? So, I mean, that's really cool. Okay. That's great. But maybe a question that at least I would have is how does this compare to not feeding in all of that extra information that we got from the vector database? All right. We can try.

All right. So, let's do the same thing again. This time, we're not using the augmented query. We're just using the query. And we just get "I don't know." Right? Because we set the system up beforehand with the system message to not answer, and just say "I don't know" if the answer isn't contained within the information that we pass in the user prompt.

Okay. So, that's good. It's working. But what if we didn't have the "I don't know" part? Maybe it could just answer the question. Maybe we're kind of limiting it here. So, I've added this new system message: you are a Q&A bot, a highly intelligent system that answers user questions. It doesn't say anything about saying "I don't know."
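For the comparison, the only change is the relaxed primer and the plain, un-augmented query; a quick sketch:

```python
# Same call as before, but without the "I don't know" instruction and
# without the retrieved context.
relaxed_primer = "You are Q&A bot. A highly intelligent system that answers user questions."

res = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": relaxed_primer},
        {"role": "user", "content": query},   # the plain query, no retrieved context
    ],
)
print(res["choices"][0]["message"]["content"])
```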

Let's try. Okay. Cool. So, LangChain hasn't provided any public documentation on LLMChain, nor is there a known technology called LLMChain in their library. To better assist you, could you provide more information or context about LLMChain in LangChain? Okay. Meanwhile, if you are referring to LangChain, a blockchain-based decentralized AI language model... I, you know, I keep getting this answer from GPT and I have no idea if it's actually a real thing or it's just completely made up.

I assume it must be because it keeps telling me this. But yeah, I mean, obviously, this is wrong. This isn't what we're going for. It says here if you're looking for help with a specific language chain or model in NLP, like this is kind of relevant, but it's not.

It clearly doesn't know what we're talking about. It's just making guesses. This is just an example of where we would use this system. As you saw, it's pretty easy to set up. There's nothing complicated going on here. We're just kind of calling this API, calling this API, and all of a sudden we have this insanely powerful tool that we can use to build really cool things.

It's getting stupidly easy to create these sort of systems that are incredibly powerful. I think it shows there are so many startups that are doing this sort of thing. But at least for me, what I find most interesting here is that I can take this, I can integrate into some sort of tooling or process that is specific to what I need to do, and it can just help me be more productive and help me do things faster.

I think that's probably, at least for me right now, that's the most exciting bit. Then of course, for anyone working in the company or any founders working on their startup and so on, these sort of technologies are like rocket fuel. Things you can do in such a short amount of time is insane.

Anyway, I'm going to leave it there. I hope this video has been interesting and helpful. Thank you very much for watching, and I will see you again in the next one. Bye.