
Llama Index 101 with Vector DBs and GPT-3.5


Chapters

0:00 Getting Started with Llama Index
1:13 Llama Index Features
2:15 Llama Index Code Intro
3:55 Llama Index Document Objects
5:45 Llama Index Nodes
7:23 Indexing with Pinecone
9:22 Vector Store in Llama Index
15:36 Making Queries with Llama Index

Transcript

Today we're going to take a look at how we can use Llama Index in production with Pinecone. Now this is an introduction to the Llama Index library that was previously known as GPT Index. We're not going to go into any details on the more advanced features of the library.

We're just going to see how to actually use it and get started with it, and do that in a way that is more production friendly, with a vector database like Pinecone. Now, for those of you who don't know, Llama Index is a library that helps us build a better retrieval augmentation pipeline for our LLMs.

So we would use retrieval augmentation when we want to give our LLM source knowledge, so knowledge from the outside world, or maybe some internal database, or something along those lines. That helps us in two ways: one, we can reference that other knowledge, so we can add in citations and things like that; and two, it reduces the likelihood of hallucinations.

So Llama Index is a library that will support us in doing that. Now, Llama Index can do a lot of things, and we're not going to cover all of them in this video, but the main features of the library include the data loaders, which allow us to very easily extract data from APIs, PDFs, SQL databases, CSVs, all of the most common types of data sources.

It also gives us some more advanced ways of structuring our data, so we can add in connections between different data sources, which is kind of useful. So imagine you have a lot of chunks of text from PDFs: what you can do is add in connections between those chunks, so the first chunk in your database would be connected to the next chunk with a little connector that says this is the next chunk, and this is the previous chunk.

They also support things like post-retrieval re-ranking. So there's plenty to talk about, but first let's get started with a simple introduction to the library. We're going to walk through this notebook here; there will be a link to this notebook at the top of the video right now.

So the first thing we need to do is install the prerequisite libraries, so go ahead and run that. Now, for the runtime here we don't need to be using a GPU, so you can check whether you are. It costs money to use a GPU on Colab, so you can just set the hardware accelerator to None to save that money.
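For reference, the install cell probably looks something like the following; the exact package list is an assumption on my part, since it isn't spelled out in the video.

```python
# Colab cell: install the libraries used in this walkthrough (package list assumed)
!pip install -qU llama-index pinecone-client openai datasets
```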

Okay, and once we've done that, what we're going to do is just download a dataset. I'm going to use SQuAD, the Stanford Question Answering Dataset. There are a few things I'm doing here. First, I'm just getting the relevant columns I need: the id, the context, which is like a chunk of text, and the title, so basically the page title where that context is coming from.

Then what I'm doing is dropping duplicates. In the SQuAD dataset you will basically have, say, 20 questions and 20 different answers whose 20 contexts are all identical, so you end up with a lot of duplicate contexts in there. Because we are just using the contexts, we need to remove all that duplication.

So that's what I'm doing here, and then we get this. We have our id, which is like the document or context id, the context itself, and then where it is coming from. The first few are all from the University of Notre Dame Wikipedia page, and in total we have almost 19,000 records in there.
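A minimal sketch of that data prep step, assuming the Hugging Face datasets library and pandas (the notebook's exact column handling may differ slightly):

```python
from datasets import load_dataset

# Load the SQuAD training split and keep only the columns we need
data = load_dataset("squad", split="train").to_pandas()
data = data[["id", "context", "title"]]

# Many questions share the same context, so drop the duplicate contexts
data = data.drop_duplicates(subset="context")
print(len(data))  # almost 19,000 unique contexts
```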

So Llama Index uses these document objects, which you can think of as basically the context, or revolving around the context, of your data, i.e. this chunk of text, and they will obviously include other bits of information around that context. For us that's going to include the document id, so every document is going to need an id, and, optionally, we can also add extra info, which we can think of as metadata for our context.

Now, for us we just have the title, but obviously we could add more. This is a dictionary, so we could add, I don't know, some other field here and just put something in it, but of course we don't actually need that, so we'll remove it. You can put as many fields as you like in there. Let's run that and take a look at one of those documents and see what it looks like.

So you can think of this as a core object for Llama Index. We have this document, we have the text, and then going through here we have the document id and the extra info. Now, we don't have an embedding for it yet; we're going to create that later, but the embedding is also very important because that's what will allow us to search through that dataset later on.
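Here is a sketch of how those document objects might be built, assuming the legacy llama_index Document API from around the time of the video:

```python
from llama_index import Document

# Wrap each context in a Document, with the id and the page title as extra info
docs = [
    Document(
        text=row["context"],
        doc_id=row["id"],
        extra_info={"title": row["title"]},
    )
    for _, row in data.iterrows()
]
docs[0]  # text, doc_id, extra_info, and (for now) no embedding
```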

Okay, so now what we need to do is actually create those embeddings. To create those embeddings we're going to be using OpenAI, so for that you will need to get an OpenAI API key from platform.openai.com and then put that in here. I have already done it, so I will move on.
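Setting the key is usually just a matter of exporting it as an environment variable; the placeholder value below is obviously not a real key.

```python
import os

# Key from platform.openai.com; replace the placeholder with your own
os.environ["OPENAI_API_KEY"] = "sk-..."
```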

One step further from our document is what we would call a node. The way I would think of a node is that it's your document object with extra information about that document in relation to other documents within what will be your database. So let's say you have chunks of paragraphs or chunks of text from a PDF: a node will contain the information that chunk one is followed by chunk two, and then chunk two will say that chunk one was the preceding chunk before this. So it has that relational information between the chunks, whereas a document will not have that.

So we would need to add that in there. We're not going to do that here; we'll talk about that in the future, but we still need to use the nodes, so we're going to just run this. Our nodes in this case are basically going to be the same as our documents in terms of the information that they carry, but node is the object type that we will build our vector database from.

So let's run this. I should say that we have already set up our OpenAI API key; we don't actually need to use it yet, and I should really have done that later, but it's there now, so we have it ready for when we do want to use it. Okay, so we've just created all of the nodes from the documents. Let's take a look at those nodes: obviously we have the same number of nodes as we do documents.
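A sketch of that node creation step, assuming the legacy SimpleNodeParser API in llama_index:

```python
from llama_index.node_parser import SimpleNodeParser

# Parse the documents into nodes; with contexts this short, each document
# becomes a single node carrying essentially the same information
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(docs)
len(nodes)  # same count as the documents
```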

Now, we are going to be using Pinecone, which is a managed vector database, as the database for our Llama Index data. To use that, we need to get our API key and environment from app.pinecone.io. Once you are in app.pinecone.io, you should be able to see API Keys over on the left; it will look something like this. You just want to copy your API key and also note your environment; I've got us-west1-gcp. So you put your API key in here, and here I'm going to put us-west1-gcp.
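So, roughly, the credentials end up in two variables like these (the values shown are placeholders, not real credentials):

```python
import os

# From app.pinecone.io: your API key and the environment of your project
api_key = os.environ.get("PINECONE_API_KEY", "YOUR_API_KEY")
environment = "us-west1-gcp"  # use whatever environment your project shows
```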

Okay, and after running that, let me walk you through what's going on here. We initialize our connection to Pinecone and we create our Pinecone index; I'm going to call it llama-index-intro, but you can call it whatever you want. The things we need to do are, one, create the index if it does not already exist, which, if you're running this for the first time, it won't. To create that index you need to make sure the dimensionality matches the text-embedding-ada-002 model, which is the embedding model we're using, and that dimensionality is 1536. We also need to make sure we're using the right metric. We can actually use any metric here, so euclidean, dot product, or cosine, but I think cosine is the fastest in terms of the similarity calculation when you're using text-embedding-ada-002, although in reality the difference between them is practically nil, so you can use any, but I recommend cosine. After that we just connect to the index. So here we're connecting to Pinecone, creating the index, and then connecting to that index.

Once that is done, we can move on. We've just created our index and connected to it; now what we want to do is connect to it through the vector store abstraction in Llama Index. To do that is pretty simple: PineconeVectorStore, and then we just pass in our index. That's it. This just allows us to use the other Llama Index components with our Pinecone vector store.

Cool, so I think that is all good, and then we have a few more things going on here, so let's talk through all of this and make it a lot more readable. What we want to do here is create our index, which is this GPTVectorStoreIndex. We're going to take all of our documents, the service context, which is like your embedding pipeline, and the storage context, which is the vector store itself, and this will essentially act as a pipeline around all of that: it's going to take all of our documents, feed them through our embedding pipeline (the service context) to embed them all, and put them all into our vector store. In reality it's pretty straightforward, so let me explain it from the perspective of where we're actually initializing these.

The storage context, from defaults, is really simple: we're just using our vector store. There are other parameters in there, but we don't need any of those because we're just using our vector store with the default settings. The service context, like I said, is the embedding pipeline. Again, we don't really need to specify much here; we just need to specify that we're using OpenAI embeddings, and this is going to automatically pull in our API key, which we set earlier on up here. We do need to set the model, and text-embedding-ada-002, at the time of me going through this, is the recommended model from OpenAI. And we have our embedding batch size; this is one important thing that you should set. Here it will embed things in batches of 100; I think by default the value is much smaller, 32 or 16 or something like that. That basically means it would take 16 chunks of text, send them to OpenAI, get the embeddings, and then pass them on to the storage context and upsert them into Pinecone. What we've done here is set the embedding batch size to 100, so it's going to take 100, send them to OpenAI, then send them to Pinecone.
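Putting that together, a rough sketch of the whole indexing pipeline might look like the following. This assumes the legacy llama_index and pinecone-client APIs from around the time of the video, and reuses api_key, environment, and docs from the earlier snippets; exact import paths may differ between versions.

```python
import pinecone
from llama_index import GPTVectorStoreIndex, ServiceContext, StorageContext
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores import PineconeVectorStore

# 1. Connect to Pinecone and create the index if it doesn't already exist
pinecone.init(api_key=api_key, environment=environment)
index_name = "llama-index-intro"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=1536,   # must match text-embedding-ada-002
        metric="cosine",
    )
pinecone_index = pinecone.Index(index_name)

# 2. Wrap the Pinecone index in Llama Index's vector store abstraction
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# 3. Storage context: where the embedded documents will be upserted
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# 4. Service context: the embedding pipeline, batching 100 chunks per request
embed_model = OpenAIEmbedding(model="text-embedding-ada-002", embed_batch_size=100)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# 5. Embed all documents and push them into the vector store
index = GPTVectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
    service_context=service_context,
)
```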
Setting the batch size to 100 means you need to make fewer requests, something like six times fewer, which in essence means you're going to be roughly six times faster, because the majority of the wait time in these API requests is actually the network latency, so making the request and receiving the response. By increasing that batch size, things are going to be faster, which I think is what we all want. So we set that, and with that we initialize our service context, that embedding pipeline, or maybe we can even think of it as a pre-processing pipeline for our data. Then we just set everything up together, so that's our full indexing pipeline, and we can initialize that.

Now, this can take a long time, and unfortunately Llama Index doesn't have a progress bar or anything built into it, but we can actually check the progress of our index creation. We go to llama-index-intro here, then to Index Info, and we see the total number of vectors that are in there; you can also see the rate at which they are being upserted, and you can refresh to see where we are. So we're at 4.3 thousand and we still need to upsert quite a few, actually, so it's going to take a little while. What I might do is just stop that for now and jump ahead to begin asking questions, so I'm not waiting too long. That's just one of the unfortunate things with Llama Index, but we can kind of get around it by just taking a look in the actual Pinecone dashboard at how many vectors we currently have in there.

So yeah, let's stop that. Right now it is very slow to do this with Llama Index; if you just wanted to get your vectors and documents in there, I would use Pinecone directly, which is much faster. I mean, for the 18,000 or 16,000, whatever that number is, you're going to be waiting, I don't know, not long, maybe a couple of minutes at most, because you need to embed things with OpenAI and then send them to Pinecone; a few minutes if you set that code up properly. But that does mean we wouldn't benefit from the other things that Llama Index offers, so in some cases it might just be a case of being patient, and hopefully the Llama Index embedding process will be optimized in the near future so that it won't take quite as long to upsert everything.

So from here, let's pretend we've upserted everything. Now what we want to do is build our query engine. The query engine is basically just the index, and we have this method called as_query_engine that reformats the index into something we can begin querying. So we create our query engine, then we do query_engine.query, and our question is going to be: in what year was the College of Engineering established at the University of Notre Dame? We saw that the first few items in there were talking about the University of Notre Dame, so we would expect that to work. Why didn't it? Okay, it looks like the index hasn't actually initialized properly, because I stopped it midway through, so what I'll do is just take like 100 docs quickly, so it's a bit quicker. Let's check: okay, we still have all those other documents in there, so now let's try that again. And we get: the College of Engineering was established in 1920. I'm sure it's one of the first items; it's probably where I got the question from.
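A sketch of that querying step; the describe_index_stats call for checking progress programmatically is my own addition (the video checks the Pinecone dashboard instead), and it reuses the index and pinecone_index objects from the previous sketch.

```python
# Optionally check how many vectors have been upserted so far
print(pinecone_index.describe_index_stats())

# Build the query engine from the index and ask a question
query_engine = index.as_query_engine()
response = query_engine.query(
    "In what year was the College of Engineering established "
    "at the University of Notre Dame?"
)
print(response)  # -> the College of Engineering was established in 1920
```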
Oh yeah, so like question four here, I think. If we take a look at that, so data four, oops, it's a dataframe, so it should be iloc; no, three, I'm being stupid. Yeah, and we can have a look at the context. Okay, so it's pulling this information: established in 1920. Okay, cool.

So yeah, that's how we would set up Llama Index with a vector database like Pinecone. Once we're done with that, maybe you want to ask some more questions, so obviously go ahead and do that, but once you are done and you're not going to use the index again, delete the index, just so that we're not wasting resources there, and, at least if you're on the free tier, you can then use that index for something else afterwards.

So that's it for this video. I just wanted to very quickly introduce Llama Index and how we would use it. Of course, like I said at the start, there is a lot more to Llama Index than what I'm showing here; this is very simply an introduction to the library. Anyway, I hope this has all been useful and interesting, so thank you very much for watching, and I will see you again in the next one.