Llama Index 101 with Vector DBs and GPT 3.5
Chapters
0:00 Getting Started with Llama Index
1:13 Llama Index Features
2:15 Llama Index Code Intro
3:55 Llama Index Document Objects
5:45 Llama Index Nodes
7:23 Indexing with Pinecone
9:22 Vector Store in Llama Index
15:36 Making Queries with Llama Index
00:00:00.000 |
Today we're going to take a look at how we can use Llama Index in production with Pinecone. 00:00:06.480 |
Now this is an introduction to the Llama Index library that was previously known as GPT Index. 00:00:12.160 |
We're not going to go into any details on the more advanced features of the library. 00:00:18.560 |
We're just going to see how to actually use it and get started with it and do that in a way that 00:00:25.680 |
would be more production friendly with a vector database like Pinecone. Now for those of you that 00:00:31.680 |
don't know Llama Index is a library that helps us build a better retrieval augmentation pipeline for 00:00:41.280 |
our LLMs. So we would use retrieval augmentation when we want to give our LLM source knowledge, 00:00:49.360 |
so knowledge from the outside world or maybe some internal database or something along those lines. 00:00:56.560 |
And that will help us, one, reference that other knowledge so we can add in citations and things 00:01:03.280 |
like that, and two, it will also help us reduce the likelihood of hallucinations. So Llama Index 00:01:10.560 |
is a library that will support us in doing that. Now Llama Index can do a lot of things, 00:01:15.760 |
not all of those we're going to cover in this video but the main features of the library include 00:01:22.320 |
the data loaders that allow us to very easily extract data from APIs, PDFs, SQL databases, 00:01:31.760 |
CSVs, all of the most common types of data sources. It also gives us some more advanced 00:01:39.680 |
ways of structuring our data so we can add in connections between different data sources which 00:01:45.520 |
is kind of useful. So imagine you have a lot of chunks of text from PDFs, what you can do is add in 00:01:52.000 |
connections between those chunks. So the first chunk in your database would be connected to the 00:01:57.440 |
next chunk with a little connector that says this is actually the next chunk and this is the previous 00:02:02.880 |
chunk. And they also support things like post-retrieval re-ranking as well. So there's 00:02:08.960 |
plenty to talk about but first let's get started with a simple introduction to the library. So 00:02:15.120 |
we're going to walk through this notebook here. There will be a link to this notebook at the top 00:02:19.440 |
of the video right now. So the first thing we need to do is install the prerequisite libraries, so 00:02:24.800 |
go ahead and run that. Now, for the runtime here we don't need to be using a GPU, so you can just check 00:02:33.680 |
whether you are using one or not. It costs money to use a GPU on Colab, so you can just set the hardware 00:02:38.480 |
accelerator to None to save that money. 00:02:44.240 |
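For reference, the install cell probably looks something like this (the package list is an assumption; the actual notebook may pin specific versions):

```python
# Install the libraries used in this walkthrough (Colab-style; versions illustrative).
!pip install -qU llama-index pinecone-client datasets openai
```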
Okay, and once we've done that, what we're going to do is just download a dataset. So I'm going to use SQuAD, the Stanford Question Answering 00:02:49.040 |
Dataset. Okay, so there's a few things I'm doing here. First, I'm just getting the relevant columns I need 00:02:55.360 |
there, so the id, the context, which is like a chunk of text, and the title, so basically the page 00:03:03.200 |
title where that context is coming from. And then what I'm doing is dropping duplicates. So in the 00:03:08.240 |
SQuAD dataset you will basically have something like 20 different questions, but those 20 00:03:15.600 |
questions all share an identical context, so you end up with a lot of duplicate 00:03:21.280 |
contexts in there, but because we are just using the contexts we actually need to remove all of that 00:03:26.960 |
duplication. So that's what I'm doing here, and then we get this. Okay, so we have our id, so it's 00:03:35.120 |
like the document or context id, the context itself, and then we have where that is coming 00:03:41.520 |
from. Okay, so the first few there are all from the University of Notre Dame Wikipedia page, 00:03:46.800 |
and in total we have almost 19,000 records in there. 00:03:58.720 |
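As a rough sketch of that preprocessing step (column names follow the description above; the exact notebook code may differ):

```python
from datasets import load_dataset

# Load the SQuAD training split from Hugging Face Datasets.
data = load_dataset("squad", split="train").to_pandas()

# Keep only the columns we need: the id, the context chunk, and the page title.
data = data[["id", "context", "title"]]

# Many questions share the exact same context, so drop the duplicate contexts.
data = data.drop_duplicates(subset="context")

print(len(data))  # roughly 19,000 unique contexts remain
```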
So Llama Index uses these Document objects, which you can think of as revolving around the context 00:04:05.200 |
of your data, right, so this chunk of text, and it will also include other bits of information 00:04:14.160 |
around that context. So for us it's going to include the document id, right, so every document 00:04:20.080 |
is going to need an id, and then, optionally, we can also add extra info, which we can think of as 00:04:26.560 |
metadata for our context. Now, for us we just have the title, but obviously we could add more. This is a 00:04:33.920 |
dictionary, so we could add something else here if we wanted to and just put 00:04:39.600 |
something in, but of course we don't actually need that, so we'll remove it. But yeah, you 00:04:46.400 |
can put as many fields as you like in there. So let's run that and take a look at one of 00:04:52.240 |
those documents and see what it looks like. You can think of this as a core object for Llama Index. 00:05:00.960 |
All right, so we have this document, we have the text, and then as we go through here we have the 00:05:08.080 |
document id and the extra info. Now, the embedding: we don't have an embedding for it yet, so we're 00:05:13.760 |
going to create that later, but the embedding is also very important because that's what will 00:05:17.920 |
allow us to search through that dataset later on. 00:05:27.040 |
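Building those Document objects likely looks something like this (a sketch against the dataframe above, using the Document fields described here):

```python
from llama_index import Document

# One Document per context, with the id and the page title as extra metadata.
documents = [
    Document(
        text=row["context"],
        doc_id=row["id"],
        extra_info={"title": row["title"]},
    )
    for _, row in data.iterrows()
]

print(documents[0])  # text, doc_id, extra_info; no embedding yet
```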
Okay, so now what we need to do is actually create those embeddings, and to create those embeddings we're going to be using OpenAI, 00:05:31.840 |
so for that you will need to get an OpenAI API key from platform.openai.com, and then you would just 00:05:41.680 |
put that in here. I have already done it, so I will move on. 00:05:49.280 |
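Setting the key is just something along these lines (placeholder value, of course):

```python
import os

# The OpenAI embedding model used later will read the key from the environment.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder, use your own key
```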
So, one step further from our document is what we would call a node. A node, the way that I would think of it, is your 00:05:55.680 |
document object with extra information about that document in relation to other documents within 00:06:03.120 |
what will be your database. So let's say you have chunks of text from a 00:06:10.480 |
PDF: a node will contain the information that chunk one is followed by chunk two, and then 00:06:17.440 |
chunk two will say that chunk one was the preceding chunk, so it has that relational 00:06:24.720 |
information between the chunks, whereas a document will not have that. So we would need to add that 00:06:31.920 |
in there; we're not going to do that here, we'll talk about that in the future, but we still need 00:06:37.120 |
to use the nodes here, so we're going to just run this. Our nodes in this case are basically 00:06:44.480 |
going to be the same as our documents in terms of the information that they carry, but the node is the 00:06:49.760 |
object type that we will build our vector database from. 00:07:00.640 |
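A minimal sketch of that step, assuming the default node parser from the Llama Index version that was current at the time:

```python
from llama_index.node_parser import SimpleNodeParser

# Parse the Document objects into nodes; with default settings they carry
# essentially the same information as the documents themselves.
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

print(len(nodes))  # same count as the documents
```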
So let's run this. I should say, we have already set up our OpenAI API key; we don't actually need to use it yet, I should have 00:07:05.520 |
really done that later, but it's there now, so we have it ready for when we do want to use it. 00:07:12.080 |
Okay, so we've just created all of the nodes from the documents. Let's take a look at those nodes. 00:07:18.400 |
Okay, obviously we have the same number of nodes as we do documents. Now, we are going to be using 00:07:26.400 |
Pinecone, which is a managed vector database, as the database for our Llama Index data. Okay, 00:07:35.280 |
so to use that we need to get our API key and environment, which we do from app.pinecone.io, 00:07:44.800 |
and once you are in app.pinecone.io you should be able to see API Keys over on the 00:07:50.640 |
left; you'll see something that looks like this. Let me zoom out a bit. You just want to copy your API key 00:07:57.120 |
and also remember your environment here, so I've got us-west1-gcp. So your API key, you put it in 00:08:03.440 |
here, and here I'm going to put us-west1-gcp. 00:08:12.880 |
Okay, and after running that, let me walk you through what's going on here. So, we initialize our connection to Pinecone, and we create our Pinecone 00:08:19.440 |
index, and we're going to call it llama-index-intro; you can call this whatever you want. 00:08:22.960 |
And the things that we do need to do are, one, create our index if it does not already exist, which, if 00:08:29.600 |
you're running this for the first time, it won't. And to create that index you need to make sure the 00:08:35.360 |
dimensionality is the same as that of the text-embedding-ada-002 model, which is the embedding model 00:08:40.400 |
we're using, and that dimensionality is 1536. We also need to make sure we're using the right 00:08:47.520 |
metric. We can actually use any metric here, so you can use euclidean, dot product, or cosine, 00:08:52.720 |
but I think cosine is the fastest in terms of the similarity calculation when you're using text- 00:09:01.120 |
embedding-ada-002, although in reality the difference between them is practically nil, so you can use any, 00:09:09.040 |
but I recommend cosine. Now, after that, we would just connect to the index. Okay, so here we're 00:09:16.080 |
connecting to Pinecone, creating the index, and then connecting to that index. 00:09:20.000 |
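As a sketch, that setup looks roughly like this (index name and environment are whatever you chose; this assumes the pre-serverless pinecone-client API that was current at the time):

```python
import pinecone

# Initialize the connection to Pinecone with your API key and environment.
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

index_name = "llama-index-intro"

# Create the index if it doesn't already exist; the dimension must match
# text-embedding-ada-002 (1536), and we use the cosine metric.
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536, metric="cosine")

# Connect to the index.
pinecone_index = pinecone.Index(index_name)
```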
Okay, once that is done we can move on. So we've just created our index and 00:09:31.600 |
connected to it; now what we want to do is connect to it through the vector store abstraction in 00:09:37.520 |
Llama Index. To do that is pretty simple: PineconeVectorStore, and then we just pass in 00:09:44.160 |
our index. That's it, that's pretty easy. This will just allow us to use the other 00:09:50.720 |
Llama Index components with our Pinecone vector store. 00:09:55.440 |
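In code that's roughly one line (a sketch, reusing the pinecone_index handle from above):

```python
from llama_index.vector_stores import PineconeVectorStore

# Wrap the Pinecone index in Llama Index's vector store abstraction.
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
```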
Cool, so I think that is all good, and then we have a few more things going on here, so let's talk 00:10:06.320 |
through all of this. Let me make it a lot more readable. So yeah, there's a few things going on. 00:10:13.440 |
Basically, what we're wanting to do here is create our index, which is this GPTVectorStore- 00:10:19.920 |
Index. We're basically going to take all of our documents, and we're going to take the 00:10:25.840 |
service context, which is like your embedding pipeline, and we're also going to take the 00:10:32.160 |
storage context, which is the vector store itself, and this will essentially act as a pipeline around 00:10:40.480 |
all of that: it's going to take all of our documents, it's going to feed them through our embedding 00:10:47.040 |
pipeline, so this service context, embed all of them, and put them all into our vector store. 00:10:54.880 |
So I mean, in reality it's pretty straightforward. Let me just explain that from 00:11:01.360 |
the perspective of where we're actually initializing these. So, StorageContext.from_defaults 00:11:08.640 |
is really simple: we're just passing in our vector store. There are other parameters in here, but we 00:11:12.560 |
don't need to use any of those because we're just using our vector store with the default 00:11:17.600 |
settings. With the service context, like I said, this is the embedding pipeline. Again, we don't really 00:11:23.760 |
need to specify much here; we just need to specify, okay, we're using OpenAI embeddings. 00:11:28.160 |
This is going to automatically pull in our API key, which we set earlier on up here. Okay, 00:11:38.320 |
so it's going to automatically pull in the API key. We do need to set the model, and text-embedding- 00:11:44.240 |
ada-002, at the time of me going through this, is the recommended model from OpenAI. 00:11:50.000 |
And we have our embedding batch size. This is one important thing that you should set: 00:11:54.880 |
basically, it will embed things in batches of 100. I think by default the value for this is much 00:12:03.200 |
smaller, it's 32 or 16 or something like that. That basically means it's going to 00:12:11.440 |
take 16 chunks of text, send them to OpenAI, get the embeddings, and then it's going to 00:12:19.360 |
pass them on to the storage context and upsert them to Pinecone. But what we've done here is set the 00:12:25.600 |
embedding batch size to 100, so it's going to take 100, send them to OpenAI, then send them to Pinecone. 00:12:32.160 |
That means that you need to make fewer requests, roughly six times fewer requests if 00:12:39.840 |
you set this to 100, which means in essence you're going to be roughly six times faster, because the 00:12:45.440 |
majority of the wait time in these API requests is actually the network latency, so making 00:12:51.680 |
the request and receiving the response. By increasing that batch size, things are going to be 00:12:57.360 |
faster, which is, I think, what we all want. So yeah, we set that, and then with that we initialize our 00:13:05.840 |
service context, right, so that embedding pipeline, or maybe we can even think of it as a pre-processing 00:13:12.080 |
pipeline for our data. And then we just put everything together, right, so that's 00:13:18.480 |
our full indexing pipeline, and we can initialize that. 00:13:27.520 |
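Putting that together, the indexing pipeline is roughly the following (a sketch using the older ServiceContext/StorageContext API that Llama Index had at the time; the exact arguments in the notebook may differ):

```python
from llama_index import GPTVectorStoreIndex, ServiceContext, StorageContext
from llama_index.embeddings.openai import OpenAIEmbedding

# Storage context: just wraps the Pinecone-backed vector store.
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Service context: the embedding (pre-processing) pipeline, using
# text-embedding-ada-002 with a larger embed batch size of 100.
embed_model = OpenAIEmbedding(model="text-embedding-ada-002", embed_batch_size=100)
service_context = ServiceContext.from_defaults(embed_model=embed_model)

# Build the index: embed every document and upsert the vectors into Pinecone.
index = GPTVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
)
```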
Now, this can take a long time, and unfortunately Llama Index doesn't have a progress bar or anything built into it, but we can actually check 00:13:33.200 |
the progress of our index creation in the Pinecone dashboard. We go down to llama-index-intro here, we can go to Index 00:13:42.560 |
Info, and then we see the total number of vectors that are in there. Okay, and you can 00:13:49.760 |
also see the rate at which they are being upserted, and you can then refresh and see where we 00:13:58.160 |
are. Okay, so we're at 4.3 thousand, and we still need to upsert quite a few, actually, so it's going 00:14:09.840 |
to take a little while. What I might do is just stop that for now, and we can jump ahead and 00:14:19.360 |
begin asking questions, so I'm not waiting too long for that. But yeah, that's just one of the 00:14:25.680 |
unfortunate things with Llama Index, though we can kind of get around it by just taking a look in 00:14:30.320 |
the actual Pinecone dashboard at how many vectors we currently have in there. 00:14:38.000 |
So yeah, let's stop that. Right now it is very slow to do this with Llama Index; if you were just 00:14:46.480 |
wanting to get your vectors and documents in there, I would just use Pinecone directly, it's 00:14:52.160 |
much faster. I mean, for what, 18,000, 16,000, whatever that number is, you're going to be 00:14:58.480 |
waiting, I don't know, not long, maybe a couple of minutes at most, because you need to embed 00:15:05.840 |
things with OpenAI and then send things to Pinecone. Yeah, a few minutes if you set that up properly. 00:15:12.160 |
But anyway, that does mean that we wouldn't benefit from the other things that 00:15:19.600 |
Llama Index offers, so in some cases it might just be a case of being patient. But Llama Index and 00:15:27.120 |
the embedding process will be optimized in the near future, so hopefully it will not take 00:15:33.120 |
quite as long to actually upsert everything. So from here, let's pretend we've upserted everything. 00:15:40.320 |
Now, what we want to do is build our query engine. The query engine is basically just the 00:15:47.840 |
index, and we have this method called as_query_engine; it basically just reformats that into 00:15:54.000 |
something that we can begin querying. Okay, so we create our query engine, then we do 00:16:01.040 |
query_engine.query, and our question is going to be, in what year was the College of Engineering 00:16:06.240 |
established at the University of Notre Dame? 00:16:12.800 |
about the university of notredame so we would expect that that will work why okay let me so 00:16:22.800 |
it looks like that hasn't actually initialized index properly so because i've kind of sucked it 00:16:28.160 |
midway through so what i'll do is i'm just going to take like 100 dots quickly okay so it's a bit 00:16:36.880 |
quicker let's check okay so we still have those all those other documents in there so now let's 00:16:43.600 |
try that okay and we get the college engineering was established in 1920 i'm sure it's one of the 00:16:49.680 |
first items it's probably where i got the question from the universe oh yeah so like question four 00:16:55.280 |
here i think if we take a look at that so data four oops oh it's a day frame so it should be i look 00:17:18.400 |
yeah and we can have a look at the context okay so it's pulling this information established in 00:17:25.840 |
1920 okay cool so yeah that's how we would set up with larm index with a vector database like pine 00:17:34.320 |
Once we're done with that, maybe you want to 00:17:40.880 |
ask some more questions, so obviously go ahead and do that. But once you are done and you're not going 00:17:45.280 |
to use the index again, we delete the index, just so that we're not wasting resources there, and, 00:17:51.680 |
at least if you're on the free tier, you can use that index slot for something else after that. 00:17:57.360 |
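The cleanup itself is just one call (a sketch, assuming the same pinecone-client connection and index name as above):

```python
# Once finished, delete the Pinecone index so it isn't using resources.
pinecone.delete_index("llama-index-intro")
```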
So that's it for this video. I just wanted to very quickly introduce Llama Index and how we would 00:18:05.040 |
use it. Of course, like I said at the start, there is a lot more to Llama Index than what I'm showing 00:18:10.880 |
here, but this is very simply an introduction to the library. Anyway, I hope this has all been 00:18:18.000 |
useful and interesting, so thank you very much for watching, and I will see you again in the next one.