LangChain Multi-Query Retriever for RAG
Chapters
0:00 LangChain Multi-Query
0:31 What is Multi-Query in RAG?
1:50 RAG Index Code
2:56 Creating a LangChain MultiQueryRetriever
7:16 Adding Generation to Multi-Query
8:51 RAG in LangChain using Sequential Chain
11:18 Customizing LangChain Multi Query
13:41 Reducing Multi Query Hallucination
16:56 Multi Query in a Larger RAG Pipeline
Today, we're going to be talking about another method that we can use to make retrieval for LLMs better. We're going to be taking a look at how to do multi-query. I'm going to almost jump straight into the code, but in a future video, I will talk a little bit more about multi-query and maybe a more fully-fledged [...]

What we're going to do is we're going to take a single query, and we're going to throw that into our RAG pipeline. [...] Normally, this single query gets turned into a single query vector. [...] With multi-query, we instead pass the query to an LLM, and that LLM will generate multiple queries for us. And the idea is that there is some variety between them. [...] Rather than identifying a single point in vector space that is relevant to us, we might identify three points within the vector space, and we naturally pull in a higher variety of records. So "what is LLaMA?" may become three different questions [...] We're searching a wider or broader vector space.
I will just point out the libraries we need installed here. [...] The dataset we're using here is this AI arXiv chunks dataset. [...] And all I'm doing here is setting everything up. [...] Again, instructions, if you need them, are there. [...] Now, the full length of the documents is, you know, not huge, but indexing can take a little bit of time, especially depending on your internet connection.
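For reference, the indexing step looks roughly like this. This is a minimal sketch rather than the notebook's exact code: the dataset name, the Pinecone index name, and the "chunk" field name are assumptions, and any vector database with a LangChain integration would work the same way.

```python
from datasets import load_dataset
import pinecone
from langchain.embeddings.openai import OpenAIEmbeddings

# Chunked arXiv dataset (name assumed).
data = load_dataset("jamescalam/ai-arxiv-chunked", split="train")

embed = OpenAIEmbeddings(model="text-embedding-ada-002")

# Assumed Pinecone setup (v2-style client).
pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENV")
index = pinecone.Index("langchain-multi-query-demo")

# Embed and upsert the chunks in batches.
batch_size = 100
for i in range(0, len(data), batch_size):
    batch = data[i:i + batch_size]
    texts = batch["chunk"]  # text field name assumed
    embeds = embed.embed_documents(texts)
    ids = [str(n) for n in range(i, i + len(texts))]
    metadata = [{"text": t} for t in texts]
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
```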
So, we are going to be doing multi-query in LangChain, [...] and the first thing we need to do is initialize a VectorStore object in LangChain. Of course, if needed, if you're using something else, [...]
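With a Pinecone index, that wrapping step is a one-liner (assuming the index and embed objects from the setup sketch above; "text" is the metadata field that stores the chunk text):

```python
from langchain.vectorstores import Pinecone

# Wrap the existing index as a LangChain VectorStore.
vectorstore = Pinecone(index, embed.embed_query, "text")
```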
Then what we can do here is we want to initialize the multi-query retriever. [...] So, as usual, LangChain kind of has everything in there. [...] The multi-query retriever, like most retrievers, needs our vector store, and because an LLM is what generates the multiple queries, we also need the LLM in there as well.
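A minimal sketch of that initialization (the choice of chat model is an assumption):

```python
from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

# Temperature 0 keeps the generated queries deterministic.
llm = ChatOpenAI(temperature=0)

retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=llm,
)
```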
We can also set up logging for the queries that we're generating from the multi-query retriever. You don't need this, but if you would like to see what is actually going on with generating queries, it's worth switching on.
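The logger name matches the retriever's module path:

```python
import logging

# Print the generated queries as INFO log lines.
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
```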
So our question is going to be: "tell me about Llama 2". [...] We can see the generated queries that we have. One: "What information can you provide about Llama 2?" That is the first question that it will search with. Then we have: "Could you give me some details about Llama 2?" And three: "I would like to learn more about Llama 2."
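Those queries come from a single retrieval call; with the logging above enabled, they're printed as INFO lines before the documents come back:

```python
question = "tell me about llama 2"

# Generates the query variations, runs one vector search per
# variation, and returns the deduplicated union of results.
docs = retriever.get_relevant_documents(query=question)
len(docs)
```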
So that's, you know, I think that's kind of cool, but [...] there's not much variety between these questions. You're going to get slightly different results, but not significantly, because the semantic meaning of all three is essentially the same. [...] The default number of documents returned for each query is three, so in reality we are retrieving nine documents here, minus any duplicates. So there's a lot of overlap between these queries, [...] So, you know, we are at least expanding the scope of our search, [...] and I will show you how to do that pretty soon. [...] But yes, here we can see the documents that were returned.
So we know that this one here is actually coming from the Llama 2 paper, and it is talking about [...] a seven to 70 billion parameter LLM, right? [...] The next one, which is here, is actually talking about, I think it's talking about llamas, like the animal, yeah, here. It's talking about alpacas and llamas and so on. [...] It's not talking about LLaMA in the context that we want. We have another one here, the Llama 2 paper again. [...] So we're getting something that is relevant, hopefully: "We develop and release Llama 2", and then there, "generally perform better than existing open source models". Okay, so we're getting more information there. [...] Here we get, again, it's talking about the animals. And then this final one here is the base paper [...] an instruction-following LLaMA model, right? So we have a few results here, not all of them relevant, but for the most part, we can work with that.
Now we want to add generation on top of multi-query retrieval, and we're gonna do this in this video, at least, [...] In another future video, we'll look at doing it sort of outside LangChain as well, just so we can compare. [...] So the question answering prompt just has some instructions, [...] and we feed the documents, the ones I just showed you, into that QA chain directly.
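A sketch of that QA chain; the exact prompt wording is an assumption, but the shape (a prompt over query plus contexts, wrapped in an LLMChain) is the standard pattern:

```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Prompt wording assumed; it instructs the model to answer only
# from the retrieved contexts.
QA_PROMPT = PromptTemplate(
    input_variables=["query", "contexts"],
    template="""You are a helpful assistant who answers user queries using the
contexts provided. If the question cannot be answered using the information
provided, say "I don't know".

Contexts:
{contexts}

Question: {query}""",
)

qa_chain = LLMChain(llm=llm, prompt=QA_PROMPT)

# Feed the retrieved documents straight into the chain.
out = qa_chain(
    inputs={
        "query": question,
        "contexts": "\n---\n".join(d.page_content for d in docs),
    }
)
print(out["text"])
```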
The answer we get back mentions models "[...] ranging in scale from seven to 70 billion parameters". [...] All right, so there's quite a bit of information in there, [...]
So, rather than me kind of like writing some code [...] I'm wrapping this retrieval transform function in a TransformChain. [...] So the input into this is going to be a question, and the output is going to be a query and contexts. [...] One thing to note is that you cannot have the same variable name as both an input and an output. All right, so that's why I'm calling the input "question"; if I reused the same name here, I'm going to get an error. [...] Now that we have our transform chain for retrieval, [...] we wrap all of this into a single sequential chain, and that gives us our RAG pipeline in LangChain.
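Roughly, that wiring looks like this (a sketch assuming the retriever and qa_chain defined earlier; the input is named "question" precisely so it doesn't collide with the "query" output):

```python
from langchain.chains import TransformChain, SequentialChain

def retrieval_transform(inputs: dict) -> dict:
    # Multi-query retrieval, then flatten the docs into one context string.
    docs = retriever.get_relevant_documents(query=inputs["question"])
    contexts = "\n---\n".join(d.page_content for d in docs)
    return {"query": inputs["question"], "contexts": contexts}

retrieval_chain = TransformChain(
    input_variables=["question"],
    output_variables=["query", "contexts"],
    transform=retrieval_transform,
)

# Retrieval feeds straight into the QA chain.
rag_chain = SequentialChain(
    chains=[retrieval_chain, qa_chain],
    input_variables=["question"],
    output_variables=["query", "contexts", "text"],
)

out = rag_chain({"question": "tell me about llama 2"})
print(out["text"])
```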
With that, we can just perform the full RAG pipeline in a single call. [...] Don't know why it's in this weird color, but okay. So at the top, we have the same things we saw before, those three questions, and then this is the output. All right, it's the same as what we had before, 'cause we're actually just doing the same thing. We just wrapped it into this sequential chain.
Now, let's take a look at modifying our prompt, [...] and probably the most important part of this video, which is: okay, how does it behave with different queries? [...] We want to get a variety of relevant search results. Okay, so what I'm trying to do with this query prompt [...] So this is kind of like our custom approach to doing this. We have this LineList object here and an output parser. [...] What happens is our query prompt here is going to generate the questions, and this output parser here is going to look for newlines and it's gonna separate out the queries based on that. So it's just parsing the output we generate here. [...] And what I'm gonna do is reinitialize the retriever with this custom chain. [...] And we'll just see the sort of queries that we get, okay?
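This custom setup follows the pattern from the LangChain docs: a Pydantic LineList model, a parser that splits on newlines, and a MultiQueryRetriever built from a custom LLMChain. The prompt wording below is an assumption; the key change is explicitly asking for diverse viewpoints:

```python
from typing import List

from langchain.chains import LLMChain
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel, Field

# Container for the parsed queries.
class LineList(BaseModel):
    lines: List[str] = Field(description="Lines of text")

# Splits the LLM output on newlines, one query per line.
class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = text.strip().split("\n")
        return LineList(lines=lines)

output_parser = LineListOutputParser()

# Prompt wording assumed.
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""Your task is to generate 3 different search queries that aim to
answer the user question from multiple perspectives.
Each query MUST tackle the question from a different viewpoint; we want to
get a variety of RELEVANT search results.
Provide these alternative questions separated by newlines.
Original question: {question}""",
)

llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)

retriever = MultiQueryRetriever(
    retriever=vectorstore.as_retriever(),
    llm_chain=llm_chain,
    parser_key="lines",  # attribute of LineList holding the queries
)
```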
"How are llamas used in agriculture and farming?" [...] So yes, we've definitely got more diverse questions here, [...] And it's like, okay, you want me to ask some unique [...] So there's kind of like pros and cons to doing this. Some of the results are not going to be as relevant to our query, although we actually still do get the LLaMA paper, because, honestly, I don't think there's much in there [...]
The trade-off is that when you're trying to increase the variety, [...] So now, what I'm going to do in a second prompt: I'm basically saying the same as what I said in that first prompt, but I just added this: the user questions are focused on AI, machine learning, and related disciplines, right? That gives the LLM some guidance as to what it should be generating queries for. And, well, let's see, let's see if this helps our LLM.
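A sketch of that second prompt; the only substantive change from the first is the added domain constraint (exact wording assumed):

```python
# Same as QUERY_PROMPT, plus a sentence pinning the domain.
QUERY_PROMPT_2 = PromptTemplate(
    input_variables=["question"],
    template="""Your task is to generate 3 different search queries that aim to
answer the user question from multiple perspectives. The user questions
are focused on AI, Machine Learning, and related disciplines.
Each query MUST tackle the question from a different viewpoint; we want to
get a variety of RELEVANT search results.
Provide these alternative questions separated by newlines.
Original question: {question}""",
)

retriever = MultiQueryRetriever(
    retriever=vectorstore.as_retriever(),
    llm_chain=LLMChain(llm=llm, prompt=QUERY_PROMPT_2,
                       output_parser=output_parser),
    parser_key="lines",
)
```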
[...] which is more than the five we had for the first one. And now we can see: okay, "What are the key features and capabilities of the large language model Llama 2?" [...] "How does Llama 2 compare to other large language models?" [...] Okay, "What are the applications and use cases of Llama 2?" [...] Right, so I personally think those results are way better.
And we can see the docs that are being returned here. [...] So this one is definitely talking about Llama 2. [...] "Even large language models are brittle... social bias..." So in this one, I don't see anything relevant to Llama 2. [...] So, "These closed product LLMs are heavily fine-tuned to align with human preferences, which greatly enhances their usability and safety." Okay: "In this work, we develop and release Llama 2." [...] Here, we are talking about the original LLaMA model, [...] It's just talking about ML and NLP in general. Okay, and this one's talking, again, generally about LLMs. So, we have sort of a mix of LLMs in there, some Llama. [...] We can see that we've broadened the scope of what we're searching for, which is what we want to do with multi-query, but it still doesn't make a good retrieval system on its own: [...] it broadens the scope of what we're searching for, [...] but then we need to filter things down so that we don't have so many irrelevant or noisy results. [...] Here, we're searching for a particular keyword, which is "Llama 2", so by using something like hybrid search, [...] we'll end up returning, like, 50 or so documents.
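The video stops short of implementing hybrid search, but as a rough illustration of the idea, LangChain's EnsembleRetriever can fuse a keyword retriever with the dense retriever. This is a sketch of one possible approach, not the notebook's code; `chunks` is assumed to be the list of chunk strings from the dataset, and BM25Retriever needs the rank_bm25 package:

```python
from langchain.retrievers import BM25Retriever, EnsembleRetriever

# Keyword-based retrieval over the raw chunks, so an exact term
# like "llama 2" gets matched literally.
bm25 = BM25Retriever.from_texts(chunks)
bm25.k = 3

dense = vectorstore.as_retriever()

# Weighted fusion of keyword and semantic relevance scores.
hybrid = EnsembleRetriever(retrievers=[bm25, dense], weights=[0.5, 0.5])
docs = hybrid.get_relevant_documents("tell me about llama 2")
```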