
NEW RAG Framework: Canopy


Chapters

0:00 Canopy RAG Framework
1:31 Canopy Setup
4:23 Data Input to Canopy
7:04 Upserting with Canopy
8:24 Chatting with Canopy CLI
10:21 RAG vs. LLM only
12:24 Handling Complex Queries in RAG


00:00:00.000 | Today, we're going to be exploring the new Canopy framework.
00:00:03.300 | This is a framework that has been developed
00:00:05.900 | by the GenAI team at Pinecone.
00:00:08.220 | And the idea is essentially
00:00:09.700 | to help us build better RAG pipelines
00:00:12.580 | without needing to get into all of the details
00:00:15.420 | of how to build a RAG pipeline.
00:00:17.460 | Because it's very easy to just build
00:00:19.300 | a very simple RAG pipeline,
00:00:20.660 | but it's very hard to build a good one.
00:00:22.320 | And it also comes with a lot of nice little features.
00:00:25.380 | One that I really like is the ability
00:00:28.100 | to just chat within the terminal
00:00:30.020 | and see the difference between a RAG output
00:00:32.980 | and a non-RAG output,
00:00:34.340 | so that you can very quickly evaluate
00:00:36.900 | how well your RAG pipeline is performing.
00:00:40.140 | Now, all of this has been wrapped up
00:00:41.380 | into a very easy to use framework.
00:00:43.620 | So let's jump into it and see how we can use it.
00:00:48.220 | Okay, so we can see the GitHub repo here.
00:00:51.660 | And yeah, just a short description.
00:00:54.300 | And we can kind of come to this visual here.
00:00:56.220 | It gives us sort of a rough idea
00:00:57.820 | of kind of what is going on there.
00:00:59.940 | And if we come down to here,
00:01:03.860 | we can see the different components of Canopy.
00:01:07.540 | I'm really gonna be focusing on the Canopy CLI down here,
00:01:12.540 | just to show you how to get started with it.
00:01:16.100 | So we're mostly gonna be using everything through CLI.
00:01:19.940 | Okay, and if we come down to the setup here,
00:01:23.300 | we have, okay, you can create a virtual environment,
00:01:25.860 | you can go ahead and do that, it's fine.
00:01:28.140 | I'm not going to in this case.
00:01:30.380 | But then what we do want is we want to install the package.
00:01:32.820 | So I'm actually just gonna copy this.
00:01:35.020 | And I'm gonna come over to my terminal window here.
00:01:37.580 | All right, so I'm gonna pip install.
00:01:39.300 | I'm just gonna add an upgrade flag here.
00:01:41.420 | And yes, I will let that install.
00:01:46.180 | I've already installed it, so yeah.
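[For reference, the install command run here is along these lines; the package name canopy-sdk is taken from the repo's README and may differ between versions:

    pip install --upgrade canopy-sdk
]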
00:01:48.340 | Once it has installed,
00:01:49.900 | we should be able to just run Canopy.
00:01:54.860 | And we'll get this error message to begin with.
00:01:57.740 | And that's because we haven't set a few environment variables
00:02:00.700 | but we do know from this that it is installed.
00:02:03.580 | So to deal with this,
00:02:05.460 | we need to set some environment variables.
00:02:08.220 | So we have the Pinecone API key and Pinecone environment.
00:02:11.540 | There's also the OpenAI API key as well
00:02:13.940 | that we should add into there.
00:02:15.300 | So I'm gonna go ahead and do that.
00:02:17.580 | I'm gonna run vim.
00:02:19.100 | I'm just gonna add all of these
00:02:20.900 | into some environment variables file.
00:02:24.220 | Okay, so I'm going to do this on Mac.
00:02:27.500 | So I'm gonna do export PINECONE_API_KEY.
00:02:31.300 | I'm gonna put my API key in there.
00:02:34.180 | I'm gonna do export PINECONE_ENVIRONMENT.
00:02:39.100 | And also put that in there.
00:02:41.300 | And then I'm gonna do export OPENAI_API_KEY.
00:02:44.940 | Put that in there.
00:02:46.860 | So for the Pinecone API key and environment,
00:02:50.180 | we go to app.pinecone.io.
00:02:52.900 | We go to API keys and I'm just gonna copy this
00:02:55.820 | and I'm gonna take note of my environment as well.
00:02:58.180 | So us-west1-gcp and come back over here
00:03:00.980 | and I'm just gonna put it into this here.
00:03:04.700 | So you can try and steal my API keys if you like.
00:03:10.580 | And for the OpenAI API key,
00:03:12.820 | you want to go to platform.openai.com.
00:03:17.820 | We go to API keys at the top here.
00:03:21.420 | And I already created one, but I'm gonna create a new one.
00:03:25.500 | So canopy-demo2, I create my secret key.
00:03:30.500 | And again, I'm just gonna go put it in here.
00:03:37.860 | Great, so put those in.
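[A sketch of what that environment file ends up containing, assuming a Mac/Linux shell; the key values are placeholders, and the environment name is the one noted above:

    export PINECONE_API_KEY="<your-pinecone-api-key>"
    export PINECONE_ENVIRONMENT="us-west1-gcp"
    export OPENAI_API_KEY="<your-openai-api-key>"
]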
00:03:44.060 | And now I can just go ahead and source that file.
00:03:47.220 | Now with that done, let's try and run Canopy again.
00:03:51.580 | And we should get something that looks like this.
00:03:54.140 | Now what we can do is create a new index.
00:03:57.580 | Now to create a new index, you'd run canopy new
00:04:01.900 | and then you'd have your index name.
00:04:04.420 | I'm gonna call mine canopy-101,
00:04:08.060 | but I already actually created canopy-101.
00:04:10.740 | So I'm just gonna call it canopy-101a for now.
00:04:12.740 | Okay, so I confirm.
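[The index-creation command shown is roughly the following; in the version shown, the index name appears to be passed as an argument:

    canopy new canopy-101a
]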
00:04:16.660 | Okay, and then from there, what we do want to do
00:04:18.940 | is actually add our data to this index.
00:04:21.500 | Now, let me jump across to a notebook
00:04:25.180 | and I'll show you how we can create data
00:04:27.620 | in the correct format for Canopy.
00:04:29.740 | Okay, so we're gonna work through this notebook
00:04:31.460 | very quickly.
00:04:32.700 | There'll be a link to this notebook
00:04:33.860 | at the top of the video right now.
00:04:35.700 | So we're gonna take this data set
00:04:37.900 | that I scraped from arXiv.
00:04:40.220 | It's just a load of AI arXiv papers.
00:04:44.180 | I've used either this version or the chunked version of
00:04:47.260 | this example a few times in recent videos.
00:04:51.460 | But if we just take a quick look at what is in there,
00:04:54.100 | we see that we basically have, okay,
00:04:56.380 | there's this "Understanding HTML with Large Language Models" paper,
00:05:00.180 | a summary, and then we have the content.
00:05:02.300 | The content is kind of the bit we care most about.
00:05:05.420 | Now, the content in there is fairly long
00:05:08.860 | and typically what we do to handle that
00:05:11.500 | is we have to chunk it up into smaller parts.
00:05:16.300 | So let me just take the length of that.
00:05:19.660 | Okay, so yes, quite a few characters there.
00:05:24.060 | That wouldn't all fit into the context window of an LLM,
00:05:27.860 | or it may fit in,
00:05:29.580 | but the whole 400 arXiv papers definitely wouldn't.
00:05:34.020 | And when we are feeding knowledge into an LLM,
00:05:39.020 | we also want to be feeding that knowledge
00:05:41.180 | in smaller chunks,
00:05:42.340 | so that we're not filling that context window
00:05:44.260 | and we don't run into LLM recall issues.
00:05:47.620 | So to avoid that, yeah, we use chunking.
00:05:53.060 | And fortunately, that's kind of built into Canopy.
00:05:57.140 | So we don't even need to care about it.
00:06:00.020 | It's gonna be done automatically.
00:06:02.180 | All we need to do is set up this data format here.
00:06:05.980 | So we have ID, text, source,
00:06:07.860 | so where the source is coming from.
00:06:09.700 | You don't have to pass that.
00:06:10.700 | You can just leave it blank, it's fine.
00:06:13.700 | And then metadata, which is just a dictionary
00:06:16.380 | containing any relevant information
00:06:17.980 | that you may or may not want
00:06:20.020 | attached to your vectors.
00:06:22.420 | Again, you don't need to put anything in here.
00:06:25.300 | Okay, so we run this.
00:06:28.100 | This is just transforming our Hugging Face dataset
00:06:32.920 | into this format.
00:06:35.020 | Okay, and removing the columns that we don't want.
00:06:38.220 | Then what I'm going to do is
00:06:39.580 | convert this into a JSON lines file.
00:06:41.660 | Okay, and then we should be able to
00:06:46.660 | take a look at that over here.
00:06:50.060 | And yeah, we can see all of this.
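[For illustration, each record in the JSON lines file should look something like this; the field values are hypothetical, but the id/text/source/metadata schema is the one described above:

    {"id": "doc-0", "text": "Understanding HTML with Large Language Models...", "source": "https://arxiv.org/abs/...", "metadata": {"title": "Understanding HTML with Large Language Models"}}
]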
00:06:55.300 | Okay, so with that done,
00:06:57.420 | we can move on to actually putting all of this
00:07:01.420 | into our index using Canopy.
00:07:05.060 | Okay, once we have our dataset,
00:07:08.280 | we can go ahead and run canopy
00:07:10.980 | upsert, and it would be in here.
00:07:15.280 | So this is where I saved my data,
00:07:17.740 | in the same directory I'm in now.
00:07:19.160 | And actually, you know, we can just see that quickly.
00:07:22.220 | Yeah, okay, so I'm going to upsert this.
00:07:25.500 | So canopy upsert, there we go.
00:07:28.580 | Now, when we try and do that,
00:07:31.180 | we're actually gonna get this error.
00:07:33.100 | And that's because we also need
00:07:34.200 | an INDEX_NAME environment variable.
00:07:36.340 | So we'll go ahead and do that as well.
00:07:39.300 | You can also set index name here within the command,
00:07:41.860 | but I'm going to do it via the environment variable.
00:07:45.680 | Okay, and I want the 101A index to start with.
00:07:52.440 | And do the upsert.
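[A sketch of the upsert step, assuming the dataset was saved as data.jsonl (the actual filename isn't shown on screen):

    export INDEX_NAME=canopy-101a
    canopy upsert data.jsonl
]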
00:07:56.220 | It'll ask us to confirm that everything was correct.
00:07:58.380 | So just, you know, quick check, it looks pretty good.
00:08:01.720 | Say yes, and we continue.
00:08:05.900 | And then, yeah, we're gonna get this loading bar.
00:08:08.020 | It's gonna just show us the progress of our upsert.
00:08:12.780 | But I've already created my index,
00:08:15.420 | using this exact same process.
00:08:16.860 | So I'm gonna actually cancel that.
00:08:19.700 | And what I'm going to do is change my index name
00:08:22.100 | to that other index.
00:08:24.580 | And then I'm going to start Canopy.
00:08:26.460 | Okay, so I'm gonna do canopy start.
00:08:29.300 | And what this is going to do
00:08:31.260 | is start up the API, the Canopy server, okay?
00:08:35.960 | So from here, I can actually, you know,
00:08:38.080 | I could go to localhost:8000 and go to the docs,
00:08:42.200 | and I can see, if I zoom in a little bit,
00:08:45.500 | see it has some documentation.
00:08:46.840 | We have all the endpoints and stuff in here
00:08:48.680 | that we can use.
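[For reference, the server is started with:

    canopy start

after which the interactive API docs should be available at http://localhost:8000/docs.]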
00:08:50.500 | Now, I actually want to use the CLI.
00:08:53.800 | Now, the CLI requires that you have the Canopy server
00:08:57.600 | running in the background.
00:08:58.640 | So I'm gonna switch across to a new terminal window.
00:09:01.580 | I'm going to activate my ML environment.
00:09:04.480 | I'm going to source my Mac env file,
00:09:07.080 | and I'm going to export my index name.
00:09:09.160 | Then what I want to do is run canopy chat.
00:09:14.600 | And so you can run canopy chat without any arguments,
00:09:19.040 | and that will, you know,
00:09:21.120 | it's like you're chatting with your LLM,
00:09:23.880 | and it's doing rag in the background,
00:09:25.280 | and you're getting your responses.
00:09:26.980 | But I also actually want to do it with the no-rag flag.
00:09:29.840 | What no-rag will do is show us a comparison
00:09:33.440 | of the LLM response with and without rag.
00:09:36.440 | So this is incredibly useful for just evaluating
00:09:40.640 | what rag is actually doing for you.
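[A sketch of this second terminal session; the env file name mac.env is an assumption, and the comparison flag is assumed to be Canopy's --no-rag option:

    source mac.env
    export INDEX_NAME=canopy-101
    canopy chat --no-rag
]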
00:09:42.960 | So yeah, let's see.
00:09:45.280 | Let's take a look at this,
00:09:46.120 | and yeah, we should see some pretty interesting results.
00:09:50.180 | Okay, cool.
00:09:51.020 | So we get a nice little note up there.
00:09:53.520 | This is a debugging tool, not to be used for production.
00:09:57.240 | That's cool, 'cause we're just testing it.
00:09:59.160 | So hello there.
00:10:02.080 | So with that, I press escape and enter
00:10:05.120 | to send my query, and get the with-context (rag) response.
00:10:09.520 | Okay, so we see with this query,
00:10:12.680 | we literally get the same response,
00:10:14.100 | because, you know, it doesn't really matter
00:10:15.800 | whether we're using rag or not for general chat.
00:10:20.300 | But what if we have something like an actual query
00:10:24.960 | that is relevant to the dataset that we put behind this?
00:10:29.120 | So our dataset contains information
00:10:31.000 | about Llama 2, the large language model,
00:10:33.360 | because this is an arXiv dataset on AI.
00:10:38.360 | So I can ask it something like that.
00:10:40.600 | I can ask it, "Can you tell me about Llama 2?"
00:10:45.600 | So obviously, with context:
00:10:50.880 | Llama 2 is a collection of pre-trained
00:10:52.320 | and fine-tuned large language models,
00:10:54.560 | ranging in scale from 7 to 70 billion parameters,
00:10:57.520 | so on and so on, right?
00:10:59.880 | That's cool.
00:11:01.420 | Then, with no rag: "I apologize,
00:11:03.840 | but I'm not aware of any specific entity called Llama 2."
00:11:06.880 | Okay, so this LLM,
00:11:09.840 | it just doesn't know anything about Llama 2,
00:11:11.840 | because its training data cutoff was, like, September 2021.
00:11:16.080 | So, yeah, it cannot know about that.
00:11:20.240 | So, I don't know, let's continue the conversation.
00:11:23.120 | Like, okay, fascinating.
00:11:27.400 | Can you tell me more about when I might want to use Llama?
00:11:32.400 | Okay, let's see what we get.
00:11:35.760 | Okay, cool.
00:11:37.640 | So, with context (rag), we have: Llama 2,
00:11:41.320 | specifically the fine-tuned LLMs,
00:11:44.720 | optimized for dialogue use cases,
00:11:46.360 | found to outperform open-source chat models
00:11:48.240 | on most benchmarks that were tested, so on and so on.
00:11:51.320 | Okay, so, it also gives us a source document,
00:11:53.240 | which is pretty nice.
00:11:55.640 | Now, without context, okay:
00:11:59.000 | Llamas can serve various purposes
00:12:01.320 | and be useful in different situations.
00:12:04.360 | You can use them as pack animals, therapy animals,
00:12:06.760 | guard animals, apparently. I didn't know that.
00:12:10.440 | And, okay, maybe, and in sustainable agriculture.
00:12:15.440 | So, obviously, one of those answers
00:12:19.140 | is a little bit better than the other,
00:12:21.980 | at least for our use case.
00:12:24.240 | Now, let's ask it a slightly more complicated question.
00:12:27.500 | So, can you tell me about Llama 2 versus DistilBERT?
00:12:32.500 | Now, this is the sort of question
00:12:37.800 | where a typical rag pipeline, if not built well,
00:12:41.080 | will probably struggle,
00:12:42.080 | because there's actually kind of two search queries in here.
00:12:46.100 | We want to be searching for Llama 2,
00:12:47.760 | and we also want to be searching for DistilBERT,
00:12:49.840 | which appear in different papers.
00:12:52.040 | But, typically, the way that rag would be implemented,
00:12:55.880 | at least, you know, your first versions and whatever else,
00:12:59.640 | that's probably gonna get passed to your vector database
00:13:04.560 | as a single query.
00:13:06.520 | The good thing about Canopy is that it will handle this,
00:13:08.920 | and it'll actually split this up into multiple queries,
00:13:11.200 | so we're doing multiple searches,
00:13:13.180 | getting results from the DistilBERT paper
00:13:15.440 | and the Llama 2 paper,
00:13:17.000 | and then it's gonna provide us, hopefully,
00:13:19.120 | with a good comparison between the two.
00:13:22.060 | All right, so Llama 2 is a collection of pre-trained
00:13:24.640 | and fine-tuned large language models, so on and so on.
00:13:28.080 | Cool.
00:13:29.120 | DistilBERT is a smaller, faster, and lighter version
00:13:31.080 | of the BERT language model.
00:13:33.000 | In summary, Llama 2 is specifically optimized
00:13:35.960 | for dialogue use cases,
00:13:37.080 | whilst DistilBERT is a more efficient version
00:13:39.040 | of the BERT model that can be used
00:13:40.080 | for various natural language processing tasks.
00:13:42.960 | Okay, I think, you know, it's a good comparison.
00:13:46.960 | Without context, obviously,
00:13:48.600 | it doesn't know what Llama 2 is.
00:13:52.480 | So, yeah, it's like, okay, it's not a known entity or term
00:13:56.480 | in the realm of NLP or AI.
00:13:59.120 | However, DistilBERT refers to a specific model architecture
00:14:02.320 | used for various NLP tasks.
00:14:04.320 | So, actually, it can tell us a little bit about DistilBERT,
00:14:07.400 | because this is an older model,
00:14:09.880 | so it does know about that,
00:14:12.220 | but it can't give us a good comparison.
00:14:14.760 | So, that's a very quick introduction
00:14:18.080 | to the Canopy framework.
00:14:19.920 | I think, from this, you can very clearly see
00:14:22.400 | what the pros of using something like this are.
00:14:26.720 | Of course, this is just the CLI.
00:14:28.320 | There's also the Canopy server
00:14:31.360 | and the actual framework itself,
00:14:34.080 | which you can obviously go ahead and try out.
00:14:36.720 | But for now, that's it for this video.
00:14:38.240 | Hope all this has been useful.
00:14:39.520 | So, thank you very much for watching,
00:14:40.960 | and I will see you again in the next one.