
NEW RAG Framework: Canopy


Chapters

0:00 Canopy RAG Framework
1:31 Canopy Setup
4:23 Data Input to Canopy
7:04 Upserting with Canopy
8:24 Chatting with Canopy CLI
10:21 RAG vs. LLM only
12:24 Handling Complex Queries in RAG


00:00:00.000 | Today, we're going to be exploring the new Canopy framework.
00:00:03.300 | This is a framework that has been developed
00:00:05.900 | by the GenAI team at Pinecone.
00:00:08.220 | And the idea is essentially
00:00:09.700 | to help us build better RAG pipelines
00:00:12.580 | without needing to get into all of the details
00:00:15.420 | of how to build a RAG pipeline.
00:00:17.460 | Because it's very easy to just build
00:00:19.300 | a very simple RAG pipeline,
00:00:20.660 | but it's very hard to build a good one.
00:00:22.320 | And it also comes with a lot of nice little features.
00:00:25.380 | One that I really like is the ability
00:00:28.100 | to just chat within the terminal
00:00:30.020 | and see the difference between a RAG output
00:00:32.980 | and a non-RAG output,
00:00:34.340 | so that you can very quickly evaluate
00:00:36.900 | how well your RAG pipeline is performing.
00:00:40.140 | Now, all of this has been wrapped up
00:00:41.380 | into a very easy to use framework.
00:00:43.620 | So let's jump into it and see how we can use it.
00:00:48.220 | Okay, so we can see the GitHub repo here.
00:00:51.660 | And yeah, just a short description.
00:00:54.300 | And we can kind of come to this visual here.
00:00:56.220 | It gives us sort of a rough idea
00:00:57.820 | of kind of what is going on there.
00:00:59.940 | And if we come down to here,
00:01:03.860 | we can see the different components of Canopy.
00:01:07.540 | I'm really gonna be focusing on the Canopy CLI down here,
00:01:12.540 | just to show you how to get started with it.
00:01:16.100 | So we're mostly gonna be using everything through CLI.
00:01:19.940 | Okay, and if we come down to the setup here,
00:01:23.300 | we have, okay, you can create a virtual environment,
00:01:25.860 | you can go ahead and do that, it's fine.
00:01:28.140 | I'm not going to in this case.
00:01:30.380 | But then what we do want is we want to install the package.
00:01:32.820 | So I'm actually just gonna copy this.
00:01:35.020 | And I'm gonna come over to my terminal window here.
00:01:37.580 | All right, so I'm gonna pip install.
00:01:39.300 | I'm just gonna add an upgrade flag here.
00:01:41.420 | And yes, I will let that install.
00:01:46.180 | I've already installed it, so yeah.
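[For reference, the install command run here is along these lines; the package name canopy-sdk is taken from the repo's README and may differ between versions:

    pip install --upgrade canopy-sdk
]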
00:01:48.340 | Once it has installed,
00:01:49.900 | we should be able to just run Canopy.
00:01:54.860 | And we'll get this error message to begin with.
00:01:57.740 | And that's because we haven't set a few environment variables
00:02:00.700 | but we do know from this that it is installed.
00:02:03.580 | So to deal with this,
00:02:05.460 | we need to set some environment variables.
00:02:08.220 | So we have the Pinecone API key and Pinecone environment.
00:02:11.540 | There's also the OpenAI API key as well
00:02:13.940 | that we should add into there.
00:02:15.300 | So I'm gonna go ahead and do that.
00:02:17.580 | I'm gonna run vim.
00:02:19.100 | I'm just gonna add all of these
00:02:20.900 | into some environment variables file.
00:02:24.220 | Okay, so I'm going to do this on Mac.
00:02:27.500 | So I'm gonna do export PINECONE_API_KEY.
00:02:31.300 | I'm gonna put my API key in there.
00:02:34.180 | I'm gonna do export PINECONE_ENVIRONMENT.
00:02:39.100 | And also put that in there.
00:02:41.300 | And then I'm gonna do export OPENAI_API_KEY.
00:02:44.940 | Put that in there.
00:02:46.860 | So for the Pinecone API key and environment,
00:02:50.180 | we go to app.pinecone.io.
00:02:52.900 | We go to API keys and I'm just gonna copy this
00:02:55.820 | and I'm gonna take note of my environment as well.
00:02:58.180 | So us-west1-gcp and come back over here
00:03:00.980 | and I'm just gonna put it into this here.
00:03:04.700 | So you can try and steal my API keys if you like.
00:03:10.580 | And for the OpenAI API key,
00:03:12.820 | you want to go to platform.openai.com.
00:03:17.820 | We go to API keys at the top here.
00:03:21.420 | And I already created one, but I'm gonna create a new one.
00:03:25.500 | So canopy-demo2, I create my secret key.
00:03:30.500 | And again, I'm just gonna go put it in here.
00:03:37.860 | Great, so put those in.
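[A sketch of what that environment file ends up containing, assuming a Mac/Linux shell; the key values are placeholders, and the environment name is the one noted above:

    export PINECONE_API_KEY="<your-pinecone-api-key>"
    export PINECONE_ENVIRONMENT="us-west1-gcp"
    export OPENAI_API_KEY="<your-openai-api-key>"
]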
00:03:44.060 | And now I can just go ahead and source that file.
00:03:47.220 | Now with that done, let's try and run Canopy again.
00:03:51.580 | And we should get something that looks like this.
00:03:54.140 | Now what we can do is create a new index.
00:03:57.580 | Now to create a new index, you'd run canopy new
00:04:01.900 | and then you'd have your index name.
00:04:04.420 | I'm gonna call mine canopy-101,
00:04:08.060 | but I already actually created canopy-101.
00:04:10.740 | So I'm just gonna call it canopy-101a for now.
00:04:12.740 | Okay, so I confirm.
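[The index-creation command shown is roughly the following; in the version shown, the index name appears to be passed as an argument:

    canopy new canopy-101a
]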
00:04:16.660 | Okay, and then from there, what we do want to do
00:04:18.940 | is actually add our data to this index.
00:04:21.500 | Now, let me jump across to a notebook
00:04:25.180 | and I'll show you how we can create data
00:04:27.620 | in the correct format for Canopy.
00:04:29.740 | Okay, so we're gonna work through this notebook
00:04:31.460 | very quickly.
00:04:32.700 | There'll be a link to this notebook
00:04:33.860 | at the top of the video right now.
00:04:35.700 | So we're gonna take this data set
00:04:37.900 | that I scraped from arXiv.
00:04:40.220 | It's just a load of AI arXiv papers.
00:04:44.180 | I've used either this version or the chunked version of
00:04:47.260 | this example a few times in recent videos.
00:04:51.460 | But if we just take a quick look at what is in there,
00:04:54.100 | we see that we basically have, okay,
00:04:56.380 | there's this "Understanding HTML with Large Language Models" paper,
00:05:00.180 | a summary, and then we have the content.
00:05:02.300 | The content is kind of the bit we care most about.
00:05:05.420 | Now, the content in there is fairly long
00:05:08.860 | and typically what we do to handle that
00:05:11.500 | is we have to chunk it up into smaller parts.
00:05:16.300 | So let me just take the length of that.
00:05:19.660 | Okay, so yes, quite a few characters there.
00:05:24.060 | That wouldn't all fit into the context window of an LLM,
00:05:27.860 | or it may fit in,
00:05:29.580 | but the whole 400 arXiv papers definitely wouldn't.
00:05:34.020 | And when we are feeding knowledge into an LLM,
00:05:39.020 | we also want to be feeding that knowledge
00:05:41.180 | in smaller chunks,
00:05:42.340 | so that we're not filling that context window
00:05:44.260 | and we don't run into LLM recall issues.
00:05:47.620 | So to avoid that, yeah, we use chunking.
00:05:53.060 | And fortunately, that's kind of built into Canopy.
00:05:57.140 | So we don't even need to care about it.
00:06:00.020 | It's gonna be done automatically.
00:06:02.180 | All we need to do is set up this data format here.
00:06:05.980 | So we have ID, text, source,
00:06:07.860 | so where the source is coming from.
00:06:09.700 | You don't have to pass that.
00:06:10.700 | You can just leave it blank, it's fine.
00:06:13.700 | And then metadata, which is just a dictionary
00:06:16.380 | containing any relevant information
00:06:17.980 | that you may or may not want
00:06:20.020 | attached to your vectors.
00:06:22.420 | Again, you don't need to put anything in here.
00:06:25.300 | Okay, so we run this.
00:06:28.100 | This is just transforming our Hugging Face dataset
00:06:32.920 | into this format.
00:06:35.020 | Okay, and removing the columns that we don't want.
00:06:38.220 | Then what I'm going to do is
00:06:39.580 | convert this into a JSON lines file.
00:06:41.660 | Okay, and then we should be able to
00:06:46.660 | take a look at that over here.
00:06:50.060 | And yeah, we can see all of this.
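[For illustration, each record in the JSON lines file should look something like this; the field values are hypothetical, but the id/text/source/metadata schema is the one described above:

    {"id": "doc-0", "text": "Understanding HTML with Large Language Models...", "source": "https://arxiv.org/abs/...", "metadata": {"title": "Understanding HTML with Large Language Models"}}
]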
00:06:55.300 | Okay, so with that done,
00:06:57.420 | we can move on to actually putting all of this
00:07:01.420 | into our index using Canopy.
00:07:05.060 | Okay, once we have our dataset,
00:07:08.280 | we can go ahead and run canopy
00:07:10.980 | upsert, and it would be in here.
00:07:15.280 | So this is where I saved my data,
00:07:17.740 | in the same directory I'm in now.
00:07:19.160 | And actually, you know, we can just see that quickly.
00:07:22.220 | Yeah, okay, so I'm going to upsert this.
00:07:25.500 | So canopy upsert, there we go.
00:07:28.580 | Now, when we try and do that,
00:07:31.180 | we're actually gonna get this error.
00:07:33.100 | And that's because we also need
00:07:34.200 | an INDEX_NAME environment variable.
00:07:36.340 | So we'll go ahead and do that as well.
00:07:39.300 | You can also set index name here within the command,
00:07:41.860 | but I'm going to do it via the environment variable.
00:07:45.680 | Okay, and I want the 101A index to start with.
00:07:52.440 | And do the upsert.
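[A sketch of the upsert step, assuming the dataset was saved as data.jsonl (the actual filename isn't shown on screen):

    export INDEX_NAME=canopy-101a
    canopy upsert data.jsonl
]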
00:07:56.220 | It'll ask us to confirm that everything was correct.
00:07:58.380 | So just, you know, quick check, it looks pretty good.
00:08:01.720 | Say yes, and we continue.
00:08:05.900 | And then, yeah, we're gonna get this loading bar.
00:08:08.020 | It's gonna just show us the progress of our upsert.
00:08:12.780 | But I've already created my index,
00:08:15.420 | using this exact same process.
00:08:16.860 | So I'm gonna actually cancel that.
00:08:19.700 | And what I'm going to do is change my index name
00:08:22.100 | to that other index.
00:08:24.580 | And then I'm going to start Canopy.
00:08:26.460 | Okay, so I'm gonna do canopy start.
00:08:29.300 | And what this is going to do
00:08:31.260 | is start up the API, the Canopy server, okay?
00:08:35.960 | So from here, I can actually, you know,
00:08:38.080 | I could go to localhost:8000 and go to the docs,
00:08:42.200 | and I can see, if I zoom in a little bit,
00:08:45.500 | see it has some documentation.
00:08:46.840 | We have all the endpoints and stuff in here
00:08:48.680 | that we can use.
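[For reference, the server is started with:

    canopy start

after which the interactive API docs should be available at http://localhost:8000/docs.]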
00:08:50.500 | Now, I actually want to use the CLI.
00:08:53.800 | Now, the CLI requires that you have the Canopy server
00:08:57.600 | running in the background.
00:08:58.640 | So I'm gonna switch across to a new terminal window.
00:09:01.580 | I'm going to activate my ML environment.
00:09:04.480 | I'm going to source my Mac env file,
00:09:07.080 | and I'm going to export my index name.
00:09:09.160 | Then what I want to do is run canopy chat.
00:09:14.600 | And so you can run canopy chat without any arguments,
00:09:19.040 | and that will, you know,
00:09:21.120 | it's like you're chatting with your LLM,
00:09:23.880 | and it's doing rag in the background,
00:09:25.280 | and you're getting your responses.
00:09:26.980 | But I also actually want to do it with the no-rag flag.
00:09:29.840 | What no-rag will do is show us a comparison
00:09:33.440 | of the LLM response with and without rag.
00:09:36.440 | So this is incredibly useful for just evaluating
00:09:40.640 | what rag is actually doing for you.
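[A sketch of this second terminal session; the env file name mac.env is an assumption, and the comparison flag is assumed to be Canopy's --no-rag option:

    source mac.env
    export INDEX_NAME=canopy-101
    canopy chat --no-rag
]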
00:09:42.960 | So yeah, let's see.
00:09:45.280 | Let's take a look at this,
00:09:46.120 | and yeah, we should see some pretty interesting results.
00:09:50.180 | Okay, cool.
00:09:51.020 | So we get a nice little note up there.
00:09:53.520 | This is a debugging tool, not to be used for production.
00:09:57.240 | That's cool, 'cause we're just testing it.
00:09:59.160 | So hello there.
00:10:02.080 | So with that, I press escape and enter
00:10:05.120 | to send my query, and get the with-context (rag) response.
00:10:09.520 | Okay, so we see with this query,
00:10:12.680 | we literally get the same response,
00:10:14.100 | because, you know, it doesn't really matter
00:10:15.800 | whether we're using rag or not for general chat.
00:10:20.300 | But what if we have something like an actual query
00:10:24.960 | that is relevant to the dataset that we put behind this?
00:10:29.120 | So our dataset contains information
00:10:31.000 | about Llama 2, the large language model,
00:10:33.360 | because this is an arXiv dataset on AI.
00:10:38.360 | So I can ask it something like that.
00:10:40.600 | I can ask it, "Can you tell me about Llama 2?"
00:10:45.600 | So obviously, with context:
00:10:50.880 | Llama 2 is a collection of pre-trained
00:10:52.320 | and fine-tuned large language models,
00:10:54.560 | ranging in scale from 7 to 70 billion parameters,
00:10:57.520 | so on and so on, right?
00:10:59.880 | That's cool.
00:11:01.420 | Then, with no rag: "I apologize,
00:11:03.840 | but I'm not aware of any specific entity called Llama 2."
00:11:06.880 | Okay, so this LLM,
00:11:09.840 | it just doesn't know anything about Llama 2,
00:11:11.840 | because its training data cutoff was, like, September 2021.
00:11:16.080 | So, yeah, it cannot know about that.
00:11:20.240 | So, I don't know, let's continue the conversation.
00:11:23.120 | Like, okay, fascinating.
00:11:27.400 | Can you tell me more about when I might want to use Llama?
00:11:32.400 | Okay, let's see what we get.
00:11:35.760 | Okay, cool.
00:11:37.640 | So, with context (rag), we have: Llama 2,
00:11:41.320 | specifically the fine-tuned LLMs,
00:11:44.720 | optimized for dialogue use cases,
00:11:46.360 | found to outperform open-source chat models
00:11:48.240 | on most benchmarks that were tested, so on and so on.
00:11:51.320 | Okay, so, it also gives us a source document,
00:11:53.240 | which is pretty nice.
00:11:55.640 | Now, without context, okay:
00:11:59.000 | Llamas can serve various purposes
00:12:01.320 | and be useful in different situations.
00:12:04.360 | You can use them as pack animals, therapy animals,
00:12:06.760 | guard animals, apparently. I didn't know that.
00:12:10.440 | And, okay, maybe, and in sustainable agriculture.
00:12:15.440 | So, obviously, one of those answers
00:12:19.140 | is a little bit better than the other,
00:12:21.980 | at least for our use case.
00:12:24.240 | Now, let's ask it a slightly more complicated question.
00:12:27.500 | So, can you tell me about Llama 2 versus DistilBERT?
00:12:32.500 | Now, this is the sort of question
00:12:37.800 | where a typical rag pipeline, if not built well,
00:12:41.080 | will probably struggle,
00:12:42.080 | because there's actually kind of two search queries in here.
00:12:46.100 | We want to be searching for Llama 2,
00:12:47.760 | and we also want to be searching for DistilBERT,
00:12:49.840 | which appear in different papers.
00:12:52.040 | But, typically, the way that rag would be implemented,
00:12:55.880 | at least, you know, your first versions and whatever else,
00:12:59.640 | that's probably gonna get passed to your vector database
00:13:04.560 | as a single query.
00:13:06.520 | The good thing about Canopy is that it will handle this,
00:13:08.920 | and it'll actually split this up into multiple queries,
00:13:11.200 | so we're doing multiple searches,
00:13:13.180 | getting results from the DistilBERT paper
00:13:15.440 | and the Llama 2 paper,
00:13:17.000 | and then it's gonna provide us, hopefully,
00:13:19.120 | with a good comparison between the two.
00:13:22.060 | All right, so Llama 2 is a collection of pre-trained
00:13:24.640 | and fine-tuned large language models, so on and so on.
00:13:28.080 | Cool.
00:13:29.120 | DistilBERT is a smaller, faster, and lighter version
00:13:31.080 | of the BERT language model.
00:13:33.000 | In summary, Llama 2 is specifically optimized
00:13:35.960 | for dialogue use cases,
00:13:37.080 | whilst DistilBERT is a more efficient version
00:13:39.040 | of the BERT model that can be used
00:13:40.080 | for various natural language processing tasks.
00:13:42.960 | Okay, I think, you know, it's a good comparison.
00:13:46.960 | Without context, obviously,
00:13:48.600 | it doesn't know what Llama 2 is.
00:13:52.480 | So, yeah, it's like, okay, it's not a known entity or term
00:13:56.480 | in the realm of NLP or AI.
00:13:59.120 | However, DistilBERT refers to a specific model architecture
00:14:02.320 | used for various NLP tasks.
00:14:04.320 | So, actually, it can tell us a little bit about DistilBERT,
00:14:07.400 | because this is an older model,
00:14:09.880 | so it does know about that,
00:14:12.220 | but it can't give us a good comparison.
00:14:14.760 | So, that's a very quick introduction
00:14:18.080 | to the Canopy framework.
00:14:19.920 | I think, from this, you can very clearly see
00:14:22.400 | what the pros of using something like this are.
00:14:26.720 | Of course, this is just the CLI.
00:14:28.320 | There's also the Canopy server
00:14:31.360 | and the actual framework itself,
00:14:34.080 | which you can obviously go ahead and try out.
00:14:36.720 | But for now, that's it for this video.
00:14:38.240 | Hope all this has been useful.
00:14:39.520 | So, thank you very much for watching,
00:14:40.960 | and I will see you again in the next one.