NEW Pinecone Assistant


0:00 AI Assistants
0:41 Pinecone Assistants in Python
1:19 Building an AI Research Assistant
2:11 Assistant Message and Chat
3:05 Adding Files to the Assistant
5:30 Chatting with our Assistant
7:23 Assistant Chat History
10:47 Asking about Mamba 2
12:11 Wrapping up with Assistants

00:00:00.000 | Today, we are going to be taking a look
00:00:02.120 | at the new Pinecone Assistant.
00:00:04.960 | Now, Pinecone Assistant allows us to build AI assistants
00:00:09.080 | and augment them with additional documents
00:00:12.600 | and knowledge super easily.
00:00:14.480 | So that means that we can get AI assistants
00:00:17.140 | that suffer less from hallucinations,
00:00:19.480 | have more up-to-date knowledge,
00:00:21.800 | and can also answer questions about knowledge
00:00:25.520 | specific to our own use cases or our own organizations,
00:00:29.960 | by simply providing them with the source of that knowledge
00:00:34.040 | through PDF documents.
00:00:35.880 | In this video, we're going to take a look
00:00:37.800 | at Pinecone Assistants and how we can use them in Python.
00:00:41.240 | Now, we're gonna be working through this notebook here.
00:00:43.640 | There will be a link to this in the comments below.
00:00:46.200 | But what we're first going to do
00:00:47.400 | is just install prerequisites.
00:00:48.840 | So we have the Pinecone Client as usual,
00:00:51.040 | and then we also have this plugin,
00:00:52.920 | which allows us to use assistants
00:00:55.200 | through the usual Python Client.
00:00:57.880 | So we install those.
00:00:59.520 | I'm also going to be using this Pinecone Notebooks,
00:01:01.640 | which you'll see here.
00:01:03.640 | It just tries to authenticate
00:01:04.840 | or get my Pinecone API key within a notebook,
00:01:06.960 | which is kind of nice.
00:01:08.360 | So I've already run it, so it's not gonna do it again.
00:01:11.380 | But basically, the API key is in the PINECONE_API_KEY
00:01:14.520 | environment variable there.
00:01:15.640 | So now I can just initialize my client as per usual.
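As a minimal sketch of the step just described, the helper below reads the key from the environment and fails with a clear message if it is missing. The helper name `get_pinecone_api_key` is made up for illustration; the commented-out client line assumes the `pinecone` package is installed.

```python
import os


def get_pinecone_api_key(env=None):
    """Return the API key stored in PINECONE_API_KEY, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("PINECONE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("PINECONE_API_KEY is not set; authenticate first.")
    return key


# Typical use in a notebook (assumes the `pinecone` package is installed):
# from pinecone import Pinecone
# pc = Pinecone(api_key=get_pinecone_api_key())
```

Failing early here gives a more readable error than a rejected request later on.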
00:01:19.500 | And what we're going to be doing
00:01:20.940 | is building an AI research assistant.
00:01:23.980 | So first, I wanna see, okay,
00:01:26.720 | do I have any assistants already?
00:01:29.000 | No, I don't.
00:01:30.460 | Now I'm gonna go ahead and actually create
00:01:32.920 | my AI research assistant.
00:01:35.480 | So I'm giving it this name.
00:01:37.200 | And I'm also adding this metadata in here.
00:01:39.680 | You don't need to do this.
00:01:42.280 | It's optional, so I can actually just remove it.
00:01:44.600 | But I'm adding it in there just so if others
00:01:48.020 | in the organization see this assistant,
00:01:50.080 | they can see who created it,
00:01:51.320 | and I can keep track of the version as well.
00:01:53.480 | So I will run that.
00:01:56.200 | Cool, and we can see that it has been created.
00:01:58.860 | And it is ready.
00:02:01.000 | I can also check here.
00:02:03.520 | So I can pass in the name of my assistant to describe_assistant.
00:02:07.000 | And we can just see that information again
00:02:08.640 | if we do need it.
00:02:10.360 | But yeah, we can move on to actually
00:02:14.240 | trying to interact with the assistant.
00:02:17.560 | It won't work this first time,
00:02:18.760 | because we need to provide it with some knowledge
00:02:21.160 | before we start asking questions.
00:02:22.760 | But I do want to just go over what we are doing here.
00:02:26.040 | Okay, so we also have this new message object.
00:02:28.840 | And that message object allows us
00:02:30.580 | to pass in the content of our message,
00:02:33.320 | and allows us to specify whether it is us talking,
00:02:36.240 | i.e. the user, or whether it is the assistant talking.
00:02:39.680 | So I'm going to be asking this question
00:02:41.680 | about Mixtral 8x7B.
00:02:44.040 | So I put role user here.
00:02:46.160 | And I'm going to hit the chat completions method here
00:02:49.640 | with messages.
00:02:50.480 | I'm going to pass a list of my messages,
00:02:53.320 | which is just one for now.
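To make the shape of a message concrete, here is a small stand-in sketch. The `Message` dataclass below is NOT the plugin's own class; it is a hypothetical stand-in that only mirrors the two fields described above (content and role), and the question text is made up.

```python
from dataclasses import dataclass

# The two speakers described above: us (the user) and the assistant.
VALID_ROLES = ("user", "assistant")


@dataclass
class Message:
    """Stand-in for the plugin's message object (assumed fields: content, role)."""
    content: str
    role: str = "user"

    def __post_init__(self):
        # Reject anything other than the two roles mentioned in the walkthrough.
        if self.role not in VALID_ROLES:
            raise ValueError(f"role must be one of {VALID_ROLES}, got {self.role!r}")


# A single-message conversation; the question wording is illustrative only.
msg = Message(content="Tell me about Mixtral 8x7B.", role="user")
messages = [msg]
```

The list grows as the conversation continues, with user and assistant messages alternating.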
00:02:55.160 | So I'm going to run that.
00:02:57.360 | And we will get this error.
00:02:58.680 | And we can see here,
00:02:59.760 | assistant doesn't contain any files, right?
00:03:02.120 | So we need to add some files to our assistant
00:03:04.600 | for it to work.
00:03:05.440 | So to do that, I'm going to download
00:03:08.720 | basically a ton of recent top AI papers
00:03:13.720 | from the past two months.
00:03:15.440 | So yeah, I'm going to git clone this repo.
00:03:19.640 | And basically, within this repo, there are just loads of PDF files.
00:03:23.880 | So it may take a moment to download.
00:03:26.160 | Okay, great, so that's done.
00:03:28.520 | And now I'm going to use pathlib
00:03:30.280 | to basically get the paths for all
00:03:33.160 | of the PDF files that I just downloaded.
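Collecting the PDF paths with pathlib could look roughly like this. The function name `find_pdfs` and the directory name in the usage comment are assumptions for illustration, since the transcript does not show the exact code.

```python
from pathlib import Path


def find_pdfs(root):
    """Return a sorted list of every .pdf file under `root`, searched recursively."""
    return sorted(p for p in Path(root).rglob("*.pdf") if p.is_file())


# Example (hypothetical directory name for the cloned repo):
# pdf_paths = find_pdfs("./papers")
# print(len(pdf_paths))
```

Sorting keeps the order stable between runs, which helps when matching uploaded files back to their paths. Note that the pattern is case-sensitive on most Linux systems, so files ending in `.PDF` would be skipped.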
00:03:35.960 | So I'm going to run that.
00:03:37.600 | Cool, so we have, I don't remember how many we have here.
00:03:41.720 | 48 maybe is the right number.
00:03:44.040 | So roughly 48 PDFs about AI.
00:03:48.160 | So I'm going to upload all those to our assistant.
00:03:51.120 | We have this assistant upload file method.
00:03:55.280 | We pass it a file path,
00:03:56.880 | and then it's going to send it over to Pinecone
00:03:59.560 | and the assistant.
00:04:00.800 | And then we also have this timeout option.
00:04:02.320 | So timeout, we can have a few values here.
00:04:05.040 | So we can set this value to like five.
00:04:08.200 | If we would like to wait five seconds
00:04:10.200 | and then like get a return, get a response from Pinecone.
00:04:13.560 | Or we could say None if we just want to wait
00:04:17.400 | until the PDF file has been processed.
00:04:19.720 | Or what I am going to do is use minus one,
00:04:22.960 | which basically says send the PDF file
00:04:25.560 | and then return immediately.
00:04:26.760 | Don't wait for it to process
00:04:28.240 | because I just want to like send as many PDFs as I can.
00:04:32.160 | So yeah, as quickly as I can.
00:04:35.280 | So that's what I'm doing.
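The three timeout modes can be illustrated with a small client-side sketch. This is not how the plugin implements its timeout; it is a hypothetical helper that only mimics the behaviour described above, with `is_done` standing in for whatever check reports that processing has finished.

```python
import time


def wait_for_processing(is_done, timeout=None, poll_interval=0.5,
                        sleep=time.sleep, clock=time.monotonic):
    """Mimic the three timeout modes described above.

    timeout=-1   -> return immediately without waiting
    timeout=None -> keep waiting until is_done() returns True
    timeout=N    -> wait up to N seconds, then give up

    Returns True if processing was seen to finish, False otherwise.
    """
    if timeout == -1:
        return False  # fire and forget: the caller checks the status later
    deadline = None if timeout is None else clock() + timeout
    while True:
        if is_done():
            return True
        if deadline is not None and clock() >= deadline:
            return False
        sleep(poll_interval)
```

Using minus one for a bulk upload means every file is sent without blocking, at the cost of having to check each file's status afterwards.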
00:04:36.600 | Now, because we are returning the status
00:04:40.680 | of these files immediately,
00:04:42.800 | what we will see in a moment
00:04:44.320 | is that the status for these files
00:04:46.400 | is going to come up as processing
00:04:48.640 | because we've literally sent it to Pinecone.
00:04:52.160 | Then of course, Pinecone has started processing the document
00:04:54.840 | and then we returned the status of that document immediately
00:04:58.040 | rather than waiting for it to process.
00:04:59.960 | So if we have a look here,
00:05:01.160 | we're going to see that all of these are processing.
00:05:03.760 | So what we now want to do is,
00:05:07.360 | okay, have they finished processing yet?
00:05:09.920 | I don't know, let's have a look.
00:05:11.120 | So we just call describe file and we pass the file ID.
00:05:15.000 | And now we can see that at least this first document
00:05:17.240 | has finished processing.
00:05:19.040 | And I'm going to run this little for loop here
00:05:21.200 | to check the rest of them.
00:05:22.240 | So let's see how many of them are complete.
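A loop like the one described here might be sketched as follows. The `describe_file` argument is a hypothetical callable that takes a file ID and returns a status string; in the notebook it would wrap the assistant's describe-file call. The exact status spelling ("Processing") is an assumption.

```python
def split_by_status(describe_file, file_ids, processing_status="Processing"):
    """Check every file once and split the IDs into (done, pending).

    `describe_file(file_id)` must return a status string. Anything that is
    not `processing_status` is treated as finished.
    """
    done, pending = [], []
    for file_id in file_ids:
        if describe_file(file_id) == processing_status:
            pending.append(file_id)
        else:
            done.append(file_id)
    return done, pending


# Example with the real client (the attribute name is an assumption):
# done, pending = split_by_status(
#     lambda fid: assistant.describe_file(file_id=fid).status, file_ids)
# print(f"{len(done)} complete, {len(pending)} still processing")
```

If anything is still pending, running the same check again a little later is enough; there is no need to re-upload.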
00:05:25.120 | Okay, so all of them are complete.
00:05:28.080 | That was super fast.
00:05:29.600 | So yeah, we can move on
00:05:33.200 | to actually chatting with our assistant now.
00:05:36.000 | So we're going to come down to here.
00:05:37.520 | I'm going to import this Markdown display
00:05:40.200 | because the assistant will reply in Markdown.
00:05:44.200 | So there are citations and stuff in there
00:05:47.120 | and they are formatted with Markdown.
00:05:48.880 | So it's a lot nicer to print that out
00:05:52.040 | with Markdown rather than just viewing it
00:05:53.880 | as a direct print.
00:05:55.560 | So I'm going to hit chat completions again.
00:05:59.160 | Same question, I'm just going to ask it
00:06:00.560 | about the Mixtral model and see what it says.
00:06:03.560 | Okay, cool.
00:06:04.400 | So we have this big chunk of text
00:06:08.720 | telling us all about Mixtral 8x7B.
00:06:12.720 | And well, I mean, kind of the whole point of,
00:06:15.920 | or one of the main points of Pinecone assistants
00:06:20.040 | is that everything is grounded in like actual knowledge.
00:06:24.720 | So we can see that in the response here.
00:06:29.080 | So one, everything is accurate.
00:06:31.200 | So Mixtral 8x7B is a sparse mixture
00:06:35.240 | of experts language model.
00:06:36.840 | And it gives us all this information about it,
00:06:38.920 | which is great.
00:06:40.120 | But one thing that is really nice about this
00:06:42.360 | is we have the citations here.
00:06:44.160 | So we can see here that we have reference one.
00:06:46.560 | So this Mixtral of Experts PDF,
00:06:48.960 | which we can open in a moment.
00:06:50.800 | And we also see which pages
00:06:54.920 | have been used in order to get that information.
00:06:57.160 | So we have page one and page four.
00:06:59.640 | Okay, so to basically construct this paragraph here,
00:07:02.880 | that is what is being used.
00:07:04.560 | Then we have, okay, we have PDF one again,
00:07:07.840 | with pages one to two,
00:07:09.480 | and then also page six to construct this one
00:07:12.440 | and so on and so on, which is pretty nice.
00:07:15.040 | And then we can also click on here
00:07:17.120 | and it brings us through to just seeing the PDF here,
00:07:21.120 | which is pretty cool.
00:07:22.640 | And then obviously we can refer to our citations
00:07:26.320 | and basically just have more confidence
00:07:28.280 | in what the assistant is telling us, which is nice.
00:07:32.800 | So that is cool.
00:07:34.920 | But now I want to actually chat with the assistant.
00:07:37.840 | So we're gonna set up some code
00:07:39.200 | that will allow us to do that a bit more easily.
00:07:41.560 | So first thing we need is a list of our chat history.
00:07:46.520 | And I'm gonna initialize that list
00:07:49.080 | with the first message that I sent asking about Mixtral
00:07:52.440 | and the response from our assistant, which is here, okay?
00:07:57.440 | So actually we can just have a quick look
00:08:00.080 | at the output there so that you can see what I'm doing
00:08:03.880 | or what I am looking at.
00:08:06.800 | Okay, so we have the content
00:08:10.240 | and then we also have the role, which is assistant.
00:08:13.280 | So yeah, we're just passing that
00:08:15.080 | and creating a message object using those two values.
00:08:19.120 | And then I'm going to create this chat function,
00:08:21.440 | which is just gonna consume a message right from me.
00:08:23.680 | So when I'm asking a question,
00:08:25.480 | I'm gonna pass it into there.
00:08:27.320 | It's going to format my input into a message object.
00:08:32.320 | We are going to get the response from our assistant.
00:08:36.600 | We're going to extract that response out
00:08:40.040 | into the format that we need.
00:08:42.440 | And then I'm going to add both my initial message.
00:08:47.000 | So what's coming in here or here
00:08:50.120 | and the message or response from the assistant
00:08:53.160 | to the chat history, okay?
00:08:54.600 | So we're going to be adding to the chat history over time.
00:08:57.280 | And then I'm gonna return the markdown formatted response
00:09:01.800 | so that we can actually see what it is saying.
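The chat function described above can be sketched roughly like this. The `Message` dataclass is a stand-in for the plugin's message object, and `send` is a hypothetical callable that takes the list of messages, calls the assistant, and returns the reply text; in the notebook the returned text is then wrapped for Markdown display.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Stand-in for the plugin's message object (assumed fields: content, role)."""
    content: str
    role: str


def chat(user_text, history, send):
    """Send the history plus a new user message, record both turns, return the reply.

    `history` is a list of Message objects that is updated in place.
    `send(messages)` is assumed to return the assistant's reply as a string.
    """
    user_msg = Message(content=user_text, role="user")
    reply_text = send(history + [user_msg])
    # Only record the turn once the assistant has answered successfully.
    history.append(user_msg)
    history.append(Message(content=reply_text, role="assistant"))
    return reply_text
```

Appending to the history only after a successful reply means a failed request does not leave a dangling user message in the conversation.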
00:09:04.320 | And let's ask some more questions.
00:09:06.280 | So the first one is I'm going to ask a little bit more
00:09:09.480 | about what is a sparse mixture of experts model.
00:09:14.480 | What does it mean?
00:09:16.840 | So let's see what it tells us.
00:09:20.080 | Okay, cool.
00:09:20.920 | So we have this sparse mixture of experts model
00:09:24.240 | architecture in machine learning,
00:09:26.080 | and so on and so on.
00:09:29.160 | And we can actually see that the reference here is different.
00:09:33.600 | It's not actually coming from the same paper.
00:09:35.240 | It's coming from another paper that we have in there.
00:09:37.760 | So we can open that, right?
00:09:39.400 | And we see, okay, this paper is literally talking about
00:09:43.840 | or to some degree about SMoE, which is pretty cool.
00:09:48.760 | And interestingly, it also tells us, okay, look,
00:09:51.520 | we have this low expert activation
00:09:53.800 | where it's talking about basically the drawbacks of SMoE.
00:09:57.000 | And if we come back over to here,
00:09:59.880 | we'll see that this is being pulled in as well,
00:10:01.800 | which is pretty cool.
00:10:02.960 | So within this short summary,
00:10:05.120 | it's showing us all like the most important information
00:10:07.920 | or in my opinion, some of the most interesting information.
00:10:11.720 | So, okay, that's cool, but I have no idea what this means.
00:10:14.760 | So let's ask about that.
00:10:17.080 | So why is low expert activation a bad thing?
00:10:21.440 | What is the problem with that?
00:10:22.600 | So let's see what it comes up with.
00:10:25.440 | Okay, so we're pulling from the same paper again.
00:10:28.040 | And it says, okay, it's detrimental for several reasons.
00:10:31.640 | Underutilizing the model capacity,
00:10:34.280 | suboptimal performance, inefficiency in learning
00:10:37.360 | and limited fine grained understanding.
00:10:40.040 | Okay, so that's cool.
00:10:42.400 | Okay, nice.
00:10:43.600 | We learned about Mixtral and SMoE a little bit.
00:10:47.280 | Now let's learn about something more recent.
00:10:50.840 | So we have the Mamba 2 model.
00:10:52.520 | I, you know, let's say I don't have a clue what Mamba 2 is.
00:10:55.760 | And I just want to,
00:10:57.520 | I just want like a really nice little overview
00:11:00.200 | of what it is.
00:11:01.680 | So let's ask and see what we get.
00:11:06.200 | Cool, so Mamba 2 is a type of deep learning model
00:11:10.360 | designed to handle sequences of data
00:11:11.800 | like text or audio very efficiently.
00:11:14.160 | Here's a breakdown.
00:11:15.000 | What is Mamba 2?
00:11:15.840 | It's a sequence model.
00:11:16.680 | It builds on top of the original Mamba model.
00:11:18.720 | And it helps to process sequences more efficiently
00:11:22.280 | than traditional models like transformers,
00:11:25.480 | which is pretty cool.
00:11:27.640 | And we can see, okay, we've got reference one here,
00:11:30.040 | but in this output, we actually have two references,
00:11:32.840 | which is nice,
00:11:33.680 | or at least two different documents that it's pulling from.
00:11:37.080 | And yeah, we can go ahead and have a look at both of those.
00:11:41.480 | So I'm going to close these.
00:11:42.640 | So "Transformers are SSMs".
00:11:44.920 | So this is the Mamba 2 paper, I believe.
00:11:48.120 | Yeah, so Mamba 2, cool.
00:11:52.720 | And let's have a look at what the other one is.
00:11:54.560 | So this is actually the Mamba 1 paper.
00:11:57.320 | Okay, so it's pulling information from both of those
00:11:59.920 | and constructing this nice overview,
00:12:03.000 | which is pretty cool.
00:12:03.920 | And yeah, probably actually pretty useful
00:12:07.440 | for just keeping relatively up to date with what is going on.
00:12:12.000 | So you can, of course,
00:12:14.200 | continue talking to your assistant for as long as you like,
00:12:17.360 | but I'm done with mine now.
00:12:18.720 | So I'm going to go ahead
00:12:19.800 | and save myself a little bit of memory
00:12:22.280 | by deleting the assistant.
00:12:23.720 | So if we just come over to here quickly,
00:12:26.120 | we're going to go to assistants beta,
00:12:28.000 | and this is the Pinecone console.
00:12:29.880 | And you can see down here, I have this storage.
00:12:32.480 | So we have limited storage at the moment.
00:12:35.120 | So I'm going to just go ahead and delete my assistant,
00:12:38.600 | and that will free up the storage for me
00:12:41.120 | by just deleting all those documents
00:12:43.520 | that I originally provided it with.
00:12:46.160 | And yeah, with that, we are done with this walkthrough.
00:12:50.400 | And we've seen a little bit
00:12:51.680 | of what Pinecone assistants can do,
00:12:53.360 | which is just a really easy-to-use,
00:12:57.600 | like out-of-the-box AI assistant
00:13:00.400 | that is able to ground its answers in knowledge
00:13:03.920 | very well, as we saw,
00:13:05.960 | and gives us a really nice little interface
00:13:07.800 | for providing more trustworthy outputs from our assistant.
00:13:12.800 | So that's it for this video.
00:13:16.080 | I hope all of this has been useful and interesting,
00:13:20.720 | but I'll leave it there for now.
00:13:21.760 | So thank you very much for watching,
00:13:23.200 | and I will see you again in the next one.
