Pinecone Assistant
Chapters
0:00 AI Assistants
0:41 Pinecone Assistants in Python
1:19 Building an AI Research Assistant
2:11 Assistant Message and Chat
3:05 Adding Files to the Assistant
5:30 Chatting with our Assistant
7:23 Assistant Chat History
10:47 Asking about Mamba 2
12:11 Wrapping up with Assistants
Now, Pinecone Assistant allows us to build AI assistants 00:00:21.800 |
that can also answer questions about knowledge 00:00:25.520 |
specific to our own use cases or our own organizations, 00:00:29.960 |
by simply providing them with the source of that knowledge 00:00:37.800 |
Now, in this video, we're going to take a look at Pinecone Assistants and how we can use them in Python. 00:00:41.240 |
Now, we're gonna be working through this notebook here. 00:00:43.640 |
There will be a link to this in the comments below. 00:00:59.520 |
I'm also going to be using Pinecone Notebooks, 00:01:04.840 |
which lets me get my Pinecone API key from within the notebook, 00:01:08.360 |
So I've already run it, so it's not gonna do it again. 00:01:11.380 |
But basically, the API key is stored in the PINECONE_API_KEY environment variable 00:01:15.640 |
So now I can just initialize my client as per usual. 00:01:42.280 |
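As a rough sketch (assuming the Pinecone Python SDK with the assistant plugin is installed), initializing the client looks something like this:

```python
import os
from pinecone import Pinecone

# Assumes PINECONE_API_KEY was already set, e.g. by the Pinecone Notebooks auth step
api_key = os.environ.get("PINECONE_API_KEY")
pc = Pinecone(api_key=api_key)
```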
It's optional, so I can actually just remove it. 00:01:56.200 |
Cool, and we can see that it has been created. 00:02:03.520 |
So I can pass in the name of my assistant to describe_assistant. 00:02:18.760 |
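A minimal sketch of creating and then describing the assistant might look like this; the assistant name is just a placeholder, and the optional keyword arguments (like metadata) are the ones you can simply leave out:

```python
# Create the assistant (optional kwargs such as metadata can be omitted)
pc.assistant.create_assistant(assistant_name="ai-research-assistant")  # placeholder name

# Check its details/status by name
print(pc.assistant.describe_assistant(assistant_name="ai-research-assistant"))

# Grab a handle we can use later for uploads and chat
assistant = pc.assistant.Assistant(assistant_name="ai-research-assistant")
```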
because we need to provide it with some knowledge 00:02:22.760 |
But I do want to just go over what we are doing here. 00:02:26.040 |
Okay, so we also have this new message object. 00:02:33.320 |
and allows us to specify whether it is us talking, 00:02:36.240 |
i.e. the user, or whether it is the assistant talking. 00:02:46.160 |
And I'm going to hit the chat completions method here 00:03:02.120 |
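Roughly, the message object and the chat completions call look like this (a sketch; the Message import path and the OpenAI-style response shape reflect the assistant plugin at the time and may differ in newer SDK versions):

```python
from pinecone_plugins.assistant.models.chat import Message

# role is either "user" (us talking) or "assistant" (the assistant talking)
msg = Message(role="user", content="Can you tell me about this model?")

# OpenAI-style chat completions call against our assistant
resp = assistant.chat_completions(messages=[msg])
print(resp.choices[0].message.content)  # assumed OpenAI-style response shape
```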
So we need to add some files to our assistant 00:03:19.640 |
And basically within this repo, there are just loads of PDF files. 00:03:37.600 |
Cool, so we have, I don't remember how many we have here. 00:03:48.160 |
So I'm going to upload all those to our assistant. 00:03:56.880 |
and then it's going to send it over to Pinecone 00:04:10.200 |
and then get a response back from Pinecone. 00:04:19.720 |
Or what I am going to do is use minus one, 00:04:28.240 |
because I just want to like send as many PDFs as I can. 00:04:52.160 |
Then of course, Pinecone has started processing the document 00:04:54.840 |
and then it returns the status of that document immediately 00:05:01.160 |
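As a sketch, the upload step could look something like this; the papers directory is a placeholder for wherever the repo's PDFs were downloaded to:

```python
from pathlib import Path

# Hypothetical local folder holding the PDF papers pulled from the repo
pdf_paths = sorted(Path("papers").glob("*.pdf"))

uploaded = []
for path in pdf_paths:
    # Sends the file to Pinecone; the returned object includes the file ID and status
    file_info = assistant.upload_file(file_path=str(path))
    uploaded.append(file_info)
    print(file_info.id, file_info.status)
```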
we're going to see that all of these are processing. 00:05:11.120 |
So we just call describe file and we pass the file ID. 00:05:15.000 |
And now we can see that at least this first document 00:05:19.040 |
And I'm going to run this little for loop here 00:05:40.200 |
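That little wait loop could be sketched like this, reusing the uploaded list from the sketch above (the "Available" status string is an assumption based on the notebook output):

```python
import time

# Re-check every uploaded file until nothing is still processing
while True:
    statuses = [assistant.describe_file(file_id=f.id).status for f in uploaded]
    if all(s == "Available" for s in statuses):  # assumed status value
        break
    time.sleep(5)
```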
because the assistant will reply in Markdown. 00:06:00.560 |
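Since the reply comes back as Markdown, it can be rendered in the notebook, for example:

```python
from IPython.display import Markdown, display

# Render the assistant's Markdown reply nicely inside the notebook
display(Markdown(resp.choices[0].message.content))  # assumed response shape as above
```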
about the Mixtral model and see what it says. 00:06:12.720 |
And well, I mean, kind of the whole point of, 00:06:15.920 |
or one of the main points of Pinecone Assistant 00:06:15.920 |
is that everything is grounded in like actual knowledge. 00:06:36.840 |
And it gives us all this information about it, 00:06:44.160 |
So we can see here that we have reference one. 00:06:54.920 |
like we've used in order to get that information. 00:06:59.640 |
Okay, so to basically construct this paragraph here, 00:07:17.120 |
and it brings us through to just seeing the PDF here, 00:07:22.640 |
And then obviously we can refer to our citations 00:07:28.280 |
in what the assistant is telling us, which is nice. 00:07:34.920 |
But now I want to actually chat with the assistant. 00:07:39.200 |
that will allow us to do that a bit more easily. 00:07:41.560 |
So first thing we need is a list of our chat history. 00:07:49.080 |
with the first message that I sent asking about Mixtral 00:07:52.440 |
and the response from our assistant, which is here, okay? 00:08:00.080 |
at the output there so that you can see what I'm doing 00:08:10.240 |
and then we also have the role, which is assistant. 00:08:15.080 |
and creating a message object using those two values. 00:08:19.120 |
And then I'm going to create this chat function, 00:08:21.440 |
which is just gonna consume a message right from me. 00:08:27.320 |
It's going to format my input into a message object. 00:08:32.320 |
We are going to get the response from our assistant. 00:08:42.440 |
And then I'm going to add both my initial message, 00:08:50.120 |
and the message or response from the assistant 00:08:54.600 |
So we're going to be adding to the chat history over time. 00:08:57.280 |
And then I'm gonna return the markdown formatted response 00:09:01.800 |
so that we can actually see what it is saying. 00:09:06.280 |
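Putting that together, the chat helper might look roughly like this (a sketch; chat_history and chat are just the names used in the walkthrough, and the response shape is again assumed to be OpenAI-style):

```python
from pinecone_plugins.assistant.models.chat import Message
from IPython.display import Markdown

chat_history = []  # in the video this starts with the earlier question/answer pair

def chat(user_input: str):
    # Format my input into a Message object and add it to the history
    chat_history.append(Message(role="user", content=user_input))
    # Get the response from the assistant, given the full history
    resp = assistant.chat_completions(messages=chat_history)
    answer = resp.choices[0].message.content  # assumed OpenAI-style shape
    # Add the assistant's response to the history as well
    chat_history.append(Message(role="assistant", content=answer))
    # Return the Markdown-formatted response so we can see what it is saying
    return Markdown(answer)
```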
So the first one is I'm going to ask a little bit more 00:09:09.480 |
about what is a sparse mixture of experts model. 00:09:20.920 |
So we have this sparse mixture of experts model, 00:09:29.160 |
And we can actually see that the reference here is different. 00:09:33.600 |
It's not actually coming from the same paper. 00:09:35.240 |
It's coming from another paper that we have in there. 00:09:39.400 |
And we see, okay, this paper is literally talking about 00:09:43.840 |
or to some degree about SMOE, which is pretty cool. 00:09:53.800 |
where it's talking about basically the drawbacks of SMOE. 00:09:59.880 |
we'll see that this is being pulled in as well, 00:10:05.120 |
it's showing us all like the most important information 00:10:07.920 |
or in my opinion, some of the most interesting information. 00:10:11.720 |
So, okay, that's cool, but I have no idea what this means. 00:10:25.440 |
Okay, so we're pulling from the same paper again. 00:10:28.040 |
And it said, okay, it's detrimental for several reasons: 00:10:28.040 |
suboptimal performance, inefficiency in learning 00:10:43.600 |
We learned about Mixtral and SMOE a little bit. 00:10:43.600 |
I, you know, let's say I don't have a clue what Mamba 2 is. 00:10:52.520 |
I just want like a really nice little overview 00:11:06.200 |
Cool, so Mamba 2 is a type of deep learning model 00:11:06.200 |
It builds on top of the original Mamba model. 00:11:18.720 |
And it helps to process sequences more efficiently 00:11:27.640 |
And we can see, okay, we've got reference one here, 00:11:30.040 |
but in this output, we actually have two references, 00:11:33.680 |
or at least two different documents that it's pulling from. 00:11:37.080 |
And yeah, we can go ahead and have a look at both of those. 00:11:52.720 |
And let's have a look at what the other one is. 00:11:57.320 |
Okay, so it's pulling information from both of those 00:12:07.440 |
for just keeping relatively up to date with what is going on. 00:12:14.200 |
continue talking to your assistant for as long as you like, 00:12:29.880 |
And you can see down here, I have this storage. 00:12:35.120 |
So I'm going to just go ahead and delete my assistant, 00:12:46.160 |
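Cleaning up is just one call, something like this (same placeholder name as earlier):

```python
# Delete the assistant (and its uploaded files) once we're done
pc.assistant.delete_assistant(assistant_name="ai-research-assistant")
```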
And yeah, with that, we are done with this walkthrough. 00:13:00.400 |
that is able to ground its answers in knowledge 00:13:07.800 |
for providing more trustworthy outputs from our assistant. 00:13:16.080 |
I hope all of this has been useful and interesting,