Today, we are going to be taking a look at the new Pinecone Assistant. Pinecone Assistant allows us to build AI assistants and augment them with additional documents and knowledge super easily. That means we can get AI assistants that suffer less from hallucinations, have more up-to-date knowledge, and can also answer questions about knowledge specific to our own use cases or organizations, simply by providing them with the source of that knowledge through PDF documents.
In this video, we're going to take a look at Pinecone Assistants and how we can use them in Python. We're going to be working through this notebook here; there will be a link to it in the comments below. The first thing we're going to do is just install the prerequisites.
So we have the Pinecone client as usual, and then we also have this plugin, which allows us to use assistants through the usual Python client. So we install those. I'm also going to be using Pinecone Notebooks, which you'll see here; it lets me authenticate and get my Pinecone API key from within the notebook, which is kind of nice.
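As a rough sketch, the setup looks something like this; the package names follow the Pinecone docs at the time of recording and may differ in newer releases:

```python
# Install the Pinecone client, the assistant plugin, and the notebook helper.
!pip install -qU pinecone-client pinecone-plugin-assistant pinecone-notebooks

# Authenticate from inside the notebook; this stores the API key in the
# PINECONE_API_KEY environment variable.
from pinecone_notebooks.colab import Authenticate

Authenticate()
```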
I've already run it, so it's not going to do it again. But basically, the API key ends up in the PINECONE_API_KEY environment variable there. So now I can just initialize my client as usual. What we're going to be doing is building an AI research assistant. So first, I want to see: do I have any assistants already?
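Something like this, assuming the key was stored by the Authenticate step above:

```python
import os
from pinecone import Pinecone

# Initialize the client with the key stored by Authenticate().
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Check whether any assistants already exist in this project.
print(pc.assistant.list_assistants())
```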
No, I don't. So now I'm going to go ahead and actually create my AI research assistant. I'm giving it a name, and I'm also adding some metadata. You don't need to do this; it's optional, so I could just remove it. But I'm adding it so that if others in the organization see this assistant, they can see who created it, and I can keep track of the version as well.
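A minimal sketch of that call; the assistant name and metadata values here are hypothetical stand-ins:

```python
assistant = pc.assistant.create_assistant(
    assistant_name="ai-research-assistant",  # hypothetical name
    metadata={"author": "me", "version": "1.0"},  # optional; hypothetical values
    timeout=30,  # wait up to 30 seconds for the assistant to be ready
)
```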
So I will run that. Cool, and we can see that it has been created, and it is ready. I can also check by passing the name of my assistant to describe_assistant, and we can see that information again if we need it.
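That check is just:

```python
# Fetch the assistant's details by name (same hypothetical name as above).
print(pc.assistant.describe_assistant(assistant_name="ai-research-assistant"))
```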
But yeah, we can move on to actually trying to interact with the assistant. It won't work this first time, because we need to provide it with some knowledge before we start asking questions, but I do want to just go over what we are doing here. We have this new Message object, which allows us to pass in the content of our message and to specify whether it is us talking, i.e.
the user, or whether it is the assistant talking. So I'm going to be asking a question about Mixtral 8x7B, so I put role user here. And I'm going to call the chat_completions method with messages set to a list of my messages, which is just one for now.
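Roughly like this; the import path matches the assistant plugin at the time of recording, and the question text is a stand-in:

```python
from pinecone_plugins.assistant.models.chat import Message

# role="user" marks this as our message rather than the assistant's reply.
msg = Message(content="What is Mixtral 8x7B?", role="user")

resp = assistant.chat_completions(messages=[msg])
```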
So I'm going to run that, and we get this error. We can see here: the assistant doesn't contain any files. So we need to add some files to our assistant for it to work. To do that, I'm going to download a ton of recent top AI papers from the past two months.
So I'm going to git clone this repo, and within this repo there are loads of PDF files, so it may take a moment to download. Okay, great, so that's done. Now I'm going to use pathlib to get the paths to all of the PDF files that I just downloaded.
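A sketch of that step; the directory name is a placeholder for wherever the repo was cloned to:

```python
from pathlib import Path

# Collect the path of every PDF inside the cloned repo.
pdf_paths = [str(p) for p in Path("papers").glob("**/*.pdf")]
print(len(pdf_paths))
```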
So I'm going to run that. Cool, so we have, I don't remember exactly how many, 48 maybe. So roughly 48 PDFs about AI, and I'm going to upload all of those to our assistant. We have this upload_file method on the assistant: we pass it a file path, and it sends the file over to Pinecone and the assistant.
Then we also have this timeout option, which can take a few values. We can set it to something like five if we would like to wait five seconds and then get a response back from Pinecone, or we can set it to None if we want to wait until the PDF file has been fully processed. What I'm going to do instead is use -1, which basically says: send the PDF file and return immediately, without waiting for it to be processed, because I just want to send as many PDFs as I can, as quickly as I can. So that's what I'm doing.
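A minimal sketch of that upload loop:

```python
# Upload every PDF; timeout=-1 returns as soon as the upload is accepted,
# without waiting for Pinecone to finish processing each file.
files = []
for pdf_path in pdf_paths:
    files.append(assistant.upload_file(file_path=pdf_path, timeout=-1))
```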
Now, because we are returning the status of these files immediately, what we will see in a moment is that the status for each file comes up as processing. We've literally just sent it to Pinecone; Pinecone has started processing the document, and we returned its status immediately rather than waiting for it to finish.
So if we have a look here, we see that all of these are processing. What we now want to do is check: have they finished processing yet? I don't know, so let's have a look. We just call describe_file and pass the file ID, and now we can see that at least this first document has finished processing.
And I'm going to run a little for loop to check the rest of them and see how many are complete, something like this:
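This is a sketch; I'm assuming the completed status string is "Available", per the docs at the time of recording:

```python
# Check one file first...
print(assistant.describe_file(file_id=files[0].id).status)

# ...then poll the rest and count how many have finished processing.
statuses = [assistant.describe_file(file_id=f.id).status for f in files]
print(sum(s == "Available" for s in statuses), "of", len(statuses), "complete")
```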
Okay, so all of them are complete; that was super fast. So we can move on to actually chatting with our assistant now. Coming down to here, I'm going to import the Markdown display, because the assistant replies in Markdown; there are citations and such in there, and they're formatted with Markdown, so it's a lot nicer to render that than to view it as a raw print. And I'm going to call chat_completions again.
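Something like this to render the reply; I'm assuming the response follows the OpenAI-style chat completion shape, with the text at resp.choices[0].message.content:

```python
from IPython.display import Markdown, display

# Ask the same question again, now that the assistant has files to draw on.
resp = assistant.chat_completions(messages=[msg])

# Render the Markdown-formatted answer, citations and all.
display(Markdown(resp.choices[0].message.content))
```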
Same question; I'm just asking about the Mixtral model again to see what it says now. Okay, cool. So we have this big chunk of text telling us all about Mixtral 8x7B. And one of the main points of Pinecone Assistants is that everything is grounded in actual knowledge.
We can see that in the response here. Everything is accurate: Mixtral 8x7B is a sparse mixture of experts language model, and it gives us all this information about it, which is great. But one thing that is really nice about this is that we have the citations here.
So we can see that we have reference one, this mixture of experts PDF, which we can open in a moment. And we also see which pages were used to get that information: pages one and four. So to construct this paragraph here, that is what was used.
Then we have PDF one again, with pages one, two, and six used to construct this next paragraph, and so on, which is pretty nice. And we can also click through here, and it brings us straight to the PDF itself, which is pretty cool.
Then, obviously, we can refer to our citations and basically have more confidence in what the assistant is telling us, which is nice. So that is cool. But now I want to actually chat with the assistant, so we're going to set up some code that will allow us to do that a bit more easily.
The first thing we need is a list for our chat history, and I'm going to initialize that list with the first message I sent asking about Mixtral and the response from our assistant. So let's have a quick look at that output so you can see what I'm working with.
Okay, so we have the content, and then we also have the role, which is assistant. So we're just taking those two values and creating a Message object from them. Then I'm going to create this chat function, which is going to consume a message from me: when I'm asking a question, I'm going to pass it in there.
It formats my input into a Message object, we get the response from our assistant, and we extract that response out into the format we need. Then I add both my initial message and the response from the assistant to the chat history.
So we're adding to the chat history over time, and then returning the Markdown-formatted response so that we can actually read what it's saying. A rough sketch of that setup follows below.
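This is a sketch of the seeded history and the helper function, reusing the msg and resp objects from the earlier Mixtral question:

```python
# Seed the history with the first question and the assistant's reply.
chat_history = [
    msg,
    Message(content=resp.choices[0].message.content, role="assistant"),
]

def chat(message: str) -> Markdown:
    # Format the input as a user message and send the full history.
    chat_history.append(Message(content=message, role="user"))
    resp = assistant.chat_completions(messages=chat_history)
    # Extract the reply, add it to the history, and render it as Markdown.
    reply = resp.choices[0].message.content
    chat_history.append(Message(content=reply, role="assistant"))
    return Markdown(reply)
```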
Now let's ask some more questions. The first one: I want to dig a little more into what a sparse mixture of experts model is and what it actually means. So let's see what it tells us. Okay, cool. So it tells us that a sparse mixture of experts model is an architecture in machine learning, and so on and so on. And we can actually see that the reference here is different; it's not actually coming from the same paper.
It's coming from another paper that we have in there, so we can open that. And we see, okay, this paper is literally talking, at least to some degree, about SMoE, which is pretty cool. Interestingly, it also tells us about this low expert activation, where it's discussing the drawbacks of SMoE.
And if we come back over to here, we'll see that this is being pulled in as well, which is pretty cool. So within this short summary, it's showing us the most important information, or in my opinion, some of the most interesting information. So, okay, that's cool, but I have no idea what that means.
So let's ask about that: why is low expert activation a bad thing? What is the problem with it? Let's see what it comes up with. Okay, so we're pulling from the same paper again, and it says it's detrimental for several reasons: underutilization of model capacity, suboptimal performance, inefficiency in learning, and limited fine-grained understanding.
Okay, so that's cool. Nice, we learned about Mixtral and SMoE a little bit. Now let's learn about something more recent: the Mamba2 model. Let's say I don't have a clue what Mamba2 is, and I just want a really nice little overview of what it is.
So let's ask and see what we get. Cool, so Mamba2 is a type of deep learning model designed to handle sequences of data like text or audio very efficiently. Here's a breakdown. What is Mamba2? It's a sequence model. It builds on top of the original Mamba model. And it helps to process sequences more efficiently than traditional models like transformers, which is pretty cool.
And we can see, okay, we've got reference one here, but in this output, we actually have two references, which is nice, or at least two different documents that it's pulling from. And yeah, we can go ahead and have a look at both of those. So I'm going to close these.
So, "Transformers are SSMs": this is the Mamba2 paper, I believe. Yeah, so Mamba2, cool. And let's have a look at what the other one is. This is actually the original Mamba paper. So it's pulling information from both of those and constructing this nice overview, which is pretty cool.
And probably actually pretty useful for just keeping relatively up to date with what is going on. You can, of course, continue talking to your assistant for as long as you like, but I'm done with mine now, so I'm going to go ahead and save myself a little bit of storage by deleting the assistant.
If we come over here quickly to the Assistants (beta) page in the Pinecone console, you can see down here that I have this storage figure. We have limited storage at the moment, so I'm going to go ahead and delete my assistant, which will free up that storage by deleting all of the documents I originally provided it with.
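In code, that's just one call, using the same hypothetical assistant name as before:

```python
# Delete the assistant and the uploaded documents along with it.
pc.assistant.delete_assistant(assistant_name="ai-research-assistant")
```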
And with that, we are done with this walkthrough. We've seen a little of what Pinecone Assistants can do: a really easy-to-use, out-of-the-box AI assistant that grounds its answers in knowledge very well, as we saw, and gives us a really nice interface for getting more trustworthy outputs from our assistant.
So that's it for this video. I hope all of this has been useful and interesting, but I'll leave it there for now. Thank you very much for watching, and I will see you again in the next one. Bye.