
NEW Pinecone Assistant Features + GA Release!


Chapters

0:00 New AI Assistant
0:40 Pinecone Assistant Release
2:38 Pinecone Assistant Python Client
3:39 Assistant Custom Instructions
7:05 Pinecone Assistant APIs
9:42 Assistant Chat API
19:37 Context API
21:37 Pinecone Chat Completions
24:42 Deleting our Assistant and Concluding

Transcript

Today we're going to be trying out the new Pinecone Assistant, which has just been made generally available. Now, for those of you that don't know, Pinecone Assistant is an API service that provides you with an agent that comes with best-in-class RAG capabilities. It focuses on making your agent as grounded in truth as possible.

Of course, being Pinecone, their whole thing is vector databases and retrieval, so they're pretty good at that sort of thing. Now, Assistant being made generally available comes with a few new features. There's a great summary of everything in this article here as well, but the most exciting, in my opinion, are the custom instructions, which are obviously pretty useful when you're building your own agents, and, which is very cool, the new input and output formats.

The input formats are nice: we now have markdown and DOCX files, in addition to PDF and text. But what I really like here is the output format. There is essentially a JSON mode, so your assistant can now produce structured output, and generally speaking I'm always a big fan of that, because basically everything I've had to build with agents, for quite a long time now, has relied on the ability to output structured text, and to do that reliably. It's just incredibly useful.

So that's pretty big, in my opinion. Then there's region control, so EU or US, which obviously matters for GDPR if you need that. And I shouldn't say finally, because these are also pretty important, but there's also a new chat API and a new context API, so we'll take a look at all of those.

Now, before we jump in, I will just say, this visualizes pretty well why you might actually want to use Pinecone Assistant over something like OpenAI Assistants, which would be the most similar thing out there. Generally speaking, and we will see this in the example that we walk through, Pinecone Assistant is just very good at grounding everything it tells you with sources.

So it's much more trustworthy, much less likely to hallucinate and make stuff up than OpenAI Assistants, and you can see some metrics here. We'll see in the example we walk through that this is the case as well. Now, for the example, there will be a link in the video description and also in the top comment, I will make sure it's in there, but we're going to go in and build something very simple.

We're going to build an assistant that's going to help us understand an AI paper, and it's going to be from Yorkshire, so we'll see what it's like. First thing is, of course, the API key, which will be free.

You can go and get an API key at app.pinecone.io; I did include a link here. So you go there, get your API key, and then you just enter it. I'm going to be creating my new assistant, called the Yorkshire Assistant. Right now I don't have any assistants, and you can just list those like that.

So I'm going to go ahead and create him. For my instructions, and of course one of the key features here is those custom instructions, I'm going to go with the typical "you are a helpful assistant that must help with some queries", and then we're going to modify it a little bit.

So he's going to be helpful, and he's also going to be from the Yorkshire countryside and will always respond with heavy Yorkshire slang, colloquialisms, and references to that great county. He will also try to use relevant metaphors to explain concepts to the user.

And one thing that I really like about more recent models is that they do tend to go with markdown output. I found that Assistant didn't do that, which is actually probably not a bad thing as a default, but here I would like it.

So I'm just telling it here: format your answers in markdown whilst maintaining the Yorkshire accent. Then we create our assistant. Here I'm doing another quick check, just making sure we don't already have that assistant, then we call create assistant, passing our name and instructions. The timeout here is just how long we're going to wait for the assistant to be ready before we return.
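As a minimal sketch of what those steps look like in the Python client, assuming the `pinecone` package with the assistant plugin installed (`pip install pinecone pinecone-plugin-assistant`); the assistant name and instruction text here are just the ones we're using in this walkthrough:

```python
from pinecone import Pinecone

# Initialise the client with your API key from app.pinecone.io
pc = Pinecone(api_key="YOUR_API_KEY")

instructions = (
    "You are a helpful assistant that must help with user queries. "
    "You are from the Yorkshire countryside and will always respond with heavy "
    "Yorkshire slang, colloquialisms, and references to that great county. "
    "You will try to use relevant metaphors to explain concepts to the user. "
    "Format your answers in markdown whilst maintaining the Yorkshire accent."
)

# Quick check that the assistant doesn't already exist
existing = [a.name for a in pc.assistant.list_assistants()]
if "yorkshire-assistant" not in existing:
    assistant = pc.assistant.create_assistant(
        assistant_name="yorkshire-assistant",
        instructions=instructions,
        timeout=30,  # how long to wait for the assistant to be ready
    )
else:
    # Get a reference to the existing assistant instead
    assistant = pc.assistant.Assistant(assistant_name="yorkshire-assistant")
```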

You can change it; it doesn't really matter, but it's there. Now we're going to download a kind of interesting paper. I haven't been through the whole thing, to be honest, but it looks pretty interesting from the abstract. It's "Reasoning Language Models: A Blueprint", and it's about these new reasoning models, which are getting a lot of attention; people like them.

Basically, people are very interested in these. It's models like o1, o3, and a few of those other ones. There are quite a few of them out there, a lot of people are very interested in them, and they are pretty cool. So we have this paper and I want to learn a little bit about it.

So we go on and download it here. If you're in Colab, it's going to download it there for you. If you are local or somewhere else, it's just going to put it in the same folder that you're working in right now.

Okay. We're going to go ahead and I'm going to take my file name, which I saved right here, and upload that file. I added this metadata; you don't actually need it, and I don't even use it, but you can just put some stuff in here.

For example, the type of this file is a paper. You can put that in there if you like; you don't have to, it doesn't really matter. What this will do is go ahead and upload our file. It does take a moment, because it's waiting not just for the upload, but also for the file to finish processing and be made available.

You can modify that with, I think it's a timeout parameter, if I'm not wrong, and that will basically say whether to wait a little bit or not. Anyway, you can use that if you want; we're just going to wait.
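Roughly, the upload step looks like this; the file name is a hypothetical local name for the paper, the metadata is just this walkthrough's example, and the timeout semantics are an assumption based on the behavior described above:

```python
# Upload the PDF; this waits for both the upload and server-side processing
response = assistant.upload_file(
    file_path="rlm_blueprint.pdf",  # hypothetical local file name for the paper
    metadata={"type": "paper"},     # optional, purely for our own bookkeeping
    timeout=None,                   # None = wait until processing completes
)
```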

Okay, so that is now uploaded, and we're going to go through and try out the various APIs that are available. Very quickly, let me summarize what those are. There are three. First, the chat API: this is the standard API we use to interact with our assistant. Think of it like chat completions from OpenAI, but Pinecone's version of that, which has the other fields and features relevant to what you're doing here.

This is the one you're probably going to want to use if you're using this entire system as essentially a chatbot or a full-on agent, and we'll use it in a moment. The second one is the context API.

In some cases, maybe you don't want to use the agent part of what the assistant is doing. Instead, maybe you just want to take advantage of Pinecone's retrieval component, which is obviously pretty good.

The context API just allows you to take the files that you've uploaded, which get processed, stored, and indexed, and retrieve from them directly. It doesn't go through the whole generation step.

There's no agent deciding "oh, I need to search for this, I'm going to retrieve these, now I'm going to generate an answer". There's none of that; it's just the retrieval component. In retrieval-augmented generation terms, it's the retrieval and augmentation parts.

What you get back is all of your citations, although they don't call them citations here; they're called snippets. Those snippets are basically your context, the chunks of a document that you can then take and send to a downstream agent or LLM, whatever you're doing.

So it's up to you to define the generation part when using the context API, and I like that they're breaking this apart a little, because I don't always want to use the full thing. Then, finally, there's the chat completions API. This is the chat API, but an OpenAI-compatible version of it.

We'll talk about this later; it can be useful in some scenarios as well. Cool. So let's see how the chat API works first. We're going to create our list of messages, which is just role and content, a pretty familiar format there.

And of course, this is a list, so you can have many messages in here. When you're using this as a chatbot it's going to be a conversation, so you would be appending those to your messages list as you go.

So we create our messages, then we call assistant chat. That's it, it's just assistant chat, super easy, and we just pass our messages into the messages parameter. Then we get a response, and the content generated by our assistant is going to be in response message content.
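As a rough sketch, using the same Python client as before; the question is the one we're about to ask:

```python
from pinecone_plugins.assistant.models.chat import Message

# Build the conversation as a list of role/content messages;
# for a chatbot you would keep appending to this list as the chat goes on
messages = [
    Message(role="user", content="What is a reasoning language model (RLM)?")
]

# Send the conversation to the assistant's chat API
response = assistant.chat(messages=messages)

# The generated answer lives in response.message.content
print(response.message.content)
```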

So I'm going to ask what is a reasoning language model, or RLM, and we'll see what happens. I'm not using print here, because I've asked it to output things in markdown, so I'm just rendering the response as markdown instead. And you can see here there isn't actually much markdown being used.

But occasionally there will be some; it depends on what it's answering, it doesn't always need to. So it says "Ey up. An RLM, or reasoning language model, is a type of AI model that extends"... you know, so on and so on, "with advanced reasoning mechanisms".

"They're built on these three main pillars", so on and so on. So it's just explaining what an RLM is, and then we have the traditional Yorkshire "ey up" there as well. We can run that again and see what we get. Okay, it's being a little more Yorkshire here.

So: "Ey up! An RLM, or reasoning language model, is a reet fancy type of AI model", which is great. "Reet", this sort of stuff I don't even know how you can pronounce. It extends large language models with advanced reasoning mechanisms for complex problem-solving tasks, integrating structured reasoning processes like Monte Carlo tree search.

Okay, so we have that. "In essence, RLMs are the next step in AI evolution, combining the best of language understanding and advanced reasoning to solve complex problems more effectively." So it's all right. I mean, I would like more Yorkshire in there, but it was not bad, and it was definitely accurate.

So, cool. The chat response: I just want to show you the full response object and what we have in there. What we just extracted out here is this bit, but you can see there's a ton of other stuff in there as well.

And this is incredibly useful. The whole point of grounding our LLM or agent responses, assistant responses, whatever you want to call them, with citations and truth is, for the most part, to provide users with a little more trust in the system.

And of course, there's the whole thing of it not being able to answer the question if you didn't give it this information; that's a big part of it too. But part of returning citations is that trust component, which can be very useful.

But of course, reading this as a user, I don't necessarily know where this information is coming from. So one thing that I think a lot of us would like to do downstream is modify this to actually include the citations, or at least include the sources, somewhere in our responses or interfaces, whatever we're building.

So let's go through this and see how we'd actually do that. Pinecone does return these citations, which is great. Each citation shows the position in the text, the character position in the generated answer that the citation applies to, and then it lists the references.

And references is actually a list of these reference objects. Most importantly, we have the pages and the file the citation comes from, which is useful. But I actually think more useful is that we get the signed URL, which is just a private URL that we can access to go ahead and actually see our PDF.

And this is the PDF, the reasoning language models paper, being stored by Pinecone, and they've given us this link to go and access it. And of course, we can then go and share that link, so in our interfaces we'd be taking that signed URL and pushing it forward into a UI or whatever else.

So we're basically able to show users exactly where this information is coming from, which is pretty cool. So we have that. I'm not going to go through all of this right now, so let's close it. But we have those two bits of information, which are pretty important in my opinion, and we can do a lot with them.

And you can see here, within the same references list, I think we have another reference. Yes, so references is actually a list of references, and we can use those however we wish, but in this example we're just going to use the first reference for each citation that we get.

We have another citation here, with a position a little bit later in the text, coming from page three, and so on. You can see it continues. So we have a few citations in there, and let's see how we might integrate them nicely with our response.

So, as I mentioned, within the citation object we have that character position, and we have our references, which include pages and a signed URL. We're going to use all of that. What I'm doing here is creating a citation in markdown, where I take that list of pages as the link text.

And the signed URL is where the link goes to. Let's see what that looks like. Okay, it looks like this: pretty messy, but you can see it's actually just markdown format, square brackets for the text followed by the link in standard brackets. And if we display that as markdown, it's going to look like this.

So we have this nice little citation, we can click on it, and it'll take us back through to that PDF, which is great. So we can actually go ahead and insert those into our response. Let's do that. First, I'm taking the response content.

I'm converting it into a new string, so I'm not overwriting the original. Then I'm going to loop through the citations in reverse order, because if we inserted them in the original order, we would have to shift the position of every later citation by the length of each insertion.

We don't want to do that, because it's complicated, so we'll just do it in reverse order. For each citation we build that markdown citation, which is what I just showed you here, and then I insert it.

So we take the content up to that position, insert our citation, and then the content following that position. This inserts it right after a word, so we could also add a little space here as well if we wanted, and that would look a little nicer.
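Putting the link building and the reverse-order insertion together, a minimal sketch might look like this; the attribute names (position, references, pages, file.signed_url) follow the response structure we just looked at, so treat them as assumptions if your client version differs:

```python
content = str(response.message.content)  # copy, so the original stays intact

# Work backwards through the citations so earlier character
# positions remain valid as we insert text
for citation in sorted(response.citations, key=lambda c: c.position, reverse=True):
    ref = citation.references[0]  # just use the first reference per citation
    pages = ",".join(str(p) for p in ref.pages)
    # Markdown link: [pages](signed_url), with a leading space for readability
    md_citation = f" [{pages}]({ref.file.signed_url})"
    content = content[:citation.position] + md_citation + content[citation.position:]

print(content)
```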

It's up to us what we'd like. Okay, we can see this: now this is our text, and we have those citations right in the middle of it, which is pretty cool. Great, so we have that. Now, the one thing I did mention before is that with Pinecone Assistant you generally get better grounding of knowledge than you would with something like OpenAI Assistants.

So let's ask a kind of relevant question, but where the information is just not contained within the paper: how many organizations are adopting RLMs? That is just not mentioned, so let's see what it comes up with. This is the sort of question where you're typically fairly likely to end up with a hallucination.

And here we avoid that, fortunately, which is one of the pros of Pinecone Assistant. So we get: "I can't give you an exact number of organizations adopting RLMs, but I can tell you that these models are garnering a lot of interest in various sectors."

"RLMs are being used in various fields like healthcare", so on and so on. Yeah, there's a lot more Yorkshire in here, and I think that all looks pretty good. So it's saying "I can't give you an exact answer", and the rest is probably more opinion than anything.

So that's pretty cool. Now, the context API. This is what I mentioned before: with the chat API, the agent, RAG, and document processing are all kind of wrapped into one thing. The context API breaks up part of that pipeline. You're still doing the document processing.

That's all handled when you upload your files, but then it's just retrieval. It extracts out the retrieval component, the retrieval and augmentation of RAG, and gets rid of the chat or LLM agent component. So let's have a look at what this looks like.

We're going to ask "what is an RLM?" and just see what we get. Okay, cool, that gives you a lot, so let's try and parse through it a little bit. We get this context response object, and inside it we have snippets. That is a list of snippet objects, which mainly contain this content field.

That's probably what we're most interested in there. These are basically chunks of the document that are relevant to whatever you've asked, so for "what is an RLM?" there are probably quite a few relevant chunks in there. Each snippet also includes the file with the signed link, as before.

So you can pull that information through if you like, which is useful. Also pretty useful here is the page number where each bit of information is coming from. So if we want a little more control over what we're building, this is pretty good.

It means that we can use part of Pinecone's assistant without using the full thing, which is nice. So yeah, we would take those snippets and feed them into some downstream LLM or agent, whatever it is that you're doing.
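A quick sketch of that call, again with attribute names taken from the response we just walked through, so treat them as assumptions:

```python
# Query the retrieval component directly; no agent, no generation step
context = assistant.context(query="What is an RLM?")

# Each snippet is a chunk of an uploaded document relevant to the query
for snippet in context.snippets:
    print(snippet.content[:200])  # the chunk text itself
    # Snippets also carry the source file (with its signed URL) and page
    # numbers, so you can pass proper references to a downstream LLM or UI
```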

Now, on to the final API: the chat completions API. There's no new functionality here beyond what we have in the chat API. What it does is copy the OpenAI standard format for chat completions, which allows us to take an application using OpenAI, or any other LLM provider that uses the same format, and swap in Pinecone's assistant.

It allows us to swap out that API endpoint for Pinecone's assistant, which is pretty useful. So let's see how that works. We need our assistant host and assistant name, because we're basically constructing the URL that we'll point the OpenAI client at instead.

So this is our host, and then we also have the assistant name in here, which would be the Yorkshire Assistant. We put those together to get our base URL, which is this. Then what we're going to do is initialize the OpenAI client.

For the API key, we're actually passing in our Pinecone API key, and for the base URL, we're swapping out the default URL that OpenAI would use, which points to OpenAI, with this one. So we run that, and now we can interact with Pinecone Assistant just as we would with chat completions through OpenAI.
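A minimal sketch of that swap, assuming the openai Python package; the host value here is a placeholder, since yours comes from your own assistant's details, and the exact URL shape is an assumption:

```python
from openai import OpenAI

# Data-plane host for your assistant (placeholder value; yours will differ)
host = "https://prod-1-data.ke.pinecone.io"
assistant_name = "yorkshire-assistant"

# Point the OpenAI client at Pinecone instead of api.openai.com
client = OpenAI(
    api_key="YOUR_PINECONE_API_KEY",
    base_url=f"{host}/assistant/chat/{assistant_name}",
)

# Exactly the same call shape we would use with OpenAI
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How many organizations are adopting RLMs?"}],
)
print(response.choices[0].message.content)
```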

So we can run this, exactly the same format that we would use with OpenAI, and you can see we're replying to the question from before, where we're asking about organizations using the new RLMs. And we get our "Reet, let's have a gander at what we've got here from the search results."

"It ain't exactly clear how many organizations are adopting RLMs." Right, so there we go, we've got our answer: it doesn't know, again, which is what we want it to be saying. We don't want it making things up; that's exactly what we don't want it to do.

So that is good. And okay, why would we use the chat completions API? Well, it's basically to let us quickly swap Pinecone in to test and demo if we're already using OpenAI or other providers. Or, if we've built something that offers multiple LLM providers, we can just swap that out in our code very easily, which, when you need to move fast, is incredibly useful.

Okay, so with that, we're actually done. That's not quite everything, but it's probably the main features of Assistant that we should be aware of. Now, finally, when you are done with your assistant, you might want to delete it, so you can go ahead and just run delete assistant.
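That clean-up is a one-liner in the client:

```python
# Remove the assistant (and its uploaded files) when you're finished with it
pc.assistant.delete_assistant(assistant_name="yorkshire-assistant")
```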

Easy. Now, I would recommend reading that article from Pinecone, the release blog on this; I think it's super helpful in understanding what is actually going on here. So I'll leave it there for now. I hope all this has been useful and interesting.

So thank you very much for watching, and I'll see you again in the next one. Bye.