FIRST Look at Pinecone Serverless!

Today, we're going to be taking a look at the new Pinecone serverless. Now, serverless is a complete redesign of the Pinecone Vector Database, and it comes with much more flexibility, scalability, and a huge reduction in costs for the vast majority of use cases on Pinecone. So just to point out the cost savings of this, I want to take a look at the pricing calculator.

So if I look at a very typical use case, all right, so I'm on the Pinecone website, and I come down to, they explain everything here. The pricing is like completely different, so you're not paying for like a pod now. Obviously, you're on serverless, so there is no longer any such thing as pods, but instead, you're paying based on the amount that you're storing and the amount that you're querying.

So you have a separation between storage and queries, which means you can store a ton of stuff, and you can pay very little, because that's now on storage-optimized hardware rather than compute-optimized hardware. So if we come down here, we'll go to, you know, the RAG use case, it's probably the most common.

Zoom in a little bit. And yeah, if I go, okay, ada-002 embeddings, like 5 million records is a lot for most RAG use cases. Honestly, I think you're probably gonna be using less. But anyway, let's just leave 5 million for now. Queries per month, so that's quite a lot, 260,000 queries a month.

Again, it depends on your use case, but I think most of the things that I have built at least are gonna go nowhere near that. And then writes per month. So, you know, how many new vectors, right? So how many new vectors am I going to write to the database every month?

Let's say 100,000, okay? Metadata size is pretty big. It depends on how you're structuring everything. And then namespaces, again, that's gonna depend. If you have a lot of different users, for example, if it's a user-facing app, you will probably have quite a lot of namespaces, but it depends, all right?

So with that, it's $21 a month. For this, in the pod-based Pinecone, you'd be hitting $70 a month. Now, this is a large number. For the majority of use cases, you're probably gonna be looking at like 500,000 maybe, maybe a million, you know, it varies a lot, right? Depending on your use case.

Now, in the past, for 500,000 vectors on the pod-based Pinecone, you'd just have to pay for a pod, like P1 or S1, and that's gonna cost $70, just every month, right? That's how much you're paying. Now, okay, it's $6, right? That's an insane cost saving. If, you know, if you're doing fewer queries per month, which is fairly likely for a lot of users, I think it goes even lower.

Now, if we decrease the number of namespaces, let's say, worst case scenario, you just have one namespace, it goes up a little bit. But, you know, it's still $10 compared to the $70 that we would have had before, which is pretty good. Now, with those cost savings covered, let's take a look at how we'd actually use the new Pinecone serverless via the Python client.
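To put the quoted prices side by side, here is a small sketch of the savings arithmetic. The dollar figures are the ones read off the pricing calculator above; the scenario labels are one reading of the walkthrough, and `percent_saved` is a hypothetical helper, not part of any Pinecone tooling.

```python
# Monthly prices (USD) as quoted from the pricing calculator in the walkthrough.
POD_MONTHLY = 70.0
serverless_quotes = {
    "5M records, many namespaces": 21.0,
    "500K records, many namespaces": 6.0,
    "500K records, one namespace": 10.0,
}


def percent_saved(old: float, new: float) -> float:
    """Percentage reduction when moving from the old price to the new one."""
    return (old - new) / old * 100.0


for scenario, price in serverless_quotes.items():
    saved = percent_saved(POD_MONTHLY, price)
    print(f"{scenario}: ${price:.0f} vs ${POD_MONTHLY:.0f} ({saved:.0f}% saved)")
```

These numbers only reflect the calculator at the time of recording, so actual pricing may differ.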

So I'm gonna come over to the examples of Pinecone, and I'm just gonna do, we can do semantic search for now. Okay, so semantic search, I will open this in Colab, and I'll come to here. Now, first thing we're gonna need to do is just install everything. The installs are gonna be slightly different by the time you see this, hopefully.

So you should see 3.0.0 for the Pinecone client, and 0.6 for Pinecone datasets, rather than what's shown here. So I'm installing those, and then we're gonna come down here and just download a dataset. Now, the reason that we're using this dataset, Pinecone datasets, is because we already have the vectors created for us.

So we don't need to go and spend time creating the embeddings. So it's a lot quicker. And then once that is downloaded, I'm gonna print out length. So I've just taken a slice of the dataset, like 80,000 records there, and yeah, it's super quick. Okay, and then we'll come down to here.

We're gonna decide whether we want to use serverless or the pod-based approach. So for the, you know, we can do both. Okay, so with the new Python client, it supports both. If you wanna use pods, you'd set that to false. Otherwise, we go with true. I'm also gonna use true.

And then we have our API key and environment variables. So for serverless, we don't need the environment anymore. So we can just remove that. Instead, you know, there's the region, which is basically the same thing, but it just doesn't include the cloud name, which we have here instead. So I'm gonna go over to my Pinecone project here.

I'm going to go to API keys, and I'm gonna take an API key, okay? So this does need to be a serverless project. Right now, with serverless, there is not a free tier, as we have with the pod-based architecture in Pinecone. Instead, there is currently a, like $100 that you can claim and just use serverless for that.

And obviously, if you're using that, it's gonna last you a pretty long time, given the prices I just showed you. But there is a free tier for Pinecone serverless coming. So that is coming, it's just not quite there yet. Now, we are gonna come down to here. I'm gonna put in my API key here, and then I'm gonna come over to here.

So I'm not using the pod spec here, I'm using serverless spec. So this is a new object we have that just defines your, basically the specification of your, the configuration of your index. I am using serverless, of course, so I'm using this one. And we specified cloud and the region.
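As a rough sketch of the serverless-versus-pod choice described above: serverless needs a cloud and a region, while pods need an environment. `spec_settings` is a hypothetical helper, and the default values (`aws`, `us-west-2`, `gcp-starter`) are placeholder assumptions, since the transcript does not name them. The comments at the end show how the settings would be handed to the client's `ServerlessSpec` and `PodSpec` objects.

```python
def spec_settings(use_serverless: bool,
                  cloud: str = "aws",                  # assumed placeholder
                  region: str = "us-west-2",           # assumed placeholder
                  environment: str = "gcp-starter") -> dict:  # assumed placeholder
    """Return the keyword arguments for the index spec.

    Serverless is described by a cloud plus a region; pod-based indexes
    are described by an environment instead.
    """
    if use_serverless:
        return {"cloud": cloud, "region": region}
    return {"environment": environment}


use_serverless = True
settings = spec_settings(use_serverless)
print(settings)

# With the Pinecone client installed, the settings would be passed on roughly as:
#   from pinecone import ServerlessSpec, PodSpec
#   spec = ServerlessSpec(**settings) if use_serverless else PodSpec(**settings)
```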

Right now, this is the only one that is currently supported, as far as I'm aware. So you will want to use the same, but of course, more are coming. Also new, we're gonna have GCP and Azure pretty soon as well. Cool, so run that. Let's create an index. Slightly different again here.

So rather than just listing the indexes, we need to go through them, because when we list indexes, we get a lot more information than we used to with the old client. So we just need to do this to return the indexes, or the index names. If you do have indexes, you can also use this, I believe.
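The "go through the listing and keep only the names" step might look like the sketch below. It assumes each listed index behaves like a record with a `"name"` field; `example_listing` is a made-up stand-in for what the client returns, and `index_names` is a hypothetical helper.

```python
def index_names(listing) -> list:
    """Pull just the names out of the richer per-index records."""
    return [info["name"] for info in listing]


# Stand-in for the client's listing: one record per index, carrying
# more than the name alone. Names and fields here are illustrative.
example_listing = [
    {"name": "semantic-search", "dimension": 384, "metric": "cosine"},
    {"name": "another-index", "dimension": 1536, "metric": "cosine"},
]

print(index_names(example_listing))

# With a real client object `pc`, the call would be roughly:
#   existing = index_names(pc.list_indexes())
```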

Okay, so after we've done that, if I run this, let me run it. I'm gonna check if the index already exists. If it doesn't, I'm gonna create one. The spec here is the serverless spec that you saw before. And then we come down to here, and we're just gonna have a look, okay, is the index being created?
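The check, create, then wait flow can be sketched as follows. `ensure_index` is a hypothetical helper that takes any client object exposing `list_indexes`, `create_index`, and `describe_index`; the `cosine` metric and the index name are assumptions, and `FakeClient` is a stand-in so the sketch runs without an API key.

```python
import time
from types import SimpleNamespace


def ensure_index(pc, name, dimension, spec, metric="cosine", poll_seconds=1.0):
    """Create the index only if it is missing, then wait until it is ready."""
    existing = [info["name"] for info in pc.list_indexes()]
    if name not in existing:
        pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    # Poll the index description until it reports ready.
    while not pc.describe_index(name).status["ready"]:
        time.sleep(poll_seconds)


class FakeClient:
    """In-memory stand-in for the client, for illustration only."""

    def __init__(self):
        self.created = {}

    def list_indexes(self):
        return [{"name": n} for n in self.created]

    def create_index(self, name, dimension, metric, spec):
        self.created[name] = {"dimension": dimension, "metric": metric, "spec": spec}

    def describe_index(self, name):
        return SimpleNamespace(status={"ready": name in self.created})


pc = FakeClient()
spec = {"cloud": "aws", "region": "us-west-2"}  # placeholder for the serverless spec
ensure_index(pc, "semantic-search", dimension=384, spec=spec)
ensure_index(pc, "semantic-search", dimension=384, spec=spec)  # already exists, no second create
print(list(pc.created))
```

Checking first keeps the notebook safe to rerun, since the second call finds the index and skips creation.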

Once it has been, we're gonna describe it. I literally just created mine. So this now shows as having some vectors in there. You should see zero if this is your first time running through the notebook. Then what I'm gonna do is run this. So I'm going to just upsert all of my vectors.
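One common way to upsert a large set of vectors is in fixed-size batches. The sketch below is not necessarily the call the notebook uses; `batches` and `upsert_all` are hypothetical helpers, the batch size of 100 is an assumption, and `FakeIndex` stands in for a real index object with an `upsert(vectors=...)` method.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upsert_all(index, vectors, batch_size=100):
    """Upsert every vector in batches and return how many were sent."""
    total = 0
    for batch in batches(vectors, batch_size):
        index.upsert(vectors=batch)
        total += len(batch)
    return total


class FakeIndex:
    """Records batch sizes instead of talking to a real index."""

    def __init__(self):
        self.calls = []

    def upsert(self, vectors):
        self.calls.append(len(vectors))


# (id, values, metadata) tuples with placeholder 384-dimensional vectors.
vectors = [(str(i), [0.0] * 384, {"text": f"question {i}"}) for i in range(250)]
index = FakeIndex()
sent = upsert_all(index, vectors, batch_size=100)
print(sent, index.calls)
```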

That will take a moment to run. Now, while that is running, let's have a look at, let's have a look at how much money we'd be saving on this compared to the pod-based approach. So we have, what is it, like 80,000 vectors, I think. So we can do 80,000 vectors.

Let's say I'm gonna get, be optimistic and say, I'm gonna get 100,000 queries a month, which I don't think I will. And let's say I'm gonna write another 20,000 vectors a month. I'm gonna have one namespace on this, so it's worst case scenario. And my vector dimensionality is actually, I think it's 384.

Yeah, 384. So that's gonna cost me a grand total of $3.69 a month, which is not too bad. And even better when you consider we have like a $100 credit. So that's not bad. Now, I'll fast forward to when our upload is complete. Okay, so that has finished and we can go ahead and just make a query.

Okay, let's see what we get. We should get basically the same results as what we've had before with the pod-based approach. So we can say, which city has the highest population in the world? We're just doing a semantic search here. So we're gonna just see the results that we get.

Let's see. Okay, it says, I think I format it a little nicer here. Yeah, what's the world's largest city, biggest city, so on and so on. Okay, so these are Quora questions that we're searching across here. And then I can modify the language a bit. I can say, which metropolis has the highest number of people and just see what it says.
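The query-and-format step might be sketched like this. `search` is a hypothetical helper that embeds the question, asks the index for the closest matches, and returns score and text pairs. `FakeIndex`, `fake_encode`, and the scores are made-up stand-ins so the sketch runs offline; it assumes each match carries a `score` and a `text` metadata field.

```python
def search(index, encode, text, top_k=5):
    """Embed the query text, query the index, and return (score, text) pairs."""
    xq = encode(text)
    response = index.query(vector=xq, top_k=top_k, include_metadata=True)
    return [(round(m["score"], 2), m["metadata"]["text"]) for m in response["matches"]]


class FakeIndex:
    """Returns canned matches; the scores are invented for illustration."""

    def query(self, vector, top_k, include_metadata):
        matches = [
            {"score": 0.8712, "metadata": {"text": "What's the world's largest city?"}},
            {"score": 0.8345, "metadata": {"text": "What is the biggest city?"}},
        ]
        return {"matches": matches[:top_k]}


def fake_encode(text):
    """Placeholder for a real 384-dimensional embedding model."""
    return [0.0] * 384


results = search(FakeIndex(), fake_encode,
                 "which metropolis has the highest number of people?", top_k=2)
for score, question in results:
    print(f"{score}: {question}")
```

The query must be embedded with the same model that produced the stored vectors, otherwise the similarity scores are not meaningful.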

And yeah, again, we get what is the biggest city and then what is the world's largest city. So yeah, semantic search, everything checks out there. So, I mean, that all looks good. Once you're finished with that, we just wanna save resources and just delete that index. So we do that.

And we're now done. So that's a very fast introduction to Pinecone serverless. It's very exciting. It's gonna save people a ton of money. It is going to make vector search a lot more scalable, accessible, and we're gonna see a lot of really cool performance upgrades. So for now, I'm gonna leave it there.

I hope all of this has been interesting and useful. So thank you very much for watching and I will see you again in the next one. Bye.
