
FIRST Look at Pinecone Serverless!



00:00:00.000 | Today, we're going to be taking a look
00:00:01.880 | at the new Pinecone serverless.
00:00:04.120 | Now, serverless is a complete redesign
00:00:07.200 | of the Pinecone Vector Database,
00:00:09.480 | and it comes with much more flexibility,
00:00:14.000 | scalability, and a huge reduction in costs
00:00:18.040 | for the vast majority of use cases on Pinecone.
00:00:21.060 | So just to point out the cost savings of this,
00:00:25.440 | I want to take a look at the pricing calculator.
00:00:28.760 | So if I look at a very typical use case,
00:00:32.000 | all right, so I'm on the Pinecone website,
00:00:35.560 | and I come down to where they explain everything here.
00:00:39.640 | The pricing is like completely different,
00:00:41.460 | so you're not paying for like a pod now.
00:00:44.400 | Obviously, you're on serverless,
00:00:45.880 | so there is no longer any such thing as pods,
00:00:49.400 | but instead, you're paying based on the amount
00:00:53.480 | that you're storing and the amount that you're querying.
00:00:55.880 | So you have a separation between storage and queries,
00:00:59.600 | which means you can store a ton of stuff,
00:01:01.520 | and you can pay very little,
00:01:03.900 | because that's now on storage-optimized hardware
00:01:06.880 | rather than compute-optimized hardware.
00:01:09.680 | So if we come down here, we'll go to,
00:01:12.200 | you know, the RAG use case,
00:01:13.280 | it's probably the most common.
00:01:16.040 | Zoom in a little bit.
00:01:18.080 | And yeah, if I go, okay,
00:01:20.640 | ada-002 embeddings,
00:01:23.480 | like 5 million records is a lot for most RAG use cases.
00:01:28.480 | Honestly, I think you're probably gonna be using less.
00:01:32.160 | But anyway, let's just leave 5 million for now.
00:01:35.400 | Queries per month, so that's quite a lot,
00:01:38.520 | 260,000 queries a month.
00:01:41.360 | Again, it depends on your use case,
00:01:43.680 | but I think most of the things that I have built
00:01:47.120 | at least are gonna go nowhere near that.
00:01:49.720 | And then writes per month.
00:01:52.960 | So, you know, how many new vectors, right?
00:01:55.440 | So how many new vectors am I going to write
00:01:58.000 | to the database every month?
00:01:59.880 | Let's say 100,000, okay?
00:02:02.280 | Metadata size is pretty big.
00:02:05.320 | It depends on how you're structuring everything.
00:02:08.320 | And then namespaces, again, that's gonna depend.
00:02:10.280 | If you have a lot of different users,
00:02:12.760 | for example, if it's a user-facing app,
00:02:15.160 | you will probably have quite a lot of namespaces,
00:02:17.520 | but it depends, all right?
00:02:19.440 | So with that, it's $21 a month.
00:02:23.200 | For this, in the pod-based Pinecone,
00:02:25.880 | you'd be hitting $70 a month.
00:02:29.720 | Now, this is a large number.
00:02:31.200 | For the majority of use cases,
00:02:32.520 | you're probably gonna be looking at like 500,000 maybe,
00:02:37.240 | maybe a million, you know, it varies a lot, right?
00:02:40.600 | Depending on your use case.
00:02:41.840 | Now, in the past, for 500,000 vectors
00:02:46.840 | on the pod-based Pinecone,
00:02:49.240 | you'd just have to pay for a pod, like a p1 or s1,
00:02:52.320 | and that's gonna cost $70, just every month, right?
00:02:56.960 | That's how much you're paying.
00:02:57.920 | Now, okay, $6, right?
00:03:01.680 | That's an insane cost saving.
00:03:05.440 | If, you know, if you're doing less queries per month,
00:03:07.880 | which is fairly likely for a lot of users,
00:03:09.920 | I think it goes even lower.
00:03:12.520 | Now, if we decrease the number of namespaces,
00:03:14.600 | let's say, worst case scenario,
00:03:16.480 | you just have one namespace, it goes up a little bit.
00:03:19.480 | But, you know, it's still $10 compared to the $70
00:03:23.560 | that we would have had before, which is pretty good.
00:03:27.960 | Now, with those cost savings in mind, let's take a look
00:03:31.280 | at how we'd actually use the new Pinecone serverless
00:03:34.880 | via the Python client.
00:03:36.760 | So I'm gonna come over to the Pinecone examples,
00:03:41.000 | and we can do semantic search for now.
00:03:45.120 | Okay, so semantic search, I will open this in Colab,
00:03:49.320 | and I'll come to here.
00:03:51.040 | Now, first thing we're gonna need to do
00:03:53.200 | is just install everything.
00:03:54.520 | The installs are gonna be slightly different
00:03:56.680 | by the time you see this, hopefully.
00:03:58.520 | So you should see 3.0.0 for the Pinecone client,
00:04:02.200 | and 0.6 rather than this
00:04:05.800 | here for Pinecone datasets.
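Roughly, that install cell looks like the following; the exact version pins are an assumption and may have moved on by the time you run it:

```python
# Notebook install cell (run in Colab). Version pins are assumptions and may
# be newer by the time you run this.
!pip install -qU "pinecone-client>=3.0.0" "pinecone-datasets>=0.6"
```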
00:04:10.800 | So I'm installing those,
00:04:13.560 | and then we're gonna come down here
00:04:15.360 | and just download a dataset.
00:04:17.400 | Now, the reason that we're using this dataset,
00:04:20.200 | Pinecone datasets, is because we already have
00:04:23.240 | the vectors created for us.
00:04:25.000 | So we don't need to go and spend time
00:04:27.600 | creating the embeddings.
00:04:29.760 | So it's a lot quicker.
00:04:30.880 | And then once that is downloaded,
00:04:34.440 | I'm gonna print out length.
00:04:35.520 | So I've just taken a slice of the dataset,
00:04:37.720 | like 80,000 records there, and yeah, it's super quick.
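As a sketch, the download cell looks something like this; the dataset name (a public Pinecone dataset of Quora questions with 384-dimensional MiniLM embeddings) is my assumption about what's on screen:

```python
from pinecone_datasets import load_dataset

# Load a prebuilt dataset that already ships with precomputed embeddings,
# so we don't have to build them ourselves. The dataset name is an assumption:
# a public Pinecone dataset of Quora questions with 384-dim MiniLM vectors.
dataset = load_dataset("quora_all-MiniLM-L6-bm25")

# Keep a slice of roughly 80,000 records so the upsert later stays quick.
dataset.documents.drop(dataset.documents.index[80_000:], inplace=True)
print(len(dataset.documents))
```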
00:04:43.720 | Okay, and then we'll come down to here.
00:04:45.800 | We're gonna decide whether we want to use serverless
00:04:47.840 | or the pod-based approach.
00:04:50.120 | So, you know, we can do both.
00:04:54.360 | Okay, so with the new Python client, it supports both.
00:04:57.480 | If you wanna use pods, you'd set that to false.
00:04:59.280 | Otherwise, we go with true.
00:05:01.120 | I'm also gonna use true.
00:05:03.440 | And then we have our API keys and environment variables.
00:05:07.640 | So for serverless, we don't need the environment anymore.
00:05:12.760 | So we can just remove that.
00:05:14.840 | Instead, you know, there's the region,
00:05:17.320 | which is basically the same thing,
00:05:20.320 | but it just doesn't include the cloud name,
00:05:23.400 | which we have here instead.
00:05:25.560 | So I'm gonna go over to my Pinecone project here.
00:05:30.560 | I'm going to go to API keys,
00:05:33.080 | and I'm gonna take an API key, okay?
00:05:35.640 | So this does need to be a serverless project.
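A minimal sketch of that client setup, assuming the API key is read from an environment variable named PINECONE_API_KEY:

```python
import os
from pinecone import Pinecone

# The v3 client supports both architectures; flip this to False for pods.
use_serverless = True

# With serverless there's no "environment" setting any more -- just the API
# key (read here from an environment variable, name assumed), plus the cloud
# and region that go into the index spec further down.
api_key = os.environ.get("PINECONE_API_KEY") or "YOUR_API_KEY"

# Configure the client
pc = Pinecone(api_key=api_key)
```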
00:05:39.160 | Right now, with serverless, there is not a free tier,
00:05:44.160 | as we have with the pod-based architecture in Pinecone.
00:05:49.120 | Instead, there is currently like $100 in credits
00:05:54.120 | that you can claim and just use for serverless.
00:05:57.400 | And obviously, if you're using that,
00:05:59.120 | it's gonna last you a pretty long time,
00:06:01.080 | given the prices I just showed you.
00:06:03.240 | But there is a free tier
00:06:06.600 | for Pinecone serverless coming.
00:06:07.640 | So that is coming, it's just not quite there yet.
00:06:11.560 | Now, we are gonna come down to here.
00:06:14.040 | I'm gonna put in my API key here,
00:06:16.240 | and then I'm gonna come over to here.
00:06:18.600 | So I'm not using the pod spec here,
00:06:23.160 | I'm using serverless spec.
00:06:24.260 | So this is a new object that just defines,
00:06:28.840 | basically, the specification,
00:06:30.920 | the configuration of your index.
00:06:32.920 | I am using serverless, of course, so I'm using this one.
00:06:36.000 | And we specify the cloud and the region.
00:06:38.240 | Right now, this is the only one
00:06:40.200 | that is supported, as far as I'm aware.
00:06:43.160 | So you will want to use the same,
00:06:45.280 | but of course, more are coming.
00:06:47.160 | Also new, we're gonna have GCP
00:06:49.400 | and Azure pretty soon as well.
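A sketch of that spec object; the cloud/region values and the pod settings in the else branch are assumptions based on what's described here:

```python
from pinecone import ServerlessSpec, PodSpec

if use_serverless:
    # Serverless only needs the cloud and region. At the time of recording,
    # AWS us-west-2 was the only supported combination; GCP and Azure are
    # said to be on the way.
    spec = ServerlessSpec(cloud="aws", region="us-west-2")
else:
    # Pod-based indexes still take the old environment plus a pod type
    # (placeholder values here).
    spec = PodSpec(environment="us-west1-gcp", pod_type="p1.x1")
```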
00:06:51.280 | Cool, so run that.
00:06:53.440 | Let's create an index.
00:06:55.560 | Slightly different again here.
00:06:57.640 | So rather than just listing the indexes,
00:07:00.020 | we need to go through, because when we list index,
00:07:02.640 | we get a lot more information
00:07:03.760 | than we used to with the old client.
00:07:05.500 | So we just need to do this to return the indexes,
00:07:10.500 | or the index names.
00:07:13.400 | If you do have indexes, you can also use this, I believe.
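With the v3 client, that looks roughly like this:

```python
# list_indexes() in the v3 client returns full index descriptions rather than
# just a list of names, so pull the names out before checking membership.
existing_indexes = pc.list_indexes().names()

# Equivalently, you could iterate over the descriptions yourself:
# existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]
```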
00:07:17.700 | Okay, so after we've done that,
00:07:23.920 | let me run this.
00:07:25.380 | I'm gonna check if the index already exists.
00:07:28.920 | If it doesn't, I'm gonna create one.
00:07:31.040 | The spec here is the serverless spec that you saw before.
00:07:35.780 | And then we come down to here,
00:07:38.100 | and we're just gonna have a look,
00:07:40.140 | okay, is the index being created?
00:07:42.280 | Once it has been, we're gonna describe it.
00:07:46.340 | I literally just created mine.
00:07:48.740 | So this now shows as having some vectors in there.
00:07:53.740 | You should see zero if this is your first time
00:07:56.120 | running through the notebook.
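Putting those steps together, a rough version of the create-and-wait cell; the index name and metric here are assumptions:

```python
import time

index_name = "semantic-search-fast"  # index name is just an assumption

if index_name not in existing_indexes:
    # Dimension matches the 384-dim embeddings in the dataset; the metric is
    # an assumption (cosine is a common default for these models).
    pc.create_index(
        name=index_name,
        dimension=384,
        metric="cosine",
        spec=spec,  # the ServerlessSpec (or PodSpec) from earlier
    )
    # Wait until the index reports as ready before connecting to it
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)

index = pc.Index(index_name)
index.describe_index_stats()  # total_vector_count is 0 on a brand new index
```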
00:07:57.400 | Then what I'm gonna do is run this.
00:07:59.100 | So I'm going to just upsert all of my vectors.
00:08:02.620 | That will take a moment to run.
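A hedged sketch of that upsert loop; the column names follow the Pinecone datasets schema and are assumptions, and only the dense vectors plus a small metadata dict are sent:

```python
# A rough batched upsert. The column names ("id", "values", "blob") follow the
# Pinecone datasets schema and are assumptions on my part.
batch_size = 100
docs = dataset.documents
for start in range(0, len(docs), batch_size):
    chunk = docs.iloc[start:start + batch_size]
    vectors = [
        {
            "id": str(row["id"]),
            "values": [float(v) for v in row["values"]],
            "metadata": row["blob"],  # e.g. {"text": "..."} in this dataset
        }
        for _, row in chunk.iterrows()
    ]
    index.upsert(vectors=vectors)
```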
00:08:04.500 | Now, while that is running,
00:08:07.860 | let's have a look at how much money we'd be saving on this
00:08:12.500 | compared to the pod-based approach.
00:08:15.160 | So we have, what is it, like 80,000 vectors, I think.
00:08:20.000 | So we can do 80,000 vectors.
00:08:26.300 | Let's be optimistic and say
00:08:29.940 | I'm gonna get 100,000 queries a month,
00:08:32.660 | which I don't think I will.
00:08:34.740 | And let's say I'm gonna write another 20,000 vectors a month.
00:08:38.920 | I'm gonna have one namespace on this,
00:08:41.780 | so it's worst case scenario.
00:08:44.100 | And my vector dimensionality is actually,
00:08:46.300 | I think it's 384.
00:08:48.700 | Yeah, 384.
00:08:54.960 | So that's gonna cost me a grand total of $3.69 a month,
00:08:59.960 | which is not too bad.
00:09:04.760 | And even better when you consider
00:09:06.880 | we have like a $100 credit.
00:09:11.260 | So that's not bad.
00:09:12.680 | Now, I'll fast forward to when our upload is complete.
00:09:17.440 | Okay, so that has finished and we can go ahead
00:09:21.000 | and just make a query.
00:09:22.680 | Okay, let's see what we get.
00:09:23.920 | We should get basically the same results
00:09:25.560 | as what we've had before with the pod-based approach.
00:09:29.200 | So we can say, which city has the highest population
00:09:32.200 | in the world?
00:09:33.040 | We're just doing a semantic search here.
00:09:34.240 | So we're gonna just see the results that we get.
00:09:37.080 | Let's see.
00:09:38.760 | Okay, so it says, I think I formatted it a little nicer here.
00:09:43.760 | Yeah, what's the world's largest city,
00:09:45.200 | biggest city, so on and so on.
00:09:47.080 | Okay, so these are Quora questions
00:09:49.000 | that we're searching across here.
00:09:51.000 | And then I can modify the language a bit.
00:09:54.200 | I can say, which metropolis has the highest number of people
00:09:57.720 | and just see what it says.
00:09:59.760 | And yeah, again, we get what is the biggest city
00:10:02.840 | and then what is the world's largest city.
00:10:04.720 | So yeah, semantic search, everything checks out there.
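A sketch of those two queries; embedding the query text with all-MiniLM-L6-v2 (which matches the 384-dimensional stored vectors) is my assumption:

```python
from sentence_transformers import SentenceTransformer

# The stored vectors are 384-dimensional, which matches all-MiniLM-L6-v2, so
# that model is used here to embed queries -- an assumption on my part.
model = SentenceTransformer("all-MiniLM-L6-v2")

def search(query: str, top_k: int = 5):
    # Embed the query text and run a semantic search against the index,
    # printing the score and stored metadata for each match.
    xq = model.encode(query).tolist()
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)
    for match in res["matches"]:
        print(f"{match['score']:.3f}  {match['metadata']}")

search("which city has the highest population in the world?")
search("which metropolis has the highest number of people?")
```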
00:10:09.460 | So, I mean, that all looks good.
00:10:11.600 | Once you're finished with that,
00:10:13.520 | we just wanna save resources and just delete that index.
00:10:17.340 | So we do that.
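That cleanup step is just:

```python
# Clean up once you're done, so the index stops counting toward storage costs.
pc.delete_index(index_name)
```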
00:10:19.480 | And we're now done.
00:10:21.000 | So that's a very fast introduction to Pinecone serverless.
00:10:25.800 | It's very exciting.
00:10:26.640 | It's gonna save people a ton of money.
00:10:28.640 | It is going to make vector search
00:10:32.120 | a lot more scalable and accessible,
00:10:34.760 | and we're gonna see a lot of really cool
00:10:37.840 | performance upgrades.
00:10:39.000 | So for now, I'm gonna leave it there.
00:10:40.720 | I hope all of this has been interesting and useful.
00:10:42.660 | So thank you very much for watching
00:10:44.280 | and I will see you again in the next one.