Supercharge eCommerce Search: OpenAI's CLIP, BM25, and Python

Today we're going to be taking a look at actually quite a few things that we're going to apply in the scenario of a multimodal e-commerce search engine. Now when I say there's quite a few technologies involved here I say that because we're going to be covering something that's called hybrid search, which is a pretty recent thing So that's the idea of searching across both sparse vectors and dense vectors.

If that doesn't make sense, no problem. We're going to explain it in a moment We're also going to be taking a look at multimodality So that is where you have multiple modalities of data within the either within your query or within your search space So in our scenario, we're going to have both images and text which is a pretty typical scenario when it comes to e-commerce search and We're going to be actually mixing both of those.

So we're going to have like a hybrid Multimodal search which is I think pretty cool. And I think you know, there's a lot of cool ideas that could come from this So let's jump straight into it. Now what you can see on the screen right now is a screenshot from Amazon right so we can see multimodality.

It's definitely a thing here. So we have images we have text Probably a couple of the bits of information we can we can scrape them there But I want to focus on we have the titles and we have these images When we're searching for these things our queries might describe different parts of these images, right?

So in this top one, we have This is basically a keyword matching query, right? So someone is looking for the brand French connection They want jeans and they want them to be for men We would expect in the product description or in the actual product title. It's probably going to have all those keywords So in that case you might just do like a keyword search, right which is is what we would Typically refer to as a sparse vector search Right because with these keywords we can create these what called sparse vectors using things like TF IDF and BM 25 Right, but then on this next one, we have something a little bit different.

So we have some descriptive words here So we're saying okay faded worn-out looking Blue as well as descriptive and then jeans for men right jeans for men. That's fine There's probably a this probably okay as a keyword. It's probably okay as a keyword with these ones here, you know They're not really like okay blue might be in there in the in the description right, in fact almost definitely will be fine that's kind of like a mix both like it's descriptive and Visual but also it will probably be in the description But then faded worn-out looking, you know, maybe faded would be in there worn-out looking probably not so these are more descriptive and honestly if we're going to search with the more Descriptive like you can think of it as a more human like way of describing things Essentially a semantic search if we're going to use that we ideally want dense vectors okay, so you have like a transformer model like Bert or something on these lines and You take that and that will create like a vector in dense vector space Or if we're thinking about the multimodal side of things then maybe we don't use maybe we use something like clip Which is a multi modal model Basically clip can encode both images and text into the same space so you would create like a vector here and that would hopefully if you've you know also encoded your images be close to a Image vector that can represents the meaning behind that text if that makes sense if that's a little confusing I do have some videos on clip you can take a look at those.

I'll make sure there's a link somewhere around here So we kind of have like a mix of things here and then we kind of see that here like even more so so we have faded Blue maybe faded as a is a better example. So faded is very descriptive But French connection is like 100% that's a keyword match, right?

We want to compare keywords in that scenario a model like birds or clip probably won't capture that information You know this brand very easily so That kind of makes things more difficult. It's almost like we need a mix of both sparse and dense and You know, we can also see that here, right?

So we have these images these look kind of faded here also these ones here, right? So it'd be great if we can kind of encode those images with a model like clip But also at the same time if we say, you know in our users query They say American Eagle if I think that's a brand if it has American Eagle in there We also want to be considering that within our search So, you know We basically want to consider these different modalities and we also want to consider these different ways of searching both semantic based and also keyword base So how do we actually do that?

I can't already mentioned it, but you take your image here What we are going to do is encode that with a clip model to create a dense vector. Okay, and Then down here. We have our description what we're going to do with that is we're going to use bm25 bm25 To create a sparse vector.

All right, which is kind of different and I'll show you what they look like in a moment So we're going to create both of those and then users gonna come with their query, you know They're gonna search for like some like dark blue boss genes Maybe they'll put for men as well in there I don't know but essentially they're gonna come along this query Because we want to search for both across both the clip and the bm25 vectors We need to encode this query with both of those encoding methods, right?

So we take our query. It's gonna go Okay here and here One for clip and one for bm25. Okay, and then we sort of get like two vectors here what we need to do is we need to take them into a Vector database that can handle both of these so going to use pinecone for that there are not many vector databases that can handle this right now, so pinecone is is one of the few there and Within pinecone.

We're going to have these vectors are going to be in there, right? They're going to be sold in there and We're going to use pinecone to compare them okay, and what we we should hopefully see is that these the vectors are matched to this image here or this listing will be very close to the Vectors that we see in our users query and we would obviously return that listing laid this out very badly But we would return this so that's the general process Before we move on to code.

Let's just have a quick look at What was sparse vector was a dense vector just to give you a quick idea of what they might look like So at the top here, we have a sparse vector. The reason it's sparse is because the information within that vector is Very sparsely located right?

There's only a few non zero values within the vector Whereas, you know, the majority of the values are zeros. Okay, they're kind of like they don't have any meaning There's there that is what a sparse vector is For a dense vector. It's different so you can see here the majority of the vector Has information inside it the majority of the vector is non zero values.

There's also a difference in dimensionality So dimensionality of like this vector here, you know, it could be tens of thousands and you know Typically it is right so if we for example use a BERT tokenizer as part of process to create our sparse vector we would end up with a 30k Dimensional sparse vector, right?

But then doesn't take a lot of space to store all of this information because in reality you don't actually need to store all of It because it's mostly zeros. So what you would end up doing is all right, so we have a Point three here. You'd end up just creating like a dictionary or something along these lines Where you have okay in position or in index two, three, four five We have the value zero point three and then in the index over here is like ten or eleven or something We have zero point one.

So then it actually ends up becoming quite an efficient way to store these vectors Whereas you can't really do that with dense facts You just have to store them as is the typical dimensionality here it varies but it's a very typical embedding dimensionality for these numbers at the moment is 768 but you will see some others it kind of goes up to like think about the OPI embeddings The current order zero zero two model is like one five Three six, I think something along those lines and some of their older models actually use over ten thousand Embedding dimensions, but that that's really kind of like an edge case So that kind of explains what we're going to be looking at We'll just kind of covered a lot of the theory behind all this stuff But I think you know, it'll make a lot more sense if we actually go through the the code to create this So let's jump into that Okay, so I'm getting the notebook for this from the pinecone examples repo here in ecommerce search Yeah, and I'm just gonna go and open that in collab Okay, so This is a collab for this.

I will make sure there is a link to this notebook or to the github repo At the top of the video right now so first thing we come to is a pip install now if you notice that you have Transformers and sentient transformers at the top of the notebook.

That's a good indication We're probably going to need to use the GPU So you just head up to runtime change your runtime type and make sure you're using a GPU Then if you are running locally try and make sure you have a CUDA enabled GPU or if not You know you just deal with the slowness of CPU a little bit.

This isn't a huge data sets It will take a bit longer but not too long. Now. This should be Updated by the time I release this video. So actually rather than using this whole pip install You can actually just do this. So Pinecone client and we're going to be using gRPC, which is essentially just It's going to help us index our vectors more quickly But at the moment, this is actually still in beta.

So I'm using this Okay, cool they're installed now so come down to here we need to Essentially initialize our connections pinecone for that. We need to go to a pinecone dot IO. So I'm gonna head over there Now unless you use pinecone You will probably see something like this and also you might actually need to create an account Once you have create an account you will see something like this.

You need to head over to API keys You will have this default key You have your environment here Which will probably not be internal bait if you probably be like u.s. West one or u.s East one you need to take this so copy this and Put into your environment here and you also need to copy the value here.

So you just click here and Put that here. I've sold mine in a variable called your API key so I run that that will initialize the connection and Then we come down to actually creating our index now to create the index because we're going to be using this what's called a sparse dense index, which is it's essentially an index that has both the Sparse vectors and the dense vectors in one to actually use that we need to make sure we're using a few Set items in the create index call.

So the metric has to be dot product and for the pod type We have to either use s 1 or p 1 other than that. I think we're okay We can kind of change everything else now dimensionality because we're going to be using The clip for the dense embeddings.

We need to set up to the same dimensionality as clip, right? Which is 512 so dimension here is always referring to your dense vector dimension So we create that your index name you can call it whatever you want. It doesn't really matter. Okay, cool So we've just created our sparse dense enabled index and then we connect to it Now if you have followed any of these videos before if you're kind of aware pinecone I think you've probably mostly seen me use this pinecone index in its name.

You can use that as fine You know, we don't have to change it because we're using sparse dense or anything like that. You can use index it's just I'm going to be using GRPC index because the connection handling with this index is better and It's also faster when you're doing upsets, right?

So in reality the index behind this is still the same It's just the connection is through GRPC rather than rest. So we run that cool and What we're going to do is so we've got this the open fashion product images data set So this Kaggle data set it just has you can kind of see in the in the right up here all these fashion images We're going to be using that also has like descriptions and everything We're going to see in a moment, but we're using a subset of that that ashrack is created here So we're going to grab that.

It's currently stored on Hugging face datasets so you can basically you can go on hugging face dot co type this in and you'll be able to find the data Set that would just take a moment to download Cool, so we can kind of see what we have here So we have ID gender mass category or all these different things in terms of the text I think the main one we're focusing on is going to be the product display name I think we also do include a few others so the majority of a sparse vector will be created with this and Then the entirety of the dense vector is going to create with the image now because we're confusing image to clip to dance vector so What we're going to do is Click on this.

So we're going to create this meta data Data frame which is just going to contain Everything except from the image column. Okay, and the image column is we're going to be storing that in a list called images So now we can take a look at you know, one of the images we have this Index 900.

It's just kind of like a dress or a long top or something along those lines So yeah, we see that and then if we just take a look at our metadata So we've now created a data frame from this we have a lot of useful information there, right? So we have the gender What is it apparel?

We have accessories and stuff down here as well It's top shirts, you know all this different stuff basically, which is kind of useful I'm going to use that later But actually the most descriptive part here if I can if I can get to it is the product display name Okay, so it's not super descriptive but it has kind of like a short product description in there Which is what we're going to be relying on for our sparse vectors so What we're going to do says here We're going to using all of these metadata fields except from the ID and year to create our sparse vectors So within our sparse vector, we're going to have men apparel top wear shirts and a blue season We're not going to include the year Maybe you could in maybe that would make sense to in some cases, but in this case, I don't think so It's casual and we're going to have this now product spline name.

So as you're going to within that sparse vector We're going to have a ton of different things which is really useful for this sort of information. So let's go down We're going to create the sparse vectors first. So for now, I'm using this File here. This will probably get updated pretty soon So when it does get updated, I'm going to leave I'm gonna leave a comment I'll just pin it to the top of the video giving you new code for this But essentially all that's in this is a bm25 implementation that we're going to be using Okay, so let's come down to here what we're going to be doing or what this bm25 implementation needs is a tokenizer right, so we need a tokenizer essentially to split our text into either like words or Pieces of words and we're going to do that using a Bert tokenizer.

So we run that So here's the tokenized function. So we import our text, right? And this will need to output basically a list of tokens It will tokenize those using hugging face transformers tokenizer we extract those input IDs and then we convert those IDs back into text tokens, right and With this tokenizer function we pass that into the bm25 implementation so basically bm25 is going to use this to tokenize everything and Then with those tokens, it's going to create our sparse bm25 vectors.

Okay cool, so Just yeah, I'm showing you what this tokenizing function is doing. So it's just essentially breaking everything apart into these little components But in reality, we're actually going to have a ton of different things in there We're going to have like all the other columns as well now with bm25 The way it works is there are a few parameters within the function that are based off of your Sort of larger amount of data, right?

So all the data you have you need to feed into bm25 That will allow the model or bm25 to essentially train on that and update those parameters which will then be used later So we run that right and you can actually see a few those sort of number of dots the average document length document frequency and so Cool, so just come down we can try right?

So this is going to create the the query for the for this particular prompt here So we run that and you can see your bm25 query vector there now We're actually running this on across all of our documents that we're going to store in the vector database We actually only need to store or run at the adopt side of it Because the query side is kind of considering the term frequency against the inverse document frequency But we only need to do that on one side for it to be effective right, so you can simplify the calculation just calculate the IDF part for the things that are stored within the Vector database.

So that's what we're doing here run that. Okay now for the dense vectors I would say we can be using something different. So we're actually going to use a clip here and We're actually implementing clip through the sentence transformers library. It just makes things easier for us you can also implement it through the face transformers library as well, but this is easier and We're using CUDA if we can right so you can just print the device to just check what you are using here Try if you can and use CUDA you can also I believe if you're on Mac you can use NPS But yeah, naturally CUDA is typically going to be faster Okay, cool.

And Yeah, we have a clip model there and what we can do is we can encode We can encode this All right, and then we see we get a 512 dimensional dense vector. Okay, cool So now we need to pull that together So we are this is going to be a little bit different if you have watched previous videos when I'm using pine cone So let's let's take a look at this.

Okay Cool. Okay, that's good Right. So we're going to be okay TQ DMS just so we can see the progress so we can see, you know The progress bar is where going through everything This is something new We'll see see soon at batch size. We're going to go 200 This will work on on Colab and you can even go higher But I find you know, this is this works fine And we're going to go through the fashion data set in batches of 200 at a time We find the end of the batch We extract the metadata from that batch So we have our metadata just kind of extracting out in 200 in chunks of 200 items or rows at a time And then so here we're concatenating all that metadata except from the ID in here to create that that single string.

This is what we're going to use to Create our sparse vector. Okay, and you can see that here Right. So that's how we create a sparse vector now for the dense vector. We need to extract the images so we get the images from the Images very or list that we created earlier.

We Convert them into dense embeddings with our dense model, which is clip and then here We're just creating some unique IDs. So the ID is a literally just a count in this case But we do need to make sure they're a string Obviously if you have actual IDs that correlate to your items Maybe you want to use those and in fact, actually we did have ideas.

We just took them out here So, you know, maybe we could have used that but it's fine. We don't it's not really so important in this example Right and a bit that's different. So in pinecone, you know, there's sort of Making things more organized so that we can do things more efficiently So there are a few changes here.

So we've created our items here. Typically would just feed them in in like a List of two pools, which is kind of not that organized. So that is changing a little bit here First we create a structure for our metadata and inside that structure. We just need to add The meta here.

So that's why I imported up here this Google Structure item here object So this is what we create for our metadata for one row of our metadata to note that look here We're looping through the batch. All right, and as we're looping through a batch, we're appending everything to this upsets list here and So as we go through we have this pinecone GRPC vector.

All right, so if you're not using GRPC use vector and A GRPC vector expects at least two things, right? Which it would be your ID and your values. So your ID is obviously your your record ID or vector ID values is your dense vector and Then for metadata and sparse values, you know that they're optional, right?

But obviously we do have metadata in our use case and we do have sparse values as well so if the metadata we feed in this the strokes that we just created and for sparse values we have our GRPC sparse values here, and I realized actually after creating this that I can just do This okay so let's Go through and run this Okay, and that would take a little bit of time.

So I will Skip forward and I will see you when it's ready. Okay, so it's just finished and we have okay, 44,000 vectors in there and Yeah, okay. We'll start having a look at actually querying those and getting some some results. So Let's come to here. I think maybe I need to remove this Okay, good.

So let's start the dark blue French connection genes for men, right? So that's gonna be our query when we're querying we we do need to create BM 25 query vector. So we're using transform query here. We also need to use this so Creating our dense vector embedding so model and code query to list and we want to search through that Okay.

So with our search this is again if you watch the previous videos slightly different we now have so vectors I think what we had before But now we also have sparse vector Which is where we pass in our sparse vector and what I'm going to do is just return The images based on the results that we get.

Okay, so we just get all of these Pill image objects to actually view those we're going to use this function here Display result, which is just essentially going to display some HTML within a notebook Which will show the actual images themselves and with that would just view them. Okay, so query was Dark blue.

No. Yeah dark blue French connection genes for men And this is what we return. Okay, so yeah, it seems pretty accurate but let's see what happens if we kind of weigh the Sparse versus dense vectors. Alright, so what we're going to do is use a function named hybrid scale for this okay, and Essentially if we take look here, we have this alpha value this alpha value must be between 0 and 1 Where 0 basically means you're doing a sparse vector search only because if you see here 1 minus 0 so The sparse vector multiplied by 1 for sparse, but this one will be this is a dense one You're multiplying by 0 here.

So you're basically making it Meaningless and then if you want to dense search only you would you would use a alpha value of 1 or 1.0 so you don't trigger this so we run this and Then okay, we first do a pure sparse vector search So we set that to 0 to do this do so run this Okay, and it looks like we kind of get good results.

Although there are definitely a few women's genes in here So let's have a look at what's actually going on there. So we go to product display name All right, so on French connection. Yeah, we're doing well, right? So the keywords are being pulled in we'll find that exact match That's great.

But the issue is we do have like men is a component of the of the word woman So we're kind of pulling those in as well so What we can do is okay. Let's just go pure dense and see what what happens Okay, so this is basing it purely on the images, right?

So it's looking for dark blue French connection genes the issue is does it actually know what French connection is and if we come down to the The display name of these items and see okay. It's actually returns of the brand like loco locomotive I think it's called and Even spy car and Wrangler.

It does include a few French connection But I think that's just purely by chance rather than it actually knew that these are French connection genes no, maybe it did who knows but Nonetheless, yeah, it's not it's not pulling back the French connection stuff So we can't just set the alpha value - it's actually not gonna use 0.07.

We're gonna use 0.07 right, so it's Actually, I'm basically a sparse search with a little bit of dense in there as well In fact, not even 0.07 0.05. Okay All right, so we run this So this is heavily weighted towards sparse, but it's still considered some dense and You know, they look mostly.

Okay, there is some black here You know, ideally you'd rather avoid that Maybe we could increase it to dance a little bit more to try and avoid that but you can see That for the most part we are getting French connection in here. All right, so we're getting some locomotive and here as well But for the most part is French connection and also not including any Women's genes in there as well, which is a bonus Cool.

Let's try it some more queries, which actually I think can demonstrate this pretty well So we're going to go small beige handbag for women. We're going to go pure sparse first. So looking at the keywords only and Okay, not really beige. All right, we have like every color in there They're all handbags, which looks good.

And obviously we're all for women. So Okay, not bad Just not beige whatsoever. So what we're going to do is we're going to add a little bit of dance again All right, see what happens All right so even with that little tiny bit of a dense factor in there the results are just like Ten times better we get pure beige.

All of these are Handbags, right? So that's really good We can also look here It's kind of interesting that it didn't pull beige in even though they all say beige and the actual sparse factor But anyway, so that's kind of cool. What happens if we just get pure dense It's kind of interesting.

I think so if we go pure dense Yeah, I think for the most part they're beige, which is okay But then we're getting it like women's purses in there as well. Not just Handbags, so we can see that Okay, like clip is understanding everything as like it knows that a woman's handbag and a woman's purse are kind of similar So it's just kind of giving you all of those that are also beige So, you know We can kind of see the benefit of having a mix of sparse and dense in here across both modalities as well All right another thing.

So this is kind of interesting we're gonna start with this image here and We're going to use this image to search right so clip can also handle images, right? so this is going to be our clip vector and We're also going to add in a text query, which is going to become our sparse vector All right.

So let's try those. We're going to be using a mix 0.3 for this one, right? So what do we have here? So these are all kind of similar to this image, I think Yeah, I definitely say so you return the same one here and They even I mean to me they look like the same person All right, so obviously clip isn't it doesn't know that you just want to focus on the top.

It's kind of looking at everything All right, so it's also gonna kind of return people that look similar It's kind of like a amusing side effect But the purple component isn't I think maybe it's considered a little bit here and here But you know, it's not really considered a huge amount there.

Is it? Okay, so Let's try rather than just relying on the query let's try and use the pine cones metadata filtering right because we know that in our metadata we have the The color of these items. All right, so we do the same again same query this time. We're going to filter for the color purple Okay, run this.

All right, and then straight away we get you know much better results. So with this well filtering for the color purple we are querying with both an image and Also a text query, right? So we you know, we're kind of approaching this search from several different Angles, right, you know Basically, the point is that we can do search in a ton of different ways and we get some some pretty cool results So another one here, we're gonna have this guy in his shirt and we're gonna go with a green shirt all right going to add that to our filter straight away and Yeah straight away so one thing is okay We don't even include like it needs to be men's shirt Anywhere in here the clip model is kind of handling that for us because it knows we're looking at a guy initially So it's just retaining as other guys Like these to me seem like we're actually filtering for green here So it means that this shirt is for some reason marked as green.

I don't know why but nonetheless We for the most part, you know, they're kind of that's a green type of color kind of faded like this one here So it works pretty well, I think Cool. So yeah, if you're if you're done with the index obviously if you want to kind of play around with it a little bit more feel free and go ahead to once you are done You can delete the index just to save your resources if you just have the one index, it's free anyway But if you have multiple obviously you'd be paying for this.

So I mean that's it for this example of It's like a hybrid multimodal search for e-commerce items as I mentioned at start we just kind of Blazed through a ton of different technologies like we had filtering in their hybrid search in their multimodality Kind of dived into obviously with the hybrid dived into the embedding methods sparse and dense So we covered a ton of things But I think it's kind of cool to see how many different ways you can Search and how you can enhance your search But anyway, that's it for now so I hope all this been interesting and Helpful.

Thank you very much for watching the video and I will see you again in the next one. Bye

Supercharge eCommerce Search: OpenAI's CLIP, BM25, and Python

Chapters

Transcript