
Implementing Filters in the New Haystack Doc Store


Chapters

0:00 Intro
2:41 Filtering
5:36 Testing Existing Filter Utils
7:57 Making Sense of Filter Utils
10:35 Writing the First Filter
16:26 First Working Filter
18:24 Testing New Filters
21:27 Implementing in the Doc Store
24:02 Testing Pipeline Filters
27:11 Final Issue and Outro

Transcript

Today we're going to have a look at the current state of the PR for the Haystack Pinecone document store, and we're going to try and work through, and maybe solve, all of the issues. I tend to be quite optimistic, so maybe that's not going to be realistic, but we're going to give it a go and try to figure out what we need to do to actually merge this into the Haystack library.

So we'll just scroll down. I have Bogdan, who has given a lot of feedback on this. The first thing (okay, not the first thing, because I need to figure this out as well) is to make sure typing is compliant with mypy. For most of it I was just taking methods from other parts of the framework, so I'm not sure if there have been some updates since I started doing that and my types are just out of date, or if it's some other types I've put in there without realizing they were different. I'm not sure, so we'll figure that out later. It's fine.

So one big one is to make use of the filter utils for converting the filters. At the moment it's just a method inside the document store called, I think, build filter clause, which handles converting from Haystack filter syntax to Pinecone filter syntax. So we need to solve that. Other than that, it's just a load of little ones that we're going to go through.

So, vector dim. This is what makes me think I was looking at some older version: this has been deprecated, and now we're just using embedding dim instead. So let's fix that. Anywhere it says vector dim, I just need to replace it. First let me just check. Okay, let's replace all of those with embedding dim. Okay, that's number one. Easy.

This next one is the more difficult thing we need to figure out. There's this filter utils script, and when I looked at it before it was kind of confusing, so let's take a look now and see if we can figure it out. Filter utils, I think it's here, so it's still in the document store directory, and we have this LogicalFilterClause. Come down here and, ah, okay, this is why it was so confusing to me. We have convert_to_elasticsearch and convert_to_sql, and they're all just empty. So maybe I'm just going to try and add the convert_to_pinecone method in here, pretty much as it is already, and hopefully figure it out from there. But at the same time we have this LogicalFilterClause object, and I'm not sure what that looks like exactly.

So if we come here: "Class that is able to parse a filter and convert it into the format that the underlying databases of the document stores require. Filters are defined as nested dictionaries. Keys of the dictionaries can be logical operators, comparison operators," and so on. Okay, so the logical filter clause is here, then we have the comparison operators as well. Logical operator keys might take a dictionary of metadata field names. At the moment we are converting these correctly as far as I know, but I'm just not sure how to integrate that with this filter utils.

So let's have a look at another document store. Let's have a look at Weaviate and see how they use it. So it's filter utils, LogicalFilterClause. They only use LogicalFilterClause here; they don't use the other one, the comparison operation. So let's see where LogicalFilterClause is being used. If there are filters, this is just being called: LogicalFilterClause, parse the filters, convert to... okay, I kind of need to see what this is.

Let's take this and put it into another file. Let me create a file here, something like a test, so I'll just call it filter utils test. I'm going to test this, but we need to import it, so where is Weaviate importing this from? Here. Okay, let's take that and import it here. That should work. I'll select the kernel, haystack. For the filters, I'm going to take a filter from the filter test I created before. There we go. Let's see what that gives us: a filter utils OrOperation. Wow, super weird. Then convert_to_elasticsearch. Right, so I need to figure this out a little bit and see what we have in there.

So parse is creating one of these objects. What object is it? Filter utils OrOperation. It seems really complicated to me. So parse is here. Okay, NotOperation, parse the value, and so on. So we need to modify the current filter code to not look directly at the keys, but instead look at the objects created over here, so this AndOperation, OrOperation, and so on. We also need to figure out what those are. They're actually in this script: AndOperation. Ah, okay, cool. So this is where that is happening: condition.convert_to_elasticsearch. Where's condition coming from here? self.conditions. Okay, it's just stored in there from the LogicalFilterClause.

So you initialize this LogicalFilterClause, which consumes (where is it, up here somewhere) the conditions. Here. Okay, self.conditions are added to the LogicalFilterClause, and then later on, when we actually use this AndOperation, it inherits that value from the LogicalFilterClause, so self.conditions is already in here. And then in here we need to specify how to deal with each of those items. In the Weaviate one it's condition.convert_to_weaviate for each condition, plus the operator, and that gets returned.

So let's just take the Weaviate one and do something like convert_to_pinecone here. And the operator is... is that what it is? Okay, so the operator in this case is "and". Is that right? I can't remember now. Let me find the filter. Okay, yeah, it's "and". And then the operands, what are the operands here? Is that relevant? I don't use it here. All right, let's see what that returns to us.

First, I think we also need to add the convert-to method in the original, or the first, class, so in LogicalFilterClause: define convert_to_pinecone, and we just pass. I'm not going to write the description in there yet. Let me add it for the "or" as well, so OrOperation. No, not that one. Let me copy that out. Where's OrOperation? Here. So convert_to_pinecone again here, and this time we are using "or". I imagine in here maybe I need to create a list, so let's just test it first and see what we get.

So pip install, and now let's go back and actually test that. Restart, run this. We get the filter, we don't need to do that again, and then from that, if we just convert to Weaviate for example, we should see something. Okay, what happened there? Did I interrupt it? Let's run again. Okay, fine. Parse, and we get this filter utils OrOperation. Convert to Weaviate, let's see what happens. Okay, cool. And then if we do the same again but convert to Pinecone: attribute error. Okay, so I need to add convert_to_pinecone to a lot of things here, I think. Let's go back and do that.

For now I'll just leave the body as the Weaviate one. We're going to rework everything; I just want to make sure that everything is in place and I can just modify things in there. Realistically, I think the conversion from Haystack to Pinecone should be super straightforward, because we don't have all these different syntaxes. Ours is very aligned to Pinecone's, or Pinecone's is very aligned to Haystack's, so it shouldn't really be an issue. I'm just going to skip forward through this bit.

Okay, so that should be enough for that. First we need to pip install again, and then we'll just rerun this and make sure that we're actually returning some sort of dictionary or filter, so everything is in place. Restart, and then run this and see what happens. Okay, so there are some issues in there. Let's try and figure that out.

Okay, so we've got this now. We have a really basic dictionary coming through, so all we really need to do now is modify the code that we've added to filter utils to align with the logic in here. I think what we'll do is have a few example filters and figure out how we're going to write that in there. So let's start with that.

Okay, so we've got this working now, at least for this one filter query, which is pretty cool. It was a lot easier than I was expecting, so that's really good. And this sort of logic of separating everything out makes a lot more sense, particularly if you consider how complex some of these other querying syntaxes must get. Before, I was thinking this seems a really overly complicated way of doing things, but now I'm convinced this is probably one of the best ways to do it.

The only one that we don't have is this NotOperation. I don't really know what to do with that yet, so for now I'm just going to leave it and come back to it later, maybe just add a not implemented error or something like that. The only reason I say that is it would be great, if you say "not this", to invert it by just looking for all of the opposite values. But to do that you'd have to pull out all of the metadata from Pinecone, and I don't think it's possible to do that. It's something I need to check. So for now I'm just going to ignore this one, and we'll deal with it later.

So yeah, that looks good. The only thing now is I want to test it on some other filters. This is just one filter; maybe it doesn't work on the others. So let me find some of those. Which notebook was it, testing filters? Yeah, we come down here and we have a few other filters as well, so let's try those. Hopefully it works. So the filter down here: we parse it and then we convert it to Pinecone. Let me just remove that so we see it straight away. Okay, so we have a lot of stuff in here. Let me just remove this. Yeah, that looks right to me. Let's try another. We just did that one, so let's try this one and see what we get. Convert to Pinecone again. Okay, and this looks good as well.

So I think we're probably ready to actually go ahead and test that. I'll take some of these filters and apply them directly to Pinecone to make sure it's actually working. To do this I'd have to create a load of vectors, and I don't want to do that. I just want to create a couple of vectors. So if I go to the Pinecone demo, this one's easier; it only does it with about six. So I'm going to run through this and see if it works. Initialize. I think we've already created this document store.

Oh, the other thing we need to do is actually implement it in the Pinecone document store, so that's important. In here, at the moment, for the filter I'm using this build filter clause. I don't want that anymore, so I actually want to just remove it. Then let's find where we're applying filters. Let's have a look at write documents and see what I do. Okay, so here we're using self, build filter clause, so we need to change this to use the same logic as the other document stores. So I'll look at Weaviate. Okay, ignore that. Weaviate, filters, see where they... okay, here. This is what we want to do. So the filter is just this, nothing more than that, so it's pretty simple. The filters are just going to be converted like that. They use filter dict here, but we just use filters, convert to Pinecone.

I'm just going to search for this, and that will tell us where else we are using it. Oh, it's only here. Really? Okay, so that's it; it's the only bit we need to change. We also need to import this, so what is it, LogicalFilterClause. Import this, and we should be good to go.

Now let's go ahead and pip install that, and then we can test it, and hopefully it will work. So come over here to the Pinecone demo, restart that, and rerun it. We've already got a document store here, so I don't think we need to write these documents again. Yeah, we can ignore that. The Dense Passage Retriever we do need. We don't need to update the embeddings, so let me just move this to a new cell and run it. We do need this. Okay, let's make sure it's working and there are answers, and then we will try and apply the metadata filter. So filters: just something really simple here, and then maybe we can modify it a little bit to see what else we can get. Okay, it looks good. I'm just going to run that so I can see some of the metadata we have here. It's just this one again. I want to return another name so I can add it with an "or" statement and see what that looks like, but for now it's fine.

Okay, let's run it. Cool, so it looks like filtering actually worked. We're only returning two here, because based on that filter there are only two matching items out of the six. So that's really cool.

Let's get all the documents and get some other names. So we have this one; let's add that into our filter. We're going to use "or" here, so we want this or this. Let's run that. I think that's in the right syntax. There we go. So now we're returning more samples. We're returning all of them now, because they're either from this one or from this one, which is really cool. And we can double check that is the case. I can't see anything there, so I'll return all documents and go: for doc in all docs. Let's just check that this works. No. Okay, let's see what's in there. It's a document, meta. Cool: document, meta, and then name. So let's just print all those out to make sure that is the case. Cool, so it looks like it's working.

So that is, I think, most of the issues on the pull request. I think there's one other big one at least: the typing with mypy. I'm not going to go through that now, but I can at least show you what that error is. If we come to the type check here, we can see in the test with mypy there's a load of typing issues. All of these are typing issues. So that's something to fix as well, but we're not going to go through that now. I think it's not that interesting.

Yeah, I think that's it for this video. I hope it's been useful to go through that, and interesting to see how Haystack actually works, and particularly how Haystack filtering works in the library. So yeah, thank you very much for watching. I hope it's been useful, and I will see you in the next
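
As a rough illustration of the approach walked through in this video, the sketch below parses a nested-dictionary filter into a small tree of operation objects, and each object knows how to render itself in Pinecone's metadata filter syntax. This is a minimal sketch under stated assumptions, not the code from the PR or from Haystack's filter utils: the names `Comparison`, `LogicalOperation`, and `parse` are hypothetical stand-ins, the input shape (logical operators mapping to dictionaries, bare values meaning equality, lists meaning membership) is assumed from the description in the transcript, and `$not` simply raises a not implemented error, as suggested in the video.

```python
from typing import Any, Dict, List

LOGICAL_OPERATORS = {"$and", "$or"}


class Comparison:
    """Leaf node (hypothetical): one comparison on one metadata field."""

    def __init__(self, field: str, op: str, value: Any) -> None:
        self.field = field
        self.op = op
        self.value = value

    def convert_to_pinecone(self) -> Dict[str, Any]:
        # Pinecone-style leaf: {"field": {"$op": value}}
        return {self.field: {self.op: self.value}}


class LogicalOperation:
    """Inner node (hypothetical): combines child conditions with $and or $or."""

    def __init__(self, op: str, conditions: List[Any]) -> None:
        self.op = op
        self.conditions = conditions

    def convert_to_pinecone(self) -> Dict[str, Any]:
        # Pinecone-style logical node: {"$and": [child, child, ...]}
        return {self.op: [c.convert_to_pinecone() for c in self.conditions]}


def parse(filters: Dict[str, Any], op: str = "$and") -> LogicalOperation:
    """Build a tree of operation objects from a nested filter dictionary."""
    conditions: List[Any] = []
    for key, value in filters.items():
        if key == "$not":
            # Left unimplemented, mirroring the decision made in the video.
            raise NotImplementedError("$not is not handled in this sketch")
        if key in LOGICAL_OPERATORS:
            conditions.append(parse(value, key))
        elif isinstance(value, dict):
            # Explicit comparison operators on a field, e.g. {"$gte": 2020}
            conditions.extend(Comparison(key, o, v) for o, v in value.items())
        elif isinstance(value, list):
            # Assumption: a bare list means "field is one of these values"
            conditions.append(Comparison(key, "$in", value))
        else:
            # Assumption: a bare value means equality
            conditions.append(Comparison(key, "$eq", value))
    return LogicalOperation(op, conditions)


example_filter = {"$or": {"name": ["doc_a", "doc_b"], "year": {"$gte": 2020}}}
pinecone_filter = parse(example_filter).convert_to_pinecone()
print(pinecone_filter)
```

Keeping the conversion logic on the node classes, rather than walking dictionary keys inside the document store, is the design point made in the video: supporting another backend would mean adding one more convert method per class instead of rewriting the traversal.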