Back to Index

fastai v2 walk-thru #6


Chapters

0:00
0:57 Type Dispatch
10:41 Create Pipeline
11:08 Empty Pipeline
15:25 Compose Transforms
24:08 Filtering
39:41 Pipeline Setup
46:35 Examples
58:25 Data Source Tests

Transcript

Hey everybody, can you see me and hear me okay? Great, sorry about the delay. YouTube streaming doesn't quite work properly on Firefox. Thanks, Google. All right. I'm not quite sure what we're going to talk about today, as usual, but I do have some place to start, which is in notebook 02.

I realized there's one piece I didn't tell you about, which is type dispatch. If you haven't seen it, there's a very nice walkthrough of type dispatch on the forums now; thank you to Aman Arora. The basic idea of type dispatch is quite well described in the tests, which is that here is a bunch of functions, and they take different things.

And underneath you can see that we create a TypeDispatch object with that list of functions, and then we treat it like a dictionary, passing in some type. And it tells us, as you can see, what function to call for that type. And then, as well as doing that, more importantly, you can treat it like a function.

It's a callable, and you can pass it something. In this case, I'm passing it a normal int, so it's going to go to the numbers.Integral version, which returns x plus one. If we check: yes, it has returned x plus one. Something that I didn't look at last time, though, was this other cell underneath it, which is exactly the same thing, but this time these functions take a self as well.
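
To make the idea concrete, here is a minimal, self-contained sketch of type dispatch. SimpleDispatch and its functions are hypothetical stand-ins, not fastai's actual TypeDispatch class:

```python
# A minimal sketch of the type-dispatch idea, not fastai's real TypeDispatch:
# pick a function based on the annotated type of its argument, walking the
# MRO so subclasses fall back to a parent's function.
class SimpleDispatch:
    def __init__(self, *funcs):
        # Map each function's first annotation to the function itself.
        self.funcs = {next(iter(f.__annotations__.values())): f for f in funcs}

    def __getitem__(self, t):
        for cls in t.__mro__:
            if cls in self.funcs: return self.funcs[cls]
        return None

    def __call__(self, x):
        f = self[type(x)]
        return f(x) if f is not None else x

def f_int(x: int): return x + 1
def f_str(x: str): return x + '!'

td = SimpleDispatch(f_int, f_str)
assert td[int] is f_int    # treat it like a dictionary: look up by type
assert td(1) == 2          # treat it like a function: dispatches to f_int
assert td('a') == 'a!'
```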

And we create a TypeDispatch object as usual. We then insert that TypeDispatch object as an attribute of this class, and we're going to call the attribute f. And now we call a.f and we get back, again, two. Now, there's something a bit magic going on here. How did Python, or how did TypeDispatch, know that a.f should be passed a self as well as the x?

And the answer is: by default, it doesn't. a.f is just an attribute of this class. There's nothing in particular to say that it should be passed a self. So how does it do that? And actually, it would be good to add another test to make sure it's getting a real self.

So maybe we should try something like this. Actually, to make it kind of interesting, let's take the bool one, and instead of just returning x, why don't we go self.foo = 'a'. So in that case, we would expect to be able to go a.f(False), for example.

So that's going to call it with a boolean, which should call this version. And after that, we should find that a.foo is 'a'. Let's run that. Well, it is, and it worked. So it is somehow correctly binding self to this object. The way that happens is a bit of magic: when we call the callable, TypeDispatch's __call__ (dunder call) checks a special attribute called self.inst, where inst stands for instance.

It finds out whether this is bound to an instance of something, in other words whether self.inst is not None, in which case, instead of using the function it's looked up directly, it wraps it as a method: a method where the underlying function is this function and the instance it's bound to is this instance.

So this is how you turn a function into a method. OK, so that's fine. But how on earth does it know what self.inst is? Somehow self.inst has to be set, in this case, to a. And the answer is that in Python, when you go a.f like this, it's actually going to call a special method of whatever class f is an instance of.

And the method it calls is called __get__ (dunder get). We've never seen dunder get before. Remember, the place to learn about all this stuff is the Python data model documentation. So we could look for __get__, and here we are: it's called to get the attribute of the owner class, or of an instance of that class.

The instance is the instance that the attribute was accessed through. So in this case, __get__ is going to be called, and it's going to be passed, as self, the value of this f thing. But more interestingly, it's also going to be passed, as inst, the value of the thing before the dot: a.

And so that means we can just go self.inst = inst, and from now on the TypeDispatch object knows what instance it's being called on. So we just go self.inst = inst. And then later on, when we call the function, we check whether self.inst is not None, and indeed it's not None.

And so then we wrap it as a method. So that's, again, just the super nice, extensible Python data model that lets us do anything we want to. And so in this case, there's this kind of wonderful magic where, you know, I was really surprised this was possible when I learned about it.
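
You can reproduce the binding trick in a few lines. Here's a minimal sketch with hypothetical names (BoundDispatch is a stand-in for TypeDispatch) showing how __get__ plus types.MethodType produces the behavior just described:

```python
from types import MethodType

# Minimal sketch of the descriptor trick: __get__ records the instance the
# attribute was accessed through, and __call__ binds the function to it.
class BoundDispatch:
    def __init__(self, func): self.func, self.inst = func, None

    def __get__(self, inst, owner):
        # Called on attribute access `a.f`; `inst` is the thing before the dot.
        self.inst = inst
        return self

    def __call__(self, *args, **kwargs):
        f = self.func
        # If accessed through an instance, wrap the function as a method,
        # so it receives that instance as `self`.
        if self.inst is not None: f = MethodType(f, self.inst)
        return f(*args, **kwargs)

def _f(self, x: bool):
    self.foo = 'a'   # proves we were handed the real instance
    return x

class A: f = BoundDispatch(_f)

a = A()
a.f(False)
assert a.foo == 'a'   # self was correctly bound to `a`
```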

I really wanted this to work: I wanted to have both functions and methods and have my type dispatch automatically handle both. And it does, and this is how it does it. So, yeah, I think that's pretty great. OK, so something that I added with Sylvain this morning, actually: we added one more thing to our Transform class, which is that as well as encodes and decodes,

we've also added setups. So setups will be called by setup as before, but now setups is a TypeDispatch object. So the code's changed a tiny bit: I've put encodes, decodes, and setups into this tuple, and now we go through each name in those three methods and turn them into TypeDispatch objects.

That's all in the metaclass. And the reason for that is that we're starting to work on RAPIDS. RAPIDS, if you haven't seen it, is a very nice project from NVIDIA which provides something a bit like pandas that runs on the GPU. And it's only a bit like pandas.

It's not pandas. So we basically want to create tabular transforms that work automatically and correctly on RAPIDS data frames and on pandas data frames. And that includes wanting the setup to work appropriately. And so now, thanks to this, we can do that. So, yeah, that's why we just added that.

So, slight changes to the code from last time; they're very small. All right. Well, let's keep working back through the details of these things that use this, then, shall we? So we've done 02. Let's now look at 03. So in 03, we can import that thing we just made, the transform module.

And what we're going to do is create Pipeline. As usual, it's probably easiest just to look at the tests. So let's create a pipeline called pipe. A pipeline is an object which is callable. This one is an empty pipeline.

So an empty pipeline always returns whatever it's given. It has the same as_item behavior as transforms do. And so if you set as_item on a pipeline, it will set the as_item boolean in all of the transforms in that pipeline to that value. In this case, we're going to say False.

And so if we set it to False, then when we pass in a tuple, we get back a tuple. Where it gets more interesting is if we create a pipeline that does something. So here's a pipeline with two transforms. The first transform is something whose encodes turns something into an Int and whose decodes turns it back into a Float.

And these are the capitalized versions, the fastai versions that know how to show themselves. One of the main reasons we use them in tests is to make sure that retaining types works properly. And then here's a transform which, remember, you can create either by subclassing or by instantiating.

So here's a transform which is going to set encodes to this negative function and decodes to the same negative function. So there are two transforms, and those are the two transforms in our pipeline. So we start with the value 2.0 and pop that into our pipeline.

The first thing it's going to do is negate it, because that's the first thing in the pipeline, so it becomes -2.0. And the second thing it's going to do is turn it into a capital-I Int. So test_eq_type checks that this is equal to this: not only equal, but exactly the same type.

So this is now Int(-2), as expected. And you can see what the pipeline is doing: it takes this value and calls this function first, which makes it -2.0, and then this function second, which makes it Int(-2).

And that's all it's doing. So that is function composition. When we call pipe.decode, it will simply call the decodes of each thing in reverse order. So if we start with t, which we know is Int(-2), and we call decode on the pipe, the first thing it will do is go to this transform's decodes.

So it'll turn it back into a float; that's now -2.0. And then the second thing it will do is go to the decodes of the negative transform, which is again just negation. And so that's going to be a capital-F Float(2.0). Confirm: yes, it is a capital-F Float(2.0).

So how does that work? Let's take a look at Pipeline. The key thing Pipeline does when we call __call__ (dunder call) is call a function named compose_tfms, passing in the value, which in this case was 2.0, and the list of functions.

And that list of functions is just the thing we pass to the Pipeline constructor. So we just set self.fs to a list of those functions. We turn into Transforms any that aren't transforms already, and we sort them by order if they have an order.

So compose_tfms is the key thing that's actually doing the work. And here it is: compose_tfms. It's going to take some value to calculate the composed functions on, some list of transforms, whether we're encoding or decoding, and whether we're going in forward or reverse order.

And this is basically a pretty classic function composition loop. You go through each function, you call it, and you replace your current value with the result. Then you go back and do that again, and keep doing that for everything in the list of functions. And so you can see how you can use compose_tfms.
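
As a sketch, in plain Python, the composition loop looks something like this (fastai's real compose_tfms also handles encodes versus decodes and transform ordering; compose_sketch is a hypothetical name):

```python
# A sketch of the classic composition loop described above: feed the value
# through each function, replacing it with the result each time.
def compose_sketch(x, funcs, reverse=False):
    if reverse: funcs = list(reversed(funcs))
    for f in funcs:
        x = f(x)   # replace the current value with the result
    return x

assert compose_sketch(2.0, [lambda o: -o, int]) == -2
```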

So here are examples of some functions. Any time you want to know how something that's used internally like this works, you can look at the tests for that thing before you worry about how it's being used in, for example, Pipeline. So here's the test of composing some plain functions.

And then here's a test of composing some transforms, because you can always use functions as transforms. And in this case, as long as we don't pass in is_enc=False, it's not going to try to do anything other than just call the callable. So that's how the pipeline does __call__.

And it's also how the pipeline does decode: it's just going to call compose_tfms and say is_enc=False and reverse=True. So that's the simple bit. Where it gets interesting is when we call Pipeline.show. What does Pipeline.show do? Intuitively, what it does is take t (and remember, t is a capital-I Int(-2))

and decode it one transform at a time, in reverse order, until it gets to a data type that has a show method, a data type that is showable. Now, in this case, a capital-I Int is already showable. Here's the definition: a capital-I Int is just an int that inherits from both the lowercase int and ShowTitle, and it has no other code in it.

And ShowTitle, remember, is just a class which has a show method that shows it by calling show_title. And show_title, by the way, if you pass in no context at all, just prints whatever it's passed. And if you pass in a plot, it will show it as a title on the plot.

So show_title(3), as you can see. So in this case, pipe.show(t) (remember, t is -2) simply prints out -2. And the reason is that this is already something that's showable, so there's no decoding to be done. For those of you who didn't see it earlier, notice that the way I test that this prints something out is that I make a lambda which takes no arguments.

That lambda is the first thing you pass to this test_stdout function, which tests that the function, when run, prints that to stdout. It's a nice way to check what's happening with our show methods.
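
For reference, here's a sketch of what a test_stdout-style helper can do using only the standard library; check_stdout is a hypothetical name, not fastai's implementation:

```python
import io
from contextlib import redirect_stdout

# Run a zero-argument callable, capture what it prints, and compare.
def check_stdout(f, expected):
    buf = io.StringIO()
    with redirect_stdout(buf):
        f()
    assert buf.getvalue().rstrip('\n') == expected

check_stdout(lambda: print(-2), '-2')   # passes: the lambda printed '-2'
```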

When as_item is False in our pipeline and we call pipe.show, it will call show on each element of the tuple that's passed in. Because remember, when as_item is False in transforms, it's basically saying you should apply the transform to each element of the tuple that's passed in. And so when you show it, you should call show on each element of the tuple.

And the reason for that is for stuff like show_batch. In the pets dataset, for instance, you've got two things in the tuple of the batch: you've got the images and you've got the labels. And so we want to show the image and then show the label.

So in this case, when I say pipe.show and pass in (1, 2), that's going to print -1 and then print -2. So it's calling show on each element of that tuple. And notice it's also applied the pipeline to each element of that tuple.

So each one has been negated and turned into an Int. OK, so that's Pipeline. And so show is going to go through our functions in reverse order. It'll see if it can show the value without doing any decoding. If it can, then it's done. If it can't, it will try to decode it with this function.

And then it will go back and try again to see if it can show it now. If it can't, it will go to the next earlier function in the pipeline: decode, can I show it now, and so forth. And so all _show is doing is checking whether as_item is true or not.

And then it's checking whether there is a show method for this type or not, or for everything in the tuple or not. And if there is, then it's going to show them. OK, so it's not super exciting code, but it's conceptually interesting. This idea of being able to decode and show things turns out to be super helpful.
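
Here's a sketch of that show loop, with hypothetical names and much simplified relative to the real Pipeline.show:

```python
# Walk the transforms in reverse, decoding one step at a time, until we
# reach something that knows how to show itself.
def show_sketch(x, tfms):
    for t in reversed(tfms):
        if hasattr(x, 'show'): return x.show()
        if hasattr(t, 'decodes'): x = t.decodes(x)
    if hasattr(x, 'show'): return x.show()
    print(x)   # fall back to plain printing

class ShowInt(int):
    def show(self): print(f'showing {int(self)}')

show_sketch(ShowInt(-2), [])   # already showable: prints 'showing -2'
```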

OK. So here you can see we're creating some functions, some of which only operate on certain types. We've got one function that operates on TensorImage, one that operates on everything, and one that operates on PIL images. And so here's something which is going to call Image.open and resize.

And then turn that into a TensorImage, and then take its negative. So we should find, at the end of all that, that we have a TensorImage, which we do. And we check that we have the right values. If we get rid of the f1 piece, we just open the image and turn it into a TensorImage.

Then we should be able to show it. So that's making sure that we can. OK. So now's a good time to talk about filtering. Actually, no, let's talk about filtering after we talk about the DataSource class. So we've kind of covered these ones before. So then, a TfmdList is something where we pass in a list of items and a list of transforms.

And it's going to create a pipeline with those transforms. It's a subclass of TfmdBase, which is a subclass of L, so it's just passing the items back up to the L constructor. So it's basically going to be a list of items, but it's also got this pipeline in it.

And so this is where you can learn some interesting new stuff about L, because L is kind of designed to let us create more interesting types of collections. In this case, you can see what happens in L when I call __getitem__: it actually calls self._gets if you have an iterator of indexes.

Otherwise it calls self._get. And self._get, by default, assuming there's no iloc (so as long as this is not a pandas data frame), just returns the i-th element. But what we can do in TfmdList is override _get. We will continue to call L's _get.

But then we will also call our pipeline. So this is how we end up with something where we can say: let's create a TfmdList; the items are 1, 2, 3; our pipeline is going to be negative and then our Int transform. And now we have something that we can treat like a list.

We can subscript into it, and it's going to grab the indexed thing, which in this case is 2.0, and apply the pipeline to it. So we end up with Int(-2). So this is really starting to look like something that has nice PyTorch Dataset behavior. You could absolutely use this as a PyTorch dataset, which indeed we did.
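
A minimal sketch of that idea, with a hypothetical TfmdListSketch standing in for the real TfmdList:

```python
# A list-like container whose indexing applies a pipeline of functions.
class TfmdListSketch:
    def __init__(self, items, tfms):
        self.items, self.tfms = list(items), list(tfms)

    def __getitem__(self, i):
        x = self.items[i]              # the underlying _get: plain indexing
        for f in self.tfms: x = f(x)   # then run the pipeline
        return x

tl = TfmdListSketch([1.0, 2.0, 3.0], [lambda o: -o, int])
assert tl[1] == -2   # items[1] == 2.0 -> -2.0 -> -2
```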

We did that back pretty close to when we started these walkthroughs, in notebook 08. Here is TfmdDS. So, going right back to it: we created an image-resizer transform that can encode an image by resizing it. We then created a pipeline. And then we actually do something a bit more complex, so we'll have to come back to this in a moment.

We use a TfmdDS, not a TfmdList. Yeah, OK. So let's go to TfmdDS so we can see how to make one. The reason that TfmdList isn't quite everything we want for a dataset is that normally, when you index into a dataset, it should return two things: the independent variable and the dependent variable.

And so far, we only have something that returns one thing. So what we can do is make something slightly more interesting called TfmdDS, which is very, very similar to TfmdList. It's still inheriting from TfmdBase, which in turn inherits from L. It still passes the items into it.

But this time, it creates a few TfmdLists. Specifically, it creates one for each list of transforms you pass in. So now we don't just pass in one pipeline; we pass in n pipelines, where n is usually 2. We usually want to set up an x pipeline and a y pipeline: an independent-variable pipeline and a dependent-variable pipeline.

So we go through each of those pipelines, and we create a TfmdList with the same items but a different set of transforms. And this is actually the thing that we used in the pets tutorial, because we said: our items is a list of file names.

The first pipeline will treat each one as the path to an image: it will open the image, resize it, turn it into a tensor, and make it a float. And the second pipeline will treat it as the source of a label: it will label it and categorize it.
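
And a corresponding sketch of the multiple-pipelines idea; TfmdDSSketch is hypothetical and keeps only the tuple-per-item behavior:

```python
# One pipeline per list of transforms, all over the same items; indexing
# returns a tuple with one element per pipeline.
class TfmdDSSketch:
    def __init__(self, items, tfms_lists):
        self.items, self.tfms_lists = list(items), tfms_lists

    def __getitem__(self, i):
        out = []
        for tfms in self.tfms_lists:
            x = self.items[i]           # every pipeline starts from the same item
            for f in tfms: x = f(x)
            out.append(x)
        return tuple(out)

ds = TfmdDSSketch([1.0, 2.0], [[lambda o: -o], [str]])
assert ds[0] == (-1.0, '1.0')   # an "x pipeline" and a "y pipeline"
```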

And so inside... in fact, let's look at it. If we run this, we should be able to look inside tds, and we should find there's a self.tls, and that should contain two TfmdLists with the same items. So let's take a look: tds.tls. Yep, there are our TfmdLists.

So the 0th one is going to have items, as you can see. And it should also have a pipeline in it. So let's go back and check TfmdList: that pipeline will be called tfms. Yes, it does. So there's tls 0 and 1. OK, so there are our two pipelines.

So it's going to be applying these different pipelines to exactly the same items. And that's why, when we say t = tds[0], we're getting back an image and a category; you can see the types here. And so when we say tds.decode, here is tds.decode:

it's going to go through each of those TfmdLists and decode each one. So yeah, Max, yes: both pipelines are going to start with the same thing, because the TfmdLists I'm creating are both being passed exactly the same list of items. And, exactly: the one for x opens the paths as images, and the second creates labels from the paths.

Well done. So here is pipeline 1. In fact, let's see if we can use it, all right? So let's grab items. Here are the items that we passed into the TfmdDS, and here's item 0: it's a path. And when you think about it, whatever your independent and dependent variables are, they're being somehow derived from the same place.

That's kind of what a labeling function is. So let's create an item, like so. So here's our item. And let's create our function for x, which would be tds.tls[0].tfms; that's our first pipeline, right? And then our second pipeline is tds.tls[1].tfms. And so there are both our pipelines.

So if we then apply fx to our item, we get our image. And if we apply fy to our item, we get back our category. So that's a useful thing to try doing: just see what's going on inside. So, could TfmdDS accept tuples instead of items?

Well, I mean, the items can be tuples, so yes, absolutely, because it's just calling a pipeline. So, well, let's try it, right? Let's create some items: (0,1), (1,2), (3,4). All right, so there are some items, which is a list of tuples.

And so we could create a function. There's a function in Python called itemgetter. So if I say fx = itemgetter(0), the way that works is... let's just make this more helpful by making this a capital L. If I apply fx to every element of that, I get back, as you can see, the first element of each tuple.

So let's create fx and fy. And now if we do fy, we'll get the second element of each tuple. And so we could create a TfmdDS whose items are these, and we're going to pass in two pipelines. The first pipeline just contains fx, and the second pipeline just contains fy.

So now if we look at tds[0], there you go. It's going to take the zeroth element, which is the tuple (0, 1), and then it's going to pass it through the two functions. The first function is itemgetter(0), which returns the index-0 thing; the second function is itemgetter(1), which returns the index-1 thing.

And then it's going to put them back into a tuple. So hopefully that answers your question as well, David. You just have to make sure that your items contain whatever information is necessary to construct the data that you want to end up in your mini-batches. Thanks for all those very, very good questions.
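
Here's a runnable version of that experiment using plain lists and operator.itemgetter, without fastai:

```python
from operator import itemgetter

# Tuple items: each pipeline just picks out one element of the tuple.
items = [(0, 1), (1, 2), (3, 4)]
fx, fy = itemgetter(0), itemgetter(1)
assert [fx(o) for o in items] == [0, 1, 3]     # first element of each tuple
assert [fy(o) for o in items] == [1, 2, 4]     # second element of each tuple
assert (fx(items[0]), fy(items[0])) == (0, 1)  # what indexing the DS returns
```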

OK. And to be clear, remember, I got to this over about 25 iterations, pretty much rewrites, over a period of many weeks. And so each piece is very simple, but they go together in very neat ways, and most people don't have to understand all these details.

But you folks, you're here because you do want to understand all these details. Don't let it bother you that it will probably take a while before it all clicks into place; hopefully these notebooks can help it click into place. OK, so going back to notebook 08 would be a really great thing to do, kind of as homework if you want some homework before tomorrow, because you'll understand what all the pieces are now.

All right, so that is TfmdDS and TfmdList. In practice, you probably won't use TfmdList much, because most of the time you want multiple sets of transforms: you want to create a data loader, which is going to have mini-batches with tuples of things. So you want TfmdDS most of the time, but TfmdDS uses TfmdList.

So remember, these things are very, very small. Each one is very, very small. So try to make sure you get a good intuitive understanding of what each thing does, and read through the tests and understand why we added each test, because these tests are not arbitrary.

They're the set of things we think provides the best clarity around the details of what each thing does. And remember that the methods section has tests as well, so you can learn more about what all these different methods do. Okay, so now something new. Oh, no, this is not new.

This is 05. So I think we've done all this before: get, split, and label we've seen before; CategoryMap we've seen before. Oh, we didn't look at pipeline setup. Okay, so we've talked briefly about pipeline setup before. In something like Categorize, if you don't pass a vocab in to Categorize, it uses setup to automatically create a vocab.

And a vocab is a CategoryMap, which is this very small little thing that simply calls .unique on the list of items to find the vocab, unless it's a pandas categorical series, in which case pandas has already done that for us.
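
As a sketch of what that setup computes (CategoryMapSketch is a hypothetical name, and the real CategoryMap also handles pandas categoricals):

```python
# The sorted unique values become the vocab, plus a reverse mapping.
class CategoryMapSketch:
    def __init__(self, items):
        self.vocab = sorted(set(items))    # like calling .unique()
        self.o2i = {v: i for i, v in enumerate(self.vocab)}

cm = CategoryMapSketch(['dog', 'cat', 'cat'])
assert cm.vocab == ['cat', 'dog'] and cm.o2i['dog'] == 1
```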

So the key thing here is calling setup, or setups: it's a transform method, a type-dispatched method. So if we look at Pipeline, let's see if we can find some setup examples. We don't have good setup examples; that sounds like an oversight. I guess the setup examples probably won't come until we look at DataSource. Well, here's a good place to look at them.

So let's learn about setups by looking at Categorize. Categorize, I've already taken you through this code. The key thing is that we need to make sure we call setup at exactly the right time. If you look here in pets, first we need to label, and then we need to categorize.

And more importantly, when we create our vocab, we need to create it after calling the labeller. So what happens is that setup is also going to be called as part of the pipeline. If you look at the pipeline, here it is, here's setup. What we actually do is set self.fs to an empty list.

So we say we actually don't know what our transforms are yet. Then we store that list of functions in a temporary thing called tfms, and we go through each tfm and add it. And what add does is call setup and then add the transform to the list.

The reason we do it in this rather awkward way is that, this way, by the time we set up Categorize, it will have already added the labeller to the pipeline. So it's going to go through, and it's going to say: okay, the first thing is the labeller.

So it will self.add the labeller, which doesn't really have a setup, so it just gets added to the list. Then it will add Categorize. And notice it calls setup before it adds it, right? Because we can't append it before we call its setup. And so that's why Categorize's setup is not going to get the raw paths.

It's going to get the labels after they've been extracted from the paths. So this is a really important detail that we found pretty tricky to get right. But now that it's there, we find it super handy, because the right information goes automatically to the right parts of the pipeline.
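
Here's a sketch of that ordering trick, with hypothetical names; the real Pipeline.setup does more bookkeeping:

```python
# Each transform's setup runs against the pipeline *before* the transform
# is appended, so a Categorize-style setup sees already-labelled items.
class PipelineSketch:
    def __init__(self, tfms):
        self.fs = []                 # start by saying we have no transforms
        for t in tfms: self.add(t)   # then add them back one at a time

    def add(self, t):
        if hasattr(t, 'setups'): t.setups(self)  # setup first...
        self.fs.append(t)                        # ...then append

    def __call__(self, x):
        for f in self.fs: x = f(x)
        return x

class CatSketch:
    # Stands in for a real setup that would derive the vocab from items.
    def setups(self, pipe): self.vocab = ['cat', 'dog']
    def __call__(self, x): return self.vocab.index(x)

p = PipelineSketch([str.lower, CatSketch()])
assert p('DOG') == 1
```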

So things can set themselves up automatically, which is really handy. And that's the key thing to understand in Categorize: its setup. Yes, this is really the first time we properly test and display how setup works. And so you can see here that we create a TfmdDS with some cat and dog strings as our items.

And our pipeline is just a Categorize transform. And so at the end of that, the vocab should be cat and dog, and it is. MultiCategorize doesn't have any new information; it's the same stuff. But it's a good way to get a second angle on how setup works.

So now that we've got all that, we have all the information we need to create TfmdDL. If you go back and look at TfmdDL again, it'll be more straightforward. You can see that we go through each of after_item, before_batch, and after_batch; these are all things in the data loader's list of steps it goes through.

And if you pass in any of those keyword arguments, it will grab that keyword argument and turn it into a pipeline, and it will then set it up. And notice here that it passes itself, because pipeline setup, generally speaking, needs to know what items to set up with.

So, for example, in Categorize, the setup receives the actual list of labelled items to create a vocab with. OK, so the key thing here is that we now know what these pipelines are. And you can now also look and see how decode is actually working: it's calling decode on each of those things in the pipeline.

OK. So now we can look at some more examples. We can now see that Cuda is just a transform that has an encodes and a decodes: the encodes, we've seen this before, puts the batch on the device, and the decodes puts it back on the CPU. And this is pretty cool, because I don't think there are any other frameworks that will automatically put things back on the CPU for you when you're all done with them, for the purpose of displaying them or whatever.
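
A sketch of that idea in plain PyTorch; CudaSketch is a hypothetical name, and the real Cuda transform is integrated with fastai's Transform class:

```python
import torch

# encodes moves a batch to the device; decodes brings it back to the CPU
# so it can be displayed (and then garbage-collected) safely.
class CudaSketch:
    def __init__(self, device=None):
        self.device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
    def encodes(self, b): return b.to(self.device)
    def decodes(self, b): return b.cpu()

t = CudaSketch()
x = torch.zeros(2)
assert t.decodes(t.encodes(x)).device.type == 'cpu'
```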

And so you don't have any memory leaks. ByteToFloatTensor we've seen, normalization we've seen, and DataBunch we've seen. OK, so we're gradually working back up here, which is nice. And hopefully, yes, we're now up to 06, which is really what I wanted to get to today.

And 06 introduces DataSource. To remember what DataSource does, let's look at the 08 pets tutorial: DataSource is something which is almost identical to TfmdDS. In fact, let's go back and have a look at our version of pets with TfmdDS. This is the TfmdDS version of pets.

We had two sets of transforms: one which is image create, and one which was label and categorize. Then we created a TfmdDS, passing in the items and the transforms. And then we made that into a data loader, passing in some after_item transforms to happen there. Notice that if I copy just the tds cell and paste it here,

and line them up so you can see: the TfmdDS and DataSource versions are almost identical, and the way you use them is almost identical. The difference is one extra argument, which is the filters. The filters tell the DataSource how to do this, which is to get a subset.

And all it does, literally, is: subset(1) simply returns a new TfmdDS which contains not all the items, but the items for split index 1. To remind you (this was a while ago), the split indexes were just a tuple with two things in it: the list of indexes that are in the training set and the list of indexes in the validation set.

And pets.subset(1) has another name, which is .valid; it's exactly the same thing. And pets.subset(0) is also called .train. So all this DataSource is doing is giving us something that looks like two different TfmdDSs: one which is going to give us back things from the validation set, and one that's going to give us back things from the training set.

And the way it does that is by passing in filters. The way that works is actually nice and easy. As you can see, DataSource is much less than a screen of code, and quite a lot of that is actually the thing that creates a DataBunch. As you can see, it's a subclass of TfmdDS.

So it behaves a lot like TfmdDS, because it is a TfmdDS, but it's a TfmdDS that also has subset. And subset is something that's going to pass this TfmdDS into something called make_subset. And make_subset is going to grab all of our transforms and create, as we discussed, a new TfmdDS containing just the subset of the items that are in filter i,

so in our split index 0 or split index 1. It's going to pass in the transforms. And a key thing it's also going to pass in is do_setup=False, because we don't need to recreate the vocab; we already have one, for instance. So this is just a TfmdDS for a subset of items.
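
A sketch of the subset idea, with hypothetical names and with the pipelines left out for brevity; the real make_subset also passes the transforms and the filter index along:

```python
# A DataSource-like wrapper holding split indices; subset(i) gives a plain
# dataset over just the items whose indices are in filter i.
class DataSourceSketch:
    def __init__(self, items, filts=None):
        self.items = list(items)
        # Default: a single filter containing every index.
        self.filts = filts if filts is not None else [range(len(self.items))]

    def subset(self, i):
        return [self.items[j] for j in self.filts[i]]

    @property
    def train(self): return self.subset(0)
    @property
    def valid(self): return self.subset(1)

dsrc = DataSourceSketch(range(5), filts=[[0, 2], [1, 3, 4]])
assert dsrc.train == [0, 2] and dsrc.valid == [1, 3, 4]
```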

That's basically all a DataSource is. In terms of the other code here, it's just doing a bit of bookkeeping and checking. So, for example, these filters that we pass in: you can pass in as many as you like. Normally, there will be two.

That's a list of indexes for the training set and a list for the validation set, but you can do more. And I just check here to make sure that there are no indexes in the training set that are also in the validation set. So we try to make sure that good data-science practices are followed;

it will let you know if they're not. OK. There's something else interesting about our filters, though, and that is that when we create the subset, we actually pass a filter parameter into TfmdDS. What does that do? Let's find out. So, as you can see, TfmdDS takes a filter argument.

And it doesn't really do anything with it other than pass it on to the lists it creates. So fine, let's look at that. TfmdList gets a filter argument, and what does it do with it? The answer is: nothing much. It just passes it on to the pipeline that it creates.

OK, so what does the pipeline do with it? The pipeline grabs the filter. And what does it do with it? It stores it. Why? Because when we call __call__ or decode, it passes the filter along as a parameter. So what does compose_tfms do with it? The answer is nothing, really.

It's just a keyword argument that it passes to our function. What does our function, or transform, do with it? The answer is: whatever you like. So the key insight here is that our transforms actually have the ability to know whether they're being called on the training set or the validation set.

And actually, by default, there is something a transform does with that. When you create a transform, you can pass in a filter. And if you pass in a filter when you create the transform, that says this transform should only be applied to that particular subset.

For example, data augmentation: the data augmentation transforms, I think, by default always set filter to 0. If filter is None, it means you should apply the transform all the time; but if filter is 0, it means you should only apply it to the training set. Which is what we want, right?

We want our data augmentation applied only to the training set. So when we call this transform, it's going to be passed the filter, because remember, the pipeline passes it along. And you can see here: if our filter is not None, and the filter we're called with is not this transform's filter, it does nothing at all.

Okay, so that's the default behavior for a transform: it will be disabled if you set the transform's filter and then call it with some different filter. So this is just another of these nice things to make sure you don't accidentally do things you wouldn't want to do: we make sure it's only being called on the training set where that's appropriate.
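
A sketch of that gating behavior, with hypothetical names; the real Transform integrates this check into its own call handling:

```python
# A transform created with a filter only runs when called with that same
# filter (subset index); filt=None means "apply everywhere".
class FilteredTfm:
    def __init__(self, func, filt=None): self.func, self.filt = func, filt

    def __call__(self, x, filt=None):
        # Disabled when this transform's filter doesn't match the caller's.
        if self.filt is not None and filt != self.filt: return x
        return self.func(x)

aug = FilteredTfm(lambda o: -o, filt=0)   # e.g. augmentation: training only
assert aug(3, filt=0) == -3   # applied on the training subset (filter 0)
assert aug(3, filt=1) == 3    # skipped on the validation subset (filter 1)
```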

Most of the time, you don't have to worry about it, because most of the time, when you create a transform, you're not passing in a filter, so this check does nothing at all. But if you do, or more generally if you have some transform that seems not to be doing anything, then you should check: maybe you created it with a filter.

So yes, Max, a filter is just an integer. I think, yes, it's just an integer. But, you know, you could inherit from Transform and replace how _call works, and actually have things that work differently for training versus validation. So the key thing here is that the infrastructure is in place so that transforms can behave differently.

They actually know where they're being called from. So you can see this in the DataSource tests; let's start by looking at them. So here are some items, 0 through 4. Here's a DataSource over 0 through 4, and the pipeline it's going to apply does nothing at all.

So it's an empty pipeline. The DataSource has a list of filters, and we didn't pass in any filters, so by default it's going to have one filter, which is everything in the list. So if we just grab item number two, then, here is zero, one, two: it should return two.

OK, if we grab items one and two, we're going to get back one and two. And notice they're being turned into tuples. The reason for that is that, remember, this is a TfmdDS, and TfmdDSs take a list of pipelines and create a tuple with an element for each pipeline.

We have one pipeline, so we get back tuples with one thing in them. This is what you want in PyTorch mini-batches, right? A dataset and a data loader should return tuples. So Pedro's question is: should we call retain_type back here? And the answer is no. retain_type gets passed two things: the new result of whatever functions we called, and the original thing we were passed.

And it makes sure that the result, res, has the same type as the original if res ends up with a less specific type. So if res ends up a plain tensor and x is a TensorImage, it will turn res into a TensorImage. In this case, nothing happened to x; it didn't change.

So we have no retain_type to do; it's already the same type as x, because it is x. OK. You can also index into TfmdLists, and therefore DataSources, with boolean masks instead of indexes. So that's just that. They also work on data frames, and it's important that they work on data frames.

It's not just that they work on data frames, but that they work on data frames in an optimized way: they'll actually use the data frame's iloc accessor to do things efficiently. OK. How do we set up a pipeline where a transform of x depends on y? Let's look at that next time, or sometime in the next couple of days.

That's a great question. And do remind me in the next couple of days if I forget. OK. So then you can see here's the same thing, basically: passing in our range with no transforms, but this time passing in some filters. So now there are two sets of filters, and there's subset(0), which is the same as the training set.

There's subset(1), which is the same as the validation set. So that's that. Oh, batteries; that's fine. And the filters can also be masks; they don't have to be ints. Great. OK, well, I think that's enough for today. So, yeah, let me know any questions you've got, but hopefully these things are starting to come together.

All right, thanks everybody, see ya.