fastai v2 walk-thru #9

00:00:00.000 | Hi, can you all see me and hear me okay?

00:00:19.000 | Great.

00:00:30.560 | What does 5x5 mean, Fred?

00:00:45.640 | And does anybody have any requests for stuff they would like to see today, if you'll ask?

00:00:54.300 | Oh, good.

00:00:59.400 | I think 5x5 is being the weightlifting I do.

00:01:04.080 | Five reps.

00:01:05.080 | Five sets of five reps.

00:01:09.960 | Okay.

00:01:12.880 | So in the absence of requests, I will show you something that's changed.

00:01:17.760 | We've renamed things a little bit, as you can see.

00:01:25.160 | Transform now has some more stuff in it.

00:01:39.280 | Specifically it has pipeline.

00:01:40.280 | The pipeline has been moved into transform.

00:01:44.560 | Transform and pipeline are not specifically to do with data, they're just ways of doing

00:01:49.200 | functions and dispatch, basically.

00:01:55.500 | And then data core -- no, we don't have a tentative release date.

00:02:09.560 | Data core contains TfmdDL, DataBunch, TfmdList, and DataSource.

00:02:18.800 | So there's not a TfmdDS anymore.

00:02:25.440 | And then 06 is a new module called data.transforms, which is where, you know, some standard transforms

00:02:39.520 | live, basically, and not just standard transforms, but also stuff you would use in standard transforms

00:02:45.480 | like get_files and split and stuff like that.

00:02:50.500 | So those are those three things.

00:02:56.280 | If you're interested, I can tell you a bit about what happened with DataSource and

00:03:00.640 | TfmdDS. Because it's kind of an interesting design question, and I'm not sure I have a simple

00:03:12.000 | rule of thumb for it, but basically, we like to have layers where each thing does

00:03:17.880 | one thing, and does it separately from other things.

00:03:24.320 | But if you have too many layers, then debugging gets confusing, and so I kind of find my approach

00:03:31.000 | to designing is extremely iterative.

00:03:32.280 | In fact, it's entirely iterative.

00:03:33.280 | I don't really design much upfront at all.

00:03:37.760 | And I found that we were getting weird bugs in DataSource, and, you know,

00:03:46.120 | weird bugs mean that something's not clear enough or, you know, the things that you think

00:03:50.440 | are in your head aren't really in your head the way you thought they were.

00:03:56.320 | And I realized that a DataSource without any filters, without any subsets, basically,

00:04:09.360 | was the same as a TfmdDS.

00:04:14.320 | And then that made me think having two separate classes for those things kind of seemed weird,

00:04:18.600 | and I wondered if we put them all in the same class, what would it look like?

00:04:22.840 | And as you can see, DataSource, doing the job of both DataSource and TfmdDS now, is, if anything,

00:04:32.760 | shorter than the DataSource that was inheriting, and it's ended up clearer, which is interesting.

00:04:39.280 | The only thing it inherits from is something called FilteredBase, which is super tiny.

00:04:44.760 | It's basically just something for which you have to define subset, and then it's going to define

00:04:54.440 | train and valid properties for each of your two subsets.

00:05:00.240 | And the other thing it does is it adds a databunch method, which will create a DataBunch containing

00:05:08.880 | a default TfmdDL for each of your subsets.

00:05:17.160 | And one of the nice things is that means that TfmdList can also inherit from FilteredBase,

00:05:23.960 | which means that you can create a DataBunch from a TfmdList, or you can create a training

00:05:28.600 | or validation set from a TfmdList.

00:05:31.000 | So yeah, if you don't need multiple independent pipelines creating a tuple thing, then this

00:05:39.520 | might be an easy way to create really simple data sources.
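As a rough illustration of the idea described above (a base class where subclasses only define subset, and train/valid come for free), here is a minimal sketch. The names FilteredBaseSketch and SimpleList are hypothetical and this is not the library's actual code; it also leaves out the databunch part.

```python
class FilteredBaseSketch:
    "Hypothetical stand-in: subclasses only have to implement `subset(i)`."
    def subset(self, i):
        raise NotImplementedError

    @property
    def train(self):
        return self.subset(0)   # first subset is treated as the training set

    @property
    def valid(self):
        return self.subset(1)   # second subset is treated as the validation set


class SimpleList(FilteredBaseSketch):
    "Toy collection whose subsets are picked out by lists of indices (`filts`)."
    def __init__(self, items, filts):
        self.items, self.filts = list(items), filts

    def subset(self, i):
        return [self.items[j] for j in self.filts[i]]


ds = SimpleList("abcde", filts=[[0, 1, 2], [3, 4]])
print(ds.train, ds.valid)
```

The subclass never mentions train or valid; defining subset is enough.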

00:05:49.040 | So yeah, the tests that were in TfmdDS are still here, all the same tests are still here,

00:06:01.480 | but now they all say DataSource.

00:06:03.320 | So here's an example of a DataSource test without any splits being applied.

00:06:11.640 | So it's just acting, so there's no use of train or valid or whatever.

00:06:20.000 | And here's one that does have filters applied.

00:06:21.680 | So we can check train and valid, as you can see.

00:06:30.600 | And then the actual creating of the filters is done in TfmdList.

00:06:42.040 | So that's part of why DataSource is so simple now, because DataSource is simply something

00:06:47.080 | that contains a TfmdList for each transform that you pass in.

00:06:58.720 | So that's a change.

00:07:04.920 | And our code ended up much simpler, and it's easier to debug, and the weird bugs we had

00:07:11.260 | went away, so that was all good.

00:07:17.120 | Another change, which is less substantive, is tabular.

00:07:24.480 | So now that we have this FilteredBase class, Tabular doesn't really need to use DataSource

00:07:30.360 | anymore.

00:07:31.440 | It can just inherit from FilteredBase, and it will get a train and validation set automatically.

00:07:41.900 | So it just has to define subset, because that is the one not-implemented thing that subclasses

00:07:56.520 | have to define.

00:07:59.800 | And so now there's no-- so as you can see, tabular is actually a bit smaller and simpler

00:08:03.760 | now.

00:08:06.320 | It doesn't have to have a datasource method anymore.

00:08:10.760 | If you want to create a train and valid set, you can just pass in filts.

00:08:16.560 | So if we have a look, here's an example of creating a Tabular without passing in filts,

00:08:21.880 | and so it just acts like a normal data frame type thing.

00:08:27.640 | And here is a processed one with Categorify, just like before.

00:08:35.000 | And then here is one with splits, which-- that's going to be confusing, because we call it

00:08:43.880 | splits in one place and filts in the other.

00:08:46.480 | We should change that.

00:08:49.320 | We'll add a note.

00:08:59.160 | OK.

00:09:06.240 | How do I navigate code?

00:09:07.760 | I mainly use Vim.

00:09:12.160 | So I don't have to too much, because my code's pretty small and self-contained.

00:09:18.440 | But if I do need to jump around, I just use Vim using its tags functionality.

00:09:27.440 | And the other thing is the source link functionality.

00:09:39.520 | Actually, there's a-- what's the one that gives it to us? nb_source_link.

00:09:50.040 | There we go.

00:09:51.040 | So you can do it this way.

00:09:54.040 | You get something you can click on, and it will take you straight to the right spot.

00:10:04.120 | That looks like it is-- yep, it is pipeline.

00:10:07.720 | OK.

00:10:08.720 | So that's another option.

00:10:13.720 | Quite often, I just want to see how something is defined, in which case I'll just do the

00:10:20.960 | question marks just to double check.

00:10:24.680 | But yeah, we have-- you can attach VS Code to a remote terminal easily enough.

00:10:34.920 | And so you can always explore it through VS Code or whatever.

00:10:39.400 | But yeah, it works fine in Vim.

00:10:45.760 | So I could go :tag Pip, Tab, and it will tab complete to Pipeline.

00:10:54.440 | And there is the class, as you can see.

00:10:59.800 | And then if I go to-- oh, I want to know what Transform is, so Ctrl-],

00:11:06.800 | and that will take me straight to the definition of Transform, and so forth.

00:11:10.080 | Yeah, I guess most editors do the same stuff.

00:11:12.840 | Just remember, in local, you've got a full, browseable set of modules.

00:11:20.720 | What kind of weird bugs did I have?

00:11:22.800 | Oh, you know how it is.

00:11:25.040 | After you've fixed a bug, you can throw it out of your head.

00:11:30.200 | One of the big challenges is around setup.

00:11:36.680 | Setup's actually quite tricky.

00:11:40.560 | So what we do in pipeline with setup is we, first of all, make a copy of our transforms.

00:11:54.240 | And we then clear our transforms.

00:11:56.880 | And then we go through the copy of the transforms and add them back one at a time.

00:12:01.800 | And after adding each one back-- well, before adding each one back, we call setup.

00:12:06.060 | And then we add.

00:12:09.120 | If you don't do this, if you just call setup on all of them after adding them all, you

00:12:14.620 | kind of have this weird thing where all of your transforms are being called even before

00:12:19.560 | they're set up.

00:12:21.120 | So you kind of have to add checks inside your transforms to make sure whether they're set

00:12:25.260 | up yet or not.

00:12:26.260 | And if they're not, then you'd like to do nothing.

00:12:28.680 | It's super awkward.
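The setup order described above (copy the transforms, clear the list, then set up and re-add them one at a time) can be sketched roughly like this. PipelineSketch, Lower and Categorize are hypothetical toy classes written for illustration, not the library's implementation; the point is only that each transform is set up on data that has passed through the already-added transforms and nothing else.

```python
class PipelineSketch:
    def __init__(self, tfms):
        self.fs = list(tfms)

    def __call__(self, x):
        for f in self.fs:
            x = f(x)
        return x

    def setup(self, items):
        tfms = self.fs[:]      # copy the transforms
        self.fs.clear()        # empty the list in place
        for t in tfms:
            # `self(o)` only applies the transforms added back so far
            t.setup([self(o) for o in items])
            self.fs.append(t)


class Lower:
    def setup(self, items): pass
    def __call__(self, x): return x.lower()


class Categorize:
    def setup(self, items): self.vocab = sorted(set(items))
    def __call__(self, x): return self.vocab.index(x)


pipe = PipelineSketch([Lower(), Categorize()])
pipe.setup(["Dog", "cat", "DOG"])
print(pipe.fs[1].vocab, pipe("Cat"))
```

Categorize never gets called before its vocab exists, so it needs no "am I set up yet?" check.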

00:12:30.880 | And so like one of the problems was in the train and valid subsets, they both had their

00:12:40.160 | own kind of copy of the same pipeline.

00:12:43.360 | And previously, I wasn't going tfms equals-- I wasn't clearing it out like this in place.

00:12:49.680 | But instead, I was going like, self.fs, tfms equals empty list, self.fs.

00:13:13.040 | So before I was doing it like that, which kind of looks like it's doing the same thing,

00:13:17.600 | right?

00:13:18.600 | It's setting self.fs to be empty, and it's setting tfms to be my previous set, and it looks the same.

00:13:26.760 | But the problem is that if there are other pipelines that are pointing at the same list

00:13:31.440 | of transforms, they're not being emptied out by this, whereas self.fs.clear does empty

00:13:38.320 | them out.

00:13:41.000 | So that was an example of a weird bug with the old version.

00:13:47.640 | Things weren't setting up properly.

00:13:52.080 | And it was kind of hard to debug, because there was just a few too many layers.
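The difference between rebinding a name to a new empty list and clearing the existing list in place is plain Python behavior, and a small standalone sketch shows it. The variable names here are made up for illustration and stand in for two pipelines sharing one list of transforms.

```python
# Two "pipelines" pointing at the same list object.
shared = ["tfm_a", "tfm_b"]
train_fs = shared
valid_fs = shared

# Rebinding: train_fs now names a brand new list; valid_fs still sees the old one.
tfms, train_fs = train_fs, []
rebinding_emptied_valid = (valid_fs == [])

# In-place clear: every name that points at the list sees it become empty.
train_fs = valid_fs = shared
tfms = shared[:]          # keep a copy before clearing
shared.clear()
clear_emptied_valid = (valid_fs == [])

print(rebinding_emptied_valid, clear_emptied_valid)  # False True
```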

00:14:06.080 | OK.

00:14:09.600 | So in Tabular now, we don't have to call the Tabular object's datasource anymore.

00:14:22.400 | We can just pass splits, which I think I'll rename into "filts," or maybe I'll call them

00:14:27.360 | all "filts."

00:14:28.360 | Or maybe I'll call them all "filts."

00:14:29.360 | Anyway, we'll make them more consistent.

00:14:31.600 | We can just pass that into our constructor.

00:14:37.120 | And so the other thing about this is we don't have to call setup anymore.

00:14:40.360 | We have all the information we need to set up as soon as we instantiate this.

00:14:44.440 | So we just call setup directly in __init__.

00:14:50.660 | Another example of weird bugs to avoid, again, it's the subset functionality.

00:14:56.000 | When we subset, we want to create a new tabular object with a slice, the split of what we

00:15:05.680 | want.

00:15:09.220 | But we had to make sure that in new, we do setup=False, otherwise when you create

00:15:14.320 | the subset, it's going to rerun setup, which would be annoying.

00:15:19.560 | So we found the bug, because we added some tests and found they weren't passing.

00:15:26.640 | So we always try to think of tests that we can add.

00:15:33.800 | So yeah, tabular_rapids, you can check out.

00:15:37.800 | It's in 42.

00:15:40.340 | It's missing an underscore from the front, so that suggests that I haven't been working

00:15:43.880 | on that.

00:15:44.880 | It's been Sylvain's baby.

00:15:45.880 | But that suggests that it should be more or less working.

00:15:48.920 | So you could certainly try it out.

00:15:51.100 | It certainly hasn't been much used, though.

00:15:53.280 | So it might be a bit buggy still.

00:15:56.440 | But yeah, hopefully you'll find that's working.

00:16:00.320 | So I believe it's a lot faster than the pandas one.

00:16:06.960 | OK.

00:16:10.600 | So those are those changes.

00:16:13.840 | So everything else here is basically the same.

00:16:16.440 | Oh, and then the other thing I did is I added databunch.

00:16:24.360 | So that was nice and easy, because databunch is now in FilteredBase.

00:16:29.880 | So we get that for free.

00:16:31.840 | Sorry, Marlon, I don't know what you mean by probabilistic inference.

00:16:38.200 | OK.

00:16:40.300 | So that's that.

00:16:45.160 | So maybe we can go back and look at 00 and 01 a little bit.

00:16:58.980 | That'll be fun.

00:17:06.140 | And actually, I don't know if you remember, but 00 and 01 aren't quite the start.

00:17:14.340 | There's all the ones that start with 9, which is the notebook stuff, which I don't know

00:17:21.440 | that we'll bother looking at.

00:17:23.080 | But there's also a special one, which is imports.py.

00:17:29.780 | And that is not generated by a notebook.

00:17:32.280 | And so we actually start with imports.py.

00:17:35.500 | So that's got all the imports, as you can see.

00:17:41.680 | These types here, I think, are only in Python 3.7.

00:17:44.840 | So we patch them in if they're missing.

00:17:53.240 | And then we have a tiny number of little functions just for checking equality or doing nothing

00:18:02.000 | and checking if something's an iterator or a collection.

00:18:04.520 | I think these are probably things we needed in the notebook, the notebook notebooks.

00:18:12.120 | So that's why they're here.

00:18:16.280 | So that one's not created by a notebook.

00:18:24.520 | So yeah.

00:18:25.520 | So going all the way back to 00, the first thing I wanted to write was something which

00:18:34.140 | would test whether a and b could successfully be compared using some comparator.

00:18:42.640 | For example, test whether [1, 2] and [1, 2] are equal.

00:18:51.840 | Problem is that this could pass and be wrong, because what if test always returned

00:18:58.040 | true?

00:18:59.040 | I actually needed a way to test whether it successfully fails.

00:19:04.240 | With my tests, the idea is that they always throw an exception if they fail, specifically

00:19:09.680 | an assertion error.

00:19:12.200 | The reason for that is that if you run a notebook that causes an exception, you'll get a nice

00:19:17.280 | stack trace and all that kind of a thing.

00:19:18.680 | So it's a good way to show a test failure, in my opinion.

00:19:21.960 | So that means I needed to have a way to test for failures.

00:19:26.160 | You can't test for failures by just passing the code directly in like that, because that

00:19:39.360 | would actually run this code, it would cause an exception, and that's it.

00:19:45.560 | The exception already happens.

00:19:47.060 | So you always have to put a lambda there so it doesn't actually run it.

00:19:52.160 | So the first thing I actually needed to do was create a test_fail function, which will

00:19:57.600 | try to call the function.

00:20:01.040 | And if there is an exception, then you may have passed in contains, which says I want you

00:20:08.200 | to make sure that the string of the exception contains something, so we either make sure you

00:20:12.320 | didn't pass that or that it was in there, and then return.

00:20:16.320 | So if you didn't end up in the exception clause, then the test failed.

00:20:19.800 | We didn't get an exception, so that's test_fail.

00:20:22.540 | So that was kind of step one, is something that would allow us to test for failures.

00:20:26.320 | And so here's something that checks that we actually get a failure.

00:20:35.320 | And so then we can test our test with equals and not equals for both failing and succeeding.

00:20:46.840 | So all_equal was one of the things that was defined in local.imports, but we can still

00:20:53.080 | display it here.

00:21:01.920 | And then we can create nequals.

00:21:09.400 | And, yeah, so then we can start using the fact that we have a general purpose test of a and

00:21:21.000 | b with some comparator to start defining things like test_eq, which is the one we normally

00:21:26.520 | use for testing that a and b are equal.

00:21:30.960 | And then this is just what's printed.

00:21:32.480 | If there's a failure, it'll tell us what the failure was.

00:21:39.760 | So equals tries to kind of do the right thing.

00:21:56.960 | So if either of them has an __array_eq__ method, then we should use that to test for

00:22:04.240 | array equality, that's kind of the Python or the NumPy protocol for checking for array

00:22:09.840 | equality.

00:22:10.840 | If one of them is an ndarray, we can use NumPy.

00:22:14.700 | If one of them is a string or a dict or a set, we can just use operator.eq.

00:22:20.120 | If one of them is an iterator, we can use all_equal, which, as you can see, checks whether

00:22:27.600 | everything in each one is equal.

00:22:35.320 | Otherwise we'll just use operator.eq.

00:22:37.520 | So we try to kind of make equals work across a variety of types.

00:22:57.520 | And that's why you can see test_eq being checked with all kinds of things like arrays

00:23:02.520 | and dictionaries and data frames, series, so forth.

00:23:11.040 | So that's the main one we use all the time in our tests.

00:23:14.840 | Sometimes we use test_eq_type, which tests whether a and b are equal, and also tests

00:23:21.000 | whether their types are equal.

00:23:32.000 | And if you pass a list or a tuple, then we'll also check that the types of all of its contents

00:23:38.000 | are equal.

00:23:39.720 | So there's test_ne for not equals, and test_close for checking that two things are close.

00:23:52.640 | Okay.

00:23:53.640 | So that's 00.

00:23:56.320 | All right.

00:23:59.920 | I'm not going to look at metaclasses just yet.

00:24:09.600 | So here is 01 core.

00:24:17.680 | So quite often we use patch.

00:24:21.760 | For example, we use it for ls. We have here def ls(self:Path).

00:24:51.520 | And it has @patch.

00:24:55.040 | So what that does is if we say p = Path('.'), you can go p.ls().

00:25:11.400 | So how does that work?

00:25:15.280 | Well, remember, a decorator in Python is simply passed its function as an argument.

00:25:24.800 | So in this particular case for @patch def func, patch will be passed func.

00:25:38.480 | And so then that function, we want to find out what to patch.

00:25:45.900 | So we want to patch this parameter's type.

00:25:54.240 | And so to find that parameter's types, we go through all of the annotations and just

00:25:59.960 | find the first one, which means this is like, in some ways, I mean, it won't tell you if

00:26:06.520 | you do something dumb like that.

00:26:10.200 | It'll still end up being patched to _T3.

00:26:16.000 | But that's fine.

00:26:17.160 | I don't always check for every dumb thing you might do.

00:26:23.460 | Just as long as the behavior works correctly when used correctly and the really obvious

00:26:28.040 | mistakes are checked for.

00:26:30.200 | So that's going to tell us what type we're patching.

00:26:34.640 | And then it will patch to that type with this function.

00:26:40.120 | And so here's patch_to, and there's really not much to tell you about that.

00:26:48.000 | It just goes through and uses the functools stuff to make sure all of the metadata is

00:26:53.720 | correct, and it will set in this class, with this name, the function that we asked for.
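Here is a stripped-down sketch of the mechanism just described: the decorator reads the first annotation of the function it receives and attaches the function to that class. The names patch_sketch, patch_to_sketch and ls_sketch are hypothetical, and the metadata copying mentioned above is deliberately omitted to keep the sketch short.

```python
from pathlib import Path


def patch_to_sketch(cls):
    "Return a decorator that attaches the function to `cls` under its own name."
    def _inner(f):
        setattr(cls, f.__name__, f)
        return f
    return _inner


def patch_sketch(f):
    "Attach `f` to the class named by its first annotated parameter."
    first_annotation = next(iter(f.__annotations__.values()))
    return patch_to_sketch(first_annotation)(f)


@patch_sketch
def ls_sketch(self: Path):
    return sorted(self.iterdir())


entries = Path(".").ls_sketch()   # now available on every Path instance
```

As in the walkthrough, nothing here checks that the annotated parameter is actually the first one; it simply takes the first annotation it finds.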

00:27:08.600 | Which is better, Win or Ubuntu?

00:27:10.320 | Oh, it's up to you.

00:27:12.400 | I use Ubuntu in my server here, as you see.

00:27:17.240 | And I use Windows on my computer because I do a lot of-- I like to draw things a lot

00:27:21.600 | when I'm talking, so I like to use something with a stylus.

00:27:28.780 | And I-- yeah, there's a lot I like about Windows on my desktop.

00:27:34.760 | OK, so that's patch.

00:27:39.840 | So then we've got a different thing, which is patch_property.

00:27:45.000 | And patch_property does the same thing as patch, but it passes as_prop=True,

00:27:52.040 | which as you can see simply turns a function into a property.

00:27:57.920 | Because remember, when you say @property in Python, property is just a decorator, so

00:28:01.920 | you can use it as a function.

00:28:03.520 | So here it is being used as a property.
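Since property is an ordinary callable, a function can be turned into a property after the class has been defined. A tiny standalone sketch, where the Box class and area function are made up for illustration:

```python
class Box:
    def __init__(self, w, h):
        self.w, self.h = w, h


def area(self):
    return self.w * self.h


# Same effect as writing `@property` above a method inside the class body.
Box.area = property(area)

print(Box(2, 3).area)  # 6
```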

00:28:07.600 | So why not use wraps?

00:28:12.320 | The-- what was it?

00:28:19.780 | Oh, yeah.

00:28:26.200 | This is obviously the comment that was telling me it was something about pipeline.

00:28:33.780 | This is basically doing the same thing as functools.update_wrapper or whatever it's called.

00:28:42.280 | And it's setting the function with its name to the attribute.

00:28:49.380 | I don't remember anymore.

00:28:51.680 | Maybe this is now obsolete, because I added a comment to here to remind myself why I did

00:28:56.800 | it, but now I don't understand the comment, so I'm not sure.

00:29:04.640 | functools.update_wrapper.

00:29:08.960 | Let's see what it looks like.

00:29:13.680 | So it uses WRAPPER_ASSIGNMENTS as defined, goes through each one, and it grabs it, and

00:29:24.720 | it sets it to the value.

00:29:48.960 | So I'm not doing this bit, and I don't remember what that is, but maybe there was some reason

00:29:52.560 | why we do that, although-- yeah, I'm not sure.

00:30:02.920 | Yeah, I'm not sure.

00:30:06.800 | Maybe we can now.

00:30:11.440 | OK.

00:30:15.640 | So then we have things like delegates-- yeah, sorry, I know you meant wraps, but wraps

00:30:24.720 | just calls update_wrapper, so that's all wraps is.

00:30:31.840 | As you can see, functools.wraps, yeah, so that's all it is.
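To make the relationship concrete, here is a small standard-library example showing that calling functools.update_wrapper directly and using the functools.wraps decorator leave a wrapper with the same copied metadata. The function names are made up for illustration.

```python
import functools


def original(x):
    "Original docstring."
    return x


def wrapper_a(*args, **kwargs):
    return original(*args, **kwargs)

# Explicit call: copies __name__, __doc__, etc. and sets __wrapped__.
functools.update_wrapper(wrapper_a, original)


# Decorator form: does the same thing via update_wrapper.
@functools.wraps(original)
def wrapper_b(*args, **kwargs):
    return original(*args, **kwargs)


print(wrapper_a.__name__, wrapper_b.__name__)
```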

00:30:47.120 | OK.

00:30:49.040 | So delegates we've kind of looked at before.

00:30:56.240 | So that's the thing that allows us-- you can either use delegates passing in nothing at all,

00:31:03.040 | in which case it will delegate your __init__ to your base class's __init__.

00:31:08.800 | So you can see here how I'm testing it, right?

00:31:10.640 | I've added a little thing called test_sig, which checks that the signature string of

00:31:19.120 | f is equal to whatever you pass it.

00:31:22.920 | So here you can see we've got a foo, and we've got a, b=1, and kwargs, and then kwargs

00:31:31.800 | is being delegated to basefoo, which has e and c=2.

00:31:39.800 | And so that's not a kwarg, that is a kwarg.

00:31:42.880 | So it's going to therefore end up as a, b=1, and c=2.

00:31:48.060 | So we can see the signature is grabbing that stuff from basefoo.
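The signature rewriting described above can be sketched with inspect alone. This is a simplified, function-only version under stated assumptions: the name delegates_sketch is hypothetical, it always drops **kwargs, and it does not handle the "delegate __init__ to the base class" case.

```python
import inspect


def delegates_sketch(to):
    "Replace `**kwargs` in the decorated function's signature with `to`'s keyword defaults."
    def _f(f):
        sig = inspect.signature(f)
        params = dict(sig.parameters)
        params.pop("kwargs")
        # Only parameters of `to` that have a default count as keyword arguments.
        extra = {k: v for k, v in inspect.signature(to).parameters.items()
                 if v.default is not inspect.Parameter.empty and k not in params}
        params.update(extra)
        f.__signature__ = sig.replace(parameters=list(params.values()))
        return f
    return _f


def basefoo(e, c=2): pass


@delegates_sketch(basefoo)
def foo(a, b=1, **kwargs): pass


print(inspect.signature(foo))  # (a, b=1, c=2)
```

Here e is skipped because it has no default, while c=2 is pulled in, matching the example in the walkthrough.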

00:31:54.120 | Actually, the other thing we could test-- no, actually, that's not the right place to test

00:32:08.400 | it.

00:32:09.400 | That's fine.

00:32:10.400 | We should get rid of this.

00:32:22.200 | This one, use_kwargs, is mainly used by other functions.

00:32:26.640 | We don't normally use it directly, but this is like something where you can basically

00:32:29.560 | say, I want you to replace kwargs with y and z.

00:32:34.480 | So you can see here I've got a, b=1, kwargs, and then that's it.

00:32:39.280 | We add y and z, and so as you can see here, it's added y and z.

00:32:44.000 | We don't normally use it directly, and you can see it's just grabbing the signature and

00:32:50.160 | replacing stuff in the signature.

00:32:52.920 | But it is used in that very important funcs_kwargs thing that we use all the time.

00:33:01.920 | That's the thing where we say, oh, these methods, this list of methods, are things that you

00:33:07.560 | could pass in as kwargs.

00:33:09.960 | And if you do, it will replace the method here.

00:33:14.200 | And so as you can see there, I use use_kwargs to replace the signature with the correct signature.

00:33:25.600 | And here you can see I am using functools.update_wrapper, which I could also have done

00:33:37.400 | by saying @wraps(old_init), I guess, which would have worked just as well.

00:33:55.280 | I'm trying to remember why this is here, and I now don't.

00:34:11.920 | What am I doing with that?

00:34:18.040 | Ah, yes.

00:34:20.960 | Okay.

00:34:21.960 | So when we-- so we've got funcs_kwargs here.

00:34:36.480 | We said b is our methods.

00:34:38.800 | So if I create something of that type, then b is going to return 2, because it's the method.

00:34:43.960 | But then I can pass in something and say, no, replace b with a method that returns 3,

00:34:49.280 | and make sure that's what happened.

00:34:56.400 | And then what you can do instead of passing in a function or lambda, you can pass in a

00:35:02.440 | method.

00:35:03.440 | And if you pass in a method, it's going to get self as well.

00:35:06.240 | So to tell it that something should be a method, you put @method above it.

00:35:11.280 | And the way that is done is using this little trick here, which is to replace f with a types.MethodType

00:35:20.480 | wrapper.

00:35:26.920 | And that's what's checked here.

00:35:28.880 | Check to see whether something's a method.
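A rough sketch of the trick described above: a decorator marks a function by wrapping it in types.MethodType, and the constructor later checks for that wrapper and re-binds the function to the new instance so it receives self. The names method_sketch, T and add_one are illustrative assumptions, and the constructor is written by hand here rather than generated by a class decorator.

```python
from types import MethodType


def method_sketch(f):
    "Mark `f` as wanting `self` by binding it to a throwaway object."
    return MethodType(f, object())


class T:
    _methods = ["b"]   # names that may be replaced through keyword arguments

    def __init__(self, val=0, **kwargs):
        self.val = val
        for name in self._methods:
            f = kwargs.pop(name, None)
            if f is None:
                continue
            # Marked functions get re-bound to this instance so they see `self`.
            if isinstance(f, MethodType):
                f = MethodType(f.__func__, self)
            setattr(self, name, f)

    def b(self):
        return 2


@method_sketch
def add_one(self):
    return self.val + 1


print(T().b(), T(b=lambda: 3).b(), T(val=4, b=add_one).b())  # 2 3 5
```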

00:35:34.320 | Okay. So that's what that does.

00:35:45.280 | I added this little decorator that uses an external thing called typechecked, which basically

00:35:56.800 | does runtime type checking.

00:35:58.240 | It's part of this thing called typeguard.

00:36:00.840 | Although honestly, I haven't actually used it since I added it.

00:36:05.520 | So I might remove it, or we might decide to use it more widely.

00:36:09.480 | But basically what it does is if you add an annotation, and then you try to call it with

00:36:17.960 | the wrong type, then it'll fail.

00:36:24.680 | It's an interesting idea.

00:36:25.680 | I haven't found myself wanting it much yet.

00:36:34.360 | Okay.

00:36:37.360 | What else is there to show you here?

00:36:50.080 | add_docs, we've seen plenty of times.

00:36:56.800 | So here's an example.

00:36:58.320 | We've got some class with some functions.

00:37:00.520 | And if we then say add_docs, then we can say these are my docstrings for each

00:37:07.520 | function.

00:37:08.520 | And so I can then just check that it does in fact get those doc strings.
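The idea of supplying docstrings separately from the method definitions can be sketched in a few lines. add_docs_sketch and Greeter are hypothetical names, and this version skips any checking for missing docstrings.

```python
def add_docs_sketch(cls, **docs):
    "Set the docstring of each named method of `cls`."
    for name, doc in docs.items():
        getattr(cls, name).__doc__ = doc


class Greeter:
    def hello(self): return "hello"
    def bye(self): return "bye"


add_docs_sketch(Greeter, hello="Say hello.", bye="Say goodbye.")
print(Greeter.hello.__doc__)  # Say hello.
```

This keeps the class body compact while still giving every method documentation.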

00:37:15.200 | Okay.

00:37:18.400 | So that's that.

00:37:20.520 | And then GetAttr, I guess we've pretty much seen now.

00:37:27.840 | So GetAttr is the thing that we inherit from in order to get __getattr__ for free.

00:37:36.760 | And specifically what it's going to do is it's going to try and find the unknown attribute

00:37:41.720 | in self.default.

00:37:45.320 | So here's an example where we set self.default to whatever you pass in.

00:37:51.480 | So we passed in hi.

00:37:54.680 | So we would expect to be able to do dot lower.

00:37:59.560 | That would make a lot more sense if this was capitalized.

00:38:06.040 | There we go.

00:38:13.720 | And it fails if we try to say upper because _xtra is the list of things that

00:38:19.680 | we are allowed to delegate.

00:38:23.240 | Although by default it will delegate everything.

00:38:27.160 | So dir in Python gives you back a list of all of the attributes.

00:38:30.240 | So we can use anything by default that's in self.default as long as it doesn't start with

00:38:35.540 | underscore because that would be private.

00:38:39.200 | So __dir__ is a thing that Python calls when you call dir.

00:38:42.880 | So when you do like tab completion that's how it does tab completion.

00:38:46.880 | So we then do custom_dir which is looking at everything in the type and everything in

00:39:01.040 | the object and anything else that you add manually.

00:39:10.840 | So here we check that lower has been added to our dir.

00:39:19.040 | Sometimes you don't want to inherit from GetAttr but instead you want to kind of do it manually.

00:39:27.220 | So you can also instead define your own __getattr__ and simply return this delegate_attr

00:39:35.560 | which will basically do exactly the same thing except you don't get the __dir__ thing.
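A minimal sketch of the delegation behavior described above: unknown attributes are looked up on self.default, optionally restricted to an allow-list, and __dir__ is extended so tab completion shows the delegated names. GetAttrSketch and Wrapper are illustrative names, and the allow-list attribute is called _xtra here as an assumption.

```python
class GetAttrSketch:
    _xtra = None   # None means: delegate every public attribute of `default`

    def __getattr__(self, k):
        # Only called when normal lookup fails. Never delegate private names,
        # and never recurse while looking for `default` itself.
        if k.startswith("_") or k == "default":
            raise AttributeError(k)
        if self._xtra is None or k in self._xtra:
            return getattr(self.default, k)
        raise AttributeError(k)

    def __dir__(self):
        extra = self._xtra
        if extra is None:
            extra = [o for o in dir(self.default) if not o.startswith("_")]
        return sorted(set(list(super().__dir__()) + list(extra)))


class Wrapper(GetAttrSketch):
    _xtra = ["lower"]   # only `lower` may be delegated
    def __init__(self, default):
        self.default = default


w = Wrapper("Hi")
print(w.lower())  # hi
```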

00:39:45.520 | One more thing.

00:39:47.560 | __setstate__.

00:39:51.160 | When you override __getattr__ in Python it often kills pickle.

00:39:57.400 | And so I think we just looked it up on Stack Overflow and found a fix.

00:40:01.400 | So pickle will use __setstate__ to restore the state when unpickling, basically.

00:40:11.560 | And I don't quite remember why but somehow doing this fixes pickling.

00:40:15.120 | That's why that's there.

00:40:18.520 | Okay.

00:40:21.940 | So last one for today is L. This is the main one.

00:40:29.000 | So L is a CollectionBase which also has GetAttr.

00:40:44.120 | And it also uses NewChkMeta to make sure that if you pass in an L then

00:40:49.920 | it just gives you back what you started with rather than creating another one.
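The "give back what you started with" behavior can be sketched with a small metaclass that intercepts construction. NewChkMetaSketch and Bag are hypothetical names used for illustration, and the check here is simplified compared with whatever the real metaclass does.

```python
class NewChkMetaSketch(type):
    "If the only argument is already an instance of the class, return it unchanged."
    def __call__(cls, x=None, *args, **kwargs):
        if not args and not kwargs and isinstance(x, cls):
            return x
        return super().__call__(x, *args, **kwargs)


class Bag(metaclass=NewChkMetaSketch):
    def __init__(self, items=None):
        self.items = list(items or [])


b = Bag([1, 2])
print(Bag(b) is b)  # True
```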

00:40:54.480 | A CollectionBase is just something which contains, composes some items.

00:41:01.840 | And basically everything is just delegated down to that.

00:41:04.520 | So it delegates down __len__ and __getitem__, __setitem__, __delitem__, __repr__ and __iter__.

00:41:11.800 | If you don't know what any of these things are check the Python data model docs.
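For readers unfamiliar with those special methods, a bare-bones composing collection might look like the following. CollectionBaseSketch is an illustrative name, and each method simply forwards to the wrapped items.

```python
class CollectionBaseSketch:
    "Wraps `items` and forwards the basic container protocol to it."
    def __init__(self, items): self.items = items
    def __len__(self): return len(self.items)
    def __getitem__(self, k): return self.items[k]
    def __setitem__(self, k, v): self.items[k] = v
    def __delitem__(self, k): del self.items[k]
    def __repr__(self): return repr(self.items)
    def __iter__(self): return iter(self.items)


c = CollectionBaseSketch([10, 20, 30])
c[1] = 99        # __setitem__
del c[0]         # __delitem__
print(len(c), c[0], list(c), repr(c))
```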

00:41:21.640 | So then L adds a lot of behavior which is best understood by looking at the tests I

00:41:29.000 | think.

00:41:30.860 | So you can pass in pretty much anything to an L that you could otherwise pass into the

00:41:37.960 | normal Python list.

00:41:39.960 | So list(range(12)), we try to make it behave as much like a Python list as possible.

00:41:55.400 | And if you pass in the same things, in fact you can see we actually check that that's

00:41:59.080 | the same as list(range(12)).

00:42:04.640 | But then we have another nice little thing.

00:42:06.280 | So we can do dot reverse, for example, as you can see.

00:42:25.600 | Now reverse is actually not listed anywhere here.

00:42:35.240 | As you can see.

00:42:36.480 | And the reason for that is that we inherited from getatra and that default is set to self.items

00:42:46.720 | and list has a reverse.

00:42:52.920 | So actually all we were doing is we were delegating to list.

00:43:06.920 | Okay.

00:43:29.760 | We have a __setitem__, as you can see.

00:43:37.520 | So we can set something, t[3] = 'h'. And then some of the nice stuff that we're adding is

00:43:45.100 | being able to kind of more NumPy style set multiple things to multiple values and retrieve

00:43:52.360 | multiple things.

00:43:59.840 | Yeah.

00:44:01.840 | So that's some basic functionality in L. You can create an empty one, which should

00:44:11.800 | match an empty list, of course. You can append just like a list can, += to it like a list

00:44:18.320 | can.

00:44:22.040 | You can add things onto the left of it instead of the right, which a list can't.

00:44:28.600 | You can multiply just like a list can.

00:44:36.120 | Unlike a list, you can negate.

00:44:38.680 | So this is the negation operation.

00:44:40.520 | The true false false becomes false true true.

00:44:53.280 | So then here's an interesting one, cycle.

00:44:58.300 | So cycle simply calls itertools.cycle.

00:45:12.680 | So that's a useful thing to know about, basically itertools.cycle.

00:45:16.760 | Let's simply try itertools.cycle([1, 2, 3]).

00:45:26.280 | And then we'll need to just grab the first little bit of that.

00:45:28.840 | Otherwise it'll be infinitely long and I don't have an infinite amount of RAM.

00:45:38.400 | islice, grab the first bit, say 12.

00:45:43.480 | Oh, and then we'll need to listify that so you can see it.

00:45:49.560 | Okay.

00:45:50.560 | So as you can see what cycle does one, two, three, one, two, three, one, two, three.

00:45:52.960 | So it'll do that forever.

00:45:54.920 | And then we islice'd the first 12.

00:45:58.240 | So we can say L(1, 2, 3).cycle(), for example.

00:46:15.280 | And then we can do the same thing, itertools.islice that and then list that, oops.

00:46:41.240 | And then slice by how much, there we go, same thing.
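The standard-library part of that demonstration can be reproduced on its own: cycle repeats its input forever, and islice takes a finite prefix so it can be turned into a list.

```python
import itertools

# cycle() never ends, so take only the first 12 items before building a list.
first_12 = list(itertools.islice(itertools.cycle([1, 2, 3]), 12))
print(first_12)  # [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
```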

00:46:47.320 | All right, so questions, how do I handle multiple indices?

00:46:53.200 | So we handle multiple indices by defining __getitem__.

00:47:02.120 | So __getitem__ is going to check whether the index that's passed in is an indexer or not.

00:47:08.240 | What's an indexer?

00:47:10.280 | An indexer is something that is either an int or is something that has an ndim property

00:47:20.440 | which is zero.

00:47:22.240 | Why is that?

00:47:23.800 | Because of this: t = [1, 2, 3], t[1], that's an indexer, but here's something

00:47:32.980 | else that's an indexer: import torch, t[torch.tensor(1)], that's an indexer too, okay?

00:47:48.480 | And that's because torch.tensor(1).ndim is zero.

00:48:02.640 | But you can't do that, okay?

00:48:11.040 | So that's what is_indexer is checking for.
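
A rough sketch of that check, avoiding a torch dependency. `ZeroDim` is a hypothetical class standing in for a zero-dimensional tensor, and this `is_indexer` is written from the description above rather than copied from the library.

```python
def is_indexer(idx):
    "True for ints and for anything whose `ndim` attribute is 0"
    return isinstance(idx, int) or getattr(idx, "ndim", -1) == 0

class ZeroDim:
    """Hypothetical stand-in for a zero-dimensional tensor holding one integer."""
    ndim = 0
    def __init__(self, v):
        self.v = v
    def __index__(self):
        # Lets a plain Python list accept this object as an index
        return self.v

t = [1, 2, 3]
print(is_indexer(1), is_indexer(ZeroDim(1)), is_indexer([0, 1]))  # True True False
print(t[ZeroDim(1)])  # 2
```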

00:48:15.520 | So if it's an indexer, then we call _get, which as you can see, checks if it's

00:48:25.640 | an indexer, and if it is, it simply tries to find out whether self.items has an iloc.

00:48:32.600 | In this case, it doesn't, so it's just going to give us self.items[i].

00:48:36.520 | So it's just going to be self.items[i].

00:48:39.620 | But your question is, what happens if it's a list?

00:48:43.800 | In that case, we're going to end up over here.

00:48:48.880 | So we're going to create a new L containing self._get(idx), where in this case, it's not

00:48:55.640 | an indexer.

00:48:57.720 | So we're going to convert a mask to indexes.

00:48:59.860 | So if it's Booleans, it'll convert them into indexes.

00:49:03.080 | And then it'll check does it have iloc, which L doesn't, does it have __array__,

00:49:07.120 | which L doesn't.

00:49:08.480 | So then it's going to return a list comprehension.

00:49:12.040 | And so that's how come that works.
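
The dispatch walked through above can be sketched as follows. This is a simplified illustration: `MiniL` and this `mask2idxs` are hypothetical, and the iloc and __array__ branches are left out.

```python
def mask2idxs(mask):
    "Turn a list of booleans into the positions that are True; pass integer lists through"
    mask = list(mask)
    if mask and isinstance(mask[0], bool):
        return [i for i, m in enumerate(mask) if m]
    return mask

class MiniL:
    """Hypothetical stand-in for L, showing only the indexing path."""
    def __init__(self, items):
        self.items = list(items)

    def _get(self, i):
        # A single index or a slice goes straight to the underlying list
        if isinstance(i, (int, slice)):
            return self.items[i]
        # Anything else is treated as a list of indices or a boolean mask
        return [self.items[j] for j in mask2idxs(i)]

    def __getitem__(self, idx):
        res = self._get(idx)
        # One index returns the item itself; several return a new wrapper
        return res if isinstance(idx, int) else MiniL(res)

t = MiniL("abcd")
print(t[1])                                  # b
print(t[[0, 2]].items)                       # ['a', 'c']
print(t[[True, False, False, True]].items)   # ['a', 'd']
```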

00:49:15.280 | OK, yeah, so how does adding None to an L work?

00:49:22.480 | As I mentioned, it's in __add__.

00:49:26.240 | And specifically here, you can see we create a new L containing all of the items in a plus

00:49:36.920 | b listified, and listify of None is an empty list.

00:49:46.740 | So that's why that works.
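
Why adding None is harmless can be shown with a small sketch. Both `listify` and `MiniL` here are simplified, hypothetical versions written for illustration.

```python
def listify(o):
    "Sketch: None -> [], list or tuple -> list, anything else -> one-item list"
    if o is None:
        return []
    if isinstance(o, (list, tuple)):
        return list(o)
    return [o]

class MiniL:
    """Hypothetical stand-in for L, showing only addition."""
    def __init__(self, items=None):
        self.items = listify(items)

    def __add__(self, other):
        other = other.items if isinstance(other, MiniL) else other
        # listify(None) is [], so adding None leaves the items unchanged
        return MiniL(self.items + listify(other))

print((MiniL([1, 2]) + None).items)  # [1, 2]
print((MiniL([1, 2]) + [3]).items)   # [1, 2, 3]
```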

00:49:50.240 | OK, so here you can see we've got an infinite number of ones.

00:50:00.180 | And if we zip that with t, where t is L(range(4)), that should be the same as zipping range(4)

00:50:09.480 | with four ones.

00:50:11.680 | So that works there.
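
In plain Python the same check can be written like this. It relies only on zip stopping at the shortest input, so the infinite iterator is safe to use.

```python
import itertools

ones = itertools.cycle([1])          # an endless stream of ones
pairs = list(zip(range(4), ones))    # zip stops when range(4) runs out
print(pairs)  # [(0, 1), (1, 1), (2, 1), (3, 1)]
```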

00:50:14.240 | L.range is almost the same as normal range, except it returns an L. Shuffled does what

00:50:26.200 | it sounds like.

00:50:30.040 | And we actually have a test_shuffled now, I think, so we can do that instead.

00:50:48.240 | So mapped is basically the same as calling map(_f, t), except that there's

00:51:00.800 | a few differences.

00:51:01.800 | One is that that returns a map object, whereas our mapped actually does the mapping.

00:51:10.160 | So t.mapped, as you can see.

00:51:27.480 | And you can pass in arguments, as you can see.

00:51:38.040 | All keyword arguments.

00:51:41.920 | So we use that quite a lot.
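
A sketch of the difference between the lazy built-in map and an eager mapped that forwards extra arguments. The free function `mapped` and the helper `add` are hypothetical names for illustration; in the walk-through it is a method on L.

```python
def mapped(items, f, *args, **kwargs):
    "Eagerly apply f to every item, forwarding extra arguments to f"
    return [f(o, *args, **kwargs) for o in items]

def add(x, a=1):
    return x + a

lazy = map(add, range(4))           # nothing has been computed yet
print(type(lazy).__name__)          # map
print(mapped(range(4), add))        # [1, 2, 3, 4]
print(mapped(range(4), add, a=2))   # [2, 3, 4, 5]
```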

00:51:46.000 | OK, so, types of things you can construct an L with.

00:51:53.440 | You can construct it with a list.

00:51:54.840 | You can construct it with another L. You can construct it with a string, in which case

00:52:05.280 | it will stay as a string, with a range.

00:52:09.720 | You can construct it with a generator.

00:52:14.880 | Now this is different to how Python lists work.

00:52:18.560 | If I go list(array([0])) like this, then as you can see, that gets converted into a list

00:52:33.600 | containing zero, or zero comma one, if your array is zero, one.

00:52:38.240 | Whereas L doesn't do that by default, L will create a single item containing the array.

00:52:45.280 | Because most of the time, particularly with tensors, you don't want to unwrap them into

00:52:48.800 | a list.

00:52:49.800 | You want to actually put the tensor or the array into the list.

00:52:56.600 | Is there any way to know how L is shuffled?

00:53:00.520 | Not with shuffled.

00:53:01.720 | You would have to use indexes or something for that.

00:53:12.560 | OK.

00:53:14.620 | So that's an important difference.

00:53:21.880 | If you want the same behavior that list has, then you can pass use_list=True to

00:53:31.240 | give you the same behavior as list.

00:53:33.080 | So instead of having one item, an array with zero, one in it, that will actually create two items now,

00:53:39.280 | zero and one.

00:53:40.280 | So that does exactly the same thing as list would do if you say use_list=True.
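
The wrap-versus-unwrap choice can be illustrated with the standard library's array type in place of NumPy. `wrap` is a hypothetical helper; only the `use_list` name comes from the walk-through.

```python
from array import array

def wrap(o, use_list=False):
    "Keep array-like objects and strings whole unless use_list is True"
    if isinstance(o, (list, tuple, range)) or use_list:
        return list(o)
    return [o]

a = array("i", [0, 1])
print(list(a))                  # [0, 1]  (list unwraps the array)
print(len(wrap(a)))             # 1       (one item: the array itself)
print(wrap(a, use_list=True))   # [0, 1]  (same as list)
print(wrap("abc"))              # ['abc'] (a string stays a string)
```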

00:53:47.120 | OK.

00:53:51.200 | You can pass the match parameter to the constructor to get the same behavior as listify had in

00:53:57.760 | version one, which is basically to say make this list as long as this list.

00:54:04.360 | That's why that will create one, one, one.
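
A minimal sketch of the match idea, assuming a simplified `listify`. The real constructor's handling of match may differ in detail.

```python
def listify(o, match=None):
    "Make a list from o; with match, repeat a single item to the length of match"
    res = list(o) if isinstance(o, (list, tuple)) else [o]
    if match is not None:
        n = match if isinstance(match, int) else len(match)
        if len(res) == 1:
            res = res * n
        elif len(res) != n:
            raise ValueError(f"expected {n} items, got {len(res)}")
    return res

print(listify(1, match=[1, 2, 3]))  # [1, 1, 1]
print(listify([4, 5], match=2))     # [4, 5]
```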

00:54:12.000 | Here's the test that confirms that L(t) is t. Note that "is" means that the two objects

00:54:18.960 | are the same reference.
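
For reference, identity versus equality in plain Python, using ordinary lists:

```python
a = [1, 2, 3]
b = a          # same object, second name
c = list(a)    # equal contents, new object

print(a is b)  # True
print(a == c)  # True
print(a is c)  # False
```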

00:54:22.040 | OK.

00:54:24.580 | And so then you can see some of the methods.

00:54:28.520 | So here's checking __getitem__.

00:54:33.300 | As you can see here, we're indexing using a mask array instead.

00:54:39.960 | So that's just like NumPy.

00:54:41.360 | The mask array has to be the same number of booleans as the length of the list.

00:54:50.720 | It has a dot unique as you can see.

00:54:58.960 | This is basically telling you the reverse mapping.

00:55:03.440 | So it's a mapping from where is the three, for example, and it's in location zero, one,

00:55:10.440 | two.

00:55:11.440 | Whereas the one, it's in location zero, so it's a dictionary.

00:55:15.880 | So val2idx and unique are kind of the two things you need to create a vocab.
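
How those two pieces combine into a vocab can be sketched with plain functions. The names mirror the methods mentioned above, but these are hypothetical implementations and the labels are made-up sample data.

```python
def unique(items):
    "Distinct items in first-seen order (dicts keep insertion order)"
    return list(dict.fromkeys(items))

def val2idx(items):
    "Reverse mapping: value -> position in the list"
    return {v: i for i, v in enumerate(items)}

labels = ["cat", "dog", "cat", "bird", "dog"]
vocab = unique(labels)
o2i = val2idx(vocab)
print(vocab)  # ['cat', 'dog', 'bird']
print(o2i)    # {'cat': 0, 'dog': 1, 'bird': 2}
```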

00:55:22.760 | We can filter.

00:55:24.100 | This is basically the same as the filter function in Python.

00:55:26.800 | But it's going to return an L. Here's mapped.

00:55:34.280 | mapped_dict is kind of handy.

00:55:36.560 | It does exactly the same as mapped, but rather than returning a list, it returns a dictionary

00:55:42.440 | from the original value of the list to the value of the function.

00:55:46.520 | So that's pretty handy.
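
In plain Python this amounts to a dict comprehension. The free function below is a hypothetical stand-in for the method.

```python
def mapped_dict(items, f):
    "Map each original value to the result of calling f on it"
    return {o: f(o) for o in items}

squares = mapped_dict(range(1, 4), lambda x: x * x)
print(squares)  # {1: 1, 2: 4, 3: 9}
```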

00:55:50.480 | Zipped is basically the same as zipping lists, as you can see, it returns an L.

00:55:57.440 | One nice thing you can add to zip, though, is if the lists are different lengths, then

00:56:01.680 | you can say cycled=True, and it will replicate the shorter one, as you can see,

00:56:09.240 | and it'll cycle through it again to make it the same length as the longer one.

00:56:13.920 | Whereas cycled=False behaves the same way as the normal zip.
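
One way to get that behavior with itertools. The function `zipped` is a hypothetical sketch written from the description above; only the `cycled` parameter name comes from the walk-through.

```python
import itertools

def zipped(*lists, cycled=False):
    "zip the inputs; with cycled=True, repeat shorter inputs up to the longest length"
    if not cycled:
        return list(zip(*lists))
    n = max(len(l) for l in lists)
    stretched = (itertools.islice(itertools.cycle(l), n) for l in lists)
    return list(zip(*stretched))

print(zipped([1, 2, 3, 4], "ab"))               # [(1, 'a'), (2, 'b')]
print(zipped([1, 2, 3, 4], "ab", cycled=True))  # [(1, 'a'), (2, 'b'), (3, 'a'), (4, 'b')]
```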

00:56:21.680 | And then mapped_zip basically takes the result of that zipped and puts it into a map.

00:56:27.480 | So for example, if we do mapped_zip with multiplication, then it's going to zip one, two, three with

00:56:35.080 | two, three, four, and then apply a multiplication to each one to give us element-wise multiplication.

00:56:42.440 | It won't be fast like NumPy, so don't use this instead of NumPy, but it's quite handy sometimes.
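
The element-wise multiplication example, written with the standard library only:

```python
import operator

a, b = [1, 2, 3], [2, 3, 4]
# zip the two lists, then apply the function to each pair
products = [operator.mul(x, y) for x, y in zip(a, b)]
print(products)  # [2, 6, 12]
```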

00:56:52.840 | zipwith will take this L and zip it with this list, as you can see.

00:57:02.920 | And here's the same thing with the map as well.

00:57:06.840 | That's the same thing as before.

00:57:11.760 | itemgot is just going to apply -- which one is it?

00:57:35.240 | itemgetter.

00:57:36.240 | Oh, it's in operator.

00:57:37.240 | Of course it is.

00:57:38.240 | It applies operator.itemgetter to every item of a list, so our t is 1, 0, 2, 1, 3, 2, 2.

00:57:56.680 | So t.itemgot(1) will return the 1th element from each of those, so it will be 0, 1, 2,

00:58:09.680 | 2.

00:58:10.680 | I use that a lot, actually.
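
The standard-library building block here is operator.itemgetter. The rows below are made-up sample data, not the values from the notebook.

```python
from operator import itemgetter

rows = [(1, 0), (2, 1), (3, 2)]
# itemgetter(1) builds a function that returns element 1 of whatever it is given
seconds = list(map(itemgetter(1), rows))
print(seconds)  # [0, 1, 2]
```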

00:58:14.280 | attrgot is basically the same thing, but it's going to return this attribute from

00:58:19.520 | each thing.

00:58:20.520 | So here we've got a=3, b=4 and a=1, b=2.

00:58:24.120 | So this will be the b from each, so 4, 2.

00:58:28.720 | We use that quite a lot, too.
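
The attribute version works the same way with operator.attrgetter. SimpleNamespace is used here only as a convenient way to build objects with a and b attributes.

```python
from operator import attrgetter
from types import SimpleNamespace

objs = [SimpleNamespace(a=3, b=4), SimpleNamespace(a=1, b=2)]
# attrgetter("b") builds a function that returns the b attribute of its argument
bs = list(map(attrgetter("b"), objs))
print(bs)  # [4, 2]
```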

00:58:30.840 | sorted is pretty obvious.

00:58:34.020 | Range is pretty obvious.

00:58:36.840 | All right.

00:58:39.520 | So there's a little guided tour of the first half of 01_core.

00:58:48.040 | Thanks for tuning in, and I'll see you all next time.

00:58:51.440 | Bye-bye.

Chapters