
An overview of fastai2


Chapters

4:28 An infinitely customizable training loop
10:46 Optimizer with pluggable stats and steppers
13:24 Image classification
26:25 Provides the basic features needed for working with model inputs


00:00:00.000 | basically every time Sylvain and I found something that didn't quite work the
00:00:05.200 | way we wanted it at any part of the stack, we wrote our own. So it's kind of
00:00:09.920 | like building something with no particular deadline and trying to do
00:00:13.060 | everything the very best we can. So the layered API of fastai v2 starts at
00:00:22.600 | the applications layer, which is where most beginners will start, and it looks
00:00:29.160 | a lot like fastai v1, which is the released version of the software that
00:00:33.880 | people have seen before. But in v2, everything is rewritten from scratch; it's
00:00:38.120 | totally new, there's no code borrowed, but the top-level API looks quite similar.
00:00:42.040 | The idea is that in one, two, three, four lines of code you can create a state-of-the-art
00:00:51.160 | computer vision classifier, including transfer learning. With nearly the same one,
00:00:59.520 | two, three, four lines of code (oh, five lines of code in this case, because we're
00:01:04.440 | also displaying the data), you can create a state-of-the-art segmentation model. And
00:01:09.040 | when I say state-of-the-art: for example, this segmentation model
00:01:12.440 | is, to the best of my knowledge, still better than any published result on this
00:01:16.760 | particular CamVid dataset. So these five lines of code are super good:
00:01:20.440 | five lines of code.
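For concreteness, those few lines look roughly like this in released fastai v2 (a sketch using the library's public names; the PETS dataset and the exact notebook shown in the talk are assumptions):

```python
from fastai.vision.all import *

# Build dataloaders from filenames, then fine-tune a pretrained model
path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path), pat=r'(.+)_\d+.jpg$', item_tfms=Resize(224))
learn = cnn_learner(dls, resnet34, metrics=error_rate)  # transfer learning
learn.fine_tune(1)
```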
00:01:25.760 | And as you can see, they include a line of code which, if you say
00:01:31.040 | show_batch, will display your data in an appropriate format, in this case
00:01:36.240 | showing you the segmentation: a picture with the color-coded pixels overlaid on top of it. The same basic four lines of code will do text classification, so
00:01:43.720 | here's the basis of ULMFiT, which is a system that we developed and wrote up
00:01:50.840 | along with Sebastian Ruder, for transfer learning in natural language processing.
00:01:55.160 | And as you can see here, this is working on IMDb, with a single epoch taking four
00:02:01.200 | minutes; the accuracy here is basically what was the state-of-the-art as of a
00:02:05.680 | couple of years ago.
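A hedged sketch of the same pattern for text, using the released fastai v2 API (the exact code shown in the talk may differ):

```python
from fastai.text.all import *

# ULMFiT-style transfer learning for sentiment classification on IMDb
path = untar_data(URLs.IMDB)
dls = TextDataLoaders.from_folder(path, valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, metrics=accuracy)
learn.fine_tune(1)
```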
00:02:13.800 | Tabular or time-series analysis: same deal; basically a few lines of code, nearly exactly the same lines of code, and you'll get a great
00:02:17.080 | result from your tabular data; and ditto for collaborative filtering. So the
00:02:23.800 | high-level API for fastai v2 is designed to be something where,
00:02:30.160 | regardless of what application you're working on, you can get a great result
00:02:34.440 | from it using sensible defaults and carefully selected hyperparameters,
00:02:40.240 | largely done for you automatically for the most common kinds of problems that
00:02:46.440 | people look at. That bit doesn't look that different from v1, but understanding
00:02:52.320 | how we get to it is kind of interesting, and involves getting deeper
00:02:58.000 | and deeper. This approach, though, does work super well, and partly that's because this
00:03:05.960 | is based on quite a few years of research to figure out what are the best
00:03:09.200 | ways to solve various problems along the way and when people actually try using
00:03:13.680 | fastai, they're often surprised. So this person posted on our forum that they'd
00:03:18.680 | been working in TF2 for a while, and couldn't figure out why all
00:03:23.600 | of their models were suddenly working much better; the answer is basically that
00:03:27.360 | they're getting all these nice, curated best practices. And somebody else
00:03:32.840 | on Twitter saw that and said: yep, we found the same thing; we were trying
00:03:36.640 | TensorFlow, spent months tweaking, and then we switched to fastai, and a couple of
00:03:40.840 | days later we were getting better results. So these kinds of carefully
00:03:44.400 | curated defaults and algorithms and high-level APIs that do things right for
00:03:49.920 | you the first time even for experienced practitioners can give you better
00:03:54.320 | results faster but it's actually the other pieces that are more I think
00:04:00.400 | interesting for a Swift conversation, because the deeper we go into how we
00:04:05.720 | make that work the more stuff you'll see which will be a great fit I think with
00:04:11.800 | Swift. So the mid-layer API is something which is largely new to fastai; actually, I
00:04:21.880 | guess the foundation layer is new, so the mid-layer I'd say is more
00:04:25.000 | rewritten from v1. And it contains some of the things that make those high-level
00:04:30.920 | APIs easy. One of the bits which is the most interesting is the training loop
00:04:38.000 | itself and I thank Sylvain for the set of slides we have for the training loop.
00:04:44.880 | This is what a training loop looks like in PyTorch. We calculate some predictions;
00:04:50.440 | we get a loss; we do a backwards pass to get the gradients; we do an optimizer
00:04:56.200 | step; and then, optionally, from time to time we'll zero the gradients, depending
00:05:00.680 | on whether we're accumulating them. So this is what that loop looks like: run
00:05:08.000 | the model, get the loss, do the gradients, step the optimizer, and do that a bunch of
00:05:12.360 | times.
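In code, the loop being described is the standard PyTorch pattern (a generic sketch; model, loss_func, opt, and dataloader stand in for whatever you have):

```python
# Plain PyTorch training loop: predict, loss, backward, step, zero
for xb, yb in dataloader:
    pred = model(xb)            # calculate some predictions
    loss = loss_func(pred, yb)  # get a loss
    loss.backward()             # backward pass computes the gradients
    opt.step()                  # optimizer step updates the weights
    opt.zero_grad()             # zero the gradients (skip to accumulate)
```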
00:05:19.640 | But if you want to do something interesting, you'll need to add something
00:05:24.720 | to the loop: keeping track of your training statistics in TensorBoard or
00:05:29.520 | in fastprogress or whatever; scheduling various hyperparameters in
00:05:33.560 | various different ways; adding various different kinds of regularization;
00:05:39.760 | mixed precision training; GANs.
00:05:46.400 | So this is a problem, because either you have to write a new training loop
00:05:50.520 | every time you want to add a different tweak (and making all those
00:05:57.160 | tweaks work together then becomes incredibly complicated), or you
00:06:00.080 | try and write one training loop which does everything you can think of (this is
00:06:04.700 | the training loop for fastai 0.7, which only did a tiny subset of the things I just said, and it was ridiculous), or you can add callbacks at each step. Now, the idea
00:06:12.200 | of callbacks has been around in deep learning APIs for a long time. But what's
00:06:18.640 | very different about fastai is that every callback is actually a two-way
00:06:23.540 | callback. It can read absolutely everything: it can read gradients,
00:06:27.320 | parameters, data, and so forth. And it can write them, so it can actually change
00:06:33.440 | anything at any time. So we say the callbacks are infinitely flexible. We feel pretty
00:06:41.000 | confident in that, because the training loop in fastai has not needed to be
00:06:47.120 | modified to do any of the tweaks that I showed you before. So even the entirety
00:06:52.520 | of training GANs can be done in a callback. So basically, we switch out our
00:06:58.680 | basic training loop and replace it with one with the same five steps, but with
00:07:03.880 | callbacks between every step. So that means, for example, if you want to do a
00:07:11.920 | scheduler, you can define a begin-batch callback that sets the optimizer's learning rate
00:07:16.640 | to some function. Or if you want to do early stopping, you can write an epoch-end
00:07:21.560 | callback that checks the metrics and stops training. Or you can do parallel training:
00:07:28.640 | set up DataParallel, and then at the end of training take DataParallel off
00:07:34.560 | again. Gradient clipping: you have access to the parameters themselves, so you
00:07:40.160 | can clip the gradient norms at the end of the backward step, and so forth.
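As a sketch of what such a two-way callback looks like, here is gradient clipping written against the released fastai v2 Callback class (event names in the pre-release code discussed here may differ):

```python
import torch
from fastai.callback.core import Callback

class ClipGrad(Callback):
    "Clip gradient norms after the backward pass (a minimal sketch)."
    def __init__(self, max_norm=1.0): self.max_norm = max_norm
    def after_backward(self):
        # Two-way: the callback reads *and* modifies the parameters
        torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.max_norm)
```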
00:07:47.320 | So all of these different things are things that have been written with fastai
00:07:52.640 | callbacks, including, for example, mixed precision: all of NVIDIA's
00:07:59.600 | recommendations for mixed-precision training will be added automatically if you just
00:08:03.600 | add a to_fp16() at the end of your Learner call. And, really importantly, for
00:08:11.920 | example, all of those mixed-precision things can be combined with multi-GPU and
00:08:17.160 | one-cycle training and gradient accumulation and so forth. And so, trying
00:08:24.480 | to create a state-of-the-art model, which involves combining state-of-the-art
00:08:30.800 | regularization and mixed precision and distributed training and so forth, is a
00:08:35.840 | really, really hard job. But with this approach it's actually just a single
00:08:41.040 | extra line of code to add each feature, and they are all explicitly designed to
00:08:45.840 | work with each other and tested to work with each other.
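Concretely, stacking these features is just extra method calls and callbacks on the same Learner; a hedged sketch reusing the ClipGrad example above:

```python
# Mixed precision, one-cycle scheduling, and a custom callback, combined
learn = cnn_learner(dls, resnet34, metrics=error_rate,
                    cbs=ClipGrad(0.5)).to_fp16()
learn.fit_one_cycle(5)  # each feature is one line, and they compose
```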
00:08:51.520 | So, for instance, here is mixup data augmentation, which is an incredibly powerful
00:08:56.520 | method that has powered lots of state-of-the-art results, and as you can see,
00:09:01.720 | it's well under a screen of code. By comparison, here is the version of mixup
00:09:07.720 | from the paper: not only is it far longer, but it only works with one particular
00:09:13.280 | dataset and one particular optimizer, is full of all kinds of assumptions, and
00:09:18.240 | supports only one particular kind of metric, and so forth. So that's an example of these
00:09:25.200 | mid-tier APIs.
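For reference, using it is a single plug-in callback (MixUp is the released v2 class name; treat the details as assumptions for the pre-release version in the talk):

```python
# Mixup as a callback rather than a bespoke training loop
learn = cnn_learner(dls, resnet34, cbs=MixUp(0.4))
learn.fit_one_cycle(1)
```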
00:09:34.320 | Another one is the optimizer. It looks like there have been lots and lots of different optimizers appearing in the
00:09:38.640 | last year or two, but it actually turns out that they're all minor tweaks on each
00:09:44.440 | other. Most libraries don't write them this way, though. So, for example, AdamW, also
00:09:50.960 | known as decoupled-weight-decay Adam, was added to PyTorch quite recently, in the
00:09:59.160 | last month or two, and it required writing a whole new class and a whole new
00:10:06.240 | step to implement; and it came, you know, two or three years after the
00:10:11.640 | paper was released. On the other hand, fastai's implementation, as you can see,
00:10:17.680 | involves a single extra function containing two lines of code, plus this
00:10:23.320 | little bit of grey here. So it's, like, two-and-a-half or three lines of code
00:10:26.920 | to implement the same thing, because what we did was say: let's refactor the
00:10:34.640 | idea of an optimizer, see what's different for each of these
00:10:39.200 | state-of-the-art optimizers that have appeared recently, and make it so that
00:10:43.800 | each of those things can be added and removed by just changing two things:
00:10:49.360 | stats and steppers. A stat is something that you measure during training, such as
00:10:58.680 | the gradients or the gradients squared (or you might use dampening or momentum or
00:11:02.600 | whatever), and then a stepper is something that uses those stats to change the
00:11:08.800 | weights in some way. You can combine those things together, and by combining
00:11:13.000 | these we've been able to implement all these different optimizers.
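In outline, the refactoring looks something like this (a simplified sketch of the idea, not the library's exact code; a stat records per-parameter state during training, and a stepper uses that state to change the weights):

```python
import torch

# A stat: track an exponential moving average of the gradients (momentum)
def average_grad(state, p, mom=0.9, **kwargs):
    if 'grad_avg' not in state: state['grad_avg'] = torch.zeros_like(p.grad)
    state['grad_avg'].mul_(mom).add_(p.grad)

# A stepper: use the recorded stat to update the weights
def momentum_step(state, p, lr, **kwargs):
    p.data.add_(state['grad_avg'], alpha=-lr)
```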
00:11:21.200 | So, for instance, the LAMB optimizer, which came out of Google and was super cool at
00:11:28.560 | reducing BERT pre-training time from three days to 76 minutes: we were able to
00:11:34.200 | implement it in this tiny piece of code, and one of the nice things is that
00:11:39.360 | when you compare it to the math, it really looks almost line-for-line
00:11:44.240 | identical, except ours is a little bit nicer, because we refactored some of the
00:11:48.720 | math. So it makes it really easy to do research as well, because you can
00:11:54.560 | quite directly bring the equations across into your code.
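For reference, the LAMB update from the paper maps onto code almost line for line; a hedged sketch of one step for a single parameter (not fastai's exact implementation):

```python
import torch

def lamb_step(p, lr, mom, sqr_mom, eps, wd, grad_avg, sqr_avg, step, **kwargs):
    # Debiased Adam-style statistics
    m = grad_avg / (1 - mom**step)
    v = sqr_avg  / (1 - sqr_mom**step)
    r = m / (v.sqrt() + eps) + wd * p.data            # proposed update direction
    q = (p.data.norm() / r.norm().clamp_min(1e-12)).item()  # layer-wise trust ratio
    p.data.add_(r, alpha=-lr * q)
```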
00:12:01.640 | Then the last of the mid-tier APIs is the data block API, which is something we had in version 1
00:12:09.520 | as well, but when we were porting it to Swift we had an opportunity to rethink
00:12:21.240 | it, and actually Alexis Gallagher in particular helped us to rethink it in a
00:12:26.760 | more idiomatic, Swifty way, and it came out really nicely. So then we took
00:12:32.520 | the result of that and ported it back into Python, and we ended up with
00:12:36.400 | something that was quite a bit nicer. So there's been a nice
00:12:39.800 | interaction and interplay between fastai in Python and SwiftAI in Swift in
00:12:46.000 | terms of helping each other's APIs. But basically, the data block API is something
00:12:51.760 | where you define each of the key things that the program needs to know to
00:12:57.960 | flexibly get your data into a form you can put in a model. So it needs to know:
00:13:03.540 | what type of data do you have, how do you get that data, how do you split it into a
00:13:10.160 | training set and a validation set; and then it puts that all together into a
00:13:13.940 | DataBunch, which is just a simple little class (it's literally, I think, four lines
00:13:17.240 | of code) which just has the validation set and the training set in one place. So,
00:13:25.640 | with a data block, you just say: okay, for my types, I want to create a black-and-white
00:13:32.000 | Pillow image for my X and a category for my Y; to get the list of files for
00:13:39.040 | those, I need to use this function; to split those files into training and
00:13:43.880 | validation, use this function, which looks at the grandparent
00:13:48.560 | directory name; and to get the labels, use this function, which uses the parent's
00:13:55.160 | path name. And so, with that, that's enough to give you MNIST, for instance.
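That description corresponds almost word for word to a DataBlock definition; a sketch with the released v2 names (the block, splitter, and labeller names may differ slightly from the pre-release code on screen):

```python
from fastai.vision.all import *

mnist = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),  # my X and Y types
    get_items=get_image_files,            # how to get the data
    splitter=GrandparentSplitter(),       # split by grandparent dir name
    get_y=parent_label)                   # label from the parent dir name
dls = mnist.dataloaders(untar_data(URLs.MNIST_TINY))
dls.show_batch()
```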
00:14:03.080 | And so once you've done this, you end up with a DataBunch, and as I mentioned before,
00:14:08.120 | everything has a show_batch. So one of the nice things is it makes it very easy
00:14:13.120 | for you to look at your data, regardless of whether it's tabular or collaborative
00:14:16.840 | filtering or vision or text or even audio; if it were audio, it would show
00:14:21.960 | you a spectrogram and let you play the sound. So you can do custom labeling
00:14:30.360 | with data blocks, by using for example a regular-expression labeller; you can get
00:14:37.280 | your labels from an external file or data frame; and they could be modelled with
00:14:41.800 | multiple labels. So this thing here knows it's a multi-label classification task,
00:14:45.640 | so it's automatically put the semicolon between each label; again, it's still
00:14:51.120 | basically just three lines of code to define the data block. So here's a data
00:14:56.560 | block for segmentation, and you can see really the only thing I had to change
00:15:00.480 | here was that my dependent variable has been changed from a category to a Pillow mask,
00:15:07.640 | and again show_batch automatically works, and we can train a model from
00:15:11.520 | that straight away as well. You could do key points: here I've just changed my
00:15:17.320 | dependent variable to a tensor of points, and so now it knows how to behave with that.
00:15:21.440 | Object detection: now I change my dependent variable to bounding boxes, and
00:15:26.400 | you can see I've got my bounding boxes here; text; and so forth.
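The swap-one-type pattern looks roughly like this with the released v2 blocks (MaskBlock, PointBlock, and BBoxBlock are the released names; the getters here are hypothetical placeholders):

```python
from fastai.vision.all import *

# Segmentation: relative to the image-classification block, only Y changes
camvid = DataBlock(
    blocks=(ImageBlock, MaskBlock(codes=codes)),  # codes: your class names
    get_items=get_image_files,
    splitter=RandomSplitter(),
    get_y=get_mask_fn)  # hypothetical: maps an image file to its mask file
# Key points: blocks=(ImageBlock, PointBlock)
# Object detection: blocks=(ImageBlock, BBoxBlock, BBoxLblBlock)
```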
00:15:35.720 | So, actually, going back, I have a couple of questions, if that's okay. Yeah. So, in the
00:15:42.800 | code, you've got the X's and Y's, and it sounds like these
00:15:48.600 | different data types roughly conform to a protocol? Yep, we're going to get to that in
00:15:53.400 | a moment; absolutely, that's an excellent way to think of it. And actually, this is
00:15:59.760 | the way it looked about three weeks ago; now it looks even more like a
00:16:02.640 | protocol. So yes, this is where it all comes from, which is
00:16:08.800 | the foundation APIs, and this is the bit that I think is the most relevant to
00:16:12.840 | Swift, and a lot of this I think would be a lot easier to write in Swift. So the
00:16:21.160 | first thing that we added to PyTorch was object-oriented tensors. For too long
00:16:28.720 | we've all been satisfied with a data type called tensor, which has no
00:16:34.840 | semantics to it; and yet those tensors actually represent something, like a
00:16:39.880 | sentence, or a picture of a cat, or a recording of somebody saying something.
00:16:46.920 | So why can't I take one of those tensors and say .flip, or .rotate, or
00:16:53.220 | .resample, or translate it to German? Well, the answer is: you can't, because it's
00:17:00.160 | just a tensor, without a type. So we have added types to tensors: you can now
00:17:08.000 | have a TensorImage, a TensorPoint, a tensor of bounding boxes, and you can define a
00:17:13.520 | flip-left-right for each. And so this is some of the source code from the
00:17:17.540 | computer vision library we've written, so that now you can say flip_lr and it flips the
00:17:24.400 | puppy; and if it were key points, it would flip the key points; if it were a
00:17:29.560 | bounding box, it would flip the bounding boxes; and so forth.
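A minimal sketch of the idea in plain PyTorch (the released fastai classes are TensorImage, TensorPoint, and TensorBBox; this standalone version is just to show type-appropriate behaviour):

```python
import torch

class TensorImage(torch.Tensor): pass
class TensorPoint(torch.Tensor): pass

def flip_lr(x):
    # Type-appropriate behaviour: mirror pixels vs. negate x-coordinates
    if isinstance(x, TensorImage): return x.flip(-1)
    if isinstance(x, TensorPoint):
        x = x.clone(); x[..., 0] = -x[..., 0]; return x
    raise TypeError(f"flip_lr not defined for {type(x)}")

img = torch.rand(3, 64, 64).as_subclass(TensorImage)
flipped = flip_lr(img)  # mirrors the image; points would flip differently
```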
00:17:34.520 | So this is an example of how tensors which carry around semantics are nice. It's also nice that I
00:17:39.880 | didn't just say .show, right? So .show is something that's defined for all
00:17:45.480 | fastai v2 tensor types, and it will just display that tensor. It could even
00:17:52.440 | be a tuple containing a tensor and some bounding boxes and some bounding-box
00:17:57.040 | classes; whatever it is, it will be able to display it, and it will be able to convert
00:18:02.480 | it into batches for modeling, and so forth. So, with that, we can
00:18:12.260 | now create, for example, a random transformation called FlipItem, and we
00:18:17.360 | can say that the encoding of that random transformation is defined for a Pillow
00:18:22.520 | image or any tensor type, and in each case the implementation is simply to
00:18:28.080 | call x.flip_lr(). Or we could do the dihedral symmetry transforms in the same
00:18:33.440 | way: before each call, we grab a random number between zero and seven to decide which
00:18:39.420 | of the eight transposes to do, and then encodes calls x.dihedral() with
00:18:47.600 | the number we just got. And so now we can call that transform a bunch of times, and
00:18:54.260 | each time we'll get back a different random augmentation. So a lot of these
00:18:58.300 | things become nice and easy.
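In the released library, that random transform looks roughly like this (a sketch based on fastai v2's RandTransform; the pre-release spelling may differ):

```python
from fastai.vision.all import *

class FlipItem(RandTransform):
    "Randomly flip with probability p; dispatches on the input's type."
    def __init__(self, p=0.5): super().__init__(p=p)
    def encodes(self, x:(PILImage, TensorImage)): return x.flip_lr()
```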
00:19:03.520 | Hey Jeremy, Maxim asked: why is TensorImage a tensor which is an image type?
00:19:11.440 | Why isn't a tensor just the backing data structure; that is, why not have a different
00:19:17.780 | type named Image, I guess, that has a tensor inside of it? Do you mean: why
00:19:22.920 | inherit rather than compose? Apparently, yes. Yeah, so, inheritance: I mean, you
00:19:34.160 | can do both, and you can create identical APIs; inheritance just has the benefit
00:19:38.720 | that all the normal stuff you can do with a tensor, you can do with a tensor
00:19:42.200 | that happens to be an image. Just because a tensor is an image doesn't
00:19:45.540 | mean you now don't want to be able to do fancy indexing on it, or do an LU
00:19:50.840 | decomposition of it, or stack it with other tensors across some axis. So basically a
00:19:59.240 | TensorImage ought to have all the behavior of a tensor plus additional
00:20:04.560 | behavior; so that's why we used inheritance. We have a version that uses
00:20:09.360 | composition as well, and it uses Python's nice __getattr__ functionality to pass on
00:20:16.480 | all of the behavior of tensor, but it comes out more nicely in Python
00:20:23.980 | when you do inheritance. And actually, the PyTorch team has decided to officially
00:20:30.240 | implement semantic tensor subtypes now, and so hopefully in the next version of
00:20:35.240 | PyTorch you won't have to use the extremely ugly hacks that we had to use
00:20:39.640 | to make this work, and hopefully you'll see some of these ideas
00:20:48.080 | brought over into TorchVision. So how does the type propagate? If you do
00:20:54.720 | arithmetic on an image tensor, do you get an image tensor back? So Chris and I had a conversation about this a few
00:21:03.280 | months ago, and I said I'm banging my head around this issue of types not carrying
00:21:09.680 | around their behavior, and Chris casually mentioned: oh yes, that thing is called
00:21:13.500 | higher-kinded types. So I went home and read up on one of those phrases that I
00:21:17.920 | thought only functional-programming dweebs talked about and that I would never care about, and it turns out it actually
00:21:25.440 | matters a lot. It's basically the idea that if you have a TensorImage
00:21:28.360 | and you add one to it, you want to get back a TensorImage, because it should be
00:21:33.440 | an image that's a bit brighter, rather than something that loses its type. So we
00:21:38.200 | implemented our own (again, hacky, partial) higher-kinded-type implementation in
00:21:44.440 | fastai v2, so for nearly anything you do to a tensor of a subtype, you will
00:21:53.040 | get back the correctly subtyped tensor.
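In current PyTorch, tensor subclasses propagate through most operations, which is exactly the behaviour being described (a small illustration; the earlier versions discussed here needed the hacks mentioned above):

```python
import torch

class TensorImage(torch.Tensor): pass

img = torch.rand(3, 4, 4).as_subclass(TensorImage)
out = img + 1              # arithmetic on the subtype...
print(type(out).__name__)  # ...returns TensorImage, not plain Tensor
```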
00:21:58.480 | I mean, I saw the PyTorch team recently talking about their named-indexing extensions for
00:22:04.640 | their tensors as well, and I think they have a similar kind of challenge there,
00:22:08.160 | where when you start doing arithmetic and other things like that on a tensor that has
00:22:12.480 | named dimensions, you want to propagate those along. Yeah, so we haven't
00:22:19.360 | started using that yet, because it hasn't quite landed as stable, but we
00:22:27.360 | talked to the PyTorch team at the DevCon, and we certainly are planning to
00:22:32.640 | bring these ideas together; they're orthogonal but related concerns. Yeah, I just
00:22:37.440 | mean that I assume that feature has the same problem and the same challenge.
00:22:41.760 | I assume so. It would be interesting to see what they do. Yeah, it would. So, you
00:22:53.280 | know, it's kind of nice: not only do we get to say .show_batch, you can
00:22:57.440 | even go .show_results, and in this case it knows what the independent variable's
00:23:03.560 | type is, it knows what the dependent variable's type is, and it even knows
00:23:07.080 | things like: hey, for a classification task, those two things should be the same,
00:23:10.800 | and if they're not, by default I will highlight them. So these lower-level
00:23:15.760 | foundations are the things that drive our ability to easily add this higher-level
00:23:19.800 | functionality. So this is the kind of ugly stuff we wouldn't
00:23:27.280 | have to do in Swift: we had to write our own type-dispatch system, so that we can
00:23:33.160 | annotate things with types, and those type annotations are actually semantic.
00:23:36.880 | And so we now have the joyfully modern idea of function overloading in Python,
00:23:44.920 | which has made life a lot easier; Swift, of course, already has that.
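The mechanism now lives in fastcore; a minimal illustration of annotation-driven overloading (the example functions are hypothetical):

```python
from fastcore.dispatch import typedispatch

@typedispatch
def show(x: int): return f"int: {x}"

@typedispatch
def show(x: str): return f"str: {x}"

# The implementation chosen depends on the runtime type of the argument
print(show(3), show("cat"))
```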
00:23:51.760 | Do you have many users that are using this yet? It's still pre-release, it's not even alpha, but there
00:23:59.080 | is an enthusiastic early-adopter community using it. So, for example,
00:24:07.200 | the user-contributed audio library has already been ported to it. I've also
00:24:13.960 | built a medical-imaging library on top of it, and have written a series of five
00:24:17.720 | notebooks showing how to do CT-scan analysis with it. So it's kind of like, it
00:24:25.680 | works. And I was curious what your users think of it, because there's this
00:24:33.200 | very strongly held conception that Python folks hate types, and you're kind of
00:24:39.600 | providing a little bit of typing; I'm curious how they react to that. The
00:24:44.440 | extremely biased subset of early-adopter fastai enthusiasts who are using it
00:24:50.040 | love it, and they tend to be people who have gone pretty deep in the past. So,
00:24:55.640 | for example, my friend Andrew Shaw, who wrote something called MusicAutobot,
00:25:00.240 | which is one of the coolest things in the world, in case you haven't
00:25:04.160 | seen it yet: it's something where you can generate music using a neural
00:25:09.360 | network; you can put in some melodies and some chords and it will auto-complete
00:25:13.360 | some additional melodies and chords, or you can put in a melody and it will
00:25:17.320 | automatically add chords, or you can add chords and it will create a melody. He had
00:25:23.240 | to write his own MIDI library, fastai.midi; he rewrote it in v2, and he said it's
00:25:30.220 | just so, so much easier thanks to those mid-tier APIs. At this stage,
00:25:39.080 | I was just going to jump in quickly: I've been helping with some of
00:25:45.000 | the audio stuff, and it's been really awesome. It makes things a
00:25:52.240 | lot more flexible than version 1, so that's probably my favorite thing
00:25:57.120 | about it: everything can be interchanged; nothing is like, well, it's got
00:26:02.720 | to be this way because that's how it is. Another piece of the
00:26:12.280 | foundation is the partially reversible, composed, function pipeline dispatched
00:26:17.640 | over collections, which really rolls off the tongue; we call them Transform and
00:26:21.900 | Pipeline. Basically, the idea is that the way you want function dispatch
00:26:31.960 | to work, and function composition to work, in deep learning is a little different
00:26:39.040 | from other places. There are a couple of things. The first is that you often want to
00:26:44.520 | dispatch over tuples. What I mean by that is: if you have a function called
00:26:51.240 | flip_lr, and you have a tuple representing a mini-batch, where your
00:26:58.880 | independent variable is a picture and your dependent variable is a set of
00:27:02.560 | bounding boxes, then if you say flip_lr on that tuple, you would expect both
00:27:08.520 | the X and the Y to be flipped, and to be flipped with the type-appropriate
00:27:14.680 | method. So our transforms will automatically send each element of a
00:27:21.800 | tuple to the function separately, and dispatch according to their types
00:27:27.160 | automatically. We've mentioned type retention, so we need the kind of basic
00:27:31.720 | higher-kinded-type stuff. One interesting thing is that you need not only encoding,
00:27:39.040 | in other words applying the function; you often need to be able to decode, which
00:27:44.400 | is to kind of un-apply the function. So, for example, a categorization transform
00:27:49.600 | would take the word dog and convert it to the number one, perhaps, which is what
00:27:55.960 | you need for modeling; but then when your predictions come back, you need to know
00:27:59.640 | what one represents, so you need to reverse that transform and turn one back
00:28:05.160 | into dog. Often those transforms also need data-driven setup: for example, in
00:28:11.720 | that example of dog becoming one, there needs to be something that actually
00:28:15.720 | creates that vocab, automatically recognizing what all the possible
00:28:19.360 | classes are, so it can create a different index for each one and then apply that
00:28:24.280 | to the validation set. And quite often these transforms also have some kind of
00:28:29.480 | state, such as the vocab.
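That encode/decode/setup triple is exactly the shape of fastai v2's Transform class; a simplified Categorize-style sketch (the real library version handles more cases):

```python
from fastcore.transform import Transform

class Categorize(Transform):
    "Map classes to indices, reversibly (simplified sketch)."
    def setups(self, items):
        self.vocab = sorted(set(items))           # data-driven setup
        self.o2i = {v: i for i, v in enumerate(self.vocab)}
    def encodes(self, o): return self.o2i[o]     # 'dog' -> 1
    def decodes(self, i): return self.vocab[i]   # 1 -> 'dog'

tfm = Categorize()
tfm.setup(['cat', 'dog', 'cat'])
print(tfm('dog'), tfm.decode(1))  # -> 1 dog
```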
00:28:37.200 | So we built this bunch of stuff that builds on top of
00:28:44.320 | each other. At the lowest level is a class called Transform, which is a callable
00:28:50.240 | that also has a decode, does the type-retention, higher-kinded-type thing, and
00:28:54.820 | does the dispatch over tuples by default. Then the Pipeline is something that
00:29:01.560 | does function composition over Transforms, and it knows about, for example,
00:29:07.520 | setting up transforms; setting up transforms in a pipeline is a bit
00:29:13.120 | tricky, because you have to make sure that at each level of the pipeline only
00:29:18.120 | the previous steps have been applied before you set up the next step, so it does
00:29:22.600 | little things like that. Then we have something that applies a pipeline to a
00:29:28.120 | collection, to give you an indexable, lazily transformed collection; then
00:29:32.560 | you can do those in parallel to get back an independent and a dependent
00:29:40.000 | variable, for instance. And then finally we've built a data loader which will apply these things in parallel and create collated batches.
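In terms of the released v2 names, the layering stacks up roughly like this (a hedged sketch; path is assumed to point at a folder of images):

```python
from fastai.vision.all import *

# Pipeline composes Transforms; TfmdLists lazily applies a pipeline to a
# collection; TfmdDL collates the results into batches in parallel
files = get_image_files(path)
tls = TfmdLists(files, [PILImage.create, Resize(128), ToTensor],
                splits=RandomSplitter()(files))
dl = TfmdDL(tls.train, bs=8)  # batches, collated, with decode support
```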
00:29:51.400 | So, in the end, all this stuff makes a lot of things much easier. For example, the language-model
00:29:56.120 | data loader in fastai v1 was, like, pages of code; in TensorFlow it's pages of code;
00:30:03.320 | in fastai v2 it's less than a screen of code, by leveraging these
00:30:09.840 | powerful abstractions and foundations. So then, finally, and again this is something
00:30:17.960 | I think Swift will be great for: we worked really hard to make everything
00:30:22.680 | extremely well optimized. So, for example, for pre-processing in natural language
00:30:27.160 | processing, we created a parallel generator in Python, to which you can
00:30:34.280 | basically pass a class that defines some setup and a call, and it can
00:30:39.400 | automatically parallelize that. So, for example, tokenization is done in parallel,
00:30:47.040 | in a pretty memory-efficient way.
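fastai's actual parallel generator lives in the library, but the shape of the idea is simple to sketch with the standard library (an illustration, not fastai's code):

```python
from concurrent.futures import ProcessPoolExecutor

class Tokenizer:
    "Setup happens once, in __init__; __call__ does the per-item work."
    def __init__(self): self.lower = True       # stand-in for real setup
    def __call__(self, text):
        return text.lower().split() if self.lower else text.split()

def tokenize_all(texts, n_workers=4):
    # chunksize keeps inter-process traffic, and therefore memory, low
    with ProcessPoolExecutor(n_workers) as ex:
        yield from ex.map(Tokenizer(), texts, chunksize=64)
```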
00:30:55.800 | But perhaps the thing I'm most excited about, both in Python and Swift, is the optimized pipeline running on the GPU.
00:31:03.840 | Pretty much all of the transforms we've done can, and by
00:31:11.040 | default do, run on the GPU. So, for example, the flip-left-right I showed
00:31:16.760 | you earlier will actually run on the GPU; warp as well, zoom as well, even
00:31:22.440 | things like crop. So one of the basics of this is the affine coordinate transform,
00:31:31.240 | which uses affine_grid and grid_sample, which are very powerful PyTorch
00:31:37.160 | functions. They would be great things to actually write with Swift for TensorFlow's
00:31:43.800 | new metaprogramming, because they don't exist in TensorFlow, or at least not in
00:31:49.600 | any very complete way. But with these basic ideas we can create this
00:31:54.720 | affine coordinate transform that lets us do a very wide range of data
00:31:59.560 | augmentations in parallel on the GPU.
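Those two functions compose like this for a batched rotation on the GPU (a small self-contained example of the mechanism):

```python
import math, torch
import torch.nn.functional as F

device = 'cuda' if torch.cuda.is_available() else 'cpu'
imgs = torch.rand(16, 3, 64, 64, device=device)   # a whole batch at once

t = math.radians(10)                              # rotate every image by 10 degrees
theta = torch.tensor([[math.cos(t), -math.sin(t), 0.],
                      [math.sin(t),  math.cos(t), 0.]], device=device)
theta = theta.expand(imgs.size(0), 2, 3)

grid = F.affine_grid(theta, imgs.size(), align_corners=False)  # sampling coords
out  = F.grid_sample(imgs, grid, align_corners=False)          # resample batch
```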
00:32:04.160 | For those of you that know about the DALI library: what we've created provides a lot of the same benefits as DALI; it's
00:32:10.480 | pretty similar in terms of its performance, but the nice thing is that all
00:32:14.880 | the stuff you write, you write in Python, not in CUDA. So with DALI, if they
00:32:19.760 | don't have the exact transformation you want, and there's a pretty high chance
00:32:24.360 | that they won't, then you're stuck; whereas with fastai v2 you can write your
00:32:30.800 | own in a few lines of Python, and you can test it out in a Jupyter notebook. It
00:32:36.800 | makes life super easy. So for this kind of stuff, you know, I feel like, because Swift
00:32:44.880 | is, you know, a much faster, more hackable language than Python, or at least
00:32:51.800 | hackable in the sense of performance (I guess not as hackable in terms of its
00:32:55.720 | type system, necessarily), I feel like we can build even more
00:33:02.480 | powerful foundations and pipelines: you know, a real Swift for TensorFlow
00:33:10.760 | computer vision library, leveraging the metaprogramming and
00:33:15.720 | leveraging Swift Numerics, stuff like that, I think would be super cool.