An overview of fastai v2
Chapters
4:28 An infinitely customizable training loop
10:46 Optimizer with pluggable stats and steppers
13:24 Image classification
26:25 Providing the basic features needed for working with model inputs
Basically, every time Sylvain and I found something that didn't quite work the way we wanted at any part of the stack, we wrote our own. So it's kind of like building something with no particular deadline, trying to do everything the very best we can. The layered API of fastai v2 starts at the applications layer, which is where most beginners will start, and it looks a lot like fastai v1, which is the released version of the software people have seen before. But in v2 everything is rewritten from scratch; it's totally new, with no code borrowed, even though the top-level API looks quite similar.
The idea is that in four lines of code you can create a state-of-the-art computer vision classifier, including transfer learning. With nearly the same four lines of code (five in this case, because we're also displaying the data) you can create a state-of-the-art segmentation model. And when I say state-of-the-art, I mean it: for example, this segmentation model is, to the best of my knowledge, still better than any published result on this particular CamVid dataset. So these five lines of code are super good. They include a line of code which, if you say show_batch, will display your data in an appropriate format, in this case showing you a picture with the color-coded segmentation pixels overlaid on top.
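As a concrete sketch, here is roughly what those few lines look like using the released fastai v2 names (hedged: the exact API at the time of this talk differed slightly, and `cnn_learner` was later renamed `vision_learner`):

```python
from fastai.vision.all import *

# Build DataLoaders for the Oxford-IIIT Pets dataset from filename patterns
path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_re(
    path, get_image_files(path), pat=r'(.+)_\d+.jpg$', item_tfms=Resize(224))
dls.show_batch()   # displays the data in a type-appropriate format

# Transfer learning from a pretrained ResNet
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
```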
The same basic four lines of code will do text classification. Here's the basis of ULMFiT, a system for transfer learning in natural language processing that we developed and wrote up along with Sebastian Ruder. As you can see, this is working on IMDb in a single epoch taking four minutes, and the accuracy here is basically what was state-of-the-art as of a couple of years ago. Tabular or time-series analysis: same deal. With basically the same few lines of code you'll get a great result from your tabular data, and ditto for collaborative filtering.
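A hedged sketch of the text version, again with the released fastai v2 names:

```python
from fastai.text.all import *

# IMDb sentiment classification with a pretrained AWD-LSTM (the ULMFiT recipe)
path = untar_data(URLs.IMDB)
dls = TextDataLoaders.from_folder(path, valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, metrics=accuracy)
learn.fine_tune(1)
```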
So the high-level API for fastai v2 is designed so that, regardless of what application you're working on, you can get a great result using sensible defaults and carefully selected hyperparameters, largely done for you automatically for the most common kinds of problems people look at. That bit doesn't look that different from v1, but understanding how we get there is interesting, and involves going deeper and deeper.

This approach does work super well, partly because it's based on quite a few years of research into the best ways to solve various problems along the way, and when people actually try fastai they're often surprised. One person posted on our forum that they'd been working in TF2 for a while and couldn't figure out why all of their models were suddenly working much better; the answer is basically that they were getting all these nicely curated best practices. Somebody else on Twitter saw that and said: yep, we found the same thing. We tried TensorFlow, spent months tweaking, then switched to fastai, and a couple of days later we were getting better results. So these carefully curated defaults and algorithms, and high-level APIs that do things right for you the first time, can give even experienced practitioners better results, faster. But it's actually the other pieces that I think are more interesting for a Swift conversation, because the deeper we go into how we make that work, the more stuff you'll see which I think will be a great fit with Swift.
So the mid-layer API is largely new to fastai. Actually, I guess the foundation layer is new; the mid-layer I'd say is more rewritten for v2. It contains some of the things that make those high-level APIs easy. One of the most interesting bits is the training loop itself, and I thank Sylvain for the set of slides we have for it. This is what a training loop looks like in PyTorch: we calculate some predictions, we get a loss, we do a backward pass to get the gradients, we do an optimizer step, and then from time to time we zero the gradients, depending on whether we're accumulating them. So that's the loop: run the model, get the loss, do the gradients, step the optimizer, and do that a bunch of times.
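In code, that basic loop is just this (with `model`, `loss_func`, `opt`, and `train_dl` assumed to be defined):

```python
# The standard PyTorch training loop: predict, loss, backward, step, zero
for xb, yb in train_dl:
    pred = model(xb)
    loss = loss_func(pred, yb)
    loss.backward()
    opt.step()
    opt.zero_grad()   # or only every n batches, if accumulating gradients
```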
But if you want to do something interesting, you'll need to add something to that loop: keeping track of your training statistics in TensorBoard or fastprogress or whatever; scheduling hyperparameters in various ways; adding various kinds of regularization; mixed-precision training; GANs. So this is a problem: either you have to write a new training loop every time you want a different tweak, and making all those tweaks work together becomes incredibly complicated, or you try to write one training loop which does everything you can think of. This is the training loop for fastai 0.7, which only did a tiny subset of the things I just mentioned, and it was ridiculous. Or, you can add callbacks at each step.
Now, the idea of callbacks has been around in deep learning APIs for a long time, but what's very different about fastai is that every callback is actually a two-way callback. It can read absolutely everything: gradients, parameters, data, and so forth. And it can also write them, so it can change anything at any time. So we say the callbacks are infinitely flexible. We feel pretty confident in that claim because the training loop in fastai has not needed to be modified for any of the tweaks I showed you before; even the entirety of training GANs can be done in a callback. So basically we switch out our basic training loop and replace it with one with the same five steps, but with callbacks between every step. That means, for example, if you want a scheduler, you can define a begin_batch that sets the optimizer's learning rate to some function. If you want early stopping, you can write an on-epoch-end that checks the metrics and stops training. You can do parallel training: set up DataParallel at the start and take it off again at the end of training. For gradient clipping, you have access to the parameters themselves, so you can clip the gradient norms at the end of the backward step, and so forth.
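As a hedged sketch of what such callbacks look like (event names roughly as in fastai v2 around the time of this talk; `begin_batch` was later renamed `before_batch`, and the schedule here is a made-up example):

```python
from fastai.vision.all import *   # brings in Callback, CancelFitException, etc.

class LinearSched(Callback):
    "Sketch: linearly decay the learning rate over training."
    def __init__(self, lr_max): self.lr_max = lr_max
    def begin_batch(self):
        if self.training:   # pct_train and opt are read from the Learner
            for h in self.opt.hypers: h['lr'] = self.lr_max * (1 - self.pct_train)

class EarlyStop(Callback):
    "Sketch: stop as soon as the tracked value stops improving."
    def __init__(self): self.best = float('inf')
    def after_epoch(self):
        last = self.recorder.values[-1][0]   # assumes this is the tracked value
        if last > self.best: raise CancelFitException()
        self.best = last
```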
All of these different things have been written as fastai callbacks, including, for example, mixed precision: all of NVIDIA's recommendations for mixed-precision training will be applied automatically if you just add a to_fp16 at the end of your Learner call. And, really importantly, all of those mixed-precision pieces can be combined with multi-GPU, one-cycle training, gradient accumulation, and so forth.
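For example, with the released names:

```python
# Mixed precision composes with transfer learning and one-cycle training
learn = cnn_learner(dls, resnet34, metrics=accuracy).to_fp16()
learn.fit_one_cycle(5)
```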
And so trying to create a state-of-the-art model, which involves combining state-of-the-art regularization with mixed precision, distributed training, and so on, is a really, really hard job. With this approach, it's just a single extra line of code to add each feature, and they are all explicitly designed, and tested, to work with each other. So for instance, here is mixup data augmentation, an incredibly powerful method that has powered lots of state-of-the-art results, and as you can see it's well under a screen of code. By comparison, here is the version of mixup from the paper: not only is it far longer, but it only works with one particular dataset, one particular optimizer, and one particular kind of metric, and it's full of all kinds of assumptions.
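A heavily simplified sketch of the mixup idea as a callback (hedged: the real fastai MixUp also mixes the targets/loss and handles several details this toy version skips):

```python
from fastai.vision.all import *
import torch

class SimpleMixUp(Callback):
    "Toy sketch: blend each batch with a shuffled copy of itself."
    def __init__(self, alpha=0.4): self.distrib = torch.distributions.Beta(alpha, alpha)
    def begin_batch(self):
        if not self.training: return
        xb, = self.xb                              # assumes a single input tensor
        lam = self.distrib.sample().to(xb.device)
        shuffle = torch.randperm(xb.size(0), device=xb.device)
        self.learn.xb = (lam*xb + (1-lam)*xb[shuffle],)
        # NB: the real callback also mixes the corresponding labels/loss
```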
So that's an example of the mid-tier APIs. Another one is the optimizer. It looks like there have been lots and lots of different optimizers appearing in the last year or two, but it turns out they're all minor tweaks on each other. Most libraries don't write them that way, though. For example AdamW, also known as decoupled-weight-decay Adam, was added to PyTorch quite recently, in the last month or two, and it required writing a whole new class and a whole new step to implement; it landed something like two or three years after the paper was released. Fastai's implementation, on the other hand, involves a single extra function containing two lines of code, plus this little bit of gray here, so it's about two and a half to three lines of code to implement the same thing. What we did was refactor the idea of an optimizer: we looked at what's different in each of these state-of-the-art optimizers that have appeared recently, and made it so each of those differences can be added and removed by changing just two things.
Stats and steppers. A stat is something that you measure during training, such as the gradients or the gradients squared, perhaps with dampening or momentum or whatever; a stepper is something that uses those stats to change the weights in some way. You can combine those things together, and by combining them we've been able to implement all these different optimizers.
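A hedged sketch of the shape of that API (close to, but not verbatim, the fastai v2 source):

```python
import torch

def average_grad(p, mom, grad_avg=None, **kwargs):
    "A *stat*: maintain a moving average of each parameter's gradients."
    if grad_avg is None: grad_avg = torch.zeros_like(p.grad.data)
    return {'grad_avg': grad_avg.mul_(mom).add_(p.grad.data)}

def weight_decay(p, lr, wd, **kwargs):
    "A *stepper*: decoupled (AdamW-style) weight decay, the whole tweak."
    if wd != 0: p.data.mul_(1 - lr*wd)
```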
For instance, the LAMB optimizer, which came out of Google and was super cool at reducing training time from three days to 76 minutes, we were able to implement in this tiny piece of code. One of the nice things is that when you compare it to the math, it really looks almost line-for-line identical, except ours is a little bit nicer because we refactored some of the math. That makes it really easy to do research as well, because you can quite directly bring the equations across into your code.
Then the last of the mid-tier APIs is the data block API, which is something we had in version 1 as well. But when we were porting it to Swift, we had an opportunity to rethink it, and Alexis Gallagher in particular helped us rethink it in a more idiomatic, Swifty way, and it came out really nicely. So then we took the result of that and ported it back into Python, and we ended up with something quite a bit nicer; there's been a nice interaction and interplay between fastai in Python and SwiftAI in Swift, in terms of helping each other's APIs. Basically, the data block API is something where you define each of the key things the program needs to know to flexibly get your data into a form you can put in a model: what type of data you have, how to get that data, how to split it into a training set and a validation set, and then how to put that all together into a data bunch. That's just a simple little class, literally about four lines of code, which has the validation set and the training set in one place.
So with a data block you just say: OK, for my types, I want to create a black-and-white Pillow image for my x and a category for my y; to get the list of files, use this function; to split those files into training and validation, use this function, which looks at the grandparent directory name; and to get the labels, use this function, which uses the parent directory name. And that's enough to give you MNIST, for instance.
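In code, and this is close to the real fastai v2 API, though in released versions the result is called `DataLoaders` rather than a DataBunch:

```python
from fastai.vision.all import *

mnist = DataBlock(
    blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),  # types of x and y
    get_items=get_image_files,            # how to get the data
    splitter=GrandparentSplitter(),       # train/valid from grandparent dir name
    get_y=parent_label)                   # label from parent dir name
dls = mnist.dataloaders(untar_data(URLs.MNIST_TINY))
dls.show_batch()
```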
Once you've done this you end up with a data bunch, and as I mentioned before, everything has a show_batch. One of the nice things is that it makes it very easy to look at your data, regardless of whether it's tabular, collaborative filtering, vision, text, or even audio; if it were audio, it would show you a spectrogram and let you play the sound. You can do custom labeling with data blocks by using, for example, a regular-expression labeler. You can get your labels from an external file or data frame, and they can be multi-labels; this example knows it's a multi-label classification task, so it automatically puts a semicolon between the labels. Again, it's still basically just three lines of code to define the data block. Here's a data block for segmentation, and you can see that really the only thing I had to change was that my dependent variable has been changed from category to Pillow mask; again, show_batch works automatically, and we can train a model from that straight away as well. You can do key points: here I've just changed my dependent variable to TensorPoint, and now it knows how to behave with that. For object detection, I change my dependent variable to bounding box, and you can see I've got my bounding boxes here. Text, and so forth.
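A hedged sketch of those variations (real fastai v2 block names; `codes` and `label_func` here are hypothetical placeholders):

```python
from fastai.vision.all import *

codes = ['background', 'road']        # hypothetical mask class names
def label_func(fn):                   # hypothetical image -> mask-path lookup
    return fn.parent.parent/'masks'/fn.name

# Segmentation: only the dependent block changes
seg = DataBlock(blocks=(ImageBlock, MaskBlock(codes)),
                get_items=get_image_files, get_y=label_func)
# Key points: blocks=(ImageBlock, PointBlock)
# Object detection: blocks=(ImageBlock, BBoxBlock, BBoxLblBlock)
```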
So, actually, going back, I have a couple of questions, if that's OK. In the code you've got, the x's and y's, these different data types, it sounds like they roughly conform to a protocol?

Yep, we're going to get to that in a moment. Absolutely, that's an excellent way to think of it. Actually, this is the way it looked about three weeks ago; it now looks even more like a protocol.
So yes, this is where it all comes from: the foundation APIs. This is the bit that I think is the most relevant to Swift, and a lot of it, I think, would be a lot easier to write in Swift. The first thing we added to PyTorch was object-oriented tensors. For too long we've all been satisfied with a data type called Tensor which has no semantics to it. Those tensors actually represent something: a sentence, a picture of a cat, a recording of somebody saying something. So why can't I take one of those tensors and say .flip, or .rotate, or .resample, or .translate_to_german? The answer is: you can't, because it's just a tensor without a type. So we added types to tensors. You can now have a TensorImage, a TensorPoint, a TensorBBox, and you can define a flip-left-right for each. This is some of the source code from the computer vision library we've written: you can now say flip_lr and it flips the puppy; if it were key points, it would flip the key points; if it were a bounding box, it would flip the bounding boxes; and so forth. So this is an example of how tensors which carry around semantics are nice. It's also nice that I didn't just say .show: .show is something that's defined for all fastai v2 tensor types, and it will just display that tensor. It could even be a tuple containing a tensor, some bounding boxes, and some bounding-box classes; whatever it is, it will be able to display it, and it will be able to convert it into batches for modeling, and so forth.
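A minimal sketch of the idea (hedged: fastai's real TensorImage and its type-retention machinery are considerably more involved):

```python
import torch

class TensorImage(torch.Tensor):
    "Sketch: a tensor that knows it's an image."
    def flip_lr(self): return self.flip(-1)   # flip the width axis

img = torch.rand(3, 64, 64).as_subclass(TensorImage)
flipped = img.flip_lr()
# The goal of the type-retention plumbing discussed later: img + 1 should
# stay a TensorImage (a slightly brighter image), not decay to a bare Tensor.
```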
So, with that, we can now create, for example, a random transformation called flip-item, and say that the encoding of that transformation is defined for a Pillow image or any tensor type, where in each case the implementation is simply to call x.flip_lr. Or we could do the dihedral symmetry transforms in the same way: before we're called, grab a random number between zero and seven to decide which of the eight transposes to do, and then encodes calls x.dihedral with the number we just got. And now we can call that transform a bunch of times, and each time we'll get back a different random augmentation. So a lot of these things become nice and easy.
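A hedged sketch of such a random transform, in the fastai v2 style where `encodes` is overloaded by type (the real class is called FlipItem, and `flip_lr` is a method fastai patches onto each type):

```python
from fastai.vision.all import *

class FlipLR(RandTransform):   # hypothetical name; fastai ships FlipItem
    "Randomly flip an item, with a type-appropriate implementation."
    def __init__(self, p=0.5): super().__init__(p=p)
    def encodes(self, x: PILImage):    return x.flip_lr()
    def encodes(self, x: TensorImage): return x.flip_lr()

tfm = FlipLR(p=0.5)
# calling tfm(img) repeatedly gives a different random result each time
```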
Hey Jeremy, Maxim asked: why isn't a tensor the backing data structure for an image type? TensorImage is a tensor which is an image type; why not have a separate type named Image that has a tensor inside it? Do you mean, why inherit rather than compose? Apparently yes.

So, inheritance: you can do both, and you can create identical APIs. Inheritance just has the benefit that all the normal stuff you can do with a tensor, you can do with a tensor that happens to be an image. Just because a tensor is an image doesn't mean you no longer want to do fancy indexing on it, or an LU decomposition, or stack it with other tensors along some axis. Basically, a TensorImage ought to have all the behavior of a tensor plus additional behavior, and that's why we used inheritance. We have a version that uses composition as well; it uses Python's nice __getattr__ functionality to pass on all of the behavior of a tensor, but it comes out more nicely in Python when you use inheritance. And actually, the PyTorch team has decided to officially implement semantic tensor subtypes now, so hopefully in the next version of PyTorch you won't have to use the extremely ugly hacks we had to use to make this work, and hopefully some of these ideas will be brought over into TorchVision.
So how does the type propagate? If you do arithmetic on an image tensor, do you get an image tensor back?

So, Chris and I had a conversation about this a few months ago. I said I was banging my head against this issue of types not carrying their behavior around, and Chris casually mentioned: oh yes, that thing is called higher-kinded types. That was one of those phrases I had thought only functional-programming dweebs talked about and that I would never care about, but it actually matters a lot. It's basically the idea that if you have a TensorImage and you add one to it, you want to get back a TensorImage, because it should be an image that's a bit brighter rather than something that loses its type. So we implemented our own, again hacky, partial higher-kinded-type implementation in fastai v2, so that with anything you do to a tensor of a subtype, you nearly always get back the correctly subtyped tensor.

I saw PyTorch recently talking about the named-indexing extensions for their tensors, and I think they have a similar challenge there: when you start doing arithmetic and other things on a tensor that has named dimensions, you want to propagate those along.

Yeah, we haven't started using that yet because it hasn't quite landed as stable, but we talked to the PyTorch team at the DevCon, and we're certainly planning to bring these ideas together; they're orthogonal but related concerns.

I just mean that I assume that feature has the same problem and the same challenge.

I assume so. It would be interesting to see what they do. Yeah, it would.
So, you know, it's kind of nice: not only do we get to say .show_batch, you can even go .show_results, and in this case it knows what the independent variable's type is, it knows what the dependent variable's type is, and it even knows things like: hey, for a classification task, those two things should be the same, and if they're not, by default I will highlight them. These lower-level foundations are what drive our ability to easily add that higher-level functionality. This is the kind of ugly stuff we wouldn't have to do in Swift: we had to write our own type-dispatch system so that we can annotate things with types and have those type annotations actually be semantic. So we now have the joyfully modern idea of function overloading in Python, which has made life a lot easier, and we already have that.
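That dispatch layer lives in fastcore; here is a toy sketch of the kind of overloading it enables (toy functions, not the fastai source):

```python
from fastcore.dispatch import typedispatch

@typedispatch
def flip(x: int): return -x          # toy overload for ints

@typedispatch
def flip(x: str): return x[::-1]     # toy overload for strings

flip(3), flip("abc")   # dispatches on the annotated type -> (-3, 'cba')
```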
Do you have many users that are using this yet?

It's still pre-release, not even alpha, but there is an enthusiastic early-adopter community using it. For example, the user-contributed audio library has already been ported to it. I've also built a medical imaging library on top of it and written a series of five notebooks showing how to do CT scan analysis with it. So it works.

I was curious what your users think of it, because there's this very strongly held conception that Python folks hate types, and you're providing a little bit of typing. I'm curious how they react to that.

The extremely biased subset of early-adopter fastai enthusiasts who are using it love it, and they tend to be people who have gone pretty deep in the past. For example, my friend Andrew Shaw, who wrote something called MusicAutobot, which is one of the coolest things in the world in case you haven't seen it yet: you can generate music using a neural network, put in some melodies and some chords and it will auto-complete additional melodies and chords, or put in a melody and it will automatically add chords, or add chords to create a melody. He had to write his own MIDI library, fastai.midi; he rewrote it in v2, and he said it's just so, so, so much easier, thanks to those mid-tier APIs.

I was just going to jump in quickly: I've been helping with some of the audio stuff, and it's been really awesome. It makes things a lot more flexible than version one. That's probably my favorite thing about it: everything can be interchanged; nothing is like, well, it's got to be this way because that's how it is.
Another piece of the foundation is the partially reversible composed function pipeline dispatched over collections, which really rolls off the tongue; we call them Transform and Pipeline. Basically, the idea is that the way you want function dispatch and function composition to work in deep learning is a little different from other places. There are a couple of things. The first is that you often want to dispatch over tuples. What I mean by that is: if you have a function called flip_lr, and you have a tuple representing a mini-batch where your independent variable is a picture and your dependent variable is a set of bounding boxes, then if you say flip_lr on that tuple you would expect both the x and the y to be flipped, each with the type-appropriate method. So our transforms automatically send each element of a tuple to the function separately, dispatching according to their types. We've also mentioned type retention, the basic higher-kinded-type behavior we need.
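Conceptually (a toy sketch, not the fastai source), tuple dispatch is just:

```python
def apply_tfm(f, b):
    "Map f over the elements of a tuple, so both x and y get transformed."
    return tuple(f(o) for o in b) if isinstance(b, tuple) else f(b)

# e.g. apply_tfm(flip_lr, (img, boxes)) flips the picture *and* the boxes,
# with flip_lr resolving to a type-appropriate implementation for each element
```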
One interesting thing is that alongside encoding, in other words applying the function, you often need to be able to decode, which is to kind of un-apply the function. For example, a categorization transform might take the word "dog" and convert it to the number one, which is what you need for modeling; but when your predictions come back, you need to know what one represents, so you need to reverse that transform and turn one back into "dog". Often those transforms also need data-driven setup: in that example of dog becoming one, something needs to create the vocab automatically, recognizing all the possible classes so it can assign a different index to each one, and then apply that to the validation set. And quite often these transforms also carry some kind of state, such as the vocab.
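A hedged sketch of a reversible transform with state (the real fastai Categorize also builds its vocab automatically in a setup step, as just described):

```python
from fastcore.transform import Transform

class SimpleCategorize(Transform):
    "Map labels to ints for modeling, and back again for display."
    def __init__(self, vocab):
        self.vocab = list(vocab)                       # state: the vocab
        self.o2i = {v: i for i, v in enumerate(self.vocab)}
    def encodes(self, o: str): return self.o2i[o]      # 'dog' -> 1
    def decodes(self, i: int): return self.vocab[i]    # 1 -> 'dog'

t = SimpleCategorize(['cat', 'dog'])
t('dog'), t.decode(1)   # -> (1, 'dog')
```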
So we built this bunch of stuff that builds on top of itself. At the lowest level is the Transform class: a callable which also has a decode, does the type-retention higher-kinded-type thing, and dispatches over tuples by default. The Pipeline does function composition over transforms, and it knows about, for example, setting up transforms. Setting up transforms in a pipeline is a bit tricky, because you have to make sure that at each level of the pipeline only the previous steps have been applied before you set up the next step; it handles little things like that. Then we have something that applies a pipeline to a collection, to give you an indexable, lazily transformed collection; you can run those in parallel to get back, say, an independent and a dependent variable. And finally we've built a data loader which will apply these things in parallel and create collated batches. In the end, all this stuff makes a lot of things much easier. For example, the language-model data loader was pages of code in fastai v1, and it's pages of code in TensorFlow; in fastai v2 it's less than a screen of code, by leveraging these powerful abstractions and foundations.
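A toy sketch of the composition idea (hedged: the real Pipeline also handles setup ordering, type retention, and tuple dispatch):

```python
class ToyPipeline:
    "Compose transforms; decode runs back through them in reverse."
    def __init__(self, tfms): self.tfms = tfms
    def __call__(self, x):
        for t in self.tfms: x = t(x)
        return x
    def decode(self, x):
        for t in reversed(self.tfms):
            if hasattr(t, 'decode'): x = t.decode(x)
        return x
```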
Then finally, and again this is something I think Swift will be great for, we worked really hard to make everything extremely well optimized. For example, for pre-processing in natural language processing, we created a parallel generator in Python: you pass it a class that defines some setup and a call, and it can automatically parallelize the work. Tokenization, for example, is done in parallel in a pretty memory-efficient way.
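A hedged sketch of that idea using only the standard library (fastai's actual parallel generator streams results from a user-supplied class with setup and call methods):

```python
from concurrent.futures import ProcessPoolExecutor

def parallel_map(f, items, n_workers=4, chunksize=64):
    "Lazily map f over items across processes, yielding results as they arrive."
    with ProcessPoolExecutor(n_workers) as ex:
        yield from ex.map(f, items, chunksize=chunksize)
```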
But perhaps the thing I'm most excited about, both in Python and in Swift, is the optimized pipeline running on the GPU. Pretty much all of the transforms we've done can, and by default do, run on the GPU. So, for example, the flip-left-right I showed you earlier will actually run on the GPU, as will warp, as will zoom, even things like crop. One of the foundations of this is the affine coordinate transform, which uses affine_grid and grid_sample. These are very powerful PyTorch functions which would be great things to actually write with Swift for TensorFlow's new metaprogramming, because they don't exist in TensorFlow, or at least not in any very complete way. With these basic pieces we can create an affine coordinate transform that lets us do a very wide range of data augmentations in parallel on the GPU.
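For example, a hedged sketch of a batched GPU flip built directly on those two functions:

```python
import torch
import torch.nn.functional as F

def batch_flip_lr(x):
    "Flip a whole (N,C,H,W) batch at once via an affine warp."
    theta = torch.tensor([[-1., 0., 0.],
                          [ 0., 1., 0.]], device=x.device)  # mirror the x-axis
    grid = F.affine_grid(theta.expand(x.size(0), 2, 3), list(x.size()),
                         align_corners=False)
    return F.grid_sample(x, grid, align_corners=False)
```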
For those of you who know about the DALI library: what we've created provides a lot of the same benefits as DALI, and it's pretty similar in terms of performance, but the nice thing is that everything you write, you write in Python, not in CUDA. With DALI, if they don't have the exact transformation you want, and there's a pretty high chance they won't, you're stuck; whereas with fastai v2 you can write your own in a few lines of Python and test it out in a Jupyter notebook. It makes life super easy.
So with this kind of stuff, I feel like, because Swift is a much faster, more hackable language than Python, or at least hackable in the sense of performance (not necessarily as hackable in terms of its type system), we can build even more powerful foundations and pipelines. A real Swift for TensorFlow computer vision library, leveraging the metaprogramming and leveraging Swift numerics and stuff like that, would, I think, be super cool.