Lesson 13 (2019) - Basics of Swift for Deep Learning
Chapters
0:00
6:35 What's the future of fastai?
13:57 What is Swift? The claims
18:05 Swift for TensorFlow
22:06 S4TF and TensorFlow?
43:47 Compiler optimizations
47:14 Float in Swift
00:00:00.000 |
Welcome everybody to lesson 13, also known as lesson six of part two, 00:00:10.320 |
the lesson in which we start talking about Swift. 00:00:14.120 |
Before we do, I wanted to mention a couple of cool things during the week 00:00:19.000 |
because lots of people are doing lots of cool things. 00:00:21.440 |
And in part two, I haven't done as much highlighting of these things, 00:00:24.400 |
but I thought it'd be nice to look at a couple of cool examples from this week. 00:00:29.000 |
Big congrats to Rob Gee, who said that 14 months ago, 00:00:33.520 |
he'd never done any machine learning or deep learning or Python or Maths 00:00:36.640 |
beyond high school, and he just competed in one of the top academic challenges 00:00:42.640 |
for computer vision machine learning and came first and second in the two 00:00:50.040 |
So congrats, Rob, and I thought this is a great example of, like, you know, 00:00:55.360 |
the kind of interesting things you can do because if you do an academic challenge, 00:00:58.720 |
like this, and if you do well, like Rob, you actually get the opportunity, 00:01:05.760 |
And so if you've never done an academic paper before, 00:01:09.480 |
You kind of have a certain publishing venue, and you get a bit of an insight 00:01:15.000 |
to the academic world, and I certainly found the same thing 00:01:18.040 |
that Rob points out here, which is that when you actually submit a paper 00:01:22.120 |
for the first time, you suddenly realize why so many academic papers aren't 00:01:25.760 |
that useful because they focus on weird things. 00:01:32.600 |
I also feel like I have a kind of a regular gig 00:01:38.040 |
in promoting the amazing work Elena Harley does because she does 00:01:40.680 |
so much great work, but this is yet another great piece of work that she's done. 00:01:44.000 |
You'll remember from the part one genomic stuff, and this is nice 00:01:48.760 |
because it's an example of looking at text data for the purpose of looking 00:01:59.880 |
at genomic information, and I just love this example. 00:02:04.120 |
And Elena has got a great walkthrough that you can find describing -- 00:02:10.800 |
It's the exact steps that we've just taken, and one of the things I love is she's 00:02:14.440 |
actually using whatever this version of fast AI is that we have in our X folder, 00:02:21.960 |
It's the stuff that we've built from scratch, and so it's nice to see that used in practice, 00:02:27.080 |
not just used, and it's not bad for a quickly thrown-together baseline. 00:02:30.920 |
It hits 15th out of 357 teams on this leaderboard, which she describes 00:02:38.400 |
as not a bad starting point, so not a bad starting point at all. 00:02:45.880 |
So rewind, start of lesson eight, we said we're going to try and recreate fast AI 00:02:55.880 |
And 26 days ago, Sylvain and I started the same thing for Swift, 00:03:04.920 |
except we actually had no ability to cheat because when we started, 00:03:08.920 |
we really were starting with the foundations. 00:03:12.880 |
There are no Swift data science modules, basically. 00:03:18.080 |
But there is stuff, as you'll see, for creating tensors and random number generators, 00:03:25.480 |
and you'll actually see we've been able to use Matplotlib, 00:03:28.240 |
which might surprise you, we'll show you why and how. 00:03:31.040 |
So this is like what we're going to do over the next two lessons, is revisit this. 00:03:36.200 |
Now, obviously, we're not going to go through every step in excruciating detail 00:03:39.640 |
because as you'll see, the vast majority of it is, hey, 00:03:42.040 |
this is almost identical to what we did in Python. 00:03:44.760 |
So what we're still going to do is dig in to the bits 00:03:47.960 |
that show us something interesting or different, of which there will be many. 00:03:53.520 |
But in the end, we're going to get here, which is, this is xresnet, 00:03:59.360 |
this is the res block from xresnet, and this, believe it or not, is the Swift version. 00:04:08.240 |
So we're going to end up in this beautiful situation. 00:04:18.880 |
Hopefully, it's going to make you feel pretty comfortable 00:04:23.200 |
And what about how we go and get all the data in there? 00:04:27.040 |
We're going to have to deal with all the TensorFlow data APIs 00:04:32.120 |
Well, no. Here is the data approach that we're going to use. 00:04:39.560 |
So we're actually going to show you how to build the data blocks API in Swift 00:04:46.160 |
So, you know, three weeks ago, when we started digging into this, 00:04:53.120 |
And I'm really thrilled to find that not only is it possible, 00:04:57.600 |
but we end up with code that looks, you know, wonderfully familiar 00:05:02.840 |
and has all the nice features that we've hopefully grown to love. 00:05:07.840 |
So to get to here, there's a lot of people I want to thank. 00:05:11.640 |
In particular, Chris Lattner, who I still don't understand 00:05:19.880 |
It seems like, you know, he has very strange judgment or something. 00:05:25.440 |
Yeah. But given that he did, we felt that we had to make sure 00:05:32.880 |
And then the whole Swift for TensorFlow team has actually made this project 00:05:37.880 |
And this totally wouldn't have happened without everybody pulling together. 00:05:43.000 |
Also in terms of bad judgment, Sylvain has, you know, obviously, 00:05:48.080 |
we all know made the mistake of deciding to spend his time working with me 00:05:53.240 |
on fast AI stuff, and a few weeks ago I said, guess what? 00:05:58.000 |
We're going to rebuild everything from scratch in Swift. 00:06:01.840 |
And rather than running away screaming, he said, okay, when do we start? 00:06:07.400 |
So thank you, Sylvain, and he has built nearly all of these notebooks 00:06:11.240 |
in three weeks and learned Swift, which is not bad. 00:06:14.320 |
Thanks, Alexis, for the value types discussion. 00:06:18.840 |
Pedro has built something which makes the Swift packaging system 00:06:24.300 |
hurt slightly less than it otherwise does, which is a lot. 00:06:28.480 |
And also the whole San Francisco fast AI study group who's given us lots 00:06:32.040 |
of valuable feedback over the last few weeks. 00:06:42.000 |
There is a blog post we will link to, which many of you, 00:06:46.600 |
I'm sure, have read, about why we're doing this crazy hare-brained Swift thing. 00:06:52.920 |
But I particularly wanted to mention, like, two things. 00:06:57.560 |
The first is we actually don't really have a choice, right? 00:07:01.200 |
We used to work with TensorFlow and Keras, and we had to stop 00:07:07.800 |
because we couldn't build for you and with you the things 00:07:13.320 |
So luckily PyTorch had just come out, and we actually first started 00:07:17.920 |
using PyTorch in this course, I think it was two weeks 00:07:24.280 |
So doing ridiculous things ridiculously early is definitely part of our DNA. 00:07:31.800 |
But then when we came to get to the next part one, 00:07:34.640 |
because we started using PyTorch in an earlier part two, 00:07:38.040 |
just like we're doing for Swift, then we got to the next part one, 00:07:42.040 |
and we thought, well, PyTorch is great, we want to use it all the time, 00:07:45.280 |
but there's no way we can teach PyTorch to new practitioners. 00:07:50.760 |
So we created a whole new library called Fast AI, because we had to. 00:07:56.120 |
And now we've hit the same point we were with when we first switched to PyTorch, 00:08:00.160 |
which is we can't build with you the stuff we want to build together, right? 00:08:07.080 |
We want to create nice, regularized RNNs, for example. 00:08:12.280 |
We want to create batch norm layers which skip computing the statistics 00:08:18.600 |
We want to create highly flexible on GPU augmentation systems. 00:08:23.720 |
And we can do all these things in PyTorch, but they're slow enough 00:08:38.720 |
And we were very lucky again that Swift for TensorFlow 00:08:44.200 |
The second thing I mention is that not only will this not hurt Fast AI 00:08:50.400 |
for PyTorch, but I'm confident it will make it better. 00:08:53.920 |
I find that when I work with new programming languages 00:09:03.760 |
And we're already coming up with ideas that are 00:09:05.680 |
going to help us to actually make Fast AI for Python better. 00:09:13.960 |
Well, because it's impractical deep learning for coders, 00:09:21.440 |
and Julia is far too mature for such a thing. 00:09:30.120 |
But I mean, a lot of it is just for me personally. 00:09:33.160 |
I'm really interested in getting involved in something that 00:09:35.960 |
has a huge amount of potential, but a little bit earlier. 00:09:42.720 |
I wanted to create something for our student community 00:09:45.720 |
and our practitioner community where we could all kind of help 00:09:49.440 |
be part of that guiding and development process. 00:09:56.320 |
Second reason is it's not just about the language, 00:10:01.600 |
And Julia doesn't quite have that big customer that really 00:10:07.040 |
makes sure that it goes from quite good to huge, 00:10:26.120 |
I also feel like the stuff that I've talked to Chris Lattner 00:10:36.800 |
goes beyond the stuff I've talked to the Julia core team 00:10:43.040 |
with many of the Julia founding team members. 00:10:47.320 |
And I hope I get to spend some time hacking on stuff 00:10:51.680 |
But I feel like the place we're heading with Swift 00:10:59.880 |
But I definitely would not say to anybody, don't touch Julia. 00:11:07.040 |
to a large number of the Julia developers at their request 00:11:11.840 |
to say, why don't you build fast AI with us for Julia too? 00:11:31.240 |
the answer to the question of what do we do next 00:11:33.800 |
is given we're using Python, we fix the problems 00:11:41.900 |
they're creating things like tf.function, which 00:11:44.800 |
kind of allows them to connect up Python 00:11:47.920 |
to the whole XLA environment, which you'll learn about. 00:11:51.880 |
In PyTorch, they're using JIT to kind of turn Python into C++. 00:12:00.680 |
Why don't we just do the best with what we have? 00:12:05.400 |
Because Python, to be clear, has a fantastic ecosystem. 00:12:09.280 |
And it kind of seems crazy to throw that all away. 00:12:17.240 |
I think you'll see the answer to this question 00:12:21.520 |
But basically, it turns out, I'm pretty sure, 00:12:25.920 |
that it's easier to pick the right language and compilation 00:12:30.180 |
environment and write everything else on top of it from scratch 00:12:35.480 |
than it is to write everything else on top of something that 00:12:40.760 |
and then try and madly patch up the thing underneath to make 00:12:50.840 |
everything we're doing has to be massively parallel. 00:12:53.280 |
But Python is written so that no two things can 00:12:59.040 |
So these are things that are just incredibly, 00:13:06.240 |
it's like the things we want is how it's designed. 00:13:10.160 |
So why not Python? Because we think that we can't keep going 00:13:20.040 |
We were not the first people to think the existing languages 00:13:36.080 |
And he's so OCD that before that, he wrote his own compiler. 00:13:41.240 |
And so whilst it may be difficult to be around 00:13:45.880 |
that these people exist because they create the things we have. 00:13:50.480 |
Chris, tell us about why you did this crazy thing. 00:14:10.600 |
If you go ask the marketing people what it is, it says-- 00:14:13.080 |
they say things like, Swift defines away large classes 00:14:18.640 |
And it's optimized to get the most out of modern hardware. 00:14:21.760 |
I think another aspect of it is that Swift is really ambitious. 00:14:29.960 |
Swift isn't just about solving a niche problem. 00:14:33.920 |
It was not about, let's make iOS a little bit better. 00:14:36.680 |
Swift was conceived as a full stack programming language 00:14:48.280 |
that's actually set out from the start to do that. 00:14:51.000 |
But that's one of the reasons why it becomes really interesting 00:14:54.920 |
Because you want very high level programming abstractions 00:14:57.440 |
and a very nice and easy to use way of working with the language. 00:15:00.920 |
But you also want the ability to go down to the metal 00:15:10.560 |
One of the things that he says is that it's-- the interesting thing 00:15:14.880 |
about Swift is that you get expressivity, flexibility, 00:15:19.080 |
tight code, you get safety, ease of use, and speed. 00:15:37.320 |
This is something I don't think that people realize. 00:15:39.840 |
Jeremy, you like to point out that Python is 25 years old. 00:15:43.240 |
Many of the systems we're working on are 30 years old. 00:15:50.600 |
I mean, for me, I've never spent time writing 00:15:57.080 |
I also don't really remember-- that's not quite true. 00:15:59.520 |
I guess languages like JavaScript have developed quickly. 00:16:02.320 |
But it's really unusual to have such a young language 00:16:08.560 |
of feeling like, oh, I'm using a language which lots of people 00:16:17.800 |
because you have millions of programmers using it. 00:16:22.680 |
And so that's one of the things that in this project 00:16:29.600 |
that you can only do if, on a reasonable time scale, 00:16:32.360 |
you can innovate and make new language features, 00:16:39.800 |
Swift, from its roots, was designed for usability. 00:16:42.360 |
And so it was designed for IDEs, and user experience, 00:16:45.720 |
and code completion, and inline errors, and things like that. 00:16:48.400 |
It has a refactoring engine, a bunch of stuff 00:16:52.640 |
The other aspect of Swift that I think is really important 00:17:01.720 |
will look pretty familiar if you've used JavaScript. 00:17:06.600 |
It will look pretty similar in a lot of ways. 00:17:18.800 |
And through a hardcore, intense design process, 00:17:24.480 |
with something that is well-considered and fits 00:17:27.600 |
It reminds me of Perl, which Larry Wall, it's 00:17:30.960 |
developed and described as a Swiss army chainsaw. 00:17:36.360 |
to get the best bits, but it's much more curated and carefully 00:17:44.880 |
And so as we'll talk about, the whole team that built Swift 00:17:48.560 |
originally was the team that built LLVM, and Clang, 00:17:54.880 |
You come from a perspective of, we'll create a programming 00:17:56.960 |
language and then figure out how to make it fast later. 00:18:02.000 |
to be something that compilers can use, and humans too. 00:18:06.680 |
So here, we're here to rethink how deep learning systems work 00:18:10.320 |
from the ground up, and where a lot of the systems 00:18:12.720 |
today are constrained by being as much as you can 00:18:21.160 |
We're changing TensorFlow, which we'll talk about. 00:18:23.160 |
There's a tremendous amount of stuff involved in this. 00:18:25.320 |
And one of the things that's really cool about this, 00:18:28.920 |
want to make an open, hackable platform, where you can go 00:18:37.760 |
do lots of new kinds of things, because that's 00:18:50.720 |
If you're going to be doing an impractical deep learning 00:18:56.160 |
So yeah, Swift is very much not just for iOS programming. 00:19:05.600 |
And all these people that are writing iOS applications 00:19:16.560 |
with the language which is super cool, suddenly. 00:19:22.440 |
So the things we'll be talking about across the lesson 00:19:25.000 |
are that the Swift for TensorFlow project has very big pieces that connect 00:19:36.120 |
to the entire ecosystem that you already know. 00:19:38.280 |
Automatic differentiation, hugely important for ML. 00:19:41.200 |
And Swift has a really cool, industry-leading approach 00:19:52.320 |
is that a lot of what you see as a high-level programmer 00:19:57.720 |
And so this is an example layer built in Swift. 00:20:06.440 |
And we're working to reduce the difference even more. 00:20:09.560 |
I mean, some of the differences are very nice, 00:20:11.440 |
like the fact that you don't have to pass self all the time. 00:20:15.240 |
There are things where you're just like, oh, finally. 00:20:18.120 |
So it's actually getting to the point where the boilerplate 00:20:22.400 |
in the Python code, there's more boilerplate of like, oh, 00:20:24.640 |
this self.conv, and self.pool, and self.conv are-- 00:20:29.080 |
Yeah, so we're going to start very deep and very low level. 00:20:32.720 |
So I just want to give you a high-level view of what 00:20:38.880 |
And so in Swift, this is implemented with a struct. 00:20:48.200 |
And so you have conv2D, maxPool, flatten. 00:20:53.480 |
And we use call instead of underscore-underscore-call (`__call__`). 00:21:02.360 |
One major difference is this differentiable thing. 00:21:04.760 |
And you may be wondering, why do we have differentiable? 00:21:08.600 |
that it should be able to differentiate this. 00:21:10.480 |
And one of the cool things about compiler integration 00:21:15.880 |
the gradient of some function, in the happy path, 00:21:28.260 |
And so one of the cool things about having proper language 00:21:30.720 |
support is you can get an error message that says, hey, 00:21:37.480 |
You say, oh, it can't be differentiated because integers 00:21:40.480 |
and the ints you have here can't be differentiated. 00:21:43.920 |
And then it says, even farther, well, it's actually-- 00:21:46.560 |
this is several levels deep in function calls. 00:21:51.600 |
And it's really cool to get this in your workbook 00:21:59.520 |
build things for usability, you get really nice behavior 00:22:10.440 |
So TensorFlow, one way to think about classic TensorFlow 00:22:13.280 |
is that you have a tremendous amount of infrastructure. 00:22:15.880 |
TensorFlow has this really mature distribution, 00:22:23.720 |
with mobile devices, all this cool stuff in the ecosystem. 00:22:32.000 |
And Python for TF includes its auto-diff system. 00:22:34.840 |
And then it has a bunch of APIs like Keras and Estimator 00:22:38.920 |
And so what we've done is we've built a parallel stack 00:22:40.960 |
where we're using a lot of the same infrastructure 00:22:44.560 |
And then we're building a new fast AI framework on top. 00:22:47.120 |
Now, one of the things we'll talk about more later 00:22:49.000 |
is that TensorFlow itself is undergoing radical change 00:23:07.880 |
are undergoing major changes, which is super exciting. 00:23:16.400 |
What's the roadmap relationship between Swift for TensorFlow 00:23:24.840 |
So right now, the Swift for TensorFlow project, 00:23:28.480 |
Actually, it is literally a dev branch on the Swift project. 00:23:39.840 |
And a bunch of stuff has already been done that way. 00:23:42.280 |
So the Python integration drove new language features in Swift. 00:23:48.040 |
We got them integrated into the mainline compiler. 00:23:50.920 |
And iOS developers can use cool things because of Swift 00:23:59.000 |
Now, I thought I would start with some really basic things 00:24:04.780 |
so you understand you have some context of how 00:24:07.240 |
And then we'll get into some more interesting stuff. 00:24:09.440 |
So this is a Jupyter Notebook, just like you'd expect. 00:24:11.920 |
This is Jeremy's Windows machine, which I will struggle to use. 00:24:15.680 |
Because I have not used Windows for a long time. 00:24:20.120 |
It has a Mac keyboard on it, so it's super weird. 00:24:22.640 |
It scrolls the wrong direction, but it will be great. 00:24:25.280 |
So a lot of what Swift does looks very familiar to Python. 00:24:35.800 |
It all works exactly the same as you'd expect in Python. 00:24:39.940 |
is that you have this let and this var thing. 00:24:47.800 |
And as Jeremy, I think, loves to say, in a workbook, 00:24:52.560 |
And then you can change it however much you want. 00:24:55.400 |
But you can see, if you declare a constant like pi-- 00:24:58.840 |
because pi should not change, it makes sense to use let. 00:25:07.400 |
And this error message says something like cell 5, line 1. 00:25:12.500 |
is if you hit Shift-L, you get line numbers in here. 00:25:15.640 |
And here you can see it's referring to cell 5, line 1. 00:25:19.000 |
And it says, hey, if you go to cell 3, line 2, up here, 00:25:30.920 |
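The let/var behavior being demonstrated can be sketched like this (an illustrative snippet, not the exact notebook code):

```swift
var x = 4          // var: a mutable variable, freely reassignable
x = 10             // fine

let pi = 3.14159   // let: a constant; reassigning it is a compile-time error
// pi = 3.0        // error: cannot assign to value: 'pi' is a 'let' constant

print(x)           // prints 10
```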
And I'll just mention, for people watching that 00:25:47.280 |
But when you're getting into this deep learning mode, 00:25:50.800 |
we generally recommend flipping everything upside down, 00:25:53.320 |
at least for the R&D and prototyping process. 00:25:55.960 |
Because you want things to be infinitely hackable, 00:25:58.880 |
the way Chris describes, you want them to be var. 00:26:02.200 |
You want them to be public so that you can see inside them. 00:26:06.280 |
been recent PRs to Swift for TensorFlow itself, where 00:26:11.720 |
we're starting to change APIs in this way to make it 00:26:20.640 |
So you may notice that we're not using a lot of types here. 00:26:29.960 |
What Swift does is it has a thing called type inference. 00:26:32.160 |
And so when you say var x equals 4, it knows that 4 is an int. 00:26:40.880 |
it will say that, oh, that's a float or a double. 00:26:43.680 |
And so types in Swift are written with a colon here. 00:26:45.880 |
And so you can say, OK, well, even though this would normally 00:26:53.040 |
Emoji is totally better than Greek letters, Jeremy. 00:27:00.800 |
He's like, how do you feel like emoji about emoji 00:27:06.280 |
And I literally said to him, Chris, they're fine, 00:27:09.320 |
as long as it's the pile of poo emoji and it's next to a cow. 00:27:13.120 |
And Chris goes, OK, it's the pile of poo emoji 00:27:20.520 |
So this is great power and great responsibility. 00:27:23.840 |
If you name all of your variables pile of poo, 00:27:44.000 |
I'm not going to say one's better than the other. 00:27:56.280 |
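The type inference and annotation behavior described a moment ago can be sketched as follows (my own variable names):

```swift
let i = 4             // inferred as Int
let d = 4.0           // inferred as Double, the default for floating literals
let f: Float = 4.0    // an explicit annotation, written with a colon, picks Float

print(type(of: i), type(of: d), type(of: f))   // Int Double Float
```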
Well, in Python you use def, and Swift you use func. 00:28:01.720 |
And so what this is, is this is defining a function 00:28:04.240 |
and you declare the inputs, the signature, x and y, 00:28:10.600 |
When you call it, you pass keyword arguments. 00:28:13.560 |
Swift is very opinionated about keyword arguments. 00:28:15.720 |
And so if you say that this thing has x and y as arguments, 00:28:21.880 |
is you'll see this underbar thing going on right here. 00:28:24.400 |
And this is saying that underscore means ignore, 00:28:27.640 |
just like in Python underscore means ignore, the keyword argument. 00:28:32.840 |
I've got to say, I love almost everything about Swift, 00:28:39.600 |
So this bit I find awkward, because these positional 00:28:44.520 |
parameters, you can't even change the order of them. 00:28:50.240 |
If you do have that underscore to say you don't have to name it, 00:28:58.280 |
I find this bit nasty, but it's almost everything else 00:29:02.040 |
I love about Swift, so I just put up with it. 00:29:04.600 |
This is also not my opinion of the right thing. 00:29:08.000 |
But the argument for that is consistency of APIs 00:29:14.800 |
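A minimal sketch of the argument-label behavior being described (the function names are mine):

```swift
// By default, argument labels are required at the call site.
func distance(x: Double, y: Double) -> Double {
    return (x * x + y * y).squareRoot()
}
let d1 = distance(x: 3, y: 4)      // 5.0; distance(3, 4) would not compile

// An underscore suppresses the label, making the argument purely positional.
func add(_ a: Int, _ b: Int) -> Int { a + b }
let s = add(2, 3)                  // 5
```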
So tuples work pretty much the same way as they do in Python. 00:29:17.240 |
You can use them to return multiple values, which 00:29:26.560 |
You get access to the tuples, all the nice things 00:29:30.560 |
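For example, returning and destructuring a tuple might look like this (an illustrative sketch):

```swift
// A function returning a named tuple, used to return multiple values.
func minMax(_ xs: [Int]) -> (min: Int, max: Int) {
    return (xs.min()!, xs.max()!)
}

let result = minMax([3, 1, 4, 1, 5])
print(result.min, result.max)      // access by name: 1 5
let (lowest, highest) = result     // destructuring, as in Python
print(lowest, highest)
```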
One of the things that's different about Swift 00:29:32.360 |
versus Python-- Swift has this thing called struct. 00:29:52.440 |
So I would say, like, think of it more like a Python class 00:30:08.320 |
And in Python, there's this thing called data class. 00:30:18.780 |
There's some extra boilerplate we need in the Python version. 00:30:21.940 |
For example, we have to put the two things on different lines. 00:30:27.000 |
But overall, like a lot of things between Swift and Python, 00:30:33.960 |
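A sketch of the kind of struct being compared to a Python data class (the name ComplexF is mine, to keep this float-only version distinct from the generic one discussed next):

```swift
// The two stored properties are all you write; Swift synthesizes
// the memberwise initializer for free, much like @dataclass.
struct ComplexF {
    var real: Float
    var imag: Float
}

var c = ComplexF(real: 1.0, imag: 2.0)
c.real = 5.0
print(c.real, c.imag)
```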
So now, one of the bad things about this thing 00:30:37.120 |
But complex numbers work with integers, as well. 00:30:39.120 |
And they work with doubles and lots of other things. 00:30:44.040 |
And we'll talk more about the details of generics later. 00:30:51.240 |
And complex works with any type T and anything that is signed. 00:30:57.640 |
And that's what the signed numeric thing says. 00:30:59.800 |
And so now, what I can do is I can define the struct. 00:31:02.760 |
And I can use it down here with integers and with floating 00:31:06.480 |
And it just figures out that T is int or T is float, 00:31:10.540 |
And this is something that Python can't do, right? 00:31:27.320 |
us do some pretty powerful stuff right out of the box. 00:31:29.960 |
And we'll talk about some of the really cool things 00:31:31.560 |
that you can do that make it super flexible and super-- 00:31:35.280 |
I mean, there's some really powerful things you can do. 00:31:41.200 |
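The generic version being described can be sketched like so:

```swift
// One definition works for any signed numeric element type T.
struct Complex<T: SignedNumeric> {
    var real, imag: T
}

let ci = Complex(real: 1, imag: 2)       // T inferred as Int
let cd = Complex(real: 1.5, imag: 2.5)   // T inferred as Double
print(type(of: ci), type(of: cd))
```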
is that just like in Python, you have computed properties 00:31:52.440 |
Here's a computed property doing a weird thing. 00:31:54.840 |
But here I just have a computed getter and a setter. 00:32:12.240 |
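A sketch of computed properties along these lines (the negatedImag property is my own contrived stand-in for the "weird thing" on screen):

```swift
struct Complex {
    var real, imag: Double

    // A read-only computed property: computed on access, not stored.
    var magnitude: Double {
        (real * real + imag * imag).squareRoot()
    }

    // A computed getter/setter pair.
    var negatedImag: Double {
        get { -imag }
        set { imag = -newValue }
    }
}

var c = Complex(real: 3, imag: 4)
print(c.magnitude)   // 5.0
c.negatedImag = 1
print(c.imag)        // -1.0
```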
is that after you define a type, you can add methods to it. 00:32:15.320 |
And this is something that might make you feel weird. 00:32:21.240 |
You can add it on your own, like we're doing here. 00:32:26.120 |
I mean, it doesn't make me feel weird, Chris. 00:32:32.880 |
But it's kind of something that we're told to avoid. 00:32:35.040 |
Because monkey patching has weird, dangerous, undefined, 00:32:38.800 |
strange behavior, and things combine in weird ways. 00:32:45.160 |
So this works in a very highly principled way 00:32:53.200 |
This is not something you should feel bad about. 00:32:56.800 |
And so I'm using this to add two complex numbers. 00:32:59.920 |
I feel bad about this because there's a way to spell add. 00:33:11.400 |
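What's being described, sketched as a plain-Swift extension (an illustration, not the notebook's exact code):

```swift
struct Complex { var real, imag: Double }

// Methods can be added after a type is defined, via an extension --
// the principled analogue of what monkey patching does in Python.
extension Complex {
    func add(_ other: Complex) -> Complex {
        Complex(real: real + other.real, imag: imag + other.imag)
    }
}

let c = Complex(real: 1, imag: 2).add(Complex(real: 3, imag: 4))
print(c.real, c.imag)   // 4.0 6.0
```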
But if you want to add an operator, what you do 00:33:16.520 |
And so instead of underscore-underscore-add (`__add__`), and all that jazz, 00:33:19.480 |
you just define the operators you want and spell them 00:33:22.640 |
And they're just functions like anything else. 00:33:26.880 |
that would be really nice to be able to do in Python, 00:33:30.840 |
a whole bunch of different functions or operators 00:33:35.800 |
And they behave differently depending on what type 00:33:40.400 |
Now, Python does have a standard library decorator 00:33:46.760 |
We almost never use it because like every time 00:33:48.640 |
we've tried to use it, it reacts in weird ways 00:33:58.640 |
it's very much designed for us to be able to say like, 00:34:03.160 |
And they all have different meanings of what, for example, 00:34:08.160 |
And so here we're implementing plus on complex in terms 00:34:12.120 |
And so we're just adding together the real and imaginary, 00:34:23.360 |
And so some of us do lots of math, us not including me. 00:34:31.560 |
you're doing quaternions or other cool things like that, 00:34:33.680 |
and it's really nice to be able to use operators that 00:34:36.800 |
And so if you want to find a square root operator, 00:34:40.200 |
And these just work, and now you can use a square root operator 00:34:44.200 |
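Both ideas, overloading + and declaring a brand-new square root operator, can be sketched like this:

```swift
struct Complex { var real, imag: Double }

// Operators are ordinary functions; no dunder spelling needed.
extension Complex {
    static func + (lhs: Complex, rhs: Complex) -> Complex {
        Complex(real: lhs.real + rhs.real, imag: lhs.imag + rhs.imag)
    }
}

// Entirely new operators can be declared too, even with Unicode names.
prefix operator √
prefix func √ (x: Double) -> Double { x.squareRoot() }

let sum = Complex(real: 1, imag: 2) + Complex(real: 3, imag: 4)
let root = √16.0
print(sum.real, sum.imag, root)
```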
And this is one of the examples of Swift being hackable. 00:34:50.720 |
has a bunch of stuff built in and provided in the box. 00:34:54.320 |
But the stuff the standard library does, you can do too. 00:34:57.600 |
And so we try very hard to not make the standard library 00:35:05.680 |
There's this guided tour here, which is really cool. 00:35:09.480 |
And so if you want just a high-level introduction 00:35:13.200 |
But let's dive into some more relevant pieces. 00:35:17.440 |
The first is, does Swift support any debugger within Jupyter, 00:35:20.240 |
similar to IPDB for Python to set breakpoints? 00:35:28.120 |
So Jupyter is actually talking to a debugger. 00:35:31.600 |
But that's one of the things we're interested in. 00:35:36.160 |
But the guy in the front row that built it all is smiling. 00:35:41.320 |
And does Swift have something similar to Python's *args 00:35:54.640 |
So let's talk about Python now, because we love Python, right? 00:36:01.680 |
Swift's data science ecosystem is kind of pathetic. 00:36:08.800 |
and Swift being pathetic, you all know Python. 00:36:15.320 |
And there's no reason for you to relearn new APIs. 00:36:17.880 |
If you know the APIs in Python, just use them. 00:36:21.580 |
because I think it might blow your mind a little bit. 00:36:23.720 |
So to use Python and Swift, you first import Python. 00:36:27.200 |
This is just a library in Swift called Python. 00:36:30.500 |
And then you use Python to import whatever Python libraries 00:36:40.480 |
You just literally import NumPy, or here we're 00:36:46.720 |
This gives you PLT, just like you would do in Python. 00:36:59.400 |
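For context, the interop pattern being demonstrated looks roughly like this. Note that this is a sketch assuming the Swift for TensorFlow toolchain, where the module is called Python (in the later open-source package it is PythonKit), so it won't run on a plain Swift install:

```swift
import Python   // PythonKit in the open-source package

// Import Python libraries by name, exactly as Python itself would.
let np = Python.import("numpy")
let plt = Python.import("matplotlib.pyplot")

// Every value coming back across the bridge is a dynamic PythonObject.
let a = np.array([1, 2, 3])
let total = np.sum(a)
print(type(of: total))   // a PythonObject, whatever np returned
```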
So we can now use this to do some cool stuff. 00:37:09.400 |
This is firing up TensorFlow and loading the MNIST data set 00:37:14.800 |
Once that comes back, now we can use Matplotlib. 00:37:17.080 |
And Matplotlib, we can use this magic incantation, 00:37:23.080 |
We can then load, take the tensor that TensorFlow gave us, 00:37:26.720 |
do all the NumPy stuff with a NumPy ND array, 00:37:33.160 |
And this all just works the way you would normally 00:37:36.680 |
And the cool thing about this is that Swift knows nothing 00:37:42.320 |
knows nothing about any of the particular libraries 00:37:46.440 |
We literally just imported some random thing with strings. 00:37:51.040 |
And Swift doesn't know anything about what you imported here. 00:37:57.440 |
because we're just using Python right from Swift 00:38:03.960 |
is that we think about Python as though it has no types. 00:38:11.140 |
And Python has an internal representation of objects. 00:38:22.760 |
just like type in Python, says give me the type of np, 00:38:26.600 |
or give me the type of np.array, or give me a type of the array 00:38:33.600 |
What it actually shows you is the type is Python object. 00:38:36.640 |
And so Python values are Python object types in Swift. 00:38:44.520 |
And so it just works in Swift, because you are literally 00:38:48.600 |
using Python as dynamically as Python is designed to be used. 00:38:56.880 |
that you can do is you can import Python into Swift, 00:39:04.480 |
and now go to town and just use all the standard FastAI 00:39:17.960 |
And it's interesting how when you look at the resulting code, 00:39:22.080 |
it's the same code that we-- like at this point, 00:39:25.600 |
you can't tell other than some slightly different syntax here 00:39:35.880 |
One thing I'll say about this is this is a super cool feature 00:39:42.280 |
that you should totally use to fill in the gaps that 00:39:44.840 |
need to be filled in while this ecosystem doesn't exist. 00:39:47.920 |
But then as soon as possible, fill in the gap. 00:39:51.560 |
Because I don't want us, as a Swift for TensorFlow community, 00:39:58.880 |
write our own even better DataFrames library, 00:40:11.640 |
and then gradually replace it with bits that are more swifty. 00:40:14.720 |
I mean, one of the awesome things about Swift 00:40:16.760 |
is that it supports really well-considered and beautiful 00:40:21.200 |
And it was really designed for building APIs. 00:40:29.480 |
If you want to open a file, open a file the way you know how. 00:40:35.200 |
Don't waste your brain cycles on that kind of stuff. 00:40:38.080 |
So let's now talk about the idea of Jeremy's course here, 00:40:42.120 |
which is building a machine learning library from scratch. 00:40:44.480 |
And I think it's very quaint that Jeremy tried so hard 00:40:52.320 |
how to build a Matmul with a few loops, and looping over an array, 00:40:58.000 |
and adding and multiplying floating point numbers. 00:41:02.960 |
that he thinks that this is going down to the foundation. 00:41:09.840 |
then I think we should go down to the bedrock 00:41:12.960 |
and actually talk about where float and arrays come from. 00:41:16.000 |
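For reference, the kind of three-loop matmul Jeremy built in lesson 8 looks like this in plain Swift (a sketch using flat row-major arrays; the names and layout are my choice):

```swift
// Naive matmul: c (m x p) = a (m x n) * b (n x p),
// with matrices stored as flat row-major [Float] arrays.
func matmul(_ a: [Float], _ b: [Float], m: Int, n: Int, p: Int) -> [Float] {
    var c = [Float](repeating: 0, count: m * p)
    for i in 0..<m {
        for j in 0..<p {
            var sum: Float = 0
            for k in 0..<n {
                sum += a[i * n + k] * b[k * p + j]
            }
            c[i * p + j] = sum
        }
    }
    return c
}

// [[1,2],[3,4]] x [[5,6],[7,8]] = [[19,22],[43,50]]
let c = matmul([1, 2, 3, 4], [5, 6, 7, 8], m: 2, n: 2, p: 2)
print(c)
```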
But before we do that, I want to dive in and geek out 00:41:21.760 |
or it's useful to understand, what LLVM and Swift and things 00:41:26.240 |
So, Chris, what you're saying is that I cheated. 00:41:49.440 |
So the way I think about this is that there's actually 00:41:54.360 |
There's humans, which we're all kind of a pain to work with, 00:42:04.480 |
And so what languages are is they're a point in between. 00:42:07.320 |
And different languages are different points in between. 00:42:09.800 |
And some are optimized for working with humans better. 00:42:12.560 |
Some are optimized for working with computers better. 00:42:19.880 |
Well, the way that it used to work in the bad old days 00:42:22.160 |
is that if somebody wanted to build a compiler for x86, 00:42:24.600 |
they would build a parser, the front end part of a compiler. 00:42:27.760 |
They'd then build an optimizer and make the code go fast. 00:42:30.120 |
And they'd build a code generator for the Intel PC 00:42:43.160 |
Somebody else would say, hey, APL is really cool. 00:42:45.040 |
Let's build a parser for APL, an optimizer for APL, 00:42:52.080 |
there's a lot of re-implementation of all the things going on. 00:43:00.720 |
where you can make it so that lots of different language 00:43:04.280 |
people can implement lots of different front ends. 00:43:08.020 |
can implement what's called the back end or the code generator. 00:43:10.680 |
And now they can share a tremendous amount of code. 00:43:15.400 |
And we should all thank Chris Lattner's master's thesis 00:43:18.640 |
supervisor for forcing him to write his damn thesis 00:43:21.400 |
and getting him to actually write LLVM version 1 00:43:24.120 |
in one and a half weeks of Diet Coke-fueled coding activity. 00:43:30.820 |
is give people a ridiculous deadline and it happens. 00:43:34.800 |
And so the details of what LLVM is is not really important. 00:43:37.800 |
But this LLVM is what powers Julia and Rust and Swift 00:43:44.120 |
It's like a very common set of infrastructure 00:43:48.080 |
And if you're not very familiar with compilers 00:43:53.040 |
including constant folding, removing dead code, 00:43:56.360 |
other things like the example I show here of taking 00:44:01.560 |
This is something that in PyTorch, for example, 00:44:03.640 |
if you do a multiply inside of a loop of two tensors, 00:44:06.320 |
it's going to run that multiply every single time 00:44:18.600 |
and you see, oh, I'm doing this work inside the loop 00:44:30.800 |
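As a runnable sketch of the hoisting idea just described (the function and names here are illustrative, not from the lesson notebooks): the product `x * y` never changes inside the loop, so LLVM can compute it once outside the loop even though the source writes it inside.

```swift
// Loop-invariant code motion, sketched: `x * y` doesn't depend on the
// loop variable, so the optimizer hoists it out of the loop for us.
// We get to write the clearer version and still run the fast one.
func scaledSum(_ xs: [Float], x: Float, y: Float) -> Float {
    var total: Float = 0
    for v in xs {
        total += v * (x * y)  // invariant subexpression; hoisted by LLVM
    }
    return total
}

print(scaledSum([1, 2, 3], x: 2, y: 3))  // 36.0
```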
and other optimization systems in GCC or whatever, 00:44:34.760 |
it suddenly makes you realize that compilers are something 00:44:38.640 |
different to at least what I thought they were. 00:44:40.700 |
I thought compilers were things that got in your way 00:44:43.760 |
and complained until you did things the way they expected 00:44:49.840 |
that I would have finished yesterday if it was Python. 00:44:54.040 |
But actually, working with Swift, and particularly 00:44:58.240 |
with Swift for TensorFlow, has made me realize 00:45:00.080 |
that these optimizations actually allow us to write code 00:45:03.840 |
in different ways and actually be much more lazy 00:45:08.880 |
- And this is as you think about a point in the space 00:45:12.360 |
- Yeah, so we're actually gonna show you something 00:45:14.220 |
really mind-blowing next week where this is actually 00:45:17.880 |
gonna be used to basically make auto-diff work. 00:45:21.680 |
And it's like, it blew my mind when I found out about it. 00:45:24.880 |
- Yep, and so now if you think about languages 00:45:28.400 |
there's a couple of different ways to look at this. 00:45:31.760 |
ignoring the syntax pieces, which the syntax is always 00:45:44.480 |
boil a dictionary down, it's a bunch of C functions. 00:45:50.540 |
what C functions to call and what order and on what data. 00:45:56.280 |
and so the Python program ends up being slow, 00:46:04.920 |
C++, the atoms are built in things like integers 00:46:07.920 |
and floats and arrays and pointers and things like that. 00:46:10.800 |
And then a C++ programmer can use structs and classes 00:46:16.000 |
or its variable-size array thing, in the library. 00:46:20.720 |
And it can do this because C++ is a fast language. 00:46:23.000 |
It's also not a super well-considered language, 00:46:26.560 |
but it's weird to me in C++ that arrays are hard-coded 00:46:30.800 |
into the compiler, but string is a library feature. 00:46:38.760 |
What Swift is, is it says, let's rethink all this. 00:46:42.520 |
And so the primitives, the low-level atoms of the universe 00:46:46.040 |
are now things that LLVM, the compiler, knows about. 00:46:53.320 |
of course, the high-level stuff too, like layers, 00:46:57.640 |
And so a float is not a magic built-in thing. 00:47:04.800 |
And if something is interesting for a library developer to do, 00:47:09.000 |
maybe you want to do it in your workbook, right? 00:47:11.080 |
And so having an open ecosystem is very powerful. 00:47:14.920 |
And so if you actually go look at the library 00:47:16.880 |
that implements float, float is just a struct, 00:47:20.280 |
just like the complex thing we were talking about before. 00:47:22.800 |
In the middle, the inside of it is this built-in weird thing. 00:47:31.760 |
Plus is just an operator, just like we were talking 00:47:35.240 |
Just this one happens to be named plus or plus equals. 00:47:41.760 |
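To make the idea concrete, here is a minimal sketch of a float-like type defined as an ordinary struct with `+` and `+=` as plain operator functions. `MyFloat` is a made-up name; the real `Float` in the standard library wraps a built-in machine type rather than a `Double`.

```swift
// A numeric type as an ordinary struct: nothing magic, `+` is just
// an operator function declared on the type.
struct MyFloat {
    var value: Double  // stand-in for the built-in machine type

    static func + (lhs: MyFloat, rhs: MyFloat) -> MyFloat {
        MyFloat(value: lhs.value + rhs.value)
    }

    static func += (lhs: inout MyFloat, rhs: MyFloat) {
        lhs = lhs + rhs
    }
}

var x = MyFloat(value: 1.5)
x += MyFloat(value: 2.0)
print(x.value)  // 3.5
```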
>> Well, let's go look at the implementation. 00:47:47.360 |
this is the standard library that comes with Swift. 00:47:53.640 |
It implements add, pi, like all the things that are 00:47:57.360 |
in float is just a gigantic pile of Swift code. 00:48:00.400 |
And the cool thing about this is that this means 00:48:15.120 |
But the fact that you can is actually important 00:48:21.880 |
When I was starting out and I did a lot of stuff 00:48:35.040 |
And I very often hit this floor where things weren't working the 00:48:40.280 |
So I had to use assembler, which nobody should ever have to do. 00:48:51.800 |
And over the last 25 years, we've gradually kind of filled 00:48:56.440 |
in more and more of the things that numeric programmers use. 00:49:01.520 |
But what I'm kind of finding is happening now is 00:49:03.560 |
as numeric programming is becoming differentiable 00:49:05.800 |
programming, I'm hitting the bottom of the stack again. 00:49:12.560 |
And/or there are things I want to do a little bit differently. 00:49:14.960 |
So I feel like we're at this point in history. 00:49:17.960 |
You know, we might be for the next five or ten years or more 00:49:21.440 |
where data scientists don't need to know how to write assembler. 00:49:25.280 |
But they do need a system where they can go under the surface 00:49:28.400 |
and actually change things that people don't normally change. 00:49:34.680 |
I think the goal here is an infinitely hackable platform. 00:49:37.120 |
So like in the box are all the nice things you'd expect. 00:49:43.520 |
But if you want to go change it and do your own, you can. 00:49:51.800 |
Now, we talked about structure a little bit like classes in Python. 00:49:58.120 |
The major difference is that these are actually fast. 00:50:02.640 |
that multiplies two things together and adds it. 00:50:05.360 |
If this was Python, these would be allocating objects. 00:50:11.120 |
This thing I'm showing you now is called the compiler explorer. 00:50:14.080 |
And you thought you came to learn machine learning? 00:50:16.000 |
Here's some assembly language, which we're going to get away 00:50:20.840 |
But the point is like you're writing a reasonable Swift code 00:50:23.040 |
and you're getting literally the same code you would get 00:50:28.320 |
Like even though float is implemented in the standard library, 00:50:32.640 |
You're getting the lowest level optimized fast code that's turning 00:50:35.680 |
to multiply instruction and add instruction on Intel. 00:50:38.640 |
And I'll go away from this very quickly because we're not here 00:50:43.720 |
So now the thing about float again is not really 00:50:47.040 |
about something you should want to do, but you can poke 00:50:51.760 |
One of the things we've at least so far chosen not 00:50:54.200 |
to do is we don't export the built-in to workbooks. 00:50:57.240 |
And so you have to write a standalone file to use it. 00:51:01.200 |
But one of the really powerful things about this is 00:51:03.320 |
because these types are defined in the library, they're not magic. 00:51:10.600 |
And so we can add a method to int or to bool. 00:51:13.640 |
So here, you know, we add a little is odd method 00:51:17.080 |
that's just saying is the low bit set or clear. 00:51:28.980 |
We can add a symbol that turns a boolean into an emoji 00:51:43.620 |
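The two extensions just described can be sketched like this (property names are illustrative; the notebook's may differ slightly):

```swift
// Because Int and Bool are library types, not compiler magic,
// we can extend them with our own members.
extension Int {
    var isOdd: Bool { self & 1 == 1 }  // is the low bit set?
}

extension Bool {
    var symbol: String { self ? "👍" : "👎" }  // boolean as an emoji
}

print(3.isOdd)         // true
print(4.isOdd.symbol)  // 👎
```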
This is particularly important for all of us at this stage 00:51:49.600 |
Swift hasn't been widely used for numeric programming. 00:51:53.540 |
And so when I started playing with it in December 00:51:59.920 |
So yeah, so I created this library called BaseMath 00:52:03.040 |
where I literally was defining things on float 00:52:19.360 |
that I math stuff that I wanted in the language. 00:52:22.760 |
And so if you're hacking around over the coming months 00:52:27.120 |
and you find things not quite the way you want, 00:52:33.560 |
And it's really, really, really common in Swift code 00:52:49.200 |
but there's lots of interesting things in the system. 00:53:03.660 |
Let's talk about array because we need arrays 00:53:09.840 |
let's look at how you use it as a Swift programmer. 00:53:13.960 |
You just define them with square brackets like you'd expect. 00:53:20.760 |
there's two different syntaxes for the types. 00:53:26.920 |
But that is actually just sugar for this array, okay? 00:53:30.320 |
And if you print out the types of these things, 00:53:32.640 |
you'll see that they're all just array event, 00:53:40.080 |
It just goes over all the elements of the array. 00:53:46.520 |
based on whether you want the endpoint or not. 00:53:50.040 |
which includes that endpoint, you use dot, dot, dot. 00:53:52.640 |
In an exclusive range, you use dot, dot, less-than. 00:53:56.960 |
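The array and range syntax just described, as a sketch you can run in a workbook:

```swift
// Array literals, the two type spellings, for-in, and ranges.
let a = [1, 2, 3]          // type inferred as [Int]
let b: [Int] = [4, 5, 6]   // [Int] is just sugar for Array<Int>

for x in a { print(x) }    // for-in loops over every element

let inclusive = a[0...1]   // ... includes the endpoint
let exclusive = a[0..<1]   // ..< excludes it
print(Array(inclusive), Array(exclusive))  // [1, 2] [1]
```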
Swift supports functional programming things. 00:54:07.080 |
Closures are the same thing as lambdas in Python 00:54:11.480 |
And so here what we're doing is we're saying, 00:54:15.680 |
that takes all the elements and adds 10 to them. 00:54:19.840 |
You can just do this right in line and it's nice and fast. 00:54:29.000 |
So filter just takes a predicate where you say, 00:54:31.480 |
filter and only include the things that are odd, okay? 00:54:36.500 |
And now we get an array that just has odd things in it. 00:54:41.640 |
is that Swift has lots of nice syntactic shortcuts. 00:54:43.980 |
And so instead of naming our argument like we did in map, 00:54:46.680 |
we just use the default name, which is dollar sign zero. 00:54:48.940 |
- So the top one is equivalent to lambda arg colon, 00:54:54.400 |
And so then we can get rid of both the lambda 00:54:56.880 |
and the arg colon by sticking it in curly brackets 00:55:01.960 |
- Another super common thing is that often these closures 00:55:04.360 |
end up being the last argument to a function. 00:55:06.320 |
If you have, if they're the last argument to a function, 00:55:08.080 |
you can just put them outside the parentheses. 00:55:10.000 |
And if that's the only thing in the parentheses, 00:55:11.800 |
you can just get rid of the parentheses as well. 00:55:16.800 |
where you can say map and multiply all the elements by three 00:55:19.960 |
and then filter them and only keep the odd ones. 00:55:24.840 |
or here's a map where I'm saying, you know, pick, 00:55:34.400 |
- Yeah, so this, so just come back and have a look 00:55:36.960 |
at this map filter again at the end of the lesson 00:55:39.120 |
because this is how you do list comprehensions in Swift. 00:55:43.280 |
because we, the stuff that's already built in 00:55:45.800 |
very elegantly gives us list comprehensions are free. 00:55:48.760 |
- Yep, and all these things are just library features. 00:55:55.300 |
we're just adding all the elements of the array to zero 00:56:06.160 |
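The shorthand steps above, collected into one runnable sketch: each `map` line is the same computation with progressively more syntactic sugar, the `map`/`filter` chain is the Swift spelling of a Python list comprehension, and `reduce` folds everything into a starting value.

```swift
let xs = [1, 2, 3, 4]

let a = xs.map({ (arg: Int) -> Int in arg + 10 })  // fully spelled out
let b = xs.map({ $0 + 10 })                        // default argument name $0
let c = xs.map { $0 + 10 }                         // trailing closure, no parens
print(a == b, b == c)                              // true true

// map + filter chained, like [x * 3 for x in xs if x % 2 == 1] in Python:
print(xs.map { $0 * 3 }.filter { $0 % 2 == 1 })    // [3, 9]

// reduce: add all the elements of the array to zero.
print(xs.reduce(0, +))                             // 10
```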
Now we're talking about array, array is a type. 00:56:08.840 |
And that means you can do an extension on a type. 00:56:11.600 |
So you can add methods to arrays, that's super easy. 00:56:19.840 |
So double elements just multiplies all the elements by two 00:56:22.520 |
and like the self thing we don't actually need. 00:56:29.520 |
And now one of the other things you may wonder about 00:56:31.120 |
is like, why do we need this where element is numeric? 00:56:34.080 |
And what this is talking about is it's saying, 00:56:35.800 |
well, we're taking all the elements out of this thing 00:56:46.360 |
hey, I can't double all the elements of a Boolean array 00:56:51.940 |
And so in Python, what would end up happening 00:56:53.960 |
is if you accidentally pass the wrong thing in, 00:57:01.760 |
somewhere later you find out you have twos in your Booleans, 00:57:26.720 |
which has some particular properties to it, right? 00:57:36.000 |
And so we're not gonna go into the details now, 00:57:37.940 |
but just like take a look at this after the lesson 00:57:46.600 |
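A sketch of the constrained extension being described: `doubleElements` only exists when the element type is `Numeric`, so calling it on a `[Bool]` is a compile-time error rather than silently ending up with twos in your Booleans.

```swift
// An extension on Array constrained with `where Element: Numeric`,
// so the body is allowed to multiply elements.
extension Array where Element: Numeric {
    func doubleElements() -> [Element] {
        map { $0 * 2 }
    }
}

print([1, 2, 3].doubleElements())    // [2, 4, 6]
print([1.5, 2.5].doubleElements())   // [3.0, 5.0]
// [true, false].doubleElements()    // error: Bool is not Numeric
```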
- And again, one of the cool things about this 00:57:49.480 |
it's all open to you and you can do cool things 00:57:54.100 |
So I'm not gonna go into the full details of array. 00:57:57.600 |
Array is implemented right here in Array.swift. 00:58:00.960 |
This is the standard library, array is a struct. 00:58:07.080 |
You can go through and you can see all the different 00:58:13.920 |
I think that we will just consider it that we implement this. 00:58:32.360 |
Now that we've invented Swift and float and array, 00:58:32.360 |
we will actually implement matrix multiplication. 00:58:40.360 |
Okay, any questions before we keep going, Rachel? 00:58:46.000 |
The first is that the "in" keyword is very unintuitive 00:58:46.000 |
- So that's the question, it's like, why is it so weird? 00:59:07.200 |
- Yeah, so we carefully considered the name of this keyword 00:59:24.000 |
There's historical reasons, but they're not good reasons. 00:59:36.400 |
can Swift/LLVM emit instructions to execute on the GPU? 00:59:36.400 |
This is one of the things we're investing a lot 00:59:47.320 |
and we'll be talking about that a little bit next time. 00:59:49.660 |
- Yeah, but I mean, the short answer is that LLVM 00:59:54.400 |
And one of the backends it has is a PTX backend, 00:59:57.760 |
which is the lower level Nvidia kind of instruction set. 01:00:01.920 |
And so like right now you can compile stuff with LLVM 01:00:08.800 |
- And in fact, like every pixel on the iPhone 01:00:11.120 |
goes through LLVM, not through Swift in a workbook. 01:00:11.120 |
So now that we have our bedrock of floats and arrays, 01:00:28.380 |
let's build something a little bit higher level. 01:00:32.620 |
So here what we're gonna do is we're actually gonna 01:00:41.480 |
where we could use the random number generation 01:00:45.160 |
but we're not gonna use the matrix multiplication operator 01:00:54.120 |
and SWIFT for TensorFlow has this tensor type, 01:00:58.220 |
this little float thing we wanna go away eventually, 01:01:00.280 |
we hope, but right now you say tensor of float 01:01:09.260 |
there's lots of different things you can play with. 01:01:19.640 |
So here what we're doing is we're doing something 01:01:25.400 |
But here we're starting a little bit lower level. 01:01:31.520 |
And so what we need to do is we need to pass in 01:01:34.800 |
and then we're doing a two dimensional matrix multiplication 01:01:40.420 |
- So that's a definition of a tuple parameter, Chris? 01:01:44.960 |
- Yep, so A dims is two ints and B dims is two ints. 01:01:48.240 |
And so what we do is we create an array full of zeros, 01:01:50.920 |
and then we write the same three loops you saw before. 01:01:55.400 |
we have to do manual arithmetic to get into it. 01:01:59.960 |
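The matmul just described can be sketched like this: flat `Float` arrays, dimensions passed as tuples, and manual index arithmetic in the three loops. Names here are illustrative and may not match the notebook exactly.

```swift
// Matrix multiply over flat arrays: res[i][j] lives at i * cols + j.
func matmul(_ a: [Float], _ b: [Float],
            aDims: (Int, Int), bDims: (Int, Int)) -> [Float] {
    precondition(aDims.1 == bDims.0, "inner dimensions must match")
    var res = [Float](repeating: 0, count: aDims.0 * bDims.1)
    for i in 0..<aDims.0 {
        for j in 0..<bDims.1 {
            for k in 0..<aDims.1 {
                res[i * bDims.1 + j] += a[i * aDims.1 + k] * b[k * bDims.1 + j]
            }
        }
    }
    return res
}

// 2x2 identity times a 2x2 matrix gives the matrix back:
print(matmul([1, 0, 0, 1], [1, 2, 3, 4], aDims: (2, 2), bDims: (2, 2)))
// [1.0, 2.0, 3.0, 4.0]
```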
Now, one of the things that you'll see is like, 01:02:02.280 |
if you actually try this out and you run this, 01:02:06.880 |
this is why you say don't do things and I don't listen, 01:02:30.300 |
The Python version of this took 835 milliseconds. 01:02:37.000 |
So, I mean, the first thing I wanted to compare 01:02:38.720 |
was to look at the code, so that's a Swift code. 01:02:45.800 |
- Yep, here you have 2D arrays, which we'll get to. 01:02:48.680 |
- And so, yeah, we kind of found with the Python version, 01:02:51.780 |
it took about a second to do a five by 784 matrix multiply, 01:02:51.780 |
we can't use this because it's just not fast enough. 01:03:05.180 |
But something very different is going on here 01:03:21.640 |
it's primitives that are principled that are fast. 01:03:24.320 |
And when you build things out of principled fast primitives, 01:03:27.080 |
you get new things that are principled and fast. 01:03:32.200 |
this is like, this was a whole mind shift change. 01:03:34.920 |
'Cause at this point, the fact that you can write this 01:03:40.160 |
I can now write anything I can think of doing with numbers 01:03:54.840 |
so one way to think about it, so for Swift for TensorFlow, 01:03:54.840 |
we're trying to subtract C and C++ out of the picture, right? 01:03:59.800 |
Because if you think about working with Python, 01:04:03.220 |
a lot of it ends up being, if you care about performance, 01:04:07.480 |
How do you go into C stuff or working around writing, 01:04:11.600 |
oh, I need a new kind of data structure, what do I do? 01:04:15.080 |
because writing in Python wouldn't be fast enough. 01:04:22.000 |
But here we're implementing basic stuff in workbooks 01:04:27.360 |
that's like a compiled program that I just ship it. 01:04:30.240 |
I don't have to figure out how to put the C library 01:04:33.240 |
over here and compile it and put it together with this header. 01:04:35.840 |
So Jeremy, what is this built-in called time? 01:04:42.720 |
- Absolutely, so we're not using percent time 01:04:49.500 |
we shouldn't be allowed to use things that are magic, 01:04:55.600 |
and it's actually written in this notebook called 00. 01:05:13.520 |
about working with Swift is we can build everything 01:05:20.160 |
And the details don't matter, but basically you can see 01:05:22.920 |
we can grab some stuff from the Swift standard library, 01:05:28.400 |
And we can run some function and see how long it takes. 01:05:44.240 |
that takes no arguments and returns nothing at all. 01:05:46.980 |
And so, for example, we can give it a default value, 01:05:52.680 |
and just run it once, we can just do that, right? 01:06:03.200 |
And so this version of time actually is both %time and %timeit. 01:06:03.200 |
So if you give it repeating, it'll do that, right? 01:06:12.160 |
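A simplified sketch of a `time` helper like the one in notebook 00: it takes a trailing closure with no arguments and no result, runs it `repeating` times, and reports the average. The real notebook version does more (for example, statistics over multiple runs); this is just the shape of the idea.

```swift
import Dispatch  // DispatchTime gives us a monotonic clock

// Run `f` `repeating` times and print the average duration.
func time(repeating: Int = 1, _ f: () -> ()) {
    let start = DispatchTime.now()
    for _ in 0..<repeating { f() }
    let end = DispatchTime.now()
    let ms = Double(end.uptimeNanoseconds - start.uptimeNanoseconds)
             / 1_000_000 / Double(repeating)
    print("average: \(ms) ms")
}

time(repeating: 10) {
    _ = (0..<100_000).reduce(0, +)  // some non-trivial work to time
}
```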
And actually, this 00 notebook is worth flicking through 01:06:16.840 |
because it's the only notebook where there's no tensors 01:06:24.640 |
So if you just want to see like just Swift, right? 01:06:30.360 |
So for example, you can see how we built this thing 01:06:43.800 |
And you can kind of see how to write these things 01:06:49.360 |
And now we can export this and now anytime you want 01:07:02.240 |
And one of the nice things here is that you'll see 01:07:04.240 |
that download a file with actually using this path library 01:07:12.720 |
And that's because there's a wonderful programmer, 01:07:21.120 |
And I mentioned he's actually an open source programmer 01:07:23.880 |
who entirely relies on sponsorship for his work. 01:07:26.280 |
So if you like Max's work, which I certainly do, 01:07:29.840 |
you should go to his Patreon and give him a few dollars. 01:07:33.640 |
So thanks to Max, we have something that's really just 01:07:37.840 |
like pathlib but actually almost a bit better. 01:07:40.360 |
There's something that's almost exactly like requests. 01:07:44.320 |
So in the Swift ecosystem, thanks to the fact 01:07:46.960 |
that you've got all these millions of iOS programmers 01:07:49.320 |
who have been using this language for five years 01:07:52.960 |
there's actually a nice non-data science ecosystem. 01:07:58.040 |
about a non-data science Python similar packages, 01:08:01.400 |
is there any web framework similar to Flask or Django 01:08:06.800 |
>> Yeah, actually the Swift on the server community is 01:08:13.720 |
And a bunch of these different frameworks have Swift versions. 01:08:18.480 |
of the biggest non-iOS communities that exist. 01:08:21.480 |
>> So one I've seen a lot of love for is Vapor, I think? 01:08:28.320 |
And they're putting a lot of time and thought into that. 01:08:35.640 |
And there's a Swift version of that called SwiftNIO. 01:08:35.640 |
So there's a bunch of these fairly infrastructural things 01:08:42.160 |
that exist that are part of the Swift ecosystem. 01:08:45.920 |
>> Great. So here you can see how we can download a file. 01:08:50.600 |
We've got try-catch blocks, a lot like we're used to. 01:08:56.240 |
So in this case, we want to download MNIST and load it up. 01:09:04.080 |
and we'll talk a lot more about this next week. 01:09:05.800 |
But things can get a little bit complicated when, like, 01:09:09.200 |
for example, for MNIST, we've got a file containing labels. 01:09:25.840 |
Right? So here's a version where we actually tell it, "Oh, 01:09:30.400 |
you could load up different types of MNIST data, 01:09:33.920 |
and it's going to return a tensor of that type." 01:09:37.560 |
Okay. And unfortunately, if I try to use this, I get an error. 01:09:41.000 |
Right? And I really wanted to show you this error 01:09:43.600 |
because for the first week as a Swift programmer, I kind of -- 01:09:53.440 |
Swift hated me, and it told me these things like, 01:10:00.840 |
And I'm just going to say that's totally fine. 01:10:03.080 |
Right? Because the Swift type system is very new to somebody 01:10:09.440 |
like me and probably most of the people watching this. 01:10:15.720 |
who understand it pretty well, and it's totally normal to think. 01:10:24.560 |
>> To be stubbing your toe on every new thing 01:10:27.880 |
>> And particularly this generic stuff, you know? 01:10:30.200 |
And I would say, look, a couple of suggestions. 01:10:32.400 |
The first is just write the two separate versions 01:10:35.680 |
so that you don't get frustrated, and then come back 01:10:43.120 |
>> Yeah. But quite often the kinds of errors you get 01:10:46.160 |
from the type system are sometimes they're even 01:10:48.800 |
like a long way away from really where the problem was. 01:10:51.640 |
It can be quite difficult because it's a powerful type system 01:10:57.480 |
Now, in this case, the issue basically is that we are trying 01:11:01.840 |
to call -- we're trying to initialize either floats 01:11:12.440 |
that the type we're creating can initialize either floats 01:11:16.080 |
or bytes, so as you'll learn next week, you could do 01:11:19.120 |
that by creating something called a protocol. 01:11:21.160 |
You do it by saying that these things conform 01:11:25.060 |
You then use that protocol, and so now this version 01:11:34.040 |
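The protocol fix being described can be sketched like this. The protocol name here is made up for illustration; the notebook's actual names may differ. The point is that the generic loader can only call an initializer that the protocol guarantees exists.

```swift
// Declare a capability: "can be built from a byte".
protocol ConvertibleFromByte {
    init(_ d: UInt8)
}

// Both element types we care about already have this initializer,
// so conforming them is just a declaration.
extension Float: ConvertibleFromByte {}
extension UInt8: ConvertibleFromByte {}

// Now a generic function can initialize T from raw bytes.
func convert<T: ConvertibleFromByte>(_ data: [UInt8], to: T.Type) -> [T] {
    data.map { T($0) }  // legal because the protocol guarantees init(UInt8)
}

let bytes: [UInt8] = [0, 128, 255]
print(convert(bytes, to: Float.self))  // [0.0, 128.0, 255.0]
```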
So this is a nice little package that you can look through. 01:11:40.840 |
in 00 was the thing that makes //export works. 01:11:51.080 |
One of the things that I needed to do here was 01:11:55.040 |
to check whether something matches a regular expression 01:12:01.080 |
I found it extremely weird that the way to do 01:12:04.080 |
that in Swift is called range of options regular expression, 01:12:09.080 |
so I created something called find in string. 01:12:26.320 |
and to give you a sense of like when I say clunky APIs, 01:12:29.600 |
in particular, you'll see here we're using Max's beautiful 01:12:36.080 |
Before we realized that the path library does everything we 01:12:38.480 |
wanted, we used the thing that Apple provides, 01:12:51.200 |
in Apple's foundation library looks like, oh, my God, 01:13:00.000 |
like the leading path component, the binding path component, 01:13:05.640 |
I don't know why, but a lot of Swift programmers seem 01:13:13.480 |
but I think foundation is not necessarily your favorite API 01:13:19.360 |
>> I think it's fair to say that the thing that's great 01:13:21.320 |
about foundation is it has a lot of interesting 01:13:23.680 |
and useful stuff, URLs, and other stuff like that, 01:13:31.000 |
>> It's great that it's there, and it's amazing 01:13:37.160 |
>> I need something like the ability to tell the time. 01:13:40.320 |
It's in dispatch, or the ability to work with URLs. 01:13:43.680 |
And so know that foundation is there, and generally speaking, 01:13:50.040 |
Because there's a lot of stuff you want will live there, 01:13:52.160 |
and if you forget to import it, it won't appear 01:13:53.800 |
in your tab completion, and you'll get errors. 01:13:56.200 |
But also, when you find clunky foundation APIs, which is-- 01:14:04.640 |
Anyway, so once you've done that, now we've got our own, 01:14:13.600 |
And now, we can just do the same thing that we do in Python, 01:14:16.440 |
and we now have a nice little module exporter that we can use. 01:14:24.160 |
How do we know that calling F parentheses is not optimized away 01:14:32.080 |
>> Generally, so that's actually a great question. 01:14:41.480 |
I don't think there's cross-workbook optimization 01:14:41.480 |
What I recommend doing is put something that's not trivial 01:15:00.720 |
and that's not something that gets optimized away. 01:15:03.040 |
You can also, like, get the value into the closure 01:15:06.880 |
So, there's different things that you can do like that. 01:15:08.720 |
>> Yeah. Sometimes, when I was doing this stuff, 01:15:10.160 |
a base method, I would just add a print inside the thing 01:15:12.240 |
that I was timing to force it to be calculated. 01:15:14.800 |
>> And one of the other things that will happen with GPUs 01:15:36.560 |
And so, what's happening is that every time you, 01:15:41.480 |
to make sure that the index of your computing is in bounds. 01:15:46.040 |
that you would not need to do if you're in C. 01:15:49.240 |
And so, one of the other really cool things about Swift 01:15:51.360 |
is that you can actually go all the way down to the bare metal 01:15:55.240 |
and do things, the unsafe, nasty, awesome C way, 01:15:55.240 |
if you want to, to get even a little bit more performance. 01:16:02.840 |
And so, here, sorry, I forgot to change this back, 01:16:10.040 |
that we did before where we take in two arrays 01:16:13.160 |
And so, what we're going to do is to optimize storing 01:16:18.520 |
give me an unsafe mutable buffer pointer into that array. 01:16:22.400 |
And it's unsafe, it's verbose, it has red warning signs 01:16:29.200 |
But with almost no code change, now we're able to get something 01:16:35.480 |
And so, here's MatMul, and now it runs at .07 milliseconds, 01:16:40.120 |
which is even faster, which really is the performance of C. 01:16:44.120 |
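The unsafe variant just described, as a sketch (names are illustrative): grab an `UnsafeMutableBufferPointer` into the result array so the inner-loop stores go through the raw buffer rather than the checked `Array` subscript. The loop structure is identical to the safe version.

```swift
// Same three-loop matmul, but writing results through an unsafe
// buffer pointer. Unsafe means: no guardrails if an index is wrong.
func matmulUnsafe(_ a: [Float], _ b: [Float],
                  aDims: (Int, Int), bDims: (Int, Int)) -> [Float] {
    var res = [Float](repeating: 0, count: aDims.0 * bDims.1)
    res.withUnsafeMutableBufferPointer { out in
        for i in 0..<aDims.0 {
            for j in 0..<bDims.1 {
                for k in 0..<aDims.1 {
                    out[i * bDims.1 + j] += a[i * aDims.1 + k] * b[k * bDims.1 + j]
                }
            }
        }
    }
    return res
}

print(matmulUnsafe([1, 0, 0, 1], [5, 6, 7, 8], aDims: (2, 2), bDims: (2, 2)))
// [5.0, 6.0, 7.0, 8.0]
```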
>> And something I found with BaseMath is, like, 01:16:47.680 |
sometimes these differences are four or five times faster 01:16:59.880 |
>> But the thing I want to emphasize at this point is 01:17:01.400 |
that this is like a super low-level geeky thing 01:17:07.440 |
This is something that it exists because at certain points 01:17:11.000 |
in your career or your journey, you may find that it's useful. 01:17:14.080 |
Or you may find something that somebody else wrote, 01:17:21.880 |
But usually, you're not working at this level. 01:17:26.840 |
If you want to go like super deep down the rabbit hole, 01:17:30.040 |
unsafe pointer, and unsafe mutable buffer pointer, 01:17:32.800 |
and all these things are also Swift libraries, 01:17:34.520 |
and you can go see their implementations, too. 01:17:40.280 |
So, at this point, let's skip over more C stuff and jump 01:17:47.760 |
So, we've got a matrix multiplication working 01:17:51.680 |
on arrays and floats, but we also have tensors. 01:17:56.000 |
And so, when we talked about Tensor and MatMul 01:18:00.160 |
in the PyTorch context, you started out by using the Tensor 01:18:03.840 |
abstraction as the thing that was the container 01:18:09.000 |
So, let's talk a little bit about how Tensor works 01:18:12.240 |
so for TensorFlow piece of this, Tensor is a type. 01:18:15.800 |
And Tensor can have multiple different elements in it. 01:18:18.560 |
Like we talked about before, you can create one with zeros 01:18:22.920 |
And the nice thing about tensors is that they carry a shape, 01:18:26.400 |
just like you'd expect, and so you can get it with a dot shape. 01:18:36.640 |
and you can print it out, and it's a two-dimensional tensor, 01:18:41.560 |
Python has the @ operator to do MatMuls of two tensors. 01:18:46.280 |
Swift has the same thing, but it uses the nice Unicode thing. 01:18:50.360 |
There's an easy way to type this if you're on a Mac 01:18:53.160 |
or if you're using the compose key on Windows. 01:18:55.440 |
Or if you don't like Unicode, that's also totally cool. 01:18:58.560 |
You can just use the MatMul function and just spell it out. 01:19:03.000 |
of Swift just wanting to work the way you want to work. 01:19:05.280 |
And if you like math, then you can have math. 01:19:08.360 |
If you want to type things out, you can do that too. 01:19:15.920 |
You can reshape them with the reshape function. 01:19:18.240 |
They support square root and all the other math stuff. 01:19:21.560 |
It has element-wise operations like add and multiply 01:19:32.240 |
>> So, what we did was we turned off bounds checking. 01:19:32.240 |
And so, if I write code that -- if I have an array of 10 numbers, 01:19:38.800 |
and in Swift, if I access out the 11th element, 01:19:41.920 |
it will explode and tell me that that's invalid. 01:19:44.560 |
If you use unsafe, then it will let you do that. 01:19:47.920 |
And so, whatever happens to be in memory beyond the end 01:19:57.280 |
And so, this is -- Swift is trying to be default -- 01:20:00.680 |
by default safe, and it's trying to help you. 01:20:03.720 |
But if you want to, you can just rip off all the guardrails. 01:20:08.240 |
like you can literally do things as dynamic as Python 01:20:11.680 |
But, you know, the defaults are there to help you out. 01:20:14.640 |
>> Yeah, so, Python programmers, a lot of them won't be familiar 01:20:18.660 |
But in things like C, unsafe code is code where you're working 01:20:22.440 |
with memory that hasn't been initialized, or it's been freed. 01:20:25.520 |
And it's a really big problem if you're using it 01:20:36.520 |
So, you know, you should -- I think it's fine 01:20:41.560 |
So, coming back to tensor, you know, you can add them. 01:20:46.160 |
You can -- like all the normal stuff you'd expect is on tensor. 01:20:49.560 |
Tensors, if I run the right cell, also have a bunch 01:20:49.560 |
of methods that do cool things like convolutions and other stuff 01:21:01.080 |
that it likes comparisons to return Booleans. 01:21:03.760 |
And so, you'll see that if you compare two tensors, 01:21:06.000 |
it will see if -- it will give you an ordering 01:21:10.420 |
But sometimes you want to get a tensor of Booleans back. 01:21:13.400 |
And so, Swift calls these the point-wise operators. 01:21:17.640 |
And so, if you put a dot before the less than or the greater than 01:21:20.320 |
or the equals or whatever, it will do a tensor comparison. 01:21:24.080 |
>> Yeah. And I get burnt by this all the time in NumPy and PyTorch 01:21:29.240 |
So, I think that this design call is awesome, this idea 01:21:32.440 |
that Boolean operations always return Booleans 01:21:36.320 |
and point-wise operations return point-wise Booleans. 01:21:40.960 |
>> And then you have reductions like any and all that say, hey, 01:21:43.400 |
if I have a tensor of Booleans, I can check to see 01:21:46.320 |
if they're all set or if any of them are set. 01:21:51.920 |
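The semantics just described, sketched with plain arrays rather than the real `Tensor` type (so it runs without Swift for TensorFlow): a point-wise comparison produces a collection of Booleans, and "any"/"all" reduce it.

```swift
// What `.<` on tensors means, spelled out element-wise on arrays.
let a: [Float] = [1, 2, 3]
let b: [Float] = [2, 2, 2]

let pointwise = zip(a, b).map { $0 < $1 }  // element-wise comparison
print(pointwise)                           // [true, false, false]

print(pointwise.contains(true))            // "any": at least one is set
print(pointwise.allSatisfy { $0 })         // "all": every one is set
```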
of the notebook is just saying, hey, look, all the stuff 01:21:54.000 |
that you've seen before looks exactly the same 01:21:58.200 |
Sometimes the words change, like unsqueeze is called 01:22:01.080 |
expandingShape(at:), which is a rather swifty way of saying things. 01:22:05.000 |
But there's -- in a lot of these notebooks, you'll find that there's 01:22:09.320 |
like lots of details where we've just basically copied the Python 01:22:13.480 |
code and we're not going to show you all those details 01:22:17.720 |
>> Yep. Now, let's talk about matmul on tensor. 01:22:20.880 |
So, what we've done here is we've defined the same matmul 01:22:23.160 |
that we had before, but where before we took two flat arrays, now we take tensors. 01:22:32.200 |
We start by creating a zero tensor, and we loop over it all. 01:22:34.400 |
Now we have our two-dimensional indexing, just like we're used to. 01:22:38.360 |
When you run this, what you'll see is this is a lot slower. 01:22:59.080 |
The first thing I want to say is that hopefully 01:23:02.720 |
at this point you're thinking this is a problem 01:23:06.480 |
because it's kind of like the exact opposite of everything 01:23:09.680 |
that Chris has been telling us and I've been telling you 01:23:12.960 |
Like, what's the point of something that's infinitely 01:23:15.240 |
hackable if there's this TensorFlow layer we can't go beneath, 01:23:18.920 |
and it's so slow that we can't really actually write our own things? 01:23:27.120 |
So, we would not be running this course 01:23:34.960 |
This is where we are now and it's a temporary situation 01:23:46.880 |
So, the first thing to point out is that when you work 01:23:50.440 |
with PyTorch, we have a similar issue, right? 01:23:54.440 |
It's like, we don't write triply nested for loops in PyTorch either, 01:24:01.960 |
right, and the reason we don't is that we need to give PyTorch 01:24:04.640 |
a reasonable amount of work to do each time we call it. 01:24:15.960 |
multiply them together and then it says here's the whole thing 01:24:21.360 |
So, it's like if PyTorch was an airplane, right, 01:24:24.200 |
and we want to send our stuff off to China, we pack it all 01:24:31.840 |
As opposed to the triply nested for loop version, 01:24:34.440 |
which is where I take a sock and I put it in an airplane 01:24:38.000 |
and it flies to China and back and then I put it 01:24:42.120 |
And that's going to take forever. It's a fast airplane, right, 01:24:45.960 |
but it's just not an efficient way to use it, right? 01:24:50.160 |
So, we already have this issue, which is you've got 01:24:53.000 |
to give PyTorch enough work to do to make this latency worthwhile. 01:25:00.280 |
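The sock-on-an-airplane point can be made concrete with a toy cost model (the numbers below are made up for illustration, not a real benchmark): every call into the framework pays a fixed dispatch overhead, so a million tiny calls are dominated by overhead while one big call amortizes it.

```python
# Toy illustration of dispatch latency vs. useful work (invented numbers).
DISPATCH_OVERHEAD_US = 5.0   # hypothetical fixed cost per framework call
WORK_PER_ELEMENT_US = 0.001  # hypothetical cost of the actual arithmetic

def total_cost(num_calls, elements_per_call):
    # Each call pays the overhead once, plus the real work it carries.
    return num_calls * (DISPATCH_OVERHEAD_US
                        + elements_per_call * WORK_PER_ELEMENT_US)

n = 1_000_000
one_sock_at_a_time = total_cost(num_calls=n, elements_per_call=1)
one_big_shipment = total_cost(num_calls=1, elements_per_call=n)

# Overhead dominates the per-element version by thousands of times.
print(one_sock_at_a_time / one_big_shipment)
```

Under these assumptions the triply nested loop is thousands of times slower purely from per-call latency, which is why every tensor framework wants big batched operations.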
Now, TensorFlow was designed in a very different way 01:25:03.480 |
to PyTorch and for those of you that did the earliest courses 01:25:08.320 |
with fast AI, this will look very familiar, right? 01:25:10.680 |
It's actually a really fascinating programming model. 01:25:13.960 |
You say there will be later on a float called X 01:25:19.160 |
and later on I will want to multiply that float by two. 01:25:25.360 |
Now, set up a session where we're going to do some stuff 01:25:28.840 |
and run this computation graph, which could have lots 01:25:32.680 |
of things going on in it, and run it in these things, right? 01:25:37.880 |
So, I basically kind of set up this whole series 01:25:40.240 |
of computations and then I pass in some data. 01:25:43.880 |
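That "define the graph first, feed data later" model can be sketched in a few lines of plain Python (this is an illustrative toy, not the real `tf.placeholder`/`Session` API): stage one builds a description of the computation, and nothing actually runs until stage two passes in data.

```python
# Toy sketch of a deferred-execution ("define then run") model.
class Placeholder:
    def __init__(self, name):
        self.name = name
    def evaluate(self, feed):
        # Look up the value supplied at "run" time.
        return feed[self.name]

class Multiply:
    def __init__(self, node, constant):
        self.node, self.constant = node, constant
    def evaluate(self, feed):
        return self.node.evaluate(feed) * self.constant

# Stage 1: describe the computation. Nothing runs yet.
x = Placeholder("x")
y = Multiply(x, 2)

# Stage 2: "run the session", feeding in actual data.
print(y.evaluate({"x": 3.0}))  # 6.0
```

Because the whole graph exists before any data flows, the runtime can optimize and distribute it as a unit, which is exactly what made this design appealing at Google scale.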
So, this is a very different feel to PyTorch, right? 01:25:49.120 |
And because TensorFlow was built this way, TensorFlow, to me, 01:25:57.040 |
It behaves like a ship or actually a ship designed 01:26:01.160 |
for shipping ships, or actually a shipping ship designed 01:26:04.800 |
for shipping shipping ships, which is this particular one, 01:26:09.720 |
So, if you have a shipping ship shipping ships ship, 01:26:15.080 |
then you need to use it in a different way, which is 01:26:17.160 |
if you want to get all of your socks, all of the socks 01:26:20.080 |
in America to China, you send them all, send all of your ships 01:26:26.880 |
everybody dumps their socks on and they all get sent 01:26:31.960 |
Now, to take advantage of this extraordinary power, 01:26:35.400 |
you have to use it in a certain way and you have 01:26:36.960 |
to have certain things that you need to be able to do. 01:26:43.480 |
to run all the world's search engine queries, 01:26:49.400 |
Now, TF Eager is the kind of the new hot thing for TensorFlow 01:26:54.880 |
and it's designed to, or it does look like PyTorch, right? 01:27:00.200 |
So, this is what happens when you say tf.enable_eager_execution(): 01:27:09.480 |
I say, this is my number and this is my matrix multiplication, 01:27:16.520 |
The thing about this is, though, is that this is kind 01:27:21.040 |
of syntax sugar on top of the ship, shipping, ship, 01:27:29.560 |
a lot of the same kind of foundations and some 01:27:32.280 |
of it's been optimized but only a bit, right? 01:27:35.120 |
So, as I say this today, as of April 2019, multiplying a small matrix, like 5 by 5, 01:27:48.760 |
takes about 10 times longer than it does in PyTorch, right? 01:27:54.360 |
and so TF Eager is not a solution to writing the kind 01:28:00.960 |
of low level get down to the bottom stuff that Chris is saying, 01:28:21.160 |
>> So, TensorFlow has this big ecosystem of things to try 01:28:24.600 |
and kind of fill in around this, around this issue 01:28:28.200 |
of having this huge kind of mechanism that works 01:28:30.360 |
in a particular way to make sure that, you know, 01:28:32.480 |
you can put it on mobile phones or that you can do it 01:28:38.240 |
But the good news is that what's happening at the moment 01:28:43.360 |
and the reason we're seeing this speed, right, is that Swift for TensorFlow currently sits on top of TF Eager. 01:28:57.800 |
And this is like a great choice because it lets us 01:29:00.120 |
like do this course, it lets us say like here's how you use it, 01:29:03.400 |
we can build things on top of it whilst the real stuff is being built 01:29:08.400 |
behind the scenes. And the real stuff is to sit on top 01:29:13.080 |
of this thing called MLIR, which Chris can tell us a little bit 01:29:16.320 |
about, and which basically takes all of that compiler goodness 01:29:20.240 |
that you've seen and allows it to work with the GPU and the CPU 01:29:25.640 |
and different types of accelerators, and lets you write Swift, right? 01:29:31.080 |
So the reason I mention this is that for probably as long 01:29:34.280 |
as this course is relevant, you know, like the next year, 01:29:38.760 |
the true potential of what we're talking about won't be fully here. 01:29:44.080 |
We're actually building for a future that's not here yet. 01:29:48.320 |
But when we get there, all the stuff that Chris is showing you, 01:30:01.560 |
>> Yeah, so if I were, a different way to explain it, 01:30:04.280 |
a year from now it's going to be mind blowing. 01:30:08.400 |
to do stuff you've never even thought that was possible 01:30:16.680 |
Like there are certain humans in the world, like Scott Gray is one 01:30:19.520 |
of these people who can make an accelerator do crazy things 01:30:24.520 |
And that's what we're trying to do, but in a workbook. 01:30:27.000 |
>> Right. And the reason this matters is that there are vast areas 01:30:30.960 |
of unexplored research territory because, I mean, 01:30:35.360 |
most people can't write the CUDA code, and even those that can, 01:30:43.480 |
So in a year's time, we'll be able to do stuff 01:30:46.480 |
that people just aren't even trying to do yet. 01:30:48.920 |
>> But one of the cool things about this is you don't have 01:30:51.080 |
to wait a year, so next lesson we'll show you that XLA is here 01:30:54.160 |
today, XLA is super awesome, it's a really important part 01:30:57.240 |
of the TensorFlow ecosystem and it's way better 01:31:02.440 |
>> So we want to like completely jump over the industry 01:31:08.760 |
>> But even today, TensorFlow is a lot of power and XLA allows you 01:31:16.960 |
So XLA is this really nice kind of intermediate thing 01:31:18.560 |
where it's much more mature than the PyTorch JIT, 01:31:23.520 |
It's a compiler that will turn your code into stuff that's kind 01:31:27.480 |
of similar-ish performance to what you might see 01:31:30.280 |
from PyTorch JIT, probably a lot less rough edges. 01:31:33.120 |
>> It doesn't generate blobs of C++ and try to compile them again. 01:31:38.520 |
>> So it's a really neat path because it allows us 01:31:40.840 |
to do this course now, it allows you to start playing with this now 01:31:43.960 |
in a couple of months, it allows you to get a lot of performance 01:31:46.960 |
for a lot of things that you might want to play with and it means 01:31:50.400 |
that by the time MLIR comes, we'll be all ready 01:31:56.760 |
>> And is there a way to make sure the matmul 01:31:59.760 |
or other functions are correctly using shared memory on the GPU, 01:32:04.960 |
to make sure you aren't constantly busting the cache? 01:32:08.520 |
>> We're not going to talk about this next week 01:32:11.680 |
>> Well, so I think that the thing to know is 01:32:16.800 |
You would not poke a tensor one float at a time. 01:32:24.800 |
But what you end up wanting to write is, let's see here. 01:32:29.120 |
You just write this where you write something 01:32:32.600 |
where you take two matrices and you're multiplying together 01:32:35.400 |
or you use the Unicode one or the matmul one and it goes fast 01:32:38.280 |
and it takes 0.02 seconds, which is faster than the Swift version 01:32:45.920 |
If you run on the CPU, it uses all the threads on your computer 01:32:50.240 |
And so the way to think about tensor is that it's meant 01:32:58.000 |
>> And we will see next week some really interesting stuff coming 01:33:00.720 |
down the line with stuff where you can write kind 01:33:06.200 |
of tiled algorithms in ways that are much more concise 01:33:17.400 |
How do LLVM, MLIR and XLA relate to each other? 01:33:22.120 |
>> That would be better explained with slides which we'll go 01:33:37.120 |
But LLVM really helps you with the one float at a time kind 01:33:40.720 |
of a thing if you're going to a simpler processor. 01:33:43.800 |
XLA is really good at tensors and so it's a tensor compiler 01:33:47.960 |
and so it's really good at saying I have these big tensor 01:33:50.920 |
operations, I have convolutions. To get maximum performance, 01:33:58.080 |
you have to know about tiling, you have to know about fusion, all 01:34:02.180 |
of these low-level systems things, before you then hand it off. 01:34:09.400 |
And so XLA talks to LLVM for GPUs for example 01:34:17.640 |
and LLVM does all the float and small vector stuff. 01:34:23.560 |
but MLIR is tackling graph-level optimizations in TensorFlow 01:34:27.200 |
and kind of expanding XLA beyond just dense linear algebra 01:34:31.520 |
because there's a lot of interesting sparse things 01:34:34.600 |
and other things that are coming down the pipeline 01:34:38.400 |
>> So yeah, so basically I mean we won't look at the rest 01:34:41.680 |
of this notebook other than to say that the broadcasting stuff 01:34:47.320 |
So you can kind of see how that all looks at the moment. 01:34:54.440 |
>> All that stuff is all here and don't worry 01:34:56.440 |
about the performance, it's really slow at the moment 01:34:58.240 |
for the reason we mentioned, but it totally won't be. 01:35:01.120 |
And you can also see matrix multiplications of different sizes 01:35:11.200 |
>> Well do you want to do this or do you want to go to 11? 01:35:19.440 |
So one of the really cool things about the stack is 01:35:21.640 |
that tensorflow is a really mature ecosystem. 01:35:23.520 |
It has hundreds of different operators available. 01:35:28.680 |
So tensorflow kind of grew organically over time a little bit 01:35:31.480 |
and so it has a lot of things in its toolbox. 01:35:35.920 |
What Swift for Tensorflow does is it tries to curate that 01:35:38.640 |
and it has tensor and the way tensor works is it's the struct 01:35:47.080 |
And so if you look inside tensor and here there's a link 01:35:49.960 |
so you can go click on it and see the implementation. 01:35:52.440 |
Tensor has this thing called tensor handle that is 01:35:55.200 |
under the covers basically the tensorflow low level thing 01:36:01.240 |
And if you look at plus, what plus does on two tensors is it 01:36:08.760 |
And the way that this works is this is just directly talking 01:36:16.120 |
And the add op is implemented with cuDNN or it's implemented 01:36:19.000 |
with XLA or it's implemented in different ways 01:36:22.960 |
But this is just a simple syntactic sugar thing 01:36:26.200 |
where we're saying hey plus turns into tensorflow add. 01:36:29.560 |
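The "plus is just sugar for the add op" idea is easy to mimic in Python with operator overloading (the `ToyTensor` and `raw_add` names below are invented stand-ins, not the real S4TF internals): the `+` operator does nothing itself except forward to a lower-level op.

```python
# Toy sketch: `+` on a tensor type is pure syntactic sugar that
# dispatches straight to a raw "add" op, which in the real system
# might be backed by cuDNN, XLA, or something else entirely.
def raw_add(a, b):
    # Stand-in for the runtime's low-level Add operation.
    return [x + y for x, y in zip(a, b)]

class ToyTensor:
    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # The operator just forwards to the raw op.
        return ToyTensor(raw_add(self.values, other.values))

t = ToyTensor([1, 2]) + ToyTensor([10, 20])
print(t.values)  # [11, 22]
```

The nice property of this layering is that the high-level surface stays tiny and curated while the backend underneath can be swapped freely.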
Now again, tensorflow has tons of cool stuff and it has stuff 01:36:33.160 |
that I barely understand with lots of mathy things 01:36:38.400 |
and like Bayesian propagation of things that I've-- 01:36:42.600 |
>> We have an excellent course about triangular decomposition 01:36:47.120 |
I'm going to try to survive next week and then I'll take it. 01:36:57.000 |
And so what you can do is you can actually add new things 01:37:01.080 |
And so one example of that right here is -- so you can get, 01:37:05.720 |
like, Raw.zerosLike. If you go in here, let's see. 01:37:11.640 |
So with tab completion you can see all of the interesting things. 01:37:18.280 |
AddManySparseToTensorMap, AddN, AdjustContrast, Asin -- 01:37:23.640 |
like it's got lots and lots and lots and lots and lots and lots 01:37:27.800 |
>> And this is super cool, particularly if you're watching this 01:37:30.720 |
between like about April and about September, 01:37:34.080 |
like in the period where maybe the XLA stuff isn't 01:37:39.920 |
You probably care about this because there's-- 01:37:43.360 |
>> Lots and lots and lots and lots and lots and lots and lots 01:37:47.480 |
>> Which we haven't necessarily surfaced yet. 01:37:50.680 |
how do I switch from RGB to BGR format? And somebody said, 01:37:57.520 |
oh, there's something in TensorFlow called reverse 01:38:04.320 |
>> Yeah, and so one of the things we use for XResNet 01:38:06.400 |
and other image models in this course is, hey, 01:38:10.440 |
And you could do that with Python, that's fine. 01:38:16.760 |
And so all we're doing is we're adding a method 01:38:21.600 |
if you want to create a string tensor from a file, 01:38:25.480 |
we can have read tensor, we can just use read file 01:38:30.960 |
and now I can say, string, give me a string tensor, read file, 01:38:35.920 |
foo and I added a decode JPEG method on here too 01:38:39.960 |
so now I can say decode JPEG, JPEG and I got my file, right? 01:38:46.640 |
And so this is one of the cool things about this platform is 01:38:51.480 |
We're not trying to hide it, we're just trying 01:38:52.800 |
to curate it a little bit but again, you can just go add 01:38:58.480 |
in the study group today was building an audio library 01:39:00.000 |
with Swift for TensorFlow, and we haven't surfaced any of that, 01:39:03.200 |
so they were grabbing, you know, Raw.decodeWav or something 01:39:10.360 |
And again, Swift gives you nice ways to build these things 01:39:12.800 |
as APIs with default arguments and all this nice stuff 01:39:15.320 |
and so you get a lot of design space to do things 01:39:21.760 |
The way we've decided to do this is we've kind of gone like super, super bottom up. 01:39:26.880 |
I must admit I thought we had done bottom up before 01:39:33.040 |
>> Yeah, then we brought a compiler guy who, you know, 01:39:36.320 |
is always good at making me feel small and insignificant. 01:39:40.080 |
And so, but now let's jump back up to the top again 01:39:44.720 |
to see where we're going to end up and then next week, 01:39:50.160 |
we're going to kind of flesh out all the stuff 01:39:56.680 |
And notebook 11 is interesting because this is the one 01:40:00.120 |
where we train an xresnet on imagenet, right? 01:40:07.000 |
So every time we import the previous notebook, 01:40:12.120 |
just like we do in Python, the previous notebooks, however, 01:40:16.440 |
aren't just numbered but they also have the name. 01:40:22.840 |
of %matplotlib inline in this environment. 01:40:22.840 |
So here, load data, we'll show you how we built something 01:40:37.520 |
that downloads imagenet, but it basically looks almost exactly 01:40:40.440 |
like the download MNIST thing 01:40:45.320 |
And we've created an item list which has extensions. 01:40:49.680 |
And we've created a split data which takes an item list. 01:41:02.400 |
because now we can just pass in a trailing closure 01:41:05.520 |
which as Chris described, if the last parameter is a closure, 01:41:12.080 |
then you can just whack it in curly brackets. 01:41:16.200 |
And you don't even have to give it an argument name 01:41:20.920 |
So we're saying split this item list by grandparent. 01:41:26.720 |
This is the file name that you're going to get. 01:41:28.360 |
This is basically like the equivalent of doing partial, 01:41:30.720 |
right, and it's going to be some different validation set. 01:41:42.200 |
So you can say we've got a whole data blocks API here. 01:41:45.720 |
One of the things that I guess you're going to talk about next 01:41:52.240 |
>> Yeah, OK, so basically in Swift, as Chris mentioned, 01:42:04.360 |
structs are things that normally don't change. 01:42:07.760 |
But you can create something that kind of feels a lot like a mutable object. 01:42:15.960 |
Because remember, processors actually change their state 01:42:20.520 |
because we get like a vocabulary, for example, 01:42:22.960 |
the first time we use a processor on the training set. 01:42:29.520 |
And then we've added a to data bunch and we can pass 01:42:33.160 |
in all the normal stuff, including a batch size. 01:42:37.640 |
So next thing we can do is we can do transformations. 01:42:43.000 |
And again here, we can use a trailing closure 01:42:45.880 |
to basically do a partial, to say that we're going to do 01:42:56.200 |
Something that I think Chris will probably talk 01:43:00.800 |
But basically in Swift, very often you want to be able 01:43:06.480 |
to say, hey, this is going to return either a batch of data 01:43:10.400 |
or maybe it was going to return nothing at all, right? 01:43:13.320 |
Which in Python, we use the optional type for that. 01:43:18.560 |
And it's called the same thing in Swift, right? 01:43:22.840 |
So basically what happens is if you have something 01:43:27.440 |
that might not return anything, so one batch might not return 01:43:30.200 |
anything because it might be nothing to return. 01:43:33.440 |
And then the exclamation mark just says, assume that it's something. 01:43:45.680 |
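A Python analogue of that optional pattern, assuming a made-up `one_batch` helper for illustration: the function may return a value or `None`, and the moral equivalent of Swift's `!` force-unwrap is asserting the result isn't `None` before using it.

```python
from typing import Optional

# Toy sketch: one_batch may have nothing to return, so its result
# type is Optional -- a value, or None (Swift's nil).
def one_batch(batches: list) -> Optional[list]:
    return batches[0] if batches else None

batch = one_batch([[1, 2, 3]])
assert batch is not None   # roughly what Swift's `!` asserts for you
print(batch)               # [1, 2, 3]

empty = one_batch([])
print(empty)               # None
```

The difference is that Swift makes this explicit in the type system: you cannot touch the value without either unwrapping it safely or asserting with `!`, whereas Python will happily let a `None` flow onward until something crashes later.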
So it's been really fun, this process of, you know, 01:43:49.520 |
in the last couple of weeks of basically saying, 01:43:58.960 |
And one thing I'll say is like a lot of these notebooks have been written 01:44:04.960 |
by Sylvain in particular and by me a little bit. 01:44:12.840 |
through those notebooks thinking, oh, this is nice, 01:44:15.640 |
but it'd be even more Swift-y if you did blah. 01:44:18.400 |
Please let us know in the forum, because we're super interested 01:44:23.000 |
>> And I've been super interested to learn all the ML. 01:44:27.880 |
I mean, it's, you know, in one sense, it's a good sign 01:44:32.400 |
that you're learning fast AI for Swift from the people who started the fast AI 01:44:37.760 |
in Swift projects, but on another sense, I know nothing about Swift 01:44:41.600 |
and Chris doesn't know much about deep learning, 01:44:43.440 |
so maybe it's the worst of all possible worlds. 01:44:51.640 |
So as you can see, we've got a data blocks API that's now working. 01:44:57.000 |
The other thing I mentioned, as you'll see next week, 01:44:59.440 |
is the way we've got this working is it's using a TensorFlow API called tf.data, 01:45:06.360 |
which is actually a lot better than a lot of data APIs, 01:45:11.040 |
but it's still not nearly as good as I would like, and I would love to, 01:45:15.360 |
as a community, start building out the next version that uses 01:45:19.160 |
like Swift's libdispatch to do the threading, and maybe OpenCV 01:45:19.160 |
to do the transformations and stuff like that. 01:45:29.280 |
something like the Python data blocks API, but that is like native. 01:45:42.960 |
So let's create an x-resnet model, and as you've already seen 01:45:46.400 |
in the slides, it ends up looking very, very familiar. 01:45:56.600 |
and this will probably only be true for a couple more weeks, 01:45:59.320 |
there are kind of two versions of all the layers. 01:46:01.800 |
There's the versions in the fast AI repo, which all start with FA, 01:46:06.080 |
and there are versions in the Swift repo that don't. 01:46:10.960 |
So a ConvLayer has a batch norm, and it has a convolution. 01:46:10.960 |
Another thing that's slightly awkward at the moment is that we -- 01:46:19.960 |
so you'll see, right now, some of our code looks weird 01:46:19.960 |
because auto diff in Swift doesn't support control flow yet, 01:46:25.880 |
So when you see something like no bias convolution, 01:46:37.760 |
that's because we can't write a convolution layer 01:46:41.160 |
that has an if statement saying if the person asks for bias, 01:46:46.280 |
So don't worry too much about those workarounds. 01:46:50.800 |
So we've got a batch norm layer, we've got a conv layer, 01:46:53.560 |
and we can go through them, and the zero bn is the stuff 01:47:01.800 |
but otherwise everything looks the same as usual. 01:47:05.600 |
Because we don't have the ability right now -- 01:47:14.920 |
we've basically added something called a switchable layer, 01:47:23.440 |
how we actually wrote our own kind of custom gradients 01:47:28.560 |
for this kind of layer, and that'll be fun to learn about. 01:47:31.880 |
So then we used that to basically have something where -- 01:47:35.600 |
because remember in xresnet, in the identity path, 01:47:43.040 |
sometimes you change the number of channels in that path. 01:47:45.560 |
If you down-sample, then you maybe add an average pool 2D. 01:47:50.240 |
So because, again, we don't have the ability to have if, 01:48:00.160 |
by adding a one-by-one conv, so that's all that is. 01:48:04.400 |
So most of this stuff, if you're watching this, you know, 01:48:17.400 |
the res block, there's really nothing to mention, is there? 01:48:32.240 |
in the Python versions and kind of switch between them, 01:48:43.440 |
Like, it's the same layers equals conv layer. 01:48:52.040 |
This question mark and colon is identical to an if-else expression. 01:48:52.040 |
It comes from C. And then, yeah, and then finally in the call, 01:49:01.480 |
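For readers more at home in Python: Swift's `cond ? a : b` corresponds to Python's conditional expression `a if cond else b`. A tiny illustration (the function and the filter counts are invented for the example):

```python
# Swift:  let nf = expansion == 1 ? 64 : 64 * 4
# Python equivalent of the C-style ternary, with illustrative values:
def n_filters(expansion):
    return 64 if expansion == 1 else 64 * 4

print(n_filters(1))  # 64
print(n_filters(4))  # 256
```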
And so it's interesting to see how some swift things kind 01:49:39.960 |
this is the same as range(n_blocks) in Python. 01:49:39.960 |
So this is basically saying map over range(n_blocks), 01:49:44.680 |
I find it more clear, the swift way, but very, very similar. 01:49:59.160 |
>> And the idea of Swift is to have simpler primitives 01:50:09.400 |
The x res net looks very similar to what we would expect. 01:50:16.760 |
We've still got our number of filters thing going on. 01:50:20.120 |
The stem, so now we've got that array map thing. 01:50:24.120 |
You're kind of going to start to get a feel for these kind 01:50:28.440 |
So kind of range.map is a useful kind of idiom to be aware of. 01:50:39.280 |
It just depends on how you want to write the code. 01:50:43.280 |
Rather than enumerate(something), it's something.enumerated(). 01:50:46.720 |
When you map over it, you get back an index and an object, just 01:50:51.400 |
So in this case, because we've gone .map and then .reduce 01:50:56.720 |
with plus on a list, this is a list comprehension now, right? 01:51:01.120 |
Because this is spitting out a bunch of lists 01:51:09.560 |
This is one of those cases where you're asking 01:51:14.720 |
List could be empty, so there might be nothing in it. 01:51:23.880 |
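The `.map { ... }.reduce([], +)` idiom maps to a familiar Python pattern: build a list of sublists and concatenate them, which is exactly what a nested list comprehension does. A sketch with invented layer names for illustration:

```python
from functools import reduce
import operator

# Hypothetical "make the layers for block i" helper.
def make_block(i):
    return [f"conv{i}", f"bn{i}"]

n = 3
# Swift style: (0..<n).map { makeBlock($0) }.reduce([], +)
swift_style = reduce(operator.add, [make_block(i) for i in range(n)], [])

# Equivalent Python list comprehension, flattening as it goes.
python_style = [layer for i in range(n) for layer in make_block(i)]

assert swift_style == python_style
print(swift_style)  # ['conv0', 'bn0', 'conv1', 'bn1', 'conv2', 'bn2']
```

Both spellings produce the same flattened list; which reads more clearly is largely a matter of which language's idioms you grew up with, which is Jeremy's point.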
So we can compose this list of layers on our input. 01:51:29.680 |
So again, we've got kind of similar concepts expressed 01:51:39.800 |
So now we create the various functions that we need 01:51:46.960 |
So one is a function that's going to create a model. 01:51:49.600 |
So it's going to be something that creates an exresnet, 01:51:52.800 |
and that's the function that's going to create a model. 01:51:55.560 |
We're going to have a function that creates our optimizer, 01:51:59.800 |
which, as you'll see, we have a stateful optimizer, 01:52:06.480 |
We have a learner, just like we had in Python, 01:52:12.040 |
And again, next time we'll talk about how all these are built. 01:52:21.120 |
We have recorder callbacks, just like we're familiar with. 01:52:25.480 |
We have one-cycle training, just like we're familiar with. 01:52:27.920 |
This addOneCycleDelegates and makeDefaultDelegates stuff 01:52:33.520 |
and we'll have some slightly neater ways to do this. 01:52:39.360 |
And then we train it with a resnet 18 for a few minutes, 01:52:45.120 |
Couple of things to mention as I go through this end of April. 01:52:51.600 |
Right now, this uses about twice as much memory as PyTorch, 01:52:56.880 |
and it's about three to four times slower than PyTorch. 01:53:00.240 |
No fundamental reason why this need be the case. 01:53:19.440 |
at a point where we can actually train proper models like this 01:53:22.960 |
from scratch in not too slow and not too memory intensive. 01:53:28.280 |
And if you're interested in getting into the weeds, 01:53:31.160 |
we would certainly love help with fixing the performance 01:53:42.000 |
as someone who's now using Swift for the first time? 01:53:45.880 |
So the best place to go is github.com/tensorflow/swift. 01:53:57.880 |
is that we're building everything in the open. 01:54:06.600 |
that you'll find linked up for a GitHub page. 01:54:08.680 |
And so we try to have a really inclusive and welcoming 01:54:11.520 |
And there's a ton of resources available and a lot to do. 01:54:20.160 |
is to come to the harebrain forum on the fast.ai forums. 01:54:23.960 |
Because I think for a lot of people, the right way-- 01:54:28.120 |
that you'll get the most out of, the most relevant to you 01:54:30.480 |
right now, is to pick something that you've already 01:54:39.920 |
And you may think you have no idea how to do that. 01:54:43.440 |
But create a really crappy, slimmed-down Swift version 01:54:48.640 |
That's the only way any of this stuff gets done. 01:55:05.280 |
Yeah, where fast AI lives and where Swift for TensorFlow 01:55:09.960 |
But in the end, between the fast AI and Swift for TensorFlow 01:55:15.360 |
repos, there'll be a kind of an ecosystem that covers 01:55:19.520 |
the same kind of ground that PyTorch plus fast AI covers 01:55:28.320 |
Because you've got the entirety of Scikit Learn and Matplotlib 01:55:36.160 |
Another thing is, if you go on the fast AI GitHub, 01:55:44.560 |
And so next time, we'll go back through and talk about two 01:55:50.400 |
And so if you'd like to look, you can go do that. 01:55:52.600 |
And they'll get a little bit better by next week, I bet. 01:55:54.900 |
And one thing to mention is, with the normal fast AI 01:56:06.440 |
These notebooks is going to be very different. 01:56:08.400 |
We're going to keep them very, very up to date. 01:56:10.400 |
So by the time you watch this, they may look very different. 01:56:13.800 |
Because we want to always have for you, showing you, 01:56:16.880 |
this is how we suggest you write Swift for TensorFlow code now. 01:56:24.760 |
you'll see we've been discovering new things, 01:56:26.640 |
like differentiable arrays and switchable layers. 01:56:28.680 |
And it allows us to delete lots of previous workarounds 01:56:32.720 |
And the next couple of months will be similar. 01:56:40.800 |
Swift is both thread safe and has a really great threading 01:56:45.240 |
And so you can fire up lots of concurrent work items, 01:56:48.600 |
set up work queues, has a really vibrant and good API 01:56:55.560 |
and all these advanced things that are available there. 01:56:58.520 |
Yeah, I've never used such a nice kind of threading-- 01:57:05.880 |
So on the Apple side, they call it Grand Central Dispatch. 01:57:08.680 |
But they've put the whole thing over to Linux. 01:57:11.000 |
And you have this whole kind of threading library framework, 01:57:17.760 |
This is one of the reasons the Swift server community really 01:57:22.320 |
but it also supports threading and other things really well 01:57:27.040 |
Are there any scikit-learn for Swift projects in the works? 01:57:42.120 |
But it would definitely be nice if you could build a gradient 01:57:45.080 |
boosting machine or even simple things like K-nearest neighbors 01:57:57.160 |
is to go beyond just reinventing what's already there. 01:58:02.240 |
but it could be a lot better, particularly in Swift. 01:58:06.520 |
So if you do decide I want to build bits of sklearn 01:58:16.000 |
try and build something that's vastly better than what's 01:58:21.440 |
I wouldn't suggest that being a starter project. 01:58:24.880 |
one of the lessons in the one through seven class 01:58:32.400 |
I think that'd be a really great place to start. 01:58:34.560 |
As you get more experienced and you get more familiar with Swift, 01:58:37.560 |
then tackling something like building a big library 01:58:40.760 |
Is there any plan to build tensor shapes into the Swift type 01:59:04.800 |
So this is something we're super interested in. 01:59:06.720 |
We think there's a lot of opportunities there, 01:59:08.720 |
both in terms of shape checking, for example. 01:59:12.120 |
The whole stack we're building with the compiler integration 01:59:14.800 |
and the good error locations and stuff like that. 01:59:25.000 |
how the best way to do that is, because there's trade offs. 01:59:27.520 |
But that's exactly all the second step things we want to do, 01:59:33.760 |
the basic auto diff, base performance, like scale out, 01:59:42.160 |
And we're trying to stay focused on making sure 01:59:44.080 |
that things are really good, and build the basics, 01:59:58.960 |
How is ampersand referencing different from struct? 02:00:05.400 |
So Swift has a-- this comes back to safety in the language. 02:00:12.760 |
to references, like classes, and structs, and values. 02:00:17.200 |
And so I'm going to save that mind-blowing piece 02:00:31.500 |
And how is Swift for probabilistic programming? 02:00:37.080 |
We're both completely incapable of talking intelligently about it, but it 02:00:42.360 |
is one of those things that I think is really underutilized. 02:00:46.880 |
One of the things that I think is really interesting 02:00:49.200 |
about Swift as a platform for machine learning 02:00:54.600 |
so with Python, you end up in this weird situation 02:00:59.880 |
and then you have an application that you eventually 02:01:07.440 |
And we can start erasing some of those boundaries, 02:01:09.480 |
because Swift can be put in a mobile app, believe it or not, 02:01:21.760 |
So I think the answer is it'll be a great fit. 02:01:27.880 |
But basically, with things like probabilistic programming 02:01:40.640 |
because Soumith Chintala, who created PyTorch, 02:01:44.800 |
He said, if you want to do this kind of work at the moment, 02:01:47.800 |
you might want to look at Julia, which is another great option 02:01:57.920 |
of lots of things calling lots of other things. 02:02:00.080 |
And so you need-- and they're often kind of small. 02:02:02.800 |
So you need those things to happen super quickly. 02:02:12.840 |
when you add all the threading stuff on top as well. 02:02:20.680 |
think you could start getting into straight away. 02:02:23.880 |
One of the nice things about that is you can do a lot on the CPU. 02:02:27.400 |
A lot of these things don't even make sense on the GPU. 02:02:33.560 |
And for that, actually, we'll add it to the forum post. 02:02:40.480 |
about how to access a variety of random number distributions 02:02:44.080 |
from Swift, C++, random number distributions. 02:02:48.240 |
So you could actually get started with this right away. 02:02:54.360 |
has a big, mature framework called TensorFlow Probability. 02:02:57.680 |
And so I personally don't know very much about it. 02:02:59.680 |
But I expect that all the atoms you need are all there. 02:03:03.480 |
And we just need somebody who knows the space really well 02:03:05.800 |
to build a Swift library that can expose all the primitives 02:03:11.040 |
How could you deploy Swift models on Android? 02:03:19.520 |
Well, so I think there's two options that you have there. 02:03:21.760 |
So one is, Swift, again, builds on the entire TensorFlow 02:03:34.860 |
And so the whole mobile deployment situation there 02:03:39.040 |
I feel like that's kind of the model we're trying to get away 02:03:41.480 |
from a little bit, though, do you feel that way? 02:03:42.840 |
So the other option is, Swift actually runs fine on Android. 02:03:49.960 |
The tooling for Swift on Android isn't really awesome, as far as I know. 02:04:02.720 |
And so Swift fits into the native stuff is my understanding. 02:04:06.200 |
But I know that people are building and shipping 02:04:10.120 |
And so that's a totally reasonable thing to do. 02:04:13.940 |
The other thing to mention in terms of platforms 02:04:17.200 |
is that Swift on Windows is starting to come together 02:04:21.040 |
So I don't know where it'll be by the time you're watching this, 02:04:28.240 |
but Swift is expanding to worlds outside the iOS world pretty rapidly. 02:04:35.540 |
People are writing what are essentially Windows MFC apps in Swift, 02:04:47.040 |
What we'll finish with now is a little taste of where we're heading next week. 02:04:55.800 |
It's something Rachel shows in her computational linear algebra course. 02:04:58.960 |
And it comes from a really amazing programming language 02:05:10.880 |
in terms of completely rethinking how we program 02:05:15.660 |
And I want to show it to you because it gives you 02:05:29.720 |
the goal here is to be able to get this performance. 02:05:31.800 |
Because remember, the C speed, triply nested for loop, 02:05:43.600 |
but with you being able to write it yourself in Swift? 02:05:43.600 |
And so this video actually comes from the Halide project, which 02:05:55.400 |
is a programming language that has kind of invented 02:06:01.880 |
And so I'm going to use it to describe the problems 02:06:14.120 |
And so the algorithm we're going to write here 02:06:18.520 |
is one where they're doing a simple blur, a 3 by 3 blur. 02:06:31.200 |
In what order, for example, do I compute the values 02:06:35.560 |
And one way is just go through each x one at a time, 02:06:39.800 |
and then within that, go through each y one at a time. 02:06:51.720 |
Now these are going to have very different characteristics, 02:06:51.720 |
because we're jumping further through memory. 02:06:56.020 |
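The traversal-order point can be sketched in plain C++ (a hypothetical example, not code from the lesson). Both functions compute the same result over a row-major image, but the second one strides `width` elements through memory on every step of its inner loop, so it uses the cache far less effectively:

```cpp
#include <vector>

// Row-major traversal: the inner loop walks contiguous memory.
double sum_row_major(const std::vector<double>& img, int w, int h) {
    double s = 0;
    for (int y = 0; y < h; ++y)        // outer loop over rows
        for (int x = 0; x < w; ++x)    // inner loop over adjacent elements
            s += img[y * w + x];
    return s;
}

// Column-major traversal of the same row-major buffer: the inner loop
// jumps w elements at a time, which is cache-unfriendly.
double sum_column_major(const std::vector<double>& img, int w, int h) {
    double s = 0;
    for (int x = 0; x < w; ++x)        // outer loop over columns
        for (int y = 0; y < h; ++y)    // inner loop strides by w
            s += img[y * w + x];
    return s;
}
```

Same answer, very different memory-access pattern: that's exactly the "which order do I compute the values in?" question.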
But what we could do is we could do something called vectorization. 02:07:07.880 |
is we actually take four or sometimes even eight numbers 02:07:11.160 |
at a time, and throw them all at the CPU or GPU at once, 02:07:16.960 |
And so we have these things called vector units 02:07:23.100 |
happening at the same time, because you have multiple cores. 02:07:26.240 |
But in fact, in the GPU, this is what happens all the time. 02:07:29.560 |
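Here's a minimal sketch of the vectorization idea (hypothetical code, not from the lesson): instead of one element per loop iteration, handle four at a time. A real vector unit does those four adds in a single instruction, and compilers will often generate exactly that from a loop shaped like this:

```cpp
#include <cstddef>
#include <vector>

// Add two vectors four lanes at a time, with a scalar tail loop for
// whatever elements are left over. This mirrors what SIMD hardware
// does with 4- or 8-wide vector registers.
void add_vectors(const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& out) {
    std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {       // main loop: 4 "lanes" per step
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i)                 // tail: remaining elements
        out[i] = a[i] + b[i];
}
```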
Or in order to think about better memory access, 02:07:33.040 |
we could do a little block at a time, like this. 02:07:40.200 |
These are all ways to change the performance characteristics of my algorithm. 02:07:44.880 |
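The "little block at a time" option is tiling, and the loop structure looks like this (an illustrative sketch; the "work" here is just a copy, but the four-deep loop nest is the point):

```cpp
#include <vector>

// Tiled traversal: visit the image one small tile at a time, so each
// tile's data stays in cache while we work on it. The two outer loops
// walk over tiles; the two inner loops walk within a tile, with bounds
// checks so a tile hanging off the edge is handled correctly.
void copy_tiled(const std::vector<double>& src, std::vector<double>& dst,
                int w, int h, int tile) {
    for (int ty = 0; ty < h; ty += tile)
        for (int tx = 0; tx < w; tx += tile)
            for (int y = ty; y < ty + tile && y < h; ++y)
                for (int x = tx; x < tx + tile && x < w; ++x)
                    dst[y * w + x] = src[y * w + x];
}
```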
Another question is, when should I compute my inputs? 02:07:50.000 |
And see how it's going through three at a time? 02:07:53.160 |
Because I'm trying to calculate three at a time. 02:07:57.600 |
Now I have to go through all of those three at a time. 02:08:04.400 |
And it's not able to really use the cache well. 02:08:07.240 |
Instead, I could do a whole set of nine at a time. 02:08:11.560 |
And that would then allow me to create a whole blurred output 02:08:17.720 |
Or I could go through it like this, exactly like before, 02:08:25.440 |
I don't need to recompute, because it's already there. 02:08:29.360 |
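The "compute the whole intermediate first" schedule can be sketched like this (an illustrative example, not Halide's or the lesson's code; edges are clamped here for simplicity): blur horizontally into a full intermediate buffer, then blur that vertically, so nothing is ever recomputed, at the cost of materializing the intermediate:

```cpp
#include <vector>

// Two-stage 3x3 blur with a stored intermediate. Stage 1 writes the
// entire horizontally blurred image bx; stage 2 blurs bx vertically.
// Because bx is fully materialized, each input pixel's horizontal blur
// is computed exactly once.
std::vector<double> blur3x3(const std::vector<double>& in, int w, int h) {
    auto at = [&](const std::vector<double>& img, int x, int y) {
        x = x < 0 ? 0 : (x >= w ? w - 1 : x);   // clamp to the border
        y = y < 0 ? 0 : (y >= h ? h - 1 : y);
        return img[y * w + x];
    };
    std::vector<double> bx(w * h), out(w * h);
    for (int y = 0; y < h; ++y)                 // stage 1: horizontal blur
        for (int x = 0; x < w; ++x)
            bx[y * w + x] =
                (at(in, x - 1, y) + at(in, x, y) + at(in, x + 1, y)) / 3.0;
    for (int y = 0; y < h; ++y)                 // stage 2: vertical blur of bx
        for (int x = 0; x < w; ++x)
            out[y * w + x] =
                (at(bx, x, y - 1) + at(bx, x, y) + at(bx, x, y + 1)) / 3.0;
    return out;
}
```

The fused alternative, recomputing the horizontal blur inline for each output pixel, trades that extra buffer for redundant work; which wins depends on cache sizes, which is exactly why the schedule should be separate from the algorithm.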
OK, just to add a clarification: the left panel is the input, and the right panel is the output. 02:08:29.360 |
So we can do vectorized input, and then vectorized 02:08:43.500 |
on the intermediate values, and then calculate those 02:08:45.760 |
to create our vectorized output with parallel processing. 02:08:54.720 |
And you can see they're going to have very different performance 02:08:59.480 |
So at Halide, they have this neat idea, which is, hey, 02:09:06.800 |
let's not write nested, nested, nested for loops, and tiles, 02:09:13.260 |
Let's instead describe for each value x, y in my blurred output, 02:09:19.200 |
here is how it's calculated in this declarative way. 02:09:24.560 |
This is literally the definition of a blur algorithm. 02:09:30.840 |
And then after you've done that, you can then say to Halide, 02:09:36.080 |
what are different schedules for calculating that? 02:09:40.040 |
So what's the kind of order in which things are done? 02:09:42.520 |
And for these different schedules that are written here, 02:09:45.960 |
they have all the different behaviors you just saw. 02:09:51.120 |
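For reference, Halide's published blur example looks roughly like the following pseudocode (reproduced from memory, so treat the details as approximate). The first block is the algorithm, a declarative statement of what each output value is; the second block is one possible schedule, saying how to compute it:

```
// Algorithm: what to compute (horizontal then vertical 3-tap blur).
Func blur_x, blur_y;
Var x, y, xi, yi;
blur_x(x, y) = (input(x-1, y) + input(x, y) + input(x+1, y)) / 3;
blur_y(x, y) = (blur_x(x, y-1) + blur_x(x, y) + blur_x(x, y+1)) / 3;

// Schedule: how to compute it -- tile, vectorize, parallelize,
// and compute blur_x on demand within each tile of blur_y.
blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
blur_x.compute_at(blur_y, x).vectorize(x, 8);
```

Swapping in a different schedule changes the loop orders, tiling, and recomputation you just saw, without touching the algorithm at all.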
When expert CUDA programmers and expert CPU programmers 02:10:02.160 |
they're using the world's best knowledge available 02:10:05.640 |
across all of those things to create special versions 02:10:08.160 |
for every architecture, for lots of different matrix sizes, 02:10:13.880 |
tensors of different numbers of dimensions, and so much assembly code. 02:10:13.880 |
So how are we going to be able to write the stuff that's 02:10:28.520 |
in our head, but have it run reasonably quickly? 02:10:34.440 |
And so what we're moving towards with stuff like MLIR 02:10:40.440 |
is the ability to kind of have domain-specific languages 02:10:44.280 |
where you could write, here's the tiling domain-specific 02:10:54.680 |
And so that's the hope of where we're going to be going here 02:10:58.360 |
is that Chris's team is going to be putting these kinds of tools 02:11:06.600 |
Well, and so one of the bad things about Halide-- so 02:11:09.080 |
in this space, we have TensorFlow and XLA today. 02:11:14.080 |
XLA is an important part of TensorFlow right now. 02:11:16.280 |
It's just not really wired into the Swift part of TensorFlow 02:11:23.160 |
don't have to hand-tune it, like that whole writing out 02:11:26.760 |
XLA does a good job of using the hardware as it is today. 02:11:30.440 |
The thing we're going further with MLIR is to say, 02:11:32.680 |
well, instead of you having to put all this blood, sweat, 02:11:35.480 |
and tears in to tune it, and know the hardware, 02:11:37.840 |
and do all this stuff, we can do other things like search. 02:11:43.720 |
There are tools available now which will use genetic algorithms to search for good schedules. 02:11:50.760 |
So you're starting to see the ideas that come out 02:11:53.600 |
of the database query optimizer world coming into the CUDA 02:11:58.320 |
And this is going to be so great for us data scientists. 02:12:01.240 |
Search can be implemented in lots of different ways-- 02:12:06.760 |
There's lots of cool things that can be done here. 02:12:10.640 |
is crack open the compiler and make the internal algorithms 02:12:16.480 |
So this is why you have a compiler guy and a DL guy teaching this together. 02:12:27.680 |
Because we're not going to get this kind of great outcome 02:12:36.840 |
that they have in TensorFlow and XLA and so forth. 02:12:39.760 |
So next week, come back, and we will dig even deeper.