Lesson 14 (2019) - Swift: C interop; Protocols; Putting it all together
Chapters
0:00 Intro
0:25 Overview
0:47 Shoutouts
1:21 Package cache
2:52 Image processing kernels
3:26 XLA
4:13 Fusion nodes
4:41 The big question
7:49 MLIR
9:36 The Problem
11:44 Tensor Comprehensions
14:35 Summary
14:51 The future
16:24 Audio processing
23:47 C
27:10 C header files
29:58 Inline functions
30:36 C compiler
32:25 Example
39:06 OpenCV
40:16 SwiftCV
44:48 Dynamically linked or statically linked
45:24 How much C do you need
47:03 Why I'm teaching this
53:36 OpenCV Data Blocks
57:55 Layer
00:00:00.000 |
Welcome to the final lesson of this section of 2019. 00:00:08.060 |
Although I guess it depends on the videos, what order we end up doing the extra ones. 00:00:12.500 |
This is the final one that we're recording live here in San Francisco. 00:00:27.600 |
I won't read through it all, but basically we're going to be filling in the gap between 00:00:31.760 |
matrix multiplication and training ImageNet with all the tricks. 00:00:38.080 |
And along the way, we're going to be seeing a bunch of interesting Swift features and 00:00:42.740 |
actually seeing how they make our code cleaner, safer, faster. 00:00:47.240 |
I want to do a special shout out to a couple of our San Francisco study group members who 00:00:52.640 |
have been particularly helpful over the last couple of weeks, since I know nothing about Swift. 00:00:58.640 |
It's been nice to have some folks who do, such as Alexis, who has been responsible actually 00:01:04.960 |
for quite a lot of the most exciting material you're going to see today. 00:01:13.040 |
So if you need glasses, you should definitely go to Topology Eyewear and get algorithmically designed glasses. 00:01:20.680 |
And thanks also to Pedro, who has almost single-handedly created this fantastic package cache that 00:01:28.520 |
we have so that in your Jupyter Notebooks, you can import all the other modules that 00:01:35.120 |
we're using and other exported modules from the Notebooks, and it doesn't have to recompile everything every time. 00:01:41.720 |
And I actually am a customer of his as well, or at least I was when I used an iPhone. 00:01:46.320 |
He's the developer of Camera Plus, which is the most popular camera application on the iPhone. 00:01:52.480 |
And back when I used an iPhone, I loved that program. 00:01:54.880 |
So I'm sure version two is even better, but I haven't tried version two. 00:01:58.160 |
So you can use his camera while looking through your Topology Eyewear glasses. 00:02:07.280 |
And where we left off last week was that I made a grand claim -- well, I pointed out 00:02:14.760 |
I pointed out through this fantastic Halide video that actually running low-level kind 00:02:21.800 |
of CUDA-kernel-y stuff fast is actually much harder than just running a bunch of for loops 00:02:27.920 |
And I showed you some stuff based on Halide, which showed some of the different ways you can write 00:02:33.680 |
it, and some of the ways you can make it run quickly. 00:02:36.520 |
And then I made the bold claim that being able to do this on the GPU through Swift is going to be possible. 00:02:43.540 |
And so to find out how that's going to happen, let's hear it directly from Chris. 00:02:53.460 |
So we went through this video, and the author of Halide gave a great talk about how in image 00:02:58.680 |
processing kernels, there's actually a lot of different ways to get the computer to run 00:03:02.840 |
this, and they all have very different performance characteristics, and it's really hard to take 00:03:06.860 |
even a two-dimensional blur and make it go fast. 00:03:12.120 |
We're not talking about two-dimensional images, we're talking about 5D matrices and tensors 00:03:16.360 |
and lots of different operations that are composed together, and hundreds or thousands 00:03:20.120 |
of ops, and trying to make that all go fast is really, really, really, really hard. 00:03:26.680 |
So if you wanted to do that, what you'd do is you'd write a whole new compiler to do 00:03:29.640 |
this, and it would take years and years of time. 00:03:32.120 |
But fortunately, there's a great team at Google called the XLA team that has done all this work. 00:03:37.440 |
And so what XLA is, is it's exactly one of those things. 00:03:41.000 |
It's something that takes in this graph of tensor operations, so things like convolutions and matrix multiplications. 00:03:49.480 |
It does low-level optimizations to allocate buffers, to take these different kernels and 00:03:55.420 |
fuse them together, and then it generates really high-performance code that runs on 00:03:59.280 |
things like CPUs, GPUs, or TPUs, which are crazy-fast high-performance accelerators that Google builds. 00:04:07.440 |
And so XLA does all this stuff for us now, which is really exciting. 00:04:12.600 |
And if you take the running batch norm example that we left off with, and we were talking 00:04:16.600 |
about, this is the graph that XLA will generate for you. 00:04:19.760 |
And this is generated from Swift code, actually. 00:04:23.680 |
And so you can see here what these darker boxes are, is they're fusion nodes, where 00:04:27.680 |
it's taken a whole bunch of different operations, pushed them together, gotten rid of memory 00:04:35.600 |
And the cool thing about this is, this is all existing shipping technology that TensorFlow uses today. 00:04:42.760 |
There's a big question, though, and a big gotcha, which is, this only works if you have a graph. 00:04:47.680 |
And with TensorFlow 1, that was pretty straightforward, because TensorFlow 1 was all about graphs. 00:04:51.400 |
Jeremy talked about the shipping, shipping, shipping, ship, ship, ship, shipping thingy, 00:04:57.300 |
ship, ship, shipping, ship, ship, ship, I don't know. 00:05:03.560 |
And so with TensorFlow 1, it was really natural. 00:05:06.040 |
With TensorFlow 2, with PyTorch, there's a bigger problem, which is, with eager mode, there is no graph. 00:05:10.720 |
That's the whole point: you go one step at a time, you run one op at a time, 00:05:14.460 |
and so you don't get the notion of these things. 00:05:17.760 |
So what the entire world has figured out is that there's two basic approaches to getting a graph. 00:05:24.440 |
There's tracing, and there's different theories on tracing. 00:05:27.560 |
There's staging and taking code and turning it into a graph algorithmically. 00:05:32.760 |
And PyTorch and TensorFlow both have similar but different approaches to both of these 00:05:38.400 |
The problem with these things is they all have really weird side effects, and they're 00:05:42.880 |
And so if Swift for TensorFlow is an airplane, we've taken off, and we're just coming off 00:05:50.480 |
the runway, but we're still building all this stuff into Swift for TensorFlow as the plane is flying. 00:05:58.680 |
The team was working on the demo, and it just didn't come together today. 00:06:05.580 |
And so one of the problems with tracing, for example, is that in PyTorch or in TensorFlow 00:06:10.920 |
Python, when you trace, if you have control flow in your model, it will unroll the entire loop. 00:06:16.920 |
And so if you have an RNN, for example, it will unroll the entire RNN and make one gigantic graph. 00:06:22.920 |
And some control flow you want to ignore, some control flow you want to keep in the graph. 00:06:25.920 |
And so having more control over that is something that we think is really important. 00:06:29.440 |
So, Chris, about this being nearly there: it's the end of April now. 00:06:33.980 |
This video will be out somewhere around mid to late June. 00:06:38.800 |
I suspect it will be up and running by then, and if it's not, you will personally go to 00:06:42.200 |
the house of the person watching the video and fix it for them. 00:06:48.080 |
In two, three months, so that's July, look on the TensorFlow main page. 00:06:58.920 |
So we'll see how the future -- >> And there should be a notebook in the 00:07:04.000 |
harebrain repo that will be called batch norm or something. 00:07:08.680 |
And we'll have an XLA version of this running. 00:07:12.400 |
>> And so Swift also has this thing called graph program extraction. 00:07:14.680 |
The basic idea here is where autograph and torch script are doing these things where 00:07:18.440 |
they're kind of like Python but kind of not, and Jeremy was talking before about how you 00:07:23.040 |
had a comment in the wrong place and torch script will fall over and it's not -- it kind 00:07:26.600 |
of looks like Python but really, really is not. 00:07:29.680 |
With Swift, we have a compiled, reasonable language, and so we could just use compiler 00:07:34.160 |
techniques to form a graph, pull it out for you. 00:07:36.600 |
And so a lot of things that are very magic and very weird are just very natural and plug 00:07:41.760 |
So I'm very excited about where all this comes. 00:07:48.160 |
So one last thing that doesn't exist, because Jeremy wanted to talk about this, he's very 00:07:53.000 |
excited, is there's this question about how MLIR relates to XLA, what is all this 00:07:57.600 |
stuff going on, and what does this mean for TensorFlow? 00:08:00.720 |
And the way I look at this is XLA is really good if you have -- if you want high performance 00:08:04.840 |
with these common operators like matrix multiplication, convolution, things like that. 00:08:09.400 |
These operators can be combined in lots of different ways. 00:08:11.480 |
And so these are the primitives that a lot of deep learning is built out of. 00:08:16.880 |
And XLA is really awesome for high performance, particularly weird accelerators. 00:08:21.360 |
But there's a catch with this, because one of the things that power deep learning is 00:08:26.320 |
the ability to innovate in many of these ways. 00:08:28.360 |
And so depth-wise convolutions came out, and suddenly with many fewer parameters, you can 00:08:32.800 |
get really good accuracy wins, and you couldn't do that if you just had convolution. 00:08:37.160 |
Yeah. And like on the other hand, like depth-wise convolutions are a specific case of grouped convolutions. 00:08:44.880 |
And the reason we haven't been talking about grouped convolutions in class is that so far no one has got them running quickly. 00:08:51.520 |
And so there's this whole thing that like somebody wrote a paper about three years ago, 00:08:56.080 |
which basically says, hey, here's a way to get all the benefit of convolutions, but with much, much less computation. 00:09:01.360 |
And we're still -- you know, the practical deep learning for coders course still doesn't 00:09:05.080 |
teach them, because they're still not practical, because no one's got them running quickly 00:09:09.040 |
And so we've been talking about this whole course. 00:09:11.040 |
The goal with this whole platform is to make it an infinitely hackable platform. 00:09:15.280 |
And so it's not really infinitely hackable if you bottom out at convolution, or if you have to give up all your performance to go around it. 00:09:20.880 |
And so what MLIR is about is there's multiple different aspects of the project, but I think 00:09:25.800 |
one Jeremy's most excited about is, what about custom ops, right? 00:09:30.120 |
How can we make it so you don't bottom out at matmul and convolution, and so you get that 00:09:33.400 |
hackability to invent the next great convolution, right? 00:09:36.400 |
So the cool thing about this is that this is a solved problem. 00:09:39.720 |
The problem is all the solutions are in these weird systems that 00:09:43.600 |
don't talk to each other, and they don't work well together, and they're solving different problems. 00:09:47.520 |
So Halide, for example, is a really awesome system if you're looking for 2D image processing. 00:09:54.920 |
Other people have built systems on top of Halide to try to adapt it, and things like that. 00:10:02.000 |
There's other solutions, so PlaidML was recently acquired by Intel, and they have a lot of 00:10:06.120 |
really cool compiler technology that is kind of in their little space. 00:10:09.400 |
TVM is a really exciting project, also building on Halide, pulling it together with its own 00:10:17.000 |
It's also in each of these cases they've built some kind of domain-specific language to make 00:10:20.360 |
it easier for you, the data scientist, to write what you want in a quick and easy way. 00:10:26.600 |
And so -- and often what happens here is that each of these plugs into the deep learning frameworks differently. 00:10:32.800 |
And so what you end up having to do is you end up in a mode of saying, TVM's really good at these kinds of things. 00:10:38.240 |
And Tensor Comprehensions, which is another cool research project, is good at these kinds of things too. 00:10:43.880 |
And so I have to pick and choose the framework I want to use based on which one they happen to plug into. 00:10:49.800 |
>> And again, we don't teach this in practical deep learning for coders because it's not practical yet. 00:10:53.600 |
You know, these things are generally research-quality code. 00:10:55.880 |
They generally don't integrate with things like PyTorch. 00:10:58.860 |
They generally require lots of complex build steps. 00:11:04.400 |
They work really great on the algorithm and the paper, but they kind of fall apart on 00:11:09.520 |
So our goal and our vision here with TensorFlow, but with Swift for TensorFlow also, is to make 00:11:15.120 |
it so that you can express things at the highest level of abstraction you can. 00:11:18.120 |
So if you have a batch norm layer, totally go for that batch norm layer. 00:11:21.640 |
If that's what you want, use it, and you're good. 00:11:23.680 |
If you want to implement your own running batch norm, you can do that in terms of matmuls and other operations. 00:11:30.240 |
If you want to sink down further, you can go down to one of these systems. 00:11:34.200 |
If you want to go down further, you can write assembly code for your accelerator if that's what you want to do. 00:11:38.800 |
But you should be able to get all the way down and pick the level of abstraction that's right for you. 00:11:44.320 |
And so I just want to give Tensor Comprehensions as one random example of how cool this can be. 00:11:49.720 |
So this is taken straight out of their paper. 00:11:53.240 |
But Tensor Comprehensions gives you what is basically Einstein notation on steroids. 00:12:00.400 |
It's like einsum, but taken to a crazy extreme level. 00:12:07.160 |
And what Tensor comprehensions is doing is you write this very simple, this very simple 00:12:12.960 |
It's admittedly kind of weird, and it has magic, and the syntax isn't the important thing. 00:12:18.320 |
But you write pretty simple code, and then it does all this really hardcore compiler 00:12:23.600 |
So it starts out with your code, it then fuses the different loops, because these two things can be fused together. 00:12:28.480 |
It does inference on what the ranges are for all the loops and what the variables are. 00:12:36.840 |
Fuse, tile, then sink the code to make it so the inner loops can be vectorized. 00:12:43.040 |
This is actually a particularly interesting example, because this thing here, GEMM, is a generalized matrix multiplication. 00:12:50.240 |
This is actually the thing on which large amounts of deep learning and linear algebra are built. 00:12:55.800 |
So a lot of the stuff we write ends up calling a GEMM. 00:12:59.000 |
And the fact that you can write this thing in two lines of code is striking: if you look inside most 00:13:04.200 |
linear algebra libraries, there will be hundreds or thousands of lines of code to implement 00:13:09.800 |
So the fact that you can do this so concisely is super cool. 00:13:13.200 |
And so the idea that then we could do nice little tweaks on convolutions or whatever 00:13:20.200 |
in similar amounts of code is something that I get very excited about. 00:13:25.720 |
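To make that concrete, here is a minimal sketch (not from the lesson notebooks) of what a GEMM computes, written as naive Swift loops; everything described above -- fusing, tiling, sinking, vectorizing -- is about transforming exactly this kind of loop nest into something fast.

```swift
// A minimal sketch of what a GEMM computes: C[i][j] = sum over p of A[i][p] * B[p][j].
// Real GEMM implementations layer tiling, vectorization and threading on top of this.
func naiveGEMM(_ a: [[Float]], _ b: [[Float]]) -> [[Float]] {
    let m = a.count, k = a[0].count, n = b[0].count
    var c = [[Float]](repeating: [Float](repeating: 0, count: n), count: m)
    for i in 0..<m {
        for j in 0..<n {
            for p in 0..<k {
                c[i][j] += a[i][p] * b[p][j]
            }
        }
    }
    return c
}

// Tiny usage check: a 2x2 multiply.
print(naiveGEMM([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   // [[19.0, 22.0], [43.0, 50.0]]
```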
And the other thing to consider with this is that, again, generating really good code 00:13:30.360 |
But once you make it so that you separate out the code that gets compiled from the 00:13:34.980 |
algorithms that get applied to it, now you can do search over those algorithms. 00:13:39.040 |
Now you can apply machine learning to the compiler itself. 00:13:42.040 |
And now you can do some really cool things that open up new doors. 00:13:46.080 |
So I mean, that's actually really interesting because in the world of databases, which is 00:13:50.800 |
a lot more mature than the world of deep learning, this is how it works, right? 00:13:54.160 |
You have a DSL, normally called SQL, where you express what you want, not how to get it. 00:14:00.280 |
And then there's a thing called a query analyzer or query compiler or query optimizer that figures out how to do it. 00:14:06.600 |
And it'll do crazy stuff like genetic algorithms and all kinds of heuristics. 00:14:10.880 |
And so like what we're seeing here is we'll be able to do that for deep learning, with our 00:14:18.120 |
own DSLs and our own optimizers, not deep learning optimizers, but more like database query optimizers. 00:14:26.760 |
The MLIR part of this is on a longer time horizon. 00:14:29.080 |
This is not going to be done by the time this video comes out. 00:14:31.840 |
But this is all stuff that's getting built, and it's all open source, and it's super exciting. 00:14:35.880 |
So to overall summarize all this TensorFlow infrastructure stuff, so TensorFlow is deeply 00:14:40.200 |
investing in the fundamental parts of the system. 00:14:42.180 |
This includes the compiler stuff, also the runtime, op dispatch, the kernels themselves. 00:14:47.040 |
There's tons and tons and tons of stuff, and it's all super exciting. 00:14:55.960 |
Yeah, this is very exciting, Chris, that sometime in the next year or two, there'll be these 00:15:00.600 |
But I actually know about some really fast languages right now. 00:15:09.880 |
It's actually languages that we can make run really fast right now. 00:15:15.040 |
And it's quite amazing, actually, how easy we can make this. 00:15:22.800 |
Like when you say to an average data scientist, hey, you can now integrate C libraries, their 00:15:29.040 |
response is not likely to be oh, awesome, right? 00:15:31.960 |
Because data scientists don't generally work at the level of C libraries. 00:15:36.680 |
But data scientists work in some domain, right? 00:15:40.760 |
You work in neuroradiology, image acquisition, or you work in astrophysics or whatever. 00:15:47.880 |
And in your domain, there will be many C libraries that do the exact thing that you want to do 00:15:57.320 |
And currently, you can only access the ones that have been wrapped in Python, and you 00:16:02.360 |
can only access the bits that have been wrapped in Python. 00:16:05.640 |
What if you could actually access the entire world of software that's been written in C, 00:16:11.520 |
which is what most software has been written in, and it's easy enough that, you know, an average data scientist can do it? 00:16:25.160 |
Let's say we want to do audio processing, okay? 00:16:29.440 |
And so for audio processing, I'm thinking like, oh, how do I start doing audio processing 00:16:35.520 |
and in my quick look around, I couldn't see much in Swift that works on Linux for audio 00:16:41.960 |
>> So you write an MP3 decoder from scratch, right? 00:16:43.840 |
>> Yeah, I thought about doing an MP3 decoder from scratch, but then I -- 00:16:48.160 |
>> I figured, like, people have MP3 decoders already. 00:16:52.240 |
And I looked it up on the Internet, and it turns out there's lots of C libraries that 00:16:55.880 |
And one popular one, apparently, is called SOX, right? 00:16:59.600 |
And I'm a data scientist, I'm not an audio processing person, so this is my process last 00:17:03.400 |
week was, like, C, library, MP3, decode, and it says use SOX. 00:17:10.800 |
I've got something here that says import SOX. 00:17:15.160 |
And then it says init SOX, and then it says read SOX audio. 00:17:29.840 |
This is what C library home pages tend to look like, they tend to be very 90s. 00:17:35.380 |
And basically, I looked at the documentation, and C library documentation tends to be less 00:17:40.680 |
than obvious to kind of see what's going on, but, you know, you just kind of have to learn 00:17:48.240 |
to read it, just like you learn to read Python documentation. 00:17:50.520 |
So basically it says you have to use this header, and then these are the various functions you can call. 00:17:55.340 |
There's something called edit, and there's something called open. 00:17:59.280 |
I jumped into VIM, and I created a directory, and I called it Swift SOX. 00:18:09.560 |
And in that directory, I created a few things. 00:18:12.680 |
I created a file called package.swift, and this is the thing that defines a Swift package. 00:18:21.680 |
A Swift package is something that you can import, and you can actually type swift package 00:18:26.740 |
init, and it will kind of create this skeleton for you. 00:18:29.760 |
Personally, my approach to wrapping a new C library is to always copy an existing C library 00:18:35.740 |
folder that I've created, and then just change the name, because every one of them has the same structure. 00:18:41.440 |
So this is file number one, you have to give it a name, and then you have to say what's 00:18:44.680 |
the name of the library in C. And in SOX, the name of the library is SOX. 00:18:49.800 |
Part two is you have to create a file called Sources/sox/module.modulemap, and it contains 00:19:00.040 |
always these exact lines of code, again, where you just change the word SOX and the name of the library to link. 00:19:08.360 |
>> So what this is doing is this is saying that you want to call it SOX in Swift. 00:19:15.680 |
>> Well, I actually call it SOXU.H, which we'll see in a moment. 00:19:18.960 |
And then, but the library isn't -- it gets linked in by libsox. 00:19:23.720 |
>> So all these things in C can be different. 00:19:26.840 |
So, you know, most of the time, we can make them look the same. 00:19:30.560 |
And so then the final third file that you have to create is the .H file. 00:19:35.160 |
And so you put that in Sources/sox; I call it the umbrella header, a .h file, and that contains 00:19:42.160 |
one line of code, which includes the header file which, as you saw from the documentation, you need. 00:19:48.480 |
So once you add these three files, you can then just say import SOX, okay? 00:19:58.120 |
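For reference, here is a rough sketch of what those three files can look like for a system library; the layout and names follow the pattern Jeremy describes rather than his exact repo, so treat the specifics as assumptions.

```swift
// swift-tools-version:5.0
// Package.swift -- a minimal sketch for wrapping a system C library (libsox assumed installed).
import PackageDescription

let package = Package(
    name: "SwiftSox",
    targets: [
        // A systemLibrary target compiles nothing; it just points Swift at
        // headers and a library that already exist on the machine.
        .systemLibrary(name: "sox"),
    ]
)

// Sources/sox/module.modulemap (not Swift; shown here as a comment for reference):
//     module sox {
//         header "sox_umbrella.h"
//         link "sox"
//         export *
//     }
//
// Sources/sox/sox_umbrella.h -- the one-line umbrella header:
//     #include <sox.h>
```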
And now this thing, this C function is available to Swift, right? 00:20:08.600 |
Because suddenly, like a lot of what this impractical deep learning for coders course 00:20:12.080 |
is about is like opening doors that weren't available to us as data scientists before 00:20:17.480 |
and thinking what happens if you go through that door, right? 00:20:20.120 |
So what happens if you go through the door where suddenly all of the world's C libraries are available to you? 00:20:25.520 |
What can you do in your domain that nobody was doing before? 00:20:29.180 |
Because there wasn't any Python libraries like that. 00:20:32.660 |
So I -- and so what I tend to do is write little Swift functions that wrap the C functions 00:20:40.760 |
So here's init SOX, which checks for the value I'm told the docs say to check for. 00:20:46.120 |
And SOX open read, for some reason you have to pass nil, nil, nil, so I just wrap that. 00:20:58.880 |
And so that's going to return some kind of structure. 00:21:03.640 |
And so you have to read the documentation to find out what it is, or copy and paste somebody else's code. 00:21:08.720 |
Very often the thing that's returned to you is going to be a C pointer. 00:21:16.720 |
You just say .pointee to grab the thing that it's pointing at. 00:21:20.000 |
And according to the documentation, there's going to be something called signal, which 00:21:24.520 |
is going to contain things like sample rate, precision, channels, and length. 00:21:36.240 |
So I can run that, and I can see I've opened an audio file with a C library without any special wrapper. 00:21:42.960 |
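Here is a rough sketch of those little wrapper functions; the libsox names (sox_init, sox_open_read, the signal fields) are recalled from its documentation, so treat the exact signatures as assumptions rather than gospel.

```swift
// A sketch of thin Swift wrappers over libsox, assuming the `sox` module from
// the package above imports cleanly.
import sox

func initSox() {
    // The docs say sox_init() must be called once; it returns 0 (SOX_SUCCESS) on success.
    precondition(sox_init() == 0, "sox_init failed")
}

func openRead(_ path: String) -> UnsafeMutablePointer<sox_format_t> {
    // The extra signal/encoding/filetype arguments can all be nil for the common
    // case, so hide them behind a friendlier signature.
    guard let fmt = sox_open_read(path, nil, nil, nil) else {
        fatalError("could not open \(path)")
    }
    return fmt
}

initSox()
let fmt = openRead("test.mp3")          // hypothetical file name
let sig = fmt.pointee.signal            // .pointee dereferences the C pointer
print(sig.rate, sig.channels, sig.precision, sig.length)
```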
>> One of the things you can do is you can type SOX tab, and now here's all the stuff that's in that library. 00:22:00.960 |
And this is kind of somewhat similar to Python. 00:22:04.600 |
In Python you can open C libraries in theory and work with them. 00:22:09.240 |
But I don't do it, almost never do it, because I find that when I try to -- the thing you 00:22:14.240 |
get back are these C structures and pointers, which I can't work with in Python in a convenient way. 00:22:21.520 |
Or if I do use things like PyBind11, which is something that helps with that, then I 00:22:25.320 |
have to create all these make scripts and compile processes, and I just don't bother, 00:22:34.080 |
And then the nice thing is we can bring Python and C and Swift together by typing import Python. 00:22:42.240 |
Now we can just say -- we can take our C array and say make NumPy array and plot it, right? 00:22:51.360 |
So we're really bringing it all together now. 00:22:54.200 |
And we can even use the ipython.display, and we can hear some audio. 00:23:19.360 |
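As a small illustration of that mixing (a sketch, assuming numpy and matplotlib are installed in the Python environment the kernel sees):

```swift
// Swift arrays -> NumPy -> matplotlib, all from Swift via the Python interop layer.
import Python

let np  = Python.import("numpy")
let plt = Python.import("matplotlib.pyplot")

// A stand-in for the decoded audio samples coming back from the C library.
let samples: [Float] = [0, 0.5, 1, 0.5, 0, -0.5, -1, -0.5]

let arr = np.array(samples)   // Swift array bridges straight into a NumPy array
plt.plot(arr)
plt.show()
```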
We've got Swift libraries, C libraries, Python libraries. 00:23:25.000 |
We can do stuff that our peers aren't doing yet. 00:23:28.720 |
But what I want to know, Chris, is how the hell is this even possible? 00:23:34.720 |
You're a guy who likes to look under the covers or under the hood. 00:23:45.400 |
So it should be no problem to do this, right? 00:23:52.080 |
I think you were just talking about why it's actually very useful. 00:23:54.680 |
There's tons of code available in C. A lot of that C is really useful. 00:23:58.240 |
But C is actually a terrible, crazy, gross language on its own right. 00:24:03.400 |
C has all these horrible things in it, like pointers, that are horribly unsafe. 00:24:12.240 |
>> Is it possible to achieve similar results in Python using something like Cython? 00:24:18.240 |
So Cython is a Python-like language which compiles to C. 00:24:27.040 |
And I would generally rather write Cython than C for integrating C with Python. 00:24:37.400 |
You still kind of -- it's actually easier in a Jupyter notebook because you can just 00:24:41.200 |
say %%cython and kind of integrate it. 00:24:43.440 |
But as soon as you want to start shipping a module with that thing, which presumably 00:24:48.000 |
is the purpose is you want to share it, you then have to deal with, like, build scripts and all of that. 00:24:53.680 |
So Cython has done an amazing job of kind of making it as easy as possible. 00:25:00.320 |
But I personally have tried to do quite a lot with Cython in the last few months and 00:25:04.680 |
ended up swearing off it because it's just still not convenient enough. 00:25:09.680 |
I can't quite use a normal debugger or a normal profiler and just ship the code directly. 00:25:16.240 |
And it's still -- yeah, it's great for Python if that's what you're working with. 00:25:23.480 |
I've created Swift C libraries; I created a Swift C library within a week of starting to use Swift. 00:25:31.200 |
And so the thing I want to underscore here is that C is actually really complicated. 00:25:38.160 |
It's got bit fields and unions and all kinds of other weird constructs. 00:25:42.680 |
It's got all this crazy stuff that the grammar is context sensitive and gross. 00:25:48.680 |
And so it's just actually really hard to deal with. 00:25:51.280 |
Does that sound like somebody who's been through the process of writing a C compiler and came out the other side? 00:25:55.920 |
>> Well, so the only thing worse than C is C++. 00:26:03.400 |
It's both more gross and huge and it's also more important in some ways. 00:26:07.920 |
And so Swift doesn't integrate with C++ today, but we want to be able to. 00:26:11.680 |
We want to be able to provide the same level of integration that you just saw with C for C++ as well. 00:26:17.760 |
Well, Swift loves C APIs like Jeremy was just saying. 00:26:21.360 |
And so we love C APIs because we want you to be able to directly access all this cool stuff that's out there. 00:26:27.560 |
And so the way it works as you just saw is we take the C ideas and remap them into Swift. 00:26:32.560 |
And so because of that, because they're native pure Swift things, that's where you get the 00:26:39.480 |
That's where you get all the things you expect to work in Swift talking to dusty deck old 00:26:44.400 |
grody C code from the 80s or wherever you got it from, whatever epoch. 00:26:50.680 |
And so we also don't want to have wrappers or overhead because that's totally not what 00:26:57.320 |
So Jeremy showed you that usually when you import the C API into Swift, it looks like a C API. 00:27:03.260 |
And so you could use it directly, but the nice thing about that is that you can build the APIs you want 00:27:06.960 |
to wrap it and you can build your abstractions and make that all good in Swift. 00:27:11.240 |
So one of the ways this happens is that inside the Swift compiler, it can actually read C header files. 00:27:18.280 |
And so we don't have a great way to plug this into workbooks quite yet, but Swift can actually 00:27:24.000 |
take a C header file like math.h, which has macros. 00:27:27.360 |
Here's M_E, because M_E is apparently a good way to name e. 00:27:34.500 |
Here's the sincos function, which of course returns sine and cosine through output pointers. 00:27:41.320 |
And so when you import all that stuff into Swift, you get M_E as a double that you can just use. 00:27:49.160 |
You have square root and you can totally call square root. 00:27:51.400 |
You get this unsafe mutable pointer double thing, which we'll talk about later. 00:27:55.400 |
Similarly, like malloc, free, realloc, all this stuff exists. 00:27:59.400 |
And so just to show you how crazy this is, let's see if we can do the side by side thing. 00:28:21.200 |
So what we have here is we have the original header file, math.h on the left. 00:28:24.880 |
If you look at this, you'll see lots of horrible things in C that everybody forgets about because 00:28:30.000 |
you never write C like this, but this is what C looks like when you're talking about libraries. 00:28:51.600 |
We've got structures like exception, of course, that's an exception, right? 00:28:57.720 |
So when you import this into Swift, this is what the Swift compiler sees. 00:29:00.640 |
You see something that looks very similar, but this is all Swift syntax. 00:29:06.160 |
So you see you get the header, the comments, you get all the same functions. 00:29:09.800 |
You get all of the -- like here's your M_E. 00:29:25.360 |
So if you want to get this to work, what you can do is you can build into the Swift compiler. 00:29:29.680 |
We can write a C parser, right, and we can implement a C preprocessor, and we can implement 00:29:34.560 |
all the weird rules in C. Someday we can extend it and write C++ as well, and we can build 00:29:40.440 |
this library so the Swift compiler knows how to parse C code, and a C++ compiler is pretty 00:29:45.800 |
easy to write, so we can hack that on a weekend. 00:29:48.640 |
Or, yay, good news, like we've already done this many years ago, it's called Clang. 00:29:58.520 |
Oh, this is getting even -- just to talk even more about how horrible C is, you also have inline functions. 00:30:08.440 |
The insane thing about inline functions is that they don't exist anywhere in the compiled library. 00:30:18.200 |
And so if you want to be able to call this function from C, you actually have to code 00:30:21.440 |
gen, you have to be able to parse that, code gen, understand what unions are now, understand 00:30:26.560 |
all of this crazy stuff just so you can get the sign bit out of a float. 00:30:30.680 |
C also has things like function pointers and macros and tons of other stuff, it's just 00:30:35.560 |
And so the right way to do this is to build a C compiler. 00:30:41.880 |
And so what ends up happening is that when Jeremy says import socks, Swift goes and says, 00:30:56.680 |
Go parse all the things the header file pulls in. 00:31:01.280 |
And go pull the entire universe of C together into that module. 00:31:05.840 |
And then build what's called syntax trees to represent all the C stuff. 00:31:10.760 |
Well, now we've got a very perfect, pristine C view of the world, the exact same way a C compiler sees it. 00:31:17.640 |
And so what we can do then is we can build this integration between Clang and Swift where 00:31:22.920 |
when you say give me malloc or give me sox_init, Swift says, whoa, what is that? 00:31:30.000 |
And Clang says, oh, yeah, I know what sox_init is. 00:31:32.960 |
It takes all these pointers and blah, blah, blah, blah. 00:31:36.280 |
I will remap your pointers into my unsafe pointer. 00:31:40.600 |
I will remap your int into my int32 because the languages are a little bit different. 00:31:47.680 |
And then when you call that inline function, Swift doesn't want to know how unions work. 00:31:53.040 |
So what it does is it says, hey, Clang, you know how to do all this stuff. 00:31:55.960 |
You know how to code generate all these things. 00:31:58.320 |
And they both talk to the LLVM compiler that we were talking about last time. 00:32:11.680 |
And so these two things plug together really well. 00:32:18.240 |
If you want to geek out about this, there's a whole talk that's like a half hour, an hour 00:32:22.200 |
long talking about how all this stuff works at a lower level. 00:32:31.560 |
So one of the reasons I'm really interested in this description, Chris, is that it's kind 00:32:38.400 |
of all about one of the reasons I really wanted to work with you, apart from the fact that 00:32:43.760 |
you're very amusing and entertaining, is that -- 00:32:49.000 |
Is that this idea of what you did with Clang and Swift is like the kind of stuff that we're 00:32:55.880 |
going to be seeing with how differentiation is getting added to Swift. 00:33:01.760 |
And like this idea of being able to pull on this entire compiler infrastructure, as 00:33:06.800 |
you'll see, is actually going to allow us to do some similarly exciting and surprisingly powerful things. 00:33:15.360 |
And I'll say this is all simple now, but actually getting these two massive systems to talk to each other was a huge amount of work. 00:33:21.540 |
And it was -- getting Python integrated was comparatively easy, because Python is super dynamic. 00:33:30.080 |
And one thing I'll say about C libraries is each time I come across a C library, many 00:33:35.160 |
of them have used these weird edge case things Chris described in surprising ways. 00:33:40.000 |
And so I just wanted to give a couple of pointers as to how you can deal with these things. 00:33:45.540 |
So when I started looking at kind of how do I create my own version of tf.data, I need 00:33:51.580 |
to be able to read JPEG files, I need to be able to do image processing, and I got interested in vips. 00:33:58.520 |
And vips is a really interesting C library for image processing. 00:34:04.200 |
And so I started looking at the -- at bringing in the C library. 00:34:08.820 |
And so I started in exactly the way you've seen already. 00:34:15.240 |
So you'll find just like we have a Swift socks in the repo, there's also a Swift vips. 00:34:25.120 |
And we'll start -- we'll start seeing some pretty familiar things. 00:34:30.480 |
There's the same package.swift that you've seen before, but now it's got some extra lines 00:34:37.480 |
There's the sources, vips, module map with the exact three lines of code that you've 00:34:44.880 |
There's the sources, vips, some header, I call it a different name in this case, which 00:34:51.080 |
has the one line of code which is connecting to the header. 00:34:54.960 |
After you've done that, you can now import vips. Done. 00:34:59.880 |
But it turns out that the vips documentation says that they actually added the ability to pass optional named arguments. 00:35:12.040 |
So it turns out that you can do that in C, even though it's not officially part of C, 00:35:16.720 |
by using something called varargs, which is basically a C mechanism where you can say the number of 00:35:22.160 |
arguments that go here is kind of not defined ahead of time, and you can use something I've 00:35:26.000 |
never heard of before called a sentinel, and basically you end up with stuff which looks 00:35:35.760 |
You end up with stuff which looks like this, where you basically say I want to do a resize 00:35:42.000 |
and it has some arguments that are specified, like horizontal scale, and by default it keeps the aspect ratio. 00:35:48.440 |
But if you want to also change the aspect ratio and have a vertical scale, you literally 00:35:52.480 |
write the string vscale, and that says, oh, the next argument is the vertical scale. 00:35:57.680 |
And if you want to use some different interpolation kernel, you pass the word kernel, you say 00:36:04.160 |
Now this is tricky because for all the magic that Swift does do, it doesn't currently know how to call C varargs functions. 00:36:12.400 |
It's just an edge case of the C world that Swift hasn't handled yet. 00:36:18.800 |
>> Jeremy has this amazing ability to find the breaking point of anything. 00:36:25.440 |
The trick is to provide a header file where the things that Swift needs to call look like plain C functions. 00:36:36.160 |
So in this case, you can see I've actually written my own C library, right? 00:36:40.640 |
And so I added a C library by literally just putting a folder into Sources. 00:36:49.360 |
And in there, I just dumped a C header file, right? 00:36:55.180 |
As soon as I do that, I can now add that C library, not precompiled, but actual C code 00:37:03.160 |
I've just written to my package.swift, and I can use that from Swift as well, right? 00:37:09.200 |
And so that means that I can wrap the VIPs weird varargs resize version with a non-varargs 00:37:16.120 |
resize version where you always pass in vertical scale, for instance. 00:37:20.360 |
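Here is a sketch of how a hand-written C shim like that can sit next to the Swift code in one package; the target and file names are made up for illustration.

```swift
// swift-tools-version:5.0
// Package.swift -- sketch: a system library target for libvips, a tiny hand-written
// C target for the fixed-arity wrappers, and a Swift target that uses both.
import PackageDescription

let package = Package(
    name: "SwiftVips",
    targets: [
        .systemLibrary(name: "vips"),                       // module map + umbrella header, as before
        .target(name: "vipsShims", dependencies: ["vips"]), // SwiftPM compiles any .c files in Sources/vipsShims
        .target(name: "SwiftVips", dependencies: ["vips", "vipsShims"]),
    ]
)

// Sources/vipsShims/include/vips_shims.h could then declare a fixed-arity wrapper, e.g.
//     int resize_fixed(VipsImage *in, VipsImage **out, double hscale, double vscale);
// whose .c implementation calls the varargs vips_resize(in, out, hscale, "vscale", vscale, NULL).
```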
And so now, I can just go ahead and say, VIPs load image. 00:37:26.660 |
And then I can say VIPs get, and then I can pass that to Swift for TensorFlow in order to turn it into a tensor. 00:37:37.680 |
Now, there's a really interesting thing here, which is when you're working with C, you have to think about memory management. 00:37:46.720 |
So Swift has this fantastic reference counting system, which nearly always handles memory for you. 00:37:54.560 |
Every C library handles memory management differently. 00:37:58.300 |
So we're about to talk about OpenCV, which actually has its own reference counting system, 00:38:03.560 |
But most of the time, the library will tell you, hey, this thing is going to allocate 00:38:07.440 |
some memory, you have to free it later, right? 00:38:12.480 |
The VIPs get function says, hey, this memory, you're going to have to free it later. 00:38:17.240 |
To free memory in C, you use the free function, because we can use C functions from Swift. 00:38:28.240 |
And I need to make sure that we call it when we're all done. 00:38:31.680 |
And there's a super cool thing in Swift called defer. 00:38:35.100 |
And defer says, run this piece of code before you finish doing whatever we're doing, which 00:38:41.080 |
in this case would be before we exit from this function. 00:38:44.160 |
>> Yeah, so if you throw an exception, if you return early, anything else, it will make sure it runs. 00:38:50.720 |
In this case, I probably didn't need defer, because there isn't exceptions being thrown 00:38:54.200 |
or lots of different return places, but that's my habit, is that if I need to clean up memory, 00:38:59.320 |
I just chuck it in a defer block, or at least that's one of the two methods that I use. 00:39:07.960 |
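A minimal sketch of that pattern, using malloc/free directly as stand-ins for whatever allocation the C library hands you:

```swift
import Foundation   // pulls in malloc/free on both Linux and macOS

func withScratchBuffer(bytes: Int, _ body: (UnsafeMutableRawPointer) -> Void) {
    guard let buf = malloc(bytes) else { fatalError("malloc failed") }
    // Registered now, runs whenever we leave this function: early return or
    // normal fall-through, the free() still happens exactly once.
    defer { free(buf) }
    body(buf)
}

withScratchBuffer(bytes: 1024) { ptr in
    ptr.storeBytes(of: UInt8(42), as: UInt8.self)
}
```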
So because I like finding the edges of things and then doing it anyway, the next thing I 00:39:14.000 |
looked at, and this gives you a good sense of how much I hate tf.data, is I was trying 00:39:20.160 |
to do anything I could to avoid tf.data, and so I thought, all right, let's try OpenCV. 00:39:26.360 |
And for those of you that have been around FastAI for a while, you'll remember OpenCV 00:39:35.520 |
And I loved it because it was insanely, insanely fast. 00:39:41.400 |
It's fast, reliable, high-quality code that covers a massive amount of computer vision. 00:39:47.240 |
It's kind of like -- it's what everybody uses if they can. 00:39:50.760 |
And much to my sadness, we had to throw it out, because it just -- it hates Python multiprocessing 00:39:58.960 |
It just kept creating weird race conditions and crashes and stalls, like literally the 00:40:04.680 |
same code on the same operating system on two different AWS servers that are meant to 00:40:09.000 |
be the same spec, would give different results. 00:40:12.520 |
So I was kind of hopeful maybe it'll work in Swift, so we gave it a go. 00:40:16.880 |
And unfortunately, since I last looked at it, they threw away their C API entirely, and 00:40:22.200 |
they're now C++ only, and Chris just told you we can't use C++ from Swift. 00:40:30.160 |
You can disguise it so Swift doesn't know that it's C++. 00:40:34.720 |
And so the disguise needs to be a C header file that only contains C stuff, right? 00:40:43.440 |
But what's in the C++ file behind the header file -- 00:40:49.960 |
>> Clang knows those calling conventions, and Swift can call them. 00:40:59.080 |
So here's Swift CV, and so Swift CV has a very familiar-looking package.swift that contains 00:41:06.720 |
the stuff that we're used to, and it contains a very familiar-looking OpenCV4 module map. 00:41:13.320 |
Now, OpenCV actually has more than one library, so we just have to list all the libraries. 00:41:19.240 |
It has a very familiar-looking -- actually, we don't even have a -- oh, sorry, that's 00:41:25.040 |
So we didn't use the header file here, because we're actually going to do it all from our own C code. 00:41:33.100 |
So I created a C OpenCV, and inside here, you'll find C++ code. 00:41:45.160 |
And we actually largely stole this from the Go OpenCV wrapper, because Go also doesn't 00:41:51.560 |
know how to talk to C++, but it does know how to talk to C, so that was a convenient 00:41:57.280 |
But you can see that, for example, we can't call new, because new is C++, but we can create 00:42:02.400 |
a function called matNew that calls it, and then we can create a header that has matNew in it. 00:42:18.320 |
This is actually a plain C struct -- pointer to a struct, and so I can call that. 00:42:25.480 |
And so even generics, C++ generics, we can handle this way. 00:42:28.880 |
So OpenCV actually has a full-on multidimensional generic array, like NumPy, with, like, matrix 00:42:38.280 |
multiplication, all the stuff in it, and the way its generic stuff works is that you can 00:42:42.960 |
ask for a particular pixel and say what data type it is, using C++ generics. 00:42:48.580 |
So we just create lots and lots of different versions, all the different generic versions that we need. 00:42:59.480 |
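The shape of that shim, sketched from the Swift side; these names are illustrative, not SwiftCV's actual API.

```swift
// The C header exposed to Swift declares only C-compatible things, for example:
//     typedef void* CMat;
//     CMat Mat_New(void);
//     void Mat_Close(CMat m);
// and the .cpp file behind it does the C++ work (new cv::Mat, delete, and so on).
// From Swift those declarations import as ordinary functions on an opaque pointer:

import COpenCV              // assumed name of the shim module

func makeAndReleaseMat() {
    let m = Mat_New()       // really a cv::Mat hiding behind a void*
    defer { Mat_Close(m) }  // released through another shim function
    // ... use m via other shim calls ...
}
```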
So once we've done all that, we can then say import SwiftCV and start using OpenCV stuff. 00:43:11.440 |
Well, now that we can use that, we can read an image, we can have a look at its size, 00:43:18.880 |
we can get the underlying C pointer, and we can start doing -- yeah, and we can start 00:43:26.600 |
doing timing things and kind of see, is it actually looking like it's going to be hopeful or not. 00:43:37.240 |
I was very scared when I started seeing in Swift all these unsafe, mutable pointers and things like that. 00:43:47.160 |
>> But this is C, right, and so C is inherently unsafe. 00:43:52.240 |
>> Swift's theory on that is that it does not prevent you from using it. 00:43:57.080 |
It just makes it so you know that you're in that world. 00:44:00.120 |
>> But there's actually this great table I saw from Ray Wenderlich, from the raywenderlich.com website. 00:44:11.240 |
And basically what he pointed out is that the names of all of these crazy pointer types actually have a consistent structure. 00:44:18.520 |
They all start with unsafe, they all end with pointer, and in the middle there's this little 00:44:22.680 |
mini-language which is can you change them or not, are they typed or not, do we know 00:44:30.380 |
the count of the number of things in there or not, and what type of thing do they point 00:44:35.720 |
So once you kind of realize all of these names have that structure, suddenly things start to make sense. 00:44:45.320 |
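A few examples of how those names decode in practice (a sketch; the comments spell out which part of the name carries which promise):

```swift
var x: Float = 1.0

// unsafe + pointer + <Float>: typed, read-only.
withUnsafePointer(to: &x) { (p: UnsafePointer<Float>) in
    print(p.pointee)
}

// unsafe + mutable + pointer + <Float>: typed and writable.
withUnsafeMutablePointer(to: &x) { (p: UnsafeMutablePointer<Float>) in
    p.pointee = 2.0
}

// unsafe + mutable + raw + pointer: just bytes, you track the layout yourself.
let raw = UnsafeMutableRawPointer.allocate(byteCount: 16, alignment: 4)
raw.storeBytes(of: Float(3.0), as: Float.self)
raw.deallocate()

// unsafe + mutable + buffer + pointer + <Float>: typed AND knows its count.
let buf = UnsafeMutableBufferPointer<Float>.allocate(capacity: 4)
buf[0] = 4.0
print(buf.count)
buf.deallocate()
```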
>> All right, let's go with the two questions. 00:44:48.560 |
>> One is, are the C libraries dynamically linked or statically linked or compiled from source? 00:44:56.160 |
By default, if you import them, they are statically linked. 00:44:59.520 |
And so they'll link in with the normal linker flags, and if the library is a .a file, then 00:45:05.160 |
it will get statically linked directly into your Swift code. 00:45:07.560 |
If it's a .so file, then you'll dynamically link it, but it's still linked to your executable. 00:45:16.600 |
And so if you want to, you can dynamically load C libraries. 00:45:25.280 |
>> Another question is, how much C do you have to know to do all these C-related imports? 00:45:33.880 |
So I don't really know any C at all, so I kind of randomly press buttons until things 00:45:39.000 |
start working or copy and paste other people's C code. 00:45:42.280 |
>> The Internet and Stack Overflow have a lot of helpful stuff. 00:45:46.120 |
You need to know there's a thing called a header file, and that contains a list of the 00:45:49.240 |
functions that you can call, and you need to know that you type #include, angle brackets, and the header name. 00:45:55.160 |
But you can just copy and paste the Swift socks library that I've already shown you, 00:46:02.280 |
And so really, you don't need to know any C. You just need to replace the word "socks" 00:46:05.920 |
with the name of the library you're using, and then you need to know -- you need to kind 00:46:10.920 |
of work through the documentation that's in C, and that's the bit where it gets, like, 00:46:18.000 |
you know -- I find the tab completion stuff is the best way to handle that, is like hit 00:46:22.440 |
tab, and you say let x equal, and then you call some function, and then you say x. and 00:46:29.040 |
you see what's inside it, and things kind of start working. 00:46:31.960 |
>> And for all the hard time you give SOX for, you know, not being a web design firm, it has 00:46:39.000 |
a pretty well-structured API, and so if you have a well-structured API like this, then it's pretty easy. 00:46:44.640 |
If you have something somebody hacked together, they didn't think about it, then it's probably 00:46:49.760 |
going to be weird, and you may have to understand their API, and it may require you to understand 00:46:53.200 |
a lot of C. But those are the APIs that you probably won't end up using, because if they 00:46:57.320 |
haven't given a lot of love to their API, people aren't using it, usually. 00:47:03.400 |
>> My impression is that almost all of the examples of the future power of Swift seem 00:47:07.280 |
to rely not on the abstraction to higher levels, but on the diving into lower-level details. 00:47:13.320 |
As a data scientist, I try to avoid doing this; I only go low if I know there's a big payoff. 00:47:19.880 |
>> So let me give my perspective as a data scientist, and maybe then we can hear yours. 00:47:24.240 |
>> Well, and I was just going to inject, we're starting at the bottom, so we'll be getting higher up as we go. 00:47:29.960 |
But there's a reason that I'm wanting to teach this stuff, which is that I actually think 00:47:38.400 |
as data scientists, this is our opportunity to be far more awesome. 00:47:44.720 |
It's like being able to access things. Something I've noticed for the last 25 years is everybody 00:47:51.640 |
I know in this field (I mean, it didn't used to be called data science, we used to call it industrial 00:47:56.000 |
mathematics or whatever) operated within the world that was accessible to them, right? 00:48:03.360 |
So at the moment, for example, there's a huge world of something called sparse convolutions 00:48:07.560 |
that are, I know, amazing; I've seen competition-winning solutions, they get state-of-the-art results. 00:48:15.680 |
There's two people in the world doing it, because it all requires custom CUDA kernels. 00:48:22.680 |
For years, for decades, almost nobody was doing differentiable programming, because we had to write all the gradients by hand. 00:48:28.720 |
So like, it's not just about, oh, I want an extra, it's absolutely not about, I want an 00:48:34.440 |
extra 5% of performance, it's about being able to do whatever's in your head. 00:48:41.320 |
I used to be a management consultant, I'm afraid to say, and I didn't know how to program, 00:48:46.360 |
and I knew Excel, and the day that I learned Visual Basic was like, oh, now I'm not limited 00:48:53.360 |
to the things I can do in a spreadsheet, I can program. 00:48:57.680 |
And then when I learned Delphi, it was like, oh, now I'm not limited to the things that 00:49:01.080 |
I can program in a spreadsheet; I can do the things that are in my head. 00:49:09.840 |
>> Hey, and some people are feeling overwhelmed with Swift, C, C++, Python, PyTorch, TensorFlow, 00:49:18.440 |
Swift for TensorFlow, do we need to become experts on all these different languages? 00:49:24.760 |
>> No, we don't, but can I show why this is super interesting? 00:49:30.000 |
Because this is like -- so let me show you why I started going down this path, right? 00:49:43.360 |
And I found that it took me 33 seconds to iterate through ImageNet. 00:49:54.760 |
And I know that in Python, we have a notebook which Sylvain created to compare, called timing, 00:50:12.200 |
And this is not an insignificant difference, waiting more than three times as long just to loop through the data. 00:50:19.840 |
So I thought, well, I bet OpenCV can do it fast. 00:50:39.040 |
So this is the entirety of my test program, right, which is something that downloads ImageNet 00:50:45.680 |
and reads and resizes images, and does it with four threads. 00:50:52.440 |
And so if you go Swift run -- sorry, Swift run -- okay, so when I run this, check this out. 00:51:14.120 |
And half a day's work, I have something that can give me an image processing pipeline that's dramatically faster. 00:51:22.240 |
And so it's not just like, oh, we can now do things a bit faster, but it's now like 00:51:27.320 |
any time I get stuck that I can't do something, it's not in the library I want, or it's so 00:51:32.400 |
slow as to be unusable, you know, this whole world's open. 00:51:36.000 |
So I'd say you don't really need to touch this stuff until you get to a point where you have no 00:51:42.320 |
choice but to, and at that point you're just really glad it's there. 00:51:45.600 |
>> Well, and to me, I think it's also -- the capability is important even if you don't use it yourself. 00:51:51.840 |
So keep in mind, this is all code that's in a workbook. 00:51:54.600 |
So you can get code in the workbook from anywhere, and now you can share that workbook, and you 00:51:59.200 |
don't have to share this like tangled web of dependencies that have to go with the workbook. 00:52:03.920 |
And so the fact that you can do this in Swift doesn't mean that you yourself have to write 00:52:06.960 |
the code, but it means you can build on code that other people wrote. 00:52:09.880 |
And if you haven't seen Swift at all, if this is your first exposure to it, this is definitely not where I'd start. 00:52:15.360 |
Like the data APIs that we're about to look at would be a much more reasonable place to start. 00:52:20.080 |
You've had a month or two months' worth of hacking-with-Swift time, and that's a Jeremy 00:52:24.960 |
month, so that's like a year for normal people. 00:52:27.920 |
So this being super powerful and the ability to do this is, I think, really great. 00:52:35.760 |
>> Yeah, and I am totally not a C programmer at all, and it's -- honestly, it's been more 00:52:39.600 |
like two weeks, because before that I was actually teaching a Python course, believe it or not. 00:52:47.040 |
So, I mean, so this is all -- it's all there, and I would definitely recommend ignoring 00:52:51.840 |
all of this stuff, and we're about to start zooming up the levels of the stack. 00:52:56.520 |
But the fact that it's there, I think, is reassuring, because one of the challenges 00:52:59.320 |
that we have with Python is that you get this ceiling, and if you get up to the ceiling, 00:53:04.040 |
then there's no going further without this crazy amount of complexity, and whether that 00:53:08.440 |
be concurrency, or whether that be C APIs, or whether that be other things, that prevents 00:53:12.880 |
the next steps and the next levels of innovation and the industry moving forward. 00:53:16.440 |
>> And this is meant to be giving you enough to go on with until the course in a year's time. 00:53:22.200 |
So like it's -- hopefully this is something where you can pick and choose which bits 00:53:26.360 |
you want to dig into, and whichever bit you pick to dig into, we're showing you all the 00:53:30.920 |
depth that you can dig into over the next 12 months. 00:53:36.720 |
So I was really excited to discover that we can use OpenCV, which is something I've wanted 00:53:44.740 |
ever since we had to throw it away from fast AI, and so I thought, you know, what would 00:53:48.920 |
it take to create a data blocks API with OpenCV? 00:53:53.960 |
And thanks to Alexis Gallagher, who kind of gave us the great starting point to say, well, 00:53:58.440 |
here is what a Swifty-style data blocks API would look like, and we were able to flesh it out into a full data blocks API. 00:54:07.240 |
And when Chris described Alexis to me as the world leader on value types, I was like, wait, 00:54:14.400 |
I thought, okay, I guess we can listen to Alexis's code for this. 00:54:20.000 |
>> I will say I'm terrified about presenting those slides, because Alexis is sitting right 00:54:24.280 |
there, and if you start scowling, then -- oh, no. 00:54:28.960 |
>> We have a handheld mic, come and correct us any time. 00:54:31.840 |
So there's a notebook here called data block generic, where you'll find that what we've 00:54:37.080 |
actually done is we've got the entire data blocks API in this really interesting Swifty style, 00:54:45.880 |
and what you'll see is that when we compare it to the Python version, this holds up well on every front. 00:54:56.960 |
So let's talk about some of the problems with the data block API in Python. 00:55:00.000 |
I love the data block API, but lots of you have rightly complained that we have to run 00:55:06.480 |
all the steps in a particular order; if we get the wrong order, we get inscrutable errors. 00:55:13.520 |
We have to make sure that we have certain steps in there; if we miss a step, we get weird errors too. 00:55:21.680 |
It's difficult to deal with at that level, and then the code inside the data blocks API, 00:55:28.240 |
I hate changing it now, because I find it too confusing to remember like why it all 00:55:34.400 |
But check out this Swift code that does the same thing, right? 00:55:39.620 |
This is just the same get files we've seen before. 00:55:43.520 |
All we need to do is we say, you know what, if you're going to create some new data bunch, 00:55:50.180 |
you need some way to get the data, you need some way -- and let's assume that they're 00:55:54.680 |
just paths for now -- that is, some way to turn all of those paths into items, so something like get files. 00:56:01.600 |
You need some way to split that between training and validation. 00:56:09.440 |
So for example, for ImageNet, download calls that. 00:56:16.480 |
And there are things that convert paths to images; the thing which grabs the list of paths is that get files. 00:56:27.400 |
And then the thing that decides whether they're training or validation is a check on the path. 00:56:36.420 |
And the thing that creates the label is the parent directory name. 00:56:41.160 |
And so, like, we can basically just define this one neat little package of information 00:56:49.200 |
And Swift will actually tell us if we forgot something. 00:56:54.960 |
Or if one of the things that we provided, like, is training, is meant to return true or false, 00:57:01.080 |
and we, like, accidentally return the word train instead, it'll complain and let us know. 00:57:11.020 |
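A rough, hypothetical sketch of the shape Jeremy is describing (not the actual notebook code; every name here is made up): a conforming type supplies each step, and the compiler checks that nothing is missing and every piece has the type it's supposed to.

```swift
import Foundation

protocol DataBlueprint {
    associatedtype Item
    associatedtype Label
    func download() -> URL                 // where the data comes from
    func getItems(_ path: URL) -> [Item]   // e.g. collect the file paths
    func isTraining(_ item: Item) -> Bool  // train/validation split
    func labelOf(_ item: Item) -> Label    // how to label one item
}

struct ImageNetteBlueprint: DataBlueprint {
    func download() -> URL { URL(fileURLWithPath: "/tmp/imagenette") }  // placeholder path

    func getItems(_ path: URL) -> [URL] {
        // Walk the directory tree and keep the jpegs (a stand-in for get_files).
        guard let walker = FileManager.default.enumerator(at: path, includingPropertiesForKeys: nil) else { return [] }
        let all = walker.compactMap { $0 as? URL }
        return all.filter { ["jpeg", "jpg"].contains($0.pathExtension.lowercased()) }
    }

    // Return the wrong type from either of these and the compiler rejects the conformance.
    func isTraining(_ item: URL) -> Bool { item.pathComponents.contains("train") }
    func labelOf(_ item: URL) -> String { item.deletingLastPathComponent().lastPathComponent }
}
```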
But to understand what's going on here, we need to learn a bit more about how Swift works 00:57:19.520 |
So this is something that is actually useful if you are doing deep learning stuff. 00:57:28.600 |
So let's talk about what protocols are in Swift. 00:57:37.680 |
Right now we want to talk about what protocols are. 00:57:39.480 |
And if you've worked in other languages, you may be familiar with things like interfaces 00:57:43.240 |
in Java, abstract classes that are often used; other languages have things like traits or type classes. 00:57:51.560 |
And so all these things are related to protocols in Swift. 00:57:55.040 |
And what protocols do is they're all about splitting the interface of a type from the implementation. 00:58:03.880 |
And it says that to use a layer, or rather to define a layer, you have this call method. 00:58:09.520 |
So layers are callable, just like in PyTorch. 00:58:12.160 |
And so then you can define a dense layer and say how to call a dense layer. 00:58:14.760 |
You can define a conv2D and show how to implement a conv2D layer. 00:58:22.160 |
And so there's a contract here between what a type is supposed to have and how it's implemented. 00:58:27.620 |
And then the implementations, these are different. 00:58:33.200 |
It's like in PyTorch, you kind of have to know that there's something called forward. 00:58:37.400 |
And if you misspell it, or forget to put it there, or put __call__ instead of forward, things just silently don't work. 00:58:45.920 |
Whereas with this approach, you get tab completion from the signature. 00:58:52.200 |
And Swift will tell you if you get it wrong; it'll say this is what the function should look like. 00:58:58.440 |
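A simplified, self-contained sketch of that protocol-as-contract idea; this is not the real Swift for TensorFlow Layer protocol (which also requires Differentiable and uses generic Input/Output types), just the shape of it, with plain arrays standing in for tensors.

```swift
protocol SimpleLayer {
    func call(_ input: [Float]) -> [Float]
}

struct Scale: SimpleLayer {
    var factor: Float
    func call(_ input: [Float]) -> [Float] { input.map { $0 * factor } }
}

struct Bias: SimpleLayer {
    var offset: Float
    func call(_ input: [Float]) -> [Float] { input.map { $0 + offset } }
}

// Misspell `call` or change its signature in either struct and the compiler
// rejects the conformance, rather than silently doing nothing at run time.
let model: [SimpleLayer] = [Scale(factor: 2), Bias(offset: 1)]
let out = model.reduce([1, 2, 3]) { x, layer in layer.call(x) }
print(out)   // [3.0, 5.0, 7.0]
```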
So what this is really doing is this is defining behaviors for groups of types. 00:59:04.760 |
Layer is like the commonality between a whole group of types that behave like a layer. 00:59:12.280 |
Well, it's a list of what are called requirements. 00:59:14.360 |
And so these are all the methods that the type has to have. 00:59:21.480 |
And so one of the things that Swift has in its library is the notion of equatable. 00:59:27.000 |
An equatable is any type that has this equals equals operator, right? 00:59:31.560 |
And then it says what is equatability and all that kind of stuff. 00:59:34.600 |
Now the cool thing about this is that you can build up towers of types. 00:59:39.720 |
And so you can say, well, equatable gets refined by an additive arithmetic. 00:59:44.080 |
And an additive arithmetic is something that supports addition and subtraction. 00:59:47.840 |
Then if you also have multiplication, you can be numeric. 00:59:50.120 |
And if you also have negation, then it can be signed. 00:59:52.940 |
And then you can have integers, and you can have floating point. 00:59:55.920 |
And now you can have all these different things that exist in the ecosystem of your program, 01:00:02.200 |
And so this is all very -- these things, these ways to reason about these groups of types 01:00:10.480 |
give you the ability to get these abstractions that we all want. 01:00:15.920 |
These all exist, and you can go see them in the standard library. 01:00:20.240 |
Well, the cool thing about this is that now you can define behavior that applies to all 01:00:27.320 |
And so what we're doing here is we're saying not equal. 01:00:31.480 |
Well, all things that are equatable, and this T colon Equatable says I work with any type that's equatable. 01:00:39.480 |
What this is doing is this is defining a function, not equal, on any type that's equatable, and 01:00:43.960 |
it takes two of these things and returns a bool. 01:00:47.040 |
And we can implement not equal by just calling equals equals, which all equatable things have. 01:00:52.920 |
So to be clear, what Chris just did here was he wrote one function that is now going to 01:00:59.120 |
add behavior to every single thing that defines equals automatically, which is pretty magic. 01:01:05.640 |
Just like everywhere, boom, one place, super, super abstract. 01:01:11.320 |
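Written out, that one generic function is roughly this (the standard library already ships it as !=; notEqual is just an illustrative name so it doesn't collide):

```swift
// One generic function over every Equatable type.
func notEqual<T: Equatable>(_ lhs: T, _ rhs: T) -> Bool {
    return !(lhs == rhs)
}

print(notEqual(1, 2))       // true
print(notEqual("a", "a"))   // false
```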
But this also works for lots of other things. 01:01:15.800 |
Well, it needs any type that is signed and numeric and that's comparable. 01:01:27.200 |
But now everything that is a number that can be compared is now absible. 01:01:36.040 |
And so with dictionaries, what you want is for the keys in a dictionary to all be hashable. 01:01:41.160 |
This is how the dictionary does its efficient lookups and things like that. 01:01:45.880 |
And so all these things kind of stack together and fit together like building blocks. 01:01:49.960 |
One of the really cool things about this now is that we can start taking this further. 01:01:54.240 |
So we talked about not equal building on top of equal equal. 01:01:57.440 |
In the last lesson, we defined this is odd function. 01:02:00.840 |
Well, because this protocol exists, we can actually add it as a method to all things 01:02:07.920 |
And so we can say, hey, put this on all binary integers and give them all an is odd method. 01:02:13.040 |
And now I don't have to put is odd on int and int 16 and int 32 and the C weird things. 01:02:18.640 |
You can just do it in one place and now everybody gets this method. 01:02:26.240 |
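That one extension might look something like this (a minimal sketch):

```swift
// One extension, and every integer type picks up the method.
extension BinaryInteger {
    var isOdd: Bool { self % 2 != 0 }
}

print(3.isOdd)          // true
print(Int16(4).isOdd)   // false
print(UInt8(7).isOdd)   // true
```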
Here we can say, hey, I want an inferring(from:) method that does some learning phase switching. 01:02:31.560 |
But now because I put this on all layers, well, I can use it on my model, because your model is a layer. 01:02:38.840 |
And so this thing allows this one simple idea of defining groups of types and then broadcasting 01:02:44.640 |
behavior and functionality onto all of them at once is really powerful. 01:02:47.760 |
Yeah, I mean, it's like Python's monkey patching, which we use all the time. 01:02:52.600 |
But A, it's not this kind of hacky thing with weird undefined behavior sometime. 01:02:57.640 |
And B, we don't have to monkey patch lots and lots of different places to add functionality 01:03:03.360 |
We just put it in one place and the functionality gets sucked in by everything that can use it. 01:03:09.560 |
And remember, extensions are really cool because they work even if you didn't define the type. 01:03:12.760 |
So what you can literally do is you can pull in some C library, not that we'd love C, but 01:03:18.600 |
some C library and add things to its structs. 01:03:21.600 |
I mean, it will have things automatically added to those structs, because they already support the right methods. 01:03:30.200 |
So all this stuff composes together in a really nice way, which allows you to do very powerful 01:03:35.840 |
So mix-ins show up and you can control where they go. 01:03:43.400 |
And so he defined his own protocol, Countable, and he says, things are countable if they have a count. 01:03:50.000 |
And the only thing I care about for countable things is I can read it. 01:04:05.080 |
And then Jeremy says, hey, well, take any sequence. 01:04:07.440 |
Let's add a method or a property called total count to anything that's a sequence. 01:04:12.440 |
So a sequence is the same as Python's iterables. 01:04:16.880 |
And so this is things like dictionaries and arrays and ranges and all these things are 01:04:22.120 |
And it says, so long as the element is countable, so I have an array of countable things, then 01:04:27.120 |
I can get a total count method or a property. 01:04:29.920 |
And the way this is implemented is it just says, hey, map over myself, get the count, 01:04:34.880 |
get all the counts up together, and then I have a total count of all the things in my 01:04:40.120 |
And now if I have an array of arrays, an array of mats, lazy mapped collection sequence-y 01:04:46.280 |
thingy of mats, whatever it is, now I can just ask for its count or its total count. 01:04:51.400 |
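A rough sketch of that pattern, with illustrative names rather than the exact ones from the notebook:

```swift
// A sketch of the countable/totalCount pattern.
protocol Countable {
    var count: Int { get }
}

extension Array: Countable {}    // Array already has count, so nothing else to write
extension String: Countable {}   // so does String

extension Sequence where Element: Countable {
    var totalCount: Int {
        return map { $0.count }.reduce(0, +)
    }
}

let batches = [[1, 2, 3], [4, 5]]
print(batches.totalCount)          // 5
print(["hi", "there"].totalCount)  // 7
```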
Hey, Chris, this -- so this functionality you're describing is basically the same as Haskell's type classes, right? 01:05:06.080 |
I mean, so the interesting thing for me here is -- 01:05:09.440 |
Well, the interesting thing -- the reason I ask is because, like, I've tried to use Haskell 01:05:16.560 |
I'm clearly not somebody who's smart enough to use Haskell. 01:05:21.040 |
Yet I wrote the code that's on the screen right now, like, a couple of days ago. 01:05:25.840 |
And I didn't spend a moment even thinking about the fact I was writing the code. 01:05:29.800 |
It was only, like, the next day that I looked back at this code, and I thought, like, wow, 01:05:34.160 |
I just did something which no other language I've used could do -- and I was smart enough to do it. 01:05:40.560 |
Like, it kind of makes this what I think of as, like, super genius functionality available 01:05:47.320 |
Yeah, and so back at the very, very beginning of this, we talked about Swift's goal is to 01:05:50.880 |
be able to take good ideas wherever they are and assemble them in a tasteful way, and then make them not weird. 01:05:57.000 |
Being not weird is a pretty hard but important goal. 01:06:00.360 |
So the way I look at programming languages is that programming languages in the world 01:06:05.000 |
have these gems in them, and Haskell has a lot of gems, incidentally. 01:06:13.680 |
But then it gets buried in weird syntax, and it's just purely functional. 01:06:17.680 |
You have to be -- you know, it has a very opinionated worldview of how you're supposed 01:06:20.960 |
to write code, and so it appeals to a group of people, which is great, but then it gets 01:06:26.840 |
And to me it's really sad that the great technology in programming languages that's been invented 01:06:31.360 |
for decades and decades and decades gets forgotten just because it happened to be put in the 01:06:37.200 |
It's not just that, but it's even the whole way things are described are all about -- 01:06:46.240 |
And so a lot of what Swift is trying to do is just trying to take those ideas, re-explain 01:06:50.180 |
them in a way that actually makes sense, stack them up in a very nice, consistent way, and 01:06:57.200 |
And so a lot of this was pull these things together and really polish and really push 01:07:01.600 |
and, like, make sure that the core is really solid. 01:07:07.480 |
How does the Swift protocol approach avoid the inheritance tree hell problem in languages 01:07:12.580 |
like C#, where you end up with enormous trees that are impossible to refactor? 01:07:17.920 |
And similarly, what are the top opinions around using the mix-in pattern, which has been found 01:07:26.040 |
So the way that Swift implements this is completely different from the way that subclasses work 01:07:30.800 |
in C# or Java or other object-oriented languages. 01:07:34.480 |
There, what you get is something called a Vtable. 01:07:37.220 |
And so your type has to have one set of mappings for all these different methods, and then everything gets dispatched through that table. 01:07:47.720 |
Like, so we, on the last slide, added a method is odd to all the integers. 01:07:53.800 |
That would be a very inefficient thing to do. 01:07:56.580 |
And so the implementation is completely different. 01:08:00.460 |
I will, at the end of this, I think in a couple of slides, have a good pointer that will give 01:08:05.280 |
you a very nice deep dive on all that kind of stuff. 01:08:07.960 |
So also there's the binary method problem, and there's a whole bunch of other things 01:08:11.040 |
that are very cleanly solved in Swift protocols. 01:08:16.220 |
Out of curiosity, could you give an estimate of how long it would take someone to go from 01:08:20.000 |
a fair level of knowledge in Python, TensorFlow deep learning, to start being able to be a 01:08:25.920 |
competent contributor to Swift for TensorFlow? 01:08:29.720 |
So we've designed Swift in general to be really easy to learn, so that you can learn as you go. 01:08:35.140 |
And this course is a little bit very -- it's very bottoms up, but a lot of Swift, just 01:08:41.720 |
And what you start with when you go in from that perspective is you get a very top-down view. 01:08:48.920 |
And what I would do is I would start by Googling for the Swift tour, and you get a 01:08:53.040 |
very nice top-down view of the language, and it's very approachable. 01:08:56.760 |
And like just pick something that like is in FastAI in some FastAI notebook now. 01:09:01.160 |
We haven't implemented it yet, and pop it into a notebook, right? 01:09:04.400 |
And the first time you try to do that, you'll get all kinds of weird errors and obstructions, 01:09:08.200 |
and you won't know what's going on, but after a few weeks -- 01:09:11.000 |
That's on the forum, and that's what the community's about. 01:09:13.280 |
Yeah, lots of help from the forum, and Chris and I are both on the forum, and there's SFTF 01:09:18.720 |
We'll help you out, and in a few weeks' time, you'll be writing stuff from scratch and finding 01:09:28.360 |
So I want to address one weird thing here and give you something to think about, and 01:09:34.440 |
you might wonder, okay, well, Jeremy wants to know all the countable things. 01:09:39.720 |
We have arrays and we have mat, and we have to say that they are countable. 01:09:43.560 |
But the compiler knows whether it's countable or not. 01:09:45.680 |
If you try to make something countable and it doesn't have a count, the compiler will complain. 01:09:51.200 |
Well, let's talk about a different example, and the answer is that protocols are not just 01:09:55.880 |
about methods -- and this is also related to the C# question -- but about the behavior of those methods. 01:10:02.440 |
And so here we're going to switch domains and talk about shapes. 01:10:04.760 |
And so I have shape -- all shapes have to have a draw method, right? 01:10:09.760 |
And what I can do is I can define an octagon and tell it how to draw. 01:10:12.360 |
I can define a diamond, tell it how to draw using exactly the same stuff that we just saw. 01:10:19.840 |
And the cool thing about this is now I can define a method, refresh, and now refresh, 01:10:23.480 |
all it does is it clears the canvas and draws the shape. 01:10:28.800 |
So if you go do a tab completion on your octagon, it all just works. 01:10:32.440 |
But what happens if you have something else with the draw method? 01:10:37.280 |
It's a very different notion of what drawing is, right? 01:10:40.920 |
You don't want cowboys to have a refresh method. 01:10:44.040 |
It doesn't make sense to clear the screen and then pull out a gun. 01:10:50.600 |
And so the idea of protocols is really, again, to categorize and describe groups of types. 01:10:56.180 |
And one of the things you'll see, which is kind of cool, is you can define a protocol 01:11:01.240 |
So it's a protocol that doesn't require anything. 01:11:03.380 |
And then you go say, I want that type, that type, that type, that type to be in this group. 01:11:08.400 |
And now I will have a way to describe that group of types. 01:11:11.760 |
So it can be totally random, whatever makes sense for you. 01:11:17.040 |
You can do lots of different things that now apply to exactly that group of types. 01:11:20.280 |
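Here's a minimal sketch of that shapes example, with the cowboy deliberately left out of the group:

```swift
// The protocol names the group of types that should get refresh.
protocol Shape {
    func draw()
}

struct Octagon: Shape {
    func draw() { print("eight sides") }
}

struct Diamond: Shape {
    func draw() { print("four shiny sides") }
}

extension Shape {
    func refresh() {
        print("clearing the canvas")
        draw()
    }
}

struct Cowboy {
    func draw() { print("reaches for the gun") }   // same word, very different meaning
}

Octagon().refresh()    // fine: Octagon opted in to Shape
// Cowboy().refresh()  // error: Cowboy never claimed to be a Shape
```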
And I actually found, I still find that this kind of protocol-based programming approach 01:11:25.800 |
is like the exact upside down opposite of how I've always thought about things. 01:11:31.600 |
It's kind of like you don't create something that contains these things, but you kind of 01:11:37.320 |
And the more I've looked at code that works this way, the more I realize it tends to be 01:11:45.640 |
But I still find it a struggle because I just don't have that sense of this is how to go 01:11:54.480 |
And one of the things you'll notice is that we added this protocol to array in an extension. 01:11:59.680 |
So unlike interfaces in a Java or C# type of language, we can take somebody else's type 01:12:04.800 |
and then make it work with the protocol after the fact. 01:12:08.360 |
And so I think that's a superpower here that allows you to work with these values in different 01:12:13.520 |
So this is a super brief, high-level view of protocols. 01:12:17.480 |
Protocols are really cool in Swift, and they draw in a lot of great work in the Haskell 01:12:22.400 |
There's a bunch of talks, and even Jeremy wrote a blog post that's really cool that 01:12:25.720 |
talks about some of the fun things you can do. 01:12:31.760 |
Because once a functionality of a particular API or class is extended in this way, you 01:12:35.760 |
won't know if the functionality is coming from the original class or from somewhere else. 01:12:40.400 |
- Yeah, so that's something you let go of when you write Swift code. 01:12:43.840 |
And there's a couple of reasons for that, one of which is that you get good ID support. 01:12:47.880 |
And so, again, we're kind of building some parts of this airplane as we fly it, but in 01:12:53.080 |
Xcode, for example, you can click on a method and jump to the definition, right? 01:12:57.560 |
And so you can say, well, okay, here's a map on array. 01:13:09.160 |
And so all sequences have map, filter, reduce, and a whole bunch of other stuff. 01:13:13.360 |
And so arrays are, of course, sequences, and so they get all that behavior. 01:13:17.960 |
And so the fact that it's coming out of sequence as a Swift programmer, particularly when you're 01:13:23.680 |
- And actually, you know, we've had this same discussion around Python, which is like, oh, 01:13:27.300 |
Jeremy imports star, and therefore I don't know where things come from, because the only 01:13:30.840 |
way I used to know where things come from is because I looked at the top of a file and read the imports. 01:13:38.200 |
And we had that whole discussion in an earlier lesson where I said, that's not how you figure out where things come from. 01:13:42.560 |
You learn to use jump-to-symbol in your IDE, or you learn to use Jupyter Notebook's ability to show you where things come from. 01:13:55.320 |
I feel that Scala is a very nicely designed language that, to my knowledge, doesn't lack anything in 01:14:00.760 |
terms of the features I've seen so far in Swift. 01:14:04.080 |
And if so, is the choice of Swift more about JVM as opposed to non-JVM runtimes and compilers? 01:14:12.560 |
The way I explain Scala is that they are very generous in the features they accept. 01:14:23.440 |
They're undergoing a big redesign of the language to kind of cut it down and try to make the 01:14:27.680 |
features more sensible and stack up nicely together. 01:14:31.600 |
Swift and Scala have a lot of similarities in some places and they diverge wildly in 01:14:35.600 |
- I mean, I would say there's a, you know, I feel like anybody doing this course understands 01:14:39.760 |
the value of tasteful curation, because PyTorch is very tastefully curated, and TensorFlow maybe less so. 01:14:48.440 |
And so like using a tastefully curated, carefully put together API like Swift has and like PyTorch 01:14:55.040 |
has, I think it makes life easier for us as data scientists and programmers. 01:14:58.920 |
- Yeah, but I think the other point is also very good. 01:15:02.040 |
So Scala is very strong in the JVM, Java, virtual machine ecosystem and it works very 01:15:07.600 |
well with all the Java APIs and it's great in that space. 01:15:11.480 |
Swift is really great if you don't want to wait for the JVM to start up so you can run a quick script. 01:15:16.800 |
And so they're nice duals of each other, and they have different strengths and weaknesses in that sense. 01:15:20.960 |
- Do we have time before our break that I can quickly show how this all goes together? 01:15:24.920 |
- I probably can't stop you even if I wanted to. 01:15:29.040 |
- So just to come back to this here, right, you can basically see what's happened. 01:15:37.400 |
We defined this protocol saying these are the things that we want to have in a data blocks 01:15:41.400 |
API, and then we said here is a specific example of that data blocks API. 01:15:46.680 |
Now at this point we're missing one important thing, which is we've never actually created the 01:15:51.800 |
bit that says this is how you open an image and resize it and stuff like that, right? 01:15:57.600 |
So we just go through and we can say let's call .download, let's call .get items. 01:16:04.200 |
We can create nice simple little functions now. 01:16:06.440 |
We don't have to create complex class hierarchies to say things like tell me about some sample 01:16:14.440 |
And we can create a single little function which creates a train and a valid. 01:16:21.200 |
Something I really like about this style of programming is this: this is a named tuple. 01:16:26.560 |
And I really like this idea that we don't have to create our own struct or class all the time. 01:16:31.040 |
It's kind of a very functional style of programming where you just say I'm just going to define 01:16:35.880 |
my type as soon as I need it and this type is defined as being a thing with a train and 01:16:40.940 |
So as soon as I put parentheses around this, it's both a type and a thing now. 01:16:46.760 |
And so now I can partition into train and valid and that's returned something where 01:16:52.120 |
I can grab a random element from valid and a random element from train. 01:16:57.720 |
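A small sketch of that named-tuple style (the function and field names here are illustrative, not the exact ones from the notebook):

```swift
// The return type is declared inline as a named tuple -- no separate struct needed.
func partitionValidation(_ items: [String], isTraining: (String) -> Bool)
    -> (train: [String], valid: [String]) {
    return (train: items.filter(isTraining),
            valid: items.filter { !isTraining($0) })
}

let split = partitionValidation(["train/a.jpg", "val/b.jpg"]) { $0.hasPrefix("train/") }
print(split.train)   // ["train/a.jpg"]
print(split.valid)   // ["val/b.jpg"]
```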
We can create a processor, again it's just a protocol, right? 01:17:01.400 |
So remember a processor is a thing like for categories, creating a vocab of all of the 01:17:08.120 |
possible categories and so a processor is something where there's some way to say like 01:17:13.000 |
what is the vocab, and if you have a vocab, then process things from text into numbers and back. 01:17:23.160 |
And so we can now go ahead and create a category processor, right? 01:17:27.280 |
So here's, like, grab all the unique values, and here's label to int, and here's int to label. 01:17:33.160 |
So now that we have a category processor we can try using it to make sure that it looks 01:17:52.080 |
sensible and we can now label and process our data. 01:17:58.480 |
So we first have to call label and then we have to call process. 01:18:03.880 |
Given that we have to do those things in a row, rather than creating whole new API functions 01:18:09.720 |
we can actually just use function composition. 01:18:12.920 |
Now in PyTorch we've often used a thing called compose but actually it turns out to be much 01:18:19.480 |
easier, as you'll see, if you don't create a function called compose but you actually define an operator. 01:18:25.520 |
And so here's an operator which we will call compose, right? 01:18:29.240 |
Which is just defined as: first call this function f, and then call this function g on whatever comes out. 01:18:37.520 |
So now we've defined a new function composition which first labels and then processes. 01:18:44.360 |
And so now here's something which does both and so we can map, right? 01:18:49.940 |
So we don't have to create again all these classes and special purpose functions. 01:18:55.440 |
We're just putting together function composition and map to label all of our training data 01:19:04.640 |
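A sketch of such a composition operator and how it reads at the call site (the notebook's operator symbol may differ; >>> is just a common choice):

```swift
// Compose two functions: first f, then g on whatever comes out.
infix operator >>>: AdditionPrecedence

func >>> <A, B, C>(_ f: @escaping (A) -> B, _ g: @escaping (B) -> C) -> (A) -> C {
    return { x in g(f(x)) }
}

let labelOf = { (path: String) in "label(\(path))" }
let process = { (label: String) in "process(\(label))" }

let labelThenProcess = labelOf >>> process
print(labelThenProcess("cat.jpg"))             // process(label(cat.jpg))
print(["cat.jpg", "dog.jpg"].map(labelThenProcess))
```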
And so then finally we can say well this is the final structure we want. 01:19:08.920 |
We want a training set, we want a validation set and let's again create our own little 01:19:16.760 |
Yeah, so our training sets, an array of named tuples, a validation set is an array of named 01:19:20.720 |
tuples and so we're gonna initialize it by passing both in. 01:19:24.400 |
And so this basically is now our data blocks API. 01:19:27.840 |
There's a function called make split labeled data and we're just gonna pass in one of those 01:19:37.680 |
So we're gonna be passing in the image net configuration protocol, the thing that conforms 01:19:45.000 |
And we're gonna be passing in some processor, right, which is gonna be a category processor 01:19:49.800 |
and it's gonna sort of call download, get the items, partition, map, label of, and then 01:19:56.880 |
initialize the processor state and then do label of and then process is our processing 01:20:07.940 |
So now we can say to use this with OpenCV, we define how to open an image. 01:20:15.520 |
We define how to convert BGR to RGB, cuz OpenCV uses BGR, that's how old it is. 01:20:21.680 |
We define the thing that resizes to 224 by 224 with bilinear interpolation. 01:20:26.400 |
And so the process of opening an image is to open, then BGR to RGB, and then resize and 01:20:32.680 |
we compose them all together, and that's it, right? 01:20:36.600 |
So now that we've got that, we then need to convert it to a tensor. 01:20:42.500 |
So the entire process is to go through all those transforms and then convert to a tensor. 01:20:48.040 |
And then, I'll skip over the bit that does the mini batches. 01:20:54.160 |
There's a thing we've got to do the mini batches with that split label data we created, and 01:20:59.040 |
we then just pass in the transforms that we want, and we're done, right? 01:21:04.840 |
So the data blocks API in kind of functional-ish, protocol-ish, Swift, ends up being a lot less 01:21:13.120 |
code to write and a lot easier for the end user. 01:21:17.760 |
Cuz now for the end user, there's a lot less they have to learn to use this data blocks 01:21:22.920 |
It's really just like the normal kind of maps and function composition that hopefully they're 01:21:31.640 |
So I'm really excited to see how this came out, because it solves the problems that I've 01:21:38.760 |
been battling with for the last year with the Python data blocks API, and it's been really 01:21:44.700 |
just a couple of days of work to get to this point. 01:21:47.240 |
>> And one of the things that this points to in Swift that is a big focus is on building 01:21:52.640 |
And so, again, we've been talking about this idea of being able to take an API, use it 01:21:58.720 |
It could be in C or Python or whatever, but it's about building these things that compose 01:22:03.320 |
together and they fit together in very nice ways. 01:22:05.960 |
And with Swift, you get these clean abstractions. 01:22:09.960 |
So once you pass in the right things, it works. 01:22:12.280 |
You don't get the stack trace coming out of the middle of somebody else's library that 01:22:15.760 |
now you have to figure out what you did somewhere along the way that caused it to break, at least 01:22:21.680 |
>> So to see what this ends up looking like, I've created a package called data block. 01:22:25.040 |
It contains two files: it's got a Package.swift and it's got a main.swift. 01:22:32.200 |
So in the end, to actually use it, that's how much code it is to use your data blocks API. 01:22:42.880 |
So let's take a five-minute break and see you back here at 8.05. 01:22:49.040 |
Okay, so we're gradually working our way back to what we briefly saw last week, notebook11, 01:23:02.080 |
training ImageNet, and we're gradually making our way back up to hit that point again. 01:23:08.240 |
It's a bit of a slow process because along the way we've had to kind of reinvent float 01:23:12.640 |
and learn about a new language and stuff like that. 01:23:14.680 |
But we are actually finally up to 02, a fully connected model, believe it or not. 01:23:20.200 |
And the nice thing is at this point, things are going to start looking more and more familiar. 01:23:26.760 |
One thing I will say, though, that can look quite unfamiliar is the amount of typing that 01:23:33.440 |
you have to do with Swift, but there's actually a trick, which is you don't have to type all of it. 01:23:42.640 |
>> What you can actually do is you can say, oh, here's the type I use all the time, tensor, 01:23:48.840 |
bracket, float, and I don't like writing angle brackets either. 01:23:58.000 |
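That trick is just a typealias (this assumes the Swift for TensorFlow toolchain; TF is the abbreviation used in these lessons):

```swift
import TensorFlow

typealias TF = Tensor<Float>    // short name for the type we use constantly

let t: TF = TF(zeros: [2, 3])   // much nicer than Tensor<Float>(zeros: [2, 3]) everywhere
```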
Now to be clear, a lot of real Swift programmers in their production code might not like doing this. 01:24:07.840 |
And personally I do do that a lot, even not in notebooks. 01:24:12.080 |
But you might want to be careful if you're doing actual Swift programming. 01:24:17.600 |
>> The way I would look at it is if you're building something for somebody else to use, 01:24:20.920 |
if you're publishing an API, you probably don't want to do that. 01:24:24.280 |
>> But if you're hacking things together and you're playing and having fun, it's no problem 01:24:30.160 |
Personally, I would say if I'm giving somebody something, I'd write out the whole thing, Tensor<Float>. 01:24:35.240 |
But anyway, in a notebook, I definitely don't want to be typing that. 01:24:38.200 |
So in a notebook, make it easier for your interactive programming by knowing about things 01:24:44.800 |
That's something we also want to make better just in general so that these things all just 01:24:51.600 |
So then we can write a normalized function that looks exactly the same as our Python 01:24:57.680 |
And we can use mean and standard deviation just like in Python. 01:25:01.800 |
And we can define tests with asserts just like in Python. 01:25:08.040 |
We can calculate n and m and c, the same constants, the variables that we used in Python, in exactly the same way. 01:25:17.000 |
We can create our weights and biases just like in Python, except there's a nice kind 01:25:24.640 |
of rule of thumb in the Swift world, which is any time you have a function that's going 01:25:32.200 |
to create some new thing for you, we always use the init constructor for that. 01:25:37.880 |
So for example, generating random numbers and dumping them into a tensor, that's constructing 01:25:44.960 |
So it's actually -- you're actually calling Tensor.init here. 01:25:50.980 |
And so if you're trying to find where is it in an API that I get to create something in 01:25:56.840 |
this way, you should generally look for the -- in the init section. 01:25:59.440 |
So this is how you create random numbers in Swift for TensorFlow. 01:26:02.680 |
This is how you create tensor of zeros in Swift for TensorFlow. 01:26:08.120 |
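For example, a sketch of that "creation lives in init" convention (assumes the Swift for TensorFlow toolchain; the shapes here are arbitrary):

```swift
import TensorFlow

let w = Tensor<Float>(randomNormal: [784, 50])   // random weights come from an init
let b = Tensor<Float>(zeros: [50])               // so does a tensor of zeros
print(w.shape, b.shape)                          // [784, 50] [50]
```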
This is all the same stuff we just basically copied and pasted it from the PyTorch version 01:26:15.720 |
We use a linear function, except rather than @, we use the • operator, because that's what they use in Swift. 01:26:27.080 |
If you're on anything else, it's compose key, dot, equals. 01:26:32.480 |
And so now we can go ahead and calculate linear functions. 01:26:36.160 |
We can calculate relu, exactly the same as PyTorch. 01:26:41.260 |
We can do proper Kaiming init, exactly like PyTorch. 01:26:45.600 |
And so now we're at the point where we can define the forward pass of a model. 01:26:50.280 |
And this looks, basically, again, identical to PyTorch. 01:26:54.800 |
A model can just be something that returns some value. 01:27:00.320 |
So that -- the forward pass of our model really just builds on stuff that we already know 01:27:05.440 |
about and it looks almost identical to PyTorch, as does a loss function, right? 01:27:11.360 |
It looks a little bit different because it's not called squeeze. 01:27:15.760 |
Mean squared error is the same as PyTorch as well. 01:27:24.080 |
If it doesn't, go back to 02 in the Python notebooks. 01:27:29.020 |
And actually this is one of the tricks, like this is why we've done it this way for you 01:27:32.000 |
all, is that we have, like, literally kind of these parallel texts, you know, there's 01:27:37.320 |
a Python version, there's a Swift version, so you can see how they translate and see exactly 01:27:42.080 |
how you can go from one language and one framework to another. 01:27:51.080 |
So to do a backward pass, we can do it exactly the same way as, again, we did it in PyTorch. 01:27:56.480 |
One trick we kind of -- Python hack we used in PyTorch. 01:28:04.360 |
>> Doing it all manually, yep, because we have to build everything from scratch. 01:28:06.160 |
In the PyTorch version, we actually added a .grad attribute to all of our tensors. 01:28:11.000 |
We're not allowed to just throw attributes in arbitrary places in Swift, so we have to 01:28:14.760 |
define a class which has the actual value and the gradient. 01:28:18.720 |
But once we've done that, the rest of this looks exactly the same as the PyTorch version 01:28:31.280 |
Here's the Python version we created for LinGrad, here's the Swift version for LinGrad. 01:28:39.360 |
So now that we've done all that, we can go ahead and do our entire forward and backward 01:28:55.760 |
>> Well, you skipped past the big flashing red lights that says don't do this. 01:29:07.240 |
We're defining a class and putting things in classes, and we haven't seen classes yet, 01:29:13.720 |
Because before, we've used things that looked like classes, but they didn't say class on them. 01:29:22.880 |
>> So let's play a little game, and so let's talk about this idea of values and references, 01:29:30.120 |
because that's what struct versus class really means in Swift. 01:29:34.040 |
A value is a struct thing, and a reference is a class thing. 01:29:40.600 |
Here's some really simple Python code, and there's no tricks here. 01:29:43.140 |
What we're doing is we're assigning four into A, we're copying A into B, we're incrementing A. 01:29:48.720 |
And so when you do this, you see that A gets incremented. 01:29:55.160 |
In Swift, you do the same thing, you get the same thing out. 01:30:04.000 |
So here I have an array, or list, in Python, and I put it into X, and then I copy X into Y. 01:30:22.800 |
I just added something to X, and now Y changed? 01:30:30.840 |
We learn that there's this thing called a reference, and we learn that it does things 01:30:41.960 |
And so here we have, again, this identical code except var. 01:30:46.080 |
We put a one and a two into X, we copy X into Y, we add something to X, we print it out, and this time Y hasn't changed. 01:30:57.640 |
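The same experiment, written out as a minimal sketch:

```swift
// Arrays in Swift are values, so a copy really is a copy.
var x = [1, 2]
let y = x        // y is its own independent value
x.append(3)
print(x)         // [1, 2, 3]
print(y)         // [1, 2] -- unchanged, unlike the Python list
```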
So this is something called value and reference semantics. 01:31:01.840 |
And in Swift, arrays, dictionaries, tensors, like all these things have what's known as 01:31:07.600 |
And let's dive in a little bit about what that is. 01:31:10.020 |
So a value in something that has value semantics is a variable that -- sorry, this is self-referential. 01:31:17.800 |
When you declare something in your code, you're declaring a name. 01:31:20.840 |
And if it's a name for a value, that name stands for that value, right? 01:31:25.720 |
X stands for the array of elements that it contains. 01:31:32.380 |
This is what you expect out of basic integers. 01:31:34.340 |
This is what you expect out of basic things that you interact with on a daily basis. 01:31:40.120 |
Reference semantics are weird if you think about it. 01:31:41.940 |
So what we're doing is we're saying that X is a name for something else. 01:31:47.540 |
And so we usually don't think about this until it comes around to bite us. 01:31:53.680 |
And let's dive in a little bit to understand why this causes problems. 01:32:00.480 |
It's something that Jeremy wrote with a very descriptive name. 01:32:03.120 |
And it takes T, and then it goes and updates this, and that's fine, right? 01:32:09.840 |
You move on and put in a workbook, and then you build the next workbook. 01:32:13.320 |
Next workbook calls in a do thing, and you find out, oh, well, it changed the tensor 01:32:18.800 |
I passed in, but I was using that tensor for something else. 01:32:21.760 |
And now I've got a problem because it's changing a tensor that I want to use. 01:32:27.800 |
And I find out the do thing is causing the problem. 01:32:32.360 |
I don't know who here adds clones in a principled way or who here-- 01:32:38.760 |
So what we do in fast AI is we kind of don't have clone. 01:32:41.600 |
And then when things start breaking, I add more until things stop breaking, and then I stop. 01:32:52.920 |
Possibly a few too many, or possibly a few too few. 01:32:56.800 |
What we have is we have a foot gun here in the first case. 01:32:59.640 |
So something that's ready to explode if I use it wrong. 01:33:06.640 |
And in the second case, it's going to do that copy even if I don't need to, which is really sad. 01:33:18.480 |
Arguments in Swift actually even default to constants, which makes it so that you can't accidentally modify them. 01:33:23.400 |
If you do actually want to modify something in the caller, you can do that too. 01:33:27.760 |
You just have to be a little bit more explicit about it and use this thing called in out. 01:33:31.440 |
And so now if you want to update the thing somebody passed to you, that's fine. 01:33:36.040 |
Just pass it in out and everything works fine. 01:33:38.400 |
And on the call side, you pass it with this ampersand thing so that they know that it might be changed. 01:33:49.880 |
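A minimal sketch of inout at work: the callee asks for it, and the caller opts in with &.

```swift
func addOne(_ x: inout Int) {
    x += 1
}

var counter = 3
addOne(&counter)
print(counter)   // 4 -- the mutation happened, and it was visible at the call site
```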
Well, when we talk about names, we're talking about values. 01:33:56.480 |
And so I say it has two fields, real and imaginary. 01:33:58.960 |
And I define an instance of my complex number here named X. 01:34:04.560 |
And so this is saying I have X and it's a name for the value that has one and two in 01:34:10.960 |
And so I introduce Y. Y is another notational instance of this struct. 01:34:18.920 |
And if I go and I copy it, then I get another copy. 01:34:28.920 |
And so this works with structs, this works with tuples, this works with arrays and dictionaries 01:34:39.760 |
I have a class and the class has a string and it has an integer. 01:34:43.040 |
And so somewhere in memory, there is a string and there is an array and they're stuck together, 01:34:48.860 |
But now when I say X, X is actually a reference or a pointer or an indirection. 01:34:54.520 |
The reason for that is because you wrote class instead of struct. 01:34:58.380 |
So by writing class, you're saying when you create one of these things, please create 01:35:06.120 |
And now what happens with references is you now get copies of that reference. 01:35:11.680 |
And so when I copy X into Y, just like in PyTorch or Python, I have another reference 01:35:20.460 |
And so that's why when you go and you update it, so I'm going to go change the array through 01:35:24.480 |
Y, it's also going to change the data that you see through X. 01:35:32.460 |
And so you can declare things as classes and classes are good for certain things and they're 01:35:38.080 |
important and valuable and you can subclass them and classes are nice in various ways. 01:35:42.840 |
But you have a choice and a lot of things that you've seen are all defined as structs 01:35:45.760 |
because they have much more predictable behavior and they stack up more correctly. 01:35:48.700 |
So in this case, you know, I was trying to literally duplicate a Python/PyTorch API. 01:35:56.040 |
And so I just found I wasn't able to unless I used class. 01:36:01.120 |
And then you kind of said, okay, well, that's how you do it. 01:36:03.960 |
Yeah, and we'll get back to auto diff in a second. 01:36:09.040 |
And again, when you're learning Swift, it's fine, just reach for the things that are familiar 01:36:15.760 |
But here we're trying to talk about things Swift is doing to help save you and make your 01:36:25.360 |
But then every time a real Swift programmer takes my thing that had class and replaces 01:36:29.520 |
it with something more Swift-y, it ends up being shorter and easier to understand. 01:36:35.160 |
And so I agree, go for it, get things working with class. 01:36:39.520 |
But when it becomes time, start to work with this and look at it and figure out how it 01:36:45.960 |
Now, there's one thing that's really weird here. 01:36:46.960 |
And if you remember last time, the first thing I told you about was var and let, right? 01:36:55.920 |
We've got a let Y, and now we are updating -- if this thing will go away -- we are updating a thing that we declared with let. 01:37:09.360 |
Well, the reason here is that the thing that is constant, this constant is this reference. 01:37:15.840 |
And so we've made a new copy of the reference, but we're allowed to change the thing it points to. 01:37:25.440 |
So this doesn't make sense, none of this makes sense, but how does let and var work? 01:37:30.200 |
Well, this is a thing that comes back to the mutation model in Swift. 01:37:37.240 |
But let's say I have a complex number and it's a struct and I say, hey, this thing is a let, and then I try to change it. 01:37:47.640 |
Well, if you try to do that, Swift will tell you, ha ha, you can't do that. 01:37:50.680 |
You can't use plus equals on the real that's in C1, because C1 is a let. 01:37:57.040 |
And so it tries to lead you to solving a problem that says, hey, by the way, if you want to 01:38:01.400 |
fix this, you want to make it go away, just change let to var and then everything is good. 01:38:10.600 |
And so what I'm going to do is I'm going to get a little bit trickier and I'm going to 01:38:13.840 |
I'm going to add a method increment to my complex number. 01:38:18.320 |
I'm going to increment inside the method and then call the method. 01:38:24.200 |
Well, you know, these things may be in different files. 01:38:27.360 |
The compiler may only be able to see one or the other. 01:38:30.040 |
And so if you run this, it has no idea whether increment is going to change that thing, right? 01:38:36.280 |
And so what the compiler does is it says, ah, well, you can't increment real inside of this 01:38:42.280 |
increment method either because it says self is immutable. 01:38:48.080 |
And it says Mark method mutating to make self mutable. 01:38:51.520 |
Now the thing to think about in methods, both in Python, but also in Swift is that they 01:38:55.900 |
have a self, and in Python, you have to declare it; in Swift, it's still there. 01:39:01.040 |
It just, it's just not making you write it all the time because that would be annoying. 01:39:06.320 |
And so when you declare a method on a struct, what you do is you're getting self, and it's immutable by default. 01:39:16.560 |
Now what this is saying is this is saying that, hey, you're actually changing self.real. 01:39:23.520 |
And so you can't do that here, but what you can do is you can mark it mutating. 01:39:27.200 |
And so what that looks like is that says we can mark this function as mutating. 01:39:32.080 |
And what that does is it says our self is now one of these in out things, the in out 01:39:36.920 |
thing that allows us to change it in the caller. 01:39:39.440 |
And because it's now mutating, it's totally fine to change it. 01:39:43.840 |
The compiler leads you to this and shows you what to do, but now we come back to this problem 01:39:54.200 |
The compiler will tell you, hey, you can't do that. 01:39:59.000 |
And now it knows the increment can change it. 01:40:01.560 |
And so it says really, really, really, if you want to do this, go mark C1 as a var. 01:40:05.640 |
And Jeremy would say, just mark everything as a var because that's how he is. 01:40:10.040 |
And so the nice thing about this, though, is it all stacks up nicely and it all works. 01:40:15.000 |
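Putting the let/var and mutating pieces together in one small sketch:

```swift
struct Complex {
    var real: Float
    var imaginary: Float

    mutating func increment(by amount: Float) {
        real += amount               // allowed only because the method is mutating
    }
}

var c1 = Complex(real: 1, imaginary: 2)   // must be var to call a mutating method
c1.increment(by: 1)
print(c1.real)                            // 2.0

let c2 = Complex(real: 1, imaginary: 2)
// c2.increment(by: 1)                    // error: c2 is a let constant
```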
And this is what allows -- this is kind of the underlying mechanics that allow the value 01:40:20.400 |
Now, you may be wondering, how is this efficient? 01:40:24.200 |
So we were talking about how, in the PyTorch world, you end up copying all the time, even if you don't need to. 01:40:29.600 |
In Swift, we don't want to do all those copies. 01:40:32.040 |
And so on the other hand, we don't want to be, like, always copying. 01:40:36.200 |
So where do the copies go and how does that work? 01:40:38.880 |
So if you're using arrays, or arrays of arrays of arrays of dictionaries of arrays of super 01:40:43.200 |
nested things, what ends up happening is arrays are struct, you might be surprised. 01:40:49.640 |
And inside of that struct, it has a pointer or a reference. 01:40:54.060 |
And so the elements of an array are actually implemented with the class. 01:40:57.160 |
And so what I have here is I have A1, which is some array, and I copied it to A2, and 01:41:01.560 |
I copied it to A3, I copied it to A4 because I'm passing it all around, I'm just passing 01:41:04.800 |
this array around, no big deal, and what happens is I'm just copying this reference, and that's cheap. 01:41:11.560 |
And so this passing around arrays, full value semantics, super cheap, no problem, it's not 01:41:15.720 |
copying any data, it's just passing the pointer around, right, just like you do in C or even 01:41:23.080 |
The magic happens when you go and you say, okay, well, I've now got A4, and so all these 01:41:27.280 |
things are all sharing this thing, I'm going to add one element to A4, well, what happens? 01:41:31.760 |
Well, first thing that happens is append is a mutating method, and so it says, hey, I'm 01:41:38.320 |
this thing called a copy-on-write type, and so I want to check to see if I'm the only one pointing at this data. 01:41:44.880 |
And it turns out no, lots of other things are pointing to our data here, and so lazily, 01:41:51.600 |
because it's shared, I'll make a copy of this array. 01:41:54.260 |
And so I only get a copy of the data if it's shared and if it changes. 01:42:07.080 |
Now the interesting thing about this is because of the way this all works out is if you go 01:42:09.840 |
and you change A4 again, it goes and just updates it in place, there's no extra copy. 01:42:14.800 |
And so the cool thing about this is that you get exactly the right number of copies and 01:42:19.880 |
it just works, you as a programmer don't have to think about this. 01:42:23.320 |
This is one of the things that Swift is just, like, subtracting from your consciousness 01:42:27.760 |
of the things that you have to worry about, which is really nice. 01:42:31.000 |
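If you want to see the mechanism spelled out, here is a minimal copy-on-write sketch; this is illustrative only, not the real Array implementation:

```swift
final class Storage {
    var elements: [Int]
    init(_ elements: [Int]) { self.elements = elements }
}

struct CoWList {
    private var storage: Storage
    init(_ elements: [Int]) { storage = Storage(elements) }

    var elements: [Int] { storage.elements }

    mutating func append(_ x: Int) {
        // Only copy the underlying storage if someone else is sharing it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.elements)
        }
        storage.elements.append(x)
    }
}

let a = CoWList([1, 2])
var b = a           // cheap: only the reference inside the struct is copied
b.append(3)         // b lazily copies its storage; a is untouched
print(a.elements)   // [1, 2]
print(b.elements)   // [1, 2, 3]
```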
And so a really nice aspect of this is that you get algebra, like, values work the way 01:42:36.640 |
values are supposed to work, you get super high performance, we get to use more emojis, 01:42:43.360 |
If you want to learn more about this, because this is also a really cool, deep topic that 01:42:47.640 |
you can geek out about, particularly if you've done object-oriented programming before, there's 01:42:50.640 |
a lot that's really nice about this, there's a video you can see more. 01:42:53.960 |
So let's go back to that auto-diff thing, and let's actually talk about auto-diff from a 01:43:00.360 |
So this is the auto-diff system implemented the same way as the manually done PyTorch version, 01:43:08.800 |
and we didn't like it because it was using references. 01:43:12.080 |
Let's implement, again, the very low-level manual way in Swift, but before we do, let's 01:43:19.760 |
So Swift has built-in, and Swift for TensorFlow has built-in automatic differentiation for 01:43:25.960 |
So you don't have to write gradients manually, you don't have to worry about all this stuff. 01:43:30.880 |
There are functions like gradient, and you call gradient, and you pass it a closure, 01:43:36.000 |
and you say, what is the gradient of x times x? 01:43:39.680 |
And it gives you a new function that computes the gradient of x times x, and here we're 01:43:42.720 |
just calling that function on a bunch of numbers that we're striding over and printing them 01:43:47.200 |
out, and it just gives you this gradient of this random little function we wrote. 01:43:52.800 |
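Roughly what that usage looks like (assumes the Swift for TensorFlow toolchain; the exact gradient overloads have moved around between toolchain versions):

```swift
import TensorFlow

let dSquare = gradient(of: { (x: Double) in x * x })   // a new function: d/dx of x*x
for x in stride(from: 0.0, to: 3.0, by: 1.0) {
    print(x, dSquare(x))                               // 0 0, 1 2, 2 4
}

print(gradient(at: 3.0) { x in x * x })                // 6.0 -- gradient at one point
```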
Now, one of the interesting things about this is, the way I wrote this out, it takes just doubles, not tensors. 01:43:59.160 |
Auto-diff in Swift works on any differentiable type, anything that's continuous, anything 01:44:03.760 |
that's not like integers, anything that has a gradient. 01:44:11.620 |
This has to be built into the language itself, because you're actually -- you're just -- you're 01:44:16.680 |
literally compiling something that's multiplying doubles together, and it has to figure out how to differentiate that. 01:44:22.680 |
You can do things as a library, and that's what PyTorch and other frameworks do in Python. 01:44:37.320 |
If you want to define quaternions or other cool numeric scientific-y things that are continuous, 01:44:43.880 |
those are differentiable, too, and that all stacks out and works. 01:44:46.800 |
So the -- there's a bunch of cool stuff that works this way. 01:44:52.640 |
You can get the gradient at some point with the function. 01:44:58.800 |
Instead of talking about that, we're going to do the -- from the bottom-up thing. 01:45:03.160 |
And so I'm going to pretend I understand calculus for a minute, which is sad. 01:45:08.080 |
So if you think about what differentiation is, computing the derivative of a function, 01:45:16.600 |
You have to know the axioms of the universe, like, what does -- what is the derivative 01:45:20.360 |
of plus or multiply or sine or cosine or tensor or matmul. 01:45:27.160 |
Then you have to compose these things together, and the way you compose them together is this 01:45:30.680 |
thing called the chain rule, and this is something that I relearned, sadly, over the last couple 01:45:37.080 |
>> But we did in the Python part of this course. 01:45:45.160 |
>> Yeah, apparently there's some ancient feud between the people who invented calculus independently, 01:45:52.800 |
So what this is saying is this is saying, if you want the derivative of f calling g, 01:45:57.560 |
the derivative of f calling g is the derivative of f applied to the forward version of g, multiplied by the derivative of g. 01:46:04.880 |
And this is important because this is actually computing the forward version of g in order to compute the derivative. 01:46:12.240 |
>> Which we kind of hid away in our dy/du · du/dx version. 01:46:25.080 |
Well, what we're going to do is we're going to look at defining the forward function of 01:46:27.880 |
this, and so we'll use the mean squared error as the example function. 01:46:34.000 |
This is a little bit more complicated than I want, and so what I'm going to do is I'm 01:46:36.560 |
going to actually just look at this piece here, and so I'm going to define this function 01:46:39.720 |
MSE inner, and all it is is it's the dot squared dot mean. 01:46:44.280 |
So it's conceptually this thing, MSE inner, that just gets the square of x and then does 01:46:49.720 |
the mean just because that's simpler, and then we'll come back to MSE at the end. 01:46:53.800 |
And so in order to understand what's going on, I'm going to define this little helper function 01:46:57.360 |
called trace, and all trace does is it -- you can put it in your function, and it uses this 01:47:02.480 |
little magic thingy called pound function, and when you call trace, it just prints out the name of the function it was called from. 01:47:09.240 |
And so here we call foo, and it prints out, hey, I'm in foo AB, and I'm in bar X, and 01:47:14.960 |
so we'll use that to understand what's happening in these cells. 01:47:18.280 |
So here I can define, just like you did in the PyTorch version, the forward and the derivative 01:47:25.680 |
versions of these things, and so X times X is the forward, the gradient version is two 01:47:29.960 |
times X. X dot mean is the forward, this weird thing of doing a divide is apparently the 01:47:36.440 |
gradient of mean, and I checked it, it apparently works, I don't know why. 01:47:42.360 |
So then when you define the forward function of this MSE inner function, it's just saying 01:47:45.800 |
give me the square and take the mean, super simple, and then we can use the chain rule, 01:47:51.040 |
and this is literally where we use the chain rule to say, okay, we want the gradient of 01:47:54.560 |
one function on another function, just like the syntax shows, and the way we do that is 01:47:58.840 |
we get the gradient of mean applied to the inner thing and multiply it by the gradient of square. 01:48:05.320 |
So this is really literally the math interpretation of this stuff. 01:48:09.640 |
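A hand-written sketch of those primitives and the chain rule applied by hand (assumes TF = Tensor&lt;Float&gt; from earlier; the function names are illustrative):

```swift
import TensorFlow
typealias TF = Tensor<Float>

func square(_ x: TF) -> TF { x * x }
func squareGrad(_ x: TF) -> TF { 2 * x }

func mean(_ x: TF) -> TF { x.mean() }
func meanGrad(_ x: TF) -> TF {
    // every element contributes 1/n to the mean
    TF(repeating: 1 / Float(x.scalarCount), shape: x.shape)
}

func mseInner(_ x: TF) -> TF { mean(square(x)) }

// chain rule: d/dx mean(square(x)) = meanGrad(square(x)) * squareGrad(x)
func mseInnerGrad(_ x: TF) -> TF {
    meanGrad(square(x)) * squareGrad(x)
}

let x = TF([1, 2, 3])
print(mseInner(x))       // ~4.667
print(mseInnerGrad(x))   // [0.667, 1.333, 2.0] -- note square(x) was computed twice
```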
And given that we have this, we can now wrap it up into more functions, and we can say, 01:48:14.680 |
let's compute the forward and the backwards version of MSE, we just call the forward version, 01:48:18.420 |
we call the backward version, and then we can run on some example data, one, two, three, 01:48:23.760 |
>> And just to be clear, the upside down triangle thing is not an operator here, it's just using 01:48:28.440 |
Unicode as part of the name of that function. 01:48:31.140 |
>> That's a gradient delta symbol thingy, I found that on Wikipedia. 01:48:39.480 |
So when you run this, what you'll see is it computes the forward version of the thing: 01:48:43.200 |
it runs square and then it runs mean, and then it runs square again, and then it runs 01:48:47.480 |
the backward version of mean and square, and this makes sense given the chain rule, right? 01:48:50.920 |
We have to recompute the forward version of square to do this, and for this simple example, 01:48:55.640 |
that's fine, square is just one multiply, but consider that it might be a multiply over megabytes of data. 01:49:02.680 |
It's not necessarily cheap, and when you start composing these things together, this recomputation adds up. 01:49:09.260 |
So let's look at what we can do to factor that out. 01:49:12.760 |
So there's this pattern called chainers, and what we call the value and chainer pattern. 01:49:18.960 |
And what we want to do is we want to define each of these functions, like square or mean 01:49:24.080 |
or your model, as one function that returns two things. 01:49:29.480 |
And so what we're going to do is we're going to look at the other version of calculus's 01:49:33.620 |
form of this, and so when you say that the derivative of x squared is 2x, you actually 01:49:39.280 |
have to move the dx over with it, and this matters because the functions we just defined 01:49:46.600 |
are actually only, those are only valid if you're looking at a given point, that they're 01:49:50.320 |
not valid if you compose it with another function. 01:49:52.440 |
This is just another way of writing the chain rule. 01:49:54.340 |
It's the exact same thing, and so we're going to call this the gradient chain, and all it does is chain the incoming gradient through. 01:50:01.120 |
And Chris, I just need to warn you, in one of the earlier courses, I got my upside-down 01:50:05.880 |
triangles mixed up as you just did, so the other way round is delta, and this one is 01:50:10.920 |
called nabla, and I only know that because I got in trouble for screwing it up from everywhere 01:50:18.480 |
So all this is, is the same thing we saw before, it just has an extra multiplication there 01:50:22.760 |
because that's what the chain rule apparently really says. 01:50:25.760 |
So what we can do now is, now that we have this, we can actually define this value with 01:50:32.600 |
What this is doing is it's wrapping up both of these things into one thing. 01:50:35.880 |
So here we're returning the value, when you call this, we're also returning this chain 01:50:41.760 |
Can you just explain this TF arrow, TF, how do I read that, TF arrow, TF? 01:50:48.640 |
So what this is doing is this is saying we're defining a function, squareVWC, it takes 01:50:53.320 |
X, it returns a tuple, we know what tuples are. 01:50:57.720 |
These are fancy tuples, like you were showing before, where the two things are labeled. 01:51:02.240 |
So there's a value member of the tuple, and there's a chain label of the tuple. 01:51:09.960 |
And so this says it is a closure that takes a tensor of float and returns a tensor of float. 01:51:15.620 |
So that's just a way of defining a type in Swift where the type is itself a function. 01:51:21.400 |
And so what square VWC is going to be is it's going to be two things, it's the forward thing, 01:51:26.360 |
the multiply, X times X, and the backwards thing, the thing we showed just up above that computes the gradient. 01:51:34.120 |
And the forward thing is the actual value of the forward thing, the backward thing is 01:51:37.480 |
a function that will calculate the backward thing. 01:51:39.840 |
And the chain here is returning a closure, and so it's not actually doing that computation. 01:51:44.800 |
So we can do the same thing with mean, and there is the same computation. 01:51:47.960 |
And so now what this is doing is it's a little abstraction that allows us to pull together 01:51:52.240 |
the forward function and the backward function into one little unit. 01:51:57.200 |
And the reason why this is interesting is we can start composing these things. 01:52:00.920 |
And so this MSE inner thing that we were talking about before, which is mean followed by square, 01:52:05.400 |
or square followed by mean, we can define: we just call squareVWC and then we pass the result into meanVWC. 01:52:13.960 |
And then the result of calling this thing is mean.value, and the derivative is those two chain functions composed together. 01:52:24.720 |
And so if we run this, now we get this really interesting behavior where when we call it, 01:52:28.360 |
we're only calling the forward functions once and the backward function once as well. 01:52:34.640 |
And we also get the ability to separate this out. 01:52:36.660 |
And so here what we're doing is we're calling the VWC for the whole computation, which gives us back the value and the chain. 01:52:46.080 |
And if that's all we want, that's cool, we can stop there. 01:52:51.640 |
And so here we call it what we call the chain function to get that derivative. 01:52:56.320 |
And so that's what gives us both the ability to get the forward and the backward separate, 01:53:01.480 |
which we need, but also it makes it so we're not getting the re-computation because we're 01:53:06.660 |
reusing the same values within these closures. 01:53:10.020 |
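A sketch of that value-with-chainer pattern (illustrative; the notebook's version may differ in details; assumes TF = Tensor&lt;Float&gt; as before):

```swift
import TensorFlow
typealias TF = Tensor<Float>

func squareVWC(_ x: TF) -> (value: TF, chain: (TF) -> TF) {
    return (value: x * x,
            chain: { dOut in dOut * 2 * x })   // reuses x; nothing is recomputed
}

func meanVWC(_ x: TF) -> (value: TF, chain: (TF) -> TF) {
    return (value: x.mean(),
            chain: { dOut in dOut * TF(repeating: 1 / Float(x.scalarCount), shape: x.shape) })
}

func mseInnerVWC(_ x: TF) -> (value: TF, chain: (TF) -> TF) {
    let sq = squareVWC(x)           // forward pass runs exactly once
    let mn = meanVWC(sq.value)
    return (value: mn.value,
            chain: { dOut in sq.chain(mn.chain(dOut)) })   // gradients chain back through
}

let x = TF([1, 2, 3])
let (value, chain) = mseInnerVWC(x)
print(value)          // ~4.667
print(chain(TF(1)))   // [0.667, 1.333, 2.0]
```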
So given that we have these like infinitesimally tiny little things, let's talk about applying 01:53:15.480 |
I'll go pretty quickly because the details aren't really important. 01:53:20.040 |
And so we're using the same thing as ReLU grad from before. 01:53:25.400 |
Here's the LIN grad using the PyTorch style of doing this. 01:53:29.160 |
And so all we're doing is we're pulling together the forward computation in the value thing 01:53:35.160 |
And then we're doing this backward computation here. 01:53:38.280 |
So can I just talk about this difference because it's really interesting because this is the 01:53:42.940 |
version that Silva and I wrote when we just pushed it over from PyTorch. 01:53:49.600 |
And we actually did the same thing that Chris just did, which is we avoided calculating things twice. 01:53:58.560 |
And the way we did it was to cache away in in.grad and out.grad the intermediate values 01:54:08.760 |
so that we could then use them again later without recalculating them. 01:54:12.760 |
Now what Chris is showing you here is doing the exact same thing but in a much more automated way. 01:54:24.520 |
We're having to kind of use this kind of very heuristic, hacky, one at a time approach of 01:54:29.640 |
saying what do I need at each point, let's save it away in something or give it a name 01:54:36.800 |
And also without any mutation, this functional approach is basically saying let's package up a closure 01:54:42.600 |
of everything we need and hand it over to everything that needs it. 01:54:47.480 |
And so that way we never had to say what are we going to need for later. 01:54:54.920 |
You'll see all the steps are here out times blah dot transposed, out times blah dot transposed, 01:55:00.760 |
But we never had to think about what to cache away. 01:55:03.960 |
And so this is not something I would want to write ever again, manually, personally. 01:55:11.480 |
But the advantage of this is it's really mechanical and it's very structured. 01:55:16.120 |
And so when you write MSE, the full MSE, what we can do is we can say, well, it's that subtraction, 01:55:21.400 |
then it's that dot squared dot mean, and then on the backwards pass we have to undo those steps in reverse. 01:55:28.360 |
And so it's very mechanical how it plugs together. 01:55:31.280 |
Now we can write that forward and backward function and it looks very similar to what 01:55:34.400 |
the manual version of the PyTorch thing looked like where you're calling these functions 01:55:38.920 |
and then in the backward version you start out with one because the gradient of the loss 01:55:43.520 |
with respect to itself is one, which now I understand, thanks to Jeremy. 01:55:49.040 |
And then they chain it all together and you get the gradients. 01:55:51.800 |
And through all of this work, again, what we've ended up with is we've gotten the forward 01:55:55.680 |
and backwards pass, we get the gradients of the thing, and now we can do optimizers and 01:56:01.920 |
>> I just want to mention something, like what Chris was saying about this one thing here. 01:56:07.600 |
Chris and I took a really long time to get to this point and we found it extremely 01:56:15.120 |
difficult, and at every point up until the point where it was done, we were totally sure we would never get there. 01:56:22.080 |
And so like, please don't worry that there's a lot here and that you might be feeling the 01:56:27.880 |
same way Chris and I did, but yeah, you'll get there, right? 01:56:36.160 |
It's okay if this seems tricky, but just go through each step one at a time. 01:56:40.120 |
So again, this is talking about the low-level math-y stuff that underlies calculus. 01:56:45.440 |
And so the cool thing about this, though, from the Swift perspective is this is mechanical. 01:56:53.120 |
And so one of the things that we've talked about a lot in this course is the idea of 01:56:59.000 |
primitives: they're the atoms of the universe, and then there are the things you build on top of them. 01:57:03.440 |
And so there are the atoms of the universe for Tensor and the atoms of the universe for Float, and we've seen some of them. 01:57:07.800 |
And so we've seen multiply and we've seen add on floats. 01:57:10.880 |
Well, if you look at the primitives of the universe for tensor, they're just methods 01:57:15.000 |
and they call the raw ops that we showed you last time, right? 01:57:18.240 |
And so if you go look at the TensorFlow APIs, what you'll see is those atoms have this thing 01:57:26.440 |
that Swift calls a VJP, for weird reasons (it stands for vector-Jacobian product). 01:57:30.960 |
This defines exactly the mechanical thingy that we showed you. 01:57:34.720 |
And so the atoms know what their derivatives are and the compiler doesn't have to know 01:57:39.240 |
about the atoms, but that means that if you want to, you can introduce new atoms. 01:57:45.380 |
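(For the curious, registering a derivative for a new atom looks roughly like this. The sketch assumes a Swift toolchain with the experimental differentiable-programming feature, imported as _Differentiation; the exact attribute spellings changed between the Swift for TensorFlow toolchains used in the course and later releases, so treat this as the shape of the idea rather than the notebook's exact code.)

```swift
import _Differentiation

// A new "atom": the compiler knows nothing about its math...
func cube(_ x: Float) -> Float {
    return x * x * x
}

// ...until we register its VJP: the forward value plus a pullback closure.
@derivative(of: cube)
func cubeVJP(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
    return (x * x * x, { upstream in 3 * x * x * upstream })
}
```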
The payoff of this now, though, is you don't have to deal with any of this stuff. 01:57:53.320 |
So here's MSE inner and it just does .squared().mean(). 01:57:58.440 |
And I say make it differentiable and I can actually get that weird thing, that chainer 01:58:04.040 |
thing directly out of it, and I can get direct low-level access if for some reason I ever need it. 01:58:11.000 |
Generally you don't, and that's why you just say give me the gradient, or give me the value with the gradient. 01:58:17.600 |
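(Calling it looks something like this, under the same toolchain assumption as above; the scalar closure here is just a stand-in for the notebook's mseInner.)

```swift
import _Differentiation

let input: Float = 3.0

// Usually all you want is the value and the gradient together...
let (value, grad) = valueWithGradient(at: input) { x in x * x }
print(value, grad)     // 9.0 6.0

// ...but the lower-level value-with-pullback "chainer" is there if you need it.
let (v, pullback) = valueWithPullback(at: input) { x in x * x }
print(v, pullback(1))  // 9.0 6.0
```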
And the cool thing about this is this all stacks up from very simple things and it composes 01:58:24.680 |
And if you want to, you can now go hack up the internals of the system and play around 01:58:29.520 |
with the guts and it's exposed and open for you. 01:58:32.440 |
But if you're like me, at least, you would stay away from it and just write math. 01:58:36.440 |
Well, I mean, sometimes we do need it, right? 01:58:38.300 |
So you'll remember when we did the heat maps, right, those heat maps, we actually had to 01:58:45.880 |
dive into registering a backward callback in PyTorch and grab the gradients and then 01:58:53.320 |
use those in our calculations, and so there's plenty of stuff we come across where you actually do need it. 01:58:59.000 |
Yeah, and there are some really cool things you can do too. 01:59:01.800 |
So now we ended up with a model, and so this is something that I had never got around to before. 01:59:10.440 |
Here we're implementing it with matmuls and with the lin function, the ReLUs and things like that. 01:59:16.160 |
The bad thing about defining a forward function like this is you get tons of arguments to it. 01:59:20.360 |
And so some of these arguments are things that you want to feed into the model. 01:59:23.280 |
Some of these things are parameters and so as a refactoring, what we can do is we can 01:59:27.440 |
introduce a struct, you might be surprised, that puts all of our parameters into it. 01:59:31.720 |
So here we have my model and we're saying it is differentiable and what differentiable 01:59:35.700 |
means is it has a whole bunch of floating point stuff in it and I want to get the gradients for all of it. 01:59:44.280 |
So now I can shove all those arguments into the struct, it gives me a nice capsule to 01:59:48.640 |
deal with and now I can use the forward function on my model. 01:59:54.800 |
This is starting to look nicer, this is more familiar and I can just do math and I can 01:59:59.040 |
use w1 and b1 and these are just values defined on our struct. 02:00:03.920 |
Now I can get the gradient with respect to the whole model and our loss. 02:00:09.120 |
And all of this is building up on top of all those different primitives that we saw before 02:00:13.680 |
that we built, and the chain rule and all these things, so that now we can say hey, give us the 02:00:20.920 |
gradient with respect to the model, given x-train and y-train, and we get all the gradients of all the parameters. 02:00:29.960 |
You can see it all calling the little functions that we wrote and it's all pretty fast. 02:00:35.640 |
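(A minimal sketch of that idea on plain Floats, again assuming the _Differentiation toolchain; MyModel, w, and b are invented for illustration rather than being the notebook's actual definitions.)

```swift
import _Differentiation

struct MyModel: Differentiable {
    var w: Float
    var b: Float

    @differentiable(reverse)
    func forward(_ x: Float) -> Float {
        return w * x + b        // a one-parameter "lin" layer
    }
}

let model = MyModel(w: 2.0, b: 0.5)
let (xTrain, yTrain): (Float, Float) = (3.0, 7.0)

// Gradient of the loss with respect to the whole model struct at once.
let grads = gradient(at: model) { m -> Float in
    let pred = m.forward(xTrain)
    return (pred - yTrain) * (pred - yTrain)   // squared error
}
print(grads.w, grads.b)   // grads is MyModel.TangentVector
```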
Now again, like we were just talking about, this is not something you should do for matmul 02:00:40.460 |
or convolution, but there are reasons why this is cool, and so there are good reasons to want to do it yourself. 02:00:47.480 |
So sometimes the gradients you get out of any auto-diff system will be slow because you 02:00:52.800 |
do a ton of computation and it turns out the gradient ends up being more complicated and slower than it needs to be. 02:01:00.800 |
And so it's actually really nice that you can say hey, here's the forward version of 02:01:03.840 |
this big complicated computation, I'm going to have an approximation that just runs faster. 02:01:08.740 |
Sometimes you'll get numerical instabilities in your gradients and so you can define, again, 02:01:12.120 |
a different implementation of the backwards pass, which can be useful for exotic cases. 02:01:18.680 |
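(One classic example, sketched under the same toolchain assumption and not taken from the notebook: softplus, whose exact derivative is the sigmoid, so the backward pass can be registered directly instead of being derived step by step through the log and exp.)

```swift
import _Differentiation
import Foundation

func softplus(_ x: Float) -> Float {
    return log(1 + exp(x))
}

// Hand-written backward: the derivative of log(1 + exp(x)) is the sigmoid,
// which is cheaper and better behaved than differentiating the forward as written.
@derivative(of: softplus)
func softplusVJP(_ x: Float) -> (value: Float, pullback: (Float) -> Float) {
    let value = log(1 + exp(x))
    let sigmoid = 1 / (1 + exp(-x))
    return (value, { upstream in sigmoid * upstream })
}
```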
There are some people on the far research side of things that want to use learning and 02:01:22.000 |
things like that to learn gradients, which is cool. 02:01:25.360 |
And so having the system where everything is just simple and composes but is hackable is really powerful. 02:01:31.560 |
There are also always going to be limitations of the system. 02:01:35.640 |
Now one of the limitations that we currently have today, which will hopefully be fixed 02:01:39.720 |
by the time the video comes out, is we don't support control flow in auto-diff. 02:01:43.440 |
And so if you do an if or a loop like an RNN, auto-diff will say I don't support that yet. 02:01:49.940 |
But that's okay because you can do it yourself. 02:02:01.680 |
And so what we have implemented here, and we'll talk about layers more in a second, 02:02:07.120 |
is we have this thing called switchable layer. 02:02:09.000 |
And what switchable layer is, is it's just a layer that allows us to have a Boolean toggle to turn it on and off. 02:02:20.760 |
And so Swift auto-diff doesn't currently support if. 02:02:24.160 |
And so when we define the forward function, it's super easy. 02:02:26.360 |
We just check to see if it's on, and if so, we run the forward, otherwise we don't. 02:02:31.080 |
Because it doesn't support that control flow yet, we have to write the backwards pass manually. 02:02:35.560 |
And we can do that using exactly all the stuff that we just showed. 02:02:38.480 |
We implement the value, and we implement the chainer thing. 02:02:42.040 |
And we can implement it by returning the right magic set of closures and stuff like that. 02:02:46.800 |
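(Conceptually, the workaround looks like this sketch, with the same toolchain assumption as before. The notebook's switchable layer wraps a whole layer, whereas this toy version just gates a ReLU; the point is that the pullback does its own branching, because autodiff could not see through the if at the time.)

```swift
import _Differentiation

// Forward is trivial to write with an `if`...
func switchableReLU(_ x: Float, on: Bool) -> Float {
    if on { return max(0, x) }
    return x
}

// ...but the backward pass is supplied manually, branching by hand.
@derivative(of: switchableReLU, wrt: x)
func switchableReLUVJP(_ x: Float, on: Bool) -> (value: Float, pullback: (Float) -> Float) {
    if on {
        return (max(0, x), { upstream in x > 0 ? upstream : 0 })
    }
    return (x, { upstream in upstream })
}
```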
And so it sucks that Swift doesn't support this yet. 02:02:52.160 |
And so for this or anything else, you can go and customize it to your heart's content. 02:02:57.800 |
And I mean, one of the key things here is that Chris was talking about kind of the atoms. 02:03:02.800 |
And at the moment, the atoms are TensorFlow ops, which is way too big an atom. 02:03:10.260 |
But at the point when we're kind of in MLIR world, the atoms are the things going on inside those kernels. 02:03:20.000 |
And so this ability to actually differentiate on float directly suddenly becomes super important. 02:03:25.700 |
Because it means that, like, I mean, for decades, people weren't doing much researchy stuff down at that level. 02:03:32.480 |
And one of the reasons was that none of us could be bothered implementing an accelerated 02:03:38.720 |
version of every damned, you know, CUDA operation that we needed to do the backward pass of as well. 02:03:47.040 |
Nowadays, we only work with a subset of things that like PyTorch and stuff already supports. 02:03:52.880 |
So this is why we're doing this stuff with Swift 02:03:58.360 |
now: these are the foundations of something that in the next year or two will give us 02:04:05.840 |
an all-the-way-down, infinitely hackable, fully differentiable system. 02:04:13.680 |
So we've talked about MatMul, we've talked about Autodiff. 02:04:18.640 |
So layers are now super easy; they just use all the same stuff you've seen. 02:04:22.580 |
And so if you go look at layer, it's a protocol, just like we were talking before. 02:04:26.400 |
And layers are differentiable, like they contain bags of parameters, just like we just saw. 02:04:34.160 |
The requirement inside of a layer is you have to have a call. 02:04:36.680 |
So layers in Swift are callable, just like you'd expect. 02:04:40.320 |
And they have-- they work with any type that's an input or output. 02:04:43.820 |
And what layer says is the input and output types just have to be differentiable. 02:04:51.200 |
And so underneath here, you can see us defining a few different layers. 02:04:54.360 |
So for example, here is the definition of a dense layer. 02:04:58.600 |
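(A stripped-down sketch of the shape of that protocol, under the same toolchain assumption; the real Swift for TensorFlow Layer protocol has more to it, and this toy dense layer works on a single Float rather than a Tensor.)

```swift
import _Differentiation

// The essence of a Layer-style protocol: a differentiable bag of parameters
// with a differentiable call from Input to Output.
protocol ToyLayer: Differentiable {
    associatedtype Input: Differentiable
    associatedtype Output: Differentiable
    @differentiable(reverse)
    func callAsFunction(_ input: Input) -> Output
}

struct ToyDense: ToyLayer {
    var weight: Float
    var bias: Float

    @differentiable(reverse)
    func callAsFunction(_ input: Float) -> Float {
        return weight * input + bias
    }
}

let dense = ToyDense(weight: 2.0, bias: 1.0)
print(dense(3.0))   // 7.0, callable just like you'd expect
```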
And so then now that we've got our layers and we've got our forward pass, that's enough 02:05:03.600 |
to actually allow us to do mini-batch training. 02:05:05.880 |
And I'm not going to go through all this in any detail, other than just to point out that here's our model. 02:05:13.160 |
And it's just a layer, because it's just a differentiable thing that has a call function. 02:05:20.640 |
We can define negative log likelihood and log-sum-exp. Once we've done all that, we're allowed 02:05:26.880 |
to use the Swift for TensorFlow version, because we've done it ourselves. 02:05:30.400 |
And at that point, we can create a training loop. 02:05:32.240 |
So we define accuracy, just like we did in PyTorch, and set up our mini-batches, just like before. 02:05:40.440 |
And at this point, we can create a training loop. 02:05:43.280 |
So we just go through and grab our X and Y and update all of our things. 02:05:48.040 |
You'll notice that there's no torch.no_grad here, and that's because in Swift, you opt in to gradients. 02:05:57.360 |
So you wrap the stuff that wants gradients inside value with gradient. 02:06:04.600 |
Now one really cool thing is that all of these things end up packaged up together, thanks 02:06:11.800 |
to the layer protocol, into a thing called variables. 02:06:22.720 |
So thanks to that, we don't have to write anything else. 02:06:25.640 |
We can just say model.variables -= lr * grad, and it just works. 02:06:25.640 |
Thanks to the magic of protocol extensions, our model got that for free, because we said it conforms to the layer protocol. 02:06:32.080 |
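(Putting the sketch pieces together, one training step in this style looks roughly like the following, reusing the hypothetical MyModel from the earlier sketch under the same toolchain assumption; real code would loop over mini-batches of Tensors and use the protocol-extension shorthand rather than updating each field by hand.)

```swift
import _Differentiation

var model = MyModel(w: 0.0, b: 0.0)
let lr: Float = 0.1
let batch: [(x: Float, y: Float)] = [(1, 2), (2, 4), (3, 6)]

for (x, y) in batch {
    // Opt in to gradients by wrapping the loss computation in valueWithGradient.
    let (loss, grads) = valueWithGradient(at: model) { m -> Float in
        let pred = m.forward(x)
        return (pred - y) * (pred - y)
    }
    // The hand-unrolled version of `model.variables -= lr * grad`.
    model.w -= lr * grads.w
    model.b -= lr * grads.b
    print(loss)
}
```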
Okay, so I think that's about all we wanted to show there. 02:06:45.720 |
So now that we've got that, we're actually allowed to use optimizers, so we can just use one. 02:06:52.560 |
And that gives us a standard training loop, which we can use. 02:06:56.600 |
And then on top of that, we can add callbacks, which I won't go into the details of, but you can look through them yourself. 02:07:05.800 |
And you will find that, let's find them, here we go. 02:07:10.560 |
We'll find a Learner class, which has the same callbacks that we're used to. 02:07:17.960 |
And then, eventually, we'll get to the point where we've actually written a stateful optimizer 02:07:25.280 |
with hyperparameters, again, just like we saw in PyTorch. 02:07:28.160 |
And most of this will now look very familiar. 02:07:30.920 |
We won't look at dictionaries now, but they're almost identical to PyTorch dictionaries, 02:07:36.560 |
So you see, we've got states and steppers and stats, just like in PyTorch. 02:07:41.160 |
And so, eventually, you'll see we have things like the LAMB optimizer written in Swift. 02:07:51.880 |
And for things like squared derivatives, we can use our nice little Unicode to make them a bit more readable. 02:07:57.560 |
And so now we have a function to create an SGD optimizer and a function to create an Adam optimizer. 02:08:03.240 |
We have a function to do one-cycle scheduling. 02:08:08.000 |
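(As a rough illustration of that kind of schedule, here is a one-cycle-shaped learning-rate function in plain Swift: cosine warm-up for the first chunk of training, then cosine annealing back down. The notebook's version differs in its details; this only shows the shape.)

```swift
import Foundation

// Returns a schedule mapping t in [0, 1] (fraction of training done) to a learning rate.
func oneCycleLR(maxLR: Float, pctStart: Float = 0.3) -> (Float) -> Float {
    // Cosine that goes 1 -> 0 as p goes 0 -> 1.
    func cosDown(_ p: Float) -> Float {
        return Float((1 + cos(Double.pi * Double(p))) / 2)
    }
    return { t in
        let startLR = maxLR / 10
        if t < pctStart {
            let p = t / pctStart                     // warm up from startLR to maxLR
            return startLR + (maxLR - startLR) * (1 - cosDown(p))
        } else {
            let p = (t - pctStart) / (1 - pctStart)  // anneal from maxLR toward zero
            return maxLR * cosDown(p)
        }
    }
}

let sched = oneCycleLR(maxLR: 1e-2)
for step in 0...10 { print(sched(Float(step) / 10)) }
```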
And thanks to Matplotlib, we can check that it all works. 02:08:13.680 |
>> So this is really the power of the abstraction, coming back to one of the earlier questions. 02:08:16.240 |
Earlier today we started in C, and we were talking about very abstract things like protocols. 02:08:24.160 |
But when you get those basic things -- and this is one of the reasons why learning Swift 02:08:28.000 |
goes pretty quickly -- you get the basic idea, and now it applies everywhere. 02:08:41.840 |
And I have to say, it's really very similar-looking code to what we have in PyTorch. 02:08:46.640 |
So then by the time we get to notebook 11, other than this hacky workaround for the fact we can't 02:08:52.160 |
do control flow differentiation yet, coming very soon, our XResNet, as you've seen, looks 02:08:59.460 |
very similar, and we can train it in the usual way. 02:09:10.120 |
And Chris spent a couple of decades for us, first of all building a compiler framework 02:09:15.680 |
and then a compiler and then a C compiler and then a C++ compiler and then a new language called Swift. 02:09:20.760 |
And then we came in and -- >> Let me correct you on one minor detail there. 02:09:30.200 |
Amazing people that I got to work with built all of this stuff. 02:09:35.240 |
And likewise, all of these notebooks were built by amazing people that we were lucky enough to work with. 02:09:44.520 |
And then let's look at -- so it's kind of like, thanks to all that work, we then got 02:09:51.960 |
to a point where, 18 months ago, you and I met, you just joined Google, we were at the 02:09:58.960 |
TensorFlow symposium, and I said, what are you doing here? 02:10:02.840 |
I thought, you're a compiler guy, and he said, oh, well, now I'm going to be a deep learning guy. 02:10:13.120 |
So then I complained about how terrible everything was, and Chris said -- so Chris said, I've got this new framework. 02:10:20.040 |
I was like, we need a lot more than a new framework, you know; I described the problems that we've 02:10:24.280 |
talked about with, like, where Python's up to, and Chris said, well, I might actually 02:10:28.720 |
be creating a new language for deep learning, which I was very excited about because I'm 02:10:33.240 |
totally not happy with the current languages we have for deep learning. 02:10:37.600 |
So then 12 months ago, I guess we started asking this question of, like, what if high-level 02:10:43.760 |
API design actually influenced the creation of a differentiable programming language? 02:10:50.800 |
>> And so to me, one of the dreams is when you connect the building of a thing with the teaching of the thing. 02:11:02.240 |
And one of the beautiful things about FastAI is pulling together, both building the framework, 02:11:07.200 |
teaching the framework, and doing research with the framework. 02:11:11.120 |
So next time we caught up, I said, maybe we should try writing FastAI in Swift. 02:11:23.200 |
>> Well, so I think the one thing before this, I'm like, hey, Jeremy, it's starting to work. 02:11:27.920 |
>> And he says, oh, cool, can we ship it yet? 02:11:34.640 |
>> So that's the course where we teach people to use this thing that doesn't exist yet. 02:11:39.600 |
>> And I think I said naively, I like deadlines. 02:11:45.560 |
>> So then one month ago, we created a GitHub repo. 02:11:53.800 |
We sat in a room with the Swift for TensorFlow team. 02:11:56.320 |
And we wrote the first line of the first notebook. 02:11:59.920 |
And you told your team, hey, we're going to rewrite all of the Python notebooks from scratch. 02:12:04.620 |
And they basically said, what have you gotten us into? 02:12:12.760 |
So, I mean, to me, the question is still this, which is, what if high-level API design was 02:12:17.500 |
able to influence the creation of a differentiable programming language? 02:12:20.640 |
And I guess we started answering that question. 02:12:25.040 |
I mean, I think that what we've learned even over the last month is that there's still a lot to figure out. 02:12:29.760 |
And I think this is the kind of thing that really benefits from different kinds of people 02:12:32.940 |
and perspectives and a different set of challenges. 02:12:36.760 |
And just today and yesterday working on data blocks, a breakthrough happened where there's 02:12:42.120 |
an entirely new way to reimagine it as this functional composition that solves a lot of problems. 02:12:48.040 |
And a lot of those kinds of breakthroughs, I think, are still just waiting to happen. 02:12:51.200 |
>> I mean, it's been an interesting process for me, Chris, because we decided to go back and rewrite the Python library from scratch. 02:12:59.600 |
And as we did it, we were thinking, like, what would this look like when we get to Swift? 02:13:04.840 |
And so even as we did the Python library, we created the idea of stateful optimizers. 02:13:16.240 |
But that's also been interesting, I've seen, as an outsider from a distance, that Swift 02:13:21.000 |
syntax seems to be changing thanks to some of this. 02:13:24.920 |
So there are new features in Swift, including callables. 02:13:28.260 |
That's a thing that exists because of Swift for TensorFlow. 02:13:31.720 |
The Python interoperability, believe it or not, we drove that because it's really important 02:13:36.240 |
There's a bunch of stuff like that that's already being driven by this project, and there's more coming. 02:13:40.440 |
And so, like, making it so float can default away to nothing. 02:13:45.840 |
And otherwise, it wouldn't have been a priority. 02:13:47.840 |
>> So I mean, so it's still really, really early days. 02:13:54.520 |
And I think the question, in my mind, is now, like, what will happen when data scientists 02:14:01.400 |
in lots of different domains have access to an infinitely hackable, differentiable language 02:14:07.840 |
along with the world of all of the C libraries, you know, like, what do we end up with? 02:14:14.220 |
Because we kind of -- we're starting from very little in terms of ecosystem, right? 02:14:19.400 |
But, like, there are things in Swift -- we haven't covered, for example, something called 02:14:23.400 |
key paths, but there's this thing called key paths, which might let us write, like, little query languages. 02:14:32.320 |
>> Yeah, give me all the parameters out of this thing and let me do something interesting with them. 02:14:38.200 |
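(Key paths in a nutshell, in plain Swift: properties become first-class values, so generic code can enumerate and update "all the parameters" of a type. The struct here is invented for illustration.)

```swift
// A tiny sketch of Swift key paths, independent of any of the above.
struct ToyParams {
    var w: Float = 1.0
    var b: Float = 0.5
}

let allParams: [WritableKeyPath<ToyParams, Float>] = [\ToyParams.w, \ToyParams.b]

var params = ToyParams()
for kp in allParams {
    params[keyPath: kp] *= 0.9      // e.g. apply weight decay to every parameter
}
print(params.w, params.b)           // 0.9 0.45
```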
>> And so, you know, I guess at this point, what I'm kind of saying to people is, like, pick 02:14:45.240 |
some piece of this that might be interesting in your domain. 02:14:49.760 |
And over the next 12 to 24 months, explore with us, so that, you know, as Chris said, we're 02:14:59.000 |
putting this airplane together whilst it's flying, and by the time 02:15:05.240 |
all the pieces are together, you'll have your domain-specific pieces together, and I think that will be really exciting. 02:15:11.280 |
>> And one of the things that's also really important about this project is it's not cast in stone. 02:15:16.760 |
So we can and we will change it to make it great. 02:15:19.320 |
And to me, we're very much in the phase of let's focus on making the basic ingredients right, 02:15:25.600 |
the things that everybody puts things together with; like, let's talk about what the core of a layer is. 02:15:31.740 |
Let's talk about what all these basic things are. 02:15:37.640 |
>> Yeah, let's -- we can consider Float done. 02:15:40.760 |
But let's actually really focus on getting these right so that then we can build amazing things on top. 02:15:47.000 |
And to me, the thing I'm looking forward to is just innovation. 02:15:51.520 |
Innovation happens when you make things that were previously hard accessible to more people. 02:15:57.280 |
>> So the thing I keep hearing is, how do I get involved? 02:16:00.380 |
So like, I think there's many places you can get involved, but like, to me, the best way 02:16:06.240 |
to get involved is by trying to start using little bits of this in work that you're doing 02:16:12.480 |
or utilities you're building or hobbies you have, you know, just try -- you know, it's 02:16:18.500 |
not so much how do I add some new custom derivative thing into Swift for TensorFlow, but it's 02:16:25.560 |
like, you know, implement some notebook that didn't exist before or take some Python library 02:16:30.400 |
that you've liked using and try and create a Swift version. 02:16:35.640 |
So one of the things when Swift first came out is that a lot of people were blogging about 02:16:39.360 |
their experiences and what they learned and what they liked and what they didn't like, 02:16:42.920 |
and that's an amazing communication channel because the team listens to that, and that's 02:16:46.400 |
a huge feedback loop because we can see somebody was struggling with something, and even over the 02:16:51.040 |
last couple of weeks, when Jeremy complains about something, we're like, oh, that is really annoying. 02:16:55.360 |
Maybe we should fix that, and we do change it, and then progress happens, right? 02:16:59.240 |
>> And so we want that feedback loop in blogs and other kinds of -- 02:17:01.760 |
>> Yeah, it's a very receptive community, very receptive team, for sure. 02:17:05.760 |
Were there any highlight questions that you wanted to ask before we wrapped up, Rachel? 02:17:17.240 |
It's been an absolute honor and absolute pleasure to get to work with you and with your team. 02:17:23.940 |
It's like a dream come true for me and to see what is being built here, and you're always 02:17:30.160 |
super humble about your influence, but, I mean, you've been such an extraordinary influence 02:17:35.040 |
in all the things that you've helped make happen, and I'm super thrilled for our little 02:17:41.940 |
community that you've -- let us piggyback on yours a little bit. 02:17:48.240 |
>> Oh, and from my perspective, as a tool builder, tool builders exist because of users, 02:17:55.380 |
and I want to build a beautiful thing, and I think everybody working on the project wants 02:17:58.520 |
to build something that is really beautiful, really profound, that enables people to do things they couldn't do before. 02:18:05.200 |
>> I think we're already seeing that starting to happen, so thank you so much, and thanks