
Lesson 14 (2019) - Swift: C interop; Protocols; Putting it all together


Chapters

0:00 Intro
0:25 Overview
0:47 Shoutouts
1:21 Package cache
2:52 Image processing kernels
3:26 XLA
4:13 Fusion nodes
4:41 The big question
7:49 MLIR
9:36 The Problem
11:44 Tensor Comprehensions
14:35 Summary
14:51 The future
16:24 Audio processing
23:47 C
27:10 C header files
29:58 Inline functions
30:36 C compiler
32:25 Example
39:06 OpenCV
40:16 SwiftCV
44:48 Dynamically linked or statically linked
45:24 How much C do you need
47:03 Why I'm teaching this
53:36 OpenCV Data Blocks
57:55 Layer


00:00:00.000 | Welcome to the final lesson of this section of 2019.
00:00:08.060 | Although I guess it depends on the videos, what order we end up doing the extra ones.
00:00:12.500 | This is the final one that we're recording live here in San Francisco.
00:00:16.720 | Anyway, lesson 14.
00:00:19.060 | Lesson two of our special Swift episodes.
00:00:25.240 | This is what we'll be covering today.
00:00:27.600 | I won't read through it all, but basically we're going to be filling in the gap between
00:00:31.760 | matrix multiplication and training ImageNet with all the tricks.
00:00:38.080 | And along the way, we're going to be seeing a bunch of interesting Swift features and
00:00:42.740 | actually seeing how they make our code cleaner, safer, faster.
00:00:47.240 | I want to do a special shout out to a couple of our San Francisco study group members who
00:00:52.640 | have been particularly helpful over the last couple of weeks since I know nothing about
00:00:57.640 | Swift.
00:00:58.640 | It's been nice to have some folks who do, such as Alexis, who has been responsible actually
00:01:04.960 | for some of the most exciting material you're going to see today.
00:01:09.840 | And he is the CTO at Topology Eyewear.
00:01:13.040 | So if you need glasses, you should definitely go there and get algorithmically designed
00:01:16.960 | glasses, literally.
00:01:18.480 | So that's pretty cool.
00:01:19.480 | So thanks, Alexis, for your help.
00:01:20.680 | And thanks also to Pedro, who has almost single-handedly created this fantastic package cache that
00:01:28.520 | we have so that in your Jupyter Notebooks, you can import all the other modules that
00:01:35.120 | we're using and other exported modules from the Notebooks, and it doesn't have to recompile
00:01:38.560 | at all.
00:01:39.560 | And so that's really thanks to Pedro.
00:01:41.720 | And I actually am a customer of his as well, or at least I was when I used an iPhone.
00:01:46.320 | He's the developer of Camera Plus, which is the most popular camera application on the
00:01:50.480 | App Store, literally.
00:01:52.480 | And back when I used an iPhone, I loved that program.
00:01:54.880 | So I'm sure version two is even better, but I haven't tried version two.
00:01:58.160 | So you can use his camera while looking through your Topology Eyewear glasses.
00:02:02.920 | All right.
00:02:05.160 | So thanks to both of you.
00:02:07.280 | And where we left off last week was that I made a grand claim -- well, I pointed out
00:02:13.760 | a couple of things.
00:02:14.760 | I pointed out through this fantastic Halide video that actually running low-level kind
00:02:21.800 | of CUDA-kernel-y stuff fast is actually much harder than just running a bunch of for loops
00:02:26.920 | in order.
00:02:27.920 | And I showed you some stuff based on Halide, which showed some ways that you can write
00:02:33.680 | it simply, and some ways you could make it run fast.
00:02:36.520 | And then I made the bold claim that being able to do this on the GPU through Swift is
00:02:42.240 | where we're heading.
00:02:43.540 | And so to find out how that's going to happen, let's hear it directly from Chris.
00:02:48.640 | >> Sure.
00:02:49.640 | Thanks, Jeremy.
00:02:51.000 | So we will briefly talk about this.
00:02:53.460 | So we went through this video, and the author of Halide gave a great talk about how in image
00:02:58.680 | processing kernels, there's actually a lot of different ways to get the computer to run
00:03:02.840 | this, and they all have very different performance characteristics, and it's really hard to take
00:03:06.860 | even a two-dimensional blur and make it go fast.
00:03:10.760 | But we're doing something even harder.
00:03:12.120 | We're not talking about two-dimensional images, we're talking about 5D matrices and tensors
00:03:16.360 | and lots of different operations that are composed together, and hundreds or thousands
00:03:20.120 | of ops, and trying to make that all go fast is really, really, really, really hard.
00:03:26.680 | So if you wanted to do that, what you'd do is you'd write a whole new compiler to do
00:03:29.640 | this, and it would take years and years of time.
00:03:32.120 | But fortunately, there's a great team at Google called the XLA team that has done all this
00:03:36.440 | for us.
00:03:37.440 | And so what XLA is, is it's exactly one of those things.
00:03:41.000 | It's something that takes in this graph of tensor operations, so things like convolutions
00:03:46.240 | and matmuls and adds and things like that.
00:03:49.480 | It does low-level optimizations to allocate buffers, to take these different kernels and
00:03:55.420 | fuse them together, and then it generates really high-performance code that runs on
00:03:59.280 | things like CPUs, GPUs, or TPUs, which are crazy-fast high-performance accelerators that
00:04:06.440 | Google has.
00:04:07.440 | And so XLA does all this stuff for us now, which is really exciting.
00:04:12.600 | And if you take the running batch norm example that we left off with, and we were talking
00:04:16.600 | about, this is the graph that XLA will generate for you.
00:04:19.760 | And this is generated from Swift code, actually.
00:04:23.680 | And so you can see here what these darker boxes are, is they're fusion nodes, where
00:04:27.680 | it's taken a whole bunch of different operations, pushed them together, gotten rid of memory
00:04:31.800 | transfers, pushed all the loops together.
00:04:35.600 | And the cool thing about this is, this is all existing shipping technology that TensorFlow
00:04:39.920 | has now.
00:04:42.760 | There's a big question, though, and a big gotcha, which is, this only works if you have
00:04:46.680 | graphs.
00:04:47.680 | And with TensorFlow 1, that was pretty straightforward, because TensorFlow 1 was all about graphs.
00:04:51.400 | Jeremy talked about the shipping, shipping, shipping, ship, ship, ship, shipping thingy,
00:04:57.300 | ship, ship, shipping, ship, ship, ship, I don't know.
00:05:00.440 | My recursion's wrong.
00:05:03.560 | And so with TensorFlow 1, it was really natural.
00:05:06.040 | With TensorFlow 2, with PyTorch, there's a bigger problem, which is, with eager mode,
00:05:09.720 | you don't have graphs.
00:05:10.720 | That's the whole point, is you want to have step at a time, you run one op at a time,
00:05:14.460 | and so you don't get the notion of these things.
00:05:17.760 | So what the entire world has figured out is that there's two basic approaches of getting
00:05:22.560 | graphs from eager mode.
00:05:24.440 | There's tracing, and there's different theories on tracing.
00:05:27.560 | There's staging and taking code and turning it into a graph algorithmically.
00:05:32.760 | And PyTorch and TensorFlow both have similar but different approaches to both of these
00:05:37.160 | different things.
00:05:38.400 | The problem with these things is they all have really weird side effects, and they're
00:05:41.440 | very difficult to reason about.
00:05:42.880 | And so if Swift for TensorFlow is an airplane, we've taken off, and we're just coming off
00:05:50.480 | the runway, but we're still building all this stuff into Swift for TensorFlow as the plane
00:05:54.640 | is flying.
00:05:55.920 | And so we don't actually have this today.
00:05:58.680 | The team was working on the demo, and it just didn't come together today.
00:06:03.940 | But this is really cool.
00:06:05.580 | And so one of the problems with tracing, for example, is that in PyTorch or in TensorFlow
00:06:10.920 | Python, when you trace, if you have control flow in your model, it will unroll the entire
00:06:15.880 | control flow.
00:06:16.920 | And so if you have an RNN, for example, it will unroll the entire RNN and make one gigantic
00:06:21.920 | thing.
00:06:22.920 | And some control flow you want to ignore, some control flow you want to keep in the graph.
00:06:25.920 | And so having more control over that is something that we think is really important.
00:06:29.440 | So, Chris, this "nearly there" is as of the end of April.
00:06:33.980 | This video will be out somewhere around mid to late June.
00:06:38.800 | I suspect it will be up and running by then, and if it's not, you will personally go to
00:06:42.200 | the house of the person watching the video and fix it for them.
00:06:45.760 | So here's the deal.
00:06:48.080 | In two, three months, so that's July, look on the TensorFlow main page.
00:06:56.200 | There should be a co-lab demo showing this.
00:06:58.920 | So we'll see how the future -- >> And there should be a notebook in the
00:07:04.000 | harebrain repo that will be called batch norm or something.
00:07:08.680 | And we'll have an XLA version of this running.
00:07:12.400 | >> And so Swift also has this thing called graph program extraction.
00:07:14.680 | The basic idea here is where autograph and torch script are doing these things where
00:07:18.440 | they're kind of like Python but kind of not, and Jeremy was talking before about how you
00:07:23.040 | had a comment in the wrong place and torch script will fall over and it's not -- it kind
00:07:26.600 | of looks like Python but really, really is not.
00:07:29.680 | With Swift, we have a compiled, reasonable language, and so we could just use compiler
00:07:34.160 | techniques to form a graph, pull it out for you.
00:07:36.600 | And so a lot of things that are very magic and very weird are just very natural and plug
00:07:40.760 | into the system.
00:07:41.760 | So I'm very excited about where all this comes.
00:07:44.080 | But for right now, this doesn't exist.
00:07:46.640 | The airplane is being built.
00:07:48.160 | So one last thing that doesn't exist, because Jeremy wanted to talk about this, he's very
00:07:53.000 | excited, is there's this question about what does MLIR relate to XLA, what is all this
00:07:57.600 | stuff going on, what does this make sense for TensorFlow?
00:08:00.720 | And the way I look at this is XLA is really good if you have -- if you want high performance
00:08:04.840 | with these common operators like matrix multiplication, convolution, things like that.
00:08:09.400 | These operators can be combined in lots of different ways.
00:08:11.480 | And so these are the primitives that a lot of deep learning is built out of.
00:08:16.880 | And XLA is really awesome for high performance, particularly weird accelerators.
00:08:21.360 | But there's a catch with this, because one of the things that power deep learning is
00:08:26.320 | the ability to innovate in many of these ways.
00:08:28.360 | And so depth-wise convolutions came out, and suddenly with many fewer parameters, you can
00:08:32.800 | get really good accuracy wins, and you couldn't do that if you just had convolution.
00:08:37.160 | Yeah. And like on the other hand, like depth-wise convolutions are a specific case of grouped
00:08:43.880 | convolutions.
00:08:44.880 | And the reason we haven't been talking about grouped convolutions in class is that so far
00:08:48.440 | no one's really got them running quickly.
00:08:51.520 | And so there's this whole thing that like somebody wrote a paper about three years ago,
00:08:56.080 | which basically says, hey, here's a way to get all the benefit of convolutions, but much,
00:09:00.120 | much faster.
00:09:01.360 | And we're still -- you know, the practical deep learning for coders course still doesn't
00:09:05.080 | teach them, because they're still not practical, because no one's got them running quickly.
00:09:09.040 | And so, as we've been talking about this whole course, the goal with this whole platform
00:09:11.040 | is to make it an infinitely hackable platform.
00:09:15.280 | And so if it's only infinitely hackable down to the convolution, or you give up all performance
00:09:19.120 | and run on a CPU, well, that's not good enough.
00:09:20.880 | And so what MLIR is about is there's multiple different aspects of the project, but I think
00:09:25.800 | one Jeremy's most excited about is, what about custom ops, right?
00:09:30.120 | How can we make it so you don't bottom out at matmul and convolution, and so you get that
00:09:33.400 | hackability to invent the next great convolution, right?
00:09:36.400 | So the cool thing about this is that this is a solved problem.
00:09:39.720 | The problem is all the problems -- all the solutions are in these weird systems that
00:09:43.600 | don't talk to each other, and they don't work well together, and they're solving different
00:09:46.520 | slices of it.
00:09:47.520 | So Halide, for example, is a really awesome system if you're looking for 2D image processing
00:09:51.600 | algorithms, right?
00:09:53.160 | That doesn't really help us.
00:09:54.920 | Other people have built systems on top of Halide to try to adapt it, and things like
00:09:57.800 | that.
00:09:58.800 | But this is really not a perfect solution.
00:10:02.000 | There's other solutions, so PlaidML was recently acquired by Intel, and they have a lot of
00:10:06.120 | really cool compiler technology that is kind of in their little space.
00:10:09.400 | TVM is a really exciting project, also building on Halide, pulling it together with its own
00:10:13.800 | secret sauce of different things.
00:10:15.280 | And it's not just the compiler technology.
00:10:17.000 | It's also in each of these cases they've built some kind of domain-specific language to make
00:10:20.360 | it easier for you, the data scientist, to write what you want in a quick and easy way.
00:10:25.600 | Right.
00:10:26.600 | And so -- and often what happens here is that each of these plug into the deep learning
00:10:30.720 | frameworks in different ways, right?
00:10:32.800 | And so what you end up having to do is you end up in a mode of saying, TVM's really good
00:10:36.840 | for this set of stuff.
00:10:38.240 | And Tensor Comprehensions, which is another cool research project, is good at these kinds
00:10:42.880 | of things.
00:10:43.880 | And so I have to pick and choose the framework I want to use based on which one they happen
00:10:47.920 | to build into, which is not very --
00:10:49.800 | >> And again, we don't teach this in practical deep learning for coders because it's not
00:10:52.600 | practical yet.
00:10:53.600 | You know, these things are generally research-quality code.
00:10:55.880 | They generally don't integrate with things like PyTorch.
00:10:58.860 | They generally require lots of complex build steps.
00:11:01.440 | >> The compile time is often really slow.
00:11:04.400 | They work really great on the algorithm and the paper, but they kind of fall apart on
00:11:07.280 | things that aren't.
00:11:08.280 | All those kinds of problems.
00:11:09.520 | So our goal and our vision here with TensorFlow, but with Swift for TensorFlow also, is to make
00:11:15.120 | it so that you can express things at the highest level of abstraction you can.
00:11:18.120 | So if you have a batch norm layer, totally go for that batch norm layer.
00:11:21.640 | If that's what you want, use it, and you're good.
00:11:23.680 | If you want to implement your own running batch norm, you can do that in terms of matmuls
00:11:27.120 | and adds and things like that, fine.
00:11:30.240 | If you want to sink down further, you can go down to one of these systems.
00:11:34.200 | If you want to go down further, you can write assembly code for your accelerator if that's
00:11:37.800 | the thing you're into.
00:11:38.800 | But you should be able to get all the way down and pick that level of abstraction that
00:11:42.400 | allows you to do what you want to do.
00:11:44.320 | And so I just want to give Tensor Comprehensions as one random example of how cool this can be.
00:11:49.720 | So this is taken straight out of their paper.
00:11:51.760 | This is not integrated.
00:11:53.240 | But Tensor comprehensions gives you what is basically like Einstein notation on total
00:11:57.400 | steroids.
00:11:58.400 | It's like einsum.
00:11:59.400 | Yes, good point.
00:12:00.400 | It's like einsum, but taken to a crazy extreme level.
00:12:07.160 | And what Tensor comprehensions is doing is you write this very simple, this very simple
00:12:11.960 | code.
00:12:12.960 | It's admittedly kind of weird, and it has magic, and the syntax isn't the important
00:12:17.320 | thing.
00:12:18.320 | But you write pretty simple code, and then it does all this really hardcore compiler
00:12:22.600 | stuff.
00:12:23.600 | So it starts out with your code, it then fuses the different loops, because these two things
00:12:27.480 | expand out to loops.
00:12:28.480 | It does inference on what are the ranges for all the loops and what the variables that
00:12:33.080 | you're indexing into the arrays do.
00:12:35.280 | Then fuse and tile these things.
00:12:36.840 | Fuse, tile, then sink the code to make it so the inner loops can be vectorized.
00:12:43.040 | This is actually a particularly interesting example, because this thing here, gemm, is
00:12:47.800 | a generalized matrix-matrix product.
00:12:50.240 | This is actually the thing on which large amounts of deep learning and linear algebra
00:12:54.800 | and stuff is based on.
00:12:55.800 | So a lot of the stuff we write ends up calling a gemm.
00:12:59.000 | And the fact that you can write this thing in two lines of code, if you look inside most
00:13:04.200 | linear algebra libraries, there will be hundreds or thousands of lines of code to implement
00:13:08.800 | something like this.
00:13:09.800 | So the fact that you can do this so concisely is super cool.
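For reference, here is a plain-loop sketch (unoptimized Swift, purely for illustration) of what a gemm actually computes, C = alpha*A*B + beta*C. The hundreds of lines in real libraries come from tiling and vectorizing exactly these loops, which is what systems like Tensor Comprehensions generate for you:

```swift
// Illustrative only: a naive generalized matrix-matrix product, C = alpha*A*B + beta*C.
func gemm(_ alpha: Float, _ a: [[Float]], _ b: [[Float]],
          _ beta: Float, _ c: inout [[Float]]) {
    let m = a.count, k = b.count, n = b[0].count
    for i in 0..<m {
        for j in 0..<n {
            var acc: Float = 0
            for p in 0..<k { acc += a[i][p] * b[p][j] }
            c[i][j] = alpha * acc + beta * c[i][j]
        }
    }
}
```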
00:13:13.200 | And so the idea that then we could do nice little tweaks on convolutions or whatever
00:13:20.200 | in similar amounts of code is something that I get very excited about.
00:13:23.720 | Yeah.
00:13:24.720 | Me too.
00:13:25.720 | And the other thing to consider with this is that, again, generating really good code
00:13:28.720 | for this is hard.
00:13:30.360 | But once you make it so that you separate out the code that gets compiled from the
00:13:34.980 | algorithms that get applied to it, now you can do search over those algorithms.
00:13:39.040 | Now you can apply machine learning to the compiler itself.
00:13:42.040 | And now you can do some really cool things that open up new doors.
00:13:46.080 | So I mean, that's actually really interesting because in the world of databases, which is
00:13:50.800 | a lot more mature than the world of deep learning, this is how it works, right?
00:13:54.160 | You have a DSL, normally called SQL, where you express what you want, not how to get
00:13:59.280 | there.
00:14:00.280 | And then there's a thing called a query analyzer or query compiler or query optimizer that figures
00:14:05.000 | out the best way to get there.
00:14:06.600 | And it'll do crazy stuff like genetic algorithms and all kinds of heuristics.
00:14:10.880 | And so like what we're seeing here is we'll be able to do that for deep learning, our
00:14:18.120 | own DSLs and our own optimizers, not deep learning optimizers, but more like database
00:14:22.760 | optimizers.
00:14:23.760 | Yeah.
00:14:24.760 | So it's going to be really exciting.
00:14:25.760 | So we're building all this.
00:14:26.760 | The MLIR part of this is longer time horizon.
00:14:29.080 | This is not going to be done by the time this video comes out.
00:14:31.840 | But this is all stuff that's getting built, and it's all open source, and it's super exciting.
00:14:35.880 | So to overall summarize all this TensorFlow infrastructure stuff, so TensorFlow is deeply
00:14:40.200 | investing in the fundamental parts of the system.
00:14:42.180 | This includes the compiler stuff, also the runtime, op dispatch, the kernels themselves.
00:14:47.040 | There's tons and tons and tons of stuff, and it's all super exciting.
00:14:51.160 | So let's stop talking about the future.
00:14:52.960 | Yeah.
00:14:53.960 | I mean, that's kind of boring.
00:14:54.960 | Like what can we do today?
00:14:55.960 | Yeah, this is very exciting, Chris, that sometime in the next year or two, there'll be these
00:14:59.240 | really fast things.
00:15:00.600 | But I actually know about some really fast languages right now.
00:15:02.880 | Really?
00:15:03.880 | Yeah.
00:15:04.880 | They're called C, C++ and Swift.
00:15:05.880 | Seriously?
00:15:07.880 | Yeah.
00:15:08.880 | Let me show you what I mean.
00:15:09.880 | There are actually languages that we can make run really fast right now.
00:15:15.040 | And it's quite amazing, actually, how easy we can make this.
00:15:22.800 | Like when you say to an average data scientist, hey, you can now integrate C libraries, their
00:15:29.040 | response is not likely to be oh, awesome, right?
00:15:31.960 | Because data scientists don't generally work at the level of C libraries.
00:15:36.680 | But data scientists work in some domain, right?
00:15:40.760 | You work in neuroradiology, image acquisition, or you work in astrophysics or whatever.
00:15:47.880 | And in your domain, there will be many C libraries that do the exact thing that you want to do
00:15:54.960 | at great speed, right?
00:15:57.320 | And currently, you can only access the ones that have been wrapped in Python, and you
00:16:02.360 | can only access the bits that have been wrapped in Python.
00:16:05.640 | What if you could actually access the entire world of software that's been written in C,
00:16:11.520 | which is what most software has been written in, and it's easy enough that, you know, an
00:16:16.800 | average data scientist can do it.
00:16:19.000 | So here's what it looks like, right?
00:16:25.160 | Let's say we want to do audio processing, okay?
00:16:29.440 | And so for audio processing, I'm thinking like, oh, how do I start doing audio processing
00:16:35.520 | and in my quick look around, I couldn't see much in Swift that works on Linux for audio
00:16:40.960 | processing.
00:16:41.960 | >> So you write an MP3 decoder from scratch, right?
00:16:43.840 | >> Yeah, I thought about doing an MP3 decoder from scratch, but then I --
00:16:46.920 | >> That's signal processing.
00:16:48.160 | >> I figured, like, people have MP3 decoders already.
00:16:51.240 | What are they doing?
00:16:52.240 | And I looked it up on the Internet, and it turns out there's lots of C libraries that
00:16:54.880 | do it.
00:16:55.880 | And one popular one, apparently, is called SOX, right?
00:16:59.600 | And I'm a data scientist, I'm not an audio processing person, so this is my process last
00:17:03.400 | week was, like, C, library, MP3, decode, and it says use SOX.
00:17:09.800 | So look at this.
00:17:10.800 | I've got something here that says import SOX.
00:17:15.160 | And then it says init SOX, and then it says read SOX audio.
00:17:20.160 | Where did this come from?
00:17:21.600 | Well, this comes from a library.
00:17:27.680 | Here it is, sound exchange.
00:17:29.840 | This is what C library home pages tend to look like, they tend to be very 90s.
00:17:35.380 | And basically, I looked at the documentation, and C library documentation tends to be less
00:17:40.680 | than obvious to kind of see what's going on, but, you know, you just kind of have to learn
00:17:48.240 | to read it, just like you learn to read Python documentation.
00:17:50.520 | So basically it says you have to use this header, and then these are the various functions
00:17:54.340 | you can call.
00:17:55.340 | There's something called edit, and there's something called open.
00:17:57.280 | So here's what I did.
00:17:59.280 | I jumped into VIM, and I created a directory, and I called it Swift SOX.
00:18:09.560 | And in that directory, I created a few things.
00:18:12.680 | I created a file called package.swift, and this is the thing that defines a Swift package.
00:18:21.680 | A Swift package is something that you can import, and you can actually type swift package
00:18:26.740 | init, and it will kind of create this skeleton for you.
00:18:29.760 | Personally, my approach to wrapping a new C library is to always copy an existing C library
00:18:35.740 | folder that I've created, and then just change the name, because every one of them has the
00:18:39.280 | same three files, right?
00:18:41.440 | So this is file number one, you have to give it a name, and then you have to say what's
00:18:44.680 | the name of the library in C. And in SOX, the name of the library is SOX.
00:18:49.800 | Part two is you have to create a file called Sources/sox/module.modulemap, and it contains
00:19:00.040 | always these exact lines of code, again, where you just change the word SOX, and the word
00:19:04.960 | SOX, and the word SOX.
00:19:06.780 | So it's not rocket science.
00:19:08.360 | >> So what this is doing is this is saying that you want to call it SOX in Swift.
00:19:13.160 | They called the header sox.h for some reason.
00:19:15.680 | >> Well, I actually call it a SOX umbrella header, which we'll see in a moment.
00:19:18.960 | And then, but the library isn't -- it gets linked in by libsox.
00:19:22.720 | >> Yeah, exactly.
00:19:23.720 | >> So all these things in C can be different.
00:19:25.840 | >> Yeah.
00:19:26.840 | So, you know, most of the time, we can make them look the same.
00:19:30.560 | And so then the final third file that you have to create is the .H file.
00:19:35.160 | And so you put that in Sources/sox, and I call it the sox umbrella header, and that contains
00:19:42.160 | one line of code, which is the header file, which as you saw from the documentation, you
00:19:46.760 | just copy and paste it from there.
00:19:48.480 | So once you add these three files, you can then do that, okay?
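To make the shape of those three files concrete, here is a minimal sketch of such a wrapper package; the exact names and layout are illustrative rather than copied from the course repo:

```swift
// swift-tools-version:4.2
// File 1 of 3: Package.swift -- declares a "system library" target, i.e. a module
// that is just a description of a C library already installed on the machine.
import PackageDescription

let package = Package(
    name: "SwiftSox",
    products: [.library(name: "SwiftSox", targets: ["sox"])],
    targets: [.systemLibrary(name: "sox")]
)

// File 2 of 3: Sources/sox/module.modulemap (not Swift, so shown here as a comment).
// It names the umbrella header and the C library to link against:
//
//     module sox {
//         umbrella header "sox-umbrella.h"
//         link "sox"
//         export *
//     }
//
// File 3 of 3: Sources/sox/sox-umbrella.h, a single line: #include <sox.h>
```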
00:19:56.000 | And so now I can import SOX.
00:19:58.120 | And now this thing, this C function is available to Swift, right?
00:20:06.840 | And so this is kind of wild, right?
00:20:08.600 | Because suddenly, like a lot of what this impractical deep learning for coders course
00:20:12.080 | is about is like opening doors that weren't available to us as data scientists before
00:20:17.480 | and thinking what happens if you go through that door, right?
00:20:20.120 | So what happens if you go through the door where suddenly all of the world's C libraries
00:20:24.280 | are available to you?
00:20:25.520 | What can you do in your domain that nobody was doing before?
00:20:29.180 | Because there wasn't any Python libraries like that.
00:20:32.660 | So I -- and so what I tend to do is write little Swift functions that wrap the C functions
00:20:39.760 | to make them look nice.
00:20:40.760 | So here's init SOX, which checks for the value I'm told the docs say to check for.
00:20:46.120 | And SOX open read, for some reason you have to pass nil, nil, nil, so I just wrap that.
00:20:51.420 | And so now I can say read SOX audio.
00:20:58.880 | And so that's going to return some kind of structure.
00:21:03.640 | And so you have to read the documentation to find out what it is or copy and paste somebody
00:21:06.840 | else's code.
00:21:08.720 | Very often the thing that's returned to you is going to be a C pointer.
00:21:13.160 | And that's no problem.
00:21:15.200 | Swift is perfectly happy with pointers.
00:21:16.720 | You just say point E to grab the thing that it's pointing at.
00:21:20.000 | And according to the documentation, there's going to be something called signal, which
00:21:24.520 | is going to contain things like sample rate, precision, channels, and length.
00:21:30.840 | And so I can -- let's run those two.
00:21:36.240 | So I can run that, and I can see I've opened an audio file with a C library without any
00:21:41.960 | extra stuff.
00:21:42.960 | >> One of the things you can do is you can type SOX tab, and now here's all the stuff
00:21:50.280 | that's coming in from that header file.
00:21:53.040 | >> That's wild.
00:21:54.040 | Yeah.
00:21:55.040 | Super cool.
00:21:56.040 | So now I can go ahead and read that.
00:22:00.960 | And this is kind of somewhat similar to Python.
00:22:04.600 | In Python you can open C libraries in theory and work with them.
00:22:09.240 | But I don't do it, almost never do it, because I find that when I try to -- the thing you
00:22:14.240 | get back are these C structures and pointers, which I can't work with in Python in a convenient way.
00:22:21.520 | Or if I do use things like PyBind11, which is something that helps with that, then I
00:22:25.320 | have to create all these make scripts and compile processes, and I just don't bother,
00:22:30.160 | right?
00:22:31.160 | None of us bother.
00:22:32.160 | But in Swift, it's totally fine.
00:22:34.080 | And then the nice thing is we can bring Python and C and Swift together by typing import Python.
00:22:39.920 | >> An unholy marriage.
00:22:41.240 | >> Yeah.
00:22:42.240 | Now we can just say -- we can take our C array and say make NumPy array and plot it, right?
00:22:51.360 | So we're really bringing it all together now.
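A small sketch of that bridge in use; numpy and matplotlib.pyplot are the real Python modules, the data here is made up for illustration:

```swift
import Python   // Swift for TensorFlow's Python bridge (later available as PythonKit)

let np  = Python.import("numpy")
let plt = Python.import("matplotlib.pyplot")

// Hand a plain Swift array across the bridge, get a numpy array back, and plot it.
let samples: [Double] = (0..<100).map { Double($0) * 0.1 }
let arr = np.array(samples)          // Swift Array -> PythonObject wrapping an ndarray
plt.plot(arr)
plt.savefig("plot.png")
```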
00:22:54.200 | And we can even use the ipython.display, and we can hear some audio.
00:23:00.360 | >> Hi, everyone.
00:23:01.360 | I'm Chris.
00:23:02.360 | >> Hi, everyone.
00:23:04.360 | Hi, Chris.
00:23:05.360 | >> Hi, Jeremy.
00:23:06.360 | >> Thank you, Jeremy.
00:23:07.360 | >> My pleasure.
00:23:08.360 | All right.
00:23:09.360 | So --
00:23:10.360 | >> Why did I say this to this again?
00:23:15.360 | [ Laughter ]
00:23:16.360 | >> Okay.
00:23:17.360 | So this is pretty great, right?
00:23:19.360 | We've got Swift libraries, C libraries, Python libraries.
00:23:23.960 | We can bring them all together.
00:23:25.000 | We can do stuff that our peers aren't doing yet.
00:23:28.720 | But what I want to know, Chris, is how the hell is this even possible?
00:23:32.720 | >> Wow.
00:23:33.720 | Okay.
00:23:34.720 | Your guy likes to look under the covers or under the hood.
00:23:38.440 | Where's the -- cool.
00:23:41.440 | So let's talk about this.
00:23:43.640 | C is really a very simple language.
00:23:45.400 | So it should be no problem to do this, right?
00:23:47.640 | So C is two things, actually.
00:23:50.720 | It's really important.
00:23:52.080 | I think you were just talking about why it's actually very useful.
00:23:54.680 | There's tons of code available in C. A lot of that C is really useful.
00:23:58.240 | But C is actually a terrible, crazy, gross language on its own right.
00:24:03.400 | C has all these horrible things in it, like pointers, that are horribly unsafe.
00:24:08.240 | >> And we have a question.
00:24:10.240 | >> Oh.
00:24:11.240 | Let's do it.
00:24:12.240 | >> Is it possible to achieve similar results in Python using something like Cython?
00:24:16.240 | >> Yeah.
00:24:17.240 | Absolutely.
00:24:18.240 | So Cython is a Python-like language which compiles to C.
00:24:27.040 | And I would generally rather write Cython than C for integrating C with Python.
00:24:37.400 | You still kind of -- it's actually easier in a Jupyter notebook because you can just
00:24:41.200 | say %%cython and kind of integrate it.
00:24:43.440 | But as soon as you want to start shipping a module with that thing, which presumably
00:24:48.000 | is the purpose is you want to share it, you then have to deal with, like, build scripts
00:24:52.200 | and stuff like that.
00:24:53.680 | So Cython has done an amazing job of kind of making it as easy as possible.
00:25:00.320 | But I personally have tried to do quite a lot with Cython in the last few months and
00:25:04.680 | ended up swearing off it because it's just still not convenient enough.
00:25:09.680 | I can't quite use a normal debugger or a normal profiler and just ship the code directly.
00:25:16.240 | And it's still -- yeah, it's great for Python if that's what you're working with.
00:25:21.400 | But it's nowhere near as convenient.
00:25:23.480 | I've created Swift C libraries, I created a Swift C library within a week of starting
00:25:28.040 | to use Swift.
00:25:29.040 | It was just very natural.
00:25:30.200 | >> Cool.
00:25:31.200 | And so the thing I want to underscore here is that C is actually really complicated.
00:25:35.640 | C has macros.
00:25:36.640 | It's got this preprocessor thing going on.
00:25:38.160 | It's got bit fields and unions and its weird notion of what arrays are.
00:25:41.680 | It's got volatiles.
00:25:42.680 | It's got all this crazy stuff that the grammar is context sensitive and gross.
00:25:48.680 | And so it's just actually really hard to deal with.
00:25:51.280 | Does it sound like somebody who's been through the process of writing a C compiler and came
00:25:54.920 | out the other side?
00:25:55.920 | >> Well, so the only thing worse than C is C++.
00:26:00.040 | And it has this like dual side of it.
00:26:03.400 | It's both more gross and huge and it's also more important in some ways.
00:26:07.920 | And so Swift doesn't integrate with C++ today, but we want to be able to.
00:26:11.680 | We want to be able to provide the same level of integration that you just saw with C.
00:26:16.360 | But how are we going to do that?
00:26:17.760 | Well, Swift loves C APIs like Jeremy was just saying.
00:26:21.360 | And so we love C APIs because we want you to be able to directly access all this cool
00:26:25.560 | functionality that exists in the world.
00:26:27.560 | And so the way it works as you just saw is we take the C ideas and remap them into Swift.
00:26:32.560 | And so because of that, because they're native pure Swift things, that's where you get the
00:26:37.480 | debugger integration.
00:26:38.480 | That's where you get code completion.
00:26:39.480 | That's where you get all the things you expect to work in Swift talking to dusty deck old
00:26:44.400 | grody C code from the 80s or wherever you got it from, whatever epoch.
00:26:50.680 | And so we also don't want to have wrappers or overhead because that's totally not what
00:26:56.080 | Swift's about.
00:26:57.320 | So Jeremy showed you that usually when you import the C API into Swift, it looks like
00:27:02.180 | a C API.
00:27:03.260 | And so you could, but the nice thing about that is that you can build the APIs you want
00:27:06.960 | to wrap it and you can build your abstractions and make that all good in Swift.
00:27:11.240 | So one of the ways this happens is that inside the Swift compiler, it can actually read C
00:27:16.840 | header files.
00:27:18.280 | And so we don't have a great way to plug this into workbooks quite yet, but Swift can actually
00:27:24.000 | take a C header file like math.h, which has macros.
00:27:27.360 | Here's M_E, because M underscore E is a good way to name e, apparently.
00:27:32.800 | Here's the old school square root.
00:27:34.500 | Here's the sine and cos function, which of course it returns sine and cosine in through
00:27:38.480 | pointers because C doesn't have tuples.
00:27:41.320 | And so when you import all that stuff into Swift, you get M_E as a double that
00:27:48.160 | you can get.
00:27:49.160 | You have square root and you can totally call square root.
00:27:50.400 | You have sine and cos.
00:27:51.400 | You get this unsafe mutable pointer double thing, which we'll talk about later.
00:27:55.400 | Similarly, like malloc, free, realloc, all this stuff exists.
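As a concrete taste, here is roughly what those imported declarations look like in use from Swift on Linux (a sketch; on macOS the module is Darwin rather than Glibc, and modf stands in here for the sincos-style out-parameter pattern):

```swift
import Glibc   // re-exports the C library declarations, including math.h

print(M_E)            // the M_E macro comes in as a Double constant
print(sqrt(2.0))      // double sqrt(double) becomes (Double) -> Double

// A C out-parameter is imported as UnsafeMutablePointer<Double>; Swift's & operator
// bridges a var into it for the duration of the call.
var intPart = 0.0
let frac = modf(3.75, &intPart)
print(intPart, frac)  // 3.0 0.75
```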
00:27:59.400 | And so just to show you how crazy this is, let's see if we can do the side by side thing.
00:28:04.440 | Can you make it do side by side?
00:28:05.920 | Is that a challenge?
00:28:10.800 | My window skills are dusty.
00:28:18.200 | Check it out.
00:28:19.200 | Okay.
00:28:20.200 | Beautiful.
00:28:21.200 | So what we have here is we have the original header file, math.h on the left.
00:28:24.880 | If you look at this, you'll see lots of horrible things in C that everybody forgets about because
00:28:30.000 | you never write C like this, but this is what C looks like when you're talking about libraries.
00:28:35.280 | We've got a whole bunch of if defs.
00:28:36.680 | We've got macros.
00:28:38.560 | We've got like crazy macros.
00:28:41.640 | We've got conditionally enabled things.
00:28:43.600 | We've got these things are also macros.
00:28:46.120 | We've got inline functions.
00:28:47.600 | We've got tons and tons and tons of stuff.
00:28:50.480 | We've got comments.
00:28:51.600 | We've got structures like exception, of course, that's an exception, right?
00:28:57.720 | So when you import this into Swift, this is what the Swift compiler sees.
00:29:00.640 | You see something that looks very similar, but this is all Swift syntax.
00:29:06.160 | So you see you get the header, the comments, you get all the same functions.
00:29:09.800 | You get all of the -- like here's your M_E.
00:29:15.840 | And you get your structures as well.
00:29:17.640 | This all comes right in.
00:29:18.880 | And this is why Swift can see it.
00:29:21.440 | Now how does this work?
00:29:23.440 | That's the big magic question.
00:29:25.360 | So if you want to get this to work, what you can do is you can build into the Swift compiler.
00:29:29.680 | We can write a C parser, right, and we can implement a C preprocessor, and we can implement
00:29:34.560 | all the weird rules in C. Someday we can extend it and write C++ as well, and we can build
00:29:40.440 | this library so the Swift compiler knows how to parse C code, and a C++ compiler is pretty
00:29:45.800 | easy to write, so we can hack that on a weekend.
00:29:48.640 | Or, yay, good news, like we've already done this many years ago, it's called Clang.
00:29:55.120 | So what Clang is, it's a C++ compiler.
00:29:58.520 | Oh, this is getting even -- just to -- even more talk about how horrible C is, you actually
00:30:05.160 | get -- you get inline functions.
00:30:08.440 | Inline functions, the insane thing about inline functions is that they don't exist anywhere
00:30:13.240 | in a program unless you use them.
00:30:16.200 | Right?
00:30:17.200 | They get inlined.
00:30:18.200 | And so if you want to be able to call this function from C, you actually have to code
00:30:21.440 | gen, you have to be able to parse that, code gen, understand what unions are now, understand
00:30:26.560 | all of this crazy stuff just so you can get the sign bit out of a float.
00:30:30.680 | C also has things like function pointers and macros and tons of other stuff, it's just
00:30:34.400 | madness.
00:30:35.560 | And so the right way to do this is to build a C compiler.
00:30:39.560 | And the C compiler we have is called Clang.
00:30:41.880 | And so what ends up happening is that when Jeremy says import socks, Swift goes and says,
00:30:47.680 | ha ha, what's a socks?
00:30:49.000 | Oh, it's a module.
00:30:50.000 | Okay, what is a module?
00:30:51.160 | Oh, it's C. Oh, it's got a header file.
00:30:54.680 | Fire up Clang.
00:30:55.680 | Go parse that header file.
00:30:56.680 | Go parse all the things the header file pulls in.
00:30:58.720 | That's what an umbrella header is.
00:31:01.280 | And go pull the entire universe of C together into that module.
00:31:05.840 | And then build what's called syntax trees to represent all the C stuff.
00:31:10.760 | Well, now we've got a very perfect, pristine C view of the world the exact same way a C
00:31:16.060 | compiler does.
00:31:17.640 | And so what we can do then is we can build this integration between Clang and Swift where
00:31:22.920 | when you say give me malloc or give me socks in it, Swift says, whoa, what is that?
00:31:28.840 | Hey, Clang, do you know what this is?
00:31:30.000 | And Clang says, oh, yeah, I know what socks in it is.
00:31:31.960 | It's this weird function.
00:31:32.960 | It takes all these pointers and blah, blah, blah, blah.
00:31:35.160 | And Swift says, okay, cool.
00:31:36.280 | I will remap your pointers into my unsafe pointer.
00:31:40.600 | I will remap your int into my int32 because the languages are a little bit different.
00:31:46.040 | And so that remapping happens.
00:31:47.680 | And then when you call that inline function, Swift doesn't want to know how unions work.
00:31:51.280 | That's crazy.
00:31:53.040 | So what it doesn't say is it says, hey, Clang, you know how to do all this stuff.
00:31:55.960 | You know how to code generate all these things.
00:31:58.320 | And they both talked to the LVM compiler that we were talking about last time.
00:32:02.180 | And so they actually talked to each other.
00:32:04.040 | They share the code.
00:32:05.040 | Clang does all that heavy lifting.
00:32:06.480 | And now it's both correct.
00:32:08.680 | And it just works.
00:32:10.240 | Two things we like.
00:32:11.680 | And so these two things plug together really well.
00:32:13.760 | And now Swift can talk directly to C APIs.
00:32:16.160 | It's very nice.
00:32:18.240 | If you want to geek out about this, there's a whole talk that's like a half hour, an hour
00:32:22.200 | long talking about how all this stuff works at a lower level.
00:32:26.000 | We will add that link to the lesson notes.
00:32:28.040 | Yeah.
00:32:29.040 | So let's jump back to your example.
00:32:30.560 | Okay.
00:32:31.560 | So one of the reasons I'm really interested in this description, Chris, is that it's kind
00:32:38.400 | of all about one of the reasons I really wanted to work with you, apart from the fact that
00:32:43.760 | you're very amusing and entertaining, is that --
00:32:47.000 | Amusing to laugh at.
00:32:48.000 | Yeah, absolutely.
00:32:49.000 | Is that this idea of what you did with Clang and Swift is like the kind of stuff that we're
00:32:55.880 | going to be seeing is what's happened with like how differentiation is getting added
00:33:00.760 | to Swift.
00:33:01.760 | And like this idea of like being able to pull on this entire compiler infrastructure, as
00:33:06.800 | you'll see, is actually going to allow us to do some similarly exciting and surprisingly
00:33:12.480 | amazing things in deep learning work.
00:33:15.360 | And I'll say this is all simple now, but actually getting these two massive systems talk to
00:33:19.280 | each other was kind of heroic.
00:33:21.540 | And it was -- getting Python integrated was comparatively easy because Python is super
00:33:26.520 | dynamic and C is not dynamic.
00:33:29.080 | Yeah.
00:33:30.080 | And one thing I'll say about C libraries is each time I come across a C library, many
00:33:35.160 | of them have used these weird edge case things Chris described in surprising ways.
00:33:40.000 | And so I just wanted to point at a couple of pointers as to how you can deal with these
00:33:43.980 | weird edge cases.
00:33:45.540 | So when I started looking at kind of how do I create my own version of tf.data, I need
00:33:51.580 | to be able to read JPEG files, I need to be able to do image processing, I was interested
00:33:55.120 | in trying this library called vips.
00:33:58.520 | And vips is a really interesting C library for image processing.
00:34:04.200 | And so I started looking at the -- at bringing in the C library.
00:34:08.820 | And so I started in exactly the way you've seen already.
00:34:13.280 | So let's do that.
00:34:15.240 | So you'll find just like we have a Swift socks in the repo, there's also a Swift vips.
00:34:25.120 | And we'll start -- we'll start seeing some pretty familiar things.
00:34:30.480 | There's the same package.swift that you've seen before, but now it's got some extra lines
00:34:35.880 | we'll describe in a moment.
00:34:37.480 | There's the sources, vips, module map with the exact three lines of code that you've
00:34:42.880 | seen before.
00:34:44.880 | There's the sources, vips, some header, I call it a different name in this case, which
00:34:51.080 | has the one line of code which is connecting to the header.
00:34:54.960 | After you've done that, you can now import vips done.
00:34:59.880 | But it turns out that the vips documentation says that they actually added the ability
00:35:05.840 | to handle optional positional arguments.
00:35:10.040 | >> In C.
00:35:11.040 | >> In C.
00:35:12.040 | So it turns out that you can do that in C, even though it's not officially part of C,
00:35:16.720 | by using something called "bargs," which is basically in C. You can say the number of
00:35:22.160 | arguments that go here is kind of not defined ahead of time, and you can use something I've
00:35:26.000 | never heard of before called a sentinel, and basically you end up with stuff which looks
00:35:30.400 | like this.
00:35:35.760 | You end up with stuff which looks like this, where you basically say I want to do a resize
00:35:42.000 | and it has some arguments that are specified, like horizontal scale, and by default it makes
00:35:46.440 | the aspect ratio the same.
00:35:48.440 | But if you want to also change the aspect ratio and have a vertical scale, you literally
00:35:52.480 | write the string vscale, and that says, oh, the next argument is the vertical scale.
00:35:57.680 | And if you want to use some different interpolation kernel, you pass the word kernel, you say
00:36:01.120 | there's some different interpolation kernel.
00:36:04.160 | Now this is tricky because for all the magic that Swift does do, it doesn't currently know
00:36:09.600 | how to deal with varargs and sentinels.
00:36:12.400 | It's just an edge case of the C world that Swift hasn't handled yet.
00:36:15.760 | >> I think this might be the last edge case.
00:36:17.800 | >> Okay.
00:36:18.800 | >> Jeremy has this amazing thing to find the breaking point of anything.
00:36:21.440 | >> That's what I do.
00:36:22.440 | >> Yes.
00:36:23.440 | >> But no problem, right?
00:36:25.440 | The trick is to provide a header file where the things that Swift needs to call look like
00:36:34.760 | the things that they expect.
00:36:36.160 | So in this case, you can see I've actually written my own C library, right?
00:36:40.640 | And so I added a C library by literally just putting into sources.
00:36:46.760 | I just created another directory.
00:36:49.360 | And in there, I just dumped a C header file, right?
00:36:53.640 | And here's the amazing thing.
00:36:55.180 | As soon as I do that, I can now add that C library, not precompiled, but actual C code
00:37:03.160 | I've just written to my package.swift, and I can use that from Swift as well, right?
00:37:09.200 | And so that means that I can wrap the VIPs weird varargs resize version with a non-varargs
00:37:16.120 | resize version where you always pass in vertical scale, for instance.
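A hedged sketch of how that extra hand-written C target can sit next to the system-library target in Package.swift; the target and directory names are illustrative, not the exact course repo:

```swift
// swift-tools-version:4.2
import PackageDescription

let package = Package(
    name: "SwiftVips",
    products: [.library(name: "SwiftVips", targets: ["SwiftVips"])],
    targets: [
        // The system-library target: module map + umbrella header for libvips itself.
        .systemLibrary(name: "vips"),
        // Your own plain-C target: a .h/.c pair whose fixed-arity functions
        // (e.g. a resize that always takes hscale and vscale) call the varargs,
        // sentinel-terminated vips functions internally, so Swift never sees them.
        .target(name: "CVipsHelpers", dependencies: ["vips"]),
        // The Swift-facing wrapper code depends on both.
        .target(name: "SwiftVips", dependencies: ["vips", "CVipsHelpers"]),
    ]
)
```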
00:37:20.360 | And so now, I can just go ahead and say, VIPs load image.
00:37:26.660 | And then I can say VIPs get, and then I can pass that to Swift for TensorFlow in order
00:37:34.120 | to display it through matplotlib.
00:37:37.680 | Now, there's a really interesting thing here, which is when you're working with C, you have
00:37:43.960 | to deal with C memory management.
00:37:46.720 | So Swift has this fantastic reference counting system, which nearly always handles memory
00:37:52.600 | for you.
00:37:54.560 | Every C library handles memory management differently.
00:37:58.300 | So we're about to talk about OpenCV, which actually has its own reference counting system,
00:38:02.560 | believe it or not.
00:38:03.560 | But most of the time, the library will tell you, hey, this thing is going to allocate
00:38:07.440 | some memory, you have to free it later, right?
00:38:10.560 | And so here's a really cool trick.
00:38:12.480 | The VIPs get function says, hey, this memory, you're going to have to free it later.
00:38:17.240 | To free memory in C, you use the free function, because we can use C functions from Swift.
00:38:25.940 | We can use the free function.
00:38:28.240 | And I need to make sure that we call it when we're all done.
00:38:31.680 | And there's a super cool thing in Swift called defer.
00:38:35.100 | And defer says, run this piece of code before you finish doing whatever we're doing, which
00:38:41.080 | in this case would be before we exit from this function.
00:38:44.160 | >> Yeah, so if you throw an exception, if you return early, anything else, it will make
00:38:48.720 | sure to run that.
00:38:49.720 | >> Yeah.
00:38:50.720 | In this case, I probably didn't need defer, because there isn't exceptions being thrown
00:38:54.200 | or lots of different return places, but that's my habit, is that if I need to clean up memory,
00:38:59.320 | I just chuck it in a defer block, or at least that's one of the two methods that I use.
00:39:05.360 | So that's that.
00:39:07.960 | So because I like finding the edges of things and then doing it anyway, the next thing I
00:39:14.000 | looked at, and this gives you a good sense of how much I hate tf.data, is I was trying
00:39:20.160 | to do anything I could to avoid tf.data, and so I thought, all right, let's try OpenCV.
00:39:26.360 | And for those of you that have been around FastAI for a while, you'll remember OpenCV
00:39:30.320 | is what we used in FastAI 0.7.
00:39:35.520 | And I loved it because it was insanely, insanely fast.
00:39:41.400 | It's fast, reliable, high-quality code that covers a massive amount of computer vision.
00:39:47.240 | It's kind of like -- it's what everybody uses if they can.
00:39:50.760 | And much to my sadness, we had to throw it out, because it just -- it hates Python multiprocessing
00:39:57.960 | so much.
00:39:58.960 | It just kept creating weird race conditions and crashes and stalls, like literally the
00:40:04.680 | same code on the same operating system on two different AWS servers that are meant to
00:40:09.000 | be the same spec, would give different results.
00:40:11.360 | So that was sad.
00:40:12.520 | So I was kind of hopeful maybe it'll work in Swift, so we gave it a go.
00:40:16.880 | And unfortunately, since I last looked at it, they threw away their C API entirely, and
00:40:22.200 | they're now C++ only, and Chris just told you we can't use C++ from Swift.
00:40:28.480 | But here's the good news.
00:40:30.160 | You can disguise it so Swift doesn't know that it's C++.
00:40:34.720 | And so the disguise needs to be a C header file that only contains C stuff, right?
00:40:43.440 | But what's in the C++ file behind the header file --
00:40:46.960 | >> Can be anything.
00:40:47.960 | >> Can be anything at all.
00:40:48.960 | >> Maybe Pascal.
00:40:49.960 | >> Clang knows Pascal calling conventions, and Swift can call them.
00:40:56.320 | I didn't know that.
00:40:57.800 | >> Pascal strings, too.
00:40:59.080 | So here's Swift CV, and so Swift CV has a very familiar-looking package.swift that contains
00:41:06.720 | the stuff that we're used to, and it contains a very familiar-looking OpenCV4 module map.
00:41:13.320 | Now, OpenCV actually has more than one library, so we just have to list all the libraries.
00:41:19.240 | It has a very familiar-looking -- actually, we don't even have a -- oh, sorry, that's
00:41:24.040 | right.
00:41:25.040 | So we didn't use the header file here, because we're actually going to do it all from our
00:41:28.520 | own custom C++/C code.
00:41:33.100 | So I created a C OpenCV, and inside here, you'll find C++ code.
00:41:45.160 | And we actually largely stole this from the Go OpenCV wrapper, because Go also doesn't
00:41:51.560 | know how to talk to C++, but it does know how to talk to C, so that was a convenient
00:41:55.780 | way to kind of get started.
00:41:57.280 | But you can see that, for example, we can't call new, because new is C++, but we can create
00:42:02.400 | a function called matnew that calls that, and then we can create a header that has mat new,
00:42:16.080 | and that's not C++, right?
00:42:18.320 | This is actually a plain C struct -- pointer to a struct, and so I can call that.
00:42:25.480 | And so even generics, C++ generics, we can handle this way.
00:42:28.880 | So OpenCV actually has a full-on multidimensional generic array, like NumPy, with, like, matrix
00:42:38.280 | multiplication, all the stuff in it, and the way its generic stuff works is that you can
00:42:42.960 | ask for a particular pixel and you say, "What data type is it using C++ generics?"
00:42:48.580 | So we just create lots and lots of different versions, all the different generic versions,
00:42:53.480 | which in the header file look like C.
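To make the shape of that trick concrete, here is a hedged sketch of the pattern from the Swift side; the module name and the Mat_New/Mat_Close/Mat_Rows shims are illustrative stand-ins, not SwiftCV's exact API:

```swift
// Suppose the C header (whose implementation is secretly C++) declares:
//
//     typedef void* Mat;        // opaque handle hiding a cv::Mat*
//     Mat  Mat_New(void);
//     void Mat_Close(Mat m);
//     int  Mat_Rows(Mat m);
//
// Swift imports Mat as an opaque raw pointer, so a small class can own the handle
// and hand it back to C++ when Swift's reference counting is done with it.
import COpenCV   // hypothetical module name for the C shim target

final class CVMat {
    let handle: Mat                    // the imported typedef (an opaque pointer)
    init() { handle = Mat_New() }
    deinit { Mat_Close(handle) }       // runs when the last Swift reference goes away
    var rows: Int { Int(Mat_Rows(handle)) }
}
```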
00:42:59.480 | So once we've done all that, we can then say import SwiftCV and start using OpenCV stuff.
00:43:08.600 | So what does that look like?
00:43:11.440 | Well, now that we can use that, we can read an image, we can have a look at its size,
00:43:18.880 | we can get the underlying C pointer, and we can start doing -- yeah, and we can start
00:43:26.600 | doing timing things and kind of see, is it actually looking like it's going to be hopeful
00:43:30.960 | in terms of performance, and so forth.
00:43:37.240 | I was very scared when I started seeing in Swift all these unsafe, mutable pointers and
00:43:41.880 | whatnot.
00:43:42.880 | >> They're designed to make you scared.
00:43:45.160 | It starts with unsafe.
00:43:46.160 | >> Fair enough.
00:43:47.160 | >> But this is C, right, and so C is inherently unsafe.
00:43:51.240 | >> Yeah.
00:43:52.240 | >> Swift's theory on that is that it does not prevent you from using it.
00:43:57.080 | It just makes it so you know that you're in that world.
00:44:00.120 | >> But there's actually this great table I saw from Ray Wenderlich, from the Ray Wenderlich website,
00:44:09.560 | and I've stolen it here.
00:44:11.240 | And basically what he pointed out is the names of all of these crazy pointer types actually
00:44:16.480 | have this very well-structured thing.
00:44:18.520 | They all start with unsafe, they all end with pointer, and in the middle there's this little
00:44:22.680 | mini-language which is can you change them or not, are they typed or not, do we know
00:44:30.380 | the count of the number of things in there or not, and what type of thing do they point
00:44:34.280 | to if they're typed.
00:44:35.720 | So once you kind of realize all of these names have that structure, suddenly things start
00:44:40.840 | seeming more reasonable again.
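A couple of lines to make that naming mini-language concrete; this is just standard library usage, nothing library-specific:

```swift
var x = 3.0
withUnsafeMutablePointer(to: &x) { p in
    // UnsafeMutablePointer<Double>: unsafe, mutable, typed, no count carried.
    p.pointee += 1                       // .pointee dereferences, as with the SoX struct
}

let bytes: [UInt8] = [1, 2, 3, 4]
bytes.withUnsafeBufferPointer { buf in
    // UnsafeBufferPointer<UInt8>: unsafe, read-only, typed, and it knows its count.
    print(buf.count, buf[0])             // 4 1
}
```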
00:44:43.520 | >> We have two questions.
00:44:45.320 | >> All right, let's go with the two questions.
00:44:48.560 | >> One is, are the C libraries dynamically linked or statically linked or compiled from
00:44:53.240 | source to be available in Swift?
00:44:55.160 | >> Sure.
00:44:56.160 | By default, if you import them, they are statically linked.
00:44:59.520 | And so they'll link in with the normal linker flags, and if the library is a .a file, then
00:45:05.160 | it will get statically linked directly into your Swift code.
00:45:07.560 | If it's a .so file, then you'll dynamically link it, but it's still linked to your executable.
00:45:13.240 | All the linker stuff, so dlopen, is a C API.
00:45:16.600 | And so if you want to, you can dynamically load C libraries.
00:45:20.520 | You can look up their symbols dynamically.
00:45:22.240 | You can do all that kind of stuff, too.
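For instance, driving the dynamic loader from Swift is just a matter of calling those same C functions; a sketch that assumes a typical Linux system with libm installed:

```swift
import Glibc   // dlopen / dlsym / dlclose are themselves plain C functions

if let handle = dlopen("libm.so.6", RTLD_NOW) {
    defer { _ = dlclose(handle) }
    if let sym = dlsym(handle, "cos") {
        // Cast the looked-up symbol to a C function type and call it.
        typealias CosFn = @convention(c) (Double) -> Double
        let cosine = unsafeBitCast(sym, to: CosFn.self)
        print(cosine(0))    // 1.0
    }
}
```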
00:45:25.280 | >> Another question is, how much C do you have to know to do all these C-related imports?
00:45:32.880 | >> Almost none.
00:45:33.880 | So I don't really know any C at all, so I kind of randomly press buttons until things
00:45:39.000 | start working or copy and paste other people's C code.
00:45:42.280 | >> The Internet Stack Overflow has a lot of helpful stuff.
00:45:45.120 | >> Yeah.
00:45:46.120 | You need to know there's a thing called a header file, and that contains a list of the
00:45:49.240 | functions that you can call, and you need to know that you type #include, angle brackets,
00:45:54.160 | header file.
00:45:55.160 | But you can just copy and paste the Swift socks library that I've already shown you,
00:46:00.120 | which has the three files already there.
00:46:02.280 | And so really, you don't need to know any C. You just need to replace the word "socks"
00:46:05.920 | with the name of the library you're using, and then you need to know -- you need to kind
00:46:10.920 | of work through the documentation that's in C, and that's the bit where it gets, like,
00:46:18.000 | you know -- I find the tab completion stuff is the best way to handle that, is like hit
00:46:22.440 | tab, and you say let x equal, and then you call some function, and then you say x. and
00:46:29.040 | you see what's inside it, and things kind of start working.
00:46:31.960 | >> And for all the hard time you give Sox as, you know, not a web design firm, it has
00:46:39.000 | a pretty well-structured API, and so if you have a well-structured API like this, then
00:46:43.400 | using it is pretty straightforward.
00:46:44.640 | If you have something somebody hacked together, they didn't think about it, then it's probably
00:46:49.760 | going to be weird, and you may have to understand their API, and it may require you to understand
00:46:53.200 | a lot of C. But those are the APIs that you probably won't end up using, because if they
00:46:57.320 | haven't given a lot of love to their API, people aren't using it, usually.
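As a rough sketch of the three-file pattern being described, a C wrapper package usually boils down to a Package.swift plus a module map and a tiny shim header; the names below (CSomeLib, somelib) are placeholders, not the actual SwiftSox files:

```swift
// swift-tools-version:5.0
// Package.swift for a hypothetical C wrapper package.
import PackageDescription

let package = Package(
    name: "CSomeLib",
    targets: [
        // A system-library target: no Swift sources, just a module map
        // telling Swift which C headers to import and what to link.
        .systemLibrary(name: "CSomeLib")
    ]
)

// Sources/CSomeLib/module.modulemap (shown here as a comment):
//     module CSomeLib {
//         header "shim.h"
//         link "somelib"
//         export *
//     }
//
// Sources/CSomeLib/shim.h:
//     #include <somelib.h>
```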
00:47:03.400 | >> My impression is that almost all of the examples of the future power of Swift seem
00:47:07.280 | to rely not on the abstraction to higher levels, but on the diving into lower-level details.
00:47:13.320 | As a data scientist, I try to avoid doing this, I only go low if I know there's a big
00:47:17.040 | performance gain to be had.
00:47:19.880 | >> So let me state my perspective as a data scientist, and maybe we can hear yours.
00:47:24.240 | >> Well, and I was just going to inject, we're starting at the bottom, so we'll be getting
00:47:28.840 | much higher levels soon.
00:47:29.960 | But there's a reason that I'm wanting to teach this stuff, which is that I actually think
00:47:38.400 | as data scientists, this is our opportunity to be far more awesome.
00:47:44.720 | It's like being able to access, something I've noticed for the last 25 years is everybody
00:47:51.640 | I know in, I mean, it didn't used to be called data science, we used to call it industrial
00:47:56.000 | mathematics or whatever, operated within the world that was accessible to them, right?
00:48:03.360 | So at the moment, for example, there's a huge world of something called sparse convolutions
00:48:07.560 | that are, I know they're amazing, I've seen competition-winning solutions, they get state-of-the-art
00:48:14.680 | results.
00:48:15.680 | There's two people in the world doing it, because it all requires custom CUDA kernels.
00:48:22.680 | For years, for decades, almost nobody was doing differential programming, because we had to
00:48:26.960 | calculate the derivatives by hand.
00:48:28.720 | So like, it's not just about, oh, I want an extra, it's absolutely not about, I want an
00:48:34.440 | extra 5% of performance, it's about being able to do whatever's in your head.
00:48:41.320 | I used to be a management consultant, I'm afraid to say, and I didn't know how to program,
00:48:46.360 | and I knew Excel, and the day that I learned Visual Basic was like, oh, now I'm not limited
00:48:53.360 | to the things I can do in a spreadsheet, I can program.
00:48:57.680 | And then when I learned Delphi, it was like, oh, now I'm not limited to the things that
00:49:01.080 | I can program in a spreadsheet, I can do the things that are in my head.
00:49:04.920 | So that's where I want us all to get to.
00:49:08.840 | Yeah.
00:49:09.840 | >> Hey, and some people are feeling overwhelmed with Swift, C, C++, Python, PyTorch, TensorFlow,
00:49:18.440 | Swift for TensorFlow, do we need to become experts on all these different languages?
00:49:23.760 | >> No.
00:49:24.760 | >> No, we don't, but can I show why this is super interesting?
00:49:30.000 | Because this is like -- so let me show you why I started going down this path, right?
00:49:37.320 | Which is that I was using tf.data.
00:49:43.360 | And I found that it took me 33 seconds to iterate through ImageNet.
00:49:54.760 | And I know that in Python, we have a notebook which Sylvain created to compare, called timing,
00:50:08.680 | and the exact same thing takes 11.5 seconds.
00:50:12.200 | And this is not an insignificant difference, waiting more than three times as long just
00:50:16.400 | to load the data is just not okay for me.
00:50:19.840 | So I thought, well, I bet OpenCV can do it fast.
00:50:25.200 | So I created this little OpenCV thing.
00:50:28.960 | And then I created a little test program.
00:50:39.040 | So this is the entirety of my test program, right, which is something that downloads ImageNet
00:50:45.680 | and reads and resizes images, and does it with four threads.
00:50:52.440 | And so if you go Swift run -- sorry, Swift run -- okay, so when I run this, check this
00:51:06.280 | out, 7.2 seconds, right?
00:51:10.040 | And so this was like half a day's work.
00:51:14.120 | And half a day's work, I have something that can give me an image processing pipeline that's
00:51:19.960 | even faster than PyTorch.
00:51:22.240 | And so it's not just like, oh, we can now do things a bit faster, but it's now like
00:51:27.320 | any time I get stuck that I can't do something, it's not in the library I want, or it's so
00:51:32.400 | slow as to be unusable, you know, this whole world's open.
00:51:36.000 | So I'd say we don't really touch this stuff until you get to a point where you have no
00:51:42.320 | choice but to, and at that point you're just really glad it's there.
00:51:45.600 | >> Well, and to me, I think it's also -- the capability is important even if you don't
00:51:49.840 | do it.
00:51:51.840 | So keep in mind, this is all code that's in a workbook.
00:51:54.600 | So you can get code in the workbook from anywhere, and now you can share that workbook, and you
00:51:59.200 | don't have to share this like tangled web of dependencies that have to go with the workbook.
00:52:03.920 | And so the fact that you can do this in Swift doesn't mean that you yourself have to write
00:52:06.960 | the code, but it means you can build on code that other people wrote.
00:52:09.880 | And if you haven't seen Swift at all, if this is your first exposure to it, this is definitely
00:52:13.880 | not the place you start.
00:52:15.360 | Like the data APIs that we're about to look at would be a much more reasonable place to
00:52:19.080 | start.
00:52:20.080 | You've had a month or two months' worth of hacking with Swift time, and that's Jeremy
00:52:24.960 | months, so that's like a year for normal people.
00:52:27.920 | So this being like super powerful and the ability to do this is, I think, really great,
00:52:34.760 | and I agree with you.
00:52:35.760 | >> Yeah, and I am totally not a C programmer at all, and it's -- honestly, it's been more
00:52:39.600 | like two weeks, because before that I was actually teaching a Python course, believe
00:52:42.840 | it or not.
00:52:43.840 | But Sylvain has been doing this for a month.
00:52:46.040 | >> Yeah.
00:52:47.040 | So, I mean, so this is all -- it's all there, and I would definitely recommend ignoring
00:52:51.840 | all of this stuff, and we're about to start zooming up the levels of the stack.
00:52:56.520 | But the fact that it's there, I think, is reassuring, because one of the challenges
00:52:59.320 | that we have with Python is that you get this ceiling, and if you get up to the ceiling,
00:53:04.040 | then there's no going further without this crazy amount of complexity, and whether that
00:53:08.440 | be concurrency, or whether that be C APIs, or whether that be other things, that prevents
00:53:12.880 | the next steps and the next levels of innovation and the industry moving forward.
00:53:16.440 | >> And this is meant to be giving you enough to go on with until a year's time course,
00:53:21.200 | as well.
00:53:22.200 | So like it's -- hopefully this is something where you can pick and choose which bits
00:53:26.360 | you want to dig into, and whichever bit you pick to dig into, we're showing you all the
00:53:30.920 | depth that you can dig into over the next 12 months.
00:53:36.720 | So I was really excited to discover that we can use OpenCV, which is something I've wanted
00:53:44.740 | ever since we had to throw it away from fast AI, and so I thought, you know, what would
00:53:48.920 | it take to create a data blocks API with OpenCV?
00:53:53.960 | And thanks to Alexis Gallagher, who kind of gave us the great starting point to say, well,
00:53:58.440 | here is what a Swifty-style data blocks would look like, we were able to flesh it out into
00:54:05.880 | this complete thing.
00:54:07.240 | And when Chris described Alexis to me as the world leader on value types, I was like, wait,
00:54:13.240 | I thought you created them.
00:54:14.400 | I thought, okay, I guess we can listen to Alexis's code for this.
00:54:20.000 | >> I will say I'm terrified about presenting those slides, because Alexis is sitting right
00:54:24.280 | there, and if you start scowling, then -- oh, no.
00:54:28.960 | >> We have a handheld mic, come and correct us any time.
00:54:31.840 | So there's a thing here called OC data block generic, where you'll find that what we've
00:54:37.080 | actually done is we've got the entire data blocks API in this really interesting Swifty-style,
00:54:45.880 | and what you'll see is that when we compare it to the Python version, this is on every
00:54:53.160 | axis very significantly better.
00:54:56.960 | So let's talk about some of the problems with the data block API in Python.
00:55:00.000 | I love the data block API, but lots of you have rightly complained that we have to run
00:55:06.480 | all the steps in a particular order; if we get the wrong order, we get inscrutable errors.
00:55:13.520 | We have to make sure that we have certain steps in there, if we miss a step, we get
00:55:18.840 | inscrutable errors.
00:55:21.680 | It's difficult to deal with at that level, and then the code inside the data blocks API,
00:55:28.240 | I hate changing it now, because I find it too confusing to remember like why it all
00:55:33.400 | fits together.
00:55:34.400 | But check out this Swift code that does the same thing, right?
00:55:37.640 | So first there's download.
00:55:39.620 | This is just the same get files we've seen before.
00:55:43.520 | All we need to do is we say, you know what, if you're going to create some new data bunch,
00:55:50.180 | you need some way to get the data, you need some way -- and let's assume that they're
00:55:54.680 | just paths for now -- that is, some way to turn all of those paths into items, so something
00:55:59.680 | like our item list in Python.
00:56:01.600 | You need some way to split that between training and validation.
00:56:05.520 | And you need some way to label the items.
00:56:09.440 | So for example, for ImageNet, download calls that function there.
00:56:16.480 | And the thing that converts the path into items -- the thing which grabs the whole list of paths -- is
00:56:24.840 | collectFiles.
00:56:27.400 | And then the thing that converts -- that decides whether they're training or validation is
00:56:33.240 | whether the parent.parent directory is named "train" or not.
00:56:36.420 | And the thing that creates the label is the parent.
00:56:41.160 | And so, like, we can basically just define this one neat little package of information
00:56:47.240 | and we're done.
00:56:49.200 | And Swift will actually tell us if we forgot something.
00:56:54.960 | Or if one of the things that we provided is wrong -- like, isTraining is meant to return true or
00:56:59.620 | false depending on whether it's training or validation.
00:57:01.080 | If we accidentally return the word "train" instead, it'll complain and let
00:57:06.080 | us know.
00:57:07.080 | So, like, I just love this so much.
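A rough sketch of the shape of this API (illustrative names and types, not the notebook's exact code): one protocol bundles everything you must supply, and the compiler checks the conformance for you.

```swift
// One protocol gathers everything a data source must provide.
protocol DatasetConfig {
    associatedtype Item
    associatedtype Label
    static func download() -> String                 // returns the root path
    static func getItems(_ path: String) -> [Item]   // root -> all items
    static func isTraining(_ item: Item) -> Bool     // train/valid split
    static func labelOf(_ item: Item) -> Label       // item -> label
}

// A toy conformance: forgetting a requirement, or returning the wrong type,
// is a compile-time error rather than a runtime surprise.
struct TinyConfig: DatasetConfig {
    static func download() -> String { return "/tmp/data" }
    static func getItems(_ path: String) -> [String] { return ["\(path)/train/cat/1.jpg"] }
    static func isTraining(_ item: String) -> Bool { return item.split(separator: "/").contains("train") }
    static func labelOf(_ item: String) -> String { return "cat" }
}
```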
00:57:11.020 | But to understand what's going on here, we need to learn a bit more about how Swift works
00:57:16.840 | and this idea of protocols.
00:57:18.520 | >> Yeah.
00:57:19.520 | So this is something that is actually useful if you are doing deep learning stuff.
00:57:25.600 | So let's talk about --
00:57:26.600 | >> Sorry.
00:57:27.600 | >> Go for it.
00:57:28.600 | So let's talk about what protocols are in Swift.
00:57:31.560 | So we've seen structs.
00:57:37.680 | Right now we want to talk about what protocols are.
00:57:39.480 | And if you've worked in other languages, you may be familiar with things like interfaces
00:57:43.240 | in Java, abstract classes that are often used, advanced other weird languages have things
00:57:50.440 | called type classes.
00:57:51.560 | And so all these things are related to protocols in Swift.
00:57:55.040 | And what protocols do is they're all about splitting the interface of a type from the
00:57:58.880 | implementation.
00:57:59.960 | And so we'll talk about layer later.
00:58:02.780 | Layer is a protocol.
00:58:03.880 | And it says that to use a layer, or rather to define a layer, you have this call.
00:58:09.520 | So layers are callable, just like in PyTorch.
00:58:12.160 | And so then you can define a dense layer and say how to call a dense layer.
00:58:14.760 | You can define a conv2D and show how to implement a conv2D layer.
00:58:22.160 | And so there's a contract here between what a type is supposed to have.
00:58:25.720 | All layers must be callable.
00:58:27.620 | And then the implementations, these are different.
00:58:29.200 | Right.
00:58:30.200 | So this is pretty straightforward stuff.
00:58:32.200 | Even that's quite a nice trick.
00:58:33.200 | It's like in PyTorch, you kind of have to know that there's something called forward.
00:58:37.400 | And if you misspell it, or forget to put it there, or put __call__ instead of forward,
00:58:44.280 | you get kind of weird, inscrutable errors.
00:58:45.920 | Whereas with this approach, you get tab completion from the signature.
00:58:51.200 | Yeah.
00:58:52.200 | And Swift will tell you if you get it wrong, it'll say this is what the function should
00:58:55.440 | have been called.
00:58:56.440 | Yeah.
00:58:57.440 | That's great.
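A simplified sketch of the idea (not the real Swift for TensorFlow Layer protocol, which also carries differentiability requirements); it uses modern Swift's callAsFunction to make conforming types callable, and a toy layer over plain Float arrays just to show the shape of the contract:

```swift
// The protocol is the contract: anything calling itself a layer must be
// callable on an input and produce an output.
protocol SimpleLayer {
    associatedtype Input
    associatedtype Output
    func callAsFunction(_ input: Input) -> Output   // callAsFunction makes the type callable
}

// One conforming type supplies the implementation.
struct Scale: SimpleLayer {
    var factor: Float
    func callAsFunction(_ input: [Float]) -> [Float] {
        return input.map { $0 * factor }
    }
}

let layer = Scale(factor: 2)
print(layer([1, 2, 3]))   // [2.0, 4.0, 6.0]
```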
00:58:58.440 | So what this is really doing is this is defining behaviors for groups of types.
00:59:02.440 | And so we're saying a layer.
00:59:04.760 | Layer is like the commonality between a whole group of types that behave like a layer.
00:59:11.160 | And what does that behavior mean?
00:59:12.280 | Well, it's a list of what are called requirements.
00:59:14.360 | And so these are all the methods that the type has to have.
00:59:17.100 | These are the signatures of those methods.
00:59:19.240 | And these things often have invariants.
00:59:21.480 | And so one of the things that Swift has in its library is the notion of equatable.
00:59:26.000 | What is equatable?
00:59:27.000 | An equatable is any type that has this equals equals operator, right?
00:59:31.560 | And then it says what is equatability and all that kind of stuff.
00:59:34.600 | Now the cool thing about this is that you can build up towers of types.
00:59:39.720 | And so you can say, well, equatable gets refined by an additive arithmetic.
00:59:44.080 | And an additive arithmetic is something that supports addition and subtraction.
00:59:47.840 | Then if you also have multiplication, you can be numeric.
00:59:50.120 | And if you also have negation, then it can be signed.
00:59:52.940 | And then you can have integers, and you can have floating point.
00:59:55.920 | And now you can have all these different things that exist in the ecosystem of your program,
01:00:00.280 | and you can describe them.
01:00:02.200 | And so this is all very -- these things, these ways to reason about these groups of types
01:00:10.480 | give you the ability to get these abstractions that we all want.
01:00:14.160 | And these types already exist in Swift.
01:00:15.920 | These all exist, and you can go see them in the standard library.
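A sketch of how such a tower stacks up via protocol refinement, using simplified stand-ins rather than the real standard-library protocols (which carry more requirements):

```swift
protocol MyEquatable {
    static func == (lhs: Self, rhs: Self) -> Bool
}
protocol MyAdditiveArithmetic: MyEquatable {     // refines equatable
    static func + (lhs: Self, rhs: Self) -> Self
    static func - (lhs: Self, rhs: Self) -> Self
}
protocol MyNumeric: MyAdditiveArithmetic {       // adds multiplication
    static func * (lhs: Self, rhs: Self) -> Self
}
protocol MySignedNumeric: MyNumeric {            // adds negation
    static prefix func - (operand: Self) -> Self
}
```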
01:00:18.760 | And so why do you want this?
01:00:20.240 | Well, the cool thing about this is that now you can define behavior that applies to all
01:00:24.520 | of the members of that class.
01:00:27.320 | And so what we're doing here is we're saying not equal.
01:00:31.480 | Well, all things that are equatable, and this T colon equatable says I work with any type
01:00:37.200 | that is equatable.
01:00:39.480 | What this is doing is this is defining a function, not equal, on any type that's equatable, and
01:00:43.960 | it takes two of these things and returns a bool.
01:00:47.040 | And we can implement not equal by just calling equals equals, which all equatable things
01:00:51.200 | are, and then inverting it.
01:00:52.920 | So to be clear, what Chris just did here was he wrote one function that is now going to
01:00:59.120 | add behavior to every single thing that defines equals automatically, which is pretty magic.
01:01:05.640 | Just like everywhere, boom, one place, super, super abstract.
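A sketch of that pattern, under a fresh name so it doesn't clash with the standard library's own != operator:

```swift
// One generic function that works for every Equatable type at once.
func myNotEqual<T: Equatable>(_ lhs: T, _ rhs: T) -> Bool {
    return !(lhs == rhs)
}

print(myNotEqual(1, 2))        // true
print(myNotEqual("a", "a"))    // false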
01:01:11.320 | But this also works for lots of other things.
01:01:13.160 | This works for like absolute value.
01:01:14.800 | What does absolute value mean?
01:01:15.800 | Well, it needs any type that is signed and numeric and that's comparable.
01:01:20.880 | And how do you implement absolute value?
01:01:22.240 | Well, you compare the thing against zero.
01:01:23.840 | If it's less than zero, you negate it.
01:01:25.200 | Otherwise, you return it.
01:01:26.200 | Super simple.
01:01:27.200 | But now everything that is a number that can be compared is now abs-able.
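And the same trick for absolute value, again under an illustrative name:

```swift
// Works for anything that is signed, numeric and comparable.
func myAbs<T: SignedNumeric & Comparable>(_ x: T) -> T {
    return x < 0 ? -x : x
}

print(myAbs(-3))      // 3
print(myAbs(-2.5))    // 2.5
```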
01:01:32.360 | Types, same thing.
01:01:34.760 | All these things work the same way.
01:01:36.040 | And so with dictionary, what you want is you want the keys in a dictionary all have to
01:01:40.160 | be hashable.
01:01:41.160 | This is how the dictionary does its efficient lookups and things like that.
01:01:44.240 | The value can be anything, though.
01:01:45.880 | And so all these things kind of stack together and fit together like building blocks.
01:01:49.960 | One of the really cool things about this now is that we can start taking this further.
01:01:54.240 | So we talked about not equal building on top of equal equal.
01:01:57.440 | In the last lesson, we defined this is odd function.
01:01:59.600 | We defined it on int.
01:02:00.840 | Well, because this protocol exists, we can actually add it as a method to all things
01:02:06.200 | that are binary integers.
01:02:07.920 | And so we can say, hey, put this on all binary integers and give them all an is odd method.
01:02:13.040 | And now I don't have to put isOdd on Int and Int16 and Int32 and the weird C integer types.
01:02:18.640 | You can just do it in one place and now everybody gets this method.
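The extension being described looks roughly like this:

```swift
// One definition covers Int, Int16, UInt8 and every other binary integer type.
extension BinaryInteger {
    var isOdd: Bool { return self % 2 != 0 }
}

print(3.isOdd)            // true
print(UInt8(4).isOdd)     // false
```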
01:02:23.720 | On layers.
01:02:24.720 | This is something that's closer to home.
01:02:26.240 | Here we can say, hey, I want an inferring(from:) method that does some learning phase switching
01:02:30.560 | magic nonsense.
01:02:31.560 | But now because I put this on all layers, well, I can use it on my model because your
01:02:36.080 | model is a layer.
01:02:37.240 | My dense layer, that's a layer.
01:02:38.840 | And so this thing allows this one simple idea of defining groups of types and then broadcasting
01:02:44.640 | behavior and functionality onto all of them at once is really powerful.
01:02:47.760 | Yeah, I mean, it's like Python's monkey patching, which we use all the time.
01:02:52.600 | But A, it's not this kind of hacky thing with weird undefined behavior sometime.
01:02:57.640 | And B, we don't have to monkey patch lots and lots of different places to add functionality
01:03:02.360 | to lots of things.
01:03:03.360 | We just put it in one place and the functionality gets sucked in by everything that can use it.
01:03:08.560 | Yeah.
01:03:09.560 | And remember, extensions are really cool because they work even if you didn't define the type.
01:03:12.760 | So what you can literally do is you can pull in some C library, not that we'd love C, but
01:03:18.600 | some C library and add things to its structs.
01:03:21.600 | I mean, it will have things automatically added to those structs because it already supports
01:03:28.200 | those operations.
01:03:29.200 | Yeah.
01:03:30.200 | So all this stuff composes together in a really nice way, which allows you to do very powerful
01:03:33.480 | and very simple and beautiful things.
01:03:35.840 | So mix-ins show up and you can control where they go.
01:03:40.640 | And so here's an example.
01:03:41.640 | This is something that Jeremy wrote.
01:03:43.400 | And so he defined his own protocol, Countable, and he says things are countable if they
01:03:47.680 | have a variable named count.
01:03:50.000 | And the only thing I care about for countable things is I can read it.
01:03:52.240 | I don't have to write it.
01:03:53.240 | That's what the get means.
01:03:54.240 | And so he says, array is countable.
01:03:57.080 | His OpenCV mat thingy is countable.
01:04:01.080 | Like all these...
01:04:02.080 | There's a number of pixels in it.
01:04:03.080 | Yeah.
01:04:04.080 | All these things are countable.
01:04:05.080 | And then Jeremy says, hey, well, take any sequence.
01:04:07.440 | Let's add a method or a property called total count to anything that's a sequence.
01:04:12.440 | So a sequence is the same as Python's iterables?
01:04:14.640 | It's anything that you can iterate through.
01:04:15.880 | Exactly.
01:04:16.880 | And so this is things like dictionaries and arrays and ranges and all these things are
01:04:21.120 | sequences.
01:04:22.120 | And it says, so long as the element is countable, so I have an array of countable things, then
01:04:27.120 | I can get a total count method or a property.
01:04:29.920 | And the way this is implemented is it just says, hey, map over myself, get the count,
01:04:34.880 | add all the counts up together, and then I have a total count of all the things in my
01:04:39.120 | array.
01:04:40.120 | And now if I have an array of arrays, an array of mats, lazy mapped collection sequence-y
01:04:46.280 | thingy of mats, whatever it is, now I can just ask for its count or its total count.
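A sketch of the countable example (illustrative; the notebook's version also conforms the OpenCV Mat type):

```swift
protocol Countable {
    var count: Int { get }       // read-only is all we need
}

extension Array: Countable {}    // Array already has count, so conformance is free
extension String: Countable {}

extension Sequence where Element: Countable {
    // Sum the counts of everything in the sequence.
    var totalCount: Int {
        return map { $0.count }.reduce(0, +)
    }
}

print([[1, 2], [3, 4, 5]].totalCount)    // 5
print(["hi", "there"].totalCount)        // 7
```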
01:04:51.400 | Hey, Chris, this -- so this functionality you're describing is basically the same as
01:04:56.800 | what Haskell calls type classes.
01:04:59.920 | Is that right?
01:05:00.920 | Yeah.
01:05:01.920 | Is this kind of, like, stolen from Haskell?
01:05:05.080 | Inspired.
01:05:06.080 | I mean, so the interesting thing for me here is --
01:05:08.040 | We let them play with it, too.
01:05:09.440 | Well, the interesting thing -- the reason I ask is because, like, I've tried to use Haskell
01:05:13.720 | before many times and have always failed.
01:05:16.560 | I'm clearly not somebody who's smart enough to use Haskell.
01:05:21.040 | Yet I wrote the code that's on the screen right now, like, a couple of days ago.
01:05:25.840 | And I didn't spend a moment even thinking about the fact I was writing the code.
01:05:29.800 | It was only, like, the next day that I looked back at this code, and I thought, like, wow,
01:05:34.160 | I just did something which no other language I've used both could do, and I was smart enough
01:05:39.560 | to do it.
01:05:40.560 | Like, it kind of makes this what I think of as, like, super genius functionality available
01:05:46.320 | to normal people.
01:05:47.320 | Yeah, and so back at the very, very beginning of this, we talked about Swift's goal is to
01:05:50.880 | be able to take good ideas wherever they are and assemble them in a tasteful way, and then
01:05:54.920 | be not weird.
01:05:57.000 | Being not weird is a pretty hard but important goal.
01:06:00.360 | So the way I look at programming languages is that programming languages in the world
01:06:05.000 | have these gems in them, and Haskell has a lot of gems, incidentally.
01:06:09.000 | It's a really cool functional language.
01:06:10.360 | It's very academic.
01:06:11.720 | It's super powerful in lots of ways.
01:06:13.680 | But then it gets buried in weird syntax, and it's just purely functional.
01:06:17.680 | You have to be -- you know, it has a very opinionated worldview of how you're supposed
01:06:20.960 | to write code, and so it appeals to a group of people, which is great, but then it gets
01:06:25.040 | ignored by the masses.
01:06:26.840 | And to me it's really sad that the great technology in programming languages that's been invented
01:06:31.360 | for decades and decades and decades gets forgotten just because it happened to be put in the
01:06:35.200 | wrong place.
01:06:36.200 | Yeah.
01:06:37.200 | It's not just that, but it's even the whole way things are described are all about --
01:06:41.360 | -- monoids and monads and whatever --
01:06:43.160 | Existentials and things like that.
01:06:45.240 | Yeah, exactly.
01:06:46.240 | And so a lot of what Swift is trying to do is just trying to take those ideas, re-explain
01:06:50.180 | them in a way that actually makes sense, stack them up in a very nice, consistent way, and
01:06:54.640 | design it, right?
01:06:57.200 | And so a lot of this was pull these things together and really polish and really push
01:07:01.600 | and, like, make sure that the core is really solid.
01:07:04.840 | Okay.
01:07:05.840 | We have a question.
01:07:07.480 | How does the Swift protocol approach avoid the inheritance tree hell problem in languages
01:07:12.580 | like C#, where you end up with enormous trees that are impossible to refactor?
01:07:17.920 | And similarly, what are the top opinions around using the mix-in pattern, which has been found
01:07:22.540 | to be an anti-pattern in other contexts?
01:07:25.040 | Yeah.
01:07:26.040 | So the way that Swift protocols work is completely different than the way that subclasses work
01:07:30.800 | in C# or Java or other object-oriented languages.
01:07:34.480 | There, what you get is something called a Vtable.
01:07:37.220 | And so your type has to have one set of mappings for all these different methods, and then
01:07:40.800 | you get very deep inheritance hierarchies.
01:07:42.900 | In Swift, you end up adding methods to int.
01:07:47.720 | Like, so we, on the last slide, added a method is odd to all the integers.
01:07:52.480 | Ints just don't have a Vtable.
01:07:53.800 | That would be a very inefficient thing to do.
01:07:56.580 | And so the implementation is completely different.
01:07:58.200 | The trade-offs are completely different.
01:08:00.460 | I will, at the end of this, I think in a couple of slides, have a good pointer that will give
01:08:05.280 | you a very nice deep dive on all that kind of stuff.
01:08:07.960 | So also there's the binary method problem, and there's a whole bunch of other things
01:08:11.040 | that are very cleanly solved in Swift protocols.
01:08:13.160 | Okay.
01:08:14.160 | And then there was also a question.
01:08:16.220 | Out of curiosity, could you give an estimate of how long it would take someone to go from
01:08:20.000 | a fair level of knowledge in Python, TensorFlow deep learning, to start being able to be a
01:08:25.920 | competent contributor to Swift for TensorFlow?
01:08:28.720 | Yeah.
01:08:29.720 | So we've designed Swift in general to be really easy to learn, and so that you can learn as
01:08:34.140 | you go.
01:08:35.140 | And this course is a little bit different -- it's very bottom-up -- but a lot of Swift, just
01:08:39.040 | like Python, was designed to be taught.
01:08:41.720 | And what you start with when you go in from that perspective is you get a very top-down
01:08:47.480 | kind of perspective.
01:08:48.920 | And what I would do is I would start by Googling for "A Swift Tour", and you get a
01:08:53.040 | very nice top-down view of the language, and it's very approachable.
01:08:56.760 | And just pick something that's in some fastai notebook now that
01:09:01.160 | we haven't implemented yet, and pop it into a notebook, right?
01:09:04.400 | And the first time you try to do that, you'll get all kinds of weird errors and obstructions,
01:09:08.200 | and you won't know what's going on, but after a few weeks --
01:09:11.000 | That's on the forum, and that's what the community's about.
01:09:13.280 | Yeah, lots of help from the forum, and Chris and I are both on the forum, and there's SFTF
01:09:17.160 | teams on the forum.
01:09:18.720 | We'll help you out, and in a few weeks' time, you'll be writing stuff from scratch and finding
01:09:24.640 | it a much smoother process.
01:09:27.360 | Yeah.
01:09:28.360 | So I want to address one weird thing here and give you something to think about, and
01:09:34.440 | you might wonder, okay, well, Jeremy wants to know all the countable things.
01:09:39.720 | We have arrays and we have mat, and we have to say that they are countable.
01:09:43.560 | But the compiler knows that it's countable or not.
01:09:45.680 | If you try to make something countable and it doesn't have a count method, the compiler
01:09:48.240 | will complain to you.
01:09:49.680 | So why do we have to do this?
01:09:51.200 | Well, let's talk about a different example, and the answer is that protocols are not just
01:09:55.880 | about methods -- and this is also related to the C# question -- but the behavior of those
01:10:00.640 | methods also matters.
01:10:02.440 | And so here we're going to switch domains and talk about shapes.
01:10:04.760 | And so I have shape -- all shapes have to have a draw method, right?
01:10:08.120 | This is super easy.
01:10:09.760 | And what I can do is I can define an octagon and tell it how to draw.
01:10:12.360 | I can define a diamond, tell it how to draw using exactly the same stuff that we just
01:10:17.840 | saw before.
01:10:18.840 | Really easy.
01:10:19.840 | And the cool thing about this is now I can define a method, refresh, and now refresh,
01:10:23.480 | all it does is it clears the canvas and draws the shape.
01:10:26.360 | And so all shapes will get a refresh method.
01:10:28.800 | So if you go do a tab completion on your octagon, it all just works.
01:10:32.440 | But what happens if you have something else with the draw method?
01:10:35.400 | So cowboys know how to draw.
01:10:37.280 | It's a very different notion of what drawing is, right?
01:10:40.920 | You don't want cowboys to have a refresh method.
01:10:44.040 | It doesn't make sense to clear the screen and then pull out a gun.
01:10:49.080 | That's not what we're talking about here.
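A sketch of the shapes example: conforming to Shape is an explicit opt-in, so a Cowboy with a draw method never picks up refresh:

```swift
protocol Shape {
    func draw()
}

extension Shape {
    // Every Shape gets refresh() for free.
    func refresh() {
        print("clearing the canvas")
        draw()
    }
}

struct Octagon: Shape {
    func draw() { print("drawing eight sides") }
}

struct Cowboy {
    // Same method name, totally different meaning -- and no Shape conformance,
    // so no refresh().
    func draw() { print("reaching for the holster") }
}

Octagon().refresh()    // works
// Cowboy().refresh()  // error: Cowboy has no refresh
```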
01:10:50.600 | And so the idea of protocols is really, again, to categorize and describe groups of types.
01:10:56.180 | And one of the things you'll see, which is kind of cool, is you can define a protocol
01:10:59.160 | with nothing in it.
01:11:01.240 | So it's a protocol that doesn't require anything.
01:11:03.380 | And then you go say, I want that type, that type, that type, that type to be in this group.
01:11:08.400 | And now I will have a way to describe that group of types.
01:11:11.760 | So it can be totally random, whatever makes sense for you.
01:11:14.680 | And then you can do reflection on it.
01:11:17.040 | You can do lots of different things that now apply to exactly that group of types.
01:11:20.280 | And I actually found, I still find that this kind of protocol-based programming approach
01:11:25.800 | is like the exact upside down opposite of how I've always thought about things.
01:11:31.600 | It's kind of like you don't create something that contains these things, but you kind of
01:11:34.720 | like, I don't know, somehow shove things in.
01:11:37.320 | And the more I've looked at code that works this way, the more I realize it tends to be
01:11:44.000 | clearer and more concise.
01:11:45.640 | But I still find it a struggle because I just don't have that sense of this is how to go
01:11:51.600 | about creating these kinds of APIs.
01:11:54.480 | And one of the things you'll notice is that we added this protocol to array in an extension.
01:11:59.680 | So unlike interfaces in a Java or C# type of language, we can take somebody else's type
01:12:04.800 | and then make it work with the protocol after the fact.
01:12:08.360 | And so I think that's a superpower here that allows you to work with these values in different
01:12:12.520 | ways.
01:12:13.520 | So this is a super brief, high-level view of protocols.
01:12:17.480 | Protocols are really cool in Swift, and they draw in a lot of great work in the Haskell
01:12:20.720 | and other communities.
01:12:22.400 | There's a bunch of talks, and even Jeremy wrote a blog post that's really cool that
01:12:25.720 | talks about some of the fun things you can do.
01:12:29.400 | - Won't extensions make code hard to read?
01:12:31.760 | Because once a functionality of a particular API or class is extended in this way, you
01:12:35.760 | won't know if the functionality is coming from the original class or from somewhere
01:12:39.400 | else.
01:12:40.400 | - Yeah, so that's something you let go of when you write Swift code.
01:12:43.840 | And there's a couple of reasons for that, one of which is that you get good ID support.
01:12:47.880 | And so, again, we're kind of building some parts of this airplane as we fly it, but in
01:12:53.080 | Xcode, for example, you can click on a method and jump to the definition, right?
01:12:57.560 | And so you can say, well, okay, here's a map on array.
01:13:01.360 | Where does map come from?
01:13:02.520 | Well, map isn't defined on array.
01:13:04.720 | Map, filter, reduce.
01:13:05.720 | Those aren't defined on array.
01:13:06.720 | Those are actually defined on sequence.
01:13:09.160 | And so all sequences have map, filter, reduce and a whole bunch of other stuff.
01:13:13.360 | And so arrays are, of course, sequences, and so they get all that behavior.
01:13:17.960 | And so the fact that it's coming out of sequence as a Swift programmer, particularly when you're
01:13:21.680 | starting, doesn't really matter.
01:13:22.680 | It's just good functionality.
01:13:23.680 | - And actually, you know, we've had this same discussion around Python, which is like, oh,
01:13:27.300 | Jeremy imports star, and therefore I don't know where things come from, because the only
01:13:30.840 | way I used to know where things come from is because I looked at the top of a file and
01:13:33.640 | it would say, from blah, import foo.
01:13:36.440 | And so I know foo comes from blah.
01:13:38.200 | And we had that whole discussion earlier lesson where I said, that's not how you figure out
01:13:41.560 | where things come from.
01:13:42.560 | You learn to use jump-to-symbol in your IDE, or you learn to use Jupyter Notebook's ability
01:13:47.680 | to show you where things come from.
01:13:51.280 | That's just the way to go.
01:13:53.720 | - Thank you.
01:13:55.320 | I feel that Scala is a very nicely designed language that, to my knowledge, doesn't lack in
01:14:00.760 | terms of the features I've seen so far in Swift.
01:14:03.080 | Is that true?
01:14:04.080 | And if so, is the choice of Swift more about JVM as opposed to non-JVM runtimes and compilers?
01:14:09.840 | - Yeah, so Scala is a great language.
01:14:12.560 | The way I explain Scala is that they are very generous in the
01:14:18.480 | features they accept.
01:14:23.440 | They're undergoing a big redesign of the language to kind of cut it down and try to make the
01:14:27.680 | features more sensible and stack up nicely together.
01:14:31.600 | Swift and Scala have a lot of similarities in some places and they diverge wildly in
01:14:34.600 | other places.
01:14:35.600 | - I mean, I would say there's a, you know, I feel like anybody doing this course understands
01:14:39.760 | the value of tasteful curation because PyTorch is very tastefully curated and TensorFlow might
01:14:46.400 | not be.
01:14:48.440 | And so like using a tastefully curated, carefully put together API like Swift has and like PyTorch
01:14:55.040 | has, I think it makes life easier for us as data scientists and programmers.
01:14:58.920 | - Yeah, but I think the other point is also very good.
01:15:02.040 | So Scala is very strong in the JVM, Java, virtual machine ecosystem and it works very
01:15:07.600 | well with all the Java APIs and it's great in that space.
01:15:11.480 | Swift is really great if you don't want to wait for JVM to start up so you can run a
01:15:15.800 | script, right?
01:15:16.800 | And so there's a nice duality, and they have different strengths and weaknesses in that sense.
01:15:20.960 | - Do we have time before our break that I can quickly show how this all goes together?
01:15:24.920 | - I probably can't stop you even if I wanted to.
01:15:29.040 | - So just to come back to this, right, you can basically see what's happened here.
01:15:37.400 | We have defined this protocol saying these are the things that we want to have in a data blocks
01:15:41.400 | API and then we said here is a specific example of a data blocks API.
01:15:46.680 | Now at this point we're missing one important thing which we've never actually created the
01:15:51.800 | bit that says this is how you open an image and resize it and stuff like that, right?
01:15:57.600 | So we just go through and we can say let's call .download, let's call .get items.
01:16:04.200 | We can create nice simple little functions now.
01:16:06.440 | We don't have to create complex class hierarchies to say things like tell me about some sample
01:16:11.720 | and it prints it out, right?
01:16:14.440 | And we can create a single little function which creates a train and a valid.
01:16:20.200 | This is neat, right?
01:16:21.200 | This is something I really like about this style of programming is this is a named tuple.
01:16:26.560 | And I really like this idea that we don't have to create our own struct in class all
01:16:30.040 | the time.
01:16:31.040 | It's kind of a very functional style of programming where you just say I'm just going to define
01:16:35.880 | my type as soon as I need it and this type is defined as being a thing with a train and
01:16:39.800 | a thing with a valid.
01:16:40.940 | So as soon as I wrap parentheses around this, it's both a type and a thing now.
01:16:46.760 | And so now I can partition into train and valid and that's returned something where
01:16:52.120 | I can grab a random element from valid and a random element from train.
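A small sketch of using a named tuple as a lightweight, on-the-spot type (illustrative names, not the notebook's exact code):

```swift
// The return type (train: [T], valid: [T]) is both a type and a value shape,
// declared right where it's needed -- no struct or class required.
func partitionIntoTrainValid<T>(_ items: [T], isTraining: (T) -> Bool)
    -> (train: [T], valid: [T]) {
    var train: [T] = []
    var valid: [T] = []
    for item in items {
        if isTraining(item) { train.append(item) } else { valid.append(item) }
    }
    return (train: train, valid: valid)
}

let split = partitionIntoTrainValid([1, 2, 3, 4]) { $0 % 2 == 0 }
print(split.train, split.valid)   // [2, 4] [1, 3]
```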
01:16:57.720 | We can create a processor, again it's just a protocol, right?
01:17:01.400 | So remember a processor is a thing like for categories, creating a vocab of all of the
01:17:08.120 | possible categories and so a processor is something where there's some way to say like
01:17:13.000 | what is the vocab and if you have a vocab then process things from text into numbers
01:17:20.040 | or deprocess things from numbers into text.
01:17:23.160 | And so we can now go ahead and create a category processor, right?
01:17:27.280 | So here's like grab all the unique values and here's label to int and here's int to
01:17:32.160 | label.
01:17:33.160 | So now that we have a category processor we can try using it to make sure that it looks
01:17:52.080 | sensible and we can now label and process our data.
01:17:58.480 | So we first have to call label and then we have to call process.
01:18:03.880 | Given that we have to do those things in a row, rather than creating whole new API functions
01:18:09.720 | we can actually just use function composition.
01:18:12.920 | Now in PyTorch we've often used a thing called compose but actually it turns out to be much
01:18:19.480 | easier as you'll see if you don't create a function called compose but you actually
01:18:23.400 | create an operator.
01:18:25.520 | And so here's an operator which we will call compose, right?
01:18:29.240 | Which is just defined as first call this function f and then call this function g on whatever
01:18:35.800 | the first thing you passed it is.
01:18:37.520 | So now we've defined a new composed function which first labels and then processes.
01:18:44.360 | And so now here's something which does both and so we can map, right?
01:18:49.940 | So we don't have to create again all these classes and special purpose functions.
01:18:55.440 | We're just putting together function composition and map to label all of our training data
01:19:02.040 | and all of our validation data.
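A sketch of a compose operator like the one being described (the operator symbol here is just illustrative):

```swift
// "first f, then g" as a single function.
infix operator >>> : AdditionPrecedence

func >>> <A, B, C>(_ f: @escaping (A) -> B, _ g: @escaping (B) -> C) -> (A) -> C {
    return { x in g(f(x)) }
}

let addOne = { (x: Int) in x + 1 }
let double = { (x: Int) in x * 2 }
let addThenDouble = addOne >>> double
print(addThenDouble(3))   // 8
```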
01:19:04.640 | And so then finally we can say well this is the final structure we want.
01:19:08.920 | We want a training set, we want a validation set and let's again create our own little
01:19:13.440 | type in line, right?
01:19:15.360 | So that's an array of tuples.
01:19:16.760 | Yeah, so our training set is an array of named tuples, our validation set is an array of named
01:19:20.720 | tuples and so we're gonna initialize it by passing both in.
01:19:24.400 | And so this basically is now our data blocks API.
01:19:27.840 | There's a function called make split labeled data and we're just gonna pass in one of those
01:19:35.800 | configuration protocols we saw.
01:19:37.680 | So we're gonna be passing in the image net configuration protocol, the thing that conforms
01:19:44.000 | to that protocol.
01:19:45.000 | And we're gonna be passing in some processor, right, which is gonna be a category processor
01:19:49.800 | and it's gonna sort of call download, get the items, partition, map, label of, and then
01:19:56.880 | initialize the processor state and then do label of and then process is our processing
01:20:02.760 | function and then map that, right?
01:20:06.120 | And so that's it.
01:20:07.940 | So now we can say to use this with OpenCV, we define how to open an image.
01:20:14.520 | There it is.
01:20:15.520 | We define how to convert BGR to RGB, cuz OpenCV uses BGR, that's how old it is.
01:20:21.680 | We define the thing that resizes to 224 by 224 with bilinear interpolation.
01:20:26.400 | And so the process of opening an image is to open, then BGR to RGB, and then resize and
01:20:32.680 | we compose them all together, and that's it, right?
01:20:36.600 | So now that we've got that, we then need to convert it to a tensor.
01:20:42.500 | So the entire process is to go through all those transforms and then convert to a tensor.
01:20:48.040 | And then, I'll skip over the bit that does the mini batches.
01:20:54.160 | There's a thing we've got to do the mini batches with that split label data we created, and
01:20:59.040 | we then just pass in the transforms that we want, and we're done, right?
01:21:04.840 | So the data blocks API in kind of functional-ish, protocol-ish, Swift, ends up being a lot less
01:21:13.120 | code to write and a lot easier for the end user.
01:21:17.760 | Cuz now for the end user, there's a lot less they have to learn to use this data blocks API.
01:21:22.920 | It's really just like the normal kind of maps and function composition that hopefully they're
01:21:29.280 | familiar with as Swift programmers.
01:21:31.640 | So I'm really excited to see how this came out, because it solves the problems that I've
01:21:38.760 | been battling with for the last year with the Python data blocks API, and it's been really
01:21:44.700 | just a couple of days of work to get to this point.
01:21:47.240 | >> And one of the things that this points to in Swift that is a big focus is on building
01:21:51.400 | APIs.
01:21:52.640 | And so, again, we've been talking about this idea of being able to take an API, use it
01:21:57.720 | without knowing how it works.
01:21:58.720 | It could be in C or Python or whatever, but it's about building these things that compose
01:22:03.320 | together and they fit together in very nice ways.
01:22:05.960 | And with Swift, you get these clean abstractions.
01:22:09.960 | So once you pass in the right things, it works.
01:22:12.280 | You don't get the stack trace coming out of the middle of somebody else's library that
01:22:15.760 | now you have to figure out what you did somewhere along the way that caused it to break, at least
01:22:20.680 | not nearly as often.
01:22:21.680 | >> So to see what this ends up looking like, I've created a package called data block.
01:22:25.040 | It contains two files: it's got a Package.swift and it's got a main.swift.
01:22:29.680 | And main.swift is that, right?
01:22:32.200 | So all that in the end to actually use it, that's how much code it is to use your data
01:22:37.280 | blocks API and grab all the batches.
01:22:40.200 | So it comes out super pretty.
01:22:42.880 | So let's take a five-minute break and see you back here at 8:05.
01:22:49.040 | Okay, so we're gradually working our way back to what we briefly saw last week, notebook 11,
01:23:02.080 | training ImageNet, and we're gradually making our way back up to hit that point again.
01:23:08.240 | It's a bit of a slow process because along the way we've had to kind of invent, float,
01:23:12.640 | and learn about a new language and stuff like that.
01:23:14.680 | But we are actually finally up to zero to a fully connected model, believe it or not.
01:23:20.200 | And the nice thing is at this point, things are going to start looking more and more familiar.
01:23:26.760 | One thing I will say though that can look quite unfamiliar is the amount of typing that
01:23:33.440 | you have to type with Swift, but there's actually a trick, which is you don't have to type all
01:23:39.880 | these types.
01:23:40.880 | >> You don't have to type types.
01:23:42.640 | >> What you can actually do is you can say, oh, here's the type I use all the time, tensor,
01:23:48.840 | Tensor<Float>, and I don't like writing angle brackets either.
01:23:52.320 | So let's just create a typealias called TF.
01:23:55.760 | And now I just use tf everywhere.
01:23:58.000 | Now to be clear, a lot of real Swift programmers in their production code might not like doing
01:24:06.840 | that a lot.
01:24:07.840 | And personally I do do that a lot, even not in notebooks.
01:24:12.080 | But you might want to be careful if you're doing actual Swift programming.
01:24:17.600 | >> The way I would look at it is if you're building something for somebody else to use,
01:24:20.920 | if you're publishing an API, you probably don't want to do that.
01:24:23.280 | >> Yeah.
01:24:24.280 | >> But if you're hacking things together and you're playing and having fun, it's no problem
01:24:27.160 | at all.
01:24:28.160 | >> Yeah.
01:24:29.160 | I mean, different strokes.
01:24:30.160 | I personally, I would say if I'm giving somebody something that's the whole thing, tensor floats,
01:24:34.000 | I would do it.
01:24:35.240 | But anyway, in a notebook, I definitely don't want to be typing that.
01:24:38.200 | So in a notebook, make it easier for your interactive programming by knowing about things
01:24:42.800 | like type alias.
01:24:43.800 | >> Yeah.
01:24:44.800 | That's something we also want to make better just in general so that these things all just
01:24:47.200 | default to float.
01:24:48.200 | >> Yeah.
01:24:49.200 | >> You don't have to worry about it.
01:24:50.200 | >> That'll be nice.
01:24:51.600 | So then we can write a normalized function that looks exactly the same as our Python
01:24:56.560 | normalized function.
01:24:57.680 | And we can use mean and standard deviation just like in Python.
01:25:01.800 | And we can define tests with asserts just like in Python.
01:25:06.480 | So this all looks identical.
01:25:08.040 | We can calculate n and m and c, the same constant, the variables that we used in Python in exactly
01:25:13.760 | the same way as Python.
01:25:17.000 | We can create our weights and biases just like in Python, except there's a nice kind
01:25:24.640 | of rule of thumb in the Swift world, which is any time you have a function that's going
01:25:32.200 | to create some new thing for you, we always use the init constructor for that.
01:25:37.880 | So for example, generating random numbers and dumping them into a tensor, that's constructing
01:25:43.520 | a new tensor for you.
01:25:44.960 | So it's actually -- you're actually calling tensorflow.init here.
01:25:50.980 | And so if you're trying to find where is it in an API that I get to create something in
01:25:56.840 | this way, you should generally look for the -- in the init section.
01:25:59.440 | So this is how you create random numbers in Swift for TensorFlow.
01:26:02.680 | This is how you create tensor of zeros in Swift for TensorFlow.
01:26:06.260 | So here's our weights and biases.
01:26:08.120 | This is all the same stuff we just basically copied and pasted it from the PyTorch version
01:26:13.040 | with some very, very minor changes.
01:26:15.720 | We use a linear function, except rather than at, we use dot, because that's what they use
01:26:23.000 | in Swift for TensorFlow.
01:26:24.600 | If you're on a Mac, that's option eight.
01:26:27.080 | If you're on anything else, it's compose key dot equals.
01:26:32.480 | And so now we can go ahead and calculate linear functions.
01:26:36.160 | We can calculate relu, exactly the same as PyTorch.
01:26:45.600 | We can do proper Kaiming init, exactly like PyTorch.
01:26:45.600 | And so now we're at the point where we can define the forward pass of a model.
01:26:50.280 | And this looks, basically, again, identical to PyTorch.
01:26:54.800 | A model can just be something that returns some value.
01:27:00.320 | So that -- the forward pass of our model really just builds on stuff that we already know
01:27:05.440 | about and it looks almost identical to PyTorch, as does a loss function, right?
01:27:11.360 | It looks a little bit different because it's not called squeeze.
01:27:13.760 | It's called squeezingShape.
01:27:14.760 | But other than that,
01:27:15.760 | mean squared error is the same as PyTorch as well.
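A hedged sketch of those forward-pass pieces, assuming import TensorFlow and the TF typealias, with matmul standing in for the • operator the notebook defines:

```swift
func lin(_ x: TF, _ w: TF, _ b: TF) -> TF {
    return matmul(x, w) + b
}

func myRelu(_ x: TF) -> TF {
    return max(x, TF(0))          // elementwise max against zero
}

func mse(_ out: TF, _ targ: TF) -> TF {
    return (out.squeezingShape(at: -1) - targ).squared().mean()
}
```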
01:27:18.500 | And so now here's our entire forward pass.
01:27:22.560 | So hopefully that all looks very familiar.
01:27:24.080 | If it doesn't, go back to 02 in the Python notebooks.
01:27:29.020 | And actually this is one of the tricks, like this is why we've done it this way for you
01:27:32.000 | all, is that we have, like, literally kind of these parallel texts, you know, there's
01:27:37.320 | a Python version, there's a Swift version, so you can see how they translate and see exactly
01:27:42.080 | how you can go from one language and one framework to another.
01:27:46.960 | That's all very well.
01:27:47.960 | But we also need to do a backward pass.
01:27:51.080 | So to do a backward pass, we can do it exactly the same way as, again, we did it in PyTorch.
01:27:56.480 | One trick we kind of -- Python hack we used in PyTorch.
01:28:00.040 | >> And so this is doing it the hard way.
01:28:01.360 | This is doing it all manually.
01:28:02.360 | >> Okay.
01:28:03.360 | >> Because we have to build it.
01:28:04.360 | >> Doing it all manually, yep, because we have to build everything in Scratch.
01:28:06.160 | And the PyTorch version, we actually added a .grad attribute to all of our tenses.
01:28:11.000 | We're not allowed to just throw attributes in arbitrary places in Swift, so we have to
01:28:14.760 | define a class which has the actual value and the gradient.
01:28:18.720 | But once we've done that, the rest of this looks exactly the same as the PyTorch version
01:28:28.200 | Here's our MSE grad, our ReLU grad.
01:28:28.200 | That's all exactly the same.
01:28:29.200 | In fact, you can compare here, right?
01:28:31.280 | Here's the Python version we created for LinGrad, here's the Swift version for LinGrad.
01:28:38.360 | It's almost identical.
01:28:39.360 | So now that we've done all that, we can go ahead and do our entire forward and backward
01:28:47.400 | pass and we're good to go.
01:28:51.800 | But it could be so much better.
01:28:55.760 | >> Well, you skipped past the big flashing red lights that says don't do this.
01:29:01.560 | Did you miss that part?
01:29:04.240 | >> Tell me about it.
01:29:05.240 | >> Oh, okay.
01:29:06.240 | So let's talk about this.
01:29:07.240 | We're defining a class and putting things in classes, and we haven't seen classes yet,
01:29:11.440 | at least not very much.
01:29:12.720 | >> That's true.
01:29:13.720 | Because before, we've used things that looked like classes, but they didn't say class on
01:29:17.640 | them.
01:29:18.640 | They said struct on them.
01:29:19.880 | >> Yes.
01:29:20.880 | And so what is that?
01:29:21.880 | >> That's true.
01:29:22.880 | >> So let's play a little game, and so let's talk about this idea of values and references,
01:29:30.120 | because that's what struct versus class really means in Swift.
01:29:34.040 | A value is a struct thing, and a reference is a class thing.
01:29:38.120 | So let's talk about Python.
01:29:40.600 | Here's some really simple Python code, and there's no tricks here.
01:29:43.140 | What we're doing is we're assigning four into A, we're copying A into B, we're incrementing
01:29:47.120 | A and printing them out.
01:29:48.720 | And so when you do this, you see that A gets incremented.
01:29:51.160 | Of course.
01:29:52.160 | B does not.
01:29:53.160 | Of course.
01:29:54.160 | This all makes perfect sense.
01:29:55.160 | In Swift, you do the same thing, you get the same thing out.
01:29:58.520 | This is how math works, right?
01:30:01.360 | All very straightforward.
01:30:02.760 | Let's talk about arrays.
01:30:04.000 | So here I have an array, or list, in Python, and I put it into X, and then I copy X into Y.
01:30:10.360 | I add something to X, and it has it.
01:30:12.400 | I append to it,
01:30:15.080 | and then it has the extra item.
01:30:16.240 | That makes perfect sense, right?
01:30:17.920 | What happens to Y?
01:30:20.400 | What?
01:30:21.800 | What just happened here?
01:30:22.800 | I just added something to X, and now Y changed?
01:30:25.880 | Now what is going on here?
01:30:28.920 | Well, we learn.
01:30:30.840 | We learn that there's this thing called a reference, and we learn that it does things
01:30:35.200 | like this, and we learn when it bites us.
01:30:37.800 | What happens in Swift?
01:30:38.800 | Well, Swift has arrays.
01:30:39.800 | It doesn't have lists the same way.
01:30:41.960 | And so here we have, again, this identical code except var.
01:30:46.080 | We put one and a two into X, we copy X into Y, we add something to X, we print it out,
01:30:51.320 | we get the extra element.
01:30:53.760 | But Y is correct.
01:30:55.760 | What just happened?
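The behavior being described, in a few lines:

```swift
// Swift arrays have value semantics: a copy is a copy.
var x = [1, 2]
let y = x          // y gets its own copy of the value
x.append(3)
print(x)           // [1, 2, 3]
print(y)           // [1, 2] -- unchanged
```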
01:30:57.640 | So this is something called value and reference semantics.
01:31:01.840 | And in Swift, arrays, dictionaries, tensors, like all these things have what's known as
01:31:06.600 | value semantics.
01:31:07.600 | And let's dive in a little bit about what that is.
01:31:10.020 | So a value in something that has value semantics is a variable that -- sorry, this is self-referential.
01:31:17.800 | When you declare something in your code, you're declaring a name.
01:31:20.840 | And if it's a name for a value, that name stands for that value, right?
01:31:25.720 | X stands for the array of elements that it contains.
01:31:30.640 | This is how math always works.
01:31:32.380 | This is what you expect out of basic integers.
01:31:34.340 | This is what you expect out of basic things that you interact with on a daily basis.
01:31:40.120 | Reference semantics are weird if you think about it.
01:31:41.940 | So what we're doing is we're saying that X is a name for something else.
01:31:47.540 | And so we usually don't think about this until it comes around to bite us.
01:31:51.340 | And so this is kind of a problem.
01:31:53.680 | And let's dive in a little bit to understand why this causes problems.
01:31:57.760 | So here's a function.
01:31:59.480 | It's a do thing.
01:32:00.480 | It's something that Jeremy wrote with a very descriptive name.
01:32:03.120 | And it takes T, and then it goes and updates this, and that's fine, right?
01:32:07.840 | It's super fast.
01:32:08.840 | Everything is good.
01:32:09.840 | You move on and put in a workbook, and then you build the next workbook.
01:32:13.320 | Next workbook calls in a do thing, and you find out, oh, well, it changed the tensor
01:32:18.800 | I passed in, but I was using that tensor for something else.
01:32:21.760 | And now I've got a problem because it's changing a tensor that I want to use.
01:32:25.800 | And now I've got this bug.
01:32:26.800 | I have to debug it.
01:32:27.800 | And I find out the do thing is causing the problem.
01:32:29.480 | And so what do I do?
01:32:31.000 | I go put a clone in there.
01:32:32.360 | I don't know who here adds clones in a principled way or who here--
01:32:37.280 | I do use it in a principled way.
01:32:38.760 | So what we do in fast AI is we kind of don't have clone.
01:32:41.600 | And then when things start breaking, I add more until things stop breaking, and then
01:32:45.480 | we're done.
01:32:46.480 | That sounds great.
01:32:47.480 | Yeah.
01:32:48.480 | So there's a lot of clone in fast AI.
01:32:49.920 | And yeah.
01:32:50.920 | That's a good principle.
01:32:51.920 | I see what you're going for.
01:32:52.920 | Possibly a few too many, or possibly a few too few.
01:32:55.800 | Well, so now think about this.
01:32:56.800 | What we have is we have a foot gun here in the first case.
01:32:59.640 | So something that's ready to explode if I use it wrong.
01:33:02.800 | Now I added clone.
01:33:03.800 | And so good news, it's correct but slow.
01:33:06.640 | So it's going to do that copy even if I don't need to, which is really sad.
01:33:11.680 | In Swift, things just work.
01:33:13.640 | You pass in a tensor.
01:33:15.040 | You can update it.
01:33:16.040 | You can return it.
01:33:17.040 | And it leaves the original one alone.
01:33:18.480 | Arguments in Swift actually even default to constants, which makes it so that you can't
01:33:22.360 | do that.
01:33:23.400 | If you do actually want to modify something in the caller, you can do that too.
01:33:27.760 | You just have to be a little bit more explicit about it and use this thing called in out.
01:33:31.440 | And so now if you want to update the thing somebody passed to you, that's fine.
01:33:36.040 | Just pass it in out and everything works fine.
01:33:38.400 | And on the call side, you pass it with this ampersand thing so that they know that it
01:33:42.240 | can change.
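(A small sketch of the inout mechanics being described, using a plain Int instead of a tensor to keep it minimal:)

```swift
func addOne(_ x: inout Int) {
    x += 1                 // allowed, because x is declared inout
}

var n = 3
addOne(&n)                 // the & at the call site signals that n may change
print(n)                   // 4
```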
01:33:43.240 | Now, what is going on here?
01:33:44.960 | So this is good math.
01:33:47.040 | This is like the correct behavior.
01:33:48.880 | But how does this work?
01:33:49.880 | Well, when we talk about names, we're talking about values.
01:33:53.080 | And so here I have a struct.
01:33:54.940 | This is a valuey thing.
01:33:56.480 | And so I say it has two fields, real and imaginary.
01:33:58.960 | And I define an instance of my complex number here named X.
01:34:04.560 | And so this is saying I have X and it's a name for the value that has one and two in it.
01:34:10.960 | And so I introduce Y. Y is another instance of this struct.
01:34:17.080 | And so it also has a one and a two.
01:34:18.920 | And if I go and I copy it, then I get another copy.
01:34:22.640 | And if I change one, then I update just Ys.
01:34:26.120 | This is, again, the way things should work.
01:34:28.920 | And so this works with structs, this works with tuples, this works with arrays and dictionaries
01:34:35.120 | and all that kind of stuff.
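(A sketch of the struct being walked through; the field names follow the talk and the values are illustrative:)

```swift
struct Complex {
    var real: Float
    var imaginary: Float
}

var x = Complex(real: 1, imaginary: 2)
var y = x                  // an independent copy of the value
y.real = 10
print(x.real)              // 1.0 -- x is untouched
print(y.real)              // 10.0
```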
01:34:36.520 | How do references work?
01:34:37.520 | Well, references, the name here.
01:34:39.760 | I have a class and the class has a string and it has an integer.
01:34:43.040 | And so somewhere in memory, there is a string and there is an array and they're stuck together,
01:34:47.860 | just like with the struct.
01:34:48.860 | But now when I say X, X is actually a reference or a pointer or an indirection.
01:34:54.520 | The reason for that is because you wrote class instead of struct.
01:34:58.380 | So by writing class, you're saying when you create one of these things, please create
01:35:02.760 | a reference, not a value.
01:35:04.600 | Yes, that's exactly right.
01:35:06.120 | And now what happens with references is you now get copies of that reference.
01:35:11.680 | And so when I copy X into Y, just like in PyTorch or Python, I have another reference
01:35:17.760 | or another pointer to the same data.
01:35:20.460 | And so that's why when you go and you update it, so I'm going to go change the array through
01:35:24.480 | Y, it's also going to change the data that you see through X.
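(And a sketch of the reference version written with class; the exact field types on the slide aren't recoverable from the transcript, so a String and an Int array are assumed:)

```swift
class Thing {
    var name: String
    var values: [Int]
    init(name: String, values: [Int]) {
        self.name = name
        self.values = values
    }
}

let x = Thing(name: "a", values: [1, 2])
let y = x                  // copies the reference, not the data
y.values.append(3)
print(x.values)            // [1, 2, 3] -- x sees the change, since both names point at one object
```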
01:35:29.880 | And so in Swift, you have a choice.
01:35:32.460 | And so you can declare things as classes and classes are good for certain things and they're
01:35:38.080 | important and valuable and you can subclass them and classes are nice in various ways.
01:35:42.840 | But you have a choice and a lot of things that you've seen are all defined as structs
01:35:45.760 | because they have much more predictable behavior and they stack up more correctly.
01:35:48.700 | So in this case, you know, I was trying to literally duplicate a Python/PyTorch API.
01:35:56.040 | And so I just found I wasn't able to unless I used class.
01:36:01.120 | And then you kind of said, okay, well, that's how you do it.
01:36:03.960 | Yeah, and we'll get back to auto diff in a second.
01:36:06.040 | But don't do it that way.
01:36:07.040 | Yeah.
01:36:08.040 | And so you can absolutely do that.
01:36:09.040 | And again, when you're learning Swift, it's fine, just reach for the things that are familiar
01:36:12.160 | and then you can learn as you go.
01:36:13.920 | That's perfectly acceptable.
01:36:15.760 | But here we're trying to talk about things Swift is doing to help save you and make your
01:36:19.640 | code more efficient and things like that.
01:36:21.280 | And I still reach for class a lot.
01:36:25.360 | But then every time a real Swift programmer takes my thing that had class and replaces
01:36:29.520 | it with something more Swift-y, it ends up being shorter and easier to understand.
01:36:35.160 | And so I agree, go for it, get things working with class.
01:36:39.520 | But when it becomes time, start to work with this and look at it and figure out how it
01:36:44.960 | works.
01:36:45.960 | Now, there's one thing that's really weird here.
01:36:46.960 | And if you remember last time, the first thing I told you about was var and let, right?
01:36:51.860 | And what is going on here?
01:36:53.600 | This does not make any sense.
01:36:55.920 | We've got Y, and now we are updating a thing
01:37:03.640 | in Y even though Y is a constant.
01:37:07.320 | And what does that even mean?
01:37:09.360 | Well, the reason here is that the thing that is constant is this reference.
01:37:15.840 | And so we've made a new copy of the reference, but we're allowed to change the thing it points
01:37:19.560 | to because we're not changing X or Y itself.
01:37:25.440 | So now this does make sense, but how do let and var work?
01:37:30.200 | Well, this is a thing that comes back to the mutation model in Swift.
01:37:33.200 | And I'll go through this pretty quickly.
01:37:35.000 | This is not something you have to know.
01:37:37.240 | But let's say I have a complex number and it's a struct and I say, hey, this thing is
01:37:41.760 | a constant.
01:37:43.040 | I want to go change it, right?
01:37:45.640 | That's not supposed to work.
01:37:46.640 | What happens?
01:37:47.640 | Well, if you try to do that, Swift will tell you, ha ha, you can't do that.
01:37:50.680 | You can't use plus equals on the real that's in C1 because C1 is a let.
01:37:55.920 | And Swift is helpful.
01:37:57.040 | And so it tries to lead you to solving a problem that says, hey, by the way, if you want to
01:38:01.400 | fix this, you want to make it go away, just change let to var and then everything is good.
01:38:05.640 | That's totally fine.
01:38:06.840 | Now, okay, fine.
01:38:08.320 | Well, maybe I really do want to change it.
01:38:10.600 | And so what I'm going to do is I'm going to get a little bit trickier and I'm going to
01:38:12.840 | define this extension.
01:38:13.840 | I'm going to add a method increment to my complex number.
01:38:18.320 | I'm going to increment inside the method and then call the method.
01:38:21.920 | Can I get away with that?
01:38:24.200 | Well, you know, these things may be in different files.
01:38:27.360 | The compiler may only be able to see one or the other.
01:38:30.040 | And so if you run this, it has no idea whether increment is going to change that thing, right?
01:38:36.280 | And so what the compiler does is say, ah, well, you can't increment real inside of this
01:38:42.280 | increment method either because it says self is immutable.
01:38:48.080 | And it says Mark method mutating to make self mutable.
01:38:51.520 | Now the thing to think about in methods, both in Python, but also in Swift is that they
01:38:55.900 | have a self and in Python, you have to declare it.
01:39:00.040 | Swift has it too.
01:39:01.040 | It just, it's just not making you write it all the time because that would be annoying.
01:39:06.320 | And so when you declare a method on a struct, what you do is you're getting self and it's
01:39:12.480 | a copy of the struct.
01:39:15.280 | Okay.
01:39:16.560 | Now what this is saying is this is saying that, hey, you're actually changing self.real.
01:39:22.520 | Self is constant.
01:39:23.520 | And so you can't do that here, but what you can do is you can mark it mutating.
01:39:27.200 | And so what that looks like is that says we can mark this function as mutating.
01:39:32.080 | And what that does is it says our self is now one of these in out things, the in out
01:39:36.920 | thing that allows us to change it in the caller.
01:39:39.440 | And because it's now mutating, it's totally fine to change it.
01:39:42.840 | That's no big deal.
01:39:43.840 | The compiler leads you to this and shows you what to do, but now we come back to this problem
01:39:49.200 | over here.
01:39:50.200 | We say, well, we have a constant.
01:39:51.200 | We're calling increment.
01:39:52.200 | How does that work?
01:39:53.200 | Well, it still doesn't.
01:39:54.200 | The compiler will tell you, hey, you can't do that.
01:39:56.240 | You can't mutate C1.
01:39:59.000 | And now it knows the increment can change it.
01:40:01.560 | And so it says really, really, really, if you want to do this, go mark C1 as a var.
01:40:05.640 | And Jeremy would say, just mark everything as a var because that's how he is.
01:40:10.040 | And so the nice thing about this, though, is it all stacks up nicely and it all works.
01:40:15.000 | And this is what allows -- this is kind of the underlying mechanics that allow the value
01:40:19.400 | stuff to work.
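(The whole mutating-method flow in one small sketch, reusing the Complex struct assumed above:)

```swift
struct Complex {
    var real: Float
    var imaginary: Float
}

extension Complex {
    mutating func increment() {   // mutating: self is effectively passed inout
        real += 1
    }
}

var c1 = Complex(real: 1, imaginary: 2)   // must be var; with let, the call below won't compile
c1.increment()
print(c1.real)                            // 2.0
```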
01:40:20.400 | Now, you may be wondering, how is this efficient?
01:40:24.200 | So we were talking about in the PyTorch world, you end up copying all the time, even if you
01:40:28.040 | don't end up needing it.
01:40:29.600 | In Swift, we don't want to do all those copies.
01:40:32.040 | And so on the other hand, we don't want to be, like, always copying.
01:40:36.200 | So where do the copies go and how does that work?
01:40:38.880 | So if you're using arrays, or arrays of arrays of arrays of dictionaries of arrays of super
01:40:43.200 | nested things, what ends up happening is that arrays are a struct, you might be surprised.
01:40:49.640 | And inside of that struct, it has a pointer or a reference.
01:40:54.060 | And so the elements of an array are actually implemented with the class.
01:40:57.160 | And so what I have here is I have A1, which is some array, and I copied it to A2, and
01:41:01.560 | I copied it to A3, I copied it to A4 because I'm passing it all around, I'm just passing
01:41:04.800 | this array around, no big deal, and what happens is I'm just copying this reference and it
01:41:09.360 | happens to be buried inside of a struct.
01:41:11.560 | And so this passing around arrays, full value semantics, super cheap, no problem, it's not
01:41:15.720 | copying any data, it's just passing the pointer around, right, just like you do in C or even
01:41:20.960 | in Python.
01:41:23.080 | The magic happens when you go and you say, okay, well, I've now got A4, and so all these
01:41:27.280 | things are all sharing this thing, I'm going to add one element to A4, well, what happens?
01:41:31.760 | Well, first thing that happens is append is a mutating method, and so it says, hey, I'm
01:41:38.320 | this thing called a copy-on-write type, and so I want to check to see if I'm the only
01:41:42.480 | user of this data.
01:41:44.880 | And it turns out no, lots of other things are pointing to our data here, and so lazily,
01:41:51.600 | because it's shared, I'll make a copy of this array.
01:41:54.260 | And so I only get a copy of the data if it's shared and if it changes.
01:41:58.440 | >> So that should be one, two, three, 22?
01:42:00.640 | >> Yeah, that should be one, two, three, 92.
01:42:04.040 | I am buggier than Swift.
01:42:07.080 | Now the interesting thing about this is because of the way this all works out is if you go
01:42:09.840 | and you change A4 again, it goes and just updates it in place, there's no extra copy.
01:42:14.800 | And so the cool thing about this is that you get exactly the right number of copies and
01:42:19.880 | it just works, you as a programmer don't have to think about this.
01:42:23.320 | This is one of the things that Swift is just, like, subtracting from your consciousness
01:42:27.760 | of the things that you have to worry about, which is really nice.
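(A sketch of copy-on-write as it looks from the outside; the comments mark where the copy happens, following the explanation above:)

```swift
var a1 = [1, 2, 3]
var a2 = a1        // no element data copied yet; both names share the same storage
var a3 = a1        // still shared
a3.append(4)       // append is mutating and the storage is shared, so a copy is made lazily here
a3.append(5)       // storage is now uniquely owned, so this mutates in place with no extra copy
print(a1)          // [1, 2, 3]
print(a2)          // [1, 2, 3]
print(a3)          // [1, 2, 3, 4, 5]
```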
01:42:31.000 | And so a really nice aspect of this is that you get algebra, like, values work the way
01:42:36.640 | values are supposed to work, you get super high performance, we get to use more emojis,
01:42:41.280 | which I always appreciate.
01:42:43.360 | If you want to learn more about this, because this is also a really cool, deep topic that
01:42:47.640 | you can geek out about, particularly if you've done object-oriented programming before, there's
01:42:50.640 | a lot that's really nice about this, there's a video you can see more.
01:42:53.960 | So let's go back to that auto-diff thing, and let's actually talk about auto-diff from a
01:42:58.800 | different perspective.
01:43:00.360 | So this is the auto-diff system implemented the same way as the manually done PyTorch version,
01:43:08.800 | and we didn't like it because it was using references.
01:43:12.080 | Let's implement, again, the very low-level manual way in Swift, but before we do, let's
01:43:17.480 | talk about where we want to get to.
01:43:19.760 | So Swift has built-in, and Swift for TensorFlow has built-in automatic differentiation for
01:43:24.960 | your code.
01:43:25.960 | So you don't have to write gradients manually, you don't have to worry about all this stuff.
01:43:28.920 | And the way it works is really simple.
01:43:30.880 | There are functions like gradient, and you call gradient, and you pass it a closure,
01:43:36.000 | and you say, what is the gradient of x times x?
01:43:39.680 | And it gives you a new function that computes the gradient of x times x, and here we're
01:43:42.720 | just calling that function on a bunch of numbers that we're striding over and printing them
01:43:47.200 | out, and it just gives you this gradient of this random little function we wrote.
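(A sketch of that call using the gradient(of:) function from the Swift for TensorFlow toolchains of this era; treat the exact overload and the stride bounds as assumptions:)

```swift
import TensorFlow

let dydx = gradient(of: { (x: Double) in x * x })   // a new function that computes 2x
for x in stride(from: 0.0, to: 3.0, by: 0.5) {
    print(x, dydx(x))
}
```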
01:43:52.800 | Now, one of the interesting things about this is, I wrote this out, it takes just doubles
01:43:56.920 | or floats or other things like that.
01:43:59.160 | Auto-diff in Swift works on any differentiable type, anything that's continuous, anything
01:44:03.760 | that's not like integers, anything that has a gradient.
01:44:07.520 | And so --
01:44:08.520 | >> So you can't do this in just a library.
01:44:11.620 | This has to be built into the language itself, because you're actually -- you're just -- you're
01:44:16.680 | literally compiling something that's multiplying doubles together, and it has to figure out
01:44:20.680 | how to get gradients out of that.
01:44:21.680 | >> Yeah.
01:44:22.680 | You can do things as a library, and that's what PyTorch and other frameworks do in Python.
01:44:25.960 | >> Well, PyTorch --
01:44:26.960 | >> But it doesn't work the same way at all.
01:44:27.960 | >> And PyTorch will not do that on doubles.
01:44:29.840 | PyTorch --
01:44:30.840 | >> Oh, yes.
01:44:31.840 | That is true.
01:44:32.840 | >> It's only using that on tensors?
01:44:33.840 | >> Yes.
01:44:35.840 | And so this doesn't just work on doubles.
01:44:37.320 | If you want to define quaternions or other cool numeric scientific-y things that are continuous,
01:44:43.880 | those are differentiable, too, and that all stacks out and works.
01:44:46.800 | So the -- there's a bunch of cool stuff that works this way.
01:44:51.040 | You can define a function.
01:44:52.640 | You can get the gradient at some point with the function.
01:44:55.080 | You can pass enclosures.
01:44:56.080 | Like, all this stuff is really nice.
01:44:58.800 | Instead of talking about that, we're going to do the -- from the bottom-up thing.
01:45:03.160 | And so I'm going to pretend I understand calculus for a minute, which is sad.
01:45:08.080 | So if you think about what differentiation is, computing the derivative of a function,
01:45:15.000 | there's two basic things you have to do.
01:45:16.600 | You have to know the axioms of the universe, like, what does -- what is the derivative
01:45:20.360 | of plus or multiply or sine or cosine or tensor or matmul.
01:45:27.160 | Then you have to compose these things together, and the way you compose them together is this
01:45:30.680 | thing called the chain rule, and this is something that I relearned, sadly, over the last couple
01:45:36.080 | of weeks.
01:45:37.080 | >> But we did in the Python part of this course.
01:45:39.880 | >> Yes.
01:45:40.880 | >> And we wrote it a different way.
01:45:45.160 | We had dy/dx equals dy/du times du/dx.
01:45:45.160 | >> Yeah, apparently there's some ancient feud between the people who invented calculus independently,
01:45:49.520 | and they could not agree on notation.
01:45:52.800 | So what this is saying is this is saying, if you want the derivative of f calling g,
01:45:57.560 | the derivative of f calling g is the derivative of f applied to the forward version of g multiplied
01:46:03.160 | by the derivative of g.
01:46:04.880 | And this is important because this is actually computing the forward version of g in order
01:46:09.840 | to get the derivative of this, and so --
01:46:12.240 | >> Which we kind of hid away in our dy du du dx version.
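(Written out, the two notations are the same statement of the chain rule:)

$$\frac{d}{dx}\,f(g(x)) = f'(g(x))\cdot g'(x), \qquad\text{equivalently}\qquad \frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx} \;\;\text{where } y = f(u),\ u = g(x).$$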
01:46:15.640 | >> Do you want to make the final motion?
01:46:17.720 | >> Oh, sure.
01:46:18.720 | I don't know how to do it on your machine.
01:46:20.720 | Oh, there you go.
01:46:24.080 | So how are we going to do this?
01:46:25.080 | Well, what we're going to do is we're going to look at defining the forward function of
01:46:27.880 | this, and so we'll use the mean squared error as the example function.
01:46:34.000 | This is a little bit more complicated than I want, and so what I'm going to do is I'm
01:46:36.560 | going to actually just look at this piece here, and so I'm going to define this function
01:46:39.720 | MSE inner, and all it is is it's the dot squared dot mean.
01:46:44.280 | So it's conceptually this thing, MSE inner, that just gets the square of x and then does
01:46:49.720 | the mean just because that's simpler, and then we'll come back to MSE at the end.
01:46:53.800 | And so in order to understand what's going on, I'm going to define this little helper function
01:46:57.360 | called trace, and all trace does is it -- you can put it in your function, and it uses this
01:47:02.480 | little magic thingy called pound function, and when you call trace, it just prints out
01:47:06.880 | the function that it's called from.
01:47:09.240 | And so here we call foo, and it prints out, hey, I'm in foo AB, and I'm in bar X, and
01:47:14.960 | so we'll use that to understand what's happening in these cells.
01:47:18.280 | So here I can define, just like you did in the PyTorch version, the forward and the derivative
01:47:25.680 | versions of these things, and so X times X is the forward, the gradient version is two
01:47:29.960 | times X. X dot mean is the forward, this weird thing of doing a divide is apparently the
01:47:36.440 | gradient of mean, and I checked it, it apparently works, I don't know why.
01:47:42.360 | So then when you define the forward function of this MSE inner function, it's just saying
01:47:45.800 | give me the square and take the mean, super simple, and then we can use the chain rule,
01:47:51.040 | and this is literally where we use the chain rule to say, okay, we want the gradient of
01:47:54.560 | one function on another function, just like the syntax shows, and the way we do that is
01:47:58.840 | we get the gradient of mean and pass it to the inner thing and multiply it by the gradient
01:48:03.680 | of the other thing.
01:48:05.320 | So this is really literally the math interpretation of this stuff.
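(A sketch of those hand-written forward/gradient pairs and their chain-rule composition; the function names are illustrative rather than the notebook's:)

```swift
import TensorFlow

typealias TF = Tensor<Float>

// The "atoms": forward functions and their gradients.
func square(_ x: TF) -> TF { return x * x }
func squareGrad(_ x: TF) -> TF { return 2 * x }

func mean(_ x: TF) -> TF { return x.mean() }
func meanGrad(_ x: TF) -> TF {
    return TF(ones: x.shape) / Float(x.scalarCount)   // 1/n at every element
}

// MSE-inner: forward, and its gradient via the chain rule.
func mseInner(_ x: TF) -> TF { return mean(square(x)) }
func mseInnerGrad(_ x: TF) -> TF {
    return meanGrad(square(x)) * squareGrad(x)        // note: square(x) gets recomputed here
}
```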
01:48:09.640 | And given that we have this, we can now wrap it up into more functions, and we can say,
01:48:14.680 | let's compute the forward and the backwards version of MSE, we just call the forward version,
01:48:18.420 | we call the backward version, and then we can run on some example data, one, two, three,
01:48:22.760 | four.
01:48:23.760 | >> And just to be clear, the upside down triangle thing is not an operator here, it's just using
01:48:28.440 | inner code as part of the name of that function.
01:48:31.140 | >> That's a gradient delta symbol thingy, I found that on Wikipedia.
01:48:39.480 | So when you run this, what you'll see is it computes the forward version of the saying,
01:48:43.200 | it runs square and then it runs mean, and then it runs square again, and then it runs
01:48:47.480 | the backward version of mean and square, and this makes sense given the chain rule, right?
01:48:50.920 | We have to recompute the forward version of square to do this, and for this simple example,
01:48:55.640 | that's fine, square is just one multiply, but consider it might be a multiply of megabytes
01:49:01.420 | worth of stuff.
01:49:02.680 | It's not necessarily cheap, and when you start composing these things together, this recomputation
01:49:07.260 | can really come back to bite you.
01:49:09.260 | So let's look at what we can do to factor that out.
01:49:12.760 | So there's this pattern called chainers, and what we call the value and chainer pattern.
01:49:18.960 | And what we want to do is we want to find each of these functions, like square or mean
01:49:24.080 | or your model, as one function that returns two things.
01:49:29.480 | And so what we're going to do is we're going to look at the other version of calculus's
01:49:33.620 | form of this, and so when you say that the derivative of x squared is 2x, you actually
01:49:39.280 | have to move the dx over with it, and this matters because the functions we just defined
01:49:46.600 | are only valid if you're looking at a given point; they're
01:49:50.320 | not valid if you compose them with another function.
01:49:52.440 | This is just another way of writing the chain rule.
01:49:54.340 | It's the exact same thing, and so we're going to call this the gradient chain, and all it
01:49:59.620 | is is an extra multiply.
01:50:01.120 | And Chris, I just need to warn you, in one of the earlier courses, I got my upside-down
01:50:05.880 | triangles mixed up as you just did, so the other way round is delta, and this one is
01:50:10.920 | called nabla, and I only know that because I got in trouble for screwing it up from everywhere
01:50:14.320 | last time.
01:50:15.320 | Okay.
01:50:16.320 | Thank you, Jeremy, for saving me.
01:50:18.480 | So all this is, is the same thing we saw before, it just has an extra multiplication there
01:50:22.760 | because that's what the chain rule apparently really says.
01:50:25.760 | So what we can do now is, now that we have this, we can actually define this value with
01:50:29.760 | chain function, and check this out.
01:50:32.600 | What this is doing is it's wrapping up both of these things into one thing.
01:50:35.880 | So here we're returning the value, when you call this, we're also returning this chain
01:50:40.760 | function.
01:50:41.760 | Can you just explain this TF arrow, TF, how do I read that, TF arrow, TF?
01:50:48.640 | So what this is doing is this is saying we're defining a function, squareVWC, it takes
01:50:53.320 | X, it returns a tuple, we know what tuples are.
01:50:57.720 | These are fancy tuples, like you were showing before, where the two things are labeled.
01:51:02.240 | So there's a value member of the tuple, and there's a chain label of the tuple.
01:51:05.840 | The value is just a tensor float.
01:51:08.160 | The chain is actually going to be a closure.
01:51:09.960 | And so this says it is a closure that takes a tensor of float and returns a tensor of
01:51:14.620 | float.
01:51:15.620 | So that's just a way of defining a type in Swift where the type is itself a function.
01:51:21.400 | And so squareVWC is going to be two things: the forward thing,
01:51:26.360 | the multiply, X times X, and the backwards thing, the thing we showed just up above that
01:51:31.440 | does ddx times two times X.
01:51:34.120 | And the forward thing is the actual value of the forward thing, the backward thing is
01:51:37.480 | a function that will calculate the backward thing.
01:51:39.840 | And the chain here is returning a closure, and so it's not actually doing that computation.
01:51:44.800 | So we can do the same thing with mean, and there is the same computation.
01:51:47.960 | And so now what this is doing is it's a little abstraction that allows us to pull together
01:51:52.240 | the forward function and the backward function into one little unit.
01:51:57.200 | And the reason why this is interesting is we can start composing these things.
01:52:00.920 | And so this MSE inner thing that we were talking about before, which is mean followed by square,
01:52:05.400 | or square followed by mean, we can define, we just call square VWC and then we pass the
01:52:10.560 | value that it returns into the mean VWC.
01:52:13.960 | And then the result of calling this thing is mean.value, and the derivative is those
01:52:20.640 | two chains stuck together.
01:52:24.720 | And so if we run this, now we get this really interesting behavior where when we call it,
01:52:28.360 | we're only calling the forward functions once and the backward function once as well.
01:52:34.640 | And we also get the ability to separate this out.
01:52:36.660 | And so here what we're doing is we're calling the VWC for the whole computation, which gives
01:52:41.520 | us the two things.
01:52:43.360 | And here we're using the value.
01:52:44.800 | So we got the forward version of the value.
01:52:46.080 | And if that's all we want, that's cool, we can stop there.
01:52:49.640 | But we don't.
01:52:50.640 | We want the backward version too.
01:52:51.640 | And so here we call it what we call the chain function to get that derivative.
01:52:56.320 | And so that's what gives us both the ability to get the forward and the backward separate,
01:53:01.480 | which we need, but also it makes it so we're not getting the re-computation because we're
01:53:06.660 | reusing the same values within these closures.
01:53:10.020 | So given that we have these like infinitesimally tiny little things, let's talk about applying
01:53:14.480 | this pattern.
01:53:15.480 | I'll go pretty quickly because the details aren't really important.
01:53:17.740 | So ReLU is just max with zero.
01:53:20.040 | And so we're using the same thing as ReLU grad from before.
01:53:25.400 | Here's the LIN grad using the PyTorch style of doing this.
01:53:29.160 | And so all we're doing is we're pulling together the forward computation in the value thing
01:53:34.160 | here.
01:53:35.160 | And then we're doing this backward computation here.
01:53:37.280 | And we're doing this with closures.
01:53:38.280 | So can I just talk about this difference because it's really interesting because this is the
01:53:42.940 | version that Silva and I wrote when we just pushed it over from PyTorch.
01:53:49.600 | And we actually did the same thing that Chris just did, which is we avoided calculating
01:53:56.360 | the forward pass twice.
01:53:58.560 | And the way we did it was to cache away in in.grad and out.grad the intermediate values
01:54:08.760 | so that we could then use them again later without recalculating them.
01:54:12.760 | Now what Chris is showing you here is doing the exact same thing but in a much more automated
01:54:21.200 | way, right?
01:54:22.200 | It's a very mechanical process.
01:54:23.520 | Yeah.
01:54:24.520 | We're having to kind of use this kind of very heuristic, hacky, one at a time approach of
01:54:29.640 | saying what do I need at each point, let's save it away in something or give it a name
01:54:33.320 | and then we'll use it again later.
01:54:35.800 | It's kind of interesting.
01:54:36.800 | And also without any mutation, this functional approach is basically saying let's package
01:54:42.600 | up everything we need and hand it over to everything that needs it.
01:54:47.480 | And so that way we never had to say what are we going to need for later.
01:54:52.440 | It just works.
01:54:54.920 | You'll see all the steps are here out times blah dot transposed, out times blah dot transposed,
01:54:59.760 | right?
01:55:00.760 | But we never had to think about what to cache away.
01:55:03.960 | And so this is not something I would want to write ever again, manually, personally.
01:55:11.480 | But the advantage of this is it's really mechanical and it's very structured.
01:55:16.120 | And so when you write MSE, the full MSE, what we can do is we can say, well, it's that subtraction,
01:55:21.400 | then it's that dot squared dot mean, and then on the backwards pass we have to undo the
01:55:26.280 | squeeze and the subtraction thingy.
01:55:28.360 | And so it's very mechanical how it plugs together.
01:55:31.280 | Now we can write that forward and backward function and it looks very similar to what
01:55:34.400 | the manual version of the PyTorch thing looked like where you're calling these functions
01:55:38.920 | and then in the backward version you start out with one because the gradient of the loss
01:55:43.520 | with respect to itself is one, which now I understand, thanks to Jeremy.
01:55:49.040 | And then they chain it all together and you get the gradients.
01:55:51.800 | And through all of this work, again, what we've ended up with is we've gotten the forward
01:55:55.680 | and backwards pass, we get the gradients of the thing, and now we can do optimizers and
01:55:59.920 | apply the updates.
01:56:00.920 | Now the --
01:56:01.920 | >> I just want to mention something, like what Chris was saying about this one thing here and so
01:56:06.600 | forth.
01:56:07.600 | For Chris and I, we took a really long time to get to this point and we found it extremely
01:56:15.120 | difficult and at every point up until the point where it was done, we were totally sure
01:56:19.520 | we weren't smart enough to do it.
01:56:22.080 | And so like, please don't worry that there's a lot here and that you might be feeling the
01:56:27.880 | same way Chris and I did, but yeah, you'll get there, right?
01:56:32.840 | >> This was a harrowing journey.
01:56:35.160 | >> Yeah.
01:56:36.160 | It's okay if this seems tricky, but just go through each step one at a time.
01:56:40.120 | So again, this is talking about the low-level math-y stuff that underlies calculus.
01:56:45.440 | And so the cool thing about this, though, from the Swift perspective is this is mechanical.
01:56:51.120 | And compilers are good at mechanical things.
01:56:53.120 | And so one of the things that we've talked about a lot in this course is the idea of
01:56:59.000 | their primitives, they're the atoms of the universe and then they're the things you build
01:57:02.440 | out of them.
01:57:03.440 | And so the atoms of the universe for tensor, the atoms of the universe for float, we've
01:57:06.800 | seen, right?
01:57:07.800 | And so we've seen multiply and we've seen add on floats.
01:57:10.880 | Well, if you look at the primitives of the universe for tensor, they're just methods
01:57:15.000 | and they call the raw ops that we showed you last time, right?
01:57:18.240 | And so if you go look at the TensorFlow APIs, what you'll see is those atoms have this thing
01:57:26.440 | that Swift calls VJPs, for weird reasons.
01:57:30.960 | This defines exactly the mechanical thingy that we showed you.
01:57:34.720 | And so the atoms know what their derivatives are and the compiler doesn't have to know
01:57:39.240 | about the atoms, but that means that if you want to, you can introduce new atoms.
01:57:43.280 | That's fine.
01:57:45.380 | The payoff of this now, though, is you don't have to deal with any of this stuff.
01:57:49.000 | So that's the upshot.
01:57:51.040 | What I can do is I can define a function.
01:57:53.320 | So here's MSE inner and it just does dot squared dot mean.
01:57:58.440 | And I say make it differentiable and I can actually get that weird thing, that chainer
01:58:04.040 | thing directly out of it and I can get direct low-level access if for some reason I ever
01:58:08.880 | wanted to.
01:58:11.000 | Generally you don't and that's why you say give me the gradient or give me the value
01:58:14.040 | and the gradient.
01:58:15.880 | And so this stuff just works.
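(Sketch of the payoff: mark the function @differentiable and ask for the value and gradient directly. The exact valueWithGradient overload is assumed from the Swift for TensorFlow toolchains of this era:)

```swift
import TensorFlow

@differentiable
func mseInner(_ x: Tensor<Float>) -> Tensor<Float> {
    return x.squared().mean()
}

let input: Tensor<Float> = [1, 2, 3, 4]
let (loss, grad) = valueWithGradient(at: input, in: mseInner)
print(loss)   // the forward value
print(grad)   // d(loss)/d(input), i.e. 2 * input / n
```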
01:58:17.600 | And the cool thing about this is this all stacks up from very simple things and it composes
01:58:22.120 | in a very nice way.
01:58:24.680 | And if you want to, you can now go hack up the internals of the system and play around
01:58:29.520 | with the guts and it's exposed and open for you.
01:58:32.440 | But if you're like me, at least, you would stay away from it and just write math.
01:58:36.440 | Well, I mean, sometimes we do need it, right?
01:58:38.300 | So you'll remember when we did the heat maps, right, those heat maps, we actually had to
01:58:45.880 | dive into the registering a backward callback in PyTorch and grab the gradients and then
01:58:53.320 | use those in our calculations and so there's plenty of stuff we come across where you actually
01:58:57.520 | need to come up with this.
01:58:59.000 | Yeah, and there are some really cool things you can do too.
01:59:01.800 | So now we ended up with a model and so this is something that I had never got around to
01:59:08.440 | doing with Fixme.
01:59:09.440 | So here's our forward function.
01:59:10.440 | Here we're implementing it with matmuls and with the lin function, the ReLUs and things
01:59:14.120 | like that.
01:59:16.160 | The bad thing about defining a forward function like this is you get tons of arguments to
01:59:19.240 | your function.
01:59:20.360 | And so some of these arguments are things that you want to feed into the model.
01:59:23.280 | Some of these things are parameters and so as a refactoring, what we can do is we can
01:59:27.440 | introduce a struct, you might be surprised, that puts all of our parameters into it.
01:59:31.720 | So here we have my model and we're saying it is differentiable and what differentiable
01:59:35.700 | means is it has a whole bunch of floating point stuff in it and I want to get the gradient
01:59:40.880 | with respect to all of these.
01:59:44.280 | So now I can shove all those arguments into the struct, it gives me a nice capsule to
01:59:48.640 | deal with and now I can use the forward function on my model.
01:59:53.160 | I can declare it as a method.
01:59:54.800 | This is starting to look nicer, this is more familiar and I can just do math and I can
01:59:59.040 | use w1 and b1 and these are just values defined on our struct.
02:00:03.920 | Now I can get the gradient with respect to the whole model and our loss.
02:00:09.120 | And all of this is building up on top of all those different primitives that we saw before
02:00:13.680 | that we, and the chain rule and all these things, that now we can say hey, give us the
02:00:20.920 | gradient of the model with respect to x-train and y-train and we get all the gradients of
02:00:26.360 | w1, b1, w2, b2 and all this stuff works.
02:00:29.960 | You can see it all calling the little functions that we wrote and it's all pretty fast.
02:00:35.640 | Now again, like we were just talking about, this is not something you should do for matmul
02:00:40.460 | or convolution, but there are reasons why this is cool and so there are good reasons
02:00:45.240 | and there are annoying reasons, I guess.
02:00:47.480 | So sometimes the gradients you get out of any auto-diff system will be slow because you
02:00:52.800 | do a ton of computation and it turns out the gradient ends up being more complicated and
02:00:58.920 | sometimes you want to do an approximation.
02:01:00.800 | And so it's actually really nice that you can say hey, here's the forward version of
02:01:03.840 | this big complicated computation, I'm going to have an approximation that just runs faster.
02:01:08.740 | Sometimes you'll get numerical instabilities in your gradients and so you can define, again,
02:01:12.120 | a different implementation of the backwards pass, which can be useful for exotic cases.
02:01:18.680 | There are some people on the far research side of things that want to use learning and
02:01:22.000 | things like that to learn gradients, which is cool.
02:01:25.360 | And so having the system where everything is just simple and composes but is hackable
02:01:29.960 | is really nice.
02:01:31.560 | There are also always going to be limitations of the system.
02:01:35.640 | Now one of the limitations that we currently have today, which will hopefully be fixed
02:01:39.720 | by the time the video comes out, is we don't support control flow and auto-diff.
02:01:43.440 | And so if you do an if or a loop like an RNN, auto-diff will say I don't support that yet.
02:01:49.940 | But that's okay because you can do it yourself.
02:01:54.520 | So we'll go see an example of that in 11.
02:01:59.640 | There we go.
02:02:01.680 | And so what we have implemented here, and we'll talk about layers more in a second,
02:02:07.120 | is we have this thing called switchable layer.
02:02:09.000 | And what switchable layer is, is it's just a layer that allows us to have a Boolean toggle
02:02:15.520 | to turn it on and off.
02:02:17.200 | And the on and off needs an if.
02:02:20.760 | And so Swift auto-diff doesn't currently support if.
02:02:24.160 | And so when we define the forward function, it's super easy.
02:02:26.360 | We just check to see if it's on, and if so, we run the forward, otherwise we don't.
02:02:31.080 | Because it doesn't support that control flow yet, we have to write the backwards pass manually.
02:02:35.560 | And we can do that using exactly all the stuff that we just showed.
02:02:38.480 | We implement the value, and we implement the chainer thing.
02:02:42.040 | And we can implement it by returning the right magic set of closures and stuff like that.
02:02:46.800 | And so it sucks that Swift doesn't support this yet.
02:02:50.240 | But it's an infinitely hackable system.
02:02:52.160 | And so for this or anything else, you can go and customize it to your heart's content.
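(The shape of that workaround, sketched without the toolchain-specific registration attribute; how the hand-written derivative gets hooked into auto-diff depended on the Swift for TensorFlow version, so only the value-plus-pullback function is shown, and the squaring layer is an assumption:)

```swift
import TensorFlow

// The forward pass needs an if, so the backward pass is written by hand,
// returning (value, pullback) exactly like the chainer pattern earlier.
func switchableForward(_ x: Tensor<Float>, isOn: Bool)
    -> (value: Tensor<Float>, pullback: (Tensor<Float>) -> Tensor<Float>) {
    if isOn {
        return (x * x, { v in v * 2 * x })   // pretend the wrapped layer squares its input
    } else {
        return (x, { v in v })               // switched off: identity, gradient passes straight through
    }
}
```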
02:02:56.800 | Yeah.
02:02:57.800 | And I mean, one of the key things here is that Chris was talking about kind of the atoms.
02:03:02.800 | And at the moment, the atoms is TensorFlow, which is way too big an atom.
02:03:08.420 | It's a very large atom.
02:03:10.260 | But at the point when we're kind of in MLIR world, the atoms are the things going on inside
02:03:16.440 | your little kernel DSL that you've written.
02:03:20.000 | And so this ability to actually differentiate on float directly suddenly becomes super important.
02:03:25.700 | Because it means that like, I mean, for decades, people weren't doing much researchy stuff
02:03:31.480 | with deep learning.
02:03:32.480 | And one of the reasons was that none of us could be bothered implementing a accelerated
02:03:38.720 | version of every damned, you know, CUDA operation that we needed to do the backward pass of
02:03:44.480 | and do the calculus, blah, blah, blah.
02:03:47.040 | Nowadays, we only work with a subset of things that like PyTorch and stuff already supports.
02:03:52.880 | So at the point where-- so this is the thing about why we're doing this stuff with Swift
02:03:58.360 | now is that this is the foundations of something that in the next year or two will give us
02:04:05.840 | all the way down infinitely hackable, fully differentiable system.
02:04:11.600 | Can we jump to layer really quick?
02:04:13.680 | So we've talked about MatMul, we've talked about Autodiff.
02:04:16.140 | Now let's talk about other stuff.
02:04:18.640 | So layers are now super easy, it just uses all the same stuff you've seen.
02:04:22.580 | And so if you go look at layer, it's a protocol, just like we were talking before.
02:04:26.400 | And layers are differentiable, like they contain bags of parameters, just like we just saw.
02:04:34.160 | The requirement inside of a layer is you have to have a call.
02:04:36.680 | So layers in Swift are callable, just like you'd expect.
02:04:40.320 | And they have-- they work with any type that's an input or output.
02:04:43.820 | And what layer says is the input and output types just have to be differentiable.
02:04:48.560 | And so layer itself is really simple.
02:04:50.200 | Yeah.
02:04:51.200 | And so underneath here, you can see us defining a few different layers.
02:04:54.360 | So for example, here is the definition of a dense layer.
02:04:58.600 | And so then now that we've got our layers and we've got our forward pass, that's enough
02:05:03.600 | to actually allow us to do mini-batch training.
02:05:05.880 | And I'm not going to go through all this in any detail, other than just to point out that
02:05:09.560 | you can see here is defining a model.
02:05:13.160 | And it's just a layer, because it's just a differentiable thing that has a call function.
02:05:16.240 | And you can call the model layer.
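(For concreteness, a sketch of a small model written as a Layer. The Dense initializer follows the Swift for TensorFlow library; the name of the call requirement changed across toolchains (call versus callAsFunction), so treat those details as assumptions:)

```swift
import TensorFlow

struct MyModel: Layer {
    var layer1 = Dense<Float>(inputSize: 784, outputSize: 50, activation: relu)
    var layer2 = Dense<Float>(inputSize: 50, outputSize: 10)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        return layer2(layer1(input))          // compose the layers in the forward pass
    }
}

var model = MyModel()
let logits = model(Tensor<Float>(randomNormal: [64, 784]))   // the model itself is callable
```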
02:05:19.080 | We can define log softmax.
02:05:20.640 | We can define negative log likelihood, log sum exp. Once we've done all that, we're allowed
02:05:26.880 | to use the Swift for TensorFlow version, because we've done it ourselves.
02:05:30.400 | And at that point, we can create a training loop.
02:05:32.240 | So we also define accuracy, just like we did in PyTorch, set up our mini-batches, just like
02:05:37.200 | we did in PyTorch.
02:05:40.440 | And at this point, we can create a training loop.
02:05:43.280 | So we just go through and grab our X and Y and update all of our things.
02:05:48.040 | You'll notice that there's no torch.no_grad here, and that's because in Swift, you opt
02:05:55.240 | into gradients, not out of gradients.
02:05:57.360 | So you wrap the stuff that wants gradients inside value with gradient.
02:06:01.960 | And there we go.
02:06:02.960 | So we've got a training loop.
02:06:04.600 | Now one really cool thing is that all of these things end up packaged up together, thanks
02:06:11.800 | to the layer protocol, into a thing called variables.
02:06:15.380 | And Layer is differentiable.
02:06:16.680 | Differentiable is also a protocol.
02:06:19.600 | Protocols have lots of cool stuff on them.
02:06:22.720 | So thanks to that, we don't have to write anything else.
02:06:25.640 | We can just say model.variables minus equals LR times grad, and it just works.
02:06:32.080 | Thanks to the magic of protocol extensions, our model got that for free, because we said
02:06:38.560 | it's a layer.
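(A sketch of the loop shape being described; `batches` (pairs of input and Int32 label tensors), the loss function, and the exact name of the parameter bundle are assumptions, with `variables` following the talk:)

```swift
let learningRate: Float = 0.5
for (xb, yb) in batches {
    // Opt in to gradients: only the closure passed to valueWithGradient is differentiated.
    let (loss, grads) = valueWithGradient(at: model) { model -> Tensor<Float> in
        let preds = model(xb)
        return softmaxCrossEntropy(logits: preds, labels: yb)
    }
    model.variables -= learningRate * grads   // protocol extensions on Layer/Differentiable make this work
    print(loss)
}
```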
02:06:40.480 | Okay, so I think that's about all we wanted to show there.
02:06:45.720 | So now that we've got that, we're actually allowed to use optimizers, so we can just
02:06:51.240 | use that instead.
02:06:52.560 | And that gives us a standard training loop, which we can use.
02:06:56.600 | And then on top of that, we can add callbacks, which I won't go into the details, but you
02:07:03.280 | can check it out in notebook 04.
02:07:05.800 | And you will find that, let's find them, here we go.
02:07:10.560 | We'll find a Learner class, which has the same callbacks that we're used to.
02:07:17.960 | And then, eventually, we'll get to the point where we've actually written a stateful optimizer
02:07:25.280 | with hyperparameters, again, just like we saw in PyTorch.
02:07:28.160 | And most of this will now look very familiar.
02:07:30.920 | We won't look at dictionaries now, but they're almost identical to PyTorch dictionaries,
02:07:34.880 | and we use them in almost the same way.
02:07:36.560 | So you see, we've got states and steppers and stats, just like in PyTorch.
02:07:41.160 | And so, eventually, you'll see we have things like the lamb optimizer written in Swift,
02:07:48.240 | which is pretty great.
02:07:49.480 | And it's the same amount of code.
02:07:51.880 | And things like square derivatives, we can use our nice little Unicode to make them a
02:07:56.200 | bit easier to see.
02:07:57.560 | And so now we have a function created SGD optimizer, a function to create an atom optimizer.
02:08:03.240 | We have a function to do one-cycle scheduling.
02:08:08.000 | And thanks to Matplotlib, we can check that it all works.
02:08:11.680 | >> Yep.
02:08:12.680 | >> So it's all there.
02:08:13.680 | >> So this is really the power of the abstraction, coming back to one of the earlier questions
02:08:16.240 | of earlier today we started in C. And we were talking about very abstract things like protocols
02:08:22.200 | and how things fit together.
02:08:24.160 | But when you get those basic things -- and this is one of the reasons why learning Swift
02:08:28.000 | goes pretty quickly -- you get the basic idea, and now it applies everywhere.
02:08:32.080 | Yeah, and here we are doing mix-up.
02:08:35.240 | And so now we're in 10.
02:08:39.000 | And here we are doing label smoothing.
02:08:41.840 | And to say it's really very similar-looking code to what we have in PyTorch.
02:08:46.640 | So then by the time we get to 11, other than this hacky workaround for the fact we can't
02:08:52.160 | do control flow differentiation yet, coming very soon, our XResNet, as you've seen, looks
02:08:59.460 | very similar, and we can train it in the usual way.
02:09:04.000 | And there we go.
02:09:05.800 | So we've kind of started with nothing.
02:09:10.120 | And Chris spent a couple of decades for us, first of all building a compiler framework
02:09:15.680 | and then a compiler and then a C compiler and then a C++ compiler and then a new language
02:09:19.000 | and then a compiler for the language.
02:09:20.760 | And then we came in and -- >> Let me correct you on one minor detail
02:09:24.800 | here.
02:09:25.800 | >> Some people helped you?
02:09:26.800 | >> I did not build all this stuff.
02:09:30.200 | Amazing people that I got to work with built all of this stuff.
02:09:35.240 | And likewise, all of these workbooks were built by amazing people that we were lucky
02:09:39.240 | enough to work with.
02:09:41.240 | >> Yeah, absolutely.
02:09:42.760 | So that's what happened.
02:09:44.520 | And then let's look at -- so it's kind of like, thanks to all that work, we then got
02:09:51.960 | to a point where, 18 months ago, you and I met, you just joined Google, we were at the
02:09:58.960 | TensorFlow symposium, and I said, what are you doing here?
02:10:02.840 | I thought, you're a compiler guy, and he said, oh, well, now I'm going to be a deep learning
02:10:07.680 | >> Well, deep learning sounds really cool.
02:10:09.440 | >> Yeah.
02:10:10.440 | >> He hadn't told me it was uncool yet.
02:10:12.120 | >> Yeah.
02:10:13.120 | So then I complained about how terrible everything was, and Chris said -- so Chris said, I've
02:10:19.040 | got to create a new framework.
02:10:20.040 | I was like, we need a lot more of a new framework, you know, I described the problems that we've
02:10:24.280 | talked about with, like, where Python's up to, and Chris said, well, I might actually
02:10:28.720 | be creating a new language for deep learning, which I was very excited about because I'm
02:10:33.240 | totally not happy with the current languages we have for deep learning.
02:10:37.600 | So then 12 months ago, I guess we started asking this question of, like, what if high-level
02:10:43.760 | API design actually influenced the creation of a differentiable programming language?
02:10:48.800 | What would that mean?
02:10:50.800 | >> And so to me, one of the dreams is when you connect the building of a thing with teaching
02:10:58.080 | of a thing with using a thing in reality.
02:11:02.240 | And one of the beautiful things about FastAI is pulling together, both building the framework,
02:11:07.200 | teaching the framework, and doing research with the framework.
02:11:10.120 | >> Yeah.
02:11:11.120 | So next time we caught up, I said, maybe we should try writing FastAI in Swift.
02:11:19.200 | And you're like, we could do that, I guess.
02:11:22.200 | I was like, great.
02:11:23.200 | >> Well, so I think the one thing before this, I'm like, hey, Jeremy, it's starting to work.
02:11:26.920 | >> Yeah.
02:11:27.920 | >> And he says, oh, cool, can we ship it yet?
02:11:30.120 | I'm like, it's starting to work.
02:11:32.120 | >> And it's a high-level API.
02:11:33.640 | >> Yes.
02:11:34.640 | >> So that's the course where we teach people to use this thing that doesn't exist yet.
02:11:39.600 | >> And I think I said naively, I like deadlines.
02:11:42.120 | Deadlines are a good thing.
02:11:43.120 | They force progress to happen.
02:11:45.560 | >> So then one month ago, we created a GitHub repo.
02:11:49.760 | And we put a notebook in it.
02:11:51.880 | And we got the last TensorFlow Dev Summit.
02:11:53.800 | We sat in a room with the Swift for TensorFlow team.
02:11:56.320 | And we wrote first line of the first notebook.
02:11:59.920 | And you told your team, hey, we're going to write all of the Python book from scratch.
02:12:04.620 | And they basically said, what have you gotten us into?
02:12:10.320 | And I think we've learned a lot.
02:12:11.760 | >> Yeah.
02:12:12.760 | So, I mean, to me, the question is still this, which is, what if high-level API design was
02:12:17.500 | able to influence the creation of a differentiable programming language?
02:12:20.640 | And I guess we started answering that question.
02:12:23.040 | >> Yeah.
02:12:24.040 | I don't think we're there yet.
02:12:25.040 | I mean, I think that what we've learned even over the last month is that there's still
02:12:27.880 | a really long way to go.
02:12:29.760 | And I think this is the kind of thing that really benefits from different kinds of people
02:12:32.940 | and perspectives and a different set of challenges.
02:12:36.760 | And just today and yesterday working on data blocks, a breakthrough happened where there's
02:12:42.120 | an entirely new way to reimagine it as this functional composition that solves a lot of
02:12:47.040 | problems.
02:12:48.040 | And a lot of those kinds of breakthroughs, I think, are still just waiting to happen.
02:12:51.200 | >> I mean, it's been an interesting process for me, Chris, because we decided to go back
02:12:55.240 | and redo the Python library from scratch.
02:12:59.600 | And as we did it, we were thinking, like, what would this look like when we get to Swift?
02:13:04.840 | And so even as we did the Python library, we created the idea of stateful optimizers.
02:13:10.380 | We created the new callbacks version too.
02:13:15.240 | So that was interesting.
02:13:16.240 | But that's also been interesting, I've seen, as an outsider from a distance, that Swift
02:13:21.000 | syntax seems to be changing thanks to some of this.
02:13:23.920 | >> Yeah, absolutely.
02:13:24.920 | So there are new features in Swift, including callables.
02:13:28.260 | That's a thing that exists because of Swift for TensorFlow.
02:13:31.720 | The Python interoperability, believe it or not, we drove that because it's really important
02:13:35.240 | for what we're doing.
02:13:36.240 | There's a bunch of stuff like that that's already being driven by this project, and
02:13:39.280 | I think there's going to be more.
02:13:40.440 | And so, like, making it so float can default away to nothing.
02:13:43.840 | That's really important.
02:13:44.840 | We have to do that.
02:13:45.840 | And otherwise, it wouldn't have been a priority.
02:13:47.840 | >> So I mean, so it's still really, really early days.
02:13:54.520 | And I think the question, in my mind, is now, like, what will happen when data scientists
02:14:01.400 | in lots of different domains have access to an infinitely hackable, differentiable language
02:14:07.840 | along with the world of all of the C libraries, you know, like, what do we end up with?
02:14:14.220 | Because we kind of -- we're starting from very little in terms of ecosystem, right?
02:14:19.400 | But, like, there are things in Swift -- we haven't covered, for example, something called
02:14:23.400 | key paths, but there's this thing called key paths, which might let us write, like, little query
02:14:28.520 | language DSLs in Swift with type safety.
02:14:32.320 | >> Yeah, give me all the parameters out of this thing and let me do something interesting
02:14:35.200 | to them.
02:14:36.200 | >> Yeah.
02:14:37.200 | >> It's really cool.
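(Key paths aren't covered in these lessons, but as a tiny taste of what's being referred to, with a made-up Point type:)

```swift
struct Point {
    var x: Float
    var y: Float
}

let path = \Point.x          // a typed, first-class reference to the x property
var p = Point(x: 1, y: 2)
p[keyPath: path] += 10       // read and write through the key path
print(p.x)                   // 11.0
```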
02:14:38.200 | >> And so, you know, I guess at this point, I'm kind of saying that people is like, pick
02:14:45.240 | some piece of this that might be interesting in your domain.
02:14:49.760 | And over the next 12 to 24 months, explore with us, so that, you know, as Chris said,
02:14:59.000 | putting this airplane together whilst it's flying, you know, by the time it's -- actually,
02:15:05.240 | all the pieces are together, you'll have your domain-specific pieces together, and I think
02:15:10.040 | it will be super exciting.
02:15:11.280 | >> And one of the things that's also really important about this project is it's not cast
02:15:15.240 | in concrete.
02:15:16.760 | So we can and we will change it to make it great.
02:15:19.320 | And to me, we're very much in the phase of let's focus on making the basic ingredients
02:15:25.600 | that everybody puts things together with; like, let's talk about what the core of Layer is.
02:15:30.120 | Let's talk about what data blocks should be.
02:15:31.740 | Let's talk about what all these basic things are.
02:15:34.640 | Let's not mess with float anymore.
02:15:35.640 | Let's go up a few levels.
02:15:36.640 | >> Float is done?
02:15:37.640 | >> Yeah, let's -- we can consider float done.
02:15:40.760 | But let's actually really focus on getting these right so that then we can build amazing
02:15:45.800 | things on top of them.
02:15:47.000 | And to me, the thing I'm looking forward to is just innovation.
02:15:51.520 | Innovation happens when you make things that were previously hard accessible to more people,
02:15:55.800 | and that's what I would just love to see.
02:15:57.280 | >> So the thing I keep hearing is, how do I get involved?
02:16:00.380 | So like, I think there's many places you can get involved, but like, to me, the best way
02:16:06.240 | to get involved is by trying to start using little bits of this in work that you're doing
02:16:12.480 | or utilities you're building or hobbies you have, you know, just try -- you know, it's
02:16:18.500 | not so much how do I add some new custom derivative thing into Swift and TensorFlow, but it's
02:16:25.560 | like, you know, implement some notebook that didn't exist before or take some Python library
02:16:30.400 | that you've liked using and try and create a Swift version.
02:16:33.360 | >> Try something or write a blog post.
02:16:35.640 | So one of the things when Swift first came up is that a lot of people were blogging about
02:16:39.360 | their experiences and what they learned and what they liked and what they didn't like,
02:16:42.920 | and that's an amazing communication channel because the team listens to that, and that's
02:16:46.400 | a huge feedback loop because we can see somebody was struggling about it, and even over the
02:16:51.040 | last couple of weeks, when Jeremy complains about something, we're like, oh, that is really
02:16:54.360 | hard.
02:16:55.360 | Maybe we should fix that, and we do change it, and then progress happens, right?
02:16:58.240 | >> Yeah.
02:16:59.240 | >> And so we want that feedback loop in blogs and other kinds of --
02:17:01.760 | >> Yeah, it's a very receptive community, very receptive team, for sure.
02:17:05.760 | Were there any highlight questions that you wanted to ask before we wrapped up, Rachel?
02:17:11.560 | Really?
02:17:12.560 | Okay.
02:17:13.560 | Well, I mean, let me say this, Chris.
02:17:17.240 | It's been an absolute honor and absolute pleasure to get to work with you and with your team.
02:17:23.940 | It's like a dream come true for me and to see what is being built here, and you're always
02:17:30.160 | super humble about your influence, but, I mean, you've been such an extraordinary influence
02:17:35.040 | in all the things that you've helped make happen, and I'm super thrilled for our little
02:17:41.940 | community that you've -- let us piggyback on yours a little bit.
02:17:47.040 | Thank you so much for this.
02:17:48.240 | >> Oh, and from my perspective, as a tool builder, tool builders exist because of users,
02:17:55.380 | and I want to build a beautiful thing, and I think everybody working on the project wants
02:17:58.520 | to build something that is really beautiful, really profound, that enables people to do
02:18:03.040 | things they've never done before.
02:18:04.040 | I'm really excited to see that.
02:18:05.200 | >> I think we're already seeing that starting to happen, so thank you so much, and thanks
02:18:08.280 | everybody for joining us.
02:18:09.280 | >> Thanks for having us.
02:18:10.280 | >> See you on the forums.
02:18:11.280 | [ Applause ]