
Lesson 13 (2019) - Basics of Swift for Deep Learning


Chapters

0:00
6:35 What's the future of fastai?
13:57 What is Swift? The claims
18:05 Swift for TensorFlow
22:06 S4TF and TensorFlow?
43:47 Compiler optimizations
47:14 Float in Swift

Transcript

Welcome everybody to lesson 13, also known as lesson six of part two, also known as lesson one of... never mind. The lesson in which we start talking about Swift. Before we do, I wanted to mention a couple of cool things from during the week, because lots of people are doing lots of cool things.

And in part two, I haven't done as much highlighting of these things, but I thought it'd be nice to look at a couple of cool examples from this week. Big congrats to Rob Gee, who said that 14 months ago he'd never done any machine learning or deep learning or Python or maths beyond high school, and he just competed in one of the top academic challenges for computer vision machine learning and came first and second in two of the three challenge tracks he entered.

So congrats, Rob, and I thought this is a great example of, like, you know, the kind of interesting things you can do because if you do an academic challenge, like this, and if you do well, like Rob, you actually get the opportunity, as he mentions here, to write a paper.

And so if you've never done an academic paper before, this is a great way to get an introduction. You kind of have a certain publishing venue, and you get a bit of an insight to the academic world, and I certainly found the same thing that Rob points out here, which is that when you actually submit a paper for the first time, you suddenly realize why so many academic papers aren't that useful because they focus on weird things.

So anyway, I thought this is a great result. Congratulations to Rob. I also feel like I have a kind of a regular gig in promoting the amazing work Elena Harley does because she does so much great work, but this is yet another great piece of work that she's done.

You'll remember from the part one genomic stuff, and this is nice because it's an example of looking at text data for the purpose of looking at genomic information, and I just love this example. And Elena has got a great walkthrough that you can find describing -- I mean, look how familiar this looks.

It's the exact steps that we've just taken, and one of the things I love is she's actually using whatever this version of fast AI is that we have in our X folder, so it's not even fast AI version one. It's the stuff that we've built from scratch, and so it's nice to see that used in practice. And not just used: not bad for a quickly thrown-together baseline.

It hits 15th out of 357 teams on this leaderboard, which she describes as not a bad starting point, so not a bad starting point at all. So that's very cool. So rewind, start of lesson eight, we said we're going to try and recreate fast AI and much of PyTorch from the foundations.

And 26 days ago, Sylvain and I started the same thing for Swift, except we actually had no ability to cheat because when we started, we really were starting with the foundations. There are no Swift data science modules, basically. But there is stuff, as you'll see, for creating tensors and random number generators, and you'll actually see we've been able to use Matplotlib, which might surprise you; we'll show you why and how.

So this is like what we're going to do over the next two lessons, is revisit this. Now, obviously, we're not going to go through every step in excruciating detail because as you'll see, the vast majority of it is, hey, this is almost identical to what we did in Python.

So what we're still going to do is dig in to the bits that show us something interesting or different, of which there will be many. But in the end, we're going to get here, which is, this is xresnet, is the res block from xresnet, and this, believe it or not, is the Swift version.

Like, you almost can't tell it's different. So we're going to end up in this beautiful situation. This is the res block. Here's the xresnet itself. It's so concise. It's so familiar. Hopefully, it's going to make you feel pretty comfortable that this is where we're going. And what about how we go and get all the data in there?

We're going to have to deal with all the TensorFlow data APIs and learn all this stuff as well? Well, no. Here is the data approach that we're going to use. Data blocks API. So we're actually going to show you how to build the data blocks API in Swift for TensorFlow.

So, you know, three weeks ago, when we started digging into this, none of us knew if this would be possible. And I'm really thrilled to find that not only is it possible, but we end up with code that looks, you know, wonderfully familiar and has all the nice features that we've hopefully grown to love.

So to get to here, there's a lot of people I want to thank. In particular, Chris Lattner, who I still don't understand why he put his trust in us in this way. It seems like, you know, he has very strange judgment or something. Yeah. But given that he did, we felt that we had to make sure we don't screw it up.

So he's been fantastic. And then the whole Swift for TensorFlow team has actually made this project their number one priority. And this totally wouldn't have happened without everybody pulling together. Also in terms of bad judgment, Sylvain has, you know, obviously, we all know made the mistake of deciding to spend his time working with me on fast AI stuff, and a few weeks ago I said, guess what?

We're going to rebuild everything from scratch in Swift. And rather than running away screaming, he said, okay, when do we start? So thank you, Sylvain, and he has built nearly all of these notebooks in three weeks and learned Swift, which is not bad. Thanks, Alexis, for the value types discussion.

You'll see next week; it's super helpful. Pedro has built something which makes the Swift packaging system slightly less painful than it otherwise is, which is a lot. And also the whole San Francisco fast AI study group, who've given us lots of valuable feedback over the last few weeks. So what does this mean for fast AI?

There is a blog post we will link to, which many of you, I'm sure, have read, about why we're doing this crazy hare-brained Swift thing. But I particularly wanted to mention, like, two things. The first is we actually don't really have a choice, right? We used to work with TensorFlow and Keras, and we had to stop because we couldn't build for you and with you the things that we wanted to build together.

So luckily PyTorch had just come out, and we actually first started using PyTorch in this course, I think it was two weeks after the first pre-release of PyTorch. So doing ridiculous things ridiculously early is definitely part of our DNA. We were lucky that happened. But then, because we started using PyTorch in an earlier part two, just like we're doing for Swift now, when we got to the next part one, we thought, well, PyTorch is great, we want to use it all the time, but there's no way we can teach PyTorch to new practitioners.

So we created a whole new library called Fast AI, because we had to. And now we've hit the same point we were at when we first switched to PyTorch, which is we can't build with you the stuff we want to build together, right? We're hitting the boundaries. We want to create nice, regularized RNNs, for example.

We want to create batch norm layers which skip computing the statistics from time to time, for example. We want to create highly flexible on GPU augmentation systems. And we can do all these things in PyTorch, but they're slow enough that we kind of tend to choose not to. So we're hitting the limits.

So we actually need to do something else. So we need to start building the next thing. And we were very lucky again that Swift for TensorFlow appeared at just the right time. The second thing I mention is that not only will this not hurt Fast AI for PyTorch, but I'm confident it will make it better.

I find that when I work with new programming languages and in new environments, I learn new things. And I become a better developer. And we're already coming up with ideas that are going to help us actually make Fast AI for Python better. You have a question, Rachel? Yes. Why has Fast AI chosen Swift over Julia?

Well, because it's impractical deep learning for coders, and Julia is far too mature for such a thing. I mean, they would both be great choices. But I mean, a lot of it is just for me personally. I'm really interested in getting involved in something that has a huge amount of potential, but a little bit earlier.

So I can help guide it. I feel like I can be more part of it. I wanted to create something for our student community and our practitioner community where we could all kind of help be part of that guiding and development process. So that's one reason. Second reason is it's not just about the language, but it's about who's using it.

And Julia doesn't quite have that big customer that really makes sure that it goes from quite good to huge, whereas Swift for TensorFlow has Google. And so Google needs it to work. Google needs it to be hugely successful. So I feel that that's really good. I also feel like the stuff that I've talked to Chris Lattner about as to what's next for Swift goes beyond the stuff I've talked to the Julia core team about what they're doing.

And so to be clear, I've sat down with many of the Julia founding team members. And I think they're amazing. And I hope I get to spend time hacking stuff with them too. But I feel like the place we're heading with Swift is like another level again, as you'll see.

But I definitely would not say to anybody, don't touch Julia. I've actually sent invites to the forum to a large number of the Julia developers at their request to say, why don't you build fast AI with us for Julia too? So I hope that happens. Perhaps another question is, why not Python?

Because it's-- and for most people, the answer to the question of what do we do next is, given we're using Python, we fix the problems in certain ways. So for example, in the TensorFlow world, they're creating things like tf.function, which kind of allows them to connect up Python to the whole XLA environment, which you'll learn about.

In PyTorch, they're using JIT to kind of turn Python into C++. Why aren't we doing this? Why don't we just do the best with what we have? Because Python, to be clear, has a fantastic ecosystem. And we already all know it. And it kind of seems crazy to throw that all away.

I think you'll see the answer to this question in the next two weeks. But basically, it turns out, I'm pretty sure, that it's easier to pick the right language and compilation environment and write everything else on top of it from scratch than it is to write everything else on top of something that was not written for that in the first place and then try and madly patch up the thing underneath to make it work.

So things like the global interpreter lock-- everything we're doing has to be massively parallel. But Python is written so that no two things can happen at the same time. So these are things that are just incredibly, incredibly hard to work around. Or else with Swift, as you'll see, it's like the things we want is how it's designed.

So why not Python? Because we think that we can't keep going with Python. We were not the first people to think the existing languages are not good enough. There's actually somebody else who had that thought a few years ago. And he's so OCD that he actually decided to write his own language.

And he's so OCD that before that, he wrote his own compiler. Because they weren't good enough either. And so whilst it may be difficult to be around such an OCD person, we're all very thankful that these people exist because they create the things we have. Oh, and here's one now.

Chris, tell us about why you did this crazy thing. Thanks, Jeremy. I'm not the only crazy one, as you know. So let's talk about what Swift is. And then we'll kind of go very high level. And then we'll get down in the bits. So Swift is a lot of different things.

It's a programming language. If you go ask the marketing people what it is, it says-- they say things like, Swift defines away large classes of errors in programs. It talks about how fast it is. And it's optimized to get the most out of modern hardware. I think another aspect of it is that Swift is really ambitious.

And that is something that I think you'll see over the next two lessons, where Swift isn't just about solving a niche problem. It was not about, let's make iOS a little bit better. Swift was conceived as a full stack programming language that goes all the way from scripting down to low level systems programming.

So you can write boot loaders in it. Now, this is crazy. I'm not aware of any other language that's actually set out from the start to do that. But that's one of the reasons why it becomes really interesting and really good for this domain. Because you want very high level programming abstractions and a very nice and easy to use way of working with the language.

But you also want the ability to go down to the metal in the rare cases you need to. So there's some random guy on the internet that wrote a blog post about this. I highly recommend that. One of the things that he says is that it's-- the interesting thing about Swift is that you get expressivity, flexibility, tight code, you get safety, ease of use, and speed.

And it pulls it together in a unique way. That's a very insightful quote, Chris. I'd like to read more of that person's work. Yeah, he does some good things, too. He's a little bit crazy himself, too. So getting out of the marketing stuff, what is Swift really about?

Swift is a really young language. This is something I don't think that people realize. Jeremy, you like to point out that Python is 25 years old. Many of the systems we're working on are 30 years old. Yeah, I think Python might be 30. JavaScript might be 25. Java's about 25.

I mean, for me, I've never spent time writing such a young language before. I also don't really remember-- that's not quite true. I guess languages like JavaScript have developed quickly. But it's really unusual to have such a young language be so widely used. So there's definitely a feeling for me often of feeling like, oh, I'm using a language which lots of people use, yet somehow it still feels-- not everything's fully baked yet.

Yeah, well, it's kind of interesting because you have millions of programmers using it. But on the other hand, it can be changed. And so that's one of the things that in this project we'll talk about. Being able to do language-integrated autodiff is an example of the kind of thing that you can only do if, on a reasonable time scale, you can innovate and make new language features, get them merged in, and evolve quickly.

And that's one of the things that's made this whole project possible. Swift, from its roots, was designed for usability. And so it was designed for IDEs, and user experience, and code completion, and inline errors, and things like that. It has a refactoring engine, a bunch of stuff that modern languages have.

The other aspect of Swift that I think is really important is Swift is designed to be not weird. And so when you look at Swift code over the course of the lessons, it will look pretty familiar if you've used JavaScript. You've used lots of other languages. You've used Python.

It will look pretty similar in a lot of ways. And that's by design. A lot of languages start out trying to prove a point about something. And Swift was really designed to be-- there's lots of good ideas in the world. Let's take them together. And through a hardcore, intense design process, actually apply taste and try to come up with something that is well-considered and fits well together.

It reminds me of Perl, which Larry Wall developed and described as a Swiss army chainsaw. Swift has got a similar feel of trying to get the best bits, but it's much more curated and carefully designed than something like Perl. So it fits together. And so as we'll talk about, the whole team that built Swift originally was the team that built LLVM, and Clang, and things like this.

And so many languages were designed from a perspective of, we'll create a programming language and then figure out how to make it fast later. A lot of Swift was built from the beginning to be something that compilers can use, and humans too. So what is Swift for TensorFlow?

So here, we're here to rethink how deep learning systems work from the ground up, and where a lot of the systems today are constrained by being as much as you can get done in a Python library. Here, we're changing the language. We're changing the compiler. We're changing all the library stacks.

We're changing TensorFlow, which we'll talk about. There's a tremendous amount of stuff involved in this. And one of the things that's really cool about this, and one of the focuses, is that we want to make an open, hackable platform, where you can go and change anything you want in it, and you can experiment, research, explore, do lots of new kinds of things, because that's where science comes from.

Oh yeah, caveat. It's all broken. Fair? Yeah, nothing works, which is important. If you're going to be doing an impractical deep learning for coders, you wouldn't want to work with something that works. So yeah, Swift is very much not just for iOS programming. It's an incredibly powerful language. And all these people that are writing iOS applications can much more quickly become AI experts, because suddenly they're working with a language which is super cool.

And they help propel the ecosystem as well. So the things we'll be talking about across the lessons are the very big building blocks that are part of the Swift for TensorFlow project. So part of it is the Tensor API. We'll touch on that a little bit today. Python integration is a really big deal.

This is what gives you direct access to the entire ecosystem that you already know. Automatic differentiation, hugely important for ML. And Swift has a really cool, industry-leading approach to it stolen from Fortran. Jupyter, you'll see a lot of that today. So one of the things you'll notice is that a lot of what you see as a high-level programmer is very familiar.

And that's by design. And so this is an example layer built in Swift. This is an example layer built in PyTorch. And it looks basically the same. I mean, there's differences. And we're working to reduce the difference even more. We love all those floats. I mean, some of the differences are very nice, like the fact that you don't have to pass self all the time.

There are things where you're just like, oh, finally. So it's actually getting to the point where there's more boilerplate in the Python code, all the self.conv, and self.pool, and so on, and Swift gets rid of a lot of that. Yeah, so we're going to start very deep and very low level.

So I just want to give you a high-level view of what things look like and where we'll end up. And so this is a layer. And so in Swift, this is implemented with a struct. We'll talk about those a little bit later. And it says, I'm defining my model.

And it's a layer. We use layers just like you would normally. And so you have Conv2D, MaxPool, Flatten. Things are callable in Swift, and we use call instead of underbar-underbar-call. You'll see a lot fewer underbars. And otherwise, it looks basically the same. You just compose these things together.
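Roughly, the kind of model definition being shown looks like this. This is a sketch only: the Swift for TensorFlow API has changed since this lesson, so the type names, initializer parameters, and the name of the call method should be treated as illustrative rather than the notebook's exact code.

```swift
import TensorFlow

// A small model as a struct conforming to Layer. Shapes assume a
// 28x28 single-channel, MNIST-style input; all numbers are illustrative.
struct MyModel: Layer {
    var conv = Conv2D<Float>(filterShape: (5, 5, 1, 32), activation: relu)
    var pool = MaxPool2D<Float>(poolSize: (2, 2), strides: (2, 2))
    var flatten = Flatten<Float>()
    var dense = Dense<Float>(inputSize: 12 * 12 * 32, outputSize: 10)

    // @differentiable tells the compiler this function must be differentiable;
    // at the time this method was named `call`, later `callAsFunction`.
    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        return dense(flatten(pool(conv(input))))
    }
}
```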

One major difference is this differentiable thing. And you may be wondering, why do we have differentiable? Well, this is just telling the compiler that it should be able to differentiate this. And one of the cool things about compiler integration is that when you say, hey, compiler, give me the gradient of some function, in the happy path, when everything is good, it just does.

And that's what you'd expect. But the unhappy path matters as well. I don't know if anybody here makes mistakes. I do. And so one of the cool things about having proper language support is you can get an error message that says, hey, that function can't be differentiated. That's useful.

But you go farther. You say, oh, it can't be differentiated because integers, and this int cast you have, can't be differentiated. And then it goes even farther: well, it's actually several levels deep in function calls, and this is exactly the path, and this is exactly what went wrong.

And it's really cool to get this in your workbook without even having to run something. And so this is the kind of-- when you build things for IDEs and you build things for usability, you get really nice behavior that the compiler is helping you. So what is Swift for TensorFlow?

And how does it stack up? And how does it relate to TensorFlow? So TensorFlow, one way to think about classic TensorFlow is that you have a tremendous amount of infrastructure. TensorFlow has this really mature distribution, scale out, end-to-end training, inference with mobile devices, all this cool stuff in the ecosystem.

And then it has this thing called Python. I call it Python for TF. And Python for TF includes its auto-diff system. And then it has a bunch of APIs like Keras and Estimator and other things built on top. And so what we've done is we've built a parallel stack where we're using a lot of the same infrastructure underneath the covers.

And then we're building a new fast AI framework on top. Now, one of the things we'll talk about more later is that TensorFlow itself is undergoing radical change in the internals. And one example of this is the XLA compiler. And one of the things you'll find out is that TensorFlow is compiler-izing itself as new accelerators and new technologies come into play.

And so TensorFlow's internals are undergoing major changes, which is super exciting. So let's dive in to some code. Yeah. What's the roadmap relationship between Swift for TensorFlow and mainstream Swift? Will they eventually be the same thing? Yeah, that's a great question.

So right now, the Swift for TensorFlow project, you can think of it as like a dev branch. Actually, it is literally a dev branch on the Swift project. And so what we do is we build things like automatic differentiation. We bake them, we get experience. And then we propose them and merge them back in the mainline Swift language.

And a bunch of stuff has already been done that way. So the Python integration drove new language features in Swift. We propose them. We got them integrated into the mainline compiler. And now they're shipping. And iOS developers can use cool things because of Swift for TensorFlow. Yep. So let's dive in.

Now, I thought I would start with some really basic things to just introduce the language, just so you understand you have some context of how things work. And then we'll get into some more interesting stuff. So this is a Jupyter Notebook, just like you'd expect. This is Jeremy's Windows machine, which I will struggle to use.

Because I have not used Windows for a long time. It will be fine. It has a Mac keyboard on it, so it's super weird. It scrolls the wrong direction, but it will be great. So a lot of what Swift does looks very familiar to Python. So here I have some integers, six.

I have some math. I have print. It all works exactly the same as you'd expect in Python. One of the major differences in Swift is that you have this let and this var thing. And so let in Swift means constant. var means variable. And so it's super easy. And as Jeremy, I think, loves to say, in a workbook, just declare everything var, and then you don't have to worry about it.

And then you can change it however much you want. But you can see, if you declare a constant like pi-- because pi should not change, it makes sense to use let. If you try to change it, change a constant, you'll get this error message. And this error message says something like cell 5, line 1.

One of the cool things about Jupyter is if you hit Shift-L, you get line numbers in here. And here you can see it's referring to cell 5, line 1. And it says, hey, if you go to cell 3, line 2, up here, you can change this let into a var.
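In a cell, that exchange looks roughly like this (a minimal sketch; the exact error wording from the compiler will differ):

```swift
var x = 6          // a variable: you can reassign it later
x = x + 1          // fine

let pi = 3.14159   // a constant: the compiler won't let you change it
// pi = 3.0        // error: cannot assign to value: 'pi' is a 'let' constant,
//                 // with a note suggesting you change 'let pi' to 'var pi'
```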

And now your code will work. And so it's trying to help you out here. So that's something that you'll see. That's super awesome. And I'll just mention, for people watching that have a background in Swift programming, there's a tendency in that culture to keep things closed, keep things constant, keep things private.

And there's lots of good reasons for that. But when you're getting into this deep learning mode, we generally recommend flipping everything upside down, at least for the R&D and prototyping process. Because you want things to be infinitely hackable, the way Chris describes, you want them to be var, so you can change them.

You want them to be public so that you can see inside them. And so you'll actually find there's been recent PRs to Swift for TensorFlow itself, where we're starting to change APIs in this way to make it so that people can look inside the covers and change things more. So you may notice that we're not using a lot of types here.

But Swift is a typed language. Types are actually a very important way that the compiler can help you, because you can detect errors early. What Swift does is it has a thing called type inference. And so when you say var x equals 4, it knows that 4 is an int.

And so it will default it to an int. Or if you say var x is equal to 1.0, it will say that, oh, that's a float or a double. And so types in Swift are written with a colon here. And so you can say, OK, well, even though this would normally be an integer, actually make it a float.
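For example (a minimal sketch; the variable names are just for illustration):

```swift
let n = 4             // inferred as Int
let y = 1.0           // inferred as Double by default
let z: Float = 4      // the colon annotation makes this 4 a Float
print(type(of: n), type(of: y), type(of: z))   // Int Double Float
```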

Swift supports important things like emoji. Emoji is totally better than Greek letters, Jeremy. Yeah, so actually Chris asked me last week. He goes, Jeremy. Yes, Chris, what? He's like, how do you feel about emoji in the notebooks, in Swift code? And I literally said to him, Chris, they're fine, as long as it's the pile of poo emoji and it's next to a cow.

And Chris goes, OK, it's the pile of poo emoji but it's next to a dog. Is that OK? So yeah, OK. We split the difference. So this is great power and great responsibility. If you name all of your variables pile of poo, then your code is-- never mind. Static confine.

Yes, yes, yes, descriptive. Maybe. Yes. So let's talk about a few other things. So Python uses indentation. Swift uses curly braces. So I don't think that there's any-- I'm not going to say one's better than the other. Curly braces are more commonly used, and so that's why Swift uses them.

But they're basically the same thing. Just you'll figure it out. How do functions work? Well, in Python you use def, and in Swift you use func. Because it's a function. And so what this is, is this is defining a function and you declare the inputs, the signature, x and y, and it returns a float, and you implement it with the thing you'd expect.

When you call it, you pass keyword arguments. Swift is very opinionated about keyword arguments. And so if you say that this thing has x and y as arguments, you have to pass x and y. And so one of the funny things you'll see is you'll see this underbar thing going on right here.

And this is saying ignore it: underbar means ignore, just like in Python, ignore the argument label. And so when you call it, you don't pass it. That's all that means. I've got to say, I love almost everything about Swift, except for three things. This is one of the three things.

So this bit I find awkward, because these positional parameters, you can't even change the order of them. Even though they're positional, you can't use them in a different order. If you do have that underscore to say you don't have to name it, then you're not allowed to name it.

Like it's-- I don't know. I find this bit nasty, but almost everything else I love about Swift, so I just put up with it. This is also not my opinion of the right thing. But the argument for it is that consistency of APIs is important, and it works fine.
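A sketch of what that looks like (the function names here are illustrative, not the notebook's):

```swift
// Argument labels are part of the call: you must write `x:` and `y:`,
// in this order, at the call site.
func distance(x: Float, y: Float) -> Float {
    return (x * x + y * y).squareRoot()
}
let d = distance(x: 3, y: 4)

// The underscore means: no label at the call site.
func double(_ value: Float) -> Float {
    return value * 2
}
let d2 = double(d)
```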

So tuples work pretty much the same way as they do in Python. You can use them to return multiple values, which is what we're doing here. So here we're returning two floats, and we're saying the first one is sine, and the second one's cosine. You get destructuring. You get access to the tuples, all the nice things that you'd expect.
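Something like this (a sketch using Double for simplicity; the function name is made up):

```swift
import Foundation

// Return two values as a named tuple, then pull them apart.
func sincos(_ x: Double) -> (sine: Double, cosine: Double) {
    return (sin(x), cos(x))
}
let r = sincos(1.0)
print(r.sine, r.cosine)      // access the elements by name
let (s, c) = sincos(1.0)     // or destructure into separate constants
```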

One of the things that's different about Swift and Python: Swift has this thing called struct. And the way to think about it to begin with is it's just like a class in Python. Structs are super powerful, though. They are more efficient. There's a lot of good things about them.

They don't require memory allocation. And we'll talk about why that matters. If you've got a C programming background, it's not much like that at all. So I would say, like, think of it more like a Python class than a C struct. Yeah, exactly. And we'll show you a little bit about that.

So here I have a ComplexF struct, and I've got a real and an imaginary that I stick in there. I can create one of these ComplexFs by specifying these things. I print it out, and I get it back. And in Python, there's this thing called data class. Yeah, so we've used data class.

And it's interesting. When I threw in a data class here, it looks almost exactly the same. There's some extra boilerplate we need in the Python version. For example, we have to put the two things on different lines. We can't put them on the same line. But overall, like a lot of things between Swift and Python, it ends up looking extremely similar.
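The struct version, roughly (field names are illustrative):

```swift
// Swift synthesizes the memberwise initializer for a struct like this.
struct ComplexF {
    var real: Float
    var imag: Float
}

let c1 = ComplexF(real: 1.0, imag: 2.0)
print(c1)    // ComplexF(real: 1.0, imag: 2.0)
```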

Yep. OK. So now, one of the bad things about this thing is you notice it's defined with floats. But complex numbers work with integers, as well. And they work with doubles and lots of other things. So the way that Swift handles this is this thing called generics. And we'll talk more about the details of generics later.

But basically, what we can do is we can say, let's define a complex type. And Complex works with any type T that is signed and is a number; that's what the SignedNumeric constraint says. And so now, what I can do is I can define the struct.

And I can use it down here with integers and with floating point. And it just figures out that T is int or T is float, depending on what you use it with. And this is something that Python can't do, right? So with Python, if we remove the data class, we could certainly then remove the float.

And then we could have it untyped. But we can't say in Python, these two have to be of the same type. But I don't know what type it is yet. So this ability to use generics lets us do some pretty powerful stuff right out of the box. Yeah. And we'll talk about some of the really cool things that you can do that make it super flexible and super-- I mean, there's some really powerful things you can do.
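A sketch of the generic version (again, names are illustrative):

```swift
// T can be any signed numeric type, so the same definition works
// for Int, Float, Double, and so on.
struct Complex<T: SignedNumeric> {
    var real: T
    var imag: T
}

let ci = Complex(real: 1, imag: 2)         // T is inferred as Int
let cd = Complex(real: 1.0, imag: 2.0)     // T is inferred as Double
```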

So we've got our complex here. One of the things you can see we're doing is that just like in Python, you have computed properties and stored properties. And here we have a computed property. We can define a getter just in line. And so it looks just like a stored property, but you provide a body.

It's quite simple. Here's a computed property doing a weird thing. But here I just have a computed getter and a setter. And it's pretty straightforward. This is very similar to C#. When you've got one of these, you can create some of these things. You can use the computed property.
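Continuing the sketch above, a computed property has no storage of its own; you just provide a getter, and optionally a setter (the property names here are made up):

```swift
extension Complex {
    // A read-only computed property: just a getter, written inline.
    var conjugate: Complex { return Complex(real: real, imag: -imag) }

    // A computed property with both a getter and a setter.
    var negatedImag: T {
        get { return -imag }
        set { imag = -newValue }
    }
}
```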

And it works just like a normal property. It's all very simple. Now, one of the cool things about Swift is that after you define a type, you can add methods to it. And this is something that might make you feel weird. But you can add methods on anybody's type.

You can add it on your own, like we're doing here. Or you can add it on standard library types. Or you can add on anybody's-- I mean, it doesn't make me feel weird, Chris. Because we do it in fast AI all the time. It's called monkey patching. But it's kind of something that we're told to avoid.

Because monkey patching has weird, dangerous, undefined, strange behavior, and things combine in weird ways. We get conflicts. So is this monkey patching? Should we be avoiding this in Swift? So this works in a very highly principled way that actually composes. And if you get a conflict, the compiler will ask you which one you want.

This is not something you should feel bad about. Now, here, I'm defining an add method. And so I'm using this to add two complex numbers. I feel bad about this because there is a way to spell add, and yes, it's A-D-D, I guess, but I would rather spell it with plus.

And so you can call a method on this, just like any other method. But if you want to add an operator, what you do is you just define func plus. And so instead of underbar-underbar-add and all that jazz, you just define the operators you want and spell them the way you expect.
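Continuing the Complex sketch, the operator version looks roughly like this:

```swift
extension Complex {
    // No __add__ dunder: `+` is just a function you define.
    static func + (lhs: Complex, rhs: Complex) -> Complex {
        // These inner `+`s are the element type's own plus.
        return Complex(real: lhs.real + rhs.real, imag: lhs.imag + rhs.imag)
    }
}

let sum = Complex(real: 1, imag: 2) + Complex(real: 3, imag: 4)
```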

And they're just functions like anything else. And this already is getting at something that would be really nice to be able to do in Python, would be able to say, oh, there's a whole bunch of different functions or operators with the same name. And they behave differently depending on what type I pass to them.

Now, Python does have a standard library decorator you can use called single dispatch. We almost never use it because like every time we've tried to use it, it reacts in weird ways with everything else. But it's super nice that in Swift, as in many typed languages like this, it's very much designed for us to be able to say like, oh, here's lots of different types.

And they all have different meanings of what, for example, plus means, and it just works. And so here we're implementing plus on complex in terms of plus of its elements. And so we're just adding together the real and imaginary, and these are different pluses. One of the mind-blowing things that's very different than Python is that you can define your own operators.

And so some of us do lots of math, us not including me. But some of you all do a lot of math. Or you're working in a domain where you're doing quaternions or other cool things like that, and it's really nice to be able to use operators that are familiar to your domain.

And so if you want a square root operator, then you can define a square root operator. And these just work, and now you can use a square root operator just like anything else. And this is one of the examples of Swift being hackable. Like, there's a standard library that has a bunch of stuff built in and provided in the box.

But the stuff the standard library does, you can do too. And so we try very hard to not make the standard library be a privilege. So that's like the super quick introduction to some random stuff in Swift. There's this guided tour here, which is really cool. It goes into other random stuff.

And so if you want just a high-level introduction like this, you can go there. But let's dive into some more relevant pieces. First, we have two questions. The first is, does Swift support any debugger within Jupyter, similar to IPDB for Python to set breakpoints? So we don't have that yet.

We have all the mechanics under the covers. So Jupyter is actually talking to a debugger. We just haven't wired it up yet. But that's one of the things we're interested in. OK, so that's probably coming-- I can't promise that. But the guy in the front row that built it all is smiling.

So maybe. And does Swift have something similar to Python's ARGs and KWRs? Yes. In fact, we'll talk about that when we get to the Python section. Great, thank you. So it works a little bit differently. So let's talk about Python now, because we love Python, right? Well, Swift loves Python too.

And as Jeremy has helpfully pointed out, Swift's data science ecosystem is kind of pathetic. So Python is really important. And beyond the data science ecosystem in Swift being pathetic, you all know Python. And so you all know important APIs that are pervasively available. And there's no reason for you to relearn new APIs.

If you know the APIs in Python, just use them. So let's talk about how that works, because I think it might blow your mind a little bit. So to use Python and Swift, you first import Python. This is just a library in Swift called Python. And then you use Python to import whatever Python libraries you want.

And there's no wrappers. There's no build steps. There's no wrapper generator thingies. You just literally import NumPy, or here we're importing Matplotlib. What does this give you? This gives you NP. This gives you PLT, just like you would do in Python. And now you use it just like in Python.

And so here I'm calling NP array. And this is-- except for the let, this is literally what you write in Python. So we can now use this to do some cool stuff. And so here, actually, we're going to use this load MNIST function. And we'll see it a little bit later.

It's in the 00 notebook. This is firing up TensorFlow and loading the MNIST data set and plopping it into a tensor for us. Once that comes back, now we can use Matplotlib. With Matplotlib, we can use this magic incantation, kind of like the Matplotlib inline magic. We can then take the tensor that TensorFlow gave us, turn it into a NumPy ndarray, reshape it, and plot it out.

And this all just works the way you would normally use Matplotlib. And the cool thing about this is that Swift knows nothing about Matplotlib, knows nothing about NumPy, knows nothing about any of the particular libraries you're using. We literally just imported some random thing with strings. And Swift doesn't know anything about what you imported here.
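In code, the whole thing looks roughly like this (a sketch using the Python module that ships with the Swift for TensorFlow toolchain; the plotted values are made up):

```swift
import Python

// Import Python libraries by name: no wrappers, no build step.
let np = Python.import("numpy")
let plt = Python.import("matplotlib.pyplot")

let a = np.array([1.0, 2.0, 3.0])
print(a * 2)                    // [2. 4. 6.] -- numpy is doing the work

plt.plot([1, 2, 3], [1, 4, 9])
plt.show()
```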

And so you may be wondering how this works, because we're just using Python right from Swift and how does Swift know what Python is. Well, the way to think about this is that we think about Python as though it has no types. But really, Python has one type. And that type is Python object.

And Python has an internal representation of objects. And you can use dot on them. And you can call them. And so the way it works in Swift is that you have one type called Python object. So here, when we use the type of, that's just like type in Python, says give me the type of np, or give me the type of np.array, or give me a type of the array that we got, or whatever.

What it actually shows you is the type is Python object. And so Python values are Python object types in Swift. And when you dot them, when you call them, it's just using the interpreter. And so it just works in Swift, because you are literally using Python as dynamically as Python is designed to be used.

And you can actually go and look. And one of the totally brain-twisting things that you can do is you can import Python into Swift, import FastAI's Python libraries into Swift, and now go to town and just use all the standard FastAI cool stuff right from Swift. And it all just works.

So thank you to Omar SF for trying this. It's a crazy thing to try. And it's interesting how when you look at the resulting code, it's the same code that we-- like at this point, you can't tell other than some slightly different syntax here and there. But it's all the same.

It's like Python with let and var. One thing I'll say about this is this is a super cool feature that you should totally use to fill in the gaps that need to be filled in while this ecosystem doesn't exist. But then as soon as possible, fill in the gap. Because I don't want us, as a Swift for TensorFlow community, to use this as such a crutch that we never write our own even better DataFrames library, because we're always using pandas.

And we always use Matplotlib, so we never create an even better plotting library. We should use the crutch to allow us to get all our work done end to end, and then gradually replace it with bits that are more swifty. I mean, one of the awesome things about Swift is that it supports really well-considered and beautiful APIs.

And it was really designed for building APIs. But particularly when you're new to Swift, don't worry about that stuff. That's a problem for a different day. If you want to open a file, open a file the way you know how. Just use the Python file IO library. That's fine.

Don't waste your brain cycles on that kind of stuff. So let's now talk about the idea of Jeremy's course here, which is building a machine learning library from scratch. And I think it's very quaint that Jeremy tried so hard to go down to the foundation and teach you how to build a Matmul with a few loops, and looping over an array, and adding and multiplying floating point numbers.

And I think it's very nice that he thinks that this is going down to the foundations. Oh, Chris, it's Matmul from scratch. See? Yes. Well, so if that's Matmul from scratch, then I think we should go down to the bedrock and actually talk about where float and arrays come from.

But before we do that, I want to dive in and geek out just a little bit about compilers, because I think you need to understand, or it's useful to understand, what LLVM and Swift and things like this are. So, Chris, what you're saying is that I cheated. I used array without implementing an array.

Exactly. And I used float without implementing float. So let's fix that. OK, I'm sorry. So what is a-- yeah, so what is a compiler? Actually, we can do-- oh, look at you. Touchscreens. Wow, crazy. OK, so what is a compiler anyways? And what is a language? So the way I think about this is that there's actually two unmovable obstacles in the universe.

There's humans, which we're all kind of a pain to work with, right? Highly opinionated sometimes. And then there's computers. And they're really a pain, because they are super opinionated. And so what languages are is they're a point in between. And different languages are different points in between. And some are optimized for working with humans better.

Some are optimized for working with computers better. But good languages work well with both. And so how do compilers work? Well, the way that it used to work in the bad old days is that if somebody wanted to build a compiler for x86, they would build a parser, the front end part of a compiler.

They'd then build an optimizer and make the code go fast. And they'd build a code generator for the Intel PC or whatever it is that they want to target. Somebody else would come along and say, hey, I want a different compiler. I want a C++ compiler. And they would build a parser.

They would build an optimizer. And they'd build a back end for PowerPC. Somebody else would say, hey, APL is really cool. Let's build a parser for APL, an optimizer for APL, and then a back end for ARM. And if you've noticed the trend here, there's a lot of re-implementation of all the things going on.

And so what compilers have done is they've created funnel points. LLVM is one of these funnel points where you can make it so that lots of different language people can implement lots of different front ends. And lots of different hardware people can implement what's called the back end or the code generator.

And now they can share a tremendous amount of code. And now you can get all the permutations that you want out of this. And we should all thank Chris Lattner's master's thesis supervisor for forcing him to write his damn thesis and getting him to actually write LLVM version 1 in one and a half weeks of Diet Coke-fueled coding activity.

This is the way we get things done, is give people a ridiculous deadline and it happens. And so the details of what LLVM is is not really important. But this LLVM is what powers Julia and Rust and Swift and Clang that does C and C++. It's like a very common set of infrastructure that lots of things use these days.

And if you're not very familiar with compilers and what optimizations are, there's a bunch of standard stuff that LLVM does, including constant folding, removing dead code, other things like the example I show here of taking an expression and pulling it out of a loop. This is something that in PyTorch, for example, if you do a multiply inside of a loop of two tensors, it's going to run that multiply every single time you go through the loop.

But reasonable, more modern languages actually pull these things out for you. So this is a fascinating example. Because normally, if we're writing Python and you see, oh, I'm doing this work inside the loop redundantly, I can pull it out. That's something that I, as a programmer, have to figure out.

And the fact that LLVM can do this for you and other optimization systems in GCC or whatever, it suddenly makes you realize that compilers are something different to at least what I thought they were. I thought compilers were things that got in your way and complained until you did things the way they expected you to do them and took forever to run code that I would have finished yesterday if it was Python.

But actually, working with Swift, and particularly with Swift for TensorFlow, has made me realize that these optimizations actually allow us to write code in different ways and actually be much more lazy about how we write code. - And this is as you think about a point in the space between the human and the computer.
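To make the loop-hoisting idea concrete, here is a hypothetical sketch (not from the lesson notebooks):

```swift
// `a * b` doesn't depend on the loop variable, so an optimizing
// compiler can hoist it out and compute it just once, even though
// it's written inside the loop.
func scaleAll(_ xs: [Float], _ a: Float, _ b: Float) -> [Float] {
    var result: [Float] = []
    for x in xs {
        result.append(x * (a * b))   // loop-invariant subexpression
    }
    return result
}
```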

- Yeah, so we're actually gonna show you something really mind-blowing next week where this is actually gonna be used to basically make auto-diff work. And it's like, it blew my mind when I found out about it. - Yep, and so now if you think about languages and different points in the space, there's a couple of different ways to look at this.

One of the ways I think about it, ignoring the syntax pieces, which the syntax is always the first thing people talk about, is what are the atoms of the universe and how do they get composed and how do you build things out of them? And so if you look at Python, for example, if you boil everything down in Python, boil a dictionary down, it's a bunch of C functions.

And then what the interpreter does, the Python interpreter does, is it decides what C functions to call and what order and on what data. And so the Python interpreter is slow, and so the Python program ends up being slow, even if the C pieces are fast. C++ is another language.

C++ is a little bit different. C++, the atoms are built-in things like integers and floats and arrays and pointers and things like that. And then a C++ programmer can use structs and classes to build complex numbers or strings or a variable-size array in the library.

And it can do this because C++ is a fast language. It's also not a super well-considered language, but it's weird to me in C++ that arrays are hard-coded into the compiler, but string is a library feature. And why is that? That doesn't really make sense to me because strings and arrays are so similar.

What Swift is, is it says, let's rethink all this. And so the primitives, the low-level atoms of the universe are now things that LLVM, the compiler, knows about. And then all the abstractions you build, including floats, arrays, dictionaries, of course, the high-level stuff too, like layers, those are all defined in the library.

And so a float is not a magic built-in thing. Swift doesn't like magic built-in things. Swift likes things that are hackable. And if something is interesting for a library developer to do, maybe you want to do it in your workbook, right? And so having an open ecosystem is very powerful.

And so if you actually go look at the library that implements float, float is just a struct, just like the complex thing we were talking about before. In the middle, the inside of it is this built-in weird thing. That's an LLVM thing. And plus, on floats, isn't a magic thing that the compiler knows about.

Plus is just an operator, just like we were talking about before when we defined square root. Just this one happens to be named plus or plus equals. And it's implemented with LLVM magic. >> So we're allowed to use float now, Chris? >> Well, let's go look at the implementation.

And so if you actually go look at this, this is the standard library that comes with Swift. Here you can see it implements infinity. It implements not-a-number. It implements add, pi; all the things that are in Float are just a gigantic pile of Swift code. And the cool thing about this is that this means that you can implement low-level stuff like this too, right in the workbook.

>> And to be clear, we don't expect you to implement float yourself. But the fact that you can is actually important for data scientists. And so let me explain. When I was starting out and I did a lot of stuff with Delphi I guess 20 something years ago, which is like a very fast Pascal system.

And I was writing a lot of numeric code. And I very often hit this floor where things weren't working the way I wanted them to. So I had to use assembler, which nobody should ever have to do. But that was the floor I hit. Like I had work that needed to be done.

And I couldn't do it in Delphi. So I had to use assembler. But at least I could. And over the last 25 years, we've gradually kind of filled in more and more of the things that numeric programmers use. But what I'm kind of finding is happening now is as numeric programming is becoming differentiable programming, I'm hitting the bottom of the stack again.

And there are things that I want to do that I can't do, and/or there are things I want to do a little bit differently. So I feel like we're at this point in history. You know, we might be for the next five or ten years or more where data scientists don't need to know how to write assembler.

But they do need a system where they can go under the surface and actually change things that people don't normally change. >> Yeah. Well, and again, to me, I think the goal here is an infinitely hackable platform. So like in the box are all the nice things you'd expect.

You don't have to write matmuls. You don't have to write floats. But if you want to go change it and do your own, you can. Or if you want to take somebody else's, you can drop it in your workbook. You don't have to recompile the whole stack. Now, we talked about structs a little bit; they're like classes in Python.

The major difference is that these are actually fast. So here's our square add that multiplies two things together and adds it. If this was Python, these would be allocating objects. This would be doing lots of crazy stuff. This thing I'm showing you now is called the compiler explorer. And you thought you came to learn machine learning?

Here's some assembly language, which we're going to get away from as soon as possible. But the point is like you're writing a reasonable Swift code and you're getting literally the same code you would get from Clang if you wrote C++. Like even though float is implemented in the standard library, there's no tricks here.

You're getting the lowest level optimized fast code that turns into a multiply instruction and an add instruction on Intel. And I'll go away from this very quickly because we're not here to learn about Intel assembly. So now the thing about float, again, is not really about something you should want to do, but you can poke at it if you want.

You can see what's inside of it. One of the things we've at least so far chosen not to do is we don't export the built-in to workbooks. And so you have to write a standalone file to use it. We could change that if we wanted to. But one of the really powerful things about this is because these types are defined in the library, they're not magic.

Well, now all the other things we talked about before work with these. And so we can add a method to int or to bool. So here, you know, we add a little is odd method that's just saying is the low bit set or clear. That's cool. That's fine. Like this is not monkey patching.

This is just super low level. Int is a struct. Sure you can add a method to it. No problem. We can add a symbol that turns a boolean into an emoji because emojis are cool. And so now you can just use these things. And we can say, hey, 4, are you odd?

Hey, 4, are you odd, and turn yourself into a symbol? And we get true, false. We get thumbs up, thumbs down. And it all just kind of works. This is particularly important for all of us at this stage because, as we discussed, you know, Swift hasn't been widely used for numeric programming.

So a lot of stuff doesn't exist. And so when I started playing with it in December and I realized like, oh, I'm missing stuff. So yeah, so I created this library called BaseMath where I literally was defining things on float that I needed to exist. And not only did they then exist, but they ran at C speed.

And then from then on, I had all the math stuff that I wanted in the language. And so if you're hacking around over the coming months and you find things not quite the way you want, you can and should change it, right? And it's really, really, really common in Swift code to add extensions to basic types.
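The isOdd and symbol examples look roughly like this (property names approximate the notebook's):

```swift
extension Int {
    // Is the low bit set?
    var isOdd: Bool { return self & 1 == 1 }
}

extension Bool {
    // Turn a boolean into a symbol, because emoji are cool.
    var symbol: String { return self ? "👍" : "👎" }
}

print(4.isOdd)           // false
print(5.isOdd.symbol)    // 👍
```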

It's not at all unusual or weird. It's just part of how you write Swift code. - And you can make it feel the way you want. So we're not going to dive in too deep, but there's lots of interesting things in the system. So if you say, well, how does && work?

&& only evaluates one side of itself if the other side is true. Well, that's implemented in our libraries, in three lines of code; you can go dive in. There's a couple of links. Let's talk about array, because we need arrays to implement matmul. Before we talk about how array works, let's look at how you use it as a Swift programmer.

Arrays in Swift are super simple. You just define them with square brackets like you'd expect. Swift is type inferred. And so what you'll end up seeing is there's two different syntaxes for the types. There's int and square brackets, which is the way you'd normally write it if it's not inferred.

But that is actually just sugar for this Array<Int>, okay? And if you print out the types of these things, you'll see that they're all just Array<Int>, Array<Int>, Array<Int>. Well, arrays can be iterated over. So you can have a for loop. It just goes over all the elements of the array.

Pretty simple. You can slice them. Swift has two ways to slice, based on whether you want the endpoint or not. If you want an inclusive range, which includes that endpoint, you use dot dot dot (...). For an exclusive range, you use dot dot less-than (..<), and the less-than says, but not the last one.
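For example (a minimal sketch):

```swift
let xs = [1, 2, 3, 4, 5]      // inferred as [Int]
let ys: Array<Int> = xs       // [Int] is just sugar for Array<Int>

for x in xs { print(x) }      // iterate over all the elements

let inclusive = xs[1...3]     // [2, 3, 4] -- includes the endpoint
let exclusive = xs[1..<3]     // [2, 3]    -- "but not the last one"
```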

Swift supports functional programming things. And so here what we do is we use this functional map algorithm. And it's using a closure. Closures are the same thing as lambdas in Python with slightly nicer syntax. And so here what we're doing is we're saying, give me an array, but run a function that takes all the elements and adds 10 to them.

And it's very simple. You can just do this right in line and it's nice and fast. And so here we get our array where everything has 10 added to it. It has filter and reduce as well. So filter just takes a predicate where you say, filter and only include the things that are odd, okay?

And we just added is odd. And now we get an array that just has odd things in it. Super easy. And one of the other things you'll notice is that Swift has lots of nice syntactic shortcuts. And so instead of naming our argument like we did in map, we just use the default name, which is dollar sign zero.

- So the top one is equivalent to lambda arg colon, arg plus 10, right? And so then we can get rid of both the lambda and the arg colon by sticking it in curly brackets and just using dollar zero to refer to the first argument. - Another super common thing is that often these closures end up being the last argument to a function.

If you have, if they're the last argument to a function, you can just put them outside the parentheses. And if that's the only thing in the parentheses, you can just get rid of the parentheses as well. And so you get these really nice things that are kind of like list comprehensions where you can say map and multiply all the elements by three and then filter them and only keep the odd ones.

And you get very nice, fluent things, or here's a map where I'm saying, you know, pick, get the odd, like decide whether it's odd and then turn it into a symbol. And I get very nice, simple. - Yeah, so this, so just come back and have a look at this map filter again at the end of the lesson because this is how you do list comprehensions in Swift.

You don't need special syntax for it, because the stuff that's already built in very elegantly gives us list comprehensions for free. - Yep, and all these things are just library features. Reduce is, well, it's a reduction. So you give it the first element. And then in this case, we're just adding all the elements of the array to zero, and plus is a function.

We saw it already. And so this just uses the plus function to do a reduction, it's super simple. Now we're talking about array, array is a type. And that means you can do an extension on a type. So you can add methods to arrays, that's super easy. So here we defined a double elements method that returns a new array and we just map.

So double elements just multiplies all the elements by two and like the self thing we don't actually need. - Oh, thank you. - Right, Jeremy? - Too much self in Python. - Yeah. And now one of the other things you may wonder about is like, why do we need this where element is numeric?

And what this is talking about is it's saying, well, we're taking all the elements out of this thing and multiplying it by two. This is helping us catch errors. So if I have an array of Booleans, I get an error message that says, hey, I can't double all the elements of a Boolean array because bool is not a number.

And so in Python, what would end up happening is if you accidentally pass the wrong thing in, you would pass in your Bools and they'd get multiplied by two, and then sometime in a far-distant API call, somewhere later, you find out you have twos in your Booleans, like, what just happened?
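
A minimal sketch of the constrained extension being described (the body here is an approximation of the notebook's doubleElements):

    extension Array where Element: Numeric {
        // Returns a new array with every element doubled.
        func doubleElements() -> [Element] { return map { $0 * 2 } }
    }

    let doubled = [1, 2, 3].doubleElements()   // [2, 4, 6]
    // [true, false].doubleElements()          // error: Bool is not Numeric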

And so Swift helps catch these errors early. - And talking of what just happened, this is the point where if you've used a few languages before, you're thinking, oh, Swift is actually pretty different in a pretty interesting way 'cause what we've just done, we've just said, here is some functionality which applies to some type which has some particular properties to it, right?

So we've like defined functionality in a way that's gonna be looked up in this really nuanced and interesting way. And so we're not gonna go into the details now, but just like take a look at this after the lesson and think like, wow, what's going on here? 'Cause this is something really interesting.

- And again, one of the cool things about this is because it's all built in the library, it's all open to you and you can do cool things like add methods or do other things. So I'm not gonna go into the full details of Array. Array is implemented right here in Array.swift.

This is the standard library; Array is a struct. It has a buffer inside of it holding the elements. You can go through and see all the different gory details of things that go into Array. And instead of coding this in a workbook, I think we will just consider it implemented.

Is that okay with you, Jeremy? - Absolutely. - Okay, so now we can use arrays. Okay, so let's move on to matmul. Okay, so what I'm gonna suggest is it might be a good time to take a break. So let's take a six minute break and we'll see you back here at 6:47.

Now that we've invented Swift and Float and Array, we will actually implement matrix multiplication. So we'll see you back here at 7:47. Okay, any questions before we keep going, Rachel? - Yes, we have two questions. The first is that the in keyword is very unintuitive in arg in arg + 10.

- In closures, yeah. Can we point at that? So yeah, the in keyword. - Yeah, up here, in, yep. - So that's the question, it's like, why is it so weird? - Why is it arg in arg + 10? - Yeah, so we carefully considered the name of this keyword and we didn't have anything better to use, so we got stuck with this.

- In Python, I guess that would be a colon. - Yeah, so there's no good answer. Nobody knows what it means. There's historical reasons, but they're not good reasons. So we just do it and it's-- - So the answer is because Chris says so. - Thanks for your honesty.

- Why do we use colon? Well, that's what Python says. - And the second question, can Swift/LLVM emit instructions to execute on the GPU? Would this be a good idea? - Yeah, this is a really exciting direction. This is one of the things we're investing a lot in for Swift for TensorFlow and the infrastructure pieces, and we'll be talking about that a little bit next time.

- Yeah, but I mean, the short answer is that LLVM has a number of backends. And one of the backends it has is a PTX backend, which is the lower level Nvidia kind of instruction set. And so like right now you can compile stuff with LLVM and have it run as CUDA kernels.

So the answer is yes, absolutely. - And in fact, like every pixel on the iPhone goes through LLVM, not through SWIFT in a workbook. - Not bad. So LLVM is used for lots of cool stuff and using it for more cool stuff is fun. So now that we have our bedrock of floats and arrays, let's build something a little bit higher level.

Let's talk about matrix multiplication. So here what we're gonna do is we're actually gonna load up a few tensors. And here we're playing by the same rules that we played with in Python, where we could use the random number generation and things like this, but we're not gonna use the matrix multiplication operator quite yet.

So there's lots of ways, there we go. There's lots of ways to create tensors, and Swift for TensorFlow has this Tensor type -- this little <Float> thing we want to go away eventually, we hope -- but right now you say Tensor<Float> to say that I want a tensor of floats.

You give it a shape, you can get zeros, ones, you can repeat a thing, you can get random numbers, there's lots of different things you can play with. I highly recommend you hit tab in Jupyter and it will tell you all the things that you can do with completion. So let's build a matrix multiply.

So here what we're doing is we're doing something kind of similar to what Jeremy did in the Python version of this. But here we're starting a little bit lower level. We only have one dimensional arrays. That's how the SWIFT array works. And so what we need to do is we need to pass in two arrays of floats, and then we're doing a two dimensional matrix multiplication so we need to know the number of rows and number of columns for A and B.

- So that's a definition of a tuple parameter, Chris? - Yep. - So there are two ints. - Yep, so A dims is two ints and B dims is two ints. And so what we do is we create an array full of zeros, and then we write the same three loops you saw before.

And because we have a one dimensional array, we have to do manual arithmetic to get into it. Now, one of the things that you'll see is like, if you actually try this out and you run this, I didn't actually run the cell, this is why you say don't do things and I don't listen, and then I make a fool out of myself.
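
A hedged sketch of the kind of function being described here -- flat arrays of floats, tuple dimensions, and manual index arithmetic (details may differ slightly from the notebook):

    func matmul(_ a: [Float], _ b: [Float], aDims: (Int, Int), bDims: (Int, Int)) -> [Float] {
        var res = [Float](repeating: 0.0, count: aDims.0 * bDims.1)
        for i in 0..<aDims.0 {
            for j in 0..<bDims.1 {
                for k in 0..<aDims.1 {
                    // 1-D storage, so compute the flat offsets by hand
                    res[i * bDims.1 + j] += a[i * aDims.1 + k] * b[k * bDims.1 + j]
                }
            }
        }
        return res
    }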

- Okay, so then you run this, you get the tensors here, we're just using the MNIST data set because it's fun to use. Now we can run this and we can time it. And one of the major differences you'll see is we just wrote three loops and it takes 0.132 milliseconds.

The Python version of this took 835 milliseconds. - We'll just have a look. So Chris, I just wanted to compare. So, I mean, the first thing I wanted to compare was to look at the code, so that's a Swift code. And in Python, there's the Python code. So it's basically exactly the same code.

- Yep, here you have 2D arrays, which we'll get to. - And so, yeah, we kind of found with the Python version, it took about a second to do a five by 784 matrix multiplied by a 784 by 10. So we kind of did the math and said like, we can't use this because it's just not fast enough.

But something very different is going on here because this is about 9,000 times faster? - Yeah, so this is not super fast, but this is pretty reasonable. This is what you get from roughly C, right? And that's because, again, we talked about, it's primitives that are principled that are fast.

And when you build things out of principled fast primitives, you get new things that are principled and fast. - Okay, so this is no big deal for you, but for a Python programmer, this is like, this was a whole mind shift change. 'Cause at this point, the fact that you can write this and have it run this fast means like, I can now write anything I can think of doing with numbers in the most obvious way possible and have it work pretty damn well.

So this is kind of like a superpower that Python programmers don't have. - Well, and if you think about it, so one way to think about, so for intensifiers, we're trying to subtract C and C++ out of the picture, right? Because if you think about working with Python, a lot of it ends up being, if you care about performance, working around the gill.

And how do you do that? How do you go into C stuff or working around writing, oh, I need a new kind of data structure, what do I do? I write a bunch of stuff in C because writing in Python wouldn't be fast enough. And that's problematic for lots of reasons, one of which is it's just really hard to do things in workbooks.

But here we're implementing basic stuff in workbooks and it goes fast. - Yeah, and I can ship something that's like a compiled program that I just ship it. I don't have to figure out how to put the C library over here and compile it and put it together with this header.

So Jeremy, what is this built-in called time? Is that built in to the language? - No, well, so Chris-- - Can you show me what that is? - Absolutely, so we're not using %time because %time is a magic. And under your new rules, we shouldn't be allowed to use things that are magic, we should write them ourselves.

So time is written ourselves and it's actually written in this notebook called 00. So yeah, so when we started out, we started out with a blank slate and so we had to start out with things like how do I time a cell, right? So the answer is, this is the nice thing about working with Swift is we can build everything from scratch, right?

So here's the timing section, right? And the details don't matter, but basically you can see we can grab some stuff from the Swift standard library, for example, a function that tells the time. And we can run some function and see how long it takes. And the nice thing is that we can end up with some very neat syntax, right?

Because the thing that we pass in, the function we pass in is a closure, right? And this is how you say pass in some closure that takes no arguments and returns nothing at all. And so, for example, we can give it a default value, which means if we want to time something and just run it once, we can just do that, right?

So we can create syntax, you know, we can create APIs that are really nice and elegant and simple to use. And so this version of time actually is both %time and %timeit together, right? So if you give it repeating, it'll do that, right? And actually, this 00 notebook is worth flicking through because it's the only notebook where there's no tensors in it, there's no TensorFlow in it, there's no tf.data in it.
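
A hedged sketch of what a timing helper along these lines could look like -- the real one in notebook 00 differs in its details:

    import Dispatch

    // Times a closure, optionally repeating it and reporting the average.
    func time(repeating: Int = 1, _ f: () -> ()) {
        if repeating > 1 { f() }                            // warm-up run when repeating
        var times: [Double] = []
        for _ in 1...repeating {
            let start = DispatchTime.now().uptimeNanoseconds
            f()
            let end = DispatchTime.now().uptimeNanoseconds
            times.append(Double(end - start) / 1_000_000)   // milliseconds
        }
        print("average: \(times.reduce(0, +) / Double(times.count)) ms")
    }

    // Trailing-closure call site:  time { someExpensiveThing() }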

So if you just want to see like just Swift, right? This is a good way to learn just Swift. So for example, you can see how we built this thing where you can go ls.shell. So we've actually added something to string that will run a task, right? And you can kind of see how to write these things in just nice neat little concise packages.

And now we can export this, and any time you want to run a process through the shell, you can just go blah.shell. You can see how to download a file. And one of the nice things here is that you'll see that downloading a file is actually using this Path library that looks almost identical to pathlib.

And that's because there's a wonderful programmer, wonderful person called Max Howell -- this is Max's username, mxcl, on GitHub. And I'll mention he's actually an open source programmer who entirely relies on sponsorship for his work. So if you like Max's work, which I certainly do, you should go to his Patreon and give him a few dollars.

So thanks to Max, we have something that's really just like pathlib but actually almost a bit better. There's something that's almost exactly like requests -- it's called Just, right? So in the Swift ecosystem, thanks to the fact that you've got all these millions of iOS programmers who have been using this language for five years to build cool stuff, there's actually a nice non-data-science ecosystem.

>> And while we're talking about non-data-science packages similar to Python's, is there any web framework similar to Flask or Django for Swift yet? >> Yeah, actually the Swift on the server community is a really vibrant community. And there's the equivalent of Ruby on Rails. And a bunch of these different frameworks have Swift versions.

And that's actually one of the biggest non-iOS communities that exist. >> So one I've seen a lot of love for is Vapor, I think? >> Yeah, Vapor. IBM is investing; they have a framework called Kitura. And they're putting a lot of time and thought into that. Apple has a low-level thing that's like Netty, the Netty library on Java.

And there's a Swift version of that called SwiftNIO. So there's a bunch of these fairly infrastructural things that exist that are part of the Swift ecosystem. And Swift is really great on servers too. >> Great. So here you can see how we can download a file. It's all pretty straightforward.

We've got try-catch blocks, a lot like we're used to. But they're kind of do try-catch. The details are a bit different. So in this case, we want to download MNIST and load it up. One thing to be aware of is that things -- and we'll talk a lot more about this next week.

But things can get a little bit complicated when, like, for example, for MNIST, we've got a file containing labels and we've got a file containing images. They're different types: the labels are ints, the images are floats. So we kind of want two functions: one that returns a tensor of floats, one that returns a tensor of ints.

That's duplicate code. I hate duplicate code. Right? So here's a version where we actually tell it, "Oh, you could load up different types of MNIST data, and it's going to return a tensor of that type." Okay. And unfortunately, if I try to use this, I get an error. Right?

And I really wanted to show you this error because for the first week as a Swift programmer, I kind of -- I've never felt so stupid. Like, I felt like everything I wanted to do, Swift hated me, and it told me these things like, "Cannot invoke map with a da da da da da, what the hell is all this?" And I'm just going to say that's totally fine.

Right? Because the Swift type system is very new to somebody like me and probably most of the people watching this. The messages are helpful for people who understand it pretty well, and it's totally normal to think, for a week or two, that Swift hates you. >> To be stubbing your toe on every new thing and feeling like you'll never -- >> And particularly this generic stuff, you know?

And I would say, look, a couple of suggestions. The first is just write the two separate versions so that you don't get frustrated, and then come back and try again a few times yourself. Ask on the forum. >> Stack Overflow is great. >> Yeah. But quite often the kinds of errors you get from the type system are a long way away from where the problem really was.

It can be quite difficult because it's a powerful type system for it to really know where the problem is. Now, in this case, the issue basically is that we are trying to call -- we're trying to initialize either floats or bytes, and so it basically needs to know that the type we're creating can initialize either floats or bytes, so as you'll learn next week, you could do that by creating something called a protocol.

You do it by saying that these things conform to that protocol. You then use that protocol, and so now this version of load MNIST works totally fine, right? So this is a nice little package that you can look through. The last piece that we had to build in 00 was the thing that makes //export work.

So it's kind of delightful writing the thing that makes //export work by typing //export. One of the things that I needed to do here was to check whether something matches a regular expression or not. I found it extremely weird that the way to do that in Swift is called range(of:options:) with the .regularExpression option, so I created something called findInString.

So now I never have to worry about the weird and clunky Swift syntax. Most of the time, I'm just looking to see whether a match exists or not, so I also created something called hasMatch that checks whether a match exists. So I make Swift work the way I want it to. And to give you a sense of what I mean when I say clunky APIs -- in particular, you'll see here we're using Max's beautiful Path library.
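
A minimal sketch of a hasMatch-style helper, assuming the Foundation range(of:options:) API just mentioned (the real notebook's version may differ):

    import Foundation

    extension String {
        // True if this string contains a match for the given regular expression.
        func hasMatch(_ pattern: String) -> Bool {
            return range(of: pattern, options: .regularExpression) != nil
        }
    }

    print("// export".hasMatch(#"^\s*//\s*export"#))   // true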

Before we realized that the path library does everything we wanted, we used the thing that Apple provides, which is called the foundation library, and that comes with Swift. These two lines -- >> And also works great on Linux. It's a standard thing that's available. >> Yeah, so those two lines of code in Apple's foundation library looks like, oh, my God, it looks like this.

Okay, so to me, a lot of Swift APIs look like deletingLastPathComponent, appendingPathComponent, appendingPathExtension, right? I don't know why, but a lot of Swift programmers seem to like writing this kind of code. I like writing this kind of code, but I think Foundation is not necessarily your favorite API design, Chris, would that be fair to say?

>> I think it's fair to say that the thing that's great about foundation is it has a lot of interesting and useful stuff, URLs, and other stuff like that, but its design is not all great. >> It's great that it's there, and it's amazing that it's all been ported to Linux, right?

So quite often, you'll find-- >> It's got a tremendous amount of function. >> I need something like the ability to tell the time. It's in dispatch, or the ability to work with URLs. And so know that foundation is there, and generally speaking, I always just import it, first thing, right?

Because there's a lot of stuff you want will live there, and if you forget to import it, it won't appear in your tab completion, and you'll get errors. But also, when you find clunky foundation APIs, which is-- >> There's actually a better one out there. >> Or write your own little wrapper on top.

Anyway, so once you've done that, now we've got our own, you know, JSON serialization thing. We can grab our Jupyter notebook. We can find ourselves. We can export them. And now, we can just do the same thing that we do in Python, and we now have a nice little module exporter that we can use.

>> It's cool. >> We have a question on the time function. How do we know that calling f() is not optimized away in this case because of a lack of side effects detected by the compiler? >> Generally, so that's actually a great question. In the case of the workbook, I don't think there's any cross-workbook optimization going on, so that's one thing.

I don't know if there's a really good-- that's a good question. What I recommend doing is put something that's not trivial inside the thing you're timing. And so, if you're doing, you know, we'll show you later launching a CUDA kernel to do matrix multiplication, for example, and that's not something that gets optimized away.

You can also, like, get the value into the closure and then take the value back out. So, there's different things that you can do like that. >> Yeah. Sometimes, when I was doing this BaseMath stuff, I would just add a print inside the thing that I was timing to force it to be calculated.

>> And one of the other things that will happen with GPUs is GPUs run asynchronously, and so, you need to force a GPU sync. We'll show you how to do that in a minute. So, anyway, so coming back to this, so we showed you how to build matmul. We showed you how to build time.

So, this matmul is working on arrays. And this is pretty fast. We talked about it -- it's 0.13 milliseconds. But array in Swift is safe. And so, what's happening is that every time you, like, index into an array, it does a check to make sure that the index you're computing is in bounds.

And so, this is actually doing a lot of work that you would not need to do if you were in C. And so, one of the other really cool things about Swift is that you can actually go all the way down to the bare metal and do things the unsafe, nasty, awesome C way, if you want to, to get even a little bit more performance.

And so, here, sorry, I forgot to change this back, but we have a couple of arrays. And so, we have the exact same signature that we did before where we take in two arrays and we have our dimensions. And so, what we're going to do is to optimize storing into that result array, we're going to say, give me an unsafe mutable buffer pointer into that array.

And it's unsafe, it's verbose, it has red warning signs all over because it's unsafe. But with almost no code change, now we're able to get something that runs twice as fast. And so, here's matmul, and now it runs at 0.07 milliseconds, which is even faster, which really is the performance of C.
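
A hedged sketch of that change -- the same triple loop as before, but the writes to the result go through an unsafe buffer pointer so the bounds checks on the stores disappear (details approximate):

    func matmulUnsafe(_ a: [Float], _ b: [Float], aDims: (Int, Int), bDims: (Int, Int)) -> [Float] {
        var res = [Float](repeating: 0.0, count: aDims.0 * bDims.1)
        // The closure gets an UnsafeMutableBufferPointer view of the result storage.
        res.withUnsafeMutableBufferPointer { res in
            for i in 0..<aDims.0 {
                for j in 0..<bDims.1 {
                    for k in 0..<aDims.1 {
                        res[i * bDims.1 + j] += a[i * aDims.1 + k] * b[k * bDims.1 + j]
                    }
                }
            }
        }
        return res
    }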

And this is pretty cool. >> And something I found with Bay Smith is, like, sometimes these differences are four or five times faster because making something a pointer allows it to use 70 vectorization. >> Yep. >> So, it's not a minor tweak. You can get super fast performance. >> But the thing I want to emphasize at this point is that this is like a super low-level geeky thing that not everybody should do, right?

This is something that it exists because at certain points in your career or your journey, you may find that it's useful. Or you may find something that somebody else wrote, and it going twice as fast as it otherwise would is a pretty big deal because it makes you twice as productive.

But usually, you're not working at this level. Layers are a good thing. If you want to go like super deep down the rabbit hole, UnsafePointer and UnsafeMutableBufferPointer and all these things are also Swift libraries, and you can go see their implementations, too. And those are implemented in terms of the LLVM magic, just like Float is.

So, at this point, let's skip over more C stuff and jump down to working with Tensor. So, we've got a matrix multiplication working on arrays and floats, but we also have tensors. And so, when we talked about Tensor and MatMul in the PyTorch context, you started out by using the Tensor abstraction as the thing that was the container for the MatMul.

So, let's talk a little bit about how Tensor works, because this is the first really Swift for TensorFlow piece of this. Tensor is a type, and Tensor can have multiple different element types in it. Like we talked about before, you can create one with zeros or random numbers. And the nice thing about tensors is that they carry a shape, just like you'd expect, and so you can get it with .shape.

So, here you can see we have a 5 by 784, just like you might expect. And here we have a two-dimensional tensor, and you can print it out, and it's a two-dimensional tensor, just like you would kind of expect. Python has the @ operator to do MatMuls of two tensors.

Swift has the same thing, but it uses the nice Unicode thing. There's an easy way to type this if you're on a Mac or if you're using the compose key on Windows. Or if you don't like Unicode, that's also totally cool. You can just use the MatMul function and just spell it out.

And so, you know, this is an example of Swift just wanting to work the way you want to work. And if you like math, then you can have math. If you want to type things out, you can do that too. They're both great. Tensors do all the basic stuff you'd expect.
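
As a hedged sketch of the 2019-era Swift for TensorFlow API being described (names may have shifted in later versions, and the Unicode operator may come from the notebooks rather than the library):

    import TensorFlow

    let a = Tensor<Float>(randomNormal: [5, 784])
    let b = Tensor<Float>(randomNormal: [784, 10])

    let c = matmul(a, b)            // spelled-out matrix multiply
    let d = a • b                   // the Unicode operator form
    print(c.shape)                  // [5, 10]

    let e = sqrt(d) + 1             // element-wise math works as you'd expect
    let f = d.reshaped(to: [10, 5])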

You can reshape them with the reshape function. They support square root and all the other math stuff. It all works the way you'd expect. It has element-wise operations like add and multiply and square root and pow. >> Now, we have a question from earlier. Why was it unsafe, what you did?

>> So, what we did was we turned off bounce checking. And so, if I write code that -- if I have an array of 10 numbers, and in Swift, if I access out the 11th element, it will explode and tell me that that's invalid. If you use unsafe, then it will let you do that.

And so, whatever happens to be in memory beyond the end of your array, you're now poking it. And, you know, you should not do that, but you're taking the guardrails off. And so, this is -- Swift is trying to be default -- by default safe, and it's trying to help you.

It's trying to check things for you. But if you want to, you can just rip off all the guardrails. And just like we showed you with Python, like you can literally do things as dynamic as Python if that's what you'd like. But, you know, the defaults are there to help you out.

>> Yeah, so, Python programmers, a lot of them won't be familiar with this idea. But in things like C, unsafe code is code where you're working with memory that hasn't been initialized, or it's been freed. And it's a really big problem if you're using it like in production or something.

Because that kind of thing is how people can like inject shell code into your server. >> Security form. >> And steal your users' data and whatever. So, you know, you should -- I think it's fine in a Jupyter notebook, though. >> Yeah, yeah. So, coming back to tensor, you know, you can add them.

You can multiply them. You can -- like all the normal stuff you'd expect is on tensor. Tensors also -- if I run the right cell -- tensors also have a bunch of methods that do cool things like convolutions and other stuff like that that we'll talk about later. One of the interesting things about Swift is that it likes comparisons to return Booleans.

And so, you'll see that if you compare two tensors, it will see if -- it will give you an ordering of the two tensors. But sometimes you want to get a tensor of Booleans back. And so, Swift calls these the point-wise operators. And so, if you put a dot before the less than or the greater than or the equals or whatever, it will do a tensor comparison.

>> Yeah. And I get burnt by this all the time in NumPy and PyTorch that doesn't have this level of consistency. So, I think that this design call is awesome, this idea that Boolean operations always return Booleans and point-wise operations return point-wise Booleans. It's a good idea. >> And then you have reductions like any and all that say, hey, if I have a tensor of Booleans, I can check to see if they're all set or if any of them are set.
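
A hedged sketch of the point-wise operators and reductions being described (again, 2019-era names):

    import TensorFlow

    let x = Tensor<Float>([1, 2, 3])
    let y = Tensor<Float>([3, 2, 1])

    let mask = x .< y            // Tensor<Bool>: [true, false, false]
    print(mask.any())            // true  -- at least one element is set
    print(mask.all())            // false -- not every element is set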

>> So, basically then, I mean, the next part of the notebook is just saying, hey, look, all the stuff that you've seen before looks exactly the same as what you've seen before. Sometimes the words change, like unsqueeze is called expandingShape(at:), which is a rather Swifty way of saying things.

But there's -- in a lot of these notebooks, you'll find that there's like lots of details where we've just basically copied the Python code and we're not going to show you all those details because they're the same. >> Yep. Now, let's talk about matmul on tensor. So, what we've done here is we've defined the same matmul that we had before and before we took two arrays and we took two dimensions.

The tensor carries a shape. So, here we implemented matmul on tensor. We start by creating a zero tensor, we loop over it all. Now we have our two-dimensional indexing just like you saw before with NumPy. When you run this, what you'll see is this is a lot slower. This takes seven seconds to do on matmul where before it was taking 0.07 -- >> 0.1 seconds.

>> Yeah, milliseconds. So -- >> So, what is this? This is about -- >> It's thousands of times. >> 10,000 times faster. >> Yeah. So, why is that, Jeremy? >> Why is that? The first thing I want to say is that hopefully at this point you're thinking this is a problem because it's kind of like the exact opposite of everything that Chris has been telling us and I've been telling you about why this is good.

Like, what's the point of something that's infinitely hackable if, when we go beneath the TensorFlow layer, it's so slow that we can't really actually write things that run -- I mean, seven seconds for a small matrix multiplication? Extraordinary. So, we would not -- we would not be running this course if this is where we were heading, right?

This is where we are now and it's a temporary situation that we're fixing. And so, let me explain what's going on and how it's being fixed, right? So, the first thing to point out is that when you work with PyTorch, we have a similar issue, right? Is like we don't write PyTorch triply nested for loops either, right, and the reason we don't is that we need PyTorch to have a reasonable amount of work to do each time we get it to do something, right?

So, we kind of say here's a whole matrix A, here's a whole matrix B, there it all is, multiply them together and then it says here's the whole thing multiplied together and that's what we do. So, it's like if PyTorch was an airplane, right, and we want to send our stuff off to China, we pack it all in a bag and we put the bag in the airplane and it gets sent off to China.

As opposed to the triply nested for loop version, which is where I take a sock and I put it in an airplane and it flies to China and back and then I put it in my next sock and it flies there and back. And it's going to take, that's a fast airplane, right, but it's just not an efficient way to use it, right?

So, we already have this issue which is you've got to give PyTorch enough work to do to make this latency, this overhead worthwhile. Now, TensorFlow was designed in a very different way to PyTorch and for those of you that did the earliest courses with fast AI, this will look very familiar, right?

It's actually a really fascinating programming model. You say there will be later on a float called X and later on I will want to multiply that float by two. Now, set up a session where we're going to do some stuff and run this computation graph, which could have lots of things going on in it, and run it in these things, right?

So, I basically kind of set up this whole series of computations and then I pass in some data. So, this is a very different feel to PyTorch, right? And because TensorFlow was built this way, TensorFlow, to me, does not behave like a plane. It behaves like a ship or actually a ship designed for shipping ships, or actually a shipping ship designed for shipping shipping ships, which is this particular one, the MV Blue Marlin.

So, if you have a shipping ship shipping ships ship, then you need to use it in a different way, which is if you want to get all of your socks, all of the socks in America to China, you send them all, send all of your ships off to all of the ports in America, everybody dumps their socks on and they all get sent to China and we're all happy, right?

Now, to take advantage of this extraordinary power, you have to use it in a certain way and you have to have certain things that you need to be able to do. So, like, if you're Google and you want to run all the world's search engine queries, this makes a hell of a lot of sense.

Now, TF Eager is the kind of the new hot thing for TensorFlow and it's designed to, or it does look like PyTorch, right? So, this is what happens when you say tf.enable_eager_execution(); that's becoming the default in TensorFlow. You can say, here's my number. I'm not saying there will be a number later.

I say, this is my number and this is my matrix multiplication, right, and I can print it. The thing about this is, though, is that this is kind of syntax sugar on top of the ship, shipping, ship, shipping, ship, ship, right? Or whatever the hell the thing is. Because we're still using the same, a lot of the same kind of foundations and some of it's been optimized but only a bit, right?

So, as I say this today, as of April 2019, a like 5 by 5 matrix, like a tiny matrix multiply on a GPU with TF Eager takes 0.28 milliseconds, which is 10 times longer than PyTorch takes, right? And so, we still just have a lot of overhead and so TF Eager is not a solution to writing the kind of low level get down to the bottom stuff that Chris is saying, you can do with Swift.

>> Yeah, but also neither are GPUs. The GPU is not going to be fast at a 5 by 5 matrix multiply either. >> Right, right. So, I mean, it's, but it's not, you know, we want something, we want something better. >> Yes, right. >> So, TensorFlow has this big ecosystem of things to try and kind of fill in around this, around this issue of having this huge kind of mechanism that works in a particular way to make sure that, you know, you can put it on mobile phones or that you can do it on web servers or whatever, right?

But the good news is that what's happening at the moment and the reason we're seeing this speed, right, is that behind the scenes, Swift for TensorFlow is using TF Eager. And this is like a great choice because it lets us like do this course, it lets us say like here's how you use it, we can build things on top of it whilst the real stuff is being built behind the scenes and the real stuff which is to sit on top of this thing called MLIR which Chris can tell us a little bit about which is basically gets all of that compiler goodness that you've seen and allow that to work with the GPU and the CPU and different types of accelerators and let you write Swift, right?

So the reason I mention this is that for probably as long as this course is relevant, you know, like the next year, the true potential of what we're talking about, you kind of won't be able to see it, right? We're actually building for a future that's not here yet.

This is like a year away. But when we get there, all the stuff that Chris is showing you, we'll be able to write stuff that looks, that could even look like this. >> Yeah, so if I were, a different way to explain it, a year from now it's going to be mind blowing.

Like the future, you're going to be able to do stuff you've never even thought that was possible and use these accelerators in ways that are just completely unthinkable unless you're writing low-level CUDA today. Like there are certain humans in the world, like Scott Gray is one of these people who can make an accelerator do crazy things that nobody even thought was possible.

And that's what we're trying to do, but in a workbook. >> Right. And the reason this matters is that there are vast areas of unexplored research territory because, I mean, most people can't write the CUDA code, and even those that can, it takes so long and it has so many errors, you just don't, right?

So in a year's time, we'll be able to do stuff that people just aren't even trying to do yet. >> But one of the cool things about this is you don't have to wait a year, so next lesson we'll show you that XLA is here today, XLA is super awesome, it's a really important part of the TensorFlow ecosystem and it's way better than the Torch JIT.

>> Right. So just to explain, yeah. >> So we want to like completely jump over the industry and do something that is mind-expanding. >> Right. >> But even today, TensorFlow is a lot of power and XLA allows you to express and build really cool things with super high performance.

>> Exactly. So XLA is this really nice kind of intermediate thing where it's much more mature than the PyTorch JIT, it's been around for a couple of years now. It's a compiler that will turn your code into stuff that's kind of similar-ish performance to what you might see from PyTorch JIT, probably a lot less rough edges.

>> It doesn't generate blobs of C++ and try to compile them again. It's a principle compiler. >> So it's a really neat path because it allows us to do this course now, it allows you to start playing with this now in a couple of months, it allows you to get a lot of performance for a lot of things that you might want to play with and it means that by the time MLIR comes, we'll be all ready to hit the ground running.

>> Cool. >> And is there a way to make sure the matmul or other functions are correctly using shared memory on the GPU? For example, using tiling to make sure you aren't constantly busting the cache of shared memory on the GPU. >> We're not going to talk about this until next week, so maybe we could come back to that, or?

>> Well, so I think that the thing to know is that this is not something you would write in Swift for TensorFlow, right? You would not poke a tensor one float at a time. It was just not designed for that. And so you can go through all the steps.

This is very similar to the Python workbook. But what you end up wanting to write is, let's see here. You just write this where you write something where you take two matrices and you're multiplying together or you use the Unicode one or the matmul one and it goes fast and it takes 0.02 seconds which is faster than Swift version because it's using the GPU.

It's properly tile blocked. If you run on the CPU, it uses all the threads on your computer and it goes really fast. And so the way to think about tensor is that it's meant for these big operations. It's not meant for one float at a time. >> And we will see next week some really interesting stuff coming down the line with stuff where you can write kind of tiled algorithms in ways that are much more concise than the triply nested for loops but much more flexible than the matmul.

>> Yeah. Join. >> Sorry, another question just came in. How do LLVM, MLIR and XLA relate to each other? >> That would be better explained with slides which we'll go into on the next time I think. But LLVM, the simple way to explain it is that it is really good at CPUs.

It's a little bit of an oversimplification because we do use it for some GPU stuff. But LLVM really helps you with the one float at a time kind of a thing if you're going to a simpler processor. XLA is really good at tensors and so it's a tensor compiler and so it's really good at saying I have these big tensor operations, I have convolutions to get maximum performance out of a CPU or a GPU or a TPU for example.

You have to know about tiling, you have to know about fusion, you have to know about a lot of these low level systems things before you then hand it off to LLVM that does the low level stuff. And so XLA talks to LLVM for GPUs for example and for CPUs and there's the way to think about it is XLA does all the tensor stuff and LLVM does all the float and small vector stuff.

MLIR is like XLA in a certain way but it's tackling graph level optimizations in tensor flow and kind of expanding XLA beyond just dense linear algebra because there's a lot of interesting sparse things and other things that are coming down the pipeline that are really exciting. >> So yeah, so basically I mean we won't look at the rest of this notebook other than to say that the broadcasting stuff that we saw is all here.

So you can kind of see how that all looks at the moment. >> You can run ops on the CPU or the GPU. I mean all that kind of stuff. >> All that stuff is all here and don't worry about the performance, it's really slow at the moment for the reason we mentioned, but it totally won't be.

And you can also see matrix multiplications of different sizes and how their timing looks and so forth. So did you want to go to Raw now, or? >> Well do you want to do this or do you want to go to 11? Do we have time? >> We have time to do this now.

>> Okay, so. >> Five to 10 minutes. >> Okay, cool. So one of the really cool things about the stack is that tensorflow is a really mature ecosystem. It has hundreds of different operators available. There's pros and cons of this. So tensorflow kind of grew organically over time a little bit and so it has a lot of things in its toolbox.

What Swift for Tensorflow does is it tries to curate that and it has tensor and the way tensor works is it's the struct and the struct is implemented in terms of those low level tensor operations. And so if you look inside tensor and here there's a link so you can go click on it and see the implementation.

Tensor has this thing called tensor handle that is under the covers basically the tensorflow low level thing that eager mode uses. And if you look at plus, what plus does on two tensors is it calls this raw add operation. And the way that this works is this is just directly talking to the add op in tensorflow.

And the add op is implemented with cuDNN or it's implemented with XLA or it's implemented in different ways for different pieces of hardware. But this is just a simple syntactic sugar thing where we're saying hey, plus turns into TensorFlow add. Now again, TensorFlow has tons of cool stuff and it has stuff that I barely understand with lots of mathy things and triangular decomposition things and like Bayesian propagation of things that I've-- >> We have an excellent course about triangular decomposition if you-- >> Awesome.

I'm going to try to survive next week and then I'll take it. And so we haven't curated all of the things that are potentially interesting. And so what you can do is you can actually add new things to tensor. And so one example of that right here is so you can get like zeros like if you go into here, let's see if this is.

So with tab completion you can see all of the interesting things: addManySparseToTensorsMap, addN, adjustContrast, asin, like it's got lots and lots and lots and lots and lots and lots and lots of-- >> And this is super cool, particularly if you're watching this between like about April and about September, like in the period where maybe the XLA stuff isn't as fully fleshed out.

You probably care about this because there's-- >> Lots and lots and lots and lots and lots and lots and lots and lots of-- >> Which we haven't necessarily surfaced yet. So for example, somebody today was saying, how do I switch from RGB to BGR format, and somebody said, oh, there's something in TensorFlow called reversed, and so here's the answer, Raw.reversed.

So it's worth knowing about this. >> Yeah, and so one of the things we use for XResNet and other image models in this course is, hey, we need to be able to load a file. And you could do that with Python, that's fine. TensorFlow has great stuff for doing this, and so here we just use Raw.readFile.

And so all we're doing is we're adding a method to StringTensor, and we're saying, hey, if you want to create a string tensor from a file, we can just use readFile, and now I can say, give me a StringTensor with readFile "foo", and I added a decodeJpeg method on here too, so now I can say .decodeJpeg() and I've got my file, right?
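
A hedged sketch of what such an extension could look like; the Raw op names and signatures here are assumptions based on the 2019-era generated bindings and may not match the notebook exactly:

    import TensorFlow

    extension StringTensor {
        // Create a string tensor holding the raw bytes of a file on disk.
        init(readFile filename: String) {
            self = Raw.readFile(filename: StringTensor(filename))
        }
        // Decode those bytes as a JPEG image.
        func decodeJpeg(channels: Int64 = 3) -> Tensor<UInt8> {
            return Raw.decodeJpeg(contents: self, channels: channels)
        }
    }

    // let img = StringTensor(readFile: "foo.jpeg").decodeJpeg()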

And so this is one of the cool things about this platform is that TensorFlow has all this functionality. We're not trying to hide it, we're just trying to curate it a little bit, but again, you can just go add whatever you need. >> Yeah, so one of the people in the study group today was building an audio library with Swift for TensorFlow, and we haven't surfaced any of that, so they were grabbing, you know, Raw.decodeWav or something, and they had it all up and running.

>> Yeah, and it's super cool. And again, Swift gives you nice ways to build these things as APIs with default arguments and all this nice stuff and so you get a lot of design space to do things that work the way you'd like them to work. >> Cool. So the way we're going to do this is we've kind of gone like super, super bottom up.

I must admit I thought we had done bottom up before but this is another level of bottom up. >> Then he brought a compiler guy. >> Yeah, then we brought a compiler guy who, you know, is always good at making me feel small and insignificant. And so, but now let's jump back up to the top again to see where we're going to end up and then next week, we're going to kind of flesh out all the stuff between the middle, right?

So I'm going to jump to notebook 11. And notebook 11 is interesting because this is the one where we train an xresnet on imagenet, right? So this is where we're heading. So every time we import the previous notebook, just like we do in Python, the previous notebooks, however, aren't just numbered but they also have the name.

That's the only difference. And then this is just the equivalent of %matplotlib inline in this environment. So here, load data, we'll show you how we built something that downloads Imagenet, but it basically looks very similar to the download-MNIST thing you've already seen.

And we've created an item list which has extensions. And we've created a split data which takes an item list. And one of the nice things here is that we don't really need something like functools.partial in Swift because now we can just pass in a trailing closure which as Chris described, if the last parameter is a closure, then you can just whack it in curly brackets.

You don't even need a return or anything. And you don't even have to give it an argument name because you can use the default ones. So we're saying split this item list by grandparent. This is the file name that you're going to get. This is basically like the equivalent of doing partial, right, and it's going to be some different validation set.

And so now we can create our label data and we need a processor. So we have, again, a category processor. So you can say we've got a whole data blocks API here. One of the things that I guess you're going to talk about next week, Chris, is end and mutation and stuff.

>> Sure. >> Yeah, OK, so basically in Swift, as Chris mentioned, most of the time we use structs. And as Chris will describe to you, structs are things that normally don't change. But you can create something that kind of feels a lot like a C++ reference or a C pointer, but it's much cooler, by adding an ampersand.

Because remember, processors actually change their state because we get like a vocabulary, for example, the first time we use a processor on the training set. So now we've got a split label data set. And then we've added a to data bunch and we can pass in all the normal stuff, including a batch size.
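
To make the ampersand concrete, here is a minimal sketch of inout with an illustrative processor (these names are made up for the example, not the notebook's):

    struct CategoryProcessor {
        var vocab: [String] = []
        mutating func fit(_ labels: [String]) { vocab = Array(Set(labels)).sorted() }
    }

    // inout lets the function mutate the caller's struct in place.
    func label(_ items: [String], with proc: inout CategoryProcessor) {
        if proc.vocab.isEmpty { proc.fit(items) }   // first use builds the vocabulary
    }

    var proc = CategoryProcessor()
    label(["cat", "dog", "cat"], with: &proc)        // & marks the inout argument
    print(proc.vocab)                                 // ["cat", "dog"]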

So next thing we can do is we can do transformations. And again here, we can use a trailing closure to basically do a partial, to say that we're going to do resize in our transformations. So then we'll grab a batch. Something that I think Chris will probably talk about next week more is this thing.

But basically in Swift, very often you want to be able to say, hey, this is going to return either a batch of data or maybe nothing at all, right? In Python, we'd use the Optional type for that. And it's called the same thing in Swift, right?

>> Yeah, none. >> None, yeah. So basically what happens is if you have something that might not return anything -- so one batch might not return anything because there might be nothing to return, it can return nothing -- then the exclamation mark just says, assume that it's something. So we can look at the shapes of that batch.
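
A minimal sketch of optionals and force-unwrapping as just described (oneBatch here is a stand-in, not the notebook's function):

    // Returns the first batch if there is one, or nil if there is nothing to return.
    func oneBatch(from batches: [[Int]]) -> [Int]? {
        return batches.first
    }

    let batch = oneBatch(from: [[1, 2, 3]])
    print(batch!.count)                 // ! says: assume there is something here

    if let b = oneBatch(from: []) {     // the safe way: only unwrap if it's there
        print(b.count)
    } else {
        print("no batch")
    }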

And look, we've even got show batch. So it's been really fun, this process of, you know, in the last couple of weeks of basically saying, what does fast AI look like on Swift? And one thing I'll say is like a lot of these notebooks have been written by Sylvain in particular and by me a little bit.

And we don't know Swift at all well. So any good Swift programmers looking through those notebooks thinking, oh, this is nice, but it'd be even more Swift-y if you did blah. Please let us know in the forum, because we're super interested to learn how to write better Swift. >> And I've been super interested to learn all the ML.

It's been great. >> It has been great. I mean, it's, you know, in one sense, it's a good sign that you're learning fast AI for Swift from the people who started the fast AI in Swift projects, but on another sense, I know nothing about Swift and Chris doesn't know much about deep learning, so maybe it's the worst of all possible worlds.

I don't know. >> No, I think we're all learning together. >> So anyway, yeah, it's been super fun. So as you can see, we've got a data blocks API that's now working. The other thing I mentioned, as you'll see next week, is the way we've got this working is it's using a TensorFlow API called tf.data, which is actually a lot better than a lot of data APIs, but it's still not nearly as good as I would like, and I would love to, as a community, start building out the next version that uses like Swift's libdispatch to do the threading and maybe openCV to do the transformations and stuff like that.

Like we can build, I think, a data blocks, something like the Python data blocks API, but that is like native. It's not talking to anything else. >> Yeah. >> Anyway, so now we've got batches of data. We can train a model as soon as we have one. So let's create an x-resnet model, and as you've already seen in the slides, it ends up looking very, very familiar.

So here's our conv layer. Just one thing to mention: at the moment, and this will probably only be true for a couple more weeks, there are kind of two versions of all the layers. There are the versions in the fast AI repo, which all start with FA, and there are versions in the Swift repo that don't.

So just ignore these FAs. So a conflier has a batch norm, and it has a convolution. Another thing that's slightly awkward at the moment is that we -- so you'll see, right now, some of our code looks weird because auto diff in Swift doesn't support flow control, so if or for loops.

That'll change soon enough, right? So when you see something like no bias convolution, that's because we can't write a convolution layer that has an if statement saying if the person asks for bias, use it, otherwise don't, right? So don't worry too much about those workarounds. Either they'll go away soon enough.

So we've got a batch norm layer, we've got a conv layer, and we can go through them, and the zero bn is the stuff that you're used to, and as Chris said, dunder call is built without the dunder, but otherwise everything looks the same as usual. Because we don't have the ability right now -- this will change soon -- to use if in code that needs to be differentiated, we've basically added something called a switchable layer, which is something you can turn on or off, so the details don't matter.

Chris will describe next week, however, how we actually wrote our own kind of custom gradients for this kind of layer, and that'll be fun to learn about. So then we used that to basically have something where -- because remember in xresnet, in the identity path, it's not always an identity path.

Sometimes you down-sample in that path, sometimes you change the number of channels in that path. If you down-sample, then you maybe add an average pool 2D. So because, again, we don't have the ability to have if, we just use this switchable layer, and maybe you change the number of channels by adding a one-by-one conv, so that's all that is.

So most of this stuff, if you're watching this, you know, much later than kind of July or something, this will probably all have gone away and been replaced by some if statements. But, you know, once we've got all that, the res block, there's really nothing to mention, is there?

I mean, it's basically identical. If you look back at the version in 11 on -- in the Python versions and kind of switch between them, you almost need like a strobe-like thing to see that they're different. Like, it's the same layers equals conv layer. Layers equals conv layer. I don't know why we changed the name.

Got this ternary here. This question mark and colon is identical to if and else as an operator in Python. It comes from C. And then, yeah, and then finally in the call, that and that look exactly the same. >> Pure self. >> Yeah. Thank heavens. Thank you. Make layer looks basically the same.

This is the make layer we had before. This is the make layer we have now. Don't need that. And so it's interesting to see how some swift things kind of come out quite neatly, right? So this use of map, so this is generating -- this is the same as range and blocks in Python.

So this is basically saying map, range and blocks, and then passing in this closure which generates the res block, right? So I think it's kind of -- I don't know. I find it more clear, the swift way, but very, very similar. >> And the idea of Swift is to have simpler primitives that compose instead of having special cases for the important things.

>> Yeah. So now we've got all that. The x res net looks very similar to what we would expect. We've still got our number of filters thing going on. The stem, so now we've got that array map thing. You're kind of going to start to get a feel for these kind of idioms in Swift.

So kind of range.map is a useful kind of idiom to be aware of. >> You can also use a for loop. You can say for i and 0.3. That's also fine, too. It just depends on how you want to write the code. >> There's an enumerate, but rather than enumerate bracket something, it's not enumerated, but it works the same way.

When you map to it, you get back an index and an object just like in Python, so very familiar. So in this case, because we've gone .map and then .reduce with plus on a list, this is a list comprehension now, right? Because this is spitting out a bunch of lists that we're joining all together.

So those special cases there. This is one of those cases where you're asking for the last element of a list. List could be empty, so there might be nothing in it. So exclamation mark says just assume that there is something there. And we've written a little compose. So we can compose this list of layers on our input.

So again, we've got kind of similar concepts expressed in similar ways. So we can now put all that together, and we've got all our different resnets. So now we create the various functions that we need to pass into our ladder. So one is a function that's going to create a model.

So it's going to be something that creates an xresnet, and that's the function that's going to create a model. We're going to have a function that creates our optimizer, which, as you'll see, is a stateful optimizer, just like we had in Python. We have a learner, just like we had in Python, and it has a very, very similar look to it.

And again, next time we'll talk about how all these are built. And so atom optimizer, of course, is just a thing that's hackable. Yep. You can change it. Exactly. We have recorder callbacks, just like we're familiar with. We have one-cycle training, just like we're familiar with. This add one-cycle delegates and make default delegates is probably going to go away pretty soon, and we'll have some slightly neater ways to do this.

So by the time you see this notebook, this might have changed a bit. And then we train it with a resnet 18 for a few minutes, and we're at 0.81. Couple of things to mention as I go through this end of April. Right now, this uses about twice as much memory as PyTorch, and it's about three to four times slower than PyTorch.

No fundamental reason why this need be the case. We've just started. And so the fact that within-- It's not bad for three weeks. Not bad for three weeks. I mean, and all the work-- From nothing. And all the work that you guys did to build the auto diff in the first place.

Three weeks ago, it really didn't work. Yeah, so it's pretty cool that we're at a point where we can actually train proper models like this from scratch, not too slow and not too memory intensive. And if you're interested in getting into the weeds, we would certainly love help with fixing the performance and fixing the memory use.

So that's a related question. What would be the best way to contribute to the Swift for TensorFlow ecosystem as someone who's now using Swift for the first time? Yeah, that's a great question. So the best place to go is github.com/tensorflow/swift. That's our landing page. There's also a bunch of tutorials there.

It explains how to get and build things. One of the things that we're doing is that we're building everything in the open. And so we do our development in the open. We use GitHub. We have our discussions on a mailing list that you'll find linked up for a GitHub page.

And so we try to have a really inclusive and welcoming community. And there's a ton of resources available and a lot to do. Yeah, and that's one way to do it. I would like to suggest another way, which is to come to the harebrain forum on the fast.ai forums.

Because I think for a lot of people, the right way-- the best way to contribute, the way that you'll get the most out of, the most relevant to you right now, is to pick something that you've already been using in Python and doesn't exist yet and create the Swift version.

And you may think you have no idea how to do that. And perhaps you don't. But create a really crappy, slimmed-down Swift version and build from there. That's the only way any of this stuff gets done. Ask for help on the forum. Offer help to others. And so pick small bits or find some piece that hasn't been documented yet.

We haven't really figured out yet where-- To put things. Yeah, where fast AI lives and where Swift for TensorFlow lives and where different repos will be. But in the end, between the fast AI and Swift for TensorFlow repos, there'll be a kind of an ecosystem that covers the same kind of ground that PyTorch plus fast AI covers and has just as much room for you to-- well, much more room, actually, for you to build things on top of that.

Because you've got the entirety of scikit-learn and Matplotlib and Pandas and all this stuff to-- Another thing is, if you go on the fast AI GitHub, you'll see these notebooks. And we skipped from 1 to 11. And so next time, we'll go back through and talk about two and three and four and five.

But these are all there for you now. And so if you'd like to look, you can go do that. And they'll get a little bit better by next week, I bet. Yeah. And one thing to mention is, with the normal fast AI notebooks, we pretty much freeze them. Once we do the course, we just fix errors.

And that's about it. These notebooks are going to be very different. We're going to keep them very, very up to date. So by the time you watch this, they may look very different. Because we always want them to show you: this is how we suggest you write Swift for TensorFlow code right now.

And even over the last week, we've been-- if you look at the GitHub history, you'll see we've been discovering new things, like differentiable arrays and similar features for layers, which let us delete lots of previous workarounds. The next weeks and the next couple of months will be similar.

So yeah. Do you want more questions now? Sure. All right. Is Swift thread safe? Yes. Swift is both thread safe and has a really great threading abstraction called Dispatch. And so you can fire up lots of concurrent work items and set up work queues; it has a really vibrant and good API for doing that, with quality of service and all these advanced things available there.

Yeah, I've never used such a nice kind of threading-- it's like it's a framework. It feels like more than just a language feature. So on the Apple side, they call it Grand Central Dispatch. But they've ported the whole thing over to Linux. And you have this whole threading library framework, which is super easy to use and extensible.
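Here's a small sketch of the Dispatch API being described, which works on both macOS and Linux (nothing here is specific to the lesson's notebooks):

```swift
import Dispatch
import Foundation

// A concurrent queue with an explicit quality-of-service class.
let queue = DispatchQueue(label: "com.example.work",
                          qos: .userInitiated,
                          attributes: .concurrent)

// Fire up several concurrent work items and wait for them all to finish.
let group = DispatchGroup()
for i in 0..<4 {
    queue.async(group: group) {
        print("work item \(i) running on \(Thread.current)")
    }
}
group.wait()

// For simple data-parallel loops, there's also a one-liner:
DispatchQueue.concurrentPerform(iterations: 8) { i in
    print("iteration \(i)")
}
```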

This is one of the reasons the Swift server community really likes Swift, is that it's efficient, yes, but it also supports threading and other things really well and very naturally. Oh. Are there any scikit-learn for Swift projects in the works? I have no idea. There should be. Yeah, I haven't seen anything much.

I have a random forest implementation I would love to convert over to Swift, which is a pretty nice one. But it would definitely be nice if you could build a gradient boosting machine or even simple things like K-nearest neighbors and stuff like that. I think, though, that the opportunity here is to go beyond just reinventing what's already there.

It's cool that scikit-learn exists, but it could be a lot better, particularly in Swift. So if you do decide you want to build bits of sklearn or bits of pandas, jump on the forum and talk to us about it, and let's try and build something that's vastly better than what's come before, not just a copy of it.

I wouldn't suggest that as a starter project, though. If you want a starter project, pick one of the lessons from the lessons one through seven course and implement some of those algorithms; in terms of the framework, I think that'd be a really great place to start. As you get more experienced and more familiar with Swift, then tackling something like building a big library can be a lot of fun.

Is there any plan to build tensor shapes into the Swift type system? So we have a tensor shape type. It's literally a struct right now, and that's what you pass in when you create a tensor of zeros, for example. I think what they're probably asking is: will we get dimensions in the types, or will we get names on the dimensions?
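For reference, here's what that looks like in Swift for TensorFlow as of this lesson (the API may well have evolved by the time you read this): the shape is an ordinary struct value you pass in, not something the type system checks.

```swift
import TensorFlow

// TensorShape is just a struct; it can also be written as an array literal.
let shape = TensorShape([2, 3])
let zeros = Tensor<Float>(zeros: shape)        // a 2x3 tensor of zeros
let alsoZeros = Tensor<Float>(zeros: [2, 3])   // same thing, array literal

print(zeros.shape)                     // [2, 3]
print(zeros.shape == alsoZeros.shape)  // true
```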

So this is something we're super interested in. We think there are a lot of opportunities there, in terms of shape checking, for example. The whole stack we're building, with the compiler integration and the good error locations and stuff like that -- we think there's a ton of potential. We haven't tapped into that yet.

Names on dimensions are tricky, and there's a lot of design space. We haven't exactly figured out what the best way to do that is, because there are trade-offs. But those are exactly the second-step things we want to do, probably starting this fall-ish, when the basic autodiff, baseline performance, scale-out, and a bunch of other things are all settled.

And we feel good about all those basics. We're trying to stay focused on making sure that things are really good, building the basics and getting them right, and then moving out from there. Any more questions? No, I mean, that's fine. We're at a good spot. How is ampersand referencing different from struct?

Oh, we'll talk more about that next time. So Swift has a-- this comes back to safety in the language. Swift has a completely different approach to references, like classes, and structs, and values. And so I'm going to save that mind-blowing piece for next time. It's really cool. It is.
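In the meantime, here's a tiny preview of the value-versus-reference distinction behind that question, in plain Swift (nothing S4TF-specific): structs are value types that get copied on assignment, and an `inout` parameter, called with `&`, lets a function mutate the caller's variable in place.

```swift
struct Point { var x = 0 }

func bump(_ p: inout Point) {
    p.x += 1
}

var a = Point()
let b = a        // b is an independent copy, because Point is a struct
bump(&a)         // `&` passes `a` inout, so the function mutates `a` itself
print(a.x, b.x)  // 1 0

// Classes, by contrast, are reference types: both names point at one object.
final class Counter { var x = 0 }
let c = Counter()
let d = c
d.x += 1
print(c.x)       // 1
```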

It underlies a ton of-- I mean, it's a very foundational thing that makes a lot of stuff possible. And how is Swift for probabilistic programming? So this is one of the areas that I'm completely incapable of talking intelligently about, but also very excited about, because this is one of those things that I think is really underutilized.

One of the things that I think is really interesting about Swift as a platform for machine learning is that you often-- so with Python, you end up in this weird situation where you build a machine learning model, and then you have an application that you eventually want to use it in.

And these are two different things. Training your model and deploying your model are different worlds. And we can start erasing some of those boundaries, because Swift can be put in a mobile app, believe it or not, or put in a server, or put in other things you're actually deploying.

And probabilistic programming and many of these other applications I think would be great to build and integrate with the applications themselves. So I think the answer is it'll be a great fit. I haven't seen anything here yet. But basically, with things like probabilistic programming or things like kind of graph-based systems, they're not a great fit for PyTorch.

And that's not a controversial thing to say, because Soumith Chintala, who created PyTorch, said that on Twitter himself last week. He said, if you want to do this kind of work at the moment, you might want to look at Julia, which is another great option for this kind of programming.

Because what happens is you have these kind of deep connections, computational graphs, of lots of things calling lots of other things. And so you need-- and they're often kind of small. So you need those things to happen super quickly. So things like Julia and Swift work really well for that kind of thing, particularly when you add all the threading stuff on top as well.

So if you're interested in that area, that would certainly be one that I think you could start getting into straight away. One of the nice things about that is you can do a lot on the CPU. A lot of these things don't even make sense on the GPU. So you can take advantage of it right now.

And for that, actually, we'll add it to the forum post. But I actually have a post already about how to access a variety of random number distributions from Swift -- C++ random number distributions -- so you could actually get started with this right away. Yeah, also, the TensorFlow ecosystem has a big, mature framework called TensorFlow Probability.

And so I personally don't know very much about it. But I expect that all the atoms you need are all there. And we just need somebody who knows the space really well to build a Swift library that can expose all the primitives that TensorFlow already has. Is that it?
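On the random-distributions point: the forum post Jeremy mentions describes wrapping the C++ distributions, but just to show you can get going in pure Swift today, here's a small sketch that draws uniform samples from the standard library and turns them into normal samples with the Box-Muller transform. This is purely illustrative, not the approach from that post.

```swift
import Foundation  // sqrt, log, cos on Linux and macOS

// Draw one sample from a normal distribution via Box-Muller.
func normalSample(mean: Double = 0, std: Double = 1) -> Double {
    let u1 = Double.random(in: Double.ulpOfOne..<1)  // avoid log(0)
    let u2 = Double.random(in: 0..<1)
    let z = sqrt(-2 * log(u1)) * cos(2 * .pi * u2)
    return mean + std * z
}

let samples = (0..<5).map { _ in normalSample() }
print(samples)
```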

How could you deploy Swift models on Android? Well, so I think there are two options that you have there. So one is, Swift, again, builds on the entire TensorFlow ecosystem. And the TensorFlow ecosystem includes graphs, and graphs are part of what Swift talks to. So you can export a graph.

You can send it through TF Lite. And so the whole mobile deployment situation there works. I feel like that's kind of the model we're trying to get away from a little bit, though, do you feel that way? So the other option is, Swift actually runs fine on Android. People ship Android apps written in Swift.

So you can do that, too. Swift on Android isn't really awesome, as far as I know. I'm not exactly an Android programmer, so I don't really know. But the issue there is that on Android you have a dual world between the Java stuff and the native stuff, and Swift fits into the native stuff, is my understanding.

But I know that people are building and shipping Android apps written in Swift. And so that's a totally reasonable thing to do. The other thing to mention in terms of platforms is that Swift on Windows is starting to come together as well. So I don't know where it'll be by the time you're watching this at home.

But Swift is definitely making its way to worlds outside the iOS world pretty rapidly. And Windows is one of them. Yeah, it's super exciting. People are writing, what is it, Windows MFC apps in Swift, which is brain-twisting to me. So what we're going to close with now is a little taste of where we're heading next week.

And this is actually something that Rachel shows in her computational linear algebra course. And it comes from a really amazing programming language called Halide. And Halide is one of these dramatic steps in terms of completely rethinking how we program computers. And I want to show it to you because it gives you a sense of the kinds of problems that Swift has to solve. The goal here is to be able to get this performance.

Because remember, the C-speed, triply nested for loop was 0.07, but TensorFlow is 0.02. How do we get that level of performance while you're still able to write it yourself in Swift? Now here's why it's hard. This video actually comes from the Halide project, which is a programming language that has kind of invented an amazing way to think about this.

And so I'm going to use it to describe the problems that we're going to solve. In order to write something fast, we have to think about how everything's going to get put together. The algorithm we're going to look at here, which they wrote in this Halide video, is a simple blur, a 3 by 3 blur.

So we take each group of 3 by 3 pixels, and we take their average. That's how you do a 3 by 3 blur. What are some of the things we could do? In what order, for example, do I compute the values in my 3 by 3 blur? And one way is just go through each x one at a time, and then within that, go through each y one at a time.

That would be one choice. A second choice I could make would be to do the exact opposite, which is to go through each column one at a time. Now these aren't going to have very different characteristics. Maybe the latter might be a bit slower, because we're jumping further through memory.

But what we could do is we could do something called vectorization. And vectorization is super important, because what happens with vectorization is we actually take four or sometimes even eight numbers at a time, and throw them all at the CPU or GPU at once, and say calculate them all together.

And so we have these things called vector units in our computers nowadays that can do that. You can even have multiple vectorized things happening at the same time, because you have multiple cores. But in fact, in the GPU, this is what happens all the time. Or in order to think about better memory access, we could do a little block at a time, like this.
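To make the vectorization idea a bit more concrete, Swift's SIMD types map onto those vector units, so one operation can process several numbers at once (whether the compiler actually emits vector instructions depends on the target, but the types express the intent):

```swift
let a = SIMD4<Float>(1, 2, 3, 4)
let b = SIMD4<Float>(10, 20, 30, 40)

let c = a + b          // four additions expressed as one vector operation
print(c)               // SIMD4<Float>(11.0, 22.0, 33.0, 44.0)

print((a * b).sum())   // elementwise multiply then reduce: 300.0
```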

So there's lots of choices about how I compute my values that are going to change the performance characteristics of, in this case, my blur. Another question is, when should I compute my inputs? So here's my input. And see how it's going through three at a time? Because I'm trying to calculate three at a time.

And that gives me my blur in x. Now I have to go through all of those three at a time. And so you can see this is super slow. It's recalculating things again and again. And it's not able to really use the cache well. Instead, I could do a whole set of nine at a time.

And that would then allow me to create a whole blurred output at a time. Or I could go through it like this, exactly like before, but actually save what I had before. And that means when I then do the next row, I don't need to recompute, because it's already there.

OK, just to add a clarification: the left panel is the input, the middle is the intermediate values, and the right is the final output. Thank you. We could vectorize that. So we can vectorize the input, then vectorize the intermediate values, and then combine those to create our vectorized output with parallel processing.

Here's another way that we could combine vectorization and parallel processing. There are all these things you can do, and you can see they're going to have very different performance characteristics, right? So in Halide, they have this neat idea, which is: hey, let's not write nested, nested, nested for loops, and tiles, and loop through memory like that.

Let's instead describe, for each value (x, y) in my blurred output, how it's calculated, in a declarative way. This is literally the definition of a blur algorithm. So you first do that. And then, after you've done that, you can tell Halide about different schedules for calculating that.
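That separation of algorithm from schedule can be sketched in Swift terms too. Here, the definition of what each blurred pixel is lives in one pure function, and two different "schedules" -- row-major and column-major traversal -- drive the same definition. This is just an illustration of the idea, not how Halide itself is implemented.

```swift
// The algorithm: what blurred(x, y) means, independent of any loop order.
func blurredPixel(_ input: [[Float]], _ x: Int, _ y: Int) -> Float {
    var sum: Float = 0
    for dx in -1...1 {
        for dy in -1...1 {
            sum += input[x + dx][y + dy]
        }
    }
    return sum / 9
}

// Schedule 1: compute the output row by row.
func computeRowMajor(_ input: [[Float]]) -> [[Float]] {
    var out = input
    for x in 1..<input.count - 1 {
        for y in 1..<input[0].count - 1 {
            out[x][y] = blurredPixel(input, x, y)
        }
    }
    return out
}

// Schedule 2: same algorithm, but compute the output column by column.
func computeColumnMajor(_ input: [[Float]]) -> [[Float]] {
    var out = input
    for y in 1..<input[0].count - 1 {
        for x in 1..<input.count - 1 {
            out[x][y] = blurredPixel(input, x, y)
        }
    }
    return out
}
```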

So what's the order in which things are done? And these different schedules that are written here have all the different behaviors you just saw. Now, here's the thing. When expert CUDA programmers and expert CPU programmers write code to do things like this, they're using the world's best knowledge available across all of those dimensions to create special versions for every architecture, for lots of different matrix sizes, for tensors with different numbers of dimensions -- so much assembly code, right?

And so we can't write that, right? So how are we going to be able to write the stuff that's in our head, but have it run reasonably quickly? And so what we're moving towards with stuff like MLIR is the ability to kind of have domain-specific languages where you could write, here's the tiling domain-specific language, and here's the-- For example, Halide directly.

For example, Halide directly, right? And so that's the hope of where we're going here: that Chris's team is going to be putting these kinds of tools in your hands via Swift. Is that a reasonable summary? Well, so one of the bad things about Halide-- so in this space, we have TensorFlow today.

We have XLA. XLA is an important part of TensorFlow right now. It's just not really wired into the Swift part of TensorFlow yet. XLA does all this stuff right now. And XLA is really good because you don't have to hand-tune it, like that whole writing out schedules. XLA does a good job of using the hardware as it is today.

The way we're going further with MLIR is to say: well, instead of you having to put all this blood, sweat, and tears into tuning it, and knowing the hardware, and doing all this stuff, we can do other things, like search. And in fact, there are research systems available now which use genetic algorithms to auto-search for the optimal schedule.

So you're starting to see the ideas that come out of the database query optimizer world coming into the CUDA kernel world. And this is going to be so great for us data scientists. Exactly. Search can be implemented in lots of different ways-- brute force, reinforcement learning, lots of different genetic algorithms.

There's lots of cool things that can be done here. And so what we're going to do over time is crack open the compiler and make the internal algorithms all learned. And I think that's going to be super cool. So this is why you have a compiler guy and a DL guy standing next to each other, right?

Because it-- Oh, well, we like each other, too. Oh, you're OK. Because we're not going to get this kind of great outcome unless people like us are working together with amazing teams like the folks that they have in TensorFlow and XLA and so forth. So next week, come back, and we will dig even deeper.

Thank you very much, Chris Lattner. Thank you, Jeremy. (audience applauds)