
Lesson 13 (2019) - Basics of Swift for Deep Learning


Chapters

0:00
6:35 What's the future of fastai?
13:57 What is Swift? The claims
18:05 Swift for TensorFlow
22:06 S4TF and TensorFlow?
43:47 Compiler optimizations
47:14 Float in Swift


00:00:00.000 | Welcome everybody to lesson 13, also known as lesson six of part two,
00:00:07.160 | also known as lesson one of -- ah, never mind.
00:00:10.320 | The lesson in which we start talking about Swift.
00:00:14.120 | Before we do, I wanted to mention a couple of cool things during the week
00:00:19.000 | because lots of people doing lots of cool things.
00:00:21.440 | And in part two, I haven't done as much highlighting of these things,
00:00:24.400 | but I thought it'd be nice to look at a couple of cool examples from this week.
00:00:29.000 | Big congrats to Rob Gee, who said that 14 months ago,
00:00:33.520 | he'd never done any machine learning or deep learning or Python or Maths
00:00:36.640 | beyond high school, and he just competed in one of the top academic challenges
00:00:42.640 | for computer vision machine learning and came first and second in the two
00:00:47.640 | of the three challenge tracks he entered.
00:00:50.040 | So congrats, Rob, and I thought this is a great example of, like, you know,
00:00:55.360 | the kind of interesting things you can do because if you do an academic challenge,
00:00:58.720 | like this, and if you do well, like Rob, you actually get the opportunity,
00:01:02.320 | as he mentions here, to write a paper.
00:01:05.760 | And so if you've never done an academic paper before,
00:01:07.800 | this is a great way to get an introduction.
00:01:09.480 | You kind of have a certain publishing venue, and you get a bit of an insight
00:01:15.000 | to the academic world, and I certainly found the same thing
00:01:18.040 | that Rob points out here, which is that when you actually submit a paper
00:01:22.120 | for the first time, you suddenly realize why so many academic papers aren't
00:01:25.760 | that useful because they focus on weird things.
00:01:28.760 | So anyway, I thought this is a great result.
00:01:31.080 | Congratulations to Rob.
00:01:32.600 | I also feel like I have a kind of a regular gig
00:01:38.040 | in promoting the amazing work Elena Harley does because she does
00:01:40.680 | so much great work, but this is yet another great piece of work that she's done.
00:01:44.000 | You'll remember from the part one genomic stuff, and this is nice
00:01:48.760 | because it's an example of looking at text data for the purpose of looking
00:01:59.880 | at genomic information, and I just love this example.
00:02:04.120 | And Elena has got a great walkthrough that you can find describing --
00:02:08.920 | I mean, look how familiar this looks.
00:02:10.800 | It's the exact steps that we've just taken, and one of the things I love is she's
00:02:14.440 | actually using whatever this version of fast AI is that we have in our X folder,
00:02:20.560 | so it's not even fast AI version one.
00:02:21.960 | It's the stuff that we've built from scratch, and so it's nice to see that used in practice,
00:02:27.080 | not just used, but not bad for a quickly-thrown-together baseline.
00:02:30.920 | It hits 15th out of 357 teams on this leaderboard, which she describes
00:02:38.400 | as not a bad starting point, so not a bad starting point at all.
00:02:42.400 | So that's very cool.
00:02:45.880 | So rewind, start of lesson eight, we said we're going to try and recreate fast AI
00:02:51.840 | and much of PyTorch from the foundations.
00:02:55.880 | And 26 days ago, Sylvain and I started the same thing for Swift,
00:03:04.920 | except we actually had no ability to cheat because when we started,
00:03:08.920 | we really were starting with the foundations.
00:03:12.880 | There are no Swift data science modules, basically.
00:03:18.080 | But there is stuff, as you'll see, for creating tensors and random number generators,
00:03:25.480 | and you'll actually see we've been able to use Matplotlib,
00:03:28.240 | which might surprise you, we'll show you why and how.
00:03:31.040 | So this is like what we're going to do over the next two lessons, is revisit this.
00:03:36.200 | Now, obviously, we're not going to go through every step in excruciating detail
00:03:39.640 | because as you'll see, the vast majority of it is, hey,
00:03:42.040 | this is almost identical to what we did in Python.
00:03:44.760 | So what we're still going to do is dig in to the bits
00:03:47.960 | that show us something interesting or different, of which there will be many.
00:03:53.520 | But in the end, we're going to get here, which is, this is xresnet,
00:03:59.360 | is the res block from xresnet, and this, believe it or not, is the Swift version.
00:04:05.320 | Like, you almost can't tell it's different.
00:04:08.240 | So we're going to end up in this beautiful situation.
00:04:10.960 | This is the res block.
00:04:12.160 | Here's the xresnet itself.
00:04:14.200 | It's so concise.
00:04:16.760 | It's so familiar.
00:04:18.880 | Hopefully, it's going to make you feel pretty comfortable
00:04:20.640 | that this is where we're going.
00:04:23.200 | And what about how we go and get all the data in there?
00:04:27.040 | We're going to have to deal with all the TensorFlow data APIs
00:04:30.320 | and learn all this stuff as well?
00:04:32.120 | Well, no. Here is the data approach that we're going to use.
00:04:38.480 | Data blocks API.
00:04:39.560 | So we're actually going to show you how to build the data blocks API in Swift
00:04:43.640 | for TensorFlow.
00:04:46.160 | So, you know, three weeks ago, when we started digging into this,
00:04:51.280 | none of us knew if this would be possible.
00:04:53.120 | And I'm really thrilled to find that not only is it possible,
00:04:57.600 | but we end up with code that looks, you know, wonderfully familiar
00:05:02.840 | and has all the nice features that we've hopefully grown to love.
00:05:07.840 | So to get to here, there's a lot of people I want to thank.
00:05:11.640 | In particular, Chris Lattner, who I still don't understand
00:05:17.520 | why he put his trust in us in this way.
00:05:19.880 | It seems like, you know, he has very strange judgment or something.
00:05:25.440 | Yeah. But given that he did, we felt that we had to make sure
00:05:29.840 | we don't screw it up.
00:05:30.640 | So he's been fantastic.
00:05:32.880 | And then the whole Swift for TensorFlow team has actually made this project
00:05:36.040 | their number one priority.
00:05:37.880 | And this totally wouldn't have happened without everybody pulling together.
00:05:43.000 | Also in terms of bad judgment, Sylvain has, you know, obviously,
00:05:48.080 | we all know made the mistake of deciding to spend his time working with me
00:05:53.240 | on fast AI stuff, and a few weeks ago I said, guess what?
00:05:58.000 | We're going to rebuild everything from scratch in Swift.
00:06:01.840 | And rather than running away screaming, he said, okay, when do we start?
00:06:07.400 | So thank you, Sylvain, and he has built nearly all of these notebooks
00:06:11.240 | in three weeks and learned Swift, which is not bad.
00:06:14.320 | Thanks, Alexis, for the value types discussion
00:06:17.000 | you'll see next week, which was super helpful.
00:06:18.840 | Pedro has built something which makes the Swift packaging system
00:06:24.300 | hurt slightly less than it otherwise does, which is a lot.
00:06:28.480 | And also the whole San Francisco fast AI study group who's given us lots
00:06:32.040 | of valuable feedback over the last few weeks.
00:06:36.520 | So what does this mean for fast AI?
00:06:42.000 | There is a blog post we will link to about, which many of you,
00:06:46.600 | I'm sure, have read about why we're doing this crazy hare-brained Swift thing.
00:06:52.920 | But I particularly wanted to mention, like, two things.
00:06:57.560 | The first is we actually don't really have a choice, right?
00:07:01.200 | We used to work with TensorFlow and Keras, and we had to stop
00:07:07.800 | because we couldn't build for you and with you the things
00:07:11.560 | that we wanted to build together.
00:07:13.320 | So luckily PyTorch had just come out, and we actually first started
00:07:17.920 | using PyTorch in this course, I think it was two weeks
00:07:21.720 | after the first pre-release of PyTorch.
00:07:24.280 | So doing ridiculous things ridiculously early is definitely part of our DNA.
00:07:28.920 | We were lucky that happened.
00:07:31.800 | But then when we came to get to the next part one,
00:07:34.640 | because we started using PyTorch in an earlier part two,
00:07:38.040 | just like we're doing for Swift, then we got to the next part one,
00:07:42.040 | and we thought, well, PyTorch is great, we want to use it all the time,
00:07:45.280 | but there's no way we can teach PyTorch to new practitioners.
00:07:50.760 | So we created a whole new library called Fast AI, because we had to.
00:07:56.120 | And now we've hit the same point we were with when we first switched to PyTorch,
00:08:00.160 | which is we can't build with you the stuff we want to build together, right?
00:08:05.960 | We're hitting the boundaries.
00:08:07.080 | We want to create nice, regularized RNNs, for example.
00:08:12.280 | We want to create batch norm layers which skip computing the statistics
00:08:16.200 | from time to time, for example.
00:08:18.600 | We want to create highly flexible on GPU augmentation systems.
00:08:23.720 | And we can do all these things in PyTorch, but they're slow enough
00:08:27.600 | that we kind of tend to choose not to.
00:08:31.040 | So we're hitting the limits.
00:08:33.000 | So we actually need to do something else.
00:08:35.520 | So we need to start building the next thing.
00:08:38.720 | And we were very lucky again that Swift for TensorFlow
00:08:42.200 | appeared at just the right time.
00:08:44.200 | The second thing I mention is that not only will this not hurt Fast AI
00:08:50.400 | for PyTorch, but I'm confident it will make it better.
00:08:53.920 | I find that when I work with new programming languages
00:08:57.280 | and in new environments, I learn new things.
00:09:02.200 | And I become a better developer.
00:09:03.760 | And we're already coming up with ideas that's
00:09:05.680 | going to help us to actually make Fast AI for Python better.
00:09:09.360 | You have a question, Rachel?
00:09:11.320 | Why has Fast AI chosen Swift over Julia?
00:09:13.960 | Well, because it's impractical deep learning for coders,
00:09:21.440 | and Julia is far too mature for such a thing.
00:09:26.400 | I mean, they would both be great choices.
00:09:30.120 | But I mean, a lot of it is just for me personally.
00:09:33.160 | I'm really interested in getting involved in something that
00:09:35.960 | has a huge amount of potential, but a little bit earlier.
00:09:39.480 | So I can help guide it.
00:09:41.080 | I feel like I can be more part of it.
00:09:42.720 | I wanted to create something for our student community
00:09:45.720 | and our practitioner community where we could all kind of help
00:09:49.440 | be part of that guiding and development process.
00:09:53.720 | So that's one reason.
00:09:56.320 | Second reason is it's not just about the language,
00:10:00.040 | but it's about who's using it.
00:10:01.600 | And Julia doesn't quite have that big customer that really
00:10:07.040 | makes sure that it goes from quite good to huge,
00:10:13.080 | whereas Swift for TensorFlow has Google.
00:10:17.280 | And so Google needs it to work.
00:10:20.160 | Google needs it to be hugely successful.
00:10:22.360 | So I feel that that's really good.
00:10:26.120 | I also feel like the stuff that I've talked to Chris Latner
00:10:32.080 | about as to what's next for Swift
00:10:36.800 | goes beyond the stuff I've talked to the Julia core team
00:10:40.320 | about what they're doing.
00:10:41.600 | And so to be clear, I've sat down
00:10:43.040 | with many of the Julia founding team members.
00:10:45.960 | And I think they're amazing.
00:10:47.320 | And I hope I get to spend time hacking stuff
00:10:50.520 | with them too.
00:10:51.680 | But I feel like the place we're heading with Swift
00:10:55.920 | is like another level again, as you'll see.
00:10:59.880 | But I definitely would not say to anybody, don't touch Julia.
00:11:04.320 | I've actually sent invites to the forum
00:11:07.040 | to a large number of the Julia developers at their request
00:11:11.840 | to say, why don't you build fast AI with us for Julia too?
00:11:16.120 | So I hope that happens.
00:11:20.480 | Perhaps another question is, why not Python?
00:11:23.840 | Because -- well, for most people,
00:11:31.240 | the answer to the question of what do we do next
00:11:33.800 | is: given we're using Python, we fix the problems
00:11:39.040 | in certain ways.
00:11:40.200 | So for example, in the TensorFlow world,
00:11:41.900 | they're creating things like tf.function, which
00:11:44.800 | is kind of like allows them to connect up Python
00:11:47.920 | to the whole XLA environment, which you'll learn about.
00:11:51.880 | In PyTorch, they're using JIT to kind of turn Python into C++.
00:11:59.520 | Why aren't we doing this?
00:12:00.680 | Why don't we just do the best with what we have?
00:12:05.400 | Because Python, to be clear, has a fantastic ecosystem.
00:12:08.040 | And we already all know it.
00:12:09.280 | And it kind of seems crazy to throw that all away.
00:12:17.240 | I think you'll see the answer to this question
00:12:20.160 | in the next two weeks.
00:12:21.520 | But basically, it turns out, I'm pretty sure,
00:12:25.920 | that it's easier to pick the right language and compilation
00:12:30.180 | environment and write everything else on top of it from scratch
00:12:35.480 | than it is to write everything else on top of something that
00:12:38.640 | was not written for that in the first place
00:12:40.760 | and then try and madly patch up the thing underneath to make
00:12:44.240 | it work.
00:12:46.280 | So things like the global interpreter lock--
00:12:50.840 | everything we're doing has to be massively parallel.
00:12:53.280 | But Python is written so that no two things can
00:12:57.000 | happen at the same time.
00:12:59.040 | So these are things that are just incredibly,
00:13:02.160 | incredibly hard to work around.
00:13:04.760 | Or else with Swift, as you'll see,
00:13:06.240 | the things we want are just how it's designed.
00:13:10.160 | So why not Python? Because we think that we can't keep going
00:13:17.240 | with Python.
00:13:20.040 | We were not the first people to think the existing languages
00:13:23.680 | are not good enough.
00:13:26.060 | There's actually somebody else who
00:13:27.440 | had that thought a few years ago.
00:13:29.280 | And he's so OCD that he actually decided
00:13:34.360 | to write his own language.
00:13:36.080 | And he's so OCD that before that, he wrote his own compiler.
00:13:39.160 | Because they weren't good enough either.
00:13:41.240 | And so whilst it may be difficult to be around
00:13:43.720 | such an OCD person, we're all very thankful
00:13:45.880 | that these people exist because they create the things we have.
00:13:48.760 | Oh, and here's one now.
00:13:50.480 | Chris, tell us about why you did this crazy thing.
00:13:54.000 | Thanks, Jeremy.
00:13:56.880 | I'm not the only crazy one, as you know.
00:14:00.240 | So let's talk about what Swift is.
00:14:02.640 | And then we'll kind of go very high level.
00:14:04.920 | And then we'll get down in the bits.
00:14:07.240 | So Swift is a lot of different things.
00:14:09.360 | It's a programming language.
00:14:10.600 | If you go ask the marketing people what it is, it says--
00:14:13.080 | they say things like, Swift defines away large classes
00:14:15.560 | of errors in programs.
00:14:17.020 | It talks about how fast it is.
00:14:18.640 | And it's optimized to get the most out of modern hardware.
00:14:21.760 | I think another aspect of it is that Swift is really ambitious.
00:14:24.880 | And that is something that I think
00:14:26.800 | you'll see over the next two lessons, where
00:14:29.960 | Swift isn't just about solving a niche problem.
00:14:33.920 | It was not about, let's make iOS a little bit better.
00:14:36.680 | Swift was conceived as a full stack programming language
00:14:40.000 | that goes all the way from scripting down
00:14:41.920 | to low level systems programming.
00:14:43.480 | So you can write boot loaders in it.
00:14:45.120 | Now, this is crazy.
00:14:46.880 | I'm not aware of any other language
00:14:48.280 | that's actually set out from the start to do that.
00:14:51.000 | But that's one of the reasons why it becomes really interesting
00:14:53.600 | and really good for this domain.
00:14:54.920 | Because you want very high level programming abstractions
00:14:57.440 | and a very nice and easy to use way of working with the language.
00:15:00.920 | But you also want the ability to go down to the metal
00:15:03.000 | in the rare cases you need to.
00:15:05.160 | So there's some random guy on the internet
00:15:06.880 | that wrote a blog post about this.
00:15:08.480 | I highly recommend that.
00:15:10.560 | One of the things that he says is that it's-- the interesting thing
00:15:14.880 | about Swift is that you get expressivity, flexibility,
00:15:19.080 | tight code, you get safety, ease of use, and speed.
00:15:21.920 | And it pulls it together in a unique way.
00:15:24.080 | That's a very insightful quote, Chris.
00:15:25.920 | I'd like to read more of that person's work.
00:15:28.080 | Yeah, he does some good things, too.
00:15:29.760 | He's a little bit crazy himself, too.
00:15:31.880 | So getting out of the marketing stuff,
00:15:34.080 | what is it really Swift about?
00:15:35.840 | Swift is a really young language.
00:15:37.320 | This is something I don't think that people realize.
00:15:39.840 | Jeremy, you like to point out that Python is 25 years old.
00:15:43.240 | Many of the systems we're working on are 30 years old.
00:15:45.680 | Yeah, I think Python might be 30.
00:15:47.080 | JavaScript might be 25.
00:15:49.080 | Java's about 25.
00:15:50.600 | I mean, for me, I've never spent time writing
00:15:55.320 | such a young language before.
00:15:57.080 | I also don't really remember-- that's not quite true.
00:15:59.520 | I guess languages like JavaScript have developed quickly.
00:16:02.320 | But it's really unusual to have such a young language
00:16:05.000 | be so widely used.
00:16:06.560 | So there's definitely a feeling for me often
00:16:08.560 | of feeling like, oh, I'm using a language which lots of people
00:16:11.440 | use, yet somehow it still feels--
00:16:14.840 | not everything's fully baked yet.
00:16:16.240 | Yeah, well, it's kind of interesting
00:16:17.800 | because you have millions of programmers using it.
00:16:19.800 | But on the other hand, it can be changed.
00:16:22.680 | And so that's one of the things that in this project
00:16:24.840 | we'll talk about.
00:16:25.960 | Being able to do language-integrated
00:16:27.840 | autodiff is an example of the kind of thing
00:16:29.600 | that you can only do if, on a reasonable time scale,
00:16:32.360 | you can innovate and make new language features,
00:16:34.240 | get them merged in, and evolve quickly.
00:16:36.800 | And that's one of the things that's
00:16:38.240 | made this whole project possible.
00:16:39.800 | Swift, from its roots, was designed for usability.
00:16:42.360 | And so it was designed for IDEs, and user experience,
00:16:45.720 | and code completion, and inline errors, and things like that.
00:16:48.400 | It has a refactoring engine, a bunch of stuff
00:16:50.640 | that modern languages have.
00:16:52.640 | The other aspect of Swift that I think is really important
00:16:54.880 | is Swift is designed to be not weird.
00:16:58.200 | And so when you look at Swift code
00:17:00.040 | over the course of the lessons, it
00:17:01.720 | will look pretty familiar if you've used JavaScript.
00:17:04.160 | You've used lots of other languages.
00:17:05.520 | You've used Python.
00:17:06.600 | It will look pretty similar in a lot of ways.
00:17:08.480 | And that's by design.
00:17:10.000 | A lot of languages start out trying
00:17:12.280 | to prove a point about something.
00:17:13.640 | And Swift was really designed to be--
00:17:15.920 | there's lots of good ideas in the world.
00:17:17.760 | Let's take them together.
00:17:18.800 | And through a hardcore, intense design process,
00:17:22.200 | actually apply taste and try to come up
00:17:24.480 | with something that is well-considered and fits
00:17:26.720 | well together.
00:17:27.600 | It reminds me of Perl, which Larry Wall
00:17:30.960 | developed and described as a Swiss Army chainsaw.
00:17:34.520 | Swift has got a similar feel of trying
00:17:36.360 | to get the best bits, but it's much more curated and carefully
00:17:42.200 | designed than something like Perl.
00:17:43.600 | So it fits together.
00:17:44.880 | And so as we'll talk about, the whole team that built Swift
00:17:48.560 | originally was the team that built LLVM, and Clang,
00:17:50.640 | and things like this.
00:17:51.520 | And so many languages were designed
00:17:54.880 | from a perspective of, we'll create a programming
00:17:56.960 | language and then figure out how to make it fast later.
00:17:59.800 | A lot of Swift was built from the beginning
00:18:02.000 | to be something that compilers can use, and humans too.
00:18:05.440 | So what is Swift for TensorFlow?
00:18:06.680 | So here, we're here to rethink how deep learning systems work
00:18:10.320 | from the ground up, and where a lot of the systems
00:18:12.720 | today are constrained by being as much as you can
00:18:15.640 | get done in a Python library.
00:18:17.400 | Here, we're changing the language.
00:18:18.840 | We're changing the compiler.
00:18:19.840 | We're changing all the library stacks.
00:18:21.160 | We're changing TensorFlow, which we'll talk about.
00:18:23.160 | There's a tremendous amount of stuff involved in this.
00:18:25.320 | And one of the things that's really cool about this,
00:18:27.440 | and one of the focuses, is that we
00:18:28.920 | want to make an open, hackable platform, where you can go
00:18:32.520 | and change anything you want in it,
00:18:34.560 | and you can experiment, research, explore,
00:18:37.760 | do lots of new kinds of things, because that's
00:18:39.760 | where science comes from.
00:18:41.160 | Oh yeah, caveat.
00:18:45.640 | It's all broken.
00:18:47.560 | Fair?
00:18:48.280 | Yeah, nothing works, which is important.
00:18:50.720 | If you're going to be doing an impractical deep learning
00:18:53.160 | for coders, you wouldn't want to work
00:18:54.640 | with something that works.
00:18:56.160 | So yeah, Swift is very much not just for iOS programming.
00:19:02.640 | It's an incredibly powerful language.
00:19:05.600 | And all these people that are writing iOS applications
00:19:11.280 | can much more quickly become AI experts,
00:19:14.080 | because suddenly, they're working
00:19:16.560 | with the language which is super cool, suddenly.
00:19:20.160 | And they help propel the ecosystem as well.
00:19:22.440 | So the things we'll be talking about across the lessons
00:19:25.000 | are the very big bricks that the Swift for TensorFlow
00:19:28.520 | project is made of.
00:19:29.160 | So part of it is the Tensor API.
00:19:30.520 | We'll touch on that a little bit today.
00:19:32.600 | Python integration is a really big deal.
00:19:34.560 | This is what gives you direct access
00:19:36.120 | to the entire ecosystem that you already know.
00:19:38.280 | Automatic differentiation, hugely important for ML.
00:19:41.200 | And Swift has a really cool, industry-leading approach
00:19:45.400 | to it stolen from Fortran.
00:19:48.280 | Jupyter, you'll see a lot of that today.
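The Python integration in that list is worth a concrete sketch. In the Swift for TensorFlow toolchain of this era you could dynamically import any Python module and call it from Swift, with everything coming back as a `PythonObject`; numpy here is just an illustrative choice, and the exact module names are era-specific:

```swift
import Python  // Python interop module shipped with the S4TF toolchain

// Import a Python library dynamically; `np` is a PythonObject,
// and attribute access and calls are forwarded to Python at runtime.
let np = Python.import("numpy")
let a = np.array([1, 2, 3])
print(a.sum())  // 6
```

This is what gives Swift direct access to the existing Python ecosystem that Chris mentions, without wrappers having to be written per-library.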
00:19:50.680 | So one of the things you'll notice
00:19:52.320 | is that a lot of what you see as a high-level programmer
00:19:55.720 | is very familiar.
00:19:56.800 | And that's by design.
00:19:57.720 | And so this is an example layer built in Swift.
00:20:02.160 | This is an example layer built in PyTorch.
00:20:04.080 | And it looks basically the same.
00:20:05.480 | I mean, there's differences.
00:20:06.440 | And we're working to reduce the difference even more.
00:20:08.400 | We love all those floats.
00:20:09.560 | I mean, some of the differences are very nice,
00:20:11.440 | like the fact that you don't have to pass self all the time.
00:20:15.240 | There are things where you're just like, oh, finally.
00:20:18.120 | So it's actually getting to the point where the boilerplate
00:20:22.400 | in the Python code -- there's more boilerplate of like, oh,
00:20:24.640 | this self.conv, and self.pool, and self.conv --
00:20:27.960 | and Swift gets rid of a lot of that.
00:20:29.080 | Yeah, so we're going to start very deep and very low level.
00:20:32.720 | So I just want to give you a high-level view of what
00:20:34.880 | things look like and where we'll end up.
00:20:37.400 | And so this is a layer.
00:20:38.880 | And so in Swift, this is implemented with a struct.
00:20:40.840 | We'll talk about those a little bit later.
00:20:42.560 | And it says, I'm defining my model.
00:20:44.160 | And it's a layer.
00:20:46.040 | We use layers just like you would normally.
00:20:48.200 | And so you have column 2D, max.pool, flatten.
00:20:52.120 | Things are callable in Swift.
00:20:53.480 | And we use call instead of underbar, underbar call.
00:20:55.720 | You'll see a lot less underbars.
00:20:58.200 | And otherwise, it looks basically the same.
00:21:00.000 | You just compose these things together.
00:21:02.360 | One major difference is this differentiable thing.
00:21:04.760 | And you may be wondering, why do we have differentiable?
00:21:07.320 | Well, this is just telling the compiler
00:21:08.600 | that it should be able to differentiate this.
00:21:10.480 | And one of the cool things about compiler integration
00:21:13.240 | is that when you say, hey, compiler, give me
00:21:15.880 | the gradient of some function, in the happy path,
00:21:19.960 | when everything is good, it just does.
00:21:21.560 | And that's what you'd expect.
00:21:22.880 | But the unhappy path matters as well.
00:21:24.680 | I don't know if anybody here makes mistakes.
00:21:27.400 | I do.
00:21:28.260 | And so one of the cool things about having proper language
00:21:30.720 | support is you can get an error message that says, hey,
00:21:32.840 | that function can't be differentiated.
00:21:34.960 | That's useful.
00:21:36.480 | But you go farther.
00:21:37.480 | You say, oh, it can't be differentiated because the integers
00:21:40.480 | you have in this case can't be differentiated.
00:21:43.920 | And then it says, even farther, well, it's actually--
00:21:46.560 | this is several levels deep in function calls.
00:21:48.520 | And this is exactly the path.
00:21:49.760 | And this is exactly what went wrong.
00:21:51.600 | And it's really cool to get this in your workbook
00:21:53.800 | without even having to run something.
00:21:56.160 | And so this is the kind of--
00:21:57.960 | when you build things for IDEs and you
00:21:59.520 | build things for usability, you get really nice behavior
00:22:04.080 | that the compiler is helping you.
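The happy path Chris describes can be sketched in a few lines. `gradient(at:in:)` is the S4TF entry point from around this era for asking the compiler to differentiate a closure at a point, so treat the exact spelling as era-specific:

```swift
import TensorFlow  // requires the Swift for TensorFlow toolchain

// A function the compiler knows how to differentiate.
@differentiable
func square(_ x: Float) -> Float {
    return x * x
}

// Ask the compiler for d(x^2)/dx at x = 3.
let dydx = gradient(at: 3) { (x: Float) in square(x) }
print(dydx)  // 6.0, since the derivative of x^2 is 2x
```

And the unhappy path is where the language integration pays off: if `square` used an `Int` somewhere, the compiler would point at the exact non-differentiable expression at compile time rather than failing at runtime.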
00:22:06.160 | So what is Swift for TensorFlow?
00:22:07.480 | And how does it stack up?
00:22:08.440 | And how does it relate to TensorFlow?
00:22:10.440 | So TensorFlow, one way to think about classic TensorFlow
00:22:13.280 | is that you have a tremendous amount of infrastructure.
00:22:15.880 | TensorFlow has this really mature distribution,
00:22:19.840 | scale out, end-to-end training, inference
00:22:23.720 | with mobile devices, all this cool stuff in the ecosystem.
00:22:27.400 | And then it has this thing called Python.
00:22:29.260 | I call it Python for TF.
00:22:32.000 | And Python for TF includes its auto-diff system.
00:22:34.840 | And then it has a bunch of APIs like Keras and Estimator
00:22:37.580 | and other things built on top.
00:22:38.920 | And so what we've done is we've built a parallel stack
00:22:40.960 | where we're using a lot of the same infrastructure
00:22:43.320 | underneath the covers.
00:22:44.560 | And then we're building a new fast AI framework on top.
00:22:47.120 | Now, one of the things we'll talk about more later
00:22:49.000 | is that TensorFlow itself is undergoing radical change
00:22:52.120 | in the internals.
00:22:52.920 | And one example of this is the XLA compiler.
00:22:55.800 | And one of the things you'll find out
00:22:57.240 | is that TensorFlow is compilerizing itself
00:23:00.760 | as new accelerators and new technologies.
00:23:03.200 | And lots of things are coming into play.
00:23:05.760 | And so TensorFlow is the internals
00:23:07.880 | are undergoing major changes, which is super exciting.
00:23:11.880 | So let's dive in to some code.
00:23:14.540 | Yeah.
00:23:16.400 | What's the roadmap relationship between Swift for TensorFlow
00:23:20.120 | and mainstream Swift?
00:23:21.800 | Will they eventually be the same thing?
00:23:23.760 | Yeah, that's a great question.
00:23:24.840 | So right now, the Swift for TensorFlow project,
00:23:26.640 | you can think of it as like a dev branch.
00:23:28.480 | Actually, it is literally a dev branch on the Swift project.
00:23:31.760 | And so what we do is we build things
00:23:33.320 | like automatic differentiation.
00:23:34.880 | We bake them, we get experience.
00:23:36.520 | And then we propose them and merge them back
00:23:38.140 | in the mainline Swift language.
00:23:39.840 | And a bunch of stuff has already been done that way.
00:23:42.280 | So the Python integration drove new language features in Swift.
00:23:47.120 | We propose them.
00:23:48.040 | We got them integrated into the mainline compiler.
00:23:49.920 | And now they're shipping.
00:23:50.920 | And iOS developers can use cool things because of Swift
00:23:54.440 | for TensorFlow.
00:23:57.840 | So let's dive in.
00:23:59.000 | Now, I thought I would start with some really basic things
00:24:03.240 | to just introduce the language, just
00:24:04.780 | so you understand you have some context of how
00:24:06.360 | things work.
00:24:07.240 | And then we'll get into some more interesting stuff.
00:24:09.440 | So this is a Jupyter Notebook, just like you'd expect.
00:24:11.920 | This is Jeremy's Windows machine, which I will struggle to use.
00:24:15.680 | Because I have not used Windows for a long time.
00:24:19.320 | It will be fine.
00:24:20.120 | It has a Mac keyboard on it, so it's super weird.
00:24:22.640 | It scrolls the wrong direction, but it will be great.
00:24:25.280 | So a lot of what Swift does looks very familiar to Python.
00:24:31.000 | So here I have some integers, six.
00:24:33.960 | I have some math.
00:24:35.000 | I have print.
00:24:35.800 | It all works exactly the same as you'd expect in Python.
00:24:38.440 | One of the major differences in Swift
00:24:39.940 | is that you have this let and this var thing.
00:24:42.380 | And so let in Swift means constant.
00:24:45.160 | var means variable.
00:24:46.560 | And so it's super easy.
00:24:47.800 | And as Jeremy, I think, loves to say, in a workbook,
00:24:49.840 | just declare everything var, and then you
00:24:51.320 | don't have to worry about it.
00:24:52.560 | And then you can change it however much you want.
00:24:55.400 | But you can see, if you declare a constant like pi--
00:24:58.840 | because pi should not change, it makes sense to use let.
00:25:03.740 | If you try to change it, change a constant,
00:25:06.240 | you'll get this error message.
00:25:07.400 | And this error message says something like cell 5, line 1.
00:25:11.080 | One of the cool things about Jupyter
00:25:12.500 | is if you hit Shift-L, you get line numbers in here.
00:25:15.640 | And here you can see it's referring to cell 5, line 1.
00:25:19.000 | And it says, hey, if you go to cell 3, line 2, up here,
00:25:22.400 | you can change this let into a var.
00:25:24.680 | And now your code will work.
00:25:26.000 | And so it's trying to help you out here.
00:25:28.160 | So that's something that you'll see.
00:25:29.540 | That's super awesome.
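
The let/var behavior described above can be sketched like this (the variable names are illustrative):

```swift
// `var` declares a mutable variable; `let` declares a constant.
var batchSize = 6
batchSize += 1            // fine: a var can be reassigned

let pi = 3.14159
// pi += 1               // compile-time error: cannot assign to a `let` constant
print(batchSize, pi)
```

Uncommenting the last assignment reproduces the kind of error, and fix-it suggestion, shown in the lesson.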
00:25:30.920 | And I'll just mention, for people watching that
00:25:36.600 | have a background in Swift programming,
00:25:38.480 | there's a tendency in that culture
00:25:40.440 | to keep things closed, keep things constant,
00:25:43.960 | keep things private.
00:25:45.040 | And there's lots of good reasons for that.
00:25:47.280 | But when you're getting into this deep learning mode,
00:25:50.800 | we generally recommend flipping everything upside down,
00:25:53.320 | at least for the R&D and prototyping process.
00:25:55.960 | Because you want things to be infinitely hackable,
00:25:58.880 | the way Chris describes, you want them to be vars.
00:26:01.240 | So you can change them.
00:26:02.200 | You want to be public so that you can see inside them.
00:26:04.760 | And so you'll actually find there's
00:26:06.280 | been recent PRs to Swift for TensorFlow itself, where
00:26:11.720 | we're starting to change APIs in this way to make it
00:26:16.160 | so that people can look inside the covers
00:26:18.880 | and change things more.
00:26:20.640 | So you may notice that we're not using a lot of types here.
00:26:23.280 | But Swift is a typed language.
00:26:24.600 | Types are actually a very important way
00:26:26.840 | that the compiler can help you, because you
00:26:28.600 | can detect errors early.
00:26:29.960 | What Swift does is it has a thing called type inference.
00:26:32.160 | And so when you say var x equals 4, it knows that 4 is an int.
00:26:36.880 | And so it will default it to an int.
00:26:38.400 | Or if you say var x is equal to 1.0,
00:26:40.880 | it will say that, oh, that's a float or a double.
00:26:43.680 | And so types in Swift are written with a colon here.
00:26:45.880 | And so you can say, OK, well, even though this would normally
00:26:47.840 | be an integer, actually make it a float.
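
A small sketch of the type inference and annotation rules just described (names are made up for illustration):

```swift
let count = 4            // the literal 4 is inferred as Int
let rate = 1.0           // floating-point literals default to Double
let weight: Float = 4    // a colon annotation overrides the default
print(type(of: count), type(of: rate), type(of: weight))
```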
00:26:50.800 | Swift supports important things like emoji.
00:26:53.040 | Emoji is totally better than Greek letters, Jeremy.
00:26:55.280 | Yeah, so actually Chris asked me last week.
00:26:58.080 | He goes, Jeremy, yes, Chris, what?
00:27:00.800 | He's like, how do you feel about emoji
00:27:04.480 | in the notebooks in Swift code?
00:27:06.280 | And I literally said to him, Chris, they're fine,
00:27:09.320 | as long as it's the pile of poo emoji and it's next to a cow.
00:27:13.120 | And Chris goes, OK, it's the pile of poo emoji
00:27:15.520 | but it's next to a dog.
00:27:16.480 | Is that OK?
00:27:17.160 | So yeah, OK.
00:27:18.280 | We split the difference.
00:27:20.520 | So this is great power and great responsibility.
00:27:23.840 | If you name all of your variables pile of poo,
00:27:26.040 | then your code is--
00:27:28.040 | never mind.
00:27:28.880 | Syntactically fine.
00:27:30.720 | Yes, yes, yes, descriptive.
00:27:32.440 | Maybe.
00:27:34.640 | So let's talk about a few other things.
00:27:36.240 | So Python uses indentation.
00:27:40.160 | Swift uses curly braces.
00:27:41.680 | So I don't think that there's any--
00:27:44.000 | I'm not going to say one's better than the other.
00:27:46.120 | Curly braces are more commonly used,
00:27:48.440 | and so that's why Swift uses them.
00:27:50.160 | But they're basically the same thing.
00:27:52.560 | Just you'll figure it out.
00:27:55.400 | How do functions work?
00:27:56.280 | Well, in Python you use def, and in Swift you use func.
00:28:00.360 | Because it's a function.
00:28:01.720 | And so what this is, is this is defining a function
00:28:04.240 | and you declare the inputs, the signature, x and y,
00:28:06.960 | and it returns a float, and you implement it
00:28:08.640 | with the thing you'd expect.
00:28:10.600 | When you call it, you pass keyword arguments.
00:28:13.560 | Swift is very opinionated about keyword arguments.
00:28:15.720 | And so if you say that this thing has x and y as arguments,
00:28:18.760 | you have to pass x and y.
00:28:20.240 | And so one of the funny things you'll see
00:28:21.880 | is you'll see this underbar thing going on right here.
00:28:24.400 | And this is saying that underbar means ignore,
00:28:27.640 | just like in Python: ignore the keyword argument.
00:28:29.960 | And so when you call it, you don't pass it.
00:28:31.920 | That's all that means.
00:28:32.840 | I've got to say, I love almost everything about Swift,
00:28:36.840 | except for three things.
00:28:38.120 | This is one of the three things.
00:28:39.600 | So this bit I find awkward, because these positional
00:28:44.520 | parameters, you can't even change the order of them.
00:28:46.960 | Even though they're positional, you
00:28:48.460 | can't use them in a different order.
00:28:50.240 | If you do have that underscore to say you don't have to name it,
00:28:54.320 | then you're not allowed to name it.
00:28:55.960 | Like it's-- I don't know.
00:28:58.280 | I find this bit nasty, but it's almost everything else
00:29:02.040 | I love about Swift, so I just put up with it.
00:29:04.600 | This is also not my opinion of the right thing.
00:29:08.000 | But the argument for that is consistency of APIs
00:29:10.680 | is important, and it works fine.
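
A sketch of the argument-label rules being discussed; `distance` and `doubled` are hypothetical functions, not from the course notebooks:

```swift
// By default, labels are required at the call site, in declaration order.
func distance(x: Float, y: Float) -> Float {
    (x * x + y * y).squareRoot()
}
let d = distance(x: 3, y: 4)    // must write x: and y:

// An underscore suppresses the label, and then you are not allowed to write it.
func doubled(_ value: Float) -> Float { value * 2 }
let e = doubled(5)              // doubled(value: 5) would be an error
```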
00:29:14.800 | So tuples work pretty much the same way as they do in Python.
00:29:17.240 | You can use them to return multiple values, which
00:29:19.960 | is what we're doing here.
00:29:20.920 | So here we're turning two floats,
00:29:22.440 | and we're saying the first one is sine,
00:29:24.200 | and the second one's cosine.
00:29:25.600 | You get destructuring.
00:29:26.560 | You get access to the tuples, all the nice things
00:29:29.480 | that you'd expect.
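
A sketch of tuple returns and destructuring; `minMax` here is an invented example standing in for the sine/cosine function in the notebook:

```swift
// A tuple lets one function return multiple named values.
func minMax(_ values: [Int]) -> (min: Int, max: Int) {
    var lo = values[0], hi = values[0]
    for v in values {
        if v < lo { lo = v }
        if v > hi { hi = v }
    }
    return (min: lo, max: hi)
}

let result = minMax([3, 1, 4, 1, 5])
let (lo, hi) = result            // destructuring into two bindings
print(result.min, lo, hi)        // access by name or by destructured binding
```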
00:29:30.560 | One of the things that's different about Swift
00:29:32.360 | in Python-- Swift has this thing called struct.
00:29:34.560 | And the way to think about it to begin with
00:29:37.520 | is it's just like a class in Python.
00:29:40.520 | Structs are super powerful, though.
00:29:42.520 | They are more efficient.
00:29:43.700 | There's a lot of good things about them.
00:29:45.240 | They don't require memory allocation.
00:29:47.080 | And we'll talk about why that matters.
00:29:48.600 | If you've got a C programming background,
00:29:50.840 | it's not much like that at all.
00:29:52.440 | So I would say, like, think of it more like a Python class
00:29:55.000 | than a C struct.
00:29:56.080 | Yeah, exactly.
00:29:56.760 | And we'll show you a little bit about that.
00:29:58.640 | So here I have a complex F struct,
00:30:01.600 | and I've got a real and imaginary.
00:30:03.120 | I stick in there.
00:30:03.760 | I can create one of these complex Fs
00:30:05.600 | by specifying these things.
00:30:06.640 | I print it out, and I get it back.
00:30:08.320 | And in Python, there's this thing called data class.
00:30:11.040 | Yeah, so we've used data class.
00:30:12.680 | And it's interesting.
00:30:13.640 | When I threw in a data class here,
00:30:15.640 | it looks almost exactly the same.
00:30:18.780 | There's some extra boilerplate we need in the Python version.
00:30:21.940 | For example, we have to put the two things on different lines.
00:30:24.520 | We can't put them on the same line.
00:30:27.000 | But overall, like a lot of things between Swift and Python,
00:30:30.560 | it ends up looking extremely comfortable.
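
A minimal version of the ComplexF struct being discussed:

```swift
// A struct with two stored properties; Swift synthesizes
// the memberwise initializer for free.
struct ComplexF {
    var real: Float
    var imag: Float
}

let c = ComplexF(real: 1.0, imag: 2.0)
print(c)    // ComplexF(real: 1.0, imag: 2.0)
```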
00:30:33.960 | So now, one of the bad things about this thing
00:30:35.560 | is you notice it's defined with floats.
00:30:37.120 | But complex numbers work with integers, as well.
00:30:39.120 | And they work with doubles and lots of other things.
00:30:41.200 | So the way that Swift handles this
00:30:42.560 | is this thing called generics.
00:30:44.040 | And we'll talk more about the details of generics later.
00:30:46.600 | But basically, what we can do is we can say,
00:30:49.000 | let's define a complex type.
00:30:51.240 | And complex works with any type T and anything that is signed.
00:30:56.560 | And that's a number.
00:30:57.640 | And that's what the signed numeric thing says.
00:30:59.800 | And so now, what I can do is I can define the struct.
00:31:02.760 | And I can use it down here with integers and with floating
00:31:05.760 | point.
00:31:06.480 | And it just figures out that T is int or T is float,
00:31:09.120 | depending on what you use it with.
00:31:10.540 | And this is something that Python can't do, right?
00:31:13.600 | So with Python, if we remove the data class,
00:31:16.400 | we could certainly then remove the float.
00:31:18.120 | And then we could have it untyped.
00:31:19.600 | But we can't say in Python, these two
00:31:21.800 | have to be of the same type.
00:31:23.600 | But I don't know what type it is yet.
00:31:25.440 | So this ability to use generics lets
00:31:27.320 | us do some pretty powerful stuff right out of the box.
00:31:29.640 | Yeah.
00:31:29.960 | And we'll talk about some of the really cool things
00:31:31.560 | that you can do that make it super flexible and super--
00:31:35.280 | I mean, there's some really powerful things you can do.
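
A sketch of the generic version described above:

```swift
// One definition works for any signed numeric element type T.
struct Complex<T: SignedNumeric> {
    var real: T
    var imag: T
}

let ints = Complex(real: 1, imag: 2)          // T inferred as Int
let doubles = Complex(real: 1.5, imag: 2.5)   // T inferred as Double
print(type(of: ints), type(of: doubles))
```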
00:31:37.920 | So we've got our complex here.
00:31:39.440 | One of the things you can see we're doing
00:31:41.200 | is that just like in Python, you have computed properties
00:31:43.600 | and stored properties.
00:31:44.680 | And here we have a computed property.
00:31:46.480 | We can define a getter just in line.
00:31:48.560 | And so it's just a stored property.
00:31:49.920 | But you provide a body.
00:31:51.120 | It's quite simple.
00:31:52.440 | Here's a computed property doing a weird thing.
00:31:54.840 | But here I just have a computed getter and a setter.
00:31:57.440 | And it's pretty straightforward.
00:31:58.960 | This is very similar to C#.
00:32:01.200 | When you've got one of these, you
00:32:02.560 | can create some of these things.
00:32:04.280 | You can use the computed property.
00:32:05.800 | And it works just like a normal property.
00:32:08.160 | It's all very simple.
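
A sketch of stored versus computed properties; the `conjugate` and `negatedImag` members are invented for illustration, not the exact properties in the notebook:

```swift
struct Complex<T: SignedNumeric> {
    var real: T    // stored property
    var imag: T    // stored property

    // Computed property with just a getter.
    var conjugate: Complex { Complex(real: real, imag: -imag) }

    // Computed property with both a getter and a setter.
    var negatedImag: T {
        get { -imag }
        set { imag = -newValue }
    }
}

var z = Complex(real: 1, imag: 2)
z.negatedImag = 5          // runs the setter, so z.imag becomes -5
print(z.conjugate.imag)    // uses the getters: prints 5
```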
00:32:09.920 | Now, one of the cool things about Swift
00:32:12.240 | is that after you define a type, you can add methods to it.
00:32:15.320 | And this is something that might make you feel weird.
00:32:18.800 | But you can add methods on anybody's type.
00:32:21.240 | You can add it on your own, like we're doing here.
00:32:23.240 | Or you can add it on standard library types.
00:32:24.800 | Or you can add on anybody's--
00:32:26.120 | I mean, it doesn't make me feel weird, Chris.
00:32:28.880 | Because we do it in fast AI all the time.
00:32:31.080 | It's called monkey patching.
00:32:32.880 | But it's kind of something that we're told to avoid.
00:32:35.040 | Because monkey patching has weird, dangerous, undefined,
00:32:38.800 | strange behavior, and things combine in weird ways.
00:32:41.120 | We get conflicts.
00:32:41.960 | So is this monkey patching?
00:32:43.560 | Should we be avoiding this in Swift?
00:32:45.160 | So this works in a very highly principled way
00:32:47.360 | that actually composes.
00:32:48.840 | And if you get a conflict, the compiler
00:32:51.040 | will ask you which one you want.
00:32:53.200 | This is not something you should feel bad about.
00:32:55.160 | Now, here, I'm defining an add method.
00:32:56.800 | And so I'm using this to add two complex numbers.
00:32:59.920 | I feel bad about this because there's a way to spell add.
00:33:03.040 | And yes, it's ADD, I guess.
00:33:05.360 | But I would rather spell with plus.
00:33:08.320 | And so you can call a method on this,
00:33:10.240 | just like any other method.
00:33:11.400 | But if you want to add an operator, what you do
00:33:13.760 | is you just define func plus.
00:33:16.520 | And so instead of underbar underbar add, __add__, and all that jazz,
00:33:19.480 | you just define the operators you want and spell them
00:33:21.680 | the way you expect.
00:33:22.640 | And they're just functions like anything else.
00:33:24.560 | And this already is getting at something
00:33:26.880 | that would be really nice to be able to do in Python,
00:33:29.440 | would be able to say, oh, there's
00:33:30.840 | a whole bunch of different functions or operators
00:33:34.400 | with the same name.
00:33:35.800 | And they behave differently depending on what type
00:33:38.880 | I pass to them.
00:33:40.400 | Now, Python does have a standard library decorator
00:33:44.880 | you can use called single dispatch.
00:33:46.760 | We almost never use it because like every time
00:33:48.640 | we've tried to use it, it reacts in weird ways
00:33:51.520 | with everything else.
00:33:52.680 | But it's super nice that in Swift,
00:33:55.480 | as in many typed languages like this,
00:33:58.640 | it's very much designed for us to be able to say like,
00:34:01.520 | oh, here's lots of different types.
00:34:03.160 | And they all have different meanings of what, for example,
00:34:06.600 | plus means, and it just works.
00:34:08.160 | And so here we're implementing plus on complex in terms
00:34:10.880 | of plus of its elements.
00:34:12.120 | And so we're just adding together the real and imaginary,
00:34:14.400 | and these are different pluses.
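
A sketch of adding a method and the `+` operator after the fact via an extension, following the pattern described here:

```swift
struct Complex<T: SignedNumeric> {
    var real: T
    var imag: T
}

// Methods and operators can be added to an existing type with an extension.
extension Complex {
    func add(_ other: Complex) -> Complex {
        Complex(real: real + other.real, imag: imag + other.imag)
    }
    // `+` is just a function; no __add__ dunder spelling needed.
    static func + (lhs: Complex, rhs: Complex) -> Complex {
        lhs.add(rhs)
    }
}

let sum = Complex(real: 1, imag: 2) + Complex(real: 3, imag: 4)
print(sum)    // real: 4, imag: 6
```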
00:34:17.040 | One of the mind-blowing things that's
00:34:19.880 | very different than Python is that you
00:34:21.600 | can define your own operators.
00:34:23.360 | And so some of us do lots of math, us not including me.
00:34:26.760 | But some of you all do a lot of math.
00:34:29.880 | Or you're working in a domain where
00:34:31.560 | you're doing quaternions or other cool things like that,
00:34:33.680 | and it's really nice to be able to use operators that
00:34:35.400 | are familiar to your domain.
00:34:36.800 | And so if you want to find a square root operator,
00:34:38.440 | then you can define a square root operator.
00:34:40.200 | And these just work, and now you can use a square root operator
00:34:42.880 | just like anything else.
00:34:44.200 | And this is one of the examples of Swift being hackable.
00:34:48.480 | Like, there's a standard library that
00:34:50.720 | has a bunch of stuff built in and provided in the box.
00:34:54.320 | But the stuff the standard library does, you can do too.
00:34:57.600 | And so we try very hard to not make the standard library
00:35:00.320 | be privileged.
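
A sketch of the square-root operator example:

```swift
// Declare a brand-new prefix operator, then implement it
// as an ordinary function.
prefix operator √
prefix func √ (value: Double) -> Double {
    value.squareRoot()
}

let r = √9.0
print(r)    // 3.0
```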
00:35:01.680 | So that's like the super quick introduction
00:35:04.000 | to some random stuff in Swift.
00:35:05.680 | There's this guided tour here, which is really cool.
00:35:08.040 | It goes into other random stuff.
00:35:09.480 | And so if you want just a high-level introduction
00:35:11.360 | like this, you can go there.
00:35:13.200 | But let's dive into some more relevant pieces.
00:35:15.280 | First, we have two questions.
00:35:17.440 | The first is, does Swift support any debugger within Jupyter,
00:35:20.240 | similar to IPDB for Python to set breakpoints?
00:35:23.840 | So we don't have that yet.
00:35:25.640 | We have all the mechanics under the covers.
00:35:28.120 | So Jupyter is actually talking to a debugger.
00:35:30.240 | We just haven't wired it up yet.
00:35:31.600 | But that's one of the things we're interested in.
00:35:33.200 | OK, so that's probably coming--
00:35:34.800 | I can't promise that.
00:35:36.160 | But the guy in the front row that built it all is smiling.
00:35:39.120 | So maybe.
00:35:41.320 | And does Swift have something similar to Python's *args
00:35:44.200 | and **kwargs?
00:35:47.840 | In fact, we'll talk about that when
00:35:49.240 | we get to the Python section.
00:35:50.200 | Great, thank you.
00:35:51.520 | So it works a little bit differently.
00:35:54.640 | So let's talk about Python now, because we love Python, right?
00:35:57.560 | Well, Swift loves Python too.
00:35:58.920 | And as Jeremy healthily pointed out,
00:36:01.680 | Swift's data science ecosystem is kind of pathetic.
00:36:04.440 | So Python is really important.
00:36:07.000 | And beyond the data science ecosystem
00:36:08.800 | and Swift being pathetic, you all know Python.
00:36:11.380 | And so you all know important APIs
00:36:13.880 | that are pervasively available.
00:36:15.320 | And there's no reason for you to relearn new APIs.
00:36:17.880 | If you know the APIs in Python, just use them.
00:36:20.120 | So let's talk about how that works,
00:36:21.580 | because I think it might blow your mind a little bit.
00:36:23.720 | So to use Python and Swift, you first import Python.
00:36:27.200 | This is just a library in Swift called Python.
00:36:30.500 | And then you use Python to import whatever Python libraries
00:36:33.440 | you want.
00:36:34.320 | And there's no wrappers.
00:36:35.580 | There's no build steps.
00:36:36.800 | There's no wrapper generator thingies.
00:36:40.480 | You just literally import NumPy, or here we're
00:36:42.360 | importing Matplotlib.
00:36:44.920 | What does this give you?
00:36:45.920 | This gives you NP.
00:36:46.720 | This gives you PLT, just like you would do in Python.
00:36:49.640 | And now you use it just like in Python.
00:36:51.520 | And so here I'm calling NP array.
00:36:53.780 | And this is-- except for the let,
00:36:55.920 | this is literally what you write in Python.
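
A sketch of the interop being demonstrated. This needs a Python-enabled toolchain (the `Python` module ships with Swift for TensorFlow; the same API is available today as the PythonKit package), plus NumPy installed:

```swift
import Python   // provided by Swift for TensorFlow / PythonKit

let np = Python.import("numpy")   // no wrappers, no build step
let arr = np.array([1, 2, 3])
print(arr * 2)                    // elementwise, executed by the Python interpreter
print(type(of: arr))              // PythonObject: the one Swift type for all Python values
```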
00:36:59.400 | So we can now use this to do some cool stuff.
00:37:01.840 | And so here, actually, we're going
00:37:04.200 | to use this load MNIST function.
00:37:06.280 | And we'll see it a little bit later.
00:37:07.740 | It's in the 00 notebook.
00:37:09.400 | This is firing up TensorFlow and loading the MNIST data set
00:37:12.560 | and plopping it into a tensor for us.
00:37:14.800 | Once that comes back, now we can use Matplotlib.
00:37:17.080 | And Matplotlib, we can use this magic incantation,
00:37:20.480 | this kind of like the Matplotlib inline.
00:37:23.080 | We can then load, take the tensor that TensorFlow gave us,
00:37:26.720 | do all the NumPy stuff with a NumPy ND array,
00:37:31.240 | reshape it, and plot it out.
00:37:33.160 | And this all just works the way you would normally
00:37:35.720 | use Matplotlib.
00:37:36.680 | And the cool thing about this is that Swift knows nothing
00:37:39.680 | about Matplotlib, knows nothing about NumPy,
00:37:42.320 | knows nothing about any of the particular libraries
00:37:45.040 | you're using.
00:37:46.440 | We literally just imported some random thing with strings.
00:37:51.040 | And Swift doesn't know anything about what you imported here.
00:37:55.560 | And so you may be wondering how this works,
00:37:57.440 | because we're just using Python right from Swift
00:38:00.520 | and how does Swift know what Python is.
00:38:02.320 | Well, the way to think about this
00:38:03.960 | is that we think about Python as though it has no types.
00:38:06.880 | But really, Python has one type.
00:38:09.200 | And that type is Python object.
00:38:11.140 | And Python has an internal representation of objects.
00:38:13.320 | And you can use dot on them.
00:38:14.640 | And you can call them.
00:38:16.040 | And so the way it works in Swift is
00:38:17.560 | that you have one type called Python object.
00:38:20.240 | So here, when we use the type of, that's
00:38:22.760 | just like type in Python, says give me the type of np,
00:38:26.600 | or give me the type of np.array, or give me a type of the array
00:38:30.240 | that we got, or whatever.
00:38:33.600 | What it actually shows you is the type is Python object.
00:38:36.640 | And so Python values are Python object types in Swift.
00:38:41.000 | And when you dot them, when you call them,
00:38:42.940 | it's just using the interpreter.
00:38:44.520 | And so it just works in Swift, because you are literally
00:38:48.600 | using Python as dynamically as Python is designed to be used.
00:38:52.920 | And you can actually go and look.
00:38:54.200 | And one of the totally brain-twisting things
00:38:56.880 | that you can do is you can import Python into Swift,
00:39:00.880 | import FastAI's Python libraries into Swift,
00:39:04.480 | and now go to town and just use all the standard FastAI
00:39:09.280 | cool stuff right from Swift.
00:39:11.560 | And it all just works.
00:39:14.000 | So thank you to Omar SF for trying this.
00:39:16.400 | It's a crazy thing to try.
00:39:17.960 | And it's interesting how when you look at the resulting code,
00:39:22.080 | it's the same code that we-- like at this point,
00:39:25.600 | you can't tell other than some slightly different syntax here
00:39:31.920 | and there.
00:39:32.420 | But it's all the same.
00:39:34.000 | It's like Python with let and var.
00:39:35.880 | One thing I'll say about this is this is a super cool feature
00:39:42.280 | that you should totally use to fill in the gaps that
00:39:44.840 | need to be filled in while this ecosystem doesn't exist.
00:39:47.920 | But then as soon as possible, fill in the gap.
00:39:51.560 | Because I don't want us, as a Swift for TensorFlow community,
00:39:56.240 | to use this as such a crutch that we never
00:39:58.880 | write our own even better DataFrames library,
00:40:01.160 | because we're always using pandas.
00:40:02.800 | And we always use Matplotlib, so we never
00:40:05.000 | create an even better plotting library.
00:40:07.720 | We should use the crutch to allow
00:40:09.840 | us to get all our work done end to end,
00:40:11.640 | and then gradually replace it with bits that are more swifty.
00:40:14.720 | I mean, one of the awesome things about Swift
00:40:16.760 | is that it supports really well-considered and beautiful
00:40:20.520 | APIs.
00:40:21.200 | And it was really designed for building APIs.
00:40:24.680 | But particularly when you're new to Swift,
00:40:26.800 | don't worry about that stuff.
00:40:28.080 | That's a problem for a different day.
00:40:29.480 | If you want to open a file, open a file the way you know how.
00:40:32.200 | Just use the Python file IO library.
00:40:34.400 | That's fine.
00:40:35.200 | Don't waste your brain cycles on that kind of stuff.
00:40:38.080 | So let's now talk about the idea of Jeremy's course here,
00:40:42.120 | which is building a machine learning library from scratch.
00:40:44.480 | And I think it's very quaint that Jeremy tried so hard
00:40:49.640 | to go down to the foundation and teach you
00:40:52.320 | how to build a Matmul with a few loops, and looping over an array,
00:40:58.000 | and adding and multiplying floating point numbers.
00:41:00.280 | And I think that it's very nice how
00:41:02.960 | that he thinks that this is going down to the foundation.
00:41:05.680 | Oh, Chris, it's Matmul from scratch.
00:41:08.320 | Well, so if that's Matmul from scratch,
00:41:09.840 | then I think we should go down to the bedrock
00:41:12.960 | and actually talk about where float and arrays come from.
00:41:16.000 | But before we do that, I want to dive in and geek out
00:41:18.200 | just a little bit about compilers,
00:41:19.960 | because I think you need to understand,
00:41:21.760 | or it's useful to understand, what LLVM and Swift and things
00:41:25.360 | like this are.
00:41:26.240 | So, Chris, what you're saying is that I cheated.
00:41:28.480 | I used array without implementing an array.
00:41:30.280 | Exactly.
00:41:30.760 | And I used float without implementing float.
00:41:33.160 | So let's fix that.
00:41:34.480 | OK, I'm sorry.
00:41:35.840 | So what is a-- yeah, so what is a compiler?
00:41:41.960 | Actually, we can do-- oh, look at you.
00:41:43.960 | Touchscreens.
00:41:44.640 | Wow, crazy.
00:41:45.840 | OK, so what is a compiler anyways?
00:41:48.240 | And what is a language?
00:41:49.440 | So the way I think about this is that there's actually
00:41:52.200 | two unmovable obstacles in the universe.
00:41:54.360 | There's humans, which we're all kind of a pain to work with,
00:41:57.480 | right?
00:41:58.200 | Highly opinionated sometimes.
00:42:00.000 | And then there's computers.
00:42:01.160 | And they're really a pain, because they
00:42:02.960 | are super opinionated.
00:42:04.480 | And so what languages are is they're a point in between.
00:42:07.320 | And different languages are different points in between.
00:42:09.800 | And some are optimized for working with humans better.
00:42:12.560 | Some are optimized for working with computers better.
00:42:14.920 | But good languages work well with both.
00:42:18.280 | And so how do compilers work?
00:42:19.880 | Well, the way that it used to work in the bad old days
00:42:22.160 | is that if somebody wanted to build a compiler for x86,
00:42:24.600 | they would build a parser, the front end part of a compiler.
00:42:27.760 | They'd then build an optimizer and make the code go fast.
00:42:30.120 | And they'd build a code generator for the Intel PC
00:42:32.800 | or whatever it is that they want to target.
00:42:34.840 | Somebody else would come along and say,
00:42:35.880 | hey, I want a different compiler.
00:42:36.880 | I want a C++ compiler.
00:42:38.200 | And they would build a parser.
00:42:39.520 | They would build an optimizer.
00:42:40.640 | And they'd build a back end for PowerPC.
00:42:43.160 | Somebody else would say, hey, APL is really cool.
00:42:45.040 | Let's build a parser for APL, an optimizer for APL,
00:42:48.080 | and then a back end for ARM.
00:42:50.280 | And if you've noticed the trend here,
00:42:52.080 | there's a lot of re-implementation of all the things going on.
00:42:54.880 | And so what compilers have done is
00:42:56.440 | they've created funnel points.
00:42:58.720 | LLVM is one of these funnel points
00:43:00.720 | where you can make it so that lots of different language
00:43:04.280 | people can implement lots of different front ends.
00:43:06.520 | And lots of different hardware people
00:43:08.020 | can implement what's called the back end or the code generator.
00:43:10.680 | And now they can share a tremendous amount of code.
00:43:12.680 | And now you can get all the permutations
00:43:14.400 | that you want out of this.
00:43:15.400 | And we should all thank Chris Lattner's master's thesis
00:43:18.640 | supervisor for forcing him to write his damn thesis
00:43:21.400 | and getting him to actually write LLVM version 1
00:43:24.120 | in one and a half weeks of Diet Coke-fueled coding activity.
00:43:29.200 | This is the way we get things done,
00:43:30.820 | is give people a ridiculous deadline and it happens.
00:43:34.800 | And so the details of what LLVM is is not really important.
00:43:37.800 | But this LLVM is what powers Julia and Rust and Swift
00:43:41.680 | and Clang that does C and C++.
00:43:44.120 | It's like a very common set of infrastructure
00:43:46.400 | that lots of things use these days.
00:43:48.080 | And if you're not very familiar with compilers
00:43:49.640 | and what optimizations are, there's
00:43:51.200 | a bunch of standard stuff that LLVM does,
00:43:53.040 | including constant folding, removing dead code,
00:43:56.360 | other things like the example I show here of taking
00:43:59.360 | an expression and pulling it out of a loop.
00:44:01.560 | This is something that in PyTorch, for example,
00:44:03.640 | if you do a multiply inside of a loop of two tensors,
00:44:06.320 | it's going to run that multiply every single time
00:44:08.420 | you go through the loop.
00:44:09.420 | But reasonable, more modern languages
00:44:11.880 | actually pull these things out for you.
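
A sketch of what loop-invariant code motion does; the function is invented for illustration. In the naive version below, `scale * offset` never changes inside the loop, so the optimizer can effectively hoist it out and compute it once:

```swift
func transform(_ xs: [Float], scale: Float, offset: Float) -> [Float] {
    var out: [Float] = []
    for x in xs {
        // `scale * offset` is loop-invariant; LLVM can rewrite this
        // as if we had written `let k = scale * offset` before the loop.
        out.append(x + scale * offset)
    }
    return out
}

print(transform([1, 2], scale: 2, offset: 3))    // [7.0, 8.0]
```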
00:44:13.800 | So this is a fascinating example.
00:44:16.480 | Because normally, if we're writing Python
00:44:18.600 | and you see, oh, I'm doing this work inside the loop
00:44:21.000 | redundantly, I can pull it out.
00:44:23.140 | That's something that I, as a programmer,
00:44:25.120 | have to figure out.
00:44:26.600 | And the fact that LLVM can do this for you
00:44:30.800 | and other optimization systems in GCC or whatever,
00:44:34.760 | it suddenly makes you realize that compilers are something
00:44:38.640 | different to at least what I thought they were.
00:44:40.700 | I thought compilers were things that got in your way
00:44:43.760 | and complained until you did things the way they expected
00:44:47.280 | you to do them and took forever to run code
00:44:49.840 | that I would have finished yesterday if it was Python.
00:44:54.040 | But actually, working with Swift, and particularly
00:44:58.240 | with Swift for TensorFlow, has made me realize
00:45:00.080 | that these optimizations actually allow us to write code
00:45:03.840 | in different ways and actually be much more lazy
00:45:07.880 | about how we write code.
00:45:08.880 | - And this is as you think about a point in the space
00:45:10.880 | between the human and the computer.
00:45:12.360 | - Yeah, so we're actually gonna show you something
00:45:14.220 | really mind-blowing next week where this is actually
00:45:17.880 | gonna be used to basically make auto-diff work.
00:45:21.680 | And it's like, it blew my mind when I found out about it.
00:45:24.880 | - Yep, and so now if you think about languages
00:45:27.200 | and different points in the space,
00:45:28.400 | there's a couple of different ways to look at this.
00:45:30.320 | One of the ways I think about it,
00:45:31.760 | ignoring the syntax pieces, which the syntax is always
00:45:34.120 | the first thing people talk about,
00:45:35.640 | is what are the atoms of the universe
00:45:37.840 | and how do they get composed
00:45:39.280 | and how do you build things out of them?
00:45:40.520 | And so if you look at Python, for example,
00:45:42.580 | if you boil everything down in Python,
00:45:44.480 | boil a dictionary down, it's a bunch of C functions.
00:45:47.360 | And then what the interpreter does,
00:45:48.480 | the Python interpreter does, is it decides
00:45:50.540 | what C functions to call and what order and on what data.
00:45:53.880 | And so the Python interpreter is slow,
00:45:56.280 | and so the Python program ends up being slow,
00:45:59.120 | even if the C pieces are fast.
00:46:01.760 | C++ is another language.
00:46:03.000 | C++ is a little bit different.
00:46:04.920 | C++, the atoms are built in things like integers
00:46:07.920 | and floats and arrays and pointers and things like that.
00:46:10.800 | And then a C++ programmer can use structs and classes
00:46:13.500 | to build complex numbers or strings
00:46:16.000 | or its variable-size array thing in the library.
00:46:20.720 | And it can do this because C++ is a fast language.
00:46:23.000 | It's also not a super well-considered language,
00:46:26.560 | but it's weird to me in C++ that arrays are hard-coded
00:46:30.800 | into the compiler, but string is a library feature.
00:46:33.560 | And why is that?
00:46:34.400 | That doesn't really make sense to me
00:46:36.220 | because strings and arrays are so similar.
00:46:38.760 | What Swift is, is it says, let's rethink all this.
00:46:42.520 | And so the primitives, the low-level atoms of the universe
00:46:46.040 | are now things that LLVM, the compiler, knows about.
00:46:48.920 | And then all the abstractions you build,
00:46:50.680 | including floats, arrays, dictionaries,
00:46:53.320 | of course, the high-level stuff too, like layers,
00:46:55.320 | those are all defined in the library.
00:46:57.640 | And so a float is not a magic built-in thing.
00:47:01.240 | Swift doesn't like magic built-in things.
00:47:03.240 | Swift likes things that are hackable.
00:47:04.800 | And if something is interesting for a library developer to do,
00:47:09.000 | maybe you want to do it in your workbook, right?
00:47:11.080 | And so having an open ecosystem is very powerful.
00:47:14.920 | And so if you actually go look at the library
00:47:16.880 | that implements float, float is just a struct,
00:47:20.280 | just like the complex thing we were talking about before.
00:47:22.800 | In the middle, the inside of it is this built-in weird thing.
00:47:26.160 | That's an LLVM thing.
00:47:27.680 | And plus, on floats, isn't a magic thing
00:47:30.440 | that the compiler knows about.
00:47:31.760 | Plus is just an operator, just like we were talking
00:47:33.400 | about before when we defined square root.
00:47:35.240 | Just this one happens to be named plus or plus equals.
00:47:37.560 | And it's implemented with LLVM magic.
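The pattern being described can be sketched like this (the type here is hypothetical, not the standard library's actual Float source, which wraps a Builtin value rather than a Double): a float-like type is just a struct, and `+` is just an ordinary operator function defined on it.

```swift
// A minimal sketch of the "float is just a struct" idea.
// MyFloat is a made-up stand-in; real Float stores a Builtin value.
struct MyFloat {
    var value: Double  // stand-in for the Builtin storage inside real Float

    // `+` is not compiler magic here, just a static operator function.
    static func + (lhs: MyFloat, rhs: MyFloat) -> MyFloat {
        return MyFloat(value: lhs.value + rhs.value)
    }

    static var infinity: MyFloat { return MyFloat(value: .infinity) }
}

let x = MyFloat(value: 1.5) + MyFloat(value: 2.5)
print(x.value)  // 4.0
```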
00:47:39.440 | >> So we're allowed to use float now, Chris?
00:47:41.760 | >> Well, let's go look at the implementation.
00:47:44.840 | And so if you actually go look at this,
00:47:47.360 | this is the standard library that comes with Swift.
00:47:50.040 | Here you can see it implements infinity.
00:47:52.000 | It implements not a number.
00:47:53.640 | It implements add, pi, all the things that are
00:47:57.360 | in float. Float is just a gigantic pile of Swift code.
00:48:00.400 | And the cool thing about this is that this means
00:48:02.800 | that you can implement low-level stuff
00:48:05.040 | like this too right in the workbook.
00:48:08.960 | >> And to be clear, we don't expect you
00:48:12.720 | to implement float yourself.
00:48:15.120 | But the fact that you can is actually important
00:48:18.760 | for data scientists.
00:48:20.080 | And so let me explain.
00:48:21.880 | When I was starting out and I did a lot of stuff
00:48:26.760 | with Delphi I guess 20 something years ago,
00:48:30.520 | which is like a very fast Pascal system.
00:48:33.080 | And I was writing a lot of numeric code.
00:48:35.040 | And I very often hit this floor where things weren't working the
00:48:38.920 | way I wanted them to.
00:48:40.280 | So I had to use assembler, which nobody should ever have to do.
00:48:44.000 | But that was the floor I hit.
00:48:45.280 | Like I had work that needed to be done.
00:48:47.280 | And I couldn't do it in Delphi.
00:48:49.000 | So I had to use assembler.
00:48:50.200 | But at least I could.
00:48:51.800 | And over the last 25 years, we've gradually kind of filled
00:48:56.440 | in more and more of the things that numeric programmers use.
00:49:01.520 | But what I'm kind of finding is happening now is
00:49:03.560 | as numeric programming is becoming differentiable
00:49:05.800 | programming, I'm hitting the bottom of the stack again.
00:49:09.880 | And there aren't things that I want to do.
00:49:12.560 | And/or there are things I want to do a little bit differently.
00:49:14.960 | So I feel like we're at this point in history.
00:49:17.960 | You know, we might be for the next five or ten years or more
00:49:21.440 | where data scientists don't need to know how to write assembler.
00:49:25.280 | But they do need a system where they can go under the surface
00:49:28.400 | and actually change things that people don't normally change.
00:49:33.240 | >> Yeah. Well, and again, to me,
00:49:34.680 | I think the goal here is an infinitely hackable platform.
00:49:37.120 | So like in the box are all the nice things you'd expect.
00:49:40.600 | You don't have to write matmuls.
00:49:41.920 | You don't have to write floats.
00:49:43.520 | But if you want to go change it and do your own, you can.
00:49:46.160 | Or if you want to take somebody else's,
00:49:47.840 | you can drop it in your workbook.
00:49:49.600 | You don't have to recompile the whole stack.
00:49:51.800 | Now, we talked about structure a little bit like classes in Python.
00:49:58.120 | The major difference is that these are actually fast.
00:50:00.280 | So here's our square add
00:50:02.640 | that multiplies two things together and adds it.
00:50:05.360 | If this was Python, these would be allocating objects.
00:50:08.800 | This would be doing lots of crazy stuff.
00:50:11.120 | This thing I'm showing you now is called the compiler explorer.
00:50:14.080 | And you thought you came to learn machine learning?
00:50:16.000 | Here's some assembly language, which we're going to get away
00:50:19.560 | from as soon as possible.
00:50:20.840 | But the point is like you're writing a reasonable Swift code
00:50:23.040 | and you're getting literally the same code you would get
00:50:25.800 | from Clang if you wrote C++.
00:50:28.320 | Like even though float is implemented in the standard library,
00:50:31.360 | there's no tricks here.
00:50:32.640 | You're getting the lowest level optimized fast code that's turning
00:50:35.680 | to multiply instruction and add instruction on Intel.
00:50:38.640 | And I'll go away from this very quickly because we're not here
00:50:41.280 | to learn about Intel assembly.
00:50:43.720 | So now the thing about float again is not really
00:50:47.040 | about something you should want to do, but you can poke
00:50:49.160 | at it if you want.
00:50:50.460 | You can see what's inside of it.
00:50:51.760 | One of the things we've at least so far chosen not
00:50:54.200 | to do is we don't export the built-in to workbooks.
00:50:57.240 | And so you have to write a standalone file to use it.
00:50:59.400 | We could change that if we wanted to.
00:51:01.200 | But one of the really powerful things about this is
00:51:03.320 | because these types are defined in the library, they're not magic.
00:51:07.400 | Well, now all the other things we talked
00:51:08.800 | about before work with these.
00:51:10.600 | And so we can add a method to int or to bool.
00:51:13.640 | So here, you know, we add a little is odd method
00:51:17.080 | that's just saying is the low bit set or clear.
00:51:19.880 | That's cool.
00:51:21.180 | That's fine.
00:51:22.480 | Like this is not monkey patching.
00:51:23.780 | This is just super low level.
00:51:25.080 | Int is a struct.
00:51:26.380 | Sure you can add a method to it.
00:51:27.680 | No problem.
00:51:28.980 | We can add a symbol that turns a boolean into an emoji
00:51:30.280 | because emojis are cool.
00:51:32.400 | And so now you can just use these things.
00:51:34.560 | And we can say, hey, 4, are you odd?
00:55:37.120 | Hey 4, are you odd?
00:51:38.440 | Turn yourself into a symbol.
00:51:39.740 | And we get true false.
00:51:41.040 | We get thumbs up, thumbs down.
00:51:42.340 | And it all just kind of works.
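The extensions being demonstrated look roughly like this (the exact property names are reconstructed from the discussion, so the notebook's spelling may differ slightly):

```swift
// Add a method to Int: is the low bit set?
extension Int {
    var isOdd: Bool { return self & 1 == 1 }
}

// Add a symbol that turns a Bool into an emoji.
extension Bool {
    var symbol: String { return self ? "👍" : "👎" }
}

print(4.isOdd)         // false
print(5.isOdd)         // true
print(5.isOdd.symbol)  // 👍
```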
00:51:43.620 | This is particularly important for all of us at this stage
00:51:47.920 | because as we discussed, you know,
00:51:49.600 | Swift hasn't been widely used for numeric programming.
00:51:52.040 | So a lot of stuff doesn't exist.
00:51:53.540 | And so when I started playing with it in December
00:51:56.720 | and I realized like, oh, I'm missing stuff.
00:51:59.920 | So yeah, so I created this library called BaseMath
00:52:03.040 | where I literally was defining things on float
00:52:08.040 | that I needed to exist.
00:52:10.240 | And not only did they then exist,
00:52:12.740 | but they ran at C speed.
00:52:15.480 | And then from then on, I had all
00:52:19.360 | the math stuff that I wanted in the language.
00:52:22.760 | And so if you're hacking around over the coming months
00:52:27.120 | and you find things not quite the way you want,
00:52:30.200 | you can and should change it, right?
00:52:33.560 | And it's really, really, really common in Swift code
00:52:38.560 | to add extensions to basic types.
00:52:41.220 | It's not at all unusual or weird.
00:52:43.320 | It's just part of how you write Swift code.
00:52:45.720 | - And you can make it feel the way you want.
00:52:47.480 | So we're not going to dive in too deep,
00:52:49.200 | but there's lots of interesting things in the system.
00:52:51.640 | So if you say, well, how does && work?
00:52:54.520 | && only evaluates its right side
00:52:57.340 | if the left side is true.
00:52:59.160 | Well, that's implemented in our libraries,
00:53:00.440 | three lines of code, you can go dive in.
00:53:01.960 | There's a couple of links.
00:53:03.660 | Let's talk about array because we need arrays
00:53:05.720 | to implement MatMul.
00:53:06.720 | Before we talk about how array works,
00:53:09.840 | let's look at how you use it as a Swift programmer.
00:53:12.240 | Arrays in Swift are super simple.
00:53:13.960 | You just define them with square brackets like you'd expect.
00:53:16.760 | Swift is type inferred.
00:53:18.920 | And so what you'll end up seeing is
00:53:20.760 | there's two different syntaxes for the types.
00:53:22.560 | There's int and square brackets,
00:53:24.080 | which is the way you'd normally write it
00:53:25.360 | if it's not inferred.
00:53:26.920 | But that is actually just sugar for this array, okay?
00:53:30.320 | And if you print out the types of these things,
00:53:32.640 | you'll see that they're all just Array<Int>,
00:53:34.360 | Array<Int>, Array<Int>.
00:53:36.040 | Well, arrays can be iterated over.
00:53:39.040 | So you can have a for loop.
00:53:40.080 | It just goes over all the elements of the array.
00:53:42.240 | Pretty simple.
00:53:43.240 | You can slice them.
00:53:45.080 | Swift has two ways to slice
00:53:46.520 | based on whether you want the endpoint or not.
00:53:48.640 | And if you want an inclusive range,
00:53:50.040 | which includes that endpoint, you use dot, dot, dot.
00:53:52.640 | An exclusive range uses dot-dot-less-than.
00:53:54.740 | And the less-than says, but not the last one.
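A short sketch of the array basics just described: literals, type inference, iteration, and the two range styles.

```swift
let xs = [1, 2, 3, 4, 5]        // inferred as [Int], sugar for Array<Int>
let ys: Array<Int> = [6, 7, 8]  // the unsugared spelling of the same type

// Arrays can be iterated over.
for x in xs { print(x) }

// Inclusive range: includes the endpoint.
let inclusive = xs[1...3]   // [2, 3, 4]
// Exclusive range: "but not the last one".
let exclusive = xs[1..<3]   // [2, 3]
print(inclusive, exclusive)
```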
00:53:56.960 | Swift supports functional programming things.
00:54:00.180 | And so here what we do is we use
00:54:02.760 | this functional map algorithm.
00:54:05.000 | And it's using a closure.
00:54:07.080 | Closures are the same thing as lambdas in Python
00:54:09.680 | with slightly nicer syntax.
00:54:11.480 | And so here what we're doing is we're saying,
00:54:13.040 | give me an array, but run a function
00:54:15.680 | that takes all the elements and adds 10 to them.
00:54:18.760 | And it's very simple.
00:54:19.840 | You can just do this right in line and it's nice and fast.
00:54:22.120 | And so here we get our array
00:54:23.280 | where everything has 10 added to it.
00:54:25.020 | It has filter and reduce as well.
00:54:29.000 | So filter just takes a predicate where you say,
00:54:31.480 | filter and only include the things that are odd, okay?
00:54:34.600 | And we just added is odd.
00:54:36.500 | And now we get an array that just has odd things in it.
00:54:39.120 | Super easy.
00:54:39.960 | And one of the other things you'll notice
00:54:41.640 | is that Swift has lots of nice syntactic shortcuts.
00:54:43.980 | And so instead of naming our argument like we did in map,
00:54:46.680 | we just use the default name, which is dollar sign zero.
00:54:48.940 | - So the top one is equivalent to lambda arg colon,
00:54:52.800 | arg plus 10, right?
00:54:54.400 | And so then we can get rid of both the lambda
00:54:56.880 | and the arg colon by sticking it in curly brackets
00:54:59.200 | and just using dollar zero
00:55:00.320 | to refer to the first argument.
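The two equivalent spellings being compared can be written out like this:

```swift
let a = [1, 2, 3]

// Fully spelled-out closure: named argument, `in` separates
// the signature from the body.
let b = a.map({ (arg: Int) -> Int in return arg + 10 })

// Shorthand: drop the signature and use $0, the default name
// for the first argument.
let c = a.map { $0 + 10 }

print(b)  // [11, 12, 13]
print(c)  // [11, 12, 13]
```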
00:55:01.960 | - Another super common thing is that often these closures
00:55:04.360 | end up being the last argument to a function.
00:55:06.320 | If they're the last argument to a function,
00:55:08.080 | you can just put them outside the parentheses.
00:55:10.000 | And if that's the only thing in the parentheses,
00:55:11.800 | you can just get rid of the parentheses as well.
00:55:13.640 | And so you get these really nice things
00:55:14.960 | that are kind of like list comprehensions
00:55:16.800 | where you can say map and multiply all the elements by three
00:55:19.960 | and then filter them and only keep the odd ones.
00:55:22.840 | And you get very nice, fluent things,
00:55:24.840 | or here's a map where I'm saying, you know, pick,
00:55:27.440 | get the odd, like decide whether it's odd
00:55:31.240 | and then turn it into a symbol.
00:55:32.580 | And I get very nice, simple.
00:55:34.400 | - Yeah, so this, so just come back and have a look
00:55:36.960 | at this map filter again at the end of the lesson
00:55:39.120 | because this is how you do list comprehensions in Swift.
00:55:41.760 | You don't need special syntax for it
00:55:43.280 | because the stuff that's already built in
00:55:45.800 | very elegantly gives us list comprehensions for free.
00:55:48.760 | - Yep, and all these things are just library features.
00:55:50.820 | Reduce is a, it's a reduction.
00:55:53.200 | So you give it the first element.
00:55:54.460 | And then in this case,
00:55:55.300 | we're just adding all the elements of the array to zero
00:55:58.660 | and plus is a function.
00:55:59.880 | We saw it already.
00:56:01.240 | And so this just uses the plus function
00:56:03.920 | to do a reduction, it's super simple.
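The filter, reduce, and chained map-filter idioms just discussed, in one sketch:

```swift
let nums = Array(1...6)

// filter takes a predicate and keeps only matching elements.
let odds = nums.filter { $0 % 2 == 1 }                     // [1, 3, 5]

// reduce folds the array; `+` is just a function, so pass it directly.
let total = nums.reduce(0, +)                              // 21

// Trailing closures chain nicely: the Swift "list comprehension".
let chained = nums.map { $0 * 3 }.filter { $0 % 2 == 1 }   // [3, 9, 15]

print(odds, total, chained)
```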
00:56:06.160 | Now we're talking about array, array is a type.
00:56:08.840 | And that means you can do an extension on a type.
00:56:11.600 | So you can add methods to arrays, that's super easy.
00:56:14.280 | So here we defined a double elements method
00:56:16.920 | that returns a new array and we just map.
00:56:19.840 | So double elements just multiplies all the elements by two
00:56:22.520 | and like the self thing we don't actually need.
00:56:25.360 | - Oh, thank you.
00:56:26.260 | - Right, Jeremy?
00:56:27.200 | - Too much self in Python.
00:56:28.520 | - Yeah.
00:56:29.520 | And now one of the other things you may wonder about
00:56:31.120 | is like, why do we need this where element is numeric?
00:56:34.080 | And what this is talking about is it's saying,
00:56:35.800 | well, we're taking all the elements out of this thing
00:56:37.640 | and multiplying it by two.
00:56:39.240 | This is helping us catch errors.
00:56:40.960 | So if I have an array of Booleans,
00:56:44.560 | I get an error message that says,
00:56:46.360 | hey, I can't double all the elements of a Boolean array
00:56:49.740 | because bool is not a number.
00:56:51.940 | And so in Python, what would end up happening
00:56:53.960 | is if you accidentally pass the wrong thing in,
00:56:56.240 | you would pass in your Booles
00:56:57.780 | and then they'd get multiplied by two
00:56:59.240 | and then sometime in a far distant API call,
00:57:01.760 | somewhere later you find out you have twos in your Booleans,
00:57:05.400 | like what just happened.
00:57:06.600 | And so Swift helps catch these errors early.
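The constrained extension being described looks roughly like this (the method name is taken from the discussion; the notebook's exact code may differ):

```swift
// Only available when the element type is numeric, so calling it
// on [Bool] is a compile-time error, not a far-away runtime surprise.
extension Array where Element: Numeric {
    func doubleElements() -> [Element] {
        return map { $0 * 2 }
    }
}

print([1, 2, 3].doubleElements())    // [2, 4, 6]
print([1.5, 2.5].doubleElements())   // [3.0, 5.0]
// [true, false].doubleElements()    // error: Bool does not conform to Numeric
```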
00:57:08.160 | - And talking of what just happened,
00:57:09.520 | this is the point where if you've used
00:57:11.440 | a few languages before, you're thinking,
00:57:13.140 | oh, Swift is actually pretty different
00:57:16.100 | in a pretty interesting way
00:57:17.360 | 'cause what we've just done,
00:57:19.080 | we've just said, here is some functionality
00:57:23.120 | which applies to some type
00:57:26.720 | which has some particular properties to it, right?
00:57:29.560 | So we've like defined functionality in a way
00:57:32.880 | that's gonna be looked up
00:57:34.120 | in this really nuanced and interesting way.
00:57:36.000 | And so we're not gonna go into the details now,
00:57:37.940 | but just like take a look at this after the lesson
00:57:41.420 | and think like, wow, what's going on here?
00:57:43.840 | 'Cause this is something really interesting.
00:57:46.600 | - And again, one of the cool things about this
00:57:47.960 | is because it's all built in the library,
00:57:49.480 | it's all open to you and you can do cool things
00:57:52.360 | like add methods or do other things.
00:57:54.100 | So I'm not gonna go into the full details of array.
00:57:57.600 | Array is implemented right here in Array.swift.
00:58:00.960 | This is the standard library, array is a struct.
00:58:03.920 | It has a buffer inside of it, the elements.
00:58:07.080 | You can go through and you can see all the different
00:58:09.440 | gory details of things that go into array.
00:58:12.160 | And instead of coding this in Workbook,
00:58:13.920 | I think that we will just consider it that we implement this.
00:58:16.840 | Is that okay with you, Jeremy?
00:58:17.760 | - Absolutely.
00:58:18.600 | - Okay, so now we can use arrays.
00:58:20.120 | Okay, so let's move on to Matmill.
00:58:23.800 | Okay, so what I'm gonna suggest
00:58:25.360 | is we might be good time to take a break.
00:58:27.960 | So let's take a six minute break
00:58:29.360 | and we'll see you back here at 6.47.
00:58:32.360 | Now that we've invented Swift and float and array,
00:58:36.200 | we will actually implement matrix multiplication.
00:58:38.440 | So we'll see you back here at 7.47.
00:58:40.360 | Okay, any questions before we keep going, Rachel?
00:58:44.080 | - Yes, we have two questions.
00:58:46.000 | The first is that the in keyword is very unintuitive
00:58:49.760 | in 'arg in arg plus 10'.
00:58:51.480 | - Enclosures, yeah.
00:58:52.640 | Can we point at that?
00:58:53.920 | So yeah, the in keyword.
00:58:56.020 | - Yeah, up here, in, yep.
00:59:01.940 | - So that's the question, it's like, why is it so weird?
00:59:05.040 | - Why is arg in arg plus 10?
00:59:07.200 | - Yeah, so we carefully considered the name of this keyword
00:59:11.080 | and we didn't have anything better to use,
00:59:12.620 | so we got stuck with this.
00:59:13.920 | - In Python, I guess that would be a colon.
00:59:18.960 | - Yeah, so there's no good answer.
00:59:21.920 | Nobody knows what it means.
00:59:24.000 | There's historical reasons, but they're not good reasons.
00:59:27.280 | So we just do it and it's--
00:59:28.400 | - So the answer is because Chris says so.
00:59:30.920 | - Thanks for your honesty.
00:59:32.080 | - Why do we use colon?
00:59:32.920 | Well, that's what Python says.
00:59:34.580 | - And the second question,
00:59:36.400 | can Swift/LLVM emit instructions to execute on the GPU?
00:59:40.800 | Would this be a good idea?
00:59:42.040 | - Yeah, this is a really exciting direction.
00:59:43.720 | This is one of the things we're investing a lot
00:59:45.240 | in TensorFlow and the infrastructure pieces,
00:59:47.320 | and we'll be talking about that a little bit next time.
00:59:49.660 | - Yeah, but I mean, the short answer is that LLVM
00:59:52.640 | has a number of backends.
00:59:54.400 | And one of the backends it has is a PTX backend,
00:59:57.760 | which is the lower level Nvidia kind of instruction set.
01:00:01.920 | And so like right now you can compile stuff with LLVM
01:00:04.680 | and have it run as CUDA kernels.
01:00:07.320 | So the answer is yes, absolutely.
01:00:08.800 | - And in fact, like every pixel on the iPhone
01:00:11.120 | goes through LLVM, not through Swift in a workbook.
01:00:16.120 | - Not bad.
01:00:17.600 | So LLVM is used for lots of cool stuff
01:00:20.720 | and using it for more cool stuff is fun.
01:00:23.440 | So now that we have our bedrock of floats and arrays,
01:00:28.380 | let's build something a little bit higher level.
01:00:29.760 | Let's talk about matrix multiplication.
01:00:32.620 | So here what we're gonna do is we're actually gonna
01:00:36.360 | load up a few tensors.
01:00:37.880 | And here we're playing by the same rules
01:00:39.420 | that we played with in Python,
01:00:41.480 | where we could use the random number generation
01:00:43.480 | and things like this,
01:00:45.160 | but we're not gonna use the matrix multiplication operator
01:00:47.920 | quite yet.
01:00:48.920 | So there's lots of ways, there we go.
01:00:52.560 | There's lots of ways to create tensors
01:00:54.120 | and Swift for TensorFlow has this Tensor type,
01:00:58.220 | this little <Float> thing we want to go away eventually,
01:01:00.280 | we hope, but right now you say Tensor<Float>
01:01:02.720 | to say that I want a tensor of floats.
01:01:05.320 | You give it a shape, you can get zeros, ones,
01:01:07.320 | you can repeat a thing, you can get random,
01:01:09.260 | there's lots of different things you can play with.
01:01:11.840 | I highly recommend you type tab in Jupyter
01:01:14.760 | and it will tell you all the things
01:01:16.080 | that you can do with completion.
01:01:18.120 | So let's build a matrix multiply.
01:01:19.640 | So here what we're doing is we're doing something
01:01:21.440 | kind of similar to what Jeremy did
01:01:23.680 | in the Python version of this.
01:01:25.400 | But here we're starting a little bit lower level.
01:01:27.320 | We only have one dimensional arrays.
01:01:29.120 | That's how the SWIFT array works.
01:01:31.520 | And so what we need to do is we need to pass in
01:01:33.440 | two arrays of floats,
01:01:34.800 | and then we're doing a two dimensional matrix multiplication
01:01:37.200 | so we need to know the number of rows
01:01:38.520 | and number of columns for A and B.
01:01:40.420 | - So that's a definition of a tuple parameter, Chris?
01:01:42.960 | - Yep.
01:01:43.800 | - So there are two ints.
01:01:44.960 | - Yep, so A dims is two ints and B dims is two ints.
01:01:48.240 | And so what we do is we create an array full of zeros,
01:01:50.920 | and then we write the same three loops you saw before.
01:01:53.120 | And because we have a one dimensional array,
01:01:55.400 | we have to do manual arithmetic to get into it.
01:01:59.960 | Now, one of the things that you'll see is like,
01:02:02.280 | if you actually try this out and you run this,
01:02:05.360 | I didn't actually run the cell,
01:02:06.880 | this is why you say don't do things and I don't listen,
01:02:10.680 | and then I make a fool out of myself.
01:02:13.760 | - Okay, so then you run this,
01:02:16.720 | you get the tensors here,
01:02:18.260 | we're just using the MNIST data set
01:02:19.800 | because it's fun to use.
01:02:21.880 | Now we can run this and we can time it.
01:02:23.800 | And one of the major differences you'll see
01:02:25.120 | is we just wrote three loops
01:02:26.840 | and it takes 0.132 milliseconds.
01:02:30.300 | The Python version of this took 835 milliseconds.
01:02:33.200 | - We'll just have a look.
01:02:34.400 | So Chris, I just wanted to compare.
01:02:37.000 | So, I mean, the first thing I wanted to compare
01:02:38.720 | was to look at the code, so that's a Swift code.
01:02:41.040 | And in Python, there's the Python code.
01:02:43.440 | So it's basically exactly the same code.
01:02:45.800 | - Yep, here you have 2D arrays, which we'll get to.
01:02:48.680 | - And so, yeah, we kind of found with the Python version,
01:02:51.780 | it took about a second to do a five by 784 matrix multiply
01:02:56.780 | by a 784 by 10.
01:02:58.740 | So we kind of did the math and said like,
01:03:00.800 | we can't use this because it's just not fast enough.
01:03:05.180 | But something very different is going on here
01:03:07.800 | because this is about 9,000 times faster?
01:03:12.600 | - Yeah, so this is not super fast,
01:03:15.320 | but this is pretty reasonable.
01:03:16.480 | This is what you get from roughly C, right?
01:03:20.160 | And that's because, again, we talked about,
01:03:21.640 | it's primitives that are principled that are fast.
01:03:24.320 | And when you build things out of principled fast primitives,
01:03:27.080 | you get new things that are principled and fast.
01:03:29.440 | - Okay, so this is no big deal for you,
01:03:30.960 | but for a Python programmer,
01:03:32.200 | this is like, this was a whole mind shift change.
01:03:34.920 | 'Cause at this point, the fact that you can write this
01:03:37.640 | and have it run this fast means like,
01:03:40.160 | I can now write anything I can think of doing with numbers
01:03:44.660 | in the most obvious way possible
01:03:47.360 | and have it work pretty damn well.
01:03:50.100 | So this is kind of like a superpower
01:03:51.600 | that Python programmers don't have.
01:03:53.400 | - Well, and if you think about it,
01:03:54.840 | so one way to think about it, for Swift for TensorFlow,
01:03:56.560 | we're trying to take C and C++ out of the picture, right?
01:03:59.800 | Because if you think about working with Python,
01:04:03.220 | a lot of it ends up being, if you care about performance,
01:04:05.640 | working around the GIL.
01:04:06.640 | And how do you do that?
01:04:07.480 | How do you go into C stuff or working around writing,
01:04:11.600 | oh, I need a new kind of data structure, what do I do?
01:04:13.840 | I write a bunch of stuff in C
01:04:15.080 | because writing in Python wouldn't be fast enough.
01:04:17.440 | And that's problematic for lots of reasons,
01:04:19.080 | one of which is it's just really hard
01:04:20.200 | to do things in workbooks.
01:04:22.000 | But here we're implementing basic stuff in workbooks
01:04:24.560 | and it goes fast.
01:04:25.560 | - Yeah, and I can ship something
01:04:27.360 | that's like a compiled program that I just ship it.
01:04:30.240 | I don't have to figure out how to put the C library
01:04:33.240 | over here and compile it and put it together with this header.
01:04:35.840 | So Jeremy, what is this built-in called time?
01:04:38.600 | Is that built-in in language?
01:04:39.840 | - No, well, so Chris--
01:04:41.760 | - Can you show me what that is?
01:04:42.720 | - Absolutely, so we're not using percent time
01:04:44.720 | because percent time is a magic.
01:04:46.560 | And we, under your new rules,
01:04:49.500 | we shouldn't be allowed to use things that are magic,
01:04:51.440 | we should write them ourselves.
01:04:52.800 | So time is written ourselves
01:04:55.600 | and it's actually written in this notebook called 00.
01:04:59.760 | So yeah, so when we started out,
01:05:03.560 | we started out with a blank slate
01:05:06.360 | and so we had to start out with things like
01:05:09.360 | how do I time a cell, right?
01:05:11.200 | So the answer is, this is the nice thing
01:05:13.520 | about working with Swift is we can build everything
01:05:16.280 | from scratch, right?
01:05:17.620 | So here's the timing section, right?
01:05:20.160 | And the details don't matter, but basically you can see
01:05:22.920 | we can grab some stuff from the Swift standard library,
01:05:25.880 | for example, a function that tells the time.
01:05:28.400 | And we can run some function and see how long it takes.
01:05:31.600 | And the nice thing is that we can end up
01:05:34.320 | with some very neat syntax, right?
01:05:37.380 | Because the thing that we pass in,
01:05:39.520 | the function we pass in is a closure, right?
01:05:42.120 | And this is how you say pass in some closure
01:05:44.240 | that takes no arguments and returns nothing at all.
01:05:46.980 | And so, for example, we can give it a default value,
01:05:50.720 | which means if we want to time something
01:05:52.680 | and just run it once, we can just do that, right?
01:05:56.800 | So we can create syntax, you know,
01:05:59.440 | we can create APIs that are really nice
01:06:01.520 | and elegant and simple to use.
01:06:03.200 | And so this version of time actually is both time it
01:06:07.480 | and time together, right?
01:06:09.560 | So if you give it repeating, it'll do that, right?
01:06:12.160 | And actually, this 00 notebook is worth flicking through
01:06:16.840 | because it's the only notebook where there's no tensors
01:06:21.440 | in it, there's no tensor flow in it,
01:06:23.040 | there's no tf.data in it.
01:06:24.640 | So if you just want to see like just Swift, right?
01:06:28.880 | This is a good way to learn just Swift.
01:06:30.360 | So for example, you can see how we built this thing
01:06:34.240 | where you can go ls.shell.
01:06:37.720 | So we've actually added something to string
01:06:40.640 | that will run a task, right?
01:06:43.800 | And you can kind of see how to write these things
01:06:46.800 | in just nice neat little concise packages.
01:06:49.360 | And now we can export this and now anytime you want
01:06:52.040 | to run a process through the shell,
01:06:56.400 | you can just go blah.shell.
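One plausible way to build such a `shell` extension (a sketch, not the notebook's actual code, which spawns the process via Foundation in a similar spirit):

```swift
import Foundation

// Add a `shell` property to String that runs the string as a command
// through /bin/sh and returns its standard output.
extension String {
    var shell: String {
        let task = Process()
        let pipe = Pipe()
        task.executableURL = URL(fileURLWithPath: "/bin/sh")
        task.arguments = ["-c", self]
        task.standardOutput = pipe
        try! task.run()
        task.waitUntilExit()
        let data = pipe.fileHandleForReading.readDataToEndOfFile()
        return String(data: data, encoding: .utf8) ?? ""
    }
}

print("echo hello".shell)  // hello
```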
01:06:59.920 | You can see how to download a file.
01:07:02.240 | And one of the nice things here is that you'll see
01:07:04.240 | that download a file with actually using this path library
01:07:10.200 | that looks almost identical to pathlib.
01:07:12.720 | And that's because there's a wonderful programmer,
01:07:16.920 | wonderful person called Max Howell.
01:07:18.800 | This is Max's username on GitHub.
01:07:21.120 | And I mentioned he's actually an open source programmer
01:07:23.880 | who entirely relies on sponsorship for his work.
01:07:26.280 | So if you like Max's work, which I certainly do,
01:07:29.840 | you should go to his Patreon and give him a few dollars.
01:07:33.640 | So thanks to Max, we have something that's really just
01:07:37.840 | like pathlib but actually almost a bit better.
01:07:40.360 | There's something that's almost exactly like requests.
01:07:43.040 | It's called Just, right?
01:07:44.320 | So in the Swift ecosystem, thanks to the fact
01:07:46.960 | that you've got all these millions of iOS programmers
01:07:49.320 | who have been using this language for five years
01:07:51.560 | to build cool stuff,
01:07:52.960 | there's actually a nice non-data science ecosystem.
01:07:56.720 | >> And while we're talking
01:07:58.040 | about a non-data science Python similar packages,
01:08:01.400 | is there any web framework similar to Flask or Django
01:08:04.880 | for Swift yet?
01:08:06.800 | >> Yeah, actually the Swift on the server community is
01:08:09.760 | a really vibrant community.
01:08:11.080 | And there's the equivalent of Ruby on Rails.
01:08:13.720 | And a bunch of these different frameworks have Swift versions.
01:08:17.160 | And that's actually one
01:08:18.480 | of the biggest non-iOS communities that exist.
01:08:21.480 | >> So one I've seen a lot of love for is Vapor, I think?
01:08:23.880 | >> Yeah, Vapor, IBM is investing.
01:08:26.640 | They have a framework called Kitura.
01:08:28.320 | And they're putting a lot of time and thought into that.
01:08:31.880 | Apple has a low-level thing that's
01:08:33.760 | like Netty, the Netty library on Java.
01:08:35.640 | And there's a Swift version of that called SwiftNIO.
01:08:38.480 | So there's a bunch of these fairly infrastructural things
01:08:42.160 | that exist that are part of the Swift ecosystem.
01:08:44.200 | And Swift is really great on servers too.
01:08:45.920 | >> Great. So here you can see how we can download a file.
01:08:49.320 | It's all pretty straightforward.
01:08:50.600 | We've got try-catch blocks, a lot like we're used to.
01:08:52.640 | But they're kind of do try-catch.
01:08:54.160 | The details are a bit different.
01:08:56.240 | So in this case, we want to download MNIST and load it up.
01:09:01.040 | One thing to be aware of is that things --
01:09:04.080 | and we'll talk a lot more about this next week.
01:09:05.800 | But things can get a little bit complicated when, like,
01:09:09.200 | for example, for MNIST, we've got a file containing labels.
01:09:12.480 | And we've got a file containing images.
01:09:14.520 | They're different types.
01:09:15.840 | The labels are ints, the images are floats.
01:09:17.840 | So we kind of want two functions.
01:09:19.760 | One that returns a tensor of floats,
01:09:21.240 | one that returns a tensor of ints.
01:09:23.200 | That's duplicate code.
01:09:24.480 | I hate duplicate code.
01:09:25.840 | Right? So here's a version where we actually tell it, "Oh,
01:09:30.400 | you could load up different types of MNIST data,
01:09:33.920 | and it's going to return a tensor of that type."
01:09:37.560 | Okay. And unfortunately, if I try to use this, I get an error.
01:09:41.000 | Right? And I really wanted to show you this error
01:09:43.600 | because for the first week as a Swift programmer, I kind of --
01:09:47.760 | I've never felt so stupid.
01:09:50.000 | Like, I felt like everything I wanted to do,
01:09:53.440 | Swift hated me, and it told me these things like,
01:09:56.720 | "Cannot invoke map with a da da da da da,
01:09:58.760 | what the hell is all this?"
01:10:00.840 | And I'm just going to say that's totally fine.
01:10:03.080 | Right? Because the Swift type system is very new to somebody
01:10:09.440 | like me and probably most of the people watching this.
01:10:12.240 | The messages are helpful for people
01:10:15.720 | who understand it pretty well, and it's totally normal
01:10:21.640 | to think, for a week or two, that Swift hates you.
01:10:24.560 | >> To be stubbing your toe on every new thing
01:10:26.560 | and feeling like you'll never --
01:10:27.880 | >> And particularly this generic stuff, you know?
01:10:30.200 | And I would say, look, a couple of suggestions.
01:10:32.400 | The first is just write the two separate versions
01:10:35.680 | so that you don't get frustrated, and then come back
01:10:37.840 | and try again a few times yourself.
01:10:40.000 | Ask on the forum.
01:10:41.800 | >> Stack overflow is great.
01:10:43.120 | >> Yeah. But quite often the kinds of errors you get
01:10:46.160 | from the type system are a long way away
01:10:48.800 | from where the problem really was.
01:10:51.640 | It can be quite difficult because it's a powerful type system
01:10:55.120 | for it to really know where the problem is.
01:10:57.480 | Now, in this case, the issue basically is that we are trying
01:11:01.840 | to call -- we're trying to initialize either floats
01:11:07.120 | or bytes, and so it basically needs to know
01:11:12.440 | that the type we're creating can initialize either floats
01:11:16.080 | or bytes, so as you'll learn next week, you could do
01:11:19.120 | that by creating something called a protocol.
01:11:21.160 | You do it by saying that these things conform
01:11:23.760 | to that protocol.
01:11:25.060 | You then use that protocol, and so now this version
01:11:28.880 | of load MNIST works totally fine, right?
01:11:34.040 | So this is a nice little package that you can look through.
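A rough sketch of the generic loader being described, in Swift (the protocol and function names here are illustrative, not necessarily the actual notebook code):

```swift
import Foundation

// A protocol meaning "this type can be built from a raw byte".
// Conforming both Float and Int32 lets one generic function load
// the float images and the int labels without duplicating code.
protocol ConvertibleFromByte {
    init(_ byte: UInt8)
}
extension Float: ConvertibleFromByte {}
extension Int32: ConvertibleFromByte {}

// Hypothetical generic MNIST loader: read a file of raw bytes
// and convert each byte to the requested element type.
func loadMNIST<T: ConvertibleFromByte>(path: String) throws -> [T] {
    let data = try Data(contentsOf: URL(fileURLWithPath: path))
    return data.map { T($0) }
}

// let labels: [Int32] = try loadMNIST(path: "train-labels")
// let images: [Float] = try loadMNIST(path: "train-images")
```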
01:11:39.360 | The last piece that we had to build
01:11:40.840 | in 00 was the thing that makes //export work.
01:11:44.880 | So it's kind of delightful writing the thing
01:11:47.400 | that makes //export work by typing //export.
01:11:51.080 | One of the things that I needed to do here was
01:11:55.040 | to check whether something matches a regular expression
01:11:59.760 | or not.
01:12:01.080 | I found it extremely weird that the way to do
01:12:04.080 | that in Swift is called range(of:options:) with the .regularExpression option,
01:12:09.080 | so I created something called find in string.
01:12:11.520 | So now I never have to worry about the weird
01:12:13.880 | and clunky Swift syntax.
01:12:15.840 | Most of the time, I'm just looking
01:12:17.200 | to see whether something does exist or not,
01:12:19.120 | so I just created something called has match
01:12:20.800 | that checks whether something exists or not.
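The two little wrappers mentioned here can be written as a String extension, roughly like this (a sketch of the idea, not necessarily the exact notebook code):

```swift
import Foundation

extension String {
    // Hide Foundation's clunky regex spelling behind a friendlier name.
    func findFirst(pat: String) -> Range<String.Index>? {
        return range(of: pat, options: .regularExpression)
    }
    // Most of the time we only care whether a match exists at all.
    func hasMatch(pat: String) -> Bool {
        return findFirst(pat: pat) != nil
    }
}

// "// export".hasMatch(pat: "^\\s*//\\s*export") evaluates to true
```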
01:12:22.720 | So I make Swift work the way I want it to,
01:12:26.320 | and to give you a sense of like when I say clunky APIs,
01:12:29.600 | in particular, you'll see here we're using Max's beautiful
01:12:34.600 | path library.
01:12:36.080 | Before we realized that the path library does everything we
01:12:38.480 | wanted, we used the thing that Apple provides,
01:12:41.200 | which is called the foundation library,
01:12:42.520 | and that comes with Swift.
01:12:44.080 | These two lines --
01:12:45.400 | >> And also works great on Linux.
01:12:46.720 | It's a standard thing that's available.
01:12:48.200 | >> Yeah, so those two lines of code
01:12:51.200 | in Apple's foundation library looks like, oh, my God,
01:12:53.920 | it looks like this.
01:12:55.520 | Okay, so to me, a lot of Swift APIs look
01:13:00.000 | like deletingLastPathComponent, appendingPathComponent,
01:13:03.200 | appendingPathExtension, right?
01:13:05.640 | I don't know why, but a lot of Swift programmers seem
01:13:09.280 | to like writing this kind of code.
01:13:11.720 | I like writing this kind of code,
01:13:13.480 | but I think foundation is not necessarily your favorite API
01:13:17.840 | design, Chris, would that be fair to say?
01:13:19.360 | >> I think it's fair to say that the thing that's great
01:13:21.320 | about foundation is it has a lot of interesting
01:13:23.680 | and useful stuff, URLs, and other stuff like that,
01:13:27.200 | but its design is not all great.
01:13:31.000 | >> It's great that it's there, and it's amazing
01:13:33.280 | that it's all been ported to Linux, right?
01:13:34.960 | So quite often, you'll find--
01:13:35.920 | >> It's got a tremendous amount of function.
01:13:37.160 | >> I need something like the ability to tell the time.
01:13:40.320 | It's in dispatch, or the ability to work with URLs.
01:13:43.680 | And so know that foundation is there, and generally speaking,
01:13:47.360 | I always just import it, first thing, right?
01:13:50.040 | Because a lot of the stuff you want will live there,
01:13:52.160 | and if you forget to import it, it won't appear
01:13:53.800 | in your tab completion, and you'll get errors.
01:13:56.200 | But also, when you find clunky foundation APIs, which is--
01:14:00.600 | >> There's actually a better one out there.
01:14:02.320 | >> Or write your own little wrapper on top.
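To make the contrast concrete, here is the Foundation spelling next to a tiny wrapper you could write yourself (the `/` overload below is just an illustration, not a real library API):

```swift
import Foundation

// Foundation's spelling: correct and portable, but verbose.
let url = URL(fileURLWithPath: "data")
    .appendingPathComponent("mnist")
    .appendingPathExtension("gz")

// A thin wrapper if you prefer terser path-building:
func / (dir: URL, component: String) -> URL {
    return dir.appendingPathComponent(component)
}
let url2 = URL(fileURLWithPath: "data") / "mnist.gz"
```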
01:14:04.640 | Anyway, so once you've done that, now we've got our own,
01:14:06.920 | you know, JSON serialization thing.
01:14:08.880 | We can grab our Jupyter notebook.
01:14:11.000 | We can find ourselves.
01:14:12.300 | We can export them.
01:14:13.600 | And now, we can just do the same thing that we do in Python,
01:14:16.440 | and we now have a nice little module exporter that we can use.
01:14:20.080 | >> It's cool.
01:14:21.360 | >> We have a question on the time function.
01:14:24.160 | How do we know that calling F parentheses is not optimized away
01:14:28.320 | in this case because of a lack
01:14:29.680 | of side effects detected by the compiler?
01:14:32.080 | >> Generally, so that's actually a great question.
01:14:39.000 | In the case of the workbook,
01:14:41.480 | I don't think there's any cross-workbook optimization
01:14:44.560 | going on, so that's one thing.
01:14:46.000 | I don't know if there's a really good--
01:14:49.440 | that's a good question.
01:14:50.880 | What I recommend doing is put something that's not trivial
01:14:53.960 | inside the thing you're timing.
01:14:55.320 | And so, if you're doing, you know,
01:14:56.600 | we'll show you later launching a CUDA kernel
01:14:58.800 | to do matrix multiplication, for example,
01:15:00.720 | and that's not something that gets optimized away.
01:15:03.040 | You can also, like, get the value into the closure
01:15:05.320 | and then take the value back out.
01:15:06.880 | So, there's different things that you can do like that.
01:15:08.720 | >> Yeah. Sometimes, when I was doing this stuff
01:15:10.160 | in BaseMath, I would just add a print inside the thing
01:15:12.240 | that I was timing to force it to be calculated.
01:15:14.800 | >> And one of the other things that will happen with GPUs
01:15:16.560 | is GPUs run asynchronously, and so,
01:15:18.560 | you need to force a GPU sync.
01:15:20.160 | We'll show you how to do that in a minute.
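Putting those suggestions together, a timing helper might look roughly like this (an illustrative sketch, not the notebook's actual `time` function):

```swift
import Dispatch

// Run a closure several times and report the average wall-clock time.
func time(repeating: Int = 10, _ f: () -> ()) {
    f()  // one warm-up run
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<repeating { f() }
    let end = DispatchTime.now().uptimeNanoseconds
    print("average: \(Double(end - start) / 1e6 / Double(repeating)) ms")
}

// Accumulating into a captured variable and using it afterwards gives
// the closure a visible side effect, so the work being timed cannot
// trivially be optimized away.
var sink: Float = 0
time { for i in 0..<10_000 { sink += Float(i) } }
print(sink)
```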
01:15:23.480 | So, anyway, so coming back to this,
01:15:25.720 | so we showed you how to build matmul.
01:15:27.600 | We showed you how to build time.
01:15:29.080 | So, this matmul is working on arrays.
01:15:30.680 | And this is pretty fast.
01:15:31.920 | We talked about it's 0.13 seconds.
01:15:34.520 | But array in Swift is safe.
01:15:36.560 | And so, what's happening is that every time you,
01:15:39.280 | like, index into an array, it does a check
01:15:41.480 | to make sure that the index of your computing is in bounds.
01:15:44.400 | And so, this is actually doing a lot of work
01:15:46.040 | that you would not need to do if you're in C.
01:15:49.240 | And so, one of the other really cool things about Swift
01:15:51.360 | is that you can actually go all the way down to the bare metal
01:15:55.240 | and do things the unsafe, nasty, awesome C way,
01:16:00.240 | if you want to, to get even a little bit more performance.
01:16:02.840 | And so, here, sorry, I forgot to change this back,
01:16:05.840 | but we have a couple of arrays.
01:16:07.840 | And so, we have the exact same signature
01:16:10.040 | that we did before where we take in two arrays
01:16:11.800 | and we have our dimensions.
01:16:13.160 | And so, what we're going to do is to optimize storing
01:16:16.040 | into that result array, we're going to say,
01:16:18.520 | give me an unsafe mutable buffer pointer into that array.
01:16:22.400 | And it's unsafe, it's verbose, it has red warning signs
01:16:27.440 | all over because it's unsafe.
01:16:29.200 | But with almost no code change, now we're able to get something
01:16:33.880 | that runs twice as fast.
01:16:35.480 | And so, here's matmul, and now it runs at 0.07 milliseconds,
01:16:40.120 | which is even faster; that really is the performance of C.
01:16:42.800 | And this is pretty cool.
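A sketch of the unsafe version being described (assuming both matrices are flattened row-major into plain Float arrays; details simplified from the notebook):

```swift
// a is m×n and b is n×p, both flattened row-major into [Float].
func matmulUnsafe(_ a: [Float], _ b: [Float],
                  _ m: Int, _ n: Int, _ p: Int) -> [Float] {
    var res = [Float](repeating: 0, count: m * p)
    // The buffer pointers skip per-element bounds checks,
    // which is where the speedup comes from.
    res.withUnsafeMutableBufferPointer { resPtr in
        a.withUnsafeBufferPointer { aPtr in
            b.withUnsafeBufferPointer { bPtr in
                for i in 0..<m {
                    for k in 0..<n {
                        let av = aPtr[i*n + k]
                        for j in 0..<p {
                            resPtr[i*p + j] += av * bPtr[k*p + j]
                        }
                    }
                }
            }
        }
    }
    return res
}
```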
01:16:44.120 | >> And something I found with BaseMath is, like,
01:16:47.680 | sometimes these differences are four or five times faster
01:16:51.760 | because making something a pointer allows it
01:16:53.600 | to use SIMD vectorization.
01:16:55.360 | >> Yep.
01:16:56.640 | >> So, it's not a minor tweak.
01:16:57.960 | You can get super fast performance.
01:16:59.880 | >> But the thing I want to emphasize at this point is
01:17:01.400 | that this is like a super low-level geeky thing
01:17:05.560 | that not everybody should do, right?
01:17:07.440 | This is something that it exists because at certain points
01:17:11.000 | in your career or your journey, you may find that it's useful.
01:17:14.080 | Or you may find something that somebody else wrote,
01:17:16.880 | and it going twice as fast
01:17:18.280 | as it otherwise would is a pretty big deal
01:17:20.000 | because it makes you twice as productive.
01:17:21.880 | But usually, you're not working at this level.
01:17:24.000 | Layers are a good thing.
01:17:26.840 | If you want to go like super deep down the rabbit hole,
01:17:30.040 | unsafe pointer, and unsafe mutable buffer pointer,
01:17:32.800 | and all these things are also Swift libraries,
01:17:34.520 | and you can go see their implementations, too.
01:17:36.080 | And those are implemented in terms
01:17:37.400 | of the LLVM magic, just like Float does.
01:17:40.280 | So, at this point, let's skip over more C stuff and jump
01:17:45.960 | down to working with Tensor.
01:17:47.760 | So, we've got a matrix multiplication working
01:17:51.680 | on arrays and floats, but we also have tensors.
01:17:56.000 | And so, when we talked about Tensor and MatMul
01:18:00.160 | in the PyTorch context, you started out by using the Tensor
01:18:03.840 | abstraction as the thing that was the container
01:18:07.720 | for the MatMul.
01:18:09.000 | So, let's talk a little bit about how Tensor works
01:18:10.560 | because this is the first really,
01:18:12.240 | so for TensorFlow piece of this, Tensor is a type.
01:18:15.800 | And Tensor can have multiple different elements in it.
01:18:18.560 | Like we talked about before, you can create one with zeros
01:18:20.920 | or random numbers.
01:18:22.920 | And the nice thing about tensors is that they carry a shape,
01:18:26.400 | just like you'd expect, and so you can get it with a dot shape.
01:18:29.440 | So, here you can see we have a 5 by 784,
01:18:32.040 | just like you might expect.
01:18:33.520 | And here we have a two-dimensional tensor,
01:18:36.640 | and you can print it out, and it's a two-dimensional tensor,
01:18:38.840 | just like you would kind of expect.
01:18:41.560 | Python has the @ operator to do MatMuls of two tensors.
01:18:46.280 | Swift has the same thing, but it uses the nice Unicode thing.
01:18:50.360 | There's an easy way to type this if you're on a Mac
01:18:53.160 | or if you're using the compose key on Windows.
01:18:55.440 | Or if you don't like Unicode, that's also totally cool.
01:18:58.560 | You can just use the MatMul function and just spell it out.
01:19:01.120 | And so, you know, this is an example
01:19:03.000 | of Swift just wanting to work the way you want to work.
01:19:05.280 | And if you like math, then you can have math.
01:19:08.360 | If you want to type things out, you can do that too.
01:19:10.560 | They're both great.
01:19:12.200 | Tensors do all the basic stuff you'd expect.
01:19:15.920 | You can reshape them with the reshape function.
01:19:18.240 | They support square root and all the other math stuff.
01:19:20.240 | It all works the way you'd expect.
01:19:21.560 | It has element-wise operations like add and multiply
01:19:24.480 | and square root and pow.
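In code, the Tensor basics just covered look roughly like this (Swift for TensorFlow as of this lesson; the APIs may have evolved since):

```swift
import TensorFlow

let zeros = Tensor<Float>(zeros: [5, 784])
print(zeros.shape)      // shape is [5, 784]

let a = Tensor<Float>(randomNormal: [5, 3])
let b = Tensor<Float>(randomNormal: [3, 4])

let c1 = a • b          // Unicode matmul operator
let c2 = matmul(a, b)   // spelled-out equivalent, same result

let d = sqrt(c1 * c1)   // element-wise math works as you'd expect
let e = c1.reshaped(to: [4, 5])
```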
01:19:27.160 | >> No, we have a question from earlier.
01:19:29.800 | Why was it unsafe, what you did?
01:19:32.240 | >> So, what we did was we turned off bounds checking.
01:19:35.200 | And so, if I write code that -- if I have an array of 10 numbers,
01:19:38.800 | and in Swift, if I access out the 11th element,
01:19:41.920 | it will explode and tell me that that's invalid.
01:19:44.560 | If you use unsafe, then it will let you do that.
01:19:47.920 | And so, whatever happens to be in memory beyond the end
01:19:50.520 | of your array, you're now poking it.
01:19:52.120 | And, you know, you should not do that,
01:19:54.760 | but you're taking the guardrails off.
01:19:57.280 | And so, this is -- Swift is trying to be default --
01:20:00.680 | by default safe, and it's trying to help you.
01:20:02.400 | It's trying to check things for you.
01:20:03.720 | But if you want to, you can just rip off all the guardrails.
01:20:06.000 | And just like we showed you with Python,
01:20:08.240 | like you can literally do things as dynamic as Python
01:20:10.360 | if that's what you'd like.
01:20:11.680 | But, you know, the defaults are there to help you out.
01:20:14.640 | >> Yeah, so, Python programmers, a lot of them won't be familiar
01:20:17.360 | with this idea.
01:20:18.660 | But in things like C, unsafe code is code where you're working
01:20:22.440 | with memory that hasn't been initialized, or it's been freed.
01:20:25.520 | And it's a really big problem if you're using it
01:20:27.520 | like in production or something.
01:20:29.000 | Because that kind of thing is how people can
01:20:31.040 | like inject shell code into your server.
01:20:33.800 | >> Security form.
01:20:35.100 | >> And steal your users and whatever.
01:20:36.520 | So, you know, you should -- I think it's fine
01:20:38.960 | in a Jupyter notebook, though.
01:20:40.260 | >> Yeah, yeah.
01:20:41.560 | So, coming back to tensor, you know, you can add them.
01:20:44.880 | You can multiply them.
01:20:46.160 | You can -- like all the normal stuff you'd expect is on tensor.
01:20:49.560 | Tensors also, if I run the right cell, have a bunch
01:20:54.080 | of methods that do cool things like convolutions and other stuff
01:20:56.800 | like that that we'll talk about later.
01:20:58.920 | One of the interesting things about Swift is
01:21:01.080 | that it likes comparisons to return Booleans.
01:21:03.760 | And so, you'll see that if you compare two tensors,
01:21:06.000 | it will give you a single Boolean, an ordering
01:21:09.120 | of the two tensors.
01:21:10.420 | But sometimes you want to get a tensor of Booleans back.
01:21:13.400 | And so, Swift calls these the point-wise operators.
01:21:17.640 | And so, if you put a dot before the less than or the greater than
01:21:20.320 | or the equals or whatever, it will do a tensor comparison.
01:21:24.080 | >> Yeah. And I get burnt by this all the time in NumPy and PyTorch
01:21:27.280 | that doesn't have this level of consistency.
01:21:29.240 | So, I think that this design call is awesome, this idea
01:21:32.440 | that Boolean operations always return Booleans
01:21:36.320 | and point-wise operations return point-wise Booleans.
01:21:39.640 | It's a good idea.
01:21:40.960 | >> And then you have reductions like any and all that say, hey,
01:21:43.400 | if I have a tensor of Booleans, I can check to see
01:21:46.320 | if they're all set or if any of them are set.
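A sketch of the point-wise comparison behavior just described (assuming the operator and method names from Swift for TensorFlow at the time of this lesson):

```swift
import TensorFlow

let a = Tensor<Float>([1, 2, 3])
let b = Tensor<Float>([3, 2, 1])

// Plain == behaves like Equatable and returns a single Bool:
let equal: Bool = (a == b)    // false

// The point-wise operators (leading dot) return a Tensor<Bool>:
let mask = a .< b             // [true, false, false]

// Reductions over a Tensor<Bool>:
let anySet = mask.any()       // true: at least one element is set
let allSet = mask.all()       // false: not every element is set
```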
01:21:48.920 | >> So, basically then, I mean, the next part
01:21:51.920 | of the notebook is just saying, hey, look, all the stuff
01:21:54.000 | that you've seen before looks exactly the same
01:21:56.920 | as what you've seen before.
01:21:58.200 | Sometimes the words change, like unsqueeze is called
01:22:01.080 | expandingShape(at:), which is a rather swifty way of saying things.
01:22:05.000 | But there's -- in a lot of these notebooks, you'll find that there's
01:22:09.320 | like lots of details where we've just basically copied the Python
01:22:13.480 | code and we're not going to show you all those details
01:22:15.960 | because they're the same.
01:22:17.720 | >> Yep. Now, let's talk about matmul on tensor.
01:22:20.880 | So, what we've done here is we've defined the same matmul
01:22:23.160 | that we had before and before we took two arrays
01:22:25.760 | and we took two dimensions.
01:22:28.040 | The tensor carries a shape.
01:22:29.360 | So, here we implemented matmul on tensor.
01:22:32.200 | We start by creating a zero tensor, we loop over it all.
01:22:34.400 | Now we have our two-dimensional indexing just
01:22:36.200 | like you saw before with NumPy.
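The naive Tensor matmul being timed here looks roughly like this (a sketch; every scalar subscript below dispatches a separate TensorFlow op, which is the source of the slowness):

```swift
import TensorFlow

func matmulNaive(_ a: Tensor<Float>, _ b: Tensor<Float>) -> Tensor<Float> {
    let (m, n, p) = (a.shape[0], a.shape[1], b.shape[1])
    var res = Tensor<Float>(zeros: [m, p])
    for i in 0..<m {
        for j in 0..<p {
            for k in 0..<n {
                // one tiny TensorFlow op dispatched per element
                res[i, j] += a[i, k] * b[k, j]
            }
        }
    }
    return res
}
```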
01:22:38.360 | When you run this, what you'll see is this is a lot slower.
01:22:42.160 | This takes seven seconds to do one matmul
01:22:44.400 | where before it was taking 0.07 --
01:22:46.480 | >> 0.1 seconds.
01:22:48.080 | >> Yeah, milliseconds.
01:22:49.480 | So --
01:22:51.040 | >> So, what is this?
01:22:52.320 | This is about --
01:22:53.640 | >> It's thousands of times.
01:22:54.960 | >> 10,000 times faster.
01:22:56.240 | >> Yeah. So, why is that, Jeremy?
01:22:57.760 | >> Why is that?
01:22:59.080 | The first thing I want to say is that hopefully
01:23:02.720 | at this point you're thinking this is a problem
01:23:06.480 | because it's kind of like the exact opposite of everything
01:23:09.680 | that Chris has been telling us and I've been telling you
01:23:11.800 | about why this is good.
01:23:12.960 | Like, what's the point of something that's infinitely
01:23:15.240 | hackable if there's this tensor flow layer we go beneath
01:23:18.920 | and that it's so slow that we can't really actually write
01:23:21.440 | things that run? I mean, seven seconds
01:23:23.800 | for a small matrix multiplication?
01:23:26.080 | Extraordinary.
01:23:27.120 | So, we would not -- we would not be running this course
01:23:32.400 | if this is where we were heading, right?
01:23:34.960 | This is where we are now and it's a temporary situation
01:23:40.240 | that we're fixing.
01:23:41.560 | And so, let me explain what's going on
01:23:43.760 | and how it's being fixed, right?
01:23:46.880 | So, the first thing to point out is that when you work
01:23:50.440 | with PyTorch, we have a similar issue, right?
01:23:54.440 | Is like we don't write PyTorch triply nested for loops either,
01:24:01.960 | right, and the reason we don't is that we need PyTorch
01:24:04.640 | to have a reasonable amount of work to do each time we get it
01:24:08.200 | to do something, right?
01:24:09.720 | So, we kind of say here's a whole matrix A,
01:24:12.880 | here's a whole matrix B, there it all is,
01:24:15.960 | multiply them together and then it says here's the whole thing
01:24:18.680 | multiplied together and that's what we do.
01:24:21.360 | So, it's like if PyTorch was an airplane, right,
01:24:24.200 | and we want to send our stuff off to China, we pack it all
01:24:26.880 | in a bag and we put the bag in the airplane
01:24:29.640 | and it gets sent off to China.
01:24:31.840 | As opposed to the triply nested for loop version,
01:24:34.440 | which is where I take a sock and I put it in an airplane
01:24:38.000 | and it flies to China and back and then I put it
01:24:39.680 | in my next sock and it flies there and back.
01:24:42.120 | And it's going to take, that's a fast airplane, right,
01:24:45.960 | but it's just not an efficient way to use it, right?
01:24:50.160 | So, we already have this issue which is you've got
01:24:53.000 | to give PyTorch enough work to do to make this latency,
01:24:58.000 | this overhead worthwhile.
01:25:00.280 | Now, TensorFlow was designed in a very different way
01:25:03.480 | to PyTorch and for those of you that did the earliest courses
01:25:08.320 | with fast AI, this will look very familiar, right?
01:25:10.680 | It's actually a really fascinating programming model.
01:25:13.960 | You say there will be later on a float called X
01:25:19.160 | and later on I will want to multiply that float by two.
01:25:25.360 | Now, set up a session where we're going to do some stuff
01:25:28.840 | and run this computation graph, which could have lots
01:25:32.680 | of things going on in it, and run it in these things, right?
01:25:37.880 | So, I basically kind of set up this whole series
01:25:40.240 | of computations and then I pass in some data.
01:25:43.880 | So, this is a very different feel to PyTorch, right?
01:25:49.120 | And because TensorFlow was built this way, TensorFlow, to me,
01:25:53.320 | does not behave like a plane.
01:25:57.040 | It behaves like a ship or actually a ship designed
01:26:01.160 | for shipping ships, or actually a shipping ship designed
01:26:04.800 | for shipping shipping ships, which is this particular one,
01:26:07.840 | the MV Blue Marlin.
01:26:09.720 | So, if you have a shipping ship shipping ships ship,
01:26:15.080 | then you need to use it in a different way, which is
01:26:17.160 | if you want to get all of your socks, all of the socks
01:26:20.080 | in America to China, you send them all, send all of your ships
01:26:25.280 | off to all of the ports in America,
01:26:26.880 | everybody dumps their socks on and they all get sent
01:26:29.480 | to China and we're all happy, right?
01:26:31.960 | Now, to take advantage of this extraordinary power,
01:26:35.400 | you have to use it in a certain way and you have
01:26:36.960 | to have certain things that you need to be able to do.
01:26:39.640 | So, like, if you're Google and you want
01:26:43.480 | to run all the world's search engine queries,
01:26:46.680 | this makes a hell of a lot of sense.
01:26:49.400 | Now, TF Eager is the kind of the new hot thing for TensorFlow
01:26:54.880 | and it's designed to, or it does look like PyTorch, right?
01:27:00.200 | So, this is what happens when you say TF.enableEagerExecution,
01:27:03.400 | that's becoming the default in TensorFlow.
01:27:06.240 | You can say, here's my number.
01:27:07.560 | I'm not saying there will be a number later.
01:27:09.480 | I say, this is my number and this is my matrix multiplication,
01:27:12.960 | right, and I can print it.
01:27:16.520 | The thing about this is, though, is that this is kind
01:27:21.040 | of syntax sugar on top of the ship, shipping, ship,
01:27:24.400 | shipping, ship, ship, right?
01:27:25.800 | Or whatever the hell the thing is.
01:27:27.120 | Because we're still using the same,
01:27:29.560 | a lot of the same kind of foundations and some
01:27:32.280 | of it's been optimized but only a bit, right?
01:27:35.120 | So, as I say this today, as of April 2019, a like 5 by 5 matrix,
01:27:41.720 | like a tiny matrix multiply on a GPU
01:27:44.440 | with TF Eager takes 0.28 milliseconds,
01:27:48.760 | which is 10 times longer than PyTorch takes, right?
01:27:51.640 | And so, we still just have a lot of overhead
01:27:54.360 | and so TF Eager is not a solution to writing the kind
01:28:00.960 | of low level get down to the bottom stuff that Chris is saying,
01:28:06.520 | you can do with Swift.
01:28:07.840 | >> Yeah, but also neither are GPUs.
01:28:09.760 | The GPU is not going to be fast
01:28:11.360 | at a 5 by 5 matrix multiply either.
01:28:13.400 | >> Right, right.
01:28:14.680 | So, I mean, it's, but it's not, you know,
01:28:16.640 | we want something, we want something better.
01:28:19.400 | >> Yes, right.
01:28:21.160 | >> So, TensorFlow has this big ecosystem of things to try
01:28:24.600 | and kind of fill in around this issue
01:28:28.200 | of having this huge kind of mechanism that works
01:28:30.360 | in a particular way to make sure that, you know,
01:28:32.480 | you can put it on mobile phones or that you can do it
01:28:34.760 | on web servers or whatever, right?
01:28:38.240 | But the good news is that what's happening at the moment
01:28:43.360 | and the reason we're seeing this speed, right,
01:28:50.960 | is that behind the scenes, Swift
01:28:53.800 | for TensorFlow is using TF Eager.
01:28:57.800 | And this is like a great choice because it lets us
01:29:00.120 | like do this course, it lets us say like here's how you use it,
01:29:03.400 | we can build things on top of it whilst the real stuff is being built
01:29:08.400 | behind the scenes and the real stuff which is to sit on top
01:29:13.080 | of this thing called MLIR which Chris can tell us a little bit
01:29:16.320 | about which is basically gets all of that compiler goodness
01:29:20.240 | that you've seen and allow that to work with the GPU and the CPU
01:29:25.640 | and different types of accelerators and let you write Swift, right?
01:29:31.080 | So the reason I mention this is that for probably as long
01:29:34.280 | as this course is relevant, you know, like the next year,
01:29:38.760 | the true potential of what we're talking about,
01:29:41.800 | you kind of won't be able to see it, right?
01:29:44.080 | We're actually building for a future that's not here yet.
01:29:46.680 | This is like a year away.
01:29:48.320 | But when we get there, all the stuff that Chris is showing you,
01:29:54.480 | we'll be able to write stuff that looks,
01:29:58.080 | that could even look like this.
01:30:01.560 | >> Yeah, so if I were, a different way to explain it,
01:30:04.280 | a year from now it's going to be mind blowing.
01:30:07.040 | Like the future, you're going to be able
01:30:08.400 | to do stuff you've never even thought that was possible
01:30:10.920 | and use these accelerators in ways
01:30:12.440 | that are just completely unthinkable
01:30:14.040 | unless you're writing low-level CUDA today.
01:30:16.680 | Like there are certain humans in the world, like Scott Gray is one
01:30:19.520 | of these people who can make an accelerator do crazy things
01:30:22.800 | that nobody even thought was possible.
01:30:24.520 | And that's what we're trying to do, but in a workbook.
01:30:27.000 | >> Right. And the reason this matters is that there are vast areas
01:30:30.960 | of unexplored research territory because, I mean,
01:30:35.360 | most people can't write the CUDA code, and even those that can,
01:30:39.240 | it takes so long and it has so many errors,
01:30:42.200 | you just don't, right?
01:30:43.480 | So in a year's time, we'll be able to do stuff
01:30:46.480 | that people just aren't even trying to do yet.
01:30:48.920 | >> But one of the cool things about this is you don't have
01:30:51.080 | to wait a year, so next lesson we'll show you that XLA is here
01:30:54.160 | today, XLA is super awesome, it's a really important part
01:30:57.240 | of the TensorFlow ecosystem and it's way better
01:30:59.640 | than the Torch JIT.
01:31:00.960 | >> Right. So just to explain, yeah.
01:31:02.440 | >> So we want to like completely jump over the industry
01:31:04.960 | and do something that is mind-expanding.
01:31:07.480 | >> Right.
01:31:08.760 | >> But even today, TensorFlow is a lot of power and XLA allows you
01:31:11.880 | to express and build really cool things
01:31:14.360 | with super high performance.
01:31:15.680 | >> Exactly.
01:31:16.960 | So XLA is this really nice kind of intermediate thing
01:31:18.560 | where it's much more mature than the PyTorch JIT,
01:31:20.920 | it's been around for a couple of years now.
01:31:23.520 | It's a compiler that will turn your code into stuff that's kind
01:31:27.480 | of similar-ish performance to what you might see
01:31:30.280 | from PyTorch JIT, probably a lot less rough edges.
01:31:33.120 | >> It doesn't generate blobs of C++ and try to compile them again.
01:31:36.960 | It's a principle compiler.
01:31:38.520 | >> So it's a really neat path because it allows us
01:31:40.840 | to do this course now, it allows you to start playing with this now
01:31:43.960 | in a couple of months, it allows you to get a lot of performance
01:31:46.960 | for a lot of things that you might want to play with and it means
01:31:50.400 | that by the time MLIR comes, we'll be all ready
01:31:54.160 | to hit the ground running.
01:31:55.480 | >> Cool.
01:31:56.760 | >> And is there a way to make sure the matmul
01:31:59.760 | or other functions are correctly using shared memory on the GPU?
01:32:03.560 | For example, using tiling
01:32:04.960 | to make sure you aren't constantly busting the cache
01:32:07.200 | of shared memory on the GPU.
01:32:08.520 | >> We're not going to talk about this next week
01:32:09.920 | so maybe we could go back to that or?
01:32:11.680 | >> Well, so I think that the thing to know is
01:32:13.800 | that this is not something you would write
01:32:15.480 | in Swift for TensorFlow, right?
01:32:16.800 | You would not poke a tensor one float at a time.
01:32:19.600 | It was just not designed for that.
01:32:21.200 | And so you can go through all the steps.
01:32:22.760 | This is very similar to the Python workbook.
01:32:24.800 | But what you end up wanting to write is, let's see here.
01:32:29.120 | You just write this where you write something
01:32:32.600 | where you take two matrices and you're multiplying together
01:32:35.400 | or you use the Unicode one or the matmul one and it goes fast
01:32:38.280 | and it takes 0.02 seconds, which is faster than the Swift version
01:32:42.160 | because it's using the GPU.
01:32:44.160 | It's properly tile blocked.
01:32:45.920 | If you run on the CPU, it uses all the threads on your computer
01:32:48.520 | and it goes really fast.
01:32:50.240 | And so the way to think about tensor is that it's meant
01:32:52.960 | for these big operations.
01:32:55.560 | It's not meant for one float at a time.
01:32:58.000 | >> And we will see next week some really interesting stuff coming
01:33:00.720 | down the line with stuff where you can write kind
01:33:06.200 | of tiled algorithms in ways that are much more concise
01:33:09.200 | than the triply nested for loops
01:33:10.960 | but much more flexible than the matmul.
01:33:13.680 | >> Yeah. Join.
01:33:15.640 | >> Sorry, another question just came in.
01:33:17.400 | How do LLVM, MLIR and XLA relate to each other?
01:33:22.120 | >> That would be better explained with slides which we'll go
01:33:25.120 | into next time, I think.
01:33:26.560 | But LLVM, the simple way to explain it is
01:33:30.240 | that it is really good at CPUs.
01:33:33.120 | It's a little bit of an oversimplification
01:33:34.760 | because we do use it for some GPU stuff.
01:33:37.120 | But LLVM really helps you with the one float at a time kind
01:33:40.720 | of a thing if you're going to a simpler processor.
01:33:43.800 | XLA is really good at tensors and so it's a tensor compiler
01:33:47.960 | and so it's really good at saying I have these big tensor
01:33:50.920 | operations, I have convolutions to get maximum performance
01:33:54.440 | out of a CPU or a GPU or a TPU for example.
01:33:58.080 | You have to know about tiling, you have to know about fusion,
01:34:00.880 | you have to know about a lot
01:34:02.180 | of these low level systems things before you then hand it
01:34:06.520 | off to LLVM that does the low level stuff.
01:34:09.400 | And so XLA talks to LLVM for GPUs for example
01:34:13.240 | and for CPUs and there's the way to think
01:34:15.680 | about it is XLA does all the tensor stuff
01:34:17.640 | and LLVM does all the float and small vector stuff.
01:34:21.040 | MLIR is like XLA in a certain way
01:34:23.560 | but it's tackling graph level optimizations in tensor flow
01:34:27.200 | and kind of expanding XLA beyond just dense linear algebra
01:34:31.520 | because there's a lot of interesting sparse things
01:34:34.600 | and other things that are coming down the pipeline
01:34:36.480 | that are really exciting.
01:34:38.400 | >> So yeah, so basically I mean we won't look at the rest
01:34:41.680 | of this notebook other than to say that the broadcasting stuff
01:34:45.080 | that we saw is all here.
01:34:47.320 | So you can kind of see how that all looks at the moment.
01:34:50.800 | >> You can run ops on the CPU or the GPU.
01:34:53.120 | I mean all that kind of stuff.
01:34:54.440 | >> All that stuff is all here and don't worry
01:34:56.440 | about the performance, it's really slow at the moment
01:34:58.240 | for the reason we mentioned, but it totally won't be.
01:35:01.120 | And you can also see matrix multiplications of different sizes
01:35:04.320 | and how to take its timing and so forth.
01:35:07.760 | So did you want to go to Raw now or?
01:35:11.200 | >> Well do you want to do this or do you want to go to 11?
01:35:12.960 | Do we have time?
01:35:14.280 | >> We have time to do this now.
01:35:15.560 | >> Okay, so.
01:35:16.880 | >> Five to 10 minutes.
01:35:18.160 | >> Okay, cool.
01:35:19.440 | So one of the really cool things about the stack is
01:35:21.640 | that tensorflow is a really mature ecosystem.
01:35:23.520 | It has hundreds of different operators available.
01:35:27.360 | There's pros and cons of this.
01:35:28.680 | So tensorflow kind of grew organically over time a little bit
01:35:31.480 | and so it has a lot of things in its toolbox.
01:35:35.920 | What Swift for Tensorflow does is it tries to curate that
01:35:38.640 | and it has tensor and the way tensor works is it's the struct
01:35:43.000 | and the struct is implemented in terms
01:35:44.760 | of those low level tensor operations.
01:35:47.080 | And so if you look inside tensor and here there's a link
01:35:49.960 | so you can go click on it and see the implementation.
01:35:52.440 | Tensor has this thing called tensor handle that is
01:35:55.200 | under the covers basically the tensorflow low level thing
01:35:59.920 | that eager mode uses.
01:36:01.240 | And if you look at plus, what plus does on two tensors is it
01:36:05.280 | calls this raw add operation.
01:36:08.760 | And the way that this works is this is just directly talking
01:36:13.360 | to the add op in tensorflow.
01:36:16.120 | And the add op is implemented with cuDNN or it's implemented
01:36:19.000 | with XLA or it's implemented in different ways
01:36:20.960 | for different pieces of hardware.
01:36:22.960 | But this is just a simple syntactic sugar thing
01:36:26.200 | where we're saying hey plus turns into tensorflow add.
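The forwarding pattern just described can be sketched in plain Swift. This is a toy stand-in, not the real Tensor implementation: `MyTensor` and `rawAdd` are illustrative names playing the role of `Tensor` and the low-level TensorFlow add op.

```swift
// Toy sketch: an operator that is pure syntactic sugar over a "raw" op.
struct MyTensor {
    var values: [Float]

    // Stand-in for the low-level TensorFlow add op.
    static func rawAdd(_ lhs: MyTensor, _ rhs: MyTensor) -> MyTensor {
        MyTensor(values: zip(lhs.values, rhs.values).map { $0.0 + $0.1 })
    }

    // `+` just forwards to the raw op, like Tensor's `+` does.
    static func + (lhs: MyTensor, rhs: MyTensor) -> MyTensor {
        rawAdd(lhs, rhs)
    }
}

let a = MyTensor(values: [1, 2, 3])
let b = MyTensor(values: [10, 20, 30])
print((a + b).values)  // [11.0, 22.0, 33.0]
```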
01:36:29.560 | Now again, tensorflow has tons of cool stuff and it has stuff
01:36:33.160 | that I barely understand with lots of mathy things
01:36:38.400 | and triangular decomposition things
01:36:38.400 | and like Bayesian propagation of things that I've--
01:36:42.600 | >> We have an excellent course about triangular decomposition
01:36:44.760 | if you--
01:36:45.600 | >> Awesome.
01:36:47.120 | I'm going to try to survive next week and then I'll take it.
01:36:52.200 | And so we haven't curated all of the things
01:36:55.480 | that are potentially interesting.
01:36:57.000 | And so what you can do is you can actually add new things
01:36:59.760 | to tensor.
01:37:01.080 | And so one example of that right here is so you can get
01:37:05.720 | like zeros like if you go into here, let's see if this is.
01:37:11.640 | So with tab completion you can see all of the interesting things.
01:37:18.280 | addManySparseToTensorsMap, addN, adjustContrast, asin,
01:37:23.640 | like it's got lots and lots and lots and lots and lots and lots
01:37:26.520 | and lots of--
01:37:27.800 | >> And this is super cool, particularly if you're watching this
01:37:30.720 | between like about April and about September,
01:37:34.080 | like in the period where maybe the XLA stuff isn't
01:37:37.080 | as fully fleshed out.
01:37:39.920 | You probably care about this because there's--
01:37:43.360 | >> Lots and lots and lots and lots and lots and lots and lots
01:37:46.200 | and lots of--
01:37:47.480 | >> Which we haven't necessarily surfaced yet.
01:37:48.920 | So for example, somebody today was saying,
01:37:50.680 | how do I switch from RGB to BGR format? And somebody said,
01:37:57.520 | oh, there's something in TensorFlow called reversed
01:38:00.360 | and so here's the answer, raw.reversed.
01:38:03.000 | So worth knowing about this.
01:38:04.320 | >> Yeah, and so one of the things we use for X-Res
01:38:06.400 | and other image models in this course is, hey,
01:38:08.960 | we need to be able to load a file.
01:38:10.440 | And you could do that with Python, that's fine.
01:38:12.280 | TensorFlow has great stuff for doing this
01:38:14.080 | and so here we just use raw read file.
01:38:16.760 | And so all we're doing is we're adding a method
01:38:18.960 | to a string tensor and we're saying, hey,
01:38:21.600 | if you want to create a string tensor from a file,
01:38:25.480 | we can have read tensor, we can just use read file
01:38:30.960 | and now I can say, string, give me a string tensor, read file,
01:38:35.920 | foo and I added a decode JPEG method on here too
01:38:39.960 | so now I can say decode JPEG, JPEG and I got my file, right?
01:38:46.640 | And so this is one of the cool things about this platform is
01:38:49.440 | that TensorFlow has all this functionality.
01:38:51.480 | We're not trying to hide it, we're just trying
01:38:52.800 | to curate it a little bit but again, you can just go add
01:38:55.880 | whatever you need.
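The pattern described here, adding a curated method with default arguments on top of a lower-level capability via an extension, can be sketched like this. The names are illustrative (this is not the real Swift for TensorFlow `StringTensor` API); Foundation's file reading stands in for the raw TensorFlow op.

```swift
import Foundation

// Toy sketch: extend an existing type with a nicer method that wraps
// a lower-level call, with a default argument.
extension String {
    static func fromFile(_ path: String, encoding: String.Encoding = .utf8) -> String? {
        // Returns nil if the read fails, instead of throwing.
        try? String(contentsOfFile: path, encoding: encoding)
    }
}

// Usage: write a scratch file, then read it back through the new method.
let path = NSTemporaryDirectory() + "demo.txt"
try? "hello".write(toFile: path, atomically: true, encoding: .utf8)
print(String.fromFile(path) ?? "missing")  // hello
```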
01:38:57.180 | >> Yeah, so one of the people
01:38:58.480 | in the study group today was building an audio library
01:39:00.000 | with Swift for TensorFlow and we haven't surfaced any of that
01:39:03.200 | so they were grabbing, you know, raw dot decode WAV or something
01:39:07.440 | and they had it all up and running.
01:39:09.080 | >> Yeah, and it's super cool.
01:39:10.360 | And again, Swift gives you nice ways to build these things
01:39:12.800 | as APIs with default arguments and all this nice stuff
01:39:15.320 | and so you get a lot of design space to do things
01:39:17.320 | that work the way you'd like them to work.
01:39:19.400 | >> Cool. So the way we're going
01:39:21.760 | to do this is we've kind of gone like super, super bottom up.
01:39:26.880 | I must admit I thought we had done bottom up before
01:39:29.200 | but this is another level of bottom up.
01:39:31.480 | >> Then he brought a compiler guy.
01:39:33.040 | >> Yeah, then we brought a compiler guy who, you know,
01:39:36.320 | is always good at making me feel small and insignificant.
01:39:40.080 | And so, but now let's jump back up to the top again
01:39:44.720 | to see where we're going to end up and then next week,
01:39:50.160 | we're going to kind of flesh out all the stuff
01:39:53.040 | between the middle, right?
01:39:54.600 | So I'm going to jump to notebook 11.
01:39:56.680 | And notebook 11 is interesting because this is the one
01:40:00.120 | where we train an xresnet on imagenet, right?
01:40:05.240 | So this is where we're heading.
01:40:07.000 | So every time we import the previous notebook,
01:40:12.120 | just like we do in Python, the previous notebooks, however,
01:40:16.440 | aren't just numbered but they also have the name.
01:40:18.400 | That's the only difference.
01:40:19.720 | And then this is just the equivalent
01:40:22.840 | of %matplotlib inline in this environment.
01:40:30.040 | So here, load data, we'll show you how we built something
01:40:37.520 | that downloads imagenet but it basically looks almost exactly
01:40:40.440 | like the very similar to the download MNIST thing
01:40:44.040 | you've already seen.
01:40:45.320 | And we've created an item list which has extensions.
01:40:49.680 | And we've created a split data which takes an item list.
01:40:54.280 | And one of the nice things here is
01:40:56.320 | that we don't really need something
01:40:58.760 | like functools.partial in Swift
01:41:02.400 | because now we can just pass in a trailing closure
01:41:05.520 | which as Chris described, if the last parameter is a closure,
01:41:12.080 | then you can just whack it in curly brackets.
01:41:14.200 | You don't even need a return or anything.
01:41:16.200 | And you don't even have to give it an argument name
01:41:18.880 | because you can use the default ones.
01:41:20.920 | So we're saying split this item list by grandparent.
01:41:26.720 | This is the file name that you're going to get.
01:41:28.360 | This is basically like the equivalent of doing partial,
01:41:30.720 | right, and it's going to be some different validation set.
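The trailing-closure-as-partial idea just described looks like this in a small sketch. `ItemList` and its `split` method are illustrative stand-ins, not the real fastai-for-Swift API.

```swift
// Toy sketch of trailing closure syntax standing in for functools.partial.
struct ItemList {
    var items: [String]

    // The last parameter is a function, so callers can pass it as a
    // trailing closure in curly braces.
    func split(by isValid: (String) -> Bool) -> (train: [String], valid: [String]) {
        (items.filter { !isValid($0) }, items.filter(isValid))
    }
}

let il = ItemList(items: ["train/cat/1.jpg", "valid/cat/2.jpg"])
// No argument label, no `return`, and `$0` as the default argument name:
let sd = il.split { $0.hasPrefix("valid/") }
print(sd.train, sd.valid)  // ["train/cat/1.jpg"] ["valid/cat/2.jpg"]
```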
01:41:34.160 | And so now we can create our label data
01:41:39.120 | and we need a processor.
01:41:40.480 | So we have, again, a category processor.
01:41:42.200 | So you can say we've got a whole data blocks API here.
01:41:45.720 | One of the things that I guess you're going to talk about next
01:41:49.000 | week, Chris, is end and mutation and stuff.
01:41:51.920 | >> Sure.
01:41:52.240 | >> Yeah, OK, so basically in Swift, as Chris mentioned,
01:42:00.880 | most of the time we use struts.
01:42:02.800 | And as Chris will describe to you,
01:42:04.360 | struts are things that normally don't change.
01:42:07.760 | But you can create something that kind of feels a lot
01:42:10.840 | like a C++ reference or a C pointer,
01:42:13.480 | but it's much cooler by adding an ampersand.
01:42:15.960 | Because remember, processors actually change their state
01:42:20.520 | because we get like a vocabulary, for example,
01:42:22.960 | the first time we use a processor on the training set.
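The ampersand pattern just mentioned can be sketched as follows: structs are value types, but an `inout` parameter lets a function mutate the caller's variable when it is passed with `&`. `CategoryProcessor` here is a toy stand-in for the real processor.

```swift
// Toy sketch of `inout` and `&` with a stateful processor.
struct CategoryProcessor {
    var vocab: [String] = []

    // Builds the vocabulary the first time it sees each label.
    mutating func process(_ labels: [String]) -> [Int] {
        for l in labels where !vocab.contains(l) { vocab.append(l) }
        return labels.map { vocab.firstIndex(of: $0)! }
    }
}

func label(_ labels: [String], with proc: inout CategoryProcessor) -> [Int] {
    proc.process(labels)  // this mutation is visible to the caller
}

var proc = CategoryProcessor()
let ids = label(["cat", "dog", "cat"], with: &proc)  // note the `&`
print(ids, proc.vocab)  // [0, 1, 0] ["cat", "dog"]
```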
01:42:26.360 | So now we've got a split label data set.
01:42:29.520 | And then we've added a to data bunch and we can pass
01:42:33.160 | in all the normal stuff, including a batch size.
01:42:37.640 | So next thing we can do is we can do transformations.
01:42:43.000 | And again here, we can use a trailing closure
01:42:45.880 | to basically do a partial, to say that we're going to do
01:42:49.200 | resize in our transformations.
01:42:53.480 | So then we'll grab a batch.
01:42:56.200 | Something that I think Chris will probably talk
01:42:57.880 | about next week more is this thing.
01:43:00.800 | But basically in Swift, very often you want to be able
01:43:06.480 | to say, hey, this is going to return either a batch of data
01:43:10.400 | or maybe it was going to return nothing at all, right?
01:43:13.320 | Which in Python, we use the optional type for that.
01:43:18.560 | And it's called the same thing in Swift, right?
01:43:20.760 | >> Yeah, none.
01:43:21.520 | >> None, yeah.
01:43:22.840 | So basically what happens is if you have something
01:43:27.440 | that might not return anything, so one batch might not return
01:43:30.200 | anything because there might be nothing to return.
01:43:32.160 | It can return nothing.
01:43:33.440 | And then the exclamation mark just says, assume that it's something.
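A minimal sketch of the optional pattern just described, with an illustrative `oneBatch` stand-in:

```swift
// A function that may return a value or nothing, and `!` to assert
// that the value is there.
func oneBatch(from batches: [[Int]]) -> [Int]? {
    batches.first  // nil when there are no batches
}

let batch = oneBatch(from: [[1, 2, 3]])
print(batch!.count)  // 3 — `!` force-unwraps; it would trap if batch were nil

// Safer alternative when nil is a real possibility:
if let b = oneBatch(from: []) { print(b.count) } else { print("no batch") }
```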
01:43:39.440 | So we can look at the shapes of that batch.
01:43:41.760 | And look, we've even got show batch.
01:43:45.680 | So it's been really fun, this process of, you know,
01:43:49.520 | in the last couple of weeks of basically saying,
01:43:52.680 | what does fast AI look like on Swift?
01:43:58.960 | And one thing I'll say is like a lot of these notebooks have been written
01:44:04.960 | by Sylvain in particular and by me a little bit.
01:44:08.720 | And we don't know Swift at all well.
01:44:10.720 | So any good Swift programmers looking
01:44:12.840 | through those notebooks thinking, oh, this is nice,
01:44:15.640 | but it'd be even more Swift-y if you did blah.
01:44:18.400 | Please let us know in the forum, because we're super interested
01:44:20.800 | to learn how to write better Swift.
01:44:23.000 | >> And I've been super interested to learn all the ML.
01:44:25.280 | It's been great.
01:44:26.600 | >> It has been great.
01:44:27.880 | I mean, it's, you know, in one sense, it's a good sign
01:44:32.400 | that you're learning fast AI for Swift from the people who started the fast AI
01:44:37.760 | in Swift projects, but on another sense, I know nothing about Swift
01:44:41.600 | and Chris doesn't know much about deep learning,
01:44:43.440 | so maybe it's the worst of all possible worlds.
01:44:45.360 | I don't know.
01:44:46.640 | >> No, I think we're all learning together.
01:44:50.080 | >> So anyway, yeah, it's been super fun.
01:44:51.640 | So as you can see, we've got a data blocks API that's now working.
01:44:57.000 | The other thing I mentioned, as you'll see next week,
01:44:59.440 | is the way we've got this working is it's using a TensorFlow API called tf.data,
01:45:06.360 | which is actually a lot better than a lot of data APIs,
01:45:11.040 | but it's still not nearly as good as I would like, and I would love to,
01:45:15.360 | as a community, start building out the next version that uses
01:45:19.160 | like Swift's libdispatch to do the threading and maybe openCV
01:45:23.520 | to do the transformations and stuff like that.
01:45:25.840 | Like we can build, I think, a data blocks,
01:45:29.280 | something like the Python data blocks API, but that is like native.
01:45:34.200 | It's not talking to anything else.
01:45:36.160 | >> Yeah.
01:45:37.200 | >> Anyway, so now we've got batches of data.
01:45:39.480 | We can train a model as soon as we have one.
01:45:42.960 | So let's create an x-resnet model, and as you've already seen
01:45:46.400 | in the slides, it ends up looking very, very familiar.
01:45:51.160 | So here's our conflier.
01:45:53.080 | Just one thing to mention, at the moment,
01:45:56.600 | and this will probably only be true for a couple more weeks,
01:45:59.320 | there are kind of two versions of all the layers.
01:46:01.800 | There's the versions in the fast AI repo, which all start with FA,
01:46:06.080 | and there are versions in the Swift repo that don't.
01:46:08.720 | So just ignore these FAs.
01:46:10.960 | So a conflier has a batch norm, and it has a convolution.
01:46:16.720 | Another thing that's slightly awkward at the moment is that we --
01:46:19.960 | so you'll see, right now, some of our code looks weird
01:46:25.880 | because auto diff in Swift doesn't support flow control,
01:46:29.960 | so if or for loops.
01:46:32.440 | That'll change soon enough, right?
01:46:34.320 | So when you see something like no bias convolution,
01:46:37.760 | that's because we can't write a convolution layer
01:46:41.160 | that has an if statement saying if the person asks for bias,
01:46:44.560 | use it, otherwise don't, right?
01:46:46.280 | So don't worry too much about those workarounds.
01:46:48.600 | Either they'll go away soon enough.
01:46:50.800 | So we've got a batch norm layer, we've got a conv layer,
01:46:53.560 | and we can go through them, and the zero bn is the stuff
01:46:55.960 | that you're used to, and as Chris said,
01:46:59.360 | dunder call is built without the dunder,
01:47:01.800 | but otherwise everything looks the same as usual.
01:47:05.600 | Because we don't have the ability right now --
01:47:09.240 | this will change soon -- to use if in code
01:47:13.160 | that needs to be differentiated,
01:47:14.920 | we've basically added something called a switchable layer,
01:47:17.240 | which is something you can turn on or off,
01:47:19.000 | so the details don't matter.
01:47:21.560 | Chris will describe next week, however,
01:47:23.440 | how we actually wrote our own kind of custom gradients
01:47:28.560 | for this kind of layer, and that'll be fun to learn about.
01:47:31.880 | So then we used that to basically have something where --
01:47:35.600 | because remember in xresnet, in the identity path,
01:47:39.280 | it's not always an identity path.
01:47:41.240 | Sometimes you down-sample in that path,
01:47:43.040 | sometimes you change the number of channels in that path.
01:47:45.560 | If you down-sample, then you maybe add an average pool 2D.
01:47:50.240 | So because, again, we don't have the ability to have if,
01:47:53.520 | we just use this switchable layer,
01:47:55.320 | and maybe you change the number of channels
01:48:00.160 | by adding a one-by-one conv, so that's all that is.
01:48:04.400 | So most of this stuff, if you're watching this, you know,
01:48:08.520 | much later than kind of July or something,
01:48:10.560 | this will probably all have gone away
01:48:11.840 | and been replaced by some if statements.
01:48:16.080 | But, you know, once we've got all that,
01:48:17.400 | the res block, there's really nothing to mention, is there?
01:48:21.960 | I mean, it's basically identical.
01:48:23.520 | If you look back at the version in 11 on --
01:48:32.240 | in the Python versions and kind of switch between them,
01:48:40.120 | you almost need like a strobe-like thing
01:48:42.160 | to see that they're different.
01:48:43.440 | Like, it's the same layers equals conv layer.
01:48:47.400 | Layers equals conv layer.
01:48:49.000 | I don't know why we changed the name.
01:48:50.480 | Got this ternary here.
01:48:52.040 | This question mark and colon is identical to if
01:48:58.760 | and else as an operator in Python.
01:49:01.480 | It comes from C. And then, yeah, and then finally in the call,
01:49:08.480 | that and that look exactly the same.
01:49:13.840 | >> Pure self.
01:49:15.160 | >> Yeah. Thank heavens.
01:49:16.640 | Thank you.
01:49:19.480 | Make layer looks basically the same.
01:49:21.760 | This is the make layer we had before.
01:49:24.080 | This is the make layer we have now.
01:49:26.160 | Don't need that.
01:49:29.600 | And so it's interesting to see how some swift things kind
01:49:34.800 | of come out quite neatly, right?
01:49:36.200 | So this use of map, so this is generating --
01:49:39.960 | this is the same as range and blocks in Python.
01:49:44.680 | So this is basically saying map, range and blocks,
01:49:48.440 | and then passing in this closure
01:49:50.640 | which generates the res block, right?
01:49:53.080 | So I think it's kind of -- I don't know.
01:49:54.520 | I find it more clear, the swift way, but very, very similar.
01:49:59.160 | >> And the idea of Swift is to have simpler primitives
01:50:01.840 | that compose instead of having special cases
01:50:04.320 | for the important things.
01:50:07.200 | >> Yeah. So now we've got all that.
01:50:09.400 | The x res net looks very similar to what we would expect.
01:50:16.760 | We've still got our number of filters thing going on.
01:50:20.120 | The stem, so now we've got that array map thing.
01:50:24.120 | You're kind of going to start to get a feel for these kind
01:50:26.280 | of idioms in Swift.
01:50:28.440 | So kind of range.map is a useful kind of idiom to be aware of.
01:50:34.080 | >> You can also use a for loop.
01:50:35.200 | You can say for i in 0..<3.
01:50:37.960 | That's also fine, too.
01:50:39.280 | It just depends on how you want to write the code.
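The range.map idiom and its for-loop equivalent, as just described, look like this in a small sketch. `makeBlock` is an illustrative stand-in for the real res-block constructor.

```swift
// The Swift counterpart of Python's [makeBlock(i) for i in range(nBlocks)].
func makeBlock(_ i: Int) -> String { "ResBlock\(i)" }

let nBlocks = 3
let blocks = (0..<nBlocks).map { makeBlock($0) }
print(blocks)  // ["ResBlock0", "ResBlock1", "ResBlock2"]

// The equivalent for loop, as mentioned, is also fine:
var blocks2: [String] = []
for i in 0..<nBlocks { blocks2.append(makeBlock(i)) }
```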
01:50:42.040 | >> There's an enumerate, but rather
01:50:43.280 | than enumerate(something), it's .enumerated(),
01:50:45.440 | but it works the same way.
01:50:46.720 | When you map to it, you get back an index and an object just
01:50:49.280 | like in Python, so very familiar.
01:50:51.400 | So in this case, because we've gone .map and then .reduce
01:50:56.720 | with plus on a list, this is a list comprehension now, right?
01:51:01.120 | Because this is spitting out a bunch of lists
01:51:03.280 | that we're joining all together.
01:51:05.840 | So those special cases there.
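A small sketch of the map-then-reduce(+) pattern just described: each map step produces an array, and reducing with `+` concatenates them, like a flattening list comprehension. The filter sizes and layer names here are made up for illustration.

```swift
// map produces nested arrays; reduce(+) joins them into one flat array.
let filters = [16, 32, 64]
let nested = filters.map { n in ["conv\(n)", "bn\(n)"] }
let flat = nested.reduce([], +)
print(flat)  // ["conv16", "bn16", "conv32", "bn32", "conv64", "bn64"]
// flatMap collapses the two steps: filters.flatMap { ["conv\($0)", "bn\($0)"] }
```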
01:51:09.560 | This is one of those cases where you're asking
01:51:12.840 | for the last element of a list.
01:51:14.720 | List could be empty, so there might be nothing in it.
01:51:16.760 | So exclamation mark says just assume
01:51:19.160 | that there is something there.
01:51:20.920 | And we've written a little compose.
01:51:23.880 | So we can compose this list of layers on our input.
01:51:29.680 | So again, we've got kind of similar concepts expressed
01:51:32.840 | in similar ways.
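A minimal sketch of the little compose mentioned here: fold a list of functions over an input, each layer consuming the previous one's output. The layer closures are illustrative.

```swift
// Compose a list of same-typed functions into one function.
func compose<T>(_ fns: [(T) -> T]) -> (T) -> T {
    { input in fns.reduce(input) { x, f in f(x) } }
}

let layers: [(Double) -> Double] = [{ $0 + 1 }, { $0 * 2 }]
let stem = compose(layers)
print(stem(3))  // (3 + 1) * 2 = 8.0
```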
01:51:34.160 | So we can now put all that together,
01:51:36.120 | and we've got all our different resnets.
01:51:39.800 | So now we create the various functions that we need
01:51:46.960 | to pass into our learner.
01:51:46.960 | So one is a function that's going to create a model.
01:51:49.600 | So it's going to be something that creates an exresnet,
01:51:52.800 | and that's the function that's going to create a model.
01:51:55.560 | We're going to have a function that creates our optimizer,
01:51:59.800 | which, as you'll see, we have a stateful optimizer,
01:52:03.280 | just like we had in Python.
01:52:06.480 | We have a learner, just like we had in Python,
01:52:09.200 | and it has a very, very similar look to it.
01:52:12.040 | And again, next time we'll talk about how all these are built.
01:52:14.480 | And so atom optimizer, of course, is just
01:52:16.920 | a thing that's hackable.
01:52:18.680 | You can change it.
01:52:19.280 | Exactly.
01:52:21.120 | We have recorder callbacks, just like we're familiar with.
01:52:25.480 | We have one-cycle training, just like we're familiar with.
01:52:27.920 | This add one-cycle delegates and make default delegates
01:52:31.480 | is probably going to go away pretty soon,
01:52:33.520 | and we'll have some slightly neater ways to do this.
01:52:36.080 | So by the time you see this notebook,
01:52:37.960 | this might have changed a bit.
01:52:39.360 | And then we train it with a resnet 18 for a few minutes,
01:52:42.400 | and we're at 0.81.
01:52:45.120 | Couple of things to mention as I go through this end of April.
01:52:51.600 | Right now, this uses about twice as much memory as PyTorch,
01:52:56.880 | and it's about three to four times slower than PyTorch.
01:53:00.240 | No fundamental reason why this need be the case.
01:53:02.920 | We've just started.
01:53:04.720 | And so the fact that within--
01:53:06.240 | It's not bad for three weeks.
01:53:07.320 | Not bad for three weeks.
01:53:08.360 | I mean, and all the work--
01:53:09.000 | From nothing.
01:53:09.680 | And all the work that you guys did
01:53:11.120 | to build the auto diff in the first place.
01:53:12.840 | Three weeks ago, it really didn't work.
01:53:17.080 | Yeah, so it's pretty cool that we're
01:53:19.440 | at a point where we can actually train proper models like this
01:53:22.960 | from scratch in not too slow and not too memory intensive.
01:53:31.160 | And if you're interested in getting into the weeds,
01:53:31.160 | we would certainly love help with fixing the performance
01:53:34.440 | and fixing the amount of memory.
01:53:36.680 | So that's a related question.
01:53:38.480 | What would be the best way to contribute
01:53:40.040 | to the Swift for TensorFlow ecosystem
01:53:42.000 | as someone who's now using Swift for the first time?
01:53:44.600 | Yeah, that's a great question.
01:53:45.880 | So the best place to go is github.com/tensorflow/swift.
01:53:50.400 | That's our landing page.
01:53:51.640 | There's also a bunch of tutorials there.
01:53:53.520 | It explains how to get and build things.
01:53:56.480 | One of the things that we're doing
01:53:57.880 | is that we're building everything in the open.
01:54:00.680 | And so we do our development in the open.
01:54:02.320 | We use GitHub.
01:54:04.520 | We have our discussions on a mailing list
01:54:06.600 | that you'll find linked up for a GitHub page.
01:54:08.680 | And so we try to have a really inclusive and welcoming
01:54:11.000 | community.
01:54:11.520 | And there's a ton of resources available and a lot to do.
01:54:15.480 | Yeah, and that's one way to do it.
01:54:18.280 | I would like to suggest another way, which
01:54:20.160 | is to come to the harebrain forum on the fast.ai forums.
01:54:23.960 | Because I think for a lot of people, the right way--
01:54:26.680 | the best way to contribute, the way
01:54:28.120 | that you'll get the most out of, the most relevant to you
01:54:30.480 | right now, is to pick something that you've already
01:54:34.240 | been using in Python and doesn't exist yet
01:54:37.080 | and create the Swift version.
01:54:39.920 | And you may think you have no idea how to do that.
01:54:42.160 | And perhaps you don't.
01:54:43.440 | But create a really crappy, slimmed-down Swift version
01:54:47.760 | and build from there.
01:54:48.640 | That's the only way any of this stuff gets done.
01:54:51.640 | Ask for help on the forum.
01:54:54.600 | Offer help to others.
01:54:56.520 | And so pick small bits or find some piece
01:54:59.520 | that hasn't been documented yet.
01:55:02.200 | We haven't really figured out yet where--
01:55:04.560 | To put things.
01:55:05.280 | Yeah, where fast AI lives and where Swift for TensorFlow
01:55:07.960 | lives and where different repos will be.
01:55:09.960 | But in the end, between the fast AI and Swift for TensorFlow
01:55:15.360 | repos, there'll be a kind of an ecosystem that covers
01:55:19.520 | the same kind of ground that PyTorch plus fast AI covers
01:55:23.280 | and has just as much room for you to--
01:55:25.480 | well, much more room, actually, for you
01:55:27.080 | to build things on top of that.
01:55:28.320 | Because you've got the entirety of Scikit Learn and Matplotlib
01:55:32.080 | and Pandas and all this stuff to--
01:55:36.160 | Another thing is, if you go on the fast AI GitHub,
01:55:38.920 | you'll see these workbooks.
01:55:40.840 | And we skipped from 1 to 11.
01:55:44.560 | And so next time, we'll go back through and talk about two
01:55:46.560 | and three and four and five.
01:55:48.400 | But these are all there for you now.
01:55:50.400 | And so if you'd like to look, you can go do that.
01:55:52.600 | And they'll get a little bit better by next week, I bet.
01:55:54.400 | Yeah.
01:55:54.900 | And one thing to mention is, with the normal fast AI
01:55:58.280 | notebooks, we nearly freeze them.
01:56:02.120 | Once we do the course, we just fix errors.
01:56:05.000 | And that's about it.
01:56:06.440 | These notebooks is going to be very different.
01:56:08.400 | We're going to keep them very, very up to date.
01:56:10.400 | So by the time you watch this, they may look very different.
01:56:13.800 | Because we want to always have for you, showing you,
01:56:16.880 | this is how we suggest you write Swift for TensorFlow code now.
01:56:21.320 | And even over the last week, we've been--
01:56:22.960 | if you look at the GitHub history,
01:56:24.760 | you'll see we've been discovering new things,
01:56:26.640 | like differentiable arrays and switchable layers.
01:56:28.680 | And it allows us to delete lots of previous workarounds
01:56:31.760 | in the next weeks.
01:56:32.720 | And the next couple of months will be similar.
01:56:35.080 | So yeah.
01:56:35.840 | Do you want more questions now?
01:56:37.000 | Sure.
01:56:37.760 | All right.
01:56:38.440 | Is Swift thread safe?
01:56:40.800 | Swift is both thread safe and has a really great threading
01:56:43.480 | abstraction called dispatch.
01:56:45.240 | And so you can fire up lots of concurrent work items,
01:56:48.600 | set up work queues, has a really vibrant and good API
01:56:54.080 | for doing that with quality of service,
01:56:55.560 | and all these advanced things that are available there.
01:56:58.520 | Yeah, I've never used such a nice kind of threading--
01:57:03.000 | it's like it's a framework.
01:57:04.200 | It feels more than just a language.
01:57:05.880 | So on the Apple side, they call it Grand Central Dispatch.
01:57:08.680 | But they've put the whole thing over to Linux.
01:57:11.000 | And you have this whole kind of threading library framework,
01:57:14.320 | which is super easy to use and extensible.
01:57:17.760 | This is one of the reasons the Swift server community really
01:57:20.200 | likes Swift, is that it's efficient, yes,
01:57:22.320 | but it also supports threading and other things really well
01:57:24.720 | and very naturally.
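A minimal sketch of the Dispatch library being described, assuming a Linux or macOS toolchain with Dispatch available: `concurrentPerform` runs the body in parallel across iterations, and a serial queue keeps the shared writes safe.

```swift
import Dispatch

// Run four work items concurrently and collect their results.
var results = [Int](repeating: 0, count: 4)
let lock = DispatchQueue(label: "results")  // serializes access to `results`
DispatchQueue.concurrentPerform(iterations: 4) { i in
    let value = i * i               // the parallel work
    lock.sync { results[i] = value }
}
print(results)  // [0, 1, 4, 9]
```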
01:57:27.040 | Are there any scikit-learn for Swift projects in the works?
01:57:30.840 | I have no idea.
01:57:32.400 | There should be.
01:57:33.400 | Yeah, I haven't seen anything much.
01:57:36.240 | I have a random forest implementation
01:57:38.400 | I would love to convert over to Swift,
01:57:40.400 | which is a pretty nice one.
01:57:42.120 | But it would definitely be nice if you could build a gradient
01:57:45.080 | boosting machine or even simple things like K-nearest neighbors
01:57:49.720 | and stuff like that.
01:57:51.440 | I think, though, that the opportunity here
01:57:57.160 | is to go beyond just reinventing what's already there.
01:58:00.200 | Scikit-learn is cool that it exists,
01:58:02.240 | but it could be a lot better, particularly in Swift.
01:58:06.520 | So if you do decide I want to build bits of sklearn
01:58:11.000 | or bits of pandas, jump on the forum
01:58:13.720 | and talk to us about it, and let's
01:58:16.000 | try and build something that's vastly better than what's
01:58:19.560 | been before, not just a copy of it.
01:58:21.440 | I wouldn't suggest that being a starter project.
01:58:23.440 | If you want a starter project, pick
01:58:24.880 | one of the lessons in the one through seven class
01:58:29.440 | and implement some of those algorithms.
01:58:31.240 | I think in terms of the framework,
01:58:32.400 | I think that'd be a really great place to start.
01:58:34.560 | As you get more experienced and you get more familiar with Swift,
01:58:37.560 | then tackling something like building a big library
01:58:39.920 | can be a lot of fun.
01:58:40.760 | Is there any plan to build tensor shapes into the Swift type
01:58:47.200 | system?
01:58:48.680 | So we have a tensor shape type.
01:58:51.520 | It's a struct, literally, right now.
01:58:53.160 | And that's when you pass in shapes
01:58:55.200 | to create a tensor of zeros, you get that.
01:58:57.320 | I think what they're probably asking is,
01:58:59.720 | will we get dimensions in the types
01:59:02.280 | or will we get names in the dimensions?
01:59:04.800 | So this is something we're super interested in.
01:59:06.720 | We think there's a lot of opportunities there,
01:59:08.720 | both in terms of shape checking, for example.
01:59:12.120 | The whole stack we're building with the compiler integration
01:59:14.800 | and the good the locations and stuff like that.
01:59:16.960 | We think there's a ton of potential.
01:59:18.620 | We haven't tapped into that yet.
01:59:20.520 | The names on dimensions is tricky,
01:59:22.360 | and there's lots of space.
01:59:23.600 | And we haven't exactly figured out
01:59:25.000 | how the best way to do that is, because there's trade offs.
01:59:27.520 | But that's exactly all the second step things we want to do,
01:59:31.480 | probably starting this fall-ish, when
01:59:33.760 | the basic auto diff, base performance, like scale out,
01:59:37.240 | and a bunch of other things are all settled.
01:59:39.620 | And we feel good about all those basics.
01:59:42.160 | And we're trying to stay focused on making sure
01:59:44.080 | that things are really good, and build the basics,
01:59:47.840 | and get them right, and then move out.
01:59:51.000 | Any more questions?
01:59:51.960 | No, I mean, that's fine.
01:59:56.520 | We're at a good stuff.
01:59:58.960 | How is ampersand referencing different from struct?
02:00:02.760 | Oh, we'll talk more about that next time.
02:00:05.400 | So Swift has a-- this comes back to safety in the language.
02:00:09.720 | Swift has a completely different approach
02:00:12.760 | to references, like classes, and structs, and values.
02:00:17.200 | And so I'm going to save that mind-blowing piece
02:00:20.080 | for next time.
02:00:21.840 | It's really cool.
02:00:22.840 | It is.
02:00:23.680 | It underlies a ton of--
02:00:25.960 | I mean, it's a very foundational thing
02:00:28.040 | that makes a lot of stuff possible.
02:00:31.500 | And how is Swift for probabilistic programming?
02:00:35.000 | So this is one of the areas that I'm
02:00:37.080 | both completely incapable of talking intelligently about,
02:00:40.600 | but also very excited about, because this
02:00:42.360 | is one of those things that I think is really underutilized.
02:00:46.880 | One of the things that I think is really interesting
02:00:49.200 | about Swift as a platform for machine learning
02:00:51.440 | is that you often--
02:00:54.600 | so with Python, you end up in this weird situation
02:00:57.760 | where you build a machine learning model,
02:00:59.880 | and then you have an application that you eventually
02:01:02.080 | want to use it in.
02:01:02.800 | And these are two different things.
02:01:04.400 | Training your model and deploying your model
02:01:06.200 | are different worlds.
02:01:07.440 | And we can start erasing some of those boundaries,
02:01:09.480 | because Swift can be put in a mobile app, believe it or not,
02:01:11.800 | or put in a server, or put in other things
02:01:13.840 | you're actually deploying.
02:01:14.840 | And probabilistic programming and many
02:01:16.480 | of these other applications I think
02:01:18.960 | would be great to build and integrate
02:01:20.400 | with the applications themselves.
02:01:21.760 | So I think the answer is it'll be a great fit.
02:01:25.200 | I haven't seen anything here yet.
02:01:27.880 | But basically, with things like probabilistic programming
02:01:31.480 | or things like kind of graph-based systems,
02:01:36.400 | they're not a great fit for PyTorch.
02:01:38.760 | And that's not a controversial thing to say,
02:01:40.640 | because Soumith Chintala, who created PyTorch,
02:01:42.760 | said that on Twitter himself last week.
02:01:44.800 | He said, if you want to do this kind of work at the moment,
02:01:47.800 | you might want to look at Julia, which is another great option
02:01:50.040 | for this kind of programming.
02:01:51.520 | Because what happens is you have these kinds
02:01:55.080 | of deeply connected computational graphs,
02:01:57.920 | of lots of things calling lots of other things.
02:02:00.080 | And so you need-- and they're often kind of small.
02:02:02.800 | So you need those things to happen super quickly.
02:02:05.760 | So things like Julia and Swift work really
02:02:10.720 | well for that kind of thing, particularly
02:02:12.840 | when you add all the threading stuff on top as well.
02:02:16.680 | So if you're interested in that area,
02:02:18.400 | that would certainly be one that I
02:02:20.680 | think you could start getting into straight away.
02:02:23.880 | One of the nice things about that is you can do a lot on the CPU.
02:02:27.400 | A lot of these things don't even make sense on the GPU.
02:02:30.600 | So you can take advantage of it right now.
02:02:33.560 | And for that, actually, we'll add it to the forum post.
02:02:37.880 | But I actually have a post already
02:02:40.480 | about how to access a variety of random number distributions
02:02:44.080 | from Swift, namely the C++ random number distributions.
02:02:48.240 | So you could actually get started with this right away.
02:02:52.520 | Yeah, also, the TensorFlow ecosystem
02:02:54.360 | has a big, mature framework called TensorFlow Probability.
02:02:57.680 | And so I personally don't know very much about it.
02:02:59.680 | But I expect that all the atoms you need are all there.
02:03:03.480 | And we just need somebody who knows the space really well
02:03:05.800 | to build a Swift library that can expose all the primitives
02:03:08.880 | that TensorFlow already has.
02:03:10.560 | Is that it?
02:03:11.040 | How could you deploy Swift models on Android?
02:03:19.520 | Well, so I think there's two options that you have there.
02:03:21.760 | So one is, Swift, again, builds on the entire TensorFlow
02:03:25.080 | ecosystem.
02:03:25.760 | And so TensorFlow ecosystem includes graphs.
02:03:29.200 | And graphs are part of what Swift talks to.
02:03:32.360 | So you can export a graph.
02:03:33.440 | You can send it through TF Lite.
02:03:34.860 | And so the whole mobile deployment situation there
02:03:37.800 | works.
02:03:39.040 | I feel like that's kind of the model we're trying to get away
02:03:41.480 | from a little bit, though, do you feel that way?
02:03:42.840 | So the other option is, Swift actually runs fine on Android.
02:03:45.280 | People ship Android apps written in Swift.
02:03:47.280 | So you can do that, too.
02:03:49.960 | Swift on Android isn't really awesome, as far as I know.
02:03:53.400 | I'm not exactly an Android programmer.
02:03:54.920 | I don't really know that.
02:03:56.000 | But the issue there is that Android,
02:03:59.160 | you have a dual world between the Java stuff
02:04:01.760 | and the native stuff.
02:04:02.720 | And so Swift fits into the native stuff is my understanding.
02:04:06.200 | But I know that people are building and shipping
02:04:08.960 | Android apps written in Swift.
02:04:10.120 | And so that's a totally reasonable thing to do.
02:04:13.940 | The other thing to mention in terms of platforms
02:04:17.200 | is that Swift on Windows is starting to come together
02:04:20.540 | as well.
02:04:21.040 | So I don't know where it'll be by the time you're
02:04:23.040 | watching this at home.
02:04:24.280 | But Swift is definitely making its way
02:04:28.240 | to worlds outside the iOS world pretty rapidly.
02:04:32.640 | And Windows is one of them.
02:04:34.080 | Yeah, it's super exciting.
02:04:35.540 | People are writing Windows MFC apps in Swift,
02:04:42.040 | which is brain-twisting to me.
02:04:44.440 | So what we're going to close with now
02:04:47.040 | is a little taste of where we're heading next week.
02:04:51.320 | And this is actually something that Rachel
02:04:55.800 | shows in her computational linear algebra course.
02:04:58.960 | And it comes from a really amazing programming language
02:05:05.160 | called Halide.
02:05:06.640 | And Halide is one of these dramatic steps
02:05:10.880 | in terms of completely rethinking how we program
02:05:13.920 | computers.
02:05:15.660 | And I want to show it to you because it gives you
02:05:18.240 | a sense of the kinds of problems that Swift
02:05:22.080 | has to solve in order to--
02:05:24.960 | the goal here is to be able to get this performance.
02:05:31.800 | Because remember, the C speed, triply nested for loop,
02:05:35.640 | was 0.07.
02:05:37.680 | But TensorFlow is 0.02.
02:05:40.480 | How do we get this level of performance,
02:05:43.600 | but you being able to write it yourself in Swift?
02:05:47.720 | Now here's why it's hard.
02:05:49.320 | And so this video actually comes from the Halide project, which
02:05:55.400 | is a programming language that has kind of invented
02:05:58.920 | an amazing way to think about this.
02:06:01.880 | And so I'm going to use it to describe the problems
02:06:04.200 | that we're going to solve.
02:06:05.520 | So in order to write something fast,
02:06:09.360 | we have to think about how everything's
02:06:12.120 | going to get put together.
02:06:14.120 | And so the algorithm we're going to write here
02:06:16.880 | that they wrote in this Halide video
02:06:18.520 | is one where they're doing a simple blur, a 3 by 3 blur.
02:06:22.320 | So we take each group of 3 by 3 pixels,
02:06:25.200 | and we take their average.
02:06:26.520 | That's how you do a 3 by 3 blur.
02:06:28.500 | What are some of the things we could do?
02:06:31.200 | In what order, for example, do I compute the values
02:06:34.020 | in my 3 by 3 blur?
02:06:35.560 | And one way is just go through each x one at a time,
02:06:39.800 | and then within that, go through each y one at a time.
02:06:43.040 | That would be one choice.
02:06:45.040 | A second choice I could make would
02:06:46.880 | be to do the exact opposite, which
02:06:48.560 | is to go through each column one at a time.
02:06:51.720 | Now these aren't going to have very different characteristics.
02:06:54.520 | Maybe the latter might be a bit slower,
02:06:56.020 | because we're jumping further through memory.
02:06:58.160 | But what we could do is we could do something called vectorization.
02:07:02.040 | And vectorization is super important,
02:07:05.920 | because what happens with vectorization
02:07:07.880 | is we actually take four or sometimes even eight numbers
02:07:11.160 | at a time, and throw them all at the CPU or GPU at once,
02:07:15.320 | and say calculate them all together.
02:07:16.960 | And so we have these things called vector units
02:07:18.840 | in our computers nowadays that can do that.
02:07:21.160 | You can even have multiple vectorized things
02:07:23.100 | happening at the same time, because you have multiple cores.
02:07:26.240 | But in fact, in the GPU, this is what happens all the time.
02:07:29.560 | Or in order to think about better memory access,
02:07:33.040 | we could do a little block at a time, like this.
02:07:36.880 | So there's lots of choices about how
02:07:38.920 | I compute my values that are going
02:07:40.200 | to change the performance characteristics of my,
02:07:43.040 | in this case, a blur.
02:07:44.880 | Another question is, when should I compute my inputs?
02:07:48.660 | So here's my input.
02:07:50.000 | And see how it's going through three at a time?
02:07:53.160 | Because I'm trying to calculate three at a time.
02:07:55.940 | And that gives me my blur in x.
02:07:57.600 | Now I have to go through all of those three at a time.
02:08:00.320 | And so you can see this is super slow.
02:08:02.080 | It's recalculating things again and again.
02:08:04.400 | And it's not able to really use the cache well.
02:08:07.240 | Instead, I could do a whole set of nine at a time.
02:08:11.560 | And that would then allow me to create a whole blurred output
02:08:15.320 | at a time.
02:08:17.720 | Or I could go through it like this, exactly like before,
02:08:20.680 | but actually save what I had before.
02:08:22.840 | And that means when I then do the next row,
02:08:25.440 | I don't need to recompute, because it's already there.
02:08:29.360 | OK, just to add a clarification that the left panel's input,
02:08:32.960 | the middle is the intermediate values,
02:08:35.440 | and the right is the final output.
02:08:36.920 | Thank you.
02:08:39.400 | We could vectorize that.
02:08:40.920 | So we can do vectorized input, and then vectorized
02:08:43.500 | on the intermediate values, and then calculate those
02:08:45.760 | to create our vectorized output with parallel processing.
02:08:49.840 | Here's another way that we could combine
02:08:51.520 | vectorization and parallel processing.
02:08:53.120 | There's all these things you can do.
02:08:54.720 | And you can see they're going to have very different performance
02:08:57.680 | characteristics, right?
02:08:59.480 | So in Halide, they have this neat idea, which is, hey,
02:09:06.800 | let's not write nested, nested, nested for loops, and tiles,
02:09:11.560 | and looping through the memory like that.
02:09:13.260 | Let's instead describe for each value x, y in my blurred output,
02:09:19.200 | here is how it's calculated in this declarative way.
02:09:24.560 | This is literally the definition of a blur algorithm.
02:09:28.560 | And so you first do that.
02:09:30.840 | And then after you've done that, you can then say to Halide,
02:09:36.080 | what are different schedules for calculating that?
02:09:40.040 | So what's the kind of order in which things are done?
02:09:42.520 | And for these different schedules that are written here,
02:09:45.960 | they have all the different behaviors you just saw.
02:09:49.120 | Now, here's the thing.
02:09:51.120 | When expert CUDA programmers and expert CPU programmers
02:09:56.480 | write code to do things like this,
02:10:02.160 | they're using the world's best knowledge available
02:10:05.640 | across all of those things to create special versions
02:10:08.160 | for every architecture, for lots of different matrix sizes,
02:10:13.880 | tensors of different numbers of dimensions, so much assembly
02:10:19.040 | code, right?
02:10:20.800 | And so we can't write that, right?
02:10:22.840 | So how are we going to be able to write the stuff that's
02:10:28.520 | in our head, but have it run reasonably quickly?
02:10:34.440 | And so what we're moving towards with stuff like MLIR
02:10:40.440 | is the ability to kind of have domain-specific languages
02:10:44.280 | where you could write, here's the tiling domain-specific
02:10:48.720 | language, and here's the--
02:10:51.240 | For example, Halide directly.
02:10:53.000 | For example, Halide directly, right?
02:10:54.680 | And so that's the hope of where we're going to be going here
02:10:58.360 | is that Chris's team is going to be putting these kinds of tools
02:11:02.760 | in your hands via Swift.
02:11:05.400 | Is that a reasonable summary?
02:11:06.600 | Well, and so one of the bad things about Halide-- so
02:11:09.080 | in this space, we have TensorFlow today.
02:11:13.160 | We have XLA.
02:11:14.080 | XLA is an important part of TensorFlow right now.
02:11:16.280 | It's just not really wired into the Swift part of TensorFlow yet.
02:11:20.240 | XLA does all this stuff right now.
02:11:21.800 | And XLA is really good because you
02:11:23.160 | don't have to hand-tune it, like that whole writing out
02:11:26.200 | schedules.
02:11:26.760 | XLA does a good job of using the hardware as it is today.
02:11:30.440 | The thing we're going further with MLIR is to say,
02:11:32.680 | well, instead of you having to put all this blood, sweat,
02:11:35.480 | and tears in to tune it, and know the hardware,
02:11:37.840 | and do all this stuff, we can do other things like search.
02:11:41.800 | And in fact, there are research systems
02:11:43.720 | available now which will use genetic algorithms
02:11:48.080 | to auto-search for their optimal schedule.
02:11:50.760 | So you're starting to see the ideas that come out
02:11:53.600 | of the database query optimizer world coming into the CUDA
02:11:57.600 | kernel world.
02:11:58.320 | And this is going to be so great for us data scientists.
02:12:00.760 | Exactly.
02:12:01.240 | Search can be implemented in lots of different ways--
02:12:03.240 | brute force, reinforcement learning,
02:12:05.080 | lots of different genetic algorithms.
02:12:06.760 | There's lots of cool things that can be done here.
02:12:08.880 | And so what we're going to do over time
02:12:10.640 | is crack open the compiler and make the internal algorithms
02:12:14.080 | all learned.
02:12:14.720 | And I think that's going to be super cool.
02:12:16.480 | So this is why you have a compiler guy and a DL guy
02:12:20.640 | standing next to each other, right?
02:12:22.120 | Because it--
02:12:22.960 | Oh, well, we like each other, too.
02:12:24.920 | Oh, you're OK.
02:12:27.680 | Because we're not going to get this kind of great outcome
02:12:32.240 | unless people like us are working together
02:12:35.240 | with amazing teams like the folks
02:12:36.840 | that they have in TensorFlow and XLA and so forth.
02:12:39.760 | So next week, come back, and we will dig even deeper.
02:12:44.440 | Thank you very much, Chris Lattner.
02:12:45.800 | Thank you, Jeremy.
02:12:46.640 | (audience applauds)