Practical Deep Learning for Coders: Lesson 1
Chapters
0:00 Introduction
0:25 What has changed since 2015
1:20 Is it a bird
2:09 Images are made of numbers
3:29 Downloading images
4:25 Creating a DataBlock and Learner
5:18 Training the model and making a prediction
7:20 What can deep learning do now
10:33 Pathways Language Model (PaLM)
15:40 How the course will be taught. Top down learning
19:25 Jeremy Howard’s qualifications
22:38 Comparison between modern deep learning and 2012 machine learning practices
24:31 Visualizing layers of a trained neural network
27:40 Image classification applied to audio
28:08 Image classification applied to time series and fraud
30:16 PyTorch vs TensorFlow
31:43 Example of how fastai builds off PyTorch (AdamW optimizer)
35:18 Using cloud servers to run your notebooks (Kaggle)
38:45 Bird or not bird? & explaining some Kaggle features
40:15 How to import libraries like Fastai in Python
40:42 Best practice - viewing your data between steps
42:00 DataBlock API overarching explanation
44:40 DataBlock API parameters explanation
48:40 Where to find fastai documentation
49:54 Fastai’s learner (combines model & data)
50:40 Fastai’s available pretrained models
52:02 What’s a pretrained model?
53:48 Testing your model with predict method
55:08 Other applications of computer vision. Segmentation
56:48 Segmentation code explanation
58:32 Tabular analysis with fastai
59:42 show_batch method explanation
61:25 Collaborative filtering (recommendation system) example
65:08 How to turn your notebooks into a presentation tool (RISE)
65:45 What else can you make with notebooks?
68:06 What can deep learning do presently?
70:33 The first neural network - Mark I Perceptron (1957)
72:38 Machine learning models at a high level
78:27 Homework
00:00:00.000 |
Welcome to practical deep learning for coders lesson one. This is version 5 of this course. 00:00:11.760 |
And it's the first new one we've done in two years. So we've got a lot of cool things 00:00:15.560 |
to cover. It's amazing how much has changed. Here is an XKCD from the end of 2015. Who here 00:00:28.520 |
has seen XKCD comics before? Pretty much everybody, not surprising. So the basic joke here is 00:00:36.960 |
I'll let you read it and then I'll come back to it. So it can be hard to tell what's easy 00:00:54.320 |
and what's nearly impossible. And in 2015 or at the end of 2015, the idea of checking 00:00:58.800 |
whether something is a photo of a bird was considered nearly impossible. So impossible 00:01:03.320 |
it was the basic idea of a joke because everybody knows that that's nearly impossible. We're 00:01:08.680 |
now going to build exactly that system for free in about two minutes. So let's build 00:01:15.880 |
an is-it-a-bird system. So we're going to use Python. And so I'm going to run through 00:01:22.960 |
this really quickly. You're not expected to run through it with me because we're going 00:01:25.900 |
to come back to it. OK. But let's go ahead and run that cell. OK. So what we're doing 00:01:35.840 |
is we're searching DuckDuckGo for images of bird photos and we're just going to grab one. 00:01:43.320 |
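In the notebook this step uses a helper built on DuckDuckGo image search; the download itself can be sketched with nothing but the standard library. The function name here is illustrative, not the course's actual helper:

```python
from pathlib import Path
from urllib.request import urlopen

def download_image(url, dest):
    """Fetch the bytes at `url` and write them to `dest`.

    Illustrative stand-in for the fastai download helpers used in the
    notebook -- the real helpers also handle retries and timeouts."""
    dest = Path(dest)
    with urlopen(url) as resp:
        dest.write_bytes(resp.read())
    return dest
```

So once the search step returns a URL, something like `download_image(url, "bird.jpg")` saves the photo locally.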
And so here is the URL of the bird that we grabbed. OK. We'll download it. OK. So there 00:01:53.880 |
it is. So we've grabbed a bird and so OK. We've now got something that can download 00:01:58.280 |
pictures of birds. Now we're going to need to build a system that can recognize things 00:02:05.600 |
that are birds versus things that aren't birds from photos. Now of course computers need 00:02:10.680 |
numbers to work with. But luckily images are made of numbers. I actually found this really 00:02:18.320 |
nice website called Pixspy where I can grab a bird. And if I hover over it, let's pick its 00:02:31.360 |
beak. You'll see here that that part of the beak was 251 brightness of red, 48 of green 00:02:41.360 |
and 21 of blue. So that's RGB. And so you can see as I wave around those colors are changing 00:02:50.400 |
those numbers. And so this picture, the thing that we recognize as a picture, is actually 00:02:58.300 |
256 by 171 by 3 numbers between 0 and 255 representing the amount of red, green and blue 00:03:09.240 |
on each pixel. So that's going to be an input to our program that's going to try and figure 00:03:15.420 |
out whether this is a picture of a bird or not. OK. So let's go ahead and run this cell. 00:03:28.080 |
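Before that cell runs, it's worth seeing the picture-as-numbers idea written out. Here's a toy 2-by-2 "image" in plain Python; the real photo was 256 by 171 pixels, but the structure is the same:

```python
# A tiny 2x2 "image": rows of pixels, each pixel an (R, G, B) triple 0-255.
image = [
    [(251, 48, 21), (255, 255, 255)],   # a beak-ish orange pixel, then white
    [(0, 128, 0), (0, 0, 0)],           # green, then black
]

height = len(image)
width = len(image[0])
channels = len(image[0][0])
print(height, width, channels)          # 2 2 3

r, g, b = image[0][0]                   # the "beak" pixel from the demo
print(r, g, b)                          # 251 48 21
```

A real photo is just this, scaled up: a height-by-width-by-3 block of numbers.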
Which is going to go through, and I needed bird and non-bird, but you can't really search 00:03:33.840 |
Google Images or DuckDuckGo Images for not a bird. It doesn't work that way. So I just 00:03:39.520 |
decided to use forest. I thought, OK, pictures of forest versus pictures of bird sounds like 00:03:43.960 |
a good starting point. So I go through each of forest and bird and I search for forest 00:03:51.520 |
photo and bird photo, download images and then resize them to be no bigger than 400 00:03:58.480 |
pixels on a side, just because we don't need particularly big ones. It takes a surprisingly 00:04:03.680 |
large amount of time just for a computer to open an image. OK, so we've now got 200 of 00:04:09.720 |
each. I find when I download images, I often get a few broken ones. And if you try and 00:04:14.960 |
train a model with broken images, it will not work. So here's something which just verifies 00:04:20.040 |
each image and unlinks, so deletes the ones that don't work. OK, so now we can create 00:04:28.080 |
what's called a DataBlock. So after I run this cell, you'll see, and I'll basically go 00:04:40.960 |
through the details of this later, but a DataBlock gives fastai, the library, all the 00:04:46.280 |
information it needs to create a computer vision model. And so in this case, we're basically 00:04:50.560 |
telling it get all the image files that we just downloaded. And then we say show me a 00:04:55.120 |
few up to six. And let's see. Yeah, so we've got some birds, forest, bird, bird, forest. 00:04:59.800 |
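Two of the DataBlock pieces we'll meet properly later are `parent_label` (label each image by the name of the folder it sits in) and `RandomSplitter` (hold some data out for validation). Conceptually they do something like this plain-Python sketch, which is not the fastai implementation:

```python
import random
from pathlib import Path

def parent_label_sketch(path):
    # Label an item by its parent folder name, e.g. "bird/3.jpg" -> "bird".
    return Path(path).parent.name

def random_split_sketch(items, valid_pct=0.2, seed=42):
    # Shuffle reproducibly, then hold out a fraction for validation.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]   # train, valid

# Hypothetical file list standing in for the downloaded photos.
files = [f"bird/{i}.jpg" for i in range(160)] + [f"forest/{i}.jpg" for i in range(160)]
train, valid = random_split_sketch(files)
print(len(train), len(valid))             # 256 64
print(parent_label_sketch("bird/3.jpg"))  # bird
```

The actual DataBlock bundles these choices together with how to open images and how to resize them.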
OK, so one of the nice things about doing computer vision models is it's really easy 00:05:03.600 |
to check your data because you can just look at it, which is not the case for a lot of 00:05:07.960 |
kinds of models. OK, so we've now downloaded 200 pictures of birds, 200 pictures of forests. 00:05:16.080 |
So we'll now press run. And this model is actually running on my laptop. So it's not 00:05:25.040 |
using a vast data center. It's running on my presentation laptop and it's doing it at 00:05:29.960 |
the same time as my laptop is streaming video, which is possibly a bad idea. And so what 00:05:38.440 |
it's going to do is it's going to run through every photo out of those 400. And for the 00:05:45.400 |
ones that are forest, it's going to learn a bit more about what forest looks like. And 00:05:49.760 |
for the ones that are bird, it'll learn a bit more about what bird looks like. So overall, 00:05:54.580 |
it took under 30 seconds. And believe it or not, that's enough to finish doing the thing 00:06:03.440 |
which was in that XKCD comic. Let's check by passing in that bird that we downloaded 00:06:11.000 |
at the start. This is a bird. Probability it's a bird: one, rounded to four 00:06:19.280 |
decimal places. So something pretty extraordinary has happened since late 2015, which is literally 00:06:28.800 |
something that has gone from so impossible, it's a joke, to so easy that I can run it 00:06:35.040 |
on my laptop computer in, I don't know how long it was, about two minutes. And so hopefully 00:06:42.160 |
that gives you a sense that creating really interesting, real working programs with deep 00:06:52.520 |
learning is something that it doesn't take a lot of code, didn't take any math, didn't 00:06:57.880 |
take more than my laptop computer. It's pretty accessible, in fact. So that's really what 00:07:05.200 |
we're going to be learning about over the next seven weeks. 00:07:09.280 |
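One last look back at that 30-second training run: under the hood, "learning" just means showing the model examples, measuring how wrong its predictions are, and nudging its weights to be a little less wrong. Here's a toy one-parameter version of that loop; it has nothing to do with fastai's actual code, it's just the idea:

```python
# Toy "model": predicts y = w * x. We learn w from examples by
# repeatedly nudging it against the gradient of the squared error.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true relationship: y = 2x

w = 0.0            # start from an uninformed guess
lr = 0.01          # learning rate: how big each nudge is

for epoch in range(200):
    for x, y in examples:
        pred = w * x
        grad = 2 * (pred - y) * x   # d/dw of (pred - y)**2
        w -= lr * grad              # learn a little from this example

print(round(w, 3))  # very close to 2.0
```

A real neural network does the same thing with millions of weights instead of one, and a pretrained starting point instead of zero.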
So where have we got to now with deep learning? Well, it moves so fast. But even in the last 00:07:17.520 |
few weeks, we've taken it up another notch as a community. You might have seen that something 00:07:23.600 |
called DALL-E 2 has been released, which uses deep learning to generate new pictures. And 00:07:30.000 |
I thought this was an amazing thing that this guy, Nick, did, where he took his friend's 00:07:34.680 |
Twitter bios and typed them into the DALL-E 2 input, and it generated these pictures. 00:07:42.460 |
So this guy, he typed in commitments, empathetic, psychedelic, philosophical, and it generated 00:07:48.080 |
these pictures. So I'll just show you a few of these. I'll let you read them. I love that. 00:08:05.280 |
That one's pretty amazing, I reckon. Actually. I love this. Happy Sushi Fest has actually 00:08:17.680 |
got a happy rock to move around. So this is like, yeah, I don't know. When I look at these, 00:08:27.240 |
I still get pretty blown away that this is a computer algorithm using nothing but this 00:08:33.000 |
text input to generate these arbitrary pictures, in this case of fairly complex and creative 00:08:39.720 |
things. So the guy who made those points out, he spends about two minutes or so creating 00:08:48.480 |
each of these. He tries a few different prompts, and he tries a few different pictures. And 00:08:52.880 |
so he's given an example here of when he typed something into the system. Here's an example 00:08:56.520 |
of 10 different things he gets back when he puts in expressive painting of a man shining 00:09:02.580 |
rays of justice and transparency on a blue bird Twitter logo. So it's not just, you know, 00:09:10.960 |
DALL-E 2, to be clear. There's, you know, a lot of different systems doing something 00:09:14.760 |
like this now. There's something called Midjourney, which this Twitter account posted 00:09:19.960 |
a female scientist with a laptop writing code in a symbolic, meaningful, and vibrant style. 00:09:26.200 |
This one here is an HD photo of a rare psychedelic pink elephant. And this one, I think, is the 00:09:32.320 |
second one here. I never know how to actually pronounce this. This one's pretty cool. A 00:09:39.660 |
blind bat with big sunglasses holding a walking stick in his hand. And so when actual artists, 00:09:49.240 |
you know, so this, for example, this guy said he knows nothing about art, you know, he's 00:09:54.040 |
got no artistic talent, it's just something, you know, he threw together. This guy is an 00:09:58.140 |
artist who actually writes his own software based on deep learning and spends, you know, 00:10:03.680 |
months on building stuff. And as you can see, you can really take it to the next level. 00:10:09.200 |
It's been really great, actually, to see how a lot of fast AI alumni with backgrounds as 00:10:15.080 |
artists have gone on to bring deep learning and art together. And it's a very exciting 00:10:19.320 |
direction. And it's not just images, to be clear. You know, one of the other interesting 00:10:25.240 |
thing that's popped up in the last couple of weeks is Google's Pathways language model, 00:10:30.760 |
which can take any arbitrary English text question and can create an answer, which not 00:10:37.560 |
only answers the question, but also explains its thinking, whatever it means for a language 00:10:44.640 |
model to be thinking. One of the ones I found pretty amazing was that it can explain a joke. 00:10:51.080 |
So, I'll let you read this. So, this is actually a joke that probably needs explanations for 00:11:12.240 |
anybody who's not familiar with TPUs. So, this model just took the text as input and 00:11:18.120 |
created this text as output. And so, you can see, you know, again, deep learning models 00:11:25.680 |
doing things which I think very few, if any of us would have believed would be maybe possible 00:11:31.720 |
to do by computers even in our lifetime. This means that there are a lot of practical and 00:11:41.680 |
ethical considerations. We will touch on them during this course, but can't possibly hope 00:11:48.320 |
to do them justice. So, I would certainly encourage you to check out ethics.fast.ai to see our 00:11:55.180 |
whole data ethics course taught by my co-founder, Dr. Rachel Thomas, which goes into these issues 00:12:03.640 |
in a lot more detail. All right. So, as well as being an AI researcher 00:12:12.880 |
at the University of Queensland and fast.ai, I am also a homeschooling primary school teacher. 00:12:19.680 |
And for that reason, I study education a lot. And one of the leading researchers in education 00:12:26.600 |
is a guy named Dylan Wiliam. And he has this great approach in his classrooms of figuring 00:12:31.880 |
out how his students are getting along, which is to put a coloured cup on their desk, green 00:12:38.120 |
to mean that they're doing fine, yellow cup to mean I'm not quite sure, and a red cup 00:12:44.160 |
to mean I have no idea what's going on. Now, since most of you are watching this remotely, 00:12:49.680 |
I can't look at your cups, and I don't think anybody brought coloured cups with them today. 00:12:54.120 |
So instead, we have an online version of this. So, what I want you to do is go to cups.fast.ai/fast. 00:13:08.480 |
That's cups.fast.ai/fast. And don't do this if you're like a fast.ai expert who's done 00:13:17.120 |
the course five times, because if you're following along, that doesn't really mean much, obviously. 00:13:21.120 |
It's really for people who are not already fast.ai experts. And so, click one of these 00:13:27.120 |
coloured buttons. And what I will do is I will go to the teacher version and see what 00:13:36.800 |
buttons you're pressing. All right. So, so far, people are feeling we're not going too 00:13:41.520 |
fast on the whole. We've got one brief red. Okay. So, hey, Nick, this URL, it's the same 00:13:51.000 |
thing with teacher on the end. Can you keep that open as well and let me know if it suddenly 00:13:57.000 |
gets covered in red? If you are somebody who's red, I'm not going to come to you now because 00:14:07.160 |
there's not enough of you to stop the class. So, it's up to you to ask on the forum or 00:14:11.880 |
on the YouTube live chat. And there's a lot of folks, luckily, who will be able to help 00:14:16.520 |
you. All right. I wanted to do a big shout out to Radek. Radek created cups.fast.ai for 00:14:29.120 |
me. I said to him last week I need a way of seeing coloured cups on the internet. And 00:14:33.720 |
he wrote it in one evening. And I also wanted to shout out that Radek just announced today 00:14:42.600 |
that he got a job at NVIDIA AI. And I wanted to say, you know, that fast.ai alumni around 00:14:48.960 |
the world very, very frequently, like every day or two, email me to say that they've 00:14:53.720 |
got their dream job. And, yeah, if you're looking for inspiration of how to get into 00:15:01.640 |
the field, I couldn't recommend anything better than checking out Radek's 00:15:07.200 |
work. And he's actually written a book about his journey. It's got a lot of tips in particular 00:15:11.680 |
about how to take advantage of fast.ai, make the most of these lessons. And so I would certainly 00:15:17.240 |
say check that out as well. And if you're here live, he's one of our TAs as well. So 00:15:21.120 |
you can say hello to him afterwards. He looks exactly like this picture here. So I mentioned 00:15:28.760 |
I spent a lot of time studying education, both for my home schooling duties and also 00:15:34.680 |
for my courses. And you'll see that there's something a bit different, very different 00:15:40.600 |
about this course, which is that we started by training a model. We didn't start by doing 00:15:45.880 |
an in-depth review of linear algebra and calculus. That's because two of my favorite writers 00:15:53.840 |
and researchers on education, Paul Lockhart and David Perkins, and many others talk about 00:15:58.760 |
how much better people learn when they learn with a context in place. So the way we learn 00:16:06.200 |
math at school where we do counting and then adding and then fractions and then decimals 00:16:12.360 |
and then blah, blah, blah. And 15 years later, we start doing the really interesting stuff 00:16:18.320 |
at grad school. That is not the way most people learn effectively. The way most people learn 00:16:24.960 |
effectively is from the way we teach sports. For example, where we show you a whole game 00:16:31.360 |
of sports, we show you how much fun it is. You go and start playing sports, simple versions 00:16:36.720 |
of them. You're not very good. And then you gradually put more and more pieces together. 00:16:41.800 |
So that's how we do deep learning. You will go into as much depth as the most sophisticated, 00:16:52.240 |
technically detailed classes you'll find later. But first, you'll learn to be very, very good 00:17:01.240 |
at actually building and deploying models. And you will learn why and how things work 00:17:09.520 |
as you need to, to get to the next level. Those of you who have spent a lot of time 00:17:14.080 |
in technical education, like if you've done a PhD or something, will find this deeply 00:17:18.080 |
uncomfortable because you'll be wanting to understand why everything works from the start. 00:17:23.440 |
Just do your best to go along with it. Those of you who haven't will find this very natural. 00:17:27.640 |
Oh, and this is Dylan Wiliam, who I mentioned before, the guy who came up with the really 00:17:32.280 |
cool coloured-cups idea. There'll be a lot of tricks that have come out of the educational research 00:17:40.520 |
literature scattered through this course. On the whole, I won't call them out, they'll 00:17:43.840 |
just be there. But maybe from time to time we'll talk about them. 00:17:47.200 |
All right, so before we start talking about how we actually built that model and how it 00:17:51.720 |
works, I guess I should convince you that I'm worth listening to. I'll try to do that 00:17:56.720 |
reasonably quickly, because I don't like tooting my own horn, but I know it's important. So 00:18:03.360 |
the first thing I'll mention about me is that me and my friend Sylvain wrote this extremely 00:18:07.520 |
popular book, Deep Learning for Coders, and that book is what this course is quite heavily 00:18:12.760 |
based on. We're not going to be using any material from the book directly, and you might 00:18:18.360 |
be surprised by that. But the reason actually is that the educational research literature 00:18:23.320 |
shows that people learn things best when they hear the same thing in multiple different 00:18:27.480 |
ways. So I want you to read the book, and you'll also see the same information presented 00:18:33.800 |
in a different way in these videos. So one of the bits of homework after each lesson 00:18:39.600 |
will be to read a chapter of the book. A lot of people like the book. Peter Norvig, Director 00:18:46.760 |
of Research at Google, loves the book. In fact, this one here: one of the best sources for a 00:18:50.880 |
programmer to become proficient in deep learning. Eric Topol loves the book. Hal Varian, Emeritus 00:18:57.640 |
Professor at Berkeley, Chief Economist at Google, likes the book. Jerome Pesenti, who 00:19:02.320 |
is the Head of AI at Facebook, likes the book. A lot of people like the book. So hopefully 00:19:08.520 |
you'll find that you like this material as well. I've spent about 30 years of my life 00:19:15.240 |
working in and around machine learning, including building a number of companies that relied 00:19:20.440 |
on it, and became the highest ranked competitor in the world on Kaggle in machine learning 00:19:27.800 |
competitions. My company Enlitic, which I founded, was the first company to specialize 00:19:33.560 |
in deep learning for medicine, and MIT Tech Review voted it one of the 50 smartest companies in 2016, 00:19:39.040 |
just above Facebook and SpaceX. I started Fast AI with Rachel Thomas, and that was quite 00:19:49.720 |
a few years ago now, but it's had a big impact on the world already. Including work we've 00:19:57.520 |
done with our students has been globally recognized, such as our win in the DAWNBench competition, 00:20:04.400 |
which showed how we could train big neural networks faster than anybody in the world, 00:20:10.080 |
and cheaper than anybody in the world. And so that was a really big step in 2018, which 00:20:18.840 |
actually made a big difference. Google started using our special approaches in their models, 00:20:25.920 |
Nvidia started optimizing their stuff using our approaches, so it made quite a big difference 00:20:32.360 |
there. I'm the inventor of the ULMFiT algorithm, which according to the Transformers book was 00:20:38.840 |
one of the two key foundations behind the modern NLP revolution. This is the paper here. And 00:20:51.040 |
actually, you know, interesting point about that, it was actually invented for a fast 00:20:57.560 |
AI course. So the first time it appeared was not actually in the journal, it was actually 00:21:06.360 |
in lesson four of the course, I think the 2016 course, if I remember correctly. And you know, 00:21:16.000 |
most importantly, of course, I've been teaching this course since version one. And this is 00:21:22.400 |
actually this, I think this is the very first version of it, which even back then was getting 00:21:26.640 |
HBR's attention. A lot of people have been watching the course, and it's been, you know, 00:21:34.080 |
fairly widely used. YouTube doesn't show likes anymore, so I have to show them to 00:21:39.720 |
you myself. You know, it's been amazing to see how, yeah, how many alumni have gone from 00:21:49.320 |
this to, you know, to really doing amazing things, and so, for example, Andrej 00:21:59.000 |
Karpathy told me that at Tesla, I think he said, pretty much everybody who joins Tesla in AI 00:22:03.920 |
is meant to do this course. And I believe at OpenAI, they told me that all the residents joining 00:22:08.720 |
there first do this course. So this course is really widely used in industry 00:22:14.600 |
and research, and people who take it have a lot of success. Okay, so that's a bit of brief 00:22:20.640 |
information about why you should hopefully get going with this. Alright, so let's get 00:22:27.000 |
back to what's what's happened here. Why are we able to create a bird recognizer in a minute 00:22:34.800 |
or two? And why couldn't we do it before? So I'm going to go back to 2012. And in 2012, 00:22:43.280 |
this was how image recognition was done. This is the computational pathologist. It was a 00:22:51.240 |
project done at Stanford, very successful, very famous project that was looking at the 00:22:55.760 |
five year survival of breast cancer patients by looking at their histopathology images 00:23:03.200 |
slides. Now, so this is like what I would call a classic machine learning approach. 00:23:09.720 |
And I spoke to the senior author of this, Daphne Koller. And I asked her why they didn't use 00:23:14.440 |
deep learning. And she said, Well, it just, you know, it wasn't really on the radar at 00:23:18.680 |
that point. So this is like a pre deep learning approach. And so the way they did this was 00:23:24.920 |
they got a big team of mathematicians and computer scientists and pathologists and so 00:23:29.040 |
forth to get together and build these ideas for features like relationships between epithelial 00:23:34.160 |
nuclear neighbors. Thousands and thousands of features, actually, they created, and each 00:23:38.760 |
one required a lot of expertise from a cross disciplinary group of experts at Stanford. 00:23:44.520 |
So this project took years, and a lot of people and a lot of code and a lot of math. And then 00:23:50.000 |
once they had all these features, they then fed them into a machine learning model, in 00:23:54.360 |
this case, logistic regression to predict survival. As I say, it's very successful, right? 00:24:00.480 |
But it's not something that I could create for you in a minute, at the start of a course. 00:24:07.200 |
The difference with neural networks is neural networks don't require us to build these features. 00:24:11.760 |
They build them for us. And so what actually happened was, in, I think it was 2013, Matt 00:24:19.880 |
Zeiler and Rob Fergus took a trained neural network and they looked inside it to see what 00:24:26.920 |
it had learned. So we don't give it features, we ask it to learn features. So when Zeiler 00:24:34.000 |
and Fergus looked inside a neural network, they looked at the actual, the weights in 00:24:41.840 |
the model, and they drew a picture of them. And this was nine of the sets of weights they 00:24:45.680 |
found. And this set of weights, for example, finds diagonal edges. This set of weights 00:24:51.520 |
finds yellow to blue gradients. And this set of weights finds red to green gradients and 00:24:57.920 |
so forth, right? And then down here are examples of some bits of photos which closely matched, 00:25:04.560 |
for example, this feature detector. And deep learning is deep because we can then take 00:25:13.480 |
these features and combine them to create more advanced features. So these are some 00:25:19.360 |
layer two features. So there's a feature, for example, that finds corners and a feature 00:25:24.540 |
that finds curves and a feature that finds circles. And here are some examples of bits 00:25:29.140 |
of pictures that the circle finder found. And so remember, with a neural net, which 00:25:35.720 |
is the basic function used in deep learning, we don't have to hand code any of these or 00:25:41.060 |
come up with any of these ideas. You just start with actually a random neural network, 00:25:47.560 |
and you feed it examples, and you have it learn to recognize things. And it turns out that 00:25:54.400 |
these are the things that it creates for itself. So you can then combine these features. And 00:26:03.600 |
when you combine these features, it creates a feature detector, for example, that finds 00:26:07.880 |
kind of repeating geometric shapes. And it creates a feature detector, for example, that 00:26:14.440 |
finds kind of really little things, which it looks like is finding the edges of flowers. 00:26:20.680 |
And this feature detector here seems to be finding words. And so the deeper you get, 00:26:28.120 |
the more sophisticated the features it can find are. And so you can imagine that trying 00:26:32.080 |
to code these things by hand would be insanely difficult, and you wouldn't know even what 00:26:38.680 |
to encode by hand. So what we're going to learn is how neural networks do this automatically. 00:26:45.520 |
But this is the key difference of why we can now do things that previously we just didn't 00:26:51.040 |
even conceive of as possible, because now we don't have to hand code the features we 00:26:56.880 |
look for. They can all be learned. Now, this is important to recognize. We're going to 00:27:06.080 |
be spending some time learning about building image-based algorithms. And image-based algorithms 00:27:14.280 |
are not just for images. And in fact, this is going to be a general theme. We're going 00:27:17.680 |
to show you some foundational techniques. But with creativity, these foundational techniques 00:27:23.840 |
can be used very widely. So for example, an image recognizer can also be used to classify 00:27:34.080 |
sounds. So this was an example from one of our students who posted on the forum and said 00:27:41.160 |
for their project, they would try classifying sounds. And so they basically took sounds 00:27:47.200 |
and created pictures from their waveforms. And then they used an image recognizer on 00:27:52.200 |
that. And they got a state of the art result, by the way. Another of our students on the 00:27:57.320 |
forum said that they did something very similar to take time series and turn them into pictures 00:28:03.200 |
and then use image classifiers. Another of our students created pictures from mouse movements 00:28:12.480 |
from users of a computer system. So the clicks became dots and the movements became lines 00:28:17.880 |
and the speed of the movement became colors. And then use that to create an image classifier. 00:28:23.640 |
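The common thread in these projects is rasterizing non-image data into a grid of pixels. Here's a minimal sketch of that idea for a 1-D time series, with toy sizes and not any student's actual code:

```python
def series_to_image(series, height=8):
    """Rasterize a list of values in [0, 1] into a height x len(series)
    grid of 0/255 pixels: one bright pixel per time step, with larger
    values mapped nearer the top row."""
    width = len(series)
    img = [[0] * width for _ in range(height)]
    for t, v in enumerate(series):
        # Map the value to a row: 0.0 -> bottom row, 1.0 -> top row.
        row = (height - 1) - int(v * (height - 1))
        img[row][t] = 255
    return img

img = series_to_image([0.0, 0.5, 1.0, 0.5])
for row in img:
    print("".join("#" if p else "." for p in row))
```

Once the data looks like this, an ordinary image classifier can be pointed at it unchanged.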
So you can see with some creativity, there's a lot of things you can do with images. There's 00:28:33.880 |
something else I wanted to point out, which is that as you saw, when we trained a real 00:28:41.720 |
working bird-recognizer image model, we didn't need lots of math. There wasn't any. We didn't 00:28:48.200 |
need lots of data. We had 200 pictures. We didn't need lots of expensive computers. We 00:28:52.680 |
just used my laptop. This is generally the case for the vast majority of deep learning 00:28:59.640 |
that you'll need in real life. There will be some math that pops up during this course, 00:29:09.560 |
but we will teach it to you as needed or we'll refer you to external resources as needed. 00:29:14.040 |
But it will just be the little bits that you actually need. The myth that deep learning 00:29:20.600 |
needs lots of data I think is mainly passed along by big companies that want to sell you 00:29:27.480 |
computers to store lots of data and to process it. We find that most real world projects 00:29:34.680 |
don't need extraordinary amounts of data at all. And as you'll see, there's actually a 00:29:40.760 |
lot of fantastic places you can do state-of-the-art work for free nowadays, which is great news. 00:29:51.080 |
One of the key reasons for this is because of something called transfer learning, which 00:29:55.080 |
we'll be learning about a lot during this course, and it's something which very few 00:30:00.080 |
people are aware of. In this course, we'll be using PyTorch. For those 00:30:08.720 |
of you who are not particularly close to the deep learning world, you might have heard 00:30:14.880 |
of TensorFlow and not of PyTorch. You might be surprised to hear that TensorFlow has been 00:30:22.960 |
dying in popularity in recent years, and PyTorch is actually growing rapidly. And in research 00:30:34.560 |
repositories amongst the top papers, TensorFlow is a tiny minority now compared to PyTorch. 00:30:44.780 |
This is also great research that's come out from Ryan O'Connor. He also discovered that 00:30:52.960 |
the majority of people who were using TensorFlow in 2018 have now shifted to PyTorch. 00:31:00.920 |
And I mention this because what people use in research is a very strong leading indicator 00:31:06.600 |
of what's going to happen in industry because this is where all the new algorithms are going 00:31:10.720 |
to come out, this is where all the papers are going to be written about. It's going 00:31:14.640 |
to be increasingly difficult to use TensorFlow. We've been using PyTorch since before it came 00:31:20.600 |
out, before the initial release, because we knew just from technical fundamentals, it 00:31:25.520 |
was far better. So this course has been using PyTorch for a long time. I will say, however, 00:31:32.080 |
that PyTorch requires a lot of hairy code for relatively simple things. This is the 00:31:37.280 |
code required to implement a particular optimizer called AdamW in plain PyTorch. I actually 00:31:43.740 |
copied this code from the PyTorch repository. So as you can see, there's a lot of it. This 00:31:51.360 |
gray bit here is the code required to do the same thing with FastAI. FastAI is a library 00:31:59.120 |
we built on top of PyTorch. This huge difference is not because PyTorch is bad, it's because 00:32:06.520 |
PyTorch is designed to be a strong foundation to build things on top of, like FastAI. When 00:32:16.180 |
you use FastAI, the library, you get access to all the power of PyTorch as well. But you 00:32:23.540 |
shouldn't be writing all this code if you only need to write this much code. The problem 00:32:29.040 |
of writing lots of code is that that's lots of things to make mistakes with, lots of things 00:32:33.320 |
to not have best practices in, lots of things to maintain. In general, we found particularly 00:32:39.960 |
with deep learning, less code is better. Particularly with FastAI, the code you don't write is code 00:32:48.520 |
that we've basically found kind of best practices for you. So when you use the code that we've 00:32:54.520 |
provided for you, you'll generally find you get better results. So FastAI has been a really 00:33:02.480 |
popular library, and it's very widely used in industry, in academia, and in teaching. 00:33:10.680 |
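As an aside, the heart of the AdamW update shown in that PyTorch code can be sketched in a few lines of plain Python. This is a simplified, single-parameter illustration (not the library implementation); the hyperparameter defaults are the commonly used ones:

```python
import math

def adamw_step(p, grad, state, lr=0.01, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=0.01):
    """One AdamW update for a single scalar parameter p.

    state carries the running gradient moments and the step count."""
    state['t'] += 1
    state['m'] = beta1 * state['m'] + (1 - beta1) * grad
    state['v'] = beta2 * state['v'] + (1 - beta2) * grad * grad
    m_hat = state['m'] / (1 - beta1 ** state['t'])  # bias-corrected momentum
    v_hat = state['v'] / (1 - beta2 ** state['t'])  # bias-corrected variance
    p = p - lr * wd * p                             # decoupled weight decay (the W in AdamW)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)

# Minimize f(p) = p**2 (gradient 2*p) from a starting point of 5.0.
p, state = 5.0, {'m': 0.0, 'v': 0.0, 't': 0}
for _ in range(1000):
    p = adamw_step(p, 2 * p, state)
print(p)  # p has been driven close to zero
```

The point of the comparison in the lesson stands: in fastai you never write this loop yourself, the library supplies it with good defaults.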
And as we go through this course, we'll be seeing more and more pure PyTorch as we get 00:33:16.300 |
deeper and deeper underneath to see exactly how things work. The paper about the FastAI 00:33:23.760 |
library won the 2020 best paper award in the journal Information. So again, you can see 00:33:29.200 |
it's a very well regarded library. Okay, so okay, we're still green. That's good. So you 00:33:41.880 |
may have noticed something interesting, which is that I'm actually running code in these 00:33:48.800 |
slides. That's because these slides are not in PowerPoint. These slides are in Jupyter 00:33:58.120 |
Notebook. Jupyter Notebook is the environment in which you will be doing most of your computing. 00:34:09.080 |
It's a web-based application, which is extremely popular and widely used in industry and in 00:34:19.200 |
academia and in teaching, and is a very, very, very powerful way to experiment and explore 00:34:26.360 |
and to build. Nowadays, I would say most people, at least most students, run Jupyter Notebooks 00:34:37.760 |
not on their own computers, particularly for data science, but on a cloud server, of which 00:34:44.220 |
there's quite a few. And as I mentioned earlier, if you go to course.fast.ai, you can see how 00:34:52.100 |
to use various different cloud servers. One I'm going to show an example of is Kaggle. 00:35:05.440 |
So Kaggle doesn't just have competitions, but it also has a cloud notebooks server. 00:35:10.380 |
And I've got quite a few examples there. So let me give you a quick example of how we 00:35:21.340 |
use Jupyter Notebooks to build stuff, to experiment, to explore. So on Kaggle, if you start with 00:35:30.480 |
somebody else's Notebook, so why don't you start with this one, Jupyter Notebook 101. 00:35:35.700 |
If it's your own Notebook, you'll see a button called edit. If it's somebody else's, that 00:35:39.080 |
button will say copy and edit. If you use somebody's Notebook that you like, make sure 00:35:45.240 |
you click the upvote button to encourage them and to help other people find it before you 00:35:49.720 |
go ahead and copy and edit. And once we're in edit mode, we can now use this Notebook. 00:35:59.300 |
And to use it, we can type in any arbitrary expression in Python and click run. And the 00:36:06.260 |
very first time we do that, it says session is starting. It's basically launching a virtual 00:36:10.460 |
computer for us to run our code. This is all free. In a sense, it's like the world's most 00:36:18.420 |
powerful calculator. It's a calculator where you have all of the capabilities of the world's, 00:36:25.500 |
I think, most popular programming language. Certainly, it and JavaScript would be the 00:36:29.260 |
top two directly at your disposal. So Python does know how to do one plus one. And so you 00:36:35.400 |
can see here, it spits out the answer. I hate clicking. I always use keyboard shortcuts. 00:36:40.580 |
So instead of clicking this little arrow, you just press Shift-Enter to do the same thing. 00:36:46.820 |
And as you can see, there's not just calculations here. There's also prose. And so Jupyter Notebooks 00:36:53.420 |
are great for explaining to the version of yourself in six months' time what on earth 00:36:59.180 |
you were doing, or to your coworkers, or to people in the open source community, or to people 00:37:03.460 |
you're blogging for, etc. And so you just type prose. And as you can see, when we create 00:37:09.820 |
a new cell, you can create a code cell, which is a cell that lets you type calculations 00:37:16.700 |
or a markdown cell, which is a cell that lets you create prose. And the prose uses formatting 00:37:26.340 |
in a little mini language called markdown. There's so many tutorials around, I won't 00:37:30.300 |
explain it to you, but it lets you do things like links and so forth. So I'll let you follow 00:37:41.100 |
through the tutorial in your own time because it really explains to you what to do. One 00:37:48.220 |
thing to point out is that sometimes you'll see me use cells with an exclamation mark 00:37:52.060 |
at the start. That's not Python. That's a bash shell command. Okay, so that's what the 00:37:58.260 |
exclamation mark means. As you can see, you can put images into notebooks. And so the 00:38:06.080 |
image I popped in here was the one showing that Jupyter won the 2017 software system 00:38:06.080 |
award, which is pretty much the biggest award there is for this kind of software. Okay, 00:38:17.420 |
so that's the basic idea of how we use notebooks. So let's have a look at how we do our, how 00:38:28.140 |
we do our bird or not bird model. One thing I always like to do when I'm using something 00:38:38.460 |
like Colab or Kaggle cloud platforms that I'm not controlling is make sure that I'm 00:38:43.420 |
using the most recent version of any software. So my first cell here is exclamation mark 00:38:49.620 |
pip install -Uq fastai: -U means upgrade and -q means quiet. So that makes sure that we 00:38:55.420 |
have the latest version of fast AI. And if you always have that at the start of your 00:38:58.940 |
notebooks, you're never going to have those awkward forum threads where you say, why 00:39:02.820 |
isn't this working? And somebody says to you, oh, you're using an old version of some software. 00:39:09.860 |
So you'll see here, this notebook is the exact thing that I was showing you at the start 00:39:16.700 |
of this lesson. So if you haven't done much Python, you might be surprised about how little 00:39:31.980 |
code there is here. And so Python is a concise but not too concise language, you'll see that 00:39:41.980 |
there's less boilerplate than some other languages you might be familiar with. And I'm also taking 00:39:47.380 |
advantage of a lot of libraries. So fast AI provides a lot of convenient things for you. 00:39:54.940 |
So I forgot to import. So to use an external library, we use import to import a symbol 00:40:09.260 |
from a library. Fast AI has a lot of libraries we provide; they generally start with fast 00:40:15.100 |
something. So for example, to make it easy to download a URL, fastdownload has download_url. 00:40:20.100 |
To make it easy to create a thumbnail, we have Image.to_thumb, and so forth. So 00:40:31.180 |
I always like to view my data at every step as I'm building a model. So that's why I first 00:40:38.740 |
of all grab one bird, and then I grab one forest photo, and I look at them to make sure they 00:40:45.780 |
look reasonable. And once I think okay, they look okay, then I go ahead and download. And 00:40:58.620 |
so you can see fast AI has a download images, where you just provide a list of URLs. So 00:41:04.820 |
that's how easy it is. And it does that in parallel. So it does that, you know, surprisingly 00:41:10.740 |
quickly. One other fast AI thing I'm using here is resize images. You generally will 00:41:18.260 |
find that for computer vision algorithms, you don't need particularly big images. So 00:41:23.660 |
I'm resizing these to a maximum side length of 400. Because it's actually much faster. 00:41:30.820 |
GPUs are so quick that, for big images, most of the time can be taken up just opening them. 00:41:37.740 |
The neural net itself often takes less time. So that's another good reason to make them 00:41:41.980 |
smaller. Okay. So the main thing I wanted to tell you about was this data block command. 00:41:51.420 |
So the data block is the key thing that you're going to want to get familiar with, as deep 00:41:56.660 |
learning practitioners at the start of your journey. Because the main thing you're going 00:42:02.820 |
to be trying to figure out is how do I get this data into my model? Now that might surprise 00:42:09.340 |
you. You might be thinking we should be spending all of our time talking about neural network 00:42:13.300 |
architectures and matrix multiplication and gradients and stuff like that. The truth is 00:42:20.780 |
very little of that comes up in practice. And the reason is that at this point, the deep 00:42:28.380 |
learning community has found a reasonably small number of types of model that work for 00:42:36.780 |
nearly all the main applications you'll need. And fast AI will create the right type of 00:42:42.840 |
model for you the vast majority of the time. So all of that stuff about tweaking neural 00:42:50.800 |
network architectures and stuff, I mean, we'll get to it eventually in this course. But you 00:42:56.480 |
might be surprised to discover that it almost never comes up. Kind of like if you ever did 00:43:02.260 |
like a computer science course or something, and they spent all this time on the details 00:43:06.860 |
of compilers and operating systems, and then you get to the real world and you never use 00:43:10.620 |
it again. So this course is called practical deep learning. And so we're going to focus 00:43:15.340 |
on the stuff that is practically important. Okay, so our images are finished downloading, 00:43:22.740 |
and two of them were broken, so we just deleted them. Another thing you'll note, by the way, 00:43:31.620 |
if you're a keen software engineer is I tend to use a lot of functional style in my programs 00:43:39.340 |
I find, for the kind of work I do, that a functional style works very well. 00:43:47.220 |
A lot of people in Python are less familiar with it; it comes more from other languages. 00:43:49.900 |
So yeah, that's why you'll see me using stuff like map and stuff 00:43:54.340 |
quite a lot. Alright, so a data block is the key thing you need to know about if you're 00:44:01.120 |
going to know how to use different kinds of data sets. And so these are all of the things 00:44:06.620 |
basically that you'll be providing. And so what we did when we designed the data block 00:44:10.900 |
was we actually looked and said, okay, over hundreds of projects, what are all the things 00:44:19.080 |
that change from project to project to get the data into the right shape. And we realized 00:44:24.340 |
we could basically split it down into these five things. So the first thing that we tell 00:44:30.460 |
fast AI is what kind of input do we have. And so there are lots of blocks in 00:44:36.500 |
fast AI for different kinds of input. So here we said, oh, the input is an image. What kind 00:44:41.340 |
of output is there? What kind of label? The outputs are category. So that means it's one 00:44:45.580 |
of a number of possibilities. So that's enough for fast AI to know what kind of model to 00:44:52.980 |
build for you. So what are the items in this model? What am I actually going to be looking 00:44:58.060 |
at to train from? This is a function. In fact, you might have noticed if you were 00:45:03.800 |
looking carefully that we use this function here. It's a function which returns a list 00:45:11.940 |
of all of the image files in a path based on extension. So every time it's going to 00:45:17.420 |
try and find out what things to train from, it's going to use that function. In this case, 00:45:21.420 |
we'll get a list of image files. Now, something we'll talk about shortly is that it's critical 00:45:27.780 |
that you put aside some data for testing the accuracy of your model. And that's called 00:45:32.860 |
a validation set. It's so critical that fast AI won't let you train a model without one. 00:45:39.860 |
So you actually have to tell it how to create a validation set, how to set aside some data. 00:45:45.020 |
And in this case, we say randomly set aside 20% of the data. Okay, next question, then 00:45:55.740 |
you have to tell fast AI is how do we know the correct label of a photo? How do we know 00:46:02.400 |
if it's a bird photo or a forest photo? And this is another function. And this function 00:46:09.540 |
simply returns the parent folder of a path. And so in this case, we saved our images into 00:46:20.580 |
either forest or bird. So that's where the labels are going to come from. 00:46:26.980 |
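Those two pieces, a random splitter and a parent-folder labelling function, can be sketched in plain Python. The file names below are hypothetical stand-ins for what a real list of image files would look like, and the splitter is a simplified version of the idea, not fastai's implementation:

```python
import random

def random_splitter(items, valid_pct=0.2, seed=42):
    """Randomly set aside valid_pct of the items as a validation set."""
    rng = random.Random(seed)
    idxs = list(range(len(items)))
    rng.shuffle(idxs)
    n_valid = int(len(items) * valid_pct)
    return idxs[n_valid:], idxs[:n_valid]   # (train indices, valid indices)

def parent_label(path):
    """Label an item by its parent folder name, e.g. 'bird/001.jpg' -> 'bird'."""
    return path.split('/')[-2]

# 16 hypothetical files saved into a bird folder and a forest folder
files = [f'bird/{i}.jpg' for i in range(8)] + [f'forest/{i}.jpg' for i in range(8)]
train_idx, valid_idx = random_splitter(files)
print(len(train_idx), len(valid_idx))       # 13 3
print(parent_label(files[train_idx[0]]))    # 'bird' or 'forest'
```

The seed makes the random split reproducible, so the same 20% is held out every run.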
And then finally, most computer vision architectures need all of your inputs as you train to be 00:46:35.260 |
the same size. So item transforms are all of the bits of code that are going to run 00:46:43.100 |
on every item, on every image in this case. And we're saying, okay, we want you to resize 00:46:49.300 |
each of them to being 192 by 192 pixels. There's two ways you can resize, you can either crop 00:46:56.580 |
out a piece in the middle, or you can squish it. And so we're saying, squish it. So that's 00:47:06.100 |
the data block, that's all that you need. And from there, we create an important class 00:47:11.660 |
called data loaders. Data loaders are the things that actually PyTorch iterates through 00:47:18.220 |
to grab a bunch of your data at a time. The way it can do it so fast is by using a GPU, 00:47:24.580 |
which is something that can do thousands of things at the same time. And that means it 00:47:28.620 |
needs thousands of things to do at the same time. So a data loader will feed the training 00:47:34.820 |
algorithm with a bunch of your images at once. In fact, we don't call it a bunch, we call 00:47:42.220 |
it a batch, or a mini batch. And so when we say show batch, that's actually a very specific 00:47:53.420 |
word in deep learning, it's saying show me an example of a batch of data that you would 00:47:58.740 |
be passing into the model. And so you can see show batch gives you tells you two things, 00:48:03.860 |
the input, which is the picture, and the label. And remember, the label came by calling that 00:48:11.900 |
function. So when you come to building your own models, you'll be wanting to know what 00:48:20.940 |
kind of splitters are there and what kinds of labeling functions are there and so forth. 00:48:24.460 |
And so docs.fast.ai is where you go 00:48:35.540 |
to get that information. Often the best place to go is the tutorials. So for example, here's 00:48:43.060 |
a whole data block tutorial. And there's lots and lots of examples. So hopefully you can 00:48:48.900 |
start out by finding something that's similar to what you want to do and see how we did 00:48:54.860 |
it. But then of course, there's also the underlying API information. So here's data blocks. OK. 00:49:07.540 |
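The mini-batches mentioned above, the chunks of data a data loader feeds the model, come down to a very simple idea, sketched here for illustration (a real data loader also shuffles the training set and moves each batch to the GPU):

```python
def batches(items, bs):
    """Yield successive mini-batches of size bs; the last may be smaller."""
    for i in range(0, len(items), bs):
        yield items[i:i + bs]

print([len(b) for b in batches(list(range(10)), bs=4)])  # [4, 4, 2]
```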
How are we doing? Still doing good. All right. So at the end of all this, we've got an object 00:49:19.380 |
called dls. It stands for data loaders. And that contains iterators that PyTorch can run 00:49:27.700 |
through to grab batches of randomly split out training images to train the model with 00:49:35.260 |
and validation images to test the model with. So now we need a model. The critical concept 00:49:43.660 |
here in fast.ai is called a learner. A learner is something which combines a model, that 00:49:50.660 |
is the actual neural network function we'll be training, and the data we use to train 00:49:55.220 |
it with. And that's why you have to pass in two things. The data, which is the data loaders 00:50:01.500 |
object, and a model. And so the model is going to be the actual neural network function that 00:50:11.580 |
you want to pass in. And as I said, there's a relatively small number that basically work 00:50:16.620 |
for the vast majority of things you do. If you pass in just a bare symbol like this, 00:50:24.020 |
it's going to be one of fast.ai's built-in models. But what's particularly interesting 00:50:30.100 |
is that we integrate a wonderful library by Ross Wightman called timm, the PyTorch image 00:50:35.820 |
models, which is the largest collection of computer vision models in the world. And at 00:50:40.580 |
this point, fast.ai is the first and only framework to integrate this. So you can use 00:50:45.540 |
any one of the PyTorch image models. And one of our students, Aman Arora, was kind enough 00:50:51.740 |
to create this fantastic documentation where you can find out all about the different models. 00:51:03.140 |
And if we click on here, you can get lots and lots of information about all the different 00:51:08.380 |
models that Ross has provided. Having said that, the model family called ResNet are probably 00:51:19.680 |
going to be fine for nearly all the things you want to do. But it is fun to try different 00:51:23.900 |
models out. So you can type in any string here to use any one of those other models. 00:51:33.980 |
Okay, so if we run that, let's see what happens. Okay, so this is interesting. So when I ran 00:51:41.820 |
this, so remember on Kaggle, it's creating a new virtual computer for us. So it doesn't 00:51:47.980 |
really have anything ready to go. So when I ran this, the first thing it did was it 00:51:51.260 |
said downloading resnet18.pth. What's that? Well, the reason we can do this so fast is 00:52:01.100 |
because somebody else has already trained this model to recognize over 1 million images of 00:52:08.980 |
over 1,000 different types, something called the ImageNet dataset. And they then made 00:52:16.140 |
those weights available, those parameters available on the internet for anybody to download. 00:52:22.780 |
By default, on fast.ai, when you ask for a model, we will download those weights for 00:52:29.380 |
you so that you don't start with a random network that can't do anything. You actually 00:52:34.900 |
start with a network that can do an awful lot. And so then something that fast.ai has 00:52:40.580 |
that's unique is this fine-tune method, which what it does is it takes those pre-trained 00:52:46.500 |
weights we downloaded for you and it adjusts them in a really carefully controlled way 00:52:52.660 |
to just teach the model the differences between your dataset and what it was originally trained 00:53:00.060 |
for. That's called fine-tuning. Hence the name. So that's why you'll see this downloading 00:53:07.500 |
happen first. And so as you can see at the end of it, this is the error rate here. After 00:53:13.980 |
a few seconds, it's 100% accurate. So we now have a learner. And this learner has started 00:53:26.140 |
with a pre-trained model. It's been fine-tuned for the purpose of recognizing bird pictures 00:53:32.220 |
from forest pictures. So you can now call .predict on it. And .predict, you pass in 00:53:43.140 |
an image. And so this is how you would then deploy your model. So in the code, you have 00:53:51.020 |
whatever it needs to do. So in this particular case, this person had some reason that he 00:53:59.300 |
needs the app to check whether they're in a national park and whether it's a photo of 00:54:03.220 |
a bird. So at the bit where they need to know if it's a photo of a bird, it would just call 00:54:08.380 |
this one line of code, learn.predict. And so that's going to return whether it's a bird 00:54:15.660 |
or not as a string, whether it's a bird or not as an integer, and the probability that 00:54:20.780 |
it's a non-bird or a bird. And so that's why we can print these things out. 00:54:29.100 |
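Those three return values, the label string, the label index, and the per-class probabilities, can be mimicked in plain Python. The vocabulary and probabilities below are made up for illustration; this is a sketch of the idea, not fastai's predict:

```python
def predict_from_probs(probs, vocab):
    """Mimic a classifier's predict output: (label, index, probabilities).

    The predicted class is simply the one with the highest probability."""
    idx = max(range(len(probs)), key=probs.__getitem__)
    return vocab[idx], idx, probs

label, idx, probs = predict_from_probs([0.98, 0.02], ['bird', 'forest'])
print(label, idx)  # bird 0
```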
So that's how we can create a computer vision model. What about other kinds of models? There's 00:54:39.920 |
a lot more in the world than just computer vision, a lot more than just image recognition. 00:54:45.220 |
Or even within computer vision, there's a lot more than just image recognition. For 00:54:51.220 |
example, there's segmentation. So segmentation, maybe the best way to explain segmentation 00:55:05.940 |
is to show you the result of this model. Segmentation is where we take photos, in this case of road 00:55:14.700 |
scenes, and we color in every pixel according to what is it. So in this case, we've got 00:55:22.300 |
brown as cars, blue as fences, I guess, red as buildings, and so on. And on the left here are 00:55:32.780 |
some photos that somebody has already gone through and classified every pixel of every 00:55:38.060 |
one of these images according to what that pixel is a pixel of. And then on the right 00:55:45.200 |
is what our model is guessing. And as you can see, it's getting a lot of the pixels 00:55:52.580 |
correct, and some of them it's getting wrong. It's actually amazing how many it's getting 00:55:58.900 |
correct because this particular model I trained in about 20 seconds using a tiny, tiny, tiny 00:56:15.260 |
amount of data. So again, you would think this would be a particularly challenging problem 00:56:22.300 |
to solve, but it took about 20 seconds of training to solve it not amazingly well, but 00:56:29.260 |
pretty well. If I'd trained it for another two minutes, it'd probably be close to perfect. 00:56:34.940 |
So this is called segmentation. Now, you'll see that there's very, very little code required, 00:56:45.860 |
and the steps are actually going to look quite familiar. In fact, in this case, we're using 00:56:50.460 |
an even simpler approach. Now, earlier on, we used data blocks. Data blocks are a kind 00:56:56.940 |
of intermediate level, very flexible approach that you can take to handling almost any kind 00:57:05.740 |
of data. But for the kinds of data that occur a lot, you can use these special data loaders 00:57:11.500 |
classes, which kind of lets you use even less code. So in this case, to create data loaders 00:57:17.140 |
for segmentation, you can just say, okay, I'm going to pass you in a function for labeling. 00:57:24.220 |
And you can see here, it's got pretty similar things that we pass in to what we passed in 00:57:29.340 |
for data blocks before. So our file names is getImageFiles again, and then our label 00:57:35.620 |
function is something that grabs this path. And the codes, so what each code 00:57:48.820 |
means, come from this text file. But you can see the basic information we're 00:57:55.340 |
providing is very, very similar, regardless of whether we're doing segmentation or object 00:58:01.700 |
recognition. And then the next steps are pretty much the same. We create a learner for segmentation. 00:58:06.460 |
We create something called a unet learner, which we'll learn about later. And then again, 00:58:10.740 |
we call fine-tune. So that is it. And that's how we create a segmentation model. What about 00:58:19.580 |
stepping away from computer vision? So perhaps the most widely used kind of model used in 00:58:25.820 |
industry is tabular analysis. So taking things like spreadsheets and database tables and 00:58:30.420 |
trying to predict columns of those. So in tabular analysis, it really looks very similar 00:58:42.780 |
to what we've seen already. We grab some data, and you'll see when I call this untar data, 00:58:49.140 |
this is the thing in Fast.ai that downloads some data and decompresses it for you. And 00:58:53.220 |
there's a whole lot of URLs provided by Fast.ai for all the kind of common data sets that 00:58:59.220 |
you might want to use, all the ones that are in the book, or lots of data sets that are 00:59:03.460 |
kind of widely used in learning and research. So that makes life nice and easy for you. 00:59:08.940 |
So again, we're going to create data loaders, but this time it's tabular data loaders. But 00:59:12.700 |
we provide pretty similar kind of information to what we have before. A couple of new things. 00:59:17.900 |
We have to tell it which of the columns are categorical. So they can only take one of 00:59:22.220 |
a few values, and which ones are continuous. So they can take basically any real number. 00:59:30.020 |
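A rough sketch of that distinction: treat a column as categorical if it holds strings or only a few distinct values, and continuous otherwise. The rows and the cardinality threshold below are made up for illustration; real tabular loaders let you specify the columns explicitly:

```python
def split_columns(rows, max_card=5):
    """Guess categorical vs continuous columns from a list of row dicts.

    A column is categorical if it contains strings or has at most
    max_card distinct values; otherwise it is continuous."""
    cats, conts = [], []
    for col in rows[0]:
        values = [r[col] for r in rows]
        if any(isinstance(v, str) for v in values) or len(set(values)) <= max_card:
            cats.append(col)
        else:
            conts.append(col)
    return cats, conts

# hypothetical rows resembling the adult income dataset
rows = [{'workclass': 'Private', 'age': 25 + i} for i in range(10)]
print(split_columns(rows))  # (['workclass'], ['age'])
```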
And then again, we can use the exact same show batch that we've seen before to see the 00:59:35.740 |
data. And so Fast.ai uses a lot of something called type dispatch, which is a system that's 00:59:43.740 |
particularly popular in a language called Julia, to basically automatically do the right 00:59:49.240 |
thing for your data, regardless of what kind of data it is. So if you call show batch 00:59:54.240 |
on something, you should get back something useful, regardless of what kind of information 00:59:58.700 |
you provided. So for a table, it shows you the information in that table. This particular 01:00:05.980 |
data set is a data set of whether people have less than $50,000 or more than $50,000 in 01:00:14.260 |
salary for different districts based on demographic information in each district. So to build a 01:00:24.740 |
model for that data loaders, we do, as always, something_learner. In this case, it's a tabular 01:00:33.700 |
learner. Now this time we don't say fine-tune. We say fit, specifically fit one cycle. That's 01:00:40.940 |
because for tabular models, there's not generally going to be a pre-trained model that already 01:00:45.300 |
does something like what you want, because every table of data is very different, whereas 01:00:51.560 |
pictures often have a similar theme. They're all pictures. They all have the same kind 01:00:55.860 |
of general idea of what pictures are. So that's why it generally doesn't make too much sense 01:01:02.260 |
to fine-tune a tabular model. So instead, you just fit. So there's one difference there. 01:01:08.060 |
I'll show another example. Okay, so collaborative filtering. Collaborative filtering is the 01:01:21.520 |
basis of most recommendation systems today. It's a system where we basically take data 01:01:26.940 |
set that says which users liked which products or which users used which products, and then 01:01:36.340 |
we use that to guess what other products those users might like based on finding similar users 01:01:42.620 |
and what those similar users liked. The interesting thing about collaborative filtering is that 01:01:47.020 |
when we say similar users, we're not referring to similar demographically, but similar in 01:01:53.500 |
the sense of people who liked the same kinds of products. So for example, if you use any 01:02:02.560 |
of the music systems like Spotify or Apple Music or whatever, it'll ask you first like 01:02:09.020 |
what's a few pieces of music you like, and you tell it. And then it says, okay, well, 01:02:15.380 |
maybe let's start playing this music for you. And that's how it works. It uses collaborative 01:02:20.140 |
filtering. So we can create a collaborative filtering data loaders in exactly the same 01:02:28.620 |
way that we're used to by downloading and decompressing some data, create our collab 01:02:34.340 |
data loaders. In this case, we can just say from CSV and pass in a CSV. And this is what 01:02:39.700 |
collaborative filtering data looks like. It's going to have, generally speaking, a user 01:02:43.500 |
ID, some kind of product ID, in this case, a movie, and a rating. So in this case, this 01:02:52.900 |
user gave this movie a rating of 3.5 out of 5. And so again, you can see show batch, right? 01:03:00.420 |
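The "similar users" idea described above comes down to comparing learned taste vectors: a dot product between a user's vector and a movie's vector scores how well they match. The two-factor vectors below are made up for illustration; a real model learns many factors from the ratings:

```python
def score(user, movie):
    """Dot product of latent factors: higher means a better predicted match."""
    return sum(u * m for u, m in zip(user, movie))

# Hypothetical learned factors, e.g. (likes action, likes romance):
alice = [0.9, 0.1]
carol = [0.1, 0.9]
action_movie = [1.0, 0.0]

print(score(alice, action_movie), score(carol, action_movie))  # 0.9 0.1
```

Alice's vector lines up with the action movie's, so her predicted rating is higher, which is exactly "people who liked the same kinds of products" expressed numerically.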
So use show batch, you should get back some useful visualization of your data regardless 01:03:05.420 |
of what kind of data it is. And so again, we create a learner. This time it's a collaborative 01:03:15.340 |
filtering learner, and you pass in your data. In this case, we give it one extra piece of 01:03:22.340 |
information, which is because this is not predicting a category, but it's predicting 01:03:26.980 |
a real number, we tell it what's the possible range. The actual range is 1 to 5. But for 01:03:34.800 |
reasons you'll learn about later, it's a good idea to actually go from a little bit lower 01:03:38.820 |
than the possible minimum to a little bit higher. That's why I say 0.5 to 5.5. And then 01:03:45.260 |
fine-tune. Now again, we don't really need to fine-tune here because there's not really 01:03:49.060 |
such a thing as a pre-trained collaborative filtering model. We could just say fit or 01:03:53.140 |
fit one cycle. But actually fine-tune works fine as well. So after we train it for a while, 01:04:01.060 |
this here is the mean squared error. So it's basically that on average, how far off are 01:04:07.080 |
we for the validation set. And you can see as we train, and it's literally so fast, it's 01:04:13.060 |
less than a second each epoch, that error goes down and down. And for any kind of fast AI 01:04:20.740 |
model, you can always call show results 01:04:29.820 |
and get something sensible. So in this case, it's going to show a few examples of users 01:04:33.140 |
and movies. Here's the actual rating that user gave that movie, and here's the rating 01:04:38.420 |
that the model predicted. Okay, so apparently a lot of people on the forum are asking how 01:04:45.080 |
I'm turning this notebook into a presentation. So I'd be delighted to show you because I'm 01:04:53.020 |
very pleased that these people made this thing for free for us to use. It's called Rise. 01:04:59.700 |
And all I do is it's a notebook extension. And in your notebook, it gives you an extra 01:05:07.020 |
little thing on the side where you say which things are slides or which things are fragments. 01:05:12.500 |
And a fragment just means a piece that gets added to the current slide. So if I do that, you'll 01:05:18.460 |
see it starts with a slide, and then the fragment gets added in. Yeah, that's about all there is 01:05:26.180 |
to it, actually. It's pretty great. And it's very well documented. I'll just mention, what 01:05:32.420 |
do I make with Jupyter Notebooks? This entire book was written entirely in Jupyter Notebooks. 01:05:47.420 |
Here are the notebooks. So if you go to the fastai fastbook repo, you can read the whole 01:05:54.740 |
book. And because it's all in notebooks, every time we say here's how you create this plot 01:06:01.660 |
or here's how you train this model, you can actually create the plot or you can actually 01:06:05.180 |
train the model because it's all notebooks. The entire fastai library is actually written 01:06:17.580 |
in notebooks. So you might be surprised to discover that if you go to fastai/fastai, 01:06:27.180 |
the source code for the entire library is notebooks. And so the nice thing about this 01:06:40.140 |
is that the source code for the fastai library has actual pictures of the actual things that 01:06:46.020 |
we're building, for example. What else have we done with notebooks? Oh, blogging. I love 01:06:58.700 |
blogging with notebooks because when I want to explain something, I just write the code 01:07:04.820 |
and you can just see the outputs. And it all just works. Another thing you might be surprised 01:07:14.140 |
by is all of our tests and continuous integration are also all in notebooks. So every time we 01:07:20.920 |
change one of our notebooks, every time we change one of our notebooks, hundreds of tests 01:07:33.020 |
get run automatically in parallel. And if there's any issues, we will find out about 01:07:39.700 |
it. So notebooks are great. And Rise is a really nice way to do slides in notebooks. 01:07:51.180 |
All right. So what can deep learning do at present? We're still scratching the tip of 01:08:02.300 |
the iceberg, even though it's a pretty well-hyped, heavily marketed technology at this point. 01:08:08.180 |
When we started in 2014 or so, not many people were talking about deep learning. And really, 01:08:18.860 |
there was no accessible way to get started with it. There were no pre-trained models 01:08:22.380 |
you could download. There was just starting to appear some of the first open source software 01:08:30.700 |
that would run on GPUs. But despite the fact that today there's a lot of people talking 01:08:36.740 |
about deep learning, we're just scratching the surface. Every time pretty much somebody 01:08:40.840 |
says to me, "I work in domain X and I thought I might try deep learning out to see if it 01:08:45.700 |
can help." When I ask them a few months later, "How did it go?", they nearly 01:08:50.180 |
always say, "Wow, we just broke the state-of-the-art results in our field." So when I say these 01:08:55.540 |
are things that it's currently state-of-the-art for, these are kind of the ones that people 01:08:59.180 |
have tried so far. But still, most things haven't been tried. So in NLP, deep learning 01:09:04.700 |
is the state-of-the-art method in all these kinds of things and a lot more. Computer vision, 01:09:14.660 |
medicine, biology, recommendation systems, playing games, robotics. I've tried elsewhere 01:09:28.580 |
to make bigger lists and I just end up with pages and pages and pages. Generally speaking, 01:09:37.860 |
if it's something that a human can do reasonably quickly, like look at a Go board and decide 01:09:47.420 |
if it looks like a good Go board or not, even if it needs to be an expert human, then that's 01:09:53.820 |
probably something that deep learning will be pretty good at. If it's something that 01:09:59.140 |
takes a lot of logical thought processes over an extended period of time, particularly if 01:10:05.700 |
it's not based on much data, maybe not, like who's going to win the next election or something 01:10:13.140 |
like that. That'd be kind of broadly how I would try to decide: is your problem a good 01:10:17.540 |
fit for deep learning or not. It's taken a long time to get to this point. Yes, 01:10:26.700 |
deep learning is incredibly powerful now, but it's taken decades of work. This was the 01:10:32.340 |
first neural network. Remember, neural networks are the basis of deep learning. This was back 01:10:37.140 |
in 1957. The basic ideas have not changed much at all, but we do have things like GPUs now 01:10:51.320 |
and solid state drives and stuff like that. Of course, much more data just is available 01:10:56.980 |
now, but this has been decades of really hard work by a lot of people to get to this point. 01:11:11.700 |
Let's take a step back and talk about what's going on in these models. I'm going to describe 01:11:23.660 |
the basic idea of machine learning, largely as it was described by Arthur Samuel in the 01:11:30.140 |
late '50s when it was invented. I'm going to do it with these graphs, which, by the way, 01:11:40.500 |
you might find fun. These graphs are themselves created with Jupyter Notebooks. These are 01:11:54.340 |
Graphviz descriptions that get turned into these diagrams. There's a little sneak 01:11:59.540 |
peek behind the scenes for you. Let's start with a graph of what a normal 01:12:06.420 |
program looks like. In the pre-deep learning, machine learning days, you still have inputs 01:12:13.020 |
and you still have results. Then you code a program in the middle, which is a bunch 01:12:18.060 |
of conditionals and loops and setting variables and blah, blah, blah. A machine learning model 01:12:25.620 |
doesn't look that different, but the program has been replaced with something called a 01:12:36.860 |
model. We don't just have inputs now. We now also have weights, which are also called parameters. 01:12:43.780 |
The key thing is this. The model is not anymore a bunch of conditionals and loops and things. 01:12:51.060 |
It's a mathematical function. In the case of a neural network, it's a mathematical function 01:12:55.500 |
that takes the inputs, multiplies them together by one set of weights, and adds them up. It 01:13:03.420 |
does that again for a second set of weights and adds them up. It does it again for a third 01:13:06.660 |
set of weights and adds them up and so forth. It then takes all the negative numbers and 01:13:11.860 |
replaces them with zeros. Then it takes those as inputs to a next layer. It does the same 01:13:18.700 |
thing, multiplies them a bunch of times and adds them up. It does that a few times. That's 01:13:24.220 |
called a neural network. The model, therefore, is not going to do anything useful unless 01:13:32.780 |
these weights are very carefully chosen. The way it works is that we actually start out 01:13:39.140 |
with these weights as being random. Initially, this thing doesn't do anything useful at all. 01:13:53.020 |
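The forward pass just described (multiply by a set of weights, add up, zero the negatives, repeat) can be sketched in a few lines of NumPy. This is a toy illustration with made-up sizes, not the actual fastai code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random (untrained) weights for a tiny two-layer network.
w1 = rng.normal(size=(3, 4))   # first set of weights
w2 = rng.normal(size=(4, 2))   # second set of weights

def relu(x):
    # Take all the negative numbers and replace them with zeros.
    return np.maximum(x, 0)

def model(inputs, w1, w2):
    # Multiply the inputs by one set of weights and add them up (a matrix
    # multiply does exactly that), zero the negatives, then repeat with
    # the next set of weights. That's the whole forward pass.
    hidden = relu(inputs @ w1)
    return hidden @ w2

x = np.array([1.0, 2.0, 3.0])
print(model(x, w1, w2))   # meaningless outputs, because the weights are random
```

With random weights the outputs mean nothing yet; the training procedure described next is what makes them useful.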
What we do, the way Arthur Samuel described it back in the late 50s, the inventor of machine 01:13:58.100 |
learning, is he said, "Okay, let's take the inputs and the weights, put them through our 01:14:02.860 |
model." He wasn't talking particularly about neural networks. He's just like, "Whatever 01:14:06.380 |
model you like. Get the results, and then let's decide how good they are." If, for example, 01:14:16.660 |
we're trying to decide, "Is this a picture of a bird?" The model, which initially 01:14:22.300 |
is random, says, "This isn't a bird." Actually, it is a bird, so we would say, "Oh, you're wrong." 01:14:28.020 |
We then calculate the loss. The loss is a number that says, "How good were the results?" 01:14:35.740 |
That's all pretty straightforward. We could, for example, say, "Oh, what's the accuracy?" 01:14:38.860 |
We could look at 100 photos and say, "Which percentage of them did it get right?" No worries. 01:14:45.840 |
Now the critical step is this arrow. We need a way of updating the weights that is coming 01:14:53.220 |
up with a new set of weights that are a bit better than the previous set. By a bit better, 01:15:00.940 |
we mean it should make the loss get a little bit better. We've got this number that says, 01:15:06.980 |
"How good is our model?" Initially, it's terrible, right? It's random. We need some mechanism 01:15:14.140 |
of making it a little bit better. If we can just do that one thing, then we just need 01:15:19.740 |
to iterate this a few times because each time we put in some more inputs and put in our 01:15:25.220 |
weights and get our loss and use it to make it a little bit better, then if we make it 01:15:29.820 |
a little bit better enough times, eventually it's going to get good, assuming that our 01:15:35.820 |
model is flexible enough to represent the thing we want to do. 01:15:40.340 |
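Samuel's loop (put inputs and weights through the model, measure the loss, nudge the weights so the loss gets a little better, repeat) can be sketched end to end. This is a hypothetical one-input model, chosen so the weight update is easy to write by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data where the "right answer" is y = 3x + 1 plus a little noise.
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 1 + 0.05 * rng.normal(size=100)

# Start with random weights; initially the model is terrible.
w, b = rng.normal(), rng.normal()

for step in range(200):
    preds = w * x + b                    # inputs + weights -> results
    loss = np.mean((preds - y) ** 2)     # a number saying how good the results are
    # The critical arrow: change each weight so the loss gets a little
    # bit better (here, one step of gradient descent).
    grad_w = np.mean(2 * (preds - y) * x)
    grad_b = np.mean(2 * (preds - y))
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

print(w, b)   # close to 3 and 1 after enough iterations
```

Each pass through the loop makes the loss a little better, and iterating enough times recovers the underlying relationship, which is exactly the mechanism described above.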
Now remember what I told you earlier about what a neural network is, which is basically 01:15:44.700 |
multiplying things together and adding them up and replacing the negatives with zeros, 01:15:49.420 |
and you do that a few times? That is, provably, an infinitely flexible function. It actually 01:15:56.780 |
turns out that that incredibly simple sequence of steps, if you repeat it a few times and 01:16:03.580 |
you do enough of them, can compute any computable function. Something like generating an artwork 01:16:14.820 |
based off somebody's Twitter bio is an example of a computable function, and translating English 01:16:24.140 |
to Chinese is an example of a computable function. They're not the kinds of normal functions 01:16:30.780 |
you do in Year 8 math, but they are computable functions. Therefore, if we can just create 01:16:38.660 |
this step and use the neural network as our model, then we're good to go. In theory, we 01:16:48.140 |
can solve anything given enough time and enough data. That's exactly what we do. 01:16:59.860 |
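To make the "flexible enough" claim concrete, here is a toy sketch (my own illustration, not course code) that trains the same multiply-add-and-ReLU network, with the gradients worked out by hand, to fit a sine curve, something no straight line can do:

```python
import numpy as np

rng = np.random.default_rng(1)

# A target that is clearly not linear.
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x)

# A tiny two-layer network: multiply-add, zero the negatives, multiply-add again.
h = 16
w1 = rng.normal(size=(1, h))
b1 = rng.normal(size=h)            # random biases spread the ReLU kinks over the input range
w2 = rng.normal(size=(h, 1)) / np.sqrt(h)
b2 = np.zeros(1)

lr = 0.01
for step in range(4000):
    z = x @ w1 + b1                # first multiply-and-add
    a = np.maximum(z, 0)           # replace the negatives with zeros
    preds = a @ w2 + b2            # second multiply-and-add
    err = preds - y
    loss = float(np.mean(err ** 2))
    # Gradients derived by hand; libraries like PyTorch compute these automatically.
    g_preds = 2 * err / len(x)
    g_w2 = a.T @ g_preds; g_b2 = g_preds.sum(axis=0)
    g_a = g_preds @ w2.T
    g_z = g_a * (z > 0)
    g_w1 = x.T @ g_z; g_b1 = g_z.sum(axis=0)
    w1 -= lr * g_w1; b1 -= lr * g_b1
    w2 -= lr * g_w2; b2 -= lr * g_b2

print(loss)   # far below what the best straight-line fit to sin(x) could reach
```

The function family never changes; only the weights do, yet the same simple sequence of steps bends itself around a curve a linear model never could.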
Once we've finished that training procedure, we don't need the loss anymore. Even the weights 01:17:07.980 |
themselves, we can integrate them into the model. We finish changing them, so we can 01:17:12.100 |
just say that's now fixed. Once we've done that, we now have something which takes inputs, 01:17:18.340 |
puts them through a model, and gives us results. It looks exactly like our original idea of 01:17:25.460 |
a program. That's why we can do what I described earlier. That is, once we've got that learn.predict 01:17:32.900 |
for our bird recognizer, we can insert it into any piece of computer code. Once we've 01:17:38.220 |
got a trained model as just another piece of code, we can call with some inputs and 01:17:43.900 |
get some outputs. Deploying machine learning models in practice can come with a lot of 01:17:55.820 |
little tricky details, but the basic idea in your code is that you're just going to 01:17:59.500 |
have a line of code that says learn.predict, and then you just fit it in with all the rest 01:18:04.020 |
of your code in the usual way. This is why, because a trained model is just another thing 01:18:08.960 |
that maps inputs to results. As we come to wrap up this first lesson, 01:18:25.860 |
for those of you that are already familiar with notebooks and Python, this is going to 01:18:32.140 |
be pretty easy for you. You're just going to be using some stuff that you're already 01:18:37.260 |
familiar with and some slightly new libraries. For those of you who are not familiar with 01:18:41.260 |
Python, you're biting into a big thing here. There's obviously a lot you're going to have 01:18:47.980 |
to learn. To be clear, I'm not going to be teaching Python in this course, but we do 01:18:55.300 |
have links to great Python resources in the forum, so check out that thread. Regardless 01:19:04.100 |
of where you're at, the most important thing is to experiment. Experimenting could be as 01:19:13.500 |
simple as just running those Kaggle notebooks that I've shown you just to see them run. 01:19:21.620 |
You could try changing things a little bit. I'd really love you to try doing the bird 01:19:26.100 |
or forest exercise, but come up with something else. Maybe try to use three or four categories 01:19:31.140 |
rather than two. Have a think about something that you think would be fun to try. Depending 01:19:39.060 |
on where you're at, push yourself a little bit, but not too much. Make sure you get something 01:19:44.580 |
finished before the next lesson. Most importantly, read chapter one of the book. It's got much 01:19:51.940 |
the same stuff that we've seen today, but presented in a slightly different way. Then 01:19:57.540 |
come back to the forums and present what you've done in the share your work here thread. After 01:20:05.900 |
the first time we did this in year one of the course, we got over a thousand replies. 01:20:13.580 |
Of those replies, it's amazing how many of them have ended up turning into new startups, 01:20:20.420 |
scientific papers, job offers. It's been really cool to watch people's journeys. Some of them 01:20:26.140 |
are just plain fun. This person classified different types of Trinidad and Tobago people. 01:20:33.020 |
People do stuff based on where they live and what their interests are. I don't know if 01:20:36.580 |
this person is particularly interested in zucchini and cucumber, but they made a zucchini 01:20:40.060 |
and cucumber classifier. I thought this was a really interesting one, classifying satellite 01:20:44.780 |
imagery into what city it's probably a picture of. Amazingly accurate actually, 85% with 01:20:50.620 |
110 classes. Panama City bus classifier, batik cloth classifier. This one, very practically 01:21:01.380 |
important, recognizing the state of buildings. We've had quite a few students actually move 01:21:05.820 |
into disaster resilience based on satellite imagery using exactly this kind of work. We've 01:21:11.940 |
already actually seen this example, Ethan Sutin, the sound classifier. I mentioned it was state 01:21:18.740 |
of the art. He actually checked the dataset's website and found that he'd beaten the state 01:21:22.500 |
of the art for that. Alena Harley did tumor-normal sequencing. She was at Human Longevity 01:21:30.220 |
International. She actually did three different really interesting pieces of cancer work during 01:21:35.140 |
that first course, if I remember correctly. I showed you this picture before. What I didn't 01:21:40.860 |
mention is actually this student, Gleb, was a software developer at Splunk, a big NASDAQ-listed 01:21:48.220 |
company. This student project he did turned into a new patented product at Splunk and 01:21:55.020 |
a big blog post. The whole thing turned out to be really cool. It was basically something 01:21:58.700 |
to identify fraudsters using image recognition with these pictures we discussed. One of our 01:22:06.820 |
students built this startup called Envision. Anyway, there's been lots and lots of examples. 01:22:14.780 |
All of this is to say, have a go at starting something, create something you think would 01:22:22.860 |
be fun or interesting, and share it in the forum. If you're a total beginner with Python, 01:22:29.580 |
then start with something simple, but I think you'll find people very encouraging. If you've 01:22:33.660 |
done this a few times before, then try to push yourself a little bit further. Don't 01:22:38.220 |
forget to look at the quiz questions at the end of the book and see if you can answer 01:22:42.460 |
them all correctly. Thanks, everybody, so much for coming. Bye.