Lesson 8: Deep Learning Part 2 2018 - Single object detection
Chapters
0:00 Intro
1:58 Key takeaways
2:11 Differentiable programming
3:21 Transfer learning
5:14 Architecture design
6:26 Overfitting
7:45 Categorical data
8:50 Cutting edge deep learning
14:17 How to use provided notebooks
16:50 A good time to build a deep learning box
21:34 Reading papers
30:12 Generative models
35:07 Object detection
38:32 Stage 1 classification
40:12 Notebook
42:44 pathlib
45:52 path.open
49:12 Contents
54:17 Dictionary
00:00:00.000 |
Okay, welcome to Part 2 of Deep Learning for Coders. 00:00:07.840 |
Part 1 was Practical Deep Learning for Coders; Part 2 is not impractical, but it is a little more cutting edge. 00:00:17.840 |
This is probably a really dumb idea, but last year I started not starting Part 2 with Part 00:00:23.920 |
2 Lesson 1, but Part 2 Lesson 8 because it's kind of part of the same sequence. 00:00:29.640 |
I've done that again, but sometimes I'll probably forget and call things Lesson 1. 00:00:34.920 |
So Part 2 Lesson 1 and Part 2 Lesson 8 are the same thing if I ever make that mistake. 00:00:39.920 |
So we're going to be talking about object detection today, which refers to not just 00:00:45.240 |
finding out what a picture is a picture of, but also whereabouts that thing is. 00:00:50.240 |
But in general, the idea of each lesson in this part is not so much because I particularly 00:01:01.000 |
want you to care about object detection, but rather because I'm trying to pick topics which 00:01:06.640 |
allow me to teach you some foundational skills that you haven't got yet. 00:01:12.040 |
So for example, object detection is going to be all about creating much richer convolutional 00:01:19.960 |
network structures, which have a lot more interesting stuff going on and a lot more 00:01:24.560 |
stuff going on in the fast.ai library that we have to customize to get there. 00:01:29.380 |
So at the end of these 7 weeks, I can't possibly cover the hundreds of interesting things people 00:01:35.160 |
are doing with deep learning right now, but the good news is that all of those hundreds 00:01:40.800 |
of things, you'll see once you read the papers, are like minor tweaks on a reasonably small number of concepts. 00:01:48.520 |
So we covered a bunch of those concepts in Part 1, and we're going to go a lot deeper 00:01:52.760 |
into those concepts and build on them to get some deeper concepts in Part 2. 00:02:00.160 |
So in terms of what we covered in Part 1, there's a few key takeaways. 00:02:09.360 |
We'll go through each of these takeaways in turn. 00:02:12.280 |
One is the idea -- and you might have seen recently Yann LeCun has been promoting the 00:02:17.240 |
idea that we don't call this deep learning, but differentiable programming. 00:02:23.120 |
And the idea is that, you'll have noticed, all the stuff we did in Part 1 was really 00:02:29.320 |
about setting up a differentiable function and a loss function that describes how good 00:02:37.280 |
the parameters are, and then pressing Go and it kind of makes it work. 00:02:44.240 |
And so I think it's quite a good way of thinking about it, differentiable programming, this 00:02:48.600 |
idea that if you can configure a loss function that scores how good something 00:02:58.080 |
is at doing your task, and you have a reasonably flexible neural network architecture, you're basically done. 00:03:06.680 |
So that's one key way of thinking about this. 00:03:09.920 |
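As a minimal sketch of that idea in plain PyTorch (a toy illustration, not something from the lesson notebooks):

    import torch

    # A tiny "differentiable program": fit y = 2x + 1 from noisy data.
    x = torch.linspace(0, 1, 100).unsqueeze(1)
    y = 2 * x + 1 + 0.05 * torch.randn_like(x)

    model = torch.nn.Linear(1, 1)                         # a (trivially) flexible architecture
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for _ in range(200):
        loss = torch.nn.functional.mse_loss(model(x), y)  # how good are the parameters?
        opt.zero_grad()
        loss.backward()                                   # differentiate the whole program
        opt.step()                                        # "press Go"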
This example here comes from Playground.TensorFlow.org, which is a cool website where you can play 00:03:15.200 |
interactively with creating your own little differentiable functions manually. 00:03:24.400 |
The second thing we learned is about transfer learning. 00:03:28.960 |
And it's basically that transfer learning is the single most important thing to be able to do in order to use deep learning effectively. 00:03:40.440 |
Nearly all courses, nearly all papers, nearly everything in deep learning, education, research 00:03:47.520 |
focuses on starting with random weights, which is ridiculous because you almost never would want to do that. 00:03:58.640 |
You would only want to do that if nobody had ever trained a model on a vaguely similar 00:04:05.480 |
set of data with an even remotely connected kind of problem to solve as what you're doing 00:04:17.920 |
So this is where the fast.ai library and the stuff we talk about in this class is vastly 00:04:24.240 |
different to any other library or course, it's all focused on transfer learning and it turns 00:04:31.680 |
out that you do a lot of things quite differently. 00:04:35.960 |
So the basic idea of transfer learning is here's a network that does thing A, remove 00:04:41.680 |
the last layer or so, replace it with a few random layers at the end, fine-tune those 00:04:49.680 |
layers to do thing B, taking advantage of the features that the original network learned, 00:04:56.440 |
and then optionally fine-tune the whole thing end-to-end. 00:04:59.800 |
And you've now got something which probably uses orders of magnitude less data than if you'd started from random weights. 00:05:07.560 |
It's probably a lot more accurate and probably trained a lot faster. 00:05:19.380 |
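That recipe, sketched with torchvision rather than fastai (fastai automates these steps for you; treat this as an illustrative outline, not the library's API):

    import torch.nn as nn
    from torchvision import models

    # Network that does "thing A" (ImageNet classification); the weights= API
    # is from newer torchvision versions (older ones used pretrained=True).
    net = models.resnet34(weights='DEFAULT')
    for p in net.parameters():
        p.requires_grad = False                     # freeze the pretrained layers
    net.fc = nn.Linear(net.fc.in_features, 10)      # new random head for "thing B"

    # ...train the head on your data, then optionally unfreeze and
    # fine-tune the whole thing end-to-end:
    for p in net.parameters():
        p.requires_grad = True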
We didn't talk a hell of a lot about architecture design in Part 1, and that's because architecture design is generally not the hard bit. 00:05:29.840 |
There's a pretty small range of architectures that generally work pretty well quite a lot of the time. 00:05:38.080 |
We've been focusing on using CNNs for generally fixed size, somehow ordered data, RNNs for 00:05:47.840 |
sequences that have some kind of state, fiddling around a tiny bit with activation functions 00:05:54.160 |
like softmax if you've got a single categorical outcome, or sigmoid if you've got multiple labels. 00:06:03.800 |
Some of the architecture design we'll be doing in this part gets more interesting, particularly as we get into things like object detection. 00:06:14.800 |
But on the whole, I think we probably spend less time talking about architecture design 00:06:18.320 |
than most courses or papers because it's generally not the hard bit in my opinion. 00:06:27.280 |
The third thing we looked at was how to avoid overfitting. 00:06:31.640 |
The general idea that I tried to explain is the way I like to build a model is to first 00:06:38.300 |
of all create something that's definitely terribly over-parameterized, will massively 00:06:44.080 |
overfit for sure, train it and make sure it does overfit. 00:06:47.480 |
Because at that point you know, okay, I've got a model that is capable of reflecting 00:06:52.760 |
the training set and then it's as simple as doing these things to then reduce that overfitting. 00:07:00.480 |
If you don't start with something that's overfitting, then you're kind of lost. 00:07:05.800 |
So you start with something that's overfitting and then to make it overfit less, you can 00:07:10.080 |
add more data, you can add more data augmentation, you can do things like more batch norm layers 00:07:19.720 |
or DenseNets or various things that can handle basically less data, and you can add regularization like weight decay or dropout. 00:07:31.200 |
And then finally, this is often the thing people do first, but it should be the thing 00:07:35.320 |
you do last: reduce the complexity of your architecture, have fewer layers or fewer activations. 00:07:47.560 |
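To make a couple of those remedies concrete, here is what dropout and weight decay look like in PyTorch (the numbers are placeholders, not recommendations):

    import torch.nn as nn
    import torch.optim as optim

    model = nn.Sequential(
        nn.Linear(100, 512),
        nn.ReLU(),
        nn.Dropout(0.5),        # randomly zero activations during training
        nn.Linear(512, 10),
    )
    opt = optim.Adam(model.parameters(), weight_decay=1e-4)  # L2 regularization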
We talked quite a bit about embeddings, both for NLP and the general idea of any kind of 00:07:54.400 |
categorical data as being something you can now model with neural nets. 00:07:58.320 |
It's been interesting to see how since Part 1 came out, at which point there were almost 00:08:05.600 |
no examples of papers or blogs or anything about using tabular data or categorical data 00:08:14.400 |
in deep learning, suddenly it's kind of taken off and it's kind of everywhere. 00:08:21.080 |
So this is becoming a more and more popular approach. 00:08:25.040 |
It's still little enough known that when I say to people, we use neural nets for time 00:08:31.880 |
series and tabular data analysis, it's often kind of like, wait, really? 00:08:36.400 |
But it's definitely not such a far out idea, and there's more and more resources available, 00:08:43.760 |
including recent Kaggle competition winning approaches using this technique. 00:08:53.160 |
So Part 1, which particularly had those five messages, really was all about introducing best practices. 00:09:07.040 |
And so it's like trying to show you techniques which were mature enough that they definitely 00:09:17.280 |
work reasonably reliably for practical real-world problems, and that I had researched and tuned 00:09:26.320 |
enough over quite a long period of time that I could kind of say, OK, here's a sequence 00:09:30.720 |
of steps and architectures and whatever that if you use this, you'll almost certainly get 00:09:36.720 |
pretty good results, and then had kind of put that into the fast AI library into a way 00:09:43.200 |
that you could do that pretty quickly and easily. 00:09:45.840 |
So that's kind of what practical deep learning for coders was designed to do. 00:09:52.720 |
So this Part 2 is cutting edge deep learning for coders, and what that means is I often 00:10:00.520 |
don't know the exact best parameters, architecture details and so forth to solve a particular 00:10:08.920 |
We don't necessarily know if it's going to solve a problem well enough to be practically 00:10:13.280 |
useful, it almost certainly won't be integrated well enough into fast AI or in any library 00:10:19.520 |
that you can just press a few buttons and it will start working. 00:10:24.040 |
It's all about stuff which I'm not going to teach unless I'm very confident that it 00:10:31.760 |
either is now, or will be soon, a very practically useful technique. 00:10:38.900 |
So I don't kind of take stuff which just appeared and I don't know enough about it to judge where it's heading. 00:10:47.480 |
So if I'm teaching it in this course, I'm saying either works well in the research literature 00:10:55.520 |
now and it's going to be well worth learning about or we're pretty close to being there, 00:11:01.100 |
but it's going to take a lot of tweaking often and experimenting to get it to work on your 00:11:06.480 |
particular problem, because we don't know the details well enough to know how to make it work for every dataset. 00:11:16.420 |
So it's kind of exciting to be working at this point. 00:11:25.120 |
It means that rather than fast AI and PyTorch being obscure black boxes which you just know 00:11:34.160 |
these recipes for, you're going to learn the details of them well enough that you can customize 00:11:40.900 |
them exactly the way you want, that you can debug them, that you can read the source code 00:11:46.240 |
of them to see what's happening and so forth. 00:11:50.040 |
And so if you're not pretty confident of object-oriented Python and stuff like that, then that's something 00:11:59.440 |
you're going to want to focus on studying during this course because we assume that. 00:12:11.880 |
I will be trying to introduce you to some tools that I think are particularly helpful 00:12:18.200 |
like the Python Debugger, like how to use your editor to jump through the code, stuff like that. 00:12:24.600 |
And in general there will be a lot more detailed specific code walkthroughs, coding technique 00:12:32.840 |
discussions and stuff like that, as well as more detailed walkthroughs of papers and stuff. 00:12:40.880 |
And so anytime we cover one of these things, if you notice something where you're like, 00:12:47.160 |
this is assuming some knowledge that I don't have, that's fine. 00:12:52.120 |
It just means that's something you could ask on the forum and say hey, Jeremy was talking 00:12:58.240 |
about static methods in Python, I don't really know what a static method is, or why he was 00:13:05.320 |
using it here, could someone give me some resources. 00:13:08.280 |
These are things that are not rocket science, just because you don't happen to have come 00:13:12.560 |
across it yet doesn't mean it's hard, it's just something you need to learn. 00:13:20.920 |
I will mention that as I cover these research-level topics and develop these courses, I often 00:13:29.480 |
refer to code that academics have put up to go along with their papers, or kind of example 00:13:35.760 |
code that somebody else has written on GitHub. 00:13:38.480 |
I nearly always find that there's some massive critical flaw in it. 00:13:45.400 |
So be careful of taking code from online resources and assuming that if it doesn't work for you 00:13:54.680 |
that you've made a mistake or something, this kind of research-level code, it's just good 00:14:00.760 |
enough that they were able to run their particular experiments every second Tuesday. 00:14:09.320 |
So you should be ready to do some debugging and so forth. 00:14:18.560 |
So on that sense, I just wanted to remind you about something from our old course wiki 00:14:25.080 |
that we sometimes talk about, which is like people often ask what should I do after the 00:14:30.640 |
lesson, like how do I know if I've got it, and we basically have this thing called how 00:14:38.640 |
to use the provided notebooks, and the idea is this. 00:14:42.980 |
Don't open up the notebook, I know I said this in part 1 as well, but I'll say it again, 00:14:47.120 |
then go shift-enter, shift-enter, shift-enter until a bug appears, and then go to the forum to ask for help. 00:14:55.080 |
The idea of the notebook is to kind of be like a little crutch to help you get through it. 00:15:01.880 |
The idea is that you start with an empty notebook and think I now want to complete this process. 00:15:08.520 |
And that might initially require you alt-tabbing to the notebook and reading it, figuring out 00:15:18.460 |
what it says, but whatever you do, don't copy and paste it to your notebook. 00:15:26.400 |
So try to make sure you can repeat the process, and as you're typing it out, you need to be 00:15:32.920 |
thinking, what am I typing, why am I typing it? 00:15:36.100 |
So if you can get to the point where you can solve an object detection problem yourself 00:15:44.560 |
in a new empty notebook, even if it's using the exact same data set we used in the course, then you're in a great position. 00:15:53.960 |
That will take a while, but the idea is that by practicing the second time you try to do 00:15:58.840 |
it, the third time you try to do it, you'll check the notebook less and less. 00:16:04.360 |
And if there's anything in the notebook where you think, I don't know what 00:16:08.800 |
it's doing, I hope to teach you enough techniques in this course, in this class, that you'll 00:16:14.100 |
know how to experiment to find out what it's doing, so you shouldn't have to ask that. 00:16:19.600 |
But you may well want to ask, why is it doing that? 00:16:22.600 |
That's the conceptual bit, and that's something which you may need to go to the forums and 00:16:27.000 |
say, before this step, Jeremy had done this, after this step, Jeremy had done that, there's 00:16:34.440 |
this bit in the middle where he does this other thing, I don't quite know why. 00:16:38.720 |
So you can say here are my hypotheses as to why, try and work through it as much as possible, 00:16:45.280 |
that way you'll both be helping yourself and other people will help you fill in the gaps. 00:16:53.240 |
If you wish, and you have the financial resources, now is a good time to build a deep learning 00:17:02.000 |
When I say a good time, I don't mean a good time in the history of the pricing of GPUs. 00:17:06.520 |
GPUs are currently by far the most expensive they've ever been as I say this, because of the cryptocurrency mining boom. 00:17:22.120 |
The fact is if you're paying somewhere between $0.60 and $0.90 an hour for doing your deep 00:17:31.680 |
learning on a cloud provider, particularly if you're still on a K80 like an Amazon P2 00:17:39.160 |
or Google Colab actually, if you haven't come across it, now lets you train on a K80 for free. 00:17:50.440 |
You can buy one that's going to be like three times faster for maybe $600 or $700. 00:18:00.920 |
You need a box to put it in, of course, but the example in the bottom right here from 00:18:07.680 |
the forum was something that somebody put together in last year's course, so like a year 00:18:12.280 |
ago they were able to put together a pretty decent box for a bit over $500. 00:18:18.360 |
Realistically speaking, you're probably looking at more like $1,000 or $1,500. 00:18:21.560 |
I created a new forum thread where you can talk about options and parts and ask questions there. 00:18:33.440 |
If you can afford it right now, the GTX 1080 Ti is almost certainly what you want to get. 00:18:46.680 |
If you can't afford that, you should probably be looking for a second-hand 980 or a second-hand 00:18:55.500 |
If you can afford to spend more money, it's worth getting a second GPU so you can do what 00:19:00.800 |
I do, which is to have one GPU training and another GPU which I'm running an interactive notebook session on. 00:19:14.080 |
RAM is very useful, try and get 32GB if you can, RAM is not terribly expensive. 00:19:22.560 |
A lot of people find that their vendor will try to persuade them to buy one of these business-class CPUs. 00:19:30.400 |
You can get one of the Intel i5 or i7 consumer CPUs far, far cheaper, but actually a lot of people will tell you the CPU doesn't matter. 00:19:42.560 |
If you're doing computer vision, that's definitely not true. 00:19:45.600 |
It's very common now with these 1080TIs and so forth to find that the speed of the data 00:19:50.600 |
augmentation is actually the slow bit that's happening on the CPU, so it's worth getting a decent CPU. 00:20:01.320 |
Your GPU, if it's running quickly but the hard drive's not fast enough to give it data, is being wasted. 00:20:08.000 |
So if you can afford an NVMe drive that's super, super fast, you don't have to get a big one. 00:20:13.880 |
You can just get a little one that you just copy your current set of data onto and have 00:20:17.320 |
some big RAID array that sits there for the rest of your data when you're not using it. 00:20:24.800 |
There's a slightly arcane thing about PCI lanes which is basically like the size of 00:20:31.720 |
the highway that connects your GPU to your computer, and a lot of people claim that you need 16 lanes per card. 00:20:45.080 |
It actually turns out, based on some analysis that I've seen recently, that that's not true. 00:20:56.640 |
So again, hopefully it'll help you save some money on your motherboard. 00:21:01.600 |
If you've never heard of PCI lanes before, trust me, by the end of putting together this box, you'll know all about them. 00:21:11.720 |
You can buy all the parts and put it together yourself. 00:21:13.720 |
It's not that hard, it can be a useful learning experience, it can also be kind of frustrating 00:21:18.840 |
and annoying, so you can always go to central computers and they'll put it together for 00:21:23.960 |
you, there's lots of online vendors that will do the same thing, and they'll generally make 00:21:28.160 |
sure it turns on and runs properly, generally without much of a mark-up, so it's not a bad idea. 00:21:37.640 |
We're going to be doing a lot of reading papers. 00:21:40.560 |
Basically each week we'll be implementing a paper, or a few papers, and if you haven't 00:21:45.320 |
looked at papers before, they look something like on the left. 00:21:49.720 |
The thing on the left is an extract from the paper that implements Adam. 00:21:55.320 |
You may also have seen Adam as a single Excel formula on the spreadsheet that I've written. 00:22:03.160 |
The difference is in academic papers, people love to use Greek letters, they also hate 00:22:11.560 |
So you'll often see like a page-long formula where when you actually look at it carefully 00:22:17.000 |
you'll realize the same kind of sub-equation appears 8 times. 00:22:22.280 |
They didn't think to say above it, let t equal this sub-equation, and now the formula is one line. 00:22:28.200 |
I don't know why this is a thing, but I guess all this is to say, once you've read and understood 00:22:38.560 |
a paper, you then go back to it and you look at it and you're just like, wow, how did they make something so simple look so complicated? 00:22:46.400 |
Like Adam is like momentum on the gradient and momentum on the square of the gradient. 00:23:00.160 |
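For reference, that is the whole of Adam written out (g_t is the gradient, alpha the learning rate, beta_1 and beta_2 the momentum coefficients):

    m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t           % momentum on the gradient
    v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2         % momentum on the squared gradient
    \hat{m}_t = m_t / (1 - \beta_1^t)                   % bias correction
    \hat{v}_t = v_t / (1 - \beta_2^t)
    \theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)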
And the other reason it's a big long thing is because they have things like this where 00:23:03.080 |
they have theorems and corollaries and stuff where they're kind of saying here's all our 00:23:09.080 |
theoretical reasoning behind why this ought to work or whatever. 00:23:14.160 |
And for whatever reason, a lot of conferences and journals don't like to accept papers that 00:23:20.240 |
don't have a lot of this theoretical justification. 00:23:23.280 |
Jeffrey Hinton has talked about this a bit, particularly a decade or two ago when no conferences 00:23:30.080 |
would really accept any neural network papers. 00:23:33.920 |
Then there was this one abstract theoretical result that came out where suddenly they could 00:23:40.040 |
show this practically unimportant but theoretically interesting thing, and then suddenly they 00:23:46.920 |
could then start submitting things to journals because they had this theoretical justification. 00:23:52.360 |
So academic papers are a bit weird, but in the end it's the way that the research community 00:24:00.520 |
communicates their findings and so we need to learn to read them. 00:24:05.680 |
But something that can be a great thing to do is to take a paper, put in the effort to 00:24:11.880 |
understand it, and then write a blog where you explain it in code and normal English. 00:24:21.280 |
And lots of people who do that end up getting quite a following, end up getting some pretty 00:24:27.640 |
great job offers and so forth because it's such a useful skill to be able to show I can 00:24:33.760 |
understand these papers, I can implement them in code, I can explain them in English. 00:24:41.720 |
One thing I will mention is it's very hard to read or understand something which you 00:24:48.640 |
can't vocalize, which means if you don't know the names of the Greek letters, it sounds 00:24:54.840 |
weird, but it's actually very difficult to understand, remember, or take in a formula that you can't pronounce. 00:25:05.680 |
You need to know that that squiggle is called delta, or that squiggle is called sigma, whatever. 00:25:10.120 |
Just spending some time learning the names of the Greek letters sounds like a strange 00:25:14.600 |
thing to do, but suddenly you don't look at these things anymore and go squiggle-a over 00:25:18.560 |
squiggle-b plus other weird squiggles; you can actually read it. 00:25:29.400 |
So now that we're kind of at the cutting edge stage, a lot of the stuff we'll be learning 00:25:36.600 |
in this class is stuff that almost nobody else knows about. 00:25:43.840 |
So that's a great opportunity for you to be the first person to create an understandable 00:25:51.320 |
and generalizable code library that implements it, or the first person to write a blog post 00:25:55.840 |
that explains it in clear English, or the first person to try applying it to this slightly 00:26:00.960 |
different area which is obviously going to work just as well, or whatever. 00:26:07.040 |
So when we say cutting edge research, that doesn't mean you have to come up with the 00:26:12.920 |
next batch norm, or the next atom, or the next diluted convolution. 00:26:18.560 |
It can mean take this thing that was used for translation and apply it to this very 00:26:27.400 |
similar other parallel NLP task, or take this thing that was tested on skin lesions and 00:26:34.400 |
test it on this data set of this other kind of lesions. 00:26:39.080 |
That kind of stuff is a super great learning experience and incredibly useful, because to the 00:26:45.680 |
vast majority of the world, which knows nothing about this whole field, it just looks like magic. 00:26:51.840 |
You'll be like, hey, I've for the first time shown greater than 90% accuracy at finding whatever it is that matters in your field. 00:27:07.600 |
So when I say here experiment in your area of expertise, one of the things we particularly 00:27:13.040 |
look for in this class is to bring in people who are pretty good at something else, pretty 00:27:21.240 |
good at meteorology, or pretty good at de novo drug design, or pretty good at goat dairy 00:27:31.320 |
farming, or whatever, these are all examples of people we've had in the class. 00:27:39.880 |
So probably the thing you can do the best would be to take that thing you're already 00:27:45.980 |
pretty good at and add on these new skills, because otherwise if you're trying to go into 00:27:52.200 |
some different domain, you're going to have to figure out how do I get data for that domain, 00:27:55.360 |
how do I know what are the problems to solve in that domain, and so forth. 00:28:00.640 |
Whereas often it will seem pretty trivial to you to take this technique applied to this 00:28:05.640 |
data set that you've already got sitting on your hard drive, but that's often going to 00:28:09.120 |
be a super interesting thing for the rest of the world to see like oh, that's interesting 00:28:14.840 |
when you apply it to meteorology data and use this RNN or whatever, suddenly it allows 00:28:21.640 |
you to forecast over larger areas or longer time periods. 00:28:30.600 |
So communicating what you're doing is super helpful, we've talked about that before, but 00:28:37.360 |
I know something that a lot of people in the forums ask people who have already written 00:28:42.240 |
- when somebody has written a blog, often people in the forum will be like how did you 00:28:47.960 |
get up the guts to do that, or what of the process you got to before you decided to start 00:28:52.240 |
publishing something, or whatever, and the answer is always the same. 00:28:56.600 |
It's always just, I was sure I wasn't good enough to do it, I felt terrified and intimidated 00:29:04.680 |
with doing it, but I wrote it and posted it anyway. 00:29:09.280 |
There's never a time I think any of us actually feel like we're not total frauds and imposters, 00:29:16.640 |
but we know more about what we're doing than us six months ago. 00:29:21.000 |
And there's somebody else in the world who knows as much as you did six months ago, so 00:29:25.320 |
if you write something now that would have helped you six months ago, you're helping that person. 00:29:30.420 |
Honestly, if you wait another six months, then the you of 12 months ago probably won't even 00:29:35.640 |
understand it anymore because it's too advanced now. 00:29:40.040 |
It's great to communicate wherever you're up to in a way that you think would be helpful 00:29:46.440 |
to the person you were before you knew that thing. 00:29:51.560 |
And of course something that the forums have been useful for is getting feedback about 00:29:56.600 |
drafts and if you post a draft of something that you're thinking of releasing, then other 00:30:06.400 |
folks here can point out things that they find unclear or they think need some corrections. 00:30:13.160 |
So the kind of overarching theme of Part 2 I've described as generative models, but unfortunately 00:30:23.440 |
then Rachel asked me this afternoon exactly what I meant by generative models, and I realized I didn't have a precise definition. 00:30:30.000 |
So what I really mean is in Part 1, the output of our neural networks was generally like 00:30:37.920 |
a number or a category, whereas the outputs of a lot of the stuff in Part 2 are going 00:30:49.440 |
to be like a whole lot of things, like the top left and bottom right location of every 00:30:57.800 |
object in an image along with what the object is, or a complete picture with a class of 00:31:03.880 |
every single pixel in that picture, or an enhanced super-resolution version of the input 00:31:11.920 |
image, or the entire original input paragraph translated into French. 00:31:23.280 |
It's kind of like, often it just requires some different ways of thinking about things 00:31:30.980 |
and some kind of different architectures and so forth, and so that's kind of like I guess 00:31:36.280 |
the main theme of the kind of techniques we'll be looking at. 00:31:41.360 |
The vast majority, possibly all, of the data we'll be looking at will be either text or images. 00:31:53.440 |
It would be fairly trivial to do most of these things with audio as well, it's just not something I've spent much time on myself yet. 00:32:03.360 |
Somebody asked on the forum about what can we do more stuff with time series and tabular 00:32:07.700 |
data, and my answer was, I've already taught you everything I know about that and I'm not 00:32:13.920 |
sure there's much else to say, particularly if you check out the machine learning course, 00:32:21.040 |
which goes into a lot of that in a lot more detail. 00:32:24.160 |
I don't feel like there's more stuff to tell you; I think that's a super-important area, but we've covered it already. 00:32:38.200 |
We'll be looking at some larger data sets, both in terms of the number of objects in 00:32:43.960 |
the data set and the size of each of those objects. 00:32:47.600 |
For those of you that are working with limited computational resources, please don't let 00:32:51.680 |
that put you off, feel free to replace it with something smaller and simpler. 00:32:56.480 |
In fact, when I was designing this course, I did quite a lot of it in Australia when 00:33:02.240 |
I went to visit my mum, and my mum decided to book a nice holiday house for us with fast WiFi. 00:33:11.240 |
We turned up at the holiday house with fast WiFi, and indeed it did have WiFi, it was fast, 00:33:17.000 |
but the WiFi was not connected to the internet. 00:33:22.360 |
So I called up the agent and I said, "I found the ADSL router and it's got an ADSL thing 00:33:30.760 |
plugged in, and I followed the cable down, and the other end of the cable has nothing attached to it." 00:33:35.940 |
So she called the people renting the house and the owner and called me back the next 00:33:45.800 |
day, and she said, "Actually, Point Leo has no internet." 00:33:57.480 |
So the good old Australian government had decided to replace ADSL in Point Leo with 00:34:02.400 |
a new national broadband network, and therefore they had disconnected ADSL before the new network had been connected. 00:34:10.280 |
So we had fast WiFi, which we could use to Skype chat from one side of the house to the other. 00:34:18.360 |
Luckily, I did have a new Surface Book 15-inch, which has a GTX 1070 in it, and so I wrote 00:34:28.760 |
a large amount of this course entirely on my laptop, which means I had to practice with 00:34:34.720 |
relatively small resources, I mean not tiny, but 16GB RAM and 6GB GPU, and it was all in Windows. 00:34:50.000 |
So I can tell you that pretty much all of this course works well on Windows, on a laptop. 00:34:57.040 |
So you can always use smaller batch sizes, or a cut-down version of the dataset, whatever you need to do. 00:35:02.200 |
So if you have the resources, you'll get better results if you can use the bigger datasets where they're available. 00:35:07.240 |
Now's a good time, I think, to take a somewhat early break so we can fix the forums. 00:35:33.560 |
So let's start talking about object detection, and so here is an example of object detection. 00:35:41.320 |
So hopefully you'll see two main differences from what we're used to when it comes to classification. 00:35:49.440 |
The first is that we have multiple things that we're classifying, which is not unheard of. 00:35:57.840 |
We did that in the Planet Satellite data, for example, but what is kind of unheard of 00:36:03.120 |
is that as well as saying what we see, we've also got what's called bounding boxes around the things we see. 00:36:10.120 |
A bounding box has a very specific definition, which is it's a box, it's a rectangle, and 00:36:19.640 |
the rectangle has the object entirely fitting within it, but it's no bigger than it has to be. 00:36:29.800 |
You'll see this bounding box is perhaps, for the horse at least, slightly imperfect in 00:36:36.280 |
that it looks like there's a bit of tail here. 00:36:39.240 |
So it probably should be a bit wider, and maybe there's even a little bit of hoof here that's cut off. 00:36:44.360 |
So the bounding boxes won't be perfect, but they're generally pretty good in most data 00:36:53.320 |
So our job will be to take data that has been labeled in this way and on data that is unlabeled 00:37:01.720 |
to generate the classes of the objects and the bounding box for each one of them. 00:37:11.080 |
One thing I'll note to start with is that labeling this kind of data is generally more expensive. 00:37:18.440 |
It's generally quicker to say horse, person, person, horse, car, dog, jumbo jet, than it 00:37:24.480 |
is to say, if there's a whole horse race going on, to label the exact location of every rider and every horse. 00:37:32.680 |
And then of course it also depends on what classes you want to label: do you want to label the trees, the jumps? 00:37:40.720 |
So generally always, just like in ImageNet, it's not like, tell me any object you see in the picture. 00:37:50.080 |
In ImageNet it's like here are the 1000 classes that we ask you to look for, tell us which 00:37:56.920 |
one of those 1000 classes you find, just tell me one thing. 00:38:02.000 |
For these object detection data sets, it is a list of object classes that we want you 00:38:08.280 |
to tell us about and find every single one of them of any type in the picture, along with its bounding box. 00:38:14.520 |
So in this case, why isn't there a tree or jump labeled? 00:38:19.280 |
That's because for this particular data set they weren't one of the classes that the annotators 00:38:23.520 |
were asked to find and therefore were not part of this particular problem. 00:38:27.800 |
So that's kind of the specification of the object detection problem. 00:38:40.840 |
And stage 1 is actually going to be surprisingly straightforward. 00:38:45.800 |
And we're going to start at the top and work down. 00:38:48.560 |
We're going to start out by classifying the largest object in each image. 00:38:55.400 |
So we're going to try and say person; actually this one is wrong, dog is not the largest object here. 00:39:02.440 |
So here's an example of a misclassified one, bird, correct, person, correct. 00:39:09.440 |
That will be the first thing we try to do; that's not going to require anything new, it's all stuff we've already learned. 00:39:15.640 |
The second thing will be to tell us the location of the largest object in each image. 00:39:23.440 |
Again here, this is actually incorrect, it should have labeled the sofa, but you can see why it made that mistake. 00:39:29.680 |
And then finally we will try and do both at the same time, which is to label what it is 00:39:34.520 |
and where it is for the largest thing in the picture. 00:39:38.400 |
And this is going to be relatively straightforward, actually, so it will be a good warm-up to get us started. 00:39:45.320 |
But what I'm going to do is I'm going to use it as an opportunity to show you some useful 00:39:51.680 |
coding techniques, really, and a couple of little fast.ai handy details before we then 00:40:00.280 |
get on to multi-label classification and then multiple object classification. 00:40:08.600 |
The notebook that we're using is the Pascal notebook, and all of the notebooks are in the DL2 folder. 00:40:22.240 |
One thing you'll see in some of my notebooks is torch.cuda.set_device; you may have even 00:40:27.640 |
seen it in the last part, just in case you're wondering why that's there. 00:40:30.840 |
I have four GPUs on the university server that I use, and so I can put a number from 0 to 3 in there. 00:40:39.880 |
This is how I prefer to use multiple GPUs, rather than run a model on multiple GPUs, which 00:40:45.160 |
doesn't always speed it up that much, and it's kind of awkward. 00:40:48.160 |
I generally like to have different GPUs running different things, so in this case I was running 00:40:56.040 |
something in this notebook on device 1 and doing something else in another notebook on device 2. 00:41:01.160 |
Now obviously if you see this in a notebook left behind, that was a mistake. 00:41:04.960 |
If you don't have more than one GPU, you're going to get an error, so you can just change that number to zero. 00:41:13.680 |
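In other words, the line at the top of the notebook is just this (pick whichever device index you actually have):

    import torch

    torch.cuda.set_device(1)   # run this notebook on the second GPU;
                               # use 0 (or remove the line) if you have one GPU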
So there's a number of standard object detection datasets, just like ImageNet is a standard 00:41:21.160 |
object classification dataset, and kind of the old classic ImageNet equivalent, if you 00:41:27.600 |
like, is Pascal VOC, Visual Object Classes. 00:41:40.240 |
The actual main website for it is like, I don't know, it's running on somebody's coffee warmer 00:41:47.760 |
or something, it goes down all the time every time he makes coffee. 00:41:52.320 |
So some folks have mirrored it, which is very kind of them, so you might find it easier to grab it from a mirror. 00:41:58.480 |
You'll see when you download it that there's a 2007 dataset, the 2012 dataset, there basically 00:42:05.920 |
were academic competitions in those different years, just like the ImageNet dataset we tend 00:42:10.120 |
to use is actually the ImageNet 2012 competition dataset. 00:42:17.160 |
We'll be using the 2007 version in this particular notebook. 00:42:21.800 |
Feel free to use the 2012 instead, it's a bit bigger, you might get better results. 00:42:26.320 |
A lot of people, in fact most people now in research papers actually combine the two. 00:42:32.160 |
You do have to be careful because there's some leakage between the validation sets between 00:42:36.360 |
the two, so if you do decide to do that, make sure you do some reading about the dataset 00:42:40.800 |
to make sure you know how to combine them correctly. 00:42:44.400 |
The first thing you'll notice in terms of coding here is this; we haven't used this before. 00:42:55.160 |
This is part of the Python 3 standard library called pathlib, and it's super handy. 00:43:02.600 |
It basically gives you an object-oriented access to a directory or a file. 00:43:09.680 |
So you can see, if I go path.something, there's lots of things I can do. 00:43:23.680 |
One of them is iterdir; however, path.iterdir() returns a generator. 00:43:35.680 |
Basically you've come across generators by now because we did quite a lot of stuff that 00:43:39.840 |
used them behind the scenes without talking about them too much, but basically a generator 00:43:44.400 |
is something in Python 3 which you can iterate over. 00:43:51.280 |
So basically you could go, for o in that: print(o), for instance, or of course you could do the 00:44:07.800 |
same thing as a list comprehension, or you can just stick the word "list" around it to turn it into a list. 00:44:23.440 |
Any time you see me put list around something, that's normally because it returned a generator. 00:44:30.880 |
The reason that things generally return generators is that what if the directory had 10 million items in it? 00:44:37.940 |
You don't necessarily want a 10 million long list, so with a for loop, you'll just grab 00:44:42.960 |
one, do the thing, throw it away, grab a second, throw it away. 00:44:50.480 |
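A quick sketch of all of that with pathlib (the directory name here is a placeholder):

    from pathlib import Path

    PATH = Path('data/pascal')        # wherever you put the dataset
    PATH.iterdir()                    # returns a generator, not a list
    for o in PATH.iterdir():          # so you can loop over it one item at a time...
        print(o)
    [str(o) for o in PATH.iterdir()]  # ...or use a list comprehension...
    list(PATH.iterdir())              # ...or stick "list" around it to materialize it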
You'll see that the things it's returning aren't actually strings; they're some kind of path object. 00:44:56.400 |
If you're using Windows, it'll be a Windows path, on Linux it'll be a POSIX path. 00:45:02.520 |
Most of the time you can use them as if they were strings, like if you pass it to any of 00:45:07.840 |
the os.path.whatever functions in Python, it'll just work. 00:45:13.400 |
But with some external libraries it won't work, and that's fine, you can cast it. 00:45:20.080 |
If you grab one of these, let's just grab one of these. 00:45:29.360 |
So in general, you can change data types in Python just by naming the data type that you 00:45:37.240 |
want and treating it like a function, and that will cast it. 00:45:41.960 |
So anytime you try to use one of these pathlib objects and you pass it to something which 00:45:48.040 |
complains that it was expecting a string and this is not a string, that's how you cast it. 00:45:53.580 |
So you'll see there's quite a lot of convenient things you can do. 00:45:55.800 |
One kind of fun thing is the slash operator: it's not divided-by here, it's path-slash. 00:46:08.000 |
So they've overloaded the slash operator in Python so that it works, so you can say 00:46:13.240 |
path/whatever, and you'll see how that's not inside a string. 00:46:18.880 |
So this is actually applying not the division operator, but the overloaded slash operator, which joins paths. 00:46:28.640 |
And you'll see if you run that, it doesn't return a string, it returns a pathlib object. 00:46:38.440 |
And so one of the things a pathlib object can do is it has an open method, so it's actually 00:46:45.360 |
pretty cool once you start getting the hang of it. 00:46:48.400 |
And you'll also find that the open method takes all the kind of arguments you're familiar 00:46:53.760 |
with, you can say write, or binary, or encoding, or whatever. 00:46:57.740 |
So in this case, I want to load up these JSON files which contain not the images but the 00:47:09.320 |
bounding boxes and the classes of the objects. 00:47:14.160 |
And so in Python, the easiest way to do that is with the JSON library, or there's some 00:47:20.160 |
faster API-equivalent versions, but this is pretty small so you won't need them. 00:47:24.440 |
And you go json.load, and you pass it an open file object, and so the easy way to do 00:47:31.120 |
that since we're using pathlib is just go path.open. 00:47:36.120 |
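So the whole thing is one line; the file name here is the one I believe the notebook uses, so treat it as an assumption:

    import json

    # The slash is pathlib's overloaded operator, not division:
    trn_j = json.load((PATH / 'pascal_train2007.json').open())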
So these JSON files that we're going to look inside in a moment, if you haven't used them 00:47:39.560 |
before JSON is JavaScript object notation, it's kind of the most standard way to pass 00:47:46.160 |
around hierarchical structured data now, obviously not just with JavaScript. 00:47:54.960 |
You'll see I've got some JSON files in here; they actually did not come from the mirror site. 00:48:00.200 |
The original Pascal annotations were in XML format, but cool kids can't use XML anymore, 00:48:07.680 |
we have to use JSON, so somebody's converted them all to JSON, and so you'll find those at the second link. 00:48:15.480 |
So if you just pop them in the same location that I've put them here, everything will work 00:48:22.400 |
So these annotation files, JSONs, basically contain a dictionary. 00:48:29.440 |
Once you open up the JSON, it becomes a Python dictionary, and they've got a few different 00:48:35.680 |
The first is we can look at images; it's got a list of all of the images, how big they are, and their file names. 00:48:45.960 |
One thing you'll notice here is I've taken the word images and put it inside a constant. 00:48:55.320 |
Constants like these seem kind of weird, but if you're using a notebook or any kind of IDE, this now means 00:49:01.960 |
I can tab-complete all of my strings and I won't accidentally type them slightly wrong. 00:49:12.240 |
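Something along these lines (constant names as I recall them from the notebook):

    IMAGES, ANNOTATIONS, CATEGORIES = ['images', 'annotations', 'categories']
    trn_j[IMAGES][:5]    # tab-completable, typo-proof, shows the first few image records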
So here's the contents of the first few things in the images. 00:49:16.680 |
More interestingly, here are some of the annotations. 00:49:20.480 |
So you'll see basically an annotation contains a bounding box, and the bounding box tells 00:49:27.160 |
you the column and row of the top left, and its height and width. 00:49:37.160 |
And then it tells you that that particular bounding box is for this particular image, 00:49:43.160 |
so you'd have to join that up over here to find it's actually O12.jpg. 00:49:54.300 |
Also some of them at least have a polygon segmentation, not just a bounding box. 00:50:02.600 |
Some of them have an ignore flag, so we'll ignore the ignore flags. 00:50:06.280 |
Some of them have something telling you it's a crowd of that object, not just one of them. 00:50:16.080 |
So then you saw here there's a category ID, so then we can look at the categories, and 00:50:20.680 |
here's a few examples, basically each ID has a name, there we go. 00:50:29.240 |
So what I did then was turn this category list into a dictionary from ID to name, I created 00:50:37.320 |
a dictionary from image ID to file name, and I created a list of all of the image IDs. 00:50:47.480 |
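Those three quick manipulations, sketched with dict and list comprehensions (key names follow the COCO-style JSON fields shown above):

    cats = {o['id']: o['name'] for o in trn_j[CATEGORIES]}       # category id -> class name
    trn_fns = {o['id']: o['file_name'] for o in trn_j[IMAGES]}   # image id -> file name
    trn_ids = [o['id'] for o in trn_j[IMAGES]]                   # all the image ids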
So generally when you're working with a new dataset, I try to make it look the way I would 00:50:55.960 |
want it to if I kind of designed that dataset, so I just kind of do a quick bit of manipulation. 00:51:02.040 |
And so the steps you see here, and you'll see in each class, are basically the sequence 00:51:08.520 |
of steps I took as I started working with this new dataset, except without the thousands of failed attempts along the way. 00:51:21.560 |
I find the one thing people most comment on when they see me working in real time, having 00:51:29.880 |
seen my classes, is like "wow, you actually don't know what you're doing, do you?" 00:51:36.320 |
It's like 99% of the things I do don't work, and then the small percentage of the things that do work end up here. 00:51:44.320 |
So I mentioned that because machine learning and particularly deep learning is kind of 00:51:50.640 |
incredibly frustrating because in theory, you just define the correct loss function and 00:51:57.520 |
a flexible enough architecture, and you press train and you're done. 00:52:03.160 |
But if that was actually all it took, then nothing would take any time, and the problem 00:52:10.400 |
is that all the steps along the way until it works, it doesn't work. 00:52:16.960 |
Like it goes straight to infinity, or it crashes with an incorrect tensor size, or whatever. 00:52:24.200 |
And I will endeavor to show you some debugging techniques as we go, but it's one of the hardest 00:52:32.160 |
things to teach because I don't know, maybe I just haven't quite figured it out yet. 00:52:42.680 |
The main thing it requires is tenacity. I find the biggest difference between the people 00:52:48.440 |
I've worked with who are super effective and the ones who don't seem to go very far has 00:52:54.000 |
never been about intellect, it's always been about sticking with it, basically never giving 00:53:03.920 |
It's particularly important with this deep learning stuff because you don't get that 00:53:09.040 |
continuous reward cycle. With normal programming, you've got like 12 things to do until you've 00:53:15.040 |
got your Flask endpoint stood up. You know at each stage, it's like okay, we've successfully 00:53:20.720 |
processed in the JSON, and now we've successfully got the callback from that promise, and now 00:53:26.000 |
we've successfully created the authentication system. 00:53:30.160 |
It's this constant sequence of stuff that works, whereas generally with training a model, 00:53:36.620 |
it's a constant stream of like "it doesn't work, it doesn't work, it doesn't work" until 00:53:40.800 |
eventually it does. So it's kind of annoying. 00:53:48.000 |
So let's now look at the images. You'll find inside the VOCdevkit directory, there's 2007 and 2012 00:53:57.720 |
directories, and in there there's a whole bunch of stuff that's mainly these XML files, 00:54:03.120 |
the ones we care about being the JPEG images, and so again here you've got the pathlib slash operator, 00:54:17.080 |
So what I wanted to do was to create a dictionary where the key was the image ID, and the value 00:54:30.020 |
was a list of all of its annotations. So basically what I wanted to do was go through each of 00:54:38.340 |
the annotations that doesn't say to ignore it, and append it, the bounding box and the 00:54:48.040 |
class, to the appropriate dictionary item, where that dictionary item is a list. But 00:54:57.380 |
the annoying thing is if that dictionary item doesn't exist yet, then there's no list to append to. 00:55:05.760 |
So one super handy trick in Python is that there's a class called collections.defaultdict, 00:55:13.360 |
which is just like a dictionary, but if you try and access a key that doesn't exist, it 00:55:20.040 |
magically makes itself exist and sets itself equal to the return value of this function. 00:55:27.660 |
Now this could be the name of some function that you've defined, or it can be a lambda 00:55:33.600 |
function. A lambda function simply means it's a function that you define in place. We'll 00:55:39.700 |
be seeing lots of them. So here's an example of a function. All the arguments to the function 00:55:46.420 |
are listed on the left; here there are no arguments to the function. And lambda functions are special: 00:55:51.380 |
you don't have to write return, as a return is assumed. So in this case, this is a lambda 00:55:56.500 |
function that takes no arguments and returns an empty list. So in other words, every time 00:56:01.820 |
I try and access something in train annotations that doesn't exist, it now does exist and 00:56:10.980 |
it's an empty list, which means I can append to it. 00:56:18.220 |
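Putting that together (a sketch; the notebook also converts each bounding box as described a little further on):

    import collections

    trn_anno = collections.defaultdict(lambda: [])   # missing keys become empty lists
    for o in trn_j[ANNOTATIONS]:
        if not o.get('ignore', 0):                   # skip the "ignore" annotations
            trn_anno[o['image_id']].append((o['bbox'], o['category_id']))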
One comment on variable naming is when I read through these notebooks, I'll generally try 00:56:29.060 |
and speak out the English words that the variable name is a mnemonic for. A reasonable question 00:56:36.740 |
would be why didn't I write the full name of the variable in English rather than using 00:56:41.820 |
a short mnemonic. It's a personal preference I have based on a number of programming communities 00:56:49.340 |
where the basic thesis is that the more that you can see in a single eye grab of the screen, 00:57:00.500 |
the more you can understand intuitively at one go. Every time your eye has to jump around, 00:57:07.980 |
it's kind of like a context change that reduces your understanding. It's a style of programming 00:57:13.340 |
I found super helpful, and so generally speaking I particularly try to reduce the vertical 00:57:20.260 |
height, so things don't scroll off the screen, but I also try to reduce the size of things 00:57:26.460 |
so that there's a mnemonic there, which if you know it's training annotations, it doesn't 00:57:33.080 |
take long for you to see training annotations, but you don't have to write the whole thing out every time. 00:57:38.580 |
So I'm not saying you have to do it this way, I'm just saying there's some very large programming 00:57:42.500 |
communities, some of which have been around for 50 or 60 years which have used this approach 00:57:46.940 |
and I find it works well. It's interesting to compare, I guess my philosophy is somewhere 00:57:57.020 |
between math and Java. In math, everything's a single character. The same single character 00:58:06.460 |
can be used in the same paper for five different things, and depending on whether it's in italics 00:58:11.620 |
or boldface or capitals, it's another five different things. I find that less than ideal. 00:58:19.100 |
In Java, variable names sometimes require a few pages to print out, and I find that less 00:58:26.540 |
So for me, I personally like names which are short enough to not take too much of my perception 00:58:37.580 |
to see at once, but long enough to have a mnemonic. Also, however, a lot of the time 00:58:46.340 |
the variable will be describing a mathematical object as it exists in a paper, and there 00:58:51.780 |
isn't really an English name for it, and so in those cases I will use the same, often single-letter, name. 00:59:00.100 |
And so if you see something called delta or A or something, and it's like something inside 00:59:07.540 |
an equation from a paper, I generally try to use the same name so you can map the code back to the paper. 00:59:17.220 |
By no means do you have to do the same thing. I will say, however, if you contribute to 00:59:21.340 |
fast.ai, I'm not particularly fastidious about coding style or whatever, but if you write 00:59:26.940 |
things more like the way I do than the way Java people do, I'll certainly appreciate it. 00:59:34.700 |
So by the end of this we now have a dictionary from image IDs to (bounding box, class) tuples, and so here's 00:59:41.940 |
an example of looking at that dictionary and we get back a bounding box and a class. 00:59:53.740 |
You'll see when I create the bounding box, I've done a couple of things. The first is 00:59:57.980 |
I've switched the x and y coordinates. The reason for this, I think we mentioned this 01:00:02.580 |
briefly in the last course, the computer vision world when you say my screen is 640x480, that's 01:00:11.740 |
width by height. Whereas the math world when you say my array is 640x480, it's rows by 01:00:19.340 |
columns, i.e. height by width. So you'll see that a lot of things like PIL or Pillow Image 01:00:26.900 |
Library in Python tend to do things in this kind of width-by-height, columns-by-rows way. 01:00:37.280 |
My view is don't put up with this kind of incredibly annoying inconsistency, fix it. 01:00:45.660 |
So I've decided the NumPy/PyTorch way is the right way, so fastai is always rows by 01:00:54.340 |
columns. So you'll see here I've switched to rows by columns. 01:01:00.580 |
I've also decided that we're going to do things by describing the top left x, y coordinate 01:01:08.020 |
and the bottom right x, y coordinate bounding box rather than the x, y and the height width. 01:01:15.060 |
So you'll see here I'm just converting the height and width to the top left and bottom right. 01:01:25.600 |
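That conversion, and its inverse which comes up again shortly, look roughly like this (definitions as I recall them from the fast.ai notebook, so treat the details as assumptions):

    import numpy as np

    def hw_bb(bb):
        # [x, y, width, height] -> [top-left row, top-left col, bottom-right row, bottom-right col]
        return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])

    def bb_hw(a):
        # the inverse, for libraries that expect x, y, width, height
        return np.array([a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1])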
So again, I often find dealing with junior programmers, in particular junior data scientists, 01:01:31.420 |
that they get given data sets that are in shitty formats, with crappy APIs, and they just 01:01:37.920 |
act as if everything has to be that way. But your life will be much easier if you take 01:01:42.780 |
a couple of moments to make things consistent and make them the way you want them to be. 01:01:51.300 |
So earlier on I took all of our classes and created a categories list, and so if we look 01:01:58.300 |
up category number 7, which is what this is, category number 7 is a car. Let's have a look 01:02:05.260 |
at another example. Image number 17 has two bounding boxes, one of them is type 15, one 01:02:13.180 |
is type 13, that is a person and a horse. So this will be much easier to understand if 01:02:18.020 |
we can see a picture of these things. So let's create some pictures. 01:02:23.740 |
So having just turned our height width stuff into top left, bottom right stuff, we're now 01:02:32.260 |
going to create a method to do the exact opposite, because any time I want to call some library 01:02:38.980 |
that expects the opposite, I'm going to need to pass it in the opposite. So here is something 01:02:42.980 |
that converts a bounding box to a height-width: bb_hw, i.e. bounding box to height-width. So it's 01:02:50.860 |
again reversing the order and giving us the height and width. So we can now open an image 01:03:04.620 |
in order to display it, and where we're going to get to is we're going to get it to show 01:03:10.940 |
this - that's that car. So one thing that I often get asked on the forums or through 01:03:17.880 |
GitHub is like, well, how did I find out about this open_image thing? Where did it come from, 01:03:26.460 |
what does it mean, who uses it. And so I wanted to take a moment because one of the things 01:03:33.220 |
you're going to be doing a lot, and I know a lot of you aren't professional coders, you 01:03:38.180 |
have backgrounds in statistics or meteorology or physics or whatever, and I apologize for 01:03:44.020 |
those of you who are professional coders, you know all this already. Because we're going 01:03:48.260 |
to be doing a lot of stuff with the fastai library and other libraries, you need to be able to navigate their source code. 01:03:55.460 |
And so let me give you a quick overview of how to navigate through code, and for those 01:04:00.300 |
of you who haven't used an editor properly before, this is going to blow your minds. 01:04:05.820 |
For those of you that have, you're going to be like, check this out guys, check this out. 01:04:11.020 |
For the demo I'm going to show you in Visual Studio Code, personally my view is that on 01:04:17.100 |
pretty much every platform, unless you're prepared to put in the decades of your life 01:04:23.680 |
to learn Vim or Emacs well, Visual Studio Code is probably the best editor out there. 01:04:29.260 |
It's free, it's open source, there are other perfectly good ones as well. 01:04:33.860 |
So if you download a recent version of Anaconda, it will offer to install Visual Studio Code 01:04:38.500 |
for you; it integrates with Anaconda, sets it up with your Python interpreter, and comes with the Python extension ready to go. 01:04:46.020 |
So it's a good choice if you're not sure. If you've got some other editor you like, that's fine too. 01:04:54.640 |
So if I fire up Visual Studio Code, the first thing to do of course is do a git clone of 01:05:01.580 |
the fastai library to your laptop. You'll find in the root of the repo, as well as the 01:05:08.820 |
environment.yml file that sets up the Anaconda environment for GPU, that one of the students has 01:05:14.640 |
been kind enough to create an environment-cpu.yml file, and perhaps one of you that knows how 01:05:21.420 |
to do this can add some notes to the wiki, but basically you can use that to create a CPU-only environment. 01:05:32.180 |
The reason you might want to do that is so that as you navigate the code, you'll be able 01:05:37.620 |
to navigate into PyTorch, you'll see all the stuff is there. 01:05:42.420 |
So I opened up Visual Studio Code, and it's as simple as saying open folder, and then 01:05:48.740 |
you can just point it at the fastai github folder that you just downloaded. 01:05:54.380 |
And so the next thing you need to do is to set up Visual Studio Code to say, I want to use this particular conda environment. 01:06:04.780 |
So the way you do that is with the select interpreter command, and there's a really nice 01:06:09.020 |
idea which is kind of like the best of both worlds between a command-line interface and 01:06:14.620 |
a GUI, which is this is the only command you need to know, Ctrl+Shift+P. You hit Ctrl+Shift+P, 01:06:21.620 |
and then you start typing what you want to do and watch what happens. 01:06:24.700 |
I want to change my interpreter into... okay, and it appears. 01:06:31.340 |
If you're not sure, you can kind of try a few different things. 01:06:36.060 |
So here we are, Python select interpreter, and you can see generally you can type stuff 01:06:40.100 |
in and it will give you a list of matching options. 01:06:42.700 |
And so here's a list of all of the environments and interpreters I have set up, and here's the one I want. 01:06:51.060 |
So that's basically the only setup that you have to do. 01:06:55.340 |
The only other thing you might want to do is to know there's an integrated terminal, 01:06:59.540 |
so if you hit Ctrl+Backtick, it brings up the terminal. 01:07:04.220 |
And the first time you do it, it will ask you what terminal do you want. 01:07:08.780 |
If you're in Windows, it will be like PowerShell or Command Prompt or Bash. 01:07:13.420 |
If you're on Linux and you've got multiple shells installed, it will ask. 01:07:16.880 |
So as you can see, I've got it set up to use Bash. 01:07:21.340 |
And you'll see it automatically goes to the directory that I'm in. 01:07:28.220 |
So the main thing we want to do right now is find out what open_image is. 01:07:32.540 |
So the only thing you need to know to do that is Ctrl+T. 01:07:37.980 |
If you hit Ctrl+T, you can now type the name of a class, a function, pretty much anything 01:07:47.620 |
And it's kind of cool that if it's something with camel case capitals or 01:07:51.620 |
something with underscores, you can just type the first few letters of each bit. 01:08:00.220 |
So I do that, and it's found the function; it's also found some other things that match. 01:08:08.820 |
So that's kind of a good way you can see exactly where it's come from and you can find out exactly 01:08:14.740 |
And then the next thing I guess would be like, well, what's it used for? 01:08:18.620 |
So if it's used inside fast.ai, you can say Find All References, which is Shift+F12. 01:08:26.700 |
So open_image, Shift+F12, and it brings up something saying, oh, it's used 01:08:39.940 |
twice in this code base, and I can go and have a look at each of those examples. 01:08:46.820 |
If it's used in multiple different files, it will show you all of the different files it's used in. 01:08:54.260 |
Another thing that's really handy then is as you look at the code, you'll find that certain 01:08:59.580 |
bits of the code call other parts of the code. 01:09:03.220 |
So for example, if you're inside FilesDataset, and you're like, oh, this is calling something - what is that? 01:09:10.200 |
You can wave your pointer over it and it will give you the docstring. 01:09:14.380 |
Or you can hit f12, and it jumps straight to its definition. 01:09:20.140 |
So often it's easy to get a bit lost when things call things which call other things, and if you have 01:09:24.940 |
to manually navigate to each bit, it's infuriating, whereas this way it's always one button away: 01:09:31.020 |
Ctrl+T to go to something whose name you specifically know, or F12 to jump to the definition 01:09:36.380 |
of something that you're clicking on. 01:09:39.700 |
When you're done, you probably want to go back to where you came from, so Alt+Left takes you back. 01:09:49.740 |
So whatever you use - Vim, Emacs, Atom, whatever - they all have this functionality, as long as you install the right plugins. 01:10:02.820 |
If you use PyCharm, you get that for free; it doesn't need any extensions because it's built in. 01:10:08.340 |
Whatever you're using, you want to know how to do this stuff. 01:10:14.860 |
Finally I'll mention there's a nice thing called Zen mode, Ctrl+KZ, which basically 01:10:22.020 |
gets rid of everything else so you can focus, but it does keep this nice little thing on 01:10:25.980 |
the right-hand side which shows you where you are. 01:10:35.540 |
That's something that you should practice during the week if you haven't played around with it before, 01:10:39.540 |
because we're increasingly going to be digging deeper and deeper into the source code. 01:10:47.060 |
As I say, if you're already a professional coder and know all this stuff, apologies for telling you things you already know. 01:10:53.460 |
So we're going to - well actually, since we did that, let's just talk about open_image. 01:11:02.340 |
You'll see that we're using cv2; cv2 is actually the OpenCV library. 01:11:12.240 |
You might wonder why we're using OpenCV, and I want to explain some of the internals of fast.ai 01:11:19.060 |
to you because some of them are kind of interesting and might be helpful to you. 01:11:24.940 |
Torchvision, the standard PyTorch vision library, actually uses PyTorch tensors 01:11:33.500 |
for all of its data augmentation and stuff like that. 01:11:38.220 |
A lot of people use Pillow, the standard Python imaging library. 01:11:48.180 |
I found OpenCV was about 5 to 10 times faster than TorchVision. Early on I actually teamed 01:11:56.260 |
up with one of the students from an earlier class to do the Planet satellite competition 01:12:00.340 |
back when that was on, and we used TorchVision. 01:12:04.340 |
Because it was so slow, we could only get 25% GPU utilization, because we were bottlenecked on the preprocessing. 01:12:12.620 |
So then I used the Profiler to find out what was going on and realized it was all in TorchVision. 01:12:20.700 |
Pillow or PIL is quite a bit faster, but it's not as fast as OpenCV, and it's also not nearly as thread-safe. 01:12:34.540 |
So I actually talked to the guy who developed the thing. Python has this thing called the 01:12:41.060 |
global interpreter lock, the GIL, which basically means that two threads can't execute Python code at the same time. 01:12:50.500 |
It makes Python a really shitty language for modern programming, but we're stuck with it. 01:12:58.820 |
So I spoke to the guy on Twitter who actually made it so that OpenCV releases the GIL. 01:13:05.580 |
So one of the reasons the fast.ai library is so amazingly fast is because, rather than using 01:13:11.300 |
multiple processes for data augmentation like every other library does, we actually use multiple threads. 01:13:16.940 |
And the reason we can use multiple threads is because we use OpenCV. 01:13:20.940 |
Unfortunately, OpenCV has a really shitty API - it's kind of inscrutable, and a lot of what it does is poorly documented. 01:13:29.940 |
When I say poorly documented, I mean it's documented in really obtuse kinds of ways. 01:13:38.700 |
So that's why I try to make it so that no one using fast.ai needs to know it's using OpenCV. 01:13:45.820 |
If you want to open an image, do you really need to know the flags you have to pass to read it correctly? 01:13:51.700 |
Do you actually need to know that if the reading fails, it doesn't throw an exception - it just silently returns None? 01:13:58.980 |
It's these kinds of things that we try to hide so that it all just works nicely. 01:14:03.500 |
But as you start to dig into it, you'll find yourself in these places and you'll want to 01:14:11.260 |
And I mentioned this in particular to say don't start using PyTorch for your data augmentation, 01:14:19.220 |
don't start bringing in Pillow, you'll find suddenly things slow down horribly or the 01:14:23.380 |
multithreading won't work anymore, try to stick to using OpenCV for your processing. 01:14:35.980 |
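To make that concrete, here's a minimal sketch of an OpenCV-based image opener (the general shape of fastai's open_image, not a verbatim copy):

    import cv2
    import numpy as np

    def open_image(fn):
        # cv2.imread returns None on failure rather than raising, so check explicitly
        flags = cv2.IMREAD_UNCHANGED + cv2.IMREAD_ANYDEPTH + cv2.IMREAD_ANYCOLOR
        im = cv2.imread(str(fn), flags)
        if im is None:
            raise OSError(f'Could not read image: {fn}')
        # OpenCV loads channels in BGR order; flip to RGB and scale to [0, 1] floats
        return cv2.cvtColor(im, cv2.COLOR_BGR2RGB).astype(np.float32) / 255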
So we've got our image, we're just going to use it to demonstrate the Pascal library. 01:14:46.340 |
And so the next thing I wanted to show you in terms of important coding stuff we're going 01:14:50.180 |
to be using throughout this course is using Matplotlib a lot better. 01:14:55.860 |
So Matplotlib is so named because it was originally a clone of Matlab's plotting library. 01:15:05.140 |
Unfortunately, Matlab's plotting library is awful, but at the time it was what everybody knew. 01:15:16.800 |
So at some point, the Matplotlib folks realized that the Matlab plotting library is awful, 01:15:26.540 |
so they added a second API to it which was an object-oriented API. 01:15:31.900 |
Unfortunately, because nobody who originally learned Matplotlib learned the OO API, they 01:15:37.580 |
then taught the next generation of people the old Matlab-style API, and now there's 01:15:42.220 |
basically no examples or tutorials online I'm aware of that use the much, much better, 01:15:50.540 |
So one of the things I'm going to try and show you because plotting is so important 01:15:54.440 |
in deep learning is how to use this API, and I've discovered some simple little tricks. 01:16:00.780 |
One simple little trick is that plt.subplots is a super handy wrapper which I'm going to use 01:16:06.820 |
a lot, and what it does is it returns two things. 01:16:13.220 |
One of them - the figure - you often won't care about; the other is an axes object, 01:16:18.500 |
and basically anywhere where you used to say plt.something, you now say ax.something, and 01:16:25.660 |
it will now do that plotting to that particular subplot. 01:16:30.540 |
So a lot of the time I'll use this during this course to plot multiple 01:16:36.100 |
plots that we can compare next to each other, but even in this case, where I'm creating a single plot, I use it too. 01:16:45.060 |
It's just nice to only know one thing rather than lots of things, so regardless 01:16:50.140 |
of whether you're doing one plot or lots of plots, I always start now with plt.subplots. 01:16:56.980 |
And the nice thing is that this way I can pass in an axes object if I want to plot it 01:17:03.420 |
into a figure I've already created, or if it hasn't been passed in I can create one. 01:17:10.980 |
So this is also a nice way to make your matplotlib functions really versatile, and you'll see this pattern a lot. 01:17:20.900 |
So now rather than plt.imshow, it's ax.imshow. 01:17:25.700 |
And then rather than the kind of weird stateful settings of the old-style API, you can 01:17:33.900 |
now use the OO style: get_xaxis returns an object, set_visible sets a property on it - it's all pretty standard. 01:17:42.960 |
So once you start getting the hang of a small number of these OO matplotlib idioms, hopefully 01:17:49.460 |
you'll find life a lot easier, so I'm going to show you a few right now actually. 01:17:54.780 |
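For example, here's a small image-showing helper in that style (a sketch; the lecture's version is essentially the same):

    import matplotlib.pyplot as plt

    def show_img(im, figsize=None, ax=None):
        # create an axes only if one wasn't passed in, so this works both
        # standalone and as one panel in a larger grid of subplots
        if ax is None:
            fig, ax = plt.subplots(figsize=figsize)
        ax.imshow(im)
        ax.get_xaxis().set_visible(False)   # OO style: get an object, set a property
        ax.get_yaxis().set_visible(False)
        return ax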
So let me show you a cool example, what I think is a cool example. 01:17:59.500 |
So one thing that kind of drives me crazy with people putting text on images, whether 01:18:05.380 |
it be subtitles on TV or people doing stuff with computer vision is that it's like white 01:18:11.900 |
text on a white background or black text on a dark background, you can't read it. 01:18:16.900 |
And so a really simple thing that I like to do every time I draw on an image is to either 01:18:22.660 |
make my text in boxes white with a little black border or vice versa. 01:18:28.740 |
And so here's a cool little thing you can do in matplotlib: you can take a matplotlib 01:18:36.140 |
plotting object, call set_path_effects on it, and say add a black stroke around it. 01:18:47.580 |
And you can see that when you draw that, it doesn't matter that here it's white on a white 01:18:54.420 |
background or here it's on a black background, it's equally visible. 01:18:59.060 |
And I know it's a simple little thing, but it kind of just makes life so much better 01:19:04.260 |
when you can actually see your bounding boxes and actually read the text. 01:19:08.420 |
So you can see, rather than just saying add a rectangle, I grab the object that it creates, 01:19:19.700 |
and now everything I draw gets this nice path effect on it. 01:19:24.860 |
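In code, that's roughly:

    import matplotlib.patheffects as patheffects

    def draw_outline(o, lw):
        # stroke the artist in black, then redraw it normally on top, so white
        # text and boxes stay readable on any background
        o.set_path_effects([patheffects.Stroke(linewidth=lw, foreground='black'),
                            patheffects.Normal()])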
You can see matplotlib is a perfectly convenient way of drawing stuff. 01:19:30.300 |
So when I want to draw a rectangle, matplotlib calls that a patch, and then you can pass in the styling you want. 01:19:39.740 |
So here's - again, rather than having to remember all that every time, please stick it in a function, 01:19:46.420 |
And now you can use that function every time. 01:19:49.660 |
You don't have to put it in a library somewhere, I always put lots of functions inside my notebook. 01:19:55.380 |
If I use it in like three notebooks, then I know it's useful enough that I'll stick it into the library. 01:20:05.100 |
You can draw text, and notice all of these take an axis object, so this is always going 01:20:10.500 |
to be added to whatever thing I want to add it to. 01:20:13.420 |
So I can add text, and draw an outline around it. 01:20:18.180 |
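So the helpers end up looking something like this (in the spirit of the lecture notebook):

    import matplotlib.patches as patches

    def draw_rect(ax, b):
        # b is [x, y, width, height]; keep the patch that add_patch returns
        # so we can outline it
        patch = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False,
                                               edgecolor='white', lw=2))
        draw_outline(patch, 4)

    def draw_text(ax, xy, txt, sz=14):
        text = ax.text(*xy, txt, verticalalignment='top', color='white',
                       fontsize=sz, weight='bold')
        draw_outline(text, 1)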
So having done all that, I can now use my show_img; and notice here that show_img, 01:20:26.340 |
if you didn't pass it an axis, returns the axis it created. 01:20:29.820 |
So show_img returns the axis the image is on; I then turn this particular image's bounding box 01:20:35.580 |
into height-width form, and then I can draw the rectangle and draw the text. 01:20:48.180 |
So remember the bounding box x and y are the first two coordinates, so b[:2] is the top-left x,y. 01:20:57.980 |
And remember the tuple contains two things, the bounding box and then the class, 01:21:04.040 |
so this is the class, and then to get the text of it I just pass it into my categories dictionary. 01:21:11.780 |
So now that I've kind of got all that set up, I can use it for all of my object detection work from here. 01:21:21.480 |
What I really want to do though is to kind of package all that up, so here it is, packaging 01:21:25.460 |
it all up, so here's something that draws an image with some annotations, so it shows 01:21:31.000 |
the image, goes through each annotation, turns it into height-width form, and draws the rectangle and the text. 01:21:40.420 |
If you haven't seen this before, each annotation remember contains a bounding box and a class, 01:21:46.300 |
so rather than going "for o in ann" and then indexing o[0] and o[1], I can destructure it: if 01:21:56.800 |
you put two names on the left, then that's going to put the two parts of the tuple into those two names. 01:22:06.260 |
So for the bounding box and the class in the annotations, go ahead and do all that, and 01:22:14.060 |
so then I can then say okay, draw an image at a particular index by grabbing the image 01:22:20.080 |
ID, opening it up and then calling that draw, and so let's test it out, and there it is. 01:22:29.140 |
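Packaged up, that looks roughly like this (show_img, bb_hw, draw_rect, draw_text, cats, trn_anno, trn_fns and IMG_PATH are all defined earlier in the notebook):

    def draw_im(im, ann):
        ax = show_img(im, figsize=(16, 8))
        for b, c in ann:              # destructure each (bounding box, class) tuple
            b = bb_hw(b)              # convert to x, y, width, height
            draw_rect(ax, b)
            draw_text(ax, b[:2], cats[c], sz=16)

    def draw_idx(i):
        im_a = trn_anno[i]            # grab the annotations for image id i
        im = open_image(IMG_PATH / trn_fns[i])
        draw_im(im, im_a)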
So that kind of seems like quite a few steps, but to me, when you're working with a new 01:22:38.220 |
dataset, getting to the point where you can rapidly explore it pays off. 01:22:45.980 |
You'll see, as we start building our model, that we're going to keep using these functions again and again. 01:22:58.060 |
So step 1 from our presentation is to do a classifier. 01:23:05.740 |
And so I think it's always good - like, for me, I didn't really have much experience, before 01:23:10.340 |
I started preparing this course a few months ago, in doing this kind of object detection 01:23:16.260 |
stuff, so I was like, alright, even though it's deep learning, I want that feeling of making continual progress. 01:23:28.460 |
So I thought, why don't I start by finding the biggest object in each image and classifying it? 01:23:37.460 |
This is one of the biggest problems I find, particularly with younger students, is they 01:23:41.900 |
figure out the whole big solution they want, generally which involves a whole lot of new 01:23:48.420 |
speculative ideas that nobody's ever tried before, and they spend 6 months doing it, 01:23:54.060 |
and then the day before the presentation, none of it works, and they're screwed. 01:24:00.740 |
I've talked about my approach to Kaggle competitions before, which is to spend like half an hour a day. 01:24:06.460 |
At the end of that half an hour, submit something, and try to make it a little bit better than yesterday's. 01:24:11.820 |
So I've kind of tried to do the same thing in preparing this lesson, which is try to 01:24:18.060 |
create something that's a bit better than the last thing. 01:24:20.740 |
So the first thing, the easiest thing I could come up with was my largest item classifier. 01:24:27.340 |
So the first thing I needed to do was to go through each of the bounding boxes in an image and find the largest one. 01:24:43.500 |
So I actually didn't write that first, I actually wrote this first. 01:24:48.900 |
So normally I pretend that somebody else has created the exact API I want, use it, and then go back and write it. 01:24:56.580 |
So I wrote this line first, and it's like, okay, I need something which takes all of 01:25:03.300 |
the bounding boxes for a particular image and finds the largest, and that's pretty straightforward. 01:25:12.820 |
I can just sort the bounding boxes, and here again we've got a lambda function. 01:25:18.620 |
So again, if you haven't used lambda functions before, this is something you should study 01:25:22.180 |
during the week, they're used all over the place to quickly define a once-off function. 01:25:29.380 |
And in this case, the Python built-in sorted function lets you pass in a function to say, 01:25:39.460 |
how do you decide whether something's earlier or later in the sort order? 01:25:44.660 |
And in this case, I took the last two items of my bounding box list, i.e. 01:26:00.300 |
the bottom right-hand corner, minus the first two items, i.e. the top left-hand corner. 01:26:06.860 |
So bottom right minus top left gives the two side lengths, and if you take the product 01:26:12.620 |
of those two things you get the area of the bounding box. 01:26:15.980 |
And so then that's the function, do that in descending order. 01:26:20.940 |
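In code, that sort looks like this (essentially the notebook's get_lrg; each annotation is a (bbox, class) tuple with the bbox stored as a NumPy array):

    import numpy as np

    def get_lrg(b):
        if not b:
            raise Exception('no annotations for this image')
        # area = (bottom right - top left) multiplied together; biggest first
        b = sorted(b, key=lambda x: np.product(x[0][-2:] - x[0][:2]), reverse=True)
        return b[0]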
Often you can take something that's going to be a few lines of code and turn it into 01:26:27.020 |
one line of code, and sometimes you can take that too far, but for me, I like to do that 01:26:35.580 |
where I reasonably can, because again, having to understand a whole big chain of things, 01:26:43.620 |
my brain can just say, I can just look at that at once, and say okay, there it is. 01:26:48.980 |
And also I find that over time, my brain kind of builds up this little library of idioms, 01:26:55.860 |
and more and more things I can look at a single line and know what's going on. 01:27:04.020 |
So this now is a dictionary, and it's a dictionary because this is a dictionary comprehension. 01:27:15.140 |
A dictionary comprehension is just like a list comprehension, I'm going to use it a lot 01:27:18.820 |
in this part of the course, except it goes inside curly braces, and it's got a key colon value pair. 01:27:27.820 |
So here the key is going to be the image ID, and the value is the largest bounding box. 01:27:39.540 |
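As a one-liner, with trn_anno being the image-ID-to-annotations dictionary built earlier:

    trn_lrg_anno = {a: get_lrg(b) for a, b in trn_anno.items()}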
So now that we've got that, we can look at an example, and here's an example of the largest object in an image. 01:27:51.020 |
So obviously there are a lot of objects here - there's three bicycles and three people - but we only get the single largest one. 01:28:01.500 |
I feel like this ought to go without saying, but it definitely needs to be said because so few people do it: 01:28:07.020 |
you need to look at the output of every stage when you've got any kind of processing pipeline, because if you're 01:28:13.680 |
as bad at coding as I am, everything you do will be wrong the first time you do it. 01:28:20.220 |
But there's lots of people that are as bad as me at coding, and yet lots of people write 01:28:24.700 |
lines and lines of code assuming they're all correct, and then at the very end they've got 01:28:29.460 |
a mistake and they don't know where it came from. 01:28:32.020 |
So particularly when you're working with images or text - things that humans can look at - look at them as you go. 01:28:41.740 |
So here I have it: yep, that looks like the biggest thing, and it certainly looks like the right class. 01:28:48.660 |
Here's another nice thing in pathlib: make directory (mkdir) is a handy little method. 01:28:56.440 |
So I'm going to create a path called CSV, which is a path to my large objects CSV file. 01:29:10.860 |
We already have ImageClassifierData.from_csv, so rather than going through a whole lot of work to create a 01:29:17.060 |
custom dataset and blah blah blah for the particular format I have, 01:29:23.900 |
it's so much easier to just create a CSV, chuck it inside a temporary folder, and use something that already exists. 01:29:34.020 |
Something I've seen a lot of times on the forum is people will say how do I convert 01:29:39.340 |
this weird structure into a way that fast.ai can accept it, and then normally somebody 01:29:45.700 |
on the forum will say, print it to a CSV file. 01:29:53.620 |
And the easiest way to create a CSV file is to create a pandas dataframe. 01:29:58.140 |
So here's my pandas dataframe, I can just give it a dictionary with the name of a column 01:30:04.660 |
and the list of things in that column, so there's the file name, there's the category, 01:30:11.340 |
and then you'll see here, why do I have this? 01:30:13.340 |
I've already named the columns in the dictionary, why is it here? 01:30:17.460 |
Because the order of columns matters, and the dictionary does not have an order. 01:30:23.700 |
So this says the file name comes first and the category comes second. 01:30:27.500 |
So that's a good trick for creating your CSVs. 01:30:33.420 |
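Concretely, it's something like this (trn_ids, trn_fns, cats and trn_lrg_anno are built earlier in the notebook; CSV is the pathlib path just created):

    import pandas as pd

    df = pd.DataFrame({'fn': [trn_fns[o] for o in trn_ids],
                       'cat': [cats[trn_lrg_anno[o][1]] for o in trn_ids]},
                      columns=['fn', 'cat'])   # columns= pins the column order
    df.to_csv(CSV, index=False)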
I now have a CSV file that contains a bunch of file names, and for each one the class of its largest object. 01:30:42.540 |
So this is the same two lines of code you've seen a thousand times. 01:30:48.900 |
What we will do though is take a look at the crop type. 01:31:01.180 |
So you might remember the default strategy for creating a 224x224 image in fastai is 01:31:12.580 |
to first of all resize it, so the largest side is 224, and then to take a random square 01:31:25.300 |
crop during training, and then during validation we take the center crop unless we use data 01:31:31.900 |
augmentation in which case we do a few random crops. 01:31:37.860 |
For bounding boxes, we don't want to do that because unlike an image net where the thing 01:31:44.180 |
we care about is pretty much in the middle and it's pretty big, a lot of the stuff in 01:31:49.220 |
object detection is quite small and close to the edge. 01:31:53.940 |
So if we cropped, we could crop the object out entirely, and that would be bad. 01:31:56.880 |
So when you create your transforms, you can choose crop_type=CropType.NO, and NO means 01:32:04.500 |
don't crop; therefore, to make it square, it squishes it instead. 01:32:10.060 |
So you'll see this guy now looks a bit strangely wide, and that's because he's been squished. 01:32:21.700 |
Generally speaking, a lot of computer vision models work a little bit better if you crop 01:32:29.900 |
rather than squish, but they still work pretty well if you squish, and in this case we definitely 01:32:36.040 |
don't want to crop, so this is perfectly fine. 01:32:40.540 |
If you had very long or very tall images, such that if a human looked at the squished version 01:32:48.620 |
they'd say that looks really weird, then that might be difficult to model, but in this 01:32:53.140 |
case it just looks a little bit strange, and the computer won't mind. 01:33:03.500 |
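So the transforms and model data end up looking something like this (sz, bs, f_model and the path constants as defined in the notebook):

    tfms = tfms_from_model(f_model, sz, aug_tfms=transforms_side_on, crop_type=CropType.NO)
    md = ImageClassifierData.from_csv(PATH, JPEGS, CSV, tfms=tfms, bs=bs)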
So I'm going to quite often dig a little bit into some more depths of fast.ai and PyTorch, 01:33:11.780 |
and in this case I want to just look at data loaders a little bit more. 01:33:16.460 |
So you already know that inside a model data object - of which there are lots of subclasses, 01:33:30.200 |
like ImageClassifierData - we have a bunch of things, which include a training data loader 01:33:36.900 |
and a training data set, and we'll talk much more about this soon. 01:33:42.500 |
The main thing to know about a data loader is that it's an iterator, that each time you 01:33:48.560 |
grab the next iteration of stuff from it, you get a mini-batch. 01:33:54.020 |
And the mini-batch you get is of whatever size you asked for; by default the batch size is 64. 01:34:02.260 |
In Python, the way you grab the next thing from an iterator is with next, but you can't just call next on the data loader directly. 01:34:19.540 |
The reason you can't do that is because you need a way to say: start a new epoch now. 01:34:26.820 |
In general, this isn't just in PyTorch, but for any Python iterator, you kind of need 01:34:31.740 |
to say start at the beginning of the sequence, please. 01:34:35.980 |
So the way you do that, and this is a general Python concept, is you write iter. 01:34:43.020 |
And iter says please grab an iterator out of this object. 01:34:49.940 |
Specifically, as we'll learn later, it means this class has to have defined an __iter__ method, 01:34:54.580 |
which returns some different object which then has a __next__ method. 01:35:07.820 |
And so if you want to grab just a single batch, this is how you do it. 01:35:19.500 |
Because the data sets behind our data loaders always have an x independent variable and a y dependent variable, 01:35:30.980 |
So here we can grab a mini-batch of x's and y's. 01:35:35.260 |
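As a one-liner (md being the model data object):

    x, y = next(iter(md.val_dl))   # one mini-batch from the validation data loader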
I now want to pass that to the show image command we had earlier, but we can't send x to it directly. 01:36:00.660 |
For one thing, it's not a NumPy array, it's not on the CPU, and its shape is all wrong. 01:36:15.380 |
Furthermore these are not numbers between 0 and 1, why not? 01:36:21.620 |
Because remember all of the standard ImageNet pre-trained models expect our data to be normalized 01:36:33.900 |
So if you look inside - let's use Visual Studio Code for this, since that's what we've 01:36:38.420 |
been doing - so if you look inside tfms_from_model (Ctrl+T, tfms_from_model), 01:36:50.460 |
you'll see that it in turn calls transforms, so F12 jumps us into that. 01:37:17.420 |
And it normalizes with some set of image statistics, and there's basically a hard-coded list of them: 01:37:24.300 |
these are the ImageNet statistics, and these are the statistics used for Inception models. 01:37:29.220 |
So there's a whole bunch of stuff that's been done to the input to get it ready to be passed to a pre-trained model. 01:37:39.240 |
So we have a function called denorm for denormalize. 01:37:45.600 |
It doesn't only denormalize, it also fixes up the dimension order and all that stuff. 01:37:52.980 |
The denormalization depends on the transform. 01:37:57.460 |
And the dataset knows what transform was used to create it. 01:38:01.620 |
So that's why you have to go model data, dot, and then some dataset, dot, denorm, and that's 01:38:07.700 |
a function that is stored for you that will undo everything. 01:38:12.620 |
And then you can pass that a mini-batch, but you have to turn it into NumPy first. 01:38:20.900 |
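Putting that together, the notebook does roughly this (to_np is fastai's tensor-to-NumPy helper):

    show_img(md.val_ds.denorm(to_np(x))[0])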
So this is like all the stuff that you need to be able to do to grab batches and look 01:38:27.400 |
And so after you've done all that, you can show the image, and we've got back our last 01:38:36.040 |
So in the end, we've just got the standard four lines of code. 01:38:39.340 |
We've got our transforms, we've got our model data, ConvLearner.pretrained - we're using 01:38:45.700 |
a ResNet34 here - I'm going to add accuracy as a metric, pick an optimization function, 01:38:53.100 |
do an LRfind, and that looks kind of weird, not particularly helpful. 01:38:58.200 |
Normally we would expect to see an uptick on the right. 01:39:01.420 |
The reason we don't see it is because we intentionally remove the first few points and the last few points from the plot. 01:39:09.660 |
The reason is that often the last few points shoot so high up towards infinity that you 01:39:13.940 |
basically can't see anything, so the vast majority of the time removing the last few is the right thing to do. 01:39:20.180 |
However, when you've got very few mini-batches, sometimes it's not a good idea, and so a lot 01:39:25.740 |
of people ask this on the forum, here's how you fix it. 01:39:28.820 |
Just set the skip arguments: by default it skips 10 points at the start, so in this case we just say 5; by default 01:39:34.300 |
it skips 5 at the end; and now we can see the shape properly. 01:39:41.380 |
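In code, that's something like this (n_skip and n_skip_end are the plot arguments; the defaults are 10 and 5, as described above, and the values here are just what suits this dataset):

    learn.lr_find()
    learn.sched.plot(n_skip=5, n_skip_end=1)   # trim fewer points so the shape is visible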
If your data set is really tiny, you may need to use a smaller batch size, like if you only 01:39:46.700 |
have three or four batches worth, there's nothing to see. 01:39:52.220 |
But in this case, it's fine, we just have to plot a little bit more. 01:39:57.220 |
So we pick a learning rate, we say fit, after one epoch, just train the last layer, it's 01:40:05.660 |
80%; let's unfreeze a couple of layers, do another epoch, 82%; then unfreeze the whole thing and train a bit more. 01:40:24.060 |
Unlike ImageNet or dogs vs cats, where each image has one major thing, they were picked 01:40:29.900 |
because they had one major thing, and the one major thing is what you're asked to look 01:40:33.460 |
for, whereas a lot of the Pascal dataset images have lots of little things, so a largest-object classifier is never going to do perfectly. 01:40:45.860 |
But of course, we really need to be able to see the results to see whether it makes sense. 01:40:54.020 |
So we're going to write something that creates this, and in this case, after working with 01:41:00.620 |
this a while, I know what the 20 Pascal classes are. 01:41:05.220 |
So I know there's a person in the bicycle image, and I know there's a dog on a sofa in this one, 01:41:09.220 |
so I know this one's wrong - it should be sofa; that's correct; bird, yes, yes; chair, that's 01:41:14.380 |
wrong, I think the table's bigger; motorbike's correct (there's no cactus class); that should 01:41:18.960 |
be a bus; person's correct, bird's correct, cow's correct, plant's correct, cow's correct. 01:41:27.960 |
So when you see a piece of code like this, if you're not familiar with all the steps 01:41:36.940 |
to get there, it can be a little overwhelming. 01:41:43.660 |
And I feel the same way when I see a few lines of code and something I'm not terribly familiar 01:41:47.540 |
with, I feel overwhelmed as well, but it turns out there are two ways to make it super simple to understand. 01:41:58.900 |
The high-level way is: run each line of code one step at a time, print out the inputs, print out the outputs. 01:42:13.820 |
If there's a line of code where you don't understand how the outputs relate to the inputs, that's the one to dig into. 01:42:21.040 |
So now all you need to know is: what are the two ways you can step through the lines of code? 01:42:28.340 |
The way I use perhaps the most often is to take the contents of the loop, copy it, create 01:42:37.780 |
a cell above it, paste it, outdent it, write i=0, and then put them all in separate cells, 01:42:48.560 |
and then run each one one at a time, printing out the input samples. 01:42:52.700 |
I know that's obvious, but the number of times I actually see people do that when they ask 01:42:57.620 |
me for help is basically zero, because if they had done that, they wouldn't be asking 01:43:06.020 |
Another method that's super handy - and there are particular situations where it's the only practical option - is the Python debugger, which maybe half of you have used before. 01:43:20.220 |
So for the other half of you, this will be life-changing. 01:43:23.820 |
Actually, a guy I know, who's a deep learning researcher, wrote 01:43:29.540 |
on Twitter this morning, and his message was, "How come nobody told me about the Python debugger before?" 01:43:39.220 |
And this guy's an expert, but because nobody teaches basic software engineering skills 01:43:45.660 |
in academic courses, nobody thought to say to him, "Hey Mark, do you know what? 01:43:52.860 |
There's something that shows you everything your code does one step at a time." 01:43:58.220 |
So I replied on Twitter and I said, "Good news Mark, not only that, every single language 01:44:03.780 |
in existence, in every single operating system also has a debugger, and if you Google for 01:44:08.860 |
language name debugger, it will tell you how to use it." 01:44:12.660 |
So there's a meta piece of information for you. 01:44:15.740 |
In Python, the standard debugger is called PDB. 01:44:25.660 |
And the reason I'm mentioning this now is because during the next few weeks, if you're 01:44:32.220 |
anything like me, 99% of the time you'll be in a situation where your code's not working. 01:44:38.860 |
And very often the problem will be on the 14th mini-batch, inside the forward method of your module - so what do you do? 01:44:49.060 |
And the answer is: you go inside your module and you write pdb.set_trace(). 01:44:54.500 |
And if you know it's only happening on the 14th iteration, you write if i == 13: pdb.set_trace(). 01:45:09.460 |
pdb is the Python debugger; fastai imports it for you, but if you get a message that pdb 01:45:14.100 |
is not defined, you can just say import pdb. 01:45:19.260 |
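A minimal sketch of that pattern (the loop and names here are hypothetical):

    import pdb

    for i, (x, y) in enumerate(dataloader):
        if i == 13: pdb.set_trace()   # drop into the debugger only on the 14th mini-batch
        out = model(x)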
And you'll see it's not the most user-friendly experience. 01:45:25.460 |
But the first cool thing to notice is, the debugger even works in a notebook. 01:45:42.540 |
The main thing to know is this is one of these situations where you definitely want to know 01:45:48.100 |
So you could type next, but you definitely want to type n. 01:45:51.740 |
You could type continue, but you definitely want to type c. 01:45:57.180 |
So what I can do now that I'm sitting here is it shows me the line it's about to run. 01:46:06.220 |
So one thing I might want to do is print something out, and I can write any Python expression right here. 01:46:24.660 |
I might want to find out more about where am I in the code more generally. 01:46:29.700 |
I don't just want to see this line, but what's before it and after it, in which case I want the list command, l. 01:46:35.540 |
And so you can see I'm about to run that line, these are the lines above it, and below it. 01:46:43.300 |
So I might be now like, let's run this line and see what happens. 01:46:47.060 |
So go to the next line, here's n, and you can see now it's about to run the next line. 01:46:55.420 |
One handy tip, you don't even have to type n. 01:46:57.700 |
If you just hit enter, it repeats the last thing you did, so that's another n. 01:47:04.340 |
Unfortunately, single letters are often used for debugger commands. 01:47:10.900 |
So if I just type b, it'll run the b (breakpoint) command rather than print the variable b for me, so I say p b instead. 01:47:28.380 |
At this point, if I hit next, it'll draw the text. 01:47:32.920 |
But I don't want to just draw the text, I want to know how it's going to draw the text. 01:47:37.780 |
So I don't want to know next over it, I want to s step into it. 01:47:41.660 |
So if I now hit s, step into it, I'm now inside draw_text, and as I hit n I can see draw_text run line by line. 01:47:52.700 |
And then I'm like, okay, I know everything I want to know about this, I will continue 01:47:59.020 |
So c will continue until I'm back at the breakpoint again. 01:48:05.500 |
What if I was zipping along - and this happens quite often - let's step into denorm. 01:48:15.900 |
And what will often happen is if you're debugging something in your PyTorch module, and it's 01:48:22.260 |
hit an exception, and you're trying to debug it, you'll find yourself like six layers deep in the call stack. 01:48:29.740 |
You want to actually see back up what's happening where you called it from. 01:48:34.480 |
So in this case, I'm inside this property, but I actually want to know what was going 01:48:38.680 |
on up the call stack, I just hit u, and that doesn't actually run anything, it just changes 01:48:46.200 |
the context of the debugger to show me what called it, and now I can type things to find out what was going on up there. 01:48:59.420 |
And if I want to go down again, that's d. It's a deep topic, so I'm not going to show you everything 01:49:04.980 |
about the debugger, but I've just shown you all of the key commands. 01:49:12.380 |
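To recap the handful of pdb commands used above:

    n        - next: run the current line and stop at the following one
    s        - step: step into the function called on the current line
    c        - continue: run until the next breakpoint
    u / d    - move up / down the call stack without executing anything
    l        - list: show the source lines around the current one
    p <expr> - print: evaluate and print an expression (use p b when b clashes with a command)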
Something that we've found helpful as we've been doing this is using from IPython.core.debugger 01:49:17.020 |
import set_trace instead, and then you get it all prettily colored. 01:49:28.220 |
Azar, tell us, I know you're doing an interesting project, can you tell us about it? 01:49:34.300 |
Hello everyone, I'm Azar, here with my collaborator Britt, and we're using this kind of stuff 01:49:42.340 |
to try to build a Google Translate for animal communication. 01:49:48.620 |
So that involves playing around a lot with unsupervised neural machine translation and related techniques. 01:49:59.660 |
We're talking to a number of researchers to try to collect 01:50:03.480 |
and collate large data sets, but if we can't get it that way, we're thinking about building 01:50:07.740 |
a living library of the audio of the species of Earth, which involves going out and collecting it ourselves. 01:50:29.000 |
The other place that the debugger comes in particularly handy is, as I say, if you've 01:50:35.200 |
got an exception, particularly if it's deep inside PyTorch. 01:50:38.560 |
So if I, say, multiply my index by 100 here, obviously that's going to cause an exception, and I've got one. 01:50:50.940 |
Now in this case it's easy to see what's wrong, but often it's not, so what do I do? 01:50:57.940 |
%debug pops open the debugger at the point the exception happened. 01:51:05.060 |
So now I can check: okay, len(preds) is 64; i times 100 - I've got to print that with p, because 01:51:17.700 |
it clashes with a command - is 100; oh, no wonder. And you can go down, you can go up, you can list, whatever. 01:51:28.460 |
I do all of my development, both with the library and the lessons in Jupyter Notebook. 01:51:37.340 |
I do it all interactively, and I use %debug all the time, along with this idea of 01:51:46.540 |
copying stuff out of a function, putting it into separate cells, running it step by step. 01:51:52.140 |
There are similar things you can do inside, for example, Visual Studio Code. 01:51:56.340 |
There's actually a Jupyter extension which says you select any line of code inside Visual 01:52:01.460 |
Studio Code and say run in Jupyter, and it will run it in Jupyter and show the result in a little window. 01:52:12.700 |
Actually I think Jupyter Notebook is better, and perhaps by the time you watch this on 01:52:18.940 |
the video, Jupyter Lab will be the main thing. 01:52:21.740 |
Jupyter Lab is like the next version of Jupyter Notebook, pretty similar. 01:52:42.780 |
We know exactly how to fix it, so we will worry about that another time. 01:52:53.220 |
So to kind of do the next stage, we want to create the bounding box. 01:53:00.740 |
And now creating the bounding box around the largest object may seem like something you 01:53:05.760 |
haven't done before, but actually it's totally something you've done before. 01:53:11.500 |
And the reason it's something you've done before is that we know we can create a regression neural net instead of a classification one. 01:53:23.500 |
In other words, a classification neural net is just one that has a sigmoid or softmax 01:53:28.300 |
output and uses a cross-entropy or binary cross-entropy (negative log likelihood) loss function. 01:53:40.500 |
If we don't have the softmax of sigmoids at the end, and we use mean squared error as 01:53:46.100 |
a loss function, it's now a regression model, so we can now use it to predict a continuous 01:53:55.100 |
We also know that we can have multiple outputs, like in the Planet competition, where we did multiple-label classification. 01:54:06.220 |
What if we combine the two ideas and do a multiple column regression? 01:54:12.140 |
In this case we've got four numbers - top-left x and y, bottom-right x and y - and 01:54:19.780 |
we could create a neural net with four activations. 01:54:23.340 |
We could have no softmax or sigmoid and use a mean squared error loss function. 01:54:28.260 |
And this is kind of like where you're thinking about it like differentiable programming. 01:54:33.260 |
It's not like how do I create a bounding box model, it's like what do I need? 01:54:40.460 |
I need four numbers, therefore I need a neural network with four activations. 01:54:49.860 |
The other half I need to know is a loss function. 01:54:52.500 |
In other words, what's a function that, when it is lower, means that the four numbers are closer to the true ones? 01:54:59.940 |
Because if I can do those two things, I'm done. 01:55:05.980 |
If the true x is close to the first activation and the true y is close to the second, and so on, then I'm doing well. 01:55:15.100 |
I just need to create a model with four activations and a mean squared error loss function, and that's basically it. 01:55:32.100 |
And if you remember from part 1, to do a multiple label classification, your multiple labels 01:55:40.820 |
have to be space-separated, and then your file name is comma-separated. 01:55:46.660 |
So I'll take my largest item dictionary, create a bunch of bounding boxes for each one separated 01:55:59.740 |
by a space using a list comprehension, then create a data frame like I did before, I'll 01:56:05.020 |
turn that into a CSV, and now I've got something that's got the file name and the four bounding box coordinates. 01:56:12.380 |
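A sketch of that, following the notebook's naming (BB_CSV is the path for this second CSV):

    bb = np.array([trn_lrg_anno[o][0] for o in trn_ids])
    bbs = [' '.join(str(p) for p in o) for o in bb]   # space-separate the 4 coordinates
    df = pd.DataFrame({'fn': [trn_fns[o] for o in trn_ids], 'bbox': bbs},
                      columns=['fn', 'bbox'])
    df.to_csv(BB_CSV, index=False)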
I will then pass that to from_csv, and again I will use crop_type=CropType.NO. 01:56:23.180 |
Next week we'll look at TfmType.COORD, the transform type for coordinates. 01:56:26.500 |
For now, just realize that when we're doing scaling and data augmentation, that needs 01:56:30.260 |
to happen to the bounding boxes, not just to the images. 01:56:34.420 |
ImageClassifierData.csv gets us to a situation where we can now grab one mini-batch of data, 01:56:43.260 |
we can denormalize it, and we can turn the bounding box back into height-width form so that we can plot it. 01:56:50.160 |
Remember, we're not doing classification here, so I don't know what kind of thing this is - it's just a box. 01:56:57.700 |
So I now want to create a convnet based on ResNet-34, but I don't want to add the standard 01:57:08.540 |
set of fully connected layers that create a classifier; I want to add a single linear layer. 01:57:18.700 |
So FastAI has this concept of a custom head, if you say my model has a custom head, the 01:57:25.820 |
head being the thing that's added to the top of the model, then it's not going to create 01:57:30.740 |
any of that fully connected network for you, it's not going to add the adaptive average 01:57:36.860 |
pooling for you, but instead it will add whatever model you ask for. 01:57:42.540 |
So in this case I've created a tiny model: it's a model that flattens out the previous layer. 01:57:50.700 |
Normally the final convolutional layer in ResNet-34 is 7 x 7 x 512, so it just flattens 01:57:56.300 |
that out into a single vector of length 25,088, and then I just add a linear layer that goes from 25,088 to 4. 01:58:06.860 |
So that's the simplest possible kind of final layer you could add. 01:58:12.260 |
I stick that on top of my pre-trained ResNet-34 model, so this is exactly the same as usual, except with a custom head. 01:58:21.280 |
Optimize it with Adam, and use a criterion - I'm actually not going to use MSE, I'm going to 01:58:25.260 |
use L1 loss. I can't remember if we covered this last week - we can revise it next week 01:58:30.620 |
if we didn't - but L1 loss means rather than adding up the squared errors, you add up the absolute values of the errors. 01:58:39.180 |
misses by too much, so L1 loss is generally better to work with. 01:58:52.500 |
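Here's roughly what that looks like in code (names follow the lecture notebook; Flatten is a fastai helper module):

    head_reg4 = nn.Sequential(Flatten(), nn.Linear(25088, 4))   # 25088 = 7 * 7 * 512
    learn = ConvLearner.pretrained(f_model, md, custom_head=head_reg4)
    learn.opt_fn = optim.Adam
    learn.crit = nn.L1Loss()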
I'll come back to this next week, but basically you can see what we do now: we run lr_find 01:58:57.460 |
to find our learning rate, train for a while, freeze_to(-2), train a bit more, freeze_to(-3), 01:59:06.180 |
train a bit more, and you can see this validation loss - which remember is the 01:59:12.860 |
mean absolute value of the number of pixels we're off by - gets lower and lower, and then when 01:59:18.700 |
we're done we can print out the bounding boxes, and lo and behold, it's done a damn good job. 01:59:27.940 |
So we'll revise this a bit more next week, but you can see this idea of like if I said 01:59:34.300 |
to you before this class, do you know how to create a bounding box model? 01:59:38.540 |
You might have said, no, nobody's taught me that. 01:59:42.900 |
But the question actually is, can you create a model with 4 continuous outputs? 01:59:49.580 |
Can you create a loss function that is lower if those 4 outputs are near to 4 other numbers? 01:59:57.300 |
Now you'll see if I scroll a bit further down, it starts looking a bit crappy. 02:00:05.140 |
And that's not surprising, because how the hell do you decide which bird, so it's just 02:00:11.420 |
said I'll just pick the middle, which cow, I'll pick the middle. 02:00:17.380 |
How much of this is actually potted plant, I'll pick the middle. 02:00:21.580 |
This one it could probably improve - it's got close to the car - but it's a pretty weird shot. 02:00:27.220 |
But nonetheless, for the ones that are reasonably clear, I would say it's done a pretty good job. 02:00:38.780 |
I think it's been a kind of gentle introduction for the first lesson. 02:00:43.900 |
If you're a professional coder, there's probably not heaps of new stuff here for you. 02:00:50.060 |
And so in that case I would suggest practicing learning about bounding boxes and stuff. 02:00:55.900 |
If you aren't so experienced with things like debuggers and matplotlib API and stuff like 02:01:02.980 |
that, there's going to be a lot for you to practice because we're going to be really