
Lesson 8: Deep Learning Part 2 2018 - Single object detection


Chapters

0:00 Intro
1:58 Key takeaways
2:11 Differentiable programming
3:21 Transfer learning
5:14 Architecture design
6:26 Overfitting
7:45 Categorical Data
8:50 Cutting Edge Deep Learning
14:17 How to use provided notebooks
16:50 A good time to build a deep learning box
21:34 Reading papers
30:12 Generative models
35:07 Object detection
38:32 Stage 1 classification
40:12 Notebook
42:44 Pathlib
45:52 path.open
49:12 Contents
54:17 Dictionary


00:00:00.000 | Okay, welcome to Part 2 of Deep Learning for Coders.
00:00:07.840 | Part 1 was Practical Deep Learning for Coders, Part 2 is not impractical, but it is a little
00:00:15.560 | different as we'll discuss.
00:00:17.840 | This is probably a really dumb idea, but last year I started not starting Part 2 with Part
00:00:23.920 | 2 Lesson 1, but Part 2 Lesson 8 because it's kind of part of the same sequence.
00:00:29.640 | I've done that again, but sometimes I'll probably forget and call things Lesson 1.
00:00:34.920 | So Part 2 Lesson 1 and Part 2 Lesson 8 are the same thing if I ever make that mistake.
00:00:39.920 | So we're going to be talking about object detection today, which refers to not just
00:00:45.240 | finding out what a picture is a picture of, but also whereabouts that thing is.
00:00:50.240 | But in general, the idea of each lesson in this part is not so much because I particularly
00:01:01.000 | want you to care about object detection, but rather because I'm trying to pick topics which
00:01:06.640 | allow me to teach you some foundational skills that you haven't got yet.
00:01:12.040 | So for example, object detection is going to be all about creating much richer convolutional
00:01:19.960 | network structures, which have a lot more interesting stuff going on and a lot more
00:01:24.560 | stuff going on in the fast.ai library that we have to customize to get there.
00:01:29.380 | So at the end of these 7 weeks, I can't possibly cover the hundreds of interesting things people
00:01:35.160 | are doing with deep learning right now, but the good news is that all of those hundreds
00:01:40.800 | of things, you'll see once you read the papers, are minor tweaks on a reasonably small number
00:01:47.360 | of concepts.
00:01:48.520 | So we covered a bunch of those concepts in Part 1, and we're going to go a lot deeper
00:01:52.760 | into those concepts and build on them to get some deeper concepts in Part 2.
00:02:00.160 | So in terms of what we covered in Part 1, there's a few key takeaways.
00:02:09.360 | We'll go through each of these takeaways in turn.
00:02:12.280 | One is the idea -- and you might have seen recently Yann LeCun has been promoting the
00:02:17.240 | idea that we don't call this deep learning, but differentiable programming.
00:02:23.120 | And the idea is that, you'll have noticed, all the stuff we did in Part 1 was really
00:02:29.320 | about setting up a differentiable function and a loss function that describes how good
00:02:37.280 | the parameters are, and then pressing Go and it kind of makes it work.
00:02:44.240 | And so I think it's quite a good way of thinking about it, differentiable programming, this
00:02:48.600 | idea that if you can configure a loss function that scores how good something
00:02:58.080 | is at doing your task, and you have a reasonably flexible neural network architecture, you're
00:03:04.280 | kind of done.
00:03:06.680 | So that's one key way of thinking about this.
00:03:09.920 | This example here comes from Playground.TensorFlow.org, which is a cool website where you can play
00:03:15.200 | interactively with creating your own little differentiable functions manually.
00:03:24.400 | The second thing we learned is about transfer learning.
00:03:28.960 | And it's basically that transfer learning is the most important single thing to be able
00:03:36.360 | to do, to use deep learning effectively.
00:03:40.440 | Nearly all courses, nearly all papers, nearly everything in deep learning education and research
00:03:47.520 | focuses on starting with random weights, which is ridiculous because you almost never would
00:03:56.560 | want to do that.
00:03:58.640 | You would only want to do that if nobody had ever trained a model on a vaguely similar
00:04:05.480 | set of data with an even remotely connected kind of problem to solve as what you're doing
00:04:13.000 | now, which almost never happens.
00:04:17.920 | So this is where the fast.ai library and the stuff we talk about in this class is vastly
00:04:24.240 | different to any other library or course, it's all focused on transfer learning and it turns
00:04:31.680 | out that you do a lot of things quite differently.
00:04:35.960 | So the basic idea of transfer learning is here's a network that does thing A, remove
00:04:41.680 | the last layer or so, replace it with a few random layers at the end, fine-tune those
00:04:49.680 | layers to do thing B, taking advantage of the features that the original network learned,
00:04:56.440 | and then optionally fine-tune the whole thing end-to-end.
00:04:59.800 | And you've now got something which probably uses orders of magnitude less data than if
00:05:05.160 | you started with random weights.
00:05:07.560 | It's probably a lot more accurate and probably trained a lot faster.
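
A rough sketch of that recipe in generic PyTorch (not fastai's actual API; the model choice and head size here are just assumptions for illustration):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet34(pretrained=True)          # a network that already does "thing A"
for p in model.parameters():
    p.requires_grad = False                       # freeze the pretrained body
model.fc = nn.Linear(model.fc.in_features, 10)    # replace the head with random layers for "thing B"
# ...train just the new head, then optionally unfreeze and fine-tune end-to-end
```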
00:05:19.380 | We didn't talk a hell of a lot about architecture design in Part 1, and that's because architecture
00:05:26.360 | design is getting less and less interesting.
00:05:29.840 | There's a pretty small range of architectures that generally work pretty well quite a lot
00:05:36.560 | of the time.
00:05:38.080 | We've been focusing on using CNNs for generally fixed size, somehow ordered data, RNNs for
00:05:47.840 | sequences that have some kind of state, fiddling around a tiny bit with activation functions
00:05:54.160 | like softmax if you've got a single categorical outcome or sigmoid if you've got multiple
00:05:58.880 | outcomes and so forth.
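
A tiny illustrative sketch of that last distinction in PyTorch (shapes chosen arbitrarily):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)           # a batch of 4 items, 10 classes
one_of_n = F.softmax(logits, dim=1)   # single categorical outcome: each row sums to 1
multi    = torch.sigmoid(logits)      # multiple independent labels: each score in 0..1
```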
00:06:03.800 | Some of the architecture design we'll be doing in this part gets more interesting, particularly
00:06:09.640 | this first session about object detection.
00:06:14.800 | But on the whole, I think we probably spend less time talking about architecture design
00:06:18.320 | than most courses or papers because it's generally not the hard bit in my opinion.
00:06:27.280 | The third thing we looked at was how to avoid overfitting.
00:06:31.640 | The general idea that I tried to explain is the way I like to build a model is to first
00:06:38.300 | of all create something that's definitely terribly over-parameterized, will massively
00:06:44.080 | overfit for sure, train it and make sure it does overfit.
00:06:47.480 | Because at that point you know, okay, I've got a model that is capable of reflecting
00:06:52.760 | the training set and then it's as simple as doing these things to then reduce that overfitting.
00:07:00.480 | If you don't start with something that's overfitting, then you're kind of lost.
00:07:05.800 | So you start with something that's overfitting and then to make it overfit less, you can
00:07:10.080 | add more data, you can add more data augmentation, you can do things like more batch norm layers
00:07:19.720 | or dense nets or various things that can handle basically less data, you can add regularization
00:07:27.980 | like weight decay and dropout.
00:07:31.200 | And then finally, this is often the thing people do first, this should be the thing
00:07:35.320 | you do last, is reduce the complexity of your architecture, have fewer layers or fewer activations.
00:07:47.560 | We talked quite a bit about embeddings, both for NLP and the general idea of any kind of
00:07:54.400 | categorical data as being something you can now model with neural nets.
00:07:58.320 | It's been interesting to see how since Part 1 came out, at which point there were almost
00:08:05.600 | no examples of papers or blogs or anything about using tabular data or categorical data
00:08:14.400 | in deep learning, suddenly it's kind of taken off and it's kind of everywhere.
00:08:21.080 | So this is becoming a more and more popular approach.
00:08:25.040 | It's still little enough known that when I say to people, we use neural nets for time
00:08:31.880 | series and tabular data analysis, it's often kind of like, wait, really?
00:08:36.400 | But it's definitely not such a far out idea, and there's more and more resources available,
00:08:43.760 | including recent Kaggle competition winning approaches using this technique.
00:08:53.160 | So Part 1, which particularly had those five messages, really was all about introducing
00:09:02.240 | you to best practices in deep learning.
00:09:07.040 | And so it's like trying to show you techniques which were mature enough that they definitely
00:09:17.280 | work reasonably reliably for practical real-world problems, and that I had researched and tuned
00:09:26.320 | enough over quite a long period of time that I could kind of say, OK, here's a sequence
00:09:30.720 | of steps and architectures and whatever that if you use this, you'll almost certainly get
00:09:36.720 | pretty good results, and then had kind of put that into the fast.ai library in a way
00:09:43.200 | that you could do that pretty quickly and easily.
00:09:45.840 | So that's kind of what practical deep learning for coders was designed to do.
00:09:52.720 | So this Part 2 is cutting edge deep learning for coders, and what that means is I often
00:10:00.520 | don't know the exact best parameters, architecture details and so forth to solve a particular
00:10:07.920 | problem.
00:10:08.920 | We don't necessarily know if it's going to solve a problem well enough to be practically
00:10:13.280 | useful, it almost certainly won't be integrated well enough into fast.ai or any library
00:10:19.520 | that you can just press a few buttons and it will start working.
00:10:24.040 | It's all about stuff which I'm not going to teach unless I'm very confident that it
00:10:31.760 | either is now or will be soon, a very practically useful technique.
00:10:38.900 | So I don't kind of take stuff which just appeared and I don't know enough about it to kind of
00:10:45.720 | know what's the trajectory going to be.
00:10:47.480 | So if I'm teaching it in this course, I'm saying either works well in the research literature
00:10:55.520 | now and it's going to be well worth learning about or we're pretty close to being there,
00:11:01.100 | but it's going to take a lot of tweaking and experimenting to get it to work on your
00:11:06.480 | particular problem because we don't know the details well enough to know how to make it
00:11:12.480 | work for every data set or every example.
00:11:16.420 | So it's kind of exciting to be working at this point.
00:11:25.120 | It means that rather than fast.ai and PyTorch being obscure black boxes which you just know
00:11:34.160 | these recipes for, you're going to learn the details of them well enough that you can customize
00:11:40.900 | them exactly the way you want, that you can debug them, that you can read the source code
00:11:46.240 | of them to see what's happening and so forth.
00:11:50.040 | And so if you're not pretty confident of object-oriented Python and stuff like that, then that's something
00:11:59.440 | you're going to want to focus on studying during this course because we assume that.
00:12:11.880 | I will be trying to introduce you to some tools that I think are particularly helpful
00:12:18.200 | like the Python Debugger, like how to use your editor to jump through the code, stuff
00:12:23.600 | like that.
00:12:24.600 | And in general there will be a lot more detailed specific code walkthroughs, coding technique
00:12:32.840 | discussions and stuff like that, as well as more detailed walkthroughs of papers and stuff.
00:12:40.880 | And so anytime we cover one of these things, if you notice something where you're like,
00:12:47.160 | this is assuming some knowledge that I don't have, that's fine.
00:12:52.120 | It just means that's something you could ask on the forum and say hey, Jeremy was talking
00:12:58.240 | about static methods in Python, I don't really know what a static method is, or why he was
00:13:05.320 | using it here, could someone give me some resources.
00:13:08.280 | These are things that are not rocket science, just because you don't happen to have come
00:13:12.560 | across it yet doesn't mean it's hard, it's just something you need to learn.
00:13:20.920 | I will mention that as I cover these research-level topics and develop these courses, I often
00:13:29.480 | refer to code that academics have put up to go along with their papers, or kind of example
00:13:35.760 | code that somebody else has written on GitHub.
00:13:38.480 | I nearly always find that there's some massive critical flaw in it.
00:13:45.400 | So be careful of taking code from online resources and assuming that if it doesn't work for you
00:13:54.680 | that you've made a mistake or something, this kind of research-level code, it's just good
00:14:00.760 | enough that they were able to run their particular experiments every second Tuesday.
00:14:09.320 | So you should be ready to do some debugging and so forth.
00:14:18.560 | So on that sense, I just wanted to remind you about something from our old course wiki
00:14:25.080 | that we sometimes talk about, which is like people often ask what should I do after the
00:14:30.640 | lesson, like how do I know if I've got it, and we basically have this thing called how
00:14:38.640 | to use the provided notebooks, and the idea is this.
00:14:42.980 | Don't open up the notebook, I know I said this in part 1 as well, but I'll say it again,
00:14:47.120 | then go shift, enter, shift, enter, shift, enter until a bug appears and then go to the
00:14:51.160 | forums and say the notebook's broken.
00:14:55.080 | The idea of the notebook is to kind of be like a little crutch to help you get through
00:15:00.880 | each step.
00:15:01.880 | The idea is that you start with an empty notebook and think I now want to complete this process.
00:15:08.520 | And that might initially require you alt-tabbing to the notebook and reading it, figuring out
00:15:18.460 | what it says, but whatever you do, don't copy and paste it to your notebook.
00:15:23.000 | Type it out yourself.
00:15:26.400 | So try to make sure you can repeat the process, and as you're typing it out, you need to be
00:15:32.920 | thinking, what am I typing, why am I typing it?
00:15:36.100 | So if you can get to the point where you can solve an object detection problem yourself
00:15:44.560 | in a new empty notebook, even if it's using the exact same data set we used in the course,
00:15:50.800 | that's a great sign that you're getting it.
00:15:53.960 | That will take a while, but the idea is that by practicing the second time you try to do
00:15:58.840 | it, the third time you try to do it, you'll check the notebook less and less.
00:16:04.360 | And if there's anything in the notebook where you think, I don't know what
00:16:08.800 | it's doing, I hope to teach you enough techniques in this course, in this class, that you'll
00:16:14.100 | know how to experiment to find out what it's doing, so you shouldn't have to ask that.
00:16:19.600 | But you may well want to ask, why is it doing that?
00:16:22.600 | That's the conceptual bit, and that's something which you may need to go to the forums and
00:16:27.000 | say, before this step, Jeremy had done this, after this step, Jeremy had done that, there's
00:16:34.440 | this bit in the middle where he does this other thing, I don't quite know why.
00:16:38.720 | So you can say here are my hypotheses as to why, try and work through it as much as possible,
00:16:45.280 | that way you'll both be helping yourself and other people will help you fill in the gaps.
00:16:53.240 | If you wish, and you have the financial resources, now is a good time to build a deep learning
00:16:59.560 | box for yourself.
00:17:02.000 | When I say a good time, I don't mean a good time in the history of the pricing of GPUs.
00:17:06.520 | GPUs are currently, as I say this, by far the most expensive they've ever been, because of
00:17:11.760 | the cryptocurrency mining boom.
00:17:17.960 | I mean it's a good time in your study cycle.
00:17:22.120 | The fact is if you're paying somewhere between $0.60 and $0.90 an hour for doing your deep
00:17:31.680 | learning on a cloud provider, particularly if you're still on a K80 like an Amazon P2
00:17:39.160 | or Google Colab actually, if you haven't come across it, now lets you train on a K80 for
00:17:46.040 | free.
00:17:47.040 | But those are very slow GPUs.
00:17:50.440 | You can buy one that's going to be like three times faster for maybe $600 or $700.
00:18:00.920 | You need a box to put it in, of course, but the example in the bottom right here from
00:18:07.680 | the forum was something that somebody put together in last year's course, so like a year
00:18:12.280 | ago they were able to put together a pretty decent box for a bit over $500.
00:18:18.360 | Realistically speaking, you're probably looking at more like $1,000 or $1,500.
00:18:21.560 | I created a new forum thread where you can talk about options and parts and ask questions
00:18:29.280 | and so forth.
00:18:33.440 | If you can afford it right now, the GTX 1080 Ti is almost certainly what you want
00:18:39.760 | in terms of the best price performance mix.
00:18:43.140 | If you can't afford it, a 1070 is fine.
00:18:46.680 | If you can't afford that, you should probably be looking for a second-hand 980 or a second-hand
00:18:52.160 | 970, something like that.
00:18:55.500 | If you can afford to spend more money, it's worth getting a second GPU so you can do what
00:19:00.800 | I do, which is to have one GPU training and another GPU which I'm running an interactive
00:19:07.920 | Jupyter notebook session in.
00:19:14.080 | RAM is very useful, try and get 32GB if you can, RAM is not terribly expensive.
00:19:22.560 | A lot of people find that their vendor tries to persuade them to buy one of these business-class
00:19:27.000 | CPUs; that's a total waste of money.
00:19:30.400 | You can get one of the Intel i5 or i7 consumer CPUs, far, far cheaper, but actually a lot
00:19:36.240 | of them are faster.
00:19:39.480 | Often you'll hear CPU speed doesn't matter.
00:19:42.560 | If you're doing computer vision, that's definitely not true.
00:19:45.600 | It's very common now with these 1080 Tis and so forth to find that the speed of the data
00:19:50.600 | augmentation is actually the slow bit that's happening on the CPU, so it's worth getting
00:19:55.120 | a decent CPU.
00:20:01.320 | Your GPU, if it's running quickly but the hard drive's not fast enough to give it data,
00:20:06.260 | then that's a waste as well.
00:20:08.000 | So if you can afford an NVMe drive that's super, super fast, you don't have to get a
00:20:12.880 | big one.
00:20:13.880 | You can just get a little one that you just copy your current set of data onto and have
00:20:17.320 | some big RAID array that sits there for the rest of your data when you're not using it.
00:20:24.800 | There's a slightly arcane thing about PCI lanes which is basically like the size of
00:20:31.720 | the highway that connects your GPU to your computer, and a lot of people claim that you
00:20:40.200 | need to have 16 lanes to feed your GPU.
00:20:45.080 | It actually turns out, based on some analysis that I've seen recently that that's not true,
00:20:53.520 | you only need 8 lanes per GPU.
00:20:56.640 | So again, hopefully it'll help you save some money on your motherboard.
00:21:01.600 | If you've never heard of PCI lanes before, trust me, by the end of putting together this
00:21:05.360 | box you'll be sick of hearing about them.
00:21:11.720 | You can buy all the parts and put it together yourself.
00:21:13.720 | It's not that hard, it can be a useful learning experience, it can also be kind of frustrating
00:21:18.840 | and annoying, so you can always go to Central Computers and they'll put it together for
00:21:23.960 | you, there's lots of online vendors that will do the same thing, and they'll generally make
00:21:28.160 | sure it turns on and runs properly and generally not much of a mark-up, so it's not a bad idea.
00:21:37.640 | We're going to be doing a lot of reading papers.
00:21:40.560 | Basically each week we'll be implementing a paper, or a few papers, and if you haven't
00:21:45.320 | looked at papers before, they look something like on the left.
00:21:49.720 | The thing on the left is an extract from the paper that implements Adam.
00:21:55.320 | You may also have seen Adam as a single Excel formula on the spreadsheet that I've written.
00:22:00.680 | They're the same thing.
00:22:03.160 | The difference is in academic papers, people love to use Greek letters, they also hate
00:22:10.160 | to refactor.
00:22:11.560 | So you'll often see like a page-long formula where when you actually look at it carefully
00:22:17.000 | you'll realize the same kind of sub-equation appears 8 times.
00:22:22.280 | They didn't think to say above it, let t equal this sub-equation, and now it's one line.
00:22:28.200 | I don't know why this is a thing, but I guess all this is to say once you've read and understood
00:22:38.560 | a paper, you then go back to it and you look at it and you're just like wow, how did they
00:22:43.000 | make such a simple thing so complicated?
00:22:46.400 | Like Adam is like momentum on the gradient and momentum on the square of the gradient.
00:22:58.160 | That's it.
00:22:59.160 | And it's a big long thing.
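
A rough sketch of that in plain Python (an illustrative single Adam step, not the paper's pseudocode; hyperparameter defaults follow common convention):

```python
def adam_step(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g            # momentum on the gradient
    v = b2 * v + (1 - b2) * g ** 2       # momentum on the square of the gradient
    m_hat = m / (1 - b1 ** t)            # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (v_hat ** 0.5 + eps)
    return p, m, v
```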
00:23:00.160 | And the other reason it's a big long thing is because they have things like this where
00:23:03.080 | they have theorems and corollaries and stuff where they're kind of saying here's all our
00:23:09.080 | theoretical reasoning behind why this ought to work or whatever.
00:23:14.160 | And for whatever reason, a lot of conferences and journals don't like to accept papers that
00:23:20.240 | don't have a lot of this theoretical justification.
00:23:23.280 | Geoffrey Hinton has talked about this a bit, particularly a decade or two ago when no conferences
00:23:30.080 | would really accept any neural network papers.
00:23:33.920 | Then there was this one abstract theoretical result that came out where suddenly they could
00:23:40.040 | show this practically unimportant but theoretically interesting thing, and then suddenly they
00:23:46.920 | could then start submitting things to journals because they had this theoretical justification.
00:23:52.360 | So academic papers are a bit weird, but in the end it's the way that the research community
00:24:00.520 | communicates their findings and so we need to learn to read them.
00:24:05.680 | But something that can be a great thing to do is to take a paper, put in the effort to
00:24:11.880 | understand it, and then write a blog where you explain it in code and normal English.
00:24:21.280 | And lots of people who do that end up getting quite a following, end up getting some pretty
00:24:27.640 | great job offers and so forth because it's such a useful skill to be able to show I can
00:24:33.760 | understand these papers, I can implement them in code, I can explain them in English.
00:24:41.720 | One thing I will mention is it's very hard to read or understand something which you
00:24:48.640 | can't vocalize, which means if you don't know the names of the Greek letters, it sounds
00:24:54.840 | weird, but it's actually very difficult to understand, remember, take in a formula that
00:25:01.240 | appears again and again that's got squiggle.
00:25:05.680 | You need to know that that squiggle is called delta, or that squiggle is called sigma, whatever.
00:25:10.120 | Just spending some time learning the names of the Greek letters sounds like a strange
00:25:14.600 | thing to do, but suddenly you don't look at these things anymore and go squiggle a over
00:25:18.560 | squiggle b plus other weird squiggles, it looks like a y thing.
00:25:23.360 | They've all got names.
00:25:29.400 | So now that we're kind of at the cutting edge stage, a lot of the stuff we'll be learning
00:25:36.600 | in this class is stuff that almost nobody else knows about.
00:25:43.840 | So that's a great opportunity for you to be the first person to create an understandable
00:25:51.320 | and generalizable code library that implements it, or the first person to write a blog post
00:25:55.840 | that explains it in clear English, or the first person to try applying it to this slightly
00:26:00.960 | different area which is obviously going to work just as well, or whatever.
00:26:07.040 | So when we say cutting edge research, that doesn't mean you have to come up with the
00:26:12.920 | next batch norm, or the next Adam, or the next dilated convolution.
00:26:18.560 | It can mean take this thing that was used for translation and apply it to this very
00:26:27.400 | similar other parallel NLP task, or take this thing that was tested on skin lesions and
00:26:34.400 | tested on this data set of this other kind of lesions.
00:26:39.080 | That kind of stuff is super great learning experience and incredibly useful because the
00:26:45.680 | vast majority of the world that knows nothing about this whole field, it just looks like
00:26:50.840 | magic.
00:26:51.840 | You'll be like, hey, I've for the first time shown greater than 90% accuracy at finding
00:26:58.360 | this kind of lesion in this kind of data.
00:27:07.600 | So when I say here experiment in your area of expertise, one of the things we particularly
00:27:13.040 | look for in this class is to bring in people who are pretty good at something else, pretty
00:27:21.240 | good at meteorology, or pretty good at de novo drug design, or pretty good at goat dairy
00:27:31.320 | farming, or whatever, these are all examples of people we've had in the class.
00:27:39.880 | So probably the thing you can do the best would be to take that thing you're already
00:27:45.980 | pretty good at and add on these new skills, because otherwise if you're trying to go into
00:27:52.200 | some different domain, you're going to have to figure out how do I get data for that domain,
00:27:55.360 | how do I know what are the problems to solve in that domain, and so forth.
00:28:00.640 | Whereas often it will seem pretty trivial to you to take this technique applied to this
00:28:05.640 | data set that you've already got sitting on your hard drive, but that's often going to
00:28:09.120 | be a super interesting thing for the rest of the world to see like oh, that's interesting
00:28:14.840 | when you apply it to meteorology data and use this RNN or whatever, suddenly it allows
00:28:21.640 | you to forecast over larger areas or longer time periods.
00:28:30.600 | So communicating what you're doing is super helpful, we've talked about that before, but
00:28:37.360 | I know something that a lot of people in the forums ask people who have already written
00:28:42.240 | a blog - they'll often be like, how did you
00:28:47.960 | get up the guts to do that, or what point in the process did you get to before you decided to start
00:28:52.240 | publishing something, or whatever, and the answer is always the same.
00:28:56.600 | It's always just, I was sure I wasn't good enough to do it, I felt terrified and intimidated
00:29:04.680 | with doing it, but I wrote it and posted it anyway.
00:29:09.280 | There's never a time I think any of us actually feel like we're not total frauds and imposters,
00:29:16.640 | but we know more about what we're doing than we did six months ago.
00:29:21.000 | And there's somebody else in the world who knows as much as you did six months ago, so
00:29:25.320 | if you write something now that would have helped you six months ago, you're helping
00:29:29.420 | some people.
00:29:30.420 | Honestly if you wait another six months, then the you of 12 months ago probably won't even
00:29:35.640 | understand that anymore because it's too advanced now.
00:29:40.040 | It's great to communicate wherever you're up to in a way that you think would be helpful
00:29:46.440 | to the person you were before you knew that thing.
00:29:51.560 | And of course something that the forums have been useful for is getting feedback about
00:29:56.600 | drafts and if you post a draft of something that you're thinking of releasing, then other
00:30:06.400 | folks here can point out things that they find unclear or they think need some corrections.
00:30:13.160 | So the kind of overarching theme of Part 2 I've described as generative models, but unfortunately
00:30:23.440 | then Rachel asked me this afternoon exactly what I meant by generative models, and I realized
00:30:27.440 | I don't really know.
00:30:30.000 | So what I really mean is in Part 1, the output of our neural networks was generally like
00:30:37.920 | a number or a category, whereas the outputs of a lot of the stuff in Part 2 are going
00:30:49.440 | to be like a whole lot of things, like the top left and bottom right location of every
00:30:57.800 | object in an image along with what the object is, or a complete picture with a class of
00:31:03.880 | every single pixel in that picture, or an enhanced super-resolution version of the input
00:31:11.920 | image, or the entire original input paragraph translated into French.
00:31:23.280 | It's kind of like, often it just requires some different ways of thinking about things
00:31:30.980 | and some kind of different architectures and so forth, and so that's kind of like I guess
00:31:36.280 | the main theme of the kind of techniques we'll be looking at.
00:31:41.360 | The vast majority, possibly all, of the data we'll be looking at will be either text or
00:31:47.720 | image data.
00:31:53.440 | It would be fairly trivial to do most of these things with audio as well, it's just not something
00:31:59.560 | I've spent much time on myself yet.
00:32:03.360 | Somebody asked on the forum about whether we can do more stuff with time series and tabular
00:32:07.700 | data, and my answer was, I've already taught you everything I know about that and I'm not
00:32:13.920 | sure there's much else to say, particularly if you check out the machine learning course,
00:32:21.040 | which goes into a lot of that in a lot more detail.
00:32:24.160 | I don't feel like there's more stuff to tell you, I think that's a super-important area,
00:32:30.400 | but I think we're done with that.
00:32:38.200 | We'll be looking at some larger data sets, both in terms of the number of objects in
00:32:43.960 | the data set and the size of each of those objects.
00:32:47.600 | For those of you that are working with limited computational resources, please don't let
00:32:51.680 | that put you off, feel free to replace it with something smaller and simpler.
00:32:56.480 | In fact, when I was designing this course, I did quite a lot of it in Australia when
00:33:02.240 | I went to visit my mum, and my mum decided to book a nice holiday house for us with fast
00:33:09.720 | WiFi.
00:33:11.240 | We turned up the holiday house with fast WiFi, and indeed it did have WiFi, it was fast,
00:33:17.000 | but the WiFi was not connected to the internet.
00:33:22.360 | So I called up the agent and I said, "I found the ADSL router and it's got an ADSL thing
00:33:30.760 | plugged in, and I followed the cable down, and the other end of the cable has nothing
00:33:33.660 | to plug into."
00:33:35.940 | So she called the people renting the house and the owner and called me back the next
00:33:45.800 | day, and she said, "Actually, Point Leo has no internet."
00:33:57.480 | So the good old Australian government had decided to replace ADSL in Point Leo with
00:34:02.400 | a new national broadband network, and therefore they had disconnected ADSL that had not yet
00:34:07.800 | connected the national broadband network.
00:34:10.280 | So we had fast WiFi, which we could use to Skype chat from one side of the house to the
00:34:16.120 | other, but I had no internet.
00:34:18.360 | Luckily, I did have a new Surface Book 15-inch, which has a GTX 1070 in it, and so I wrote
00:34:28.760 | a large amount of this course entirely on my laptop, which means I had to practice with
00:34:34.720 | relatively small resources, I mean not tiny, but 16GB RAM and 6GB GPU, and it was all in
00:34:48.800 | Windows by the way.
00:34:50.000 | So I can tell you that pretty much all of this course works well on Windows, on a laptop.
00:34:57.040 | So you can always use smaller batch sizes, you could use a cut-down version of the dataset,
00:35:01.200 | whatever.
00:35:02.200 | So if you have the resources, you'll get better results if you can use the bigger datasets
00:35:06.240 | when they're available.
00:35:07.240 | Now's a good time, I think, to take a somewhat early break so we can fix the forums.
00:35:14.320 | So the forums are still down.
00:35:18.080 | Okay, let's come back at 7.25.
00:35:33.560 | So let's start talking about object detection, and so here is an example of object detection.
00:35:41.320 | So hopefully you'll see two main differences from what we're used to when it comes to classification.
00:35:49.440 | The first is that we have multiple things that we're classifying, which is not unheard of.
00:35:57.840 | We did that in the Planet satellite data, for example, but what is kind of unheard of
00:36:03.120 | is that as well as saying what we see, we've also got what's called bounding boxes around
00:36:09.120 | what we see.
00:36:10.120 | A bounding box has a very specific definition, which is it's a box, it's a rectangle, and
00:36:19.640 | the rectangle has the object entirely fitting within it, but it's no bigger than it has
00:36:28.080 | to be.
00:36:29.800 | You'll see this bounding box is perhaps, for the horse at least, slightly imperfect in
00:36:36.280 | that it looks like there's a bit of tail here.
00:36:39.240 | So it probably should be a bit wider, and maybe there's even a little bit of hoof here,
00:36:42.760 | maybe it should be a bit longer.
00:36:44.360 | So the bounding boxes won't be perfect, but they're generally pretty good in most data
00:36:49.900 | sets that you can find.
00:36:53.320 | So our job will be to take data that has been labeled in this way and, on data that is unlabeled,
00:37:01.720 | to generate the classes of the objects and a bounding box for each one of them.
00:37:11.080 | One thing I'll note to start with is that labeling this kind of data is generally more
00:37:16.840 | expensive.
00:37:18.440 | It's generally quicker to say horse, person, person, horse, car, dog, jumbo jet, than it
00:37:24.480 | is to say if there's a whole horse race going on to label the exact location of every rider
00:37:31.680 | and every horse.
00:37:32.680 | And then of course it also depends on what classes do you want to label, do you want
00:37:37.960 | to label every fence post or whatever.
00:37:40.720 | So generally always, just like in ImageNet, it's not like tell me any object you see in
00:37:49.080 | this picture.
00:37:50.080 | In ImageNet it's like here are the 1000 classes that we ask you to look for, tell us which
00:37:56.920 | one of those 1000 classes you find, just tell me one thing.
00:38:02.000 | For these object detection data sets, it is a list of object classes that we want you
00:38:08.280 | to tell us about and find every single one of them of any type in the picture along with
00:38:13.520 | where they are.
00:38:14.520 | So in this case, why isn't there a tree or jump labeled?
00:38:19.280 | That's because for this particular data set they weren't one of the classes that the annotators
00:38:23.520 | were asked to find and therefore were not part of this particular problem.
00:38:27.800 | So that's kind of the specification of the object detection problem.
00:38:34.920 | So let me describe stage 1.
00:38:40.840 | And stage 1 is actually going to be surprisingly straightforward.
00:38:45.800 | And we're going to start at the top and work down.
00:38:48.560 | We're going to start out by classifying the largest object in each image.
00:38:55.400 | So we're going to try and say person, actually this one is wrong, dog is not the largest
00:39:00.560 | object, sofa is the largest object.
00:39:02.440 | So here's an example of a misclassified one, bird, correct, person, correct.
00:39:09.440 | That will be the first thing we try to do, that's not going to require anything new,
00:39:12.880 | so it'll just be a bit of a warm-up for us.
00:39:15.640 | The second thing will be to tell us the location of the largest object in each image.
00:39:23.440 | Again here, this is actually incorrect, it should have labeled the sofa, but you can
00:39:27.920 | see where it's coming from.
00:39:29.680 | And then finally we will try and do both at the same time, which is to label what it is
00:39:34.520 | and where it is for the largest thing in the picture.
00:39:38.400 | And this is going to be relatively straightforward, actually, so it will be a good warm-up to get
00:39:44.320 | us going again.
00:39:45.320 | But what I'm going to do is I'm going to use it as an opportunity to show you some useful
00:39:51.680 | coding techniques, really, and a couple of little fast.ai handy details before we then
00:40:00.280 | get on to multi-label classification and then multiple object classification.
00:40:07.080 | So let's start here.
00:40:08.600 | The notebook that we're using is the Pascal notebook, and all of the notebooks are in the DL2 folder.
00:40:22.240 | One thing you'll see in some of my notebooks is torch.cuda.set_device, you may have even
00:40:27.640 | seen it in the last part, just in case you're wondering why that's there.
00:40:30.840 | I have four GPUs on the university server that I use, and so I can put a number from
00:40:36.720 | 0 to 3 in here to pick one.
00:40:39.880 | This is how I prefer to use multiple GPUs rather than run a model on multiple GPUs, which
00:40:45.160 | doesn't always speed it up that much, and it's kind of awkward.
00:40:48.160 | I generally like to have different GPUs running different things, so in this case I was running
00:40:56.040 | something in this notebook on device 1 and doing something else in another notebook on device 2.
00:41:01.160 | Now obviously if you see this in a notebook left behind, that was a mistake.
00:41:04.960 | If you don't have more than one GPU, you're going to get an error, so you can just change
00:41:09.200 | it to 0 or delete that line entirely.
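
A minimal sketch of what that line does (assuming a multi-GPU machine; on a single-GPU box use 0, or omit the call entirely):

```python
import torch

torch.cuda.set_device(1)   # run everything in this notebook on GPU 1 (of 0..3)
```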
00:41:13.680 | So there's a number of standard object detection datasets, just like ImageNet is a standard
00:41:21.160 | object classification dataset, and kind of the old classic ImageNet equivalent, if you
00:41:27.600 | like, is Pascal VOC, Visual Object Classes.
00:41:40.240 | The actual main website for it is like, I don't know, it's running on somebody's coffee warmer
00:41:47.760 | or something, it goes down all the time every time he makes coffee.
00:41:52.320 | So some folks have mirrored it, which is very kind of them, so you might find it easier
00:41:55.360 | to grab from the mirror.
00:41:58.480 | You'll see when you download it that there's a 2007 dataset, the 2012 dataset, there basically
00:42:05.920 | were academic competitions in those different years, just like the ImageNet dataset we tend
00:42:10.120 | to use is actually the ImageNet 2012 competition dataset.
00:42:17.160 | We'll be using the 2007 version in this particular notebook.
00:42:21.800 | Feel free to use the 2012 instead, it's a bit bigger, you might get better results.
00:42:26.320 | A lot of people, in fact most people now in research papers actually combine the two.
00:42:32.160 | You do have to be careful because there's some leakage between the validation sets between
00:42:36.360 | the two, so if you do decide to do that, make sure you do some reading about the dataset
00:42:40.800 | to make sure you know how to combine them correctly.
00:42:44.400 | The first thing you'll notice in terms of coding here is this, we haven't used this
00:42:52.280 | before.
00:42:53.280 | I'm going to be using this all the time now.
00:42:55.160 | This is part of the Python 3 standard library called pathlib, and it's super handy.
00:43:02.600 | It basically gives you an object-oriented access to a directory or a file.
00:43:09.680 | So you can see, if I go path.something, there's lots of things I can do.
00:43:23.680 | One of them is iterative directory, however, path.iterate directory returns that.
00:43:35.680 | Basically you've come across generators by now because we did quite a lot of stuff that
00:43:39.840 | used them behind the scenes without talking about them too much, but basically a generator
00:43:44.400 | is something in Python 3 which you can iterate over.
00:43:51.280 | So basically you could go for o in that: print(o), for instance, or of course you could do the
00:44:07.800 | same thing as a list comprehension, or you can just stick the word "list" around it to
00:44:20.400 | turn the generator into a list.
00:44:23.440 | Any time you see me put list around something, that's normally because it returned a generator.
00:44:28.640 | It's not particularly interesting.
00:44:30.880 | The reason that things generally return generators is that what if the directory had 10 million
00:44:36.640 | items in?
00:44:37.940 | You don't necessarily want a 10 million long list, so with a for loop, you'll just grab
00:44:42.960 | one, do the thing, throw it away, grab a second, throw it away.
00:44:50.480 | You'll see that the things it's returning aren't actually strings, but they're some
00:44:55.400 | kind of object.
00:44:56.400 | If you're using Windows, it'll be a Windows path, on Linux it'll be a POSIX path.
00:45:02.520 | Most of the time you can use them as if they were strings, like if you pass it to any of
00:45:07.840 | the os.path.whatever functions in Python, it'll just work.
00:45:13.400 | But some external libraries, it won't work, so that's fine.
00:45:20.080 | If you grab one of these, let's just grab one of these.
00:45:29.360 | So in general, you can change data types in Python just by naming the data type that you
00:45:37.240 | want and treating it like a function, and that will cast it.
00:45:41.960 | So anytime you try to use one of these pathlib objects and you pass it to something which
00:45:48.040 | says like I was expecting a string, this is not a string, that's how you do it.
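
A small sketch putting those pieces together (the directory name here is just an assumption):

```python
from pathlib import Path

PATH = Path('data/pascal')        # assumed location of the dataset
for o in PATH.iterdir():          # iterdir() returns a generator, not a list
    print(o)                      # each o is a PosixPath (or WindowsPath), not a str
files = list(PATH.iterdir())      # wrap in list() to materialize the generator
s = str(files[0])                 # cast to str for libraries that insist on strings
```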
00:45:53.580 | So you'll see there's quite a lot of convenient things you can do.
00:45:55.800 | One kind of fun thing is the slash operator is not divided by, but it's path/.
00:46:08.000 | So they've overridden the slash operator in Python so that it works, so you can say
00:46:13.240 | path/whatever, and you'll see how that's not inside a string.
00:46:18.880 | So this is actually applying not the division operator, but the overridden slash operator,
00:46:24.880 | which means get a child thing in that path.
00:46:28.640 | And you'll see if you run that, it doesn't return a string, it returns a pathlib object.
00:46:38.440 | And so one of the things a pathlib object can do is it has an open method, so it's actually
00:46:45.360 | pretty cool once you start getting the hang of it.
00:46:48.400 | And you'll also find that the open method takes all the kind of arguments you're familiar
00:46:53.760 | with, you can say write, or binary, or encoding, or whatever.
00:46:57.740 | So in this case, I want to load up these JSON files which contain not the images but the
00:47:09.320 | bounding boxes and the classes of the objects.
00:47:14.160 | And so in Python, the easiest way to do that is with the JSON library, or there's some
00:47:20.160 | faster API equivalent versions, but this is pretty small so you won't need them.
00:47:24.440 | And you go to JSON.load, and you pass it an open file object, and so the easy way to do
00:47:31.120 | that since we're using pathlib is just go path.open.
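
Putting the slash operator and the open method together, loading one of those annotation files looks roughly like this (the filename is an assumption):

```python
import json

trn_j = json.load((PATH / 'pascal_train2007.json').open())  # assumed filename
```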
00:47:36.120 | So these JSON files that we're going to look inside in a moment, if you haven't used them
00:47:39.560 | before JSON is JavaScript object notation, it's kind of the most standard way to pass
00:47:46.160 | around hierarchical structured data now, obviously not just with JavaScript.
00:47:54.960 | You'll see I've got some JSON files in here, they actually did not come from the mirror
00:47:59.040 | I mentioned.
00:48:00.200 | The original Pascal annotations were in XML format, but cool kids can't use XML anymore,
00:48:07.680 | we have to use JSON, so somebody's converted them all to JSON, and so you'll find the second
00:48:12.400 | link here has all the JSON files.
00:48:15.480 | So if you just pop them in the same location that I've put them here, everything will work
00:48:20.360 | for you.
00:48:22.400 | So these annotation files, JSONs, basically contain a dictionary.
00:48:29.440 | Once you open up the JSON, it becomes a Python dictionary, and they've got a few different
00:48:34.680 | things in.
00:48:35.680 | The first is we can look at images, it's got a list of all of the images, how big they
00:48:42.720 | are, and a unique ID for each one.
00:48:45.960 | One thing you'll notice here is I've taken the word images and put it inside a constant
00:48:54.320 | called images.
00:48:55.320 | They seem kind of weird, but if you're using a notebook or any kind of IDE, this now means
00:49:01.960 | I can tap Complete all of my strings and I won't accidentally type it slightly wrong,
00:49:08.160 | so that's just a handy trick.
00:49:12.240 | So here's the contents of the first few things in the images.
00:49:16.680 | More interestingly, here are some of the annotations.
00:49:20.480 | So you'll see basically an annotation contains a bounding box, and the bounding box tells
00:49:27.160 | you the column and row of the top left, and its height and width.
00:49:37.160 | And then it tells you that that particular bounding box is for this particular image,
00:49:43.160 | so you'd have to join that up over here to find it's actually O12.jpg.
00:49:49.680 | And it's of category ID 7.
00:49:54.300 | Also some of them at least have a polygon segmentation, not just a bounding box.
00:49:59.640 | We're not going to be using that.
00:50:02.600 | Some of them have an ignore flag, so we'll ignore the ignore flags.
00:50:06.280 | Some of them have something telling you it's a crowd of that object, not just one of them.
00:50:12.160 | So that's what these annotations look like.
00:50:16.080 | So then you saw here there's a category ID, so then we can look at the categories, and
00:50:20.680 | here's a few examples, basically each ID has a name, there we go.
00:50:29.240 | So what I did then was turn this category list into a dictionary from ID to name, I created
00:50:37.320 | a dictionary from ID to name of the image file names, and I created a list of all of
00:50:44.740 | the image IDs just to make life easier.
00:50:47.480 | So generally when you're working with a new dataset, I try to make it look the way I would
00:50:55.960 | want it to if I kind of designed that dataset, so I just kind of do a quick bit of manipulation.
00:51:02.040 | And so the steps you see here, and you'll see in each class, are basically the sequence
00:51:08.520 | of steps I took as I started working with this new dataset, except without the thousands
00:51:16.380 | of screw-ups that I did.
00:51:21.560 | I find the one thing people most comment on when they see me working in real time, having
00:51:29.880 | seen my classes, is like "wow, you actually don't know what you're doing, do you?"
00:51:36.320 | It's like 99% of the things I do don't work, and then the small percentage of the things
00:51:40.640 | that do work end up here.
00:51:44.320 | So I mentioned that because machine learning and particularly deep learning is kind of
00:51:50.640 | incredibly frustrating because in theory, you're just to find the correct loss function and
00:51:57.520 | a flexible enough architecture, and you press train and you're done.
00:52:03.160 | But if that was actually all at talk, then nothing would take any time, and the problem
00:52:10.400 | is that all the steps along the way until it works, it doesn't work.
00:52:16.960 | Like it goes straight to infinity, or it crashes with an incorrect tensor size, or whatever.
00:52:24.200 | And I will endeavor to show you some debugging techniques as we go, but it's one of the hardest
00:52:32.160 | things to teach because I don't know, maybe I just haven't quite figured it out yet.
00:52:42.680 | The main thing it requires is tenacity. I find the biggest difference between the people
00:52:48.440 | I've worked with who are super effective and the ones who don't seem to go very far has
00:52:54.000 | never been about intellect, it's always been about sticking with it, basically never giving up.
00:53:03.920 | It's particularly important with this deep learning stuff because you don't get that
00:53:09.040 | continuous reward cycle. With normal programming, you've got like 12 things to do until you've
00:53:15.040 | got your Flask endpoint stood up. You know at each stage, it's like okay, we've successfully
00:53:20.720 | parsed the JSON, and now we've successfully got the callback from that promise, and now
00:53:26.000 | we've successfully created the authentication system.
00:53:30.160 | It's this constant sequence of stuff that works, whereas generally with training a model,
00:53:36.620 | it's a constant stream of like "it doesn't work, it doesn't work, it doesn't work" until
00:53:40.800 | eventually it does. So it's kind of annoying.
00:53:48.000 | So let's now look at the images. You'll find inside the VOC devkit, there's 2007 and 2012
00:53:57.720 | directories, and in there there's a whole bunch of stuff that's mainly these XML files,
00:54:03.120 | the one we care about with JPEG images, and so again here you've got the pathlibs/operator,
00:54:12.660 | and inside there's a few examples of images.
00:54:17.080 | So what I wanted to do was to create a dictionary where the key was the image ID, and the value
00:54:30.020 | was a list of all of its annotations. So basically what I wanted to do was go through each of
00:54:38.340 | the annotations that doesn't say to ignore it, and append it, the bounding box and the
00:54:48.040 | class, to the appropriate dictionary item where that dictionary item is a list. But
00:54:57.380 | the annoying thing is if that dictionary item doesn't exist yet, then there's no list to
00:55:04.480 | append to.
00:55:05.760 | So one super handy trick in Python is that there's a class called collections.defaultdict,
00:55:13.360 | which is just like a dictionary, but if you try and access a key that doesn't exist, it
00:55:20.040 | magically makes itself exist and sets itself equal to the return value of this function.
00:55:27.660 | Now this could be the name of some function that you've defined, or it can be a lambda
00:55:33.600 | function. A lambda function simply means it's a function that you define in place. We'll
00:55:39.700 | be seeing lots of them. So here's an example of a function. All the arguments to the function
00:55:46.420 | are listed on the left, so there's no arguments to the function. And lambda functions are special,
00:55:51.380 | you don't have to write return as a return is assumed. So in this case, this is a lambda
00:55:56.500 | function that takes no arguments and returns an empty list. So in other words, every time
00:56:01.820 | I try and access something in train annotations that doesn't exist, it now does exist and
00:56:10.980 | it's an empty list, which means I can append to it.
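
A minimal sketch of that (key names assumed to match the annotations described earlier; the coordinate conversion discussed below is left out here):

```python
import collections

trn_anno = collections.defaultdict(lambda: [])     # missing keys spring into existence as []
for o in trn_j[ANNOTATIONS]:
    if not o.get('ignore'):                        # skip annotations flagged to ignore
        trn_anno[o['image_id']].append((o['bbox'], o['category_id']))
```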
00:56:18.220 | One comment on variable naming is when I read through these notebooks, I'll generally try
00:56:29.060 | and speak out the English words that the variable name is a noun for. A reasonable question
00:56:36.740 | would be why didn't I write the full name of the variable in English rather than using
00:56:41.820 | a short mnemonic. It's a personal preference I have based on a number of programming communities
00:56:49.340 | where the basic thesis is that the more that you can see in a single eye grab of the screen,
00:57:00.500 | the more you can understand intuitively at one go. Every time your eye has to jump around,
00:57:07.980 | it's kind of like a context change that reduces your understanding. It's a style of programming
00:57:13.340 | I found super helpful, and so generally speaking I particularly try to reduce the vertical
00:57:20.260 | height, so things don't scroll off the screen, but I also try to reduce the size of things
00:57:26.460 | so that there's a mnemonic there which if you know it's training annotations, it doesn't
00:57:33.080 | take long for you to see training annotations, but you don't have to write the whole thing out.
00:57:38.580 | So I'm not saying you have to do it this way, I'm just saying there's some very large programming
00:57:42.500 | communities, some of which have been around for 50 or 60 years which have used this approach
00:57:46.940 | and I find it works well. It's interesting to compare, I guess my philosophy is somewhere
00:57:57.020 | between math and Java. In math, everything's a single character. The same single character
00:58:06.460 | can be used in the same paper for five different things, and depending on whether it's in italics
00:58:11.620 | or boldface or capitals, it's another five different things. I find that less than ideal.
00:58:19.100 | In Java, variable names sometimes require a few pages to print out, and I find that less
00:58:25.140 | than ideal as well.
00:58:26.540 | So for me, I personally like names which are short enough to not take too much of my perception
00:58:37.580 | to see at once, but long enough to have a mnemonic. Also, however, a lot of the time
00:58:46.340 | the variable will be describing a mathematical object as it exists in a paper, and there
00:58:51.780 | isn't really an English name for it, and so in those cases I will use the same, often
00:58:56.780 | single letter that the paper uses.
00:59:00.100 | And so if you see something called delta or A or something, and it's like something inside
00:59:07.540 | an equation from a paper, I generally try to use the same thing just to explain that.
00:59:17.220 | By no means do you have to do the same thing. I will say, however, if you contribute to
00:59:21.340 | fast.ai, I'm not particularly fastidious about coding style or whatever, but if you write
00:59:26.940 | things more like the way I do than the way Java people do, I'll certainly appreciate it.
00:59:34.700 | So by the end of this we now have a dictionary from image IDs to lists of tuples, and so here's
00:59:41.940 | an example of looking at that dictionary and we get back a bounding box and a class.
00:59:53.740 | You'll see when I create the bounding box, I've done a couple of things. The first is
00:59:57.980 | I've switched the x and y coordinates. The reason for this, I think we mentioned this
01:00:02.580 | briefly in the last course, the computer vision world when you say my screen is 640x480, that's
01:00:11.740 | width by height. Whereas the math world when you say my array is 640x480, it's rows by
01:00:19.340 | columns, i.e. height by width. So you'll see that a lot of things like PIL or Pillow Image
01:00:26.900 | Library in Python tend to do things in this kind of width by height or columns by rows
01:00:33.260 | way, NumPy is the opposite way around.
01:00:37.280 | My view is don't put up with this kind of incredibly annoying inconsistency, fix it.
01:00:45.660 | So I've decided that for fastai, the NumPy/PyTorch way is the right way, so I'm always rows by
01:00:54.340 | columns. So you'll see here I've switched to rows by columns.
01:01:00.580 | I've also decided that we're going to do things by describing the top left x, y coordinate
01:01:08.020 | and the bottom right x, y coordinate bounding box rather than the x, y and the height width.
01:01:15.060 | So you'll see here I'm just converting the height and width to the top left and bottom
01:01:23.100 | right.
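
A sketch of that conversion, together with the inverse bb_hw that comes up just below (the exact function names are assumptions based on the notebook's naming):

```python
import numpy as np

def hw_bb(bb):
    # [x, y, width, height] -> [top-left row, top-left col, bottom-right row, bottom-right col]
    return np.array([bb[1], bb[0], bb[3] + bb[1] - 1, bb[2] + bb[0] - 1])

def bb_hw(a):
    # the inverse: back to [x, y, width, height] for libraries that expect that
    return np.array([a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1])
```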
01:01:25.600 | So again, I often find dealing with junior programmers, in particular junior data scientists,
01:01:31.420 | that they get given data sets that are in shitty formats, happy APIs, and they just
01:01:37.920 | act as if everything has to be that way. But your life will be much easier if you take
01:01:42.780 | a couple of moments to make things consistent and make them the way you want them to be.
01:01:51.300 | So earlier on I took all of our classes and created a categories list, and so if we look
01:01:58.300 | up category number 7, which is what this is, category number 7 is a car. Let's have a look
01:02:05.260 | at another example. Image number 17 has two bounding boxes, one of them is type 15, one
01:02:13.180 | is type 13, that is a person and a horse. So this will be much easier to understand if
01:02:18.020 | we can see a picture of these things. So let's create some pictures.
01:02:23.740 | So having just turned our height width stuff into top left, bottom right stuff, we're now
01:02:32.260 | going to create a method to do the exact opposite, because any time I want to call some library
01:02:38.980 | that expects the opposite, I'm going to need to pass it in the opposite. So here is something
01:02:42.980 | that converts a bounding box to a height-width: bb_hw, bounding box to height width. So it's
01:02:50.860 | again reversing the order and giving us the height and width.
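A sketch of that inverse, under the same assumptions as the hw_bb sketch above:

```python
def bb_hw(a):
    # Undo hw_bb: corners back to [x, y, width, height] for libraries
    # (like matplotlib patches) that expect columns-by-rows plus sizes.
    return np.array([a[1], a[0], a[3] - a[1] + 1, a[2] - a[0] + 1])
```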
01:03:04.620 | So we can now open an image in order to display it, and where we're going to get to is we're going to get it to show
01:03:10.940 | this - that's that car. So one thing that I often get asked on the forums or through
01:03:17.880 | GitHub is like, well how did I find out about this open image thing? Where did it come from,
01:03:26.460 | what does it mean, who uses it. And so I wanted to take a moment because one of the things
01:03:33.220 | you're going to be doing a lot, and I know a lot of you aren't professional coders, you
01:03:38.180 | have backgrounds in statistics or meteorology or physics or whatever, and I apologize for
01:03:44.020 | those of you who are professional coders, you know all this already. Because we're going
01:03:48.260 | to be doing a lot of stuff with the fastai library and other libraries, you need to be
01:03:51.620 | able to navigate very quickly through them.
01:03:55.460 | And so let me give you a quick overview of how to navigate through code, and for those
01:04:00.300 | of you who haven't used an editor properly before, this is going to blow your minds.
01:04:05.820 | For those of you that have, you're going to be like, check this out guys, check this out.
01:04:11.020 | For the demo I'm going to show you in Visual Studio Code, personally my view is that on
01:04:17.100 | pretty much every platform, unless you're prepared to put in the decades of your life
01:04:23.680 | to learn Vim or Emacs well, Visual Studio Code is probably the best editor out there.
01:04:29.260 | It's free, it's open source, there are other perfectly good ones as well.
01:04:33.860 | So if you download a recent version of Anaconda, it will offer to install Visual Studio Code
01:04:38.500 | for you, it integrates with Anaconda, sets it up with your Python interpreter and comes
01:04:43.700 | with the Python extensions and everything.
01:04:46.020 | So it's a good choice if you're not sure. If you've got some other editor you like,
01:04:52.500 | search for the right keywords to find the equivalent help.
01:04:54.640 | So if I fire up Visual Studio Code, the first thing to do of course is do a git clone of
01:05:01.580 | the fastai library to your laptop. You'll find in the root of the repo the
01:05:08.820 | environment.yml file that sets up the Anaconda environment for GPU. One of the students has
01:05:14.640 | been kind enough to create an environment-CPU.yml file, and perhaps one of you that knows how
01:05:21.420 | to do this can add some notes to the wiki, but basically you can use that to create a
01:05:27.300 | local CPU-only fastai installation.
01:05:32.180 | The reason you might want to do that is so that as you navigate the code, you'll be able
01:05:37.620 | to navigate into PyTorch, you'll see all the stuff is there.
01:05:42.420 | So I opened up Visual Studio Code, and it's as simple as saying open folder, and then
01:05:48.740 | you can just point it at the fastai github folder that you just downloaded.
01:05:54.380 | And so the next thing you need to do is to set up Visual Studio Code to say I want to
01:06:00.260 | use the fastai conda environment, please.
01:06:04.780 | So the way you do that is with the select interpreter command, and there's a really nice
01:06:09.020 | idea which is kind of like the best of both worlds between a command-line interface and
01:06:14.620 | a GUI, which is this is the only command you need to know, Ctrl+Shift+P. You hit Ctrl+Shift+P,
01:06:21.620 | and then you start typing what you want to do and watch what happens.
01:06:24.700 | I want to change my interpreter into... okay, and it appears.
01:06:31.340 | If you're not sure, you can kind of try a few different things.
01:06:36.060 | So here we are, Python select interpreter, and you can see generally you can type stuff
01:06:40.100 | in, it will give you a list of things if it can.
01:06:42.700 | And so here's a list of all of the environments and interpreters I have set up, and here's
01:06:46.460 | my fastai environment.
01:06:51.060 | So that's basically the only setup that you have to do.
01:06:55.340 | The only other thing you might want to do is to know there's an integrated terminal,
01:06:59.540 | so if you hit Ctrl+Backtick, it brings up the terminal.
01:07:04.220 | And the first time you do it, it will ask you what terminal do you want.
01:07:08.780 | If you're in Windows, it will be like PowerShell or Command Prompt or Bash.
01:07:13.420 | If you're on Linux, you've got multiple shells installed, it will ask.
01:07:16.880 | So as you can see, I've got it set up to use Bash.
01:07:21.340 | And you'll see it automatically goes to the directory that I'm in.
01:07:28.220 | So the main thing we want to do right now is find out what open_image is.
01:07:32.540 | So the only thing you need to know to do that is Ctrl+T.
01:07:37.980 | If you hit Ctrl+T, you can now type the name of a class, a function, pretty much anything
01:07:42.540 | and you can find out about it.
01:07:44.260 | So open_image, you can see it appears.
01:07:47.620 | And it's kind of cool if there's something that's got like camel case capitalized or
01:07:51.620 | something with underscore, you can just type the first few letters of each bit so I could
01:07:55.420 | be like open_image, for example.
01:08:00.220 | I do that and it's found the function, it's also found some other things that match.
01:08:06.380 | There it is.
01:08:08.820 | So that's kind of a good way you can see exactly where it's come from and you can find out exactly
01:08:12.860 | what it is.
01:08:14.740 | And then the next thing I guess would be like, well, what's it used for?
01:08:18.620 | So if it's used inside fast.ai, you could say find references, which is
01:08:26.700 | Shift+F12. So open_image, Shift+F12, and it brings up something saying, oh, it's used
01:08:39.940 | twice in this code base, and I can go and have a look at each of those examples.
01:08:46.820 | If it's used in multiple different files, it will tell you the list of different files
01:08:50.460 | that it's used in.
01:08:54.260 | Another thing that's really handy then is as you look at the code, you'll find that certain
01:08:59.580 | bits of the code call other parts of the code.
01:09:03.220 | So for example, if you're inside files_dataset, and you're like, oh, this is calling something
01:09:07.280 | called open_image, what is that?
01:09:10.200 | You can wave your pointer over it and it will give you the docstring.
01:09:14.380 | Or you can hit f12, and it jumps straight to its definition.
01:09:20.140 | So often it's easy to get a bit lost in things, call things, call things, and if you have
01:09:24.940 | to manually go to each bit, it's infuriating, whereas this way it's always one button away.
01:09:31.020 | Ctrl+T to go to something that you specifically know the name of, or f12 to jump to the name
01:09:36.380 | of the definition of something that you're clicking on.
01:09:39.700 | When you're done, you probably want to go back to where you came from, so Alt+Left takes
01:09:44.700 | you back to where you were.
01:09:49.740 | So whatever you use, Vim, Emacs, Atom, whatever, they all have this functionality as long as
01:09:58.460 | you have an appropriate extension installed.
01:10:02.820 | If you use PyCharm, you can get that for free, that doesn't need any extensions because it's
01:10:07.340 | Python.
01:10:08.340 | Whatever you're using, you want to know how to do this stuff.
01:10:14.860 | Finally I'll mention there's a nice thing called Zen mode, Ctrl+KZ, which basically
01:10:22.020 | gets rid of everything else so you can focus, but it does keep this nice little thing on
01:10:25.980 | the right-hand side which shows you where you are.
01:10:35.540 | That's something that you should practice if you haven't played around with it before
01:10:39.540 | during the week because we're increasingly going to be digging deeper and deeper into
01:10:44.180 | fast.ai and PyTorch libraries.
01:10:47.060 | As I say, if you're already a professional coder and know all this stuff, apologies for telling
01:10:50.940 | you stuff you already know.
01:10:53.460 | So we're going to -- well actually since we did that, let's just talk about open_image.
01:11:02.340 | You'll see that we're using cv2; cv2 is actually the OpenCV library.
01:11:12.240 | You might wonder why we're using OpenCV, and I want to explain some of the internals of fast.ai
01:11:19.060 | to you because some of them are kind of interesting and might be helpful to you.
01:11:24.940 | TorchVision, the standard PyTorch vision library, actually uses PyTorch tensors
01:11:33.500 | for all of its data augmentation and stuff like that.
01:11:38.220 | A lot of people use Pillow, the standard Python imaging library.
01:11:45.220 | I did a lot of testing of all of these.
01:11:48.180 | I found OpenCV was about 5 to 10 times faster than TorchVision, so early on I actually teamed
01:11:56.260 | up with one of the students from an earlier class to do the Planet Lab satellite competition
01:12:00.340 | back when that was on, and we used TorchVision.
01:12:04.340 | Because it was so slow, we could only get 25% GPU utilization because we were doing
01:12:09.820 | a lot of data augmentation.
01:12:12.620 | So then I used the Profiler to find out what was going on and realized it was all in TorchVision.
01:12:20.700 | Pillow or PIL is quite a bit faster, but it's not as fast as OpenCV, and also is not nearly
01:12:33.060 | as thread-safe.
01:12:34.540 | So I actually talked to the guy who developed the thing, Python has this thing called the
01:12:41.060 | global interpreter lock, the GIL, which basically means that two threads can't do Pythonic things
01:12:48.820 | at the same time.
01:12:50.500 | It makes Python a really shitty language for modern programming, but we're stuck with it.
01:12:58.820 | So I spoke to the guy on Twitter who actually made it so that OpenCV releases the GIL.
01:13:05.580 | So one of the reasons the fast.ai library is so amazingly fast is because we don't use
01:13:11.300 | multiple processes for our data augmentation like every other library does, we actually
01:13:15.140 | use multiple threads.
01:13:16.940 | And the reason we can do multiple threads is because we use OpenCV.
01:13:20.940 | Unfortunately, OpenCV is a really shitty API, it's kind of inscrutable, a lot of stuff it
01:13:28.460 | does is poorly documented.
01:13:29.940 | When I say poorly documented, it's documented in really obtuse kind of ways.
01:13:38.700 | So that's why I try to make it so no one using fast.ai needs to know that it's using OpenCV.
01:13:45.820 | If you want to open an image, do you really need to know that you have to pass these flags
01:13:49.540 | to open to actually make it work?
01:13:51.700 | Do you actually need to know that if the reading fails it doesn't show an exception, it just
01:13:56.340 | silently returns none?
01:13:58.980 | It's these kinds of things that we try to do to actually make it work nicely.
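As a rough illustration of the kind of wrapping being described, here is a minimal sketch of an OpenCV-based opener; the flag combination and the None check follow the behavior described above, and the details are simplified rather than the exact fastai source:

```python
import cv2
import numpy as np

def open_image(fn):
    # These flags let OpenCV cope with unusual bit depths and channel counts.
    flags = cv2.IMREAD_UNCHANGED + cv2.IMREAD_ANYDEPTH + cv2.IMREAD_ANYCOLOR
    im = cv2.imread(str(fn), flags)
    if im is None:
        # cv2.imread fails silently by returning None, so raise a real error.
        raise OSError(f'Image not recognized by OpenCV: {fn}')
    # OpenCV loads BGR; convert to RGB floats in [0, 1] as models expect.
    return cv2.cvtColor(im, cv2.COLOR_BGR2RGB).astype(np.float32) / 255
```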
01:14:03.500 | But as you start to dig into it, you'll find yourself in these places and you'll want to
01:14:09.860 | know why.
01:14:11.260 | And I mentioned this in particular to say don't start using PyTorch for your data augmentation,
01:14:19.220 | don't start bringing in Pillow, you'll find suddenly things slow down horribly or the
01:14:23.380 | multithreading won't work anymore, try to stick to using OpenCV for your processing.
01:14:35.980 | So we've got our image, we're just going to use it to demonstrate the Pascal library.
01:14:46.340 | And so the next thing I wanted to show you in terms of important coding stuff we're going
01:14:50.180 | to be using throughout this course is using Matplotlib a lot better.
01:14:55.860 | So Matplotlib is so named because it was originally a clone of Matlab's plotting library.
01:15:05.140 | Unfortunately, Matlab's plotting library is awful, but at the time it was what everybody
01:15:13.700 | knew.
01:15:16.800 | So at some point, the Matplotlib folks realized that the Matlab plotting library is awful,
01:15:26.540 | so they added a second API to it which was an object-oriented API.
01:15:31.900 | Unfortunately, because nobody who originally learned Matplotlib learned the OO API, they
01:15:37.580 | then taught the next generation of people the old Matlab-style API, and now there's
01:15:42.220 | basically no examples or tutorials online I'm aware of that use the much, much better,
01:15:47.460 | easier to understand, simpler OO API.
01:15:50.540 | So one of the things I'm going to try and show you because plotting is so important
01:15:54.440 | in deep learning is how to use this API, and I've discovered some simple little tricks.
01:16:00.780 | One simple little trick is plt.subplots, which is just a super handy wrapper I'm going to use
01:16:06.820 | a lot, and what it does is it returns two things.
01:16:13.220 | One of the things, the figure, you often won't care about; the other thing is an axes object,
01:16:18.500 | and basically anywhere where you used to say plt.something, you now say ax.something, and
01:16:25.660 | it will now do that plotting to that particular subplot.
01:16:30.540 | So a lot of the time you'll use this, or I'll use this during this course to plot multiple
01:16:36.100 | plots that we can compare next to each other, but even in this case I'm creating a single
01:16:44.060 | plot.
01:16:45.060 | But it's just nice to only know one thing rather than lots of things, so regardless
01:16:50.140 | of whether you're doing one plot or lots of plots, I always start now with this plt.subplots.
01:16:56.980 | And the nice thing is that this way I can pass in an axes object if I want to plot it
01:17:03.420 | into a figure I've already created, or if it hasn't been passed in I can create one.
01:17:10.980 | So this is also a nice way to make your matplotlib functions really versatile, and you'll kind
01:17:17.220 | of see this used throughout this course.
01:17:20.900 | So now rather than plt.imshow, it's ax.imshow.
01:17:25.700 | And then rather than kind of weird stateful setting things in the old-style API, you can
01:17:33.900 | now use the OO way: get_xaxis returns an object, set_visible sets a property, it's all pretty
01:17:41.260 | normal straightforward stuff.
01:17:42.960 | So once you start getting the hang of a small number of these OO matplotlib things, hopefully
01:17:49.460 | you'll find life a lot easier, so I'm going to show you a few right now actually.
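Here is a minimal sketch of the pattern just described, a plotting function that creates an axes only if it wasn't given one (the name show_img and the details are illustrative):

```python
import matplotlib.pyplot as plt

def show_img(im, figsize=None, ax=None):
    # Create an axes only if the caller didn't pass one in, so the same
    # function works standalone or as one panel in a bigger grid of subplots.
    if ax is None:
        fig, ax = plt.subplots(figsize=figsize)
    ax.imshow(im)
    ax.get_xaxis().set_visible(False)  # OO style: get an object, set a property
    ax.get_yaxis().set_visible(False)
    return ax
```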
01:17:54.780 | So let me show you a cool example, what I think is a cool example.
01:17:59.500 | So one thing that kind of drives me crazy with people putting text on images, whether
01:18:05.380 | it be subtitles on TV or people doing stuff with computer vision is that it's like white
01:18:11.900 | text on a white background or black text on a dark background, you can't read it.
01:18:16.900 | And so a really simple thing that I like to do every time I draw on an image is to either
01:18:22.660 | make my text in boxes white with a little black border or vice versa.
01:18:28.740 | And so here's a cool little thing you can do in matplotlib, is you can take a matplotlib
01:18:36.140 | plotting object and you can call set_path_effects and say add a black stroke around it.
01:18:47.580 | And you can see that when you draw that, it doesn't matter that here it's white on a white
01:18:54.420 | background or here it's on a black background, it's equally visible.
01:18:59.060 | And I know it's a simple little thing, but it kind of just makes life so much better
01:19:04.260 | when you can actually see your bounding boxes and actually read the text.
01:19:08.420 | So you can see, rather than just saying add a rectangle, I get the object that it creates
01:19:17.620 | and then pass that object to draw_outline.
01:19:19.700 | Now everything I do, I'm going to get this nice path effect on it.
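A sketch of that helper, using matplotlib's patheffects module:

```python
import matplotlib.patheffects as patheffects

def draw_outline(o, lw):
    # Stroke the object in black underneath its normal rendering, so white
    # text and boxes stay readable on light and dark backgrounds alike.
    o.set_path_effects([patheffects.Stroke(linewidth=lw, foreground='black'),
                        patheffects.Normal()])
```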
01:19:24.860 | You can see matplotlib is a perfectly convenient way of drawing stuff.
01:19:30.300 | So when I want to draw a rectangle, matplotlib calls that a patch, and then you can pass
01:19:37.260 | in all different kinds of patches.
01:19:39.740 | So here's -- again, rather than having to remember all that every time, please stick
01:19:45.420 | it in a function.
01:19:46.420 | And now you can use that function every time.
01:19:49.660 | You don't have to put it in a library somewhere, I always put lots of functions inside my notebook.
01:19:55.380 | If I use it in like three notebooks, then I know it's useful enough that I'll stick
01:20:01.020 | it in a separate library.
01:20:05.100 | You can draw text, and notice all of these take an axis object, so this is always going
01:20:10.500 | to be added to whatever thing I want to add it to.
01:20:13.420 | So I can add text, and draw an outline around it.
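Sketches of the two helpers being described; both take the axes to draw on, and the parameter shapes are assumed from the description:

```python
from matplotlib import patches

def draw_rect(ax, b):
    # b is [x, y, width, height]: add a white rectangle patch, then outline it.
    patch = ax.add_patch(patches.Rectangle(b[:2], *b[-2:], fill=False,
                                           edgecolor='white', lw=2))
    draw_outline(patch, 4)

def draw_text(ax, xy, txt, sz=14):
    text = ax.text(*xy, txt, verticalalignment='top', color='white',
                   fontsize=sz, weight='bold')
    draw_outline(text, 1)
```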
01:20:18.180 | So having done all that, I can now take my show_img, and notice here the show_img,
01:20:26.340 | if you didn't pass it an axis, it returns the axis it created.
01:20:29.820 | So show_img returns the axis that image is on, I then turn my bounding box into height
01:20:35.580 | and width for this particular image's bounding box, I can then draw the rectangle, I can
01:20:42.460 | then draw the text in the top left corner.
01:20:48.180 | So remember the bounding box x and y are the first two coordinates, so b[:2] is the
01:20:54.580 | top left.
01:20:57.980 | This is the, remember the tuple contains two things, the bounding box and then the class,
01:21:04.040 | so this is the class, and then to get the text of it I just pass it into my categories
01:21:08.700 | list and there we go.
01:21:11.780 | So now that I've kind of got all that set up, I can use that for all of my object detection
01:21:15.740 | stuff from here on.
01:21:21.480 | What I really want to do though is to kind of package all that up, so here it is, packaging
01:21:25.460 | it all up, so here's something that draws an image with some annotations, so it shows
01:21:31.000 | the image, it goes through each annotation, turns it into height and width, draws the
01:21:35.860 | rectangle, draws the text.
01:21:40.420 | If you haven't seen this before, each annotation remember contains a bounding box and a class,
01:21:46.300 | so rather than going for o in ann and then indexing o[0] and o[1], I can destructure it, so if
01:21:56.800 | you put two names on the left, then that's going to put the two parts of the tuple or
01:22:02.540 | a list into those two things, super handy.
01:22:06.260 | So for the bounding box and the class in the annotations, go ahead and do all that, and
01:22:14.060 | so then I can then say okay, draw an image at a particular index by grabbing the image
01:22:20.080 | ID, opening it up and then calling that draw, and so let's test it out, and there it is.
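A sketch of that packaging, reusing the helpers sketched above; each annotation is a (bounding box, class) tuple and cats is the categories list:

```python
def draw_im(im, ann):
    ax = show_img(im, figsize=(16, 8))
    for bb, c in ann:                 # destructure each (bbox, class) tuple
        b = bb_hw(bb)                 # back to [x, y, width, height] for plotting
        draw_rect(ax, b)
        draw_text(ax, b[:2], cats[c], sz=16)
```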
01:22:29.140 | So that kind of seems like quite a few steps, but to me, when you're working with a new
01:22:38.220 | data set, getting to the point that you can rapidly explore it, it pays off.
01:22:45.980 | You'll see as we start building our model, we're going to keep using these functions
01:22:49.880 | now to kind of see how things are going.
01:22:58.060 | So step 1 from our presentation is to do a classifier.
01:23:05.740 | And so I think it's always good, like for me, I didn't really have much experience before
01:23:10.340 | I started preparing this course a few months ago in doing this kind of object detection
01:23:16.260 | stuff, so I was like, alright, I want to get this feeling of, even though it's deep learning,
01:23:23.260 | of continual progress.
01:23:25.780 | So I'm like, what can I make work?
01:23:28.460 | I thought, alright, why don't I find the biggest object in each image and classify it?
01:23:33.900 | I know how to do that.
01:23:37.460 | This is one of the biggest problems I find, particularly with younger students, is they
01:23:41.900 | figure out the whole big solution they want, generally which involves a whole lot of new
01:23:48.420 | speculative ideas that nobody's ever tried before, and they spend 6 months doing it,
01:23:54.060 | and then the day before the presentation, none of it works, and they're screwed.
01:24:00.740 | I've talked about my approach to Kaggle competitions before, which is like half an hour each
01:24:05.460 | day.
01:24:06.460 | At the end of that half an hour, submit something, and try and make it a little bit better than
01:24:10.820 | yesterday's.
01:24:11.820 | So I've kind of tried to do the same thing in preparing this lesson, which is try to
01:24:18.060 | create something that's a bit better than the last thing.
01:24:20.740 | So the first thing, the easiest thing I could come up with was my largest item classifier.
01:24:27.340 | So the first thing I needed to do was to go through each of the bounding boxes in an image
01:24:40.060 | and get the largest one.
01:24:43.500 | So I actually didn't write that first, I actually wrote this first.
01:24:48.900 | So normally I pretend that somebody else has created the exact API I want, and then go
01:24:54.460 | back and write it.
01:24:56.580 | So I wrote this line first, and it's like, okay, I need something which takes all of
01:25:03.300 | the bounding boxes for a particular image and finds the largest, and that's pretty straightforward.
01:25:12.820 | I can just sort the bounding boxes, and here again we've got a lambda function.
01:25:18.620 | So again, if you haven't used lambda functions before, this is something you should study
01:25:22.180 | during the week, they're used all over the place to quickly define a once-off function.
01:25:29.380 | And in this case, the Python built-in sorted function lets you pass in a function to say,
01:25:39.460 | how do you decide whether something's earlier or later in the sort order?
01:25:44.660 | And in this case, I took the product of the last two items of my bounding box list, i.e.
01:26:00.300 | the bottom right hand corner, minus the first two items of my bounding box list, i.e. the
01:26:05.860 | top left corner.
01:26:06.860 | So bottom right minus top left is the size, the two sizes, and if you take the product
01:26:12.620 | of those two things you get the size of the bounding box.
01:26:15.980 | And so then that's the function, do that in descending order.
01:26:20.940 | Often you can take something that's going to be a few lines of code and turn it into
01:26:27.020 | one line of code, and sometimes you can take that too far, but for me, I like to do that
01:26:35.580 | where I reasonably can, because again, having to understand a whole big chain of things,
01:26:43.620 | my brain can just say, I can just look at that at once, and say okay, there it is.
01:26:48.980 | And also I find that over time, my brain kind of builds up this little library of idioms,
01:26:55.860 | and more and more things I can look at a single line and know what's going on.
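A sketch of that one-liner in context (the name get_lrg is illustrative; each element of b is a (bounding box, class) tuple where the box is a NumPy array of corners):

```python
def get_lrg(b):
    if not b:
        raise Exception('expected at least one annotation')
    # Sort by area, largest first: bottom-right corner minus top-left corner
    # gives the two side lengths, and np.prod multiplies them together.
    b = sorted(b, key=lambda x: np.prod(x[0][-2:] - x[0][:2]), reverse=True)
    return b[0]
```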
01:27:04.020 | So this now is a dictionary, and it's a dictionary because this is a dictionary comprehension.
01:27:15.140 | A dictionary comprehension is just like a list comprehension, I'm going to use it a lot
01:27:18.820 | in this part of the course, except it goes inside curly brackets, and it's got a key colon
01:27:25.660 | value.
01:27:27.820 | So here the key is going to be the image ID, and the value is the largest bounding box.
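As a sketch, with trn_anno standing for the filename-to-annotations dictionary built earlier (the names are illustrative):

```python
# Curly brackets plus key: value makes this a dictionary comprehension,
# mapping each image id to its largest (bounding box, class) annotation.
trn_lrg_anno = {a: get_lrg(b) for a, b in trn_anno.items()}
```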
01:27:39.540 | So now that we've got that, we can look at an example, and here's an example of the largest
01:27:48.580 | bounding box for this image.
01:27:51.020 | So obviously there's a lot of objects here, there's three bicycles and three people, but
01:27:57.020 | here's the largest bounding box.
01:28:01.500 | I feel like this ought to go without saying, but it definitely needs to be said because
01:28:05.460 | so many people don't do it.
01:28:07.020 | You need to look at every stage when you've got any kind of processing pipeline, if you're
01:28:13.680 | as bad at coding as I am, everything you do will be wrong the first time you do it.
01:28:20.220 | But there's lots of people that are as bad as me at coding, and yet lots of people write
01:28:24.700 | lines and lines of code assuming they're all correct, and then at the very end they've got
01:28:29.460 | a mistake and they don't know where it came from.
01:28:32.020 | So particularly when you're working with images or text, like things that humans can look
01:28:38.140 | at and understand, keep looking at it.
01:28:41.740 | So here I have it, yep, that looks like the biggest thing, and that certainly looks like
01:28:46.660 | a person.
01:28:47.660 | So let's move on.
01:28:48.660 | Here's another nice thing in pathlib, mkdir, a handy little method.
01:28:56.440 | So I'm going to create a path called CSV, which is a path to my large objects CSV file.
01:29:05.060 | Why am I going to create a CSV file?
01:29:09.620 | Pure laziness, right?
01:29:10.860 | We have an image classifier from CSV, I could go through a whole lot of work to create a
01:29:17.060 | custom data set and blah blah blah to use this particular format I have.
01:29:22.900 | But why?
01:29:23.900 | It's so easy to create the CSV, chuck it inside a temporary folder, and then use something
01:29:30.180 | that already you have.
01:29:34.020 | Something I've seen a lot of times on the forum is people will say how do I convert
01:29:39.340 | this weird structure into a way that fast.ai can accept it, and then normally somebody
01:29:45.700 | on the forum will say, print it to a CSV file.
01:29:49.380 | So that's a good simple tip.
01:29:53.620 | And the easiest way to create a CSV file is to create a pandas dataframe.
01:29:58.140 | So here's my pandas dataframe, I can just give it a dictionary with the name of a column
01:30:04.660 | and the list of things in that column, so there's the file name, there's the category,
01:30:11.340 | and then you'll see here, why do I have this?
01:30:13.340 | I've already named the columns in the dictionary, why is it here?
01:30:17.460 | Because the order of columns matters, and the dictionary does not have an order.
01:30:23.700 | So this says the file name comes first and the category comes second.
01:30:27.500 | So that's a good trick to creating your CSVs.
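A sketch of those few lines; names like trn_fns, trn_ids, and CSV follow the pattern of the notebook and are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'fn': [trn_fns[o] for o in trn_ids],
                   'cat': [cats[trn_lrg_anno[o][1]] for o in trn_ids]},
                  columns=['fn', 'cat'])  # columns= pins the column order
df.to_csv(CSV, index=False)
```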
01:30:31.300 | So now it's just like dogs vs cats.
01:30:33.420 | I have a CSV file that contains a bunch of file names, and for each one it contains the
01:30:39.460 | class of that object.
01:30:42.540 | So this is the same two lines of code you've seen a thousand times.
01:30:48.900 | What we will do though is to take a look at this.
01:30:55.700 | The one thing that's different is crop type.
01:31:01.180 | So you might remember the default strategy for creating a 224x224 image in fastai is
01:31:12.580 | to first of all resize it, so the largest side is 224, and then to take a random square
01:31:25.300 | crop during training, and then during validation we take the center crop unless we use data
01:31:31.900 | augmentation in which case we do a few random crops.
01:31:37.860 | For bounding boxes, we don't want to do that because unlike an image net where the thing
01:31:44.180 | we care about is pretty much in the middle and it's pretty big, a lot of the stuff in
01:31:49.220 | object detection is quite small and close to the edge.
01:31:53.940 | So we could crop it out, and that would be bad.
01:31:56.880 | So when you create your transforms you can choose crop_type=CropType.NO, and NO means
01:32:04.500 | don't crop, and therefore to make it square instead it squishes it.
01:32:10.060 | So you'll see this guy now looks kind of a bit strangely wide, and that's because he's
01:32:15.540 | been squished like this rather than cropped.
01:32:21.700 | Generally speaking, a lot of computer vision models work a little bit better if you crop
01:32:29.900 | rather than squish, but they still work pretty well if you squish, and in this case we definitely
01:32:36.040 | don't want to crop, so this is perfectly fine.
01:32:40.540 | If you had very long or very tall images such that if a human looked at the squash version
01:32:48.620 | you'd be like, that looks really weird, then that might be difficult to model, but in this
01:32:53.140 | case we're just like, it looks a little bit strange, so the computer won't mind.
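A sketch of what that call looks like; tfms_from_model and CropType are fastai 0.7 names, and f_model, sz, PATH, JPEGS, and CSV are assumed from context:

```python
tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO)  # squish, don't crop
md = ImageClassifierData.from_csv(PATH, JPEGS, CSV, tfms=tfms)
```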
01:33:03.500 | So I'm going to quite often dig a little bit into some more depths of fast.ai and PyTorch,
01:33:11.780 | and in this case I want to just look at data loaders a little bit more.
01:33:16.460 | So you already know that inside a model data object, when there's lots of model data subclasses
01:33:30.200 | like image classifier data, we have a bunch of things which include a training data loader
01:33:36.900 | and a training data set, and we'll talk much more about this soon.
01:33:42.500 | The main thing to know about a data loader is that it's an iterator, that each time you
01:33:48.560 | grab the next iteration of stuff from it, you get a mini-batch.
01:33:54.020 | And the mini-batch you get is of whatever size you asked for, and by default the batch
01:34:00.660 | size is 64.
01:34:02.260 | In Python, the way you grab the next thing from an iterator is with next, but you can't
01:34:15.180 | just do that.
01:34:19.540 | The reason you can't do that is because you need to say, start a new epoch now.
01:34:26.820 | In general, this isn't just in PyTorch, but for any Python iterator, you kind of need
01:34:31.740 | to say start at the beginning of the sequence, please.
01:34:35.980 | So the way you do that, and this is a general Python concept, is you write iter.
01:34:43.020 | And iter says please grab an iterator out of this object.
01:34:49.940 | Specifically as we'll learn later, it means this class has to have defined an
01:34:54.580 | __iter__ method, which returns some different object which
01:35:00.300 | then has a __next__ method.
01:35:05.380 | So that's how I do that.
01:35:07.820 | And so if you want to grab just a single batch, this is how you do it.
01:35:13.320 | x, y = next(iter(data_loader)).
01:35:17.740 | Why x, y?
01:35:19.500 | Because our data loaders, our data sets behind the data loaders always have an x independent
01:35:26.700 | and a y independent variable.
01:35:30.980 | So here we can grab a mini-batch of x's and y's.
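A sketch of grabbing one batch by hand, with md the model data object from above:

```python
it = iter(md.val_dl)   # start an epoch: this calls the loader's __iter__
x, y = next(it)        # one mini-batch of (independent, dependent) variables
```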
01:35:35.260 | I now want to pass that to that show image command we had earlier, but we can't send
01:35:45.060 | that straight to show image.
01:35:57.620 | Here it is.
01:36:00.660 | For one thing, it's not a NumPy array, it's not on the CPU, and its shape is all wrong.
01:36:10.420 | It's not 224x224x3, it's 3x224x224.
01:36:15.380 | Furthermore these are not numbers between 0 and 1, why not?
01:36:21.620 | Because remember all of the standard ImageNet pre-trained models expect our data to be normalized
01:36:29.980 | to have a 0 mean and a 1 standard deviation.
01:36:33.900 | So if you look inside -- let's use Visual Studio Code for this since that's what we've
01:36:38.420 | been doing -- so if you look inside tfms_from_model, so Ctrl+T, tfms_from_model,
01:36:50.460 | which in turn, F12, calls tfms_from_stats,
01:37:11.740 | and here you can see normalize.
01:37:17.420 | And it normalizes with some set of image statistics, and the set of image statistics, they're basically
01:37:22.860 | hard-coded.
01:37:24.300 | This is the ImageNet statistics, this is the statistics used for inception models.
01:37:29.220 | So there's a whole bunch of stuff that's been done to the input to get it ready to be passed
01:37:35.980 | to a pre-trained model.
01:37:39.240 | So we have a function called denorm for denormalize.
01:37:45.600 | It doesn't only denormalize, it also fixes up the dimension order and all that stuff.
01:37:52.980 | The denormalization depends on the transform.
01:37:57.460 | And the dataset knows what transform was used to create it.
01:38:01.620 | So that's why you have to go model data, dot, and then some dataset, dot, denorm, and that's
01:38:07.700 | a function that is stored for you that will undo everything.
01:38:12.620 | And then you can pass that a mini-batch, but you have to turn it into NumPy first.
01:38:20.900 | So this is like all the stuff that you need to be able to do to grab batches and look
01:38:25.900 | at them.
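In code, that round trip looks something like this sketch (to_np is fastai's tensor-to-NumPy helper; the names are as used in the lesson):

```python
ima = md.val_ds.denorm(to_np(x))[0]  # undo normalization, fix the dim order
show_img(ima)
```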
01:38:27.400 | And so after you've done all that, you can show the image, and we've got our image
01:38:30.340 | back.
01:38:31.340 | So that's looking good.
01:38:36.040 | So in the end, we've just got the standard four lines of code.
01:38:39.340 | We've got our transforms, we've got our model data, ConvLearner.pretrained, we're using
01:38:45.700 | a ResNet34 here, I'm going to add accuracy as a metric, pick an optimization function,
01:38:53.100 | do an lr_find, and that looks kind of weird, not particularly helpful.
01:38:58.200 | Normally we would expect to see an uptick on the right.
01:39:01.420 | The reason we don't see it is because we intentionally remove the first few points and the last few
01:39:08.660 | points.
01:39:09.660 | The reason is that often the last few points shoot so high up towards infinity that you
01:39:13.940 | basically can't see anything, so the vast majority of the time removing the last few
01:39:18.540 | points is a good idea.
01:39:20.180 | However, when you've got very few mini-batches, sometimes it's not a good idea, and so a lot
01:39:25.740 | of people ask this on the forum, here's how you fix it.
01:39:28.820 | Just pass the skip arguments: by default it skips 10 points at the start, so in this case we just say 5, and by default
01:39:34.300 | it skips 5 at the end, and now we can see the shape properly.
01:39:41.380 | If your data set is really tiny, you may need to use a smaller batch size, like if you only
01:39:46.700 | have three or four batches worth, there's nothing to see.
01:39:52.220 | But in this case, it's fine, we just have to plot a little bit more.
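A sketch of the call; n_skip and n_skip_end are the fastai argument names, and the exact values here are illustrative:

```python
learn.lr_find()
learn.sched.plot(n_skip=5, n_skip_end=1)  # trim fewer points off each end
```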
01:39:57.220 | So we pick a learning rate, we say fit, after one epoch, just train the last layer, it's
01:40:05.660 | 80%, let's unfreeze a couple of layers, do another epoch, 82%, then unfreeze the whole thing,
01:40:15.740 | not really improving.
01:40:19.180 | Why are we stuck at 80%?
01:40:20.580 | It kind of makes sense, right?
01:40:24.060 | Unlike ImageNet or dogs vs cats, where each image has one major thing, they were picked
01:40:29.900 | because they had one major thing, and the one major thing is what you're asked to look
01:40:33.460 | for, a lot of the Pascal data set has lots of little things, and so a largest-object classifier
01:40:41.980 | is not necessarily going to do great.
01:40:45.860 | But of course, we really need to be able to see the results to see whether it makes sense.
01:40:54.020 | So we're going to write something that creates this, and in this case, after working with
01:41:00.620 | this a while, I know what the 20 Pascal classes are.
01:41:05.220 | So I know there's a person in the bicycle class, I know there's a dog in a sofa class,
01:41:09.220 | so I know this is wrong, it should be sofa, that's correct, bird, yes, yes, chair, that's
01:41:14.380 | wrong, I think the table's bigger, motorbike's correct because there's no cactus, that should
01:41:18.960 | be a bus, person's correct, bird's correct, cow's correct, plant's correct, cow's correct.
01:41:24.540 | So it's looking pretty good.
01:41:27.960 | So when you see a piece of code like this, if you're not familiar with all the steps
01:41:36.940 | to get there, it can be a little overwhelming.
01:41:43.660 | And I feel the same way when I see a few lines of code and something I'm not terribly familiar
01:41:47.540 | with, I feel overwhelmed as well, but it turns out there's two ways to make it super simple
01:41:53.980 | to understand the code.
01:41:57.260 | Or there's one high-level way.
01:41:58.900 | The high-level way is run each line of code step by step, print out the inputs, print out
01:42:08.140 | the outputs.
01:42:09.860 | Most of the time, that'll be enough.
01:42:13.820 | If there's a line of code where you don't understand how the outputs relate to the inputs,
01:42:17.980 | go and have a look for the source.
01:42:21.040 | So now all you need to know is what are the two ways you can step through the lines of
01:42:25.660 | code one at a time.
01:42:28.340 | The way I use perhaps the most often is to take the contents of the loop, copy it, create
01:42:37.780 | a cell above it, paste it, outdent it, write i=0, and then put them all in separate cells,
01:42:48.560 | and then run each one one at a time, printing out the inputs and outputs.
01:42:52.700 | I know that's obvious, but the number of times I actually see people do that when they ask
01:42:57.620 | me for help is basically zero, because if they had done that, they wouldn't be asking
01:43:01.700 | for help.
01:43:06.020 | Another method that's super handy and there's particular situations where it's super handy
01:43:11.260 | is to use the Python Debugger.
01:43:13.540 | Who here has used a debugger before?
01:43:17.020 | So half to two-thirds.
01:43:20.220 | So for the other half of you, this will be life-changing.
01:43:23.820 | Actually, a guy I know this morning who's actually a deep learning researcher wrote
01:43:29.540 | on Twitter, and his message on Twitter was "How come nobody told me about the Python Debugger
01:43:35.620 | before?
01:43:36.620 | My life has changed."
01:43:39.220 | And this guy's an expert, but because nobody teaches basic software engineering skills
01:43:45.660 | in academic courses, nobody thought to say to him, "Hey Mark, do you know what?
01:43:52.860 | There's something that shows you everything your code does one step at a time."
01:43:58.220 | So I replied on Twitter and I said, "Good news Mark, not only that, every single language
01:44:03.780 | in existence, in every single operating system also has a debugger, and if you Google for
01:44:08.860 | language name debugger, it will tell you how to use it."
01:44:12.660 | So there's a meta piece of information for you.
01:44:15.740 | In Python, the standard debugger is called PDB.
01:44:20.140 | And there's two main ways to use it.
01:44:22.980 | The first is to go into your code.
01:44:25.660 | And the reason I'm mentioning this now is because during the next few weeks, if you're
01:44:32.220 | anything like me, 99% of the time you'll be in a situation where your code's not working.
01:44:38.860 | And very often it will have been on the 14th mini-batch inside the forward method of your
01:44:44.780 | custom module.
01:44:46.540 | It's like, what do you do?
01:44:49.060 | And the answer is you go inside your module and you write pdb.set_trace().
01:44:54.500 | And if you know it's only happening on the 14th iteration, you type if i == 13.
01:45:04.100 | So you can set a conditional breakpoint.
01:45:09.460 | pdb is the Python debugger, fastai imports it for you; if you get the message that pdb
01:45:14.100 | is not there, then you can just say import pdb.
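A sketch of that conditional breakpoint pattern; the loop body here is a hypothetical stand-in for your own code:

```python
import pdb

for i, batch in enumerate(batches):   # hypothetical loop being debugged
    if i == 13:
        pdb.set_trace()               # drop into the debugger on iteration 14
    process(batch)                    # hypothetical work that's misbehaving
```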
01:45:17.380 | So let's try that.
01:45:19.260 | And you'll see it's not the most user-friendly experience.
01:45:22.700 | It just pops up a box.
01:45:25.460 | But the first cool thing to notice is, the debugger even works in a notebook.
01:45:29.900 | So that's pretty nifty.
01:45:30.900 | It will also work in the terminal.
01:45:35.060 | And so what can you do?
01:45:36.860 | You can type h for help.
01:45:40.020 | And there are plenty of tutorials here.
01:45:42.540 | The main thing to know is this is one of these situations where you definitely want to know
01:45:45.940 | the one-letter mnemonics.
01:45:48.100 | So you could type next, but you definitely want to type n.
01:45:51.740 | You could type continue, but you definitely want to type c.
01:45:54.860 | I've listed the main ones you need.
01:45:57.180 | So what I can do now that I'm sitting here is it shows me the line it's about to run.
01:46:06.220 | So one thing I might want to do is to print out something, and I can write any Python
01:46:14.260 | expression and hit enter to evaluate it.
01:46:20.980 | So that's a useful thing to do.
01:46:24.660 | I might want to find out more about where am I in the code more generally.
01:46:29.700 | I don't just want to see this line, but what's before it and after it, in which case I want
01:46:33.500 | L for list.
01:46:35.540 | And so you can see I'm about to run that line, these are the lines above it, and below it.
01:46:43.300 | So I might be now like, let's run this line and see what happens.
01:46:47.060 | So go to the next line, here's n, and you can see now it's about to run the next line.
01:46:55.420 | One handy tip, you don't even have to type n.
01:46:57.700 | If you just hit enter, it repeats the last thing you did, so that's another n.
01:47:02.640 | So I now should have a thing called b.
01:47:04.340 | Unfortunately, single letters are often used for debugger commands.
01:47:10.900 | So if I just type b, it'll run the b command rather than print b for me.
01:47:15.520 | So to force it to print, you use print b.
01:47:18.580 | So there's a bird.
01:47:23.980 | Alright, fine, let's do next again.
01:47:28.380 | At this point, if I hit next, it'll draw the text.
01:47:32.920 | But I don't want to just draw the text, I want to know how it's going to draw the text.
01:47:37.780 | So I don't want to know next over it, I want to s step into it.
01:47:41.660 | So if I now hit s, step into it, I'm now inside draw text, and I now hit n, I can see draw
01:47:49.220 | text, and so forth.
01:47:52.700 | And then I'm like, okay, I know everything I want to know about this, I will continue
01:47:57.420 | until I hit the next breakpoint.
01:47:59.020 | So c will continue until I'm back at the breakpoint again.
01:48:05.500 | What if I was zipping along, and this happens quite often, let's step into denorm.
01:48:13.860 | Here I am inside denorm.
01:48:15.900 | And what will often happen is if you're debugging something in your PyTorch module, and it's
01:48:22.260 | hit an exception, and you're trying to debug, you'll find yourself like six layers deep
01:48:27.780 | inside PyTorch.
01:48:29.740 | You want to actually see back up what's happening where you called it from.
01:48:34.480 | So in this case, I'm inside this property, but I actually want to know what was going
01:48:38.680 | on up the call stack, I just hit u, and that doesn't actually run anything, it just changes
01:48:46.200 | the context of the debugger to show me what called it, and now I can type things to find
01:48:56.060 | out about that environment.
01:48:59.420 | And then if I'm going to go down again, it's deep, so I'm not going to show you everything
01:49:04.980 | about the debugger, but I've just showed you all of those commands.
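For reference, the one-letter commands used in this walkthrough (all standard pdb):

```python
# n        next: run the current line, stop at the following one (step over)
# s        step: step into the function call on the current line
# c        continue: run until the next breakpoint
# l        list: show the source lines around the current position
# u / d    up / down: move through the call stack without running anything
# p expr   print an expression (a bare b would be taken as the break command)
# h        help
```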
01:49:09.740 | Yes, Azar?
01:49:12.380 | Something that we've found helpful as we've been doing this is using from IPython.core.debugger
01:49:17.020 | import set_trace, and then you get it all prettily colored.
01:49:20.620 | You do indeed, thank you.
01:49:24.280 | Excellent tip.
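That tip as code:

```python
from IPython.core.debugger import set_trace  # a syntax-highlighted set_trace
```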
01:49:26.620 | Let's learn about some of our students here.
01:49:28.220 | Azar, tell us, I know you're doing an interesting project, can you tell us about it?
01:49:32.300 | Sure.
01:49:33.300 | Okay.
01:49:34.300 | Hello everyone, I'm Azar, here with my collaborator Britt, and we're using this kind of stuff
01:49:42.340 | to try to build a Google Translate for animal communication.
01:49:48.620 | So that involves playing around a lot with unsupervised neural machine translation and
01:49:54.620 | doing it on top of audio.
01:49:56.120 | Where do you get data for that from?
01:49:58.660 | That's sort of the hard problem.
01:49:59.660 | So there you have to go, and we're talking to a number of researchers to try to collect
01:50:03.480 | and collate large data sets, but if we can't get it that way, we're thinking about building
01:50:07.740 | a living library of the audio of the species of Earth that involves going out and collecting
01:50:13.300 | 100,000 hours of gelada monkey vocalization.
01:50:16.260 | I didn't know that, that's pretty cool.
01:50:21.100 | All right.
01:50:23.220 | That's great, thanks.
01:50:24.220 | So let's get rid of that set trace.
01:50:29.000 | The other place that the debugger comes in particularly handy is, as I say, if you've
01:50:35.200 | got an exception, particularly if it's deep inside PyTorch.
01:50:38.560 | So if I, like when I times 100 here, obviously that's going to be an exception, I've got
01:50:43.060 | rid of the set trace.
01:50:44.580 | So if I run this now, something's wrong.
01:50:50.940 | Now in this case it's easy to see what's wrong, but often it's not, so what do I do?
01:50:57.940 | %debug pops open the debugger at the point the exception happened.
01:51:05.060 | So now I can check: okay, len(preds) is 64, and i times 100 -- I've got to print that because
01:51:17.700 | it clashes with a command -- is 100, oh no wonder, and you can go down, you can go up, you can list, whatever.
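That magic is just the following, run in the next cell after an exception:

```python
%debug  # opens pdb at the exact point the exception was raised
```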
01:51:28.460 | I do all of my development, both with the library and the lessons in Jupyter Notebook.
01:51:37.340 | I do it all interactively and I use percent debug all the time along with this idea of
01:51:46.540 | copying stuff out of a function, putting it into separate cells, running it step by step.
01:51:52.140 | There are similar things you can do inside, for example, Visual Studio Code.
01:51:56.340 | There's actually a Jupyter extension which says you select any line of code inside Visual
01:52:01.460 | Studio Code and say run in Jupyter, and it will run it in Jupyter and create a little
01:52:07.780 | window showing you the output.
01:52:10.460 | There's neat little stuff like that.
01:52:12.700 | Actually I think Jupyter Notebook is better, and perhaps by the time you watch this on
01:52:18.940 | the video, Jupyter Lab will be the main thing.
01:52:21.740 | Jupyter Lab is like the next version of Jupyter Notebook, pretty similar.
01:52:26.620 | Well, I just broke it totally.
01:52:42.780 | We know exactly how to fix it, so we will worry about that another time.
01:52:46.820 | We'll debug it this evening.
01:52:53.220 | So to kind of do the next stage, we want to create the bounding box.
01:53:00.740 | And now creating the bounding box around the largest object may seem like something you
01:53:05.760 | haven't done before, but actually it's totally something you've done before.
01:53:11.500 | And the reason it's something you've done before is we know that we can create a regression
01:53:21.020 | rather than a classification neural net.
01:53:23.500 | In other words, a classification neural net is just one that has a sigmoid or softmax
01:53:28.300 | output and that we use a cross-entropy or a binary cross-entropy, i.e. negative log likelihood,
01:53:35.500 | loss function.
01:53:36.500 | That's basically what makes it a classifier.
01:53:40.500 | If we don't have the softmax or sigmoid at the end, and we use mean squared error as
01:53:46.100 | a loss function, it's now a regression model, so we can now use it to predict a continuous
01:53:51.700 | number rather than a category.
01:53:55.100 | We also know that we can have multiple outputs, like in the Planet competition we did a multiple
01:54:02.500 | object classification.
01:54:06.220 | What if we combine the two ideas and do a multiple column regression?
01:54:12.140 | In this case we've got four numbers, top-left x and y, bottom-right x and y, and
01:54:19.780 | we could create a neural net with four activations.
01:54:23.340 | We could have no softmax or sigmoid and use a mean squared error loss function.
01:54:28.260 | And this is kind of like where you're thinking about it like differentiable programming.
01:54:33.260 | It's not like how do I create a bounding box model, it's like what do I need?
01:54:40.460 | I need four numbers, therefore I need a neural network with four activations.
01:54:48.540 | That's half of what I need to know.
01:54:49.860 | The other half I need to know is a loss function.
01:54:52.500 | In other words, what's a function that when it is lower means that the four numbers are
01:54:58.940 | better?
01:54:59.940 | Because if I can do those two things, I'm done.
01:55:05.980 | If the x is close to the first activation and the y is close to the second, then I'm
01:55:11.580 | done.
01:55:12.580 | So that's it.
01:55:15.100 | I just need to create a model with four activations with a mean squared error loss function, and
01:55:21.860 | that should be it.
01:55:24.300 | We don't need anything new, so let's try it.
01:55:28.160 | So again, we'll use a CSV.
01:55:32.100 | And if you remember from part 1, to do a multiple label classification, your multiple labels
01:55:40.820 | have to be space-separated, and then your file name is comma-separated.
01:55:46.660 | So I'll take my largest item dictionary, create a bunch of bounding boxes for each one separated
01:55:59.740 | by a space using a list comprehension, then create a data frame like I did before, I'll
01:56:05.020 | turn that into a CSV, and now I've got something that's got the file name and the four bounding
01:56:10.660 | box coordinates.
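A sketch of those steps, with names like trn_lrg_anno and BB_CSV following the earlier pattern (illustrative):

```python
bbs = [' '.join(str(p) for p in trn_lrg_anno[o][0]) for o in trn_ids]
df = pd.DataFrame({'fn': [trn_fns[o] for o in trn_ids], 'bbox': bbs},
                  columns=['fn', 'bbox'])  # four space-separated coords per row
df.to_csv(BB_CSV, index=False)
```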
01:56:12.380 | I will then pass that to from_csv, again I will use crop_type=CropType.NO.
01:56:23.180 | Next week we'll look at transform_type.coordinate.
01:56:26.500 | For now, just realize that when we're doing scaling and data augmentation, that needs
01:56:30.260 | to happen to the bounding boxes, not just to the images.
01:56:34.420 | ImageClassifierData.from_csv gets us to a situation where we can now grab one mini-batch of data,
01:56:43.260 | we can denormalize it, we can turn the bounding box back into a height width so that we can
01:56:47.500 | show it, and here it is.
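Something like the following sketch; continuous=True is, I believe, the fastai 0.7 flag that tells it these labels are numbers to regress on rather than categories:

```python
tfms = tfms_from_model(f_model, sz, crop_type=CropType.NO)
md = ImageClassifierData.from_csv(PATH, JPEGS, BB_CSV, tfms=tfms, continuous=True)
x, y = next(iter(md.val_dl))
ima = md.val_ds.denorm(to_np(x))[0]
b = bb_hw(to_np(y[0]))               # back to height-width form for drawing
```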
01:56:50.160 | Remember we're not doing classifications, I don't know what kind of thing this is, it's
01:56:54.180 | just a thing but there is a thing.
01:56:57.700 | So I now want to create a ConvNet based on ResNet-34, but I don't want to add the standard
01:57:08.540 | set of fully connected layers that create a classifier, I want to just add a single
01:57:15.500 | linear layer with four outputs.
01:57:18.700 | So FastAI has this concept of a custom head, if you say my model has a custom head, the
01:57:25.820 | head being the thing that's added to the top of the model, then it's not going to create
01:57:30.740 | any of that fully connected network for you, it's not going to add the adaptive average
01:57:36.860 | pooling for you, but instead it will add whatever model you ask for.
01:57:42.540 | So in this case I've created a tiny model, it's a model that flattens out the previous
01:57:49.700 | layer.
01:57:50.700 | Normally it would have a 7x7 by I think 512 previous layer in ResNet-34, so it just flattens
01:57:56.300 | that out into a single vector of length 25,088, and then I just add a linear layer that goes
01:58:02.940 | from 25,088 to 4, there's my 4 outputs.
01:58:06.860 | So that's the simplest possible kind of final layer you could add.
01:58:12.260 | I stick that on top of my pre-trained ResNet-34 model, so this is exactly the same as usual
01:58:17.480 | except I've just got this custom head.
01:58:21.280 | Optimize it with Adam, use a criteria, I'm actually not going to use MSC, I'm going to
01:58:25.260 | use L1 loss, so I can't remember if we covered this last week, we can revise it next week
01:58:30.620 | if we didn't, but L1 loss means rather than adding up the squared errors, add up the absolute
01:58:36.300 | values of the errors.
01:58:39.180 | It's normally actually what you want, adding up the squared errors really penalizes bad
01:58:45.780 | misses by too much, so L1 loss is generally better to work with.
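A sketch of the whole setup; Flatten is a fastai layer, and the 25,088 comes from 7x7x512 as described above:

```python
head_reg4 = nn.Sequential(Flatten(), nn.Linear(25088, 4))  # 7*7*512 -> 4 coords
learn = ConvLearner.pretrained(f_model, md, custom_head=head_reg4)
learn.opt_fn = optim.Adam
learn.crit = nn.L1Loss()  # mean absolute error instead of mean squared error
```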
01:58:52.500 | I'll come back to this next week, but basically you can see what we do now is we do our lr
01:58:57.460 | find, find our learning rate, learn for a while, freeze_to(-2), learn a bit more, freeze_to(-3),
01:59:06.180 | learn a bit more, and you can see this validation loss, which remember is the
01:59:12.860 | mean of the absolute value of the pixels we're off by, gets lower and lower, and then when
01:59:18.700 | we're done we can print out the bounding boxes, and lo and behold, it's done a damn good job.
01:59:27.940 | So we'll revise this a bit more next week, but you can see this idea of like if I said
01:59:34.300 | to you before this class, do you know how to create a bounding box model?
01:59:38.540 | You might have said, no, nobody's taught me that.
01:59:42.900 | But the question actually is, can you create a model with 4 continuous outputs?
01:59:49.580 | Can you create a loss function that is lower if those 4 outputs are near to 4 other numbers?
01:59:56.300 | Then you're done.
01:59:57.300 | Now you'll see if I scroll a bit further down, it starts looking a bit crappy.
02:00:02.860 | Anytime you've got more than one object.
02:00:05.140 | And that's not surprising, because how the hell do you decide which bird, so it's just
02:00:11.420 | said I'll just pick the middle, which cow, I'll pick the middle.
02:00:17.380 | How much of this is actually potted plant, I'll pick the middle.
02:00:21.580 | This one it could probably improve, but it's got close to the car, but it's a pretty weird one.
02:00:27.220 | But nonetheless, for the ones that are reasonably clear, I would say it's done a pretty good job.
02:00:33.660 | Alright, so that's it for this week.
02:00:38.780 | I think it's been a kind of gentle introduction for the first lesson.
02:00:43.900 | If you're a professional coder, there's probably not heaps of new stuff here for you.
02:00:50.060 | And so in that case I would suggest practicing learning about bounding boxes and stuff.
02:00:55.900 | If you aren't so experienced with things like debuggers and matplotlib API and stuff like
02:01:02.980 | that, there's going to be a lot for you to practice because we're going to be really
02:01:05.860 | assuming you know it well from next week.
02:01:08.340 | Thanks everybody, see you next Monday.
02:01:10.580 | (audience applauds)