Deep learning certificate "lesson 0"
Chapters
0:00 Introduction
0:55 What is deep learning
1:15 It changes everything
2:13 Google's autopilot
2:40 Babel fish
3:07 Google Inbox
5:31 Deep learning
6:01 Machine learning
9:00 Google
10:51 Yahoo
11:17 PageRank
20:54 Choreography
21:04 Data Institute
27:40 Feature Engineering
36:49 Optimization
00:00:00.000 |
So, anyway, I'm going to just briefly say my name is David Uminsky, I'm the director 00:00:09.840 |
of the analytics program and director of the Data Institute. 00:00:12.600 |
I'll say a few more words about the Data Institute upstairs, and I hope everyone will come upstairs. 00:00:19.760 |
But right now, it's my great pleasure to introduce Jeremy Howard, who is many things: a serial 00:00:25.840 |
entrepreneur whose most recent venture is Enlitic, which is bringing deep learning to medicine, 00:00:32.840 |
and before that, president and chief scientist at Kaggle. And I think 00:00:40.120 |
I'm going to leave it there because I could keep going. 00:00:42.120 |
But anyway, let's give a warm welcome to Jeremy. 00:00:59.680 |
So, you know, my passion at the moment and for the last few years has been this area of deep learning. 00:01:05.720 |
Who here has kind of come across deep learning at some point? 00:01:09.200 |
Heard of it, knows about it, maybe a little over half of you, two-thirds, okay, great. 00:01:16.520 |
It's one of these things which kind of feels like a great fad or a great marketing thing 00:01:22.680 |
or something kind of like, I don't know, big data or internet of things or, you know, all of those things. 00:01:31.480 |
But it actually reminds me of another fad, which I was really excited about in the early 00:01:37.000 |
90s and I was telling everybody it was going to be huge and that fad was called the internet. 00:01:41.280 |
And so some fads are just fads, but some fads are fads for a reason. 00:01:46.040 |
I think deep learning is going to be more important and more transformational than the internet. 00:01:57.240 |
Every one of you will be deeply impacted by deep learning, and many of us already are. 00:02:06.420 |
So before I talk about that, I want to talk about how people have viewed computers 00:02:12.140 |
for many years; people have really made computers the butt of jokes for many years. 00:02:22.240 |
So you may remember from 2009, this was Google's autopilot, which was their April Fool's joke. 00:02:29.240 |
And the butt of the joke was basically, well, of course, computers can't send email. 00:02:35.260 |
And so that was the April Fool's joke of 2009. 00:02:40.160 |
Going back further, a source of humor for Douglas Adams was the Babel Fish, which was 00:02:46.240 |
basically the idea that technology could never be so advanced as to do something this clever 00:02:53.180 |
So he came up with this idea of this fish called the Babel Fish that could translate 00:02:56.960 |
language, and so improbably useful was this thing that it was used as the proof of the non-existence 00:03:02.760 |
of God in the Hitchhiker's Guide to the Galaxy. 00:03:11.640 |
Basically the joke doesn't work anymore because your computer really can reply to your email. 00:03:17.320 |
These are actual examples of replies that have been automatically generated by Google 00:03:22.560 |
Inbox, which is a mobile app for Android and iOS and you can also access on the web. 00:03:30.400 |
And this is not some carefully curated set of responses for this particular email. 00:03:37.280 |
In fact, 15% of emails sent through Inbox by Google are now automatically created by the deep learning algorithm. 00:03:45.000 |
So it's actually being very widely used already. 00:03:50.080 |
So here's another example of what the system does. 00:03:55.640 |
And indeed the Babel Fish now exists as well. 00:04:00.240 |
You can for free use Skype's translator system to translate voice to voice for any of six languages. 00:04:15.840 |
This is actually not a genuine Van Gogh, but it's fairly impressive. 00:04:22.400 |
In fact on that I'm going to give you a little test which is to figure out which of these 00:04:29.960 |
are real paintings and drawings and which ones were done by a computer. 00:04:37.800 |
Now that you've made your decisions, I will show you. 00:04:41.160 |
The first sketch, I guess pastel sketch on the left, is done by a computer. 00:04:52.480 |
And the third one we all know about extrapolating from past events, but at this level it worked. 00:04:59.000 |
And you can see here the level of nuance that the computer has achieved, kind of 00:05:04.660 |
realizing that this piece uses a lot of lines and arcs and deciding to actually connect 00:05:13.860 |
this lady's eyebrow to her nose to her shoulder as an arc, and also noticing these kind of areas 00:05:20.720 |
of bursts of color and realizing that her hair bun would be a good place to have a burst of color. 00:05:25.720 |
It's quite a sophisticated rendition of both the style and the content. 00:05:31.680 |
So as you might have guessed, the reason that fiction has become reality and computers have 00:05:39.680 |
gone past what was previously a joke and indeed now they're generating art, which is very 00:05:44.680 |
hard to tell from real human art, is because of this thing called deep learning. 00:05:50.840 |
I don't have time today to go into detail about all of the interesting applications, 00:05:55.400 |
but I do have a talk on ted.com that you can watch if you have 18 minutes and want more detail. 00:06:02.320 |
But before we talk more about deep learning, let's talk about machine learning. 00:06:09.000 |
Deep learning is a way of doing machine learning. 00:06:11.760 |
So machine learning was invented by this guy, Arthur Samuel, in 1956. 00:06:16.680 |
This is him playing checkers against an IBM mainframe. 00:06:20.680 |
Rather than programming this IBM mainframe to play checkers, instead he got the computer 00:06:28.000 |
to play against itself thousands of times and figure out how to play effectively. 00:06:33.380 |
And after doing that, this computer beat the creator of the program. 00:06:40.440 |
So machine learning has been around for a long time. 00:06:43.280 |
The thing is, though, that until very recently you needed an Arthur Samuel to write your 00:06:50.720 |
machine learning algorithm for you; to actually get to the point that the machine could learn 00:06:55.600 |
to tackle your task took a lot of programming effort and engineering effort and also a lot 00:07:00.440 |
of domain expertise, mainly to do what's called feature engineering. 00:07:07.880 |
But something very interesting has happened more recently, which is that we have the three 00:07:13.360 |
pieces that at least in theory ought to make machine learning universal. 00:07:18.880 |
So imagine if you could get a computer to learn and it could learn any type of relationship. 00:07:25.440 |
Now when you see the word function as a mathematical function, you might think of something like a line or a parabola. 00:07:31.080 |
But I mean function in the widest possible sense. 00:07:34.120 |
Like the function that translates Russian into Japanese, or the function that recognizes what's in a photo. 00:07:44.200 |
That's what I mean by an infinitely flexible function. 00:07:46.920 |
So imagine if you had that and you had some way to fit the parameters of that function to your data. 00:07:55.760 |
It could model anything that you could come up with such as the two examples I just gave. 00:08:01.560 |
You just need one more piece, which is the ability to do that quickly and at scale. 00:08:07.000 |
And if you had those three things, you would have a totally general learning system. 00:08:17.240 |
Deep learning is a particular algorithm for doing machine learning, which has these three 00:08:24.040 |
The infinitely flexible function is the neural network, which has been around for a long time. 00:08:29.600 |
The all-purpose parameter fitting is backpropagation, which has been around since the, really since 00:08:35.800 |
1974, but was not noticed by the world until 1986. 00:08:40.760 |
Until very recently, though, we didn't have this. 00:08:43.300 |
And the fast and scalable has recently come along for various reasons, including the advances 00:08:48.320 |
in GPUs, which are used mainly to play computer games but also turn out to be perfect for 00:08:53.160 |
deep learning, a wider availability of data, and some vital improvements to the algorithms themselves. 00:09:03.560 |
Jeff Dean presented this from Google last week, showing how often deep learning is now being used across Google. 00:09:15.840 |
And you can see this classic hockey stick shape showing an exponential growth here. 00:09:23.160 |
Google are amongst the first, or maybe the first, at really picking up on using this technology. 00:09:30.360 |
What Google did was they set aside a group of 00:09:35.600 |
people, and they said, go to different parts of Google, tell them about deep learning, and help them use it. 00:09:41.040 |
And from my understanding of the people I know, everywhere they went, it worked. 00:09:48.600 |
And of course, the people that that original team talked to are now talking to other people. 00:09:54.280 |
So when I say deep learning changes everything, I certainly would expect that to be true in your organizations. 00:10:00.400 |
Every aspect of your organization can probably be touched effectively by this. 00:10:05.680 |
An example: when Google wanted to map the location of every residence and business in France, 00:10:14.880 |
they basically grabbed the entire Street View database. 00:10:18.940 |
These are examples of pictures from the Street View database, and they built a deep learning 00:10:22.080 |
system that could identify house numbers and could then read those house numbers. 00:10:27.080 |
And an hour later, they had mapped the entirety of the country of France. 00:10:31.520 |
This is obviously something that previously would have taken hundreds of people many years. 00:10:36.520 |
And this is one of the reasons that, particularly for startups, and here in the Bay Area, this 00:10:40.680 |
is important, deep learning really does change everything, because suddenly a startup can 00:10:45.760 |
do things that previously required huge amounts of resources. 00:10:52.360 |
So we've kind of seen a little bit of this before. 00:10:55.100 |
What happens when an algorithm comes along that makes a big difference? 00:11:06.600 |
Eighty percent of home pages were Yahoo back in the day. 00:11:12.840 |
And Yahoo was manually curated by expert web surfers. 00:11:18.080 |
And then this company came along and replaced the expert web surfers with a machine learning algorithm, PageRank. 00:11:27.320 |
Now this was an algorithm that, compared to deep learning, is incredibly limited and simple 00:11:33.880 |
But if you think about the impact that that algorithm had on Yahoo, well, think about 00:11:39.680 |
the impact that the collaborative filtering algorithm had on Amazon versus Barnes 00:11:44.080 |
and Noble, now that we have really successful recommendation systems. 00:11:47.320 |
You can see how even relatively simple versions of machine learning have had huge commercial impact. 00:12:02.560 |
A paper last year showed that deep learning is able to recognize the content of photos. 00:12:10.600 |
This is something called the ImageNet dataset, which is one and a half million photos. 00:12:17.160 |
And a very patient human had actually spent time trying to classify thousands of these 00:12:23.720 |
photos and tested themselves and found that they had a 5% error rate. 00:12:28.640 |
And last year it was announced by Microsoft Research that they had a system which was more accurate than that human. 00:12:37.960 |
In fact, this number is now down to about 3%. 00:12:44.440 |
So with deep learning, computers can now see. 00:12:49.520 |
And they can see in a range of interesting ways. 00:12:51.560 |
Anybody here from China will probably recognize Baidu Shutu. 00:12:55.480 |
And on Baidu Shutu, which is a part of the popular kind of Google competitor-- well, 00:13:01.840 |
not really a competitor, since Google's not there. 00:13:07.000 |
You can upload a picture, which is what I did here. 00:13:11.040 |
And it has come up with all of these similar images. 00:13:16.000 |
So it figured out the breed of the dog, the composition, the type of the background, the 00:13:20.480 |
fact that it's had its tongue hanging out, and so forth. 00:13:23.200 |
So you can see that image analysis is a lot more than just saying it's a dog (which is 00:13:28.320 |
what the Chinese text at the top says: it's a golden retriever); it's really understanding the image. 00:13:32.920 |
And I'll give you some examples of some of the extraordinary things that this allows us to do. 00:13:38.640 |
Speaking about Baidu, they have now announced that they can recognize speech more accurately 00:13:44.720 |
than humans, in Chinese and English at least. 00:13:49.160 |
So we now have computers at a point where last year they can recognize pictures better 00:13:53.440 |
than us, and now they can recognize speech better than us. 00:14:00.360 |
Microsoft have this amazing system using deep learning where you can take a picture in which 00:14:05.280 |
large bits have been cut off; in this case it was a panorama that was done quite badly. 00:14:10.520 |
And the bottom shows how it has automatically filled in its guess as to what the rest might look like. 00:14:17.000 |
And so this is taking image recognition to the next level, which is to say can I construct 00:14:21.480 |
an image which would be believable to an image recognizer. 00:14:25.160 |
This is part of something called generative models, which is a huge area right now. 00:14:29.000 |
And again, this is a freely available software that you can download off the internet. 00:14:40.600 |
If I had a deep learning system here, I probably could have looked that up. 00:14:46.480 |
So generative models are kind of interesting. 00:14:48.240 |
This is like in some ways more quirky than anything else, but I think it's fascinating. 00:14:52.320 |
These pictures here, the four corners are actual photos. 00:14:56.360 |
The ones in the middle are generated by a deep learning algorithm to try and interpolate between them. 00:15:02.240 |
But what you can do more than that is you can then say to the deep learning algorithm, 00:15:07.700 |
what would this photo look like if the person was feeling differently? 00:15:18.680 |
I mean, the interesting thing here is you can see it's doing a lot more than just plastering a smile on top. 00:15:23.840 |
You know, their eyes are smiling, their faces are moving. 00:15:26.440 |
We can even take some famous paintings and slightly change how they're looking. 00:15:38.000 |
And you can see as she's moving her eyes up and down, again, her whole face is moving. 00:15:43.080 |
One of the interesting things about this Mona Lisa example was that this system 00:15:49.920 |
was originally trained without having any paintings in the training set. 00:15:56.320 |
And one of the interesting things about deep learning is how well it can generalize to things it has never seen. 00:16:02.040 |
In this case, it turns out that it knows how to generate different face movements for paintings. 00:16:11.560 |
A lot of people think that deep learning is just about big data. 00:16:15.280 |
Ilya Sutskever from OpenAI presented last week a new model in which he showed that on 00:16:21.120 |
a very famous data set called MNIST, which we'll learn about more shortly. 00:16:25.680 |
But it's basically trying to recognize digits. 00:16:29.880 |
It's a very old, classic machine learning problem. 00:16:35.200 |
He discovered that with just 50 labeled images of digits, he could train a 99% accurate digit recognizer. 00:16:49.480 |
And so these recent advances that allow us to use small amounts of data are something 00:16:52.920 |
that's really changing what's possible with deep learning. 00:16:58.080 |
It's also turning really anybody into an artist. 00:17:00.720 |
There's a thing called neural doodle that allows you to whip out your stylus and jot 00:17:05.400 |
down some sophisticated imagery like this and then say how you would like it rendered, 00:17:10.280 |
In this case, it was being rendered as impressionism. 00:17:13.680 |
You can see it's done a pretty good job of generating an image which hopefully fits what 00:17:19.160 |
the original artist had in their head with their original doodle. 00:17:25.720 |
And it's not just about images, it's about text as well, or even combining the two. 00:17:32.800 |
These sentences are totally novel sentences constructed from scratch by a deep learning algorithm. 00:17:41.640 |
So you can see that in order to construct this, the deep learning algorithm must have 00:17:45.840 |
understood a lot about not just what the main objects in the picture are, but how they relate to each other. 00:17:58.200 |
I got so excited about this that three years ago, I left my job at Kaggle and spent a year 00:18:04.760 |
researching what are the biggest opportunities for deep learning in the world. 00:18:08.880 |
I came to the conclusion that the number one biggest opportunity at that time was medicine. 00:18:17.960 |
There were four of us, all computer scientists and mathematicians, no medical people on the team. 00:18:26.640 |
And within two months, we had a system for radiology which could predict the malignancy 00:18:33.040 |
of lung cancer more accurately than a panel of four of the world's best radiologists. 00:18:39.960 |
This was kind of very exciting to me because it was everything that I hoped was possible. 00:18:46.040 |
It was also always somehow surprising when you actually run a model and it's classifying 00:18:52.840 |
cancer and you genuinely have no idea how it did it, because of course all you do is 00:18:57.040 |
set up the kind of situation in which it can learn, and then it does that learning. 00:19:04.160 |
So this turned out to be very successful, and Enlitic today has raised $15 million. 00:19:09.280 |
It's a pretty successful company, and one of the things I mentioned earlier, that Baidu 00:19:15.560 |
Shutu example of taking a picture and finding similar pictures, is doing big things in medicine. 00:19:24.320 |
It basically allows radiologists to find previous patients from a database of millions of CT 00:19:30.080 |
scans and MRIs to find the people that have medical imagery just like the patient that 00:19:34.680 |
they're interested in and then they can find out exactly the path of that patient, how 00:19:38.920 |
did they respond to different drugs, so forth. 00:19:42.240 |
So this kind of semantic search of imagery is a really exciting area. 00:19:48.600 |
So one thing interesting about my particular CV when it comes to creating the first deep 00:19:57.120 |
learning medical diagnostic company is not so much what I've done but perhaps what I haven't. 00:20:04.520 |
And so that's the entirety of my actual biology, life sciences and medicine experience. 00:20:11.640 |
And one of the exciting things to those of you who are entrepreneurs or interested in 00:20:15.480 |
being entrepreneurs is that there are no limits as to what you can hope to do. 00:20:21.800 |
You recognize a problem that you want to solve and that you care about, and that hopefully 00:20:26.920 |
maybe hasn't been solved that well before, and have a go; really, you can do a lot. 00:20:34.960 |
In my case, once I kind of showed that we could do some useful stuff in oncology and 00:20:41.560 |
we got covered by CNN on one of the TV shows, suddenly the medical establishment 00:20:47.440 |
kind of came to us, at which point we got a lot of help from the medical establishment 00:20:50.920 |
as well, so you kind of get this nice feedback loop going on. 00:20:56.440 |
So most importantly, deep learning can also do choreography. 00:21:02.120 |
So if you're excited about this and think this all sounds interesting, you might be wondering where you can learn it. 00:21:11.880 |
And the answer you won't be surprised to hear is the Data Institute. 00:21:17.520 |
We haven't previously announced this, but I'm going to announce it now: the first, to 00:21:23.160 |
our knowledge, the first ever in-person university-accredited deep learning certificate will 00:21:30.660 |
be here at the Data Institute, and the second lesson will start in late October. 00:21:40.360 |
You might be wondering when the first lesson is and the answer is it's right now. 00:21:48.700 |
So you came to university, come on, you've got to expect to be studying here. 00:21:54.800 |
So what I'm going to show you now is the first lesson. The certificate course will be seven weeks of two-and-a-half-hour sessions. 00:22:03.080 |
We don't have two and a half hours right now, so this will by necessity be heavily compressed. 00:22:07.360 |
So if this doesn't make as much sense as you might like it to, don't worry; the MSAN students 00:22:12.200 |
will certainly follow along fine, but I'll try and make it as clear as possible. 00:22:19.780 |
One of the things that I strongly believe is that deep learning is easy. 00:22:23.640 |
It is made hard by people who put way more math into it than is necessary and also by 00:22:30.120 |
what I think is a desire for exclusivity amongst the kind of deep learning specialists. 00:22:36.960 |
They make up crazy new jargon about things that are really very simple. 00:22:41.080 |
So I want to kind of show you how simple it can be. 00:22:43.920 |
And specifically we're going to look at MNIST, the data set I told you about, which is about recognizing digits. 00:22:51.080 |
And I'm going to use a system called Jupyter Notebook. 00:22:55.140 |
For those of you that don't code, I hope the fact that this is done in code isn't too off-putting. 00:22:59.400 |
You certainly don't need to use code for everything, but I find it a very good way to kind of show what's going on. 00:23:06.360 |
So I'm going to have to make sure that we actually have this thing running. 00:23:23.360 |
And so the data, the MNIST data has 55,000 28x28 images in it. 00:23:37.080 |
As you can see it is a 28x28 picture, and as well as the image we also have labels, which tell us what number each image is. 00:23:50.640 |
And so you can see that, as is common with pretty much every machine learning dataset, 00:23:57.020 |
you have some information that you're given and then some information that you want to predict. 00:24:01.240 |
So in this case the goal of this dataset is to take a picture of a number and return what number it is. 00:24:11.600 |
So here's the first five pictures and the first five numbers that go with each one. 00:24:17.840 |
So this was originally generated by NIST, and they basically had thousands of people 00:24:24.240 |
draw lots of numbers, and then somebody went through and coded into a computer what each number was. 00:24:29.240 |
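The notebook itself isn't reproduced in this transcript, so here is a minimal sketch of loading MNIST and showing the first five digits with their labels. It assumes scikit-learn's OpenML fetcher; the original notebook used a different loader, with a 55,000-image training split rather than the full 70,000 images.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml

# Fetch MNIST: 70,000 handwritten digits, each a 784-long row of pixel values
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
images = mnist.data.reshape(-1, 28, 28) / 255.0  # each image is 28x28, scaled to 0..1
labels = mnist.target.astype(int)                # the number each image represents

# Show the first five pictures and the numbers that go with each one
fig, axes = plt.subplots(1, 5)
for ax, im, lab in zip(axes, images[:5], labels[:5]):
    ax.imshow(im, cmap='gray')
    ax.set_title(str(lab))
    ax.axis('off')
plt.show()
```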
So I'm going to show you some interesting things we can do with pictures. 00:24:33.320 |
The first thing I'm going to do is I'm going to create a little matrix here that I've called "top". 00:24:37.040 |
And as you can see I've got minus ones at the top of it and then ones and then zeros at the bottom. 00:24:45.720 |
And what I'm going to show you is, in fact I want you to think about something, which 00:24:49.640 |
is: what would happen if I took that matrix and basically shifted it 00:24:57.200 |
over this first image? I'm going to take this three by three and I'm going to put it 00:25:01.240 |
right at the top left, and I'm going to move it right a bit, and move it right 00:25:04.120 |
a bit, and go all the way to the end, then start back at the left and work all the way down. 00:25:08.400 |
And at each point, as it's kind of overlapping a three by three area of pixels, I want 00:25:13.160 |
to take the value of each pixel and multiply it by the equivalent value in the matrix, and add them all up. 00:25:22.240 |
And just to give you a sense of what that looks like, here on the right is a low-res photo. 00:25:30.520 |
Here on the left is how that photo is represented as numbers. 00:25:33.480 |
So you can see here where it's black there are low numbers, in the 20s, and where it's bright there are high numbers. 00:25:41.840 |
So that's how pictures are stored in your computer. 00:25:45.080 |
And then you can see here we've got an example of a particular matrix and basically we can 00:25:53.240 |
multiply every one of these sets of three pixels by the three things in that matrix 00:25:59.480 |
and you get something that comes out on the right. 00:26:04.120 |
So in this case, we're going to take this picture and multiply it by this matrix. 00:26:08.760 |
And so to make life a little bit easier for ourselves, let's try and zoom in to a little section. 00:26:14.400 |
So here's our original picture, the first picture, and let's zoom into the top left-hand corner. 00:26:29.360 |
All right, so let's think about what would happen if we took that three-by-three picture 00:26:34.000 |
and it was over here, or if it was over here, what would happen? 00:26:38.560 |
So I want you to try and have a guess at what you think is going to happen to each one of 00:26:42.120 |
these pixels; we don't actually have very much room here. 00:26:51.400 |
So what I've done here is I've printed out the actual value of each one of those pixels. 00:26:55.400 |
So you can see at the top it's all black, it's all zeros, and in that bit where there's 00:27:00.040 |
a little bit of the seven poking through, there are some numbers that go up to one. 00:27:06.120 |
So let's try, it's called correlating, by the way, let's try correlating my top filter with this image. 00:27:17.360 |
So here's the result, and you can see at the top it's all zeros. 00:27:21.520 |
And up here we've got some high numbers, and down here we've got some low numbers. 00:27:30.480 |
Did you figure out what that was going to look like? 00:27:32.880 |
So you can see basically what it's done, if we look at the whole picture, is it has highlighted the top edges. 00:27:42.400 |
We've taken something incredibly simple, which is this 3x3 matrix. 00:27:47.560 |
We've multiplied it by every 3x3 area in our picture, and each time we've added it up. 00:27:52.360 |
And we've ended up with something that finds top edges. 00:27:56.480 |
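As a sketch, assuming scipy and the `images` array from the loading snippet above, that whole top-edge step is:

```python
import numpy as np
from scipy.ndimage import correlate

# Minus ones at the top, then ones, then zeros: a "top edge" filter
top = np.array([[-1, -1, -1],
                [ 1,  1,  1],
                [ 0,  0,  0]], dtype=float)

# Slide the 3x3 filter over every 3x3 patch, multiplying elementwise and
# summing; places where a bright row sits below a dark row light up
top_edges = correlate(images[0], top)
```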
And so before deep learning, this is part of what we would call feature engineering. 00:28:00.400 |
This is basically where people would say, "How do you figure out what kind of number this is? 00:28:05.360 |
Well, maybe one of the things we should do is find out where the edges of it are. 00:28:11.040 |
So we're going to keep doing this a little bit more. 00:28:13.320 |
So one of the things we could do is look at other kinds of edges. 00:28:19.580 |
You can basically take a matrix and say, "Rotate it by 90 degrees n times." 00:28:24.460 |
So if I rotate it by 90 degrees once, I now have something which looks like this. 00:28:31.960 |
And if I do that for every possible rotation, you can see that basically gives me four different edge filters. 00:28:42.920 |
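In code, the rotation trick is one line (`top` is the filter from the sketch above; `straights` is an assumed name):

```python
# Rotate the top filter by 90 degrees k times to get all four
# straight-edge filters (top, left, bottom, right, in some order)
straights = [np.rot90(top, k) for k in range(4)]
```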
So a word that you're going to hear a lot is convolutional neural networks, because convolutional 00:28:47.280 |
neural networks is basically what all image recognition today uses. 00:28:53.040 |
And the word convolution is one of these overly complex words, in my opinion. 00:28:56.880 |
It actually means the same thing as correlation. 00:28:58.940 |
The only difference is that convolution means that you take the original filter and you flip it around first. 00:29:06.760 |
I've said convolve my image by my top filter rotated by 90 degrees and plot it. 00:29:15.240 |
So when you hear people talk about convolutions, this is actually all they mean. 00:29:18.960 |
They're basically multiplying it by each area and adding it up. 00:29:24.160 |
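A sketch of that equivalence, using scipy's ndimage functions: for an odd-sized kernel, convolution is exactly correlation with the filter rotated 180 degrees.

```python
from scipy.ndimage import convolve, correlate

# Convolution == correlation with the filter flipped (rotated 180 degrees)
assert np.allclose(convolve(images[0], top),
                   correlate(images[0], np.rot90(top, 2)))
```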
So we can do the same thing for diagonal edges. 00:29:27.840 |
So here I've built four different diagonal edges. 00:29:31.520 |
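The exact diagonal matrices aren't shown in the transcript, so the values below are an assumed stand-in, built by the same rotation trick:

```python
# A plausible diagonal-edge filter (assumed values), rotated to get all four
diag = np.array([[ 0,  1,  1],
                 [-1,  0,  1],
                 [-1, -1,  0]], dtype=float)
diagonals = [np.rot90(diag, k) for k in range(4)]

filts = straights + diagonals  # eight filters in total
```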
And then I could try taking our first image and correlating it with every one of those. 00:29:37.960 |
And so here you can see I've got a correlation with the top, with the left, bottom, right, and each of the diagonals. 00:29:46.560 |
Why is this useful? Well, basically, this is a kind of feature engineering. 00:29:50.520 |
We have found eight different ways of thinking about the number seven, or this particular image of a seven. 00:29:57.520 |
And so what we do with that in machine learning is we want to basically create a fingerprint 00:30:02.680 |
of what does a seven tend to look like on average. 00:30:06.280 |
And so in deep learning, to do that, we tend to use something called max pooling. 00:30:11.720 |
And max pooling is another of these complex sounding things that is actually ridiculously 00:30:16.600 |
And as you can see in Python, it's actually a single line of code. 00:30:19.240 |
What we're going to do is we're going to take each seven by seven area, because these are 00:30:22.520 |
28 by 28, so that'll give us a four by four grid of seven by seven areas, and find the value of the brightest pixel in each. 00:30:34.320 |
So you can see that this top edge one, there's some really big numbers here. 00:30:41.040 |
You can see that for the bottom left edge, there's very little that is bright. 00:30:45.760 |
So this is kind of like a fingerprint of this particular image. 00:30:53.320 |
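A sketch of that single line, assuming 28x28 numpy images:

```python
# Max pooling: split the 28x28 image into a 4x4 grid of 7x7 blocks
# and keep only the brightest pixel in each block
def pool(im):
    return im.reshape(4, 7, 4, 7).max(axis=(1, 3))  # -> 4x4 "fingerprint"
```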
So I'm going to use this now to create something really simple. 00:30:55.780 |
It's going to figure out the difference between eights and ones, because that just seems like an easy place to start. 00:31:02.200 |
So I'm going to grab all of the eights out of our MNIST data set, and all of the ones, 00:31:07.200 |
and I'm going to show you a few examples of each of them. 00:31:09.400 |
OK, so there's some eights and there's some ones. 00:31:12.280 |
Hopefully, one of the things you're seeing here is that if you're not somebody who codes 00:31:16.560 |
or maybe you used to and you don't much anymore, it's very quick and easy to code. 00:31:21.600 |
Like these things are generally like one short line, you know, it doesn't take lots of mucking 00:31:26.600 |
around like it used to back in the days of writing C code. 00:31:30.680 |
So what I'm going to do now is I'm going to create this max pooling fingerprint for every one of those images. 00:31:42.560 |
And then what I can do is I'll show you the first five of them. 00:31:46.520 |
So here are the first five eights that are in our data set and what their little fingerprints look like. 00:31:57.680 |
So what I can now do is I can basically say: tell me what the average one of those fingerprints looks like. 00:32:08.880 |
I'm going to take the mean across all of the eights that have been pooled. 00:32:18.880 |
These eight pictures here are the average of the top edge, the left edge, 00:32:25.680 |
bottom edge, right edge, and so forth for all of the eights in our dataset. 00:32:36.080 |
And so, first of all, I'll repeat the exact same process 00:32:40.680 |
for the ones, and hopefully we'll be able to see that there are some differences. 00:32:47.240 |
You can see that the ones basically have no diagonal edges, right? 00:32:51.000 |
So it's all very light gray, but they have very strong vertical edges. 00:32:56.560 |
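Putting the pieces above together, a sketch of building those per-class average fingerprints might look like this (`fingerprint` is an assumed helper name, not the notebook's):

```python
# Correlate each image with all eight filters, pool, then average per class
def fingerprint(im):
    return np.array([pool(correlate(im, f)) for f in filts])  # shape (8, 4, 4)

eights = images[labels == 8]
ones   = images[labels == 1]

avg8 = np.mean([fingerprint(im) for im in eights], axis=0)
avg1 = np.mean([fingerprint(im) for im in ones], axis=0)
```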
So what we're hoping is that we can use this insight now to recognize eights versus ones 00:33:01.900 |
and have our own little digit recognizer. 00:33:05.920 |
So the way we're going to do that is we are going to correlate for every image in our 00:33:14.080 |
data set, we're going to correlate it with each of these parts of the fingerprint, basically. 00:33:20.960 |
That's what this single line of code here does. 00:33:25.320 |
So here's an example of taking the very first one of our eights and seeing how well it correlates with each part of the fingerprint. 00:33:39.200 |
So we're basically at the point where we can now put all this together. 00:33:43.360 |
So what I'm going to do is I'm going to basically say, all right, I'm going to decide whether each image is an eight or a one. 00:33:49.440 |
So I've got this function called "is it an eight?". 00:33:54.660 |
It uses the sum of squared errors, which I won't bother explaining to you, but a lot 00:33:58.120 |
of you probably already know what that is. 00:34:00.720 |
So basically, whichever it's closer to, the filters for being a one or the filters for 00:34:07.680 |
being an eight, that's how I'm going to decide whether it's going to be a one or an eight. 00:34:13.880 |
So I'm just going to test it, and I'm going to say if the error is higher for the ones, then it's an eight. 00:34:31.880 |
And I've tested to see my error for the eight filters and my error for the one filters. 00:34:37.280 |
And you can see my error for the eight filters is lower than my error for the one filters. 00:34:46.600 |
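A sketch of that test; the names `sse` and `is_it_an_8` are assumed, not the notebook's:

```python
def sse(a, b):
    """Sum of squared errors between two fingerprints."""
    return ((a - b) ** 2).sum()

def is_it_an_8(im):
    """1 if the image's fingerprint is closer to the average eight, else 0."""
    fp = fingerprint(im)
    return int(sse(fp, avg8) < sse(fp, avg1))
```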
We can now calculate "is it an eight?" for our entire data set of eights and ones, 00:34:55.280 |
and, for our entire data set of eights and ones, one minus "is it an eight?". 00:35:01.220 |
So as you can see, this is taking now a little while to calculate, because it's basically doing that correlation for thousands of images. 00:35:09.080 |
So for "is it an eight?", 5,200 times it said yes when it was an eight, and 287 times it said yes when it wasn't. 00:35:20.240 |
It has successfully found something that can recognize a difference. 00:35:28.120 |
And 8,900 times when it was a one it said it's a one, and 166 times it said it's a one when it wasn't. 00:35:35.420 |
So these four numbers here are called a classification matrix. 00:35:40.400 |
And when data scientists build these machine learning models, this is basically the thing 00:35:43.800 |
that we tend to look at to decide whether they're any good or not. 00:35:48.040 |
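In sketch form, the four numbers of that matrix are just counts of predictions over each class:

```python
# Tally the classification matrix: correct and incorrect counts per class
preds_for_8s = [is_it_an_8(im) for im in eights]
preds_for_1s = [is_it_an_8(im) for im in ones]

said_8_was_8 = sum(preds_for_8s)                      # eights called eights
said_1_was_8 = len(preds_for_8s) - said_8_was_8       # eights called ones
said_1_was_1 = len(preds_for_1s) - sum(preds_for_1s)  # ones called ones
said_8_was_1 = sum(preds_for_1s)                      # ones called eights
print(said_8_was_8, said_1_was_8, said_1_was_1, said_8_was_1)
```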
That's the entirety of building a simple machine learning approach to image recognition. 00:35:56.600 |
I'm sure you guys have lots of examples of how to make it better. 00:36:00.000 |
And one obvious way to make it better would be to not use the crappy first attempt I had 00:36:05.880 |
at-- this was literally the first eight features I just came up with. 00:36:10.400 |
I'm sure there's a lot of much better features we could be using. 00:36:14.600 |
Specifically, there's a lot of much better three by three matrices. 00:36:21.440 |
So that would be one step would be to make these better. 00:36:23.720 |
Another would be that it doesn't really make sense that we're treating all of the filters as equally important. 00:36:36.400 |
More importantly though, wouldn't we like to be able to say, I don't just want to look 00:36:41.160 |
for a straight edge or a horizontal edge, but I want to look for something more complex, 00:36:46.280 |
I'd love to be able to find corners just here. 00:36:50.080 |
Deep learning is a thing that takes this and does all of those things. 00:36:54.560 |
And the way it does it is by using something called optimization. 00:36:58.960 |
Basically what we do is rather than starting out with eight carefully planned filters like 00:37:03.400 |
these, we actually start out with eight random filters or 100 random filters. 00:37:11.480 |
And we set up something that tries to make those filters better and better and better. 00:37:18.400 |
But rather than optimizing filters, we are going to optimize a simple line. 00:37:23.960 |
A lot of you have probably looked at linear regression some time in your life. 00:37:27.760 |
And we're going to do linear regression, but the deep learning way. 00:37:34.720 |
The definition of a line is something that takes a slope a, an intercept b, and an x value, and returns ax plus b. 00:37:45.720 |
Probably everybody has done that amount of math at the very least. 00:37:51.320 |
Again, it looks like I'm going to have to restart this guy. 00:38:03.560 |
So after we define a line, let's actually set up some data. 00:38:08.760 |
So let's say our actual a is three and our actual b is eight. 00:38:14.440 |
So we're now going to create some random data. 00:38:17.800 |
Okay, and for x, it's just going to be a random number. 00:38:21.040 |
And then for y, it will be the correct value of y based on this line, okay. 00:38:27.640 |
So here is my x values and here is my y values. 00:38:34.880 |
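A minimal sketch of that setup (the values 3 and 8 are from the talk; everything else is assumed naming):

```python
import numpy as np

def lin(a, b, x):
    """The definition of a line: slope a, intercept b."""
    return a * x + b

a_true, b_true = 3.0, 8.0   # the "actual" values we'll pretend to forget
x = np.random.rand(30)      # some random x values
y = lin(a_true, b_true, x)  # the correct y for each x
```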
The machine learning goal, if you were given this data, would be: forget that you ever knew 00:38:39.720 |
that the correct values were a equals three and b equals eight, and figure them out from the data. 00:38:45.500 |
This is the equivalent of figuring out what the optimal set of filters is for my image recognition problem. 00:38:52.840 |
It's basically the same thing, but in that case our filters gave us quite 00:38:57.040 |
a few parameters, whereas here we just have two of them, to make the reasoning simpler. 00:39:02.400 |
But actually it's going to be exactly the same, totally identical as to how this works. 00:39:06.320 |
So once you know how to do this, you'll know how to do that deep learning thing I just 00:39:10.040 |
described of actually optimizing these filters. 00:39:14.120 |
So to do it, we do something very similar to what we had before. 00:39:17.120 |
We basically have to define how do we know whether our prediction is good or not. 00:39:22.280 |
And so we'll basically say our prediction is good if the squared error, again we're 00:39:27.920 |
using the squared error thing, is low rather than high, okay. 00:39:34.120 |
So our loss function, every deep learning algorithm has a loss function, will be the errors based 00:39:39.600 |
on the y values that we actually have versus the result of applying our linear function. 00:39:51.720 |
So I've just decided let's start at guessing that a is minus 1 and guessing that b is positive 1. 00:39:57.240 |
Okay, so if that were the case, what would my average loss be? 00:40:01.540 |
And so this says on average, Jeremy, you would have been out by 8.6, okay. 00:40:11.240 |
I basically figure out, for each 00:40:17.820 |
of my a guess and my b guess: if I make it a little bit higher or a little bit lower, would my loss function go up or down? 00:40:23.640 |
I've actually got a nice little Excel spreadsheet that actually does this. 00:40:28.980 |
I won't go through it in detail now, but basically I've done exactly the same thing. 00:40:37.200 |
I've got my sum of squared errors, and you can see I've literally computed: what's the change in the error 00:40:46.580 |
if I add 0.01 to a, and what's the change if I add 0.01 to b, okay? 00:40:51.180 |
And so if I divide that error by my 0.01, that gives me what's known as the derivative. 00:40:56.400 |
So anybody who's done calculus will, of course, recognize this. 00:40:59.720 |
So all I need to do now is say: okay, what happens to my loss function if I change each parameter by a bit? 00:41:06.720 |
If I increase a by a bit, my loss function goes down. 00:41:09.720 |
If I increase b by a bit, my loss function goes down. 00:41:13.920 |
Therefore, I should increase a and b by a bit. 00:41:29.880 |
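A sketch of the loss and of the spreadsheet's nudge-by-0.01 derivative estimate; the function names are assumed, and the loss here is the mean squared error:

```python
# Loss: average squared error between actual y and our line's prediction
def avg_loss(a, b):
    return np.mean((y - lin(a, b, x)) ** 2)

a_guess, b_guess = -1.0, 1.0
eps = 0.01  # the spreadsheet's small nudge

# Finite-difference estimates of the derivative: nudge each parameter
# by 0.01 and divide the change in the loss by 0.01
dloss_da = (avg_loss(a_guess + eps, b_guess) - avg_loss(a_guess, b_guess)) / eps
dloss_db = (avg_loss(a_guess, b_guess + eps) - avg_loss(a_guess, b_guess)) / eps
```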
So here is the entirety of how to do an optimization from scratch. 00:41:37.400 |
So it's basically saying, okay, calculate my predicted y. 00:41:41.040 |
That's just my linear function with my a guess and my b guess and my x, okay? 00:41:49.160 |
And you'll see in this case, I'm not doing it that slow way of adding 0.01, but everybody 00:41:53.440 |
who's done calculus will know there's always a shortcut in calculus to doing things quickly. 00:41:59.700 |
And in case you're thinking that if you want to do deep learning you're going to have 00:42:02.440 |
to remember all of your rules of calculus: you don't. 00:42:09.620 |
In real life, if you need a derivative, you go to wolframalpha.com, and you type in the 00:42:15.680 |
thing that you want the derivative of, and you press enter. 00:42:25.000 |
You double-click that, copy it, and you paste it, okay? 00:42:32.360 |
And then I pasted that into my code, all right? 00:42:40.440 |
So I bet you're all glad you spent lots of time learning those stupid rules. 00:42:44.880 |
So okay, now that I've done that, don't worry too much about this code, but basically what 00:42:48.800 |
I'm going to do now is I'm going to animate what happens as we call this update function 00:42:53.040 |
40 times, okay, starting with a guess of a of minus 1 and a guess of b of 1. 00:43:01.240 |
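A sketch of that update function, using the calculus shortcut (the analytic derivatives of the mean squared error); the learning rate is an assumed value:

```python
lr = 0.01  # step size (an assumed value)

def update():
    """Move a_guess and b_guess a small step in the downhill direction."""
    global a_guess, b_guess
    err = lin(a_guess, b_guess, x) - y
    a_guess -= lr * 2 * (err * x).mean()  # d(avg_loss)/da
    b_guess -= lr * 2 * err.mean()        # d(avg_loss)/db

for _ in range(40):  # the 40 calls that the animation plots
    update()
```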
And I'm going to plot the original data and my line, and let's see what happens. 00:43:10.800 |
So I'd have to run it for a little bit longer, of course, if it was going to fit exactly. 00:43:14.520 |
But you can see that the line is getting closer and closer to the data. 00:43:18.360 |
So just imagine now taking this idea and doing it for each of these filters, what would happen? 00:43:30.560 |
Imagine if you didn't just have these filters, but imagine if these filters themselves became the inputs to another layer of filters. 00:43:39.080 |
That would allow you to create a corner, because it could say, "Oh, a bit of top edge and a bit of left edge together make a corner." 00:43:44.200 |
Assuming that the thing did actually decide that edges were interesting. 00:43:48.240 |
So it turns out, and this is why the little guys here are celebrating that we've 00:43:55.620 |
just successfully learned about deep learning, it turns out that somebody did this. 00:44:04.600 |
They created lots and lots of layers optimized in exactly this way. 00:44:10.360 |
This is not some super dumbed down version, this is it, right? 00:44:14.200 |
They did this, and here's what they discovered at layer one; one difference is that 00:44:18.920 |
they had color images rather than black and white. 00:44:21.560 |
This is nine out of the 34 examples they had on the first layer, and you can see it decided 00:44:27.180 |
that it wanted to look for edges as well as gradients. 00:44:32.160 |
On the right-hand side, it's showing examples from real photos, they had one and a half 00:44:35.800 |
million photos, real examples of like nine patches of photo that matched this particular filter. 00:44:44.360 |
So then what they did, and this guy's name is Matt Zeiler, 00:44:49.000 |
he said, "Okay, what would happen now if we created a new layer which took these as inputs 00:44:54.240 |
and combined them in exactly the same way as we did with pixels?" 00:45:00.400 |
And so layer two is a little bit harder to draw, so instead he draws nine examples of 00:45:05.320 |
how it gets activated by various images, and you can see in layer two it's learned to find 00:45:11.720 |
lots of horizontal lines, lots of vertical lines, it's learned to find circles. 00:45:16.280 |
And indeed, if you look on the right, it's even already basically got something that can find corners. 00:45:22.940 |
So layer two is finding circular things, stripey things, edges, and as we hoped, corners. 00:45:35.400 |
Layer three is going to do exactly the same thing, but it's going to start with these. 00:45:39.280 |
And this is just 16 out of the probably 60 or so filters he has. 00:45:44.040 |
And so each part is getting exponentially more sophisticated in what it can do. 00:45:48.800 |
And so by layer three, we already have a filter which can find text. 00:45:54.600 |
We have a filter that can find repeating patterns. 00:45:58.180 |
By layer four, we have a filter which can find dog faces. 00:46:03.200 |
By layer five, we have a filter that can find the eyeballs of lizards and birds. 00:46:08.660 |
The most recent deep learning networks have over 1,000 layers. 00:46:14.160 |
So can you imagine each of these exponentially improving levels of kind of semantic richness? 00:46:19.760 |
And that is why these incredibly simple things I showed you, which is convolutions plus optimization 00:46:27.360 |
applied to multiple layers, can let you understand speech better than a human, recognize images better than a human, and more. 00:46:35.360 |
So that's basically the summary of why deep learning changes everything. 00:46:39.160 |
And if you want to have the rest of lesson one and a review of that, and then lesson