Lesson 2 - Deep Learning for Coders (2020)
Chapters
0:00 Lesson 1 recap
2:10 Classification vs Regression
4:50 Validation data set
6:42 Epoch, metrics, error rate and accuracy
9:07 Overfitting, training, validation and testing data set
12:10 How to choose your training set
15:55 Transfer learning
21:50 Fine tuning
22:23 Why transfer learning works so well
28:26 Vision techniques used for sound
29:30 Using pictures to create fraud detection at Splunk
30:38 Detecting viruses using CNN
31:20 List of most important terms used in this course
31:50 Arthur Samuel’s overall approach to neural networks
32:35 End of Chapter 1 of the Book
40:04 Where to find pretrained models
41:20 The state of deep learning
44:30 Recommendation vs Prediction
45:50 Interpreting Models - P value
57:20 Null Hypothesis Significance Testing
62:48 Turn predictive model into something useful in production
74:06 Practical exercise with Bing Image Search
76:25 Bing Image Sign up
81:38 Data Block API
88:48 Lesson Summary
00:00:06.620 |
Practical Deep Learning for Coders. This is lesson 2, and in the last lesson we started 00:00:14.060 |
training our first models. We didn't really have any idea how that training 00:00:18.760 |
was really working, but we were looking at a high level at what was going on and 00:00:24.040 |
we learned about what is machine learning and how does that work and we 00:00:35.320 |
realized that based on how machine learning worked that there are some 00:00:40.400 |
fundamental limitations on what it can do and we talked about some of those 00:00:45.080 |
limitations and we also talked about how after you've trained a machine learning 00:00:48.680 |
model you end up with a program which behaves much like a normal program or 00:00:54.240 |
something with inputs and a thing in the middle and outputs. So today we're 00:00:59.520 |
gonna finish up talking about that and we're going to then look 00:01:05.040 |
at how we get those models into production and what some of the issues 00:01:08.400 |
with doing that might be. I wanted to remind you that there are two sets of 00:01:16.360 |
books, sorry two sets of notebooks available to you. One is the 00:01:22.320 |
fastbook repo, the full actual notebooks containing all the text of the O'Reilly 00:01:29.000 |
book and so this lets you see everything that I'm telling you in much more detail 00:01:35.920 |
and then as well as that there's the course v4 repo which contains exactly 00:01:42.520 |
the same notebooks but with all the prose stripped away to help you study. So 00:01:47.640 |
that's where you really want to be doing your experimenting and your practice, and so 00:01:51.640 |
maybe as you listen to the video you can kind of switch back and forth between 00:01:56.600 |
the video and reading or do one and then the other and then put it away and have 00:02:01.280 |
a look at the course v4 notebooks and try to remember like okay what was this 00:02:04.800 |
section about and run the code and see what happens and change it and so forth. 00:02:11.200 |
So we were looking at this line of code where we looked at how we created our 00:02:21.360 |
data by passing in information perhaps most importantly some way to label the 00:02:28.920 |
data and we talked about the importance of labeling, and in this case, for this 00:02:33.000 |
particular data set, whether it's a cat or a dog you can tell by whether it's an 00:02:37.000 |
uppercase or a lowercase letter in the first position; that's just how this data 00:02:41.520 |
set works, as they tell you in the README, and we also looked particularly at 00:02:46.360 |
this idea of valid percent equals 0.2 and like what does that mean, it creates a 00:02:51.320 |
validation set and that was something I wanted to talk more about. The first thing 00:02:57.560 |
I do want to do though is point out that this particular labeling function 00:03:05.320 |
returns something that's either true or false, and actually this data set as we'll 00:03:11.480 |
see later also contains the actual breed of 37 different cat and dog 00:03:17.160 |
breeds so you can also grab that from the file name. 00:03:23.120 |
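For reference, here is roughly what that data-loading line looks like; it follows the book's pets example (chapter 1), so treat the paths and arguments as a sketch rather than a verbatim copy of the lesson notebook:

    from fastai.vision.all import *

    # download the Oxford-IIIT Pets data set (cats and dogs, 37 breeds)
    path = untar_data(URLs.PETS)/'images'

    # labeling function: in this data set, cat images have an uppercase first letter
    def is_cat(x):
        return x[0].isupper()

    # valid_pct=0.2 holds out 20% of the images as the validation set
    dls = ImageDataLoaders.from_name_func(
        path,
        get_image_files(path),
        valid_pct=0.2,
        seed=42,
        label_func=is_cat,
        item_tfms=Resize(224),
    )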
In each of those two cases we're trying to predict a category: is it a cat or is it a dog, or is it a 00:03:29.680 |
German Shepherd or a beagle or rag doll cat or whatever when you're trying to 00:03:36.360 |
predict a category so when the label is a category we call that a classification 00:03:42.160 |
model. On the other hand you might try to predict how old is the animal or how 00:03:50.120 |
tall is it or something like that which is like a continuous number that could 00:03:55.680 |
be like 13.2 or 26.5 or whatever anytime you're trying to predict a number your 00:04:02.000 |
label is a number you call that regression okay so those are the two 00:04:06.720 |
main types of model, classification and regression; this is very important jargon 00:04:11.000 |
to know about so the regression model attempts to predict one or more numeric 00:04:16.840 |
quantities such as temperature or location or whatever this is a bit 00:04:21.240 |
confusing because sometimes people use the word regression as a shortcut, like 00:04:25.920 |
an abbreviation, for a particular kind of model called linear 00:04:30.720 |
regression that's super confusing because that's not what regression means 00:04:36.240 |
linear regression is just a particular kind of regression but I just wanted to 00:04:40.320 |
warn you of that when you start talking about regression a lot of people will 00:04:45.280 |
assume you're talking about linear regression even though that's not what 00:04:48.080 |
the word means. Alright so I wanted to talk about this valid percent 0.2 thing 00:04:54.840 |
so as we described, valid percent grabs in this case 20% of the data, since it's 0.2, and 00:05:02.040 |
puts it aside like in a separate bucket, and then when you train your model your 00:05:08.120 |
model doesn't get to look at that data at all; that data is only used 00:05:13.960 |
to show you how accurate your model is. So if you train for too long and or with 00:05:24.200 |
not enough data and or a model with too many parameters after a while the 00:05:28.920 |
accuracy of your model will actually get worse and this is called overfitting 00:05:34.040 |
right so we use the validation set to ensure that we're not overfitting the 00:05:42.200 |
next line of code that we looked at is this one where we created something 00:05:46.880 |
called a learner we'll be learning a lot more about that but a learner is 00:05:50.320 |
basically something which contains your data and your architecture, that is 00:05:57.320 |
the mathematical function that you're optimizing and so a learner is the thing 00:06:03.000 |
that tries to figure out what are the parameters which best cause this 00:06:06.920 |
function to match the labels in this data. So we'll be talking a lot more about 00:06:13.640 |
that but basically this particular function resnet 34 is the name of a 00:06:18.960 |
particular architecture which is just very good for computer vision problems 00:06:23.480 |
in fact the name really is resnet and then 34 tells you how many layers there 00:06:29.040 |
are so you can use ones with bigger numbers here to get more parameters that 00:06:33.040 |
will take longer to train, take more memory, and be more likely to overfit, but could 00:06:38.160 |
also create more complex models right now though I wanted to focus on this 00:06:44.320 |
part here which is metrics equals error rate; this is where you list the 00:06:49.920 |
functions that you want to be called with your validation data 00:06:54.080 |
and printed out after each epoch, and an epoch is what we 00:07:01.280 |
call it when you look at every single image in the data set once, and so after 00:07:06.960 |
you've looked at every image in the data set once we print out some information 00:07:11.160 |
about how you're doing, and the most important thing we print out is the 00:07:15.040 |
result of calling these metrics. So error rate is the name of a metric and it's a 00:07:20.200 |
function that just prints out what percent of the validation set are being 00:07:24.760 |
incorrectly classified by your model. So a metric is a function that measures the 00:07:32.280 |
quality of the predictions using the validation set; error rate is one metric, 00:07:36.720 |
another common metric is accuracy, which is just one minus error rate. 00:07:42.160 |
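For reference, the learner line being discussed looks roughly like this, using the dls from the earlier snippet; again it follows the book's pets example, so treat it as a sketch rather than the exact lesson cell:

    # cnn_learner combines the data, a resnet34 architecture pretrained on
    # ImageNet, and the metric(s) to report after each epoch
    learn = cnn_learner(dls, resnet34, metrics=error_rate)

    # fine_tune trains the pretrained model on our data (transfer learning)
    learn.fine_tune(1)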
Very important to remember: from last week we talked about loss. Arthur Samuel had this 00:07:48.840 |
important idea in machine learning that we need some way to figure out 00:07:54.800 |
how well our model is doing, so that when we change the parameters we can 00:08:00.040 |
figure out which set of parameters make that performance measurement get better 00:08:04.280 |
or worse that performance measurement is called the loss the loss is not 00:08:10.440 |
necessarily the same as your metric the reason why is a bit subtle and we'll be 00:08:17.140 |
seeing it in a lot of detail once we delve into the math in the coming 00:08:20.140 |
lessons but basically you need a function you need a loss function where 00:08:27.880 |
if you change the parameters by just a little bit up or just a little bit down 00:08:31.960 |
you can see if the loss gets a little bit better or a little bit worse and it 00:08:35.920 |
turns out that error rate and accuracy don't tell you that at all, because you 00:08:40.600 |
might change the parameters by such a small amount that none of your 00:08:45.040 |
dogs predictions start becoming cats and none of your cat predictions start 00:08:48.920 |
becoming dogs so like your predictions don't change and your error rate doesn't 00:08:52.640 |
change so loss and metric are closely related but the metric is the thing that 00:08:58.080 |
you care about the loss is the thing which your computer is using as the 00:09:03.640 |
measurement of performance to decide how to update your parameters so we measure 00:09:11.200 |
overfitting by looking at the metrics on the validation set so fast AI always 00:09:18.360 |
uses the validation set to print out your metrics and overfitting is like the 00:09:24.320 |
key thing that machine learning is about it's all about how do we find a model 00:09:30.160 |
which fits the data not just for the data that we're training with but for 00:09:35.120 |
data that the training algorithm hasn't seen before so overfitting results when 00:09:44.880 |
our model is basically cheating our model can cheat by saying oh I've seen 00:09:52.560 |
this exact picture before and I remember that that's a picture of a cat so it 00:09:58.040 |
might not have learned what cats look like in general it just remembers you 00:10:01.880 |
know that images one four and eight are cats and two and three and five are dogs 00:10:06.640 |
and learns nothing actually about what they really look like so that's the kind 00:10:11.600 |
of cheating that we're trying to avoid we don't want it to memorize our 00:10:15.360 |
particular data set. So we split off our validation data, and most of these 00:10:22.120 |
words you're seeing on the screen are from the book, okay, so I just copied and 00:10:25.080 |
pasted them. So if we split off our validation data and make sure that our 00:10:31.240 |
model never sees it during training, it's completely untainted by it, so we can't 00:10:35.080 |
possibly cheat. Well, not quite true, we can cheat; the way we could cheat is we could 00:10:41.960 |
fit a model, look at the result on the validation set, change 00:10:46.600 |
something a little bit fit another model look at the validation set change 00:10:50.280 |
something a little bit we could do that like a hundred times until we find 00:10:53.920 |
something where the validation set looks the best but now we might have fit to 00:10:57.900 |
the validation set right so if you want to be really rigorous about this you 00:11:03.120 |
should actually set aside a third bit of data called the test set that is not 00:11:08.640 |
used for training and it's not used for your metrics it's actually you don't 00:11:13.520 |
look at it until the whole project's finished and this is what's used on 00:11:17.280 |
competition platforms like Kaggle on Kaggle after the competition finishes 00:11:23.680 |
your performance will be measured against a data set that you have never seen and 00:11:30.920 |
so that's a really helpful approach and it's actually a great idea to do that 00:11:38.120 |
like even if you're not doing the modeling yourself. So if you're 00:11:43.600 |
looking at vendors and you're just trying to decide should I go with IBM or 00:11:48.040 |
Google or Microsoft and they're all showing you how great their models are 00:11:52.360 |
what you should do is you should say okay you go and build your models and I 00:11:57.680 |
am going to hang on to ten percent of my data and I'm not going to let you see it 00:12:01.460 |
at all and when you're all finished come back and then I'll run your model on the 00:12:06.240 |
ten percent of data you've never seen now pulling out your validation and test 00:12:15.400 |
sets is a bit subtle though here's an example of a simple little data set and 00:12:21.260 |
this comes from a fantastic blog post that Rachel wrote that we will link to 00:12:25.960 |
about creating effective validation sets and you can see basically you have some 00:12:31.120 |
kind of seasonal data set here now if you just say okay fast AI I want to 00:12:37.360 |
model that I want to create my data loader using a valid percent of 0.2 it 00:12:44.920 |
would do this it would delete randomly some of the dots right now this isn't 00:12:52.180 |
very helpful because we can still cheat, because these dots are right in 00:12:57.560 |
the middle of other dots and this isn't what would happen in practice what would 00:13:01.120 |
happen in practice is we would want to predict this is sales by date right we 00:13:05.480 |
want to predict the sales for next week not the sales for 14 days ago 18 days 00:13:10.720 |
ago and 29 days ago, right? So what you actually need to do to create an 00:13:15.120 |
effective validation set here is not do it randomly but instead chop off the end 00:13:21.640 |
right. And so this is what happens in pretty much all Kaggle competitions that 00:13:26.240 |
involve time; for instance, the thing that you have to predict is the next 00:13:30.640 |
like two weeks or so after the last data point that they give you, and this is 00:13:36.760 |
what you should do also for your test set. So again, if you've got vendors that 00:13:40.760 |
you're looking at, you should say to them okay, after you're all done modeling we're 00:13:45.000 |
going to check your model against data that is one week later than you've ever 00:13:49.920 |
seen before, and you won't be able to retrain or anything, because that's what happens in practice. 00:13:53.880 |
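If you want to build that kind of time-based split yourself rather than using valid_pct, here is a minimal sketch; it assumes a pandas DataFrame called df with a datetime 'date' column, and the two-week cutoff is arbitrary:

    import pandas as pd

    # everything before the cutoff is training data,
    # the final two weeks are held out as the validation set
    cutoff = df["date"].max() - pd.Timedelta(days=14)
    train_df = df[df["date"] <= cutoff]
    valid_df = df[df["date"] > cutoff]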
Okay, there's a question: I've heard people describe 00:14:00.480 |
overfitting as training error being below validation error does this rule of 00:14:05.120 |
thumb end up being roughly the same as yours okay so that's a great question so 00:14:09.560 |
I think what they mean there is training loss versus validation loss because we 00:14:15.840 |
don't print training error so we do print at the end of each epoch the value 00:14:21.680 |
of your loss function for the training set and the value of the loss function 00:14:25.080 |
for the validation set and if you train for long enough so if it's training 00:14:32.120 |
nicely your training loss will go down and your validation loss will go down 00:14:37.200 |
because by definition the loss function is defined such that a lower loss means 00:14:44.920 |
a better model if you start overfitting your training loss will keep going down 00:14:51.440 |
right, because like why wouldn't it, you know, you're getting better and better 00:14:55.480 |
parameters but your validation loss will start to go up because actually you 00:15:02.160 |
started fitting to the specific data points in the training set and so it's 00:15:05.920 |
not actually going to get better 00:15:08.640 |
for the validation set it'll start to get worse however that does not 00:15:14.880 |
necessarily mean that you're overfitting or at least not overfitting in a bad way 00:15:19.080 |
as we'll see it's actually possible to be at a point where the validation loss 00:15:24.760 |
is getting worse but the validation accuracy or error or metric is still 00:15:29.640 |
improving so I'm not going to describe how that would happen mathematically yet 00:15:35.240 |
because we need to learn more about loss functions, but we will; for now just 00:15:39.640 |
realize that the important thing to look at is your metric getting worse, not your 00:15:46.120 |
loss function getting worse. Thank you for that fantastic question. 00:15:56.520 |
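In fastai you can eyeball this directly after training; a minimal sketch, assuming a trained Learner called learn:

    # plot the recorded training and validation losses; validation loss starting
    # to climb while training loss keeps falling is the classic overfitting sign
    learn.recorder.plot_loss()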
The next important thing we need to learn about is called transfer learning. So the next 00:16:00.600 |
line of code said learn.fine_tune. Why does it say fine tune? Fine-tuning is 00:16:07.360 |
what we do when we are transfer learning so transfer learning is using a pre 00:16:12.580 |
trained model for a task that is different to what it was originally 00:16:16.060 |
trained for. So, more jargon; to understand our jargon let's look at that: what's a 00:16:20.980 |
pre trained model so what happens is remember I told you the architecture 00:16:25.240 |
we're using is called resnet 34 so when we take that resnet 34 that's just a 00:16:30.640 |
it's just a mathematical function okay with lots of parameters that we're going 00:16:35.320 |
to fit using machine learning there's a big data set called image net that 00:16:42.560 |
contains 1.3 million pictures of a thousand different types of thing 00:16:46.640 |
whether it be mushrooms or animals or airplanes or hammers or whatever there's 00:16:55.240 |
a competition, or there used to be a competition that ran every year, to see 00:16:58.240 |
who could get the best accuracy on the image net competition and the models 00:17:02.600 |
that did really well people would take those specific values of those 00:17:07.800 |
parameters and they would make them available on the internet for anybody to 00:17:11.920 |
download so if you download that you don't just have an architecture now you 00:17:16.280 |
have a trained model you have a model that can recognize a thousand categories 00:17:22.320 |
of thing in images which probably isn't very useful unless you happen to want 00:17:28.400 |
something that recognizes those exact thousand categories of thing but it turns 00:17:33.120 |
out you can instead start with those weights in your model and then 00:17:40.560 |
train some more epochs on your data and you'll end up with a far far more 00:17:47.120 |
accurate model than you would if you didn't start with that pre-trained model 00:17:51.540 |
and we'll see why in just a moment, right, but this idea of transfer learning, it 00:17:57.040 |
kind of makes intuitive sense, right, ImageNet already has some cats and some 00:18:03.520 |
dogs in it it's you know it can say this is a cat and this is a dog but you want 00:18:07.320 |
to maybe do something that recognizes lots of breeds that aren't in ImageNet; 00:18:11.000 |
well for it to be able to recognize cats versus dogs versus airplanes versus 00:18:16.160 |
hammers it has to understand things like what does metal look like what does fur 00:18:22.120 |
look like, and so on, you know, so it can say like oh this breed of 00:18:26.480 |
animal this breed of dog has pointy ears and oh this thing is metal so it can't 00:18:31.320 |
be a dog so all these kinds of concepts get implicitly learnt by a pre-trained 00:18:37.360 |
model. So if you start with a pre-trained model then you don't have 00:18:41.880 |
to learn all these features from scratch and so transfer learning is the single 00:18:48.840 |
most important thing for being able to use less data and less compute and get 00:18:54.960 |
better accuracy so that's a key focus for the fast AI library and a key focus 00:19:00.920 |
for this course there's a question I'm a bit confused on the differences between 00:19:12.600 |
loss, error, and metric. Sure, so error is just one kind of metric, so there's lots 00:19:23.600 |
of different possible labels you could have let's say you're trying to create a 00:19:27.360 |
model which could predict how old a cat or dog is so the metric you might use is 00:19:36.920 |
on average how many years were you off by so that would be a metric on the other 00:19:44.440 |
hand if you're trying to predict whether this is a cat or a dog your metric could 00:19:49.720 |
be what percentage of the time am I wrong so that latter metric is called the 00:19:55.240 |
error rate okay so error is one particular metric it's a thing that 00:20:00.320 |
measures how well you're doing and it's like it should be the thing that you 00:20:04.880 |
most care about so you write a function or use one of fast AI's pre-defined ones 00:20:10.520 |
which measures how well you're doing loss is the thing that we talked about in 00:20:19.820 |
lesson one so I'll give a quick summary but go back to lesson one if you don't 00:20:23.760 |
remember Arthur Samuel talked about how a machine learning model needs some 00:20:29.000 |
measure of performance which we can look at when we adjust our parameters up or 00:20:35.360 |
down does that measure of performance get better or worse and as I mentioned 00:20:40.500 |
earlier some metrics possibly won't change at all if you move the parameters 00:20:47.960 |
up and down just a little bit so they can't be used for this purpose of 00:20:52.920 |
adjusting the parameters to find a better measure of performance so quite 00:20:56.600 |
often we need to use a different function we call this the loss function 00:21:00.840 |
the loss function is the measure of performance that the algorithm uses to 00:21:05.480 |
try to make the parameters better and it's something which should kind of 00:21:09.600 |
track pretty closely to the metric you care about, but it's something which 00:21:15.120 |
as you change the parameters a bit the loss should always change a bit and so 00:21:20.880 |
there's a lot of hand waving there because we need to look at some of the 00:21:24.440 |
math of how that works and we'll be doing that in the next couple of lessons 00:21:30.920 |
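Here is a tiny illustration of that point, in plain PyTorch rather than anything from the lesson (the numbers are made up): nudging the model's outputs a little leaves the error rate metric unchanged, while a loss like cross-entropy still moves, which is what lets it guide the parameter updates.

    import torch
    import torch.nn.functional as F

    targets = torch.tensor([0, 1, 1])                 # 0 = cat, 1 = dog

    # two very slightly different sets of model outputs (logits)
    logits_a = torch.tensor([[2.0, 1.0], [0.5, 1.5], [0.4, 1.6]])
    logits_b = logits_a * 1.01                        # nudge the confidence a tiny bit

    def error_rate(logits, targets):
        return (logits.argmax(dim=1) != targets).float().mean()

    # the metric doesn't move at all...
    print(error_rate(logits_a, targets), error_rate(logits_b, targets))
    # ...but the loss does, so it can tell us which direction to adjust parameters
    print(F.cross_entropy(logits_a, targets), F.cross_entropy(logits_b, targets))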
thanks for the great questions okay so fine-tuning is a particular transfer 00:21:39.840 |
learning technique where the (oh, and you're still showing your picture and 00:21:44.960 |
not the slides) so fine-tuning is a transfer learning technique where the 00:21:54.720 |
weights this is not quite the right word we should say the parameters where the 00:21:58.400 |
parameters of a pre-trained model are updated by training for additional epochs 00:22:02.800 |
using a different task to that used for pre-training so pre-training the task 00:22:07.400 |
might have been image net classification and then our different task might be 00:22:12.040 |
recognizing cats versus dogs so the way by default fast AI does fine-tuning is 00:22:22.240 |
that we use one epoch (which remember is looking at every image in the data 00:22:27.420 |
set once) to fit just those parts of the model necessary to get the 00:22:34.240 |
part of the model that's specific to your data set working, and 00:22:40.480 |
then we use as many epochs as you ask for to fit the whole model, and this 00:22:45.640 |
is for those people who might be a bit more advanced; we'll see 00:22:48.680 |
exactly how this works later on in the lessons. 00:22:55.560 |
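Roughly, and only roughly (the real implementation also picks sensible learning rates for each phase), fastai's fine_tune does something like this:

    # learn.fine_tune(3) is approximately equivalent to:
    learn.freeze()            # train only the new final layers (the "head")
    learn.fit_one_cycle(1)    # one epoch to get the head working
    learn.unfreeze()          # now allow all the pretrained parameters to update
    learn.fit_one_cycle(3)    # the epochs you asked for, on the whole model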
So why does transfer learning work, and why does it work so well? The best way in my opinion to look at this 00:22:59.920 |
is to see this paper by Zeiler and Fergus, who were actually 2012 ImageNet winners, 00:23:07.200 |
and interestingly their key insights came from their ability to visualize 00:23:13.160 |
what's going on inside a model so visualization very often turns out to be 00:23:17.960 |
super important to getting great results what they were able to do was they looked 00:23:22.600 |
remember I told you like a resnet 34 has 34 layers they looked at something 00:23:28.880 |
called Alex net which was the previous winner of the competition which only had 00:23:32.520 |
seven layers at the time that was considered huge and so they took a seven 00:23:37.040 |
layer model and they said what does the first layer of parameters look like, and 00:23:42.720 |
they figured out how to draw a picture of them, right, and so the first 00:23:47.720 |
layer had lots and lots of features but here are nine of them one two three four 00:23:55.900 |
five six seven eight nine and here's what nine of those features look like one of 00:24:00.960 |
them was something that could recognize diagonal lines from top left to bottom 00:24:04.440 |
right one of them could find diagonal lines from bottom left to top right one 00:24:08.840 |
of them could find gradients that went from orange at the top to blue at the 00:24:12.400 |
bottom, and so on; one of them was specifically for finding 00:24:17.740 |
things that were green and so forth, right. So each of these nine, they're 00:24:25.040 |
called filters, or features. So then something really interesting they 00:24:30.400 |
did was they looked at, for each one of these filters, each one 00:24:34.840 |
of these features and we'll learn kind of mathematically about what these 00:24:38.360 |
actually mean in the coming lessons but for now let's just recognize them as 00:24:43.120 |
saying oh there's something that looks at diagonal lines and something that 00:24:45.520 |
looks at gradients and they found in the actual images in ImageNet specific 00:24:52.900 |
examples of parts of photos that match that filter so for this top left filter 00:24:58.400 |
here are nine actual patches of real photos that match that filter and as you 00:25:04.360 |
can see they're all diagonal lines and so here's the for the green one here's 00:25:08.560 |
parts of actual photos that match the green one so layer one is super super 00:25:14.560 |
simple and one of the interesting things to note here is that something that can 00:25:18.000 |
recognize gradients and patches of color and lines is likely to be useful for 00:25:22.360 |
lots of other tasks as well not just ImageNet so you can kind of see how 00:25:26.760 |
something that can do this might also be good at many many other computer vision 00:25:33.280 |
tasks as well this is layer two layer two takes the features of layer one and 00:25:40.680 |
combines them so it can not just find edges but can find corners or repeating 00:25:49.800 |
curving patterns or semicircles or full circles and so you can see for example 00:25:56.760 |
here, well, it's kind of hard to exactly visualize these layers after layer one, 00:26:06.120 |
you kind of have to show examples of what the filters look like, but here you 00:26:11.080 |
can see examples of parts of photos that this layer-two circular filter has 00:26:17.080 |
activated on and as you can see it's found things with circles so 00:26:23.880 |
interestingly this one which is this kind of blotchy gradient seems to be 00:26:28.040 |
very good at finding sunsets and this repeating vertical pattern is very good 00:26:33.320 |
at finding like curtains and wheat fields and stuff so the further we get 00:26:39.560 |
layer three then gets to combine all the kinds of features in layer two and 00:26:45.320 |
remember we're only seeing here 12 of the features, but 00:26:49.480 |
actually there's probably hundreds of them I don't remember exactly in Alex 00:26:52.520 |
Net but there's lots but by the time we get to layer three by combining features 00:26:57.440 |
from layer two it already has something which is finding text so this is a 00:27:03.480 |
feature which can find bits of image that contain text it's already got 00:27:08.120 |
something which can find repeating geometric patterns and you see this is 00:27:12.980 |
not just matching specific pixel patterns, this is like a semantic concept, 00:27:20.240 |
it can find repeating circles or repeating squares or repeating hexagons 00:27:24.400 |
right, so it's really computing something, not just matching a template, and 00:27:31.220 |
remember we know that neural networks can solve any possible computable 00:27:35.180 |
function so it can certainly do that so layer 4 gets to combine all the filters 00:27:43.400 |
from layer 3 any way it wants, and so by layer 4 we have something that can find 00:27:47.240 |
dog faces for instance. So you can kind of see how with each layer we get like 00:27:56.800 |
multiplicatively more sophisticated features and so that's why these deep 00:28:01.880 |
neural networks can be so incredibly powerful it's also why transfer learning 00:28:07.740 |
can work so well because like if we wanted something that can find books and 00:28:13.160 |
I don't think there's a book category in ImageNet well it's actually already got 00:28:17.020 |
something that can find text as an earlier filter which I guess it must be 00:28:20.840 |
using to find maybe there's a category for library or something or a bookshelf 00:28:25.880 |
so when you use transfer learning you can take advantage of all of these 00:28:30.840 |
pre-learned features to find things that are just combinations of these existing 00:28:37.680 |
features; that's why transfer learning can be done so much more quickly and with so 00:28:42.880 |
much less data than traditional approaches. One important thing to 00:28:48.360 |
realize then is that these techniques for computer vision are not just good at 00:28:53.160 |
recognizing photos there's all kinds of things you can turn into pictures for 00:28:58.680 |
example, these are sounds that have been turned into 00:29:04.440 |
pictures by representing their frequencies over time, and it turns out 00:29:10.080 |
that if you convert a sound into these kinds of pictures you can get basically 00:29:15.960 |
state-of-the-art results at sound detection just by using the exact same 00:29:21.660 |
ResNet learner that we've already seen. 00:29:28.520 |
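Just as an illustration of the idea (this uses librosa, which is not part of the lesson, and the file names are made up), here is roughly how a sound can be turned into a spectrogram picture that an ordinary image classifier can then be trained on:

    import librosa
    import librosa.display
    import matplotlib.pyplot as plt
    import numpy as np

    # load an audio clip (hypothetical file name) and compute a mel spectrogram:
    # frequency content on the y-axis, time on the x-axis
    y, sr = librosa.load("some_clip.wav")
    spec = librosa.feature.melspectrogram(y=y, sr=sr)
    spec_db = librosa.power_to_db(spec, ref=np.max)

    # save it as a picture; a folder of these can be fed to the same
    # ImageDataLoaders / cnn_learner pipeline used for cats vs dogs
    librosa.display.specshow(spec_db, sr=sr)
    plt.savefig("some_clip.png")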
I wanted to highlight that it's 9:45, if you want to take a break soon. A really cool example, from I think our 00:29:34.160 |
very first year of running fast AI one of our students created pictures they 00:29:40.080 |
worked at Splunk in anti-fraud and they created pictures of users moving their 00:29:45.240 |
mouse and if I remember correctly as they moved their mouse he basically drew 00:29:49.800 |
a picture of where the mouse moved and the color depended on how fast they 00:29:54.180 |
moved and these circular blobs is where they clicked the left or the right mouse 00:29:59.080 |
button. And what he did, actually as a project for the 00:30:04.740 |
course, is he tried to see whether he could use 00:30:09.480 |
these pictures with exactly the same approach we saw in lesson one to create 00:30:15.000 |
an anti-fraud model and it worked so well that Splunk ended up patenting a new 00:30:21.240 |
product based on this technique and you can actually check it out there's a blog 00:30:25.040 |
post about it on the internet where they describe this breakthrough anti-fraud 00:30:29.120 |
approach which literally came from one of our really amazing and brilliant and 00:30:34.440 |
creative students after lesson one of the course another cool example of this 00:30:40.640 |
is looking at different viruses and again turning them into pictures and you 00:30:48.800 |
can kind of see how they've got here this is from a paper check out the book 00:30:52.500 |
for the citation they've got three examples of a particular virus called 00:30:57.200 |
vb.at and another example of a particular virus called fakrian and you 00:31:02.240 |
can see each case the pictures all look kind of similar and that's why again 00:31:06.960 |
they can get state-of-the-art results in virus detection by turning the kind 00:31:12.760 |
of program signatures into pictures and putting it through image recognition so 00:31:20.520 |
in the book you'll find a list of all of the most important 00:31:25.560 |
terms we've seen so far and what they mean. I'm not going to read through them, 00:31:29.280 |
but I want you to, please, because these are the terms that we're 00:31:33.480 |
going to be using from now on and you've got to know what they mean because if 00:31:38.720 |
you don't you're going to be really confused because I'll be talking about 00:31:41.320 |
labels and architectures and models and parameters and they have very specific 00:31:46.080 |
exact meanings, and we'll be using those exact meanings, so please review this. So 00:31:52.520 |
to remind you, this is where we got to: we ended up with Arthur Samuel's overall 00:31:59.520 |
approach and we replaced his terms with our terms. So we have an architecture 00:32:05.520 |
which takes parameters and the data as 00:32:11.960 |
inputs; the architecture uses the parameters and the 00:32:18.560 |
inputs to calculate predictions, which are compared to the labels with a 00:32:23.480 |
loss function, and that loss function is used to update the parameters many many 00:32:28.320 |
times to make them better and better until the loss gets nice and low. 00:32:34.360 |
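In code, that loop looks roughly like this; a toy example in plain PyTorch (the tiny made-up regression problem is just for illustration, it is not from the book):

    import torch

    # a tiny made-up problem: learn y = 3x from data
    xs = torch.linspace(0, 1, 100).unsqueeze(1)
    ys = 3 * xs

    params = torch.randn(1, requires_grad=True)            # the parameters
    def architecture(x, w): return x * w                   # the (very simple) function
    def loss_function(preds, targets): return ((preds - targets) ** 2).mean()

    for step in range(100):
        preds = architecture(xs, params)                   # calculate predictions
        loss = loss_function(preds, ys)                    # compare to the labels
        loss.backward()                                    # figure out how to improve
        with torch.no_grad():
            params -= 0.1 * params.grad                    # update the parameters
            params.grad.zero_()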
so this is the end of chapter one of the book it's really important to look at 00:32:39.800 |
the questionnaire because the questionnaire is the thing where you can 00:32:43.080 |
check whether you have taken away from this chapter the stuff that 00:32:49.260 |
we hope you have. So go through it, and for anything that you're not sure about, 00:32:55.520 |
the answer is in the text, so just go back to earlier in the chapter and 00:33:00.080 |
you will find the answers. There's also a further 00:33:05.800 |
research section after each questionnaire for the first couple of 00:33:09.480 |
chapters they're actually pretty simple hopefully they're pretty fun and 00:33:12.240 |
interesting they're things where to answer the question it's not enough to 00:33:15.480 |
just look in the chapter you actually have to go and do your own thinking and 00:33:19.640 |
experimenting and googling and so forth in later chapters some of these further 00:33:25.880 |
research things are pretty significant projects that might take a few days or 00:33:30.320 |
even weeks and so yeah you know check them out because hopefully they'll be a 00:33:35.480 |
great way to expand your understanding of the material so something that Sylvain 00:33:42.560 |
points out in the book is that if you really want to make the most of this 00:33:46.000 |
then after each chapter please take the time to experiment with your own project 00:33:50.640 |
and with the notebooks we provide, and then see if you can redo 00:33:55.560 |
the notebooks on a new data set. Perhaps for chapter one that might be a 00:34:00.200 |
bit hard because we haven't really shown how to change things, but for 00:34:03.640 |
chapter two, which we're going to start next, you'll absolutely be able to do 00:34:07.240 |
that. Okay, so let's take a five minute break and we'll come back at 9:55 San 00:34:16.880 |
Francisco time okay so welcome back everybody and I think we've got a couple 00:34:22.360 |
of questions to start with, so Rachel please take it away. Sure. Are filters 00:34:27.560 |
independent? By that I mean if filters are pre-trained, might they become less 00:34:31.840 |
good in detecting features of previous images when fine-tuned oh that is a 00:34:37.000 |
great question. So, assuming I understand the question correctly, if you start 00:34:43.120 |
with say an ImageNet model and then you fine-tune it on dogs versus cats for 00:34:49.720 |
a few epochs and you get something that's very good at recognizing dogs 00:34:53.560 |
versus cats it's going to be much less good as an image net model after that 00:34:58.840 |
so it's not going to be very good at recognizing airplanes or hammers or 00:35:03.520 |
whatever this is called catastrophic forgetting in the literature the idea 00:35:10.040 |
that as you like see more images about different things to what you saw earlier 00:35:14.360 |
that you start to forget about the things you saw earlier so if you want to 00:35:20.180 |
fine-tune something which is good at a new task but also continues to be good 00:35:26.080 |
at the previous task you need to keep putting in examples of the previous task 00:35:30.000 |
as well. And what are the differences between parameters 00:35:37.800 |
and hyper parameters if I am feeding an image of a dog as an input and then 00:35:43.120 |
changing the hyper parameters of batch size in the model what would be an 00:35:47.160 |
example of a parameter? So the parameters are the things that we described in lesson 00:35:55.160 |
one that Arthur Samuel described as being the things which change what the 00:36:02.720 |
model does what the architecture does so we start with this infinitely flexible 00:36:08.540 |
function the thing called a neural network that can do anything at all and 00:36:14.080 |
the the way you get it to do one thing versus another thing is by changing its 00:36:19.880 |
parameters there they are the numbers that you pass into that function so 00:36:24.640 |
there's two types of numbers you pass into the function there's the numbers 00:36:27.640 |
that represent your input like the pixels of your dog and there's the 00:36:32.240 |
numbers that represent the learnt parameters so in the example of 00:36:39.520 |
something that's not a neural net but like a checkers playing program like 00:36:43.560 |
Arthur Samuel might have used back in the early 60s and late 50s those 00:36:47.640 |
parameters may have been things like: if there is an opportunity to take a piece 00:36:54.640 |
versus an opportunity to get to the end of the board, how much more value should I 00:36:59.920 |
give one versus the other, you know, is it twice as important or three 00:37:03.720 |
times as important, that two versus three, that would be an example of a parameter. 00:37:08.480 |
in a neural network parameters are a much more abstract concept and so a 00:37:14.960 |
detailed understanding of what they are will come in the next lesson or two but 00:37:20.080 |
it's the same basic idea they're the numbers which change what the model does 00:37:26.480 |
to be something that recognizes malignant tumors versus cats versus dogs 00:37:32.600 |
versus colorizes black and white pictures whereas the hyper parameter is 00:37:38.920 |
the choices about what numbers you pass to 00:37:46.040 |
the actual fitting function to decide how that fitting process happens. There's 00:37:52.520 |
a question I'm curious about the pacing of this course I'm concerned that all 00:37:55.960 |
the material may not be covered depends what you mean by all the material we 00:38:00.920 |
certainly won't cover everything in the world, so yeah, 00:38:08.400 |
we'll cover what we can in seven lessons; we're certainly not covering the 00:38:12.960 |
whole book if that's what you're wondering the whole book will be covered 00:38:16.200 |
in either two or three courses in the past it's generally been two courses to 00:38:22.200 |
cover about the amount of stuff in the book but we'll see how it goes because 00:38:26.480 |
the book's pretty big, 500 pages. When you say two courses, you mean 14 lessons? 00:38:32.520 |
Yeah, so it'd be like 14 or 21 lessons to get through the whole book, although 00:38:38.080 |
having said that by the end of the first lesson hopefully there'll be kind of 00:38:40.880 |
like enough momentum and understanding that the reading the book independently 00:38:44.800 |
will be more useful and you'll have also kind of gained a community of folks on 00:38:50.800 |
the forums that you can hang out with and ask questions of and so forth so in 00:38:57.380 |
the second part of the course we're going to be talking about putting stuff 00:39:02.000 |
in production, and to do that we need to understand like what are the 00:39:08.040 |
capabilities and limitations of deep learning, what are the kinds of projects 00:39:13.880 |
that even make sense to try to put in production and you know one of the key 00:39:18.600 |
things I should mention in the book and in this course is that the first two or 00:39:22.160 |
three lessons and chapters there's a lot of stuff which is designed not just for 00:39:27.640 |
coders but for everybody; there's lots of information about like what are 00:39:34.880 |
the practical things you need to know to make deep learning work and so one of 00:39:38.320 |
the things you need to know is like, well, what's deep learning actually good 00:39:41.360 |
at, at the moment. So I'll summarize what the book says about this, but there are 00:39:48.240 |
the kind of four key areas that we have as applications in fast AI computer 00:39:53.760 |
vision, text, tabular, and what I've called here recsys; this stands for recommendation 00:39:58.200 |
systems and specifically a technique called collaborative filtering which we 00:40:01.760 |
briefly saw last week sorry another question is are there any pre-trained 00:40:06.960 |
weights available other than the ones from image net that we can use if yes 00:40:11.240 |
when should we use others than the ImageNet one? Oh, that's a really great question. So 00:40:16.280 |
yes there are a lot of pre-trained models, and one way to find them, 00:40:23.320 |
(you're currently just showing, switching, okay great) one great way to find them is 00:40:29.120 |
you can look up model zoo which is a common name for like places that have 00:40:36.480 |
lots of different models and so here's lots of model zoos or you can look for 00:40:44.400 |
pre-trained models and so yeah there's quite a few unfortunately not as wide a 00:40:57.320 |
variety as I would like; most are still on ImageNet or similar kinds of 00:41:02.640 |
general photos; for medical imaging, for example, there's hardly any. There's a lot 00:41:09.800 |
of opportunities for people to create domain specific pre-trained models it's 00:41:13.320 |
still an area that's really underdone because not enough people are 00:41:16.320 |
working on transfer learning okay so as I was mentioning we've kind of got these 00:41:23.760 |
four applications that we've talked about a bit and deep learning is pretty you 00:41:32.560 |
know pretty good at all of those. Tabular data like spreadsheets and database 00:41:39.160 |
tables is an area where deep learning is not always the best choice but it's 00:41:44.160 |
particularly good for things involving high cardinality variables that means 00:41:48.280 |
variables that have like lots and lots of discrete levels like zip code or 00:41:52.520 |
product ID or something like that deep learning is really pretty great for 00:41:58.600 |
those in particular for text it's pretty great at things like classification and 00:42:06.760 |
translation; it's actually terrible for conversation, and so that's been 00:42:11.720 |
something that's been a huge disappointment for a lot of companies 00:42:14.120 |
they tried to create these like conversation bots but actually deep 00:42:18.480 |
learning isn't good at providing accurate information it's good at 00:42:23.240 |
providing things that sound accurate and sound compelling, but we don't really 00:42:27.280 |
have great ways yet of actually making sure it's correct one big issue for 00:42:34.840 |
recommendation systems collaborative filtering is that deep learning is 00:42:39.880 |
focused on making predictions which don't necessarily actually mean creating 00:42:44.760 |
useful recommendations we'll see what that means in a moment deep learning is 00:42:50.680 |
also good at multimodal that means things where you've got multiple 00:42:56.440 |
different types of data so you might have some tabular data including a text 00:43:00.360 |
column and an image and some collaborative filtering data and 00:43:06.880 |
combining that all together is something that deep learning is really good at so 00:43:11.040 |
for example putting captions on photos is something which deep learning is 00:43:17.920 |
pretty good at, although again it's not very good at being accurate, so, you 00:43:22.400 |
know, it might say this is a picture of two birds when it's actually a picture of 00:43:25.800 |
three birds and then this other category there's lots and lots of things that you 00:43:33.800 |
can do with deep learning by being creative about the use of these kinds of 00:43:38.240 |
other application-based approaches for example an approach that we developed 00:43:43.600 |
for natural language processing called ULMFiT, which you'll learn about in the course; 00:43:48.120 |
it turns out that it's also fantastic at doing protein analysis if you think of 00:43:53.040 |
the different proteins as being different words and they're in a 00:43:57.360 |
sequence which has some kind of state and meaning, it turns out that ULMFiT 00:44:02.240 |
works really well for protein analysis. So often it's about kind of being 00:44:06.880 |
creative so to decide like for the product that you're trying to build is 00:44:12.480 |
deep learning going to work well for it in the end you kind of just have to try 00:44:17.600 |
it and see, but if you do a search, you know, hopefully you can find 00:44:24.480 |
examples of people that have tried something similar, and even if you 00:44:27.760 |
can't that doesn't mean it's not going to work so for example I mentioned the 00:44:33.280 |
collaborative filtering issue where a recommendation and a prediction are not 00:44:37.840 |
necessarily the same thing you can see this on Amazon for example quite often 00:44:43.040 |
so I bought a Terry Pratchett book and then Amazon tried for months to get me to 00:44:48.880 |
buy more Terry Pratchett books now that must be because their predictive model 00:44:53.240 |
said that people who bought one particular Terry Pratchett book are 00:44:57.440 |
likely to also buy other Terry Pratchett books but from the point of view of like 00:45:01.880 |
well is this going to change my buying behavior probably not right like if I 00:45:07.040 |
liked that book I already know I like that author and I already know that like 00:45:10.440 |
they probably wrote other things so I'll go and buy it anyway so this would be an 00:45:14.360 |
example of like Amazon probably not being very smart up here they're 00:45:18.720 |
actually showing me collaborative filtering predictions rather than 00:45:23.280 |
actually figuring out how to optimize a recommendation so an optimized 00:45:27.520 |
recommendation would be something more like your local human bookseller might 00:45:32.360 |
do where they might say oh you like Terry Pratchett well let me tell you 00:45:36.840 |
about other kind of comedy fantasy sci-fi writers on the similar vein who 00:45:41.440 |
you might not have heard about before so the difference between recommendations 00:45:46.240 |
and predictions is super important so I wanted to talk about a really important 00:45:53.360 |
issue around interpreting models, and for a case study for this I thought let's 00:45:59.000 |
pick something that's actually super important right now which is a model in 00:46:03.240 |
this paper one of the things we're going to try and do in this course is learn 00:46:06.160 |
how to read papers. So here is a paper which I would love for everybody to 00:46:11.480 |
read called high temperature and high humidity reduce the transmission of 00:46:15.540 |
COVID-19 now this is a very important issue because if the claim of this paper 00:46:20.840 |
is true, that would mean that this is going to be a seasonal disease, and if 00:46:25.360 |
this is a seasonal disease, that's going to have massive policy implications, 00:46:30.360 |
so let's try and find out how this was modeled and understand how to interpret 00:46:35.240 |
this model so this is a key picture from the paper and what they've done here is 00:46:45.560 |
they've taken a hundred cities in China and they've plotted the temperature on 00:46:50.300 |
one axis in Celsius and R on the other axis, where R is a measure of 00:46:56.160 |
transmissibility it says for each person that has this disease how many people on 00:47:02.200 |
average will they infect so if R is under one then the disease will not 00:47:07.720 |
spread; if R is higher than like two it's going to spread incredibly quickly, 00:47:14.840 |
and basically, you know, any high R is going to create an 00:47:18.560 |
exponential transmission impact and you can see in this case they have plotted a 00:47:25.000 |
best fit line through here and then they've made a claim that there's some 00:47:30.440 |
particular relationship in terms of a formula that R is 1.99 minus 0.023 times 00:47:38.480 |
temperature so a very obvious concern I would have looking at this picture is 00:47:44.840 |
that this might just be random maybe there's no relationship at all but just 00:47:52.160 |
if you picked a hundred cities at random perhaps they would sometimes show this 00:47:57.680 |
level of relationship so one simple way to kind of see that would be to actually 00:48:04.840 |
do it in a spreadsheet. So here is a spreadsheet where what I did was I kind 00:48:12.960 |
of eyeballed this data and I guessed about what is the mean degrees centigrade 00:48:17.920 |
I think it's about five and what's about the standard deviation of centigrade I 00:48:22.440 |
think it's probably about five as well and then I did the same thing for R I 00:48:27.240 |
think the mean R looks like it's about 1.9 to me and it looks like the standard 00:48:32.040 |
deviation of R is probably about 0.5 so what I then did was I just jumped over 00:48:38.560 |
here and I created a random normal value so a random value from a normal 00:48:46.000 |
distribution, so a bell curve, with that particular 00:48:50.200 |
mean and standard deviation of temperature and that particular mean and 00:48:55.120 |
standard deviation of R and so this would be an example of a city that might 00:49:02.480 |
be in this data set of a hundred cities something with nine degrees Celsius and 00:49:06.800 |
an R of 1.1, so that would be 00:49:12.680 |
something about here and so then I just copied that formula down 100 times so 00:49:22.920 |
here are a hundred cities that could be in China right where this is assuming 00:49:30.160 |
that there is no relationship between temperature and R right they're just 00:49:34.320 |
random numbers and so each time I recalculate that so if I hit ctrl equals 00:49:42.000 |
it will just recalculate it right I get different numbers okay because they're 00:49:47.680 |
random and so you can see at the top here I've then got the average of all of 00:49:55.240 |
the temperatures and the average of all of the R's and the average of all the 00:49:58.880 |
temperatures varies and the average of all of the R's varies as well. So then what 00:50:09.560 |
I did was I copied those random numbers over here let's actually do it so I'll 00:50:18.600 |
go copy these 100 random numbers and paste them here here here here and so 00:50:32.760 |
now I've got one two three four five six I've got six kind of groups of 100 00:50:40.720 |
cities right and so let's stop those from randomly changing anymore by just 00:50:49.520 |
fixing them in stone there okay so now that I've paste them in I've got six 00:51:01.520 |
examples of what a hundred cities might look like if there was no relationship 00:51:06.440 |
at all between temperature and R, and I've got their mean temperature and R 00:51:11.560 |
in each of those six examples. And what I've done, as you can see here at least 00:51:16.980 |
for the first one, is I've plotted it, right, and you can see in this case there's 00:51:22.040 |
actually a slight positive slope and I've actually calculated the slope for 00:51:33.500 |
each just by using the slope function in Microsoft Excel and you can see that 00:51:37.840 |
actually, in this particular case, which is just random, five times it's been negative, and 00:51:46.200 |
it's even more negative than their 0.023, and so it's kind of 00:51:53.560 |
matching our intuition here, which is that the slope of the line that we 00:51:57.800 |
have here is something that absolutely can often happen totally by chance it 00:52:03.680 |
doesn't seem to be indicating any kind of real relationship at all if we wanted 00:52:09.240 |
that slope to be like more confident we would need to look at more cities so like 00:52:17.800 |
here I've got 3,000 randomly generated numbers and you can see here the slope 00:52:26.960 |
is 0.00002 right it's almost exactly zero which is what we'd expect right when 00:52:33.080 |
there's actually no relationship between C and R and in this case there isn't 00:52:37.440 |
they're all random then if we look at lots and lots of randomly generated 00:52:41.360 |
cities, then we can say oh yeah, there's no slope, but when you only look 00:52:45.840 |
at a hundred as we did here you're going to see relationships totally 00:52:51.520 |
coincidentally very very often right so that's something that we need to be able 00:52:57.360 |
to measure and so one way to measure that is we use something called a p-value so 00:53:03.080 |
a p-value here's how a p-value works we start out with something called a null 00:53:07.720 |
hypothesis, and the null hypothesis is basically what's our starting 00:53:13.760 |
point assumption so our starting point assumption might be oh there's no 00:53:17.280 |
relationship between temperature and R, and then we gather some data. (Have 00:53:22.280 |
you explained what R is? I have, yes; R is the transmissibility of the virus.) So 00:53:28.680 |
then we gather data of independent and dependent variables so in this case the 00:53:32.860 |
independent variable is the thing that we think might cause a dependent variable 00:53:38.000 |
so here the independent variable would be temperature the dependent variable 00:53:41.000 |
would be R so here we've gathered data there's the data that was gathered in 00:53:45.720 |
this example and then we say what percentage of the time would we see this 00:53:50.880 |
amount of relationship which is a slope of 0.023 by chance and as we've seen one 00:53:57.720 |
way to do that is by what we would call a simulation, which is by generating 00:54:02.080 |
a hundred pairs of random numbers a bunch of times and 00:54:06.440 |
seeing how often you see this relationship. 00:54:12.240 |
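You can run the same simulation in a few lines of Python instead of a spreadsheet; the means and standard deviations below are the eyeballed values from the talk, so treat the whole thing as an illustrative sketch:

    import numpy as np

    rng = np.random.default_rng(42)
    n_sims, n_cities = 10_000, 100

    slopes = []
    for _ in range(n_sims):
        # 100 fake cities with NO real relationship between temperature and R
        temp = rng.normal(5, 5, n_cities)      # mean 5 C, std 5 C (eyeballed)
        r = rng.normal(1.9, 0.5, n_cities)     # mean R 1.9, std 0.5 (eyeballed)
        slope = np.polyfit(temp, r, 1)[0]      # slope of the best-fit line
        slopes.append(slope)

    # how often does pure chance give a slope at least as negative as -0.023?
    print(np.mean(np.array(slopes) <= -0.023))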
We don't actually have to do it that way though; there's actually a simple equation we can use to jump 00:54:17.840 |
straight to this number which is what percent of the time would we see that 00:54:21.280 |
relationship by chance and this is basically what that looks like we have 00:54:31.040 |
the most likely observation which in this case would be if there is no 00:54:35.980 |
relationship between temperature and R then the most likely slope would be 0 00:54:40.040 |
and sometimes you get positive slopes by chance and sometimes you get pretty small 00:54:48.940 |
slopes and sometimes you get large negative slopes by chance and so the you 00:54:55.360 |
know the larger the number the less likely it is to happen whether it be on 00:54:58.360 |
the positive side or the negative side and so in our case our question was how 00:55:04.880 |
often are we going to get less than negative 0.023 so it would actually be 00:55:10.000 |
somewhere down here, and I actually copied this from Wikipedia, where they were 00:55:13.560 |
looking for positive numbers and so they've colored in this area above a 00:55:17.760 |
number, so this is the p-value. And we don't care about the math, but 00:55:22.480 |
there's a simple little equation you can use to directly figure out this number, 00:55:29.720 |
the p-value, from the data. 00:55:39.840 |
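For example, scipy will compute that number directly from the data; a minimal sketch, where the temp and r arrays are hypothetical stand-ins for the hundred cities:

    import numpy as np
    from scipy import stats

    # hypothetical data: temperature and R for 100 cities
    rng = np.random.default_rng(0)
    temp = rng.normal(5, 5, 100)
    r = rng.normal(1.9, 0.5, 100)

    # linregress returns the slope of the best-fit line and its p-value
    result = stats.linregress(temp, r)
    print(result.slope, result.pvalue)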
This is kind of how nearly all medical 00:55:45.480 |
research results tend to be shown, and folks really focus on this idea of p-values, and indeed in this particular study, as we'll see in a moment, they 00:55:49.640 |
reported p-values so probably a lot of you have seen p-values in your previous 00:55:55.840 |
lives they come up in a lot of different domains here's the thing they are 00:56:01.840 |
terrible you almost always shouldn't be using them don't just trust me trust the 00:56:07.800 |
American Statistical Association they point out six things about p-values and 00:56:14.240 |
those include p-values do not measure the probability that the hypothesis is 00:56:19.320 |
true, or the probability that the data were produced by random chance alone now 00:56:24.480 |
we know this because we just saw that if we use more data right so if we sample 00:56:32.040 |
3000 random cities rather than a hundred we get a much smaller value right so p 00:56:40.200 |
values don't just tell you about how big a relationship is but they actually tell 00:56:44.320 |
you about a combination of that and how much data did you collect right so so 00:56:49.560 |
they don't measure the probability that the hypothesis is true so therefore 00:56:53.960 |
conclusions and policy decisions should not be based on whether a p-value passes 00:56:58.920 |
some threshold p-value does not measure the importance of a result right because 00:57:08.000 |
again it could just tell you that you collected lots of data which doesn't 00:57:11.880 |
tell you that the results are actually of any practical importance and so by itself it 00:57:16.120 |
does not provide a good measure of evidence so Frank Harrell, somebody 00:57:23.600 |
whose book I read and which was a really important part of my learning, a 00:57:28.360 |
professor of biostatistics has a number of great articles about this he says 00:57:34.280 |
null hypothesis testing and p-values have done significant harm to science 00:57:39.160 |
and he wrote another piece called null hypothesis significance testing never 00:57:44.160 |
worked so I've shown you what p-values are so that you know why they don't work 00:57:52.320 |
not so that you can use them right but they're a super important part of 00:57:56.440 |
machine learning because they come up all the time, you know, when 00:58:01.320 |
people say this is how we decide whether your drug worked or whether 00:58:06.000 |
there is a epidemiological relationship or whatever and indeed p-values appear 00:58:13.160 |
in this paper so in the paper they show the results of a multiple linear 00:58:19.800 |
regression and they put three stars next to any relationship which has a p-value 00:58:27.400 |
of 0.01 or less so there is something useful to say about a small p-value like 00:58:38.240 |
0.01 or less which is that the thing that we're looking at probably did not 00:58:43.400 |
happen by chance right the biggest statistical error people make all the 00:58:48.200 |
time is that they see that a p-value is not less than 0.05 and then they make 00:58:54.400 |
the erroneous conclusion that no relationship exists right which doesn't 00:59:01.880 |
make any sense because like it let's say you only had like three data points then 00:59:06.480 |
you almost certainly won't have enough data to have a p-value of less than 0.05 00:59:11.400 |
for any hypothesis so like the way to check is to go back and say what if I 00:59:17.520 |
picked the exact opposite null hypothesis what if my null hypothesis was 00:59:21.880 |
there is a relationship between temperature and R then do I have enough 00:59:26.040 |
data to reject that null hypothesis right and if the answer is no then you 00:59:34.820 |
just don't have enough data to make any conclusions at all right so in this case 00:59:39.800 |
they do have enough data to be confident that there is a relationship between 00:59:46.160 |
temperature and R now that's weird because we just looked at the graph and 00:59:52.120 |
we did a little bit of a back of the envelope in Excel and we thought 00:59:55.000 |
this could well be random so here's where the issue is the graph 01:00:03.500 |
shows what we call a univariate relationship a univariate relationship 01:00:07.220 |
shows the relationship between one independent variable and one dependent 01:00:11.300 |
variable and that's what you can normally show on a graph but in this 01:00:14.880 |
case they did a multivariate model in which they looked at temperature and 01:00:19.680 |
humidity and GDP per capita and population density and when you put all 01:00:26.680 |
of those things into the model then you end up with statistically significant 01:00:30.560 |
results for temperature and humidity why does that happen well the reason that 01:00:36.040 |
happens is because all these variation in the blue dots is not random there's a 01:00:44.040 |
reason they're different right and the reasons include denser cities are going 01:00:49.160 |
to have higher transmission for instance and probably more humid will have less 01:00:55.000 |
transmission so when you do a multivariate model it actually allows you 01:01:02.360 |
to be more confident of your results right but the p-value as noted by the 01:01:11.760 |
American Statistical Association does not tell us whether this is of practical 01:01:15.640 |
importance the thing that tells us whether this is of practical importance is the 01:01:20.400 |
actual slope that's found and so in this case the equation they come up with is 01:01:28.120 |
that R equals 3.968 minus 0.038 times temperature minus 0.024 times relative 01:01:37.600 |
humidity so is this equation practically important well we can again 01:01:43.320 |
do a little back of the envelope here by just putting that into Excel let's say 01:01:52.160 |
there was one place that had a temperature of 10 centigrade and a 01:01:55.480 |
humidity of 40 then if this equation is correct R would be about 2.7 somewhere 01:02:02.320 |
with a temperature of 35 centigrade and a humidity of 80 R would be about 0.8. 01:02:08.880 |
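Here is roughly that same back-of-envelope in a few lines of Python rather than Excel, using the coefficients quoted above; the small differences from the 2.7 and 0.8 figures presumably come from rounding of the paper's coefficients.

    def estimated_R(temp_c, rel_humidity):
        # regression equation quoted from the paper in the lecture:
        # R = 3.968 - 0.038 * temperature - 0.024 * relative humidity
        return 3.968 - 0.038 * temp_c - 0.024 * rel_humidity

    print(estimated_R(10, 40))   # about 2.6: well above 1, so exponential spread
    print(estimated_R(35, 80))   # about 0.7: below 1, so the outbreak dies out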
So is this practically important oh my god yes right two different cities with 01:02:15.400 |
different climates, if they're the same in every other way and this model 01:02:19.920 |
is correct, then one city would have no spread of disease because R is less than 01:02:25.280 |
one and the other would have a massive exponential explosion so we can see from this model 01:02:33.120 |
that if the modeling is correct then this is a highly practically significant 01:02:38.100 |
result so this is how you determine practical significance of your models 01:02:41.960 |
it's not with p-values but with looking at kind of actual outcomes so how do you 01:02:49.880 |
think about the practical importance of a model and how do you turn a predictive 01:02:57.960 |
model into something useful in production so I spent many many years 01:03:03.080 |
thinking about this and I actually, with some other great folks, 01:03:09.640 |
created a paper about it, Designing Great Data Products, and this 01:03:19.680 |
is largely based on 10 years of work I did at a company I founded called optimal 01:03:26.060 |
decisions group and optimal decisions group was focused on the question of 01:03:30.940 |
helping insurance companies figure out what prices to set and insurance 01:03:36.440 |
companies up until that point had focused on predictive modeling 01:03:40.240 |
actuaries in particular spent their time trying to figure out how likely is it 01:03:47.320 |
that you're going to crash your car and if you do how much damage might you have 01:03:50.920 |
and then based on that try to figure out what price they should set for your 01:03:55.320 |
policy so for this company what we did was we decided to use a different 01:04:01.160 |
approach which I ended up calling the drivetrain approach just described here 01:04:06.280 |
to set insurance prices and indeed to do all kinds of other things and so for 01:04:12.780 |
the insurance example the objective for an insurance company would 01:04:17.520 |
be how do I maximize my, let's say, five year profit and then what inputs can we 01:04:25.800 |
control, which I call levers, so in this case it would be what 01:04:30.460 |
price can I set and then data is data which can tell you as you change your 01:04:37.560 |
levers how does that change your objective so if I start increasing my 01:04:41.600 |
price to people who are likely to crash their car then we'll get less of them 01:04:46.300 |
which means we'll have less costs but at the same time we'll also have less 01:04:50.240 |
revenue coming in for example so to link up the kind of the levers to the 01:04:55.720 |
objective via the data we collect we build models that describe how the 01:04:59.960 |
levers influence the objective. 01:05:05.640 |
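To make the objective / levers / data / models structure concrete, here is a deliberately toy sketch of the insurance pricing example; every number and functional form below is invented for illustration and is not Optimal Decisions' actual modeling.

    # Models (in reality these would be fit from data): how the lever (price)
    # feeds through to the objective (profit)
    def p_buy(price):
        # chance a customer accepts this price; made-up linear elasticity
        return max(0.0, 1.0 - price / 2000)

    def expected_claims():
        # expected annual claim cost for this customer; a predictive model in reality
        return 600.0

    def expected_profit(price):
        # the objective, linked to the lever via the models above
        return p_buy(price) * (price - expected_claims())

    # Sweep the lever and pick the setting that maximizes the objective
    best_price = max(range(0, 2001, 10), key=expected_profit)
    print(best_price, expected_profit(best_price))   # best price 1300, profit roughly 245

The point is that the predictive models are just components; what you actually optimize is the objective, via the levers.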
And this all seems pretty obvious when you say it like this but when we started work with Optimal Decisions in 1999 01:05:11.040 |
nobody was doing this in insurance everybody in insurance was simply 01:05:15.640 |
doing a predictive model to guess how likely people were to crash their car and 01:05:20.920 |
then pricing was set by like adding 20% or whatever it was just done in a very 01:05:27.000 |
kind of naive way so what I did is I you know over many years took this basic 01:05:35.040 |
process and tried to help lots of companies figure out how to use it to 01:05:39.080 |
turn predictive models into actions so the starting point in like actually 01:05:46.800 |
getting value in a predictive model is thinking about what is it you're trying 01:05:50.400 |
to do and you know what are the sources of value in that thing you're trying to 01:05:53.400 |
do the levers what are the things you can change like what's the point of a 01:05:58.280 |
predictive model if you can't do anything about it right figuring out 01:06:02.800 |
ways to find what data you don't have, which data is suitable, what's 01:06:06.040 |
available then think about what approaches to analytics you can then 01:06:09.160 |
take and then super important like well can you actually implement you know 01:06:15.960 |
those changes and super super important how do you actually change things as the 01:06:21.360 |
environment changes and you know interestingly a lot of these things are 01:06:24.920 |
areas where there's not very much academic research there's a little bit 01:06:28.680 |
and some of the papers that there have been, particularly around maintenance, like 01:06:34.440 |
how do you decide when your machine learning model is kind of still okay how 01:06:39.560 |
do you update it over time, have had like many many citations but they 01:06:45.240 |
don't pop up very often because a lot of folks are so focused on the math you 01:06:49.760 |
know and then there's the whole question of like what constraints are in place 01:06:54.000 |
across this whole thing so what you'll find in the book is there is a whole 01:06:58.120 |
appendix which actually goes through every one of these six things and has a 01:07:03.800 |
whole list of examples so this is an example of how to like think about value 01:07:11.520 |
and lots of questions that companies and organizations can use to try and think 01:07:17.800 |
about you know all of these different pieces of the actual puzzle of getting 01:07:25.200 |
stuff into production and actually into an effective product we have a question 01:07:29.120 |
sure just a moment so as I say do check out this appendix because it actually 01:07:33.680 |
originally appeared as a blog post and I think except for my COVID-19 posts that 01:07:39.560 |
I did with Rachel it's actually the most popular blog post I've ever written it's 01:07:43.880 |
at hundreds of thousands of views and it kind of represents like 20 years of hard 01:07:48.560 |
won insights about like how you actually get value from machine learning in 01:07:55.120 |
practice and what you actually have to ask so please check it out because 01:07:58.320 |
hopefully you'll find it helpful so when we think about this for 01:08:03.760 |
the question of how should people think about the relationship between seasonality 01:08:08.160 |
and transmissibility of COVID-19 you kind of need to dig really deeply into the 01:08:15.720 |
questions about, like, not just what are those numbers in 01:08:20.680 |
the data but what does it really look like right so one of the things in the 01:08:24.160 |
paper that they show is actual maps right of temperature and humidity and ah 01:08:31.360 |
right and you can see like not surprisingly that humidity and 01:08:37.680 |
temperature in China are what we would call autocorrelated which is to say that 01:08:44.160 |
places that are close to each other in this case geographically have similar 01:08:48.080 |
temperatures and similar humidities and so like this actually calls into 01:08:54.960 |
question a lot of the p-values that they have right because you can't 01:09:01.040 |
really think of these as a hundred totally separate cities because the ones 01:09:04.760 |
that are close to each other probably have very close behavior so maybe you 01:09:08.080 |
should think of them as like a small number of sets of cities you know of 01:09:12.920 |
kind of larger geographies so these are the kinds of things that when you look 01:09:18.280 |
actually into a model you need to like think about what are the what are the 01:09:23.000 |
limitations but then to decide like well what does that mean what do I what do I 01:09:26.880 |
do about that you you need to think of it from this kind of utility point of 01:09:34.360 |
view this kind of end-to-end what are the actions I can take what are the 01:09:39.040 |
results point of view not just null hypothesis testing so in this case for 01:09:44.440 |
example there are basically four possible key ways this could end up it 01:09:52.040 |
could end up that there really is a relationship between temperature and R 01:09:57.480 |
so that's what the right-hand side is or there is no real relationship between 01:10:03.800 |
temperature and R and we might act on the assumption that there is a 01:10:09.160 |
relationship or we might act on the assumption that there isn't a 01:10:12.720 |
relationship and so you kind of want to look at each of these four possibilities 01:10:16.760 |
and say like well what would be the economic and societal consequences and 01:10:22.560 |
you know there's going to be a huge difference in lives lost and you know 01:10:28.000 |
economies crashing and whatever else, you know, for each of these four. The 01:10:36.180 |
paper actually you know has shown if their model is correct what's the likely 01:10:42.000 |
R value in March for like every city in the world and the likely R value in July 01:10:48.440 |
for every city in the world and so for example if you look at kind of New 01:10:52.880 |
England and New York, and also the very coast of the 01:10:57.680 |
West Coast, the prediction here is that in July the disease will stop spreading now you 01:11:04.640 |
know if that happens, if they're right, then that's going to be a 01:11:08.880 |
disaster because I think it's very likely in America and also the UK that 01:11:14.320 |
people will say oh turns out this disease is not a problem you know it 01:11:19.300 |
didn't really take off at all the scientists were wrong people will go 01:11:23.000 |
back to their previous day-to-day life and we could see what happened in 1918 01:11:28.160 |
flu virus, where like the second go around when winter hits could be much worse 01:11:34.760 |
than the start right so like there's these kind of like huge potential 01:11:41.800 |
policy impacts depending on whether this is true or false and so to think about 01:11:47.880 |
it - yes I also just wanted to say that it would be very 01:11:53.160 |
irresponsible to think oh summer's gonna solve it we don't need to act now just 01:11:59.240 |
in that this is something growing exponentially and could do a huge huge 01:12:02.840 |
amount of damage yeah, so it could, or already has done. Either way, if you 01:12:08.040 |
assume that there will be seasonality and that summer will fix things then it 01:12:13.760 |
could lead you to be apathetic now if you assume there's no seasonality and 01:12:18.160 |
then there is then you could end up kind of creating a larger level of 01:12:24.720 |
expectation of destruction than actually happens and end up with your population 01:12:28.720 |
being even more apathetic you know so, you know, being wrong in any 01:12:33.000 |
direction is a problem so one of the ways we tend to deal with this with 01:12:37.800 |
this kind of modeling is we try to think about priors so priors are basically 01:12:42.820 |
things where we you know rather than just having a null hypothesis we try and 01:12:47.020 |
start with a guess as to like well what's what's more likely right so in 01:12:52.080 |
this case if memory serves correctly I think we know that like flu viruses 01:12:57.560 |
become inactive at 27 centigrade we know that like the cold coronaviruses 01:13:04.640 |
are seasonal, the 1918 flu epidemic was seasonal, in every country and city 01:13:14.880 |
that's been studied so far there's been quite a few studies like this they've 01:13:18.120 |
always found climate relationships so far so maybe we'd say well our prior belief 01:13:23.640 |
is that this thing is probably seasonal and so then we'd say well this 01:13:27.960 |
particular paper adds some evidence to that so like it shows like how 01:13:34.800 |
incredibly complex it is to use a model in practice, in this case for policy 01:13:42.800 |
discussions but also for like organizational decisions because you 01:13:47.880 |
know there's always complexities there's always uncertainties and so you actually 01:13:52.520 |
have to think about the utilities you know and your best guesses and try to 01:13:57.920 |
combine everything together as best as you can okay so with all that said it's 01:14:08.080 |
still nice to be able to get our models up and running because, you know, 01:14:14.560 |
even just a predictive model is sometimes useful on its own, sometimes 01:14:19.300 |
it's useful to prototype something and sometimes it's just it's going to be 01:14:23.960 |
part of some bigger picture so rather than try to create some huge end-to-end 01:14:28.180 |
model here we thought we would just show you how to get your pytorch fast AI 01:14:36.400 |
model up and running in as raw a form as possible so that from there you can kind 01:14:43.180 |
of build on top of it as you like so to do that we are going to download and 01:14:51.100 |
curate our own data set and you're going to do the same thing you've got to train 01:14:55.200 |
your own model on that data set and then you're going to get an application and 01:15:00.360 |
then you're going to host it okay now there's lots of ways to create an image 01:15:07.720 |
data set you might have some photos on your own computer there might be stuff 01:15:12.080 |
at work you can use one of the easiest though is just to download stuff off the 01:15:17.720 |
internet there's lots of services for downloading stuff off the internet we're 01:15:22.120 |
going to be using Bing image search here because they're super easy to use a lot 01:15:28.080 |
of the other kind of easy to use things require breaking the terms of service of 01:15:32.640 |
websites so like we're not going to show you how to do that but there's lots of 01:15:38.280 |
examples that do show you how to do that so you can check them out as well if you 01:15:42.560 |
if you want to Bing image search is actually pretty great at least at the 01:15:46.260 |
moment these things change a lot so keep an eye on our website to see if we've 01:15:52.160 |
changed our recommendation the biggest problem with Bing image search is that 01:15:57.480 |
the sign-up process is a nightmare at least at the moment like one of the 01:16:03.360 |
hardest parts of this book is just signing up to their damn API which 01:16:07.720 |
requires going through Azure it's called cognitive services Azure cognitive 01:16:11.400 |
services so we'll make sure that all that information is on the website for 01:16:15.820 |
you to follow through just how to sign up so we're going to start from the 01:16:19.160 |
assumption that you've already signed up but you can find it just go Bing Bing 01:16:29.040 |
image search API and at the moment they give you seven days with a pretty high 01:16:36.760 |
quota for free and then after that you can keep using it as long as you like 01:16:46.240 |
but they kind of limit it to like three transactions per second or something 01:16:50.580 |
which is still plenty you can still do thousands for free so it's it's at the 01:16:54.920 |
moment it's pretty great even for free so what will happen is when you sign up 01:17:02.240 |
for Bing image search or any of these kind of services they'll give you an API 01:17:05.840 |
key so just replace the xxx here with the API key that they give you okay so 01:17:12.740 |
that's now going to be called key in fact let's do it over here okay so you'll put 01:17:21.080 |
in your key and then there's a function we've created called search images Bing 01:17:27.800 |
which is just a super tiny little function as you can see it's just two 01:17:32.900 |
lines of code, I was just trying to save a little bit of time, which will 01:17:38.960 |
take your API key and some search term and return a list of URLs that match 01:17:44.200 |
that search term as you can see for using this particular service you have 01:17:52.600 |
to install a particular package so we show you how to do that on the site as 01:17:59.320 |
well so once you've done so you'll be able to run this and that will return by 01:18:05.500 |
default I think 150 URLs okay so fast AI comes with a download URL function so 01:18:13.200 |
let's just download one of those images just to check and open it up and so what 01:18:18.760 |
I did was I searched for grizzly bear and here I have a grizzly bear. 01:18:24.800 |
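For reference, these pieces fit together roughly like this, following the book's chapter 2 notebook; the search_images_bing helper lives in the course utils / fastbook module, and the exact attribute holding the URLs ('contentUrl' versus 'content_url') has changed between versions, so treat those details as assumptions.

    import os
    from PIL import Image
    from fastai.vision.all import *
    from fastbook import search_images_bing   # tiny helper from the course repo

    key = os.environ.get('AZURE_SEARCH_KEY', 'XXX')   # your Bing / Azure API key
    results = search_images_bing(key, 'grizzly bear')
    urls = results.attrgot('contentUrl')              # list of image URLs (~150 by default)

    dest = Path('images/grizzly.jpg')
    dest.parent.mkdir(exist_ok=True)
    download_url(urls[0], dest)                       # fastai helper: fetch one image
    im = Image.open(dest)
    im.to_thumb(128, 128)                             # quick visual check in a notebook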
So then what I did was I said okay let's try and create a model that can recognize 01:18:29.480 |
grizzly bears versus black bears versus teddy bears so that way I can find out I 01:18:35.280 |
could set up some video recognition system near our campsite when we're out 01:18:40.800 |
camping that gives me bear warnings but if it's a teddy bear coming then it 01:18:45.600 |
doesn't warn me and wake me up because that would not be scary at all so then I 01:18:50.200 |
just go through each of those three bear types create a directory with the name 01:18:55.760 |
of grizzly or black or teddy bear search being for that particular search term 01:19:02.640 |
along with bear and download and so download images is a fast AI function as 01:19:09.160 |
well so after that I can call get image files which is a fast AI function that 01:19:16.040 |
will just return recursively all of the image files inside this path and you can 01:19:21.080 |
see it's given me bears / black / and then lots of numbers. 01:19:29.480 |
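Spelled out, that download loop looks something like this, again following the book's notebook and continuing from the snippet above (same caveat about the URL attribute name):

    bear_types = 'grizzly', 'black', 'teddy'
    path = Path('bears')

    if not path.exists():
        path.mkdir()
        for o in bear_types:
            dest = path/o
            dest.mkdir(exist_ok=True)
            results = search_images_bing(key, f'{o} bear')
            # download_images is the fastai helper that fetches every URL into dest
            download_images(dest, urls=results.attrgot('contentUrl'))

    fns = get_image_files(path)   # recursively lists every image file under path
    fns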
So one of the things you have to be careful of is that a lot of the stuff you download will 01:19:32.360 |
turn out to be like not images at all and will break so you can call verify 01:19:36.800 |
images to check that all of these file names are actual images and in this case 01:19:44.180 |
I didn't have any failed so this it's empty but if you did have some then you 01:19:50.160 |
would call path dot unlink, and path dot unlink is part of the Python 01:19:56.000 |
standard library and it deletes a file and map is something that will call this 01:20:02.120 |
function for every element of this collection this is part of a special 01:20:10.080 |
fast AI class called L it's basically it's kind of a mix between the Python 01:20:16.160 |
standard library list class and a NumPy array class and we'll be learning more 01:20:21.840 |
about it later in this course but it basically tries to make it super easy to 01:20:26.040 |
do kind of more functional style programming and Python so in this case 01:20:31.720 |
it's going to unlink everything that's in the failed list which is probably what 01:20:37.040 |
we want because they're all the images that failed to verify. 01:20:42.280 |
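That cleanup step is just a couple of lines (continuing from the snippet above):

    failed = verify_images(fns)   # the files that can't actually be opened as images
    failed
    failed.map(Path.unlink)       # L.map calls Path.unlink on each broken file, deleting it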
Alright so we've now got a path that contains a whole bunch of images and they're classified 01:20:48.760 |
according to black grizzly or teddy based on what folder they're in and so to 01:20:55.320 |
create so we're going to create a model and so to create a model the first thing 01:20:59.920 |
we need to do is to tell fast AI what kind of data we have and how it's 01:21:07.120 |
structured now in part in lesson one of the course we did that by using what we 01:21:13.960 |
call a factory method which is we just said image data loaders dot from name 01:21:20.040 |
and it did it all for us those factory methods are fine for beginners but now 01:21:28.040 |
we're into lesson two we're not quite beginners anymore so we're going to show 01:21:31.120 |
you the super super flexible way to use data in whatever format you like and 01:21:36.040 |
it's called the data block API and so the data block API looks like this 01:21:46.080 |
here's the data block API you tell fast AI what your independent variable is and 01:21:54.040 |
what your dependent variable is so what your labels are and what your input data 01:21:57.800 |
is so in this case our input data are images and our labels are categories so 01:22:05.560 |
the category is going to be either grizzly or black or teddy so that's the 01:22:12.040 |
first thing you tell it that that's the block parameter and then you tell it how 01:22:16.160 |
do you get a list of all of the in this case file names right and we just saw 01:22:20.760 |
how to do that because we just called the function ourselves the function is 01:22:23.820 |
called get image files so we tell it what function to use to get that list of 01:22:27.560 |
items and then you tell it how do you split the data into a validation set and 01:22:34.280 |
a training set and so we're going to use something called a random splitter which 01:22:37.960 |
just splits it randomly and we're going to put 30% of it into the validation set 01:22:42.000 |
we're also going to set the random seed which ensures that every time we run 01:22:46.280 |
this the validation set will be the same and then you say okay how do you label 01:22:51.960 |
the data and this is the name of a function called parent label and so 01:22:56.520 |
that's going to look for each item at the name of the parent so this this 01:23:03.120 |
particular one would become a black bear and this is like the most common way for 01:23:08.960 |
image data sets to be represented is that they get put the different images 01:23:13.240 |
get the files get put into folder according to their label and then 01:23:19.200 |
finally here we've got something called item transforms we'll be learning a lot 01:23:22.960 |
more about transforms in a moment that these are basically functions that get 01:23:26.760 |
applied to each image and so each image is going to be resized to a 128 by 128 square. 01:23:34.160 |
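Put together, the data block being described looks roughly like this, following the book's notebook; the seed value here is an arbitrary choice.

    bears = DataBlock(
        blocks=(ImageBlock, CategoryBlock),               # inputs are images, labels are categories
        get_items=get_image_files,                        # how to get the list of items (file names)
        splitter=RandomSplitter(valid_pct=0.3, seed=42),  # 30% validation set, fixed random seed
        get_y=parent_label,                               # label each item by its parent folder name
        item_tfms=Resize(128))                            # item transform: resize to 128x128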
So we're going to be learning more about the data block API soon but 01:23:39.680 |
basically the process is going to be it's going to call whatever is get 01:23:42.240 |
items which is a list of image files it's then I'm going to call get X get Y 01:23:47.680 |
so in this case there's no get X but there is a get Y so it's just parent 01:23:51.240 |
label and then it's going to call the create method for each of these two 01:23:55.360 |
things it's going to create an image and it's going to create a category it's 01:23:59.080 |
then going to call the item transforms which is resize and then the next thing 01:24:04.040 |
it does is it puts it into something called a data loader a data loader is 01:24:07.760 |
something that grabs a few images at a time I think by default at 64 and puts 01:24:13.840 |
them all into a single it's got a batch it just grabs 64 images and sticks them 01:24:18.760 |
all together and the reason it does that is it then puts them all onto the GPU at 01:24:23.320 |
once so it can pass them all to the model through the GPU in one go and 01:24:30.360 |
that's going to let the GPU go much faster as we'll be learning about and 01:24:35.200 |
then finally we don't use any here we can have something called batch 01:24:38.680 |
transforms which we will talk about later and then somewhere in the middle 01:24:43.280 |
about here conceptually is the splitter which is the thing that splits into the 01:24:48.680 |
training set and the validation set so this is a super flexible way to tell 01:24:54.560 |
fast AI how to work with your data and so at the end of that it returns an 01:25:03.120 |
object of type data loaders that's why we always call these things DL's right so 01:25:08.880 |
data loaders has a validation and a training data loader and a data loader as 01:25:15.480 |
I just mentioned is something that grabs a batch of a few items at a time and 01:25:19.880 |
puts it on the GPU for you so this is basically the entire code of data loaders 01:25:26.920 |
so the details don't matter I just wanted to point out that like a lot of 01:25:31.120 |
these concepts in fast AI when you actually look at what they are there 01:25:34.800 |
they're incredibly simple little things it's literally something that you just 01:25:38.680 |
pass a few data loaders to, and it stores them in an attribute and 01:25:43.160 |
gives you the first one back as dot train and the second one back as dot valid. 01:25:47.000 |
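For reference, the simplified DataLoaders class shown in the book is essentially just this, reproduced from memory, so treat it as approximate; the real fastai class has a little more to it.

    from fastcore.basics import GetAttr, add_props   # both come from fastcore

    class DataLoaders(GetAttr):
        def __init__(self, *loaders): self.loaders = loaders
        def __getitem__(self, i): return self.loaders[i]
        train, valid = add_props(lambda i, self: self[i])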
So we can create our data loaders by first of all creating the data block 01:25:57.680 |
and then we call the data loaders passing in our path to create DL's and 01:26:02.400 |
then you can call show batch on that you can call show batch on pretty much 01:26:06.360 |
anything in fast AI to see your data and look we've got some grizzlies we've got 01:26:10.700 |
a teddy we've got a grizzly so you get the idea. 01:26:19.880 |
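Those two steps in code (the max_n value is just a display choice):

    dls = bears.dataloaders(path)   # build the train/valid DataLoaders from the DataBlock
    dls.show_batch(max_n=9)         # show_batch works on pretty much anything in fastai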
I'm going to look at data augmentation next week so I'm going to 01:26:23.360 |
skip over data augmentation and let's just jump straight into training your 01:26:27.200 |
model so once we've got DL's we can just like in lesson one call CNN learner to 01:26:38.600 |
create a resnet we're going to create a smaller resnet this time a resnet 18 01:26:43.080 |
again asking for error rate we can then call dot fine-tune again so you see it's 01:26:48.080 |
all the same lines of code we've already seen and you can see our error rate goes 01:26:52.800 |
down from 9% to 1% so you've got 1% error after training for about 25 seconds. 01:26:58.960 |
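The two lines being referred to are the same shape as in lesson one, just with the smaller architecture; the epoch count here is a guess at what was run in the lesson.

    learn = cnn_learner(dls, resnet18, metrics=error_rate)
    learn.fine_tune(4)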
so you can see you know we've only got 450 images we've trained for well less 01:27:05.320 |
than a minute. Let's look at the confusion matrix so we can 01:27:09.640 |
say I want to create a classification interpretation class I want to look at 01:27:14.840 |
the confusion matrix and the confusion matrix as you can see it's something 01:27:19.540 |
that says for things that are actually black bears how many are predicted to be 01:27:24.280 |
black bears versus grizzly bears versus teddy bears so the diagonal are the ones 01:27:31.280 |
that are all correct and so it looks like we've got two errors we've got one 01:27:34.580 |
grizzly that was predicted to be black one black that was predicted to be 01:27:37.760 |
grizzly. A super useful method is plot top losses that'll actually show me 01:27:48.280 |
what my errors actually look like so this one here was predicted to be a 01:27:53.420 |
grizzly bear but the label was black bear this one was the one that's 01:27:58.000 |
predicted to be a black bear and the label was grizzly bear these ones here 01:28:03.440 |
are not actually wrong though, this is predicted to be black and it's actually 01:28:06.360 |
black but the reason they appear in this is because these are the ones that the 01:28:12.160 |
model was the least confident about. 01:28:18.520 |
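The interpretation calls being described are below; the arguments to plot_top_losses are just a display choice.

    interp = ClassificationInterpretation.from_learner(learn)
    interp.plot_confusion_matrix()        # actual labels vs predictions; the diagonal is the correct ones
    interp.plot_top_losses(5, nrows=1)    # the images with the highest loss: wrong, or least confident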
Okay so we're going to look at the image classifier cleaner next week, let's focus on how we then get this into production 01:28:24.160 |
so to get it into production we need to export the model so what exporting the 01:28:32.680 |
model does is it creates a new file which by default is called export dot 01:28:38.200 |
pkl which contains the architecture and all of the parameters of the model 01:28:44.160 |
so that is now something that you can copy over to a server somewhere and 01:28:50.160 |
treat it as a predefined program right so then the process of using your 01:28:58.840 |
trained model on new data kind of in production is called inference so here 01:29:06.200 |
I've created an inference learner by loading that learner back again right and 01:29:11.280 |
so obviously it doesn't make sense to do it right after I've saved it 01:29:16.760 |
in a notebook but I'm just showing you how it would work right so this is 01:29:20.360 |
something that you would do on your server for inference and remember that once 01:29:26.320 |
you have trained a model you can just treat it as a program you can pass 01:29:30.660 |
inputs to it so this is now our program this is our bear predictor so I 01:29:35.800 |
can now call predict on it and I can pass it an image and it will tell me 01:29:42.680 |
that it is ninety nine point nine nine nine percent sure that this is a 01:29:47.760 |
grizzly. 01:29:53.200 |
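Roughly, the export and inference steps look like this, following the book's notebook; the image path is just the file downloaded earlier.

    learn.export()                        # writes 'export.pkl': architecture plus parameters
    path = Path()
    path.ls(file_exts='.pkl')             # confirm the exported file is there

    # On the server ("inference"): load the exported learner and use it like a program
    learn_inf = load_learner(path/'export.pkl')
    learn_inf.predict('images/grizzly.jpg')
    # returns (predicted class, class index, per-class probabilities)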
So I think what we're going to do here is we're going to wrap it up here and next week we'll finish off by creating an actual GUI for our bear 01:30:03.160 |
classifier we will show how to run it for free on a service called binder and 01:30:16.000 |
yeah and then I think we'll be ready to dive into some of the some of the 01:30:21.560 |
details of what's going on behind the scenes any questions or anything else 01:30:26.200 |
before we wrap up? Rachel? No? Okay great, all right, thanks everybody. So we 01:30:36.320 |
hopefully yeah I think from here on we've covered you know most of the key 01:30:44.040 |
kind of underlying foundational stuff from a machine learning point of view 01:30:48.240 |
that we're going to need to cover so we'll be able to ready to dive into 01:30:54.160 |
lower level details of how deep learning works behind the scenes and I think 01:31:01.440 |
that'll be starting from next week so see you then