USF Deep Learning Info 2020


Chapters

0:18 Course logistics
9:26 Why top-down learning?
20:50 Research examples
25:21 The fastai library
25:30 PyTorch is taking over the research world
28:33 Helping rescue misguided tensorflow
32:49 fastai's layered API
36:36 Image classification
37:34 Custom labelling
37:46 Segmentation
38:14 Object detection

Transcript

I'm Jeremy. This is Sylvain, who helped me develop this course, and I'm the trainer. We're here to talk about deep learning, so hopefully you're here to learn about that. Otherwise you're in the wrong place. Don't ask me much about logistics, because I don't think that's my job.

This is the basic deal. I wanted to say a couple of things about prerequisites. It really is a course for coders. So if you're not a strong coder, you can do it. You just have to work hard. We're not even going to tell you how to code in Python.

If you've never coded in Python but you've done two or three other languages, you'll pick it up super fast. It'll be fine. If you've done a little bit of MATLAB scripting and that's about it, you're going to have to put in more hours. Have a look at the recordings of some of the previous classes to get a sense of the amount of coding chops that are expected.

But really it's about, you'll get out of it what you put into it. The location is here, PG&E, so here's 101 Howard where we are. So it's just a couple of blocks away, but don't turn up here for the class because it's not here. It's a much bigger auditorium because it's quite a popular course.

However, there is a study group here every day throughout the course. It's not office hours. It's not tutoring. It's a study group, so you can come and hang out at USF. We kind of particularly set it up because a lot of people fly in from overseas and they're just here the whole time, so it's kind of a nice place for them to work from.

But I'll show you some examples of student projects that have come out of the course, and most of them happened in the study group. So that's a particularly good thing to do. A lot of people put their jobs on hiatus during the course so they can focus on it full time, and if that's you, you probably want to come along to the study group and get involved.

So it's really like doing projects and hanging out with other like-minded students. Okay, so why learn deep learning? Well, deep learning is quite good at quite a lot of things. These are all things that deep learning is the best in the world at right now, and for many of these things, superhuman as well.

I won't go through all of them, but basically for complex problems, particularly those involving some amount of pattern recognition and analogy making, deep learning tends to work very well, and it's used very widely in industry and scientific research and so forth. So I saw this coming a while ago and got pretty excited about it, and also kind of nervous about it, because to me when a big technology comes along that changes what's possible and how much people can do, it gives big opportunities, but it can also be a bit of a threat if it all ends up in the hands of a small homogenous group of people.

So kind of our mission is to get this tool into the hands of as many people as possible. One of the things that stops people from getting into deep learning is a view that they can't do it or that deep learning is not right for them. This is a list of reasons people tell me that they're not doing deep learning.

None of them are true. As you'll see in the course, you can get great results with 30 or 40 images. We do nearly all of our work on a single laptop computer. You can use deep learning for a really wide range of applications. We're not going to be building a brain.

We're not claiming this is artificial general intelligence and it's not something we're going to talk about at all. We're just talking about deep learning as a tool to get stuff done. So there's a strong connection between the University of San Francisco and fast.ai. The University of San Francisco is the oldest university in San Francisco.

The main campus is on the other side of town. This downtown campus is where the Data Institute lives, which is where all the data science stuff happens. How many of you are familiar with fast.ai? The courses you see on fast.ai all got recorded here at USF or at PG&E.

They're all USF courses that get turned into MOOCs. One obvious question then would be why do the in-person course when it'll be online in July? The obvious answer, well, A, you're going to be the first to see the material by quite a bit. You're going to be doing it with a bunch of like-minded people.

Being in an in-person group of people who are all studying the same thing at the same time is pretty different, and it's interesting to see how many of the best students who go on to do the most high-impact work were in the in-person course. When you think there's like two or three hundred people that do the in-person course versus two or three hundred thousand that do the online course, it's kind of interesting to see that.

fast.ai does a few things. So one of them is education. But all of the things that fast.ai does are about making deep learning accessible, as I described. So as well as the education, there's a community, an online community that we build and help develop. We do research and we build software.

All of these things are very connected to the course that I want to talk about. So I'm going to bring all these together because the stuff we do at USF is all deeply connected to the mission of fast.ai. The community all happens here on the forums, forums.fast.ai.

And one of the really interesting things about this is that during the period that the live course is going on, there's a whole other level of activity on the forums. One of the reasons for that is that we actually invite this year the top 800 participants from the forums to participate live in the course with you through a live stream.

So that's an invite-only thing where the best people from the community get to participate. And so most of those folks are expert practitioners. Many of them have published papers or PhDs or whatever. And the quid pro quo is that they help during the course answering questions and expanding on things that people are interested in and so forth.

So there's this kind of huge uptick in activity on the forums that goes on during the in-person course. And the actual category where that's happening is a private category just for the people that are in the course, the invited live stream or the in-person folks until the online version comes out in July.

So it's kind of like your private study group of a thousand people around the world. So as I mentioned, the courses that get recorded here at USF get remixed into an online course. The online courses that we've developed have been super popular, so nearly a million hours of time spent around the world on watching this material over three million views.

One of the reasons that the course has been so popular is it's kind of upside down compared to most technical teaching, which is that you learn how to do things before you learn why those things work. So we describe that as the difference between a bottom-up teaching approach and a top-down teaching approach.

So bottom-up is what most university technical material looks like. It kind of starts with addition and then subtraction and then gradually builds up until in your PhD program you learn to do something useful. And a lot of people who study math, for example, say, "I didn't actually get to appreciate the beauty of this subject until I got to my PhD." There's another way to learn, which is top-down, which is how we learn music or baseball: you put an instrument in somebody's hands and get them playing music, and then gradually over the next ten years they learn about harmonies and theory and history and whatever.

So we teach deep learning more like how people teach music or sports. So you get started using stuff right away, and this means that you avoid these problems of the bottom-up approach. You have motivation from the start because you're building useful stuff. You have the context, human brains love context, so you know why things are being done.

And you understand which bits are important because we don't teach the lower-level pieces until we need them. So one misunderstanding of the top-down approach is some people think it's kind of dumbed down or has less theory and foundations and that couldn't be further from the truth because what happens is with the top-down approach as we peel the layers away, we do eventually get to that core and so we actually end up seeing the math and the theory and so forth.

Having said that, the math is not taught with Greek letters and math notation. The math is taught with code because our view is, well, there's a couple of things. The first is all the math ends up being turned into code anyway to actually get the computer to do something.

So you may as well see it in the form it's going to be used. When it's shown in the form of code you can experiment with it. You can put inputs into it and see the outputs come out of it. You can see what's going on. And also, why learn two whole separate languages and notations?

If you know how to code, then let's do that. So we teach the math that's necessary to actually understand the foundations. We introduce a bit of notation, because sometimes you just have to read a paper to see how something works, so we try to show you how to make sense of papers.

But the vast majority of the explanation is as code. Here's a piece of math. Now let's look at the code of the math and see how it maps. One thing that you might find interesting is to look at some student work. So something that I did a while ago was to put up a post on the forum saying like, "Oh, after lesson one, if you've made something interesting, let us know." And this was now quite a while ago.

I did this when there was a thousand replies. There's probably more like 2,000 now. Lots of people have posted and said like, "Oh, here's something I built." And it's been really cool because there are people all around the world that do the course, even the live course, because of the 800 I mentioned, who are also live streaming it.

You get all these interesting projects going on. So one person talked about a recognition program to see different types of Trinidad and Tobago masqueraders versus regular islanders. Somebody did zucchinis versus cucumber. And one of the interesting things here is it's like 47 cucumbers and 39 zucchinis. And they got, I think, it was 100% accuracy.

So we don't need lots of data, even for things that are pretty subtle, like cucumbers versus zucchinis. This was an interesting one, which is predicting what city a satellite photo is of, which is kind of something that I doubt any of us humans could do. But a computer was doing it to 85% accuracy for 110 cities, which is amazing.

Looking at Panamanian buses, Batik cloth patterns, Tanzanian building conditions. And it turns out that it's not at all uncommon, like every course, there's lots of examples of people who discover that they have a new state of the art. Because there are still more things that deep learning has not been applied to than things it has been applied to.

So whatever you come in here with an interest in as a hobby or as your vocation or whatever, hopefully you can try these techniques on that thing. So it turned out that Sivash got a state of the art result on Devanagari character recognition, literally using the lesson one course material.

Ethan got a state of the art result on environmental sound classification. One of the interesting things here is lesson one is all about image classification, but you can turn a lot of things into images. So in this case he converted sounds into images representing those sounds by creating things called spectrograms.
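To make the sound-to-image idea concrete, here is a minimal sketch of a spectrogram in plain Python. It is not Ethan's code: the frame length, hop size, and synthetic test tone are assumptions chosen for illustration, and a real project would use an FFT library rather than this naive transform.

```python
import cmath
import math

def spectrogram(signal, frame_len=32, hop=16):
    """Return a list of frames; each frame holds one magnitude per frequency bin."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window to reduce spectral leakage at the frame edges
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        # Naive discrete Fourier transform, keeping non-negative frequencies only
        mags = []
        for k in range(frame_len // 2 + 1):
            s = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(windowed))
            mags.append(abs(s))
        frames.append(mags)
    return frames

# Hypothetical input: a pure tone with 4 cycles per 32-sample frame
tone = [math.sin(2 * math.pi * 4 * n / 32) for n in range(128)]
spec = spectrogram(tone)
# The brightest "pixel" in each column of the image is the dominant frequency
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(len(spec), "frames; peak bin per frame:", peak_bins)
```

Each row of `spec` is one column of the resulting picture, which is what lets an image classifier be applied to audio.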

And then he compared to this paper to see what the state of the art was and got the best result. Elena is an example of somebody who took it to a whole other level. She actually, during the course, did three posts all in the area of genomics. She's one of the top scientists at Human Longevity International.

And in every case she showed a significant advance of the state of the art in different genomics areas. So we actually, like you'll see there's a lot of writing here. You don't have to write, but a lot of students do. We encourage it because it's a great way to develop your understanding of the material, is to try to write it down, you know.

And so we do talk a bit about writing, and a lot of students try their hand at writing something about, like in Elena's case, a combination of deep learning and something you know about. Some of the student projects go big. A good example is Jason Antic, who, during the course, created this project he called DeOldify.

So in lesson seven last year we showed a new approach to generative image models, where we said, like, oh, you can take a picture as an input and create a new picture as an output. And Jason thought, oh, I wonder what would happen if I did that to take black and white input images and turn them into color output images.

And as you can see, it worked amazingly. And as of last week he just announced that he's now quit his job. He now has a new company. He's just sold it to the world's largest ancestry online site. And in the first week they had over, I think they said, a million people use his system to colorize photos of their relatives.

It's even kind of created new communities of practitioners. A lot of radiologists have taken the course. Folks like Alex, who went on to get together with a bunch of other folks and create a well-regarded paper. He also won a Kaggle competition on pneumonia detection, a very significant Kaggle competition.

He's now widely regarded as one of the experts in the field of deep learning and medical imaging, even though I think he's still a resident or recently just finished being a resident. It's been cool to see there's lots of radiologists now, particularly younger folks like Alex, who are expert practitioners in deep learning and also deeply understand their field of radiology and are bringing the two together to do some really powerful stuff.

Melissa Fabros actually did some super exciting work. She kind of really pioneered the study of facial recognition algorithms on people of color. It turned out that they didn't work real well. And this is important because she was helping Kiva, which does microlending, mainly in markets where there's not that many white folks.

So they tried to kind of use the algorithms that are out there. They didn't work very well. And she won this million-dollar AI for Everyone Challenge round. And a lot of these people don't have traditional machine learning backgrounds. So at the time, I think she was doing a PhD in English literature, as I mentioned, Alex, radiology.

Another alumnus was a lawyer, and he built this super impressive system for GitHub showing how you can search for code in English and get back code snippets. Christine Payne wrote a neural net that created this. So this music generator, she developed... So after doing the fast.ai course, and actually she was in the in-person USF course, she went to OpenAI and became a resident there, and she wrote this, which went on to be produced by the BBC Philharmonic.

And so her background is... that's her as a pianist. So here's a great example of people bringing their domain expertise together with their deep learning skills. It's important to realize, folks like Christine, it's not like she just does a single course and she's instantly an expert. She worked very hard over a significant period of time and is also a genius, which helps.

But the point here is that you absolutely can bring together domain expertise and a deep understanding of deep learning that you can get through this USF course and the other fast.ai online courses and so forth and pull it together into something super cool. So a lot of our research ends up in the course, and actually a lot of that research happens during the course, particularly in the study group.

So, for example, MIT Technology Review wrote about how a small team of student coders beat Google. They didn't just beat Google but also Intel to create the fastest ever training of ImageNet and CIFAR-10, two of the most important computer vision benchmarks in the world. This was a competition called DAWNBench.

So, yeah, this was exciting in the study group. A few of us decided, hey, let's see if we can take a crack at this competition. We spent a couple of weeks giving it our best shot. And really the trick was we didn't have the resources, like Intel entered, and they entered by combining a thousand servers into a mega cluster and running it on this mega cluster.

We had one computer. So we had to think, how do we beat a thousand computers with one computer? So we had to be creative and thoughtful. And so we've had lots of examples of this, both from projects that we've done in our research and our students, again and again, get state-of-the-art results with far fewer resources than the big guys.

Another interesting example of research that actually came out of a course was a couple of years ago, maybe three years ago now. I thought it would be great to show the students how transferable ideas are. And so the first two lectures were mainly about computer vision images. And so I wanted to show what would happen if you took those same ideas and applied them to text.

And it turned out that nobody had really done that before. And I didn't know much about text, natural language processing, but I thought I'd give it a go. And within a few hours of trying it, it turned out I had a new state-of-the-art result for one of the most widely studied text classification datasets.

So somebody who saw that course, who was then doing his PhD in this area, got in touch and offered to write it up into a paper. And we ended up publishing a paper together that got published in the top journal for computational linguistics. And actually kind of went on to help kick-start a new era in NLP, even got written up in the New York Times.

And today this idea of using these computer vision-based transfer learning techniques in NLP is probably the most important NLP development going on at the moment. Another area that we've really focused on in trying to -- this is all trying to make deep learning more accessible, right? So less computers, less data, less specialist knowledge.

So one of the things that's been holding back people from using deep learning is there's a lot of parameters to tweak, settings to get just right, like learning rates and regularization and optimization parameters and whatever. So one of the things we do is we just look out there to find the research that's already been done but has been overlooked to solve these problems.

So, for example, the most important parameter to set is something called a learning rate. And we discovered there was already a paper showing how to set it really well in like a minute, whereas previously people were using vast compute clusters to try out lots of different values. So we kind of popularize that and put it into the software and make it easy to use.

It's called the learning rate finder. The other thing we do is we tried lots of different -- these settings that we tweaked, they're called hyperparameters. We tried lots of different hyperparameters across lots of different data sets and found a set of hyperparameters that just work nearly all the time.

And we made them all the defaults. So one of the things we show in the course is how to like not waste your time and money doing stuff that, you know, you just don't have to fiddle with. There's already well-known good settings or there's easy ways to figure them out pretty quickly.
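The learning rate finder idea can be sketched as a range test: take one training step per learning rate while raising the rate exponentially, and watch where the loss stops improving. The following is a minimal sketch in plain Python on a toy one-weight model, not fastai's implementation; the data, the stopping threshold, and the "divide by ten" rule of thumb are assumptions for illustration.

```python
import math

def lr_find(xs, ys, start_lr=1e-4, end_lr=10.0, steps=100):
    """Range test: raise the learning rate exponentially, one SGD step per value."""
    w = 0.0                                        # single weight of the toy model y = w * x
    factor = (end_lr / start_lr) ** (1 / (steps - 1))
    lr, history, best = start_lr, [], math.inf
    for _ in range(steps):
        preds = [w * x for x in xs]
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        history.append((lr, loss))                 # loss measured before stepping with lr
        best = min(best, loss)
        if loss > 4 * best or math.isnan(loss):    # stop once the loss blows up
            break
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad
        lr *= factor
    return history

xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]                         # hypothetical data generated with w = 3
history = lr_find(xs, ys)
lr_at_min = min(history, key=lambda pair: pair[1])[0]
suggested = lr_at_min / 10                         # back off from the point of lowest loss
print(f"lowest loss near lr={lr_at_min:.3f}, suggested lr={suggested:.3f}")
```

The whole sweep is a single short training run, which is why it takes on the order of a minute rather than a cluster of experiments.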

So it's a very kind of practical approach. So one of the really interesting things about it is the use of something called the fastai library. So the fastai library is a library that sits on top of another library called PyTorch. In deep learning, there are two libraries that pretty much everyone uses, PyTorch and TensorFlow.

TensorFlow came out of Google, with vast resources, a few years ago. And we used to teach it in this course. But we got to a point where it wasn't flexible enough to handle what we wanted to show students how to do with it, and it wasn't flexible enough for the research we wanted to do.

And luckily, at that time, a couple of years ago, a new library came out called PyTorch, which was really just a couple of people wrote it. So far fewer resources. But interestingly, it kind of had a bit of a fast.ai feel to it in that because they didn't have the resources, they had to be super careful to curate the best approach to each thing, to make sure they did each thing just once and to be careful.

And we thought it was amazingly great and we switched everything over to PyTorch. And what's happened since then is PyTorch is kind of taking over the world. The first place you take over the world is in research because when the researchers flock to your software, then all of the new developments come out with your software and anybody who wants to use them has to do it with your software.

So if we look at the last few academic conferences, in each case, the percentage of papers using PyTorch is over 50%. And you can see that's happened basically in one year. So you can clearly see that PyTorch is going to take over research and industry in the next year or two.

So there's no point in learning what people used to use. And there's a reason that this is happening, it's just much better. So we focus on that. The only issue is that PyTorch is kind of a lower level plumbing library. You have to write everything yourself. So that's no good really for like particularly a top-down approach to getting stuff done.

So we wrote our own library on top of that called fastai, which just makes a lot of things much easier. And so fastai is now super popular in its own right. It's available on every major cloud platform. Lots and lots of research is coming out of it. Lots of Fortune 500 companies are using it.

And it's kind of really cool. We often get messages like this from people saying, "Oh, I just started using fastai. I used to use TensorFlow and the first thing I tried, everything's so much better. How could it be this much better? I thought deep learning was deep learning." And then somebody else replied, founder of this company, saying, "Yep, that's what we found.

We used to use TensorFlow. We spent months tweaking it. We used fastai and immediately got better results." So I mean, the main thing we teach is the concepts, the understanding of what we're doing. So in a sense, having great software doesn't matter as much. But it sure is nice that when you do something correctly, you get a world-class result rather than having to spend months fiddling around with things.

So for example, with the previous version of fastai, when we compared it to Keras, which is the main equivalent kind of API on top of TensorFlow, comparing it to the code that Keras makes available for a particular problem, we found our error was about half of the Keras error.

Our speed was about twice the Keras speed. The amount of lines of code was about one-sixth of the Keras lines of code. Lines of code is important. It's really important because these extra 26 lines of code is like 26 lines of things you have to tweak and change and choose, which is cognitive overhead that you have to deal with.

And if it doesn't work, which one of those extra lines was the problem? Where did you make the mistake? So when we used to use Keras, we used to teach it in the course, this would happen all the time. I would keep finding that things didn't work as well as I hoped they would, and there would be days of figuring out why, and there would be one of those lines of boilerplate where something was true rather than false.

So our view is that you shouldn't have to type more lines of code than necessary, but by the same token, everything should be infinitely customizable. So to get to that point, this course will be the first-ever course to show how to use fastai version 2, which is officially coming out in July, so we'll be using the pre-release version of it.

And fastai version 2 is a rewrite from scratch that is described fully in this peer-reviewed paper in the journal Information and will be in this O'Reilly book, which you'll all be able to access for free during the course. And it's a huge advance over anything that's come before.

As I said, it's a rewrite from scratch, and it's very much designed to be what we described in the paper as a layered API. It's all about what happens when you take a coder mentality to a deep learning library and you think hard about things like refactoring and separation of concerns and stuff that software engineers really care about.

So with fastai version 2, there's a lot of interesting stuff that you'll be the first ones to learn about and experiment with. So for example, there are diagnostics that print out after training that create a picture. This is a unique, new kind of picture showing you what's going on inside a network as it trains.

I won't describe it fully here, but basically what this is showing is the first one, two, three, four layers of a deep neural network, and it's showing that the numbers in the neural network, they're called activations, are growing exponentially and crashing, and growing exponentially and crashing. This is really bad, it turns out.

And so there are these pictures that you can get out of the training process to actually look inside and see what's going on. And one of the nice things about this picture is it was actually developed during the last course in a study group. So one of the international visitors from Italy, Stefano, actually helped draw out all the different ways that we could build this visualization, and it's ended up in the library.

So as I mentioned, it's a layered API, and the course will focus initially on the top layer, which we call the applications. And these are the four things which are kind of pretty well established as being things deep learning is very good at, and we know how to do them, and we kind of really know how to do them properly, and it should work out of the box each time.

So we start there, and then we gradually delve into the mid-layer API, which is the components, the flexible components that the applications are built on. And then eventually we get to the foundation, which is the lowest level plumbing that everything is built on top of. So for the vision application, for example, this is all the code you need with fastai to create a world-class classifier for recognizing pet breeds, and it takes, as you see, 35 seconds to run.

This is on a single computer, and it's one, two, three, four lines of code. The lines of code don't matter so much other than to point out that if you wanted to switch from an image classifier to something that can handle segmentation, this is segmentation. It's where you have a photo, and you want to color code every pixel to say what is it a pixel of.

So here green is a road, red is a sidewalk, orange is a light pole, and so forth. So you can see again, it's one, two, three, four lines of code, nearly the same four lines of code to do segmentation. And this show_batch, which visualizes the contents, is going to be the same each time too.
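Color-coding a segmentation mask amounts to a per-pixel lookup from class to color. Here is a minimal sketch, assuming a hypothetical three-class palette; the class ids and RGB values are made up to mirror the colors described above and are not taken from fastai or from the slide.

```python
# Hypothetical class ids and colors, chosen only to mirror the description above
PALETTE = {
    0: (0, 255, 0),      # road -> green
    1: (255, 0, 0),      # sidewalk -> red
    2: (255, 165, 0),    # light pole -> orange
}

def colorize(mask):
    """Turn a 2D grid of class ids into a 2D grid of RGB tuples."""
    return [[PALETTE[class_id] for class_id in row] for row in mask]

# A tiny 3x3 "mask": one class id per pixel, which is what a segmentation model predicts
mask = [
    [1, 1, 2],
    [0, 0, 2],
    [0, 0, 0],
]
image = colorize(mask)
print(image[0][2])   # color of the top-right pixel
```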

Yeah? >> For the color select for those, is it random by the machine? >> The color selection is coming from something called a color map. So the plotting library we use is something called Matplotlib, and Matplotlib has a variety of color maps, and we just have a default color map that tries to select colors that are nicely distinctive.

Same thing for text. So this is how to get world class results on sentiment analysis, and again, it's the same lines of code, and again, show_batch will now tell us here's the text and here's the labels. Tabular data, stuff that's in spreadsheets or database tables, is something that a lot of people don't realize actually works great with deep learning.

And again, it's basically the same lines of code, so here's one to predict who's going to be a high-income earner versus a low-income earner based on socioeconomic data. There's one extra step, which is you have to say which ones are categorical and which ones are continuous, which we'll learn all about, but other than that, it's the same steps.
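The categorical versus continuous distinction matters because the two kinds of column are prepared differently. The sketch below shows one common treatment in plain Python: categorical values become integer codes (which a model would typically use to index an embedding table) and continuous values are standardized. The column names and rows are hypothetical, and this is not fastai's tabular code.

```python
import statistics

def preprocess(rows, cat_names, cont_names):
    """Encode categorical columns as integer codes and standardize continuous ones."""
    # One vocabulary per categorical column (sorted so the codes are reproducible)
    vocab = {c: {v: i for i, v in enumerate(sorted({r[c] for r in rows}))}
             for c in cat_names}
    # Mean and population standard deviation per continuous column
    stats = {}
    for c in cont_names:
        values = [r[c] for r in rows]
        stats[c] = (statistics.mean(values), statistics.pstdev(values))
    cats = [[vocab[c][r[c]] for c in cat_names] for r in rows]
    conts = [[(r[c] - stats[c][0]) / stats[c][1] for c in cont_names] for r in rows]
    return cats, conts, vocab

# Hypothetical socioeconomic rows
rows = [
    {"occupation": "sales", "education": "HS", "age": 30, "hours": 40},
    {"occupation": "tech", "education": "BSc", "age": 40, "hours": 50},
    {"occupation": "sales", "education": "BSc", "age": 50, "hours": 30},
]
cats, conts, vocab = preprocess(rows, ["occupation", "education"], ["age", "hours"])
print(cats)
print(conts)
```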

And then very related is collaborative filtering. Collaborative filtering is a really important technique for recommendation systems, so figuring out who's going to be interested in buying this product or looking at this movie based on the past behavior of similar customers, and again, it's the same basic lines of code.
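One standard way to do collaborative filtering is to learn a small vector of latent factors for each user and each item, and predict a rating as their dot product. The sketch below trains such factors with plain SGD; it is a minimal illustration, not fastai's implementation, and the ratings, factor count, and learning rate are assumptions.

```python
import random

def train_factors(ratings, n_users, n_items, n_factors=2, lr=0.02, epochs=500, seed=0):
    """Learn user and item factors so that their dot product approximates each rating."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = sum(a * b for a, b in zip(users[u], items[i])) - r
            for f in range(n_factors):
                # Gradients of 0.5 * err**2 with respect to each factor
                grad_u, grad_i = err * items[i][f], err * users[u][f]
                users[u][f] -= lr * grad_u
                items[i][f] -= lr * grad_i
    return users, items

def predict(users, items, u, i):
    return sum(a * b for a, b in zip(users[u], items[i]))

# Hypothetical (user, item, rating) triples; user 3 has not rated item 1
ratings = [(0, 0, 5), (0, 1, 5), (0, 2, 1),
           (1, 0, 5), (1, 1, 5), (1, 2, 1),
           (2, 0, 1), (2, 1, 1), (2, 2, 5),
           (3, 0, 1), (3, 2, 5)]
users, items = train_factors(ratings, n_users=4, n_items=3)
mse = sum((predict(users, items, u, i) - r) ** 2 for u, i, r in ratings) / len(ratings)
print("training MSE:", mse)
print("predicted rating for user 3 on item 1:", predict(users, items, 3, 1))
```

User 3 behaves like user 2 on the items both have rated, so the model's guess for the unseen item is driven by how user 2 rated it.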

So as I mentioned, we'll study all of those applications, we'll learn how to use them in practice and make them work well. But they're built on this mid-tier API that you can kind of mix and match. So I won't show you the whole thing, obviously, now, but an example of this is something called the DataBlock API, for example, if you want to do digit recognition. You've probably heard that for machine learning, for deep learning, a lot of the work is the data processing, it's getting the data into a form you can model.

We've realized there's basically four things you have to do to make that happen, and so we created this DataBlock API where you list the four things separately. So in this case, to do digit recognition, we say the input type is a black and white image, as you can see.

The output type is a category. Which digit is it? The items are image files. How do you split into training and validation? How do you get the label? So you just say each of the things, and these are all plain Python functions. So you can write your own Python code to replace any of these things.

And so once you've done that, you now have something called a DataLoader, which is a PyTorch concept, which is basically a thing you can train a model from. So you can use a very similar-looking DataBlock to do custom labeling for the pets example, or to label with multiple labels, for example, for satellite classification. For segmentation, it looks almost the same.

Instead of now having an image input and a category output, we have an image input and a mask output. If you're doing key points, so in this case we're looking for the center of people's faces, it's almost the same thing, but now we have an image input and a point output.

So this is an example of how it's really software engineering kind of basic principles of building these APIs with a nice decoupled separation of concerns, and the user ends up in a situation where they can build what they need, in a kind of fast and customized and easy way.
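The four-part recipe described above (types, how to get the items, how to split, how to label) can be sketched with plain Python functions. `TinyDataBlock` and the helper functions below are hypothetical stand-ins written for this illustration; they are not fastai's DataBlock API, and the file paths are made up.

```python
import random

class TinyDataBlock:
    """Hypothetical, simplified stand-in for the idea; not fastai's DataBlock."""
    def __init__(self, blocks, get_items, splitter, get_y):
        self.blocks = blocks            # 1. input and output types (labels only here)
        self.get_items = get_items      # 2. how to find the items
        self.splitter = splitter        # 3. how to split into training and validation
        self.get_y = get_y              # 4. how to get the label of an item

    def datasets(self, source):
        items = self.get_items(source)
        train_idx, valid_idx = self.splitter(items)
        make = lambda idxs: [(items[i], self.get_y(items[i])) for i in idxs]
        return make(train_idx), make(valid_idx)

def list_files(source):
    return sorted(source)

def random_split(items, valid_pct=0.25, seed=42):
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)
    cut = int(len(items) * valid_pct)
    return idxs[cut:], idxs[:cut]

def parent_folder(path):
    return path.split("/")[-2]          # the digit is the name of the containing folder

files = ["mnist/3/a.png", "mnist/7/b.png", "mnist/3/c.png", "mnist/7/d.png"]
block = TinyDataBlock(blocks=("black-and-white image", "category"),
                      get_items=list_files, splitter=random_split, get_y=parent_folder)
train, valid = block.datasets(files)
print(train, valid)
```

Because each part is an ordinary function, swapping `parent_folder` for a function that returns a mask path or a point would change the task without touching the rest, which is the decoupling the talk describes.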

So I'll show you one more example of this mid-tier API, which is something called optimizers. Optimizers are the things that actually train your model, and you'll be learning all about them. It turns out that optimizers are a current kind of big area of research interest, and in the last 12 months people have built some much better optimizers.

They work much better than the basic approach called SGD. And so when they do that, they release papers and they release code. So one of the important recent optimizers is something called AdamW. It's actually not that recent, it was a couple of years ago, but it took about two years for AdamW to get implemented in PyTorch.

And in PyTorch it takes all this code because the software engineering work of refactoring had never happened. So we realized when we looked at lots and lots of papers that you could refactor all of them into a small basic framework using callbacks. So this is the same thing. All this code gets turned into these three lines and this little gray bit.

So that's the equivalent. So for us, we had AdamW implemented the day after the paper came out. So one of the cool things about working with fastai is you often get to be the first to try out new research techniques because they're so easy to implement. Either us or somebody in the community will implement them.
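The refactoring idea, building optimizers out of small reusable pieces, can be sketched roughly as follows. This is a toy in pure Python on single float parameters, not fastai's or PyTorch's actual code; the `Optimizer` class and the callback names are assumptions made for illustration. SGD is one callback, and an AdamW-style update is the same loop with a few more callbacks.

```python
import math

# Callbacks that only update running statistics (they return None).
def count_step(state, **_):
    state["t"] = state.get("t", 0) + 1

def average_grad(state, grad, beta1, **_):
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad

def average_sqr_grad(state, grad, beta2, **_):
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad * grad

# Callbacks that return a new value for the parameter.
def weight_decay(p, lr, wd, **_):
    """Decoupled weight decay: shrink the weight directly."""
    return p * (1 - lr * wd)

def sgd_step(p, grad, lr, **_):
    return p - lr * grad

def adam_step(p, state, lr, beta1, beta2, eps, **_):
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected mean
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected variance
    return p - lr * m_hat / (math.sqrt(v_hat) + eps)

class Optimizer:
    """Hypothetical minimal optimizer: runs each callback in order."""
    def __init__(self, params, callbacks, **hypers):
        self.params = list(params)
        self.callbacks = callbacks
        self.hypers = hypers
        self.states = [{} for _ in self.params]

    def step(self, grads):
        for i, grad in enumerate(grads):
            p, state = self.params[i], self.states[i]
            for cb in self.callbacks:
                out = cb(p=p, grad=grad, state=state, **self.hypers)
                if out is not None:
                    p = out
            self.params[i] = p

sgd = Optimizer([1.0], [sgd_step], lr=0.1)
adamw = Optimizer(
    [1.0],
    [weight_decay, count_step, average_grad, average_sqr_grad, adam_step],
    lr=0.1, wd=0.01, beta1=0.9, beta2=0.999, eps=1e-8,
)
sgd.step([2.0])
adamw.step([2.0])
```

A new optimizer then means writing one or two new small functions and listing them, rather than rewriting the whole training step.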

So a really cool example of this is Google actually implemented a new optimizer to reduce the amount of time it took to train a very important NLP language model from three days to 76 minutes. And they created this thing called the LAMB optimizer. And in their paper, this is their algorithm.

And in fastai, this is the algorithm. And you can map basically line to line. And one of the nice things, if you're not a math person like me, I am not a math person, being able to see how the code matched to the math helps me get more comfortable with the math.

So actually I presented this to Google a few weeks ago, fastai v2. And it turned out one of the people in the room was one of the people on the paper. And he was just sort of happy to see his ideas so nicely expressed in code. So a lot of people are kind of funny about the idea of having a small amount of code as if that somehow decreases readability.

But a small amount of code means you're expressing the actual thing you want to express. So when it says, you know, compute this thing, there should be a line of code computing that thing. If it's 50 lines of code computing that thing, my brain can't cope. So, you know, fastai v2 is kind of designed for people with brains like mine that can't cope with too much complexity.

So we kind of removed it. Okay, so that's the quick overview. And we've got 15 minutes for questions. So, yeah, let me know what you want to know.

So I've watched several of the courses before. And by the way, they're really good. But one of the parts that I've struggled with is basically the math.

The math is sort of like, trying to understand it is sometimes really hard. Is there anything that you would recommend for the math?

So for understanding the math, the best is actually part 2, which I assume will be in October, but we already have a part 2. So in the part 2 that we already have online on fast.ai, which was recorded as the USF part 2, we implement, I don't know how many, like a dozen or so papers from scratch. So if you do that, like for me, as I say, I'm not a math person. I studied philosophy, and even then not in any great depth. So my understanding of the math is basically coming from this process of implementing papers.

And so I find when I read somebody else's code and I read their paper and I compare them and then I implement it myself, you know, I kind of get there. And there is a kind of a language to it that takes a while to pick up, just like programming.

And the other thing is it's kind of less, it's less well-defined. People change the way it's described, so sometimes you just have to stare at it for a while and go and ask people, but it's showing the same ideas that's in the code. So it takes a certain amount of tenacity as well.

So, how is this year's course different from last year's course? And if you do the online course, would you get a head start?

Sure. So each year the course is 100% new material. But each year it's trying to teach you the same stuff, which is how to be a good deep learning practitioner and also prepare you to do the Part 2 course, which is to become a world-class researcher, or at least the foundations for that.

But each year the world changes enough that we think we can do a much better job with the things that have happened in the last 12 months. So doing the previous year's course helps a lot. So like of the 800 people that are doing it live, probably most of them have done all of our previous courses.

Like a lot of people do every year's course, partly because you get a different angle on the material. So yeah, it's super helpful, particularly if you haven't got lots of Python experience or if you haven't played around much with like NumPy or the kind of scientific programming libraries. It's super helpful.

The thing I'll say is this course is kind of unusual in the variety of people that do it. There are plenty of people that turn up two hours a week and don't do any of the assignments and that's it. And they come out of it like learning stuff. They probably can't train a model much themselves and make it work, or if it breaks they wouldn't know how to fix it because they haven't practiced.

But they have a good sense like if they're a product manager or a CTO or whatever, like what are the capabilities, what does the kind of approach look like, where do people get stuck, you know, where are the constraints. But a lot of people study fast.ai parts one and two full time for a year.

Like a lot of people take a year off to just do that and you certainly can get into that depth as well. So you can kind of decide how deep to go. And if you are interested in going deep then studying previous courses is certainly useful. One of the things we're doing differently this year is we're also incorporating the key material from the introduction to machine learning course.

So we'll be learning in more detail about things like training versus validation sets. We'll be learning about random forests, we'll be learning about feature importance, stuff like that. So that's one key difference. So in the past there were two separate courses.

Do you have any recommended readings that we might want to look at prior to the course?

The best recommended reading would be the previous videos. And there isn't much, I'm just trying to think, do you think so? I mean there's actually, people have taken great, really great notes about previous courses. So like the other thing would be, and they're linked to the courses. So there's people who have gone to a lot of effort to turn lessons into prose.

But yeah, I mean there's not a lot of great material out there on kind of top-down practical deep learning elsewhere that we've found.

Are we going to use PyTorch 3D?

No, there won't be any 3D stuff. In part two, there will be. So we have a medical research initiative here, which I chair, called WAMRI.

So we have a lot of 3D medical data. It turns out that the vast majority of the time the best thing to do is what we call 2.5D, which is where you basically treat the images largely separately and then at the very end you combine them. The basic techniques to do that, we'll learn them all in the part one course.
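The 2.5D idea can be sketched very roughly like this, in pure Python. Everything here is an assumption for illustration: `score_slice` is a hypothetical stand-in for a real 2D model (it just averages pixel values), and taking the mean or max over slices is only one possible way to combine them at the end.

```python
from statistics import mean

def score_slice(slice_2d):
    """Hypothetical stand-in for a 2D model applied to one slice."""
    return mean(px for row in slice_2d for px in row)

def score_volume(volume, combine=mean):
    """Treat each 2D slice separately, then combine at the very end."""
    per_slice = [score_slice(s) for s in volume]
    return combine(per_slice)

# A tiny made-up "volume": three 2x2 slices.
volume = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]

avg_score = score_volume(volume, combine=mean)
max_score = score_volume(volume, combine=max)
```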

But to actually put them together, that would be more of a project you could do, I guess.

I actually have two questions. The first one, I really love what you did with the abstraction in fastai. I think that's brilliant. However, generally when you do something like that, you have to give up something.

And what I'm thinking is you give up fine-grained control, maybe. Is that fair to say?

No, it's not. Just like we have layers of abstractions in all the software we use, just because you create another layer of abstraction doesn't mean the other ones go away. So in the previous version of fastai, we didn't have this mid-layer.

And that was a big problem because the applications were written in these low-level foundations. So if you wanted to create a new application for audio or 3D medical imaging or whatever, you would jump way down into those weeds. So by adding this extra tier, we've kind of had this pre-release version available for a few months now.

It's been amazing to see how the community is building a lot more stuff already. The other thing is that when you provide the applications tier that makes it more concise but also more expressive, you can get involved more quickly. You can understand what's going on by saying, "Oh, these are the things you actually have to change from application to application, data set to data set." So then you can customize it a little bit at a time.

So it makes it much more gradually learnable, gradually extensible.

You mentioned the code versus mathematics and how there's almost one-to-one correspondence between the two. But I'm thinking you can't prove stuff using code. You can with math.

We don't do any proofs, and I don't find that's a problem. I've published very highly cited research papers in top journals, and I don't have proofs.

I say, "Hey, here's this thing in computer vision that works, and here's an understanding of why it works, and here's the similar ideas in natural language processing, and so we would expect the same thing to happen, and let's try it out," and whatever. So proofs are a controversial topic in deep learning, and for many years they were absolutely demanded for pretty much every conference paper that got accepted to the top conferences.

Geoffrey Hinton, who's one of the fathers of deep learning, complained regularly about the fact that we built something that creates these billion parameter networks with all these layers. Any proof requires simplifying the math down to a point that it's not accurate anymore. And there's still a bit of that that happens, honestly.

There's a lot of papers that are still published today where they end up in top journals because they have proofs in them, but they require--it's kind of like economics. It's like, "Oh, let's set up some premises that have nothing to do with the reality of training a real neural network," but now we've simplified it so much, the main thing people try to do is prove convergence bounds.

It basically says, "Oh, regardless of how you initialize this thing, you'll always end up with the error getting better and better." When I started out in this area 20 plus years ago, everything was always about proving convergence, and a lot of people in operations research only looked at algorithms where you could prove that it would be optimal.

And it really set the field back because it meant that all the techniques that worked in practice that we couldn't prove it for got ignored. So I'm kind of very cautious about proofs in general, and it's certainly not something that we look at in the course.

Yes? I'm wondering, is there anything that we should be doing in terms of expectations that you may have for projects or whatever?

Is there anything we should be preparing between now and when it begins?

Yes, absolutely. Your greatest impact will be if you can combine what you learn in this course with stuff that you're deeply passionate about right now. So that might be stuff you do at work, or it might be stuff you do outside of work.

So if you can come with ideas about problems you want to solve, data sets you want to explore, that's super helpful. If you can start curating the data set that you might be interested in learning more about, that would be super helpful. So I think there's always kind of a delicate balance between following the material that's in each lesson versus exploring your project.

And there's mistakes to be made on either side. If you only look at your project, you're going to miss out on actually understanding the stuff that's being presented each week. But if you just do nothing but read and listen to what's being presented and not experimenting with your own stuff, you don't get to find out where the hard edges are.

So then you can jump on the forums and say like, "Hey, Jeremy thought this would work if we did this thing, and I tried it on this data set, and it's totally not working at all." And have that conversation and try to figure it out. More generally for code, write as much Python as possible.

And then the Python we use for deep learning is a particular kind of Python. It's where we almost never use loops, but instead we do things on whole arrays, or as we call them, tensors, at once. So learning to use the NumPy library, N-U-M-P-Y, would be super helpful because PyTorch is nearly identical.
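To make the "no loops, whole arrays at once" style concrete, here is a small sketch using NumPy. The arrays and values are made up for illustration; the loop version and the whole-array version compute the same thing, and the last lines show a matrix-vector product written the same way.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Loop style: one element at a time (what we try to avoid).
looped = np.empty_like(a)
for i in range(len(a)):
    looped[i] = a[i] * b[i] + 1.0

# Whole-array style: the same computation with no explicit loop.
vectorized = a * b + 1.0

# Matrices and vectors work the same way: @ is matrix multiplication.
m = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 1.0])
mv = m @ v
```

The whole-array form is shorter and is the habit worth building before the course starts.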

And trying to get some experience of working with matrices and vectors and adding them together and multiplying them and stuff like that is useful. You don't need to study linear programming. Sorry, linear algebra. We have a linear algebra course, but you don't need to know. A lot of people come in and spend too much time on this math stuff.

But yeah, I would come in as good at coding as you can make yourself, because that's the language you'll be talking to the computer in.

Yes, sir? Do we need to create our own compute environment like an AWS or a cloud before this class starts?

No, you don't. And thanks for the good question.

If you go to course.fast.ai and then click on Server Setup, you can see there's all these different options. And if you scroll down here, you'll find a description of each one. So if you pick, let's say, Gradient, then you'll see it basically tells you create an account, click on the fastai button, decide which machine you want, click Create Notebook, and you start it.

So all the major cloud companies have fastai built into their environments. There's always a group of people that are tempted to set up a new computer and put in a GPU and all this stuff. Those are the people that spend the entire course installing Linux drivers, so don't be that person.

Google Cloud comes with $300 of free credits, which is more than enough to get you through the course many times over. There's also something called Colab, which is free, and Kaggle Kernels, which are free. If you're interested in exploring a little bit of basic Linux setup stuff, Google Cloud is your best option.

If you don't want to touch that at all, Paperspace Gradient has free one-click Jupyter notebooks that you can get started with straight away, and all the data sets and notebooks are ready to go.