
Jeremy Howard: Deep Learning Frameworks - TensorFlow, PyTorch, fast.ai | AI Podcast Clips



00:00:00.000 | From the perspective of deep learning frameworks, you work with fast.ai, this framework in particular,
00:00:07.840 | and with PyTorch and TensorFlow.
00:00:10.600 | What are the strengths of each platform, from your perspective?
00:00:15.000 | So in terms of what we've done our research on and taught in our course, we started with
00:00:19.880 | Theano and Keras, and then we switched to TensorFlow and Keras, and then we switched
00:00:28.080 | to PyTorch, and then we switched to PyTorch and fastai.
00:00:32.520 | And that kind of reflects the growth and development of the ecosystem of deep learning libraries.
00:00:41.680 | Theano and TensorFlow were great, but were much harder to teach and do research and development
00:00:50.400 | on because they define what's called a computational graph up front, a static graph, where you
00:00:55.360 | basically have to say, "Here are all the things that I'm going to eventually do in my model."
00:01:01.240 | And then later on you say, "Okay, do those things with this data."
00:01:04.280 | And you can't debug them, you can't do them step by step, you can't program them interactively
00:01:09.120 | in a Jupyter notebook and so forth.
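To make that define-then-run style concrete, here is a minimal sketch of what it looked like, written against the TF1-era API (reachable as tf.compat.v1 in modern TensorFlow); the shapes and values are purely illustrative:

```python
import tensorflow.compat.v1 as tf  # TF1-era, define-then-run API
tf.disable_eager_execution()

# First, declare the entire computation up front as a static graph...
x = tf.placeholder(tf.float32, shape=[None, 4])
w = tf.Variable(tf.zeros([4, 1]))
y = tf.matmul(x, w)

# ...then, separately, feed it data inside a session. There is no way
# to step through the graph itself or inspect intermediate values
# interactively, which is the problem being described here.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]})
    print(out)
```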
00:01:11.480 | PyTorch was not the first, but PyTorch was certainly the strongest entrant to come along
00:01:16.360 | and say, "Let's not do it that way.
00:01:17.920 | Let's just use normal Python.
00:01:20.620 | And everything you know about in Python is just going to work, and we'll figure out how
00:01:24.400 | to make that run on the GPU as and when necessary."
00:01:30.040 | That turned out to be a huge leap in terms of what we could do with our research and
00:01:35.920 | what we could do with our teaching.
00:01:37.680 | Because it wasn't limiting.
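For contrast, a minimal sketch of the eager, define-by-run style he is describing, in plain PyTorch (again, toy shapes for illustration):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(3, 4, device=device)
w = torch.zeros(4, 1, device=device, requires_grad=True)

y = (x @ w).sum()  # runs immediately, like any Python expression
y.backward()       # gradients are computed on the spot
print(w.grad)      # inspect interactively, e.g. in a Jupyter cell
```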
00:01:39.680 | Yeah, I mean, it was critical for us for something like DAWNBench to be able to rapidly try things.
00:01:45.080 | It's just so much harder to be a researcher and practitioner when you have to do everything
00:01:48.800 | up front and you can't inspect it.
00:01:52.640 | The problem with PyTorch is that it's not at all accessible to newcomers, because you have to write your
00:01:59.620 | own training loop and manage the gradients and all this stuff.
00:02:04.920 | And it's also not great for researchers because you're spending your time dealing with all
00:02:08.960 | this boilerplate and overhead rather than thinking about your algorithm.
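The boilerplate being referred to looks roughly like this; every raw-PyTorch project writes some variant of this loop by hand (the toy data and linear model here are just placeholders):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data, just to make the loop runnable
xs, ys = torch.randn(100, 4), torch.randn(100, 1)
dataloader = DataLoader(TensorDataset(xs, ys), batch_size=16)

model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(10):
    for xb, yb in dataloader:
        loss = loss_fn(model(xb), yb)
        opt.zero_grad()   # clear gradients from the previous step
        loss.backward()   # backpropagate
        opt.step()        # update the weights
```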
00:02:13.100 | So we ended up writing this very multi-layered API where at the top level you can train a
00:02:19.000 | state-of-the-art neural network in three lines of code, and which kind of talks to
00:02:23.680 | an API, which talks to an API, which talks to an API, which you can dive into at any
00:02:27.400 | level to get progressively closer to machine-level control.
00:02:34.560 | And this is the fastai library.
00:02:36.800 | That's been critical for us and for our students and for lots of people that have won machine learning
00:02:43.200 | competitions with it and written academic papers with it.
00:02:47.800 | It's made a big difference.
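As a sketch of what that top level looks like, here is the canonical fastai image-classification example (a cat-vs-dog classifier); the names used (vision_learner, fine_tune) are from the later fastai v2 API and may differ from the version being discussed here:

```python
from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=lambda f: f[0].isupper(),  # cat images have capitalized filenames
    item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)  # transfer-learn from pretrained ImageNet weights
```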
00:02:49.800 | We're still limited by Python, though, and particularly by this problem with things like recurrent neural
00:02:56.000 | nets, say, where you just can't change things unless you accept it going so slowly that
00:03:02.840 | it's impractical.
00:03:04.840 | So in the latest incarnation of the course, and with some of the research we're now starting
00:03:09.400 | to do, we're starting to do some stuff in Swift.
00:03:13.520 | I think we're three years away from that being super practical, but I'm in no hurry.
00:03:20.080 | I'm very happy to invest the time to get there.
00:03:23.800 | But, you know, with that, we actually already have a nascent version of the fastai library
00:03:29.240 | for vision running on Swift for TensorFlow.
00:03:33.920 | Because Python for TensorFlow is not going to cut it.
00:03:37.320 | It's just a disaster.
00:03:39.120 | What they did was they tried to replicate the bits that people were saying they like
00:03:44.680 | about PyTorch, this kind of interactive computation, but they didn't actually change their foundational
00:03:51.320 | runtime components.
00:03:53.140 | So they kind of added this syntactic sugar they call TF eager, TensorFlow eager, which
00:03:57.760 | makes it look a lot like PyTorch, but it's 10 times slower than PyTorch to actually do
00:04:03.880 | a step.
00:04:05.700 | That's because they didn't invest the time in retooling the foundations, because their
00:04:10.520 | code base is so horribly complex.
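The 10x figure is his own observation, but per-step overhead is easy to probe; a crude micro-benchmark along these lines (sizes kept small on purpose so framework overhead, not compute, dominates) is one way to compare the two eager modes:

```python
import timeit

import tensorflow as tf
import torch

x_tf = tf.random.normal([256, 256])
x_pt = torch.randn(256, 256, requires_grad=True)

def step_tf():
    # One forward + backward step in eager TensorFlow
    with tf.GradientTape() as tape:
        tape.watch(x_tf)
        y = tf.reduce_sum(tf.matmul(x_tf, x_tf))
    return tape.gradient(y, x_tf)

def step_pt():
    # The equivalent step in PyTorch
    y = (x_pt @ x_pt).sum()
    grad, = torch.autograd.grad(y, x_pt)
    return grad

print("TF eager:", timeit.timeit(step_tf, number=1000))
print("PyTorch :", timeit.timeit(step_pt, number=1000))
```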
00:04:12.520 | Yeah, I think it's probably very difficult to do that kind of retooling.
00:04:15.560 | Yeah, well, particularly the way TensorFlow was written.
00:04:17.600 | It was written by a lot of people very quickly in a very disorganized way.
00:04:22.560 | So like when you actually look in the code, as I do often, I'm always just like, oh, God,
00:04:26.800 | what were they thinking?
00:04:28.040 | It's just, it's pretty awful.
00:04:30.640 | So I'm really extremely negative about the potential future for Python for TensorFlow.
00:04:39.260 | But Swift for TensorFlow can be a different beast altogether.
00:04:42.860 | It can be like, it can basically be a layer on top of MLIR that takes advantage of, you
00:04:49.260 | know, all the great compiler stuff that Swift builds on with LLVM.
00:04:54.140 | And yeah, it could be, I think it will be absolutely fantastic.
00:04:58.780 | Well, you're inspiring me to try.
00:05:01.060 | I haven't truly felt the pain of TensorFlow 2.0 Python.
00:05:06.820 | It's fine by me.
00:05:08.340 | But yeah, I mean, it does the job if you're using like predefined things that somebody's
00:05:14.580 | already written.
00:05:16.860 | But if you actually compare, you know, as I've had to do, because I've been having to
00:05:21.100 | do a lot of stuff with TensorFlow recently, and you actually compare, like, okay, I want to
00:05:24.580 | write something from scratch.
00:05:26.580 | And I just keep finding it's running 10 times slower than PyTorch.
00:05:30.760 | So is the biggest cost, let's throw running time out the window, how long it takes you
00:05:37.140 | to program?
00:05:38.820 | Thanks to TensorFlow Eager, that's not too different now.
00:05:43.260 | But because so many things take so long to run, you can't afford to run them 10 times slower.
00:05:49.460 | Like, you just go, oh, this is taking too long.
00:05:52.380 | And also, there's a lot of things which are just less programmable, like tf.data, which
00:05:56.660 | is the way data processing works in TensorFlow, and it's just this big mess.
00:06:00.500 | It's incredibly inefficient.
00:06:02.380 | And they kind of had to write it that way because of the TPU problems I described earlier.
00:06:08.340 | So I just, you know, I just feel like they've got this huge technical debt, which they're
00:06:14.140 | not going to solve without starting from scratch.
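For reference, a typical tf.data input pipeline looks something like this (these are real tf.data calls, but the file glob and sizes are made up). Transformations are declared on a Dataset object rather than written as ordinary Python iteration, which is part of what "less programmable" refers to:

```python
import tensorflow as tf

files = tf.data.Dataset.list_files("images/*.jpg")

def load(path):
    # Decode and resize one image; this runs inside the tf.data graph,
    # so ordinary Python debugging doesn't apply here
    img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    return tf.image.resize(img, [224, 224])

ds = (files
      .map(load, num_parallel_calls=tf.data.AUTOTUNE)
      .batch(32)
      .prefetch(tf.data.AUTOTUNE))
```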
00:06:16.820 | So here's an interesting question.
00:06:18.420 | If there's a new student starting today, what would you recommend they use?
00:06:26.140 | Well, I mean, we obviously recommend fast.ai and PyTorch, because we teach new students.
00:06:31.940 | And that's what we teach with.
00:06:33.100 | So we would very strongly recommend that, because it will let you get on top of the concepts
00:06:39.040 | much more quickly.
00:06:41.100 | So then you'll become an actual practitioner, and you'll also learn the actual state-of-the-art techniques,
00:06:45.300 | you know, so you actually get world-class results.
00:06:48.380 | Honestly, it doesn't much matter what library you learn, because switching from Chainer to
00:06:56.580 | MXNet to TensorFlow to PyTorch is going to be a couple of days' work, as long as you understand
00:07:02.100 | the foundations well.
00:07:04.460 | But do you think Swift will creep in there as a thing that people start using?
00:07:12.100 | Not for a few years, particularly because like Swift has no data science community,
00:07:19.780 | libraries, tooling.
00:07:22.660 | And the Swift community has a total lack of appreciation and understanding of numeric
00:07:29.300 | computing.
00:07:30.300 | So like they keep on making stupid decisions, you know, for years they've just done dumb
00:07:33.920 | things around performance and prioritization.
00:07:39.540 | That's clearly changing now, because the developer of Swift, Chris Lattner, is working at Google
00:07:48.580 | on Swift for TensorFlow.
00:07:49.900 | So like that's a priority.
00:07:53.340 | It'll be interesting to see what happens with Apple because like Apple hasn't shown any
00:07:57.500 | sign of caring about numeric programming in Swift.
00:08:02.980 | So I mean, hopefully they'll get off their ass and start appreciating this because currently
00:08:08.320 | all of their low-level libraries are not written in Swift.
00:08:14.220 | They're not particularly Swifty at all.
00:08:16.620 | Stuff like Core ML, they're really pretty rubbish.
00:08:19.940 | So yeah, so there's a long way to go.
00:08:22.820 | But at least one nice thing is that Swift for TensorFlow can actually directly use Python
00:08:27.660 | code and Python libraries; literally the entire lesson one notebook of fast.ai runs
00:08:34.500 | in Swift right now, in Python mode.
00:08:37.740 | So that's a nice intermediate thing.
00:08:40.380 | [END]