Jeremy Howard: Deep Learning Frameworks - TensorFlow, PyTorch, fast.ai | AI Podcast Clips
From the perspective of deep learning frameworks: you work with fast.ai, this framework in particular. What are the strengths of each platform, from your perspective?
So in terms of what we've done our research on and taught in our course, we started with Theano and Keras, and then we switched to TensorFlow and Keras, and then we switched to PyTorch, and then we switched to PyTorch and fast.ai. And that kind of reflects the growth and development of the ecosystem of deep learning libraries.

Theano and TensorFlow were great, but were much harder to teach and do research and development on, because they define what's called a computational graph up front, a static graph, where you basically have to say, "Here are all the things that I'm going to eventually do in my model." And then later on you say, "Okay, do those things with this data." And you can't debug them, you can't do them step by step, you can't program them interactively.
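As an illustrative sketch of that define-then-run style (assuming the TensorFlow 1.x-era API), the graph is built first and only executed later with data:

    import tensorflow as tf  # 1.x-era API (tf.compat.v1 in TensorFlow 2)

    # Define the whole computational graph up front...
    x = tf.placeholder(tf.float32, shape=[None, 3])
    w = tf.Variable(tf.zeros([3, 1]))
    y = tf.matmul(x, w)

    # ...and only later run it with actual data, inside a session.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))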
PyTorch was not the first, but PyTorch was certainly the strongest entrant to come along and say, "Everything you know about in Python is just going to work, and we'll figure out how to make that run on the GPU as and when necessary."
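By way of contrast, a rough PyTorch sketch of that define-by-run style: ordinary Python that runs eagerly and uses the GPU only if one is available.

    import torch

    # Ordinary Python: every operation runs immediately and can be inspected.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4, 3, device=device)
    w = torch.randn(3, 1, device=device, requires_grad=True)

    y = x @ w                  # executes right away
    if y.mean() > 0:           # normal Python control flow on intermediate results
        y = y * 2
    y.sum().backward()         # gradients on demand
    print(w.grad)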
That turned out to be a huge leap in terms of what we could do with our research and development. Yeah, I mean, it was critical for us for something like DAWNBench to be able to rapidly try things. It's just so much harder to be a researcher and practitioner when you have to do everything up front.
The problem with PyTorch is that it's not at all accessible to newcomers, because you have to write your own training loop and manage the gradients and all this stuff. And it's also not great for researchers, because you're spending your time dealing with all this boilerplate and overhead rather than thinking about your algorithm.
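For instance, even a toy model needs a hand-written loop along these lines; a minimal sketch using synthetic data:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    # Synthetic data, just to make the loop runnable.
    ds = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
    dl = DataLoader(ds, batch_size=16)

    model = nn.Linear(10, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    # The boilerplate you write yourself in plain PyTorch:
    for epoch in range(2):
        for xb, yb in dl:
            opt.zero_grad()                  # manage gradients by hand
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()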
So we ended up writing this very multi-layered API where, at the top level, you can train a state-of-the-art neural network in three lines of code, and which kind of talks to an API, which talks to an API, which talks to an API, which you can dive into at any level and get progressively closer to the machine's level of control. That's been critical for us, and for our students, and for lots of people that have won big machine learning competitions with it and written academic papers with it.
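As a rough sketch of that top-level API (fastai v2-style names, which vary between library versions), the image-classification quickstart is essentially:

    from fastai.vision.all import *

    # Roughly three lines: data, learner, fit. (Earlier fastai versions
    # used ImageDataBunch / cnn_learner instead of these names.)
    path = untar_data(URLs.PETS) / "images"
    dls = ImageDataLoaders.from_name_func(
        path, get_image_files(path), valid_pct=0.2,
        label_func=lambda f: f.name[0].isupper(), item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(1)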
We're still limited, though, by Python, and particularly this problem with things like recurrent neural nets, say, where you just can't change things unless you accept it going so slowly that it's impractical. So in the latest incarnation of the course, and with some of the research we're now starting to do, we're starting to do some stuff in Swift.
I think we're three years away from that being super practical, but I'm in no hurry; I'm very happy to invest the time to get there. But, you know, with that, we actually already have a nascent version of the fast.ai library running in Swift, because Python for TensorFlow is not going to cut it.
What they did was they tried to replicate the bits that people were saying they liked about PyTorch, this kind of interactive computation, but they didn't actually change their foundational runtime. So they kind of added this syntactic sugar they call TF Eager, TensorFlow Eager, which makes it look a lot like PyTorch, but it's ten times slower than PyTorch to actually do a step, because they didn't invest the time in retooling the foundations.
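Concretely, TensorFlow Eager gives a PyTorch-like, define-by-run surface; a rough TF 2.x sketch, with gradients recorded via GradientTape:

    import tensorflow as tf  # TF 2.x runs eagerly by default

    x = tf.random.normal([4, 3])
    w = tf.Variable(tf.random.normal([3, 1]))

    with tf.GradientTape() as tape:      # PyTorch-like define-by-run style
        y = tf.reduce_sum(tf.matmul(x, w))
    print(tape.gradient(y, w))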
Yeah, I think it's probably very difficult to do that kind of retooling.

Yeah, well, particularly the way TensorFlow was written. It was written by a lot of people very quickly in a very disorganized way.
So when you actually look in the code, as I do often, I'm always just like, oh, God. So I'm really extremely negative about the potential future for Python for TensorFlow.
But Swift for TensorFlow can be a different beast altogether. It can basically be a layer on top of MLIR that takes advantage of, you know, all the great compiler stuff that Swift builds on with LLVM. And yeah, I think it will be absolutely fantastic.
I haven't truly felt the pain of TensorFlow 2.0 Python.
But yeah, I mean, it does the job if you're using predefined things that somebody's already written. But if you actually compare, like I've had to do, because I've been having to do a lot of stuff with TensorFlow recently, you actually compare, like, okay, I want to write something from scratch, and I just keep finding it's like, oh, it's running ten times slower than PyTorch.
So is the biggest cost, let's throw running time out the window, how long it takes you to program it?
Thanks to TensorFlow Eager, that's not too different. But because so many things take so long to run, you wouldn't run it at ten times slower. Like, you just go, oh, this is taking too long.
And also, there are a lot of things which are just less programmable, like tf.data, which is the way data processing works in TensorFlow; it's just this big mess. And they kind of had to write it that way because of the TPU problems I described earlier.
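For reference, a typical tf.data pipeline chains its transformations into a pipeline object rather than plain Python loops; a minimal sketch:

    import tensorflow as tf

    # Transformations are chained onto a Dataset rather than written as
    # ordinary Python iteration. (AUTOTUNE lives under tf.data.experimental
    # in older releases.)
    ds = (tf.data.Dataset.from_tensor_slices(tf.range(1000))
            .map(lambda x: tf.cast(x, tf.float32) / 255.0)
            .shuffle(buffer_size=100)
            .batch(32)
            .prefetch(tf.data.AUTOTUNE))

    for batch in ds.take(1):
        print(batch.shape)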
So, you know, I just feel like they've got this huge technical debt, which they're not going to solve without starting from scratch.
If there's a new student starting today, what would you recommend they use?

Well, I mean, we obviously recommend fast.ai and PyTorch, because we teach new students.
So we would very strongly recommend that, because it will let you get on top of the concepts. So then you'll become an actual practitioner, and you'll also learn the actual state-of-the-art techniques, you know, so you actually get world-class results.
Honestly, it doesn't much matter what library you learn, because switching from Chainer to MXNet to TensorFlow to PyTorch is going to be a couple of days' work, as long as you understand the foundations.
But do you think Swift will creep in there as a thing that people start using?
Not for a few years, particularly because Swift has no data science community. And the Swift community has a total lack of appreciation and understanding of numeric computing. So they keep on making stupid decisions, you know; for years they've just done dumb things around performance and prioritization.
That's clearly changing now, because the developer of Swift, Chris Lattner, is working at Google on Swift for TensorFlow. It'll be interesting to see what happens with Apple, because Apple hasn't shown any sign of caring about numeric programming in Swift.
So, I mean, hopefully they'll get off their ass and start appreciating this, because currently all of their low-level libraries are not written in Swift. Stuff like Core ML, they're really pretty rubbish.
But at least one nice thing is that Swift for TensorFlow can actually directly use Python code and Python libraries; literally, the entire lesson one notebook of fast.ai runs in Swift.