François Chollet: History of Keras and TensorFlow | AI Podcast Clips
00:00:00.000 |
- Let's go from the philosophical to the practical. 00:00:09.640 |
that you kind of remember in relation to Keras 00:00:11.680 |
and in general, TensorFlow, Theano, the old days. 00:00:15.240 |
Can you give a brief overview, Wikipedia-style history 00:00:18.600 |
and your role in it before we return to AGI discussions? 00:00:33.400 |
So I started working on it in February 2015. 00:00:39.440 |
there weren't too many people working on deep learning, 00:00:43.520 |
The software tooling was not really developed. 00:01:02.080 |
Caffe was the one library that everyone was using 00:01:06.560 |
- And computer vision was the most popular problem. 00:01:10.120 |
Like ConvNets was like the subfield of deep learning 00:01:44.040 |
And there was no like good solution for RNNs at the time. 00:01:49.040 |
Like there was no reusable open source implementation 00:02:08.800 |
is that the models would be defined via Python code, 00:02:13.720 |
which was kind of like going against the mainstream 00:02:17.760 |
at the time because Caffe, PyLearn2, and so on, 00:02:21.360 |
like all the big libraries were actually going 00:02:24.000 |
with the approach of having static configuration files 00:02:28.920 |
So some libraries were using code to define models, 00:02:32.240 |
like Torch7, obviously, but that was not Python. 00:02:35.640 |
Lasagne was like a Theano-based, very early library 00:02:40.080 |
that was, I think, developed, I'm not sure exactly, 00:02:51.560 |
And the value proposition at the time was that 00:02:59.400 |
reusable open source implementation of LSTM, 00:03:21.520 |
So I drew a lot of inspiration from Scikit-Learn 00:03:25.600 |
It's almost like Scikit-Learn for neural networks. 00:03:45.880 |
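To make that "Scikit-Learn for neural networks" point concrete, a minimal sketch of the fit/predict-style workflow being described might look like this; the data, shapes, and layer sizes below are placeholders for illustration, not anything mentioned in the conversation.

import numpy as np
from tensorflow import keras

# Toy placeholder data, just to make the sketch self-contained.
x_train = np.random.rand(100, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 1))

# The model is defined in Python code, not in a static configuration file.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=2, batch_size=16)  # scikit-learn-style fit()
predictions = model.predict(x_train)                  # ...and predict()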
It's magical in the sense that it's delightful, right? 00:04:05.920 |
you made me realize that that was a design decision at all, 00:04:13.080 |
whether to use YAML, especially since Caffe was the most popular. 00:04:16.360 |
- It was the most popular by far at the time. 00:04:50.440 |
Lots of people were starting to be interested in LSTM. 00:04:55.640 |
because it was offering an easy to use LSTM implementation. 00:05:00.440 |
started to be intrigued by the capabilities of RNNs, 00:05:14.720 |
and that was actually completely unrelated to Keras. 00:05:23.880 |
So I was doing computer vision research at Google initially. 00:05:28.680 |
I was exposed to the early internal version of TensorFlow. 00:05:37.120 |
and that was definitely the way it was at the time, 00:05:38.920 |
is that this was an improved version of Theano. 00:05:50.000 |
And I was actually very busy as a new Googler. 00:05:57.720 |
But then in November, I think it was November 2015, 00:06:07.760 |
that, hey, I had to actually go and make it happen. 00:06:10.520 |
So in December, I ported Keras to run on top of TensorFlow, 00:06:18.480 |
where I was abstracting away all the backend functionality 00:06:30.640 |
And for the next year, Theano stayed as the default option. 00:06:43.840 |
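For context, in that multi-backend Keras the backend could be swapped without touching model code, typically by setting the KERAS_BACKEND environment variable or editing the "backend" field in ~/.keras/keras.json. A minimal sketch, assuming the standalone (pre-tf.keras) Keras package:

import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # or "theano" in that era

import keras  # standalone multi-backend Keras, not tf.keras
print(keras.backend.backend())  # prints which backend was actually loaded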
It was much faster, especially when it came to RNNs. 00:06:53.400 |
has similar architectural decisions to Theano. 00:07:14.680 |
So, and even though it grew to have, you know, 00:07:17.960 |
a lot of users for a deep learning library at the time, 00:07:20.720 |
like throughout 2016, but I wasn't doing it as my main job. 00:07:42.800 |
- Where I was doing like, so I was doing research 00:07:44.240 |
and things like, so I did a lot of computer vision research, 00:07:56.080 |
And so Rajat was saying, "Hey, we saw Keras, we like it. 00:08:10.520 |
And I was like, "Yeah, that sounds like a great opportunity. 00:08:13.600 |
And so I started working on integrating the Keras API 00:08:20.520 |
So what followed up is a sort of like temporary 00:08:35.400 |
And, you know, I've never actually gotten back 00:08:40.800 |
- Well, it's kind of funny that somebody like you 00:08:45.520 |
who dreams of, or at least sees the power of AI systems 00:08:50.520 |
that reason and theorem proving we'll talk about 00:08:54.840 |
has also created a system that makes the most basic 00:09:12.280 |
But so TensorFlow 2.0, it's kind of, there's a sprint. 00:09:20.160 |
What do you look, what are you working on these days? 00:09:28.960 |
There's so many things that just make it a lot easier 00:09:36.800 |
What are the problems you have to kind of solve? 00:09:49.640 |
It's a delightful product compared to TensorFlow 1.0. 00:09:54.640 |
So on the Keras side, what I'm really excited about is that, 00:10:00.600 |
so, you know, previously Keras has been this very easy 00:10:05.280 |
to use high-level interface to do deep learning. 00:10:10.720 |
you know, if you wanted a lot of flexibility, 00:10:18.840 |
was probably not the optimal way to do things 00:10:21.840 |
compared to just writing everything from scratch. 00:10:24.280 |
So in some way, the framework was getting in the way. 00:10:28.120 |
And in TensorFlow 2.0, you don't have this at all, 00:10:31.000 |
actually, you have the usability of the high-level interface, 00:10:34.520 |
but you have the flexibility of this lower-level interface. 00:10:45.000 |
and flexibility trade-offs depending on your needs, right? 00:10:53.120 |
and you get a lot of help doing so by, you know, 00:10:56.440 |
subclassing models and writing some training loops 00:11:14.760 |
and, you know, are ideal for a data scientist, 00:11:28.240 |
that are more or less low-level, more or less high-level 00:11:33.680 |
profiles ranging from researchers to data scientists
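A minimal sketch of that spectrum in TensorFlow 2.x terms, with placeholder shapes and hyperparameters: the same subclassed model can be trained with a hand-written loop (the flexible, lower-level end) or handed to compile()/fit() (the easy, high-level end).

import tensorflow as tf

# Low-level end of the spectrum: subclass keras.Model and own the training step.
class TinyClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(32, activation="relu")
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, x):
        return self.out(self.hidden(x))

model = TinyClassifier()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.BinaryCrossentropy()

# Placeholder data with arbitrary shapes.
x = tf.random.uniform((64, 20))
y = tf.cast(tf.random.uniform((64, 1)) > 0.5, tf.float32)

# Custom training loop: full control over every step.
for step in range(5):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

# High-level end: the same model also works with compile()/fit().
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=1, verbose=0)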