
François Chollet: History of Keras and TensorFlow | AI Podcast Clips


Transcript

- Let's go from the philosophical to the practical. Can you give me a history of Keras and all the major deep learning frameworks that you kind of remember in relation to Keras and in general, TensorFlow, Theano, the old days. Can you give a brief overview, Wikipedia style history and your role in it before we return to AGI discussions?

- Yeah, that's a broad topic. So I started working on Keras. It wasn't called Keras at the time, I actually picked the name just the day I was gonna release it. So I started working on it in February, 2015. And at the time, there weren't too many people working on deep learning, maybe fewer than 10,000.

The software tooling was not really developed. So the main deep learning library was Caffe, which was mostly C++. - Why do you say Caffe was the main one? - Caffe was vastly more popular than Theano in late 2014, early 2015. Caffe was the one library that everyone was using for computer vision.

- And computer vision was the most popular problem. - Absolutely. Convnets were the subfield of deep learning that everyone was working on. So myself, in late 2014, I was actually interested in RNNs, in Recurrent Neural Networks, which was a very niche topic at the time. It really took off around 2016.

And so I was looking for good tools. I had used Torch7, I had used Theano, used Theano a lot in Kaggle competitions. I had used Caffe. And there was no like good solution for RNNs at the time. Like there was no reusable open source implementation of an LSTM, for instance.

So I decided to build my own. And at first, the pitch for that was, it was gonna be mostly around LSTM, Recurrent Neural Networks. It was gonna be in Python. An important decision at the time that was kind of not obvious is that the models would be defined via Python code, which was kind of like going against the mainstream at the time because Caffe, PyLearn2, and so on, like all the big libraries were actually going with the approach of having static configuration files in YAML to define models.

So some libraries were using code to define models, like Torch7, obviously, but that was not Python. Lasagne was like a Theano-based, very early library that was, I think, developed, I'm not sure exactly, probably late 2014. - It's Python as well. - It's Python as well. It was like on top of Theano.

And so I started working on something. And the value proposition at the time was that not only was it what I think was the first reusable open source implementation of LSTM, you could also combine RNNs and convnets within the same library, which was not really possible before; Caffe was only doing convnets.

And it was kind of easy to use because, so before I was using Theano, I was actually using Scikit-Learn, and I loved Scikit-Learn for its usability. So I drew a lot of inspiration from Scikit-Learn when I made Keras. It's almost like Scikit-Learn for neural networks. - Yeah, the fit function.

- Exactly, the fit function, like reducing a complex training loop to a single function call, right? And of course, some people will say, this is hiding a lot of details, but that's exactly the point, right? The magic is the point. So it's magical, but in a good way. It's magical in the sense that it's delightful, right?
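To make that concrete, here is a minimal sketch of the Scikit-Learn-style workflow being described, written against today's tf.keras API rather than the 2015-era library; the data, layer sizes, and hyperparameters are placeholders.

```python
import numpy as np
import tensorflow as tf

# Toy data standing in for a real dataset.
x = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")

# The model is defined directly in Python code, not in a static config file.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The whole training loop collapses into one call, like estimator.fit()
# in Scikit-Learn.
model.fit(x, y, epochs=3, batch_size=32)
```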

- Yeah, I'm actually quite surprised. I didn't know that it was born out of a desire to implement RNNs and LSTMs. - It was. - That's fascinating. So you were actually one of the first people to really attempt to get the major architectures together. And it's also interesting, you made me realize that that was a design decision at all, defining the model in code.

Just, I'm putting myself in your shoes, whether to go with YAML, especially if Caffe was the most popular. - It was the most popular by far at the time. - Yeah, I didn't like the YAML thing, but it makes more sense that you would put the definition of a model in a configuration file.

That's an interesting gutsy move to stick with defining it in code. Just if you look back. - Other libraries were doing it as well, but it was definitely the more niche option. - Yeah, okay, Keras and then- - Keras, so I released Keras in March, 2015, and it got users pretty much from the start.

So the deep learning community was very, very small at the time. Lots of people were starting to be interested in LSTMs. So it was released at the right time because it was offering an easy to use LSTM implementation, exactly at the time when lots of people started to be intrigued by the capabilities of RNNs for NLP.

So it grew from there. Then I joined Google about six months later, and that was actually completely unrelated to Keras. I actually joined a research team working on image classification, mostly like computer vision. So I was doing computer vision research at Google initially. And immediately when I joined Google, I was exposed to the early internal version of TensorFlow.

And the way it appeared to me at the time, and that was definitely the way it was at the time, is that this was an improved version of Theano. So I immediately knew I had to port Keras to this new TensorFlow thing. And I was actually very busy as a new Googler.

So I had no time to work on that. But then in November, I think it was November 2015, TensorFlow got released. And it was kind of like my wake up call that, hey, I had to actually go and make it happen. So in December, I ported Keras to run on top of TensorFlow, but it was not exactly a port.

It was more like a refactoring, where I was abstracting away all the backend functionality into one module, so that the same code base could run on top of multiple backends. So on top of TensorFlow or Theano. And for the next year, Theano stayed as the default option. It was easier to use, somewhat less buggy.
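A rough sketch of that backend-abstraction idea, with hypothetical module and function names rather than the actual Keras backend API: every backend-specific primitive lives in one module, and layer code elsewhere only imports from that module, so the same code base can run on either backend.

```python
import os

# Which backend to use is decided in one place (an environment variable here,
# purely for illustration).
_BACKEND = os.environ.get("MY_BACKEND", "tensorflow")

if _BACKEND == "theano":
    import theano.tensor as T

    def dot(a, b):
        # Theano implementation of the shared primitive.
        return T.dot(a, b)
else:
    import tensorflow as tf

    def dot(a, b):
        # TensorFlow implementation of the same primitive.
        return tf.matmul(a, b)

# Layer and model code imports only `dot` (and other primitives like it),
# never the backend itself, so it stays backend-agnostic.
```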

It was much faster, especially when it came to RNNs. But eventually, TensorFlow overtook it. - And the early TensorFlow had similar architectural decisions to Theano, so it was a natural transition. - Yeah, absolutely. - So at that point, Keras was still a side, almost fun, project, right?

- Yeah, so it was not my job assignment, I was doing it on the side. And even though it grew to have, you know, a lot of users for a deep learning library at the time, like throughout 2016, I wasn't doing it as my main job.

So things started changing in, I think it must have been maybe October, 2016, so one year later. So Rajat, who was the lead on TensorFlow, basically showed up one day in our building. - Yeah. - Where I was doing, so I was doing research, I did a lot of computer vision research, also collaborations with Christian Szegedy on deep learning for theorem proving.

It was a really interesting research topic. And so Rajat was saying, "Hey, we saw Keras, we like it. We saw that you're at Google. Why don't you come over for like a quarter and work with us?" And I was like, "Yeah, that sounds like a great opportunity. Let's do it." And so I started working on integrating the Keras API into TensorFlow more tightly.

So what followed was a sort of temporary TensorFlow-only version of Keras that was in tf.contrib for a while, and finally moved to TensorFlow core. And, you know, I've never actually gotten back to my old team doing research. - Well, it's kind of funny that somebody like you, who dreams of, or at least sees the power of, AI systems that reason and do theorem proving, which we'll talk about, has also created a system that makes the most basic kind of Lego building that is deep learning super accessible, super easy.

So beautifully so. It's a funny irony that you're both, you're responsible for both things. But so TensorFlow 2.0, it's kind of, there's a sprint. I don't know how long it'll take, but there's a sprint towards the finish. What do you look, what are you working on these days? What are you excited about?

What are you excited about in 2.0? I mean, eager execution. There's so many things that just make it a lot easier to work. What are you excited about? And what's also really hard? What are the problems you have to kind of solve? - So I've spent the past year and a half working on TensorFlow 2.0.

It's been a long journey. I'm actually extremely excited about it. I think it's a great product. It's a delightful product compared to TensorFlow 1.0. We've made huge progress. So on the Keras side, what I'm really excited about is that, so, you know, previously Keras has been this very easy to use high-level interface to do deep learning.

But if you wanted to, you know, if you wanted a lot of flexibility, the Keras framework, you know, was probably not the optimal way to do things compared to just writing everything from scratch. So in some way, the framework was getting in the way. And in TensorFlow 2.0, you don't have this at all, actually, you have the usability of the high-level interface, but you have the flexibility of this lower-level interface.

And you have this spectrum of workflows where you can get more or less usability and flexibility trade-offs depending on your needs, right? You can write everything from scratch and you get a lot of help doing so by, you know, subclassing models and writing your own training loops using eager execution.

It's very flexible. It's very easy to debug. It's very powerful. But all of this integrates seamlessly with higher-level features up to, you know, the classic Keras workflows, which are very scikit-learn-like and, you know, are ideal for a data scientist, machine learning engineer type of profile. So now you can have the same framework offering the same set of APIs that enable a spectrum of workflows that are more or less low-level, more or less high-level that are suitable for, you know, profiles ranging from researchers to data scientists and everything in between.
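As a rough sketch of the low-level end of that spectrum, here is what a subclassed model trained with a hand-written loop under eager execution might look like; the model, data, and step count are placeholder choices, and the same model could also just be compiled and passed to fit for the high-level workflow.

```python
import numpy as np
import tensorflow as tf

# A toy subclassed model: full control over the architecture in plain Python.
class TinyClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(32, activation="relu")
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, inputs):
        return self.out(self.hidden(inputs))

# Placeholder data standing in for a real dataset.
x = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256, 1)).astype("float32")

model = TinyClassifier()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.BinaryCrossentropy()

# Hand-written training loop under eager execution: flexible, easy to debug.
for step in range(10):
    with tf.GradientTape() as tape:
        predictions = model(x, training=True)
        loss = loss_fn(y, predictions)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

# The same model also plugs into the classic high-level workflow:
# model.compile(optimizer="adam", loss="binary_crossentropy")
# model.fit(x, y, epochs=3)
```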
