François Chollet: History of Keras and TensorFlow | AI Podcast Clips
00:00:00.000 |
- Let's go from the philosophical to the practical. 00:00:09.640 |
that you kind of remember in relation to Keras 00:00:11.680 |
and in general, TensorFlow, Theano, the old days. 00:00:15.240 |
Can you give a brief overview, Wikipedia-style history 00:00:18.600 |
and your role in it before we return to AGI discussions? 00:00:33.400 |
So I started working on it in February 2015. 00:00:39.440 |
there weren't too many people working on deep learning, 00:00:43.520 |
The software tooling was not really developed. 00:01:02.080 |
Caffe was the one library that everyone was using 00:01:06.560 |
- And computer vision was the most popular problem. 00:01:10.120 |
Like ConvNets was like the subfield of deep learning 00:01:44.040 |
And there was no like good solution for RNNs at the time. 00:01:49.040 |
Like there was no reusable open source implementation 00:02:08.800 |
is that the models would be defined via Python code, 00:02:13.720 |
which was kind of like going against the mainstream 00:02:17.760 |
at the time because Caffe, PyLearn2, and so on, 00:02:21.360 |
like all the big libraries were actually going 00:02:24.000 |
with the approach of having static configuration files 00:02:28.920 |
So some libraries were using code to define models, 00:02:32.240 |
like Torch7, obviously, but that was not Python. 00:02:35.640 |
Lasagne was like a Theano-based, very early library 00:02:40.080 |
that was, I think, developed, I'm not sure exactly, 00:02:51.560 |
And the value proposition at the time was that 00:02:59.400 |
reusable open source implementation of LSTM, 00:03:21.520 |
So I drew a lot of inspiration from Scikit-Learn 00:03:25.600 |
It's almost like Scikit-Learn for neural networks. 00:03:45.880 |
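To make that "Scikit-Learn for neural networks" point concrete, a minimal sketch of the fit/predict-style workflow being described might look like this; the data, shapes, and layer sizes below are placeholders for illustration, not anything mentioned in the conversation.

import numpy as np
from tensorflow import keras

# Toy placeholder data, just to make the sketch self-contained.
x_train = np.random.rand(100, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 1))

# The model is defined in Python code, not in a static configuration file.
model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, epochs=2, batch_size=16)  # scikit-learn-style fit()
predictions = model.predict(x_train)                  # ...and predict()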
It's magical in the sense that it's delightful, right? 00:04:05.920 |
you made me realize that that was a design decision at all, 00:04:13.080 |
whether to use YAML, especially since Caffe was the most popular. 00:04:16.360 |
- It was the most popular by far at the time. 00:04:50.440 |
Lots of people were starting to be interested in LSTM. 00:04:55.640 |
because it was offering an easy to use LSTM implementation. 00:05:00.440 |
started to be intrigued by the capabilities of RNNs, 00:05:14.720 |
and that was actually completely unrelated to Keras. 00:05:23.880 |
So I was doing computer vision research at Google initially. 00:05:28.680 |
I was exposed to the early internal version of TensorFlow. 00:05:37.120 |
and that was definitely the way it was at the time, 00:05:38.920 |
is that this was an improved version of Theano. 00:05:50.000 |
And I was actually very busy as a new Googler. 00:05:57.720 |
But then in November, I think it was November 2015, 00:06:07.760 |
that, hey, I had to actually go and make it happen. 00:06:10.520 |
So in December, I ported Keras to run on top of TensorFlow, 00:06:18.480 |
where I was abstracting away all the backend functionality 00:06:30.640 |
And for the next year, Theano stayed as the default option. 00:06:43.840 |
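For context, in that multi-backend Keras the backend could be swapped without touching model code, typically by setting the KERAS_BACKEND environment variable or editing the "backend" field in ~/.keras/keras.json. A minimal sketch, assuming the standalone (pre-tf.keras) Keras package:

import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # or "theano" in that era

import keras  # standalone multi-backend Keras, not tf.keras
print(keras.backend.backend())  # prints which backend was actually loaded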
It was much faster, especially when it came to RNNs. 00:06:53.400 |
has similar architectural decisions to Theano. 00:07:14.680 |
So, and even though it grew to have, you know, 00:07:17.960 |
a lot of users for a deep learning library at the time, 00:07:20.720 |
like throughout 2016, but I wasn't doing it as my main job. 00:07:42.800 |
- Where I was doing like, so I was doing research 00:07:44.240 |
and things like, so I did a lot of computer vision research, 00:07:56.080 |
And so Rajat was saying, "Hey, we saw Keras, we like it. 00:08:10.520 |
And I was like, "Yeah, that sounds like a great opportunity. 00:08:13.600 |
And so I started working on integrating the Keras API 00:08:20.520 |
So what followed up is a sort of like temporary 00:08:35.400 |
And, you know, I've never actually gotten back 00:08:40.800 |
- Well, it's kind of funny that somebody like you 00:08:45.520 |
who dreams of, or at least sees the power of AI systems 00:08:50.520 |
that reason and theorem proving we'll talk about 00:08:54.840 |
has also created a system that makes the most basic 00:09:12.280 |
But so TensorFlow 2.0, it's kind of, there's a sprint. 00:09:20.160 |
What do you look, what are you working on these days? 00:09:28.960 |
There's so many things that just make it a lot easier 00:09:36.800 |
What are the problems you have to kind of solve? 00:09:49.640 |
It's a delightful product compared to TensorFlow 1.0. 00:09:54.640 |
So on the Keras side, what I'm really excited about is that, 00:10:00.600 |
so, you know, previously Keras has been this very easy 00:10:05.280 |
to use high-level interface to do deep learning. 00:10:10.720 |
you know, if you wanted a lot of flexibility, 00:10:18.840 |
was probably not the optimal way to do things 00:10:21.840 |
compared to just writing everything from scratch. 00:10:24.280 |
So in some way, the framework was getting in the way. 00:10:28.120 |
And in TensorFlow 2.0, you don't have this at all, 00:10:31.000 |
actually, you have the usability of the high-level interface, 00:10:34.520 |
but you have the flexibility of this lower-level interface. 00:10:45.000 |
and flexibility trade-offs depending on your needs, right? 00:10:53.120 |
and you get a lot of help doing so by, you know, 00:10:56.440 |
subclassing models and writing some training loops 00:11:14.760 |
and, you know, are ideal for a data scientist, 00:11:28.240 |
that are more or less low-level, more or less high-level 00:11:33.680 |
profiles ranging from researchers to data scientists
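A minimal sketch of that spectrum in TensorFlow 2.x terms, with placeholder shapes and hyperparameters: the same subclassed model can be trained with a hand-written loop (the flexible, lower-level end) or handed to compile()/fit() (the easy, high-level end).

import tensorflow as tf

# Low-level end of the spectrum: subclass keras.Model and own the training step.
class TinyClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(32, activation="relu")
        self.out = tf.keras.layers.Dense(1, activation="sigmoid")

    def call(self, x):
        return self.out(self.hidden(x))

model = TinyClassifier()
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.BinaryCrossentropy()

# Placeholder data with arbitrary shapes.
x = tf.random.uniform((64, 20))
y = tf.cast(tf.random.uniform((64, 1)) > 0.5, tf.float32)

# Custom training loop: full control over every step.
for step in range(5):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

# High-level end: the same model also works with compile()/fit().
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(x, y, epochs=1, verbose=0)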