François Chollet: History of Keras and TensorFlow | AI Podcast Clips


00:00:00.000 | - Let's go from the philosophical to the practical.
00:00:05.000 | Can you give me a history of Keras
00:00:07.420 | and all the major deep learning frameworks
00:00:09.640 | that you kind of remember in relation to Keras
00:00:11.680 | and in general, TensorFlow, Theano, the old days.
00:00:15.240 | Can you give a brief overview, Wikipedia style history
00:00:18.600 | and your role in it before we return to AGI discussions?
00:00:22.320 | - Yeah, that's a broad topic.
00:00:23.880 | So I started working on Keras.
00:00:27.260 | It wasn't named Keras at the time.
00:00:29.440 | I actually picked the name like
00:00:31.520 | just the day I was gonna release it.
00:00:33.400 | So I started working on it in February, 2015.
00:00:38.000 | And so at the time,
00:00:39.440 | there weren't too many people working on deep learning,
00:00:41.280 | maybe like fewer than 10,000.
00:00:43.520 | The software tooling was not really developed.
00:00:46.060 | So the main deep learning library was Caffe,
00:00:52.000 | which was mostly C++.
00:00:54.040 | - Why do you say Caffe was the main one?
00:00:55.960 | - Caffe was vastly more popular than Theano
00:00:59.200 | in late 2014, early 2015.
00:01:02.080 | Caffe was the one library that everyone was using
00:01:05.560 | for computer vision.
00:01:06.560 | - And computer vision was the most popular problem.
00:01:09.280 | - Absolutely.
00:01:10.120 | Like ConvNets were the subfield of deep learning
00:01:13.640 | that everyone was working on.
00:01:16.320 | So myself, so in late 2014,
00:01:20.840 | I was actually interested in RNNs,
00:01:23.760 | in Recurrent Neural Networks,
00:01:24.920 | which was a very niche topic at the time.
00:01:28.960 | It really took off around 2016.
00:01:31.840 | And so I was looking for good tools.
00:01:34.280 | I had used Torch7, I had used Theano,
00:01:37.960 | used Theano a lot in Kaggle competitions.
00:01:40.800 | I had used Caffe.
00:01:44.040 | And there was no like good solution for RNNs at the time.
00:01:49.040 | Like there was no reusable open source implementation
00:01:52.000 | of an LSTM, for instance.
00:01:53.360 | So I decided to build my own.
00:01:56.240 | And at first, the pitch for that was,
00:01:58.800 | it was gonna be mostly around LSTM,
00:02:02.280 | Recurrent Neural Networks.
00:02:03.320 | It was gonna be in Python.
00:02:04.720 | An important decision at the time
00:02:07.680 | that was kind of not obvious
00:02:08.800 | is that the models would be defined via Python code,
00:02:13.720 | which was kind of like going against the mainstream
00:02:17.760 | at the time because Caffe, PyLearn2, and so on,
00:02:21.360 | like all the big libraries were actually going
00:02:24.000 | with the approach of having static configuration files
00:02:26.880 | in YAML to define models.
00:02:28.920 | So some libraries were using code to define models,
00:02:32.240 | like Torch7, obviously, but that was not Python.
00:02:35.640 | Lasagne was like a Theano-based, very early library
00:02:40.080 | that was, I think, developed, I'm not sure exactly,
00:02:42.000 | probably late 2014.
00:02:43.560 | - It's Python as well.
00:02:44.560 | - It's Python as well.
00:02:45.400 | It was like on top of Theano.
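
To make the "models defined via code" decision concrete, here is a minimal sketch of the idea using today's Keras API (the names below are modern, not the 2015 originals): the architecture is ordinary Python, so it can be parameterized with variables and loops instead of being frozen in a static YAML or prototxt configuration file.

```python
# A minimal sketch of "models as code" with the modern Keras API; the 2015
# API differed, but the idea is the same: the architecture is ordinary Python.
from tensorflow import keras
from tensorflow.keras import layers

num_words = 10000  # vocabulary size; illustrative value

model = keras.Sequential([
    layers.Embedding(input_dim=num_words, output_dim=32),  # token ids -> vectors
    layers.LSTM(32),                        # the kind of reusable LSTM Keras pitched
    layers.Dense(1, activation="sigmoid"),  # binary classification head
])
```
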
00:02:47.880 | And so I started working on something.
00:02:51.560 | And the value proposition at the time was that
00:02:55.720 | not only was it what I think was the first
00:02:59.400 | reusable open source implementation of an LSTM,
00:03:01.960 | you could combine RNNs and ConvNets
00:03:07.640 | with the same library,
00:03:08.600 | which was not really possible before,
00:03:10.120 | like Caffe was only doing ConvNets.
00:03:12.280 | And it was kind of easy to use because,
00:03:16.240 | so before I was using Theano,
00:03:17.640 | I was actually using Scikit-Learn,
00:03:18.880 | and I loved Scikit-Learn for its usability.
00:03:21.520 | So I drew a lot of inspiration from Scikit-Learn
00:03:24.760 | when I made Keras.
00:03:25.600 | It's almost like Scikit-Learn for neural networks.
00:03:28.200 | - Yeah, the fit function.
00:03:29.920 | - Exactly, the fit function,
00:03:31.160 | like reducing a complex training loop
00:03:34.040 | to a single function call, right?
00:03:36.080 | And of course, some people will say,
00:03:38.120 | this is hiding a lot of details,
00:03:39.560 | but that's exactly the point, right?
00:03:41.920 | The magic is the point.
00:03:43.480 | So it's magical, but in a good way.
00:03:45.880 | It's magical in the sense that it's delightful, right?
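
A minimal sketch of the fit() workflow being described, with random data purely for illustration: compile() configures the optimizer and loss, and a single fit() call stands in for the whole hand-written training loop.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A tiny model; the point is the workflow, not the architecture.
model = keras.Sequential([
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

x = np.random.rand(256, 8).astype("float32")  # dummy features
y = np.random.randint(0, 2, size=(256, 1))    # dummy binary labels
model.fit(x, y, batch_size=32, epochs=2)      # the entire training loop in one call
```
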
00:03:48.800 | - Yeah, I'm actually quite surprised.
00:03:50.840 | I didn't know that it was born out of desire
00:03:52.760 | to implement RNNs and LSTMs.
00:03:55.640 | - It was. - That's fascinating.
00:03:56.960 | So you were actually one of the first people
00:03:59.200 | to really attempt
00:04:01.120 | to get the major architectures together.
00:04:04.160 | And it's also interesting,
00:04:05.920 | you made me realize that that was a design decision at all,
00:04:08.320 | is defining the model in code.
00:04:10.520 | Just, I'm putting myself in your shoes,
00:04:13.080 | whether to go with YAML, especially since Caffe was the most popular.
00:04:16.360 | - It was the most popular by far at the time.
00:04:19.200 | - If I were, yeah, I don't,
00:04:21.640 | I didn't like the YAML thing,
00:04:22.760 | but it makes more sense
00:04:25.440 | that you would put the definition of a model
00:04:27.200 | in a configuration file.
00:04:28.880 | That's an interesting gutsy move
00:04:30.360 | to stick with defining it in code.
00:04:33.240 | Just if you look back.
00:04:34.760 | - Other libraries were doing it as well,
00:04:36.640 | but it was definitely the more niche option.
00:04:39.480 | - Yeah, okay, Keras and then-
00:04:41.560 | - Keras, so I released Keras in March, 2015,
00:04:44.720 | and it got users pretty much from the start.
00:04:47.360 | So the deep learning community
00:04:48.280 | was very, very small at the time.
00:04:50.440 | Lots of people were starting to be interested in LSTM.
00:04:53.800 | So it was released at the right time
00:04:55.640 | because it was offering an easy to use LSTM implementation.
00:04:58.760 | Exactly at the time where lots of people
00:05:00.440 | started to be intrigued by the capabilities
00:05:04.440 | of RNNs for NLP.
00:05:05.480 | So it grew from there.
00:05:07.120 | Then I joined Google about six months later,
00:05:14.720 | and that was actually completely unrelated to Keras.
00:05:18.160 | I actually joined a research team
00:05:20.280 | working on image classification,
00:05:22.720 | mostly like computer vision.
00:05:23.880 | So I was doing computer vision research at Google initially.
00:05:26.840 | And immediately when I joined Google,
00:05:28.680 | I was exposed to the early internal version of TensorFlow.
00:05:33.680 | And the way it appeared to me at the time,
00:05:37.120 | and that was definitely the way it was at the time,
00:05:38.920 | is that this was an improved version of Theano.
00:05:43.920 | So I immediately knew I had to port Keras
00:05:47.920 | to this new TensorFlow thing.
00:05:50.000 | And I was actually very busy as a new Googler.
00:05:53.960 | So I had no time to work on that.
00:05:57.720 | But then in November, I think it was November 2015,
00:06:01.880 | TensorFlow got released.
00:06:04.440 | And it was kind of like my wake up call
00:06:07.760 | that, hey, I had to actually go and make it happen.
00:06:10.520 | So in December, I ported Keras to run on top of TensorFlow,
00:06:15.360 | but it was not exactly a port.
00:06:16.520 | It was more like a refactoring,
00:06:18.480 | where I was abstracting away all the backend functionality
00:06:22.640 | into one module, so that the same code base
00:06:25.520 | could run on top of multiple backends.
00:06:28.280 | So on top of TensorFlow or Theano.
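
A toy sketch of the backend-abstraction idea he describes (not the actual Keras source): every tensor operation the library needs goes through one module, which dispatches on a selected backend, so the layer code above it never changes. The KERAS_BACKEND setting was the real switch; NumPy stands in here for a second backend since Theano is no longer maintained.

```python
import os

# The real Keras used this environment variable to pick a backend.
_BACKEND = os.environ.get("KERAS_BACKEND", "tensorflow")

if _BACKEND == "tensorflow":
    import tensorflow as tf

    def matmul(x, y):
        return tf.matmul(x, y)

    def relu(x):
        return tf.nn.relu(x)
else:  # NumPy as an illustrative stand-in for a second backend like Theano
    import numpy as np

    def matmul(x, y):
        return np.matmul(x, y)

    def relu(x):
        return np.maximum(x, 0)

# Layer logic is written once against the primitives and runs on any backend:
def dense_forward(x, weights):
    return relu(matmul(x, weights))
```
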
00:06:30.640 | And for the next year, Theano stayed as the default option.
00:06:35.640 | It was easier to use, somewhat less buggy.
00:06:43.840 | It was much faster, especially when it came to RNNs.
00:06:46.560 | But eventually, TensorFlow overtook it.
00:06:49.560 | - And TensorFlow, the early TensorFlow,
00:06:53.400 | has similar architectural decisions as Theano.
00:06:57.120 | So it was a natural transition.
00:07:00.640 | - Yeah, absolutely.
00:07:01.480 | - So what, I mean, that's still Keras
00:07:05.280 | as a side, almost fun project, right?
00:07:08.480 | - Yeah, so it was not my job assignment.
00:07:12.200 | So it's not, I was doing it on the side.
00:07:14.680 | And even though it grew to have, you know,
00:07:17.960 | a lot of users for a deep learning library at the time,
00:07:20.720 | like throughout 2016, I wasn't doing it as my main job.
00:07:25.680 | So things started changing in, I think it's,
00:07:28.400 | must have been maybe October, 2016.
00:07:33.360 | So one year later.
00:07:34.520 | So Rajat, who was the lead on TensorFlow,
00:07:38.440 | basically showed up one day in our building.
00:07:41.960 | - Yeah.
00:07:42.800 | - Where I was doing research.
00:07:44.240 | So I did a lot of computer vision research,
00:07:47.840 | also collaborations with Christian Szegedy
00:07:50.760 | on deep learning for theorem proving.
00:07:52.840 | It was a really interesting research topic.
00:07:56.080 | And so Rajat was saying, "Hey, we saw Keras, we like it.
00:08:02.720 | We saw that you're at Google.
00:08:05.640 | Why don't you come over for like a quarter
00:08:08.480 | and work with us?"
00:08:10.520 | And I was like, "Yeah, that sounds like a great opportunity.
00:08:12.440 | Let's do it."
00:08:13.600 | And so I started working on integrating the Keras API
00:08:18.600 | into TensorFlow more tightly.
00:08:20.520 | So what followed was a sort of like temporary
00:08:25.840 | TensorFlow only version of Keras
00:08:28.680 | that was in TensorFlow.contrib for a while.
00:08:32.520 | And finally moved to TensorFlow core.
00:08:35.400 | And, you know, I've never actually gotten back
00:08:38.560 | to my old team doing research.
00:08:40.800 | - Well, it's kind of funny that somebody like you
00:08:45.520 | who dreams of, or at least sees the power of AI systems
00:08:50.520 | that reason and theorem proving we'll talk about
00:08:54.840 | has also created a system that makes the most basic
00:08:59.720 | kind of Lego building that is deep learning,
00:09:03.560 | super accessible, super easy.
00:09:05.800 | So beautifully so.
00:09:07.000 | It's a funny irony that you're both,
00:09:10.880 | you're responsible for both things.
00:09:12.280 | But so TensorFlow 2.0, it's kind of, there's a sprint.
00:09:17.160 | I don't know how long it'll take,
00:09:18.240 | but there's a sprint towards the finish.
00:09:20.160 | What do you look, what are you working on these days?
00:09:24.240 | What are you excited about?
00:09:25.320 | What are you excited about in 2.0?
00:09:27.440 | I mean, eager execution.
00:09:28.960 | There's so many things that just make it a lot easier
00:09:31.600 | to work with.
00:09:32.920 | What are you excited about?
00:09:34.760 | And what's also really hard?
00:09:36.800 | What are the problems you have to kind of solve?
00:09:38.960 | - So I've spent the past year and a half
00:09:41.160 | working on TensorFlow 2.0.
00:09:44.080 | It's been a long journey.
00:09:46.120 | I'm actually extremely excited about it.
00:09:48.280 | I think it's a great product.
00:09:49.640 | It's a delightful product compared to TensorFlow 1.0.
00:09:52.560 | We've made huge progress.
00:09:54.640 | So on the Keras side, what I'm really excited about is that,
00:10:00.600 | so, you know, previously Keras had been this very easy
00:10:05.280 | to use high-level interface to do deep learning.
00:10:08.400 | But if you wanted,
00:10:10.720 | you know, a lot of flexibility,
00:10:16.480 | the Keras framework, you know,
00:10:18.840 | was probably not the optimal way to do things
00:10:21.840 | compared to just writing everything from scratch.
00:10:24.280 | So in some way, the framework was getting in the way.
00:10:28.120 | And in TensorFlow 2.0, you don't have this at all,
00:10:31.000 | actually, you have the usability of the high-level interface,
00:10:34.520 | but you have the flexibility of this lower-level interface.
00:10:37.960 | And you have this spectrum of workflows
00:10:40.360 | where you can get more or less usability
00:10:45.000 | and flexibility trade-offs depending on your needs, right?
00:10:50.000 | You can write everything from scratch
00:10:53.120 | and you get a lot of help doing so by, you know,
00:10:56.440 | subclassing models and writing your own training loops
00:10:59.880 | using eager execution.
00:11:01.640 | It's very flexible.
00:11:02.560 | It's very easy to debug.
00:11:03.600 | It's very powerful.
00:11:04.600 | But all of this integrates seamlessly
00:11:08.200 | with higher-level features up to, you know,
00:11:11.040 | the classic Keras workflows,
00:11:12.640 | which are very scikit-learn-like
00:11:14.760 | and, you know, are ideal for a data scientist,
00:11:19.280 | machine learning engineer type of profile.
00:11:21.440 | So now you can have the same framework
00:11:24.040 | offering the same set of APIs
00:11:26.080 | that enable a spectrum of workflows
00:11:28.240 | that are more or less low-level, more or less high-level
00:11:31.800 | that are suitable for, you know,
00:11:33.680 | profiles ranging from researchers to data scientists
00:11:37.640 | and everything in between.