
Rajat Monga: TensorFlow | Lex Fridman Podcast #22


Chapters

0:00 Introduction
1:17 Early days of Google Brain
1:47 Early wins
3:36 Scaling
8:24 Google TensorFlow
12:11 Graph
13:19 Early days
14:46 Documentation
18:27 Hobbyist perspective
19:30 TensorFlow Extended
20:38 Deep learning
21:32 Datasets
24:38 Keras API
26:02 Python and TensorFlow
27:6 Ecosystem
29:2 Enable Machine Learning
33:05 Challenges
35:29 Starting over
37:52 Competition
44:15 How does TensorFlow grow
47:49 Predicting the future
49:47 TensorFlow for beginners
51:50 Team cohesion
54:38 Hiring process
57:22 Culture fit

Whisper Transcript

00:00:00.000 | The following is a conversation with Rajat Monga.
00:00:03.080 | He's an engineering director at Google,
00:00:04.920 | leading the TensorFlow team.
00:00:06.960 | TensorFlow is an open source library
00:00:09.160 | at the center of much of the work going on
00:00:11.000 | in the world in deep learning, both the cutting edge
00:00:13.480 | research and the large scale application
00:00:15.680 | of learning based approaches.
00:00:17.720 | But it's quickly becoming much more than a software library.
00:00:20.920 | It's now an ecosystem of tools for the deployment
00:00:23.760 | of machine learning in the cloud, on the phone,
00:00:25.720 | in the browser, on both generic and specialized hardware.
00:00:29.840 | TPU, GPU, and so on.
00:00:31.920 | Plus, there's a big emphasis on growing a passionate community
00:00:35.200 | of developers.
00:00:36.600 | Rajat, Jeff Dean, and a large team
00:00:38.880 | of engineers at Google Brain are working
00:00:40.920 | to define the future of machine learning with TensorFlow 2.0,
00:00:44.640 | which is now in alpha.
00:00:46.200 | I think the decision to open source TensorFlow
00:00:49.160 | is a definitive moment in the tech industry.
00:00:51.720 | It showed that open innovation can be successful
00:00:54.360 | and inspire many companies to open source their code
00:00:56.880 | to publish and, in general, engage
00:00:58.840 | in the open exchange of ideas.
00:01:01.200 | This conversation is part of the Artificial Intelligence
00:01:03.920 | podcast.
00:01:05.040 | If you enjoy it, subscribe on YouTube, iTunes,
00:01:07.840 | or simply connect with me on Twitter at Lex Fridman,
00:01:10.840 | spelled F-R-I-D. And now, here's my conversation
00:01:14.960 | with Rajat Monga.
00:01:17.920 | You were involved with Google Brain
00:01:19.720 | since its start in 2011 with Jeff Dean.
00:01:24.840 | It started with DistBelief, the proprietary machine learning
00:01:29.200 | library, and turned to TensorFlow in 2014,
00:01:32.800 | the open source library.
00:01:35.680 | So what were the early days of Google Brain like?
00:01:39.080 | What were the goals, the missions?
00:01:41.800 | How do you even proceed forward when
00:01:44.880 | there's so much possibilities before you?
00:01:47.720 | It was interesting back then when I started,
00:01:50.520 | or when we were even just talking about it.
00:01:55.320 | The idea of deep learning was interesting and intriguing
00:01:59.480 | in some ways.
00:02:00.400 | It hadn't yet taken off, but it held some promise.
00:02:04.840 | It had shown some very promising and early results.
00:02:08.640 | I think the idea where Andrew and Jeff had started
00:02:11.360 | was, what if we can take this, what people are doing
00:02:15.400 | in research, and scale it to what Google has in terms
00:02:19.160 | of the compute power, and also put that kind of data
00:02:23.960 | together, what does it mean?
00:02:25.280 | And so far, the results have been, if you scale the compute,
00:02:28.280 | scale the data, it does better.
00:02:30.160 | And would that work?
00:02:31.480 | And so that was the first year or two,
00:02:33.400 | can we prove that out, right?
00:02:35.080 | And with DistBelief, when we started the first year,
00:02:37.480 | we got some early wins, which is always great.
00:02:40.760 | - What were the wins like?
00:02:41.920 | What was the wins where you were,
00:02:44.120 | there's some promise to this, this is gonna be good?
00:02:46.600 | - I think there are two early wins where one was speech,
00:02:49.680 | that we collaborated very closely
00:02:51.400 | with the speech research team,
00:02:52.480 | who was also getting interested in this.
00:02:54.840 | And the other one was on images where we,
00:02:57.880 | the cat paper, as we call it,
00:02:59.480 | that was covered by a lot of folks.
00:03:03.160 | - And the birth of Google Brain was around neural networks.
00:03:07.040 | That was, so it was deep learning from the very beginning.
00:03:09.320 | That was the whole mission.
00:03:10.840 | So what would, in terms of scale,
00:03:15.040 | what was the sort of dream of what this could become?
00:03:20.040 | Like, were there echoes of this open source
00:03:23.120 | TensorFlow community that might be brought in?
00:03:26.260 | Was there a sense of TPUs?
00:03:28.640 | Was there a sense of like, machine learning
00:03:31.160 | is now gonna be at the core of the entire company?
00:03:33.720 | Is it going to grow into that direction?
00:03:36.040 | - Yeah, I think, so that was interesting.
00:03:38.320 | And like, if I think back to 2012 or 2011,
00:03:41.400 | and first was, can we scale it in the year
00:03:44.800 | or so we had started scaling it
00:03:46.440 | to hundreds and thousands of machines.
00:03:48.160 | In fact, we had some runs even going to 10,000 machines,
00:03:51.080 | and all of those showed great promise.
00:03:52.920 | In terms of machine learning at Google,
00:03:56.800 | the good thing was Google's been doing machine learning
00:03:58.780 | for a long time.
00:04:00.240 | Deep learning was new, but as we scaled this up,
00:04:03.760 | we showed that, yes, that was possible,
00:04:05.600 | and it was gonna impact lots of things.
00:04:07.840 | Like, we started seeing real products wanting to use this.
00:04:11.200 | Again, speech was the first.
00:04:12.760 | There were image things that photos came out of,
00:04:15.240 | and then many other products as well.
00:04:17.440 | So that was exciting.
00:04:18.940 | As we went into that a couple of years,
00:04:23.200 | externally also academia started to,
00:04:25.840 | there was lots of push on, okay, deep learning's interesting,
00:04:28.360 | we should be doing more, and so on.
00:04:30.640 | And so by 2014, we were looking at,
00:04:34.400 | okay, this is a big thing, it's gonna grow.
00:04:36.820 | And not just internally, externally as well.
00:04:39.480 | Yes, maybe Google's ahead of where everybody is,
00:04:42.320 | but there's a lot to do.
00:04:43.640 | So a lot of this started to make sense and come together.
00:04:46.720 | - So the decision to open source,
00:04:49.560 | I was just chatting with Chris Lattner about this,
00:04:52.240 | the decision to go open source with TensorFlow,
00:04:54.640 | I would say, for me personally,
00:04:57.080 | seems to be one of the big seminal moments
00:04:59.640 | in all of software engineering ever.
00:05:01.720 | I think that's when a large company like Google
00:05:04.640 | decides to take a large project
00:05:06.480 | that many lawyers might argue has a lot of IP,
00:05:10.840 | just decide to go open source with it,
00:05:12.940 | and in so doing, lead the entire world in saying,
00:05:15.280 | you know what, open innovation is a pretty powerful thing,
00:05:19.400 | and it's okay to do.
00:05:20.820 | That was, I mean, that's an incredible moment in time.
00:05:26.340 | So do you remember those discussions happening?
00:05:29.320 | Whether open source should be happening?
00:05:31.440 | What was that like?
00:05:32.720 | - I would say, I think, so the initial idea came from Jeff,
00:05:36.880 | who was a big proponent of this.
00:05:39.440 | I think it came off of two big things.
00:05:42.480 | One was research-wise, we were a research group.
00:05:46.320 | We were putting all our research out there,
00:05:49.640 | we were building on others' research,
00:05:51.720 | and we wanted to push the state of the art forward,
00:05:55.000 | and part of that was to share the research.
00:05:56.840 | That's how I think deep learning and machine learning
00:05:58.960 | has really grown so fast.
00:06:00.440 | So the next step was, okay, now,
00:06:03.360 | would software help with that?
00:06:05.360 | And it seemed like there were a few
00:06:08.460 | existing libraries out there,
00:06:10.360 | Theano being one, Torch being another, and a few others,
00:06:13.200 | but they were all done by academia,
00:06:15.520 | and so the level was significantly different.
00:06:18.080 | The other one was, from a software perspective,
00:06:22.040 | Google had done lots of software that we used internally,
00:06:27.040 | and we published papers.
00:06:29.120 | Often there was an open source project
00:06:31.720 | that came out of that,
00:06:32.640 | that somebody else picked up that paper and implemented,
00:06:35.440 | and they were very successful.
00:06:38.280 | Back then, it was like, okay, there's Hadoop,
00:06:41.460 | which has come off of tech that we've built.
00:06:44.160 | We know the tech we've built is way better
00:06:46.220 | for a number of different reasons.
00:06:47.880 | We've invested a lot of effort in that,
00:06:50.440 | and turns out we have Google Cloud,
00:06:54.320 | and we are now not really providing our tech,
00:06:57.520 | but we are saying, okay, we have Bigtable,
00:07:00.360 | which is the original thing.
00:07:02.040 | We are going to now provide HBase APIs on top of that,
00:07:04.720 | which isn't as good, but that's what everybody's used to.
00:07:07.480 | So there's this, like, can we make something that is better
00:07:10.960 | and really just provide?
00:07:12.320 | Helps the community in lots of ways,
00:07:14.320 | but also helps push a good standard forward.
00:07:18.320 | - So how does Cloud fit into that?
00:07:19.920 | There's a TensorFlow open source library,
00:07:22.680 | and how does the fact that you can use
00:07:26.280 | so many of the resources that Google provides
00:07:28.240 | and the Cloud fit into that strategy?
00:07:31.480 | - So TensorFlow itself is open,
00:07:33.600 | and you can use it anywhere, right?
00:07:34.920 | And we want to make sure that continues to be the case.
00:07:38.360 | On Google Cloud, we do make sure that
00:07:41.720 | there's lots of integrations with everything else,
00:07:43.840 | and we want to make sure that it works
00:07:45.400 | really, really well there.
00:07:47.320 | - You're leading the TensorFlow effort.
00:07:50.400 | Can you tell me the history and the timeline
00:07:51.880 | of TensorFlow project in terms of major design decisions,
00:07:55.880 | so like the open source decision,
00:07:58.160 | but really, you know, what to include and not?
00:08:01.600 | There's this incredible ecosystem
00:08:03.200 | that I'd like to talk about.
00:08:04.960 | There's all these parts, but what if you just,
00:08:07.960 | some sample moments that defined
00:08:12.960 | what TensorFlow eventually became through its,
00:08:16.000 | I don't know if you're allowed to say history when it's,
00:08:19.520 | but in deep learning, everything moves so fast
00:08:21.320 | in just a few years, is there any history?
00:08:23.480 | - Yes, yes.
00:08:24.880 | So looking back, we were building TensorFlow,
00:08:29.800 | I guess we open sourced it in 2015, November 2015.
00:08:34.280 | We started on it in summer of 2014, I guess.
00:08:38.640 | And somewhere three to six months in, late 2014,
00:08:43.000 | by then we had decided that,
00:08:44.920 | okay, there's a high likelihood we'll open source it.
00:08:47.120 | So we started thinking about that
00:08:48.920 | and making sure we're heading down that path.
00:08:51.520 | At that point, by that point, we had seen a few,
00:08:57.320 | you know, lots of different use cases at Google.
00:08:59.320 | So there were things like, okay,
00:09:01.000 | yes, you want to run it at large scale in the data center.
00:09:04.200 | Yes, we need to support different kind of hardware.
00:09:07.560 | We had GPUs at that point,
00:09:09.440 | we had our first TPU at that point,
00:09:11.880 | or it was about to come out, you know, roughly around that time.
00:09:15.720 | So the design sort of included those.
00:09:18.720 | We had started to push on mobile.
00:09:21.800 | So we were running models on mobile.
00:09:24.920 | At that point, people were customizing code.
00:09:28.160 | So we wanted to make sure TensorFlow
00:09:29.520 | could support that as well,
00:09:30.720 | so that that sort of became part of that overall design.
00:09:35.280 | - When you say mobile,
00:09:36.560 | you mean like pretty complicated algorithms
00:09:38.640 | running on the phone?
00:09:40.040 | - That's correct.
00:09:40.880 | So when you have a model that you deploy on the phone
00:09:44.320 | and run it the right--
00:09:45.320 | - So already at that time,
00:09:46.400 | there was ideas of running machine learning on the phone.
00:09:48.840 | - That's correct.
00:09:49.680 | We already had a couple of products
00:09:51.440 | that were doing that by then.
00:09:53.320 | And in those cases, we had basically customized
00:09:56.440 | handcrafted code or some internal libraries that we're using.
00:10:00.200 | - So I was actually at Google during this time
00:10:02.600 | in a parallel, I guess, universe,
00:10:04.600 | but we were using Theano and Caffe.
00:10:07.200 | - Yeah.
00:10:08.040 | - Was there some degree to which you were balancing,
00:10:11.640 | like trying to see what Caffe was offering people,
00:10:15.560 | trying to see what Theano was offering,
00:10:18.040 | that you want to make sure you're delivering
00:10:20.000 | on whatever that is, perhaps the Python part of thing,
00:10:23.760 | maybe did that influence any design decisions?
00:10:27.560 | - Totally.
00:10:28.400 | So when we built DistBelief,
00:10:29.640 | and some of that was in parallel
00:10:31.640 | with some of these libraries coming up,
00:10:33.440 | I mean, Theano itself is older,
00:10:35.400 | but we were building DistBelief focused on
00:10:40.520 | our internal thing because our systems were very different.
00:10:43.000 | By the time we got to this,
00:10:44.120 | we looked at a number of libraries that were out there.
00:10:47.160 | Theano, there were folks in the group
00:10:49.320 | who had experience with Torch, with Lua.
00:10:52.160 | There were folks here who had seen Caffe.
00:10:54.800 | I mean, actually, Yangqing Jia was here as well.
00:10:57.560 | There's, what other libraries?
00:11:03.040 | I think we looked at a number of things.
00:11:04.960 | Might even have looked at Chainer back then.
00:11:06.880 | I'm trying to remember if it was there.
00:11:09.440 | In fact, yeah, we did discuss ideas around,
00:11:12.080 | okay, should we have a graph or not?
00:11:14.240 | And they were, so putting all these together
00:11:19.360 | was definitely, you know,
00:11:20.520 | they were key decisions that we wanted.
00:11:22.680 | We had seen limitations in our prior DistBelief things.
00:11:27.280 | A few of them were just in terms of
00:11:30.960 | research was moving so fast, we wanted the flexibility.
00:11:33.800 | The hardware was changing fast,
00:11:36.400 | we expected to change that,
00:11:37.800 | so that those probably were two things.
00:11:39.920 | And yeah, I think the flexibility in terms of
00:11:43.520 | being able to express all kinds of crazy things
00:11:45.360 | was definitely a big one then.
00:11:47.000 | - So what, the graph decisions,
00:11:48.680 | without moving towards TensorFlow 2.0,
00:11:52.440 | there's more, by default, there'll be eager execution.
00:11:56.760 | So sort of hiding the graph a little bit
00:11:59.200 | because it's less intuitive in terms of
00:12:01.920 | the way people develop and so on.
00:12:03.600 | What was that discussion like in terms of using graphs?
00:12:06.760 | It seemed, it's kind of the Theano way,
00:12:09.360 | did it seem the obvious choice?
00:12:11.640 | - So I think where it came from was
00:12:14.400 | our DistBelief had a graph-like thing as well.
00:12:17.680 | A much more, it wasn't a general graph,
00:12:19.800 | it was more like a straight line thing.
00:12:21.880 | More like what you might think of Caffe,
00:12:25.080 | I guess, in that sense.
00:12:26.440 | But the graph was, and we always cared
00:12:30.040 | about the production stuff.
00:12:31.160 | Like even with DistBelief, we were deploying
00:12:32.560 | a whole bunch of stuff in production.
00:12:34.480 | So graph did come from that when we thought of,
00:12:37.480 | okay, should we do that in Python?
00:12:39.200 | And we experimented with some ideas where
00:12:41.800 | it looked a lot simpler to use,
00:12:43.880 | but not having a graph meant,
00:12:46.760 | okay, how do you deploy now?
00:12:47.960 | So that was probably what tilted the balance for us
00:12:51.200 | and eventually we ended up with a graph.
00:12:52.960 | - And I guess the question there is, did you,
00:12:55.400 | I mean, so production seems to be
00:12:57.400 | the really good thing to focus on,
00:12:59.880 | but did you even anticipate the other side of it
00:13:02.480 | where there could be, what is it,
00:13:04.600 | what are the numbers, something crazy,
00:13:06.640 | 41 million downloads?
00:13:09.000 | - Yep.
00:13:09.840 | (laughing)
00:13:12.080 | - I mean, was that even like a possibility
00:13:15.480 | in your mind that it would be as popular as it became?
00:13:19.200 | - So I think we did see a need for this
00:13:23.240 | a lot from the research perspective
00:13:27.600 | and like early days of deep learning in some ways.
00:13:30.960 | 41 million, no, I don't think I imagined this number then.
00:13:35.560 | It seemed like there's a potential future
00:13:41.720 | where lots more people would be doing this
00:13:43.800 | and how do we enable that?
00:13:45.720 | I would say this kind of growth,
00:13:48.160 | I probably started seeing somewhat after the open sourcing
00:13:52.680 | where it was like, okay, deep learning
00:13:55.840 | is actually growing way faster
00:13:57.920 | for a lot of different reasons
00:13:59.280 | and we are in just the right place to push on that
00:14:02.800 | and leverage that and deliver on lots of things
00:14:06.160 | that people want.
00:14:07.520 | - So what changed once you open sourced?
00:14:09.800 | Like how this incredible amount of attention
00:14:13.400 | from a global population of developers,
00:14:16.520 | how did the project start changing?
00:14:18.240 | I don't even actually remember during those times.
00:14:22.240 | I know looking now, there's really good documentation,
00:14:24.600 | there's an ecosystem of tools,
00:14:26.640 | there's a community, there's a YouTube channel now, right?
00:14:29.800 | - Yeah. (laughing)
00:14:31.200 | - It's very community driven.
00:14:33.840 | Back then, I guess 0.1 version,
00:14:37.600 | is that the version?
00:14:42.160 | - I think we called it 0.6 or 0.5,
00:14:42.160 | something like that, I forget what that is.
00:14:43.760 | - What changed leading into 1.0?
00:14:46.200 | - It's interesting, I think we've gone through
00:14:50.440 | a few things there.
00:14:51.680 | When we started out, when we first came out,
00:14:53.720 | people loved the documentation we have
00:14:56.120 | because it was just a huge step up from everything else
00:14:58.880 | because all of those were academic projects,
00:15:00.480 | people doing, who don't think about documentation.
00:15:03.400 | I think what that changed was,
00:15:07.000 | instead of deep learning being a research thing,
00:15:10.400 | some people who were just developers
00:15:12.600 | could now suddenly take this out
00:15:14.680 | and do some interesting things with it, right?
00:15:16.960 | Who had no clue what machine learning was before then.
00:15:20.280 | And that, I think really changed how things
00:15:23.280 | started to scale up in some ways and pushed on it.
00:15:26.720 | Over the next few months as we looked at,
00:15:30.400 | how do we stabilize things,
00:15:32.000 | as we look at not just researchers,
00:15:33.880 | now we want stability, people want to deploy things,
00:15:36.520 | that's how we started planning for 1.0.
00:15:39.000 | And there are certain needs for that perspective,
00:15:42.200 | and so again, documentation comes up,
00:15:44.360 | designs, more kinds of things to put that together.
00:15:48.200 | And so that was exciting to get that to a stage
00:15:52.240 | where more and more enterprises wanted to buy in
00:15:55.400 | and really get behind that.
00:15:57.760 | And I think post 1.0 and over the next few releases,
00:16:02.680 | that enterprise adoption also started to take off.
00:16:05.280 | I would say between the initial release and 1.0,
00:16:08.000 | it was, okay, researchers, of course,
00:16:11.080 | then a lot of hobbies and early interest,
00:16:13.760 | people excited about this who started to get on board,
00:16:15.960 | and then over the 1.x thing, lots of enterprises.
00:16:19.040 | - I imagine anything that's below 1.0
00:16:23.840 | gets pressure to be,
00:16:25.960 | enterprise probably wants something that's stable.
00:16:28.040 | - Exactly.
00:16:28.880 | - And do you have a sense now that TensorFlow is stable?
00:16:33.320 | Like it feels like deep learning in general
00:16:35.560 | is extremely dynamic field.
00:16:37.720 | There's so much changing.
00:16:39.000 | TensorFlow has been growing incredibly.
00:16:43.400 | You have a sense of stability at the helm of it?
00:16:46.760 | I mean, I know you're in the midst of it, but--
00:16:48.400 | - Yeah, I think in the midst of it,
00:16:51.680 | it's often easy to forget what an enterprise wants
00:16:55.120 | and what some of the people on that side want.
00:16:58.800 | There are still people running models
00:17:00.440 | that are three years old, four years old,
00:17:02.680 | so Inception is still used by tons of people.
00:17:06.040 | Even ResNet-50 is what, a couple of years old now or more,
00:17:08.960 | but there are tons of people who use that,
00:17:10.920 | and they're fine.
00:17:12.240 | They don't need the last couple of bits of performance
00:17:15.320 | or quality, they want some stability
00:17:17.720 | and things that just work.
00:17:19.640 | And so there is value in providing that
00:17:22.240 | with that kind of stability and making it really simpler,
00:17:25.240 | because that allows a lot more people to access it.
00:17:27.840 | And then there's the research crowd which wants,
00:17:31.240 | okay, they wanna do these crazy things
00:17:33.080 | exactly like you're saying, right?
00:17:34.320 | Not just deep learning in the straight-up models
00:17:37.080 | that used to be there, they want RNNs,
00:17:40.640 | and even RNNs are maybe old, there are transformers now,
00:17:43.480 | and now it needs to combine with RL and GANs and so on.
00:17:48.480 | So there's definitely that area,
00:17:51.200 | the boundary that's shifting and pushing
00:17:53.360 | the state of the art,
00:17:55.200 | but I think there's more and more of the past
00:17:57.200 | that's much more stable,
00:17:59.720 | and even stuff that was two, three years old
00:18:02.720 | is very, very usable by lots of people.
00:18:04.960 | So that part makes it a lot easier.
00:18:07.480 | - So I imagine, maybe you can correct me if I'm wrong,
00:18:09.840 | one of the biggest use cases is essentially
00:18:12.440 | taking something like ResNet-50
00:18:14.440 | and doing some kind of transfer learning
00:18:17.280 | on a very particular problem that you have.
00:18:19.600 | It's basically probably what majority of the world does.
00:18:23.120 | And you wanna make that as easy as possible.
00:18:27.400 | - So I would say, for the hobbyist perspective,
00:18:30.480 | that's the most common case, right?
00:18:32.840 | In fact, the apps on phones and stuff that you'll see,
00:18:35.440 | the early ones, that's the most common case.
00:18:37.720 | I would say there are a couple of reasons for that.
00:18:40.360 | One is that everybody talks about that.
00:18:43.520 | It looks great on slides.
00:18:45.840 | - Yeah, it's visual. - That's a part
00:18:46.920 | of the presentation, yeah, exactly.
00:18:48.940 | What enterprises want is, that is part of it,
00:18:53.200 | but that's not the big thing.
00:18:54.520 | Enterprises really have data
00:18:56.200 | that they wanna make predictions on.
00:18:58.120 | This is often what they used to do
00:19:00.440 | with the people who were doing ML,
00:19:01.880 | was just regression models, linear regression,
00:19:04.360 | logistic regression, linear models,
00:19:06.520 | or maybe gradient-boosted trees and so on.
00:19:09.880 | Some of them still benefit from deep learning,
00:19:11.820 | but they weren't that, that's the bread and butter,
00:19:14.520 | like the structured data and so on.
00:19:16.380 | So depending on the audience you look at,
00:19:18.280 | they're a little bit different.
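
For reference, here is a minimal sketch of the common transfer-learning case mentioned above, reusing a pretrained ResNet-50 from tf.keras as a frozen feature extractor. The five-class head and the training data are placeholders, not anything discussed in the conversation.

```python
import tensorflow as tf

# Pretrained ImageNet backbone without its classification head.
base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained weights

# Small new head trained on your own (hypothetical) 5-class problem.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=3)  # supply your own dataset
```
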
00:19:19.600 | - And they just have, I mean, the best of enterprise
00:19:23.420 | probably just has a very large dataset,
00:19:26.520 | or deep learning can probably shine.
00:19:28.680 | - That's correct, that's right.
00:19:30.280 | And then the, I think the other pieces that they want,
00:19:33.280 | again, with 2.0, or the developer summit we put together,
00:19:36.440 | is the whole TensorFlow Extended piece,
00:19:39.040 | which is the entire pipeline.
00:19:40.640 | They care about stability across doing their entire thing.
00:19:43.600 | They want simplicity across the entire thing.
00:19:46.280 | I don't need to just train a model.
00:19:47.720 | I need to do that every day again, over and over again.
00:19:51.320 | - I wonder to which degree you have a role in,
00:19:54.340 | I don't know, so I teach a course on deep learning.
00:19:57.080 | I have people like lawyers come up to me and say,
00:20:00.740 | you know, say, "When is machine learning gonna enter legal,
00:20:04.140 | "the legal realm?"
00:20:05.600 | The same thing in all kinds of disciplines,
00:20:09.480 | immigration, insurance.
00:20:13.760 | Often when I see what it boils down to is,
00:20:16.360 | these companies are often a little bit old school
00:20:19.480 | in the way they organize the data.
00:20:20.880 | So the data is just not ready yet, it's not digitized.
00:20:24.040 | Do you also find yourself being in the role of
00:20:26.880 | an evangelist for, like,
00:20:29.320 | let's get, organize your data, folks,
00:20:33.120 | and then you'll get the big benefit of TensorFlow.
00:20:35.520 | Do you get those, have those conversations?
00:20:38.040 | - Yeah, yeah, I get all kinds of questions there from,
00:20:42.340 | okay, what can I, what do I need to make this work, right?
00:20:47.660 | Do we really need deep learning?
00:20:50.860 | I mean, there are all these things,
00:20:52.120 | I already used this linear model, why would this help?
00:20:55.240 | I don't have enough data, let's say, you know,
00:20:57.200 | or I wanna use machine learning,
00:21:00.040 | but I have no clue where to start.
00:21:01.800 | So it varies, all the way to the experts
00:21:04.980 | who ask about very specific things, so it's interesting.
00:21:08.600 | - Is there a good answer?
00:21:09.640 | It boils down to, oftentimes, digitizing data.
00:21:12.520 | So whatever you want automated,
00:21:14.480 | whatever data you want to make prediction based on,
00:21:17.560 | you have to make sure that it's in an organized form.
00:21:20.720 | And you've, like, within the TensorFlow ecosystem,
00:21:24.000 | there's now, you're providing more and more datasets
00:21:26.560 | and more and more pre-trained models.
00:21:28.960 | Are you finding yourself also the organizer of datasets?
00:21:32.400 | - Yes, I think with TensorFlow datasets
00:21:34.520 | that we just released, that's definitely come up
00:21:37.520 | where people want these datasets, can we organize them
00:21:40.120 | and can we make that easier?
00:21:41.440 | So that's definitely one important thing.
00:21:45.320 | The other related thing I would say is I often tell people,
00:21:47.680 | you know what, don't think of the most fanciest thing
00:21:50.960 | that the newest model that you see.
00:21:53.320 | Make something very basic work and then you can improve it.
00:21:56.400 | There's just lots of things you can do with it.
00:21:58.920 | - Yeah, start with the basics, true.
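
As a small illustration of the TensorFlow Datasets point above, here is a hedged sketch of loading a ready-made dataset and turning it into an input pipeline; the dataset name, batch size, and preprocessing are illustrative only.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Pull a versioned, ready-made dataset instead of organizing files by hand.
ds_train = tfds.load("mnist", split="train", as_supervised=True)

# Standard tf.data pipeline: normalize, shuffle, batch, prefetch.
ds_train = (ds_train
            .map(lambda image, label: (tf.cast(image, tf.float32) / 255.0, label))
            .shuffle(10_000)
            .batch(32)
            .prefetch(tf.data.experimental.AUTOTUNE))
# The resulting dataset can be passed directly to model.fit(ds_train).
```
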
00:22:00.640 | One of the big things that makes TensorFlow
00:22:03.280 | even more accessible was the appearance,
00:22:06.120 | whenever that happened, of Keras,
00:22:08.360 | the Keras standard, sort of outside of TensorFlow.
00:22:12.400 | I think it was Keras on top of Theano at first,
00:22:17.760 | only, and then Keras became on top of TensorFlow.
00:22:22.480 | Do you know when Keras chose to also add TensorFlow
00:22:27.480 | as a backend, was it just the community
00:22:32.320 | that drove that initially?
00:22:33.960 | Do you know if there was discussions, conversations?
00:22:37.000 | - Yeah, so Francois started the Keras project
00:22:40.960 | before he was at Google and the first thing was Theano.
00:22:44.560 | I don't remember if that was after TensorFlow
00:22:47.160 | was created or way before.
00:22:48.480 | And then at some point when TensorFlow
00:22:52.080 | started becoming popular, there were enough similarities
00:22:54.200 | that he decided to, okay, create this interface
00:22:56.360 | and put TensorFlow as a backend.
00:22:58.200 | I believe that might still have been before
00:23:01.200 | he joined Google, so we weren't really talking about that.
00:23:06.200 | He decided on his own and thought that was interesting
00:23:09.760 | and relevant to the community.
00:23:11.320 | In fact, I didn't find out about him being at Google
00:23:17.120 | until a few months after he was here.
00:23:19.680 | He was working on some research ideas
00:23:21.880 | and doing Keras on his nights and weekends project.
00:23:24.520 | - Oh, interesting.
00:23:25.360 | So he wasn't part of the TensorFlow.
00:23:28.560 | He didn't join initially.
00:23:29.760 | - He joined research and he was doing some amazing research.
00:23:32.320 | He has some papers on that and research.
00:23:35.480 | He's a great researcher as well.
00:23:37.120 | And at some point we realized, oh,
00:23:40.640 | he's doing this good stuff.
00:23:42.480 | People seem to like the API and he's right here.
00:23:45.440 | So we talked to him and he said,
00:23:47.760 | okay, why don't I come over to your team
00:23:50.640 | and work with you for a quarter
00:23:52.840 | and let's make that integration happen.
00:23:55.560 | And we talked to his manager and he said,
00:23:56.880 | sure, my quarter's fine.
00:23:58.600 | And that quarter's been something like two years now.
00:24:02.480 | (laughing)
00:24:03.400 | So he's fully on this.
00:24:05.120 | - So Keras got integrated into TensorFlow
00:24:09.680 | like in a deep way.
00:24:12.040 | And now with 2.0, TensorFlow 2.0,
00:24:15.240 | sort of Keras is kind of the recommended way
00:24:18.760 | for a beginner to interact with TensorFlow.
00:24:21.720 | Which makes that initial sort of transfer learning
00:24:24.640 | or the basic use cases, even for enterprise,
00:24:28.080 | super simple, right?
00:24:29.320 | - That's correct.
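
To make that concrete, this is roughly what the recommended tf.keras beginner path looks like in 2.0; the layer sizes and the MNIST dataset are a conventional example, not something prescribed in the conversation.

```python
import tensorflow as tf

# Load a toy dataset and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define, compile, train, and evaluate a small model with the Keras API.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```
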
00:24:30.440 | - So what was that decision like?
00:24:32.040 | That seems like,
00:24:32.880 | that's kind of a bold decision as well.
00:24:37.760 | - We did spend a lot of time thinking about that one.
00:24:41.240 | We had a bunch of APIs, some built by us.
00:24:46.040 | There was a parallel layers API that we were building.
00:24:48.800 | And when we decided to do Keras in parallel,
00:24:51.600 | so there were like, okay, two things that we are looking at.
00:24:54.440 | And the first thing we were trying to do
00:24:55.960 | is just have them look similar,
00:24:58.240 | like be as integrated as possible,
00:25:00.120 | share all of that stuff.
00:25:02.200 | There were also like three other APIs
00:25:04.040 | that others had built over time
00:25:05.880 | because we didn't have a standard one.
00:25:09.080 | But one of the messages that we kept hearing
00:25:11.480 | from the community, okay, which one do we use?
00:25:13.240 | And they kept saying like, okay,
00:25:14.480 | here's a model in this one,
00:25:15.600 | and here's a model in this one, which should I pick?
00:25:18.880 | So that's sort of like, okay,
00:25:20.960 | we had to address that straight on with 2.0.
00:25:24.080 | The whole idea was we need to simplify,
00:25:26.360 | we had to pick one.
00:25:27.400 | Based on where we were, we were like, okay,
00:25:31.240 | let's see what are the people like.
00:25:35.680 | And Keras was clearly one that lots of people loved.
00:25:39.320 | There were lots of great things about it.
00:25:41.640 | So we settled on that.
00:25:43.920 | - Organically, that's kind of the best way to do it.
00:25:46.440 | It was great.
00:25:47.520 | It was surprising, nevertheless,
00:25:48.760 | to sort of bring in an outside.
00:25:51.120 | I mean, there was a feeling like Keras might
00:25:54.080 | be almost like a competitor
00:25:55.440 | in a certain kind of way to TensorFlow.
00:25:58.040 | And in a sense, it became an empowering element
00:26:01.320 | of TensorFlow.
00:26:02.240 | - That's right.
00:26:03.280 | Yeah, it's interesting how you can put two things together
00:26:06.400 | which can align, right?
00:26:08.280 | And in this case, I think Francois, the team,
00:26:11.760 | and a bunch of us have chatted,
00:26:14.240 | and I think we all want to see the same kind of things.
00:26:17.360 | We all care about making it easier
00:26:18.840 | for the huge set of developers out there,
00:26:21.480 | and that makes a difference.
00:26:23.520 | - So Python has Guido van Rossum,
00:26:26.920 | who until recently held the position
00:26:28.960 | of benevolent dictator for life.
00:26:33.600 | Does a huge successful open source project
00:26:36.520 | like TensorFlow need one person who makes a final decision?
00:26:40.680 | So you've did a pretty successful TensorFlow Dev Summit
00:26:45.480 | just now, last couple of days.
00:26:47.520 | There's clearly a lot of different new features
00:26:51.080 | being incorporated in an amazing ecosystem and so on.
00:26:53.880 | How are those design decisions made?
00:26:57.320 | Is there a BDFL in TensorFlow,
00:27:02.800 | or is it more distributed and organic?
00:27:05.880 | - I think it's somewhat different, I would say.
00:27:08.800 | I've always been involved in the key design directions,
00:27:14.960 | but there are lots of things that are distributed
00:27:18.520 | where there are a number of people,
00:27:20.680 | Martin Wicke being one who has really driven
00:27:23.320 | a lot of our open source stuff, a lot of the APIs,
00:27:26.600 | and there are a number of other people
00:27:29.280 | who've been pushed and been responsible
00:27:32.760 | for different parts of it.
00:27:34.160 | We do have regular design reviews.
00:27:37.920 | Over the last year, we've really spent a lot of time
00:27:40.160 | opening up to the community and adding transparency.
00:27:43.320 | We're setting more processes in place,
00:27:45.920 | so RFCs, special interest groups,
00:27:49.200 | really grow that community and scale that.
00:27:52.160 | I think the kind of scale that ecosystem is in,
00:27:57.800 | I don't think we could scale with having me
00:27:59.560 | as the standpoint of decision maker.
00:28:02.320 | - I got it.
00:28:03.480 | So yeah, the growth of that ecosystem.
00:28:05.920 | Maybe you can talk about it a little bit.
00:28:08.080 | First of all, it started with Andrej Karpathy
00:28:10.760 | when he first did ConvNetJS.
00:28:13.160 | The fact that you can train in your own network
00:28:15.400 | in the browser in JavaScript was incredible.
00:28:18.520 | So now TensorFlow.js is really making that a serious,
00:28:23.960 | a legit thing, a way to operate,
00:28:27.560 | whether it's in the back end or the front end.
00:28:29.560 | Then there's the TensorFlow Extended,
00:28:31.400 | like you mentioned.
00:28:32.720 | There's TensorFlow Lite for mobile.
00:28:35.360 | And all of it, as far as I can tell,
00:28:37.480 | it's really converging towards being able to save models
00:28:42.480 | in the same kind of way.
00:28:43.480 | You can move around, you can train on the desktop,
00:28:46.680 | and then move it to mobile and so on.
00:28:48.720 | - That's right.
00:28:49.560 | - There's that cohesiveness.
00:28:52.280 | So can you maybe give me, whatever I missed,
00:28:56.120 | a bigger overview of the mission of the ecosystem
00:28:58.840 | that's trying to be built and where is it moving forward?
00:29:02.080 | - Yeah.
00:29:02.920 | So in short, the way I like to think of this is
00:29:06.760 | our goal is to enable machine learning.
00:29:09.720 | And in a couple of ways.
00:29:11.680 | One is we have lots of exciting things going on in ML today.
00:29:16.520 | We started with deep learning,
00:29:17.520 | but we now support a bunch of other algorithms too.
00:29:21.400 | So one is to, on the research side,
00:29:23.800 | keep pushing on the state of the art.
00:29:26.040 | How do we enable researchers
00:29:27.240 | to build the next amazing thing?
00:29:28.960 | So BERT came out recently.
00:29:31.760 | It's great that people are able to do new kinds of research.
00:29:33.960 | There are lots of amazing research
00:29:35.400 | that happens across the world.
00:29:37.520 | So that's one direction.
00:29:38.840 | The other is, how do you take that across
00:29:42.480 | all the people outside who want to take that research
00:29:45.200 | and do some great things with it
00:29:46.640 | and integrate it to build real products,
00:29:48.640 | to have a real impact on people?
00:29:51.800 | And so that's the other axis in some ways.
00:29:55.040 | At a high level, one way I think about it is
00:29:59.640 | there are a crazy number of compute devices across the world.
00:30:04.240 | And we often used to think of ML and training
00:30:07.920 | and all of this as, okay,
00:30:08.920 | something you do either in a workstation
00:30:10.840 | or the data center or cloud.
00:30:12.600 | But we see things running on the phones.
00:30:15.720 | We see things running on really tiny chips.
00:30:17.680 | I mean, we had some demos of the developer summit.
00:30:20.760 | And so the way I think about this ecosystem is,
00:30:25.760 | how do we help get machine learning on every device
00:30:29.960 | that has a compute capability?
00:30:32.560 | And that continues to grow.
00:30:33.800 | And so in some ways, this ecosystem has looked at
00:30:38.720 | various aspects of that and grown over time
00:30:41.160 | to cover more of those.
00:30:42.480 | And we continue to push the boundaries.
00:30:44.680 | In some areas, we've built more tooling
00:30:48.200 | and things around that to help you.
00:30:50.040 | I mean, the first tool we started was TensorBoard.
00:30:52.800 | if you want to look at just the training piece.
00:30:55.000 | TFX or TensorFlow Extended
00:30:58.120 | to really do your entire ML pipelines
00:31:00.440 | if you care about all that production stuff.
00:31:03.920 | But then going to the edge,
00:31:06.640 | going to different kinds of things.
00:31:09.520 | And it's not just us now.
00:31:11.840 | We're at a place where there are lots of libraries
00:31:14.480 | being built on top.
00:31:15.840 | So there are some for research,
00:31:17.800 | maybe things like TensorFlow Agents
00:31:20.080 | or TensorFlow Probability that started as research things
00:31:22.480 | or for researchers for focusing
00:31:24.240 | on certain kinds of algorithms.
00:31:26.160 | But they're also being deployed
00:31:27.320 | or used by production folks.
00:31:30.280 | And some have come from within Google,
00:31:33.360 | just teams across Google
00:31:34.760 | who wanted to build these things.
00:31:37.040 | Others have come from just the community
00:31:39.720 | because there are different pieces
00:31:41.840 | that different parts of the community care about.
00:31:44.640 | And I see our goal as enabling even that.
00:31:49.640 | We cannot and won't build every single thing.
00:31:53.280 | That just doesn't make sense.
00:31:54.880 | But if we can enable others
00:31:56.560 | to build the things that they care about,
00:31:58.360 | and there's a broader community that cares about that,
00:32:01.480 | and we can help encourage that,
00:32:02.920 | and that's great.
00:32:05.320 | That really helps the entire ecosystem, not just those.
00:32:08.640 | One of the big things about 2.0 that we're pushing on is,
00:32:11.880 | okay, we have these so many different pieces, right?
00:32:14.680 | How do we help make all of them work well together?
00:32:18.360 | So there are a few key pieces there that we're pushing on,
00:32:22.000 | one being the core format in there
00:32:23.920 | and how we share the models themselves
00:32:26.640 | through SavedModel and TensorFlow Hub and so on.
00:32:29.600 | And a few of the pieces that we really put this together.
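
A hedged sketch of that shared-format idea: the same SavedModel exported once can be reloaded for serving or reuse, and converted for mobile with the TensorFlow Lite converter. The tiny model and the "my_model" path are placeholders.

```python
import tensorflow as tf

# A trivial placeholder model; in practice this would be your trained model.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

# Export once in the common SavedModel format...
tf.saved_model.save(model, "my_model")

# ...reload it for serving or reuse...
reloaded = tf.saved_model.load("my_model")

# ...and convert the same artifact for on-device inference with TF Lite.
converter = tf.lite.TFLiteConverter.from_saved_model("my_model")
with open("my_model.tflite", "wb") as f:
    f.write(converter.convert())
```
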
00:32:34.040 | - I was very skeptical that that's,
00:32:35.920 | when TensorFlow.js came out,
00:32:37.320 | it didn't seem, or deeplearn.js.
00:32:40.200 | - Yeah, that was the first.
00:32:41.680 | - It seems like technically a very difficult project.
00:32:44.920 | As a standalone, it's not as difficult,
00:32:47.040 | but as a thing that integrates into the ecosystem,
00:32:50.000 | it seems very difficult.
00:32:51.280 | So, I mean, there's a lot of aspects of this
00:32:53.280 | you're making look easy, but,
00:32:54.720 | on the technical side,
00:32:57.240 | how many challenges have to be overcome here?
00:32:59.540 | - A lot.
00:33:01.560 | - And still have to be overcome.
00:33:03.120 | That's the question here, too.
00:33:04.960 | - There are lots of steps to it.
00:33:06.240 | I mean, we've iterated over the last few years,
00:33:08.080 | so there's a lot we've learned.
00:33:10.760 | I, yeah, often when things come together well,
00:33:14.280 | things look easy, and that's exactly the point.
00:33:16.480 | It should be easy for the end user,
00:33:18.400 | but there are lots of things that go behind that.
00:33:21.400 | If I think about still challenges ahead,
00:33:25.400 | there are,
00:33:26.760 | you know, we have a lot more devices coming on board,
00:33:32.920 | for example, from the hardware perspective.
00:33:35.360 | How do we make it really easy for these vendors
00:33:37.680 | to integrate with something like TensorFlow, right?
00:33:41.280 | So there's a lot of compiler stuff
00:33:43.680 | that others are working on.
00:33:45.360 | There are things we can do in terms of our APIs
00:33:48.360 | and so on that we can do.
00:33:49.680 | As we, you know, TensorFlow started
00:33:53.960 | as a very monolithic system,
00:33:55.840 | and to some extent it still is.
00:33:57.680 | There are less, lots of tools around it,
00:33:59.440 | but the core is still pretty large and monolithic.
00:34:02.960 | One of the key challenges for us to scale that out
00:34:05.760 | is how do we break that apart with clearer interfaces?
00:34:10.400 | It's, you know, in some ways it's software engineering 101,
00:34:14.560 | but for a system that's now four years old, I guess, or more,
00:34:19.560 | and that's still rapidly evolving
00:34:21.640 | and that we're not slowing down with,
00:34:24.040 | it's hard to, you know, change and modify
00:34:26.840 | and really break apart.
00:34:28.280 | It's sort of like, as people say, right,
00:34:29.960 | it's like changing the engine with a car running
00:34:32.640 | or fix that, that's exactly what we're trying to do.
00:34:35.120 | - So there's a challenge here
00:34:37.600 | because the downside of so many people
00:34:41.600 | being excited about TensorFlow
00:34:43.880 | and coming to rely on it in many of their applications
00:34:48.640 | is that you're kind of responsible,
00:34:52.080 | like it's the technical debt.
00:34:53.560 | You're responsible for previous versions
00:34:55.720 | to some degree still working.
00:34:57.640 | So when you're trying to innovate,
00:34:59.960 | I mean, it's probably easier to just start from scratch
00:35:03.800 | every few months.
00:35:04.960 | (laughs)
00:35:05.840 | - Absolutely.
00:35:07.240 | - So do you feel the pain of that?
00:35:09.360 | A 2.0 does break some back compatibility,
00:35:14.320 | but not too much.
00:35:15.400 | It seems like the conversion is pretty straightforward.
00:35:18.160 | Do you think that's still important
00:35:20.280 | given how quickly deep learning is changing?
00:35:22.920 | Can you just, the things that you've learned,
00:35:26.400 | can you just start over or is there pressure to not?
00:35:29.320 | - It's a tricky balance.
00:35:31.600 | So if it was just a researcher writing a paper
00:35:36.360 | who a year later will not look at that code again,
00:35:39.440 | sure, it doesn't matter.
00:35:40.760 | There are a lot of production systems
00:35:43.480 | that rely on TensorFlow,
00:35:44.720 | both at Google and across the world.
00:35:47.280 | And people worry about this.
00:35:49.760 | I mean, these systems run for a long time.
00:35:52.440 | So it is important to keep that compatibility and so on.
00:35:57.280 | And yes, it does come with a huge cost.
00:35:59.760 | There's, we have to think about a lot of things
00:36:03.000 | as we do new things and make new changes.
00:36:05.840 | I think it's a trade-off, right?
00:36:09.160 | You can, you might slow certain kinds of things down,
00:36:13.040 | but the overall value you're bringing because of that
00:36:15.480 | is much bigger because it's not just about
00:36:18.640 | breaking the person yesterday,
00:36:20.600 | it's also about telling the person tomorrow
00:36:23.720 | that you know what, this is how we do things.
00:36:26.360 | We're not going to break you when you come on board
00:36:28.600 | because there are lots of new people
00:36:29.920 | who are also going to come on board.
00:36:31.680 | You know, one way I like to think about this,
00:36:34.720 | and I always push the team to think about as well,
00:36:38.000 | when you want to do new things,
00:36:39.600 | you want to start with a clean slate,
00:36:42.040 | design with a clean slate in mind.
00:36:44.920 | And then we'll figure out how to make sure
00:36:47.520 | all the other things work.
00:36:48.680 | And yes, we do make compromises occasionally,
00:36:51.320 | but unless you design with the clean slate
00:36:55.240 | and not worry about that,
00:36:56.560 | you'll never get to a good place.
00:36:58.400 | - That's brilliant.
00:36:59.240 | So even if you are responsible in the idea stage,
00:37:04.080 | when you're thinking of new,
00:37:05.800 | just put all that behind you.
00:37:07.720 | Okay, that's really well put.
00:37:09.600 | So I have to ask this because a lot of students,
00:37:12.000 | developers ask me,
00:37:13.240 | how I feel about PyTorch versus TensorFlow.
00:37:16.320 | So I've recently completely switched
00:37:18.280 | my research group to TensorFlow.
00:37:20.920 | I wish everybody would just use the same thing,
00:37:23.280 | and TensorFlow is as close to that, I believe, as we have.
00:37:26.960 | But do you enjoy competition?
00:37:31.000 | So TensorFlow is leading in many ways,
00:37:34.320 | on many dimensions in terms of ecosystem,
00:37:36.760 | in terms of the number of users,
00:37:39.040 | momentum, power, production level, so on.
00:37:41.200 | But a lot of researchers are now also using PyTorch.
00:37:46.000 | Do you enjoy that kind of competition
00:37:47.520 | or do you just ignore it and focus on
00:37:49.800 | making TensorFlow the best that it can be?
00:37:52.320 | - So just like research or anything people are doing,
00:37:55.480 | it's great to get different kinds of ideas.
00:37:58.120 | And when we started with TensorFlow,
00:38:01.480 | like I was saying earlier,
00:38:03.320 | one, it was very important for us
00:38:05.560 | to also have production in mind.
00:38:07.440 | We didn't want just research, right?
00:38:09.000 | And that's why we chose certain things.
00:38:11.320 | Now PyTorch came along and said,
00:38:12.840 | you know what, I only care about research.
00:38:14.880 | This is what I'm trying to do.
00:38:16.320 | What's the best thing I can do for this?
00:38:18.400 | And it started iterating and said,
00:38:20.880 | okay, I don't need to worry about graphs.
00:38:22.560 | Let me just run things.
00:38:25.200 | I don't care if it's not as fast as it can be,
00:38:27.440 | but let me just make this part easy.
00:38:30.520 | And there are things you can learn from that, right?
00:38:32.600 | They again had the benefit of seeing what had come before,
00:38:36.800 | but also exploring certain different kinds of spaces.
00:38:40.560 | And they had some good things there,
00:38:43.600 | building on, say, things like Chainer and so on before that.
00:38:46.720 | So competition is definitely interesting.
00:38:49.360 | It made us, this is an area that we had thought about,
00:38:51.920 | like I said, very early on.
00:38:53.760 | Over time we had revisited this a couple of times,
00:38:56.640 | should we add this again?
00:38:59.040 | At some point we said, you know what,
00:39:01.080 | it seems like this can be done well,
00:39:02.920 | so let's try it again.
00:39:04.320 | And that's how we started pushing on eager execution.
00:39:07.720 | How do we combine those two together?
00:39:09.920 | Which has finally come very well together in 2.0,
00:39:13.160 | but it took us a while to get all the things together
00:39:15.760 | and so on.
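
For a flavor of what that combination looks like in 2.0, here is a small sketch: operations run eagerly like ordinary Python, and wrapping a function in tf.function traces it into a graph for the performance and deployment path. The values and the dense_layer helper are made up for illustration.

```python
import tensorflow as tf

# Eager by default: this runs immediately and prints a concrete value.
x = tf.constant([[2.0, 3.0]])
w = tf.constant([[1.0], [4.0]])
print(tf.matmul(x, w))  # tf.Tensor([[14.]], shape=(1, 1), dtype=float32)

@tf.function  # traced into a graph on first call, so it can be optimized and deployed
def dense_layer(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

b = tf.constant([0.5])
print(dense_layer(x, w, b))  # same style of call, now backed by a graph
```
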
00:39:16.600 | - So let me ask, put another way,
00:39:19.360 | I think eager execution is a really powerful thing
00:39:21.880 | that was added.
00:39:22.720 | Do you think it wouldn't have been,
00:39:24.400 | you know, Muhammad Ali versus Frazier, right?
00:39:28.440 | Do you think it wouldn't have been added as quickly
00:39:31.240 | if PyTorch wasn't there?
00:39:33.800 | - It might have taken longer.
00:39:35.800 | Yeah, it was, I mean, we had tried some variants
00:39:38.240 | of that before, so I'm sure it would have happened,
00:39:40.960 | but it might have taken longer.
00:39:42.280 | - I'm grateful that TensorFlow is following
00:39:44.240 | the way they did.
00:39:45.080 | It's doing some incredible work last couple of years.
00:39:47.800 | What other things that we didn't talk about
00:39:49.640 | are you looking forward in 2.0?
00:39:52.640 | That comes to mind.
00:39:54.040 | So we talked about some of the ecosystem stuff,
00:39:56.520 | making it easily accessible through Keras,
00:40:00.000 | eager execution.
00:40:01.440 | Is there other things that we missed?
00:40:02.840 | - Yeah, so I would say one is just where 2.0 is,
00:40:07.520 | and you know, with all the things that we've talked about.
00:40:10.760 | I think as we think beyond that,
00:40:13.760 | there are lots of other things that it enables us to do
00:40:16.600 | and that we're excited about.
00:40:18.760 | So what it's setting us up for,
00:40:20.720 | okay, here are these really clean APIs.
00:40:22.520 | We've cleaned up the surface for what the users want.
00:40:25.640 | What it also allows us to do a whole bunch of stuff
00:40:28.320 | behind the scenes once we are ready with 2.0.
00:40:31.600 | So for example, in TensorFlow with graphs
00:40:36.600 | and all the things you could do,
00:40:37.720 | you could always get a lot of good performance
00:40:40.600 | if you spent the time to tune it, right?
00:40:43.280 | And we've clearly shown that, lots of people do that.
00:40:48.720 | With 2.0, with these APIs,
00:40:52.040 | where we are, we can give you a lot of performance
00:40:55.120 | just with whatever you do.
00:40:56.600 | Because we see it's much cleaner,
00:41:01.400 | we know most people are gonna do things this way.
00:41:03.720 | We can really optimize for that
00:41:05.520 | and get a lot of those things out of the box.
00:41:09.040 | And it really allows us, both for single machine
00:41:11.920 | and distributed and so on,
00:41:13.880 | to really explore other spaces behind the scenes
00:41:17.200 | after 2.0 in the future versions as well.
00:41:19.720 | So right now the team's really excited about that.
00:41:23.000 | That over time, I think we'll see that.
00:41:25.840 | The other piece that I was talking about
00:41:27.760 | in terms of just restructuring the monolithic thing
00:41:31.640 | into more pieces and making it more modular,
00:41:34.360 | I think that's gonna be really important
00:41:36.800 | for a lot of the other people in the ecosystem
00:41:41.800 | or the organizations and so on that wanted to build things.
00:41:44.840 | - Can you elaborate a little bit what you mean
00:41:46.400 | by making TensorFlow ecosystem more modular?
00:41:50.720 | - So the way it's organized today is there's one,
00:41:55.040 | there are lots of repositories
00:41:56.320 | in the TensorFlow organization at GitHub.
00:41:58.360 | The core one where we have TensorFlow,
00:42:01.120 | it has the execution engine,
00:42:04.120 | it has the key backends for CPUs and GPUs,
00:42:08.320 | it has the work to do distributed stuff.
00:42:12.600 | And all of these just work together
00:42:14.440 | in a single library or binary.
00:42:17.280 | There's no way to split them apart easily.
00:42:18.840 | I mean, there are some interfaces,
00:42:20.000 | but they're not very clean.
00:42:21.640 | In a perfect world, you would have clean interfaces
00:42:23.960 | where, okay, I wanna run it on my fancy cluster
00:42:27.760 | with some custom networking,
00:42:29.400 | just implement this and do that.
00:42:31.000 | I mean, we kind of support that,
00:42:32.680 | but it's hard for people today.
00:42:34.640 | I think as we are starting to see more interesting things
00:42:38.200 | in some of these spaces,
00:42:39.480 | having that clean separation will really start to help.
00:42:43.360 | And again, going to the large size of the ecosystem
00:42:47.400 | and the different groups involved there,
00:42:50.200 | enabling people to evolve
00:42:52.600 | and push on things more independently
00:42:54.400 | just allows it to scale better.
00:42:56.080 | - And by people, you mean individual developers and--
00:42:59.120 | - And organizations.
00:42:59.960 | - And organizations.
00:43:01.840 | So the hope is that everybody sort of major,
00:43:04.280 | I don't know, Pepsi or something uses,
00:43:06.920 | like major corporations go to TensorFlow to this kind of--
00:43:11.080 | - Yeah, if you look at enterprise like Pepsi or these,
00:43:13.680 | I mean, a lot of them are already using TensorFlow.
00:43:15.840 | They are not the ones that do the development
00:43:18.960 | or changes in the core.
00:43:20.400 | Some of them do, but a lot of them don't.
00:43:21.960 | I mean, they touch small pieces.
00:43:23.760 | There are lots of these,
00:43:25.680 | some of them being, let's say, hardware vendors
00:43:27.680 | who are building their custom hardware
00:43:29.000 | and they want their own pieces.
00:43:30.880 | Or some of them being bigger companies, say IBM.
00:43:34.200 | I mean, they're involved in some of our
00:43:36.520 | special interest groups,
00:43:38.160 | and they see a lot of users who want certain things
00:43:41.040 | and they want to optimize for that.
00:43:42.640 | So folks like that often.
00:43:44.480 | - Autonomous vehicle companies, perhaps.
00:43:46.360 | - Exactly, yes.
00:43:48.200 | - So yeah, like I mentioned,
00:43:50.040 | TensorFlow has been downloaded 41 million times,
00:43:52.800 | 50,000 commits, almost 10,000 pull requests,
00:43:56.520 | 1,800 contributors.
00:43:58.360 | So I'm not sure if you can explain it,
00:44:02.160 | but what does it take to build a community like that?
00:44:06.840 | In retrospect, what do you think,
00:44:09.200 | what is the critical thing that allowed
00:44:11.240 | for this growth to happen,
00:44:12.680 | and how does that growth continue?
00:44:14.640 | - Yeah, yeah, that's an interesting question.
00:44:17.960 | I wish I had all the answers there, I guess,
00:44:20.320 | so we could replicate it.
00:44:21.640 | I think there are a number of things
00:44:25.600 | that need to come together, right?
00:44:27.920 | One, just like any new thing,
00:44:32.520 | it is about, there's a sweet spot of timing,
00:44:35.960 | what's needed, does it grow with
00:44:38.920 | what's needed, so in this case, for example,
00:44:41.640 | TensorFlow has not just grown because it was a good tool,
00:44:43.680 | it's also grown with the growth of deep learning itself.
00:44:46.720 | So those factors come into play.
00:44:49.040 | Other than that, though,
00:44:50.360 | I think just hearing, listening to the community,
00:44:55.240 | what they need, being open to,
00:44:58.440 | like in terms of external contributions,
00:45:01.120 | we've spent a lot of time in making sure
00:45:04.560 | we can accept those contributions well,
00:45:06.880 | we can help the contributors in adding those,
00:45:09.520 | putting the right process in place,
00:45:11.360 | getting the right kind of community,
00:45:13.400 | welcoming them and so on.
00:45:15.200 | Like over the last year,
00:45:17.160 | we've really pushed on transparency,
00:45:19.080 | that that's important for an open source project.
00:45:22.320 | People want to know where things are going,
00:45:23.840 | and we're like, okay, here's a process
00:45:26.240 | where you can do that, here are our RFCs and so on.
00:45:29.400 | So thinking through, there are lots of community aspects
00:45:32.960 | that come into that you can really work on.
00:45:36.520 | As a small project, it's maybe easy to do
00:45:38.760 | because there's like two developers and you can do those.
00:45:42.200 | As you grow, putting more of these processes in place,
00:45:47.000 | thinking about the documentation,
00:45:49.160 | thinking about what do developers care about,
00:45:51.960 | what kind of tools would they want to use?
00:45:55.200 | All of these come into play, I think.
00:45:56.920 | - So one of the big things I think
00:45:58.480 | that feeds the TensorFlow fire
00:46:00.720 | is people building something on TensorFlow.
00:46:04.000 | And implement a particular architecture
00:46:07.720 | that does something cool and useful.
00:46:09.520 | And they put that on GitHub.
00:46:11.120 | And so it just feeds this growth.
00:46:15.600 | Do you have a sense that with 2.0 and 1.0
00:46:19.600 | that there may be a little bit of a partitioning
00:46:21.560 | like there is with Python 2 and 3,
00:46:24.080 | that there'll be code bases
00:46:26.040 | in the older versions of TensorFlow
00:46:28.360 | that will not be easily compatible?
00:46:31.120 | Or are you pretty confident that this kind of conversion
00:46:35.560 | is pretty natural and easy to do?
00:46:37.920 | - So we're definitely working hard
00:46:39.920 | to make that very easy to do.
00:46:41.440 | There's lots of tooling that we talked about
00:46:43.440 | at the developer summit this week.
00:46:45.760 | And we continue to invest in that tooling.
00:46:48.280 | It's, you know, when you think of these
00:46:50.920 | significant version changes, that's always a risk.
00:46:53.520 | And we are really pushing hard
00:46:55.720 | to make that transition very, very smooth.
00:46:59.200 | I think, so at some level, people want to move
00:47:03.640 | when they see the value in the new thing.
00:47:05.600 | They don't want to move just because it's a new thing.
00:47:07.680 | And some people do,
00:47:08.520 | but most people want a really good thing.
00:47:11.480 | And I think over the next few months
00:47:13.800 | as people start to see the value,
00:47:15.400 | we'll definitely see that shift happening.
00:47:17.640 | So I'm pretty excited and confident
00:47:19.720 | that we'll see people moving.
00:47:21.640 | As you said earlier, this field is also moving rapidly.
00:47:24.680 | So that'll help because we can do more things.
00:47:26.760 | And, you know, all the new things
00:47:28.080 | will clearly happen in 2.X.
00:47:29.520 | So people will have lots of good reasons to move.
00:47:32.280 | - So what do you think TensorFlow 3.0 looks like?
00:47:36.160 | Is that, is there, are things happening so crazy
00:47:40.360 | that even at the end of this year
00:47:42.520 | it seems impossible to plan for?
00:47:44.320 | Or is it possible to plan for the next five years?
00:47:48.600 | - I think it's tricky.
00:47:50.840 | There are some things that we can expect
00:47:54.560 | in terms of, okay, change, yes, change is going to happen.
00:47:58.040 | (laughing)
00:47:59.760 | Are there some things going to stick around
00:48:01.720 | and some things not going to stick around?
00:48:03.760 | I would say the basics of deep learning,
00:48:08.200 | the, you know, say convolution models
00:48:10.440 | or the basic kind of things,
00:48:12.760 | they'll probably be around in some form still in five years.
00:48:16.360 | Will RL and GANs stay?
00:48:18.640 | Very likely based on where they are.
00:48:21.240 | Will we have new things?
00:48:22.880 | Probably, but those are hard to predict.
00:48:24.720 | And some, directionally, some things that we can see is,
00:48:29.720 | you know, and things that we're starting to do, right,
00:48:32.800 | with some of our projects right now
00:48:35.480 | is just 2.0 combining eager execution and graphs
00:48:39.160 | where we're starting to make it more like
00:48:41.520 | just your natural programming language.
00:48:43.200 | You're not trying to program something else.
00:48:45.720 | Similarly with Swift for TensorFlow,
00:48:47.280 | we're taking that approach.
00:48:48.320 | Can you do something ground up, right?
00:48:50.080 | So some of those ideas seem like, okay,
00:48:52.160 | that's the right direction.
00:48:54.120 | In five years, we expect to see more in that area.
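As a minimal sketch of that eager-plus-graph direction, assuming TensorFlow 2.x: you write ordinary Python, and wrapping it in tf.function traces it into a graph, so you are not programming a separate graph language anymore.

```python
import tensorflow as tf

@tf.function  # traced into a graph; drop the decorator and it still runs eagerly
def train_step(w, x, y):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * x - y) ** 2)
    grad = tape.gradient(loss, w)
    w.assign_sub(0.1 * grad)  # one step of plain gradient descent
    return loss

w = tf.Variable(0.0)
print(train_step(w, tf.constant(2.0), tf.constant(4.0)).numpy())
```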
00:48:57.160 | Other things we don't know is,
00:49:00.120 | will hardware accelerators be the same?
00:49:03.240 | Will we be able to train with four bits instead of 32 bits?
00:49:08.240 | - And I think the TPU side of things is exploring that.
00:49:11.520 | I mean, TPU's already on version three.
00:49:14.000 | It seems that the evolution of TPU and TensorFlow
00:49:17.560 | are sort of, they're co-evolving almost.
00:49:22.160 | In terms of both are learning from each other
00:49:24.760 | and from the community and from the applications
00:49:27.960 | where the biggest benefit is achieved.
00:49:29.720 | - That's right.
00:49:30.560 | - You've been trying to sort of,
00:49:32.320 | with eager with Keras to make TensorFlow
00:49:34.280 | as accessible and easy to use as possible.
00:49:36.480 | What do you think for beginners
00:49:38.040 | is the biggest thing they struggle with?
00:49:40.000 | Have you encountered that?
00:49:42.080 | Or is basically what Keras is solving
00:49:44.280 | is that eager, like we talked about?
00:49:47.400 | - Yeah, for some of them, like you said, right,
00:49:50.920 | beginners want to just be able to take some image model,
00:49:54.880 | they don't care if it's Inception or ResNet
00:49:57.040 | or something else, and do some training
00:49:59.560 | or transfer learning on their kind of model.
00:50:02.480 | Being able to make that easy is important.
00:50:04.440 | So in some ways, if you do that by providing them
00:50:08.600 | simple models, say, in Hub or so on,
00:50:11.400 | they don't care about what's inside that box,
00:50:13.720 | but they want to be able to use it.
00:50:15.160 | So we're pushing on, I think, different levels.
00:50:17.640 | If you look at just a component that you get
00:50:19.960 | which has the layers already smooshed in,
00:50:22.760 | the beginners probably just want that.
00:50:25.200 | Then the next step is, okay,
00:50:26.720 | look at building layers with Keras.
00:50:29.040 | If you go out to research,
00:50:30.240 | then they are probably writing custom layers themselves
00:50:33.120 | or doing their own loops.
00:50:34.360 | So there's a whole spectrum there.
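The beginner end of that spectrum looks roughly like the sketch below, assuming TensorFlow 2.x and the tensorflow_hub package; the Hub URL is just an example feature-vector module, treated as a black box, with only a small new head trained on top.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Pre-trained image features from TF Hub; the user doesn't need to know
# what architecture is inside the box.
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
    trainable=False,  # freeze the pre-trained weights for transfer learning
)

model = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
    feature_extractor,
    tf.keras.layers.Dense(5, activation="softmax"),  # e.g. 5 new classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```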
00:50:36.320 | - And then providing the pre-trained models
00:50:38.640 | seems to really decrease the time from you trying to start.
00:50:43.640 | So you could basically in a Colab notebook
00:50:46.800 | achieve what you need.
00:50:49.080 | So I'm basically answering my own question
00:50:51.240 | because I think what TensorFlow delivered on recently
00:50:54.240 | is trivial for beginners.
00:50:56.920 | So I was just wondering if there was other pain points
00:51:00.720 | you're trying to ease, but I'm not sure there would be.
00:51:02.480 | - No, those are probably the big ones.
00:51:04.200 | I mean, I see high schoolers doing a whole bunch of things
00:51:07.040 | now, which is pretty amazing.
00:51:08.800 | - It's both amazing and terrifying.
00:51:10.680 | - Yes.
00:51:12.640 | - In a sense that when they grow up,
00:51:15.840 | some incredible ideas will be coming from them.
00:51:19.240 | So there's certainly a technical aspect to your work,
00:51:21.840 | but you also have a management aspect to your role
00:51:25.200 | with TensorFlow leading the project,
00:51:27.960 | a large number of developers and people.
00:51:31.080 | So what do you look for in a good team?
00:51:34.680 | What do you think?
00:51:35.960 | You know, Google has been at the forefront of exploring
00:51:38.400 | what it takes to build a good team.
00:51:40.440 | And TensorFlow is one of the most cutting edge technologies
00:51:45.520 | in the world.
00:51:46.360 | So in this context, what do you think makes for a good team?
00:51:49.360 | - It's definitely something I think a fair bit about.
00:51:53.160 | I think in terms of the team being able to deliver
00:51:58.160 | something well, one of the things that's important
00:52:02.640 | is a cohesion across the team.
00:52:05.800 | So being able to execute together in doing things.
00:52:10.440 | It's not an, like at this scale,
00:52:13.160 | an individual engineer can only do so much.
00:52:15.440 | There's a lot more that they can do together,
00:52:18.240 | even though we have some amazing superstars across Google
00:52:21.760 | and in the team, but there's, you know,
00:52:25.120 | often the way I see it is that the product
00:52:27.360 | of what the team generates is way larger than
00:52:30.520 | the sum of, you know, the individuals put together.
00:52:34.440 | And so how do we have all of them work together,
00:52:37.320 | the culture of the team itself.
00:52:40.040 | Hiring good people is important,
00:52:43.080 | but part of that is it's not just that, okay,
00:52:45.680 | we hire a bunch of smart people and throw them together
00:52:48.160 | and let them do things.
00:52:49.760 | It's also people have to care about what they're building.
00:52:52.960 | People have to be motivated for the right kind of things.
00:52:57.400 | That's often an important factor.
00:52:59.840 | And, you know, finally, how do you put that together
00:53:04.640 | with a somewhat unified vision of where we want to go?
00:53:08.840 | So are we all looking in the same direction
00:53:11.240 | or each of us going all over?
00:53:13.600 | And sometimes it's a mix.
00:53:16.120 | Google's a very bottom-up organization in some sense,
00:53:20.560 | also research even more so, and that's how we started.
00:53:26.400 | But as we've become this larger product and ecosystem,
00:53:30.920 | I think it's also important to combine that well
00:53:33.200 | with a mix of, okay, here's the direction we want to go in.
00:53:38.000 | There is exploration we'll do around that,
00:53:39.880 | but let's keep staying in that direction,
00:53:42.840 | not just all over the place.
00:53:44.480 | - And is there a way you monitor the health of the team?
00:53:46.880 | Sort of like, is there a way you know you did a good job?
00:53:51.880 | The team is good?
00:53:53.080 | Like, I mean, you're sort of, you're saying nice things,
00:53:56.240 | but it's sometimes difficult to determine how aligned.
00:54:00.880 | - Yes. - 'Cause it's not binary.
00:54:02.160 | It's not like, there's tensions and complexities and so on.
00:54:06.760 | And the other element of this is the notion of superstars.
00:54:09.240 | You know, there's so much, even at Google,
00:54:11.800 | such a large percentage of work is done
00:54:13.560 | by individual superstars too.
00:54:16.000 | So there's a, and sometimes those superstars
00:54:19.960 | can be against the dynamic of a team and those tensions.
00:54:23.280 | I mean, I'm sure in TensorFlow it might be
00:54:26.600 | a little bit easier because the mission of the project
00:54:28.880 | is so sort of beautiful.
00:54:31.760 | You're at the cutting edge, so it's exciting.
00:54:33.720 | - Yep.
00:54:34.840 | - But have you struggled with that?
00:54:36.720 | Has there been challenges?
00:54:38.360 | - There are always people challenges
00:54:39.880 | in different kinds of ways.
00:54:41.280 | That said, I think we've been,
00:54:43.040 | what's good about getting people who care
00:54:46.880 | and are, you know, have the same kind of culture,
00:54:50.440 | and that's Google in general to a large extent.
00:54:53.480 | But also, like you said, given that the project
00:54:56.120 | has had so many exciting things to do,
00:54:58.800 | there's been room for lots of people
00:55:00.760 | to do different kinds of things and grow,
00:55:02.440 | which does make the problem a bit easier, I guess.
00:55:06.480 | And it allows people, depending on what they're doing,
00:55:09.920 | if there's room around them, then that's fine.
00:55:12.320 | But yes, we do care about whether a superstar or not
00:55:18.160 | that they need to work well with the team across Google.
00:55:22.600 | - That's interesting to hear.
00:55:23.680 | So it's like superstar or not, the productivity broadly
00:55:28.000 | is about the team.
00:55:30.560 | - Yeah, yeah.
00:55:31.520 | I mean, they might add a lot of value,
00:55:33.000 | but if they're hurting the team, then that's a problem.
00:55:35.760 | - So in hiring engineers, it's so interesting, right?
00:55:39.080 | The hiring process, what do you look for?
00:55:41.880 | How do you determine a good developer
00:55:44.320 | or a good member of a team
00:55:46.280 | from just a few minutes or hours together?
00:55:48.620 | (chuckles)
00:55:50.320 | Again, no magic answers, I'm sure.
00:55:52.000 | - Yeah, yeah.
00:55:52.840 | I mean, Google has a hiring process
00:55:55.400 | that we've refined over the last 20 years, I guess,
00:55:58.840 | and that you've probably heard and seen a lot about.
00:56:02.280 | So we do work with the same hiring process,
00:56:05.040 | and that's really helped.
00:56:06.480 | For me in particular, I would say,
00:56:10.920 | in addition to the core technical skills,
00:56:14.240 | what does matter is their motivation in what they want to do
00:56:19.240 | because if that doesn't align well with where we want to go,
00:56:23.000 | that's not going to lead to long-term success
00:56:25.360 | for either them or the team.
00:56:27.320 | And I think that becomes more important
00:56:30.040 | the more senior the person is,
00:56:31.480 | but it's important at every level.
00:56:33.600 | Like even the junior most engineer,
00:56:34.960 | if they're not motivated to do well
00:56:36.400 | at what they're trying to do,
00:56:37.720 | however smart they are,
00:56:38.840 | it's going to be hard for them to succeed.
00:56:40.360 | - Does the Google hiring process touch on that passion?
00:56:44.540 | So like trying to determine,
00:56:46.480 | 'cause I think as far as I understand,
00:56:48.480 | maybe you can speak to it,
00:56:49.600 | that the Google hiring process sort of helps,
00:56:53.360 | in the initial, like determines the skill set there,
00:56:56.400 | is your puzzle-solving ability,
00:56:57.920 | problem-solving ability good?
00:56:59.920 | But like, I'm not sure,
00:57:02.540 | but it seems that determining whether the person
00:57:05.600 | has, like, fire inside them,
00:57:07.600 | that burns to do anything really,
00:57:09.040 | it doesn't really matter,
00:57:09.880 | it's just some cool stuff, I'm going to do it.
00:57:12.500 | That, I don't know,
00:57:15.320 | is that something that ultimately ends up
00:57:17.280 | when they have a conversation with you
00:57:18.840 | or once it gets closer to the team?
00:57:22.640 | - So one of the things we do have as part of the process
00:57:25.440 | is just a culture fit,
00:57:27.160 | like part of the interview process itself,
00:57:29.200 | in addition to just the technical skills,
00:57:31.040 | and each engineer or whoever the interviewer is,
00:57:34.280 | is supposed to rate the person on the culture
00:57:38.360 | and the culture fit with Google and so on.
00:57:40.000 | So that is definitely part of the process.
00:57:42.200 | Now, there are various kinds of projects
00:57:45.880 | and different kinds of things,
00:57:46.920 | so there might be variants
00:57:48.820 | in the kind of culture you want there and so on,
00:57:51.400 | and yes, that does vary.
00:57:52.760 | So for example,
00:57:54.000 | TensorFlow's always been a fast-moving project,
00:57:56.960 | and we want people who are comfortable with that.
00:57:59.420 | But at the same time now, for example,
00:58:02.680 | we are at a place where we are also a very full-fledged
00:58:04.820 | product and we want to make sure things that work
00:58:07.800 | really, really work, right?
00:58:09.320 | You can't cut corners all the time.
00:58:11.680 | So balancing that out and finding the people
00:58:14.320 | who are the right fit for those is important.
00:58:17.600 | And I think those kind of things do vary a bit
00:58:19.720 | across projects and teams and product areas across Google,
00:58:23.200 | and so you'll see some differences there
00:58:25.240 | in the final checklist.
00:58:27.720 | But a lot of the core culture,
00:58:29.400 | it comes along with just engineering excellence and so on.
00:58:32.820 | - What is the hardest part of your job?
00:58:37.600 | Take your pick, I guess.
00:58:41.040 | - It's fun, I would say, right?
00:58:44.440 | Hard, yes.
00:58:45.560 | I mean, lots of things at different times.
00:58:47.280 | I think that does vary.
00:58:49.200 | - So let me clarify that difficult things are fun
00:58:52.680 | when you solve them, right?
00:58:53.980 | (laughing)
00:58:55.440 | - Yes.
00:58:56.280 | - It's fun in that sense.
00:58:57.520 | - I think the key to a successful thing across the board,
00:59:02.520 | and in this case, it's a large ecosystem now,
00:59:05.360 | but even a small product, is striking that fine balance
00:59:09.840 | across different aspects of it.
00:59:12.040 | Sometimes it's how fast you go versus how perfect it is.
00:59:17.040 | Sometimes it's how do you involve this huge community?
00:59:21.480 | Who do you involve?
00:59:22.440 | Or do you decide, okay, now is not a good time
00:59:24.820 | to involve them because it's not the right fit?
00:59:28.640 | And sometimes it's saying no to certain kinds of things.
00:59:33.680 | Those are often the hard decisions.
00:59:35.760 | Some of them you make quickly
00:59:39.620 | because you don't have the time.
00:59:41.040 | Some of them you get time to think about them,
00:59:43.240 | but they're always hard.
00:59:44.520 | - So when both choices are pretty good,
00:59:46.880 | it's those decisions.
00:59:49.200 | What about deadlines?
00:59:50.400 | Is this, do you find TensorFlow to be driven
00:59:54.840 | by deadlines to a degree that a product might?
01:00:00.400 | Or is there still a balance to where,
01:00:03.920 | I mean, it's less deadline-driven.
01:00:04.920 | You had the Dev Summit, that came together incredibly.
01:00:08.920 | Looked like there's a lot of moving pieces and so on.
01:00:11.480 | So did that deadline make people rise to the occasion,
01:00:15.120 | releasing TensorFlow 2.0 Alpha?
01:00:18.040 | - Yeah.
01:00:18.880 | - I think that was done last minute as well.
01:00:20.400 | I mean, like up to the last point.
01:00:25.400 | - Again, it's one of those things that's,
01:00:28.440 | you need to strike the good balance.
01:00:29.920 | There's some value that deadlines bring
01:00:32.080 | that does bring a sense of urgency
01:00:33.960 | to get the right things together
01:00:35.760 | instead of getting the perfect thing out.
01:00:38.320 | You need something that's good and works well.
01:00:41.320 | And the team definitely did a great job
01:00:43.280 | in putting that together.
01:00:44.120 | So I was very amazed and excited by everything,
01:00:46.600 | how that came together.
01:00:48.760 | That said, across the year,
01:00:49.880 | we try not to put out official deadlines.
01:00:52.560 | We focus on key things that are important,
01:00:57.000 | figure out how much of it's important.
01:01:00.640 | And we are developing in the open,
01:01:03.920 | both internally and externally,
01:01:05.800 | everything's available to everybody.
01:01:07.960 | So you can pick and look at where things are.
01:01:11.240 | We do releases at a regular cadence.
01:01:13.240 | So fine, if something doesn't necessarily end up
01:01:16.080 | at this month, it'll end up in the next release
01:01:17.800 | in a month or two.
01:01:18.760 | And that's okay, but we want to get,
01:01:22.200 | like keep moving as fast as we can in these different areas.
01:01:25.200 | Because we can iterate and improve on things.
01:01:29.680 | Sometimes it's okay to put things out
01:01:32.000 | that aren't fully ready.
01:01:32.960 | We'll make sure it's clear that, okay,
01:01:34.640 | this is experimental, but it's out there
01:01:36.560 | if you want to try and give feedback.
01:01:38.000 | That's very, very useful.
01:01:39.440 | I think that quick cycle and quick iteration is important.
01:01:42.580 | That's what we often focus on
01:01:46.160 | rather than here's a deadline where you get everything else.
01:01:49.240 | - Is 2.0, is there pressure to make that stable?
01:01:52.880 | Or like, for example, WordPress 5.0 just came out
01:01:56.680 | and there was no pressure to,
01:01:59.920 | it was a lot of build updates,
01:02:01.800 | it was delivered way too late.
01:02:04.000 | And they said, okay, well,
01:02:06.000 | we're going to release a lot of updates
01:02:07.480 | really quickly to improve it.
01:02:09.720 | Do you see TensorFlow 2.0 in that same kind of way?
01:02:12.280 | Or is there this pressure to once it hits 2.0,
01:02:15.280 | once you get to the release candidate
01:02:16.800 | and then you get to the final,
01:02:19.000 | that's going to be the stable thing?
01:02:22.520 | - So it's going to be stable, just like 1.x was,
01:02:26.960 | where every API that's there is going to remain and work.
01:02:31.140 | It doesn't mean we can't change things under the covers.
01:02:34.840 | It doesn't mean we can't add things.
01:02:36.780 | So there's still a lot more for us to do
01:02:39.240 | and we're going to need to have more releases.
01:02:41.120 | So in that sense, there's still,
01:02:42.680 | I don't think we'd be done in like two months
01:02:44.760 | when we release this.
01:02:46.200 | - I don't know if you can say,
01:02:47.600 | but is there, you know,
01:02:49.920 | there's not external deadlines for TensorFlow 2.0,
01:02:53.800 | but is there internal deadlines,
01:02:57.120 | the artificial or otherwise,
01:02:58.600 | that you're trying to set for yourself?
01:03:00.920 | Or is it whenever it's ready?
01:03:03.120 | - So we want it to be a great product, right?
01:03:05.680 | And that's a big, important piece for us.
01:03:08.880 | TensorFlow's already out there.
01:03:11.200 | We have, you know, 41 million downloads for 1.x.
01:03:13.760 | So it's not like we have to have this.
01:03:16.440 | Yeah, exactly.
01:03:17.280 | So it's not like,
01:03:18.520 | a lot of the features that we're, you know,
01:03:20.200 | really polishing and putting together are there.
01:03:23.600 | We don't have to rush that just because.
01:03:26.200 | So in that sense,
01:03:27.040 | we want to get it right and really focus on that.
01:03:29.940 | That said, we have said that we are looking to get this out
01:03:32.600 | in the next few months, in the next quarter.
01:03:34.480 | And we, you know, as far as possible,
01:03:37.100 | we'll definitely try to make that happen.
01:03:39.960 | - Yeah, my favorite line was,
01:03:41.880 | spring is a relative concept.
01:03:44.360 | I love it.
01:03:45.200 | - Yes.
01:03:46.040 | - Spoken like a true developer.
01:03:47.720 | So, you know, something I'm really interested in,
01:03:50.240 | and your previous line of work is,
01:03:53.000 | before TensorFlow, you led a team at Google on search ads.
01:03:56.680 | I think this is a very interesting topic,
01:04:01.880 | on every level, on a technical level.
01:04:04.040 | Because at their best,
01:04:06.120 | ads connect people to the things they want and need.
01:04:09.160 | - Yep.
01:04:10.120 | - And at their worst, they're just these things
01:04:12.320 | that annoy the heck out of you,
01:04:14.960 | to the point of ruining the entire user experience
01:04:17.360 | of whatever you're actually doing.
01:04:19.060 | So they have a bad rep, I guess.
01:04:22.200 | And on the other end,
01:04:26.240 | so that this connecting users to the thing they need and want
01:04:29.680 | is a beautiful opportunity for machine learning to shine.
01:04:34.080 | Like huge amounts of data that's personalized,
01:04:36.360 | and you kind of map to the thing
01:04:37.880 | they actually want, won't get annoyed.
01:04:40.440 | So what have you learned from this,
01:04:43.240 | Google that's leading the world in this aspect?
01:04:45.200 | What have you learned from that experience?
01:04:47.600 | And what do you think is the future of ads?
01:04:51.600 | Take you back to the--
01:04:52.560 | - Yeah.
01:04:53.400 | (laughing)
01:04:54.240 | Yes, it's been a while,
01:04:55.280 | but I totally agree with what you said.
01:04:58.400 | I think the search ads,
01:05:01.480 | the way it was always looked at,
01:05:03.240 | and I believe it still is,
01:05:05.280 | it's an extension of what search is trying to do.
01:05:08.280 | The goal is to make the information
01:05:10.600 | and make the world's information accessible.
01:05:14.720 | With ads, it's not just information,
01:05:17.120 | but it may be products or other things
01:05:19.160 | that people care about.
01:05:20.800 | And so it's really important for them to align
01:05:23.840 | with what the users need.
01:05:26.480 | And in search ads,
01:05:29.160 | there's a minimum quality level
01:05:30.960 | before that ad would be shown.
01:05:32.320 | If you don't have an ad that hits that quality bar,
01:05:34.040 | it will not be shown even if we have it,
01:05:36.000 | and okay, maybe we lose some money there, that's fine.
01:05:39.640 | That is really, really important.
01:05:41.280 | And I think that is something I really liked
01:05:43.440 | about being there.
01:05:45.080 | Advertising is a key part.
01:05:48.200 | I mean, as a model, it's been around for ages, right?
01:05:51.720 | It's not a new model.
01:05:53.440 | It's been adapted to the web
01:05:54.880 | and became a core part of search
01:05:57.480 | and many other search engines across the world.
01:06:02.160 | I do hope, like I said,
01:06:04.440 | there are aspects of ads that are annoying
01:06:06.720 | and I go to a website
01:06:08.000 | and if it just keeps popping an ad in my face,
01:06:11.120 | not to let me read, that's gonna be annoying clearly.
01:06:13.880 | So I hope we can strike that balance
01:06:18.800 | between showing a good ad
01:06:23.120 | where it's valuable to the user
01:06:25.120 | and provides the monetization to the service.
01:06:31.040 | And this might be search,
01:06:32.080 | this might be a website, all of these.
01:06:34.840 | They do need the monetization
01:06:36.960 | for them to provide that service.
01:06:39.680 | But if it's done in that good balance
01:06:42.480 | between showing just some random stuff that's distracting
01:06:47.480 | versus showing something that's actually valuable.
01:06:50.960 | - So do you see it moving forward
01:06:53.480 | as to continue being a model
01:06:57.560 | that funds businesses like Google,
01:07:01.000 | that's a significant revenue stream?
01:07:05.200 | 'Cause that's one of the most exciting things,
01:07:08.160 | but also limiting things in the internet
01:07:09.720 | is nobody wants to pay for anything.
01:07:12.240 | And advertisements, again, coupled at their best,
01:07:15.400 | are actually really useful and not annoying.
01:07:17.600 | Do you see that continuing and growing and improving
01:07:22.360 | or do you see more Netflix-type models
01:07:26.720 | where you have to start to pay for content?
01:07:29.000 | - I think it's a mix.
01:07:30.360 | I think it's gonna take a long while
01:07:32.280 | for everything to be paid on the internet, if at all.
01:07:35.360 | Probably not.
01:07:36.200 | I mean, I think there's always gonna be things
01:07:37.840 | that are sort of monetized with things like ads.
01:07:40.800 | But over the last few years, I would say,
01:07:42.800 | we've definitely seen that transition
01:07:44.800 | towards more paid services across the web
01:07:48.640 | and people are willing to pay for them
01:07:50.400 | because they do see the value.
01:07:51.720 | I mean, Netflix is a great example.
01:07:53.640 | I mean, we have YouTube doing things.
01:07:56.560 | People pay for the apps they buy.
01:07:58.760 | More people, I find, are willing to pay
01:08:01.080 | for newspaper content,
01:08:03.120 | for the good news websites across the web.
01:08:07.240 | That wasn't the case even a few years ago, I would say.
01:08:11.040 | And I just see that change in myself as well
01:08:13.320 | and just lots of people around me.
01:08:14.840 | So definitely hopeful that we'll transition
01:08:17.160 | to that mixed model where maybe you get
01:08:20.840 | to try something out for free, maybe with ads,
01:08:24.160 | but then there's a more clear revenue model
01:08:27.400 | that sort of helps go beyond that.
01:08:29.360 | - So speaking of revenue,
01:08:33.440 | how is it that a person can use the TPU
01:08:37.160 | in a Google Colab for free?
01:08:39.440 | So what's the--
01:08:40.640 | (laughing)
01:08:42.000 | I guess the question is, what's the future of TensorFlow
01:08:47.000 | in terms of empowering, say, a class of 300 students
01:08:51.920 | and then the masses beyond MIT,
01:08:55.400 | what is going to be the future of them being able
01:08:57.840 | to do their homework in TensorFlow?
01:09:00.040 | Like, where are they going to train these networks, right?
01:09:02.880 | What's that future look like with TPUs,
01:09:06.480 | with cloud services, and so on?
01:09:08.960 | - I think a number of things there.
01:09:10.280 | I mean, TensorFlow, open source, you can run it wherever.
01:09:13.680 | You can run it on your desktop
01:09:15.040 | and your desktops always keep getting more powerful,
01:09:17.520 | so maybe you can do more.
01:09:19.560 | My phone is, like, I don't know how many times
01:09:21.440 | more powerful than my first desktop.
01:09:23.680 | - You'll probably train it on your phone, though.
01:09:25.240 | Yeah, that's true.
01:09:26.280 | - Right, so in that sense,
01:09:27.720 | the power you have in your hands is a lot more.
01:09:30.640 | Clouds are actually very interesting from, say,
01:09:34.440 | students' or courses' perspective
01:09:36.960 | because they make it very easy to get started.
01:09:40.080 | I mean, Colab, the great thing about it
01:09:42.080 | is go to a website and it just works.
01:09:45.160 | No installation needed, nothing,
01:09:47.600 | you're just there and things are working.
01:09:50.040 | That's really the power of cloud as well.
01:09:52.320 | And so I do expect that to grow.
01:09:55.360 | Again, Colab is a free service.
01:09:57.960 | It's great to get started, to play with things,
01:10:00.880 | to explore things.
01:10:02.200 | That said, with free you can only get so much.
01:10:06.200 | So just like we were talking about,
01:10:10.160 | free versus paid, and yeah,
01:10:12.240 | there are services you can pay for and get a lot more.
01:10:15.320 | - Great, so if I'm a complete beginner
01:10:17.720 | interested in machine learning and TensorFlow,
01:10:20.000 | what should I do?
01:10:21.640 | - Probably start with going to our website
01:10:23.560 | and playing there.
01:10:24.400 | - So just go to TensorFlow.org and start clicking on things?
01:10:26.600 | - Yep, check our tutorials and guides.
01:10:28.480 | There's stuff you can just click there
01:10:29.840 | and go to Colab and do things.
01:10:31.360 | No installation needed, you can get started right there.
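For a sense of what that first step looks like, the beginner quickstart on tensorflow.org is roughly the sketch below (assuming TensorFlow 2.x): load a small dataset, define a Keras model, train, and evaluate, all of which runs as-is in a Colab notebook.

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test)
```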
01:10:34.080 | - Okay, awesome, Rajat, thank you so much for talking today.
01:10:36.760 | - Thank you, Lex, it was great.
01:10:38.360 | (upbeat music)