
Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators | Lex Fridman Podcast #21


Chapters

0:00 Introduction
1:28 What Was the First Program You've Ever Written
4:45 What Is a Compiler
4:48 Phases of a Compiler
8:16 Compiler Infrastructure
9:31 LLVM Open Source
19:16 Intermediate Representation
32:23 Linux Still Defaults to GCC
35:18 Code Owners
39:29 The Preprocessor
40:31 Swift
47:34 Progressive Disclosure of Complexity
49:58 Swift and Python Talking to Each Other
53:44 Python
56:30 Automatic Differentiation
62:06 Challenges in the LLVM Ecosystem
62:20 The Future of Open Source
67:28 Elon Musk
71:10 The Dragon Book

Transcript

00:00:00.000 | The following is a conversation with Chris Lattner.
00:00:02.680 | Currently, he's a senior director at Google,
00:00:05.280 | working on several projects,
00:00:06.720 | including CPU, GPU, TPU accelerators for TensorFlow,
00:00:10.660 | Swift for TensorFlow,
00:00:12.040 | and all kinds of machine learning compiler magic
00:00:14.360 | going on behind the scenes.
00:00:16.320 | He's one of the top experts in the world
00:00:18.420 | on compiler technologies,
00:00:20.080 | which means he deeply understands the intricacies
00:00:23.560 | of how hardware and software come together
00:00:26.240 | to create efficient code.
00:00:27.920 | He created the LLVM compiler infrastructure project
00:00:31.480 | and the Clang compiler.
00:00:33.400 | He led major engineering efforts at Apple,
00:00:36.040 | including the creation of the Swift programming language.
00:00:38.920 | He also briefly spent time at Tesla
00:00:41.800 | as vice president of Autopilot software
00:00:44.320 | during the transition from Autopilot hardware one
00:00:46.820 | to hardware two,
00:00:48.120 | when Tesla essentially started from scratch
00:00:51.280 | to build an in-house software infrastructure for Autopilot.
00:00:54.900 | I could have easily talked to Chris for many more hours.
00:00:58.060 | Compiling code down across the levels of abstraction
00:01:01.260 | is one of the most fundamental and fascinating aspects
00:01:04.140 | of what computers do.
00:01:05.460 | And he is one of the world experts in this process.
00:01:08.660 | It's rigorous science and it's messy, beautiful art.
00:01:12.940 | This conversation is part
00:01:14.700 | of the Artificial Intelligence podcast.
00:01:16.780 | If you enjoy it, subscribe on YouTube, iTunes,
00:01:19.440 | or simply connect with me on Twitter @LexFriedman,
00:01:22.820 | spelled F-R-I-D.
00:01:24.780 | And now here's my conversation with Chris Lattner.
00:01:29.420 | What was the first program you've ever written?
00:01:32.420 | - My first program?
00:01:34.180 | - Back, and when was it?
00:01:35.420 | - I think I started as a kid
00:01:37.540 | and my parents got a basic programming book.
00:01:41.660 | And so when I started,
00:01:42.620 | it was typing out programs from a book
00:01:45.420 | and seeing how they worked and then typing them in wrong
00:01:49.340 | and trying to figure out why they were not working right,
00:01:51.660 | that kind of stuff.
00:01:53.020 | - So basic, what was the first language
00:01:54.900 | that you remember yourself maybe falling in love with,
00:01:58.380 | like really connecting with?
00:02:00.180 | - I don't know, I mean, I feel like I've learned a lot
00:02:01.540 | along the way and each of them
00:02:03.700 | have a different special thing about them.
00:02:06.700 | So I started in BASIC and then went to GW-BASIC,
00:02:09.760 | which was the thing back in the DOS days,
00:02:11.460 | and then upgraded to QBasic and eventually QuickBASIC,
00:02:15.320 | which are all slightly more fancy versions
00:02:17.780 | of Microsoft BASIC.
00:02:19.500 | Made the jump to Pascal
00:02:20.900 | and started doing machine language programming
00:02:23.380 | and assembly in Pascal, which was really cool.
00:02:25.300 | Turbo Pascal was amazing for its day.
00:02:28.140 | Eventually got into C, C++
00:02:29.940 | and then kind of did lots of other weird things.
00:02:33.420 | - I feel like you took the dark path,
00:02:34.900 | which is the, you could have gone Lisp.
00:02:39.500 | - Yeah, yeah.
00:02:40.340 | - You could have gone higher level
00:02:41.500 | sort of functional philosophical hippie route.
00:02:44.620 | Instead you went into like the dark arts of the C.
00:02:48.100 | - It was straight into the machine.
00:02:49.740 | - Straight to the machine.
00:02:50.700 | So it started with BASIC, Pascal, and then assembly,
00:02:53.900 | and then wrote a lot of assembly.
00:02:55.340 | And I eventually did Smalltalk
00:02:58.980 | and other things like that,
00:03:00.100 | but that was not the starting point.
00:03:01.940 | - But so what is this journey into C?
00:03:05.100 | Is that in high school?
00:03:06.380 | Is that in college?
00:03:07.580 | - That was in high school, yeah.
00:03:08.820 | So, and then that was really about
00:03:13.540 | trying to be able to do more powerful things
00:03:15.220 | than what Pascal could do
00:03:16.620 | and also to learn a different world.
00:03:18.980 | C was really confusing to me with pointers
00:03:20.740 | and the syntax and everything,
00:03:21.900 | and it took a while,
00:03:22.820 | but Pascal's much more principled in various ways.
00:03:27.820 | C is more, I mean, it has its historical roots,
00:03:33.380 | but it's not as easy to learn.
00:03:35.500 | - With pointers, there's this memory management thing
00:03:39.860 | that you have to become conscious of.
00:03:41.660 | Is that the first time you start to understand
00:03:43.860 | that there's resources that you're supposed to manage?
00:03:46.500 | - Well, so you have that in Pascal as well,
00:03:48.500 | but in Pascal, it's like the caret instead of the star,
00:03:51.620 | and there are some small differences like that,
00:03:53.180 | but it's not about pointer arithmetic.
00:03:55.660 | And in C, you end up thinking about
00:03:58.220 | how things get laid out in memory a lot more.
00:04:00.820 | And so in Pascal, you have allocating and deallocating
00:04:04.140 | and owning the memory,
00:04:05.460 | but just the programs are simpler
00:04:07.500 | and you don't have to,
00:04:10.020 | well, for example, Pascal has a string type.
00:04:12.620 | And so you can think about a string
00:04:14.020 | instead of an array of characters
00:04:15.860 | which are consecutive in memory.
00:04:17.660 | So it's a little bit of a higher level abstraction.
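
[Editor's note: a tiny illustrative sketch, not from the conversation, of the difference being described: C has no built-in string type, just consecutive characters in memory that the programmer lays out and manages by hand.]

```cpp
#include <cstdio>
#include <cstring>

int main() {
  // In C there is no string type: just consecutive chars in memory,
  // terminated by '\0', and the programmer tracks the layout and size.
  char buf[6];                 // exactly enough room for "hello" + '\0'
  std::strcpy(buf, "hello");
  std::printf("%c %zu\n", buf[0], std::strlen(buf));  // prints: h 5
  return 0;
}
```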
00:04:20.380 | - So let's get into it.
00:04:22.780 | Let's talk about LLVM, Clang, and compilers.
00:04:25.540 | - Sure.
00:04:26.540 | - So can you tell me first what LLVM and Clang are
00:04:31.540 | and how is it that you find yourself
00:04:33.940 | the creator and lead developer,
00:04:35.500 | one of the most powerful compiler optimization systems
00:04:39.380 | in use today?
00:04:40.220 | - Sure, so I guess they're different things.
00:04:43.220 | So let's start with what is a compiler?
00:04:47.060 | It's a--
00:04:47.900 | - Is that a good place to start?
00:04:48.820 | - What are the phases of a compiler?
00:04:50.220 | What are the parts?
00:04:51.060 | Yeah, what is it?
00:04:51.900 | - So what is even a compiler used for?
00:04:53.380 | So the way I look at this is you have a two-sided problem
00:04:57.540 | of you have humans that need to write code
00:05:00.420 | and then you have machines that need to run the program
00:05:02.340 | that the human wrote.
00:05:03.340 | And for lots of reasons,
00:05:04.900 | the humans don't wanna be writing in binary
00:05:07.020 | and wanna think about every piece of hardware.
00:05:08.940 | And so at the same time that you have lots of humans,
00:05:12.060 | you also have lots of kinds of hardware.
00:05:14.780 | And so compilers are the art of allowing humans
00:05:17.780 | to think at a level of abstraction
00:05:19.220 | that they wanna think about and then get that program,
00:05:22.380 | get the thing that they wrote
00:05:23.580 | to run on a specific piece of hardware.
00:05:26.020 | And the interesting and exciting part of all this
00:05:29.460 | is that there's now lots of different kinds of hardware,
00:05:31.940 | chips like x86 and PowerPC and ARM and things like that,
00:05:35.900 | but also high-performance accelerators
00:05:37.300 | for machine learning and other things like that,
00:05:38.880 | or also just different kinds of hardware, GPUs.
00:05:41.480 | These are new kinds of hardware.
00:05:42.900 | And at the same time on the programming side of it,
00:05:45.580 | you have your basic, you have C, you have JavaScript,
00:05:48.620 | you have Python, you have Swift,
00:05:50.540 | you have like lots of other languages
00:05:52.820 | that are all trying to talk to the human in a different way
00:05:55.180 | to make them more expressive and capable and powerful.
00:05:58.300 | And so compilers are the thing
00:06:01.460 | that goes from one to the other.
00:06:03.420 | - End to end, from the very beginning to the very end.
00:06:05.180 | - End to end.
00:06:06.020 | And so you go from what the human wrote
00:06:08.100 | and programming languages end up being about
00:06:11.560 | expressing intent, not just for the compiler
00:06:14.540 | and the hardware, but the programming language's job
00:06:17.940 | is really to capture an expression
00:06:20.880 | of what the programmer wanted
00:06:22.640 | that then can be maintained and adapted
00:06:25.100 | and evolved by other humans,
00:06:27.060 | as well as interpreted by the compiler.
00:06:29.680 | So when you look at this problem,
00:06:31.520 | you have on the one hand humans, which are complicated,
00:06:34.180 | and you have hardware, which is complicated.
00:06:36.740 | And so compilers typically work in multiple phases.
00:06:39.860 | And so the software engineering challenge
00:06:42.720 | that you have here is try to get maximum reuse
00:06:44.980 | out of the amount of code that you write,
00:06:47.120 | because these compilers are very complicated.
00:06:49.780 | And so the way it typically works out
00:06:51.200 | is that you have something called a front end or a parser
00:06:54.440 | that is language specific.
00:06:56.620 | And so you'll have a C parser, and that's what Clang is,
00:06:59.460 | or C++ or JavaScript or Python or whatever,
00:07:03.440 | that's the front end.
00:07:04.960 | Then you'll have a middle part,
00:07:07.080 | which is often the optimizer.
00:07:08.960 | And then you'll have a late part,
00:07:11.120 | which is hardware specific.
00:07:13.340 | And so compilers end up,
00:07:15.040 | there's many different layers often,
00:07:16.680 | but these three big groups are very common in compilers.
00:07:20.860 | And what LLVM is trying to do
00:07:22.200 | is trying to standardize that middle and last part.
00:07:25.360 | And so one of the cool things about LLVM
00:07:27.880 | is that there are a lot of different languages
00:07:29.760 | that compile through to it.
00:07:31.080 | And so things like Swift, but also Julia, Rust,
00:07:36.520 | Clang for C, C++, Objective-C,
00:07:39.120 | like these are all very different languages
00:07:40.920 | and they can all use the same optimization infrastructure,
00:07:43.780 | which gets better performance,
00:07:45.400 | and the same code generation infrastructure
00:07:47.240 | for hardware support.
00:07:48.800 | And so LLVM is really that layer that is common
00:07:52.240 | that all these different specific compilers can use.
00:07:55.580 | - And is it a standard, like a specification,
00:07:59.300 | or is it literally an implementation?
00:08:01.160 | - It's an implementation.
00:08:02.120 | And so I think there's a couple of different ways
00:08:05.880 | of looking at it, right?
00:08:06.720 | Because it depends on which angle you're looking at it from.
00:08:09.680 | LLVM ends up being a bunch of code, okay?
00:08:12.600 | So it's a bunch of code that people reuse
00:08:14.440 | and they build compilers with.
00:08:16.520 | We call it a compiler infrastructure
00:08:18.040 | because it's kind of the underlying platform
00:08:20.000 | that you build a concrete compiler on top of.
00:08:22.520 | But it's also a community.
00:08:23.680 | And the LLVM community is hundreds of people
00:08:26.800 | that all collaborate.
00:08:27.920 | And one of the most fascinating things about LLVM
00:08:30.560 | over the course of time is that we've managed somehow
00:08:34.280 | to successfully get harsh competitors
00:08:37.080 | in the commercial space to collaborate
00:08:39.080 | on shared infrastructure.
00:08:41.120 | And so you have Google and Apple, you have AMD and Intel,
00:08:45.880 | you have Nvidia and AMD on the graphics side,
00:08:48.880 | you have Cray and everybody else doing these things.
00:08:52.640 | And like all these companies are collaborating together
00:08:55.420 | to make that shared infrastructure really, really great.
00:08:58.520 | And they do this not out of the goodness of their heart,
00:09:01.400 | but they do it because it's in their commercial interest
00:09:03.460 | of having really great infrastructure
00:09:05.160 | that they can build on top of,
00:09:06.800 | and facing the reality that it's so expensive
00:09:09.120 | that no one company, even the big companies,
00:09:11.200 | no one company really wants to implement it all themselves.
00:09:14.640 | - Expensive or difficult?
00:09:16.120 | - Both.
00:09:16.960 | That's a great point because it's also about the skill sets.
00:09:20.080 | - Right.
00:09:20.920 | - And the skill sets are very hard to find.
00:09:25.920 | - How big is the LLVM?
00:09:28.000 | It always seems like with open source projects,
00:09:30.480 | the kind, and LLVM is open source?
00:09:33.540 | - Yes, it's open source.
00:09:34.460 | It's about 19 years old now, so it's fairly old.
00:09:38.700 | - It seems like the magic often happens
00:09:40.980 | in a very small circle of people.
00:09:43.060 | - Yes.
00:09:43.900 | - At least at early birth and whatever.
00:09:46.100 | - Yes.
00:09:46.940 | So LLVM came from a university project,
00:09:49.700 | and so I was at the University of Illinois,
00:09:51.700 | and there it was myself, my advisor,
00:09:53.940 | and then a team of two or three research students
00:09:57.540 | in the research group,
00:09:58.420 | and we built many of the core pieces initially.
00:10:02.140 | I then graduated and went to Apple,
00:10:03.780 | and at Apple brought it to the products,
00:10:06.500 | first in the OpenGL graphics stack,
00:10:09.380 | but eventually to the C compiler realm,
00:10:11.620 | and eventually built Clang,
00:10:12.820 | and eventually built Swift and these things,
00:10:14.680 | along the way building a team of people
00:10:16.420 | that are really amazing compiler engineers
00:10:18.660 | that helped build a lot of that.
00:10:20.220 | And so as it was gaining momentum,
00:10:21.900 | and as Apple was using it, being open source and public,
00:10:24.820 | and encouraging contribution,
00:10:26.460 | many others, for example, at Google,
00:10:28.820 | came in and started contributing,
00:10:30.260 | and in some cases, Google effectively owns Clang now,
00:10:33.740 | because it cares so much about C++
00:10:35.580 | and the evolution of that ecosystem,
00:10:37.340 | and so it's investing a lot in the C++ world
00:10:41.420 | and the tooling and things like that.
00:10:42.980 | And so likewise, NVIDIA cares a lot about CUDA,
00:10:47.860 | and so CUDA uses Clang and uses LLVM
00:10:50.780 | for graphics and GPU.
00:10:54.980 | - And so when you first started as a master's project,
00:10:58.900 | I guess, did you think it was gonna go as far as it went?
00:11:02.940 | Were you crazy ambitious about it?
00:11:06.300 | - No. - It seems like
00:11:07.140 | a really difficult undertaking, a brave one.
00:11:09.800 | - Yeah, no, no, no, it was nothing like that.
00:11:11.340 | So I mean, my goal when I went to the University of Illinois
00:11:13.700 | was to get in and out with a non-thesis master's in a year,
00:11:17.500 | and get back to work.
00:11:18.700 | So I was not planning to stay for five years
00:11:22.140 | and build this massive infrastructure.
00:11:24.460 | I got nerd sniped into staying,
00:11:27.380 | and a lot of it was because LLVM was fun,
00:11:29.620 | I was building cool stuff
00:11:30.900 | and learning really interesting things
00:11:33.420 | and facing both software engineering challenges,
00:11:36.900 | but also learning how to work in a team
00:11:38.540 | and things like that.
00:11:40.100 | I had worked at many companies as interns before that,
00:11:43.620 | but it was really a different thing
00:11:45.860 | to have a team of people that are working together
00:11:48.060 | and trying to collaborate in version control,
00:11:50.460 | and it was just a little bit different.
00:11:52.380 | - Like I said, I just talked to Don Knuth,
00:11:54.020 | and he believes that 2% of the world population
00:11:56.820 | have something weird with their brain,
00:11:58.780 | that they're geeks, they understand computers,
00:12:01.060 | they're connected with computers.
00:12:02.540 | He put it at exactly 2%.
00:12:04.340 | Okay, so--
00:12:05.500 | - He's a specific guy.
00:12:06.500 | - It's very specific.
00:12:08.740 | What he says is, I can't prove it,
00:12:10.140 | but it's very empirically there.
00:12:11.740 | Is there something that attracts you
00:12:14.500 | to the idea of optimizing code?
00:12:16.900 | I mean, it seems like that's one of the biggest,
00:12:19.140 | coolest things about LLVM.
00:12:20.900 | - Yeah, that's one of the major things it does.
00:12:22.500 | So I got into that because of a person, actually.
00:12:26.460 | So when I was in my undergraduate,
00:12:28.220 | I had an advisor, or a professor named Steve Vegdahl,
00:12:32.060 | and I went to this little, tiny, private school.
00:12:35.740 | There were, I think, seven or nine people
00:12:38.300 | in my computer science department,
00:12:40.340 | students in my class.
00:12:43.100 | So it was a very tiny, very small school.
00:12:47.420 | It was kind of a wart on the side of the math department,
00:12:49.940 | kind of a thing at the time.
00:12:51.260 | I think it's evolved a lot in the many years since then.
00:12:53.820 | But Steve Vegdahl was a compiler guy,
00:12:58.300 | and he was super passionate.
00:12:59.620 | And his passion rubbed off on me,
00:13:02.780 | and one of the things I like about compilers
00:13:04.460 | is that they're large, complicated software pieces.
00:13:09.140 | And so one of the culminating classes
00:13:12.980 | that many computer science departments,
00:13:14.580 | at least at the time, did was to say
00:13:16.740 | that you would take algorithms and data structures
00:13:18.460 | in all these core classes,
00:13:19.520 | but then the compilers class
00:13:20.740 | was one of the last classes you take,
00:13:22.140 | because it pulls everything together,
00:13:24.340 | and then you work on one piece of code
00:13:26.980 | over the entire semester.
00:13:28.700 | And so you keep building on your own work,
00:13:32.080 | which is really interesting.
00:13:33.500 | It's also very challenging,
00:13:34.780 | because in many classes, if you don't get a project done,
00:13:37.500 | you just forget about it and move on to the next one
00:13:39.340 | and get your B or whatever it is.
00:13:41.300 | But here you have to live with the decisions you make
00:13:43.900 | and continue to reinvest in it,
00:13:45.260 | and I really like that.
00:13:46.860 | And so I did an extra study project with him
00:13:51.060 | the following semester, and he was just really great.
00:13:53.940 | And he was also a great mentor in a lot of ways.
00:13:56.900 | And so from him and from his advice,
00:13:59.580 | he encouraged me to go to graduate school.
00:14:01.500 | I wasn't super excited about going to grad school.
00:14:03.220 | I wanted the master's degree,
00:14:05.220 | but I didn't want to be an academic.
00:14:07.460 | But like I said, I kind of got tricked into saying
00:14:11.180 | I was having a lot of fun,
00:14:12.140 | and I definitely do not regret it.
00:14:14.600 | - What aspects of compilers
00:14:15.900 | were the things you connected with?
00:14:17.980 | So LLVM, there's also the other part
00:14:22.180 | that's just really interesting,
00:14:23.460 | if you're interested in languages,
00:14:24.980 | is parsing and just analyzing,
00:14:27.700 | like, yeah, analyzing the language,
00:14:29.660 | breaking it down, parsing, and so on.
00:14:31.260 | Was that interesting to you,
00:14:32.340 | or were you more interested in optimization?
00:14:34.100 | - For me, it was more, so I'm not really a math person.
00:14:37.460 | I can do math.
00:14:38.300 | I understand some bits of it when I get into it,
00:14:41.600 | but math is never the thing that attracted me.
00:14:43.980 | And so a lot of the parser part of the compiler
00:14:46.180 | has a lot of good formal theories
00:14:47.860 | that Don, for example, knows quite well.
00:14:50.460 | Still waiting for his book on that.
00:14:52.160 | But I just like building a thing
00:14:56.460 | and seeing what it could do,
00:14:57.940 | and exploring and getting it to do more things,
00:15:00.820 | and then setting new goals and reaching for them.
00:15:03.220 | And in the case of LLVM, when I started working on that,
00:15:09.100 | my research advisor that I was working for
00:15:12.420 | was a compiler guy.
00:15:13.460 | And so he and I specifically found each other
00:15:15.640 | because we were both interested in compilers,
00:15:16.980 | and so I started working with him and taking his class.
00:15:19.500 | And a lot of LLVM initially was,
00:15:21.380 | it's fun implementing all the standard algorithms
00:15:23.660 | and all the things that people had been talking about
00:15:26.420 | and were well-known and they were in the curricula
00:15:28.960 | for advanced studies in compilers.
00:15:31.340 | And so just being able to build that was really fun.
00:15:34.580 | And I was learning a lot by,
00:15:36.940 | instead of reading about it, just building.
00:15:38.660 | And so I enjoyed that.
00:15:40.260 | - So you said compilers are these complicated systems.
00:15:42.860 | Can you even just, with language, try to describe
00:15:47.300 | how you turn a C++ program into code?
00:15:52.260 | Like what are the hard parts?
00:15:53.500 | Why is it so hard?
00:15:54.660 | - So I'll give you examples of the hard parts
00:15:56.380 | along the way.
00:15:57.220 | So C++ is a very complicated programming language.
00:16:01.100 | It's something like 1,400 pages in the spec.
00:16:03.500 | So C++ by itself is crazy complicated.
00:16:06.140 | - Can we just, sorry, pause.
00:16:07.180 | What makes the language complicated
00:16:08.780 | in terms of what's syntactically?
00:16:12.500 | - So it's what they call syntax.
00:16:14.340 | So the actual, how the characters are arranged, yes.
00:16:16.700 | It's also semantics, how it behaves.
00:16:20.100 | It's also, in the case of C++,
00:16:21.740 | there's a huge amount of history.
00:16:23.420 | C++ built on top of C.
00:16:25.580 | You play that forward and then a bunch of suboptimal,
00:16:28.740 | in some cases, decisions were made and they compound.
00:16:31.700 | And then more and more and more things
00:16:33.380 | keep getting added to C++ and it will probably never stop.
00:16:37.020 | But the language is very complicated from that perspective.
00:16:39.440 | And so the interactions between subsystems
00:16:41.220 | is very complicated.
00:16:42.380 | There's just a lot there.
00:16:43.540 | And when you talk about the front end,
00:16:45.620 | one of the major challenges, which Clang, as a project,
00:16:48.580 | the C++ compiler that I built, I and many people built,
00:16:52.280 | one of the challenges we took on was we looked at GCC.
00:16:57.580 | Okay, GCC at the time was like a really good
00:17:01.060 | industry standardized compiler that had really consolidated
00:17:05.260 | a lot of the other compilers in the world
00:17:06.700 | and was a standard.
00:17:08.320 | But it wasn't really great for research.
00:17:10.600 | The design was very difficult to work with.
00:17:12.540 | And it was full of global variables and other things
00:17:16.600 | that made it very difficult to reuse
00:17:18.100 | in ways that it wasn't originally designed for.
00:17:20.380 | And so with Clang, one of the things that we wanted to do
00:17:22.500 | is push forward on better user interface.
00:17:25.460 | So make error messages that are just better than GCCs.
00:17:28.140 | And that's actually hard because you have to do
00:17:29.900 | a lot of bookkeeping in an efficient way
00:17:31.840 | to be able to do that.
00:17:33.580 | We wanted to make compile time better.
00:17:35.140 | And so compile time is about making it efficient,
00:17:37.500 | which is also really hard when you're keeping track
00:17:39.140 | of extra information.
00:17:40.500 | We wanted to make new tools available.
00:17:43.380 | So refactoring tools and other analysis tools
00:17:46.340 | that the GCC never supported,
00:17:48.360 | also leveraging the extra information we kept,
00:17:51.140 | but enabling those new classes of tools
00:17:54.060 | that then get built into IDEs.
00:17:55.940 | And so that's been one of the areas that Clang
00:17:58.900 | has really helped push the world forward in,
00:18:01.300 | is in the tooling for C and C++ and things like that.
00:18:05.020 | But C++ in the front end piece is complicated
00:18:07.740 | and you have to build syntax trees
00:18:08.980 | and you have to check every rule in the spec
00:18:11.300 | and you have to turn that back into an error message
00:18:13.980 | to the human that the human can understand
00:18:16.000 | when they do something wrong.
00:18:17.800 | But then you start doing the, what's called lowering.
00:18:20.700 | So going from C++ and the way that it represents code
00:18:23.420 | down to the machine.
00:18:24.900 | And when you do that,
00:18:25.740 | there's many different phases you go through.
00:18:28.220 | Often there are, I think LLVM has something like 150
00:18:33.020 | different, what are called passes in the compiler
00:18:36.220 | that the code passes through.
00:18:38.740 | And these get organized in very complicated ways,
00:18:41.840 | which affect the generated code and the performance
00:18:44.340 | and compile time and many other things.
00:18:45.980 | - What are they passing through?
00:18:47.300 | So after you do the Clang parsing,
00:18:51.820 | what's the, is it a graph?
00:18:53.960 | What does it look like?
00:18:54.800 | What's the data structure here?
00:18:55.940 | - Yeah, so in the parser, it's usually a tree
00:18:59.020 | and it's called an abstract syntax tree.
00:19:01.060 | And so the idea is you have a node for the plus
00:19:04.580 | that the human wrote in their code or the function call,
00:19:07.740 | you'll have a node for call with the function that they call
00:19:10.780 | and the arguments they pass, things like that.
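
[Editor's note: a minimal sketch of what such a syntax tree might look like. These are hypothetical node types for illustration, not Clang's actual AST classes.]

```cpp
#include <memory>
#include <string>
#include <vector>

struct Expr { virtual ~Expr() = default; };  // base class for all nodes
struct Var : Expr {                          // a variable reference
  std::string name;
  explicit Var(std::string n) : name(std::move(n)) {}
};
struct Add : Expr {                          // the node for '+'
  std::unique_ptr<Expr> lhs, rhs;
};
struct Call : Expr {                         // the node for a function call
  std::string callee;
  std::vector<std::unique_ptr<Expr>> args;
};

int main() {
  // Build the tree for the source expression: f(a + b)
  auto add = std::make_unique<Add>();
  add->lhs = std::make_unique<Var>("a");
  add->rhs = std::make_unique<Var>("b");

  auto call = std::make_unique<Call>();
  call->callee = "f";
  call->args.push_back(std::move(add));
  return 0;
}
```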
00:19:13.080 | This then gets lowered into what's called
00:19:16.860 | an intermediate representation.
00:19:18.600 | And intermediate representations, like, LLVM has one.
00:19:22.100 | And there, it's what's called a control flow graph.
00:19:26.940 | And so you represent each operation in the program
00:19:31.220 | as a very simple, like this is gonna add two numbers,
00:19:34.480 | this is gonna multiply two things, maybe we'll do a call,
00:19:37.460 | but then they get put in what are called blocks.
00:19:40.260 | And so you get blocks of these straight line operations,
00:19:43.580 | where instead of being nested like in a tree,
00:19:45.300 | it's straight line operations.
00:19:46.900 | And so there's a sequence and an ordering
00:19:48.420 | to these operations.
00:19:49.740 | - So within the block or outside the block?
00:19:51.780 | - That's within the block.
00:19:53.180 | And so it's a straight line sequence of operations
00:19:54.980 | within the block, and then you have branches,
00:19:57.460 | like conditional branches between blocks.
00:20:00.140 | And so when you write a loop, for example,
00:20:02.720 | in a syntax tree, you would have a for node,
00:20:07.020 | like for a for statement in a C-like language,
00:20:09.020 | you'd have a for node,
00:20:10.780 | and you have a pointer to the expression
00:20:12.180 | for the initializer, a pointer to the expression
00:20:14.060 | for the increment, a pointer to the expression
00:20:15.820 | for the comparison, a pointer to the body.
00:20:18.700 | Okay, and these are all nested underneath it.
00:20:21.040 | In a control flow graph, you get a block for the code
00:20:24.580 | that runs before the loop, so the initializer code,
00:20:27.580 | then you have a block for the body of the loop,
00:20:30.260 | and so the body of the loop code goes in there,
00:20:33.740 | but also the increment and other things like that,
00:20:35.540 | and then you have a branch that goes back to the top,
00:20:37.780 | and a comparison and a branch that goes out.
00:20:39.860 | And so it's more of an assembly level kind of representation.
00:20:44.000 | But the nice thing about this level of representation
00:20:46.040 | is it's much more language independent.
00:20:48.660 | And so there's lots of different kinds of languages
00:20:51.880 | with different kinds of, you know, JavaScript
00:20:55.180 | has a lot of different ideas of what is false, for example,
00:20:58.140 | and all that can stay in the front end,
00:21:00.760 | but then that middle part can be shared across all those.
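
[Editor's note: a rough sketch, not from the conversation, of how a front end emits this kind of IR using LLVM's C++ IRBuilder API. It assumes a recent LLVM with development headers installed; the function built here is a single "entry" basic block containing a straight-line add and return.]

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Verifier.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  llvm::LLVMContext ctx;
  llvm::Module mod("demo", ctx);
  llvm::IRBuilder<> builder(ctx);

  // int add(int a, int b) { return a + b; } lowered to LLVM IR by hand.
  auto *i32 = builder.getInt32Ty();
  auto *fnTy = llvm::FunctionType::get(i32, {i32, i32}, /*isVarArg=*/false);
  auto *fn = llvm::Function::Create(fnTy, llvm::Function::ExternalLinkage,
                                    "add", &mod);

  // One basic block: a straight-line sequence of simple operations.
  auto *entry = llvm::BasicBlock::Create(ctx, "entry", fn);
  builder.SetInsertPoint(entry);
  auto *sum = builder.CreateAdd(fn->getArg(0), fn->getArg(1), "sum");
  builder.CreateRet(sum);

  llvm::verifyFunction(*fn);
  mod.print(llvm::outs(), nullptr);  // dump the textual IR
  return 0;
}
```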
00:21:04.180 | - How close is that intermediate representation
00:21:07.500 | to neural networks, for example?
00:21:10.260 | Is there, are they, 'cause everything you describe
00:21:13.540 | is a kind of echoes of a neural network graph.
00:21:16.060 | Are they neighbors, or what?
00:21:18.940 | - They're quite different in details,
00:21:20.980 | but they're very similar in idea.
00:21:22.500 | So one of the things that neural networks do
00:21:24.020 | is they learn representations for data
00:21:26.880 | at different levels of abstraction, right?
00:21:29.140 | And then they transform those through layers, right?
00:21:32.380 | So the compiler does very similar things,
00:21:35.660 | but one of the things the compiler does
00:21:37.140 | is it has relatively few different representations.
00:21:40.660 | Where a neural network often, as you get deeper,
00:21:42.500 | for example, you get many different representations,
00:21:44.820 | and each, you know, layer or set of ops
00:21:47.400 | is transforming between these different representations.
00:21:50.220 | In a compiler, often you get one representation
00:21:53.100 | and they do many transformations to it.
00:21:55.240 | And these transformations are often applied iteratively.
00:21:58.700 | And for programmers, there's familiar types of things.
00:22:02.940 | For example, trying to find expressions inside of a loop
00:22:06.180 | and pulling them out of a loop,
00:22:07.320 | so that they execute fewer times,
00:22:08.540 | or find redundant computation,
00:22:10.740 | or do constant folding or other simplifications,
00:22:15.340 | turning, you know, two times X into X shift left by one,
00:22:19.060 | and things like this are all the examples
00:22:21.980 | of the things that happen.
00:22:23.340 | But compilers end up getting a lot of theorem proving
00:22:26.180 | and other kinds of algorithms
00:22:27.660 | that try to find higher level properties of the program
00:22:29.940 | that then can be used by the optimizer.
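
[Editor's note: a before/after sketch, written out by hand for illustration, of the two transformations just mentioned: hoisting a loop-invariant expression out of a loop, and strength-reducing two times x into a shift.]

```cpp
// What the programmer writes: 2 * x is recomputed on every iteration.
int sum_scaled(const int* a, int n, int x) {
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += a[i] * (2 * x);
  return total;
}

// Roughly what the optimizer produces: the invariant expression is
// hoisted out of the loop, and 2 * x becomes x << 1 (shift left by one).
int sum_scaled_optimized(const int* a, int n, int x) {
  int scale = x << 1;
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += a[i] * scale;
  return total;
}
```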
00:22:32.260 | - Cool, so what's like the biggest bang for the buck
00:22:35.900 | with optimization?
00:22:37.660 | What's a-- - Today?
00:22:38.700 | Yeah.
00:22:39.540 | - Well, no, not even today.
00:22:40.900 | At the very beginning, the '80s, I don't know.
00:22:42.620 | - Oh yeah, so for the '80s,
00:22:43.940 | a lot of it was things like register allocation.
00:22:46.420 | So the idea of in a modern, like a microprocessor,
00:22:50.940 | what you'll end up having is you'll end up having memory,
00:22:52.740 | which is relatively slow,
00:22:54.300 | and then you have registers that are relatively fast.
00:22:57.060 | But registers, you don't have very many of them, okay?
00:23:00.300 | And so when you're writing a bunch of code,
00:23:02.560 | you're just saying like, compute this,
00:23:04.140 | put in a temporary variable, compute this,
00:23:05.460 | compute this, compute this, put in a temporary variable,
00:23:07.740 | I have a loop, I have some other stuff going on.
00:23:09.740 | Well, now you're running on an x86,
00:23:11.620 | like a desktop PC or something.
00:23:13.860 | Well, it only has, in some cases,
00:23:16.140 | in some modes, eight registers, right?
00:23:18.660 | And so now the compiler has to choose
00:23:20.740 | what values get put in what registers
00:23:22.780 | at what points in the program.
00:23:24.780 | And this is actually a really big deal.
00:23:26.460 | So if you think about, you have a loop,
00:23:28.540 | an inner loop that executes millions of times maybe.
00:23:31.580 | If you're doing loads and stores inside that loop,
00:23:33.580 | then it's gonna be really slow.
00:23:34.860 | But if you can somehow fit all the values
00:23:37.060 | inside that loop in registers, now it's really fast.
00:23:40.140 | And so getting that right requires a lot of work
00:23:43.340 | because there's many different ways to do that.
00:23:44.940 | And often what the compiler ends up doing
00:23:46.940 | is it ends up thinking about things
00:23:48.820 | in a different representation than what the human wrote.
00:23:51.860 | Right, you wrote int x.
00:23:53.300 | Well, the compiler thinks about that
00:23:54.700 | as four different values,
00:23:56.740 | each which have different lifetimes
00:23:58.340 | across the function that it's in.
00:24:00.340 | And each of those could be put in a register
00:24:02.580 | or memory or different memory,
00:24:04.180 | or maybe in some parts of the code,
00:24:06.140 | recomputed instead of stored and reloaded.
00:24:08.700 | And there are many of these different kinds of techniques
00:24:10.700 | that can be used.
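
[Editor's note: a small illustration of the point above. The single source variable x below is really two independent values with separate lifetimes, and the register allocator treats them separately.]

```cpp
int f(int a, int b) {
  int x = a + b;   // first value of x: live from here...
  int y = x * 2;   // ...until its last use here
  x = y - a;       // x is redefined: a second, independent value
  return x + y;    // the allocator can place the two x's in different
}                  // registers, or recompute/spill one of them
```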
00:24:11.540 | - So it's adding almost like a time dimension to,
00:24:15.580 | it's trying to optimize across time.
00:24:18.260 | So it's considering when you're programming,
00:24:20.340 | you're not thinking in that way.
00:24:21.860 | - Yeah, absolutely.
00:24:23.140 | And so the RISC era made things--
00:24:26.300 | - RISC?
00:24:27.140 | - So RISC chips, R-I-S-C.
00:24:29.980 | The RISC chips as opposed to CISC chips.
00:24:33.700 | The RISC chips made things more complicated
00:24:35.980 | for the compiler because what they ended up doing
00:24:39.740 | is ending up adding pipelines to the processor
00:24:42.380 | where the processor can do more than one thing at a time.
00:24:44.980 | But this means that the order of operations matters a lot.
00:24:47.620 | And so one of the classical compiler techniques
00:24:49.740 | that you use is called scheduling.
00:24:51.980 | And so moving the instructions around
00:24:54.180 | so that the processor can keep its pipelines full
00:24:57.420 | instead of stalling and getting blocked.
00:24:59.580 | And so there's a lot of things like that
00:25:00.980 | that are kind of bread and butter compiler techniques
00:25:03.620 | that have been studied a lot over the course of decades now.
00:25:06.220 | But the engineering side of making them real
00:25:08.540 | is also still quite hard.
00:25:10.660 | And you talk about machine learning.
00:25:12.420 | This is a huge opportunity for machine learning
00:25:14.380 | because many of these algorithms are full of these
00:25:17.300 | like hokey hand-rolled heuristics
00:25:19.100 | which work well on specific benchmarks
00:25:20.900 | but don't generalize, and are full of magic numbers.
00:25:23.900 | And I hear there's some techniques
00:25:26.540 | that are good at handling that.
00:25:28.020 | - So what would be the,
00:25:29.860 | if you were to apply machine learning to this,
00:25:33.020 | what's the thing you're trying to optimize?
00:25:34.700 | Is it ultimately the running time
00:25:38.100 | of a specific-- - Yeah, yeah.
00:25:39.060 | You can pick your metric and there's running time,
00:25:41.180 | there's memory use,
00:25:42.220 | there's lots of different things that you can optimize for.
00:25:45.180 | Code size is another one that some people care about
00:25:47.180 | in the embedded space.
00:25:48.820 | - Is this like the thinking into the future
00:25:51.660 | or is somebody actually been crazy enough to try
00:25:55.580 | to have machine learning-based parameter tuning
00:25:59.060 | for the optimization of compilers?
00:26:01.020 | - So this is something that is,
00:26:02.580 | I would say, research right now.
00:26:04.820 | There are a lot of research systems
00:26:06.780 | that have been applying search in various forms
00:26:09.060 | and using reinforcement learning as one form
00:26:11.460 | but also brute force search has been tried for quite a while.
00:26:14.380 | And usually these are in small problem spaces.
00:26:18.140 | So find the optimal way to code generate
00:26:21.460 | a matrix multiply for a GPU, right?
00:26:23.700 | Something like that where you say,
00:26:25.460 | there there's a lot of design space of,
00:26:28.540 | do you unroll loops a lot?
00:26:29.940 | Do you execute multiple things in parallel?
00:26:32.620 | And there's many different confounding factors here
00:26:35.340 | because graphics cards have different numbers
00:26:37.580 | of threads and registers and execution ports
00:26:40.260 | and memory bandwidth and many different constraints
00:26:42.260 | that interact in nonlinear ways.
00:26:44.460 | And so search is very powerful for that
00:26:46.500 | and it gets used in certain ways
00:26:49.860 | but it's not very structured.
00:26:51.260 | This is something that we as an industry need to fix.
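
[Editor's note: a toy sketch of the kind of design space being searched. The unroll factor here is a hypothetical tuning knob; an autotuner would time several instantiations and keep the fastest one for the target hardware.]

```cpp
// UNROLL is a compile-time knob: a search procedure could try
// scale<1>, scale<2>, scale<4>, scale<8>, ... and measure each.
template <int UNROLL>
void scale(float* out, const float* in, int n, float k) {
  int i = 0;
  for (; i + UNROLL <= n; i += UNROLL)
    for (int u = 0; u < UNROLL; ++u)  // constant bound: typically
      out[i + u] = in[i + u] * k;     // fully unrolled by the compiler
  for (; i < n; ++i)                  // remainder loop
    out[i] = in[i] * k;
}

// e.g. scale<4>(out, in, n, 2.0f);
```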
00:26:54.540 | - So you said 80s but like,
00:26:56.260 | so have there been like big jumps
00:26:59.260 | in improvement and optimization?
00:27:01.300 | - Yeah.
00:27:02.380 | - Yeah, since then what's the coolest thing?
00:27:05.300 | - It's largely been driven by hardware.
00:27:07.100 | So hardware and software.
00:27:09.860 | So in the mid nineties, Java totally changed the world.
00:27:13.660 | Right, and I'm still amazed by how much change
00:27:17.020 | was introduced by Java.
00:27:17.860 | - In a good way or?
00:27:18.700 | - In a good way.
00:27:19.540 | So like reflecting back,
00:27:20.620 | Java introduced things like,
00:27:22.420 | all at once introduced things like JIT compilation.
00:27:25.620 | I mean, none of these were novel
00:27:26.900 | but it pulled it together and made it mainstream
00:27:28.620 | and made people invest in it.
00:27:30.580 | JIT compilation, garbage collection,
00:27:32.660 | portable code, safe code, like memory safe code,
00:27:36.620 | like a very dynamic dispatch execution model.
00:27:41.420 | Like many of these things,
00:27:42.660 | which had been done in research systems
00:27:44.100 | and had been done in small ways in various places,
00:27:46.920 | really came to the forefront
00:27:48.020 | and really changed how things worked
00:27:49.780 | and therefore changed the way people
00:27:51.780 | thought about the problem.
00:27:53.100 | JavaScript was another major world change
00:27:56.340 | based on the way it works.
00:27:57.780 | But also on the hardware side of things,
00:28:02.260 | multi-core and vector instructions
00:28:05.140 | really change the problem space
00:28:07.500 | and are very,
00:28:08.380 | they don't remove any of the problems
00:28:10.820 | that compilers faced in the past,
00:28:12.380 | but they add new kinds of problems
00:28:14.540 | of how do you find enough work
00:28:16.380 | to keep a four wide vector busy, right?
00:28:20.020 | Or if you're doing a matrix multiplication,
00:28:22.660 | how do you do different columns out of that matrix
00:28:25.300 | at the same time,
00:28:26.660 | and how do you maximally utilize
00:28:28.060 | the arithmetic compute that one core has?
00:28:31.420 | And then how do you take it to multiple cores?
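
[Editor's note: an illustrative pair of loops for the vector point above. The first has fully independent iterations, so a compiler can map it onto four-wide vector instructions; the second has a serial dependence that blocks straightforward vectorization.]

```cpp
// Independent iterations: easy to keep a four-wide vector unit busy.
void saxpy(float* y, const float* x, float a, int n) {
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}

// Each iteration depends on the previous sum, so vectorizing requires
// reassociating the additions, which changes floating-point rounding;
// compilers typically only do it under flags like -ffast-math.
float running_sum(const float* x, int n) {
  float s = 0.0f;
  for (int i = 0; i < n; ++i)
    s += x[i];
  return s;
}
```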
00:28:33.460 | - How did the whole virtual machine thing change
00:28:35.740 | the compilation pipeline?
00:28:37.980 | - Yeah, so what the Java virtual machine does
00:28:40.420 | is it splits,
00:28:43.100 | just like I was talking about before,
00:28:44.140 | where you have a front end that parses the code
00:28:46.260 | and then you have an intermediate representation
00:28:47.980 | that gets transformed.
00:28:49.420 | What Java did was they said,
00:28:50.940 | we will parse the code
00:28:51.980 | and then compile to what's known as Java bytecode.
00:28:54.660 | And that bytecode is now a portable code representation
00:28:58.540 | that is industry standard and locked down
00:29:00.840 | and can't change.
00:29:02.380 | And then the back part of the compiler
00:29:05.020 | that does optimization and code generation
00:29:07.260 | can now be built by different vendors.
00:29:09.220 | And Java bytecode can be shipped around across the wire.
00:29:12.980 | It's memory safe and relatively trusted.
00:29:15.840 | - And because of that, it can run in the browser.
00:29:18.660 | - And that's why it runs in the browser, right?
00:29:20.500 | And so that way you can be in,
00:29:22.940 | again, back in the day,
00:29:23.820 | you would write a Java applet
00:29:24.980 | and you'd use as a web developer,
00:29:27.740 | you'd build this mini app that would run on a webpage.
00:29:30.840 | Well, a user of that is running a web browser
00:29:33.600 | on their computer.
00:29:34.440 | You download that Java bytecode,
00:29:36.160 | which can be trusted,
00:29:37.840 | and then you do all the compiler stuff on your machine
00:29:41.040 | so that you know that you trust that.
00:29:42.400 | - Now, is that a good idea or a bad idea?
00:29:44.040 | - It's a great idea.
00:29:44.880 | I mean, it's a great idea for certain problems.
00:29:46.200 | And I'm very much a believer
00:29:48.200 | that technology is itself neither good nor bad.
00:29:50.480 | It's how you apply it.
00:29:51.600 | You know, this would be a very, very bad thing
00:29:54.620 | for very low levels of the software stack,
00:29:56.920 | but in terms of solving
00:29:58.880 | some of these software portability and transparency
00:30:01.380 | problems,
00:30:02.740 | I think it's been really good.
00:30:04.180 | Now, Java ultimately didn't win out on the desktop.
00:30:06.540 | And like, there are good reasons for that,
00:30:09.340 | but it's been very successful on servers
00:30:12.020 | and in many places,
00:30:13.160 | it's been a very successful thing over decades.
00:30:16.260 | - So what have been LLVM's
00:30:21.260 | and C-Lang's improvements in optimization
00:30:26.100 | throughout their history?
00:30:28.620 | What are some moments where you sat back
00:30:31.040 | and were really proud of what's been accomplished?
00:30:33.260 | - Yeah, I think that the interesting thing about LLVM
00:30:36.140 | is not the innovations in compiler research.
00:30:40.100 | It has very good implementations
00:30:41.860 | of various important algorithms, no doubt.
00:30:43.960 | And a lot of really smart people have worked on it.
00:30:48.240 | But I think that the thing that's most profound about LLVM
00:30:50.540 | is that through standardization,
00:30:52.540 | it made things possible
00:30:53.780 | that otherwise wouldn't have happened, okay?
00:30:56.220 | And so interesting things that have happened with LLVM,
00:30:59.060 | for example, Sony has picked up LLVM
00:31:01.220 | and used it to do all the graphics compilation
00:31:03.900 | in their movie production pipeline.
00:31:06.060 | And so now they're able to have better special effects
00:31:07.880 | because of LLVM.
00:31:09.620 | That's kind of cool.
00:31:11.140 | That's not what it was designed for, right?
00:31:12.980 | But that's the sign of good infrastructure
00:31:15.460 | when it can be used in ways it was never designed for
00:31:18.740 | because it has good layering and software engineering
00:31:20.940 | and it's composable and things like that.
00:31:23.400 | - Which is where, as you said, it differs from GCC.
00:31:26.100 | - Yes, GCC is also great in various ways,
00:31:28.220 | but it's not as good as infrastructure technology.
00:31:31.780 | It's really a C compiler or it's a Fortran compiler.
00:31:36.140 | It's not infrastructure in the same way.
00:31:38.700 | - Now you can tell I don't know what I'm talking about
00:31:41.540 | because I keep saying C-lang.
00:31:43.660 | You can always tell when a person is close,
00:31:48.080 | by the way, pronounce something.
00:31:49.420 | I don't think, have I ever used clang?
00:31:52.580 | - Entirely possible.
00:31:53.520 | Have you, well, so you've used code,
00:31:55.680 | it's generated probably.
00:31:58.200 | So Clang and LLVM are used to compile
00:32:01.760 | all the apps on the iPhone, effectively, and the OSes.
00:32:05.240 | It compiles Google's production server applications.
00:32:09.380 | It's used to build GameCube games and PlayStation 4
00:32:14.880 | and things like that.
00:32:16.680 | - So as a user I have, but just everything I've done
00:32:20.140 | that I experienced with Linux has been,
00:32:22.120 | I believe, always GCC.
00:32:23.560 | - Yeah, I think Linux still defaults to GCC.
00:32:26.520 | - And is there a reason for that?
00:32:27.800 | Or is it because, I mean, is there a reason for that?
00:32:29.440 | - It's a combination of technical and social reasons.
00:32:32.040 | Many Linux developers do use clang,
00:32:35.960 | but the distributions for lots of reasons
00:32:39.640 | use GCC historically and they've not switched.
00:32:43.380 | - And just anecdotally online,
00:32:46.640 | it seems that LLVM has either reached the level GCC
00:32:50.640 | or superseded on different features or whatever.
00:32:53.520 | - The way I would say it is that they're so close
00:32:55.200 | it doesn't matter.
00:32:56.040 | - Yeah, exactly.
00:32:56.860 | - Like they're slightly better in some ways,
00:32:58.120 | slightly worse in other ways,
00:32:59.160 | but it doesn't actually really matter anymore at that level.
00:33:03.200 | - So in terms of optimization breakthroughs,
00:33:06.280 | it's just been solid incremental work.
00:33:09.160 | - Yeah, yeah, which describes a lot of compilers.
00:33:12.520 | The hard thing about compilers in my experience
00:33:14.960 | is the engineering, the software engineering,
00:33:17.400 | making it so that you can have hundreds of people
00:33:20.120 | collaborating on really detailed low-level work
00:33:23.560 | and scaling that, and that's really hard.
00:33:27.840 | And that's one of the things I think LLVM has done well.
00:33:30.640 | And that kind of goes back to the original design goals
00:33:34.160 | with it to be modular and things like that.
00:33:37.120 | And incidentally, I don't want to take all the credit
00:33:38.840 | for this, right?
00:33:39.680 | I mean, some of the best parts about LLVM
00:33:41.720 | is that it was designed to be modular.
00:33:43.560 | And when I started, I would write, for example,
00:33:45.560 | a register allocator, and then somebody much smarter than me
00:33:48.460 | would come in and pull it out and replace it
00:33:50.680 | with something else that they would come up with.
00:33:52.640 | And because it's modular, they were able to do that.
00:33:55.160 | And that's one of the challenges with GCC, for example,
00:33:58.240 | is replacing subsystems is incredibly difficult.
00:34:01.240 | It can be done, but it wasn't designed for that.
00:34:04.640 | And that's one of the reasons that LLVM's been
00:34:06.040 | very successful in the research world as well.
00:34:08.720 | - But in a community sense, Guido van Rossum, right,
00:34:12.960 | from Python, just retired from, what is it,
00:34:18.460 | Benevolent Dictator for Life, right?
00:34:20.500 | So in managing this community of brilliant compiler folks,
00:34:24.380 | did it, for a time at least, fall on you to approve things?
00:34:30.980 | - Oh yeah, so I mean, I've still committed
00:34:34.020 | an order of magnitude more patches in LLVM
00:34:37.980 | than anybody else.
00:34:39.020 | And many of those I wrote myself.
00:34:42.780 | - But you still write, I mean, you're still close
00:34:46.660 | to the, I don't know what the expression is,
00:34:49.460 | to the metal, you still write code.
00:34:51.020 | - Yeah, I still write code.
00:34:52.220 | Not as much as I was able to in grad school,
00:34:54.260 | but that's an important part of my identity.
00:34:56.780 | But the way that LLVM has worked over time
00:34:58.860 | is that when I was a grad student, I could do all the work
00:35:01.340 | and steer everything and review every patch
00:35:04.140 | and make sure everything was done exactly
00:35:06.860 | the way my opinionated sense felt like it should be done.
00:35:10.620 | And that was fine.
00:35:11.740 | But at scale, you can't do that, right?
00:35:14.300 | And so what ends up happening is LLVM has a hierarchical
00:35:18.060 | system of what's called code owners.
00:35:20.540 | These code owners are given the responsibility
00:35:22.900 | not to do all the work,
00:35:24.900 | not necessarily to review all the patches,
00:35:26.660 | but to make sure that the patches do get reviewed
00:35:28.820 | and make sure that the right thing's happening
00:35:30.340 | architecturally in their area.
00:35:32.180 | And so what you'll see is you'll see that,
00:35:34.660 | for example, hardware manufacturers end up owning
00:35:38.540 | the hardware specific parts of their hardware.
00:35:43.620 | That's very common.
00:35:44.500 | Leaders in the community that have done really good work
00:35:47.740 | naturally become the de facto owner of something.
00:35:50.900 | And then usually somebody else is like,
00:35:53.420 | how about we make them the official co-donor?
00:35:55.540 | And then we'll have somebody to make sure
00:35:58.620 | that all the patches get reviewed in a timely manner.
00:36:00.300 | And then everybody's like, yes, that's obvious.
00:36:02.060 | And then it happens, right?
00:36:03.220 | And usually this is a very organic thing, which is great.
00:36:06.060 | And so I'm nominally the top of that stack still,
00:36:08.740 | but I don't spend a lot of time reviewing patches.
00:36:11.540 | What I do is I help negotiate a lot of the
00:36:16.140 | technical disagreements that end up happening
00:36:18.220 | and making sure that the community as a whole
00:36:19.660 | makes progress and is moving in the right direction
00:36:22.060 | and doing that.
00:36:23.900 | So we also started a nonprofit six years ago, seven years ago.
00:36:28.900 | Time's gone away.
00:36:30.820 | And the nonprofit, the LLVM Foundation nonprofit
00:36:34.020 | helps oversee all the business sides of things
00:36:35.940 | and make sure that the events that the LLVM community has
00:36:38.820 | are funded and set up and run correctly
00:36:41.580 | and stuff like that.
00:36:42.780 | But the foundation is very much,
00:36:44.580 | stays out of the technical side of
00:36:46.940 | where the project is going.
00:36:49.060 | - Right, so it sounds like a lot of it is just organic.
00:36:52.100 | Just...
00:36:53.180 | - Yeah, well, and this is, LLVM is almost 20 years old,
00:36:55.700 | which is hard to believe.
00:36:56.580 | Somebody pointed out to me recently that
00:36:58.500 | LLVM is now older than GCC was when LLVM started, right?
00:37:04.340 | So time has a way of getting away from you.
00:37:06.860 | But the good thing about that is it has a really robust,
00:37:10.420 | really amazing community of people that are
00:37:13.540 | in their professional lives,
00:37:14.700 | spread across lots of different companies,
00:37:16.300 | but it's a community of people that are interested
00:37:19.780 | in similar kinds of problems
00:37:21.140 | and have been working together effectively for years
00:37:23.660 | and have a lot of trust and respect for each other.
00:37:26.460 | And even if they don't always agree,
00:37:28.940 | we're able to find a path forward.
00:37:31.180 | - So then in a slightly different flavor of effort,
00:37:34.500 | you started at Apple in 2005 with the task of making,
00:37:38.940 | I guess, LLVM production ready.
00:37:41.820 | And then eventually 2013 through 2017,
00:37:44.660 | leading the entire developer tools department.
00:37:48.380 | We're talking about LLVM, Xcode, Objective-C to Swift.
00:37:52.980 | So in a quick overview of your time there,
00:37:58.620 | what were the challenges?
00:37:59.660 | First of all, leading such a huge group of developers,
00:38:03.260 | what was the big motivator dream mission
00:38:06.580 | behind creating Swift, the early birth of it
00:38:11.420 | from Objective-C and so on and Xcode?
00:38:13.460 | What are the challenges?
00:38:14.300 | - So these are different questions.
00:38:15.940 | - Yeah, I know, but I wanna talk about the other stuff too.
00:38:19.780 | - I'll stay on the technical side,
00:38:21.260 | then we can talk about the big team pieces, if that's okay.
00:38:24.500 | - So, to really oversimplify many years of hard work:
00:38:27.700 | LLVM started, I joined Apple, it became a thing,
00:38:32.460 | became successful and became deployed.
00:38:34.580 | But then there's a question about
00:38:35.980 | how do we actually parse the source code?
00:38:38.900 | So LLVM is that back part,
00:38:40.300 | the optimizer and the code generator.
00:38:42.340 | And LLVM is really good for Apple
00:38:44.060 | as it went through a couple of hardware transitions.
00:38:46.060 | I joined right at the time of the Intel transition,
00:38:47.940 | for example, and 64-bit transitions,
00:38:51.820 | and then the transition to ARM with the iPhone.
00:38:53.500 | And so LLVM was very useful
00:38:54.700 | for some of these kinds of things.
00:38:56.860 | But at the same time, there's a lot of questions
00:38:58.460 | around developer experience.
00:39:00.100 | And so if you're a programmer pounding out
00:39:01.900 | at the time objective C code,
00:39:03.420 | the error message you get, the compile time,
00:39:06.500 | the turnaround cycle, the tooling and the IDE
00:39:09.700 | were not great, were not as good as they could be.
00:39:12.980 | And so, as I occasionally do, I'm like,
00:39:17.980 | well, okay, how hard is it to write a C compiler?
00:39:20.660 | And so I'm not gonna commit to anybody,
00:39:22.540 | I'm not gonna tell anybody,
00:39:23.380 | I'm just gonna just do it on nights and weekends
00:39:25.980 | and start working on it.
00:39:27.420 | And then I built it up, and in C there's this thing
00:39:30.140 | called the preprocessor, which people don't like,
00:39:32.980 | but it's actually really hard and complicated
00:39:35.460 | and includes a bunch of really weird things
00:39:37.660 | like trigraphs and other stuff like that
00:39:39.260 | that are really nasty.
00:39:40.920 | And it's the crux of a bunch of the performance issues
00:39:44.020 | in the compiler.
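
[Editor's note: a classic, tiny example of why the preprocessor is tricky: it rewrites tokens before the compiler proper ever sees them.]

```cpp
#include <cstdio>

// The preprocessor substitutes tokens, not values:
#define SQUARE(x) x * x

int main() {
  int a = SQUARE(1 + 2);        // expands to 1 + 2 * 1 + 2, which is 5
  int b = ((1 + 2) * (1 + 2));  // the 9 the programmer probably wanted
  std::printf("%d %d\n", a, b);
  return 0;
}
```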
00:39:45.620 | Started working on the parser
00:39:46.580 | and kind of got to the point where I'm like,
00:39:47.780 | oh, you know what, we could actually do this.
00:39:49.820 | Everybody's saying that this is impossible to do,
00:39:51.420 | but it's actually just hard, it's not impossible.
00:39:53.900 | And eventually told my manager about it,
00:39:57.540 | and he's like, oh, wow, this is great,
00:39:59.180 | we do need to solve this problem.
00:40:00.300 | Oh, this is great, we can get you one other person
00:40:02.500 | to work with you on this.
00:40:04.420 | And slowly a team is formed and it starts taking off.
00:40:08.300 | And C++, for example, huge complicated language,
00:40:12.020 | people always assume that it's impossible to implement,
00:40:14.320 | and it's very nearly impossible,
00:40:16.220 | but it's just really, really hard.
00:40:18.700 | And the way to get there is to build it
00:40:20.820 | one piece at a time, incrementally.
00:40:22.420 | And that was only possible because we were lucky
00:40:26.380 | to hire some really exceptional engineers
00:40:28.140 | that knew various parts of it very well
00:40:30.340 | and could do great things.
00:40:32.640 | Swift was kind of a similar thing.
00:40:34.420 | So Swift came from, we were just finishing off
00:40:39.140 | the first version of C++ support in Clang.
00:40:42.580 | And C++ is a very formidable and very important language,
00:40:47.220 | but it's also ugly in lots of ways.
00:40:49.260 | And you can't implement C++ without thinking
00:40:52.340 | there has to be a better thing, right?
00:40:54.380 | And so I started working on Swift, again,
00:40:56.140 | with no hope or ambition that it would go anywhere,
00:40:58.540 | just let's see what could be done,
00:41:00.820 | let's play around with this thing.
00:41:02.620 | It was me in my spare time,
00:41:04.860 | not telling anybody about it kind of a thing.
00:41:08.220 | And it made some good progress.
00:41:09.420 | I'm like, actually, it would make sense to do this.
00:41:11.260 | At the same time, I started talking with the senior VP
00:41:14.800 | of software at the time, a guy named Bertrand Serlet.
00:41:17.700 | And Bertrand was very encouraging.
00:41:19.260 | He was like, well, let's have fun, let's talk about this.
00:41:22.080 | And he was a little bit of a language guy.
00:41:23.460 | And so he helped guide some of the early work
00:41:26.140 | and encouraged me and got things off the ground.
00:41:30.420 | And eventually, I told my manager and told other people,
00:41:34.420 | and it started making progress.
00:41:38.780 | The complicating thing with Swift was that
00:41:41.740 | the idea of doing a new language is not obvious to anybody,
00:41:45.460 | including myself.
00:41:46.560 | And the tone at the time was that the iPhone was successful
00:41:50.860 | because of Objective-C, right?
00:41:53.420 | - Oh, interesting, in Objective-C.
00:41:54.420 | Not despite of it,
00:41:55.540 | but because of it. - Correct.
00:41:57.140 | And you have to understand that at the time,
00:41:59.880 | Apple was hiring software people that loved Objective-C.
00:42:05.060 | Right, and it wasn't that they came despite Objective-C.
00:42:07.940 | They loved Objective-C, and that's why they got hired.
00:42:10.020 | And so you had a software team whose leadership
00:42:12.540 | in many cases went all the way back to NeXT,
00:42:15.180 | where Objective-C really became real.
00:42:19.380 | And so they quote unquote grew up writing Objective-C.
00:42:23.220 | And many of the individual engineers all were hired
00:42:26.460 | because they loved Objective-C.
00:42:28.300 | And so this notion of, okay, let's do new language
00:42:30.540 | was kind of heretical in many ways, right?
00:42:34.100 | Meanwhile, my sense was that the outside community
00:42:36.500 | wasn't really in love with Objective-C.
00:42:37.860 | Some people were, and some of the most outspoken people were,
00:42:40.060 | but other people were hitting challenges
00:42:42.620 | because it has very sharp corners and it's difficult to learn.
00:42:46.460 | And so one of the challenges of making Swift happen
00:42:50.060 | that was totally non-technical is the social part
00:42:54.700 | of what do we do?
00:42:57.500 | Like if we do a new language, and at Apple,
00:42:59.820 | many things happen that don't ship, right?
00:43:02.200 | So if we ship it, what are the metrics of success?
00:43:05.740 | Why would we do this?
00:43:06.560 | Why wouldn't we make Objective-C better?
00:43:07.940 | If Objective-C has problems,
00:43:09.220 | let's file off those rough corners and edges.
00:43:12.120 | And one of the major things that became the reason
00:43:15.080 | to do this was this notion of safety, memory safety.
00:43:18.940 | And the way Objective-C works is that a lot
00:43:22.140 | of the object system and everything else
00:43:24.100 | is built on top of pointers in C.
00:43:27.500 | Objective-C is an extension on top of C.
00:43:29.920 | And so pointers are unsafe.
00:43:32.620 | And if you get rid of the pointers,
00:43:34.580 | it's not Objective-C anymore.
00:43:36.420 | And so fundamentally, that was an issue
00:43:39.020 | that you could not fix safety or memory safety
00:43:42.140 | without fundamentally changing the language.
00:43:45.580 | And so once we got through that part
00:43:47.300 | of the mental process and the thought process,
00:43:50.900 | it became a design process of saying,
00:43:53.500 | okay, well, if we're gonna do something new, what is good?
00:43:56.300 | Like, how do we think about this?
00:43:57.420 | And what do we like?
00:43:58.240 | And what are we looking for?
00:43:59.180 | And that was a very different phase of it.
00:44:02.420 | - So what are some design choices early on in Swift?
00:44:05.920 | Like we're talking about braces.
00:44:08.080 | Are you making a typed language or not?
00:44:12.020 | All those kinds of things.
00:44:13.220 | - Yeah, so some of those were obvious given the context.
00:44:16.020 | So a typed language, for example,
00:44:17.780 | Objective-C is a typed language.
00:44:19.180 | And going with an untyped language
00:44:20.820 | wasn't really seriously considered.
00:44:24.260 | We wanted the performance and we wanted refactoring tools
00:44:27.200 | and other things like that to go with typed languages.
00:44:29.580 | - Quick dumb question.
00:44:31.400 | Was it obvious, I think this would be a dumb question,
00:44:34.580 | but was it obvious that the language
00:44:36.300 | has to be a compiled language?
00:44:38.020 | - Yes, that's not a dumb question.
00:44:42.020 | Earlier, I think late '90s,
00:44:43.660 | Apple had seriously considered moving
00:44:45.380 | its development experience to Java.
00:44:47.400 | But Swift started in 2010,
00:44:51.700 | which was several years after the iPhone.
00:44:53.820 | It was when the iPhone was definitely on an upward trajectory
00:44:56.580 | and the iPhone was still extremely,
00:44:58.700 | and still is a bit, memory constrained, right?
00:45:01.780 | And so being able to compile the code and then ship it
00:45:05.460 | and then having standalone code that is not JIT compiled
00:45:09.580 | is a very big deal and is very much part
00:45:12.180 | of the Apple value system.
00:45:15.180 | Now, JavaScript's also a thing, right?
00:45:17.460 | I mean, it's not that this is exclusive
00:45:19.340 | and technologies are good
00:45:21.260 | depending on how they're applied, right?
00:45:23.900 | But in the design of Swift,
00:45:26.060 | saying like, how can we make Objective-C better, right?
00:45:28.300 | Objective-C is statically compiled
00:45:29.580 | and that was the contiguous, natural thing to do.
00:45:32.500 | - Just to skip ahead a little bit,
00:45:34.660 | and we'll come right back, just as a question:
00:45:36.580 | as you think about today, in 2019,
00:45:40.060 | in your work at Google, TensorFlow and so on,
00:45:42.380 | is, again, compilation, static compilation,
00:45:46.620 | still the right thing?
00:45:49.420 | - Yeah, so the funny thing after working on compilers
00:45:52.540 | for a really long time is that,
00:45:54.820 | and this is one of the things that LLVM has helped with,
00:45:58.980 | is that I don't look at compilation as being static
00:46:02.460 | or dynamic or interpreted or not.
00:46:05.220 | This is a spectrum.
00:46:07.780 | And one of the cool things about Swift
00:46:09.100 | is that Swift is not just statically compiled.
00:46:12.180 | It's actually dynamically compiled as well.
00:46:14.140 | And it can also be interpreted,
00:46:15.260 | though nobody's actually done that.
00:46:17.500 | And so what ends up happening
00:46:20.140 | when you use Swift in a workbook,
00:46:22.100 | for example, in Colab or in Jupyter,
00:46:24.120 | is it's actually dynamically compiling the statements
00:46:26.340 | as you execute them.
00:46:28.180 | And so this gets back to the software engineering problems,
00:46:32.540 | right, where if you layer the stack properly,
00:46:34.940 | you can actually completely change
00:46:37.280 | how and when things get compiled
00:46:38.940 | because you have the right abstractions there.
00:46:41.060 | And so the way that a Colab workbook works with Swift
00:46:44.800 | is that when you start typing into it,
00:46:47.740 | it creates a process, a Unix process,
00:46:50.260 | and then each line of code you type in,
00:46:52.220 | it compiles it through the Swift compiler,
00:46:54.660 | the front-end part,
00:46:56.180 | and then sends it through the optimizer,
00:46:58.380 | JIT compiles machine code,
00:47:00.700 | and then injects it into that process.
00:47:03.860 | And so as you're typing new stuff,
00:47:05.380 | it's like squirting in new code
00:47:08.060 | and overwriting and replacing and updating code in place.
00:47:11.260 | And the fact that it can do this is not an accident.
00:47:13.500 | Like Swift was designed for this,
00:47:15.620 | but it's an important part of how the language was set up
00:47:18.100 | and how it's layered, and this is a non-obvious piece.
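To make that flow concrete, here is a rough sketch of what the incremental model looks like from the user's side; the cell boundaries and names are illustrative, not the actual Colab internals:

```swift
// Cell 1: this statement is parsed by the Swift front-end,
// run through the optimizer, JIT-compiled to machine code,
// and injected into the live Unix process when executed.
var greeting = "Hello"

// Cell 2: compiled later as a separate unit, but it sees
// the state that the first cell already injected.
greeting += ", Swift"
print(greeting)  // prints "Hello, Swift"

// Re-running an edited cell replaces the previously injected
// code in place, the "overwriting and updating" step above.
```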
00:47:21.340 | And one of the things with Swift that was,
00:47:24.280 | for me, a very strong design point
00:47:25.880 | is to make it so that you can learn it very quickly.
00:47:29.620 | And so from a language design perspective,
00:47:32.060 | the thing that I always come back to is this UI principle
00:47:34.500 | of progressive disclosure of complexity.
00:47:37.900 | And so in Swift, you can start by saying print,
00:47:41.260 | quote, hello world, quote, right?
00:47:43.980 | And there's no slash N, just like Python,
00:47:46.500 | one line of code, no main, no--
00:47:48.380 | - No header files.
00:47:49.220 | - No header files, no public static void main,
00:47:51.540 | blah, blah, blah, String, like Java has, right?
00:47:54.340 | So one line of code, right?
00:47:55.600 | And you can teach that and it works great.
00:47:58.420 | Then you can say, well, let's introduce variables.
00:48:00.260 | And so you can declare a variable with var,
00:48:02.380 | so var x equals four, what is a variable?
00:48:04.660 | You can use x, x plus one, this is what it means.
00:48:07.700 | Then you can say, well, how about control flow?
00:48:09.500 | Well, this is what an if statement is.
00:48:10.840 | This is what a for statement is.
00:48:12.260 | This is what a while statement is.
00:48:13.960 | Then you can say, let's introduce functions, right?
00:48:17.260 | And many languages like Python have had this kind of notion
00:48:21.500 | of let's introduce small things
00:48:22.820 | and then you can add complexity,
00:48:24.380 | then you can introduce classes,
00:48:25.740 | and then you can add generics in the case of Swift,
00:48:28.060 | and then you can, with modules,
00:48:29.500 | build out in terms of the things that you're expressing.
00:48:32.220 | But this is not very typical for compiled languages.
00:48:35.740 | And so this was a very strong design point
00:48:38.020 | and one of the reasons that Swift in general
00:48:40.980 | is designed with this factoring of complexity in mind
00:48:43.500 | so that the language can express powerful things.
00:48:46.460 | You can write firmware in Swift if you want to,
00:48:49.300 | but it has a very high level feel,
00:48:51.900 | which is really this perfect blend
00:48:53.780 | because often you have very advanced library writers
00:48:57.440 | that want to be able to use the nitty gritty details,
00:49:00.500 | but then other people just want to use the libraries
00:49:02.940 | and work at a higher abstraction level.
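As a rough illustration of that progression, each step below is a complete, valid Swift program on its own; the example names are ours, not from the conversation:

```swift
// Step 1: one line is a whole program. No main, no headers.
print("Hello world")

// Step 2: introduce variables.
var x = 4
x = x + 1

// Step 3: introduce control flow.
if x > 4 {
    print("x is big")
}

// Step 4: introduce functions.
func square(_ n: Int) -> Int {
    return n * n
}
print(square(x))  // prints 25

// Step 5: introduce generics only when you need them.
func firstElement<T>(_ items: [T]) -> T? {
    return items.first
}
print(firstElement([1, 2, 3]) ?? 0)  // prints 1
```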
00:49:04.900 | - It's kind of cool that I saw
00:49:06.100 | that you can just interoperability.
00:49:09.180 | I don't think I pronounced that word right,
00:49:11.340 | but you can just drag in Python.
00:49:13.760 | It's just straight, you can import,
00:49:16.760 | like I saw this in the demo, you just import it.
00:49:19.620 | How do you make that happen?
00:49:20.780 | - Yeah, well.
00:49:21.620 | - What's up with, is that as easy as it looks or is it?
00:49:25.540 | - Yes, as easy as it looks.
00:49:26.540 | That's not a stage magic hack or anything like that.
00:49:29.420 | - No, no, I don't mean from the user perspective,
00:49:31.380 | I mean from the implementation perspective,
00:49:33.180 | to make it happen.
00:49:34.100 | - So it's easy once all the pieces are in place.
00:49:36.980 | The way it works, so if you think about
00:49:38.500 | a dynamically typed language like Python,
00:49:40.620 | you can think about it in two different ways.
00:49:42.140 | You can say it has no types,
00:49:44.460 | which is what most people would say,
00:49:49.100 | or you can say it has one type,
00:49:53.700 | and it's like the Python object.
00:49:53.700 | The Python object gets passed around
00:49:55.020 | and because there's only one type, it's implicit.
00:49:57.420 | And so what happens with Swift
00:50:00.220 | and Python talking to each other,
00:50:01.340 | Swift has lots of types, it has arrays and it has strings
00:50:04.300 | and all like classes and that kind of stuff,
00:50:07.100 | but it now has a Python object type.
00:50:11.140 | So there is one Python object type.
00:50:12.780 | And so when you say import NumPy,
00:50:15.860 | what you get is a Python object, which is the NumPy module.
00:50:19.900 | And then you say np.array, and it says,
00:50:22.540 | okay, hey, Python object, I have no idea what you are,
00:50:24.980 | give me your array member.
00:50:26.280 | Okay, cool.
00:50:28.120 | And it just uses dynamic stuff,
00:50:30.100 | talks to the Python interpreter and says,
00:50:31.860 | hey, Python, what's the .array member in that Python object?
00:50:35.700 | It gives you back another Python object.
00:50:37.380 | And now you say parentheses for the call
00:50:39.500 | and the arguments you're gonna pass.
00:50:40.580 | And so then it says, hey, a Python object
00:50:43.540 | that is the result of np.array, call with these arguments.
00:50:48.020 | Again, calling into the Python interpreter to do that work.
00:50:50.180 | And so right now, this is all really simple.
00:50:53.620 | And if you dive into the code,
00:50:55.500 | what you'll see is that the Python module in Swift
00:50:58.420 | is something like 1200 lines of code or something.
00:51:01.340 | It's written in pure Swift.
00:51:02.340 | It's super simple.
00:51:03.540 | And it's built on top of the C interoperability
00:51:06.540 | because it just talks to the Python interpreter.
00:51:09.500 | But making that possible required us
00:51:11.180 | to add two major language features to Swift
00:51:13.460 | to be able to express these dynamic calls
00:51:15.360 | and the dynamic member lookups.
00:51:17.180 | And so what we've done over the last year
00:51:19.500 | is we've proposed, implemented, standardized,
00:51:23.060 | and contributed new language features to the Swift language
00:51:26.100 | in order to make it so it is really trivial.
00:51:28.300 | Right, and this is one of the things about Swift
00:51:31.340 | that is critical to the Swift for TensorFlow work,
00:51:34.820 | which is that we can actually add new language features.
00:51:37.140 | And the bar for adding those is high,
00:51:39.140 | but it's what makes it possible.
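For reference, this is roughly what that looks like in a Swift for TensorFlow toolchain, where the pure-Swift Python module described above lives; the two language features Chris refers to are @dynamicMemberLookup and @dynamicCallable, and the toy type below is ours, for illustration only:

```swift
import Python  // the ~1200-line pure-Swift module described above

let np = Python.import("numpy")    // a PythonObject wrapping the module
let a = np.array([1.0, 2.0, 3.0])  // member lookup and call both go
print(a.sum())                     // through the Python interpreter

// A toy sketch of the language feature that makes "np.array" parse,
// not the real PythonObject implementation:
@dynamicMemberLookup
struct Dynamic {
    subscript(dynamicMember name: String) -> String {
        return "looked up \(name) at runtime"
    }
}
let d = Dynamic()
print(d.anything)  // prints "looked up anything at runtime"
```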
00:51:42.140 | - So you're now at Google doing incredible work
00:51:45.180 | on several things, including TensorFlow.
00:51:47.660 | So TensorFlow 2.0, or whatever's leading up to 2.0,
00:51:52.180 | has eager execution by default.
00:51:56.780 | And yet in order to make code optimized for GPU or TPU
00:52:00.500 | or some of these systems,
00:52:03.380 | computation needs to be converted to a graph.
00:52:05.940 | So what's that process like?
00:52:07.420 | What are the challenges there?
00:52:08.940 | - Yeah, so I'm tangentially involved in this,
00:52:11.700 | but the way that it works with autograph
00:52:15.220 | is that you mark your function with a decorator.
00:52:21.580 | And when Python calls it, that decorator is invoked.
00:52:24.220 | And then it says, before I call this function,
00:52:28.220 | you can transform it.
00:52:29.460 | And so the way autograph works is,
00:52:31.620 | as far as I understand, it actually uses the Python parser
00:52:34.420 | to go parse that, turn it into a syntax tree,
00:52:37.140 | and now apply compiler techniques to, again,
00:52:39.380 | transform this down into TensorFlow graphs.
00:52:42.260 | And so you can think of it as saying,
00:52:44.740 | hey, I have an if statement,
00:52:45.860 | I'm gonna create an if node in the graph,
00:52:48.340 | like you say, tf.cond.
00:52:51.060 | You have a multiply,
00:52:53.020 | well, I'll turn that into a multiply node in the graph,
00:52:55.300 | and it becomes this tree transformation.
00:52:57.700 | - So where does the Swift for TensorFlow come in?
00:53:01.260 | Which is, you know, parallels.
00:53:04.580 | You know, for one, Swift is an interface,
00:53:06.940 | like Python is an interface to TensorFlow,
00:53:09.220 | but it seems like there's a lot more going on
00:53:11.220 | than just a different language interface.
00:53:13.140 | There's optimization methodology.
00:53:14.900 | - Yeah, so the TensorFlow world has a couple of different,
00:53:19.400 | what I'd call front-end technologies.
00:53:21.180 | And so Swift and Python and Go and Rust and Julia
00:53:25.260 | and all these things share the TensorFlow graphs
00:53:29.300 | and all the runtime and everything that's later.
00:53:32.700 | And so Swift for TensorFlow is merely another front-end
00:53:36.660 | for TensorFlow, just like any of these other systems are.
00:53:40.620 | There's a major difference between,
00:53:42.700 | I would say, three camps of technologies here.
00:53:44.620 | There's Python, which is a special case,
00:53:46.820 | because the vast majority of the community effort
00:53:49.180 | is going to the Python interface.
00:53:51.120 | And Python has its own approaches
00:53:52.980 | for automatic differentiation,
00:53:54.520 | it has its own APIs and all this kind of stuff.
00:53:57.140 | There's Swift, which I'll talk about in a second.
00:54:00.200 | And then there's kind of everything else.
00:54:01.920 | And so the everything else are effectively language bindings.
00:54:05.400 | So they call into the TensorFlow runtime,
00:54:08.000 | but they're not,
00:54:08.840 | they usually don't have automatic differentiation
00:54:10.920 | or they usually don't provide anything other than APIs
00:54:14.560 | that call the C APIs in TensorFlow.
00:54:16.480 | And so they're kind of wrappers for that.
00:54:18.400 | Swift is really kind of special.
00:54:19.840 | And it's a very different approach.
00:54:21.860 | Swift for TensorFlow, that is,
00:54:24.120 | is a very different approach,
00:54:25.380 | because there we're saying,
00:54:26.660 | let's look at all the problems that need to be solved
00:54:29.040 | in the full stack of the TensorFlow compilation process,
00:54:34.040 | if you think about it that way.
00:54:35.680 | Because TensorFlow is fundamentally a compiler.
00:54:38.180 | It takes models, and then it makes them go fast on hardware.
00:54:42.760 | That's what a compiler does.
00:54:43.800 | And it has a front-end, it has an optimizer,
00:54:47.560 | and it has many back-ends.
00:54:49.320 | And so if you think about it the right way,
00:54:51.680 | or if you look at it in a particular way,
00:54:54.600 | like it is a compiler.
00:54:56.040 | And so Swift is merely another front-end.
00:55:02.120 | But it's saying, and the design principle is saying,
00:55:05.560 | let's look at all the problems that we face
00:55:07.600 | as machine learning practitioners,
00:55:09.760 | and what is the best possible way we can do that,
00:55:12.220 | given the fact that we can change
00:55:13.520 | literally anything in this entire stack.
00:55:15.920 | And Python, for example, where the vast majority
00:55:18.480 | of the engineering effort has gone,
00:55:22.460 | is constrained by being the best possible thing
00:55:24.900 | you can do with a Python library.
00:55:27.160 | Like there are no Python language features
00:55:29.320 | that are added because of machine learning
00:55:31.040 | that I'm aware of.
00:55:32.560 | They added a matrix multiplication operator with the @ sign,
00:55:35.120 | but that's as close as you get.
00:55:37.320 | And so with Swift, it's hard,
00:55:41.200 | but you can add language features to the language,
00:55:43.800 | and there's a community process for that.
00:55:45.840 | And so we look at these things and say,
00:55:48.040 | well, what is the right division of labor
00:55:49.720 | between the human programmer and the compiler?
00:55:51.980 | And Swift has a number of things that shift that balance.
00:55:55.280 | So because it has a type system, for example,
00:56:00.280 | it makes certain things possible for analysis of the code,
00:56:03.280 | and the compiler can automatically build graphs for you
00:56:06.640 | without you thinking about them.
00:56:08.680 | Like that's a big deal for a programmer.
00:56:10.520 | You just get free performance,
00:56:11.640 | you get clustering and fusion and optimization,
00:56:14.380 | things like that,
00:56:16.120 | without you as a programmer having to manually do it
00:56:18.160 | because the compiler can do it for you.
00:56:20.040 | Automatic differentiation is another big deal.
00:56:22.200 | And I think one of the key contributions
00:56:24.920 | of the Swift for TensorFlow project
00:56:27.240 | is that there's this entire body of work
00:56:30.860 | on automatic differentiation
00:56:32.080 | that dates back to the Fortran days.
00:56:34.560 | People doing a tremendous amount of numerical computing
00:56:36.360 | in Fortran used to write these,
00:56:38.480 | what they call source to source translators,
00:56:40.520 | where you take a bunch of code,
00:56:43.240 | shove it into a mini compiler,
00:56:45.080 | and it would push out more Fortran code,
00:56:48.040 | but it would generate the backwards passes
00:56:50.200 | for your functions for you, the derivatives.
00:56:52.800 | And so in that work in the '70s,
00:56:57.240 | a tremendous number of optimizations,
00:56:58.720 | a tremendous number of techniques
00:57:01.160 | for fixing numerical instability
00:57:02.920 | and other kinds of problems were developed,
00:57:05.080 | but they're very difficult to port into a world
00:57:07.600 | where in eager execution,
00:57:09.160 | you get one op at a time.
00:57:11.240 | Like you need to be able to look at an entire function
00:57:13.240 | and be able to reason about what's going on.
00:57:15.400 | And so when you have a language integrated
00:57:18.320 | automatic differentiation,
00:57:19.560 | which is one of the things
00:57:20.480 | that the Swift project is focusing on,
00:57:22.760 | you can open all these techniques
00:57:24.680 | and reuse them in familiar ways.
00:57:28.700 | But the language integration piece
00:57:30.160 | has a bunch of design room in it,
00:57:31.360 | and it's also complicated.
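As a minimal sketch of what language-integrated differentiation looks like, assuming a Swift toolchain with differentiable programming enabled (such as the Swift for TensorFlow toolchains; the exact attribute spelling has changed over time):

```swift
import _Differentiation  // not part of stock Swift at the time

@differentiable(reverse)
func loss(_ x: Double) -> Double {
    return x * x + 3 * x
}

// The compiler synthesizes the backwards pass from the whole
// function, rather than recording one eager op at a time.
let (value, grad) = valueWithGradient(at: 2.0, of: loss)
print(value)  // 10.0
print(grad)   // 7.0, since d/dx (x^2 + 3x) = 2x + 3
```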
00:57:33.240 | - The other piece of the puzzle here
00:57:34.880 | that's kind of interesting is TPUs at Google.
00:57:37.000 | - Yes.
00:57:37.840 | - So we're in a new world with deep learning.
00:57:40.200 | It constantly is changing,
00:57:41.640 | and I imagine, without disclosing anything,
00:57:45.020 | I imagine you're still innovating on the TPU front too.
00:57:48.440 | - Indeed.
00:57:49.280 | - So how much sort of interplay is there
00:57:52.240 | between software and hardware
00:57:53.600 | in trying to figure out how to together
00:57:55.000 | move towards an optimized solution?
00:57:56.720 | - There's an incredible amount.
00:57:57.760 | So we're on our third generation of TPUs,
00:57:59.480 | which are now 100 petaflops
00:58:01.640 | in a very large liquid-cooled box,
00:58:04.560 | a virtual box with no cover.
00:58:07.760 | And as you might imagine, we're not out of ideas yet.
00:58:11.280 | The great thing about TPUs
00:58:13.800 | is that they're a perfect example
00:58:15.440 | of hardware-software co-design.
00:58:17.580 | And so it's about saying,
00:58:19.240 | what hardware do we build to solve
00:58:20.800 | certain classes of machine learning problems?
00:58:23.880 | Well, the algorithms are changing.
00:58:26.320 | Like the hardware takes, you know,
00:58:28.160 | some cases years to produce, right?
00:58:30.160 | And so you have to make bets
00:58:31.640 | and decide what is going to happen.
00:58:34.080 | And so, and what is the best way
00:58:36.120 | to spend the transistors
00:58:37.280 | to get the maximum, you know,
00:58:38.840 | performance per watt or area per cost,
00:58:41.520 | or like whatever it is that you're optimizing for.
00:58:43.920 | And so one of the amazing things about TPUs
00:58:46.560 | is this numeric format called BFloat16.
00:58:50.000 | BFloat16 is a compressed 16-bit floating point format,
00:58:54.160 | but it puts the bits in different places.
00:58:56.040 | And in numeric terms,
00:58:57.120 | it has a smaller mantissa and a larger exponent.
00:59:00.360 | That means that it's less precise,
00:59:02.980 | but it can represent larger ranges of values,
00:59:05.720 | which in the machine learning context
00:59:07.300 | is really important and useful
00:59:08.620 | because sometimes you have very small gradients
00:59:13.080 | you want to accumulate
00:59:13.960 | and very, very small numbers
00:59:16.440 | that are important to move things as you're learning,
00:59:20.560 | but sometimes you have very large magnitude numbers as well.
00:59:23.240 | And BFloat16 is not as precise, the mantissa is small,
00:59:28.200 | but it turns out the machine learning algorithms
00:59:30.000 | actually want to generalize.
00:59:31.560 | And so there's, you know,
00:59:33.240 | theories that this actually increases
00:59:35.520 | the ability for the network to generalize across datasets.
00:59:38.000 | And regardless of whether it's good or bad,
00:59:41.160 | it's much cheaper at the hardware level to implement
00:59:43.720 | because the area and time of a multiplier
00:59:48.100 | is N squared in the number of bits in the mantissa,
00:59:53.360 | but it's linear with the size of the exponent.
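For reference, the bit layouts being compared and the cost scaling Chris describes, written out; the bfloat16 and IEEE float16 layouts are standard, and m and e below denote mantissa and exponent widths:

```latex
\text{bfloat16}: \; 1\ \text{sign bit} + 8\ \text{exponent bits} + 7\ \text{mantissa bits}
\text{float16}: \; 1\ \text{sign bit} + 5\ \text{exponent bits} + 10\ \text{mantissa bits}

\text{multiplier area/time} = O(m^2), \qquad \text{exponent handling} = O(e)
```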
00:59:53.360 | - And you're connected to both efforts here,
00:59:55.600 | both on the hardware and the software side?
00:59:57.200 | - Yeah, and so that was a breakthrough
00:59:58.920 | coming from the research side
01:00:01.040 | and people working on optimizing network transport
01:00:04.740 | of weights across the network originally
01:00:07.920 | and trying to find ways to compress that.
01:00:10.160 | But then it got burned into silicon
01:00:12.160 | and it's a key part of what makes TPU performance so amazing
01:00:15.320 | and great.
01:00:17.880 | Now TPUs have many different aspects of it
01:00:19.880 | that are important,
01:00:20.720 | but the co-design between the low-level compiler bits
01:00:25.080 | and the software bits and the algorithms
01:00:27.400 | is all super important.
01:00:28.680 | And it's this amazing trifecta that only Google can do.
01:00:32.880 | - Yeah, that's super exciting.
01:00:34.240 | So can you tell me about MLIR project,
01:00:38.480 | previously the secretive one?
01:00:41.440 | - Yeah, so MLIR is a project that we announced
01:00:43.600 | at a compiler conference three weeks ago or something,
01:00:47.760 | the Compilers for Machine Learning Conference.
01:00:50.840 | Basically, again, if you look at TensorFlow
01:00:52.680 | as a compiler stack,
01:00:53.720 | it has a number of compiler algorithms within it.
01:00:56.140 | It also has a number of compilers that get embedded into it
01:00:59.040 | and they're made by different vendors.
01:01:00.480 | For example, Google has XLA,
01:01:02.640 | which is a great compiler system.
01:01:04.720 | NVIDIA has TensorRT, Intel has nGraph.
01:01:08.640 | There's a number of these different compiler systems
01:01:10.800 | and they're very hardware specific
01:01:13.800 | and they're trying to solve different parts of the problems,
01:01:16.440 | but they're all kind of similar in a sense
01:01:19.080 | of they wanna integrate with TensorFlow.
01:01:20.840 | Now TensorFlow has an optimizer
01:01:22.880 | and it has these different code generation technologies
01:01:25.480 | built in.
01:01:26.400 | The idea of MLIR is to build a common infrastructure
01:01:28.680 | to support all these different subsystems.
01:01:31.080 | And initially it's to be able to make it
01:01:33.560 | so that they all plug in together
01:01:34.840 | and can share a lot more code and be reusable.
01:01:37.840 | But over time,
01:01:38.680 | we hope that the industry will start collaborating
01:01:41.640 | and sharing code.
01:01:42.480 | And instead of reinventing the same things
01:01:43.960 | over and over again,
01:01:45.240 | that we can actually foster some of that,
01:01:47.280 | working-together-to-solve-common-problems energy
01:01:51.360 | that has been useful in the compiler field before.
01:01:54.440 | Beyond that, MLIR is,
01:01:56.360 | some people have joked that it's kind of LLVM 2.
01:01:59.240 | It learns a lot from what LLVM has been good at
01:02:01.800 | and what LLVM has done wrong.
01:02:04.280 | And it's a chance to fix that.
01:02:06.400 | And also there are challenges in the LLVM ecosystem as well,
01:02:09.800 | where LLVM is very good at the thing it was designed to do,
01:02:12.680 | but 20 years later, the world has changed
01:02:15.520 | and people are trying to solve higher level problems
01:02:17.360 | and we need some new technology.
01:02:20.280 | - And what's the future of open source in this context?
01:02:24.680 | - Very soon.
01:02:25.680 | So it is not yet open source,
01:02:27.440 | but it will be hopefully in the next couple of months.
01:02:29.320 | - So you still believe in the value of open source
01:02:31.000 | in these kinds of contexts?
01:02:31.840 | - Oh yeah, absolutely.
01:02:32.680 | And I think that the TensorFlow community at large
01:02:36.080 | fully believes in open source.
01:02:37.640 | - So I mean, there is a difference between Apple,
01:02:40.080 | where you were previously, and Google now,
01:02:42.360 | in spirit and culture.
01:02:43.480 | And I would say the open sourcing of TensorFlow
01:02:45.440 | was a seminal moment in the history of software
01:02:48.400 | 'cause here's this large company
01:02:51.000 | releasing a very large code base and open sourcing it.
01:02:55.880 | What are your thoughts on that?
01:02:57.760 | How happy or not were you to see
01:03:00.440 | that kind of degree of open sourcing?
01:03:02.880 | - So between the two, I prefer the Google approach,
01:03:05.320 | if that's what you're saying.
01:03:06.820 | The Apple approach makes sense
01:03:11.000 | given the historical context that Apple came from,
01:03:13.400 | but that was 35 years ago.
01:03:15.720 | And I think that Apple is definitely adapting.
01:03:18.160 | And the way I look at it is that there's different kinds
01:03:21.320 | of concerns in this space, right?
01:03:23.120 | It is very rational for a business
01:03:24.840 | to care about making money.
01:03:28.680 | That fundamentally is what a business is about, right?
01:03:31.600 | But I think it's also incredibly realistic to say
01:03:34.840 | it's not your string library
01:03:36.120 | that's the thing that's gonna make you money.
01:03:38.040 | It's gonna be the amazing UI product differentiating features
01:03:42.000 | and other things like that
01:03:42.920 | that you build on top of your string library.
01:03:45.200 | And so keeping your string library proprietary and secret
01:03:49.440 | and things like that,
01:03:51.000 | is maybe not the important thing anymore, right?
01:03:54.680 | Where before platforms were different, right?
01:03:57.680 | And even 15 years ago, things were a little bit different,
01:04:01.480 | but the world is changing.
01:04:02.840 | So Google strikes a very good balance, I think.
01:04:05.200 | And I think the TensorFlow being open source
01:04:08.600 | really changed the entire machine learning field
01:04:11.960 | and it caused a revolution in its own right.
01:04:14.000 | And so I think it's amazingly forward-looking
01:04:17.520 | because I could have imagined,
01:04:20.400 | and I wasn't at Google at the time,
01:04:21.520 | but I could imagine a different context
01:04:23.120 | and a different world where a company says,
01:04:25.400 | "Machine learning is critical to what we're doing.
01:04:27.520 | "We're not gonna give it to other people," right?
01:04:29.600 | And so that decision is a profoundly brilliant insight
01:04:34.600 | that I think has really led to the world being better
01:04:38.280 | and better for Google as well.
01:04:40.120 | - And has all kinds of ripple effects.
01:04:42.200 | I think it is really, I mean,
01:04:45.000 | you can't overstate Google deciding that,
01:04:47.880 | how profound that is for software.
01:04:49.800 | It's awesome.
01:04:50.840 | - Well, and again,
01:04:52.640 | I can understand the concern about
01:04:55.640 | if we release our machine learning software,
01:04:57.640 | our competitors could go faster.
01:05:00.640 | But on the other hand, I think that open sourcing TensorFlow
01:05:02.520 | has been fantastic for Google.
01:05:03.920 | And I'm sure that decision was very non-obvious at the time,
01:05:08.920 | but I think it's worked out very well.
01:05:11.460 | - So let's try this real quick.
01:05:13.200 | You were at Tesla for five months
01:05:15.600 | as the VP of Autopilot Software.
01:05:17.640 | You led the team during the transition
01:05:19.440 | from hardware one to hardware two.
01:05:22.320 | I have a couple of questions.
01:05:23.400 | So one, first of all, to me,
01:05:26.080 | that's one of the bravest engineering
01:05:28.520 | undertakings, sort of like,
01:05:31.880 | really ever in the automotive industry,
01:05:34.320 | to me, software-wise, starting from scratch.
01:05:37.440 | It's a really brave engineering decision.
01:05:39.240 | So my one question there is, what was that like?
01:05:42.720 | What was the challenge of that?
01:05:43.960 | - Do you mean the career decision of jumping
01:05:45.760 | from a comfortable good job into the unknown, or?
01:05:48.800 | - That combined, so at the individual level,
01:05:51.520 | you making that decision, and then when you show up,
01:05:56.320 | you know, it's a really hard engineering problem.
01:05:58.800 | So you could just stay, maybe slow down,
01:06:03.640 | stay on hardware one, those kinds of decisions,
01:06:06.720 | versus taking it full on: let's do this from scratch.
01:06:10.200 | What was that like?
01:06:11.120 | - Well, so, I mean, I don't think Tesla has a culture
01:06:13.240 | of taking things slow and seeing how it goes.
01:06:15.720 | So, and one of the things that attracts me about Tesla
01:06:18.080 | is it's very much a gung-ho, let's change the world,
01:06:20.040 | let's figure it out kind of a place.
01:06:21.520 | And so I have a huge amount of respect for that.
01:06:24.000 | Tesla has done very smart things
01:06:27.960 | with hardware one in particular,
01:06:29.400 | and the hardware one design was originally designed
01:06:32.760 | to be very simple automation features in the car
01:06:37.320 | for like traffic-aware cruise control and things like that.
01:06:39.840 | And the fact that they were able to effectively
01:06:42.360 | feature creep it into lane holding
01:06:44.160 | and a very useful driver assistance feature
01:06:48.120 | is pretty astounding,
01:06:49.280 | particularly given the details of the hardware.
01:06:51.640 | Hardware two built on that in a lot of ways.
01:06:54.640 | And the challenge there was that they were transitioning
01:06:56.800 | from a third-party provided vision stack
01:07:00.080 | to an in-house built vision stack.
01:07:01.720 | And so for the first step, which I mostly helped with,
01:07:05.680 | was getting onto that new vision stack.
01:07:08.520 | And that was very challenging.
01:07:10.880 | And it was time critical for various reasons,
01:07:14.000 | and it was a big leap,
01:07:15.000 | but it was fortunate that it built on a lot of the knowledge
01:07:17.560 | and expertise in the team that had built
01:07:19.440 | hardware one's driver assistance features.
01:07:22.920 | - So you spoke in a collected and kind way
01:07:25.360 | about your time at Tesla,
01:07:26.680 | but it was ultimately not a good fit.
01:07:28.960 | Elon Musk, who we've talked about on this podcast
01:07:31.840 | with several guests, of course,
01:07:33.440 | continues to do some of the most bold
01:07:35.920 | and innovative engineering work in the world.
01:07:38.220 | At times at a cost to some of the members
01:07:40.320 | of the Tesla team,
01:07:41.320 | what did you learn about working
01:07:44.080 | in this chaotic world with Elon?
01:07:45.840 | - Yeah, so I guess I would say that when I was at Tesla,
01:07:50.520 | I experienced and saw the highest degree of turnover
01:07:54.440 | I'd ever seen in a company, which was a bit of a shock.
01:07:57.280 | But one of the things I learned and I came to respect
01:08:00.520 | is that Elon's able to attract amazing talent
01:08:03.400 | because he has a very clear vision of the future
01:08:05.800 | and he can get people to buy into it
01:08:07.240 | because they want that future to happen.
01:08:09.880 | And the power of vision is something
01:08:11.880 | that I have a tremendous amount of respect for.
01:08:14.240 | And I think that Elon is fairly singular in the world
01:08:17.640 | in terms of the things he's able to get people to believe in.
01:08:22.400 | And it's a very,
01:08:24.000 | there may be people that stand on a street corner
01:08:27.440 | and say, "Ah, we're gonna go to Mars."
01:08:29.400 | But then there are a few people that can get others
01:08:33.480 | to buy into it and believe in it,
01:08:34.680 | build the path and make it happen.
01:08:36.200 | And so I respect that.
01:08:38.220 | I don't respect all of his methods,
01:08:41.040 | but I have a huge amount of respect for that.
01:08:43.760 | - You've mentioned in a few places,
01:08:46.840 | including in this context, working hard.
01:08:50.440 | What does it mean to work hard?
01:08:51.960 | And when you look back at your life,
01:08:53.500 | what were some of the most brutal periods
01:08:57.080 | of having to really sort of put everything you have
01:09:01.280 | into something?
01:09:02.220 | - Yeah, good question.
01:09:05.040 | So working hard can be defined in a lot of different ways,
01:09:07.400 | like a lot of hours,
01:09:08.680 | and that is true.
01:09:11.340 | The thing to me that's the hardest
01:09:14.500 | is both being short-term focused on delivering
01:09:18.080 | and executing and making a thing happen,
01:09:20.280 | while also thinking about the longer term
01:09:23.080 | and trying to balance that, right?
01:09:24.400 | Because if you are myopically focused on solving a task
01:09:28.480 | and getting that done,
01:09:29.660 | and only think about that incremental next step,
01:09:32.560 | you will miss the next big hill you should jump over to.
01:09:36.360 | And so I've been really fortunate
01:09:38.040 | that I've been able to kind of oscillate between the two.
01:09:42.080 | And historically at Apple, for example,
01:09:45.640 | that was made possible because I was able to work
01:09:47.480 | with some really amazing people and build up teams
01:09:49.560 | and leadership structures
01:09:51.040 | and allow them to grow in their careers
01:09:55.320 | and take on responsibility,
01:09:57.120 | thereby freeing me up to be a little bit crazy
01:10:00.120 | and thinking about the next thing.
01:10:02.700 | And so it's a lot of that,
01:10:04.700 | but it's also about,
01:10:06.220 | with experience you make connections
01:10:08.020 | that other people don't necessarily make.
01:10:10.140 | And so I think that's a big part as well.
01:10:13.020 | But the bedrock is just a lot of hours
01:10:16.060 | and that's okay with me.
01:10:19.860 | There's different theories on work-life balance
01:10:21.540 | and my theory for myself,
01:10:23.520 | which I do not project onto the team,
01:10:25.260 | but my theory for myself is that,
01:10:26.980 | I wanna love what I'm doing and work really hard.
01:10:30.460 | And my purpose, I feel like,
01:10:33.540 | and my goal is to change the world
01:10:35.020 | and make it a better place.
01:10:36.300 | And that's what I'm really motivated to do.
01:10:38.460 | - So last question, LLVM logo is a dragon.
01:10:44.060 | - Yeah.
01:10:44.880 | - You explain that this is because dragons
01:10:46.800 | have connotations of power, speed, intelligence.
01:10:50.380 | It can also be sleek, elegant, and modular,
01:10:55.180 | though you removed the modular part.
01:10:55.180 | What is your favorite dragon-related character
01:10:58.980 | from fiction, video, or movies?
01:11:01.460 | - So those are all very kind ways of explaining it.
01:11:03.820 | Do you wanna know the real reason it's a dragon?
01:11:05.700 | - Well, yeah.
01:11:07.020 | - Is that better?
01:11:07.900 | So there's a seminal book on compiler design
01:11:11.040 | called "The Dragon Book."
01:11:12.500 | And so this is a really old now book on compilers.
01:11:16.300 | And so the dragon logo for LLVM came about
01:11:20.540 | because at Apple,
01:11:22.060 | we kept talking about LLVM-related technologies
01:11:24.740 | and there's no logo to put on a slide.
01:11:27.020 | And so we're like, "What do we do?"
01:11:28.460 | And somebody's like, "Well, what kind of logo
01:11:29.940 | "should a compiler technology have?"
01:11:32.180 | And I'm like, "I don't know.
01:11:33.260 | "I mean, the dragon is the best thing that we've got."
01:11:37.300 | And Apple somehow magically came up with the logo
01:11:40.660 | and it was a great thing
01:11:42.660 | and the whole community rallied around it.
01:11:44.020 | And then it got better
01:11:45.740 | as other graphic designers got involved.
01:11:47.380 | But that's originally where it came from.
01:11:49.340 | - Story.
01:11:50.180 | Are there dragons from fiction that you connect with?
01:11:52.720 | That "Game of Thrones," "Lord of the Rings,"
01:11:57.220 | that kind of thing?
01:11:58.060 | - "Lord of the Rings" is great.
01:11:59.180 | I also like role-playing games
01:12:00.460 | and things like computer role-playing games.
01:12:02.220 | And so dragons often show up in there.
01:12:03.660 | But it really comes back to the book.
01:12:07.140 | Oh no, we need a thing.
01:12:08.620 | - Yeah.
01:12:09.940 | - And hilariously, one of the funny things about LLVM
01:12:13.700 | is that my wife, who's amazing,
01:12:16.980 | runs the LLVM Foundation.
01:12:19.460 | And she goes to Grace Hopper
01:12:20.860 | and is trying to get more women involved in the field.
01:12:23.420 | She's also a compiler engineer.
01:12:24.620 | So she's trying to get other women
01:12:26.060 | interested in compilers and things like this.
01:12:28.020 | And so she hands out the stickers.
01:12:29.980 | And people like the LLVM sticker
01:12:32.140 | because of "Game of Thrones."
01:12:34.300 | And so sometimes culture has this helpful effect
01:12:36.860 | to get the next generation of compiler engineers
01:12:39.940 | engaged with the cause.
01:12:42.380 | - Okay, awesome.
01:12:43.260 | Chris, thanks so much for talking to me.
01:12:44.100 | - Yeah, it's been great talking with you.