Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators | Lex Fridman Podcast #21
Chapters
0:00
1:28 What Was the First Program You've Ever Written
4:45 What Is a Compiler
4:48 Phases of a Compiler
8:16 Compiler Infrastructure
9:31 LLVM Open Source
19:16 Intermediate Representation
32:23 Linux Still Defaults to GCC
35:18 Code Owners
39:29 The Preprocessor
40:31 Swift
47:34 Progressive Disclosure of Complexity
49:58 Swift and Python Talking to Each Other
53:44 Python
56:30 Automatic Differentiation
62:06 Challenges in the LLVM Ecosystem
62:20 The Future of Open Source
67:28 Elon Musk
71:10 The Dragon Book
00:00:00.000 |
The following is a conversation with Chris Lattner. 00:00:06.720 |
including CPU, GPU, TPU accelerators for TensorFlow, 00:00:12.040 |
and all kinds of machine learning compiler magic 00:00:20.080 |
which means he deeply understands the intricacies 00:00:27.920 |
He created the LLVM compiler infrastructure project 00:00:36.040 |
including the creation of the Swift programming language. 00:00:44.320 |
during the transition from Autopilot hardware one 00:00:51.280 |
to build an in-house software infrastructure for Autopilot. 00:00:54.900 |
I could have easily talked to Chris for many more hours. 00:00:58.060 |
Compiling code down across the levels of abstraction 00:01:01.260 |
is one of the most fundamental and fascinating aspects 00:01:05.460 |
And he is one of the world experts in this process. 00:01:08.660 |
It's rigorous science and it's messy, beautiful art. 00:01:16.780 |
If you enjoy it, subscribe on YouTube, iTunes, 00:01:19.440 |
or simply connect with me on Twitter @lexfridman, 00:01:24.780 |
And now here's my conversation with Chris Lattner. 00:01:29.420 |
What was the first program you've ever written? 00:01:45.420 |
and seeing how they worked and then typing them in wrong 00:01:49.340 |
and trying to figure out why they were not working right, 00:01:54.900 |
that you remember yourself maybe falling in love with, 00:02:00.180 |
- I don't know, I mean, I feel like I've learned a lot 00:02:06.700 |
So I started in BASIC and then went like GW-BASIC, 00:02:11.460 |
and then upgraded to QBasic and eventually QuickBASIC, 00:02:20.900 |
and started doing machine language programming 00:02:23.380 |
and assembly in Pascal, which was really cool. 00:02:29.940 |
and then kind of did lots of other weird things. 00:02:41.500 |
sort of functional philosophical hippie route. 00:02:44.620 |
Instead you went into like the dark arts of the C. 00:02:50.700 |
So it started with BASIC, Pascal, and then assembly 00:03:22.820 |
but Pascal's much more principled in various ways. 00:03:27.820 |
C is more, I mean, it has its historical roots, 00:03:35.500 |
- With pointers, there's this memory management thing 00:03:41.660 |
Is that the first time you start to understand 00:03:43.860 |
that there's resources that you're supposed to manage? 00:03:48.500 |
but in Pascal, these, like the caret instead of the star, 00:03:51.620 |
and there's some small differences like that, 00:03:58.220 |
how things get laid out in memory a lot more. 00:04:00.820 |
And so in Pascal, you have allocating and deallocating 00:04:17.660 |
So it's a little bit of a higher level abstraction. 00:04:26.540 |
- So can you tell me first what LLVM and Clang are 00:04:35.500 |
one of the most powerful compiler optimization systems 00:04:53.380 |
So the way I look at this is you have a two-sided problem 00:05:00.420 |
and then you have machines that need to run the program 00:05:07.020 |
and wanna think about every piece of hardware. 00:05:08.940 |
And so at the same time that you have lots of humans, 00:05:14.780 |
And so compilers are the art of allowing humans 00:05:19.220 |
that they wanna think about and then get that program, 00:05:26.020 |
And the interesting and exciting part of all this 00:05:29.460 |
is that there's now lots of different kinds of hardware, 00:05:31.940 |
chips like x86 and PowerPC and ARM and things like that, 00:05:37.300 |
for machine learning and other things like that, 00:05:38.880 |
or also just different kinds of hardware, GPUs. 00:05:42.900 |
And at the same time on the programming side of it, 00:05:45.580 |
you have your basic, you have C, you have JavaScript, 00:05:52.820 |
that are all trying to talk to the human in a different way 00:05:55.180 |
to make them more expressive and capable and powerful. 00:06:03.420 |
- End to end, from the very beginning to the very end. 00:06:14.540 |
and the hardware, but the programming language's job 00:06:31.520 |
you have on the one hand humans, which are complicated, 00:06:36.740 |
And so compilers typically work in multiple phases. 00:06:42.720 |
that you have here is try to get maximum reuse 00:06:47.120 |
because these compilers are very complicated. 00:06:51.200 |
is that you have something called a front end or a parser 00:06:56.620 |
And so you'll have a C parser, and that's what Clang is, 00:07:16.680 |
but these three big groups are very common in compilers. 00:07:22.200 |
is trying to standardize that middle and last part. 00:07:27.880 |
is that there are a lot of different languages 00:07:31.080 |
And so things like Swift, but also Julia, Rust, 00:07:40.920 |
and they can all use the same optimization infrastructure, 00:07:48.800 |
And so LLVM is really that layer that is common 00:07:52.240 |
that all these different specific compilers can use. 00:07:55.580 |
- And is it a standard, like a specification, 00:08:02.120 |
And so I think there's a couple of different ways 00:08:06.720 |
Because it depends on which angle you're looking at it from. 00:08:20.000 |
that you build a concrete compiler on top of. 00:08:27.920 |
And one of the most fascinating things about LLVM 00:08:30.560 |
over the course of time is that we've managed somehow 00:08:41.120 |
And so you have Google and Apple, you have AMD and Intel, 00:08:45.880 |
you have Nvidia and AMD on the graphics side, 00:08:48.880 |
you have Cray and everybody else doing these things. 00:08:52.640 |
And like all these companies are collaborating together 00:08:55.420 |
to make that shared infrastructure really, really great. 00:08:58.520 |
And they do this not out of the goodness of their heart, 00:09:01.400 |
but they do it because it's in their commercial interest 00:09:06.800 |
and facing the reality that it's so expensive 00:09:11.200 |
no one company really wants to implement it all themselves. 00:09:16.960 |
That's a great point because it's also about the skill sets. 00:09:28.000 |
It always seems like with open source projects, 00:09:34.460 |
It's about, it's 19 years old now, so it's fairly old. 00:09:53.940 |
and then a team of two or three research students 00:09:58.420 |
and we built many of the core pieces initially. 00:10:21.900 |
and as Apple was using it, being open source and public, 00:10:30.260 |
and in some cases, Google effectively owns Clang now, 00:10:42.980 |
And so likewise, NVIDIA cares a lot about CUDA, 00:10:54.980 |
- And so when you first started as a master's project, 00:10:58.900 |
I guess, did you think it was gonna go as far as it went? 00:11:09.800 |
- Yeah, no, no, no, it was nothing like that. 00:11:11.340 |
So I mean, my goal when I went to the University of Illinois 00:11:13.700 |
was to get in and out with a non-thesis master's in a year, 00:11:33.420 |
and facing both software engineering challenges, 00:11:40.100 |
I had worked at many companies as interns before that, 00:11:45.860 |
to have a team of people that are working together 00:11:48.060 |
and trying to collaborate in version control, 00:11:54.020 |
and he believes that 2% of the world population 00:11:58.780 |
that they're geeks, they understand computers, 00:12:16.900 |
I mean, it seems like that's one of the biggest, 00:12:20.900 |
- Yeah, that's one of the major things it does. 00:12:22.500 |
So I got into that because of a person, actually. 00:12:28.220 |
I had an advisor, or a professor named Steve Vegdahl, 00:12:32.060 |
and I went to this little, tiny, private school. 00:12:47.420 |
It was kind of a wart on the side of the math department, 00:12:51.260 |
I think it's evolved a lot in the many years since then. 00:13:04.460 |
is that they're large, complicated software pieces. 00:13:16.740 |
that you would take algorithms and data structures 00:13:34.780 |
because in many classes, if you don't get a project done, 00:13:37.500 |
you just forget about it and move on to the next one 00:13:41.300 |
But here you have to live with the decisions you make 00:13:51.060 |
the following semester, and he was just really great. 00:13:53.940 |
And he was also a great mentor in a lot of ways. 00:14:01.500 |
I wasn't super excited about going to grad school. 00:14:07.460 |
But like I said, I kind of got tricked into saying 00:14:34.100 |
- For me, it was more, so I'm not really a math person. 00:14:38.300 |
I understand some bits of it when I get into it, 00:14:41.600 |
but math is never the thing that attracted me. 00:14:43.980 |
And so a lot of the parser part of the compiler 00:14:57.940 |
and exploring and getting it to do more things, 00:15:00.820 |
and then setting new goals and reaching for them. 00:15:03.220 |
And in the case of LLVM, when I started working on that, 00:15:13.460 |
And so he and I specifically found each other 00:15:15.640 |
because we were both interested in compilers, 00:15:16.980 |
and so I started working with him and taking his class. 00:15:21.380 |
it's fun implementing all the standard algorithms 00:15:23.660 |
and all the things that people had been talking about 00:15:26.420 |
and were well-known and they were in the curricula 00:15:31.340 |
And so just being able to build that was really fun. 00:15:40.260 |
- So you said compilers are these complicated systems. 00:15:42.860 |
Can you even just, with language, try to describe 00:15:54.660 |
- So I'll give you examples of the hard parts 00:15:57.220 |
So C++ is a very complicated programming language. 00:16:14.340 |
So the actual, how the characters are arranged, yes. 00:16:25.580 |
You play that forward and then a bunch of suboptimal, 00:16:28.740 |
in some cases, decisions were made and they compound. 00:16:33.380 |
keep getting added to C++ and it will probably never stop. 00:16:37.020 |
But the language is very complicated from that perspective. 00:16:45.620 |
one of the major challenges, which Clang, as a project, 00:16:48.580 |
the C++ compiler that I built, I and many people built, 00:16:52.280 |
one of the challenges we took on was we looked at GCC. 00:17:01.060 |
industry standardized compiler that had really consolidated 00:17:12.540 |
And it was full of global variables and other things 00:17:18.100 |
in ways that it wasn't originally designed for. 00:17:20.380 |
And so with Clang, one of the things that we wanted to do 00:17:25.460 |
So make error messages that are just better than GCCs. 00:17:28.140 |
And that's actually hard because you have to do 00:17:35.140 |
And so compile time is about making it efficient, 00:17:37.500 |
which is also really hard when you're keeping track 00:17:43.380 |
So refactoring tools and other analysis tools 00:17:48.360 |
also leveraging the extra information we kept, 00:17:55.940 |
And so that's been one of the areas that Clang 00:18:01.300 |
is in the tooling for C and C++ and things like that. 00:18:05.020 |
But C++ in the front end piece is complicated 00:18:11.300 |
and you have to turn that back into an error message 00:18:17.800 |
But then you start doing the, what's called lowering. 00:18:20.700 |
So going from C++ and the way that it represents code 00:18:25.740 |
there's many different phases you go through. 00:18:28.220 |
Often there are, I think LLVM has something like 150 00:18:33.020 |
different, what are called passes in the compiler 00:18:38.740 |
And these get organized in very complicated ways, 00:18:41.840 |
which affect the generated code and the performance 00:18:55.940 |
- Yeah, so in the parser, it's usually a tree 00:19:01.060 |
And so the idea is you have a node for the plus 00:19:04.580 |
that the human wrote in their code or the function call, 00:19:07.740 |
you'll have a node for call with the function that they call 00:19:10.780 |
and the arguments they pass, things like that. 00:19:18.600 |
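[The tree Chris describes here can be poked at concretely with Python's own parser. A minimal illustration, using Python's standard `ast` module rather than any LLVM or Clang API:]

```python
import ast

# Parse a tiny expression: a call whose argument is an addition.
tree = ast.parse("print(x + 1)")

# The statement is an expression statement wrapping a Call node:
# "a node for call with the function that they call and the arguments
# they pass."
call = tree.body[0].value
print(type(call).__name__)           # Call
print(call.func.id)                  # print

# The single argument is a BinOp node: "a node for the plus" with
# its two children hanging underneath it.
arg = call.args[0]
print(type(arg).__name__)            # BinOp
print(type(arg.op).__name__)         # Add
print(arg.left.id, arg.right.value)  # x 1
```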
And intermediate representations are like LLVM has one. 00:19:22.100 |
And there it's a, it's what's called a control flow graph. 00:19:26.940 |
And so you represent each operation in the program 00:19:31.220 |
as a very simple, like this is gonna add two numbers, 00:19:34.480 |
this is gonna multiply two things, maybe we'll do a call, 00:19:37.460 |
but then they get put in what are called blocks. 00:19:40.260 |
And so you get blocks of these straight line operations, 00:19:43.580 |
where instead of being nested like in a tree, 00:19:53.180 |
And so it's a straight line sequence of operations 00:19:54.980 |
within the block, and then you have branches, 00:20:07.020 |
like for a for statement in a C-like language, 00:20:12.180 |
for the initializer, a pointer to the expression 00:20:14.060 |
for the increment, a pointer to the expression 00:20:18.700 |
Okay, and these are all nested underneath it. 00:20:21.040 |
In a control flow graph, you get a block for the code 00:20:24.580 |
that runs before the loop, so the initializer code, 00:20:27.580 |
then you have a block for the body of the loop, 00:20:30.260 |
and so the body of the loop code goes in there, 00:20:33.740 |
but also the increment and other things like that, 00:20:35.540 |
and then you have a branch that goes back to the top, 00:20:39.860 |
And so it's more of an assembly-level kind of representation. 00:20:44.000 |
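[The for-loop lowering described above can be sketched as data. This is a toy encoding, not LLVM IR: each block is a straight-line list of operations plus a terminator naming its successor(s), and a tiny interpreter walks the graph. The loop is the hypothetical `for (i = 0; i < 3; i++) total += i;`.]

```python
# Control-flow-graph sketch: entry block runs the initializer, a
# condition block branches to the body or the exit, and the body
# branches back to the top.
cfg = {
    "entry": {"ops": [("set", "i", 0), ("set", "total", 0)], "next": "cond"},
    "cond":  {"ops": [], "branch": ("lt", "i", 3, "body", "exit")},
    "body":  {"ops": [("add", "total", "i"), ("add", "i", 1)], "next": "cond"},
    "exit":  {"ops": [], "next": None},
}

def run(cfg):
    env, block = {}, "entry"
    while block is not None:
        info = cfg[block]
        for op in info["ops"]:
            if op[0] == "set":
                env[op[1]] = op[2]
            elif op[0] == "add":
                # Operand may be a variable name or a constant.
                env[op[1]] += env[op[2]] if isinstance(op[2], str) else op[2]
        if "branch" in info:
            # "lt" is the only comparison this sketch supports.
            _, var, limit, then_blk, else_blk = info["branch"]
            block = then_blk if env[var] < limit else else_blk
        else:
            block = info["next"]
    return env

print(run(cfg))  # {'i': 3, 'total': 3}
```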
But the nice thing about this level of representation 00:20:48.660 |
And so there's lots of different kinds of languages 00:20:51.880 |
with different kinds of, you know, JavaScript 00:20:55.180 |
has a lot of different ideas of what is false, for example, 00:21:00.760 |
but then that middle part can be shared across all those. 00:21:04.180 |
- How close is that intermediate representation 00:21:10.260 |
Is there, are they, 'cause everything you describe 00:21:13.540 |
is a kind of echoes of a neural network graph. 00:21:29.140 |
And then they transform those through layers, right? 00:21:37.140 |
is it has relatively few different representations. 00:21:40.660 |
Where a neural network often, as you get deeper, 00:21:42.500 |
for example, you get many different representations, 00:21:47.400 |
is transforming between these different representations. 00:21:50.220 |
In a compiler, often you get one representation 00:21:55.240 |
And these transformations are often applied iteratively. 00:21:58.700 |
And for programmers, there's familiar types of things. 00:22:02.940 |
For example, trying to find expressions inside of a loop 00:22:10.740 |
or find constant folding or other simplifications, 00:22:15.340 |
turning, you know, two times X into X shift left by one, 00:22:23.340 |
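[The two simplifications just mentioned, constant folding and the `2 * x` to `x << 1` strength reduction, can be shown as a tiny peephole pass over made-up `(op, lhs, rhs)` instruction tuples. This is an illustrative sketch, not how LLVM's passes are actually written:]

```python
def simplify(instr):
    """Rewrite one instruction to a cheaper equivalent if possible."""
    op, a, b = instr
    if op == "mul" and isinstance(a, int) and isinstance(b, int):
        return ("const", a * b, None)   # constant folding: 3*4 -> 12
    if op == "mul" and a == 2:
        return ("shl", b, 1)            # strength reduction: 2*x -> x<<1
    if op == "mul" and b == 2:
        return ("shl", a, 1)            # x*2 -> x<<1
    return instr                        # nothing to do

print(simplify(("mul", 3, 4)))      # ('const', 12, None)
print(simplify(("mul", 2, "x")))    # ('shl', 'x', 1)
print(simplify(("add", "x", "y")))  # ('add', 'x', 'y')
```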
But compilers end up getting a lot of theorem proving 00:22:27.660 |
that try to find higher level properties of the program 00:22:32.260 |
- Cool, so what's like the biggest bang for the buck 00:22:40.900 |
At the very beginning, the '80s, I don't know. 00:22:43.940 |
a lot of it was things like register allocation. 00:22:46.420 |
So the idea of in a modern, like a microprocessor, 00:22:50.940 |
what you'll end up having is you'll end up having memory, 00:22:54.300 |
and then you have registers that are relatively fast. 00:22:57.060 |
But registers, you don't have very many of them, okay? 00:23:05.460 |
compute this, compute this, put in a temporary variable, 00:23:07.740 |
I have a loop, I have some other stuff going on. 00:23:28.540 |
an inner loop that executes millions of times maybe. 00:23:31.580 |
If you're doing loads and stores inside that loop, 00:23:37.060 |
inside that loop in registers, now it's really fast. 00:23:40.140 |
And so getting that right requires a lot of work 00:23:43.340 |
because there's many different ways to do that. 00:23:48.820 |
in a different representation than what the human wrote. 00:24:08.700 |
And there are many of these different kinds of techniques 00:24:11.540 |
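[The register allocation problem described above can be sketched with a greatly simplified linear-scan allocator. This is a toy under stated assumptions (each value is live over one contiguous interval, and when no register is free we simply spill the new value); real allocators, including LLVM's, are far more involved:]

```python
def linear_scan(intervals, num_regs):
    """Assign registers to values by their live intervals (start, end)."""
    active = []                      # (end, value, reg) currently live
    free = list(range(num_regs))
    assignment, spills = {}, []
    for value, (start, end) in sorted(intervals.items(),
                                      key=lambda kv: kv[1][0]):
        # Expire intervals that ended before this one starts,
        # returning their registers to the free pool.
        for iv in list(active):
            if iv[0] <= start:
                active.remove(iv)
                free.append(iv[2])
        if free:
            reg = free.pop(0)
            assignment[value] = reg
            active.append((end, value, reg))
            active.sort()
        else:
            spills.append(value)     # no register left: value lives in memory
    return assignment, spills

# Four values, only two registers: something has to spill.
intervals = {"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (5, 7)}
assignment, spills = linear_scan(intervals, num_regs=2)
print(assignment, spills)  # {'a': 0, 'b': 1, 'd': 1} ['c']
```

Spilled values turn into the loads and stores inside loops that the transcript describes as the expensive case.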
- So it's adding almost like a time dimension to, 00:24:35.980 |
for the compiler because what they ended up doing 00:24:39.740 |
is ending up adding pipelines to the processor 00:24:42.380 |
where the processor can do more than one thing at a time. 00:24:44.980 |
But this means that the order of operations matters a lot. 00:24:47.620 |
And so one of the classical compiler techniques 00:24:54.180 |
so that the processor can keep its pipelines full 00:25:00.980 |
that are kind of bread and butter compiler techniques 00:25:03.620 |
that have been studied a lot over the course of decades now. 00:25:12.420 |
This is a huge opportunity for machine learning 00:25:14.380 |
because many of these algorithms are full of these 00:25:20.900 |
but don't generalize and full of magic numbers. 00:25:29.860 |
if you were to apply machine learning to this, 00:25:39.060 |
You can pick your metric and there's running time, 00:25:42.220 |
there's lots of different things that you can optimize for. 00:25:45.180 |
Code size is another one that some people care about 00:25:51.660 |
or is somebody actually been crazy enough to try 00:25:55.580 |
to have machine learning-based parameter tuning 00:26:06.780 |
that have been applying search in various forms 00:26:11.460 |
but also brute force search has been tried for quite a while. 00:26:14.380 |
And usually these are in small problem spaces. 00:26:32.620 |
And there's many different confounding factors here 00:26:35.340 |
because graphics cards have different numbers 00:26:40.260 |
and memory bandwidth and many different constraints 00:26:51.260 |
This is something that we as an industry need to fix. 00:27:09.860 |
So in the mid nineties, Java totally changed the world. 00:27:13.660 |
Right, and I'm still amazed by how much change 00:27:22.420 |
all at once introduced things like JIT compilation. 00:27:26.900 |
but it pulled it together and made it mainstream 00:27:32.660 |
portable code, safe code, like memory safe code, 00:27:36.620 |
like a very dynamic dispatch execution model. 00:27:44.100 |
and had been done in small ways in various places, 00:28:22.660 |
how do you do different columns out of that matrix 00:28:31.420 |
And then how do you take it to multiple cores? 00:28:33.460 |
- How did the whole virtual machine thing change 00:28:37.980 |
- Yeah, so what the Java virtual machine does 00:28:44.140 |
where you have a front end that parses the code 00:28:46.260 |
and then you have an intermediate representation 00:28:51.980 |
and then compile to what's known as Java bytecode. 00:28:54.660 |
And that bytecode is now a portable code representation 00:29:09.220 |
And Java bytecode can be shipped around across the wire. 00:29:15.840 |
- And because of that, it can run in the browser. 00:29:18.660 |
- And that's why it runs in the browser, right? 00:29:27.740 |
you'd build this mini app that would run on a webpage. 00:29:30.840 |
Well, a user of that is running a web browser 00:29:37.840 |
and then you do all the compiler stuff on your machine 00:29:44.880 |
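[Java bytecode itself isn't shown in the conversation, but Python's own bytecode is an analogous portable, virtual-machine-level instruction stream, and the standard `dis` module can display it. A small illustration (the exact opcode names vary between Python versions):]

```python
import dis

def add_one(x):
    return x + 1

# Like Java bytecode, this is a compact instruction stream for a
# stack-based virtual machine, produced by the front end and
# interpreted (or JIT-compiled) at run time.
names = [ins.opname for ins in dis.get_instructions(add_one)]
print(names)
```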
I mean, it's a great idea for certain problems. 00:29:48.200 |
that technology is itself neither good nor bad. 00:29:51.600 |
You know, this would be a very, very bad thing 00:29:58.880 |
some of these software portability and transparency 00:30:04.180 |
Now, Java ultimately didn't win out on the desktop. 00:30:13.160 |
it's been a very successful thing over decades. 00:30:31.040 |
and really proud of what's been accomplished? 00:30:33.260 |
- Yeah, I think that the interesting thing about LLVM 00:30:43.960 |
And a lot of really smart people have worked on it. 00:30:48.240 |
But I think that the thing that's most profound about LLVM 00:30:56.220 |
And so interesting things that have happened with LLVM, 00:31:01.220 |
and used it to do all the graphics compilation 00:31:06.060 |
And so now they're able to have better special effects 00:31:15.460 |
when it can be used in ways it was never designed for 00:31:18.740 |
because it has good layering and software engineering 00:31:23.400 |
- Which is where, as you said, it differs from GCC. 00:31:28.220 |
but it's not as good as infrastructure technology. 00:31:31.780 |
It's really a C compiler or it's a Fortran compiler. 00:31:38.700 |
- Now you can tell I don't know what I'm talking about 00:32:01.760 |
all the apps on the iPhone effectively and the OS's. 00:32:05.240 |
It compiles Google's production server applications. 00:32:09.380 |
It's used to build GameCube games and PlayStation 4 00:32:16.680 |
- So as a user I have, but just everything I've done 00:32:27.800 |
Or is it because, I mean, is there a reason for that? 00:32:29.440 |
- It's a combination of technical and social reasons. 00:32:39.640 |
use GCC historically and they've not switched. 00:32:46.640 |
it seems that LLVM has either reached the level GCC 00:32:50.640 |
or superseded on different features or whatever. 00:32:53.520 |
- The way I would say it is that they're so close 00:32:59.160 |
but it doesn't actually really matter anymore at that level. 00:33:09.160 |
- Yeah, yeah, which describes a lot of compilers. 00:33:12.520 |
The hard thing about compilers in my experience 00:33:14.960 |
is the engineering, the software engineering, 00:33:17.400 |
making it so that you can have hundreds of people 00:33:20.120 |
collaborating on really detailed low-level work 00:33:27.840 |
And that's one of the things I think LLVM has done well. 00:33:30.640 |
And that kind of goes back to the original design goals 00:33:37.120 |
And incidentally, I don't want to take all the credit 00:33:43.560 |
And when I started, I would write, for example, 00:33:45.560 |
a register allocator, and then somebody much smarter than me 00:33:50.680 |
with something else that they would come up with. 00:33:52.640 |
And because it's modular, they were able to do that. 00:33:55.160 |
And that's one of the challenges with GCC, for example, 00:33:58.240 |
is replacing subsystems is incredibly difficult. 00:34:01.240 |
It can be done, but it wasn't designed for that. 00:34:04.640 |
And that's one of the reasons that LLVM's been 00:34:06.040 |
very successful in the research world as well. 00:34:08.720 |
- But in a community sense, Guido van Rossum, right, 00:34:20.500 |
So in managing this community of brilliant compiler folks, 00:34:24.380 |
did it, for a time at least, fall on you to approve things? 00:34:34.020 |
where there are an order of magnitude more patches in LLVM 00:34:42.780 |
- But you still write, I mean, you're still close 00:34:58.860 |
is that when I was a grad student, I could do all the work 00:35:06.860 |
the way my opinionated sense felt like it should be done. 00:35:14.300 |
And so what ends up happening is LLVM has a hierarchical 00:35:26.660 |
but to make sure that the patches do get reviewed 00:35:28.820 |
and make sure that the right thing's happening 00:35:34.660 |
for example, hardware manufacturers end up owning 00:35:38.540 |
the hardware specific parts of their hardware. 00:35:44.500 |
Leaders in the community that have done really good work 00:35:47.740 |
naturally become the de facto owner of something. 00:35:53.420 |
how about we make them the official code owner? 00:35:58.620 |
that all the patches get reviewed in a timely manner. 00:36:00.300 |
And then everybody's like, yes, that's obvious. 00:36:03.220 |
And usually this is a very organic thing, which is great. 00:36:06.060 |
And so I'm nominally the top of that stack still, 00:36:08.740 |
but I don't spend a lot of time reviewing patches. 00:36:16.140 |
technical disagreements that end up happening 00:36:18.220 |
and making sure that the community as a whole 00:36:19.660 |
makes progress and is moving in the right direction 00:36:23.900 |
So we also started a nonprofit six years ago, seven years ago. 00:36:30.820 |
And the nonprofit, the LLVM Foundation nonprofit 00:36:34.020 |
helps oversee all the business sides of things 00:36:35.940 |
and make sure that the events that the LLVM community has 00:36:49.060 |
- Right, so it sounds like a lot of it is just organic. 00:36:53.180 |
- Yeah, well, and this is, LLVM is almost 20 years old, 00:36:58.500 |
LLVM is now older than GCC was when LLVM started, right? 00:37:06.860 |
But the good thing about that is it has a really robust, 00:37:16.300 |
but it's a community of people that are interested 00:37:21.140 |
and have been working together effectively for years 00:37:23.660 |
and have a lot of trust and respect for each other. 00:37:31.180 |
- So then in a slightly different flavor of effort, 00:37:34.500 |
you started at Apple in 2005 with the task of making, 00:37:44.660 |
leading the entire developer tools department. 00:37:48.380 |
We're talking about LLVM, Xcode, Objective-C to Swift. 00:37:59.660 |
First of all, leading such a huge group of developers, 00:38:15.940 |
- Yeah, I know, but I wanna talk about the other stuff too. 00:38:21.260 |
then we can talk about the big team pieces, if that's okay. 00:38:24.500 |
- So, to really oversimplify many years of hard work, 00:38:44.060 |
as it went through a couple of hardware transitions. 00:38:46.060 |
I joined right at the time of the Intel transition, 00:38:51.820 |
and then the transition to ARM with the iPhone. 00:38:56.860 |
But at the same time, there's a lot of questions 00:39:06.500 |
the turnaround cycle, the tooling and the IDE 00:39:09.700 |
were not great, were not as good as they could be. 00:39:17.980 |
well, okay, how hard is it to write a C compiler? 00:39:23.380 |
I'm just gonna just do it on nights and weekends 00:39:27.420 |
And then I built up and see there's this thing 00:39:30.140 |
called the preprocessor, which people don't like, 00:39:32.980 |
but it's actually really hard and complicated 00:39:40.920 |
And it's the crux of a bunch of the performance issues 00:39:47.780 |
oh, you know what, we could actually do this. 00:39:49.820 |
Everybody's saying that this is impossible to do, 00:39:51.420 |
but it's actually just hard, it's not impossible. 00:40:00.300 |
Oh, this is great, we can get you one other person 00:40:04.420 |
And slowly a team is formed and it starts taking off. 00:40:08.300 |
And C++, for example, huge complicated language, 00:40:12.020 |
people always assume that it's impossible to implement, 00:40:22.420 |
And that was only possible because we were lucky 00:40:34.420 |
So Swift came from, we were just finishing off 00:40:42.580 |
And C++ is a very formidable and very important language, 00:40:56.140 |
with no hope or ambition that would go anywhere, 00:41:04.860 |
not telling anybody about it kind of a thing. 00:41:09.420 |
I'm like, actually, it would make sense to do this. 00:41:11.260 |
At the same time, I started talking with the senior VP 00:41:14.800 |
of software at the time, a guy named Bertrand Serlet. 00:41:19.260 |
He was like, well, let's have fun, let's talk about this. 00:41:23.460 |
And so he helped guide some of the early work 00:41:26.140 |
and encouraged me and got things off the ground. 00:41:30.420 |
And eventually, I told my manager and told other people, 00:41:41.740 |
the idea of doing a new language is not obvious to anybody, 00:41:46.560 |
And the tone at the time was that the iPhone was successful 00:41:59.880 |
Apple was hiring software people that loved Objective-C. 00:42:05.060 |
Right, and it wasn't that they came despite Objective-C. 00:42:07.940 |
They loved Objective-C, and that's why they got hired. 00:42:10.020 |
And so you had a software team that the leadership 00:42:19.380 |
And so they quote unquote grew up writing Objective-C. 00:42:23.220 |
And many of the individual engineers all were hired 00:42:28.300 |
And so this notion of, okay, let's do new language 00:42:34.100 |
Meanwhile, my sense was that the outside community 00:42:37.860 |
Some people were, and some of the most outspoken people were, 00:42:42.620 |
because it has very sharp corners and it's difficult to learn. 00:42:46.460 |
And so one of the challenges of making Swift happen 00:42:50.060 |
that was totally non-technical is the social part 00:42:57.500 |
Like if we do a new language, which at Apple, 00:43:02.200 |
So if we ship it, what is the metrics of success? 00:43:09.220 |
let's file off those rough corners and edges. 00:43:12.120 |
And one of the major things that became the reason 00:43:15.080 |
to do this was this notion of safety, memory safety. 00:43:39.020 |
that you could not fix safety or memory safety 00:43:47.300 |
of the mental process and the thought process, 00:43:53.500 |
okay, well, if we're gonna do something new, what is good? 00:44:02.420 |
- So what are some design choices early on in Swift? 00:44:13.220 |
- Yeah, so some of those were obvious given the context. 00:44:24.260 |
We wanted the performance and we wanted refactoring tools 00:44:27.200 |
and other things like that to go with typed languages. 00:44:31.400 |
Was it obvious, I think this would be a dumb question, 00:44:53.820 |
It was when the iPhone was definitely on an upper trajectory 00:44:58.700 |
and is still a bit memory constrained, right? 00:45:01.780 |
And so being able to compile the code and then ship it 00:45:05.460 |
and then having standalone code that is not JIT compiled 00:45:26.060 |
saying like, how can we make Objective-C better, right? 00:45:29.580 |
and that was the contiguous, natural thing to do. 00:45:40.060 |
in your work at Google, TensorFlow and so on, 00:45:49.420 |
- Yeah, so the funny thing after working on compilers 00:45:54.820 |
and this is one of the things that LLVM has helped with, 00:45:58.980 |
is that I don't look at compilations being static 00:46:09.100 |
is that Swift is not just statically compiled. 00:46:24.120 |
is it's actually dynamically compiling the statements 00:46:28.180 |
And so this gets back to the software engineering problems, 00:46:32.540 |
right, where if you layer the stack properly, 00:46:38.940 |
because you have the right abstractions there. 00:46:41.060 |
And so the way that a Colab workbook works with Swift 00:47:08.060 |
and overwriting and replacing and updating code in place. 00:47:11.260 |
And the fact that it can do this is not an accident. 00:47:15.620 |
but it's an important part of how the language was set up 00:47:18.100 |
and how it's layered, and this is a non-obvious piece. 00:47:25.880 |
is to make it so that you can learn it very quickly. 00:47:32.060 |
the thing that I always come back to is this UI principle 00:47:37.900 |
And so in Swift, you can start by saying print, 00:47:49.220 |
- No header files, no public static class void, 00:47:51.540 |
blah, blah, blah, string, like Java has, right? 00:47:58.420 |
Then you can say, well, let's introduce variables. 00:48:04.660 |
You can use x, x plus one, this is what it means. 00:48:07.700 |
Then you can say, well, how about control flow? 00:48:13.960 |
Then you can say, let's introduce functions, right? 00:48:17.260 |
And many languages like Python have had this kind of notion 00:48:25.740 |
and then you can add generics in the case of Swift, 00:48:29.500 |
then build out in terms of the things that you're expressing. 00:48:32.220 |
But this is not very typical for compiled languages. 00:48:40.980 |
is designed with this factoring of complexity in mind 00:48:43.500 |
so that the language can express powerful things. 00:48:46.460 |
You can write firmware in Swift if you want to, 00:48:53.780 |
because often you have very advanced library writers 00:48:57.440 |
that want to be able to use the nitty gritty details, 00:49:00.500 |
but then other people just want to use the libraries 00:49:16.760 |
like I saw this in the demo, import and import. 00:49:21.620 |
- What's up with, is that as easy as it looks or is it? 00:49:26.540 |
That's not a stage magic hack or anything like that. 00:49:29.420 |
- No, no, I don't mean from the user perspective, 00:49:34.100 |
- So it's easy once all the pieces are in place. 00:49:40.620 |
you can think about it in two different ways. 00:49:55.020 |
In Python, everything is a Python object, and because there's only one type, it's implicit. 00:50:01.340 |
Swift has lots of types, it has arrays and it has strings 00:50:15.860 |
And when you import NumPy, what you get is a Python object, which is the NumPy module. 00:50:22.540 |
okay, hey, Python object, I have no idea what you are, 00:50:31.860 |
hey, Python, what's the .array member in that Python object? 00:50:43.540 |
that is the result of np.array, call with these arguments. 00:50:48.020 |
Again, calling into the Python interpreter to do that work. 00:50:55.500 |
what you'll see is that the Python module in Swift 00:50:58.420 |
is something like 1200 lines of code or something. 00:51:03.540 |
And it's built on top of the C interoperability 00:51:06.540 |
because it just talks to the Python interpreter. 00:51:23.060 |
and contributed new language features to the Swift language 00:51:28.300 |
Right, and this is one of the things about Swift 00:51:31.340 |
that is critical to the Swift for TensorFlow work, 00:51:34.820 |
which is that we can actually add new language features. 00:51:42.140 |
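[Editor's note: the dynamic lookup described above can be sketched in plain Python. This is a toy analogy with made-up names (`PyObject`, `py_import`), not Swift's actual implementation, which uses the @dynamicMemberLookup and @dynamicCallable language features; the standard-library math module stands in for NumPy.]

```python
import importlib

class PyObject:
    """Toy stand-in for Swift's PythonObject: every member lookup and
    call is deferred to the Python interpreter at runtime rather than
    resolved statically."""

    def __init__(self, value):
        self._value = value

    def __getattr__(self, name):
        # "Hey Python, what's the .sqrt member in that object?"
        return PyObject(getattr(self._value, name))

    def __call__(self, *args, **kwargs):
        # "Call it with these arguments" -- again via the interpreter.
        return PyObject(self._value(*args, **kwargs))

def py_import(name):
    # Analogous to `let np = Python.import("numpy")` in Swift.
    return PyObject(importlib.import_module(name))

math_mod = py_import("math")   # math stands in for numpy here
result = math_mod.sqrt(16.0)   # dynamic member lookup, then a dynamic call
print(result._value)           # 4.0
```

In Swift the same shape appears as `np.array([...])` on an imported module, with each lookup and call resolved by asking the interpreter at runtime, just as described above.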
- So you're now at Google doing incredible work 00:51:47.660 |
So TensorFlow 2.0 or whatever leading up to 2.0 00:51:56.780 |
And yet, in order to make code optimized for GPU or TPU, 00:52:03.380 |
the computation needs to be converted to a graph. 00:52:08.940 |
- Yeah, so I'm tangentially involved in this, 00:52:15.220 |
But the way that it works is that you mark your function with a decorator. 00:52:21.580 |
And when Python calls it, that decorator is invoked. 00:52:24.220 |
And then it says, before I call this function, 00:52:31.620 |
as far as I understand, it actually uses the Python parser 00:52:34.420 |
to go parse that, turn it into a syntax tree, 00:52:53.020 |
and if it sees a multiplication, it says, well, I'll turn that into a multiply node in the graph, 00:52:57.700 |
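[Editor's note: that parse-and-rebuild step can be sketched with Python's own `ast` module. This is a toy illustration of the idea, not TensorFlow's actual AutoGraph code; the `model` function and `trace_ops` helper are hypothetical.]

```python
import ast

# Toy stand-in for a user's decorated function (kept as a source string
# so the sketch doesn't depend on reading source from disk).
source = """
def model(x, w, b):
    return x * w + b
"""

def trace_ops(src):
    """Parse the source with the real Python parser and collect the
    arithmetic ops a graph builder would turn into graph nodes."""
    tree = ast.parse(src)
    op_names = {ast.Mult: "multiply", ast.Add: "add"}
    return [op_names.get(type(node.op), "unknown")
            for node in ast.walk(tree)
            if isinstance(node, ast.BinOp)]

print(sorted(trace_ops(source)))  # ['add', 'multiply']
```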
- So where does the Swift for TensorFlow come in? 00:53:09.220 |
but it seems like there's a lot more going on 00:53:14.900 |
- Yeah, so the TensorFlow world has a couple of different, 00:53:21.180 |
And so Swift and Python and Go and Rust and Julia 00:53:25.260 |
and all these things share the TensorFlow graphs 00:53:29.300 |
and all the runtime and everything that's later. 00:53:32.700 |
And so Swift for TensorFlow is merely another front-end 00:53:36.660 |
for TensorFlow, just like any of these other systems are. 00:53:42.700 |
There are, I would say, three camps of technologies here. 00:53:46.820 |
There's Python, because the vast majority of the community effort 00:53:54.520 |
is going there, and it has its own APIs and all this kind of stuff. 00:53:57.140 |
There's Swift, which I'll talk about in a second. 00:54:01.920 |
And so the everything-else camp is effectively language bindings. 00:54:08.840 |
they usually don't have automatic differentiation 00:54:10.920 |
or they usually don't provide anything other than APIs 00:54:26.660 |
let's look at all the problems that need to be solved 00:54:29.040 |
in the full stack of the TensorFlow compilation process, 00:54:35.680 |
Because TensorFlow is fundamentally a compiler. 00:54:38.180 |
It takes models, and then it makes them go fast on hardware. 00:55:02.120 |
But the design principle is asking, 00:55:09.760 |
what is the best possible way we can do that, 00:55:15.920 |
And Python, for example, where the vast majority 00:55:22.460 |
is constrained by being the best possible thing 00:55:32.560 |
They added a matrix multiplication operator with that, 00:55:41.200 |
but you can add language features to the language, 00:55:49.720 |
and ask: how do you shift the balance between the human programmer and the compiler? 00:55:51.980 |
And Swift has a number of things that shift that balance. 00:55:55.280 |
So because it has a type system, for example, 00:56:00.280 |
it makes certain things possible for analysis of the code, 00:56:03.280 |
and the compiler can automatically build graphs for you 00:56:11.640 |
you get clustering and fusion and optimization, 00:56:16.120 |
without you as a programmer having to manually do it 00:56:20.040 |
Automatic differentiation is another big deal. 00:56:34.560 |
People doing a tremendous amount of numerical computing 00:57:05.080 |
but they're very difficult to port into a world 00:57:11.240 |
Like you need to be able to look at an entire function 00:57:34.880 |
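[Editor's note: the underlying math of automatic differentiation can be sketched with dual numbers, a forward-mode formulation. This shows only the technique itself; Swift for TensorFlow's actual approach is a compiler-level transformation over whole functions, as described above.]

```python
class Dual:
    """A value paired with its derivative; arithmetic applies the
    sum and product rules automatically."""

    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the derivative with 1.0 and read it back out of the result.
    return f(Dual(x, 1.0)).deriv

# d/dx (3x^2 + 2x) at x = 4 is 6x + 2 = 26
print(derivative(lambda x: 3 * x * x + 2 * x, 4.0))  # 26.0
```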
that's kind of interesting is TPUs at Google. 00:57:37.840 |
- So we're in a new world with deep learning. 00:57:45.020 |
I imagine you're still innovating on the TPU front too. 00:58:07.760 |
And as you might imagine, we're not out of ideas yet. 00:58:20.800 |
certain classes of machine learning problems? 00:58:41.520 |
or like whatever it is that you're optimizing for. 00:58:50.000 |
BFloat16 is a compressed 16-bit floating point format, 00:58:57.120 |
it has a smaller mantissa and a larger exponent. 00:59:02.980 |
but it can represent larger ranges of values, 00:59:08.620 |
because sometimes you have very small gradients 00:59:16.440 |
that are important to move things as you're learning, 00:59:20.560 |
but sometimes you have very large magnitude numbers as well. 00:59:23.240 |
And BFloat16 is not as precise, the mantissa is small, 00:59:28.200 |
but it turns out the machine learning algorithms 00:59:35.520 |
don't need that precision, and it can even improve the ability for the network to generalize across datasets. 00:59:41.160 |
it's much cheaper at the hardware level to implement 00:59:48.100 |
because the size of a multiplier circuit is N squared in the number of bits in the mantissa, 01:00:01.040 |
and people working on optimizing network transport 01:00:12.160 |
and it's a key part of what makes TPU performance so amazing 01:00:20.720 |
but the co-design between the low-level compiler bits 01:00:28.680 |
And it's this amazing trifecta that only Google can do. 01:00:41.440 |
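[Editor's note: the trade-off described above can be sketched by truncating a float32 bit pattern down to bfloat16's layout: 1 sign bit, the same 8 exponent bits as float32, and only 7 mantissa bits. A rough software illustration, not how TPU hardware implements it.]

```python
import struct

def to_bfloat16_bits(x):
    """Keep the sign, the full float32 exponent, and the top 7 mantissa
    bits; round-to-nearest-even on the 16 bits that get dropped."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bias = 0x7FFF + ((bits >> 16) & 1)  # rounding bias
    return ((bits + bias) >> 16) & 0xFFFF

def from_bfloat16_bits(b16):
    """Widen back to float32 by zero-filling the dropped mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", b16 << 16))[0]

def roundtrip(x):
    return from_bfloat16_bits(to_bfloat16_bits(x))

# Same dynamic range as float32: a tiny gradient survives instead of
# flushing to zero as it would in IEEE float16 (min normal ~6e-5).
print(roundtrip(1e-30))       # still approximately 1e-30
# But much less precision: 7 mantissa bits, roughly 2-3 decimal digits.
print(roundtrip(3.14159265))  # 3.140625
```

Because the multiplier area grows with the square of the mantissa width, shrinking the mantissa from 23 bits to 7 is what makes the hardware so much cheaper.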
- Yeah, so MLIR is a project that we announced 01:00:43.600 |
at a compiler conference three weeks ago or something, 01:00:47.760 |
the Compilers for Machine Learning Conference. 01:00:53.720 |
it has a number of compiler algorithms within it. 01:00:56.140 |
It also has a number of compilers that get embedded into it 01:01:08.640 |
There's a number of these different compiler systems 01:01:13.800 |
and they're trying to solve different parts of the problems, 01:01:22.880 |
and it has these different code generation technologies 01:01:26.400 |
The idea of MLIR is to build a common infrastructure 01:01:34.840 |
and they can share a lot more code and can be reusable. 01:01:38.680 |
we hope that the industry will start collaborating 01:01:47.280 |
working together to solve common problems, the kind of energy 01:01:51.360 |
that has been useful in the compiler field before. 01:01:56.360 |
some people have joked that it's kind of LLVM 2.0. 01:01:59.240 |
It learns a lot from what LLVM has been good at. 01:02:06.400 |
And also there are challenges in the LLVM ecosystem as well, 01:02:09.800 |
where LLVM is very good at the thing it was designed to do, 01:02:15.520 |
and people are trying to solve higher level problems 01:02:20.280 |
- And what's the future of open source in this context? 01:02:27.440 |
- It's not open source yet, but it will be, hopefully in the next couple of months. 01:02:29.320 |
- So you still believe in the value of open source 01:02:32.680 |
And I think that the TensorFlow community at large 01:02:37.640 |
- So I mean, there is a difference between Apple, 01:02:43.480 |
And I would say the open sourcing of TensorFlow 01:02:45.440 |
was a seminal moment in the history of software 01:02:51.000 |
a big company releasing a very large code base as open source. 01:03:02.880 |
- So between the two, I prefer the Google approach, 01:03:11.000 |
given the historical context that Apple came from, 01:03:15.720 |
And I think that Apple is definitely adapting. 01:03:18.160 |
And the way I look at it is that there's different kinds 01:03:28.680 |
That fundamentally is what a business is about, right? 01:03:31.600 |
But I think it's also incredibly realistic to say 01:03:36.120 |
it's not your string library that's gonna make you money. 01:03:38.040 |
It's gonna be the amazing UI product differentiating features 01:03:42.920 |
that you build on top of your string library. 01:03:45.200 |
And so keeping your string library proprietary and secret 01:03:51.000 |
is maybe not the important thing anymore, right? 01:03:54.680 |
Where before platforms were different, right? 01:03:57.680 |
And even 15 years ago, things were a little bit different, 01:04:02.840 |
So Google strikes a very good balance, I think. 01:04:08.600 |
Open sourcing TensorFlow really changed the entire machine learning field 01:04:14.000 |
And so I think it's amazingly forward-looking 01:04:25.400 |
"Machine learning is critical to what we're doing. 01:04:27.520 |
"We're not gonna give it to other people," right? 01:04:29.600 |
And so that decision is a profoundly brilliant insight 01:04:34.600 |
that I think has really led to the world being better 01:05:00.640 |
But on the other hand, I think that open sourcing TensorFlow 01:05:03.920 |
And I'm sure that decision was very non-obvious at the time, 01:05:26.080 |
that's one of the bravest engineering decisions 01:05:31.880 |
and undertakings, really, ever in the automotive industry, 01:05:39.240 |
So my one question there is, what was that like? 01:05:45.760 |
from a comfortable good job into the unknown, or? 01:05:51.520 |
you making that decision, and then when you show up, 01:05:56.320 |
you know, it's a really hard engineering problem. 01:06:03.640 |
say hardware one, or those kinds of decisions. 01:06:06.720 |
Just taking it full on, let's do this from scratch. 01:06:11.120 |
- Well, so, I mean, I don't think Tesla has a culture 01:06:13.240 |
of taking things slow and seeing how it goes. 01:06:15.720 |
So, and one of the things that attracts me about Tesla 01:06:18.080 |
is it's very much a gung-ho, let's change the world, 01:06:21.520 |
And so I have a huge amount of respect for that. 01:06:29.400 |
and hardware one was originally designed 01:06:32.760 |
to provide very simple automation features in the car 01:06:37.320 |
for like traffic-aware cruise control and things like that. 01:06:39.840 |
And the fact that they were able to effectively 01:06:49.280 |
particularly given the details of the hardware. 01:06:54.640 |
And the challenge there was that they were transitioning 01:07:01.720 |
And so for the first step, which I mostly helped with, 01:07:10.880 |
And it was time critical for various reasons, 01:07:15.000 |
but it was fortunate that it built on a lot of the knowledge 01:07:33.440 |
Elon Musk continues to do some of the most bold 01:07:35.920 |
and innovative engineering work in the world. 01:07:45.840 |
- Yeah, so I guess I would say that when I was at Tesla, 01:07:50.520 |
I experienced and saw the highest degree of turnover 01:07:54.440 |
I'd ever seen in a company, which was a bit of a shock. 01:07:57.280 |
But one of the things I learned and I came to respect 01:08:00.520 |
is that Elon's able to attract amazing talent 01:08:03.400 |
because he has a very clear vision of the future 01:08:11.880 |
that I have a tremendous amount of respect for. 01:08:14.240 |
And I think that Elon is fairly singular in the world 01:08:17.640 |
in terms of the things he's able to get people to believe in. 01:08:24.000 |
there may be people that stand on a street corner 01:08:29.400 |
But then there are a few people that can get others 01:08:41.040 |
but I have a huge amount of respect for that. 01:08:57.080 |
of having to really sort of put everything you have 01:09:05.040 |
So working hard can be defined a lot of different ways. 01:09:14.500 |
is both being short-term focused on delivering 01:09:24.400 |
Because if you are myopically focused on solving a task 01:09:29.660 |
and only think about that incremental next step, 01:09:32.560 |
you will miss the next big hill you should jump over to. 01:09:38.040 |
that I've been able to kind of oscillate between the two. 01:09:45.640 |
that was made possible because I was able to work 01:09:47.480 |
with some really amazing people and build up teams 01:09:57.120 |
thereby freeing up me to be a little bit crazy 01:10:19.860 |
There's different theories on work-life balance 01:10:26.980 |
I wanna love what I'm doing and work really hard. 01:10:46.800 |
have connotations of power, speed, intelligence. 01:10:55.180 |
What is your favorite dragon-related character 01:11:01.460 |
- So those are all very kind ways of explaining it. 01:11:03.820 |
Do you wanna know the real reason it's a dragon? 01:11:12.500 |
And so this is a really old now book on compilers. 01:11:22.060 |
we kept talking about LLVM-related technologies 01:11:28.460 |
And somebody's like, "Well, what kind of logo 01:11:33.260 |
"I mean, the dragon is the best thing that we've got." 01:11:37.300 |
And Apple somehow magically came up with the logo 01:11:50.180 |
Are there dragons from fiction that you connect with? 01:12:09.940 |
- And hilariously, one of the funny things about LLVM 01:12:20.860 |
and is trying to get more women involved 01:12:26.060 |
and interested in compilers and things like this. 01:12:34.300 |
And so sometimes culture has this helpful effect 01:12:36.860 |
to get the next generation of compiler engineers