
Chris Lattner: Future of Programming and AI | Lex Fridman Podcast #381


Chapters

0:00 Introduction
2:20 Mojo programming language
12:37 Code indentation
21:04 The power of autotuning
30:54 Typed programming languages
47:38 Immutability
59:56 Distributed deployment
94:23 Mojo vs CPython
110:12 Guido van Rossum
117:13 Mojo vs PyTorch vs TensorFlow
120:37 Swift programming language
126:09 Julia programming language
131:14 Switching programming languages
140:40 Mojo playground
145:30 Jeremy Howard
156:16 Function overloading
164:41 Error vs Exception
172:21 Mojo roadmap
185:23 Building a company
197:09 ChatGPT
203:32 Danger of AI
207:27 Future of programming
210:43 Advice for young people

Whisper Transcript

00:00:00.000 | on one axis you have more hardware coming in,
00:00:01.760 | on the other hand you have an explosion of innovation in AI.
00:00:05.680 | And so what happened with both TensorFlow and PyTorch
00:00:07.480 | is that the explosion of innovation in AI
00:00:10.400 | has led to, it's not just about matrix multiplication
00:00:12.960 | and convolution, these things have now
00:00:14.520 | like 2,000 different operators.
00:00:17.200 | And on the other hand you have,
00:00:18.280 | I don't know how many pieces of hardware there are,
00:00:20.320 | it's a lot.
00:00:21.220 | Part of my thesis, part of my belief of where computing goes
00:00:24.280 | if you look out 10 years from now,
00:00:26.320 | is it's not gonna get simpler.
00:00:28.520 | Physics isn't going back to where we came from,
00:00:30.800 | it's only gonna get weirder from here on out.
00:00:33.480 | And so to me, the exciting part about what we're building
00:00:36.880 | is it's about building that universal platform
00:00:40.200 | in which the world can continue to get weird,
00:00:42.920 | 'cause again I don't think it's avoidable, it's physics,
00:00:45.400 | but we can help lift people's scale, do things with it,
00:00:48.360 | and they don't have to rewrite their code
00:00:49.320 | every time a new device comes out.
00:00:51.160 | And I think that's pretty cool.
00:00:52.720 | - The following is a conversation with Chris Lattner.
00:00:57.400 | His third time on this podcast.
00:00:59.560 | As I've said many times before,
00:01:01.600 | he's one of the most brilliant engineers
00:01:03.560 | in modern computing.
00:01:04.800 | Having created the LLVM compiler infrastructure project,
00:01:07.960 | the Clang Compiler, the Swift Programming Language,
00:01:10.920 | a lot of key contributions to TensorFlow and TPUs
00:01:13.600 | as part of Google.
00:01:14.720 | He served as Vice President of Autopilot Software at Tesla,
00:01:18.880 | was a software innovator and leader at Apple,
00:01:21.880 | and now he co-created a new full stack AI infrastructure
00:01:26.960 | for distributed training, inference, and deployment
00:01:29.760 | on all kinds of hardware, called Modular.
00:01:32.720 | And a new programming language called Mojo
00:01:36.600 | that is a superset of Python,
00:01:38.320 | giving you all the usability of Python,
00:01:40.680 | but with the performance of C and C++.
00:01:43.640 | In many cases, Mojo code has demonstrated
00:01:46.800 | over 30,000x speedup over Python.
00:01:51.800 | If you love machine learning, if you love Python,
00:01:54.560 | you should definitely give Mojo a try.
00:01:56.920 | This programming language, this new AI framework
00:02:00.240 | and infrastructure, and this conversation with Chris
00:02:03.720 | is mind blowing.
00:02:05.760 | I love it.
00:02:07.400 | It gets pretty technical at times,
00:02:08.840 | so I hope you hang on for the ride.
00:02:11.600 | This is the Lex Fridman Podcast.
00:02:13.480 | To support it, please check out our sponsors
00:02:15.520 | in the description.
00:02:16.680 | And now, dear friends, here's Chris Lattner.
00:02:19.960 | - It's been, I think, two years since we last talked,
00:02:23.160 | and in that time, you somehow went
00:02:25.760 | and co-created a new programming language called Mojo.
00:02:29.880 | So it's optimized for AI.
00:02:31.680 | It's a superset of Python.
00:02:33.520 | Let's look at the big picture.
00:02:35.040 | What is the vision for Mojo?
00:02:37.840 | - For Mojo?
00:02:38.680 | Well, so I mean, I think you have to zoom out.
00:02:40.080 | So I've been working on a lot of related technologies
00:02:43.200 | for many, many years.
00:02:44.240 | So I've worked on LLVM and a lot of things,
00:02:46.480 | and mobile and servers and things like this,
00:02:50.240 | but the world's changing.
00:02:51.480 | And what's happened with AI is we have new GPUs
00:02:54.040 | and new machine learning accelerators
00:02:56.680 | and other ASICs and things like that
00:02:58.160 | that make AI go real fast.
00:03:00.760 | At Google, I worked on TPUs.
00:03:02.440 | That's one of the biggest, largest-scale deployed systems
00:03:05.160 | that exist for AI.
00:03:06.360 | And really what you see is if you look across
00:03:09.240 | all of the things that are happening in the industry,
00:03:10.760 | there's this new compute platform coming.
00:03:12.760 | And it's not just about CPUs or GPUs or TPUs or NPUs
00:03:17.600 | or IPUs or whatever, all the PUs, right?
00:03:20.920 | It's about how do we program these things, right?
00:03:23.520 | And so for software folks like us, right,
00:03:27.000 | it doesn't do us any good if there's this amazing hardware
00:03:29.600 | that we can't use.
00:03:31.080 | And one of the things you find out really quick
00:03:33.000 | is that having the theoretical capability
00:03:35.720 | of programming something and then having the world's power
00:03:39.040 | and the innovation of all the smart people in the world
00:03:42.080 | get unleashed on something can be quite different.
00:03:44.960 | And so really where Mojo came from was starting
00:03:47.560 | from a problem of we need to be able
00:03:49.760 | to take machine learning,
00:03:50.960 | take the infrastructure underneath it
00:03:52.200 | and make it way more accessible, way more usable,
00:03:54.840 | way more understandable by normal people and researchers
00:03:57.840 | and other folks that are not themselves like experts
00:04:00.880 | in GPUs and things like this.
00:04:02.640 | And then through that journey, we realized,
00:04:04.880 | hey, we need syntax for this.
00:04:05.880 | We need to do a programming language.
00:04:07.560 | - So one of the main features of the language,
00:04:10.480 | I say so fully in jest, is that it allows you
00:04:14.160 | to have the file extension to be an emoji
00:04:18.600 | or the fire emoji, which is one of the first emojis
00:04:23.600 | used as a file extension I've ever seen in my life.
00:04:27.280 | And then you ask yourself the question,
00:04:29.120 | why in the 21st century are we not using Unicode
00:04:32.680 | for file extensions?
00:04:34.600 | This, I mean, it's an epic decision.
00:04:36.520 | I think clearly the most important decision
00:04:38.560 | you've made. But you could also just use Mojo
00:04:41.120 | as the file extension.
00:04:42.280 | - Well, so, okay, so take a step back.
00:04:43.800 | I mean, come on, Lex, do you think
00:04:44.640 | that the world's ready for this?
00:04:45.800 | This is a big moment in the world, right?
00:04:47.720 | This is, we're releasing this onto the world.
00:04:49.440 | - This is innovation.
00:04:50.600 | - I mean, it really is kind of brilliant.
00:04:53.640 | Emojis are such a big part of our daily lives.
00:04:58.120 | Why is it not in programming?
00:05:00.120 | - Well, and like you take a step back
00:05:01.720 | and look at what file extensions are, right?
00:05:04.720 | They're basically metadata, right?
00:05:06.560 | And so why are we spending all the screen space on them
00:05:08.840 | and all this stuff?
00:05:09.800 | Also, you have them stacked up next to text files
00:05:12.280 | and PDF files and whatever else.
00:05:13.840 | Like if you're gonna do something cool,
00:05:15.240 | you want it to stand out, right?
00:05:16.640 | Emojis are colorful, they're visual, they're beautiful.
00:05:19.960 | - What's been the response so far from,
00:05:22.560 | is there a support on like Windows on operating systems
00:05:25.400 | in displaying like file explorer?
00:05:27.040 | - Yeah, yeah, yeah.
00:05:27.880 | The one problem I've seen is that Git
00:05:29.320 | doesn't escape it right.
00:05:31.440 | And so it thinks that the fire emoji is unprintable
00:05:33.560 | and so it like prints out weird hex things
00:05:35.240 | if you use the command line Git tool.
00:05:37.080 | But everything else as far as I'm aware works fine.
00:05:39.520 | And I have faith that Git can be improved.
00:05:41.560 | So I'm not worried. - And so GitHub is fine.
00:05:43.960 | - GitHub is fine, yep.
00:05:45.320 | GitHub is fine, Visual Studio Code, Windows,
00:05:47.600 | like all this stuff, totally ready
00:05:49.520 | because people have internationalization
00:05:51.680 | as a normal part of their paths.
00:05:54.280 | So this is just like taking the next step, right?
00:05:56.840 | Somewhere between, oh wow, that makes sense, cool.
00:06:00.120 | I like new things, to oh my God, you're killing my baby.
00:06:03.560 | Like what are you talking about?
00:06:04.560 | This can never be, like I can never handle this.
00:06:06.560 | How am I gonna type this?
00:06:08.320 | Like all these things.
00:06:09.160 | And so this is something where I think
00:06:11.440 | that the world will get there.
00:06:12.640 | We don't have to bet the whole farm on this.
00:06:14.520 | I think we can provide both paths.
00:06:17.320 | But I think it'll be great.
00:06:18.800 | - When can we have emojis as part of the code, I wonder?
00:06:22.360 | - Yeah, so I mean, lots of languages provide that.
00:06:24.160 | So I think that we have partial support for that.
00:06:26.720 | It's probably not fully done yet.
00:06:27.960 | But yeah, you can do that.
00:06:30.400 | For example, in Swift, you can do that for sure.
00:06:32.440 | So an example we gave at Apple was the dog cow.
00:06:37.440 | So that's a classical Mac heritage thing.
00:06:39.840 | And so you use the dog and the cow emoji together
00:06:41.600 | and that could be your variable name.
00:06:42.640 | But of course the internet went
00:06:44.400 | and made pile of poop for everything.
00:06:46.880 | So if you wanna name your function pile of poop,
00:06:49.240 | then you can totally go to town
00:06:50.960 | and see how that gets through code review.
00:06:52.720 | (laughing)
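As an aside on how far mainstream languages already go here: Python itself accepts Unicode *letters* as identifiers (per its identifier rules), but not emoji, so Swift is the more permissive one in the dog-cow example above. A quick sketch to check this directly:

```python
# Python 3 identifiers may use Unicode letters, so Greek names are legal...
π = 3.14159

# ...but emoji are not in the identifier character classes, so compiling
# "🔥 = 1" fails with a SyntaxError. We probe this with compile() rather
# than writing the emoji assignment inline.
try:
    compile("🔥 = 1", "<probe>", "exec")
    emoji_ok = True
except SyntaxError:
    emoji_ok = False

print(π, emoji_ok)  # 3.14159 False
```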
00:06:54.280 | - Okay, so let me just ask a bunch of random questions.
00:06:58.480 | So is Mojo primarily designed for AIs
00:07:01.920 | or is it a general purpose programming language?
00:07:03.760 | - Yeah, good question.
00:07:04.600 | So it's AI first.
00:07:05.960 | And so AI is driving a lot of the requirements.
00:07:07.960 | And so Modular is building and designing
00:07:11.840 | and driving Mojo forward.
00:07:13.080 | And it's not because it's an interesting project
00:07:15.360 | theoretically to build, it's because we need it.
00:07:18.320 | So at Modular, we're really tackling
00:07:20.840 | the AI infrastructure landscape and the big problems in AI.
00:07:24.920 | The reasons that it is so difficult to use and scale
00:07:27.120 | and adopt and deploy and like all these big problems in AI.
00:07:31.400 | And so we're coming at it from that perspective.
00:07:33.440 | Now, when you do that,
00:07:34.960 | when you start tackling these problems,
00:07:36.120 | you realize that the solution to these problems
00:07:40.360 | isn't actually an AI specific solution.
00:07:43.160 | And so while we're doing this,
00:07:44.080 | we're building Mojo to be a fully general programming
00:07:45.800 | language.
00:07:46.640 | And that means that you can obviously tackle GPUs and CPUs
00:07:51.160 | and like these AI things,
00:07:52.360 | but it's also a really great way to build NumPy
00:07:55.400 | and other things like that.
00:07:56.520 | Or, you know, just if you look at what many Python libraries
00:07:59.720 | are today, often they're a layer of Python for the API
00:08:02.680 | and they end up being C and C++ code underneath them.
00:08:05.680 | That's very true in AI,
00:08:07.280 | that's true in lots of other domains as well.
00:08:09.240 | And so anytime you see this pattern,
00:08:10.600 | that's an opportunity for Mojo to help simplify the world
00:08:13.160 | and help people have one thing.
00:08:15.080 | - So optimize through simplification by having one thing.
00:08:20.520 | So you mentioned modular,
00:08:21.920 | Mojo is the programming language,
00:08:23.200 | modular is the whole software stack.
00:08:25.720 | - So just over a year ago,
00:08:26.840 | we started this company called Modular.
00:08:28.720 | Okay, what Modular is about is,
00:08:30.360 | it's about taking AI and up-leveling it
00:08:33.080 | into the next generation, right?
00:08:34.560 | And so if you take a step back,
00:08:37.560 | what's gone on in the last five, six, seven, eight years
00:08:40.400 | is that we've had things like TensorFlow and PyTorch
00:08:43.520 | and these other systems come in,
00:08:44.760 | you've used them, you know this.
00:08:46.280 | And what's happened is these things have grown like crazy.
00:08:49.680 | And they get tons of users,
00:08:51.040 | it's in production deployment scenarios,
00:08:53.280 | it's being used to power so many systems.
00:08:55.520 | I mean, AI is all around us now.
00:08:57.560 | It used to be controversial years ago,
00:08:59.440 | but now it's a thing.
00:09:00.720 | But the challenge with these systems
00:09:02.760 | is that they haven't always been thought out
00:09:06.600 | with current demands in mind.
00:09:08.920 | And so you think about it,
00:09:10.280 | where were LLMs eight years ago?
00:09:13.560 | Well, they didn't exist, right?
00:09:14.760 | AI has changed so much.
00:09:16.120 | And a lot of what people are doing today
00:09:17.720 | are very different than when these systems were built.
00:09:20.080 | And meanwhile, the hardware side of this
00:09:21.560 | has gotten into a huge mess.
00:09:23.200 | There's tons of new chips and accelerators
00:09:24.960 | and every big company's announcing a new chip every day,
00:09:27.360 | it feels like.
00:09:28.440 | And so between that,
00:09:29.840 | you have like this moving system on one side,
00:09:33.640 | a moving system on the other side,
00:09:34.920 | and it just turns into this gigantic mess,
00:09:36.960 | which makes it very difficult for people to actually use AI,
00:09:40.200 | particularly in production deployment scenarios.
00:09:42.600 | And so what Modular's doing is we're helping
00:09:44.360 | build out that software stack
00:09:45.520 | to help solve some of those problems,
00:09:47.000 | so then people can be more productive
00:09:48.440 | and get more AI research into production.
00:09:50.960 | Now, what Mojo does is it's a really, really,
00:09:53.600 | really important piece of that.
00:09:55.240 | And so that is part of that engine
00:09:57.560 | and part of the technology
00:09:58.400 | that allows us to solve these problems.
00:10:00.640 | - So Mojo is a programming language
00:10:02.400 | that allows you to do the higher level programming,
00:10:06.400 | the low level programming,
00:10:07.960 | they do all kinds of programming in that spectrum
00:10:11.800 | that gets you closer and closer to the hardware.
00:10:14.080 | - So take a step back.
00:10:14.920 | So Lex, what do you love about Python?
00:10:17.040 | - Oh boy, where do I begin?
00:10:19.920 | What is love?
00:10:22.320 | What do I love about Python?
00:10:23.640 | - You're a guy who knows love, I know this.
00:10:25.360 | - Yes.
00:10:26.200 | How intuitive it is.
00:10:31.000 | How it feels like I'm writing natural language English.
00:10:33.840 | How when I can not just write,
00:10:39.040 | but read other people's code,
00:10:40.280 | somehow I can understand it faster.
00:10:42.240 | It's more condensed than other languages,
00:10:46.360 | like ones I'm really familiar with,
00:10:48.160 | like C++ and C.
00:10:51.200 | There's a bunch of sexy little features.
00:10:53.600 | We'll probably talk about some of them,
00:10:56.520 | but list comprehensions and stuff like this.
00:10:59.880 | - Well, and don't forget the entire ecosystem
00:11:02.680 | of all the packages.
00:11:03.520 | - Oh yeah, that's probably huge.
00:11:04.600 | - 'Cause there's always something,
00:11:05.440 | if you wanna do anything, there's always a package.
00:11:07.840 | - Yeah, so it's not just the ecosystem of the packages,
00:11:12.240 | it's also the ecosystem of the humans that do it.
00:11:14.520 | That's a really, that's an interesting dynamic.
00:11:17.600 | - That's huge.
00:11:18.440 | - I think something about the usability
00:11:22.200 | and the ecosystem makes the thing viral,
00:11:24.120 | it grows, and it's a virtuous cycle.
00:11:26.920 | - Well, and there's many things that went into that.
00:11:28.680 | So I think that ML was very good for Python,
00:11:31.400 | and so I think that TensorFlow and PyTorch
00:11:33.520 | and these systems embracing Python
00:11:35.400 | really took and helped Python grow.
00:11:38.040 | But I think that the major thing underlying it
00:11:40.520 | is that Python's like the universal connector.
00:11:42.880 | It really helps bring together lots of different systems
00:11:45.920 | so you can compose them and build out larger systems
00:11:48.200 | without having to understand how it works.
00:11:50.320 | But then what is the problem with Python?
00:11:53.200 | - Well, I guess you could say several things,
00:11:54.960 | but probably that it's slow.
00:11:57.240 | I think that's usually what people complain about.
00:11:59.440 | And so, I mean, other people would complain about tabs
00:12:02.600 | and spaces versus curly braces or whatever,
00:12:04.720 | but I mean, those people are just wrong
00:12:07.280 | 'cause it is actually just better to use indentation.
00:12:10.080 | - Wow, strong words.
00:12:12.960 | So actually, on a small tangent, let's actually take that.
00:12:15.840 | Let's take all kinds of tangents.
00:12:17.400 | - Oh, come on, Lex, you can push me on it.
00:12:18.760 | I can take it.
00:12:19.600 | - Design, listen, I've recently left Emacs for VS Code,
00:12:24.560 | the kind of hate mail I had to receive
00:12:26.880 | because on the way to doing that,
00:12:28.160 | I also said I've considered Vim
00:12:30.600 | and chose not to and went with VS Code.
00:12:33.680 | - You're touching on deep religions, right?
00:12:36.560 | - Anyway, tabs is an interesting design decision.
00:12:39.400 | And so you've really written a new programming language here.
00:12:42.960 | Yes, it is a super set of Python,
00:12:45.880 | but you can make a bunch of different
00:12:47.240 | interesting decisions here.
00:12:48.760 | And you chose actually to stick with Python
00:12:51.480 | as in terms of some of the syntax.
00:12:55.360 | - Well, so let me explain why.
00:12:56.960 | So I mean, you can explain this in many rational ways.
00:13:01.960 | I think that the indentation is beautiful,
00:13:04.640 | but that's not a rational explanation,
00:13:07.000 | but I can defend it rationally.
00:13:08.400 | So first of all, Python won. It has millions of programmers.
00:13:12.680 | It is huge, it's everywhere.
00:13:13.920 | It owns machine learning.
00:13:15.120 | So factually, it is the thing.
00:13:18.560 | Second of all, if you look at it,
00:13:20.400 | C code, C++ code, Java, whatever, Swift,
00:13:23.480 | curly brace languages also run through formatting tools
00:13:27.120 | and get indented.
00:13:28.120 | And so if they're not indented correctly,
00:13:31.880 | first of all, it will twist your brain around.
00:13:34.160 | It can lead to bugs.
00:13:35.120 | There's notorious bugs that have happened across time
00:13:37.520 | where the indentation was wrong or misleading
00:13:40.000 | and it wasn't formatted right.
00:13:41.160 | And so it turned into an issue, right?
00:13:43.600 | And so what ends up happening in modern large scale
00:13:46.280 | code bases is people run automatic formatters.
00:13:49.200 | So now what you end up with is indentation
00:13:51.040 | and curly braces.
00:13:53.160 | Well, if you're gonna have the notion of grouping,
00:13:58.160 | why not have one thing, right?
00:14:00.320 | And get rid of all the clutter
00:14:01.240 | and have a more beautiful thing, right?
00:14:02.760 | Also, you look at many of these languages,
00:14:04.000 | it's like, okay, well, you can have curly braces
00:14:06.120 | or you can omit them if there's one statement
00:14:08.080 | or you just like enter this entire world
00:14:09.600 | of complicated design space that objectively you don't need
00:14:13.120 | if you have Python style indentation.
00:14:15.760 | - Yeah, I would love to actually see statistics
00:14:17.320 | on errors made because of indentation.
00:14:19.840 | Like how many errors are made in Python versus in C++
00:14:23.600 | that have to do with basic formatting,
00:14:25.520 | all that kind of stuff.
00:14:26.360 | I would love to see.
00:14:27.200 | - I think it's probably pretty minor
00:14:28.680 | because once you get, like you use VS Code, I do too.
00:14:32.360 | So if you get VS Code set up,
00:14:33.360 | it does the indentation for you generally, right?
00:14:35.440 | And so you don't, you know, it's actually really nice
00:14:37.440 | to not have to fight it.
00:14:39.360 | And then what you can see is the editor's telling you
00:14:42.000 | how your code will work by indenting it,
00:14:44.040 | which I think is pretty cool.
00:14:45.280 | - I honestly don't think I've ever,
00:14:49.200 | I don't remember having an error in Python
00:14:51.800 | because I indented stuff wrong.
00:14:53.600 | - So, I mean, I think that there's,
00:14:54.680 | again, this is a religious thing.
00:14:55.920 | And so I can joke about it and I love to kind of,
00:14:59.360 | you know, I realize that this is such a polarizing thing
00:15:02.840 | and everybody wants to argue about it.
00:15:03.680 | And so I like poking at the bear a little bit, right?
00:15:06.760 | But frankly, right, come back to the first point, Python 1.
00:15:10.400 | Like it's huge, it's an AI, it's the right thing.
00:15:13.120 | For us, like we see Mojo as being an incredible part
00:15:15.400 | of the Python ecosystem.
00:15:17.080 | We're not looking to break Python or change it
00:15:19.400 | or quote unquote fix it.
00:15:21.280 | We love Python for what it is.
00:15:22.760 | Our view is that Python is just not done yet.
00:15:25.440 | And so if you look at, you know,
00:15:27.320 | you mentioned Python being slow.
00:15:28.760 | Well, there's a couple of different things
00:15:29.920 | that go into that, which we can talk about if you want.
00:15:31.960 | But one of them is it just doesn't have those features
00:15:34.560 | that you would use to do C-like programming.
00:15:37.520 | And so if you say, okay, well, I'm forced out of Python
00:15:40.440 | into C for certain use cases.
00:15:42.920 | Well, then what we're doing is we're saying,
00:15:44.080 | okay, well, why is that?
00:15:45.800 | Can we just add those features that are missing from Python
00:15:48.440 | back up to Mojo?
00:15:50.080 | And then you can have everything that's great about Python,
00:15:52.080 | all the things you're talking about that you love,
00:15:54.240 | plus not be forced out of it
00:15:55.960 | when you do something a little bit
00:15:57.880 | more computationally intense or weird or hardwarey
00:16:01.280 | or whatever it is that you're doing.
00:16:02.960 | - Well, a million questions I wanna ask,
00:16:05.080 | but high level again, is it compiled
00:16:07.280 | or is it an interpreted language?
00:16:08.480 | So Python is just in time compilation.
00:16:11.160 | What's Mojo?
00:16:12.400 | - So Mojo, a complicated answer.
00:16:15.440 | It does all the things.
00:16:16.840 | So it's interpreted, it's JIT compiled,
00:16:18.360 | and it's statically compiled.
00:16:19.800 | And so this is for a variety of reasons.
00:16:24.120 | So one of the things that makes Python beautiful
00:16:26.960 | is that it's very dynamic.
00:16:29.360 | And because it's dynamic, one of the things they added
00:16:32.160 | is that it has this powerful metaprogramming feature.
00:16:35.080 | And so if you look at something like PyTorch or TensorFlow,
00:16:38.080 | I mean, even a simple use case,
00:16:41.640 | you define a class that has the plus method.
00:16:45.440 | You can overload the dunder methods,
00:16:47.200 | like dunder add, for example,
00:16:48.800 | and then the plus method works on your class.
00:16:51.040 | And so it has very nice and very expressive
00:16:53.520 | dynamic metaprogramming features.
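The dunder hook Chris mentions can be sketched in a few lines of plain Python; `Vec` here is a made-up stand-in for a framework's tensor class, not any real API:

```python
# Overloading __add__ is the mechanism that lets `+` work on user-defined
# classes -- the same hook eager-mode ML frameworks use for their tensors.
class Vec:
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # elementwise addition between two Vec instances
        return Vec(a + b for a, b in zip(self.data, other.data))

    def __repr__(self):
        return f"Vec({self.data})"

v = Vec([1, 2, 3]) + Vec([10, 20, 30])
print(v.data)  # [11, 22, 33]
```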
00:16:55.560 | In Mojo, we want all those features to come in.
00:16:58.920 | We don't wanna break Python, we want it all to work.
00:17:00.840 | But the problem is, is you can't run
00:17:02.200 | those super dynamic features on an embedded processor
00:17:06.080 | or on a GPU.
00:17:08.200 | Or if you could, you probably don't want to
00:17:10.760 | just because of the performance.
00:17:11.960 | And so we entered this question of saying,
00:17:14.680 | okay, how do you get the power of this dynamic metaprogramming
00:17:18.880 | into a language that has to be super efficient
00:17:21.800 | in specific cases?
00:17:23.160 | And so what we did was we said,
00:17:24.320 | okay, we'll take that interpreter.
00:17:25.880 | Python has an interpreter in it, right?
00:17:28.040 | Take that interpreter and allow it to run at compile time.
00:17:31.440 | And so now what you get is you get
00:17:32.560 | compile time metaprogramming.
00:17:34.400 | And so this is super interesting and super powerful
00:17:36.600 | because one of the big advantages you get
00:17:39.320 | is you get Python style expressive APIs.
00:17:42.120 | You get the ability to have overloaded operators.
00:17:45.040 | And if you look at what happens inside of like PyTorch,
00:17:47.360 | for example, with automatic differentiation
00:17:49.320 | and eager mode and like all these things,
00:17:51.080 | they're using these really dynamic
00:17:52.640 | and powerful features at runtime.
00:17:54.680 | But we can take those features and lift them
00:17:56.480 | so that they run at compile time.
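Python has no compile-time stage, but decorators run once at module-load time, which gives a rough analogy for "lifting" this dynamic machinery earlier the way Chris describes; `specialize` below is a hypothetical helper invented for illustration, not a Mojo or Python API:

```python
# A decorator that does its metaprogramming work once, when the function
# is defined: it precomputes a lookup table, so later calls just index it.
def specialize(n):
    def wrap(fn):
        # runs at definition time, not on each call -- the loose analogue
        # of Mojo evaluating metaprogramming at compile time
        table = [fn(i) for i in range(n)]
        def fast(i):
            return table[i]
        return fast
    return wrap

@specialize(10)
def square(x):
    return x * x

print(square(7))  # 49, served from the precomputed table
```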
00:17:58.200 | - So you're, 'cause C++ has metaprogramming
00:18:01.320 | with templates, but it's really messy.
00:18:05.400 | - It's super messy.
00:18:06.520 | It's always, it was accidentally,
00:18:09.160 | I mean, different people have different interpretations.
00:18:11.560 | My interpretation is that it was made accidentally powerful.
00:18:14.680 | It was not designed to be Turing complete, for example,
00:18:17.680 | but that was discovered kind of along the way accidentally.
00:18:21.040 | And so there've been a number of languages in the space.
00:18:24.360 | And so they usually have templates
00:18:26.640 | or code instantiation, code copying features
00:18:28.760 | of various sorts.
00:18:29.680 | Some more modern languages or some more newer languages,
00:18:33.840 | let's say like, you know, they're fairly unknown,
00:18:37.480 | like Zig, for example, says, okay, well,
00:18:40.680 | let's take all of those types that you can run it,
00:18:43.840 | all those things you can do at runtime
00:18:45.800 | and allow them to happen at compile time.
00:18:48.200 | And so one of the problems with C++,
00:18:50.520 | I mean, which is one of the problems with C++.
00:18:54.080 | - Here we go.
00:18:54.920 | - Is. - Wrong words.
00:18:55.760 | We're gonna offend everybody today.
00:18:57.040 | - Oh, it's okay.
00:18:57.880 | I mean, everybody hates me for a variety of reasons anyways,
00:18:59.920 | I'm sure, right?
00:19:00.760 | I've written enough.
00:19:02.440 | - Just the way they show love.
00:19:03.440 | - I have written enough C++ code
00:19:05.160 | to earn a little bit of grumpiness with C++,
00:19:07.360 | but one of the problems with it
00:19:10.400 | is that the metaprogramming system templates
00:19:13.000 | is just a completely different universe
00:19:14.680 | from the normal runtime programming world.
00:19:18.200 | And so if you do metaprogramming and programming,
00:19:20.320 | it's just like a different universe,
00:19:21.360 | different syntax, different concepts,
00:19:23.760 | different stuff going on.
00:19:24.760 | And so again, one of our goals with Mojo
00:19:27.160 | is to make things really easy to use, easy to learn.
00:19:29.920 | And so there's a natural stepping stone.
00:19:32.720 | And so as you do this, you say, okay,
00:19:34.120 | well, I have to do programming at runtime,
00:19:36.520 | I have to do programming at compile time.
00:19:39.200 | Why are these different things?
00:19:41.320 | - How hard is that to pull it off?
00:19:42.400 | 'Cause that sounds, to me as a fan of metaprogramming
00:19:45.160 | in C++ even, how hard is it to pull that off?
00:19:49.040 | That sounds really, really exciting
00:19:50.520 | 'cause you can do the same style programming
00:19:52.800 | at compile time and at runtime.
00:19:54.200 | That's really, really exciting.
00:19:55.880 | - Yep, yep.
00:19:56.720 | And so, I mean, in terms of the compiler
00:19:57.800 | implementation details, it's hard.
00:19:59.820 | I won't be shy about that, it's super hard.
00:20:02.880 | It requires, I mean, what Mojo has underneath the covers
00:20:05.760 | is a completely new approach
00:20:06.960 | to the design of the compiler itself.
00:20:09.600 | And so this builds on these technologies
00:20:11.280 | like MLIR that you mentioned,
00:20:13.040 | but it also includes other like caching
00:20:15.200 | and other interpreters and JIT compilers
00:20:18.720 | and other stuff like that.
00:20:19.560 | - So you have like an interpreter inside the compiler?
00:20:20.640 | - Within the compiler, yes.
00:20:22.240 | And so it really takes the standard model
00:20:26.160 | of programming languages and kind of twists it
00:20:30.280 | and unifies it with the runtime model,
00:20:32.360 | which I think is really cool.
00:20:34.080 | And to me, the value of that is that, again,
00:20:36.280 | many of these languages have metaprogramming features,
00:20:38.140 | like they grow macros or something, right?
00:20:40.280 | Lisp, right?
00:20:41.120 | I know your roots, right?
00:20:43.620 | And this is a powerful thing, right?
00:20:46.720 | And so if you go back to Lisp,
00:20:48.440 | one of the most powerful things about it
00:20:50.200 | is that it said that the metaprogramming
00:20:52.160 | and the programming are the same, right?
00:20:54.080 | And so that made it way simpler, way more consistent,
00:20:56.560 | way easier to understand, reason about,
00:20:58.360 | and it made it more composable.
00:20:59.720 | So if you build a library, you can use it
00:21:01.160 | both at runtime and compile time,
00:21:03.440 | which is pretty cool.
00:21:04.360 | - Yeah, and for machine learning,
00:21:05.800 | I think metaprogramming,
00:21:07.040 | I think we could generally say is extremely useful.
00:21:11.800 | And so you get features, I mean, I'll jump around,
00:21:14.680 | but there's the feature of auto-tuning
00:21:17.360 | and adaptive compilation just blows my mind.
00:21:20.840 | - Well, so, okay, so let's come back to that.
00:21:22.080 | - Sure thing, all right.
00:21:23.000 | - So what is machine learning?
00:21:25.160 | Or what is a machine learning model?
00:21:26.480 | Like you take a PyTorch model off the internet, right?
00:21:29.280 | It's really interesting to me because what PyTorch
00:21:32.120 | and what TensorFlow and all these frameworks
00:21:34.160 | are kind of pushing compute into
00:21:35.680 | is they're pushing into like this abstract specification
00:21:39.320 | of a compute problem, which then gets mapped
00:21:41.840 | in a whole bunch of different ways, right?
00:21:43.400 | And so this is why it became a metaprogramming problem.
00:21:45.320 | Is that you wanna be able to say,
00:21:46.880 | cool, I have this neural net,
00:21:48.880 | now run it with batch size 1,000, right?
00:21:51.520 | Do a mapping across batch, or okay,
00:21:54.960 | I wanna take this problem now run it across 1,000 CPUs
00:21:58.160 | or GPUs, right?
00:21:59.600 | And so this problem of describe the compute
00:22:04.040 | and then map it and do things and transform it
00:22:05.920 | are actually, it's very profound.
00:22:08.640 | And that's one of the things
00:22:09.480 | that makes machine learning systems really special.
00:22:12.160 | - Maybe can you describe autotuning and how do you pull off?
00:22:15.640 | I mean, I guess adaptive compilation
00:22:17.480 | is what we're talking about as metaprogramming.
00:22:19.560 | - Yeah. - How do you pull off autotune?
00:22:20.720 | I mean, is that as profound as I think it is?
00:22:23.240 | It just seems like a really,
00:22:24.640 | we mentioned list comprehensions.
00:22:27.720 | To me, from a quick glance at Mojo,
00:22:31.600 | which by the way, I have to absolutely like dive in.
00:22:34.560 | As I realized how amazing this is,
00:22:37.440 | I absolutely must dive in.
00:22:39.080 | That looks like just an incredible feature
00:22:41.800 | for machine learning people.
00:22:43.160 | - Yeah, well, so what is autotuning?
00:22:44.640 | So take a step back.
00:22:46.160 | Autotuning is a feature in Mojo.
00:22:47.640 | It's not, so very little of what we're doing
00:22:50.160 | is actually research.
00:22:51.200 | Like many of these ideas have existed in other systems
00:22:54.320 | and other places.
00:22:55.160 | And so what we're doing is we're pulling together good ideas,
00:22:56.840 | remixing them, and making them into
00:22:59.360 | hopefully a beautiful system, right?
00:23:01.560 | And so autotuning, the observation is that
00:23:04.440 | turns out hardware systems, algorithms
00:23:07.840 | are really complicated.
00:23:09.200 | Turns out maybe you don't actually want to know
00:23:11.160 | how the hardware works, right?
00:23:13.560 | A lot of people don't, right?
00:23:14.720 | And so there are lots of really smart hardware people.
00:23:17.480 | I know a lot of them, where they know everything about,
00:23:20.960 | okay, the cache size is this,
00:23:22.960 | and the number of registers is that.
00:23:24.320 | And if you use this length of vector,
00:23:25.960 | it's gonna be super efficient
00:23:27.160 | because it maps directly onto what it can do.
00:23:28.800 | And like all this kind of stuff,
00:23:30.000 | where the GPU has SMs and it has a warp size of whatever,
00:23:32.840 | right, all the stuff that goes into these things,
00:23:34.840 | or the tile size of a TPU is 128,
00:23:36.760 | like these factoids, right?
00:23:38.800 | My belief is that most normal people,
00:23:43.200 | and I love hardware people also,
00:23:44.440 | I'm not trying to offend literally everybody
00:23:45.960 | in the internet, but most programmers
00:23:49.560 | actually don't want to know this stuff, right?
00:23:51.680 | And so if you come at it from the perspective
00:23:53.360 | of how do we allow people to build
00:23:55.280 | both more abstracted, but also more portable code,
00:23:58.840 | because it could be that the vector length changes,
00:24:01.080 | or the cache size changes,
00:24:02.080 | or it could be that the tile size of your matrix changes,
00:24:04.120 | or the number, an A100 versus an H100
00:24:07.040 | versus a Volta versus a whatever GPU
00:24:09.440 | have different characteristics, right?
00:24:11.240 | A lot of the algorithms that you run are actually the same,
00:24:14.400 | but the parameters, these magic numbers you have to fill in
00:24:17.200 | end up being really fiddly numbers
00:24:18.880 | that an expert has to go figure out.
00:24:21.080 | And so what autotuning does, it says,
00:24:22.600 | okay, well, guess what?
00:24:24.480 | There's a lot of compute out there, right?
00:24:26.920 | So instead of having humans go randomly try all the things,
00:24:29.640 | or do a grid search, or go search some complicated
00:24:31.760 | multi-dimensional space, how about we have computers do that?
00:24:36.040 | Right, and so what autotuning does is you can say,
00:24:37.960 | hey, here's my algorithm.
00:24:40.080 | If it's a matrix operation or something like that,
00:24:43.200 | you can say, okay, I'm gonna carve it up into blocks,
00:24:45.400 | I'm gonna do those blocks in parallel,
00:24:46.920 | and I want this with 128 things that I'm running on,
00:24:50.400 | and I want to cut it this way or that way or whatever,
00:24:52.800 | and you can say, hey, go see which one's
00:24:54.400 | actually empirically better on the system.
00:24:57.280 | - And then the result of that, you cache for that system.
00:25:00.120 | - Yep. - You save it.
00:25:01.400 | - And so come back to twisting your compiler brain, right?
00:25:05.540 | So not only does the compiler have an interpreter
00:25:08.120 | that you use to do metaprogramming,
00:25:10.120 | that compiler, that interpreter, that metaprogramming
00:25:12.760 | now has to actually take your code
00:25:14.280 | and go run it on a target machine.
00:25:16.020 | See which one it likes the best,
00:25:18.920 | and then stitch it in and then keep going, right?
00:25:20.960 | - So part of the compilation is machine-specific.
00:25:23.360 | - Yeah, well, so I mean, this is an optional feature,
00:25:25.680 | right, so you don't have to use it for everything.
00:25:26.880 | But yeah, if you're, so one of the things
00:25:30.540 | that we're in the quest of is ultimate performance.
00:25:33.720 | - Yes. - Right, and ultimate
00:25:34.720 | performance is important for a couple of reasons, right?
00:25:36.640 | So if you're an enterprise, you're looking to save costs
00:25:38.800 | and compute and things like this,
00:25:40.440 | ultimate performance translates to fewer servers.
00:25:44.080 | Like if you care about the environment,
00:25:45.680 | hey, better performance leads to more efficiency, right?
00:25:49.400 | I mean, you could joke and say like,
00:25:51.160 | you know, Python's bad for the environment, right?
00:25:54.320 | And so if you move to Mojo, it's like at least 10x better,
00:25:56.660 | just out of the box and keep going, right?
00:25:58.760 | But performance is also interesting
00:26:02.080 | because it leads to better products.
00:26:03.920 | And so in the space of machine learning, right,
00:26:05.880 | if you reduce the latency of a model, so it runs faster.
00:26:09.840 | So every time you query the server running the model,
00:26:11.760 | it takes less time.
00:26:12.840 | Well, then the product team can go
00:26:13.920 | and make the model bigger.
00:26:15.560 | Well, that actually makes it
00:26:17.040 | so you have a better experience as a customer.
00:26:19.880 | And so a lot of people care about that.
00:26:21.480 | - So for auto tuning, for like tile size,
00:26:23.480 | you mentioned 128 for TPU,
00:26:25.160 | you would specify like a bunch of options to try.
00:26:28.200 | Just in the code, just simple statement.
00:26:31.000 | And then you can just set and forget and know,
00:26:33.640 | depending on wherever it compiles,
00:26:35.440 | it'll actually be the fastest.
00:26:37.520 | - Yeah, exactly.
00:26:38.360 | And the beauty of this is that it helps you
00:26:39.360 | in a whole bunch of different ways, right?
00:26:40.680 | So if you're building, so often what'll happen is that,
00:26:43.680 | you've written a bunch of software yourself, right?
00:26:45.600 | You wake up one day, you say, I have an idea,
00:26:47.920 | I'm gonna go cut up some code.
00:26:49.400 | I get to work, I forget about it,
00:26:51.880 | I move on with life, I come back six months or a year
00:26:54.880 | or two years or three years later, you dust it off
00:26:56.640 | and you go use it again in a new environment.
00:26:59.400 | And maybe your GPU is different.
00:27:00.720 | Maybe you're running on a server instead of a laptop,
00:27:03.160 | maybe whatever, right?
00:27:04.680 | And so the problem now is you say, okay, well,
00:27:07.280 | I mean, again, not everybody cares about performance,
00:27:09.280 | but if you do, you say, okay, well,
00:27:10.760 | I wanna take advantage of all these new features.
00:27:13.240 | I don't wanna break the old thing though, right?
00:27:15.800 | And so the typical way of handling this kind of stuff
00:27:19.160 | before is, if you're talking about C++ templates
00:27:21.800 | or you're talking about C with macros,
00:27:24.360 | you end up with if-def's,
00:27:25.440 | you get like all these weird things get layered in,
00:27:27.800 | make the code super complicated,
00:27:29.160 | and then how do you test it, right?
00:27:31.080 | It becomes this crazy complexity,
00:27:33.200 | multi-dimensional space that you have to worry about.
00:27:35.680 | And that just doesn't scale very well.
00:27:39.040 | - Actually, let me just jump around
00:27:40.320 | before I go to some specific features,
00:27:42.080 | like the increase in performance here
00:27:44.720 | that we're talking about can be just insane.
00:27:48.080 | You write that Mojo can provide a 35,000x
00:27:50.480 | speedup over Python.
00:27:55.800 | How does it do that?
00:27:57.120 | - Yeah, so it can even do more, but we'll get to that.
00:28:00.640 | So first of all, when we say that we're talking about
00:28:05.880 | what's called CPython,
00:28:07.040 | it's the default Python that everybody uses.
00:28:09.080 | When you type Python three,
00:28:09.960 | that's typically the one you use, right?
00:28:12.680 | CPython is an interpreter.
00:28:14.720 | And so interpreters, they have an extra layer
00:28:16.880 | of byte codes and things like this
00:28:19.440 | that they have to go read, parse, interpret,
00:28:21.240 | and it makes them kind of slow from that perspective.
00:28:23.480 | And so one of the first things we do
00:28:25.200 | is we move to a compiler.
00:28:27.200 | And so just moving to a compiler,
00:28:28.640 | getting the interpreter out of the loop
00:28:30.400 | is two to five to 10X speed up, depending on the code.
00:28:33.440 | So just out of the gate,
00:28:34.920 | just using more modern techniques, right?
00:28:39.040 | Now, if you do that, one of the things you can do
00:28:41.120 | is you can start to look at how CPython
00:28:43.320 | started to lay out data.
00:28:45.960 | And so one of the things that CPython did,
00:28:48.760 | and this isn't part of the Python spec necessarily,
00:28:51.440 | but this is just sets of decisions,
00:28:53.560 | is that if you take an integer, for example,
00:28:56.600 | it'll put it in an object.
00:28:58.560 | Because in Python, everything's an object.
00:29:00.240 | And so they do the very logical thing
00:29:02.800 | of keeping the memory representation
00:29:05.120 | of all objects the same.
00:29:06.720 | So all objects have a header.
00:29:08.120 | They have like payload data.
00:29:10.000 | And what this means is that every time
00:29:11.560 | you pass around an object,
00:29:12.480 | you're passing around a pointer to the data.
00:29:15.520 | Well, this has overhead.
00:29:16.640 | And it turns out that modern computers
00:29:18.720 | don't like chasing pointers very much in things like this.
00:29:21.000 | It means that you have to allocate the data.
00:29:23.120 | It means you have to reference count it,
00:29:24.800 | which is another way that Python uses
00:29:26.560 | to keep track of memory.
00:29:27.800 | And so this has a lot of overhead.
00:29:29.480 | And so if you say, okay,
00:29:31.600 | let's try to get that out of the heap,
00:29:34.520 | out of a box, out of an indirection,
00:29:37.280 | and into the registers.
00:29:38.700 | That's another 10x.
00:29:42.280 | - So it adds up if you're reference counting
00:29:44.520 | every single thing you create, that adds up.
00:29:48.000 | - Yep, and if you look at,
00:29:49.240 | people complain about the Python gil.
00:29:50.920 | This is one of the things that hurts parallelism.
00:29:54.060 | That's because of the reference counting.
00:29:56.200 | Right, and so the gil and reference counting
00:29:57.840 | are very tightly intertwined in Python.
00:29:59.400 | It's not the only thing,
00:30:00.240 | but it's very tightly intertwined.
00:30:02.040 | And so then you lean into this and you say, okay, cool.
00:30:03.840 | Well, modern computers,
00:30:05.600 | they can do more than one operation at a time.
00:30:07.880 | And so they have vectors.
00:30:08.720 | What is a vector?
00:30:09.540 | Well, a vector allows you to take one,
00:30:11.120 | instead of taking one piece of data,
00:30:12.360 | doing an add or a multiply,
00:30:13.480 | and then picking up the next one,
00:30:15.720 | you can now do four or eight or 16 or 32 at a time.
00:30:18.560 | Right, well, Python doesn't expose that because of reasons.
00:30:21.600 | And so now you can say, okay, well, you can adopt that.
00:30:24.040 | Now you have threads.
00:30:25.040 | Now you have like additional things,
00:30:26.520 | like you can control memory hierarchy.
00:30:27.760 | And so what Mojo allows you to do
00:30:29.200 | is it allows you to start taking advantage
00:30:30.720 | of all these powerful things
00:30:32.360 | that have been built into the hardware over time.
00:30:34.400 | And it gives, the library gives very nice features.
00:30:38.560 | So you can say, just parallelize this, do this in parallel.
00:30:41.480 | Right, so it's very, very powerful weapons against slowness,
00:30:46.480 | which is why people have been, I think,
00:30:48.280 | having fun, like just taking code and making it go fast,
00:30:50.480 | because it's just kind of an adrenaline rush
00:30:52.520 | to see like how fast you can get things.
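A rough stand-in for "just parallelize this," using stdlib Python threads (not Mojo's `parallelize`): split the workload into chunks and map them across workers. Note that in CPython the GIL limits the speedup for pure-Python math, which is exactly the limitation being discussed.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(data, n_chunks):
    """Split data into roughly n_chunks contiguous slices."""
    size = max(1, len(data) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, workers=4):
    # Each worker sums one chunk; partial sums are combined at the end.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, chunked(data, workers))
    return sum(partials)

print(parallel_sum(list(range(1_000_000))))  # equals sum(range(1_000_000))
```

In Mojo the analogous library call can actually fan out across cores (and use vector units) because there is no GIL or reference-counting bottleneck in the way.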
00:30:54.460 | - Before I talk about some of the interesting stuff
00:30:56.280 | with parallelization and all that,
00:30:57.640 | let's first talk about like the basics.
00:31:00.080 | We talked to indentation, right?
00:31:01.520 | So this thing looks like Python.
00:31:04.120 | It's sexy and beautiful like Python, as I mentioned.
00:31:07.320 | Is it a typed language?
00:31:09.200 | So what's the role of types?
00:31:10.840 | - Yeah, good question.
00:31:11.680 | So Python has types.
00:31:13.880 | It has strings, it has integers,
00:31:15.880 | it has dictionaries and like all that stuff,
00:31:18.440 | but they all live at runtime, right?
00:31:20.440 | And so because all those types live at runtime in Python,
00:31:23.480 | you never, you don't have to spell them.
00:31:26.560 | Python also has like this whole typing thing going on now,
00:31:29.080 | and a lot of people use it.
00:31:30.640 | I'm not talking about that.
00:31:31.640 | That's kind of a different thing.
00:31:32.480 | We can go back to that if you want.
00:31:33.560 | But typically the, you know, you just say,
00:31:37.760 | I take, I have a def and my def takes two parameters.
00:31:40.320 | I'm gonna call them A and B,
00:31:41.480 | and I don't have to write a type, okay?
00:31:43.520 | So that is great, but what that does
00:31:46.640 | is that forces what's called a consistent representation.
00:31:49.240 | So these things have to be a pointer to an object
00:31:51.800 | with the object header, and they all have to look the same.
00:31:54.480 | And then when you dispatch a method,
00:31:56.480 | you go through all the same different paths,
00:31:58.000 | no matter what the receiver, whatever that type is.
00:32:01.240 | So what Mojo does is it allows you
00:32:03.160 | to have more than one kind of type.
00:32:04.360 | And so what it does is allows you to say,
00:32:06.480 | okay, cool, I have an object,
00:32:08.360 | and objects behave like Python does.
00:32:10.080 | And so it's fully dynamic, and that's all great.
00:32:12.000 | And for many things, classes,
00:32:13.640 | like that's all very powerful and very important.
00:32:16.440 | But if you wanna say, hey, it's an integer,
00:32:18.560 | and it's 32 bits or 64 bits or whatever it is,
00:32:20.760 | or it's a floating point value, it's 64 bits.
00:32:24.400 | Well, then the compiler can take that,
00:32:25.840 | and it can use that to do way better optimization.
00:32:28.800 | And it turns out, again, getting rid of the indirections,
00:32:30.720 | that's huge, means you can get better code completion
00:32:33.640 | because you have, 'cause compiler knows what the type is,
00:32:36.840 | and so it knows what operations work on it.
00:32:38.960 | And so that's actually pretty huge.
00:32:40.880 | And so what Mojo does is it allows you
00:32:43.680 | to progressively adopt types into your program.
00:32:46.560 | And so you can start, again, it's compatible with Python.
00:32:49.360 | And so then you can add however many types you want,
00:32:51.200 | wherever you want them.
00:32:52.560 | And if you don't wanna deal with it,
00:32:53.400 | you don't have to deal with it, right?
00:32:55.040 | And so one of our opinions on this
00:32:59.160 | is that it's not that types are the right thing
00:33:01.640 | or the wrong thing, it's that they're a useful thing.
00:33:04.480 | - Which is kind of optional, it's not strict typing.
00:33:07.720 | You don't have to specify a type.
00:33:09.240 | - Exactly.
00:33:10.520 | - Okay, so starting from the thing
00:33:12.120 | that Python's kind of reaching towards right now
00:33:14.280 | with trying to inject types into it, what is it doing?
00:33:17.520 | - Yeah, with a very different approach, but yes.
00:33:20.200 | - What's the different approach?
00:33:21.200 | I'm actually one of the people
00:33:23.080 | that have not been using types very much in Python.
00:33:26.880 | So I haven't-- - That's okay.
00:33:27.720 | - Why, would you say?
00:33:29.640 | - It's just, well, because I know the importance.
00:33:31.840 | It's like adults use strict typing.
00:33:35.440 | And so I refuse to grow up in that sense.
00:33:38.040 | It's a kind of rebellion.
00:33:39.360 | But I just know that it probably reduces
00:33:43.600 | the amount of errors,
00:33:44.440 | even just for forget-about-performance improvements.
00:33:46.760 | It probably reduces errors when you do strict typing.
00:33:49.200 | - Yeah, so I mean, I think it's interesting
00:33:50.680 | if you look at that, right?
00:33:51.520 | And the reason I'm giving you a hard time,
00:33:53.360 | is that there's this cultural norm,
00:33:57.080 | this pressure, there has to be a right way to do things.
00:34:00.800 | Grownups only do it one way,
00:34:01.920 | and if you don't do that, you should feel bad.
00:34:04.240 | Some people feel like Python's a guilty pleasure
00:34:06.280 | or something, and it's like, when it gets serious,
00:34:08.400 | I need to go rewrite it, right?
00:34:09.600 | - Yeah, exactly.
00:34:10.440 | - Well, I mean, cool, I understand history,
00:34:12.760 | and I understand kind of where this comes from,
00:34:14.340 | but I don't think it has to be a guilty pleasure.
00:34:16.720 | - Yeah.
00:34:17.560 | - Right, and so if you look at that, you say,
00:34:19.200 | why do you have to rewrite it?
00:34:20.160 | Well, you have to rewrite it to deploy.
00:34:22.360 | Well, why do you want to deploy?
00:34:23.360 | Well, you care about performance,
00:34:24.880 | or you care about perfectibility,
00:34:25.920 | or you want a tiny thing on the server
00:34:29.040 | that has no dependencies, or you have objectives
00:34:31.960 | that you're trying to attain.
00:34:34.200 | So what if Python can achieve those objectives?
00:34:37.680 | So if you want types, well, maybe you want types
00:34:39.440 | because you want to make sure you're passing
00:34:40.720 | on the right thing, sure, you can add a type.
00:34:43.480 | If you don't care, you're prototyping some stuff,
00:34:45.800 | you're hacking some things out,
00:34:46.880 | you're pulling some random code off the internet,
00:34:49.040 | it should just work, right?
00:34:51.080 | And you shouldn't be pressured,
00:34:53.120 | you shouldn't feel bad about doing the right thing
00:34:55.600 | or the thing that feels good.
00:34:56.600 | Now, if you're in a team, right,
00:34:58.840 | you're working at some massive internet company
00:35:00.960 | and you have 400 million lines of Python code,
00:35:03.800 | well, they may have a house rule that you use types.
00:35:06.520 | - Yeah.
00:35:07.360 | - Right, because it makes it easier for different humans
00:35:08.640 | to talk to each other and understand what's going on
00:35:10.320 | and bugs at scale, right?
00:35:12.600 | And so there are lots of good reasons
00:35:14.160 | why you might want to use types,
00:35:15.880 | but that doesn't mean that everybody
00:35:17.640 | should use them all the time, right?
00:35:18.880 | So what Mojo does is it says, cool,
00:35:20.280 | well, allow people to use types,
00:35:22.680 | and if you use types, you get nice things out of it, right?
00:35:25.480 | You get better performance and things like this, right?
00:35:27.640 | But Mojo is a full compatible superset of Python, right?
00:35:32.640 | And so that means it has to work without types.
00:35:36.200 | It has to support all the dynamic things,
00:35:38.560 | it has to support all the packages,
00:35:39.440 | it has to support for comprehension,
00:35:42.160 | list comprehensions and things like this, right?
00:35:43.880 | And so that starting point, I think, is really important.
00:35:47.120 | And I think that, again, you can look at
00:35:49.720 | why I care so much about this,
00:35:51.040 | and there's many different aspects of that,
00:35:52.640 | one of which is the world went through
00:35:54.480 | a very challenging migration from Python 2 to Python 3.
00:35:57.760 | Right, and this migration took many years,
00:36:01.920 | and it was very painful for many teams, right?
00:36:03.960 | And there's a lot of things that went on in that.
00:36:06.440 | I'm not an expert in all the details,
00:36:09.160 | and I honestly don't want to be.
00:36:10.880 | I don't want the world to have to go through that.
00:36:13.000 | Right, and people can ignore Mojo,
00:36:14.960 | and if it's not their thing, that's cool,
00:36:16.960 | but if they want to use Mojo,
00:36:18.080 | I don't want them to have to rewrite all their code.
00:36:19.960 | - Yeah, I mean, this, okay, the superset part is just,
00:36:24.120 | I mean, there's so much brilliant stuff here.
00:36:25.360 | That definitely is incredible.
00:36:27.860 | We'll talk about that.
00:36:29.720 | But first of all, how's the typing implemented differently
00:36:32.920 | in Python versus Mojo?
00:36:36.560 | So this heterogeneous flexibility,
00:36:39.560 | you said, is differently implemented.
00:36:40.920 | - Yeah, so I'm not a full expert
00:36:42.800 | in the whole backstory and types in Python,
00:36:45.000 | so I'll give you that.
00:36:46.600 | I can give you my understanding.
00:36:48.960 | My understanding is, basically, like many dynamic languages,
00:36:52.120 | the ecosystem went through a phase
00:36:54.440 | where people went from writing scripts
00:36:56.620 | to writing large-scale, huge code bases in Python,
00:37:00.600 | and at scale, it kind of helps to have types.
00:37:03.920 | People want to be able to reason about interfaces.
00:37:05.640 | What, do you expect, a string or an int, or like,
00:37:08.560 | these basic things, right?
00:37:10.160 | And so what the Python community started doing
00:37:12.360 | is it started saying, okay, let's have tools on the side,
00:37:15.320 | checker tools, right, that go and, like,
00:37:18.480 | enforce invariance, check for bugs,
00:37:21.080 | try to identify things.
00:37:22.480 | These are called static analysis tools, generally,
00:37:24.800 | and so these tools run over your code
00:37:26.320 | and try to look for bugs.
00:37:27.960 | What ended up happening is there's so many of these things,
00:37:29.800 | so many different weird patterns and different approaches
00:37:31.880 | on specifying the types and different things going on
00:37:34.240 | that the Python community realized and recognized,
00:37:36.640 | hey, hey, hey, there's a thing here.
00:37:38.760 | And so what they started to do
00:37:39.880 | is they started to standardize the syntax
00:37:41.480 | for adding types to Python.
00:37:43.480 | Now, one of the challenges that they had
00:37:44.960 | is that they're coming from kind of this fragmented world
00:37:47.320 | where there's lots of different tools,
00:37:48.600 | they have different trade-offs and interpretations,
00:37:50.800 | and the types mean different things,
00:37:51.880 | and so if you look at types in Python,
00:37:54.480 | according to the Python spec, the types are ignored.
00:37:57.680 | Right, so according to the Python spec,
00:38:00.800 | you can write pretty much anything in a type position, okay?
00:38:05.040 | And you can, technically, you can write any expression, okay?
00:38:10.040 | Now, that's beautiful because you can extend it,
00:38:13.720 | you can do cool things, you can write,
00:38:14.720 | build your own tools, you can build your own house linter
00:38:17.520 | or something like that, right?
00:38:18.920 | But it's also a problem because any existing Python program
00:38:22.800 | may be using different tools,
00:38:24.160 | and they have different interpretations.
00:38:25.600 | And so if you adopt somebody's package into your ecosystem,
00:38:28.860 | try to run the tool you prefer,
00:38:30.460 | it may throw out tons of weird errors
00:38:31.880 | and warnings and problems just because it's incompatible
00:38:34.080 | with how these things work.
00:38:35.840 | Also because they're added late
00:38:37.640 | and they're not checked by the Python interpreter,
00:38:39.520 | it's always kind of more of a hint than it is a requirement.
00:38:42.600 | Also, the CPython implementation
00:38:44.920 | can't use them for performance, and so it's really--
00:38:47.320 | - Well, you have the big one, right?
00:38:48.280 | So you can't utilize for the compilation,
00:38:50.760 | for the just-in-time compilation, okay.
00:38:52.520 | - Exactly, and this all comes back to the design principle
00:38:55.000 | of it's, they're kind of hints,
00:38:57.120 | they're kind of, the definition's a little bit murky,
00:38:59.320 | it's unclear exactly the interpretation in a bunch of cases,
00:39:01.860 | and so because of that, you can't actually,
00:39:04.860 | even if you want to, it's really difficult to use them
00:39:06.680 | to say, like, it is going to be an int,
00:39:09.200 | and if it's not, it's a problem, right?
00:39:11.000 | A lot of code would break if you did that.
00:39:13.080 | So in Mojo, right, so you can still use
00:39:15.360 | those kind of type annotations, it's fine,
00:39:17.200 | but in Mojo, if you declare a type and you use it,
00:39:20.080 | then it means it is going to be that type,
00:39:23.120 | and the compiler helps you check that,
00:39:24.520 | and enforce it, and it's safe,
00:39:26.680 | and it's not a, like, best effort hint kind of a thing.
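The "hint, not requirement" behavior is easy to demonstrate in standard CPython: annotations are recorded but never enforced at runtime.

```python
def add(a: int, b: int) -> int:
    return a + b

print(add(1, 2))            # 3, as expected
print(add("ab", "cd"))      # "abcd" -- no error: the annotations are hints
print(add.__annotations__)  # the types are stored, but nothing checks them
```

In Mojo, by contrast, a declared type is enforced by the compiler, so the string call above would be rejected at compile time.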
00:39:30.760 | - So if you try to shove a string-type thing
00:39:33.160 | into an integer-- - You get an error.
00:39:35.240 | - And you get an error from the compiler, compile time.
00:39:37.960 | Nice, okay, what kind of basic types are there?
00:39:41.960 | - Yeah, so Mojo is pretty hardcore
00:39:46.760 | in terms of what it tries to do in the language,
00:39:49.840 | which is, the philosophy there is that we,
00:39:52.940 | again, if you look at Python, right,
00:39:56.440 | Python's a beautiful language
00:39:57.480 | because it's so extensible, right?
00:39:58.840 | And so all of the different things in Python,
00:40:01.840 | like for loops, and plus, and like all these things
00:40:04.240 | can be accessed through these underbar, underbar methods.
00:40:07.800 | Okay, so you have to say, okay,
00:40:10.320 | if I make something that is super fast,
00:40:11.920 | I can go all the way down to the middle,
00:40:13.760 | why do I need to have integers built into the language?
00:40:17.160 | And so what Mojo does is it says,
00:40:18.560 | okay, well, we can have this notion of structs.
00:40:20.760 | So you have classes in Python, now you can have structs.
00:40:24.160 | Classes are dynamic, structs are static.
00:40:27.140 | Cool, we can get high performance,
00:40:28.360 | we can write C++ kind of code with structs if you want.
00:40:31.600 | These things mix and work beautifully together.
00:40:34.440 | But what that means is that you can go and implement
00:40:36.280 | strings, and ints, and floats, and arrays,
00:40:38.720 | and all that kind of stuff in the language, right?
00:40:41.720 | And so that's really cool because,
00:40:44.560 | to me as a idealizing compiler language type of person,
00:40:49.560 | what I wanna do is I wanna get magic out of the compiler
00:40:52.960 | and put it in the libraries.
00:40:54.360 | Because if somebody can, you know,
00:40:55.880 | if we can build an integer that's beautiful
00:40:57.760 | and has an amazing API and does all the things
00:40:59.680 | you'd expect an integer to do, but you don't like it,
00:41:03.160 | maybe you want a big integer,
00:41:04.120 | maybe you want a like sideways integer, I don't know,
00:41:06.360 | like what all the space of integers are,
00:41:08.500 | then you can do that and it's not a second class citizen.
00:41:14.360 | And so if you look at certain other languages,
00:41:16.600 | like C++, one I also love and use a lot,
00:41:19.900 | int is hard-coded in the language, but complex is not.
00:41:25.860 | And so isn't it kind of weird that you have this
00:41:29.160 | STD complex class, but you have int,
00:41:32.520 | and complex tries to look like a natural numeric type
00:41:35.800 | and things like this, but integers and floating point
00:41:38.520 | have these like special promotion rules
00:41:40.640 | and other things like that that are magic
00:41:42.000 | and they're hacked into the compiler.
00:41:43.760 | And because of that, you can't actually make something
00:41:45.440 | that works like the built-in types.
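The dunder-method extensibility he's referring to can be shown with a toy Python type (a hypothetical example): a "saturating integer" that participates in `+` and `==` just like a built-in, by implementing the underbar-underbar protocol.

```python
class SatInt:
    """An integer that clamps to [0, 255] instead of overflowing."""

    def __init__(self, value: int):
        self.value = max(0, min(255, value))

    def __add__(self, other):  # makes `a + b` work
        other_val = other.value if isinstance(other, SatInt) else other
        return SatInt(self.value + other_val)

    def __eq__(self, other):   # makes `==` work, including against plain ints
        other_val = other.value if isinstance(other, SatInt) else other
        return self.value == other_val

    def __repr__(self):
        return f"SatInt({self.value})"

print(SatInt(200) + SatInt(100))  # SatInt(255): clamped, not overflowed
print(SatInt(5) + 3 == 8)         # True: mixes with plain ints
```

Mojo's structs take the same idea further: because there's no hard-coded magic for `Int`, a library-defined numeric type can be a true first-class citizen rather than a second-class wrapper.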
00:41:47.600 | - Is there something provided as a standard
00:41:49.920 | because it's AI first, you know,
00:41:54.160 | numerical types are so important here.
00:41:56.520 | So is there something like a nice standard implementation
00:42:00.440 | of integer and float?
00:42:01.280 | - Yeah, so we're still building all that stuff out.
00:42:02.920 | So we provide integers and floats and all that kind of stuff.
00:42:05.040 | We also provide like buffers and tensors
00:42:07.240 | and things like that that you'd expect in an ML context.
00:42:10.080 | Honestly, we need to keep designing and redesigning
00:42:13.080 | and working with the community to build that out
00:42:14.440 | and make that better.
00:42:15.280 | That's not our strength right now.
00:42:17.080 | Give us six months or a year,
00:42:18.160 | and I think it'll be way better.
00:42:19.320 | But the power of putting in the library means
00:42:22.560 | that we can have teams of experts
00:42:23.920 | that aren't compiler engineers that can help us design
00:42:27.000 | and refine and drive us forward.
00:42:28.720 | - So one of the exciting things we should mention here
00:42:31.000 | is that this is new and fresh.
00:42:35.400 | This cake is unbaked.
00:42:37.800 | It's almost baked.
00:42:38.680 | You can tell it's delicious,
00:42:40.680 | but it's not fully ready to be consumed.
00:42:42.800 | - Yep, that's very fair.
00:42:43.960 | It is very useful, but it's very useful
00:42:45.920 | if you're a super low-level programmer right now.
00:42:47.920 | And what we're doing is we're working our way up the stack.
00:42:49.840 | And so the way I would look at Mojo today, in May of 2023,
00:42:54.840 | is that it's like a 0.1.
00:42:58.080 | So I think that a year from now,
00:43:00.360 | it's gonna be way more interesting to a variety of people.
00:43:03.880 | But what we're doing is we decided to release it early
00:43:07.000 | so that people can get access to it and play with it
00:43:08.720 | and we can build it with the community.
00:43:10.280 | We have a big roadmap, fully published,
00:43:14.480 | being transparent about this,
00:43:15.640 | and a lot of people are involved in this stuff.
00:43:17.080 | And so what we're doing is we're really optimizing
00:43:19.080 | for building this thing the right way.
00:43:21.840 | And building it the right way is kind of interesting,
00:43:24.040 | working with the community,
00:43:24.880 | because everybody wants it yesterday.
00:43:27.160 | And so sometimes it's kind of,
00:43:30.320 | you know, there's some dynamics there,
00:43:31.920 | but I think it's the right thing.
00:43:34.280 | - So there's a Discord also,
00:43:35.560 | so the dynamics is pretty interesting.
00:43:37.840 | Sometimes the community probably can be very chaotic
00:43:40.440 | and introduce a lot of stress.
00:43:44.400 | Guido famously quit over the stress
00:43:46.440 | of the Walrus operator.
00:43:47.920 | I mean, it broke.
00:43:49.640 | - The straw that broke the camel's back.
00:43:52.080 | - Exactly.
00:43:52.920 | And so like, it can be very stressful to develop.
00:43:55.160 | But can you just, a tangent upon a tangent,
00:43:58.080 | is it stressful to work through the design
00:44:03.080 | of various features here,
00:44:04.520 | given that the community is so richly involved?
00:44:07.240 | - Well, so I've been doing open development
00:44:10.120 | and community stuff for decades now.
00:44:12.360 | Somehow this has happened to me.
00:44:14.040 | So I've learned some tricks.
00:44:15.720 | But the thing that always gets me
00:44:17.440 | is I want to make people happy.
00:44:19.320 | Right, and so this is,
00:44:21.320 | maybe not all people all happy all the time,
00:44:23.560 | but generally I want people to be happy, right?
00:44:25.840 | And so the challenge is that, again,
00:44:28.360 | we're tapping into some long,
00:44:30.840 | some deep-seated, long tensions and pressures,
00:44:33.920 | both in the Python world,
00:44:35.320 | but also in the AI world,
00:44:36.600 | in the hardware world and things like this.
00:44:38.200 | And so people just want us to move faster, right?
00:44:41.040 | And so, again, our decision was,
00:44:43.480 | let's release this early.
00:44:44.680 | Let's get people used to it or access to it
00:44:47.600 | and play with it.
00:44:48.440 | And like, let's build in the open,
00:44:49.920 | which we could have, you know,
00:44:52.560 | had the language monk sitting in the cloister
00:44:55.920 | up on the hilltop,
00:44:57.800 | like beavering away, trying to build something.
00:44:59.600 | But in my experience,
00:45:00.480 | you get something that's way better
00:45:01.680 | if you work with the community.
00:45:03.440 | - Right.
00:45:04.280 | And so, yes, it can be frustrating,
00:45:05.800 | can be challenging for lots of people involved.
00:45:07.400 | And, you know, if you, I mean,
00:45:08.680 | if you mentioned our Discord,
00:45:09.720 | we have over 10,000 people on the Discord,
00:45:11.760 | 11,000 people or something.
00:45:13.320 | Keep in mind, we released Mojo like two weeks ago.
00:45:15.480 | - Yeah.
00:45:16.320 | - So.
00:45:17.520 | - It's very active.
00:45:18.360 | - So it's very cool.
00:45:19.920 | But what that means is that, you know,
00:45:22.760 | 10, 11,000 people all will want something different, right?
00:45:25.800 | And so what we've done is we've tried to say,
00:45:28.440 | okay, cool, here's our roadmap.
00:45:30.360 | Here, and the roadmap isn't completely arbitrary.
00:45:33.480 | It's based on here's the logical order
00:45:35.520 | in which to build these features
00:45:36.800 | or add these capabilities and things like that.
00:45:38.880 | And what we've done is we've spun really fast
00:45:40.800 | on like bug fixes.
00:45:41.800 | And so we actually have very few bugs, which is cool.
00:45:45.800 | I mean, actually, for a project in this state,
00:45:47.920 | but then what we're doing is we're dropping in features
00:45:50.000 | very deliberately.
00:45:51.280 | - I mean, this is fun to watch
00:45:52.200 | 'cause you got the two gigantic communities
00:45:55.000 | of like hardware, like systems engineers,
00:45:57.840 | and then you have the machine learning Python people
00:46:01.680 | that are like higher level.
00:46:03.080 | - Yeah.
00:46:03.920 | - And it's just two, like army, like.
00:46:07.880 | - They've been at war.
00:46:08.720 | Yeah, they've been at war.
00:46:10.560 | Right, and so here's a test.
00:46:12.240 | - Like a Tolkien novel or something, okay.
00:46:13.720 | - So here's a test.
00:46:14.560 | And again, like it's super funny
00:46:15.640 | for something that's only been out for two weeks, right?
00:46:17.920 | People are so impatient, right?
00:46:20.160 | But okay, cool.
00:46:21.360 | Let's fast forward a year.
00:46:23.160 | Like in a year's time, Mojo will be actually quite amazing
00:46:25.880 | and solve tons of problems and be very good.
00:46:28.080 | People still have these problems, right?
00:46:31.320 | And so you look at this and you say,
00:46:33.640 | and the way I look at this at least is to say,
00:46:35.760 | okay, well, we're solving big longstanding problems.
00:46:39.840 | To me, I, again, working on many different problems,
00:46:43.200 | I wanna make sure we do it right, right?
00:46:45.200 | There's like a responsibility you feel
00:46:46.920 | because if you mess it up, right?
00:46:49.720 | There's very few opportunities to do projects like this
00:46:51.720 | and have them really have impact on the world.
00:46:53.840 | If we do it right, then maybe we can take those feuding armies
00:46:57.280 | and actually heal some of those wounds.
00:46:58.920 | - Yeah, this is like, this feels like a speech
00:47:01.960 | by George Washington or Abraham Lincoln or something.
00:47:05.320 | - And you look at this and it's like, okay,
00:47:06.480 | well, how different are we?
00:47:08.320 | We all want beautiful things.
00:47:09.520 | We all want something that's nice.
00:47:10.680 | We all wanna be able to work together.
00:47:11.840 | We all want our stuff to be used, right?
00:47:13.360 | And so if we can help heal that,
00:47:14.880 | now I'm not optimistic that all people will use Mojo
00:47:18.080 | and they'll stop using C++.
00:47:19.480 | That's not my goal, right?
00:47:21.000 | But if we can heal some of that,
00:47:22.800 | I think that'd be pretty cool.
00:47:24.320 | - Yeah, and we start by putting the people
00:47:26.720 | who like braces into the gulag.
00:47:29.480 | - So there are proposals for adding braces to Mojo.
00:47:32.920 | We just tell them no. - Oh, interesting.
00:47:35.080 | Okay. (both laughing)
00:47:37.600 | Politely.
00:47:38.440 | Yeah, anyway, so there's a lot of amazing features
00:47:40.760 | on the roadmap and those are already implemented.
00:47:42.600 | It'd be awesome.
00:47:43.760 | I could just ask you a few things.
00:47:44.920 | - Yeah, go for it.
00:47:45.760 | - So the other performance improvement
00:47:48.600 | comes from immutability.
00:47:50.400 | So what's this var and this let thing that we got going on?
00:47:55.400 | What's immutability?
00:47:56.960 | - Yeah, so one of the things that is useful,
00:48:00.400 | and it's not always required, but it's useful,
00:48:02.880 | is knowing whether something can change out
00:48:04.280 | from underneath you, right?
00:48:05.760 | And so in Python, you have a pointer to an array, right?
00:48:09.280 | And so you pass that pointer to an array around to things.
00:48:12.120 | If you pass into a function,
00:48:14.920 | they may take that and squirrel it away
00:48:16.400 | in some other data structure.
00:48:18.200 | And so you get your array back and you go to use it.
00:48:20.440 | Now somebody else is like putting stuff in your array.
00:48:23.000 | How do you reason about that?
00:48:24.240 | It gets to be very complicated and leads to lots of bugs.
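The aliasing hazard described here can be reproduced in a few lines of plain Python (the names are illustrative, not from the conversation):

```python
def register(records, store):
    # The callee keeps a reference to the caller's list --
    # no copy is made, so both names now alias the same object.
    store.append(records)

my_array = [1, 2, 3]
database = []
register(my_array, database)

my_array.append(4)   # the caller keeps using "its" array...
print(database[0])   # ...and the database entry silently changed too
```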
00:48:27.360 | And so one of the things that,
00:48:30.480 | again, this is not something Mojo forces on you,
00:48:32.600 | but something that Mojo enables
00:48:34.160 | is a thing called value semantics.
00:48:36.280 | And what value semantics do is they take collections,
00:48:40.060 | like arrays, like dictionaries,
00:48:43.000 | also tensors and strings and things like this
00:48:45.400 | that are much higher level
00:48:47.000 | and make them behave like proper values.
00:48:49.480 | And so it makes it look like
00:48:50.600 | if you pass these things around,
00:48:52.800 | you get a logical copy of all the data.
00:48:55.240 | And so if I pass you an array,
00:48:57.280 | it's your array, you can go do what you want to it,
00:48:58.840 | you're not gonna hurt my array.
00:49:00.360 | Now that is an interesting
00:49:02.400 | and very powerful design principle.
00:49:04.160 | It defines away a ton of bugs.
00:49:05.920 | You have to be careful to implement it in an efficient way.
00:49:08.600 | - Is there a performance hit that's significant?
00:49:12.020 | - Generally not, if you implement it the right way,
00:49:15.160 | but it requires a lot of very low level
00:49:18.080 | getting the language right bits.
00:49:20.520 | - I assume there'd be a huge performance hit
00:49:22.880 | 'cause the benefit is really nice
00:49:24.880 | 'cause you don't get into the complex.
00:49:25.720 | - Absolutely, well, the trick is you can't do copies.
00:49:29.920 | So you have to provide the behavior of copying
00:49:33.440 | without doing the copy.
00:49:35.280 | - Yeah, how do you do that?
00:49:36.920 | (laughing)
00:49:38.320 | How do you do that?
00:49:39.160 | - It's not magic, it's just, it's actually pretty cool.
00:49:42.200 | Well, so first, before we talk about how that works,
00:49:44.400 | let's talk about how it works in Python.
00:49:46.080 | So in Python, you define a person class,
00:49:48.920 | or maybe a person class is a bad idea.
00:49:50.680 | You define a database class,
00:49:52.280 | and database class has an array of records,
00:49:54.080 | something like that.
00:49:55.400 | And so the problem is that if you pass in a record
00:49:57.920 | or a class instance into the database,
00:50:00.840 | it'll take a hold of that object
00:50:02.600 | and then it assumes it has it.
00:50:04.960 | And if you're passing an object in,
00:50:06.800 | you have to know that that database is gonna take it,
00:50:09.920 | and therefore you shouldn't change it
00:50:10.920 | after you put it in the database.
00:50:11.960 | This is the problem.
00:50:12.800 | - You just kind of have to know that.
00:50:13.760 | - You just have to kind of know that, right?
00:50:15.560 | And so you roll out version one of the database,
00:50:18.000 | you just kind of have to know that.
00:50:19.760 | Of course, Lex uses his own database, right?
00:50:21.720 | - Yeah.
00:50:22.560 | - Right, 'cause you built it.
00:50:23.400 | You understand how this works, right?
00:50:24.920 | Somebody else joins the team, they don't know this.
00:50:26.640 | - Yes.
00:50:27.480 | - Right, and so now they suddenly get bugs.
00:50:29.600 | You're having to maintain the database,
00:50:31.160 | you shake your fist, you argue,
00:50:33.680 | the 10th time this happens, you're like,
00:50:35.200 | okay, we have to do something different, right?
00:50:36.880 | And so what you do is you go change your Python code,
00:50:39.040 | and you change your database class to copy the record
00:50:42.360 | every time you add it.
00:50:43.560 | And so what ends up happening is you say,
00:50:45.160 | okay, I will do what's called a defensive copy
00:50:48.120 | inside the database, and then that way,
00:50:50.440 | if somebody passes something in,
00:50:51.920 | I will have my own copy of it,
00:50:54.340 | and they can go do whatever,
00:50:55.520 | and they're not gonna break my thing.
00:50:57.680 | Okay, this is usually the two design patterns.
00:51:00.560 | If you look in PyTorch, for example,
00:51:02.160 | this is cloning a tensor.
00:51:03.680 | Like there's a specific thing,
00:51:05.000 | and you have to know where to call it,
00:51:06.040 | and if you don't call it in the right place,
00:51:06.920 | you get these bugs, and this is state of the art, right?
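The defensive-copy pattern being described might look like this in Python (a sketch; the class and record shapes are made up):

```python
import copy

class Database:
    def __init__(self):
        self._records = []

    def add(self, record):
        # Defensive copy: snapshot the record on the way in, so the
        # caller can keep mutating their object without corrupting ours.
        self._records.append(copy.deepcopy(record))

    def get(self, i):
        return self._records[i]

db = Database()
rec = {"name": "Lex", "views": 0}
db.add(rec)
rec["views"] = 100           # caller mutates after insertion
print(db.get(0)["views"])    # the stored copy is unaffected: 0
```

The cost, as noted above, is that every insertion pays for a full copy whether or not the caller ever mutates the record.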
00:51:10.600 | So a different approach, so it's used in many languages.
00:51:12.960 | So I've worked with it in Swift,
00:51:15.280 | is you say, okay, well, let's provide value semantics.
00:51:17.560 | And so we wanna provide the view
00:51:20.440 | that you get a logically independent copy,
00:51:22.980 | but we wanna do that lazily.
00:51:24.960 | And so what we do is you say,
00:51:26.440 | okay, if you pass something into a function,
00:51:29.060 | it doesn't actually make a copy.
00:51:30.520 | What it actually does is it just
00:51:31.560 | increments a reference to it.
00:51:33.200 | And if you pass it around, you stick in your database,
00:51:35.880 | it can go in the database, you own it,
00:51:38.100 | and then you come back out of the stack,
00:51:39.380 | nobody's copied anything.
00:51:41.040 | You come back out of the stack,
00:51:42.320 | and then the caller lets go of it.
00:51:44.320 | Well, then you've just handed it off to the database,
00:51:47.560 | you've transferred it, and there's no copies made.
00:51:50.840 | Now, on the other hand, if your coworker goes
00:51:54.100 | and hands you a record, and you pass it in,
00:51:56.320 | you stick it in the database,
00:51:57.400 | and then you go to town and you start modifying it,
00:52:00.180 | what happens is you get a copy lazily on demand.
00:52:03.620 | And so what this does is gives you copies
00:52:06.260 | only when you need them,
00:52:07.820 | and it also, so it defines away the bugs,
00:52:09.560 | but it also generally reduces
00:52:11.160 | the number of copies in practice.
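A minimal copy-on-write sketch of the lazy-copy idea, in Python (real implementations in Swift or Mojo do this with low-level reference counting, not a dict of owners; this just makes the mechanism concrete):

```python
class COWList:
    """A list whose copies share storage until someone writes."""
    def __init__(self, items=(), _storage=None):
        # _storage is an internal hook used by copy(); callers pass items.
        self._storage = _storage if _storage is not None else {
            "data": list(items), "owners": 1}

    def copy(self):
        # O(1) "copy": just share the storage and bump the owner count.
        self._storage["owners"] += 1
        return COWList(_storage=self._storage)

    def _unshare(self):
        # Copy lazily, and only if somebody else still holds the storage.
        if self._storage["owners"] > 1:
            self._storage["owners"] -= 1
            self._storage = {"data": list(self._storage["data"]),
                             "owners": 1}

    def append(self, x):
        self._unshare()
        self._storage["data"].append(x)

    def items(self):
        return list(self._storage["data"])

a = COWList([1, 2, 3])
b = a.copy()          # no element is copied here
b.append(4)           # b unshares now -- a's data is untouched
print(a.items())      # [1, 2, 3]
print(b.items())      # [1, 2, 3, 4]
```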
00:52:12.920 | - But the implementation details are tricky here.
00:52:15.320 | - Yeah, so this is, yes.
00:52:17.280 | - Something with reference counting,
00:52:19.640 | but to make it performant across a number
00:52:23.240 | of different kinds of objects?
00:52:26.280 | - Yeah, so you need a couple of things.
00:52:27.880 | So this concept has existed in many different worlds,
00:52:31.720 | and so again, it's not novel research at all.
00:52:35.240 | The magic is getting the design right
00:52:37.220 | so that you can do this in a reasonable way.
00:52:39.520 | And so there's a number of components that go into this.
00:52:41.160 | One is when you're passing around,
00:52:43.520 | so we're talking about Python and reference counting
00:52:46.240 | and the expense of doing that.
00:52:47.800 | When you're passing values around,
00:52:49.300 | you don't wanna do extra reference counting
00:52:51.120 | for no good reason,
00:52:52.240 | and so you have to make sure that you're efficient
00:52:53.680 | and you transfer ownership
00:52:55.280 | instead of duplicating references and things like that,
00:52:57.960 | which is a very low-level problem.
00:53:00.560 | You also have to adopt this,
00:53:02.360 | and you have to build these data structures.
00:53:04.280 | And so if you say, you know,
00:53:06.400 | Mojo has to be compatible with Python,
00:53:07.800 | so of course the default list is a reference semantic list
00:53:11.600 | that works the way you'd expect in Python,
00:53:14.000 | but then you have to design a value semantic list.
00:53:16.840 | And so you just have to implement that,
00:53:18.000 | and then you implement the logic within.
00:53:20.000 | And so the role of the language here
00:53:22.100 | is to provide all the low-level hooks
00:53:24.080 | that allow the author of the type
00:53:26.960 | to be able to get and express this behavior
00:53:28.600 | without forcing it into all cases
00:53:30.620 | or hard-coding this into the language itself.
00:53:32.960 | - But there's an ownership,
00:53:34.020 | so you're constantly tracking who owns the thing.
00:53:37.120 | - Yes, and so there's a whole system called ownership,
00:53:39.120 | and so this is related to work done in the Rust community.
00:53:43.480 | Also the Swift community has done a bunch of work,
00:53:45.120 | and there's a bunch of different other languages
00:53:46.680 | that have all kind of, C++ actually has copy constructors
00:53:49.960 | and destructors and things like that.
00:53:51.560 | And so, and I mean, C++ has everything.
00:53:54.500 | So it has move constructors,
00:53:55.600 | it has like this whole world of things.
00:53:57.160 | And so this is a body of work
00:54:00.920 | that's kind of been developing for many, many years now.
00:54:03.320 | And so Mojo takes some of the best ideas
00:54:06.240 | out of all these systems and remixes it in a nice way
00:54:08.640 | so that you get the power of something
00:54:11.360 | like the Rust programming language,
00:54:13.080 | but you don't have to deal with it when you don't want to,
00:54:15.760 | which is a major thing in terms of teaching and learning
00:54:18.080 | and being able to use and scale these systems.
00:54:21.040 | - How does that play with argument conventions?
00:54:23.940 | What are they, why are they important?
00:54:25.600 | How does the value semantics,
00:54:26.720 | how does the transfer ownership work
00:54:28.840 | with the arguments when they're passed into functions?
00:54:30.720 | - Yeah, so if you go deep into systems programming land,
00:54:34.320 | so this isn't, again, this is not something for everybody,
00:54:36.760 | but if you go deep into systems programming land,
00:54:39.020 | what you encounter is you encounter these types
00:54:41.600 | that get weird.
00:54:43.720 | So if you're used to Python, you think about everything,
00:54:46.000 | I can just copy it around,
00:54:47.280 | I can go change it and mutate it and do these things,
00:54:50.040 | and it's all cool.
00:54:50.960 | If you get into systems programming land,
00:54:53.760 | you get into these things like I have an atomic number,
00:54:56.720 | or I have a mutex,
00:54:58.320 | or I have a uniquely owned database handle,
00:55:02.320 | things like this, right?
00:55:03.820 | So these types you can't necessarily copy.
00:55:05.820 | Sometimes you can't necessarily even move them
00:55:07.860 | to a different address.
00:55:09.440 | And so what Mojo allows you to do
00:55:11.360 | is it allows you to express,
00:55:12.800 | hey, I don't wanna get a copy of this thing,
00:55:15.840 | I wanna actually just get a reference to it.
00:55:18.120 | And by doing that, what you can say is you can say,
00:55:20.080 | okay, if I'm defining something weird,
00:55:22.100 | like a atomic number or something,
00:55:24.440 | it's like it has to be,
00:55:25.760 | so an atomic number is an area in memory
00:55:29.520 | that multiple threads can access at a time
00:55:32.100 | without locks, right?
00:55:34.200 | And so the definition of an atomic number
00:55:37.560 | is multiple different things have to be poking at that,
00:55:39.720 | therefore they have to agree on where it is.
00:55:42.240 | Right, and so you can't just move it out from underneath one
00:55:44.320 | because it kind of breaks what it means.
00:55:46.640 | And so that's an example of a type that you can't even,
00:55:49.160 | you can't copy, you can't move it.
00:55:50.880 | Once you create it, it has to be where it was, right?
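Python has no user-level atomics, but a lock-guarded counter is a rough analogue of the atomic number being described; the key point survives: every thread pokes at one agreed-upon location that never moves.

```python
import threading

class AtomicCounter:
    """Lock-based stand-in for a hardware atomic integer."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

counter = AtomicCounter()

def worker():
    for _ in range(10_000):
        counter.increment()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # always 40000 -- no lost updates
```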
00:55:53.820 | Now, if you look at many other examples,
00:55:56.080 | like a database handle, right?
00:55:57.560 | So, okay, well, what happens,
00:56:00.000 | how do you copy a database handle?
00:56:01.680 | Do you copy the whole database?
00:56:03.040 | That's not something you necessarily wanna do.
00:56:05.340 | There's a lot of types like that
00:56:08.480 | where you wanna be able to say that they are uniquely owned.
00:56:11.660 | So there's always one of this thing,
00:56:15.080 | and or if I create a thing, I don't copy it.
00:56:19.400 | And so what Mojo allows you to do is it allows you to say,
00:56:22.160 | hey, I wanna pass around a reference to this thing
00:56:23.760 | without copying it.
00:56:24.720 | And so it has borrowed conventions.
00:56:27.120 | So you can say, you can use it,
00:56:29.360 | but you don't get to change it.
00:56:31.320 | You can pass it by mutable reference.
00:56:33.280 | And so if you do that, then you can,
00:56:35.200 | you get a reference to it, but you can change it.
00:56:37.360 | And so it manages all that kind of stuff.
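Python can't enforce these conventions the way Mojo's compiler checks them, but the three behaviors can be mimicked to make the distinction concrete (the function names here are invented for illustration):

```python
def total(items):            # "borrowed": read-only access, no mutation
    return sum(items)

def extend(items, extra):    # mutable reference: caller's object changes
    items.extend(extra)

def consume(items):          # ownership transfer: callee takes the value
    result = list(reversed(items))
    items.clear()            # simulate the transfer by emptying the source
    return result

nums = [1, 2, 3]
print(total(nums))       # 6 -- nums is untouched
extend(nums, [4])        # nums is now [1, 2, 3, 4]
taken = consume(nums)    # taken is [4, 3, 2, 1]; nums is now empty
```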
00:56:39.640 | - So it's just a really nice implementation
00:56:42.120 | of like a C++ has, you know,
00:56:45.840 | the different kinds of pointers.
00:56:47.200 | - Yeah, has pointers.
00:56:48.040 | - Smart, smart, different kinds of implications
00:56:50.080 | of smart pointers that you can explicitly define.
00:56:52.800 | This allows you, but you're saying that's more like
00:56:55.640 | the weird case versus the common case.
00:56:57.760 | - Well, it depends on where, I mean,
00:57:00.000 | I don't think I'm a normal person.
00:57:01.800 | So I mean, I'm not one to call other people weird,
00:57:04.300 | but the, but you know, if you talk to a normal Python,
00:57:09.300 | a typical Python programmer,
00:57:10.760 | you're typically not thinking about this, right?
00:57:12.320 | This is a lower level of abstraction.
00:57:13.960 | Now, if you talk to a C++ programmer,
00:57:15.720 | certainly if you talk to a Rust programmer,
00:57:17.440 | again, they're not weird, they're delightful.
00:57:19.520 | Like these are all good people, right?
00:57:21.960 | Those folks will think about all the time, right?
00:57:24.760 | And so I look at this as there's a spectrum
00:57:26.720 | between very deep, low level systems.
00:57:29.160 | I'm gonna go poke the bits and care about
00:57:30.760 | how they're laid out in memory,
00:57:32.240 | all the way up to application and scripting
00:57:34.720 | and other things like this.
00:57:35.680 | And so it's not that anybody's right or wrong,
00:57:37.800 | it's about how do we build one system that scales.
00:57:41.280 | - By the way, the idea of an atomic number
00:57:44.640 | has been something that always brought me deep happiness,
00:57:49.120 | because the flip side of that,
00:57:52.920 | the idea that threads can just modify stuff asynchronously,
00:57:57.920 | just the whole idea of concurrent programming
00:58:02.960 | is a source of infinite stress for me.
00:58:05.360 | - Well, so this is where you jump into,
00:58:07.320 | again, you zoom out and get out of programming languages
00:58:11.520 | or compilers and you just look what the industry has done.
00:58:14.640 | My mind is constantly blown by this, right?
00:58:16.640 | And you look at what, Moore's law.
00:58:20.080 | Moore's law has this idea that computers for a long time,
00:58:23.440 | single thread performance just got faster and faster
00:58:25.320 | and faster and faster for free.
00:58:27.200 | But then physics and other things intervened
00:58:30.560 | and power consumption, other things started to matter.
00:58:32.840 | And so what ended up happening is we went
00:58:34.640 | from single core computers to multi-core,
00:58:37.440 | then we went to accelerators, right?
00:58:39.040 | And this trend towards specialization of hardware
00:58:41.760 | is only gonna continue.
00:58:43.440 | And so for years, us programming language nerds
00:58:47.680 | and compiler people have been saying,
00:58:49.200 | okay, well, how do we tackle multi-core, right?
00:58:51.720 | For a while, it was like multi-core is the future,
00:58:53.560 | we have to get on top of this thing.
00:58:55.320 | Then it was multi-core is the default,
00:58:56.840 | what are we doing with this thing?
00:58:57.760 | And then it's like, there's chips
00:58:59.760 | with hundreds of cores in them, what happened, right?
00:59:03.280 | And so I'm super inspired by the fact that
00:59:07.000 | in the face of this, those machine learning people
00:59:10.480 | invented this idea of a tensor, right?
00:59:13.000 | And what is a tensor?
00:59:13.840 | A tensor is like an arithmetic and algebraic concept,
00:59:18.240 | it's like an abstraction around a gigantic,
00:59:20.880 | parallelizable data set, right?
00:59:23.920 | And because of that, and because of things
00:59:25.480 | like TensorFlow and PyTorch, we're able to say,
00:59:27.480 | okay, well, express the math of the system.
00:59:31.560 | This enables you to do automatic differentiations,
00:59:33.560 | enables you to do all these cool things.
00:59:36.320 | And it's an abstract representation.
00:59:39.760 | Well, because you have that abstract representation,
00:59:41.480 | you can now map it onto these parallel machines
00:59:43.940 | without having to control, okay, put that byte here,
00:59:47.240 | put that byte there, put that byte there.
00:59:48.760 | And this has enabled an explosion in terms of AI,
00:59:51.720 | compute, accelerators, like all the stuff.
00:59:54.280 | And so that's super, super exciting.
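The value of that abstraction shows up even in a toy Python sketch: once the computation is expressed as "apply this function to every element," with no loop order or byte placement baked in, a runtime is free to farm the elements out to however many workers it has.

```python
from concurrent.futures import ThreadPoolExecutor

def relu(x):
    # An elementwise op, like the ones tensors are built from.
    return x if x > 0 else 0.0

data = [-2.0, -1.0, 0.5, 3.0]

# The abstract expression is just "map relu over data"; the pool decides
# which thread computes which element, and order is still preserved.
with ThreadPoolExecutor() as pool:
    result = list(pool.map(relu, data))

print(result)  # [0.0, 0.0, 0.5, 3.0]
```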
00:59:56.360 | - What about the deployment,
00:59:58.280 | the execution across multiple machines?
01:00:00.760 | So you write that the modular compute platform
01:00:05.080 | dynamically partitions models with billions of parameters
01:00:08.360 | and distributes their execution across multiple machines,
01:00:11.480 | enabling unparalleled efficiency.
01:00:15.480 | By the way, the use of unparalleled in that sentence,
01:00:18.120 | anyway, enabling unparalleled efficiency,
01:00:20.280 | scale, and reliability for the largest workloads.
01:00:22.840 | So how do you do this abstraction
01:00:27.240 | of distributed deployment of large models?
01:00:31.400 | - Yeah, so one of the really interesting tensions,
01:00:34.520 | so there's a whole bunch of stuff that goes into that.
01:00:36.160 | I'll pick a random walkthrough.
01:00:38.920 | If you go back and replay the history of machine learning,
01:00:42.440 | right, I mean, the brief, the most recent history
01:00:44.480 | of machine learning, 'cause this is, as you know,
01:00:46.040 | very deep, I knew Lex when he had an AI podcast.
01:00:50.080 | - Yes. (laughs)
01:00:52.280 | - Right? - Yep.
01:00:53.120 | - So if you look at just TensorFlow and PyTorch,
01:00:57.600 | which is pretty recent history in the big picture, right?
01:00:59.640 | But TensorFlow's all about graphs.
01:01:03.080 | PyTorch, I think, pretty unarguably ended up winning.
01:01:06.640 | And why did it win?
01:01:07.600 | Mostly because of usability, right?
01:01:09.800 | And the usability of PyTorch is, I think, huge.
01:01:12.000 | And I think, again, that's a huge testament
01:01:13.760 | to the power of taking abstract, theoretical,
01:01:17.560 | technical concepts and bringing it to the masses, right?
01:01:20.640 | Now, the challenge with what the TensorFlow
01:01:23.600 | versus the PyTorch design points was that TensorFlow
01:01:27.600 | is kind of difficult to use for researchers,
01:01:30.200 | but it was actually pretty good for deployment.
01:01:32.360 | PyTorch is really good for researchers.
01:01:33.880 | It kind of is not super great for deployment, right?
01:01:36.320 | And so I think that we, as an industry,
01:01:38.560 | have been struggling.
01:01:40.200 | And if you look at what deploying
01:01:41.960 | a machine learning model today means,
01:01:43.640 | is that you'll have researchers who are,
01:01:46.160 | I mean, wicked smart, of course,
01:01:47.600 | but they're wicked smart at model architecture
01:01:50.240 | and data and calculus.
01:01:52.800 | (laughs)
01:01:53.640 | Like, they're wicked smart in various domains.
01:01:55.720 | They don't wanna know anything about the hardware
01:01:57.360 | or deployment or C++ or things like this, right?
01:01:59.480 | And so what's happened is you get people
01:02:01.200 | who train the model, they throw it over the fence,
01:02:04.120 | and then you have people that try to deploy the model.
01:02:06.820 | Well, every time you have a team A does X,
01:02:11.720 | they throw it over the fence,
01:02:12.720 | and team B does Y, like, you have a problem,
01:02:17.600 | because, of course, it never works the first time.
01:02:20.080 | And so you throw it over the fence,
01:02:21.760 | they figure out, okay, it's too slow, it won't fit,
01:02:24.480 | doesn't use the right operator,
01:02:26.680 | the tool crashes, whatever the problem is,
01:02:30.280 | then they have to throw it back over the fence.
01:02:32.560 | And every time you throw a thing over a fence,
01:02:34.880 | it takes three weeks of project managers
01:02:36.360 | and meetings and things like this.
01:02:37.920 | And so what we've seen today is that
01:02:40.200 | getting models in production can take weeks or months.
01:02:43.720 | Like, it's not atypical.
01:02:44.720 | I talk to lots of people, and you talk about, like,
01:02:47.440 | VP of software at some internet company
01:02:49.280 | trying to deploy a model, and they're like,
01:02:51.080 | why do I need a team of 45 people?
01:02:52.880 | (laughs)
01:02:53.720 | Like, it's so easy to train a model,
01:02:55.520 | why can't I deploy it, right?
01:02:58.080 | And if you dig into this, every layer is problematic.
01:03:01.720 | So if you look at the language piece,
01:03:03.680 | I mean, this is tip of the iceberg.
01:03:05.500 | It's a very exciting tip of the iceberg for folks,
01:03:07.540 | but you've got Python on one side
01:03:09.640 | and C++ on the other side.
01:03:11.320 | Python doesn't really deploy.
01:03:13.120 | I mean, it can theoretically, technically in some cases,
01:03:15.560 | but often a lot of production teams
01:03:17.280 | will wanna get things out of Python
01:03:18.640 | because they get better performance
01:03:19.840 | and control and whatever else.
01:03:21.640 | So Mojo can help with that.
01:03:23.000 | If you look at serving, so you talk about gigantic models.
01:03:27.380 | Well, a gigantic model won't fit on one machine.
01:03:29.920 | (laughs)
01:03:30.840 | Right, and so now you have this model,
01:03:32.920 | it's written in Python, it has to be rewritten in C++.
01:03:36.180 | Now it also has to be carved up
01:03:37.720 | so that half of it runs on one machine,
01:03:39.200 | half of it runs on another machine,
01:03:40.920 | or maybe it runs on 10 machines.
01:03:43.480 | Well, so now suddenly the complexity is exploding, right?
01:03:47.240 | And the reason for this is that if you look into
01:03:50.200 | TensorFlow, PyTorch, these systems,
01:03:52.440 | they weren't really designed for this world, right?
01:03:54.860 | They were designed for, you know,
01:03:56.400 | back in the day when we were starting and doing things,
01:03:59.580 | where it was a different, much simpler world.
01:04:02.100 | Like you wanna run ResNet-50
01:04:03.520 | or some ancient model architecture like this.
01:04:06.000 | It was just a, it was a completely different world.
01:04:08.120 | - Train on one GPU. - Exactly.
01:04:10.080 | - Doing things on one GPU. - AlexNet.
01:04:11.440 | Yeah, AlexNet, right, the major breakthrough.
01:04:14.160 | And the world has changed, right?
01:04:17.600 | And so now the challenge is that
01:04:19.400 | TensorFlow, PyTorch, these systems,
01:04:20.800 | they weren't actually designed for LLMs.
01:04:22.760 | Like that was not a thing.
01:04:24.920 | And so where TensorFlow actually has amazing power
01:04:27.480 | in terms of scale and deployment and things like that,
01:04:29.840 | and I think Google is, I mean, maybe not unmatched,
01:04:32.680 | but they're like incredible in terms of their capabilities
01:04:34.960 | and gigantic scale.
01:04:36.300 | Many researchers are using PyTorch, right?
01:04:40.520 | And so PyTorch doesn't have those same capabilities.
01:04:42.640 | And so what Modular can do is it can help with that.
01:04:44.840 | Now, if you take a step back and you say like,
01:04:46.000 | what is Modular doing, right?
01:04:48.080 | So Modular has like a bitter enemy
01:04:51.980 | that we're fighting against in the industry.
01:04:54.160 | And it's one of these things where everybody knows it,
01:04:57.000 | but nobody is usually willing to talk about it.
01:05:00.920 | - The bitter enemy.
01:05:02.300 | - The bitter thing that we have to destroy,
01:05:04.800 | that we're all struggling with,
01:05:05.920 | and it's like fish can't see water, is complexity.
01:05:09.680 | - Sure, yes.
01:05:11.080 | Complexity.
01:05:11.920 | That was very philosophical of you.
01:05:13.960 | That's what you said.
01:05:15.000 | - And so if you look at it, yes, it is on the hardware side.
01:05:18.720 | All these accelerators, all these software stacks
01:05:21.200 | that go with the accelerator,
01:05:22.200 | all these, like there's massive complexity over there.
01:05:24.760 | You look at what's happening on the modeling side.
01:05:28.400 | Massive amount of complexity.
01:05:29.680 | Like things are changing all the time.
01:05:30.760 | People are inventing.
01:05:31.840 | Turns out the research is not done, right?
01:05:34.600 | And so people wanna be able to move fast.
01:05:35.960 | Transformers are amazing,
01:05:37.880 | but there's a ton of diversity even within transformers.
01:05:40.320 | And what's the next transformer, right?
01:05:42.800 | And you look into serving.
01:05:44.960 | Also, huge amounts of complexity.
01:05:46.680 | It turns out that all the cloud providers
01:05:49.000 | have all their very weird, but very cool hardware
01:05:52.040 | for networking and all this kind of stuff.
01:05:53.620 | And it's all very complicated.
01:05:55.200 | People aren't using that.
01:05:56.340 | You look at classical serving, right?
01:05:59.640 | There's this whole world of people
01:06:00.680 | who know how to write high-performance servers
01:06:02.320 | with zero copy networking.
01:06:03.640 | And all this fancy asynchronous IO
01:06:07.360 | and all these fancy things in the serving community,
01:06:11.120 | very little of that has pervaded
01:06:12.760 | into the machine learning world, right?
01:06:14.920 | And why is that?
01:06:15.760 | Well, it's because, again,
01:06:16.840 | these systems have been built up over many years.
01:06:19.120 | They haven't been rethought.
01:06:21.720 | There hasn't been a first principles approach to this.
01:06:23.960 | And so what Modular's doing is we're saying,
01:06:25.640 | okay, we've built many of these things.
01:06:28.720 | So I've worked on TensorFlow and TPUs and things like that.
01:06:31.520 | Other folks on our team have worked on PyTorch Core.
01:06:35.040 | We've worked on ONNX Runtime.
01:06:36.480 | We've worked on many of these other systems.
01:06:38.280 | And we've built systems like the Apple accelerators
01:06:41.880 | and all that kind of stuff.
01:06:43.120 | Our team is quite amazing.
01:06:44.960 | And so one of the things
01:06:46.440 | that roughly everybody at Modular's grumpy about
01:06:48.920 | is that when you're working on one of these projects,
01:06:51.800 | you have a first-order goal.
01:06:54.360 | Get the hardware to work.
01:06:55.360 | Get the system to enable one more model.
01:06:57.440 | Get this product out the door.
01:06:59.160 | Enable the specific workload
01:07:00.960 | or solve this problem for this product team.
01:07:04.960 | And nobody's been given a chance
01:07:06.160 | to actually do that step back.
01:07:08.080 | And so we as an industry,
01:07:08.920 | we didn't take two steps forward.
01:07:10.360 | We took like 18 steps forward
01:07:12.440 | in terms of all this really cool technology
01:07:14.120 | across compilers and systems and runtimes
01:07:16.160 | and heterogeneous computing, all this kind of stuff.
01:07:18.400 | And all this technology has been,
01:07:20.200 | I wouldn't say beautifully designed,
01:07:23.280 | but it's been proven in different quadrants.
01:07:26.240 | You look at Google with TPUs.
01:07:28.000 | Massive, huge exaflops of compute
01:07:31.680 | strapped together into machines
01:07:33.320 | that researchers are programming in Python in a notebook.
01:07:37.040 | That's huge.
01:07:37.880 | That's amazing. - That's incredible.
01:07:38.720 | - That's incredible. - Right?
01:07:39.560 | It's incredible.
01:07:40.400 | And so you look at the technology that goes into that
01:07:42.640 | and the algorithms are actually quite general.
01:07:46.080 | And so lots of other hardware out there
01:07:48.600 | and lots of other teams out there
01:07:49.880 | don't have the sophistication
01:07:51.160 | or maybe the years working on it
01:07:53.680 | or the budget or whatever that Google does.
01:07:56.320 | And so they should be getting access to the same algorithms,
01:07:59.000 | but they just don't have that.
01:08:00.440 | So what Modular's doing is we're saying,
01:08:03.200 | cool, this is not research anymore.
01:08:05.360 | We've built auto-tuning in many systems.
01:08:08.080 | We've built programming languages.
01:08:09.760 | And so we've implemented C++, we've implemented Swift,
01:08:13.480 | we've implemented many of these things.
01:08:15.000 | And so it's hard, but it's not research.
01:08:19.880 | And you look at accelerators.
01:08:21.640 | Well, we know there's a bunch
01:08:23.440 | of different weird kind of accelerators,
01:08:24.880 | but they actually cluster together, right?
01:08:27.360 | And you look at GPUs.
01:08:28.360 | Well, there's a couple of major vendors of GPUs
01:08:30.560 | and they maybe don't always get along,
01:08:32.520 | but their architectures are very similar.
01:08:34.560 | You look at CPUs.
01:08:35.480 | CPUs are still super important
01:08:37.360 | for the deployment side of things.
01:08:39.040 | And you see new architectures coming out
01:08:40.920 | from all the cloud providers and things like this,
01:08:42.680 | and they're all super important to the world, right?
01:08:45.440 | But they don't have the 30 years of development
01:08:47.600 | that the entrenched people do, right?
01:08:49.720 | And so what Modular can do is we're saying,
01:08:52.200 | okay, all this complexity,
01:08:54.040 | like it's not bad complexity,
01:08:56.200 | it's actually innovation, right?
01:08:59.400 | And so it's innovation that's happening
01:09:01.360 | and it's for good reasons,
01:09:03.120 | but I have sympathy for the poor software people, right?
01:09:06.120 | I mean, again, I'm generally a software person too.
01:09:08.440 | I love hardware, but software people
01:09:10.560 | wanna build applications and products and solutions
01:09:13.280 | that scale over many years.
01:09:15.960 | They don't wanna build a solution
01:09:17.000 | for one generation of hardware with one vendor's tools, right?
01:09:20.720 | And because of this,
01:09:21.560 | they need something that scales with them.
01:09:23.760 | They need something that works on cloud and mobile, right?
01:09:27.920 | Because their product manager said,
01:09:29.720 | hey, I want it to have lower latency
01:09:31.680 | and it's better for personalization
01:09:33.520 | or whatever they decide, right?
01:09:35.480 | Products evolve.
01:09:36.480 | And so the challenge with the machine learning technology
01:09:39.720 | and the infrastructure that we have today in the industry
01:09:42.000 | is that it's all these point solutions.
01:09:44.720 | And because there are all these point solutions,
01:09:46.240 | it means that as your product evolves,
01:09:47.640 | you have to switch different technology stacks
01:09:49.760 | or switch to a different vendor.
01:09:51.320 | And what that does is that slows down progress.
01:09:54.160 | - So basically, a lot of the things we've developed
01:09:57.480 | in those little silos for machine learning tasks,
01:10:01.880 | you want to make that the first class citizen
01:10:03.880 | of a general purpose programming language
01:10:06.160 | that can then be compiled
01:10:07.160 | across all these kinds of hardware.
01:10:08.680 | - Well, so it's not really about a programming language.
01:10:10.640 | I mean, the programming language
01:10:11.600 | is a component of the mission, right?
01:10:13.840 | And the mission is, or not literal,
01:10:15.840 | but our joking mission is to save the world
01:10:17.760 | from terrible AI software.
01:10:19.640 | - Excellent, I love it.
01:10:20.480 | - Okay. (laughs)
01:10:21.800 | So, if you look at this mission, you need a syntax.
01:10:26.800 | So that's, so yeah, you need a programming language, right?
01:10:29.360 | And like, we wouldn't have to build the programming language
01:10:31.960 | if one existed, right?
01:10:33.880 | So if Python was already good enough, then cool,
01:10:35.240 | we would've just used it, right?
01:10:36.360 | We're not just doing very large scale,
01:10:38.800 | expensive engineering projects for the sake of it.
01:10:40.760 | Like, it's to solve a problem, right?
01:10:43.360 | It's also about accelerators.
01:10:47.160 | It's also about exotic numerics and BFloat16
01:10:50.160 | and matrix multiplications and convolutions
01:10:52.120 | and like this kind of stuff.
01:10:54.160 | Within the stack, there are things like kernel fusion.
01:10:57.720 | That's an esoteric but really important thing
01:10:59.800 | that leads to much better performance
01:11:02.120 | and much more general research hackability together.
01:11:06.560 | Right, and so-- - And that's enabled
01:11:08.160 | by the ASICs, that's enabled by certain hardware.
01:11:10.720 | So it's like, where's the dance between,
01:11:13.000 | I mean, there's several questions here.
01:11:15.800 | Like, how do you add a piece of hardware to the stack?
01:11:18.520 | - Yeah. - If a new piece of,
01:11:19.400 | like if I have this genius invention
01:11:22.360 | of a specialized accelerator,
01:11:24.960 | how do I add that to the modular framework?
01:11:27.320 | And also, how does modular as a standard
01:11:30.680 | start to define the kind of hardware
01:11:33.960 | that should be developed?
01:11:35.120 | - Yeah, so let me take a step back
01:11:36.800 | and talk about status quo, okay?
01:11:39.160 | And so if you go back to TensorFlow 1,
01:11:41.880 | PyTorch 1, this kind of timeframe,
01:11:44.640 | and these have all evolved and gotten way more complicated.
01:11:47.840 | So let's go back to the glorious simple days, right?
01:11:50.680 | These things basically were CPUs and CUDA.
01:11:54.000 | And so what you do is you say, go do a dense layer,
01:11:58.000 | and a dense layer has a matrix multiplication in it, right?
01:12:00.360 | And so when you say that, you say,
01:12:02.120 | go do this big operation of matrix multiplication,
01:12:04.920 | and if it's on a GPU, kick off a CUDA kernel.
01:12:08.240 | If it's on a CPU, go do like an Intel algorithm
01:12:11.840 | or something like that with the Intel MKL, okay?
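That early TF1/PyTorch1-style dispatch can be sketched as a per-backend kernel registry. The registry, decorator, and kernel names here are illustrative stand-ins, not the actual TensorFlow or PyTorch internals:

```python
KERNELS = {}  # (op name, device) -> kernel implementation

def register(op, device):
    def deco(fn):
        KERNELS[(op, device)] = fn
        return fn
    return deco

@register("matmul", "cpu")
def matmul_cpu(a, b):
    # stand-in for calling an optimized vendor library such as Intel MKL
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

@register("matmul", "gpu")
def matmul_gpu(a, b):
    # stand-in for kicking off a CUDA kernel; here it just reuses the CPU path
    return matmul_cpu(a, b)

def dense(x, w, device="cpu"):
    # a "dense layer" looks up the right kernel for the device it runs on
    return KERNELS[("matmul", device)](x, w)
```

The pain point described in the conversation shows up as soon as the table needs an entry for every (operator, device) pair: 2,000 operators times thousands of devices.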
01:12:14.640 | Now, that's really cool if you're either NVIDIA
01:12:17.760 | or Intel, right?
01:12:19.240 | But then more hardware comes in, right?
01:12:23.120 | And on one axis, you have more hardware coming in.
01:12:25.440 | On the other hand, you have an explosion
01:12:27.680 | of innovation in AI.
01:12:29.360 | And so what happened with both TensorFlow and PyTorch
01:12:31.120 | is that the explosion of innovation in AI has led to,
01:12:35.000 | it's not just about matrix multiplication and convolution.
01:12:37.400 | These things have now like 2,000 different operators.
01:12:40.880 | And on the other hand, you have,
01:12:41.960 | I don't know how many pieces of hardware there are out there.
01:12:44.000 | It's a lot, okay?
01:12:45.680 | It's not even hundreds.
01:12:47.320 | It's probably thousands, okay?
01:12:48.880 | And across all of edge and across like all
01:12:51.760 | the different things that exist.
01:12:52.600 | - That are used at scale.
01:12:54.080 | - Yeah, exactly.
01:12:54.920 | I mean, AI-- - Also, it's not just like--
01:12:56.360 | - AI is everywhere. - A handful.
01:12:57.560 | Yeah, it's not a handful of TPU alternatives.
01:12:59.800 | - Correct.
01:13:00.640 | - It's-- - It's every phone,
01:13:02.320 | often with many different-- - Right.
01:13:04.680 | - Chips inside of it from different vendors.
01:13:06.520 | - Right.
01:13:07.360 | - Like it's, AI is everywhere.
01:13:10.160 | It's a thing, right?
01:13:11.160 | - Why are they all making their own chips?
01:13:12.760 | Like why is everybody making their own thing?
01:13:15.520 | - Well, so because-- - Is that a good thing?
01:13:17.240 | - For sure.
01:13:18.080 | - So Chris's philosophy on hardware.
01:13:19.520 | - Yeah.
01:13:20.360 | - Right, so my philosophy is that
01:13:22.040 | there isn't one right solution, right?
01:13:25.080 | And so I think that, again, we're at the end of Moore's law.
01:13:27.760 | Specialization happens.
01:13:29.160 | - Yeah.
01:13:30.000 | - If you're building, if you're training GPT-5,
01:13:33.000 | you want some crazy supercomputer data center thingy.
01:13:38.040 | If you're making a smart camera that runs on batteries,
01:13:41.680 | you want something that looks very different.
01:13:43.840 | If you're building a phone,
01:13:44.720 | you want something that looks very different.
01:13:45.880 | If you have something like a laptop,
01:13:47.720 | you want something that looks maybe similar,
01:13:49.640 | but a different scale, right?
01:13:51.120 | And so AI ends up touching all of our lives.
01:13:54.320 | Robotics, right?
01:13:55.360 | And like lots of different things.
01:13:57.080 | And so as you look into this,
01:13:59.040 | these have different power envelopes.
01:14:00.960 | There's different trade-offs in terms of the algorithms.
01:14:03.080 | There's new innovations in sparsity
01:14:04.720 | and other data formats and things like that.
01:14:06.800 | And so hardware innovation,
01:14:09.040 | I think is a really good thing, right?
01:14:10.760 | And what I'm interested in is unlocking that innovation.
01:14:12.880 | There's also like analog and quantum
01:14:14.440 | and like all the really weird stuff, right?
01:14:18.360 | And so if somebody can come up with a chip
01:14:20.440 | that uses analog computing
01:14:21.760 | and it's 100X more power efficient,
01:14:23.920 | think what that would mean in terms of the daily impact
01:14:26.840 | on the products we use.
01:14:28.040 | That'd be huge.
01:14:29.440 | Now, if you're building an analog computer,
01:14:32.240 | you may not be a compiler specialist.
01:14:34.600 | But these are different skill sets, right?
01:14:36.440 | And so you can hire some compiler people
01:14:38.840 | if you're running a big company maybe.
01:14:40.680 | But it turns out these are really
01:14:43.000 | like an exotic new generation of compilers.
01:14:46.480 | Like this is a different thing, right?
01:14:47.920 | And so if you take a step back out
01:14:49.800 | and come back to what is the status quo?
01:14:51.720 | Status quo is that if you're Intel or you're Nvidia,
01:14:54.960 | you continue to keep up with the industry
01:14:56.840 | and you chase and, okay, there's 1900 now,
01:14:59.560 | there's 2000 now, there's 2100.
01:15:01.520 | And you have a huge team of people
01:15:02.960 | that are like trying to keep up and tune and optimize.
01:15:04.840 | And even when one of the big guys comes out
01:15:07.760 | with a new generation of their chip,
01:15:09.600 | they have to go back and rewrite all these things, right?
01:15:12.160 | So really it's only powered by having hundreds of people
01:15:14.560 | that are all like frantically trying to keep up.
01:15:17.200 | And what that does is that keeps out the little guys.
01:15:19.880 | And sometimes the not so little guys,
01:15:21.240 | the big guys that are also just not
01:15:23.520 | in those dominant positions.
01:15:24.640 | And so what has been happening,
01:15:28.080 | and so a lot of, you talk about the rise
01:15:29.840 | of new exotic crazy accelerators,
01:15:31.920 | is people have been trying to turn this from a,
01:15:34.120 | let's go write lots of special kernels problem
01:15:36.600 | into a compiler problem.
01:15:38.640 | And so we, and I contributed to this as well,
01:15:41.840 | we as an industry went into it,
01:15:43.200 | like, let's make this a compiler problem.
01:15:46.440 | Call it the "compiler problem" phase.
01:15:47.280 | And much of the industry is still in this phase,
01:15:49.280 | by the way, so I wouldn't say this phase is over.
01:15:51.880 | And so the idea is to say, look, okay,
01:15:54.680 | what a compiler does is it provides
01:15:56.280 | a much more general extensible hackable interface
01:16:01.280 | for dealing with the general case, right?
01:16:04.520 | And so within machine learning algorithms
01:16:08.360 | for example, people figured out that,
01:16:09.840 | hey, if I do a matrix multiplication
01:16:11.680 | and I do a ReLU, right?
01:16:13.760 | The classic activation function,
01:16:16.860 | it is way faster to do one pass over the data
01:16:20.080 | and then do the ReLU on the output
01:16:22.760 | where I'm writing out the data,
01:16:24.040 | 'cause ReLU is just a maximum operation, right?
01:16:26.240 | Max with zero.
01:16:27.320 | And so it's an amazing optimization,
01:16:30.280 | take MatMul, ReLU, squish together in one operation,
01:16:33.500 | now we have MatMul ReLU.
01:16:35.400 | Well, wait a second, if I do that now,
01:16:37.360 | I just went from having two operators to three.
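The fused-versus-unfused trade can be sketched in plain Python (purely illustrative; a real compiler fuses inside the generated kernel, not at this level). The unfused version materializes the matmul result and then re-walks it for ReLU; the fused version applies max-with-zero as each output element is produced:

```python
def relu(x):
    return x if x > 0.0 else 0.0

def matmul(a, b):
    # naive matrix multiply: out[i][j] = sum_k a[i][k] * b[k][j]
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matmul_relu_unfused(a, b):
    c = matmul(a, b)                              # pass 1: write the full result
    return [[relu(v) for v in row] for row in c]  # pass 2: re-read it for ReLU

def matmul_relu_fused(a, b):
    # one pass: ReLU applied as each element is produced, so the
    # intermediate never has to be written out and read back in
    return [[relu(sum(x * y for x, y in zip(row, col))) for col in zip(*b)]
            for row in a]
```

Both give the same answer; the fused version just avoids the extra trip through memory, which is where the speedup comes from on real hardware.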
01:16:40.960 | But now I figure out, okay,
01:16:41.880 | well, there's a lot of activation functions.
01:16:43.680 | What about leaky ReLU?
01:16:46.400 | What about like a million things that are out there, right?
01:16:49.320 | And so as I start fusing these in,
01:16:51.740 | now I get permutations of all these algorithms, right?
01:16:54.480 | And so what the compiler people said is they said,
01:16:56.000 | hey, cool, well, I will go enumerate all the algorithms
01:16:58.680 | and I will enumerate all the pairs
01:16:59.960 | and I will actually generate a kernel for you.
01:17:02.320 | And I think that this has been very useful for the industry.
01:17:05.200 | This is one of the things that powers Google TPUs.
01:17:08.160 | PyTorch 2 is, like, rolling out really cool compiler stuff
01:17:10.920 | with Triton, this other technology and things like this.
01:17:13.880 | And so the compiler people are kind of coming to the fore
01:17:16.840 | and saying like, awesome, this is a compiler problem,
01:17:18.680 | we'll compile it.
01:17:19.600 | Here's the problem.
01:17:22.360 | Not everybody's a compiler person.
01:17:23.700 | I love compiler people, trust me, right?
01:17:25.420 | But not everybody can or should be a compiler person.
01:17:27.880 | It turns out that there are people
01:17:30.120 | that know analog computers really well,
01:17:31.920 | or they know some GPU internal architecture thing
01:17:35.240 | really well, or they know some crazy,
01:17:37.200 | sparse numeric, interesting algorithm
01:17:40.280 | that is the cusp of research,
01:17:42.160 | but they're not compiler people.
01:17:43.560 | And so one of the challenges with this new wave
01:17:46.080 | of technology trying to turn everything into a compiler,
01:17:49.160 | is again, it has excluded a ton of people.
01:17:51.560 | And so you look at what does Mojo do,
01:17:53.480 | what does the modular stack do,
01:17:55.220 | is it brings programmability back into this world.
01:17:57.780 | Like it enables, I wouldn't say normal people,
01:18:00.300 | but a different kind of delightful nerd
01:18:03.320 | that cares about numerics, or cares about hardware,
01:18:05.840 | or cares about things like this,
01:18:07.320 | to be able to express that in the stack
01:18:08.900 | and extend the stack without having to
01:18:11.160 | actually go hack the compiler itself.
01:18:12.920 | - So extend the stack on the algorithm side,
01:18:16.520 | and then on the hardware side.
01:18:18.440 | - Yeah, so again, go back to the simplest example of int.
01:18:21.840 | And so what both Swift and Mojo
01:18:23.320 | and other things like this did,
01:18:24.820 | is we said, okay, pull magic out of the compiler
01:18:27.160 | and put it in the standard library.
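As an analogy for "pull magic out of the compiler" (a Python dataclass standing in for a Mojo/Swift struct; this is not actual Mojo code), an integer type whose behavior lives entirely in library code might look like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Int32:
    """A 32-bit-style integer defined purely in library code."""
    value: int

    def __add__(self, other):
        # wraparound semantics implemented here, not as compiler magic
        wrapped = (self.value + other.value + 2**31) % 2**32 - 2**31
        return Int32(wrapped)
```

The point of the design is that anyone can define a type like this, with its own rules, without touching the compiler, which is what lets researchers and hardware vendors extend the stack.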
01:18:29.040 | And so what modular's doing with the engine
01:18:31.040 | that we're providing and this very deep technology stack,
01:18:34.640 | which goes into heterogeneous runtimes
01:18:36.720 | and a whole bunch of really cool things,
01:18:39.780 | this whole stack allows that stack to be extended
01:18:44.440 | and hacked and changed by researchers,
01:18:46.840 | and by hardware innovators,
01:18:47.880 | and by people who know things that we don't know.
01:18:51.680 | 'Cause modular has some smart people,
01:18:53.240 | but we don't have all the smart people, it turns out.
01:18:56.240 | - What are heterogeneous runtimes?
01:18:57.960 | - Yeah, so what is heterogeneous?
01:19:01.600 | So heterogeneous just means
01:19:02.740 | many different kinds of things together.
01:19:04.720 | And so the simplest example you might come up with
01:19:07.640 | is a CPU and a GPU.
01:19:09.760 | And so it's a simple heterogeneous computer to say,
01:19:12.680 | I will run my data loading and pre-processing
01:19:14.920 | and other algorithms on the CPU,
01:19:16.800 | and then once I get it into the right shape,
01:19:18.400 | I shove it into the GPU,
01:19:19.760 | I do a lot of matrix multiplications and convolutions
01:19:22.200 | and things like this,
01:19:23.440 | and I get it back out and I do some reductions and summaries
01:19:27.080 | and I shove it across the wire,
01:19:29.280 | across the network to another machine.
01:19:31.280 | And so you've got now what are effectively two computers,
01:19:36.080 | a CPU and a GPU talking to each other,
01:19:38.360 | working together in a heterogeneous system.
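A toy version of that two-device split, with two single-worker thread pools standing in for the CPU and GPU (the "devices," stage functions, and names are all illustrative):

```python
import concurrent.futures as cf

# Two single-worker executors stand in for two devices running asynchronously.
cpu = cf.ThreadPoolExecutor(max_workers=1)  # "CPU": data loading / preprocessing
gpu = cf.ThreadPoolExecutor(max_workers=1)  # "GPU": the heavy math

def preprocess(batch):
    # "CPU" stage: get the data into the right shape and scale
    return [x / 255.0 for x in batch]

def heavy_math(batch, weight):
    # "GPU" stage: stand-in for matmuls and convolutions
    return [x * weight for x in batch]

def run(batches, weight=2.0):
    outs = []
    for batch in batches:
        pre = cpu.submit(preprocess, batch)                        # stage 1
        outs.append(gpu.submit(heavy_math, pre.result(), weight))  # stage 2
    return [o.result() for o in outs]
```

In a real system the hand-off would be a DMA transfer over PCIe rather than a `Future`, and the stages would overlap across batches, but the shape of the problem is the same: two processors cooperating through asynchronous messages.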
01:19:40.520 | But that was 10 years ago.
01:19:43.760 | You look at a modern cell phone.
01:19:47.360 | Modern cell phone, you've got CPUs,
01:19:50.080 | and they're not just CPUs,
01:19:51.160 | there's like big.LITTLE CPUs,
01:19:53.080 | and so there's multiple different kinds of CPUs
01:19:54.880 | that are kind of working together,
01:19:56.240 | that are multi-core.
01:19:57.320 | You've got GPUs, you've got neural network accelerators,
01:20:01.120 | you got dedicated hardware blocks for media,
01:20:04.760 | so for video decode and JPEG decode and things like this.
01:20:07.560 | And so you've got this massively complicated system,
01:20:09.600 | and this isn't just cell phones,
01:20:10.600 | every laptop these days is doing the same thing,
01:20:13.240 | and all of these blocks can run at the same time
01:20:16.320 | and need to be choreographed, right?
01:20:19.680 | And so again, one of the cool things about machine learning
01:20:22.200 | is it's moving things to like data flow graphs
01:20:24.240 | and a higher level of abstractions and tensors
01:20:26.480 | and these things that it doesn't specify,
01:20:28.600 | here's how to do the algorithm.
01:20:30.880 | It gives the system a lot more flexibility
01:20:32.640 | in terms of how to translate or map it or compile it
01:20:35.500 | onto the system that you have.
01:20:37.240 | And so what you need,
01:20:38.400 | the bottom-est part of the layer there
01:20:40.200 | is a way for all these devices to talk to each other.
01:20:43.440 | And so this is one thing that I'm very passionate about.
01:20:46.520 | I mean, you know I'm a nerd.
01:20:47.920 | But all these machines and all these systems
01:20:51.280 | are effectively parallel computers
01:20:53.640 | running at the same time,
01:20:55.240 | sending messages to each other.
01:20:56.680 | And so they're all fully asynchronous.
01:20:59.240 | Well, this is actually a small version
01:21:01.120 | of the same problem you have in a data center.
01:21:03.480 | Right, in a data center,
01:21:04.320 | you now have multiple different machines,
01:21:06.360 | sometimes very specialized,
01:21:07.600 | sometimes with GPUs or TPUs in one node
01:21:10.400 | and sometimes with disks in another node.
01:21:12.120 | And so you get a much larger scale heterogeneous computer.
01:21:15.800 | And so what ends up happening
01:21:16.960 | is you have this like multi-layer abstraction
01:21:19.680 | of hierarchical parallelism,
01:21:21.160 | hierarchical asynchronous communication.
01:21:24.440 | And making that, again, my enemy is complexity.
01:21:28.520 | By getting that away from being
01:21:30.480 | different specialized systems
01:21:31.800 | at every different part of the stack
01:21:33.600 | and having more consistency and uniformity,
01:21:36.000 | I think we can help lift the world
01:21:37.240 | and make it much simpler and actually get used.
01:21:39.720 | - But how do you leverage the strengths
01:21:41.400 | of the different specialized systems?
01:21:42.760 | So looking inside the smartphone,
01:21:44.800 | I don't know, five, six computers essentially
01:21:48.680 | inside a smartphone.
01:21:50.240 | How do you do that without the programmer
01:21:55.240 | having to make it explicit which computer
01:21:59.160 | is supposed to be used for which operation?
01:22:00.560 | - Yeah, so there's a pretty well-known algorithm
01:22:03.240 | and what you're doing is you're looking at two factors.
01:22:05.640 | You're looking at the factor of sending data
01:22:07.680 | from one thing to another, right?
01:22:09.240 | 'Cause it takes time to get it from that side of the chip
01:22:11.120 | to that side of the chip and things like this.
01:22:13.600 | And then you're looking at what is the time it takes
01:22:15.120 | to do an operation on a particular block?
01:22:18.440 | So take CPUs.
01:22:19.560 | CPUs are fully general.
01:22:22.080 | They can do anything, right?
01:22:23.760 | But then you have a neural net accelerator
01:22:25.320 | that's really good at matrix multiplications, okay?
01:22:28.040 | And so you say, okay, well,
01:22:29.280 | if my workload is all matrix multiplications,
01:22:31.800 | I start up, I send the data over the neural net thing,
01:22:34.560 | it goes and does matrix multiplications.
01:22:36.160 | When it's done, it sends me back the result.
01:22:38.160 | All is good, right?
01:22:39.560 | And so the simplest thing is just saying,
01:22:41.520 | do matrix operations over there, right?
01:22:44.800 | But then you realize you get a little bit more complicated
01:22:47.120 | because you can do matrix multiplications on a GPU,
01:22:49.520 | you can do it on a neural net accelerator,
01:22:52.840 | you can do it on CPU,
01:22:54.200 | and they'll have different trade-offs and costs.
01:22:56.160 | And it's not just matrix multiplication.
01:22:58.040 | And so what you actually look at is you look at,
01:23:00.480 | I have generally a graph of compute.
01:23:03.040 | I wanna do a partitioning.
01:23:04.760 | I wanna look at the communication,
01:23:07.080 | the bisection bandwidth and like the overhead
01:23:09.120 | and the sending of all these different things
01:23:11.320 | and build a model for this and then decide,
01:23:13.840 | okay, it's an optimization problem.
01:23:15.440 | Where do I wanna place this compute?
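The two-factor model described here (transfer cost plus compute cost) can be sketched as a greedy placement over a linear chain of ops. All the numbers and device names below are made up for illustration; real schedulers work over full graphs and measured costs:

```python
COMPUTE_TIME = {  # illustrative microseconds per op on each device
    ("matmul", "cpu"): 900, ("matmul", "gpu"): 40, ("matmul", "npu"): 25,
    ("reduce", "cpu"): 50,  ("reduce", "gpu"): 60, ("reduce", "npu"): 500,
}
TRANSFER_TIME = {  # cost of moving the operand between devices
    ("cpu", "cpu"): 0,  ("cpu", "gpu"): 30, ("cpu", "npu"): 35,
    ("gpu", "cpu"): 30, ("gpu", "gpu"): 0,  ("gpu", "npu"): 45,
    ("npu", "cpu"): 35, ("npu", "gpu"): 45, ("npu", "npu"): 0,
}

def place(ops, start="cpu"):
    """Greedily place each op on the device minimizing transfer + compute."""
    here, plan, total = start, [], 0
    for op in ops:
        best = min(("cpu", "gpu", "npu"),
                   key=lambda d: TRANSFER_TIME[(here, d)] + COMPUTE_TIME[(op, d)])
        total += TRANSFER_TIME[(here, best)] + COMPUTE_TIME[(op, best)]
        plan.append((op, best))
        here = best
    return plan, total
```

Note how the model naturally keeps back-to-back matmuls on the accelerator (the transfer cost dominates) but pulls the final reduction back to the CPU, which is exactly the trade-off being described.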
01:23:17.960 | - So it's the old school theoretical computer science problem
01:23:21.200 | of scheduling.
01:23:22.040 | And then how does, presumably it's possible
01:23:26.320 | to somehow magically include autotune into this?
01:23:29.420 | - Absolutely.
01:23:31.320 | So, I mean, in my opinion,
01:23:33.400 | this is an opinion,
01:23:34.240 | this is not, not everybody would agree with this,
01:23:36.720 | but in my opinion,
01:23:38.120 | the world benefits from simple and predictable systems
01:23:41.120 | at the bottom that you can control.
01:23:43.820 | But then once you have a predictable execution layer,
01:23:47.240 | you can build lots of different policies on top of it.
01:23:50.120 | And so one policy can be that the human programmer says,
01:23:55.000 | do that here, do that here, do that here, do that here,
01:23:57.340 | and like fully manually controls everything.
01:23:59.540 | And the system should just do it.
01:24:02.880 | Then you quickly get in the mode of like,
01:24:04.140 | I don't wanna have to tell it to do it.
01:24:06.000 | And so the next logical step that people typically take
01:24:08.760 | is they write some terrible heuristic.
01:24:11.240 | Oh, if it's a matrix multiplication, do it over there.
01:24:13.320 | Or if it's floating point, do it on the GPU.
01:24:14.960 | If it's integer, do it on the CPU.
01:24:16.160 | Like something like that, right?
01:24:17.680 | And then you then get into this mode of like,
01:24:21.280 | people care more and more and more,
01:24:22.280 | and you say, okay, well,
01:24:23.100 | let's actually like make the heuristic better.
01:24:26.120 | Let's get into auto-tuning.
01:24:27.640 | Let's actually do a search of the space to decide,
01:24:32.400 | well, what is actually better, right?
01:24:35.320 | Well, then you get into this problem
01:24:36.840 | where you realize this is not a small space.
01:24:38.480 | This is a many dimensional, hyper-dimensional space
01:24:42.580 | that you cannot exhaustively search.
01:24:45.280 | So do you know of any algorithms
01:24:47.040 | that are good at searching very complicated spaces for-
01:24:50.160 | - Don't tell me you're gonna turn this
01:24:51.740 | into a machine learning problem.
01:24:53.440 | - So then you turn it into a machine learning problem,
01:24:55.280 | and then you have a space of genetic algorithms
01:24:57.520 | and reinforcement learning and like all these-
01:25:00.080 | - But can you include that into the stack,
01:25:02.040 | into the module stack?
01:25:04.080 | - Yeah, yeah.
01:25:04.920 | - And where does it sit?
01:25:05.880 | Where does it live?
01:25:06.700 | Is it a separate thing, or is it part of the compilation?
01:25:08.920 | - So you start from simple and predictable models.
01:25:11.940 | And so you can have full control,
01:25:13.400 | and you can have coarse-grained knobs
01:25:15.720 | that like nudge systems so you don't have to do this.
01:25:18.400 | But if you really care about getting the best,
01:25:21.200 | you know, the last ounce out of a problem,
01:25:23.360 | then you can use additional tools.
01:25:25.300 | And here's the cool thing:
01:25:26.240 | you don't wanna do this every time you run a model.
01:25:28.200 | You wanna figure out the right answer and then cache it.
01:25:31.600 | And once you do that, you can get,
01:25:33.220 | you can say, okay, cool,
01:25:34.360 | I can get up and running very quickly.
01:25:36.360 | I can get good execution out of my system.
01:25:39.960 | I can decide if something's important,
01:25:41.760 | and if it's important,
01:25:42.600 | I can go throw a bunch of machines at it
01:25:44.160 | and do a big, expensive search over the space
01:25:46.360 | using whatever technique I feel like.
01:25:47.760 | It's really up to the problem.
01:25:49.800 | And then when I get the right answer,
01:25:51.240 | cool, I can just start using it.
01:25:53.320 | Right, and so you can get out of this trade-off
01:25:55.920 | between, okay, am I gonna like spend forever doing a thing,
01:25:58.680 | or do I get up and running quickly,
01:26:00.340 | and what's the quality of the result?
01:26:01.560 | Like these are actually not in contention with each other
01:26:05.040 | if the system's designed to scale.
01:26:07.440 | - You started and did a little bit of a whirlwind overview
01:26:10.960 | of how you get the 35,000x speedup or more over Python.
01:26:15.960 | Jeremy Howard did a really great presentation
01:26:19.760 | about sort of the basic, like, look at the code,
01:26:22.440 | here's how you get the speedup.
01:26:23.880 | Like you said, that's something we could,
01:26:26.080 | probably developers can do for their own code
01:26:28.560 | to see how you can get these gigantic speedups.
01:26:31.120 | But can you maybe speak to the machine learning task
01:26:33.720 | in general?
01:26:34.560 | How do you make some of this code fast and specific?
01:26:37.280 | What would you say is the main bottleneck?
01:26:41.220 | For machine learning tasks.
01:26:44.780 | So are we talking about matmul, matrix multiplication,
01:26:49.020 | how do you make that fast?
01:26:50.300 | - So, I mean, if you just look at the Python problem,
01:26:52.700 | right, you can say, how do I make Python faster?
01:26:55.980 | And there's been a lot of people
01:26:56.820 | that have been working on the,
01:26:58.380 | okay, how do I make Python 2x faster, 10x faster,
01:27:00.580 | or something like that, right?
01:27:01.500 | And there've been a ton of projects in that vein, right?
01:27:04.340 | Mojo started from the, what can the hardware do?
01:27:07.000 | Like, what is the limit of physics?
01:27:09.980 | What is the speed of light?
01:27:11.060 | What is the, like, how fast can this thing go?
01:27:13.220 | And then, how do I express that?
01:27:15.580 | Right, and so it wasn't anchored relative
01:27:18.260 | to making Python a little bit faster.
01:27:20.220 | It's saying, cool, I know what the hardware can do,
01:27:22.580 | let's unlock that, right?
01:27:23.780 | Now, when you--
01:27:25.780 | - Can I just say how gutsy that is, to be in the meeting,
01:27:29.180 | as opposed to trying to see,
01:27:30.380 | how do we get the improvement?
01:27:31.780 | It's like, what can the physics do?
01:27:34.100 | - I mean, maybe I'm a special kind of nerd,
01:27:35.860 | but you look at that, what is the limit of physics?
01:27:38.580 | How fast can these things go, right?
01:27:41.300 | When you start looking at that,
01:27:42.820 | typically it ends up being a memory problem, right?
01:27:45.500 | And so today, particularly
01:27:47.740 | with these specialized accelerators,
01:27:49.460 | the problem is that you can do a lot of math within them,
01:27:52.900 | but you get bottleneck sending data back and forth
01:27:55.900 | to memory, whether it be local memory,
01:27:58.060 | or distant memory, or disk, or whatever it is,
01:28:01.420 | and that bottleneck, particularly
01:28:03.540 | as the training sizes get large,
01:28:05.380 | as you start doing tons of inferences all over the place,
01:28:08.540 | that becomes a huge bottleneck for people, right?
01:28:11.060 | So again, what happened is,
01:28:13.700 | we went through a phase of many years
01:28:15.300 | where people took the special case and hand-tuned it,
01:28:18.860 | and tweaked it, and tricked it out,
01:28:19.900 | and they knew exactly how the hardware worked,
01:28:21.140 | and they knew the model, and they made it fast.
01:28:24.020 | Didn't generalize.
01:28:25.540 | And so you can make, you know,
01:28:27.060 | ResNet-50, or AlexNet, or something,
01:28:29.580 | Inception V1, like you can do that, right?
01:28:31.740 | Because the models are small, they fit in your head, right?
01:28:35.180 | But as the models get bigger, more complicated,
01:28:37.380 | as the machines get more complicated, it stops working.
01:28:40.460 | Right, and so this is where things
01:28:42.500 | like kernel fusion come in.
01:28:44.540 | So what is kernel fusion?
01:28:45.500 | This is this idea of saying, let's avoid going to memory,
01:28:48.580 | and let's do that by building a new hybrid kernel,
01:28:52.780 | a numerical algorithm that actually keeps things
01:28:56.740 | in the accelerator, instead of having to write
01:28:58.980 | all the way out to memory, right?
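A minimal Python sketch of that fusion idea, with list intermediates standing in for accelerator memory traffic (illustrative only; real kernel fusion is done by the compiler, not by hand like this):

```python
def scale_then_add_unfused(xs, ys, a):
    # Unfused: the scaled result is fully materialized ("written out
    # to memory") before the second pass reads it back in.
    scaled = [a * x for x in xs]
    return [s + y for s, y in zip(scaled, ys)]

def scale_then_add_fused(xs, ys, a):
    # Fused: both operations happen per element while the value is
    # still "in the accelerator"; no intermediate buffer exists.
    return [a * x + y for x, y in zip(xs, ys)]
```

Both produce identical results; the fused form just never round-trips the intermediate through memory, which is exactly the bottleneck being avoided.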
01:29:01.140 | What's happened with these accelerators now
01:29:03.020 | is you get multiple levels of memory.
01:29:04.500 | Like in a GPU, for example, you'll have global memory,
01:29:06.940 | and local memory, and like all these things.
01:29:09.780 | If you zoom way into how hardware works,
01:29:13.620 | the register file is actually a memory.
01:29:16.420 | So the registers are like an L0 cache.
01:29:18.660 | And so a lot of taking advantage of the hardware
01:29:22.980 | ends up being fully utilizing the full power
01:29:26.740 | in all of its capability.
01:29:28.900 | And this has a number of problems, right?
01:29:30.700 | One of which is, again, the complexity disaster, right?
01:29:33.620 | There's too much hardware.
01:29:35.140 | Even if you just say, let's look at the chips
01:29:37.780 | from one line of vendor, like Apple, or Intel,
01:29:40.300 | or whatever it is, each version of the chip
01:29:43.580 | comes out with new features, and they change things
01:29:45.580 | so that it takes more time or less time
01:29:47.340 | to do different things.
01:29:48.180 | And you can't rewrite all the software
01:29:50.060 | whenever a new chip comes out, right?
01:29:52.020 | And so this is where you need a much more scalable approach.
01:29:54.500 | And this is what Mojo, and what the modular stack provides,
01:29:57.340 | is it provides this infrastructure,
01:29:59.500 | and the system for factoring all this complexity,
01:30:02.300 | and then allowing people to express algorithms.
01:30:04.180 | You talk about auto-tuning, for example,
01:30:06.500 | express algorithms in a more portable way
01:30:09.140 | so that when a new chip comes out,
01:30:11.020 | you don't have to rewrite it all.
01:30:13.540 | So to me, I kind of joke, what is a compiler?
01:30:16.820 | Well, there's many ways to explain that.
01:30:19.420 | You convert thing A into thing B,
01:30:21.420 | and you convert source code to machine code.
01:30:23.620 | Like, you can talk about many, many things that compilers do.
01:30:28.020 | But to me, it's about a bag of tricks.
01:30:30.740 | It's about a system and a framework
01:30:32.620 | that you can hang complexity.
01:30:35.180 | It's a system that can then generalize,
01:30:37.060 | and it can work on problems that are bigger than,
01:30:38.940 | fit in one human's head, right?
01:30:41.420 | And so what that means, what a good stack,
01:30:43.900 | and what the modular stack provides,
01:30:45.660 | is the ability to walk up to it with a new problem,
01:30:48.820 | and it'll generally work quite well.
01:30:50.620 | And that's something that a lot of machine learning
01:30:52.940 | infrastructure, and tools, and technologies don't have.
01:30:56.660 | Typical state of the art today is you walk up,
01:30:58.780 | particularly if you're deploying,
01:30:59.740 | if you walk up with a new model,
01:31:01.300 | you try to push it through the converter,
01:31:02.740 | and the converter crashes.
01:31:04.020 | That's crazy.
01:31:07.460 | The state of ML tooling today is not anything
01:31:10.940 | that a C programmer would ever accept, right?
01:31:13.820 | And it's always been this kind of flaky set of tooling
01:31:16.660 | that's never been integrated well,
01:31:17.940 | and it's been, never worked together,
01:31:21.340 | because it's not designed together.
01:31:22.980 | It's built by different teams,
01:31:24.100 | it's built by different hardware vendors,
01:31:25.500 | it's built by different systems,
01:31:26.780 | it's built by different internet companies
01:31:28.380 | that are trying to solve their problems, right?
01:31:30.740 | And so that means that we get this fragmented,
01:31:33.500 | terrible mess of complexity.
01:31:35.020 | - So, I mean, the specifics of,
01:31:37.580 | and Jeremy showed this, there's the vectorize function,
01:31:40.780 | which I guess is built into Mojo?
01:31:45.780 | - Vectorize, as he showed, is built into the library.
01:31:49.420 | - Into the library, it's in a library.
01:31:52.020 | Vectorize, parallelize, where vectorize is more low-level,
01:31:56.420 | parallelize is higher-level.
01:31:58.220 | There's the tiling thing,
01:31:59.420 | which is how he demonstrated the autotune, I think.
01:32:04.420 | - So think about this in like levels,
01:32:07.100 | hierarchical levels of abstraction, right?
01:32:09.580 | And so at the very, if you zoom all the way
01:32:12.140 | into a compute problem, you have one floating point number.
01:32:15.340 | Right, and so then you say, okay, I wanna be,
01:32:17.260 | I can do things one at a time in an interpreter.
01:32:20.740 | It's pretty slow, right?
01:32:21.900 | So I can get to doing one at a time in a compiler,
01:32:24.940 | I can see.
01:32:26.060 | Then I can get to doing four or eight or 16 at a time
01:32:29.780 | with vectors, that's called vectorization.
01:32:32.780 | Then you can say, hey, I have a whole bunch of different,
01:32:35.740 | you know, what a multi-core computer is,
01:32:37.980 | is it's basically a bunch of computers, right?
01:32:41.020 | So they're all independent computers
01:32:42.500 | that can talk to each other and they share memory.
01:32:44.980 | And so now what parallelize does is it says, okay,
01:32:47.340 | run multiple instances of this on different computers.
01:32:50.460 | And now they can all work together on a problem, right?
01:32:52.340 | And so what you're doing is you're saying,
01:32:53.900 | keep going out to the next level out.
01:32:56.860 | And as you do that, how do I take advantage of this?
01:32:59.420 | So tiling is a memory optimization, right?
01:33:02.540 | It says, okay, let's make sure that we're keeping the data
01:33:05.060 | close to the compute part of the problem,
01:33:07.860 | instead of sending it all back and forth through memory
01:33:10.860 | every time I load a block.
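The tiling idea can be sketched in plain Python with a blocked matrix multiply. Real tiled kernels pick tile sizes to match cache and register-file sizes; the `tile` parameter here is an arbitrary illustration of the knob being tuned:

```python
def matmul_tiled(A, B, n, tile=2):
    # n x n matrices stored as flat row-major lists. Looping over
    # tile-sized blocks keeps each block of A and B "close to the
    # compute" (cache-resident) while it is being reused, instead of
    # streaming the whole matrices through memory on every pass.
    C = [0.0] * (n * n)
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, n)):
                        a = A[i * n + k]
                        for j in range(j0, min(j0 + tile, n)):
                            C[i * n + j] += a * B[k * n + j]
    return C
```

The result is identical for any tile size; only the memory-access pattern changes, which is why tile size is a natural target for autotuning rather than a correctness concern.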
01:33:12.980 | - And the size of the block,
01:33:15.220 | that's how you get to the auto-tune
01:33:16.660 | to make sure it's optimized.
01:33:18.140 | - Yeah, well, so all of these,
01:33:19.220 | the details matter so much to get good performance.
01:33:22.100 | This is another funny thing about machine learning
01:33:24.380 | and high-performance computing that is very different
01:33:26.860 | than C compilers we all grew up with,
01:33:29.980 | where if you get a new version of GCC
01:33:32.820 | or a new version of Clang or something like that,
01:33:35.260 | maybe something will go 1% faster, right?
01:33:39.220 | And so compiler engineers will work really, really,
01:33:41.940 | really hard to get half a percent out of your C code,
01:33:45.060 | something like that.
01:33:46.360 | But when you're talking about an accelerator
01:33:48.660 | or an AI application,
01:33:50.020 | or you're talking about these kinds of algorithms,
01:33:53.540 | and these are things people used to write in Fortran,
01:33:55.260 | for example, right?
01:33:56.260 | If you get it wrong, it's not 5% or 1%.
01:33:59.940 | It could be 2x or 10x, right?
01:34:03.060 | If you think about it,
01:34:04.740 | you really want to make use of the full memory you have,
01:34:07.660 | the cache, for example,
01:34:09.100 | but if you use too much space, it doesn't fit in the cache,
01:34:11.740 | now you're gonna be thrashing all the way back out
01:34:13.580 | to main memory.
01:34:14.780 | And these can be 2x, 10x, major performance differences.
01:34:18.580 | And so this is where getting these magic numbers
01:34:21.220 | and these things right is really actually quite important.
01:34:23.980 | - So you mentioned that Mojo's a superset of Python.
01:34:27.540 | Can you run Python code as if it's Mojo code?
01:34:34.780 | - Yes, yes.
01:34:38.020 | So, and this has two sides of it.
01:34:41.180 | So Mojo's not done yet, so I'll give you a disclaimer,
01:34:43.660 | Mojo's not done yet,
01:34:44.500 | but already we see people that take small pieces
01:34:47.620 | of Python code, move it over, they don't change it,
01:34:50.620 | and you can get 12x speedups.
01:34:52.740 | Like somebody was just tweeting about that yesterday,
01:34:54.460 | which is pretty cool, right?
01:34:56.220 | And again, interpreters, compilers, right?
01:34:58.060 | And so without changing any code,
01:35:00.180 | without, also this is not JIT compiling
01:35:03.540 | or doing anything fancy,
01:35:05.140 | this is just basic stuff, move it straight over.
01:35:08.540 | Now Mojo will continue to grow out,
01:35:10.260 | and as it grows out,
01:35:11.140 | it will have more and more and more features,
01:35:13.300 | and our North Star is to be a full superset of Python,
01:35:15.820 | and so you can bring over basically arbitrary Python code
01:35:18.900 | and have it just work.
01:35:20.460 | And it may not always be 12x faster,
01:35:22.340 | but it should be at least as fast,
01:35:24.860 | and way faster in many cases, is the goal, right?
01:35:27.720 | Now it'll take time to do that,
01:35:31.060 | and Python is a complicated language,
01:35:32.460 | there's not just the obvious things,
01:35:34.500 | but there's also non-obvious things that are complicated,
01:35:37.740 | like we have to be able to talk to CPython packages
01:35:40.780 | that talk to the CAPI,
01:35:41.860 | and there's a bunch of pieces to this.
01:35:44.540 | - So you have to, I mean, just to make explicit,
01:35:47.500 | the obvious may not be so obvious
01:35:50.220 | until you think about it,
01:35:51.140 | so to run Python code,
01:35:53.060 | that means you have to run all the Python packages
01:35:56.180 | and libraries.
01:35:57.020 | - Yeah, yeah.
01:35:57.860 | - So that means what?
01:36:00.060 | What's the relationship between Mojo and CPython,
01:36:04.300 | the interpreter that's presumably would be tasked
01:36:08.080 | with getting those packages to work?
01:36:09.660 | - Yep, so in the fullness of time,
01:36:11.860 | Mojo will solve for all the problems,
01:36:14.180 | and you'll be able to move Python packages over
01:36:16.660 | and run them in Mojo.
01:36:17.980 | - Without the CPython.
01:36:19.220 | - Without CPython, someday.
01:36:21.300 | - Yeah.
01:36:22.140 | - Right, so not today, but someday.
01:36:23.340 | And that'll be a beautiful day,
01:36:24.740 | because then you'll get a whole bunch of advantages,
01:36:27.140 | and you'll get massive speedups and things like this.
01:36:29.100 | - But you can do that one at a time, right?
01:36:30.380 | You can move packages one at a time.
01:36:31.500 | - Exactly, but we're not willing to wait for that.
01:36:34.740 | Python is too important, the ecosystem is too broad,
01:36:37.940 | we wanna both be able to build Mojo out,
01:36:40.660 | we also wanna do it the right way,
01:36:41.780 | without intense time pressure,
01:36:44.620 | we're obviously moving fast,
01:36:46.140 | and so what we do is we say,
01:36:48.340 | okay, well let's make it so you can import
01:36:49.700 | an arbitrary existing package,
01:36:52.700 | arbitrary, including you write your own
01:36:57.180 | on your local disk, or whatever,
01:36:58.780 | it's not like a standard, an arbitrary package,
01:37:01.700 | and import that using CPython.
01:37:04.500 | 'Cause CPython already runs all the packages, right?
01:37:07.020 | And so what we do is we built an integration layer,
01:37:09.640 | where we can actually use CPython,
01:37:12.180 | again, I'm practical,
01:37:13.600 | and to actually just load and use
01:37:16.100 | all the existing packages as they are.
01:37:18.560 | The downside of that is you don't get the benefits
01:37:20.260 | of Mojo for those packages, right?
01:37:22.060 | And so they'll run as fast as they do
01:37:23.620 | in the traditional CPython way,
01:37:25.620 | but what that does is that gives you
01:37:28.060 | an incremental migration path,
01:37:29.700 | and so if you say, hey, cool,
01:37:31.220 | well here's a, you know, the Python ecosystem is vast,
01:37:34.100 | I want all of it to just work,
01:37:35.940 | but there's certain things that are really important,
01:37:37.740 | and so if I'm doing weather forecasting or something,
01:37:41.740 | well, I wanna be able to load all the data,
01:37:43.860 | I wanna be able to work with it,
01:37:44.700 | and then I have my own crazy algorithm inside of it,
01:37:47.180 | well, normally I'd write that in C++,
01:37:50.300 | if I can write in Mojo and have one system that scales,
01:37:52.820 | well, that's way easier to work with.
01:37:54.980 | - Is it hard to do that, to have that layer
01:37:58.100 | that's running CPython?
01:38:00.180 | Because is there some communication back and forth?
01:38:02.380 | - Yes, it's complicated.
01:38:03.980 | I mean, this is what we do,
01:38:05.060 | so I mean, we make it look easy,
01:38:06.820 | but it is complicated,
01:38:08.820 | but what we do is we use the CPython existing interpreter,
01:38:13.420 | so it's running its own byte codes,
01:38:14.980 | and that's how it provides full compatibility,
01:38:17.140 | and then it gives us CPython objects,
01:38:19.920 | and we use those objects as is,
01:38:22.940 | and so that way, we're fully compatible
01:38:25.140 | with all the CPython objects and all the, you know,
01:38:28.540 | it's not just the Python part,
01:38:30.060 | it's also the C packages, the C libraries underneath them,
01:38:33.060 | because they're often hybrid,
01:38:34.460 | and so we can fully run,
01:38:35.580 | and we're fully compatible with all that,
01:38:37.180 | and the way we do that is that we have to play
01:38:39.320 | by the rules, right,
01:38:40.220 | and so we keep objects in that representation
01:38:43.140 | when they're coming from that world.
01:38:44.380 | - What's the representation that's being used?
01:38:46.600 | - In memory.
01:38:47.440 | We'd have to know a lot about
01:38:48.820 | how the CPython interpreter works.
01:38:51.340 | It has, for example, reference counting,
01:38:53.660 | but also different rules on how to pass pointers around
01:38:56.420 | and things like this.
01:38:57.540 | Super low-level fiddly, and it's not like Python,
01:39:00.160 | it's like how the interpreter works, okay?
01:39:02.940 | And so that gets all exposed out,
01:39:04.820 | and then you have to define wrappers
01:39:06.780 | around the low-level C code, right?
01:39:08.940 | And so what this means is you have to know not only C,
01:39:13.820 | which is a different world from Python, obviously,
01:39:16.580 | not only Python, but the wrappers,
01:39:18.940 | but the interpreter and the wrappers
01:39:20.580 | and the implementation details and the conventions,
01:39:22.600 | and it's just this really complicated mess,
01:39:25.020 | and when you do that,
01:39:25.860 | now suddenly you have a debugger that debugs Python
01:39:28.820 | that can't step into C code,
01:39:30.220 | so you have this two-world problem, right?
01:39:33.100 | And so by pulling this all into Mojo,
01:39:36.420 | what you get is you get one world.
01:39:38.500 | You get the ability to say,
01:39:39.340 | "Cool, I have untyped, very dynamic,
01:39:42.020 | "beautiful, simple code.
01:39:44.260 | "Okay, I care about performance for whatever reason," right?
01:39:46.700 | There's lots of reasons you might care,
01:39:49.580 | and so then you add types, you can parallelize things,
01:39:52.140 | you can vectorize things, you can use these techniques,
01:39:54.220 | which are general techniques to solve a problem,
01:39:56.980 | and then you can do that by staying in the system,
01:39:59.780 | and if you have that one Python package
01:40:03.260 | that's really important to you, you can move it to Mojo,
01:40:05.380 | you get massive performance benefits on that,
01:40:07.500 | and other advantages.
01:40:09.700 | If you like static types, it's nice if they're enforced.
01:40:12.620 | Some people like that, right, rather than being hints,
01:40:14.660 | so there's other advantages too,
01:40:16.460 | and then you can do that incrementally as you go.
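A small illustration of that hints-versus-enforced distinction. In CPython, annotations are not checked at runtime; a Mojo-style system would enforce them and use them for compilation (this is ordinary Python, not Mojo syntax):

```python
def add_untyped(a, b):
    # Fully dynamic: works for ints, floats, strings, anything with +.
    return a + b

def add_typed(a: int, b: int) -> int:
    # In CPython these annotations are hints only; nothing stops a
    # caller from passing strings. A language that enforces them can
    # reject bad calls and compile this to fast, specialized code.
    return a + b

# CPython happily runs the "typed" version with the wrong types:
result = add_typed("a", "b")
```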
01:40:20.540 | - So one different perspective on this would be
01:40:27.020 | why Mojo instead of making CPython faster,
01:40:30.940 | redesigning CPython?
01:40:32.460 | - Yeah, well, I mean, you could argue
01:40:34.300 | Mojo is redesigning CPython,
01:40:36.300 | but why not make CPython faster and better
01:40:39.820 | and other things like that?
01:40:41.140 | There's lots of people working on that.
01:40:42.900 | So actually, there's a team at Microsoft
01:40:44.580 | that is really improving, I think CPython 3.11
01:40:49.540 | came out in October or something like that,
01:40:51.300 | and it was 15% faster, 20% faster across the board,
01:40:56.300 | which is pretty huge given how mature Python is
01:40:59.660 | and things like this, and so that's awesome, I love it.
01:41:03.300 | Doesn't run on GPU, it doesn't do AI stuff,
01:41:07.980 | like it doesn't do vectors, doesn't do things.
01:41:10.380 | 20% is good, 35,000 times is better, right?
01:41:14.980 | So they're definitely, I'm a huge fan of that work,
01:41:19.100 | by the way, and it composes well with what we're doing,
01:41:21.060 | and so it's not like we're fighting or anything like that,
01:41:23.980 | it's actually just goodness for the world,
01:41:26.300 | but it's just a different path, right?
01:41:27.900 | And again, we're not working forwards
01:41:29.980 | from making Python a little bit better,
01:41:31.900 | we're working backwards from what is the limit of physics.
01:41:34.940 | - What's the process of porting Python code to Mojo?
01:41:38.980 | Is there, what's involved in that process?
01:41:43.460 | Is there tooling for that?
01:41:44.940 | - Not yet, so we're missing some basic features right now,
01:41:48.300 | and so we're continuing to drop out new features
01:41:50.660 | on a weekly basis, but at the fullness of time,
01:41:55.060 | give us a year and a half, maybe two years.
01:41:57.980 | - Is it an automatable process?
01:41:59.860 | - So when we're ready, it'll be very automatable, yes.
01:42:03.500 | - Is it possible to automate, in the general case,
01:42:08.500 | the Python to Mojo conversion?
01:42:10.540 | - Yeah, well-- - You're saying it's possible.
01:42:12.140 | - Well, so, and this is why, I mean,
01:42:14.060 | among other reasons why we use tabs.
01:42:16.620 | - Yes. - Right, so, first of all,
01:42:18.700 | by being a superset, it's like C versus C++.
01:42:22.940 | Can you move C code to C++?
01:42:24.540 | - Yes. - Yeah, right,
01:42:26.940 | and you can move C code to C++,
01:42:29.660 | and then you can adopt classes,
01:42:32.460 | you can adopt templates, you can adopt other references
01:42:35.500 | or whatever C++ features you want
01:42:37.740 | after you move C code to C++.
01:42:40.060 | Like, you can't use templates in C, right?
01:42:42.940 | And so if you leave it at C, fine,
01:42:44.340 | you can't use the cool features, but it still works, right?
01:42:46.660 | And C and C++ code work together,
01:42:48.900 | and so that's the analogy, right?
01:42:51.060 | Now, here, right, you,
01:42:54.900 | there's not a Python is bad and a Mojo is good, right?
01:43:00.260 | Mojo just gives you superpowers, right?
01:43:02.220 | And so if you wanna stay with Python, that's cool,
01:43:05.260 | but the tooling should be actually very beautiful
01:43:08.100 | and simple because we're doing the hard work
01:43:10.460 | of defining a superset.
01:43:12.580 | - Right, so you're, right, so there's several things
01:43:15.140 | to say there, but also the conversion tooling
01:43:17.660 | should probably give you hints as to, like,
01:43:19.500 | how you can improve the code.
01:43:20.620 | - And then, yeah, exactly, once you're in the new world,
01:43:22.660 | then you can build all kinds of cool tools to say, like,
01:43:24.660 | hey, should you adopt this feature?
01:43:26.100 | Or, like, and we haven't built those tools yet,
01:43:27.980 | but I fully expect those tools will exist,
01:43:29.900 | and then you can, like, you know,
01:43:31.260 | quote, unquote, modernize your code
01:43:32.620 | or however you wanna look at it, right?
01:43:34.580 | So, I mean, one of the things that I think
01:43:35.980 | is really interesting about Mojo is that
01:43:38.460 | there have been a lot of projects
01:43:40.340 | to improve Python over the years.
01:43:42.780 | Everything from, you know, getting Python
01:43:45.180 | to run on the Java virtual machine,
01:43:47.620 | PyPy, which is a JIT compiler,
01:43:49.300 | there's tons of these projects out there
01:43:50.900 | that have been working on improving Python in various ways.
01:43:54.540 | They fall into one of two camps.
01:43:56.340 | So PyPy is a great example of a camp
01:43:58.640 | that is trying to be compatible with Python.
01:44:01.520 | Even there, not really, it doesn't work
01:44:03.100 | with all the C packages and stuff like that,
01:44:05.540 | but they're trying to be compatible with Python.
01:44:08.260 | There's also another category of these things
01:44:10.180 | where they're saying, well, Python is too complicated.
01:44:12.620 | (laughs)
01:44:13.460 | And, you know, I'm gonna cheat on the edges
01:44:15.940 | and, you know, like integers in Python
01:44:18.700 | can be an arbitrary size integer.
01:44:21.180 | Like, if you care about it fitting in a,
01:44:23.100 | going fast in a register in a computer,
01:44:25.340 | that's really annoying, right?
01:44:26.980 | And so you can choose to pass on that, right?
01:44:29.700 | You can say, well, people don't really use
01:44:31.860 | big integers that often, therefore,
01:44:33.460 | I'm gonna just not do it and it'll be fine.
01:44:35.660 | Not a Python superset. (laughs)
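The arbitrary-precision integer behavior he's referring to is easy to demonstrate in ordinary Python:

```python
# Python ints are arbitrary precision: 2**100 neither wraps nor overflows.
big = 2 ** 100
assert big == 1267650600228229401496703205376

# A 64-bit machine register cannot hold that value. Emulating the
# wraparound a fixed-width integer type would silently perform:
wrapped = (2 ** 64 + 5) % (2 ** 64)
assert wrapped == 5  # the high bits are simply lost
```

A superset has to preserve the first behavior even though the second is what maps directly onto fast hardware registers.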
01:44:39.220 | Or you can do the hard thing and say,
01:44:40.660 | okay, this is Python.
01:44:42.500 | You can't be a superset of Python
01:44:43.940 | without being a superset of Python.
01:44:46.820 | And that's a really hard technical problem,
01:44:49.920 | but it's, in my opinion, worth it, right?
01:44:52.400 | And it's worth it because it's not about any one package,
01:44:55.660 | it's about this ecosystem.
01:44:56.760 | It's about what Python means for the world.
01:44:58.840 | And it also means we don't wanna repeat
01:45:01.060 | the Python 2 to Python 3 transition.
01:45:02.860 | Like, we want people to be able to adopt this stuff quickly.
01:45:06.680 | And so by doing that work, we can help lift people.
01:45:09.940 | - Yeah, the challenge, it's really interesting,
01:45:11.820 | technical, philosophical challenge
01:45:13.300 | of really making a language a superset of another language.
01:45:18.300 | That's breaking my brain a little bit.
01:45:21.340 | - Well, it paints you in the corners.
01:45:22.980 | So again, I'm very happy with Python.
01:45:25.660 | So all joking aside, I think that the annotation thing
01:45:28.740 | is not the actual important part of the problem.
01:45:31.900 | - Yes. (laughs)
01:45:32.840 | - Right, but the fact that Python
01:45:34.820 | has amazing dynamic metaprogramming features
01:45:36.940 | and they translate to beautiful static metaprogramming
01:45:39.720 | features, I think is profound.
01:45:41.860 | I think that's huge, right?
01:45:43.220 | And so Python, I've talked with Guido about this.
01:45:45.820 | It's like, it was not designed to do what we're doing.
01:45:49.900 | That was not the reason they built it this way,
01:45:51.560 | but because they really cared and they were very thoughtful
01:45:53.760 | about how they designed the language,
01:45:55.420 | it scales very elegantly in the space.
01:45:58.060 | But if you look at other languages,
01:45:59.400 | for example, C and C++, right?
01:46:02.660 | If you're building a superset,
01:46:04.500 | you get stuck with the design decisions of the subset, right?
01:46:09.500 | And so, C++ is way more complicated
01:46:13.540 | because of C in the legacy than it would have been
01:46:16.400 | if they would have theoretically designed
01:46:18.140 | a from scratch thing.
01:46:19.420 | And there's lots of people right now
01:46:21.700 | that are trying to make C++ better and re-syntax C++.
01:46:25.340 | It's gonna be great, we'll just change all the syntax.
01:46:28.000 | But if you do that, now suddenly you have zero packages.
01:46:31.160 | You don't have compatibility.
01:46:32.160 | - So what are the, if you could just linger on that,
01:46:35.960 | what are the biggest challenges
01:46:38.280 | of keeping that superset status?
01:46:40.840 | What are the things you're struggling with?
01:46:42.480 | Is it all boiled down to having a big integer?
01:46:45.700 | - No, I mean, it's--
01:46:46.540 | - What are the other things like?
01:46:48.000 | - Usually it's the long tail weird things.
01:46:50.920 | So let me give you a war story.
01:46:52.960 | So war story in the space is, you go way back in time,
01:47:00.020 | a project I worked on is called Clang.
01:47:00.020 | Clang, what it is, is a C/C++ parser, right?
01:47:04.060 | And when I started working on Clang,
01:47:06.320 | it must've been like 2006 or something,
01:47:08.980 | it was when I, 2007, 2006,
01:47:11.340 | when I first started working on it, right?
01:47:13.820 | It's funny how time flies.
01:47:15.120 | - Yeah, yeah.
01:47:16.120 | - I started that project and I'm like, okay,
01:47:19.460 | well, I wanna build a C parser, C++ parser for LLVM.
01:47:24.460 | It's gonna be the, GCC is yucky.
01:47:29.140 | This is me in earlier times.
01:47:31.580 | It's yucky, it's unprincipled,
01:47:32.980 | it has all these weird features, all these bugs.
01:47:35.620 | It's yucky, so I'm gonna build a standard
01:47:39.100 | compliant C and C++ parser.
01:47:41.540 | It's gonna be beautiful, it'll be amazing, well engineered,
01:47:44.400 | all the cool things an engineer wants to do.
01:47:46.740 | And so I started implementing and building it out
01:47:48.180 | and building it out and building it out.
01:47:49.340 | And then I got to #include <stdio.h>.
01:47:52.400 | And all of the headers in the world use all the GCC stuff.
01:47:57.360 | Okay?
01:47:59.780 | And so again, come back away from theory,
01:48:03.700 | back to reality, right?
01:48:05.660 | I was at a fork in the road.
01:48:08.180 | I could have built an amazingly beautiful academic thing
01:48:11.140 | that nobody would ever use.
01:48:12.500 | Or I could say, well, it's yucky in various ways.
01:48:18.100 | All these design mistakes, accidents of history,
01:48:20.860 | the legacy, at that point GCC was like over 20 years old.
01:48:24.860 | Which, by the way, now LLVM's over 20 years old, right?
01:48:27.980 | So it's funny how time catches up to you, right?
01:48:30.340 | And so you say, okay, well, what is easier, right?
01:48:35.340 | I mean, as an engineer, it's actually much easier for me
01:48:38.460 | to go implement long tail compatibility, weird features,
01:48:41.660 | even if they're distasteful, and just do the hard work
01:48:44.740 | and figure it out, reverse engineer, understand what it is,
01:48:48.020 | write a bunch of test cases, try to understand behavior.
01:48:51.220 | It's way easier to do all that work as an engineer
01:48:53.820 | than it is to go talk to all C programmers
01:48:55.700 | and argue with them and try to get them
01:48:57.360 | to rewrite their code.
01:48:58.660 | - Yeah.
01:48:59.500 | 'Cause that breaks a lot more things.
01:49:02.900 | - Yeah, and you have realities.
01:49:05.180 | Nobody actually even understands how the code works,
01:49:07.900 | because it was written by the person who quit 10 years ago.
01:49:10.740 | (laughs)
01:49:11.580 | Right, and so this software is kind of frustrating that way,
01:49:16.420 | but that's how the world works, right?
01:49:19.140 | - Yeah, unfortunately it can never be
01:49:20.340 | this perfect, beautiful thing.
01:49:23.160 | - Well, there are occasions in which you get to build,
01:49:26.180 | like you invent a new data structure
01:49:28.740 | or something like that, or there's this beautiful algorithm
01:49:30.840 | that just makes you super happy, and I love that moment,
01:49:33.940 | but when you're working with people,
01:49:36.340 | and you're working with code and dusty deck code bases
01:49:38.740 | and things like this, right,
01:49:40.580 | it's not about what's theoretically beautiful,
01:49:42.940 | it's about what's practical, what's real,
01:49:44.460 | what people actually use, and I don't meet a lot of people
01:49:47.980 | that say, "I wanna rewrite all my code,"
01:49:50.700 | just for the sake of it.
01:49:52.620 | - By the way, there could be interesting possibilities,
01:49:54.300 | and we'll probably talk about it,
01:49:55.400 | where AI can help rewrite some code.
01:49:57.420 | That might be a farther out future,
01:50:00.100 | but it's a really interesting one,
01:50:01.420 | how that could create more, be a tool in the battle
01:50:06.420 | against this monster of complexity that you mentioned.
01:50:09.740 | - Yeah.
01:50:10.580 | - You mentioned Guido, the benevolent dictator
01:50:15.540 | for life of Python.
01:50:17.420 | What does he think about Mojo?
01:50:19.140 | Have you talked to him much about it?
01:50:20.980 | - I have talked with him about it.
01:50:22.340 | He found it very interesting.
01:50:24.060 | We actually talked with Guido before it launched,
01:50:26.160 | and so he was aware of it before it went public.
01:50:28.840 | I have a ton of respect for Guido
01:50:29.960 | for a bunch of different reasons.
01:50:31.400 | You talk about walrus operator,
01:50:33.180 | and Guido's pretty amazing in terms of steering
01:50:38.180 | such a huge and diverse community,
01:50:41.040 | and driving it forward, and I think Python
01:50:45.960 | is what it is thanks to him, right?
01:50:48.520 | And so to me, it was really important
01:50:50.760 | starting to work on Mojo to get his feedback
01:50:53.080 | and get his input and get his eyes on this, right?
01:50:56.280 | Now, a lot of what Guido was and is, I think,
01:51:00.520 | concerned about is how do we not fragment the community?
01:51:04.040 | We don't want a Python 2 to Python 3 thing.
01:51:06.360 | That was really painful for everybody involved,
01:51:09.200 | and so we spent quite a bit of time talking about that
01:51:11.200 | and some of the tricks I learned from Swift, for example.
01:51:13.600 | So in the migration from Swift,
01:51:15.240 | we managed to not just convert Objective-C
01:51:19.640 | into a slightly prettier Objective-C, which we did,
01:51:22.400 | we then converted, not entirely,
01:51:24.880 | but almost an entire community
01:51:27.000 | to a completely different language, right?
01:51:30.240 | And so there's a bunch of tricks
01:51:31.520 | that you learn along the way
01:51:32.880 | that are directly relevant to what we do,
01:51:34.880 | and so this is where, for example,
01:51:37.000 | you leverage CPython while bringing up the new thing.
01:51:41.440 | That approach is, I think, proven
01:51:43.320 | and comes from experience,
01:51:45.240 | and so Guido was very interested in, like, okay, cool.
01:51:48.680 | I think that Python is really his legacy.
01:51:50.560 | It's his baby, and I have tons of respect for that.
01:51:53.080 | Incidentally, I see Mojo as a member of the Python family.
01:51:55.920 | We're not trying to take Python away from Guido
01:51:57.680 | and from the Python community,
01:51:59.200 | and so to me, it's really important
01:52:03.280 | that we're a good member of that community,
01:52:05.400 | and so I think that, again, you would have to ask Guido this,
01:52:08.560 | but I think that he was very interested
01:52:10.080 | in this notion of, like, cool,
01:52:12.320 | Python gets beaten up for being slow.
01:52:14.280 | Maybe there's a path out of that, right?
01:52:18.960 | And that, you know, if the future is Python, right?
01:52:23.120 | I mean, look at the far outside case on this, right?
01:52:28.120 | And I'm not saying this is Guido's perspective,
01:52:30.520 | but, you know, there's this path of saying, like,
01:52:33.000 | okay, well, suddenly, Python can suddenly go
01:52:35.480 | all the places it's never been able to go before, right?
01:52:38.640 | And that means that Python can go even further
01:52:40.440 | and can have even more impact on the world.
01:52:42.440 | - So in some sense, Mojo could be seen as Python 4.0.
01:52:48.200 | - I would not say that.
01:52:49.320 | I think that would drive a lot of people really crazy.
01:52:51.400 | - Because of the PTSD of the 2.0-to-3.0 transition.
01:52:54.160 | - I'm willing to annoy people about Emacs versus Vim,
01:52:56.360 | or about Tabs versus Spaces. - Not that one.
01:52:58.560 | - I don't know, that might be a little bit far even for me.
01:53:00.400 | Like, my skin may not be that thick, but.
01:53:02.280 | - But the point is, the step to being a superset
01:53:05.000 | and allowing all of these capabilities, I think,
01:53:08.680 | is the evolution of a language.
01:53:10.280 | It feels like an evolution of a language.
01:53:12.760 | So he's interested by the ideas that you're playing with,
01:53:16.440 | but also concerned about the fragmentation.
01:53:18.440 | So how, what are the ideas you've learned?
01:53:20.920 | What are you thinking about?
01:53:21.840 | How do we avoid fragmenting the community?
01:53:24.400 | Where the Pythonistas and the,
01:53:28.200 | I don't know what to call the Mojo people.
01:53:32.560 | - Magicians. - Magicians, I like it.
01:53:34.680 | - There you go. - Can coexist happily
01:53:36.840 | and share code and basically just have these big code bases
01:53:41.280 | that are using CPython and more and more moving towards Mojo.
01:53:46.280 | - Well, so again, these are lessons I learned from Swift
01:53:48.240 | and here we face very similar problems, right?
01:53:51.240 | In Swift, you have Objective-C, super dynamic.
01:53:54.400 | They're very different syntax, right?
01:53:59.880 | But you're talking to people
01:54:01.080 | who have large scale code bases.
01:54:03.200 | I mean, Apple's got the biggest, largest scale code base
01:54:06.360 | of Objective-C code, right?
01:54:07.920 | And so, you know, none of the companies,
01:54:10.200 | none of the iOS developers,
01:54:11.240 | none of the other developers
01:54:12.280 | want to rewrite everything all at once
01:54:13.720 | and so you want to be able to adopt things piece at a time.
01:54:16.280 | And so a thing that I found that worked very well
01:54:18.400 | in the Swift community was saying,
01:54:20.200 | "Okay, cool," and this was when Swift was very young,
01:54:23.280 | and you'd say, "Okay, you have a million lines of code
01:54:26.200 | "in your Objective-C app, don't rewrite it all.
01:54:29.400 | "But when you implement a new feature,
01:54:31.160 | "go implement that new class using Swift."
01:54:35.240 | Right, and so now this turns out
01:54:36.720 | is a very wonderful thing for an app developer,
01:54:40.520 | but it's a huge challenge for the compiler team
01:54:43.440 | and the systems people that are implementing this, right?
01:54:45.600 | And this comes back to what is this trade-off
01:54:47.920 | between doing the hard thing that enables scale
01:54:51.480 | versus doing the theoretically pure and ideal thing, right?
01:54:54.040 | And so Swift had adopted and built
01:54:56.560 | a lot of different machinery to deeply integrate
01:54:58.840 | with the Objective-C runtime.
01:55:00.280 | And we're doing the same thing with Python, right?
01:55:02.600 | Now, what happened in the case of Swift
01:55:04.520 | is that Swift as a language got more and more
01:55:07.640 | and more mature over time, right?
01:55:09.840 | And incidentally, Mojo is a much simpler language
01:55:12.280 | than Swift in many ways, and so I think that Mojo
01:55:14.400 | will develop way faster than Swift for a variety of reasons.
01:55:17.840 | But as the language gets more mature,
01:55:19.400 | in parallel with that, you have new people
01:55:21.080 | starting new projects, right?
01:55:23.600 | And so when the language is mature
01:55:25.680 | and somebody's starting a new project,
01:55:26.760 | that's when they say, "Okay, cool,
01:55:27.720 | "I'm not dealing with a million lines of code.
01:55:29.840 | "I'll just start and use the new thing for my whole stack."
01:55:33.240 | Now, the problem is, again, you come back to
01:55:35.360 | where communities and people work together,
01:55:38.760 | you build a new subsystem or a new feature or a new thing
01:55:41.480 | in Swift or you build a new thing in Mojo,
01:55:44.280 | then you want it to end up being used on the other side.
01:55:47.960 | Right?
01:55:49.800 | And so then you need to work on integration
01:55:51.240 | back the other way.
01:55:52.880 | And so it's not just Mojo talking to Python,
01:55:55.360 | it's also Python talking to Mojo, right?
01:55:57.960 | And so what I would love to see,
01:55:59.720 | I don't wanna see this next month, right?
01:56:01.360 | But what I wanna see over the course of time
01:56:02.920 | is I would love to see people that are building
01:56:05.560 | these packages, like NumPy or TensorFlow,
01:56:10.560 | or these packages that are half Python, half C++.
01:56:15.160 | And if you say, "Okay, cool, I want to get out
01:56:17.920 | "of this Python C++ world into a unified world,
01:56:22.080 | "and so I can move to Mojo,
01:56:23.880 | "but I can't give up all my Python clients."
01:56:26.600 | 'Cause these libraries get used by everybody,
01:56:30.120 | and they're not all gonna switch all at once,
01:56:32.960 | and maybe never, right?
01:56:35.120 | Well, so the way we should do that is we should
01:56:36.720 | vend Python interfaces to the Mojo types.
01:56:40.080 | And that's what we did in Swift, and it worked great.
01:56:43.000 | I mean, it was a huge implementation challenge
01:56:44.760 | for the compiler people, right?
01:56:46.240 | But there's only a dozen of those compiler people,
01:56:49.400 | and there are millions of users.
01:56:50.920 | And so it's a very expensive, capital-intensive,
01:56:54.760 | like, skill-set-intensive problem,
01:56:57.320 | but once you solve that problem,
01:56:59.000 | it really helps adoption, it really helps
01:57:00.760 | the community progressively adopt technologies.
01:57:03.000 | And so I think that this approach will work quite well
01:57:05.280 | with the Python and the Mojo world.
01:57:07.120 | - So for a package, port it to Mojo,
01:57:09.840 | and then create a Python interface.
01:57:11.720 | - Yep.
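The "vend a Python interface" pattern he describes can be sketched in plain Python with `ctypes`: here the platform's C math library stands in for a compiled Mojo or C++ core (an assumption purely for illustration), and a Python-facing class wraps it so callers never touch the native layer.

```python
import ctypes
import ctypes.util

# Stand-in for a compiled native core (imagine a Mojo or C++ library):
# load the platform's C math library and describe one of its functions.
_path = ctypes.util.find_library("m")
_native = ctypes.CDLL(_path) if _path else ctypes.CDLL(None)
_native.sqrt.restype = ctypes.c_double
_native.sqrt.argtypes = [ctypes.c_double]

class FastMath:
    """Python-facing interface 'vended' over the native implementation."""
    @staticmethod
    def sqrt(x: float) -> float:
        return _native.sqrt(x)

print(FastMath.sqrt(2.0))
```

Python clients import `FastMath` like any other package and never see what language the implementation is written in, which is the property that lets a library migrate its internals without losing its Python users.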
01:57:12.560 | - So how do, just to linger on these packages,
01:57:17.080 | NumPy, PyTorch, and TensorFlow.
01:57:19.640 | - Yeah.
01:57:20.480 | - How do they play nicely together?
01:57:21.400 | So is Mojo supposed to be,
01:57:24.200 | let's talk about the machine learning ones.
01:57:26.960 | Is Mojo kind of visioned to replace PyTorch and TensorFlow,
01:57:31.760 | to incorporate it?
01:57:32.880 | What's the relationship in this?
01:57:35.000 | - All right, so take a step back.
01:57:37.280 | So I wear many hats.
01:57:39.480 | So you're angling in on the Mojo side.
01:57:43.120 | - Yes.
01:57:43.960 | - Mojo's a programming language,
01:57:44.800 | and so it can help solve the C, C++, Python feud
01:57:49.680 | that's happening.
01:57:50.520 | - The fire emoji got me, I'm sorry.
01:57:51.920 | We should be talking Modular, yes, yes.
01:57:54.200 | - Yes, okay, so the fire emoji is amazing, I love it.
01:57:56.960 | It's a big deal.
01:57:59.640 | The other side of this is the fire emoji is in service
01:58:02.240 | of solving some big AI problems.
01:58:04.480 | - Yes.
01:58:05.320 | - Right, and so the big AI problems are, again,
01:58:07.400 | this fragmentation, this hardware nightmare,
01:58:09.200 | this explosion of new potential,
01:58:12.920 | but it's not getting felt by the industry, right?
01:58:15.800 | And so when you look at how the Modular engine
01:58:18.520 | helps TensorFlow and PyTorch, right?
01:58:20.520 | It's not replacing them, right?
01:58:22.200 | In fact, when I talk to people, again,
01:58:24.560 | they don't like to rewrite all their code.
01:58:26.400 | You have people that are using a bunch of PyTorch,
01:58:28.560 | a bunch of TensorFlow.
01:58:30.160 | They have models that they've been building
01:58:31.560 | over the course of many years, right?
01:58:33.320 | And when I talk to them, there's a few exceptions,
01:58:36.040 | but generally they don't want to rewrite all their code.
01:58:39.320 | Right, and so what we're doing is we're saying,
01:58:40.680 | okay, well, you don't have to rewrite all your code.
01:58:43.040 | What happens is the Modular engine goes in there
01:58:45.160 | and goes underneath TensorFlow and PyTorch.
01:58:47.360 | It's fully compatible and it just provides better performance,
01:58:50.400 | better predictability, better tooling.
01:58:52.840 | It's a better experience that helps lift TensorFlow
01:58:55.160 | and PyTorch and make them even better.
01:58:56.960 | I love Python, I love TensorFlow, I love PyTorch, right?
01:59:00.240 | This is about making the world better
01:59:02.200 | because we need AI to go further.
01:59:04.440 | - But if I have a process that trains a model
01:59:07.160 | and I have a process that performs inference on that model
01:59:10.120 | and I have the model itself,
01:59:12.200 | what should I do with that in the long arc of history
01:59:15.640 | in terms of if I use PyTorch to train it,
01:59:19.960 | should I rewrite stuff in Mojo?
01:59:22.200 | Would that, if I care about performance?
01:59:24.600 | - Oh, so I mean, again, it depends.
01:59:26.560 | So if you care about performance,
01:59:28.040 | then writing it in Mojo is gonna be way better
01:59:29.840 | than writing it in Python.
01:59:30.800 | But if you look at LLM companies, for example,
01:59:34.800 | so you look at OpenAI, rumored,
01:59:36.480 | and you look at many of the other folks
01:59:37.800 | that are working on many of these LLMs
01:59:41.600 | and other innovative machine learning models,
01:59:44.360 | on the one hand, they're innovating in the data collection
01:59:46.680 | and the model, billions of parameters
01:59:48.600 | in the model architecture and the RLHF,
01:59:52.560 | and all the cool things that people are talking about.
01:59:56.360 | But on the other hand, they're spending a lot of time
01:59:58.360 | writing CUDA kernels.
02:00:00.000 | (laughs)
02:00:00.840 | Right?
02:00:01.680 | And so you say, "Wait a second,
02:00:03.560 | "how much faster could all this progress go
02:00:05.480 | "if they were not having to handwrite
02:00:06.560 | "all these CUDA kernels?"
02:00:08.240 | Right, and so there are a few technologies
02:00:09.840 | that are out there, and people have been working
02:00:11.720 | on this problem for a while,
02:00:12.720 | and they're trying to solve subsets of the problem,
02:00:15.600 | again, kind of fragmenting the space.
02:00:17.040 | And so what Mojo provides for these kinds of companies
02:00:19.680 | is the ability to say, "Cool, I can have a unifying theory."
02:00:23.280 | Right, and again, the better together,
02:00:25.560 | the unifying theory, the two-world problem
02:00:27.960 | or the three-world problem or the n-world problem,
02:00:29.640 | like, this is the thing that is slowing people down.
02:00:32.400 | And so as we help solve this problem,
02:00:34.040 | I think it'll be very helpful
02:00:35.080 | for making this whole cycle go faster.
02:00:38.160 | - So obviously we've talked about the transition
02:00:40.680 | from Objective-C to Swift,
02:00:42.120 | you've designed this programming language,
02:00:45.520 | and you've also talked quite a bit
02:00:47.280 | about the use of Swift for machine learning context.
02:00:51.160 | Why have you decided to move away
02:00:55.040 | from maybe an intense focus on Swift
02:00:59.520 | for the machine learning context
02:01:00.680 | versus sort of designing a new programming language
02:01:05.920 | that happens to be a superset of Python--
02:01:05.920 | - You're saying this is an irrational
02:01:07.520 | set of life choices I make, or what?
02:01:09.280 | (both laughing)
02:01:10.640 | - Did you go to the desert, and did you meditate on it?
02:01:13.320 | Okay, all right.
02:01:15.040 | No, it was bold and needed, and I think,
02:01:18.720 | I mean, it's just bold, and sometimes to take those leaps
02:01:21.240 | is a difficult leap to take.
02:01:22.960 | - Yeah, well, so, okay, I mean,
02:01:24.400 | I think there's a couple of different things.
02:01:25.560 | So actually, I left Apple back in 2017,
02:01:29.160 | like January 2017, so it's been a number of years
02:01:32.880 | that I left Apple, and the reason I left Apple was to do AI.
02:01:36.240 | Okay, so, and again, I won't comment on Apple and AI,
02:01:41.960 | but at the time, right, I wanted to get into
02:01:46.640 | and understand the technology,
02:01:48.440 | understand the applications, the workloads,
02:01:50.120 | and so I was like, okay, I'm gonna go dive deep
02:01:52.280 | into applied AI, and then the technology underneath it.
02:01:55.440 | Right?
02:01:56.280 | I found myself at Google.
02:01:59.360 | - And that was like when TPUs were waking up.
02:02:02.520 | - Exactly, and so I found myself at Google,
02:02:04.320 | and Jeff Dean, who's a rock star, as you know, right,
02:02:09.120 | and in 2017, TensorFlow's like really taking off
02:02:13.440 | and doing incredible things, and I was attracted to Google
02:02:16.560 | to help them with the TPUs, right,
02:02:18.320 | and TPUs are an innovative hardware accelerator platform,
02:02:21.840 | and have now, I mean, I think proven massive scale
02:02:24.400 | and like done incredible things, right,
02:02:26.560 | and so one of the things that this led into
02:02:29.640 | is a bunch of different projects, which I'll skip over,
02:02:32.000 | right, one of which was the Swift for TensorFlow project,
02:02:35.280 | right, and so that project was a research project,
02:02:38.560 | and so the idea of that is say, okay, well,
02:02:41.000 | let's look at innovative new programming models
02:02:43.480 | where we can get a fast programming language,
02:02:45.900 | we can get automatic differentiation into the language,
02:02:48.920 | let's push the boundaries of these things
02:02:50.880 | in a research setting, right,
02:02:53.200 | now that project I think lasted two, three years,
02:02:57.120 | there's some really cool outcomes of that,
02:02:58.580 | so one of the things that's really interesting
02:03:00.400 | is I published a talk at an LLVM conference in 2018,
02:03:05.400 | again, this seems like so long ago,
02:03:07.920 | about graph program abstraction,
02:03:10.000 | which is basically the thing that's in PyTorch 2,
02:03:13.200 | and so PyTorch 2 with all this TorchDynamo thing,
02:03:15.320 | it's all about this graph program abstraction thing
02:03:17.680 | from Python byte codes, and so a lot of the research
02:03:20.320 | that was done ended up pursuing and going out
02:03:24.280 | through the industry and influencing things,
02:03:26.040 | and I think it's super exciting and awesome to see that,
02:03:28.900 | but the Swift for TensorFlow project itself
02:03:30.400 | did not work out super well,
02:03:31.720 | and so there's a couple of different problems with that,
02:03:34.200 | one of which is that you may have noticed
02:03:36.680 | Swift is not Python,
02:03:37.720 | there's a few people that write Python code,
02:03:41.680 | and so it turns out that all of ML
02:03:44.720 | is pretty happy with Python.
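The "graph program abstraction" from Python bytecodes that he mentioned a moment ago can be glimpsed with the standard-library `dis` module: a function's bytecode is an inspectable sequence of operations from which a tracer can reconstruct a dataflow graph. This is a toy sketch of the idea, not the actual TorchDynamo machinery.

```python
import dis

def f(x, y):
    # A bytecode-level tracer can turn operations like these into a compute graph.
    return x * y + x

# Opcode names vary across Python versions (e.g. BINARY_MULTIPLY vs BINARY_OP).
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)
```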
02:03:46.520 | - It's actually a problem
02:03:47.360 | that other programming languages have as well,
02:03:50.240 | that they're not Python.
02:03:51.520 | We'll probably maybe briefly talk about Julia,
02:03:54.560 | who's a very interesting, beautiful programming language,
02:03:57.280 | but it's not Python.
02:03:58.480 | - Exactly, and so if you're saying,
02:04:01.440 | I'm gonna solve a machine learning problem
02:04:03.660 | where all the programmers are Python programmers,
02:04:06.800 | and you say, the first thing you have to do
02:04:08.160 | is switch to a different language,
02:04:10.520 | well, your new thing may be good or bad or whatever,
02:04:13.320 | but if it's a new thing, the adoption barrier is massive.
02:04:17.480 | - It's still possible.
02:04:18.480 | - Still possible, yeah, absolutely,
02:04:19.720 | the world changes and evolves,
02:04:21.160 | and there's definitely room for new and good ideas,
02:04:23.520 | but it just makes it so much harder,
02:04:26.000 | and so lesson learned, Swift is not Python,
02:04:29.340 | and people are not always in search
02:04:31.200 | of learning a new thing
02:04:32.200 | for the sake of learning a new thing,
02:04:33.360 | and if you wanna be compatible with all the world's code,
02:04:35.800 | turns out, meet the world where it is.
02:04:38.800 | Second thing is that, a lesson learned
02:04:43.000 | is that Swift, as a very fast and efficient language,
02:04:46.400 | kind of like Mojo, but a different take on it still,
02:04:50.280 | really worked well with Eager Mode,
02:04:54.760 | and so Eager Mode is something that PyTorch does,
02:04:57.400 | and it proved out really well,
02:04:58.640 | and it enables really expressive and dynamic
02:05:01.960 | and easy to debug programming.
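The eager-versus-graph distinction he's drawing can be sketched in a few lines of plain Python (a toy illustration, not any framework's real API): eager code executes each operation immediately, so intermediates are inspectable where they're computed, while graph-style code first builds a description that only runs later.

```python
# Eager style: every operation runs immediately, so you can debug mid-computation.
def eager_f(x):
    y = x * 2
    print("intermediate:", y)  # inspectable right here
    return y + 1

# Graph style: operations are recorded first and executed only on .run().
class Node:
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args
    def run(self):
        vals = [a.run() if isinstance(a, Node) else a for a in self.args]
        return self.fn(*vals)

# Describes (3 * 2) + 1; nothing executes until run() is called.
graph = Node(lambda a, b: a + b, Node(lambda a, b: a * b, 3, 2), 1)
print(eager_f(3), graph.run())
```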
02:05:04.260 | TensorFlow at the time was not set up for that,
02:05:08.440 | let's say, that was not--
02:05:09.560 | - The timing is also important in this world.
02:05:11.680 | - Yeah, yeah, and TensorFlow's a good thing,
02:05:13.440 | and it has many, many strengths,
02:05:16.080 | but you could say Swift for TensorFlow is a good idea,
02:05:20.560 | except for the Swift, and except for the TensorFlow part.
02:05:23.280 | (laughing)
02:05:24.640 | - Swift because it's not Python,
02:05:26.200 | and TensorFlow because it's not--
02:05:27.520 | - Wasn't set up for Eager Mode at the time, yeah.
02:05:29.520 | - As 1.0? - Exactly.
02:05:31.120 | And so one of the things about that
02:05:34.320 | is that in the context of it being a research project,
02:05:36.400 | I'm very happy with the fact that
02:05:38.480 | we built a lot of really cool technology,
02:05:40.480 | we learned a lot of things,
02:05:41.760 | I think the ideas went on to have influence
02:05:43.560 | in other systems, like PyTorch,
02:05:44.680 | a few people use that, I hear, right?
02:05:46.600 | And so I think that's super cool.
02:05:48.560 | And for me personally, I learned so much from it, right?
02:05:51.560 | And I think a lot of the engineers that worked on it
02:05:53.320 | also learned a tremendous amount.
02:05:55.080 | And so I think that that's just really exciting to see,
02:05:59.080 | and I'm sorry that the project didn't work out,
02:06:01.560 | I wish it did, of course, right?
02:06:03.120 | But it's a research project,
02:06:07.720 | and so you're there to learn from it.
02:06:09.440 | - Well, it's interesting to think about
02:06:12.560 | the evolution of programming
02:06:15.520 | as we come up with these whole new set of algorithms
02:06:19.560 | in machine learning, in artificial intelligence,
02:06:22.120 | and what's going to win out.
02:06:23.920 | Because it could be a new programming language.
02:06:25.920 | - Yeah. - It could be,
02:06:27.020 | I just mentioned Julia,
02:06:30.760 | I think there's a lot of ideas behind Julia
02:06:33.320 | that Mojo shares.
02:06:37.120 | What are your thoughts about Julia in general?
02:06:40.760 | - So I will have to say that when we launched Mojo,
02:06:43.720 | one of the biggest things I didn't predict
02:06:46.520 | was the response from the Julia community.
02:06:48.800 | And so I was not, I mean, okay, let me take a step back.
02:06:53.460 | I've known the Julia folks for a really long time.
02:06:56.120 | They were an adopter of LLVM a long time ago.
02:06:59.340 | They've been pushing state-of-the-art
02:07:00.720 | in a bunch of different ways.
02:07:01.800 | Julia's a really cool system.
02:07:03.840 | I had always thought of Julia
02:07:05.800 | as being mostly a scientific computing focused environment.
02:07:10.440 | And I thought that was its focus.
02:07:12.600 | I neglected to understand that one of their missions
02:07:16.120 | is to help make Python work end-to-end.
02:07:19.400 | And so I think that was my error for not understanding that.
02:07:23.120 | And so I could have been maybe more sensitive to that.
02:07:25.720 | But there's major differences
02:07:27.680 | between what Mojo's doing and what Julia's doing.
02:07:30.040 | So as you say, Julia is not Python.
02:07:32.240 | And so one of the things that a lot of the Julia people
02:07:36.240 | came out and said is like, okay, well,
02:07:38.480 | if we put a ton of more energy and a ton more money
02:07:41.680 | or engineering or whatever into Julia,
02:07:44.040 | maybe that would be better than starting Mojo, right?
02:07:47.400 | Well, I mean, maybe that's true,
02:07:49.400 | but it still wouldn't make Julia into Python.
02:07:52.480 | So if you've worked backwards from the goal of
02:07:54.560 | let's build something for Python programmers
02:07:57.400 | without requiring them to relearn syntax,
02:08:01.440 | then Julia just isn't there, right?
02:08:04.440 | I mean, that's a different thing, right?
02:08:05.640 | And so if you anchor on, I love Julia
02:08:09.160 | and I want Julia to go further,
02:08:10.440 | then you can look at it from a different lens.
02:08:12.360 | But the lens we were coming at was,
02:08:14.280 | hey, everybody is using Python.
02:08:16.240 | Python isn't, syntax isn't broken.
02:08:18.800 | Let's take what's great about Python and make it even better.
02:08:21.240 | And so it was just a different starting point.
02:08:23.240 | So I think Julia's a great language.
02:08:24.720 | The community's a lovely community.
02:08:26.240 | They're doing really cool stuff,
02:08:27.560 | but it's just a slightly different angle.
02:08:30.400 | - But it does seem that Python is quite sticky.
02:08:33.480 | Is there some philosophical almost thing you could say
02:08:37.880 | about why Python by many measures
02:08:40.120 | seems to be the most popular programming language
02:08:42.160 | in the world?
02:08:43.160 | - Well, I can tell you things I love about it.
02:08:44.840 | Maybe that's one way to answer the question, right?
02:08:46.840 | So huge package ecosystem,
02:08:49.640 | super lightweight and easy to integrate.
02:08:51.880 | It has very low startup time.
02:08:53.760 | - So what startup time?
02:08:55.720 | You mean like learning curve or what?
02:08:57.240 | - Yeah, so if you look at certain other languages,
02:08:59.320 | you take something like Go,
02:09:01.680 | or Java, for example, and
02:09:03.880 | it takes a long time to JIT compile all the things.
02:09:05.840 | And then the VM starts up
02:09:08.080 | and the garbage collectors kicks in
02:09:09.360 | and then it revs its engines
02:09:10.520 | and then it can plow through a lot of internet stuff
02:09:12.400 | or whatever, right?
02:09:14.000 | Python is like scripting.
02:09:15.640 | Like it just goes, right?
02:09:17.640 | Python has very low compile time.
02:09:19.560 | Like so you're not sitting there waiting.
02:09:21.120 | Python integrates into notebooks in a very elegant way
02:09:23.720 | that makes exploration super interactive
02:09:26.240 | and it's awesome, right?
02:09:27.560 | Python is also,
02:09:29.240 | it's like almost the glue of computing
02:09:32.200 | because it has such a simple object representation,
02:09:35.440 | a lot of things plug into it.
02:09:37.320 | That dynamic metaprogramming thing we were talking about
02:09:39.360 | also enables really expressive and beautiful APIs, right?
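The dynamic metaprogramming he credits here rests on hooks like `__getattr__`; this tiny, purely hypothetical `Query` class shows how unknown attribute accesses can build a fluent API instead of raising errors, the kind of trick libraries use for expressive interfaces.

```python
class Query:
    """A fluent chain built from Python's dynamic attribute hook."""
    def __init__(self, parts=()):
        self.parts = parts

    def __getattr__(self, name):
        # Unknown attributes extend the chain rather than raising AttributeError.
        return Query(self.parts + (name,))

    def __repr__(self):
        return ".".join(self.parts)

q = Query().users.filter.active
print(q)  # users.filter.active
```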
02:09:42.680 | So there's lots of reasons that you can look at
02:09:45.760 | technical things that Python has done
02:09:47.520 | and say like, okay, wow,
02:09:48.520 | this is actually a pretty amazing thing.
02:09:50.280 | And any one of those you can neglect.
02:09:52.840 | People all just talk about indentation
02:09:55.560 | and ignore like the fundamental things.
02:09:57.760 | But then you also look at the community side, right?
02:10:00.320 | So Python owns machine learning.
02:10:02.160 | Machine learning is pretty big.
02:10:04.080 | - Yeah, and it's growing.
02:10:05.040 | - And it's growing, right?
02:10:05.880 | And it's growing in importance, right?
02:10:07.160 | And so--
02:10:08.000 | - And there's a reputation and prestige to machine learning
02:10:10.840 | to where like if you're a new programmer,
02:10:12.840 | you're thinking about like,
02:10:14.600 | which programming language do I use?
02:10:16.400 | Well, I should probably care about machine learning.
02:10:18.240 | Therefore, let me try Python
02:10:20.120 | and it kind of builds and builds and builds.
02:10:21.480 | - And you even go back before that.
02:10:24.240 | Like my kids learn Python.
02:10:27.440 | Right, not because I'm telling them to learn Python,
02:10:29.040 | but because--
02:10:29.880 | - Were they rebelling against you or what?
02:10:31.800 | - Oh, no, no, no.
02:10:32.640 | Well, they also learn Scratch and things like this too.
02:10:34.760 | But it's because Python is taught everywhere, right?
02:10:37.560 | Because it's easy to learn, right?
02:10:38.960 | And because it's pervasive, right?
02:10:40.480 | And there's--
02:10:41.320 | - Back in my day, we learned Java and C++.
02:10:44.120 | - Yeah, well.
02:10:45.720 | - Uphill both directions.
02:10:47.480 | But yes, I guess Python is the main language
02:10:49.840 | of teaching software engineering in schools now.
02:10:51.920 | - Yeah, well, and if you look at this,
02:10:53.680 | there's these growth cycles, right?
02:10:56.320 | If you look at what causes things to become popular
02:10:59.600 | and then gain in popularity,
02:11:00.920 | there's reinforcing feedback loops and things like this.
02:11:03.480 | And I think Python has done,
02:11:05.000 | again, the whole community has done a really good job
02:11:06.840 | of building those growth loops
02:11:08.160 | and help propel the ecosystem.
02:11:10.080 | And I think that, again, you look at
02:11:11.680 | what you can get done with just a few lines of code.
02:11:13.360 | It's amazing.
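As a concrete instance of how much a few lines of Python can do, counting the most frequent word in a text takes only the standard library:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"
top = Counter(text.split()).most_common(1)
print(top)  # [('the', 3)]
```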
02:11:14.640 | - So this kind of self-building loop
02:11:19.560 | is interesting to understand
02:11:20.800 | because when you look at Mojo,
02:11:22.960 | what it stands for, some of the features,
02:11:25.880 | it seems sort of clear that this is a good direction
02:11:29.640 | for programming languages to evolve
02:11:31.840 | in the machine learning community.
02:11:33.600 | But it's still not obvious that it will
02:11:35.880 | because of this, whatever the engine of popularity,
02:11:39.320 | of virality, is there something you could speak to?
02:11:42.440 | Like how do you get people to switch?
02:11:45.560 | - Yeah, well, I mean, I think that the viral growth loop
02:11:48.880 | is to switch people to Unicode.
02:11:50.800 | - Yes.
02:11:51.640 | - I think the Unicode file extensions are what I'm betting on.
02:11:53.280 | I think that's gonna be the thing.
02:11:54.760 | - Yeah.
02:11:55.840 | - Tell the kids that you could use the fire emoji
02:11:58.000 | and they'd be like, what?
02:11:59.120 | - Exactly.
02:11:59.960 | (laughing)
02:12:01.360 | Well, in all seriousness, I mean,
02:12:03.600 | I think there's really, I'll give you two opposite answers.
02:12:07.480 | One is, I hope if it's useful, if it solves problems,
02:12:10.960 | and if people care about those problems being solved,
02:12:14.080 | they'll adopt the tech.
02:12:15.800 | Right, that's kind of the simple answer.
02:12:17.880 | And when you're looking to get tech adopted,
02:12:19.880 | the question is, is it solving an important problem
02:12:22.680 | people need solved?
02:12:24.000 | And is the adoption cost low enough
02:12:27.240 | that they're willing to make the switch and cut over
02:12:30.480 | and do the pain up front so that they can actually do it?
02:12:33.200 | Right?
02:12:34.520 | And so hopefully Mojo will be that for a bunch of people.
02:12:37.400 | And people building these hybrid packages are suffering.
02:12:41.240 | It's really painful.
02:12:42.240 | And so I think that we have a good shot of helping people.
02:12:45.200 | But the other side is like,
02:12:46.360 | it's okay if people don't use Mojo.
02:12:48.480 | Like it's not my job to say like, everybody should do this.
02:12:51.000 | Like, I'm not saying Python is bad.
02:12:52.360 | Like I hope Python, CPython, like all these implementations,
02:12:55.280 | 'cause Python ecosystem is not just CPython.
02:12:57.320 | It's also a bunch of different implementations
02:12:59.520 | with different trade-offs.
02:13:00.360 | And this ecosystem is really powerful and exciting,
02:13:04.040 | as are other programming languages.
02:13:05.880 | It's not like TypeScript or something is gonna go away.
02:13:08.920 | Right?
02:13:09.760 | And so there's not a winner take all thing.
02:13:11.800 | And so I hope that Mojo is exciting and useful to people.
02:13:14.320 | But if it's not, that's also fine.
02:13:16.120 | - But I also wonder what the use case
02:13:20.760 | for why you should try Mojo would be.
02:13:23.520 | So practically speaking, it seems like,
02:13:27.240 | so there's entertainment.
02:13:29.840 | There's the dopamine hit of saying, holy shit,
02:13:32.280 | this is 10 times faster.
02:13:34.060 | This little piece of code is 10 times faster in Mojo.
02:13:37.880 | - Out of the box, before you get to 35,000x.
02:13:40.520 | - Exactly.
02:13:41.360 | I mean, just even that, I mean, that's the dopamine hit
02:13:44.120 | that every programmer sort of dreams of is the optimization.
02:13:49.920 | It's also the drug that can pull you in
02:13:53.320 | and have you waste way too much of your life
02:13:56.800 | optimizing and over-optimizing, right?
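[Editor's sketch] The 10x-out-of-the-box hit Mojo promises comes from compilation rather than from changing your algorithm, but the dopamine loop being described is the same one any optimization gives. A rough, Mojo-free illustration in plain Python (timings will vary by machine; the speedup here is algorithmic, via memoization, not a compiler win):

```python
import time
from functools import lru_cache

def fib_naive(n):
    # Exponential-time recursion: recomputes the same subproblems over and over.
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

@lru_cache(maxsize=None)
def fib_fast(n):
    # Same code plus one decorator: memoization makes it effectively linear time.
    if n < 2:
        return n
    return fib_fast(n - 1) + fib_fast(n - 2)

t0 = time.perf_counter()
slow_result = fib_naive(28)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
fast_result = fib_fast(28)
t_fast = time.perf_counter() - t0

assert slow_result == fast_result == 317811
print(f"naive: {t_naive:.4f}s, memoized: {t_fast:.6f}s")
```

Same loop either way: measure, change one thing, watch the number drop.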
02:13:58.700 | But so what do you see that would be like commonly,
02:14:03.280 | this is very hard to predict, of course,
02:14:04.720 | but if you look 10 years from now,
02:14:08.120 | Mojo's super successful,
02:14:10.760 | what do you think would be the thing where people like try
02:14:14.640 | and then use it regularly and it kind of grows
02:14:17.600 | and grows and grows and grows?
02:14:18.440 | - Well, so you talk about dopamine hit.
02:14:19.960 | And so again, humans are not one thing.
02:14:24.360 | And some people love rewriting their code
02:14:27.280 | and learning new things and throwing themselves in deep end
02:14:29.040 | and trying out new things.
02:14:30.560 | In my experience, most people don't.
02:14:33.680 | Like they're too busy.
02:14:34.560 | They have other things going on.
02:14:36.160 | By number, most people don't like this,
02:14:39.400 | "I want to rewrite all my code."
02:14:40.900 | But even those people, the too busy people,
02:14:45.840 | the people that don't actually care about the language,
02:14:48.840 | that just care about getting stuff done,
02:14:50.680 | those people do like learning new things, right?
02:14:54.080 | And so you talk about the dopamine rush of 10x faster.
02:14:56.460 | Wow, that's cool.
02:14:57.300 | I want to do that again.
02:14:58.360 | Well, it's also like,
02:14:59.200 | here's the thing I've heard about in a different domain
02:15:01.760 | and I don't have to rewrite all my code.
02:15:02.960 | I can learn a new trick, right?
02:15:05.160 | Well, that's called growth, you know?
02:15:07.360 | And so one thing that I think is cool about Mojo,
02:15:10.920 | and again, this will take a little bit of time for,
02:15:13.560 | for example, the blog posts and the books
02:15:15.800 | and like all that kind of stuff develop
02:15:17.160 | and the language needs to get further along.
02:15:19.120 | But what we're doing, you talk about types,
02:15:21.280 | like you can say, look,
02:15:22.360 | you can start with the world you already know
02:15:24.680 | and you can progressively learn new things
02:15:26.640 | and adopt them where it makes sense.
02:15:28.580 | If you never do that, that's cool.
02:15:31.160 | You're not a bad person.
02:15:32.360 | If you get really excited about it
02:15:34.600 | and want to go all the way in the deep end
02:15:36.040 | and want to rewrite everything and like whatever,
02:15:38.040 | that's cool, right?
02:15:39.240 | But I think the middle path is actually the more likely one
02:15:41.760 | where it's, you know, you come out with a new idea
02:15:46.400 | and you discover, wow, that makes my code way simpler,
02:15:48.760 | way more beautiful, way faster, way whatever.
02:15:51.080 | And I think that's what people like.
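[Editor's sketch] Python's optional type hints already follow this "adopt it where it makes sense" path, and they make a hedged analogy for what Mojo's progressive typing feels like (with the caveat that in Mojo, declared types also unlock compiler optimization, which plain CPython hints do not):

```python
# Stage 1: the untyped code you already know; it just works.
def mean(xs):
    return sum(xs) / len(xs)

# Stage 2: opt in to annotations where they add clarity.
# In CPython these are metadata for tools; runtime behavior is unchanged.
def mean_typed(xs: list[float]) -> float:
    return sum(xs) / len(xs)

assert mean([1.0, 2.0, 3.0]) == mean_typed([1.0, 2.0, 3.0]) == 2.0
```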
02:15:53.020 | Now, if you fast forward and you said like 10 years up,
02:15:56.800 | right, I can give you a very different answer on that,
02:15:59.680 | which is, I mean, if you go back
02:16:02.040 | and look at what computers looked like 20 years ago,
02:16:05.400 | every 18 months they got faster for free, right?
02:16:09.120 | 2x faster every 18 months.
02:16:10.640 | It was like clockwork.
02:16:11.560 | It was free, right?
02:16:13.280 | You go back 10 years ago and we entered in this world
02:16:15.760 | where suddenly we had multi-core CPUs and we had GPUs.
02:16:19.840 | And if you squint and turn your head,
02:16:22.080 | what a GPU is, is it's just a many core,
02:16:24.520 | very simple CPU thing kind of, right?
02:16:27.000 | And so, and 10 years ago, it was CPUs and GPUs and graphics.
02:16:31.640 | Today, we have CPUs, GPUs, graphics and AI
02:16:39.080 | because it's so important
02:16:40.560 | and because the compute is so demanding
02:16:42.360 | because of the smart cameras and the watches
02:16:44.520 | and all the different places the AI needs
02:16:47.000 | to work in our lives,
02:16:48.480 | it's caused this explosion of hardware.
02:16:50.640 | And so part of my thesis,
02:16:52.280 | part of my belief of where computing goes,
02:16:54.280 | if you look out 10 years from now,
02:16:56.320 | is it's not gonna get simpler.
02:16:58.520 | Physics isn't going back to where we came from.
02:17:00.800 | It's only gonna get weirder from here on out, right?
02:17:03.460 | And so to me, the exciting part about what we're building
02:17:06.920 | is it's about building that universal platform,
02:17:10.200 | which the world can continue to get weird,
02:17:12.920 | 'cause again, I don't think it's avoidable, it's physics,
02:17:15.400 | but we can help lift people's scale, do things with it,
02:17:18.360 | and they don't have to rewrite their code
02:17:19.320 | every time a new device comes out.
02:17:21.160 | And I think that's pretty cool.
02:17:22.520 | And so if Mojo can help with that problem,
02:17:24.520 | then I think that it will be hopefully quite interesting
02:17:27.160 | and quite useful to a wide range of people
02:17:29.200 | because there's so much potential
02:17:31.040 | and like there's so, you know,
02:17:32.180 | maybe analog computers will become a thing or something,
02:17:34.960 | right?
02:17:35.800 | And we need to be able to get into a mode
02:17:37.080 | where we can move this programming model forward,
02:17:40.040 | but do so in a way where we're lifting people
02:17:41.920 | and growing them instead of forcing them
02:17:45.080 | to rewrite all their code and exploding them.
02:17:46.760 | - Do you think there'll be a few major libraries
02:17:49.640 | that go Mojo first?
02:17:51.080 | - Well, so I mean, the Modular engine's all Mojo.
02:17:56.560 | So again, come back to like, we're not building Mojo
02:17:59.240 | because it's fun, we're building Mojo
02:18:00.400 | because we had to to solve these accelerators.
02:18:02.800 | - That's the origin story.
02:18:03.720 | But I mean, ones that are currently in Python.
02:18:05.800 | - Yeah, so I think that a number of these projects will.
02:18:07.680 | And so one of the things, again,
02:18:09.100 | this is just my best guess,
02:18:10.240 | like each of the package maintainers also has,
02:18:12.860 | I'm sure plenty of other things going on.
02:18:14.480 | People don't like, really don't like rewriting code
02:18:16.560 | just for the sake of rewriting code.
02:18:19.080 | But sometimes like people are excited
02:18:21.800 | about like adopting a new idea.
02:18:23.960 | - Yeah.
02:18:24.800 | - It turns out that while rewriting code
02:18:26.480 | is generally not people's first thing,
02:18:29.520 | turns out that redesigning something while you rewrite it
02:18:32.920 | and using a rewrite as an excuse to redesign
02:18:35.680 | can lead to the 2.0 of your thing
02:18:38.440 | that's way better than the 1.0, right?
02:18:40.740 | And so I have no idea, I can't predict that.
02:18:43.500 | But there's a lot of these places where,
02:18:45.820 | again, if you have a package that is half C and half Python,
02:18:49.340 | right, you just solve the pain,
02:18:51.380 | make it easier to move things faster,
02:18:52.820 | make it easier to debug and evolve your tech.
02:18:55.880 | Adopting Mojo kind of makes sense to start with.
02:18:57.900 | And then it gives you this opportunity
02:18:59.020 | to rethink these things.
02:19:00.220 | - So the two big gains are that there's a performance gain,
02:19:04.820 | and then there's the portability
02:19:08.600 | to all kinds of different devices.
02:19:10.200 | - And there's safety, right?
02:19:11.480 | So you talk about real types.
02:19:13.140 | I mean, not saying this is for everybody,
02:19:16.340 | but that's actually a pretty big thing, right?
02:19:18.040 | - Yeah, types are--
02:19:19.080 | - And so there's a bunch of different aspects
02:19:23.200 | of the value Mojo provides.
02:19:23.200 | And so I mean, it's funny for me,
02:19:24.300 | like I've been working on these kinds of technologies
02:19:27.040 | and tools for too many years now.
02:19:30.480 | But you look at Swift, right?
02:19:32.120 | And we talked about Swift for TensorFlow,
02:19:33.400 | but Swift as a programming language, right?
02:19:35.560 | Swift's now 13 years old from when I started it.
02:19:41.480 | So, 'cause I started in 2010, if I remember.
02:19:44.400 | And so that project,
02:19:47.120 | and I was involved with it for 12 years or something, right?
02:19:50.200 | That project has gone through its own
02:19:52.120 | really interesting story arc, right?
02:19:53.740 | And it's a mature, successful,
02:19:55.480 | used by millions of people system, right?
02:19:57.960 | Certainly not dead yet, right?
02:19:59.360 | But also going through that story arc,
02:20:02.000 | I learned a tremendous amount about building languages,
02:20:04.240 | about building compilers, about working with community,
02:20:06.920 | and things like this.
02:20:07.760 | And so that experience, like I'm helping channel
02:20:10.320 | and bring directly into Mojo.
02:20:12.160 | And other systems, same thing.
02:20:14.040 | Like apparently I like building,
02:20:15.840 | building and iterating and evolving things.
02:20:17.400 | And so you look at this LLVM thing
02:20:18.960 | that I worked on 20 years ago,
02:20:20.760 | you look at MLIR, right?
02:20:22.620 | And so a lot of the lessons learned in LLVM
02:20:25.040 | got fed into MLIR.
02:20:26.440 | And I think that MLIR is a way better system than LLVM was.
02:20:29.540 | And Swift is a really good system,
02:20:31.760 | and it's amazing.
02:20:33.080 | But I hope that Mojo will take the next step forward
02:20:37.360 | in terms of design.
02:20:39.280 | - In terms of running Mojo,
02:20:42.480 | people can play with it.
02:20:43.480 | What's Mojo Playground?
02:20:45.540 | - Yeah.
02:20:46.380 | - From the interface perspective,
02:20:49.260 | and from the hardware perspective,
02:20:51.300 | what's this incredible thing running on?
02:20:54.560 | - Yeah, so right now,
02:20:55.560 | so here we are two weeks after launch.
02:20:58.000 | We decided that, okay,
02:20:59.720 | we have this incredible set of technology
02:21:01.360 | that we think might be good,
02:21:03.440 | but we have not given it to lots of people yet.
02:21:06.640 | And so we were very conservative and said,
02:21:08.480 | let's put it in a notebook
02:21:09.920 | so that if it crashes, we can do something about it.
02:21:11.960 | We can monitor and track that, right?
02:21:13.560 | And so, again, things are still super early,
02:21:16.560 | but we're having like one person a minute sign up
02:21:21.480 | with over 70,000 people two weeks in.
02:21:25.040 | It's kind of crazy.
02:21:26.080 | - So you can sign up to Mojo Playground
02:21:28.200 | and you can use it in the cloud.
02:21:30.760 | - Yeah.
02:21:31.600 | - In your browser.
02:21:32.440 | - And so what that's running on,
02:21:33.840 | yeah, what that's running on is that's running on cloud VMs.
02:21:37.320 | And so you share a machine with a bunch of other people,
02:21:40.560 | but it turns out there's a bunch of them now
02:21:42.360 | because there's a lot of people.
02:21:43.880 | And so what you're doing is you're getting free compute
02:21:46.040 | and you're getting to play with this thing
02:21:47.200 | in kind of a limited controlled way
02:21:49.860 | so that we can make sure
02:21:50.700 | that it doesn't totally crash and be embarrassing, right?
02:21:55.400 | So now a lot of the feedback we've gotten
02:21:57.360 | is people want to download it locally.
02:21:58.800 | So we're working on that right now.
02:21:59.980 | And so that's--
02:22:00.820 | - So that's the goal, to be able to download locally.
02:22:03.040 | - Yeah, that's what everybody expects.
02:22:04.640 | And so we're working on that right now.
02:22:05.900 | And so we just want to make sure that we do it right.
02:22:07.720 | And I think this is one of the lessons I learned
02:22:10.380 | from Swift also, by the way,
02:22:12.680 | is that when we launched Swift,
02:22:14.880 | gosh, it feels like forever ago, it was 2014.
02:22:17.400 | And we, I mean, it was super exciting.
02:22:20.960 | I and we, the team had worked on Swift
02:22:22.960 | for a number of years in secrecy, okay?
02:22:25.580 | And we, four years into this development,
02:22:29.380 | roughly, of working on this thing,
02:22:31.980 | at that point, about 250 people at Apple knew about it.
02:22:35.500 | Okay, so it was secret.
02:22:36.580 | Apple's good at secrecy, and it was a secret project.
02:22:39.020 | And so we launched this at WWDC,
02:22:41.580 | a bunch of hoopla and excitement,
02:22:42.940 | and said, "Developers, you're gonna be able to develop
02:22:45.540 | "and submit apps to the App Store in three months."
02:22:48.540 | - Yeah.
02:22:49.420 | - Well, several interesting things happened, right?
02:22:51.740 | So first of all, we learned that, A, it had a lot of bugs.
02:22:55.600 | (laughs)
02:22:56.440 | It was not actually production quality.
02:22:58.280 | And it was extremely stressful in terms of like,
02:23:01.100 | trying to get it working for a bunch of people.
02:23:03.720 | And so what happened was we went from zero to,
02:23:06.440 | you know, I don't know how many developers
02:23:07.920 | Apple had at the time, but a lot of developers overnight,
02:23:11.100 | and they ran into a lot of bugs,
02:23:12.880 | and it was really embarrassing,
02:23:13.940 | and it was very stressful for everybody involved, right?
02:23:16.680 | It was also very exciting,
02:23:17.720 | 'cause everybody was excited about that.
02:23:19.920 | The other thing I learned is that when that happened,
02:23:22.180 | roughly every software engineer
02:23:23.520 | who did not know about the project at Apple,
02:23:25.780 | their head exploded when it was launched,
02:23:27.760 | 'cause they didn't know it was coming.
02:23:29.300 | And so they're like, "Wait, what is this?
02:23:31.080 | "I signed up to work for Apple
02:23:32.340 | "because I love Objective-C.
02:23:33.400 | "Why is there a new thing?"
02:23:34.520 | Right? - Yeah.
02:23:35.360 | - And so, now what that meant, practically,
02:23:38.960 | is that the push from launch to, first of all, the fall,
02:23:43.280 | but then to 2.0 and 3.0, and like, all the way forward,
02:23:46.900 | was super painful for the engineering team and myself.
02:23:51.600 | It was very stressful.
02:23:53.080 | The developer community was very grumpy about it,
02:23:55.080 | because they're like, "Okay, well, wait a second.
02:23:56.420 | "You're changing and breaking my code,
02:23:57.880 | "and we have to fix the bugs."
02:23:59.880 | And it was just a lot of tension and friction on all sides.
02:24:03.840 | There's a lot of technical debt in the compiler,
02:24:07.640 | because we have to run really fast,
02:24:09.360 | and you have to go implement the thing
02:24:10.320 | and unblock the use case and do the thing,
02:24:11.800 | and you know it's not right,
02:24:13.120 | but you never have time to go back and do it right.
02:24:15.000 | And I'm very proud of the Swift team,
02:24:17.480 | because they've come, I mean, we, but they came so far,
02:24:22.600 | and made so much progress over this time since launch.
02:24:26.360 | It's pretty incredible,
02:24:27.200 | and Swift is a very, very good thing.
02:24:29.480 | But I just don't want to do that again, right?
02:24:31.000 | And so--
02:24:31.840 | - So, iterate more through the development process.
02:24:35.520 | - And so what we're doing is we're not launching it
02:24:37.400 | when it's, hopefully, 0.9 with no testers.
02:24:40.600 | We're launching it and saying it's 0.1, right?
02:24:43.200 | And so we're setting expectations of saying like,
02:24:45.040 | "Okay, well, don't use this for production."
02:24:47.920 | Right, if you're interested in what we're doing,
02:24:49.720 | we'll do it in an open way, and we can do it together,
02:24:53.320 | but don't use it in production yet.
02:24:54.960 | Like, we'll get there, but let's do it the right way.
02:24:57.760 | And I'm also saying, we're not in a race.
02:25:01.120 | The thing that I want to do is build the world's best thing.
02:25:03.960 | - Yeah.
02:25:04.800 | - Right, because if you do it right,
02:25:06.760 | and it lifts the industry,
02:25:08.280 | it doesn't matter if it takes an extra two months.
02:25:10.040 | - Yeah.
02:25:10.880 | - Like, two months is worth waiting.
02:25:11.720 | And so doing it right,
02:25:13.760 | and not being overwhelmed with technical debt
02:25:16.160 | and things like this is like, again, war wounds.
02:25:20.000 | Lessons learned, whatever you want to say,
02:25:22.240 | I think is absolutely the right thing to do,
02:25:23.920 | even though right now people are very frustrated
02:25:26.000 | that you can't download it,
02:25:27.200 | or it doesn't have feature X or something like this.
02:25:30.240 | - What have you learned in the little bit of time
02:25:34.160 | since it's been released into the wild
02:25:38.280 | that people have been complaining about feature X or Y or Z?
02:25:41.840 | What have they been complaining about?
02:25:43.200 | What they have been excited about?
02:25:46.360 | Like, almost like detailed things
02:25:48.440 | versus like big vision. - Yeah, yeah.
02:25:49.720 | - I think everyone would be very excited
02:25:51.840 | about the big vision.
02:25:53.040 | - Yeah, yeah, well, so I mean, I've been very pleased.
02:25:54.760 | I mean, in fact, I mean, we've been massively overwhelmed
02:25:57.400 | with response, which is a good problem to have.
02:26:00.880 | It's kind of like a success disaster,
02:26:02.760 | in a sense, right?
02:26:03.840 | And so, I mean, if you go back in time,
02:26:08.080 | when we started Modular,
02:26:09.200 | which was just not yet a year and a half ago,
02:26:12.120 | so it's still a pretty new company, new team,
02:26:15.040 | small but very good team of people.
02:26:17.680 | Like we started with extreme conviction
02:26:20.240 | that there's a set of problems that we need to solve.
02:26:22.320 | And if we solve it,
02:26:23.160 | then people will be interested in what we're doing, right?
02:26:26.240 | But again, you're building in basically secret, right?
02:26:29.480 | You're trying to figure it out.
02:26:31.280 | It's, creation's a messy process.
02:26:33.160 | You're having to go through different paths
02:26:34.760 | and understand what you wanna do and how to explain it.
02:26:37.360 | Often when you're doing disruptive and new kinds of things,
02:26:40.960 | just knowing how to explain it is super difficult, right?
02:26:44.480 | And so when we launched, we hoped people would be excited,
02:26:48.320 | but I'm an optimist,
02:26:50.480 | but I'm also like, don't wanna get ahead of myself.
02:26:53.160 | And so when people found out about Mojo,
02:26:55.600 | I think their heads exploded a little bit, right?
02:26:58.640 | And here's, I think, a pretty credible team
02:27:01.560 | that has built some languages and some tools before.
02:27:03.520 | And so they have some lessons learned
02:27:06.120 | and are tackling some of the deep problems
02:27:08.320 | in the Python ecosystem and giving it the love
02:27:10.600 | and attention that it should be getting.
02:27:12.440 | And I think people got very excited about that.
02:27:14.200 | And so if you look at that,
02:27:15.480 | I mean, I think people are excited about ownership
02:27:17.760 | and taking a step beyond rust, right?
02:27:19.440 | And there's people that are very excited about that.
02:27:20.880 | And there's people that are excited about,
02:27:23.000 | you know, just like,
02:27:24.280 | I made Game of Life go 400 times faster, right?
02:27:28.080 | And things like that.
02:27:28.920 | And that's really cool.
02:27:29.760 | There are people that are really excited about the,
02:27:31.400 | okay, I really hate writing stuff in C++, save me.
02:27:34.600 | - Like systems engineers, they're stepping up like,
02:27:36.720 | oh, yes. - Yeah, yeah.
02:27:37.840 | So that's me, by the way, also.
02:27:41.640 | I really wanna stop writing C++.
02:27:44.080 | But the--
02:27:45.160 | - I get third person excitement when people tweet,
02:27:49.520 | yeah, I made this code, Game of Life or whatever faster.
02:27:52.720 | And you're like, yeah.
02:27:54.080 | - Yeah, and also like,
02:27:55.560 | I would also say that,
02:27:58.000 | let me cast blame out to people who deserve it.
02:28:02.200 | - Sure.
02:28:03.040 | - These terrible people who convinced me to do some of this.
02:28:05.920 | - Yes.
02:28:06.760 | - Jeremy Howard. - Yes.
02:28:08.120 | - That guy.
02:28:09.680 | - Well, he's been pushing for this kind of thing.
02:28:11.360 | He's been pushing for-- - He's wanted this
02:28:12.200 | for years. - Yeah, he's wanted this
02:28:14.060 | for a long, long time. - He's wanted this for years.
02:28:15.840 | And so-- - For people who don't know
02:28:16.760 | Jeremy Howard, he's like one of the most legit people
02:28:19.200 | in the machine learning community.
02:28:20.800 | He's a grassroots, he really teaches,
02:28:24.800 | he's an incredible educator, he's an incredible teacher,
02:28:26.840 | but also legit in terms of a machine learning engineer
02:28:30.040 | himself. - Yes.
02:28:30.880 | - And he's been running the fast.ai
02:28:33.640 | and looking, I think, for exactly what you've done.
02:28:36.720 | - Exactly, and so, I mean, the first time,
02:28:40.480 | so I met Jeremy pretty early on.
02:28:42.800 | But the first time I sat up and I'm like,
02:28:46.280 | this guy is ridiculous, is when I was at Google
02:28:49.380 | and we were bringing up TPUs and we had a whole team
02:28:51.520 | of people and there was this competition called DawnBench
02:28:56.520 | of who can train ImageNet fastest, right?
02:29:01.520 | And Jeremy and one of his researchers crushed Google
02:29:05.680 | not through sheer force of the amazing amount of compute
02:29:09.640 | and the number of TPUs and stuff like that,
02:29:11.560 | that he just decided that progressive image resizing
02:29:14.760 | was the right way to train the model and fewer epochs faster
02:29:17.640 | and make the whole thing go vroom, right?
02:29:20.720 | And I'm like, this guy is incredible.
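[Editor's sketch] Progressive resizing trains on small images in the cheap early epochs and ramps up to full resolution later. The schedule below is an invented toy to show the shape of the idea, not fast.ai's actual DawnBench recipe; all parameter values are illustrative:

```python
def resize_schedule(epochs, start=64, final=224, ramp_frac=0.7):
    """Image size per epoch: ramp from `start` to `final` over the first
    `ramp_frac` of training, then hold at full resolution.
    Hypothetical parameters, not tuned values."""
    ramp = max(1, int(epochs * ramp_frac))
    sizes = []
    for epoch in range(epochs):
        t = min(1.0, epoch / ramp)
        size = int(start + t * (final - start))
        sizes.append(size - size % 32)  # snap down to a multiple of 32
    return sizes

print(resize_schedule(10))  # small sizes early, full 224 at the end
```

Early epochs see far fewer pixels per image, so each pass is much cheaper, which is the "fewer epochs faster" effect described above.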
02:29:24.160 | So you can say, anyways, come back to, you know,
02:29:27.120 | where's Mojo coming from?
02:29:28.840 | Chris finally listened to Jeremy.
02:29:30.580 | (laughing)
02:29:32.240 | It's all his fault.
02:29:33.200 | - Well, there's a kind of very refreshing,
02:29:37.880 | pragmatic view that he has about machine learning
02:29:40.680 | that I don't know if it's like this mix of a desire
02:29:45.680 | for efficiency but ultimately grounded in a desire
02:29:50.260 | to make machine learning more accessible to a lot of people.
02:29:53.840 | I don't know what that is.
02:29:54.680 | I guess that's coupled with efficiency and performance
02:29:58.360 | but it's not just obsessed about performance.
02:30:01.280 | - So a lot of AI and AI research ends up being
02:30:03.640 | that it has to go fast enough to get scale.
02:30:07.280 | So a lot of people don't actually care about performance,
02:30:09.520 | particularly on the research side,
02:30:10.880 | until it allows them to have a bigger data set, right?
02:30:14.040 | And so suddenly now you care about distributed compute
02:30:16.720 | and like all these exotic HPC,
02:30:18.600 | like you don't actually wanna know about that.
02:30:20.200 | You just want to be able to do more experiments faster
02:30:23.080 | and do so with bigger data sets, right?
02:30:25.040 | And so Jeremy has been really pushing the limits.
02:30:27.920 | And one of the things I'll say about Jeremy,
02:30:29.920 | and there's many things I could say about Jeremy
02:30:31.600 | 'cause I'm a fanboy of his, but he, it fits in his head.
02:30:36.800 | And Jeremy actually takes the time where many people don't
02:30:39.560 | to really dive deep into why is the beta parameter
02:30:43.360 | of the Adam optimizer equal to this, right?
02:30:46.360 | And he'll go survey and understand
02:30:49.260 | what are all the activation functions and the trade-offs
02:30:51.280 | and why is it that everybody that does this model
02:30:54.720 | pick that thing.
02:30:55.800 | - So the why, not just trying different values,
02:30:58.600 | like really what is going on here?
02:31:00.760 | - Right, and so as a consequence of that,
02:31:02.480 | like he's always, again, he makes time
02:31:05.160 | but he spends time to understand things at a depth
02:31:08.280 | that a lot of people don't.
02:31:09.720 | And as you say, he then brings it and teaches people.
02:31:12.960 | And his mission is to help lift,
02:31:16.280 | his website says, "Making AI uncool again."
02:31:18.720 | Like it's about, like forget about the hype,
02:31:21.060 | it's actually practical and useful.
02:31:22.580 | Let's teach people how to do this, right?
02:31:24.840 | Now the problem Jeremy struggled with
02:31:26.240 | is that he's pushing the envelope, right?
02:31:28.580 | Research isn't about doing the thing
02:31:30.080 | that is staying on the happy path
02:31:31.600 | or the well-paved road, right?
02:31:34.180 | And so a lot of the systems today
02:31:35.840 | have been these really fragile, fragmented things
02:31:38.640 | or special case in this happy path.
02:31:40.280 | And if you fall off the happy path,
02:31:42.400 | you get eaten by an alligator.
02:31:43.900 | So what about, so Python has this giant ecosystem
02:31:50.320 | of packages and there's a package repository.
02:31:54.800 | Do you have ideas of how to do that well for Mojo?
02:31:58.360 | - Yeah.
02:31:59.200 | - How to do a repository of packages?
02:32:00.720 | - Well, so that's another really interesting problem
02:32:02.560 | that I knew about, but I didn't understand
02:32:05.420 | how big of a problem it was.
02:32:07.020 | Python packaging, a lot of people have very big pain points
02:32:11.100 | and a lot of scars with Python packaging.
02:32:12.940 | - Oh, you mean, so there's several things--
02:32:14.980 | - Building and distributing and managing dependencies
02:32:17.820 | and versioning and all this stuff.
02:32:19.900 | - So from the perspective of if you want
02:32:21.860 | to create your own package.
02:32:23.100 | - Yes, and then, or you wanna build on top
02:32:25.380 | of a bunch of other people's packages
02:32:26.940 | and then they get updated and things like this.
02:32:29.100 | Now I'm not an expert in this,
02:32:30.420 | so I don't know the answer.
02:32:33.560 | I think this is one of the reasons why it's great
02:32:35.680 | that we work as a team and there's other really good
02:32:38.160 | and smart people involved.
02:32:39.460 | But one of the things I've heard from smart people
02:32:44.400 | who've done a lot of this is that the packaging
02:32:47.420 | becomes a huge disaster when you get
02:32:49.000 | the Python and C together.
02:32:50.320 | And so if you have this problem where you have code split
02:32:54.400 | between Python and C, now not only do you have to package
02:32:57.680 | the C code, you have to build the C code.
02:33:00.240 | C doesn't have a package manager, right?
02:33:02.580 | C doesn't have a dependency versioning management system.
02:33:05.740 | Right, and so I'm not experienced in the state of the art
02:33:09.060 | and all the different Python package managers,
02:33:12.540 | but my understanding is that's a massive part
02:33:14.860 | of the problem and I think Mojo solves that part
02:33:17.380 | of the problem directly heads on.
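[Editor's sketch] A toy version of that Python-and-C seam, using `ctypes` to call into the C standard library (POSIX assumed): the Python half is portable, but the C half has to be located, and for custom code compiled, separately on every platform and ABI, and that build step is exactly what packaging tools struggle with.

```python
import ctypes
import ctypes.util

# Even locating the C library is platform-specific; shipping your *own*
# C code means compiling it per platform before Python can load it.
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# The C signature is declared by hand; nothing verifies it matches reality.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

assert libc.strlen(b"modular") == 7
```

With everything in one language, the whole locate-compile-load dance (and the risk that a hand-declared signature is wrong) disappears.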
02:33:19.300 | Now, one of the things I think we'll do with the community,
02:33:21.780 | and this isn't, again, we're not solving
02:33:24.260 | all the world's problems at once.
02:33:25.260 | We have to be kind of focused to start with,
02:33:27.420 | is that I think that we will have an opportunity
02:33:29.700 | to reevaluate packaging.
02:33:31.920 | - Right. - And so I think
02:33:32.760 | that we can come back and say, okay, well,
02:33:34.760 | given the new tools and technologies and the cool things
02:33:36.880 | we have that we've built up, because we have not just
02:33:39.360 | syntax, we have an entirely new compiler stack
02:33:41.520 | that works in a new way, maybe there's other innovations
02:33:44.320 | we can bring together and maybe we can help solve
02:33:46.080 | that problem.
02:33:47.000 | - So almost a tangent to that question
02:33:48.720 | from the user perspective of packages,
02:33:50.840 | it was always surprising to me that it was not easier
02:33:56.160 | to sort of explore and find packages.
02:33:59.280 | With pip install, it just, it feels,
02:34:04.280 | it's an incredible ecosystem.
02:34:06.320 | - It's huge. - It's just interesting
02:34:08.640 | that it wasn't made, it's still, I think,
02:34:10.920 | not made easier to discover packages to do
02:34:13.840 | like a search and discovery, as YouTube calls it.
02:34:18.840 | - Well, I mean, it's kind of funny because this is one
02:34:22.360 | of the challenges of these intentionally decentralized
02:34:26.160 | communities, and so I don't know what the right answer
02:34:28.740 | is for Python, I mean, there are many people that would,
02:34:32.180 | or I don't even know the right answer for Mojo.
02:34:35.180 | So there are many people that would have much more
02:34:37.540 | informed opinions than I do, but it's interesting
02:34:39.700 | if you look at this, right, open source communities,
02:34:42.200 | you know, there's Git, Git is a fully decentralized,
02:34:46.180 | anybody can do it any way they want,
02:34:47.460 | but then there's GitHub, right, and GitHub,
02:34:50.100 | centralized, commercial in that case, right, thing,
02:34:54.300 | really help pull together and help solve some
02:34:56.120 | of the discovery problems and help build
02:34:57.760 | a more consistent community, and so maybe there's
02:35:01.080 | opportunities for-- - For something like
02:35:02.680 | a GitHub for-- - Yeah.
02:35:04.320 | - Although even GitHub, I might be wrong on this,
02:35:06.600 | but the search and discovery for GitHub is not that great.
02:35:10.680 | Like, I still use Google Search.
02:35:13.140 | - Yeah, well, I mean, maybe that's because GitHub
02:35:15.760 | doesn't want to replace Google Search, right?
02:35:18.680 | I think there is room for specialized solutions
02:35:21.440 | to specific problems, but-- - Sure.
02:35:23.440 | - I don't know, I don't know the right answer
02:35:24.680 | for GitHub either, that's, they can go figure that out.
02:35:28.720 | - But the point is to have an interface that's usable,
02:35:31.040 | that's accessible to people of all different skill levels.
02:35:33.440 | - Well, and again, like, what are the benefit
02:35:35.440 | of standards, right, standards allow you to build
02:35:37.500 | these next level up ecosystem, next level up infrastructure,
02:35:41.040 | next level up things, and so, again, come back to,
02:35:44.840 | I hate complexity, C plus Python is complicated.
02:35:49.320 | It makes everything more difficult to deal with,
02:35:51.400 | it makes it difficult to port, move code around,
02:35:53.760 | work with, all these things get more complicated,
02:35:56.040 | and so, I mean, I'm not an expert,
02:35:58.320 | but maybe Mojo can help a little bit
02:35:59.760 | by helping reduce the amount of C in this ecosystem
02:36:02.760 | and make it therefore scale better.
02:36:03.880 | - So any kind of packages that are hybrid in nature
02:36:06.960 | would be a natural fit to move to Mojo.
02:36:09.480 | - Which is a lot of them, by the way, so.
02:36:11.520 | - A lot of them, especially, they're doing
02:36:13.960 | some interesting stuff computation-wise.
02:36:16.000 | Let me ask you about some features.
02:36:18.880 | - Yeah.
02:36:19.720 | - So we talked about, obviously, indentation,
02:36:22.240 | that it's a typed language, or optionally typed.
02:36:26.120 | Is that the right way to say it?
02:36:27.280 | - It's either optionally or progressively, or--
02:36:29.120 | - Progressively, okay.
02:36:29.960 | - I think, so people have very strong opinions
02:36:32.440 | on the right word to use.
02:36:33.560 | - Yeah.
02:36:34.520 | - I don't know.
02:36:35.360 | - I look forward to your letters.
02:36:36.920 | So there's the var versus let, but let is for constants.
02:36:41.600 | Var is an optional.
02:36:44.520 | - Yeah, var makes it mutable, so you can reassign.
02:36:47.200 | - Okay.
02:36:48.560 | - Then there's function overloading.
02:36:52.600 | - Oh, okay, yeah.
02:36:54.480 | - I mean, there's a lot of source of happiness for me,
02:36:56.400 | but function overloading, that's, I guess,
02:36:59.600 | is that for performance, or is that,
02:37:03.920 | why does Python not have function overloading?
02:37:06.200 | - So I can speculate.
02:37:08.160 | So Python is a dynamic language.
02:37:10.600 | The way it works is that Python and Objective-C
02:37:15.320 | are actually very similar worlds if you ignore syntax.
02:37:20.120 | And so Objective-C is straight-line derived from Smalltalk,
02:37:25.960 | a really venerable, interesting language
02:37:30.440 | that much of the world has forgotten about,
02:37:31.880 | but the people that remember it love it, generally.
02:37:34.880 | And the way that Smalltalk works
02:37:36.400 | is that every object has a dictionary in it,
02:37:39.120 | and the dictionary maps from the name of a function,
02:37:41.440 | or the name of a value within an object,
02:37:43.920 | to its implementation.
02:37:45.680 | And so the way you call a method in Objective-C
02:37:48.000 | is you say, go look up, the way I call foo
02:37:51.000 | is I go look up foo, I get a pointer to the function back,
02:37:53.280 | and then I call it, okay?
02:37:55.240 | That's how Python works, right?
02:37:57.040 | And so now the problem with that
02:37:58.160 | is that the dictionary within a Python object,
02:38:01.520 | all the keys are strings, and it's a dictionary,
02:38:04.960 | so you can only have one entry per name.
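A minimal Python sketch of the point above: because a class's attributes live in a string-keyed dictionary, defining the same method name twice just overwrites the earlier entry (the class name `Greeter` is made up for illustration).

```python
# A Python object's methods live in a dict keyed by plain strings,
# so a second definition under the same name silently replaces the first.
class Greeter:
    def hello(self):
        return "first"

    def hello(self):  # same key "hello": overwrites the definition above
        return "second"

g = Greeter()
assert g.hello() == "second"        # only the last definition survives
assert "hello" in Greeter.__dict__  # one string-keyed entry, not two
```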
02:38:06.600 | - You think it's as simple as that?
02:38:08.040 | - I think it's as simple as that.
02:38:09.360 | And so now, why do they never fix this?
02:38:13.200 | Why do they not change it to not be a dictionary?
02:38:14.800 | Why do they not change it, like, do other things?
02:38:18.080 | Well, you don't really have to in Python,
02:38:20.680 | because it's dynamic.
02:38:21.680 | And so you can say, I get into the function,
02:38:24.240 | now if I got past an integer, do some dynamic tests for it,
02:38:27.280 | if it's a string, go do another thing.
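The dynamic-test pattern described here can be sketched like this (the function `describe` is a made-up example): one function name, with runtime `isinstance` checks standing in for overloading.

```python
# Manual runtime dispatch: one function, branching on the runtime type --
# the dynamic-language substitute for overloading.
def describe(x):
    if isinstance(x, int):
        return f"int: {x * 2}"
    if isinstance(x, str):
        return f"str: {x.upper()}"
    raise TypeError(f"unsupported type: {type(x).__name__}")

assert describe(3) == "int: 6"
assert describe("hi") == "str: HI"
```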
02:38:30.200 | There's another additional challenge,
02:38:31.560 | which is even if you did support overloading,
02:38:33.560 | you're saying, okay, well, here's a version
02:38:35.120 | of a function for integers and a function for strings.
02:38:38.240 | Well, you'd have, even if you could put it
02:38:39.640 | in that dictionary, you'd have to have the caller
02:38:41.640 | do the dispatch, and so every time you call the function,
02:38:44.520 | you'd have to say, is it an integer, is it a string?
02:38:46.520 | And so you'd have to figure out where to do that test.
02:38:48.840 | And so in a dynamic language,
02:38:50.440 | overloading is something you don't have to have.
02:38:54.440 | But now you get into a typed language,
02:38:58.000 | and in Python, if you subscript with an integer,
02:39:02.720 | then you get typically one element out of a collection.
02:39:06.200 | If you subscript with a range,
02:39:08.280 | you get a different thing out.
02:39:10.360 | And so often in typed languages,
02:39:12.360 | you'll wanna be able to express the fact that,
02:39:14.240 | cool, I have different behavior,
02:39:16.640 | depending on what I actually pass into this thing.
02:39:19.120 | If you can model that, it can make it safer
02:39:20.680 | and more predictable and faster and all these things.
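The subscript example can be seen directly in Python, and `typing.overload` stubs are one existing way to describe the two behaviors to a type checker (the `pick` function is a hypothetical illustration, not a real API):

```python
from typing import Sequence, overload

items = [10, 20, 30, 40]
assert items[1] == 20          # integer subscript: one element
assert items[1:3] == [20, 30]  # slice subscript: a sub-list

# In typed code, @overload stubs let a checker see both behaviors:
@overload
def pick(seq: Sequence[int], key: int) -> int: ...
@overload
def pick(seq: Sequence[int], key: slice) -> Sequence[int]: ...
def pick(seq, key):
    return seq[key]

assert pick(items, 0) == 10
assert pick(items, slice(0, 2)) == [10, 20]
```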
02:39:23.760 | - It somehow feels safer, yes,
02:39:26.360 | but it also feels empowering in terms of clarity.
02:39:29.600 | Like you don't have to design whole different functions.
02:39:32.520 | - Yeah, well, and this is also one of the challenges
02:39:34.600 | with the existing Python typing systems
02:39:38.600 | is that in practice, like you take subscript,
02:39:41.320 | like in practice, a lot of these functions,
02:39:43.360 | they don't have one signature.
02:39:45.560 | They actually have different behavior in different cases.
02:39:47.720 | And so this is why it's difficult to retrofit this
02:39:50.680 | into existing Python code and make it play well with typing.
02:39:55.680 | You kind of have to design for that.
02:39:57.520 | - Okay, so there's an interesting distinction
02:40:00.400 | that people that program Python might be interested in
02:40:02.680 | is def versus fn.
02:40:04.560 | So it's two different ways to define a function.
02:40:08.400 | - Yeah.
02:40:09.240 | - And fn is a stricter version of def.
02:40:13.680 | What's the coolness that comes from the strictness?
02:40:16.240 | - So here you get into what is the trade-off
02:40:18.480 | with the superset?
02:40:19.840 | - Yes.
02:40:20.680 | - Okay, so a superset, you have to,
02:40:23.320 | or you really want to be compatible.
02:40:25.680 | Like if you're doing a superset,
02:40:26.640 | you've decided compatibility with existing code
02:40:30.160 | is the important thing,
02:40:31.680 | even if some of the decisions they made
02:40:33.040 | were maybe not what you'd choose.
02:40:34.280 | - Yeah. - Okay.
02:40:36.160 | So that means you put a lot of time into compatibility
02:40:38.480 | and it means that you get locked into decisions of the past,
02:40:41.960 | even if they may not have been a good thing, right?
02:40:44.320 | Now, systems programmers typically like to control things.
02:40:47.960 | (laughs)
02:40:48.800 | Right?
02:40:49.640 | And they want to make sure that,
02:40:50.800 | you know, not in all cases, of course,
02:40:52.280 | and even systems programmers are not one thing, right?
02:40:55.200 | But often you want predictability.
02:40:57.480 | And so one of the things that Python has, for example,
02:41:00.360 | as you know, is that if you define a variable,
02:41:02.280 | you just say x equals four.
02:41:04.120 | I have a variable named x.
02:41:05.520 | Now I say some long name equals 17.
02:41:11.400 | Print out some long name.
02:41:13.440 | Oops, but I typoed it.
02:41:15.280 | Right, well, the compiler, the Python compiler doesn't know,
02:41:18.480 | in all cases, what you're defining and what you're using.
02:41:21.060 | And did you typo the use of it or the definition?
02:41:24.640 | Right, and so for people coming from typed languages,
02:41:28.760 | again, I'm not saying they're right or wrong,
02:41:30.320 | but that drives them crazy
02:41:31.680 | because they want the compiler to tell them
02:41:33.120 | you typoed the name of this thing.
02:41:35.080 | Right, and so what Fn does is it turns on,
02:41:37.320 | as you say, it's a strict mode.
02:41:38.640 | And so it says, okay, well,
02:41:40.120 | you have to actually declare,
02:41:41.360 | intentionally declare your variables before you use them.
02:41:43.600 | That gives you more predictability,
02:41:45.240 | more error checking and things like this,
02:41:47.440 | but you don't have to use it.
02:41:51.720 | And this is a way that Mojo is both compatible,
02:41:55.080 | 'cause defs work the same way
02:41:56.160 | that defs have always worked,
02:41:58.160 | but it provides a new alternative
02:41:59.440 | that gives you more control
02:42:00.420 | and it allows certain kinds of people
02:42:02.180 | that have a different philosophy
02:42:03.120 | to be able to express that and get that.
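The typo scenario described above can be shown in plain Python, where the mistake only surfaces when the line actually runs; a Mojo `fn` with declared variables would reject the same typo at compile time (variable names here are invented for the demo).

```python
some_long_name = 17
try:
    print(some_lnog_name)  # typo: only discovered when this line executes
    outcome = "no error"
except NameError:
    outcome = "NameError at runtime"
```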
02:42:05.400 | - But usually if you're writing Mojo code from scratch,
02:42:08.800 | you'll be using Fn.
02:42:09.900 | - It depends, again, it depends on your mentality, right?
02:42:13.440 | It's not that def is Python and Fn is Mojo.
02:42:17.840 | Mojo has both and it loves both.
02:42:19.480 | - Right. - And it really depends on--
02:42:20.320 | - Sometimes it's just strict.
02:42:21.440 | - Yeah, exactly.
02:42:22.760 | Are you playing around and scripting something out?
02:42:25.040 | And is it a one-off throwaway script?
02:42:27.200 | Cool, like Python is great at that.
02:42:29.200 | - I'll still be using Fn, but yeah.
02:42:31.280 | - Well, so-- - I love strictness.
02:42:32.960 | - Okay, well, so-- - Control, power.
02:42:35.740 | - You also like suffering, right?
02:42:40.200 | - Yes, they go hand in hand.
02:42:40.200 | - How many pull-ups?
02:42:41.200 | - I've lost count at this point.
02:42:45.960 | - So, and that's cool, I love you for that.
02:42:48.440 | And I love other people who like strict things, right?
02:42:50.400 | But I don't want to say that that's the right thing,
02:42:53.540 | because Python's also very beautiful
02:42:55.240 | for hacking around and doing stuff and research
02:42:57.120 | and these other cases where you may not want that.
02:42:59.560 | You see, I just feel like, maybe I'm wrong on that,
02:43:02.560 | but it feels like strictness leads to faster debugging.
02:43:05.640 | So in terms of going from, even on a small project,
02:43:09.120 | from zero to completion, it just,
02:43:11.880 | I guess it depends how many bugs you generate, usually.
02:43:14.920 | - Well, so, I mean, it's, again,
02:43:16.360 | lessons learned in looking at the ecosystem.
02:43:18.000 | It's really, I mean, I think it's,
02:43:20.680 | if you study some of these languages over time,
02:43:23.140 | like the Ruby community, for example.
02:43:25.160 | Now, Ruby is a pretty well-developed,
02:43:27.180 | pretty established community, but along their path,
02:43:29.660 | they really invested in unit testing.
02:43:32.260 | So I think that the Ruby community has really pushed forward
02:43:35.380 | the state of the art of testing
02:43:37.340 | because they didn't have a type system
02:43:38.580 | that caught a lot of bugs at compile time, right?
02:43:40.900 | And so you can have the best of both worlds.
02:43:43.180 | You can have good testing and good types, right?
02:43:44.900 | And things like this, but I thought that
02:43:47.060 | it was really interesting to see how
02:43:48.580 | certain challenges get solved.
02:43:49.940 | And in Python, for example,
02:43:52.000 | the interactive notebook kind of experiences
02:43:53.880 | and stuff like this are really amazing.
02:43:55.540 | And if you typo something, it doesn't matter.
02:43:57.820 | It just tells you, that's fine, right?
02:43:59.220 | And so I think that the trade-offs are very different
02:44:01.180 | if you're building a large-scale production system
02:44:04.740 | versus you're building and exploring a notebook.
02:44:07.620 | - Speaking of control, the hilarious thing,
02:44:09.180 | if you look at code I write just for myself for fun,
02:44:11.940 | it's like littered with asserts everywhere.
02:44:14.620 | - Okay. (laughing)
02:44:16.260 | - It's a kind of--
02:44:17.540 | - Well, then, yeah, you'd like types.
02:44:18.620 | (laughing)
02:44:19.740 | - It's basically saying in a dictatorial way,
02:44:24.100 | "This should be true now.
02:44:25.420 | "Otherwise, everything stops."
02:44:27.500 | - Well, and that is the sign.
02:44:30.100 | I love you, man.
02:44:31.900 | But that is the sign of somebody who likes control.
02:44:34.580 | - Yeah.
02:44:35.420 | - And so, yes, I think that you'll like FN.
02:44:36.980 | - I'll have this turn into a therapy.
02:44:37.820 | - I think you'll like Mojo.
02:44:38.940 | - Therapy session, yes, I definitely will.
02:44:41.100 | Speaking of asserts, exceptions are called errors.
02:44:46.240 | Why is it called errors?
02:44:47.420 | - So, I mean, we use the same, we're the same as Python,
02:44:51.380 | right, but we implement in a very different way, right?
02:44:53.900 | And so, if you look at other languages,
02:44:56.540 | like we'll pick on C++, our favorite, right?
02:44:59.340 | C++ has a thing called zero-cost exception handling.
02:45:02.980 | Okay, and this is, in my opinion,
02:45:07.500 | something to learn lessons from.
02:45:09.060 | (laughing)
02:45:09.900 | - It's a nice, polite way of saying it.
02:45:11.620 | - And so, zero-cost exception handling,
02:45:15.340 | the way it works is that it's called zero-cost
02:45:18.300 | because if you don't throw an exception,
02:45:21.140 | there's supposed to be no overhead for the non-error code.
02:45:25.300 | And so, it takes the error path out of the common path.
02:45:29.940 | It does this by making throwing an error
02:45:33.100 | extremely expensive.
02:45:35.040 | And so, if you actually throw an error
02:45:36.700 | with a C++ compiler using exceptions,
02:45:39.340 | has to go look up in tables on the side
02:45:41.140 | and do all this stuff,
02:45:41.980 | and so throwing an error could be like 10,000 times
02:45:44.540 | more expensive than returning from a function, right?
02:45:47.860 | Also, it's called zero-cost exceptions,
02:45:49.840 | but it's not zero-cost by any stretch of the imagination
02:45:52.860 | because it massively bloats out your code, your binary.
02:45:55.820 | It also adds a whole bunch of different paths
02:45:59.060 | because of destructors and other things like that
02:46:01.180 | that exist in C++.
02:46:02.460 | And it reduces the number of optimizations.
02:46:04.620 | It has all these effects.
02:46:06.220 | And so, this thing that was called zero-cost exceptions,
02:46:09.900 | it really ain't.
02:46:10.900 | (laughing)
02:46:11.740 | Okay, now, if you fast-forward to newer languages,
02:46:15.740 | and this includes Swift and Rust and Go,
02:46:18.860 | and now Mojo,
02:46:20.460 | well, and Python's a little bit different
02:46:24.120 | because it's interpreted,
02:46:24.980 | and so it's got a little bit of a different thing going on,
02:46:26.900 | but if you look at compiled languages,
02:46:29.040 | many newer languages say,
02:46:32.780 | "Okay, well, let's not do that zero-cost exception
02:46:35.660 | "handling thing.
02:46:36.900 | "Let's actually treat throwing an error
02:46:39.560 | "the same as returning a variant,
02:46:42.860 | "returning either the normal result or an error."
02:46:46.780 | Now, programmers generally don't want to deal
02:46:50.580 | with all the typing machinery
02:46:51.940 | and pushing around a variant.
02:46:54.740 | And so, you use all the syntax that Python gives us.
02:46:57.220 | For example, try and catch,
02:46:58.560 | functions that raise and things like this.
02:47:01.500 | You can put a raises decorator on your function,
02:47:04.260 | stuff like this, if you wanna control that.
02:47:06.660 | And then, the language can provide syntax for it,
02:47:09.860 | but under the hood, the way the computer executes it,
02:47:12.540 | throwing an error is basically
02:47:14.380 | as fast as returning something.
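A rough sketch, in Python, of the "throwing is just returning a variant" idea: the function hands back either an ok-tagged value or an err-tagged message, and the caller branches on the tag. (The tuple encoding and the `parse_count` function are illustrative, not how Mojo actually represents it.)

```python
# Simulating error-as-variant: return ("ok", value) or ("err", message).
def parse_count(text):
    if not text.isdigit():
        return ("err", f"not a number: {text!r}")
    return ("ok", int(text))

tag, payload = parse_count("42")
assert (tag, payload) == ("ok", 42)

tag, payload = parse_count("abc")
assert tag == "err"  # the error path cost the same as a normal return
```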
02:47:15.860 | - Oh, interesting.
02:47:16.700 | So, it's exactly the same way
02:47:17.540 | from a compiler perspective.
02:47:19.140 | - And so, this is actually,
02:47:20.740 | I mean, it's a fairly nerdy thing, right?
02:47:23.260 | Which is why I love it.
02:47:24.780 | But this has a huge impact on the way you design your APIs.
02:47:29.780 | So, in C++, huge communities turn off exceptions
02:47:34.820 | because the cost is just so high, right?
02:47:37.500 | And so, the zero-cost cost is so high, right?
02:47:40.420 | And so, that means you can't actually use exceptions
02:47:43.420 | in many libraries.
02:47:45.640 | - Interesting.
02:47:46.600 | - And even for the people that do use it,
02:47:48.640 | well, okay, how and when do you wanna pay the cost?
02:47:51.880 | If I try to open a file, should I throw an error?
02:47:55.200 | Well, what if I'm probing around looking for something,
02:47:58.160 | right, I'm looking it up in many different paths.
02:48:00.000 | Well, if it's really slow to do that,
02:48:01.980 | maybe I'll add another function
02:48:03.880 | that doesn't throw an error,
02:48:05.120 | returns an error code instead,
02:48:07.140 | and I have two different versions of the same thing,
02:48:09.080 | and so it causes you to fork your APIs.
02:48:11.640 | And so, you know, one of the things I learned
02:48:14.040 | from Apple, and still love, is that the art of API design
02:48:17.160 | is actually really profound.
02:48:18.800 | I think this is something that Python's also done
02:48:20.400 | a pretty good job at in terms of building out
02:48:23.140 | this large-scale package ecosystem.
02:48:24.440 | It's about having standards and things like this.
02:48:26.720 | And so, you know, we wouldn't wanna enter a mode
02:48:28.840 | where, you know, there's this theoretical feature
02:48:32.040 | that exists in language,
02:48:32.920 | but people don't use it in practice.
02:48:35.040 | Now, I'll also say one of the other really cool things
02:48:37.440 | about this implementation approach
02:48:39.200 | is that it can run on GPUs,
02:48:40.440 | and it can run on accelerators and things like this,
02:48:42.360 | and that standard zero-cost exception thing
02:48:45.720 | would never work on an accelerator.
02:48:47.400 | And so, this is also part of how Mojo can scale
02:48:49.800 | all the way down to, like, little embedded systems
02:48:52.000 | and to running on GPUs and things like that.
02:48:54.760 | - Can you actually say about the,
02:48:56.560 | maybe, is there some high-level way
02:49:00.560 | to describe the challenge of exceptions
02:49:03.520 | and how they work in code during compilation?
02:49:06.860 | So, just this idea of percolating up a thing, an error.
02:49:11.880 | - Yeah, yeah, so the way to think about it is,
02:49:15.160 | think about a function that doesn't return anything,
02:49:17.080 | just as a simple case, right?
02:49:18.320 | And so, you have function one calls function two
02:49:22.640 | calls function three, calls function four,
02:49:25.040 | along that call stack there are try blocks, right?
02:49:28.160 | And so, if you have function one calls function two,
02:49:30.120 | function two has a try block,
02:49:31.840 | and then within it, it calls function three, right?
02:49:34.520 | Well, what happens if function three throws?
02:49:36.760 | Well, actually, start simpler.
02:49:39.440 | What happens if it returns?
02:49:40.720 | Well, if it returns, it's supposed to go back out
02:49:42.560 | and continue executing and then fall off the bottom
02:49:44.520 | of the try block and keep going, and all's good.
02:49:47.440 | If the function throws, you're supposed to exit
02:49:49.800 | the current function and then get into the except clause,
02:49:53.880 | right, and then do whatever code's there
02:49:55.280 | and then keep following on and going on.
02:49:57.400 | And so, the way that a compiler like Mojo works
02:50:00.120 | is that the call to that function,
02:50:02.680 | which happens in the try block, calls a function,
02:50:05.680 | and then instead of returning nothing,
02:50:07.980 | it actually returns, you know, a variant
02:50:10.320 | between nothing and an error.
02:50:13.080 | And so, if you return normally,
02:50:14.760 | fall off the bottom or do a return,
02:50:16.520 | you return nothing, and if you throw an error,
02:50:19.960 | you return the variant that is, I'm an error, right?
02:50:24.480 | So, when you get to the call, you say, okay, cool,
02:50:26.520 | I called a function.
02:50:27.720 | Hey, I know locally I'm in a tri-block, right?
02:50:30.720 | And so, I call the function,
02:50:32.320 | and then I check to see what it returns.
02:50:34.120 | Aha, if it's that error thing, jump to the except block.
02:50:37.360 | - And that's all done for you behind the scenes.
02:50:39.720 | - Exactly, and so the compiler does all this for you.
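A hand-desugared sketch of that lowering, written in Python for illustration (all names here are made up; a real compiler does this invisibly, without Python exception machinery):

```python
# Hand-desugared version of:
#     try:
#         function_three()
#     except MyError as e:
#         handle(e)
class MyError(Exception):
    pass

def function_three(fail):
    # Lowered raising function: it *returns* its error instead of unwinding.
    return MyError("boom") if fail else None

handled = []

def handle(err):
    handled.append(str(err))

result = function_three(fail=True)
if isinstance(result, MyError):  # the hidden check at the call site
    handle(result)
assert handled == ["boom"]
```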
02:50:42.040 | And I mean, one of the things,
02:50:43.360 | if you dig into how this stuff works in Python,
02:50:45.560 | it gets a little bit more complicated
02:50:46.760 | because you have finally blocks,
02:50:49.120 | which now you need to go into, do some stuff,
02:50:51.680 | and then those can also throw and return.
02:50:54.280 | - Wait, what, nested?
02:50:55.120 | - Yeah, and like, this stuff matters for compatibility.
02:50:58.760 | Like, there's-- - Really?
02:50:59.680 | You can nest them?
02:51:00.880 | - There's with clauses, and so with clauses
02:51:02.960 | are kind of like finally blocks
02:51:04.120 | with some special stuff going on, and so there's--
02:51:06.160 | - Nesting in general, nesting of anything,
02:51:08.080 | nesting of functions should be illegal.
02:51:11.080 | (laughing)
02:51:12.680 | It just feels like it adds a level of complexity.
02:51:15.240 | - Lex, I'm merely an implementer,
02:51:16.680 | and so this is, again, one of the trade-offs you get
02:51:21.680 | when you decide to build a superset
02:51:22.920 | is you get to implement a full fidelity implementation
02:51:26.280 | of the thing that you decided is good,
02:51:28.680 | and so, yeah, I mean, we can complain
02:51:33.080 | about the reality of the world and shake our fist, but--
02:51:36.440 | - It always feels like you shouldn't be allowed to do that,
02:51:39.280 | like, to declare functions inside functions.
02:51:41.640 | Inside functions.
02:51:43.840 | - Oh, wait, wait, wait, what happened to Lex, the Lisp guy?
02:51:47.680 | - No, I understand that,
02:51:48.640 | but Lisp is what I used to do in college.
02:51:51.800 | (laughing)
02:51:52.640 | - So now you've grown up?
02:51:53.840 | - You know, we've all done things in college
02:51:56.560 | we're not proud of, no, no, no.
02:51:57.960 | - Wait a sec, wait a sec. - I love Lisp, I love Lisp.
02:52:00.160 | - Okay, yeah, I was gonna say,
02:52:01.240 | you're afraid of me irritating the whole internet?
02:52:03.240 | - Yeah, no, no, I love Lisp.
02:52:05.320 | It worked as a joke in my head and it came out right.
02:52:09.480 | - So nested functions are, joking aside,
02:52:11.360 | actually really great for certain things, right?
02:52:13.800 | And so these are also called closures.
02:52:16.400 | Closures are pretty cool, and you can pass callbacks,
02:52:18.960 | there's a lot of good patterns, and so.
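A small Python closure, for readers who haven't used the pattern (the counter is a stock example, not anything Mojo-specific):

```python
def make_counter():
    count = 0
    def increment():      # nested function closing over `count`
        nonlocal count
        count += 1
        return count
    return increment      # the closure escapes, carrying its state

tick = make_counter()
assert tick() == 1
assert tick() == 2        # state persists between calls
```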
02:52:21.120 | - So speaking of which, I don't think you have
02:52:24.520 | nested functions implemented yet in Mojo.
02:52:28.920 | - We don't have lambda syntax, but we do have--
02:52:30.840 | - Lambda syntax. - Nested functions, yeah.
02:52:32.760 | - There's a few things on the roadmap that you have
02:52:34.600 | that it'd be cool to sort of just fly through
02:52:37.320 | 'cause it's interesting to see how many features
02:52:40.880 | there are in a language, small and big.
02:52:43.280 | - Yep. - They have to implement.
02:52:44.880 | - Yeah. - So first of all,
02:52:45.800 | there's tuple support, and that has to do
02:52:48.040 | with some very specific aspect of it,
02:52:49.800 | like the parentheses are not really parentheses, that kind of thing.
02:52:52.480 | - Yeah, this is just a totally syntactic thing.
02:52:54.480 | - A syntactic thing.
02:52:55.560 | Okay, but it's cool still.
02:52:57.620 | So keyword arguments and functions?
02:53:01.400 | - Yeah, so this is where in Python you can say
02:53:03.760 | call a function x equals four.
02:53:05.680 | - Yeah. - And x is the name
02:53:06.880 | of the argument.
02:53:07.720 | - That's a nice sort of self-documenting feature.
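For reference, the Python feature being discussed (the `resize` signature is invented for the demo):

```python
def resize(image, width, height, keep_aspect=False):
    return (image, width, height, keep_aspect)

# Keyword arguments make the call site self-documenting:
result = resize("photo.png", width=640, height=480, keep_aspect=True)
assert result == ("photo.png", 640, 480, True)
```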
02:53:11.240 | - Yeah, and again, this isn't rocket science to implement,
02:53:13.480 | this is just the laundry list of things.
02:53:14.680 | - It's just on the list.
02:53:15.880 | - The bigger features are things like traits.
02:53:19.880 | So traits are when you wanna define abstract,
02:53:22.600 | so when you get into typed languages,
02:53:25.280 | you need the ability to write generics,
02:53:27.680 | and so you wanna say I wanna write this function,
02:53:29.520 | and now I want to work on all things
02:53:31.140 | that are arithmetic-like.
02:53:33.520 | Well, what does arithmetic-like mean?
02:53:34.920 | Well, arithmetic-like is a categorization
02:53:38.020 | of a bunch of types, and so, again,
02:53:40.840 | you can define it in many different ways,
02:53:41.960 | and I'm not gonna go into ring theory or something,
02:53:44.760 | but you can say it's arithmetic-like
02:53:47.320 | if you can add, subtract, multiply, divide it, for example.
02:53:50.400 | And so what you're saying is you're saying
02:53:52.320 | there's a set of traits that apply
02:53:54.700 | to a broad variety of types,
02:53:57.540 | and so all these types are arithmetic-like,
02:54:00.200 | all these tensors and floating point integer,
02:54:02.280 | and there's this category of types,
02:54:04.960 | and then I can define on an orthogonal axis algorithms
02:54:08.520 | that then work against types that have those properties.
02:54:11.280 | And so this is a, again, it's a widely known thing.
02:54:15.600 | It's been implemented in Swift and Rust
02:54:17.360 | and many languages, so it's not novel.
02:54:20.320 | Haskell is where everybody learns their tricks from,
02:54:24.240 | but we need to implement that,
02:54:26.960 | and that'll enable a new level of expressivity.
02:54:29.720 | - So classes.
02:54:31.960 | - Yeah, classes are a big deal.
02:54:33.320 | - It's a big deal still to be implemented.
02:54:36.000 | Like you said, Lambda syntax,
02:54:40.200 | and there's detail stuff like whole module import,
02:54:43.120 | support for top-level code and file scope,
02:54:48.540 | and then global variables also,
02:54:50.300 | so being able to have variables outside of a top-level--
02:54:54.440 | - Well, and so this comes back to where Mojo came from
02:54:57.200 | and the fact that this is 0.1, right?
02:54:59.040 | And so we're building, so Modular's building an AI stack,
02:55:03.560 | right, and an AI stack has a bunch of problems
02:55:05.680 | working with hardware and writing high-performance kernels
02:55:08.760 | and doing this kernel fusion thing I was talking about
02:55:10.640 | and getting the most out of the hardware,
02:55:12.680 | and so we've really prioritized and built Mojo
02:55:14.960 | to solve Modular's problem, right?
02:55:18.280 | Now, our North Star is build out
02:55:20.120 | and support all the things,
02:55:21.280 | and so we're making incredible progress.
02:55:23.080 | By the way, Mojo's only like seven months old,
02:55:25.960 | so that's another interesting thing.
02:55:27.800 | - I mean, part of the reason I wanted to mention
02:55:29.280 | some of these things is like there's a lot to do
02:55:32.640 | and it's pretty cool how you just kinda,
02:55:35.120 | sometimes you take for granted
02:55:37.040 | how much there is in a programming language,
02:55:38.600 | how many cool features you kinda rely on,
02:55:40.320 | and this is kinda a nice reminder
02:55:41.960 | when you lay it as a to-do list.
02:55:44.220 | - Yeah, and so, I mean, but also you look into,
02:55:47.160 | it's amazing how much is also there,
02:55:49.040 | and you take it for granted that a value,
02:55:52.920 | if you define it, it will get destroyed automatically.
02:55:56.800 | Like that little feature itself
02:55:58.000 | is actually really complicated,
02:55:59.800 | given the way the ownership system has to work,
02:56:01.920 | and the way that works within Mojo
02:56:03.880 | is a huge step forward from what Rust and Swift have done.
02:56:06.440 | - Wait, can you say that again?
02:56:07.280 | When a value, when you define it,
02:56:09.240 | it gets destroyed automatically?
02:56:10.080 | - Yeah, so like say you have a string, right?
02:56:12.000 | So you just define a string on the stack, okay?
02:56:14.040 | Or whatever that means, like in your local function.
02:56:17.440 | Right, and so you say, like whether it be in a def,
02:56:20.760 | and so you just say x equals hello world, right?
02:56:24.080 | Well, if your string type requires you to allocate memory,
02:56:27.920 | then when it's destroyed, you have to deallocate it.
02:56:30.720 | So in Python and Mojo,
02:56:31.920 | you define that with the del method, right?
02:56:34.940 | Where does that get run?
02:56:36.140 | Well, it gets run sometime between the last use of the value
02:56:43.560 | and the end of the program.
02:56:46.480 | Like in this, you now get into garbage collection,
02:56:49.560 | you get into like all these long debated,
02:56:51.960 | you talk about religions and trade-offs
02:56:55.400 | and things like this,
02:56:56.240 | this is a hugely hotly contested world.
02:56:59.360 | If you look at C++, the way this works is that
02:57:01.880 | if you define a variable,
02:57:04.140 | or a set of variables within a function,
02:57:06.520 | they get destroyed in a last in, first out order.
02:57:10.800 | So it's like nesting, okay?
02:57:12.320 | This has a huge problem,
02:57:14.720 | because if you define, you have a big scope,
02:57:16.840 | and you define a whole bunch of values at the top,
02:57:18.920 | and then you use them,
02:57:20.040 | and then you do a whole bunch of code that doesn't use them,
02:57:22.640 | they don't get destroyed until the very end of that scope.
02:57:25.800 | And so, this also destroys tail calls,
02:57:28.280 | so good functional programming, right?
02:57:30.040 | This has a bunch of different impacts on,
02:57:32.920 | you talk about reference counting optimizations
02:57:35.120 | and things like this, a bunch of very low level things.
02:57:37.920 | And so what Mojo does is it has a different approach on that
02:57:40.960 | from any language I'm familiar with,
02:57:42.840 | where it destroys them as soon as possible.
02:57:45.000 | And by doing that, you get better memory use,
02:57:48.400 | you get better predictability,
02:57:49.640 | you get tail calls that work,
02:57:51.360 | you get a bunch of other things,
02:57:52.800 | you get better ownership tracking.
02:57:54.240 | There's a bunch of these very simple things
02:57:56.520 | that are very fundamental,
02:57:58.640 | that are already built in there in Mojo today,
02:58:01.400 | that are the things that nobody talks about generally,
02:58:03.960 | but when they don't work right,
02:58:05.200 | you find out and you have to complain about it.
02:58:07.880 | - Is it trivial to know what's the soonest possible
02:58:11.840 | to delete a thing that's not gonna be used again?
02:58:14.000 | - Yeah, well, I mean, it's generally trivial,
02:58:15.880 | it's after the last use of it.
02:58:17.400 | So if you define X as a string,
02:58:19.360 | and then you have some use of X somewhere in your code.
02:58:21.840 | - Within that scope, you mean,
02:58:23.080 | within the scope that is accessible?
02:58:25.440 | - Yeah, exactly.
02:58:26.280 | So you can only use something within its scope.
02:58:28.200 | And so then it doesn't wait
02:58:29.800 | until the end of the scope to delete it,
02:58:31.920 | it destroys it after the last use.
02:58:34.240 | - So there's kind of some very eager machine
02:58:36.760 | that's just sitting there and deleting.
02:58:38.320 | - Yeah, and it's all in the compiler,
02:58:39.320 | so it's not at runtime, which is also cool.
02:58:41.880 | And so, yeah, and so what,
02:58:45.240 | and this is actually non-trivial
02:58:46.440 | because you have control flow.
02:58:48.360 | And so it gets complicated pretty quickly.
02:58:50.040 | And so like getting this right was not--
02:58:51.760 | - Oh, so you have to insert delete, like in a lot of places.
02:58:54.280 | - Potentially, yeah, exactly.
02:58:55.680 | So the compiler has to reason about this.
02:58:57.280 | And this is where, again,
02:58:58.640 | it's experience building languages
02:58:59.960 | and not getting this right.
02:59:01.120 | So again, you get another chance to do it
02:59:02.920 | and you get basic things like this, right?
02:59:05.720 | But it's extremely powerful when you do that.
02:59:08.480 | And so there's a bunch of things like that
02:59:09.960 | that kind of combine together.
02:59:12.280 | And this comes back to the,
02:59:13.840 | you get a chance to do it the right way,
02:59:15.280 | do it the right way,
02:59:16.200 | and make sure that every brick you put down is really good
02:59:18.960 | so that when you put more bricks on top of it,
02:59:20.720 | they stack up to something that's beautiful.
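The destruction-timing point above can be sketched in plain Python (not Mojo): CPython's reference counting is a runtime approximation of what the discussion says Mojo's compiler does statically, running `__del__` right after the last use rather than at end of scope, as C++ does.

```python
# Sketch of "destroy as soon as possible" semantics using CPython's
# reference counting. Mojo (per the discussion) inserts the destructor
# call at compile time after the last use; CPython approximates this at
# runtime by calling __del__ the moment the refcount hits zero.

destroyed = []

class Tracked:
    def __init__(self, name):
        self.name = name

    def __del__(self):  # analogous to Mojo's __del__ method
        destroyed.append(self.name)

def demo():
    x = Tracked("x")
    y = Tracked("y")
    last_use = x.name     # last use of x
    del x                 # refcount drops to zero; __del__ runs HERE,
                          # not at the end of the scope as in C++
    assert destroyed == ["x"]
    return last_use       # y is destroyed when the frame is torn down

demo()
assert destroyed == ["x", "y"]
```

The contrast with C++ is that a block-scoped C++ object would live until the closing brace (destroyed last-in, first-out), even if it was last touched many lines earlier.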
02:59:22.680 | - Well, there's also,
02:59:24.320 | like how many design discussions
02:59:28.480 | do there have to be about particular details,
02:59:30.560 | like implementation of particular small features?
02:59:32.880 | Because the features that seem small,
02:59:36.560 | I bet some of them might be like really,
02:59:40.960 | require really big design decisions.
02:59:42.680 | - Yeah, well, so, I mean,
02:59:44.160 | let me give you another example of this.
02:59:45.760 | Python has a feature called Async/Await.
02:59:48.200 | So it's a new feature.
02:59:50.440 | I mean, in the long archives on history,
02:59:53.080 | it's a relatively new feature, right?
02:59:55.440 | That allows way more expressive asynchronous programming.
02:59:59.200 | Okay, again, this is a,
03:00:01.480 | Python's a beautiful thing,
03:00:02.520 | and they did things that are great for Mojo
03:00:04.560 | for completely different reasons.
03:00:06.920 | The reason the Async/Await got added to Python,
03:00:09.840 | as far as I know, is because Python doesn't support threads.
03:00:13.320 | Okay, and so Python doesn't support threads,
03:00:16.560 | but you wanna work with networking
03:00:18.760 | and other things like that that can block.
03:00:20.520 | I mean, Python does support threads,
03:00:21.920 | it's just not its strength.
03:00:23.160 | And so, they added this feature called Async/Await.
03:00:27.760 | It's also seen in other languages like Swift
03:00:29.680 | and JavaScript and many other places as well.
03:00:32.400 | Async/Await in Mojo is amazing,
03:00:35.600 | 'cause we have a high-performance heterogeneous compute
03:00:37.480 | runtime underneath the covers
03:00:39.440 | that then allows non-blocking I/O,
03:00:42.680 | so you get full use of your accelerator.
03:00:45.840 | That's huge, it turns out.
03:00:47.160 | It's actually really an important part
03:00:48.560 | of fully utilizing the machine.
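The non-blocking point can be illustrated with standard Python `asyncio` (this is ordinary CPython, not Mojo's heterogeneous runtime): two awaited delays overlap instead of serializing, so total time is roughly the longest delay, not the sum.

```python
# Minimal async/await sketch: two simulated I/O waits run concurrently,
# so neither blocks the other and the event loop stays fully utilized.
import asyncio
import time

async def fetch(label, delay):
    await asyncio.sleep(delay)   # stands in for non-blocking network I/O
    return label

async def main():
    start = time.monotonic()
    # Both awaits are in flight at once via gather().
    results = await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
assert results == ["a", "b"]
assert elapsed < 0.38  # overlapped (~0.2s), not 0.4s serial
```

With blocking calls instead of `await`, the same two waits would take the sum of the delays, which is the "full use of your accelerator" point in miniature.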
03:00:50.680 | You talk about design discussions,
03:00:52.920 | that took a lot of discussions, right?
03:00:54.640 | And it probably will require more iteration.
03:00:56.800 | And so, my philosophy with Mojo is that,
03:00:59.360 | you know, we have a small team of really good people
03:01:01.240 | that are pushing forward,
03:01:02.320 | and they're very good at the extremely deep,
03:01:05.280 | knowing how the compiler and runtime
03:01:06.880 | and like all the low-level stuff works together,
03:01:09.680 | but they're not perfect.
03:01:11.440 | Same thing as the Swift team, right?
03:01:13.040 | And this is where one of the reasons
03:01:14.600 | we released Mojo much earlier is so we can get feedback.
03:01:18.000 | And we've already like renamed a keyword
03:01:20.800 | due to community feedback.
03:01:22.080 | - Which one?
03:01:23.720 | - We used an ampersand, and now it's named inout.
03:01:26.960 | We're not renaming existing Python keywords,
03:01:28.720 | 'cause that breaks compatibility, right?
03:01:30.320 | We're renaming things we're adding
03:01:32.520 | and making sure that they are designed well,
03:01:35.120 | we get usage experience,
03:01:36.640 | we iterate and work with the community,
03:01:37.920 | because again, if you scale something really fast
03:01:40.520 | and everybody writes all their code
03:01:41.360 | and they start using it in production,
03:01:42.800 | then it's impossible to change.
03:01:44.600 | And so you wanna learn from people,
03:01:46.160 | you wanna iterate and work on that early on,
03:01:48.120 | and this is where design discussions,
03:01:50.040 | it's actually quite important.
03:01:51.960 | - Could you incorporate an emoji into the language,
03:01:55.040 | into the main language?
03:01:56.280 | - I could.
03:01:57.120 | - Like a--
03:01:57.960 | - Do you have a favorite one?
03:01:59.600 | - Well, I really like, in terms of humor,
03:02:01.840 | I like ROFL, whatever,
03:02:04.160 | rolling on the floor laughing.
03:02:06.320 | So that could be like a,
03:02:07.920 | what would be the use case for that?
03:02:10.320 | Like an exception, throw an exception of some sort?
03:02:12.360 | I don't know.
03:02:13.200 | - You should totally file a feature request.
03:02:14.920 | - Or maybe a heart one, it has to be a heart one.
03:02:19.840 | - People have told me that I'm insane,
03:02:21.320 | so this is, I'm liking this.
03:02:23.720 | - I'm gonna use the viral nature of the internet
03:02:27.480 | to actually get this passed.
03:02:30.320 | - I mean, it's funny, you come back to the flame emoji,
03:02:32.160 | file extension, right?
03:02:33.440 | We have the option to use the flame emoji,
03:02:38.160 | which just even that concept,
03:02:40.120 | 'cause for example, the people at GitHub say,
03:02:42.280 | "Now I've seen everything."
03:02:43.680 | - Yeah, there's something, it's reinvigorating.
03:02:48.880 | It's like, oh, that's possible.
03:02:52.320 | That's really cool.
03:02:53.160 | For some reason, that makes everything else
03:02:55.480 | seem really exciting.
03:02:56.320 | - I think the world is ready for this stuff, right?
03:02:58.240 | And so when we have a package manager,
03:03:00.000 | we'll clearly have to innovate
03:03:01.680 | by having the compiled package thing
03:03:03.240 | be the little box with the bow on it, right?
03:03:06.400 | I mean, it has to be done.
03:03:08.760 | - It has to be done.
03:03:09.600 | Is there some stuff on the roadmap
03:03:11.280 | that you're particularly stressed about
03:03:13.760 | or excited about that you're thinking about a lot?
03:03:16.000 | - I mean, as of today's snapshot,
03:03:18.160 | which will be obviously tomorrow,
03:03:19.960 | the Lifetime stuff is really exciting.
03:03:21.640 | And so Lifetimes give you safe references to memory
03:03:25.800 | without dangling pointers.
03:03:27.760 | And so this has been done in languages like Rust before,
03:03:29.920 | and so we have a new approach, which is really cool.
03:03:31.760 | I'm very excited about that.
03:03:32.840 | That'll be out to the community very soon.
03:03:35.600 | The traits feature is really a big deal,
03:03:38.800 | and so that's blocking a lot of API design.
03:03:41.120 | And so there's that.
03:03:42.240 | I think that's really exciting.
03:03:43.800 | A lot of it is these kind of table stakes features.
03:03:48.880 | One of the things that is, again,
03:03:50.640 | also lessons learned with Swift
03:03:52.440 | is that programmers in general
03:03:56.480 | like to add syntactic sugar.
03:03:57.940 | And so it's like, oh, well, this annoying thing,
03:04:01.280 | like in Python, you have to spell underbar underbar add.
03:04:05.160 | Why can't I just use plus?
03:04:07.280 | Def plus, come on.
03:04:08.400 | Why can't I just do that, right?
03:04:09.440 | And so trivial bit of syntactic sugar,
03:04:11.520 | it makes sense, it's beautiful, it's obvious.
03:04:14.080 | We're trying not to do that.
03:04:16.440 | And so for two different reasons,
03:04:18.960 | one of which is that, again, lesson learned with Swift,
03:04:21.560 | Swift has a lot of syntactic sugar,
03:04:23.480 | which may be a good thing, maybe not, I don't know.
03:04:28.120 | But because it's such an easy and addictive thing to do,
03:04:31.840 | sugar, like, things can get crazy, right?
03:04:35.640 | Like the community will really dig into that
03:04:37.680 | and wanna do a lot of that.
03:04:38.640 | And I think it's very distracting
03:04:40.000 | from building the core abstractions.
03:04:42.040 | The second is we wanna be a good member
03:04:43.400 | of the Python community, right?
03:04:46.760 | And so we wanna work with the broader Python community
03:04:49.880 | and yeah, we're pushing forward
03:04:52.000 | a bunch of systems programming features
03:04:53.520 | and we need to build them out to understand them.
03:04:55.440 | But once we get a long ways forward,
03:04:57.640 | I wanna make sure that we go back to the Python community
03:04:59.640 | and say, okay, let's do some design reviews.
03:05:01.280 | Let's actually talk about this stuff.
03:05:02.400 | Let's figure out how we want this stuff
03:05:04.000 | all to work together.
03:05:04.840 | And syntactic sugar just makes all that more complicated.
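The "underbar underbar add" point refers to how Python already spells operator overloading: defining `__add__` is what makes `+` work on a type, and the hypothetical `def plus` sugar would just be another spelling of it.

```python
# Python's existing convention: the + operator dispatches to __add__.
# The syntactic sugar being resisted would let you write something like
# "def +" instead, but the dunder spelling is what Python defines today.

class Vec2:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):   # this is what `v + w` calls
        return Vec2(self.x + other.x, self.y + other.y)

v = Vec2(1, 2) + Vec2(3, 4)
assert (v.x, v.y) == (4, 6)
```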
03:05:08.000 | - And yeah, list comprehensions are yet to be implemented.
03:05:12.640 | And my favorite, I mean, dictionaries.
03:05:15.760 | - Yeah, there's some basic stuff, 0.1.
03:05:19.600 | - 0.1.
03:05:20.680 | - Yeah, but nonetheless,
03:05:21.520 | it's actually still quite interesting and useful.
03:05:23.520 | - As you've mentioned, modular is very new.
03:05:26.180 | Mojo is very new.
03:05:28.800 | It's a relatively small team.
03:05:31.000 | There's building up this gigantic stack,
03:05:35.040 | this incredible stack that's going to perhaps define
03:05:38.040 | the future of development of our AI overlords.
03:05:43.040 | - We just hope it will be useful.
03:05:45.360 | - As do all of us.
03:05:48.720 | So what have you learned from this process
03:05:52.800 | of building up a team?
03:05:54.080 | Maybe one question is, how do you hire great programmers,
03:05:59.120 | great people that operate in this compiler,
03:06:03.880 | hardware, machine learning, software,
03:06:08.760 | interface design space,
03:06:10.960 | and maybe are a little bit fluid in what they can do?
03:06:14.040 | - So, okay, so--
03:06:15.000 | - Language design, too.
03:06:16.400 | - So building a company is just as interesting
03:06:18.480 | in different ways as building a language.
03:06:22.000 | Like different skill sets, different things,
03:06:23.440 | but super interesting.
03:06:24.840 | And I've built a lot of teams in a lot of different places.
03:06:27.840 | If you zoom in from the big problem into recruiting,
03:06:30.700 | well, so here's our problem, okay?
03:06:33.880 | I'll just, I'll be very straightforward about this.
03:06:36.540 | We started modular with a lot of conviction
03:06:38.880 | about we understand the problems,
03:06:40.560 | we understand the customer pain points,
03:06:42.360 | we need to work backwards from the suffering
03:06:44.880 | in the industry, and if we solve those problems,
03:06:46.840 | we think it'll be useful for people.
03:06:49.240 | But the problem is, is that the people we need to hire,
03:06:51.440 | as you say, are all these super specialized people
03:06:54.320 | that have jobs at big tech, big tech worlds, right?
03:06:58.680 | And I don't think we have product market fit
03:07:02.560 | in the way that a normal startup does,
03:07:04.840 | or we don't have product market fit challenges,
03:07:07.600 | because right now everybody's using AI,
03:07:09.560 | and so many of them are suffering, and they want help.
03:07:12.000 | And so, again, we started with strong conviction.
03:07:14.600 | Now, again, you have to hire and recruit the best,
03:07:17.440 | and the best all have jobs.
03:07:18.720 | And so what we've done is we've said,
03:07:20.120 | okay, well, let's build an amazing culture.
03:07:22.280 | Start with that.
03:07:24.020 | That's usually not something a company starts with.
03:07:26.020 | Usually you hire a bunch of people,
03:07:27.520 | and then people start fighting,
03:07:29.040 | and it turns into a gigantic mess,
03:07:31.240 | and then you try to figure out
03:07:32.160 | how to improve your culture later.
03:07:34.120 | My co-founder, Tim, in particular,
03:07:35.440 | is super passionate about making sure that that's right,
03:07:38.100 | and we've spent a lot of time early on
03:07:40.080 | to make sure that we can scale.
03:07:41.760 | - Can you comment, sorry, before we get to the second,
03:07:44.200 | what makes for a good culture?
03:07:45.700 | - So, I mean, there's many different cultures,
03:07:48.320 | and I have learned many things from many different people.
03:07:50.160 | - You worked with several very unique,
03:07:53.480 | almost famously unique cultures.
03:07:55.180 | - And some of them I learned what to do,
03:07:56.700 | and some of them I learned what not to do.
03:07:58.900 | - Yep.
03:07:59.740 | - Okay, and so we want an inclusive culture.
03:08:03.220 | I believe in amazing people working together.
03:08:09.020 | And so I've seen cultures where people,
03:08:10.800 | you have amazing people, and they're fighting each other.
03:08:13.300 | I see amazing people, and they're told what to do.
03:08:16.500 | Like, "Thou shalt line up and do what I say.
03:08:18.740 | "It doesn't matter if it's the right thing.
03:08:20.020 | "Do it."
03:08:20.860 | - Mm-hmm. - Right?
03:08:21.700 | And neither of these is the,
03:08:23.100 | and I've seen people that have no direction.
03:08:24.860 | They're just kind of floating in different places.
03:08:27.180 | And they wanna be amazing, they just don't know how.
03:08:29.300 | And so a lot of it starts with have a clear vision.
03:08:32.060 | Right, and so we have a clear vision of what we're doing.
03:08:35.160 | And so I kind of grew up at Apple
03:08:37.940 | in my engineering life, right?
03:08:39.940 | And so a lot of the Apple DNA rubbed off on me.
03:08:43.580 | My co-founder, Tim, also is like a strong product guy.
03:08:46.100 | And so what we learned is, you know,
03:08:48.260 | I saw at Apple that you don't work
03:08:49.860 | from building cool technology.
03:08:52.220 | You don't work from, like, come up with a cool product
03:08:54.740 | and think about the features you'll have
03:08:56.060 | and the big check boxes and stuff like this.
03:08:58.460 | 'Cause if you go talk to customers,
03:08:59.540 | they don't actually care about your product.
03:09:01.700 | They don't care about your technology.
03:09:03.260 | What they care about is their problems.
03:09:05.220 | - Mm-hmm. - Right?
03:09:06.820 | And if your product can help solve their problems,
03:09:09.460 | well, hey, they might be interested in that.
03:09:11.300 | - Mm-hmm. - Right?
03:09:12.140 | And so if you speak to them about their problems,
03:09:13.300 | if you understand and you have compassion,
03:09:14.700 | you understand what people are working with,
03:09:17.000 | then you can work backwards to building an amazing product.
03:09:19.640 | - So the vision starts by defining the problem.
03:09:21.780 | - And then you can work backwards in solving technology.
03:09:24.100 | - Got it. - And at Apple,
03:09:25.060 | like it's, I think, pretty famously said that,
03:09:27.300 | you know, for every, you know,
03:09:30.300 | there's 100 no's for every yes.
03:09:32.020 | I would refine that to say that there's 100 not yet's
03:09:35.260 | for every yes. - Yeah.
03:09:36.580 | - But famously, if you go back to the iPhone, for example,
03:09:39.660 | right, the iPhone 1, I mean, many people laughed at it
03:09:42.940 | because it didn't have 3G, it didn't have copy and paste.
03:09:46.660 | Right?
03:09:47.500 | And then a year later, okay, finally it has 3G,
03:09:50.340 | but it still doesn't have copy and paste, it's a joke.
03:09:52.500 | Nobody will ever use this product,
03:09:53.740 | blah, blah, blah, blah, blah, blah, blah, right?
03:09:55.820 | Well, year three, it had copy and paste
03:09:58.220 | and people stopped talking about it, right?
03:10:00.060 | And so being laser focused and having conviction
03:10:03.820 | and understanding what the core problems are
03:10:06.300 | and giving the team the space
03:10:07.620 | to be able to build the right tech is really important.
03:10:10.700 | Also, I mean, you come back to recruiting,
03:10:13.980 | you have to pay well, right?
03:10:16.020 | So we have to pay industry leading salaries
03:10:17.540 | and have good benefits and things like this,
03:10:19.220 | that's a big piece.
03:10:20.700 | We're a remote first company.
03:10:22.700 | And so we have to,
03:10:24.080 | so remote first has a very strong set of pros and cons.
03:10:31.040 | On the one hand, you can hire people from wherever they are
03:10:34.180 | and you can attract amazing talent,
03:10:35.940 | even if they live in strange places or unusual places.
03:10:39.420 | On the other hand, you have time zones.
03:10:42.100 | On the other hand, you have like everybody
03:10:44.740 | on the internet will fight
03:10:46.180 | if they don't understand each other.
03:10:47.700 | And so we've had to learn how to like have a system
03:10:50.340 | where we actually fly people in
03:10:51.540 | and we get the whole company together periodically.
03:10:53.660 | And then we get work groups together
03:10:55.100 | and we plan and execute together.
03:10:56.700 | - And there's like an intimacy
03:10:58.340 | to the in-person brainstorming that I guess you lose,
03:11:01.740 | but maybe you don't, maybe if you get to know each other well
03:11:04.620 | and you trust each other, maybe you can do that.
03:11:06.660 | - Well, so when the pandemic first hit,
03:11:08.340 | I mean, I'm curious about your experience too.
03:11:09.980 | The first thing I missed was having whiteboards, right?
03:11:13.860 | Those design discussions where like I can high intensity,
03:11:17.820 | work through things, get things done,
03:11:19.900 | work through the problem of the day,
03:11:21.060 | understand where you're on,
03:11:22.340 | figure out and solve the problem and move forward.
03:11:24.840 | But we figured out ways to work around that now
03:11:29.180 | with all these screen sharing
03:11:31.780 | and other things like that that we do.
03:11:33.300 | The thing I miss now is sitting down
03:11:35.660 | at a lunch table with the team.
03:11:37.660 | The spontaneous things like the coffee bar things
03:11:42.060 | and the bumping into each other
03:11:44.100 | and getting to know people outside of the transactional
03:11:47.060 | solve a problem over Zoom thing.
03:11:49.580 | - And I think there's just a lot of stuff that,
03:11:52.660 | I'm not an expert at this, I don't know who is.
03:11:54.960 | Hopefully there's some people,
03:11:56.420 | but there's stuff that somehow is missing on Zoom.
03:11:59.740 | Even with the whiteboard, if you look at that,
03:12:02.940 | if you have a room with one person at the whiteboard
03:12:05.540 | and then there's like three other people at a table,
03:12:10.140 | there's a, first of all, there's a social aspect to that
03:12:13.220 | where you're just shooting the shit a little bit,
03:12:14.940 | almost like--
03:12:15.780 | - Yeah, as people are just kind of coming in and--
03:12:17.620 | - Yeah, that, but also while,
03:12:21.580 | it's a breakout discussion that happens
03:12:23.500 | for like seconds at a time, maybe an inside joke.
03:12:27.660 | It's like this interesting dynamic that happens that Zoom--
03:12:30.500 | - And you're bonding, yeah.
03:12:31.500 | - You're bonding, you're bonding,
03:12:32.740 | but through that bonding, you get the excitement.
03:12:35.780 | There's certain ideas that are like complete bullshit
03:12:38.580 | and you'll see that in the faces of others
03:12:41.180 | that you won't see necessarily on Zoom.
03:12:43.580 | It feels like that should be possible to do
03:12:49.220 | without being in person.
03:12:50.740 | - Well, I mean, being in person is a very different thing.
03:12:54.020 | It's worth it, but you can't always do it.
03:12:56.380 | And so again, we're still learning,
03:12:58.500 | we're all still learning as humanity
03:13:01.300 | with this new reality, right?
03:13:03.020 | But what we found is that getting people together,
03:13:06.220 | whether it be a team or the whole company or whatever,
03:13:08.620 | is worth the expense because people work together
03:13:11.140 | and are happier after that.
03:13:13.340 | Like it just, like there's a massive period of time
03:13:16.460 | where you like go out and things start getting frayed,
03:13:18.660 | pull people together, and then you realize
03:13:20.780 | that we're all working together,
03:13:22.020 | we see things the same way,
03:13:23.020 | we work through the disagreement or the misunderstanding,
03:13:24.940 | we're talking across each other,
03:13:26.300 | and then you work much better together.
03:13:28.180 | And so things like that, I think are really quite important.
03:13:30.740 | - What about people that are kind of specialized
03:13:33.900 | in very different aspects of the stack working together?
03:13:36.300 | What are some interesting challenges there?
03:13:38.380 | - Yeah, well, so I mean, there's lots of interesting people,
03:13:40.580 | as you can tell, I'm hard to deal with too.
03:13:43.020 | (laughing)
03:13:44.180 | - But you're one of the most lovable people.
03:13:46.380 | - So one of the, so there's different philosophies
03:13:50.660 | in building teams.
03:13:51.860 | For me, and so some people say, "Hire 10X programmers,"
03:13:56.100 | and that's the only thing, whatever that means, right?
03:13:58.940 | What I believe in is building well-balanced teams,
03:14:02.540 | teams that have people that are different in them.
03:14:05.420 | Like if you have all generals and no troops,
03:14:08.260 | or all troops and no generals,
03:14:09.620 | or you have all people that think in one way
03:14:12.340 | and not the other way,
03:14:13.180 | what you get is you get a very biased
03:14:14.900 | and skewed and weird situation
03:14:16.440 | where people end up being unhappy.
03:14:18.140 | And so what I like to do is I like to build teams
03:14:20.460 | of people where they're not all the same.
03:14:22.780 | You know, we do have teams that are focused on like runtime
03:14:25.780 | or compiler, GPU, or whatever the speciality is,
03:14:29.520 | but people bring a different take
03:14:30.960 | and have a different perspective.
03:14:32.660 | And I look for people that complement each other.
03:14:35.060 | And particularly if you look at leadership teams
03:14:37.260 | and things like this,
03:14:38.100 | you don't want everybody thinking the same way.
03:14:40.700 | You want people bringing different perspectives
03:14:42.660 | and experiences.
03:14:43.740 | And so I think that's really important.
03:14:45.460 | - That's team, but what about building a company
03:14:48.020 | as ambitious as Modular?
03:14:49.540 | So what, are there some interesting lessons there?
03:14:53.020 | - Oh, I mean, so many.
03:14:53.900 | Like, so one of the things I love about,
03:14:56.660 | okay, so Modular's the first company I built from scratch.
03:15:00.060 | One of the first things that was profound
03:15:05.060 | was I'm not cleaning up somebody else's mess.
03:15:07.340 | Right, and so if you look at--
03:15:09.540 | - That's liberating to some degree.
03:15:10.980 | - It's super liberating.
03:15:11.900 | And also many of the projects I've built in the past
03:15:16.900 | have not been core to the product of the company.
03:15:20.260 | Swift is not Apple's product, right?
03:15:23.740 | MLIR is not Google's revenue machine or whatever, right?
03:15:28.060 | It's important, but it's like working
03:15:31.120 | on the accounting software for the retail giant
03:15:35.400 | or something, right?
03:15:36.240 | It's like enabling infrastructure and technology.
03:15:39.640 | And so at Modular, the tech we're building
03:15:42.220 | is here to solve people's problems.
03:15:45.000 | Like, it is directly the thing that we're giving to people.
03:15:47.240 | And so this is a really big difference.
03:15:49.680 | And what it means for me as a leader,
03:15:51.360 | but also for many of our engineers
03:15:53.440 | is they're working on the thing that matters.
03:15:55.760 | And that's actually pretty, I mean, again,
03:15:57.520 | for compiler people and things like that,
03:15:59.400 | that's usually not the case, right?
03:16:01.720 | And so that's also pretty exciting and quite nice.
03:16:04.880 | But one of the ways that this manifests
03:16:09.060 | is it makes it easier to make decisions.
03:16:11.300 | And so one of the challenges I've had in other worlds
03:16:13.840 | is it's like, okay, well, community matters somehow
03:16:18.240 | for the goodness of the world,
03:16:19.960 | or open source matters theoretically,
03:16:21.800 | but I don't wanna pay for a T-shirt, right?
03:16:25.840 | Or some swag.
03:16:26.680 | Like, well, T-shirts cost 10 bucks each.
03:16:28.640 | You can have 100 T-shirts for $1,000.
03:16:31.400 | To a megacorp, $1,000 is uncountably,
03:16:34.800 | can't count that low, right?
03:16:37.060 | But justifying it and getting a T-shirt,
03:16:39.320 | by the way, if you'd like a T-shirt,
03:16:40.440 | I can give you a T-shirt.
03:16:41.280 | - Oh, I would 100% like a T-shirt.
03:16:44.400 | Are you joking?
03:16:45.400 | - You can have a Fire Emoji T-shirt.
03:16:47.600 | - I will treasure this.
03:16:50.840 | - Is that a good thing?
03:16:51.680 | - I will pass it down to my grandchildren.
03:16:53.240 | - And so it's very liberating to be able to decide.
03:16:56.000 | I think that Lex should have a T-shirt.
03:16:57.960 | (laughing)
03:16:58.880 | Right?
03:16:59.720 | And it becomes very simple, 'cause I like Lex.
03:17:02.480 | - This is awesome.
03:17:05.760 | So I have to ask you about the,
03:17:12.280 | one of the interesting developments
03:17:15.340 | with large language models
03:17:17.040 | is that they're able to generate code recently really well.
03:17:25.160 | - Yes.
03:17:26.000 | - To a degree that maybe,
03:17:27.560 | I don't know if you understand,
03:17:30.000 | but I struggle to understand
03:17:32.380 | because it forces me to ask questions
03:17:34.800 | about the nature of programming,
03:17:36.760 | of the nature of thought,
03:17:39.400 | because the language models are able to predict
03:17:42.760 | the kind of code I was about to write so well
03:17:45.880 | that it makes me wonder how unique my brain is
03:17:48.520 | and where the valuable ideas actually come from.
03:17:50.960 | How much do I contribute in terms of ingenuity,
03:17:55.960 | innovation to code I write,
03:17:58.680 | or design and that kind of stuff?
03:18:00.460 | When you stand on the shoulders of giants,
03:18:03.240 | are you really doing anything?
03:18:04.320 | And what LLMs are helping you do
03:18:06.840 | is they help you stand on the shoulders of giants
03:18:09.400 | in your program.
03:18:10.320 | There's mistakes.
03:18:11.520 | They're interesting that you learn from,
03:18:12.860 | but I just, it would love to get your opinion first,
03:18:15.860 | high level of what you think about this impact
03:18:19.840 | of larger language models when they do program synthesis,
03:18:22.880 | when they generate code.
03:18:24.520 | - Yeah, well, so I don't know where it all goes.
03:18:28.880 | - Yeah.
03:18:30.160 | - I'm an optimist and I'm a human optimist.
03:18:32.800 | I think that things I've seen are that a lot of the LLMs
03:18:35.960 | are really good at crushing leet code projects
03:18:38.700 | and they can reverse the linked list like crazy.
03:18:41.760 | Well, it turns out there's a lot of instances
03:18:44.520 | of that on the internet and it's a pretty stock thing.
03:18:46.740 | And so if you want to see standard questions answered,
03:18:50.480 | LLMs can memorize all the answers and that can be amazing.
03:18:52.860 | And also they do generalize out from that.
03:18:55.060 | And so there's good work on that.
03:18:56.760 | But I think that if you, in my experience, building things,
03:19:01.120 | building something like you talk about Mojo,
03:19:03.400 | or you talk about these things,
03:19:04.700 | or you talk about building an applied solution to a problem,
03:19:07.780 | it's also about working with people.
03:19:10.340 | It's about understanding the problem.
03:19:11.400 | What is the product that you wanna build?
03:19:13.060 | What are the use case?
03:19:13.900 | What are the customers?
03:19:15.240 | You can't just go survey all the customers
03:19:16.800 | because they'll tell you that they want a faster horse.
03:19:19.960 | Maybe they need a car.
03:19:21.560 | And so a lot of it comes into,
03:19:23.120 | I don't feel like we have to compete with LLMs.
03:19:26.440 | I think they'll help automate
03:19:27.600 | a ton of the mechanical stuff out of the way.
03:19:30.720 | And just like I think we all try to scale
03:19:32.880 | through delegation and things like this,
03:19:34.800 | delegating rote things to an LLM,
03:19:36.880 | I think is an extremely valuable approach
03:19:40.200 | that will help us all scale and be more productive.
03:19:42.400 | - But I think it's a fascinating companion.
03:19:45.000 | But I'd say I don't think that that means
03:19:46.320 | that we're gonna be done with coding.
03:19:47.800 | - Sure, sure.
03:19:49.000 | But there's power in it as a companion.
03:19:52.640 | And from there, I would love to zoom in
03:19:55.400 | onto Mojo a little bit.
03:19:56.880 | Do you think about that?
03:19:59.520 | Do you think about LLMs generating Mojo code
03:20:02.400 | and helping sort of,
03:20:05.320 | like when you design a new programming language,
03:20:07.640 | it almost seems like, man, it would be nice to sort of,
03:20:13.400 | almost as a way to learn how I'm supposed to use this thing
03:20:16.680 | for them to be trained on some of the Mojo code.
03:20:19.840 | - So I do lead an AI company,
03:20:21.120 | so maybe there'll be a Mojo LLM at some point.
03:20:25.800 | But if your question is like,
03:20:27.800 | how do we make a language to be suitable for LLMs?
03:20:31.040 | I think that the cool thing about LLMs
03:20:34.880 | is you don't have to.
03:20:35.920 | (laughs)
03:20:36.760 | Right?
03:20:37.600 | And so if you look at what is English
03:20:39.480 | or any of these other terrible languages
03:20:41.160 | that we as humans deal with on a continuous basis,
03:20:43.360 | they're never designed for machines.
03:20:45.360 | And yet they're the intermediate representation.
03:20:48.360 | They're the exchange format that we humans use
03:20:50.520 | to get stuff done, right?
03:20:52.360 | And so these programming languages,
03:20:53.640 | they're an intermediate representation
03:20:55.480 | between the human and the computer
03:20:57.560 | or the human and the compiler, roughly, right?
03:21:00.640 | And so I think the LLMs will have no problem
03:21:03.240 | learning whatever keyword we pick.
03:21:05.560 | - Maybe the fire emoji is gonna--
03:21:07.600 | - Oh, maybe that's gonna break it.
03:21:08.680 | It doesn't tokenize?
03:21:09.520 | - No, the reverse of that, it will actually enable it
03:21:12.360 | because one of the issues I could see
03:21:14.440 | with being a superset of Python
03:21:16.560 | is there would be confusion about the gray area.
03:21:19.440 | So it would be mixing stuff.
03:21:21.040 | But--
03:21:23.600 | - Well, I'm a human optimist.
03:21:24.440 | I'm also an LLM optimist.
03:21:25.720 | I think that we'll solve that problem.
03:21:26.920 | - We'll solve it, yeah.
03:21:27.920 | - But you look at that and you say,
03:21:30.920 | okay, well, reducing the rote thing, right?
03:21:34.640 | It turns out compilers are very particular
03:21:36.360 | and they really want things,
03:21:38.000 | they really want the indentation to be right.
03:21:39.480 | They really want the colon to be there
03:21:41.040 | on your else or else it'll complain, right?
03:21:43.200 | I mean, compilers can do better at this,
03:21:45.560 | but LLMs can totally help solve that problem.
03:21:48.640 | And so I'm very happy about the new predictive coding
03:21:51.800 | and copilot type features and things like this
03:21:53.760 | because I think it'll all just make us more productive.
03:21:55.920 | - It's still messy and fuzzy and uncertain, unpredictable.
03:21:59.440 | So, but is there a future you see
03:22:02.000 | given how big of a leap GPT-4 was
03:22:05.520 | where you start to see something like LLMs
03:22:08.080 | inside a compiler?
03:22:11.120 | Or no?
03:22:11.960 | - I mean, you could do that.
03:22:12.960 | Yeah, absolutely.
03:22:13.800 | I mean, I think that'd be interesting.
03:22:15.160 | - Is that wise?
03:22:16.920 | - Well, I mean, it would be very expensive.
03:22:19.280 | So compilers run fast and they're very efficient
03:22:21.960 | and LLMs are currently very expensive.
03:22:24.000 | There's on-device LLMs and there's other things going on.
03:22:26.280 | And so maybe there's an answer there.
03:22:29.080 | I think that one of the things
03:22:30.040 | that I haven't seen enough of is that,
03:22:33.440 | so LLMs to me are amazing when you tap
03:22:35.480 | into the creative potential of the hallucinations, right?
03:22:40.360 | And so if you're doing creative brainstorming
03:22:43.200 | or creative writing or things like that,
03:22:44.600 | the hallucinations work in your favor.
03:22:46.700 | If you're writing code that has to be correct
03:22:50.000 | 'cause you're gonna ship it in production,
03:22:51.240 | then maybe that's not actually a feature.
03:22:53.920 | And so I think that there has been research
03:22:56.640 | and there has been work on building
03:22:58.720 | algebraic reasoning systems and kind of like figuring out
03:23:02.600 | more things that feel like proofs.
03:23:05.040 | And so I think that there could be interesting work
03:23:06.920 | in terms of building more reliable at scale systems.
03:23:10.080 | And that could be interesting.
03:23:11.360 | But if you chase that rabbit hole down,
03:23:13.640 | the question then becomes,
03:23:14.560 | how do you express your intent to the machine?
03:23:16.800 | And so maybe you want an LLM to provide the spec,
03:23:19.220 | but you have a different kind of net
03:23:21.200 | that then actually implements the code.
03:23:23.880 | - Right, so it's the use of documentation
03:23:26.400 | and inspiration versus the actual implementation.
03:23:30.560 | - Yeah, potentially.
03:23:31.560 | - Since a successful Modular will be the thing that runs,
03:23:37.440 | I say so jokingly, our AI overlords,
03:23:40.840 | but AI systems that are used across,
03:23:43.840 | I know it's a cliche term, but the internet of things.
03:23:47.520 | So across--
03:23:48.360 | - So I'll joke and say like AGI should be written in Mojo.
03:23:52.120 | - Yeah, AGI should be written in Mojo.
03:23:54.240 | You're joking, but it's also possible that it's not a joke.
03:23:57.240 | That a lot of the ideas behind Mojo
03:24:00.840 | seem like the natural set of ideas
03:24:04.560 | that would enable at-scale training
03:24:07.000 | and inference of AI systems.
03:24:09.880 | So I just have to ask you about the big philosophical
03:24:13.160 | question about human civilization.
03:24:15.000 | So folks like Eliezer Yudkowsky
03:24:17.720 | are really concerned about the threat of AI.
03:24:20.680 | Do you think about the good and the bad that can happen
03:24:25.680 | at scale deployment of AI systems?
03:24:29.760 | - Well, so I've thought a lot about it,
03:24:31.200 | and there's a lot of different parts to this problem.
03:24:33.960 | Everything from job displacement to Skynet,
03:24:37.680 | things like this.
03:24:38.520 | And so you can zoom into sub parts of this problem.
03:24:41.040 | I'm not super optimistic about AGI being solved next year.
03:24:47.640 | I don't think that's gonna happen personally.
03:24:50.160 | - So you have a kind of Zen-like calm about it?
03:24:53.920 | There's a nervousness because the leap of GPT-4
03:24:57.120 | seemed so big.
03:24:59.040 | - Sure, it's huge.
03:24:59.880 | - It's like we're almost,
03:25:02.040 | there's some kind of transitionary period.
03:25:03.920 | You're thinking--
03:25:04.960 | - Well, so I mean, there's a couple of things going on there.
03:25:07.760 | One is I'm sure GPT-5 and 7 and 19 will be also huge leaps.
03:25:12.760 | They're also getting much more expensive to run.
03:25:16.120 | And so there may be a limiting function
03:25:17.640 | in terms of just the expense to run and train.
03:25:20.360 | Like that could be a limiter that slows things down.
03:25:23.400 | But I think the bigger limiter is outside of like,
03:25:26.760 | Skynet takes over and I don't spend any time
03:25:29.320 | thinking about that because if Skynet takes over
03:25:30.880 | and kills us all, then I'll be dead.
03:25:32.120 | So I don't worry about that.
03:25:34.040 | So, you know, I mean, that's just,
03:25:36.360 | okay, there are other things to worry about,
03:25:38.800 | so I'll focus on those and not worry about that one.
03:25:41.160 | But I think that the other thing I'd say is that
03:25:44.760 | AI moves quickly, but humans move slowly
03:25:47.600 | and we adapt slowly.
03:25:48.840 | And so what I expect to happen is,
03:25:50.640 | just like any technology diffusion,
03:25:52.840 | like the promise and then the application
03:25:56.600 | takes time to roll out.
03:25:58.680 | And so I think that I'm not even too worried about
03:26:01.880 | autonomous cars displacing all the taxi drivers.
03:26:04.720 | Remember, autonomy was supposed to be solved by 2020.
03:26:07.440 | - Boy, do I remember.
03:26:08.720 | - So, and so like, I think that on the one hand,
03:26:12.480 | we can see amazing progress, but on the other hand,
03:26:14.920 | we can see that, you know, the reality is a little bit
03:26:18.320 | more complicated and it may take longer to roll out
03:26:20.440 | than you might expect.
03:26:22.080 | - Well, that's in the physical space.
03:26:23.220 | I do think in the digital space it's the stuff
03:26:26.760 | that's built on top of LLMs, you know,
03:26:31.100 | the millions of apps that could be built on top of them
03:26:34.120 | and that could run on millions of devices,
03:26:36.200 | millions of types of devices.
03:26:37.920 | I just think that the rapid effect it has
03:26:43.800 | on human civilization could be truly transformative to it.
03:26:47.840 | Yeah, we don't even know.
03:26:49.520 | - Well, and so, and there I think it depends on
03:26:52.080 | are you an optimist or a pessimist or a masochist?
03:26:54.800 | - Just to clarify, optimist about human civilization.
03:27:00.400 | - Me too.
03:27:01.300 | And so I look at that as saying, okay, cool, what will AI do?
03:27:04.480 | Right, and so some people say, oh my God,
03:27:06.380 | is it gonna destroy us all?
03:27:07.220 | How do we prevent that?
03:27:08.620 | I kind of look at it from a, is it gonna unlock us all?
03:27:12.780 | Right, you talk about coding, is it gonna make it
03:27:14.180 | so I don't have to do all the repetitive stuff?
03:27:16.780 | Well, suddenly that's a very optimistic way to look at it
03:27:18.940 | and you look at what a lot of these technologies
03:27:22.340 | have done to improve our lives and I want that to go faster.
03:27:25.680 | - What do you think the future of programming looks like
03:27:29.140 | for the next 10, 20, 30, 50 years?
03:27:31.520 | With LLMs and with Mojo, with modular,
03:27:37.460 | like the vision for devices, the hardware to the compilers,
03:27:41.580 | to the different stacks of software?
03:27:44.780 | - Well, so what I want, I mean, coming back
03:27:46.460 | to my arch nemesis, right, it's complexity, right?
03:27:49.140 | So again, me being the optimist, if we drive down complexity
03:27:54.140 | we can make these tools, these technologies,
03:27:56.360 | these cool hardware widgets accessible to way more people.
03:27:59.780 | Right, and so what I'd love to see
03:28:00.780 | is more personalized experiences, more things,
03:28:04.380 | the research getting into production
03:28:05.860 | instead of being lost at NeurIPS, right?
03:28:08.460 | And so, and like these things that impact people's lives
03:28:13.180 | by entering products.
03:28:15.220 | And so one of the things that I'm a little bit concerned
03:28:17.180 | about is right now the big companies are investing
03:28:21.060 | huge amounts of money and are driving the top line
03:28:23.460 | of AI capability forward really quickly.
03:28:26.040 | But if it means that you have to have $100 million
03:28:28.740 | to train a model or more, $100 billion, right,
03:28:32.260 | well, that's gonna make it very concentrated
03:28:34.940 | with very few people in the world
03:28:37.100 | that can actually do this stuff.
03:28:38.580 | I would much rather see lots of people across the industry
03:28:43.100 | be able to participate and use this, right?
03:28:45.260 | And you look at this, you know, I mean,
03:28:46.740 | a lot of great research has been done in the health world
03:28:49.660 | and looking at, like detecting pathologies
03:28:52.820 | and doing radiology with AI and like doing all these things.
03:28:56.600 | Well, the problem today is that to deploy
03:28:58.960 | and build these systems, you have to be an expert
03:29:00.440 | in radiology and an expert in AI.
03:29:04.060 | And if we can break down the barriers
03:29:06.560 | so that more people can use AI techniques,
03:29:09.000 | and it's more like programming Python,
03:29:11.500 | which roughly everybody can do if they want to, right?
03:29:15.060 | Then I think that we'll get a lot more practical application
03:29:17.680 | of these techniques in a lot more niche,
03:29:20.880 | cool, but narrower domains.
03:29:22.440 | I think that's gonna be really cool.
03:29:24.280 | - Do you think we'll have more or less programmers
03:29:26.560 | in the world than now?
03:29:28.620 | - Well, so I think we'll have more programmers,
03:29:31.560 | but they may not consider themselves to be programmers.
03:29:33.800 | - That'd be a different name for it.
03:29:34.920 | - Right, I mean, do you consider somebody that uses,
03:29:37.240 | you know, I think that arguably
03:29:38.280 | the most popular programming language is Excel.
03:29:42.720 | - Yeah.
03:29:44.160 | - Right? - Yes, yep.
03:29:45.160 | - And so do they consider themselves to be programmers?
03:29:47.520 | Maybe not.
03:29:48.360 | I mean, some of them make crazy macros and stuff like that,
03:29:51.040 | but, as you mentioned, Steve Jobs,
03:29:56.040 | it's the bicycle for the mind that allows you to go faster.
03:30:00.360 | Right, and so I think that as we look forward, right,
03:30:03.240 | what is AI?
03:30:04.080 | I look at it as hopefully a new programming paradigm.
03:30:06.960 | It's like object-oriented programming, right?
03:30:09.400 | If you want to write a cat detector,
03:30:10.600 | you don't use for loops.
03:30:12.360 | It turns out that's not the right tool for the job, right?
03:30:14.640 | And so right now, unfortunately,
03:30:16.600 | because I mean, it's not unfortunate,
03:30:18.400 | but it's just kind of where things are.
03:30:20.080 | AI is this weird, different thing
03:30:22.560 | that's not integrated into programming languages
03:30:24.960 | and normal tool chains,
03:30:25.840 | and all the technology is really weird
03:30:27.960 | and doesn't work right, and you have to babysit it,
03:30:30.160 | and every time you switch hardware, it's different.
03:30:33.040 | It shouldn't be that way.
03:30:34.520 | When you change that, when you fix that,
03:30:35.840 | suddenly, again, the tools and technologies
03:30:37.800 | can be way easier to use.
03:30:39.400 | You can start using them for many more things.
03:30:41.040 | And so that's why I would be excited about it.
03:30:43.840 | - What kind of advice could you give
03:30:44.920 | to somebody in high school right now,
03:30:46.360 | or maybe early college, who's curious about programming
03:30:50.000 | and feeling like the world is changing really quickly here?
03:30:55.560 | - Yeah, well--
03:30:56.400 | - What kind of stuff to learn?
03:30:57.760 | What kind of stuff to work on?
03:31:00.000 | Should they finish college?
03:31:01.080 | Should they go work at a company?
03:31:03.280 | Should they build a thing?
03:31:04.440 | What do you think?
03:31:05.600 | - Well, so, I mean, one of the things I'd say
03:31:07.000 | is that you'll be most successful
03:31:09.600 | if you work on something you're excited by.
03:31:12.440 | And so don't get the book and read the book
03:31:15.600 | cover to cover and study and memorize
03:31:17.320 | and recite and flashcard.
03:31:19.160 | Go build something.
03:31:20.560 | Like, go solve a problem.
03:31:21.440 | Go build the thing that you want to exist.
03:31:23.120 | Go build an app.
03:31:24.360 | Go build, train a model.
03:31:26.760 | Like, go build something and actually use it
03:31:28.720 | and set a goal for yourself.
03:31:29.800 | And if you do that, then you'll,
03:31:32.200 | you know, there's a success.
03:31:33.440 | There's the adrenaline rush.
03:31:34.520 | There's the achievement.
03:31:35.340 | There's the unlock that I think is where,
03:31:37.880 | you know, if you keep setting goals
03:31:39.120 | and you keep doing things and building things,
03:31:41.160 | learning by building is really powerful.
03:31:43.160 | In terms of career advice,
03:31:45.440 | I mean, everybody's different.
03:31:46.640 | It's very hard to give generalized advice.
03:31:49.120 | I'll speak as, you know, a compiler nerd.
03:31:52.800 | If everybody's going left,
03:31:55.800 | sometimes it's pretty cool to go right.
03:31:57.480 | - Yeah.
03:31:58.320 | - And so just because everybody's doing a thing,
03:32:00.040 | it doesn't mean you have to do the same thing
03:32:03.000 | and follow the herd.
03:32:03.920 | In fact, I think that sometimes
03:32:06.260 | the most exciting paths through life
03:32:08.560 | lead to being curious about things
03:32:10.960 | that nobody else actually focuses on, right?
03:32:13.400 | And it turns out that understanding deeply
03:32:16.560 | parts of the problem that people want to take for granted
03:32:19.780 | makes you extremely valuable and specialized
03:32:22.000 | in ways that the herd is not.
03:32:24.560 | And so, again, I mean,
03:32:26.020 | there's lots of room for specialization,
03:32:27.600 | lots of room for generalists.
03:32:29.480 | There's lots of room for different kinds
03:32:31.040 | and parts of the problem.
03:32:32.040 | But I think that it's, you know,
03:32:34.440 | just because everybody's doing one thing
03:32:36.240 | doesn't mean you should necessarily do it.
03:32:38.200 | - And now the herd is using Python.
03:32:40.040 | So if you want to be a rebel, go check out Mojo
03:32:45.040 | and help Chris and the rest of the world
03:32:48.160 | fight the arch nemesis of complexity
03:32:50.440 | 'cause simple is beautiful.
03:32:51.480 | - There you go.
03:32:52.880 | - Chris, you're an incredible person.
03:32:54.200 | You've been so kind to me ever since we met.
03:32:56.760 | You've been extremely supportive.
03:32:58.080 | I'm forever grateful for that.
03:33:00.000 | Thank you for being who you are,
03:33:01.680 | for being legit, for being kind,
03:33:03.520 | for fighting this really interesting problem
03:33:08.840 | of how to make AI accessible to a huge number of people,
03:33:12.240 | huge number of devices.
03:33:13.960 | - Yeah, well, so Lex, you're a pretty special person too.
03:33:16.120 | Right, and so I think that, you know,
03:33:18.880 | one of the funny things about you is that
03:33:20.760 | besides being curious and pretty damn smart,
03:33:23.000 | you're actually willing to push on things.
03:33:24.640 | And I think that you've got an agenda
03:33:27.320 | to like make the world think,
03:33:28.820 | which I think is a pretty good agenda.
03:33:31.320 | - It's a pretty good one.
03:33:32.920 | Thank you so much for talking to me, Chris.
03:33:34.440 | - Yeah, thanks, Lex.
03:33:35.440 | - Thanks for listening to this conversation
03:33:37.720 | with Chris Lattner.
03:33:38.840 | To support this podcast,
03:33:40.040 | please check out our sponsors in the description.
03:33:42.520 | And now let me leave you with some words
03:33:44.520 | from Isaac Asimov.
03:33:46.360 | I do not fear computers.
03:33:48.640 | I fear the lack of them.
03:33:50.520 | Thank you for listening and hope to see you next time.
03:33:54.680 | (upbeat music)