Chris Lattner: Future of Programming and AI | Lex Fridman Podcast #381
Chapters
0:00 Introduction
2:20 Mojo programming language
12:37 Code indentation
21:04 The power of autotuning
30:54 Typed programming languages
47:38 Immutability
59:56 Distributed deployment
94:23 Mojo vs CPython
110:12 Guido van Rossum
117:13 Mojo vs PyTorch vs TensorFlow
120:37 Swift programming language
126:09 Julia programming language
131:14 Switching programming languages
140:40 Mojo playground
145:30 Jeremy Howard
156:16 Function overloading
164:41 Error vs Exception
172:21 Mojo roadmap
185:23 Building a company
197:09 ChatGPT
203:32 Danger of AI
207:27 Future of programming
210:43 Advice for young people
00:00:00.000 |
on one axis you have more hardware coming in, 00:00:01.760 |
on the other hand you have an explosion of innovation in AI. 00:00:05.680 |
And so what happened with both TensorFlow and PyTorch 00:00:10.400 |
has led to, it's not just about matrix multiplication 00:00:18.280 |
I don't know how many pieces of hardware there are, 00:00:21.220 |
Part of my thesis, part of my belief of where computing goes 00:00:28.520 |
Physics isn't going back to where we came from, 00:00:30.800 |
it's only gonna get weirder from here on out. 00:00:33.480 |
And so to me, the exciting part about what we're building 00:00:36.880 |
is it's about building that universal platform 00:00:42.920 |
'cause again I don't think it's avoidable, it's physics, 00:00:45.400 |
but we can help lift people's scale, do things with it, 00:00:52.720 |
- The following is a conversation with Chris Lattner. 00:01:04.800 |
Having created the LLVM Compiler Infrastructure Project, 00:01:07.960 |
the Clang Compiler, the Swift Programming Language, 00:01:10.920 |
a lot of key contributions to TensorFlow and TPUs 00:01:14.720 |
He served as Vice President of Autopilot Software at Tesla, 00:01:18.880 |
was a software innovator and leader at Apple, 00:01:21.880 |
and now he co-created a new full stack AI infrastructure 00:01:26.960 |
for distributed training, inference, and deployment 00:01:51.800 |
If you love machine learning, if you love Python, 00:01:56.920 |
This programming language, this new AI framework 00:02:00.240 |
and infrastructure, and this conversation with Chris 00:02:19.960 |
- It's been, I think, two years since we last talked, 00:02:25.760 |
and co-created a new programming language called Mojo. 00:02:38.680 |
Well, so I mean, I think you have to zoom out. 00:02:40.080 |
So I've been working on a lot of related technologies 00:02:51.480 |
And what's happened with AI is we have new GPUs 00:03:02.440 |
That's one of the biggest, largest-scale deployed systems 00:03:06.360 |
And really what you see is if you look across 00:03:09.240 |
all of the things that are happening in the industry, 00:03:12.760 |
And it's not just about CPUs or GPUs or TPUs or NPUs 00:03:20.920 |
It's about how do we program these things, right? 00:03:27.000 |
it doesn't do us any good if there's this amazing hardware 00:03:31.080 |
And one of the things you find out really quick 00:03:35.720 |
of programming something and then having the world's power 00:03:39.040 |
and the innovation of all the smart people in the world 00:03:42.080 |
get unleashed on something can be quite different. 00:03:44.960 |
And so really where Mojo came from was starting 00:03:52.200 |
and make it way more accessible, way more usable, 00:03:54.840 |
way more understandable by normal people and researchers 00:03:57.840 |
and other folks that are not themselves like experts 00:04:07.560 |
- So one of the main features of the language, 00:04:10.480 |
I say so fully in jest, is that it allows you 00:04:18.600 |
or the fire emoji, which is one of the first emojis 00:04:23.600 |
used as a file extension I've ever seen in my life. 00:04:29.120 |
why in the 21st century are we not using Unicode 00:04:38.560 |
you made the most, but you could also just use Mojo 00:04:47.720 |
This is, we're releasing this onto the world. 00:04:53.640 |
Emojis are such a big part of our daily lives. 00:05:06.560 |
And so why are we spending all the screen space on them 00:05:09.800 |
Also, you have them stacked up next to text files 00:05:16.640 |
Emojis are colorful, they're visual, they're beautiful. 00:05:22.560 |
is there a support on like Windows on operating systems 00:05:31.440 |
And so it thinks that the fire emoji is unprintable 00:05:37.080 |
But everything else as far as I'm aware works fine. 00:05:54.280 |
So this is just like taking the next step, right? 00:05:56.840 |
Somewhere between, oh wow, that makes sense, cool. 00:06:00.120 |
I like new things, to oh my God, you're killing my baby. 00:06:04.560 |
This can never be, like I can never handle this. 00:06:18.800 |
- When can we have emojis as part of the code, I wonder? 00:06:22.360 |
- Yeah, so I mean, lots of languages provide that. 00:06:24.160 |
So I think that we have partial support for that. 00:06:30.400 |
For example, in Swift, you can do that for sure. 00:06:32.440 |
So an example we gave at Apple was the dog cow. 00:06:39.840 |
And so you use the dog and the cow emoji together 00:06:46.880 |
So if you wanna name your function pile of poop, 00:06:54.280 |
- Okay, so let me just ask a bunch of random questions. 00:07:01.920 |
or is it a general purpose programming language? 00:07:05.960 |
And so AI is driving a lot of the requirements. 00:07:13.080 |
And it's not because it's an interesting project 00:07:15.360 |
theoretically to build, it's because we need it. 00:07:20.840 |
the AI infrastructure landscape and the big problems in AI. 00:07:24.920 |
The reasons that it is so difficult to use and scale 00:07:27.120 |
and adopt and deploy and like all these big problems in AI. 00:07:31.400 |
And so we're coming at it from that perspective. 00:07:36.120 |
you realize that the solution to these problems 00:07:44.080 |
we're building Mojo to be a fully general programming 00:07:46.640 |
And that means that you can obviously tackle GPUs and CPUs 00:07:52.360 |
but it's also a really great way to build NumPy 00:07:56.520 |
Or, you know, just if you look at what many Python libraries 00:07:59.720 |
are today, often they're a layer of Python for the API 00:08:02.680 |
and they end up being C and C++ code underneath them. 00:08:07.280 |
that's true in lots of other domains as well. 00:08:10.600 |
that's an opportunity for Mojo to help simplify the world 00:08:15.080 |
- So optimize through simplification by having one thing. 00:08:37.560 |
what's gone on in the last five, six, seven, eight years 00:08:40.400 |
is that we've had things like TensorFlow and PyTorch 00:08:46.280 |
And what's happened is these things have grown like crazy. 00:09:17.720 |
are very different than when these systems were built. 00:09:24.960 |
and every big company's announcing a new chip every day, 00:09:29.840 |
you have like this moving system on one side, 00:09:36.960 |
which makes it very difficult for people to actually use AI, 00:09:40.200 |
particularly in production deployment scenarios. 00:09:50.960 |
Now, what Mojo does is it's a really, really, 00:10:02.400 |
that allows you to do the higher level programming, 00:10:07.960 |
they do all kinds of programming in that spectrum 00:10:11.800 |
that gets you closer and closer to the hardware. 00:10:31.000 |
How it feels like I'm writing natural language English. 00:10:59.880 |
- Well, and don't forget the entire ecosystem 00:11:05.440 |
if you wanna do anything, there's always a package. 00:11:07.840 |
- Yeah, so it's not just the ecosystem of the packages 00:11:14.520 |
That's a really, that's an interesting dynamic. 00:11:26.920 |
- Well, and there's many things that went into that. 00:11:38.040 |
But I think that the major thing underlying it 00:11:40.520 |
is that Python's like the universal connector. 00:11:42.880 |
It really helps bring together lots of different systems 00:11:45.920 |
so you can compose them and build out larger systems 00:11:53.200 |
- Well, I guess you could say several things, 00:11:57.240 |
I think that's usually what people complain about. 00:11:59.440 |
And so, I mean, other people would complain about tabs 00:12:07.280 |
'cause it is actually just better to use indentation. 00:12:12.960 |
So actually, on a small tangent, let's actually take that. 00:12:19.600 |
- Design, listen, I've recently left Emacs for VS Code, 00:12:36.560 |
- Anyway, tabs is an interesting design decision. 00:12:39.400 |
And so you've really written a new programming language here. 00:12:56.960 |
So I mean, you can explain this in many rational ways. 00:13:08.400 |
So first of all, Python won. It has millions of programmers. 00:13:23.480 |
curly brace languages also run through formatting tools 00:13:31.880 |
first of all, it will twist your brain around. 00:13:35.120 |
There's notorious bugs that have happened across time 00:13:37.520 |
where the indentation was wrong or misleading 00:13:43.600 |
And so what ends up happening in modern large scale 00:13:46.280 |
code bases is people run automatic formatters. 00:13:53.160 |
Well, if you're gonna have the notion of grouping, 00:14:04.000 |
it's like, okay, well, you can have curly braces 00:14:06.120 |
or you can omit them if there's one statement 00:14:09.600 |
of complicated design space that objectively you don't need 00:14:15.760 |
- Yeah, I would love to actually see statistics 00:14:19.840 |
Like how many errors are made in Python versus in C++ 00:14:28.680 |
because once you get, like you use VS Code, I do too. 00:14:33.360 |
it does the indentation for you generally, right? 00:14:35.440 |
And so you don't, you know, it's actually really nice 00:14:39.360 |
And then what you can see is the editor's telling you 00:14:55.920 |
And so I can joke about it and I love to kind of, 00:14:59.360 |
you know, I realize that this is such a polarizing thing 00:15:03.680 |
And so I like poking at the bear a little bit, right? 00:15:06.760 |
But frankly, right, come back to the first point, Python won. 00:15:10.400 |
Like it's huge, it won in AI, it's the right thing. 00:15:13.120 |
For us, like we see Mojo as being an incredible part 00:15:17.080 |
We're not looking to break Python or change it 00:15:22.760 |
Our view is that Python is just not done yet. 00:15:29.920 |
that go into that, which we can talk about if you want. 00:15:31.960 |
But one of them is it just doesn't have those features 00:15:37.520 |
And so if you say, okay, well, I'm forced out of Python 00:15:45.800 |
Can we just add those features that are missing from Python 00:15:50.080 |
And then you can have everything that's great about Python, 00:15:52.080 |
all the things you're talking about that you love, 00:15:57.880 |
more computationally intense or weird or hardwarey 00:16:24.120 |
So one of the things that makes Python beautiful 00:16:29.360 |
And because it's dynamic, one of the things they added 00:16:32.160 |
is that it has this powerful metaprogramming feature. 00:16:35.080 |
And so if you look at something like PyTorch or TensorFlow, 00:16:48.800 |
and then the plus method works on your class. 00:16:55.560 |
In Mojo, we want all those features to come in. 00:16:58.920 |
We don't wanna break Python, we want it all to work. 00:17:02.200 |
those super dynamic features on an embedded processor 00:17:14.680 |
okay, how do you get the power of this dynamic metaprogramming 00:17:18.880 |
into a language that has to be super efficient 00:17:28.040 |
Take that interpreter and allow it to run at compile time. 00:17:34.400 |
And so this is super interesting and super powerful 00:17:42.120 |
You get the ability to have overloaded operators. 00:17:45.040 |
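The dunder-method mechanism being described can be illustrated with a small Python sketch. The `Tensor` class here is a made-up toy standing in for a PyTorch-style object, not PyTorch's actual implementation:

```python
# Python's "underbar underbar" (dunder) methods let any class plug into
# built-in syntax like '+'. This is the dynamic dispatch being described:
# x + y is resolved at runtime by calling x.__add__(y).

class Tensor:
    """A toy stand-in for a PyTorch-style tensor (illustrative only)."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # '+' on two Tensors builds a new Tensor elementwise.
        return Tensor(a + b for a, b in zip(self.data, other.data))

    def __repr__(self):
        return f"Tensor({self.data})"

a = Tensor([1, 2, 3])
b = Tensor([10, 20, 30])
c = a + b  # dispatches to Tensor.__add__ at runtime
```

Mojo's goal, as described here, is to keep this expressiveness while letting the same dispatch happen at compile time instead of runtime.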
And if you look at what happens inside of like PyTorch, 00:18:09.160 |
I mean, different people have different interpretations. 00:18:11.560 |
My interpretation is that it was made accidentally powerful. 00:18:14.680 |
It was not designed to be Turing complete, for example, 00:18:17.680 |
but that was discovered kind of along the way accidentally. 00:18:21.040 |
And so there've been a number of languages in the space. 00:18:29.680 |
Some more modern languages or some more newer languages, 00:18:33.840 |
let's say like, you know, they're fairly unknown, 00:18:40.680 |
let's take all of those types that you can run it, 00:18:50.520 |
I mean, which is one of the problems with C++. 00:18:57.880 |
I mean, everybody hates me for a variety of reasons anyways, 00:19:18.200 |
And so if you do metaprogramming and programming, 00:19:27.160 |
is to make things really easy to use, easy to learn. 00:19:42.400 |
'Cause that sounds, to me as a fan of metaprogramming 00:19:45.160 |
in C++ even, how hard is it to pull that off? 00:20:02.880 |
It requires, I mean, what Mojo has underneath the covers 00:20:19.560 |
- So you have like an interpreter inside the compiler? 00:20:26.160 |
of programming languages and kind of twists it 00:20:36.280 |
many of these languages have metaprogramming features, 00:20:54.080 |
And so that made it way simpler, way more consistent, 00:21:07.040 |
I think we could generally say is extremely useful. 00:21:11.800 |
And so you get features, I mean, I'll jump around, 00:21:20.840 |
- Well, so, okay, so let's come back to that. 00:21:26.480 |
Like you take a PyTorch model off the internet, right? 00:21:29.280 |
It's really interesting to me because what PyTorch 00:21:35.680 |
is they're pushing into like this abstract specification 00:21:43.400 |
And so this is why it became a metaprogramming problem. 00:21:54.960 |
I wanna take this problem now run it across 1,000 CPUs 00:22:04.040 |
and then map it and do things and transform it 00:22:09.480 |
that makes machine learning systems really special. 00:22:12.160 |
- Maybe can you describe autotuning and how do you pull off? 00:22:17.480 |
is what we're talking about as metaprogramming. 00:22:20.720 |
I mean, is that as profound as I think it is? 00:22:31.600 |
which by the way, I have to absolutely like dive in. 00:22:51.200 |
Like many of these ideas have existed in other systems 00:22:55.160 |
And so what we're doing is we're pulling together good ideas, 00:23:09.200 |
Turns out maybe you don't actually want to know 00:23:14.720 |
And so there are lots of really smart hardware people. 00:23:17.480 |
I know a lot of them, where they know everything about, 00:23:27.160 |
because it maps directly onto what it can do. 00:23:30.000 |
where the GPU has SMs and it has a warp size of whatever, 00:23:32.840 |
right, all the stuff that goes into these things, 00:23:49.560 |
actually don't want to know this stuff, right? 00:23:51.680 |
And so if you come at it from the perspective 00:23:55.280 |
both more abstracted, but also more portable code, 00:23:58.840 |
because it could be that the vector length changes, 00:24:02.080 |
or it could be that the tile size of your matrix changes, 00:24:11.240 |
A lot of the algorithms that you run are actually the same, 00:24:14.400 |
but the parameters, these magic numbers you have to fill in 00:24:26.920 |
So instead of having humans go randomly try all the things, 00:24:29.640 |
or do a grid search, or go search some complicated 00:24:31.760 |
multi-dimensional space, how about we have computers do that? 00:24:36.040 |
Right, and so what autotuning does is you can say, 00:24:40.080 |
If it's a matrix operation or something like that, 00:24:43.200 |
you can say, okay, I'm gonna carve it up into blocks, 00:24:46.920 |
and I want this with 128 things that I'm running on, 00:24:50.400 |
and I want to cut it this way or that way or whatever, 00:24:57.280 |
- And then the result of that, you cache for that system. 00:25:01.400 |
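The search-and-cache loop being described can be sketched in plain Python. This is an illustrative analogue of the idea, not Mojo's actual `autotune` feature; the function names and candidate sizes are made up for the example:

```python
# A minimal sketch of the autotuning idea: instead of a human guessing
# "magic numbers" like block size, the machine times each candidate on
# the target system and caches the winner for that system.

import time

_tuning_cache = {}

def autotune_block_size(workload, candidates=(32, 64, 128, 256)):
    """Pick the fastest block size for `workload` on this machine."""
    if "block_size" in _tuning_cache:
        return _tuning_cache["block_size"]
    best, best_time = None, float("inf")
    for size in candidates:
        start = time.perf_counter()
        workload(size)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = size, elapsed
    _tuning_cache["block_size"] = best  # cached for this system
    return best

def chunked_sum(block_size, data=tuple(range(100_000))):
    # Sum `data` in blocks of `block_size` elements.
    return sum(sum(data[i:i + block_size])
               for i in range(0, len(data), block_size))

best = autotune_block_size(lambda s: chunked_sum(s))
```

The answer to the search depends on the machine it runs on, which is exactly why the result is cached per system rather than hard-coded.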
- And so come back to twisting your compiler brain, right? 00:25:05.540 |
So not only does the compiler have an interpreter 00:25:10.120 |
that compiler, that interpreter, that metaprogramming 00:25:18.920 |
and then stitch it in and then keep going, right? 00:25:20.960 |
- So part of the compilation is machine-specific. 00:25:23.360 |
- Yeah, well, so I mean, this is an optional feature, 00:25:25.680 |
right, so you don't have to use it for everything. 00:25:30.540 |
that we're in the quest of is ultimate performance. 00:25:34.720 |
performance is important for a couple of reasons, right? 00:25:36.640 |
So if you're an enterprise, you're looking to save costs 00:25:40.440 |
ultimate performance translates to fewer servers. 00:25:45.680 |
hey, better performance leads to more efficiency, right? 00:25:51.160 |
you know, Python's bad for the environment, right? 00:25:54.320 |
And so if you move to Mojo, it's like at least 10x better, 00:26:03.920 |
And so in the space of machine learning, right, 00:26:05.880 |
if you reduce the latency of a model, so it runs faster. 00:26:09.840 |
So every time you query the server running the model, 00:26:17.040 |
so you have a better experience as a customer. 00:26:25.160 |
you would specify like a bunch of options to try. 00:26:31.000 |
And then you can just set and forget and know, 00:26:40.680 |
So if you're building, so often what'll happen is that, 00:26:43.680 |
you've written a bunch of software yourself, right? 00:26:45.600 |
You wake up one day, you say, I have an idea, 00:26:51.880 |
I move on with life, I come back six months or a year 00:26:54.880 |
or two years or three years later, you dust it off 00:26:56.640 |
and you go use it again in a new environment. 00:27:00.720 |
Maybe you're running on a server instead of a laptop, 00:27:04.680 |
And so the problem now is you say, okay, well, 00:27:07.280 |
I mean, again, not everybody cares about performance, 00:27:10.760 |
I wanna take advantage of all these new features. 00:27:13.240 |
I don't wanna break the old thing though, right? 00:27:15.800 |
And so the typical way of handling this kind of stuff 00:27:19.160 |
before is, if you're talking about C++ templates 00:27:25.440 |
you get like all these weird things get layered in, 00:27:33.200 |
multi-dimensional space that you have to worry about. 00:27:57.120 |
- Yeah, so it can even do more, but we'll get to that. 00:28:00.640 |
So first of all, when we say that we're talking about 00:28:14.720 |
And so interpreters, they have an extra layer 00:28:21.240 |
and it makes them kind of slow from that perspective. 00:28:30.400 |
is two to five to 10X speed up, depending on the code. 00:28:39.040 |
Now, if you do that, one of the things you can do 00:28:48.760 |
and this isn't part of the Python spec necessarily, 00:29:18.720 |
don't like chasing pointers very much in things like this. 00:29:50.920 |
This is one of the things that hurts parallelism. 00:30:02.040 |
And so then you lean into this and you say, okay, cool. 00:30:05.600 |
they can do more than one operation at a time. 00:30:15.720 |
you can now do four or eight or 16 or 32 at a time. 00:30:18.560 |
Right, well, Python doesn't expose that because of reasons. 00:30:21.600 |
And so now you can say, okay, well, you can adopt that. 00:30:32.360 |
that have been built into the hardware over time. 00:30:34.400 |
And it gives, the library gives very nice features. 00:30:38.560 |
So you can say, just parallelize this, do this in parallel. 00:30:41.480 |
Right, so it's very, very powerful weapons against slowness, 00:30:48.280 |
having fun, like just taking code and making it go fast, 00:30:54.460 |
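The "just parallelize this" shape of API can be roughed out in stdlib Python. Mojo's actual `parallelize` runs over native threads without a GIL; this sketch only illustrates the chunk-and-dispatch pattern, and the helper name is invented for the example:

```python
# A rough Python analogue of "do this in parallel": split the work into
# chunks and hand each chunk to a worker. (Illustrative only -- Mojo's
# library call compiles to native code; this is just the API shape.)

from concurrent.futures import ThreadPoolExecutor

def parallel_map_sum(f, data, workers=4):
    """Apply f to every element of data, summing results across workers."""
    chunk = (len(data) + workers - 1) // workers
    pieces = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda p: sum(f(x) for x in p), pieces)
    return sum(partials)

total = parallel_map_sum(lambda x: x * x, list(range(1000)))
```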
- Before I talk about some of the interesting stuff 00:31:04.120 |
It's sexy and beautiful like Python, as I mentioned. 00:31:20.440 |
And so because all those types live at runtime in Python, 00:31:26.560 |
Python also has like this whole typing thing going on now, 00:31:37.760 |
I take, I have a def and my def takes two parameters. 00:31:46.640 |
is that forces what's called a consistent representation. 00:31:49.240 |
So these things have to be a pointer to an object 00:31:51.800 |
with the object header, and they all have to look the same. 00:31:58.000 |
no matter what the receiver, whatever that type is. 00:32:10.080 |
And so it's fully dynamic, and that's all great. 00:32:13.640 |
like that's all very powerful and very important. 00:32:18.560 |
and it's 32 bits or 64 bits or whatever it is, 00:32:20.760 |
or it's a floating point value, it's 64 bits. 00:32:25.840 |
and it can use that to do way better optimization. 00:32:28.800 |
And it turns out, again, getting rid of the indirections, 00:32:30.720 |
that's huge, means you can get better code completion 00:32:33.640 |
because you have, 'cause compiler knows what the type is, 00:32:43.680 |
to progressively adopt types into your program. 00:32:46.560 |
And so you can start, again, it's compatible with Python. 00:32:49.360 |
And so then you can add however many types you want, 00:32:59.160 |
is that it's not that types are the right thing 00:33:01.640 |
or the wrong thing, it's that they're a useful thing. 00:33:04.480 |
- Which is kind of optional, it's not strict typing. 00:33:12.120 |
that Python's kind of reaching towards right now 00:33:14.280 |
with trying to inject types into it, what is it doing? 00:33:17.520 |
- Yeah, with a very different approach, but yes. 00:33:23.080 |
that have not been using types very much in Python. 00:33:29.640 |
- It's just, well, because I know the importance. 00:33:44.440 |
even just for forget-about-performance improvements. 00:33:46.760 |
It probably reduces errors when you do strict typing. 00:33:57.080 |
this pressure, there has to be a right way to do things. 00:34:01.920 |
and if you don't do that, you should feel bad. 00:34:04.240 |
Some people feel like Python's a guilty pleasure 00:34:06.280 |
or something, and it's like, when it gets serious, 00:34:12.760 |
and I understand kind of where this comes from, 00:34:14.340 |
but I don't think it has to be a guilty pleasure. 00:34:17.560 |
- Right, and so if you look at that, you say, 00:34:29.040 |
that has no dependencies, or you have objectives 00:34:34.200 |
So what if Python can achieve those objectives? 00:34:37.680 |
So if you want types, well, maybe you want types 00:34:40.720 |
on the right thing, sure, you can add a type. 00:34:43.480 |
If you don't care, you're prototyping some stuff, 00:34:46.880 |
you're pulling some random code off the internet, 00:34:46.880 |
you shouldn't feel bad about doing the right thing 00:34:58.840 |
you're working at some massive internet company 00:35:00.960 |
and you have 400 million lines of Python code, 00:35:03.800 |
well, they may have a house rule that you use types. 00:35:07.360 |
- Right, because it makes it easier for different humans 00:35:08.640 |
to talk to each other and understand what's going on 00:35:22.680 |
and if you use types, you get nice things out of it, right? 00:35:25.480 |
You get better performance and things like this, right? 00:35:27.640 |
But Mojo is a full compatible superset of Python, right? 00:35:32.640 |
And so that means it has to work without types. 00:35:42.160 |
list comprehensions and things like this, right? 00:35:43.880 |
And so that starting point, I think, is really important. 00:35:54.480 |
a very challenging migration from Python 2 to Python 3. 00:36:01.920 |
and it was very painful for many teams, right? 00:36:03.960 |
And there's a lot of things that went on in that. 00:36:10.880 |
I don't want the world to have to go through that. 00:36:18.080 |
I don't want them to have to rewrite all their code. 00:36:19.960 |
- Yeah, I mean, this, okay, the superset part is just, 00:36:24.120 |
I mean, there's so much brilliant stuff here. 00:36:29.720 |
But first of all, how's the typing implemented differently 00:36:48.960 |
My understanding is, basically, like many dynamic languages, 00:36:56.620 |
to writing large-scale, huge code bases in Python, 00:37:00.600 |
and at scale, it kind of helps to have types. 00:37:03.920 |
People want to be able to reason about interfaces. 00:37:05.640 |
What, do you expect, a string or an int, or like, 00:37:10.160 |
And so what the Python community started doing 00:37:12.360 |
is it started saying, okay, let's have tools on the side, 00:37:22.480 |
These are called static analysis tools, generally, 00:37:27.960 |
What ended up happening is there's so many of these things, 00:37:29.800 |
so many different weird patterns and different approaches 00:37:31.880 |
on specifying the types and different things going on 00:37:34.240 |
that the Python community realized and recognized, 00:37:44.960 |
is that they're coming from kind of this fragmented world 00:37:48.600 |
they have different trade-offs and interpretations, 00:37:54.480 |
according to the Python spec, the types are ignored. 00:38:00.800 |
you can write pretty much anything in a type position, okay? 00:38:05.040 |
And you can, technically, you can write any expression, okay? 00:38:10.040 |
Now, that's beautiful because you can extend it, 00:38:14.720 |
build your own tools, you can build your own house linter 00:38:18.920 |
But it's also a problem because any existing Python program 00:38:25.600 |
And so if you adopt somebody's package into your ecosystem, 00:38:31.880 |
and warnings and problems just because it's incompatible 00:38:37.640 |
and they're not checked by the Python interpreter, 00:38:39.520 |
it's always kind of more of a hint than it is a requirement. 00:38:44.920 |
can't use them for performance, and so it's really-- 00:38:52.520 |
- Exactly, and this all comes back to the design principle 00:38:57.120 |
they're kind of, the definition's a little bit murky, 00:38:59.320 |
it's unclear exactly the interpretation in a bunch of cases, 00:39:04.860 |
even if you want to, it's really difficult to use them 00:39:17.200 |
but in Mojo, if you declare a type and you use it, 00:39:26.680 |
and it's not a, like, best effort hint kind of a thing. 00:39:35.240 |
- And you get an error from the compiler, compile time. 00:39:37.960 |
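The hint-versus-requirement distinction is easy to demonstrate: standard CPython stores annotations but never enforces them at runtime, which is exactly why they can't be used for performance:

```python
# In standard CPython, type annotations are stored but never checked at
# runtime -- they are hints for external static-analysis tools.

def concat(x: int, y: int) -> int:
    return x + y

# Passing strings "violates" the annotations, but CPython happily runs it:
result = concat("mo", "jo")   # -> "mojo", no error raised

# The annotations are just metadata hanging off the function object:
hints = concat.__annotations__  # {'x': int, 'y': int, 'return': int}
```

In Mojo, by contrast, a declared type is a contract the compiler actually checks and can optimize against.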
Nice, okay, what kind of basic types are there? 00:39:46.760 |
in terms of what it tries to do in the language, 00:39:58.840 |
And so all of the different things in Python, 00:40:01.840 |
like for loops, and plus, and like all these things 00:40:04.240 |
can be accessed through these underbar, underbar methods. 00:40:13.760 |
why do I need to have integers built into the language? 00:40:18.560 |
okay, well, we can have this notion of structs. 00:40:20.760 |
So you have classes in Python, now you can have structs. 00:40:28.360 |
we can write C++ kind of code with structs if you want. 00:40:31.600 |
These things mix and work beautifully together. 00:40:34.440 |
But what that means is that you can go and implement 00:40:38.720 |
and all that kind of stuff in the language, right? 00:40:44.560 |
to me as an idealizing compiler language type of person, 00:40:49.560 |
what I wanna do is I wanna get magic out of the compiler 00:40:57.760 |
and has an amazing API and does all the things 00:40:59.680 |
you'd expect an integer to do, but you don't like it, 00:41:04.120 |
maybe you want a like sideways integer, I don't know, 00:41:08.500 |
then you can do that and it's not a second class citizen. 00:41:14.360 |
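The "int lives in the library" idea can be mimicked in Python: a numeric type defined entirely through ordinary methods, with no compiler magic. In Mojo this would be a `struct` with dunder methods compiled to efficient code; this Python class (the name `MyInt` is invented) only mirrors the shape:

```python
# Sketch of a numeric type built purely in the library: all behavior
# comes from its methods, so a user-defined type is not a second-class
# citizen relative to a built-in one.

class MyInt:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return MyInt(self.value + other.value)

    def __mul__(self, other):
        return MyInt(self.value * other.value)

    def __eq__(self, other):
        return self.value == other.value

# Behaves like a first-class numeric type purely through its API:
n = MyInt(6) * MyInt(7) + MyInt(0)
```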
And so if you look at certain other languages, 00:41:19.900 |
int is hard-coded in the language, but complex is not. 00:41:25.860 |
And so isn't it kind of weird that you have this 00:41:32.520 |
and complex tries to look like a natural numeric type 00:41:35.800 |
and things like this, but integers and floating point 00:41:43.760 |
And because of that, you can't actually make something 00:41:56.520 |
So is there something like a nice standard implementation 00:42:01.280 |
- Yeah, so we're still building all that stuff out. 00:42:02.920 |
So we provide integers and floats and all that kind of stuff. 00:42:07.240 |
and things like that that you'd expect in an ML context. 00:42:10.080 |
Honestly, we need to keep designing and redesigning 00:42:13.080 |
and working with the community to build that out 00:42:19.320 |
But the power of putting in the library means 00:42:23.920 |
that aren't compiler engineers that can help us design 00:42:28.720 |
- So one of the exciting things we should mention here 00:42:45.920 |
if you're a super low-level programmer right now. 00:42:47.920 |
And what we're doing is we're working our way up the stack. 00:42:49.840 |
And so the way I would look at Mojo today, in May of 2023, 00:42:49.840 |
it's gonna be way more interesting to a variety of people. 00:43:03.880 |
But what we're doing is we decided to release it early 00:43:07.000 |
so that people can get access to it and play with it 00:43:15.640 |
and a lot of people are involved in this stuff. 00:43:17.080 |
And so what we're doing is we're really optimizing 00:43:21.840 |
And building it the right way is kind of interesting, 00:43:37.840 |
Sometimes the community probably can be very chaotic 00:43:52.920 |
And so like, it can be very stressful to develop. 00:44:04.520 |
given that the community is so richly involved? 00:44:23.560 |
but generally I want people to be happy, right? 00:44:30.840 |
some deep-seated, long tensions and pressures, 00:44:38.200 |
And so people just want us to move faster, right? 00:44:52.560 |
had the language monk sitting in the cloister 00:44:57.800 |
like beavering away, trying to build something. 00:45:05.800 |
can be challenging for lots of people involved. 00:45:13.320 |
Keep in mind, we released Mojo like two weeks ago. 00:45:22.760 |
10, 11,000 people all will want something different, right? 00:45:25.800 |
And so what we've done is we've tried to say, 00:45:30.360 |
Here, and the roadmap isn't completely arbitrary. 00:45:36.800 |
or add these capabilities and things like that. 00:45:38.880 |
And what we've done is we've spun really fast 00:45:41.800 |
And so we actually have very few bugs, which is cool. 00:45:47.920 |
but then what we're doing is we're dropping in features 00:45:57.840 |
and then you have the machine learning Python people 00:46:15.640 |
for something that's only been out for two weeks, right? 00:46:23.160 |
Like in a year's time, Mojo will be actually quite amazing 00:46:33.640 |
and the way I look at this at least is to say, 00:46:35.760 |
okay, well, we're solving big longstanding problems. 00:46:39.840 |
To me, I, again, working on many different problems, 00:46:49.720 |
There's very few opportunities to do projects like this 00:46:51.720 |
and have them really have impact on the world. 00:46:53.840 |
If we do it right, then maybe we can take those feuding armies 00:46:58.920 |
- Yeah, this is like, this feels like a speech 00:47:01.960 |
by George Washington or Abraham Lincoln or something. 00:47:14.880 |
now I'm not optimistic that all people will use Mojo 00:47:29.480 |
- So there are proposals for adding braces to Mojo. 00:47:38.440 |
Yeah, anyway, so there's a lot of amazing features 00:47:40.760 |
on the roadmap and those are already implemented. 00:47:50.400 |
So what's this var and this let thing that we got going on? 00:48:00.400 |
and it's not always required, but it's useful, 00:48:05.760 |
And so in Python, you have a pointer to an array, right? 00:48:09.280 |
And so you pass that pointer to an array around to things. 00:48:18.200 |
And so you get your array back and you go to use it. 00:48:20.440 |
Now somebody else is like putting stuff in your array. 00:48:24.240 |
It gets to be very complicated and leads to lots of bugs. 00:48:30.480 |
again, this is not something Mojo forces on you, 00:48:36.280 |
And what value semantics do is they take collections, 00:48:43.000 |
also tensors and strings and things like this 00:48:57.280 |
it's your array, you can go do what you want to it, 00:49:05.920 |
You have to be careful to implement it in an efficient way. 00:49:08.600 |
- Is there a performance hit that's significant? 00:49:12.020 |
- Generally not, if you implement it the right way, 00:49:25.720 |
- Absolutely, well, the trick is you can't do copies. 00:49:29.920 |
So you have to provide the behavior of copying 00:49:39.160 |
- It's not magic, it's just, it's actually pretty cool. 00:49:42.200 |
Well, so first, before we talk about how that works, 00:49:55.400 |
And so the problem is that if you pass in a record 00:50:06.800 |
you have to know that that database is gonna take it, 00:50:15.560 |
And so you roll out version one of the database, 00:50:24.920 |
Somebody else joins the team, they don't know this. 00:50:35.200 |
okay, we have to do something different, right? 00:50:36.880 |
And so what you do is you go change your Python code, 00:50:39.040 |
and you change your database class to copy the record 00:50:45.160 |
okay, I will do what's called a defensive copy 00:50:57.680 |
Okay, this is usually the two design patterns. 00:51:06.920 |
you get these bugs, and this is state of the art, right? 00:51:10.600 |
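[Editor's note: the reference-semantics bug and the defensive-copy workaround described here can be reproduced in a few lines of plain Python; the `Database` class below is a hypothetical stand-in, not anything from Mojo or Modular.]

```python
class Database:
    """Hypothetical store that keeps references to the records it is given."""
    def __init__(self):
        self.records = []

    def add(self, record):
        # Reference semantics: the database now aliases the caller's list.
        self.records.append(record)

    def add_defensive(self, record):
        # Defensive copy: pay for a copy up front so later caller
        # mutations cannot corrupt the stored data.
        self.records.append(list(record))


db = Database()
record = ["alice", 42]
db.add(record)
record.append("oops")  # the caller keeps mutating "their" record...
shared_sees_mutation = db.records[0] == ["alice", 42, "oops"]  # ...and the database sees it

db2 = Database()
record2 = ["bob", 7]
db2.add_defensive(record2)
record2.append("oops")
copy_is_safe = db2.records[0] == ["bob", 7]  # the stored copy is unaffected
```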
So a different approach, so it's used in many languages. 00:51:15.280 |
is you say, okay, well, let's provide value semantics. 00:51:33.200 |
And if you pass it around, you stick in your database, 00:51:44.320 |
Well, then you've just handed it off to the database, 00:51:47.560 |
you've transferred it, and there's no copies made. 00:51:50.840 |
Now, on the other hand, if your coworker goes 00:51:57.400 |
and then you go to town and you start modifying it, 00:52:00.180 |
what happens is you get a copy lazily on demand. 00:52:12.920 |
- But the implementation details are tricky here. 00:52:27.880 |
So this concept has existed in many different worlds, 00:52:31.720 |
and so again, it's not novel research at all. 00:52:39.520 |
And so there's a number of components that go into this. 00:52:43.520 |
so we're talking about Python and reference counting 00:52:52.240 |
and so you have to make sure that you're efficient 00:52:55.280 |
instead of duplicating references and things like that, 00:53:07.800 |
so of course the default list is a reference semantic list 00:53:14.000 |
but then you have to design a value semantic list. 00:53:30.620 |
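[Editor's note: a toy copy-on-write list in Python, illustrating "you get a copy lazily on demand." Mojo's real collections are implemented very differently; this only sketches the idea that a logical copy is O(1) and the physical copy happens on first mutation.]

```python
class CowList:
    """Toy copy-on-write list: cheap to pass around, copies only on first write."""
    def __init__(self, items=()):
        self._data = list(items)
        self._shared = [1]  # shared reference count across logical copies

    def copy(self):
        # O(1): share the underlying buffer and bump the shared count.
        new = CowList.__new__(CowList)
        new._data = self._data
        new._shared = self._shared
        self._shared[0] += 1
        return new

    def _make_unique(self):
        # The first mutation of a shared buffer pays for the copy.
        if self._shared[0] > 1:
            self._shared[0] -= 1
            self._data = list(self._data)
            self._shared = [1]

    def append(self, x):
        self._make_unique()
        self._data.append(x)

    def __getitem__(self, i):
        return self._data[i]

    def __len__(self):
        return len(self._data)


a = CowList([1, 2, 3])
b = a.copy()                # no element copy happens here
buffer_shared = a._data is b._data
b.append(4)                 # b copies lazily; a is untouched
```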
or hard-coding this into the language itself. 00:53:34.020 |
so you're constantly tracking who owns the thing. 00:53:37.120 |
- Yes, and so there's a whole system called ownership, 00:53:39.120 |
and so this is related to work done in the Rust community. 00:53:43.480 |
Also the Swift community has done a bunch of work, 00:53:45.120 |
and there's a bunch of different other languages 00:53:46.680 |
that have all kind of, C++ actually has copy constructors 00:54:00.920 |
that's kind of been developing for many, many years now. 00:54:06.240 |
out of all these systems and remixes it in a nice way 00:54:13.080 |
but you don't have to deal with it when you don't want to, 00:54:15.760 |
which is a major thing in terms of teaching and learning 00:54:18.080 |
and being able to use and scale these systems. 00:54:21.040 |
- How does that play with argument conventions? 00:54:28.840 |
with the arguments when they're passed into functions? 00:54:30.720 |
- Yeah, so if you go deep into systems programming land, 00:54:34.320 |
so this isn't, again, this is not something for everybody, 00:54:36.760 |
but if you go deep into systems programming land, 00:54:39.020 |
what you encounter is you encounter these types 00:54:43.720 |
So if you're used to Python, you think about everything, 00:54:47.280 |
I can go change it and mutate it and do these things, 00:54:53.760 |
you get into these things like I have an atomic number, 00:55:05.820 |
Sometimes you can't necessarily even move them 00:55:18.120 |
And by doing that, what you can say is you can say, 00:55:37.560 |
is multiple different things have to be poking at that, 00:55:42.240 |
Right, and so you can't just move it out from underneath one 00:55:46.640 |
And so that's an example of a type that you can't even, 00:55:50.880 |
Once you create it, it has to be where it was, right? 00:56:03.040 |
That's not something you necessarily wanna do. 00:56:08.480 |
where you wanna be able to say that they are uniquely owned. 00:56:19.400 |
And so what Mojo allows you to do is it allows you to say, 00:56:22.160 |
hey, I wanna pass around a reference to this thing 00:56:35.200 |
you get a reference to it, but you can change it. 00:56:48.040 |
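[Editor's note: a rough Python analogy for unique ownership and transfer. Mojo and Rust enforce this at compile time; Python can only check at runtime, so the `Owned` wrapper here is purely illustrative.]

```python
class Owned:
    """Toy unique-ownership handle: the value has exactly one owner,
    and transferring it invalidates the previous handle."""
    def __init__(self, value):
        self._value = value
        self._moved = False

    def borrow(self):
        # Read-only style access: look at the value without taking ownership.
        if self._moved:
            raise RuntimeError("use after move")
        return self._value

    def take(self):
        # Transfer ownership out: no copy is made, and this handle dies.
        if self._moved:
            raise RuntimeError("use after move")
        self._moved = True
        value, self._value = self._value, None
        return value


handle = Owned([1, 2, 3])
data = handle.take()        # ownership transferred, zero-copy
try:
    handle.borrow()         # using the old handle is now an error
    moved_ok = False
except RuntimeError:
    moved_ok = True
```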
- Smart, smart, different kinds of implications 00:56:50.080 |
of smart pointers that you can explicitly define. 00:56:52.800 |
This allows you, but you're saying that's more like 00:57:01.800 |
So I mean, I'm not one to call other people weird, 00:57:04.300 |
but, you know, if you talk to a normal Python programmer, 00:57:10.760 |
you're typically not thinking about this, right? 00:57:17.440 |
again, they're not weird, they're delightful. 00:57:21.960 |
Those folks will think about all the time, right? 00:57:35.680 |
And so it's not that anybody's right or wrong, 00:57:37.800 |
it's about how do we build one system that scales. 00:57:44.640 |
has been something that always brought me deep happiness, 00:57:52.920 |
the idea that threads can just modify stuff asynchronously, 00:57:57.920 |
just the whole idea of concurrent programming 00:58:07.320 |
again, you zoom out and get out of programming languages 00:58:11.520 |
or compilers and you just look what the industry has done. 00:58:20.080 |
Moore's law has this idea that computers for a long time, 00:58:23.440 |
single thread performance just got faster and faster 00:58:30.560 |
and power consumption, other things started to matter. 00:58:39.040 |
And this trend towards specialization of hardware 00:58:43.440 |
And so for years, us programming language nerds 00:58:49.200 |
okay, well, how do we tackle multi-core, right? 00:58:51.720 |
For a while, it was like multi-core is the future, 00:58:59.760 |
with hundreds of cores in them, what happened, right? 00:59:07.000 |
in the face of this, those machine learning people 00:59:13.840 |
A tensor is like an arithmetic and algebraic concept, 00:59:25.480 |
like TensorFlow and PyTorch, we're able to say, 00:59:31.560 |
This enables you to do automatic differentiations, 00:59:39.760 |
Well, because you have that abstract representation, 00:59:41.480 |
you can now map it onto these parallel machines 00:59:43.940 |
without having to control, okay, put that byte here, 00:59:48.760 |
And this has enabled an explosion in terms of AI, 01:00:00.760 |
So you write that the modular compute platform 01:00:05.080 |
dynamically partitions models with billions of parameters 01:00:08.360 |
and distributes their execution across multiple machines, 01:00:15.480 |
By the way, the use of unparalleled in that sentence, 01:00:20.280 |
scale, and reliability for the largest workloads. 01:00:31.400 |
- Yeah, so one of the really interesting tensions, 01:00:34.520 |
so there's a whole bunch of stuff that goes into that. 01:00:38.920 |
If you go back and replay the history of machine learning, 01:00:42.440 |
right, I mean, the brief, the most recent history 01:00:44.480 |
of machine learning, 'cause this is, as you know, 01:00:46.040 |
very deep, I knew Lex when he had an AI podcast. 01:00:53.120 |
- So if you look at just TensorFlow and PyTorch, 01:00:57.600 |
which is pretty recent history in the big picture, right? 01:01:03.080 |
PyTorch, I think, pretty unarguably ended up winning. 01:01:09.800 |
And the usability of PyTorch is, I think, huge. 01:01:13.760 |
to the power of taking abstract, theoretical, 01:01:17.560 |
technical concepts and bringing it to the masses, right? 01:01:23.600 |
versus the PyTorch design points was that TensorFlow 01:01:30.200 |
but it was actually pretty good for deployment. 01:01:33.880 |
It kind of is not super great for deployment, right? 01:01:47.600 |
but they're wicked smart at model architecture 01:01:53.640 |
Like, they're wicked smart in various domains. 01:01:55.720 |
They don't wanna know anything about the hardware 01:01:57.360 |
or deployment or C++ or things like this, right? 01:02:01.200 |
who train the model, they throw it over the fence, 01:02:04.120 |
and then you have people that try to deploy the model. 01:02:17.600 |
because, of course, it never works the first time. 01:02:21.760 |
they figure out, okay, it's too slow, it won't fit, 01:02:30.280 |
then they have to throw it back over the fence. 01:02:32.560 |
And every time you throw a thing over a fence, 01:02:40.200 |
getting models in production can take weeks or months. 01:02:44.720 |
I talk to lots of people, and you talk about, like, 01:02:58.080 |
And if you dig into this, every layer is problematic. 01:03:05.500 |
It's a very exciting tip of the iceberg for folks, 01:03:13.120 |
I mean, it can theoretically, technically in some cases, 01:03:23.000 |
If you look at serving, so you talk about gigantic models. 01:03:27.380 |
Well, a gigantic model won't fit on one machine. 01:03:32.920 |
it's written in Python, it has to be rewritten in C++. 01:03:43.480 |
Well, so now suddenly the complexity is exploding, right? 01:03:47.240 |
And the reason for this is that if you look into 01:03:52.440 |
they weren't really designed for this world, right? 01:03:56.400 |
back in the day when we were starting and doing things, 01:03:59.580 |
where it was a different, much simpler world. 01:04:03.520 |
or some ancient model architecture like this. 01:04:06.000 |
It was just a, it was a completely different world. 01:04:11.440 |
Yeah, AlexNet, right, the major breakthrough. 01:04:24.920 |
And so where TensorFlow actually has amazing power 01:04:27.480 |
in terms of scale and deployment and things like that, 01:04:29.840 |
and I think Google is, I mean, maybe not unmatched, 01:04:32.680 |
but they're like incredible in terms of their capabilities 01:04:40.520 |
And so PyTorch doesn't have those same capabilities. 01:04:42.640 |
And so what Modular can do is it can help with that. 01:04:44.840 |
Now, if you take a step back and you say like, 01:04:54.160 |
And it's one of these things where everybody knows it, 01:04:57.000 |
but nobody is usually willing to talk about it. 01:05:05.920 |
and it's like fish can't see water, is complexity. 01:05:15.000 |
- And so if you look at it, yes, it is on the hardware side. 01:05:18.720 |
All these accelerators, all these software stacks 01:05:22.200 |
all these, like there's massive complexity over there. 01:05:24.760 |
You look at what's happening on the modeling side. 01:05:37.880 |
but there's a ton of diversity even within transformers. 01:05:49.000 |
have all their very weird, but very cool hardware 01:06:00.680 |
who know how to write high-performance servers 01:06:07.360 |
and all these fancy things in the serving community, 01:06:16.840 |
these systems have been built up over many years. 01:06:21.720 |
There hasn't been a first principles approach to this. 01:06:28.720 |
So I've worked on TensorFlow and TPUs and things like that. 01:06:31.520 |
Other folks on our team have worked on PyTorch Core. 01:06:38.280 |
And so we've built systems like the Apple accelerators 01:06:46.440 |
that roughly everybody at Modular's grumpy about 01:06:48.920 |
is that when you're working on one of these projects, 01:07:16.160 |
and heterogeneous computing, all this kind of stuff. 01:07:33.320 |
that researchers are programming in Python in a notebook. 01:07:40.400 |
And so you look at the technology that goes into that 01:07:42.640 |
and the algorithms are actually quite general. 01:07:56.320 |
And so they should be getting access to the same algorithms, 01:08:09.760 |
And so I've implemented C++, I've implemented Swift, 01:08:09.760 |
Well, there's a couple of major vendors of GPUs 01:08:40.920 |
from all the cloud providers and things like this, 01:08:42.680 |
and they're all super important to the world, right? 01:08:45.440 |
But they don't have the 30 years of development 01:09:03.120 |
but I have sympathy for the poor software people, right? 01:09:06.120 |
I mean, again, I'm generally a software person too. 01:09:10.560 |
wanna build applications and products and solutions 01:09:17.000 |
for one generation of hardware with one vendor's tools, right? 01:09:23.760 |
They need something that works on cloud and mobile, right? 01:09:36.480 |
And so the challenge with the machine learning technology 01:09:39.720 |
and the infrastructure that we have today in the industry 01:09:44.720 |
And because there are all these point solutions, 01:09:47.640 |
you have to switch different technology stacks 01:09:51.320 |
And what that does is that slows down progress. 01:09:54.160 |
- So basically, a lot of the things we've developed 01:09:57.480 |
in those little silos for machine learning tasks, 01:10:01.880 |
you want to make that the first class citizen 01:10:08.680 |
- Well, so it's not really about a programming language. 01:10:21.800 |
So, if you look at this mission, you need a syntax. 01:10:26.800 |
So that's, so yeah, we needed a programming language, right? 01:10:26.800 |
And like, we wouldn't have to build the programming language 01:10:33.880 |
So if Python was already good enough, then cool, 01:10:38.800 |
expensive engineering projects for the sake of it. 01:10:54.160 |
Within the stack, there are things like kernel fusion. 01:11:02.120 |
and much more general research hackability together. 01:11:08.160 |
by the ASICs, that's enabled by certain hardware. 01:11:15.800 |
Like, how do you add a piece of hardware to the stack? 01:11:44.640 |
and these have all evolved and gotten way more complicated. 01:11:47.840 |
So let's go back to the glorious simple days, right? 01:11:54.000 |
And so what you do is you say, go do a dense layer, 01:11:58.000 |
and a dense layer has a matrix multiplication in it, right? 01:12:02.120 |
go do this big operation of matrix multiplication, 01:12:04.920 |
and if it's on a GPU, kick off a CUDA kernel. 01:12:08.240 |
If it's on a CPU, go do like an Intel algorithm 01:12:11.840 |
or something like that with the Intel MKL, okay? 01:12:14.640 |
Now, that's really cool if you're either NVIDIA 01:12:23.120 |
And on one axis, you have more hardware coming in. 01:12:29.360 |
And so what happened with both TensorFlow and PyTorch 01:12:31.120 |
is that the explosion of innovation in AI has led to, 01:12:35.000 |
it's not just about matrix multiplication and convolution. 01:12:37.400 |
These things have now like 2,000 different operators. 01:12:41.960 |
I don't know how many pieces of hardware there are out there. 01:12:57.560 |
Yeah, it's not a handful of TPU alternatives. 01:13:12.760 |
Like why is everybody making their own thing? 01:13:25.080 |
And so I think that, again, we're at the end of Moore's law. 01:13:30.000 |
- If you're building, if you're training GPT-5, 01:13:33.000 |
you want some crazy supercomputer data center thingy. 01:13:38.040 |
If you're making a smart camera that runs on batteries, 01:13:41.680 |
you want something that looks very different. 01:13:44.720 |
you want something that looks very different. 01:14:00.960 |
There's different trade-offs in terms of the algorithms. 01:14:10.760 |
And what I'm interested in is unlocking that innovation. 01:14:23.920 |
think what that would mean in terms of the daily impact 01:14:51.720 |
Status quo is that if you're Intel or you're Nvidia, 01:15:02.960 |
that are like trying to keep up and tune and optimize. 01:15:09.600 |
they have to go back and rewrite all these things, right? 01:15:12.160 |
So really it's only powered by having hundreds of people 01:15:14.560 |
that are all like frantically trying to keep up. 01:15:17.200 |
And what that does is that keeps out the little guys. 01:15:31.920 |
is people have been trying to turn this from a, 01:15:34.120 |
let's go write lots of special kernels problem 01:15:38.640 |
And so we, and I contributed to this as well, 01:15:43.200 |
like let's go make this compiler problem phase, 01:15:47.280 |
And much of the industry is still in this phase, 01:15:49.280 |
by the way, so I wouldn't say this phase is over. 01:15:56.280 |
a much more general extensible hackable interface 01:16:16.860 |
it is way faster to do one pass over the data 01:16:24.040 |
'cause ReLU is just a maximum operation, right? 01:16:30.280 |
take MatMul, ReLU, squish together in one operation, 01:16:37.360 |
I just went from having two operators to three. 01:16:46.400 |
What about like a million things that are out there, right? 01:16:51.740 |
now I get permutations of all these algorithms, right? 01:16:54.480 |
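[Editor's note: the MatMul-plus-ReLU fusion discussed here can be sketched with NumPy. Real kernel fusion happens in generated machine code while data is still in registers; the in-place `out=` trick below only approximates "one pass over the data instead of two."]

```python
import numpy as np

def matmul_then_relu(a, b):
    # Two separate "kernels": the intermediate a @ b is materialized
    # in memory, then read back again for the max.
    tmp = a @ b
    return np.maximum(tmp, 0.0)

def fused_matmul_relu(a, b):
    # A "fused" version applies ReLU while the product buffer is still hot,
    # avoiding a second output allocation (a stand-in for true fusion).
    out = a @ b
    np.maximum(out, 0.0, out=out)  # in place: no second buffer
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))
result = fused_matmul_relu(a, b)
```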
And so what the compiler people said is they said, 01:16:56.000 |
hey, cool, well, I will go enumerate all the algorithms 01:16:59.960 |
and I will actually generate a kernel for you. 01:17:02.320 |
And I think that this has been very useful for the industry. 01:17:05.200 |
This is one of the things that powers Google TPUs, 01:17:08.160 |
PyTorch 2 is, like, rolling out really cool compiler stuff 01:17:08.160 |
with Triton, this other technology and things like this. 01:17:13.880 |
And so the compiler people are kind of coming to the fore 01:17:13.880 |
and saying like, awesome, this is a compiler problem, 01:17:25.420 |
But not everybody can or should be a compiler person. 01:17:31.920 |
or they know some GPU internal architecture thing 01:17:43.560 |
And so one of the challenges with this new wave 01:17:46.080 |
of technology trying to turn everything into a compiler, 01:17:55.220 |
is brings programmability back into this world. 01:17:57.780 |
Like it enables, I wouldn't say normal people, 01:18:03.320 |
that cares about numerics, or cares about hardware, 01:18:18.440 |
- Yeah, so again, go back to the simplest example of int. 01:18:24.820 |
is we said, okay, pull magic out of the compiler 01:18:31.040 |
that we're providing and this very deep technology stack, 01:18:39.780 |
this whole stack allows that stack to be extended 01:18:47.880 |
and by people who know things that we don't know. 01:18:53.240 |
but we don't have all the smart people, it turns out. 01:19:04.720 |
And so the simplest example you might come up with 01:19:09.760 |
And so it's a simple heterogeneous computer to say, 01:19:12.680 |
I will run my data loading and pre-processing 01:19:19.760 |
I do a lot of matrix multiplications and convolutions 01:19:23.440 |
and I get it back out and I do some reductions and summaries 01:19:31.280 |
And so you've got now what are effectively two computers, 01:19:53.080 |
and so there's multiple different kinds of CPUs 01:19:57.320 |
You've got GPUs, you've got neural network accelerators, 01:20:04.760 |
so for video decode and JPEG decode and things like this. 01:20:07.560 |
And so you've got this massively complicated system, 01:20:10.600 |
every laptop these days is doing the same thing, 01:20:13.240 |
and all of these blocks can run at the same time 01:20:19.680 |
And so again, one of the cool things about machine learning 01:20:22.200 |
is it's moving things to like data flow graphs 01:20:24.240 |
and a higher level of abstractions and tensors 01:20:32.640 |
in terms of how to translate or map it or compile it 01:20:40.200 |
is a way for all these devices to talk to each other. 01:20:43.440 |
And so this is one thing that I'm very passionate about. 01:21:01.120 |
of the same problem you have in a data center. 01:21:12.120 |
And so you get a much larger scale heterogeneous computer. 01:21:16.960 |
is you have this like multi-layer abstraction 01:21:24.440 |
And making that, again, my enemy is complexity. 01:21:37.240 |
and make it much simpler and actually get used. 01:21:44.800 |
I don't know, five, six computers essentially 01:21:50.240 |
How do you, without trying to minimize the explicit, 01:22:00.560 |
- Yeah, so there's a pretty well-known algorithm 01:22:03.240 |
and what you're doing is you're looking at two factors. 01:22:09.240 |
'Cause it takes time to get it from that side of the chip 01:22:11.120 |
to that side of the chip and things like this. 01:22:13.600 |
And then you're looking at what is the time it takes 01:22:25.320 |
that's really good at matrix multiplications, okay? 01:22:29.280 |
if my workload is all matrix multiplications, 01:22:31.800 |
I start up, I send the data over the neural net thing, 01:22:44.800 |
But then you realize you get a little bit more complicated 01:22:47.120 |
because you can do matrix multiplications on a GPU, 01:22:54.200 |
and they'll have different trade-offs and costs. 01:22:58.040 |
And so what you actually look at is you look at, 01:23:07.080 |
the bisection bandwidth and like the overhead 01:23:09.120 |
and the sending of all these different things 01:23:17.960 |
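[Editor's note: the cost model described, weighing transfer time against compute time, can be sketched in a few lines. The device numbers below are invented for illustration; real placement algorithms consider far more factors.]

```python
def best_device(op_flops, bytes_to_move, devices):
    """Pick the device minimizing transfer time plus compute time."""
    def total_cost(dev):
        transfer = bytes_to_move / dev["bandwidth_bytes_per_s"]
        compute = op_flops / dev["flops_per_s"]
        return transfer + compute
    return min(devices, key=total_cost)["name"]


devices = [
    # Data already lives on the CPU, so its "transfer" is free.
    {"name": "cpu", "flops_per_s": 1e11, "bandwidth_bytes_per_s": float("inf")},
    # The accelerator is 100x faster but data must cross a link to reach it.
    {"name": "npu", "flops_per_s": 1e13, "bandwidth_bytes_per_s": 1e10},
]

# A huge matmul amortizes the transfer cost; a tiny op does not.
big = best_device(op_flops=1e12, bytes_to_move=1e8, devices=devices)
small = best_device(op_flops=1e6, bytes_to_move=1e8, devices=devices)
```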
- So it's the old school theoretical computer science problem 01:23:26.320 |
to somehow magically include autotune into this? 01:23:34.240 |
this is not, not everybody would agree with this, 01:23:38.120 |
the world benefits from simple and predictable systems 01:23:43.820 |
But then once you have a predictable execution layer, 01:23:47.240 |
you can build lots of different policies on top of it. 01:23:50.120 |
And so one policy can be that the human programmer says, 01:23:55.000 |
do that here, do that here, do that here, do that here, 01:24:06.000 |
And so the next logical step that people typically take 01:24:11.240 |
Oh, if it's a matrix multiplication, do it over there. 01:24:11.240 |
And then you then get into this mode of like, 01:24:23.100 |
let's actually like make the heuristic better. 01:24:27.640 |
Let's actually do a search of the space to decide, 01:24:38.480 |
This is a many dimensional, hyper-dimensional space 01:24:47.040 |
that are good at searching very complicated spaces for- 01:24:53.440 |
- So then you turn it into a machine learning problem, 01:24:55.280 |
and then you have a space of genetic algorithms 01:24:57.520 |
and reinforcement learning and like all these- 01:25:06.700 |
Is it a separate thing, or is it part of the compilation? 01:25:08.920 |
- So you start from simple and predictable models. 01:25:15.720 |
that like nudge systems so you don't have to do this. 01:25:18.400 |
But if you really care about getting the best, 01:25:26.240 |
you don't wanna do this every time you run a model. 01:25:28.200 |
You wanna figure out the right answer and then cache it. 01:25:44.160 |
and do a big, expensive search over the space 01:25:53.320 |
Right, and so you can get out of this trade-off 01:25:55.920 |
between, okay, am I gonna like spend forever doing a thing, 01:26:01.560 |
Like these are actually not in contention with each other 01:26:07.440 |
- You started and did a little bit of a whirlwind overview 01:26:10.960 |
of how you get the 35,000x speedup or more over Python. 01:26:15.960 |
Jeremy Howard did a really great presentation 01:26:19.760 |
about sort of the basic, like, look at the code, 01:26:26.080 |
probably developers can do for their own code 01:26:28.560 |
to see how you can get these gigantic speedups. 01:26:31.120 |
But can you maybe speak to the machine learning task 01:26:34.560 |
How do you make some of this code fast and specific? 01:26:44.780 |
So are we talking about matmul, matrix multiplication, 01:26:50.300 |
- So, I mean, if you just look at the Python problem, 01:26:52.700 |
right, you can say, how do I make Python faster? 01:26:58.380 |
okay, how do I make Python 2x faster, 10x faster, 01:27:01.500 |
And there've been a ton of projects in that vein, right? 01:27:04.340 |
Mojo started from the, what can the hardware do? 01:27:11.060 |
What is the, like, how fast can this thing go? 01:27:20.220 |
It's saying, cool, I know what the hardware can do, 01:27:25.780 |
- You can just see how gutsy that is to be in the meeting, 01:27:25.780 |
but you look at that, what is the limit of physics? 01:27:42.820 |
typically it ends up being a memory problem, right? 01:27:49.460 |
the problem is that you can do a lot of math within them, 01:27:52.900 |
but you get bottleneck sending data back and forth 01:27:58.060 |
or distant memory, or disk, or whatever it is, 01:28:05.380 |
as you start doing tons of inferences all over the place, 01:28:08.540 |
that becomes a huge bottleneck for people, right? 01:28:15.300 |
where people took the special case and hand-tuned it, 01:28:19.900 |
and they knew exactly how the hardware worked, 01:28:21.140 |
and they knew the model, and they made it fast. 01:28:27.060 |
ResNet-50 or AlexNet or something, 01:28:27.060 |
Because the models are small, they fit in your head, right? 01:28:35.180 |
But as the models get bigger, more complicated, 01:28:37.380 |
as the machines get more complicated, it stops working. 01:28:45.500 |
This is this idea of saying, let's avoid going to memory, 01:28:48.580 |
and let's do that by building a new hybrid kernel, 01:28:52.780 |
a numerical algorithm that actually keeps things 01:28:56.740 |
in the accelerator, instead of having to write 01:29:04.500 |
Like in a GPU, for example, you'll have global memory, 01:29:18.660 |
And so a lot of taking advantage of the hardware 01:29:30.700 |
One of which is, again, the complexity disaster, right? 01:29:30.700 |
Even if you just say, let's look at the chips 01:29:37.780 |
from one line of vendor, like Apple, or Intel, 01:29:43.580 |
comes out with new features, and they change things 01:29:52.020 |
And so this is where you need a much more scalable approach. 01:29:54.500 |
And this is what Mojo, and what the modular stack provides, 01:29:59.500 |
and the system for factoring all this complexity, 01:30:02.300 |
and then allowing people to express algorithms. 01:30:13.540 |
So to me, I kind of joke, what is a compiler? 01:30:23.620 |
Like, you can talk about many, many things that compilers do. 01:30:37.060 |
and it can work on problems that are bigger than, 01:30:45.660 |
is the ability to walk up to it with a new problem, 01:30:50.620 |
And that's something that a lot of machine learning 01:30:52.940 |
infrastructure, and tools, and technologies don't have. 01:30:56.660 |
Typical state of the art today is you walk up, 01:31:07.460 |
The state of ML tooling today is not anything 01:31:10.940 |
that a C programmer would ever accept, right? 01:31:13.820 |
And it's always been this kind of flaky set of tooling 01:31:28.380 |
that are trying to solve their problems, right? 01:31:30.740 |
And so that means that we get this fragmented, 01:31:37.580 |
and Jeremy showed this, there's the vectorize function, 01:31:45.780 |
- Vectorize, as he showed, is built into the library. 01:31:52.020 |
Vectorize, parallelize, which vectorizes more low-level, 01:31:59.420 |
which is how he demonstrated the autotune, I think. 01:32:12.140 |
into a compute problem, you have one floating point number. 01:32:15.340 |
Right, and so then you say, okay, I wanna be, 01:32:17.260 |
I can do things one at a time in an interpreter. 01:32:21.900 |
So I can get to doing one at a time in a compiler, 01:32:26.060 |
Then I can get to doing four or eight or 16 at a time 01:32:32.780 |
Then you can say, hey, I have a whole bunch of different, 01:32:37.980 |
is it's basically a bunch of computers, right? 01:32:42.500 |
that can talk to each other and they share memory. 01:32:44.980 |
And so now what parallelize does is it says, okay, 01:32:47.340 |
run multiple instances of this on different computers. 01:32:50.460 |
And now they can all work together on a problem, right? 01:32:56.860 |
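[Editor's note: the ladder described here, from one element at a time in an interpreter to many at a time in native SIMD loops, can be seen directly in NumPy, which dispatches to vectorized C code under the hood.]

```python
import numpy as np

def scalar_sum_of_squares(xs):
    # "One at a time in an interpreter": each iteration is a Python-level step.
    total = 0.0
    for x in xs:
        total += x * x
    return total

def vectorized_sum_of_squares(xs):
    # "Four or eight or 16 at a time": np.dot runs a SIMD loop in native code.
    return float(np.dot(xs, xs))

xs = np.arange(10_000, dtype=np.float64)
slow = scalar_sum_of_squares(xs)
fast = vectorized_sum_of_squares(xs)
```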
And as you do that, how do I take advantage of this? 01:33:02.540 |
It says, okay, let's make sure that we're keeping the data 01:33:07.860 |
instead of sending it all back and forth through memory 01:33:19.220 |
the details matter so much to get good performance. 01:33:22.100 |
This is another funny thing about machine learning 01:33:24.380 |
and high-performance computing that is very different 01:33:32.820 |
or a new version of Clang or something like that, 01:33:39.220 |
And so compiler engineers will work really, really, 01:33:41.940 |
really hard to get half a percent out of your C code, 01:33:50.020 |
or you're talking about these kinds of algorithms, 01:33:53.540 |
and these are things people used to write in Fortran, 01:34:04.740 |
you really want to make use of the full memory you have, 01:34:09.100 |
but if you use too much space, it doesn't fit in the cache, 01:34:11.740 |
now you're gonna be thrashing all the way back out 01:34:14.780 |
And these can be 2x, 10x, major performance differences. 01:34:18.580 |
And so this is where getting these magic numbers 01:34:21.220 |
and these things right is really actually quite important. 01:34:23.980 |
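[Editor's note: a cache-blocking sketch in Python showing the tiling structure being described. NumPy's own `@` is vastly faster; this only demonstrates how a tile size, one of those "magic numbers," shapes the loop nest so the working set fits in cache.]

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Cache-blocked matmul: operate on tile x tile blocks so the working
    set stays in fast memory instead of streaming from main memory."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # Each block product touches only small, cache-friendly slices.
                out[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
                )
    return out

rng = np.random.default_rng(1)
a = rng.standard_normal((64, 96))
b = rng.standard_normal((96, 48))
blocked = tiled_matmul(a, b, tile=16)
```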
- So you mentioned that Mojo's a superset of Python. 01:34:27.540 |
Can you run Python code as if it's Mojo code? 01:34:41.180 |
So Mojo's not done yet, so I'll give you a disclaimer, 01:34:44.500 |
but already we see people that take small pieces 01:34:47.620 |
of Python code, move it over, they don't change it, 01:34:52.740 |
Like somebody was just tweeting about that yesterday, 01:35:05.140 |
this is just basic stuff, move it straight over. 01:35:11.140 |
it will have more and more and more features, 01:35:13.300 |
and our North Star is to be a full superset of Python, 01:35:15.820 |
and so you can bring over basically arbitrary Python code 01:35:24.860 |
and way faster in many cases, is the goal, right? 01:35:34.500 |
but there's also non-obvious things that are complicated, 01:35:37.740 |
like we have to be able to talk to CPython packages 01:35:44.540 |
- So you have to, I mean, just to make explicit, 01:35:53.060 |
that means you have to run all the Python packages 01:36:00.060 |
What's the relationship between Mojo and CPython, 01:36:04.300 |
the interpreter that's presumably would be tasked 01:36:14.180 |
and you'll be able to move Python packages over 01:36:24.740 |
because then you'll get a whole bunch of advantages, 01:36:27.140 |
and you'll get massive speedups and things like this. 01:36:31.500 |
- Exactly, but we're not willing to wait for that. 01:36:34.740 |
Python is too important, the ecosystem is too broad, 01:36:58.780 |
it's not like a standard, an arbitrary package, 01:37:04.500 |
'Cause CPython already runs all the packages, right? 01:37:07.020 |
And so what we do is we built an integration layer, 01:37:18.560 |
The downside of that is you don't get the benefits 01:37:31.220 |
well here's a, you know, the Python ecosystem is vast, 01:37:35.940 |
but there's certain things that are really important, 01:37:37.740 |
and so if I'm doing weather forecasting or something, 01:37:44.700 |
and then I have my own crazy algorithm inside of it, 01:37:50.300 |
if I can write in Mojo and have one system that scales, 01:38:00.180 |
Because is there some communication back and forth? 01:38:08.820 |
but what we do is we use the CPython existing interpreter, 01:38:14.980 |
and that's how it provides full compatibility, 01:38:25.140 |
with all the CPython objects and all the, you know, 01:38:30.060 |
it's also the C packages, the C libraries underneath them, 01:38:37.180 |
and the way we do that is that we have to play 01:38:40.220 |
and so we keep objects in that representation 01:38:44.380 |
- What's the representation that's being used? 01:38:53.660 |
but also different rules on how to pass pointers around 01:38:57.540 |
Super low-level fiddly, and it's not like Python, 01:39:08.940 |
And so what this means is you have to know not only C, 01:39:13.820 |
which is a different role from Python, obviously, 01:39:20.580 |
and the implementation details and the conventions, 01:39:25.860 |
now suddenly you have a debugger that debugs Python 01:39:44.260 |
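[Editor's note: the reference counting being discussed is directly observable from Python. Every CPython object carries a refcount in its header, and an interop layer like the one described has to keep those counts balanced; `sys.getrefcount` includes one temporary reference for its own argument.]

```python
import sys

obj = [1, 2, 3]
base = sys.getrefcount(obj)        # includes the temporary argument reference
alias = obj                        # a new reference: the count goes up by one
after_alias = sys.getrefcount(obj)
del alias                          # reference dropped: the count goes back down
after_del = sys.getrefcount(obj)
```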
"Okay, I care about performance for whatever reason," right? 01:39:49.580 |
and so then you add types, you can parallelize things, 01:39:52.140 |
you can vectorize things, you can use these techniques, 01:39:54.220 |
which are general techniques to solve a problem, 01:39:56.980 |
and then you can do that by staying in the system, 01:40:03.260 |
that's really important to you, you can move it to Mojo, 01:40:05.380 |
you get massive performance benefits on that, 01:40:09.700 |
If you like static types, it's nice if they're enforced. 01:40:09.700 |
Some people like that, right, rather than being hints, 01:40:16.460 |
and then you can do that incrementally as you go. 01:40:20.540 |
- So one different perspective on this would be 01:40:44.580 |
that is really improving, I think CPython 3.11 01:40:51.300 |
and it was 15% faster, 20% faster across the board, 01:40:56.300 |
which is pretty huge given how mature Python is 01:40:59.660 |
and things like this, and so that's awesome, I love it. 01:41:07.980 |
like it doesn't do vectors, doesn't do things. 01:41:14.980 |
So they're definitely, I'm a huge fan of that work, 01:41:19.100 |
by the way, and it composes well with what we're doing, 01:41:21.060 |
and so it's not like we're fighting or anything like that, 01:41:31.900 |
we're working backwards from what is the limit of physics. 01:41:34.940 |
- What's the process of porting Python code to Mojo? 01:41:44.940 |
- Not yet, so we're missing some basic features right now, 01:41:48.300 |
and so we're continuing to drop new features 01:41:50.660 |
on a weekly basis, but at the fullness of time, 01:41:59.860 |
- So when we're ready, it'll be very automatable, yes. 01:42:03.500 |
- Is it possible to automate, in the general case, 01:42:10.540 |
- Yeah, well-- - You're saying it's possible. 01:42:32.460 |
you can adopt templates, you can adopt other references 01:42:44.340 |
you can't use the cool features, but it still works, right? 01:42:54.900 |
there's not a Python is bad and a Mojo is good, right? 01:43:02.220 |
And so if you wanna stay with Python, that's cool, 01:43:05.260 |
but the tooling should be actually very beautiful 01:43:12.580 |
- Right, so you're, right, so there's several things 01:43:15.140 |
to say there, but also the conversion tooling 01:43:20.620 |
- And then, yeah, exactly, once you're in the new world, 01:43:22.660 |
then you can build all kinds of cool tools to say, like, 01:43:26.100 |
Or, like, and we haven't built those tools yet, 01:43:50.900 |
that have been working on improving Python in various ways. 01:44:05.540 |
but they're trying to be compatible with Python. 01:44:08.260 |
There's also another category of these things 01:44:10.180 |
where they're saying, well, Python is too complicated. 01:44:26.980 |
And so you can choose to pass on that, right? 01:44:52.400 |
And it's worth it because it's not about any one package, 01:45:02.860 |
Like, we want people to be able to adopt this stuff quickly. 01:45:06.680 |
And so by doing that work, we can help lift people. 01:45:09.940 |
- Yeah, the challenge, it's really interesting, 01:45:13.300 |
of really making a language a superset of another language. 01:45:25.660 |
So all joking aside, I think that the annotation thing 01:45:28.740 |
is not the actual important part of the problem. 01:45:36.940 |
and they translate to beautiful static metaprogramming 01:45:43.220 |
And so Python, I've talked with Guido about this. 01:45:45.820 |
It's like, it was not designed to do what we're doing. 01:45:49.900 |
That was not the reason they built it this way, 01:45:51.560 |
but because they really cared and they were very thoughtful 01:46:04.500 |
you get stuck with the design decisions of the subset, right? 01:46:13.540 |
because of C in the legacy than it would have been 01:46:21.700 |
that are trying to make C++ better and re-syntax C++. 01:46:25.340 |
It's gonna be great, we'll just change all the syntax. 01:46:28.000 |
But if you do that, now suddenly you have zero packages. 01:46:32.160 |
- So what are the, if you could just linger on that, 01:46:42.480 |
Is it all boiled down to having a big integer? 01:46:52.960 |
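For context on the "big integer" question: Python's `int` is arbitrary precision, while systems languages default to fixed-width machine integers, so a faithful superset has to honor the unbounded behavior. A quick demonstration in plain Python:

```python
# Python ints never overflow; they grow without bound.
x = 2 ** 64
print(x)                     # 18446744073709551616, past the 64-bit range
print(x * x == 2 ** 128)     # True: still exact, no wraparound

# A fixed-width unsigned 64-bit register would wrap instead:
print((x * x) % (2 ** 64))   # 0, what 64-bit modular arithmetic would give
```

Details like this are one example of the long tail of semantics a compatible implementation has to reproduce.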
So a war story in this space is, you go way back in time, 01:47:19.460 |
well, I wanna build a C parser, C++ parser for LLVM. 01:47:32.980 |
it has all these weird features, all these bugs. 01:47:41.540 |
It's gonna be beautiful, it'll be amazing, well engineered, 01:47:46.740 |
And so I started implementing and building it out 01:47:52.400 |
And all of the headers in the world use all the GCC stuff. 01:48:08.180 |
I could have built an amazingly beautiful academic thing 01:48:12.500 |
Or I could say, well, it's yucky in various ways. 01:48:18.100 |
All these design mistakes, accidents of history, 01:48:20.860 |
the legacy, at that point GCC was like over 20 years old. 01:48:24.860 |
Which, by the way, now LLVM's over 20 years old, right? 01:48:27.980 |
So it's funny how time catches up to you, right? 01:48:30.340 |
And so you say, okay, well, what is easier, right? 01:48:35.340 |
I mean, as an engineer, it's actually much easier for me 01:48:38.460 |
to go implement long tail compatibility, weird features, 01:48:41.660 |
even if they're distasteful, and just do the hard work 01:48:44.740 |
and figure it out, reverse engineer, understand what it is, 01:48:48.020 |
write a bunch of test cases, try to understand behavior. 01:48:51.220 |
It's way easier to do all that work as an engineer 01:49:05.180 |
Nobody actually even understands how the code works, 01:49:07.900 |
because it was written by the person who quit 10 years ago. 01:49:11.580 |
Right, and so this software is kind of frustrating that way, 01:49:23.160 |
- Well, there are occasions in which you get a build, 01:49:28.740 |
or something like that, or there's this beautiful algorithm 01:49:30.840 |
that just makes you super happy, and I love that moment, 01:49:36.340 |
and you're working with code and dusty deck code bases 01:49:40.580 |
it's not about what's theoretically beautiful, 01:49:44.460 |
what people actually use, and I don't meet a lot of people 01:49:52.620 |
- By the way, there could be interesting possibilities, 01:50:01.420 |
how that could create more, be a tool in the battle 01:50:06.420 |
against this monster of complexity that you mentioned. 01:50:10.580 |
- You mentioned Guido, the benevolent dictator 01:50:24.060 |
We actually talked with Guido before it launched, 01:50:26.160 |
and so he was aware of it before it went public. 01:50:33.180 |
and Guido's pretty amazing in terms of steering 01:50:53.080 |
and get his input and get his eyes on this, right? 01:50:56.280 |
Now, a lot of what Guido was and is, I think, 01:51:00.520 |
concerned about is how do we not fragment the community? 01:51:06.360 |
That was really painful for everybody involved, 01:51:09.200 |
and so we spent quite a bit of time talking about that 01:51:11.200 |
and some of the tricks I learned from Swift, for example. 01:51:19.640 |
into a slightly prettier Objective-C, which we did, 01:51:37.000 |
you leverage CPython while bringing up the new thing. 01:51:45.240 |
and so Guido was very interested in, like, okay, cool. 01:51:50.560 |
It's his baby, and I have tons of respect for that. 01:51:53.080 |
Incidentally, I see Mojo as a member of the Python family. 01:51:55.920 |
We're not trying to take Python away from Guido 01:52:05.400 |
and so I think that, again, you would have to ask Guido this, 01:52:18.960 |
And that, you know, if the future is Python, right? 01:52:23.120 |
I mean, look at the far outside case on this, right? 01:52:28.120 |
And I'm not saying this is Guido's perspective, 01:52:30.520 |
but, you know, there's this path of saying, like, 01:52:35.480 |
all the places it's never been able to go before, right? 01:52:38.640 |
And that means that Python can go even further 01:52:42.440 |
- So in some sense, Mojo could be seen as Python 4.0. 01:52:49.320 |
I think that would drive a lot of people really crazy. 01:52:54.160 |
- I'm willing to annoy people about Emacs versus Vim, 01:52:58.560 |
- I don't know, that might be a little bit far even for me. 01:53:02.280 |
- But the point is, the step to being a superset 01:53:05.000 |
and allowing all of these capabilities, I think, 01:53:12.760 |
So he's interested by the ideas that you're playing with, 01:53:36.840 |
and share code and basically just have these big code bases 01:53:41.280 |
that are using CPython and more and more moving towards Mojo. 01:53:46.280 |
- Well, so again, these are lessons I learned from Swift 01:53:48.240 |
and here we face very similar problems, right? 01:53:51.240 |
In Swift, you have Objective-C, super dynamic. 01:54:03.200 |
I mean, Apple's got the biggest, largest scale code base 01:54:13.720 |
and so you want to be able to adopt things piece at a time. 01:54:16.280 |
And so a thing that I found that worked very well 01:54:20.200 |
"Okay, cool," and this was when Swift was very young, 01:54:23.280 |
and she'd say, "Okay, you have a million lines of code 01:54:36.720 |
is a very wonderful thing for an app developer, 01:54:40.520 |
but it's a huge challenge for the compiler team 01:54:43.440 |
and the systems people that are implementing this, right? 01:54:45.600 |
And this comes back to what is this trade-off 01:54:47.920 |
between doing the hard thing that enables scale 01:54:51.480 |
versus doing the theoretically pure and ideal thing, right? 01:54:56.560 |
a lot of different machinery to deeply integrate 01:55:00.280 |
And we're doing the same thing with Python, right? 01:55:04.520 |
is that Swift as a language got more and more 01:55:09.840 |
And incidentally, Mojo is a much simpler language 01:55:12.280 |
than Swift in many ways, and so I think that Mojo 01:55:14.400 |
will develop way faster than Swift for a variety of reasons. 01:55:27.720 |
"I'm not dealing with a million lines of code. 01:55:29.840 |
"I'll just start and use the new thing for my whole stack." 01:55:35.360 |
where communities and where people that work together, 01:55:38.760 |
you build a new subsystem or a new feature or a new thing 01:55:44.280 |
then you want it to end up being used on the other side. 01:56:02.920 |
is I would love to see people that are building 01:56:10.560 |
or these packages that are half Python, half C++. 01:56:15.160 |
And if you say, "Okay, cool, I want to get out 01:56:17.920 |
"of this Python C++ world into a unified world, 01:56:26.600 |
'Cause these libraries get used by everybody, 01:56:30.120 |
and they're not all gonna switch all at once, 01:56:35.120 |
Well, so the way we should do that is we should 01:56:40.080 |
And that's what we did in Swift, and it worked great. 01:56:43.000 |
I mean, it was a huge implementation challenge 01:56:46.240 |
But there's only a dozen of those compiler people, 01:56:50.920 |
And so it's a very expensive, capital-intensive, 01:57:00.760 |
the community progressively adopt technologies. 01:57:03.000 |
And so I think that this approach will work quite well 01:57:12.560 |
- So how do, just to linger on these packages, 01:57:26.960 |
Is Mojo kind of visioned to replace PyTorch and TensorFlow, 01:57:44.800 |
and so it can help solve the C, C++, Python feud 01:57:54.200 |
- Yes, okay, so the fire emoji is amazing, I love it. 01:57:59.640 |
The other side of this is the fire emoji is in service 01:58:05.320 |
- Right, and so the big AI problems are, again, 01:58:12.920 |
but it's not getting felt by the industry, right? 01:58:15.800 |
And so when you look at how does the modular engine 01:58:26.400 |
You have people that are using a bunch of PyTorch, 01:58:33.320 |
And when I talk to them, there's a few exceptions, 01:58:36.040 |
but generally they don't want to rewrite all their code. 01:58:39.320 |
Right, and so what we're doing is we're saying, 01:58:40.680 |
okay, well, you don't have to rewrite all your code. 01:58:43.040 |
What happens is the modular engine goes in there 01:58:47.360 |
It's fully compatible and it just provides better performance, 01:58:52.840 |
It's a better experience that helps lift TensorFlow 01:58:56.960 |
I love Python, I love TensorFlow, I love PyTorch, right? 01:59:04.440 |
- But if I have a process that trains a model 01:59:07.160 |
and I have a process that performs inference on that model 01:59:12.200 |
what should I do with that in the long arc of history 01:59:28.040 |
then writing it in Mojo is gonna be way better 01:59:30.800 |
But if you look at LLM companies, for example, 01:59:41.600 |
and other innovative machine learning models, 01:59:44.360 |
on the one hand, they're innovating in the data collection 01:59:52.560 |
and all the cool things that people are talking about. 01:59:56.360 |
But on the other hand, they're spending a lot of time 02:00:09.840 |
that are out there, and people have been working 02:00:12.720 |
and they're trying to solve subsets of the problem, 02:00:17.040 |
And so what Mojo provides for these kinds of companies 02:00:19.680 |
is the ability to say, "Cool, I can have a unifying theory." 02:00:27.960 |
or the three-world problem or the n-world problem, 02:00:29.640 |
like, this is the thing that is slowing people down. 02:00:38.160 |
- So obviously we've talked about the transition 02:00:47.280 |
about the use of Swift for machine learning context. 02:01:00.680 |
versus sort of designing a new programming language 02:01:10.640 |
- Did you go to the desert, and did you meditate on it? 02:01:18.720 |
I mean, it's just bold, and sometimes to take those leaps 02:01:24.400 |
I think there's a couple of different things. 02:01:29.160 |
like January 2017, so it's been a number of years 02:01:32.880 |
that I left Apple, and the reason I left Apple was to do AI. 02:01:36.240 |
Okay, so, and again, I won't comment on Apple and AI, 02:01:46.640 |
and understand the technology, 02:01:46.640 |
and so I was like, okay, I'm gonna go dive deep 02:01:52.280 |
into applied AI, and then the technology underneath it. 02:01:52.280 |
- And that was like when TPUs were waking up. 02:02:04.320 |
and Jeff Dean, who's a rock star, as you know, right, 02:02:09.120 |
and in 2017, TensorFlow's like really taking off 02:02:13.440 |
and doing incredible things, and I was attracted to Google 02:02:18.320 |
and TPUs are an innovative hardware accelerator platform, 02:02:21.840 |
and have now, I mean, I think proven massive scale 02:02:29.640 |
is a bunch of different projects, which I'll skip over, 02:02:32.000 |
right, one of which was the Swift for TensorFlow project, 02:02:35.280 |
right, and so that project was a research project, 02:02:41.000 |
let's look at innovative new programming models 02:02:43.480 |
where we can get a fast programming language, 02:02:45.900 |
we can get automatic differentiation into language, 02:02:53.200 |
now that project I think lasted two, three years, 02:02:58.580 |
so one of the things that's really interesting 02:03:00.400 |
is I published a talk at an LLVM conference in 2018, 02:03:10.000 |
which is basically the thing that's in PyTorch 2, 02:03:13.200 |
and so PyTorch 2 with all this TorchDynamo thing, 02:03:15.320 |
it's all about this graph program abstraction thing 02:03:17.680 |
from Python byte codes, and so a lot of the research 02:03:20.320 |
that was done ended up pursuing and going out 02:03:26.040 |
and I think it's super exciting and awesome to see that, 02:03:31.720 |
and so there's a couple of different problems with that, 02:03:47.360 |
that other programming languages have as well, 02:03:51.520 |
We'll probably maybe briefly talk about Julia, 02:03:54.560 |
which is a very interesting, beautiful programming language, 02:04:03.660 |
where all the programmers are Python programmers, 02:04:10.520 |
well, your new thing may be good or bad or whatever, 02:04:13.320 |
but if it's a new thing, the adoption barrier is massive. 02:04:21.160 |
and there's definitely room for new and good ideas, 02:04:33.360 |
and if you wanna be compatible with all the world's code, 02:04:43.000 |
is that Swift, as a very fast and efficient language, 02:04:46.400 |
kind of like Mojo, but a different take on it still, 02:04:54.760 |
and so Eager Mode is something that PyTorch does, 02:05:04.260 |
TensorFlow at the time was not set up for that, 02:05:09.560 |
- The timing is also important in this world. 02:05:16.080 |
but you could say Swift for TensorFlow is a good idea, 02:05:20.560 |
except for the Swift, and except for the TensorFlow part. 02:05:27.520 |
- Wasn't set up for Eager Mode at the time, yeah. 02:05:34.320 |
is that in the context of it being a research project, 02:05:48.560 |
And for me personally, I learned so much from it, right? 02:05:51.560 |
And I think a lot of the engineers that worked on it 02:05:55.080 |
And so I think that that's just really exciting to see, 02:05:59.080 |
and I'm sorry that the project didn't work out, 02:06:15.520 |
as we come up with these whole new set of algorithms 02:06:19.560 |
in machine learning, in artificial intelligence, 02:06:23.920 |
Because it could be a new programming language. 02:06:37.120 |
What are your thoughts about Julia in general? 02:06:40.760 |
- So I will have to say that when we launched Mojo, 02:06:48.800 |
And so I was not, I mean, okay, let me take a step back. 02:06:53.460 |
I've known the Julia folks for a really long time. 02:06:56.120 |
They were an adopter of LLVM a long time ago. 02:07:05.800 |
as being mostly a scientific computing focused environment. 02:07:12.600 |
I neglected to understand that one of their missions 02:07:19.400 |
And so I think that was my error for not understanding that. 02:07:23.120 |
And so I could have been maybe more sensitive to that. 02:07:27.680 |
between what Mojo's doing and what Julia's doing. 02:07:32.240 |
And so one of the things that a lot of the Julia people 02:07:38.480 |
if we put a ton of more energy and a ton more money 02:07:44.040 |
maybe that would be better than starting Mojo, right? 02:07:49.400 |
but it still wouldn't make Julia into Python. 02:07:52.480 |
So if you've worked backwards from the goal of 02:08:10.440 |
then you can look at it from a different lens. 02:08:18.800 |
Let's take what's great about Python and make it even better. 02:08:21.240 |
And so it was just a different starting point. 02:08:30.400 |
- But it does seem that Python is quite sticky. 02:08:33.480 |
Is there some philosophical almost thing you could say 02:08:40.120 |
seems to be the most popular programming language 02:08:43.160 |
- Well, I can tell you things I love about it. 02:08:44.840 |
Maybe that's one way to answer the question, right? 02:08:57.240 |
- Yeah, so if you look at certain other languages, 02:09:03.880 |
it takes a long time to JIT compile all the things. 02:09:10.520 |
and then it can plow through a lot of internet stuff 02:09:21.120 |
Python integrates into notebooks in a very elegant way 02:09:21.120 |
because it has such a simple object representation, 02:09:37.320 |
That dynamic metaprogramming thing we were talking about 02:09:39.360 |
also enables really expressive and beautiful APIs, right? 02:09:42.680 |
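One flavor of the dynamic metaprogramming being praised here: hooks like `__getattr__` let an object synthesize attributes on the fly, a common ingredient in expressive Python APIs. A toy sketch (this `Config` class is hypothetical, not from any real library):

```python
class Config:
    """Toy object that turns attribute access into dict lookups."""

    def __init__(self, **values):
        self._values = values

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, which lets
        # us expose arbitrary keys as if they were declared attributes.
        try:
            return self._values[name]
        except KeyError:
            raise AttributeError(name) from None


cfg = Config(host="localhost", port=8080)
print(cfg.host)  # localhost
print(cfg.port)  # 8080
```

Libraries use the same trick (plus `__getattribute__`, descriptors, decorators, and metaclasses) to build fluent, low-boilerplate interfaces on top of a very simple object model.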
So there's lots of reasons that you can look at 02:09:57.760 |
But then you also look at the community side, right? 02:10:08.000 |
- And there's a reputation and prestige to machine learning 02:10:16.400 |
Well, I should probably care about machine learning. 02:10:27.440 |
Right, not because I'm telling them to learn Python, 02:10:32.640 |
Well, they also learn Scratch and things like this too. 02:10:34.760 |
But it's because Python is taught everywhere, right? 02:10:49.840 |
of teaching software engineering in schools now. 02:10:56.320 |
If you look at what causes things to become popular 02:11:00.920 |
there's reinforcing feedback loops and things like this. 02:11:05.000 |
again, the whole community has done a really good job 02:11:11.680 |
what you can get done with just a few lines of code. 02:11:25.880 |
it seems sort of clear that this is a good direction 02:11:35.880 |
because of this, whatever the engine of popularity, 02:11:39.320 |
of virality, is there something you could speak to? 02:11:45.560 |
- Yeah, well, I mean, I think that the viral growth loop 02:11:51.640 |
- I think the Unicode file extensions are what I'm betting on. 02:11:55.840 |
- Tell the kids that you could use the fire emoji 02:12:03.600 |
I think there's really, I'll give you two opposite answers. 02:12:07.480 |
One is, I hope if it's useful, if it solves problems, 02:12:10.960 |
and if people care about those problems being solved, 02:12:19.880 |
the question is, is it solving an important problem 02:12:27.240 |
that they're willing to make the switch and cut over 02:12:30.480 |
and do the pain up front so that they can actually do it? 02:12:34.520 |
And so hopefully Mojo will be that for a bunch of people. 02:12:37.400 |
And people building these hybrid packages are suffering. 02:12:42.240 |
And so I think that we have a good shot of helping people. 02:12:48.480 |
Like it's not my job to say like, everybody should do this. 02:12:52.360 |
Like I hope Python, CPython, like all these implementations, 02:12:57.320 |
It's also a bunch of different implementations 02:13:00.360 |
And this ecosystem is really powerful and exciting, 02:13:05.880 |
It's not like TypeScript or something is gonna go away. 02:13:11.800 |
And so I hope that Mojo is exciting and useful to people. 02:13:29.840 |
There's the dopamine hit of saying, holy shit, 02:13:34.060 |
This little piece of code is 10 times faster in Mojo. 02:13:41.360 |
I mean, just even that, I mean, that's the dopamine hit 02:13:44.120 |
that every programmer sort of dreams of is the optimization. 02:13:58.700 |
But so what do you see that would be like commonly, 02:14:10.760 |
what do you think would be the thing where people like try 02:14:14.640 |
and then use it regularly and it kind of grows 02:14:27.280 |
and learning new things and throwing themselves into the deep end 02:14:27.280 |
the people that don't actually care about the language, 02:14:50.680 |
those people do like learning new things, right? 02:14:54.080 |
And so you talk about the dopamine rush of 10x faster. 02:14:59.200 |
here's the thing I've heard about in a different domain 02:15:07.360 |
And so one thing that I think is cool about Mojo, 02:15:10.920 |
and again, this will take a little bit of time for, 02:15:22.360 |
you can start with the world you already know 02:15:36.040 |
and want to rewrite everything and like whatever, 02:15:39.240 |
But I think the middle path is actually the more likely one 02:15:41.760 |
where it's, you know, you come out with a new idea 02:15:46.400 |
and you discover, wow, that makes my code way simpler, 02:15:48.760 |
way more beautiful, way faster, way whatever. 02:15:53.020 |
Now, if you fast forward and you said like 10 years up, 02:15:56.800 |
right, I can give you a very different answer on that, 02:16:02.040 |
and look at what computers looked like 20 years ago, 02:16:05.400 |
every 18 months they got faster for free, right? 02:16:13.280 |
You go back 10 years ago and we entered in this world 02:16:15.760 |
where suddenly we had multi-core CPUs and we had GPUs. 02:16:27.000 |
And so, and 10 years ago, it was CPUs and GPUs and graphics. 02:16:58.520 |
Physics isn't going back to where we came from. 02:17:00.800 |
It's only gonna get weirder from here on out, right? 02:17:03.460 |
And so to me, the exciting part about what we're building 02:17:06.920 |
is it's about building that universal platform, 02:17:12.920 |
'cause again, I don't think it's avoidable, it's physics, 02:17:15.400 |
but we can help lift people's scale, do things with it, 02:17:24.520 |
then I think that it will be hopefully quite interesting 02:17:32.180 |
maybe analog computers will become a thing or something, 02:17:37.080 |
where we can move this programming model forward, 02:17:40.040 |
but do so in a way where we're lifting people 02:17:45.080 |
without forcing them to rewrite all their code. 02:17:46.760 |
- Do you think there'll be a few major libraries 02:17:51.080 |
- Well, so I mean, the modular engine's all Mojo. 02:17:56.560 |
So again, come back to like, we're not building Mojo 02:18:00.400 |
because we had to to solve these accelerators. 02:18:03.720 |
But I mean, ones that are currently in Python. 02:18:05.800 |
- Yeah, so I think that a number of these projects will. 02:18:10.240 |
like each of the package maintainers also has, 02:18:14.480 |
People don't like, really don't like rewriting code 02:18:29.520 |
turns out that redesigning something while you rewrite it 02:18:45.820 |
again, if you have a package that is half C and half Python, 02:18:52.820 |
make it easier to debug and evolve your tech. 02:18:55.880 |
Adopting Mojo kind of makes sense to start with. 02:19:00.220 |
- So the two big gains are that there's a performance gain, 02:19:16.340 |
but that's actually a pretty big thing, right? 02:19:19.080 |
- And so there's a bunch of different aspects 02:19:24.300 |
like I've been working on these kinds of technologies 02:19:35.560 |
Swift's now 13 years old from when I started it. 02:19:47.120 |
and I was involved with it for 12 years or something, right? 02:20:02.000 |
I learned a tremendous amount about building languages, 02:20:04.240 |
about building compilers, about working with community, 02:20:07.760 |
And so that experience, like I'm helping channel 02:20:26.440 |
And I think that MLIR is a way better system than LLVM was. 02:20:33.080 |
But I hope that Mojo will take the next step forward 02:21:03.440 |
but we have not given it to lots of people yet. 02:21:09.920 |
so that if it crashes, we can do something about it. 02:21:16.560 |
but we're having like one person a minute sign up 02:21:33.840 |
yeah, what that's running on is that's running on cloud VMs. 02:21:37.320 |
And so you share a machine with a bunch of other people, 02:21:43.880 |
And so what you're doing is you're getting free compute 02:21:50.700 |
that it doesn't totally crash and be embarrassing, right? 02:22:00.820 |
- So that's the goal, to be able to download locally. 02:22:05.900 |
And so we just want to make sure that we do it right. 02:22:07.720 |
And I think this is one of the lessons I learned 02:22:14.880 |
gosh, it feels like forever ago, it was 2014. 02:22:31.980 |
at that point, about 250 people at Apple knew about it. 02:22:36.580 |
Apple's good at secrecy, and it was a secret project. 02:22:42.940 |
and said, "Developers, you're gonna be able to develop 02:22:45.540 |
"and submit apps to the App Store in three months." 02:22:49.420 |
- Well, several interesting things happened, right? 02:22:51.740 |
So first of all, we learned that, A, it had a lot of bugs. 02:22:58.280 |
And it was extremely stressful in terms of like, 02:23:01.100 |
trying to get it working for a bunch of people. 02:23:03.720 |
And so what happened was we went from zero to, 02:23:07.920 |
Apple had at the time, but a lot of developers overnight, 02:23:13.940 |
and it was very stressful for everybody involved, right? 02:23:19.920 |
The other thing I learned is that when that happened, 02:23:38.960 |
is that the push from launch to, first of all, the fall, 02:23:43.280 |
but then to 2.0 and 3.0, and like, all the way forward, 02:23:46.900 |
was super painful for the engineering team and myself. 02:23:53.080 |
The developer community was very grumpy about it, 02:23:55.080 |
because they're like, "Okay, well, wait a second. 02:23:59.880 |
And it was just a lot of tension and friction on all sides. 02:24:03.840 |
There's a lot of technical debt in the compiler, 02:24:13.120 |
but you never have time to go back and do it right. 02:24:17.480 |
because they've come, I mean, we, but they came so far, 02:24:22.600 |
and made so much progress over this time since launch. 02:24:29.480 |
But I just don't want to do that again, right? 02:24:31.840 |
- So, iterate more through the development process. 02:24:35.520 |
- And so what we're doing is we're not launching it 02:24:40.600 |
We're launching it and saying it's 0.1, right? 02:24:43.200 |
And so we're setting expectations of saying like, 02:24:47.920 |
Right, if you're interested in what we're doing, 02:24:49.720 |
we'll do it in an open way, and we can do it together, 02:24:54.960 |
Like, we'll get there, but let's do it the right way. 02:25:01.120 |
The thing that I want to do is build the world's best thing. 02:25:08.280 |
it doesn't matter if it takes an extra two months. 02:25:13.760 |
and not being overwhelmed with technical debt 02:25:16.160 |
and things like this is like, again, war wounds. 02:25:23.920 |
even though right now people are very frustrated 02:25:27.200 |
or it doesn't have feature X or something like this. 02:25:30.240 |
- What have you learned in the little bit of time 02:25:38.280 |
that people have been complaining about feature X or Y or Z? 02:25:53.040 |
- Yeah, yeah, well, so I mean, I've been very pleased. 02:25:54.760 |
I mean, in fact, I mean, we've been massively overwhelmed 02:25:57.400 |
with response, which is a good problem to have. 02:26:09.200 |
which was just not yet a year and a half ago, 02:26:12.120 |
so it's still a pretty new company, new team, 02:26:20.240 |
that there's a set of problems that we need to solve. 02:26:23.160 |
then people will be interested in what we're doing, right? 02:26:26.240 |
But again, you're building in basically secret, right? 02:26:34.760 |
and understand what you wanna do and how to explain it. 02:26:37.360 |
Often when you're doing disruptive and new kinds of things, 02:26:40.960 |
just knowing how to explain it is super difficult, right? 02:26:44.480 |
And so when we launched, we hoped people would be excited, 02:26:50.480 |
but I'm also like, don't wanna get ahead of myself. 02:26:55.600 |
I think their heads exploded a little bit, right? 02:27:01.560 |
that has built some languages and some tools before. 02:27:08.320 |
in the Python ecosystem and giving it the love 02:27:12.440 |
And I think people got very excited about that. 02:27:15.480 |
I mean, I think people are excited about ownership 02:27:19.440 |
And there's people that are very excited about that. 02:27:24.280 |
I made Game of Life go 400 times faster, right? 02:27:29.760 |
There are people that are really excited about the, 02:27:31.400 |
okay, I really hate writing stuff in C++, save me. 02:27:34.600 |
- Like systems engineer, they're like stepping up like, 02:27:45.160 |
- I get third person excitement when people tweet, 02:27:49.520 |
yeah, I made this code, Game of Life or whatever faster. 02:27:58.000 |
let me cast blame out to people who deserve it. 02:28:03.040 |
- These terrible people who convinced me to do some of this. 02:28:09.680 |
- Well, he's been pushing for this kind of thing. 02:28:14.060 |
for a long, long time. - He's wanted this for years. 02:28:16.760 |
Jeremy Howard, he's like one of the most legit people 02:28:24.800 |
he's an incredible educator, he's an incredible teacher, 02:28:26.840 |
but also legit in terms of a machine learning engineer 02:28:33.640 |
and looking, I think, for exactly what you've done. 02:28:46.280 |
this guy is ridiculous, is when I was at Google 02:28:49.380 |
and we were bringing up TPUs and we had a whole team 02:28:51.520 |
of people and there was this competition called DawnBench 02:29:01.520 |
And Jeremy and one of his researchers crushed Google 02:29:05.680 |
not through sheer force of the amazing amount of compute 02:29:11.560 |
that he just decided that progressive image resizing 02:29:14.760 |
was the right way to train the model in fewer epochs, faster, 02:29:24.160 |
So you can say, anyways, come back to, you know, 02:29:37.880 |
pragmatic view that he has about machine learning 02:29:40.680 |
that I don't know if it's like this mix of a desire 02:29:45.680 |
for efficiency but ultimately grounded in a desire 02:29:50.260 |
to make machine learning more accessible to a lot of people. 02:29:54.680 |
I guess that's coupled with efficiency and performance 02:29:58.360 |
but it's not just obsessed about performance. 02:30:01.280 |
- So a lot of AI and AI research ends up being 02:30:07.280 |
So a lot of people don't actually care about performance, 02:30:10.880 |
until it allows them to have a bigger data set, right? 02:30:14.040 |
And so suddenly now you care about distributed compute 02:30:18.600 |
like you don't actually wanna know about that. 02:30:20.200 |
You just want to be able to do more experiments faster 02:30:25.040 |
And so Jeremy has been really pushing the limits. 02:30:29.920 |
and there's many things I could say about Jeremy 02:30:31.600 |
'cause I'm a fanboy of his, but he, it fits in his head. 02:30:36.800 |
And Jeremy actually takes the time where many people don't 02:30:39.560 |
to really dive deep into why is the beta parameter 02:30:49.260 |
what are all the activation functions and the trade-offs 02:30:51.280 |
and why is it that everybody that does this model 02:30:55.800 |
- So the why, not just trying different values, 02:31:05.160 |
but he spends time to understand things at a depth 02:31:09.720 |
And as you say, he then brings it and teaches people. 02:31:35.840 |
have been these really fragile, fragmented things 02:31:43.900 |
So what about, so Python has this giant ecosystem 02:31:50.320 |
of packages and there's a package repository. 02:31:54.800 |
Do you have ideas of how to do that well for Mojo? 02:32:00.720 |
- Well, so that's another really interesting problem 02:32:07.020 |
Python packaging, a lot of people have very big pain points 02:32:14.980 |
- Building and distributing and managing dependencies 02:32:26.940 |
and then they get updated and things like this. 02:32:33.560 |
I think this is one of the reasons why it's great 02:32:35.680 |
that we work as a team and there's other really good 02:32:39.460 |
But one of the things I've heard from smart people 02:32:44.400 |
who've done a lot of this is that the packaging 02:32:50.320 |
And so if you have this problem where you have code split 02:32:54.400 |
between Python and C, now not only do you have to package 02:33:02.580 |
C doesn't have a dependency versioning management system. 02:33:05.740 |
Right, and so I'm not experienced in the state of the art 02:33:09.060 |
and all the different Python package managers, 02:33:12.540 |
but my understanding is that's a massive part 02:33:14.860 |
of the problem and I think Mojo solves that part 02:33:19.300 |
Now, one of the things I think we'll do with the community, 02:33:27.420 |
is that I think that we will have an opportunity 02:33:34.760 |
given the new tools and technologies and the cool things 02:33:36.880 |
we have that we've built up, because we have not just 02:33:39.360 |
syntax, we have an entirely new compiler stack 02:33:41.520 |
that works in a new way, maybe there's other innovations 02:33:44.320 |
we can bring together and maybe we can help solve 02:33:50.840 |
it was always surprising to me that it was not easier 02:34:13.840 |
like a search and discovery, as YouTube calls it. 02:34:18.840 |
- Well, I mean, it's kind of funny because this is one 02:34:22.360 |
of the challenges of these intentionally decentralized 02:34:26.160 |
communities, and so I don't know what the right answer 02:34:28.740 |
is for Python, I mean, there are many people that would, 02:34:32.180 |
or I don't even know the right answer for Mojo. 02:34:35.180 |
So there are many people that would have much more 02:34:37.540 |
informed opinions than I do, but it's interesting 02:34:39.700 |
if you look at this, right, open source communities, 02:34:42.200 |
you know, there's Git, Git is a fully decentralized, 02:34:50.100 |
centralized, commercial in that case, right, thing, 02:34:54.300 |
really help pull together and help solve some 02:34:57.760 |
a more consistent community, and so maybe there's 02:35:04.320 |
- Although even GitHub, I might be wrong on this, 02:35:06.600 |
but the search and discovery for GitHub is not that great. 02:35:13.140 |
- Yeah, well, I mean, maybe that's because GitHub 02:35:15.760 |
doesn't want to replace Google Search, right? 02:35:18.680 |
I think there is room for specialized solutions 02:35:23.440 |
- I don't know, I don't know the right answer 02:35:24.680 |
for GitHub either, that's, they can go figure that out. 02:35:28.720 |
- But the point is to have an interface that's usable, 02:35:31.040 |
that's accessible to people of all different skill levels. 02:35:33.440 |
- Well, and again, like, what are the benefit 02:35:35.440 |
of standards, right, standards allow you to build 02:35:37.500 |
these next level up ecosystem, next level up infrastructure, 02:35:41.040 |
next level up things, and so, again, come back to, 02:35:44.840 |
I hate complexity, C plus Python is complicated. 02:35:49.320 |
It makes everything more difficult to deal with, 02:35:51.400 |
it makes it difficult to port, move code around, 02:35:53.760 |
work with, all these things get more complicated, 02:35:59.760 |
by helping reduce the amount of C in this ecosystem 02:36:03.880 |
- So any kind of packages that are hybrid in nature 02:36:19.720 |
- So we talked about, obviously, indentation, 02:36:22.240 |
that it's a typed language, or optionally typed. 02:36:27.280 |
- It's either optionally or progressively, or-- 02:36:29.960 |
- I think, so people have very strong opinions 02:36:36.920 |
So there's the var versus let, but let is for constants. 02:36:44.520 |
- Yeah, var makes it mutable, so you can reassign. 02:36:54.480 |
- I mean, there's a lot of source of happiness for me, 02:37:03.920 |
why does Python not have function overloading? 02:37:10.600 |
The way it works is that Python and Objective-C 02:37:15.320 |
are actually very similar worlds if you ignore syntax. 02:37:20.120 |
And so Objective-C is straight-line derived from Smalltalk, 02:37:31.880 |
but the people that remember it love it, generally. 02:37:39.120 |
and the dictionary maps from the name of a function, 02:37:45.680 |
And so the way you call a method in Objective-C 02:37:51.000 |
is I go look up foo, I get a pointer to the function back, 02:37:58.160 |
is that the dictionary within a Python object, 02:38:01.520 |
all the keys are strings, and it's a dictionary, 02:38:13.200 |
Why do they not change it to not be a dictionary? 02:38:14.800 |
Why do they not change it, like, do other things? 02:38:24.240 |
now if I get passed an integer, do some dynamic tests for it, 02:38:24.240 |
which is even if you did support overloading, 02:38:35.120 |
of a function for integers and a function for strings. 02:38:39.640 |
in that dictionary, you'd have to have the caller 02:38:41.640 |
do the dispatch, and so every time you call the function, 02:38:44.520 |
you'd have to say, is it an integer, is it a string? 02:38:46.520 |
And so you'd have to figure out where to do that test. 02:38:50.440 |
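The caller-side dispatch he describes can be sketched in plain Python; the helper names here are hypothetical, just to show where the type test would have to live if one name had to serve two implementations:

```python
# Hypothetical sketch: Python maps one name to one function, so
# "overloading" forces a type test at the call site (or inside a
# single dispatcher), rather than the compiler picking the overload.
def handle_int(x):
    return x + 1

def handle_str(x):
    return x.upper()

def foo(x):
    # The dispatch test that would otherwise run on every call.
    if isinstance(x, int):
        return handle_int(x)
    elif isinstance(x, str):
        return handle_str(x)
    raise TypeError(f"unsupported type: {type(x).__name__}")
```

With static overloading, the compiler resolves `foo(1)` and `foo("hi")` to different functions at compile time, so this per-call test disappears.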
overloading is something you don't have to have. 02:38:58.000 |
and in Python, if you subscript with an integer, 02:39:02.720 |
then you get typically one element out of a collection. 02:39:12.360 |
you'll wanna be able to express the fact that, 02:39:16.640 |
depending on what I actually pass into this thing. 02:39:20.680 |
and more predictable and faster and all these things. 02:39:26.360 |
but it also feels empowering in terms of clarity. 02:39:29.600 |
Like you don't have to design whole different functions. 02:39:32.520 |
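The subscript case he mentions is easy to see in ordinary Python: the same `[]` syntax returns different kinds of values depending on what you pass in.

```python
# The same subscript operator behaves differently by argument type:
data = [10, 20, 30, 40]
element = data[1]     # an int index yields one element
window = data[1:3]    # a slice yields a new list
```

Expressing that "one element for an int, a collection for a slice" contract in a typed signature is exactly what overloading makes natural.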
- Yeah, well, and this is also one of the challenges 02:39:38.600 |
is that in practice, like you take subscript, 02:39:45.560 |
They actually have different behavior in different cases. 02:39:47.720 |
And so this is why it's difficult to retrofit this 02:39:50.680 |
into existing Python code and make it play well with typing. 02:39:57.520 |
- Okay, so there's an interesting distinction 02:40:00.400 |
that people that program Python might be interested in 02:40:04.560 |
So it's two different ways to define a function. 02:40:13.680 |
What's the coolness that comes from the strictness? 02:40:26.640 |
you've decided compatibility with existing code 02:40:36.160 |
So that means you put a lot of time into compatibility 02:40:38.480 |
and it means that you get locked into decisions of the past, 02:40:41.960 |
even if they may not have been a good thing, right? 02:40:44.320 |
Now, systems programmers typically like to control things. 02:40:52.280 |
and even systems programmers are not one thing, right? 02:40:57.480 |
And so one of the things that Python has, for example, 02:41:00.360 |
as you know, is that if you define a variable, 02:41:15.280 |
Right, well, the compiler, the Python compiler doesn't know, 02:41:18.480 |
in all cases, what you're defining and what you're using. 02:41:21.060 |
And did you typo the use of it or the definition? 02:41:24.640 |
Right, and so for people coming from typed languages, 02:41:28.760 |
again, I'm not saying they're right or wrong, 02:41:41.360 |
intentionally declare your variables before you use them. 02:41:51.720 |
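A small Python illustration of the typo hazard that required declarations catch:

```python
# In Python, assigning to a misspelled name silently creates a new
# variable instead of updating the one you meant.
total = 0
for n in [1, 2, 3]:
    totl = total + n  # typo: binds 'totl'; 'total' never changes

# 'total' is still 0. A language that requires declaring variables
# before use would reject the undeclared 'totl' at compile time.
```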
And this is a way that Mojo is both compatible, 02:42:05.400 |
- But usually if you're writing Mojo code from scratch, 02:42:09.900 |
- It depends, again, it depends on your mentality, right? 02:42:22.760 |
Are you playing around and scripting something out? 02:42:48.440 |
And I love other people who like strict things, right? 02:42:50.400 |
But I don't want to say that that's the right thing, 02:42:55.240 |
for hacking around and doing stuff and research 02:42:57.120 |
and these other cases where you may not want that. 02:42:59.560 |
You see, I just feel like, maybe I'm wrong on that, 02:43:02.560 |
but it feels like strictness leads to faster debugging. 02:43:05.640 |
So in terms of going from, even on a small project, 02:43:11.880 |
I guess it depends how many bugs you generate, usually. 02:43:20.680 |
if you study some of these languages over time, 02:43:27.180 |
pretty established community, but along their path, 02:43:32.260 |
So I think that the Ruby community has really pushed forward 02:43:38.580 |
that caught a lot of bugs at compile time, right? 02:43:43.180 |
You can have good testing and good types, right? 02:43:55.540 |
And if you typo something, it doesn't matter. 02:43:59.220 |
And so I think that the trade-offs are very different 02:44:01.180 |
if you're building a large-scale production system 02:44:04.740 |
versus you're building and exploring a notebook. 02:44:09.180 |
if you look at code I read just for myself for fun, 02:44:19.740 |
- It's basically saying in a dictatorial way, 02:44:31.900 |
But that is the sign of somebody who likes control. 02:44:41.100 |
Speaking of asserts, exceptions are called errors. 02:44:47.420 |
- So, I mean, we use the same, we're the same as Python, 02:44:51.380 |
right, but we implement in a very different way, right? 02:44:59.340 |
C++ has a thing called zero-cost exception handling. 02:45:15.340 |
the way it works is that it's called zero-cost 02:45:21.140 |
there's supposed to be no overhead for the non-error code. 02:45:25.300 |
And so, it takes the error path out of the common path. 02:45:41.980 |
and so throwing an error could be like 10,000 times 02:45:44.540 |
more expensive than returning from a function, right? 02:45:49.840 |
but it's not zero-cost by any stretch of the imagination 02:45:52.860 |
because it massively bloats out your code, your binary. 02:45:55.820 |
It also adds a whole bunch of different paths 02:45:59.060 |
because of destructors and other things like that 02:46:06.220 |
And so, this thing that was called zero-cost exceptions, 02:46:11.740 |
Okay, now, if you fast-forward to newer languages, 02:46:24.980 |
and so it's got a little bit of a different thing going on, 02:46:32.780 |
"Okay, well, let's not do that zero-cost exception 02:46:42.860 |
"returning either the normal result or an error." 02:46:46.780 |
Now, programmers generally don't want to deal 02:46:54.740 |
And so, you use all the syntax that Python gives us. 02:47:01.500 |
You can put a raises decorator on your function, 02:47:06.660 |
And then, the language can provide syntax for it, 02:47:09.860 |
but under the hood, the way the computer executes it, 02:47:24.780 |
But this has a huge impact on the way you design your APIs. 02:47:29.780 |
So, in C++, huge communities turn off exceptions 02:47:37.500 |
And so, the "zero-cost" cost is so high, right? 02:47:40.420 |
And so, that means you can't actually use exceptions 02:47:48.640 |
well, okay, how and when do you wanna pay the cost? 02:47:51.880 |
If I try to open a file, should I throw an error? 02:47:55.200 |
Well, what if I'm probing around looking for something, 02:47:58.160 |
right, I'm looking it up in many different paths. 02:48:07.140 |
and I have two different versions of the same thing, 02:48:11.640 |
And so, you know, one of the things I learned 02:48:14.040 |
from Apple, and that I still love, is the art of API design 02:48:18.800 |
I think this is something that Python's also done 02:48:20.400 |
a pretty good job at in terms of building out 02:48:24.440 |
It's about having standards and things like this. 02:48:26.720 |
And so, you know, we wouldn't wanna enter a mode 02:48:28.840 |
where, you know, there's this theoretical feature 02:48:35.040 |
Now, I'll also say one of the other really cool things 02:48:40.440 |
and it can run on accelerators and things like this, 02:48:47.400 |
And so, this is also part of how Mojo can scale 02:48:49.800 |
all the way down to, like, little embedded systems 02:49:03.520 |
and how they work in code during compilation? 02:49:06.860 |
So, just this idea of percolating up a thing, an error. 02:49:11.880 |
- Yeah, yeah, so the way to think about it is, 02:49:15.160 |
think about a function that doesn't return anything, 02:49:18.320 |
And so, you have function one calls function two 02:49:25.040 |
along that call stack that are try blocks, right? 02:49:28.160 |
And so, if you have function one calls function two, 02:49:31.840 |
and then within it, it calls function three, right? 02:49:40.720 |
Well, if it returns, it's supposed to go back out 02:49:42.560 |
and continue executing and then fall off the bottom 02:49:44.520 |
of the try block and keep going, and all's good. 02:49:47.440 |
If the function throws, you're supposed to exit 02:49:49.800 |
the current function and then get into the except clause, 02:49:57.400 |
And so, the way that a compiler like Mojo works 02:50:02.680 |
which happens in the except block, calls a function, 02:50:02.680 |
you return nothing, and if you throw an error, 02:50:19.960 |
you return the variant that is, I'm an error, right? 02:50:24.480 |
So, when you get to the call, you say, okay, cool, 02:50:27.720 |
Hey, I know locally I'm in a try block, right? 02:50:34.120 |
Aha, if it's that error thing, jump to the except block. 02:50:37.360 |
- And that's all done for you behind the scenes. 02:50:39.720 |
- Exactly, and so the compiler does all this for you. 02:50:43.360 |
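The lowering he's describing can be sketched by hand in Python; this is a hypothetical sketch of the compiler's output shape, not Mojo's actual codegen:

```python
# Hypothetical sketch of lowering 'raises': the function returns
# either a normal result or an error value (a variant), and the
# caller's try/except becomes an ordinary branch on which variant
# came back -- no stack unwinding machinery needed.
class Err:
    def __init__(self, message):
        self.message = message

def parse_positive(text):
    # Instead of throwing, return the Err variant on failure.
    if not text.isdigit():
        return Err(f"not a positive integer: {text!r}")
    return int(text)

def caller(text):
    result = parse_positive(text)
    if isinstance(result, Err):  # the compiler-inserted check:
        return -1                # "jump to the except block"
    return result * 2            # the normal fall-through path
```

The programmer still writes `try`/`except` and `raise`; the branch on the variant is what the compiler emits under the hood.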
if you dig into how this stuff works in Python, 02:50:49.120 |
which now you need to go into, do some stuff, 02:50:55.120 |
- Yeah, and like, this stuff matters for compatibility. 02:51:04.120 |
with some special stuff going on, and so there's-- 02:51:12.680 |
It just feels like it adds a level of complexity. 02:51:16.680 |
and so this is, again, one of the trade-offs you get 02:51:22.920 |
is you get to implement a full fidelity implementation 02:51:33.080 |
about the reality of the world and shake our fist, but-- 02:51:36.440 |
- It always feels like you shouldn't be allowed to do that, 02:51:43.840 |
- Oh, wait, wait, wait, what happened to Lex, the Lisp guy? 02:51:57.960 |
- Wait a sec, wait a sec. - I love Lisp, I love Lisp. 02:52:01.240 |
you're afraid of me irritating the whole internet? 02:52:05.320 |
It worked as a joke in my head and it came out right. 02:52:11.360 |
actually really great for certain things, right? 02:52:16.400 |
Closures are pretty cool, and you can pass callbacks, 02:52:21.120 |
- So speaking of which, I don't think you have 02:52:28.920 |
- We don't have lambda syntax, but we do have-- 02:52:32.760 |
- There's a few things on the roadmap that you have 02:52:34.600 |
that it'd be cool to sort of just fly through 02:52:37.320 |
'cause it's interesting to see how many features 02:52:49.800 |
like the parentheses are not parentheses, that. 02:52:52.480 |
- Yeah, this is just a totally syntactic thing. 02:53:01.400 |
- Yeah, so this is where in Python you can say 02:53:07.720 |
- That's a nice sort of self-documenting feature. 02:53:11.240 |
- Yeah, and again, this isn't rocket science to implement, 02:53:15.880 |
- The bigger features are things like traits. 02:53:19.880 |
So traits are when you wanna define abstract, 02:53:27.680 |
and so you wanna say I wanna write this function, 02:53:41.960 |
and I'm not gonna go into ring theory or something, 02:53:47.320 |
if you can add, subtract, multiply, divide it, for example. 02:54:00.200 |
all these tensors and floating point integer, 02:54:04.960 |
and then I can define on an orthogonal axis algorithms 02:54:08.520 |
that then work against types that have those properties. 02:54:11.280 |
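Mojo's trait design isn't final, but the idea can be approximated with Python's `typing.Protocol`: declare an abstract capability, then write generic algorithms against it rather than against concrete types.

```python
from typing import Protocol

class Addable(Protocol):
    # Any type providing __add__ satisfies this "trait"-like protocol.
    def __add__(self, other): ...

def total(items: list[Addable]):
    # A generic algorithm written against the capability, not a
    # concrete type: works for ints, floats, strings, ...
    acc = items[0]
    for item in items[1:]:
        acc = acc + item
    return acc
```

A trait system checks this conformance at compile time; `Protocol` here only documents it, which is the gap Mojo's traits are meant to close.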
And so this is a, again, it's a widely known thing. 02:54:20.320 |
which is where everybody learns their tricks from, 02:54:26.960 |
and that'll enable a new level of expressivity. 02:54:40.200 |
and there's detail stuff like whole module import, 02:54:50.300 |
so being able to have variables outside of a top-level-- 02:54:54.440 |
- Well, and so this comes back to where Mojo came from 02:54:59.040 |
And so we're building, so Modular's building an AI stack, 02:55:03.560 |
right, and an AI stack has a bunch of problems 02:55:05.680 |
working with hardware and writing high-performance kernels 02:55:08.760 |
and doing this kernel fusion thing I was talking about 02:55:12.680 |
and so we've really prioritized and built Mojo 02:55:23.080 |
By the way, Mojo's only like seven months old, 02:55:27.800 |
- I mean, part of the reason I wanted to mention 02:55:29.280 |
some of these things is like there's a lot to do 02:55:44.220 |
- Yeah, and so, I mean, but also you look into, 02:55:52.920 |
if you define it, it will get destroyed automatically. 02:55:59.800 |
given the way the ownership system has to work, 02:56:03.880 |
is a huge step forward from what Rust and Swift have done. 02:56:10.080 |
- Yeah, so like say you have a string, right? 02:56:12.000 |
So you just find a string on the stack, okay? 02:56:14.040 |
Or whatever that means, like in your local function. 02:56:17.440 |
Right, and so you say, like whether it be in a def, 02:56:20.760 |
and so you just say x equals hello world, right? 02:56:24.080 |
Well, if your string type requires you to allocate memory, 02:56:27.920 |
then when it's destroyed, you have to deallocate it. 02:56:36.140 |
Well, it gets run sometime between the last use of the value 02:56:46.480 |
Like in this, you now get into garbage collection, 02:56:59.360 |
If you look at C++, the way this works is that 02:57:06.520 |
they get destroyed in a last in, first out order. 02:57:16.840 |
and you define a whole bunch of values at the top, 02:57:20.040 |
and then you do a whole bunch of code that doesn't use them, 02:57:22.640 |
they don't get destroyed until the very end of that scope. 02:57:32.920 |
you talk about reference counting optimizations 02:57:35.120 |
and things like this, a bunch of very low level things. 02:57:37.920 |
And so what Mojo does is it has a different approach on that 02:57:45.000 |
And by doing that, you get better memory use, 02:57:58.640 |
that are already built in there in Mojo today, 02:58:01.400 |
that are the things that nobody talks about generally, 02:58:07.880 |
- Is it trivial to know what's the soonest possible 02:58:11.840 |
to delete a thing that's not gonna be used again? 02:58:14.000 |
- Yeah, well, I mean, it's generally trivial, 02:58:19.360 |
and then you have some use of X somewhere in your code. 02:58:26.280 |
So you can only use something within its scope. 02:58:51.760 |
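The last-use-versus-end-of-scope distinction can be demonstrated with CPython's reference counting, which is implementation-specific but deterministic here; this is an analogy for ASAP destruction, not how Mojo itself executes:

```python
# Illustration (CPython-specific): dropping the last reference frees
# an object immediately, before the enclosing scope ends -- analogous
# to Mojo running destructors right after a value's last use, rather
# than C++'s last-in-first-out destruction at the closing brace.
log = []

class Tracked:
    def __del__(self):
        log.append("destroyed")

def demo():
    x = Tracked()
    log.append("last use of x")
    del x  # no later uses: release now, not at end of function
    log.append("lots of unrelated work")

demo()
# Destruction lands between the last use and the unrelated work,
# so the memory is reusable for the rest of the function body.
```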
- Oh, so you have to insert delete, like in a lot of places. 02:59:05.720 |
But it's extremely powerful when you do that. 02:59:16.200 |
and make sure that every brick you put down is really good 02:59:18.960 |
so that when you put more bricks on top of it, 02:59:28.480 |
do there have to be about particular details, 02:59:30.560 |
like implementation of particular small features? 02:59:55.440 |
That allows way more expressive asynchronous programming. 03:00:06.920 |
The reason the Async/Await got added to Python, 03:00:09.840 |
as far as I know, is because Python doesn't support threads. 03:00:23.160 |
And so, they added this feature called Async/Await. 03:00:29.680 |
and JavaScript and many other places as well. 03:00:35.600 |
'cause we have a high-performance heterogeneous compute 03:00:59.360 |
you know, we have a small team of really good people 03:01:06.880 |
and like all the low-level stuff works together, 03:01:14.600 |
we released Mojo much earlier is so we can get feedback. 03:01:22.920 |
- We use an ampersand, and now it's named in-out. 03:01:37.920 |
because again, if you scale something really fast 03:01:51.960 |
- Could you incorporate an emoji into the language, 03:02:10.320 |
Like an exception, throw an exception of some sort? 03:02:14.920 |
- Or maybe a heart one, it has to be a heart one. 03:02:23.720 |
- I'm gonna use the viral nature of the internet 03:02:30.320 |
- I mean, it's funny, you come back to the flame emoji, 03:02:40.120 |
'cause for example, the people at GitHub say, 03:02:43.680 |
- Yeah, there's something, it's reinvigorating. 03:02:56.320 |
- I think the world is ready for this stuff, right? 03:03:13.760 |
or excited about that you're thinking about a lot? 03:03:21.640 |
And so Lifetimes give you safe references to memory 03:03:27.760 |
And so this has been done in languages like Rust before, 03:03:29.920 |
and so we have a new approach, which is really cool. 03:03:43.800 |
A lot of it is these kind of table stakes features. 03:03:57.940 |
And so it's like, oh, well, this annoying thing, 03:04:01.280 |
like in Python, you have to spell underbar underbar add. 03:04:11.520 |
it makes sense, it's beautiful, it's obvious. 03:04:18.960 |
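The "underbar underbar add" he refers to is Python's operator hook, which Mojo inherits; a minimal example:

```python
# Python spells the '+' operator hook as __add__ ("dunder add").
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Invoked when you write a + b on two Meters values.
        return Meters(self.value + other.value)

distance = Meters(3) + Meters(4)
```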
one of which is that, again, lesson learned with Swift, 03:04:23.480 |
which may be a good thing, maybe not, I don't know. 03:04:28.120 |
But because it's such an easy and addictive thing to do, 03:04:31.840 |
sugar, like, it makes people go crazy, right? 02:54:46.760 |
And so we wanna work with the broader Python community 03:04:53.520 |
and we need to build them out to understand them. 03:04:57.640 |
I wanna make sure that we go back to the Python community 03:05:04.840 |
And syntactic sugar just makes all that more complicated. 03:05:08.000 |
- And yeah, list comprehensions are yet to be implemented. 03:05:21.520 |
it's actually still quite interesting and useful. 03:05:35.040 |
this incredible stack that's going to perhaps define 03:05:38.040 |
the future of development of our AI overlords. 03:05:54.080 |
Maybe one question is, how do you hire great programmers, 03:06:10.960 |
and maybe are a little bit fluid in what they can do? 03:06:16.400 |
- So building a company is just as interesting 03:06:24.840 |
And I've built a lot of teams in a lot of different places. 03:06:27.840 |
If you zoom in from the big problem into recruiting, 03:06:33.880 |
I'll just, I'll be very straightforward about this. 03:06:44.880 |
in the industry, and if we solve those problems, 03:06:49.240 |
But the problem is, is that the people we need to hire, 03:06:51.440 |
as you say, are all these super specialized people 03:06:54.320 |
that have jobs at big tech, big tech worlds, right? 03:07:04.840 |
or we don't have product market fit challenges, 03:07:09.560 |
and so many of them are suffering, and they want help. 03:07:12.000 |
And so, again, we started with strong conviction. 03:07:14.600 |
Now, again, you have to hire and recruit the best, 03:07:24.020 |
That's usually not something a company starts with. 03:07:35.440 |
is super passionate about making sure that that's right, 03:07:41.760 |
- Can you comment, sorry, before we get to the second, 03:07:45.700 |
- So, I mean, there's many different cultures, 03:07:48.320 |
and I have learned many things from many different people. 03:08:03.220 |
I believe in amazing people working together. 03:08:10.800 |
you have amazing people, and they're fighting each other. 03:08:13.300 |
I see amazing people, and they're told what to do. 03:08:16.500 |
Like, "Thou shalt line up and do what I say." 03:08:24.860 |
They're just kind of floating in different places. 03:08:27.180 |
And they wanna be amazing, they just don't know how. 03:08:29.300 |
And so a lot of it starts with have a clear vision. 03:08:32.060 |
Right, and so we have a clear vision of what we're doing. 03:08:39.940 |
And so a lot of the Apple DNA rubbed off on me. 03:08:43.580 |
My co-founder, Tim, also is like a strong product guy. 03:08:52.220 |
You don't work from, like, come up with a cool product 03:09:06.820 |
And if your product can help solve their problems, 03:09:12.140 |
And so if you speak to them about their problems, 03:09:17.000 |
then you can work backwards to building an amazing product. 03:09:19.640 |
- So the vision starts by defining the problem. 03:09:21.780 |
- And then you can work backwards in solving technology. 03:09:25.060 |
like it's, I think, pretty famously said that, 03:09:32.020 |
I would refine that to say that there's 100 not yet's 03:09:36.580 |
- But famously, if you go back to the iPhone, for example, 03:09:39.660 |
right, the iPhone 1, I mean, many people laughed at it 03:09:42.940 |
because it didn't have 3G, it didn't have copy and paste. 03:09:47.500 |
And then a year later, okay, finally it has 3G, 03:09:50.340 |
but it still doesn't have copy and paste, it's a joke. 03:09:53.740 |
blah, blah, blah, blah, blah, blah, blah, right? 03:10:00.060 |
And so being laser focused and having conviction 03:10:07.620 |
to be able to build the right tech is really important. 03:10:24.080 |
so remote first has a very strong set of pros and cons. 03:10:31.040 |
On the one hand, you can hire people from wherever they are 03:10:35.940 |
even if they live in strange places or unusual places. 03:10:47.700 |
And so we've had to learn how to like have a system 03:10:51.540 |
and we get the whole company together periodically. 03:10:58.340 |
to the in-person brainstorming that I guess you lose, 03:11:01.740 |
but maybe you don't, maybe if you get to know each other well 03:11:04.620 |
and you trust each other, maybe you can do that. 03:11:08.340 |
I mean, I'm curious about your experience too. 03:11:09.980 |
The first thing I missed was having whiteboards, right? 03:11:13.860 |
Those design discussions where like I can high intensity, 03:11:22.340 |
figure out and solve the problem and move forward. 03:11:24.840 |
But we figured out ways to work around that now 03:11:37.660 |
The spontaneous things like the coffee bar things 03:11:44.100 |
and getting to know people outside of the transactional 03:11:49.580 |
- And I think there's just a lot of stuff that, 03:11:52.660 |
I'm not an expert at this, I don't know who is. 03:11:56.420 |
but there's stuff that somehow is missing on Zoom. 03:11:59.740 |
Even with the whiteboard, if you look at that, 03:12:02.940 |
if you have a room with one person at the whiteboard 03:12:05.540 |
and then there's like three other people at a table, 03:12:10.140 |
there's a, first of all, there's a social aspect to that 03:12:13.220 |
where you're just shooting the shit a little bit, 03:12:15.780 |
- Yeah, as people are just kind of coming in and-- 03:12:23.500 |
for like seconds at a time, maybe an inside joke. 03:12:27.660 |
It's like this interesting dynamic that happens that Zoom-- 03:12:32.740 |
but through that bonding, you get the excitement. 03:12:35.780 |
There's certain ideas that are like complete bullshit 03:12:50.740 |
- Well, I mean, being in person is a very different thing. 03:13:03.020 |
But what we found is that getting people together, 03:13:06.220 |
whether it be a team or the whole company or whatever, 03:13:08.620 |
is worth the expense because people work together 03:13:13.340 |
Like it just, like there's a massive period of time 03:13:16.460 |
where you like go out and things start getting frayed, 03:13:23.020 |
we work through the disagreement or the misunderstanding, 03:13:28.180 |
And so things like that, I think are really quite important. 03:13:30.740 |
- What about people that are kind of specialized 03:13:33.900 |
in very different aspects of the stack working together? 03:13:38.380 |
- Yeah, well, so I mean, there's lots of interesting people, 03:13:46.380 |
- So one of the, so there's different philosophies 03:13:51.860 |
For me, and so some people say, "Hire 10X programmers," 03:13:56.100 |
and that's the only thing, whatever that means, right? 03:13:58.940 |
What I believe in is building well-balanced teams, 03:14:02.540 |
teams that have people that are different in them. 03:14:18.140 |
And so what I like to do is I like to build teams 03:14:22.780 |
You know, we do have teams that are focused on like runtime 03:14:25.780 |
or compiler, GPU, or whatever the speciality is, 03:14:32.660 |
And I look for people that complement each other. 03:14:35.060 |
And particularly if you look at leadership teams 03:14:38.100 |
you don't want everybody thinking the same way. 03:14:40.700 |
You want people bringing different perspectives 03:14:45.460 |
- That's team, but what about building a company 03:14:49.540 |
So what, are there some interesting lessons there? 03:14:56.660 |
okay, so Modular's the first company I built from scratch. 03:15:05.060 |
was I'm not cleaning up somebody else's mess. 03:15:11.900 |
And also many of the projects I've built in the past 03:15:16.900 |
have not been core to the product of the company. 03:15:23.740 |
MLIR is not Google's revenue machine or whatever, right? 03:15:31.120 |
on the accounting software for the retail giant 03:15:36.240 |
It's like enabling infrastructure and technology. 03:15:45.000 |
Like, it is directly the thing that we're giving to people. 03:15:53.440 |
is they're working on the thing that matters. 03:16:01.720 |
And so that's also pretty exciting and quite nice. 03:16:11.300 |
And so one of the challenges I've had in other worlds 03:16:13.840 |
is it's like, okay, well, community matters somehow 03:16:53.240 |
- And so it's very liberating to be able to decide. 03:16:59.720 |
And it becomes very simple, 'cause I like Lex. 03:17:17.040 |
is that they're able to generate code recently really well. 03:17:39.400 |
because the language models are able to predict 03:17:42.760 |
the kind of code I was about to write so well 03:17:45.880 |
that it makes me wonder how unique my brain is 03:17:48.520 |
and where the valuable ideas actually come from. 03:17:50.960 |
How much do I contribute in terms of ingenuity, 03:18:06.840 |
is they help you stand on the shoulders of giants 03:18:12.860 |
but I just, it would love to get your opinion first, 03:18:15.860 |
high level of what you think about this impact 03:18:19.840 |
of larger language models when they do program synthesis, 03:18:24.520 |
- Yeah, well, so I don't know where it all goes. 03:18:32.800 |
I think that things I've seen are that a lot of the LLMs 03:18:35.960 |
are really good at crushing LeetCode problems 03:18:38.700 |
and they can reverse the linked list like crazy. 03:18:41.760 |
Well, it turns out there's a lot of instances 03:18:44.520 |
of that on the internet and it's a pretty stock thing. 03:18:46.740 |
And so if you want to see standard questions answered, 03:18:50.480 |
LLMs can memorize all the answers and that can be amazing. 03:18:56.760 |
But I think that if you, in my experience, building things, 03:19:04.700 |
or you talk about building an applied solution to a problem, 03:19:16.800 |
because they'll tell you that they want a faster horse. 03:19:23.120 |
I don't feel like we have to compete with LLMs. 03:19:27.600 |
a ton of the mechanical stuff out of the way. 03:19:40.200 |
that will help us all scale and be more productive. 03:20:05.320 |
like when you design a new programming language, 03:20:07.640 |
it almost seems like, man, it would be nice to sort of, 03:20:13.400 |
almost as a way to learn how I'm supposed to use this thing 03:20:16.680 |
for them to be trained on some of the Mojo code. 03:20:21.120 |
so maybe there'll be a Mojo LLM at some point. 03:20:27.800 |
how do we make a language to be suitable for LLMs? 03:20:41.160 |
that we as humans deal with on a continuous basis, 03:20:45.360 |
And yet they're the intermediate representation. 03:20:48.360 |
They're the exchange format that we humans use 03:20:57.560 |
or the human and the compiler, roughly, right? 03:21:09.520 |
- No, the reverse of that, it will actually enable it 03:21:16.560 |
is there would be confusion about the gray area. 03:21:38.000 |
they really want the indentation to be right. 03:21:45.560 |
but LLMs can totally help solve that problem. 03:21:48.640 |
And so I'm very happy about the new predictive coding 03:21:51.800 |
and copilot type features and things like this 03:21:53.760 |
because I think it'll all just make us more productive. 03:21:55.920 |
- It's still messy and fuzzy and uncertain, unpredictable. 03:22:19.280 |
So compilers run fast and they're very efficient 03:22:24.000 |
There's on-device LLMs and there's other things going on. 03:22:35.480 |
into the creative potential of the hallucinations, right? 03:22:40.360 |
And so if you're doing creative brainstorming 03:22:46.700 |
If you're writing code that has to be correct 03:22:58.720 |
algebraic reasoning systems and kind of like figuring out 03:23:05.040 |
And so I think that there could be interesting work 03:23:06.920 |
in terms of building more reliable at scale systems. 03:23:14.560 |
how do you express your intent to the machine? 03:23:16.800 |
And so maybe you want an LLM to provide the spec, 03:23:26.400 |
and inspiration versus the actual implementation. 03:23:31.560 |
- Since a successful Modular will be the thing that runs, 03:23:43.840 |
I know it's a cliché term, but the Internet of Things. 03:23:48.360 |
- So I'll joke and say like AGI should be written in Mojo. 03:23:54.240 |
You're joking, but it's also possible that it's not a joke. 03:24:09.880 |
So I just have to ask you about the big philosophical 03:24:20.680 |
Do you think about the good and the bad that can happen 03:24:31.200 |
and there's a lot of different parts to this problem. 03:24:38.520 |
And so you can zoom into sub parts of this problem. 03:24:41.040 |
I'm not super optimistic about AGI being solved next year. 03:24:47.640 |
I don't think that's gonna happen personally. 03:24:53.920 |
is there's a nervousness because the leap of GPT-4 03:25:04.960 |
- Well, so I mean, there's a couple of things going on there. 03:25:07.760 |
One is I'm sure GPT-5 and 7 and 19 will be also huge leaps. 03:25:12.760 |
They're also getting much more expensive to run. 03:25:17.640 |
in terms of just expense on the one hand, and to train. 03:25:20.360 |
Like that could be a limiter that slows things down. 03:25:23.400 |
But I think the bigger limiter is outside of like, 03:25:29.320 |
thinking about that because if Skynet takes over 03:25:36.360 |
okay, other things to worry about, I'll just focus on. 03:25:41.160 |
But I think that the other thing I'd say is that 03:25:58.680 |
And so I think that I'm not even too worried about 03:26:01.880 |
autonomous cars defining away all the taxi drivers. 03:26:04.720 |
Remember, autonomy was supposed to be solved by 2020. 03:26:08.720 |
- So, and so like, I think that on the one hand, 03:26:12.480 |
we can see amazing progress, but on the other hand, 03:26:14.920 |
we can see that, you know, the reality is a little bit 03:26:18.320 |
more complicated and it may take longer to roll out 03:26:26.760 |
that's built on top of LLMs that runs, you know, 03:26:31.100 |
the millions of apps that could be built on top of them 03:26:34.120 |
and that could be run on millions of devices, 03:26:43.800 |
on human civilization could be truly transformative to it. 03:26:49.520 |
- Well, and so, and there I think it depends on 03:26:52.080 |
are you an optimist or a pessimist or a masochist? 03:26:54.800 |
- Just to clarify, optimist about human civilization. 03:27:01.300 |
And so I look at that as saying, okay, cool, what will AI do? 03:27:08.620 |
I kind of look at it from a, is it gonna unlock us all? 03:27:12.780 |
Right, you talk about coding, is it gonna make it 03:27:14.180 |
so I don't have to do all the repetitive stuff? 03:27:16.780 |
Well, suddenly that's a very optimistic way to look at it 03:27:18.940 |
and you look at what a lot of these technologies 03:27:22.340 |
have done to improve our lives and I want that to go faster. 03:27:25.680 |
- What do you think the future of programming looks like 03:27:37.460 |
like the vision for devices, the hardware to the compilers, 03:27:46.460 |
to my arch nemesis, right, it's complexity, right? 03:27:49.140 |
So again, me being the optimist, if we drive down complexity 03:27:56.360 |
these cool hardware widgets accessible to way more people. 03:28:00.780 |
is more personalized experiences, more things, 03:28:08.460 |
And so, and like these things that impact people's lives 03:28:15.220 |
And so one of the things that I'm a little bit concerned 03:28:17.180 |
about is right now the big companies are investing 03:28:21.060 |
huge amounts of money and are driving the top line 03:28:26.040 |
But if it means that you have to have $100 million 03:28:28.740 |
to train a model or more, $100 billion, right, 03:28:38.580 |
I would much rather see lots of people across the industry 03:28:46.740 |
a lot of great research has been done in the health world 03:28:52.820 |
and doing radiology with AI and like doing all these things. 03:28:58.960 |
and build these systems, you have to be an expert 03:29:11.500 |
which roughly everybody can do if they want to, right? 03:29:15.060 |
Then I think that we'll get a lot more practical application 03:29:24.280 |
- Do you think we'll have more or less programmers 03:29:28.620 |
- Well, so I think we'll have more programmers, 03:29:31.560 |
but they may not consider themselves to be programmers. 03:29:34.920 |
- Right, I mean, do you consider somebody that uses, 03:29:38.280 |
the most popular programming language is Excel. 03:29:45.160 |
- And so do they consider themselves to be programmers? 03:29:48.360 |
I mean, some of them make crazy macros and stuff like that, 03:29:56.040 |
it's the bicycle for the mind that allows you to go faster. 03:30:00.360 |
Right, and so I think that as we look forward, right, 03:30:04.080 |
I look at it as hopefully a new programming paradigm. 03:30:06.960 |
It's like object-oriented programming, right? 03:30:12.360 |
It turns out that's not the right tool for the job, right? 03:30:22.560 |
that's not integrated into programming languages 03:30:27.960 |
and doesn't work right, and you have to babysit it, 03:30:30.160 |
and every time you switch hardware, it's different. 03:30:39.400 |
You can start using them for many more things. 03:30:41.040 |
And so that's why I would be excited about it. 03:30:46.360 |
or maybe early college, who's curious about programming 03:30:50.000 |
and feeling like the world is changing really quickly here? 03:31:05.600 |
- Well, so, I mean, one of the things I'd say 03:31:39.120 |
and you keep doing things and building things, 03:31:58.320 |
- And so just because everybody's doing a thing, 03:32:00.040 |
it doesn't mean you have to do the same thing 03:32:16.560 |
parts of the problem that people want to take for granted 03:32:40.040 |
So if you want to be a rebel, go check out Mojo 03:33:08.840 |
of how to make AI accessible to a huge number of people, 03:33:13.960 |
- Yeah, well, so Lex, you're a pretty special person too. 03:33:40.040 |
please check out our sponsors in the description. 03:33:50.520 |
Thank you for listening and hope to see you next time.