Travis Oliphant: NumPy, SciPy, Anaconda, Python & Scientific Programming | Lex Fridman Podcast #224
Chapters
0:00 Introduction
1:11 Early programming
22:52 SciPy
39:46 Open source
51:29 NumPy
88:44 Guido van Rossum
101:02 Efficiency
109:54 Objects
116:52 Numba
125:58 Anaconda
130:25 Conda
146:01 Quansight Labs
149:37 OpenTeams
157:10 GitHub
162:40 Marketing
167:18 Great programming
178:08 Hiring
182:06 Advice for young people
00:00:00.000 |
The following is a conversation with Travis Oliphant, 00:00:23.980 |
made Python more accessible to a much larger audience. 00:00:27.620 |
Travis's life work across a large number of programming 00:00:31.200 |
and entrepreneurial efforts has and will continue 00:00:34.760 |
to have immeasurable impact on millions of lives 00:00:38.460 |
by empowering scientists and engineers in big companies, 00:00:47.200 |
and solve them with the power of programming. 00:00:53.440 |
which is something that when combined with vision 00:01:02.320 |
please check out our sponsors in the description. 00:01:06.960 |
and here is my conversation with Travis Oliphant. 00:01:10.640 |
What was the first computer program you've ever written? 00:01:23.320 |
Atari 400, I think, or maybe it was an Atari 800. 00:01:28.320 |
and we just were just basic loops to print things out. 00:01:43.320 |
when I was told that don't use goto statements. 00:01:45.720 |
Those are bad software engineering principles. 00:01:48.360 |
It goes against what great, beautiful code is. 00:01:52.040 |
I was like, oh, okay, there's rules to this game. 00:01:58.360 |
I did a lot of other kinds of just programming in TI, 00:02:02.200 |
but finally when I took an AP Computer Science course 00:02:07.440 |
That's when I, oh, there are these principles. 00:02:11.320 |
- No, I didn't take C until the next year in college. 00:02:14.660 |
I had a course in C, but I haven't done much in Pascal, 00:02:23.480 |
but when did you first fall in love with programming? 00:02:33.460 |
and he was excited about the spreadsheet capability, 00:02:39.560 |
the add-ons so we could actually program in BASIC, 00:02:45.960 |
Then we got a TI-99, a TI-99/4A, when I was about 12, 00:02:50.040 |
and I would just, it had sprites and graphics and music. 00:02:55.320 |
That's when I really sort of fell in love with programming. 00:03:06.360 |
- Yeah, the Timex Sinclair was one of the very first. 00:03:11.420 |
well, it was still expensive, but it was 2K of memory. 00:03:16.780 |
But yeah, it had memory, and you could program it. 00:03:19.000 |
You had the, in order to store your programs, 00:03:31.920 |
Still remember that sound, but that was the storage. 00:03:34.760 |
- And what was the programming language, do you remember? 00:03:38.960 |
and so a little bit of spreadsheet programming in VisiCalc, 00:03:54.800 |
and a lot of people think they don't like math 00:03:58.080 |
because I think when they're exposed to it early, 00:04:05.920 |
And I do have a reasonably, I mean, not perfect, 00:04:08.600 |
but a reasonably long little short-term memory buffer. 00:04:20.300 |
And so computing was problem-solving applied. 00:04:46.100 |
Like it was sort of, yeah, it was a lot of work 00:04:53.100 |
And that's still work, but it's getting easier. 00:05:09.060 |
And I remember the day when I would dream in Spanish. 00:05:30.060 |
that leads you down certain thought processes. 00:06:01.020 |
the sad, hopeful songs, the over-romanticized, 00:06:48.460 |
But there's a bunch of things that are untranslatable. 00:06:53.580 |
I actually have a few conversations coming up offline 00:07:01.980 |
And that's for people who worked in this field, 00:07:13.980 |
'cause there's just a magic captured by that sentence. 00:07:16.380 |
And how do you translate just in the right way? 00:07:24.220 |
"Beauty will save the world" from Dostoevsky. 00:07:34.140 |
but it also leads your mind down certain trajectories 00:07:49.740 |
- Well, we don't, we live in our own little pockets. 00:08:06.980 |
I don't truly know Japanese and Portuguese and Brazil, 00:08:28.540 |
so much of the technical world is in English, 00:08:39.620 |
there's a lot of genius out there that we miss, 00:08:41.780 |
and we're sort of fortunate when it bubbles up 00:08:45.020 |
into something that we can understand or process. 00:08:55.420 |
or very resistant, sort of authoritarian structures. 00:09:11.820 |
that I really enjoyed thinking in, as you said. 00:09:19.620 |
do you remember when you first kind of connected with Python, 00:09:29.460 |
I was a graduate student studying biomedical engineering 00:09:34.660 |
I'd been involved in taking information from satellites. 00:09:43.980 |
doing some data processing information out of it. 00:09:54.220 |
and they had their own little scripting tools 00:10:00.860 |
I was looking for something and encountered Python, 00:10:06.180 |
had two things that made me not filter it away. 00:10:11.740 |
I looked at a few other languages that are out there 00:10:26.980 |
You know, and I went back and read the mailing list 00:10:36.020 |
unstructured cooperation happens in the open source world 00:10:39.500 |
that led to a lot of this collective programming, 00:10:43.340 |
which is something maybe we might get into a little later, 00:10:48.340 |
- Numeric filled the gap of having an array object. 00:10:57.540 |
two, three, four-dimensional tensor, they call it now. 00:11:00.660 |
I'm still in the category that a tensor is another thing, 00:11:03.220 |
and it's just an n-d array, we should call it, 00:11:17.140 |
So Numeric had math and a basic way to think in arrays. 00:11:20.780 |
So I was looking for that, and it had complex numbers. 00:11:29.500 |
you think, "Ah, complex numbers are just two floats." 00:11:48.140 |
and not having it means you have to develop it 00:11:50.860 |
several times, and those times may not share an approach. 00:11:55.700 |
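The point about complex numbers being a shared abstraction rather than "just two floats" can be illustrated with a short sketch. It uses only Python's built-in `complex` type and the standard `cmath` module; the variable names are invented for illustration and are not from the conversation.

```python
import cmath

# Python spells the imaginary unit with the engineering-style "j".
z = 3 + 4j

magnitude = abs(z)                 # length of the vector (3, 4)
conjugate = z.conjugate()          # flips the sign of the imaginary part
rotated = z * 1j                   # multiplying by j rotates by 90 degrees
euler = cmath.exp(1j * cmath.pi)   # e^(i*pi), which is -1 up to rounding

print(magnitude, conjugate, rotated, euler)
```

Because the type is built in, every library that accepts a number can agree on what `z * 1j` means, which is the "shared approach" being described.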
one of the things programming enables is abstractions. 00:11:59.100 |
But when you have shared abstractions, it's even better. 00:12:02.980 |
of actually we all think of this the same way, 00:12:07.940 |
Because powerful in that we now can quickly make 00:12:17.100 |
we maybe left behind in producing that abstraction, 00:12:21.900 |
and actually building around the programming world. 00:12:24.140 |
So I think it's a fascinating philosophical topic. 00:12:26.540 |
- Yeah, that will continue for many years, I think. 00:12:29.540 |
- As we build more and more and more abstractions. 00:12:32.340 |
we have a world that's built on these abstractions 00:12:47.500 |
There's, it has implications for things like, 00:13:03.380 |
And, you know, there's strategies to approach that, 00:13:12.100 |
And then I had the experience, I did some stuff in Python, 00:13:17.860 |
my focus was on, I was actually doing a combination 00:13:20.940 |
of MRI and ultrasound, and looking at a phenomenon 00:13:36.820 |
is how to do that with both ultrasound and MRI. 00:13:44.260 |
In '98, I went back, looked at what I'd written, 00:13:53.660 |
I'd done the same thing, and then I looked back, 00:13:58.380 |
Now, you know, I'm not saying, so that made me, 00:14:07.660 |
And so that led me to go, hmm, I'm gonna push more to this. 00:14:27.180 |
- So space is used, indentation, I should say, 00:14:43.900 |
I was open-minded, so I was cognizant of the concern. 00:14:48.060 |
And it definitely has, it has specific challenges. 00:14:55.540 |
and if your editors aren't supportive of that, 00:15:00.100 |
when terminals didn't necessarily have the intelligence 00:15:03.340 |
Now, IPython and Jupyter notebooks handle it just fine, 00:15:08.820 |
formatting challenges, also mixed tabs and spaces. 00:15:14.820 |
on what was happening, you would have these issues. 00:15:16.940 |
So there were really concrete reasons about it 00:15:20.460 |
I never really encountered a problem with it, 00:15:23.460 |
personally, like it was occasional annoyances, 00:15:28.500 |
that it didn't have all these extra characters, right? 00:15:43.380 |
because of the minimalism of like how many characters 00:15:49.860 |
But what you realize with that compactness comes, 00:15:58.940 |
and less and less readable to a point where it's like, 00:16:16.300 |
Like it means you have to be an expert in Perl 00:16:20.420 |
Whereas Python allowed you not to have to be an expert. 00:16:23.020 |
You don't have to take all this brain energy. 00:16:25.700 |
you could leverage your English language center, 00:16:34.740 |
Latin-based languages with the characters are at least similar. 00:16:38.620 |
but I don't know what it's like to be a Japanese 00:16:41.300 |
or a Chinese person trying to learn a different syntax. 00:16:45.740 |
Like what would computer programming look like in that? 00:16:54.260 |
I'm not sure Python or any programming language does that. 00:16:58.100 |
The fact that it was accessible, I could be a scientist. 00:17:00.300 |
What I really liked is many programming languages 00:17:02.860 |
really demand a lot of you, and you can get a lot, 00:17:15.300 |
So more people could actually, as a scientist, 00:17:21.420 |
besides programming, I could still use this language 00:17:27.300 |
Now I was also comfortable in C at that time. 00:17:30.900 |
- And MATLAB I did a lot before that, exactly. 00:17:34.860 |
those three languages were really the tools I used 00:17:39.560 |
But to your point about language helping you think, 00:17:42.620 |
one of the big things about MATLAB was it was, 00:17:44.580 |
and APL before it, I don't know if you remember APL. 00:17:48.500 |
- APL is actually the predecessor of array-based programming, 00:18:01.100 |
Microsoft as a company generally did not understand 00:18:03.900 |
array-based programming, culturally they didn't understand it 00:18:08.580 |
kept missing the understanding of what this was. 00:18:11.580 |
They've gotten better, but there's still a whole culture 00:18:15.660 |
that's systems programming or web programming 00:18:18.900 |
or lists and maps and what about an n-dimensional array? 00:18:22.540 |
Oh yeah, that's just an implementation detail. 00:18:29.860 |
APL was the first language to understand that 00:18:36.780 |
not only glyphs, like new characters, new glyphs, 00:18:43.980 |
when the QWERTY keyboard maybe wasn't as established. 00:18:47.980 |
Like, well, we can have a new keyboard, no big deal. 00:18:56.500 |
as people would pride themselves on how much, 00:18:58.620 |
could they write the game of life in 30 characters of APL? 00:19:06.100 |
and they have adverbs, they would have adjectives 00:19:20.900 |
you think in n dimensions, it's something I like to say, 00:19:22.900 |
and you start to think differently about data at that point. 00:19:30.100 |
if you really internalize linear algebra as a course, 00:19:38.540 |
You don't have to think about the individual numbers 00:19:48.500 |
You're saying MATLAB and APL were like the early, 00:19:52.620 |
I don't know if many languages got that right ever. 00:20:03.020 |
I would say APL, J was another version that was, 00:20:14.540 |
in terms of let's add arrays plus broadcasting, 00:20:23.140 |
it's still in Python, for the number of dimensions. 00:20:25.900 |
That's different than say the rank of a matrix, 00:20:33.060 |
but NumPy is a very pragmatic, practical tool. 00:20:46.100 |
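The two ideas touched on here, broadcasting and the difference between the number of dimensions and the rank of a matrix, can be sketched with present-day NumPy. This is a minimal illustration under the assumption that NumPy is installed; the array names are made up.

```python
import numpy as np

column = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                    # shape (4,)

# Broadcasting stretches the length-1 axes so the shapes line up,
# producing a (3, 4) table of sums without an explicit loop.
table = column + row

ones = np.ones((3, 3))
n_dims = ones.ndim                           # number of array dimensions
lin_alg_rank = np.linalg.matrix_rank(ones)   # linear-algebra rank of the matrix

print(table.shape, n_dims, lin_alg_rank)
```

A 3-by-3 matrix of ones has two dimensions but a linear-algebra rank of one, which is why the two meanings of "rank" are easy to confuse.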
now there's a ton of them over the past two or three years. 00:20:50.300 |
- So if we just sort of linger on the early days 00:21:02.260 |
what really makes you connect with a language? 00:21:06.300 |
I'm not sure it's obvious to introspect that. 00:21:12.820 |
I think definitely the fact that I could read it later, 00:21:16.420 |
that I could use it productively without becoming an expert. 00:21:19.500 |
Like other languages I had to put more effort into. 00:21:21.420 |
- Right, that's like an empirical observation, 00:21:23.940 |
like you're not analyzing any one aspect of the language, 00:21:43.220 |
and then the syntax wasn't that far behind it. 00:21:46.740 |
- Right, now there are some warts there still, 00:21:54.380 |
Some of those things got added to the language too. 00:21:56.580 |
I was really grateful for some of the early pioneers 00:22:13.460 |
These were people that were on the mailing list, 00:22:22.540 |
And the fact that they went the engineering route of J 00:22:25.220 |
I don't think that's entirely favoring engineers, 00:22:36.780 |
- But the fact that complex numbers were there, 00:22:39.100 |
The fact that I could write n-d array constructs, 00:22:52.660 |
- I don't know what to start talking to you about, 00:22:54.820 |
'cause you've created so many incredible projects 00:22:57.860 |
that basically changed the whole landscape of programming. 00:23:26.020 |
which is an array library that made a lot of it possible. 00:23:29.940 |
Like I didn't have an ordinary differential equation solver, 00:23:40.580 |
These are things I remember being critical things 00:23:43.700 |
Optimization, I just wanna pass a function to an optimizer 00:23:46.780 |
and have it tell me what the optimum value is. 00:23:59.140 |
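The wish described here, passing a function to an optimizer and getting the optimum back, is roughly what `scipy.optimize.minimize` offers today. The sketch below assumes SciPy is installed; the `cost` function and its minimum at (1, -2) are invented for illustration.

```python
from scipy.optimize import minimize


def cost(x):
    """A simple bowl-shaped function whose minimum sits at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


# Hand the function and a starting guess to the optimizer.
result = minimize(cost, x0=[0.0, 0.0])
best_x = result.x

print(result.success, best_x)
```

The caller only supplies a plain Python function; the choice of algorithm is left to the library's default here.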
wouldn't it be great if we had this optimizer library? 00:24:06.980 |
and eager, and probably more time than sense. 00:24:13.620 |
My wife thinks I'm working on my PhD, and I am. 00:24:36.660 |
But really when I fell in love with Python in '98, 00:24:38.260 |
I thought, oh, well, there's just a few things missing. 00:24:39.740 |
Like, oh, I need a reader to read DICOM files. 00:24:57.100 |
so that in Python I could write things more easily. 00:25:04.860 |
as a scripting language and a high-level language 00:25:06.540 |
to think about, but that I can extend easily. 00:25:15.300 |
I mean, the only, the hard part of extending Python 00:25:17.260 |
was something called the way memory management works, 00:25:21.100 |
And so there's a tracking of reference counting 00:25:32.220 |
It's not just, I have to now think about pointers 00:25:34.740 |
and I have to think about stuff that is different. 00:25:38.500 |
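Reference counting is usually hidden from Python code, but it can be observed with `sys.getrefcount`. The sketch below is only an illustration from the Python side; extension authors working in C manage the same counts by hand with `Py_INCREF` and `Py_DECREF`. The names `payload` and `holder` are hypothetical.

```python
import sys

payload = object()
baseline = sys.getrefcount(payload)

holder = [payload]                       # the list now owns one more reference
with_holder = sys.getrefcount(payload)

holder.clear()                           # dropping it brings the count back
after_clear = sys.getrefcount(payload)

print(baseline, with_holder, after_clear)
```

The absolute numbers vary between interpreter versions, so only the differences are meaningful.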
you're like putting a new cartridge in your brain. 00:25:48.320 |
I could just think about MRI and high-level writing. 00:25:51.540 |
But I could do that, and that kind of, I liked it. 00:25:57.220 |
well, let me just add a bunch of stuff to Python 00:26:08.900 |
that people had written in the '60s and the '70s 00:26:17.220 |
And Fortran 77 is actually a really great language. 00:26:24.140 |
because it's got complex numbers, got arrays, 00:26:27.740 |
Now, the problem with it is you'd never wanna write 00:26:32.300 |
but it's totally fine to write a subroutine in. 00:26:39.100 |
But at the time, I just want libraries that do something, 00:26:42.220 |
like, oh, here's an ordinary differential equation. 00:26:48.820 |
I mean, you could, but it's nice to have somebody 00:26:51.820 |
And so I sort of started this journey in '98, really, 00:27:04.580 |
and making an ordinary differential equation solver, 00:27:09.180 |
So we could call it ODEPACK, I think I called it then, 00:27:11.860 |
QUADPACK, and then I just made these packages. 00:27:20.700 |
was actually just getting your stuff installed. 00:27:25.820 |
like, today, people think, what does that mean? 00:27:27.580 |
Well, then it meant some poorly-written webpage, 00:27:30.780 |
I had some bad webpage up, and I put a tarball, 00:27:57.060 |
but it seems like in the scientific community, 00:28:10.980 |
that I'm going to make this usable for others, that's-- 00:28:18.100 |
been inspired by Linus and him making his code available, 00:28:24.460 |
So I'd kind of been previously primed that way. 00:28:32.660 |
if collectively we build knowledge and share it, 00:28:42.900 |
I liked that part of science, that part of sharing. 00:28:45.700 |
And then all of a sudden, oh, wait, here's something, 00:28:49.940 |
And then I slowly over years learned how to share better 00:28:52.780 |
so that you could actually engage more people faster. 00:28:55.140 |
One of the key things was actually giving people 00:28:59.020 |
So that it wasn't just, here's source code, good luck. 00:29:02.660 |
- It's compiled, ready to install, just, you know, so. 00:29:05.740 |
In fact, a lot of the journey from '98, even through 2012, 00:29:10.780 |
Like, it's why, you know, it's really the key 00:29:13.260 |
as to why a scientist with dreams of doing MRI research 00:29:22.260 |
- I work with a few folks now that don't program, 00:29:25.800 |
like, on the creative side, the video side, the audio side. 00:29:32.500 |
I have to try to get them, I'm having now the task 00:29:35.180 |
of teaching them how to do Python enough to run the scripts. 00:29:45.700 |
basically to my mom, how to write a Python script. 00:29:50.540 |
I have to, it's a to-do item for me to figure out, 00:29:52.820 |
like, what is the minimal amount of information 00:29:57.980 |
that when you enjoy it, to your effect of it-- 00:30:00.820 |
- And they're related, those are two related questions. 00:30:02.540 |
- And then the debugging, like the iterative process 00:30:05.500 |
of running the script to figure out what the error is, 00:30:07.860 |
maybe even for some people to do the fix yourself. 00:30:13.620 |
like, how do you distribute that code to them? 00:30:24.300 |
the circle of people that are able to use your programs, 00:30:28.060 |
you increase its, like, effectiveness and its power. 00:30:32.940 |
And so you have to think, you know, can I write scripts, 00:30:37.060 |
can I write programs that can be used by medical engineers, 00:30:40.180 |
by all kinds of people that don't know programming? 00:30:55.380 |
like, how frictionless can you make the early steps? 00:31:00.420 |
To go in any community is, any friction point, 00:31:05.780 |
Sometimes you may wanna intentionally do that 00:31:09.100 |
if you're early enough on, you need a lot of help, 00:31:16.860 |
as opposed to contributors if you're early on. 00:31:23.140 |
but it really emerged as this collection of modules 00:31:28.620 |
And you know, I think I got 100 users, right, 00:31:33.020 |
But the fact that I got 100 users and more than that, 00:31:41.340 |
That was the, you know, here I'm writing papers, 00:31:44.260 |
I'm giving conferences, and I get people to say hello, 00:31:54.620 |
I was starting to see that sense of academic life 00:32:05.060 |
- And you know, that's not true across the board, 00:32:08.580 |
But here in this world, I was getting responses 00:32:13.460 |
You know, I remember Pearu Peterson in Estonia, right, 00:32:24.380 |
I don't think I ever understood that makefile actually, 00:32:35.100 |
But you know, the process was, he sent me a patch file, 00:32:41.580 |
And the style back then was, here's a mailing list, 00:32:43.620 |
it's very, it wasn't as, it certainly weren't 00:32:46.180 |
the tools that are available today, it was very early on. 00:32:48.900 |
But I really started to, that's the whole year, 00:32:50.700 |
I think I did about seven packages that year, right? 00:32:55.540 |
I collected them into a thing called Multipack. 00:32:57.820 |
So '99, there was this thing called Multipack, 00:33:01.780 |
I know he was a high school student at the time, 00:33:09.700 |
And then of course a massive increase of usage. 00:33:12.660 |
- So by the way, most of this development was under Linux. 00:33:15.860 |
- Yes, yes, it was on Linux, I was a Linux developer, 00:33:20.220 |
I mean, at the time, I was actually getting into, 00:33:22.980 |
I had a new hard drive, did some kernel programming 00:33:26.460 |
I mean, not programming, but modification to the kernel 00:33:28.740 |
so I could actually get a hard drive working. 00:33:32.300 |
I was also, at school, I was building a cluster, 00:33:36.060 |
I took Mac computers, and you put Yellow Dog Linux on 'em. 00:33:44.700 |
and so I kind of got permission to go grab 'em together, 00:33:46.820 |
I put about 24 of 'em together in a cluster in a cabinet, 00:33:51.660 |
and I wrote a C++ program to do MRI simulation. 00:34:06.260 |
That's why ordinary differential equations were key, 00:34:08.140 |
was because that's the heart of a Bloch equation 00:34:18.540 |
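To make the link between MRI simulation and ODE solvers concrete, here is a minimal sketch of the Bloch equations in the rotating frame, integrated with `scipy.integrate.solve_ivp`. It is not the C++ simulation described in the conversation; the relaxation times, the off-resonance frequency, and the sign convention are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp

T1, T2 = 1.0, 0.1             # relaxation times in seconds (illustrative)
M0 = 1.0                      # equilibrium magnetization
d_omega = 2 * np.pi * 10.0    # off-resonance frequency in rad/s (illustrative)


def bloch_rhs(t, m):
    """Rotating-frame Bloch equations with no RF field applied."""
    mx, my, mz = m
    return [
        d_omega * my - mx / T2,    # precession plus T2 decay
        -d_omega * mx - my / T2,
        (M0 - mz) / T1,            # T1 recovery toward equilibrium
    ]


t_end = 0.2
# Start with the magnetization tipped fully into the transverse plane.
solution = solve_ivp(bloch_rhs, (0.0, t_end), [1.0, 0.0, 0.0],
                     rtol=1e-8, atol=1e-10)
mx_end, my_end, mz_end = solution.y[:, -1]
transverse = np.hypot(mx_end, my_end)

print(transverse, mz_end)
```

For this simple case the transverse magnitude should decay as exp(-t/T2) and the longitudinal part should recover as M0(1 - exp(-t/T1)), which gives a quick check on the solver.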
and what you're interested in, they're coinciding. 00:34:24.940 |
which helped in the sense that I was using it for me, 00:34:28.500 |
I had one person who was like, well, no, this is better, 00:34:33.260 |
to guide some of what those APIs might look like. 00:34:40.980 |
oh, yeah, the binary installer really helps people. 00:34:49.060 |
So around 2000, so I graduated my PhD in 2000, 00:35:04.020 |
There was a company, there was a guy, actually, 00:35:07.580 |
they were two friends who founded a company called Enthought. 00:35:27.540 |
and kind of add it to some stuff that he'd done, 00:35:39.460 |
It came from Multipack and a whole bunch of modules 00:35:42.180 |
I'd written, plus a few things from some other folks, 00:35:44.460 |
and then pulled together in a single installer. 00:35:51.220 |
- How did you think about SciPy in context of Python, 00:35:56.900 |
as a way to make an R&D environment for Python, 00:36:03.340 |
So Numeric was the array library we depended on. 00:36:09.460 |
and at the time, the original vision of SciPy 00:36:18.580 |
that you could then install and get going with. 00:36:28.340 |
to do massive scale projects with open source collectives. 00:36:33.340 |
Actually, there's sort of an intrinsic cooperation limit 00:36:38.460 |
as to which, you know, too many cooks in the kitchen, 00:36:40.580 |
you know, you can do amazing infrastructure work. 00:36:42.740 |
When it comes down to bringing it all together 00:36:45.860 |
that actually requires a little more product management 00:36:53.980 |
So it struggled, you know, it struggled to get, 00:36:56.900 |
almost too many voices, it's hard to have everybody agree, 00:36:59.060 |
you know, consensus doesn't really work at that scale. 00:37:02.100 |
You end up with politics, you end up with the same kind 00:37:03.900 |
of things that's happened in large organizations 00:37:09.420 |
So consensus building was still, was challenging at scale 00:37:13.860 |
Early on, it's fine 'cause there's nobody there. 00:37:15.700 |
And so it works, but then as you get more successful 00:37:19.020 |
oh, there's this scale at which this doesn't work anymore 00:37:22.340 |
and we have to come up with different approaches. 00:37:28.940 |
I remember the days of getting that release ready, 00:37:31.100 |
it was a Windows installer and there were bugs on how, 00:37:34.060 |
you know, the Windows compiler handled complex numbers 00:37:42.260 |
effort had nothing to do with my area of study. 00:37:45.540 |
And at the same time, I had just gotten an offer. 00:37:48.820 |
and help him start that company with his friend. 00:37:51.500 |
And I, at the time, I was like, I was so intrigued, 00:38:03.900 |
then I started working on SciPy as a professor too. 00:38:07.940 |
- So that's, I left, I've got the Mayo Clinic, graduated, 00:38:10.020 |
wrote my thesis using SciPy, wrote, you know, 00:38:28.940 |
I ended up using DISLIN plus some of the plotting 00:38:33.780 |
Anyway, it was, people don't plot that way now, 00:38:37.180 |
but this is before, and SciPy was trying to add plotting. 00:38:42.580 |
Really the success of plotting came from John Hunter, 00:38:45.580 |
who had a similar experience to my experience, 00:38:49.660 |
just trying to get stuff done and kind of having more time 00:38:59.140 |
he wasn't a student at the time, but he was an, 00:39:03.500 |
So he just went out and said, "Cool, I'll make a new project, 00:39:05.460 |
"and we'll call it Matplotlib," and he released it in 2001, 00:39:09.900 |
And it was separate library, separate install, 00:39:15.540 |
And so SciPy, you know, in 2001 we released SciPy, 00:39:18.980 |
and then Enthought created a conference called SciPy, 00:39:22.380 |
which brought people together to talk about the space. 00:39:26.700 |
It's one of the favorite conferences of a lot of people, 00:39:28.460 |
because it's, you know, it's changed over the years, 00:39:30.820 |
but early on it was, you know, a collection of 50 people 00:39:36.700 |
practicing scientists who want to care about coding 00:39:42.180 |
And I remember being driven by, you know, I like MATLAB, 00:39:52.700 |
but I also see the role for proprietary software. 00:40:00.860 |
"you have to have this proprietary software." 00:40:02.540 |
- Right, and there's also culture around MATLAB, 00:40:10.860 |
- I mean, there's just a culture, they try really hard, 00:40:13.940 |
but it just is this corporate IBM style culture 00:40:18.420 |
I don't want to say negative things about IBM or whatever, 00:40:23.740 |
It's something I'm in the middle of right now 00:40:27.020 |
and how do you connect the ethos of cooperative development 00:40:30.840 |
with the necessity of creating profits, right? 00:40:42.220 |
'Cause I was writing SciPy, I mean, as an aside, 00:40:58.780 |
Certainly the ideas on IP law, I read a lot of his stuff. 00:41:13.220 |
"I think just be like me and don't have kids, right? 00:41:18.540 |
- That was the, what he said in that moment, right? 00:41:24.980 |
There has to be a way to preserve the culture of open source 00:41:27.460 |
and still be able to make sufficient money to feed your-- 00:41:31.500 |
Well, so that actually led me to a study of economics. 00:41:34.500 |
'Cause at the time, I was ignorant, and I really was. 00:41:36.660 |
And I'm actually, I'm embarrassed for the educational system 00:41:39.420 |
that they could let me, and I was valedictorian 00:41:41.300 |
in my high school class, and I did super well in college. 00:41:47.620 |
But the fact that I could do that and then be clueless 00:41:54.420 |
Like, I should've learned this in fifth grade. 00:42:02.820 |
because you've created tools that changed the lives 00:42:11.580 |
the basic economics of how to build up a giant system 00:42:20.700 |
I was in a library, I was reading books on capitalism, 00:42:24.700 |
I was reading books on, you know, what is this thing? 00:42:29.740 |
And I encountered, basically, I encountered a set of writings 00:42:33.140 |
from people that said they were the inheritors 00:42:38.620 |
and kind of this notion of emergent societies, 00:42:42.500 |
and realized, oh, there's this whole world out here 00:42:58.100 |
They want their economists to back them up, right? 00:43:02.980 |
like the magicians in Pharaoh's court, right? 00:43:19.380 |
And I found a lot of writings that I really loved. 00:43:39.020 |
it's not gonna work to not have private property. 00:43:41.780 |
You're not gonna be able to come up with prices. 00:43:43.380 |
The bureaucrats aren't gonna be able to determine 00:43:45.180 |
how to allocate resources without a price system. 00:43:47.660 |
And a price system emerges from people making trades. 00:43:52.780 |
if they have authority over the thing they're trading. 00:44:04.780 |
- Yeah, the prices have a signal that's used. 00:44:11.860 |
- Like you would in the software engineering space. 00:44:23.860 |
And the fact that, oh, this is actually really critical. 00:44:29.340 |
and that we're dangerously not learning about this, 00:44:45.580 |
So how did that resolve itself in terms of SciPy? 00:44:49.100 |
- So I would say it didn't really resolve itself. 00:44:51.340 |
It sort of started a journey that I'm continuing on. 00:45:00.940 |
with giving stuff away and creating the market externalities 00:45:05.940 |
that the fact that, yeah, people might use it, 00:45:10.580 |
and I'll have to figure something else out to get paid. 00:45:14.940 |
that a lot of people have used stuff that I've written, 00:45:17.220 |
and I haven't necessarily benefited economically from it. 00:45:23.260 |
like, oh, I should've gotten more value out of this. 00:45:31.540 |
let them benefit, so it actually creates more of the same. 00:45:36.880 |
but there's some aspect, I wish there was mechanisms 00:45:40.460 |
for me to reward whoever created SciPy and NumPy, 00:45:49.180 |
- But there should be a very frictionless mechanism. 00:45:51.860 |
- There should be a frictionless mechanism, I totally agree. 00:45:53.340 |
I would love to talk about some of the ideas I have, 00:45:56.220 |
I think I've come up with some interesting notions 00:46:00.580 |
anything that will work takes time to emerge, right? 00:46:04.940 |
That's definitely one thing I've also understood and learned 00:46:10.100 |
we often give credit to, oh, this president gets elected, 00:46:14.420 |
And I saw that when I had a transition at Anaconda 00:46:24.380 |
- And sometimes the decision you made 10 years before 00:46:42.180 |
that's the stuff I would read a ton about early on. 00:46:49.500 |
and honestly, not for personally, I've been happy, 00:46:51.740 |
I've been happy, I feel like I don't have any, 00:46:58.980 |
trajectory from academia, is reading that stuff 00:47:04.740 |
I love software, but we need more entrepreneurs, 00:47:10.360 |
So once I kind of had that virus infect my brain, 00:47:17.580 |
to go to a tenure track position at a university, 00:47:22.780 |
I was kind of already out the door when I started, 00:47:41.140 |
I would think it's basically accessibility to scientists. 00:47:43.660 |
Like, give them, give scientists and engineers 00:47:46.500 |
tools that they don't have to think a lot about programming, 00:47:51.820 |
and sort of just the right length of spelling. 00:47:58.140 |
where it's like, make very, very long names, right? 00:48:01.860 |
And you can see it in some programming languages 00:48:09.860 |
characters would have to be six letters early on, right? 00:48:25.860 |
So when you look at great scientific libraries 00:48:29.180 |
and functions, there's a richness of documentation 00:48:34.820 |
The first glance at a function gives you the intuition 00:48:37.620 |
of all it needs to do by looking at the headers and so on, 00:48:40.540 |
but to get the depths of all the complexities involved, 00:48:56.780 |
The reality is those took about 10 years to evolve, right? 00:49:00.460 |
Given the fact that we didn't have a big budget, 00:49:13.740 |
not nearly enough to keep up with what was necessary. 00:49:18.860 |
I mean, it's hard to start a business and then do consulting 00:49:27.780 |
We stayed connected all while I was a student, 00:49:30.980 |
I went to BYU and started to teach electrical engineering, 00:49:59.300 |
Like my fundamental thing is I respect people. 00:50:03.900 |
I was thinking they had more knowledge than they did. 00:50:07.640 |
And so I would just speak at a very high level, 00:50:11.060 |
- But they need to rise to the standard that you set. 00:50:17.180 |
- And I agree, and that was kind of what was inspiring me. 00:50:52.300 |
And I realized, okay, I'm not, this is not working. 00:50:56.420 |
and I turned around and just started using the chalkboard. 00:51:02.300 |
and gave people time to process and to think. 00:51:07.940 |
but I really love that part of like the teaching. 00:51:15.420 |
Kind of how do you take the knowledge and then produce it? 00:51:21.020 |
Like ultimately SciPy was everything, right? 00:51:31.420 |
there was a little bit of like the Hubble Space Telescope. 00:51:42.420 |
And he had called me before I left BYU and said, 00:51:54.260 |
You know, broadcast needs to be a little more settled. 00:51:57.940 |
They wanted, you know, record arrays are like a data frame, 00:52:06.860 |
would you wanna work on something to make this work? 00:52:08.300 |
And I said, yeah, I'm interested, but I'm going here. 00:52:18.840 |
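The remark that record arrays are like a data frame can be sketched with NumPy's structured dtypes, where each element carries named fields. This is an illustration with present-day NumPy, not the Numarray API being discussed; the field names and values are invented.

```python
import numpy as np

# Each row is one record; field names and values are made up for illustration.
observations = np.array(
    [("obs1", 0.5, 30.0), ("obs2", 2.0, 10.0), ("obs3", 1.25, 120.0)],
    dtype=[("target", "U10"), ("flux", "f8"), ("exposure_s", "f8")],
)

flux = observations["flux"]                           # a whole column, by name
short = observations[observations["exposure_s"] < 60]  # rows picked by a mask
brightest = observations[np.argmax(observations["flux"])]["target"]

print(flux, short["target"], brightest)
```

Column access by name and row selection by condition are the data-frame-like behaviors; the storage is still one contiguous array.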
So I had a graduate student, my only graduate student, 00:52:21.660 |
a Chinese fellow, Liu Hongze is his name, great guy. 00:52:26.260 |
He wrote a bunch of stuff for iterative linear algebra, 00:52:31.380 |
linear algebra tools that are currently there in SciPy. 00:52:34.340 |
And they've gotten better since, but this is in 2005. 00:52:39.260 |
but Perry has started working on a replacement 00:52:56.740 |
It's openings, dilations, you know, there was sort of this, 00:52:59.580 |
as a medical imaging student, I knew what it was 00:53:04.380 |
And in fact, I'd wanted to do something like that 00:53:06.460 |
in Python, in SciPy, but just had never gotten around to it. 00:53:10.220 |
So when it came out, but it worked only on Numarray, 00:53:14.180 |
and SciPy needed Numeric, and so we effectively 00:53:22.500 |
They were just two, so you could have a gigabyte 00:53:24.420 |
of Numarray data and a gigabyte of Numeric data, 00:53:36.300 |
We're not, we're sort of redoing each other's work, 00:53:40.380 |
So that's what led me, even though I knew it was risky, 00:53:43.940 |
because my, you know, I was on a tenure track position. 00:53:54.780 |
and a little more of the paper writing and grant writing, 00:53:57.260 |
which was naive, but it was definitely the time, 00:54:23.660 |
understanding the role of software engineering 00:54:25.420 |
and programming in society is a little bit lacking. 00:54:28.700 |
Now, I was in an electrical engineering position. 00:54:34.700 |
And so, you know, good people, and I had a great time. 00:54:54.860 |
so I had a kind of a radio, instead of a radio, 00:54:58.300 |
a digital radio class, it was a digital MRI class. 00:55:01.820 |
- And I had people sign up, two people signed up, 00:55:04.020 |
then they dropped, and so I had nobody in this class. 00:55:06.660 |
So, and I didn't have any other courses to teach, 00:55:10.940 |
and I'll just write a merger of Numeric and Numarray. 00:55:14.820 |
Like, I'll basically take the Numeric code base, 00:55:19.240 |
and then kind of come up with a single array library 00:55:22.460 |
So that's where NumPy came from, was my thinking, 00:55:25.500 |
hey, I can do this, and who else is going to? 00:55:27.860 |
'Cause at that point, I'd been around the community 00:55:38.580 |
that went in the first documentation for NumPy, 00:55:47.580 |
which is all the C API of numeric, all the C stuff. 00:55:54.780 |
So it was sort of, out of a sense of duty and passion, 00:56:01.460 |
I don't think the department here is gonna appreciate this, 00:56:10.780 |
of the way you thought and the action you took, 00:56:19.900 |
because what happens as the tools become more popular, 00:56:34.780 |
and it's like great leaders throughout history, 00:56:42.500 |
because I think that can make a big difference. 00:56:52.540 |
- I wonder if it's possible in the early days 00:56:58.220 |
- It is possible, especially in the early days. 00:57:01.620 |
And the more energy in the factions, the harder. 00:57:43.140 |
Part of my, my religious beliefs actually lead to that, 00:57:55.900 |
yeah, it may not work out for me financially, 00:58:07.060 |
and I knew, and partly because these SciPy conferences, 00:58:10.860 |
I knew there was a lot of need for this, right? 00:58:13.300 |
And so I had this, it wasn't like I was alone 00:58:16.500 |
I had these people who knew, but it was crazy. 00:58:20.780 |
yeah, we didn't think you'd be able to do it. 00:58:23.180 |
- And also, instructive, like practically speaking, 00:58:28.740 |
that you were chasing, the morphology, like the-- 00:58:31.700 |
- Like, it's not just like-- - There's an end result. 00:58:49.100 |
And I had, in fact, this almost got me over my skis, right? 00:58:52.200 |
I would say, well, in retrospect, I hate looking back. 00:58:56.180 |
I can tell you all the flaws with NumPy, right? 00:59:02.560 |
I wish I had somebody to slap me with a wet fish there. 00:59:04.340 |
Like, I needed, like, what I'd wished I'd had 00:59:09.420 |
and certainly library writing and array library. 00:59:12.820 |
I could go back in time and go, do this, do that. 00:59:15.500 |
'Cause there's things we did that are still there 00:59:18.140 |
that are problematic, that created challenges for later. 00:59:26.500 |
Like, there was pieces of the design of NumPy, 00:59:29.100 |
I didn't know what to do until five years ago. 00:59:32.900 |
but I didn't know at the time, and I couldn't get the help. 00:59:36.700 |
It took about, it took four months to write the first version 00:59:42.580 |
But it was that first four months of intense writing, 00:59:47.820 |
coding, getting something out the door that worked. 00:59:52.420 |
And then the big thing I did was create a new type object 01:00:01.020 |
not just broadcasting, but advanced indexing. 01:00:03.500 |
So that you could do masked indexing and indirect indexing 01:00:09.940 |
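A minimal sketch of the masked and indirect indexing mentioned here, as they look in NumPy today. The array values and variable names are illustrative assumptions, not taken from the conversation.

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Masked (boolean) indexing: keep the elements where the mask is True.
mask = a > 25
masked = a[mask]

# Indirect (integer-array) indexing: pick elements by an array of positions,
# which may repeat or reorder entries.
idx = np.array([4, 0, 0, 2])
indirect = a[idx]

print(masked)    # [30 40 50]
print(indirect)  # [50 10 10 30]
```

Both forms return a new array rather than a view into the original data.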
- So for people who don't know, and maybe you can elaborate. 01:00:12.820 |
So NumPy, I guess the vision in the narrowest sense 01:00:23.180 |
And like at any level of abstraction you want, 01:00:30.060 |
that you would naturally want to investigate such objects. 01:00:37.180 |
- So it had an associated library of math operations. 01:00:44.940 |
So the key for me was, I was gonna write NumPy 01:00:50.380 |
In fact, early on, one of the initial proposals 01:00:54.580 |
and it would have the Numeric object inside of it. 01:01:01.420 |
because Numeric already had a little mini library 01:01:08.900 |
that nobody wanted to, they wanted backward compatibility. 01:01:25.420 |
Like some of the complexity in today's object 01:01:27.180 |
is actually from that goal of backward compatibility 01:01:34.620 |
Which is instructive because a lot of things are there. 01:01:41.340 |
It's an artifact of its historical existence. 01:02:08.900 |
sort of empathize to the culture of the people 01:02:11.580 |
that love something about this particular API. 01:02:18.980 |
the actual usage patterns and truly understand them. 01:02:29.420 |
And you have to also have enough passion that you'll do it. 01:02:38.420 |
So it really is an aspect, it's a philosophical, 01:02:47.100 |
it's sort of a life philosophy for me, right? 01:02:49.260 |
That I'm constantly pursuing and that helped, 01:02:54.300 |
like looking at human civilization as one object, 01:03:26.060 |
Could you, it's like-- - Yeah, Guido and Python. 01:03:30.980 |
it's like I said, we wanted to build this big thing, 01:03:33.780 |
What happened is we had mavericks and champions 01:03:43.980 |
of this selfless, give the stewardship mentality 01:03:56.620 |
Like not waiting for everybody else to do the work, 01:03:58.920 |
but you're doing it for the benefit of others 01:04:04.860 |
you're not worried about what you're gonna get, 01:04:17.380 |
And this is what had no impact on the result. 01:04:28.820 |
Like Wes McKinney was critical to the success of Python 01:04:33.420 |
which is the roots of that were all the way back 01:04:43.180 |
Wes started to use that almost like a data frame, 01:04:49.780 |
okay, if you wanna augment it at another column, 01:04:52.240 |
you have to insert, you have to do all this memory movement 01:04:57.180 |
oh, I'm gonna have a loose collection of arrays. 01:05:00.480 |
So it's a record of arrays that is a part of a data frame. 01:05:03.980 |
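The contrast described here, one array of records versus a loose collection of column arrays, can be sketched roughly as follows. The field names (`id`, `price`, `qty`) and values are hypothetical, and the dict of arrays is only a stand-in for the idea, not how pandas is actually implemented.

```python
import numpy as np

# Array of records: one contiguous buffer, each element holds every field.
records = np.array([(1, 2.5), (2, 3.5)], dtype=[("id", "i4"), ("price", "f8")])

# Adding a column means building a wider dtype and copying every field over.
new_dtype = records.dtype.descr + [("qty", "i4")]
wider = np.zeros(records.shape, dtype=new_dtype)
for name in records.dtype.names:
    wider[name] = records[name]
wider["qty"] = [7, 9]

# Loose collection of arrays: each column is its own array,
# so adding a column is just one more entry and nothing else moves.
columns = {"id": records["id"].copy(), "price": records["price"].copy()}
columns["qty"] = np.array([7, 9], dtype="i4")

print(wider.dtype.names)  # ('id', 'price', 'qty')
print(list(columns))      # ['id', 'price', 'qty']
```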
And we thought about that back in the Numarray days, 01:05:08.940 |
And then also the operations that were relevant 01:05:12.620 |
What I noticed is just that each of these little things 01:05:19.940 |
about six months in, people started joining me. 01:05:27.300 |
And these people are many of the unsung heroes, I would say. 01:05:30.300 |
People who are, they sometimes don't get the credit 01:05:32.940 |
they deserve because they were critical both to support, 01:05:41.600 |
And they were helping, encouraged by contributing. 01:05:43.860 |
And once, the big thing for me was when John Hunter, 01:05:47.260 |
he had previously done kind of a simple thing 01:05:50.200 |
called numerix to kind of, between Numeric and Numarray, 01:05:55.120 |
that would just select each one for Matplotlib. 01:05:57.900 |
In 2006, he finally said, we're gonna just make NumPy 01:06:04.420 |
and I remember specifically when he did that, 01:06:08.420 |
That was when I knew we had succeeded, success. 01:06:11.220 |
Before then, it was still, I didn't know, sure. 01:06:17.860 |
and then I've been floored by what it's done. 01:06:24.740 |
- And it has to do with, again, the language thing. 01:06:28.620 |
It just, people started to think in terms of NumPy. 01:06:33.020 |
- And that opened up a whole new way of thinking. 01:06:36.460 |
And part of the story that you kind of mentioned, 01:06:42.980 |
is it seems like at some point in this story, 01:06:54.820 |
the scientific community started to think like programmers 01:07:00.140 |
or started to utilize the tools of computers to do, 01:07:04.300 |
like at a scale that wasn't done with Fortran. 01:07:12.020 |
I mean, there's a few other competitors, I guess, 01:07:14.260 |
but Python, I think, really, really took over. 01:07:19.700 |
because this is sort of the start of this journey in 2005, '06. 01:07:23.260 |
So my tenure committee, I applied for tenure in 2006, 2007. 01:07:34.140 |
Right, so I was a polarizing figure in the department. 01:07:36.860 |
It went all the way up to the university president. 01:07:39.780 |
Ultimately, my department chair had the sway. 01:07:43.780 |
they said, "Come back in two years and do it again." 01:07:49.700 |
I mean, I had this interest in entrepreneurship, 01:07:59.700 |
So I do have to give credit to that exploration of economics 01:08:03.060 |
because that led me, oh, I had a lot of opinions. 01:08:15.740 |
- So you value broadly, philosophically, freedom. 01:08:20.300 |
but I also understand the power of communities, 01:08:26.160 |
And so what's that balance, right, that makes sense? 01:08:31.500 |
I gotta go out and explore this entrepreneur world. 01:08:39.720 |
"Hey, could I join you and start this trend?" 01:08:43.060 |
And he, at that time, they were using SciPy a lot, 01:08:53.380 |
I left academia and went to entrepreneur world in 2007. 01:08:57.300 |
So I moved here in 2007, kind of took a leap, 01:09:06.900 |
I've kept some connections to a lot of academics, 01:09:12.500 |
I still value the essence and the soul and the heart 01:09:21.340 |
and the kind of, we can go into detail about why and where 01:09:24.540 |
and how this happens, what are some of the challenges. 01:09:31.820 |
I still love MIT because there's magic there. 01:09:35.580 |
There's people I talk to, like researchers, faculty, 01:09:42.700 |
and just the conversation, that's magic there. 01:10:00.820 |
And I still have a lot of hope that that can change 01:10:05.820 |
because I don't often see that particular type of magic 01:10:12.820 |
So we need that and we need that flame going. 01:10:16.620 |
- And it's the same thing as exactly as you said, 01:10:23.700 |
But then if you, like the reason I stepped away, 01:10:27.180 |
the reason I'm here just like you did in Austin 01:10:29.920 |
is like if I wanna build one robot, I'll stay at MIT. 01:10:37.140 |
enough to where I can explore the magic of that, 01:10:44.140 |
- That translational dance has been lost a bit. 01:10:47.580 |
- Right, and there's a lot of reasons for that. 01:10:51.660 |
but I realized that I wanted to explore entrepreneurship 01:10:57.740 |
and it's been a driving passion for 20 years, 25 years. 01:11:01.560 |
How do we connect capital markets and company, 01:11:06.500 |
'cause again, I fell in love with the notion, 01:11:07.860 |
oh, profit seeking on its own is not a bad thing. 01:11:13.520 |
for allocating resources that, in an emergent way, right, 01:11:28.820 |
of the world's resources and voluntarily people 01:11:37.560 |
and to try to figure out, and that's what I've been 01:11:39.180 |
kind of stumbling through for the past 14 years. 01:11:47.860 |
One of the things I had done, it's worth mentioning 01:11:53.820 |
I said, well, I don't know how to fund this thing. 01:11:57.860 |
And I had done some fundraising from the public 01:12:00.500 |
to try to get public fundraisers from my lab. 01:12:04.260 |
the fundraising circuit the way it's traditionally done. 01:12:06.920 |
So I wrote a book and I said, I'm gonna write a book 01:12:17.260 |
and made sure the stuff worked so the book would work. 01:12:19.740 |
So it really helped actually make NumPy become a thing. 01:12:28.180 |
Guide to NumPy is not a book you pick up and go, 01:12:38.020 |
so I said, look, I need to, so I'm gonna charge for it. 01:12:42.740 |
Not that much, just probably five angry messages, 01:12:57.780 |
But there were a few, but actually surprisingly not. 01:13:08.540 |
So what I did, I did it in an interesting way. 01:13:10.180 |
I said, well, kind of my ideas around IP law and stuff. 01:13:16.420 |
Like once it's, the fact that you have a thing 01:13:18.300 |
and copying is free, but the creation is not free. 01:13:21.660 |
So how do you fund the creation and allow the copying? 01:13:25.400 |
Right, and in software it's a little more complicated 01:13:26.820 |
than that because creation is actually a continuous thing. 01:13:29.180 |
You know, it's not like you build a widget and it's done. 01:13:37.540 |
I said, look, I need, I think I said 250,000. 01:13:40.740 |
If I make 250,000 from this book, I'll make it free. 01:13:53.140 |
And it's actually interesting 'cause one of the people 01:13:55.860 |
who also thought that was interesting ended up being 01:13:57.980 |
Chris White, who was the director of DARPA project 01:14:02.980 |
And the reason he even called us back is 'cause he remembered 01:14:05.380 |
my name from this book and he thought that was interesting. 01:14:08.140 |
And so even though we hadn't gone to the demo days, 01:14:10.900 |
we applied and the people said, yeah, nobody ever gets this 01:14:21.700 |
I was actually really, really pleased by the result. 01:14:23.900 |
I mean, I ended up in three years, I made 90,000. 01:14:29.500 |
I just put it up on, used PayPal and sold it. 01:14:32.100 |
And those are my first taste of kind of, okay, 01:14:40.380 |
From Germany to Japan, it was actually, it did work. 01:14:44.500 |
And so I appreciated the fact that PayPal existed 01:15:00.580 |
and not trying to make the full amount because, 01:15:03.300 |
you know, a year and a half later, I was at Enthought. 01:15:07.860 |
And then actually what happened is the documentation people, 01:15:10.820 |
we want to do documentation for SciPy as a collective. 01:15:14.220 |
And they were essentially needing the stuff in the book. 01:15:20.260 |
hey, can we just use the stuff from your book? 01:15:21.820 |
And at that point I said, yeah, I'll just open it up. 01:15:27.180 |
And the money that I made actually funded my grad student. 01:15:35.300 |
- The funny thing is if you do a very similar 01:15:37.340 |
kind of experiment now with NumPy or something like it, 01:15:43.700 |
- Because of the tooling and the community building. 01:15:49.140 |
there's just a virality to that kind of idea. 01:15:53.500 |
And really I've thought about a couple of books 01:15:56.020 |
or a couple of things that could be done there. 01:15:58.900 |
Even, I tried to hire a ghostwriter this year too, 01:16:09.740 |
So I came here, worked at Enthought for four years. 01:16:51.480 |
Somehow it's sad that when there's that kind of. 01:17:13.520 |
in the SciPy, in supporting the community around SciPy. 01:17:36.240 |
that could do, that could get venture funding, right? 01:17:45.040 |
- So let me ask you, it's a little bit for fun 01:18:10.720 |
or not regret necessarily, but some things to think about. 01:18:14.440 |
If you could go back and you could fix stuff about NumPy 01:18:21.960 |
what kind of things would you like to see changed? 01:18:28.200 |
First of all, you know, I wrote NumPy as a service 01:18:35.040 |
and then other people came help make it happen. 01:18:36.800 |
NumPy succeeded because of the work of a lot of people, right? 01:18:42.240 |
I'm grateful for the opportunity, the role I could play 01:18:45.120 |
and grateful that things I did had an impact, 01:18:49.240 |
because of the other people that came to the story. 01:18:55.760 |
the way data types, we had array scalars, for example, 01:18:59.320 |
that are really just a substitute for a type concept. 01:19:03.800 |
Right, so we had array scalars, or actual Python objects 01:19:06.960 |
so that there's for every, for a 32 bit float 01:19:13.120 |
Python doesn't have a natural, it's just one integer, 01:19:17.000 |
Well, what about these lower precision types, 01:19:25.300 |
but then have an object in Python that was one of them. 01:19:28.740 |
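A short sketch of what an array scalar looks like from Python, assuming a current NumPy install. The values are illustrative.

```python
import numpy as np

a = np.array([1.5, 2.5], dtype=np.float32)
x = a[0]  # indexing a single element returns an array scalar, not a Python float

print(type(x))                              # <class 'numpy.float32'>
print(isinstance(x, np.generic))            # True: base class of all array scalars
print(x.dtype, x.itemsize)                  # float32 4
print(isinstance(x, float))                 # False: Python has no 32-bit float
print(isinstance(np.float64(1.5), float))   # True: float64 matches Python's float
```

The array scalar carries the precision information that a plain Python number cannot express.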
And there's questions about, like in retrospect, 01:19:34.880 |
And like made the type system actually a Python type system 01:19:38.020 |
as opposed to currently, it's a Python 1-level type system. 01:19:42.240 |
between Python 1, Python 2, it's kind of technical, 01:19:47.320 |
it was really brilliant, it was the actually, 01:19:50.220 |
Python 1, all classes, new objects were one. 01:20:07.960 |
and have users write classes that are new types. 01:20:10.800 |
So he was able to have your user classes be actual types 01:20:13.320 |
and the Python type system got a lot more rich. 01:20:16.480 |
I barely understood that at the time that NumPy was written 01:20:21.400 |
created a type system that was Python 1 era. 01:20:24.420 |
It was every dtype is an instance of the same type 01:20:29.240 |
as opposed to having new dtypes be really just Python types 01:20:40.320 |
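The user-visible side of this design can be sketched as below: dtype descriptors are instances, while the array scalar types are real Python classes. This is only an illustration of the relationship described here. As far as I know, newer NumPy releases have reworked the dtype internals, so the checks deliberately use `isinstance` rather than exact type identity.

```python
import numpy as np

f32 = np.dtype("float32")
i64 = np.dtype("int64")

# Every dtype descriptor is an instance of np.dtype, not a class itself.
print(isinstance(f32, np.dtype), isinstance(i64, np.dtype))  # True True
print(isinstance(f32, type))                                 # False

# The array scalar types, by contrast, are ordinary Python classes.
print(isinstance(np.float32, type))                          # True
print(f32.type is np.float32)                                # True
```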
it's the fact that it's clumsy to create new types. 01:20:47.560 |
you wanna create new types, you want a quaternion type 01:20:49.500 |
or you wanna add a new posit type or you wanna, 01:21:02.880 |
it would integrate with that type system much cleaner 01:21:08.520 |
You could actually have Python when you add Numba 01:21:18.880 |
but are you talking about from the perspective 01:21:20.960 |
of developers within NumPy or users of NumPy? 01:21:23.880 |
- Developers of new, not really users of NumPy so much, 01:21:28.720 |
- So you're thinking about like how to design NumPy 01:21:36.760 |
- It's less work to make it better and to keep it maintained 01:21:39.360 |
and where that's impacted things, for example, is the GPU. 01:21:43.440 |
Like all of a sudden GPUs start getting added 01:21:50.600 |
The fact that we have to download a whole other object 01:22:00.200 |
if we could sort of go on that tangent briefly 01:22:02.520 |
is you have PyTorch and other libraries like TensorFlow 01:22:17.760 |
Well, and the problem was they didn't realize that. 01:22:26.960 |
is there like a difference between their implementations? 01:22:34.040 |
And sorry to interrupt that there's GPUs, ASICs, 01:22:41.600 |
or the aliens will come with a new kind of computer, 01:22:43.960 |
like an abstraction that NumPy should just operate nicely 01:22:50.280 |
and smarter and smarter with these multi-dimensional arrays. 01:22:56.920 |
We are working on something now called data-apis.org, 01:23:05.320 |
It's not just me, it's me and Ralf and Athan and Aaron 01:23:09.160 |
and a lot of companies are helping us at Quansight Labs. 01:23:22.600 |
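The idea behind the data-apis.org array API standard is that a function can be written once against a shared namespace and then run on any conforming array library. A rough sketch, assuming the standard's `__array_namespace__` protocol: the helper names `get_namespace` and `standardize` are hypothetical, and the fallback to NumPy is an assumption made so the example also runs on NumPy versions whose arrays do not expose that method.

```python
import numpy as np

def get_namespace(x):
    # Hypothetical helper: use the array's own namespace if it advertises one,
    # otherwise fall back to NumPy (an assumption for older NumPy versions).
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np

def standardize(x):
    # Written only against the namespace, never against a specific library.
    xp = get_namespace(x)
    return (x - xp.mean(x)) / xp.std(x)

out = standardize(np.array([1.0, 2.0, 3.0]))
print(out)
```

The same `standardize` would, in principle, accept arrays from any library that implements the protocol, without importing that library here.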
with the TensorFlow team and the PyTorch team 01:23:30.000 |
'cause the first year after leaving Anaconda in 2018, 01:23:34.000 |
I became deeply aware of this and realized that, 01:23:36.040 |
oh, this split in the array community that exists today 01:23:39.000 |
makes what I was concerned about in 2005 pretty parochial. 01:23:47.360 |
so perhaps the industry can sustain more stacks, right? 01:24:00.840 |
but it's better if you can at least refactor some parts. 01:24:12.720 |
- Yeah, innovative and then maybe on the infrastructure, 01:24:22.480 |
And I think, but it was interesting to hear the stories. 01:24:24.680 |
I mean, TensorFlow came out of a C++ library, 01:24:30.240 |
that was basically how they were doing inference, right? 01:24:36.480 |
That C++ library, then what was interesting to me 01:24:38.440 |
was the fact that both Google and Facebook did not, 01:24:42.640 |
it's not like they supported Python or NumPy initially, 01:24:47.240 |
They came to this world and then all these were like, 01:24:54.840 |
TensorFlow's bolt-on, I don't mean to offend, 01:25:05.800 |
'cause in the sense that I don't give people input enough. 01:25:13.680 |
- When I went to, it was a talk given at Mallorca in Spain 01:25:24.840 |
Like you're taking this beautiful system we've created 01:25:27.080 |
and like you're corrupting all these poor Python people, 01:25:36.760 |
And so Keras TensorFlow is fine, is reasonable. 01:25:43.640 |
Like Facebook had their own C++ library for doing inference 01:25:48.160 |
and they also had the same reaction, they had to do this. 01:25:52.840 |
maybe because the way it's situated in part of FAIR, 01:25:56.600 |
TensorFlow is definitely used and they have to make, 01:26:06.880 |
Facebook's been much more open to having community input 01:26:14.200 |
they're really eager to have community users. 01:26:18.800 |
Like it's harder to become a contributor to TensorFlow. 01:26:21.600 |
- And it's also, this is a very difficult question to answer 01:26:24.760 |
and I don't mean to be throwing shade at anybody, 01:26:27.080 |
but you have to wonder, it's the Microsoft question, 01:26:30.320 |
of when you have a tool like PyTorch or TensorFlow, 01:26:36.280 |
and how much are you tending to the big corporate clients? 01:26:46.440 |
or do you tend to the few that are giving you a ton of money? 01:26:54.840 |
- 'Cause I feel like if you nurture the hackers, 01:26:57.760 |
you will make the right decisions in the longterm 01:27:07.080 |
'Cause you can lean to the hackers and run out of money. 01:27:11.480 |
Which has been some of the challenge I've faced. 01:27:15.760 |
I would look at some of the experiments like NumPy, 01:27:23.720 |
- Right, I mean, I didn't succeed in the early days 01:27:26.560 |
of getting enough financial contribution to NumPy 01:27:32.480 |
I had to just catch an hour here, an hour there. 01:27:37.960 |
Like I've wanted to be able to do something about that 01:27:44.560 |
we had an offer from Microsoft early days of Anaconda. 01:27:50.240 |
The problem was the right people at Microsoft 01:28:02.720 |
but it was another R company that was emergent. 01:28:09.280 |
But they were really doubling down on R, right? 01:28:14.280 |
So it's not, it wasn't, it was before Satya was there. 01:28:19.400 |
- Right, and the offer was coming from someone 01:28:23.840 |
- Right, and if it had come from Scott Guthrie, 01:28:36.040 |
especially given what Microsoft has since done 01:28:38.640 |
for the open source community and all those things. 01:28:41.520 |
I really like some of the stuff they've been doing. 01:28:43.640 |
They're still working, and they've hired Guido now, 01:28:46.360 |
and they've hired a lot of Python developers. 01:28:52.400 |
Which means he retired, then he came out of retirement, 01:29:05.040 |
- Well, he was kind of saying he would retire, 01:29:09.640 |
since I last sat down and really talked to Guido. 01:29:18.200 |
because I'd finally figured out the type system for NumPy. 01:29:20.720 |
I wanted to kind of talk about that with him, 01:29:24.000 |
- Could you stay in that, just for a brief moment, 01:29:31.280 |
What have you learned from Guido about programming, 01:29:45.680 |
He may, but we talk enough to, I respect his, 01:30:06.920 |
- I would have loved to be a fly on the wall. 01:30:14.840 |
Like he was willing to listen to people's ideas, right? 01:30:19.720 |
Now generally, I'm not saying universally that's been true, 01:30:27.240 |
Like on the scientific side, he would just kind of defer. 01:30:29.080 |
He didn't really always understand what we were doing. 01:30:39.640 |
but about 10 years later than it should have. 01:30:46.240 |
And I learned this while I was writing NumPy. 01:30:48.200 |
I also wrote tools to, I became a Python dev, 01:30:57.000 |
but we got the basic structure of it into Python. 01:31:10.800 |
'Cause I wrote the underlying infrastructure in C, 01:31:27.280 |
Like really, I really got a lot of respect for him 01:31:29.480 |
when I saw what he did with this type class merger thing. 01:31:42.320 |
The reason I could is 'cause he'd written this blog post 01:31:50.120 |
But he was willing to at least try to write this post. 01:31:53.320 |
And so he's been motivated early on with Python. 01:31:59.920 |
oh, maybe we should be pushing programming to more people. 01:32:02.080 |
So he had this populist notion, I guess, or populist sense. 01:32:10.600 |
of engaging with contributors sufficiently to, 01:32:24.560 |
- Can you also comment on this tragic stepping down 01:32:29.120 |
from his position as the benevolent dictator for life 01:32:43.680 |
you can look up, there's the walrus operator, 01:33:07.320 |
What do you think about the pressure of leadership? 01:33:10.400 |
- It's something that, you mentioned the letter I wrote 01:33:19.560 |
You get criticized, right, and you get pushed, 01:33:21.680 |
and you get, not everybody loves what you do. 01:33:23.840 |
Like, any time you do anything that has impact at all, 01:33:32.000 |
because it's impossible if you did everything right. 01:33:39.320 |
People can, I prefer to give people the benefit of the doubt. 01:33:43.120 |
I don't immediately assume they have bad intentions. 01:33:45.840 |
And maybe for other, you know, maybe that doesn't happen 01:33:48.200 |
for everybody, for whatever reason, their past, 01:33:50.240 |
their experience with people, they sometimes have bad, 01:33:53.080 |
so they immediately attribute to you bad intentions. 01:33:57.800 |
but I think you're misinterpreting the whole point. 01:34:03.680 |
you know, I've been, sometimes I say to people, 01:34:07.160 |
I know I'm, I care enough about entrepreneurship 01:34:09.840 |
to make some open source people uncomfortable. 01:34:26.080 |
I've noticed this too, that there's a tendency, 01:34:31.840 |
when you don't have perfect information about the situation, 01:34:35.600 |
you tend to fill the gaps with the worst possible, 01:34:39.320 |
or at least the bad story that fills those gaps. 01:34:47.000 |
maybe not fully naively, but filling in the gaps 01:34:49.760 |
with the good, with the best, with the positive, 01:34:54.760 |
with the hopeful explanation of why you see this. 01:34:57.320 |
So if you see somebody like you trying to make money 01:35:00.280 |
on a book about NumPy, there's a million stories around that 01:35:04.080 |
that are positive, and those are good to think about, 01:35:15.600 |
And also when you project that positive intent, 01:35:24.280 |
And of course, what Twitter early on figured out, 01:35:27.760 |
Facebook, is that they can make a lot of money 01:35:33.120 |
- And so like, there's this, we're fighting this mechanism. 01:35:39.560 |
- And then for some reason, something in our minds 01:35:41.920 |
really enjoys sharing that and getting all excited 01:35:48.600 |
perhaps that we're gonna get eaten if we don't. 01:35:50.880 |
- Exactly, for us to be effective as a group of people 01:35:54.600 |
you have to project positive intent, I think. 01:36:01.640 |
But Python has done a reasonable job in the past, 01:36:05.440 |
it started to get this pressure where it didn't. 01:36:07.840 |
I really didn't know enough about what happened. 01:36:12.160 |
and I know most of the steering committee members today, 01:36:17.920 |
but it's the wrong role for me right now, right? 01:36:20.880 |
I have a lot of respect for the Python developer space 01:36:28.840 |
and array programming developers or science developers. 01:36:31.440 |
And in fact, Python succeeds in the array space 01:36:39.440 |
and working like everything to try to keep up 01:36:45.520 |
Like I'm a C programmer, but not a computer scientist. 01:36:49.080 |
Like I was an engineer and physicist and mathematician, 01:36:52.600 |
and I didn't always understand what they were talking about 01:36:56.400 |
and why they would have opinions the way they did. 01:37:00.280 |
Then you also have to explain your point of view 01:37:04.840 |
And that communication is always the challenge. 01:37:09.200 |
about the negativity is just another form of that. 01:37:12.560 |
And it does appear we're wired anyway to at least have a, 01:37:15.880 |
there's a part of us that will enemy, friend, enemy. 01:37:31.640 |
and all of this, let me just ask you these questions. 01:37:34.120 |
So one interesting side on the Python history 01:37:41.000 |
You mentioned move from Python 1 to Python 2, 01:37:50.040 |
It broke in quite a small way backward compatibility, 01:38:01.480 |
- From how long it took and how painful it seemed to be? 01:38:07.000 |
Well, I mentioned here earlier that NumPy was written in 2005. 01:38:15.520 |
to talk about getting NumPy into Python 3. 01:38:18.880 |
oh, we were moving to Python 3, let's have that be, 01:38:22.200 |
because like, wait, Python 3, that was in 2020, right? 01:38:25.480 |
When we finally ended support for Python 2, 01:38:49.800 |
And then 3.4 started to be, oh, yeah, I want that. 01:38:52.600 |
And then 3.5 has the matrix multiply operator, 01:39:08.200 |
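For readers who have not used it, the matrix multiply operator added in Python 3.5 is `@`. A small sketch with illustrative matrices, contrasting it with elementwise `*` on NumPy arrays:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])

matmul = A @ B       # matrix product (Python 3.5+)
elementwise = A * B  # elementwise product

print(matmul)       # [[2 1]
                    #  [4 3]]
print(elementwise)  # [[0 2]
                    #  [3 0]]
```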
when you have a group of people using something, 01:39:15.440 |
I think it fixed some things Guido had always hated. 01:39:17.200 |
I think he didn't like the fact that print was a statement. 01:39:20.760 |
But in some sense, that's a bit of gratuitous change 01:39:27.320 |
But one of the challenges was there wasn't enough features 01:39:40.480 |
I think also it illustrated just the funding realities. 01:39:53.440 |
And I've learned some of the behind the scenes 01:39:57.920 |
And maybe not on air, we can talk about some of that. 01:40:01.520 |
But it's interesting to see, but Guido had a job, 01:40:03.640 |
but his full-time job wasn't just work on Python. 01:40:16.200 |
- Maybe that's a feature and not a bug, I don't know. 01:40:19.080 |
At least early on, it's sort of, I know, yeah. 01:40:21.840 |
- It's like Olympic athletes are often severely underfunded, 01:40:25.200 |
but maybe that's what brings out the greatness. 01:40:33.680 |
I currently have an incubator for open source startups. 01:40:37.640 |
is create the environment I wished had existed 01:40:44.120 |
I'm trying to create those opportunities and environments. 01:40:52.600 |
- So let me stay, I mean, I could probably stay 01:40:55.480 |
at NumPy for a long time, but this is a fun question. 01:41:00.920 |
So Andrej Karpathy leads the Tesla Autopilot team, 01:41:04.640 |
and he's also one of the most like legit programmers I know. 01:41:09.640 |
It's like he builds stuff from scratch a lot, 01:41:16.160 |
He just builds it from scratch, and I always love that. 01:41:27.160 |
saying that they got a significant improvement 01:41:31.240 |
on some aspect of their like data loading, I think, 01:41:44.480 |
that you can get even a much greater improvement 01:42:08.040 |
between usability and efficiency broadly in NumPy, 01:42:19.080 |
if you use a NumPy math function on a scalar, 01:42:23.600 |
it's gonna be slower than using a Python function 01:42:33.760 |
'Cause you can also call that math object on an array. 01:42:36.720 |
And so effectively, it goes through a similar machine. 01:42:39.520 |
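The scalar overhead described here can be measured with a sketch like the one below. The timings depend on the machine and NumPy version, so no numbers are claimed; the only assumption is that `math.sqrt` and `np.sqrt` are the functions being compared.

```python
import math
import timeit

import numpy as np

x = 2.0

# Same answer, different return types: a Python float vs. a NumPy array scalar.
py_result = math.sqrt(x)
np_result = np.sqrt(x)
print(type(py_result).__name__, type(np_result).__name__)  # float float64

# Time many scalar calls; np.sqrt goes through the general array machinery.
t_math = timeit.timeit(lambda: math.sqrt(x), number=100_000)
t_np = timeit.timeit(lambda: np.sqrt(x), number=100_000)
print(f"math.sqrt: {t_math:.4f}s   np.sqrt on a scalar: {t_np:.4f}s")

# On a large array the comparison flips: one np.sqrt call covers every element.
arr = np.full(100_000, 2.0)
t_arr = timeit.timeit(lambda: np.sqrt(arr), number=10)
print(f"np.sqrt on 100,000 elements, 10 calls: {t_arr:.4f}s")
```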
There aren't enough of the, which you would do, 01:42:56.880 |
that's where you definitely need to be using arrays. 01:42:59.040 |
But if you're less than that, and for reading, 01:43:02.720 |
and essentially it's not compute bound, it's IO bound, 01:43:05.560 |
and so you're really taking lists of 1,000 at a time 01:43:08.440 |
and doing work on it, yeah, you could be faster 01:43:21.200 |
it's very possible that np.sqrt is much faster. 01:43:31.640 |
or whatever, all the different quotes around that, 01:43:34.040 |
is sometimes obsessing about this particular little quirk 01:43:41.680 |
- For somebody like, if you're trying to optimize your path, 01:43:51.840 |
- I believe the quote is, it's the root of all evil. 01:43:59.080 |
- Well, Don Knuth is kind of like Mark Twain, 01:44:00.800 |
people just attribute stuff to him, I don't-- 01:44:22.640 |
And the other part, the other part I didn't mention, 01:44:32.120 |
and I wanted it from the beginning of writing NumPy, 01:44:41.880 |
is really that you can write functions using all of it. 01:44:47.840 |
I write this N-dimensional for loop with four loops, 01:44:53.560 |
I'm gonna do this operation, this plus, this minus, 01:45:03.600 |
with the added benefit of, oh, it can be parallelized, 01:45:14.120 |
and then try to infer parallelism from for loops. 01:45:19.600 |
and just automatically parallelize that problem. 01:45:28.960 |
there are others, sine, cosine, add, subtract, 01:45:36.880 |
and all these special functions that come up in physics, 01:45:40.200 |
and I added them as Ufuncs, so they could work on arrays. 01:45:45.920 |
that was one of the things we tried to make better in NumPy 01:45:47.760 |
was how do they work, can they do broadcasting, 01:45:56.520 |
So what happens, the Python scalar gets broadcast 01:46:01.320 |
and then it goes through the whole same machinery 01:46:20.360 |
that can be faster than just calling the C lib square root. 01:46:27.640 |
- In the Python runtime, so they really optimize it, 01:46:35.200 |
- They don't have to worry about the fact that, 01:46:36.320 |
oh, this could be an object with many pieces. 01:46:41.060 |
in the sense that typecasting and broadcasting, 01:46:47.360 |
I have a scalar with a four dimensional array, 01:46:50.480 |
Oh, I have to kind of coerce the shape of this guy 01:46:54.640 |
to make it work against the whole four dimensional array. 01:46:56.880 |
So it's the idea of, I can do a one dimensional array 01:46:59.680 |
against a two dimensional array, and have it make sense. 01:47:03.200 |
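A small sketch of the broadcasting cases just mentioned, assuming NumPy is installed. The array shapes and variable names are arbitrary choices made for illustration.

```python
import numpy as np

# A scalar against a four-dimensional array: the scalar is treated as if it
# had the array's full shape.
cube = np.ones((2, 3, 4, 5))
shifted = cube + 10.0
print(shifted.shape)

# A one-dimensional array against a two-dimensional array: shapes are aligned
# from the trailing dimension, so (3,) lines up with the columns of (2, 3).
grid = np.zeros((2, 3))
row = np.array([1.0, 2.0, 3.0])
combined = grid + row
print(combined)

# Mismatched trailing dimensions are rejected rather than guessed at.
try:
    grid + np.array([1.0, 2.0])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)
```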
is it challenges you to reformulate, rethink your problem, 01:47:14.320 |
- Exactly, in fact, that's where some of the edge cases, 01:47:16.000 |
boundaries are, is that, well, they're still there, 01:47:18.960 |
and this is where array scalars are particular. 01:47:28.300 |
And so their default is to coerce the array scalar 01:47:41.760 |
we do comparisons and say, look, it's a thousand X speed up. 01:48:04.020 |
And then that's always been the power of Python, 01:48:11.520 |
in the runtime of the Python interpreter, yeah. 01:48:17.260 |
which you do in the high level is just high level logic. 01:48:34.040 |
the language like Julia says, we're gonna be all in one. 01:48:37.400 |
And then there's, the jury's out, is that possible? 01:48:55.440 |
But to compile those libraries takes a while. 01:49:00.800 |
You wanna have this precompiled binary available 01:49:09.800 |
running binary code is more than source code. 01:49:13.840 |
it's the loader, it's the how does that interpret it 01:49:17.640 |
There's a lot of details there that actually, 01:49:25.160 |
the better off you are, and you can do more details. 01:49:28.440 |
But sometimes it helps with abstractions too. 01:49:33.480 |
with abstractions is you kind of sometimes assume 01:49:41.520 |
had your case in mind and found the optimal solution. 01:49:49.000 |
- One of the really powerful things to me early on, 01:49:52.800 |
I mean, it sounds silly to say, but with Python, 01:49:55.480 |
probably one of the reasons I fell in love with it 01:50:00.920 |
- So obviously probably most languages have some, 01:50:06.480 |
but it felt like it was a first-class citizen. 01:50:09.040 |
And it was just my brain was able to think in dictionaries. 01:50:12.200 |
But then there is the thing that I guess I still use 01:50:23.760 |
the running time cost is not that significant. 01:50:26.040 |
There's a lot of things to understand about dictionaries 01:50:30.400 |
that the abstraction kind of doesn't necessarily 01:50:37.400 |
- Right, do you really understand the notion of a hash map 01:50:41.080 |
But you're right, dictionaries are a good example 01:50:44.960 |
And I agree with you, I love dictionaries too. 01:50:47.840 |
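For readers who want to see what sits underneath the dictionary abstraction, here is a toy hash map. It is only a sketch: `ToyHashMap` is a made-up name, it uses simple per-bucket lists, and CPython's real `dict` is implemented differently and far more carefully.

```python
class ToyHashMap:
    """Illustrative hash map with a fixed number of buckets (no resizing)."""

    def __init__(self, n_buckets=8):
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # hash() turns the key into an integer; the remainder picks a bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        # Only one bucket is searched, not the whole table.
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        raise KeyError(key)


table = ToyHashMap()
table.put("rows", 3)
table.put("cols", 4)
table.put("rows", 30)  # same key, new value
print(table.get("rows"), table.get("cols"))
```

The point of the sketch is the cost model: a lookup hashes the key and scans one small bucket, which is why keys must be hashable and why lookups stay cheap as the table grows.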
Took me a while to understand that, but once you do, 01:50:54.120 |
that one of the foundational things is dictionaries, 01:51:07.400 |
like dictionaries and lists and tuples and binary trees. 01:51:35.000 |
and make them less error-prone to human users, 01:51:41.480 |
human interpretable names that are sticky to those arrays. 01:51:44.680 |
So that's how you start to think about dictionaries. 01:51:52.120 |
And that's actually the tension I've had with NumPy 01:51:58.800 |
human interpretability and also protecting me 01:52:12.920 |
- Yes, so there's a project called Labeled Arrays. 01:52:18.040 |
oh, we need, we're indexing NumPy with just numbers, 01:52:21.320 |
all the columns and particularly the dimensions. 01:52:25.520 |
you don't necessarily need to label each column or row, 01:52:36.760 |
That was one of the impetuses for pandas actually, 01:52:39.680 |
was just, oh, we do need to label these things. 01:52:43.040 |
And Labeled Array was an attempt to add that, 01:52:47.680 |
And there's been, like that's an example of something 01:52:49.360 |
I think NumPy could add, could be added to NumPy. 01:52:53.080 |
But one of the challenges again, how do you fund this? 01:52:55.000 |
Like I said, one of the tragedies I think is that, 01:53:02.380 |
So I've always just done it in my spare time, 01:53:16.640 |
I'm actually paying people to work on NumPy and SciPy, 01:53:22.820 |
That's what I always wanted to do from day one, 01:53:24.280 |
it just took me a while to figure out a mechanism to do that. 01:53:27.640 |
- Even like in the university setting, respecting that, 01:53:49.180 |
But then also there's just a better allocation of resources. 01:53:57.020 |
we spent over $6 trillion in the Middle East after 9/11 01:54:04.580 |
And sort of to put politics and all that aside, 01:54:08.060 |
it's just, you think about the education system, 01:54:10.120 |
all the other ways we could have possibly allocated 01:54:21.220 |
by allocating a little bit of money to the programmers 01:54:26.220 |
that build the tools that run the world, it's fascinating. 01:54:47.060 |
So it's like, I think you can have enough money 01:54:50.760 |
and actually be wealthy while maintaining your values. 01:54:55.560 |
There's an old adage that nations that trade together 01:54:59.480 |
I've often thought about nations that code together. 01:55:03.880 |
- Because one of the things I love about open source 01:55:09.200 |
One of the challenges with business and open source 01:55:10.800 |
is the fact that, well, business is national. 01:55:16.280 |
and have laws that are respected in those jurisdictions, 01:55:18.320 |
and hiring, and yet the open source ecosystem 01:55:23.080 |
Like, currently, one of the problems we're solving 01:55:47.600 |
and, you know, and it's, there's a certain amount 01:55:50.520 |
of humanizing, right, that gets away from the, 01:55:58.560 |
but the memes are not even an accurate reflection 01:56:02.440 |
- Well, if you look at the major power centers 01:56:08.240 |
in the next few decades, it's the United States, 01:56:18.280 |
So if they work together, I think that's one way, 01:56:21.360 |
the politicians can do their stupid bickering, 01:56:23.380 |
but, like, there's a layer of infrastructure, of humanity, 01:56:29.440 |
that I think can prevent major military conflict, 01:56:34.100 |
which would, I think, most likely happen at the cyber level 01:56:43.320 |
Nations that code together don't go to war together. 01:56:48.620 |
That's one of the philosophical hopes, but yeah. 01:56:58.520 |
So from the early days, there was kind of a pushback 01:57:08.240 |
If you wanna write something that's usable and friendly, 01:57:19.760 |
And the reality was people would write high-level code 01:57:30.720 |
Like before Numba, it was always don't write a for loop. 01:57:44.880 |
that you don't necessarily need all the time. 01:57:52.800 |
Vectorize was a tool in NumPy, it was released. 01:58:01.080 |
So you get a function that just worked on a scalar. 01:58:12.020 |
If X equals zero, return one, otherwise do sine X over X. 01:58:26.720 |
and at every call do a loop back into Python. 01:58:30.440 |
It gave you the appearance of a Ufunc, but it was very slow. 01:58:36.280 |
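The scalar function described above can be written out and wrapped with `np.vectorize` roughly as follows, assuming NumPy is installed. The names `sinc_scalar`, `sinc_vectorized`, and `sinc_array` are illustrative, and `sinc_array` is just one way to express the same rule with whole-array operations.

```python
import math

import numpy as np


def sinc_scalar(x):
    """Works on one Python number at a time."""
    if x == 0:
        return 1.0
    return math.sin(x) / x


# np.vectorize gives the function an array interface, but it still calls
# sinc_scalar once per element from a Python-level loop.
sinc_vectorized = np.vectorize(sinc_scalar, otypes=[float])


def sinc_array(x):
    """Same rule expressed with array operations only."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x == 0, 1.0, x)  # avoid dividing by zero
    return np.where(x == 0, 1.0, np.sin(safe) / safe)


xs = np.array([0.0, 0.5, 1.0, 2.0])
print(sinc_vectorized(xs))
print(sinc_array(xs))
```

Passing `otypes=[float]` keeps the output dtype from being inferred from whichever element happens to be evaluated first.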
and produce a Ufunc working on binary native code. 01:58:39.480 |
So in fact, I had somebody work on that with PyPy 01:58:42.780 |
and see if PyPy could be used to produce a Ufunc like that 01:58:45.620 |
early on in 2009 or something like that, 2010. 01:58:52.860 |
But in 2012, Peter and I had just started Anaconda. 01:58:57.000 |
We had, I had just, I'd learned to raise money. 01:59:00.700 |
That's a different topic, but I'd learned to, you know, 01:59:03.060 |
raise money from friends, family, and fools, as they say. 01:59:20.600 |
And we had a bunch of ideas there, but one of them, 01:59:27.240 |
I just, I went, I heard about my friend Dave Beazley 01:59:40.060 |
that just basically mapped Python bytecode to LLVM. 01:59:46.500 |
- Right, so, and the first version is like, this works, 01:59:55.360 |
There had been efforts to speed up Python in the past, 02:00:05.140 |
the runtime of Python, which is fundamentally hard, 02:00:12.160 |
Like, it's generic, you know, when it does this variable. 02:00:20.280 |
I said, I'm gonna take a subset of the Python syntax 02:00:27.400 |
- So it's almost like for loops, like focusing on for loops. 02:00:30.440 |
- For loops, scalar arithmetic, you know, typed, 02:00:34.400 |
you know, really typed language, a typed subset. 02:00:41.880 |
So you didn't have to spell all the types out, 02:00:43.400 |
because when you call a function, so Python is typed. 02:00:59.180 |
that could be compiled and have them used for NumPy arrays. 02:01:07.040 |
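A minimal sketch of the kind of code this typed subset targets: a plain for loop with scalar arithmetic over a NumPy array. It assumes Numba is installed and uses its `njit` decorator; if Numba is missing, the fallback below simply runs the same function as ordinary Python. `sum_of_squares` is a made-up example function.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the example still runs without Numba (no compilation).
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # No type annotations: the types are inferred from the arguments
    # the function is actually called with.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(5, dtype=np.float64)
print(sum_of_squares(data))
```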
Do you add a comment within Python that tells it to do, 02:01:17.760 |
that it just looks at the type of the objects 02:01:23.440 |
And then it was also, because we had a use case 02:01:26.080 |
that could work early, like one of the challenges 02:01:29.040 |
of any kind of new development is if you have something 02:01:33.200 |
a long time, it's really hard to get it off the ground. 02:01:35.960 |
If you have a project where there's some incremental story, 02:01:39.200 |
it can start working today and solve a problem, 02:01:42.280 |
then you can start getting it out there, getting feedback. 02:01:44.640 |
Because Numba today, now Numba is nine years old today, 02:01:47.640 |
right, the first two, three versions were not great, right? 02:01:52.120 |
But they solved a problem and so people could try it 02:02:00.600 |
the subset it would actually compile was small. 02:02:09.000 |
So decorators are just these little constructs 02:02:11.040 |
that let you decorate code with an @ and then a name. 02:02:17.760 |
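As a plain-Python illustration of the mechanism, a decorator is a function that takes a function and returns the object that will be bound to that name. The decorator `count_calls` below is hypothetical and does no compilation; it only shows the replacement step.

```python
import functools


def count_calls(func):
    """Return a wrapper that takes the place of the decorated function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls  # equivalent to: add = count_calls(add)
def add(a, b):
    return a + b


print(add(2, 3), add.calls)
```

After decoration, the name `add` refers to `wrapper`, not to the original function, which is the hook a compiling decorator can use to swap in something faster.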
and actually just compile it and replace the Python function 02:02:24.920 |
And it would just do that and we went from Python bytecode, 02:02:28.520 |
then we went to AST, I mean, writing compilers 02:02:30.720 |
actually, I learned a lot about why computer science 02:02:36.600 |
They use tree structures, they use all the concepts 02:02:40.480 |
And it's actually hard to, it's easy to write a compiler 02:02:47.600 |
and we ended up with three versions of Numba, right? 02:02:51.540 |
- What's, what programming language is Numba written in? 02:03:01.700 |
- Yeah, so Python, but then the whole goal of Numba 02:03:07.480 |
And so LLVM actually does the code generation. 02:03:12.760 |
if you're not writing the parser nor the code generator. 02:03:16.680 |
- So for people who don't know, LLVM is a compiler itself. 02:03:20.280 |
- Yeah, it's really badly named, Low Level Virtual Machine, 02:03:29.280 |
But the name makes you think that the virtual machine 02:03:49.160 |
And it was a place where you could collaborate. 02:03:51.040 |
And we were early, I mean, people had started before. 02:04:01.040 |
because one, every browser has a JavaScript JIT. 02:04:18.600 |
- Yes, well, it's kind of incredible what they've done. 02:04:24.760 |
But Numba was an effort to make that happen with Python. 02:04:32.440 |
And then we also applied for this DARPA grant 02:04:34.800 |
and used some of that money to continue the development. 02:04:36.800 |
And then we used proceeds from service projects we would do. 02:04:41.840 |
on that we would then use some of the profits 02:04:45.400 |
So we ended up with a team of two or three people 02:04:50.720 |
And ultimately, the fact that we had a commercial version 02:04:54.720 |
So part of the way I was trying to fund Numba 02:04:58.560 |
and then we'll have a commercial version of Numba 02:05:05.520 |
and the very first @jit compiler that in 2012, 2013, 02:05:18.800 |
- And that's an interesting funding mechanism 02:05:21.120 |
because large companies or larger companies care about speed 02:05:35.200 |
One, they'll pay for really good user interfaces, right? 02:05:37.960 |
And so I'm always looking for what are the things 02:05:40.160 |
people will pay for that you could actually adapt 02:05:45.560 |
The second is speed, like a better runtime, faster runtime. 02:05:50.000 |
you mean like a small number of people pay a lot of money, 02:06:06.800 |
So there's a company, but there's also a project 02:06:14.600 |
for many reasons, but one of which is bringing 02:06:31.520 |
- Yeah, I'll tell you a little bit of the history of that. 02:06:35.240 |
we wanted to scale Python 'cause we, you know, 02:06:39.040 |
Peter and I had the goal of when we started Anaconda, 02:06:52.520 |
but these need to run at scale with lots of machines. 02:06:55.160 |
The other thing we wanted to do was make user interfaces 02:06:59.200 |
We wanted to make sure the web did not pass by 02:07:05.840 |
So those are the two kind of technical areas. 02:07:07.560 |
We thought, oh, we'll build products in this space. 02:07:12.360 |
Very quickly in, but of course, the thing I knew how to do 02:07:18.800 |
and fools that had invested didn't lose their money. 02:07:21.560 |
So it's a little different than if you take money 02:07:25.360 |
the venture fund, they want you to go big or go home. 02:07:27.560 |
They're kind of like expecting nine out of 10 to fail 02:07:40.280 |
So I'm gonna do something I know can return a profit, 02:07:51.120 |
And I've learned from since and have it better. 02:07:56.520 |
of kind of attracting the interest around the area 02:08:00.160 |
and then funnel some money on some interesting projects. 02:08:02.920 |
Super excited about what came out of our energy there. 02:08:06.640 |
- So what are some of the interesting projects? 02:08:15.880 |
These are all tools that are extremely relevant 02:08:26.040 |
- Oh, JupyterLab, JupyterLab came out of this too. 02:08:34.000 |
So Bokeh was one of the foundational things to say, 02:08:39.160 |
- Right, that's right, that's right, that's right. 02:08:43.280 |
with all due respect to Matplotlib and Bokeh, 02:08:52.160 |
- Right, 'cause you're, I mean, I don't know. 02:09:03.440 |
But there's a difference between static plots 02:09:06.920 |
I'm an end user, I just want to write a simple, 02:09:09.760 |
Pandas started the idea of, here's a data frame, 02:09:12.880 |
I'm just gonna attach plot as a method to my object, 02:09:20.200 |
'cause there's a lot less you have to pass in, right? 02:09:23.680 |
You can just say, here's my object, you know what you are. 02:09:31.320 |
that have not been super well-developed entirely. 02:09:33.720 |
But Bokeh was focused on interactive plotting. 02:09:38.400 |
between interactive plotting and application, 02:09:42.680 |
And there's some incredible work that got done there, right? 02:09:45.800 |
because then you're basically doing JavaScript and Python. 02:09:49.440 |
So we wanted to tackle some of these hard problems 02:10:03.040 |
And so we had two proposals, one for the Bokeh 02:10:05.560 |
and one for actually Numba and the other work. 02:10:14.880 |
- Fortunately, Chris let us use some of the money to fund 02:10:31.720 |
Yeah, Conda, it was early on, like I said, with SciPy. 02:10:35.480 |
SciPy was a distribution masquerading as a library. 02:10:37.880 |
And you heard me talking about compiler issues 02:10:41.480 |
and the fact that people can use your libraries 02:10:47.800 |
And one of the first things we did at Continuum Analytics 02:10:50.680 |
which became Anaconda, was organize the PyData ecosystem 02:11:04.200 |
but we're also gonna reify the community aspect 02:11:14.720 |
like this whole story of packaging in Python? 02:11:19.280 |
- Yeah, that's what I'm gonna get to, actually. 02:11:24.200 |
I think it's best expressed through the conversation 02:11:26.080 |
I had with Guido at a conference, where I said, 02:11:31.280 |
And Guido said, I don't ever care about packaging. 02:11:36.320 |
I'm like, I guess if you're the language creator 02:11:39.720 |
in the distribution, maybe you don't worry about packaging. 02:11:42.520 |
But Guido has never really cared about packaging, right? 02:11:45.200 |
And never really cared about the problem of distribution. 02:11:51.440 |
In fact, there's a philosophical question about 02:11:54.160 |
should you have different development packaging managers? 02:11:56.680 |
Should you have a package manager per language? 02:12:04.200 |
And there's an aspect of the development tool 02:12:07.680 |
And every language should have some story there 02:12:12.120 |
- So you should have language-specific development tools. 02:12:14.960 |
- Development tools that relate to package managers. 02:12:19.500 |
around package management that those language-specific 02:12:23.560 |
and currently aren't doing a good job of that. 02:12:25.920 |
That was one of the challenges of not seeing that difference 02:12:47.480 |
into my current development environment today. 02:13:19.960 |
Peter has a great photo of me talking to Guido 02:13:21.880 |
and he pretends we're talking about this story. 02:13:26.320 |
and ask Guido, "Guido, we need to fix packaging in Python. 02:13:36.920 |
- All right, you said, okay, you said to do this ourselves. 02:13:39.440 |
So at the same time, people did start to work 02:13:45.680 |
So in 2012, kind of motivated by our training courses 02:13:50.200 |
very similar to what you just mentioned about your mother. 02:14:03.440 |
"because I just spent a week getting my Python environment 02:14:05.440 |
"and if you change NumPy, I have to reinstall everything. 02:14:09.120 |
"And reinstalling is such a pain, don't do it." 02:14:10.920 |
I'm like, wait, okay, so now we're not making changes 02:14:14.060 |
to a library because of the installation problem 02:14:20.540 |
So we said, we're gonna make a distribution of Python. 02:14:26.920 |
I wanted to make one that we gave away for free 02:14:40.380 |
we want a package manager that works on Windows, 02:14:49.520 |
RPM is an operating-system-specific package manager. 02:15:14.360 |
And we decided to go multiple operating systems, 02:15:17.120 |
and programming language independent. 02:15:19.240 |
Because even Python, and particularly what was important 02:15:21.840 |
was SciPy has a bunch of Fortran in it, right? 02:15:24.960 |
And scikit-learn has links to a bunch of C++. 02:15:30.000 |
And the Python package managers, especially early on, 02:15:49.560 |
In fact, scikit-learn was a fantastic project that emerged. 02:15:57.120 |
I talked to you earlier about SciPy being too big 02:16:04.120 |
And there's scikit-image, there's scikit-learn, 02:16:07.600 |
And it was a fantastic move that the community did. 02:16:13.520 |
I didn't like the fact you typed scikit-image. 02:16:15.480 |
I was like, it has got to be simpler, sklearn. 02:16:19.760 |
I don't like typing all this stuff, it's important. 02:16:24.680 |
I love the fact that they went out and they did it. 02:16:31.240 |
Scikit-learn really emerged as a fantastic project. 02:16:34.600 |
- And the documentation around that is also incredible. 02:16:36.440 |
- And the documentation was incredible, exactly. 02:16:37.800 |
- I don't know who did that, but they did a great job. 02:16:41.400 |
a lot of people, a lot of European contributors. 02:16:55.240 |
which is machine learning, but couldn't install it. 02:17:02.120 |
So our use case of Conda was conda install scikit-learn. 02:17:06.040 |
Right, and it was the best way to install Scikit-learn 02:17:16.720 |
I still think you should Conda install Scikit-learn 02:17:22.200 |
The issue is the package they created was wheels, 02:17:24.720 |
and Pip does not handle the multi-vendor approach. 02:17:27.320 |
They don't handle the fact you have C++ libraries 02:17:32.200 |
And so what you have to do in the wheel world 02:17:36.080 |
You have to take all the binary and vendor it. 02:17:38.480 |
Now if a change happens in underlying dependency, 02:18:00.640 |
the number of times Pip is mentioned over Conda 02:18:07.960 |
It wasn't true early because Pip didn't exist. 02:18:13.040 |
of the internet documentation user-generated. 02:18:20.400 |
You're just not gonna see Conda in that first page. 02:18:35.080 |
- Like especially for super challenging things, 02:18:36.560 |
I don't know, one of the big pain points for me 02:18:44.720 |
- I think Conda, I don't know if Conda has solved that one. 02:18:49.120 |
- I don't know, I certainly know Pip has not solved, 02:18:56.720 |
- I actually don't know, I should probably know 02:18:58.360 |
a good answer for this, but if you compile OpenCV 02:19:07.480 |
So there's this kind of flexibility of what you, 02:19:17.880 |
- So Conda has a notion of variants of a package. 02:19:20.560 |
You can actually have different compilation versions 02:19:23.160 |
of a package, so not just the version's different, 02:19:24.720 |
but oh, this is compiled with these optimizations on it. 02:19:30.120 |
- Well, Pip, as far as I know, does not have flavors. 02:19:43.640 |
It barely, you can sort of paper over it and duct tape it 02:19:56.160 |
it's an area where if you understand some things, 02:19:58.400 |
but not all the things, but they've done a great job 02:20:07.040 |
in order to make Conda more community-centric, right? 02:20:18.280 |
Even if the company is yourself, it's just one person. 02:20:23.320 |
But ultimately for products to succeed virally 02:20:33.680 |
And a big part of that is engagement with those people, 02:20:38.560 |
And what happened with Conda in the early days, 02:20:46.360 |
is sort of the community recipe creation community. 02:20:52.160 |
and Peter is CEO of Anaconda, he's my co-founder. 02:21:00.000 |
We're still great friends, we talk all the time. 02:21:03.600 |
There's a long story there about why and how, 02:21:06.040 |
and we can cover it in some other podcast perhaps. 02:21:09.480 |
- It's sort of a more, maybe a more business-focused one. 02:21:21.200 |
And let the, Anaconda shouldn't be fighting this battle. 02:21:30.440 |
and then they'll actually move us the right direction. 02:21:33.480 |
is many of the cool kids I know don't use Conda. 02:21:39.800 |
It's really a matter of, Conda has some challenges, 02:21:45.400 |
And it's that aspect of, wait, who's doing this? 02:21:47.600 |
And the fact that then the PyPA really stepped up. 02:21:50.960 |
Like they were not solving the problem at all. 02:21:53.400 |
And now they kinda got to where they're solving it 02:22:03.960 |
But, and we still use it all the time at Quansight 02:22:09.000 |
but you can kinda do similar things with Pip and Docker. 02:22:13.000 |
So, especially with the web development community, 02:22:17.080 |
there's a lot of different kind of developers 02:22:20.200 |
And there's still a lack of some clear understanding. 02:22:25.320 |
and there's only a few people in the PyPA who get it. 02:22:28.280 |
And then others who are just massively trumpeting 02:22:30.680 |
the power of Pip, but just do not understand the problem. 02:22:32.840 |
- Yeah, so one of the obvious things to me from a mom, 02:22:44.960 |
and just, it seems much easier to recommend Conda there. 02:22:49.040 |
But then you should also recommend it across the board. 02:22:58.720 |
What I, like, build the environment with Pip, with Conda, 02:23:03.320 |
and then Pip install on top of that, that's fine. 02:23:05.280 |
Be careful about Pip installing OpenCV or TensorFlow, 02:23:17.640 |
the infrastructure with Conda, and then the weirdos, 02:23:21.560 |
that, like, the weird, like, implementation for some. 02:23:28.400 |
that, based on your location and time of day and date, 02:23:33.400 |
tells you the exact position of the sun relative to the-- 02:23:39.680 |
but it's very precise, and I was like, all right. 02:23:42.040 |
But that was, and it's Pip, and it's very nice. 02:23:46.920 |
Python developers who wanna get their stuff published, 02:23:52.840 |
I mean, even if it's, you know, the challenge is, 02:23:56.440 |
and there's a key thing that needs to be added to Pip, 02:24:03.400 |
Like, 'cause it's, you know, recognize you're not 02:24:07.240 |
So, let, like, give up, and allow a system packager to work. 02:24:12.240 |
That way, if Anaconda's installed, and it has Pip, 02:24:15.120 |
it would default to Conda to install its stuff, 02:24:20.560 |
Like, that's the, that's a key, not difficult, 02:24:23.440 |
but somewhat work, some work, feature needs to be added. 02:24:30.880 |
I wish I was more successful in the business side, 02:24:33.440 |
trying to get there, but I wish my, you know, 02:24:35.040 |
my family, friends, and fools community that I know-- 02:24:39.320 |
'cause I know tons of things to do, effectively, 02:24:56.460 |
We created Continuum Analytics to get Anaconda started. 02:24:58.160 |
Done it again with Quansight, super excited by that. 02:25:13.480 |
to an open economy, so it's basically consulting 02:25:17.680 |
It's a consulting company, and what I've said 02:25:20.220 |
when I started it was we're trying to create products, 02:25:22.520 |
people, and technology, so it's divided into two groups, 02:25:28.300 |
The two groups are a consulting services company 02:25:31.960 |
and data engineering, and data management better, 02:25:40.000 |
if you're using Jupyter, we do staff augmentation, 02:25:42.880 |
need more programmers, help you use Dask more effectively, 02:25:50.800 |
both immediate help, and then learn from somebody. 02:25:57.080 |
we've kind of separated some of these other things 02:26:01.760 |
One of the things I loved about what we did at Anaconda 02:26:06.720 |
This time we did a lot of innovation at Anaconda. 02:26:09.360 |
I wanted to do innovation, but also contribute 02:26:12.320 |
to the projects that existed, like create a place 02:26:20.400 |
can pay people to work on them and keep them going. 02:26:22.680 |
So that's Labs, Quansight Labs is a separate organization, 02:26:29.940 |
and in fact, every project that we have at Quansight, 02:26:33.240 |
a portion of the money goes directly to Quansight Labs 02:26:40.040 |
And currently, so I'm really excited about Labs, 02:26:45.240 |
- So Labs is working to make the software better, 02:26:55.400 |
we have a thing called a community work order, we call it. 02:26:57.440 |
If a company says, I wanna make Spyder better, 02:27:03.020 |
of a developer of Spyder, or a developer of NumPy, 02:27:09.080 |
what you want them to do, you can give them your priorities 02:27:10.980 |
and things you wish existed, and they'll work 02:27:17.500 |
and what emerges is what the community wants. 02:27:18.900 |
- Is there some aspect on the consulting side 02:27:21.100 |
that is helping, as we were talking about morphology 02:27:29.100 |
sort of inspiring the need for updates to SciPy? 02:27:37.660 |
Tesla's Dojo chip, I'm hoping we'll have a chance 02:27:43.820 |
The other thing that's driving it is scalable, 02:27:45.500 |
like speed and scale, how do I write NumPy code 02:27:49.200 |
or NumPy-like code if I want it to run across a cluster? 02:27:56.340 |
or there's Modin, and there's, so Pandas code, 02:28:02.060 |
that I want to scale, so that's one big area. 02:28:04.860 |
- Have you gotten a chance to chat with Andrej and Elon 02:28:12.260 |
I just saw their Tesla AI Day video, super exciting. 02:28:16.260 |
- So this is one of the, you know, I love great engineering, 02:28:18.540 |
software engineering teams, and engineering teams 02:28:20.580 |
in general, and they're doing a lot of incredible stuff 02:28:25.020 |
so many aspects of the machine learning pipeline 02:28:42.820 |
In fact, we have at Quansight, we've been fortunate enough 02:28:47.540 |
So we have about 13 developers at Quansight. 02:28:49.860 |
Some of them are in labs working directly on PyTorch. 02:28:52.540 |
- On PyTorch, that's great. - On PyTorch, right? 02:28:55.660 |
I went to both TensorFlow and PyTorch and said, 02:29:03.220 |
we have this bigger mission, we wanna make sure 02:29:06.740 |
So, and Facebook responded really positively, 02:29:28.740 |
and I was looking to kind of cut some development effort, 02:29:31.580 |
and they couldn't receive that as easily, I think. 02:29:39.740 |
- So OpenTeams, I'm super excited about OpenTeams, 02:29:48.860 |
But one of the things we, when we started Quansight, 02:29:50.980 |
we knew we would do, is we'd develop products and ideas, 02:30:00.260 |
that like five or six companies could have come out of that. 02:30:02.940 |
And we just didn't structure it so they could. 02:30:10.060 |
There's like lots of companies that could exist 02:30:13.100 |
And so I thought, oh, here's a recipe for an incubation, 02:30:16.420 |
a concept that we could actually spawn new companies, 02:30:26.540 |
So labs, I think there should be a lot of things 02:30:32.540 |
You could have a lot of open source research labs. 02:30:35.100 |
Along the way, so in 2018, when the bigger idea came, 02:30:41.120 |
So we created a venture fund called Quansight Initiate 02:30:49.540 |
How do we actually go in this direction and build a fund? 02:30:56.580 |
Like our venture fund, the carried interest portion of it 02:31:04.120 |
So you use the power of the organic formation of teams 02:31:06.780 |
in the open source community, and then naturally, 02:31:10.660 |
that leads to a business that can make money. 02:31:14.180 |
- And then it always maintains and loops back 02:31:16.700 |
to the open source. - Loops back to open source. 02:31:22.380 |
And it's also beneficial because, oh, I have natural 02:31:29.220 |
Like, they'll all be out there talking to people. 02:31:37.900 |
So Quansight has the services, the lab, the fund, right? 02:31:41.900 |
In that process, a lot of stuff started to happen. 02:31:44.220 |
They're like, oh, you know, we started to do recruiting 02:31:48.060 |
And I was starting to build a bigger sales team 02:31:50.220 |
and marketing team and people besides just developers. 02:31:52.900 |
And one of the challenges with that is you end up 02:31:58.820 |
in any company you go to, you kind of go, look, 02:32:00.780 |
is this a business led company, a developer led company? 02:32:11.380 |
we can actually just create, like this concept 02:32:23.100 |
a business development company for many, many Quansights, 02:32:30.820 |
essentially be the enterprise software company 02:32:34.460 |
If you look at what enterprise software wants 02:32:36.740 |
from the customer side, and during this journey, 02:32:38.620 |
I've had the chance to work and sell to lots of companies, 02:32:42.340 |
Exxon and Shell and JP Morgan, Bank of America, 02:32:45.220 |
like the Fortune 100, and talk to a lot of people 02:33:01.140 |
Here's open source, lead to enterprise software, 02:33:02.580 |
now I buy it, and then they have to stitch it together 02:33:11.500 |
they're trying to buy, what most open source, 02:33:15.780 |
that they can customize that are as inexpensive as they can. 02:33:23.500 |
is about solving enterprise software problems. 02:33:28.140 |
- With a connect, but we do it honoring the topology. 02:33:35.140 |
and the procurement energy, and we were on the business side 02:33:37.980 |
get the deals closed, and then have a network of partners 02:33:40.580 |
like Quansight and others who we hand the deals to, right? 02:33:44.100 |
To actually do the work, and then we have to maintain, 02:33:52.100 |
It's not just, here's a lead, go figure that out, 02:33:54.660 |
but no, we're gonna make sure you get what you need. 02:34:05.300 |
I mean, there's all kinds of flavors of business people, 02:34:11.940 |
- There's a challenge, I hear what you're saying, 02:34:13.260 |
because I've had the same challenge, and it's true. 02:34:15.580 |
There's sometimes you think, okay, this is way overwrought. 02:34:20.220 |
and you have to, 'cause the companies have needs. 02:34:28.300 |
in the best way, like the value of open source, for example. 02:34:30.980 |
- Right, and I'm really grateful for all my experiences 02:34:32.940 |
over the past 14 years, understanding that side of it, 02:34:38.620 |
but also dealing with marketing professionals, 02:34:40.540 |
and sales professionals, and people that make a career 02:34:42.860 |
out of that, and understanding what they're thinking about, 02:34:44.340 |
and also understanding, well, let's make this better. 02:34:47.940 |
like OpenTeams, I see as the transmission layer 02:34:50.440 |
between companies and open source communities, 02:34:59.300 |
and tools that we know we can replace for folks. 02:35:05.260 |
a lot of customization to make it work for you. 02:35:17.160 |
All of those should be replaced by open source foundations, 02:35:23.160 |
in these giant organizations that do exactly that, 02:35:28.320 |
and hiring a huge team of consultants that customize it, 02:35:31.360 |
and then that whole thing gets outdated quickly. 02:35:38.320 |
to that is kind of what Tesla's doing a little bit of, 02:35:43.240 |
which is basically build up a software engineering team. 02:35:52.400 |
- And you're creating a topology for some of that. 02:35:54.320 |
- You're right, you just don't have to do it. 02:36:01.120 |
OpenTeams is the future of enterprise software. 02:36:04.760 |
Like, this idea just percolated over the past year 02:36:25.560 |
There's a lot of wasted energy in small teams 02:36:29.380 |
and the sales energy to get into large companies 02:36:32.640 |
There's a lot of money spent on that process. 02:36:34.680 |
- Creating the tools and processes for that sales. 02:36:48.800 |
And we have a, you know, so we have a part of our work 02:36:53.360 |
and making sure we're doing things useful for them, 02:36:59.240 |
And then, figuring out which targets we have. 02:37:01.900 |
You know, we're not taking on all of open source, 02:37:10.600 |
Can I ask you, why do you think Microsoft bought GitHub 02:37:14.440 |
and what do you think is the future of GitHub? 02:37:18.220 |
I think they did because Microsoft has always 02:37:23.520 |
Like, one of the things Microsoft's always done well 02:37:25.160 |
is understand their power as developers, right? 02:37:35.360 |
Because they recognize GitHub is where developers are at. 02:37:43.600 |
- Are they just basically throwing money at developers 02:38:02.200 |
they're a big company and they sell products. 02:38:04.680 |
they know there's opportunity to make money from GitHub. 02:38:08.600 |
There's definitely a business there, you know, 02:38:15.360 |
there's, they had definitely wanted to recognize 02:38:34.280 |
into supporting PyData for several conferences 02:38:42.520 |
don't always understand how their software gets used. 02:38:49.600 |
you see, oh, these companies have large development teams. 02:38:58.340 |
that I had a chance to learn some of these people 02:39:01.320 |
And even work alongside them, you know, as a consultant, 02:39:05.080 |
using my, using open source and trying to think, 02:39:09.960 |
- Some of it is actually, for a large organization, 02:39:32.960 |
Like you try to show off that you care about developers 02:39:39.040 |
And like one way, I think like Ford should have bought GitHub 02:39:51.080 |
It's probably an art to show that you care to developers 02:40:07.880 |
and like literally look at like the top most popular projects 02:40:12.880 |
in Python and just say, we're just gonna give money. 02:40:17.880 |
- Like that's gonna immediately make you cool. 02:40:21.600 |
And in fact, they set up NumFOCUS to make it easy. 02:40:26.060 |
is also you have to have some business development. 02:40:32.680 |
at Linux Foundation, know how they're doing it. 02:40:47.800 |
But there's different energies in getting donations 02:40:51.920 |
as there is getting, this is important to my business. 02:41:01.160 |
if you can tie the message to an ROI for the company, 02:41:04.080 |
it becomes a no-brainer. - That's more effective. 02:41:07.000 |
So, and there are rational arguments to make. 02:41:09.600 |
I've tried to have conversations with marketing, 02:41:14.880 |
oh, you could just take a fraction of your marketing budget 02:41:20.280 |
and you get better results from your marketing. 02:41:34.560 |
that will obviously be a much better investment 02:41:37.280 |
in terms of marketing is supporting open source projects. 02:41:46.520 |
Knowledge gets very specific and very channeled, right? 02:41:50.000 |
And so people get, they get a lot of learning 02:41:58.200 |
to have a sense that you might have something to offer. 02:42:13.440 |
And so like, there's an operational aspect to that 02:42:21.020 |
- So you have to hire at a high position level 02:42:24.400 |
where they care about this and they specialize. 02:42:27.680 |
And because you can also do it very clumsily, right? 02:42:40.820 |
Can I just, 'cause I just need, I need to say this. 02:42:44.440 |
I've been very surprised how often marketing people 02:42:51.800 |
I feel like the best marketing is doing something novel 02:42:58.320 |
It feels like so much of the marketing practice 02:43:04.400 |
or maybe they're studying for what was the best thing 02:43:08.480 |
And they're just repeating that over and over 02:43:10.840 |
as opposed to innovating, like taking the risk. 02:43:18.840 |
- Yeah, there's an aspect of data observation 02:43:25.140 |
But it absolutely, it's about, I think it's content. 02:43:27.700 |
Like, there's this whole world on content marketing 02:43:30.220 |
that you could almost say, well, yeah, it can get over, 02:43:33.580 |
you can get inundated with stuff that's not relevant to you. 02:43:36.420 |
Whereas what you're saying would be highly relevant 02:43:52.740 |
it's like Elon hired a person who's just good at Twitter 02:43:59.220 |
- I mean, that's exactly what you wanna be doing. 02:44:04.260 |
I mean, I've definitely seen people doing great work 02:44:12.680 |
really excited about it, but we have not been talking 02:44:16.280 |
- And there's different ways to talk about it. 02:44:17.740 |
There's different ways to, there's different channels 02:44:20.740 |
There's also, like, I'll just throw some shade 02:44:26.880 |
So for example, iRobot, I just had a conversation 02:44:32.660 |
- And I think I love, they're incredible robots, 02:44:35.380 |
but like, every time they do, like, advertisement, 02:44:38.900 |
not advertisement, but like, marketing type stuff, 02:44:50.240 |
but to me, when you're talking about engineering systems, 02:44:53.980 |
it's really nice to show off the magic of the engineering 02:44:56.980 |
and the software and all the geniuses behind this product 02:45:01.980 |
and the tinkering and like, the raw authenticity 02:45:07.920 |
versus the marketing people who want to have, like, 02:45:11.060 |
pretty people, like, standing there all pretty 02:45:18.060 |
speaking to the hackers, you have to throw some bones, 02:45:22.140 |
some care towards the engineers, the developers, 02:45:26.660 |
because there's some aspect, one, for the hiring, 02:45:29.820 |
but two, there's an authenticity to that kind 02:45:38.420 |
the best in the world are working at your company, 02:45:43.940 |
- It's interesting, 'cause your initial reaction would be, 02:45:45.660 |
wait, there's different users here, why would you do that? 02:45:52.540 |
but she doesn't care about that hacker culture. 02:45:56.580 |
So essentially what you said is actually the authenticity, 02:45:59.620 |
'cause everyone has a friend, everyone knows people, 02:46:05.840 |
- 'Cause I think it's the lack of that realization, 02:46:09.420 |
- That influences your general marketing, interesting. 02:46:11.740 |
- For some stupid reason, I do have a platform, 02:46:14.660 |
and it seems that the reason I have a platform, 02:46:21.260 |
and like we get excited naturally about stuff. 02:46:25.780 |
about that iRobot video, because it's boring, 02:46:36.260 |
is they're not letting me get into the robot. 02:46:40.900 |
they could be benefiting from a culture of modularity, 02:46:44.860 |
like add-ons, and that could actually dramatically help. 02:47:04.220 |
- Yeah, and connecting to the open source, as you said. 02:47:14.820 |
You've led many programmers, you lead many programmers. 02:47:18.540 |
What are some, from a programmer perspective, 02:47:40.100 |
- That your mind is a little bit on something else. 02:47:47.300 |
you know, I have some really, the only way I can do this, 02:47:49.260 |
I have some really great programmers that I work with 02:47:53.260 |
And my goal is to inspire them and hopefully help them, 02:47:56.580 |
encourage them, and help them encourage their teams. 02:47:59.620 |
I would say there's a number of things, a couple things. 02:48:03.860 |
Like I think a programmer without curiosity is mundane. 02:48:08.860 |
Like you'll lose interest, you won't do your best work. 02:48:16.800 |
I think two, don't try to do everything at once. 02:48:19.600 |
Recognize that you're, we're limited as humans. 02:48:23.220 |
And each one of us are limited in different ways. 02:48:24.940 |
You know, we all have our different strengths and skills, 02:48:26.620 |
so it's adapting the art of programming to your skills. 02:48:31.260 |
is to limit what you're trying to solve, right? 02:48:36.700 |
usually maybe somebody else has put the architecture 02:48:38.620 |
together and they've gotten given a portion for you 02:48:43.500 |
it's sort of breaking down the problem into smaller parts 02:48:50.740 |
and try to do it all at once and you get lost 02:49:03.980 |
Even just talking about that and like writing down 02:49:07.300 |
before you write code, just what are you trying to accomplish? 02:49:09.480 |
I mean, very specific about it really, really helps. 02:49:30.100 |
It's particularly relevant in the era of Codex 02:49:34.260 |
which is essentially I see as an indexing of Stack Overflow. 02:49:39.320 |
- It's a search engine over Stack Overflow basically. 02:49:41.300 |
So it's not, I mean, we've had this for a while. 02:49:43.520 |
But really you want to cut and paste, but not blindly. 02:49:47.340 |
Like absolutely have cut and paste to understand, 02:49:51.060 |
but then you understand, oh, this is what this means. 02:49:56.700 |
So it's critical, that's where the curiosity comes in. 02:50:01.980 |
And so understand, and then be sensitive to hype cycles. 02:50:20.860 |
Like likely there's signal, like there's a thing there 02:50:28.980 |
- What lessons do you draw from you having created NumPy 02:50:32.820 |
and SciPy, like in service of sort of answering 02:50:36.820 |
the question of what it takes to be a great programmer 02:50:40.620 |
How can you be the next person to create a SciPy? 02:51:04.240 |
is probably gonna suck, and that's okay, right? 02:51:07.580 |
It's honestly, I think iteration is the key to innovation. 02:51:11.220 |
And it's almost that psychological hesitation we have 02:51:22.020 |
I mean, just keep learning and keep improving 02:51:27.700 |
And then it does take intense concentration, right? 02:51:38.180 |
You can't scroll your way to good programming, right? 02:51:46.020 |
Like often people will run away from something 02:51:53.340 |
And just five minutes, not gonna give you that. 02:51:56.540 |
- Was it lonely when you were building SciPy and NumPy? 02:52:00.500 |
- Hugely, yeah, absolutely lonely in the sense of 02:52:05.780 |
And that inner drive for me always comes from, 02:52:07.980 |
I have to see that this is right in some angle. 02:52:11.620 |
I have to believe it, that this is the right approach, 02:52:28.380 |
So find a good, find a thing that you know is good 02:52:34.700 |
And you kind of have to have enough realization 02:52:40.260 |
or the fact that not everybody joins you up front. 02:52:42.180 |
In fact, one thing I've talked to people a lot, 02:52:43.500 |
I've seen a lot of projects come and some fail. 02:52:45.500 |
Not everything I've done has actually worked perfectly. 02:52:49.140 |
that didn't really work, or this isn't working and why. 02:52:53.660 |
And one of the key things is you can't even know 02:53:03.200 |
before you even know if the feedback's there. 02:53:08.740 |
but six months from now, it's still kinda still emerging. 02:53:11.500 |
So give it time, 'cause you're dealing with humans, 02:53:23.540 |
you're focused on the sales side of things currently. 02:53:31.660 |
What's your, a setup that you have that brings you joy? 02:53:45.620 |
trying to figure out some good teams for 'em, 02:53:51.940 |
- Great, thank the superior editor, everybody. 02:54:12.140 |
but people do take the editor seriously, right? 02:54:16.540 |
- It is, but there's something beautiful to me about Emacs, 02:54:22.380 |
there's something beautiful to them about that. 02:54:23.220 |
- There is, I mean, I do use Vim for quick editing, 02:54:35.860 |
SciPy and NumPy are all written in Emacs on a Linux box, 02:54:48.060 |
I think Git is pretty complicated, but I love the concept. 02:54:51.620 |
And also, of course, GitHub, and then GitLab, 02:54:55.220 |
make Git definitely consumable, but that came later. 02:55:01.220 |
What were your emotional feelings about all the parentheses? 02:55:10.940 |
I knew programming, but I was a domain expert, right? 02:55:19.260 |
about what I'm doing, so why would I have all these, right? 02:55:24.500 |
You know, and now as I appreciate kind of the structure 02:55:27.260 |
that kind of naturally maps to a logical thinking 02:55:30.260 |
about a program, I can appreciate them, right? 02:55:32.940 |
And why it's actually, you could create editors 02:55:35.660 |
that make it not so problematic, right, honestly. 02:55:40.740 |
So I actually have a much greater appreciation of Lisp 02:55:50.300 |
Like, typically, these languages are, you know, 02:55:53.140 |
I even saw a whole data science programming system in Lisp 02:55:56.100 |
that somebody created, which is, you know, cool. 02:55:58.500 |
But again, I think it's the lack of recognition 02:56:04.060 |
People that are never gonna be programmers for a living. 02:56:05.820 |
They don't want to have all this cuteness in their head. 02:56:08.420 |
They want just, you know, it's why BASIC, you know, 02:56:14.460 |
in terms of having that be the language of Visual Basic, 02:56:21.260 |
They should have converted that to Python 10 years ago. 02:56:23.500 |
Like, the world would be a better place if they had, but. 02:56:31.620 |
You know, some of the most interesting people 02:56:35.860 |
and artificial intelligence have used Lisp, so. 02:56:41.220 |
When you have a language, you can think in it. 02:56:50.980 |
with a tiny keyboard, or is there like three screens? 02:56:55.860 |
I've never gotten into the many screens, to be honest. 02:57:03.460 |
Like, partly because I guess I really can't process 02:57:09.220 |
Like, I just am looking at one, and I just flip. 02:57:19.900 |
Like, this is the only time I really need another screen. 02:57:24.860 |
lead developers, but then there's also these businesses, 02:57:33.380 |
Which operating system is your favorite still, 02:57:41.460 |
and it was early days I had my own Linux desktop. 02:57:53.780 |
- Pretty much, I mean, just the fact that I had 02:57:56.460 |
to do PowerPoints, I had to do presentations, 02:58:04.420 |
- So you mentioned, so Quansight Labs, and things like that. 02:58:08.380 |
Can you give advice on how to hire great programmers, 02:58:14.580 |
- Yeah, I would say, produce an open source project. 02:58:19.980 |
- Get people contributing to it, and hire those people. 02:58:35.620 |
it's not hard to hire if I've worked with somebody 02:58:39.300 |
But an hour or two of interviews, I have no idea. 02:58:50.780 |
- It's really hard, I mean, the resume can help, 02:59:01.920 |
you have to understand what you're hiring for. 02:59:03.960 |
There are different stages and different kinds of skills, 02:59:12.600 |
is just that the whole idea of measuring ourselves 02:59:18.620 |
'cause we're not, it's a multidimensional space, 02:59:20.620 |
and how do you order a multidimensional space? 02:59:23.440 |
So this whole idea, you immediately have projected 02:59:26.160 |
into a thing, and you're talking about hiring 02:59:30.660 |
So what is the thing you're actually needing, 02:59:35.980 |
There is such a thing, generally I really value people 02:59:39.040 |
who have the affect, that care about open source. 02:59:42.920 |
Like so in some cases, their affinity to open source 02:59:48.120 |
However, I have found this interesting dichotomy 02:59:52.560 |
between open source contributors and product creation. 03:00:00.580 |
but there does seem to be the more experience, 03:00:04.960 |
the more affect somebody has to an open source community, 03:00:08.160 |
the less ability to actually produce product that they have. 03:00:13.520 |
The more product focused are, I find a lot of people, 03:00:23.300 |
but they've played here, and they do a great job here, 03:00:25.960 |
and then they don't necessarily have some of the same. 03:00:32.060 |
I think part of it is cultural, how they've emerged. 03:00:34.860 |
'Cause one of the things that open source communities 03:00:40.780 |
- That's brilliant, but you want both of those energies 03:00:45.860 |
And so it's a lot of it's creating these teams of people 03:00:48.100 |
that have these needed skills and attributes that are hard. 03:01:16.940 |
And I've spent a lot of time learning, right? 03:01:23.260 |
My whole goal was to get a PhD because I love school, 03:01:30.940 |
about elsewhere as well is the more I learned, 03:01:37.700 |
but this is such a tiny thing in the global scope 03:01:48.820 |
My wife says that I used to be a better listener. 03:01:50.620 |
Now that I'm so full of all these ideas I wanna do, 03:01:52.860 |
she kind of says, "You gotta give people time to talk." 03:01:55.500 |
- So you've succeeded on multiple dimensions. 03:02:01.680 |
The other is just creating all these products, 03:02:09.220 |
in high school and college of how to live a life 03:02:27.980 |
honestly, one thing that I've said to people is, 03:02:30.420 |
first, find people you love and care about them. 03:02:36.060 |
And family means people you love and have committed to. 03:02:45.160 |
So find people you love and wanna commit to and do that, 03:02:47.960 |
'cause it anchors you in a way that nothing else can. 03:02:55.220 |
And then from out there, you find other kinds of things 03:02:57.860 |
you can commit to, whether it's ideas or people 03:03:08.840 |
Give yourself 10 years to think about the world. 03:03:20.340 |
But recognize that the things you care about, 03:03:28.580 |
I was really passionate about one specific thing 03:03:32.540 |
I was a big, I didn't like the Federal Reserve. 03:03:46.740 |
But that's one area where you learn about something 03:03:54.120 |
Build, so often the tendency is to not like something, 03:03:59.980 |
Build something, build something to replace it. 03:04:25.800 |
is grounded in family, friendship, and ultimately love. 03:04:34.660 |
Travis, you're one of the most impactful people 03:04:39.900 |
So I truly appreciate everything you've done. 03:04:55.340 |
please check out our sponsors in the description. 03:05:00.200 |
that in the programming world is called Hodgkin's Law. 03:05:11.700 |
Thank you for listening, and hope to see you next time.