The Array Cast: Jeremy Howard
Chapters
0:00
1:15 Dyalog Problem-Solving Contest
2:40 Jeremy Howard
4:30 APL Study Group
10:20 A.T. Kearney
12:33 MKL (Intel)
13:00 BLAS
13:11 Perl, BQN
14:06 Raku
15:45 Kaggle
16:52 R
18:50 Neural Networks
19:50 Enlitic
20:01 Fast.ai
21:02 NumPy
21:26 Leading Axis Theory
21:31 Rank Conjunction
21:40 Einstein notation
22:55 CUDA
28:51 NumPy: Another Iverson Ghost
30:11 Pivot Tables
30:36 SQL
31:25 Larry Wall: "The three chief virtues of a programmer are: Laziness, Impatience and Hubris."
32:00 Python
36:25 Regular Expressions
36:50 PyTorch
37:39 Notation as a Tool of Thought
37:55 Aaron Hsu, Co-dfns
38:40 J
39:06 Eric Iverson on the Array Cast
40:18 Triangulation: Jeremy Howard
41:48 Google Brain
42:30 RAPIDS
43:40 Julia
43:50 LLVM
44:07 JAX
44:21 XLA
44:32 MLIR
44:42 Chris Lattner
44:53 TensorFlow
49:33 TorchScript
50:09 Scheme
50:28 Swift
51:10 DragonBox Algebra
52:47 APL Glyphs
53:24 Dyalog APL
54:24 Jupyter
55:44 Jeremy's Meta Math tweet
56:37 Power function
63:06 Reshape
63:40 Stallman 'Rho, rho, rho'
64:20 APLcart
66:12 J for C Programmers
67:54 Transpose episode
70:00 APLcart video
72:28 Functional Programming
73:00 List Comprehensions
73:30 BQN to J
78:15 Einops
79:30 April Fools APL
80:35 Flask library
81:22 JuliaCon 2022
88:05 Myelination
89:15 Sanyam Bhutani interview
91:27 Jo Boaler, Growth Mindset
93:45 Discovery Learning
97:05 Iverson Bracket
99:14 Radek Osmulski, Meta Learning
100:12 Top-Down Learning
101:20 Anki
103:50 Lex Fridman interview
00:00:00.880 |
Welcome to another episode of ArrayCast. I'm your host, Connor. And today we have a very 00:00:05.440 |
exciting guest, which we will introduce in a second. But before we do that, we'll do brief 00:00:09.440 |
introductions and then one announcement. So first we'll go to Bob and then we'll go to Adam who has 00:00:13.120 |
the one announcement. And then we will introduce our guest. I'm Bob Therriault. I'm a J enthusiast 00:00:18.400 |
and I do some work with the J Wiki. We're underway and trying to get it all set up for the fall. 00:00:24.800 |
I'm Adám Brudzewski, full-time APL programmer at Dyalog Limited. Besides actually programming 00:00:30.640 |
APL, I also take care of all kinds of social things, including the APL Wiki. And then for my 00:00:37.680 |
announcements, part of what we do with Dyalog is arrange a yearly user meeting or a type of 00:00:42.880 |
conference. And at that user meeting, there is also a presentation by the winner of the APL 00:00:51.840 |
problem solving competition. That competition closes at the end of the month. So hurry up if 00:00:59.360 |
you want to participate. It's not too late even to get started at this point. And also at the end of 00:01:03.520 |
the month is the end of the early bird discount for the user meeting itself. Awesome. And just 00:01:10.560 |
a note about that contest. I think, and Adam can correct me if I'm wrong, there's two phases, and in 00:01:15.360 |
the first phase it's just 10 short problems. A lot of them are just one-liners. And even if 00:01:20.560 |
you only solve one of the 10, I think you can win a small cash prize just from answering one. 00:01:26.960 |
Is that correct? I'm not even sure. You might need to solve them all. They're really easy. 00:01:36.160 |
So the point being though is that you don't need to complete the whole contest in order to be 00:01:39.680 |
eligible to win prizes. No, for sure. There's a certain amount that if you get to that point, 00:01:44.160 |
you hit a certain threshold and you can be eligible to win some free money, which is always 00:01:48.240 |
awesome. And yeah, just briefly, as I introduce myself in every other episode, I'm your host, 00:01:54.640 |
Connor, C++ professional developer, not an array language developer in my day-to-day, 00:01:59.680 |
but a huge array language and combinator enthusiast at large, which brings us to introducing our 00:02:06.240 |
guest who is Jeremy Howard, who has a very, very, very long career. And you probably have heard him 00:02:13.920 |
on other podcasts or have been giving other talks. I'll read the first paragraph of his 00:02:19.200 |
three-paragraph bio because I don't want to embarrass him too much, but he has 00:02:22.880 |
a very accomplished career. So Jeremy Howard is a data scientist, researcher, developer, 00:02:28.320 |
educator, and entrepreneur. He is the founding researcher at FastAI, a research institute 00:02:34.320 |
dedicated to making deep learning more accessible and is an honorary professor at the University of 00:02:38.800 |
Queensland. That's in Australia, I believe. Previously, Jeremy was a distinguished research 00:02:43.520 |
scientist at the University of San Francisco, where he was the founding chair of the Wicklow 00:02:47.600 |
artificial intelligence and medical research initiative. He's also been the CEO of 00:02:53.120 |
Enlitic and was the president and chief scientist of Kaggle, which is basically the data science 00:02:59.760 |
version of LeetCode, which many software developers are familiar with. He was the CEO of two 00:03:04.400 |
successful Australian startups, Fastmail and Optimal Decisions Group. And before that, 00:03:08.400 |
in between doing a bunch of other things, he worked in management consulting at McKinsey, 00:03:12.960 |
which is an incredibly interesting start to the career that he has had now, because for those of 00:03:18.240 |
you that don't know, McKinsey is one of the three biggest management consulting firms alongside, 00:03:22.720 |
I think, Bain & Co. and BCG. So I'm super interested to hear how he started in management 00:03:27.280 |
consulting and ended up being the author of one of the most popular AI libraries in Python and also 00:03:33.520 |
the course that's attached to it, which I think is, if not, you know, the most popular, a very, 00:03:38.800 |
very popular course that students all around the world are taking. So I will stop there, 00:03:42.800 |
throw it over to Jeremy, and he can fill in all the gaps that he wants, jump back to however far 00:03:47.440 |
you want to, to tell us, you know, how you got to where you are now. And I think the one thing I 00:03:53.120 |
forgot to mention, too, is that he recently tweeted on July 1st, and we're recording this on July 4th, 00:03:58.720 |
and the tweet, quote, reads, "Next week, I'm starting a daily study group on my most loved 00:04:03.440 |
programming language, APL." And so obviously interested to hear more about that tweet and 00:04:08.560 |
what's going to be happening with that study group. So over to you, Jeremy. 00:04:11.040 |
Well, the study group is starting today as we record this. So depending on how long it takes to 00:04:19.280 |
get this out, it'll have just started. And so definitely time for people to join in. So we'll, 00:04:26.640 |
I'm sure we'll include a link to that in the show notes. Yeah, I definitely feel kind of like I'm 00:04:32.480 |
your least qualified array programming person ever interviewed on this show. I love APL and J, 00:04:43.520 |
but I've done very, very little with them, particularly APL. I've done a little, 00:04:48.960 |
little bit with J mucking around, but like, I find a couple of weeks here and there every 00:04:54.480 |
few years, and I have for a couple of decades. Having said that, I am a huge enthusiast of 00:05:04.320 |
array programming, as it is used, you know, in a loopless style in other languages, initially in 00:05:12.480 |
Perl, and nowadays in Python. Yeah, maybe I'll come back to that, because I guess you wanted to get a 00:05:18.400 |
sense of my background. Yeah, so I actually started at McKinsey. I grew up in Melbourne, Australia. And 00:05:28.640 |
I didn't know what I wanted to do when I grew up at the point that you're meant to know when you 00:05:34.240 |
choose a university, you know, major. So I picked philosophy on the basis that it was like, 00:05:39.920 |
you know, the best way of punting down the road what you might do, because with philosophy, 00:05:45.360 |
you can't do anything. And honestly, that kind of worked out in that I needed money, 00:05:54.480 |
and I needed money to get through university. So I got, like, a one-day-a-week kind of IT 00:05:59.680 |
support job at McKinsey, the McKinsey Melbourne office during university from first year, 00:06:07.600 |
I think that's from first year. But it turned out that like, yeah, I was very curious, 00:06:15.280 |
so curious about management consulting. So every time consultants would come down and ask me to 00:06:18.720 |
like, you know, clean out the sticky Coke they'd spilled in their keyboard or whatever, I would 00:06:24.800 |
always ask them what they were working on and ask them to show me and I've been really interested in 00:06:31.760 |
like doing analytics-y kind of things for a few years at that point. So during high school, 00:06:38.080 |
basically every holidays, I kind of worked on stuff with spreadsheets or Microsoft Access or 00:06:44.000 |
whatever. So it turned out I knew more about like, stuff like Microsoft Excel than they did. So 00:06:50.320 |
within about two months of me starting this one day a week job, I was working 90 hour weeks, 00:06:57.120 |
basically doing analytical work for the consultants. And so that, you know, that actually worked out 00:07:05.920 |
really well, because I kind of did a deal with them where they would, they gave me a full time 00:07:11.920 |
office, and they would pay me $50 an hour for whatever time I needed. And so suddenly, I was 00:07:17.760 |
actually making a lot of money, you know, working, working 90 hours a week. And yeah, it was great 00:07:28.560 |
because then I would come up with these solutions to things they were doing in the projects, 00:07:33.120 |
and I'd have to present it to the client. So next thing I knew I was basically on the client side 00:07:37.040 |
all the time. So I ended up actually not going to any lectures at university. And I somehow kind 00:07:45.280 |
of managed this thing where I would take two weeks off before each exam, go and talk to all my 00:07:50.720 |
lecturers and say, Hey, I was meant to be in your university course. I know you didn't see me, but I 00:07:55.040 |
was kind of busy. Can you tell me what I was meant to have done? And I would do it. And so I kind of 00:08:01.440 |
scraped by a BA in philosophy. But, yeah, you know, I don't really have much of an academic 00:08:08.080 |
background. But that did give me a great background in like applying stuff like, you know, 00:08:15.280 |
linear regression and logistic regression and linear programming and, you know, 00:08:19.760 |
the basic analytical tools of the day, generally through VBA scripts in Excel, or, you know, 00:08:27.280 |
access, you know, the kind of stuff that a consultant could chuck out, you know, on to their 00:08:32.560 |
laptop at a client site. Anyway, I always felt guilty about doing that, because it just seemed 00:08:40.800 |
like this ridiculously nerdy thing to be doing when I was surrounded by all these very important, 00:08:46.000 |
you know, consultant types who seemed to be doing much more impressive strategy work. So I tried to 00:08:53.920 |
get away from that as quickly as I could, because I didn't want to be the nerd in the company. And 00:09:00.480 |
yeah, so I ended up spending the next 10 years basically doing strategy consulting. But throughout 00:09:06.080 |
that time, I did, you know, because I didn't have the same background that they did, that expertise, 00:09:12.320 |
the MBAs they did, I had to solve things using data and analytically intensive approaches. 00:09:18.320 |
So although in theory, I was a strategy management consultant, and I was working on problems like, 00:09:23.680 |
you know, how do we fix the rice industry in Australia? Or, you know, how do we, you know, 00:09:29.360 |
like, you know, how do we deal with this new competitor coming into this industry or whatever 00:09:33.680 |
it was, I always did it by analyzing data, which actually turned out to be a good niche, you know, 00:09:40.000 |
because I was the one McKinsey consultant in Australia who did things that way. And so I was 00:09:44.640 |
successful, and I ended up moving to A.T. Kearney, which is the other of the two 00:09:50.800 |
original management consulting firms. I think I became like the youngest manager in the world. 00:09:56.800 |
And, you know, there was this parallel path I was doing. And then through that, I learned about 00:10:03.840 |
the insurance industry and discovered like the whole insurance industry is basically pricing 00:10:09.120 |
things in a really dumb way. I developed this approach, based on optimization, to optimized 00:10:17.600 |
pricing, launched a company with my university friend who had a PhD in operations research. 00:10:25.360 |
And, yeah, so we built this new approach to pricing insurance, which is, it was kind of fun. 00:10:34.320 |
I mean, it's, you know, it went well in the end, you know, commercially. Spent about 10 00:10:41.600 |
years doing that. And at the same time, running an email company called Fastmail, 00:10:46.960 |
which also went well. Yeah, we started out basically using C++. And I would say that was 00:10:55.920 |
kind of the start of my array programming journey in that in those days, this is like 1999, 00:11:00.480 |
the very first expression templates based approaches to C++ numeric programming were appearing. 00:11:07.840 |
And so I, you know, was talking to the people working on those libraries doing stuff like 00:11:14.960 |
particularly stuff doing the big kind of high energy physics experiments that were going on in Europe. 00:11:21.040 |
It was ultimately pretty annoying to work with, though, like the amount of time it took to 00:11:28.960 |
compile those things, it would take hours. And it was quirky as all hell, you know, it's still 00:11:35.600 |
pretty quirky doing metaprogramming in C++. But in those days, it was just a nightmare. Every 00:11:40.800 |
compiler was different. So I ended up switching to C sharp shortly after that came out. And, you know, 00:11:49.280 |
the move in a way was disappointing because that was much less expressive as a kind of array 00:11:55.760 |
programming paradigm. And so instead, I ended up basically grabbing Intel's MKL library, which is 00:12:04.960 |
basically BLAS on steroids, if you like, and writing my own C sharp wrapper to give me, 00:12:12.640 |
you know, kind of array programming ish capabilities, but not with any of the features one 00:12:17.840 |
would come to expect from a real array programming language around kind of 00:12:21.040 |
dealing with rank sensibly, and, you know, not much in the way of broadcasting, 00:12:26.320 |
which reminds me, we should come back to talking about BLAS at some stage, because a lot of the 00:12:32.480 |
reasons that most languages are so disappointing at array programming is because of our reliance on 00:12:37.360 |
BLAS, you know, as an industry. Fastmail, on the other hand, was being written in Perl, 00:12:45.360 |
which I really enjoyed as a programming language and still do, I still love Perl a lot. 00:12:50.240 |
But the scientific programming in Perl I didn't love at all. And so at the time, Perl 6, 00:13:01.840 |
you know, was just starting, the idea of it was being developed. So I ended up 00:13:06.560 |
running the Perl 6 working group to add scientific programming capabilities or kind of, you know, 00:13:14.400 |
what at the time I described as APL-inspired programming capabilities, to Perl. And so I 00:13:20.560 |
did an RFC around what we ended up calling hyper operators, which is basically the idea that any 00:13:27.200 |
operator can operate on arrays and can broadcast over any axes that are mismatched or whatever. 00:13:35.600 |
And those RFCs all ended up getting accepted. And Damian Conway and Larry Wall kind of expanded 00:13:42.640 |
them a little bit. Perl 6 never exactly happened. It ended up becoming a language called Raku. 00:13:51.680 |
With the butterfly logo. Yeah. And that, you know, and the kind of the performance ideas, 00:13:58.400 |
I really worked hard on, never really happened either. So that was a bit of a, 00:14:01.760 |
yeah, that was all a bit of a failure. But it was fun, and it was interesting. 00:14:05.920 |
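
(The hyper-operator idea from those RFCs, every operator extending elementwise over whole arrays with mismatched shapes broadcast, is essentially what NumPy does in Python today; a minimal sketch with made-up numbers:)

    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([10, 20, 30])

    # Any operator applies elementwise over whole arrays, no explicit loop:
    print(a + b)          # [11 22 33]
    print(a * 2)          # the scalar 2 is broadcast over every element

    # Mismatched shapes broadcast where the axes line up:
    m = np.ones((2, 3))
    print(m + a)          # a is repeated across both rows of m
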
I, you know, so after running these companies for 10 years, one of the big problems with running a 00:14:16.000 |
company is that you're surrounded by people who you hired, and they, you know, have to make 00:14:21.600 |
you like them if they want to get promoted, you know, or not get fired. And so you can never trust 00:14:25.120 |
anything anybody says. So I had, you know, very low expectations about my capabilities, 00:14:32.960 |
analytics-wise. I had, like, you know, basically been running companies for 10 years. 00:14:37.920 |
I did a lot of coding and stuff, but it was in our own little world. And so after I sold those 00:14:47.920 |
companies, yeah, I, one of the things I decided to do was to try actually to become more competent, 00:14:56.640 |
you know, I had lost my, to some extent, I had lost my feeling that I should hide my nerdiness, 00:15:06.240 |
you know, and try to act like a real business person. And I thought, no, I should actually 00:15:11.840 |
see if I'm actually any good at this stuff. So I tried entering a machine learning competition 00:15:18.720 |
at a new company that had just been launched called Kaggle with this goal of like, not coming last. 00:15:26.880 |
So basically, the, you know, the way these things work is you have to make predictions on a data 00:15:37.760 |
set. And at the end of the competition, whoever's predictions are the most accurate wins the prize. 00:15:46.080 |
And so my goal was, yeah, try not to come last, which I wasn't convinced I'd be able to achieve. 00:15:52.800 |
Because as I say, I didn't feel like this is, I'd never had any technical training, 00:15:59.600 |
you know, and everybody else in these competitions were PhDs and professors or whatever else. So it 00:16:03.840 |
felt like a high bar. Anyway, I ended up winning it. And that, that changed my life, right? Because, 00:16:12.000 |
yeah, it was like, oh, okay, I am, you know, empirically good at this thing. And people 00:16:23.520 |
at my local user groups, we used R quite a bit as well. You know, I told them, I'm going to try 00:16:32.560 |
entering this competition. Anyone want to create a team with me? I want to learn to use R properly. 00:16:37.360 |
And I kind of went back to the next user group meeting and people were like, I thought you were 00:16:41.040 |
just learning this thing. How did you win? I was like, I don't know. I just used common sense. 00:16:47.840 |
Yeah, so I ended up becoming the chief scientist and president of Kaggle. And Kaggle, as you know, 00:16:54.320 |
anybody in the data science world knows, has kind of grown into this huge, huge thing, ended up 00:16:59.760 |
selling it to Google. So I ended up being an equal partner in the company. I was the first 00:17:04.080 |
investor in it. And that was great. That was like, I just dove in, we moved to San Francisco for 10 00:17:11.760 |
years. You know, surrounded by all these people who were just sort of role models 00:17:18.400 |
and idols, and partly getting to meet all these people in San Francisco was this experience of 00:17:24.880 |
realizing all these people were actually totally normal, you know, and they weren't like some 00:17:30.160 |
super genius level, like they're just normal people who, yeah, as I got to know them, 00:17:38.720 |
it gave me, I guess, a lot more confidence in myself as well. So maybe they were just normal 00:17:44.720 |
relative to you. I think in Australia, we all feel a bit, you know, intimidated by the rest of the 00:17:53.840 |
world in some ways, we're a long way away, you know, our only neighbors really are New Zealand. 00:17:59.680 |
It's very easy to feel, I don't know, like, yeah, we were not very 00:18:07.280 |
confident about our capabilities over here, other than in sport, perhaps. 00:18:13.040 |
Yeah, so one of the things that happened while I was at Kaggle was, I had played around with neural 00:18:20.480 |
networks a bit, a good bit, you know, like 20 years earlier. And I always felt like neural networks 00:18:26.720 |
were one day going to be the thing. It's like, you know, they are at a theoretical level, 00:18:34.080 |
infinitely capable. But, you know, they never quite did it for me. And 00:18:41.760 |
but then in 2012, suddenly, neural networks started achieving superhuman performance for 00:18:49.120 |
the first time on really challenging problems, like recognizing traffic signs, you know, 00:18:54.080 |
like recognizing pictures. And I'd always said to myself, I was going to watch for this moment, 00:19:00.160 |
and when it happened, I wanted to like, jump on it. So as soon as I saw that, I tried to jump on 00:19:04.800 |
it. So I started a new company, after a year of research into, like, you know, what a 00:19:12.320 |
neural network's going to do. I decided medicine was going to be huge; I knew nothing about medicine. 00:19:18.160 |
And I, yeah, I started a medicine company to see what we could do with deep learning in medicine. 00:19:23.200 |
So that was Enlitic. Yeah, that ended up going pretty well. And yeah, eventually, I kind of got 00:19:33.200 |
like a bit frustrated with that, though, because it felt like deep learning can do so many things, 00:19:39.120 |
and I'm only doing such a small part of those things. So deep learning is like neural networks 00:19:44.000 |
with multiple layers. I thought the only way to actually help people really, you know, make the 00:19:51.520 |
most of this incredibly valuable technology is to teach other people how to do it, and to help 00:19:56.800 |
other people to do it. So my wife and I ended up starting a new, I'd call it kind of a research 00:20:02.560 |
lab, fast.ai, to help do that, basically, initially focused on education, 00:20:09.760 |
and then increasingly focus on research and software development to basically make it 00:20:15.520 |
easier for folks to use deep learning. And that's, yeah, that's where I am now. And now 00:20:23.280 |
everything in deep learning is all Python. And in Python, we're very lucky to have, 00:20:30.080 |
you know, excellent libraries that behave pretty consistently with each other, 00:20:36.160 |
basically based around this NumPy library, which treats arrays very, very similarly to how 00:20:45.440 |
J does, except rather than leading axis, it's trailing axis. But basically, you get, 00:20:51.920 |
you know, you get loop free, you get broadcasting, you know, you don't get things like a rank 00:20:57.760 |
conjunction, but there's very easy ways to permute axes. So you can do basically the same thing. 00:21:05.200 |
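
(A minimal Python illustration of the trailing-axis point, and, as he notes next, Einstein notation is built in as well:)

    import numpy as np

    a = np.arange(12).reshape(3, 4)   # shape (3, 4)
    b = np.arange(4)                  # shape (4,)

    # NumPy pairs b with a's *trailing* axis; J-style leading-axis
    # agreement would pair on the first axis instead:
    print(a + b)

    # No rank conjunction, but permuting axes is easy, so the same
    # effect is available; here c pairs with the leading axis:
    c = np.arange(3)
    print((a.T + c).T)

    # Einstein notation, built in: a matrix-vector product.
    print(np.einsum("ij,j->i", a, b))
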
Things like Einstein notation, you know, are built into the libraries, and then, you know, 00:21:11.040 |
it's trivially easy to have them run on GPUs or TPUs or whatever, you know, so it's for the last 00:21:20.400 |
years of my life, nearly all the code I write is array programming code, even though I'm not 00:21:28.400 |
using a purely array language. All right, so where do we start now with the questions? 00:21:35.760 |
I'll let Bob and Adam go first if they want. And if they don't have a... Okay, Bob, you go ahead. 00:21:44.080 |
I've got a quick question about neural networks and stuff. Because when I was going to 00:21:49.360 |
university all those years ago, people were talking about neural networks, and then they just sort of 00:21:54.240 |
dropped off the face of the earth. And as you said, around 2010, suddenly they resurfaced again. What do you think 00:21:59.520 |
was the cause of that resurfacing? Was it hardware? Was it somebody discovered a new method or what? 00:22:04.480 |
Yeah, mainly hardware. So what happened was people figured out how to do GPGPU, so general-purpose 00:22:12.480 |
GPU computing. So before that, I tried a few times to use GPUs with neural nets, I felt like that would 00:22:18.560 |
be the thing. But GPUs were all about like creating shaders and whatever. And it was a whole jargon 00:22:25.840 |
thing. I didn't even understand what was going on. So the key thing was NVIDIA coming up with this 00:22:31.680 |
CUDA approach, which, it's all loops, right? But it's much easier than the old way. Like, the 00:22:42.080 |
loops, well, it's kind of loops, at least: you basically say to CUDA, this is my kernel, 00:22:48.640 |
which is the piece of code I want to basically run on each symmetric multiprocessing unit. 00:22:52.960 |
And then you basically say launch a bunch of threads. And it's going to call your kernel, 00:23:00.080 |
you know, basically incrementing the x and y coordinates and passing it to your kernel, 00:23:06.000 |
making them available to your kernel. So it's, kind of, not exactly a loop, 00:23:09.440 |
but it gets more like a map, I guess. And so when CUDA appeared, yeah, very quickly, 00:23:16.320 |
neural network libraries appeared that would take advantage of it. 00:23:21.680 |
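
(A minimal sketch of the model being described: a kernel that the runtime maps over thread indices. Numba's CUDA bindings are used here just as one convenient way to show it from Python, which is my assumption rather than anything named in the episode, and it needs an NVIDIA GPU to actually run:)

    import numpy as np
    from numba import cuda

    @cuda.jit
    def add_kernel(x, y, out):
        i = cuda.grid(1)           # this thread's global index
        if i < x.size:             # guard: we may launch spare threads
            out[i] = x[i] + y[i]   # the per-thread "loop body"

    x = np.arange(1024, dtype=np.float32)
    y = np.ones(1024, dtype=np.float32)
    out = np.empty_like(x)
    add_kernel[4, 256](x, y, out)  # launch 4 blocks of 256 threads
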
And then suddenly, you know, you get orders of magnitude more performance. And it's cheaper. 00:23:28.240 |
And you get to buy an Nvidia graphics card with a free copy of Batman, you know, on the excuse that 00:23:34.880 |
actually this is all for work. So it was mainly that. There's also this, just like at the 00:23:41.920 |
same time, the thing I'd been doing for 25 years suddenly got a name: data science. You know, we 00:23:49.440 |
were like this very small industry of people applying data-driven approaches to solving 00:23:54.960 |
business problems. And we were always looking for a name. Not many people know this, but back in the 00:24:00.800 |
very early days, there was an attempt to call it industrial mathematics. Sometimes people would 00:24:06.480 |
like shoehorn it into operations research or management science, but that was almost exclusively 00:24:11.680 |
optimization people and specifically people focused more on linear programming approaches. 00:24:17.440 |
So yeah, once data science appeared, and also like, you know, basically every company had 00:24:23.360 |
finally built their data warehouse and the data was there. So yeah, it's like more awareness 00:24:32.560 |
of using data to solve business problems and for the first time availability of the hardware that 00:24:37.520 |
we actually needed. And as I say, in 2012, it just, it reached the point, like it had been growing 00:24:44.400 |
since the first neural network was built in 1957, I guess, at this kind of gradual 00:24:53.040 |
rate, but once it passed human performance on some tasks, it just kept going. And so now, 00:25:00.400 |
in the last couple of months, you know, it's now like getting decent marks on MIT math tests and 00:25:08.800 |
stuff. It's on an amazing trajectory. Yeah, it's kind of a critical mass kind of thing, 00:25:16.080 |
you get a certain amount of information and the ability to process that information, I guess, 00:25:22.800 |
and as you go, it's an exponential curve. And humans and exponential curves, 00:25:28.720 |
I think we're finding over and over again, we're not really great at understanding an exponential. 00:25:34.080 |
No, no, we're not. And that's like why I promised myself that as soon as I saw neural net starting 00:25:41.440 |
to look like they're doing interesting things, I would drop everything and jump on it, because I 00:25:45.360 |
wanted to jump on that curve as early as possible. And we're now in this situation where people are 00:25:50.960 |
just making huge amounts of money with neural nets, which they then reinvest back into making the 00:25:57.360 |
neural nets better. And so we are also seeing this kind of bifurcation of capabilities where there's 00:26:03.920 |
a small number of organizations who are extremely good at this stuff and invested in it and a lot 00:26:09.680 |
of organizations that are, you know, really struggling to figure it out. And because of the 00:26:17.680 |
exponential nature, when it happens, it happens very quickly, it feels like you didn't see it 00:26:22.240 |
coming. And suddenly, it's there. And then it's past you. And I think we're all experiencing that 00:26:26.720 |
now. Yeah, and it's happened in so many industries, you know, back in my medical startup, you know, 00:26:34.800 |
we were interviewing folks around medicine, we interviewed a guy finishing his PhD in 00:26:42.160 |
histopathology. And I remember, you know, he came in to do an interview with us. And he basically 00:26:49.440 |
gave us a presentation about his thesis on kind of graph cut segmentation approaches for pathology 00:26:54.960 |
slides. And at the end, he was like, anyway, that was my PhD. And then yesterday, because I knew I 00:26:59.920 |
was coming to see you guys, and I heard you like neural nets, I just thought I'd check out neural nets. 00:27:04.000 |
And about four hours later, I trained a neural net to do the same thing I did for my PhD. And 00:27:11.360 |
it way outperformed my PhD thesis that I'd spent the last five years on. And so that's where I'm at, you know, 00:27:17.360 |
and we hear this a lot. Existential crisis in the middle of an interview. Yes. 00:27:24.960 |
So I kind of have, I don't know, this is like a 1A, B and C. And I'm not sure if I should ask them 00:27:34.000 |
all at once. But so you said sort of at the tail end of the 90s is when your array language journey 00:27:40.880 |
started. But it seems from the way you explained it that you had already at some point along the 00:27:45.280 |
way heard about the array languages, APL and J, and have sort of alluded to, you know, picking up 00:27:52.640 |
some knowledge about the paradigm and the languages. So my first part of the question is sort of, 00:27:58.240 |
you know, at what point were you exposed to the paradigm in these languages? The second part is 00:28:04.000 |
what's causing you in 2022 to really dive into it? Because you said you feel like maybe a bit of an 00:28:11.600 |
imposter or the least qualified guest, which probably is you just being very modest. I'm sure 00:28:16.160 |
you know still quite a bit. And then the third part is, do you have thoughts about, and I've 00:28:21.680 |
always sort of wondered, how the array language paradigm sort of missed out, and, like, Python 00:28:28.160 |
ended up being the main data science language, while like there's like an article that's floating 00:28:34.480 |
around online called NumPy: Another Iverson Ghost, where, this sort of, you can see in the 00:28:40.640 |
names and the design of the library that there is a core of APL, and even the documentation 00:28:45.760 |
acknowledges that it took inspiration greatly from J and APL. But that like the array languages clearly 00:28:53.040 |
missed what was a golden opportunity for their paradigm. And we ended up with libraries and 00:29:00.080 |
other languages. So I just asked three questions at once. Feel free to tackle them in any order. 00:29:04.800 |
I have a pretty bad memory. So I think I've forgotten the second one already. So you can 00:29:09.680 |
feel free to come back to any or all of them. So my journey, which is what you started with, 00:29:18.560 |
was I always felt like we should do more stuff without using code, or at least, like, 00:29:31.440 |
kind of traditional, what I guess we'd call nowadays, imperative code. There was a couple 00:29:38.800 |
of tools in my early days, which I've got huge amounts of leverage from because nobody else 00:29:45.760 |
in at least the consulting firms or generally in our clients knew about them. So that was SQL and 00:29:52.240 |
pivot tables. And so pivot tables, if you haven't come across it, was basically one of the earliest 00:29:58.240 |
approaches to OLAP, you know, slicing and dicing. There was actually something slightly earlier 00:30:02.480 |
called Lotus Improv, but that was actually a separate product. Excel was basically the first 00:30:07.200 |
one to put OLAP in the spreadsheet. So no loops. You just drag and drop the things you want to group 00:30:12.560 |
by and you right click to choose how to summarize. And same with SQL, you know, you declaratively 00:30:19.920 |
say what you want to do. You don't have to loop through things. SAS actually had something similar. 00:30:25.600 |
You know, with SAS, you could basically declare a PROC that would run on your data. 00:30:32.080 |
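
(A minimal pandas sketch of that pivot-table idea; the little sales table is made up purely for illustration:)

    import pandas as pd

    df = pd.DataFrame({
        "region":  ["N", "N", "S", "S"],
        "product": ["a", "b", "a", "b"],
        "sales":   [10, 20, 30, 40],
    })

    # Declare the grouping and the summary; no loops anywhere:
    print(df.pivot_table(values="sales", index="region",
                         columns="product", aggfunc="sum"))

    # The SQL version is just as declarative:
    #   SELECT region, product, SUM(sales) FROM t GROUP BY region, product;

So yeah, I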
kind of felt like this was the way I would rather do stuff if I could. And I think that's what led 00:30:39.840 |
me when we started doing the C++ implementation of the insurance pricing stuff of being much more 00:30:46.320 |
drawn to these metaprogramming approaches. I just didn't want to be writing loops in loops and 00:30:55.200 |
dealing with all that stuff. I'm too lazy, you know, to do that. I think I'm very driven by laziness, 00:31:04.400 |
which as Larry Wall said is one of the three virtues of a great programmer. Then yeah, so I think 00:31:14.080 |
as soon as I saw NumPy had reached a level of some reasonable 00:31:22.400 |
confidence in Python, I was very drawn to that because it was what I've been looking for. 00:31:28.400 |
And I think maybe that actually is going to bring us to answering the question of like what happened 00:31:32.480 |
for array languages. Python has a lot of problems, but at its heart, it's a very well-designed 00:31:41.680 |
language. It has a very small, flexible core. Personally, I don't like the way most people 00:31:48.880 |
write it, but it's so flexible I've been able to create almost my own version of Python, 00:31:54.640 |
which is very functionally oriented. I basically have stolen the type dispatch ideas from Julia, 00:32:01.600 |
created an implementation of that in Python. My Python code doesn't look like 00:32:08.080 |
most Python code, but I can use all the stuff that's in Python. So there's this very nicely 00:32:15.200 |
designed core of a language, which I then have this almost this DSL on top of, you know, and 00:32:21.440 |
NumPy is able to create this kind of DSL again because it's working on such a flexible 00:32:28.960 |
foundation. Ideally, you know, I mean, well, okay, so Python also has another DSL built into it, 00:32:36.320 |
which is math. You know, I can use the operators plus times minus. That's convenient. And 00:32:41.280 |
in every array library, NumPy, PyTorch, TensorFlow, and Python, those operators work 00:32:47.680 |
over arrays and do broadcasting over axes and so forth and, you know, accelerate on an accelerator 00:32:54.960 |
like a GPU or a TPU. That's all great. My ideal world would be that I wouldn't just get to use 00:33:03.280 |
plus times minus, but I get to use all the APL symbols. You know, that would be amazing. 00:33:10.080 |
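
(He implemented those Julia-style type dispatch ideas in his own Python code; as a rough sketch of the general idea, here is the standard library's single-argument version, not his actual implementation:)

    from functools import singledispatch

    @singledispatch
    def describe(x):                  # fallback for any other type
        return f"something: {x!r}"

    @describe.register
    def _(x: int):                    # chosen when x is an int
        return f"an int: {x}"

    @describe.register
    def _(x: list):                   # chosen when x is a list
        return f"a list of {len(x)} items"

    print(describe(3), describe([1, 2]), describe("hi"))
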
But given a choice between a really beautiful language, you know, at its core like Python, 00:33:18.480 |
in which I can then add a slightly cobbled together DSL like NumPy, I would much prefer 00:33:24.720 |
that over a really beautiful notation like APL, but without the fantastic language underneath, 00:33:32.480 |
you know, like I don't feel like there's anything about APL or J or K as, like, a 00:33:40.960 |
programming language that attracts me. Do you know what I mean? I feel like in terms of like 00:33:47.840 |
what I could do around whether it be type dispatch or how OO is designed or, you know, how I package 00:33:57.280 |
modules or almost anything else, I would prefer the Python way. So I feel like that's basically 00:34:06.160 |
what we've ended up with. You kind of either compromise between, you know, a good language 00:34:10.720 |
with, you know, slightly substandard notation or amazingly great notation with the substandard 00:34:17.840 |
language or not just language, but ecosystem. Python has an amazing ecosystem. 00:34:25.600 |
I think I hope one day we'll get the best of both, right? Like here's my, okay, here's my 00:34:35.200 |
controversial take and it may just represent my lack of knowledge. What I like about APL is its 00:34:42.960 |
notation. I think it's a beautiful notation. I don't think it's a beautiful programming language. 00:34:50.480 |
I think some things, possibly everything, you know, some things work very well as a notation, 00:35:00.160 |
but to get to raise something to the point that it is a notation requires some years of study 00:35:07.680 |
and development and often some genius, you know, like the genius of Feynman diagrams or the genius 00:35:15.040 |
of juggling notation, you know, like there are people who find a way to turn a field into a 00:35:23.040 |
notation and suddenly they blow that field apart and make it better for everybody. 00:35:29.360 |
For me, like, I don't want to think too hard all the time. Every time I come across something that 00:35:36.320 |
really hasn't been turned into a notation yet, you know, sometimes I just like, I just want to 00:35:43.040 |
get it done, you know, and so I would rather only use notation when I'm in these fields 00:35:50.480 |
that either somebody else had figured out how to make that a notation or I feel like it's really 00:35:55.520 |
worth me investing to figure that out. Otherwise, you know, there are, and the other thing I'd say 00:36:02.080 |
is we already have notations for things that aren't APL that actually work really well, 00:36:06.000 |
like regular expressions, for example. That's a fantastic notation and I don't want to 00:36:12.320 |
replace that with APL glyphs. I just want to use regular expressions. 00:36:20.720 |
So, yeah, my ideal world would be one where we, where I can write PyTorch code, but maybe instead 00:36:28.320 |
of like Einstein operations, Einstein notation, I could use APL notation. I think that's where 00:36:39.600 |
I would love to get to one day and I would love that to totally transparently run on a GPU or TPU 00:36:47.920 |
as well. That would be my happy place. It's got nothing to do with the fact that 00:36:54.000 |
I work at NVIDIA that I would love that. Interesting. I've never heard that before, 00:37:00.240 |
the difference between basically appreciating or being in love with the notation, but not the 00:37:08.000 |
language itself and that. And, you know, it started out as a notation, right? Like Iverson, 00:37:14.640 |
you know, it was a notation he used for representing state machines or whatever on 00:37:20.080 |
early IBM hardware, you know, when he did his Turing Award essay, he chose to talk about his 00:37:27.040 |
notation. And, you know, you see with people like Aaron with his Co-dfns stuff that 00:37:37.680 |
if you take a very smart person and give them a few years, they can use that notation to solve 00:37:43.840 |
incredibly challenging problems like build a compiler and do it better than you can 00:37:50.320 |
without that notation. So I'm not saying like, yeah, APL can't be used to almost anything you 00:37:58.000 |
want to use it for, but a lot of the time we don't have five years to study something very closely. 00:38:04.400 |
We just want to, you know, we've got to get something done by tomorrow. 00:38:11.360 |
Interesting. There's still a question, again, you didn't get an answer to. 00:38:15.680 |
Oh, yeah. When did you first, well, when did you first meet APL or how did you even find APL? 00:38:20.480 |
I first found J, I think, which obviously led me to APL. And I don't quite remember where I saw it. 00:38:34.880 |
Yeah. And actually, when I got to San Francisco, so that would be I'm trying to remember 00:38:45.760 |
2010 or something, I'm not sure. I actually reached out to Eric Iverson and I said, like, 00:38:54.640 |
oh, you know, we're starting this machine learning company called Kaggle. And I kind of feel like, 00:39:02.240 |
you know, everybody does stuff in Python, and it's kind of in a lot of ways really disappointing. 00:39:06.000 |
I wish we were doing stuff in J, you know, but we really need everything to be running on the GPU, 00:39:12.240 |
or at least everything to be automatically using SIMD and multiprocessor everywhere. 00:39:18.000 |
He was kind enough to actually jump on a Skype call with me. Not just jump on a Skype call, 00:39:18.000 |
it's like, how do you want to chat? It's like, how about Skype? And he created a Skype account. 00:39:27.760 |
Like, oh, yeah, we chatted for quite a while. We talked about, you know, these kinds of hopes and 00:39:35.600 |
yeah, but I just, you know, never really, because neither J nor APL was in that space yet. 00:39:46.880 |
There was just never a reason for me to do anything other than like, 00:39:51.200 |
it kind of felt like each time I'd have a bit of a break for a couple of months, 00:39:54.800 |
I'd always spend a couple of weeks fiddling around with J just for fun. But that's as far as I got, 00:40:02.000 |
really. Yeah, I think the first time I'd heard of you was in an interview that Leo Laporte did with 00:40:08.240 |
you on triangulation, and you were talking about Kaggle. That was a specific thing. But I think 00:40:13.280 |
I was riding my bike along some logging road or something and suddenly he said, oh, yeah, but 00:40:17.120 |
a lot of people use J. I like J. It's the first time I'd ever heard anybody on a podcast say 00:40:22.960 |
anything about J. It was just like, wow, that's amazing. And the whole interview about Kaggle, 00:40:31.120 |
there was so much of it about the importance of data processing, not just having a lot of 00:40:36.640 |
data, but knowing how to filter it down, not over filtering all those tricks. I'm thinking, 00:40:41.600 |
wow, these guys are really doing some deep stuff with this stuff and this guy is using J. 00:40:47.280 |
I was actually very surprised at that point, I guess not that somebody who was 00:40:54.080 |
working so much with data would know about J, but just that it would, 00:40:58.080 |
I guess, just suddenly pop into my headset, and I'm just, wow, that's so neat. 00:41:04.720 |
And I will say, in the array programming community, I find there's essentially a common misconception 00:41:11.200 |
that the reason people aren't using array programming languages is because they don't 00:41:16.160 |
know about them or don't understand them, which there's a kernel of truth of that, 00:41:22.240 |
but the truth is nowadays there's huge massively funded research labs at places like Google Brain 00:41:31.920 |
and Facebook AI Research and OpenAI and so forth where large teams of people are literally writing 00:41:39.520 |
new programming languages because they've tried everything else and what's out there is not 00:41:44.080 |
sufficient. In the array programming world, there's often a huge underappreciation of 00:41:52.720 |
what Python can do nowadays, for example. As recently as last week, I heard it described in 00:41:59.440 |
a chat room, it's like people obviously don't care about performance because they're using Python. 00:42:04.160 |
And it's like, well, a large amount of the world's highest performance computing now is done with 00:42:10.800 |
Python. It's not because Python's fast, but if you want to use RAPIDS, for example, which literally 00:42:19.040 |
holds records for the highest performance recommendation systems and tabular analysis, 00:42:26.000 |
you write it in Python. So this idea of having a fast kernel that's not written in the language 00:42:38.160 |
and then something else talking to it in a very flexible way, I think is great. And as I say, 00:42:43.200 |
at the moment, we are very hamstrung in a lot of ways that we, at least until recently, we very 00:42:48.880 |
heavily relied on BLAS, which is totally the wrong thing for that kind of flexible high-performance 00:42:57.680 |
computing because it's this bunch of somewhat arbitrary kind of selection of linear algebra 00:43:05.920 |
algorithms, which, you know, things like the C# work I did, you know, they were just wrappers on 00:43:11.120 |
top of BLAS. And what we really want is a way to write really expressive kernels that can do 00:43:18.240 |
anything over any axes. So then there are other newer approaches like Julia, for example, which 00:43:31.360 |
is kind of like got some Lispy elements to it and this type dispatch system. But because it's, 00:43:36.720 |
you know, in the end, it's on top of LLVM. What you write in Julia, you know, it does end up 00:43:45.840 |
getting optimized very well. And you can write pretty much arbitrary kernels in Julia and often 00:43:52.320 |
get best-in-class performance. And then there's other approaches like JAX. And JAX sits on top 00:44:02.480 |
of something totally different, which is it sits on top of XLA. And XLA is a compiler, which is 00:44:09.280 |
mainly designed to compile things to run fast on Google's TPUs. But it also does an okay job of 00:44:17.040 |
compiling things to run on GPUs. And then really excitingly, I think, you know, for me is the MLIR 00:44:26.240 |
project, and particularly the affine dialect. So that was created by my friend, Chris Lattner, 00:44:34.240 |
who you probably know from creating Clang and LLVM and Swift. So he joined Google for a couple 00:44:45.040 |
of years. And we worked really closely together on trying to like, think about the vision of 00:44:49.920 |
really powerful programming on accelerators that's really developer friendly. Unfortunately, 00:44:58.480 |
it didn't work out. Google was a bit too tied to TensorFlow. But one of the big ideas that did 00:45:04.240 |
come out of that was MLIR, and that's still going strong. And I do think there's, you know, if 00:45:09.040 |
something like APL, you know, could target MLIR and then become a DSL inside Python, it may yet win, 00:45:18.800 |
you know. Yeah, I've heard you in the past say that, on different podcasts and talks, 00:45:24.960 |
that you don't think that Python, even in light of, you know, just saying, people don't realize how 00:45:31.200 |
much you can get done with Python, that you don't think that the future of data science and AI and 00:45:35.200 |
neural networks and that type of computation is going to live in the Python ecosystem. And I've 00:45:41.040 |
heard on some podcasts, you've said that, you know, Swift has a shot based on sort of the way that 00:45:44.400 |
they've designed that language. And you just mentioned, you know, a plethora of different 00:45:48.160 |
sort of, I wouldn't say initiatives, but you know, JAX, XLA, Julia, etc. Do you have like a sense 00:45:53.600 |
of where you think the future of, not necessarily sort of array language computation, but this kind 00:45:59.680 |
of computation is going with all the different avenues? I do. You know, I think we're certainly 00:46:08.560 |
seeing the limitations of Python, and the limitations of the PyTorch, you know, 00:46:15.520 |
lazy evaluation model, which is the way most things are done in Python at the moment, 00:46:25.280 |
for kind of array programming is you have an expression, which is, you know, working on 00:46:31.200 |
arrays, possibly of different ranks with implicit looping. And, you know, that's one line of Python 00:46:37.200 |
code. And generally, that then gets your, you know, on your computer, that'll get turned into, 00:46:43.280 |
you know, a request to run some particular optimized pre written operation on the GPU or 00:46:52.000 |
TPU, that then gets sent off to the GPU or TPU, where your data has already been moved there. 00:46:58.960 |
It runs, and then it tells the CPU when it's finished. And there's a lot of latency in this, 00:47:06.800 |
right? So if you want to create your own kernel, like your own way of doing, you know, your own 00:47:12.480 |
operation effectively, you know, good luck with that. That's not going to happen in Python. 00:47:19.600 |
And I hate this, I hate it as a teacher, because, you know, I can't show my students what's going 00:47:26.080 |
on, right? It kind of goes off into, you know, kind of CUDA land and then comes back later. 00:47:33.520 |
I hate it as a hacker, because I can't go in and hack at that, I can't trace it, I can't debug it, 00:47:39.280 |
I can't easily profile it. I hate it as a researcher, because very often I'm like, 00:47:44.400 |
I know we need to change this thing in this way, but I'm damned if I'm going to go and write my own. 00:47:49.680 |
CUDA code, let alone deploy it. So JAX is, I think, a path to this. It's where you say, okay, let's not 00:47:58.160 |
target pre-written CUDA things, let's instead target a compiler. And, you know, working with 00:48:07.360 |
Chris Lattner, I'd say he didn't have too many nice things to say about XLA as a compiler. It was not 00:48:13.040 |
written by compiler writers, it was written by machine learning people, really. But it does the 00:48:19.760 |
job, you know, and it's certainly better than having no compiler. And so JAX is something which, 00:48:26.080 |
instead of turning our line of Python code into a call to some pre-written operation, 00:48:32.400 |
it instead is turning it into something that's going to be read by a compiler. And so the compiler 00:48:37.280 |
can then, you know, optimize that as compilers do. So, yeah, I would guess that JAX probably has 00:48:46.560 |
a part to play here, particularly because you get to benefit from the whole Python ecosystem, 00:48:54.320 |
package management, libraries, you know, visualization tools, et cetera. 00:49:04.560 |
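
(A minimal sketch of the JAX model described here: the Python expression is traced and handed to the XLA compiler rather than dispatched to pre-written kernels; the function itself is just an illustration:)

    import jax
    import jax.numpy as jnp

    @jax.jit                         # trace once, compile via XLA
    def my_op(x, y):
        return jnp.sqrt(x ** 2 + y ** 2).mean()

    x = jnp.arange(6.0).reshape(2, 3)
    y = jnp.ones((2, 3))
    print(my_op(x, y))               # runs the compiled version
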
But, you know, longer term, it's a mess, you know, it's a mess using a language like Python which 00:49:10.640 |
wasn't designed for this. It wasn't really even designed as something that you can chuck 00:49:16.880 |
different compilers onto. So people put in horrible hacks. So, for example, PyTorch, 00:49:21.440 |
they have something called TorchScript, which is a bit similar. It takes Python and kind of compiles 00:49:26.800 |
it. But they literally wrote their own parser using a bunch of regular expressions. And 00:49:34.080 |
it's, you know, it's not very good at what it does. It even misreads comments and stuff. 00:49:39.120 |
So, you know, I do think there's definitely room for, you know, a language of which Julia would 00:49:47.520 |
certainly be the leading contender at the moment to come in and do it properly. And Julia's got, 00:49:54.800 |
you know, Julia is written on a Scheme basis. So there's this little Scheme kernel 00:50:01.440 |
that does the parsing and whatnot. And then pretty much everything else after that is written in 00:50:06.560 |
Julia. And, of course, leveraging LLVM very heavily. But I think that's what we want, right? 00:50:14.000 |
That's something which I guess I didn't love about Swift. When the team at Google wanted to 00:50:19.840 |
add differentiation support into Swift, they wrote it in C++. And I was just like, that's not a good 00:50:26.960 |
sign. You know, like, apart from anything else, you end up with this group of developers who are, 00:50:35.040 |
in theory, Swift experts, but they actually write everything in C++. And so they actually don't have 00:50:40.800 |
much feel for what it's like to write stuff in Swift. They're writing stuff for Swift. And Julia, 00:50:45.760 |
pretty much everybody who's writing stuff for Julia is writing stuff in Julia. And I think that's 00:50:52.880 |
something you guys have talked about around APL and J as well, is that there's the idea of writing 00:50:59.920 |
J things in J and APL things in APL is a very powerful idea. 00:51:08.240 |
Yeah, sorry, go on. I just remembered your third question. I'll come back to it. 00:51:12.320 |
Oh, you asked me why now am I coming back to APL and J, which is 00:51:16.160 |
totally orthogonal to everything else we've talked about, which is I had a daughter, 00:51:23.520 |
she got old enough to actually start learning math. So she's six. 00:51:27.680 |
And oh, my God, there's so many great educational apps nowadays. There's one called Dragonbox 00:51:36.800 |
Algebra. It's so much fun. Dragonbox Algebra five plus. And it's like five plus algebra, 00:51:42.640 |
like what the hell? So when she was, I think actually still four, you know, 00:51:46.640 |
I let her play with Dragonbox Algebra five plus. And she learned Algebra, you know, by helping 00:51:52.080 |
Dragon eggs hatch. And she liked it so much, I let her try doing Dragonbox Algebra 12 plus. 00:52:00.480 |
And she loved that as well and finished it. And so suddenly I had a five year old kid that liked 00:52:05.440 |
Algebra. Much to my surprise. Kids really can surprise you. And so, yeah, she struggled with 00:52:16.320 |
a lot of the math that they were meant to be doing at primary school, like, 00:52:20.880 |
like division and multiplication, but she liked Algebra. And we ended up homeschooling her. 00:52:28.240 |
And then one of our, her best friend is also homeschooled. So this, this year I decided I'd 00:52:35.440 |
try tutoring them in math together. And so my daughter's name's Claire, and 00:52:44.400 |
her friend Gabe discovered on his Mac the world of alternative keyboards. So he would 00:52:49.280 |
start typing in the chat in, you know, Greek characters or Russian characters. And one day 00:52:55.760 |
I was like, okay, check this out. So I like typed in some APL characters and they were just like, 00:53:01.520 |
wow, what's that? We need that. So initially we installed Dyalog APL so that they could 00:53:08.480 |
type APL characters in the chat. And so I explained to them that this is actually 00:53:16.000 |
this like super fancy math that you're typing in. And they really wanted to try it. So, 00:53:22.480 |
and that was at the time I was trying to teach them sequences and series, 00:53:28.800 |
and they were not getting it at all. It was my first total failure as a math tutor 00:53:35.440 |
with them, you know, they'd been zipping along, fractions, you know, greatest common denominator, 00:53:42.240 |
factor trees. Okay, everything's fine. It makes sense. And then we hit sequences and series. And 00:53:47.040 |
it's just like, they had no idea what I was talking about. So we put that aside. Then we spent like 00:53:55.280 |
three one hour lessons doing the basics of APL, you know, the basic operations and doing stuff 00:54:03.840 |
with lists and dyadic versus monadic, but still, you know, just primary school level math. 00:54:11.360 |
And we also did the same thing in NumPy using Jupyter. And they really enjoyed all that, 00:54:16.080 |
like they were more engaged than our normal lessons. And so then we came back to like, 00:54:23.200 |
you know, sigma i equals one to five of i squared, whatever. And I was like, okay, 00:54:29.680 |
that means this, you know, in APL and this in NumPy. And they're like, oh, is that all? 00:54:38.720 |
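
(For reference, the two renderings were presumably something like the following; the exact expressions are my assumption:)

    import numpy as np

    # "Sigma from i = 1 to 5 of i squared" as one array expression,
    # roughly  +/(⍳5)*2  in APL; in NumPy:
    print((np.arange(1, 6) ** 2).sum())   # 55
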
Fine. Whereas, you know, before, yeah, that was a problem, this idea of like Tn equals Tn 00:54:45.680 |
minus one plus blah, blah, blah, blah. It's like, what is this stuff? But when you're actually 00:54:50.160 |
indexing real things and can print out the intermediate values and all that, and you've 00:54:56.480 |
got iota or arange, they were just like, oh, okay. You know, I don't know why you explained it this 00:55:03.440 |
dumb way before. And I will say, given a choice between doing something on a whiteboard or doing 00:55:09.760 |
something in NumPy or doing something in APL, now they will always pick APL because the APL version 00:55:15.760 |
is just so much easier. You know, there's less to type, there's less to think about, 00:55:20.800 |
there's less boilerplate. And so it's been, it's only been a few weeks, but like yesterday, 00:55:26.240 |
we did the power operator, you know, and so we literally started doing the foundations of 00:55:32.320 |
metamathematics. So it's like, okay, let's create a function called capital S, capital S arrow, 00:55:38.880 |
you know, plus jot one, right? So for those Python people listening, jot is, 00:55:46.400 |
if you give it an array or a scalar, it's the same as partial in Python or bind in C++. 00:55:59.920 |
So, okay, we've now got something that adds one to things. Okay. I said, okay, 00:56:02.800 |
this is called the successor function. And so I said to them, okay, what would happen if we go 00:56:06.960 |
SSS zero? And they're like, oh, that would be three. And so I said, okay, well, what's, 00:56:14.400 |
what's addition? And then one of them's like, oh, it's, it's repeated S. I'm like, yeah, 00:56:19.520 |
it's repeated S. So how do we say repeated? So in APL, we say repeated by using this 00:56:24.720 |
star dieresis. It's called power. Okay. So now we've done that. What is multiplication? 00:56:30.800 |
And then one of them goes after a while. Oh, it's repeated addition. So we define addition, 00:56:36.880 |
and then we define multiplication. And then I'm like, okay, well, what about, you know, exponent? 00:56:43.440 |
Oh, that's just, now this one, they've heard a thousand times. They both are immediately like, 00:56:47.760 |
oh, that's repeated multiplication. So like, okay, we've now defined that. And then, okay, well, 00:56:52.640 |
subtraction, that's a bit tricky. Well, it turns out that subtraction is just, you know, is the 00:56:58.160 |
opposite of something. What's it the opposite of? They both know that. Oh, that's the opposite of 00:57:01.680 |
addition. Okay. Well, opposite of, which in math, we call inverse is just a negative power. So now 00:57:08.480 |
we define subtraction. So how would you define division? Oh, okay. How would you define roots? 00:57:13.600 |
Oh, okay. So we're kind of, like, you know, designing the foundations of mathematics here in APL, 00:57:22.560 |
you know, with a six year old and an eight year old. And during this whole thing at one point, 00:57:27.840 |
we're like, okay, well, now I can't remember why, but we're like, okay, now we got to do one divided 00:57:32.000 |
by a half. And they're both like, we don't know how to do that. So, you know, in APL, this stuff that's 00:57:38.880 |
considered like college level math suddenly becomes easy. And, you know, at the same point, stuff that's still 00:57:45.360 |
primary school level math, like one divided by a half, is considered hard. So it definitely made 00:57:50.720 |
me rethink, you know, what is easy and what is hard and how to teach this math stuff. And I've 00:57:58.880 |
been doing a lot of teaching of math with APL and the kids are loving it. And I'm loving it. And 00:58:04.480 |
that's actually why I started this study group, which will be on today, as we record this, 00:58:11.680 |
or a few days ago, as you put it out there. As I kind of started saying on Twitter to people like, 00:58:17.920 |
oh, it's really been fun teaching my kids, you know, my kid and a friend math using APL and a lot of 00:58:23.120 |
adults were like, ah, can we learn math using APL as well? So that's what we're going to do. 00:58:32.320 |
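(A sketch of that whole construction in Python, for readers following along; the helper names repeat, add_, mul_ and pow_ are mine, not from the episode, and repeat stands in for APL's power operator:)

    from functools import partial
    import operator

    S = partial(operator.add, 1)        # successor, as above: S <- + jot 1

    def repeat(f, n):
        """Apply f n times: a stand-in for APL's power operator (star-dieresis)."""
        def g(x):
            for _ in range(n):
                x = f(x)
            return x
        return g

    add_ = lambda a, b: repeat(S, b)(a)                  # addition is repeated successor
    mul_ = lambda a, b: repeat(partial(add_, a), b)(0)   # multiplication is repeated addition
    pow_ = lambda a, b: repeat(partial(mul_, a), b)(1)   # exponentiation is repeated multiplication

    add_(2, 3), mul_(2, 3), pow_(2, 3)   # (5, 6, 8)
    # APL's power operator also accepts a negative count to run a function's
    # inverse, which is how subtraction, division and roots fall out; plain
    # Python can't derive inverses automatically, so they're omitted here.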
Well, and that's the whole notation thing, isn't it? It's the notation you get away from the 00:58:36.000 |
sigmas and the pis and all that, you know, subscripts. I know, right? This is exactly 00:58:40.560 |
what Iverson wanted. Yeah, exactly. I mean, who wants this, you know, why should capital pi be 00:58:47.440 |
product and capital sigma be sums? Like, you know, we did plus slash and it's like, okay, 00:58:54.320 |
how do we do product? They're like, oh, it's obviously times slash. And I show them backslash, 00:58:58.000 |
it's like, how do we do a cumulative product? And so it's obviously times backslash. Yeah, 00:59:02.960 |
this stuff. But, you know, a large group of adults can't handle this, because I'll put stuff 00:59:09.040 |
on Twitter. I'll be like, here's a cool thing in APL. And like half the replies will be like, 00:59:13.440 |
well, that's line noise. That's not intuitive. It's like, how can you say that? It's this classic 00:59:21.280 |
thing that I've always said: it's the difference between, is it that you don't 00:59:25.520 |
understand it, or is it that it's hard? And, you know, kids don't know; for kids, everything's new. 00:59:32.720 |
So that, you know, they see something they've never seen before. They're just like, teach me 00:59:36.640 |
that. Whereas adults, or at least a good chunk of adults, are just like, I don't immediately understand 00:59:42.000 |
that. Therefore, it's too hard for me. Therefore, I'm gonna belittle the very idea of the thing. 00:59:47.760 |
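(For the NumPy-minded, a sketch of the reductions and scans Jeremy describes, including the sigma-of-i-squared example from earlier; nothing here beyond plain NumPy:)

    import numpy as np

    v = np.arange(1, 6)   # APL: iota 5 -> 1 2 3 4 5
    v.sum()               # +/v, plus-reduce   -> 15
    v.prod()              # ×/v, times-reduce  -> 120
    np.cumsum(v)          # +\v, plus-scan     -> [ 1  3  6 10 15]
    np.cumprod(v)         # ×\v, times-scan    -> [  1   2   6  24 120]
    (v ** 2).sum()        # +/(iota 5)*2, sigma i=1..5 of i squared -> 55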
I did a tacit program, a one-liner, on APL Farm the other day. And somebody said, 00:59:54.160 |
that looks like Greek to me. I said, well, Greek looks like Greek to me, because I don't know Greek. 00:59:58.640 |
I mean, sure. If you don't know it, absolutely, it looks silly. But if you know it, then it's, 01:00:04.480 |
it's not that hard. Yeah, I will say like, you know, a lot of people have put a lot of hard work into 01:00:12.160 |
resources for APL and J teaching. But I think there's still a long way to go. And one of the 01:00:20.400 |
challenges is, it's like when I was learning Chinese: I really liked the idea of 01:00:26.240 |
learning new Chinese words by looking them up in a Chinese dictionary. But of course, I didn't know 01:00:31.280 |
what the characters in the dictionary meant. So I couldn't look them up. So when I learned Chinese, 01:00:35.840 |
I really spent the first 18 months just focused on learning characters. So I got through 6000 01:00:41.760 |
characters in 18 months of very hard work. And then I could start looking things up in 01:00:47.040 |
the dictionary. My hope is to do a similar thing for APL, like for these study groups, 01:00:53.120 |
I want to try to find a way to introduce every glyph in an order that never refers 01:01:01.040 |
to glyphs you haven't learned yet. Like that's something I don't feel like we really have. And 01:01:05.280 |
so that then you can look up stuff in the Dyalog documentation. Because now, still, I don't know 01:01:11.520 |
that many glyphs. So like most of the stuff in the documentation, I don't understand because it 01:01:17.680 |
explains glyphs using glyphs I don't yet know. And then I look those up, and those, in turn, 01:01:21.840 |
explain things with glyphs I don't yet know. So, you know, step one for me is I think we're just 01:01:27.120 |
going to go through and try to teach what every glyph is. And then I feel like we should be able 01:01:32.480 |
to study this better together, because then we could actually read the documentation. You know, 01:01:38.080 |
will you publish these sessions online? Yeah, so the study group will be recorded as videos. 01:01:45.840 |
But I also then want to actually create, you know, written materials using Jupyter, 01:01:52.320 |
which I will then publish. That's my goal. So what you said very much resonates with me, 01:01:58.720 |
that when teaching people, I often find myself in this bind: that to explain everything, 01:02:06.240 |
I need to already have everything explained. And I think, especially, it comes down to, 01:02:12.960 |
in order to explain what many of these glyphs are doing, I need some fancy arrays. If I restrict 01:02:18.400 |
myself to simple vectors and scalars, then I can't really show their power. And I cannot create these 01:02:24.800 |
higher rank arrays without already using those glyphs. And so, hopefully... there is this long running 01:02:30.960 |
project, since like 2015 I think it is, to add a literal array notation to APL. 01:02:37.840 |
And then there is a way in, then you can start by looking at an array, and then you can start 01:02:45.280 |
manipulating and see the effects of the glyphs and intuit from there what they do. 01:02:49.680 |
Yeah, no, I think that'll be very, very helpful. And in the meantime, you know, 01:02:54.160 |
my approach with the kids has just been to teach rho quite early on. So rho is the equivalent of 01:03:00.560 |
reshape in NumPy and most Python libraries. And yeah, so once you know how to reshape, 01:03:09.440 |
you can start with a vector and shape it to anything you like. And it's, you know, 01:03:13.120 |
it's not a difficult concept to understand. So I think that yeah, basically, the trick at the 01:03:17.200 |
moment is just to say, okay, in our learning of the dictionary of APL, one of the first things 01:03:22.240 |
we will learn is rho. And that was really fun with the kids, doing monadic rho, you know, 01:03:29.760 |
to be like, okay, well, what's rho of this? What's rho of that? And okay, what's rho of rho of this? 01:03:34.880 |
And then what's rho of rho of rho, which then led me to the Stallman poem about, 01:03:44.240 |
what is it, rho, rho, rho is one, etc, etc, which they loved as well. 01:03:52.160 |
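(The same game in NumPy terms, for readers without an APL handy; a sketch only:)

    import numpy as np

    m = np.arange(1, 13).reshape(3, 4)   # dyadic rho: 3 4 rho iota 12
    np.shape(m)                          # monadic rho: rho m -> (3, 4)
    np.shape(np.shape(m))                # rho rho m -> (2,), the rank
    np.shape(np.shape(np.shape(m)))      # rho rho rho m -> (1,): always one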
Yeah, we'll link that in the show notes. Also, too, while you were saying all that, 01:03:56.480 |
that really resonated with me. When I first started learning APL, like, one of the first 01:04:03.200 |
things that happened was, I was like, okay, you can, you can fold, you can map. So like, 01:04:08.640 |
how do you filter, you know, what are the classic, you know, three functional things? And the problem 01:04:13.120 |
with APL and array languages is they don't have an equivalent filter that takes a predicate function, 01:04:18.240 |
they have a filter that is called compress that takes a mask that, you know, drops anything that 01:04:23.520 |
corresponds to a zero. And it wasn't until a few months later that I ended up discovering it. But 01:04:28.240 |
for both APL and the newer APL, BQN, there's these two sites. Adám was the one that wrote the APL one, 01:04:35.440 |
aplcart.info, and bqncrate.info also, I think. And so you can basically 01:04:41.200 |
semantically search for what you're trying to do. And it'll give you small expressions that do that. 01:04:46.560 |
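(A sketch of the mask-based compress Connor describes, next to the predicate-based filter most languages use; plain NumPy and the standard library:)

    import numpy as np

    v = np.array([3, 1, 4, 1, 5, 9])
    mask = v > 2              # a boolean mask: [ True False  True False  True  True]
    v[mask]                   # compress, (v>2)/v in APL -> [3 4 5 9]

    # the predicate style, for contrast:
    list(filter(lambda x: x > 2, [3, 1, 4, 1, 5, 9]))   # [3, 4, 5, 9]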
So if you type in the word filter, which is what you would call it coming from, you know, 01:04:52.000 |
a functional language, or even I think Python calls it filter, you can get a list of small 01:04:57.440 |
expressions. And really, really often, sometimes you need to know the exact thing that it's called, 01:05:03.280 |
like one time I was searching for, you know, all the combinations or permutations. And really, 01:05:07.280 |
what I was looking for was power set. And so until you have that, you know, the word power set, 01:05:11.920 |
it's, you know, it's a fuzzy search, right? So but it's still a very, very useful tool when it's like 01:05:18.000 |
you said, you're trying to learn something like Chinese. And it's like, well, where do I even 01:05:21.040 |
start; I don't know the language, the words to search for. But yeah, I agree 01:05:29.520 |
that there's a lot of room for improvement in how to onboard people without them immediately going, 01:05:35.520 |
like you said, this looks like hieroglyphics, which I think Iverson considered a compliment, 01:05:39.760 |
like there's some anecdote I've heard where someone was like, this is hieroglyphics. And he says, 01:05:42.960 |
yes, exactly. And then the other thing like that I want to do is help in particular Python programmers 01:05:52.080 |
and maybe also do something for JavaScript programmers, which are the two most popular 01:05:55.680 |
languages, like at the moment, like a lot of the tutorials for stuff like J or whatever, 01:06:01.680 |
like J for C programmers, you know, great book, but most people aren't C programmers. And also 01:06:07.680 |
a lot of the stuff like, you know, it'd be so much easier if somebody just like said to me early on, 01:06:14.000 |
oh, you know, it's just the same as partial in Python. Or it's like, you know, putting things 01:06:23.040 |
in a box: what the hell's a box? If somebody basically said, oh, it's basically the same 01:06:26.400 |
as a reference, it's like, oh, okay. You know, I think in one of your podcasts, somebody said, 01:06:30.720 |
oh, it's like void stars. Oh, yeah, okay. You know, there's kind of this lack of just saying, 01:06:36.160 |
like, this is actually the same thing as in Python and JavaScript. So I do want to do some kind of 01:06:42.320 |
yeah, mapping, yeah, like that, particularly for kind of NumPy programmers and stuff, because a 01:06:50.080 |
lot of it's so extremely similar. Be nice to kind of say like, okay, well, this is, you know, J 01:06:56.960 |
maps things over leading axes, which is exactly the same as NumPy, except it doesn't have trailing 01:07:02.240 |
axes. So if you know the NumPy rules, you basically know the J rules. Yeah, I think at the 01:07:09.520 |
basic level, you're absolutely right. And that would certainly be really useful. When we've 01:07:14.080 |
talked this over before, some of the challenges are in the flavors and the details. If you send 01:07:21.040 |
somebody down the wrong road with a metaphor that almost works in some of these areas, it can really 01:07:26.560 |
be challenging for them, because they see it, you know, through the lens of their experience. 01:07:33.520 |
But that lens would say, in this area, it works one way, when it actually works differently. So there is a 01:07:39.760 |
challenge in that. And we find it even between APL, BQN and J. I'm trying to think of what we were 01:07:46.240 |
talking about. Oh, it was transpose: the languages' dyadic transposes, they 01:07:51.600 |
handle them differently. Functionally, you can do the same things, but you have to be 01:07:56.400 |
aware that they are going to do it differently, according to the language. Absolutely. But that's 01:08:01.520 |
not a reason to throw out the analogy, right? Like, I think everybody agrees that it's easier for 01:08:06.480 |
an APL programmer to learn J, than for a C or JavaScript programmer to learn J, you know, 01:08:14.800 |
because there are some ideas you understand. And you can actually say to people like, okay, well, 01:08:19.760 |
this is the rank conjunction in J. And you may recognize this as being like the rank, you know, 01:08:24.320 |
operator in APL. So if we can do something like that and say like, oh, well, okay, this would do 01:08:29.840 |
the same thing as, you know, dot permute, dot blah in PyTorch. It's like, okay, I see it. 01:08:40.000 |
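(A rough NumPy illustration of those two analogies, leading-axis mapping and permute; a sketch, not a full account of either language's rank rules:)

    import numpy as np

    a = np.arange(24).reshape(2, 3, 4)
    a.sum(axis=0).shape          # reduce over the leading axis -> (3, 4)
    a.transpose(2, 0, 1).shape   # like x.permute(2, 0, 1) on a PyTorch tensor -> (4, 2, 3)
    # One caveat behind the analogy: J's rank machinery frames arrays from the
    # leading axes, while NumPy broadcasting aligns shapes from the trailing axes.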
Well, as the maintainer of APLcart, I'd like to throw in a little call to the listeners. Like 01:08:45.920 |
what Connor mentioned, I do fairly often get people saying, well, I couldn't find this, and I 01:08:51.200 |
ask them, what did you search for? So do let me know, contact me by whatever means, if you 01:08:56.080 |
couldn't find something, either because it's altogether missing, and I might be able to add it, 01:08:59.840 |
or tell me what you searched for and couldn't find, or maybe you found it later by searching for 01:09:04.720 |
something else. And I'll add those keywords for future users. And I have put in a lot of like 01:09:11.200 |
function names from other programming languages so that you can search for those and find the 01:09:15.920 |
APL equivalent. Yeah, I will say, I feel like either I'm not smart enough to use aplcart.info, 01:09:24.320 |
or I haven't got the right tutorial yet. Because I, I went there, I've been there a few times. 01:09:30.240 |
And there's this, like, whole lot of impressive looking stuff. And I just, I don't know 01:09:36.400 |
what to do with it. And then I sometimes click things and it sends me over to this tio.run that 01:09:40.880 |
tells me, like, real time 0.02 seconds, code... like, I find it, you know... I 01:09:50.080 |
don't yet know how to use it. And so, you know, I guess, hearing you guys say 01:09:57.520 |
this is a really useful tool that a lot of people put a lot of time into, I should obviously invest 01:10:02.160 |
time learning how to use it. And maybe after doing that, I should explain to people how to use it. 01:10:07.840 |
I do have a video on it. And there's also a little question mark icon one can click on to get help. 01:10:12.960 |
I have tried the question mark icon as well. As I say, it might just be me; you know, I think this often 01:10:21.120 |
happens with APL stuff. I often hit things and I feel like maybe I'm not smart enough to understand 01:10:25.840 |
this. Clearly, don't think that; we'd disagree. Yeah, I do recall you saying a few minutes ago 01:10:37.440 |
that you managed to teach your, you know, four year old daughter, like, grade 12, or age 12, algebra. 01:10:43.200 |
No, I didn't. I just gave her the app, right? It's like it's I've heard other parents have given it 01:10:49.600 |
to their kids. They all seem to handle it. It's it's just this fun game where you hatch dragon eggs 01:10:54.240 |
by like dragging things around on the iPad screen. And it just it so happens that the things you're 01:10:59.120 |
doing with dragon's eggs are the rules of algebra. And after a while, it starts to switch out some of 01:11:06.320 |
the like monsters with symbols like x and y, you know, and it does it gradually, gradually. And at 01:11:12.400 |
the end, it's like, oh, now you're doing algebra. So I can't get any credit for that. That's 01:11:17.600 |
some very, very clever people wrote a very cool thing. It really is an amazing program. I homeschooled 01:11:23.120 |
my son as well. And we used that for algebra. Great. Yeah, it was a bit more age appropriate, 01:11:28.160 |
but I looked at that and said, that really is well put together. It's an amazing 01:11:35.120 |
program. I will say there'll be a DragonBox APL one day. It's not a bad idea. Not a bad idea at all. 01:11:43.920 |
I was going to say when you're teaching somebody, one of the big challenges when you're sort of 01:11:47.360 |
trying to get a language across to a general audience is who is the audience? Because as you 01:11:53.440 |
say, if you're if you're dealing with kids or people who haven't been exposed to programming 01:11:58.640 |
before, that's a very different audience than somebody who might have been exposed to some other 01:12:03.600 |
type of programming. Functional programming is a bit closer, but if you're a procedural programmer 01:12:08.480 |
or imperative programmer, it's going to be a stretch to try and bend your mind in the different 01:12:13.120 |
ways that, you know, APL or J or BQN expect you to think about things. Yeah, I think the huge rise 01:12:20.800 |
of functional programming is very helpful for coming to array programming, you know, 01:12:26.400 |
both in JavaScript and in Python. It's, you know, I think most people are doing stuff, 01:12:34.240 |
particularly in the machine learning and deep learning world, are doing a lot of functional 01:12:38.480 |
stuff; often that's the only way you can do things, particularly in deep learning. So I think, yeah, 01:12:44.240 |
I think that does help a lot. Like, like Connor said, like you've probably come across, you know, 01:12:49.360 |
map and reduce and filter and certainly in Python, you'll have done list comprehensions and dictionary 01:12:56.880 |
comprehensions. And a lot of people have done SQL. So it's, yeah, I think a lot of people come into it 01:13:04.720 |
with some relevant analogies, if we can help connect for them. Yeah, one of the things that, 01:13:12.720 |
you know, this really is reinforcing my idea that, or it's not my idea, I think it's just an idea 01:13:19.840 |
that multiple people have had, but the tool doesn't exist yet. Because we'll link to some 01:13:25.760 |
documentation that I use frequently when I'm going sometimes between APL and J on the BQN website, 01:13:31.280 |
they have BQN to dialogue APL dictionaries and BQN to J dictionaries. So sometimes I'll like, 01:13:38.320 |
if I'm trying to convert between the two, the BQN docs are so good. I'll just use BQN as like an 01:13:43.040 |
IR to go back and forth. But I've mentioned on previous podcasts that really what would be amazing 01:13:48.480 |
and it would only work to a certain extent is something like a multidirectional array language 01:13:55.040 |
transpiler and adding NumPy to that list would probably be, you know, a huge, I don't know what 01:14:00.960 |
the word for it is, but beneficial for the array community. If you can type in some NumPy expression, 01:14:06.240 |
you know, like I said, it's only gonna work to an extent, but for simple, you know, rank one vectors 01:14:10.960 |
or arrays that you're just reversing and summing and doing simple, you know, reduction and scan 01:14:15.840 |
operations, you could translate that pretty easily into APL, J and BQN. And I think that would 01:14:22.560 |
make it so much easier for people to understand, aka the hieroglyphics or the Greek or the Chinese 01:14:28.080 |
or whatever metaphor you want to use. Because yeah, this is, it is definitely challenging at times 01:14:34.640 |
to get to a certain point where you have enough info to keep the snowball rolling, if you will. 01:14:39.760 |
And it's very easy to hit a wall early on. Yeah. That's a project I've been thinking about is 01:14:47.280 |
basically rewrite NumPy in APL. It doesn't seem like a whole lot of work: you just take all those 01:14:55.600 |
names that are available in NumPy and just define them as APL functions. And people can explore that 01:15:00.480 |
by opening them up and seeing how they're defined. Oh, so not actually... you're saying, like, 01:15:07.120 |
it wouldn't be a new thing. You're just saying, like, rename the symbols to what they're known as 01:15:12.560 |
in NumPy, so that you'd still be in, like, an APL. Yeah. I mean, you could use it as a library, 01:15:19.440 |
but I was thinking of it more as an interactive exploring type thing, where you open up this 01:15:24.320 |
library and then you write the name of some NumPy functionality and open it up in the 01:15:34.640 |
editor and see, well, how is this defined in APL? And then you could use it obviously, since it's 01:15:40.320 |
defined. Interesting. Then you could slowly, you could use these library functions. And then as 01:15:47.360 |
you get better at APL, you can start actually writing out the raw APL instead of using these 01:15:52.080 |
covers for it. Well, I guess, Jeremy, that's interesting. Do you think that, because you've 01:15:57.680 |
mentioned about sort of the notation versus the programming language and where do you think the, 01:16:04.400 |
like, in your dream scenario, are you actually coding in sort of an Iversonian like notation? 01:16:11.280 |
Or is it at the end of the day, does it still look like NumPy, but it's just all of the expressivity 01:16:19.280 |
and power that you have in the language like APL is brought to and combined with what NumPy 01:16:25.600 |
sort of currently looks like? I mean, well, it'd be a bit of a combination, Connor, in that, like, 01:16:30.400 |
you know, my classes and my type dispatch and my packaging and, you know, all the, you know, 01:16:40.800 |
my function definitions and whatever, that's Python. But, you know, everywhere I can use 01:16:49.040 |
plus and times and divide and whatever, I could also use any APL glyph. And so it'd be, you know, 01:16:59.760 |
basically an embedded DSL for kind of high dimensional notation. It would work automatically 01:17:09.360 |
on NumPy arrays and TensorFlow tensors and PyTorch tensors. I mean, one thing that's interesting is, 01:17:16.480 |
to a large degree, APL and PyTorch and friends have actually arrived at a similar place 01:17:27.120 |
with the same, you know, grandparents, which is, Iverson actually said his inspiration 01:17:36.640 |
for some of the APL ideas was tensor analysis. And a lot of the folks, as you can gather from 01:17:43.440 |
the fact that in PyTorch, we don't call them arrays, we call them tensors. A lot of the folks 01:17:47.280 |
working on deep learning, their inspiration was also from tensor analysis. So it comes from 01:17:51.600 |
physics, right? And so I would say, you know, a lot more folks who have worked on PyTorch were 01:17:57.280 |
familiar with tensor analysis and physics than were familiar with APL. And then, of course, 01:18:03.680 |
there's been other notations, like explicitly based on Einstein notation, there's a thing 01:18:10.080 |
called einops, which, it's a very interesting kind of approach of taking Einstein 01:18:15.280 |
notation much further. And like Einstein notation, if you think about it, is the kind of the loop 01:18:21.120 |
free programming of math, right? The equivalent of loops in math is indices. And Einstein notation 01:18:28.240 |
does away with indices. And so that's why stuff like einops is incredibly powerful, because you can 01:18:33.840 |
write, you know, an expression in INOPS with no indices and no loops. And it's all implicit 01:18:42.160 |
reductions and implicit loops. I guess, yeah, my ideal thing would be, we wouldn't have to use einops, 01:18:49.520 |
we can use APL, you know, and it wouldn't be embedded in a string. They would actually be 01:18:55.680 |
operators. Yeah, that's what it is. They'd be operators in the language. The Python operators 01:19:00.160 |
would not just be plus, times, minus, slash; all the APL glyphs would be Python 01:19:12.320 |
operators. And they would work on all Python data types, including all the different tensor and 01:19:18.000 |
array data types. Interesting. Yeah. So it sounds like you're describing a kind of hybrid language. 01:19:24.960 |
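(For reference, the einops style mentioned above looks roughly like this; the shapes are made up for illustration:)

    import numpy as np
    from einops import rearrange, reduce   # pip install einops

    x = np.zeros((8, 3, 32, 32))                 # batch, channels, height, width
    reduce(x, 'b c h w -> b c', 'mean').shape    # implicit loops and reduction -> (8, 3)
    rearrange(x, 'b c h w -> b (h w) c').shape   # reindex without writing an index -> (8, 1024, 3)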
JavaScript too. I would love the whole DSL to be in JavaScript as well. You know, 01:19:28.720 |
that'd be great. And I feel like I saw that somewhere. I feel like I saw somebody actually 01:19:34.640 |
do an ECMAScript, you know, RFC with an implementation. Yeah, it was an April Fools joke. 01:19:44.240 |
Yeah, but it actually worked, didn't it? Like, it's just there was actually an implementation. 01:19:48.800 |
I don't think they had the implementation. It was just very, very well-specced. It could 01:19:54.480 |
actually work kind of thing. No, I definitely read the code. I don't know how complete it was, 01:19:59.920 |
but there was definitely some code there. I can't find it again. If you know where it is. 01:20:04.080 |
There's a JavaScript implementation of APL by Nick Nickolov. But my problem with it, 01:20:12.480 |
it's not tightly enough connected with the underlying JavaScript. 01:20:17.280 |
It should be an April Fools joke, shouldn't it? It's like, Gmail was an April Fools joke, 01:20:24.000 |
right? Gmail came out on April 1st and totally destroyed my plans for FastMail because it was 01:20:29.440 |
an April Fools joke that was real. And Flask, you know, the Flask library, I think, was originally 01:20:35.040 |
an April Fools joke. We shouldn't be using frameworks because I created a framework that's 01:20:40.480 |
so stupidly small that it shouldn't be a framework. And now that's the most popular web framework in 01:20:45.120 |
Python. So, yeah, maybe this should be an April Fools joke that becomes real. 01:20:52.000 |
How close? This is maybe an odd question, but because from what I know about Julia, 01:20:56.800 |
you can define your own Unicode operators. And I did try at one point to create a small 01:21:05.760 |
composition of two different symbols, you know, square root and reverse or something, 01:21:11.600 |
and it ended up not working and asking me for parentheses. But do you think Julia could evolve 01:21:17.360 |
to be that kind of hybrid language? Maybe. I'm actually doing a keynote at JuliaCon in a couple 01:21:26.320 |
of weeks, so maybe I should raise that. Just at the Q&A section, say, any questions? But first, 01:21:34.800 |
I've got one for the community at large. Here's what I'd like. I think my whole talk is going to 01:21:38.880 |
be kind of like what Julia needs to be, you know, to move to the next level. I'm not sure I can 01:21:45.840 |
demand that a complete APL implementation is that thing, but I could certainly put it out there as 01:21:50.320 |
something to consider. It always bothers me, though, that if you try to extend those languages 01:21:57.200 |
like this or you could do some kind of pre-compiler for it, then their order of execution ends up 01:22:05.440 |
messing up APL. I think APL very much depends on having a strict one-directional order of functions, 01:22:12.800 |
otherwise it's hopeless to keep track of. That is a big challenge because currently 01:22:18.880 |
the DSL inside Python, which is the basic mathematical operations, does have the BODMAS 01:22:26.720 |
or PEMDAS order of operations. So there would need to be some way. So in Python, that wouldn't be 01:22:32.960 |
too hard, actually, because in Python, you can opt into different kinds of parsing by adding a 01:22:42.320 |
from __future__ import whatever. You could have a from __future__ import APL precedence. 01:22:49.360 |
And then from then on, everything in your file is going to use right-to-left precedence. 01:22:54.480 |
That's really interesting and cool. I didn't know that. 01:23:00.400 |
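(For the record: the APL-precedence future is Jeremy's hypothetical; no such flag exists today. But the mechanism is real, and a minimal sketch of an existing per-file flag shows how it works:)

    # __future__ imports change how this one file is compiled.
    from __future__ import annotations   # annotations stop being evaluated at def time

    def f(x: NotDefinedAnywhere) -> AlsoNotDefined:   # fine under the future import
        return x + 1

    print(f(41))   # 42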
Yeah, that's awesome. I've been spending a lot of time thinking about 01:23:08.240 |
function precedence and just the differences in different languages. I'm not sure if any other 01:23:13.760 |
languages have this, but something that I find very curious about BQN and APL is that they have 01:23:19.920 |
functions, basically, that have higher precedence than other functions. So operators in APL, and 01:23:27.920 |
conjunctions and adverbs in J, have higher precedence than your regular functions that apply to arrays. 01:23:35.200 |
I'm simplifying a tiny bit, but this idea that in Haskell, function application always has 01:23:40.880 |
the highest precedence. You can never get anything that has a higher function precedence than that. 01:23:46.080 |
And it always, having stumbled into the array world now, it seems like a very powerful thing 01:23:51.360 |
that these combinator-like functions don't have just by default the higher precedence. Because if 01:23:56.160 |
you have a fold or a scan or a map, you're always combining that with some kind of binary operation 01:24:01.760 |
or unary operation to create another function that you're then going to eventually apply to 01:24:05.600 |
something. But the basic right to left, putting aside the higher order functions or operators, 01:24:15.840 |
as they're known in APL, the basic right to left path, again, for teaching and for my own brain, 01:24:22.240 |
gosh, that's so much nicer than in C++. Oh my God, the operator precedence there; 01:24:30.320 |
there's no way I can ever remember that. And there's a good chance when I'm reading somebody 01:24:34.800 |
else's code that they haven't used parentheses because they didn't really need them and that I 01:24:40.160 |
have no idea where they have to go and then I have to go and look it up. It's another of these things 01:24:45.200 |
that with the kids, I'm like, okay, you remember that stuff we spent ages on about like, first you 01:24:51.040 |
do exponents and then you do times. It's like, okay, you don't have to do any of that in APL. 01:24:56.160 |
You just go right to left and they're just like, oh, that's so much better. 01:24:59.680 |
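(A toy sketch of that evaluation rule, to make the difference concrete; eval_rtl is a made-up helper handling only flat, space-separated dyadic expressions, with no parentheses or monadic functions:)

    import operator

    OPS = {'+': operator.add, '-': operator.sub, '×': operator.mul, '÷': operator.truediv}

    def eval_rtl(expr):
        """Evaluate a flat expression strictly right to left, APL style."""
        toks = expr.split()
        value = float(toks[-1])
        for i in range(len(toks) - 2, 0, -2):   # walk leftward over operator/operand pairs
            value = OPS[toks[i]](float(toks[i - 1]), value)
        return value

    eval_rtl('2 × 3 + 4')   # 14.0 -- APL reads it as 2 × (3 + 4)
    2 * 3 + 4               # 10   -- school precedence multiplies first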
This literally came up at work like a month ago, where I was giving this mini APL talk; we had 10 minutes 01:25:07.440 |
at the end of a meeting, and then I just made this offhand remark that, of course, the evaluation 01:25:11.680 |
order in APL is a much simpler model than what we learned in school. And I upset them; there were, 01:25:16.960 |
I don't know, 20 people in the meeting, and it was the most controversial thing I had said. 01:25:23.200 |
I almost had like an out of body experience because I thought I was saying something that 01:25:27.360 |
was like objectively just true. And then I was like, wait a second, what am I clearly missing, 01:25:32.720 |
like, is there something? Yeah, well, you were wrong. Like, how do you communicate? No, I mean, 01:25:36.480 |
most adults are incapable of, like, new ideas. That's what I should have said 01:25:44.240 |
in the meeting. I mean, this is a reason, another reason I like doing things like 01:25:50.400 |
APL study groups, because it's a way of like self-selecting that small group of humanity who's 01:25:55.280 |
actually interested in trying new things, despite the fact that they're grownups, and then try to 01:25:59.920 |
surround myself with those people in my life. But isn't it sad then? I mean, what has happened 01:26:04.560 |
to those grownups? Like when you mentioned teaching these people and trying to like, 01:26:08.240 |
map their existing knowledge onto APL things, what does it mean to box and so on? I find that 01:26:12.640 |
to children and non-programmers, expanding their array model and how the functions are applied and 01:26:19.120 |
so on, is almost trivial. Meets no resistance at all. And it's all those adults that have either 01:26:26.560 |
learned their primitives or BODMAS or whatever the rules are, and all the computer 01:26:31.440 |
science people that know their precedence tables and their lists of lists and so on. 01:26:36.000 |
Those are the ones that are really, really struggling. It's not just resisting. They're 01:26:40.800 |
clearly struggling. They're really trying, and it's a lot of effort. So there is actually, 01:26:47.520 |
I mean, that is a known thing in educational research. So yeah, I mean, so I spent months 01:26:55.120 |
earlier this year and late last year reading every paper I could about, you know, education, 01:27:02.720 |
because I thought if I'm going to be homeschooling, then I should try to know what I'm doing. 01:27:06.240 |
And yeah, what you describe, Adám, is absolutely a thing, which is that the, 01:27:12.640 |
you know, the research shows that trying, you know, when you've got a, you know, an existing idea, 01:27:18.480 |
which is an incorrect understanding of something, and you're trying to replace it with a correct 01:27:23.200 |
understanding, that is much harder than learning the correct version directly. So which is obviously 01:27:31.520 |
a challenge when you think about analogies and analogy has to be good enough to lead directly 01:27:38.160 |
to the, to the correct version. But I think, you know, the important thing is to find the people 01:27:43.040 |
who are who have the curiosity and tenacity to be prepared to go over that hurdle, even though it's 01:27:50.480 |
difficult, you know, because yeah, it is like, that's just, that's just how human brains are. 01:27:55.920 |
So so be it, you know. Yeah, unlearning is really hard work, actually. And if you think about it, 01:28:02.240 |
it probably should be because you spend a lot of time and energy to put some kind of a pattern 01:28:06.720 |
into your brain. Right. You don't want to have that evaporate very quickly. Right. And our, 01:28:12.080 |
you know, myelination occurs around, what, like ages eight to 12 or something. So like our brains 01:28:17.520 |
are literally trying to stop us from having to learn new things, because our brains think that 01:28:23.760 |
they've got stuff sorted out at that point. And so they should focus on keeping long term memories 01:28:27.680 |
around. So yeah, it does become harder. But, you know, a little bit, it's still totally doable. 01:28:34.640 |
The solution is obvious. Teach APL in primary school. 01:28:37.200 |
That's what I'm doing. What was the word you mentioned? Myelination? 01:28:43.520 |
Myelination. M-Y-E-L-I-N-A-T-I-O-N. Interesting. I'd not heard that one before. 01:28:51.680 |
So it's a physical coating that I can't remember goes on the dendrites. 01:28:57.760 |
That sounds right. These fat layers or cholesterol layers. I never took any biology courses in my 01:29:06.800 |
education. So clearly, I've missed out on that aspect. You myelinated anyway. 01:29:18.480 |
You also mentioned the word tenacity, Jeremy. Yeah. 01:29:24.240 |
And I was watching an interview with Sanyam Bhutani. 01:29:29.600 |
And you were talking about, because it sounds like you spotted at an early point in his 01:29:38.400 |
working with Kaggle that he was somebody probably different. And the thing you said 01:29:41.840 |
was that tenacity to keep working at something. Yeah. 01:29:45.760 |
I think that's a really important part about educating people 01:29:49.440 |
that they shouldn't necessarily expect learning something new to be easy. 01:29:55.520 |
Oh, yeah. I mean, I really noticed that when I started learning Chinese. 01:30:00.240 |
Like I went to, you know, just some local class in Melbourne. 01:30:08.480 |
And everybody was very, very enthusiastic, you know, and everybody was going to learn Chinese. 01:30:14.560 |
And we all talked about the things we were going to do. 01:30:19.920 |
And yeah, each week, there'd be fewer and fewer people there. 01:30:22.800 |
And, you know, I kind of tried to keep in touch with them. 01:30:26.160 |
But after a year, every single other person had given up and I was the only one still doing it. 01:30:32.240 |
You know, so then after a couple of years, people would be like, 01:30:34.320 |
wow, you're so smart, you learned Chinese. And it's like, no, man. 01:30:39.440 |
Like during those first few weeks, I was pretty sure I was learning more slowly than the other 01:30:45.120 |
students. But everybody else stopped doing it. So of course, they didn't learn Chinese. 01:30:51.680 |
And I don't know what the trick is, because, yeah, it's the same thing with, you know, 01:30:56.080 |
like the fast.ai courses; they're really designed to keep people interested and get people doing 01:31:01.840 |
fun stuff from day one. And, you know, still, I'd say most people drop out, and the ones that 01:31:09.120 |
don't I would say most of them end up becoming like actual world class practitioners and they, 01:31:16.480 |
you know, build new products and startups and whatever else. And people will be like, 01:31:20.480 |
oh, I wish I knew neural nets and deep learning. It's like, okay, here's the course. 01:31:25.440 |
Just just do it and don't give up. But yeah, I don't know tenacity. 01:31:31.440 |
It's not a very common virtue, I think, for some reason. 01:31:36.960 |
It's something I've heard, I think it's Jo Boaler at Stanford, talk about the growth mindset. 01:31:41.840 |
And I think that is something that, for whatever reason, some people tend to, and maybe it's 01:31:47.280 |
myelination, at those ages, you start to get that mindset where you're not so concerned about 01:31:53.600 |
having something happen that's easy to do well. But just the fact that if you keep working at it, 01:31:59.600 |
you will get it. And not everybody, I guess, is maybe put in the situations that they 01:32:05.760 |
get that feedback that tells you if I keep trying this, I'll get it. If it's not easy, they stop. 01:32:11.360 |
Yeah, I mean, that area of growth mindset is a very controversial idea in education. 01:32:18.800 |
Specifically the question of can you modify it? And I think it's certainly pretty well established 01:32:27.840 |
to this point that the kind of stuff that schools have tended to do, which is put posters up around 01:32:32.480 |
the place saying like, you know, make things a learning opportunity or don't give up, like they 01:32:37.680 |
do nothing at all. You know, with my daughter, we do all kinds of stuff around this. So we've 01:32:46.640 |
actually invented a whole family of clams. And as you can imagine, clams don't have a growth mindset, 01:32:52.960 |
they tend to sit on the bottom of the ocean, not moving. And so the family of clams that we 01:33:00.400 |
invented that we live with, you know, always at every point that we're going to have to like learn 01:33:05.360 |
something new or try something new, always start screaming and don't want to have anything to do 01:33:10.880 |
with it. And, you know, so we actually have Claire telling the clams how it's going to be okay. And, 01:33:17.280 |
you know, it's actually a good thing to learn new things. And so we're trying stuff like that to try 01:33:22.480 |
to like have have imaginary creatures that don't have a growth mindset and for her to realize how 01:33:29.520 |
how silly that is, which is fun. And the things that you were talking about in terms of the 01:33:34.960 |
meta-mathematics, you didn't say, oh, the successor, this is what plus is. You said, 01:33:40.320 |
how would you use this? How would you start to put it together yourselves? 01:33:46.720 |
Which to me, that's the growth mindset, that if you... Yeah, you're creating that. But then like, 01:33:52.240 |
you know, gosh, you're getting to all the most controversial things in education here, Bob, 01:33:56.720 |
because that's the other big one is discovery learning. So this idea of having kids explore and 01:34:04.160 |
find. It's also controversial, because it turns out that actually the best way to have people 01:34:11.280 |
understand something is to give them a good explanation. So it is important that 01:34:17.040 |
you combine this, like, okay, how would you do this? with, like, okay, let me just tell you 01:34:23.200 |
what this is and why. It's easier for homeschooling with two kids, because I can make sure 01:34:28.560 |
their exploration is short, and correct. You know, if you spend a whole class, you know, 01:34:36.720 |
50 minutes doing totally the wrong thing, then you end up with these really incorrect 01:34:42.640 |
understandings, which you then have to kind of deprogram. So yeah, education's hard, you know. 01:34:51.040 |
And I think a lot of people look for these simple shortcuts, and they don't really exist. So you 01:35:00.640 |
actually have to have good, good explanations and good problem solving methods and yeah, 01:35:10.320 |
all this stuff. That's a really interesting area, the notation and the tools. Yeah, and you know, 01:35:17.280 |
notation, I mean, so I do a live coding, you know, video thing every day with a bunch of folks. And 01:35:28.320 |
in the most recent one, we started talking about APL, why we're going to be doing APL this week 01:35:35.360 |
instead. And, you know, somebody actually said like, oh, my God, is it going to be like 01:35:40.560 |
regexes? And, you know, I kind of said like, okay, so regexes are a notation for doing stuff. And we 01:35:49.760 |
spent an hour solving the problem with regexes. And oh, my God, it was such a powerful tool for 01:35:59.680 |
this problem. And you know, by the end of it, they were all like, okay, we want to like deeply 01:36:04.080 |
study regexes. And obviously, that's a much less flexible and powerful tool notation than APL. 01:36:12.560 |
But you know, we kind of talked about how once you start understanding these notations, you can build 01:36:19.680 |
things on top of them. And then you kind of create these abstractions. And that's yeah, notation is 01:36:26.720 |
how, you know, deep human thought kind of progresses, right, in a lot of ways. So, you know, it's like, 01:36:37.840 |
I actually spoke to a math professor friend a couple of months ago about, you know, my renewed 01:36:42.800 |
interest in APL. And he was like, and I kind of sent him some, I can't remember what it was, 01:36:48.480 |
maybe doing the golden ratio or something, little snippet, and he was just like, 01:36:53.840 |
yeah, something like that looks like Greek to me, I don't understand that. It's like, 01:36:57.280 |
dude, you're a math professor, you know; like, if I showed somebody who isn't in math 01:37:03.040 |
a page of your, you know, research, what are they going to say? And, you know, it's interesting, 01:37:11.040 |
I said, like, there's a bit of these ideas in here, like Iverson brackets, for example, 01:37:16.160 |
have you ever heard of Iverson brackets? He's like, well, of course, I've heard of it. Like, 01:37:19.040 |
you know, it's a fundamental tool in math. It's like, well, you know, that's one thing that you 01:37:23.520 |
guys have stolen from APL. You know, that's a powerful thing, right? It's like, fantastic, 01:37:28.640 |
I'd never want to do without Iverson brackets. So I kind of tried to say like, okay, well, imagine, 01:37:32.960 |
like, every other glyph that you don't understand here, has some rich thing like Iverson brackets, 01:37:38.400 |
you could now learn about. Okay, maybe I should give it a go. I'm not sure he has. 01:37:46.960 |
But I think that's a good example for mathematicians: to show, like, here's one thing, 01:37:52.320 |
at least, that found its way from APL. That maybe gives you a sense that, for a mathematician, 01:37:58.240 |
that there might be something in here. On that note, because I know we are potentially, 01:38:05.760 |
well, we've gone way over, but this has been awesome. But a question I think that might be 01:38:10.400 |
a good question to end on is: do you have any advice for folks that want to learn something, 01:38:20.880 |
whether it's Chinese, or an array language, or to get through your fast AI course? And 01:38:26.560 |
is there, because I think, you know, like you said, you like to self-select for folks that are 01:38:32.560 |
the curious types that want to learn new things and new ways to solve things. But like, 01:38:38.480 |
is there any way, other than just being tenacious, like, are there tips to, you know, 01:38:46.960 |
approaching something with some angle, because I think a lot of the folks maybe listening to this 01:38:51.680 |
don't have that issue. But I definitely know a ton of people that are the kind of folks 01:38:57.120 |
that, you know, they'll join a study group, but then three weeks in, they, you know, kind of 01:39:00.400 |
lose interest, or they decide it's too much work or too difficult. As an educator, and you know, 01:39:07.040 |
it seems like you operate in this space. Do you have advice to tell folks, you know, 01:39:13.760 |
I mean, so much, Connor, I actually kind of embedded in my courses a lot. I can give you 01:39:19.680 |
some quick summaries. But what I will say is, my friend Radek Osmulski, who's been taking my 01:39:25.120 |
courses for like four years, has taken everything I've said, and his experience of those things, and 01:39:33.520 |
turned it into a book. So if you read... Osmulski's book is called Meta Learning: powerful mental 01:39:42.480 |
models for deep learning. This is learning as in learning deeply. So yeah, check out his book, 01:39:49.280 |
to get the full answer. I mean, there's just, gosh, there's a lot of things you can do to make 01:39:55.760 |
learning easier. You know, and a key thing I do in my courses is I always teach top down. So like 01:40:06.400 |
often people with like, let's take deep learning and neural networks, they'll be like, okay, well, 01:40:10.480 |
first, I'm going to have to learn linear algebra and calculus and blah, blah, blah. And, you know, 01:40:16.480 |
four or five years later, they still haven't actually trained a neural network. Our approach 01:40:21.760 |
in our course is in lesson one, the very first thing you do in the first 15 minutes is you train 01:40:26.320 |
a neural network. And it is more like how we learn baseball or how we learn music, you know, 01:40:36.640 |
like you say, like, okay, well, let's play baseball: come, you stand there, you stand there, 01:40:40.960 |
I'll throw this to you, you're going to hit it, you're going to run. You know, you don't start by 01:40:45.520 |
learning, you know, the parabolic trajectory of a ball or the, you know, history of the game or 01:40:53.440 |
whatever, you just start playing. So that's, you know, you want to be playing. And if you're doing 01:40:59.760 |
stuff from the start, that's fun and interesting and useful, then top down, doesn't mean it's 01:41:07.360 |
shallow, you can then work from there to like, then understand like, what's each line of code 01:41:12.320 |
doing? And then how is it doing it? And then why is it doing it? And then what happens if we do 01:41:16.560 |
it a different way? And until eventually, with with our fast AI program, you actually end up 01:41:23.040 |
rewriting your own neural network library from scratch, which means you have to very deeply 01:41:28.240 |
understand every single part of it. And then we start reading research papers. And then we start 01:41:32.960 |
learning about how to implement those research papers in the library we just wrote. So yeah, 01:41:37.600 |
I'd say go top down, make it fun, make it applied. For things like APL or Chinese, where there's 01:41:45.040 |
just stuff you have to remember, use Anki, use spaced repetition learning. You know, that's been 01:41:52.000 |
around, Ebbinghaus came up with that, I don't know what, 250, 200 years ago, it works, you know, 01:42:02.080 |
everybody, if you tell them something, will forget it in a week's time, everybody, you know, and so 01:42:08.800 |
you shouldn't expect to read something and remember it. Because you're human, and humans don't do that. 01:42:15.120 |
So spaced repetition learning will quiz you on that thing tomorrow. And then in four days 01:42:22.960 |
time, and then in 14 days time, and then in three weeks time, and if you ever forget it, it will 01:42:29.280 |
reset that schedule. And it'll make sure it's impossible to forget it, you know, so it's, 01:42:34.320 |
it's depressing to study things that then disappear. And so it's important to recognize 01:42:40.960 |
that unless you use Anki or SuperMemo or something like that, unless you use it every day, 01:42:47.760 |
it will disappear. But if you do use spaced repetition, it's guaranteed not 01:42:53.120 |
to. And I told this to my daughter, a couple of years ago, I said, I, you know, what if I told you 01:43:00.800 |
there was a way you can guarantee to never ever forget something you want to know? It's just like, 01:43:06.800 |
that's impossible. This is like some kind of magic. It's like, no, it's not magic. And like, I sat down 01:43:13.280 |
and I drew out the Ebbinghaus forgetting curves and explained how it works. And I explained how, 01:43:20.640 |
you know, if you get quizzed on it in these schedules, it flattens out. And I was just 01:43:25.200 |
like, what do you think? And she was like, I want to use that. So she's been using Anki ever since. 01:43:31.520 |
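(A toy sketch of that scheduling idea; the intervals are illustrative and this is not Anki's actual algorithm:)

    from datetime import date, timedelta

    INTERVALS = [1, 4, 14, 21]   # days: tomorrow, then 4, 14, 21, as described above

    def next_review(step, correct, today=None):
        """Return (new step, next due date); forgetting resets the whole schedule."""
        today = today or date.today()
        step = min(step + 1, len(INTERVALS) - 1) if correct else 0
        return step, today + timedelta(days=INTERVALS[step])

    step, due = next_review(0, correct=True)     # got it: next review in 4 days
    step, due = next_review(step, correct=False) # forgot: back to tomorrow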
So maybe those are just two, let's just start with those two. Yeah, so go top down, and use 01:43:38.640 |
Anki, I think could make your learning process much more fulfilling, because you'll be doing 01:43:44.400 |
stuff with what you're learning and you'll be remembering it. Well, that is awesome. And yeah, 01:43:50.160 |
definitely we'll leave links to not just Anki and the book, meta learning, but everything that we've 01:43:56.560 |
discussed throughout this conversation, because I think there's a ton of really, really awesome 01:44:00.400 |
advice. And obviously to your fast AI course in the library. And we'll also link to, I know you've 01:44:07.040 |
been on, like we mentioned before, a ton of other podcasts and talks. So if you'd like to hear more 01:44:12.960 |
from Jeremy, there's a ton of resources online. Hopefully, it sounds like you're going to be, 01:44:17.120 |
you know, building some learning materials over the next however many months or years. And so 01:44:21.920 |
in the future, if you'd love to come back and update us on on your journey with the array 01:44:26.000 |
languages, that would be super fun for us, because I've thoroughly enjoyed this conversation. And 01:44:31.040 |
thank you so much for waking up early all on the other side of the world from us, at least in 01:44:36.400 |
Australia. Thanks for having me. And yeah, I guess with that, we'll say happy array programming.