The Array Cast: Jeremy Howard
Chapters
0:00
1:15 Dyalog Problem-Solving Contest
2:40 Jeremy Howard
4:30 APL Study Group
10:20 A.T. Kearney
12:33 MKL (Intel)
13:00 BLAS
13:11 Perl, BQN
14:06 Raku
15:45 Kaggle
16:52 R
18:50 Neural Networks
19:50 Enlitic
20:01 Fast.ai
21:02 NumPy
21:26 Leading Axis Theory
21:31 Rank Conjunction
21:40 Einstein notation
22:55 CUDA
28:51 NumPy: Another Iverson Ghost
30:11 Pivot Tables
30:36 SQL
31:25 Larry Wall: "The three chief virtues of a programmer are: Laziness, Impatience and Hubris."
32:00 Python
36:25 Regular Expressions
36:50 PyTorch
37:39 Notation as a Tool of Thought
37:55 Aaron Hsu, Co-dfns
38:40 J
39:06 Eric Iverson on the Array Cast
40:18 Triangulation: Jeremy Howard
41:48 Google Brain
42:30 RAPIDS
43:40 Julia
43:50 LLVM
44:07 JAX
44:21 XLA
44:32 MLIR
44:42 Chris Lattner
44:53 TensorFlow
49:33 TorchScript
50:09 Scheme
50:28 Swift
51:10 DragonBox Algebra
52:47 APL Glyphs
53:24 Dyalog APL
54:24 Jupyter
55:44 Jeremy's Meta Math tweet
56:37 Power function
63:06 Reshape
63:40 Stallman 'Rho, rho, rho'
64:20 APLcart
66:12 J for C Programmers
67:54 Transpose episode
70:00 APLcart video
72:28 Functional Programming
73:00 List Comprehensions
73:30 BQN to J
78:15 Einops
79:30 April Fools APL
80:35 Flask library
81:22 JuliaCon 2022
88:05 Myelination
89:15 Sanyam Bhutani interview
91:27 Jo Boaler, Growth Mindset
93:45 Discovery Learning
97:05 Iverson Bracket
99:14 Radek Osmulski, Meta Learning
100:12 Top-Down Learning
101:20 Anki
103:50 Lex Fridman interview
00:00:00.880 |
Welcome to another episode of ArrayCast. I'm your host, Connor. And today we have a very 00:00:05.440 |
exciting guest, which we will introduce in a second. But before we do that, we'll do brief 00:00:09.440 |
introductions and then one announcement. So first we'll go to Bob and then we'll go to Adam who has 00:00:13.120 |
the one announcement. And then we will introduce our guest. I'm Bob Therriault. I'm a J enthusiast 00:00:18.400 |
and I do some work with the J Wiki. We're underway and trying to get it all set up for the fall. 00:00:24.800 |
I'm Adám Brudzewski, full-time APL programmer at Dyalog Limited. Besides actually programming 00:00:30.640 |
APL, I also take care of all kinds of social things, including the APL Wiki. And then for my 00:00:37.680 |
announcements, part of what we do with Dyalog is arrange a yearly user meeting or a type of 00:00:42.880 |
conference. And at that user meeting, there is also a presentation by the winner of the APL 00:00:51.840 |
problem solving competition. That competition closes at the end of the month. So hurry up if 00:00:59.360 |
you want to participate. It's not too late even to get started at this point. And also at the end of 00:01:03.520 |
the month is the end of the early bird discount for the user meeting itself. Awesome. And just 00:01:10.560 |
a note about that contest. I think, and Adam can correct me if I'm wrong, there's two phases, and in 00:01:15.360 |
the first phase it's just 10 short problems. A lot of them are just one-liners. And even if 00:01:20.560 |
you only solve one of the 10, I think you can win a small cash prize just from answering one. 00:01:26.960 |
Is that correct? I'm not even sure. You might need to solve them all. They're really easy. 00:01:36.160 |
So the point being though is that you don't need to complete the whole contest in order to be 00:01:39.680 |
eligible to win prizes. No, for sure. There's a certain amount that if you get to that point, 00:01:44.160 |
you hit a certain threshold and you can be eligible to win some free money, which is always 00:01:48.240 |
awesome. And yeah, just briefly, as I introduce myself in every other episode, I'm your host, 00:01:54.640 |
Connor, C++ professional developer, not an array language developer in my day-to-day, 00:01:59.680 |
but a huge array language and combinator enthusiast at large, which brings us to introducing our 00:02:06.240 |
guest who is Jeremy Howard, who has a very, very, very long career. And you probably have heard him 00:02:13.920 |
on other podcasts or have been giving other talks. I'll read the first paragraph of his 00:02:19.200 |
three-paragraph bio because I don't want to embarrass him too much, but he has 00:02:22.880 |
a very accomplished career. So Jeremy Howard is a data scientist, researcher, developer, 00:02:28.320 |
educator, and entrepreneur. He is the founding researcher at FastAI, a research institute 00:02:34.320 |
dedicated to making deep learning more accessible and is an honorary professor at the University of 00:02:38.800 |
Queensland. That's in Australia, I believe. Previously, Jeremy was a distinguished research 00:02:43.520 |
scientist at the University of San Francisco, where he was the founding chair of the Wicklow 00:02:47.600 |
artificial intelligence and medical research initiative. He's also been the CEO of 00:02:53.120 |
Enlitic and was the president and chief scientist of Kaggle, which is basically the data science 00:02:59.760 |
version of LeetCode, which many software developers are familiar with. He was the CEO of two 00:03:04.400 |
successful Australian startups, Fastmail and Optimal Decisions Group. And before that, 00:03:08.400 |
in between doing a bunch of other things, he worked in management consulting at McKinsey, 00:03:12.960 |
which is an incredibly interesting start to the career that he has had now, because for those of 00:03:18.240 |
you that don't know, McKinsey is one of the three biggest management consulting firms alongside, 00:03:22.720 |
I think, Bain & Co. and BCG. So I'm super interested to hear how he started in management 00:03:27.280 |
consulting and ended up being the author of one of the most popular AI libraries in Python and also 00:03:33.520 |
the course that's attached to it, which I think is, if not, you know, the most popular, a very, 00:03:38.800 |
very popular course that students all around the world are taking. So I will stop there, 00:03:42.800 |
throw it over to Jeremy, and he can fill in all the gaps that he wants, jump back to however far 00:03:47.440 |
you want to, to tell us, you know, how you got to where you are now. And I think the one thing I 00:03:53.120 |
forgot to mention, too, is that he recently tweeted on July 1st, and we're recording this on July 4th, 00:03:58.720 |
and the tweet, quote, reads, "Next week, I'm starting a daily study group on my most loved 00:04:03.440 |
programming language, APL." And so obviously interested to hear more about that tweet and 00:04:08.560 |
what's going to be happening with that study group. So over to you, Jeremy. 00:04:11.040 |
Well, the study group is starting today as we record this. So depending on how long it takes to 00:04:19.280 |
get this out, it'll have just started. And so definitely time for people to join in. So we'll, 00:04:26.640 |
I'm sure we'll include a link to that in the show notes. Yeah, I definitely feel kind of like I'm 00:04:32.480 |
your least qualified array programming person ever interviewed on this show. I love APL and J, 00:04:43.520 |
but I've done very, very little with them, particularly APL. I've done a little, 00:04:48.960 |
little bit with J mucking around, but like, I find a couple of weeks here and there every 00:04:54.480 |
few years, and I have for a couple of decades. Having said that, I am a huge enthusiast of 00:05:04.320 |
array programming, as it is used, you know, in a loopless style in other languages, initially in 00:05:12.480 |
Perl, and nowadays in Python. Yeah, maybe I'll come back to that, because I guess you wanted to get a 00:05:18.400 |
sense of my background. Yeah, so I actually started at McKinsey. I grew up in Melbourne, Australia. And 00:05:28.640 |
I didn't know what I wanted to do when I grew up at the point that you're meant to know when you 00:05:34.240 |
choose a university, you know, major. So I picked philosophy on the basis that it was like, 00:05:39.920 |
you know, the best way of punting down the road what you might do, because with philosophy, 00:05:45.360 |
you can't do anything. And honestly, that kind of worked out in that I needed money, 00:05:54.480 |
and I needed money to get through university. So I got, like, a one-day-a-week kind of IT 00:05:59.680 |
support job at McKinsey, the McKinsey Melbourne office during university from first year, 00:06:07.600 |
I think that's from first year. But it turned out that like, yeah, I was very curious, 00:06:15.280 |
so curious about management consulting. So every time consultants would come down and ask me to 00:06:18.720 |
like, you know, clean out the sticky Coke they'd spilled in their keyboard or whatever, I would 00:06:24.800 |
always ask them what they were working on and ask them to show me and I've been really interested in 00:06:31.760 |
like doing analytics-y kind of things for a few years at that point. So during high school, 00:06:38.080 |
basically every holidays, I kind of worked on stuff with spreadsheets or Microsoft Access or 00:06:44.000 |
whatever. So it turned out I knew more about like, stuff like Microsoft Excel than they did. So 00:06:50.320 |
within about two months of me starting this one day a week job, I was working 90 hour weeks, 00:06:57.120 |
basically doing analytical work for the consultants. And so that, you know, that actually worked out 00:07:05.920 |
really well, because I kind of did a deal with them where they would, they gave me a full time 00:07:11.920 |
office, and they would pay me $50 an hour for whatever time I needed. And so suddenly, I was 00:07:17.760 |
actually making a lot of money, you know, working, working 90 hours a week. And yeah, it was great 00:07:28.560 |
because then I would come up with these solutions to things they were doing in the projects, 00:07:33.120 |
and I'd have to present it to the client. So next thing I knew I was basically on the client side 00:07:37.040 |
all the time. So I ended up actually not going to any lectures at university. And I somehow kind 00:07:45.280 |
of managed this thing where I would take two weeks off before each exam, go and talk to all my 00:07:50.720 |
lecturers and say, Hey, I was meant to be in your university course. I know you didn't see me, but I 00:07:55.040 |
was kind of busy. Can you tell me what I was meant to have done? And I would do it. And so I kind of 00:08:01.440 |
scraped by a BA in philosophy. But, yeah, you know, I don't really have much of an academic 00:08:08.080 |
background. But that did give me a great background in like applying stuff like, you know, 00:08:15.280 |
linear regression and logistic regression and linear programming and, you know, 00:08:19.760 |
the basic analytical tools of the day, generally through VBA scripts in Excel, or, you know, 00:08:27.280 |
access, you know, the kind of stuff that a consultant could chuck out, you know, on to their 00:08:32.560 |
laptop at a client site. Anyway, I always felt guilty about doing that, because it just seemed 00:08:40.800 |
like this ridiculously nerdy thing to be doing when I was surrounded by all these very important, 00:08:46.000 |
you know, consultant types who seemed to be doing much more impressive strategy work. So I tried to 00:08:53.920 |
get away from that as quickly as I could, because I didn't want to be the nerd in the company. And 00:09:00.480 |
yeah, so I ended up spending the next 10 years basically doing strategy consulting. But throughout 00:09:06.080 |
that time, I did, you know, because I didn't have the same background that they did, that expertise, 00:09:12.320 |
the MBAs they did, I had to solve things using data and analytically intensive approaches. 00:09:18.320 |
So although in theory, I was a strategy management consultant, and I was working on problems like, 00:09:23.680 |
you know, how do we fix the rice industry in Australia? Or, you know, how do we, you know, 00:09:29.360 |
like, you know, how do we deal with this new competitor coming into this industry or whatever 00:09:33.680 |
it was, I always did it by analyzing data, which actually turned out to be a good niche, you know, 00:09:40.000 |
because I was the one McKinsey consultant in Australia who did things that way. And so I was 00:09:44.640 |
successful, and I ended up moving to A.T. Kearney, which is the other of the two 00:09:50.800 |
original management consulting firms. I think I became like the youngest manager in the world. 00:09:56.800 |
And, you know, there was this parallel path I was doing. And then through that, I learned about 00:10:03.840 |
the insurance industry and discovered like the whole insurance industry is basically pricing 00:10:09.120 |
things in a really dumb way. I developed this approach, based on optimization, to optimized 00:10:17.600 |
pricing, launched a company with my university friend who had a PhD in operations research. 00:10:25.360 |
And, yeah, so we built this new approach to pricing insurance, which is, it was kind of fun. 00:10:34.320 |
I mean, it's, you know, it went well in the end, you know, commercially. Spent about 10 00:10:41.600 |
years doing that. And at the same time, running an email company called Fastmail, 00:10:46.960 |
which also went well. Yeah, we started out basically using C++. And I would say that was 00:10:55.920 |
kind of the start of my array programming journey in that in those days, this is like 1999, 00:11:00.480 |
the very first expression templates based approaches to C++ numeric programming were appearing. 00:11:07.840 |
And so I, you know, was talking to the people working on those libraries doing stuff like 00:11:14.960 |
particularly stuff doing the big kind of high energy physics experiments that were going on in Europe. 00:11:21.040 |
It was ultimately pretty annoying to work with, though, like the amount of time it took to 00:11:28.960 |
compile those things, it would take hours. And it was quirky as all hell, you know, it's still 00:11:35.600 |
pretty quirky doing metaprogramming in C++. But in those days, it was just a nightmare. Every 00:11:40.800 |
compiler was different. So I ended up switching to C sharp shortly after that came out. And, you know, 00:11:49.280 |
the move in a way was disappointing because that was much less expressive as a kind of array 00:11:55.760 |
programming paradigm. And so instead, I ended up basically grabbing Intel's MKL library, which is 00:12:04.960 |
basically BLAS on steroids, if you like, and writing my own C sharp wrapper to give me, 00:12:12.640 |
you know, kind of array programming ish capabilities, but not with any of the features one 00:12:17.840 |
would come to expect from a real array programming language around kind of 00:12:21.040 |
dealing with rank sensibly, and, you know, not much in the way of broadcasting, 00:12:26.320 |
which reminds me, we should come back to talking about BLAS at some stage, because a lot of the 00:12:32.480 |
reasons that most languages are so disappointing at array programming is because of our reliance on 00:12:37.360 |
BLAS, you know, as an industry. Fastmail, on the other hand, was being written in Perl, 00:12:45.360 |
which I really enjoyed as a programming language and still do, I still love Perl a lot. 00:12:50.240 |
But the scientific programming in Perl I didn't love at all. And so at the time, Perl 6, 00:13:01.840 |
you know, was just starting, the idea of it was being developed. So I ended up 00:13:06.560 |
running the Perl 6 working group to add scientific programming capabilities or kind of, you know, 00:13:14.400 |
what at the time I described as APL-inspired programming capabilities, to Perl. And so I 00:13:20.560 |
did an RFC around what we ended up calling hyper operators, which is basically the idea that any 00:13:27.200 |
operator can operate on arrays and can broadcast over any axes that are mismatched or whatever. 00:13:35.600 |
And those RFCs all ended up getting accepted. And Damian Conway and Larry Wall kind of expanded 00:13:42.640 |
them a little bit. Perl 6 never exactly happened. It ended up becoming a language called Raku. 00:13:51.680 |
With the butterfly logo. Yeah. And that, you know, and the kind of the performance ideas, 00:13:58.400 |
I really worked hard on, never really happened either. So that was a bit of a, 00:14:01.760 |
yeah, that was all a bit of a failure. But it was fun, and it was interesting. 00:14:05.920 |
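
(The hyper-operator idea from those RFCs, every operator extending elementwise over whole arrays with mismatched shapes broadcast, is essentially what NumPy does in Python today; a minimal sketch with made-up numbers:)

    import numpy as np

    a = np.array([1, 2, 3])
    b = np.array([10, 20, 30])

    # Any operator applies elementwise over whole arrays, no explicit loop:
    print(a + b)          # [11 22 33]
    print(a * 2)          # the scalar 2 is broadcast over every element

    # Mismatched shapes broadcast where the axes line up:
    m = np.ones((2, 3))
    print(m + a)          # a is repeated across both rows of m
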
I, you know, so after running these companies for 10 years, one of the big problems with running a 00:14:16.000 |
company is that you're surrounded by people who you hired, and they, you know, have to make 00:14:21.600 |
you like them if they want to get promoted, you know, or not get fired. And so you can never trust 00:14:25.120 |
anything anybody says. So I had, you know, very low expectations about my capabilities, 00:14:32.960 |
analytics-wise. I had, like, you know, basically been running companies for 10 years. 00:14:37.920 |
I did a lot of coding and stuff, but it was in our own little world. And so after I sold those 00:14:47.920 |
companies, yeah, I, one of the things I decided to do was to try actually to become more competent, 00:14:56.640 |
you know, I had lost my, to some extent, I had lost my feeling that I should hide my nerdiness, 00:15:06.240 |
you know, and try to act like a real business person. And I thought, no, I should actually 00:15:11.840 |
see if I'm actually any good at this stuff. So I tried entering a machine learning competition 00:15:18.720 |
at a new company that had just been launched called Kaggle with this goal of like, not coming last. 00:15:26.880 |
So basically, the, you know, the way these things work is you have to make predictions on a data 00:15:37.760 |
set. And at the end of the competition, whoever's predictions are the most accurate wins the prize. 00:15:46.080 |
And so my goal was, yeah, try not to come last, which I wasn't convinced I'd be able to achieve. 00:15:52.800 |
Because as I say, I didn't feel like this is, I'd never had any technical training, 00:15:59.600 |
you know, and everybody else in these competitions were PhDs and professors or whatever else. So it 00:16:03.840 |
felt like a high bar. Anyway, I ended up winning it. And that, that changed my life, right? Because, 00:16:12.000 |
yeah, it was like, oh, okay, I am, you know, empirically good at this thing. And people 00:16:23.520 |
at my local user groups, we used R quite a bit as well. You know, I told them, I'm going to try 00:16:32.560 |
entering this competition. Anyone want to create a team with me? I want to learn to use R properly. 00:16:37.360 |
And I kind of went back to the next user group meeting and people were like, I thought you were 00:16:41.040 |
just learning this thing. How did you win? I was like, I don't know. I just used common sense. 00:16:47.840 |
Yeah, so I ended up becoming the chief scientist and president of Kaggle. And Kaggle, as you know, 00:16:54.320 |
anybody in the data science world knows, has kind of grown into this huge, huge thing, ended up 00:16:59.760 |
selling it to Google. So I ended up being an equal partner in the company. I was the first 00:17:04.080 |
investor in it. And that was great. That was like, I just dove in, we moved to San Francisco for 10 00:17:11.760 |
years. You know, surrounded by all these people who were just sort of role models 00:17:18.400 |
and idols, and partly getting to meet all these people in San Francisco was this experience of 00:17:24.880 |
realizing all these people were actually totally normal, you know, and they weren't like some 00:17:30.160 |
super genius level, like they're just normal people who, yeah, as I got to know them, 00:17:38.720 |
it gave me, I guess, a lot more confidence in myself as well. So maybe they were just normal 00:17:44.720 |
relative to you. I think in Australia, we all feel a bit, you know, intimidated by the rest of the 00:17:53.840 |
world in some ways, we're a long way away, you know, our only neighbors really are New Zealand. 00:17:59.680 |
It's very easy to feel, I don't know, like, yeah, we were not very 00:18:07.280 |
confident about our capabilities over here, other than in sport, perhaps. 00:18:13.040 |
Yeah, so one of the things that happened while I was at Kaggle was, I had played around with neural 00:18:20.480 |
networks a bit, a good bit, you know, like 20 years earlier. And I always felt like neural networks 00:18:26.720 |
were one day going to be the thing. It's like, you know, they are at a theoretical level, 00:18:34.080 |
infinitely capable. But, you know, they never quite did it for me. And 00:18:41.760 |
but then in 2012, suddenly, neural networks started achieving superhuman performance for 00:18:49.120 |
the first time on really challenging problems, like recognizing traffic signs, you know, 00:18:54.080 |
like recognizing pictures. And I'd always said to myself, I was going to watch for this moment, 00:19:00.160 |
and when it happened, I wanted to like, jump on it. So as soon as I saw that, I tried to jump on 00:19:04.800 |
it. So I started a new company, after a year of research into, like, you know, what a 00:19:12.320 |
neural network's going to do. I decided medicine was going to be huge; I knew nothing about medicine. 00:19:18.160 |
And I, yeah, I started a medicine company to see what we could do with deep learning in medicine. 00:19:23.200 |
So that was Enlitic. Yeah, that ended up going pretty well. And yeah, eventually, I kind of got 00:19:33.200 |
like a bit frustrated with that, though, because it felt like deep learning can do so many things, 00:19:39.120 |
and I'm only doing such a small part of those things. So deep learning is like neural networks 00:19:44.000 |
with multiple layers. I thought the only way to actually help people really, you know, make the 00:19:51.520 |
most of this incredibly valuable technology is to teach other people how to do it, and to help 00:19:56.800 |
other people to do it. So my wife and I ended up starting a new, I'd call it kind of a research 00:20:02.560 |
lab, fast.ai, to help do that, basically, initially focused on education, 00:20:09.760 |
and then increasingly focus on research and software development to basically make it 00:20:15.520 |
easier for folks to use deep learning. And that's, yeah, that's where I am now. And now 00:20:23.280 |
everything in deep learning is all Python. And in Python, we're very lucky to have, 00:20:30.080 |
you know, excellent libraries that behave pretty consistently with each other, 00:20:36.160 |
basically based around this NumPy library, which treats arrays very, very similarly to how 00:20:45.440 |
J does, except rather than leading axis, it's trailing axis. But basically, you get, 00:20:51.920 |
you know, you get loop free, you get broadcasting, you know, you don't get things like a rank 00:20:57.760 |
conjunction, but there's very easy ways to permute axes. So you can do basically the same thing. 00:21:05.200 |
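
(A minimal Python illustration of the trailing-axis point, and, as he notes next, Einstein notation is built in as well:)

    import numpy as np

    a = np.arange(12).reshape(3, 4)   # shape (3, 4)
    b = np.arange(4)                  # shape (4,)

    # NumPy pairs b with a's *trailing* axis; J-style leading-axis
    # agreement would pair on the first axis instead:
    print(a + b)

    # No rank conjunction, but permuting axes is easy, so the same
    # effect is available; here c pairs with the leading axis:
    c = np.arange(3)
    print((a.T + c).T)

    # Einstein notation, built in: a matrix-vector product.
    print(np.einsum("ij,j->i", a, b))
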
Things like Einstein notation, you know, are built into the libraries, and then, you know, 00:21:11.040 |
it's trivially easy to have them run on GPUs or TPUs or whatever, you know, so it's for the last 00:21:20.400 |
years of my life, nearly all the code I write is array programming code, even though I'm not 00:21:28.400 |
using a purely array language. All right, so where do we start now with the questions? 00:21:35.760 |
I'll let Bob and Adam go first if they want. And if they don't have a... Okay, Bob, you go ahead. 00:21:44.080 |
I've got a quick question about neural networks and stuff. Because when I was going to 00:21:49.360 |
university all those years ago, people were talking about neural networks, and then they just sort of 00:21:54.240 |
dropped off the face of the earth. And as you said, around 2010, suddenly they resurfaced again. What do you think 00:21:59.520 |
was the cause of that resurfacing? Was it hardware? Was it somebody discovered a new method or what? 00:22:04.480 |
Yeah, mainly hardware. So what happened was people figured out how to do GPGPU, so general-purpose 00:22:12.480 |
GPU computing. So before that, I tried a few times to use GPUs with neural nets, I felt like that would 00:22:18.560 |
be the thing. But GPUs were all about like creating shaders and whatever. And it was a whole jargon 00:22:25.840 |
thing. I didn't even understand what was going on. So the key thing was NVIDIA coming up with this 00:22:31.680 |
CUDA approach, which, it's all loops, right? But it's much easier than the old way. Like, the 00:22:42.080 |
loops, well, it's kind of loops, at least: you basically say to CUDA, this is my kernel, 00:22:48.640 |
which is the piece of code I want to basically run on each symmetric multiprocessing unit. 00:22:52.960 |
And then you basically say launch a bunch of threads. And it's going to call your kernel, 00:23:00.080 |
you know, basically incrementing the x and y coordinates and passing it to your kernel, 00:23:06.000 |
making them available to your kernel. So it's, kind of, not exactly a loop, 00:23:09.440 |
but it gets more like a map, I guess. And so when CUDA appeared, yeah, very quickly, 00:23:16.320 |
neural network libraries appeared that would take advantage of it. 00:23:21.680 |
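
(A minimal sketch of the model being described: a kernel that the runtime maps over thread indices. Numba's CUDA bindings are used here just as one convenient way to show it from Python, which is my assumption rather than anything named in the episode, and it needs an NVIDIA GPU to actually run:)

    import numpy as np
    from numba import cuda

    @cuda.jit
    def add_kernel(x, y, out):
        i = cuda.grid(1)           # this thread's global index
        if i < x.size:             # guard: we may launch spare threads
            out[i] = x[i] + y[i]   # the per-thread "loop body"

    x = np.arange(1024, dtype=np.float32)
    y = np.ones(1024, dtype=np.float32)
    out = np.empty_like(x)
    add_kernel[4, 256](x, y, out)  # launch 4 blocks of 256 threads
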
And then suddenly, you know, you get orders of magnitude more performance. And it's cheaper. 00:23:28.240 |
And you get to buy an Nvidia graphics card with a free copy of Batman, you know, on the excuse that 00:23:34.880 |
actually this is all for work. So it was mainly that. There's also this, just like at the 00:23:41.920 |
same time, the thing I'd been doing for 25 years suddenly got a name: data science. You know, we 00:23:49.440 |
were like this very small industry of people applying data-driven approaches to solving 00:23:54.960 |
business problems. And we were always looking for a name. Not many people know this, but back in the 00:24:00.800 |
very early days, there was an attempt to call it industrial mathematics. Sometimes people would 00:24:06.480 |
like shoehorn it into operations research or management science, but that was almost exclusively 00:24:11.680 |
optimization people and specifically people focused more on linear programming approaches. 00:24:17.440 |
So yeah, once data science appeared, and also like, you know, basically every company had 00:24:23.360 |
finally built their data warehouse and the data was there. So yeah, it's like more awareness 00:24:32.560 |
of using data to solve business problems and for the first time availability of the hardware that 00:24:37.520 |
we actually needed. And as I say, in 2012, it just, it reached the point, like it had been growing 00:24:44.400 |
since the first neural network was built in 1957, I guess, at this kind of gradual 00:24:53.040 |
rate, but once it passed human performance on some tasks, it just kept going. And so now, 00:25:00.400 |
in the last couple of months, you know, it's now like getting decent marks on MIT math tests and 00:25:08.800 |
stuff. It's on an amazing trajectory. Yeah, it's kind of a critical mass kind of thing, 00:25:16.080 |
you get a certain amount of information and the ability to process that information, I guess, 00:25:22.800 |
and as you go, it's an exponential curve. And humans and exponential curves, 00:25:28.720 |
I think we're finding over and over again, we're not really great at understanding an exponential. 00:25:34.080 |
No, no, we're not. And that's like why I promised myself that as soon as I saw neural net starting 00:25:41.440 |
to look like they're doing interesting things, I would drop everything and jump on it, because I 00:25:45.360 |
wanted to jump on that curve as early as possible. And we're now in this situation where people are 00:25:50.960 |
just making huge amounts of money with neural nets, which they then reinvest back into making the 00:25:57.360 |
neural nets better. And so we are also seeing this kind of bifurcation of capabilities where there's 00:26:03.920 |
a small number of organizations who are extremely good at this stuff and invested in it and a lot 00:26:09.680 |
of organizations that are, you know, really struggling to figure it out. And because of the 00:26:17.680 |
exponential nature, when it happens, it happens very quickly, it feels like you didn't see it 00:26:22.240 |
coming. And suddenly, it's there. And then it's past you. And I think we're all experiencing that 00:26:26.720 |
now. Yeah, and it's happened in so many industries, you know, back in my medical startup, you know, 00:26:34.800 |
we were interviewing folks around medicine, we interviewed a guy finishing his PhD in 00:26:42.160 |
histopathology. And I remember, you know, he came in to do an interview with us. And he basically 00:26:49.440 |
gave us a presentation about his thesis on kind of graph cut segmentation approaches for pathology 00:26:54.960 |
slides. And at the end, he was like, anyway, that was my PhD. And then yesterday, because I knew I 00:26:59.920 |
was coming to see you guys, and I heard you like neural nets, I just thought I'd check out neural nets. 00:27:04.000 |
And about four hours later, I trained a neural net to do the same thing I did for my PhD. And 00:27:11.360 |
it way outperformed my PhD thesis that I'd spent the last five years on. And so that's where I'm at, you know, 00:27:17.360 |
and we hear this a lot. Existential crisis in the middle of an interview. Yes. 00:27:24.960 |
So I kind of have, I don't know, this is like a 1A, B and C. And I'm not sure if I should ask them 00:27:34.000 |
all at once. But so you said sort of at the tail end of the 90s is when your array language journey 00:27:40.880 |
started. But it seems from the way you explained it that you had already at some point along the 00:27:45.280 |
way heard about the array languages, APL and J, and have sort of alluded to, you know, picking up 00:27:52.640 |
some knowledge about the paradigm and the languages. So my first part of the question is sort of, 00:27:58.240 |
you know, at what point were you exposed to the paradigm in these languages? The second part is 00:28:04.000 |
what's causing you in 2022 to really dive into it? Because you said you feel like maybe a bit of an 00:28:11.600 |
imposter or the least qualified guest, which probably is you just being very modest. I'm sure 00:28:16.160 |
you know still quite a bit. And then the third part is, do you have thoughts about, and I've 00:28:21.680 |
always sort of wondered, how the array language paradigm sort of missed out, and, like, Python 00:28:28.160 |
ended up being the main data science language, while like there's like an article that's floating 00:28:34.480 |
around online called NumPy: Another Iverson Ghost, where, this sort of, you can see in the 00:28:40.640 |
names and the design of the library that there is a core of APL, and even the documentation 00:28:45.760 |
acknowledges that it took inspiration greatly from J and APL. But that like the array languages clearly 00:28:53.040 |
missed what was a golden opportunity for their paradigm. And we ended up with libraries and 00:29:00.080 |
other languages. So I just asked three questions at once. Feel free to tackle them in any order. 00:29:04.800 |
I have a pretty bad memory. So I think I've forgotten the second one already. So you can 00:29:09.680 |
feel free to come back to any or all of them. So my journey, which is what you started with, 00:29:18.560 |
was I always felt like we should do more stuff without using code, or at least, like, 00:29:31.440 |
kind of traditional, what I guess we'd call nowadays, imperative code. There was a couple 00:29:38.800 |
of tools in my early days, which I've got huge amounts of leverage from because nobody else 00:29:45.760 |
in at least the consulting firms or generally in our clients knew about them. So that was SQL and 00:29:52.240 |
pivot tables. And so pivot tables, if you haven't come across it, was basically one of the earliest 00:29:58.240 |
approaches to OLAP, you know, slicing and dicing. There was actually something slightly earlier 00:30:02.480 |
called Lotus Improv, but that was actually a separate product. Excel was basically the first 00:30:07.200 |
one to put OLAP in the spreadsheet. So no loops. You just drag and drop the things you want to group 00:30:12.560 |
by and you right click to choose how to summarize. And same with SQL, you know, you declaratively 00:30:19.920 |
say what you want to do. You don't have to loop through things. SAS actually had something similar. 00:30:25.600 |
You know, with SAS, you could basically declare a PROC that would run on your data. 00:30:32.080 |
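
(A minimal pandas sketch of that pivot-table idea; the little sales table is made up purely for illustration:)

    import pandas as pd

    df = pd.DataFrame({
        "region":  ["N", "N", "S", "S"],
        "product": ["a", "b", "a", "b"],
        "sales":   [10, 20, 30, 40],
    })

    # Declare the grouping and the summary; no loops anywhere:
    print(df.pivot_table(values="sales", index="region",
                         columns="product", aggfunc="sum"))

    # The SQL version is just as declarative:
    #   SELECT region, product, SUM(sales) FROM t GROUP BY region, product;

So yeah, I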
kind of felt like this was the way I would rather do stuff if I could. And I think that's what led 00:30:39.840 |
me when we started doing the C++ implementation of the insurance pricing stuff of being much more 00:30:46.320 |
drawn to these metaprogramming approaches. I just didn't want to be writing loops in loops and 00:30:55.200 |
dealing with all that stuff. I'm too lazy, you know, to do that. I think I'm very driven by laziness, 00:31:04.400 |
which as Larry Wall said is one of the three virtues of a great programmer. Then yeah, so I think 00:31:14.080 |
as soon as I saw NumPy had reached a level of some reasonable 00:31:22.400 |
confidence in Python, I was very drawn to that because it was what I've been looking for. 00:31:28.400 |
And I think maybe that actually is going to bring us to answering the question of like what happened 00:31:32.480 |
for array languages. Python has a lot of problems, but at its heart, it's a very well-designed 00:31:41.680 |
language. It has a very small, flexible core. Personally, I don't like the way most people 00:31:48.880 |
write it, but it's so flexible I've been able to create almost my own version of Python, 00:31:54.640 |
which is very functionally oriented. I basically have stolen the type dispatch ideas from Julia, 00:32:01.600 |
created an implementation of that in Python. My Python code doesn't look like 00:32:08.080 |
most Python code, but I can use all the stuff that's in Python. So there's this very nicely 00:32:15.200 |
designed core of a language, which I then have this almost this DSL on top of, you know, and 00:32:21.440 |
NumPy is able to create this kind of DSL again because it's working on such a flexible 00:32:28.960 |
foundation. Ideally, you know, I mean, well, okay, so Python also has another DSL built into it, 00:32:36.320 |
which is math. You know, I can use the operators plus times minus. That's convenient. And 00:32:41.280 |
in every array library, NumPy, PyTorch, TensorFlow, and Python, those operators work 00:32:47.680 |
over arrays and do broadcasting over axes and so forth and, you know, accelerate on an accelerator 00:32:54.960 |
like a GPU or a TPU. That's all great. My ideal world would be that I wouldn't just get to use 00:33:03.280 |
plus times minus, but I get to use all the APL symbols. You know, that would be amazing. 00:33:10.080 |
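
(He implemented those Julia-style type dispatch ideas in his own Python code; as a rough sketch of the general idea, here is the standard library's single-argument version, not his actual implementation:)

    from functools import singledispatch

    @singledispatch
    def describe(x):                  # fallback for any other type
        return f"something: {x!r}"

    @describe.register
    def _(x: int):                    # chosen when x is an int
        return f"an int: {x}"

    @describe.register
    def _(x: list):                   # chosen when x is a list
        return f"a list of {len(x)} items"

    print(describe(3), describe([1, 2]), describe("hi"))
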
But given a choice between a really beautiful language, you know, at its core like Python, 00:33:18.480 |
in which I can then add a slightly cobbled together DSL like NumPy, I would much prefer 00:33:24.720 |
that over a really beautiful notation like APL, but without the fantastic language underneath, 00:33:32.480 |
you know, like I don't feel like there's anything about APL or J or K as, like, a 00:33:40.960 |
programming language that attracts me. Do you know what I mean? I feel like in terms of like 00:33:47.840 |
what I could do around whether it be type dispatch or how OO is designed or, you know, how I package 00:33:57.280 |
modules or almost anything else, I would prefer the Python way. So I feel like that's basically 00:34:06.160 |
what we've ended up with. You kind of either compromise between, you know, a good language 00:34:10.720 |
with, you know, slightly substandard notation or amazingly great notation with the substandard 00:34:17.840 |
language or not just language, but ecosystem. Python has an amazing ecosystem. 00:34:25.600 |
I think I hope one day we'll get the best of both, right? Like here's my, okay, here's my 00:34:35.200 |
controversial take and it may just represent my lack of knowledge. What I like about APL is its 00:34:42.960 |
notation. I think it's a beautiful notation. I don't think it's a beautiful programming language. 00:34:50.480 |
I think some things, possibly everything, you know, some things work very well as a notation, 00:35:00.160 |
but to get to raise something to the point that it is a notation requires some years of study 00:35:07.680 |
and development and often some genius, you know, like the genius of Feynman diagrams or the genius 00:35:15.040 |
of juggling notation, you know, like there are people who find a way to turn a field into a 00:35:23.040 |
notation and suddenly they blow that field apart and make it better for everybody. 00:35:29.360 |
For me, like, I don't want to think too hard all the time. Every time I come across something that 00:35:36.320 |
really hasn't been turned into a notation yet, you know, sometimes I just like, I just want to 00:35:43.040 |
get it done, you know, and so I would rather only use notation when I'm in these fields 00:35:50.480 |
that either somebody else had figured out how to make that a notation or I feel like it's really 00:35:55.520 |
worth me investing to figure that out. Otherwise, you know, there are, and the other thing I'd say 00:36:02.080 |
is we already have notations for things that aren't APL that actually work really well, 00:36:06.000 |
like regular expressions, for example. That's a fantastic notation and I don't want to 00:36:12.320 |
replace that with APL glyphs. I just want to use regular expressions. 00:36:20.720 |
So, yeah, my ideal world would be one where we, where I can write PyTorch code, but maybe instead 00:36:28.320 |
of like Einstein operations, Einstein notation, I could use APL notation. I think that's where 00:36:39.600 |
I would love to get to one day and I would love that to totally transparently run on a GPU or TPU 00:36:47.920 |
as well. That would be my happy place. It's got nothing to do with the fact that 00:36:54.000 |
I work at NVIDIA that I would love that. Interesting. I've never heard that before, 00:37:00.240 |
the difference between basically appreciating or being in love with the notation, but not the 00:37:08.000 |
language itself and that. And, you know, it started out as a notation, right? Like Iverson, 00:37:14.640 |
you know, it was a notation he used for representing state machines or whatever on 00:37:20.080 |
early IBM hardware, you know, when he did his Turing Award essay, he chose to talk about his 00:37:27.040 |
notation. And, you know, you see with people like Aaron with his Co-dfns stuff that 00:37:37.680 |
if you take a very smart person and give them a few years, they can use that notation to solve 00:37:43.840 |
incredibly challenging problems like build a compiler and do it better than you can 00:37:50.320 |
without that notation. So I'm not saying like, yeah, APL can't be used to almost anything you 00:37:58.000 |
want to use it for, but a lot of the time we don't have five years to study something very closely. 00:38:04.400 |
We just want to, you know, we've got to get something done by tomorrow. 00:38:11.360 |
Interesting. There's still a question, again, you didn't get an answer to. 00:38:15.680 |
Oh, yeah. When did you first, well, when did you first meet APL or how did you even find APL? 00:38:20.480 |
I first found J, I think, which obviously led me to APL. And I don't quite remember where I saw it. 00:38:34.880 |
Yeah. And actually, when I got to San Francisco, so that would be I'm trying to remember 00:38:45.760 |
2010 or something, I'm not sure. I actually reached out to Eric Iverson and I said, like, 00:38:54.640 |
oh, you know, we're starting this machine learning company called Kaggle. And I kind of feel like, 00:39:02.240 |
you know, everybody does stuff in Python, and it's kind of in a lot of ways really disappointing. 00:39:06.000 |
I wish we were doing stuff in J, you know, but we really need everything to be running on the GPU, 00:39:12.240 |
or at least everything to be automatically using SIMD and multiprocessor everywhere. 00:39:18.000 |
He was kind enough to actually jump on a Skype call with me. Not just jump on a Skype call, 00:39:18.000 |
it's like, how do you want to chat? It's like, how about Skype? And he created a Skype account. 00:39:27.760 |
Like, oh, yeah, we chatted for quite a while. We talked about, you know, these kinds of hopes and 00:39:35.600 |
yeah, but I just, you know, never really, because neither J nor APL was in that space yet. 00:39:46.880 |
There was just never a reason for me to do anything other than like, 00:39:51.200 |
it kind of felt like each time I'd have a bit of a break for a couple of months, 00:39:54.800 |
I'd always spend a couple of weeks fiddling around with J just for fun. But that's as far as I got, 00:40:02.000 |
really. Yeah, I think the first time I'd heard of you was in an interview that Leo Laporte did with 00:40:08.240 |
you on triangulation, and you were talking about Kaggle. That was a specific thing. But I think 00:40:13.280 |
I was riding my bike along some logging road or something and suddenly he said, oh, yeah, but 00:40:17.120 |
a lot of people use J. I like J. It's the first time I'd ever heard anybody on a podcast say 00:40:22.960 |
anything about J. It was just like, wow, that's amazing. And the whole interview about Kaggle, 00:40:31.120 |
there was so much of it about the importance of data processing, not just having a lot of 00:40:36.640 |
data, but knowing how to filter it down, not over filtering all those tricks. I'm thinking, 00:40:41.600 |
wow, these guys are really doing some deep stuff with this stuff and this guy is using J. 00:40:47.280 |
I was actually very surprised at that point, I guess not that somebody who was 00:40:54.080 |
working so much with data would know about J, but just that it would, 00:40:58.080 |
I guess, just suddenly pop into my headset, and I'm just, wow, that's so neat. 00:41:04.720 |
And I will say, in the array programming community, I find there's essentially a common misconception 00:41:11.200 |
that the reason people aren't using array programming languages is because they don't 00:41:16.160 |
know about them or don't understand them, which there's a kernel of truth of that, 00:41:22.240 |
but the truth is nowadays there's huge massively funded research labs at places like Google Brain 00:41:31.920 |
and Facebook AI Research and OpenAI and so forth where large teams of people are literally writing 00:41:39.520 |
new programming languages because they've tried everything else and what's out there is not 00:41:44.080 |
sufficient. In the array programming world, there's often a huge underappreciation of 00:41:52.720 |
what Python can do nowadays, for example. As recently as last week, I heard it described in 00:41:59.440 |
a chat room, it's like people obviously don't care about performance because they're using Python. 00:42:04.160 |
And it's like, well, a large amount of the world's highest performance computing now is done with 00:42:10.800 |
Python. It's not because Python's fast, but if you want to use RAPIDS, for example, which literally 00:42:19.040 |
holds records for the highest performance recommendation systems and tabular analysis, 00:42:26.000 |
you write it in Python. So this idea of having a fast kernel that's not written in the language 00:42:38.160 |
and then something else talking to it in a very flexible way, I think is great. And as I say, 00:42:43.200 |
at the moment, we are very hamstrung in a lot of ways that we, at least until recently, we very 00:42:48.880 |
heavily relied on BLAS, which is totally the wrong thing for that kind of flexible high-performance 00:42:57.680 |
computing because it's this bunch of somewhat arbitrary kind of selection of linear algebra 00:43:05.920 |
algorithms, which, you know, things like the C# work I did, you know, they were just wrappers on 00:43:11.120 |
top of BLAS. And what we really want is a way to write really expressive kernels that can do 00:43:18.240 |
anything over any axes. So then there are other newer approaches like Julia, for example, which 00:43:31.360 |
is kind of like got some Lispy elements to it and this type dispatch system. But because it's, 00:43:36.720 |
you know, in the end, it's on top of LLVM. What you write in Julia, you know, it does end up 00:43:45.840 |
getting optimized very well. And you can write pretty much arbitrary kernels in Julia and often 00:43:52.320 |
get best-in-class performance. And then there's other approaches like JAX. And JAX sits on top 00:44:02.480 |
of something totally different, which is it sits on top of XLA. And XLA is a compiler, which is 00:44:09.280 |
mainly designed to compile things to run fast on Google's TPUs. But it also does an okay job of 00:44:17.040 |
compiling things to run on GPUs. And then really excitingly, I think, you know, for me is the MLIR 00:44:26.240 |
project, and particularly the affine dialect. So that was created by my friend, Chris Lattner, 00:44:34.240 |
who you probably know from creating Clang and LLVM and Swift. So he joined Google for a couple 00:44:45.040 |
of years. And we worked really closely together on trying to like, think about the vision of 00:44:49.920 |
really powerful programming on accelerators that's really developer friendly. Unfortunately, 00:44:58.480 |
it didn't work out. Google was a bit too tied to TensorFlow. But one of the big ideas that did 00:45:04.240 |
come out of that was MLIR, and that's still going strong. And I do think there's, you know, if 00:45:09.040 |
something like APL, you know, could target MLIR and then become a DSL inside Python, it may yet win, 00:45:18.800 |
you know. Yeah, I've heard you in the past say that, on different podcasts and talks, 00:45:24.960 |
that you don't think that Python, even in light of, you know, just saying, people don't realize how 00:45:31.200 |
much you can get done with Python, that you don't think that the future of data science and AI and 00:45:35.200 |
neural networks and that type of computation is going to live in the Python ecosystem. And I've 00:45:41.040 |
heard on some podcasts, you've said that, you know, Swift has a shot based on sort of the way that 00:45:44.400 |
they've designed that language. And you just mentioned, you know, a plethora of different 00:45:48.160 |
sort of, I wouldn't say initiatives, but you know, JAX, XLA, Julia, etc. Do you have like a sense 00:45:53.600 |
of where you think the future of, not necessarily sort of array language computation, but this kind 00:45:59.680 |
of computation is going with all the different avenues? I do. You know, I think we're certainly 00:46:08.560 |
seeing the limitations of Python, and the limitations of the PyTorch, you know, 00:46:15.520 |
lazy evaluation model, which is the way most things are done in Python at the moment, 00:46:25.280 |
for kind of array programming is you have an expression, which is, you know, working on 00:46:31.200 |
arrays, possibly of different ranks with implicit looping. And, you know, that's one line of Python 00:46:37.200 |
code. And generally, that then gets your, you know, on your computer, that'll get turned into, 00:46:43.280 |
you know, a request to run some particular optimized pre written operation on the GPU or 00:46:52.000 |
TPU, that then gets sent off to the GPU or TPU, where your data has already been moved there. 00:46:58.960 |
It runs, and then it tells the CPU when it's finished. And there's a lot of latency in this, 00:47:06.800 |
right? So if you want to create your own kernel, like your own way of doing, you know, your own 00:47:12.480 |
operation effectively, you know, good luck with that. That's not going to happen in Python. 00:47:19.600 |
And I hate this, I hate it as a teacher, because, you know, I can't show my students what's going 00:47:26.080 |
on, right? It kind of goes off into, you know, kind of CUDA land and then comes back later. 00:47:33.520 |
I hate it as a hacker, because I can't go in and hack at that, I can't trace it, I can't debug it, 00:47:39.280 |
I can't easily profile it. I hate it as a researcher, because very often I'm like, 00:47:44.400 |
I know we need to change this thing in this way, but I'm damned if I'm going to go and write my own. 00:47:49.680 |
CUDA code, let alone deploy it. So JAX is, I think, a path to this. It's where you say, okay, let's not 00:47:58.160 |
target pre-written CUDA things, let's instead target a compiler. And, you know, working with 00:48:07.360 |
Chris Lattner, I'd say he didn't have too many nice things to say about XLA as a compiler. It was not 00:48:13.040 |
written by compiler writers, it was written by machine learning people, really. But it does the 00:48:19.760 |
job, you know, and it's certainly better than having no compiler. And so JAX is something which, 00:48:26.080 |
instead of turning our line of Python code into a call to some pre-written operation, 00:48:32.400 |
it instead is turning it into something that's going to be read by a compiler. And so the compiler 00:48:37.280 |
can then, you know, optimize that as compilers do. So, yeah, I would guess that JAX probably has 00:48:46.560 |
a part to play here, particularly because you get to benefit from the whole Python ecosystem, 00:48:54.320 |
package management, libraries, you know, visualization tools, et cetera. 00:49:04.560 |
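
(A minimal sketch of the JAX model described here: the Python expression is traced and handed to the XLA compiler rather than dispatched to pre-written kernels; the function itself is just an illustration:)

    import jax
    import jax.numpy as jnp

    @jax.jit                         # trace once, compile via XLA
    def my_op(x, y):
        return jnp.sqrt(x ** 2 + y ** 2).mean()

    x = jnp.arange(6.0).reshape(2, 3)
    y = jnp.ones((2, 3))
    print(my_op(x, y))               # runs the compiled version
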
But, you know, longer term, it's a mess, you know, it's a mess using a language like Python which 00:49:10.640 |
wasn't designed for this. It wasn't really even designed as something that you can chuck 00:49:16.880 |
different compilers onto. So people put in horrible hacks. So, for example, PyTorch, 00:49:21.440 |
they have something called TorchScript, which is a bit similar. It takes Python and kind of compiles 00:49:26.800 |
it. But they literally wrote their own parser using a bunch of regular expressions. And 00:49:34.080 |
it's, you know, it's not very good at what it does. It even misreads comments and stuff. 00:49:39.120 |
So, you know, I do think there's definitely room for, you know, a language of which Julia would 00:49:47.520 |
certainly be the leading contender at the moment to come in and do it properly. And Julia's got, 00:49:54.800 |
you know, Julia is written on a Scheme basis. So there's this little Scheme kernel 00:50:01.440 |
that does the parsing and whatnot. And then pretty much everything else after that is written in 00:50:06.560 |
Julia. And, of course, leveraging LLVM very heavily. But I think that's what we want, right? 00:50:14.000 |
That's something which I guess I didn't love about Swift. When the team at Google wanted to 00:50:19.840 |
add differentiation support into Swift, they wrote it in C++. And I was just like, that's not a good 00:50:26.960 |
sign. You know, like, apart from anything else, you end up with this group of developers who are, 00:50:35.040 |
in theory, Swift experts, but they actually write everything in C++. And so they actually don't have 00:50:40.800 |
much feel for what it's like to write stuff in Swift. They're writing stuff for Swift. And Julia, 00:50:45.760 |
pretty much everybody who's writing stuff for Julia is writing stuff in Julia. And I think that's 00:50:52.880 |
something you guys have talked about around APL and J as well, is that there's the idea of writing 00:50:59.920 |
J things in J and APL things in APL is a very powerful idea. 00:51:08.240 |
Yeah, sorry, go on. I just remembered your third question. I'll come back to it. 00:51:12.320 |
Oh, you asked me why now am I coming back to APL and J, which is 00:51:16.160 |
totally orthogonal to everything else we've talked about, which is I had a daughter, 00:51:23.520 |
she got old enough to actually start learning math. So she's six. 00:51:27.680 |
And oh, my God, there's so many great educational apps nowadays. There's one called Dragonbox 00:51:36.800 |
Algebra. It's so much fun. Dragonbox Algebra five plus. And it's like five plus algebra, 00:51:42.640 |
like what the hell? So when she was, I think actually still four, you know, 00:51:46.640 |
I let her play with Dragonbox Algebra five plus. And she learned Algebra, you know, by helping 00:51:52.080 |
Dragon eggs hatch. And she liked it so much, I let her try doing Dragonbox Algebra 12 plus. 00:52:00.480 |
And she loved that as well and finished it. And so suddenly I had a five year old kid that liked 00:52:05.440 |
Algebra. Much to my surprise. Kids really can surprise you. And so, yeah, she struggled with 00:52:16.320 |
a lot of the math that they were meant to be doing at primary school, like, 00:52:20.880 |
like division and multiplication, but she liked Algebra. And we ended up homeschooling her. 00:52:28.240 |
And then one of our, her best friend is also homeschooled. So this, this year I decided I'd 00:52:35.440 |
try tutoring them in math together. And so my daughter's name's Claire, and 00:52:44.400 |
her friend Gabe discovered on his Mac the world of alternative keyboards. So he would 00:52:49.280 |
start typing in the chat in, you know, Greek characters or Russian characters. And one day 00:52:55.760 |
I was like, okay, check this out. So I like typed in some APL characters and they were just like, 00:53:01.520 |
wow, what's that? We need that. So initially we installed Dyalog APL so that they could 00:53:08.480 |
type APL characters in the chat. And so I explained to them that this is actually 00:53:16.000 |
this like super fancy math that you're typing in. And they really wanted to try it. So, 00:53:22.480 |
and that was at the time I was trying to teach them sequences and series, 00:53:28.800 |
and they were not getting it at all. It was my first total failure as a math tutor 00:53:35.440 |
with them, you know, they'd been zipping along, fractions, you know, greatest common denominator, 00:53:42.240 |
factor trees. Okay, everything's fine. It makes sense. And then we hit sequences and series. And 00:53:47.040 |
it's just like, they had no idea what I was talking about. So we put that aside. Then we spent like 00:53:55.280 |
three one hour lessons doing the basics of APL, you know, the basic operations and doing stuff 00:54:03.840 |
with lists and dyadic versus monadic, but still, you know, just primary school level math. 00:54:11.360 |
And we also did the same thing in NumPy using Jupyter. And they really enjoyed all that, 00:54:16.080 |
like they were more engaged than our normal lessons. And so then we came back to like, 00:54:23.200 |
you know, sigma i equals one to five of i squared, whatever. And I was like, okay, 00:54:29.680 |
that means this, you know, in APL and this in NumPy. And they're like, oh, is that all? 00:54:38.720 |
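
(For reference, the two renderings were presumably something like the following; the exact expressions are my assumption:)

    import numpy as np

    # "Sigma from i = 1 to 5 of i squared" as one array expression,
    # roughly  +/(⍳5)*2  in APL; in NumPy:
    print((np.arange(1, 6) ** 2).sum())   # 55
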
Fine. Whereas, you know, before, yeah, that was a problem, this idea of like Tn equals Tn 00:54:45.680 |
minus one plus blah, blah, blah, blah. It's like, what is this stuff? But when you're actually 00:54:50.160 |
indexing real things and can print out the intermediate values and all that, and you've 00:54:56.480 |
got iota or arange, they were just like, oh, okay. You know, I don't know why you explained it this 00:55:03.440 |
dumb way before. And I will say, given a choice between doing something on a whiteboard or doing 00:55:09.760 |
something in NumPy or doing something in APL, now they will always pick APL because the APL version 00:55:15.760 |
is just so much easier. You know, there's less to type, there's less to think about, 00:55:20.800 |
there's less boilerplate. And so it's been, it's only been a few weeks, but like yesterday, 00:55:26.240 |
we did the power operator, you know, and so we literally started doing the foundations of 00:55:32.320 |
metamathematics. So it's like, okay, let's create a function called capital S, capital S arrow, 00:55:38.880 |
you know, plus jot one, right? So for those Python people listening, jot is, 00:55:46.400 |
if you give it an array or a scalar, it's the same as partial in Python or bind in C++. 00:55:59.920 |
So, okay, we've now got something that adds one to things. Okay. I said, okay, 00:56:02.800 |
this is called the successor function. And so I said to them, okay, what would happen if we go 00:56:06.960 |
SSS zero? And they're like, oh, that would be three. And so I said, okay, well, what's, 00:56:14.400 |
what's addition? And then one of them's like, oh, it's, it's repeated S. I'm like, yeah, 00:56:19.520 |
it's repeated S. So how do we say repeated? So in APL, we say repeated by using this 00:56:24.720 |
star dieresis. It's called power. Okay. So now we've done that. What is multiplication? 00:56:30.800 |
And then one of them goes after a while. Oh, it's repeated addition. So we define addition, 00:56:36.880 |
and then we define multiplication. And then I'm like, okay, well, what about, you know, exponent? 00:56:43.440 |
Oh, that's just, now this one, they've heard a thousand times. They both are immediately like, 00:56:47.760 |
oh, that's repeated multiplication. So like, okay, we've now defined that. And then, okay, well, 00:56:52.640 |
subtraction, that's a bit tricky. Well, it turns out that subtraction is just, you know, is the 00:56:58.160 |
opposite of something. What's it the opposite of? They both know that. Oh, that's the opposite of 00:57:01.680 |
addition. Okay. Well, opposite of, which in math, we call inverse is just a negative power. So now 00:57:08.480 |
we define subtraction. So how would you define division? Oh, okay. How would you define roots? 00:57:13.600 |
Oh, okay. So we're kind of, like, you know, designing the foundations of mathematics here in APL, 00:57:22.560 |
you know, with a six year old and an eight year old. And during this whole thing at one point, 00:57:27.840 |
we're like, okay, well, now I can't remember why, but we're like, okay, now we got to do one divided 00:57:32.000 |
by a half. And they're both like, we don't know how to do that. So, you know, in APL, this stuff that's 00:57:38.880 |
considered like college level math suddenly becomes easy. And, you know, at the same point, stuff that's still 00:57:45.360 |
primary school level math, like one divided by a half, is considered hard. So it definitely made 00:57:50.720 |
me rethink, you know, what is easy and what is hard and how to teach this math stuff. And I've 00:57:58.880 |
been doing a lot of teaching of math with APL and the kids are loving it. And I'm loving it. And 00:58:04.480 |
that's actually why I started this study group, which will be on today, as we record this, 00:58:11.680 |
or a few days ago, as you put it out there. As I kind of started saying on Twitter to people like, 00:58:17.920 |
oh, it's really been fun teaching my kids, you know, my kid and a friend math using APL and a lot of 00:58:23.120 |
adults were like, ah, can we learn math using APL as well? So that's what we're going to do. 00:58:32.320 |
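(A sketch of that whole construction in Python, for readers following along; the helper names repeat, add_, mul_ and pow_ are mine, not from the episode, and repeat stands in for APL's power operator:)

    from functools import partial
    import operator

    S = partial(operator.add, 1)        # successor, as above: S <- + jot 1

    def repeat(f, n):
        """Apply f n times: a stand-in for APL's power operator (star-dieresis)."""
        def g(x):
            for _ in range(n):
                x = f(x)
            return x
        return g

    add_ = lambda a, b: repeat(S, b)(a)                  # addition is repeated successor
    mul_ = lambda a, b: repeat(partial(add_, a), b)(0)   # multiplication is repeated addition
    pow_ = lambda a, b: repeat(partial(mul_, a), b)(1)   # exponentiation is repeated multiplication

    add_(2, 3), mul_(2, 3), pow_(2, 3)   # (5, 6, 8)
    # APL's power operator also accepts a negative count to run a function's
    # inverse, which is how subtraction, division and roots fall out; plain
    # Python can't derive inverses automatically, so they're omitted here.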
Well, and that's the whole notation thing, isn't it? It's the notation you get away from the 00:58:36.000 |
sigmas and the pis and all that, you know, subscripts. I know, right? This is exactly 00:58:40.560 |
what Iverson wanted. Yeah, exactly. I mean, who wants this, you know, why should capital pi be 00:58:47.440 |
product and capital sigma be sums? Like, you know, we did plus slash and it's like, okay, 00:58:54.320 |
how do we do product? They're like, oh, it's obviously times slash. And I show them backslash, 00:58:58.000 |
it's like, how do we do a cumulative product? And so it's obviously times backslash. Yeah, 00:59:02.960 |
this stuff. But, you know, a large group of adults can't handle this, because I'll put stuff 00:59:09.040 |
on Twitter. I'll be like, here's a cool thing in APL. And like half the replies will be like, 00:59:13.440 |
well, that's line noise. That's not intuitive. It's like, how can you say that? It's this classic 00:59:21.280 |
thing that I've always said: it's the difference between, is it that you don't 00:59:25.520 |
understand it, or is it that it's hard? And, you know, kids don't know; for kids, everything's new. 00:59:32.720 |
So that, you know, they see something they've never seen before. They're just like, teach me 00:59:36.640 |
that. Whereas adults, or at least a good chunk of adults, are just like, I don't immediately understand 00:59:42.000 |
that. Therefore, it's too hard for me. Therefore, I'm gonna belittle the very idea of the thing. 00:59:47.760 |
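(For the NumPy-minded, a sketch of the reductions and scans Jeremy describes, including the sigma-of-i-squared example from earlier; nothing here beyond plain NumPy:)

    import numpy as np

    v = np.arange(1, 6)   # APL: iota 5 -> 1 2 3 4 5
    v.sum()               # +/v, plus-reduce   -> 15
    v.prod()              # ×/v, times-reduce  -> 120
    np.cumsum(v)          # +\v, plus-scan     -> [ 1  3  6 10 15]
    np.cumprod(v)         # ×\v, times-scan    -> [  1   2   6  24 120]
    (v ** 2).sum()        # +/(iota 5)*2, sigma i=1..5 of i squared -> 55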
I did a tacit program, a one-liner, on APL Farm the other day. And somebody said, 00:59:54.160 |
that looks like Greek to me. I said, well, Greek looks like Greek to me, because I don't know Greek. 00:59:58.640 |
I mean, sure. If you don't know it, absolutely, it looks silly. But if you know it, then it's, 01:00:04.480 |
it's not that hard. Yeah, I will say like, you know, a lot of people have put a lot of hard work into 01:00:12.160 |
resources for APL and J teaching. But I think there's still a long way to go. And one of the 01:00:20.400 |
challenges is, it's like when I was learning Chinese: I really liked the idea of 01:00:26.240 |
learning new Chinese words by looking them up in a Chinese dictionary. But of course, I didn't know 01:00:31.280 |
what the characters in the dictionary meant. So I couldn't look them up. So when I learned Chinese, 01:00:35.840 |
I really spent the first 18 months just focused on learning characters. So I got through 6000 01:00:41.760 |
characters in 18 months of very hard work. And then I could start looking things up in 01:00:47.040 |
the dictionary. My hope is to do a similar thing for APL, like for these study groups, 01:00:53.120 |
I want to try to find a way to introduce every glyph in an order that never refers 01:01:01.040 |
to glyphs you haven't learned yet. Like that's something I don't feel like we really have. And 01:01:05.280 |
so that then you can look up stuff in the Dyalog documentation. Because now, still, I don't know 01:01:11.520 |
that many glyphs. So like most of the stuff in the documentation, I don't understand because it 01:01:17.680 |
explains glyphs using glyphs I don't yet know. And then I look those up, and those, in turn, 01:01:21.840 |
explain things with glyphs I don't yet know. So, you know, step one for me is I think we're just 01:01:27.120 |
going to go through and try to teach what every glyph is. And then I feel like we should be able 01:01:32.480 |
to study this better together, because then we could actually read the documentation. You know, 01:01:38.080 |
will you publish these sessions online? Yeah, so the study group will be recorded as videos. 01:01:45.840 |
But I also then want to actually create, you know, written materials using Jupyter, 01:01:52.320 |
which I will then publish. That's my goal. So what you said very much resonates with me, 01:01:58.720 |
that when teaching people, I often find myself in this bind: that to explain everything, 01:02:06.240 |
I need to already have everything explained. And I think, especially, it comes down to, 01:02:12.960 |
in order to explain what many of these glyphs are doing, I need some fancy arrays. If I restrict 01:02:18.400 |
myself to simple vectors and scalars, then I can't really show their power. And I cannot create these 01:02:24.800 |
higher rank arrays without already using those glyphs. And so, hopefully... there is this long running 01:02:30.960 |
project, since like 2015 I think it is, to add a literal array notation to APL. 01:02:37.840 |
And then there is a way in, then you can start by looking at an array, and then you can start 01:02:45.280 |
manipulating and see the effects of the glyphs and intuit from there what they do. 01:02:49.680 |
Yeah, no, I think that'll be very, very helpful. And in the meantime, you know, 01:02:54.160 |
my approach with the kids has just been to teach rho quite early on. So rho is the equivalent of 01:03:00.560 |
reshape in NumPy and most Python libraries. And yeah, so once you know how to reshape, 01:03:09.440 |
you can start with a vector and shape it to anything you like. And it's, you know, 01:03:13.120 |
it's not a difficult concept to understand. So I think that yeah, basically, the trick at the 01:03:17.200 |
moment is just to say, okay, in our learning of the dictionary of APL, one of the first things 01:03:22.240 |
we will learn is rho. And that was really fun with the kids, doing monadic rho, you know, 01:03:29.760 |
to be like, okay, well, what's rho of this? What's rho of that? And okay, what's rho of rho of this? 01:03:34.880 |
And then what's rho of rho of rho, which then led me to the Stallman poem about, 01:03:44.240 |
what is it, rho, rho, rho is one, etc, etc, which they loved as well. 01:03:52.160 |
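(The same game in NumPy terms, for readers without an APL handy; a sketch only:)

    import numpy as np

    m = np.arange(1, 13).reshape(3, 4)   # dyadic rho: 3 4 rho iota 12
    np.shape(m)                          # monadic rho: rho m -> (3, 4)
    np.shape(np.shape(m))                # rho rho m -> (2,), the rank
    np.shape(np.shape(np.shape(m)))      # rho rho rho m -> (1,): always one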
Yeah, we'll link that in the show notes. Also, too, while you were saying all that, 01:03:56.480 |
that really resonated with me. When I first started learning APL, like, one of the first 01:04:03.200 |
things that happened was, I was like, okay, you can, you can fold, you can map. So like, 01:04:08.640 |
how do you filter, you know, what are the classic, you know, three functional things? And the problem 01:04:13.120 |
with APL and array languages is they don't have an equivalent filter that takes a predicate function, 01:04:18.240 |
they have a filter that is called compress that takes a mask that, you know, drops anything that 01:04:23.520 |
corresponds to a zero. And it wasn't until a few months later that I ended up discovering it. But 01:04:28.240 |
for both APL and the newer APL, BQN, there's these two sites. Adám was the one that wrote the APL one, 01:04:35.440 |
aplcart.info, and bqncrate.info also, I think. And so you can basically 01:04:41.200 |
semantically search for what you're trying to do. And it'll give you small expressions that do that. 01:04:46.560 |
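(A sketch of the mask-based compress Connor describes, next to the predicate-based filter most languages use; plain NumPy and the standard library:)

    import numpy as np

    v = np.array([3, 1, 4, 1, 5, 9])
    mask = v > 2              # a boolean mask: [ True False  True False  True  True]
    v[mask]                   # compress, (v>2)/v in APL -> [3 4 5 9]

    # the predicate style, for contrast:
    list(filter(lambda x: x > 2, [3, 1, 4, 1, 5, 9]))   # [3, 4, 5, 9]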
So if you type in the word filter, which is what you would call it coming from, you know, 01:04:52.000 |
a functional language, or even I think Python calls it filter, you can get a list of small 01:04:57.440 |
expressions. And really, really often, sometimes you need to know the exact thing that it's called, 01:05:03.280 |
like one time I was searching for, you know, all the combinations or permutations. And really, 01:05:07.280 |
what I was looking for was power set. And so until you have that, you know, the word power set, 01:05:11.920 |
it's, you know, it's a fuzzy search, right? So but it's still a very, very useful tool when it's like 01:05:18.000 |
you said, you're trying to learn something like Chinese. And it's like, well, where do I even 01:05:21.040 |
start; I don't know the language, the words to search for. But yeah, I agree 01:05:29.520 |
that there's a lot of room for improvement in how to onboard people without them immediately going, 01:05:35.520 |
like you said, this looks like hieroglyphics, which I think Iverson considered a compliment, 01:05:39.760 |
like there's some anecdote I've heard where someone was like, this is hieroglyphics. And he says, 01:05:42.960 |
yes, exactly. And then the other thing like that I want to do is help in particular Python programmers 01:05:52.080 |
and maybe also do something for JavaScript programmers, which are the two most popular 01:05:55.680 |
languages, like at the moment, like a lot of the tutorials for stuff like J or whatever, 01:06:01.680 |
like J for C programmers, you know, great book, but most people aren't C programmers. And also 01:06:07.680 |
a lot of the stuff like, you know, it'd be so much easier if somebody just like said to me early on, 01:06:14.000 |
oh, you know, it's just the same as partial in Python. Or it's like, you know, putting things 01:06:23.040 |
in a box: what the hell's a box? If somebody basically said, oh, it's basically the same 01:06:26.400 |
as a reference, it's like, oh, okay. You know, I think in one of your podcasts, somebody said, 01:06:30.720 |
oh, it's like void stars. Oh, yeah, okay. You know, there's kind of this lack of just saying, 01:06:36.160 |
like, this is actually the same thing as in Python and JavaScript. So I do want to do some kind of 01:06:42.320 |
yeah, mapping, yeah, like that, particularly for kind of NumPy programmers and stuff, because a 01:06:50.080 |
lot of it's so extremely similar. Be nice to kind of say like, okay, well, this is, you know, J 01:06:56.960 |
maps things over leading axes, which is exactly the same as NumPy, except it doesn't have trailing 01:07:02.240 |
axes. So if you know the NumPy rules, you basically know the J rules. Yeah, I think at the 01:07:09.520 |
basic level, you're absolutely right. And that would certainly be really useful. When we've 01:07:14.080 |
talked this over before, some of the challenges are in the flavors and the details. If you send 01:07:21.040 |
somebody down the wrong road with a metaphor that almost works in some of these areas, it can really 01:07:26.560 |
be challenging for them, because they see it, you know, through the lens of their experience. 01:07:33.520 |
But that lens would say, in this area, it works one way, when it actually works differently. So there is a 01:07:39.760 |
challenge in that. And we find it even between APL, BQN and J. I'm trying to think of what we were 01:07:46.240 |
talking about. Oh, it was transpose: the languages' dyadic transposes, they 01:07:51.600 |
handle them differently. Functionally, you can do the same things, but you have to be 01:07:56.400 |
aware that they are going to do it differently, according to the language. Absolutely. But that's 01:08:01.520 |
not a reason to throw out the analogy, right? Like, I think everybody agrees that it's easier for 01:08:06.480 |
an APL programmer to learn J, than for a C or JavaScript programmer to learn J, you know, 01:08:14.800 |
because there are some ideas you understand. And you can actually say to people like, okay, well, 01:08:19.760 |
this is the rank conjunction in J. And you may recognize this as being like the rank, you know, 01:08:24.320 |
operator in APL. So if we can do something like that and say like, oh, well, okay, this would do 01:08:29.840 |
the same thing as, you know, dot permute, dot blah in PyTorch. It's like, okay, I see it. 01:08:40.000 |
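(A rough NumPy illustration of those two analogies, leading-axis mapping and permute; a sketch, not a full account of either language's rank rules:)

    import numpy as np

    a = np.arange(24).reshape(2, 3, 4)
    a.sum(axis=0).shape          # reduce over the leading axis -> (3, 4)
    a.transpose(2, 0, 1).shape   # like x.permute(2, 0, 1) on a PyTorch tensor -> (4, 2, 3)
    # One caveat behind the analogy: J's rank machinery frames arrays from the
    # leading axes, while NumPy broadcasting aligns shapes from the trailing axes.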
Well, as the maintainer of APLcart, I'd like to throw in a little call to the listeners. Like 01:08:45.920 |
what Connor mentioned, I do fairly often get people saying, well, I couldn't find this, and I 01:08:51.200 |
ask them, what did you search for? So do let me know, contact me by whatever means, if you 01:08:56.080 |
couldn't find something, either because it's altogether missing, and I might be able to add it, 01:08:59.840 |
or tell me what you searched for and couldn't find, or maybe you found it later by searching for 01:09:04.720 |
something else. And I'll add those keywords for future users. And I have put in a lot of like 01:09:11.200 |
function names from other programming languages so that you can search for those and find the 01:09:15.920 |
APL equivalent. Yeah, I will say, I feel like either I'm not smart enough to use aplcart.info, 01:09:24.320 |
or I haven't got the right tutorial yet. Because I, I went there, I've been there a few times. 01:09:30.240 |
And there's this, like, whole lot of impressive looking stuff. And I just, I don't know 01:09:36.400 |
what to do with it. And then I sometimes click things and it sends me over to this tio.run that 01:09:40.880 |
tells me, like, real time 0.02 seconds, code... like, I find it, you know... I 01:09:50.080 |
don't yet know how to use it. And so, you know, I guess, hearing you guys say 01:09:57.520 |
this is a really useful tool that a lot of people put a lot of time into, I should obviously invest 01:10:02.160 |
time learning how to use it. And maybe after doing that, I should explain to people how to use it. 01:10:07.840 |
I do have a video on it. And there's also a little question mark icon one can click on to get help. 01:10:12.960 |
I have tried the question mark icon as well. As I say, it might just be me; you know, I think this often 01:10:21.120 |
happens with APL stuff. I often hit things and I feel like maybe I'm not smart enough to understand 01:10:25.840 |
this. Clearly, don't think that; we'd disagree. Yeah, I do recall you saying a few minutes ago 01:10:37.440 |
that you managed to teach your, you know, four year old daughter, like, grade 12, or age 12, algebra. 01:10:43.200 |
No, I didn't. I just gave her the app, right? It's like it's I've heard other parents have given it 01:10:49.600 |
to their kids. They all seem to handle it. It's it's just this fun game where you hatch dragon eggs 01:10:54.240 |
by like dragging things around on the iPad screen. And it just it so happens that the things you're 01:10:59.120 |
doing with dragon's eggs are the rules of algebra. And after a while, it starts to switch out some of 01:11:06.320 |
the like monsters with symbols like x and y, you know, and it does it gradually, gradually. And at 01:11:12.400 |
the end, it's like, oh, now you're doing algebra. So I can't get any credit for that. That's 01:11:17.600 |
some very, very clever people wrote a very cool thing. It really is an amazing program. I homeschooled 01:11:23.120 |
my son as well. And we used that for algebra. Great. Yeah, it was a bit more age appropriate, 01:11:28.160 |
but I looked at that and said, that really is well put together. It's an amazing 01:11:35.120 |
program. I will say there'll be a DragonBox APL one day. It's not a bad idea. Not a bad idea at all. 01:11:43.920 |
I was going to say when you're teaching somebody, one of the big challenges when you're sort of 01:11:47.360 |
trying to get a language across to a general audience is who is the audience? Because as you 01:11:53.440 |
say, if you're if you're dealing with kids or people who haven't been exposed to programming 01:11:58.640 |
before, that's a very different audience than somebody who might have been exposed to some other 01:12:03.600 |
type of programming. Functional programming is a bit closer, but if you're a procedural programmer 01:12:08.480 |
or imperative programmer, it's going to be a stretch to try and bend your mind in the different 01:12:13.120 |
ways that, you know, APL or J or BQN expect you to think about things. Yeah, I think the huge rise 01:12:20.800 |
of functional programming is very helpful for coming to array programming, you know, 01:12:26.400 |
both in JavaScript and in Python. It's, you know, I think most people are doing stuff, 01:12:34.240 |
particularly in the machine learning and deep learning world, are doing a lot of functional 01:12:38.480 |
stuff; often that's the only way you can do things, particularly in deep learning. So I think, yeah, 01:12:44.240 |
I think that does help a lot. Like, like Connor said, like you've probably come across, you know, 01:12:49.360 |
map and reduce and filter and certainly in Python, you'll have done list comprehensions and dictionary 01:12:56.880 |
comprehensions. And a lot of people have done SQL. So it's, yeah, I think a lot of people come into it 01:13:04.720 |
with some relevant analogies, if we can help connect for them. Yeah, one of the things that, 01:13:12.720 |
you know, this really is reinforcing my idea that, or it's not my idea, I think it's just an idea 01:13:19.840 |
that multiple people have had, but the tool doesn't exist yet. Because we'll link to some 01:13:25.760 |
documentation that I use frequently when I'm going sometimes between APL and J on the BQN website, 01:13:31.280 |
they have BQN to dialogue APL dictionaries and BQN to J dictionaries. So sometimes I'll like, 01:13:38.320 |
if I'm trying to convert between the two, the BQN docs are so good. I'll just use BQN as like an 01:13:43.040 |
IR to go back and forth. But I've mentioned on previous podcasts that really what would be amazing 01:13:48.480 |
and it would only work to a certain extent is something like a multidirectional array language 01:13:55.040 |
transpiler and adding NumPy to that list would probably be, you know, a huge, I don't know what 01:14:00.960 |
the word for it is, but beneficial for the array community. If you can type in some NumPy expression, 01:14:06.240 |
you know, like I said, it's only gonna work to an extent, but for simple, you know, rank one vectors 01:14:10.960 |
or arrays that you're just reversing and summing and doing simple, you know, reduction and scan 01:14:15.840 |
operations, you could translate that pretty easily into APL, J and BQN. And I think that would 01:14:22.560 |
make it so much easier for people to understand, aka the hieroglyphics or the Greek or the Chinese 01:14:28.080 |
or whatever metaphor you want to use. Because yeah, this is, it is definitely challenging at times 01:14:34.640 |
to get to a certain point where you have enough info to keep the snowball rolling, if you will. 01:14:39.760 |
And it's very easy to hit a wall early on. Yeah. That's a project I've been thinking about is 01:14:47.280 |
basically rewrite NumPy in APL. It doesn't seem like a whole lot of work: you just take all those 01:14:55.600 |
names that are available in NumPy and just define them as APL functions. And people can explore that 01:15:00.480 |
by opening them up and seeing how they're defined. Oh, so not actually... you're saying, like, 01:15:07.120 |
it wouldn't be a new thing. You're just saying, like, rename the symbols to what they're known as 01:15:12.560 |
in NumPy, so that you'd still be in, like, an APL. Yeah. I mean, you could use it as a library, 01:15:19.440 |
but I was thinking of it more as an interactive exploring type thing, where you open up this 01:15:24.320 |
library and then you write the name of some NumPy functionality and open it up in the 01:15:34.640 |
editor and see, well, how is this defined in APL? And then you could use it obviously, since it's 01:15:40.320 |
defined. Interesting. Then you could slowly, you could use these library functions. And then as 01:15:47.360 |
you get better at APL, you can start actually writing out the raw APL instead of using these 01:15:52.080 |
covers for it. Well, I guess, Jeremy, that's interesting. Do you think that, because you've 01:15:57.680 |
mentioned about sort of the notation versus the programming language and where do you think the, 01:16:04.400 |
like, in your dream scenario, are you actually coding in sort of an Iversonian like notation? 01:16:11.280 |
Or is it at the end of the day, does it still look like NumPy, but it's just all of the expressivity 01:16:19.280 |
and power that you have in the language like APL is brought to and combined with what NumPy 01:16:25.600 |
sort of currently looks like? I mean, well, it'd be a bit of a combination, Connor, in that, like, 01:16:30.400 |
you know, my classes and my type dispatch and my packaging and, you know, all the, you know, 01:16:40.800 |
my function definitions and whatever, that's Python. But, you know, everywhere I can use 01:16:49.040 |
plus and times and divide and whatever, I could also use any APL glyph. And so it'd be, you know, 01:16:59.760 |
basically an embedded DSL for kind of high dimensional notation. It would work automatically 01:17:09.360 |
on NumPy arrays and TensorFlow tensors and PyTorch tensors. I mean, one thing that's interesting is, 01:17:16.480 |
to a large degree, APL and PyTorch and friends have actually arrived at a similar place 01:17:27.120 |
with the same, you know, grandparents, which is, Iverson actually said his inspiration 01:17:36.640 |
for some of the APL ideas was tensor analysis. And a lot of the folks, as you can gather from 01:17:43.440 |
the fact that in PyTorch, we don't call them arrays, we call them tensors. A lot of the folks 01:17:47.280 |
working on deep learning, their inspiration was also from tensor analysis. So it comes from 01:17:51.600 |
physics, right? And so I would say, you know, a lot more folks who have worked on PyTorch were 01:17:57.280 |
familiar with tensor analysis and physics than were familiar with APL. And then, of course, 01:18:03.680 |
there's been other notations, like explicitly based on Einstein notation, there's a thing 01:18:10.080 |
called einops, which, it's a very interesting kind of approach of taking Einstein 01:18:15.280 |
notation much further. And like Einstein notation, if you think about it, is the kind of the loop 01:18:21.120 |
free programming of math, right? The equivalent of loops in math is indices. And Einstein notation 01:18:28.240 |
does away with indices. And so that's why stuff like einops is incredibly powerful, because you can 01:18:33.840 |
write, you know, an expression in INOPS with no indices and no loops. And it's all implicit 01:18:42.160 |
reductions and implicit loops. I guess, yeah, my ideal thing would be, we wouldn't have to use einops, 01:18:49.520 |
we can use APL, you know, and it wouldn't be embedded in a string. They would actually be 01:18:55.680 |
operators. Yeah, that's what it is. They'd be operators in the language. The Python operators 01:19:00.160 |
would not just be plus, times, minus, slash; all the APL glyphs would be Python 01:19:12.320 |
operators. And they would work on all Python data types, including all the different tensor and 01:19:18.000 |
array data types. Interesting. Yeah. So it sounds like you're describing a kind of hybrid language. 01:19:24.960 |
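(For reference, the einops style mentioned above looks roughly like this; the shapes are made up for illustration:)

    import numpy as np
    from einops import rearrange, reduce   # pip install einops

    x = np.zeros((8, 3, 32, 32))                 # batch, channels, height, width
    reduce(x, 'b c h w -> b c', 'mean').shape    # implicit loops and reduction -> (8, 3)
    rearrange(x, 'b c h w -> b (h w) c').shape   # reindex without writing an index -> (8, 1024, 3)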
JavaScript too. I would love the whole DSL to be in JavaScript as well. You know, 01:19:28.720 |
that'd be great. And I feel like I saw that somewhere. I feel like I saw somebody actually 01:19:34.640 |
do an ECMAScript, you know, RFC with an implementation. Yeah, it was an April Fools joke. 01:19:44.240 |
Yeah, but it actually worked, didn't it? Like, it's just there was actually an implementation. 01:19:48.800 |
I don't think they had the implementation. It was just very, very well-specced. It could 01:19:54.480 |
actually work kind of thing. No, I definitely read the code. I don't know how complete it was, 01:19:59.920 |
but there was definitely some code there. I can't find it again. If you know where it is. 01:20:04.080 |
There's a JavaScript implementation of APL by Nick Nickolov. But my problem with it, 01:20:12.480 |
it's not tightly enough connected with the underlying JavaScript. 01:20:17.280 |
It should be an April Fools joke, shouldn't it? It's like, Gmail was an April Fools joke, 01:20:24.000 |
right? Gmail came out on April 1st and totally destroyed my plans for FastMail because it was 01:20:29.440 |
an April Fools joke that was real. And Flask, you know, the Flask library, I think, was originally 01:20:35.040 |
an April Fools joke. We shouldn't be using frameworks because I created a framework that's 01:20:40.480 |
so stupidly small that it shouldn't be a framework. And now that's the most popular web framework in 01:20:45.120 |
Python. So, yeah, maybe this should be an April Fools joke that becomes real. 01:20:52.000 |
How close? This is maybe an odd question, but because from what I know about Julia, 01:20:56.800 |
you can define your own Unicode operators. And I did try at one point to create a small 01:21:05.760 |
composition of two different symbols, you know, square root and reverse or something, 01:21:11.600 |
and it ended up not working and asking me for parentheses. But do you think Julia could evolve 01:21:17.360 |
to be that kind of hybrid language? Maybe. I'm actually doing a keynote at JuliaCon in a couple 01:21:26.320 |
of weeks, so maybe I should raise that. Just at the Q&A section, say, any questions? But first, 01:21:34.800 |
I've got one for the community at large. Here's what I'd like. I think my whole talk is going to 01:21:38.880 |
be kind of like what Julia needs to be, you know, to move to the next level. I'm not sure I can 01:21:45.840 |
demand that a complete APL implementation is that thing, but I could certainly put it out there as 01:21:50.320 |
something to consider. It always bothers me, though, that if you try to extend those languages 01:21:57.200 |
like this or you could do some kind of pre-compiler for it, then their order of execution ends up 01:22:05.440 |
messing up APL. I think APL very much depends on having a strict one-directional order of functions, 01:22:12.800 |
otherwise it's hopeless to keep track of. That is a big challenge because currently 01:22:18.880 |
the DSL inside Python, which is the basic mathematical operations, does have the BODMAS 01:22:26.720 |
or PEMDAS order of operations. So there would need to be some way. So in Python, that wouldn't be 01:22:32.960 |
too hard, actually, because in Python, you can opt into different kinds of parsing by adding a 01:22:42.320 |
from __future__ import whatever. You could have a from __future__ import APL precedence. 01:22:49.360 |
And then from then on, everything in your file is going to use right-to-left precedence. 01:22:54.480 |
That's really interesting and cool. I didn't know that. 01:23:00.400 |
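(For the record: the APL-precedence future is Jeremy's hypothetical; no such flag exists today. But the mechanism is real, and a minimal sketch of an existing per-file flag shows how it works:)

    # __future__ imports change how this one file is compiled.
    from __future__ import annotations   # annotations stop being evaluated at def time

    def f(x: NotDefinedAnywhere) -> AlsoNotDefined:   # fine under the future import
        return x + 1

    print(f(41))   # 42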
Yeah, that's awesome. I've been spending a lot of time thinking about 01:23:08.240 |
function precedence and just the differences in different languages. I'm not sure if any other 01:23:13.760 |
languages have this, but something that I find very curious about BQN and APL is that they have 01:23:19.920 |
functions, basically, that have higher precedence than other functions. So operators in APL, and 01:23:27.920 |
conjunctions and adverbs in J, have higher precedence than your regular functions that apply to arrays. 01:23:35.200 |
I'm simplifying a tiny bit, but this idea that in Haskell, function application always has 01:23:40.880 |
the highest precedence. You can never get anything that has a higher function precedence than that. 01:23:46.080 |
And it always, having stumbled into the array world now, it seems like a very powerful thing 01:23:51.360 |
that these combinator-like functions don't have just by default the higher precedence. Because if 01:23:56.160 |
you have a fold or a scan or a map, you're always combining that with some kind of binary operation 01:24:01.760 |
or unary operation to create another function that you're then going to eventually apply to 01:24:05.600 |
something. But the basic right to left, putting aside the higher order functions or operators, 01:24:15.840 |
as they're known in APL, the basic right to left path, again, for teaching and for my own brain, 01:24:22.240 |
gosh, that's so much nicer than in C++. Oh my God, the operator precedence there; 01:24:30.320 |
there's no way I can ever remember that. And there's a good chance when I'm reading somebody 01:24:34.800 |
else's code that they haven't used parentheses because they didn't really need them and that I 01:24:40.160 |
have no idea where they have to go and then I have to go and look it up. It's another of these things 01:24:45.200 |
that with the kids, I'm like, okay, you remember that stuff we spent ages on about like, first you 01:24:51.040 |
do exponents and then you do times. It's like, okay, you don't have to do any of that in APL. 01:24:56.160 |
You just go right to left and they're just like, oh, that's so much better. 01:24:59.680 |
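(A toy sketch of that evaluation rule, to make the difference concrete; eval_rtl is a made-up helper handling only flat, space-separated dyadic expressions, with no parentheses or monadic functions:)

    import operator

    OPS = {'+': operator.add, '-': operator.sub, '×': operator.mul, '÷': operator.truediv}

    def eval_rtl(expr):
        """Evaluate a flat expression strictly right to left, APL style."""
        toks = expr.split()
        value = float(toks[-1])
        for i in range(len(toks) - 2, 0, -2):   # walk leftward over operator/operand pairs
            value = OPS[toks[i]](float(toks[i - 1]), value)
        return value

    eval_rtl('2 × 3 + 4')   # 14.0 -- APL reads it as 2 × (3 + 4)
    2 * 3 + 4               # 10   -- school precedence multiplies first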
This literally came up at work like a month ago, where I was giving this mini APL talk; we had 10 minutes 01:25:07.440 |
at the end of a meeting, and then I just made this offhand remark that, of course, the evaluation 01:25:11.680 |
order in APL is a much simpler model than what we learned in school. And I upset them; there were, 01:25:16.960 |
I don't know, 20 people in the meeting, and it was the most controversial thing I had said. 01:25:23.200 |
I almost had like an out of body experience because I thought I was saying something that 01:25:27.360 |
was like objectively just true. And then I was like, wait a second, what am I clearly missing, 01:25:32.720 |
like, is there something? Yeah, well, you were wrong. Like, how do you communicate? No, I mean, 01:25:36.480 |
most adults are incapable of, like, new ideas. That's what I should have said 01:25:44.240 |
in the meeting. I mean, this is a reason, another reason I like doing things like 01:25:50.400 |
APL study groups, because it's a way of like self-selecting that small group of humanity who's 01:25:55.280 |
actually interested in trying new things, despite the fact that they're grownups, and then try to 01:25:59.920 |
surround myself with those people in my life. But isn't it sad then? I mean, what has happened 01:26:04.560 |
to those grownups? Like when you mentioned teaching these people and trying to like, 01:26:08.240 |
map their existing knowledge onto APL things, what does it mean to box and so on? I find that 01:26:12.640 |
to children and non-programmers, expanding their array model and how the functions are applied and 01:26:19.120 |
so on, is almost trivial. Meets no resistance at all. And it's all those adults that have either 01:26:26.560 |
learned their primitives or BODMAS or whatever the rules are, and all the computer 01:26:31.440 |
science people that know their precedence tables and their lists of lists and so on. 01:26:36.000 |
Those are the ones that are really, really struggling. It's not just resisting. They're 01:26:40.800 |
clearly struggling. They're really trying, and it's a lot of effort. So there is actually, 01:26:47.520 |
I mean, that is a known thing in educational research. So yeah, I mean, so I spent months 01:26:55.120 |
earlier this year and late last year reading every paper I could about, you know, education, 01:27:02.720 |
because I thought if I'm going to be homeschooling, then I should try to know what I'm doing. 01:27:06.240 |
And yeah, what you describe, Adám, is absolutely a thing, which is that the, 01:27:12.640 |
you know, the research shows that trying, you know, when you've got a, you know, an existing idea, 01:27:18.480 |
which is an incorrect understanding of something, and you're trying to replace it with a correct 01:27:23.200 |
understanding, that is much harder than learning the correct version directly. So which is obviously 01:27:31.520 |
a challenge when you think about analogies and analogy has to be good enough to lead directly 01:27:38.160 |
to the, to the correct version. But I think, you know, the important thing is to find the people 01:27:43.040 |
who are who have the curiosity and tenacity to be prepared to go over that hurdle, even though it's 01:27:50.480 |
difficult, you know, because yeah, it is like, that's just, that's just how human brains are. 01:27:55.920 |
So so be it, you know. Yeah, unlearning is really hard work, actually. And if you think about it, 01:28:02.240 |
it probably should be because you spend a lot of time and energy to put some kind of a pattern 01:28:06.720 |
into your brain. Right. You don't want to have that evaporate very quickly. Right. And our, 01:28:12.080 |
you know, myelination occurs around, what, like ages eight to 12 or something. So like our brains 01:28:17.520 |
are literally trying to stop us from having to learn new things, because our brains think that 01:28:23.760 |
they've got stuff sorted out at that point. And so they should focus on keeping long term memories 01:28:27.680 |
around. So yeah, it does become harder. But, you know, a little bit, it's still totally doable. 01:28:34.640 |
The solution is obvious. Teach APL in primary school. 01:28:37.200 |
That's what I'm doing. What was the word you mentioned? Myelination? 01:28:43.520 |
Myelination. M-Y-E-L-I-N-A-T-I-O-N. Interesting. I'd not heard that one before. 01:28:51.680 |
So it's a physical coating that I can't remember goes on the dendrites. 01:28:57.760 |
That sounds right. These fat layers or cholesterol layers. I never took any biology courses in my 01:29:06.800 |
education. So clearly, I've missed out on that aspect. You myelinated anyway. 01:29:18.480 |
You also mentioned the word tenacity, Jeremy. Yeah. 01:29:24.240 |
And I was watching an interview with Sanyam Bhutani. 01:29:29.600 |
And you were talking about, because it sounds like you spotted at an early point in his 01:29:38.400 |
working with Kaggle that he was somebody probably different. And the thing you said 01:29:41.840 |
was that tenacity to keep working at something. Yeah. 01:29:45.760 |
I think that's a really important part about educating people 01:29:49.440 |
that they shouldn't necessarily expect learning something new to be easy. 01:29:55.520 |
Oh, yeah. I mean, I really noticed that when I started learning Chinese. 01:30:00.240 |
Like I went to, you know, just some local class in Melbourne. 01:30:08.480 |
And everybody was very, very enthusiastic, you know, and everybody was going to learn Chinese. 01:30:14.560 |
And we all talked about the things we were going to do. 01:30:19.920 |
And yeah, each week, there'd be fewer and fewer people there. 01:30:22.800 |
And, you know, I kind of tried to keep in touch with them. 01:30:26.160 |
But after a year, every single other person had given up and I was the only one still doing it. 01:30:32.240 |
You know, so then after a couple of years, people would be like, 01:30:34.320 |
wow, you're so smart, you learned Chinese. And it's like, no, man. 01:30:39.440 |
Like during those first few weeks, I was pretty sure I was learning more slowly than the other 01:30:45.120 |
students. But everybody else stopped doing it. So of course, they didn't learn Chinese. 01:30:51.680 |
And I don't know what the trick is, because, yeah, it's the same thing with, you know, 01:30:56.080 |
like the fast.ai courses; they're really designed to keep people interested and get people doing 01:31:01.840 |
fun stuff from day one. And, you know, still, I'd say most people drop out, and the ones that 01:31:09.120 |
don't I would say most of them end up becoming like actual world class practitioners and they, 01:31:16.480 |
you know, build new products and startups and whatever else. And people will be like, 01:31:20.480 |
oh, I wish I knew neural nets and deep learning. It's like, okay, here's the course. 01:31:25.440 |
Just just do it and don't give up. But yeah, I don't know tenacity. 01:31:31.440 |
It's not a very common virtue, I think, for some reason. 01:31:36.960 |
It's something I've heard, I think it's Jo Boaler at Stanford, talk about the growth mindset. 01:31:41.840 |
And I think that is something that, for whatever reason, some people tend to, and maybe it's 01:31:47.280 |
myelination, at those ages, you start to get that mindset where you're not so concerned about 01:31:53.600 |
having something happen that's easy to do well. But just the fact that if you keep working at it, 01:31:59.600 |
you will get it. And not everybody, I guess, is maybe put in the situations that they 01:32:05.760 |
get that feedback that tells you if I keep trying this, I'll get it. If it's not easy, they stop. 01:32:11.360 |
Yeah, I mean, that area of growth mindset is a very controversial idea in education. 01:32:18.800 |
Specifically the question of can you modify it? And I think it's certainly pretty well established 01:32:27.840 |
to this point that the kind of stuff that schools have tended to do, which is put posters up around 01:32:32.480 |
the place saying like, you know, make things a learning opportunity or don't give up, like they 01:32:37.680 |
do nothing at all. You know, with my daughter, we do all kinds of stuff around this. So we've 01:32:46.640 |
actually invented a whole family of clams. And as you can imagine, clams don't have a growth mindset, 01:32:52.960 |
they tend to sit on the bottom of the ocean, not moving. And so the family of clams that we 01:33:00.400 |
invented that we live with, you know, always at every point that we're going to have to like learn 01:33:05.360 |
something new or try something new, always start screaming and don't want to have anything to do 01:33:10.880 |
with it. And, you know, so we actually have Claire telling the clams how it's going to be okay. And, 01:33:17.280 |
you know, it's actually a good thing to learn new things. And so we're trying stuff like that to try 01:33:22.480 |
to like have have imaginary creatures that don't have a growth mindset and for her to realize how 01:33:29.520 |
how silly that is, which is fun. And the things that you were talking about in terms of the 01:33:34.960 |
meta-mathematics, you didn't say, oh, the successor, this is what plus is. You said, 01:33:40.320 |
how would you use this? How would you start to put it together yourselves? 01:33:46.720 |
Which to me, that's the growth mindset, that if you... Yeah, you're creating that. But then like, 01:33:52.240 |
you know, gosh, you're getting to all the most controversial things in education here, Bob, 01:33:56.720 |
because that's the other big one is discovery learning. So this idea of having kids explore and 01:34:04.160 |
find. It's also controversial, because it turns out that actually the best way to have people 01:34:11.280 |
understand something is to give them a good explanation. So it is important that 01:34:17.040 |
you combine this, like, okay, how would you do this? with, like, okay, let me just tell you 01:34:23.200 |
what this is and why. It's easier for homeschooling with two kids, because I can make sure 01:34:28.560 |
their exploration is short, and correct. You know, if you spend a whole class, you know, 01:34:36.720 |
50 minutes doing totally the wrong thing, then you end up with these really incorrect 01:34:42.640 |
understandings, which you then have to kind of deprogram. So yeah, education's hard, you know. 01:34:51.040 |
And I think a lot of people look for these simple shortcuts, and they don't really exist. So you 01:35:00.640 |
actually have to have good, good explanations and good problem solving methods and yeah, 01:35:10.320 |
all this stuff. That's a really interesting area, the notation and the tools. Yeah, and you know, 01:35:17.280 |
notation, I mean, so I do a live coding, you know, video thing every day with a bunch of folks. And 01:35:28.320 |
in the most recent one, we started talking about APL, why we're going to be doing APL this week 01:35:35.360 |
instead. And, you know, somebody actually said like, oh, my God, is it going to be like 01:35:40.560 |
regexes? And, you know, I kind of said like, okay, so regexes are a notation for doing stuff. And we 01:35:49.760 |
spent an hour solving the problem with regexes. And oh, my God, it was such a powerful tool for 01:35:59.680 |
this problem. And you know, by the end of it, they were all like, okay, we want to like deeply 01:36:04.080 |
study regexes. And obviously, that's a much less flexible and powerful tool notation than APL. 01:36:12.560 |
But you know, we kind of talked about how once you start understanding these notations, you can build 01:36:19.680 |
things on top of them. And then you kind of create these abstractions. And that's yeah, notation is 01:36:26.720 |
how, you know, deep human thought kind of progresses, right, in a lot of ways. So, you know, it's like, 01:36:37.840 |
I actually spoke to a math professor friend a couple of months ago about, you know, my renewed 01:36:42.800 |
interest in APL. And he was like, and I kind of sent him some, I can't remember what it was, 01:36:48.480 |
maybe doing the golden ratio or something, little snippet, and he was just like, 01:36:53.840 |
yeah, something like that looks like Greek to me, I don't understand that. It's like, 01:36:57.280 |
dude, you're a math professor, you know; like, if I showed somebody who isn't in math 01:37:03.040 |
a page of your, you know, research, what are they going to say? And, you know, it's interesting, 01:37:11.040 |
I said, like, there's a bit of these ideas in here, like Iverson brackets, for example, 01:37:16.160 |
have you ever heard of Iverson brackets? He's like, well, of course, I've heard of it. Like, 01:37:19.040 |
you know, it's a fundamental tool in math. It's like, well, you know, that's one thing that you 01:37:23.520 |
guys have stolen from APL. You know, that's a powerful thing, right? It's like, fantastic, 01:37:28.640 |
I'd never want to do without Iverson brackets. So I kind of tried to say like, okay, well, imagine, 01:37:32.960 |
like, every other glyph that you don't understand here, has some rich thing like Iverson brackets, 01:37:38.400 |
you could now learn about. Okay, maybe I should give it a go. I'm not sure he has. 01:37:46.960 |
But I think that's a good example for mathematicians: to show, like, here's one thing, 01:37:52.320 |
at least, that found its way from APL. That maybe gives you a sense that, for a mathematician, 01:37:58.240 |
that there might be something in here. On that note, because I know we are potentially, 01:38:05.760 |
well, we've gone way over, but this has been awesome. But a question I think that might be 01:38:10.400 |
a good question to end on is: do you have any advice for folks that want to learn something, 01:38:20.880 |
whether it's Chinese, or an array language, or to get through your fast AI course? And 01:38:26.560 |
is there, because I think, you know, like you said, you like to self-select for folks that are 01:38:32.560 |
the curious types that want to learn new things and new ways to solve things. But like, 01:38:38.480 |
is there any way, other than just being tenacious, like, are there tips to, you know, 01:38:46.960 |
approaching something with some angle, because I think a lot of the folks maybe listening to this 01:38:51.680 |
don't have that issue. But I definitely know a ton of people that are the kind of folks 01:38:57.120 |
that, you know, they'll join a study group, but then three weeks in, they, you know, kind of 01:39:00.400 |
lose interest, or they decide it's too much work or too difficult. As an educator, and you know, 01:39:07.040 |
it seems like you operate in this space. Do you have advice to tell folks, you know, 01:39:13.760 |
I mean, so much, Connor, I actually kind of embedded in my courses a lot. I can give you 01:39:19.680 |
some quick summaries. But what I will say is, my friend Radek Osmulski, who's been taking my 01:39:25.120 |
courses for like four years, has taken everything I've said, and his experience of those things, and 01:39:33.520 |
turned it into a book. So if you read... Osmulski's book is called Meta Learning: powerful mental 01:39:42.480 |
models for deep learning. This is learning as in learning deeply. So yeah, check out his book, 01:39:49.280 |
to get the full answer. I mean, there's just, gosh, there's a lot of things you can do to make 01:39:55.760 |
learning easier. You know, and a key thing I do in my courses is I always teach top down. So like 01:40:06.400 |
often people with like, let's take deep learning and neural networks, they'll be like, okay, well, 01:40:10.480 |
first, I'm going to have to learn linear algebra and calculus and blah, blah, blah. And, you know, 01:40:16.480 |
four or five years later, they still haven't actually trained a neural network. Our approach 01:40:21.760 |
in our course is in lesson one, the very first thing you do in the first 15 minutes is you train 01:40:26.320 |
a neural network. And it is more like how we learn baseball or how we learn music, you know, 01:40:36.640 |
like you say, like, okay, well, let's play baseball: come, you stand there, you stand there, 01:40:40.960 |
I'll throw this to you, you're going to hit it, you're going to run. You know, you don't start by 01:40:45.520 |
learning, you know, the parabolic trajectory of a ball or the, you know, history of the game or 01:40:53.440 |
whatever, you just start playing. So that's, you know, you want to be playing. And if you're doing 01:40:59.760 |
stuff from the start, that's fun and interesting and useful, then top down, doesn't mean it's 01:41:07.360 |
shallow, you can then work from there to like, then understand like, what's each line of code 01:41:12.320 |
doing? And then how is it doing it? And then why is it doing it? And then what happens if we do 01:41:16.560 |
it a different way? And until eventually, with with our fast AI program, you actually end up 01:41:23.040 |
rewriting your own neural network library from scratch, which means you have to very deeply 01:41:28.240 |
understand every single part of it. And then we start reading research papers. And then we start 01:41:32.960 |
learning about how to implement those research papers in the library we just wrote. So yeah, 01:41:37.600 |
I'd say go top down, make it fun, make it applied. For things like APL or Chinese, where there's 01:41:45.040 |
just stuff you have to remember, use Anki, use spaced repetition learning. You know, that's been 01:41:52.000 |
around, Ebbinghaus came up with that, I don't know what, 250, 200 years ago, it works, you know, 01:42:02.080 |
everybody, if you tell them something, will forget it in a week's time, everybody, you know, and so 01:42:08.800 |
you shouldn't expect to read something and remember it. Because you're human, and humans don't do that. 01:42:15.120 |
So spaced repetition learning will quiz you on that thing tomorrow. And then in four days 01:42:22.960 |
time, and then in 14 days time, and then in three weeks time, and if you ever forget it, it will 01:42:29.280 |
reset that schedule. And it'll make sure it's impossible to forget it, you know, so it's, 01:42:34.320 |
it's depressing to study things that then disappear. And so it's important to recognize 01:42:40.960 |
that unless you use Anki or SuperMemo or something like that, unless you use it every day, 01:42:47.760 |
it will disappear. But if you do use spaced repetition, it's guaranteed not 01:42:53.120 |
to. And I told this to my daughter, a couple of years ago, I said, I, you know, what if I told you 01:43:00.800 |
there was a way you can guarantee to never ever forget something you want to know? It's just like, 01:43:06.800 |
that's impossible. This is like some kind of magic. It's like, no, it's not magic. And like, I sat down 01:43:13.280 |
and I drew out the Ebbinghaus forgetting curves and explained how it works. And I explained how, 01:43:20.640 |
you know, if you get quizzed on it in these schedules, it flattens out. And I was just 01:43:25.200 |
like, what do you think? And she was like, I want to use that. So she's been using Anki ever since. 01:43:31.520 |
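(A toy sketch of that scheduling idea; the intervals are illustrative and this is not Anki's actual algorithm:)

    from datetime import date, timedelta

    INTERVALS = [1, 4, 14, 21]   # days: tomorrow, then 4, 14, 21, as described above

    def next_review(step, correct, today=None):
        """Return (new step, next due date); forgetting resets the whole schedule."""
        today = today or date.today()
        step = min(step + 1, len(INTERVALS) - 1) if correct else 0
        return step, today + timedelta(days=INTERVALS[step])

    step, due = next_review(0, correct=True)     # got it: next review in 4 days
    step, due = next_review(step, correct=False) # forgot: back to tomorrow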
So maybe those are just two, let's just start with those two. Yeah, so go top down, and use 01:43:38.640 |
Anki, I think could make your learning process much more fulfilling, because you'll be doing 01:43:44.400 |
stuff with what you're learning and you'll be remembering it. Well, that is awesome. And yeah, 01:43:50.160 |
definitely we'll leave links to not just Anki and the book, meta learning, but everything that we've 01:43:56.560 |
discussed throughout this conversation, because I think there's a ton of really, really awesome 01:44:00.400 |
advice. And obviously to your fast AI course in the library. And we'll also link to, I know you've 01:44:07.040 |
been on, like we mentioned before, a ton of other podcasts and talks. So if you'd like to hear more 01:44:12.960 |
from Jeremy, there's a ton of resources online. Hopefully, it sounds like you're going to be, 01:44:17.120 |
you know, building some learning materials over the next however many months or years. And so 01:44:21.920 |
in the future, if you'd love to come back and update us on on your journey with the array 01:44:26.000 |
languages, that would be super fun for us, because I've thoroughly enjoyed this conversation. And 01:44:31.040 |
thank you so much for waking up early all on the other side of the world from us, at least in 01:44:36.400 |
Australia. Thanks for having me. And yeah, I guess with that, we'll say happy array programming.