Back to Index

The Array Cast: Jeremy Howard


Chapters

0:00
1:15 Dyalog Problem-Solving Contest
2:40 Jeremy Howard
4:30 APL Study Group
10:20 AT Kearney
12:33 MKL (Intel)
13:00 BLAS
13:11 Perl BQN
14:06 Raku
15:45 Kaggle
16:52 R
18:50 Neural Networks
19:50 Enlitic
20:01 Fast.ai
21:02 NumPy
21:26 Leading Axis Theory
21:31 Rank Conjunction
21:40 Einstein notation
22:55 CUDA
28:51 NumPy: Another Iverson Ghost
30:11 Pivot Tables
30:36 SQL
31:25 Larry Wall "The three chief virtues of a programmer are: Laziness, Impatience and Hubris."
32:00 Python
36:25 Regular Expressions
36:50 PyTorch
37:39 Notation as Tool of Thought
37:55 Aaron Hsu Co-dfns
38:40 J
39:06 Eric Iverson on Array Cast
40:18 Triangulation Jeremy Howard
41:48 Google Brain
42:30 RAPIDS
43:40 Julia
43:50 LLVM
44:7 JAX
44:21 XLA
44:32 MLIR
44:42 Chris Lattner
44:53 TensorFlow
49:33 TorchScript
50:09 Scheme
50:28 Swift
51:10 DragonBox Algebra
52:47 APL Glyphs
53:24 Dyalog APL
54:24 Jupyter
55:44 Jeremy's tweet of Meta Math
56:37 Power function
63:06 Reshape
63:40 Stallman 'Rho, rho, rho'
64:20 APLcart
66:12 J for C programmers
67:54 Transpose episode
70:00 APLcart video
72:28 Functional Programming
73:00 List Comprehensions
73:30 BQN to J
78:15 Einops
79:30 April Fools APL
80:35 Flask library
81:22 JuliaCon 2022
88:05 Myelination
89:15 Sanyam Bhutani interview
91:27 Jo Boaler Growth Mindset
93:45 Discovery Learning
97:05 Iverson Bracket
99:14 Radek Osmulski Meta Learning
100:12 Top Down Learning
101:20 Anki
103:50 Lex Fridman Interview

Transcript

Welcome to another episode of ArrayCast. I'm your host, Connor. And today we have a very exciting guest, which we will introduce in a second. But before we do that, we'll do brief introductions and then one announcement. So first we'll go to Bob and then we'll go to Adam who has the one announcement.

And then we will introduce our guest. I'm Bob Therriault. I'm a J enthusiast and I do some work with the J Wiki. We're underway and trying to get it all set up for the fall. I'm Adám Brudzewsky, full-time APL programmer at Dyalog Ltd. Besides actually programming APL, I also take care of all kinds of social things, including the APL Wiki.

And then for my announcements, part of what we do at Dyalog is arrange a yearly user meeting, a type of conference. And at that user meeting, there is also a presentation by the winner of the APL problem solving competition. That competition closes at the end of the month.

So hurry up if you want to participate. It's not too late even to get started at this point. And also at the end of the month is the end of the early bird discount for the user meeting itself. Awesome. And just a note about that contest. I think, and Adám can correct me if I'm wrong, there are two phases, and the first phase is just 10 short problems.

A lot of them are just one-liners. And even if you only solve one of the 10, I think you can win a small cash prize just from answering one. Is that correct? I'm not even sure. You might need to solve them all. They're really easy.

So the point being though is that you don't need to complete the whole contest in order to be eligible to win prizes. No, for sure. There's a certain amount that if you get to that point, you hit a certain threshold and you can be eligible to win some free money, which is always awesome.

And yeah, just briefly, as I introduce myself in every other episode, I'm your host, Connor, C++ professional developer, not an array language developer in my day-to-day, but a huge array language and combinator enthusiast at large, which brings us to introducing our guest who is Jeremy Howard, who has a very, very, very long career.

And you probably have heard him on other podcasts or have seen him give other talks. I'll read the first paragraph of his three-paragraph bio because I don't want to embarrass him too much, but he has a very accomplished career. So Jeremy Howard is a data scientist, researcher, developer, educator, and entrepreneur.

He is the founding researcher at fast.ai, a research institute dedicated to making deep learning more accessible, and is an honorary professor at the University of Queensland. That's in Australia, I believe. Previously, Jeremy was a distinguished research scientist at the University of San Francisco, where he was the founding chair of the Wicklow Artificial Intelligence in Medical Research initiative.

He's also been the CEO of Enlitic and was the president and chief scientist of Kaggle, which is basically the data science version of LeetCode, which many software developers are familiar with. He was the CEO of two successful Australian startups, Fastmail and Optimal Decisions Group. And before that, in between doing a bunch of other things, he worked in management consulting at McKinsey, which is an incredibly interesting start to the career that he has had now, because for those of you that don't know, McKinsey is one of the three biggest management consulting firms alongside, I think, Bain & Co.

and BCG. So I'm super interested to hear how he started in management consulting and ended up being the author of one of the most popular AI libraries in Python and also the course that's attached to it, which I think is, if not, you know, the most popular, a very, very popular course that students all around the world are taking.

So I will stop there, throw it over to Jeremy, and he can fill in all the gaps that he wants, jump back to however far you want to, to tell us, you know, how you got to where you are now. And I think the one thing I forgot to mention, too, is that he recently tweeted on July 1st, and we're recording this on July 4th, and the tweet reads, "Next week, I'm starting a daily study group on my most loved programming language, APL." And so obviously interested to hear more about that tweet and what's going to be happening with that study group.

So over to you, Jeremy. Well, the study group is starting today as we record this. So depending on how long it takes to get this out, it'll have just started. And so definitely time for people to join in. So we'll, I'm sure we'll include a link to that in the show notes.

Yeah, I definitely feel kind of like I'm your least qualified array programming person ever interviewed on this show. I love APL and J, but I've done very, very little with them, particularly APL. I've done a little, little bit with J mucking around, but like, I find a couple of weeks here and there every few years, and I have for a couple of decades.

Having said that, I am a huge enthusiast of array programming, as it is used, you know, in a loopless style in other languages, initially in Perl, and nowadays in Python. Yeah, maybe I'll come back to that, because I guess you wanted to get a sense of my background. Yeah, so I actually started at McKinsey.

I grew up in Melbourne, Australia. And I didn't know what I wanted to do when I grew up at the point that you're meant to know when you choose a university, you know, major. So I picked philosophy on the basis that it was like, you know, the best way of punting down the road what you might do, because with philosophy, you can't do anything.

And honestly, that kind of worked out in that I needed money, and I needed money to get through university. So I got, like, a one-day-a-week kind of IT support job at McKinsey, the McKinsey Melbourne office, during university from first year, I think that's from first year.

But it turned out that, like, yeah, I was very curious, and I was so curious about management consulting. So every time consultants would come down and ask me to, like, you know, clean out the sticky Coke they'd spilled in their keyboard or whatever, I would always ask them what they were working on and ask them to show me. And I'd been really interested in, like, doing analytics-y kind of things for a few years at that point.

So during high school, basically every holidays, I kind of worked on stuff with spreadsheets or Microsoft Access or whatever. So it turned out I knew more about, like, stuff like Microsoft Excel than they did. So within about two months of me starting this one day a week job, I was working 90 hour weeks, basically doing analytical work for the consultants.

And so that, you know, that actually worked out really well, because I kind of did a deal with them where they would, they gave me a full time office, and they would pay me $50 an hour for whatever time I needed. And so suddenly, I was actually making a lot of money, you know, working, working 90 hours a week.

And yeah, it was great because then I would come up with these solutions to things they were doing in the projects, and I'd have to present it to the client. So next thing I knew I was basically on the client site all the time. So I ended up actually not going to any lectures at university.

And I somehow kind of managed this thing where I would take two weeks off before each exam, go and talk to all my lecturers and say, Hey, I was meant to be in your university course. I know you didn't see me, but I was kind of busy. Can you tell me what I was meant to have done?

And I would do it. And so I kind of scraped by a BA in philosophy, but I don't, yeah, you know, I don't really have much of an academic background. But that did give me a great background in, like, applying stuff like, you know, linear regression and logistic regression and linear programming and, you know, the basic analytical tools of the day, generally through VBA scripts in Excel, or, you know, Access, you know, the kind of stuff that a consultant could chuck out, you know, onto their laptop at a client site.

Anyway, I always felt guilty about doing that, because it just seemed like this ridiculously nerdy thing to be doing when I was surrounded by all these very important, you know, consultant types who seemed to be doing much more impressive strategy work. So I tried to get away from that as quickly as I could, because I didn't want to be the nerd in the company.

And yeah, so I ended up spending the next 10 years basically doing strategy consulting. But throughout that time, because I didn't have the same background that they did, that expertise, the MBA that they did, I had to solve things using data and analytically intensive approaches.

So although in theory, I was a strategy management consultant, and I was working on problems like, you know, how do we fix the rice industry in Australia? Or, you know, how do we, you know, like, you know, how do we deal with this new competitor coming into this industry or whatever it was, I always did it by analyzing data, which actually turned out to be a good niche, you know, because I was the one McKinsey consultant in Australia who did things that way.

And so I was successful, and I ended up moving to A.T. Kearney, which is the other of the two original management consulting firms. I think I became like the youngest manager in the world. And, you know, through this parallel path I was doing, I learned about the insurance industry and discovered, like, the whole insurance industry is basically pricing things in a really dumb way.

I developed this approach based on optimization, optimized pricing, and launched a company with my university friend who had a PhD in operations research. And, yeah, so we built this new approach to pricing insurance, which was kind of fun. I mean, you know, it went well commercially. I spent about 10 years doing that.

And at the same time, I was running an email company called Fastmail, which also went well. Yeah, we started out basically using C++. And I would say that was kind of the start of my array programming journey in that, in those days, this is like 1999, the very first expression-template-based approaches to C++ numeric programming were appearing.

And so I, you know, was talking to the people working on those libraries, doing stuff like, particularly, stuff for the big kind of high energy physics experiments that were going on in Europe. It was ultimately pretty annoying to work with, though, like the amount of time it took to compile those things, it would take hours.

And it was quirky as all hell, you know, it's still pretty quirky doing metaprogramming in C++. But in those days, it was just a nightmare. Every compiler was different. So I ended up switching to C# shortly after that came out. And, you know, in a way it was disappointing because that was much less expressive as a kind of array programming paradigm.

And so instead, I ended up basically grabbing Intel's MKL library, which is basically BLAS on steroids, if you like, and writing my own C# wrapper to give me, you know, kind of array-programming-ish capabilities, but not with any of the features one would come to expect from a real array programming language around kind of dealing with rank sensibly, and, you know, not much in the way of broadcasting. Which reminds me, we should come back to talking about BLAS at some stage, because a lot of the reason that most languages are so disappointing at array programming is because of our reliance on BLAS, you know, as an industry.

Fastmail, on the other hand, was written in Perl, which I really enjoyed as a programming language and still do, I still love Perl a lot. But the scientific programming in Perl I didn't love at all. And so at the time, Perl 6, you know, the idea of it was just starting to be developed.

So I ended up running the Perl 6 working group to add scientific programming capabilities, or, as I described them at the time, APL-inspired programming capabilities, to Perl. And so I did an RFC around what we ended up calling hyper operators, which is basically the idea that any operator can operate on arrays and can broadcast over any axes that are mismatched or whatever.

And those RFCs all ended up getting accepted. And Damian Conway and Larry Wall kind of expanded them a little bit. Perl 6 never exactly happened. It ended up becoming a language called Raku. With the butterfly logo. Yeah. And the kind of performance ideas I really worked hard on never really happened either.

So that was a bit of a, yeah, that was all a bit of a failure. But it was fun, and it was interesting. I, you know, so after running these companies for 10 years, one of the big problems with running a company is that you're surrounded by people who you hired, and they, you know, have to make you like them if they want to get promoted and not get fired.

And so you can never trust anything anybody says. So I had, you know, very low expectations about my capabilities, analytically speaking. I hadn't, like, you know, I'd basically been running companies for 10 years. I did a lot of coding and stuff, but it was in our own little world.

And so after I sold those companies, yeah, I, one of the things I decided to do was to try actually to become more competent, you know, I had lost my, to some extent, I had lost my feeling that I should hide my nerdiness, you know, and try to act like a real business person.

And I thought, no, I should actually see if I'm actually any good at this stuff. So I tried entering a machine learning competition at a new company that had just been launched called Kaggle with this goal of like, not coming last. So basically, the, you know, the way these things work is you have to make predictions on a data set.

And at the end of the competition, whoever's predictions are the most accurate wins the prize. And so my goal was, yeah, try not to come last, which I wasn't convinced I'd be able to achieve. Because as I say, I didn't feel like this is, I'd never had any technical training, you know, and everybody else in these competitions were PhDs and professors or whatever else.

So it felt like a high bar. Anyway, I ended up winning it. And that, that changed my life, right? Because, yeah, it was like, oh, okay, I am, you know, empirically good at this thing. And people at my local R user group, which I used to go to quite a bit as well, you know, I told them, I'm going to try entering this competition.

Anyone want to create a team with me? I want to learn to use R properly. And I kind of went back to the next user group meeting and people were like, I thought you were just learning this thing. How did you win? I was like, I don't know. I just used common sense.

Yeah, so I ended up becoming the chief scientist and president of Kaggle. And Kaggle, as you know, anybody in the data science world knows, has kind of grown into this huge, huge thing, ended up selling it to Google. So I ended up being an equal partner in the company.

I was the first investor in it. And that was great. That was like, I just dove in, we moved to San Francisco for 10 years. You know, surrounded by all these people who are just sort of role models and idols, and partly getting to meet all these people in San Francisco was this experience of realizing all these people were actually totally normal, you know, and they weren't like some super genius level, like they're just normal people who, yeah, as I got to know them, it gave me, I guess, a lot more confidence in myself as well.

So maybe they were just normal relative to you. I think in Australia, we all feel a bit, you know, intimidated by the rest of the world in some ways. We're a long way away, you know, our only neighbors really are New Zealand. It's very easy to feel, I don't know, like, yeah, we're not very confident about our capabilities over here, other than in sport, perhaps.

Yeah, so one of the things that happened while I was at Kaggle was, I had played around with neural networks a bit, a good bit, you know, like 20 years earlier. And I always felt like neural networks were one day going to be the thing. It's like, you know, they are at a theoretical level, infinitely capable.

But, you know, they never quite did it for me. And but then in 2012, suddenly, neural networks started achieving superhuman performance for the first time on really challenging problems, like recognizing traffic signs, you know, like recognizing pictures. And I'd always said to myself, I was going to watch for this moment, and when it happened, I wanted to like, jump on it.

So as soon as I saw that, I tried to jump on it. So I started a new company. After a year of research into, like, you know, what neural networks were going to do, I decided medicine was going to be huge. I knew nothing about medicine. And I, yeah, I started a medicine company to see what we could do with deep learning in medicine.

So that was Enlitic. Yeah, that ended up going pretty well. And yeah, eventually, I kind of got like a bit frustrated with that, though, because it felt like deep learning can do so many things, and I'm only doing such a small part of those things. So deep learning is like neural networks with multiple layers.

I thought the only way to actually help people really, you know, make the most of this incredibly valuable technology is to teach other people how to do it, and to help other people to do it. So my wife and I ended up starting a new, I'd call it kind of a research lab, fast.ai, to help, to help do that, basically, initially focused on education, and then increasingly focused on research and software development to basically make it easier for folks to use deep learning.

And that's, yeah, that's where I am now. And everything in deep learning is all Python. And in Python, we're very lucky to have, you know, excellent libraries that behave pretty consistently with each other, basically based around this NumPy library, which treats arrays very, very similarly to how J does, except rather than leading axis, it's trailing axis.

But basically, you get, you know, you get loop free, you get broadcasting, you know, you don't get things like a rank conjunction, but there's very easy ways to permute axes. So you can do basically the same thing. Things like Einstein notation, you know, are built into the libraries, and then, you know, it's trivially easy to have them run on GPUs or TPUs or whatever, you know, so for the last years of my life, nearly all the code I write is array programming code, even though I'm not using a purely array language.
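
As a rough illustration of the loop-free style being described here, a minimal NumPy sketch (the arrays and shapes are invented for the example):

```python
import numpy as np

# Invented example data: 3 samples x 4 features, plus a per-feature offset.
x = np.arange(12).reshape(3, 4).astype(float)
offset = np.array([10.0, 20.0, 30.0, 40.0])

y = x + offset                      # loop-free: offset broadcasts over the trailing axis
z = x.transpose(1, 0)               # permuting axes instead of using a rank conjunction
s = np.einsum('ij,kj->ik', x, x)    # Einstein notation: pairwise dot products between rows
```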

All right, so where do we start now with the questions? I'll let Bob and Adám go first if they want. And if they don't have... Okay, Bob, you go ahead. I've got a quick question about neural networks and stuff. Because when I was going to university all those years ago, people were talking about neural networks, and then they just sort of dropped off the face of the earth.

And as you said, around 2010, suddenly they resurfaced again. What do you think was the cause of that resurfacing? Was it hardware? Was it somebody discovered a new method or what? Yeah, mainly hardware. So what happened was people figured out how to do GPGPU, so general purpose GPU computing.

So before that, I tried a few times to use GPUs with neural nets, I felt like that would be the thing. But GPUs were all about like creating shaders and whatever. And it was a whole jargon thing. I didn't even understand what was going on. So the key thing was Nvidia coming up with this CUDA approach, which is, it's all loops, right?

But it's much easier than the old way. Like, the loops, you basically, it's kind of loops, at least, you basically say to CUDA, this is my kernel, which is the piece of code I want to basically run on each streaming multiprocessor. And then you basically say, launch a bunch of threads.

And it's going to call your kernel, you know, basically incrementing the x and y coordinates and passing them to your kernel, making them available to your kernel. So it's kind of, it's not exactly a loop, but it gets more like a map, I guess. And so when CUDA appeared, yeah, very quickly, neural network libraries appeared that would take advantage of it.
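
A rough sketch of the launch model being described, simulated in plain Python rather than real CUDA (the kernel, grid size, and data here are invented for illustration):

```python
# Simulate the CUDA idea: you write a "kernel" for one (x, y) coordinate,
# and the launcher calls it once per thread over the whole grid.
# That is a map over coordinates, not a loop you write in your own code.
def add_kernel(x, y, a, b, out):
    out[x][y] = a[x][y] + b[x][y]

def launch(kernel, grid_x, grid_y, *args):
    for x in range(grid_x):          # on a real GPU these iterations run in parallel
        for y in range(grid_y):
            kernel(x, y, *args)

a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
out = [[0, 0], [0, 0]]
launch(add_kernel, 2, 2, a, b, out)  # out == [[11, 22], [33, 44]]
```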

And then suddenly, you know, you get orders of magnitude more performance. And it's cheaper. And you get to buy an Nvidia graphics card with a free copy of Batman, you know, on the excuse that actually this is all for work. So it was mainly that. There's also, just like at the same time, the thing I'd been doing for 25 years suddenly got a name, data science, you know, this very small industry of people applying data driven approaches to solving business problems.

And we were always looking for a name. Not many people know this, but back in the very early days, there was an attempt to call it industrial mathematics. Sometimes people would like shoehorn it into operations research or management science, but that was almost exclusively optimization people and specifically people focused more on linear programming approaches.

So yeah, once data science appeared, and also like, you know, basically every company had finally built their data warehouse and the data was there. So yeah, it's like more awareness of using data to solve business problems and for the first time availability of the hardware that we actually needed.

And as I say, in 2012, it just reached the point, like, it had been growing since the first neural network was built in, was it 1957, I guess, at this kind of gradual rate, but once it passed human performance on some tasks, it just kept going. And so now, in the last couple of months, you know, it's now like getting decent marks on MIT math tests and stuff.

It's on an amazing trajectory. Yeah, it's kind of a critical mass kind of thing, you get a certain amount of information and ability to process that information, I guess, and as you say, it's an exponential curve. And humans and exponential curves, I think we're finding over and over again, we're not really great at understanding an exponential.

No, no, we're not. And that's like why I promised myself that as soon as I saw neural nets starting to look like they're doing interesting things, I would drop everything and jump on it, because I wanted to jump on that curve as early as possible. And we're now in this situation where people are just making huge amounts of money with neural nets, which they then reinvest back into making the neural nets better.

And so we are also seeing this kind of bifurcation of capabilities where there's a small number of organizations who are extremely good at this stuff and invested in it and a lot of organizations that are, you know, really struggling to figure it out. And because of the exponential nature, when it happens, it happens very quickly, it feels like you didn't see it coming.

And suddenly, it's there. And then it was past you. And I think you're all experiencing that now. Yeah, and it's happened in so many industries, you know, back in my medical startup, you know, we were interviewing folks around medicines, we interviewed a guy finishing his PhD in histopathology. And I remember, you know, he came in to do an interview with us.

And he basically gave us a presentation about his thesis on kind of graph cut segmentation approaches for pathology slides. And at the end, he was like, anyway, that was my PhD. And then yesterday, because I knew I was coming to see you guys, and I heard you like neural nets, I just thought I'd check out neural nets.

And about four hours later, I trained a neural net to do the same thing I did for my PhD. And it way outperformed the PhD thesis I'd spent the last five years on. And so that's where I'm at, you know, and we hear this a lot. Existential crisis in the middle of an interview.

Yes. So I kind of have, I don't know, this is like a 1A, B and C. And I'm not sure if I should ask them all at once. But so you said sort of at the tail end of the 90s is when your array language journey started. But it seems from the way you explained it that you had already at some point along the way heard about the array languages, APL and J, and have sort of alluded to, you know, picking up some knowledge about the paradigm and the languages.

So my first part of the question is sort of, you know, at what point were you exposed to the paradigm in these languages? The second part is what's causing you in 2022 to really dive into it? Because you said you feel like maybe a bit of an imposter or the least qualified guest, which probably is you just being very modest.

I'm sure you know still quite a bit. And then the third part is, do you have thoughts about, and I've always sort of wondered, how the array language paradigm sort of missed out on like, and Python ended up being the main data science language, while like there's like an article that's floating around online called NumPy, the ghost of Iverson, which it's this sort of you can see that in the names and the design of the library that there is an core of APL and even the documentation acknowledges that it took inspiration greatly from J and APL.

But that like the array languages clearly missed what was a golden opportunity for their paradigm. And we ended up with libraries and other languages. So I just asked three questions at once. Feel free to tackle them in any order. I have a pretty bad memory. So I think I've forgotten the second one already.

So you can feel free to come back to any or all of them. So my journey, which is what you started with, was I always felt like we should do more stuff without using code, or at least, like, kind of traditional, what I guess we'd call nowadays, imperative code.

There were a couple of tools in my early days which I got huge amounts of leverage from, because nobody else, in at least the consulting firms or generally in our clients, knew about them. So that was SQL and pivot tables. And so pivot tables, if you haven't come across them, were basically one of the earliest approaches to OLAP, you know, slicing and dicing.

There was actually something slightly earlier called Lotus Improv, but that was actually a separate product. Excel was basically the first one to put OLAP in the spreadsheet. So no loops. You just drag and drop the things you want to group by and you right click to choose how to summarize.

And same with SQL, you know, you declaratively say what you want to do. You don't have to loop through things. SAS actually had something similar. You know, with SAS, you could basically declare a PROC that would run on your data. So yeah, I kind of felt like this was the way I would rather do stuff if I could.
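
For a modern analogue of that declarative group-and-summarize style, a small pandas sketch (the table and column names are made up for the example):

```python
import pandas as pd

sales = pd.DataFrame({
    'region':  ['East', 'East', 'West', 'West'],
    'quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'revenue': [100, 120, 80, 95],
})

# Declarative and loop-free: say what to group by and how to summarize,
# much like choosing rows/columns/aggregation in an Excel pivot table
# or writing GROUP BY in SQL.
summary = sales.pivot_table(index='region', columns='quarter',
                            values='revenue', aggfunc='sum')
print(summary)
```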

And I think that's what led me when we started doing the C++ implementation of the insurance pricing stuff of being much more drawn to these metaprogramming approaches. I just didn't want to be writing loops in loops and dealing with all that stuff. I'm too lazy, you know, to do that.

I think I'm very driven by laziness, which as Larry Wall said is one of the three virtues of a great programmer. Then yeah, so I think as soon as I saw NumPy had reached a level of some reasonable confidence in Python, I was very drawn to that because it was what I'd been looking for.

And I think maybe that actually is going to bring us to answering the question of like what happened for array languages. Python has a lot of problems, but at its heart, it's a very well-designed language. It has a very small, flexible core. Personally, I don't like the way most people write it, but it's so flexible I've been able to create almost my own version of Python, which is very functionally oriented.

I basically have stolen the type dispatch ideas from Julia, created an implementation of that in Python. My Python code doesn't look like most Python code, but I can use all the stuff that's in Python. So there's this very nicely designed core of a language, which I then have this almost this DSL on top of, you know, and NumPy is able to create this kind of DSL again because it's working on such a flexible foundation.
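
Jeremy's own dispatch implementation lives in his libraries; as a rough standard-library-only sketch of the general idea of dispatching on argument type in Python (not his actual code):

```python
from functools import singledispatch

@singledispatch
def describe(x):
    # Fallback implementation, used when no more specific type matches.
    return f"something: {x!r}"

@describe.register
def _(x: int):
    # Picked automatically when describe() is called with an int.
    return f"an integer, {x}"

@describe.register
def _(x: list):
    # Picked automatically when describe() is called with a list.
    return f"a list of {len(x)} items"

print(describe(3), describe([1, 2]), describe("hi"))
```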

Ideally, you know, I mean, well, okay, so Python also has another DSL built into it, which is math. You know, I can use the operators plus, times, minus. That's convenient. And in every array library in Python, NumPy, PyTorch, TensorFlow, those operators work over arrays and do broadcasting over axes and so forth and, you know, accelerate on an accelerator like a GPU or a TPU.

That's all great. My ideal world would be that I wouldn't just get to use plus, times, minus, but I'd get to use all the APL symbols. You know, that would be amazing. But given a choice between a really beautiful language, you know, at its core, like Python, on which I can then add a slightly cobbled together DSL like NumPy, I would much prefer that over a really beautiful notation like APL but without the fantastic language underneath. You know, like, I don't feel like there's anything about APL or J or K as, like, a programming language that attracts me.

Do you know what I mean? I feel like in terms of like what I could do around whether it be type dispatch or how OO is designed or, you know, how I package modules or almost anything else, I would prefer the Python way. So I feel like that's basically what we've ended up with.

You kind of either compromise between, you know, a good language with, you know, slightly substandard notation, or amazingly great notation with a substandard language, or not just language, but ecosystem. Python has an amazing ecosystem. I think, I hope, one day we'll get the best of both, right? Like here's my, okay, here's my controversial take and it may just represent my lack of knowledge.

What I like about APL is its notation. I think it's a beautiful notation. I don't think it's a beautiful programming language. I think some things, possibly everything, you know, some things work very well as a notation, but to get to raise something to the point that it is a notation requires some years of study and development and often some genius, you know, like the genius of Feynman diagrams or the genius of juggling notation, you know, like there are people who find a way to turn a field into a notation and suddenly they blow that field apart and make it better for everybody.

For me, like, I don't want to think too hard all the time. Every time I come across something that really hasn't been turned into a notation yet, you know, sometimes I just like, I just want to get it done, you know, and so I would rather only use notation when I'm in these fields that either somebody else had figured out how to make that a notation or I feel like it's really worth me investing to figure that out.

Otherwise, you know, there are, and the other thing I'd say is we already have notations for things that aren't APL that actually work really well, like regular expressions, for example. That's a fantastic notation and I don't want to replace that with APL glyphs. I just want to use regular expressions.

So, yeah, my ideal world would be one where we, where I can write PyTorch code, but maybe instead of like Einstein operations, Einstein notation, I could use APL notation. I think that's where I would love to get to one day and I would love that to totally transparently run on a GPU or TPU as well.

That would be my happy place. It has nothing to do with the fact that I work at NVIDIA that I would love that. Interesting. I've never heard that before, the difference between basically appreciating or being in love with the notation, but not the language itself. And, you know, it started out as a notation, right?

Like, Iverson, you know, it was a notation he used for representing state machines or whatever on early IBM hardware, you know, when he did his Turing Award essay, he chose to talk about his notation. And, you know, you see with people like Aaron with his Co-dfns stuff that if you take a very smart person and give them a few years, they can use that notation to solve incredibly challenging problems like build a compiler and do it better than you can without that notation.

So I'm not saying, like, yeah, APL can't be used for almost anything you want to use it for, but a lot of the time we don't have five years to study something very closely. We just want to, you know, we've got to get something done by tomorrow. Interesting. There's still a question you didn't get an answer to.

Oh, yeah. When did you first, well, when did you first meet APL or how did you even find APL? I first found J, I think, which obviously led me to APL. And I don't quite remember where I saw it. Yeah. And actually, when I got to San Francisco, so that would be I'm trying to remember 2010 or something, I'm not sure.

I actually reached out to Eric Iverson and I said, like, oh, you know, we're starting this machine learning company called Kaggle. And I kind of feel like, you know, everybody does stuff in Python, and it's kind of in a lot of ways really disappointing. I wish we were doing stuff in J, you know, but we really need everything to be running on the GPU, or at least everything to be automatically using SIMD and multiprocessing everywhere.

He was kind enough to actually jump on a Skype call with me. Not just jump on a Skype call; it's like, how do you want to chat? It's like, how about Skype? And he created a Skype account. Like, oh, yeah, we chatted for quite a while. We talked about, you know, these kinds of hopes, and yeah, but I just, you know, never really did, because neither J nor APL was in that space yet.

There was just never a reason for me to do anything other than, like, it kind of felt like each time I'd have a bit of a break for a couple of months, I'd always spend a couple of weeks fiddling around with J just for fun. But that's as far as I got, really.

Yeah, I think the first time I'd heard of you was in an interview that Leo Laporte did with you on Triangulation, and you were talking about Kaggle. That was a specific thing. But I think I was riding my bike along, just listening, or something, and suddenly you said, oh, yeah, but a lot of people use J.

I like J. It's the first time I'd ever heard anybody on a podcast say anything about J. It was just like, wow, that's amazing. And the whole interview about Kaggle, there was so much of it about the importance of data processing, not just having a lot of data, but knowing how to filter it down, not over filtering all those tricks.

I'm thinking, wow, these guys are really doing some deep stuff with this stuff, and this guy is using J. I was actually very surprised at that point, not so much that somebody who was working so much with data would know about J, but just that it would, I guess, just suddenly pop onto my headset, and I'm just, wow, that's so neat.

And I will say, in the array programming community, I find there's essentially a common misconception that the reason people aren't using array programming languages is because they don't know about them or don't understand them, which there's a kernel of truth of that, but the truth is nowadays there's huge massively funded research labs at places like Google Brain and Facebook AI Research and OpenAI and so forth where large teams of people are literally writing new programming languages because they've tried everything else and what's out there is not sufficient.

In the array programming world, there's often a huge underappreciation of what Python can do nowadays, for example. As recently as last week, I heard it described in a chat room, it's like, people obviously don't care about performance because they're using Python. And it's like, well, a large amount of the world's highest performance computing now is done with Python.

It's not because Python's fast, but if you want to use RAPIDS, for example, which literally holds records for the highest performance recommendation systems and tabular analysis, you write it in Python. So this idea of having a fast kernel that's not written in the language and then something else talking to it in a very flexible way, I think is great.

And as I say, at the moment, we are very hamstrung in a lot of ways in that, at least until recently, we very heavily relied on BLAS, which is totally the wrong thing for that kind of flexible high-performance computing, because it's this bunch of somewhat arbitrary kind of selection of linear algebra algorithms, which, you know, things like the C# work I did, you know, they were just wrappers on top of BLAS.

And what we really want is a way to write really expressive kernels that can do anything over any axes. So then there are other newer approaches like Julia, for example, which has kind of, like, got some Lispy elements to it and this type dispatch system. But because it's, you know, in the end, it's on top of LLVM.

What you write in Julia, you know, it does end up getting optimized very well. And you can write pretty much arbitrary kernels in Julia and often get best-in-class performance. And then there's other approaches like JAX. And JAX sits on top of something totally different, which is it sits on top of XLA.

And XLA is a compiler, which is mainly designed to compile things to run fast on Google's TPUs. But it also does an okay job of compiling things to run on GPUs. And then really excitingly, I think, you know, for me is the MLIR project, and particularly the affine dialect.

So that was created by my friend, Chris Lattner, who you probably know from creating Clang and LLVM and Swift. So he joined Google for a couple of years. And we worked really closely together on trying to, like, think about the vision of really powerful programming on accelerators that's really developer friendly.

Unfortunately, it didn't work out. Google was a bit too tied to TensorFlow. But one of the big ideas that did come out of that was MLIR, and that's still going strong. And I do think there's, you know, if something like APL, you know, could target MLIR and then become a DSL inside Python, it may yet win, you know.

I've heard, yeah, I've heard you in the past say, on different podcasts and talks, that, even in light of, you know, just saying people don't realize how much you can get done with Python, you don't think that the future of data science and AI and neural networks and that type of computation is going to live in the Python ecosystem.

And I've heard on some podcasts, you've said that, you know, Swift has a shot based on sort of the way that they've designed that language. And you just mentioned, you know, a plethora of different sort of, I wouldn't say initiatives, but you know, JAX, XLA, Julia, etc. Do you have like a sense of where you think the future of, not necessarily sort of array language computation, but this kind of computation is going with all the different avenues?

I do. You know, I think we're certainly seeing the limitations of Python, and the limitations of the PyTorch, you know, lazy evaluation model, which is the way most things are done in Python at the moment. For kind of array programming, you have an expression which is, you know, working on arrays, possibly of different ranks, with implicit looping.

And, you know, that's one line of Python code. And generally, you know, on your computer, that'll get turned into, you know, a request to run some particular optimized, pre-written operation on the GPU or TPU, that then gets sent off to the GPU or TPU, where your data has already been moved.

It runs, and then it tells the CPU when it's finished. And there's a lot of latency in this, right? So if you want to create your own kernel, like your own way of doing, you know, your own operation effectively, you know, good luck with that. That's not going to happen in Python.

And I hate this, I hate it as a teacher, because, you know, I can't show my students what's going on, right? It kind of goes off into, you know, kind of CUDA land and then comes back later. I hate it as a hacker, because I can't go in and hack at that, I can't trace it, I can't debug it, I can't easily profile it.

I hate it as a researcher, because very often I'm like, I know we need to change this thing in this way, but I'm damned if I'm going to go and write my own CUDA code, let alone deploy it. So JAX is, I think, a path to this. It's where you say, okay, let's not target pre-written CUDA things, let's instead target a compiler.

And, you know, working with Chris Lattner, I'd say he didn't have too many nice things to say about XLA as a compiler. It was not written by compiler writers, it was written by machine learning people, really. But it does the job, you know, and it's certainly better than having no compiler.

And so JAX is something which, instead of turning our line of Python code into a call to some pre-written operation, it instead is turning it into something that's going to be read by a compiler. And so the compiler can then, you know, optimize that as compilers do. So, yeah, I would guess that JAX probably has a part to play here, particularly because you get to benefit from the whole Python ecosystem, package management, libraries, you know, visualization tools, et cetera.
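
A tiny sketch of that idea, handing a line of array code to a compiler via JAX rather than dispatching to a pre-written operation (the function here is invented for the example):

```python
import jax
import jax.numpy as jnp

@jax.jit                      # trace the Python and hand the result to the XLA compiler
def normalize(x):
    return (x - x.mean()) / x.std()

x = jnp.arange(10.0)
print(normalize(x))           # compiled once, then run on whatever backend JAX targets (CPU/GPU/TPU)
```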

But, you know, longer term, it's a mess, you know, it's a mess using a language like Python which wasn't designed for this. It wasn't really even designed as something that you can chuck different compilers onto. So people put in horrible hacks. So, for example, PyTorch, they have something called TorchScript, which is a bit similar.

It takes Python and kind of compiles it. But they literally wrote their own parser using a bunch of regular expressions. And it's, you know, it's not very good at what it does. It even misreads comments and stuff. So, you know, I do think there's definitely room for, you know, a language, of which Julia would certainly be the leading contender at the moment, to come in and do it properly.

And Julia's got, you know, Julia is written on a Scheme basis. So there's this little Scheme kernel that does the parsing and whatnot. And then pretty much everything else after that is written in Julia. And, of course, leveraging LLVM very heavily. But I think that's what we want, right?

Is that something which I guess I didn't love about Swift. When the team at Google wanted to add differentiation support into Swift, they wrote it in C++. And I was just like, that's not a good sign. You know, like, apart from anything else, you end up with this group of developers who are, in theory, Swift experts, but they actually write everything in C++.

And so they actually don't have much feel for what it's like to write stuff in Swift. They're writing stuff for Swift. And Julia, pretty much everybody who's writing stuff for Julia is writing stuff in Julia. And I think that's something you guys have talked about around APL and J as well, is that there's the idea of writing J things in J and APL things in APL is a very powerful idea.

Yeah, I always wonder about it. Yeah, sorry, go on. I just remembered your third question. I'll come back to it. No, no, no, you go ahead. You had... Oh, you asked me why now am I coming back to APL and J, which is totally orthogonal to everything else we've talked about, which is, I had a daughter, and she got old enough to actually start learning math.

So she's six. And oh, my God, there are so many great educational apps nowadays. There's one called DragonBox Algebra. It's so much fun. DragonBox Algebra 5+. And it's like, 5+, algebra, like, what the hell? So when she was, I think, actually still four, I, you know, I let her play with DragonBox Algebra 5+.

And she learned algebra, you know, by helping dragon eggs hatch. And she liked it so much, I let her try doing DragonBox Algebra 12+. And she loved that as well and finished it. And so suddenly I had a five-year-old kid that liked algebra. Much to my surprise.

Kids really can surprise you. And so, yeah, she struggled with a lot of the math that they were meant to be doing at primary school, like division and multiplication, but she liked algebra. And we ended up homeschooling her. And her best friend is also homeschooled.

So this year I decided I'd try tutoring them in math together. My daughter's name is Claire, and her friend is Gabe. So her friend Gabe discovered on his Mac the world of alternative keyboards. So he would start typing in the chat in, you know, Greek characters or Russian characters.

And one day I was like, okay, check this out. So I, like, typed in some APL characters and they were just like, wow, what's that? We need that. So initially we installed Dyalog APL so that they could type APL characters in the chat. And so I explained to them that this is actually this, like, super fancy math that you're typing in.

And they really wanted to try it. And that was at the time I was trying to teach them sequences and series, and they were not getting it at all. It was my first total failure as a, as a math tutor with them, you know, they'd been zipping along, fractions, you know, greatest common denominator, factor trees.

Okay, everything's fine. It makes sense. And then we hit sequences and series. And it's just like, they had no idea what I was talking about. So we put that aside. Then we spent like three one hour lessons doing the basics of APL, you know, the basic operations and doing stuff with lists and dyadic versus monadic, but still, you know, just primary school level math.

And we also did the same thing in NumPy using Jupyter. And they really enjoyed all that, like they were more engaged than our normal lessons. And so then we came back to like, you know, sigma i equals one to five of i squared, whatever. And I was like, okay, that means this, you know, in APL and this in NumPy.

And they're like, oh, is that all? Fine. Whereas, you know, that was the problem with this idea of, like, Tn equals Tn minus one plus blah, blah, blah, blah. It's like, what is this stuff? But when you're actually indexing real things and can print out the intermediate values and all that, and you've got iota or a range, they were just like, oh, okay.

You know, I don't know why you explained it this dumb way before. And I will say, given a choice between doing something on a whiteboard or doing something in NumPy or doing something in APL, now they will always pick APL because the APL version is just so much easier.
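
For instance, the sum-of-squares example mentioned above, written the "is that all?" way (NumPy here; in APL it would be a plus-reduction over iota):

```python
import numpy as np

i = np.arange(1, 6)        # 1 2 3 4 5, the range the sigma runs over
total = (i ** 2).sum()     # sigma of i squared, for i = 1..5, which is 55
print(i ** 2, total)
```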

You know, there's less to type, there's less to think about, there's less boilerplate. And so it's been, it's only been a few weeks, but like yesterday, we did the power operator, you know, and so we literally started doing the foundations of metamathematics. So it's like, okay, let's create a function called capital S, capital S arrow, you know, plus jot one, right?

So for those Python people listening, jot is, if you give it an array or a scalar, it's the same as partial in Python or bind in C++. So, okay, we've now got something that adds one to things. Okay. I said, okay, this is called the successor function. And so I said to them, okay, what would happen if we go SSS zero?

And they're like, oh, that would be three. And so I said, okay, well, what's, what's addition? And then one of them's like, oh, it's, it's repeated S. I'm like, yeah, it's repeated S. So how do we say repeated? So in APL, we say repeated by using this star diaeresis.

It's called power. Okay. So now we've done that. What is multiplication? And then one of them goes after a while. Oh, it's repeated addition. So we define addition, and then we define multiplication. And then I'm like, okay, well, what about, you know, exponent? Oh, that's just, now this one, they've heard a thousand times.

They both are immediately like, oh, that's repeated multiplication. So like, okay, we've now defined that. And then, okay, well, subtraction, that's a bit tricky. Well, it turns out that subtraction is just, you know, is the opposite of something. What's it the opposite of? They both know that. Oh, that's the opposite of addition.

Okay. Well, opposite of, which in math we call inverse, is just a negative power. So now we define subtraction. So how would you define division? Oh, okay. How would you define roots? Oh, okay. So we're kind of, like, you know, designing the foundations of mathematics here in APL, you know, with a six-year-old and an eight-year-old.
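
A rough Python translation of that construction, using functools.partial where the APL session uses jot; this is a sketch of the idea, not the exact lesson:

```python
from functools import partial
import operator

def power(f, n):
    """Analogue of APL's power operator: apply f, n times."""
    def repeated(x):
        for _ in range(n):
            x = f(x)
        return x
    return repeated

S = partial(operator.add, 1)        # successor function: the "plus jot one" from the lesson

def add(a, b): return power(S, b)(a)                 # addition = repeated successor
def mul(a, b): return power(partial(add, a), b)(0)   # multiplication = repeated addition
def exp(a, b): return power(partial(mul, a), b)(1)   # exponentiation = repeated multiplication

print(power(S, 3)(0), add(2, 3), mul(4, 3), exp(2, 5))   # 3 5 12 32
```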

And during this whole thing at one point, we're like, okay, well, now, I can't remember why, but we're like, okay, now we've got to do one divided by a half. And they're both like, we don't know how to do that. So, you know, in APL, this stuff that's considered like college level math suddenly becomes easy.

And, you know, at the same point, stuff that's still primary school level math, like one divided by a half, is considered hard. So it definitely made me rethink, you know, what is easy and what is hard and how to teach this math stuff. And I've been doing a lot of teaching of math with APL and the kids are loving it.

And I'm loving it. And that's actually why I started this study group, which will be on today, today as we record this. A few days ago, as you put it out there, I kind of started saying on Twitter to people, like, oh, it's really been fun teaching my kids, you know, my kid and a friend, math using APL, and a lot of adults were like, ah, can we learn math using APL as well?

So that's what we're going to do. Well, and that's the whole notation thing, isn't it? It's the notation, you get away from the sigmas and the pis and all that, you know, subscripts. I know, right? This is exactly what Iverson wanted. Yeah, exactly. I mean, who wants this, you know, why should capital pi be product and capital sigma be sum?

Like, you know, we did plus slash and it's like, okay, how do we do product? They're like, oh, it's obviously times slash. And I show them backslash, it's like, how do we do a cumulative product? And so it's obviously times backslash. Yeah, this stuff. But, you know, a large group of adults can't handle this, because I'll put stuff on Twitter.

I'll be like, here's a cool thing in APL. And like half the replies will be like, well, that's line noise. That's not intuitive. It's like, how do you say that? It's this classic thing that I've always said, it's like the difference between, is it that you don't understand it, or is it that it's hard?

And, you know, kids don't know for kids, everything's new. So that, you know, they see something they've never seen before. They're just like, teach me that. Or else adults, or at least a good chunk of adults, just like, I don't immediately understand that. Therefore, it's too hard for me.

Therefore, I'm gonna belittle the very idea of the thing. I did, I did a tacit program on one liner on APL farm the other day. And somebody said, that looks like Greek to me. I said, well, Greek looks like Greek to me, because I don't know Greek. I mean, sure.

If you don't know it, absolutely, it looks silly. But if you know it, then it's, it's not that hard. Yeah, I will say like, you know, a lot of people have put a lot of hard work into resources for APL and J teaching. But I think there's still a long way to go.

And one of the challenges is, it's like when I was learning Chinese. I really wanted to, I liked the idea of, learning new Chinese words by looking them up in a Chinese dictionary. But of course, I didn't know what the characters in the dictionary meant. So I couldn't look them up.

So when I learned Chinese, I really spent the first 18 months just focused on learning characters. So I got through 6000 characters in 18 months of very hard work. And then I could start looking things up in dictionary. My hope is to do a similar thing for APL, like for these study groups, I want to try to find a way to introduce every glyph in an order that never refers to glyphs you haven't learned yet.

Like, that's something I don't feel like we really have. And so that then you can look up stuff in the Dyalog documentation. Because now, still, I don't know that many glyphs. So, like, most of the stuff in the documentation I don't understand, because it explains glyphs using glyphs I don't yet know.

And then I look those up. And those are, in turn, explained with glyphs I don't yet know. So, you know, step one for me is I think we're just going to go through and try to teach what every glyph is. And then I feel like we should be able to study this better together, because then we could actually read the documentation. Are you going to publish these sessions online?

Yeah, so the study group will be recorded as videos. But I also then want to actually create, you know, written materials using Jupyter, which I will then publish. That's my goal. So what you said very much resonates with me, that I often find myself, when teaching people this, in this bind that to explain anything I need to already have everything explained.

And I think so, and especially it comes down to, in order to explain what many of these glyphs are doing, I need some fancy arrays. If I restrict myself to simple vectors and scalars, then I can't really show their power. And I cannot create these higher rank arrays without already using those glyphs.

And so hopefully, there is this long running project, since like 2015, I think it is, to add a literal array notation to APL. And then there is a way in: you can start by looking at an array, and then you can start manipulating it and see the effects of the glyphs and intuit from there what they do.

Yeah, no, I think that'll be very, very helpful. And in the meantime, you know, my approach with the kids has just been to teach rho quite early on. So rho is the equivalent of reshape in Python, in most Python libraries. And yeah, so once you know how to reshape, you can start with a vector and reshape it to anything you like.

And it's, you know, it's not a difficult concept to understand. So I think that, yeah, basically, the trick at the moment is just to say, okay, in our learning of the dictionary of APL, one of the first things we will learn is, is rho. And that was really fun with the kids, doing monadic rho, you know, to be like, okay, well, what's rho of this?

What's rho of that? And okay, what's rho of rho of this? And then what's rho of rho of rho, which then led me to the Stallman poem about, what is it, rho, rho, rho is one, etc., etc., which they loved as well. Yeah, we'll link that in the show notes.
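
In NumPy terms, the dyadic and monadic rho being described look roughly like this (the values are invented for the example):

```python
import numpy as np

v = np.arange(12)            # a flat vector, 0..11
m = v.reshape(3, 4)          # dyadic rho: give a shape, get an array of that shape
print(m.shape)               # monadic rho: (3, 4), the shape of the array
print(np.shape(m.shape))     # rho of rho: (2,), the length of the shape, i.e. the rank
```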

Also, too, while you were saying all that, that really resonated with me. When I first started learning APL, like, one of the first things that happened was I was like, okay, you can fold, you can map. So, like, how do you filter, you know, the classic, you know, three functional things?

And the problem with APL and array languages is they don't have an equivalent filter that takes a predicate function; they have a filter that is called compress that takes a mask and, you know, drops anything that corresponds to a zero. And it wasn't until a few months later that I ended up discovering it.
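
In Python terms, the difference is roughly between a predicate-based filter and a Boolean-mask compress (the data here is made up):

```python
import numpy as np

xs = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Functional-style filter: a predicate function selects elements.
evens_filter = list(filter(lambda x: x % 2 == 0, xs))

# Array-style "compress": build a Boolean mask, then use it to select,
# roughly the equivalent of APL's  (0=2|xs)/xs
mask = xs % 2 == 0
evens_compress = xs[mask]
print(evens_filter, evens_compress)
```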

But for both APL and the newer APL, BQN, there's these two sites. Adám was the one that wrote the APL one, aplcart.info, and there's BQNcrate as well, I think. And so you can basically semantically search for what you're trying to do. And it'll give you small expressions that do that.

So if you type in the word filter, which is what you would call it coming from, you know, a functional language, or even I think Python calls it filter, you can get a list of small expressions. And really often you need to know the exact thing that it's called; like one time I was searching for, you know, all the combinations or permutations.

And really, what I was looking for was power set. And so until you have that, you know, the word power set, it's, you know, it's a fuzzy search, right? So but it's still a very, very useful tool when it's like you said, you're trying to learn something like Chinese.

And it's like, well, where do I even start? I don't know the language well enough to know the words to search for. But yeah, I agree that there's a lot of room for improvement in how to onboard people without them immediately going, like you said, this looks like hieroglyphics, which I think Iverson considered a compliment; there's some anecdote I've heard where someone said, this is hieroglyphics.

And he says, yes, exactly. And then the other thing like that I want to do is help in particular Python programmers and maybe also do something for JavaScript programmers, which are the two most popular languages, like at the moment, like a lot of the tutorials for stuff like J or whatever, like J for C programmers, you know, great book, but most people aren't C programmers.

And also a lot of the stuff, like, you know, it'd be so much easier if somebody just said to me early on, oh, you know, this is just the same as partial in Python. Or it's like, you know, putting things in a box, what the hell's a box? If somebody basically said, oh, it's basically the same as a reference.

It's like, oh, okay. You know, I think in one of your podcasts, somebody said, oh, it's like void stars. Oh, yeah, okay. There's kind of this lack of just saying, this is actually the same thing as in Python and JavaScript. So I do want to do some kind of mapping like that, particularly for kind of NumPy programmers and stuff, because a lot of it's so extremely similar.

It'd be nice to kind of say, like, okay, well, J maps things over leading axes, which is exactly the same as NumPy, except it doesn't have trailing axes. So if you know the NumPy rules, you basically know the J rules. Yeah, I think at the basic level, you're absolutely right.
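One rough way to see that leading-axis correspondence in NumPy (a sketch of the idea only, not a full statement of either language's rules):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)       # shape (2, 3, 4)

# Iterating a NumPy array goes over its leading axis:
# two items, each a rank-2 cell of shape (3, 4).
print([cell.shape for cell in a])        # [(3, 4), (3, 4)]

# Reducing over the leading axis combines those cells item-wise,
# roughly what a sum reduction over the frame does in J or APL.
print(np.add.reduce(a, axis=0).shape)    # (3, 4)
```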

And that would certainly be really useful. When we've talked this over before, some of the challenges are in the flavors and the details. If you send somebody down the wrong road with a metaphor that almost works in some of these areas, it can really be challenging for them, because they see it through the lens of their experience.

But the metaphor would say that in this area it works one way, when it actually works differently. So there is a challenge in that. And we find it even between APL, BQN, and J. I'm trying to think of what we were talking about. Oh, it was transpose; the languages' dyadic transposes, they handle them differently.

Functionally, you can do the same things, but you have to be aware that they are going to do it differently, according to the language. Absolutely. But that's not a reason to throw out the analogy, right? Like, I think everybody agrees that it's easier for an APL programmer to learn J than for a C or JavaScript programmer to learn J, you know, because there are some ideas you already understand.

And you can actually say to people like, okay, well, this is the rank conjunction in J. And you may recognize this as being like the rank, you know, operator in APL. So if we can do something like that and say like, oh, well, okay, this would do the same thing as, you know, dot permute, dot blah in PyTorch.

It's like, okay, I see it. Well, as the maintainer of APLcart, I'd like to throw in a little call to the listeners. Like what Connor mentioned, I do fairly often get people saying, well, I couldn't find this, and I ask them, what did you search for? So do let me know, contact me by whatever means, if you couldn't find something, either because it's altogether missing, and I might be able to add it, or tell me what you searched for and couldn't find, or maybe you found it later by searching for something else.

And I'll add those keywords for future users. And I have put in a lot of function names from other programming languages so that you can search for those and find the APL equivalent. Yeah, I will say, I feel like either I'm not smart enough to use aplcart.info, or I haven't got the right tutorial yet.

Because I've been there a few times. And there's this whole lot of impressive-looking stuff. And I just don't know what to do with it. And then I sometimes click things and it sends me over to this tio.run that tells me, like, real time 0.02 seconds, code... I find it, you know, I have not yet, I don't yet know how to use it.

And so, you know, I guess given hearing you guys say this is a really useful tool that a lot of people put a lot of time into, I should obviously invest time learning how to use it. And maybe after doing that, I should explain to people how to use it.

I do have a video on it. And there's also a little question mark icon one can click on and get to. I have tried the question mark icon as well. As I say, it might just you know, I think this often happens with APL stuff. I often hit things and I feel like maybe I'm not smart enough to understand this.

We clearly don't think that's it, if we may disagree. Yeah, I do recall you saying a few minutes ago that you managed to teach your, you know, four-year-old daughter, like, grade 12 or age 12 algebra. No, I didn't. I just gave her the app, right? I've heard other parents have given it to their kids.

They all seem to handle it. It's it's just this fun game where you hatch dragon eggs by like dragging things around on the iPad screen. And it just it so happens that the things you're doing with dragon's eggs are the rules of algebra. And after a while, it starts to switch out some of the like monsters with symbols like x and y, you know, and it does it gradually, gradually.

And at the end, it's like, oh, now you're doing algebra. So I can't take any credit for that. That's some very, very clever people who wrote a very cool thing. It really is an amazing program. I homeschooled my son as well. And we used that for algebra. Great.

Yeah, it was a bit more age appropriate, but I looked at that and said, that really is well put together. It's an amazing program. I will say, there'll be a DragonBox APL one day. It's not a bad idea. Not a bad idea at all. I was going to say, when you're teaching somebody, one of the big challenges when you're sort of trying to get a language across to a general audience is: who is the audience?

Because as you say, if you're if you're dealing with kids or people who haven't been exposed to programming before, that's a very different audience than somebody might have been exposed to some other type of programming. Functional programming is a bit closer, but if you're a procedural programmer or imperative programmer, it's going to be a stretch to try and bend your mind in the different ways that, you know, APL or J or BQN expect you to think about things.

Yeah, I think the huge rise of functional programming is very helpful for coming to array programming, you know, both in JavaScript and in Python. I think most people doing stuff, particularly in the machine learning and deep learning world, are doing a lot of functional stuff.

That's the only way you can do things, particularly in deep learning. So I think, yeah, I think that does help a lot. Like, like Connor said, like you've probably come across, you know, map and reduce and filter and certainly in Python, you'll have done list comprehensions and dictionary comprehensions.

And a lot of people have done SQL. So it's, yeah, I think a lot of people come into it with some relevant analogies, if we can help connect for them. Yeah, one of the things that, you know, this really is reinforcing my idea that, or it's not my idea, I think it's just an idea that multiple people have had, but the tool doesn't exist yet.

Because we'll link to some documentation that I use frequently when I'm going sometimes between APL and J: on the BQN website, they have BQN to Dyalog APL dictionaries and BQN to J dictionaries. So sometimes, if I'm trying to convert between the two, the BQN docs are so good.

I'll just use BQN as like an IR to go back and forth. But I've mentioned on previous podcasts that really what would be amazing and it would only work to a certain extent is something like a multidirectional array language transpiler and adding NumPy to that list would probably be, you know, a huge, I don't know what the word for it is, but beneficial for the array community.

If you can type in some NumPy expression, you know, like I said, it's only gonna work to an extent, but for simple, you know, rank one vectors or arrays that you're just reversing and summing and doing simple, you know, reduction and scan operations, you could translate that pretty easily into APL, J, and BQN.
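For instance, the kind of expressions that translate almost one-for-one (NumPy shown, with from-memory APL spellings in the comments):

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5])

print(v[::-1])         # reverse          APL: ⌽v
print(v.sum())         # sum reduction    APL: +/v
print(np.cumsum(v))    # plus scan        APL: +\v
print(v[::-1].sum())   # sum of reverse   APL: +/⌽v
```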

And it's, I think that would make it so much easier for people to understand, aka the hieroglyphics or the Greek or the Chinese or whatever metaphor you want to use. Because yeah, this is, it is definitely challenging at times to get to a certain point where you have enough info to keep the snowball rolling, if you will.

And it's very easy to hit a wall early on. Yeah. A project I've been thinking about is basically to rewrite NumPy in APL. It doesn't seem like a whole lot of work: just take all those names that are available in NumPy and define them as APL functions.

And people can explore that by opening them up and seeing how they're defined. Oh, so you're saying it wouldn't actually be a new thing. You're just saying, rename the symbols to what they're known as in NumPy, so that you'd still be in, like, an APL. Yeah.

I mean, you could use it as a library, but I was thinking of it more as an interactive exploring type thing, where you open up this library and then you, you write the name of some NumPy thing functionality and open it up in the editor and see, well, how is this defined in APL?

And then you could use it obviously, since it's defined. Interesting. Then you could slowly, you could use these library functions. And then as you get better at APL, you can start actually writing out the raw APL instead of using these covers for it. Well, I guess, Jeremy, that's interesting.

Do you think that, because you've mentioned about sort of the notation versus the programming language and where do you think the, like, in your dream scenario, are you actually coding in sort of an Iversonian like notation? Or is it at the end of the day, does it still look like NumPy, but it's just all of the expressivity and power that you have in the language like APL is brought to and combined with what NumPy sort of currently looks like?

I mean, well, it'd be a bit of a combination, Connor, in that, like, you know, my classes and my type dispatch and my packaging and, you know, my function definitions and whatever, that's Python. But, you know, everywhere I can use plus and times and divide and whatever, I could also use any APL glyph.

And so it'd be, you know, basically an embedded DSL for kind of high dimensional notation. It would work automatically on NumPy arrays and TensorFlow tensors and PyTorch tensors. I mean, one thing that's interesting is, to a large degree, APL and PyTorch and friends have actually arrived at a similar place with the same, you know, grandparents, which is, Iverson actually said his inspiration for some of the APL ideas was tensor analysis.

And a lot of the folks, as you can gather from the fact that in PyTorch we don't call them arrays, we call them tensors, a lot of the folks working on deep learning, their inspiration was also from tensor analysis. So it comes from physics, right? And so I would say a lot more of the folks who have worked on PyTorch were familiar with tensor analysis and physics than were familiar with APL.

And then, of course, there's been other notations. Explicitly based on Einstein notation, there's a thing called einops, which is a very interesting kind of approach, taking Einstein notation much further. And Einstein notation, if you think about it, is kind of the loop-free programming of math, right?

The equivalent of loops in math is indices. And Einstein notation does away with indices. And so that's why stuff like einops is incredibly powerful, because you can write, you know, an expression in einops with no indices and no loops. And it's all implicit reductions and implicit loops. I guess, yeah, my ideal thing would be, we wouldn't have to use einops, we could use APL, you know, and it wouldn't be embedded in a string.

They would actually be operators. Yeah, that's what it is. They'd be operators in the language. The Python operators would not just be plus, times, minus, slash; all the APL glyphs would be Python operators. And they would work on all Python data types, including all the different tensor and array data types.
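As a small, concrete taste of the "implicit loops and implicit reductions" point, NumPy's einsum is the Einstein-notation corner of Python that the einops idea takes further (this is just standard NumPy, not the hypothetical glyph-operator DSL being imagined):

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)

# Matrix product with no explicit loops: the repeated index j is contracted.
C = np.einsum('ij,jk->ik', A, B)

# Row sums: dropping an index from the output implies a reduction over it.
row_sums = np.einsum('ij->i', A)

print(C.shape, row_sums.shape)   # (3, 5) (3,)
```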

Interesting. Yeah. So it sounds like you're describing a kind of hybrid language. JavaScript too. I would love the whole DSL to be in JavaScript as well. You know, that'd be great. And I feel like I saw that somewhere. I feel like I saw somebody actually do an ECMAScript, you know, RFC with an implementation.

Yeah, it was an April Fools joke. Yeah, but it actually worked, didn't it? Like, there was actually an implementation. I don't think they had the implementation. It was just very, very well-specced, it-could-actually-work kind of thing. No, I definitely read the code. I don't know how complete it was, but there was definitely some code there.

I can't find it again. If you know where it is... There's a JavaScript implementation of APL by Nick Nickolov. But my problem with it is it's not tightly enough connected with the underlying JavaScript. It shouldn't just be an April Fools joke, should it? It's like, Gmail was an April Fools joke, right?

Gmail came out on April 1st and totally destroyed my plans for FastMail, because it was an April Fools joke that was real. And Flask, you know, the Flask library, I think, was originally an April Fools joke: we shouldn't be using frameworks, so I created a framework that's so stupidly small that it shouldn't be a framework.

And now that's the most popular web framework in Python. So, yeah, maybe this should be an April Fools joke that becomes real. How close? This is maybe an odd question, but because from what I know about Julia, you can define your own Unicode operators. And I did try at one point to create a small composition of two different symbols, you know, square root and reverse or something, and it ended up not working and asking me for parentheses.

But do you think Julia could evolve to be that kind of hybrid language? Maybe. I'm actually doing a keynote at JuliaCon in a couple of weeks, so maybe I should raise that. Just at the Q&A section, say, any questions? But first, I've got one for the community at large.

Here's what I'd like. I think my whole talk is going to be kind of like what Julia needs to be, you know, to move to the next level. I'm not sure I can demand that a complete APL implementation is that thing, but I could certainly put it out there as something to consider.

It always bothers me, though, that if you try to extend those languages like this or you could do some kind of pre-compiler for it, then their order of execution ends up messing up APL. I think APL very much depends on having a strict one-directional order of functions, otherwise it's hopeless to keep track of.

That is a big challenge, because currently the DSL inside Python, which is the basic mathematical operations, does have the BODMAS or PEMDAS order of operations. So there would need to be some way. So in Python, that wouldn't be too hard, actually, because in Python, you can opt into different kinds of parsing by adding a from __future__ import.

You could have a from __future__ import APL precedence. And then from then on, everything in your file is going to use right-to-left precedence. That's really interesting and cool. I didn't know that. Yeah, that's awesome. I've been spending a lot of time thinking about function precedence and just the differences in different languages.
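For what it's worth, this is how the existing __future__ mechanism looks in practice; apl_precedence is purely hypothetical and does not exist, but real feature flags like annotations work in exactly this per-module way:

```python
from __future__ import annotations   # real flag: changes how this module's annotations are handled

import __future__

# The full list of real, existing feature flags:
print(__future__.all_feature_names)

# The idea floated above would be something along the lines of:
#   from __future__ import apl_precedence   # hypothetical, not a real feature
```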

I'm not sure if any other languages have this, but something that I find very curious about BQN and APL is that they have functions, basically, that have higher precedence than other functions. So operators in APL, and conjunctions and adverbs, they have higher precedence than your regular functions that apply to arrays.

I'm simplifying a tiny bit, but this idea that in Haskell, function application always has the highest precedence; you can never get anything that has higher precedence than that. And having stumbled into the array world now, it seems like a very powerful thing that these combinator-like functions don't just have the default function precedence.

Because if you have a fold or a scan or a map, you're always combining that with some kind of binary operation or unary operation to create another function that you're then going to eventually apply to something. But the basic right-to-left parsing, putting aside the higher order functions, or operators as they're known in APL, the basic right-to-left parsing, again, for teaching and for my own brain, gosh, that's so much nicer than in C++.

Oh my God, the operator precedence; there's no way I can ever remember that. And there's a good chance when I'm reading somebody else's code that they haven't used parentheses because they didn't really need them, and I have no idea where they have to go, and then I have to go and look it up.

It's another of these things that with the kids, I'm like, okay, you remember that stuff we spent ages on about like, first you do exponents and then you do times. It's like, okay, you don't have to do any of that in APL. You just go right to left and they're just like, oh, that's so much better.
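A quick side-by-side of that point about evaluation order, with Python standing in for the school rules and the APL readings noted in comments:

```python
print(3 * 2 + 1)     # Python/BODMAS: multiplication binds tighter -> 7
print(3 * (2 + 1))   # what the same expression means in APL: 3×2+1 goes
                     # strictly right to left, i.e. 3×(2+1) -> 9
print(2 ** 3 * 2)    # Python: exponent first -> 16; in APL, 2*3×2 is
                     # 2*(3×2) -> 64, since * is APL's power function
```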

This literally came up at work like a month ago, where I was giving this mini APL talk; we had 10 minutes at the end of a meeting, and I just made this offhand remark that, of course, the evaluation order in APL is a much simpler model than what we learned in school.

And it upset... there were, I don't know, 20 people in the meeting, and it was the most controversial thing I had said. I almost had like an out-of-body experience, because I thought I was saying something that was objectively just true. And then I was like, wait a second, what am I clearly missing here?

Yeah, well, you were wrong. I mean, how do you communicate that? No, I mean, most adults are incapable of, like, new ideas. That's what I should have said in the meeting. I mean, this is a reason, another reason, I like doing things like APL study groups, because it's a way of self-selecting that small group of humanity who's actually interested in trying new things, despite the fact that they're grownups, and then trying to surround myself with those people in my life.

But isn't it sad then? I mean, what has happened to those grownups? Like when you mentioned teaching these people and trying to map their existing knowledge onto APL things, what does it mean to box and so on. I find that for children and non-programmers, expanding their array model and how the functions are applied and so on is almost trivial.

It meets no resistance at all. And it's all those adults that have either learned their primitives, or BODMAS, or whatever the rules are, and all the computer science people that know their precedence tables and their lists of lists and so on. Those are the ones that are really, really struggling.

It's not just resisting. They're clearly struggling. They're really trying, and it's a lot of effort. So there is actually, I mean, that is a known thing in educational research. So yeah, I spent months earlier this year and late last year reading every paper I could about, you know, education, because I thought if I'm going to be homeschooling, then I should try to know what I'm doing.

And yeah, what you describe, Adám, is absolutely a thing, which is that, you know, the research shows that when you've got an existing idea, which is an incorrect understanding of something, and you're trying to replace it with a correct understanding, that is much harder than learning the correct version directly.

So which is obviously a challenge when you think about analogies and analogy has to be good enough to lead directly to the, to the correct version. But I think, you know, the important thing is to find the people who are who have the curiosity and tenacity to be prepared to go over that hurdle, even though it's difficult, you know, because yeah, it is like, that's just, that's just how human brains are.

So so be it, you know. Yeah, unlearning is really hard work, actually. And if you think about it, it probably should be because you spend a lot of time and energy to put some kind of a pattern into your brain. Right. You don't want to have that evaporate very quickly.

Right. And our, you know, myelination occurs around, what, up to age 12 or something. So our brains are literally trying to stop us from having to learn new things, because our brains think that they've got stuff sorted out at that point. And so they should focus on keeping long-term memories around.

So yeah, it does become harder. But, you know, it's still totally doable. The solution is obvious: teach APL in primary school. That's what I'm doing. What was the word you mentioned? Myelination? Myelination. M-Y-E-L-I-N-A-T-I-O-N. Interesting. I'd not heard that one before. So it's a physical coating that, I can't remember, goes on the dendrites?

I think it's on the axons, isn't it? That sounds right. These fat layers or cholesterol layers. I never took any biology courses in my education. So clearly, I've missed out on that aspect. You myelinated anyway. Isn't that an APL function? Myelinate. You also mentioned the word tenacity, Jeremy. Yeah.

And I was watching an interview with Sanyam Bhutani. And you were talking about... it sounds like you spotted at an early point in his working with Kaggle that he was somebody probably different. And the thing you said was that tenacity to keep working at something.

Yeah. I think that's a really important part about educating people, that they shouldn't necessarily expect learning something new to be easy. Yeah. But you can do it. Oh, yeah. I mean, I really noticed that when I started learning Chinese. Like I went to, you know, just some local class in Melbourne.

And everybody was very, very enthusiastic, you know, and everybody was going to learn Chinese. And we all talked about the things we were going to do. And yeah, each week, there'd be fewer and fewer people there. And, you know, I kind of tried to keep in touch with them.

But after a year, every single other person had given up and I was the only one still doing it. You know, so then after a couple of years, people would be like, wow, you're so smart. You learn Chinese. This is like, no, man. Like during those first few weeks, I was pretty sure I was learning more slowly than the other students.

But everybody else stopped doing it. So of course, they didn't learn Chinese. And I don't know what the trick is, because, yeah, it's the same thing with, you know, like it fast. I courses, they're really designed to keep people interested and get people doing fun stuff from from day one.

And, you know, still, I'd say most people drop out and the ones that don't I would say most of them end up becoming like actual world class practitioners and they, you know, build new products and startups and whatever else. And people will be like, oh, I wish I knew neural nets and deep learning.

It's like, okay, here's the course. Just do it and don't give up. But yeah, I don't know, tenacity, it's not a very common virtue, I think, for some reason. It's something I've heard, I think it's Jo Boaler at Stanford, talk about the growth mindset. And I think that is something that, for whatever reason, some people tend to... and maybe it's myelination, at those ages, you start to get that mindset where you're not so concerned about having something happen that's easy to do well.

But just the fact that if you keep working at it, you will get it. And not everybody, I guess, is maybe put in the situations that they get that feedback that tells you if I keep trying this, I'll get it. If it's not easy, they stop. Yeah, I mean, that area of growth mindset is a very controversial idea in education.

Specifically the question of can you modify it? And I think it's certainly pretty well established to this point that the kind of stuff that schools have tended to do, which is put posters up around the place saying like, you know, make things a learning opportunity or don't give up, like they do nothing at all.

You know, with my daughter, we do all kinds of stuff around this. So we've actually invented a whole family of clams. And as you can imagine, clams don't have a growth mindset, they tend to sit on the bottom of the ocean, not moving. And so the family of clams that we invented that we live with, you know, always at every point that we're going to have to like learn something new or try something new, always start screaming and don't want to have anything to do with it.

And, you know, so we actually have Claire telling the clams how it's going to be okay. And, you know, it's actually a good thing to learn new things. And so we're trying stuff like that to try to like have have imaginary creatures that don't have a growth mindset and for her to realize how how silly that is, which is fun.

And the things that you were talking about in terms of the meta-mathematics, you didn't say, oh, the successor, this is what plus is; you said, how would you use this? How would you start to put it together yourselves? Which to me, that's the growth mindset. Yeah, you're creating that.

But then like, you know, gosh, you're getting to all the most controversial things in education here, Bob, because that's the other big one is discovery learning. So this idea of having kids explore and find. It's also controversial, because it turns out that actually the best way to have people understand something is to give them a good explanation.

So it is important that you combine this, like, okay, how would you do this, with, okay, let me just tell you what this is and why. It's easier for homeschooling with two kids, because I can make sure their exploration is short, and correct. You know, if you spend a whole class, you know, 50 minutes, doing totally the wrong thing, then you end up with these really incorrect understandings, which you then have to kind of deprogram.

So yeah, education's hard, you know. And I think a lot of people look for these simple shortcuts, and they don't really exist. So you actually have to have good, good explanations and good problem solving methods and yeah, all this stuff. That's a really interesting area, the notation and the tools.

Yeah, and you know, notation, I mean, so I do a live coding, you know, video thing every day with a bunch of folks. And in the most recent one, we started talking about APL, why we're going to be doing APL this week instead. And I gave, you know, somebody actually said like, oh, my God, is it going to be like regexes?

And, you know, I kind of said like, okay, so regexes are a notation for doing stuff. And we spent an hour solving the problem with regexes. And oh, my God, it was such a powerful tool for this problem. And you know, by the end of it, they were all like, okay, we want to like deeply study regexes.

And obviously, that's a much less flexible and powerful notation than APL. But you know, we kind of talked about how once you start understanding these notations, you can build things on top of them. And then you kind of create these abstractions. And that's, yeah, notation is how, you know, deep human thought kind of progresses, right, in a lot of ways.

So, you know, it's like, I actually spoke to a math professor friend a couple of months ago about, you know, my renewed interest in APL. And he was like, and I kind of sent him some, I can't remember what it was, maybe doing the golden ratio or something, little snippet, and he was just like, yeah, something like that looks like Greek to me, I don't understand that.

It's like, dude, you're a math professor; if I showed somebody who isn't in math a page of your, you know, research, what are they going to say? And, you know, it's interesting, I said, like, there's a bit of these ideas in here, like Iverson brackets, for example; have you ever heard of Iverson brackets?

He's like, well, of course, I've heard of it. Like, you know, it's a fundamental tool in math. It's like, well, you know, that's one thing that you guys have stolen from APL. You know, that's a powerful thing, right? It's like, fantastic, I'd never want to do without Iverson brackets.

So I kind of tried to say, like, okay, well, imagine that every other glyph that you don't understand here has some rich thing like Iverson brackets you could now learn about. Okay, maybe I should give it a go. I'm not sure he has. But I think that's a good example for mathematicians, to show, here's one thing at least that found its way from APL.

That maybe gives you a sense that for a mathematician, there might be something in here. On that note, because I know we are potentially, well, we've gone way over, but this has been awesome. But a question I think might be a good one to end on is: do you have any advice for folks that want to learn something, whether it's Chinese, or an array language, or to get through your fast.ai course?

Because, like you said, you like to self-select for folks that are the curious types and that want to learn new things and new ways to solve things. But is there any way, other than just being tenacious, are there tips to, you know, approaching something with some angle? Because I think a lot of the folks maybe listening to this don't have that issue.

But I definitely know a ton of people that are the kind of folks that, you know, they'll join a study group, but then three weeks in they kind of lose interest, or they decide it's too much work or too difficult. As an educator, and you know, it seems like you operate in this space.

Do you have advice to tell folks? You know, I mean, so much, Connor. I actually kind of embedded it in my courses a lot. I can give you some quick summaries. But what I will say is, my friend Radek Osmulski, who's been taking my courses for like four years, has taken everything I've said, and his experience of those things, and turned it into a book.

So if you read... Osmulski's book is called Meta Learning; it's powerful mental models for deep learning, and this is learning as in learning deeply. So yeah, check out his book to get the full answer. I mean, there's just, gosh, there's a lot of things you can do to make learning easier.

You know, and a key thing I do in my courses is I always teach top down. So like often people with like, let's take deep learning and neural networks, they'll be like, okay, well, first, I'm going to have to learn linear algebra and calculus and blah, blah, blah. And, you know, four or five years later, they still haven't actually trained a neural network.

Our approach in our course is, in lesson one, the very first thing you do in the first 15 minutes is you train a neural network. And it is more like how we learn baseball or how we learn music. You know, it's like you say, okay, well, let's play baseball: come on, you stand there, you stand there, I'll throw this to you, you're going to hit it, you're going to run. You know, you don't start by learning the parabolic trajectory of a ball or the history of the game or whatever, you just start playing.

So that's, you know, you want to be playing. And if you're doing stuff from the start, that's fun and interesting and useful, then top down, doesn't mean it's shallow, you can then work from there to like, then understand like, what's each line of code doing? And then how is it doing it?

And then why is it doing it? And then what happens if we do it a different way? Until eventually, with our fast.ai program, you actually end up rewriting your own neural network library from scratch, which means you have to very deeply understand every single part of it.

And then we start reading research papers. And then we start learning about how to implement those research papers in the library we just wrote. So yeah, I'd say go top down, make it fun, make it applied. For things like APL or Chinese, where there's just stuff you have to remember, use Anki, use repetitive space learning.

You know, that's been around, Ebbinghaus came up with that, I don't know what, 250, 200 years ago, it works, you know, everybody, if you tell them something, will forget it in a week's time, everybody, you know, and so you shouldn't expect to read something and remember it. Because you're human, and humans don't do that.

So spaced repetition learning will quiz you on that thing tomorrow, and then in four days' time, and then in 14 days' time, and then in three weeks' time, and if you ever forget it, it will reset that schedule. And it'll make sure it's impossible to forget it, you know. So it's depressing to study things that then disappear.

And so it's important to recognize that unless you use Anki or SuperMemo or something like that, unless you use it every day, it will disappear. But if you do use spaced repetition learning, it's guaranteed not to. And I told this to my daughter a couple of years ago. I said, you know, what if I told you there was a way you can guarantee to never ever forget something you want to know?

It's just like, that's impossible. This is like some kind of magic. It's like, no, it's not magic. And like, I sat down and I drew out the Ebbinghaus forgetting curves and explained how it works. And I explained how, you know, if you get quizzed on it in these schedules, it flattens out.

And she was just like, well, what do you think? I want to use that. So she's been using Anki ever since. So maybe those are just two, let's just start with those two. Yeah, so going top down and using Anki, I think, could make your learning process much more fulfilling, because you'll be doing stuff with what you're learning and you'll be remembering it.
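A toy sketch of the kind of schedule Jeremy describes, in Python; this only illustrates the idea, and is not Anki's or SuperMemo's actual algorithm:

```python
def next_interval(previous_days: int, remembered: bool) -> int:
    """Days until the next review of a card, in the spirit of spaced repetition."""
    if not remembered:
        return 1                  # forgot: reset the schedule, see it again tomorrow
    if previous_days == 0:
        return 1                  # first review comes the next day
    return previous_days * 3      # remembered: the interval keeps stretching out

# A card that is remembered at every review:
days, schedule = 0, []
for _ in range(5):
    days = next_interval(days, remembered=True)
    schedule.append(days)
print(schedule)   # [1, 3, 9, 27, 81] -- and any lapse would reset it back to 1
```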

Well, that is awesome. And yeah, definitely we'll leave links to not just Anki and the book, Meta Learning, but everything that we've discussed throughout this conversation, because I think there's a ton of really, really awesome advice. And obviously to your fast.ai course and the library.

So if you'd like to hear more from Jeremy, there's a ton of resources online. Hopefully, it sounds like you're going to be, you know, building some learning materials over the next however many months or years. And so in the future, if you'd like to come back and update us on your journey with the array languages, that would be super fun for us, because I've thoroughly enjoyed this conversation.

And thank you so much for waking up early, all the way on the other side of the world from us, at least, in Australia. Thanks for having me. And yeah, I guess with that, we'll say happy array programming. Happy array programming.