fast.ai APL study session 11 (with Adám Brudzewsky)
Chapters
0:00 Intro
4:50 Describes rank.
5:28 Creates 2 layer, 3 row, 4 column array.
5:40 Definition of +/ is that it sums the rows
6:00 Summing over the last axis is always the rows.
6:20 ⌿ (slash bar) always sums over the first/leading axis
6:40 Limit to rank 2. Result is identical to the result without it
7:45 Rank 0 gives the unmodified array
8:05 Explanation: +⌿ treats it as a whole array
8:35 (+ ⌿ ⍤ 2)a
8:56 (+ ⌿ ⍤ 1)a ⍝ Equivalent to + / a
9:20 Using rank 1 with + ⌿ is same as removing the bar to be
9:40 Function to compute average
10:00 m ← 2 3 ⍴ ⍳6
10:40 Using ⌿ calculates correctly: one average per column
11:5 Average over the rows
11:35 Think of the bar (horizontal) as representing the rows
11:50 Horizontal, last axis reverse, transpose
12:20 3 4 = 3 5 ⍝ Implied rank 0
13:10 Rank operator ⍤ is entirely general purpose
14:00 Debugging trick. {… ⋄ …}
14:15 New statement symbol
15:25 Create our own trace operator
16:00 Use trick to create monadic version of trace: ⊢ identity
21:20 Can say operators have long left scope
23:00 Discussion about left-to-right and right-to-left
31:05 What’s proper APL? APL isn’t very opinionated.
32:00 Discussion about writing performant APL
32:55 Make functions leading axis oriented
33:10 Keep the code flat. Don’t use nested arrays.
35:00 Nested arrays aren’t contiguous.
35:10 Bottleneck is often memory throughput
36:10 Trick: use Boolean masks as much as you can
37:50 Use Booleans instead of indices
38:00 APL will squeeze arrays: Booleans stored as 1 bit
39:25 Summary of good APL principles
40:45 Each operator ¨
41:20 Don’t want loops, want array operations because of CPU support
49:50 Describing the definition of Each
52:55 Down arrow ↓ (drop)
53:44 Never really any reason to use down arrow (↓)
55:25 Can’t use Each ¨ on rows but you can use Rank
00:00:00.000 |
>> Hello. Hi, Adám is here. >> Yeah, I'm up late. I figured I should -- 00:00:26.200 |
>> I was going to say, it must be middle of the night. What time is it where you are? 00:00:37.720 |
it's nice to see you. Thanks for all your help on the forum. 00:00:40.520 |
We're going slowly, but we're making progress nonetheless. 00:00:52.520 |
Yeah. Well, I noticed you had some problems with the rank operator. I don't know if you're 00:00:57.560 |
interested in some help with that. >> Oh, absolutely. 00:01:01.560 |
>> Because you were -- I think it was in session 9, you were trying to apply the rank operator on 00:01:07.320 |
plus slash and couldn't make it work as you expected. And then in the previous lesson, 00:01:15.960 |
you were trying to use it on equal to make it compare like match. 00:01:22.680 |
>> Yeah. We did get it to work in the last lesson, which was nice, although I realized afterwards, 00:01:32.360 |
actually, there's a perfectly good outer-product version which works. But, yeah, would love help 00:01:41.160 |
with those things. What's the easiest thing to do? >> I can do that, but I can also just explain 00:01:49.160 |
something and you can try it out for yourself. >> I think a shared screen would be good. 00:01:54.120 |
Yeah. Why don't you share yours? >> I can share my screen, sure. 00:01:57.880 |
>> Well, I think also it's just good to watch somebody who knows what they're doing in APL for 00:02:02.200 |
a change. >> Okay. I don't usually use Jupyter Notebook. >> That's fine. Use whatever you like. 00:02:09.480 |
>> Everything is the same otherwise. Okay. Actually, let me -- I can do it here instead. 00:02:19.960 |
>> Well, waiting for Adam, I just mentioned we just -- for those who didn't notice, 00:02:23.880 |
we just released the course, course.fast.ai, so like literally 15 minutes ago. 00:02:31.640 |
>> And they're coming. >> First time in two years. 00:02:40.520 |
Do I need to give you permission or anything? No, it works. Okay, great. 00:02:46.520 |
>> Nice rabbit images, Jeremy. >> Yeah, thanks. I like the rabbits. 00:02:51.080 |
It's now that computers can draw for us, I have no excuse not to add artwork. 00:02:58.200 |
My wife's an art teacher, and I've been showing her a few of the capabilities. 00:03:21.560 |
I'm impressed, but I guess it blows my mind, but I don't think she really gets it. 00:03:31.560 |
It's like you see it in the movies all the time. >> Yes, some people didn't realize computers 00:03:43.560 |
couldn't do this before. >> You can see an empty blue screen, dark blue screen? 00:03:53.160 |
don't need any interface. >> Yeah, what is this? This is the -- 00:03:58.120 |
>> This is called -- no, this is right, but it's right in full screen mode with 00:04:04.280 |
toolbars and everything turned off so you can only see the APL, nothing else. 00:04:10.520 |
>> Nice. >> So the important thing to understand -- 00:04:16.680 |
>> And when you do write, you're using the normal back tick approach to entering symbols? 00:04:21.880 |
>> I personally don't. I use the right side alt key as a shifting key. 00:04:28.200 |
>> How do you set that up? Because I've always wanted that. 00:04:30.760 |
>> I have created an actual Microsoft type keyboard layout that you can just install, 00:04:38.600 |
and then you don't need any of that. >> Okay, can you put that in the chat or 00:04:42.200 |
something? >> I'll do that afterwards. No problem. 00:04:47.080 |
So the important thing to understand is that rank, it's like -- I don't know what you call it in Australia, 00:04:58.840 |
but some countries call it blinkers or blinders, the things you put on the head of a horse. 00:05:04.200 |
>> You put something on the head of the horse so it doesn't get distracted by things around, 00:05:14.360 |
so it can only -- it narrows down its vision. That's the only thing that rank can do. 00:05:20.120 |
And that's important to understand, and that's why it was not working for you, 00:05:25.240 |
not with plus slash, not with equal. So I'm creating an array here. This is a two layer, three row, 00:05:33.800 |
four column array. And the definition of plus slash is that it sums the rows. 00:05:45.640 |
So now when I sum the rows, you saw this result before. So the first result is one plus two plus 00:05:53.320 |
three plus four. That's ten. And then five, six, seven, eight. >> More generally, it's summing over 00:06:01.640 |
the last axis. >> It's summing the last axis, which is always the rows. In any array, the last 00:06:08.600 |
axis is the rows. >> Right. >> And in this array, there are six rows. 00:06:14.520 |
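The on-screen example can be sketched like this (the variable name `a` and the values ⍳24 are assumed from the sums 10, 26, 42 read out above):

```apl
a ← 2 3 4⍴⍳24    ⍝ 2 layers, 3 rows, 4 columns
+/a              ⍝ sum along the last axis (the rows); shape 2 3
⍝ 10 26 42
⍝ 58 74 90
```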
>> And slash bar always sums over the first axis. >> The leading axis, yes. The first axis. Now, 00:06:21.960 |
what I can do with the rank operator is to put blinders on this function. So right now, 00:06:32.120 |
it's receiving as its argument the entire array A. When I say rank two, the only thing that you 00:06:41.640 |
will ever see is arrays of rank two, even if the actual rank of the argument is higher. So what 00:06:48.280 |
rank will do is saying, oh, this function is only allowed to see arrays of rank two. Let me show it 00:06:53.720 |
this array of rank two. And when it's done processing that, let me show it this array. 00:06:59.000 |
And then I'll collect together its results from those two applications into a larger result. 00:07:05.560 |
Now, remember, plus slash sums the rows. So when it sees this row, an array, it sums this row, 00:07:13.880 |
and it sums this row, and it sums this row. Next time around, it sums this, and this, and this, 00:07:19.560 |
which means the result is going to be entirely identical to what we had before. 00:07:24.040 |
What's actually happening is that it's seeing less at a time, but that doesn't matter. The 00:07:33.080 |
result is the same. And if we do rank one, it is only allowed to ever see one row. 00:07:39.320 |
So it will sum this, and then rank will let us sum this, and so on, and we get the same result. 00:07:43.960 |
When you set the rank zero, then ranks knows, okay, this function is only ever allowed to see 00:07:52.040 |
arrays of rank zero, that is single elements. So it will tell it, okay, sum one, then sum two, 00:07:58.280 |
which is why that gives us our array unmodified. The difference with plus slash bar 00:08:09.400 |
is that it looks at the whole array and treats it 00:08:18.360 |
as a whole array. So it takes the entire first layer and adds it to the entire second layer 00:08:23.160 |
like that. Now, if you restrict the rank to rank two, it cannot see the entire array. The 00:08:33.880 |
only thing you can see is this matrix because rank is restricting its vision. So now it's 00:08:40.120 |
going to instead, the leading axis now is down here along the row. So it's going to add this row 00:08:46.760 |
to this row, to this row, giving a result with four elements like that, which is why we get 00:08:54.360 |
two times four elements. If I tell it rank one, then it will only ever be able to see 00:09:01.880 |
one row at a time. So it will sum each row and that's equivalent to plus slash. 00:09:07.240 |
So in general, when you have two symbols that are the same, but one has a bar on it, 00:09:14.440 |
then if you give the one with a bar, rank one, that's the same thing as removing the bar. 00:09:21.560 |
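A sketch of the rank applications just walked through, using the same assumed `a ← 2 3 4⍴⍳24`:

```apl
a ← 2 3 4⍴⍳24
+⌿a              ⍝ whole array: adds layer 1 to layer 2; shape 3 4
(+⌿⍤2)a          ⍝ +⌿ sees each 3 4 matrix: sums down the columns; shape 2 4
(+⌿⍤1)a          ⍝ +⌿ sees one row at a time
(+⌿⍤1)a ≡ +/a    ⍝ 1: rank 1 on the bar form is the same as removing the bar
```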
And so is that why like you often see more experienced APL programmers kind of 00:09:29.320 |
using the bar version kind of by default, because it's like more flexible. 00:09:33.960 |
Exactly. So now, if we try to use a function that computes the average, 00:09:42.840 |
let's do that again. And here's a function that computes the average. 00:09:52.200 |
I'm sorry, Tally -- you just learned Tally, something like that. Now, if I try to apply this 00:09:59.560 |
on a table instead on the matrix, so we can use this one. Then obviously the average in each row 00:10:13.400 |
should be two and five. Well, what's actually happening is that we get something that makes 00:10:19.480 |
no sense at all. Why is that? Because Tally counts how many major cells there are, 00:10:26.440 |
that is along the first axis. So it says there are two, but plus slash is summing along the rows. 00:10:33.160 |
So we are summing three numbers and dividing by two, that is not an average. 00:10:37.640 |
So experienced APLers will use slash bar instead. This sums along the leading axis, 00:10:44.440 |
and this counts along the leading axis, and this will give me one average per column. 00:10:49.640 |
So the average of one and four is two and a half, average of two and five is three and a half. 00:10:54.840 |
Yep. That's why experienced APLers will use first axis functions, because then they can always say, 00:11:03.560 |
okay, if I want the average over the rows, I'll restrict the view of this function to the rows 00:11:13.640 |
of this array. And now I get two and five. And I couldn't have done that if I had defined my 00:11:20.440 |
function. I couldn't, it wouldn't be as flexible if I had defined my function in terms of last axis, 00:11:25.560 |
only by defining it in terms of first axis, I'm able to narrow down this vision to any lower level. 00:11:31.880 |
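A minimal sketch of the leading-axis average and its rank-restricted use (assuming `m` holds 1 through 6, as the averages 2 and 5 suggest):

```apl
avg ← {(+⌿⍵)÷≢⍵}   ⍝ sum and Tally both act along the leading axis
m ← 2 3⍴⍳6
avg m              ⍝ one average per column: 2.5 3.5 4.5
(avg⍤1)m           ⍝ blinders down to rows: 2 5
```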
Is a way to think of that mnemonic with the slash bar is that like the bar is horizontal, 00:11:40.840 |
so it deals with rows? Is that like a... Yeah, that's how I think of it. At least it 00:11:45.880 |
goes down the rows instead of going along the columns. And you have the same thing with, 00:11:53.160 |
this is reverse, horizontal reverse, and this meaning last axis reverse, and this is 00:12:00.440 |
reversing it over the horizontal axis, or rotating over the horizontal axis. 00:12:05.640 |
Or flip, flipping? Yeah, either flipping or... 00:12:09.000 |
Horizontally, or flipping vertically. Yeah, exactly. So this shows the bar is transposed, 00:12:15.720 |
isn't it? Yes, the diagonal is transposed because it's flipping over the diagonal. 00:12:21.000 |
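The family of flips mentioned here, sketched on a small assumed matrix `m`, with the bar in each glyph showing the axis being flipped over:

```apl
m ← 2 3⍴⍳6
⌽m   ⍝ reverse along the last axis: each row reversed
⊖m   ⍝ horizontal bar: flip over the horizontal axis (reverses the rows)
⍉m   ⍝ diagonal bar: flip over the diagonal (transpose); shape 3 2
```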
So the problem we had with, you're trying to do this thing or something like that. 00:12:31.320 |
Or we can take some numbers that makes it a little bit easier to understand. 00:12:38.040 |
So, plus and all the arithmetic functions are so-called scalar functions. They actually have an 00:12:45.960 |
implied rank zero, always. Meaning they're always so narrow-minded that they only look at individual 00:12:55.720 |
elements and never consider the whole array. And there's nothing the rank operator can do 00:13:01.000 |
to change that. So as opposed to axis, which is an ad hoc syntax that actually 00:13:07.400 |
looks at what function did you give it and does something special for it, the rank operator is 00:13:12.120 |
entirely general purpose. It has no idea about which function it's applying. 00:13:17.880 |
I see. So equals bar could be used to behave like equals, but... 00:13:21.720 |
Exactly. So if we do this, now we're using match that only is allowed to look at scalars at a time. 00:13:29.640 |
So it looks at three and three. It looks at four and looks at five as pairs. 00:13:35.480 |
And so we can use that to match anything. So match is the more general purpose function 00:13:42.120 |
than equal, but equal is a very common construct. So that we have a separate function for that. 00:13:48.680 |
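The comparison just described, sketched out:

```apl
3 4 ≡ 3 5        ⍝ match compares the whole arrays: 0
3 4 (≡⍤0) 3 5    ⍝ match restricted to scalars behaves like equal: 1 0
3 4 = 3 5        ⍝ 1 0 (implied rank 0; rank can't change that)
```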
And a good way to get a view of things in general, if you don't know what's going on in an expression, 00:13:55.880 |
I think you did see this phrase once. It prints things. So here's a really 00:14:03.800 |
useful debugging trick. If I have a function, I don't know exactly what's going on. 00:14:08.120 |
I'll wrap it in a division and I'll put in alpha and omega. And this is a new statement. 00:14:16.040 |
I don't know if you've seen that. And then I'll apply the function. So let's say, for example, 00:14:22.760 |
that I'm using equal. So the function will return the same result as the primitive function, 00:14:30.040 |
but it's wrapped in such a way that it prints the arguments first. 00:14:34.360 |
So now if we say three, four, and three, five, we can see that the two -- I should probably turn boxing on. 00:14:44.520 |
We don't actually need the max style. But oops, that didn't. Oh, of course. 00:14:54.520 |
I have to specify that even when functions are printing, I want it to be 00:15:00.040 |
boxed. So what equal is seeing is three, four, and also three, five. But it does its thing 00:15:07.960 |
element by element. Now, if I apply the rank operator, it will print twice. It'll be called 00:15:15.720 |
three, three, and four, five. So this is a good way to see it. And you have already gone so advanced 00:15:21.960 |
that you created your own operators. So we can actually create a trace operator. 00:15:28.040 |
And it will print the arguments. And then it will apply the function to the arguments. 00:15:38.840 |
So now we can say three, four, equal TC, three, five. It prints that. 00:15:46.520 |
And we can do it with rank zero. It prints it like this. 00:15:56.920 |
We can even make it fancier and put labels inside and saying alpha is this, and omega is this, 00:16:01.480 |
and so on. You do need a different version for a monadic, though, because there would be a value 00:16:08.040 |
error on the alpha. So there's a trick that you haven't learned yet, which is to type this. 00:16:14.840 |
There's a special syntax in the defin. It means-- 00:16:19.080 |
Yeah, I've seen that. It means apply. It means make that-- if you haven't passed an alpha, 00:16:23.640 |
then that's default. It's just default left argument. So the default left argument-- this 00:16:26.920 |
is a funny default left argument, because the default left argument is a function, 00:16:31.240 |
which otherwise you can't pass in. But it's a function which is a no-op. It's a identity 00:16:37.400 |
function. And so here, if alpha is an identity function, then we just print omega. And over here, 00:16:44.600 |
if alpha is an identity function, we apply alpha-alpha monadically. And then we apply 00:16:49.640 |
identity function to it, which doesn't do anything. So this is a general purpose TC. 00:16:56.440 |
Sorry, just to clarify, Adam. Is that setting alpha-- it's not setting alpha to the identity 00:17:05.240 |
function, right? It's setting alpha to the result of the identity function. 00:17:13.720 |
Because there's no argument to it. So it's just a function assignment. It's a tacit function. 00:17:17.640 |
Oh, yeah, yeah, yeah. Yeah, OK, sorry. That makes perfect sense. 00:17:24.360 |
OK, so we don't have to change that. So now, if we look at what plus slash is seeing, 00:17:31.880 |
and that was our problem from the beginning. So here, we can see that it's seeing the entire array. 00:17:38.120 |
It's a little bit hard to read, but here's our result. And this is the printout of the arguments. 00:17:44.200 |
We can improve TC a little bit and say alpha here. 00:17:50.840 |
Oh, yes, of course, that's not going to work. And alpha omega. 00:18:12.760 |
This will work. OK, so there's only an omega in this case. 00:18:24.600 |
And it's seeing the entire array, and it's summing its rows. 00:18:30.440 |
And when we try to apply this rank 2, then we see it twice. But it's, again, being applied to this. 00:18:37.880 |
It's exactly like applying plus slash to this array, summing its rows. 00:18:42.680 |
And so on. So this is a really useful operator. You can modify it to your heart's content. 00:18:50.440 |
You can make it do whatever you want. And to effect things, you can write a timestamp 00:18:54.440 |
when it happened and so on. So I think, I mean, I can take questions, but I think it should be 00:18:59.080 |
more clear now what the rank operator does. It's just blinders. That's all it does. 00:19:08.920 |
And this is what happened, then, when we compared these. 00:19:20.760 |
Then you did the outer product like this. The outer product is comparing all the 00:19:31.800 |
elements on one side to all the elements on the other side, all the different combinations. 00:19:36.920 |
We can write this just with equal and rank. Yeah, that's what we did yesterday. 00:19:41.000 |
Yeah. So again, this is every element, zero on the left. 00:19:49.640 |
Rank zero on the left gets compared to every element on the right. 00:19:53.400 |
So I know you did this, but there's a point here. 00:19:57.720 |
So it gives us the same result. But what is equal actually seeing? 00:20:03.240 |
Outer product applies between all combinations of left and right elements. 00:20:07.720 |
That's not what's happening here. It's what you thought was happening here. 00:20:11.800 |
Because if you put TC in, you'll see that it's only being called with a scalar on the left 00:20:19.880 |
and a vector on the right, because that's exactly what we asked for. Scalar on the left, 00:20:25.000 |
vector on the right. If we instead do dot equal trace like this, 00:20:34.440 |
you can see that it's being called individually on every pair. 00:20:39.000 |
So the full way of behaving like the outer product is to say, I want equal to only ever 00:20:53.480 |
see arguments of rank zero on one side and zero on the other side. And you're allowed 00:20:58.840 |
to omit one if they're the same. And that function, which only looks at rank zero things, 00:21:05.480 |
should be applied between scalars on the left and vectors on the right. So I'm using rank twice. 00:21:12.840 |
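The double-rank construction being described, next to the outer product (a sketch; the spacing and names on screen may have differed):

```apl
1 2 3 ∘.= 1 3 2          ⍝ outer product: every pair of elements compared
1 2 3 (=⍤0 1) 1 3 2      ⍝ = is called with a scalar on the left, the vector on the right
1 2 3 ((=⍤0)⍤0 1) 1 3 2  ⍝ the inner ⍤0 means = itself only ever sees scalar pairs
```

All three give the same 3-by-3 Boolean matrix; the difference is only in what the function experiences at each call, which the trace operator makes visible.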
Oh, wow. Hang on. Okay. I want you to understand. 00:21:15.160 |
So everything banks for the left. Okay, we read left to right when we do 00:21:19.320 |
operators. Yeah. Or you could say the operators have a long left scope. So this operator 00:21:24.760 |
takes the entire thing here, the whole function phrase on the left. 00:23:28.920 |
So this is saying this function can only ever see scalars. And with that, apply it between scalars 00:21:36.760 |
on the left and vectors on the right. Now, I don't get it. So if you're saying to apply it 00:21:42.760 |
to vectors on the right, but you previously said it only, you can never apply the scalars. 00:21:47.240 |
What does that? Yeah. So it's not, remember, um, rank is not modifying the function. 00:21:52.280 |
Rank is calling the function one or more times as necessary, such that the function will have a 00:22:00.040 |
restricted view. Yes. So this function over here will be called with left arguments as scalars and 00:22:07.880 |
right arguments as vectors. We can see that by putting in TC. Oh, right. However, this function 00:22:15.800 |
itself is the derived function. It is not a normal equal. It's an equal. It's, it's a function that 00:22:22.840 |
uses equal, but only ever lets equal experience a scalar on the left and the scalar on the right. 00:22:28.920 |
And how do we know that? Well, we can look at with TC. So now we can see that this equal is being 00:22:35.320 |
called like that one element at a time on the left and on the right. We could also put in a double 00:22:39.640 |
TC, but it will be a very verbose and also when, yeah. So, so we can see first to go up all the 00:22:48.440 |
way here. So we say that the outer TC reported, I'm calling my operand with a scalar and a vector 00:22:54.840 |
and the inner TC, the left one saying, I'm seeing, I'm calling my operand with a scalar and a scalar. 00:23:04.120 |
So actually in some ways, it makes sense to think about that composition: the right jot 00:23:17.400 |
diaeresis is taking the whole left-hand function. So the kind of the implied loop is that left-hand 00:23:33.080 |
side is kind of the inner of the applied loop in a sense. Yes. And this is the governor. 00:23:37.320 |
Or do you read it right to left until you hit the operator and then you jump to the far left and 00:23:45.400 |
read that? I mean, the parsing goes left to right, but the point is that the right-hand 00:23:51.000 |
diaeresis has the entire left-hand derived function as the thing that it's applying that rank to. 00:23:59.240 |
I find it a little bit dangerous to speak about APL in terms of right to left, 00:24:03.000 |
left to right. It's kind of like a scaffold for letting people know how simple function 00:24:10.040 |
application works in APL, but it doesn't really apply when you have the full APL syntax, including 00:24:15.880 |
operators and stranding and so on. Really the way you should think of it is in terms of binding 00:24:21.080 |
strength. What binds stronger? And then that operators bind stronger to their neighboring 00:24:29.400 |
tokens than functions do. And then when you have equal binding strength, then operators go from 00:24:38.040 |
the left stronger. So they have a long left scope, and an operator's left operand will be as far to the 00:24:45.560 |
left as it can possibly reach without switching type. It can only take either a function or an 00:24:51.400 |
array. So when parsing this, we can look at this as, okay, this operator, what does it take as its 00:24:59.560 |
left operand? Well, here we have an operator, a monadic operator. It can't be just that because 00:25:04.920 |
it can't take a monadic operator as operand. So we have to keep going left. Maybe this is the 00:25:08.680 |
operand of TC. Oh, further left. No, there's a dyadic operator. It's going to grab the zero 00:25:14.680 |
from TC and it takes its left operand. Oh, no, another left, another operator on the left. 00:25:23.400 |
So it has to have an operand. Keep going here. And there's a parenthesis stop. We can't go any 00:25:27.240 |
further. So we stop here. Or you could look the other way around: this equal. Is it being applied 00:25:35.240 |
now? Nope. It's being grabbed by an operator on the right. Is this being applied? Nope. It's 00:25:41.000 |
being grabbed by an operator on its right. Okay. So here's the right operand. Are 00:25:47.320 |
we ready to apply? Nope. There's an operator on the right grabbing me. Is this ready? No, there's 00:25:52.600 |
another operator. Okay. And then finally, the right operand and the parenthesis. We can't go any 00:25:58.520 |
further. So it doesn't matter which way you go. As long as you know the binding strength rules, 00:26:03.240 |
you just go one token ahead and see are we done yet? And if the binding rules say no, 00:26:08.840 |
we're not done yet, then you keep going. And related to this, Adam, I found it very 00:26:15.960 |
insightful listening to you on an ArrayCast episode talking about why you tend to avoid parentheses, 00:26:25.800 |
which is not because you're trying to type less characters, but because it's a similar idea that 00:26:31.240 |
you were saying, there's less to keep in your head at once if you can just work in the natural 00:26:39.080 |
direction and only have to keep one thing in your head at a time. Right. And it doesn't really 00:26:44.680 |
matter. You can read APL right to left or left to right. It's just a matter of reading it. So 00:26:49.960 |
the way I would read this from left to right, and actually I would avoid this parenthesis, 00:26:56.920 |
I do need to separate this array from this array, but I can do that with an identity function 00:27:02.840 |
because this operand here has to stop here. We're switching to a new token here. It can't grab further. 00:27:10.280 |
Okay. So that identity function. So what's it called again? Right something, 00:27:15.320 |
right? I mean, the official name is Same, but right tack is the symbol. Yeah. And so that 00:27:22.920 |
function in a dyadic context returns its right-hand side and in a monadic context returns, 00:27:30.440 |
well, it returns its right-hand side. It's right-hand side. 00:27:32.440 |
I like to call it Right because it returns whatever's on the right. Right. And so that 00:27:39.400 |
function's not doing anything except as I parse that now, I basically can see that I've got a 00:27:46.440 |
function being applied to an array and therefore I'm, yeah, that's a unit of stuff that APL can then. 00:27:55.480 |
So you can read this from, so normally I would read APL, at least these expressions are short enough, 00:28:02.680 |
from left to right. Interesting. Because it's executed from right to left, 00:28:07.160 |
we can read it from left to right. And I will make a crazy claim here that English is written 00:28:14.360 |
and read from left to right. And it executes from right to left. I'll come back to that. 00:28:19.320 |
So this is ABCD equal on scalars, on scalars and vectors, to ABDC, BCA. 00:28:42.760 |
Yeah, I know what you mean. It's like when you see like, you know, three 00:28:47.960 |
divide, tilde diaeresis, something, you can start reading it as like three divided into, 00:28:59.160 |
and then you can start, you know, you can see that expression, divides. 00:29:02.360 |
Just make it three, three divides five. English is executed from right to left. 00:29:10.760 |
Go drive the big red. You still have no idea what I'm saying. Bus. 00:29:17.720 |
Okay, so first you have to evaluate bus, right? Then you have to make it red. Then you have to 00:29:23.640 |
make it big. Then you have to talk about the concept of driving it. Then you have to go do that. 00:29:27.800 |
Go drive the big red bus. That's insightful. Yeah. 00:29:37.000 |
In fact, normal function syntax in other programming languages is also from right to 00:29:42.360 |
left, even though everybody thinks it's from left to right. Because if I write f of g of h of x, 00:29:51.480 |
you have to evaluate x first. And before you evaluate h, before you evaluate g, 00:29:56.280 |
before you actually write it and read it from left to right. 00:29:59.560 |
Well, a lot of people nowadays are moving towards the syntax where you kind of... 00:30:05.800 |
The x dot x. Well, or maybe it's been functional, 00:30:09.080 |
kind of a right arrow kind of glyph or something. Yeah, pipe type thing. Yeah. 00:30:16.840 |
But anyway, enough about that. So I hope this clarifies matters a bit. 00:30:24.200 |
Now, I've used up half of your time. I'm sorry. 00:30:30.520 |
No, I'm thrilled. Anybody have any questions about anything? 00:30:37.320 |
Both watch the previous ones and then join in. This is great. 00:30:42.440 |
I guess I have a more general question, Adam, which is, do you have any thoughts about... 00:30:49.080 |
I mean, I want us to finish all the glyphs, right? Which hopefully won't take too much longer. 00:30:55.400 |
But when we do, I think the next step will be to learn 00:31:01.240 |
to write APL properly and also understand why, like what proper is proper. 00:31:08.360 |
So things like this use bar version of glyphs because they're more flexible thing is like a 00:31:20.760 |
pretty key insight. Is there like good videos or books or anything like that for getting these 00:31:27.240 |
kinds of insights? The art of APL. APL style. 00:31:35.800 |
I don't... There are some tips and tricks. In general, APL isn't very 00:31:44.680 |
opinionated about how you should write things. In fact, I think dialogue is kind of proud of 00:31:53.720 |
language being a multi-paradigm language. You can write in a functional style. You don't have to. 00:32:01.480 |
You can write in the object or in the style if you want to, but you don't have to. You can write 00:32:05.720 |
test it or you can write non-test it whichever way you want. However, if you want good performance, 00:32:14.040 |
for example, then there are some things you should stick to. If you want more flexibility, 00:32:19.320 |
so your functions are generally more applicable, then there are some things you can stick to. 00:32:24.040 |
I would say for what we're doing, more flexibility is probably what we're aiming for because I think 00:32:32.680 |
like this study group are kind of positioned to just like learning about a flexible and 00:32:39.560 |
expressive notation which might help us to think about problems that we're solving. 00:32:45.880 |
There's not enough, I think, to write in order to make a paper of it. It's like a couple of lines of 00:32:51.400 |
tips like this. Make your functions leading axis oriented so that they're more flexible. You can 00:33:01.640 |
apply them. You can always make them later axis oriented by using the rank operator and 00:33:09.640 |
keep your code flat. Don't do arrays of arrays. Flat meaning without parentheses? 00:33:16.600 |
No. The algorithms should use arrays that are not nested. We can have these arrays of arrays. 00:33:26.120 |
You haven't used a whole lot of them, but the opposite is called simple arrays or flat arrays. 00:33:36.600 |
I've heard some people call it, they're more sympathetic to the hardware. 00:33:39.880 |
The computer is really, really good at arrays because it's actually... 00:33:45.800 |
Just to clarify, if I remember correctly, J doesn't exactly let you have arrays of arrays. 00:33:53.960 |
You have to explicitly box them. The difference is very little. It's almost... 00:33:59.560 |
It's more focused on arrays of arrays, if I remember. 00:34:05.480 |
Yeah, K doesn't allow you multidimensional arrays. It only allows lists of lists of lists 00:34:10.600 |
and there's no other way. There are some choices they made in the design in order to allow that 00:34:17.800 |
because it doesn't allow multidimensional arrays. 00:34:21.240 |
Yeah. In PyTorch and such things, we think about these issues a lot because 00:34:33.000 |
it really kills you on the GPU. If you're doing something across anything other than 00:34:41.240 |
the trailing axis, it'll still work, but it'll be doing a non-linear stride. 00:34:50.520 |
But it's not just a stride. You have a stride if your array is actually represented flat in memory. 00:34:59.480 |
Sure, but if you have nested arrays, it's not contiguous at all. It's not even a stride. 00:35:05.080 |
And that is going to kill performance. And not only performance, but actually in today's computers 00:35:13.160 |
are so fast that the bottleneck is often memory throughput. The RAM cannot feed problems to the 00:35:21.960 |
processor fast enough. The processor is just sitting there waiting for the RAM to deliver more work. 00:35:27.240 |
So this is actually a very current issue in the deep learning world because as of a year or two 00:35:33.160 |
ago, a lot of papers were written that would write about the flops that their algorithm would 00:35:39.800 |
require. And nobody, not nobody, but a lot of people writing these papers hadn't quite noticed 00:35:45.800 |
that there was very little correlation between flops and time because of the memory issue. 00:35:51.560 |
Now PyTorch doesn't let you have tensors of tensors, so it's less of a problem. But yeah, it does turn 00:35:59.080 |
out that memory is probably the more important issue in deep learning algorithms at the moment. 00:36:08.840 |
So here's one more trick to use in APL at least. Use Boolean masks as much as you can. And that is 00:36:18.440 |
because again, the RAM is the issue. That's the bottleneck. So in other words, instead of conditionals. 00:36:26.360 |
Well, not just instead of conditionals, but instead of integers, if you can, 00:36:31.000 |
instead of using indices and things, then you should use a mask for the whole thing. 00:36:35.160 |
The reason for that is... and store data as Boolean instead of... So I just want to make sure we're 00:36:41.880 |
on the same wavelength. So what you're saying is, instead of having an array that says, 00:36:46.040 |
get indices two, three, and five, you would have a mask array of zeros in which items two, 00:36:52.040 |
three, and five have a one in that location. Exactly. And then let's say, for example, 00:36:58.280 |
you need to combine two conditions. And you know that elements one, two, and five abide by these 00:37:08.360 |
conditions. And then by one condition, you have another condition for which elements four and five 00:37:16.360 |
hold the condition. So you could do the intersection of the two sets. Or by multiplying them? 00:37:21.560 |
To get the indices. Well, they're just numbers, right? They're just the intersection. So you do 00:37:27.400 |
the intersection of them as sets. And then those are the ones where the condition holds for both. 00:37:32.680 |
And then you could index things. However, if you had them as Boolean masks instead, 00:37:36.600 |
so it would be, whatever, zero, one, zero, one, something like that, for one of them, and so on. 00:37:42.520 |
Then you can just do an AND, the Boolean AND. And that gives you a new mask, 00:37:47.240 |
and doing a Boolean AND operation on binary data in the processor is enormously faster than 00:37:54.840 |
doing a set intersection. That's what I meant about multiply. So you could, okay. Yes. Exactly. 00:38:00.760 |
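A hedged sketch of the index-versus-mask comparison being described (Dyalog APL; the element numbers follow the spoken example):

```apl
m1 ← 1 1 0 0 1     ⍝ condition 1 holds for elements 1, 2 and 5
m2 ← 0 0 0 1 1     ⍝ condition 2 holds for elements 4 and 5
m1∧m2              ⍝ 0 0 0 0 1: Boolean AND, one cheap vector operation
⍸m1∧m2             ⍝ 5: Where recovers indices if you really need them
1 2 5∩4 5          ⍝ 5: the set-intersection route, far slower at scale
```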
And then there's another benefit to this: APL will aggressively squeeze arrays, 00:38:08.040 |
and Boolean arrays are stored as one-bit Booleans. Oh, really? That means that you can store eight 00:38:16.440 |
elements in a single byte. Wait, how does that work? Because it's not typed, per se. So 00:38:23.080 |
it would just notice that the highest is one and the lowest is zero. And if I then try to store a 00:38:28.280 |
two, it would have to reallocate the whole thing or something? Yes. Yes. And then, but that means 00:38:35.000 |
since the processor is waiting for the data and we're able to switch to an eighth of the data size, 00:38:41.320 |
that means that the transfer time, which is the important time is going to be an eighth. 00:38:46.760 |
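In Dyalog specifically, you can observe the squeeze with ⎕DR (data representation). This is a small sketch of my own, not from the session:

```apl
⎕DR 0 1 1 0 1    ⍝ 11: stored as a 1-bit Boolean array
⎕DR 0 1 2 0 1    ⍝ 83: a single 2 forces re-allocation to 8-bit integers
```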
And that gives you enormous speed-ups. And so we have all these very clever algorithms built 00:38:53.640 |
into the interpreter, algorithms that are difficult to develop. It can take decades to 00:38:59.480 |
write the C code for that. And they can give you a speed-up like that. So 00:39:06.200 |
basically, by using APL that is optimized like this, you are employing clever C programmers 00:39:15.400 |
that have been working for you for years to fine tune your program way before you even started 00:39:21.960 |
writing your program. So these are... I don't think I can think of better 00:39:31.320 |
principles than that. Okay. I mean, that's very good. You use Boolean 00:39:40.680 |
masks, keep your arrays flat. And what was the first thing I said before? Work on the 00:39:52.280 |
leading axis; keep things leading-axis oriented. So, as for general programming principles, 00:40:03.240 |
you don't do global state changes and sort of global variables, really bad idea. So one thing 00:40:11.320 |
we, yeah, it's another thing I think we're pretty familiar as a community with more general software 00:40:16.680 |
engineering principles. One thing that surprised me when we were learning about each was that it 00:40:23.000 |
didn't operate over kind of major cells, but instead it operated over sub arrays. And I guess 00:40:31.080 |
that what now that we know about rank, we can just use rank for anything where we want to go over 00:40:37.560 |
major cells, which means maybe each is not so useful anymore. Each is actually really, 00:40:43.320 |
really simple. I can, I can show you if you want, I can explain what is happening with each. 00:40:48.840 |
So, okay. And like I said, just in general, each is a thing that 00:40:58.600 |
you would use and, you know, use it occasionally, but it depends what it is I'm doing. I'll try to 00:41:05.720 |
avoid it as much as possible. And among APL programmers, because it's an explicit loop 00:41:11.880 |
and that means the interpreter has no choice but to loop. And we don't want loops. We want to do 00:41:17.480 |
array operations because then the processors now have array instructions. They can do it. 00:41:25.880 |
And rank operator doesn't create explicit loops? 00:41:28.840 |
Rank operator conceptually loops, but internally if it can avoid it, it will not loop. 00:41:33.240 |
So it knows about a lot of things. Exactly. Exactly. 00:41:37.160 |
So that would be good to have a little section in the notebook there, Jeremy, where we might 00:41:45.800 |
like say this is an explicit loop and this is the less explicit way to do it. 00:41:53.960 |
Well, yeah, I mean, it's a different notebook, I think, Ben, like, you know, we've got to think 00:41:57.480 |
about how to present all this. But I think, you know, the theory of this first 00:42:00.760 |
notebook or set of notebooks is literally, you know, a dictionary of APL glyphs in an order 00:42:08.440 |
where you never get a definition in terms of something you haven't learned yet, 00:42:12.440 |
you know, and then there's something later about like, okay, what do you do with it? 00:42:17.640 |
Cool, yeah. So if you use the rank operator to loop, then you might keep the performance 00:42:26.280 |
because it doesn't actually loop. It uses fancy instructions for that instead. Each doesn't have 00:42:31.960 |
much of a choice, although occasionally the interpreter is clever enough. If you try to 00:42:35.720 |
use plus each, it will not actually loop, because it knows how to short-circuit that and just do it 00:42:42.040 |
directly. But what's happening with each is... you think of the matrix, you want to loop over each 00:42:52.040 |
row, but really what each is, is very, very simple. So if you have F each, that's the same thing 00:43:00.120 |
as, you know, now that you've learned enough of these compositional operators, 00:43:05.800 |
you learned atop. And it's the enclose, which... I don't remember. 00:43:09.480 |
We haven't done enclose, but you can quickly say what it is. 00:43:12.120 |
Yeah, it's basically just wrapping an array up as a single element, as a scalar. 00:43:16.680 |
So it's adding a leading axis. Oh, it's not adding a leading axis. 00:43:19.720 |
Not adding an axis; actually creating a scalar. It's creating a pointer to it. You can think of it 00:43:24.680 |
like that. What type is that? Is that some new type we haven't learned yet? Like, it's literally 00:43:29.160 |
an enclosed item. Well, it's not a numeric scalar, and it's not a simple scalar. 00:43:39.000 |
It's a pointer-type reference? But it's not a reference. No. Okay. 00:43:44.920 |
Because APL is passed by value. And so it will do, it will not keep connections between things 00:43:51.080 |
that you assign across, but internally, it's actually a pointer. And that's pretty much how 00:43:55.400 |
you can think of it, but you need some enclosure. It's a scalar. And so what it is, is enclose, 00:44:04.440 |
atop F, over disclose. And disclose is exactly the opposite: it means follow the pointer, 00:44:13.800 |
go get the one element and open it up. Rank zero. 00:44:19.000 |
I'm just trying to remember. So, the jot diaeresis. 00:44:30.520 |
But if there's only one argument, remember, then it's the same thing as atop. 00:44:35.400 |
So that's why it's useful to have it do that. So this means: actually pre-process all arguments. 00:44:40.920 |
Whether there's one or two, we just pre-process them with disclose. So we open up a box. 00:44:45.960 |
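The definition being built up here can be written out as follows. This is my reconstruction (assuming Dyalog 18.0+, which has ⍥ Over); it spells each as enclose, atop the function, over disclose, at rank zero:

```apl
v ← 'abc' 'def'
⌽¨v               ⍝ cba  fed
(⊂⍤⌽⍥⊃⍤0) v       ⍝ cba  fed: disclose each scalar, reverse, enclose again
```

Monadically ⍥ behaves like atop, which is why one composition covers both the one- and two-argument cases.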
Hang on a tick. You've got... oh, no, you don't have a fork. These are operators, not functions. 00:44:52.360 |
Okay. So this is, so here you've got function, operator, function, operator, function, operator, 00:44:59.720 |
array. Okay. So this whole thing is monadic because there's a thing only on the right. 00:45:04.520 |
And then, wait, what, what are you saying? This is not, this whole thing is one giant function. 00:45:12.680 |
The function is ambivalent, as we call it. It's both monadic and dyadic. 00:45:17.400 |
Oh, how is this a function? I thought the zero on the right hand... Oh, no, the zero is the 00:45:23.320 |
right-hand side of the operator. Yeah. Yeah. So what this is saying is: on every 00:45:29.000 |
scalar element, and loop as much as necessary to address scalar elements. Okay. This is fine. 00:45:34.600 |
On both sides, monadic and dyadic. This is rank. Oh, okay. Yes, of course, that's rank. So this says: 00:45:41.720 |
on scalars. So, all right. The first thing we do is dig all the way down to the 00:45:47.640 |
scalars. That means there's nothing you can do to each to make it apply to rows. It's already 00:45:55.720 |
impossible. It's like equals or plus: it's already down at the scalar level, on the scalars. 00:46:01.880 |
Yeah. Sorry. So it never sees the row. Yeah. It never sees a row. It never sees any, 00:46:08.280 |
F will never see anything that comes from more than a single element. So on the single elements. 00:46:13.000 |
Yeah. And remember, this zero actually means zero, zero, zero, because what it means is 00:46:19.160 |
it's rank zero monadically, rank zero on the left, and rank zero on the right. So it's 00:46:23.560 |
always rank zero. Then, on those elements, open them up if they are nested. If they're not nested, 00:46:31.720 |
like just a simple array, then nothing happens. Apply F and package the results back into the 00:46:39.320 |
box. Oh yeah. Because, um, when you've got multiple composition, it goes right to left. 00:46:48.920 |
This... yeah, you could... the binding goes like that. So we're saying: this 00:46:54.200 |
is F, post-processed by enclosing the result, and pre-processed by disclosing the arguments. 00:47:02.840 |
Oh, okay. So, you have actually used nested arrays, with an implied 00:47:09.320 |
enclose, already. And what we can say, as an example: ABC, DEF. This is a nested array. 00:47:18.520 |
This is exactly the same thing as the enclose of ABC concatenated with the enclose of DEF. 00:47:26.520 |
The stranding syntax is just a nicety. It's syntactic sugar. It means this. Okay. 00:47:33.080 |
It just enclose all the elements that are being stranded together. 00:47:36.760 |
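That equivalence is easy to check directly (a sketch of my own, assuming Dyalog APL):

```apl
'abc' 'def' ≡ (⊂'abc'),(⊂'def')   ⍝ 1: stranding is sugar for catenating encloses
```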
So, conceptually, these are... well, no, they really are scalars, 00:47:44.040 |
because every vector consists of scalars, and every matrix consists of rows that have scalars 00:47:49.320 |
in them, elements in them. And maybe we should turn boxing up to max now. 00:47:55.000 |
Why is that not just an array of arrays where the first array is ABC and the second array is DF? 00:48:03.880 |
Well, it is. Okay. So why do you need that enclose idea? 00:48:07.000 |
Because if I didn't have enclose here, then we would just be concatenating together. 00:48:14.200 |
Yeah. So we need to say each, each three element vector lives in its own little scalar. 00:48:21.320 |
And so these are individual scalars. If we look at the, at the shape of this, 00:48:28.040 |
it's the empty vector. It's a scalar. Yep. If you look at the depth of it, it's depth two. 00:48:33.640 |
There's an outer array, which is a scalar and there's an inner array, which is a vector. 00:48:38.440 |
So there are two levels. Yep. And so this is what we have. This is exactly the same thing as ABC, DEF. 00:48:48.440 |
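The shape and depth observations above, sketched out (Dyalog APL, my own illustration):

```apl
n ← 'abc' 'def'
⍴n          ⍝ 2: a two-element vector
⍴⊂'abc'     ⍝ (empty): an enclosed array is a scalar
≡n          ⍝ 2: depth two, an outer vector holding enclosed vectors
```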
And now it's easier to understand when I do reverse each on these, what was actually happening here is 00:49:01.160 |
I started off by applying... so we can make our little... we can write this: TC each. 00:49:10.200 |
So we can see that TC is first seeing ABC, then seeing DEF. Ah, well done. So if TC is seeing ABC, 00:49:19.720 |
then firstly, well, if reverse is seeing ABC, that means it was only applied to the first element. 00:49:29.800 |
That's the rank zero, but it wasn't, it didn't see the enclosure. 00:49:33.560 |
So we have disclosed it. We've opened it up. Hmm. And what did that? 00:49:40.760 |
Well, because remember the definition of each. So in this case, it's the enclosing, 00:49:49.400 |
oops, enclosing atop the disclosing, rank zero. Oh, this is what each means. That's the definition 00:50:03.400 |
of each. If we didn't do the disclosing, if you just write reverse rank zero, 00:50:13.400 |
then we are reversing each scalar, but reversing a scalar doesn't do anything. 00:50:17.320 |
Hmm. If we only pre-processed it by opening it up, then we would get a matrix, because 00:50:28.120 |
the result from this reverse is a vector, and two vectors in an array make a matrix. 00:50:34.120 |
So to stick them back into the boxes where they came from, we post process 00:50:42.520 |
the result of reverse, and this is a definition of each. And that's why you cannot use a function 00:50:49.080 |
that has an each after it to access entire rows of a matrix. It's just not possible. 00:50:54.840 |
However, if we take three four reshape iota twelve, and we want to reverse each row. 00:51:07.560 |
Yeah, now you can use rank. Firstly, this function anyway is rank one. 00:51:14.200 |
Remember, if this function is the leading-axis one, then the corresponding last-axis 00:51:22.280 |
function is the same thing, rank one. So this reverses the first axis, and this reverses 00:51:32.360 |
the last axis; it flips it horizontally. And if we use the first-axis reverse with rank one, 00:51:41.400 |
then we're flipping the rows because there's only one axis in them. We're only ever seeing 00:51:52.840 |
one of these. So how could we use reverse each to reverse the rows? Or for that sake, 00:52:03.800 |
reverse first each? Well, if we know that each will open up these boxes and close them down again. 00:52:12.200 |
So if we give it boxes that it can apply on, then it will work. So if I enclose, 00:52:21.160 |
rank one, so enclose, remember, puts a box around things. If I just enclose the array, 00:52:26.600 |
we get a multiply-enclosed one; we can keep making beautiful patterns. If I enclose rank one, 00:52:33.640 |
then I took each row and made it into a scalar. That means we have a collection 00:52:40.200 |
of three scalars; that's called a vector, a nested vector. Now I can reverse each. 00:52:47.480 |
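A sketch of the boxing trick just described (Dyalog APL, my own illustration):

```apl
m ← 3 4⍴⍳12
⊂⍤1⊢m         ⍝ a 3-element nested vector: one box per row
⌽¨⊂⍤1⊢m       ⍝ reverse each boxed row; the rows are still boxed
```

The ⊢ keeps the operand 1 from stranding with m, so the rank operator sees just the 1.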
Isn't there an arrow that does the same thing? Yes, for matrices, there is a down arrow, 00:52:53.880 |
but it's not really necessary. It does exactly the same thing on matrices, but it doesn't do 00:52:59.240 |
the same thing on higher-rank arrays, so this is more general. In a sense, that itself is a 00:53:04.360 |
last-axis function, whereas enclose is a leading-axis function. It works on everything. And you can 00:53:11.880 |
restrict it to be on the lower rank. Oh, I see. So you can think of down arrow and enclose as 00:53:17.720 |
the leading and trailing-axis versions of the same thing. Yeah, you could, yeah. 00:53:22.920 |
Maybe they don't look similar. They don't look similar, but that's this because... 00:53:30.120 |
An enclose with a bar would look like epsilon or something. 00:53:33.560 |
Well, it would look like this, but that means something else. Oh, okay. Yeah, you haven't 00:53:39.880 |
learned that one yet. You'll see; I won't spoil it at all. But there's never any reason, 00:53:45.480 |
really, to use a down arrow. It's just confusing on higher rank arrays. It's much easier conceptually 00:53:49.160 |
to understand that this puts things into boxes, and it gets restricted to only see rows. So it 00:53:54.200 |
puts the rows into boxes. And then we can disclose... The problem is we need to disclose each of these. 00:54:04.600 |
So we need to disclose, rank zero, the elements of the vector. This vector has three elements, 00:54:14.840 |
and I want to open up each one of them, because it's confined in a box. Then we get our matrix back. 00:54:19.560 |
So this is entirely possible. But notice, here I'm enclosing only so I can apply each, 00:54:28.840 |
so that I can, again, disclose. Well, those are the inverse operations of what each actually implies, 00:54:36.200 |
because this each is actually over-disclose at rank zero, and then enclose atop that. 00:54:51.480 |
So these negate each other. This makes scalars that are enclosed, and this opens scalars that 00:55:00.840 |
are enclosed. And this encloses the results, and this discloses the results. So these two cancel 00:55:08.600 |
each other out, and these two cancel each other out. And the only thing we have left is this rank 00:55:15.400 |
one, which it already is, and we're right back where we started. So it always cancels like that. You come full circle. 00:55:20.840 |
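The cancellation argument can be checked directly (my sketch, assuming Dyalog APL):

```apl
m ← 3 4⍴⍳12
⊃⍤0⌽¨⊂⍤1⊢m            ⍝ box rows, reverse each, unbox again
(⌽⍤1⊢m) ≡ ⊃⍤0⌽¨⊂⍤1⊢m  ⍝ 1: the enclose/disclose pairs cancel, leaving ⌽⍤1
```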
That's great. This is why you cannot use each on rows, but you can use rank on rows. 00:55:29.720 |
And the interpreter is clever enough that if you write reverse rank one, it won't loop. 00:55:45.080 |
It will understand that it needs to reverse the rows, and it will do that 00:55:48.920 |
as fast as it can do that, with vector instructions. 00:55:54.840 |
I don't know if it can actually speed anything up here, but it will try. 00:56:03.400 |
Wow, it's nice to learn APL from somebody who understands it. 00:56:12.120 |
Thanks, Adám. We should let you get to sleep. And that's our hour, so that's actually fantastic. 00:56:19.960 |
That's awesome. I feel a little bit better about hijacking your whole thing. 00:56:25.080 |
We're happy to have you hijack all the whole things. It's great. Thank you. 00:56:29.880 |
Yeah, no, it's great that you're spending the time to watch them all. It was great that you joined. 00:56:34.760 |
This was really helpful for me. I enjoy also seeing your explorations, and it 00:56:43.400 |
gives me some feedback on where we can improve our documentation. 00:56:48.120 |
Must be a bit cringy, though, to see us being like, "Oh, what are we doing?" 00:56:51.800 |
Just press that button, Jeremy, for God's sake. 00:56:55.560 |
There's been a couple of times where I kind of wish I was there. 00:57:01.720 |
I'll let you explore. Almost always, you figure it out eventually. Somebody jumps in and says, 00:57:08.920 |
"Hey, try this." But what I was worried about a little bit here is you seem to be going down 00:57:15.000 |
a wrong conceptual path with regards to rank. It seemed like you were thinking that rank actually 00:57:22.840 |
modifies the function, just like the bracket axis modifies the function. That isn't the case. 00:57:32.520 |
It's nice to know that if we go too far off the deep end, you'll come tell us. 00:57:38.120 |
I'll have sleepless nights if you stray too far from the right path. 00:57:45.640 |
If we see you join the call at the beginning, we're like, "Oh, we had a problem last time." 00:57:53.320 |
Otherwise, feel free to ask me questions. I'll respond on the forums.