fast.ai APL study session 11 (with Adám Brudzewsky)
Chapters
0:00 Intro
4:50 Describes rank.
5:28 Creates 2 layer, 3 row, 4 column array.
5:40 Definition of +/ is that it sums the rows
6:00 Summing over the last axis is always the rows.
6:20 ⌿ (slash bar) always sums over the first/leading axis
6:40 Limit to rank 2. Result is identical to the result without it
7:45 Rank 0 gives the unmodified array
8:05 Explanation: +⌿ treats it as a whole array
8:35 (+ ⌿ ⍤ 2)a
8:56 (+ ⌿ ⍤ 1)a ⍝ Equivalent to + / a
9:20 Using rank 1 with + ⌿ is same as removing the bar to be
9:40 Function to compute average
10:00 m ← 2 3 ⍴ ⍳6
10:40 Using ⌿ calculates correctly: one average per column
11:5 Average over the rows
11:35 Think of the bar (horizontal) as representing the rows
11:50 Horizontal, last axis reverse, transpose
12:20 3 4 = 3 5 ⍝ Implied rank 0
13:10 Rank operator ⍤ is entirely general purpose
14:00 Debugging trick. {… ⋄ …}
14:15 New statement symbol
15:25 Create our own trace operator
16:00 Use trick to create monadic version of trace: ⊢ identity
21:20 Can say operators have long left scope
23:00 Discussion about left-to-right and right-to-left
31:05 What’s proper APL? APL isn’t very opinionated.
32:00 Discussion about writing performant APL
32:55 Make functions leading axis oriented
33:10 Keep the code flat. Don’t use nested arrays.
35:00 Nested arrays aren’t contiguous.
35:10 Bottleneck is often memory throughput
36:10 Trick: use Boolean masks as much as you can
37:50 Use Booleans instead of indices
38:00 APL will squeeze arrays: Booleans stored as 1 bit
39:25 Summary of good APL principles
40:45 Each operator ¨
41:20 Don’t want loops, want array operations because of CPU support
49:50 Describing the definition of Each
52:55 Down arrow ↓ (drop)
53:44 Never really any reason to use down arrow (↓)
55:25 Can’t use Each ¨ on rows but you can use Rank
00:00:00.000 |
>> Hello. Hi, Adám is here. >> Yeah, I'm up late. I figured I should -- 00:00:26.200 |
>> I was going to say, it must be middle of the night. What time is it where you are? 00:00:37.720 |
it's nice to see you. Thanks for all your help on the forum. 00:00:40.520 |
We're going slowly, but we're making progress nonetheless. 00:00:52.520 |
Yeah. Well, I noticed you had some problems with the rank operator. I don't know if you're 00:00:57.560 |
interested in some help with that. >> Oh, absolutely. 00:01:01.560 |
>> Because you were -- I think it was in session 9, you were trying to apply the rank operator on 00:01:07.320 |
plus slash and couldn't make it work as you expected. And then in the previous lesson, 00:01:15.960 |
you were trying to use it on equal to make it compare like match. 00:01:22.680 |
>> Yeah. We did get it to work in the last lesson, which was nice, although I realized afterwards, 00:01:32.360 |
actually, there's a perfectly good outer-product version which works. But, yeah, would love help 00:01:41.160 |
with those things. What's the easiest thing to do? >> I can do that, but I can also just explain 00:01:49.160 |
something and you can try it out for yourself. >> I think a shared screen would be good. 00:01:54.120 |
Yeah. Why don't you share yours? >> I can share my screen, sure. 00:01:57.880 |
>> Well, I think also it's just good to watch somebody who knows what they're doing in APL for 00:02:02.200 |
a change. >> Okay. I don't usually use Jupyter Notebook. >> That's fine. Use whatever you like. 00:02:09.480 |
>> Everything is the same otherwise. Okay. Actually, let me -- I can do it here instead. 00:02:19.960 |
>> Well, waiting for Adam, I just mentioned we just -- for those who didn't notice, 00:02:23.880 |
we just released the course, course.fast.ai, so like literally 15 minutes ago. 00:02:31.640 |
>> And they're coming. >> First time in two years. 00:02:40.520 |
Do I need to give you permission or anything? No, it works. Okay, great. 00:02:46.520 |
>> Nice rabbit images, Jeremy. >> Yeah, thanks. I like the rabbits. 00:02:51.080 |
It's now that computers can draw for us, I have no excuse not to add artwork. 00:02:58.200 |
My wife's an art teacher, and I've been showing her a few of the capabilities. 00:03:21.560 |
I'm impressed, but I guess it blows my mind, but I don't think she really gets it. 00:03:31.560 |
It's like you see it in the movies all the time. >> Yes, some people didn't realize computers 00:03:43.560 |
couldn't do this before. >> You can see an empty blue screen, dark blue screen? 00:03:53.160 |
don't need any interface. >> Yeah, what is this? This is the -- 00:03:58.120 |
>> This is called -- no, this is right, but it's right in full screen mode with 00:04:04.280 |
toolbars and everything turned off so you can only see the APL, nothing else. 00:04:10.520 |
>> Nice. >> So the important thing to understand -- 00:04:16.680 |
>> And when you do write, you're using the normal back tick approach to entering symbols? 00:04:21.880 |
>> I personally don't. I use the right side alt key as a shifting key. 00:04:28.200 |
>> How do you set that up? Because I've always wanted that. 00:04:30.760 |
>> I have created an actual Microsoft type keyboard layout that you can just install, 00:04:38.600 |
and then you don't need any of that. >> Okay, can you put that in the chat or 00:04:42.200 |
something? >> I'll do that afterwards. No problem. 00:04:47.080 |
So the important thing to understand is that rank, it's like -- I don't know what you call it in Australia, 00:04:58.840 |
but some countries call it blinkers or blinders, the things you put on the head of a horse. 00:05:04.200 |
>> You put something on the head of the horse so it doesn't get distracted by things around, 00:05:14.360 |
so it can only -- it narrows down its vision. That's the only thing that rank can do. 00:05:20.120 |
And that's important to understand, and that's why it was not working for you, 00:05:25.240 |
not with plus slash, not with equal. So I'm creating an array here. This is a two layer, three row, 00:05:33.800 |
four column array. And the definition of plus slash is that it sums the rows. 00:05:45.640 |
So now when I sum the rows, you saw this result before. So the first result is one plus two plus 00:05:53.320 |
three plus four. That's ten. And then five, six, seven, eight. >> More generally, it's summing over 00:06:01.640 |
the last axis. >> It's summing the last axis, which is always the rows. In any array, the last 00:06:08.600 |
axis is the rows. >> Right. >> And in this array, there are six rows. 00:06:14.520 |
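The on-screen example can be sketched like this (the variable name `a` and the values ⍳24 are assumed from the sums 10, 26, 42 read out above):

```apl
a ← 2 3 4⍴⍳24    ⍝ 2 layers, 3 rows, 4 columns
+/a              ⍝ sum along the last axis (the rows); shape 2 3
⍝ 10 26 42
⍝ 58 74 90
```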
>> And slash bar always sums over the first axis. >> The leading axis, yes. The first axis. Now, 00:06:21.960 |
what I can do with the rank operator is to put blinders on this function. So right now, 00:06:32.120 |
it's receiving as its argument the entire array A. When I say rank two, the only thing that you 00:06:41.640 |
will ever see is arrays of rank two, even if the actual rank of the argument is higher. So what 00:06:48.280 |
rank will do is saying, oh, this function is only allowed to see arrays of rank two. Let me show it 00:06:53.720 |
this array of rank two. And when it's done processing that, let me show it this array. 00:06:59.000 |
And then I'll collect together its results from those two applications into a larger result. 00:07:05.560 |
Now, remember, plus slash sums the rows. So when it sees this row, an array, it sums this row, 00:07:13.880 |
and it sums this row, and it sums this row. Next time around, it sums this, and this, and this, 00:07:19.560 |
which means the result is going to be entirely identical to what we had before. 00:07:24.040 |
What's actually happening is that it's seeing less at a time, but that doesn't matter. The 00:07:33.080 |
result is the same. And if we do rank one, it is only allowed to ever see one row. 00:07:39.320 |
So it will sum this, and then rank will let us sum this, and so on, and we get the same result. 00:07:43.960 |
When you set the rank zero, then ranks knows, okay, this function is only ever allowed to see 00:07:52.040 |
arrays of rank zero, that is single elements. So it will tell it, okay, sum one, then sum two, 00:07:58.280 |
which is why that gives us our array unmodified. The difference with plus slash bar 00:08:09.400 |
is that it looks at the whole array and treats it 00:08:18.360 |
as a whole array. So it takes the entire first layer and adds it to the entire second layer 00:08:23.160 |
like that. Now, if you restrict the rank to rank two, it cannot see the entire array. The 00:08:33.880 |
only thing you can see is this matrix because rank is restricting its vision. So now it's 00:08:40.120 |
going to instead, the leading axis now is down here along the row. So it's going to add this row 00:08:46.760 |
to this row, to this row, giving a result with four elements like that, which is why we get 00:08:54.360 |
two times four elements. If I tell it rank one, then it will only ever be able to see 00:09:01.880 |
one row at a time. So it will sum each row and that's equivalent to plus slash. 00:09:07.240 |
So in general, when you have two symbols that are the same, but one has a bar on it, 00:09:14.440 |
then if you give the one with a bar, rank one, that's the same thing as removing the bar. 00:09:21.560 |
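A sketch of the rank applications just walked through, using the same assumed `a ← 2 3 4⍴⍳24`:

```apl
a ← 2 3 4⍴⍳24
+⌿a              ⍝ whole array: adds layer 1 to layer 2; shape 3 4
(+⌿⍤2)a          ⍝ +⌿ sees each 3 4 matrix: sums down the columns; shape 2 4
(+⌿⍤1)a          ⍝ +⌿ sees one row at a time
(+⌿⍤1)a ≡ +/a    ⍝ 1: rank 1 on the bar form is the same as removing the bar
```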
And so is that why like you often see more experienced APL programmers kind of 00:09:29.320 |
using the bar version kind of by default, because it's like more flexible. 00:09:33.960 |
Exactly. So now, if we try to use a function that computes the average, 00:09:42.840 |
let's do that again. And here's a function that computes the average. 00:09:52.200 |
I'm sorry, Tally -- you just learned Tally, something like that. Now, if I try to apply this 00:09:59.560 |
on a table instead on the matrix, so we can use this one. Then obviously the average in each row 00:10:13.400 |
should be two and five. Well, what's actually happening is that we get something that makes 00:10:19.480 |
no sense at all. Why is that? Because Tally counts how many major cells there are, 00:10:26.440 |
that is along the first axis. So it says there are two, but plus slash is summing along the rows. 00:10:33.160 |
So we are summing three numbers and dividing by two, that is not an average. 00:10:37.640 |
So experienced APLers will use slash bar instead. This sums along the leading axis, 00:10:44.440 |
and this counts along the leading axis, and this will give me one average per column. 00:10:49.640 |
So the average of one and four is two and a half, average of two and five is three and a half. 00:10:54.840 |
Yep. That's why experienced APLers will use first axis functions, because then they can always say, 00:11:03.560 |
okay, if I want the average over the rows, I'll restrict the view of this function to the rows 00:11:13.640 |
of this array. And now I get two and five. And I couldn't have done that if I had defined my 00:11:20.440 |
function. I couldn't, it wouldn't be as flexible if I had defined my function in terms of last axis, 00:11:25.560 |
only by defining it in terms of first axis, I'm able to narrow down this vision to any lower level. 00:11:31.880 |
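A minimal sketch of the leading-axis average and its rank-restricted use (assuming `m` holds 1 through 6, as the averages 2 and 5 suggest):

```apl
avg ← {(+⌿⍵)÷≢⍵}   ⍝ sum and Tally both act along the leading axis
m ← 2 3⍴⍳6
avg m              ⍝ one average per column: 2.5 3.5 4.5
(avg⍤1)m           ⍝ blinders down to rows: 2 5
```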
Is a way to think of that mnemonic with the slash bar is that like the bar is horizontal, 00:11:40.840 |
so it deals with rows? Is that like a... Yeah, that's how I think of it. At least it 00:11:45.880 |
goes down the rows instead of going along the columns. And you have the same thing with, 00:11:53.160 |
this is reverse, horizontal reverse, and this meaning last axis reverse, and this is 00:12:00.440 |
reversing it over the horizontal axis, or rotating over the horizontal axis. 00:12:05.640 |
Or flip, flipping? Yeah, either flipping or... 00:12:09.000 |
Horizontally, or flipping vertically. Yeah, exactly. So this shows the bar is transposed, 00:12:15.720 |
isn't it? Yes, the diagonal is transposed because it's flipping over the diagonal. 00:12:21.000 |
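The family of flips mentioned here, sketched on a small assumed matrix `m`, with the bar in each glyph showing the axis being flipped over:

```apl
m ← 2 3⍴⍳6
⌽m   ⍝ reverse along the last axis: each row reversed
⊖m   ⍝ horizontal bar: flip over the horizontal axis (reverses the rows)
⍉m   ⍝ diagonal bar: flip over the diagonal (transpose); shape 3 2
```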
So the problem we had with, you're trying to do this thing or something like that. 00:12:31.320 |
Or we can take some numbers that makes it a little bit easier to understand. 00:12:38.040 |
So, plus and all the arithmetic functions are so-called scalar functions. They actually have an 00:12:45.960 |
implied rank zero, always. Meaning they're always so narrow-minded that they only look at individual 00:12:55.720 |
elements and never consider the whole array. And there's nothing the rank operator can do 00:13:01.000 |
to change that. So as opposed to axis, which is an ad hoc syntax that actually 00:13:07.400 |
looks at what function did you give it and does something special for it, the rank operator is 00:13:12.120 |
entirely general purpose. It has no idea about which function it's applying. 00:13:17.880 |
I see. So equals bar could be used to behave like equals, but... 00:13:21.720 |
Exactly. So if we do this, now we're using match that only is allowed to look at scalars at a time. 00:13:29.640 |
So it looks at three and three. It looks at four and looks at five as pairs. 00:13:35.480 |
And so we can use that to match anything. So match is the more general purpose function 00:13:42.120 |
than equal, but equal is a very common construct. So that we have a separate function for that. 00:13:48.680 |
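The comparison just described, sketched out:

```apl
3 4 ≡ 3 5        ⍝ match compares the whole arrays: 0
3 4 (≡⍤0) 3 5    ⍝ match restricted to scalars behaves like equal: 1 0
3 4 = 3 5        ⍝ 1 0 (implied rank 0; rank can't change that)
```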
And a good way to get a view of things in general, if you don't know what's going on in an expression, 00:13:55.880 |
I think you did see this phrase once. It prints things. So here's a really 00:14:03.800 |
useful debugging trick. If I have a function, I don't know exactly what's going on. 00:14:08.120 |
I'll wrap it in a division and I'll put in alpha and omega. And this is a new statement. 00:14:16.040 |
I don't know if you've seen that. And then I'll apply the function. So let's say, for example, 00:14:22.760 |
that I'm using equal. So the function will return the same result as the primitive function, 00:14:30.040 |
but it's wrapped in such a way that it prints the arguments first. 00:14:34.360 |
So now if we say three, four, and three, five, we can see that the two -- I should probably turn boxing on. 00:14:44.520 |
We don't actually need the max style. But oops, that didn't. Oh, of course. 00:14:54.520 |
I have to specify that even when functions are printing, I want it to be 00:15:00.040 |
boxed. So what equal is seeing is three, four, and also three, five. But it does its thing 00:15:07.960 |
element by element. Now, if I apply the rank operator, it will print twice. It'll be called 00:15:15.720 |
three, three, and four, five. So this is a good way to see it. And you have already gone so advanced 00:15:21.960 |
that you created your own operators. So we can actually create a trace operator. 00:15:28.040 |
And it will print the arguments. And then it will apply the function to the arguments. 00:15:38.840 |
So now we can say three, four, equal TC, three, five. It prints that. 00:15:46.520 |
And we can do it with rank zero. It prints it like this. 00:15:56.920 |
We can even make it fancier and put labels inside and saying alpha is this, and omega is this, 00:16:01.480 |
and so on. You do need a different version for a monadic, though, because there would be a value 00:16:08.040 |
error on the alpha. So there's a trick that you haven't learned yet, which is to type this. 00:16:14.840 |
There's a special syntax in the defin. It means-- 00:16:19.080 |
Yeah, I've seen that. It means apply. It means make that-- if you haven't passed an alpha, 00:16:23.640 |
then that's default. It's just default left argument. So the default left argument-- this 00:16:26.920 |
is a funny default left argument, because the default left argument is a function, 00:16:31.240 |
which otherwise you can't pass in. But it's a function which is a no-op. It's a identity 00:16:37.400 |
function. And so here, if alpha is an identity function, then we just print omega. And over here, 00:16:44.600 |
if alpha is an identity function, we apply alpha-alpha monadically. And then we apply 00:16:49.640 |
identity function to it, which doesn't do anything. So this is a general purpose TC. 00:16:56.440 |
Sorry, just to clarify, Adam. Is that setting alpha-- it's not setting alpha to the identity 00:17:05.240 |
function, right? It's setting alpha to the result of the identity function. 00:17:13.720 |
Because there's no argument to it. So it's just a function assignment. It's a tacit function. 00:17:17.640 |
Oh, yeah, yeah, yeah. Yeah, OK, sorry. That makes perfect sense. 00:17:24.360 |
OK, so we don't have to change that. So now, if we look at what plus slash is seeing, 00:17:31.880 |
and that was our problem from the beginning. So here, we can see that it's seeing the entire array. 00:17:38.120 |
It's a little bit hard to read, but here's our result. And this is the printout of the arguments. 00:17:44.200 |
We can improve TC a little bit and say alpha here. 00:17:50.840 |
Oh, yes, of course, that's not going to work. And alpha omega. 00:18:12.760 |
This will work. OK, so there's only an omega in this case. 00:18:24.600 |
And it's seeing the entire array, and it's summing its rows. 00:18:30.440 |
And when we try to apply this rank 2, then we see it twice. But it's, again, being applied to this. 00:18:37.880 |
It's exactly like applying plus slash to this array, summing its rows. 00:18:42.680 |
And so on. So this is a really useful operator. You can modify it to your heart's content. 00:18:50.440 |
You can make it do whatever you want. And to effect things, you can write a timestamp 00:18:54.440 |
when it happened and so on. So I think, I mean, I can take questions, but I think it should be 00:18:59.080 |
more clear now what the rank operator does. It's just blinders. That's all it does. 00:19:08.920 |
And this is what happened, then, when we compared these. 00:19:20.760 |
Then you did the outer product like this. The outer product is comparing all the 00:19:31.800 |
elements on one side to all the elements on the other side, all the different combinations. 00:19:36.920 |
We can write this just with equal and rank. Yeah, that's what we did yesterday. 00:19:41.000 |
Yeah. So again, this is every element, zero on the left. 00:19:49.640 |
Rank zero on the left gets compared to every element on the right. 00:19:53.400 |
So I know you did this, but there's a point here. 00:19:57.720 |
So it gives us the same result. But what is equal actually seeing? 00:20:03.240 |
Outer product applies between all combinations of left and right elements. 00:20:07.720 |
That's not what's happening here. It's what you thought was happening here. 00:20:11.800 |
Because if you put TC in, you'll see that it's only being called with a scalar on the left 00:20:19.880 |
and a vector on the right, because that's exactly what we asked for. Scalar on the left, 00:20:25.000 |
vector on the right. If we instead do dot equal trace like this, 00:20:34.440 |
you can see that it's being called individually on every pair. 00:20:39.000 |
So the full way of behaving like the outer product is to say, I want equal to only ever 00:20:53.480 |
see arguments of rank zero on one side and zero on the other side. And you're allowed 00:20:58.840 |
to omit one if they're the same. And that function, which only looks at rank zero things, 00:21:05.480 |
should be applied between scalars on the left and vectors on the right. So I'm using rank twice. 00:21:12.840 |
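The double-rank construction being described, next to the outer product (a sketch; the spacing and names on screen may have differed):

```apl
1 2 3 ∘.= 1 3 2          ⍝ outer product: every pair of elements compared
1 2 3 (=⍤0 1) 1 3 2      ⍝ = is called with a scalar on the left, the vector on the right
1 2 3 ((=⍤0)⍤0 1) 1 3 2  ⍝ the inner ⍤0 means = itself only ever sees scalar pairs
```

All three give the same 3-by-3 Boolean matrix; the difference is only in what the function experiences at each call, which the trace operator makes visible.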
Oh, wow. Hang on. Okay. I want you to understand. 00:21:15.160 |
So everything banks for the left. Okay, we read left to right when we do 00:21:19.320 |
operators. Yeah. Or you could say the operators have a long left scope. So this operator 00:21:24.760 |
takes the entire thing here, the whole function phrase on the left. 00:23:28.920 |
So this is saying this function can only ever see scalars. And with that, apply it between scalars 00:21:36.760 |
on the left and vectors on the right. Now, I don't get it. So if you're saying to apply it 00:21:42.760 |
to vectors on the right, but you previously said it only, you can never apply the scalars. 00:21:47.240 |
What does that? Yeah. So it's not, remember, um, rank is not modifying the function. 00:21:52.280 |
Rank is calling the function one or more times as necessary, such that the function will have a 00:22:00.040 |
restricted view. Yes. So this function over here will be called with left arguments as scalars and 00:22:07.880 |
right arguments as vectors. We can see that by putting in TC. Oh, right. However, this function 00:22:15.800 |
itself is the derived function. It is not a normal equal. It's an equal. It's, it's a function that 00:22:22.840 |
uses equal, but only ever lets equal experience a scalar on the left and the scalar on the right. 00:22:28.920 |
And how do we know that? Well, we can look at with TC. So now we can see that this equal is being 00:22:35.320 |
called like that one element at a time on the left and on the right. We could also put in a double 00:22:39.640 |
TC, but it will be a very verbose and also when, yeah. So, so we can see first to go up all the 00:22:48.440 |
way here. So we say that the outer TC reported, I'm calling my operand with a scalar and a vector 00:22:54.840 |
and the inner TC, the left one saying, I'm seeing, I'm calling my operand with a scalar and a scalar. 00:23:04.120 |
So actually in some ways, it makes sense to think about that composition: the right jot 00:23:17.400 |
diaeresis is taking the whole left-hand function. So the kind of the implied loop is that left-hand 00:23:33.080 |
side is kind of the inner of the applied loop in a sense. Yes. And this is the governor. 00:23:37.320 |
Or do you read it right to left until you hit the operator and then you jump to the far left and 00:23:45.400 |
read that? I mean, the parsing goes left to right, but the point is that the right-hand 00:23:51.000 |
diaeresis has the entire left-hand derived function as the thing that it's applying that rank to. 00:23:59.240 |
I find it a little bit dangerous to speak about APL in terms of right to left, 00:24:03.000 |
left to right. It's kind of like a scaffold for letting people know how simple function 00:24:10.040 |
application works in APL, but it doesn't really apply when you have the full APL syntax, including 00:24:15.880 |
operators and stranding and so on. Really the way you should think of it is in terms of binding 00:24:21.080 |
strength. What binds stronger? And then that operators bind stronger to their neighboring 00:24:29.400 |
tokens than functions do. And then when you have equal binding strength, then operators go from 00:24:38.040 |
the left stronger. So they have a long left scope, and an operator's left operand will be as far to the 00:24:45.560 |
left as it can possibly reach without switching type. It can only take either a function or an 00:24:51.400 |
array. So when parsing this, we can look at this as, okay, this operator, what does it take as its 00:24:59.560 |
left operand? Well, here we have an operator, a monadic operator. It can't be just that because 00:25:04.920 |
it can't take a monadic operator as operand. So we have to keep going left. Maybe this is the 00:25:08.680 |
operand of TC. Oh, further left. No, there's a dyadic operator. It's going to grab the zero 00:25:14.680 |
from TC and it takes its left operand. Oh, no, another left, another operator on the left. 00:25:23.400 |
So it has to have an operand. Keep going here. And there's a parenthesis stop. We can't go any 00:25:27.240 |
further. So we stop here. Or you could look the other way around: this equal. Is it being applied 00:25:35.240 |
now? Nope. It's being grabbed by an operator on the right. Is this being applied? Nope. It's 00:25:41.000 |
being grabbed by an operator on its right. Okay. So here's the right operand. Are 00:25:47.320 |
we ready to apply? Nope. There's an operator on the right grabbing me. Is this ready? No, there's 00:25:52.600 |
another operator. Okay. And then finally, the right operand and the parenthesis. We can't go any 00:25:58.520 |
further. So it doesn't matter which way you go. As long as you know the binding strength rules, 00:26:03.240 |
you just go one token ahead and see are we done yet? And if the binding rules say no, 00:26:08.840 |
we're not done yet, then you keep going. And related to this, Adam, I found it very 00:26:15.960 |
insightful listening to you on an ArrayCast episode talking about why you tend to avoid parentheses, 00:26:25.800 |
which is not because you're trying to type less characters, but because it's a similar idea that 00:26:31.240 |
you were saying, there's less to keep in your head at once if you can just work in the natural 00:26:39.080 |
direction and only have to keep one thing in your head at a time. Right. And it doesn't really 00:26:44.680 |
matter. You can read APL right to left or left to right. It's just a matter of reading it. So 00:26:49.960 |
the way I would read this from left to right, and actually I would avoid this parenthesis, 00:26:56.920 |
I do need to separate this array from this array, but I can do that with an identity function 00:27:02.840 |
because this operand here has to stop here. We're switching to a new token here. It can't grab further. 00:27:10.280 |
Okay. So that identity function. So what's it called again? Right something, 00:27:15.320 |
right? I mean, the official name is Same, but right tack is the symbol. Yeah. And so that 00:27:22.920 |
function in a dyadic context returns its right-hand side and in a monadic context returns, 00:27:30.440 |
well, it returns its right-hand side. It's right-hand side. 00:27:32.440 |
I like to call it Right because it returns whatever's on the right. Right. And so that 00:27:39.400 |
function's not doing anything except as I parse that now, I basically can see that I've got a 00:27:46.440 |
function being applied to an array and therefore I'm, yeah, that's a unit of stuff that APL can then. 00:27:55.480 |
So you can read this from, so normally I would read APL, at least these expressions are short enough, 00:28:02.680 |
from left to right. Interesting. Because it's executed from right to left, 00:28:07.160 |
we can read it from left to right. And I will make a crazy claim here that English is written 00:28:14.360 |
and read from left to right. And it executes from right to left. I'll come back to that. 00:28:19.320 |
So this is ABCD equal on scalars, on scalars and vectors, to ABDC, BCA. 00:28:42.760 |
Yeah, I know what you mean. It's like when you see like, you know, three 00:28:47.960 |
divide, tilde diaeresis, something, you can start reading it as like three divided into, 00:28:59.160 |
and then you can start, you know, you can see that expression, divides. 00:29:02.360 |
Just make it three, three divides five. English is executed from right to left. 00:29:10.760 |
Go drive the big red. You still have no idea what I'm saying. Bus. 00:29:17.720 |
Okay, so first you have to evaluate bus, right? Then you have to make it red. Then you have to 00:29:23.640 |
make it big. Then you have to talk about the concept of driving it. Then you have to go do that. 00:29:27.800 |
Go drive the big red bus. That's insightful. Yeah. 00:29:37.000 |
In fact, normal function syntax in other programming languages is also from right to 00:29:42.360 |
left, even though everybody thinks it's from left to right. Because if I write f of g of h of x, 00:29:51.480 |
you have to evaluate x first. And before you evaluate h, before you evaluate g, 00:29:56.280 |
before you actually write it and read it from left to right. 00:29:59.560 |
Well, a lot of people nowadays are moving towards the syntax where you kind of... 00:30:05.800 |
The x dot x. Well, or maybe it's been functional, 00:30:09.080 |
kind of a right arrow kind of glyph or something. Yeah, pipe type thing. Yeah. 00:30:16.840 |
But anyway, enough about that. So I hope this clarifies matters a bit. 00:30:24.200 |
Now, I've used up half of your time. I'm sorry. 00:30:30.520 |
No, I'm thrilled. Anybody have any questions about anything? 00:30:37.320 |
Both watch the previous ones and then join in. This is great. 00:30:42.440 |
I guess I have a more general question, Adam, which is, do you have any thoughts about... 00:30:49.080 |
I mean, I want us to finish all the glyphs, right? Which hopefully won't take too much longer. 00:30:55.400 |
But when we do, I think the next step will be to learn 00:31:01.240 |
to write APL properly and also understand why, like what proper is proper. 00:31:08.360 |
So things like this use bar version of glyphs because they're more flexible thing is like a 00:31:20.760 |
pretty key insight. Is there like good videos or books or anything like that for getting these 00:31:27.240 |
kinds of insights? The art of APL. APL style. 00:31:35.800 |
I don't... There are some tips and tricks. In general, APL isn't very 00:31:44.680 |
opinionated about how you should write things. In fact, I think dialogue is kind of proud of 00:31:53.720 |
language being a multi-paradigm language. You can write in a functional style. You don't have to. 00:32:01.480 |
You can write in the object or in the style if you want to, but you don't have to. You can write 00:32:05.720 |
test it or you can write non-test it whichever way you want. However, if you want good performance, 00:32:14.040 |
for example, then there are some things you should stick to. If you want more flexibility, 00:32:19.320 |
so your functions are generally more applicable, then there are some things you can stick to. 00:32:24.040 |
I would say for what we're doing, more flexibility is probably what we're aiming for because I think 00:32:32.680 |
like this study group are kind of positioned to just like learning about a flexible and 00:32:39.560 |
expressive notation which might help us to think about problems that we're solving. 00:32:45.880 |
There's not enough, I think, to write in order to make a paper of it. It's like a couple of lines of 00:32:51.400 |
tips like this. Make your functions leading axis oriented so that they're more flexible. You can 00:33:01.640 |
apply them. You can always make them later axis oriented by using the rank operator and 00:33:09.640 |
keep your code flat. Don't do arrays of arrays. Flat meaning without parentheses? 00:33:16.600 |
No. The algorithms should use arrays that are not nested. We can have these arrays of arrays. 00:33:26.120 |
You haven't used a whole lot of them, but the opposite is called simple arrays or flat arrays. 00:33:36.600 |
I've heard some people call it, they're more sympathetic to the hardware. 00:33:39.880 |
The computer is really, really good at arrays because it's actually... 00:33:45.800 |
Just to clarify, if I remember correctly, J doesn't exactly let you have arrays of arrays. 00:33:53.960 |
You have to explicitly box them. The difference is very little. It's almost... 00:33:59.560 |
It's more focused on arrays of arrays, if I remember. 00:34:05.480 |
Yeah, K doesn't allow you multidimensional arrays. It only allows lists of lists of lists 00:34:10.600 |
and there's no other way. There are some choices they made in the design in order to allow that 00:34:17.800 |
because it doesn't allow multidimensional arrays. 00:34:21.240 |
Yeah. In PyTorch and such things, we think about these issues a lot because 00:34:33.000 |
it really kills you on the GPU. If you're doing something across anything other than 00:34:41.240 |
the trailing axis, it'll still work, but it'll be doing a non-linear stride. 00:34:50.520 |
But it's not just a stride. You have a stride if your array is actually represented flat in memory. 00:34:59.480 |
Sure, but if you have nested arrays, it's not contiguous at all. It's not even a stride. 00:35:05.080 |
And that is going to kill performance. And not only performance, but actually in today's computers 00:35:13.160 |
are so fast that the bottleneck is often memory throughput. The RAM cannot feed problems to the 00:35:21.960 |
processor fast enough. The processor is just sitting there waiting for the RAM to deliver more work. 00:35:27.240 |
So this is actually a very current issue in the deep learning world because as of a year or two 00:35:33.160 |
ago, a lot of papers were written that would write about the flops that their algorithm would 00:35:39.800 |
require. And nobody, not nobody, but a lot of people writing these papers hadn't quite noticed 00:35:45.800 |
that there was very little correlation between flops and time because of the memory issue. 00:35:51.560 |
Now PyTorch doesn't let you have tensors of tensors, so it's less of a problem. But yeah, it does turn 00:35:59.080 |
out that memory is probably the more important issue in deep learning algorithms at the moment. 00:36:08.840 |
So here's one more trick to use in APL at least. Use Boolean masks as much as you can. And that is 00:36:18.440 |
because again, the RAM is the issue. That's the bottleneck. So in other words, instead of conditionals. 00:36:26.360 |
Well, not just instead of conditionals, but instead of integers, if you can, 00:36:31.000 |
instead of using indices and things, then you should use a mask for the whole thing. 00:36:35.160 |
The reason for that is... and store data as Boolean instead of... So I just want to make sure we're 00:36:41.880 |
on the same wavelength. So what you're saying is, instead of having an array that says, 00:36:46.040 |
get indices two, three, and five, you would have a mask array of zeros in which items two, 00:36:52.040 |
three, and five have a one in that location. Exactly. And then let's say, for example, 00:36:58.280 |
you need to combine two conditions. And you know that elements one, two, and five abide by these 00:37:08.360 |
conditions. And then by one condition, you have another condition for which elements four and five 00:37:16.360 |
hold the condition. So you could do the intersection of the two sets. Or by multiplying them? 00:37:21.560 |
To get the indices. Well, they're just numbers, right? They're just the intersection. So you do 00:37:27.400 |
the intersection of them as sets. And then those are the ones where the condition holds for both. 00:37:32.680 |
And then you could index things. However, if you had them as Boolean masks instead, 00:37:36.600 |
so it would be, whatever, zero, one, zero, one, something like that, for one of them, and so on. 00:37:42.520 |
Then you can just do an AND, the Boolean AND. And that gives you a new mask, 00:37:47.240 |
and doing a Boolean AND operation on binary data in the processor is enormously faster than 00:37:54.840 |
doing a set intersection. That's what I meant about multiply. So you could, okay. Yes. Exactly. 00:38:00.760 |
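A hedged sketch of the index-versus-mask comparison being described (Dyalog APL; the element numbers follow the spoken example):

```apl
m1 ← 1 1 0 0 1     ⍝ condition 1 holds for elements 1, 2 and 5
m2 ← 0 0 0 1 1     ⍝ condition 2 holds for elements 4 and 5
m1∧m2              ⍝ 0 0 0 0 1: Boolean AND, one cheap vector operation
⍸m1∧m2             ⍝ 5: Where recovers indices if you really need them
1 2 5∩4 5          ⍝ 5: the set-intersection route, far slower at scale
```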
And then there's another benefit to this: APL will aggressively squeeze arrays, 00:38:08.040 |
and Boolean arrays are stored as one-bit Booleans. Oh, really? That means that you can store eight 00:38:16.440 |
elements in a single byte. Wait, how does that work? Because it's not typed, per se. So 00:38:23.080 |
it would just notice that the highest is one and the lowest is zero. And if I then try to store a 00:38:28.280 |
two, it would have to reallocate the whole thing or something? Yes. Yes. And then, but that means 00:38:35.000 |
since the processor is waiting for the data and we're able to switch to an eighth of the data size, 00:38:41.320 |
that means that the transfer time, which is the important time is going to be an eighth. 00:38:46.760 |
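In Dyalog specifically, you can observe the squeeze with ⎕DR (data representation). This is a small sketch of my own, not from the session:

```apl
⎕DR 0 1 1 0 1    ⍝ 11: stored as a 1-bit Boolean array
⎕DR 0 1 2 0 1    ⍝ 83: a single 2 forces re-allocation to 8-bit integers
```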
And that gives you enormous speed-ups. And so we have all these very clever algorithms built 00:38:53.640 |
into the interpreter, algorithms that are difficult to develop. It can take decades to 00:38:59.480 |
write the C code for that. And they can give you a speed-up like that. So 00:39:06.200 |
basically, by using APL that is optimized like this, you are employing clever C programmers 00:39:15.400 |
that have been working for you for years to fine tune your program way before you even started 00:39:21.960 |
writing your program. So these are... I don't think I can think of better 00:39:31.320 |
principles than that. Okay. I mean, that's very good. You use Boolean 00:39:40.680 |
masks, keep your arrays flat. And what was the first thing I said before? Work on the 00:39:52.280 |
leading axis; keep things leading-axis oriented. So, as for general programming principles, 00:40:03.240 |
you don't do global state changes and sort of global variables, really bad idea. So one thing 00:40:11.320 |
we, yeah, it's another thing I think we're pretty familiar as a community with more general software 00:40:16.680 |
engineering principles. One thing that surprised me when we were learning about each was that it 00:40:23.000 |
didn't operate over kind of major cells, but instead it operated over sub arrays. And I guess 00:40:31.080 |
that what now that we know about rank, we can just use rank for anything where we want to go over 00:40:37.560 |
major cells, which means maybe each is not so useful anymore. Each is actually really, 00:40:43.320 |
really simple. I can, I can show you if you want, I can explain what is happening with each. 00:40:48.840 |
So, okay. And like I said, just in general, each is a thing that 00:40:58.600 |
you would use and, you know, use it occasionally, but it depends what it is I'm doing. I'll try to 00:41:05.720 |
avoid it as much as possible. And among APL programmers, because it's an explicit loop 00:41:11.880 |
and that means the interpreter has no choice but to loop. And we don't want loops. We want to do 00:41:17.480 |
array operations because then the processors now have array instructions. They can do it. 00:41:25.880 |
And rank operator doesn't create explicit loops? 00:41:28.840 |
Rank operator conceptually loops, but internally if it can avoid it, it will not loop. 00:41:33.240 |
So it knows about a lot of things. Exactly. Exactly. 00:41:37.160 |
So that would be good to have a little section in the notebook there, Jeremy, where we might 00:41:45.800 |
like say this is an explicit loop and this is the less explicit way to do it. 00:41:53.960 |
Well, yeah, I mean, it's a different notebook, I think, Ben, like, you know, we've got to think 00:41:57.480 |
about how to present all this. But I think, you know, the theory of this first 00:42:00.760 |
notebook or set of notebooks is literally, you know, a dictionary of APL glyphs in an order 00:42:08.440 |
where you never get a definition in terms of something you haven't learned yet, 00:42:12.440 |
you know, and then there's something later about like, okay, what do you do with it? 00:42:17.640 |
Cool, yeah. So if you use the rank operator to loop, then you might keep the performance 00:42:26.280 |
because it doesn't actually loop. It uses fancy instructions for that instead. Each doesn't have 00:42:31.960 |
much of a choice, although occasionally the interpreter is clever enough. If you try to 00:42:35.720 |
use plus each, it will not actually loop, because it knows how to short-circuit that and just do it 00:42:42.040 |
directly. But what's happening with each is... you think of the matrix, you want to loop over each 00:42:52.040 |
row, but really what each is, is very, very simple. So if you have F each, that's the same thing 00:43:00.120 |
as, you know, now that you've learned enough of these compositional operators, 00:43:05.800 |
you learned atop. And it's the enclose, which... I don't remember. 00:43:09.480 |
We haven't done enclose, but you can quickly say what it is. 00:43:12.120 |
Yeah, it's basically just wrapping an array up as a single element, as a scalar. 00:43:16.680 |
So it's adding a leading axis. Oh, it's not adding a leading axis. 00:43:19.720 |
Not adding an axis; actually creating a scalar. It's creating a pointer to it. You can think of it 00:43:24.680 |
like that. What type is that? Is that some new type we haven't learned yet? Like, it's literally 00:43:29.160 |
an enclosed item. Well, it's not a numeric scalar, and it's not a simple scalar. 00:43:39.000 |
It's a pointer-type reference? But it's not a reference. No. Okay. 00:43:44.920 |
Because APL is passed by value. And so it will do, it will not keep connections between things 00:43:51.080 |
that you assign across, but internally, it's actually a pointer. And that's pretty much how 00:43:55.400 |
you can think of it, but you need some enclosure. It's a scalar. And so what it is, is enclose, 00:44:04.440 |
atop F, over disclose. And disclose is exactly the opposite: it means follow the pointer, 00:44:13.800 |
go get the one element and open it up. Rank zero. 00:44:19.000 |
I'm just trying to remember. So, the jot diaeresis. 00:44:30.520 |
But if there's only one argument, remember, then it's the same thing as atop. 00:44:35.400 |
So that's why it's useful to have it do that. So this means: actually pre-process all arguments. 00:44:40.920 |
Whether there's one or two, we just pre-process them with disclose. So we open up a box. 00:44:45.960 |
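The definition being built up here can be written out as follows. This is my reconstruction (assuming Dyalog 18.0+, which has ⍥ Over); it spells each as enclose, atop the function, over disclose, at rank zero:

```apl
v ← 'abc' 'def'
⌽¨v               ⍝ cba  fed
(⊂⍤⌽⍥⊃⍤0) v       ⍝ cba  fed: disclose each scalar, reverse, enclose again
```

Monadically ⍥ behaves like atop, which is why one composition covers both the one- and two-argument cases.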
Hang on a tick. You've got... oh, no, you don't have a fork. These are operators, not functions. 00:44:52.360 |
Okay. So this is, so here you've got function, operator, function, operator, function, operator, 00:44:59.720 |
array. Okay. So this whole thing is monadic because there's a thing only on the right. 00:45:04.520 |
And then, wait, what, what are you saying? This is not, this whole thing is one giant function. 00:45:12.680 |
The function is ambivalent, as we call it. It's both monadic and dyadic. 00:45:17.400 |
Oh, how is this a function? I thought the zero on the right hand... Oh, no, the zero is the 00:45:23.320 |
right-hand side of the operator. Yeah. Yeah. So what this is saying is: on every 00:45:29.000 |
scalar element, and loop as much as necessary to address scalar elements. Okay. This is fine. 00:45:34.600 |
On both sides, monadic and dyadic. This is rank. Oh, okay. Yes, of course, that's rank. So this says: 00:45:41.720 |
on scalars. So, all right. The first thing we do is dig all the way down to the 00:45:47.640 |
scalars. That means there's nothing you can do to each to make it apply to rows. It's already 00:45:55.720 |
impossible. It's like equals or plus: it's already down at the scalar level, on the scalars. 00:46:01.880 |
Yeah. Sorry. So it never sees the row. Yeah. It never sees a row. It never sees any, 00:46:08.280 |
F will never see anything that comes from more than a single element. So on the single elements. 00:46:13.000 |
Yeah. And remember, this zero actually means zero, zero, zero, because what it means is 00:46:19.160 |
it's rank zero monadically, rank zero on the left, and rank zero on the right. So it's 00:46:23.560 |
always rank zero. Then, on those elements, open them up if they are nested. If they're not nested, 00:46:31.720 |
like just a simple array, then nothing happens. Apply F and package the results back into the 00:46:39.320 |
box. Oh yeah. Because, um, when you've got multiple composition, it goes right to left. 00:46:48.920 |
This... yeah, you could... the binding goes like that. So we're saying: this 00:46:54.200 |
is F, post-processed by enclosing the result, and pre-processed by disclosing the arguments. 00:47:02.840 |
Oh, okay. So, you have actually used nested arrays, with an implied 00:47:09.320 |
enclose, already. And what we can say, as an example: ABC, DEF. This is a nested array. 00:47:18.520 |
This is exactly the same thing as the enclose of ABC concatenated with the enclose of DEF. 00:47:26.520 |
The stranding syntax is just a nicety. It's syntactic sugar. It means this. Okay. 00:47:33.080 |
It just enclose all the elements that are being stranded together. 00:47:36.760 |
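That equivalence is easy to check directly (a sketch of my own, assuming Dyalog APL):

```apl
'abc' 'def' ≡ (⊂'abc'),(⊂'def')   ⍝ 1: stranding is sugar for catenating encloses
```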
So, conceptually, these are... well, no, they really are scalars, 00:47:44.040 |
because every vector consists of scalars, and every matrix consists of rows that have scalars 00:47:49.320 |
in them, elements in them. And maybe we should turn boxing up to max now. 00:47:55.000 |
Why is that not just an array of arrays where the first array is ABC and the second array is DF? 00:48:03.880 |
Well, it is. Okay. So why do you need that enclose idea? 00:48:07.000 |
Because if I didn't have enclose here, then we would just be concatenating together. 00:48:14.200 |
Yeah. So we need to say each, each three element vector lives in its own little scalar. 00:48:21.320 |
And so these are individual scalars. If we look at the, at the shape of this, 00:48:28.040 |
it's the empty vector. It's a scalar. Yep. If you look at the depth of it, it's depth two. 00:48:33.640 |
There's an outer array, which is a scalar and there's an inner array, which is a vector. 00:48:38.440 |
So there are two levels. Yep. And so this is what we have. This is exactly the same thing as ABC, DEF. 00:48:48.440 |
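The shape and depth observations above, sketched out (Dyalog APL, my own illustration):

```apl
n ← 'abc' 'def'
⍴n          ⍝ 2: a two-element vector
⍴⊂'abc'     ⍝ (empty): an enclosed array is a scalar
≡n          ⍝ 2: depth two, an outer vector holding enclosed vectors
```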
And now it's easier to understand when I do reverse each on these, what was actually happening here is 00:49:01.160 |
I started off by applying... so we can make our little... we can write this: TC each. 00:49:10.200 |
So we can see that TC is first seeing ABC, then seeing DEF. Ah, well done. So if TC is seeing ABC, 00:49:19.720 |
then firstly, well, if reverse is seeing ABC, that means it was only applied to the first element. 00:49:29.800 |
That's the rank zero, but it wasn't, it didn't see the enclosure. 00:49:33.560 |
So we have disclosed it. We've opened it up. Hmm. And what did that? 00:49:40.760 |
Well, because remember the definition of each. So in this case, it's the enclosing, 00:49:49.400 |
oops, enclosing atop the disclosing, rank zero. Oh, this is what each means. That's the definition 00:50:03.400 |
of each. If we didn't do the disclosing, if you just write reverse rank zero, 00:50:13.400 |
then we are reversing each scalar, but reversing a scalar doesn't do anything. 00:50:17.320 |
Hmm. If we only pre-processed it by opening it up, then we would get a matrix, because 00:50:28.120 |
the result from this reverse is a vector, and two vectors in an array make a matrix. 00:50:34.120 |
So to stick them back into the boxes where they came from, we post process 00:50:42.520 |
the result of reverse, and this is a definition of each. And that's why you cannot use a function 00:50:49.080 |
that has an each after it to access entire rows of a matrix. It's just not possible. 00:50:54.840 |
However, if we take three four reshape iota twelve, and we want to reverse each row. 00:51:07.560 |
Yeah, now you can use rank. Firstly, this function anyway is rank one. 00:51:14.200 |
Remember, if this function is the leading-axis one, then the corresponding last-axis 00:51:22.280 |
function is the same thing, rank one. So this reverses the first axis, and this reverses 00:51:32.360 |
the last axis; it flips it horizontally. And if we use the first-axis reverse with rank one, 00:51:41.400 |
then we're flipping the rows because there's only one axis in them. We're only ever seeing 00:51:52.840 |
one of these. So how could we use reverse each to reverse the rows? Or for that sake, 00:52:03.800 |
reverse first each? Well, if we know that each will open up these boxes and close them down again. 00:52:12.200 |
So if we give it boxes that it can apply on, then it will work. So if I enclose, 00:52:21.160 |
rank one, so enclose, remember, puts a box around things. If I just enclose the array, 00:52:26.600 |
we get a multiply-enclosed one; we can keep making beautiful patterns. If I enclose rank one, 00:52:33.640 |
then I took each row and made it into a scalar. That means we have a collection 00:52:40.200 |
of three scalars; that's called a vector, a nested vector. Now I can reverse each. 00:52:47.480 |
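A sketch of the boxing trick just described (Dyalog APL, my own illustration):

```apl
m ← 3 4⍴⍳12
⊂⍤1⊢m         ⍝ a 3-element nested vector: one box per row
⌽¨⊂⍤1⊢m       ⍝ reverse each boxed row; the rows are still boxed
```

The ⊢ keeps the operand 1 from stranding with m, so the rank operator sees just the 1.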
Isn't there an arrow that does the same thing? Yes, for matrices, there is a down arrow, 00:52:53.880 |
but it's not really necessary. It does exactly the same thing on matrices, but it doesn't do 00:52:59.240 |
the same thing on higher-rank arrays, so this is more general. In a sense, that itself is a 00:53:04.360 |
last-axis function, whereas enclose is a leading-axis function. It works on everything. And you can 00:53:11.880 |
restrict it to be on the lower rank. Oh, I see. So you can think of down arrow and enclose as 00:53:17.720 |
the leading and trailing-axis versions of the same thing. Yeah, you could, yeah. 00:53:22.920 |
Maybe they don't look similar. They don't look similar, but that's this because... 00:53:30.120 |
An enclose with a bar would look like epsilon or something. 00:53:33.560 |
Well, it would look like this, but that means something else. Oh, okay. Yeah, you haven't 00:53:39.880 |
learned that one yet. You'll see; I won't spoil it at all. But there's never any reason, 00:53:45.480 |
really, to use a down arrow. It's just confusing on higher rank arrays. It's much easier conceptually 00:53:49.160 |
to understand that this puts things into boxes, and it gets restricted to only see rows. So it 00:53:54.200 |
puts the rows into boxes. And then we can disclose... The problem is we need to disclose each of these. 00:54:04.600 |
So we need to disclose, rank zero, the elements of the vector. This vector has three elements, 00:54:14.840 |
and I want to open up each one of them, because it's confined in a box. Then we get our matrix back. 00:54:19.560 |
So this is entirely possible. But notice, here I'm enclosing only so I can apply each, 00:54:28.840 |
so that I can, again, disclose. Well, those are the inverse operations of what each actually implies, 00:54:36.200 |
because this each is actually over-disclose at rank zero, and then enclose atop that. 00:54:51.480 |
So these negate each other. This makes scalars that are enclosed, and this opens scalars that 00:55:00.840 |
are enclosed. And this encloses the results, and this discloses the results. So these two cancel 00:55:08.600 |
each other out, and these two cancel each other out. And the only thing we have left is this rank 00:55:15.400 |
one, which it already is, and we're right back where we started. So it always cancels like that. You come full circle. 00:55:20.840 |
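The cancellation argument can be checked directly (my sketch, assuming Dyalog APL):

```apl
m ← 3 4⍴⍳12
⊃⍤0⌽¨⊂⍤1⊢m            ⍝ box rows, reverse each, unbox again
(⌽⍤1⊢m) ≡ ⊃⍤0⌽¨⊂⍤1⊢m  ⍝ 1: the enclose/disclose pairs cancel, leaving ⌽⍤1
```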
That's great. This is why you cannot use each on rows, but you can use rank on rows. 00:55:29.720 |
And the interpreter is clever enough that if you write reverse rank one, it won't loop. 00:55:45.080 |
It will understand that it needs to reverse the rows, and it will do that 00:55:48.920 |
as fast as it can do that, with vector instructions. 00:55:54.840 |
I don't know if it can actually speed anything up here, but it will try. 00:56:03.400 |
Wow, it's nice to learn APL from somebody who understands it. 00:56:12.120 |
Thanks, Adám. We should let you get to sleep. And that's our hour, so that's actually fantastic. 00:56:19.960 |
That's awesome. I feel a little bit better about hijacking your whole thing. 00:56:25.080 |
We're happy to have you hijack all the whole things. It's great. Thank you. 00:56:29.880 |
Yeah, no, it's great that you're spending the time to watch them all. It was great that you joined. 00:56:34.760 |
This was really helpful for me. I enjoy also seeing your explorations, and it 00:56:43.400 |
gives me some feedback on where we can improve our documentation. 00:56:48.120 |
Must be a bit cringy, though, to see us being like, "Oh, what are we doing?" 00:56:51.800 |
Just press that button, Jeremy, for God's sake. 00:56:55.560 |
There's been a couple of times where I kind of wish I was there. 00:57:01.720 |
I'll let you explore. Almost always, you figure it out eventually. Somebody jumps in and says, 00:57:08.920 |
"Hey, try this." But what I was worried about a little bit here is you seem to be going down 00:57:15.000 |
a wrong conceptual path with regards to rank. It seemed like you were thinking that rank actually 00:57:22.840 |
modifies the function, just like the bracket axis modifies the function. That isn't the case. 00:57:32.520 |
It's nice to know that if we go too far off the deep end, you'll come tell us. 00:57:38.120 |
I'll have sleepless nights if you stray too far from the right path. 00:57:45.640 |
If we see you join the call at the beginning, we're like, "Oh, we had a problem last time." 00:57:53.320 |
Otherwise, feel free to ask me questions. I'll respond on the forums.