fast.ai APL study session 11 (with Adám Brudzewsky)

>> Hello. Hi, Adám is here. >> Yeah, I'm up late. I figured I should -- >> I was going to say, it must be the middle of the night. What time is it where you are? >> 1:04. >> Oh, my goodness. Well, it's nice to see you. Thanks for all your help on the forum.

We're going slowly, but we're making progress nonetheless. Yeah. Well, I noticed you had some problems with the rank operator. I don't know if you're interested in some help with that. >> Oh, absolutely. >> Because you were -- I think it was in session 9, you were trying to apply the rank operator on plus/ and couldn't make it work.

And it didn't work as you expected, and then in the previous lesson, you were trying to use it on equal to make it compare like match. >> Yeah. We did get it to work in the last lesson, which was nice, although I realized afterwards, actually, there's a perfectly good outer product version which works.

But, yeah, would love help with those things. What's the easiest thing to do? >> I can do that, but I can also just explain something and you can try it out for yourself. >> I think a shared screen would be good. Yeah. Why don't you share yours? >> I can share my screen, sure.

>> Well, I think also it's just good to watch somebody who knows what they're doing use APL for a change. >> Okay. I don't usually use Jupyter Notebook. >> That's fine. Use whatever you like. >> Everything is the same otherwise. Okay. Actually, let me -- I can do it here instead.

>> Well, while waiting for Adám, I'll just mention -- for those who didn't notice -- we just released the course, course.fast.ai, literally 15 minutes ago. >> And they're coming. >> First time in two years. Do I need to give you permission or anything? No, it works. Okay, great.

>> Nice rabbit images, Jeremy. >> Yeah, thanks. I like the rabbits. It's now that computers can draw for us, I have no excuse not to add artwork. My wife's an art teacher, and I've been showing her a few of the capabilities. >> What does she think? >> My darling.

I'm impressed, but I guess it blows my mind, but I don't think she really gets it. It's like you see it in the movies all the time. >> Yes, some people didn't realize computers couldn't do this before. >> You can see an empty blue screen, dark blue screen? >> Yes.

>> Okay, just clean slate, don't need any interface. >> Yeah, what is this? This is the -- >> This is called -- no, this is RIDE, but it's RIDE in full screen mode with toolbars and everything turned off so you can only see the APL, nothing else. >> Nice.

>> So the important thing to understand -- >> And when you use RIDE, you're using the normal backtick approach to entering symbols? >> I personally don't. I use the right side alt key as a shifting key. >> How do you set that up? Because I've always wanted that.

>> I have created an actual Microsoft-type keyboard layout that you can just install, and then you don't need any of that. >> Okay, can you put that in the chat or something? >> I'll do that afterwards. No problem. So the important thing to understand is that rank, it's like -- I don't know what you call it in Australia, but some countries call them blinkers or blinders, that you put on the head of a horse.

>> You put something on the head of the horse so it doesn't get distracted by things around, so it can only -- it narrows down its vision. That's the only thing that rank can do. And that's important to understand, and that's why it was not working for you, not with plus slash, not with equal.

So I'm creating an array here. This is a two layer, three row, four column array. And the definition of plus slash is that it sums the rows. So now when I sum the rows, you saw this result before. So the first result is one plus two plus three plus four.

That's ten. And then five, six, seven, eight. >> More generally, it's summing over the last axis. >> It's summing the last axis, which is always the rows. In any array, the last axis is the rows. >> Right. >> And in this array, there are six rows. >> And slash bar always sums over the first axis.

>> The leading axis, yes. The first axis. Now, what I can do with the rank operator is to put blinders on this function. So right now, it's receiving as its argument the entire array A. When I say rank two, the only thing that it will ever see is arrays of rank two, even if the actual rank of the argument is higher.

So what rank will do is say, oh, this function is only allowed to see arrays of rank two. Let me show it this array of rank two. And when it's done processing that, let me show it this array. And then I'll collect together its results from those two applications into a larger result.

Now, remember, plus slash sums the rows. So when it sees this array, it sums this row, and it sums this row, and it sums this row. Next time around, it sums this, and this, and this, which means the result is going to be entirely identical to what we had before.

What's actually happening is that it's seeing less at a time, but that doesn't matter. The result is the same. And if we do rank one, it is only allowed to ever see one row. So it will sum this, and then rank will let us sum this, and so on, and we get the same result.

When you set rank zero, then rank knows, okay, this function is only ever allowed to see arrays of rank zero, that is, single elements. So it will tell it, okay, sum one, then sum two, and so on, which is why that gives us our array unmodified. The difference with plus slash bar is that it looks at the whole array and treats it as a whole array.

So it takes the entire first layer and adds it to the entire second layer like that. Now, if you restrict the rank to rank two, it cannot see the entire array. The only thing it can see is this matrix because rank is restricting its vision. So now, the leading axis is down here along the rows.

So it's going to add this row to this row, to this row, giving a result with four elements like that, which is why we get two times four elements. If I tell it rank one, then it will only ever be able to see one row at a time. So it will sum each row and that's equivalent to plus slash.
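The expressions typed on screen are not captured in the transcript. A minimal sketch of what this walkthrough may have looked like, assuming Dyalog APL with the default index origin (⎕IO←1); the name A and the exact spellings are assumptions:

```apl
A ← 2 3 4⍴⍳24          ⍝ 2 layers, 3 rows, 4 columns, holding 1..24

+/A                    ⍝ last axis: one sum per row (2×3 result, starting 10 26 42)
(+/⍤2) A               ⍝ +/ only ever sees one 3×4 matrix at a time
(+/⍤1) A               ⍝ +/ only ever sees one row at a time
(+/A) ≡ (+/⍤2) A       ⍝ 1: same result, it just saw less at a time
(+/A) ≡ (+/⍤1) A       ⍝ 1
A ≡ (+/⍤0) A           ⍝ 1: summing single elements gives A back unmodified

+⌿A                    ⍝ first axis: layer 1 plus layer 2 (3×4 result)
(+⌿⍤2) A               ⍝ inside each matrix the rows are added together (2×4 result)
(+/A) ≡ (+⌿⍤1) A       ⍝ 1: on a single row, first axis and last axis coincide
```

The last line is the general point: the first-axis form restricted to rank 1 behaves like the last-axis form.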

So in general, when you have two symbols that are the same, but one has a bar on it, then if you give the one with a bar, rank one, that's the same thing as removing the bar. And so is that why like you often see more experienced APL programmers kind of using the bar version kind of by default, because it's like more flexible.

Exactly. So now, if we try to use a function that computes the average, let's do that again. And here's a function that computes the average. I'm sorry, tally, you just learned tally or something like that. Now, if I try to apply this on a table instead, on a matrix, we can use this one.

Then obviously the average in each row should be two and five. Well, what's actually happening is that we get something that makes no sense at all. Why is that? Because tally counts how many major cells there are, that is, along the first axis. So it says there are two, but plus slash is summing along the rows.

So we are summing three numbers and dividing by two, and that is not an average. So experienced APLers will use slash bar instead. This sums along the leading axis, and this counts along the leading axis, and this will give me one average per column. So the average of one and four is two and a half, the average of two and five is three and a half.

Yep. That's why experienced APLers will use first axis functions, because then they can always say, okay, if I want the average over the rows, I'll restrict the view of this function to the rows of this array. And now I get two and five. And I couldn't have done that if I had defined my function in terms of the last axis.
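A sketch of the average example, assuming Dyalog APL with ⎕IO←1. The function names avgLast and avgFirst and the matrix M are hypothetical; the transcript does not show what was actually typed:

```apl
M ← 2 3⍴⍳6                 ⍝ rows are 1 2 3 and 4 5 6

avgLast  ← {(+/⍵)÷≢⍵}      ⍝ sums along the last axis, counts along the first
avgFirst ← {(+⌿⍵)÷≢⍵}      ⍝ sums and counts along the same (first) axis

avgLast M                  ⍝ 3 7.5: three numbers summed, divided by two
avgFirst M                 ⍝ 2.5 3.5 4.5: one average per column
(avgFirst⍤1) M             ⍝ 2 5: one average per row
```

With the first-axis definition, rank can narrow the function's view to rows; the last-axis definition cannot be widened the same way.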

It wouldn't be as flexible if I had defined my function in terms of the last axis. Only by defining it in terms of the first axis am I able to narrow down its vision to any lower level. Is a way to think of the mnemonic with the slash bar that the bar is horizontal, so it deals with rows?

Is that like a... Yeah, that's how I think of it. At least it goes down the rows instead of going along the columns. And you have the same thing with, this is reverse, horizontal reverse, and this meaning last axis reverse, and this is reversing it over the horizontal axis, or rotating over the horizontal axis.

Or flip, flipping? Yeah, either flipping or... Horizontally, or flipping vertically. Yeah, exactly. So the one with the diagonal bar is transpose, isn't it? Yes, the diagonal one is transpose because it's flipping over the diagonal. Wow, that's so awesome. So the problem we had was, you were trying to do this thing or something like that.
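For reference, a small sketch of the three "flip" glyphs being discussed, assuming Dyalog APL with ⎕IO←1 and a hypothetical matrix M:

```apl
M ← 2 3⍴⍳6        ⍝ rows are 1 2 3 and 4 5 6

⌽M                ⍝ reverse along the last axis:  rows become 3 2 1 and 6 5 4
⊖M                ⍝ reverse along the first axis: rows become 4 5 6 and 1 2 3
⍉M                ⍝ transpose: 3×2 result with rows 1 4, 2 5 and 3 6

(⌽M) ≡ (⊖⍤1) M    ⍝ 1: the barred form at rank 1 matches the unbarred form
```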

Or we can take some numbers that makes it a little bit easier to understand. So, plus and all the arithmetic functions are so-called scalar functions. They actually have an implied rank zero, always. Meaning they're always so narrow-minded that they only look at individual elements and never consider the whole array.

And there's nothing the rank operator can do to change that. So as opposed to axis, which is an ad hoc syntax that actually looks at what function did you give it and does something special for it, the rank operator is entirely general purpose. It has no idea about which function it's applying.

I see. So equals bar could be used to behave like equals, but... Exactly. So if we do this, now we're using match that is only allowed to look at scalars, one pair at a time. So it looks at three and three, and it looks at four and five, as pairs.

And so we can use that to match anything. So match is the more general-purpose function than equal, but equal is a very common construct, so we have a separate function for that. And a good way to get a view of things in general, if you don't know what's going on in an expression, I think you did see this phrase once.
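A sketch of the comparison, assuming Dyalog APL; the numbers are the ones mentioned in the conversation, the spellings are an assumption:

```apl
3 4 = 3 5         ⍝ 1 0: equal is a scalar function, compares element by element
3 4 ≡ 3 5         ⍝ 0:   match compares the two arrays as wholes
3 4 (≡⍤0) 3 5     ⍝ 1 0: match restricted to scalars behaves like equal
3 4 (=⍤0) 3 5     ⍝ 1 0: rank 0 changes nothing for a scalar function
```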

It prints things. So here's a really useful debugging trick. If I have a function and I don't know exactly what's going on, I'll wrap it in a dfn and I'll put in alpha and omega. And this starts a new statement. I don't know if you've seen that. And then I'll apply the function.

So let's say, for example, that I'm using equal. So the function will return the same result as the primitive function, but it's wrapped in such a way that it prints the arguments first. So now if we say three, four, and three, five, we can see the two. I should probably turn boxing on.

We don't actually need the max style. But oops, that didn't. Oh, of course. I have to specify that even when functions are printing, I want it to be boxed. So what equal is seeing is three, four, and also three, five. But it does its thing element by element. Now, if I apply the rank operator, it will print twice.

It'll be called with three, three, and then four, five. So this is a good way to see it. And you have already gone so advanced that you created your own operators. So we can actually create a trace operator. And it will print the arguments. And then it will apply the function to the arguments.

So now we can say three, four, equal TC, three, five. It prints that. And we can do it with rank zero. It prints it like this. We can even make it fancier and put labels inside and saying alpha is this, and omega is this, and so on. You do need a different version for a monadic, though, because there would be a value error on the alpha.
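A sketch of the debugging trick and the dyadic-only trace operator described here, assuming Dyalog APL. The name eq is hypothetical; TC is the name used in the session, but its exact definition on screen is not in the transcript:

```apl
⍝ In the session, boxed display inside functions is switched on with the
⍝ user command:  ]box on -fns=on

eq ← {⎕←⍺ ⍵ ⋄ ⍺=⍵}         ⍝ print both arguments, then behave like =
3 4 eq 3 5                 ⍝ prints the argument pair once, then returns 1 0
3 4 (eq⍤0) 3 5             ⍝ prints twice: the pair 3 3, then the pair 4 5

TC ← {⎕←⍺ ⍵ ⋄ ⍺ ⍺⍺ ⍵}      ⍝ operator: print arguments, then apply the operand ⍺⍺
3 4 =TC 3 5                ⍝ one call with the whole vectors
3 4 (=TC⍤0) 3 5            ⍝ two calls, one per pair of scalars
```

This version references ⍺ unconditionally, so calling the derived function with only a right argument gives a VALUE ERROR.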

So there's a trick that you haven't learned yet, which is to type this. There's a special syntax in the dfn. It means-- Yeah, I've seen that. It means apply. It means make that-- if you haven't passed an alpha, then that's the default. It's just a default left argument. So the default left argument-- this is a funny default left argument, because the default left argument is a function, which otherwise you can't pass in.

But it's a function which is a no-op. It's an identity function. And so here, if alpha is an identity function, then we just print omega. And over here, if alpha is an identity function, we apply alpha-alpha monadically. And then we apply the identity function to it, which doesn't do anything.

So this is a general purpose TC. So now I can take the A we had before. Sorry, just to clarify, Adám. Is that setting alpha-- it's not setting alpha to the identity function, right? It's setting alpha to the result of the identity function. No, it makes alpha be the identity function.

Oh, OK, because I thought-- Because there's no argument to it. So it's just a function assignment. It's a tacit function. Oh, yeah, yeah, yeah. Yeah, OK, sorry. That makes perfect sense. It would be the same thing as writing this. Yeah, yeah, I get it. OK, so we don't have to change that.
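A sketch of the general-purpose version with a default left argument, assuming Dyalog APL with ⎕IO←1. The definition is a reconstruction from the description, not a copy of the screen:

```apl
TC ← {⍺←⊢ ⋄ ⎕←⍺ ⍵ ⋄ ⍺ ⍺⍺ ⍵}
⍝ ⍺←⊢ only takes effect when no left argument was given.
⍝ Then ⍺ is the function ⊢ (no argument follows it, so it is a function
⍝ assignment, just like writing f←⊢ at the top level), and:
⍝   ⍺ ⍵     means  ⊢ ⍵       which is just ⍵
⍝   ⍺ ⍺⍺ ⍵  means  ⊢ ⍺⍺ ⍵    which is the monadic application ⍺⍺ ⍵

3 4 =TC 3 5          ⍝ dyadic use: prints both arguments, returns 1 0
+/TC 2 3⍴⍳6          ⍝ monadic use: prints the matrix, returns 6 15
(+/TC⍤1) 2 3⍴⍳6      ⍝ prints each row separately, returns 6 15
```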

So now, if we look at what plus slash is seeing, and that was our problem from the beginning. So here, we can see that it's seeing the entire array. It's a little bit hard to read, but here's our result. And this is the printout of the arguments. We can improve TC a little bit and say alpha here.

Oh, yes, of course, that's not going to work. And alpha omega. This will work. OK, so there's only an omega in this case. And it's seeing the entire array, and it's summing its rows. And when we try to apply this rank 2, then we see it twice. But it's, again, being applied to this.

It's exactly like applying plus slash to this array, summing its rows. And so on. So this is a really useful operator. You can modify it to your heart's content. You can make it do whatever you want. And to effect things, you can write a timestamp when it happened and so on.

So I think, I mean, I can take questions, but I think it should be more clear now what the rank operator does. It's just blinders. That's all it does. This was really helpful. Thank you, Adám. And this is what happened, then, when we compared these. Then you did the outer product like this.

The outer product is comparing all the elements on one side to all the elements on the other side, all the different combinations. We can write this just with equal and rank. Yeah, that's what we did yesterday. Yeah. So again, this is every element, zero on the left. Rank zero on the left gets compared to every element on the right.

So I know you did this, but there's a point here. So it gives us the same result. But what is equal actually seeing? Outer product applies between all combinations of left and right elements. That's not what's happening here. It's what you thought was happening here. Because if you put TC in, you'll see that it's only being called with a scalar on the left and a vector on the right, because that's exactly what we asked for.

Scalar on the left, vector on the right. If we instead do dot equal trace like this, you can see that it's being called individually on every pair. So the full way of behaving like the outer product is to say, I want equal to only ever see arguments of rank zero on one side and zero on the other side.

And you're allowed to omit one if they're the same. And that function, which only looks at rank zero things, should be applied between scalars on the left and vectors on the right. So I'm using rank twice. Oh, wow. Hang on. Okay. I want you to understand. So everything binds to the left.

Okay, we read left to right when we do operators. Yeah. Or you could say the operators have a long left scope. So this operator takes the entire thing here, operator phrase on the left, a function phrase on the left. So this is saying this function can only ever see scalars.

And with that, apply it between scalars on the left and vectors on the right. Now, I don't get it. So if you're saying to apply it to vectors on the right, but you previously said it can only ever be applied to scalars, what does that mean? Yeah. So remember, rank is not modifying the function.

Rank is calling the function one or more times as necessary, such that the function will have a restricted view. Yes. So this function over here will be called with left arguments as scalars and right arguments as vectors. We can see that by putting in TC. Oh, right. However, this function itself is the derived function.

It is not a normal equal. It's an equal. It's, it's a function that uses equal, but only ever lets equal experience a scalar on the left and the scalar on the right. And how do we know that? Well, we can look at with TC. So now we can see that this equal is being called like that one element at a time on the left and on the right.

We could also put in a double TC, but it will be very verbose and also, well, yeah. So we can see, first we go all the way up here. So we see that the outer TC reported, I'm calling my operand with a scalar and a vector, and the inner TC, the left one, is saying, I'm calling my operand with a scalar and a scalar.
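A sketch of the comparison between outer product and the two rank spellings, assuming Dyalog APL. The vectors x and y are hypothetical, and TC is a reconstruction of the trace operator from the session:

```apl
TC ← {⍺←⊢ ⋄ ⎕←⍺ ⍵ ⋄ ⍺ ⍺⍺ ⍵}   ⍝ print arguments, then apply the operand

x ← 3 4 5
y ← 3 5 7 9

r ← x ∘.= y                    ⍝ 3×4 table of all pairwise comparisons
r ≡ x (=⍤0 1) y                ⍝ 1: same result
r ≡ x (=⍤0⍤0 1) y              ⍝ 1: same result again

x (=TC⍤0 1) y                  ⍝ 3 calls: = sees a scalar and the whole vector
x (=TC⍤0⍤0 1) y                ⍝ 12 calls: = sees one scalar on each side
x ∘.(=TC) y                    ⍝ 12 calls: what outer product does
x (=TC⍤0 TC⍤0 1) y             ⍝ both levels: outer TC reports scalar and vector,
                               ⍝ inner TC reports scalar and scalar
```

The results agree in all cases because = is a scalar function; the traces differ because rank only controls what each call gets to see.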

So actually in some ways, it makes sense to think about that composition train right to left in the sense that the right-hand dieresis is taking the whole left-hand function. So the implied loop is that the left-hand side is kind of the inner part of the implied loop, in a sense.

Yes. And this is the governor. Or do you read it right to left until you hit the operator and then you jump to the far left and read that? I mean, the parsing goes left to right, but the point is that the right-hand dieresis has the entire left-hand derived function as the thing that it's applying that rank to.

I find it a little bit dangerous to speak about APL in terms of right to left, left to right. It's kind of like a scaffold for letting people know how simple function application works in APL, but it doesn't really apply when you have the full APL syntax, including operators and stranding and so on.

Really the way you should think of it is in terms of binding strength. What binds stronger? And then, operators bind stronger to their neighboring tokens than functions do. And then when you have equal binding strength, operators bind stronger from the left. So they have a long left scope, and an operator's left operand will be as far to the left as it can possibly reach without switching type.

It can only take either a function or an array. So when parsing this, we can look at this as, okay, this operator, what does it take as its left operand? Well, here we have an operator, a monadic operator. It can't be just that because it can't take a monadic operator as operand.

So we have to keep going left. Maybe this is the operand of TC. Oh, further left. No, there's a dyadic operator. It's going to grab the zero from TC and it takes its left operand. Oh, no, another operator on the left. So it has to have an operand.

Keep going here. And there's a parenthesis stop. We can't go any further. So we stop here. Or you could look the other way around this equal. Is it being applied now? Nope. It's being grabbed by an operator on the right. Is this being applied? Nope. It's being applied. It's being grabbed by an operator on its right.

Okay. So here's the right operand. Are we ready to apply? Nope. There's an operator on the right grabbing me. Is this ready? No, there's another operator. Okay. And then finally, the right operand and the parenthesis. We can't go any further. So it doesn't matter which way you go. As long as you know the binding strength rules, you just go one token ahead and see, are we done yet?

And if the binding rules say no, we're not done yet, then you keep going. And related to this, Adám, I found it very insightful listening to you on an Array Cast episode talking about why you tend to avoid parentheses, which is not because you're trying to type fewer characters, but because it's a similar idea to what you were saying: there's less to keep in your head at once if you can just work in the natural direction and only have to keep one thing in your head at a time.

Right. And it doesn't really matter. You can read APL right to left or left to right. It's just a matter of reading it. So the way I would read this from left to right, and actually I would avoid this parenthesis, I do need to separate this array from this array, but I can do that with an identity function because this operand here has to stop here.

We're switching to a new token here. It can't grab further. Okay. So that identity function. So what's it called again? Right something, right? I mean, the official name is "same", but right tack is the symbol. Yeah. And so that function in a dyadic context returns its right-hand side and in a monadic context returns, well, it returns its right-hand side.

Its right-hand side. I like to call it right because it returns whatever's on the right. Right. And so that function's not doing anything except, as I parse that now, I basically can see that I've got a function being applied to an array and therefore, yeah, that's a unit of stuff that APL can then.
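A sketch of using right tack as a separator instead of parentheses, assuming Dyalog APL; x and y are hypothetical:

```apl
3 ⊢ 4                ⍝ 4: dyadic right tack returns its right argument
⊢ 4                  ⍝ 4: monadic right tack returns its argument unchanged

x ← 3 4 5
y ← 3 5 7 9

x (=⍤0 1) y          ⍝ parentheses keep the operand 0 1 apart from the argument y
x =⍤0 1⊢y            ⍝ same thing: ⊢ ends the operand 0 1, then passes y through
(x (=⍤0 1) y) ≡ x =⍤0 1⊢y    ⍝ 1
```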

So you can read this from, so normally I would read APL, at least these expressions are short enough, from left to right. Interesting. Because it's executed from right to left, we can read it from left to right. And I will make a crazy claim here that English is written and read from left to right.

And it executes from right to left. I'll come back to that. So this is ABCD equal on scalars, on scalars and vectors, to ABDC, BCA. It reads naturally from left to right. Yeah, I know what you mean. It's like when you see, you know, three divide, tilde dieresis, something, you can start reading it as three divided into, and then you can start, you know, you can see that expression, divides.

Just make it three, three divides five. English is executed from right to left. Go drive the big red. You still have no idea what I'm saying. Bus. Okay, so first you have to evaluate bus, right? Then you have to make it red. Then you have to make it big.

Then you have to talk about the concept of driving it. Then you have to go do that. Go drive the big red bus. That's insightful. Yeah. In fact, normal function syntax in other programming languages is also from right to left, even though everybody thinks it's from left to right.

Because if I write f of g of h of x, you have to evaluate x first, and then you evaluate h, before you evaluate g, even though you actually write it and read it from left to right. Well, a lot of people nowadays are moving towards the syntax where you kind of...

The x dot x. Well, or maybe it's been functional, kind of a right arrow kind of cliff or something. Yeah, pipe type thing. Yeah. That's true. Yeah. But anyway, enough about that. So I hope this clarifies matters a bit. Yeah, it's great. Now, I've used up half of your time.

I'm sorry. No, I'm thrilled. Anybody have any questions about anything? Adám might want to go to sleep. Both watch the previous ones and then join in. This is great. I guess I have a more general question, Adám, which is, do you have any thoughts about... I mean, I want us to finish all the glyphs, right?

Which hopefully won't take too much longer. But when we do, I think the next step will be to learn to write APL properly and also understand why, like, what proper is. So this "use the bar version of glyphs because they're more flexible" thing is a pretty key insight.

Are there good videos or books or anything like that for getting these kinds of insights? The art of APL. APL style. I don't... There are some tips and tricks. In general, APL isn't very opinionated about how you should write things. In fact, I think Dyalog is kind of proud of the language being a multi-paradigm language.

You can write in a functional style. You don't have to. You can write in the object-oriented style if you want to, but you don't have to. You can write tacit or you can write non-tacit, whichever way you want. However, if you want good performance, for example, then there are some things you should stick to.

If you want more flexibility, so your functions are generally more applicable, then there are some things you can stick to. I would say for what we're doing, more flexibility is probably what we're aiming for because I think like this study group are kind of positioned to just like learning about a flexible and expressive notation which might help us to think about problems that we're solving.

There's not enough, I think, to write in order to make a paper of it. It's like a couple of lines of tips like this. Make your functions leading axis oriented so that they're more flexible. You can always make them last axis oriented by using the rank operator. And keep your code flat.

You can do arrays of arrays. Flat meaning without parentheses? No. The algorithms should use arrays that are not nested. We can have these arrays of arrays. You haven't used a whole lot of them, but the opposite is called simple arrays or flat arrays. I've heard some people say they're more sympathetic to the hardware.

The computer is really, really good at arrays because it's actually... Just to clarify, if I remember correctly, J doesn't exactly let you have arrays of arrays. You have to explicitly box them. The difference is very little. It's almost... It's more focused on arrays of arrays, if I remember. Yeah, K doesn't allow you multidimensional arrays.

It only allows lists of lists of lists and there's no other way. There's some choices that we made in design in order to avoid that because it doesn't allow multidimensional arrays. Yeah. In PyTorch and such things, we think about these issues a lot because it really kills you on the GPU.

If you're doing something across anything other than the trailing axis, it'll still work, but it'll be doing a non-linear stride. But it's not just a stride. You have a stride if your array is actually represented flat in memory. Sure, but if you have nested arrays, it's not contiguous at all.

It's not even a stride. And that is going to kill performance. And not only that, but today's computers are so fast that the bottleneck is often memory throughput. The RAM cannot feed problems to the processor fast enough. The processor is just sitting there waiting for the RAM to deliver more work.

So this is actually a very current issue in the deep learning world because as of a year or two ago, a lot of papers were written that would write about the flops that their algorithm would require. And nobody, not nobody, but a lot of people writing these papers hadn't quite noticed that there was very little correlation between flops and time because of the memory issue.

Now PyTorch doesn't let you have tensors of tensors, so it's less of a problem. But yeah, it does turn out that memory is probably the more important issue in deep learning algorithms at the moment. So here's one more trick to use in APL at least. Use Boolean masks as much as you can.

And that is because again, the RAM is the issue. That's the bottleneck. So in other words, instead of conditionals. Well, not just instead of conditionals, but instead of integers, if you can, instead of using indices and things, then you should use a mask for the whole thing. The reason for that is and store data as Boolean instead of...

So I just want to make sure I'm on the same wavelength. So you're saying, instead of having an array that says, get indices two, three, and five, you would have a mask array of zeros in which items two, three, and five have a one in that location.

Exactly. And then let's say, for example, you need to combine two conditions. And you know that elements one, two, and five abide by one condition, and you have another condition which holds for elements four and five. So you could do the intersection of the two sets.

I'm multiplying them. To get the indices. Well, they're just numbers, right? They're just the intersection. So you do the intersection of them as sets. And then those are the ones where the condition holds for both. And then you could index things. However, if you had them as Boolean masks instead, so it would be whatever, zero, one, zero, one, something like that, for one of them and so on, and for the other one and so on, then you can just do an and, the Boolean and.

And that gives you a new mask, and doing a Boolean and operation on binary data in the processor is enormously much faster than doing a set intersection. That's what I meant about multiply. So you could, okay. Yes. Exactly. And then there's another benefit of this, which is that APL will aggressively squeeze arrays, and Boolean arrays are stored as one-bit Booleans.

Oh, really? That means that you can store eight elements in a single byte. Wait, how does that work? Because it's not typed per se. So it would just notice that the highest is one and the lowest is zero. And if I then try to store a two, it'll have to reallocate the whole thing or something.
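A sketch of the two approaches side by side, assuming Dyalog APL with ⎕IO←1. The six-element data vector and all names are hypothetical; the index sets (1 2 5 and 4 5) are the ones mentioned in the conversation:

```apl
data ← 'ABCDEF'

⍝ Index-based: each condition is a list of positions, combined by set intersection
i ← 1 2 5
j ← 4 5
data[i∩j]            ⍝ E

⍝ Mask-based: each condition is a Boolean vector as long as the data
a ← 1 1 0 0 1 0      ⍝ ones at positions 1 2 5
b ← 0 0 0 1 1 0      ⍝ ones at positions 4 5
a∧b                  ⍝ 0 0 0 0 1 0
(a∧b)/data           ⍝ E

⍝ Converting between the two representations
i ≡ ⍸a               ⍝ 1: "where" turns a mask into indices
a ≡ (⍳≢data)∊i       ⍝ 1: membership turns indices into a mask
```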

Yes. Yes. And then, but that means since the processor is waiting for the data and we're able to switch to an eighth of the data size, that means that the transfer time, which is the important time is going to be an eighth. And that gives you enormous speed ups.
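The storage type can be inspected with ⎕DR (data representation). A sketch, assuming Dyalog APL with ⎕IO←1, where 11 denotes 1-bit Boolean and 83 denotes 8-bit integer; the name m is hypothetical:

```apl
⎕DR 0 1 0 1          ⍝ 11: one bit per element
⎕DR 0 1 0 2          ⍝ 83: one byte per element

m ← 0 1 0 1
⎕DR m                ⍝ 11
m[2] ← 2             ⍝ storing a 2 forces a wider representation
⎕DR m                ⍝ 83
```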

And so we have all these very clever algorithms built into the interpreter, algorithms that are difficult to develop. It can take decades to write the C code for that. And they can give you a speed-up like that. So basically by using APL that is optimized like this, you are employing clever C programmers that have been working for you for years to fine-tune your program way before you even started writing your program.

So these are, I don't even think I can think of more things that are good principles than that. Okay. I mean, that's very good. Use Boolean masks, keep your arrays flat. And what was the first thing I said before? First axis, leading axis, and things.

So with, so for, I mean, the general programming principles, you don't do global state changes and sort of global variables, really bad idea. So one thing we, yeah, it's another thing I think we're pretty familiar as a community with more general software engineering principles. One thing that surprised me when we were learning about each was that it didn't operate over kind of major cells, but instead it operated over sub arrays.

And I guess that what now that we know about rank, we can just use rank for anything where we want to go over major cells, which means maybe each is not so useful anymore. Each is actually really, really simple. I can, I can show you if you want, I can explain what is happening with each.

So, okay. And like I said, just in general, each is a thing that you would use, you know, occasionally, but it depends what I'm doing. I'll try to avoid it as much as possible, as do other APL programmers, because it's an explicit loop and that means the interpreter has no choice but to loop.

And we don't want loops. We want to do array operations because then the processors now have array instructions. They can do it. And rank operator doesn't create explicit loops? Rank operator conceptually loops, but internally if it can avoid it, it will not loop. So it knows about a lot of things.

Exactly. Exactly. So that would be good to have a little section in the notebook there, Jeremy, where we might like say this is an explicit loop and this is the less explicit way to do it. Well, yeah, I mean, it's a different notebook, I think, Ben, like, you know, we've got to think about how to present all this, but I think, you know, there's a note, the theory of this first notebook or set of notebooks is literally, you know, a dictionary of APL glyphs in an order where you never get a definition in terms of something you haven't learned yet, you know, and then there's something later about like, okay, what do you do with it?

Cool, yeah. So if you use the rank operator to loop, then you might keep the performance because it doesn't actually loop. It uses fancy instructions for that instead. Each doesn't have much of a choice, although occasionally the interpreter is clever enough. If you try to use plus each, it will not actually loop because it knows how to just circuit that and just do it directly.

But what's happening with each is this: you think of the matrix and you want to loop over each row, but really what each is, is very, very simple. So if you have F each, that's the same thing as, you know, now that you've learned enough of these compositional operators, you've learned atop.

It's the enclose, which I don't remember. We haven't done enclose, but you can say quickly what it is. Yeah, it's basically just wrapping an array up as a single element, as a scalar. So it's adding a leading axis. Oh, it's not adding a leading axis. Not adding an axis; it's actually creating a scalar.

It's creating a pointer to it. You can think of it like that. What type is that? Is that some new type we haven't learned yet? Like, it's literally an enclosed item. Well, it's not a numeric scalar. It's a pointer type, a reference, but it's not really a reference.

No. Okay. Because APL is pass by value. And so it will not keep connections between things that you assign across, but internally it's actually a pointer. And that's pretty much how you can think of it, but you need some enclosure. It's a scalar. And so what each is, is enclose atop F over disclose, and disclose is exactly the opposite.

This means follow the pointer, go get one element and open it up. Rank zero. I'm just trying to remember. So the jot diaeresis. This one preprocesses both arguments. Yes. But if there's only one argument, remember, then it's the same thing as atop. I do remember. Yeah.

So that's why it's useful to have it do that. So this means actually preprocess all arguments, whether there's one or two; we just preprocess them with disclose. So we open up a box. Add on a ticket. You've got, oh, no, you don't have a fork. These are operators, not functions.

Okay. So here you've got function, operator, function, operator, function, operator, array. Okay. So this whole thing is monadic because there's a thing only on the right. And then, wait, what are you saying? This is not... this whole thing is one giant function. The function is ambivalent, we call it.

It's both monadic and dyadic. Oh, how is this a function? I thought the zero was on the right-hand side. Oh, no, the zero is the right-hand side of the operator. Yeah. Yeah. So what it's saying is: on every scalar element, and loop as much as necessary to address scalar elements.

Okay. This is fine. On both, on both sides, not mine. This is rank. Oh, okay. Yes, of course. That's rank. So this says on scalars. So we already, we already dug. The first thing we do is dig all the way down to the scalars. That means there's nothing you can do to each to make it apply to rows.

It's already impossible. It's like, it's like equal or plus it's already down at the scalar level on the scalars. Yeah. Sorry. So it never sees the row. Yeah. It never sees a row. It never sees any, F will never see anything that comes from more than a single element.

So on the single elements. Yeah. And remember, this zero actually means zero, zero, zero, because what it means is rank zero monadically, rank zero on the left, and rank zero on the right. So it's always rank zero. Then on those elements, open them up. If they are nested, that is; if they're not nested, like just a simple array, then nothing happens.

Apply F and package the results back into the box. Oh yeah. Because when you've got multiple compositions, it goes right to left. Yeah, the binding goes like that. So we're saying this is F, post-processed by enclosing the result and preprocessed by disclosing the arguments.
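
The definition described above can be written out and checked against the built-in each. This is a sketch assuming Dyalog APL 18.0 or later, where jot diaeresis with a function on its right is atop and circle diaeresis is over. The names x, eachRev and eachCat are illustrative, and the second vector is assumed to be 'DEF'.

```apl
⍝ Assumes Dyalog APL 18.0+ (⍤ as Atop/Rank, ⍥ as Over) with default settings
x ← 'ABC' 'DEF'
eachRev ← ⊂⍤⌽⍥⊃⍤0          ⍝ function operator function operator function operator array
(eachRev x) ≡ ⌽¨x            ⍝ 1: both give 'CBA' 'FED'
eachCat ← ⊂⍤,⍥⊃⍤0           ⍝ the same pattern around catenate, for dyadic use
(1 2 3 eachCat 4 5 6) ≡ 1 2 3 ,¨ 4 5 6   ⍝ 1: both give (1 4)(2 5)(3 6)
(⊂⍤⌽⍥⊃⍤0 0 0 ⊢ x) ≡ eachRev x   ⍝ 1: rank 0 is shorthand for rank 0 0 0
```

The derived function is ambivalent: eachRev is called with one argument and eachCat with two, and in both cases the operand only ever sees disclosed single elements.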

Oh, okay. So we can show that you have actually used nested arrays, with an implied enclose, already. As an example: 'ABC' 'DEF'. This is a nested array. This is exactly the same thing as the enclose of 'ABC' concatenated with the enclose of 'DEF'.

The stranding syntax is just a nicety. It's syntactic sugar. It means this. Okay. It just encloses all the elements that are being stranded together. So conceptually these are, well, no, they really are scalars, because every vector consists of scalars, and every matrix consists of rows that have scalars in them, elements in them.

And maybe we should turn boxing up to max now. Why is that not just an array of arrays where the first array is ABC and the second array is DEF? Well, it is. Okay. So why do you need that enclose idea? Because if I didn't have enclose here, then we would just be concatenating them together.

Yeah. So we need to say each, each three element vector lives in its own little scalar. And so these are individual scalars. If we look at the, at the shape of this, it's the empty vector. It's a scalar. Yep. If you look at the depth of it, it's depth two.
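
The stranding, shape and depth points can be checked directly. This is a sketch assuming Dyalog APL with default settings, and assuming the second vector in the example is 'DEF'; the name x is illustrative.

```apl
⍝ Assumes Dyalog APL with default settings
x ← 'ABC' 'DEF'
x ≡ (⊂'ABC'),(⊂'DEF')    ⍝ 1: stranding encloses each item, then joins the scalars
'ABC','DEF'              ⍝ ABCDEF: without enclose the vectors are just joined
⍬ ≡ ⍴⊂'ABC'              ⍝ 1: the shape of an enclosed vector is empty, so it is a scalar
≡⊂'ABC'                  ⍝ 2: depth two, an outer scalar holding an inner vector
```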

There's an outer array, which is a scalar, and there's an inner array, which is a vector. So that's two levels. Yep. And so this is what we have. This is exactly the same thing as 'ABC' 'DEF'. And now it's easier to understand, when I do reverse each on these, what was actually happening here. I started off by applying on... so we can make our little, we can write this: TC each.

So we can see that TC is first seeing ABC, then seeing DEF. Ah, well done. So if TC is seeing ABC, then firstly, well, if reverse is seeing ABC, that means it was only applied to the first element. That's the rank zero. But it didn't see the enclosure.

So we have disclosed it. We've opened it up. Hmm. And what did that? Well, remember the definition of each. So in this case, it's the enclosing, oops, enclosing atop the disclosing, rank zero. Oh, this is what each means. That's the definition of each. If we didn't do the disclosing, if you just do this, reverse rank zero, then we are reversing each scalar, but reversing a scalar doesn't do anything.

Hmm. If we only preprocessed it by opening it up, then we would get a matrix, because each result from this is a vector, and two vectors in an array make a matrix. So to stick them back into the boxes where they came from, we post-process the result of reverse, and this is the definition of each.
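
The three stages just described (rank zero alone, disclose added, enclose added) can be tried one at a time. This is a sketch assuming Dyalog APL 18.0 or later with default settings. The function trace is a hypothetical stand-in for the TC function used in the session, and the second vector is assumed to be 'DEF'.

```apl
⍝ Assumes Dyalog APL 18.0+; trace is a hypothetical stand-in for TC
x ← 'ABC' 'DEF'
trace ← {⎕←⍵ ⋄ ⍵}              ⍝ print the argument, then return it unchanged
seen ← trace¨x                  ⍝ prints ABC, then DEF: the function sees opened boxes
x ≡ ⌽⍤0 ⊢ x                     ⍝ 1: reversing each enclosed scalar changes nothing
(2 3⍴'CBAFED') ≡ ⌽⍥⊃⍤0 ⊢ x      ⍝ 1: disclose first, and two result vectors form a matrix
(⌽¨x) ≡ ⊂⍤⌽⍥⊃⍤0 ⊢ x            ⍝ 1: enclose each result again, and that is each
```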

And that's why you cannot use a function that has an each after it to access entire rows of a matrix. It's just not possible. However, if we take three four reshape iota 12, and we want to reverse each row. Yeah, now you can use rank. Firstly, this function is rank one anyway.

Remember, if this function is the leading-axis one, then the corresponding last-axis function is the same thing with rank one. So this reverses the first axis, and this reverses the last axis. It flips it horizontally. And if we use the first-axis one with rank one, then we're flipping the rows, because there's only one axis in them.
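
The relationship between the two reverse functions can be checked on the matrix from the discussion. This is a sketch assuming Dyalog APL with the default index origin of 1; the name m is illustrative.

```apl
⍝ Assumes Dyalog APL with default ⎕IO←1
m ← 3 4⍴⍳12          ⍝ rows are 1 2 3 4, then 5 6 7 8, then 9 10 11 12
⌽m                   ⍝ last axis: rows become 4 3 2 1, then 8 7 6 5, then 12 11 10 9
⊖m                   ⍝ first axis: row order becomes 9 10 11 12, then 5 6 7 8, then 1 2 3 4
(⌽m) ≡ ⊖⍤1 ⊢ m       ⍝ 1: the first-axis reverse at rank one flips each row
m ≡ ⌽¨m              ⍝ 1: each only reaches single numbers, so nothing changes
```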

We're only ever seeing one of these. So how could we use reverse each to reverse the rows? Or for that matter, reverse first each? Well, we know that each will open up these boxes and close them down again. So if we give it boxes that it can apply on, then it will work.

So if I enclose rank one, and enclose, remember, puts a box around things. If I just enclose the array, we get a multiply enclosed array; we can keep making beautiful patterns. If I enclose rank one, then I took each row and made it into a scalar. That means we have a collection of three scalars. That's called a vector, a nested vector.

Now I can reverse each. Isn't there an arrow that does the same thing? Yes, for matrices, there is a down arrow, but it's not really necessary. It does exactly the same thing on matrices, but it doesn't do the same thing on higher rank arrays, so it's more general. In a sense, this itself is a last-axis function, whereas enclose is a leading-axis function.
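
Boxing the rows so that each has something to open can be sketched as follows, assuming Dyalog APL with the default index origin of 1. The names m, rows and a are illustrative. The last two lines only illustrate that enclose can be restricted to other ranks: on a rank-3 array, enclose rank two boxes the major cells, whereas the down arrow boxes vectors along the last axis.

```apl
⍝ Assumes Dyalog APL with default ⎕IO←1
m ← 3 4⍴⍳12
rows ← ⊂⍤1 ⊢ m       ⍝ nested vector: (1 2 3 4)(5 6 7 8)(9 10 11 12)
⍴rows                ⍝ 3: three scalars, each a boxed row
⌽¨rows               ⍝ (4 3 2 1)(8 7 6 5)(12 11 10 9)
rows ≡ ↓m            ⍝ 1: on a matrix the down arrow gives the same boxes
a ← 2 3 4⍴⍳24        ⍝ a rank-3 array
⍴⊂⍤2 ⊢ a             ⍝ 2: enclose rank two boxes the two matrices
⍴↓a                  ⍝ 2 3: down arrow boxes the 4-element vectors instead
```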

It works on everything. And you can restrict it to be on the lower rank. Oh, I see. So you can think of down arrow and enclose as the leading and trailing-axis versions of the same thing. Yeah, you could, yeah. Maybe they don't look similar. They don't look similar, but that's this because...

Enclosed with a bar would look like epsilon or something. Well, they would look like this, but that means something else. Oh, okay. Yeah, you haven't learned the dance. You'll see. I don't give it a spot at all. But there's never any reason, really, to use a down arrow. It's just confusing on higher rank arrays.

It's much easier conceptually to understand that this puts things into boxes, and it gets restricted to only see rows. So it puts the rows into boxes. And then we can disclose... The problem is we need to disclose each of these. So we need to disclose rank zero, the elements of the vector.

This vector has three elements, and I want to open up each one of them, because each is confined to a box. Then we get our matrix back. So this is entirely possible. But notice here, I'm enclosing only to apply each, so that I can disclose again. Well, those are the inverse operations of what each actually implies, because this each is actually over disclose rank zero and then enclose atop that.

So these negate each other. This makes scalars that are enclosed, and this opens scalars that are enclosed. And this encloses the results, and this discloses the results. So these two cancel each other out, and these two cancel each other out. And the only thing we have left is this rank one, which it already is, and we're right back.

So we can always look at it like that. You come full circle. That's great. This is why you cannot use each on rows, but you can use rank on rows. And the interpreter is clever enough that if you write reverse rank one, it won't loop. It will understand that it needs to reverse the rows, and it will do that as fast as it can with vector instructions.
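
The full circle can be traced step by step. This is a sketch assuming Dyalog APL with the default index origin of 1; the intermediate names boxed, reversed and unboxed are illustrative.

```apl
⍝ Assumes Dyalog APL with default ⎕IO←1
m ← 3 4⍴⍳12
boxed    ← ⊂⍤1 ⊢ m          ⍝ put each row in a box
reversed ← ⌽¨ boxed          ⍝ each opens a box, reverses, and closes it again
unboxed  ← ⊃⍤0 ⊢ reversed    ⍝ open every box to get a matrix back
unboxed ≡ ⌽⍤1 ⊢ m            ⍝ 1: the boxing and unboxing cancel, leaving reverse rank one
unboxed ≡ ⌽m                 ⍝ 1: which is what plain reverse does anyway
```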

I don't know if it can actually speed anything up here, but it will try. Wow, it's nice to learn APL from somebody who understands it. Thanks, Adám. We should let you get to sleep. And that's our hour, so that's actually fantastic. That's awesome. I feel a little bit better about hijacking your whole thing.

We're happy to have you hijack all the whole things. It's great. Thank you. Yeah, no, it's great that you're spending the time to watch them all. It was great that you joined. This was really helpful for me. I enjoy also seeing your explorations, and it gives me some feedback on where we can improve our documentation.

Must be a bit cringy, though, to see us being like, "Oh, what are we doing?" Just press that button, Jeremy, for God's sake. There's been a couple of times where I kind of wish I was there. I'll let you explore. Almost always, you figure it out eventually. Somebody jumps in and says, "Hey, try this." But what I was worried about a little bit here is you seem to be going down a wrong conceptual path with regards to rank.

It seemed like you were thinking that rank actually modifies the function, just like the bracket axis modifies the function. That isn't the case. It's nice to know that if we go too far off the deep end, you'll come tell us. Yeah, then I'll come and yell at you. I'll have sleepless nights if you go too far off the right path.

Excellent. I'm glad to hear that. If we see you join the call at the beginning, we're like, "Oh, we had a problem last time." Yeah, exactly. All right. Thanks, all. Otherwise, feel free to ask me questions. I'll respond on the forums. Very well. Bye. - Bye. - Thanks, Adám.

- Bye.
