Live coding 6


Chapters

0:00
6:36 Creating a persistent environment in Paperspace
13:08 Conda install mamba with -p
13:30 Install universal-ctags using micromamba
14:50 Clean up conda directory
18:30 Fixing path to universal-ctags and mamba
20:20 Create a bash.local file in /storage
23:30 Install micromamba into conda folder
24:00 Remove mamba and move conda folder into storage
24:40 Edit pre-run.sh file with symlinks to conda
25:20 Preserving .bash_history file
30:00 Test setup on new machine
34:30 Clone forked copy of fastbook
42:30 Adding git config file to persistent storage
45:00 Discussion about making contributions to repos with pull requests
48:00 Comparing different versions with nbdime on Paperspace
48:20 Start fastbook chapter 1
51:20 "__all__" is pronounced "dunder all"
52:50 A nifty trick for navigating source files
57:30 Optimising storage use on Paperspace
59:40 Move fastai config.ini into storage and symlink
65:45 The Path.BASE_PATH variable trick
69:00 The fastai L class: a drop-in replacement for a list

Transcript

Okay. Did anybody have any questions or anything? Questions or anything from last time, or comments, or anything else? How's your Vim going? I just had one, Jeremy. I wasn't really sure what happened when I looked. So I did the Vim and ctags thing in Paperspace.

As per the session. And then when I shut down the machine, or the instance, and went back into the same machine, it didn't seem to work. So I'm not sure why, or if you want to go into that. Okay. Let's have a look. See if we can see what might have happened.

What happened when you tried to use it? I muted myself. So what happened was I reinstalled mamba. And what else was there? And ctags again. And the ctags file had changed, but it didn't actually have anything in it compared to how it was when it was originally created.

So I don't know what happened. But then I tried to re-index, or recreate the index, and that didn't seem to work either. Well, I mean, one obvious thing we should probably try to get going is keeping the same, you know, binaries and stuff installed. Having to reinstall ctags every time is certainly not ideal, is it?

So let's try and figure out how to do that. I'll share my screen. All right. Can you see that okay? All right. Let's start this one. This is the one we created the other day. So the only things that are going to still be there next time we come back are things that are in /storage or things that are in /notebooks on the exact machine that we just started.

And to remind you, we can make this more flexible by taking advantage of the pre-run.sh and the .bashrc.local scripts that we can put in /storage. And we can use those to copy, link or move stuff out of those persistent places into our home directory if we want to. So that's kind of how we've been organizing things.

And so when we do pip install with the --user flag, it installs things into our home directory, into a subfolder called .local. And that we have moved into /storage and then symlinked back again in our script, back to home. And so now anything that we install with pip install --user will be persistent across all of our sessions.
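As a quick check of where `pip install --user` puts things, here is a minimal Python sketch. It assumes a standard CPython install; on Linux the user base is normally `~/.local`, which is the folder being moved into /storage and linked back here.

```python
import site

# Base directory for per-user installs (normally ~/.local on Linux,
# unless the PYTHONUSERBASE environment variable overrides it).
user_base = site.getuserbase()

# Where `pip install --user` places importable packages, under user_base.
user_site = site.getusersitepackages()

print("user base:         ", user_base)
print("user site-packages:", user_site)
```

If the user base printed here is a symlink into persistent storage, anything installed with `--user` should survive a restart.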

>> Can I ask a question about that? In the past, you suggested that you can wipe your mamba directory and reinstall. Will the fact that you have the symlink to .local on the storage drive mean that those pip-installed files will be persistent even if you wipe your mamba directory?

>> Yes, although that thing about wiping was not for Paperspace, because remember, on Paperspace the mamba stuff is installed into the root directory, into /opt or whatever. It's not in our home directory like usual. So you can't really wipe conda on Paperspace. On Paperspace, conda is always wiped clean, because you start with a new server.

And so the only stuff left will be stuff in .local. But, yeah, I don't know why you would, but you could do the same thing on your own machine. You could pip install --user and have a .local directory that, yeah, would be separate from your mamba environments and stuff.

>> Okay. I guess it's moved. Yeah, I hadn't thought of it that way. Okay, thanks. >> Yeah, that's fine. So I'm just having a look. So I think the way we -- yeah, the way we want to install stuff like ctags, which is not a Python thing, is using conda.

And so we can install with mamba, as we've done before. Now, by default, that's going to install it into /opt, et cetera, which we will lose. But what we can do is, mamba comes with a flag we can use, which is -p for prefix. And if we use the -p flag, it will install stuff into a different directory.

And so we should be able to use that to install stuff in a way that we can reuse it. And I think this is going to be a good exercise in understanding paths and aliases -- or rather symlinks, I should say -- and all that kind of thing.

So I think -- yeah, I think this will be an exercise worth doing. So I'm just loading up JupyterLab. Here we are. And so this is the same server, notebook server, that I used last time. And since /notebooks is persistent, that fastai folder that I -- sorry, not symlinked, that I git cloned is still there.

All right. So if we get a terminal up -- and we could probably take better advantage of our space. So -- good. All right. And then Command-B or Ctrl-B shows and hides that sidebar, which is quite useful. So what we could do is we could create a -- so I just typed cd, right, which is moving me into my home directory.

And so we could create something to put other conda stuff in here. We'll call it conda. And I believe we could go mamba and then give it a prefix. So our prefix is going to be our home directory's conda directory that we just created. Install universal-ctags. So -- okay.

So -- all right. So we also don't have mamba installed, which is a good point. So we'd like that to also be permanently available. So let's start by saying conda install -p. conda -p install mamba. Okay. That's probably going to be after the install command, is my guess. So while that's going on -- oh.

Okay. And then because they're not using mambaforge, that means that the conda-forge channel is not available by default either. So if you use mambaforge, then it kind of assumes that you're always saying -c conda-forge. And I think the plan is that Paperspace is going to hopefully switch to using mambaforge soon.

But for now, we'll have to say -c conda-forge. All right. So while that's going on -- let's also get our fastbook here, which actually makes me think about something. I just tried to reuse a command that should have been in my history, and it wasn't there. And I think the reason it wasn't there is that we -- oh, here we go.

Let's say yes over here. Wait. Why is this installing Python, even? Yikes. Oh, okay. That's annoying. So mamba -- okay. So it's trying to install it into the ~/conda directory. And it's saying, oh, that directory doesn't even have Python in it. So it wants to install Python and everything into there.

That's annoying. Let's say no. Okay. I do have another plan up my sleeve, which is there is a version of mamba that is fully self-contained, with everything statically compiled into it, called micromamba. So we're just experimenting a bit here. Once we get this working, we will make this all easier.

So instead of using mamba, we say /storage. So we can put this somewhere else in a moment. So let's see if that would work. Okay. That's also not working. Hmm. This is quite tricky, isn't it? So maybe we need to figure out what a minimum environment is, because we don't want to use up lots of our storage directory with junk that we don't really need.

So what I'm going to do is I'm going to install all this junk and then I'm going to see what that actually installs. And then we'll try to get rid of as much of that junk as possible. So I think that's my plan. Yeah, that's roughly what I was getting as well.

So, okay, no channel specified. I think I might be getting closer though, with this conda-forge. Okay, that seems to be doing something. So we might be on the way to finding a solution. Okay, let's see what's in this conda directory. Well, it's got a lot of stuff in it.

So I don't want to have stuff that takes up storage space that I don't need. So the first thing to find out is how much storage space things take up. So du stands for disk usage, and that shows you how much storage space things take up; -s just gives you a summary per directory, and -h is, as usual, human-readable.

So like show me megabytes and stuff like that. So it's easy to read. So this is going to show me the size of every directory. And if I want it for the whole directory, okay, so I've got 276 meg, which isn't terrible. But let's see. It is 38 meg, lib is 203 meg.
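To make the idea of a per-directory summary concrete, here is a rough Python sketch of the same kind of report. It is only an approximation: `du` reports disk blocks actually used, while this sums file sizes, so the numbers will not match exactly. The directory and file names are hypothetical stand-ins built in a temporary folder.

```python
import os
import tempfile
from pathlib import Path


def dir_size(path):
    """Sum of file sizes in bytes under `path` (symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def human(n_bytes):
    """Format a byte count with K/M/G/T suffixes, in the spirit of `-h`."""
    size = float(n_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024:
            return f"{size:.0f}{unit}" if unit == "B" else f"{size:.1f}{unit}"
        size /= 1024
    return f"{size:.1f}T"


# Hypothetical stand-in for ~/conda, built in a temporary directory.
demo = Path(tempfile.mkdtemp())
(demo / "bin").mkdir()
(demo / "lib").mkdir()
(demo / "bin" / "tool").write_bytes(b"x" * 2048)
(demo / "lib" / "big.so").write_bytes(b"x" * (3 * 1024 * 1024))

# One summary line per top-level directory, then a grand total.
sizes = {child.name: dir_size(child) for child in sorted(demo.iterdir())}
for name, size in sizes.items():
    print(f"{human(size)}\t{name}")
print(f"{human(dir_size(demo))}\ttotal")
```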

So it's mainly in lib, right? So what I could do then is go lib slash to see which ones. Now that's quite hard to read. So when things are kind of hard to read like that, I want to find everything that's like over a megabyte or more. So in the terminal, one of the really nice things you can do is you can use the vertical bar called pipe.

And what pipe does is it takes the output of the previous command and it sends it to another command. And so the thing that you're most likely to pipe to is grep, and grep searches and only prints things that match a pattern. I just want to find those things that are a megabyte or more.

So there's a capital M for megabyte, and then there's going to be a space after it. So let's just search for an M followed by a space. No, that didn't work. Maybe that's not a space and it's actually a tab. Yes. Okay. So here's all the stuff that's over a megabyte in lib.
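The space-versus-tab surprise can be reproduced with a small Python sketch that plays the role of grep on some made-up du output. The sizes and names below are hypothetical; the only assumption carried over from the session is that the size and the name are separated by a tab.

```python
import re

# Hypothetical `du -sh lib/*` output: size, a TAB, then the name.
du_output = "150M\tlib/alpha\n2.5M\tlib/beta\n12K\tlib/gamma\n4.0K\tlib/delta\n"


def grep(pattern, text):
    """Return the lines of `text` that contain a match for `pattern`."""
    return [line for line in text.splitlines() if re.search(pattern, line)]


with_space = grep(r"M ", du_output)   # "M" followed by a space: no matches
with_tab = grep(r"M\t", du_output)    # "M" followed by a tab: the megabyte lines

print(with_space)
print(with_tab)
```

Note that a pattern like this only picks out sizes reported in megabytes; anything reported in gigabytes would need its own pattern.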

And we could do the same thing for the directories inside conda. There they are. So of that 203 meg, it's mainly Python, which I'm sure we're not going to need. So we could rm -rf python and see what else we've got here. Now we've got 161 meg, still quite a bit.

Still mainly in lib. So this giant thing here, libicudata. I don't know if we need that or not. So what we could do is just move it out of the way. So .so.71, and maybe just move it out of the way. And then let's see if mamba works.

Oh, we have an install. So let's see if we can. Oh, so mamba. Okay. So we've installed ctags and mamba, but I can't run them. And oh, that's interesting. I can run ctags. Why is that? Oh, okay. So it looks like Paperspace comes with an old copy of ctags.

And this might be the issue that you were having with not being able to use it. So normally --version should tell you the version of things. There we go. ctags from Emacs. Wow. Okay. So my guess is it's not going to work the same way as what we're used to.

And that's probably why some people were having problems. So this is not finding the one we just installed, and we're not finding mamba at all. And so to remind you, the reason for this is that the way that the computer finds things to run is it searches in our path.

And the stuff that we installed is in the ~/conda/bin directory. There's mamba. And here's our newer version of ctags. So we could run things manually by typing bin/ctags --version. Okay, so it does need that. Let's move that back in.

libicudata. So that needs to go back in there. Okay. And yeah, so here's a current version: 5.9, compiled, well, two days ago. So to make it so that we'll be able to see stuff that's in our conda directory, we need to ensure that it's in the path.

So to make sure that things are in the path when we run a bash terminal, we have to put them into our .bashrc.local file. So that's going to be -- okay, so it looks like I don't currently have one. So let's create one, .bashrc.local.

All right. So this is a file that needs to be run with bash. And what we're going to do is export, which sets a variable. And so the variable we need to set is PATH. And we want to set it equal to the conda bin directory in our home directory.

And then we want to also have the contents of the existing PATH variable. So you have to put $PATH to say the contents of a variable. Okay, so now if we create a new terminal, we can test whether that worked by printing the contents of $PATH. echo is how you print things.
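Why the order of directories in PATH matters can be shown with a small Python sketch. It uses `shutil.which`, which performs the same kind of left-to-right search the shell does. The two directories and the fake `ctags` scripts are hypothetical stand-ins created in a temporary folder, and a POSIX system is assumed.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def make_fake_tool(directory, name):
    """Create a tiny executable script called `name` inside `directory`."""
    directory.mkdir(parents=True, exist_ok=True)
    tool = directory / name
    tool.write_text("#!/bin/sh\necho fake\n")
    tool.chmod(tool.stat().st_mode | stat.S_IXUSR)
    return tool


tmp = Path(tempfile.mkdtemp())
system_bin = tmp / "usr" / "bin"   # stands in for the system directory with the old ctags
conda_bin = tmp / "conda" / "bin"  # stands in for ~/conda/bin with the new ctags

old_ctags = make_fake_tool(system_bin, "ctags")
new_ctags = make_fake_tool(conda_bin, "ctags")

# Before: only the system directory is searched, so the old copy is found.
before = shutil.which("ctags", path=str(system_bin))

# After: like PATH=~/conda/bin:$PATH, the new directory is searched first.
after = shutil.which("ctags", path=os.pathsep.join([str(conda_bin), str(system_bin)]))

print("before:", before)
print("after: ", after)
```

Putting the new directory in front of the existing value, rather than after it, is what lets the newly installed copy shadow the older one.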

Okay. Let's see. So that did not work. Let's try running it manually and see if it works. So source is a way of running a script and storing all the variables in this shell. So if we say source /storage/.bashrc.local. It is /root/conda/bin. Okay, so why was it not working?

Oh, it's not .bashrc.local, it's .bash.local. That's why. So we could move it. And I don't have to type out .bashrc.local, I can just type !$, because that means the last token from the last line, which in this case is .bashrc.local, and move it to .bash.local.

Let's try again. There it is. Okay, so now if I run ctags, I've got the right version, and I should be able to run mamba. Okay, so mamba doesn't like being run, so that's fine. What we can do is use this micromamba thing instead, so let's move micromamba into conda/bin.

And so you can get micromamba. You can download it from here. All right, so let's see how we're doing. Let's get rid of mamba. And let's see how much space we're now taking up. Not bad. Right, 175 megs, so we've now got a, you know, place that we can install software like ctags into, and it will work.

So, to make sure that that'll work next time, we need to move that into our /storage. And then of course we'll need to symlink it back again. So, let's edit our pre-run.sh file. We don't have to remove anything, because it won't be there, so we just symlink /storage/conda back to the home directory.

Okay. So there's one other thing I'd like, which is my .bash_history file. It's really nice to come in each time and have the same bash history on a machine, in my opinion. So, let's move that also into /storage, because that means that our Ctrl-R and up arrow and stuff like that are going to always work.

So that goes into /storage. And so let's do that as well. So how am I going to do this in Vim? I want to copy two lines and paste them. To copy a line is Shift-Y, so to copy two lines I press 2 Shift-Y, and then Shift-P to paste those two lines above.

Okay. And then I can press Shift-W to move a word forwards and Shift-C to change the rest of the line, so that deletes it and then puts me in insert mode, and I can type bash_history. Now here's where it gets fun: I want to do exactly the same thing for this line, to replace the rest of this line with .bash_history.

So to repeat myself, I just press dot. Isn't that fun? So, all right. And that's not a directory, so it doesn't need to be recursive. Okay, so let's try this out, shall we, by spinning up a new machine. I guess what I mean is spin up an existing machine, a new instance of it.
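The move-then-symlink pattern used for the conda folder and the history file can be sketched in Python. This is an illustration rather than the actual pre-run.sh: temporary directories stand in for /storage and for the home directories of the current machine and of a freshly started one, and the file contents are made up.

```python
import os
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
storage = root / "storage"    # stands in for /storage (persistent)
old_home = root / "old_home"  # home directory on the current machine
new_home = root / "new_home"  # home directory on a freshly started machine
for d in (storage, old_home, new_home):
    d.mkdir()

# Things created during this session (contents are made up).
(old_home / "conda" / "bin").mkdir(parents=True)
(old_home / "conda" / "bin" / "ctags").write_text("pretend binary")
(old_home / ".bash_history").write_text("ctags --version\n")

NAMES = ["conda", ".bash_history"]

# One-off step: move each item into persistent storage.
for name in NAMES:
    shutil.move(str(old_home / name), str(storage / name))


def link_into_home(home, storage, names):
    """What the start-up script does: symlink each stored item into home."""
    for name in names:
        link = home / name
        if link.is_symlink():
            link.unlink()  # replace a stale link left by an earlier run
        os.symlink(storage / name, link)


link_into_home(old_home, storage, NAMES)  # done by hand the first time
link_into_home(new_home, storage, NAMES)  # done by the script on a new machine

print((new_home / "conda" / "bin" / "ctags").read_text())
print((new_home / ".bash_history").read_text(), end="")
```

The same call works for a directory and for a single file, which matches the point that only the move of a directory needs to be recursive.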

So the reason we're spending quite a bit of time setting up our Paperspace environment is because I think these are good practical examples of using Vim and using links and using scripts, you know, and hopefully you'll be able to reuse these ideas lots of times.

So let's start that one. Okay, so I'm just going to start up another machine and see if that all works. Let me wait for it. So yeah, I have a feeling that now, once you've installed universal-ctags properly and it's in your path, it's just a case of checking the version.

So let's create a new shell, because that one's -- up arrow. Oh, this is still not giving me the right version, that's interesting. Let's look at the path. /root/conda/bin. That's interesting. conda/bin/ctags. Oh, okay, it's not symlinked back again, because we didn't run that storage script, because this is not a new machine.

So for now I'm just going to run the storage script manually. Okay, and. Okay, still not right. Oh, then the thing that puts it in the path is meant to be the .bash.local, right, which should have run. Let's just create a new shell, just to see if something weird happened.

Okay, so that's in /root/conda/bin. So now if I type ctags --version, with Ctrl-R to find it quickly. Okay, it is working. So it's just a case of running it again. Okay, so here's our new machine, which has now finished starting. So let's see if the same thing works on a brand new machine.

It always takes a long time to open up JupyterLab the first time, for some reason, on each machine. Here it comes. Okay, so fingers crossed. ctags --version. There it is. And in theory -- yeah, we've also got our history, so our history is actually saved between machines, which is quite nice, right?

I love it. So if I wanted to echo PATH again, I could type Ctrl-R, capital PA, and it'll find it from our history, even though we're on a totally separate machine. One weird thing is the path here looks like it's repeating itself. Not quite sure why that's happening.

It doesn't really matter too much, to be honest, but I'm just curious about why /opt/conda/condabin is appearing. But it's not appearing twice. User. Oh, no, they're all unique. Don't mind me. Okay, I was imagining things. So on this computer, I don't really want all this stuff in my notebooks either.

So let me rm -rf everything. That's a really dangerous command. Oh, Hamel. Hi, Hamel. Yeah, I'm so sorry for being late. I forgot. I didn't even tell people you were coming, so they don't know you're late, so you could have stayed silent and nobody would have known. Hamel's hopefully going to be joining most or all of our sessions from now on.

So do you want to quickly introduce yourself, Hamel? Yeah, yeah, I'm Hamel. I have, you know, worked on fastai a lot. I especially like contributing to all the dev tools, like nbdev and fastcore and stuff like that. And, like, other tools that help people automate what they do.

So yeah, Hamel's got a background as a machine learning researcher and developer, and also quite a bit of stuff with MLOps. And also training as a lawyer. So, you know, if anybody needs help with the law, he's the guy. That's what I tell people about that, Jeremy. All right.

Nice to see you. So we're just setting up our Paperspace environment, Hamel. And we've got to a point now where I just launched a new instance and everything is exactly how I want it straight away, which is really nice. We've got a way to install Python modules persistently.

We can install persistently binaries. So I think that's a good place to start working through the book because we've got ourselves an environment. So that is step one. Is there anything else that people felt like we're kind of missing from the environment that they would really want? Or should we start working on chapter one of the book?

Sounds like we're all happy. Okay, great. I'll stop this machine, because that one's costing me money. So here we are in /notebooks. So the first thing we're going to need is a copy of the book. So fastbook is here, so we can copy the SSH GitHub thingy and clone it into /notebooks.

Actually, wait, we're trying to do this properly, right? So if we're going to do it properly, what we should actually do is create our own fork of it, so that we can make changes and save them back again. So what I'm going to do is click fork, and that's going to create my own copy in jph00.

Okay, so let's get rid of that copy in jph00, so we can actually do this all from scratch. Okay, so try following along with these steps. Make sure it works for you. So I'm just going to delete my copy of this, jph00/fastbook, so that we can start from scratch.

Looks like I need to go get my phone, excuse me. That's annoying. Normally it lets me use GitHub on my phone. All right, I'm just going to do this on my other computer rather than set this all up. The "this device" option seemed like it was promising to me. This device.

Yeah, no, because this is my new Mac. Normally, on my Windows machine, I use Windows Hello to do face recognition. And when I say do it on my phone, it's not doing it with the GitHub app on my phone, which is what normally happens, but instead it's saying that on my phone I have to install a USB security key, so I don't know.

I could just do it over on my other computer to delete this repository. We'll worry about that another time. Okay, on my other computer I should be able to use my security key. Yes, because it uses face recognition. Okay, great. I love Windows Hello. It's very handy. Okay, so now, that's all done.

So right now. Great. So now I can -- okay, so now I can create my fork. Oops. There we go. So this is going to create basically a copy of the repository. But it's a copy that is linked back to fastai's version of the repository. So if fastai makes changes, then I can click fetch upstream and it'll copy those changes into our version, which is nice, but it'll also keep our changes as well.

As long as they don't conflict. So now, rather than cloning fastai's version, we will clone our version, and we can save those changes back as we need to. Okay, so I go git clone our fork. You can see here it's got our username, and then the repo name.

Notice that I'm inside /notebooks when I do that, which means that now we can see it. And here it is. So let's open chapter one, and let's open chapter one in clean. >> Just checking: I remember previously with git clone you've done a depth of 1 to not clone the whole repo.

>> Yeah. I wondered about doing that this time, but because we probably want to commit things back to our fork of the repo, I decided not to go with depth 1 this time, because we're not just reading it but we're actually changing it. >> I have a question. I have never done a fork and a merge.

And I wonder if you can do a sample just to test, if that's okay. >> Yeah, let's make a change and copy it back. So, let's just do it on master. Yes, yes, yes. Okay. So let's see if this first cell runs, first of all. You'll see at the very top there's a pip install fastbook, but this is a bash line, because it starts with an exclamation mark, and it's checking whether something called /content exists, which only exists on Colab.
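The "only run this on Colab" check amounts to testing whether a marker path exists. A minimal Python sketch of the same test follows; the function name is hypothetical, and the claim that /content exists only on Colab is taken from the session rather than verified here.

```python
import tempfile
from pathlib import Path


def marker_exists(marker="/content"):
    """True if the marker path exists; /content is the path the notebook cell checks."""
    return Path(marker).exists()


# On Colab this would be expected to print True; elsewhere, usually False.
print("looks like Colab:", marker_exists())

# The same check against paths we control, to show both outcomes.
existing = tempfile.mkdtemp()
print(marker_exists(existing))
print(marker_exists(existing + "/missing"))
```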

So that will only run on Colab. So actually here it's just going to try to import fastbook. Okay, great. And we could try running something. Okay, great. So now let's try pushing that back to our repo. So, if we now cd into fastbook, and we do a git status.

git status. >> It looks like you haven't saved that. >> Yeah, I thought I did, but apparently I didn't. Oh, because it's on a Mac and you have to press Command-S, and I'm just not used to using a Mac here. Okay, so it tells us that we've made a change.

So, to remind you, the shortcut for adding something to a commit and giving it a message is git commit, with -a to add it and -m to provide a message. So I'm going to commit, and the message goes in single quotes, so it knows that's all one string: 'test, making a change'.

Okay. So here's another thing that we're going to want to add to our persistent stuff, which is who I am. So let's run the things it says to. Okay, and then what you'll find is that there's now a config file that's appeared here.

That contains that information. So hopefully you won't be too surprised to hear that what I want to do is move that into /storage. And then vim /storage/pre-run.sh, and symlink it in the script. First time around we'll have to do that manually. Okay, and so I rerun the commit by typing Ctrl-R to search backwards and typing commit.

Press Enter. Oh, I'm in the wrong place. To get back to where I was is cd -. There we go. Okay, so that's committed it to my local. There we go. So if I now go back to GitHub, you'll see that this branch is one commit ahead of fastai:master.

So I've made a commit that isn't part of fastai's. So, you know, by the way, a lot of the time you create a fork and you make changes because you just want to make changes to your version. Sometimes you make a fork and make changes because you actually want to provide those changes back to the original project, you know, you fixed a bug of theirs or added a feature.

And so if I wanted to send my changes back to fastai, because I think they're an improvement to the book, then I can say contribute. Right. And then that'll say, oh, okay, you can create something called a pull request. And so the pull request is going to show you the changes, which is that I've executed some cells I hadn't executed before.

And I've added a cell where the source was one plus one and the result was two. And then I could create a pull request and that would send something back up to fastai. And I'll show you what that would look like. Obviously, never do this unless you're sure you want to, because that's just going to annoy the developers.

So then what happens is that would appear on my copy of fastbook. It would appear in pull requests. And so here's an example. So when you send in a pull request, I'll then be able to see what files you changed, and using this thing called ReviewNB I'll even be able to see the changes that were made to the notebooks in a graphical view.

So I'll show you what that looks like. There we go. And they've provided a description. And so then if I wanted to add this back to fastai, I click this button. So that's what people talk about when they talk about making a pull request. Anyway, that's not quite what we're doing.

We're just keeping our own copy of it, as we discussed. >> All right, so can we have a version of ReviewNB on our local machine to see? >> Yeah, absolutely. ReviewNB is a startup. They're very, very good. You can totally add it to your own repos just by going to reviewnb.com.

>> No, I mean, on my local machine. >> No, but you can use something quite similar called nbdime. It doesn't really make sense to have it on a local machine, because you don't send pull requests to yourself on your machine. But what you do want to be able to see is differences.

And so, for example, with mine, they've got nbdime installed. Let's make a change. So let's delete this cell, save it. And so nbdime's already installed on Paperspace. So I can click here and I can see, okay, I've made this change: I've deleted one cell.

All right, so let's try running. Yeah, it's happening. Okay. While this is running, let's talk about what's happening here. So the first thing we've got here is a from blah import blah statement. So let's understand what's happening here. And to do that, let's cd into the fastai repo.

Okay, so here's the fastai repo, and the fastai repo contains a folder called fastai. And that contains the fastai library. And in the fastai library, as well as some Python files, there are some subfolders. Now in this case we were importing stuff from fastai.vision.

And that tells us that there must be a directory called vision, and there indeed is a directory called vision. And then in there, finally, we're importing from a module called all, and that means that there must be a directory called all -- sorry, a file called all.py, and here it is, all.py.

So what happens when we say from blah import blah is it goes through all the dot-separated components except for the last one and treats these as directories. Right. And so basically what it's doing is it's going to look for a file called fastai/vision/all.py.

That's how it gets translated. It's all very mechanical, you know, there's nothing magical or weird about it. And so then if we look at that file. There it is. Right. And what does this file do? Well, this file is just importing things. And so it's importing something from models, basics, blah blah blah.

Now, when you're inside a library like this, you'll see sometimes it uses dot or dot-dot prefixes. This is going to be replaced with a file called ../basics.py. This will be replaced with a file called ./augment.py. So we should be able to find it if I go split, :sp aug Tab.
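The mechanical translation from a dotted module name to a file can be written down as a short Python sketch. It is a simplification: the real import system also searches every entry on `sys.path` and treats a directory containing `__init__.py` as a package. The helper name `module_to_path` is hypothetical; `importlib.util.resolve_name` is the standard library function that resolves the leading dots.

```python
from importlib.util import resolve_name
from pathlib import PurePosixPath


def module_to_path(module, package=None):
    """Map a (possibly relative) module name to the .py file it refers to.

    `package` is the package containing the importing file, and is only
    needed when `module` starts with one or more dots.
    """
    absolute = resolve_name(module, package)
    return PurePosixPath(*absolute.split(".")).with_suffix(".py")


# Absolute import: every component but the last is a directory.
print(module_to_path("fastai.vision.all"))

# Relative imports as seen from inside the fastai.vision package:
print(module_to_path(".augment", package="fastai.vision"))   # one dot: same directory
print(module_to_path("..basics", package="fastai.vision"))   # two dots: parent directory
```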

Yep, there is indeed an augment. So I've got a split pane here. Here's my augment file. And so when it goes from .augment import star, what does that do? It looks for a special variable called dunder all. When you've got underscore underscore blah underscore underscore, we pronounce that dunder blah.

So this is dunder all. If there's a special variable called dunder all, that's a list of all of the symbols that it wants to bring in. Right. And so here's a list of all the symbols it wants to bring in. So this command here will bring in, for example, a symbol called RandTransform into this file.

Now, this file doesn't have something called dunder all. And if you don't have a variable called dunder all, then every single thing that is imported or defined in this file will end up being exported from it. So since this line here imports something called RandTransform and there's no dunder all here, that means importing all.py should also import RandTransform.
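Here is a small self-contained Python sketch of the two cases. The module names `demo_with_all` and `demo_without_all` are hypothetical and are built in memory rather than read from files. One detail the sketch makes visible: without `__all__`, names that start with an underscore are still left out of a star import.

```python
import sys
import types


def make_module(name, source):
    """Build a module from source text and register it so it can be imported."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module


# Hypothetical module that declares exactly what a star import should bring in.
make_module(
    "demo_with_all",
    "__all__ = ['public_thing']\npublic_thing = 1\nother_thing = 2\n",
)

# Hypothetical module with no __all__: imported and defined names are all exported.
make_module(
    "demo_without_all",
    "import math\npublic_thing = 1\nother_thing = 2\n_private = 3\n",
)


def star_import(name):
    """Return the set of names that `from <name> import *` brings in."""
    namespace = {}
    exec(f"from {name} import *", namespace)
    return {key for key in namespace if key != "__builtins__"}


print(sorted(star_import("demo_with_all")))
print(sorted(star_import("demo_without_all")))
```

In the second module, `math` comes along for the ride because it was imported there, which is the same reason RandTransform becomes available through all.py.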

And so that means I should be able to type RandT Tab. And there it is. There is a RandTransform. And if I hit Shift-Enter, you can see where it's come from. So it's come from fastai.vision.augment.RandTransform, which remember is translated to fastai/vision/augment/ -- sorry.

OK, this is actually the name of a class. So this will become fastai/vision/augment.py. And then we'll find the RandTransform class or symbol inside there. And so let's find it. So if I click on RandTransform. I'll show you a really nifty trick.

If you select something in Vim and hit the asterisk key, Shift-8, it will search for the next place that this word occurs. So if I press asterisk, here's the next copy of RandTransform. And so here's the definition of it. So that's what it does when you say from blah import blah.

If you say import star, then that will import everything that's exported. So everything in dunder all, or if there's no dunder all, everything that's defined or imported. So that's why we now have a thing called RandTransform available to us. Yeah, if you ever want to know where something's from: a lot of people are used to not using a wildcard import, so not using star, but instead listing specifically all the symbols they import.

And so if somebody does that, you can scroll back to the top of the file and search through it and try to find it. But you don't have to. It's much easier, and indeed with star necessary, to simply type the name and press Shift-Enter, and you can find out where it's from.

Or to get more info about it, do the same thing with a question mark. And as you can see, it will give you the signature, the docstring and the full path name of where it's from. And so this is one way, if you want to go and look at the source code for this: I could copy that and type :sp and paste it.

And now I've got a third split and here's the, here's that file. Okay. But if I just want the source code for that one thing, I can just put two question marks. And as you can see, this gives me the source code for that thing. Okay, so that's what the first line does.
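The `?` and `??` suffixes are a notebook (IPython) convenience. Outside a notebook, roughly the same information is available from the standard `inspect` module. The sketch below uses `textwrap.dedent` purely as a stand-in target, since fastai may not be installed; it assumes the target is written in Python so that its source is available.

```python
import inspect
import textwrap

target = textwrap.dedent  # stand-in for something like RandTransform

# Roughly what a single question mark shows:
signature = inspect.signature(target)         # the call signature
docstring = inspect.getdoc(target)            # the cleaned-up docstring
source_file = inspect.getsourcefile(target)   # full path of the defining file

# Roughly what two question marks add:
source_code = inspect.getsource(target)       # the source of just that object

print(f"{target.__name__}{signature}")
print(source_file)
print(source_code.splitlines()[0])
```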

First day I vision or import star. What is the next line do. So if I type doc on target or enough in a fast AI library, it'll tell me what this one didn't quite work correctly. It'll tell me all the information about it. For some reason, this one's not getting the usual documentation I would expect to see.

That's okay. You can always click on source to get a link to where it's defined. So here it is, untar_data. Or you can click on show in docs. This one works correctly. That's interesting. So for some reason, the help, the doc, is not working right. I'll fix that.

Yeah, so you can see here's the details here and you can see that there's, you know, it says here, for example, see URLs, you can click on these things to get more information about them. So there's lots of similar links, as you can see within the documentation. So untar_data downloads and extracts url by default into subdirectories of ~/.fastai.

So we now know that means your home directory. And it returns a path. Okay, so let's see if we can understand what this is doing. So what I generally like to do to understand a cell is to run every single line separately.

So the most important thing to know, to run every single line separately, is that you can press Ctrl-Shift-hyphen (it's Control even on a Mac, not Command) to split into two cells at the cursor. So step one is to separate this out into separate cells. Okay. And so then, you know, run each one and see what happens.

So after I run this one, I should be able to look at path. Okay, so it tells me here that it's been stored in /storage/data, etc. Now that's good news because we know that means it's persistent. So if I create a new instance, whatever, I'm going to have this same thing now.

Downside is if you have a free or cheap account, you don't have much space and you might not want all that space being taken up. So let's find out how much space is being taken up. So let's copy that. Head up over to our terminal. Get over there. And so remember, we can type du -sh.
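For checking the same thing from a notebook cell rather than the terminal, a rough Python stand-in for du -sh could look like the sketch below. It sums apparent file sizes, so the figure can differ a little from what du reports (du counts disk blocks). The helper names and the temporary directory contents are made up for the demonstration.

```python
import tempfile
from pathlib import Path

def dir_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    return sum(
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and not p.is_symlink()
    )

def human_readable(n_bytes):
    """Format a byte count with a du-style unit suffix."""
    size = float(n_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024

# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.bin").write_bytes(b"x" * 2048)
    (Path(tmp) / "sub" / "b.bin").write_bytes(b"y" * 1024)
    total = dir_size_bytes(tmp)
    print(total, human_readable(total))
```

Pointing dir_size_bytes at the path returned by the download call would give a comparable number to the one shown in the terminal.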

789 megs. That's pretty big. So you might not want that to be there. Which is fine. So you can just move it somewhere else, you know, put it in the home directory or something like that. So one interesting question here is, according to the documentation, it was going to extract things by default to subdirectories of ~/.fastai, but that's not what happened.

Why did that not happen? Well, it says it's a wrapper around FastDownload.get, so we should probably look up that documentation to find out what's going on. So here's fastdownload. And here's FastDownload.get. OK, so this is pretty much what we saw, rather than, you know, this is using d.get. It returns a path by default.

It goes into base/archive, where base by default is, in this case, .fastdownload; for fastai, it might change it to .fastai. You can change them by passing them to FastDownload. OK. Oh, look at this. If there's a config file in the base directory, then they will be used for fastdownload.

Now our base directory for fastai is .fastai. So let's go and have a look in the .fastai directory. cd ~/.fastai. There we go. There is indeed a config.ini. So you can see that Paperspace has actually set things up for us so that by default, all of the archives, models we create, data we download, is all going to be put in the persistent storage.
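To get a feel for what such a file holds, an .ini file can be read with Python's standard configparser. The contents below are a hypothetical stand-in: the key names and paths are illustrative guesses, not a copy of the config.ini that Paperspace actually writes.

```python
import configparser

# Hypothetical file contents: keys and values are illustrative guesses only.
SAMPLE = """
[DEFAULT]
data = /storage/data
archive = /storage/archive
model = /storage/models
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# Collect the configured locations into a plain dict for easy inspection.
locations = dict(config["DEFAULT"])
for key, value in locations.items():
    print(f"{key:8} -> {value}")
```

To inspect the real file, config.read(path_to_config) on the actual config.ini would replace the read_string call.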

So that's a good thing, unless it's not what you want. Right. So I would say this is another of these things that we probably want to be able to adjust if we want to. So how about we move that into /storage, and then, as per usual, symlink it back.

So when we start the machine, I don't know if there's a folder there or not. So what we can do is we can say mkdir -p, which basically creates a folder and all of its subfolders, sorry, all its parent folders, and doesn't complain if it's already there.

So I use mkdir -p to create a .fastai. And we will then remove .fastai... Oh, now I'll show you a little trick. I want to fill out the word fastai without typing the whole thing. If I hit Ctrl-P, it will fill in the rest of the word. P for previous: it will fill in the rest of the last word that it can find that starts with those letters.

So I want to remove .fastai/config.ini. And then we will link that back again. So it's going to be in /storage. So I'll hit Ctrl-P to fill out /storage, and it's going to be called config.ini. So I'll press Ctrl-P, and actually, if you then hit Ctrl-X, Ctrl-P, it keeps filling in the rest.

All right, so now we're going to have a config.ini file. Oh, except I don't want to put it in the home directory. I want to put it in the .fastai directory. And so we can test that by copying it. Paste it. There we go. Okay, so we've now got a consistent config.ini.
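The move-then-symlink pattern used here can be sketched in Python as follows. This is a hedged stand-in for the shell lines typed in the session, not a copy of them: persist_and_link is a hypothetical helper, and temporary directories play the roles of the home directory and /storage so the sketch runs safely anywhere symlinks are permitted.

```python
import shutil
import tempfile
from pathlib import Path

def persist_and_link(live_file, storage_dir):
    """Move live_file into storage_dir, then symlink it back to its old location."""
    live_file, storage_dir = Path(live_file), Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)   # same idea as mkdir -p
    stored = storage_dir / live_file.name
    if not stored.exists():
        shutil.move(str(live_file), str(stored))     # first run: keep the existing file
    elif live_file.exists() or live_file.is_symlink():
        live_file.unlink()                           # later runs: drop the fresh copy
    live_file.parent.mkdir(parents=True, exist_ok=True)
    live_file.symlink_to(stored)                     # link back to the persistent copy
    return stored

# Demo with stand-in directories instead of the real ~/.fastai and /storage.
with tempfile.TemporaryDirectory() as tmp:
    home_cfg = Path(tmp) / "home" / ".fastai" / "config.ini"
    home_cfg.parent.mkdir(parents=True)
    home_cfg.write_text("[DEFAULT]\n")
    stored = persist_and_link(home_cfg, Path(tmp) / "storage")
    is_link = home_cfg.is_symlink()
    same_content = home_cfg.read_text() == stored.read_text()
    print(is_link, same_content)
```

The design choice is that the persistent copy always wins: on a fresh machine the default file is discarded and replaced by a link to the stored one.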

I think in my case, what I want to do, and this might not be bad for most people, is I probably don't want the archives to be stored in my storage directory, because I'm not going to need them again once it's been unarchived. And so let's cd to /storage/archive.

Yeah, I don't want this there. So let's just remove it. Cool. Okay, so. This is now going to put by default stuff that I download will be in my storage or better use it anywhere. But, you know, I can always move it somewhere else if I want to. Okay.

The next two lines kind of go together, which is, it's going to use ImageDataLoaders. So again, you know, before you use something, it's good to understand what it is. So. Show in docs. All right, ImageDataLoaders is a wrapper around several DataLoaders with methods for computer vision problems.

And what you're going to be using is one of the factory methods. So they're the things underneath. And all these factory methods accept various things. So this tells me that there are various different ways of creating ImageDataLoaders, and they've got a consistent API. So that's good. In our case.

We've got... If I hit Shift-Tab, I can find out what the parameters are. So I've got the path. I've got a list of file names. I've got a labeling... Let's see. Oh, then I've got valid_pct and seed. The valid_pct and seed. So what do those mean?

So we're using... valid_pct is passed to ImageDataLoaders from path. Okay, so quite often, you know, we'll take an argument and just pass it to something else. So I have to click on here to find out what that does. Here we are. Validation set. Validation set is a random subset of size valid_pct.

Okay, no worries. Optionally created with seed reproducibility. Cool. And it's got a labeling function. So. A function that receives a string, which is a file name and outputs a label. Okay, so let's just have a look at some of these things. So the list of file names. So remember the second argument with a list of file names is this.

So let's pull this out and create something called files. So if I press A for append above, I'll type files equals, paste it here. And let's look at that. Okay, so files is a list of 7,390 items and it contains various paths in it. I'm going to remind myself that if you go Path.BASE_PATH, I'll show you what I'm doing in a tick.

Let's get rid of all this prefix. Copy and paste. There we go. Yeah, so if you create a special variable called Path.BASE_PATH, then it will delete that from the start of everything it prints out. So it just shows you the path relative to here. It's a bit easier to read.
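The effect of hiding a common prefix can be imitated with standard pathlib. The sketch below is not how the Path.BASE_PATH trick is implemented; it only shows the same idea of printing paths relative to a base. The base directory and file names are made up.

```python
from pathlib import PurePosixPath

# Made-up base directory and file names, purely for illustration.
base = PurePosixPath("/storage/data/pets")
files = [base / "images" / "breed_a_1.jpg", base / "images" / "breed_b_2.jpg"]

# Strip the shared prefix so only the part below the base is shown.
short = [f.relative_to(base).as_posix() for f in files]
print(short)
```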

So there's 7,390 things, with images slash blah, images slash blah. And it looks like it's going to be the name of the breed and then an underscore and then some consecutive number. Now, this looks a little bit different to what you might be used to seeing. Normally, if you look at something like a list, you expect it to look something more like this, right?

This is what lists look like in Python. So to find out why that looks different, we can check out what the type of it is by typing type. And it turns out it's not a list. It's something of type capital L. That's a special kind of list, which has lots of convenience functions in it.

We could just use question mark to find out a bit more about it. So here's the definition of where it's coming from. Here's the docstring from it, or we could type help on L, rather than doc. So help is useful as well. That's built into Python. It shows you a list of all the stuff it can do.

Actually, maybe that's useful. So as you can see, it's got lots of functionality in addition to everything that a list does. Or we could type doc, show in docs. So you can see this stuff's not just for the fastai library. All libraries created by fast.ai look pretty similar.

And so here's lots of information about L. Let's learn how to read this documentation, and then I think we'll stop. So L is like a list. Okay, so a list in Python is something that you create like this. Square brackets. And you can print them, for example, or you can index into them, or you can select multiple things from them.

Everything up to, but not including element two. That's a list. Okay, and L is very similar. But it doesn't have any special syntax like Python does. You have to create it like this, but it basically looks much the same. As you can see. But it has quite a lot more functionality than a list.
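For reference, the plain-list operations just described look like this. The values are made up, since the transcript does not record what was typed.

```python
a = [1, 2, 3, 4]       # created with square brackets (made-up values)

first = a[0]           # index into it
up_to_two = a[:2]      # everything up to, but not including, element two

print(a, first, up_to_two)
```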

So I don't use a normal Python list that often, because, like, why use something which is less useful. So let's see, okay. It behaves like a list, so we know what a list looks like. So a drop-in replacement for a list. So when something's described as a drop-in replacement, anywhere that uses a list, you should be able to use this as well.

So it's got a superset of the functionality. It's like NumPy. It supports advanced indexing. Okay, so what that means is that you should be able to select multiple things from a list at once. I'm just trying to remember exactly how we do that. I think you can go like this.

Yes, okay, so I can select the zeroth thing and the second thing in one go. So that's an example. And as I mentioned, this is similar to what NumPy can do. So NumPy has a thing called an array. And so in NumPy, you can do the same kind of thing.

But a regular list, remember, A is our regular list. Can't do that. So you can kind of think of a capital L object as being a bit like a hybrid between a NumPy array and a Python list. You could try to use a NumPy array for things that you would otherwise use an L for.
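To make the difference concrete without installing anything, here is a toy list subclass that accepts a list of positions as an index. MiniL is a hypothetical class written for this sketch; it is not how the real L is implemented and covers only this one feature.

```python
class MiniL(list):
    """Toy list subclass: indexing with a list or tuple of positions returns several items."""

    def __getitem__(self, idx):
        if isinstance(idx, (list, tuple)):
            # Pick out each requested position in turn.
            return MiniL(list.__getitem__(self, i) for i in idx)
        result = list.__getitem__(self, idx)
        # Keep slices as MiniL so further fancy indexing still works.
        return MiniL(result) if isinstance(idx, slice) else result

plain = [10, 20, 30, 40]
fancy = MiniL(plain)

picked = fancy[[0, 2]]           # zeroth and second items in one go

try:
    plain[[0, 2]]                # a regular list rejects a list as an index
    plain_supports_it = True
except TypeError:
    plain_supports_it = False

print(picked, plain_supports_it)
```

A NumPy array accepts the same kind of list-of-positions index, which is the similarity being pointed out here.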

But the problem is that a NumPy array kind of expects everything to be of the same type. So you have to be a bit careful. Sometimes it might do it for us. No, sometimes it can actually handle it for us. It's going to put them into something called an object.

So, you know, I mean, actually, now I think about it, you probably could use an array quite often, but they do behave differently. Actually, I'll show you an example. So if we go B plus Hello, then that's what that does in an L. It adds Hello to the list.

So notice that L's show you how many items are in it. And it also, by default, doesn't show you all of them, but it puts dot, dot, dot, which are both very convenient things. A Python list. I used control by mistake. Works the same way. A NumPy array. Doesn't work the same way.

And that's because NumPy is designed mainly for math. So if we make all of these into numbers, then a NumPy array, we could do this. It adds element wise. OK, so, you know, if you're kind of trying to go deep and fully understand each line of code, this is the kind of experimentation that you can do.
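The two meanings of plus can be shown in plain Python. In this sketch the zip line only mimics what adding two NumPy arrays computes, so that the example runs without NumPy installed; the values are made up.

```python
nums = [1, 2, 3]

# For lists, + means concatenation: the new item is added to the end.
concatenated = nums + ["Hello"]

# For numeric arrays, + means element-wise addition. This comprehension
# imitates that behaviour for two equal-length lists of numbers.
elementwise = [x + y for x, y in zip(nums, nums)]

print(concatenated)
print(elementwise)
```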

And hopefully what you can see here is that all of the information you need to fully understand all of these things is available to you in the documentation, which you can link to directly using the doc command, and by experimenting. But initially, it's a lot to learn.

But the nice thing is that the things you're learning are very reusable, right, because all this stuff is used all over the place. All right, so I'm going to wrap it up. Does anybody have any questions or comments? This has been a bit less interactive than usual. So I apologize if that's because I'm talking too much, but I definitely want to hear your thoughts or questions.

Nothing? That's fine. Thank you. Is that because it's too easy or too hard or totally obvious, or "I want to go away and think about it"? For me, it was very good, Jeremy. OK, great. Right. No, I think it was perfect for me. Oh, awesome.

Cool. Yes. OK. I haven't seen somebody called Jess in a stream before. Are you a new person here? I've just not noticed you. A new old person who just saw it pop up in the forum. Sure. How did I how did it not pop up for me sooner? But thank you for getting caught up.

Where are you joining? Yeah, no problem. I am in the Seattle area of USA. Fantastic. Well, that's that's great. I think. Yeah. OK, well, that's a good time to finish. So hope to see you all tomorrow. See ya. Awesome. See everybody. Bye.