pythonrunscript - Answer.AI dev chat #3

Hey, everyone. This is Alexis from Answer.AI. This video is one of our dev chats. These are internal/external chats that we do about anything we're working on and want to share with other people. This one is about a tool I made called pythonrunscript. It lets you write single-file Python scripts that declare their dependencies internally, in comments.

So you can just run them without having to manually set up or manage a Conda environment, a virtual environment, or whatever. I wanted to put this short note at the front of the video because some of the things in the video are a bit out of date.

So, one thing is that it's called pythonrunscript, not whatever I refer to it as in the video. Another thing is that it's done. It's in a GitHub repo. You can go check it out there and grab the binary, or you can install it by running pip install pythonrunscript.

And some of the syntax we talked over ended up a little different. It uses markdown fences instead of YAML fences to let you declare your dependencies. You can put a requirements.txt, an environment.yaml, or conda install specs in there. It also has a dry-run mode, a verbose mode, and the other amenities you'd expect from a command-line tool.
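For anyone who hasn't seen the final syntax: a dependency block is a markdown fence, labelled with a file name such as requirements.txt, written inside ordinary # comments near the top of the script. The sketch below shows one way such blocks could be pulled out of a script's source. It is an illustration, not pythonrunscript's actual parser; the function name `extract_fenced_blocks` and the matching rules (an opening fence must carry a label, and any non-comment line abandons an unfinished block) are assumptions.

```python
import re
from typing import Optional

# Assumed rules: "# ```LABEL" opens a block, a bare "# ```" closes it.
OPEN_FENCE = re.compile(r"^#\s*```(\S+)\s*$")
CLOSE_FENCE = re.compile(r"^#\s*```\s*$")


def extract_fenced_blocks(source: str) -> dict[str, str]:
    """Return {fence label: block text} for fenced blocks written in # comments."""
    blocks: dict[str, str] = {}
    label: Optional[str] = None
    lines: list[str] = []
    for raw in source.splitlines():
        line = raw.strip()
        if label is None:
            match = OPEN_FENCE.match(line)
            if match:
                label, lines = match.group(1), []
        elif CLOSE_FENCE.match(line):
            blocks[label] = "\n".join(lines) + "\n"
            label = None
        elif line.startswith("#"):
            # Drop the "#" and at most one following space, keeping YAML indentation.
            body = line[1:]
            lines.append(body[1:] if body.startswith(" ") else body)
        else:
            # A non-comment line before the closing fence abandons the block.
            label = None
    return blocks


if __name__ == "__main__":
    example = "\n".join([
        "#!/usr/bin/env pythonrunscript",
        "#",
        "# ```requirements.txt",
        "# numpy>=1.26",
        "# ```",
        "import numpy",
    ])
    print(extract_fenced_blocks(example))  # {'requirements.txt': 'numpy>=1.26\n'}
```

Returning a dict keyed by fence label leaves the decision about which labels mean pip and which mean conda to a separate step.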

It works on Linux and macOS. In addition to talking about this tool, we also talk more generally about scripting languages and what makes them good or bad. So I hope you find this interesting, and thanks. >> Okay. Now it's working. >> There we go. Now we're recording. Hello, gentlemen.

Okay. So, I thought I would talk about something that I've been hacking on the last couple days. >> Who are you? >> Oh, right. Sorry. I'm Alexis. >> Hi, Alexis. >> Hello, everybody. I'm at Answer.AI. This is one of our Dev Chats, which may or may not be published.

So, might as well introduce ourselves. Jeremy, who are you? >> They know me because I've done two of them already. >> Okay. Well, Jono, what about you? >> Yep. I'm also at Answer.AI. I think I'm in some past videos on the channel. If not: also R&D, same job title as everyone.

So, that should be enough for now. >> All right. So, I just wanted to talk to you guys about something that I started on Friday afternoon when the brainstorm hit me. And now it's mostly working, although it's not prettied up yet. But I'm actually quite happy with it. So, I wanted to get feedback from you both on, I don't know, ways to improve it, if it would be useful to you, or just if it's interesting.

>> So, what was the brainstorm? What was the problem your brain was trying to solve? >> So, I have a very opinionated brain. And the problem I was trying to solve, which I've tried to solve before, is to try to find an adequate scripting language. Also, just an easy way to get started on development.

Now, I started writing a mini rant about the whole, you know, what it means for a scripting language to be adequate, which I haven't published yet. But I'm aware that the moment you say something like best solution for scripting, you're going to get a bunch of responses. Like, hey, you should just use Bash because Bash is everywhere.

Or someone will say, just use Python because, you know, Bash is terrible and you have to use Python anyway because it's the best thing for the data science stack. That's a better language than Bash. All those things are true. >> I'm in the just use Perl category for what it's worth.

>> You're right. Or just use Perl. >> I think I should probably learn Ruby. >> I think Ruby came and went. Well, I'm sure that's not true. Don't send me an email. >> Careful! You don't know who's going to be watching this. >> It's great for web things. And I have published a Ruby gem, so you can't call me a Ruby hater.

Or if you keep going down this line, you'll get reactions like, what's wrong with you? You're crazy. You're overthinking it. All things are relative. There is no best language. Just be practical. Turn off your brain. That's not -- I understand that objection. I just totally reject it. It's not the way I operate.

So here are the criteria that I started putting together in my mind for what would make an adequate scripting language, or really, properties you want in any language. First of all, a script means a single file. There are a lot of tasks where you want to be able to write a single-file program quickly in order to get things done.

This is what we like about Bash. Also, an adequate scripting language should be one where it's good for the sort of things you do with scripting. So what do we mean when we talk about scripting? Often we mean file manipulation, some text manipulation, and some process manipulation, like launch this process, launch that process, move these files around.

This is what Bash is really good for. >> I don't know if you mentioned it, but a bit of network stuff, probably, like grabbing something -- >> Curling stuff or pulling stuff down and then doing things with it. So that's sort of the domain of scripting. And some languages that are really good in a general way, like Swift, which I'm fond of, often fall down on scripting because their APIs are very verbose for these routine operations.
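To make "file, text and process manipulation" concrete, here is a rough standard-library Python counterpart of a few lines of Bash: an mv, a grep, and capturing a command's output with $(...). The file names and the child command are placeholders chosen for illustration.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def demo_scripting_tasks() -> tuple[list[str], str]:
    """Move a file, filter its text, and capture a child process's output."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # File manipulation (like `mv log.txt archive/`).
        src = root / "log.txt"
        src.write_text("ok: start\nerror: disk full\nok: done\n")
        (root / "archive").mkdir()
        moved = Path(shutil.move(str(src), str(root / "archive" / "log.txt")))

        # Text manipulation (like `grep '^error' archive/log.txt`).
        errors = [ln for ln in moved.read_text().splitlines() if ln.startswith("error")]

        # Process manipulation (like `out=$(python -c ...)`).
        result = subprocess.run(
            [sys.executable, "-c", "print('hello from a child process')"],
            capture_output=True,
            text=True,
            check=True,
        )
    return errors, result.stdout.strip()


if __name__ == "__main__":
    print(demo_scripting_tasks())
```

It is longer than the Bash version, which is the verbosity trade-off being described, but it behaves the same on Linux, macOS and Windows.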

>> And ideally, deployment would be attach it to an email and send it to somebody. >> Yes, yes. So this is another part of an adequate scripting language. I can even -- why not? I can share the sort of draft that I started writing on this, which you can see I'm now sort of speaking from vaguely when I talk about what we mean by an adequate scripting language.

So easily deployable is one of these things. I've had the experience a number of times where I'm roaming around on the Internet looking for a piece of software that would be useful for something, and then I find some GitHub repo, and it looks beautiful, and it does just what I need.

It's great. And then it's like, just, you know, launch your npm and do your Node and do your package.json. It's like, that's not my religion. I don't know that stack very well. I don't want to learn it very well. I've got other things -- >> Just to mention, your window might be quite small or something, because it looks pretty pixelated over here.

No, it's like your font size is fine. But, yeah, I think your window itself maybe was quite -- >> Is that better? >> Small, because it expands it to fill our window. >> Okay. It's probably not the right geometry. >> It's probably like a large window with normal sized font.

That way we'll get, you know, high resolution. >> That will be legible. >> Cool. >> How's that? >> Good. >> Good? Okay. So easily deployable. This is another big one. And ideally for this easily deployable property, you want to have one file, and then you give another person the one file, and then they can just run the damn thing.

You don't need to give them an instruction manual on how to understand Python virtual environments, or exactly what a Node project is, or why you should be using rbenv instead of RVM instead of plain gem install or whatever. And obvious things, too. You want it to be cross-platform. You don't want to have to learn the subtleties of exactly how this command differs between Linux and Mac before you can hand it around to your friends and family.

And you want it to be not a crappy language. So decent syntax is part of that. >> And I will say Perl is definitely ticking all the boxes so far. >> Oh, really? Okay. Well, maybe I should -- >> Yep. Perl is on everybody's computer already. Single file, yes.

Easily deployable, yes. And the syntax is, like, just basically the same as Bash. If you want to run a process, you stick it in backticks, and the output of that process ends up in the variable you assign it to. It has regular expression support built directly into the language, not bolted on as a library.

>> That's interesting. >> Yeah. >> So maybe Perl already -- >> And it's got a great standard library, so, yeah, your scripts will be cross-platform. And I would add another one, which is, like, you don't have to learn a new language. Maybe that would be another thing to list here.

And since no one knows Perl anymore, it would fail on that. >> Yeah. No, that's also a reasonable one. Although perhaps it's a temperamental weakness of mine that I wouldn't think of adding that. So another one I would say is that you want to have a good prompt, a good interactive prompt.

And that's one that we -- that a lot of nice languages can sometimes underperform on. And it's also one of the strengths of Bash. Like, the shell is already your interactive prompt, so much so that you don't -- >> Yes. It is. Sorry. >> All right. Is audio good again?

>> Yep. >> So I'll just briefly say, like, why some obvious solutions don't work. >> Oh, we've lost you again. It's like -- I think there's, like, a wire that's loose or something. Because, like, you -- suddenly we just don't hear you at all. >> All right. Let me -- I'm on the wireless, so that seems unlikely.

>> Yeah. I was just wondering if, like, the boom mic has got a loose wire in it or something. I don't know. It's like it's just -- >> How about -- >> -- talking, everything's fine, and then suddenly it's, like, gone. Most of the time, it's fine. We can tell you what's up.

>> I'll show a bunch of stuff in the background, and maybe that'll help. All right. Here I am. So just to kind of recap why some things don't -- don't solve this problem as well as we wished it would. And did I just close Emacs when I was having my frenzy of -- >> And stop sharing your screen.

>> -- stop sharing your screen, so -- >> Okay. All right. I'll bring it up again later. But just to say, like, why some things don't meet this as well as they might. And the last thing is external dependencies. So it's all very well and good if you have this nice single file script.

But if having a single file script means you're confined to using the built-in standard library, well, you've kind of already lost now. Because now you've exploded the other properties. Like, let's say I want to do something in Python, for instance, fairly modest. Like, I want to handle YAML. As far as I know -- I could be out of date with this -- YAML isn't in the standard library.

So now I'm talking about either building my own YAML parser or emitter and putting it in my file. Now I've lost convenience. Or I'm using an external library. And now I need to explain to someone else how to make sure they can install an external library. So that's no good.

So why some languages don't meet these properties right away. The lack of easy integration of the external library, that one you lose right away for a lot of them. So that's a shame. Languages that you might think would be -- even if it's not a very good language, it's sort of everywhere.

Bash. You don't have the cross-platformness, really. Because the Bash standard library is actually all the external commands that are on your Unix system. Exactly. And those annoyingly are different in subtle ways from Mac OS to Linux. If you're writing a non-trivial script, you start becoming a specialist in the fact that the date command is going to be a little bit different.
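A concrete instance of the date problem: "yesterday" is date -d yesterday +%F with GNU coreutils on Linux, but date -v-1d +%F with the BSD date that macOS ships. A small sketch of the same computation with Python's standard library, which gives one answer everywhere (`yesterday_iso` is just an illustrative name):

```python
from datetime import date, timedelta
from typing import Optional


def yesterday_iso(today: Optional[date] = None) -> str:
    """Return yesterday's date as YYYY-MM-DD, the format `date +%F` prints."""
    today = today or date.today()
    return (today - timedelta(days=1)).isoformat()


if __name__ == "__main__":
    print(yesterday_iso(date(2024, 3, 1)))  # 2024-02-29 (2024 is a leap year)
```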

And the grep command is going to take different arguments. It's just nonsense. Python and Node have this problem that the package management system is a thing where you kind of need to create a whole directory around your script. So you've already lost the nice single file property. And you need whoever you're giving it to to know how to manage local environments.

Or to be such a guru that they have one vast omni-environment that they maintain in a state of cutting-edge excellence, consistency, and self-compatibility, which not everyone knows how to do. Some languages that almost do this really well, I'd say, are Go and Rust. Why? Because they can actually produce a single statically linked executable.

So whatever complexity you went through when you were writing the thing, you can often just give someone a damn artifact and they can run it. They don't need to get a PhD in your build system in order to use the thing, which is a beautiful, beautiful property. That's why you have all these great command-line tools now coming out of the Go and Rust communities.

Because they make something and then everyone can use it. And we don't need to care how it was made. Whereas that can't be said for a lot of other tools, unfortunately. So that's the kind of way I think about the problem. And then I'll show you solutions that I've run into.

Two or three that struck me as interesting. And then my version of it in Python. And you guys can let me know how it could maybe be improved. So the original idea-- let me share my screen. All right. Can you see the browser? So-- Yes. OK. So I was originally inspired by this library from years ago by a brilliant guy named Max Howell, who's responsible for the Homebrew package manager, among other things.

And he did a lot of work in the Swift language. And he came up with this thing called swift-sh. And the way it works is-- Oh, I remember that. Yeah. It's a brilliantly tasteful little piece of engineering. Swift is nice. You can write a single-file script in Swift.

But then Swift gets crappy as soon as you have a bunch of dependencies. You need to have a package file. You need to have a directory. You need to name them. It's a whole separate operation. But with his thing, instead of your normal Swift script's shebang line being this, you just change it to swift sh.

And then you write Swift as usual, except you add special comments after the normal Swift import statements. So this comment says that the PromiseKit library can be fetched from the GitHub account mxcl, which is Max Howell's. And this script needs version 6.5 or greater. And you can get more involved with it if you want.

You can have fairly elaborate statements of version dependencies. So in this way, you can write a single file script that actually can have half a dozen external dependencies with pinned versions if you want. And when you run the script, it will just work. It'll do all the compiling somewhere else you don't see it.

And you just have the magic of, I give someone a single file, and as long as they have this swift-sh malarkey installed, they can just run that file. The first time they run it, it takes a while because it's fetching all this stuff and compiling it.

But then the second time, it just runs, because you have an actually efficient compiled executable. So that's quite nice. I quite like that. Then the Rust folks never miss a beat these days. There's a similar thing in Rust called rust-script. And it's exactly the same idea. You need to install rust-script.

That's actually not so great to install. You've got to go and install it with an installer. It's not just a binary you put in a magic place. But once you do, it's the same gimmick, or the same act of genius. You just replace your shebang with rust-script.

And then you have this comment block that comes right after the shebang and states your Rust dependencies. OK, I need the time crate, version 0.1.25. And once you've done that, you've magically created a single file using Rust that can exploit the whole Rust dependency ecosystem and is dead easy to run.

You just run it and it goes. That's great for development if you just are getting something started with something small. And it's great for distribution. You just give someone the file and they don't need to learn about Rust. They just do it. I wrote a C# version of this about 20 years ago, which was great.

That was at my insurance pricing company. We'd just give it to clients and they would use it to write insurance pricing scripts. I think there might be a somewhat more official way to do it in C# now. One would think there might be. So basically, I thought this should exist in Python because, I don't know if everyone here agrees, but I'd say Python has-- let me change the screen I'm sharing.

How do I do that? Where's my stop sharing button? OK. Because in my opinion, one of the-- and OK, we're back in Emacs. And let me make the screen bigger. So it's a reasonable size again, Jeremy, in terms of legibility? Yeah. Because the Python packaging and distribution story, in my opinion, is terrible.

Maybe it's great if you really know your way around it. But it takes a while to realize all the different things that are out there and which ones are redundant and which ones are official and which ones are good and which ones are bad. And every time I step away from the language for years and come back, I feel like I need to recheck to see if things are-- what the situation is.

And my understanding of the current situation is that there are two things that work, and they are all you need. If you don't need anything complicated, if you don't need anything that has complicated native dependencies, then you can do everything with venv. And you probably should, because venv and pip come built in.
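Because venv and pip ship with Python, the contained-environment workflow needs nothing extra installed. A minimal sketch of creating one from inside a script and installing into it; `create_env` and `venv_python` are hypothetical helper names, and the `with_pip` switch only exists so that pip bootstrapping can be skipped.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a venv (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path, requirements: list[str], with_pip: bool = True) -> Path:
    """Create an isolated venv and pip-install the given requirement specs into it."""
    venv.create(env_dir, with_pip=with_pip)
    python = venv_python(env_dir)
    if requirements:
        # Run pip as the venv's own interpreter so packages land inside the venv.
        subprocess.run([str(python), "-m", "pip", "install", *requirements], check=True)
    return python
```

Usage would be something like python = create_env(Path("env"), ["rich"]) followed by running the script with that interpreter; nothing outside the env directory is touched.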

And that will get you everywhere you need to go in terms of creating a contained environment, installing everything into it, and working from it. But that's not always enough. If you're doing interesting machine learning stuff, for example, or things that use parts of the data science stack, the fact is that not all of that can be handled with venv.

And people who think so usually think so because they just haven't had to use those dependencies. So they haven't run into those cases yet. But when you hit those cases, then the answer is Conda. You don't need to like it, but that seems to be the truth. So my current-- It's not bad.

It's not bad advice. For what it's worth, I would say, actually, you should use Miniconda regardless. On Windows, if you run Python on a new Windows box, it's a wrapper which basically points you to something that lets you install it. Like it's not already there at all. On Mac, if you run Python, you're running the system Python, which Apple strongly asks everybody not to use for anything other than letting them run system stuff.

And on Linux, it's pretty likely you've got a really old Python, you know, if you're using Ubuntu. So I would tell people just use Miniconda. That way, you've got your own Python, and you can still use Venv if you want to. But, you know, now it's just like one solution for everybody.

So just use Miniconda and just use environments. I've only ever used Miniconda. I haven't used the whole sausage before. I think your knowledge of this is much better than mine. So everyone, you should listen to Jeremy's advice. But if you have an allergy to Conda, and you know you don't need the data science stack, then it's my understanding that you can get by with just pip and venv.

You probably can get by. The other thing I'd say is if you think you have an allergy to Conda, you probably have an allergy to Anaconda as it existed two or more years ago, at which point it was slow and awful. And it's actually now fast and nice. So your allergy may have recovered.

Yeah. And also, if I understand right, Miniconda now uses Mamba as its dependency resolver. So it's a lot faster than it used to be. Kind of. It uses Mamba's algorithm, but you don't ever have to think or know about that. So, yeah, it's using libmamba behind the scenes, which is a proper, fast, optimized C++ solver.

So it is fast. OK. Yeah. So maybe some people haven't touched it for a while, have a memory of it being much slower than it actually is these days. Yeah. It used to be very, very annoying. But nowadays, it just works. And it's quick. So speaking of just works, why don't I show you what I have sort of working right now?

And I can get your opinion on what would maybe be a tasteful way to improve it. So based on my picture of what one actually needs these days, which is sometimes Conda and sometimes just pip and venv, I ultimately-- let's see what the latest version of this thing is. Just one other thing to mention: if you do use Conda, you can still use venv.

So if you want to-- if for some reason Conda environments are not for you, that's totally fine. You can just use venv in all situations except the one that Alexis mentioned, which is when you want more complicated binary things installed, like a compiler framework or compiler toolkit or CUDA or whatever.

Yeah. So it's funny you mention that, because I've tried to cover both cases in the way this thing works now, based on my understanding of what's sometimes necessary and what's always necessary. So the point you're making, Jeremy, is that if you create a Conda environment, once you're within it, you can run conda install, and you'll be installing Conda packages managed through Conda's distribution repos.

But once you're in the Conda environment, you can also just run pip install and install-- And you don't even need to be in a Conda environment. Once you've installed Conda, you're always in an environment called the base environment, so you never actually have to think about it. You can forget about the fact you're using Conda or Miniconda entirely.

You can use nothing but pip. You can use nothing but venv. And your life will be just fine. You'll just be using a recent Python version that's installed into your home directory, and therefore is up to date, doesn't upset your Mac, doesn't require sudo, and so forth.

You should definitely be using one of those. You don't want to use the system Python, because it's actually there for the system to use. It's not there with the intention that you would build on top of it and install new dependencies into it. Because at any given point, Apple might decide to upgrade it, change things around, and you don't want to be destabilizing your system.

All right. So here's how this will work. I've got a couple test scripts here that I was using to test it out. So I'll show some of the cases. So very simple example would be one where you have a script and it has no dependencies at all. Here's a very boring script.

And here you can see the shebang is something I'm currently calling venv script. But I'll change that. I can change that now to -- >> I'm just going to tell people what /usr/bin/env is, because a lot of people are confused about that. So you don't have to use that.

It's just a program that, I guess, goes and finds the named command on your PATH and runs it, so you don't have to hard-code where it's installed. You could avoid that and just write the full path to your script runner directly, too, if you wish. >> Yeah. All right.
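Roughly, when the kernel sees #! on the first line of an executable file, it runs the named interpreter with the script's path appended as an argument; /usr/bin/env NAME adds one step, looking NAME up on PATH. A simplified Python model of that, where `resolve_shebang` is a hypothetical helper and `shutil.which` stands in for env's PATH search. It ignores details such as env -S and how different kernels split interpreter arguments.

```python
import shutil
from typing import Optional


def resolve_shebang(first_line: str) -> Optional[list[str]]:
    """Return the interpreter argv implied by a shebang line, or None if there is none."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    if parts[0].endswith("/env") and len(parts) > 1:
        # env searches PATH; fall back to the bare name if nothing is found.
        return [shutil.which(parts[1]) or parts[1], *parts[2:]]
    # An absolute interpreter path is used as-is.
    return parts


if __name__ == "__main__":
    argv = resolve_shebang("#!/usr/bin/env python3")
    # The kernel would then effectively run: argv + ["/path/to/script"]
    print(argv)
```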

So the script I wrote, the latest version of it, is now -- this is the part that needs cleaning up. This one right here. And what it will do is when it reads a script, it will parse a script looking for a dependency block. A dependency block is a formatted set of comments which may specify conda commands to run in a new conda environment.

And may specify the exact text of a requirements.txt file, which will be pip installed either in the conda environment or else in a venv. So, for example, I've got my test scripts here. I feel like I've let everyone into my house before I cleaned up. It's very embarrassing. If you look at this script, for instance, this is the dependency block.

The conda part of the block always starts with a header, conda commands.source, because if you were actually in a conda environment, you would source those commands in order to do your installations. And then the things you might only want to pip install go in here, introduced with the header requirements.txt.

And the text in the lines under requirements.txt will literally be the text of the requirements.txt file that's used, and then python3 -m pip install -r requirements.txt will be run to install it. So, with this script, for instance, if we run it -- and let me put my thing on the PATH here.
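In other words, the text under the requirements.txt header is written out verbatim and handed to pip. A sketch of that step, assuming `python` is the interpreter of whichever environment (venv or conda) was chosen and `workdir` is that environment's cache directory; the function name and the `dry_run` flag are illustrative, not the tool's real interface.

```python
import subprocess
from pathlib import Path


def pip_install_requirements(python: str, requirements_text: str, workdir: Path,
                             dry_run: bool = False) -> list[str]:
    """Write the block verbatim to requirements.txt, then run `python -m pip install -r` on it."""
    workdir.mkdir(parents=True, exist_ok=True)
    req_file = workdir / "requirements.txt"
    req_file.write_text(requirements_text)
    cmd = [python, "-m", "pip", "install", "-r", str(req_file)]
    if not dry_run:
        # check=True turns a failed install into an exception instead of a silent error.
        subprocess.run(cmd, check=True)
    return cmd
```

Calling pip as python -m pip, rather than a bare pip on the PATH, is what guarantees the packages go into the chosen environment.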

All right. So, the one I was looking at was test script conda requirements. Make sure my latest version is in the path. Yes, it is. Move this chunk out of the way. So, I'm invoking it now by calling the script. But because it's in the shebang, you wouldn't need to do that in order to run it.

>> Let's do it the cool way, then. >> All right. Let's do it the cool way. So, let me make sure I've updated this to be -- so, just to recap what Jeremy was saying, for people who haven't used this before: this line at the beginning is what tells the operating system which interpreter to use to run the file.

So, this file's been marked executable. Let me make sure I have it here. Conda and requirements. Yeah, there's the x there for the executable bit. So, if I just run the script -- sorry, if I run test script conda and requirements -- then the operating system will read this line.

It'll say, okay, I've got to go find this thing on the PATH. Because I put it on the PATH, it will be found. It will then run my script runner and pass in this file, test script conda requirements, as the first argument. And when I do that -- well, there it ran very quickly, because I'd already cached the build it did last time.

But I have it doing a lot of debug logging where it says - it parsed it, it found that the dependency block implies the script will need conda. It then went into a cache directory that's kept out of the way where it found the conda environment defined by that directory.
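The decision the log reports can be summarised as a tiny function. This is a hedged reconstruction from what is said in this chat (conda only when a conda section is present, venv and pip when there are only pip requirements, and a plain run when there is no block at all), not the tool's source; the label strings in `CONDA_LABELS` are assumptions.

```python
from enum import Enum

# Assumed block labels; the tool's real header names may differ.
CONDA_LABELS = {"environment.yml", "environment.yaml", "conda_install_specs.txt"}
PIP_LABEL = "requirements.txt"


class Backend(Enum):
    PLAIN = "run with the current python"  # no dependency block at all
    VENV = "venv + pip"                    # pip requirements only, so conda isn't needed
    CONDA = "conda env, then pip inside"   # any conda section forces conda


def choose_backend(blocks: dict[str, str]) -> Backend:
    """Decide which kind of environment a script's dependency blocks imply."""
    if any(label in CONDA_LABELS for label in blocks):
        return Backend.CONDA
    if PIP_LABEL in blocks:
        return Backend.VENV
    return Backend.PLAIN
```

Preferring venv whenever conda isn't strictly required matches the stated goal that people without conda installed can still run pip-only scripts.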

>> Okay. So, even though that's called venv script cache, I shouldn't be confused. That's actually -- >> Yeah, no, that's just naming I need to update for where to put it. But we can go look at those cache things, erase them, and then see it figure out what to do the first time.

That's much noisier. So, I'm calling it venv script cache, but I'll erase the cached environments now. And those alphanumeric blobs are MD5 hashes of the dependency block. >> So, there's no fanciness? Like, I know some systems will say, oh, you know, if you only changed one of the pip requirements, there's no reason to create a whole new conda environment and download PyTorch again and all of that.

It'll just update the one package. >> Well, that won't happen anyway, because conda's very smart. First of all, conda stores all downloaded packages in a packages directory, so it won't have to re-download anything. And the packages directory is stored, I think, in the base environment's location. So it shouldn't be re-downloading any of those things.

And then, secondly, although it looks like it's using up a lot of space to have a whole second copy of Python and stuff, it's actually using hard links if you've got that installed elsewhere, in another environment or in your base environment. So, it's pretty smart. >> Cool. >> Yeah.
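Back to the cache naming: each environment lives in a directory named by the MD5 hash of the dependency block, so an unchanged block maps to the same, already-built environment. A sketch of that lookup, with `cache_dir_for`, `needs_build` and `cache_root` as hypothetical names.

```python
import hashlib
from pathlib import Path


def cache_dir_for(dependency_block: str, cache_root: Path) -> Path:
    """Return the cache directory for a dependency block, keyed by its MD5 hex digest."""
    digest = hashlib.md5(dependency_block.encode("utf-8")).hexdigest()
    return cache_root / digest


def needs_build(dependency_block: str, cache_root: Path) -> bool:
    """True the first time a given block is seen; later runs reuse the directory."""
    return not cache_dir_for(dependency_block, cache_root).is_dir()
```

Because the key covers the whole block, editing a single pin (Jono's question) yields a new directory and a fresh environment rather than an in-place update, and even a whitespace change does the same. As Jeremy notes, conda's shared package cache and hard links make that rebuild cheaper than it looks.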

So, this is - so, now it just actually ran and did the thing. And I haven't silenced the output by capturing standard out or dumping it. So, that's why it's so noisy. But as you can see here, it ran through and installed the bits that were needed for conda, created a conda environment associated with this directory only, did the conda installs, and then it also did the pip install into that.

And because conda also lets you specify the version of Python, this effectively becomes a way to specify, in your Python script, that it's run by a specific version of Python, and to ensure it, rather than have that be information you communicate out of band and that someone else needs to keep track of.
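Put as concrete commands, the conda path might look like the sketch below: create a prefix-based environment, which is also where the Python version gets pinned, then run pip with that environment's own interpreter. This is a sketch under assumptions rather than the tool's actual command line: the helper name is hypothetical, it only builds the argument lists, and it assumes the Linux/macOS layout where the interpreter lives at prefix/bin/python.

```python
from pathlib import Path
from typing import Optional


def conda_setup_commands(prefix: Path, python_version: str, conda_specs: list[str],
                         requirements_file: Optional[Path] = None) -> list[list[str]]:
    """Build (but don't run) the commands for a conda env plus optional pip extras."""
    commands = [
        # --prefix puts the env in our cache dir instead of conda's named-env area;
        # python=X.Y pins the interpreter version alongside the other conda specs.
        ["conda", "create", "--yes", "--prefix", str(prefix),
         f"python={python_version}", *conda_specs],
    ]
    if requirements_file is not None:
        # Use the env's own interpreter so pip installs into this env, not the base one.
        commands.append([str(prefix / "bin" / "python"), "-m", "pip", "install",
                         "-r", str(requirements_file)])
    return commands
```

Each returned list could then be passed to subprocess.run(cmd, check=True) in order.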

If you want to go see what it's done in that developer directory, you can see there, it's now created a new entry in that cache. And at the moment, I just write literally down here the conda commands that were detected. Oh, interesting. I also wrote the requirements thing in there.

But this is just put here for information. It's not actually run. And then here's the stuff that conda actually put in place. >> I can see it's got a little error there. On line 137, underneath, it's complaining "requirements.txt: command not found", and "--prefix: command not found" on line 139.

>> Oh, right. Good point. So, my parser is picking up the requirements.txt incorrectly. So, I should fix that. >> That's okay. We get the idea. >> Yeah. >> It'll be fixed by the time somebody else watches this video. >> Yes. And if we weren't using conda, if we only had -- let's look at another example.

Well, if you have a script with no dependencies, it's just going to run like a normal Python script. Let me see if I can find that. Variant requirements. Yeah. So, this is test script no deps. Yeah. Chmod. >> I haven't seen chmod plus before. If you don't say -- >> Plus x.

>> Like u+x. Does it default to u+x? Or does it default to u+rwx? >> You got me. That was a typo. I don't know what it does. >> I'm surprised it didn't spit out an error. It must have done something. Maybe it added nothing at all, for no groups or users at all.
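On the chmod question: per the GNU chmod manual, a mode like +x with no u, g, o or a acts like a+x, except that bits set in the umask are left untouched. The manual also describes the permission list as "zero or more letters", which suggests a bare + is accepted and changes nothing. A Python sketch of what chmod +x does, with `make_executable` as a hypothetical helper:

```python
import os
import stat


def make_executable(path: str) -> int:
    """Rough Python equivalent of `chmod +x path`; returns the new permission bits."""
    # os.umask sets a new mask and returns the old one, so set-and-restore to read it.
    current_umask = os.umask(0)
    os.umask(current_umask)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # With no u/g/o/a prefix, +x means a+x minus any bits masked by the umask.
    exec_bits = (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH) & ~current_umask
    new_mode = mode | exec_bits
    os.chmod(path, new_mode)
    return new_mode
```

With the common umask of 022, a 644 file becomes 755, the same result as chmod +x.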

>> Yeah. Or here's an example where it only has a requirements.txt file. So, let's see that one go. >> Cool kids use fastprogress rather than tqdm, by the way. That's what I hear. >> So, here it's detected that -- you can see from the logging, which is noisy right now.

It detects that the dependency block implies the script will only need venv and pip. So, you don't need to have conda on your system to use it. That's the other reason I wanted to make it not require conda. The whole point of this is to create a thing where people need to install the minimum amount in order to be able to use scripts.

So, it needs to create the venv, and then it does that. And then within it, it installs this library. And if we go look at the cache directory, which, again, needs renaming, we'll see there's a new entry. There. This one, hopefully, just has the requirements.txt file properly formatted in the venv inside there.

So, that's the gist of it. It's a little noisy right now. It doesn't do all the error checking, right? >> I like it. And, like, with -- people don't quite realize on the whole, I think, that conda has, like, every program you could imagine, you know. >> Yeah. It's not only Python.

>> Graphviz, Node, whatever. So, if you wanted to have a script that creates database diagrams, you know, and send it to somebody using the thing I showed yesterday, you could have this and it would have a, you know, conda install. >> Yes. >> Graphviz there. >> It's very general.

And there's a couple -- well, like, a little bit of context on this. Part of the reason I'm so militant, I guess, about easy deployment in single files is that in the past I've worked in environments where not everyone around me was a software engineer. At my startup, where I was CTO, we had a factory, like a literal physical factory, right there as part of the office.

And there would be technicians working with manufacturing equipment. And we created software just for them to use. >> That's, like, what you guys were making. >> Oh, yeah. We were making -- right. Making custom eyeglasses. And so, we had -- we used machine learning to scan the face. >> Like, per person customized, right?

>> Per person. So, we scan your face with the front-facing iPhone camera, eventually the depth sensor, do inference to work out the 3D shape of the face. And those measurements of the shape of the face would get fed into computer-controlled milling machines, CNC machines, to make the glasses that exactly fit your face.

And so, as part of the business operations, it wasn't just people doing ML, people doing iOS development. There were people there operating CNC machines and making glasses. And they would need to -- we needed to have a workflow so that they would get data that came out of this -- came out of our database properly and was fed then correctly into the CNC machines.

You've got to do order tracking, customer tracking, taking payments, like, all that stuff. So, you need a lot of bits and pieces of custom software that everyone can use. The customer service reps, the factory technicians, the software engineers, the marketing professionals. Like, all these people are super capable, but they're not all people who want to know what the hell it even means to talk about, like, why are you bringing a snake onto my computer, you know?

Python this. It's ridiculous. Oh, what? I need to install Swift first? What's Swift? Is that a bird? Like, you know, it's no good. Like, you just want to be able to give them a goddamn file and then have them run it. And any time you need to do anything else, like, oh, just, you know, update your node, like, you failed.

Like, the whole stack has failed. If you're requiring that, you've already lost. So, before I, you know, want to make anything, I want to think about, is it going to be a thing that someone else can use easily? Not, like, that they can use if they care about it.

And I couldn't do that with Python. And I could sort of barely do that with Swift, using swift-sh. But what I like about this is, you know, when I was talking to you about this earlier, Jono, you were saying, well, this is good, but can you just make it a Python module?

Like, that might be good for Python users, but then it's chicken and egg, because now they need to know how to install a Python module, right? Or they've got to pip install the thing that saves them from doing pip installs. Right now, my goal is for this thing just to be a single file that anyone can put on their PATH, if you can explain to them what the PATH is, and then you also have to explain the shebang.

So, we're already deeper in than we want. >> So, Alexis, I have things that might be helpful to you. >> Yes, yes. I'm interested. I have a bunch of questions, but I'd be interested in feedback, thoughts. >> I have an installer for Miniconda that you can just run, and it works: without any interactive anything, it'll install Miniconda on Windows, Mac, and Linux.

And I believe somewhere I also have that sitting on a, like, a URL, so you can just curl blah, pipe, bash, and you don't even have to download it. So, yeah, if you want to be like, okay, to run this script, you'll need Miniconda. If you're not sure if you've got that, or you know that you don't, copy and paste this one line, and you're done.

>> Yeah, that'd be very interesting. Obviously, it's a bit of a mess right now, because I just got it working, but my goal is to make it extremely easy to adopt, so that data scientists and people who are smart and maybe need to do a little bit of scripting can use really all of Python without needing to learn packaging.

>> Exactly. Also, remember that you can pip install from a git repo directly. Things don't have to be on PyPI. So, two things. A, make sure your regular expression handles plus, because you'll have git+, blah, blah, blah. B, if you've got some kind of tutorial or README or something, you might want to mention the absolute easiest way to create a package for somebody else to use.

And nowadays, with pyproject.toml, that's pretty easy. So, if you've got a repo with a pyproject.toml in it, you've now got something that people can pip install, which means people who are slightly more advanced can create things to enhance other people's Python scripts.
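Jeremy's point about the regular expression handling `+` matters because pip accepts VCS requirements such as `git+https://...`, which contain `+`, `:` and `/`. A regex that whitelists only "package-name-ish" characters will silently skip those lines. The sketch below contrasts a too-strict pattern with a permissive one that simply takes everything after `# ` and trims trailing whitespace. Both patterns and the `requirement_from` helper are hypothetical, not the tool's actual code.

```python
import re

# Too strict (hypothetical): a character whitelist that leaves out "+", ":", "/", "@" and ","
STRICT_REQ = re.compile(r"^#\s+([\w.\-=<>~!]+)\s*$")
# More permissive: capture everything after "# ", minus trailing whitespace
PERMISSIVE_REQ = re.compile(r"^#\s+(.+?)\s*$")


def requirement_from(line: str, pattern: re.Pattern = PERMISSIVE_REQ):
    """Return the requirement text on a commented line, or None if it doesn't match."""
    m = pattern.match(line)
    return m.group(1) if m else None
```

The permissive pattern defers validation to pip itself, which already knows how to reject a malformed requirement with a useful error.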

>> So this was the part where I was interested in getting feedback from both of you, because I feel like there's a lot of, there's some design choices to be made here that are shallow in engineering terms, but very consequential in user experience terms. And those are the design choices around what format to require/accept in this comment block.

>> Yeah, I mean, don't; that's too complicated. Like, requirements.txt is fine. pyproject.toml would be for if you, you know, wanted to create a package for somebody else to install. >> Yeah, so I was experimenting with this, but then I rejected it, because when I started looking at what constitutes the simplest, the kind of minimum parsable pyproject.toml, it was already longer than it needed to be.

>> Yeah, yeah, and it's not, like, it doesn't buy you anything. >> Exactly. Yeah. >> The other thing I'd say is, for your examples, like, I don't know, I discourage people from putting specific version numbers in, in their requirements.txt. I'd say this for here as well, you know, I'd probably just say greater than or equal to four, less than five, or anything.

>> Yeah, no, I think that's true. >> There's a tendency, and I think this comes very much from the npm world, where everybody has these lock files and stuff, for people in the Python world to use this equals-equals thing. But it doesn't matter so much for what you're doing, because everything's going to be in a separate env anyway.

>> Yeah. >> But I kind of discourage people from thinking this is how we should package software, that it depends on the specific sub sub version of a package, you know. >> Well, why? Why do you say that? Let me, let me disagree a little bit, or at least check out where you're coming from.

>> Yeah, no, it's fine. Like I say, in your particular case it actually doesn't matter too much. The only reason is that if I've got a different script that uses 4.66.5, it's going to have to download a separate copy and store it in my packages, and I can't hard link anymore.

And so all those things I'm saying about how efficient and fast and low disk usage this will be, would be thrown away, thrown out. >> Well, they'd be compromised somewhat, because now we'd have two versions. >> Yeah. >> On the system, yeah. >> And you're not winning anything. And it's kind of, for people who are, you know, more beginner-y, just learning-y types, it's kind of saying to them, like, oh, this is how you're meant to package software.

It has, you know, you're meant to assume that it only works in the specific sub sub version. That's a minor issue, but. >> I think it depends on the sophistication of the consumer. So if I, like, think back to my, like, prototypical use scenario, if I want to hand someone a script that they can run, even if they have less, you know, technical expertise around Python, I really want to do everything I can to make sure that it will run, like, no matter where it lands.

Like, if it's raining, if it's snowing, if they're running Python 3.10, if they're running Python 3.7. >> No, no, absolutely. I agree. >> I want it just to work. So locking that down, because if I make it open-ended, then I face the risk that at some point there will be a future release.

>> Well, that's what I'm saying. You don't make it open-ended. You make it slightly more open-ended, right? So for mine, I generally write greater than or equal to x, where x is the minimum version I know works, and then less than, and then I write the next major version number.
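To make the difference between an exact pin and Jeremy's "minimum known-good, up to the next major version" range concrete, here is a deliberately simplified checker. It only handles purely numeric dotted versions like `4.66.5`; real installers follow PEP 440, which also covers pre-releases, post-releases and `==4.66.*` wildcards (the `packaging` library's `SpecifierSet` implements that properly). The function names are hypothetical.

```python
def version_tuple(version: str) -> tuple[int, ...]:
    # Simplification: purely numeric dotted versions only (no "rc1", ".post1", etc.)
    return tuple(int(part) for part in version.split("."))


def satisfies_exact_pin(version: str, pinned: str) -> bool:
    """'pkg==4.66.5' style: exactly one release is acceptable."""
    return version_tuple(version) == version_tuple(pinned)


def satisfies_major_range(version: str, minimum: str) -> bool:
    """'pkg>=4.66.5,<5' style: the known-good minimum up to the next major version."""
    v, lo = version_tuple(version), version_tuple(minimum)
    # Tuples compare element by element, and (5,) sorts before (5, 0, 0)
    return lo <= v < (lo[0] + 1,)


def major_range_spec(name: str, minimum: str) -> str:
    """Format the range as it would appear in a requirements.txt line."""
    return f"{name}>={minimum},<{version_tuple(minimum)[0] + 1}"
```

For example, `major_range_spec("tqdm", "4.66.5")` gives `tqdm>=4.66.5,<5`, which admits 4.67.1 (and so can share an already-downloaded copy) while still excluding a potentially breaking 5.0.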

>> Okay. >> Or, you know, you can make it slightly less annoying. Like, you could just write 4.66.*, you know, or 4.66, I think, might work. >> So pin major and minor version. I can see that. >> Anyway, it's a minor issue. I also noticed in your actual script, I think your regular expressions could be less strict.

So I would use hash backslash s star in the conda header and pip header, rather than hash space. I would also put a backslash s star after the colon. >> Sorry, which line are you talking about? >> So, yeah, those two lines, 23 and 24. After the hash, I would change the space to backslash s star.

>> Oh, so it could be an arbitrary number of whitespace characters? >> Yeah, and I would also allow an arbitrary amount of whitespace before the newline. >> Yeah, I know, that's all. >> And rather than using a newline, I would put an r before that string, so that you then don't need the newline.

Instead, I'd say dollar. Well, it depends how you're doing this exactly, but I don't know. Also, in newer versions of Python, that backslash s might now complain that this is not an r-string. >> These aren't r-strings. >> That should be backslash s star, not backslash s plus.

You don't want to require spaces before your newline. >> I do, though, because I want everything to be hash and then a space, at least one space at the beginning. >> Yeah, but not at the end. >> That's true, yeah, yeah, yeah, okay, yeah, you're right. >> And then the pip header probably wants a plus as well.
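Putting the regex advice above together: use raw strings so `\s` stays a regex escape (Python 3.12 emits a `SyntaxWarning` for invalid escapes like `"\s"` in ordinary strings), require at least one whitespace character after the hash, allow any trailing whitespace, and anchor with `$` under `re.MULTILINE` instead of matching a literal newline. The header names `# conda:` and `# pip:` are hypothetical stand-ins; the real script's headers (on its lines 23 and 24) aren't shown in the transcript.

```python
import re

# r"..." keeps "\s" as a regex escape rather than an invalid string escape.
# "#\s+"  : a hash, then at least one whitespace character
# ":\s*$" : the colon, then any trailing whitespace up to the end of the line
CONDA_HEADER = re.compile(r"^#\s+conda:\s*$", re.MULTILINE)
PIP_HEADER = re.compile(r"^#\s+pip:\s*$", re.MULTILINE)


def header_line_numbers(pattern, script_text: str) -> list[int]:
    """Return the 1-based line numbers where the header pattern matches."""
    return [script_text.count("\n", 0, m.start()) + 1 for m in pattern.finditer(script_text)]
```

Because `\s` also matches tabs, a header written as `#<TAB>pip:` is accepted too, while `#conda:` with no space is still rejected, as Alexis wanted.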

>> Yeah, what I'm curious about is why it picked up the pip block at all when, earlier... so there's probably a deeper problem here I haven't sorted out. >> People don't have to watch us debug this together, but, yeah. It's more of a general comment. In general, for these kinds of things,

I try to make my matching things reasonably flexible. >> So this goes to the other thing I was wondering about. So I think it's almost certainly the right decision to use a syntax where, first of all, instead of putting things off of the comments line here, off of, you know, next to the thing, actually just have a block at the top.

Because that means that the requirement, yeah, because that means in particular. >> But I would put them in, this is quite a common pattern nowadays, is to put things in a YAML fenced block, so that would be, except commented, so that would be a hash space dash dash dash at the top and a hash space dash dash dash at the bottom.

And that way it's like very clear exactly where your block starts and stops. And if there's anything in there which you don't recognize in the block, you can tell people, you know, rather than it just mysteriously not running that thing. It's a YAML metadata block. It's just dash dash dash followed by some lines.

And at the end it's another dash dash dash. >> But I'm going to have to put hashes around it. >> You're going to hash before it, yeah. And with, you know, as many spaces as you like. >> So the thing that was -- I want to follow up on that because I'm not 100% sure I understood.

But the one particular thing that I really wanted to get both of your opinion on is what's the right format here? Because -- >> Yeah, that's the next thing I want to mention. So I think you should have two separate sections. I think you should have a conda requirements.

And conda requirements would all get passed to a single conda install -yq, and then the entire lot of requirements, space-delimited, would all get done at once. Because most people don't need conda commands. And that's not really a conda commands section. It's just a, you know, commands section.

You know, it's like just commands to run. >> So I started looking at conda requirements. But as near as I could tell, there wasn't a well-defined format that was specified by conda that had all the properties -- >> There is. It's the thing you've got there. It's the thing you've got there, which is the thing that appears after the --quiet.

So that's what we use in nbdev as well. I would restrict exactly like your requirements.txt, but I would have, you know, conda requirements. >> Right. So the reason I thought that might not work -- and you can correct me on this, because you know a lot more than I do -- is that if you happen to be installing something where you wanted to rely on different channels for different packages, then the binding through which this channel applies for these packages and that channel applies for those packages -- >> There's a colon for that.

>> Oh, okay. So there's a way to -- >> And, you know, I would be inclined to have like in the, like, conda requirements, you know, space channels colon. And that way you've got like -- you know, because it's very rare that you need to like almost -- I don't think I've ever needed to specify different channels for different things.

I just have channels for everything. So I would have, like, hash conda requirements space conda-forge space fastchan colon, you know, and then I'd have just my requirements one per line. And if you did need a specific channel for a specific thing, you can just use -- like, if all you're doing is concatenating with a space, then it'll work, because a user can put channel colon.

>> Okay. I think I need to see that syntax to understand it. You say it's using nbdev for the -- for the conda installation. >> There's no syntax. What you've got, Python equals 3.10. That's the syntax. NumPy, that's the syntax. >> But you could say -- >> Go with a colon.

>> Conda-forge colon. >> Yeah, that's what I say. You'll never -- like, I've never used it in my life. So I just like -- A, you don't have to worry about it. B, you don't have to do anything to support it. Because it'll just happen. If you just pass it along to conda install, it'll just work.
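Jeremy's suggestion is to pass every conda requirement, space-delimited, to one non-interactive `conda install` call. A minimal sketch of building that argument list follows. `-y`/`--yes`, `-q`/`--quiet`, `-c`/`--channel` and `-p`/`--prefix` are real `conda install` options; the per-package channel form conda actually documents is `channel::package` (for example `conda-forge::graphviz`), which passes through untouched because the specs are just appended. The function name and the optional `prefix` argument (a per-script env location) are assumptions.

```python
def conda_install_command(specs, channels=(), prefix=None) -> list[str]:
    """Hypothetical helper: build one `conda install` call for all conda requirements."""
    cmd = ["conda", "install", "-y", "-q"]  # non-interactive and quiet, i.e. "-yq"
    if prefix is not None:
        cmd += ["-p", str(prefix)]  # install into a specific env directory
    for channel in channels:
        cmd += ["-c", channel]  # channels apply to every spec in the call
    # Specs are passed through verbatim, so "python=3.10" or
    # "conda-forge::graphviz" need no special handling here.
    cmd += list(specs)
    return cmd
```

Running it would be `subprocess.run(conda_install_command(...), check=True)`; anything that genuinely needs separate calls can still go in a commands section.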

So I guess I'm thinking about setups like this. So -- >> So you just dump it. It's fine. You just dump them all, like, in one command. It'll work fine. >> So there's not a complicated binding that needs to be maintained here, like CUDA comes from this channel, this stuff comes from the PyTorch channel, the NVIDIA channel?

>> No, no. >> These things are coming out of the default. And if I don't preserve that binding property, then the installation won't work, if I do it on one line? >> I mean, it would -- it's unusual for that to be the case. And if it is, then people can put it in the commands one, you know?

But I would -- I suggest that for the normal, easy stuff that's, like, 99.999% of the time, just -- >> Okay. Yeah. Yeah. Because I was trying to find an easy format to use that was also well-specified and natural. And I wasn't sure what there was. It sounds like what you're saying works.

What I liked about this was -- what I didn't like about this was it wasn't a well-specified format. It was just the commands people would issue. But then what I sort of liked about it is, it was just the commands people would issue. So if you can imagine there's somebody who doesn't know their way around, but has made these commands before, it's not too frightening to see, oh, okay, these commands will be run.

>> No, exactly. I just support both. But for the one you've got here, there's no reason it couldn't say conda, colon, Python equals 3.10, numpy. >> Yeah. So basically something like -- this is what you're suggesting? >> Yeah. I mean, I wouldn't put the .txt there, because it's kind of weird, but, yeah.

>> Okay. And then the -- well, I'll look up what you mean by the YAML fencing. You mean? >> So I would -- yeah, but I would have it around the whole thing, not just the conda bit. Three dashes, not four. No, not there. So I would go right to the very -- to line two.

I would insert a new line in line two. And I would put hash dash dash dash there. And then I would go to line 10 and put a new line after that and put another hash dash dash dash. >> Oh, I see. To fence the whole thing, not the separate parts.

>> Yep. And then you just grab it. And then if somebody's accidentally typed the wrong thing somewhere, you can say, like, I didn't understand that line, you know? >> Yeah. >> Or if you don't find any requirements of any kind, you can say, like, oh, you've got a fence block, but there's nothing I recognize at all, or -- et cetera.
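Here is a sketch of the "fence the whole thing, then just grab it" approach: find the first pair of commented `# ---` lines, parse the entries between them, and collect anything unrecognized with its line number so the tool can report it instead of mysteriously ignoring it. The entry grammar (`# conda: ...` and `# pip: ...`, space-separated specs with no internal spaces) is an assumption for illustration; the syntax pythonrunscript eventually shipped uses markdown fences instead.

```python
import re

# A commented YAML-style fence: "# ---", with any whitespace after the hash
FENCE = re.compile(r"^#\s*---\s*$")
# Hypothetical entry grammar inside the fence: "# conda: spec spec" / "# pip: spec spec"
ENTRY = re.compile(r"^#\s*(conda|pip):\s*(.*?)\s*$")


def parse_fenced_block(script_text: str):
    """Return (deps, unrecognized) for the first fenced block, or (None, []) if there is none."""
    lines = script_text.splitlines()
    fences = [i for i, line in enumerate(lines) if FENCE.match(line)]
    if len(fences) < 2:
        return None, []  # no complete fenced block in this script
    start, end = fences[0], fences[1]
    deps = {"conda": [], "pip": []}
    unrecognized = []
    # start=start + 2 makes lineno the 1-based line number in the file
    for lineno, line in enumerate(lines[start + 1:end], start=start + 2):
        if line.strip() in ("", "#"):
            continue  # blank or bare-comment lines are allowed inside the fence
        m = ENTRY.match(line)
        if m:
            deps[m.group(1)].extend(m.group(2).split())
        else:
            unrecognized.append((lineno, line))
    return deps, unrecognized


EXAMPLE = """#!/usr/bin/env python
# ---
# conda: python=3.10 numpy
# pip: tqdm>=4.66,<5
# pipp: oops
# ---
print("hello")
"""
```

On `EXAMPLE`, the typo `# pipp: oops` comes back as an unrecognized line 5, and if both dependency lists come back empty the tool can say "you've got a fenced block, but there's nothing I recognize in it."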

>> Yeah. Yeah, that might be good. I'm just trying to think about how to make it easy for someone who's never -- doesn't really know what a shebang is, doesn't quite know what conda is, but they kind of know, you know, I found a thing and now I'm trying to modify it to get a new result out of it, which is a likely situation for -- >> Yeah, exactly.

>> -- certain classes. >> I think this is fair enough, what you have to edit. >> But I'm very excited by it. I know it's a tiny little thing. It's less than, you know, less than 300 lines of code or whatever. But I think it will enable it to be easier to experiment with stuff.

Just create a single file, run it, get on with your day. >> I don't remember the last time I wrote something with over 300 lines of code in it, so I wouldn't measure things based on how many lines of code it has. >> Yeah, I mean, that's a figure of merit.

I'm just saying it's not a -- yeah, it's not a big thing. It's just a little thing. Oh, the other question is where to dump all the garbage. >> Yeah, XDG is the right thing to do, exactly. >> If it exists, but if it doesn't, then, I don't know, is it going to ~/.cache on Linux and ~/Library/Developer on macOS?

>> Yeah, I'm trying to remember. I think fastcore, let me check. >> I just copied this behavior from swift-sh because Max usually makes good choices. >> Yeah, so there's a fastcore.xdg module, which you could borrow. Yeah, and it attempts to have something sensible for everything, although it could probably be improved. I mean, I don't know if it's wrong, but, you know, if you don't have something, like if you don't have XDG_CACHE_HOME, it uses Path.home()/.cache, for example.

>> What file is that? >> fastcore/xdg.py. >> Okay, well, definitely something -- >> So if you feel like there are better places to put things on Mac, I would rather a PR to this than -- >> Yeah. >> -- something else. That way, everybody can benefit. >> Yeah, I'll give it a think.
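A minimal sketch of the cache-location logic being discussed: honor `XDG_CACHE_HOME` when it's set, fall back to `~/.cache` on Linux, and use a macOS-specific location otherwise. The XDG Base Directory spec says relative paths in these variables are invalid and should be ignored, so the sketch checks for that. On macOS it uses `~/Library/Caches`, Apple's documented per-user cache location, rather than the `~/Library/Developer` choice mentioned above; which one the tool should use is a judgment call. Other platforms (including Windows) just get `~/.cache` here. The function name, its parameters and the default app name are assumptions, and this is not fastcore's implementation.

```python
import os
import sys
from collections.abc import Mapping
from pathlib import Path


def cache_root(app_name: str = "pythonrunscript",
               environ: Mapping[str, str] = os.environ,
               platform: str = sys.platform) -> Path:
    """Hypothetical helper: pick a per-user cache directory for this tool."""
    xdg = environ.get("XDG_CACHE_HOME", "")
    # Per the XDG spec, a relative path in XDG_CACHE_HOME is invalid and ignored
    if xdg and Path(xdg).is_absolute():
        base = Path(xdg)
    elif platform == "darwin":
        base = Path.home() / "Library" / "Caches"
    else:
        base = Path.home() / ".cache"
    return base / app_name
```

Passing `environ` and `platform` as parameters is only there to make the fallback order easy to exercise; normal callers just use `cache_root()`.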

Apple often has pretty good docs if you look for them about where things are supposed to go. >> Yeah. >> How fully -- >> Yeah, I think I really only looked up the Linux or POSIX or something docs. I didn't really think about Mac specific. Nice, Alexis. Thank you.

>> Yeah, my pleasure. It's nice to get excited about a little thing, because, you know, it'll be done quickly because it's little, and then, you know, you can actually then go on and use it and get on something else. >> Yeah. Maybe this is why I only ever do little things.

But the nice thing is, like, if you kind of organize your little things in a nice way, eventually you discover that they all created a big thing by mistake. >> Yeah. Cool. >> Nice. All right. We done? We good? >> I think we're good.
