I Like Notebooks

Hi there, I'm Jeremy Howard from fast.ai and I'd like to tell you a bit about why I like Jupyter Notebooks, and maybe help you find some new ways to really like them as well. I feel like this is kind of a dangerous thing to say, "I like notebooks", because every time I say it to a serious software engineer type, they tell me all the reasons that I should not like Jupyter Notebooks.

And they kind of act like I must just be ignorant and don't understand the better ways to code. But actually I've built a lot of good stuff in Jupyter Notebooks, and I've been coding for, gosh, about 30 years, pretty much every day over that time. I've used a lot of different IDEs, a lot of different editors, and Jupyter Notebooks seriously make me at least twice as productive, and I have a lot more fun.

I've built a number of popular software libraries like these ones in Jupyter Notebooks. In particular, fastai, which is perhaps the most popular PyTorch deep learning API, other than PyTorch itself, I guess, and is very widely used at many companies, by many researchers, at many universities, and so forth. One of the cool things you'll see in the fastai library is the documentation, which you see here. It's got all these examples scattered throughout it, and nice things like links to source code and links to papers and links to other parts of the documentation.

And actually you can go to any part of the documentation and click the Open in Colab button at the top. And if you do that, then suddenly you'll see that entire documentation appear as an interactive, experimental playground you can play with yourself, because, you see, all the documentation is written in Jupyter Notebooks.

Actually, not just the documentation, but all of the code itself for the library and all of the tests. And they're actually all in the same notebook. So if you start looking at one piece, you can see everything. You can see, as you see here, the implementation of this combined cosine scheduler.

You can see the examples. You can see the tests. You can see the documentation and you can start playing with it straight away to experiment with some different values and see how it works. Look at the inputs and outputs and so forth. I think that's really cool. The way that I do this is by using something called nbdev for notebook development.

nbdev is a really amazing project which I'm going to tell you a lot about at the end of this talk, the second half of this talk. But basically what nbdev does is it lets you create Python modules directly from your notebooks. You can export changes from your editor back to your notebook if you want to change things in the editor or IDE directly.

It automatically creates searchable documentation. It automatically creates PyPI and conda installers. It will run your tests in parallel, and the tests are in notebooks. It will handle continuous integration. It will handle version control stuff and so forth. It's really, really nice. I write all kinds of stuff in notebooks, and here's an example of a little server I made.

And so I made this little server and it's a GitHub webhook server. And the nice thing is that I hadn't really done much stuff directly using Python's built-in HTTP handler classes before. So I started experimenting with them, and I did so in a notebook, and as I experimented I took down notes to myself and I started to create examples and little tests.

And this now becomes part of the documentation and the source code and the tests of the library I ended up building, which is called fastwebhook. So you can see that you can write any kind of code in notebooks and you can end up exporting it into a real library. And now anybody can download fastwebhook, and then they can see not only the final result but the process I took to get there, and understand my thinking, understand the APIs I'm using, understand the parts of the Python standard library I'm using, because it's all documented in this process.

So a lot of other people are now using nbdev, and one of the best comments I've seen is from Hamel Husain from GitHub, who said, "Tests, docs and code are part of the same context and are co-located." So this is what happens when you write with nbdev. And he says, "There is no other programming environment that exists like this that I'm aware of." You can even make notes to yourself about why something works the way it does very easily while you're writing the code, and it isn't an afterthought.

This is fundamentally why I have a problem working in anything besides nbdev, because not only does it make the code more approachable to others, but forcing you to write docs actually forces you to think about the code more. In my personal projects that use nbdev, I often refactor my code to be simpler and better after forcing myself to explain it. And I have the exact same experience.

It really makes a difference to my workflow, and a lot of this is really thanks to the underlying Jupyter Notebook system which nbdev sits on top of. Sylvain Gugger is my co-author on fastai. He is also my co-author on the Deep Learning for Coders book, which has been incredibly popular, including with some big names you probably know about who really like it.

This whole book was written entirely in Jupyter Notebooks, and then we exported it directly, with a single little script we wrote, into AsciiDoc and sent it off to O'Reilly, and they published it as this beautiful book. A lot of people have commented on how nice this book looks, how good it feels. It's got color and nice little icons and all the stuff you'd expect, a really nice index and so forth.

So we've created a book that we're really proud of and a lot of people really like. And if you want to write a book yourself as well, you can. You can pip or conda install fastdoc, which Sylvain and I have made available. This is the exact same thing that we used to make our book, and you can run a single command.

fastdoc_convert_all, and it will convert all of your notebooks into a publication quality book, or at least the AsciiDoc source for it, which you can then send to a publisher. All you have to do is write the book. Here's another example of something that we created with Jupyter Notebooks, which is a very popular course that people really, really like.

Nearly everybody seems to like this course, which we're so proud of because we spend a lot of time trying to get it right. And the whole thing, or nearly the whole thing, is actually taught in Jupyter Notebooks. And the students then take these notebooks, and what we do is we clear out all of the prose and all of the outputs, and we ask the students to try to go through the notebook and figure out what's going to happen next and why we're doing this.

It's a great way to kind of force people to think about like, "Oh, did I really understand this? Do I really know what's going to happen?" And then they can run it and check. And if the answer is different to what they expected, then they can experiment. It's a really terrific way to learn and pretty much all of our students have said that once they get into it, they really adore it, they really find it terrific.

Overall, the key thing, I guess, that I like about Jupyter Notebooks is that they support literate programming. Literate programming is something that I have been fascinated by ever since I read about it in the early 90s. It was developed by Donald Knuth, a famous computer scientist, who describes it as a methodology that combines a programming language with a documentation language, thereby making programs more robust, more portable, more easily maintained, and arguably, certainly I find, more fun to write.

The main idea is to treat a program as a piece of literature addressed to human beings rather than to a computer. And this is certainly the way that we now write code with notebooks and nbdev. And I think we actually go beyond literate programming to what I call exploratory programming, where we're using our notebook as a kind of a journal, like a scientist's journal.

And then when we're done, we'll go back and we'll try to clean it up as best as we can and we'll make that part of what we publish. So, for example, here's the actual source code from nbdev itself. And at the very start, Sylvain and I didn't know much about what a Jupyter notebook really is behind the scenes.

And as we started exploring and realizing it's just JSON and printing it out, that became part of our documentation and source code. And you can see we start to create and export functions as we go along, and that becomes part of the library. So then when somebody else comes along and wants to contribute to nbdev or to any project written with nbdev, they don't really need a huge amount of hand-holding to help them get involved, because they can see not just the final result, but the process to get there and the thinking and the choices that were made along the way, because they're all part of the notebooks.
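To make the "it's just JSON" point concrete, here is a minimal sketch using only Python's standard json module. The tiny notebook below is hand-written for illustration (following the nbformat 4 layout) and is not taken from nbdev; real notebooks carry more metadata than this.

```python
import json

# A hand-written, minimal notebook in the nbformat 4 layout: one markdown
# cell and one code cell. This is an illustrative stand-in, not a real file.
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Exploring notebooks"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 1\n", "x"]},
    ],
})

# Reading it back needs nothing more than json.loads.
nb = json.loads(notebook_text)
cell_types = [cell["cell_type"] for cell in nb["cells"]]
code_sources = ["".join(cell["source"]) for cell in nb["cells"]
                if cell["cell_type"] == "code"]

print(cell_types)
print(code_sources[0])
```

Because the whole document is plain data, printing it out and poking at it from inside a notebook is a natural way to learn the format.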

And they can even see the tests and see how the tests are related, and it's all there in one place. So overall, notebooks give us a live coding environment. It's live in the sense that you're working directly programming against those live objects. The actual system that you're building them in has the state, has the actual current state of all of the variables and all of the systems in memory, and you can directly interact with it.

This idea goes back a really long way. One of the most famous examples of a live coding environment is Smalltalk, this one here being Smalltalk-80. And as you can see here, as the code is running, things are actually happening on the screen. And anybody involved in Smalltalk in those early days will tell you that this was a critical part of why this was such a productive system and why it was such a loved system.

And a lot of people say there's never really been anything like Smalltalk again. We're kind of almost rediscovering things from decades ago. There are other interesting examples of live coding. Here's a great one from somebody called Sam Aaron, who actually does live coding as performance. Here is him writing music with code in real time.

So I think that's pretty cool. Here's something which is even cooler. This is Bret Victor, a brilliant designer and a brilliant thinker, showing a real live coding environment he created, which allows him to create games in a whole new way. I mean, not just games, you could use this for so many things.

But here's an amazing example, using it to build a computer game. So I bounce off my turtle, pause the game, and now hit this button here, which shows my guy's trail. So now I can see where he's been. And when I rewind, this trail in front of him is where he's going to be.

This is his future. And when I change the code, I change his future. So I can find exactly the value I need. So when I hit play, he slips right in there. So I've got to say I've never managed to build code in a way where the people watching it went whoa and then started clapping.

It's certainly something to aspire to. And you can see how much people really love this idea of actually interacting with a live coding environment. Bret Victor has been very inspirational. One of the people he inspired was Chris Lattner, who is the guy who created LLVM. He created Swift, and he built the amazing playground system in which, as you can see here, as the code is running, you can actually see what it's doing and you can even plot it and so forth.

Another great example of a popular and important and powerful live coding system. So I was so proud when Chris himself said he thinks that nbdev is a huge step forward for programming environments. For that to come from Chris was a huge validation for us, that somebody we really admire thinks that we're absolutely going in the right direction.

Most people, however, are not using this kind of live coding environment, despite the decades of work that have gone into these kinds of systems and the productivity that we've found comes from them. Here's how a lot of people code, and I'm going to give an example here, you'll see why in a moment, of a very successful coder named Joel Grus.

This is Joel here, and he's good enough to actually do coding, which he puts out on the internet for people to watch. And I watched it to see exactly what this looks like. And what he does, like a lot of people do, is he has what's called a line-oriented REPL down here.

This is something where you can type in a line and it returns a line. And then the rest of it is a kind of a standard editor or IDE. This is VS Code, which is one of the best or maybe the best editor around. So watch what happens as he codes.

You can see here that he has to kind of go back up to find something he's done before. It's the wrong thing. And then he has to edit it. And then he's got an assertion error. Now he has to go somewhere else and then comes down here again. Now he's getting this kind of weird situation of some state that's come from the code and some state that's come from the things he's typing.

And now he's going back up here and trying to edit this. And now bringing it back down here again, and he's stuck in the error. You know, as I watch this, I find this painful. You know, I don't want to write code like this. I kind of feel like this picture is Joel saying, ah, this is too much.

But I feel like a lot of real programmers tell me, you know, this is how you should code. And it kind of feels like they're saying, hey, you know, we should go back and use line-oriented REPLs for everything. Like editing. We used to edit with ed, the Unix editor, which was a line-oriented editor.

And as you can see, the basic approach is exactly the same as what Joel was using for manipulating Python. Now these line-oriented REPLs, you know, it's not a great way to edit text. Very few of us use it nowadays. And I would argue that it's not a good way to work with any kind of code objects.

It is linear. There's a reason that we have line-oriented REPLs. And that's because we used to code like this. If we enter maximum slash y, written ⌈/y, we get the maximum element in the vector y. So you can see here, he is typing one line at a time and it's printing one line at a time.

By the way, this is APL, which was decades ahead of its time. It's still one of the most brilliant programming languages in the world. But I would argue that we should be moving beyond the type-a-line and have-a-line-printed approach that was developed for this kind of coding. So, these kinds of editing environments, like VS Code: VS Code is a brilliant piece of software.

But I refer to it as a dead coding environment because you're not interacting with live code. And that leads to errors. You get this kind of gap between the system you're working on and the final result you're creating. So Joel actually wrote a fantastic book, which, despite being fantastic, it has some errors in it.

And the kinds of errors are very interesting. This is from his errata page. They're errors that say you can't actually run the code. So this line of code doesn't work. And this line of code doesn't work. One of the really interesting ones was not only "this line of code doesn't work", but "hey, you've got a code repo where it does work".

And so there's this kind of like gap between the actual code you're doing and the book that you're writing and then they become out of sync and your readers end up confused because the code doesn't run. All the code in our book runs not because we're particularly brilliant, but just because we ran it all in a notebook.

And so all the outputs you see are the actual outputs that came out of the notebook. Now, of course, one of the libraries might change or there might be a breaking change to Python or something. There could be something which could cause it at some point in the future to break.

But at the point that we wrote it, and as far as I know right now still, the code is correct. And it works because, as I said, it's the code that we actually ran. It's not a dead coding environment. It was a live coding environment we used to create the book.

And the book directly comes from, and is, that code. So why am I talking about Joel's book and Joel's coding approach? That's because a couple of years ago he did a brilliant presentation called "I Don't Like Notebooks". And in this presentation, he explained why he thought we shouldn't be using notebooks.

And actually notebooks are not the right approach to building effective software or doing effective teaching. And the reason I feel like I need to talk about it today is because he is such a brilliant communicator and such a funny guy that this presentation has been incredibly influential. And pretty much any time I say I like notebooks, somebody will say that's not a good idea.

Haven't you seen that presentation where that guy explained why they're terrible? So I really feel like in order to tell you why I like notebooks, I also have to tell you why Joel is wrong. Which he is. I really feel like he's wrong. I've built a lot of good stuff in notebooks.

And as you'll see, I think the points he made are based on misunderstandings, or at least sometimes on things that are now out of date. Because his slides are brilliant, I'm going to use a lot of them, and also so you can see exactly what I'm responding to. Whenever I use his slides, I'm going to show this little icon in the bottom right hand corner.

You'll see it. The next 12 slides are actually all from his presentation. I haven't edited them because I want to make sure you see exactly what he showed. And one of the things he did say in his presentation is I am not a notebook expert. Which is great. It's nice to be self-deprecating and to kind of have that caveat.

But he still expressed very strong opinions and people still, as I said, really think he must be right. They tell me that I am making a mistake to think that I like notebooks. So I was actually worried when he first told me that he's planning to write the talk that he did.

Because I know he's a brilliant communicator and I know he's really funny and I thought, uh oh, a lot of people are going to listen to this and say, oh, I guess we shouldn't use notebooks because Joel has made a compelling case that we shouldn't. And this slide is actually from his presentation.

He actually said in his presentation, hey, look at what Jeremy said. I guess he thought it was kind of funny that I told him don't write this presentation and he wrote the presentation. And so now I feel like I have to come back and say, OK, let's set the record straight here.

So here's what he said. He said he had a lot of strong opinions. I don't agree with any of them, but here they are. He said notebooks discourage good habits. He said notebooks encourage bad habits. He said notebooks encourage bad processes. He said that notebooks hinder reproducible and extensible science.

He said that notebooks are a recipe for poorly factored code. He said that notebooks make it easy to teach poorly. I don't think it's notebooks' fault that that guy's getting hit over the head. I don't do that when I teach with notebooks, by the way. He said notebooks make it hard for me to teach well.

So he didn't just state these. He gave reasons. And here are some of the key reasons, I think, that he expressed. The first one was that notebooks have tons and tons of hidden state that's easy to screw up and difficult to reason about.

Which is strange. I don't find this myself. And he made the point that notebooks, he says, are dangerous. I don't know if I agree they are dangerous, but he thinks notebooks are dangerous unless you run each cell exactly once, in order. I was like, oh my goodness, how am I going to do that?

Wait, Jupyter has a single button you can press to do that. It's actually not that hard. If you really think it's so important to run each cell in order, you have a way to do so. Personally, I think it's actually really, really important to have this ability to go back and fiddle with things, to change things, to see what happens.

I like having the ability to go back and run in order, but I also like having the ability to actually, as we discussed, manipulate the live coding environment in real time to experiment and to say what if. That's a critical part of this. But you do need a way to ensure that in the end the whole thing works.

And Jupyter has a couple of ways to do that. There's restart and run all (I said restart and clear output, but I made a mistake there). And in the Cell menu there are also a few options, such as run cells to here or run all cells. And nbdev actually has something which runs all of your notebooks, all of your cells in order, for a whole directory.
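As a rough picture of what "run every cell in order and make sure the whole thing works" means, here is a minimal sketch. It is not how Jupyter or nbdev implement it; `run_all_cells` is a hypothetical helper that assumes the notebook has already been loaded as a dict and contains only plain Python (no IPython magics).

```python
def run_all_cells(nb, namespace=None):
    """Execute every code cell of a notebook dict, top to bottom, in one namespace."""
    namespace = {} if namespace is None else namespace
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue  # markdown and raw cells are skipped
        source = "".join(cell["source"])
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as err:
            # Report which cell broke, so a stale or out-of-order notebook is easy to spot.
            raise RuntimeError(f"cell {index} failed: {err!r}") from err
    return namespace


# A tiny illustrative notebook: state built in one cell is checked in a later one.
nb = {"cells": [
    {"cell_type": "code", "source": ["total = 0\n"]},
    {"cell_type": "markdown", "source": ["Some prose between the cells."]},
    {"cell_type": "code", "source": ["total += 5\n", "assert total == 5\n"]},
]}

state = run_all_cells(nb)
print(state["total"])
```

Starting from an empty namespace is what plays the role of "restart": nothing left over from earlier experimenting can make a broken notebook appear to pass.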

That's the main thing I use. Another concern he stated was that you can't copy and paste code and outputs from a notebook into Slack, and he also gave the example of pull requests and issues in GitHub. Now this is an example of trying to do things the same way you've always done them without thinking about what's the actual problem you're trying to solve.

Now the actual problem you're trying to solve is to say, here's what I'm trying to do, please explain why this doesn't work. Or, here, have a look at this example I'm showing, or whatever. And here's how it actually looks. It's actually way better than cutting and pasting into Slack.

When we get a pull request or an issue, here's a bug report: "Colab notebook reproducing the behavior". Now I click on that and I get a whole notebook, fully self-contained, where I'm not just seeing this person's claim of "I typed this and this happened", but I can actually try it, and that means I can then actually try to fix it right there and then.

And this is particularly helpful because all of the fastai documentation, all of the fastai book and all of the fastai courses are also available as notebooks. So people can use that as a starting point, or I can say, like, oh, did you try the code in the book?

If you have a non-working example, could you modify the notebook to show us how yours doesn't work? And so forth. So rather than saying how do I copy and paste into Slack or GitHub, the question should be how do I understand the problem that a user is having or understand the idea that a user is telling me about?

And the answer, to me, is by providing an actual live coding environment. I can see that, and it's so easy to do with Jupyter. Something else I really like about Jupyter is you can use something like ReviewNB, which I really enjoy, to look at pull requests, and the pull requests don't just show me the code that's changed, which is fine.

They do. It's very nice. But they also show me the outputs that have changed, and the documentation is right next to it. So here somebody changed a test, right? And rather than thinking, oh, I wonder if those scales are any good, and then having to go back and load in their PR and run it, and then have a second version of the code and run that and compare the two, in ReviewNB I can see them right next to each other.

And I can say, oh, yeah, this actually does look like a more clear example to me. And I can see the documentation is right next to it and I can see exactly what's going on. There's lots of ways of sharing notebooks.

Another is to press this button. This is the gist button. Here's a notebook that I created, and you can copy and paste images directly into a notebook. So here's one I just copied and pasted in. And if I click that button, then it automatically gives me a shareable gist URL, so I can paste that into Slack.

That's at least as easy as copying and pasting from IPython. And of course, I get the benefit that I'm copying and pasting not just text but pictures. And, you know, a lot of us are working with things other than just text nowadays. We want to be able to show plots, you know, histograms of things and analytics.

We want to be able to show pictures. We want to be able to show videos. We're not just working with text all the time. And so with something like this, you can really show a much more complete example a lot of the time. It's really nice and easy to do.

As you can see, we've still got our little picture down here, so these are still Joel's slides. Another concern he had was that he thinks that notebooks are harder to reproduce. And this one, he didn't really explain why he thought that way. And I don't fully understand the thought process here.

All of the same ways that you can specify dependencies in regular Python libraries, like requirements.txt or environment.yml or setup.py or whatever, you can use exactly the same things for notebooks. But in practice, notebooks I really love, because when you provide a notebook, you can just provide a cell at the top which creates the environment you need.

So, for example, you can open any chapter of the fastai Deep Learning for Coders book directly on Colab by clicking on a link, without any installation. And the first line of the first cell installs everything you need, and away you go. So really, to me, I feel like notebooks make it much easier to ensure that you have something that's reproducible.

And you can also see what the programmer did step by step to really make sure that what you're seeing is what they were seeing. Look, you can certainly make bad notebooks. You can certainly provide bad reproducibility environments. But I don't think it's anything to do with notebooks themselves. You know, it's to me, this is an environment that actually makes that easier to do well.

So the other thing that Joel talked about quite a lot was this idea of good software engineering. And he made some pretty bold claims that good software engineering can't be done, or is extremely hard to do, in notebooks. And he used these characters quite a lot, these Smurfs. And basically, you know, he's saying, like, you should all follow the rules of good software engineering.

But, you know, it's kind of like this idea that you should copy and paste code and outputs into Slack. You know, that's how people might have done things before. But maybe the rules of software engineering in a dead coding environment or in a line-oriented REPL or whatever are not the same, particularly compared to a dynamic language in a live coding environment.

And also the rules for a data scientist who's doing research and their focus is on speed of iteration and on rapidly eyeballing visualizations to see whether their, say, their microscopy images are actually getting easier to see or harder to see. To take an example of a project I've been involved in a lot recently.

These are kind of going to be different to the rules, the so-called rules of somebody who's creating a CRUD app or an e-commerce app to send a payment to a Stripe API. So I think we've got to be careful about the idea of rules and think about domains and domain expertise and environments.

So here's another slide from Joel. And his concern was that notebooks are not good for modularity. And he's giving an example here of some of his code, which he's saying is very nicely modular. I mean, sure, but why can't you do the exact same thing in notebooks?

In fact, with fastai, the library I told you about that we wrote entirely in notebooks, the modularity is so good that we have a peer-reviewed paper about the approach to modularity that we took and about the kind of decoupled API that we created. I'm sure it's not perfect, but a lot of people have used it and have liked it, and people are studying it as an example of modularity.

It's definitely not the case that notebooks somehow make it impossible or even difficult to create modular code. I'd say the same thing about testability. I don't know if this is from Joel's tests. I guess it probably is. Again, this is one of his slides. He's showing here examples of tests.

Tests are great, but in this kind of regular approach to coding and these dead coding environments, the tests live separately to the code that's being tested. And it's very easy for somebody to look at the code and not even notice the tests or they have to kind of flick back and forth between the two.

And it's not easy to connect which test is really working on which part of the code. Whereas in notebooks, and with nbdev in particular, the tests live right next to the thing they're testing. And they'll include prose explaining what it is they're testing. So here we've created a thing called an unbuffered server.

I think it was in the cell above the one I took a screenshot of here. And so here I've created a test handler to test it that sends a response and writes OK. And here's something that checks whether that starts a server and then checks whether it actually receives that OK or not.
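A test of this shape can be sketched with nothing but the standard library. This is not the fastwebhook code from the slide: `OkHandler` and `fetch_from_test_server` are hypothetical names, and a plain `HTTPServer` stands in for the unbuffered server mentioned above.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class OkHandler(BaseHTTPRequestHandler):
    """Test handler: answer every GET with a 200 response and the body b'OK'."""

    def do_GET(self):
        body = b"OK"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the test output quiet


def fetch_from_test_server():
    """Start a throwaway server, make one request, and return the response body."""
    server = HTTPServer(("127.0.0.1", 0), OkHandler)  # port 0: the OS picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        body = conn.getresponse().read()
        conn.close()
        return body
    finally:
        server.shutdown()      # stop serve_forever in the background thread
        server.server_close()  # release the socket
        thread.join()


result = fetch_from_test_server()
assert result == b"OK"
```

In a notebook, the handler, the helper and the final assertion would each sit in their own cell right under the code they exercise, with a sentence of prose in between.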

So it's really nice having the test in the notebook, and then nbdev provides a way to run all the tests across lots of notebooks and report on the overall result. And that can be run in continuous integration, and nbdev gives you that out of the box. If you use the nbdev template, you get this kind of continuous integration testing for free.

You don't have to do anything. It just works, which I think is super cool. Another of Joel's concerns from his slides is that notebooks somehow encourage a less sophisticated approach to learning. So: hit shift-enter to execute a cell and go to the next one. Maybe people just do that without thinking.

I mean, it's possible people could do that. I would say even that is better than people just reading the text and having nothing to do. But as I described, what we actually do is we have a little script that just removes all of the prose except for headings, and all of the outputs.

And then we give this to the student and then they can run through each one. And before they run it, we say, have a think about what this is going to output and think about why and think about why we're doing it. And then if you guessed wrong or figured wrong, you can actually experiment because you're in a live coding environment here.

So you can actually see, well, where did that go wrong and what actually happens? So I actually think this is a great way of learning. And a lot of our students have told us they think it's a great way of learning. I don't think I've ever heard anybody say that this ability to work interactively in this environment is decreasing their ability to learn.

So another thing that Joel said, and gave a few examples of, is that notebooks are way less helpful than his text editor, which in his case, we saw, is VS Code. He said some things are easier demonstrated, so I'm going to show the opposite of his demonstrations, which is that Jupyter is actually more correct and more helpful than his IDE.

So here's an example. Let's get the contents of a URL, and if it returns something valid, something truthy, then we'll return 'a', otherwise we'll return 1. So this is obviously going to return something truthy. So this should be a string. And as you can see, it's giving me completion for a string.

VS Code, same code, gives me completion for a number, not for a string: bit_length, casefold, conjugate. OK, so VS Code doesn't know. VS Code is doing the best it can, and it's pretty brilliant, you know, given that limitation. But it doesn't know. Jupyter knows, because you ran the code.

So it actually knows what you're working with, and because Python's a dynamic language, it supports this kind of dynamic introspection of what is actually inside b and what b can do. And so that's what Jupyter can use. So Jupyter is just really, really helpful, because it can be.
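The difference between guessing from the source and inspecting the live object can be sketched in a few lines. This is an illustration of the idea only, not how Jupyter's completer is implemented; `pick` and `public_names` are hypothetical helpers, and a literal string stands in for the downloaded URL contents so the sketch runs offline.

```python
from typing import Union


def pick(contents) -> Union[str, int]:
    """Return 'a' when contents is truthy, otherwise 1 (same shape as the example above)."""
    return "a" if contents else 1


def public_names(obj):
    """Names a completer could offer for obj, found by inspecting the live object."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in for the fetched page; anything non-empty is truthy.
b = pick("<html>pretend this came back from the URL</html>")

# From the source alone, b is "str or int". Once the code has run, b is simply a str.
print(type(b).__name__)
print("casefold" in public_names(b))
print("bit_length" in public_names(b))
```

A tool that only reads the source has to hedge between the two branches, while a tool that can call `dir` on the actual value can offer exactly the methods that will work.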

VS Code does the best it can, but it can never be totally correct. It would literally be impossible without it actually trying to match the same stateful approach as Jupyter, because Python is dynamic, because it's not fully typed. And even if you do use types, for something like b above you'd have to use a union type.

You still wouldn't actually know what the type is. So then Joel said, OK, here's what you could do to win me over and convert me to a notebook user. He said, give me IDE-style autocomplete. But as we discussed, IDE-style autocomplete is not the be-all and end-all.

It's actually not fully correct. Having said that, Jupyter also provides IDE-style autocomplete. If you give it types, then it will figure out what you mean. And if you give it functions like open that return a file, again, it will figure out what you need. So we have IDE-style autocomplete.

He said, give me real-time type checking and linting. OK, here is part of the fastcore library. As you can see, it's like a dozen lines of code, and it actually gives you real-time, actually correct type checking. So here you can see I'm calling foo, which takes an int and a string.

And if you pass it two ints, it's checking: it does in fact fail. And again, it can do this correctly only because it's in Jupyter, only because it's actually running the code. The approach that most people are taking to this kind of type checking is mypy. And mypy is not about 12 lines of code.

Mypy is about 100,000 lines of code. And it's complex code involving multiple different languages. And it's never going to be correct. It can't be fully correct because it's impossible for it to know exactly the types of all of your pieces of data because it's not actually running the code.

And Python is dynamic. With Python, the only way to know what something actually contains is to run the code. Also, mypy means that you have to tell Python what every type is. And honestly, every other language is moving towards auto-detection of types, figuring out what types are automatically.

Particularly early movers like F#. But nowadays, even stuff like Java, C#, C++, you can have like an auto type and it figures it out for you. Python is kind of moving in the opposite direction. And if you want to go the mypy static analysis, IDE approach, you're going to have to spend a lot of time doing manual typing.
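
As a rough illustration of the run-time checking idea described earlier (a foo that takes an int and a string), here is a sketch of a decorator that checks arguments against annotations when the function is actually called. The name checked is made up, and this is not the fastcore implementation; it only handles plain classes as annotations.

```python
import functools
import inspect
from typing import get_type_hints

def checked(f):
    """Check call arguments against simple class annotations at run time."""
    hints = get_type_hints(f)
    sig = inspect.signature(f)

    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; unions, generics etc. are skipped
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return f(*args, **kwargs)

    return wrapper

@checked
def foo(a: int, b: str) -> str:
    return f"{a}{b}"

ok = foo(1, "x")      # accepted
try:
    foo(1, 2)         # second argument is an int, not a str
    failed = False
except TypeError:
    failed = True
```

Because the check happens on real values at call time, it never has to guess a type.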

Another thing Joel said he wanted to see to win him over is a better story around dependency management. Sure. Why not? As I said, notebooks can already support all the same approaches that normal Python projects can handle. And nbdev makes it even easier. You can just add a line to your settings.ini with a list of requirements.

If there are some special ones for pip and conda, you can add those. Special ones for development time only, you can add those, and away you go. That will automatically make all of those things get installed for you when you run the notebooks. So we certainly have that. He also said he's looking for first class.

What is going on there? First-class support for refactoring code out of notebooks into modules. And I agree. This is absolutely critical. And this is really the key number one first thing that nbdev does. You start with some code like this. And again, this is some source code of nbdev.

nbdev, of course, is written in nbdev. It's a notebook. And then it automatically creates an actual Python project. So those all exist. Joel did not expect that to happen. He said, the reality is you're not going to provide me with all these things. And I'm not going to switch to notebooks.

So be it. So hopefully I've convinced you that there's no reason for you not to like notebooks, and that it's not the case that real software developers have to use other tools. But actually, notebooks really can be really great. Let me explain more about how and why this happens.

And to do that, I'm going to focus in particular on nbdev. And I've already mentioned the basic things that nbdev does for you. Let's look more at how that works and exactly what you need to do. So here is an example of code in a notebook. And you can see here that I've got an export comment.

So nbdev uses a small number, like two or three, of different special comments to tell it what to do. And this export says, make this part of my Python project. This doesn't have an export, so it's not part of the Python project.

Now, one of the things I like to do, and this is another thing that Joel talked about as being a problem for him with notebooks, he said it's hard to do, is splitting a class into separate cells. And actually, with the fast.ai libraries using nbdev and fastcore, it's not at all difficult to do. Here's a class, and I've just got the init in it here, and I can create it.

And then later on, I just use this patch decorator to add this method to this class. And so this is actually going to impact the documentation as well. The documentation of process comment will end up down here. And the documentation of class init, notebook processor init, it's going to end up up here.

And so it really helps the code reader understand things step by step. Each one has test examples kind of as it happens. And as you read through the documentation, you can see each piece one at a time. This is, to me, a really nice way to build up more complex classes.
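
Here is a small sketch of how a patch-style decorator can add a method to a class that was defined in an earlier cell. This is a simplified stand-in written for illustration, not fastcore's implementation, and the class and method bodies are made up; only the overall shape (init first, a method attached later) follows what is described above.

```python
import inspect

def patch(f):
    """Attach f as a method of the class that annotates its first parameter."""
    first_param = next(iter(inspect.signature(f).parameters.values()))
    setattr(first_param.annotation, f.__name__, f)
    return f

# "Cell 1": the class with just its init
class NotebookProcessor:
    def __init__(self, path):
        self.path = path

proc = NotebookProcessor("index.ipynb")  # usable before the other methods exist

# "Cell 2", later in the notebook: add one more method, next to its own examples
@patch
def process_comment(self: NotebookProcessor, comment):
    # Illustrative body: strip the leading "#" and surrounding spaces
    return comment.lstrip("# ").strip()

result = proc.process_comment("#export")
```

Since the method is set on the class itself, instances created before the patch pick it up too.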

All of the pieces of nbdev get built out of a single, simple little settings file, settings.ini. And it's really nice because you can provide all of the information just in one place. So rather than having a version number over here in __init__.py, and over there in setup.py, and over here in your documentation, you have it once and it's used everywhere.
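
A sketch of the single-source idea, assuming an ini-style settings file with a DEFAULT section. The key names used here (lib_name, version, description, requirements, dev_requirements) are illustrative assumptions and should be checked against the nbdev documentation; the point is only that every consumer reads the same values.

```python
from configparser import ConfigParser

# Illustrative settings text; the key names are assumptions, not a spec
SETTINGS_TEXT = """
[DEFAULT]
lib_name = mylib
version = 0.0.1
description = An example library
requirements = fastcore numpy
dev_requirements = pytest
"""

def load_settings(text):
    """Parse ini-style settings and return the DEFAULT section."""
    cfg = ConfigParser(delimiters=["="])
    cfg.read_string(text)
    return cfg["DEFAULT"]

settings = load_settings(SETTINGS_TEXT)

# The same values feed the package version, the install metadata and the docs
package_version = settings["version"]
install_requires = settings.get("requirements", "").split()
dev_requires = settings.get("dev_requirements", "").split()
docs_title = f'{settings["lib_name"]} {settings["version"]}: {settings["description"]}'
```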

Ditto for your description, ditto for the source of your documentation, and ditto for your repo information. It's just there in one place and then everything will use that for you. You don't have to put it in multiple places and think about how to maintain it and synchronize it. Talking of synchronization, not only can you start with a notebook and turn it into this code, which you can then open, in this case here I'm using Vim, or you can open it in VS Code or whatever.

You can edit it in your editor, like Vim or VS Code, and it will sync it in the opposite direction too and update your notebook. And so some things are easy to do in editors, particularly kind of search and replace across multiple files and stuff like that. Or if it's an unfamiliar code base, it's nice to use tags to kind of jump across between files.

You can edit as you go and then synchronize back to the notebook. So then how does the synchronization work? Well, there's two ways you can do it. You can either put this as the last cell in each notebook, notebook2script, and that will take the notebook you're working in and all the other notebooks and convert them into modules. Or at the command line, you can run nbdev_build_lib.

And so I have this in every notebook that I use because it's kind of nice to stay in the notebook environment. This is more something I tend to do as part of my release process. There's a lot of little niceties that nbdev tries hard to provide for you, to kind of make your code as correct and as close to best practices as possible, at least kind of our view of best practices.
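
To give a feel for the notebook-to-module direction, here is a rough sketch, assuming code cells are marked with a leading #export comment and that the notebook is available as a dict in the .ipynb JSON layout. The function export_cells is hypothetical and much simpler than what nbdev does (no module names, no __all__, no syncing back).

```python
def export_cells(nb):
    """Collect the source of code cells whose first line is an #export marker."""
    exported = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        lines = "".join(cell["source"]).splitlines()
        # Accept "#export" and "# export" as the marker on the first line
        if lines and lines[0].strip().replace(" ", "") == "#export":
            exported.append("\n".join(lines[1:]).strip())
    return "\n\n".join(exported) + "\n"

# Made-up notebook: one exported cell, one test cell, one prose cell
nb = {"cells": [
    {"cell_type": "code", "source": ["#export\n", "def f(): return 1"]},
    {"cell_type": "code", "source": "assert f() == 1"},
    {"cell_type": "markdown", "source": "# Title"},
]}
module_source = export_cells(nb)
```

Only the marked cell lands in the module; the test cell stays in the notebook, where it serves as both a test and documentation.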

One of the best practices that we think is important is __all__ (dunder all). __all__ is the thing that Python provides for you where you get to list what the exported symbols in your module are. If you don't provide __all__, and nearly nobody that's not an nbdev user provides it, then it exports all the symbols, or rather anything without a leading underscore: not just the ones that you've actually directly typed in as your code, but everything you import also gets exported, and that very quickly can lead to namespace pollution.

But with an nbdev module, because we automatically create an __all__ for you, which includes only the things that you requested be exported, that means that if you look at the imports, for example, from fastcore.transform, which is part of an nbdev library, there's just stuff from fastcore.transform. Whereas if you look at something from allennlp.nn.util, you get copy, json, logging, defaultdict, etc.

This is not stuff created by allennlp.nn.util. And so because this is built using the traditional VS Code approach, it really is too much work to manually create __all__. So the AllenNLP folks don't do it, just like pretty much every other Python library. Not all of them: Tk, for example, which comes with Python, does define __all__, which is nice, but I don't know very many non-nbdev projects that do.
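
The effect of __all__ on a star import can be checked directly. This sketch builds two throwaway in-memory modules (the module names and the to_text function are made up for the demonstration) and lists what `from module import *` brings in for each.

```python
import sys
import types

def star_import(source, name):
    """Build a module from source text and list what `from name import *` exposes."""
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    sys.modules[name] = mod          # make the module importable by name
    namespace = {}
    exec(f"from {name} import *", namespace)
    # Drop __builtins__, which exec adds to the namespace
    return sorted(k for k in namespace if not k.startswith("__"))

body = "import json, logging\ndef to_text(x): return json.dumps(x)\n"

without_all = star_import(body, "demo_without_all")
with_all = star_import("__all__ = ['to_text']\n" + body, "demo_with_all")
```

Without __all__ the imported json and logging modules leak into the caller's namespace alongside to_text; with it, only to_text comes through.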

So here's another nice thing with the documentation. In the docs, you can just put your symbols in backticks, and then when you create the docs, which, again, is automatic and can be part of your CI system (in fact, it is by default), you can see it actually creates hyperlinks.

So nbdev knows how to actually look up each of these symbols and hyperlink to them, even things like this, which are part of different libraries. So this is a really nice feature which allows you to help out your users so that they can see exactly what you're talking about by jumping to other parts of the docs.
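
A minimal sketch of the backtick lookup idea, assuming a plain dict that maps known symbol names to documentation URLs. The function link_symbols, the index contents, and the example.com URL are all made up for illustration; nbdev's own lookup is more involved.

```python
import re

def link_symbols(text, index):
    """Turn `name` into a markdown link when name is a known symbol."""
    def replace(match):
        name = match.group(1)
        url = index.get(name)
        # Unknown names (parameter names, for example) are left untouched
        return f"[`{name}`]({url})" if url else match.group(0)
    return re.sub(r"`([^`]+)`", replace, text)

# Hypothetical symbol index with a placeholder URL
index = {"Transform": "https://example.com/docs/transform#Transform"}
linked = link_symbols("Use `Transform` with the `enc` parameter.", index)
```

Names that are not in the index simply stay as plain backticked text.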

And of course, some things shouldn't be hyperlinked, like these: this is a parameter name, and so those will not end up hyperlinked. So the documentation which gets built for you supports all the kinds of features you might imagine: a hierarchical menu to take you to any part of the documentation pages, a table of contents for each page, you can have badges, Open in Colab, headings, links, all that kind of stuff.

So the documentation comes out pretty nice, I think. So here's what happens: you just run nbdev_build_docs, and it takes a second or so, it's all done in parallel. Or you can have something like a GitHub Action, or whatever continuous integration system you use, and call the fast.ai workflows build docs GitHub Action.

So then, you can open those docs directly as a notebook. And one of them is special, which is the one called index.ipynb. index.ipynb will automatically be turned into a README.md for you as well. So no more worrying about trying to keep your files synchronized to make sure that your homepage and your README are saying the same thing.

We actually do that for you automatically. We also, of course, make sure that it's not only the notebook, but the homepage on your documentation website, and even your PyPy and Conda descriptions will all end up showing you the same information from your index notebook. So in this way, because we're just saying build stuff in one place, do it once, and then we'll make sure that everything syncs up for you.

That makes it trivially easy to create really nice user experiences for your users. So for me, even when I create tiny, simple little projects, I always do them in nbdev, because that way I know that I can, you know, in a minute or two, provide installable libraries and documentation just in case anybody else is interested in using my work.

And often I find, you know, even for stuff that I think is pretty niche, there's always a few people who are interested in using it too. Here's an example, actually: fastwebhook, which I mentioned before. It's really just written for myself. For fast.ai, I wanted a webhook that would send out a tweet any time there was a release, so I did it.

I wrote it in like two hours, I guess, and then I just hit make release. And because I made it from the nbdev template, it automatically created the conda package and the PyPI package for me and everything was all set up, which is really nice. One of the challenges with working with notebooks in version control is you can get some really ugly diffs that won't even load in notebooks.

nbdev will actually ensure that those diffs are turned into what I would call a notebook level diff, which is to say it always ensures that your notebooks can be opened. If there's a difference only in cell outputs, it just ignores them and just picks one because, you know, you can just rerun it.
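
The "ignore differences that are only in outputs" rule can be sketched like this, assuming both versions of the notebook are available as dicts in the .ipynb JSON layout. The helper names are hypothetical and this is not nbdev's merge code.

```python
def without_outputs(nb):
    """Reduce a notebook dict to (cell_type, source) pairs, ignoring outputs."""
    return [(c["cell_type"], "".join(c["source"])) for c in nb["cells"]]

def differs_only_in_outputs(a, b):
    """True when two notebook versions differ, but not in any cell's source."""
    return a != b and without_outputs(a) == without_outputs(b)

# Two made-up versions of the same notebook with different stored outputs
mine = {"cells": [{"cell_type": "code", "source": "import time; time.time()",
                   "outputs": [{"text": "1600000000.1"}]}]}
theirs = {"cells": [{"cell_type": "code", "source": "import time; time.time()",
                     "outputs": [{"text": "1600000999.9"}]}]}

safe_to_pick_either = differs_only_in_outputs(mine, theirs)
```

When this is true, either side can be kept, since rerunning the notebook regenerates the outputs anyway.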

If there's actually a difference, you know, in a cell, two people have changed the same cell, then it will actually show you the diff tags in a notebook, so you can open it up in Jupyter and fix it up. All of your tests run in parallel with nbdev_test_nbs, using as many cores as you have.
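
As a simplified sketch of running notebooks as tests, the following executes every code cell of each notebook from top to bottom in a fresh namespace and runs the notebooks concurrently. It assumes notebooks as dicts containing plain Python only (no IPython magics), uses threads rather than separate processes to stay short, and the function names are made up; it is not how nbdev_test_nbs is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def run_notebook(nb):
    """Run all code cells in order; return None on success or an error summary."""
    namespace = {}
    try:
        for cell in nb["cells"]:
            if cell["cell_type"] == "code":
                exec("".join(cell["source"]), namespace)
    except Exception as err:
        return f"{type(err).__name__}: {err}"
    return None

def check_notebooks(notebooks):
    """Run each named notebook concurrently and map name -> result."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_notebook, notebooks.values()))
    return dict(zip(notebooks.keys(), results))

# Two made-up notebooks: one passes, one has a failing assertion
notebooks = {
    "good": {"cells": [{"cell_type": "code", "source": "x = 1"},
                       {"cell_type": "code", "source": "assert x == 1"}]},
    "bad": {"cells": [{"cell_type": "code", "source": "assert 1 == 2"}]},
}
results = check_notebooks(notebooks)
```

A notebook passes only if every cell runs without raising, which is the top-to-bottom guarantee being described here.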

So this is a great way to ensure that every notebook runs from the top to the bottom and has the actual outputs that you're expecting. Lots of nice little pieces like math equation support: all the LaTeX equations work nicely. You use it in your markdown and it pops up both in your notebooks and in your documentation.

We're using KaTeX, which is a really nice, fast library for that. And there are other things that we power as well, not just publishing libraries: nbdev also powers fastpages, which is an increasingly popular blogging system where you can write Jupyter notebooks and it turns them into a blog.

And this is really nice for anybody who is often trying to communicate technical content involving equations and/or code, visualizations. No more copying and pasting just to send to Medium, or copying and pasting, you know, plot outputs into files. When you can do the whole thing in a notebook, there's nothing to think about.

It just works, which makes life very easy. And as we discussed earlier, fastdoc takes notebooks and turns them into publication-quality books. So I hope that you might give it a go and see why I like Jupyter notebooks. You can just go to nbdev.fast.ai, which is, of course, an nbdev-powered documentation site, and you can just click a button and it will create your nbdev repo for you and you can get started.

Thanks so much for watching, and I hope that you try this out and find that you like Jupyter notebooks too. Thanks.
