J.J. Allaire (RStudio) and Jeremy Howard (fast.ai): "2-way AMA"
Chapters
0:0
12:9 Pandoc
18:58 Interactive Documents
19:2 Integration with Observable and Jupyter
25:55 High Level Api
27:20 Mid-Tier Api
27:36 The Data Blocks Api
29:9 Computer Vision Library
51:45 Michael Stonebraker
65:27 Interactive Computing Metaphor
77:13 Reproducibility
78:46 The Prime Directive
79:18 Making Science More Accessible
92:22 Choosing What Problems To Work On
93:9 Do You Work at an Office or You Work from Home
00:00:00.000 |
All right. Hi, everybody, and welcome. I am here with JJ Allaire. My name is Jeremy Howard, 00:00:06.720 |
and we are having what I originally was very proud of myself for inventing the idea of a 00:00:12.960 |
two-way AMA. I wondered why other people haven't come up with this idea, and then I realized, 00:00:18.560 |
oh, I think I just invented another name for a conversation. So this is either a conversation 00:00:23.280 |
with JJ Allaire or a two-way AMA. We'll see if it turns out to be any different. 00:00:28.720 |
So, good day, JJ. Thanks for joining. Great being here. 00:00:32.960 |
So I always like to find out a little bit about people's environs. Where are you 00:00:42.800 |
talking to us from today? I'm talking from my home in Newton, Massachusetts. 00:00:51.760 |
And where is that? It's like, it's kind of due east, due west of the city, sorry, 00:00:59.440 |
maybe 15 minutes outside of downtown. So it's an inner ring suburb. 00:01:06.000 |
And why there, is this where you've always been? What's your favorite place in the world? 00:01:11.920 |
Not even close. No, I started in, I grew up in, I was in Philadelphia till I was 13, 00:01:17.760 |
and then I moved to Minnesota. I was there for a long time, till I was about 30. 00:01:22.000 |
And then I moved to Boston because of work, because of the company I was working with, 00:01:30.640 |
and then I just ended up staying here. And then I'm here because the public schools here are great, 00:01:38.240 |
and so I probably, all other things considered live in the city, but for the time being, 00:01:43.680 |
I want to take advantage of the public schools. Boston's a great town. I spent quite a bit of 00:01:49.840 |
time there when I set up an office in one of my earlier companies there, and I got very into the 00:01:55.440 |
Boston Red Sox, as you do, and got very into the Tennis and Racquet Club, and I have a very good 00:02:05.280 |
friend. I used to be a member of Tennis and Racquet, and I have a very good friend there who still 00:02:09.520 |
plays there quite a bit, so that's a gem. Yeah, that's a big town. And also, I loved it, 00:02:16.000 |
because I didn't know anybody. It's definitely a town where you could just go to a random bar, 00:02:21.200 |
sit down, watch the game, and whoever's around you will chat. It's interesting to talk to. Yeah, 00:02:27.840 |
that's true. So you end up, you're in Australia now. I am in Australia now. And you were in 00:02:33.360 |
Australia before. Yeah, so I'm in Queensland, which is a kind of, well, the part I'm in is kind of a 00:02:44.480 |
subtropical beachside town. It's not a resort town, but it's kind of like the nearest capital 00:02:56.240 |
city is Brisbane, which is like four or five million people, and it's the nearest kind of 00:03:01.760 |
beach town that people would go to for a weekend or something. So yeah, I always wanted to live in 00:03:10.240 |
Queensland. Never understood why everybody didn't want to live in Queensland, and now that I'm here, 00:03:16.960 |
I'm even more convinced everybody should live in Queensland. And the university is in Queensland, 00:03:21.920 |
too. Yeah, so the university's in Brisbane, so I'm an honorary professor at the University of Queensland, 00:03:27.760 |
which is 45 minutes from here. So when I teach there, I just drive in and do my thing, drive home. 00:03:37.120 |
Cool. And in between, you were in Boston and California. 00:03:42.080 |
Yeah, so I grew up in Melbourne, which seemed like the center of the world at the time, 00:03:51.200 |
and I never understood why people talked about Australia as being far away, because it seemed 00:03:55.120 |
pretty close to me. But then, yeah, moving to San Francisco, I stayed there for 10 years for a 00:04:02.160 |
previous startup called Kaggle, and suddenly realized, yeah, God, Melbourne is a very long 00:04:08.800 |
way away, physically and everything else. And I do now feel like it's a very good experience 00:04:18.720 |
for somebody who grows up away from a kind of intellectual center like that, just spend some 00:04:27.920 |
time living in one, just to experience that. Yeah, for sure. And yeah, so tell me about what you're 00:04:36.240 |
doing now. So you're the CEO of RStudio. RStudio, which I started that about 11, I don't know, 00:04:46.080 |
11 or 12 years ago. And that was started off originally as just an open source 00:04:51.680 |
IDE for R. And it was just it was not actually intended as a company. It was just me and one 00:04:58.880 |
other person. And we had worked on lots of development tools and programming languages 00:05:03.440 |
and authoring tools and in our previous lives. And I had been involved in graduate school and 00:05:11.840 |
as an undergrad in social sciences and statistical programming, social sciences. And I sort of 00:05:16.800 |
originally that was what I wanted to make my career. And then I kind of got swept up into 00:05:22.400 |
software. And so when I finished with the startup, and then I found out about R, and I said, wow, 00:05:30.320 |
there's an open source statistical programming system. That's cool. I really would like to work 00:05:33.680 |
in open source. And it sort of, as you know, is written by statisticians for statisticians, 00:05:39.600 |
which gave it a lot of things, you know, a lot of things that got right. But then some of the 00:05:43.920 |
software tooling part they struggled with. And so I said, well, here's I can make a contribution 00:05:49.840 |
here. I know the tooling part. And and I'd like to see this project get used by more people. So we 00:05:55.760 |
just started working on that on the IDE. And then so that was just a couple of us. And then 00:06:01.680 |
long story short, we ended up getting to know Hadley Wickham. And he was working on what was 00:06:07.280 |
then not the tidyverse, but the dplyr and ggplot and things. And we said, let's let's all work 00:06:13.200 |
together. And then that sort of begged the question of well, how, how is it that we're going to all 00:06:17.120 |
work together and make sure everyone gets paid and everything. So he said, well, let's try to make a 00:06:20.400 |
company out of this. And we did that by sort of, you know, building sort of enterprise grade, 00:06:28.800 |
sort of servers that made it easier to adopt a lot of our open source software. 00:06:36.800 |
Yeah, I knew Hadley from before RStudio, because of course, everybody in Australia, 00:06:41.520 |
New Zealand knows each other. So yeah, I remember actually hanging out with him in in Texas. And he 00:06:49.440 |
was in at Rice University. Yeah, that's right. Yeah. And he was already famous for his amazing 00:06:56.880 |
contributions. And he was saying to me, you know, saying like, wow, you know, the university must 00:07:02.720 |
love having somebody like you there. And it's like, no, quite the opposite. You know, they don't 00:07:08.240 |
appreciate it at all. And I got all to get support for what I'm doing. Yeah. And I just like short 00:07:14.480 |
shit, you know, what a what a terrible thing about academia is going on here. I know. And so glad 00:07:21.760 |
when he found you here, you know, you found him and he found you and that worked well. 00:07:26.560 |
Yes, it has. So that's good. And then the company has developed well. And so that's, 00:07:32.160 |
you know, afforded one of the projects I worked on was R Markdown, which is kind of a 00:07:39.040 |
literate programming system for R, and I actually started working on that about 10 years ago. 00:07:44.240 |
And, and that we had a lot of success with that. But it was like, very, it was quite narrow in a 00:07:51.840 |
sense. And that it was like, why did you do that? You had some previous interest in literate 00:07:56.560 |
programming. You know, honestly, I there were two things that happened. I there were a bunch of the 00:08:03.600 |
I was working with a bunch of faculty who were teaching R, and they were teaching, 00:08:08.080 |
they were everybody was trying to, well, they were teaching at the time Sweave, 00:08:11.680 |
which was this sort of LaTeX-based literate programming environment that was built into R, 00:08:16.960 |
and they were doing that because they wanted to teach people R programming and reproducible 00:08:20.560 |
workflow, but that meant they were teaching them LaTeX, which was really, but that's unusual already, 00:08:25.120 |
right? Like not many people. That's unusual. R had this thing, Sweave, built into it, 00:08:29.600 |
you know, in like 2007 or so, I mean, they were way ahead of, or even before like, so, so R 00:08:37.520 |
always had this sort of in the community. And it was, it was actually one of the core members of 00:08:42.480 |
the R team who built this Sweave thing, so they're pushing this literate programming idea. So I kind 00:08:46.640 |
of got infected with it by exposure to that. And then at the same time, I went to useR! in 2012 00:08:53.120 |
in England. And the, one of the people who presented gave a three-hour seminar 00:09:00.320 |
on org mode and presented another system for literate programming that was more, you know, 00:09:07.680 |
human readable ASCII oriented. And so just to clarify for people who haven't seen it. So org 00:09:12.400 |
mode is an Emacs mode. It's not just a mode, but it's also a file format, which is in many 00:09:18.400 |
ways a lot like Markdown. It's not at all compatible, but it's the same basic idea of text-based, 00:09:25.040 |
you know, format, but also in org mode, your code can kind of be evaluated and the 00:09:32.720 |
results of the execution appear in the document. So it has a lot of like what R Markdown 00:09:38.400 |
is, right? It's kind of like executable code, the outputs appear. That's exactly right. So, 00:09:44.320 |
so it's sort of like this idea. Well, we've got Sweave, but it's really hard to teach people 00:09:47.680 |
LaTeX. Some people were saying, well, is there a way we could get this into Office? Can we get, 00:09:51.200 |
can we do this with OpenDocument? How are we going to get people to do this 00:09:55.280 |
while not burdening them with learning LaTeX? When I saw org mode, I said, wow, 00:09:59.440 |
that's a better idea to me, a more human-readable, ASCII-based idea. But at the time, 00:10:06.320 |
Markdown was already really taking off and it was already in use on GitHub. It was in use in 00:10:11.760 |
a bunch of wiki systems. And so I said, let's, let's take the, the core ideas of org mode and 00:10:17.360 |
Sweave and build a Markdown variant of that. And, and I did it with R because that's the, 00:10:22.960 |
that's the environment I was working in. It was just, it had sort of blinders on like, 00:10:26.720 |
let's just make this work in the environment. And you were personally like doing stuff with 00:10:31.840 |
literate programming yourself and it found it useful or your, I found it useful because I was 00:10:35.680 |
building websites and documents. And yeah, I, I definitely was thought this is a great way to 00:10:40.080 |
work. And then, and at the same time, Yihui Xie had created a package called knitr 00:10:47.760 |
that was sort of a replacement for Sweave. It was sort of a better, like sort of feature enhanced 00:10:53.120 |
version of Sweave. And at the same time he made it open so it could do reStructuredText and it could 00:10:57.760 |
do AsciiDoc and it could do Markdown. And so he, Yihui and I got together and said, let's create 00:11:04.960 |
this thing called R Markdown, which basically says we're going to use knitr as a computational 00:11:09.120 |
engine and we're going to use Markdown. At the time, it was just like, we basically use sundown, 00:11:14.240 |
which was GitHub's Markdown processor. And we added math, you know, so that was pretty straightforward. 00:11:20.480 |
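A minimal sketch of the idea described here: a computational engine runs the code chunks in a Markdown document and places each chunk's output right after it. This is not knitr or R Markdown. The `weave` function, the chunk pattern, and the sample document are hypothetical, and Python's `exec` stands in for the engine.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence

# Assumed chunk syntax: a fenced block whose opening line is tagged {python}.
CHUNK = re.compile(
    re.escape(FENCE + "{python}") + r"\n(.*?)\n" + re.escape(FENCE),
    re.DOTALL,
)


def weave(document: str) -> str:
    """Run every {python} chunk and append its printed output after the chunk."""
    env = {}  # one namespace shared by all chunks, like a session

    def run_chunk(match):
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1), env)
        output = buffer.getvalue().rstrip("\n")
        # Keep the source chunk, then add a plain fenced block with the output.
        return f"{match.group(0)}\n\n{FENCE}\n{output}\n{FENCE}"

    return CHUNK.sub(run_chunk, document)


doc = "\n".join([
    "# Report",
    "",
    FENCE + "{python}",
    "total = sum(range(5))",
    "print(total)",
    FENCE,
    "",
    "Done.",
])
woven = weave(doc)
print(woven)
```

The woven text is still plain Markdown, so any Markdown processor can take it from there, which is the division of labor described above.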
These tools all have R in them. Are they all exclusively R tools? 00:11:25.760 |
They, they are, they require R to run. They're pretty much R. They now, they're multi-engine, 00:11:33.440 |
so knitr has this idea of engines. So there is a Python engine and a Julia engine, 00:11:37.840 |
but you're, you're calling Python from R. You have an embedded Python session in your R session. You 00:11:43.920 |
have an embedded Julia session in your R session. So like, it's very R-centric, even though it's 00:11:48.560 |
multiple languages, it's very R-centric. So, so yeah. And then, so we did the first iteration of 00:11:55.280 |
it and you could just make, you could just make web pages. And then at the same time, Pandoc was 00:11:59.760 |
kind of evolving and people were trying to figure out, they were like, oh, let me just glue together 00:12:03.920 |
R Markdown with Pandoc and then I can make Word documents and PDFs and so on. That's, that's going 00:12:07.840 |
to be something a lot of people are not familiar with. So Pandoc is a, basically a Markdown processor. 00:12:13.680 |
It's, it's, I think it's written in Haskell, right? Although it's a compiled binary, so that doesn't 00:12:19.520 |
matter for most people. And yeah, it's kind of like a pretty, I mean, it is a kind of a Markdown 00:12:26.000 |
processor, but it can take almost any input and convert it into Markdown and then convert that 00:12:30.720 |
into one of its output formats. Any text to any text. Yeah, it doesn't actually even convert it to 00:12:36.560 |
Markdown. It converts it to an internal format. That's a sort of abstract document. And so like, 00:12:45.200 |
if you're going Word to PDF, it's never seeing Markdown. It's just going. Yeah. So JJ, I had used 00:12:51.520 |
Pandoc before talking to you about all this stuff, but I had used it in this very kind of naive way 00:12:59.600 |
of just being like, Oh, I've got a document, you know, HTML document, and I want to convert it to 00:13:04.480 |
LaTeX or we're going to convert LaTeX to Markdown or whatever. And I just run it. Now, what I've 00:13:09.520 |
learned from you is that actually, you know, Pandoc has this like embedded Lua interpreter 00:13:18.320 |
and this kind of very generic system, kind of a bit like nbconvert for notebooks. Yeah, that's 00:13:25.120 |
right. Yeah. It takes the input as a kind of abstract syntax tree. You can munge it however 00:13:32.480 |
you like. You spit it back out. You can fit that anywhere in a Pandoc path to kind of construct 00:13:39.600 |
your own. It's like a document. It's a pipeline of transformations to the document. And the most 00:13:49.680 |
obvious of which is I just want to make a PDF or I just want to make a Word document or a web page, 00:13:54.240 |
but there's other. Yeah. And the other thing to mention is, I mean, as you said, it doesn't 00:13:58.160 |
particularly require Markdown, but you know, by the way, you know, Pandoc Markdown is this 00:14:04.480 |
fairly universal format because you can express things like divs and classes and layouts. 00:14:10.960 |
And then, yeah, the Markdown syntax, you can express the whole Pandoc AST in Pandoc Markdown. 00:14:16.640 |
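The transformation step described here can be sketched as a walk over the document tree. The sketch below operates on a hand-written dictionary shaped like Pandoc's JSON AST, where each element carries a type tag `t` and contents `c`. The `walk` and `shout` names and the sample document are hypothetical, and the API version number is only illustrative.

```python
import json


def walk(node, transform):
    """Rebuild the tree, letting `transform` replace any tagged element."""
    if isinstance(node, list):
        return [walk(item, transform) for item in node]
    if isinstance(node, dict):
        if "t" in node:
            replaced = transform(node)
            if replaced is not None:
                return replaced
        return {key: walk(value, transform) for key, value in node.items()}
    return node  # strings, numbers, and so on pass through unchanged


def shout(element):
    """Example transformation: upper-case every piece of plain text."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return None  # returning None means "leave this element as it is"


doc = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "hello"},
            {"t": "Space"},
            {"t": "Emph", "c": [{"t": "Str", "c": "world"}]},
        ]},
    ],
}
result = walk(doc, shout)
print(json.dumps(result["blocks"]))
```

A script like this becomes a real JSON filter once it reads the tree from standard input and writes the result to standard output, so Pandoc can run it between its reader and writer. The embedded Lua filters mentioned above do the same job without leaving the Pandoc process.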
Yeah. So it's a kind of a Markdown on steroids. One of the ideas that, though, that was taken 00:14:22.400 |
really seriously by John MacFarlane when he created Pandoc was. So the original Markdown had the idea 00:14:30.000 |
of raw HTML because John Gruber's idea was like, this is just an easier way to write 00:14:33.920 |
HTML. So of course you can put raw HTML in there. If something isn't in Markdown, just go ahead 00:14:38.560 |
and add the HTML. So that's a good idea. But what he had, and he was interested in creating 00:14:44.720 |
technical manuscripts. So he extended that to, you can put raw LaTeX in there. And so he basically 00:14:50.800 |
said also you can have raw LaTeX. And he made it so it was very good at generating LaTeX. 00:14:56.080 |
So he sort of added this. Yeah, because there's also Pandoc citations, for example. And citations, 00:15:01.120 |
right? So he added this idea of let's take LaTeX really seriously in a way that other Markdown 00:15:05.920 |
processors tend not to, because that's not really their use case. A lot of them are tied 00:15:10.000 |
to like content management systems and things that are producing web content. And then let's take 00:15:14.720 |
citations really seriously. So they had a really robust implementation of citations and 00:15:20.400 |
integration with citation style language. So really first class citations and support of LaTeX, 00:15:27.120 |
and then ultimately support of Office document formats and open document and things like that. 00:15:32.640 |
So it was a more elaborate, comprehensive, hackable version of Markdown. So when we migrated, 00:15:42.640 |
what we created, sort of R Markdown v2, was based on Pandoc. And then-- 00:15:48.640 |
That was a couple of years. That was about eight years ago. 00:15:52.640 |
Pretty early on we moved to Pandoc, maybe even nine years ago. Pretty early on we just moved to 00:15:58.240 |
Pandoc. And then kind of to make a long story short, we created a lot of extensions to R Markdown. 00:16:05.440 |
We created a thing for making books, and we created a thing for making blogs, and we created 00:16:08.640 |
a thing for presentations, and for kind of like fancy grid layout of documents. And so we had all 00:16:16.800 |
these-- we did a version of the Distill Machine Learning Journal from Google. If you've seen those 00:16:21.360 |
articles, we made an R Markdown version of that. So we sort of innovated a lot in a very fragmented 00:16:27.760 |
way. And so we ended up at the end of this with-- we have this system that has a lot of functionality 00:16:34.000 |
that's fractured across a bunch of packages with a bunch of inconsistency that's R only. 00:16:38.400 |
And so we said that is kind of a dead end in terms of having a bigger impact on scientific 00:16:44.880 |
computing. And so we said, if we could take a step back, build a system that was agnostic to the 00:16:50.640 |
engine, the computational engine, and at the same time try to roll up a lot and synthesize a lot of 00:16:57.360 |
the ideas that we developed over that 10-year period into kind of one uniform system, then that 00:17:03.200 |
would be kind of what we needed to do to really like continue investing in a way that we felt like 00:17:07.760 |
this project is going to be meaningful in decades. So we kind of-- it was almost like take a couple 00:17:13.040 |
steps back. And that was a couple of years ago. We said, let's start working on Quarto, which is a 00:17:20.160 |
language-independent, engine-agnostic system where the first two engines supported are knitr, which was 00:17:26.400 |
what we supported in R Markdown, and Jupyter. And so those are sort of equal citizens. And it is 00:17:32.960 |
possible to-- Let me just get that up. So yeah. OK, so here's Quarto. OK, so this is what you're 00:17:42.480 |
working on. That's what I'm working on now. So that's pretty much what I've been working on for 00:17:46.800 |
directly or indirectly for about the last three years. And this looks a lot like Markdown. 00:17:57.440 |
It does. Yeah. It is derivative. Its syntax and approach to things are derivative of R Markdown. 00:18:05.360 |
And so you've got some YAML front matter, so some metadata, which is supported by Pandoc, I believe. 00:18:12.960 |
Then you've got some Markdown. This looks like something that's not in any Markdown I'm familiar 00:18:21.920 |
with. That's right. That's a cross-reference. OK. So it's saying I want a reference-- 00:18:28.000 |
Here is the label. And so now we've got, as a result, the Markdown here, the metadata here. 00:18:37.440 |
The code is also folding. And I guess I can't click on this picture if I could. And a hyperlink. 00:18:47.600 |
This was all the cross-reference. It's numbered the figure. It's only figure one. But if there 00:18:52.080 |
were 17 figures, you'd see one, two, three, four, et cetera. So that's kind of the idea. 00:18:57.120 |
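The numbering behavior described here can be sketched in a few lines: collect figure labels in document order, number them, and rewrite each reference. This is a toy and not Quarto's implementation. The `{#fig-...}` label and `@fig-...` reference forms are assumptions modeled on the demo, and `resolve_figures` is a hypothetical name.

```python
import re

LABEL = re.compile(r"\{#(fig-[\w-]+)\}")  # assumed label form: {#fig-name}
REF = re.compile(r"@(fig-[\w-]+)")        # assumed reference form: @fig-name


def resolve_figures(text):
    """Number labeled figures in order of appearance and rewrite references."""
    numbers = {}
    for match in LABEL.finditer(text):
        # The first time a label is seen, it gets the next figure number.
        numbers.setdefault(match.group(1), len(numbers) + 1)

    def replace_ref(match):
        label = match.group(1)
        if label not in numbers:
            return match.group(0)  # unknown label: leave the text untouched
        return f"Figure {numbers[label]}"

    return REF.sub(replace_ref, text), numbers


source = (
    "See @fig-loss and @fig-acc.\n\n"
    "![Loss](loss.png){#fig-loss}\n\n"
    "![Accuracy](acc.png){#fig-acc}\n"
)
resolved, numbers = resolve_figures(source)
print(resolved)
```

Because the numbers come from the order of the labels, adding a figure earlier in the document renumbers every reference without any manual edits.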
It has interactive documents as well. Yep. Yep. So we do integration with Observable and 00:19:04.960 |
Jupyter. So really with Jupyter, we put the most effort into Python and making everything work 00:19:12.160 |
great in Python. We've put some effort into Julia. Any Jupyter kernel works with it. But 00:19:19.200 |
if we do a little extra work, then it works better. So yeah. So I mean, seriously, anything works. 00:19:28.560 |
I've been recently playing with APL. And I created the first ever APL kernel. Nice. 00:19:40.640 |
And so here's links to APL cliff documentation. Yep. Here is auto-generated table of contents. 00:19:53.920 |
That's cool. And then here's a Python one. Yeah. So yeah. So that's what I've been working on. And 00:20:09.200 |
I know the way that you and I got connected, well, we got introduced separately. Just hey, 00:20:15.120 |
you should get to know each other. And then we got to talking about. Yeah, that's right. And then 00:20:21.920 |
author of pandas and Arrow and suchlike. Yes. Yeah. So Wes introduced us and it was like, 00:20:30.240 |
what are you working on? What are you working on? And we just, you talked a little bit about 00:20:34.640 |
nbdev2 and literate programming. And I said, well, this Quarto might be related to what you're doing. 00:20:40.800 |
But it might be. I mean, I already knew, very much knew you by reputation, because 00:20:48.880 |
I was not a big user of ColdFusion, but I was an enthusiast of it, which I can come back to 00:20:57.440 |
and talk about that. I was a big user of Windows Live Writer. So these are both things that you 00:21:01.600 |
would build. And Windows Live Writer was something which felt like, it reminded me of the original 00:21:07.840 |
Mac OS Graphing Calculator. It felt like better than all of the other things, like, because it came from 00:21:13.360 |
Microsoft, it felt better than all the other things that were around it somehow. And I thought, like, 00:21:17.440 |
how did something so come up in the windows, what I call windows extras or whatever it was, windows 00:21:25.040 |
plus windows. Well, yeah, there was like, anyway. Yeah. So yeah. And then I remember it at 00:21:35.760 |
University of San Francisco, one of our admin staff said, Oh, there's a, just got this request from a 00:21:43.680 |
guy from, you know, who's thinking of flying in for the lessons. You know, you might want to get 00:21:51.200 |
in touch with him to see if that's suitable. And it's like, what's his name? It's like it's a guy 00:21:54.640 |
called JJ Allaire. And I was like, oh, JJ Allaire is interested in fast.ai. That's really cool. 00:21:59.360 |
Well, the reason I was going to do that was I was working on, I was working, creating an R interface 00:22:07.200 |
for Keras. And so I had done, we had done R, I created the R interface to Python, which is called 00:22:13.360 |
Reticulate. And then we built the TensorFlow interface. And then I was building the Keras 00:22:17.280 |
interface. And I said, well, I'm going to go take Jeremy's course in Keras. And then I found out, 00:22:22.160 |
wait, it's not in Keras anymore. Right? It's yeah. And I said, okay, I would still like to take the 00:22:29.440 |
course, but it's less right down the middle of what I'm doing. So I didn't do it. But I actually 00:22:34.800 |
had convinced one other person to do it with me. Although you did tell me that some fast.ai ideas 00:22:39.120 |
did end up in some of your. Yeah, yeah, yeah. So yeah, we were studying fast.ai, especially as 00:22:45.520 |
we did our PyTorch work. And we, because as you know, PyTorch doesn't offer you much in 00:22:51.360 |
the way of like a built-in training loop. Right. And it doesn't really organize your work. No, like 00:22:56.480 |
Keras does. Right. And I think we rather liked the things you did in fast.ai. And so we said, 00:23:03.280 |
can we do some variations of those, you know, for our interface, because clearly it 00:23:08.480 |
wasn't enough to just say, oh, you can use torch from R. I mean, for certain researchers, 00:23:13.520 |
it's fine, but not for end users. So yeah, I mean, I try to encourage even researchers not to just 00:23:20.960 |
use raw PyTorch for everything. Because, you know, you really want to be incorporating best practices 00:23:27.520 |
as much as you can. Now, I did have a couple, since we're on fast.ai, I did have a couple 00:23:34.080 |
questions. And one of them is like, if you think about how you help both new users ramp into things 00:23:47.200 |
and make experienced users productive, right, you provide these abstractions. And there's a dial of 00:23:53.760 |
how leaky you let the abstractions be, all the way from, hey, we've hidden it, you don't even 00:23:58.320 |
know PyTorch is here, at one end, and the other end is learn PyTorch, then, you know, learn our special 00:24:05.520 |
shortcuts. And in the middle is somewhere like, well, PyTorch is present, it's not hidden. You 00:24:13.280 |
can probably extend this with PyTorch. And, you know, like, I think different software design 00:24:18.640 |
problems lend themselves to different levels of leakiness. How did you think about that? Or do you 00:24:24.240 |
think? Yeah, so I've been coding for 40 years, you know, and I spent a lot more time coding than 00:24:36.320 |
building deep learning models, and a lot more time reading and studying coding than deep learning. 00:24:42.320 |
You know, software engineering, our ability to do good things with computers, is based 00:24:52.320 |
on being able to use abstractions. And those abstractions are in turn based on being able 00:24:57.440 |
to use abstractions and, you know, so forth, down to machine code, or what's hidden on the hard disk 00:25:03.600 |
controller, you know, etc. You know, none of those levels of abstraction is the correct 00:25:10.080 |
level. They're all correct for what they do. So with fast.ai, my approach, you know, has always 00:25:19.360 |
been just the same as all the coding I've always done, which is if I'm writing some high level API, 00:25:25.120 |
I write it using some lower level API, which I then write using some lower level API, and 00:25:30.960 |
so on until I get to the point where it's, you know, that each of them is trivially easy to use, 00:25:37.280 |
ideally, and is a kind of carefully designed set of primitive operations that make sense at that 00:25:46.480 |
level of API. So for example, the high level, so there's three main levels of API in fast.ai, 00:25:53.280 |
the high level, mid tier, and low level. The high level API is focused on the applications we provide support 00:26:00.400 |
for, which are vision, text, tabular and collaborative filtering. And then there are 00:26:06.320 |
other folks in the community who have added stuff around, you know, medical and audio and whatever. 00:26:12.400 |
And in each case, you basically use the same four lines of code. Okay, that kind of just like push 00:26:17.520 |
button interface, if you yeah, like, and that was the recipe. Yeah, and that was very much designed 00:26:22.800 |
about the idea that one day, we want to get rid of the code, and there'll be a higher level API 00:26:27.680 |
still, which is no code. Yeah, yeah, this is what I wanted to ask you. Well, when you finish, 00:26:32.800 |
I want to ask a follow-up question. Okay, cool. This is really important for stuff like deep learning, 00:26:41.920 |
because the more boilerplate you have, the more things there are that you can screw up, 00:26:46.560 |
you know. And so if you have to like, manually create your validation set manually, make sure 00:26:52.560 |
it's not shuffled, and manually make sure the training set is shuffled, and manually make sure 00:26:57.200 |
that the augmentation is only applied to the training, like, each of those is something that 00:27:01.520 |
you're reasonably likely to forget. And when things break in deep learning, they don't break 00:27:08.080 |
properly. Generally, they don't give you an exception, or a segfault, they just give you 00:27:12.640 |
slightly less good answers, or misleading metrics. Yeah. So, so then the mid tier 00:27:21.440 |
API is the bit I'm most proud of. And I find that's often the hardest bit to write, you want something 00:27:28.240 |
that's extremely flexible, and that you almost never have to go deeper, but still really convenient. 00:27:35.360 |
And so for example, we've got a thing called the data blocks API, which came from me, you know, 00:27:41.440 |
I've been doing machine learning for, let's see, over 30 years now. And, you know, I just 00:27:48.400 |
thought back to like, well, what are all, what's the entire set of things I've had to do 00:27:53.600 |
to get data into a model training. And I, you know, realized that there was just 00:28:01.200 |
okay, there's like four basic, four or five basic things. Yeah. And I realized that when I pulled 00:28:07.040 |
out those four or five basic things, the huge number of classes I used to have before I built 00:28:13.520 |
the data blocks API, I realized I could replace them with just these five things by putting the 00:28:20.960 |
blocks together. And so I was able to reduce the amount of code I had by 10 fold, and increase the 00:28:28.320 |
ability for me to write my high level API a lot, and then to give the same thing to all my users. 00:28:35.200 |
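A toy sketch of the composition idea described here, where a handful of small, swappable pieces replaces many special-purpose classes. This is not the fastai data blocks API. The speaker does not list the four or five things, so the five pieces chosen below (`get_items`, `get_x`, `get_y`, `splitter`, `train_tfm`) are an assumption, and every name is hypothetical. It also encodes the defaults mentioned earlier: only the training set is shuffled and only the training set is augmented.

```python
import random
from dataclasses import dataclass
from typing import Callable


def identity(x):
    return x


def holdout_last(fraction):
    """Return a splitter that keeps the last `fraction` of items for validation."""
    def split(items):
        n_valid = round(len(items) * fraction)
        cut = len(items) - n_valid
        return list(range(cut)), list(range(cut, len(items)))
    return split


@dataclass
class ToyDataBlock:
    get_items: Callable  # source -> sequence of raw items
    get_x: Callable      # item -> model input
    get_y: Callable      # item -> label
    splitter: Callable   # items -> (train indices, validation indices)
    train_tfm: Callable = identity  # augmentation, applied to training inputs only

    def datasets(self, source, seed=0):
        items = list(self.get_items(source))
        train_idx, valid_idx = self.splitter(items)
        # Shuffle the training order only; validation keeps its original order.
        random.Random(seed).shuffle(train_idx)
        train = [(self.train_tfm(self.get_x(items[i])), self.get_y(items[i]))
                 for i in train_idx]
        valid = [(self.get_x(items[i]), self.get_y(items[i])) for i in valid_idx]
        return train, valid


files = [f"{label}_{i}.png" for i, label in enumerate(["cat", "dog"] * 5)]
block = ToyDataBlock(
    get_items=identity,
    get_x=identity,
    get_y=lambda name: name.split("_")[0],
    splitter=holdout_last(0.2),
    train_tfm=lambda x: x + "+aug",  # stand-in for a real augmentation
)
train, valid = block.datasets(files)
print(len(train), valid)
```

Swapping one piece, for example a different `splitter`, changes the behavior without touching the rest, which is the kind of reuse the speaker credits for the reduction in code.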
And then, yeah, the bottom level API, it's still above PyTorch. Well, it's mainly like filling in 00:28:42.160 |
the things that aren't in PyTorch, but should be. So for example, I like using some object oriented 00:28:48.800 |
programming. And I believe that types should represent where possible semantic things. That's 00:28:54.800 |
something that doesn't really exist in PyTorch. So I added object-oriented types, 00:29:02.080 |
semantic types, to PyTorch. Something that they've since added, though it's still not amazing, but that we created 00:29:09.600 |
first, is a computer vision library that operates entirely on the GPU and does things 00:29:14.880 |
in a really efficient way. So kind of stuff like that. So then the idea is that a user, 00:29:22.720 |
we want them, if they're doing something supported by our application API, we want them to be able 00:29:28.080 |
to use it. We want them then to be able to say like, okay, that worked okay, but I wonder what if, 00:29:32.960 |
you know, could I make it faster by doing this? Or make it more accurate by doing that? And they 00:29:37.840 |
can just pull out one piece and replace it with a mid tier API thing, you know? So rather than 00:29:44.320 |
starting at the bottom, and then adding, you know, simplifying things with a high tier, start at the 00:29:50.320 |
top, which is also how we teach, you know, and then add in lower level things if and as you need them. 00:29:57.920 |
Did you have a goal, like, kind of what I'm thinking about, like for leaky abstraction, 00:30:01.760 |
do you have a goal where it's like, well, if someone has found, and I have not personally used 00:30:08.080 |
PyTorch, but I use Keras quite a bit, if someone finds the equivalent of a layer, you know, someone 00:30:12.960 |
has written a layer for PyTorch, they find it on Stack Overflow, how do I, you know, you know, 00:30:17.360 |
reduce the error here, whatever. Oh, do this. Is it, you know, one level would be like, oh, you can 00:30:23.200 |
literally just, you know, point to that, or another level would be like, you kind of need to package 00:30:27.680 |
that. You need to put that in a frame that fast.ai can consume. Yeah, so everything, you know, 00:30:37.840 |
the idea is basically that everything should be very easy for you to grab stuff from elsewhere and 00:30:45.120 |
just use it. So we actually have, you know, so we've got a bunch of integration, for example, 00:30:51.680 |
but in particular, you know, there's like, okay, what if, yeah, that's a great virtue of a system, 00:31:00.160 |
if it can do that, yeah, then it then it doesn't suffer from the we have to do everything. 00:31:05.840 |
Exactly. Special packaging, special wrappers. So what I did was I grabbed for this one, 00:31:13.680 |
I actually grabbed the MNIST training code from the official PyTorch examples. Yep. 00:31:18.800 |
And they originally had it as a script, so I just changed it to a module, you know. And 00:31:27.120 |
so I, so here, so this is their code, right? So I took their code. And then I said, okay, 00:31:39.760 |
well, how could, what if we wanted to replace their training loop and test loop (that's a lot 00:31:47.760 |
of code, right, and it's also not a particularly good training loop and test loop) with the fastai 00:31:51.840 |
one? And by using the fastai one, you're going to get for free things like TensorBoard and 00:31:56.240 |
Weights & Biases integration, you're going to get, you're going to get all kinds of metrics, 00:32:01.760 |
you're going to get automatic mixed precision training, whatever. And so the answer is that 00:32:07.760 |
you can take all that train and test stuff and replace it with these two lines. 00:32:11.200 |
That's great. And then run this one line. And this is now also going to run with one cycle 00:32:17.040 |
training. So it's going to do a warm up, it's going to do a cool down, it's going to print out 00:32:21.200 |
as it goes. And that's literally it. Fantastic. And it's the same for other things, you know. 00:32:25.920 |
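The warm-up and cool-down described above can be sketched in plain Python. This is only an illustration of the general shape of a one-cycle learning-rate schedule, not fastai's implementation: the function name `one_cycle_lr` and the parameter values (`pct_start`, `div`, `div_final`) are assumptions chosen for the example.

```python
import math

def one_cycle_lr(step, total_steps, lr_max, pct_start=0.25, div=25.0, div_final=1e5):
    """Learning rate at `step`: cosine warm-up to lr_max, then cosine cool-down.

    Hypothetical helper for illustration; the default values are assumptions.
    """
    warmup_steps = max(1, int(total_steps * pct_start))
    lr_start, lr_end = lr_max / div, lr_max / div_final

    def cos_interp(a, b, frac):
        # Moves smoothly from a (frac=0) to b (frac=1) along half a cosine wave.
        return b + (a - b) * (1 + math.cos(math.pi * frac)) / 2

    if step < warmup_steps:
        return cos_interp(lr_start, lr_max, step / warmup_steps)
    cool_steps = max(1, total_steps - warmup_steps)
    return cos_interp(lr_max, lr_end, (step - warmup_steps) / cool_steps)

# One learning rate per training step for a 100-step run.
schedule = [one_cycle_lr(s, 100, lr_max=1e-2) for s in range(100)]
```

The rate starts well below `lr_max`, peaks at the end of the warm-up phase, and then decays to a much smaller value by the last step.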
So for example, you know, I grabbed the PyTorch Lightning quick start, converted to a module. 00:32:36.000 |
And so those data types, the data types that are used by fastai, since they're fundamentally the 00:32:42.800 |
PyTorch data types, that's how it all fits. They're not obscured. Yeah, that's either true, 00:32:50.000 |
or we recreate our own API compatible versions. So for example, the PyTorch data loaders are things 00:32:59.040 |
which take things that are either indexable or streamable one item at a time and batch them. 00:33:06.240 |
And we created something with the same name. The fastai.data DataLoader. And then we added stuff 00:33:13.600 |
to it. We said, oh, we had a bunch of callback hooks that you can modify the data, you know, 00:33:19.280 |
after it's been batched or after it's been turned into an item or whatever. 00:33:23.360 |
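A batching loader with hooks of the kind just described could look roughly like the sketch below. `TinyLoader` is a hypothetical class written for this illustration, not fastai's or PyTorch's DataLoader, and the hook names `after_item` and `after_batch` are used here only as labels for "after it's been turned into an item" and "after it's been batched".

```python
class TinyLoader:
    """Batches an indexable dataset, with optional hooks (hypothetical class)."""

    def __init__(self, dataset, batch_size, after_item=None, after_batch=None):
        self.dataset = dataset
        self.batch_size = batch_size
        # Hooks default to the identity function, so both are optional.
        self.after_item = after_item or (lambda item: item)
        self.after_batch = after_batch or (lambda batch: batch)

    def __iter__(self):
        for start in range(0, len(self.dataset), self.batch_size):
            stop = min(start + self.batch_size, len(self.dataset))
            items = [self.after_item(self.dataset[i]) for i in range(start, stop)]
            yield self.after_batch(items)

loader = TinyLoader(
    dataset=[1, 2, 3, 4, 5],
    batch_size=2,
    after_item=lambda x: x * 10,           # runs on each item before batching
    after_batch=lambda batch: sum(batch),  # runs on each assembled batch
)
batches = list(loader)
```

Because the hooks are ordinary callables, a user can change what happens per item or per batch without touching the batching logic itself.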
So when I was thinking about your application layer, because I know like in your course, 00:33:29.840 |
you say you need to, you know, high school mathematics and some programming is what you 00:33:35.840 |
need to be able to learn this. And my question is, you could imagine, and I don't even know if this 00:33:42.640 |
is a good or a bad thing. So it's more just a question. You can imagine, you know, as you said 00:33:47.280 |
earlier, an application that does transfer learning and, you know, takes various types of 00:33:52.240 |
data that's well known and lets people say, oh, I'm doing computer vision. Or is that the right 00:33:57.600 |
layer or not? Right. Do you think that's a desirable layer to have, or are you at 00:34:03.120 |
the right layer now where the person will encounter enough complexity that they really best know some 00:34:09.600 |
math and know some programming? Yeah. So you can see where it would not be desirable to go further. 00:34:14.400 |
Yeah. So the answer is so far, we've failed at our goal to make deep learning accessible 00:34:26.160 |
because we require high school math and a year of coding. And that's not accessible because most 00:34:32.240 |
people like I think only 1% of the world has like that coding background. So the goal has always 00:34:40.880 |
been to get to a point where I use the analogy to the internet, right? So when I started on the 00:34:48.480 |
internet, you would have to do it all through the terminal. And even when the first GUI things came 00:34:54.080 |
in, you would have to set up like PPP configuration files, whatever. And, you know, I'd read Usenet 00:35:01.680 |
news with rn, which, you know, with all these arcane keyboard shortcuts, I mean, I loved it. 00:35:07.120 |
But it wasn't the most accessible thing. Nowadays, you know, my mum, who's 83, uses the internet 00:35:13.680 |
every day to chat to her six-year-old granddaughter on Skype and whatever. That's what most, you know, 00:35:22.240 |
AI should look like. Okay. We're starting to see a bit of that with things like 00:35:28.160 |
Codex and DALL-E Mini and DALL-E 2 and Midjourney and whatever, GPT-3, where, you know, I don't 00:35:38.560 |
know if you saw it yesterday, a book on OpenAI Prompt Engineering came out. In fact, I'm gonna 00:35:46.320 |
see if I can find it because it's quite interesting. And so basically, it's like there's still skill 00:35:52.800 |
involved in trying to create beautiful and relevant images using DALL-E 2. But it's not coding. 00:36:04.720 |
It's a different skill. It's prompt engineering. And okay, I think I found it. 00:36:16.000 |
So let me share my screen here. And I like this because we're all about 00:36:23.840 |
domain experts, you know. And so, you know, here's a whole book about how to create nice 00:36:35.040 |
pictures with DALL-E, with lots of examples of nice pictures from DALL-E. 00:36:40.960 |
And there's no code in it, right? It's saying like, oh, we've done some research to find out 00:36:47.520 |
what kinds of words create what kinds of pictures. Here's examples of that for you. 00:36:52.400 |
And that's like someone learns, essentially, here you learn a craft of how to see the right 00:36:58.720 |
sorts of things. It's totally different than programming. 00:37:02.480 |
Right. And it requires like a genuine understanding of domain. So if you want to create good camera 00:37:08.800 |
shots that don't exist, you have to know about words like "extreme close-up" and "CineStill 00:37:14.320 |
800T". Yeah, well, you can become very, very good at this. You know, extreme long shot. 00:37:19.280 |
And even like describing shadows and proportions. This is the kind of 00:37:29.280 |
thing we want people to be spending most of their time doing. And also the kind of people I want to 00:37:36.560 |
be doing it are domain experts in that field. So we want, you know, product marketing people, 00:37:45.200 |
you know, product photography people using their product photography skills to create 00:37:50.560 |
product photography mockups. We want disaster resilience experts to be doing disaster 00:37:55.280 |
resilience. We want radiologists doing radiology, supported by AI, you know. 00:38:01.680 |
Right. So yeah, so the tool that you would build for a radiologist, I mean, in a way, 00:38:08.240 |
you could even have it, you can imagine a radiologist is training a model, basically, 00:38:12.560 |
in a way, they're doing transfer learning, they're applying their data, they're there. 00:38:17.520 |
Yeah. But it's in their DICOM viewer, you know, on their radiology workflow software. 00:38:28.240 |
Okay. Well, I think the answer is that you would like to go quite a bit farther than you have. 00:38:34.080 |
Right with that. I don't quite remember what we said at the time we started. So when my wife, 00:38:38.480 |
Rachel, and I started fast.ai, I think we were thinking it's at least a 10 year goal, 00:38:45.280 |
of making deep learning more accessible. And like our first step was, well, 00:38:51.280 |
we should at least show people how to use what already exists. So that's why we started with a 00:38:57.440 |
course. That was the first thing we built. Because also, that way, we would find out well, what, 00:39:04.400 |
what doesn't exist, but ought to, you know, and so then it was like, well, basically, nothing 00:39:11.840 |
works except vision, computer vision at the moment, we should at least make sure this works 00:39:16.080 |
for text. So step two was, I did a lot of research into text, and I built the ULMFiT algorithm and 00:39:24.320 |
integrated that and, you know, so there's a lot of research to do. And then 00:39:28.960 |
then it was like, okay, well, from the research we've done, we've realized that there's a lot of 00:39:36.080 |
things that you could do a lot better if only the software existed. So then step three was to make 00:39:39.920 |
the software exist, you know, so then there was a lot of coding. And then, you know, come back full 00:39:45.520 |
circle, do another course, you know, now showing here the best practices using everything we've 00:39:52.960 |
learned and built. Where are we now, you know, and so repeat this. So we've, we're just about to launch 00:40:00.320 |
version five of this, of this process, which, except for a year off, for COVID has been an 00:40:08.240 |
annual exercise. Yeah, I wouldn't be surprised if in the next five years, we have quite a bit of 00:40:17.440 |
the like code free stuff that we're aiming for. Okay. Yeah. Okay. All right. All right. My turn, 00:40:24.000 |
if I may. Okay, go for it. You got it. I wanted to change track a little bit, if I can, 00:40:29.200 |
to talk about your background, JJ. And the reason for that is I like to understand the background of 00:40:38.560 |
people who are doing interesting things in interesting ways. And like, one of the 00:40:42.880 |
ways I find you interesting is that your title is CEO. But in an interview, I read, you said you 00:40:50.480 |
spend about 80% of your time doing coding. And I know from personally interacting with you a 00:40:58.160 |
lot over the last few months on building nbdev2 that, yeah, you know, generally speaking, if 00:41:05.040 |
before I go to bed, I send you a message saying, there's a bug here, then by the time I wake up in 00:41:09.920 |
the morning, you'd say, I fixed the bug, here is the commit, you know, so that's unusual, you know, 00:41:17.520 |
and I also it's unusual that I feel like you I don't know, you seem to do things differently to 00:41:24.720 |
most people like you do you if you know, you feel more like a kindred spirit to me in a lot of ways 00:41:30.880 |
that like you seem to like doing things reasonably independently, but leveraging a small number of 00:41:35.200 |
smart people. And, you know, I was also interested to learn that, like me, your academic background 00:41:44.080 |
is non technical, you did, I did philosophy. You know, I'd love to hear like, yeah, what, what, 00:41:52.320 |
what was your journey from doing poli sci to founding kind of three, at least three 00:42:01.440 |
successful software companies and now working in scientific publishing? Yeah, yeah. How did that 00:42:06.960 |
happen? Well, it really, it started with, well, there's a couple of different threads that come 00:42:17.440 |
together. So one was how I got interested in data analysis, and statistical computing was, I was a 00:42:24.560 |
huge baseball fan. And I when I was like 12, I got a hold of books by Bill James, who you probably 00:42:32.880 |
have heard of. And he was a he was a math teacher from Kansas City, who wrote the Bill James baseball 00:42:39.520 |
abstract that essentially created this idea, why don't we empirically measure everything we can 00:42:44.640 |
about baseball and see what, see what's true and not true. And I don't know any other sport that 00:42:51.520 |
has a whole field of academic study of statistics named, you know, sabermetrics, you know, based on 00:42:58.960 |
that. And he started all that anyway, but what was impactful for me was, I was also very interested 00:43:04.000 |
in politics, my parents were political activists, and I was mostly interested in politics, I was 00:43:09.840 |
interested in baseball, I got the Bill James memo. And I realized like everything that people said on 00:43:14.960 |
television, about what was true about baseball, not everything, but a lot of the stuff was just 00:43:19.360 |
nonsense. The coaches, players and broadcasters, nonsense. So that had a big impact on me. I was 00:43:25.040 |
like, well, if that's true, then then a lot of the things people say about a lot of things are 00:43:29.200 |
probably nonsense. And probably data analysis is actually really fundamentally important. And so I 00:43:35.840 |
kind of got then when I was looking at political science, that was my lens. I was actually happened 00:43:41.200 |
to, I happened to find a great mentor in college who was also really into it. Can I just mention, 00:43:47.600 |
I had a similar background, but for a totally different reason, which is I started at a big 00:43:52.800 |
management consulting company when I was basically 10 years younger than everybody else. And they all 00:43:58.240 |
worked using their expertise and experience, which I didn't have. So my view was like, oh, 00:44:04.000 |
I'm going to have to use data analysis because of the ways I can. Yeah, so, so I was anyway, 00:44:11.840 |
political science, and I actually was convinced I wanted to be a political scientist, focused on 00:44:17.280 |
data analysis and things. And so I basically went to graduate school and to get a PhD in political 00:44:23.840 |
science. And by that time, I actually had taken a year off and I'd worked at the Minnesota Department 00:44:28.960 |
of Revenue as an analyst. And I used a lot. I had done plenty of messing around with software. I had 00:44:35.920 |
learned, you know, dBase and HyperCard and, you know, various other kind of, you know, scripty 00:44:41.520 |
things that a layperson could access. I wasn't, I had no training in computer science and I didn't 00:44:46.240 |
take computer science in college, but I was able to get my head around things like HyperTalk and 00:44:51.200 |
dBase and things like that. So, yeah. And so then, yeah, and SAS and, you know, all these kind of, 00:44:57.360 |
I was so exposed. I remember re-reading you were doing stuff with SAS and SPSS, you know, which are 00:45:02.000 |
some things I worked through. Yeah, SPSS, you know, Excel macros. So I ended up at the Department of 00:45:06.480 |
Revenue. I did a lot of SAS. I did a lot of... They're very pragmatic programming tools. Very 00:45:11.280 |
pragmatic. Quattro Pro, you know, all this. So, and so then I got to graduate school and I just found 00:45:18.720 |
like, wow, I just really care a lot more about software right now than I do about political 00:45:24.160 |
science. It was actually at that moment when it was 90, uh, 92, 93, uh, when, when it was, 00:45:34.480 |
software was really coming into its own. Can I just ask that discovery? Yeah. Were you okay with that? 00:45:42.800 |
Because, because I wasn't, you know, for me, I felt embarrassed. I did, because my mentor, 00:45:52.240 |
oh my God, I spent four years, five, I spent so much time with my mentor and, you know, I just 00:45:58.160 |
was like, wow, this is, I know what I'm supposed to be doing and this is not what I'm supposed to be 00:46:02.960 |
doing. Right. But I really just went with the evidence of like, when I go to the bookstore, 00:46:08.160 |
I spent all my time in the computing section and that lights me up and that's what I want to talk 00:46:12.000 |
about. And I think you had more self-confidence than I did. Well, I also had a negative, a negative 00:46:19.120 |
experience with, um, academia, even though I had, I had a couple great professors, um, it, it didn't 00:46:26.400 |
feel like I was going to, you know, um, I didn't feel like I was going to succeed. Even if I was 00:46:32.880 |
into that, I didn't feel, it didn't resonate when I got there. Um, and so I was like, well, I'm not 00:46:37.680 |
going to do this and I think I want to do that. So I'm going to go try it. So I basically went off 00:46:41.840 |
and said, I'm going to, you know, I'm not trained to, to write software. I need to learn a bunch of 00:46:45.680 |
stuff. Uh, and I went and started, you know, teaching myself a bunch of stuff I needed to know. 00:46:50.560 |
And then I eventually got bootstrapped into doing some contracting. Um, and then I, so I sort of was 00:46:56.880 |
a contractor and kept learning stuff. And then I kind of by happenstance and good fortune ran into 00:47:02.880 |
the internet. Uh, and I had actually worked with my brother on. So when was that roughly? That was in 00:47:09.040 |
90. Um, well, we got, we got the internet at, at college my senior year. So that would have been 00:47:14.960 |
91. So we had, um, and then the web was 93. And, um, my brother was really into the internet and he 00:47:24.720 |
was going around the Twin Cities. You know, he got City Pages, which is the, the news public, 00:47:30.960 |
the look, you know, the, um, the city newspaper, he got them to say, we're going to do 00:47:35.760 |
classifieds and forum, and we're going to do all this stuff on the internet. And then I, 00:47:40.800 |
and he was like, my brother doesn't write code. So he's like, Hey, JJ, you're, you're a contractor. 00:47:45.200 |
What are you, what can we do this? I was like, sure. I can figure this out. So I did that. And 00:47:50.160 |
then I was just like, and I, that the other thing that the big thing that happened for me was that 00:47:54.080 |
I was a fan of these tools that let ordinary people program. I was a fan of dBase and HyperTalk 00:48:00.160 |
and spreadsheets. And so I was like, that's really empowering. And so when I, what happened 00:48:04.720 |
was I said, wow, you know, um, my brother just told me he's going to learn Perl so he can write 00:48:09.920 |
websites. Yeah. And I'm, and I'm looking at what, and I'm looking at what I did. I learned Perl so I 00:48:15.680 |
could write websites. I'm shoveling data in and out of a database and putting it through like a 00:48:19.920 |
template, you know, and mapping form fields to database. Like this is not, we don't need Perl 00:48:25.920 |
here. You know, I mean, it turns out to do, to do fancy stuff, you need the equivalent of Perl, 00:48:30.000 |
but to do the most basic things you don't. And so that's what I kind of came up and I, 00:48:34.320 |
and I always, I loved the idea of tools and abstractions and making computing accessible 00:48:39.920 |
and programming accessible. You know, I think the first one of those tools for the web was, 00:48:43.920 |
was Australian. It was HotDog. Do you remember that? That's right. That's exactly right. It was. 00:48:48.400 |
Yeah. So somebody, yeah. So I, I did. I, I kind of said, well, I'm going to take a shot at, 00:48:54.080 |
at making a tool and see what happens. And that was ColdFusion. So, um, and so, and that, 00:48:59.840 |
I would say the other, so here it is, ColdFusion. It still exists. It's still, uh, it's still, uh, 00:49:06.160 |
that's kind of Adobe now. Developer Week happening, it looks like, now. So what, um, yeah, what year 00:49:13.200 |
was the first version of this? Uh, 95, 95. I mean, that's good longevity. That's good longevity. 00:49:20.240 |
That's right. That's right. Yeah. No, it's, uh, it's, it's, it's, uh, it's had, it's had a great existence. And, uh, um, 00:49:29.200 |
one of the, um, one of the big ideas though, that I, that I learned, one of the biggest things I 00:49:35.600 |
learned when ColdFusion came out, there were probably 10 tools that, uh, did the same-ish 00:49:41.200 |
thing. Was this before or after FrontPage? Cause that was huge. It was concurrent with FrontPage. 00:49:46.480 |
And FrontPage didn't really do this. FrontPage, no, it didn't. Yeah. So it was, it was concurrent 00:49:51.280 |
with FrontPage. And basically the, the two of the biggest differentiators were, um, we had 00:49:57.040 |
basically really good documentation and really good error messages, you know? And we just, 00:50:02.400 |
I mean, we'd see competitors that had twice our feature set, get no adoption. And what language 00:50:07.040 |
did you write this first? Okay. So I mean, but I don't remember a point at which you said you 00:50:14.960 |
learned C++. When I left graduate school, I learned C++. Yeah. It took a couple 00:50:21.760 |
of years and, and I did the, the City Pages, you know, project, and then this was my first serious 00:50:27.360 |
project. And now I wouldn't say I was good at it at that time, but I was certainly enough to, to 00:50:32.240 |
ship something. So, um, yeah. So, um, so yeah, so that was that. And, um, that was a great experience. 00:50:40.960 |
And I learned a ton from that. I also learned that, as you were saying, you know, I, I didn't 00:50:45.840 |
particularly relish the parts of entrepreneurship that didn't involve product development, you know? 00:50:51.120 |
So, and there are a lot of those are really important things that need to happen. Right. 00:50:55.600 |
So nowadays, I think you said you, you said before you, you delegate that largely to your president. 00:51:00.720 |
Yeah. The president of the company runs everything. And I, I do get involved with the, you know, 00:51:06.720 |
company strategy and certain, there are certain things that are really important for me to be a part 00:51:10.800 |
of, but then I try to, to like preserve that roughly, you know, 80% of my time coding. And 00:51:16.080 |
I actually think that the, it's not just an indulgence. I actually think that great products 00:51:22.480 |
need to have people who are aware of the whole matrix of what's going on. Why is this important? 00:51:28.720 |
Why is this feature important? What users are important? How do users think? Having that stuff 00:51:34.480 |
close to the keyboard is imperative. And a lot of times that's, that's, that doesn't happen because 00:51:40.880 |
somebody else. Yeah. Somebody else I've spoken to who has a similar approach is Michael Stonebraker, 00:51:47.440 |
who's built a lot of the best database tools in the world at many companies. And yeah, he told me he, 00:51:55.920 |
I mean, he's also an academic, you know, so he kind of invents stuff and then finds a trusted 00:52:01.760 |
partner to bring it to market with. I don't think he's ever called himself a CEO. He kind of calls 00:52:07.360 |
himself CTO, but you know, it's his vision and somebody else is running the admin. He creates 00:52:12.880 |
this thing and gets, and has the, there's conceptual integrity in what he creates and he gets all the, 00:52:18.960 |
all the trade-offs. I mean, there's like seven trade-offs a day that you made. Oh, he's in 00:52:23.520 |
Boston, right? Now I think about it. That's right. Yeah, he's in Boston. Yeah. I, I, I met him once 00:52:29.280 |
and happy and I met him once. So that was, that was mostly fun. Just watch those two of them talk. 00:52:36.320 |
All right. I should let you have a go at a question. Yes. Well, I was, I wanted to get 00:52:42.080 |
into a little bit of getting back to nbdev2. Oh, please. So maybe just to orient the 00:52:49.280 |
listeners who haven't seen nbdev or nbdev2. I mean, I, you've taken the, you know, notebooks 00:52:56.480 |
further than anyone thought possible and have created something really, really incredible. And 00:53:02.560 |
so I would love to hear, or I think other folks would love to try to hear general framing of what 00:53:06.960 |
that is. And I have some follow-up questions about it. Yeah, sure. So, I mean, one of the 00:53:13.040 |
best things I received was when the original creator of Jupyter and IPython notebooks sent 00:53:23.440 |
me an email and said this blog post, he's printed it out and put it on his wall and he shows it to everybody 00:53:33.040 |
who wants to understand what, you know, notebooks are meant to be all about. And basically, 00:53:44.800 |
I really enjoy writing code in notebooks. And this is what, this is what my notebooks 00:53:55.440 |
look like. So this is a bit better here, but this is the first few cells of the first notebook, 00:54:02.720 |
which is used to generate nbdev. And when I first started, I didn't know anything about 00:54:08.560 |
notebook internals. So I had to figure out what is a notebook. And so I wrote this thing that 00:54:13.920 |
reads a notebook, and then I look inside it. And as I do that, I'm a huge fan of the scientific 00:54:23.120 |
idea of journaling, right? Most of the world's best scientists have been very thoughtful about 00:54:30.880 |
how they journal, you know. So for example, the discovery of the noble gases, you know, 00:54:36.720 |
was something where basically, you know, there was this leftover little bit of residue, and because the 00:54:42.320 |
scientists had been so careful about the process and journaling the path, they recognized that it 00:54:47.200 |
shouldn't be there. You know, it's not I made a mistake, throw it away, but it's like, let's look 00:54:51.680 |
into it. Like, it helps with a rigor and knowing what's going on. So I like to document what I do 00:54:58.000 |
as I do it. And I also know that at some point, I'm going to want to share this with somebody else. 00:55:03.680 |
I want to show them what I found out. And I'm going to forget this in a year. So I want to tell future 00:55:08.640 |
Jeremy here what I found out. But then I don't want that to be a separate artifact somewhere else. 00:55:16.240 |
Like, as I go along, I'm writing little functions, right? So initially, these two lines of code 00:55:22.080 |
would have been in their own cell, that that would have been, oh, okay, that's how you open a 00:55:25.920 |
notebook. Let's make it a function. And so I can chuck a def on top and give it. And you're also 00:55:31.840 |
articulating your understanding. Yeah, exactly. And then it's like, oh, I think it ought to give 00:55:38.080 |
something like this. And I check and it's like, oh, I did give that. And so now I've got a test of my 00:55:43.520 |
understanding and the API. And I've got to check that it's going to be consistent. And so that 00:55:47.200 |
becomes a test. So let's actually have a look at this. So here is the notebook which creates 00:55:59.280 |
notebooks, which creates nbdev. So here's notebook number one. And so we can then look at the 00:56:08.400 |
documentation for nbdev. Because writing documentation, like most people don't really do it. 00:56:18.880 |
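The workflow described here (explore in a cell, wrap the working lines in a function, keep the check as a test) can be sketched as follows. This is a hedged illustration, not nbdev's source: the function name `read_nb` and the sample notebook are made up for the example. The only fact relied on is that an `.ipynb` file is JSON with a top-level `cells` list.

```python
import json

def read_nb(text):
    """Parse the JSON text of a notebook file into a plain dict."""
    return json.loads(text)

# A tiny stand-in for the contents of an .ipynb file (which is JSON on disk).
sample = json.dumps({
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Reading notebooks"]},
        {"cell_type": "code", "metadata": {}, "source": ["1 + 1"],
         "outputs": [], "execution_count": None},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
})

nb = read_nb(sample)
cell_types = [cell["cell_type"] for cell in nb["cells"]]

# The check written while exploring stays behind as a test.
assert cell_types == ["markdown", "code"]
```

In a notebook, the prose, the function, and the assertion would sit in adjacent cells, which is how the documentation, source, and tests end up in one place.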
Yeah, yeah. Well, that's what I was saying, that the whole reason that ColdFusion succeeded is because 00:56:22.880 |
we wrote documentation. Right. So you'll see that my documentation here is the same thing as the 00:56:28.720 |
source code. And that's because source code and documentation and tests, they're all in the same 00:56:37.120 |
place. And this is like, this is kind of in some ways a lot more than just literate programming. 00:56:48.000 |
It's what I call exploratory programming. And it's this idea of like trying to recognize that 00:56:55.840 |
programming is a process done by humans and that we can support humans doing that process by giving 00:57:03.280 |
them tools that fit that process. So that's really what nbdev is all about. And it's not a new idea. 00:57:12.800 |
So obviously Knuth was the guy who kind of created the idea of literate programming, 00:57:20.720 |
combining programming language with the documentation language. And these ideas that 00:57:26.640 |
programs should be more robust, more portable, more easily maintained, and also more fun to write. 00:57:31.360 |
All things I found to be true. When I'm writing code like this, I tend to be in the flow zone 00:57:41.600 |
all the time. Because every line of code that ends up in a function, I've run it independently, 00:57:48.000 |
I've explored it, and I've played with it. I know how it works. So I don't have many bugs. And if I 00:57:55.440 |
do, they're never weird bugs I don't understand. So I'm always progressing. So then Bret Victor, 00:58:05.200 |
who I really admire, talked about a programming system for understanding programs. And he has 00:58:15.200 |
some amazing examples of what could programming look like in a way that's much more exploratory 00:58:22.960 |
and playful. And so then another thing which was fantastic, my friend Chris Lattner built Xcode 00:58:32.320 |
Playgrounds, which again, it kind of lets you see what's going on, you know, how many times it's 00:58:38.160 |
going through the loop, and what does it look like. So there was a lot of like, and of course, 00:58:43.200 |
Smalltalk, you know, Smalltalk was explicitly designed for exploration, like, it's, you know, 00:58:48.480 |
you have this whole... I was going to mention Smalltalk in my follow-up question. So that's great. Yeah. 00:58:52.880 |
So there was all that going on. And then perhaps most relevant Mathematica, which really developed 00:59:01.040 |
the idea of the notebook, and I really always enjoyed working in Mathematica. But never enjoyed 00:59:09.200 |
not being able to do anything with it, because there just wasn't a great way to like, take a 00:59:14.160 |
Mathematica notebook and give it to somebody else to play with. Yeah. Yeah. So when Jupyter came out, 00:59:20.400 |
I felt like, oh, this is a good opportunity to take these good ideas and turn them into the 00:59:28.080 |
thing I've always wanted, which is a way to build real software, real documentation, real tests, 00:59:32.400 |
but in this exploratory way. So that's what nbdev is. So you write your software in notebooks, 00:59:40.080 |
and you basically, you know, run a cell or a CLI command, and it exports it to a module, a 00:59:50.240 |
module in Python, and that module automatically ends up on PyPI. So you can pip install it, 00:59:57.840 |
you can conda install it, automatically gets the documentation website, automatically gets 01:00:02.720 |
continuous integration tests. So somebody who actually just tried using this the first time, 01:00:08.160 |
a couple of days ago, told me from zero to having a website and module and continuous integration 01:00:17.920 |
done, it was 10 minutes. Yeah, I believe it. And that's what you want, right? Because it's like, 01:00:22.320 |
you know, you want to be able to say like, oh, I brought you a little tool. Here it is. 01:00:26.320 |
There's the website, you know. And then when I get like, pull requests, you know, they're generally 01:00:32.960 |
good, because they wrote them in the notebook. So they can see exactly what it's meant to be doing, 01:00:39.600 |
they can see the tests there, there's like, they don't forget to write tests, because they're in 01:00:43.920 |
the same place, they don't forget to write documentation because it's in the same place, they 01:00:47.840 |
understand the context of what it's about. So I also find it helps, you know, with open source 01:00:55.040 |
collaboration as well. Now, I will say the tooling we built it on top of, which is largely kind of 01:01:04.400 |
nbconvert and stuff, the kind of the surrounding toolset around notebooks, I was never fond of. 01:01:11.440 |
I found it a bit slow and a bit clunky. I'm very grateful that open source volunteers built that 01:01:18.480 |
stuff, but I didn't particularly like it. So then, when I came across Quarto, well, the first thing 01:01:27.760 |
I noticed was like, oh, this looks like nbdev, like you guys are actually using cell comments. 01:01:34.800 |
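To make the cell-comment idea concrete, here is a minimal sketch of exporting notebook cells to a module based on directive comments. It assumes the `#|` prefix and the `export` directive used by Quarto and nbdev2; the helpers `split_directives` and `export_cells` are hypothetical names written for this illustration and are not either tool's real implementation.

```python
def split_directives(source):
    """Separate leading `#|` directive lines from the rest of a cell's source."""
    lines = source.splitlines()
    directives = []
    while lines and lines[0].startswith("#|"):
        directives.append(lines.pop(0)[2:].strip())
    return directives, "\n".join(lines)

def export_cells(cells):
    """Join the code of every cell flagged `export` into one module's text."""
    exported = []
    for source in cells:
        directives, code = split_directives(source)
        if "export" in directives:
            exported.append(code)
    return "\n\n".join(exported) + "\n"

# Cell sources as plain strings; only the flagged cells reach the module.
cells = [
    "#| export\ndef double(x):\n    # An ordinary comment, kept in the module.\n    return 2 * x",
    "double(3)  # exploration only, not exported",
    "#| export\ndef triple(x):\n    return 3 * x",
]
module_text = export_cells(cells)
```

Because a directive is marked by the `#|` prefix rather than by a bare `#`, an ordinary comment inside a cell is never mistaken for an instruction to the tool.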
Which we got from, which we took from you. Fantastic. Because we were struggling with 01:01:44.800 |
attaching metadata to cells. And as you know, notebook editors have a facility for that, 01:01:49.760 |
that is hard to find and requires you to edit raw JSON. So we said, well, that's not good. And so 01:01:56.480 |
we said, and I saw you do that. And I was like, because people are using tags, they're also using 01:02:00.560 |
tags. Absolutely. You know, and I was like, well, even the tag interface is really clumsy. And so 01:02:05.760 |
I was just like, why not the comment? You know, I saw you. Exactly. But you guys do it better, 01:02:10.080 |
because I saw yours and yours were like comment followed by a pipe. And I had always kind of 01:02:15.440 |
struggled with this idea of like, how does anybody know whether something in nbdev is a comment 01:02:20.160 |
or a directive? So you made that explicit. And I kind of thought, I wasn't surprised, you know, 01:02:26.800 |
because I kind of thought like, okay, JJ, I've always admired this guy's work. And he's now taken, 01:02:31.680 |
you know, I don't know if it's now I know it's intentional, but I didn't at the time, 01:02:35.120 |
it's intentional or not taken my work and made it better. And that's always, and I thought that's 01:02:40.400 |
great. We should, we should at least use that syntax. Yeah, sure. And then I started looking 01:02:48.080 |
into like, at what you're doing with it. And I thought like, Oh, no, this is like a whole tool 01:02:54.240 |
set that does everything nbconvert does and a lot more. But it's also more delightful to work with, 01:03:02.320 |
because it's got much better documentation, it's got much better defaults, it's, you know, the 01:03:07.680 |
stuff that's built in for free is much better. And then when I spoke to you, because I kind of said 01:03:15.280 |
like, to you, like, you know, this feels like something I could build nbdev2 on, tell me a 01:03:22.480 |
bit about the technical foundation, like how is this working? And you explained to me, and I started 01:03:28.960 |
reading the source code to understand it, that it's actually this like relatively thin wrapper around 01:03:34.720 |
fantastic functionality that already exists in Pandoc. That's right. It's an orchestrator. Yeah, 01:03:41.040 |
with, you know, a bunch of good defaults. So like, it's kind of like what fast.ai is to PyTorch, 01:03:45.760 |
in a way. Right. It is this amazing foundational technology that's actually just too hard for 01:03:51.280 |
people to get their head around. Yeah. This gives you, like you said, good defaults, good ergonomics, 01:03:57.600 |
you know, and it's the same sort of thing. But also, Pandoc, I had so many problems with it. Like, 01:04:02.800 |
you know, when I used it, it just very often didn't quite work, you know. So you've also like, 01:04:08.560 |
just made sure it works. Like, oh, you know, that's unfortunate. Okay, make sure that works. Yeah. 01:04:15.120 |
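The "comment followed by a pipe" convention discussed above can be sketched in a few lines of Python. This is only an illustration of the idea, not how nbdev or Quarto actually parse cells: the function name split_directives is hypothetical, and real tools treat the directive text as YAML rather than splitting on a colon.

```python
def split_directives(cell_source: str):
    """Separate leading `#|` directive lines from the rest of a code cell.

    Hypothetical helper for illustration only. A line starting with `#|`
    is a directive; any other `#` line is an ordinary comment and stays
    with the code.
    """
    directives = {}
    lines = cell_source.splitlines()
    i = 0
    # Directives are only recognised at the top of the cell.
    while i < len(lines) and lines[i].startswith("#|"):
        body = lines[i][2:].strip()
        key, sep, value = body.partition(":")
        if sep:
            # "key: value" form, e.g. "#| echo: false"
            directives[key.strip()] = value.strip()
        else:
            # Bare flag form, e.g. "#| export"
            directives[body] = True
        i += 1
    return directives, "\n".join(lines[i:])


cell = "#| echo: false\n#| export\n# an ordinary comment\nx = 1\n"
directives, code = split_directives(cell)
print(directives)
print(code)
```

Because the pipe makes the directive explicit, the ordinary comment on the third line is never mistaken for an instruction to the tool.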
So nbdev2 is basically like, should look very, very similar to nbdev1 except for the pipes after 01:04:27.680 |
the comments. But it's dramatically faster. Partly because, well, partly because I wrote a lot of 01:04:38.400 |
stuff myself from scratch by using the Python ast.parse stuff. So I'm working with the abstract 01:04:45.360 |
syntax tree directly. I'm making sure I only have to parse it once. I reuse the cached AST, you know. 01:04:50.800 |
And then partly because, you know, we leverage Quarto, which is much faster than nbconvert. 01:04:58.720 |
So it's much faster. And it's kind of the code base, even though it does a lot more, 01:05:05.760 |
it's a lot smaller, you know, than nbdev. Again, by kind of like trying to build better foundations. 01:05:12.560 |
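The "parse once, reuse the AST" idea mentioned above can be sketched with the standard library. This is a minimal illustration under assumptions, not nbdev's actual code: the helper names parsed, function_names and imported_modules are hypothetical, and the cache here is simply functools.lru_cache keyed on the source string.

```python
import ast
from functools import lru_cache


@lru_cache(maxsize=None)
def parsed(source: str) -> ast.Module:
    """Parse a source string once; later calls with the same text hit the cache."""
    return ast.parse(source)


def function_names(source: str) -> list:
    """Names of the functions defined in the source, found by walking the cached tree."""
    return [node.name for node in ast.walk(parsed(source))
            if isinstance(node, ast.FunctionDef)]


def imported_modules(source: str) -> list:
    """Modules imported by the source, found by walking the same cached tree."""
    modules = []
    for node in ast.walk(parsed(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.append(node.module)
    return modules


SOURCE = "import os\nfrom math import sqrt\n\ndef f():\n    pass\n\ndef g():\n    pass\n"
print(function_names(SOURCE))
print(imported_modules(SOURCE))
print(parsed.cache_info())
```

Both queries run against one tree, so the cost of parsing is paid a single time per distinct piece of source, however many questions are asked of it.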
Well, the interesting thing, I noticed the title of your blog post was Use Notebooks for Everything. 01:05:19.360 |
And I, one thing that would be interesting to explore, so I kind of came up through this 01:05:26.880 |
interactive computing metaphor, which was really defined by, have you heard of ESS, 01:05:34.240 |
Emacs Speaks Statistics? That was sort of this Emacs mode for R and S, actually, originally, 01:05:41.760 |
and then R. And it was like, one of the things that it sort of said, you want everything to be 01:05:48.320 |
interactive and responsive, and you're always in a live session. The way they achieved that was 01:05:53.200 |
through, rather than having a notebook, they did line by line execution. That's like the fundamental 01:05:58.240 |
model is I select a line or a group of lines, and it can be smart syntactically, like, oh, I see the 01:06:03.520 |
line continues. And you just edit lines, basically. And then at some point, you might, like you did, 01:06:08.800 |
reorganize that into functions and so on and so forth. And so one of my questions was, and I think 01:06:14.800 |
one of the most delightful and powerful things about notebooks for Python is that they give you 01:06:20.720 |
this interactive development experience. I sort of see it, and you know, Smalltalk gives you an 01:06:25.520 |
interactive development experience with yet another kind of way of organizing the interactive 01:06:29.840 |
development. And so, you know, one of my questions is, and so we are building now, as we build tools, 01:06:35.440 |
we have this tradition from R of this ESS-driven, kind of like line by line execution. 01:06:41.520 |
You see your side effects, maybe in another pane or in a console. And then we have notebooks, 01:06:46.560 |
and we're sort of trying to do tooling for both. And one of my questions is, how much of what's 01:06:52.400 |
amazing about notebooks, like, so there's multiple ideas wrapped up in notebooks. There's everything 01:06:57.440 |
in one place, there's bundling output and, you know, and then there's interactive computing 01:07:01.360 |
experience, and there's immediacy. Like, there's the thing that a lot of people hate, which is also 01:07:07.040 |
state. And the state, right. And that's a side effect of it's all trade offs, you know, and the 01:07:12.320 |
state, you know, so it's like, which, which I think of as actually part of what's excellent about 01:07:19.360 |
notebooks, if you know how to leverage the state, it's actually, if you know how to leverage the 01:07:22.880 |
state. Yeah, I mean, so it's like your file system, you know, your home directory, that is state, 01:07:32.080 |
that's also when you CD into something, and you copy something, you know, it's, it's, it's, 01:07:36.960 |
it's state. And this is your home, you know, you made this box, you created a side effect, 01:07:44.640 |
and it happens to be a, you know, a model or a data set, it's like, this is what you, this is, 01:07:49.440 |
you've created, I have it now. Yeah, you've created this environment to be in a state that 01:07:54.320 |
you want it to be. Yeah, and we have. Yeah, it's funny, because we have some religion, 01:08:02.080 |
you know, in our like, well, you need to, you need to, it's like, you need to be able to execute the 01:08:06.560 |
thing from top to bottom and have it work every time. Sure. And so, but that, but then there are 01:08:11.040 |
people who say to you, well, I don't really want to do that, because I actually, this was really 01:08:14.800 |
expensive, to create this piece of state. And I don't so much want to have to run top to bottom, you know, 01:08:20.240 |
so, so, you know, there's, I think there's a little bit of people have tried to build, 01:08:24.240 |
you know, the sort of way to split the difference. It's funny, when I, when I first 01:08:27.280 |
encountered these ideas, I was like, wow, it's so messed up that there's all this state, 01:08:32.640 |
I was like, Mathematica must have some solution for this, I went up to, I was at like, 01:08:36.080 |
some conference, and I walked up to I said, how do you guys do this? Like, we don't, 01:08:39.760 |
we just, you just execute, you know, it's like, okay, because it turns out, you know, 01:08:44.480 |
if you want to solve that problem, it's its own quagmire. And people have reactive notebooks that 01:08:51.040 |
essentially do solve the problem, but then are really painful to work with interactively, 01:08:56.160 |
because as soon as you're doing anything that takes more than 10 seconds, you're now. 01:08:59.920 |
Yeah, so can I tell you, yes, I'm happy. I can tell you a bit about my thoughts about, 01:09:04.160 |
you know, that would love to that. So that's like the set the table of like all the stuff that's out 01:09:08.640 |
there. And where do we go? Yeah. So so a lot of people are very into line by line based approaches 01:09:18.080 |
in Python as well, particularly using the IPython REPL. Yep. Yeah, so. And it looks 01:09:26.960 |
basically identical to how people coded in APL 50 years ago, except they used a teletype, 01:09:34.880 |
you know, and it's based on that idea. And, you know, APL kind of invented that 01:09:40.240 |
way of working. And, and APL was more than just a programming language, because it was your REPL, 01:09:49.840 |
that was also how you would like text chat, there was an APL command for that, you know, 01:09:54.400 |
like everything was, that was your, that was your OS, if you like. And there's nothing wrong with 01:10:02.240 |
that. But we have, you know, there are other ways, right? And so a notebook, 01:10:07.920 |
you can do it top to bottom, if you want to. But you don't necessarily want to, because 01:10:16.160 |
it's often nice to go back and change something a little bit earlier, to answer the question, 01:10:23.600 |
I wonder what happens if, right? And so you change that, you select the four cells underneath, 01:10:28.720 |
and you hit shift enter to run those four cells, it's like, Oh, well, what if I did this? And, 01:10:33.280 |
and then you kind of think, okay, let's try three different versions of that. So you copy and paste 01:10:38.240 |
those three cells twice, and then you select them, and then you run those with two different versions, 01:10:42.800 |
and then you compare, you're doing experiments, you know, and the artifacts of those experiments 01:10:48.720 |
are right there, all in front of you. And that doesn't mean that then you're finished, 01:10:57.760 |
right? Like, hopefully, you've learned something with that, that you're refining your understanding 01:11:01.600 |
of the problem, right? So then you kind of package it up a little bit, you kind of say, Okay, well, 01:11:05.200 |
for somebody reading this notebook, I want them to see these three different versions. And so like, 01:11:08.880 |
maybe you put it into a little for loop, or maybe you create some kind of function to display it and 01:11:13.600 |
put it on a graph or whatever. But it's, you know, for me, like there, there are two critical, 01:11:20.640 |
critical keyboard shortcuts in notebooks, shift M and control shift hyphen, shift M merges two 01:11:26.160 |
cells together, and control shift hyphen splits them apart. And so I'm always like, grabbing a 01:11:31.760 |
single line of code, I'm running it, I'm exploring it, I'm, you know, assigning it to something I'm 01:11:37.840 |
trying to change, fiddling with that. And after a while, I've got three lines of, you know, normally 01:11:42.800 |
most of my functions are three to four lines of code, I've got the three or four 01:11:46.560 |
lines of code that do that thing. And I just shift M a couple of times, you know, indent the block 01:11:52.960 |
underneath the def, add a docstring. And then all those examples, they're all still there 01:11:58.640 |
underneath. And so I add some prose before each one. And that's a nice way of working. Yeah, yeah. And 01:12:06.720 |
like, and as you say, particularly in deep learning, like sometimes I'll be like, Okay, well, 01:12:11.600 |
I want to show how we can interact with like, a language model. All right, let's run this for 10 01:12:20.000 |
hours. You know, I come back in the morning and I've got a language model just where I want it. Yeah, 01:12:25.520 |
you know, I mean, maybe that's not a great example, because I probably serialize that as a 01:12:31.040 |
pickle file or something. But yeah, you wouldn't necessarily want to run everything all the time. 01:12:35.200 |
Yeah, an hour or 30 minutes might be. Yeah. Just make the point just as well. I think there's an 01:12:41.680 |
issue, which is, it reminds me of my time in spreadsheets, you know, I'm a huge fan of 01:12:46.720 |
spreadsheets, even though a lot of people use them badly. Yeah. And I read a book 30 plus years ago, 01:12:58.160 |
which is a book of spreadsheet style. And it was designed to be like, you know, 01:13:03.760 |
what's that English style book? It's designed to be kind of like, you know, rather than grammar 01:13:12.960 |
and style of English. It's kind of like, oh, sure, for spreadsheets. Yeah. And yeah, it explained, 01:13:20.320 |
like, here's how you add careful auditing, error checking, self documentation, whatever, to 01:13:27.440 |
spreadsheets. And so ever since that, you know, I've tried to follow these rules in my 01:13:32.000 |
spreadsheets. Yeah, it's, it's taking a very flexible tool and using that flexibility to 01:13:39.200 |
create a process for using that tool, which works really well. Same with notebooks. If you, 01:13:44.800 |
yeah, you can shoot yourself in the foot with them, but that doesn't mean we should tell people 01:13:49.200 |
not to use them. Yeah, you should help people. You can shoot yourself in the foot with a .py 01:13:54.880 |
file or sitting at the ipy file. Or a C++ file. Or a C++ file, definitely. So yeah, 01:14:01.200 |
so we're kind of adding, like, more stuff, more and more stuff. So something that I've built as 01:14:05.440 |
part of nbdev2 is something called execnb, which is something which is just a tiny, tiny little Python 01:14:12.240 |
module that just runs notebooks. And, you know, you can parameterize the runs, you can, you know, 01:14:18.960 |
it'll save the results back into the notebook, you know, with this idea that, like, you can very 01:14:24.160 |
quickly and easily run some experiments, share the results with people. And nbdev repo, I mentioned 01:14:34.880 |
it creates continuous integration for free. That continuous integration runs every notebook top 01:14:39.040 |
to bottom. So if your notebooks don't work top to bottom, as soon as you commit, you're going 01:14:43.840 |
to find out. You're going to get a nastygram from GitHub. It's kind of harmless. It's harmless to 01:14:49.200 |
create a local out-of-order notebook because it's going to get checked. Yeah. So, I mean, 01:14:53.760 |
yes, you've deluded yourself temporarily, but there's a net. Yeah, exactly. That makes sense. 01:14:58.400 |
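The safety net described above, running every notebook top to bottom in a fresh state, can be sketched with nothing but the standard library. This is an illustration under assumptions, not the execnb or nbdev implementation: run_top_to_bottom and make_notebook are hypothetical names, the notebook is reduced to the bare "cells" list of the .ipynb JSON format, and IPython magics are not handled.

```python
import json


def run_top_to_bottom(nb_json: str) -> dict:
    """Execute a notebook's code cells in order, in one fresh namespace.

    Hypothetical sketch: raises RuntimeError naming the first cell that
    fails, which is what an out-of-order notebook will trigger.
    """
    nb = json.loads(nb_json)
    namespace = {}
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):  # .ipynb may store source as a list of lines
            source = "".join(source)
        try:
            exec(compile(source, f"<cell {index}>", "exec"), namespace)
        except Exception as exc:
            raise RuntimeError(f"cell {index} failed: {exc!r}") from exc
    return namespace


def make_notebook(*code_cells: str) -> str:
    """Build a minimal notebook JSON string from code strings (illustration only)."""
    cells = [{"cell_type": "code", "source": [code]} for code in code_cells]
    return json.dumps({"cells": cells})


in_order = make_notebook("x = 2\n", "y = x * 3\n")
out_of_order = make_notebook("y = x * 3\n", "x = 2\n")

print(run_top_to_bottom(in_order)["y"])
try:
    run_top_to_bottom(out_of_order)
except RuntimeError as err:
    print("caught:", err)
```

The out-of-order notebook may have worked in a live session where x already existed, but it fails as soon as it is run from a clean state, which is the check a continuous integration job provides.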
Yeah. All right. So if I can come back to Quarto a bit, JJ, I wanted to understand 01:15:09.440 |
where you're going with it and why. So you mentioned earlier that scientific programming 01:15:17.280 |
is broadly speaking something you were trying to, like, improve. But Quarto is not just scientific 01:15:24.800 |
programming. You've got all this stuff about kind of scientific publishing as well. Yeah. 01:15:30.400 |
So what are you trying to do with Quarto and why are you trying to do it? 01:15:35.360 |
Well, it's, and I would say it's, Quarto is much more a scientific computing, you know, 01:15:41.200 |
that's what RStudio and Tidyverse and, you know, Arrow and all those projects are about scientific 01:15:48.080 |
computing. I'd say that Quarto is very squarely about scientific communication. And I would say 01:15:55.760 |
that there's a few things that just by working in the field for a little while, I have noted 01:16:01.600 |
that I think like warrant significant improvement. So one is the fact that we have scientific 01:16:10.480 |
communication for a lot of good reasons is very tied to print. And that the coin of the realm is 01:16:16.960 |
these print articles. And that's fine. And there's good reasons for that. And there may even still 01:16:21.200 |
be good reasons for that in the age of the web, where, where, for example, a PDF is a more durable 01:16:26.960 |
entity than, you know, a website that might get taken down or have its links break, etc. 01:16:31.680 |
But maybe, maybe not. Okay. So what I'm saying is I've certainly seen some discussions where 01:16:38.160 |
people say it's not a terrible thing to have a self-contained representation of your whatever, 01:16:43.840 |
better to have like a Docker image that can run everything anyway. But so very tied to print. And 01:16:50.160 |
so one of the things is to help scientific communication take better advantage of the web 01:16:55.520 |
while still not losing the focus on print. So not going completely like, hey, everything in now and 01:17:02.480 |
in the future is web. But now all of a sudden, I actually can't write an article that I can 01:17:06.240 |
publish with that with that mindset. So that's one, one piece, another piece, which was huge 01:17:12.400 |
focus of the R community, which is reproducibility. And this idea that everything should be in a dot 01:17:17.120 |
R, you know, in an R Markdown document that runs top to bottom, where your figures and your tables 01:17:21.520 |
and your, your results and everything is all reproducible and produced by code. And so helping 01:17:27.840 |
people do that is a big motivator. So let me come back to the first one, which is about scientific 01:17:33.760 |
communication, making it more web friendly. Yeah, I guess like, why? Like, what's this got to do with 01:17:43.280 |
R Studio? Or is this like, what's this got to do with you? Like, what do you, why do you care? 01:17:47.760 |
Well, what I was asking you with me was that, to me, my own, my own kind of beginning of the 01:17:52.720 |
Renaissance was the, the, the Bill James baseball abstract, eyes opened. And then I get to its 01:17:59.520 |
politics and my mentor is, is, is demonstrate. He's also like, wow, we're making decisions that 01:18:06.000 |
affect hundreds of millions of people with no evidence or making medical decisions with no, 01:18:11.760 |
not or no evidence, probably an exaggeration, but really weak under, under rigorously prepared 01:18:18.560 |
and under-evaluated evidence. And so to me, it's just like doing science. Well, has a lot of 01:18:25.840 |
consequences. So this is like, this is a, this is a, this is a, this is a mission for you to do 01:18:34.640 |
science better. That's right. And John Chambers in his book about R and S, software for data 01:18:42.080 |
science. He actually has this concept in there, which I used in all my slides. It's called the 01:18:46.720 |
prime directive, which is basically like accurate, trustworthy computing of scientific results is the 01:18:53.600 |
prime directive. It's really important for the same thing, for social policy, for medicine, for, 01:18:59.280 |
you know, just safety. So that's it. I mean, I was really compelled by that. So helping people do 01:19:05.360 |
science really well and communicate technical content and persuasion well is to me very, very 01:19:12.160 |
compelling. Is there something about like accessibility there as well for you, like making it, 01:19:18.080 |
like making science more accessible and making scientific publications, like more accessible? 01:19:24.960 |
Not per se. I'm taking scientific communication at face value, that it serves whatever purposes 01:19:31.600 |
it serves and has whatever virtues it has. I'm not, I don't, I'm not saying let's change that. 01:19:36.800 |
That's not at least my thing, but I will say that another related influence was the, you probably 01:19:42.400 |
read it, Tufte has this pamphlet, which is The Cognitive Style of PowerPoint, you know, 01:19:49.760 |
Pitching Out Corrupts Within, you know, and he sort of breaks down what's wrong with a lot of 01:19:56.560 |
the way we communicate about technical information. And he sort of at the end, he says, you know, 01:20:02.240 |
really what we should be doing is giving each other handouts that have analysis and evidence and data. 01:20:08.720 |
We should be reading the handouts before the meeting, and then we should be talking about them, 01:20:12.160 |
you know, not, not pitching, you know, bullets at each other. So I was compelled by that too. So I 01:20:17.920 |
was sort of very compelled by the idea, like, let's give people tools to communicate effectively 01:20:27.200 |
about technical matters and, and science. So that's, that's, that's very motivating to me. 01:20:33.840 |
So just showing this, this is the Tufte. It's a really great, yeah, they have, there's a really 01:20:40.480 |
fun, funny thing in there where he says, here's what the, was it the Gettysburg Address would be 01:20:45.840 |
as a PowerPoint, you know, presentation. So, you know, it's, you know, similar ideas 01:20:52.320 |
in, like, how Amazon do things. So, you know, they, they do a six page kind of memo. 01:21:02.800 |
And of course, also Feynman, you know, in talking about the challenges, 01:21:08.560 |
space shuttle disaster, felt like a lot of that problem came from complex ideas. 01:21:15.280 |
We just saw your, and I think you just posted on your blog about this, the evidence update 01:21:20.560 |
regarding masks and COVID-19. That's exactly what I'm talking about. Like, let's have a dialogue 01:21:26.240 |
about a matter of public health importance and use evidence and 01:21:32.320 |
do technical communication really effectively. The reason I asked about accessibility is that, 01:21:36.400 |
I mean, this, this, so this was an article that me and this team, this team and I wrote in April, 01:21:48.800 |
early April 2020. So, you know, within a month really of the pandemic taking hold in the US. 01:22:01.680 |
Well, within a month. But it wasn't published, I mean, it says here accepted December the fifth, 01:22:10.800 |
and then I think it was published quite a bit later than that, maybe even. 01:22:14.560 |
So, by the time this was available in the Proceedings of the National Academy of Sciences, 01:22:22.640 |
it was almost obsolete, you know. But what we did do was we also put it on preprints.org, 01:22:34.720 |
where it was there from, here we go, 10th of April. And these were very minor changes, right? So, 01:22:48.560 |
and this version has received 439,000 views of the abstract and 98,000 downloads, which is the, 01:22:57.840 |
by far the most viewed preprints.org paper of all time. And, you know, the fact that that was much 01:23:09.120 |
more, you know, if we compare it like. Let me, let me, let me re-answer a question, 01:23:16.320 |
because when you say accessibility, I read that as the accessibility of the discourse. Can a lay 01:23:24.080 |
person understand this? And that's not per se a goal, but it is. But accessibility in the sense 01:23:30.800 |
of the way scientific publishing works, and the delays that are inherent in the progressive 01:23:39.600 |
refinement of knowledge, and the various choke points that there are for publishing that gives 01:23:45.040 |
people credit for their careers. That is all kind of pretty messy. I don't have good ideas about 01:23:52.240 |
personally about how to resolve that, but a lot of people do. And a lot of people are working hard 01:23:56.640 |
at that. And so it is motivating to me to build, if I could build a tool that's widely adopted for 01:24:03.440 |
scientific communication, that I can marry that to good ideas that are out there, and easier to 01:24:09.520 |
adopt. I mean, that's kind of why I asked, because, yeah, like, that's kind of the number one goal, 01:24:14.800 |
even though we got to a third, but that's a hope that I have. Because like, I mean, so I, you know, 01:24:22.000 |
to be clear, I hate thinking about talking about writing about or learning about masks, I find them 01:24:28.480 |
tedious and annoying, but, you know, I have to, because other people aren't. And so I just, you 01:24:34.400 |
know, I updated that paper quite recently. But I didn't put it on a journal, I put it on our website, 01:24:42.240 |
because I felt this is more accessible. And also, because like, I just couldn't be bothered, 01:24:48.480 |
like doing all that LaTeX stuff, and real links to real, you know, anybody can click on it and go there. 01:24:57.120 |
This is a goal we have, which we haven't, it's not evident yet, we're working on it, 01:25:02.480 |
is that you should be able to basically create a blog like this, that's got this content, but 01:25:07.920 |
and take this and repurpose that same content and send it to the journal. Exactly. That's, 01:25:11.680 |
that's exactly what I want to do. It's like single source publishing, where you can be, 01:25:15.920 |
you can almost be web first. And then, oh, look, we also know how to make, make LaTeX that you can 01:25:21.040 |
submit to the, to the other places that you need to get this published. And I can show you how 01:25:26.080 |
horrible this looks nowadays. So I did exactly that for, I did a paper about vaccine safety with my 01:25:32.160 |
friend Uri Manor. So Uri wrote a study, or was a senior author on a study, which for whatever 01:25:42.560 |
reason, got picked up by the conspiracy theorists, well, as showing that vaccines are harmful. And so 01:25:53.520 |
him and I got together to write a paper that said, basically said, here's what that paper actually 01:26:00.480 |
says. So this is the paper here, late 2021. But again, after, you know, we actually wrote this 01:26:10.400 |
probably in about April 2021. And in the end, I just, nobody had yet reviewed our submission. 01:26:17.840 |
And so in October, I just kind of went, oh, fuck it, I'm just putting it on the web. So I had to 01:26:23.200 |
take that LaTeX document and turn it into web. And I did use pandoc to help me. But as you can see, 01:26:30.880 |
we end up with these like, oh, yeah, kind of references that I had to kind of paste them down 01:26:35.440 |
at the bottom. And then here, let me show you, go to the GitHub, or this will be out by the time this 01:26:41.600 |
video is broadcast, there's a GitHub org called quarto-journals. And this should show you, yeah. So 01:26:51.040 |
basically, we're working on journals. So you can see like, you know, if you go to one of those, 01:26:57.360 |
let's see what it shows you. Yeah, so scroll down. Anyway, it's not it's not showing you but go to 01:27:03.840 |
that go to that template that qmd file there. And you'll see sort of an example of, you know, 01:27:10.000 |
you've got, you know, your metadata, your authors, your, you know, all the stuff you need to do, 01:27:15.440 |
it's making the LaTeX that the journal wants, including get off and getting all the fiddly bits, 01:27:19.600 |
right. But then the same exact content is going to render perfectly in HTML. 01:27:24.400 |
That's great. It's gonna do everything that is going to do everything right. So that's I think 01:27:30.960 |
the idea is, let's just write in Quarto. And now we're going to be able to put it on the web, 01:27:36.400 |
maybe web only, you know, but also that world of publishing my god, I was so shocked when I 01:27:42.800 |
discovered how it works for this, for this PNAS thing. And PNAS is, I think, 01:27:48.000 |
like the third highest impact journal in the world. And so, you know, I thought like, oh, 01:27:51.920 |
this is going to be a smooth professional experience. And, you know, I did the whole 01:27:56.640 |
thing in Overleaf and LaTeX and BibTeX and just fine, it was pretty easy. And thanks to Overleaf, 01:28:02.560 |
you know, with all 19 of us authors could collaborate by working on different sections. 01:28:06.720 |
And so then when it came to publishing it, you know, I had to upload the rendered PDF. 01:28:14.720 |
Okay, so I uploaded the rendered PDF, I wasn't quite sure how that was going to help them. 01:28:18.640 |
And then, you know, like, a while later, yeah, they contact me and say, like, okay, we now need you to 01:28:26.880 |
like, look at these questions. They were basically they put annotations in the PDF, which is already 01:28:36.800 |
kind of hard to work with. So I ended up trying to like reply in the PDF to the and then eventually 01:28:43.600 |
they're like, okay, now you have to go through and look at the kind of camera ready document, 01:28:48.800 |
whatever, and look at these things. And they sent me back a Word document. And they've taken the 01:28:53.360 |
whole thing and redone it in Word. Yeah, just wait for it. So then, so then they're like, okay, 01:29:04.000 |
they had a question about a reference. They're like, maybe this reference doesn't really make 01:29:08.720 |
sense there. I think they said you're not allowed to use it because it violates some rule or 01:29:14.000 |
something. I was like, I don't want to fight about this as far as this, fine, you can get rid of it. 01:29:17.760 |
And they're like, okay, so what you need to do is remove that, then renumber all the references 01:29:22.480 |
afterwards. Exactly. There's 150 references. And this is reference. Well, this is what yeah, 01:29:26.800 |
this is what like a proper, you know, scientific markdown system will do. It'll renumber 01:29:32.240 |
everything. So I just said, I just said no. I've made that change in the LaTeX, here's the PDF with 01:29:39.520 |
the correction. Yeah, you fix it. Well, I almost view it as like, you have to give people tools that 01:29:47.120 |
help them with the problems they have now. And, you know, which is I need to interact with all 01:29:51.760 |
these journals and publishing systems. And then you have a chance to help them, you know, evolve 01:29:56.560 |
what they do and help them do things they never thought were possible. So I think that's one of 01:30:02.160 |
the reasons like we are focused on really tooling LaTeX well and letting you know, we're very 01:30:08.960 |
focused on that, even though we think, wow, it should be great if we didn't have LaTeX, 01:30:12.480 |
we're not we're not ignoring it, we're saying, okay, we'll tool that and but we'll also tool the web. 01:30:17.600 |
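The renumbering chore described above is exactly what citation-aware tooling automates. As a minimal sketch, assuming references are cited by key in the Pandoc-style form [@key], numbers can be assigned by order of first appearance, so deleting one citation renumbers everything after it automatically. The function name number_citations and the keys in the example are hypothetical, and real processors such as Pandoc's citeproc handle far more cases than this regular expression does.

```python
import re

# Simplified pattern for a single Pandoc-style citation such as [@smith2020].
CITE = re.compile(r"\[@([A-Za-z0-9_:-]+)\]")


def number_citations(text: str):
    """Replace [@key] with [n], numbering keys by order of first appearance.

    Returns the rewritten text and the ordered list of keys, which is the
    order the reference list would be printed in.
    """
    order = {}

    def replace(match):
        key = match.group(1)
        if key not in order:
            order[key] = len(order) + 1
        return f"[{order[key]}]"

    return CITE.sub(replace, text), list(order)


draft = "First claim [@alpha]. Second claim [@beta]. Third claim [@gamma], see also [@alpha]."
numbered, keys = number_citations(draft)
print(numbered)
print(keys)

# Dropping the second citation renumbers the later one without manual edits.
revised = draft.replace(" Second claim [@beta].", "")
print(number_citations(revised)[0])
```

With 150 references, the author only ever edits keys; the numbers are an output of rendering rather than something maintained by hand.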
And it'll be great. And we'll all get there eventually. So that's now can I ask some, 01:30:23.040 |
I'm very, very excited about this, by the way, I love that, like, this is something I'm 01:30:28.320 |
passionate about. Yeah, in like kind of a slightly weird way, in that like, I'm 01:30:32.880 |
passionately anti how academia works to the level that everybody was assuming I would go into 01:30:39.280 |
academia following school, and I refused, on the basis that I didn't like how academia worked. 01:30:45.360 |
Yeah. And I've now finally come full circle. I am actually a professor. But, you know, 01:30:53.200 |
only because I'm able to do it on my terms, and I totally refuse to do any of the normal things. 01:30:59.120 |
So that's, it's great to have you involved in this fight. 01:31:02.560 |
Yes, we are going to be very involved. We have a couple of questions from the community. 01:31:07.280 |
So, okay, so this one is actually asked to me, but I wouldn't mind asking it to you as well. And 01:31:15.360 |
then I can come back to myself. This person said, for Jeremy, your productivity amazes many people, 01:31:22.640 |
including myself, do you have any tips that might be valid in general? What does your usual day look 01:31:27.360 |
like? Now, I feel the same way about you, JJ, I'm amazed at what you've done and what you do. And 01:31:34.000 |
Hamel and I are both, you know, like, well, how does JJ do all these things so, so quickly? So, 01:31:42.000 |
yeah, I'd love to hear your well, I would say that, to me, the main lever for productivity is not 01:31:53.920 |
how fast you can code that that certainly helps. I think it's more what problems do I choose to solve 01:32:02.640 |
and, and what order and at what level of depth, you know, to me, getting through a problem, 01:32:08.080 |
or a problem domain is about making those choices. And there are side, side quests you can go on that 01:32:15.200 |
waste three times the total effort required to actually solve the problem. So I think that a lot 01:32:19.040 |
of that just comes from experience. So I think there's the choosing what problems to work on. 01:32:25.040 |
And that, I think, you can level yourself up by talking over what you're planning 01:32:31.280 |
to do with other people. And so I was thinking of trying to solve this and then this, and then they 01:32:35.040 |
say, huh, well, why is that important? Isn't that only important to this? And couldn't you do, you 01:32:39.360 |
know, so I think some dialogue helps. Inner dialogue is great. If you have a lot of experience, maybe 01:32:42.880 |
you can get it done with mostly inner dialogue, but talking to people helps, I think. Tactically, I 01:32:47.520 |
think then there's just throughput: how much code, how many features can you write? And, 01:32:53.280 |
to me, the biggest thing is just, you know, several hours of completely distraction free time. 01:32:59.280 |
So you kind of like turn off any notifications, 01:33:02.080 |
turn off notifications, build up a stack. You've got to get, like, a proper head of steam 01:33:08.560 |
and not let yourself be distracted. Do you work at an office or you work from home? 01:33:12.880 |
I do, I do work at an office. Yeah. Yeah. And I found that to be to be helpful for that, for that 01:33:19.360 |
purpose. I do, I do have a good setup for working at home too. And it's separate enough from the rest 01:33:25.440 |
of the house that I can, I can approximate that pretty well at home too. But, but yeah, so I feel 01:33:30.560 |
like, you know, I need to get a four-, five-, or six-hour chunk of distraction-free time. So then it 01:33:37.280 |
helps just to batch up things like, okay, and you can even batch up things by the day. Monday, I'm 01:33:42.560 |
going to do all the fiddly bits and distractions and calls and, you know, or Monday and Tuesday, 01:33:47.440 |
I'll do that. And then I know Wednesday, I have nothing scheduled at all, Wednesday through Friday, 01:33:51.520 |
and I can get to good focus. I mean, that is significantly more hours than most experts in 01:34:00.880 |
creative fields can achieve. Like, normally four hours seems to be considered about what you 01:34:06.400 |
can aim for at best; five or six is fantastic. Yeah. Yeah. Is that 01:34:12.720 |
because of just sustaining concentration? Yeah. Yeah. And it's not just in, like, I mean, 01:34:18.960 |
it's like in, in, in, yeah, like the, the kind of deliberate, you know, deliberate practice stuff. 01:34:27.360 |
Yeah. Yeah. Yeah. It's kind of what you're doing as well. Deliberate practice is normally what 01:34:33.280 |
no, no, that's just a helpful, helpful genetic attribute that I have. Yeah. Yeah. Yeah. I can't 01:34:39.280 |
do that. You know, I, I very rarely could do four hours. I, you know, three is good for me. 01:34:47.200 |
You try to get the three hours distraction free or? Yeah. I mean, and also, 01:34:55.120 |
so I mean, my, my, my main thing by far is, is a deliberate choice I made as an 18 year old to 01:35:02.880 |
spend on average, half of every day learning or practicing something new. Yeah. Yeah. Yeah. Which 01:35:11.600 |
is, yeah, it drives everybody I work with crazy, pretty much, because it makes you very, very 01:35:19.520 |
creative, inventive, able to see around a lot of corners and solve problems in ways that people. 01:35:24.320 |
Yeah. And I know, I know tools extremely well. All the keyboard shortcuts and all the tricks and 01:35:30.080 |
whatever, and all the libraries. But it does mean, yeah, people are working with me and are like, 01:35:34.000 |
okay, we're going to have this thing finished by Friday and you're, you know, learning. You're 01:35:38.880 |
doing this programming language for no obvious reason. Like, they need to look at it with the long 01:35:43.200 |
view. And it also means, like, you know, very often using a tool I'm not very familiar with to 01:35:49.040 |
do something, even though it would be five times faster to do it manually. Yeah. But yeah, like, 01:35:53.520 |
it's definitely got me to a point now where I find, you know, compared to nearly everybody I 01:35:59.120 |
work with, I just get things done, you know, often 10 times faster, and it tends to work the 01:36:05.600 |
first time. And I kind of often find when I do live coding or whatever, people are like, 01:36:11.440 |
oh, I didn't know that tool exists. So I didn't realize you're always looking for that way or 01:36:16.080 |
an efficiency and yeah. And then so I think, yeah, I think something people would be surprised about 01:36:22.000 |
with me, if people think of me as productive, is how few hours of productive time I have a day. 01:36:30.080 |
I spend a lot of time hanging out with my daughter and going for a walk on the beach and 01:36:35.120 |
eating ice cream and, you know, like try to be in a good mindset to have a good, a good three hours. 01:36:43.600 |
It's a very, very good three hours that you have that. Yeah. Not many people have good three hours 01:36:49.040 |
that often. That's right. No, I see that with a lot of people that I work with: their days are divided up 01:36:54.640 |
into small bits and there's probably not even three hours of engineering in there, and they're 01:36:59.520 |
all broken up. And yeah. And it's also a case of being good at saying no, like I very rarely do 01:37:05.440 |
meetings. And if I do, I want it to be a good one. Like, like this, you know, like talking to somebody 01:37:11.840 |
I really want to talk to about things I really care about. And so generally if somebody's like, 01:37:15.680 |
can I get on your schedule for a half hour phone call? I'll say no. But, you know, if you send me 01:37:21.040 |
some email, I will respond, you know. Yeah. Okay. So, you know, yes. So your brother apparently does 01:37:31.200 |
some rapping. Somebody else wants to see you doing some rapping, JJ. That's not going to happen. Okay. 01:37:37.120 |
So, for both of us: when making design or development decisions regarding nbdev2 01:37:45.120 |
and Quarto, were there any trade-offs you struggled with? Yeah, I would say two trade-offs. One 01:37:53.360 |
was going back to the discussion we had earlier about leaky abstractions: how leaky an abstraction 01:37:59.200 |
over Pandoc should it be? Because R Markdown actually pretty fully abstracted 01:38:05.280 |
Pandoc. Like, you just used all these R functions and you didn't even know Pandoc was there. And if 01:38:11.200 |
there was any given piece of functionality in Pandoc you needed to address, you needed some 01:38:17.680 |
hacky way to work around the fact that we'd written this wrapper. And so for Quarto, I went 01:38:22.800 |
with a more leaky abstraction, which basically says, like, everything that's in Pandoc is kind of 01:38:27.520 |
there as pass-through. Partly that's because Pandoc had evolved. It used to be that it could only 01:38:33.360 |
accept a lot of things by command line parameters. And so now it can take everything through YAML. 01:38:38.320 |
And so like, it became a system that you could interact with more reasonably without a special 01:38:44.960 |
wrapper. And so, you know, I felt like if we decided to try to wrap it, it was going to be kind 01:38:52.400 |
of a losing game, trying to keep up with everything people were trying to do, whereas by making it leaky, 01:38:56.560 |
we would sort of free-ride on everybody's knowledge of Pandoc and all the things that are in Pandoc. 01:39:02.160 |
So that was one. And the other one, I think, which we didn't really decide on until about a year into 01:39:07.440 |
the project, was how much we should be batteries included, or how much we should be sort of 01:39:12.080 |
extension and plugin driven. And you know, extension and plugin driven can be very dynamic, 01:39:18.000 |
you know, like the JavaScript ecosystem just like keeps evolving every three months. And it's always, 01:39:22.560 |
you know, on the other hand, it's really hard for people to get their bearings. 01:39:27.520 |
And so we went with batteries included, because we felt like it was actually a somewhat bounded 01:39:35.040 |
problem. It was sort of known: we looked at a bunch of systems and said, 01:39:41.200 |
it's a known feature set. And the users are not JavaScript engineers, they're analysts and 01:39:47.840 |
scientists, they will appreciate batteries. I would say as a user, I've definitely appreciated that. 01:39:56.320 |
I, yeah, I don't want to spend my time figuring out how to add a JavaScript based syntax highlighter 01:40:04.320 |
and a JavaScript based table of contents and exactly how to modify the CSS to create a 01:40:10.320 |
collapsible sidebar. I mean, nobody, yeah, nobody wants to do that. I mean, everybody needs all 01:40:14.960 |
those things. So just, you know, you give it to me, but you do a good job of making sure I can 01:40:21.520 |
replace it if I want to. And there are plenty of things that I've wanted to replace. And, and, 01:40:28.960 |
you know, so very kindly, one of the first things you did for us was you added the ipynb filter 01:40:33.840 |
directive, where we now have a Python script that takes a notebook on stdin and feeds back 01:40:42.720 |
on stdout a notebook that's been modified. And by using that, we can totally do anything we 01:40:49.680 |
like. Between that and the Lua filters on the AST, yeah, there's nothing we can't do. 01:40:56.320 |
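[Editor's illustration, not part of the conversation] The filter described here is a script that receives the notebook's JSON on standard input and writes a modified notebook to standard output. A minimal sketch of that shape follows, assuming the standard nbformat layout of `cells` with `metadata.tags`; the `draft` tag and the function name `drop_tagged_cells` are hypothetical and are not taken from nbdev or Quarto.

```python
import json
import sys


def drop_tagged_cells(notebook: dict, tag: str = "draft") -> dict:
    """Return a copy of the notebook without cells carrying `tag` (a hypothetical tag)."""
    kept = [
        cell
        for cell in notebook.get("cells", [])
        if tag not in cell.get("metadata", {}).get("tags", [])
    ]
    # Build a new top-level dict so the caller's notebook is left untouched.
    return {**notebook, "cells": kept}


def main() -> None:
    raw = sys.stdin.read()
    if not raw.strip():
        return  # nothing was piped in, so there is nothing to filter
    json.dump(drop_tagged_cells(json.loads(raw)), sys.stdout)


# Only act as a filter when a notebook is actually piped to the script.
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Run by hand, this would look something like `python filter.py < in.ipynb > out.ipynb`, where `filter.py` is just an assumed file name for the script.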
Yeah, we just introduced recently Quarto extensions, which is basically, it's Lua. 01:41:02.320 |
Yes, they're installable, they're kind of easy to bundle. And so that's a nice, a nice. 01:41:08.960 |
Yeah, Hamel and I were talking about that this morning. All right, so my answer to this question: 01:41:13.600 |
I think, you know, the main one is actually not just about nbdev, but kind of everything we do, 01:41:19.600 |
which is in Python, there's a schism between treating it as a kind of a static language that 01:41:27.680 |
you write a bit like Java, versus a highly dynamic language that you write a bit like Lisp. And 01:41:36.800 |
in my opinion, Jupyter is best for the latter. And in general, I like writing code using the latter 01:41:45.600 |
approach. I like to, you know, I like exploratory code where I'm manipulating objects and 01:41:52.800 |
taking advantage of metaprogramming and dynamic features. The Python community has very heavily 01:42:01.360 |
leaned in towards the former, you know, so static typing, and a lot of much more enterprisey 01:42:11.120 |
approaches to testing and documentation and lots of single use tools with their own 01:42:21.760 |
concepts to learn and stuff, you know. So that's the big trade-off we've made, 01:42:31.280 |
which is to basically opt out of the usual way of doing things in Python, to the extent where we're 01:42:37.840 |
almost starting to think, like, should we describe this as a different dialect of Python? Because 01:42:42.400 |
it's not particularly recognizable to... And you wouldn't want people to expect, oh, I can just 01:42:48.480 |
pour in all the stuff that I'm already using. Well, I mean, it interacts with it all fine, 01:42:53.520 |
but you write it in a different way. So like, if you're used to using VS Code and very heavily 01:42:58.720 |
relying on static type annotations, you're not going to love our libraries because they're so 01:43:03.440 |
dynamic that VS Code doesn't generally know what the hell is going on. It just kind of gets confused. 01:43:08.320 |
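[Editor's illustration, not part of the conversation] One way to picture the point about dynamic code: when methods are attached at runtime, their names never appear in the source for a static tool to find, yet a live session can list and inspect them. This is a toy sketch using only the standard library; the `Transforms` class and its methods are hypothetical and are not fast.ai code.

```python
import inspect


class Transforms:
    """Toy container whose methods are attached at runtime (hypothetical example)."""


def _make_scaler(factor):
    def scale(self, x):
        return x * factor
    return scale


# Methods are created in a loop, so their names are not written out as `def` statements.
for name, factor in {"double": 2, "triple": 3}.items():
    setattr(Transforms, name, _make_scaler(factor))

t = Transforms()

# Runtime introspection sees the attributes that exist on the live object.
public = [n for n in dir(t) if not n.startswith("_")]
print(public)
print(inspect.signature(t.double))
```

A live kernel answers completion and signature queries from objects like `t` itself, which is roughly why this style tends to work better in a notebook than in an editor relying on static analysis.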
So, whereas in Jupyter, Jupyter always knows exactly what's going on because it can do real 01:43:14.160 |
time introspection of the symbols. And, you know, this is something you've got to say about the 01:43:22.320 |
Python community. There's this kind of basic principle that comes from Guido, the original 01:43:29.520 |
developer of Python, which is that ideally there should only be one way to do it. And 01:43:34.800 |
I don't understand how this ever became a thing, because as soon as you say that you basically 01:43:42.080 |
turn off innovation, because if you want to do something better, you're not allowed to, 01:43:45.840 |
because you've just created a second way to do it. And so the Python community often is, 01:43:52.000 |
you know, or at least this kind of core group is often quite anti fast.ai stuff, 01:43:58.320 |
because we're a second way to do it for all values of it. You know, we have a different way of 01:44:04.320 |
testing, we have a different way of building libraries, we have a different way of doing 01:44:07.920 |
types, we even have, you know, a Julia-inspired type dispatch 01:44:12.720 |
system. Like, we do a lot of stuff inspired by non-Python languages. And, you know, I think that's 01:44:22.080 |
really problematic, whereas R really seems a much more flourishing, welcoming, and 01:44:29.840 |
diverse community than the Python community. It does feel that way. And there's a lot of 01:44:34.400 |
variance in how people do things. And it's generally accepted. I would say, yeah, 01:44:39.280 |
yeah, there's a lot of stuff where people are always finding new ways to use the 01:44:46.800 |
language's dynamic features to express things differently. So yeah, yeah. So, you know, there's 01:44:55.600 |
a lot of things I don't love about R and there's a lot of things I do love about R, you know. 01:44:58.880 |
Like you, I came out of the, you know, SAS, SPSS, Excel world. We used S-PLUS, you know, 01:45:05.280 |
back before R was really a thing, in a previous startup. That was my world for so many years. 01:45:15.120 |
And I wouldn't say I wouldn't go back to it. I don't, I like the language of Python more, 01:45:20.560 |
but there's a lot of stuff I wish I could have, you know, everything that Hadley has written 01:45:26.000 |
and the community, the documentation, the formula language. All right, we got one more each, 01:45:37.280 |
if that's okay. Okay, sure. That's great. Yeah. All right, JJ, nbdev2 is built on top of Quarto. 01:45:46.480 |
Do you have any other thoughts for stuff that might be able to build on top of Quarto that 01:45:51.840 |
would be interesting? I think a couple classes of thing, and nbdev2 is an exemplar of one, 01:45:58.640 |
which is I think of it as sort of generation of web content from software artifacts. 01:46:08.160 |
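[Editor's illustration, not part of the conversation] As a rough sketch of "web content from software artifacts", the snippet below treats a Python module as the artifact and emits a Markdown page from its functions, which a site generator could then render. It uses only the standard library; `module_to_markdown` and the `demo` module are hypothetical names, and this is not how nbdev or Quarto implement documentation.

```python
import inspect
import types


def module_to_markdown(module: types.ModuleType) -> str:
    """Render a module's public functions as a small Markdown reference page."""
    lines = [f"# {module.__name__}", ""]
    for name, obj in inspect.getmembers(module, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(obj) or "(no docstring)"
        lines += [f"## `{name}{inspect.signature(obj)}`", "", doc, ""]
    return "\n".join(lines)


def add(a, b):
    """Return the sum of a and b."""
    return a + b


# Build a throwaway module to stand in for a real package.
demo = types.ModuleType("demo")
demo.add = add

print(module_to_markdown(demo))
```

The same pattern would apply to other artifacts, such as a directory of experiment logs: read the artifact, emit Markdown, and let the publishing tool do the rest.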
So I have a software artifact. In this case, I have my notebook that defines a bunch of functions 01:46:14.880 |
and exports things. And I can generate a website from that you could think of, you know, it has 01:46:19.760 |
real time elements. So it's not a perfect analog, but like TensorBoard, there's these artifacts 01:46:24.560 |
created in a directory, and then they create this web experience from it. And so I do think there's 01:46:29.280 |
a lot of things. The Bioconductor project had this thing pre-R Markdown, where they had these 01:46:35.200 |
S4 objects that were very complicated; they could have, like, gene sequences in them and all kinds of 01:46:40.400 |
stuff. And you'd just literally call a function, pass the object, and it makes a website 01:46:45.200 |
from it, you know, so I think that this idea of having different types of software artifacts, 01:46:49.440 |
and then just creating websites from them is really interesting. And obviously, like documentation for 01:46:54.480 |
a software package is one variant of that, but there are other ones. And the other is, you know, 01:47:00.160 |
we sort of promote, hey, look, you can make a website, you can make a book, but you can pretty 01:47:04.960 |
much feed just about any publishing pipeline, you know, from notebooks 01:47:10.480 |
through Quarto into the publishing pipeline. So like, you know, if you've got a big Hugo 01:47:15.360 |
website, you can pump markdown into that, or you're using Confluence, and you need 01:47:21.280 |
to put all your articles there, you can pump things there. So start building these publishing pipelines 01:47:27.120 |
downstream of Quarto to these other systems, because, you know, it's great that you can easily make a website, 01:47:31.120 |
but oftentimes you need to get your content somewhere else. And so, you know, 01:47:35.920 |
hopefully, we can teach people how to do this. I mean, it's all possible. I remember 01:47:40.080 |
you guys asked about Docusaurus. And I was like, oh, here's an example, you can totally, like, feed a 01:47:44.480 |
Docusaurus site with Quarto. You know, I know how to do it, you know, I've got to teach other people 01:47:49.920 |
with you. Yeah, yeah. Yeah, great. And so then, my last question was: Jeremy, nbdev made literate 01:47:58.720 |
programming in Jupyter feasible. nbdev2 improves upon that even further. What are some open 01:48:04.800 |
research slash exploration areas that could help improve literate programming even further in the 01:48:10.000 |
future? That was one of my questions, too. So that's good. All right, so I'm just gonna totally 01:48:18.480 |
hand that over to somebody else much smarter than me, who's thought about it for longer than me, 01:48:22.080 |
which is Bret Victor. So Bret Victor has this talk from 2013 called The Future of Programming. 01:48:28.560 |
And Bret talks about this idea of coding being, you know, trying to work with a direct manipulation 01:48:37.680 |
of data. And so I think to me, you know, as I say, it's not so much about literate programming, 01:48:42.320 |
it's about exploratory programming. And Bret's given so many great examples of directly manipulating 01:48:47.840 |
things to code. But he actually shows examples from the 60s, like Sketchpad, where Ivan Sutherland 01:48:54.480 |
was directly drawing things on a display, believe it or not, to, like, create constraints or to 01:49:00.880 |
create automatic drawings. '69, a Prolog-based approach to kind of describing what you want. 01:49:15.760 |
Pattern matching, Doug Engelbart's ideas from 1968. Again, like, all manipulating things 01:49:28.480 |
on screen directly. RAND Corporation's GRAIL, of, like, building things up in this way. And of course, 01:49:36.480 |
we've talked about Smalltalk. And yeah, it's all interactive responses. And so people like Bret 01:49:46.720 |
and Alan Kay talk about how we've somehow, you know, lost our ability to, you know, write things 01:50:01.760 |
in like in environments that are more like this, you know, I mean, there's a classic example from 01:50:09.040 |
Bret Victor, where he's designing a computer game, like a Super Mario style computer game. 01:50:16.160 |
And he sets up this kind of time travel debugging type system, but it actually shows you exactly 01:50:23.280 |
what would happen if somebody pressed the buttons you pressed in your game just now, 01:50:27.920 |
and like shows you where the characters would all end up. And he like, modifies them in real time, 01:50:32.080 |
and you see them moving. Yeah, this is like what it should feel like to work with code is it should 01:50:40.640 |
feel like this artisanal real thing. We're pretty far. It's funny, notebooks are great. And 01:50:47.280 |
data science REPLs are great. They are, they're probably like 15% along the way that they need to be. 01:50:55.200 |
I'm pretty excited about working on those problems too. Bret also had a great example: he had this 01:51:00.800 |
award-winning iOS app for basically the train schedule, the BART schedule in San Francisco, 01:51:10.880 |
and he showed this example in one talk, where he describes how you could have written the whole app 01:51:17.840 |
entirely using a kind of graphical object system that's just totally unlike any coding that 01:51:26.960 |
I've ever seen. Yeah, yeah. Yeah. Well, thank you, JJ. 01:51:33.200 |
I appreciate it. Our two-way AMA slash conversation. I think I did just reinvent the idea of a 01:51:40.560 |
conversation. You'll have to see if you're going to promote it as a 01:51:46.640 |
two-way AMA or conversation. Yeah. All right. Well, good luck with the last couple of weeks up to 01:51:55.280 |
the launch. Yeah, absolutely. Well, thanks. And we're going to be launching right around the same 01:52:00.640 |
time. So exactly the same time. It'll be fun. All right, mate. Take care. And to the rest of you, bye.