
"Decrappify" at Facebook F8 - New Approaches to Image and Video Reconstruction Using Deep Learning


Chapters

0:00 Introduction
2:46 Generative Imaging
4:02 DeOldify
4:54 Black and White Photo
5:33 Golden Gate Bridge
6:37 Porcelain
8:14 Basic Approach
8:55 U-Nets
10:14 Training
10:45 Loss Function
11:35 Pretrained Model
12:27 Moving Images
13:56 Three Key Approaches
14:27 Reliable Feature Detection
15:34 Self-Attention
16:29 DeOldify and GANs
18:05 DeOldify Solution: NoGAN
21:35 Super Resolution Example
22:09 Conclusion
23:36 Salk Institute
26:12 Eternal Triangle of Compromise
27:03 The Point of Imaging
28:25 Real World Example
33:04 High Resolution Data
34:09 Intrinsic Denoising
34:42 False Positives
36:20 Live Imaging
36:59 Live Fluorescence
37:27 In Conclusion
37:45 Natural Intelligence
38:17 Fast AI

Transcript

I want to thank all of you for your time. Thank you. >> Hi, thanks for coming, everybody. There are a few seats at the front still if you're standing at the back, by the way. Some over here as well. Hi, my name is Jeremy Howard, and I'm going to be talking to you about fast.ai.

I'm from fast.ai. You may know of us from our software library, called, excitingly enough, fastai, which sits on top of PyTorch and powers some of the stuff you'll be seeing here today. You may also know of us from our course, course.fast.ai, which has taught hundreds of thousands of people deep learning from scratch.

The first part of our course, out now, covers all of these concepts, and I'll tell you more about the upcoming new course later this afternoon. Our course covers many topics; for each one there's a video, there are lesson notes, and there's a community of tens of thousands of people learning together. One of the really interesting parts of the last course was lesson seven, which, amongst other things, covered generative networks.

And in this lesson, we showed this particular technique. This here is a regular image, and this here is a crapified version of this regular image. And in this technique that we taught in lesson seven of the course, we showed how you can generate this from this in some simple, heuristic manner.

So in this case, I downsampled the dog, I added JPEG artifacts, and I added some random obscuring text. Once you've done that, it's then a piece of cake to use PyTorch and fastai to create a model that goes in the other direction. So it can go in this direction in a kind of deterministic, heuristic manner, but to go from here to here is obviously harder, because you have to, like, figure out what's actually being obscured here, and what the JPEG artifacts are actually hiding, and so forth.

So that's why we need to use a deep learning neural network to do that process. So here's the cool thing. You can come up with any heuristic crapification function you like, which is a deterministic process, and then train a neural network to do the opposite, which allows you to do almost anything you can imagine in terms of generative imaging.
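(As a rough illustration of what such a heuristic crappification function can look like, here is a minimal sketch using Pillow; the downscale factor, JPEG quality range, and text overlay are arbitrary choices, not the exact settings used in the lesson.)

```python
import random
from io import BytesIO
from PIL import Image, ImageDraw

def crappify(img: Image.Image) -> Image.Image:
    """Heuristically degrade an image: downsample, overlay random obscuring
    text, and add JPEG compression artifacts."""
    img = img.convert("RGB")
    w, h = img.size
    # Downsample to a quarter of the resolution, then upsample back.
    small = img.resize((w // 4, h // 4), Image.BILINEAR).resize((w, h), Image.BILINEAR)
    # Overlay a random number at a random position to obscure part of the image.
    draw = ImageDraw.Draw(small)
    draw.text((random.randint(0, w // 2), random.randint(0, h // 2)),
              str(random.randint(10, 99)), fill=(255, 255, 255))
    # Round-trip through a low-quality JPEG to introduce compression artifacts.
    buf = BytesIO()
    small.save(buf, format="JPEG", quality=random.randint(10, 30))
    buf.seek(0)
    return Image.open(buf).convert("RGB")
```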

So, for example, in the course, we showed how that exact process, when you then apply it to this crappy JPEG low resolution cat, turned it into this beautiful high resolution image. So this process does not require giant clusters of computers and huge amounts of labeled data. In this case, I used a small amount of data trained for a couple of hours on a single gaming GPU to get a model that can do this.

And once you've got the model, it takes a few milliseconds to convert an image. So it's fast, it's effective, and so when I saw this come out of this model, I was like -- I was blown away. How many people here have heard of GANs, or Generative Adversarial Networks?

This has no GANs. GANs are famously difficult to train: they take a lot of time, a lot of data, a lot of compute. This simple approach requires no GANs at all. Then, at about the time I was building this lesson, I was amazed to start seeing pictures like this online, from something called DeOldify, which takes black and white images and converts them into full-color images.

These are actual historical black and white images I'm going to show you. So this was built by a guy called Jason, who you're about to hear from, who built this thing called DeOldify. I reached out to him and discovered he's actually a fast.ai student himself. He had been working through the course, and he had kind of independently invented some of these decrapification approaches that I had been working on, but he had gone in a slightly different direction as well.

And so we joined forces, and now we can show you the results of the upcoming new version of DeOldify, to be released today, which includes all of the stuff from the course I just described and a bunch of other stuff as well, and it can create amazing results like this.

Check this out. Here is an old black and white historical photo where you can kind of see there's maybe some wallpaper and there's some kind of dark looking lamp, and the deep learning neural network has figured out how to color in the plants and how to add all of the detail of the seat and maybe what the wallpaper actually looks like and so forth.

It does an extraordinary job, because this is what a deep learning network can do: it can actually use context, semantics, and an understanding of what these images are and what they might have looked like. Now, that is not to say it really knows what they looked like, and sometimes you get really interesting situations like this.

Here is the Golden Gate Bridge under construction in 1937, and it looks here like it might be white. And the model made it white. Now, the truth is this might not be historically accurate, or it might be; we actually don't know, right? Jason actually did some research into this and discovered that at the time they had apparently put on some kind of red primer to see what it would look like, and it was a lead-based paint, and we don't know whether in the sunlight it looked like this or not.

So historical accuracy is not something this gives us, and sometimes historical accuracy is something we can't even tell because there aren't color pictures of this thing, right? So it's an artistic process, it's not a historical reenactment process, but the results are amazing. You look at tiny little bits like this and you zoom in and you realize it actually knows how to color in porcelain.

It's extraordinary what a deep learning network can do from the simple process of crappification and decrappification. So in this case the crappification was take a color image, make it black and white, randomly screw up the contrast, randomly screw up the brightness, then try and get it to undo it again just using standard images and then apply it to classic black and white photos.

So then one of my colleagues at the Wicklow AI Medical Research Initiative, a guy named Fred Monroe, said we should go visit the Salk Institute, because at the Salk Institute they're doing amazing stuff with $1.5 million electron microscopes that create stuff like this. And these little dots are the things that your neurotransmitters flow through.

And so there's a crazy guy named Uri at Salk, who's sitting over there, who's trying to use these things to build a picture of your whole brain. And this is not a particularly easy thing to do. So they tried taking this technique, crappifying high resolution microscopy images and training a model to turn them back into the original high res, and then applying it to this, and this is what comes out.

And so you're going to hear from both of these guys about the extraordinary results, because they've gone way beyond even what I'm showing you here. But the basic approach is pretty simple. You can use fastai, built on top of PyTorch, to grab pretty much any dataset with one line of code (or four lines of code, depending on how you look at it) using our data block API, which is by far the most flexible system for getting data into deep learning of any library in the world.

Then you can use our massive fastai.vision library of transforms to very quickly create all kinds of augmentations, and this is where the crappification process can come in: you can add your JPEG artifacts, or your black and white conversion, or rotations, or brightness changes, or whatever. And then, on top of that, you can take that crappified picture and put it through a U-Net.
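(For reference, a data pipeline along these lines might look roughly like this in the fastai v1 style being described here; the folder layout and transform settings are placeholder assumptions, and the API has changed in later fastai versions, so treat this as a sketch rather than the exact calls.)

```python
from fastai.vision import *

path = Path('data/pets')  # hypothetical dataset location

# Data block API: pair each crappified input with its clean original.
# `path/'crappy'` holds degraded inputs, `path/'images'` the targets (assumed layout).
src = (ImageImageList.from_folder(path/'crappy')
       .split_by_rand_pct(0.1, seed=42)
       .label_from_func(lambda x: path/'images'/x.name))

data = (src.transform(get_transforms(max_zoom=2.), size=128, tfm_y=True)
        .databunch(bs=8)
        .normalize(imagenet_stats, do_y=True))
```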

Who here has heard of a U-Net before? A U-Net is a classic neural network architecture which takes an input image, pulls out semantic features from it, and then upsizes it back again to generate an image. U-Nets have been incredibly powerful in the biomedical imaging literature, and in medical imaging they've changed the world.

They're rapidly moving into other areas as well. But at the same time, lots of other techniques have appeared in the broader computer vision literature that never made their way back into U-Nets. So what we did in fastai was actually incorporate all of the state-of-the-art techniques.

For upsampling, there's something called pixel shuffle for removing checkerboard artifacts, and something called learnable blur. For normalization, there's something called spectral norm. There's a thing called a self-attention layer. We put it all together, so in fastai, if you say, "Give me a U-Net learner," it does the whole thing for you.

And you get back something that's actually not just a state-of-the-art network, but one that contains a combination of stuff that hasn't even been put together yet in the academic literature. And so the results of this are fantastic. But of course, the first thing you need to do is train a model.

So with the fastai library, when you train a model, you say "fit one cycle," and it uses, again, the best state-of-the-art mechanism for training models in the world, which turns out to be a very particular kind of learning rate annealing and a very particular kind of momentum annealing: something called one-cycle training, from an amazing researcher named Leslie Smith.
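(In fastai v1 terms, the learner and the one-cycle training being described look roughly like this; `data` is the databunch from the earlier sketch, `feat_loss` stands for the feature/Gram loss sketched a little further on, and the learning rate and weight decay values are illustrative, so treat this as a sketch rather than the exact training script.)

```python
from fastai.vision import *

# U-Net on a pretrained ResNet-34 backbone, with the extras described above:
# pixel-shuffle upsampling, learnable blur (against checkerboard artifacts),
# normalization on the head, and a self-attention layer.
learn = unet_learner(data, models.resnet34,
                     blur=True,
                     norm_type=NormType.Weight,
                     self_attention=True,
                     loss_func=feat_loss,   # feature/Gram loss, see the sketch below
                     wd=1e-3)

# One-cycle training: the learning-rate and momentum annealing schedule
# from Leslie Smith's work.
learn.fit_one_cycle(10, max_lr=1e-3)
```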

So you put all this stuff together. You need one more step, which is a loss function. And the key thing here that allows us to get these great results without a GAN is that we've, again, stolen other stuff from the neural network literature: there's something called Gram loss that was used for generating artistic pictures.

And we basically combine that with what's called feature loss, together with some other cool techniques, to create this really uniquely powerful loss function that we show in the course. So now we've got the crapified data. We've got the loss function. We've got the architecture. We've got the training schedule.
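(A minimal PyTorch sketch of that kind of combined loss, assuming a frozen VGG16 feature extractor; the layer indices and weights here are arbitrary placeholders, not the exact ones from the course.)

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16_bn

class FeatureGramLoss(torch.nn.Module):
    """Compare prediction and target in VGG feature space (feature/perceptual loss),
    plus Gram-matrix statistics of those features (Gram/style loss), plus pixel L1."""
    def __init__(self, layer_ids=(12, 22, 32), layer_wts=(5., 15., 2.)):
        super().__init__()
        vgg = vgg16_bn(pretrained=True).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg, self.layer_ids, self.layer_wts = vgg, layer_ids, layer_wts

    def _features(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

    @staticmethod
    def _gram(x):
        b, c, h, w = x.shape
        f = x.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def forward(self, pred, target):
        loss = F.l1_loss(pred, target)                       # pixel loss
        for fp, ft, w in zip(self._features(pred), self._features(target), self.layer_wts):
            loss += w * F.l1_loss(fp, ft)                    # feature (perceptual) loss
            loss += w * 5e3 * F.l1_loss(self._gram(fp), self._gram(ft))  # Gram loss
        return loss
```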

And you put it all together, and you get really beautiful results. And then, if you want to, you can add a GAN on at the very end. But we have a very cool way of doing GANs, which is that we've actually figured out how to train a GAN on a pre-trained model.

So you can do all the stuff you've just seen and then add a GAN just at the very end, and it takes like an hour or two of extra training. We actually found that for the stuff we were doing for super resolution, we didn't really need the GAN. It actually didn't help.

So we just used an hour or two of regular U-Net training. But what Jason then did was say, OK, that's nice, but what if we went way, way, way, way, way further? And so I want to show you what Jason managed to build. Jason? Thanks, Jeremy. So with DeOldify's success on still images, the next logical step in my mind was, well, what about moving images?

Can we make that work as well, a.k.a. video? And the answer turned out to be a resounding yes. So for the first time publicly, I'm going to show you what DeOldified video looks like. -You better get on the job. Some of the kids may be up this afternoon.

-Oh, Jack, we can get along without dragging those young kids up here. -Oh, why don't you button up your lip? You're always squawking about something. You got more static on the radio. So that's DeOldified video. So now I'm going to tell you how that video was made. It turns out that if you want great video, you actually have to focus on great images.

And that is really composed of three key approaches. The first is reliable feature detection, the second is self-attention, and the third is a new GAN training technique that we've been collaborating on with fast.ai that I'm really excited about. So first is reliable feature detection. The most obvious thing to do with a U-Net-based generator, if you want to make it better, is to use a bigger ResNet backbone.

So instead of using ResNet 34, we use ResNet 101 in this case, and that allows the generator to pick up on features better. The second thing, though, and this is possibly even more important, is you need to acknowledge the fact that you're dealing with old and grainy film. So what you need to do is you need to simulate that condition.

And so you do lots of augmentation with brightness and contrast and Gaussian noise to simulate film grain. And if you get this right, you're going to have fewer problems with colorization, because the features are going to be detected correctly.
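(A rough sketch of that kind of "old film" crappification, using Pillow and NumPy; the jitter ranges and noise levels are made-up values, not DeOldify's actual settings.)

```python
import numpy as np
from PIL import Image, ImageEnhance

def crappify_old_film(img: Image.Image) -> Image.Image:
    """Convert to black and white and simulate old, grainy footage:
    random brightness/contrast jitter plus Gaussian noise for film grain."""
    img = img.convert("L")
    img = ImageEnhance.Brightness(img).enhance(np.random.uniform(0.6, 1.4))
    img = ImageEnhance.Contrast(img).enhance(np.random.uniform(0.6, 1.4))
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0, np.random.uniform(2, 20), arr.shape)  # film grain
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8)).convert("RGB")
```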

If you get it wrong, you get things like this: this zombie hand right here on the door frame. And it's pretty unsightly. The second thing is self-attention. Normal convolutional networks, with their individual convolutions, focus on small areas of the image at any one time; that's called their receptive field. And that can be problematic if you're making a colorization model based on that alone, because those convolutions are not going to know what's going on on the left side versus the right side of the image.

So as a result, in this case, you get a different color for the ocean here versus here versus here. Whereas in the DeOldify model, we use self-attention. Self-attention allows features at the global scale to be taken into account, and you can see that the ocean on the right side there, the right-side render, is consistently colored in.
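(For reference, the kind of self-attention layer being described is essentially the SAGAN-style layer; here is a compact PyTorch sketch, simplified relative to the implementations used in fastai and DeOldify, and assuming the critic/generator feature maps have at least eight channels.)

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention over the spatial positions of a feature map,
    letting every location attend to every other location."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv1d(channels, channels // 8, 1)
        self.key   = nn.Conv1d(channels, channels // 8, 1)
        self.value = nn.Conv1d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blend, starts as identity

    def forward(self, x):
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)
        q, k, v = self.query(flat), self.key(flat), self.value(flat)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (b, hw, hw)
        out = v @ attn.transpose(1, 2)                         # (b, c, hw)
        return (self.gamma * out + flat).view(b, c, h, w)
```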

I like that, too. So the next thing: GANs. The original DeOldify uses GANs, and the reason it went for GANs is that they're really good at generating realism. They're uniquely good at generating realism, really. And my problem was I didn't know how to write a loss function that would properly evaluate whether or not something is realistically colored.

I tried. I tried to not do GANs, because they're kind of a pain. But anyway, the reason why DeOldify has great colorization is because of GANs; the original had great colorization because of that. But there's a big drawback. First, they're really slow. The original DeOldify took like three to five days to train on my home PC.

But the second thing is that they're really unstable. So the original DeOldify, while it had great renders if you, quite frankly, cherry-picked them after a few trials, overall you'd still get a lot of glitchy results, and you'd have unsightly discoloration. And that was because of the GAN instability, for the most part.

And then finally, they're just really difficult to get right: there are hyperparameters you have to tune, and experiment after experiment. I probably did over 1,000 experiments before I actually got it right. So the solution we arrived at was actually to just almost avoid GANs entirely. We're calling that NoGAN. There are three steps to this.

First, you pre-train the generator without GAN. And you're doing this the vast majority of the time. In this case, we're using what Jeremy mentioned earlier, which was feature loss or perceptual loss. That gets you really far. But it's still a little dull up to that point. The second step is you want to pre-train the critic without GAN, again, as a binary classifier.

So you're doing a binary classification on those generated images from the first step versus the real images. And finally, the third step is you take those pre-trained components and you train them together as a GAN, but only briefly. This is really brief: it's only 30 to 90 minutes for DeOldify.

And to put that in perspective, you see this graph at the bottom here? That orange-yellow part, that's the actual GAN training part; the rest of it is pre-training. But the great thing about this is that, in the process, you get essentially the benefits of GANs without the stability problems I just described.
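(To make the three phases concrete, here is a schematic plain-PyTorch skeleton of the NoGAN idea; this is not the DeOldify implementation, which builds on fastai's GAN tooling, and the epoch counts, optimizers, and losses are placeholders. The critic is assumed to return one logit per image.)

```python
import torch
import torch.nn.functional as F

def nogan_train(gen, critic, loader, feat_loss, device="cuda",
                gen_epochs=20, crit_epochs=5, gan_steps=500):
    """Schematic NoGAN: long non-GAN pretraining, then a very short GAN phase."""
    gen, critic = gen.to(device), critic.to(device)
    opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
    opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)

    def critic_loss(real, fake):
        # Plain binary classification: real images vs. generated images.
        logits = torch.cat([critic(real), critic(fake)]).squeeze(1)
        labels = torch.cat([torch.ones(len(real)), torch.zeros(len(fake))]).to(device)
        return F.binary_cross_entropy_with_logits(logits, labels)

    # Phase 1: pre-train the generator alone with the feature/perceptual loss.
    for _ in range(gen_epochs):
        for crappy, clean in loader:
            crappy, clean = crappy.to(device), clean.to(device)
            loss = feat_loss(gen(crappy), clean)
            opt_g.zero_grad(); loss.backward(); opt_g.step()

    # Phase 2: pre-train the critic alone as a binary real-vs-generated classifier.
    for _ in range(crit_epochs):
        for crappy, clean in loader:
            crappy, clean = crappy.to(device), clean.to(device)
            with torch.no_grad():
                fake = gen(crappy)
            loss = critic_loss(clean, fake)
            opt_c.zero_grad(); loss.backward(); opt_c.step()

    # Phase 3: brief adversarial fine-tuning -- stop early, before artifacts
    # (e.g. orange skin) start to appear.
    for step, (crappy, clean) in enumerate(loader):
        if step >= gan_steps:
            break
        crappy, clean = crappy.to(device), clean.to(device)
        loss_c = critic_loss(clean, gen(crappy).detach())
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()

        fake = gen(crappy)
        adv = F.binary_cross_entropy_with_logits(
            critic(fake).squeeze(1), torch.ones(len(fake), device=device))
        loss_g = feat_loss(fake, clean) + adv   # keep the perceptual loss as an anchor
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```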

So now I'm going to show you what this actually looks like, starting from a completed pre-trained generator and then running that through the GAN portion of NoGAN. And I'm actually going to take you a little too far, because I want you to see where it looks good and then where it gets a little too much GAN.

So this is before GAN; this is where you pre-train it. It's a little dull at this point. Here you can already see colors being introduced; this is like minutes into the training. Right here is actually where you want to stop. Now, it doesn't look too bad yet, but you're going to start seeing orange skin.

Or their skin's going to turn orange, rather. Yeah, right here. So you don't want that. You might be surprised: the stopping point of NoGAN training, at that GAN part, is right in the middle of the training loss going down, which is kind of counterintuitive. And I've got to be clear on this.

We haven't put a paper out on this. I don't know why that is, honestly; I think it might be because of stability issues. The batch size I was using back there was only five, which is kind of cool. I mean, it seems like NoGAN accommodates low batch sizes really nicely.

It's really surprising it turns out as good as it does. But yeah, that's where I'm actually stopping it. So NoGAN really solved my problems here, and I think this illustrates it pretty clearly. When we made video with the original DeOldify, it looked like this on the left.

And you can see their clothing is flashing red, and the guy's hand has a fireball forming on it that wasn't there in the original. Whereas on the right side, with NoGAN training, you see all those problems just disappear. It's just so much cleaner. Now, you might be wondering if you can use NoGAN for anything else besides colorization.

It's a fair question, and the answer is yes. We actually tried it for super resolution as well. I just took Jeremy's lesson on this with the feature loss, which got it up to that point on the left. I applied NoGAN on top of that, ran it for like 15 minutes, and got those results on the right, which are, I think, noticeably sharper.

There's a few more things to talk about here that I think are interesting. First is that super resolution result you saw on the previous slide. That was produced by using a pre-trained critic on colorization. So I just reused the critic that I used for colorization, and fine-tuned it, which is a lot less effort, and it worked.

That's really useful. Second thing is there was absolutely no temporal modeling used for making these videos. It's just image to image. It's literally just what we do for normal photos. So I'm not changing the model at all in that respect. And then finally, this can all be done on a gaming PC, just like Jeremy was talking about.

In my case, I was just running all this stuff on a 1080 Ti for any one run. So I hope you guys find NoGAN useful. We certainly did; it solved so many of the problems we had. And I'm really happy to announce to you guys that it's up on GitHub, and you can try it out now.

And there are Colab notebooks. So please enjoy. And next up is Dr. Manor; he's going to talk about his awesome work at the Salk Institute. Hi, everyone. My name is Uri Manor. I'm the director of the Waitt Advanced Biophotonics Core at the Salk Institute for Biological Studies. How many of you know what the Salk Institute is?

All right, for those of you who don't know, it's a relatively small biological research institute, as per its name. It was founded by Jonas Salk in 1960. He's the one who created the polio vaccine, which fortunately we don't have to worry about today. And it's relatively small. It has about 50 labs, which may sound like a lot to you, but just to put things in perspective, in comparison to a state university, for example, a single department might have that many labs.

So it's relatively small, but it's incredibly mighty. Every single lab is run by a world-class leader in their field. And not only is it cutting edge and small, but powerful. It's also broad. So we have people studying cancer research, neurodegeneration and aging, and even plant research. And our slogan is, where cures begin.

Because we are not a clinical research institute. We are interested in understanding the fundamental mechanisms that underlie life. And the idea is that in order to fix something, you have to know how it works. So by understanding the basic mechanisms that drive life, and therefore disease, we have a chance at actually fixing it.

So we do research on cancer, like I mentioned, neurodegeneration. And even plant research, you could think of as a cure. For example, my colleagues have calculated that if we could increase the carbon capture capabilities of all the plants on the planet by 20%, we could completely eradicate all of the carbon emissions of the human race.

So we have a study going on right now where we're classifying the carbon capturing capabilities of plants from all over the globe, from many different climates, to try to understand how we can potentially engineer or breed plants that could do a better job of capturing carbon. And that's just, of course, one small example of what we're doing at the Salk Institute.

Now, as the director of the Waitt Advanced Biophotonics Core, my job is to make sure that all of the researchers at the Salk have the most cutting-edge, highly capable imaging equipment, microscopy being pretty much synonymous with biology. So I get to work with all of these amazing people, and it's an amazing job.

And so I'm always looking for ways to improve our microscopes and improve our imaging. As a microscopist, I have been plagued by this so-called eternal triangle of compromise. Are any of you photographers? Have any of you ever worked with a manual SLR camera? So you're familiar with this triangle of compromise.

If you want more light, you have to use a slower shutter speed, which means that you're going to get motion blur. If you want higher resolution, you have to make compromises in your adaptive focus. Or you have to use a higher flash. And all of these principles apply to microscopy as well, except we don't use a flash.

We use electrons, or we use photons. And a lot of times, we're trying to image live samples, live cells, and I'll show you soon what that looks like. You may be surprised, but our cells did not evolve to have lasers shining on them. So if you use too many photons, too high a laser power, too high a flash, you're going to cook the cells.

And we're trying to study biology, not culinary science. So we need to use fewer photons when we're trying to image normal physiological processes. If you're trying to image cells under stress, then maybe using more photons is a way to do that. And of course, we care about speed. We want to capture the dynamic changes.

The whole point of imaging is that if we have the spatiotemporal dynamics, the ultrastructure, the architecture of the systems that underlie life, that underlie our cells and our tissues, then we can really understand how they work. So we want to be able to image all the detail we can, with all the signal to noise that we can, with minimal perturbation.

Now, one of the most popular kinds of microscopes in my lab and many others is something called a point scanning microscope. And the way it works is you scan pixel by pixel over the sample, and you build up an image that way. Now, you can imagine, if you want a higher resolution image, you need to use smaller pixels.

The smaller the pixel, the fewer the photons and electrons you can collect per pixel. So it ends up being much slower and much more damaging to your sample: the higher the resolution, the more damage to your sample.
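(A back-of-the-envelope illustration of that trade-off, with made-up numbers: for a fixed field of view and a fixed total frame time, shrinking the pixels means many more pixels, and therefore far less dwell time and signal per pixel.)

```python
# Same hypothetical field of view scanned at two pixel sizes, fixed total frame time.
field_of_view_nm = 20_000   # 20 x 20 micron field (illustrative)
frame_time_s = 1.0          # fixed time budget per frame

for pixel_nm in (8, 4):     # halving the pixel size...
    n_pixels = (field_of_view_nm / pixel_nm) ** 2
    dwell_us = frame_time_s / n_pixels * 1e6
    print(f"{pixel_nm} nm pixels: {n_pixels:,.0f} pixels, {dwell_us:.3f} us dwell each")
# ...quadruples the pixel count, so the dwell time -- and the photons or electrons
# collected per pixel -- drops 4x, unless you also quadruple the frame time
# and with it the dose delivered to the sample.
```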

All right, so here's a real-world example of what that might look like in terms of speed. This is an actual presynaptic bouton. On the left, you can see what happens with a low resolution scan, and on the right, a high resolution scan. You can see that on the left, we're refreshing much faster; on the right, we're refreshing much slower, but there's more detail in that image.

This is a direct example or a demonstration of the trade-off between pixel size and speed. What you can't see here is this is a two-dimensional image. This is a 40-nanometer section of brain tissue. And what we really need, if we want to image the entire brain, which we do, we're actually trying to image every single synapse, every single connection in the entire brain so we can build up a wiring diagram of how this works so that maybe we can even build more efficient GPUs.

The brain is actually 50,000 times more efficient than a GPU. So Google AI is actually investing in this technology just because they want to be able to build better computers and better software. Anyway, I digress. The brain is 3D, not 2D. So how do you do this? One way is to serially section the brain into 40-nanometer slices throughout the whole thing.

That is really laborious, really hard. And then you have to align everything. So what we did was we invested in a $1.5 million microscope that has a built-in knife that can image and cut and image and cut automatically. And then we can go through the entire brain. And a lot of people around the world are using this type of technology for that exact purpose.

It's not just brain. We can also go through a tumor. We can go through plant tissue. So that brings me to my next problem, sample damage. If we want that resolution, we have to use more electrons. And when you start using more electrons in our serial sectioning device, you start to melt the sample.

And the knife can no longer cut it. Try cutting melted butter: you can't get clear sections, and everything starts to fall apart, or as Jeremy would say, it starts to fall to crap. Whereas on the right, with a lower pixel resolution and a lower dose, you can actually see that we can section through pretty well.

But you can see the image is much grainier. We no longer have the level of detail that we really want to be able to map all of the fine structure of every single connection in the brain. But that is the state of connectomics research in the world today. Most people are imaging at that resolution.

But that's not satisfying for me. So in my other life, I'm a photographer. And actually, on Facebook, I follow Petapixel. And I was browsing through Facebook, stalking my friends, and doing whatever you do on Facebook. And I came across this article where they show that you can use deep learning to increase the resolution of compressed or low resolution photos.

I said, aha, what if we can use the same concept to increase the resolution of our microscope images? So this is the strategy that we ended up on. Previously, I used the same model that they used here, which was based on GANs, which you've heard about now. They're amazing.

Problem with GANs is they can hallucinate. They're hard to train. It worked well for some data; it didn't work well for other data. Then I was very lucky to meet Jeremy and Fred Monroe, and together we came up with a better strategy, which depends on what we call "crapify." So we take our high resolution EM images, we crapify them, and then we run them through our dynamic U-Net.
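(A rough sketch of what crappifying a high-resolution EM tile might look like; the actual PSSR degradation model differs in its details, and the noise level and downscale factor here are illustrative.)

```python
import numpy as np
from PIL import Image

def crappify_em(hr: Image.Image, scale: int = 4) -> Image.Image:
    """Degrade a high-resolution EM tile: add detector-like noise, then downsample,
    so the network can learn to undo both at once."""
    arr = np.asarray(hr.convert("L")).astype(np.float32)
    arr += np.random.normal(0, 8, arr.shape)   # shot/detector noise (illustrative sigma)
    lr = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    w, h = lr.size
    return lr.resize((w // scale, h // scale), Image.BILINEAR)
```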

And then we tested it with actual high versus low resolution images that we took on our microscope to make sure that our model was fine tuned to work the way we needed it to work. And the results are spectacular. So on the left, you can see a low resolution image.

This is analogous to what we would be imaging when we're doing our 3D imaging. And as you can see, the vesicles, which are about 35 nanometers across, are barely detectable here. You can kind of tell where one might be here, but kind of hard to say. And in the middle, we have the output of our deep learning model.

And you can see now we can clearly see the vesicles, and we can quantify them. And on the right is a high resolution image. That's our ground truth. In my opinion, the model actually produces better looking data than the ground truth. And there's a couple reasons for that, which I'm not going to go into, but one reason is that our training data was acquired on a better microscope than our testing data.

So now we can actually do that 3View serial sectioning at the low resolution that we're stuck with because of sample damage, but we can reconstruct it and get high resolution data. And this applies, as it turns out, to labs around the world. This is data from another lab, from a different organism, with different sample preparation, and it still works.

And that's bonkers: usually in microscopy, and in a lot of cases in deep learning, if your training data is acquired in a certain way and your testing data is something completely different, it just doesn't work. But in our case, it works really well. So we're super happy with that.

And what that means is that for all the connectomics researchers in the world who have this low resolution data, exabytes of data, where they can't see the synaptic vesicles, the details that actually underlie learning and memory and the fine-tuning of these connections, we can apply our model to their data and rescue all of the information that they had to throw away for the sake of throughput, or sample damage, or whatever.

The other thing I'll point out is that autoencoders, U-Nets, intrinsically denoise data, so the data comes out much cleaner. And because the data is cleaner, we can now segment it so much more easily than we ever could before. In this segmentation, we've identified the ER, mitochondria, and presynaptic vesicles, and it was easier than it ever was before.

So not only do we have better data, we have better analysis of that data than we ever hoped to have before. Of course, even without GANs, you have to worry about hallucinations, false positives. So we randomized a bunch of low resolution versus high resolution versus deep learning output images, and we had two expert humans count the synaptic vesicles that they saw in the images, and then we compared to the ground truth.

And of course, the biggest thing is that you see a lot fewer false negatives. We're able to detect way more presynaptic vesicles than we could before. We've gone from 46% to 70%. That's awesome. Sorry, we've gone from 43% to 13%; that's huge, even better than I just said. But we also have a little bit of an increase in false positives.

We've gone from 11% to 17%. I'm not actually sure all of those false positives are false. It's just that we have a limit in what our ground truth data can show. But the important thing is that the actual error between our ground truth data and our deep learning data is on the same order of magnitude, on the same level as the error between two humans looking at the ground truth data.

And I can tell you that there is no software that can do better than humans right now for identifying presynaptic vesicles. So in other words, our deep learning output is as accurate as you can hope to get anyway, which means that we can and should be using it in the field right away.

So I mentioned live imaging and culinary versus biological science. So here's a cancer cell. And we're imaging mitochondria that have been labeled with a fluorescent dye. And we're imaging it at maximum resolution. And what you can see is that the image is becoming darker and darker, which is a sign of something we call photobleaching.

That's a big problem. You can also see that the mitochondria are swelling. They're getting stressed; they're getting angry. We want to study mitochondria when they're not stressed and angry; we want to know what's normally happening to the mitochondria. So this is a problem: we can't actually image for a long period of time with high spatiotemporal resolution.

So we decided to see if we could apply the same method for live fluorescence imaging. So we under sample at the microscope. And then we use deep learning to restore the image. And as you can see, it works very, very well. So this methodology applies to electron microscopy for connectomics.

It applies to cancer research for live cell imaging. It is a broadly useful approach that we're very excited about. So in conclusion, it works, and I think there are a lot of exciting things to look forward to. We didn't use the NoGAN approach. We didn't use some of the more advanced ResNet backbones.

There is so much we can do even better than this. And that's just so exciting to me. And I just want to point out quickly that our AI is dependent on NI, natural intelligence. And this would not have happened at all without Jeremy, without Fred, and without Lin-Jing Fang, who's our image analysis engineer at the Salk Institute.

And I also want to point out that all of these biological studies are massive efforts that require a whole lot of people, a whole lot of equipment that contributed a lot to making this kind of stuff happen. So with that, I'll hand back over to Jeremy. And thank you very much.

When we started Fast AI a few years ago, as a self-funded research and teaching lab, we really hoped that by putting deep learning in the hands of domain experts like Uri that they would create stuff that we had never even heard of, solve problems we didn't even know existed.

And so it's beyond thrilling for me to see now that not only has that happened, but it's also helped us launch the Wicklow AI Medical Research Initiative. And with these things all coming together, we're able to see these extraordinary results, like DeOldify and PSSR, doing this world-changing stuff. It blows my mind what folks like Uri and Jason have built.

So what I want to do now is to see more people building mind-blowing stuff, people like you. And so I would love it if you go to fast.ai as well, like these guys did, and start your journey towards learning about building deep learning neural networks with PyTorch and the Fast AI library.

And if you like what you see, come back in June to the fast.ai website. We'll be launching this: Deep Learning from the Foundations. It's a new course which will teach you how to do everything from creating a high-quality matrix multiplication implementation from scratch, all the way up to doing your own version of DeOldify.

And so we'll be really excited to show you how things are built inside the Fast AI library, how things are built inside PyTorch itself. We'll be digging into the source code and looking at the actual academic papers that those pieces of source code are based on and seeing how you can contribute to the libraries yourself, do your own experiments, and move from just being a practitioner to a researcher doing stuff like what Jason and Uri have shown you today, doing stuff that no one has ever done before.

To learn more about all of these decrapification techniques you've seen today, in the next day or two there will be a post with a lot more detail on the fast.ai blog; check it out there. Check out DeOldify on GitHub, where you can start playing with the code today, and you can even run it in free Colab notebooks that Jason has set up for you.

So you can actually colorize your own movies, colorize your own photos. Maybe if grandma's got a great black and white photo of grandpa that she loves, you could turn it into a color picture and make her day. So we've got three minutes for maybe one or two questions, if anybody has any.

No questions? OK, oh, one question. Yes. It looks like it works very reliably with the coloring. Of course, there could be mistakes happening. And what about the ethical side of that, or changing history kind of a thing? I mean, this is why we mentioned it's an artistic process. It's not a recreation.

It's not a reenactment. And so if you give Granny a picture of Grandpa, and she says, oh no, he was wearing a red shirt, not a blue shirt, so be it. But the interesting thing is that, as Jason mentioned, there's no temporal component to the DeOldify model. So from frame to frame, it's discovering again and again that that's a blue shirt.

And so we don't understand the science behind this. But we think there's something encoded in the luminance data of the black and white itself that's saying something about what the color actually is. Because otherwise, it just wouldn't work. So there needs to be a lot more research done into this.

But it really seems like we might be surprised to discover how often the colors are actually correct, because somehow it seems to be able to reverse engineer what the colors probably would have been. Even as it goes from camera one to camera two of the same scene, the actors are still wearing the same colored pants.

And so somehow, it knows how to reverse engineer the color. Any more questions? Yes, one over there. I find the super resolution images really interesting. So there was one that's showing with the cells. And how we can make it more detailed with the details on the cell. So it was side by side with the ground truth data.

And I'm seeing that the one created by the super resolution is interesting, because obviously it's a lot better than the crappy version, but it's untrue on some levels in terms of the cell wall thickness compared to the ground truth. So my question is, given that AI is making it more detailed and making stuff up, basically, how can we use that?

Right. So that's why Uri and his team did this really fascinating process of actually checking it with humans. So as he said, they actually decreased the error rate very significantly. So that's the main thing. Did you want to say more about that? I think you said basically what I would have said, which is that we tried to use real biological structures to quantify the error rate.

And we used an example of a really small biological structure that you wouldn't be able to see at low resolution, but you can see at high resolution. And we found that our error rates were basically within the gold standard of what we would be getting from our ground truth data anyway.

I'm not saying it's perfect. I think it will become better, especially working with Jeremy. But it's already usable. And partly, algorithmically speaking, this comes out of the loss function, which can actually encode what the domain expert cares about. So you pre-train a network with a loss function where the feature loss only recognizes something is there if the domain expert said it was there.

So there are things we can do to actually stop it from hallucinating things we care about. Thanks, everybody. Thanks very much.