NSFW Image Detection with AI

00:00:00.000 | Today we're going to be taking a look at the new Vision and Image features of Semantic Router.

00:00:05.840 | So we've added Vision transformers, thanks to Bogdan who also put together the demo that we're going to be walking through,

00:00:11.800 | and CLIP, which is a multimodal model.

00:00:15.240 | Now both of these together mean that we now have the ability to use image routes and also multimodal routes.

00:00:23.720 | Now, why would we want to use Vision or multimodal routes?

00:00:28.120 | Well, there are actually a lot of use cases, for example, data pre-processing.

00:00:32.000 | We can use this to route our processing methodology in different directions, for example with PDFs.

00:00:38.360 | We see both text and image, and based on the type of text or the types of images that we're seeing,

00:00:44.760 | we might want to process them in different ways, so we can use it there.

00:00:48.160 | We can also use these encoders in a slightly different way using the Semantic splitters in the Semantic Router library,

00:00:54.520 | which I haven't spoken about yet, but I will soon, to split video automatically based on the imagery that you're seeing within the video.

00:01:02.880 | And I think by far probably the most obvious use case here is image detection,

00:01:08.920 | and in particular with the increase in AI-generated images and maybe just people-generated images as well,

00:01:15.080 | you can use this for things like SFW versus NSFW image detection, which is what we're going to see here.

00:01:22.160 | So let's start by having a look at our NSFW SFW dataset.

00:01:27.320 | So for those of you that are not aware, SFW meaning Shrek for work, and NSFW is not Shrek for work.

00:01:35.720 | The idea is that when you're at work, you only want to be viewing SFW pictures, i.e. pictures containing Shrek.

00:01:43.720 | Whereas when you're not at work, you can look at any images you want.

00:01:47.360 | So we have an example notebook for this, of course.

00:01:51.120 | We're going to come to the Semantic Router library, docs, multimodal here, and I'm going to click open in Colab.

00:01:58.120 | Okay, great. So we're going to first do a pip install of Semantic Router, the version that we need for this,

00:02:05.120 | and we're specifying the vision dependencies here.

00:02:08.560 | There are a few vision dependencies, you've got like torchvision and things in there,

00:02:13.280 | so this can take a little bit of time to actually install everything.

00:02:18.080 | We're also going to be using Hugging Face Datasets. That is because we're going to be downloading the dataset I just showed you,

00:02:25.520 | the Shrek versus not Shrek dataset to use as routes here.

00:02:30.720 | So while we are waiting for that to install, I'm going to come down to here and I'll just show you what this dataset actually looks like.

00:02:37.520 | So we have two splits in the dataset, a training split and a test split.

00:02:43.600 | Now to load it, we're going to use this, and then you can see that we have these images here.

00:02:47.840 | So we have this, I've counted this as a Shrek image.

00:02:51.360 | So what we're going to want to do is set up some routes that detect Shrek or not Shrek.

00:02:59.520 | And we're going to be using these images within the training split.

00:03:03.680 | We also have, if we come down here, we also have our test split.

00:03:08.000 | We won't use any of these to create our routes because we want to see that this does apply or does transfer,

00:03:15.160 | generalize to our test data as well.

00:03:18.000 | And obviously we see some slightly different Shrek and not Shrek images in here.

00:03:25.040 | So I'll skip ahead to when our install is complete.

00:03:28.840 | Okay, so it's installed.

00:03:30.280 | You will see this little warning here.

00:03:32.440 | It's not a big deal, it's fine.

00:03:33.960 | It does work.

00:03:36.680 | Okay, run that and we should see that this will work.

00:03:39.680 | You should see the Shrek rock image pop up.

00:03:43.480 | Okay, cool, looks good.

00:03:45.240 | Now what we want to do is grab all the images that are labeled with isShrek.

00:03:49.720 | So you can see in the data, maybe I'll come here and show you.

00:03:56.400 | So let's look at the data.

00:03:59.560 | We have three fields.

00:04:01.200 | So text, which is like a kind of descriptive field of what is within the image, although it's not really that descriptive.

00:04:11.480 | So like this one is Dwayne Johnson with hair.

00:04:14.920 | We have the image file and then we also have, okay, this isn't Shrek and we can have a look.

00:04:20.320 | All right, so we have image.

00:04:25.320 | Let's take a look.

00:04:27.200 | Okay, not Shrek.

00:04:28.520 | This one is Shrek.

00:04:29.800 | So what we want to do is grab the images that are labeled with isShrek.

00:04:33.960 | So, for example, the third one that we have here, this is Dwayne Johnson's Shrek and it is Shrek here.

00:04:44.240 | So we're going to go through, grab those and we're creating a list here of images that are Shrek and images that are not.

00:04:52.360 | Okay, so we have five that are Shrek, 19 that are not.
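
The filtering step described above can be sketched in plain Python. This is only an illustration: the records, file names, and the `is_shrek` field name are assumptions standing in for the dataset shown in the video, which has a text field, an image field, and a Shrek label.

```python
# Hypothetical records mirroring the three fields described in the video.
# The field name `is_shrek`, the captions, and the file names are assumptions.
records = [
    {"text": "Dwayne Johnson with hair", "image": "rock_hair.png", "is_shrek": False},
    {"text": "Shrek in the swamp", "image": "shrek_swamp.png", "is_shrek": True},
    {"text": "Dwayne Johnson as Shrek", "image": "rock_shrek.png", "is_shrek": True},
    {"text": "A forest at dawn", "image": "forest.png", "is_shrek": False},
]

# Split the images into the two groups that will back the two routes.
shrek_pics = [r["image"] for r in records if r["is_shrek"]]
not_shrek_pics = [r["image"] for r in records if not r["is_shrek"]]

print(f"Shrek: {len(shrek_pics)}, not Shrek: {len(not_shrek_pics)}")
```

Each list then plays the role that a list of text utterances would play in a text-only route.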

00:04:55.800 | Okay, so we're going to create our routes using the images.

00:04:59.320 | So the image is actually going in place of where we'd usually put our utterances, okay.

00:05:04.360 | So, yeah, we can create that and we're also going to create a not Shrek route as well.

00:05:10.200 | We could, I think we could just avoid that to be honest, but it's okay.

00:05:14.920 | We could do either really.

00:05:16.920 | It's good to be very verbose with your embeddings.

00:05:21.400 | We're going to initialize our multimodal CLIP encoder here.

00:05:25.720 | Okay, we'll take a moment to download.

00:05:27.920 | It's not massive; this is the model size here.

00:05:30.840 | It's like almost 600 megabytes, it's not huge, but it isn't small either.

00:05:37.720 | And then what we want to do is initialize our route layer.

00:05:39.960 | A route layer always, as we've seen before, requires an encoder, which is in this case our multimodal CLIP encoder, and the routes.

00:05:49.240 | Okay, the routes we defined before with Shrek and not Shrek.
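
To make the idea of "an encoder plus routes" concrete, here is a toy sketch of nearest-route matching. It is not Semantic Router's implementation: the 3-dimensional vectors are made up to stand in for CLIP embeddings, and the `route` function and its `threshold` parameter are hypothetical names chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Made-up vectors standing in for embeddings of each route's example images.
routes = {
    "shrek": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "not_shrek": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.3]],
}

def route(query_vec, routes, threshold=0.8):
    """Return the best-matching route name, or None if nothing is similar enough."""
    best_name, best_score = None, -1.0
    for name, vectors in routes.items():
        # Score a route by its single most similar example.
        score = max(cosine(query_vec, v) for v in vectors)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

print(route([0.85, 0.15, 0.05], routes))  # close to the "shrek" examples
print(route([0.05, 0.85, 0.2], routes))   # close to the "not_shrek" examples
print(route([0.0, 0.0, 1.0], routes))     # far from both, so below the threshold
```

Because a multimodal encoder places text and images in one embedding space, the query vector could come from either a caption or a picture; the matching logic stays the same.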

00:05:54.320 | And we're going to test, okay, so we're going to see "don't you love politics?"

00:05:58.400 | That shouldn't really be either of them; it shouldn't trigger anything, right?

00:06:03.280 | And this is a, you know, you can see here that I'm using text to classify here, even though we use just images in our routes.

00:06:11.200 | So that's the kind of interesting thing that we can do here.

00:06:14.160 | Okay, so we can see that here Shrek, the text is classified as Shrek in the routes, which is cool.

00:06:21.960 | So it's, you know, putting them into the images of Shrek bucket.

00:06:25.560 | And then Dwayne "The Rock" Johnson, it's seen Dwayne Johnson in the images and they are tagged as not Shrek.

00:06:33.720 | So it's giving us the not Shrek route.

00:06:36.400 | Okay, so we have everything being classified correctly there with those, you know, with that text.

00:06:42.600 | But what we really want to be doing here as well, you know, we can do both, of course, like we've seen,

00:06:48.280 | is we can take some images that we haven't seen before and see if we can label them correctly.

00:06:54.840 | So we're going to be loading this other dataset.

00:06:58.520 | So the test dataset, and then here there's a mix of, you know, as I said, Shrek and not Shrek.

00:07:05.040 | So this one, we have Shrek and we will see what classification we get here, which you can actually see already.

00:07:13.000 | So I run this and we can see Shrek.

00:07:15.640 | We can remove the name here, it gives us the full route choice object.

00:07:19.680 | Okay, so name Shrek.

00:07:21.280 | We have another image here again of Shrek.

00:07:24.600 | And if we come down here, yeah, I mean, I've already run it.

00:07:29.320 | So you can see that it is classifying as Shrek.

00:07:32.440 | And then we have our not Shrek picture here.

00:07:36.120 | Where did that go?

00:07:38.120 | Okay, so we have this nice coral reef.

00:07:41.280 | And if we come down to here, it's saying this is not Shrek.

00:07:45.360 | Okay, so I think in the training data for not Shrek, we have some nature images.

00:07:50.400 | So it puts nature images into that rather than none.

00:07:53.920 | So, yeah, I mean, that is it.

00:07:57.320 | We have our multimodal route layer here, it seems to be working pretty well.

00:08:03.000 | We can also, you know, if you want to take this further, you can go have a look at the route optimization stuff that we've talked about,

00:08:09.120 | where you're literally training your route layer on like a training set of utterances to the routes that they should trigger.
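
One way to picture this kind of optimization is as a search for the score threshold that best separates examples that should trigger a route from those that should not. The sketch below is a simplified assumption about what such training involves, not the library's own routine; the scores, labels, and function names are all made up for illustration.

```python
# Hypothetical (similarity score, should_trigger) pairs for a single route.
# In practice the scores would come from the encoder; these are invented.
labelled_scores = [
    (0.92, True), (0.85, True), (0.78, True),
    (0.60, False), (0.55, False), (0.81, False),
]

def accuracy(threshold, data):
    """Fraction of examples where `score >= threshold` agrees with the label."""
    correct = sum((score >= threshold) == label for score, label in data)
    return correct / len(data)

def best_threshold(data, candidates):
    """Pick the candidate with the highest accuracy (the first one wins ties)."""
    return max(candidates, key=lambda t: accuracy(t, data))

# Candidate thresholds from 0.50 to 0.95 in steps of 0.05.
candidates = [i / 20 for i in range(10, 20)]
threshold = best_threshold(labelled_scores, candidates)
print(threshold, accuracy(threshold, labelled_scores))
```

A grid search like this is the simplest option; a real optimizer might sample thresholds randomly and tune every route at once.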

00:08:18.240 | With that, you can get pretty good results.

00:08:20.720 | And we have like an image detection or classification route layer here, which works pretty well.

00:08:28.760 | So that is it for this video.

00:08:30.280 | As I mentioned at the start, there's a lot more that we can actually do with this,

00:08:33.680 | ranging from the route layers that we've seen here to even simple video splitting or more intelligent data processing.

00:08:43.440 | And I'm sure there's plenty of other ways that we can use this as well that I just haven't thought of yet.

00:08:50.720 | So I'm very interested to see what people build with this.

00:08:54.760 | If you do decide to build something cool, I'd love to hear about it.

00:08:58.080 | But for now, I'm going to leave it there.

00:08:59.200 | So thank you very much for watching.

00:09:01.360 | I hope this has been useful and interesting.

00:09:03.840 | And I will see you again in the next one.

00:09:06.080 | Bye.

00:09:07.080 | [MUSIC PLAYING]
