Streamlit for ML #3 - Make Apps Fast with Caching

00:00:00.000 | In this video, we are going to continue with the app that we've been building.

00:00:04.740 | So, a quick summary of where we are so far.

00:00:07.540 | We have what you can see over here.

00:00:10.000 | We have this AI Q&A.

00:00:13.100 | It's just the very basics of Streamlit.

00:00:15.940 | And we have this search bar or text input that we've built with Streamlit over here.

00:00:22.240 | Okay.

00:00:23.300 | And then we put together the back end of our app.

00:00:30.560 | So that is initializing a connection to our vector database.

00:00:35.300 | We also created all of our context vectors and put those in there.

00:00:40.400 | And also initializing our retriever model,

00:00:44.940 | which takes care of encoding the query here.

00:00:48.200 | And then Pinecone over here takes care of finding

00:00:52.800 | the most relevant context vectors based on that query vector.

00:00:59.100 | And then we had a look at how we can iterate through all the contexts

00:01:04.040 | that we returned from Pinecone and then display them.

00:01:07.740 | Now, at the moment, it's very ugly.

00:01:09.860 | And at the same time, another really bad thing

00:01:14.560 | that we need to solve in this video is this takes forever to do anything, right?

00:01:20.100 | If I just, maybe even if I just remove that and press Enter,

00:01:25.800 | I'm not even searching for anything.

00:01:28.140 | And this is going to take, I don't know, like a minute.

00:01:31.060 | I'm going to cut forwards so you don't have to wait as long as I do.

00:01:35.360 | Okay. So it's just finished. That took way too long.

00:01:39.560 | So the reason for that

00:01:43.060 | is mainly the retriever model download over here.

00:01:46.960 | So every time we rerun or change anything in our app,

00:01:51.700 | the way Streamlit works is it re-executes everything in your script.

00:01:56.160 | And that's really good because it makes developing an app super simple.

00:02:00.800 | But when you have something like that, you're downloading an ML model,

00:02:05.360 | you don't want to redo that every time the tiniest little thing changes in your app.

00:02:10.500 | You only want to do it once, like when the user opens your app the first time,

00:02:15.060 | then you download it, and then you don't download it again.

00:02:17.560 | That's what you want to happen.

00:02:19.840 | And we also want the same to happen with our Pinecone connection.

00:02:24.860 | We just want to initialize that connection once

00:02:27.540 | and not every time something changes in the app.
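
A minimal sketch of the problem just described, using plain Python rather than Streamlit itself. The names `load_retriever` and `run_script` are hypothetical stand-ins: `run_script` plays the role of one top-to-bottom execution of the app script, and `load_retriever` plays the role of the model download.

```python
# Toy model of the rerun behaviour: every interaction re-executes the
# whole script, so any expensive step in the script body runs again.

calls = {"load": 0}  # counts how often the "expensive" step runs


def load_retriever():
    """Hypothetical stand-in for downloading the retriever model."""
    calls["load"] += 1
    return "retriever-model"


def run_script(query):
    """Stand-in for one top-to-bottom execution of the app script."""
    model = load_retriever()  # repeated on every rerun
    return f"{model} encodes {query!r}"


# Three interactions, including an empty query, cause three full reruns.
for query in ["", "who are the normans?", "who are the normans"]:
    run_script(query)

print(calls["load"])  # one expensive load per rerun
```

The point is only that the load count grows with the number of reruns, which is what the caching in the rest of the video is meant to avoid.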

00:02:31.060 | So we're going to do that. We're going to figure that out.

00:02:34.360 | So we can go over to the Streamlit docs,

00:02:38.440 | and we can scroll up to the top or go to its menu.

00:02:42.160 | And we go to Advanced Features.

00:02:45.540 | Now, I know it says Advanced, but it's not hard to do this.

00:02:49.300 | So we can optimize performance with st.cache.

00:02:53.400 | Let's have a look at that.

00:02:55.600 | So we can scroll down.

00:02:58.700 | It's a caching mechanism that allows your app to stay performant

00:03:02.040 | when loading data from the web,

00:03:03.900 | manipulating large datasets, or performing expensive computations.

00:03:07.060 | Now, that sounds pretty much like what we want.

00:03:10.500 | So let's go down, and we see Basic Usage.

00:03:14.100 | So we have... This is a good example.

00:03:17.200 | So we have this function here.

00:03:19.300 | It takes a long time to run every time.

00:03:22.640 | And therefore, it makes the app very slow.

00:03:25.760 | Every time anything changes, the whole thing is reloaded.

00:03:29.640 | So this expensive computation is rerun every time.

00:03:33.060 | We don't want that to happen.

00:03:34.800 | What we can do is, okay, you can just add this st.cache decorator.

00:03:38.540 | And that means that the output from that expensive computation

00:03:44.340 | is just stored.

00:03:45.740 | It's not reloaded every single time.
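
The idea of storing the output can be sketched in plain Python. This is a simplified stand-in and not Streamlit's implementation: `simple_cache` and `expensive_computation` are hypothetical names, and in the app the decorator line would be `@st.cache` as shown in the docs.

```python
import functools


def simple_cache(func):
    """Simplified stand-in for a caching decorator (not Streamlit's code)."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in store:       # first call with these inputs: compute
            store[args] = func(*args)
        return store[args]          # later calls: reuse the stored output

    return wrapper


calls = {"compute": 0}


@simple_cache  # in the app, this line would be @st.cache
def expensive_computation(a, b):
    calls["compute"] += 1  # stands in for the slow work
    return a * b


# Three "reruns" with the same inputs only compute once.
results = [expensive_computation(2, 21) for _ in range(3)]
print(results, calls["compute"])
```

A call with different inputs would be computed and stored separately, since the stored output is looked up by the arguments.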

00:03:48.900 | Okay, so let's try.

00:03:52.200 | I'm not saying it's going to work, but let's try and do that.

00:03:54.940 | So we're going to put def init_retriever.

00:03:59.340 | Okay, and that is just going to return the retriever model.

00:04:05.660 | So return that.

00:04:09.540 | And we do the same for our Pinecone stuff here as well.

00:04:14.040 | So def init_pinecone.

00:04:17.440 | Okay, and then obviously, we need to actually call those.

00:04:22.600 | So let's do that here.

00:04:26.940 | Okay, so we're going to call those.

00:04:30.140 | We want model is equal to init_retriever.

00:04:35.460 | And we want index is equal to init_pinecone.

00:04:42.000 | Okay, let's save that.

00:04:43.560 | Let's have a look at our app.

00:04:46.560 | Okay, so it's running again.

00:04:48.300 | Let's wait a moment.

00:04:51.400 | Actually, stop that, because here in init_pinecone we're returning nothing.

00:04:55.800 | So we actually want to return the index.

00:04:58.400 | That would be useful.

00:05:00.460 | And now we have to press this, rerun.

00:05:07.440 | Okay, now the first time we do this, it's going to take a while.

00:05:12.160 | And first, okay, we want to make sure it's actually,

00:05:18.140 | is it working like it was before?

00:05:20.900 | Let's see.

00:05:22.860 | Are we getting any errors?

00:05:24.900 | Okay, no, it all seems to be working fine.

00:05:28.060 | Okay, so let's add that st.cache

00:05:32.960 | that we saw in the documentation.

00:05:35.340 | Let's add that to both of these.

00:05:38.500 | Okay, save, rerun.

00:05:42.540 | We get this nice little spinner, running init_retriever.

00:05:46.000 | It's not very descriptive for our users.

00:05:47.680 | So later on, we'll have a look at making that a little more

00:05:51.340 | interesting or descriptive.

00:05:53.480 | But for now, we'll stick with that.

00:05:55.680 | And this is quite useful because we can see, okay,

00:05:58.000 | what are the slow parts of our app to load?

00:06:02.580 | Okay, so we get this error.

00:06:07.140 | Okay, why is that?

00:06:09.280 | So when we are caching with Streamlit,

00:06:17.340 | what it is doing is, well, it basically checks

00:06:21.680 | if whatever's been cached changes, okay, with every rerun.

00:06:26.080 | So it's taking the function and the values you put

00:06:29.400 | into your function on each rerun

00:06:31.240 | and having a look at the hash code of those and of what comes out of it.

00:06:34.980 | Now, in this case, we're calling an API.

00:06:40.980 | We cannot actually hash the connection

00:06:49.880 | to our Pinecone index, okay?
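
A rough analogy for why a connection object resists hashing, in plain Python. This is not Streamlit's hasher: `content_hash` is a hypothetical helper that serialises an object with `pickle` and hashes the bytes, and `fake_connection` is a made-up stand-in for the index connection that holds a thread lock, as live connections often hold locks or sockets.

```python
import hashlib
import pickle
import threading
import types


def content_hash(obj):
    """Hypothetical helper: hash an object by serialising its contents."""
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


# Plain data serialises fine, so it can be hashed and compared across reruns.
data_hash = content_hash({"context": "some text", "score": 0.9})

# Made-up stand-in for a live connection: it carries a lock that cannot
# be serialised, so there is nothing stable to hash.
fake_connection = types.SimpleNamespace(name="index", lock=threading.Lock())

try:
    content_hash(fake_connection)
    hashable = True
except TypeError as err:
    hashable = False
    print(f"cannot hash connection: {err}")
```

The data dictionary gets a hash, while the connection stand-in raises a `TypeError`, which is the same kind of situation the error in the app is pointing at.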

00:06:52.680 | And we shouldn't really do that for our retriever model either.

00:06:56.240 | So what we can do is something which is kind of new

00:07:00.320 | from Streamlit, okay?

00:07:02.680 | So whereas st.cache is always going to check the hash code,

00:07:07.380 | see if anything is changing,

00:07:08.880 | there are these new experimental caches.

00:07:12.180 | And one of those in particular is this,

00:07:15.380 | we have st.experimental_memo, which is fine.

00:07:18.520 | So we use that to store expensive computations.

00:07:21.580 | That's fine, you can try that with some things,

00:07:24.840 | but that's not what we want.

00:07:26.380 | We want this st.experimental_singleton.

00:07:29.140 | So basically what that means, st.experimental_singleton,

00:07:32.620 | is whatever you're running should just be run once

00:07:35.240 | and it should not change, right?

00:07:37.740 | Streamlit is assuming that this will not change, right?

00:07:40.380 | So it's not going to check if it's changed

00:07:43.140 | and therefore it is not going to create that hash representation

00:07:47.820 | of whatever it is you're running.
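
The run-once behaviour can be sketched with the standard library. Here `functools.lru_cache` is only a stand-in for `st.experimental_singleton` (which later Streamlit releases superseded with `st.cache_resource`), and the returned objects are made-up placeholders rather than the real model and index.

```python
import functools
import threading
import types

calls = {"retriever": 0, "pinecone": 0}


# In the app, the decorator on both functions would be
# @st.experimental_singleton; lru_cache is a stdlib stand-in that also
# runs the function once and never hashes what it returns.
@functools.lru_cache(maxsize=None)
def init_retriever():
    calls["retriever"] += 1
    return types.SimpleNamespace(name="retriever-model")  # placeholder


@functools.lru_cache(maxsize=None)
def init_pinecone():
    calls["pinecone"] += 1
    # Placeholder connection holding a lock, i.e. something unhashable.
    return types.SimpleNamespace(name="index", lock=threading.Lock())


def run_script(query):
    """Stand-in for one top-to-bottom rerun of the app script."""
    model = init_retriever()
    index = init_pinecone()
    return model, index


first = run_script("")
second = run_script("who are the normans?")

print(calls)                   # each init function ran once
print(first[1] is second[1])   # the same connection object is reused
```

Because the output is never hashed, it does not matter that the connection placeholder cannot be serialised. The trade-off is the one described above: the cached object is assumed not to change between reruns.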

00:07:50.620 | So we can write st.experimental_singleton.

00:08:00.180 | Put it here as well, oops, copy it.

00:08:04.180 | Put it here.

00:08:09.380 | And okay, we've just saved it.

00:08:11.540 | Let's have a look, see what happens.

00:08:14.480 | Okay, again, it's going to take a little while

00:08:16.440 | to rerun everything, hopefully not too long.

00:08:21.100 | Okay, there we go.

00:08:22.180 | So now we have our search.

00:08:23.980 | Let's say, who are the Normans?

00:08:27.800 | Okay, I'm not going to skip ahead straight away.

00:08:31.500 | Okay, so there's no waiting anymore,

00:08:33.320 | which is really good because before it just took so long.

00:08:38.800 | So yeah, that's how we've sort of improved the performance

00:08:42.820 | of our Streamlit app using caching

00:08:46.200 | and these new experimental caching primitives

00:08:48.520 | that Streamlit have developed.

00:08:51.320 | So that's incredibly useful.

00:08:54.560 | And what I want to look at in the next video

00:08:57.000 | is, okay, going back over to our app.

00:08:59.960 | Yes, the performance is there now,

00:09:01.360 | but it doesn't look so good.

00:09:03.860 | So maybe we can have a look at actually improving

00:09:08.600 | this look here.

00:09:11.160 | And to do that, we're actually going to not use,

00:09:14.360 | well, we are going to use Streamlit,

00:09:15.960 | but we're going to pull in what are called

00:09:18.720 | Bootstrap card components,

00:09:21.120 | which come from another sort of HTML and CSS library.

00:09:25.700 | And we'll use the style from them to display our information.

00:09:30.700 | It will look a lot nicer than it does now.

00:09:34.400 | So that's it for this video.

00:09:37.260 | I hope it's been useful.

00:09:38.860 | Thank you very much for watching

00:09:40.160 | and I'll see you in the next one.
