
Streamlit for ML #3 - Make Apps Fast with Caching


Chapters

0:00 Intro
2:35 Streamlit Caching
6:56 Experimental Caching Primitives


00:00:00.000 | In this video, we are going to continue with the app that we've been building.
00:00:04.740 | So, as a quick summary of where we are so far.
00:00:07.540 | We have what you can see over here.
00:00:10.000 | We have this AI Q&A.
00:00:13.100 | It's just the very basics of Streamlit.
00:00:15.940 | And we have this search bar or text input that we've built with Streamlit over here.
00:00:22.240 | Okay.
00:00:23.300 | And then we put together the back end of our app.
00:00:30.560 | So that is initializing a connection to our vector database.
00:00:35.300 | We also created all of our context vectors and put those in there.
00:00:40.400 | And also initializing our retriever model,
00:00:44.940 | which takes care of encoding the query here.
00:00:48.200 | And then Pinecone over here takes care of finding
00:00:52.800 | the most relevant context vectors based on that query vector.
00:00:59.100 | And then we had a look at how we can iterate through all the contexts
00:01:04.040 | that we returned from Pinecone and then display them.
00:01:07.740 | Now, at the moment, it's very ugly.
00:01:09.860 | And at the same time, another really bad thing
00:01:14.560 | that we need to solve in this video is that this takes forever to do anything, right?
00:01:20.100 | If I just, maybe even if I just remove that and press Enter,
00:01:25.800 | I'm not even searching for anything.
00:01:28.140 | And this is going to take, I don't know, like a minute.
00:01:31.060 | I'm going to cut forwards so you don't have to wait as long as I do.
00:01:35.360 | Okay. So it's just finished. That took way too long.
00:01:39.560 | So what we want to do, or the reason for that
00:01:43.060 | is mainly the retriever model download over here.
00:01:46.960 | So every time we rerun or change anything in our app,
00:01:51.700 | the way Streamlit works is it re-executes everything in your script.
00:01:56.160 | And that's really good because it makes developing an app super simple.
00:02:00.800 | But when you have something like that, you're downloading an ML model,
00:02:05.360 | you don't want to redo that every time the tiniest little thing changes in your app.
00:02:10.500 | You only want to do it once, like when the user opens your app the first time,
00:02:15.060 | then you download it, and then you don't download it again.
00:02:17.560 | That's what you want to happen.
00:02:19.840 | And we also want the same to happen with our Pinecone connection.
00:02:24.860 | We just want to initialize that connection once
00:02:27.540 | and not every time something changes in the app.
00:02:31.060 | So we're going to do that. We're going to figure that out.
00:02:34.360 | So we can go over to the Streamlit docs,
00:02:38.440 | and we can scroll up to the top or go to its menu.
00:02:42.160 | And we go to Advanced Features.
00:02:45.540 | Now, I know it says Advanced, but it's not hard to do this.
00:02:49.300 | So we can optimize performance with st.cache.
00:02:53.400 | Let's have a look at that.
00:02:55.600 | So we can scroll down.
00:02:58.700 | It's a caching mechanism that allows your app to stay performant
00:03:02.040 | even when loading data from the web,
00:03:03.900 | manipulating large data sets, or performing expensive computations.
00:03:07.060 | Now, that sounds pretty much like what we want.
00:03:10.500 | So let's go down, and we see Basic Usage.
00:03:14.100 | So we have... This is a good example.
00:03:17.200 | So we have this function here.
00:03:19.300 | It takes a long time to run every time.
00:03:22.640 | And therefore, it makes the app very slow.
00:03:25.760 | Every time anything changes, the whole thing is reloaded.
00:03:29.640 | So this expensive computation is rerun every time.
00:03:33.060 | We don't want that to happen.
00:03:34.800 | What we want to do is, okay, you can just add this.
00:03:38.540 | And that means that the output from that expensive computation
00:03:44.340 | is just stored.
00:03:45.740 | It's not reloaded every single time.
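
To make the idea of storing the output concrete, here is a small sketch that uses `functools.lru_cache` from the standard library as a stand-in for the Streamlit decorator. That is an assumption made so the example runs anywhere: `lru_cache` only memoizes by arguments and does not perform the extra checks that `st.cache` does. The function name `expensive_computation` and the `run_script` helper are illustrative.

```python
import time
from functools import lru_cache

call_count = 0  # counts how often the function body really executes


# In the app this line would be the Streamlit decorator; lru_cache is a
# standard-library stand-in that also stores results keyed by the arguments.
@lru_cache(maxsize=None)
def expensive_computation(a, b):
    global call_count
    call_count += 1
    time.sleep(0.1)  # pretend this is slow
    return a * b


def run_script():
    """One simulated rerun of the app script."""
    return expensive_computation(2, 21)


first = run_script()   # slow: the body runs and the result is stored
second = run_script()  # fast: the stored result is returned
print(first, second, call_count)
```

The second simulated rerun returns the stored value, so the body of the function executes only once.
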
00:03:48.900 | Okay, so let's try.
00:03:52.200 | I'm not saying it's going to work, but let's try and do that.
00:03:54.940 | So we're going to put def init_retriever.
00:03:59.340 | Okay, and that is just going to return the retriever model.
00:04:05.660 | So return that.
00:04:09.540 | And we do the same for our Pinecone stuff here as well.
00:04:14.040 | So def init_pinecone.
00:04:17.440 | Okay, and then obviously, we need to actually call those.
00:04:22.600 | So let's do that here.
00:04:26.940 | Okay, so we're going to call those.
00:04:30.140 | We want model to be equal to init_retriever().
00:04:35.460 | And we want index to be equal to init_pinecone().
00:04:42.000 | Okay, let's save that.
00:04:43.560 | Let's have a look at our app.
00:04:46.560 | Okay, so it's running again.
00:04:48.300 | Let's wait a moment.
00:04:51.400 | Actually, stop that because here we're returning nothing.
00:04:55.800 | So we actually want to return the index.
00:04:58.400 | That would be useful.
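
The refactor just described, including the missing return, can be sketched in plain Python. `FakeRetriever` and `FakeIndex` are hypothetical stand-ins for the real retriever model and Pinecone index, which are not reproduced here; only the shape of the two init functions follows the video.

```python
class FakeRetriever:
    """Hypothetical stand-in for the retriever model."""

    def encode(self, text):
        return [float(len(text))]


class FakeIndex:
    """Hypothetical stand-in for the Pinecone index connection."""

    def query(self, vector, top_k=3):
        return [{"score": 1.0, "vector": vector}][:top_k]


def init_retriever():
    # Build the model once and hand it back to the caller.
    return FakeRetriever()


def init_pinecone_missing_return():
    index = FakeIndex()  # created, but never returned: the caller gets None


def init_pinecone():
    index = FakeIndex()
    return index  # the fix: return the index


model = init_retriever()
broken_index = init_pinecone_missing_return()
index = init_pinecone()

print(broken_index is None)
print(index.query(model.encode("who are the Normans?")))
```

A Python function without a return statement gives back `None`, which is why the first attempt had nothing to query.
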
00:05:00.460 | And now we have to press this, rerun.
00:05:07.440 | Okay, now the first time we do this, it's going to take a while.
00:05:12.160 | And first, okay, we want to make sure it's actually,
00:05:18.140 | is it working like it was before?
00:05:20.900 | Let's see.
00:05:22.860 | Are we getting any errors?
00:05:24.900 | Okay, no, it all seems to be working fine.
00:05:28.060 | Okay, so let's do, let's add that st.cache
00:05:32.960 | that we saw in the documentation.
00:05:35.340 | Let's add that to both of these.
00:05:38.500 | Okay, save, rerun.
00:05:42.540 | We get this nice little spinner, Running init_retriever.
00:05:46.000 | It's not very descriptive for our users.
00:05:47.680 | So later on, we'll have a look at making that a little more
00:05:51.340 | interesting or descriptive.
00:05:53.480 | But for now, we'll stick with that.
00:05:55.680 | And this is quite useful because we can see, okay,
00:05:58.000 | what are the slow parts of our app to load?
00:06:02.580 | Okay, so we get this error.
00:06:07.140 | Okay, why is that?
00:06:09.280 | So when we are caching with Streamlit,
00:06:17.340 | what it is doing is, well, it basically checks
00:06:21.680 | whether whatever's been cached has changed, okay, with every rerun.
00:06:26.080 | So it's taking the function, the values going
00:06:29.400 | into your function, and the output that comes back,
00:06:31.240 | and having a look at what the hash code is for each of them.
00:06:34.980 | Now, in this case, we're calling an API.
00:06:40.980 | We cannot actually hash the connection
00:06:49.880 | to our Pinecone index, okay?
00:06:52.680 | And we shouldn't really do that for our retriever model either.
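
Here is a hedged sketch of why a live connection is awkward to hash. It is not Streamlit's actual hasher: the `fingerprint` helper below simply pickles an object and hashes the bytes, and `FakeConnection` is a hypothetical stand-in that holds a thread lock, the kind of internal state real client objects often carry.

```python
import hashlib
import pickle
import threading


class FakeConnection:
    """Hypothetical stand-in for a live API connection."""

    def __init__(self):
        self._lock = threading.Lock()  # internal state that cannot be pickled


def fingerprint(obj):
    """Turn an object into a hash code by serializing it first."""
    return hashlib.sha256(pickle.dumps(obj)).hexdigest()


# Plain data hashes without trouble, and the hash is stable.
data_ok = fingerprint([1, 2, 3]) == fingerprint([1, 2, 3])

# A connection-like object cannot be serialized, so it cannot be fingerprinted.
try:
    fingerprint(FakeConnection())
    connection_ok = True
except TypeError:
    connection_ok = False

print(data_ok, connection_ok)
```

Any cache that needs a hash code for its output runs into this kind of failure as soon as the output is a connection rather than plain data.
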
00:06:56.240 | So what we can do is something which is kind of new
00:07:00.320 | from Streamlit, okay?
00:07:02.680 | So whereas st.cache is always going to check the hash code,
00:07:07.380 | see if anything is changing,
00:07:08.880 | there are these new experimental caches.
00:07:12.180 | And one of those in particular is this,
00:07:15.380 | we have st.experimental_memo, which is fine.
00:07:18.520 | So we use that to store expensive computations.
00:07:21.580 | That's fine, you can try that with some things,
00:07:24.840 | but that's not what we want.
00:07:26.380 | We want this st.experimental_singleton.
00:07:29.140 | So basically what that means, experimental_singleton,
00:07:32.620 | is whatever you're running should just be run once
00:07:35.240 | and it should not change, right?
00:07:37.740 | Streamlit is assuming that this will not change, right?
00:07:40.380 | So it's not going to check if it's changed
00:07:43.140 | and therefore it is not going to create that hash representation
00:07:47.820 | of whatever it is you're running.
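
The run-once behaviour described above can be sketched with a small hand-written decorator. This is a plain-Python illustration of the idea, not Streamlit's implementation: the `singleton` decorator, `FakeConnection` and the call counter are all hypothetical names.

```python
import functools
import threading


def singleton(func):
    """Run func once per distinct arguments and always return that same object.

    The stored object is never hashed or inspected, so it may be a
    connection, a model, or anything else that cannot be serialized.
    """
    instances = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in instances:
            instances[args] = func(*args)
        return instances[args]

    return wrapper


class FakeConnection:
    """Hypothetical stand-in for the Pinecone index connection."""

    def __init__(self):
        self._lock = threading.Lock()  # unpicklable internal state


init_calls = 0


@singleton
def init_pinecone():
    global init_calls
    init_calls += 1
    return FakeConnection()


first = init_pinecone()   # creates the connection
second = init_pinecone()  # simulated rerun: same object comes back
print(first is second, init_calls)
```

In the app itself the decorator would be `@st.experimental_singleton` placed above `init_retriever` and `init_pinecone`. Later Streamlit releases offer the same idea under the name `st.cache_resource`, so the exact decorator depends on the installed version.
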
00:07:50.620 | So we can write st.experimental_singleton.
00:08:00.180 | Put it here as well, oops, copy it.
00:08:04.180 | Put it here.
00:08:09.380 | And okay, we've just saved it.
00:08:11.540 | Let's have a look, see what happens.
00:08:14.480 | Okay, again, it's going to take a little while
00:08:16.440 | to rerun everything, hopefully not too long.
00:08:21.100 | Okay, there we go.
00:08:22.180 | So now we have our search.
00:08:23.980 | Let's say, who are the Normans?
00:08:27.800 | Okay, I'm not going to skip ahead straight away.
00:08:31.500 | Okay, so there's no waiting anymore,
00:08:33.320 | which is really good because before it just took so long.
00:08:38.800 | So yeah, that's how we've sort of improved the performance
00:08:42.820 | of our Streamlit app using caching
00:08:46.200 | and these new experimental caching primitives
00:08:48.520 | that Streamlit have developed.
00:08:51.320 | So that's incredibly useful.
00:08:54.560 | And what I want to look at in the next video
00:08:57.000 | is, okay, if we go over to our app.
00:08:59.960 | Yes, the performance is there now,
00:09:01.360 | but it doesn't look so good.
00:09:03.860 | So maybe we can have a look at actually improving
00:09:08.600 | this look here.
00:09:11.160 | And to do that, we're actually going to not use,
00:09:14.360 | well, we are going to use Streamlit,
00:09:15.960 | but we're going to pull in what are called
00:09:18.720 | Bootstrap card components,
00:09:21.120 | which come from an HTML and CSS library.
00:09:25.700 | And we'll use the style from them to display our information.
00:09:30.700 | It will look a lot nicer than it does now.
00:09:34.400 | So that's it for this video.
00:09:37.260 | I hope it's been useful.
00:09:38.860 | Thank you very much for watching
00:09:40.160 | and I'll see you in the next one.