Streamlit for ML #3 - Make Apps Fast with Caching
Chapters
0:00 Intro
2:35 Streamlit Caching
6:56 Experimental Caching Primitives
00:00:00.000 |
In this video, we are going to continue with the app that we've been building. 00:00:15.940 |
And we have this search bar or text input that we've built with Streamlit over here. 00:00:23.300 |
And then we put together the back end of our app. 00:00:30.560 |
So that is initializing a connection to our vector database. 00:00:35.300 |
We also created all of our context vectors and put those in there. 00:00:48.200 |
And then Pinecone over here takes care of finding 00:00:52.800 |
the most relevant context vectors based on that query vector. 00:00:59.100 |
And then we had a look at how we can iterate through all the contexts 00:01:04.040 |
that we returned from Pinecone and then display them. 00:01:09.860 |
And at the same time, another really bad thing 00:01:14.560 |
that we need to solve in this video is that the app takes forever to do anything, right? 00:01:20.100 |
If I just, maybe even if I just remove that and press Enter, 00:01:28.140 |
this is going to take, I don't know, like a minute. 00:01:31.060 |
I'm going to cut forwards so you don't have to wait as long as I do. 00:01:35.360 |
Okay. So it's just finished. That took way too long. 00:01:39.560 |
So what we want to do, or the reason for that 00:01:43.060 |
is mainly the retriever model download over here. 00:01:46.960 |
So every time we rerun or change anything in our app, 00:01:51.700 |
the way Streamlit works is it re-executes everything in your script. 00:01:56.160 |
And that's really good because it makes developing an app super simple. 00:02:00.800 |
But when you have something like that, you're downloading an ML model, 00:02:05.360 |
you don't want to redo that every time the tiniest little thing changes in your app. 00:02:10.500 |
You only want to do it once, like when the user opens your app the first time, 00:02:15.060 |
then you download it, and then you don't download it again. 00:02:19.840 |
And we also want the same to happen with our Pinecone connection. 00:02:24.860 |
We just want to initialize that connection once 00:02:27.540 |
and not every time something changes in the app. 00:02:31.060 |
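The rerun-and-cache behaviour described above can be sketched in plain Python. This is an illustration of the mechanism, not Streamlit itself; the load_model name and the half-second delay are made up for the sketch:

```python
import time

_cache = {}  # survives across "reruns" in this sketch

def load_model():
    # Stand-in for an expensive step like downloading a retriever model.
    time.sleep(0.5)  # pretend this takes a long time
    return "retriever-model"

def rerun_script():
    # Streamlit re-executes the whole script on every interaction;
    # with a cache, the expensive step only runs the first time.
    if "model" not in _cache:
        _cache["model"] = load_model()
    return _cache["model"]

start = time.time()
for _ in range(3):  # simulate three app interactions
    model = rerun_script()
elapsed = time.time() - start
print(f"three cached reruns took ~{elapsed:.1f}s")  # only one real load happened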
So we're going to do that. We're going to figure that out. 00:02:38.440 |
And we can scroll up to the top or go to its menu. 00:02:45.540 |
Now, I know it says Advanced, but it's not hard to do this. 00:02:58.700 |
It's a caching mechanism that allows your app to stay performant even when loading 00:03:03.900 |
large datasets or performing expensive computations. 00:03:07.060 |
Now, that sounds pretty much like what we want. 00:03:25.760 |
Every time anything changes, the whole thing is reloaded. 00:03:29.640 |
So this expensive computation is rerun every time. 00:03:34.800 |
What we want to do is, okay, you can just add this. 00:03:38.540 |
And that means that the output from that expensive computation is saved and reused on later reruns. 00:03:52.200 |
I'm not saying it's going to work, but let's try and do that. 00:03:59.340 |
Okay, and that is just going to return the Retriever model. 00:04:09.540 |
And we do the same for our Pinecone stuff here as well. 00:04:17.440 |
Okay, and then obviously, we need to actually call those. 00:04:30.140 |
So we set the model equal to init_retriever. 00:04:35.460 |
And we set the index equal to init_pinecone. 00:04:51.400 |
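As a plain-Python sketch of that structure (functools.lru_cache stands in for Streamlit's @st.cache decorator here, and the function bodies are placeholders for the real model download and Pinecone setup):

```python
import functools

call_counts = {"retriever": 0, "pinecone": 0}

@functools.lru_cache(maxsize=None)  # stand-in for Streamlit's @st.cache
def init_retriever():
    call_counts["retriever"] += 1
    return "retriever-model"  # placeholder for the downloaded model

@functools.lru_cache(maxsize=None)
def init_pinecone():
    call_counts["pinecone"] += 1
    return "pinecone-index"  # placeholder for the index connection

# Streamlit re-executes the script on every interaction; simulate three reruns.
for _ in range(3):
    model = init_retriever()
    index = init_pinecone()

print(call_counts)  # each initializer really ran only once
```

Both functions are still called at the top of the script on every rerun; the cache just makes every call after the first a cheap lookup.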
Actually, stop that because here we're returning nothing. 00:05:07.440 |
Okay, now the first time we do this, it's going to take a while. 00:05:12.160 |
And first, okay, we want to make sure it's actually working. 00:05:24.900 |
Okay, it all seems to be working fine. 00:05:42.540 |
We get this nice little spinner, Running init_retriever. 00:05:47.680 |
So later on, we'll have a look at making that a little nicer. 00:05:55.680 |
And this is quite useful because we can see, okay, 00:05:58.000 |
which parts of our app are slow to load. 00:06:17.340 |
what it is doing is, well, it basically checks 00:06:21.680 |
if whatever's been cached has changed, okay, with every rerun. 00:06:26.080 |
So it's putting the function, or some input values, through a hash function 00:06:31.240 |
and having a look at whether the hash code that comes out of it has changed. 00:06:40.980 |
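That hash-checking idea can be sketched with a small hand-rolled decorator. This is plain Python, not Streamlit's actual implementation, and expensive_square is just a made-up example function:

```python
import hashlib
import pickle

def hash_cache(func):
    # Memoize keyed on a hash of the inputs — a rough sketch of st.cache's idea.
    store = {}
    def wrapper(*args, **kwargs):
        # Hash a stable byte representation of the inputs.
        key = hashlib.sha256(
            pickle.dumps((args, sorted(kwargs.items())))
        ).hexdigest()
        if key not in store:                  # new hash: recompute
            store[key] = func(*args, **kwargs)
        return store[key]                     # same hash: reuse cached output
    return wrapper

calls = {"n": 0}

@hash_cache
def expensive_square(x):
    calls["n"] += 1
    return x * x

print(expensive_square(4), expensive_square(4), expensive_square(5))  # 16 16 25
print(calls["n"])  # 2 — the repeated input was served from the cache
```

The catch, as discussed next, is that this only works for inputs and outputs that can actually be hashed.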
We can't actually hash the Pinecone connection, 00:06:52.680 |
and we shouldn't really do that for our retriever model either. 00:06:56.240 |
So what we can do is use something which is kind of new, 00:07:02.680 |
the experimental singleton. So st.cache is always going to check the hash code, 00:07:18.520 |
and we use that to store expensive computations. 00:07:21.580 |
That's fine, you can try that with some things, 00:07:29.140 |
So basically what experimental singleton means 00:07:32.620 |
is whatever you're running should just be run once. 00:07:37.740 |
Streamlit is assuming that this will not change, right? 00:07:43.140 |
And therefore it is not going to create that hash representation. 00:08:14.480 |
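In plain-Python terms, a singleton decorator of that kind can be sketched like this (again an illustration, not Streamlit's internals; init_connection is a made-up stand-in):

```python
def singleton(func):
    # Run func once and always hand back the same object — no hashing at all.
    sentinel = object()
    result = sentinel
    def wrapper():
        nonlocal result
        if result is sentinel:  # first call: build the object
            result = func()
        return result           # later calls: identical object, nothing re-checked
    return wrapper

@singleton
def init_connection():
    # Stand-in for an unhashable resource like a database or Pinecone connection.
    return {"connected": True}

a = init_connection()
b = init_connection()
print(a is b)  # True — exactly one object is ever created
```

Because nothing is hashed, this works fine for objects like open connections that st.cache would choke on.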
Okay, again, it's going to take a little while 00:08:27.800 |
Okay, I don't even need to skip ahead this time, it finished straight away, 00:08:33.320 |
which is really good because before it just took so long. 00:08:38.800 |
So yeah, that's how we've sort of improved the performance 00:08:46.200 |
of our app with st.cache and these new experimental caching primitives. 00:09:03.860 |
So maybe we can have a look at actually improving the way we display things. 00:09:11.160 |
And to do that, we're actually going to use components from 00:09:21.120 |
another sort of HTML or CSS library, 00:09:25.700 |
and use the style from them to display our information.