
Streamlit for ML #3 - Make Apps Fast with Caching


Chapters

0:00 Intro
2:35 Streamlit Caching
6:56 Experimental Caching Primitives

Transcript

In this video, we are going to continue with the app that we've been building. So far, as a quick summary, we have what you can see over here. We have this AI Q&A. It's just the very basics of Streamlit. And we have this search bar or text input that we've built with Streamlit over here.

Okay. And then we put together the back end of our app. So that is initializing a connection to our vector database. We also created all of our context vectors and put those in there. And also initializing our retriever model, which takes care of encoding the query here. And then Pinecone over here takes care of finding the most relevant context vectors based on that query vector.

And then we had a look at how we can iterate through all the contexts that we returned from Pinecone and then display them. Now, at the moment, it's very ugly. And at the same time, another really bad thing that we need to solve in this video is this takes forever to do anything, right?

If I just, maybe even if I just remove that and press Enter, I'm not even searching for anything. And this is going to take, I don't know, like a minute. I'm going to cut forwards so you don't have to wait as long as I do. Okay. So it's just finished.

That took way too long. The reason for that is mainly the retriever model download over here. So every time we rerun or change anything in our app, the way Streamlit works is it re-executes everything in your script. And that's really good because it makes developing an app super simple.

But when you have something like that, where you're downloading an ML model, you don't want to redo that every time the tiniest little thing changes in your app. You only want to do it once, like when the user opens your app the first time, then you download it, and then you don't download it again.
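To make the rerun behaviour concrete, here is a minimal plain-Python sketch. It does not use Streamlit at all: `load_model` and `run_script` are hypothetical stand-ins for the model download and for one top-to-bottom execution of the app script.

```python
import time

counter = {"loads": 0}  # how many times the "model" has been loaded


def load_model():
    """Hypothetical stand-in for an expensive model download."""
    counter["loads"] += 1
    time.sleep(0.01)  # pretend this is slow
    return {"name": "retriever"}


def run_script(query):
    """One top-to-bottom execution of the app script."""
    model = load_model()  # runs again on every rerun
    return f"{model['name']} searched for: {query!r}"


# Three interactions mean three full reruns, so three model loads.
for query in ["", "who are the Normans?", "who are the Normans"]:
    run_script(query)

print(counter["loads"])  # 3
```

Every interaction pays the full loading cost again, which is the slowness seen in the app.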

That's what you want to happen. And we also want the same to happen with our Pinecone connection. We just want to initialize that connection once and not every time something changes in the app. So we're going to do that. We're going to figure that out. So we can go over to the Streamlit docs, and we can scroll up to the top or go to its menu.

And we go to Advanced Features. Now, I know it says Advanced, but it's not hard to do this. So we can optimize performance with st.cache. Let's have a look at that. So we can scroll down. It's a caching mechanism that allows your app to stay performant when loading data from the web, manipulating large datasets, or performing expensive computations.

Now, that sounds pretty much like what we want. So let's go down, and we see Basic Usage. So we have... This is a good example. So we have this function here. It takes a long time to run every time. And therefore, it makes the app very slow. Every time anything changes, the whole thing is reloaded.

So this expensive computation is rerun every time. We don't want that to happen. What we want to do is, okay, you can just add this st.cache decorator. And that means that the output from that expensive computation is just stored. It's not recomputed every single time. Okay, so let's try. I'm not saying it's going to work, but let's try and do that.
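The idea of storing the output can be sketched with a small hand-written decorator. This is only a rough stand-in for what a caching decorator does, not Streamlit's implementation; the names `cache`, `expensive_computation` and `run_script` are made up for illustration.

```python
import functools


def cache(func):
    """Minimal stand-in for a caching decorator: store outputs keyed by arguments."""
    stored = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in stored:
            stored[args] = func(*args)  # only computed the first time
        return stored[args]

    return wrapper


calls = []  # records every real execution of the function body


@cache
def expensive_computation(a, b):
    calls.append((a, b))
    return a * b


def run_script():
    """One rerun of the app script."""
    return expensive_computation(2, 21)


results = [run_script() for _ in range(3)]
print(results, len(calls))  # [42, 42, 42] 1
```

The script still reruns three times, but the function body only executes once.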

So we're going to write def init_retriever. Okay, and that is just going to return the retriever model. So return that. And we do the same for our Pinecone stuff here as well. So def init_pinecone. Okay, and then obviously, we need to actually call those. So let's do that here.

Okay, so we're going to call those. We want model to be equal to init_retriever(). And we want index to be equal to init_pinecone(). Okay, let's save that. Let's have a look at our app. Okay, so it's running again. Let's wait a moment. Actually, stop that because here we're returning nothing.

So we actually want to return the index. That would be useful. And now we have to press this, rerun. Okay, now the first time we do this, it's going to take a while. And first, okay, we want to make sure it's actually, is it working like it was before?
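As a rough sketch of the refactor at this point: the two setup steps move into functions, and each function has to return what it builds. `FakeRetriever` and `FakeIndex` are hypothetical stand-ins for the real retriever model and Pinecone index, which are not reproduced here.

```python
class FakeRetriever:
    """Hypothetical stand-in for the retriever model."""

    def encode(self, text):
        return [float(len(text))]


class FakeIndex:
    """Hypothetical stand-in for the Pinecone index connection."""

    def query(self, vector, top_k=3):
        return {"matches": [], "top_k": top_k}


def init_retriever():
    # In the real app this is where the model download happens.
    return FakeRetriever()


def init_pinecone_broken():
    index = FakeIndex()
    # Missing return: the caller gets None instead of the index.


def init_pinecone():
    index = FakeIndex()
    return index  # the fix: hand the index back to the caller


model = init_retriever()
index = init_pinecone()

xq = model.encode("who are the Normans?")
result = index.query(xq, top_k=3)
```

Without the `return`, `index` would be `None` and the query call would fail.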

Let's see. Are we getting any errors? Okay, no, it all seems to be working fine. Okay, so let's add that st.cache that we saw in the documentation. Let's add that to both of these. Okay, save, rerun. We get this nice little spinner, running init_retriever. It's not very descriptive for our users.

So later on, we'll have a look at making that a little more interesting or descriptive. But for now, we'll stick with that. And this is quite useful because we can see, okay, what are the slow parts of our app to load? Okay, so we get this error. Okay, why is that?

So when we are caching with Streamlit, what it is doing is, well, it basically checks if whatever's been cached changes, okay, with every rerun. So it's putting some values into your function, rerunning it, and having a look at the hash code that comes out of it.
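The check described here can be mimicked in plain Python. This is a sketch under a loud assumption: it hashes values by pickling them, which is not how Streamlit's own hasher works internally, and `FakeConnection` is a hypothetical object that holds a lock the way many API clients do.

```python
import hashlib
import pickle
import threading


def output_hash(value):
    """Hash a value by pickling it (a stand-in, not Streamlit's real hasher)."""
    return hashlib.sha256(pickle.dumps(value)).hexdigest()


class FakeConnection:
    """Hypothetical connection object holding a lock, like many clients do."""

    def __init__(self):
        self._lock = threading.Lock()


def can_hash(value):
    """Return True if the value can be turned into a hash code."""
    try:
        output_hash(value)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# Plain data hashes fine, and the hash changes when the data changes.
same = output_hash([1, 2, 3]) == output_hash([1, 2, 3])
changed = output_hash([1, 2, 3]) != output_hash([1, 2, 4])
print(same, changed)  # True True

# An object wrapping a lock cannot be hashed this way.
print(can_hash(FakeConnection()))  # False
```

Plain data is easy to compare between reruns, but some objects simply have no hash representation.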

Now, in this case, we're calling an API. We cannot actually hash the connection to our Pinecone index, okay? And we shouldn't really do that for our retriever model either. So what we can do is something which is kind of new from Streamlit, okay? So whereas st.cache is always going to check the hash code, see if anything is changing, there are these new experimental caches.

And one of those in particular is this: we have st.experimental_memo, which is fine. So we use that to store expensive computations. That's fine, you can try that with some things, but that's not what we want. We want this st.experimental_singleton. So basically what that means, experimental singleton, is whatever you're running should just be run once and it should not change, right?

Streamlit is assuming that this will not change, right? So it's not going to check if it's changed and therefore it is not going to create that hash representation of whatever it is you're running. So we can write st.experimental_singleton. Put it here as well, oops, copy it. Put it here.
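The run-once idea can be sketched without Streamlit as well. The `singleton` decorator below is a hand-written illustration of the behaviour described, not Streamlit's code, and `FakeConnection` is again a hypothetical stand-in for the Pinecone connection.

```python
import threading

_instances = {}  # one stored object per decorated function


def singleton(func):
    """Run func once and return the same object afterwards, with no hashing."""

    def wrapper():
        if func.__name__ not in _instances:
            _instances[func.__name__] = func()
        return _instances[func.__name__]

    return wrapper


class FakeConnection:
    """Hypothetical connection object that cannot be hashed by value."""

    def __init__(self):
        self._lock = threading.Lock()


counter = {"inits": 0}  # how many times the connection was really created


@singleton
def init_pinecone():
    counter["inits"] += 1
    return FakeConnection()


first = init_pinecone()   # first run: creates the connection
second = init_pinecone()  # later reruns: same object, nothing is hashed
print(first is second, counter["inits"])  # True 1
```

Because the stored object is never inspected or hashed, it does not matter that the connection has no hash representation.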

And okay, we've just saved it. Let's have a look, see what happens. Okay, again, it's going to take a little while to rerun everything, hopefully not too long. Okay, there we go. So now we have our search. Let's say, who are the Normans? Okay, I'm not going to skip ahead straight away.

Okay, so there's no waiting anymore, which is really good because before it just took so long. So yeah, that's how we've sort of improved the performance of our Streamlit app using caching and these new experimental caching primitives that Streamlit have developed. So that's incredibly useful. And what I want to look at in the next video is, okay, over into our app.

Yes, the performance is there now, but it doesn't look so good. So maybe we can have a look at actually improving this look here. And to do that, well, we are still going to use Streamlit, but we're going to pull in what are called Bootstrap card components, which are another sort of HTML or CSS library.

And we'll use the style from them to display our information. It would look a lot nicer than it does now. So that's it for this video. I hope it's been useful. Thank you very much for watching and I'll see you in the next one.