back to indexStreamlit for ML #1 - Installation and API
Chapters
0:0 Intro
0:39 App Outline
3:36 Streamlit Installation
6:15 Streamlit API Basics
00:00:00.000 |
Today, we're going to start working on a NLP app for open domain question answering, built 00:00:09.080 |
using Streamlit. Now, this will be a series of videos and articles. And in this video, 00:00:18.840 |
we're just going to introduce the sort of overall plan for the app. We're going to have 00:00:24.960 |
a look at how we install Streamlit and everything else we need. And then we're going to put 00:00:29.960 |
together a few simple components that will basically act as the core of our app. So let's 00:00:39.840 |
first have a look at what we're going to build. So what I essentially want to do is have like 00:00:46.880 |
a webpage where we have a search bar, nothing particularly complicated here. And that's 00:00:54.720 |
supposed to be a magnifying glass. And what we're going to do is type like a question 00:01:01.040 |
in here. So I don't know, anything, question mark, and then we press enter and we want 00:01:08.720 |
to return a load of results. Now, these results, we want them to be intelligent. So we're going 00:01:16.760 |
to ask a question. We're going to be using the Wikipedia dataset. So we're going to have 00:01:22.880 |
paragraphs or chunks of texts from Wikipedia pages indexed somewhere, which we'll explain 00:01:31.000 |
in a minute. I want to return the most relevant of those. So if I ask something like, who 00:01:36.760 |
are the Normans? It's probably going to come back with Wikipedia page about Normans saying 00:01:41.440 |
Normans were people in north of France in this certain time in the 10th, 11th century 00:01:47.520 |
and so on. So it's going to be like a really big chunk of text. And for now, we're going 00:01:53.360 |
to do that. We're going to create like a, we're going to return a big chunk of text. 00:01:58.720 |
But then in the end, what I actually want to do is kind of use a, what's called a reader 00:02:04.620 |
model to specify a smaller part of that text. It's going to say, okay, who are the Normans? 00:02:09.480 |
It's going to say there were people in the 10th, 11th century in northern France. And 00:02:15.680 |
that's why it's going to return like a very specific segment from a larger paragraph. 00:02:20.200 |
And we're going to make it look nice. So we're going to be able to press like a little down 00:02:23.640 |
button here when we display just this little answer, and it would then expand to show us 00:02:29.880 |
a full answer, the full paragraph. So that is in essence what we are going to do. Now 00:02:38.320 |
to do that, we need a few components as Streamlit, which acts as the front end user interface 00:02:45.080 |
for our app, which is what we're going to be able to see. And then the back end, we're 00:02:49.080 |
going to build an open domain Q&A app. Now I'm going to really breeze over this, so it's 00:02:54.280 |
super quick. We're going to have a vector database to store the paragraphs that I mentioned. 00:03:03.000 |
For that, we're going to be using Pinecone. We're going to have a retriever model. We're 00:03:10.880 |
going to use a BERT retriever model that's been trained on question answering. And when 00:03:16.640 |
we want to include this little short snippet, we're going to also have a reader model. Okay. 00:03:25.760 |
And that's sort of the back end. And then we have Streamlit on the front end. So it 00:03:30.560 |
should be pretty cool. Now let's actually go ahead and install Streamlit. Okay. So to 00:03:38.120 |
create your environment, to do that, I'm using Anaconda here, the Anaconda distribution. 00:03:44.680 |
Just makes things super easy. And all you're really going to need to do, I'm going to breeze 00:03:51.240 |
through this very quickly, is you want to do condo create new, your environment name, 00:03:58.040 |
I'm going to use Streamlit, and your Python version, which is 3.8 that I'm using, followed 00:04:06.280 |
by Anaconda. So you enter that into your terminal window. So if I can copy this. Okay. I would 00:04:16.400 |
do this Python version 3.8. And for the environment name, I will use Streamlit. Now I've done 00:04:28.440 |
this already, so I'm not going to do it again. And from there, we just condo activate that 00:04:35.080 |
environment name. I will show you this. So condo activate Streamlit. Okay. And you see 00:04:43.680 |
over here, we're now in this Streamlit environment. And from there, we are going to install a 00:04:49.000 |
few things. So we're going to pip install, Streamlit for one, Pinecone client. So this 00:04:56.360 |
is going to be how we set up or create our vector database, which is going to install 00:05:02.640 |
our paragraphs, our Wikipedia paragraphs. And we also want to use sentence transformers. 00:05:10.480 |
Okay. So you have to go ahead and pip install all of those. Again, I've already done it. 00:05:16.200 |
I'm not going to redo it. And then we can check that Streamlit is installed with Streamlit 00:05:25.160 |
hello. Okay. And this should, if I write it correctly, this should pop up with a default 00:05:39.760 |
or template app in Streamlit in our browser window. So it'll take a moment to load. Okay. 00:05:48.080 |
And here we have it. Okay. So we can see we're in our browser. It just opened automatically 00:05:54.240 |
localhost 8501. So that's the default port for Streamlit apps. And there we go. So Streamlit 00:06:01.640 |
is installed. Now I'm going to control C to close that. And what I'm going to do is switch 00:06:09.800 |
over to VS code and start building a, an app. So in VS code now, I have this Streamlit-nlp 00:06:19.200 |
directories empty at the moment. So first thing I'm going to do is create a new file 00:06:26.080 |
app.py. Okay. Now app.py is going to be, well, it's going to be our app. It's going to be 00:06:33.640 |
where we, where we do everything. Now to actually use Streamlit, we want to import Streamlit 00:06:42.000 |
as st. Now we want to build our, like our search box. So what we'll first do is create 00:06:53.560 |
like a little description so we can write or we can generate HTML code through Streamlit 00:07:02.400 |
by typing markdown. You can also use HTML, which I will show you at some point, but for 00:07:10.240 |
now let's create a little header. We're going to call it AI Q&A. And I'm going to say, ask 00:07:22.360 |
me a question. Okay. And Streamlit will go ahead and actually convert that into HTML 00:07:32.280 |
code. Now let's go ahead and initialize this app. So what we first need to do is navigate 00:07:43.600 |
to that directory. So for me, it's going to be in documents, projects, and I call it Streamlit 00:07:52.760 |
NLP. Okay. And then we just want to write Streamlit run app.py. And this will initialize 00:08:07.840 |
our app in the same way that we did Streamlit hello earlier. So you can see we have this 00:08:12.120 |
localhost 8501. So if we go ahead and open that in our browser, we'll see our webpage. 00:08:20.920 |
Okay. Now it's super simple at the moment. We just have this AI Q&A, our header, and 00:08:28.400 |
this little description, ask me a question. Okay. Now what I'm going to do is place this 00:08:36.720 |
on one side and I want VS code on the other side. Okay. Now what we can do in here, let's 00:08:48.160 |
change something first, make it easy for ourselves. So I want to also add in the search bar. Now 00:08:56.480 |
to find the search bar component, what I'm going to do is type in Streamlit components. 00:09:07.320 |
And I want to go to the docs. So here, and actually maybe not Streamlit components. If 00:09:14.240 |
I come down to API reference and we can go down and we can look for what we need. So 00:09:21.520 |
we want like a search box, a text search box. So we have text input here. Okay. So we're 00:09:27.760 |
going to copy that. I'm not sure if you can see that very well or not. I zoom in a little 00:09:33.280 |
bit more. Yeah. So I'm going to write query equals ST text input. And I'm just going to 00:09:48.920 |
put something like search. Okay. And in here, so you can't see on the left here, but there's 00:09:58.460 |
also a default argument. Now I'm going to enter that in there. So the default value 00:10:05.040 |
in there is just going to be an empty string. Now this will be useful later on, but for 00:10:09.280 |
now we don't actually need that. And if I, let's open the Streamlit app. I'm going to 00:10:16.960 |
save this on the right. It says source file change, rerun. Now you can rerun it once. 00:10:24.040 |
And what we'll do is like refresh the page, rerun your app. I want to always rerun. So 00:10:30.120 |
what this is going to do is automatically reload my app every time I save the file on 00:10:35.920 |
the right. So that means that we can basically prototype and build something super fast. 00:10:43.940 |
Now the first thing I want to show you here is what happens when we type something in 00:10:49.740 |
query. So I'm going to use ST write again. So ST write, and I'm just going to write and 00:10:59.440 |
use an F string here. I'm going to put query is equal to whatever is in query here. Okay. 00:11:09.640 |
I'm going to save that. It will refresh. So at the moment we can see a query is just equal 00:11:15.000 |
to an empty string. Now what if we ask a question? Now this isn't going to work yet because we 00:11:20.440 |
haven't set up the whole, the backend logic yet, but we'll put, let's just put hello world. 00:11:28.480 |
Okay. Now as soon as I press enter there, the whole Streamlit app re-executed and rerun. 00:11:35.240 |
Okay. From top to bottom it ran the full code. Now when you have parts of your app that take 00:11:40.940 |
a little while to load, that can be a bit annoying. Although there are ways around that 00:11:44.560 |
which we will explore in the future. But for now, that's pretty much all we want to do 00:11:52.160 |
for a brief introduction. In the next video, what we're going to do is actually have a 00:11:58.920 |
look at integrating the smart part of the app. So what you saw at the start with the 00:12:03.020 |
vector database and the retriever models, we're going to integrate those into this. 00:12:08.480 |
So that's it for now. Thank you very much for watching and I'll see you in the next