Streamlit for ML #1 - Installation and API

00:00:00.000 | Today, we're going to start working on a NLP app for open domain question answering, built

00:00:09.080 | using Streamlit. Now, this will be a series of videos and articles. And in this video,

00:00:18.840 | we're just going to introduce the sort of overall plan for the app. We're going to have

00:00:24.960 | a look at how we install Streamlit and everything else we need. And then we're going to put

00:00:29.960 | together a few simple components that will basically act as the core of our app. So let's

00:00:39.840 | first have a look at what we're going to build. So what I essentially want to do is have like

00:00:46.880 | a webpage where we have a search bar, nothing particularly complicated here. And that's

00:00:54.720 | supposed to be a magnifying glass. And what we're going to do is type like a question

00:01:01.040 | in here. So I don't know, anything, question mark, and then we press enter and we want

00:01:08.720 | to return a load of results. Now, these results, we want them to be intelligent. So we're going

00:01:16.760 | to ask a question. We're going to be using the Wikipedia dataset. So we're going to have

00:01:22.880 | paragraphs or chunks of texts from Wikipedia pages indexed somewhere, which we'll explain

00:01:31.000 | in a minute. I want to return the most relevant of those. So if I ask something like, who

00:01:36.760 | are the Normans? It's probably going to come back with Wikipedia page about Normans saying

00:01:41.440 | Normans were people in north of France in this certain time in the 10th, 11th century

00:01:47.520 | and so on. So it's going to be like a really big chunk of text. And for now, we're going

00:01:53.360 | to do that. We're going to create like a, we're going to return a big chunk of text.

00:01:58.720 | But then in the end, what I actually want to do is kind of use a, what's called a reader

00:02:04.620 | model to specify a smaller part of that text. It's going to say, okay, who are the Normans?

00:02:09.480 | It's going to say there were people in the 10th, 11th century in northern France. And

00:02:15.680 | that's why it's going to return like a very specific segment from a larger paragraph.

00:02:20.200 | And we're going to make it look nice. So we're going to be able to press like a little down

00:02:23.640 | button here when we display just this little answer, and it would then expand to show us

00:02:29.880 | a full answer, the full paragraph. So that is in essence what we are going to do. Now

00:02:38.320 | to do that, we need a few components as Streamlit, which acts as the front end user interface

00:02:45.080 | for our app, which is what we're going to be able to see. And then the back end, we're

00:02:49.080 | going to build an open domain Q&A app. Now I'm going to really breeze over this, so it's

00:02:54.280 | super quick. We're going to have a vector database to store the paragraphs that I mentioned.

00:03:03.000 | For that, we're going to be using Pinecone. We're going to have a retriever model. We're

00:03:10.880 | going to use a BERT retriever model that's been trained on question answering. And when

00:03:16.640 | we want to include this little short snippet, we're going to also have a reader model. Okay.

00:03:25.760 | And that's sort of the back end. And then we have Streamlit on the front end. So it

00:03:30.560 | should be pretty cool. Now let's actually go ahead and install Streamlit. Okay. So to

00:03:38.120 | create your environment, to do that, I'm using Anaconda here, the Anaconda distribution.

00:03:44.680 | Just makes things super easy. And all you're really going to need to do, I'm going to breeze

00:03:51.240 | through this very quickly, is you want to do condo create new, your environment name,

00:03:58.040 | I'm going to use Streamlit, and your Python version, which is 3.8 that I'm using, followed

00:04:06.280 | by Anaconda. So you enter that into your terminal window. So if I can copy this. Okay. I would

00:04:16.400 | do this Python version 3.8. And for the environment name, I will use Streamlit. Now I've done

00:04:28.440 | this already, so I'm not going to do it again. And from there, we just condo activate that

00:04:35.080 | environment name. I will show you this. So condo activate Streamlit. Okay. And you see

00:04:43.680 | over here, we're now in this Streamlit environment. And from there, we are going to install a

00:04:49.000 | few things. So we're going to pip install, Streamlit for one, Pinecone client. So this

00:04:56.360 | is going to be how we set up or create our vector database, which is going to install

00:05:02.640 | our paragraphs, our Wikipedia paragraphs. And we also want to use sentence transformers.

00:05:10.480 | Okay. So you have to go ahead and pip install all of those. Again, I've already done it.

00:05:16.200 | I'm not going to redo it. And then we can check that Streamlit is installed with Streamlit

00:05:25.160 | hello. Okay. And this should, if I write it correctly, this should pop up with a default

00:05:39.760 | or template app in Streamlit in our browser window. So it'll take a moment to load. Okay.

00:05:48.080 | And here we have it. Okay. So we can see we're in our browser. It just opened automatically

00:05:54.240 | localhost 8501. So that's the default port for Streamlit apps. And there we go. So Streamlit

00:06:01.640 | is installed. Now I'm going to control C to close that. And what I'm going to do is switch

00:06:09.800 | over to VS code and start building a, an app. So in VS code now, I have this Streamlit-nlp

00:06:19.200 | directories empty at the moment. So first thing I'm going to do is create a new file

00:06:26.080 | app.py. Okay. Now app.py is going to be, well, it's going to be our app. It's going to be

00:06:33.640 | where we, where we do everything. Now to actually use Streamlit, we want to import Streamlit

00:06:42.000 | as st. Now we want to build our, like our search box. So what we'll first do is create

00:06:53.560 | like a little description so we can write or we can generate HTML code through Streamlit

00:07:02.400 | by typing markdown. You can also use HTML, which I will show you at some point, but for

00:07:10.240 | now let's create a little header. We're going to call it AI Q&A. And I'm going to say, ask

00:07:22.360 | me a question. Okay. And Streamlit will go ahead and actually convert that into HTML

00:07:32.280 | code. Now let's go ahead and initialize this app. So what we first need to do is navigate

00:07:43.600 | to that directory. So for me, it's going to be in documents, projects, and I call it Streamlit

00:07:52.760 | NLP. Okay. And then we just want to write Streamlit run app.py. And this will initialize

00:08:07.840 | our app in the same way that we did Streamlit hello earlier. So you can see we have this

00:08:12.120 | localhost 8501. So if we go ahead and open that in our browser, we'll see our webpage.

00:08:20.920 | Okay. Now it's super simple at the moment. We just have this AI Q&A, our header, and

00:08:28.400 | this little description, ask me a question. Okay. Now what I'm going to do is place this

00:08:36.720 | on one side and I want VS code on the other side. Okay. Now what we can do in here, let's

00:08:48.160 | change something first, make it easy for ourselves. So I want to also add in the search bar. Now

00:08:56.480 | to find the search bar component, what I'm going to do is type in Streamlit components.

00:09:07.320 | And I want to go to the docs. So here, and actually maybe not Streamlit components. If

00:09:14.240 | I come down to API reference and we can go down and we can look for what we need. So

00:09:21.520 | we want like a search box, a text search box. So we have text input here. Okay. So we're

00:09:27.760 | going to copy that. I'm not sure if you can see that very well or not. I zoom in a little

00:09:33.280 | bit more. Yeah. So I'm going to write query equals ST text input. And I'm just going to

00:09:48.920 | put something like search. Okay. And in here, so you can't see on the left here, but there's

00:09:58.460 | also a default argument. Now I'm going to enter that in there. So the default value

00:10:05.040 | in there is just going to be an empty string. Now this will be useful later on, but for

00:10:09.280 | now we don't actually need that. And if I, let's open the Streamlit app. I'm going to

00:10:16.960 | save this on the right. It says source file change, rerun. Now you can rerun it once.

00:10:24.040 | And what we'll do is like refresh the page, rerun your app. I want to always rerun. So

00:10:30.120 | what this is going to do is automatically reload my app every time I save the file on

00:10:35.920 | the right. So that means that we can basically prototype and build something super fast.

00:10:43.940 | Now the first thing I want to show you here is what happens when we type something in

00:10:49.740 | query. So I'm going to use ST write again. So ST write, and I'm just going to write and

00:10:59.440 | use an F string here. I'm going to put query is equal to whatever is in query here. Okay.

00:11:09.640 | I'm going to save that. It will refresh. So at the moment we can see a query is just equal

00:11:15.000 | to an empty string. Now what if we ask a question? Now this isn't going to work yet because we

00:11:20.440 | haven't set up the whole, the backend logic yet, but we'll put, let's just put hello world.

00:11:28.480 | Okay. Now as soon as I press enter there, the whole Streamlit app re-executed and rerun.

00:11:35.240 | Okay. From top to bottom it ran the full code. Now when you have parts of your app that take

00:11:40.940 | a little while to load, that can be a bit annoying. Although there are ways around that

00:11:44.560 | which we will explore in the future. But for now, that's pretty much all we want to do

00:11:52.160 | for a brief introduction. In the next video, what we're going to do is actually have a

00:11:58.920 | look at integrating the smart part of the app. So what you saw at the start with the

00:12:03.020 | vector database and the retriever models, we're going to integrate those into this.

00:12:08.480 | So that's it for now. Thank you very much for watching and I'll see you in the next

00:12:13.080 | one. Bye.

Streamlit for ML #1 - Installation and API

Chapters