Today, we're going to start working on an NLP app for open-domain question answering, built using Streamlit. Now, this will be a series of videos and articles. And in this video, we're just going to introduce the overall plan for the app. We're going to have a look at how we install Streamlit and everything else we need.
And then we're going to put together a few simple components that will basically act as the core of our app. So let's first have a look at what we're going to build. So what I essentially want to do is have like a webpage where we have a search bar, nothing particularly complicated here.
And that's supposed to be a magnifying glass. And what we're going to do is type like a question in here. So I don't know, anything, question mark, and then we press enter and we want to return a load of results. Now, these results, we want them to be intelligent.
So we're going to ask a question. We're going to be using the Wikipedia dataset. So we're going to have paragraphs or chunks of texts from Wikipedia pages indexed somewhere, which we'll explain in a minute. I want to return the most relevant of those. So if I ask something like, who are the Normans?
It's probably going to come back with the Wikipedia page about the Normans, saying the Normans were people in the north of France at a certain time, in the 10th and 11th centuries, and so on. So it's going to be like a really big chunk of text. And for now, we're going to do that.
We're going to return a big chunk of text. But then in the end, what I actually want to do is use what's called a reader model to pick out a smaller part of that text. It's going to say, okay, who are the Normans?
It's going to say they were people in the 10th and 11th centuries in northern France. So it's going to return a very specific segment from a larger paragraph. And we're going to make it look nice. So when we display just this little answer, we're going to be able to press a little down button here, and it will then expand to show us the full answer, the full paragraph.
So that is in essence what we are going to do. Now to do that, we need a few components. There's Streamlit, which acts as the front-end user interface for our app, which is what we're going to be able to see. And then, for the back end, we're going to build an open-domain Q&A app.
Now I'm going to really breeze over this, so it's super quick. We're going to have a vector database to store the paragraphs that I mentioned. For that, we're going to be using Pinecone. We're going to have a retriever model. We're going to use a BERT retriever model that's been trained on question answering.
And when we want to include this little short snippet, we're going to also have a reader model. Okay. And that's sort of the back end. And then we have Streamlit on the front end. So it should be pretty cool. Now let's actually go ahead and install Streamlit. Okay. So to create your environment, to do that, I'm using Anaconda here, the Anaconda distribution.
Just makes things super easy. And all you're really going to need to do, I'm going to breeze through this very quickly, is run conda create --name, then your environment name, I'm going to use streamlit, and your Python version, which is 3.8 that I'm using, followed by anaconda. So the full command is conda create --name streamlit python=3.8 anaconda.
So you enter that into your terminal window. So if I copy this. Okay. I would use Python version 3.8, and for the environment name, I will use streamlit. Now I've done this already, so I'm not going to do it again. And from there, we just run conda activate with that environment name.
I will show you this. So conda activate streamlit. Okay. And you see over here, we're now in this streamlit environment. And from there, we are going to install a few things. So we're going to pip install streamlit for one, and pinecone-client. So this is going to be how we set up or create our vector database, which is going to store our paragraphs, our Wikipedia paragraphs.
And we also want to use sentence-transformers. Okay. So you have to go ahead and pip install all of those. Again, I've already done it. I'm not going to redo it. And then we can check that Streamlit is installed with streamlit hello. Okay. And this should, if I write it correctly, pop up with a default or template Streamlit app in our browser window.
So it'll take a moment to load. Okay. And here we have it. Okay. So we can see we're in our browser. It just opened automatically at localhost:8501. So that's the default port for Streamlit apps. And there we go. So Streamlit is installed. Now I'm going to press Ctrl+C to close that.
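As a quick complement to streamlit hello, here is a minimal sketch of how you might confirm from Python that all three packages are visible in the active environment. The import names used here (streamlit, pinecone, sentence_transformers) are my assumption about how the pip packages are exposed, and check_installed is a hypothetical helper, not part of any of these libraries.

```python
import importlib.util

# pip package name -> module name used at import time (assumed mapping).
REQUIRED = {
    "streamlit": "streamlit",
    "pinecone-client": "pinecone",
    "sentence-transformers": "sentence_transformers",
}


def check_installed(packages):
    """Return {pip_name: True/False} without actually importing the modules."""
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in packages.items()
    }


status = check_installed(REQUIRED)
for pip_name, found in status.items():
    print(f"{pip_name}: {'installed' if found else 'missing'}")
```

Anything reported as missing would need another pip install inside the activated environment.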
And what I'm going to do is switch over to VS Code and start building an app. So in VS Code now, I have this streamlit-nlp directory, which is empty at the moment. So first thing I'm going to do is create a new file, app.py. Okay. Now app.py is going to be, well, it's going to be our app.
It's going to be where we do everything. Now to actually use Streamlit, we want to write import streamlit as st. Now we want to build our search box. So what we'll first do is create a little description. We can generate HTML code through Streamlit by writing markdown.
You can also use HTML, which I will show you at some point, but for now let's create a little header. We're going to call it AI Q&A. And I'm going to say, ask me a question. Okay. And Streamlit will go ahead and actually convert that into HTML code. Now let's go ahead and initialize this app.
So what we first need to do is navigate to that directory. So for me, it's going to be in documents, projects, and I call it streamlit-nlp. Okay. And then we just want to write streamlit run app.py. And this will initialize our app in the same way that streamlit hello did earlier.
So you can see we have this localhost:8501. So if we go ahead and open that in our browser, we'll see our webpage. Okay. Now it's super simple at the moment. We just have this AI Q&A, our header, and this little description, ask me a question. Okay. Now what I'm going to do is place this on one side and I want VS Code on the other side.
Okay. Now what we can do in here, let's change something first, make it easy for ourselves. So I want to also add in the search bar. Now to find the search bar component, what I'm going to do is type in Streamlit components. And I want to go to the docs.
So here, and actually maybe not Streamlit components. If I come down to API reference and we can go down and we can look for what we need. So we want like a search box, a text search box. So we have text input here. Okay. So we're going to copy that.
I'm not sure if you can see that very well or not. I'll zoom in a little bit more. Yeah. So I'm going to write query = st.text_input. And I'm just going to put something like Search as the label. Okay. And in here, so you can't see on the left here, but there's also a value argument, which sets the default.
Now I'm going to enter that in there. So the default value in there is just going to be an empty string. Now this will be useful later on, but for now we don't actually need that. And let's open the Streamlit app. I'm going to save this.
On the right, it says source file changed, with an option to rerun. Now you can rerun it once, which will refresh the page and rerun your app. I want to always rerun. So what this is going to do is automatically reload my app every time I save the file. So that means that we can basically prototype and build something super fast.
Now the first thing I want to show you here is what happens when we type something in query. So I'm going to use st.write again. So st.write, and I'm just going to use an f-string here. I'm going to put query is equal to whatever is in query here.
Okay. I'm going to save that. It will refresh. So at the moment we can see query is just equal to an empty string. Now what if we ask a question? Now this isn't going to work yet because we haven't set up the backend logic, but let's just put hello world.
Okay. Now as soon as I press enter there, the whole Streamlit app re-executed. Okay. It ran the full code from top to bottom. Now when you have parts of your app that take a little while to load, that can be a bit annoying. Although there are ways around that which we will explore in the future.
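To picture that rerun behaviour, here is a toy model in plain Python. It is only an illustration of the idea that every interaction runs the script from the first line to the last; it is not how Streamlit is implemented, and run_script and the log list are hypothetical names.

```python
def run_script(widget_values, log):
    """Stand-in for app.py: every call runs from the first line to the last."""
    log.append("script started")
    # The widget hands back whatever the user last entered, or the default.
    query = widget_values.get("Search", "")
    log.append("script finished")
    return f"query = '{query}'"


log = []
first_page = run_script({}, log)                           # initial page load
second_page = run_script({"Search": "hello world"}, log)   # user pressed enter

print(first_page)
print(second_page)
print(f"full runs: {log.count('script started')}")
```

Any slow step placed inside run_script would be paid on both calls, which is why long-loading parts become noticeable.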
But for now, that's pretty much all we want to do for a brief introduction. In the next video, what we're going to do is actually have a look at integrating the smart part of the app. So what you saw at the start with the vector database and the retriever models, we're going to integrate those into this.
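As a rough picture of what that retrieval step does, here is a toy sketch in plain Python. Word counts stand in for the trained retriever model and a Python list stands in for the vector database; the paragraphs are made-up placeholders, and embed, cosine, and retrieve are hypothetical helpers rather than the Pinecone or sentence-transformers APIs.

```python
import math
from collections import Counter

# Made-up stand-ins for indexed Wikipedia paragraphs (illustrative only).
PARAGRAPHS = [
    "The Normans were people who settled in the north of France in the 10th and 11th centuries.",
    "Streamlit is a Python library for building web apps.",
    "A vector database stores embeddings and returns the nearest ones to a query.",
]


def embed(text):
    """Toy embedding: lowercase word counts (a real retriever uses a trained model)."""
    words = [w.strip(".,?!").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, paragraphs, top_k=1):
    """Return the top_k paragraphs most similar to the query."""
    q = embed(query)
    ranked = sorted(paragraphs, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:top_k]


results = retrieve("Who are the Normans?", PARAGRAPHS)
print(results[0])
```

In the real app, the text returned here is what would be shown under the search bar, with the reader model later narrowing it to a short answer.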
So that's it for now. Thank you very much for watching and I'll see you in the next one. Bye.