
Lex Fridman Podcast Chatbot with LangChain Agents + GPT 3.5


Chapters

0:00 Building conversational agents in LangChain
0:14 Tools and Agents in LangChain
3:57 Notebook setup and prerequisites
5:23 Data preparation
11:00 Initialize LangChain vector store
13:12 Initializing everything needed by agent
13:41 Using RetrievalQA chain in LangChain
15:59 Creating Lex Fridman DB tool
17:37 Initializing a LangChain conversational agent
21:49 Conversational memory prompt
27:41 Testing a conversation with the Lex agent

Transcript

Today, we're going to focus on how we can build tools that can be used by agents in the LangChain library. Now, in case none of that made any sense, let me just explain what I mean quickly. When I say agent, I'm referring to essentially a large language model that can decide on and use tools, which give it abilities beyond the plain auto-complete of a typical large language model.

And when I refer to tool, obviously, that is the tool that this agent will be able to use. So, if I just kind of try and visualize this quickly, you typically have your query. That would go into a large language model and then it would output some completion. It would just output some text.

An agent is different because let's say you have your query here, goes to your agent, which is just a large language model. We can even say large language model. But now, it's going to say, "Okay, I have some tools available to me. Will any of these tools help me answer this query?" So, it will basically ask itself that question, "Can I use a tool to answer this better than I would be able to otherwise?" If the answer is yes, it's going to decide on which tool it needs to use.

So, in our case, it might decide to use the Lex Fridman database tool. Once it has decided to do that, it also needs to create the input to that tool. So, alongside this, it's going to say, "Okay, I need to ask this query here." It's probably going to be similar to the original user query, but it is actually generated again.

In some scenarios, the agent's tool might be something like a Python interpreter; in that case, it would rewrite the query into Python code that can then be executed. So, essentially, the large language model is always going to rewrite something in order to put it into that tool.

Cool. So, it's going to decide on that tool. So, that tool is here, it's going to put in that input, and it's going to get some output. So, this is the response. And this is going to come out, and there may be different things that happen here. So, within that tool, maybe there's another large language model that is going to summarize or reformat the output, or maybe it's just like raw output that gets fed back here.

But basically, here, you have your answer. And the answer gets fed back to our large language model here, and based on the query and the answer, it's going to say, sometimes it will say, "Okay, I need to use another tool," or, "I need to think about this a little bit more." But at some point, it's going to get to what we call the final thought.

So, the final thought. And that is what we would give to the user. So, it's a slightly more complex approach, but, I mean, you're giving tools to large language models. So, what you can do with large language models is, all of a sudden, much grander than what you could do with just one large language model that's just doing completion.

Now, let's have a look at how we can implement all of this. So, we're going to be installing a few prerequisites here. We have the Hugging Face datasets library, because we already have a Lex Fridman transcripts dataset that we're going to use, although I will talk about how to actually create that in another video.

We have this PodGPT library, which is actually how we get, or create, that dataset, and it's also how we're going to index everything into Pinecone. We have Pinecone here, and we're using the gRPC client. Actually, I'm not sure we need that here, but generally speaking, we can use it to make things faster.

Then we have LangChain, OpenAI, and tqdm, which is just a progress bar.
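
The install cell looks roughly like this; note that pod-gpt as the pip package name for the PodGPT library is an assumption here:

```python
# -q quiets the output, -U upgrades to the latest versions
!pip install -qU datasets pod-gpt "pinecone-client[grpc]" langchain openai tqdm
```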

Okay, cool. Once we get to here, you will need a few API keys. You have your OpenAI API key, which is at platform.openai.com, and your Pinecone API key; for that, you go to app.pinecone.io. If you have an account, you will see this screen; otherwise, you will need to sign up for one. Then you go to API Keys and just copy the key. Also note the environment shown here, us-west1-gcp. I would paste in my API key here, and here I'd put us-west1-gcp. Right? Okay, cool.
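
In the notebook, these just end up as plain variables, something like this (the values are placeholders):

```python
OPENAI_API_KEY = "sk-..."              # from platform.openai.com
PINECONE_API_KEY = "..."               # from app.pinecone.io
PINECONE_ENVIRONMENT = "us-west1-gcp"  # the environment shown next to your key
```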

Now, let's move on to downloading that dataset. I'm not sure that's actually the correct dataset name; let's try. Okay, it isn't, so let me change it. I can see the correct name at the top of the dataset page, lex-transcripts, so I'm going to put that in there.

Try again. Let me just see what I was doing wrong; ah, I had typed the wrong name, it's lex-transcripts. Okay, so that is downloaded. There's not a ton of data in there at the moment; I'm actually processing more right now. So, hopefully by the time you see this video, it will be bigger and you'll have more data in there, which will give you more interesting results.
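
The download itself is just a load_dataset call. The dataset ID below is a placeholder; use the lex-transcripts name shown on screen:

```python
from datasets import load_dataset

# dataset ID is a placeholder for the lex-transcripts dataset used in the video
data = load_dataset("jamescalam/lex-transcripts", split="train")
data
```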

And then we need to reformat this data into the format required by the PodGPT indexer. Now, if you saw my recent video on ChatGPT plugins, you'll recognize that this is very similar to the format they use with ChatGPT plugins. I've done that on purpose, because I also want to show you at some point, not in this video but in another one, how we can create a ChatGPT plugin using pretty much the same approach.

So, we have our IDs, we have the text, and then we have the metadata, which holds the relevant information. I basically want the model to be able to refer to the titles of the podcast episodes and also link back to them, so that's why I have those two fields in the metadata there.
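
For reference, a single reformatted record looks something like this, with made-up values for illustration:

```python
# one record: an ID, the text to embed, and metadata carrying the episode
# title plus a URL linking back to the source video (values are illustrative)
{
    "id": "abc123-0",
    "text": "welcome to the lex fridman podcast ...",
    "metadata": {
        "title": "Some Episode Title | Lex Fridman Podcast",
        "url": "https://youtu.be/abc123"
    }
}
```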

Okay. So, we run that, and then we need to initialize our indexer object. For that, we need to actually import PodGPT, and the indexer obviously needs the OpenAI API key in there as well. Okay, that should work. Cool. And then what we're going to do is actually just add everything in there.

So, we're going to go through each row in our data, where each row represents a single podcast. So, if I look at data[0], there's probably going to be a lot of text in here, because an entire podcast is in here, right? We have all of this. And this is actually just a podcast clip, right?

If it's a full podcast, it's going to be even longer. Okay, this is Lex. Doing a song, apparently. And another short video. Let's try and find one that's long. Okay, this looks more like podcast length, right? You see there's a ton of text in here. Yeah, so this is an early podcast as well.

So, maybe it's not even that long compared to some of the more recent ones. But basically, we're not going to feed all of that into a single embedding, because we want our embeddings to be more specific. So, the automatic processing of the indexer is to split everything into chunks of, I think, 400 tokens.

So, we do for row in data. I'm just going to reformat a couple of fields, converting the publish date into a string, and removing this source field, because I had renamed it to url. Then I'm creating a video record object from that dictionary, essentially, and indexing it. That will handle everything for me, the chunking included. Okay, I wondered whether that metadata field actually needs to be called source; let me try. Okay, so it was actually supposed to be source. And I think this reformatting bit here I don't actually need to do; it's actually handled by PodGPT, so let's just remove it. Unnecessary.
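
Put together, the indexing loop looks roughly like this. The Indexer and VideoRecord names and their arguments are a reconstruction of the PodGPT interface described here, so check the library itself for the real signatures:

```python
import pod_gpt

# reconstruction of the indexer setup; argument names are assumptions
indexer = pod_gpt.Indexer(
    openai_api_key=OPENAI_API_KEY,
    pinecone_api_key=PINECONE_API_KEY,
    pinecone_environment=PINECONE_ENVIRONMENT,
    index_name="pod-gpt"
)

for row in data:
    # PodGPT handles date reformatting and splitting the transcript into
    # ~400-token chunks internally, so rows pass through more or less as-is
    indexer(pod_gpt.VideoRecord(**dict(row)))
```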

Okay, so now it's processing everything. If I'd like to see where that is, it should come up here; usually it does. But I can also check over in Pinecone, okay?

So, I'm going to go to app.pinecone.io. Okay, so I come to here. Great, so pod-gpt. I think I can check in the metrics, and okay, look, I can see activity from just now. Yeah, all right. So, let me try to make that time window a little bit longer. Okay, it doesn't let me make it longer.

So, okay, over the last minute it's basically been increasing. Right, if I click on requests, you can see the number of requests I'm making per minute, and it was increasing. Here it's at zero; I think maybe that's just because it hasn't counted them yet. Or actually, it's because it finished.

Okay, great. So, then you can see the row for the most recent record in there. It's just what you saw before, actually: video ID, channel ID, title, published, transcript. Cool. Now, what I want to do is set the index name to pod-gpt.

And I'm going to initialize my connection to Pinecone now, so through Pinecone directly, and I'm going to create this index object here. Okay, cool. And then I'm going to initialize the retrieval components within LangChain. The retrieval components, or what you need for retrieval, are the embeddings and the vector database.

Okay, Pinecone. So, I initialize both of those. The text key is important: it is basically saying which of your metadata fields contains the text that has been embedded. In this case, it is the text field. Earlier on, we had transcript, right? But the PodGPT library that we're using here reformats that into the earlier format I showed you, where it was ID, text, and metadata, or into something similar.
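
In code, that initialization looks something like this, assuming the index is named pod-gpt:

```python
import pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

# connect to Pinecone directly and grab the index that PodGPT populated
pinecone.init(api_key=PINECONE_API_KEY, environment=PINECONE_ENVIRONMENT)
index = pinecone.Index("pod-gpt")

# embedding model, plus the LangChain vector store wrapper around the index;
# text_key names the metadata field that holds the embedded text
embed = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
vectordb = Pinecone(index, embed.embed_query, text_key="text")
```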

So, if I do index.query, I'm just going to put in a random dummy vector for the moment. That needs to be 1536-dimensional, the dimensionality of the OpenAI embedding model we're using. I do top_k=1 and include_metadata=True. Okay, if I do that, we should see, yeah.
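
The dummy query looks like this:

```python
# an all-zero vector is meaningless, but enough to inspect the metadata
# format; 1536 matches the OpenAI embedding dimensionality
index.query(vector=[0.0] * 1536, top_k=1, include_metadata=True)
```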

So, this is the format that our metadata is in within Pinecone. All right, so we have the ID. That's like a unique ID for each one. The chunk number, when it was published. We have the source of this retrieved item, and we have the text in there as well, right?

So, yeah, oh, and also the title. Cool. So, that is why we're specifying text as the text key there. And then what we want to do is initialize the gpt-3.5-turbo chat model. So, I'm using the OpenAI API key there, and I'm setting the temperature, so the amount of randomness from the model, to zero.

And we're using the gpt-3.5-turbo model. You can also just leave that parameter out; gpt-3.5-turbo is the default model name for this class. Okay, cool.
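
So the chat model initialization is just:

```python
from langchain.chat_models import ChatOpenAI

# temperature=0 minimizes randomness; model_name could be omitted, since
# gpt-3.5-turbo is the default for this class anyway
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    temperature=0.0,
    model_name="gpt-3.5-turbo"
)
```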

Now, what we want to do is use this RetrievalQA object. Recently, LangChain refactored the VectorDB QA objects, and now we use RetrievalQA rather than VectorDBQA. It's basically pretty much the same thing, just with a slightly different approach. So, we specify the large language model that we'd like to use, and the chain type, which I'll talk about in a moment. And then the retriever, which is just our vectordb.

Then we call the as_retriever method on it to turn it into a retriever object. Right? So, yeah, we have this chain type here, with two options: either stuff or map_reduce. Okay, let's say we return 10 items. If we use the stuff chain type, those 10 documents are just returned as-is, and they're passed to the large language model.

If we use map_reduce, each of those 10 items is summarized, and the summaries are then passed into the large language model. We want as much information as possible coming from our vector database, so we use the stuff chain type. Okay, initialize that. Cool. Now, we get to the kind of interesting stuff, right?
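
That gives us a chain along these lines; I'm naming it retriever, since that's how it's used below:

```python
from langchain.chains import RetrievalQA

# "stuff" passes the retrieved documents to the LLM as-is, unsummarized
retriever = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectordb.as_retriever()
)
```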

So, we have our retriever. With all of this, we could do the naive implementation of using a vector database with a large language model: we have a query, we use that query to search the vector database, we retrieve that information, and we feed it into the prompt alongside the query for the large language model.

That's like the simple way of doing this, right? You're basically searching every single time. But obviously, in a chat scenario, you're not necessarily going to want to refer to the vector database with every single interaction. So, by using these agents and these tools, we can do it only when it's needed, according to the large language model.

But to do that, we need to create this tool, okay? So, this is basically the vector database as a tool. We're going to give it a tool description, and that description is used by the large language model to decide which of its tools it should use.

Because sometimes it can have multiple tools. It's also used to decide whether it needs to use, in this case, the one tool that it has. So, this needs to be descriptive and very straightforward. Here we're saying: use this tool to answer user questions using the Lex Fridman podcast.

If the user says, "ask Lex," use this tool to get the answer. This tool can also be used to answer follow-up questions from the user. Right? I wanted to add that last part so that it's not expecting us to say "ask Lex" every time. Okay, for the first query, fine.

But after that, maybe I want to ask a follow-up question without saying "ask Lex" again, okay? And then we initialize a tool from LangChain agents. So, tools require three items. They require a function: that function takes some text input, does something, and then outputs some text output, okay?

That is what the function needs to be, and here it is retriever.run, where retriever is the RetrievalQA object we created up here. Okay, cool. Then we have the tool description, which we defined here, and we have the name of the tool as well: Lex Fridman DB.
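
Together, that gives a tool definition along these lines:

```python
from langchain.agents import Tool

tool_desc = (
    "Use this tool to answer user questions using the Lex Fridman podcast. "
    "If the user says 'ask Lex', use this tool to get the answer. This tool "
    "can also be used to answer follow-up questions from the user."
)

# the agent sees only the name and description when deciding whether to
# use the tool; func is the text-in, text-out function it will call
tools = [Tool(
    func=retriever.run,
    name="Lex Fridman DB",
    description=tool_desc
)]
```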

So, with all of that, we're ready to move on to initializing our chatbot agent, or conversational agent. Now, because it's a conversational agent, it does require some form of conversational memory, which we've spoken about before, and we're going to use a very simple one here: the conversation buffer window memory, which is going to remember the previous k interactions between the user and the AI.

We're going to set k equal to five; you can set it higher, depending on what you're looking for. So, basically, this is going to remember the previous five AI responses and five human messages: from your current point in the conversation, you're going to go back five AI-and-human steps.

That's how far back we're going to go; that's how much we're going to remember. Once we move on to the next interaction, it's going to forget the interaction that was six steps back. Okay? One important thing is the memory key here. I think by default, this is just history.

We need it to be chat_history, because we will be feeding this into the prompt later on, and the prompt is going to use the parameter chat_history. So, that is important. Okay? Cool. Let's run that.
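
The memory setup looks like this. One note: return_messages=True isn't mentioned above, but the chat agent works with message objects rather than a single string, so I'm including it here:

```python
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

# remember the last k=5 human/AI exchanges; memory_key must match the
# {chat_history} parameter used in the agent prompt later on
memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,
    return_messages=True
)
```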

And then we initialize our conversational agent. Now, let's take a look at this. So, in here, we have our initialized agent, and we specify the type of agent. There are a ton of different agent types that we can use in LangChain. We're using a chat agent here because we're using a chat model, gpt-3.5-turbo, and it's conversational, which means it's going to be using conversational memory.

We are also using the ReAct framework, which is basically a thought loop for the large language model: reason about the query you have been given (that's the "Re") and then decide on an action based on your reasoning (that's the "Act"). We're going to talk about that in a lot more depth relatively soon.

But for now, I'm not going to go into too much detail there; it's basically that reasoning, action, reasoning, action loop. And then "description": that refers to the tool description we have up here, which is basically how the large language model decides which tool it should use.

Okay? So, that's why it has that in there as well. In here, we're also passing in the tools the agent can use; there can be multiple tools, but we've just got the one here. We have our large language model. verbose is set because, at the moment, we're developing and trying to figure out what we need to do here.

We want to see every step in the execution of this agent, every step in the agent executor chain, and verbose prints it all out so we can see what is actually happening. max_iterations addresses something that can happen, especially when you have multiple tools: your agent decides, "Okay, I need to use this tool.

I'm going to use it." It gets an answer and it's going to think, "Okay, I need to use other tool to complement this answer. And I need to use another tool. I need to use another tool." And sometimes what can happen is it can just keep going, like infinitely or just for too long.

Okay? So we want to put a cap on the number of iterations it can go through, the number of loops, and we set that to two. We also have our early stopping method here, set to generate, so that when the agent is stopped, the model generates a final answer rather than just cutting off. And then we also have our conversational memory, which is very important.
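
Putting those arguments together, the initialization looks roughly like this:

```python
from langchain.agents import initialize_agent

conversational_agent = initialize_agent(
    agent="chat-conversational-react-description",
    tools=tools,
    llm=llm,
    verbose=True,                      # print every step of the executor chain
    max_iterations=2,                  # cap the reason/act loop
    early_stopping_method="generate",  # generate a final answer when stopped
    memory=memory
)
```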

Cool. Now, we basically have almost everything set up. The final thing is to clean up the prompt, and also just to understand the prompt, because the prompt for the conversational agent is actually pretty complicated. All right? First, let's just have a look at the default prompt that we get.

So we come to here and we have the chat prompt template. It's kind of hard to read, but we basically have the system message prompt template and that contains, "Assistant is a large language model trained by OpenAI. Assistant is designed to assist with a large range of tasks," so on and so on.

All right? I think this is quite a good system message, so in most cases maybe you would just want to leave it as it is, but it's up to you. In reality, for this demo, I don't need all of it, so I'm going to change it, mainly just to show you how to change it.

Okay? We don't really need to. But then we can also see, down here, that we have the tools in here, with the descriptions of our tools, and so on. This is basically all going to stay the same.

The only thing that's going to change is this system message. So, we come down to here, and I'm going to change the system message to something very short now, just that. So, we call the conversational agent's create_prompt method: we pass in our system message, and we also need to pass in any tools that we're using.

Now, I know the tools are already in there, but this basically resets the prompt, so we need to include any of the tools that we're using again. It's not going to just look at what was already in there and assume that we want them in there again.
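
In code, resetting the prompt looks something like this:

```python
sys_msg = "You are a helpful chatbot that answers the user's questions."

# create_prompt rebuilds the whole prompt, so the tools must be passed again
prompt = conversational_agent.agent.create_prompt(
    system_message=sys_msg,
    tools=tools
)
conversational_agent.agent.llm_chain.prompt = prompt
```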

Okay, so let's take a look at that. All right, again, it's pretty messy. You can see that the template for the system message is now shorter, while the template for the user message is still pretty much the same; in fact, I think it is exactly the same. So, there are a few inputs that we have here.

We have the input, which is actually what the user is writing; the chat history; and the agent scratchpad. Chat history we defined earlier on, so this is going to connect up to our conversation buffer window memory here: the history of the chat is going to go in wherever it says this.

We have an agent scratchpad. That is basically where the thoughts of the large language model are stored. We can read them because we set verbose equal to true, but the final output will not include them. Okay, cool. So let's take a look at what we have in here. So we have three items.

We have the system message prompt template. So these are all the prompts that are being fed in there. Then we have the messages placeholder and you see this is where the chat history is going to go in. And then we have the human message prompt template. So for that, we have a single input variable, which is actually the user input.

So let's just take a look at those one by one. The system message prompt template, we can print that out, and we have the actual template here: "You are a helpful chatbot that answers the user's questions." It's just what we wrote earlier. Let's take a look at this one. This is just a placeholder for our chat history, which is going to be fed in there.

And then we have the human message prompt template. We have the template here, and it looks pretty messy, so let's print it out. Cool. So we have "TOOLS: Assistant can ask the user to use tools to look up information that may be helpful in answering the user's original question."

"The tools the human can use are," and then the tool list. Okay. So, as far as the assistant knows, it's going to be responding to a human, but it's actually going to be responding to the scratchpad. So it's going to say, okay, we have this Lex Fridman database; use this tool based on whatever.

Right. Then, "when responding to me, please output a response in one of two formats." Okay. The first: use this if you want the human to use a tool. In reality, it's actually LangChain, or the function that we've passed, that is going to be the "human" in this case. Right. So it's going to be a markdown code snippet formatted in the following schema.

That's "action": the action to take, which must be one of "Lex Fridman DB". So if we had multiple tools, they would all be listed here; we just have one, so it's Lex Fridman DB. And then "action_input": the input we should pass to that tool.

Okay. And then the second format: use this if you want to respond directly to the human. So this is where you get your final thought, or final answer; it's just a final answer. Okay, cool. And then we have the user's input, and the input goes there. So our actual query will be going in there.

We insert it into that, and the full thing is passed to the large language model. Okay. So let's try. We're going to start with a really simple conversational opener: "hi, how are you?" Right. This doesn't require the Lex Fridman database; we're not saying anything about Lex Fridman.
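
We call the agent directly with our query:

```python
# with verbose=True, the agent's reasoning steps are printed before the answer
conversational_agent("hi, how are you?")
```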

So it goes in, and it has its action and its action input, in the JSON format that we saw before. Right. And the output that we actually get there is this: "I'm just a chatbot, so I don't have any feelings, but I'm here to help you with any questions you have."

We also have the chat history here. Right. In an actual scenario, you wouldn't be feeding that back to the user; you'd just be feeding back the output. Right now, it's empty anyway, because we haven't said anything else. Okay. So, it decided not to use the Lex Fridman DB.

Now we're going to use those words, "ask Lex," and we're going to ask about the future of AI. Let's see what happens. All right. So it goes in and says, right, we need the action, Lex Fridman database, and the action input, "what is the future of AI?" Sometimes we're going to get this.

I think this is just an issue with the library. What we need to do is just go back up and reset the agent. So let's initialize the agent again. Okay, we come back to here and run this. Right, and this looks good. So we have the JSON format here.

Observation. Okay, so let's have a look at that observation coming back from the tool. It says, "Lex Fridman discussed the potential of AI to increase quality of life, cure diseases, increase material wealth," and so on. Okay. Yeah, it's pretty long. So then the thought, based on the observation it got, is that it can move on to the final answer, because this answers the question; I think it's basically just copying the full thing in there.

All right. Yeah, it seems to be. Okay, cool. So then you take a look, and this is also a little bit messier. All right. So we have the input, which is the question we asked. Then we have the chat history, which is a list: we have the human message.

We have the AI message. Maybe I ran that bit twice; I'm not sure, but it seems to have appeared twice in there. And then the output to that is "Lex Fridman discussed the potential of AI," and so on. Okay, cool. Now, that's actually the end of the notebook, but what we can do is maybe just ask a follow-up question.

Like, okay, what can we ask? What does he think about space exploration? I haven't specified "ask Lex" in here, but I'm hoping that it will view this as a follow-up question and use the Lex Fridman database again.

Okay, "very enthusiastic," and so on. Cool. Now we have more chat history in there, so I must have run that other one twice. Our most recent interaction before this was asking Lex about the future of AI; we got this answer, and now we're on to the next one.

So the output of the space exploration question is Lex Friedman is very enthusiastic about space exploration, believes that it is one of the most inspiring things humans can do. Cool. For now, that is a retrieval Q&A agent for getting information about the Lex Friedman podcast. Now, as you might have guessed, you can obviously apply that to a ton of different things.

It can be other podcasts; it can be completely different forms of media or information: internal company documents, PDFs, all that sort of stuff. So you can do a lot with this, and you can also use multiple tools. So maybe I just want to focus on podcasts.

I could include a Lex Fridman podcast tool, a Huberman Lab podcast tool, and so on. And maybe you also want to include other things, like a calculator tool, or a SQL database retriever, or whatever else. There's a ton of things you can do with agents.

But anyway, that is it for this video. I hope all of this has been interesting and useful. So, thank you very much for watching, and I will see you again in the next one. Bye.