Lex Fridman Podcast Chatbot with LangChain Agents + GPT-3.5
Chapters
0:00 Building conversational agents in LangChain
0:14 Tools and Agents in LangChain
3:57 Notebook setup and prerequisites
5:23 Data preparation
11:00 Initialize LangChain vector store
13:12 Initializing everything needed by the agent
13:41 Using RetrievalQA chain in LangChain
15:59 Creating Lex Fridman DB tool
17:37 Initializing a LangChain conversational agent
21:49 Conversational memory prompt
27:41 Testing a conversation with the Lex agent
Today, we're going to focus on how we can build tools that can be used by agents in the LangChain library. When I say agent, I'm referring to essentially a large language model that can do more than just the kind of autocomplete you get from a typical large language model. And the tool we build here is, obviously, a tool that this agent will be able to use.

So, if I just try to visualize this quickly: an agent is different because, let's say you have your query here, it goes to your agent, which is just a large language model. The agent looks at the tools it has available and asks, "Will any of these tools help me answer this query?" So, it will basically ask itself that question: "Can I use a tool to answer this better than I would be able to otherwise?"

If the answer is yes, it's going to decide on which tool it needs to use. In our case, it's going to decide to use the Lex Fridman database tool. Then it also needs to create the input to that tool. So it's going to say, "Okay, I need to ask this query here." That input is probably going to be similar to the original user query, but not always. For example, if the tool were a Python interpreter, then it would obviously rewrite the query into Python code that can then be executed by that Python function. So, essentially, the large language model is always going to rewrite something for the tool.

So, that tool is here, it's going to put in that input, and there may be different things that happen inside. Within that tool, maybe there's another large language model that is going to summarize or reformat the output, or maybe it's just the raw output that gets fed back. And the answer gets fed back to our large language model here, and based on the query and the answer, it's going to respond. Sometimes it will say, "Okay, I need to use another tool," or, "I need to think about this a little bit more." But at some point, it's going to get to what we call the final thought.

It sounds simple, but, I mean, you're giving tools to large language models. So, what you can do with large language models is, all of a sudden, much grander than what you could do with just one large language model that's just doing completion.
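The decide-act-observe loop described above can be sketched in plain Python. This is a toy illustration only, not LangChain's implementation: the "reasoning" step is replaced by a simple keyword rule so the control flow is visible, and the tool names and functions are made up.

```python
# Toy sketch of the agent loop: decide whether a tool helps, build the tool
# input, run the tool, feed the observation back, and stop at a final thought.
# Illustrative only; in LangChain the decisions are made by an LLM.

def choose_tool(query, tools):
    """Return the first tool whose predicate says it can help with the query."""
    for name, (can_help, run) in tools.items():
        if can_help(query):
            return name, run
    return None, None

def run_agent(query, tools, max_iterations=3):
    context = query
    for _ in range(max_iterations):
        name, run = choose_tool(context, tools)
        if run is None:
            # No tool helps any further: this is the "final thought"
            return f"Final answer based on: {context}"
        observation = run(context)  # tool output fed back to the agent
        context = f"{context} | observation: {observation}"
        tools = {k: v for k, v in tools.items() if k != name}
    return f"Final answer based on: {context}"

# One hypothetical tool: a Lex Fridman transcript database
tools = {
    "lex-db": (lambda q: "Lex" in q, lambda q: "relevant transcript chunk"),
}
answer = run_agent("What did Lex say about AI?", tools)
```

The real agent uses the same shape of loop, but every decision (tool or no tool, which tool, what input) is produced by the language model itself.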
Now, let's have a look at how we can implement all of this. So, we're going to be installing a few prerequisites here. One of those is there because we already have a Lex Fridman transcripts dataset, although I will talk about how to actually get that in another video. We have this PodGPT library, which is actually how we get that, and it's also how we're going to index everything into Pinecone. Actually, I'm not sure if we do need that one here, but generally speaking, we can use it to make things faster.

So, we get to here, and you will need a few API keys. You have your OpenAI API key, which is at platform.openai.com. You also need your Pinecone API key; if you have an account, you will see this screen, otherwise you will need to sign up for an account. And also note that here, this is the environment, us-west1-gcp. I would paste in my API key here, and here I'd put us-west1-gcp. I'm not sure if that's actually the correct name.

Okay, I can see it at the top here. Oh, I put ASP transcripts; sorry, Lex transcripts. There's not a ton of data in there at the moment; I'm actually processing more of it right now. So, hopefully by the time you see this video, this will be bigger and you'll have more data in there, which will give you more interesting results.

And then we need to reformat this data into a format we can index. Now, if you saw my recent video on ChatGPT plugins, you'll recognize that this is very similar to the format we used there. Part of the reason is that I also want to show you that at some point. So, basically, what I want the model to refer to is the episode title and URL, and that's why I have those two in the metadata there.
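A sketch of that reformatting step, producing the `{id, text, metadata}` structure with the title and URL in the metadata. The field names in the input row are assumptions here, not the dataset's exact schema.

```python
# Reformat a dataset row into the {id, text, metadata} structure described
# above, keeping the episode title and URL in the metadata so the model can
# refer to them. Input field names are assumed, not the dataset's exact schema.
def reformat(row: dict) -> dict:
    return {
        "id": row["video_id"],
        "text": row["transcript"],
        "metadata": {
            "title": row["title"],
            "url": f"https://youtu.be/{row['video_id']}",
            "published": str(row["published"]),  # publish date converted to string
        },
    }

example = reformat({
    "video_id": "abc123",
    "transcript": "Today we talk about the future of AI...",
    "title": "Example episode",
    "published": 20230401,
})
```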
And then we need to initialize our indexer object. It obviously needs to have the OpenAI API key in there as well. So, we're going to go through each row in our data, and there's probably going to be a lot of text in here, right? Because an entire podcast is in here. And this is actually just a podcast clip; if it's a full podcast, it's going to be even longer. Okay, this looks more like podcast length, right?

But basically, we're not going to feed all of that in at once, because we want to be more specific with our embeddings. So, the automatic processing of the indexer here is actually to split everything into chunks. I'm just going to reformat a couple of those fields, converting the publish date into a string, and we're also just removing this source here. So, that will handle the chunking for me as well.
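The chunking the indexer performs is essentially a sliding window over the transcript. Here is a character-based sketch of the idea; the real library splits on tokens, and the window and overlap sizes below are made up for illustration.

```python
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so each embedding stays specific
    to a small piece of the podcast, rather than the whole episode."""
    chunks, start = [], 0
    step = size - overlap  # advance by less than the window to create overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

chunks = chunk_text("a" * 1000)
```

The overlap means neighboring chunks share some text, so a sentence falling on a boundary is still fully contained in at least one chunk.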
Okay, I wonder if this actually needs to be source. Okay, so it was actually supposed to be source there. And I think this bit here I don't actually need to do. Okay, so this bit: it's processing everything. But I can also check over in Pinecone, okay? So, okay, in the last minute, basically it's increasing. Right, so you can see the number of requests I'm making per minute; I think maybe it just hasn't counted them all yet. Then you can see the row of the most recent one in there. So, you have video ID, channel ID, title, publish date, and transcript.

Now, what I want to do is go to the index name, PodGPT, and I'm going to initialize my connection to Pinecone now. And I'm going to create this index object here. And then I'm going to initialize the retrieval components. The retrieval components, or what you need for retrieval, include this text key. So, that is basically saying which one of your metadata fields contains the text. But that's because the PodGPT library that we're using here is actually reformatting everything into the earlier format that I showed you, where it was ID, text, and metadata, right?

I'm just going to put in a random dummy vector for the moment, with the dimensionality of the OpenAI embedding model we're using. I do top_k equals 1 and include_metadata equals true. So, this is the format that our metadata is in within Pinecone, and we have the text in there as well, right? So, that is why we're specifying text as the text key there.
And then what we want to do is initialize the GPT-3.5 Turbo chat model. We set the temperature, so the amount of randomness from the model, to zero. Basically, gpt-3.5-turbo is the default model name setting for that.

Now, we're going to be using this RetrievalQA object. Recently, LangChain refactored the VectorDB QA objects into these retrieval objects. It's basically pretty much the same thing, right? It's just using a slightly different approach. So, we specify the large language model that we'd like to use, and then the retriever: that is just our vector database, and we call the as_retriever method to turn it into, basically, a retrieval object.

Then we have this chain type, which is "stuff". If we use the "stuff" chain type, those 10 documents are just returned as-is, and they're passed to the large language model. If we use map_reduce, those 10 items are summarized first, and that summary is then passed into the large language model. We use "stuff" because we want as much information as possible coming from our vector database.
Now, we get to the kind of interesting stuff, right? With all this, we could do the naive implementation of using a vector database with a large language model: we take the user's query, we use that query to also search the vector database, and we retrieve that information and feed it into the prompt alongside the query for the large language model. That's the simple way of doing this, right? You're basically searching every single time.

But in a real conversation, you're not necessarily going to want to refer to the vector database for every message, only when it's actually needed, according to the large language model, right? But to do that, we need to create this tool, okay? So, this is basically the vector database as a tool. An agent can sometimes have multiple tools, and it also needs to decide if it needs to use, in this case, the one tool that it has, right?

So, here we're saying: use this tool to answer user questions about Lex Fridman's podcasts. This tool can also be used for follow-up questions. So maybe I ask an initial question, but then after that, maybe I want to ask a follow-up question, and it should still use the tool.

And then we initialize a tool from LangChain agents. It takes a function, and that function basically takes in some text input, does something, and then outputs some text output, okay? So, that's the RetrievalQA object that we've created up here.
Now we have the tool description, which we defined here, and we're ready to move on to initializing our chatbot agent.

So, conversational memory, which we've spoken about before: we're going to use a very simple version here, the conversation buffer window memory, which is going to remember the previous k interactions. We set k equal to 5, but you can set it higher, depending on what you're looking for. So, basically, this is going to remember the previous five AI responses and five human questions. You have your current conversation state, like AI and human, AI and human, AI and human, and once you go past five of those interactions, it's going to forget the interaction that was six steps back.

We also set the memory key. I think by default, this is just "history", right? We need it to be "chat_history", because we will be feeding this into the prompt later on, and that is the variable name the prompt expects.
And then we initialize our conversational agent. There are a ton of different agents that we can use in LangChain. We're using the chat conversational ReAct description agent, because we're using a chat model, GPT-3.5 Turbo. "Conversational" means it's going to be using conversational memory. And ReAct is basically almost like a thought loop: you reason about the query that has been given to you, which is the "Re" part, and then decide on an action based on your reasoning. We're going to talk about that a lot more in the future; for now, I'm not going to go into too much detail. But it's basically that reasoning, action, reasoning, action loop.

And remember, we have that tool description up here. That is basically the deciding factor for the large language model when it's choosing whether to use the tool. Okay? So, that's why it has that in there as well.

In here, we're passing in the tools that it can use, right? Verbose is on because, at the moment, we're developing; we're trying to figure out what we need to do here, and we want to see every step in the execution of this agent. Then there's the iteration limit: what might happen is that your agent says, "Okay, I need to use another tool to complement this answer," and sometimes it can just keep going like that. So we want to put a cap on the number of iterations it can take. And then we also have our conversational memory.
Now, we basically have almost everything set up. There's one more thing I want to change, because the prompt for the conversational agent is not quite what I want. First, let's just have a look at the default prompt that we get. So we come to here and we have the chat prompt template, and in that we basically have the system message prompt template: "Assistant is a large language model trained by OpenAI. Assistant is designed to assist with a wide range of tasks," and so on. In reality, for this demo, I don't need all of this. This is basically all going to stay the same; the only thing that's going to change is this system message. I'm going to change the system message to something very short now.

So we do conversational agent, agent, create prompt, and we also need to pass in any tools that we're using; it's not going to just look at what was already in there. Then we have the input variables: this is actually what the user is writing, the chat history, and we have the agent scratchpad. The chat history refers to our conversation buffer window memory here, and that is going to go in wherever it says this.

We can read the prompts because we set verbose equal to true, so let's take a look at what we have in here. These are all the prompts that are being fed in there, and you see this is where the chat history is going to go in. And then we have the human message prompt template. For that, we have a single input variable. So let's just take a look at those one by one.

The system message is now just: "You are a helpful chatbot that answers the user's questions." Then this is just a placeholder for our chat history. And then we have the human message prompt template. It lists the tools the human can use; so, as far as the model knows, it's going to be responding to a human, but it's actually going to be responding to the scratchpad. It says, "When responding to me, please output a response," and then: use this format if you want the human to use a tool. That's going to be the action, so which action to take. If we had multiple tools, they would all be listed here. And then: use this other format if you want to respond directly to the human. And the full thing is passed to the large language model.

We're going to start with a really simple opening to the conversation, which is, "Hi, how are you?" We're not saying anything about Lex Fridman. So it goes in, it has its action and its action input. And the output that we would actually get there is something like, "I'm here to help you with any questions you have." In a real app, you wouldn't be feeding all of that reasoning back to the user; you'd just be feeding the final output back to the user. So it decided not to use the Lex Fridman DB.

Now we're going to ask about the future of AI. So we need the action to be the Lex Fridman database. I think this is just an issue with the library, so what we will need to do is just kind of come up with something for that. And it says: Lex Fridman discussed the potential of AI, and so on. So then the thought, based on the observation that I got, is that we need to move on to the final answer, and I think it's basically just copying the full thing in there. So this is a list, and we have the human message. I'm not sure why, but it seems to have appeared twice in there: discussed the potential of AI, so on and so on.

Now, that's actually the end of the notebook, but what we can do is maybe just ask a follow-up question. I'm hoping that it will view it as a follow-up question and that it will use the Lex Fridman database again. Now we have more about the history and so on in there. We got this answer, and now we're asking the next one. So the output of the space exploration question is: Lex Fridman is very enthusiastic about space exploration, and believes that it is one of the most inspiring things.

For now, that is a retrieval Q&A agent for getting information from the Lex Fridman transcripts. Now, as you might have guessed, you can obviously apply this to other data. It can be completely different forms of media or information: internal company documents, PDFs, you know, all that sort of stuff. So you can do a lot with this. As well, I could include a Lex Fridman podcast agent, I could include a Huberman Lab podcast agent, and so on. And maybe you also want to include other things, like a calculator tool or a SQL database retriever, or whatever else. There's a ton of things you can do with agents.

I hope all this has been interesting and useful. So thank you very much for watching, and I will see you in the next one.