
Lex Fridman Podcast Chatbot with LangChain Agents + GPT 3.5


Chapters

0:00 Building conversational agents in LangChain
0:14 Tools and Agents in LangChain
3:57 Notebook setup and prerequisites
5:23 Data preparation
11:00 Initialize LangChain vector store
13:12 Initializing everything needed by agent
13:41 Using RetrievalQA chain in LangChain
15:59 Creating Lex Fridman DB tool
17:37 Initializing a LangChain conversational agent
21:49 Conversational memory prompt
27:41 Testing a conversation with the Lex agent

Whisper Transcript

00:00:00.000 | Today, we're going to focus on how we can build tools
00:00:04.500 | that can be used by agents in the LangChain library.
00:00:08.200 | Now, none of that made any sense.
00:00:10.800 | Let me just explain what I mean quickly.
00:00:14.400 | When I say agent, I'm referring to essentially a large language model
00:00:20.100 | that can decide and use tools
00:00:25.200 | that give it basically other abilities
00:00:27.900 | other than just kind of like the auto-complete of a typical large language model.
00:00:32.200 | And when I refer to tool,
00:00:34.700 | obviously, that is the tool that this agent will be able to use.
00:00:39.400 | So, if I just kind of try and visualize this quickly,
00:00:44.500 | you typically have your query.
00:00:47.300 | That would go into a large language model
00:00:51.100 | and then it would output some completion.
00:00:57.500 | It would just output some text.
00:00:59.100 | An agent is different because let's say you have your query here,
00:01:03.600 | goes to your agent, which is just a large language model.
00:01:08.300 | We can even say large language model.
00:01:11.700 | But now, it's going to say,
00:01:15.100 | "Okay, I have some tools available to me.
00:01:18.600 | Will any of these tools help me answer this query?"
00:01:24.600 | So, it will basically ask itself that question,
00:01:27.700 | "Can I use a tool to answer this better than I would be able to otherwise?"
00:01:32.300 | If the answer is yes, it's going to decide on which tool it needs to use.
00:01:37.200 | So, in our case, that might be,
00:01:39.800 | it's going to decide to use the Lex Fridman database tool.
00:01:44.900 | Once it has decided to do that,
00:01:48.900 | it also needs to create the input to that tool.
00:01:53.800 | So, it's going to also, alongside this,
00:01:56.800 | it's going to be, "Okay, I need to ask this query here."
00:02:01.200 | It's probably going to be similar to the original user query,
00:02:03.900 | but it is actually generated again.
00:02:07.000 | In some scenarios with these agents,
00:02:10.600 | like maybe the agent is like a Python interpreter,
00:02:15.100 | then in that case, it would obviously rewrite the query into Python code
00:02:19.400 | that can then be executed by the Python function.
00:02:24.200 | So, essentially, large language model is always going to rewrite something
00:02:28.000 | in order to put it into that tool.
00:02:31.500 | Cool. So, it's going to decide on that tool.
00:02:34.900 | So, that tool is here, it's going to put in that input,
00:02:38.400 | and it's going to get some output.
00:02:40.700 | So, this is the response.
00:02:43.200 | And this is going to come out,
00:02:45.500 | and there may be different things that happen here.
00:02:50.600 | So, within that tool, maybe there's another large language model
00:02:55.200 | that is going to summarize or reformat the output,
00:02:58.200 | or maybe it's just like raw output that gets fed back here.
00:03:01.700 | But basically, here, you have your answer.
00:03:03.900 | And the answer gets fed back to our large language model here,
00:03:09.000 | and based on the query and the answer, it's going to say,
00:03:11.900 | sometimes it will say, "Okay, I need to use another tool,"
00:03:14.300 | or, "I need to think about this a little bit more."
00:03:17.300 | But at some point, it's going to get to what we call the final thought.
00:03:21.000 | So, the final thought.
00:03:22.800 | And that is what we would give to the user.
00:03:32.700 | So, it's a slightly more complex approach,
00:03:37.700 | but, I mean, you're giving tools to large language models.
00:03:41.800 | So, what you can do with large language models
00:03:45.300 | is, all of a sudden, much grander than what you could do
00:03:49.900 | with just one large language model that's just doing completion.
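The query → decide tool → generate tool input → observe → final thought loop described above can be sketched in plain Python. This is only an illustrative mock: a real LangChain agent delegates each of these decisions to the large language model, and the tool name and routing logic here are assumptions for the sketch.

```python
# Minimal sketch of the agent loop described above. The routing rules are
# illustrative stand-ins for decisions a real agent's LLM would make.

def run_agent(query: str, tools: dict, max_iterations: int = 2) -> str:
    """Route a query through tools until a final thought is reached."""
    observation = None
    for _ in range(max_iterations):
        # The LLM would decide here whether any tool helps answer the query.
        tool_name = "lex-fridman-db" if "lex" in query.lower() else None
        if tool_name is None:
            return f"Final thought: answering '{query}' directly."
        # The LLM also rewrites the query into the tool's input format.
        tool_input = query.replace("ask lex", "").strip()
        observation = tools[tool_name](tool_input)
        # Based on the observation, the LLM may stop or pick another tool.
        if observation:
            return f"Final thought: {observation}"
    return f"Final thought (stopped early): {observation}"

# A dummy tool standing in for the Lex Fridman database retriever.
tools = {"lex-fridman-db": lambda q: f"retrieved context for '{q}'"}
print(run_agent("ask lex about the future of AI", tools))
```

The key point of the loop is that the tool is only consulted when the routing step decides it is needed; a plain query falls straight through to a direct answer.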
00:03:53.900 | Now, let's have a look at how we can implement all of this.
00:03:57.600 | So, we're going to be installing a few prerequisites here.
00:04:00.900 | So, we have the HuggingFace datasets library
00:04:03.600 | because we already have a Lex Fridman transcripts dataset
00:04:07.900 | that we're going to use,
00:04:09.000 | although I will talk about how to actually get that in another video.
00:04:15.000 | We have this PodGPT library, which is actually how we get that,
00:04:19.400 | or create that dataset.
00:04:20.700 | And it's also how we're going to index everything into Pinecone.
00:04:25.100 | We have Pinecone here.
00:04:27.000 | We're using gRPC client.
00:04:28.800 | Actually, I'm not sure if we do need that here,
00:04:31.100 | but generally speaking, we can use this to make things faster.
00:04:35.300 | Then we have LangChain, OpenAI, and TQDM,
00:04:39.300 | which is just the progress bar.
00:04:42.000 | Okay, cool.
00:04:43.800 | So, we get to here, you will need a few API keys.
00:04:48.000 | So, you have your OpenAI API key that is at platform.openai.com.
00:04:53.700 | You have your Pinecone API key.
00:04:56.500 | For that, you go to app.pinecone.io.
00:04:59.800 | And then if you have an account, you will see this screen.
00:05:03.600 | Otherwise, you will need to sign up for an account
00:05:05.700 | and you go to API keys and just copy that.
00:05:08.500 | And also note that here, this is the environment, us-west1-gcp.
00:05:14.000 | I would paste in my API key here and here I'd put us-west1-gcp.
00:05:20.600 | Right? Okay, cool.
00:05:23.100 | Now, let's move on to this.
00:05:24.600 | So, we're going to download that dataset.
00:05:27.900 | I'm not sure if that's actually the correct name.
00:05:30.000 | Let's try.
00:05:31.300 | Okay, so let me change the dataset name.
00:05:37.100 | Okay, I can see at the top here, the dataset name, lex-transcripts.
00:05:41.700 | So, I'm going to put that in there.
00:05:44.500 | Try again.
00:05:49.000 | Let me just see what I'm doing wrong.
00:05:51.100 | So, I'm going to copy this.
00:05:52.600 | Oh, I put ASP transcripts, sorry, Lex transcripts.
00:05:59.000 | Okay, so that is downloaded.
00:06:01.300 | There's not a ton of data in there at the moment.
00:06:03.600 | I'm actually processing more of that at the moment.
00:06:07.300 | So, hopefully by the time you see this video,
00:06:09.700 | this will be bigger and you'll have more data in there,
00:06:12.600 | which will give you more interesting results.
00:06:14.700 | And then we need to reformat this data into a format
00:06:19.600 | that is required by the PodGPT indexer.
00:06:21.900 | Now, if you saw my recent video on ChatGPT plugins,
00:06:27.300 | you'll recognize that this is very similar to the format
00:06:32.300 | that they use with ChatGPT plugins.
00:06:34.400 | The reason I've done that on purpose
00:06:37.000 | is because I also want to show you at some point,
00:06:40.700 | not in this video, but in another video,
00:06:42.700 | how we can create a ChatGPT plugin
00:06:45.300 | using this pretty much the same approach.
00:06:48.500 | So, we have our IDs, we have the text,
00:06:51.800 | and then we have metadata.
00:06:53.100 | In there, we have just relevant information.
00:06:55.300 | So, I want basically what I want the model to refer to,
00:06:58.600 | the title of the podcast episodes,
00:07:01.400 | and also link back to them.
00:07:03.400 | So, that's why I have those two in the metadata there.
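The ID / text / metadata layout just described can be sketched as a small helper. The exact key names in the real dataset and indexer are assumptions here; what the sketch shows is the shape: a unique ID per chunk, the text to embed, and the episode title plus URL tucked into metadata so the model can refer and link back.

```python
# Sketch of the {id, text, metadata} record format described above.
# Field names (title, url) follow the transcript; exact keys used by the
# real PodGPT indexer are assumptions.

def to_record(video_id: str, chunk_num: int, text: str,
              title: str, url: str) -> dict:
    """Build one indexable record with title and source URL in metadata."""
    return {
        "id": f"{video_id}-{chunk_num}",        # unique per chunk
        "text": text,                           # the content to be embedded
        "metadata": {"title": title, "url": url},
    }

record = to_record("abc123", 0, "Today we're going to talk about AI...",
                   "Lex Fridman Podcast #1", "https://youtu.be/abc123")
print(record["id"])  # abc123-0
```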
00:07:06.800 | Okay.
00:07:07.800 | So, we run that.
00:07:10.400 | And then we need to initialize our indexer object.
00:07:15.200 | For that, we need to actually import PodGPT.
00:07:18.300 | It needs to obviously have the OpenAI API key in there as well.
00:07:25.000 | Okay.
00:07:26.500 | It's named. That should work.
00:07:28.700 | Cool.
00:07:30.300 | And then what we're going to do
00:07:31.700 | is actually just add everything in there.
00:07:34.500 | So, we're going to go through each row in our data,
00:07:37.300 | which represents a single podcast, right?
00:07:41.300 | So, if I go to data zero,
00:07:44.100 | there's probably going to be a lot of text in here, right?
00:07:47.700 | Because it's an entire podcast is here, right?
00:07:51.400 | We have all of this.
00:07:52.700 | And this is actually just a podcast clip, right?
00:07:57.600 | If it's a full podcast, it's going to be even longer.
00:08:00.100 | Okay, this is Lex.
00:08:04.400 | Doing a song, apparently.
00:08:06.900 | And another short video.
00:08:10.800 | Let's try and find one that's long.
00:08:12.800 | Okay, this looks more like podcast length, right?
00:08:20.400 | You see there's a ton of text in here.
00:08:22.700 | Yeah, so this is an early podcast as well.
00:08:26.400 | So, maybe it's not even that long
00:08:28.000 | compared to some of the more recent ones.
00:08:31.100 | But basically, we're not going to feed all of that
00:08:33.700 | into a single embedding
00:08:37.300 | because we want to be more specific with our embedding.
00:08:40.100 | So, the automatic processing here of the indexer
00:08:48.000 | is actually to split everything into chunks of,
00:08:51.200 | I think it's 400 tokens, right?
00:08:54.000 | So, we do four row in data.
00:08:57.800 | I'm just going to reformat a couple of those,
00:09:00.500 | converting this into the publish date into a string.
00:09:03.700 | And we're also just removing this source here
00:09:07.400 | because I've renamed it URL.
00:09:09.100 | And then I'm creating a video record object
00:09:12.600 | from that dictionary, essentially.
00:09:16.300 | And then I'm indexing that.
00:09:17.700 | And that will handle everything.
00:09:19.800 | So, that will handle the chunking for me as well.
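The chunking the indexer handles automatically can be approximated like this. PodGPT splits on model tokens (roughly 400 per chunk, per the transcript); splitting on whitespace words here is a rough stand-in for illustration, not the library's actual tokenizer.

```python
# Rough stand-in for the indexer's chunking step. The real library splits
# on tokens (~400 per chunk); word-based splitting is an approximation.

def chunk_transcript(text: str, chunk_size: int = 400) -> list[str]:
    """Split a long transcript into chunks of about chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

transcript = ("word " * 1000).strip()   # a 1000-word dummy transcript
chunks = chunk_transcript(transcript)
print(len(chunks))  # 3 chunks: 400 + 400 + 200 words
```

Smaller chunks are what make the embeddings more specific: each vector then represents one focused passage rather than a whole episode.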
00:09:23.300 | Okay, I wonder if this actually needs to be source.
00:09:27.000 | Let me try.
00:09:30.000 | Okay, so it was actually supposed to be source there.
00:09:37.300 | Okay, so I think this bit here I don't actually need to do.
00:09:40.100 | I think this is actually handled by PodGPT.
00:09:43.500 | So, let's just remove that bit there.
00:09:45.200 | Unnecessary.
00:09:47.000 | Okay, so this bit, it's processing everything.
00:09:49.900 | If I would like to see where that is,
00:09:53.700 | then, I mean, it should come up here.
00:09:55.900 | Usually it does.
00:09:57.600 | But I can also check over in the Pinecone, okay?
00:10:01.700 | So, I'm going to go to app.pinecone.io.
00:10:05.400 | Okay, so I come to here.
00:10:08.100 | Great, so PodGPT.
00:10:10.600 | And I think I can check in metrics.
00:10:13.400 | And okay, look, I can see when is this?
00:10:17.000 | This is just now.
00:10:18.200 | Yeah, all right.
00:10:20.600 | So, let me make that a little bit longer.
00:10:23.000 | Okay, it doesn't let me make it longer.
00:10:25.900 | So, okay, in the last minute, basically it's increasing.
00:10:29.400 | Right, if I click on requests.
00:10:31.900 | Right, so you can see the number of requests I'm making per minute
00:10:36.800 | and it was increasing.
00:10:38.200 | Here it's at zero.
00:10:39.200 | I think maybe it's just because it's not counted them yet.
00:10:41.800 | Or actually it's because it finished.
00:10:46.100 | Okay, great.
00:10:47.000 | So, then you can see the row of the most recent one in there.
00:10:52.900 | It's just what you saw before, actually.
00:10:54.900 | So, you want video ID, channel ID, title, publish, transcript.
00:10:59.100 | Cool.
00:11:00.100 | Now, what I want to do is I want to go to index name PodGPT,
00:11:08.400 | not ask-lex.
00:11:10.000 | And I'm going to initialize my connection through Pinecone now.
00:11:16.200 | Okay, so through Pinecone directly.
00:11:18.000 | And I'm going to create this index object here.
00:11:22.200 | Okay, cool.
00:11:24.100 | And then I'm going to initialize the retrieval components
00:11:27.600 | within LangChain.
00:11:28.800 | Okay, so the retrieval components or what you need for retrieval
00:11:32.500 | are embeddings and also the vector database.
00:11:37.800 | Okay, Pinecone.
00:11:38.800 | So, I initialize both of those.
00:11:41.500 | The text key is important.
00:11:44.500 | So, that is basically saying which one of your metadata fields
00:11:48.200 | contains the text that has been embedded.
00:11:51.300 | In this case, it is the text field.
00:11:54.600 | Earlier on, we had transcript, right?
00:11:56.300 | But that's because the PodGPT library that we're using here
00:12:00.500 | is actually reformatting that into the earlier format
00:12:06.200 | that I showed you where it was ID, text, and metadata, right?
00:12:11.100 | Or into something similar.
00:12:13.300 | So, if I do index query,
00:12:15.400 | I'm just going to put in like a random dummy vector at the moment.
00:12:21.300 | That needs to be 1536,
00:12:24.600 | the dimensionality of the OpenAI embedding model we're using.
00:12:28.400 | I do top K equals 1 and include metadata equals true.
00:12:36.300 | Okay, if I do that, we should see, yeah.
00:12:41.600 | So, this is the format that our metadata is in within Pinecone.
00:12:46.500 | All right, so we have the ID.
00:12:48.500 | That's like a unique ID for each one.
00:12:50.500 | The chunk number, when it was published.
00:12:52.800 | We have the source of this retrieved item,
00:12:57.000 | and we have the text in there as well, right?
00:12:59.700 | So, yeah, oh, and also the title.
00:13:03.100 | Cool.
00:13:04.600 | So, that is why we're specifying text as a text key there.
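The dummy query just shown can be sketched as below. The vector must match the 1536 dimensions of the OpenAI embedding model; the actual `index.query` call needs a live Pinecone index and API key, so it is left as a comment.

```python
# The dummy query vector must match the dimensionality of the OpenAI
# embedding model: 1536. The real Pinecone call needs a live index, so
# it is shown only as a comment.

dummy_vector = [0.0] * 1536

# res = index.query(dummy_vector, top_k=1, include_metadata=True)
# With include_metadata=True, each match comes back with its metadata
# fields (chunk, published, source, text, title) -- which is why "text"
# is the right text_key for the LangChain vector store.

print(len(dummy_vector))  # 1536
```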
00:13:11.300 | And then what we want to do is initialize the GPT 3.5 Turbo chat model.
00:13:18.200 | So, I'm using the OpenAI API key there.
00:13:20.900 | Set temperature, so the amount of randomness from the model, to zero.
00:13:24.900 | And we're using the GPT 3.5 Turbo model.
00:13:27.900 | You can also just do that.
00:13:29.400 | Basically, the GPT 3.5 Turbo is the default model name setting for that.
00:13:36.300 | Okay, cool.
00:13:39.000 | Now, what we want to do is we're going to be using this retrieval QA object.
00:13:47.100 | So, recently, LangChain refactored the VectorDB QA objects,
00:13:53.600 | and now we use this rather than VectorDB QA.
00:13:58.400 | So, it's basically pretty much the same thing, right?
00:14:03.400 | It's just using a slightly different approach.
00:14:09.000 | So, we specify the large language model that we'd like to use.
00:14:13.900 | We are specifying the chain type.
00:14:16.300 | So, I'll talk about that in a moment.
00:14:18.600 | And then the retriever, so that is just VectorDB.
00:14:21.200 | And then the method as_retriever to turn it into basically a retrieval object.
00:14:26.500 | Right?
00:14:28.500 | So, yeah, we have this chain type stuff here.
00:14:31.900 | We have two options.
00:14:33.200 | We have either stuff or we have MapReduce.
00:14:35.400 | Okay, let's say we return 10 items.
00:14:37.800 | If we use chain type stuff, those 10 documents are just returned as is,
00:14:43.600 | and they're passed to the large language model.
00:14:45.600 | If we use MapReduce, those 10 items are summarized,
00:14:49.900 | and that summary is then passed into the large language model.
00:14:53.800 | We want as much information as possible coming from our Vector database.
00:14:57.900 | So, we use the stuff chain type.
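The difference between the two chain types can be mocked in plain Python. In the real chain, the map step of `map_reduce` is an extra LLM call per document; here simple truncation stands in for that summarization, purely to show the shape of the trade-off.

```python
# Mock of the two chain types described above. "stuff" passes retrieved
# documents into the prompt as-is; "map_reduce" first condenses each one
# (an extra LLM call in reality, mocked here by truncation).

def build_context(docs: list[str], chain_type: str = "stuff") -> str:
    if chain_type == "stuff":
        # All retrieved text goes straight into the prompt.
        return "\n".join(docs)
    if chain_type == "map_reduce":
        # Each doc is condensed first; truncation stands in for an LLM.
        summaries = [doc[:20] for doc in docs]
        return "\n".join(summaries)
    raise ValueError(f"unknown chain type: {chain_type}")

docs = ["a long retrieved passage about AI safety and alignment",
        "another long retrieved passage about robotics"]
stuffed = build_context(docs, "stuff")
reduced = build_context(docs, "map_reduce")
print(len(stuffed) > len(reduced))  # True -- "stuff" keeps more detail
```

That extra detail is exactly why the transcript picks "stuff": the agent should see as much raw context from the vector database as possible.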
00:15:00.400 | Okay, initialize that.
00:15:04.600 | Cool.
00:15:06.400 | Now, we get to the kind of interesting stuff, right?
00:15:09.100 | So, we have our retriever.
00:15:10.800 | With all this, we could do like the naive implementation
00:15:13.800 | of using a Vector database with a large language model,
00:15:17.800 | where we're using like we have a query,
00:15:20.800 | we pass it to the large language model.
00:15:23.300 | We use that query to also search the Vector database
00:15:26.900 | and retrieve that information and feed it into the query
00:15:31.000 | or into the prompt alongside the query for the large language model.
00:15:34.400 | That's like the simple way of doing this, right?
00:15:38.500 | You're basically searching every single time.
00:15:41.100 | But obviously, in like a chat scenario,
00:15:45.200 | you're not necessarily going to want to refer
00:15:47.400 | to the Vector database every single time
00:15:49.500 | with every single interaction.
00:15:51.400 | So, by using these agents and these tools,
00:15:54.800 | we can just do it when is needed
00:15:57.400 | according to the large language model, right?
00:15:59.800 | But to do that, we need to create this tool, okay?
00:16:03.200 | So, this is basically Vector database as a tool, right?
00:16:06.900 | So, we're going to give a tool description.
00:16:08.800 | That tool description is used by the model
00:16:12.700 | or the large language model to decide
00:16:15.500 | which of its tools it should use.
00:16:18.000 | Because sometimes it can have multiple tools.
00:16:19.800 | And it's also just to decide if it needs to use,
00:16:22.800 | in this case, this one tool that it has, right?
00:16:26.500 | So, this needs to be descriptive
00:16:28.600 | and very like just straightforward, right?
00:16:31.800 | So, here we're saying use this tool to answer user questions
00:16:34.900 | using the Lex Fridman podcast.
00:16:37.300 | If the user says, "Ask Alexa,"
00:16:39.400 | use this tool to get the answer.
00:16:42.000 | This tool can also be used to follow-up questions
00:16:44.600 | from the user, right?
00:16:45.900 | So, I kind of wanted to add this
00:16:48.100 | so that it's not expecting us to say,
00:16:50.300 | "Ask Lex" every time.
00:16:52.200 | Okay, for the first query, fine.
00:16:54.800 | But then after that, maybe I want to ask a follow-up question
00:16:57.300 | without saying, "Ask Lex" again, okay?
00:17:00.300 | And then we initialize a tool from LangChain agents.
00:17:05.200 | So, tools require three items.
00:17:07.700 | They require a function, right?
00:17:10.800 | So, that function basically takes in some text input.
00:17:14.600 | It does something and then it outputs some text output, okay?
00:17:19.200 | That is what the function needs to be
00:17:22.500 | and that is this retriever.run, okay?
00:17:25.000 | So, that's a retrieval QA object that we've created up here.
00:17:28.400 | Okay, cool.
00:17:29.800 | Now we have the tool description, which we defined here,
00:17:33.200 | and we have the name of the tool as well.
00:17:35.100 | So, it's a Lex Fridman database.
00:17:37.100 | So, with all of that,
00:17:38.800 | we're ready to move on to initializing our chatbot agent
00:17:42.900 | or conversational agent.
00:17:44.500 | Now, because it's a conversational agent,
00:17:47.400 | it does require some form of memory.
00:17:50.800 | So, conversational memory, which we've spoken about before,
00:17:53.600 | and we're going to use a very simple one here.
00:17:55.800 | So, conversational buffer window memory,
00:17:58.400 | which is going to remember the previous K interactions
00:18:02.800 | between the user and the AI.
00:18:04.700 | We're going to set K equal to five.
00:18:06.300 | You can set it to higher, depending on what you're looking for.
00:18:09.800 | So, basically, this is going to remember five AI responses
00:18:16.000 | and five human questions, like previous, right?
00:18:20.300 | So, you have your current state conversation.
00:18:22.800 | You're going to go back five steps,
00:18:25.700 | like AI and human, AI and human, AI and human,
00:18:28.900 | AI and human, AI and human, right?
00:18:31.200 | That's how far back we're going to go.
00:18:33.100 | That's how much we're going to remember.
00:18:35.000 | Once we move on to the next interaction,
00:18:37.600 | it's going to forget the interaction that was six steps back.
00:18:41.700 | Okay?
00:18:43.400 | One important thing is the memory key here.
00:18:45.600 | I think by default, this is just history, right?
00:18:49.800 | We need it to be chat history
00:18:51.200 | because we will be feeding this into the prompt later on
00:18:55.300 | and the prompt is going to use the input
00:18:59.300 | or the parameter chat history.
00:19:02.800 | So, that is important.
00:19:04.500 | Okay?
00:19:05.900 | Cool. Let's run that.
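The k=5 window behavior can be sketched in plain Python. This is not LangChain's class, just a stand-in showing the two properties the transcript calls out: only the last k human/AI pairs survive, and the history is exposed under the "chat_history" key the prompt expects.

```python
# Plain-Python sketch of conversational buffer window memory with k=5.
# Not LangChain's implementation -- an illustration of its behavior.

class BufferWindowMemory:
    def __init__(self, k: int = 5, memory_key: str = "chat_history"):
        self.k = k
        self.memory_key = memory_key
        self.interactions: list[tuple[str, str]] = []  # (human, ai) pairs

    def save(self, human: str, ai: str) -> None:
        self.interactions.append((human, ai))

    def load(self) -> dict:
        # Anything older than k interactions back is forgotten.
        return {self.memory_key: self.interactions[-self.k:]}

memory = BufferWindowMemory(k=5)
for i in range(7):                      # seven interactions...
    memory.save(f"question {i}", f"answer {i}")
history = memory.load()["chat_history"]
print(len(history))                     # 5 -- the two oldest were dropped
print(history[0][0])                    # question 2
```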
00:19:08.800 | And then we initialize our conversational agent.
00:19:11.700 | Now, let's take a look at this.
00:19:13.700 | So, in here, we have our initialized agent.
00:19:17.800 | We have the type of agent.
00:19:19.400 | There are a ton of different agents that we can use in LangChain.
00:19:23.000 | We're using the chat part here
00:19:25.900 | because we're using a chatbot model, GPT 3.5 Turbo.
00:19:31.100 | It's a conversational chatbot, right?
00:19:34.900 | So, that means it's going to be using conversational memory.
00:19:37.900 | We are using the React framework,
00:19:40.500 | which is basically almost like a thought loop
00:19:45.800 | for the large language model,
00:19:47.500 | which is going to reason about the query
00:19:50.200 | that has been given to it, which is the "Re" part.
00:19:53.500 | And then decide on an action based on your reasoning.
00:19:59.300 | Okay?
00:20:00.700 | So, we're going to talk about that a lot more,
00:20:02.600 | in a lot more depth relatively soon.
00:20:04.300 | But for now, I'm not going to go into too much detail there.
00:20:07.700 | But it's basically that reasoning, action, reasoning, action loop.
00:20:11.900 | And then description.
00:20:14.100 | So, the description is referring to,
00:20:16.700 | you know, we have that tool description up here.
00:20:19.600 | That is basically the deciding factor of the large language model
00:20:23.800 | on which tool it should use.
00:20:26.100 | Okay? So, that's why it has that in there as well.
00:20:29.400 | In here, we're passing in the number of tools that it can use, right?
00:20:32.300 | There can be multiple tools.
00:20:33.900 | We've just got one now here.
00:20:35.700 | We have our large language model.
00:20:38.300 | Verbose is basically, because at the moment we're developing,
00:20:42.700 | we're trying to figure out, you know, what we need to do here.
00:20:45.100 | We want to see every step in the execution of this agent.
00:20:50.800 | Okay?
00:20:51.900 | Every step in the agent executor chain.
00:20:54.800 | That's going to print it out,
00:20:56.300 | so we can see what is actually happening.
00:20:58.900 | Max Iterations is saying, okay,
00:21:01.100 | so especially when you have multiple tools,
00:21:03.200 | what might happen is that your agent is like,
00:21:08.200 | "Okay, I need to use this tool.
00:21:09.800 | I'm going to use it."
00:21:11.100 | It gets an answer and it's going to think,
00:21:13.000 | "Okay, I need to use other tool to complement this answer.
00:21:16.100 | And I need to use another tool.
00:21:17.200 | I need to use another tool."
00:21:18.300 | And sometimes what can happen is it can just keep going,
00:21:22.400 | like infinitely or just for too long.
00:21:25.400 | Okay?
00:21:27.000 | So we want to put a stop on the number of iterations
00:21:31.200 | it can go through there,
00:21:32.300 | the number of loops it can go through.
00:21:33.700 | So we set that to two.
00:21:35.200 | We have our early stopping method here,
00:21:37.600 | so basically the model can decide,
00:21:40.300 | "Okay, I'm done. Stop."
00:21:42.400 | All right?
00:21:43.800 | And then we also have our conversational memory.
00:21:46.300 | Okay? Very important.
00:21:47.800 | Cool.
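The iteration cap just described can be sketched as a small loop. Each pass stands for one reason → action → observation step; with max_iterations set to 2, even an agent that keeps asking for more tools is cut off. The agent functions here are mocks, not anything from LangChain.

```python
# Sketch of the max_iterations cap described above. Each loop is one
# reason -> action -> observation step; the agents below are mocks.

def execute(agent_step, max_iterations: int = 2) -> str:
    steps = 0
    while steps < max_iterations:
        decision = agent_step(steps)
        steps += 1
        if decision == "final":
            return f"finished after {steps} step(s)"
    # Early stopping: the agent is cut off rather than looping forever.
    return f"stopped early after {steps} step(s)"

def greedy_agent(step):
    return "use another tool"        # never finishes on its own

def quick_agent(step):
    return "final"                   # answers on the first pass

print(execute(greedy_agent))         # stopped early after 2 step(s)
print(execute(quick_agent))          # finished after 1 step(s)
```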
00:21:50.600 | Now, we basically have almost everything set up.
00:21:54.900 | We just need to set up the final thing,
00:21:56.900 | is kind of clean up the prompt
00:21:59.200 | and also just understand the prompt,
00:22:00.500 | because the prompt for the conversational agent
00:22:02.500 | is actually pretty complicated.
00:22:04.700 | All right?
00:22:06.000 | First, let's just have a look at the default prompt that we get.
00:22:09.500 | So we come to here and we have the chat prompt template.
00:22:14.300 | It's kind of hard to read,
00:22:16.900 | but we basically have the system message prompt template
00:22:20.700 | and that contains,
00:22:22.200 | "Assistant is a large language model trained by OpenAI.
00:22:25.400 | Assistant is designed to assist with a large range of tasks,"
00:22:28.000 | so on and so on.
00:22:29.100 | All right?
00:22:30.200 | I think this is quite a good system message.
00:22:31.900 | So in most cases,
00:22:33.500 | maybe you would just want to leave it there,
00:22:35.300 | but it's up to you.
00:22:37.500 | In reality, for this demo, I don't need to use all of this.
00:22:41.200 | So I'm going to change it
00:22:43.800 | and mainly just show you how to change it.
00:22:46.900 | Okay? We don't really need to.
00:22:49.000 | But then we can also see down here,
00:22:51.100 | we have these,
00:22:52.200 | you can see that we have like tools in here
00:22:54.500 | and we have the description of our tools
00:22:56.400 | and so on and so on.
00:22:58.000 | This is basically all going to change to say the same.
00:23:03.100 | The only thing that's going to change is this system message.
00:23:06.300 | So we come down to here.
00:23:07.600 | I'm going to change system message to very short now,
00:23:11.400 | just that.
00:23:12.600 | So we do conversational agent, agent create prompt.
00:23:16.200 | We pass it in our system message
00:23:17.600 | and we also need to pass in any tools that we're using.
00:23:20.100 | Now, I know the tools are already in here,
00:23:23.600 | but this is basically resetting the prompt.
00:23:26.300 | So we need to include any of the tools
00:23:30.800 | that we are using in there as well.
00:23:32.600 | It's not going to just look at what was already in there
00:23:35.100 | and assume that we want them in there again.
00:23:37.100 | Okay, so let's take a look at that.
00:23:39.300 | All right, so again, pretty messy.
00:23:41.800 | You can see that now the template
00:23:44.800 | for the system message is shorter
00:23:46.700 | than the template for the user message
00:23:49.500 | is still pretty much the same.
00:23:52.100 | In fact, I think it is exactly the same.
00:23:53.900 | So there are a few inputs that we have here.
00:23:58.200 | We have the input.
00:23:59.600 | So this is actually what the user is writing,
00:24:02.200 | the chat history, and we have the agent scratchpad.
00:24:05.100 | Chat history, we defined this earlier on.
00:24:08.500 | So this is going to connect up
00:24:10.600 | to our conversational buffer window memory here.
00:24:13.800 | So basically the history of the chat
00:24:17.500 | is going to go in there, wherever it says this.
00:24:20.900 | We have an agent scratchpad.
00:24:22.200 | That is basically where the thoughts
00:24:25.300 | of the large language model are stored.
00:24:27.200 | We can read them because we set verbose equal to true,
00:24:30.000 | but the final output will not include them.
00:24:35.100 | Okay, cool.
00:24:37.200 | So let's take a look at what we have in here.
00:24:41.400 | So we have three items.
00:24:46.100 | We have the system message prompt template.
00:24:49.700 | So these are all the prompts that are being fed in there.
00:24:52.200 | Then we have the messages placeholder
00:24:54.300 | and you see this is where the chat history is going to go in.
00:24:57.200 | And then we have the human message prompt template.
00:25:00.800 | So for that, we have a single input variable,
00:25:04.500 | which is actually the user input.
00:25:07.500 | So let's just take a look at those one by one.
00:25:12.700 | So system message prompt template,
00:25:15.600 | we can print that out.
00:25:16.600 | So we have the actual template here.
00:25:19.000 | Your help will chatbot answers user's questions.
00:25:21.100 | It's just what we wrote earlier.
00:25:22.400 | Let's take a look at this one.
00:25:25.700 | This is just a placeholder for our chat history
00:25:29.400 | that is going to be fed into there.
00:25:31.100 | And then we have the human message prompt template.
00:25:35.400 | So we have the template here.
00:25:38.200 | That looks pretty messy.
00:25:39.500 | So let's print that out.
00:25:40.700 | Cool.
00:25:43.700 | So we have tools.
00:25:46.400 | Assistant can ask the user to use tools
00:25:48.900 | to look up information they may find helpful
00:25:51.300 | in answering the user's original question.
00:25:53.200 | The tools the human can use are:
00:25:57.800 | Okay.
00:25:58.500 | So this is basically going to the assistant
00:26:01.900 | as far as it knows it's going to be responding.
00:26:04.600 | It's actually going to be responding to the scratchpad.
00:26:07.100 | So it's going to say,
00:26:08.600 | okay, we have this Lex Fridman database,
00:26:11.200 | use this tool based on whatever.
00:26:13.800 | Right.
00:26:14.800 | So when responding to me, please output response
00:26:17.400 | in one of two formats.
00:26:18.500 | Okay.
00:26:19.500 | Use this if you want the human to use the tool.
00:26:21.800 | In reality, it's actually LangChain
00:26:24.800 | or the function that we've passed
00:26:26.700 | that is going to be the human in this case.
00:26:29.500 | Right.
00:26:30.400 | So we're going to be using a markdown code snippet
00:26:33.200 | formatted in the following schema.
00:26:35.600 | It's going to be action, what actions to take.
00:26:38.300 | So it must be one of Lex Fridman DB.
00:26:41.000 | So if we had multiple tools, they would be listed here,
00:26:44.000 | we just have one.
00:26:44.700 | So it's Lex Fridman DB.
00:26:46.100 | And then action input,
00:26:48.800 | what input should we pass to that?
00:26:52.200 | Okay.
00:26:53.100 | And then use this if you want to respond directly to the human.
00:26:56.800 | So this is where you get your final thought
00:26:58.900 | or final answer.
00:27:00.400 | Right.
00:27:01.000 | And it's just a final answer.
00:27:02.500 | Okay, cool.
00:27:04.100 | And then users input
00:27:06.500 | and then you have the input there.
00:27:09.200 | Right.
00:27:09.500 | So our actual query will be going in there.
00:27:12.900 | We insert it into that.
00:27:14.200 | And the full thing is passed to the large language model.
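The two response formats the prompt demands, a tool call or a final answer, are JSON payloads inside a markdown code fence, so a minimal parser can be sketched like this. The fence marker is built programmatically only to keep backticks out of the listing; the schema (`action`, `action_input`) follows the transcript.

```python
import json

FENCE = "`" * 3  # the markdown code-fence marker, ```

# The agent replies in one of two JSON formats inside a markdown fence:
# a tool call {"action": <tool name>, "action_input": <tool input>}, or
# a final answer {"action": "Final Answer", "action_input": <reply>}.

def parse_agent_output(reply: str) -> dict:
    """Strip the markdown fence and decode the JSON payload."""
    body = reply.strip().removeprefix(FENCE + "json").removesuffix(FENCE)
    return json.loads(body)

tool_call = parse_agent_output(
    FENCE + 'json\n{"action": "Lex Fridman DB", '
    '"action_input": "What is the future of AI?"}\n' + FENCE)
print(tool_call["action"])   # Lex Fridman DB

final = parse_agent_output(
    FENCE + 'json\n{"action": "Final Answer", '
    '"action_input": "AI could increase quality of life."}\n' + FENCE)
print(final["action"])       # Final Answer
```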
00:27:18.500 | Okay.
00:27:19.500 | So let's try.
00:27:20.400 | We're going to start with a really simple sort of question
00:27:23.500 | of a conversation, which is, hi, how are you?
00:27:25.900 | Right.
00:27:26.900 | This doesn't require
00:27:29.200 | the Lex Friedman database, right?
00:27:32.000 | We're not saying anything about Lex Friedman.
00:27:34.200 | So it just says,
00:27:36.800 | so it goes in, it has its action and its action input.
00:27:40.600 | So that is a JSON format that we saw before.
00:27:43.000 | Right.
00:27:44.000 | And so the output that we would actually get there
00:27:47.500 | is this.
00:27:48.300 | I'm just a chatbot.
00:27:49.300 | So I don't have any feelings,
00:27:50.600 | but I'm here to help you with any questions you have.
00:27:53.000 | We also have the chat history here.
00:27:55.300 | Right.
00:27:55.900 | That in an actual scenario,
00:27:58.300 | you wouldn't be feeding that back to the user.
00:27:59.900 | You'd just be feeding the output back to the user.
00:28:02.800 | Right now that is empty anyway,
00:28:05.100 | because we haven't said anything else.
00:28:07.300 | Right.
00:28:08.500 | Okay.
00:28:09.200 | So it decided not to use the Lex Fridman DB.
00:28:12.700 | Now we're going to use those words,
00:28:15.300 | ask Lex,
00:28:16.100 | and we're going to ask about the future of AI.
00:28:18.300 | Let's see what happens.
00:28:20.800 | All right.
00:28:23.400 | So it goes in, it says,
00:28:25.700 | right.
00:28:26.700 | So we need the action, Lex Fridman database,
00:28:30.100 | and we say, what is the future of AI?
00:28:32.400 | Sometimes we're going to get this error.
00:28:35.000 | I think this is just an issue with the library.
00:28:38.400 | What we will need to do is just kind of come back up
00:28:41.200 | and reset the agent.
00:28:43.800 | So let's initialize that agent again.
00:28:46.600 | Okay.
00:28:47.500 | We come to here.
00:28:48.200 | Let's run this.
00:28:50.100 | Right.
00:28:50.800 | And this looks good.
00:28:51.700 | So we have the JSON format here.
00:28:54.900 | Observation.
00:28:56.800 | Okay.
00:28:57.800 | So let's have a look at that observation
00:28:59.700 | that is getting back from the tool.
00:29:02.300 | And it says, Lex Fridman discussed the potential of AI
00:29:05.300 | to increase quality of life,
00:29:06.600 | cure diseases, increase material wealth,
00:29:08.700 | and so on and so on.
00:29:10.900 | Okay.
00:29:11.800 | Yeah, it's pretty long.
00:29:13.900 | So then the thought, based on this observation
00:29:19.600 | that it got, is that we need to move on to the final answer
00:29:22.400 | because this answers the question,
00:29:23.800 | and I think it's basically just copying the full thing in there.
00:29:27.500 | All right.
00:29:28.500 | Yeah, it seems to be.
00:29:30.000 | Okay, cool.
00:29:32.500 | So then you take a look
00:29:34.400 | and this is also a little bit messier.
00:29:36.800 | All right.
00:29:37.300 | So we have that input.
00:29:38.200 | This is the question we asked.
00:29:40.000 | Then we have the chat history.
00:29:41.800 | So this is a list with, we have the human message.
00:29:45.800 | We have the AI message.
00:29:47.600 | Maybe I ran that bit twice.
00:29:51.900 | I'm not sure, but it seems to have appeared twice in there.
00:29:54.500 | And then the output to that is Lex Fridman
00:29:57.000 | discussed the potential of AI, so on and so on.
00:29:59.900 | Okay, cool.
00:30:01.000 | Now, that's actually the end of the notebook,
00:30:04.300 | but what we can do is maybe just ask a follow-up question.
00:30:08.100 | Like, okay, what can we ask?
00:30:12.000 | What does he think?
00:30:15.200 | What does he think about space exploration?
00:30:23.600 | I haven't specified ask Lex in here,
00:30:28.700 | but it's kind of like a follow-up question.
00:30:31.500 | So I'm hoping that it will view it as a follow-up question
00:30:35.300 | and it will use the Lex Fridman database again.
00:30:38.700 | Okay, very enthusiastic, so on and so on.
00:30:42.500 | Cool.
00:30:43.100 | Now we have more chat history in there.
00:30:47.200 | So I must have run that other one twice.
00:30:49.700 | So our most recent one before this was
00:30:52.300 | ask Lex about the future of AI.
00:30:54.100 | We got this answer and now we're saying the next one.
00:30:57.700 | So the output of the space exploration question is
00:31:02.300 | Lex Fridman is very enthusiastic about space exploration,
00:31:06.000 | believes that it is one of the most inspiring things
00:31:08.300 | humans can do.
00:31:09.300 | Cool.
00:31:10.200 | For now, that is a retrieval Q&A agent for getting information
00:31:19.800 | about the Lex Fridman podcast.
00:31:19.800 | Now, as you might have guessed, you can obviously apply that
00:31:24.000 | to a ton of different things.
00:31:26.000 | It can be other podcasts.
00:31:27.300 | It can be completely different forms of media or of information,
00:31:32.000 | internal company documents, PDFs, you know, all that sort of stuff.
00:31:35.300 | So you can do a lot of stuff with this and as well,
00:31:38.800 | you can use multiple tools.
00:31:40.000 | So maybe I just want to focus on podcasts.
00:31:43.600 | I could include like a Lex Fridman podcast agent.
00:31:48.000 | I could include like a Huberman Lab podcast agent and so
00:31:51.500 | on and so on, right?
00:31:53.600 | And maybe you want to also include other things like a calculator
00:31:58.100 | tool or a SQL database retriever or whatever else.
00:32:02.200 | There's a ton of things you can do with agents.
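A multi-tool setup like the one described can be sketched in plain Python. The tool names and functions below are illustrative stand-ins for what real LangChain Tool objects (a retriever, a calculator chain, a SQL chain) would wrap; the point is just that the agent's chosen action name routes to the matching tool.

```python
# Sketch of a multi-tool setup. Names and functions are illustrative
# stand-ins for real LangChain tools (retriever, calculator, SQL chain).
tools = {
    "Lex Fridman DB": lambda q: f"[podcast retrieval for: {q}]",
    "Calculator": lambda q: str(eval(q, {"__builtins__": {}})),  # toy only
    "SQL Database": lambda q: f"[SQL result for: {q}]",
}

def dispatch(action: str, action_input: str) -> str:
    """Route the agent's chosen action to the matching tool."""
    return tools[action](action_input)

result = dispatch("Calculator", "2 + 2")
```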
00:32:04.800 | But anyway, that is it for this video.
00:32:08.000 | I hope all this has been interesting and useful.
00:32:11.000 | So thank you very much for watching and I will see you
00:32:15.100 | again in the next one.