Lex Fridman Podcast Chatbot with LangChain Agents + GPT-3.5
Chapters
0:00 Building conversational agents in LangChain
0:14 Tools and Agents in LangChain
3:57 Notebook setup and prerequisites
5:23 Data preparation
11:00 Initialize LangChain vector store
13:12 Initializing everything needed by the agent
13:41 Using RetrievalQA chain in LangChain
15:59 Creating Lex Fridman DB tool
17:37 Initializing a LangChain conversational agent
21:49 Conversational memory prompt
27:41 Testing a conversation with the Lex agent
Today, we're going to focus on how we can build tools that can be used by agents in the LangChain library. When I say agent, I'm referring to essentially a large language model that can do more than just the kind of autocomplete you get from a typical large language model. And the tool we build here is, obviously, a tool that this agent will be able to use.

So, if I just try to visualize this quickly: an agent is different because, let's say you have your query here, it goes to your agent, which is just a large language model. The agent looks at the tools it has available and asks, "Will any of these tools help me answer this query?" So, it will basically ask itself that question: "Can I use a tool to answer this better than I would be able to otherwise?"

If the answer is yes, it's going to decide on which tool it needs to use. In our case, it's going to decide to use the Lex Fridman database tool. Then it also needs to create the input to that tool. So it's going to say, "Okay, I need to ask this query here." That input is probably going to be similar to the original user query, but not always. For example, if the tool were a Python interpreter, then it would obviously rewrite the query into Python code that can then be executed by that Python function. So, essentially, the large language model is always going to rewrite something for the tool.

So, that tool is here, it's going to put in that input, and there may be different things that happen inside. Within that tool, maybe there's another large language model that is going to summarize or reformat the output, or maybe it's just the raw output that gets fed back. And the answer gets fed back to our large language model here, and based on the query and the answer, it's going to respond. Sometimes it will say, "Okay, I need to use another tool," or, "I need to think about this a little bit more." But at some point, it's going to get to what we call the final thought.

It sounds simple, but, I mean, you're giving tools to large language models. So, what you can do with large language models is, all of a sudden, much grander than what you could do with just one large language model that's just doing completion.
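The decide-act-observe loop described above can be sketched in plain Python. This is a toy illustration only, not LangChain's implementation: the "reasoning" step is replaced by a simple keyword rule so the control flow is visible, and the tool names and functions are made up.

```python
# Toy sketch of the agent loop: decide whether a tool helps, build the tool
# input, run the tool, feed the observation back, and stop at a final thought.
# Illustrative only; in LangChain the decisions are made by an LLM.

def choose_tool(query, tools):
    """Return the first tool whose predicate says it can help with the query."""
    for name, (can_help, run) in tools.items():
        if can_help(query):
            return name, run
    return None, None

def run_agent(query, tools, max_iterations=3):
    context = query
    for _ in range(max_iterations):
        name, run = choose_tool(context, tools)
        if run is None:
            # No tool helps any further: this is the "final thought"
            return f"Final answer based on: {context}"
        observation = run(context)  # tool output fed back to the agent
        context = f"{context} | observation: {observation}"
        tools = {k: v for k, v in tools.items() if k != name}
    return f"Final answer based on: {context}"

# One hypothetical tool: a Lex Fridman transcript database
tools = {
    "lex-db": (lambda q: "Lex" in q, lambda q: "relevant transcript chunk"),
}
answer = run_agent("What did Lex say about AI?", tools)
```

The real agent uses the same shape of loop, but every decision (tool or no tool, which tool, what input) is produced by the language model itself.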
Now, let's have a look at how we can implement all of this. So, we're going to be installing a few prerequisites here. One of those is there because we already have a Lex Fridman transcripts dataset, although I will talk about how to actually get that in another video. We have this PodGPT library, which is actually how we get that, and it's also how we're going to index everything into Pinecone. Actually, I'm not sure if we do need that one here, but generally speaking, we can use it to make things faster.

So, we get to here, and you will need a few API keys. You have your OpenAI API key, which is at platform.openai.com. You also need your Pinecone API key; if you have an account, you will see this screen, otherwise you will need to sign up for an account. And also note that here, this is the environment, us-west1-gcp. I would paste in my API key here, and here I'd put us-west1-gcp. I'm not sure if that's actually the correct name.

Okay, I can see it at the top here. Oh, I put ASP transcripts; sorry, Lex transcripts. There's not a ton of data in there at the moment; I'm actually processing more of it right now. So, hopefully by the time you see this video, this will be bigger and you'll have more data in there, which will give you more interesting results.

And then we need to reformat this data into a format we can index. Now, if you saw my recent video on ChatGPT plugins, you'll recognize that this is very similar to the format we used there. Part of the reason is that I also want to show you that at some point. So, basically, what I want the model to refer to is the episode title and URL, and that's why I have those two in the metadata there.
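A sketch of that reformatting step, producing the `{id, text, metadata}` structure with the title and URL in the metadata. The field names in the input row are assumptions here, not the dataset's exact schema.

```python
# Reformat a dataset row into the {id, text, metadata} structure described
# above, keeping the episode title and URL in the metadata so the model can
# refer to them. Input field names are assumed, not the dataset's exact schema.
def reformat(row: dict) -> dict:
    return {
        "id": row["video_id"],
        "text": row["transcript"],
        "metadata": {
            "title": row["title"],
            "url": f"https://youtu.be/{row['video_id']}",
            "published": str(row["published"]),  # publish date converted to string
        },
    }

example = reformat({
    "video_id": "abc123",
    "transcript": "Today we talk about the future of AI...",
    "title": "Example episode",
    "published": 20230401,
})
```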
And then we need to initialize our indexer object. It obviously needs to have the OpenAI API key in there as well. So, we're going to go through each row in our data, and there's probably going to be a lot of text in here, right? Because an entire podcast is in here. And this is actually just a podcast clip; if it's a full podcast, it's going to be even longer. Okay, this looks more like podcast length, right?

But basically, we're not going to feed all of that in at once, because we want to be more specific with our embeddings. So, the automatic processing of the indexer here is actually to split everything into chunks. I'm just going to reformat a couple of those fields, converting the publish date into a string, and we're also just removing this source here. So, that will handle the chunking for me as well.
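The chunking the indexer performs is essentially a sliding window over the transcript. Here is a character-based sketch of the idea; the real library splits on tokens, and the window and overlap sizes below are made up for illustration.

```python
def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so each embedding stays specific
    to a small piece of the podcast, rather than the whole episode."""
    chunks, start = [], 0
    step = size - overlap  # advance by less than the window to create overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

chunks = chunk_text("a" * 1000)
```

The overlap means neighboring chunks share some text, so a sentence falling on a boundary is still fully contained in at least one chunk.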
Okay, I wonder if this actually needs to be source. Okay, so it was actually supposed to be source there. And I think this bit here I don't actually need to do. Okay, so this bit: it's processing everything. But I can also check over in Pinecone, okay? So, okay, in the last minute, basically it's increasing. Right, so you can see the number of requests I'm making per minute; I think maybe it just hasn't counted them all yet. Then you can see the row of the most recent one in there. So, you have video ID, channel ID, title, publish date, and transcript.

Now, what I want to do is go to the index name, PodGPT, and I'm going to initialize my connection to Pinecone now. And I'm going to create this index object here. And then I'm going to initialize the retrieval components. The retrieval components, or what you need for retrieval, include this text key. So, that is basically saying which one of your metadata fields contains the text. But that's because the PodGPT library that we're using here is actually reformatting everything into the earlier format that I showed you, where it was ID, text, and metadata, right?

I'm just going to put in a random dummy vector for the moment, with the dimensionality of the OpenAI embedding model we're using. I do top_k equals 1 and include_metadata equals true. So, this is the format that our metadata is in within Pinecone, and we have the text in there as well, right? So, that is why we're specifying text as the text key there.
And then what we want to do is initialize the GPT-3.5 Turbo chat model. We set the temperature, so the amount of randomness from the model, to zero. Basically, gpt-3.5-turbo is the default model name setting for that.

Now, we're going to be using this RetrievalQA object. Recently, LangChain refactored the VectorDB QA objects into these retrieval objects. It's basically pretty much the same thing, right? It's just using a slightly different approach. So, we specify the large language model that we'd like to use, and then the retriever: that is just our vector database, and we call the as_retriever method to turn it into, basically, a retrieval object.

Then we have this chain type, which is "stuff". If we use the "stuff" chain type, those 10 documents are just returned as-is, and they're passed to the large language model. If we use map_reduce, those 10 items are summarized first, and that summary is then passed into the large language model. We use "stuff" because we want as much information as possible coming from our vector database.
Now, we get to the kind of interesting stuff, right? With all this, we could do the naive implementation of using a vector database with a large language model: we take the user's query, we use that query to also search the vector database, and we retrieve that information and feed it into the prompt alongside the query for the large language model. That's the simple way of doing this, right? You're basically searching every single time.

But in a real conversation, you're not necessarily going to want to refer to the vector database for every message, only when it's actually needed, according to the large language model, right? But to do that, we need to create this tool, okay? So, this is basically the vector database as a tool. An agent can sometimes have multiple tools, and it also needs to decide if it needs to use, in this case, the one tool that it has, right?

So, here we're saying: use this tool to answer user questions about Lex Fridman's podcasts. This tool can also be used for follow-up questions. So maybe I ask an initial question, but then after that, maybe I want to ask a follow-up question, and it should still use the tool.

And then we initialize a tool from LangChain agents. It takes a function, and that function basically takes in some text input, does something, and then outputs some text output, okay? So, that's the RetrievalQA object that we've created up here.
Now we have the tool description, which we defined here, and we're ready to move on to initializing our chatbot agent.

So, conversational memory, which we've spoken about before: we're going to use a very simple version here, the conversation buffer window memory, which is going to remember the previous k interactions. We set k equal to 5, but you can set it higher, depending on what you're looking for. So, basically, this is going to remember the previous five AI responses and five human questions. You have your current conversation state, like AI and human, AI and human, AI and human, and once you go past five of those interactions, it's going to forget the interaction that was six steps back.

We also set the memory key. I think by default, this is just "history", right? We need it to be "chat_history", because we will be feeding this into the prompt later on, and that is the variable name the prompt expects.
And then we initialize our conversational agent. There are a ton of different agents that we can use in LangChain. We're using the chat conversational ReAct description agent, because we're using a chat model, GPT-3.5 Turbo. "Conversational" means it's going to be using conversational memory. And ReAct is basically almost like a thought loop: you reason about the query that has been given to you, which is the "Re" part, and then decide on an action based on your reasoning. We're going to talk about that a lot more in the future; for now, I'm not going to go into too much detail. But it's basically that reasoning, action, reasoning, action loop.

And remember, we have that tool description up here. That is basically the deciding factor for the large language model when it's choosing whether to use the tool. Okay? So, that's why it has that in there as well.

In here, we're passing in the tools that it can use, right? Verbose is on because, at the moment, we're developing; we're trying to figure out what we need to do here, and we want to see every step in the execution of this agent. Then there's the iteration limit: what might happen is that your agent says, "Okay, I need to use another tool to complement this answer," and sometimes it can just keep going like that. So we want to put a cap on the number of iterations it can take. And then we also have our conversational memory.
Now, we basically have almost everything set up. There's one more thing I want to change, because the prompt for the conversational agent is not quite what I want. First, let's just have a look at the default prompt that we get. So we come to here and we have the chat prompt template, and in that we basically have the system message prompt template: "Assistant is a large language model trained by OpenAI. Assistant is designed to assist with a wide range of tasks," and so on. In reality, for this demo, I don't need all of this. This is basically all going to stay the same; the only thing that's going to change is this system message. I'm going to change the system message to something very short now.

So we do conversational agent, agent, create prompt, and we also need to pass in any tools that we're using; it's not going to just look at what was already in there. Then we have the input variables: this is actually what the user is writing, the chat history, and we have the agent scratchpad. The chat history refers to our conversation buffer window memory here, and that is going to go in wherever it says this.

We can read the prompts because we set verbose equal to true, so let's take a look at what we have in here. These are all the prompts that are being fed in there, and you see this is where the chat history is going to go in. And then we have the human message prompt template. For that, we have a single input variable. So let's just take a look at those one by one.

The system message is now just: "You are a helpful chatbot that answers the user's questions." Then this is just a placeholder for our chat history. And then we have the human message prompt template. It lists the tools the human can use; so, as far as the model knows, it's going to be responding to a human, but it's actually going to be responding to the scratchpad. It says, "When responding to me, please output a response," and then: use this format if you want the human to use a tool. That's going to be the action, so which action to take. If we had multiple tools, they would all be listed here. And then: use this other format if you want to respond directly to the human. And the full thing is passed to the large language model.

We're going to start with a really simple opening to the conversation, which is, "Hi, how are you?" We're not saying anything about Lex Fridman. So it goes in, it has its action and its action input. And the output that we would actually get there is something like, "I'm here to help you with any questions you have." In a real app, you wouldn't be feeding all of that reasoning back to the user; you'd just be feeding the final output back to the user. So it decided not to use the Lex Fridman DB.

Now we're going to ask about the future of AI. So we need the action to be the Lex Fridman database. I think this is just an issue with the library, so what we will need to do is just kind of come up with something for that. And it says: Lex Fridman discussed the potential of AI, and so on. So then the thought, based on the observation that I got, is that we need to move on to the final answer, and I think it's basically just copying the full thing in there. So this is a list, and we have the human message. I'm not sure why, but it seems to have appeared twice in there: discussed the potential of AI, so on and so on.

Now, that's actually the end of the notebook, but what we can do is maybe just ask a follow-up question. I'm hoping that it will view it as a follow-up question and that it will use the Lex Fridman database again. Now we have more about the history and so on in there. We got this answer, and now we're asking the next one. So the output of the space exploration question is: Lex Fridman is very enthusiastic about space exploration, and believes that it is one of the most inspiring things.

For now, that is a retrieval Q&A agent for getting information from the Lex Fridman transcripts. Now, as you might have guessed, you can obviously apply this to other data. It can be completely different forms of media or information: internal company documents, PDFs, you know, all that sort of stuff. So you can do a lot with this. As well, I could include a Lex Fridman podcast agent, I could include a Huberman Lab podcast agent, and so on. And maybe you also want to include other things, like a calculator tool or a SQL database retriever, or whatever else. There's a ton of things you can do with agents.

I hope all this has been interesting and useful. So thank you very much for watching, and I will see you in the next one.