LangChain Mastery in 2025 | Full 5 Hour Course

Chapters
0:00 Course Introduction
4:24 CH1 When to Use LangChain
13:28 CH2 Getting Started
14:14 Local Course Setup (Optional)
17:00 Colab Setup
18:11 Initializing our OpenAI LLMs
22:34 LLM Prompting
28:48 Creating an LLM Chain with LCEL
33:59 Another Text Generation Pipeline
37:11 Structured Outputs in LangChain
41:56 Image Generation in LangChain
46:59 CH3 LangSmith
49:36 LangSmith Tracing
55:45 CH4 Prompts
67:21 Using our LLM with Templates
72:39 Few-shot Prompting
78:56 Chain of Thought Prompting
85:25 CH5 LangChain Chat Memory
89:51 ConversationBufferMemory
98:39 ConversationBufferWindowMemory
107:57 ConversationSummaryMemory
117:33 ConversationSummaryBufferMemory
129:29 CH6 LangChain Agents Intro
136:34 Creating an Agent
140:56 Agent Executor
147:30 Web Search Agent
150:41 CH7 Agent Deep Dive
160:08 Creating an Agent with LCEL
176:40 Building a Custom Agent Executor
185:19 CH8 LCEL
189:14 LCEL Pipe Operator
193:28 LangChain RunnableLambda
198:00 LangChain Runnable Parallel and Passthrough
203:13 CH9 Streaming
209:22 Basic LangChain Streaming
213:29 Streaming with Agents
231:26 Custom Agent and Streaming
240:46 CH10 Capstone
245:25 API Build
252:14 API Token Generator
256:44 Agent Executor in API
274:50 Async SerpAPI Tool
280:53 Running the App
284:49 Course Completion!
Welcome to the AI engineer's guide to LangChain. This is a full course that will take you from the assumption that you know nothing about LangChain to being able to use the framework proficiently, whether that is within LangChain itself, within LangGraph, or even elsewhere, building on the fundamentals that you will learn here.

The course is broken up into multiple chapters. We're going to start by talking a little about what LangChain is, when we should really be using it, and when we maybe don't want to use it. We'll talk about the pros and cons, and also about the wider LangChain ecosystem, not just the LangChain framework itself. From there, we'll introduce LangChain and look at a few examples before diving into the basics of the framework. Note that all of this is for LangChain 0.3, the latest version at the time of recording. That being said, we will also cover a little of where LangChain comes from, looking at pre-0.3 ways of doing things so that we can understand the old approach, how we do it now in 0.3, and how we can dive a little deeper into those methods and customize them.

From there, we'll dive into what I believe is the future of AI, or at least the present and the short-term future: agents. We'll be spending a lot of time on agents. We'll start with a simple introduction: how can we build a simple agent, what are the main components of an agent, and what do they look like? Then we'll dive much deeper and build our own agent executor, which is essentially the framework around the AI components of an agent; we'll be building our own.

Once we've done our deep dive on agents, we'll look at the LangChain Expression Language (LCEL), which we'll be using throughout the course. LCEL is the recommended way of using LangChain, and it takes a bit of a break from standard Python syntax, so there is some weirdness in there. We use it throughout the course, but we leave the dedicated LCEL chapter until later, because by that point you will already have a good grasp of the basics of LCEL and we can really dig into its fundamentals.

Then we'll dig into streaming, which is an essential UX feature of AI applications in general; it can improve the user experience massively. And it's not just about streaming tokens, the interface where the AI generates text word by word on the screen. If you've seen the Perplexity interface, where as the agent is thinking you get updates on what it is thinking about, what tools it is using, and how it is using those tools: that is another essential feature that requires a good understanding of streaming to build. So we'll be taking a look at all of that too.

Finally, we'll top it all off with a capstone project, where we will build our own AI agent application that incorporates all of these features. We'll have an agent that can use tools and web search, we'll be using streaming, and we'll see all of this in a nice interface that we can work with.

That overview of the course is, of course, very high level, and there's a ton of stuff in here. This course can take you from wherever you are with LangChain at the moment, whether you're a beginner, you've used it a bit, or you're intermediate, and you're probably going to learn a fair bit from it. So, without any further ado, let's dive into the first chapter.
Okay, so in this first chapter of the course we're going to focus on when we should actually use LangChain and when we should use something else. Throughout this chapter we're not really going to focus on code; every other chapter is very code focused, but this one is a little more theoretical. Why LangChain? Where does it fit in? When should I use it, and when should I not?

I want to start by framing this. LangChain is one of, if not the most popular open source AI framework within the Python ecosystem. It works pretty well for a lot of things, and it also works terribly for a lot of things, to be completely honest. There are massive pros and massive cons to using LangChain. Here we're just going to discuss a few of those and see how LangChain compares against other frameworks.

The very first question we should be asking ourselves is: do we even need a framework? Is a framework actually needed when we can just hit an API? With the OpenAI API, Mistral, and others, we can get a response from an LLM in roughly five lines of code; it's incredibly simple. However, that can change very quickly. When we start talking about agents, retrieval-augmented generation (RAG), research assistants, and so on, those use cases and methods can suddenly get quite complicated outside of a framework. That's not necessarily a bad thing: it can be incredibly useful to understand everything that is going on and build it yourself. The problem is that doing so takes time; you need to learn the intricacies of building these things and of the methods themselves, i.e. how they even work. And that runs in the opposite direction of what we see with AI at the moment, which is being integrated into the world at an incredibly fast rate. Because of this, most engineers coming into the space are not from a machine learning or AI background. A lot of the engineers coming in are DevOps engineers, generic backend Python engineers, even frontend engineers, and they are building all of these things, which is great, but they don't necessarily have the experience. That might be you as well, and that's not a bad thing, because the idea is that you're going to learn and pick up a lot of these things.

In that scenario there's quite a good argument for using a framework, because a framework means you can get started faster. A framework like LangChain abstracts away a lot of stuff. That's a big complaint a lot of people have with LangChain, but that abstraction is also what made LangChain popular: it means you can come in not really knowing what RAG is, for example, and still implement a RAG pipeline and get the benefits of it without fully understanding it. Yes, there's an argument against implementing something without really understanding it, but as we'll see throughout the course, it is possible to work with LangChain in a way where you first implement these things in an abstract way and then break them apart and start understanding the intricacies, at least a little bit. So that can actually be pretty good. However, circling back to what we said at the start: if your application is very simple, say you just need to generate some text based on some basic input, maybe you should just use an API directly; that's completely valid as well.
Now, we just said that a lot of people coming to LangChain might not be from an AI background. So another question for a lot of these engineers might be: if I want to learn about RAG, agents, and all of these things, should I skip LangChain and just try to build it from scratch myself? Well, LangChain can help a lot with that learning journey. You can start very abstract, and as you gradually begin to understand the framework better, you can strip away more and more of those abstractions and get more into the details. In my opinion, this gradual shift towards more explicit code with less abstraction is a really nice feature, and it's also what we focus on throughout this course: starting abstract, stripping away the abstractions, and getting more explicit with what we're building.

For example, for building an agent in LangChain there is a very simple and incredibly abstract create-agent method that we can use. It creates a tool-using agent for you, and it doesn't tell you anything about what's going on. You can use that, and we will use it initially in the course, but then you can go from that to defining your full agent execution logic: making a tool call to OpenAI, getting the tool information back, and then figuring out how to execute the tool, how to store that information, and how to iterate through the loop. We're going to see that stripping away of abstractions as we work through the course, as we build agents, as we build our streaming use case, and even with chat memory, among many other things.

So LangChain can act as the on-ramp to your AI learning experience. Then, what you might find (and I do think this is true for most people) depends on how serious you are about AI engineering. A lot of people just want to understand a bit of AI, continue doing what they're doing, and integrate AI here and there; if that's your focus, you might stick with LangChain, and there's not necessarily a reason to move on. But in the other scenario, where you're thinking you want to get really good at this, learn as much as you can, and dedicate the short-term future of your career to becoming an AI engineer, then LangChain might be the on-ramp, your initial learning curve. After you've become competent with LangChain, you might actually find that you want to move on to other frameworks. That doesn't mean you will have wasted your time with LangChain. One, LangChain is the thing helping you learn; and two, one of the main frameworks that I recommend people move on to is LangGraph, which is still within the LangChain ecosystem and still uses a lot of LangChain objects, methods, and of course concepts. So even if you do move on from LangChain, you may move on to something like LangGraph, for which you need to know LangChain anyway. And even if you move on to another framework instead, the concepts that you learn from LangChain are still pretty important.
So, to finish up this chapter, I just want to summarize on the question of whether you should be using LangChain. What's important to remember is that LangChain does abstract a lot, and that abstraction is both a strength and a weakness. With more experience, those abstractions can start to feel like a limitation. That is why we go with the idea that LangChain is really good to get started with, but as a project grows in complexity, or as engineers gain experience, they might move on to something like LangGraph, which in any case still uses LangChain to some degree. In either one of those scenarios, LangChain is going to be a core tool in an AI engineer's toolkit, so in our opinion it's worth learning. Of course, it comes with its weaknesses, and it's good to be aware that it's not a perfect framework, but for the most part you will learn a lot from it and you will be able to build a lot with it.

With all of that, we'll move on to our first hands-on chapter with LangChain, where we'll introduce LangChain and some of its essential concepts. We're not going to dive too much into the syntax, but we are still going to get a sense of what we can do with it.
Okay, so moving on to our next chapter: getting started with LangChain. In this chapter we're going to introduce LangChain by building a simple LLM-powered assistant that will do various things for us. It will be multimodal: generating some text, generating images, and generating structured outputs; it will do a few things.

To get started, we go over to the course repo; all of the code for all of the chapters is in there. There are two ways of running it: either locally or in Google Colab. We would recommend running in Google Colab, because it's just a lot simpler with environments, but you can also run it locally, and for the capstone we will actually be running things locally, since there's no way of doing that in Colab. So if you would like to run everything locally, I'll quickly show you how now; if you would like to run in Colab, which I would recommend at least for the first notebook chapters, just skip ahead; there are chapter points in the timeline of the video.

For running locally, we come down to the section of the README that tells you everything you need. You will need to install uv, which is the Python package and project manager that we recommend; you don't need to use uv, it's up to you, but it is very simple and works really well, so I would recommend it. You install it with the command shown there. That command is for Mac, so if you are on Windows or elsewhere, look at the installation guide and it will tell you what to do. Before we do that, I will go ahead and clone the repo. I'm going to create a temp directory, because I already have the LangChain course in there, and then just git clone the langchain-course repo (you will also need to install git if you don't have it). Once we have that, we copy the next commands: the first installs Python 3.12.7 for us, the next creates a new venv using the Python 3.12.7 that we've installed, and then uv sync looks at the pyproject.toml file, which defines the packages for the repo, and uses it to install everything we need. We should make sure that we are inside the langchain-course directory, and then we can run those three commands. There we go; everything should install with that.

Now, if you are in Cursor you can just run cursor . (or code . if you're in VS Code); I'll just run that, and it opens up the course. Within the course you have your notebooks, and you just run through them, making sure you select your kernel and Python environment, and making sure you're using the correct venv from the repo. That should pop up as the venv's bin Python; click that and you can run through. When you are running locally, don't run the install cells at the top of each notebook; you don't need to, because you've already installed everything. Those cells are specifically for Colab. So that is running things locally.
Now let's have a look at running things in Colab. For running everything in Colab, we have our notebooks in the repo; we click through, and we have each of the chapters there, starting with the first chapter, the introduction, which is where we are now. To open it in Colab you can either just click the Colab button at the top of the notebook, or, if that isn't loading for you, copy the notebook URL at the top, go over to Colab, choose to open from GitHub, paste the URL in there, and press enter. And there we go, we have our notebook.

So we're in. The first thing we will do is install the prerequisites. We have a few LangChain packages here: langchain-core, langchain-openai (because we're using OpenAI), and langchain-community, which is needed for what we're running. Okay, that has installed everything for us, so we can move on to our first step, which is initializing our LLM. We're going to use GPT-4o mini, which is a smaller, faster, and cheaper model from OpenAI that is still very good. What we need here is an API key. To get one, we go to OpenAI's website; you can see we're opening platform.openai.com and then going to Settings, Organization, API keys (you can copy that path or just click the link from the notebook). I'm going to go ahead and create a new secret key; again, in case you're looking for where this is, it's Settings, Organization, API keys. I'll create a new API key and call it "langchain course". I'll put it under Semantic Router, which is just my organization; you put it wherever you want it to be. Then you copy your API key. You can see mine here; I'm obviously going to revoke it before you see this, but you can try to use it if you really like. I'm going to copy that and paste it into the little prompt box in the notebook. You could also just put your full API key directly in the code; it's up to you, but the little box just makes things easier.
Now, what we've basically done there is pass in our API key and set our OpenAI model to GPT-4o mini. What we're doing next is essentially connecting and setting up our LLM parameters with LangChain. We run that, saying we're using GPT-4o mini, and we also set ourselves up to use two different LLMs here, or rather two instances of the same LLM with slightly different settings. The first is an LLM with a temperature setting of zero. The temperature setting essentially controls the randomness of the output of your LLM. The way it works is that when an LLM is predicting the next token (the next word in a sequence), it produces a probability for every token it knows about. When we set a temperature of zero, we're saying: give us the token with the highest probability according to the model. When we set a temperature of 0.9, we're saying there is an increased probability of it giving us a token that is not the highest-probability token according to the LLM, and that tends to produce more creative outputs. That's what temperature does. So here we are creating a normal LLM and a more creative LLM.
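As a rough sketch of this setup (assuming the langchain-openai package is installed and you have an OpenAI API key; the exact notebook code may differ), the two models might be initialized like so:

    # Minimal sketch of the two-model setup described above.
    import os
    from getpass import getpass
    from langchain_openai import ChatOpenAI

    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key: ")

    # Deterministic model: temperature 0 always picks the highest-probability token.
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

    # "Creative" model: a higher temperature samples lower-probability tokens more often.
    creative_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.9)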
So what are we going to be building? We're going to take a draft article from the Aurelio learning page, and we're going to use LangChain to generate various things that we might find helpful while we're editing and finalizing that article draft. What are those things? You can see them in the notebook: a title for the article; a description, specifically an SEO-friendly description; third, we'll get the LLM to provide advice on an existing paragraph and essentially write a new paragraph from it (this is the structured output part: it will write a new version of that paragraph for us and give us advice on where we can improve our writing); and then we'll generate a thumbnail or hero image for the article, a nice image you would put at the top. Here we just input our article; you can put something else in here if you like. Essentially, this is a big article on agents that was written a little while back.

Now we can go ahead and start preparing our prompts, which are essentially the instructions for our LLM. LangChain comes with a lot of different utilities for prompts, and we're going to dive into them in much more detail later, but I do want to give you the essentials now so that you understand what we're looking at, at least conceptually. Prompts for chat agents are, at a minimum, broken up into three components. First is the system prompt: this provides instructions to our LLM on how it should behave, what its objective is, and how it should go about achieving that objective. Generally, system prompts will be a bit longer than what we have here, depending on the use case. Then we have our user prompts: these are user-written messages. Sometimes we might want to pre-populate them if we want to encourage a particular kind of conversational pattern from our agent, but for the most part these are going to be user generated. Then we have our AI prompts: these are, of course, AI generated. Again, in some cases we might want to write those ourselves beforehand, or within a conversation if we have a particular reason for doing so, but for the most part you can assume that user and AI messages really are user and AI generated. LangChain provides us with templates for each one of these prompt types.
prompt types. Let's go ahead and have a look at what these look 00:23:58.600 |
like within line chain. So to begin, we are looking at this 00:24:03.560 |
one. So we have our system message prompt template and 00:24:07.920 |
human messages, the user that we saw before. So we have these 00:24:12.120 |
two system prompt, keeping it quite simple here, you are a AI 00:24:15.520 |
system that helps generate article titles, right. So our 00:24:18.640 |
first component we want to generate is article title. So 00:24:22.160 |
we're telling the AI, that's what we want it to do. And then 00:24:26.600 |
here, right. So here, we're actually providing kind of like 00:24:32.680 |
a template for a user input. So yes, as I mentioned, user input 00:24:40.000 |
can be, it can be fully generated by user, it might be 00:24:44.920 |
kind of not generated by user, it might be setting up a 00:24:48.400 |
conversation beforehand, which a user would later use, or in 00:24:52.320 |
this scenario, we're actually creating a template, and the 00:24:57.040 |
what the user will provide us will actually just be inserted 00:25:00.800 |
here inside article. And that's why we have this import 00:25:04.400 |
variables. So what this is going to do is okay, we have all of 00:25:09.800 |
these instructions around here, they're all going to be 00:25:12.920 |
provided to open AI as if it is the user saying this, but it 00:25:16.760 |
will actually just be this here, that user will be providing, 00:25:21.800 |
okay. And we might want to also format this a little nicer, it 00:25:24.680 |
kind of depends, this will work as it is. But we can also put, 00:25:28.320 |
you know, something like this to make it a little bit clearer 00:25:31.400 |
to the LM. Okay, what is the article? Where are the prompts? 00:25:36.840 |
So we have that, you can see in this scenario, there's not that 00:25:42.680 |
much difference to what the system prompt and user prompt is 00:25:45.120 |
doing. And this is, it's a particular scenario, it varies 00:25:48.440 |
when you get into the more conversational stuff, as we will 00:25:50.920 |
do later, you'll see that the user prompt is generally more 00:25:55.640 |
fully user generated, or mostly user generated. And much of 00:26:01.160 |
these types of instructions, we might actually be putting into 00:26:04.960 |
the system prompt, it varies. And we'll see throughout the 00:26:07.680 |
course, many different ways of using these different types of 00:26:11.560 |
prompts in various different places. Then you'll see here, so 00:26:16.400 |
I just want to show you how this is working, we can use this 00:26:20.120 |
format method on our user prompt here to actually insert 00:26:24.640 |
something within the article input here. So we're going to go 00:26:29.840 |
use prompt format, and then we pass in something for article. 00:26:32.920 |
Okay. And we can also maybe format this a little nicer, but 00:26:37.240 |
I'll just show you this for now. So we have our human message. 00:26:39.800 |
And then inside content, this is the text that we had, right, you 00:26:43.200 |
can see that we have all this, right. And this is what we wrote 00:26:46.000 |
before we wrote all this, except from this part, we didn't write 00:26:50.000 |
this, instead of this, we had article, right. So let's format 00:26:55.920 |
this a little nicer so that we can see. Okay, so this is 00:26:59.600 |
exactly what we wrote up here, exactly the same, except from 00:27:02.600 |
now we have test string instead of article. So later, when we 00:27:06.840 |
insert our article, it's going to go inside there, slowly 00:27:10.520 |
soon. It's like it's an it's an F string in Python, okay. And 00:27:14.440 |
this is again, this is one of the things where people might 00:27:16.760 |
complain about line chain, you know, this sort of thing can be, 00:27:20.000 |
you know, it seems excessive, because you could just do this 00:27:23.120 |
with an F string. But there are, as we'll see later, particularly 00:27:26.240 |
when you're streaming, just really helpful features that 00:27:29.960 |
come with using line chains kind of built in prompt templates, 00:27:35.360 |
or at least message objects that we will see. So, you know, we 00:27:42.160 |
need to keep that in mind. Again, as things get more 00:27:45.080 |
complicated, line chain can be a bit more useful. So, chat 00:27:48.880 |
prompt template, this is basically just going to take 00:27:52.680 |
what we have here, our system prompt, user prompts, we could 00:27:55.120 |
also include some AI prompts in there. And what it's going to do 00:27:59.560 |
is merge both of those. And then when we do format, what it's 00:28:05.400 |
going to do is put both of those together into a chat history. 00:28:09.120 |
Okay, so let's see what that looks like. First, in a more 00:28:13.080 |
messy way. Okay, so you can see we have just the content, right? 00:28:18.840 |
So it doesn't include the whole, you know, before we had human 00:28:22.120 |
message, we're not include, we're not seeing anything like 00:28:24.520 |
that here. Instead, we're just seeing the string. So now let's 00:28:28.680 |
switch back to print. And we can see that what we have is our 00:28:33.880 |
system message here, it's just prefixed with this system. And 00:28:37.160 |
then we have human, and it's prefixed by human, and then it 00:28:39.840 |
continues, right? So that's, that's all it's doing is just 00:28:42.320 |
kind of merging those in some sort of chat log, we could also 00:28:45.000 |
put in like AI messages, and they would appear in there as 00:28:47.680 |
well. Okay, so we have that. Now, that is our prompt 00:28:52.280 |
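As a rough sketch of the templates described in this section (the exact prompt wording in the course notebook differs, and the {name} variable is the one we add to the system prompt a little further below):

    # Hedged sketch of the system/user prompt templates and their merge.
    from langchain_core.prompts import (
        SystemMessagePromptTemplate,
        HumanMessagePromptTemplate,
        ChatPromptTemplate,
    )

    system_prompt = SystemMessagePromptTemplate.from_template(
        "You are an AI assistant called {name} that helps generate article titles."
    )
    user_prompt = HumanMessagePromptTemplate.from_template(
        "Here is the article for you to examine:\n\n---\n\n{article}\n\n---\n\n"
        "Suggest a concise title for it."
    )

    # format() fills a placeholder much like an f-string and returns a message.
    print(user_prompt.format(article="TEST STRING"))

    # ChatPromptTemplate merges the messages into a single chat history.
    first_prompt = ChatPromptTemplate.from_messages([system_prompt, user_prompt])
    print(first_prompt.format(article="TEST STRING", name="Joe"))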
Let's put that together with an LLM to create what, in the past, LangChain would have called an LLM chain. We wouldn't necessarily call it an LLMChain now, because we're not using the LLMChain abstraction; that's not super important, and if it doesn't make sense we'll go into it in more detail later, particularly in the LCEL chapter. Think of LangChain as just chains: we're chaining together multiple components. This chain will perform the steps of prompt formatting (which is what I just showed you), LLM generation (sending our prompt to OpenAI and getting a response), and getting that output. You can also add another step if you want to format the result in a particular way; we're going to output it in a particular format so that we can feed it into the next step more easily. There are also things called output parsers, which parse your output in a more dynamic or complicated way, depending on what you're doing.

So this is our first look at LCEL. I don't want us to focus too much on the syntax here, because we will do that later, but I do want you to understand what is actually happening and, logically, what we are writing.
All we really need to know right now is that we define our inputs with the first dictionary segment here. These are our inputs, which we have already defined: if we come up to our user prompt, we set its input variable to article. We might also have added input variables to the system prompt. For example, say the system prompt was "You are an AI assistant called {name} that helps generate article titles"; in that scenario we would have the input variable name there as well, and then down in the chain we would also have to pass that in, so alongside article we would also have name. Basically, we just need to make sure that the dictionary includes the variables we have defined as input variables for our prompts. Let's go ahead and add that so we can see it in action: we run this again to reinitialize our first prompt with the name variable included. Looking at what that means for the format function, it means we'll also need to pass in a name; let's call our AI assistant Joe. So we have Joe, our AI, and that name is going to be fed in through these input variables.

Then we have the pipe operator. The pipe operator is basically saying that whatever is on the left of it (in this case, our inputs) is going to go into whatever is on the right of it. It's that simple. Again, we'll dive into this and break it apart in the LCEL chapter.
But for now, that's all we need to know. So our inputs go into our first prompt, which inserts the name and the article that we've provided into the template and outputs the formatted prompt. Then we have another pipe operator, so that output goes into the input of the next step, our creative LLM. That generates some tokens, and its output is an AIMessage; as you saw before, within those message objects we have the content field. So we extract the content field out of the AIMessage to get just the text, and that is what the final step does: it takes the AIMessage from the LLM, extracts its content, and passes it into a dictionary that just contains article_title. We don't strictly need to do that; we could take the AIMessage directly. I just want to show you how we use this sort of chain in LCEL.

Once we have set up our chain, we call it (execute it) using the invoke method, and into that we pass our variables. We already have our article, but we also gave our AI a name, so let's add that and run it. Okay, so Joe has generated an article title for us: "Unlocking the Future: The Rise of Neuro-Symbolic AI Agents". Cool, a much better name than what I gave the article, which was "AI Agents Are Neuro-Symbolic Systems", though I don't think I did too badly.
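Putting the pieces together, a minimal sketch of this first LCEL chain might look as follows, reusing the first_prompt and creative_llm objects from the earlier sketches and assuming article holds the article text loaded above:

    # Hedged sketch of the first chain: inputs -> prompt -> LLM -> dict.
    chain_one = (
        {
            "article": lambda x: x["article"],  # pull each input out of the invoke dict
            "name": lambda x: x["name"],
        }
        | first_prompt                                   # fill the prompt template
        | creative_llm                                   # call the chat model
        | (lambda msg: {"article_title": msg.content})   # keep just the generated text
    )

    article_title = chain_one.invoke({"article": article, "name": "Joe"})["article_title"]
    print(article_title)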
Okay, so we have that. Now let's continue. What we're going to do is build more of these kinds of LLM chain pipelines, where we feed in some prompts, generate something, get the result, and do something with it. As mentioned, we have the title; we now move on to the description. So we have another HumanMessagePromptTemplate, which follows a similar format to before. We probably also want to redefine the system prompt, because I was using the same one as before; in fact, let's just remove the name variable now that I've shown you it and make it a generic system prompt: "You are an AI assistant that helps build good articles." Then we have our user prompt: you are tasked with creating a description for the article, the article is here for you to examine, and here is the article title. So we need the article title as an input variable now as well. We ask it to output an SEO-friendly article description, and just to be certain we add: do not output anything other than the description. Sometimes an LLM will say something like "Hey, look, this is what I generated for you, and the reason I think this is good is because..." and so on. If you're programmatically taking output from an LLM, you don't want all of that fluff around what it has generated; you just want exactly what you asked for. Otherwise you need to parse it out with code, which can get messy and is also far less reliable. So we just say: do not output anything else.

Then we put all of these together, the system prompt and this second user prompt, into a new ChatPromptTemplate, and we feed that into another LCEL chain to generate our description. We invoke that as before, making sure we add in the article title that we got from the previous chain, and let's see what we get. Okay, we get "Explore the transformative potential of neuro-symbolic AI agents...", which is a little long, to be honest, but you can see what it's doing. Of course, we can then go in and adjust: this is too long for an SEO-friendly description, so we modify the prompt to say "Output the SEO-friendly description. Make sure you do not exceed 120 characters" (I don't actually have a clue what the ideal length for SEO is, maybe it's even less), and again "Do not output anything other than the description." We go back, modify our prompt, and generate again. Now it's much shorter, probably too short, but that's fine. Cool, so we have that, and our description is now in the dictionary format that we defined.
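For reference, a rough sketch of this second chain might look like the following; the prompt wording and the "summary" key are illustrative rather than the exact notebook code, and article_title is the output of the first chain:

    # Hedged sketch of the description chain, consuming the article and its title.
    from langchain_core.prompts import (
        SystemMessagePromptTemplate,
        HumanMessagePromptTemplate,
        ChatPromptTemplate,
    )

    system_prompt = SystemMessagePromptTemplate.from_template(
        "You are an AI assistant that helps build good articles."
    )
    second_user_prompt = HumanMessagePromptTemplate.from_template(
        "You are tasked with creating an SEO-friendly description for the article "
        "below. Do not exceed 120 characters and do not output anything other than "
        "the description.\n\nArticle:\n{article}\n\nTitle: {article_title}"
    )
    second_prompt = ChatPromptTemplate.from_messages([system_prompt, second_user_prompt])

    chain_two = (
        {
            "article": lambda x: x["article"],
            "article_title": lambda x: x["article_title"],
        }
        | second_prompt
        | llm                                       # the temperature-0 model
        | (lambda msg: {"summary": msg.content})
    )

    summary = chain_two.invoke({"article": article, "article_title": article_title})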
Now, for the third step, we want to consume that first article variable containing our full article, and we're going to generate a few different output fields. For this we're going to use the structured output feature, so let's scroll down and see what that looks like. Structured output essentially means we force the LLM to output an object with particular fields. We can modify this quite a bit, but in this scenario I want an original paragraph field (I just want it to return the original paragraph, because I'm lazy and don't want to extract it myself), an edited paragraph field, which is the LLM-generated improved paragraph, and then some feedback, because we don't want to just automate ourselves; we want to augment ourselves and get better with AI rather than simply being handed the answer. That's what we do here, and you can see that we're using a Pydantic object. Pydantic allows us to define these particular fields and to attach a description to each field, and LangChain actually reads all of this, including the types. For example, we could make the feedback field an int and we would get a numeric score for our paragraph. Let's actually try that quickly, I'll show you: I set the type to int and keep the description as constructive feedback on the original paragraph, and we'll see what happens.

Then I take our creative LLM and use the with_structured_output method. That essentially creates a new LLM object that forces the LLM to use this structure for its output, passing in our paragraph class. With this, we create our new structured LLM. Let's run that and see what happens.
Okay, so we modify our chain accordingly; maybe I'll also remove the final extraction step for now so that we can see what the structured LLM outputs directly. Let's see. Now you can see that we actually get that paragraph object back, the one we defined above, which is kind of cool. In there we have the original paragraph (I definitely remember writing something that looks a lot like that, so I think it's correct), we have the edited paragraph, which is what the model thinks is better, and then, interestingly, the feedback is 3, which is weird, right? Because we said the field is constructive feedback on the original paragraph. But what LangChain is doing when we use with_structured_output is essentially performing a tool call to OpenAI, and a tool call can force a particular structure in the output of an LLM. So when we say the feedback has to be an integer, no matter what we put in the description it's going to give us an integer. Constructive feedback as an integer doesn't really make sense, but because we set that restriction, that's what it does: it just gives us a numeric value. So I'm going to switch that field back to a string, rerun it, and see what we get. Now we do get constructive feedback, and it's quite long: "The original paragraph effectively communicates limitations of neural AI systems in performing certain tasks. However, it could benefit from slightly improved clarity and conciseness. For example, the phrase 'was becoming clear' can be made more direct by changing it to 'became evident'." True, thank you very much. So now we actually get that feedback, which is pretty nice.
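A minimal sketch of this structured-output step, with the feedback field set back to a string as discussed, might look like this (the field names and the third_prompt template here are assumptions, not the exact notebook code):

    # Hedged sketch of structured output via a Pydantic schema.
    from pydantic import BaseModel, Field
    from langchain_core.prompts import ChatPromptTemplate

    class Paragraph(BaseModel):
        original_paragraph: str = Field(description="The original paragraph")
        edited_paragraph: str = Field(description="An improved version of the paragraph")
        feedback: str = Field(description="Constructive feedback on the original paragraph")

    # Assumed prompt asking the model to pick and improve one paragraph.
    third_prompt = ChatPromptTemplate.from_template(
        "Choose one paragraph from the article below, rewrite it to be clearer, "
        "and give feedback on the original.\n\nArticle:\n{article}"
    )

    # with_structured_output wraps the model call so the schema is enforced.
    structured_llm = creative_llm.with_structured_output(Paragraph)

    chain_three = {"article": lambda x: x["article"]} | third_prompt | structured_llm

    paragraph = chain_three.invoke({"article": article})
    print(paragraph.feedback)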
Now let's add in the final step to our chain, which just pulls the fields out of our paragraph object and extracts them into a dictionary. We don't necessarily need to do this; honestly, I kind of prefer it inside the paragraph object, but it shows how we would pass things on at the other side of the chain. So now we can see we've extracted that out, along with that interesting feedback again. Let's leave it there for the text part of this.

Now let's have a look at the multimodal features we can work with. This is maybe one of those areas that feels a bit more abstracted and a little complicated, where LangChain could maybe be improved. We're not going to focus too much on the multimodal stuff; we'll still be focusing on language, but I did want to show you this very quickly. We want this article to look better, so we want to generate a prompt based on the article itself that we can then pass to DALL-E, the image generation model from OpenAI, which will generate an image for us, like a thumbnail image. The first step is to get an LLM to generate that image prompt. So we have the prompt we're going to use for that: "Generate a prompt with less than 500 characters to generate an image based on the following article." Okay, so that's our prompt.
It's super simple, and we're using the generic PromptTemplate here; you could use the user prompt template instead, it's up to you. Then, based on what this outputs, we feed the result into a generate-and-display-image function via its image prompt parameter. That function uses the DALL-E API wrapper from LangChain (it runs the image prompt and essentially gives us a URL back), then reads that image URL with skimage to get the image data, and finally displays it. Pretty straightforward. Now, there is again an LCEL detail here: the RunnableLambda. When we want to run our own functions within LCEL, we need to wrap them in a RunnableLambda. I don't want to go too much into what it is doing here, because we cover that in the LCEL chapter; all you really need to know is that if we have a custom function, we wrap it in a RunnableLambda, and then what we get back can be used within the LCEL syntax.
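As a sketch of this image step (assuming langchain_community, scikit-image, and matplotlib are installed; the function name and prompt wording are illustrative, not the exact notebook code):

    # Hedged sketch of the image-generation chain using a RunnableLambda.
    from langchain_core.prompts import PromptTemplate
    from langchain_core.runnables import RunnableLambda
    from langchain_community.utilities.dalle_image_generator import DallEAPIWrapper
    from skimage import io
    import matplotlib.pyplot as plt

    image_prompt = PromptTemplate.from_template(
        "Generate a prompt with less than 500 characters to generate an image "
        "based on the following article: {article}"
    )

    def generate_and_display_image(image_prompt_text: str) -> None:
        # Run DALL-E with the generated prompt (uses the OpenAI key from the
        # environment) and display the image behind the returned URL.
        image_url = DallEAPIWrapper().run(image_prompt_text)
        image_data = io.imread(image_url)
        plt.imshow(image_data)
        plt.axis("off")
        plt.show()

    # Wrap the plain Python function so it can be used inside an LCEL chain.
    image_chain = (
        {"article": lambda x: x["article"]}
        | image_prompt
        | llm
        | (lambda msg: msg.content)
        | RunnableLambda(generate_and_display_image)
    )

    image_chain.invoke({"article": article})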
So what are we doing here? Let's figure it out. We take our original image prompt, which we defined just above and whose input variable is article; our article data is fed in and goes into that prompt. From there we get a message that we feed into our LLM, and the LLM generates an image prompt, i.e. a prompt for generating an image for this article. Let's print that out so we can see what it generates, because I'm also kind of curious. Then that content is fed into our runnable, which is basically the display function, and we'll see what it produces. Don't expect anything amazing from DALL-E; it's not the best, to be honest, but at least we see how to use it.

Okay, so we can see the prompt that was used: create an image that visually represents the concept of neuro-symbolic agents; depict a futuristic interface where a large language model interacts with traditional code, symbolizing the integration of (oh my gosh) some kind of computation; include elements like a brain to represent neural networks, gears or circuits for symbolic logic, and a web of connections illustrating the vast use cases of AI agents. A big prompt, and then we get this image. DALL-E is interesting, I would say. We could even take that same prompt and see what it comes up with in something like Midjourney; you get way cooler images from other image generation models, but this is pretty cool, honestly. So in terms of generating images, the prompt itself is actually pretty good; the image could be better.
But that's it. With all of that, we've seen a little introduction to what we might build with LangChain, and that's it for our introduction chapter. As I mentioned, we don't want to go too deep into what each of these things is doing; I just really wanted to focus on how we build something with LangChain and what the overall flow looks like. We don't want to focus too much yet on exactly what LCEL is doing, or exactly what these prompt objects are that we're setting up; we're going to focus much more on all of those things in the upcoming chapters. For now, we've just seen a little of what we can build before diving in in more detail.
Okay, so now we're going to take a look at AI observability using LangSmith. LangSmith is another piece of the broader LangChain ecosystem. Its focus is on allowing us to see what our LLMs, agents, and so on are actually doing, and it's something we would definitely recommend using if you are going to be using LangChain and LangGraph. Let's take a look at how we would set LangSmith up, which is incredibly simple. I'm going to open this chapter in Colab and install the prerequisites; you'll see these are all the same as before, but we now have the langsmith library as well. We're going to be using LangSmith throughout the course, so in all the following chapters we'll be importing LangSmith and it will be tracking everything we do. You don't need LangSmith to go through the course, it's an optional dependency, but as mentioned, I would recommend it.

The first thing we will need is a LangSmith API key. We do need an API key, but it comes with a reasonable free tier. You can see the plans here; the one we're on by default is free for one user with up to 5,000 traces per month. If you're building out an application, I think it's fairly easy to go beyond that, but it really depends on what you're building; it's a good place to start, and you can upgrade as required. So we go to smith.langchain.com. It logs me in automatically, and I have all of these tracing projects; they're all from me running the various chapters of the course, and if you use LangSmith throughout the course, your dashboard will end up looking something like this. What we need is an API key, so we go to Settings, then API keys, and create an API key. Because we're just doing some personal learning right now, I would go with a personal access token; you can give it a name or description if you want. We copy that, come over to our notebook, and enter the API key there. That is all we actually need to do; that's absolutely everything. The one thing to be aware of is that you should set your LangChain project name to whatever project you're working within. Within the course we have individual project names for each chapter, but for your own projects you should make sure it's something that you recognize and is useful to you.
actually does a lot without needing to do anything. So we 00:49:40.680 |
can actually go through, let's just initialize our LLM and 00:49:43.960 |
start invoking it and seeing what Langsmith returns to us. So 00:49:48.480 |
we'll need our OpenAI API key, enter it here. And then let's 00:49:53.560 |
just invoke hello. Okay, so nothing has changed on this end, 00:49:58.720 |
right? So it was running code, there's nothing different here. 00:50:01.320 |
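To make that concrete, here is a rough sketch of the setup in code; the project name is just a placeholder, and I'm assuming the standard LangSmith environment variables here:

```python
import os
from getpass import getpass

from langchain_openai import ChatOpenAI

# LangSmith tracing is switched on purely through environment variables.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass("LangSmith API key: ")
# Use a project name that you will recognise in the LangSmith dashboard.
os.environ["LANGCHAIN_PROJECT"] = "langchain-course-langsmith-openai"

# Nothing changes on the code side: we initialise and invoke the LLM as usual,
# and the trace simply appears in the LangSmith UI.
llm = ChatOpenAI(model="gpt-4o-mini", api_key=getpass("OpenAI API key: "))
llm.invoke("hello")
```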
However, now if we go to Langsmith, I'm going to go back 00:50:05.640 |
to my dashboard. Okay, and you can see that the order of 00:50:10.120 |
these projects just changed a little bit. And that's because 00:50:13.000 |
the most recently used project, this one at the top, Langchain 00:50:16.600 |
course Langsmith OpenAI, which is the current chapter we're in, 00:50:20.200 |
that was just triggered. So I can go into here, I can see, oh, 00:50:24.360 |
look at this. So we actually have something in the Langsmith 00:50:27.640 |
UI. And all we did was enter our Langchain API key. That's all we 00:50:31.720 |
did. And we set some environment variables. And that's it. So we 00:50:34.840 |
can actually click through to this and it will give us more 00:50:36.640 |
information. So you can see what was the input, what was the 00:50:40.440 |
output, and some other metadata here. You see, you know, there's 00:50:45.640 |
not that much in here. However, when we do the same for agents, 00:50:50.840 |
we'll get a lot more information. So I can even show 00:50:54.360 |
you a quick example from the future chapters. If we come 00:50:59.120 |
through to agents intro here, for example. And we just take a 00:51:04.040 |
look at one of these. Okay, so we have this input and output, 00:51:08.440 |
but then on the left here, we get all of this information. And 00:51:11.800 |
the reason we get all this information is because agents 00:51:14.200 |
are performing multiple LLM calls, etc, etc. So there's a 00:51:18.800 |
lot more going on. So you can see, okay, what was the first 00:51:21.880 |
LLM call, and then we get these tool use traces, we get another 00:51:26.120 |
LLM call, another tool use and another LLM call. So you can see 00:51:30.200 |
all this information, which is incredibly useful and incredibly 00:51:33.600 |
easy to do. Because all I did when setting this up in that 00:51:37.120 |
agent chapter was simply set the API key and the environment 00:51:41.120 |
variables as we have done just now. So you get a lot out of a 00:51:46.040 |
very little effort with Langsmith, which is great. So 00:51:49.120 |
let's return to our Langsmith project here. And let's invoke 00:51:53.040 |
some more. Now I've already shown you, you know, we're going 00:51:56.480 |
to see a lot of things just by default. But we can also add 00:51:59.760 |
other things that Langsmith wouldn't typically trace. So to 00:52:05.080 |
do that, we will just import a traceable decorator from 00:52:08.280 |
Langsmith. And then let's make these just random functions 00:52:13.600 |
traceable within Langsmith. Okay, so we run those, we have 00:52:19.000 |
three here. So we're going to generate a random number, we're 00:52:22.600 |
going to modify how long a function takes and also generate 00:52:27.960 |
a random number. And then in this one, we're going to either 00:52:31.720 |
return this no error, or we're going to raise an error. So 00:52:36.200 |
we're going to see how Langsmith handles these 00:52:38.880 |
different scenarios. So let's just iterate through and run 00:52:43.160 |
those a few times. So it's going to run each one of those 10 00:52:46.280 |
times. Okay, so let's see what happens. So they're running, 00:52:52.040 |
let's go over to our Langsmith UI and see what is happening 00:52:55.840 |
over here. So we can see that everything is updating, we're 00:52:58.640 |
adding that information through. And we can see if we go into a 00:53:01.600 |
couple of these, we can see a little more information. So the 00:53:04.520 |
input and the output took three seconds. See random error here. 00:53:11.200 |
In this scenario, random error passed without any issues. Let 00:53:15.480 |
me just refresh the page quickly. Okay, so now we have 00:53:20.200 |
the rest of the information. And we can see that occasionally, 00:53:23.840 |
if there is an error from our random error function, it is 00:53:26.800 |
signified with this. And we can see the traceback as well that 00:53:31.520 |
was returned there, which is useful. Okay, so we can see if 00:53:34.200 |
an error has been raised, we have to see what that error is. 00:53:37.400 |
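For reference, a minimal sketch of the kind of traceable functions being described; the function names and bodies here are stand-ins, only the @traceable decorator itself is the actual LangSmith API:

```python
import random
import time

from langsmith import traceable


@traceable  # anything decorated like this shows up as a run in LangSmith
def generate_random_number() -> int:
    return random.randint(0, 100)


@traceable
def generate_string_delay(input_str: str) -> str:
    # sleep for a random amount of time so we see varying latencies in the UI
    time.sleep(random.randint(1, 5))
    return f"{input_str} delayed"


@traceable
def random_error() -> str:
    # roughly half the time this raises, so we can see how errors are traced
    if random.random() < 0.5:
        raise ValueError("random error")
    return "no error"


for _ in range(10):
    generate_random_number()
    generate_string_delay("hello")
    try:
        random_error()
    except ValueError:
        pass
```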
We can see the various latencies of these functions. So you can 00:53:42.600 |
see that varying throughout here. We see all the inputs to 00:53:47.640 |
each one of our functions, and then of course the outputs. So 00:53:51.600 |
we can see a lot in there, which is pretty good. Now, another 00:53:55.800 |
thing that we can do is we can actually filter. So if we come 00:53:59.920 |
to here, we can add a filter. Let's filter for errors. That 00:54:04.760 |
would be value error. And then we just get all of the cases 00:54:09.240 |
where one of our functions has returned or raised an error or 00:54:13.240 |
value error specifically. Okay, so that's useful. And then 00:54:17.360 |
yeah, there's various other filters that we can add there. 00:54:21.160 |
So we could add a name, for example, if we wanted to look 00:54:24.640 |
for the generate string delay function only, we could also do 00:54:30.560 |
that. Okay, and then we can see the varying latencies of that 00:54:34.880 |
function as well. Cool. So we have that. Now, one final thing 00:54:40.760 |
that we might want to do is maybe we want to make those 00:54:43.680 |
function names a bit more descriptive or easy to search 00:54:47.920 |
for, for example. And we can do that by setting the name of the 00:54:51.200 |
traceable decorator, like so. So let's run that. Run this a few 00:54:56.120 |
times. And then let's jump over to Langsmith again, go into 00:55:01.160 |
Langsmith project. Okay, and you can see those coming through as 00:55:04.200 |
well. So then we could also search for those based on that 00:55:07.560 |
new name. So what was it, chit chat maker, like so. And then 00:55:12.040 |
we can see all the information being streamed through to 00:55:16.560 |
Langsmith. So that is our introduction to Langsmith. There 00:55:21.160 |
is really not all that much to go through here. It's very easy 00:55:25.200 |
to set up. And as we've seen, it gives us a lot of 00:55:27.640 |
observability into what we are building. And we will be using 00:55:32.880 |
this throughout the course, we don't rely on it too much. It's 00:55:35.600 |
a completely optional dependency. So if you don't want 00:55:38.000 |
to use Langsmith, you don't need to, but it's there and I would 00:55:40.560 |
recommend doing so. So that's it for this chapter, we'll move on 00:55:43.800 |
to the next one. Now we're going to move on to the chapter on 00:55:48.560 |
prompts in Langchain. Now, prompts, they seem like a simple 00:55:53.040 |
concept, and they are a simple concept, but there's actually 00:55:55.320 |
quite a lot to them when you start diving into them. And they 00:55:59.720 |
truly have been a very fundamental part of what has 00:56:04.480 |
propelled us forwards from pre LLM times to the current LLM 00:56:09.360 |
times. You have to think until LLMs became widespread, the way 00:56:14.520 |
to fine tune an AI model or ML model back then was to get loads 00:56:22.720 |
of data for your particular use case, and spend a load of time training 00:56:26.840 |
your specific transformer or part of the transformer to 00:56:30.960 |
essentially adapt it for that particular task. That could take 00:56:35.120 |
a long time. Depending on the task, it could take you months 00:56:40.840 |
or sometimes, if it was a simpler task, it might take 00:56:44.480 |
probably days, potentially weeks. Now, the interesting 00:56:48.720 |
thing with LLMs is that rather than needing to go through this 00:56:53.960 |
whole fine tuning process to modify a model for one task over 00:57:00.520 |
another task, rather than doing that, we just prompt it 00:57:03.400 |
differently, we literally tell the model, hey, I want you to do 00:57:07.360 |
this in this particular way. And that is a paradigm shift in what 00:57:12.480 |
you're doing is so much faster, it's going to take you, you 00:57:15.600 |
know, a couple of minutes, rather than days, weeks, or 00:57:18.400 |
months. And LLMs are incredibly powerful when it comes to just 00:57:23.200 |
generalizing to, you know, across these many different 00:57:26.200 |
tasks. So prompts, which control those instructions are a 00:57:31.480 |
fundamental part of that. Now, LangChain naturally has many 00:57:36.560 |
functionalities around prompts. And we can build very dynamic 00:57:40.320 |
prompting pipelines that modify the structure and content of 00:57:44.360 |
what we're actually feeding into our LLM, depending on different 00:57:47.800 |
variables, different inputs. And we'll see that in this chapter. 00:57:51.920 |
So we're going to work through prompting within the scope of a 00:57:57.160 |
RAG example. So let's start by just dissecting the various 00:58:01.840 |
parts of a prompt that we might expect to see for a use case 00:58:06.040 |
like RAG. So our typical prompt for RAG, or retrieval 00:58:11.200 |
augmented generation, will include rules for the LLM. And 00:58:15.960 |
this you will see in most prompts, if not all: this 00:58:21.440 |
part of the prompt sets up the behavior of the LLM. That is how 00:58:26.840 |
it should be responding to user queries, what sort of 00:58:30.560 |
personality it should be taking on, what it should be focusing on 00:58:34.360 |
when it is responding, and any particular rules or boundaries 00:58:37.800 |
that we want to set. And really, what we're trying to do here is 00:58:42.240 |
just to simply provide as much information as possible to the 00:58:47.200 |
LLM about what we're doing, we just want to give the LLM 00:58:53.480 |
context as to the place that it finds itself in. Because an LLM 00:58:59.200 |
has no idea where it is, it just takes in some 00:59:02.840 |
information and spits out information. If the only 00:59:05.800 |
information it receives is from the user's query, 00:59:08.680 |
it, you know, doesn't know the context: what is the 00:59:12.840 |
application that it is within? What is its objective? What is its 00:59:16.880 |
aim? What are the boundaries? All of this, we need to just 00:59:21.400 |
assume the LLM has absolutely no idea about because it truly 00:59:26.360 |
does not. So as much context as we can provide, but it's 00:59:32.280 |
important that we don't overdo it. It's, we see this all the 00:59:36.040 |
time, people will over prompt an LLM, you want to be concise, 00:59:40.320 |
you don't want fluff. And in general, every single part of 00:59:44.280 |
your prompt, the more concise and less fluffy, you can make it 00:59:47.760 |
the better. Now, those rules or instructions are typically in 00:59:51.560 |
the system prompt of your LLM. Now, the second one is context, 00:59:55.800 |
which is RAG specific. The context refers to some sort of 00:59:59.960 |
external information that you're feeding into your LLM. We may 01:00:04.920 |
have received this information from web search, database query 01:00:09.600 |
or quite often in this case of RAG, it's a vector database. 01:00:14.000 |
This external information that we provide is essentially the 01:00:19.120 |
RA, the retrieval augmentation, of RAG. We are augmenting the 01:00:25.880 |
knowledge of our LLM, which the knowledge of our LLM is 01:00:29.720 |
contained within the LLM model weights. We're augmenting that 01:00:33.600 |
knowledge with some external knowledge. That's what we're 01:00:36.520 |
doing here. Now for chat LLMs, this context is typically 01:00:43.320 |
placed within a conversational context within the user or 01:00:48.720 |
assistant messages. And with more recent models, it can also 01:00:54.320 |
be placed within tool messages as well. Then we have 01:00:58.760 |
the questions, pretty straightforward. This is the 01:01:01.560 |
query from the user. This is more, it's usually a user 01:01:06.680 |
message, of course. There might be some additional formatting 01:01:10.960 |
around this, you might add a little bit of extra context, or 01:01:14.680 |
you might add some additional instructions. If you find that 01:01:18.240 |
your LLM sometimes veers off the rules that you've set within 01:01:21.760 |
the system prompt, you might append or prefix something here. 01:01:26.520 |
But for the most part, it's probably just going to be the 01:01:28.600 |
user's input. And finally, so these are all the inputs for our 01:01:33.800 |
prompt here is going to be the output that we get. So the 01:01:37.760 |
answer from the assistant. Again, I mean, that's not even 01:01:41.480 |
specific to RAG, it's just what you would expect in a chat LLM 01:01:45.680 |
or any LLM. And of course, that would be an assistant message. 01:01:49.600 |
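So, roughly, the skeleton we're describing looks like this (the wording is a paraphrase of the prompt we'll see in a moment):

```python
# 1) Rules / instructions: usually the system prompt.
# 2) Context: the external (retrieved) information, RAG-specific.
# 3) Question: the user's query.
# 4) Answer: the assistant's generated response.

system_prompt = """Answer the user's query based on the context below.
If you cannot answer the question using the provided information,
answer with "I don't know".

Context: {context}"""

user_prompt = "{query}"
# ...and the answer comes back as an assistant (AI) message.
```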
So putting all of that together in an actual prompt, so you can 01:01:53.440 |
see everything we have here. So we have the rules for our 01:01:57.320 |
prompt here, the instructions, we're just saying, okay, answer 01:02:00.360 |
the question based on the context below. If you cannot 01:02:02.440 |
answer the question using the information, answer with "I don't 01:02:05.680 |
know". Then we have some context here. Okay, in this scenario, 01:02:11.200 |
that context that we're feeding in here, because it's the first 01:02:14.680 |
message, we might put that into the system prompt. But that may 01:02:18.160 |
also be turned around. Okay, if you, for example, have an 01:02:21.640 |
agent, you might have your question up here before the 01:02:25.760 |
context. And then that would be coming from a user message. And 01:02:30.000 |
then this context would follow the question and be recognized 01:02:34.600 |
as a tool message, it would be fed in that way as well. It 01:02:38.920 |
depends on what sort of structure you're going for there. 01:02:41.520 |
But you can do either: you can feed it into the system message 01:02:43.960 |
if it's less conversational, whereas if it's more 01:02:47.920 |
conversational, you might feed it in as a tool message. Okay, 01:02:50.760 |
and then we have a user query, which is here. And then we'd 01:02:54.160 |
have the AI answer. Okay, and obviously, that would be 01:02:57.120 |
generated here. Okay, so let's switch across to the code. We're 01:03:01.520 |
in the LangChain course repo, notebooks, 03 prompts, 01:03:05.320 |
I'm just going to open this in Colab. Okay, scroll down, and 01:03:09.280 |
we'll start just by installing the prerequisites. Okay, so we 01:03:13.120 |
just have the various libraries, again, as I mentioned before, 01:03:16.360 |
langsmith is optional, you don't need to install it. But if you 01:03:19.360 |
would like to see your traces and everything in langsmith, 01:03:22.560 |
then I would recommend doing that. And if you are using 01:03:25.680 |
langsmith, you will need to enter your API key here. Again, 01:03:29.760 |
if you're not using langsmith, you don't need to enter 01:03:32.000 |
anything here, you just skip that cell. Okay, cool. And let's 01:03:36.160 |
jump into the basic prompting then. So we're going to start 01:03:41.080 |
with this prompt: answer the user's query based on the context 01:03:43.600 |
below. So we're just structuring what we just saw in code. And 01:03:49.200 |
we're going to be using the chat prompt template, because 01:03:52.480 |
generally speaking, we're using chat LLMs in most cases 01:03:57.720 |
nowadays. So we have our chat prompt template, and that is 01:04:01.760 |
going to contain a list of messages, system message to 01:04:05.440 |
begin with, which is just going to contain this. And we're 01:04:08.800 |
feeding in the context within that there. And we have our 01:04:13.640 |
user query here. Okay. So we'll run this. And if we take a look 01:04:20.920 |
here, we haven't specified what our input variables are, okay. 01:04:26.400 |
But we can see that we have query. And we have context up 01:04:31.680 |
here, right? So we can see that, okay, these are the input 01:04:34.320 |
variables, we just haven't explicitly defined them here. So 01:04:39.160 |
let's just confirm with this, that LangChain did pick those 01:04:44.040 |
up. And we can see that it did. So it has context and query as 01:04:46.720 |
our input variables for the prompt template that we just 01:04:50.560 |
defined. Okay, so we can also see the structure of our 01:04:55.280 |
templates. Let's have a look. Okay, so we can see that within 01:05:00.760 |
messages here, we have a system message prompt template, the way 01:05:05.160 |
that we define this, you can see here that we have from messages 01:05:08.160 |
and this will consume various different structures. So you can 01:05:14.680 |
see here that, for messages, it takes a sequence of 01:05:19.760 |
message-like representations. So we could pass in a system prompt 01:05:24.240 |
template object, and then a user prompt template object. Or we 01:05:30.600 |
can just use a tuple like this. And this actually defines okay, 01:05:33.920 |
the system, this is a user, and you could also do assistant or 01:05:38.360 |
tool messages and stuff here as well using the same structure. 01:05:42.280 |
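As a minimal sketch of that tuple form (the prompt wording mirrors what we set up above):

```python
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_messages([
    # (role, template) tuples: roles such as "system", "user"/"human", "ai"
    ("system", (
        "Answer the user's query based on the context below. If you cannot "
        "answer the question using the provided information, answer with "
        '"I don\'t know".\n\nContext: {context}'
    )),
    ("user", "{query}"),
])

# LangChain infers the input variables from the {placeholders} in the templates.
print(prompt_template.input_variables)  # ['context', 'query']
```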
And then we can look in here. And of course, that is being 01:05:45.880 |
translated into the system message prompt template and 01:05:50.080 |
human message prompt template. Okay. We have our input 01:05:54.680 |
variables in there. And we have the template too. Okay. Now, 01:05:59.880 |
let's continue. We'll see here what I just said, so we're 01:06:05.400 |
importing our system message prompt template and human 01:06:08.240 |
message prompt template. And you can see we're using the same 01:06:11.200 |
from messages method here. Right? And you can see so 01:06:15.520 |
sequence of message like representation. It's just, you 01:06:19.440 |
know, what that actually means. It can vary, right? So here we 01:06:23.160 |
have system message prompt template from template, prompt 01:06:25.880 |
here from template query, you know, there's various ways that 01:06:28.600 |
you might want to do this, it just depends on how explicit you 01:06:32.960 |
want to be. Generally speaking, I think, for myself, I would 01:06:38.960 |
prefer that we stick with the objects themselves, and be 01:06:43.400 |
explicit. But it is definitely a little harder to parse when 01:06:46.960 |
you're reading this. So I understand why you might 01:06:50.520 |
also prefer this: it's definitely cleaner, and it 01:06:53.560 |
does look simpler. So it just depends, I suppose, on 01:06:58.480 |
preference. Okay. So you see, again, this is exactly the same. 01:07:05.640 |
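For comparison, a sketch of the more explicit version using the message prompt template objects directly, equivalent to the tuples above:

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

system_prompt = SystemMessagePromptTemplate.from_template(
    "Answer the user's query based on the context below. If you cannot "
    "answer the question using the provided information, answer with "
    '"I don\'t know".\n\nContext: {context}'
)
user_prompt = HumanMessagePromptTemplate.from_template("{query}")

# from_messages accepts these objects just as happily as the tuples above
prompt_template = ChatPromptTemplate.from_messages([system_prompt, user_prompt])
```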
Okay, it's our chat prompt template, and it contains this 01:07:08.600 |
and this. Okay. You probably want to see the exact output. So 01:07:14.080 |
it was messages. Okay, exactly the same as what I put before. 01:07:19.880 |
Cool. So we have all that. Let's see how we would invoke our LLM 01:07:25.800 |
with these. We're going to be using GPT-4o mini again, we do 01:07:30.280 |
need our API key. So enter that. And we'll just initialize our 01:07:37.280 |
LLM, we are going with a low temperature here. So less 01:07:41.120 |
randomness, or less creativity. And in many cases, this is 01:07:46.840 |
actually what I would be doing. The reason in this scenario that 01:07:51.400 |
we're going with low temperature is we're doing rag. And if you 01:07:55.680 |
remember, before we scroll up a little bit here, our template 01:07:59.000 |
says, answer the user's query based on the context below. If 01:08:01.680 |
you cannot answer the question using the provided 01:08:04.680 |
information, answer with "I don't know", right. So just from 01:08:09.760 |
reading that we know that we want our LLM to be as truthful 01:08:15.320 |
and accurate as possible. So a more creative LLM is going to 01:08:19.720 |
struggle with that and is more likely to hallucinate. Whereas a 01:08:25.080 |
low creativity or low temperature LLM will probably 01:08:29.160 |
stick with the rules a little better. So again, it depends on 01:08:32.320 |
your use case. You know, if you're creative writing, you 01:08:35.120 |
might want to go with a higher temperature there. But for 01:08:38.440 |
things like rag, where the information being output should 01:08:42.120 |
be accurate, and truthful. It's important, I think that we keep 01:08:47.600 |
temperature low. Okay. I talked about that a little bit here. So 01:08:51.840 |
of course, lower temperature zero makes the LLMs output more 01:08:56.000 |
deterministic, which in theory should lead to less 01:08:59.040 |
hallucination. Okay, so we're gonna go with LCEL again here. 01:09:03.240 |
This is, for those of you that used LangChain in the past, 01:09:06.480 |
equivalent to an LLMChain object. So our prompt template 01:09:10.840 |
is being fed into our LLM. Okay. And now we have this 01:09:16.800 |
pipeline. Now let's see how we would use that pipeline. So 01:09:22.120 |
gonna get some, create some context here. So this is some 01:09:27.160 |
context around Aurelio AI. It mentions that we built the semantic 01:09:32.960 |
router, semantic chunkers, an AI platform, and development 01:09:38.800 |
services. We mention, I think we specifically outline this 01:09:43.960 |
later on in the example, the LangChain experts bit, a little piece 01:09:47.160 |
of information. Now, most LLMs would have not been trained on 01:09:51.920 |
the recent internet. So the fact that this came in September 01:09:55.680 |
2024, is relatively recent. So a lot of LLMs out of the box, you 01:10:00.400 |
wouldn't expect them to know that. So that is a good little 01:10:05.320 |
bit of information to ask you about. So we invoke, we have our 01:10:08.880 |
query. So what do we do? And we have that context. Okay, so 01:10:13.320 |
we're feeding that into that pipeline that we defined here. 01:10:16.120 |
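As a sketch, the whole pipeline and the invoke call look roughly like this; the context string is a shortened paraphrase of the one in the notebook, and the query wording is mine:

```python
from getpass import getpass

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.0,
    api_key=getpass("OpenAI API key: "),
)

# LCEL: the prompt's output is piped straight into the LLM
# (the modern equivalent of the old LLMChain).
pipeline = prompt_template | llm

context = (
    "Aurelio AI is an AI company that has built the semantic router and "
    "semantic chunkers open source libraries, an AI platform, and offers "
    "development services. The team became LangChain Experts in September 2024."
)

result = pipeline.invoke({"query": "What do Aurelio AI do?", "context": context})
print(result.content)

# An equivalent, more explicit form maps the input dict onto the prompt variables:
pipeline = (
    {
        "query": lambda x: x["query"],
        "context": lambda x: x["context"],
    }
    | prompt_template
    | llm
)
```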
Alright, so when we invoke that is automatically going to take 01:10:19.920 |
query and context and actually feed it into our prompt 01:10:23.800 |
template. Okay. If we want to, we can also be a little more 01:10:30.040 |
explicit. So you probably see me doing this throughout the 01:10:34.280 |
course. Because I do like to be explicit with everything, to be 01:10:39.040 |
honest. And you'll probably see me doing this. Okay, and this is 01:10:49.640 |
doing the same thing. Well, you'll see it will in the 01:10:53.240 |
moment. This is doing the exact same thing. Again, this is just 01:10:57.800 |
an LCEL thing. So all I'm doing in this scenario is I'm saying, 01:11:04.760 |
okay, take that from the dictionary query. And then also 01:11:10.160 |
take from that input dictionary, the context key. Okay, so this 01:11:19.000 |
is doing the exact same thing. The reason that we might want to 01:11:22.240 |
write this is mainly for clarity, to be honest, just to be 01:11:26.520 |
explicit and say, okay, these are the inputs, because otherwise, 01:11:29.240 |
we don't really have them in the code other than within our 01:11:33.360 |
original prompts up here, which is not super clear. So I think 01:11:39.400 |
it's usually a good idea to just be more explicit with these 01:11:41.720 |
things. And of course, if you decide you're going to modify 01:11:45.160 |
things a little bit, let's say you modify this input down the 01:11:48.880 |
line, you can still feed in the same input here, you're just 01:11:52.240 |
mapping it between different keys, essentially. Or if you 01:11:56.040 |
would like to just modify that, you need to lowercase it on the 01:11:59.720 |
way in or something, you can do. So you have that, I'll just 01:12:06.200 |
redefine that, actually. And we'll invoke again. Okay, we see 01:12:13.440 |
that it does the exact same thing. Okay, so ready. So this 01:12:17.600 |
is an AI message just generated by the LLM. Okay, expertise in 01:12:22.440 |
building AI agents, several open source frameworks, router, AI 01:12:27.400 |
platform. Okay, right. So it provided 01:12:32.840 |
everything other than the LangChain experts thing, it 01:12:35.280 |
didn't mention that. But we will, yeah, we'll test it later 01:12:39.080 |
on that. Okay, so on to few-shot prompting. This is a specific 01:12:43.040 |
prompting technique. Now, many state-of-the-art, or SOTA, LLMs 01:12:48.440 |
are very good at instruction following. So you'll find that 01:12:52.400 |
few-shot prompting is less common now than it used to be, 01:12:56.240 |
at least for these bigger, more state-of-the-art models. 01:13:00.480 |
But when you start using smaller models, which is not really what we can 01:13:05.240 |
use here, but let's say you're using an open source model like Llama 01:13:09.400 |
3 or Llama 2, which is much smaller, you will probably 01:13:15.080 |
need to consider things like few shot prompting. Although that 01:13:18.920 |
being said, with open AI models, at least the current open AI 01:13:24.440 |
models, this is not so important. Nonetheless, it can 01:13:27.920 |
be useful. So the idea behind few-shot prompting is that you are 01:13:31.880 |
providing a few examples to your LLM of how it should behave 01:13:36.760 |
before you are actually going into the main part of the 01:13:42.520 |
conversation. So let's see how that would look. So we create an 01:13:46.800 |
example prompt. So we have our human and AI. So human input AI 01:13:51.520 |
response. So we're basically setting up okay, this with this 01:13:54.760 |
type of input, you should provide this type of output. 01:13:57.960 |
That's what we're doing here. And we're just going to provide 01:14:01.760 |
some examples. Okay, so we have our input, here's query one, 01:14:05.880 |
here's the answer one, right? This is just, I just want to show 01:14:09.680 |
you how it works. This is not what we'd actually feed into our 01:14:12.680 |
LLM. Then, with both these examples and our example prompt, we 01:14:16.960 |
would feed both of these into LangChain's few-shot chat 01:14:21.680 |
message prompt template. Okay. And well, you'll see what we get 01:14:26.720 |
out of it. Okay, so basically it formats everything and 01:14:30.480 |
structures everything for us. Okay. And using this, of course, 01:14:35.920 |
it depends on let's say you see that your user is talking about 01:14:42.280 |
a particular topic. And you would like to guide your LLM to 01:14:47.240 |
talk about that particular topic in a particular way. Right. So 01:14:50.760 |
you could identify that the user is talking about that topic, 01:14:53.840 |
either like a keyword match or a semantic similarity match. And 01:14:58.080 |
based on that, you might want to modify these examples that you 01:15:01.240 |
feed into your few shot chat message prompt template. And 01:15:06.080 |
then obviously, for that could be what you do with topic A for 01:15:08.960 |
topic B, you might have another set of examples that you feed 01:15:12.120 |
into this. All this time, your example prompt is remaining the 01:15:15.800 |
same, but you're just modifying the examples that are going in 01:15:18.480 |
so that they're more relevant to whatever it is your user is 01:15:21.520 |
actually talking about. So that can be useful. Let's see an 01:15:25.360 |
example of that. So when we are using a tiny LLM, its ability 01:15:29.800 |
would be limited, although I think we were probably fine 01:15:33.160 |
here. We're going to say, answer the user query based on the 01:15:36.760 |
context below. Always answer in markdown format, you know, being 01:15:40.120 |
very specific, this is our system prompt. Okay, that's 01:15:44.320 |
nice. But what we've kind of said here is, okay, always 01:15:48.200 |
answer in markdown format. But when doing so, please 01:15:53.440 |
provide headers, short summaries, and follow bullet 01:15:55.920 |
points, then conclude. Okay, so you see this here, okay, so we 01:16:01.560 |
get this overview of array, you have this and this is actually 01:16:05.160 |
quite good. But if we come down here, what I specifically want 01:16:09.800 |
is to always follow this structure. Alright, so we have 01:16:13.880 |
the double header for the topic, summary, header, a couple of 01:16:20.120 |
bullet points. And then I always want to follow this pattern 01:16:22.320 |
where it's like to conclude, always, it's always bold. You 01:16:26.120 |
know, I want to be very specific on what I want. And to be, you 01:16:30.400 |
know, fully honest, with GPT-4o mini, you can actually just 01:16:35.200 |
prompt most of this in. But for the sake of the example, we're 01:16:38.560 |
going to provide a few examples in a few-shot prompt 01:16:43.760 |
instead to get this. So we're going to provide one 01:16:46.920 |
example here. Second example here. And you'll see we're just 01:16:51.360 |
following that same pattern, we're just setting up the 01:16:53.160 |
pattern that the LM should use. So we're going to set that up 01:16:58.400 |
here, we have our main header, a little summary, some sub 01:17:03.720 |
headers, bullet points, sub header, bullet points, bullet 01:17:06.240 |
points to conclude, so on and so on. Same with this one here. 01:17:09.640 |
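A minimal sketch of how those pieces fit together; the system prompt wording and the example content here are stand-ins for the fuller versions in the notebook:

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)

new_system_prompt = (
    "Answer the user's query based on the context below. Always answer in "
    "markdown format, with headers, short summaries, and bullet points, and "
    "finish with a bold 'To conclude' line.\n\nContext: {context}"
)

# One human/AI pair defines the *shape* of each example.
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

# The examples themselves set up the markdown pattern we want the LLM to imitate.
examples = [
    {
        "input": "Can you explain gravity?",
        "output": "## Gravity\n\nGravity is ...\n\n**To conclude**, gravity ...",
    },
    {
        "input": "How do plants make energy?",
        "output": "## Photosynthesis\n\nPlants convert ...\n\n**To conclude**, plants ...",
    },
]

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# The few-shot block slots in between the system prompt and the live user query.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", new_system_prompt),
    few_shot_prompt,
    ("user", "{query}"),
])
```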
Okay. And let's see what we got. Okay, so this is the structure 01:17:20.000 |
of our new few-shot prompt template. You can see what all 01:17:24.800 |
this looks like. Let's come down and we're going to do, we're 01:17:28.840 |
basically going to insert that directly into our chat prompt 01:17:32.280 |
template. So we have from messages, system prompt, user 01:17:37.600 |
prompt, and then we have in there, these, so let me actually 01:17:42.960 |
show you very quickly. Right, so we just have this few-shot 01:17:48.720 |
chat message prompt template, which will be fed into the 01:17:51.320 |
middle here, run that, and then feed all this back into our 01:17:54.840 |
pipeline. Okay, and this will, you know, modify the structure 01:17:58.440 |
so that we have that bold to conclude at the end here. Okay, 01:18:01.880 |
you can see nicely here. So we get a bit more of that, the 01:18:05.880 |
exact structure that we were getting again with GPT 4.0 01:18:10.160 |
models and many other OpenAI models, you don't really need to 01:18:14.120 |
do this, but you will see it in other examples. We do have an 01:18:17.600 |
example of this where we're using a Llama and we're using, I 01:18:21.760 |
think Llama 2, if I'm not wrong. And you can see that adding this 01:18:26.680 |
few-shot prompt template is actually a very good way of 01:18:31.280 |
getting those smaller, less capable models to follow your 01:18:34.600 |
instructions. So this is really, when you're working with a 01:18:38.000 |
smaller LLMs, this can be super useful, but even for SOTA models 01:18:41.360 |
like GPT-4o, if you do find that you're struggling with the 01:18:45.640 |
prompting, it's just not quite following exactly what you want 01:18:48.520 |
it to do. This is a very good technique for actually getting 01:18:53.240 |
it to follow a very strict structure or behavior. Okay, so 01:18:57.200 |
moving on, we have chain of thought prompting. So this is a 01:19:01.720 |
more common prompting technique that encourages the LLM to 01:19:06.320 |
think through its reasoning or its thoughts step by step. So 01:19:11.480 |
it's a chain of thought. The idea behind this is like, okay, 01:19:15.040 |
in math class, when you're a kid, the teachers would always 01:19:19.280 |
push you to put down your working out, right? And there's 01:19:24.400 |
multiple reasons for that. One of them is to get you to think 01:19:26.960 |
because they know in a lot of cases, actually, you know, 01:19:29.400 |
you're a kid and you're in a rush and you don't really care 01:19:31.400 |
about this test. And the, you know, they're just trying to get 01:19:35.680 |
you to slow down a little bit, and actually put down your 01:19:39.360 |
reasoning. And that kind of forced you to think, oh, 01:19:41.280 |
actually, I'm skipping a little bit in my head, because I'm 01:19:44.320 |
trying to just do everything up here. If I write it down, all 01:19:47.480 |
of a sudden, it's like, Oh, actually, I'm, yeah, I need to 01:19:50.720 |
actually do that slightly differently, you realize, okay, 01:19:53.280 |
you're probably rushing a little bit. Now, I'm not saying an LLM 01:19:55.960 |
is rushing, but it's a similar effect by an LLM writing 01:19:58.920 |
everything down, they tend to actually get things right more 01:20:03.880 |
frequently. And at the same time, also similar to when 01:20:07.720 |
you're a child and a teacher is reviewing your exam work by 01:20:11.360 |
having the LLM write down its reasoning, you as a human 01:20:15.920 |
or engineer, you can see where the LLM went wrong, if it did 01:20:20.200 |
go wrong, which can be very useful when you're trying to 01:20:22.480 |
diagnose problems. So with chain of thought, we should see 01:20:26.240 |
fewer hallucinations, and generally better performance. Now 01:20:30.360 |
to implement chain of thought in LangChain, there's no 01:20:32.320 |
specific LangChain object that does that. Instead, it's 01:20:35.800 |
just prompting. Okay, so let's go down and just see how 01:20:39.320 |
we might do that. Okay, so be helpful assistant answer the 01:20:42.960 |
user question, you must answer the question directly without 01:20:46.200 |
any other text or explanation. Okay, so that's our no chain of 01:20:50.520 |
thought system prompt. I will just note here, especially with 01:20:53.840 |
OpenAI. Again, this is one of those things where you'll see 01:20:57.040 |
it more with the smaller models. Most LLMs are actually trained 01:21:00.120 |
to use chain of thought prompting by default. So we're 01:21:03.120 |
actually specifically telling it here, you must answer the 01:21:05.880 |
question directly without any other text or explanation. Okay, 01:21:09.800 |
so we're actually kind of reverse prompting it to not use 01:21:13.000 |
chain of thought. Otherwise, by default, it actually will try 01:21:17.000 |
and do that because it's been trained to. That's how that's 01:21:19.600 |
how relevant chain of thought is. Okay, so I'm going to say 01:21:23.280 |
how many keystrokes you need to type in, type the numbers from 01:21:26.640 |
one to 500. Okay, we set up our like LLM chain pipeline. And 01:21:32.720 |
we're going to just invoke our query. And we'll see what we 01:21:35.760 |
get. Total number of keystrokes needed to type numbers from one 01:21:40.520 |
to 500 is 1511. The actual answer, as I've written here, is 01:21:47.280 |
1392. Without chain of thought, it is hallucinating. Okay, now let's 01:21:52.720 |
go ahead and see okay with chain of thought prompting, what does 01:21:55.920 |
it do? So be helpful assistant answer users question. To answer 01:22:00.480 |
the question, you must list systematically and in precise 01:22:04.160 |
detail all sub problems that are needed to be solved to answer 01:22:07.600 |
the question. Solve each sub problem individually, you have 01:22:11.720 |
to shout at the LLM sometimes to get them to listen. And in 01:22:14.920 |
sequence. Finally, use everything you've worked 01:22:18.120 |
through to provide the final answer. Okay, so we're getting 01:22:20.480 |
it we're forcing it to kind of go through the full problem 01:22:24.320 |
there. We can remove that. So run that. Again, I don't know 01:22:29.720 |
why we have context there. I'll remove that. And let's see. You 01:22:37.040 |
can see straightaway, that's taking a lot longer to generate 01:22:40.640 |
the output. That's because it's generating so many more tokens. 01:22:43.000 |
So that's just one drawback of this. But let's see what we 01:22:46.320 |
have. So, to determine how many keystrokes to type those numbers, 01:22:50.200 |
it is breaking it down into several sub problems: count the number of 01:22:54.080 |
digits from one to nine, from 10 to 99, and so on, and count the digits in the number 01:22:59.920 |
500. Okay, interesting. So that's how it's breaking it up. 01:23:04.040 |
Then it sums the digit counts from the previous steps. So we go 01:23:07.720 |
through the total digits. And we see this, okay, nine digits for 01:23:12.680 |
this range, 180 for here, 1200 for here. And then, of course, three 01:23:20.480 |
here. So it sums all those digits and actually comes 01:23:25.600 |
to the right answer. Okay, so that is, you know, that's 01:23:29.200 |
the difference with chain of thought versus without. So 01:23:32.960 |
without it, we just get the wrong answer, basically 01:23:35.800 |
guessing. With chain of thought, we get the right answer just by 01:23:40.480 |
the LLM writing down its reasoning and breaking the 01:23:43.720 |
problem down into multiple parts, which is, I found that 01:23:47.160 |
super interesting that it does that. So that's pretty cool. 01:23:52.080 |
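The ground truth is easy to check for ourselves, which makes a nice sanity check on the model's working:

```python
# Digits needed to type the numbers 1 to 500:
#   1-9:       9 numbers x 1 digit  =    9
#   10-99:    90 numbers x 2 digits =  180
#   100-499: 400 numbers x 3 digits = 1200
#   500:       1 number  x 3 digits =    3
# Total = 9 + 180 + 1200 + 3 = 1392
print(sum(len(str(n)) for n in range(1, 501)))  # 1392
```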
Now, I will just say: as we mentioned 01:23:55.800 |
before, most LLMs nowadays are actually trained to use chain of 01:23:59.120 |
thought prompting by default. So let's just see if we don't 01:24:02.360 |
mention anything, right? Be a helpful assistant and answer 01:24:04.440 |
these users questions. So we're not telling it not to think 01:24:07.560 |
through its reasoning, and we're not telling it to think through 01:24:10.800 |
its reasoning. Let's just see what it does. Okay, so you can 01:24:15.560 |
see, again, it's actually doing the exact same reasoning, okay, 01:24:22.000 |
it doesn't give us like the sub problems at the 01:24:24.480 |
start, but it is going through and it's breaking everything 01:24:27.480 |
apart. Okay, which is quite interesting. And we get the 01:24:31.040 |
same correct answer. So the formatting here is slightly 01:24:34.000 |
different. It's probably a little cleaner, actually, 01:24:36.800 |
although I think, I don't know. Here, we get a lot more 01:24:41.560 |
information. So both are fine. And in this scenario, we 01:24:46.640 |
actually do get the right answer as well. So you can see that 01:24:50.080 |
chain of thought prompting has actually been quite 01:24:54.200 |
literally trained into the model. And you'll see that with 01:24:58.560 |
most, well, I think all state-of-the-art LLMs. Okay, cool. So that 01:25:04.480 |
is our chapter on prompting. Again, we're focusing very much 01:25:09.960 |
on a lot of the fundamentals of prompting there. And of course, 01:25:14.880 |
tying that back to the actual objects and methods within 01:25:19.600 |
LangChain. But for now, that's it for prompting. And we'll move 01:25:23.360 |
on to the next chapter. In this chapter, we're going to be 01:25:26.360 |
taking a look at conversational memory in LangChain. We're 01:25:30.960 |
going to be taking a look at the core, like chat memory 01:25:35.280 |
components that have really been in LangChain since the 01:25:39.200 |
start, but are essentially no longer in the library. And we'll 01:25:43.800 |
be seeing how we actually implement those historic 01:25:48.000 |
conversational memory utilities in the new versions of 01:25:53.680 |
LangChain, so 0.3. Now as a pre-warning, this chapter is 01:25:57.720 |
fairly long. But that is because conversational memory is just 01:26:02.640 |
such a critical part of chatbots and agents. Conversational 01:26:07.440 |
memory is what allows them to remember previous interactions. 01:26:11.120 |
And without it, our chatbots and agents would just be responding 01:26:15.680 |
to the most recent message without any understanding of 01:26:19.760 |
previous interactions within a conversation. So they would just 01:26:23.160 |
not be conversational. And depending on the type of 01:26:27.960 |
conversation, we might want to go with various approaches to 01:26:36.720 |
conversational memory. Now throughout this chapter, we're going to be 01:26:39.040 |
focusing on these four memory types. We'll be referring to 01:26:43.640 |
these and I'll be showing you actually how each one of these 01:26:46.400 |
works. But what we're really focusing on is rewriting these 01:26:50.680 |
for the latest version of LangChain using the runnable with message 01:26:59.120 |
history. So we're going to be essentially taking a look at the 01:27:05.320 |
original implementations for each of these four original 01:27:08.960 |
memory types, and then we'll be rewriting them with the 01:27:12.200 |
RunnableWithMessageHistory class. So just taking a look at each of 01:27:16.880 |
these four very quickly. Conversational buffer memory is 01:27:20.840 |
I think the simplest, most intuitive of these memory types. 01:27:24.840 |
It is literally just you have your messages, they come in to 01:27:31.160 |
this object, they are stored in this object as essentially a 01:27:35.000 |
list. And when you need them again, it will return them to 01:27:39.080 |
you. There's nothing, nothing else to it, super simple. The 01:27:42.760 |
conversation buffer window memory, okay, so new word in the 01:27:46.600 |
middle of the window. This works in pretty much the same way. 01:27:50.880 |
But those messages that it has stored, it's not going to return 01:27:54.680 |
all of them for you. Instead, it's just going to return the 01:27:57.720 |
most recent, let's say the most recent three, for example. Okay, 01:28:02.200 |
and that is defined by a parameter k. Conversational 01:28:05.560 |
summary memory, rather than keeping track of the entire 01:28:09.640 |
interaction memory directly, what it's doing is as those 01:28:13.800 |
interactions come in, it's actually going to take them and 01:28:17.640 |
it's going to compress them into a smaller little summary of what 01:28:21.720 |
has been within that conversation. And as every new 01:28:25.760 |
interaction is coming in, it's going to do that, and I keep 01:28:28.440 |
iterating on that summary. And then that is going to return to 01:28:32.080 |
us when we need it. And finally, we have the conversational 01:28:34.640 |
summary buffer memory. So the buffer 01:28:40.760 |
part of this is actually referring to a very similar thing 01:28:44.360 |
to the buffer window memory, but rather than it being the most recent k 01:28:48.880 |
messages, it's looking at the number of tokens within your 01:28:51.600 |
memory, and it's returning the most recent k tokens. That's 01:28:58.320 |
what the buffer part is there. And then it's also merging that 01:29:02.560 |
with the summary memory here. So essentially, what you're 01:29:06.360 |
getting is almost like a list of the most recent messages based 01:29:10.280 |
on the token length rather than the number of interactions, 01:29:13.160 |
plus a summary, which would come at the top here. So you get 01:29:18.240 |
kind of both. The idea is that obviously this summary here 01:29:22.560 |
would maintain all of your interactions in a very compressed 01:29:27.800 |
form. So you're losing less information, and you're 01:29:31.160 |
still maintaining, you know, maybe the very first 01:29:33.880 |
interaction, the user might have introduced themselves, giving 01:29:36.880 |
you their name, hopefully, that would be maintained within the 01:29:40.760 |
summary, and it would not be lost. And then you have almost 01:29:44.040 |
like high resolution on the most recent k or k tokens from your 01:29:50.440 |
memory. Okay, so let's jump over to the code, we're going into 01:29:53.840 |
the 04 chat memory notebook, open that in Colab. Okay, now 01:29:57.720 |
here we are, let's go ahead and install the prerequisites, run 01:30:02.240 |
all. We, again, can or cannot use LangSmith, it is up to you. 01:30:08.280 |
Enter that. And let's come down and start. So first, we'll just 01:30:13.560 |
initialize our LLM, using GPT-4o mini in this example, again, low 01:30:19.320 |
temperature. And we're going to start with conversation buffer 01:30:23.000 |
memory. Okay, so this is the original version of this memory 01:30:30.400 |
type. So let me, where are we, we're here. So, memory is 01:30:35.760 |
conversation buffer memory, and the return messages flag 01:30:38.560 |
needs to be set to true. So the reason that we set return 01:30:42.640 |
messages to true, as it mentions up here, is if you do not do this, 01:30:47.600 |
it's going to be returning your chat history as a string to an 01:30:51.800 |
LLM. Whereas, well, chat LLMs nowadays would expect message 01:30:58.480 |
objects. So yeah, you just want to be returning these as 01:31:02.840 |
messages rather than as strings. Okay. Otherwise, yeah, you're 01:31:06.480 |
going to get some kind of strange behavior out from your 01:31:09.360 |
LLMs if you return them strings. So you do want to make sure 01:31:12.160 |
that it's true. I think by default, it might not be true. 01:31:15.640 |
But this is coming, this is deprecated, right? It does tell 01:31:18.360 |
you here, a deprecation warning: this is coming from 01:31:22.360 |
older LangChain, but it's a good place to start just to 01:31:25.000 |
understand this. And then we're going to rewrite this with the 01:31:27.560 |
runnables, which is the recommended way of doing so 01:31:30.360 |
nowadays. Okay, so adding messages to our memory, we're 01:31:34.880 |
going to write this, okay, so it's just a conversation, 01:31:38.920 |
user AI user AI, so on, random chat, main things to note here 01:31:44.040 |
is I do provide my name, we have the model's name, right 01:31:47.360 |
towards the start of those interactions. Okay, so I'm just 01:31:50.440 |
going to add all of those, we do it like this. Okay, then we can 01:31:57.040 |
just see, we can load our history, like so. So let's just 01:32:02.800 |
see what we have there. Okay, so we have human message, AI 01:32:06.520 |
message, human message, right? This is exactly what we showed 01:32:10.200 |
you just here. It's just in that message format from LangChain. 01:32:13.720 |
Okay, so we can do that. Alternatively, we can actually 01:32:18.240 |
do this. So we can get our memory, we initialize the 01:32:21.120 |
conversation buffer memory as we did before. And we can 01:32:24.360 |
actually add these messages directly into our memory like 01:32:28.360 |
that. So we can use this add user message, add AI message, so 01:32:31.440 |
on, so on, load again, and it's going to give us the exact same 01:32:34.680 |
thing. Again, there's multiple ways to do the same thing. Cool. 01:32:38.280 |
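For reference, a sketch of the (deprecated) pattern being described; expect deprecation warnings if you run something like this on LangChain 0.3:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

# return_messages=True gives us message objects rather than one long string
memory = ConversationBufferMemory(return_messages=True)
memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, I'm an AI model called Zeta. How can I help?")

print(memory.load_memory_variables({}))  # the full message history

# The (also deprecated) ConversationChain wires the memory and LLM together.
chain = ConversationChain(llm=llm, memory=memory)
chain.invoke({"input": "What is my name again?"})
```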
So we have that. To pass all of this into our LLM (again, this is 01:32:42.920 |
all deprecated stuff, we're going to learn how to do it 01:32:45.000 |
properly in a moment, but this is how LangChain did it in 01:32:48.760 |
the past), we'd be using this 01:32:53.680 |
conversation chain, right? Again, this is deprecated. 01:32:57.600 |
Nowadays, we would be using LCEL for this. So I just want to 01:33:02.760 |
show you how this would all go together. And then we would 01:33:05.280 |
invoke, okay, what is my name again, let's run that. And we'll 01:33:10.040 |
see what we get is remembering everything, remember, so this 01:33:13.240 |
conversation buffer memory, it doesn't drop messages, it just 01:33:17.160 |
remembers everything. Right. And honestly, with the sort of large 01:33:21.920 |
context windows of many LLMs, that might be what you do. It 01:33:25.200 |
depends on how long you expect the conversation to go on for, 01:33:27.760 |
but you probably in most cases would get away with 01:33:30.960 |
this. Okay, so what, let's see what we get. I say, what is my 01:33:36.080 |
name again? Okay, let's see what it gives me says your name is 01:33:39.760 |
James. Great. Thank you. That works. Now, as I mentioned, all 01:33:45.200 |
of this I just showed you is actually deprecated. That's the 01:33:47.280 |
old way of doing things. Let's see how we actually do this in 01:33:50.520 |
modern, up-to-date LangChain. So we're using this 01:33:54.440 |
runnable with message history. To implement that, we will need 01:33:58.800 |
to use LCEL. And for that we will need to just define our prompt 01:34:03.080 |
template and LLM as we usually would. Okay, so we're going to 01:34:06.600 |
set up our system prompt, which is just a helpful assistant called 01:34:10.880 |
Zeta. Okay, we're going to put in this messages placeholder. 01:34:15.360 |
Okay, so that's important. Essentially, that is where our 01:34:19.720 |
messages are coming from our conversation buffer memory is 01:34:24.360 |
going to be inserted, right? So it's going to be that chat 01:34:27.400 |
history is going to be inserted after our system prompt, but 01:34:30.960 |
before our most recent query, which is going to be inserted 01:34:34.360 |
last here. Okay, so messages placeholder item, that's 01:34:38.800 |
important. And we use that throughout the course as well. 01:34:41.600 |
So we use it both for chat history, and we'll see later on, 01:34:44.800 |
we also use it for the intermediate thoughts that a 01:34:47.960 |
agent would go through as well. So important to remember that 01:34:51.920 |
little thing. We'll link our prompt template to our LLM. 01:34:56.320 |
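A sketch of that prompt template and pipeline, with the placeholder named history and the input named query as described:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant called Zeta."),
    # the chat history is inserted here, between the system prompt
    # and the most recent user query
    MessagesPlaceholder(variable_name="history"),
    ("user", "{query}"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
pipeline = prompt_template | llm
```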
Again, if we would like, we could also add in the I think we 01:35:01.320 |
only have the query here. Oh, we would probably also want our 01:35:05.880 |
history as well. But I'm not going to do that right now. 01:35:09.360 |
Okay, so we have our pipeline. And we can go ahead and actually 01:35:13.680 |
define our runnable with message history. Now this class or 01:35:18.120 |
object when we are initializing it does require a few items, we 01:35:21.360 |
can see them here. Okay, so we see that we have our pipeline 01:35:25.400 |
with history. So it's basically going to be, you can see 01:35:28.720 |
here, right, we have that history messages key, right, this 01:35:32.120 |
here has to align with what we provided as a messages 01:35:36.120 |
placeholder in our pipeline, right? So we have our pipeline 01:35:41.240 |
prompt template here, and here, right. So that's where it's 01:35:45.200 |
coming from. It's coming from messages placeholder, the 01:35:47.120 |
variable name is history, right? That's important. That links to 01:35:51.920 |
this. Then for the input messages key here, we have query 01:35:56.360 |
that, again, links to this. Okay, so both important to have 01:36:02.680 |
that. The other thing that is important is obviously we're 01:36:06.480 |
passing in that pipeline from before. But then we also have 01:36:09.480 |
this get session history. Basically, what this is doing is 01:36:12.840 |
it saying, okay, I need to get the list of messages that make 01:36:16.280 |
up my chat history that are going to be inserted into this 01:36:19.200 |
variable. So that is a function that we define, okay. And within 01:36:23.960 |
this function, what we're trying to do here is actually 01:36:26.640 |
replicate what we have with the previous conversation buffer 01:36:33.000 |
memory. Okay, so that's what we're doing here. So it's very 01:36:36.880 |
simple, right? So we have this in memory chat message history. 01:36:42.880 |
Okay, so that's just the object that we're going to be 01:36:44.840 |
returning. What this will do is it will take a session ID, the 01:36:48.560 |
session ID is essentially like a unique identifier so that each 01:36:52.560 |
conversational interaction within a single conversation is 01:36:56.200 |
being mapped to a specific conversation. So you don't have 01:36:58.960 |
overlapping, let's say you have multiple users using the same 01:37:01.480 |
system, you want to have a unique session ID for each one 01:37:03.960 |
of those. Okay, and what it's doing is saying, okay, if the 01:37:07.080 |
session ID is not in the chat map, which is this empty 01:37:10.400 |
dictionary we defined here, we are going to initialize that 01:37:15.000 |
session with an in memory, chat message history. Okay, that's 01:37:21.040 |
it. And we return. Okay, and all that's going to do is it's 01:37:25.040 |
going to basically append our messages, they will be appended 01:37:28.560 |
within this chat map session ID, and they're going to get 01:37:32.560 |
returned. There's nothing else to it, to be honest. So we 01:37:38.000 |
invoke our runnable, let's see what we get. I need to run this. 01:37:42.720 |
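For reference, a minimal sketch of the pieces we just walked through; the session bookkeeping is simplified here:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# one chat history per session ID, so separate conversations never overlap
chat_map = {}

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",      # matches the {query} input in the prompt
    history_messages_key="history",  # matches the MessagesPlaceholder variable
)

pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_123"}},
)
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "id_123"}},
)
```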
Okay, note that we do have this config, so we have the session 01:37:48.800 |
ID, that's to again, as I mentioned, keep different 01:37:51.600 |
conversations separate. Okay, so we've run that. Now let's run a 01:37:55.440 |
few more. So what is my name again, let's see if it 01:37:58.800 |
remembers. Your name is James. How can I help you today, James? 01:38:02.840 |
Okay. So what we've just done there is literally 01:38:08.360 |
conversation buffer memory, but for up-to-date LangChain, with 01:38:14.640 |
LCEL, with runnables. So it's the recommended way of doing it 01:38:19.040 |
nowadays. So that's a very simple example. Okay, there's 01:38:23.240 |
really not that much to it. It gets a little more complicated 01:38:28.200 |
as we start thinking about the different types of memory. 01:38:30.760 |
Although with that being said, it's not massively complicated, 01:38:33.760 |
we're only really going to be changing the way that we're 01:38:36.160 |
getting our interactions. So let's, let's dive into that and 01:38:42.080 |
see how we will do something similar with the conversation 01:38:45.120 |
buffer window memory. But first, let's actually just understand 01:38:48.240 |
okay, what is the conversation buffer window memory. So as I 01:38:51.560 |
mentioned, near the start, it's going to keep track of the last 01:38:53.880 |
K messages. So there's a few things to keep in mind here. 01:38:58.600 |
More messages does mean more tokens sent with each request. 01:39:02.600 |
And if we have more tokens in each request, it means that 01:39:05.320 |
we're increasing the latency of our responses and also the cost. 01:39:08.360 |
So with the previous memory type, we're just sending 01:39:12.200 |
everything. And because we're sending everything that is going 01:39:15.440 |
to be increasing our costs, it's going to be increasing our 01:39:17.400 |
latency for every message, especially as the conversation 01:39:20.120 |
gets longer and longer. And we don't, we might not necessarily 01:39:22.760 |
want to do that. So with this conversation buffer window 01:39:27.000 |
memory, we're going to say, okay, just return me the most 01:39:30.360 |
recent messages. Okay, so let's, well, let's see how that would 01:39:36.000 |
work. Here, we're going to return the most recent four 01:39:38.960 |
messages. Okay, we are again, make sure we've turned messages 01:39:42.720 |
is set to true. Again, this is deprecated. This is just the 01:39:46.320 |
old way of doing it. In a moment, we'll see the updated 01:39:49.760 |
way of doing this. We'll add all of our messages. Okay, so we 01:39:55.640 |
have this. And just see here, right, so we've added in all 01:40:01.000 |
these messages, there's more than four messages here. And we 01:40:03.680 |
can actually see that here. So we have human message, AI, 01:40:07.400 |
human, AI, human, AI, human, AI. Right. So we've got four pairs 01:40:13.440 |
of human AI interactions there. But up here, we do have more 01:40:17.560 |
than four pairs. So four pairs doesn't take us back all the way through the 01:40:25.200 |
conversational memory. Okay, and if we take a look here, the 01:40:29.200 |
first message we have is I'm researching different 01:40:32.040 |
types of conversational memory. So it's cut off these two here, 01:40:35.800 |
which will be a bit problematic when we ask it what our name 01:40:38.720 |
is. Okay, so let's just see, we're going to be using 01:40:41.400 |
conversation chain object again, again, remember that is 01:40:44.600 |
deprecated. And I want to say what is my name again, let's 01:40:48.360 |
see, let's see what it says. "I'm sorry, I don't know 01:40:53.920 |
your name or any personal information; if you like, you 01:40:55.920 |
can tell me your name." Right, so it doesn't actually remember. 01:40:58.360 |
So that's kind of like a negative of the conversation 01:41:04.160 |
buffer window memory. Of course, to fix that in this 01:41:08.160 |
scenario, we might just want to increase K; maybe we keep around 01:41:11.480 |
the previous eight interaction pairs, and it will actually 01:41:15.400 |
remember. So what's my name again, your name is James. So 01:41:19.200 |
now it remembers; we just modified how much it is 01:41:21.680 |
remembering. But of course, you know, there's pros and cons to 01:41:24.600 |
this, it really depends on what you're trying to build. So let's 01:41:28.120 |
take a look at how we would actually implement this with 01:41:31.880 |
the runnable with message history. Okay, so getting a 01:41:37.520 |
little more complicated here, although it's really 01:41:41.680 |
not that complicated, as we'll see. Okay, so we have a buffer 01:41:46.000 |
window message history, we're creating a class here, this 01:41:49.400 |
class is going to inherit from the base chat message history 01:41:53.320 |
object from LangChain. Okay, and all of our other message 01:41:58.320 |
history objects will do the same thing. Before, the in-memory 01:42:02.520 |
message history object was basically replicating the buffer 01:42:06.120 |
memory, so we didn't actually need to do anything; we didn't 01:42:10.240 |
need to define our own class there. But in this case, we do. 01:42:14.760 |
So we follow the same pattern that LangChain follows with 01:42:19.800 |
this base chat message history. And you can see a few of the 01:42:22.520 |
functions here that are important. So add messages and 01:42:25.760 |
clear are the ones that we're going to be focusing on; we also need 01:42:28.320 |
to have messages, which is this object attribute here. Okay, so 01:42:32.120 |
we're just implementing the synchronous methods here. If we 01:42:37.680 |
want this to support async, we would 01:42:40.440 |
have to add aadd_messages, aget_messages, and aclear as 01:42:45.760 |
well. So let's go ahead and do that. We have messages, we have 01:42:49.800 |
k again, we're looking at remembering the top k messages 01:42:52.840 |
or most recent k messages only. So it's important that we have 01:42:56.440 |
that variable, we are adding messages through this class, 01:43:00.280 |
this is going to be used by line chain within our runnable. So 01:43:04.080 |
we need to make sure that we do have this method. And all we're 01:43:06.800 |
going to be doing is extending the self.messages list here. And 01:43:11.480 |
then we're actually just going to be trimming that down so that 01:43:13.600 |
we're not remembering anything beyond those, you know, most 01:43:18.480 |
recent k messages that we have set from here. And then we also 01:43:24.160 |
have the clear method as well. So we need to include that 01:43:26.920 |
that's just going to clear the history. Okay, so this 01:43:30.120 |
isn't complicated, right? It just gives us this nice default, 01:43:34.160 |
standard interface for message history, and we just need to 01:43:38.280 |
make sure we're following that pattern. Okay, I've included 01:43:41.600 |
a print statement here just so we can see what's happening. 01:43:44.800 |
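The class being described looks roughly like this; a minimal sketch, and the exact defaults and print statements in the notebook may differ.

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage
from pydantic import BaseModel, Field


class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    """Chat message history that keeps only the k most recent messages."""
    messages: list[BaseMessage] = Field(default_factory=list)
    k: int = 4

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # Add the new messages, then trim down to the most recent k.
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        # Wipe the history entirely.
        self.messages = []
```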
So we have that. Now, for the get chat history function that 01:43:50.240 |
we defined earlier, rather than using the built in method, we're 01:43:54.040 |
going to be using our own object, which is a buffer window 01:43:57.520 |
message history, which we defined just here. Okay. So if 01:44:02.800 |
session ID is not in the chat map, as we did before, we're 01:44:05.800 |
going to be initializing our buffer window message history, 01:44:08.480 |
we're setting k up here with a default value of four, and then 01:44:12.320 |
we just return it. Okay, and that is it. So let's run this, 01:44:16.200 |
we have our runnable with message history, we have all of 01:44:20.360 |
these variables, which are exactly the same as before. But 01:44:23.480 |
then we also have these variables here with this history 01:44:26.600 |
factory config. And this is where if we have new variables 01:44:34.040 |
that we've added to our message history, in this case, k that we 01:44:38.680 |
have down here, we need to provide that to LangChain and 01:44:42.480 |
tell it this is a new configurable field. Okay. And 01:44:45.680 |
we've also added one for the session ID here as well, so 01:44:48.640 |
we're just being explicit and including everything there. 01:44:52.240 |
So we have that, and we run it. 01:44:58.160 |
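Wiring that up looks roughly like this; a sketch that assumes the pipeline and the BufferWindowMessageHistory class from the earlier sketches. The history_factory_config list is where the extra k field gets declared.

```python
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

chat_map = {}

def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        ),
    ],
)

# Both configurable fields are then set per invocation through the config.
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_k4", "k": 4}},
)
```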
Okay, now let's go ahead and invoke and see what we get. So, important here: this history 01:45:02.680 |
factory config, that is kind of being fed through into our 01:45:06.240 |
invoke so that we can actually modify those variables from 01:45:09.840 |
here. Okay, so we have config configurable, session ID, okay, 01:45:13.880 |
we'll just put whatever we want in here. And then we also have 01:45:16.400 |
the number k. Okay, so remember the previous four interactions, 01:45:22.640 |
I think in this one, we're doing something slightly different. I 01:45:25.360 |
think we're remembering the four interactions rather than the 01:45:28.560 |
previous four interaction pairs. Okay, so my name is James, 01:45:32.560 |
we're going to go through I'm just going to actually clear 01:45:35.400 |
this. And I'm going to start again. And we're going to use 01:45:38.040 |
the exact same add user message and AI message that we used 01:45:41.880 |
before, which is manually inserting all that into our 01:45:44.240 |
history, so that we can then just see, okay, what is the 01:45:47.840 |
result. And you can see that k equals four is, unlike 01:45:52.360 |
before where we were saving the last four interaction 01:45:56.920 |
pairs, now saving the most recent four interactions, not 01:46:03.000 |
pairs, just interactions. And honestly, I just think that's 01:46:06.480 |
clearer. I think it's weird that the number four for k would 01:46:10.760 |
actually save the most recent eight messages. Right? I think 01:46:14.960 |
that's odd. So I'm just not replicating that weirdness. We 01:46:19.160 |
could if we wanted to, I just don't like it. So I'm not doing 01:46:23.800 |
that. And anyway, we can see from messages that we're 01:46:26.960 |
returning just the four most recent messages, which 01:46:31.160 |
would be these four. Okay, cool. So just using the 01:46:35.160 |
runnable, we've replicated the old way of having a window 01:46:40.640 |
memory. And okay, I'm going to say what is my name again, as 01:46:44.200 |
before, it's not going to remember. So we can come to 01:46:47.000 |
here, I'm sorry, but I don't have access to personal 01:46:48.680 |
information and so on and so on. If you like to tell me your 01:46:51.360 |
name, it doesn't know. Now let's try a new one, where we 01:46:55.640 |
initialize a new session. Okay, so we're going with ID k 14. So 01:47:01.240 |
that's going to create a new conversation there. And we're 01:47:03.760 |
going to say, we're going to set k to 14. Okay, great. I'm 01:47:09.320 |
going to manually insert the other messages as we did 01:47:12.760 |
before. Okay, and we can see all of those you can see at the 01:47:15.880 |
top here, we are still maintaining that Hi, my name is 01:47:18.520 |
James message. Now let's see if it remembers my name. Your name 01:47:23.480 |
is James. Okay, there we go. Cool. So that is working. We 01:47:28.360 |
can also see, so we just added this, what is my name again, 01:47:31.960 |
let's just see if did that get added to our list of messages. 01:47:36.440 |
Right, what is my name again? Nice. And then we also have the 01:47:39.640 |
response, your name is James. So just by invoking this, because 01:47:43.320 |
we're using the, the runnable with message history, it's just 01:47:47.800 |
automatically adding all of that into our message history, 01:47:51.800 |
which is nice. Cool. Alright, so that is the buffer window 01:47:56.920 |
memory. Now we are going to take a look at how we might do 01:48:01.480 |
something a little more complicated, which is the 01:48:03.880 |
summaries. Okay, so when you think about the summary, you 01:48:07.080 |
know, what are we doing, we're actually taking the messages, 01:48:10.680 |
we're using the LLM call to summarize them, to compress 01:48:14.760 |
them, and then we're storing them within messages. So let's 01:48:18.360 |
see how we would actually do that. So to start with, let's 01:48:23.720 |
just see how it was done in old LangChain, using the 01:48:27.000 |
ConversationSummaryMemory class. 01:48:33.160 |
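That older approach is roughly this; a minimal sketch of the deprecated API.

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# The summary memory needs its own LLM to generate the running summary.
memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=memory)
```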
We'll go through that and see what we get. So again, same interactions. Right, I'm just 01:48:38.600 |
invoking, invoking, invoking, I'm not adding these directly 01:48:42.120 |
to the messages, because it actually needs to go through a 01:48:46.520 |
like that summarization process. And if we have a look, we can 01:48:50.520 |
see it happening. Okay, current conversation. So sorry, 01:48:54.680 |
current conversation. Hello there, my name is James, AI is 01:48:57.880 |
generating. Current conversation, the human introduces 01:49:01.320 |
himself as James, AI greets James warmly and expresses its 01:49:04.760 |
readiness to chat and assist, inquiring about how his day is 01:49:08.200 |
going. Right, so it's summarizing the previous 01:49:11.640 |
interactions. And then we have, you know, after that summary, we 01:49:15.720 |
have the most recent human message, and then the AI is 01:49:18.520 |
going to generate its response. Okay, and that continues going, 01:49:22.200 |
continues going. And you see that the final summary here is 01:49:25.240 |
going to be a lot longer. Okay, and it's different from that 01:49:25.240 |
first summary, of course: asking about his day, he mentions that 01:49:28.280 |
he's researching different types of conversational memory. 01:49:33.640 |
The AI responds enthusiastically, explaining that 01:49:36.280 |
conversational memory includes short term memory, long term 01:49:38.760 |
memory, contextual memory, personalized memory, and then 01:49:41.080 |
inquires if James is focused on the specific type of memory. 01:49:44.680 |
Okay, cool. So we get essentially the summary is just 01:49:48.760 |
getting longer and longer as we go. But at some point, the idea 01:49:52.520 |
is that it's not going to keep growing. And it should actually 01:49:55.560 |
be shorter than if you were saving every single 01:49:57.640 |
interaction, whilst maintaining as much of the information as 01:50:01.960 |
possible. But of course, you're not going to maintain all of 01:50:06.280 |
the information that you would with, for example, the 01:50:09.720 |
buffer memory. Right, with the summary, you are going to lose 01:50:13.640 |
information, but hopefully less information than if you're just 01:50:17.960 |
cutting interactions. So you're trying to reduce your token 01:50:21.880 |
count whilst maintaining as much information as possible. 01:50:26.520 |
Now, let's go and ask what is my name again, it should be able 01:50:30.360 |
to answer because we can see in the summary here that I 01:50:34.200 |
introduced myself as James. Okay, response, your name is 01:50:38.360 |
James. How is your research going? Okay, so it has that. Cool. 01:50:42.920 |
Let's see how we'd implement that. So again, as before, we're 01:50:46.600 |
going to go with that conversation summary message 01:50:50.760 |
history, we're going to be importing a system message, 01:50:53.560 |
we're going to be using that not for the LM that we're chatting 01:50:56.040 |
with, but for the LM that will be generating our summary. So 01:51:00.520 |
actually, that is not quite correct, there's create a 01:51:04.760 |
summary, not that it matters, it's just the docker string. So 01:51:07.880 |
we have our messages and we also have the LM. So different 01:51:10.520 |
tribute here to what we had before. When we initialize a 01:51:14.440 |
conversation summary message history, we need to be passing 01:51:17.640 |
in our LLM. We have the same methods as before: we have add 01:51:21.720 |
messages and clear. And what we're doing is, as messages 01:51:25.240 |
come in, we extend our current messages with them, but then we're 01:51:29.720 |
modifying those. So we construct our instructions to 01:51:35.560 |
make a summary. So that is here, we have the system prompt, 01:51:40.280 |
given the existing conversation summary and the new messages, 01:51:43.240 |
generate a new summary of the conversation, ensuring to 01:51:45.400 |
maintain as much relevant information as possible. Then 01:51:48.920 |
we have a human message here, through that we're passing the 01:51:52.360 |
existing summary. And then we're passing in the new 01:51:56.840 |
messages. So we format those and invoke the LLM. 01:52:04.040 |
And then, in the messages, we're actually 01:52:10.040 |
replacing the existing history that we had before with a new 01:52:14.440 |
history, which is a single system summary message. 01:52:20.040 |
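Put together, that class looks roughly like this. It is a sketch of the logic as described (including the x.content fix that comes up shortly); the exact prompt wording in the notebook may differ.

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class ConversationSummaryMessageHistory(BaseChatMessageHistory, BaseModel):
    """Chat history stored as a single, continuously updated summary."""
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOpenAI  # the LLM used to generate the summary

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # Any existing history is at most one system message holding the summary.
        existing_summary = self.messages[0].content if self.messages else ""
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation, ensuring to "
                "maintain as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{new_messages}"
            ),
        ])
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary,
                # Pass only the message content, not the full message objects.
                new_messages="\n".join(x.content for x in messages),
            )
        )
        # Replace the whole history with a single system summary message.
        self.messages = [SystemMessage(content=new_summary.content)]

    def clear(self) -> None:
        self.messages = []
```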
Let's see what we get. As before, we have the get chat history function, 01:52:23.160 |
exactly the same as before. The only real difference is that 01:52:26.440 |
we're passing in the LLM parameter here. And of course, 01:52:29.400 |
as we're passing in the LLM parameter here, it does also 01:52:33.080 |
mean that we're going to have to include that in the 01:52:34.760 |
configurable field spec, and that we're going to need to 01:52:39.160 |
include that when we're invoking our pipeline. So we 01:52:44.520 |
run that, pass in the LLM. Now, of course, one side effect of 01:52:51.160 |
generating summaries of everything is that we're 01:52:52.920 |
actually, you know, generating more. So you are 01:52:56.760 |
actually using quite a lot of tokens. Whether or not you are 01:53:00.600 |
saving tokens or not actually depends on the length of a 01:53:03.080 |
conversation. As the conversation gets longer, if 01:53:05.880 |
you're storing everything, after a little while that the 01:53:09.480 |
token usage is actually going to increase. So if in your use 01:53:13.720 |
case you expect to have shorter conversations, you would be 01:53:17.800 |
saving money and tokens by just using the standard buffer 01:53:22.120 |
memory. Whereas if you're expecting very long 01:53:25.080 |
conversations, you would be saving tokens and money by 01:53:28.440 |
using the summary history. Okay, so let's see what we got 01:53:33.160 |
from that. We have a summary of the conversation. James 01:53:35.400 |
introduced himself by saying, "Hi, my name is James." The AI 01:53:37.800 |
responded warmly, "Hi, James." The interaction includes 01:53:40.600 |
details about token usage. Okay, so we actually included 01:53:45.960 |
everything here, which we probably should not have done. 01:53:49.400 |
Why did we do that? So in here, we're including the full 01:54:03.720 |
message objects. So maybe if we just do "x.content" for each message instead... 01:54:16.280 |
Okay, there we go. So we quickly fixed that. So yeah, before 01:54:21.160 |
we're passing in the entire message object, which obviously 01:54:23.560 |
includes all of this information. Whereas actually 01:54:26.200 |
we just want to be passing in the content. So we modified 01:54:30.360 |
that and now we're getting what we'd expect. Okay, cool. And 01:54:35.640 |
then we can keep going. So as we keep going, the 01:54:38.600 |
summary should get more abstract. Like as we just saw 01:54:42.920 |
here, it's literally just giving us the messages directly 01:54:46.440 |
almost. Okay, so we're getting the summary there and we can 01:54:50.120 |
keep going. We're going to add just more messages to that. So 01:54:53.080 |
we'll see, as we send those, we're getting a 01:54:57.720 |
response. Send again, get a response. And we're just 01:55:01.000 |
invoking all of that, and that will of course be 01:55:03.960 |
adding everything into our message history. Okay, cool. So 01:55:08.440 |
we've run that. Let's see what the latest summary is. 01:55:13.560 |
Okay, and then we have this. So this is a summary that we have 01:55:16.820 |
instead of our chat history. Okay, cool. Now, finally, let's 01:55:23.860 |
see what's my name again. We can just double check. You know, 01:55:26.980 |
it has my name in there. So it should be able to tell us. 01:55:31.460 |
Okay, cool. So your name is James. Pretty interesting. So 01:55:38.680 |
let's have a quick look over at Langsmith. So the reason I 01:55:43.080 |
want to do this is just to point out, okay, the different 01:55:46.600 |
essentially token usage that we're getting with each one of 01:55:48.840 |
these. Okay, so we can see that we have these runnable 01:55:51.400 |
message history, which are probably improved in naming 01:55:54.200 |
there. But we can see, okay, how long is each one of these 01:55:59.000 |
taken? How many tokens are they also using? Come back to here. 01:56:03.800 |
We have this runnable message history. This is, we'll go 01:56:07.320 |
through a few of these, maybe to here, I think. You can see 01:56:11.400 |
here, this is that first interaction where we're using 01:56:13.880 |
the buffer memory. And we can see how many tokens we use 01:56:18.280 |
here. So 112 tokens when we're asking what is my name again. 01:56:22.280 |
Okay, then we modified this to include, I think it was like 01:56:27.880 |
14 interactions or something along those lines, which obviously 01:56:30.520 |
increases the number of tokens that we're using, right? So we 01:56:33.160 |
can see that actually happening all in Langsmith, which is 01:56:36.200 |
quite nice. And we can compare, okay, how many tokens is each 01:56:38.920 |
one of these using. Now, this is looking at the buffer window. 01:56:43.960 |
And if we come down to here and look at this one, so this is 01:56:47.640 |
using our summary. Okay, so summary with what is my name 01:56:51.560 |
again, actually used more tokens in this scenario, right? Which 01:56:54.520 |
is interesting because we're trying to compress information. 01:56:57.640 |
The reason there's more here is that the conversation is still short. As the 01:57:02.680 |
conversation length increases, with the summary, the total 01:57:08.120 |
number of tokens, especially if we prompt it correctly to keep 01:57:10.600 |
that low, should remain relatively small. Whereas with 01:57:16.040 |
the buffer memory, that will just keep increasing and 01:57:19.560 |
increasing as the conversation gets longer. So useful little 01:57:25.000 |
way of using Langsmith there to just kind of figure out, okay, 01:57:28.920 |
in terms of tokens and costs of what we're looking at for each 01:57:32.200 |
of these memory types. Okay, so our final memory type acts as a 01:57:37.720 |
mix of the summary memory and the buffer memory. So what it's 01:57:42.440 |
going to do is keep the buffer up until an n number of tokens. 01:57:48.440 |
And then once a message exceeds the n number of token limit for 01:57:52.760 |
the buffer, it is actually going to be added into our 01:57:56.760 |
summary. So this memory has the benefit of remembering in 01:58:02.600 |
detail the most recent interactions whilst also not 01:58:07.000 |
having the limitation of using too many tokens as a 01:58:12.440 |
conversation gets longer and even potentially exceeding 01:58:15.400 |
context windows if you try super hard. So this is a very 01:58:19.480 |
interesting approach. Now as before, let's try the original 01:58:23.880 |
way of implementing this. Then we will go ahead and use our 01:58:29.000 |
updated method for implementing this. So we come down to here, 01:58:32.680 |
and we're going to do, from langchain.memory, import 01:58:36.360 |
ConversationSummaryBufferMemory. Okay, a few things here: the LLM for the 01:58:41.480 |
summary. We have the n number of tokens that we can keep 01:58:46.200 |
before they get added to the summary and then return 01:58:49.160 |
messages, of course. Okay, you can see again this is 01:58:51.560 |
deprecated. We use the ConversationChain, and then we're 01:58:56.040 |
just passing our memory in there, and then we can chat. 01:58:59.640 |
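A minimal sketch of that deprecated setup:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Keep raw messages up to roughly 300 tokens; older ones get summarized.
memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=300,
    return_messages=True,
)
chain = ConversationChain(llm=llm, memory=memory)
```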
Okay, so it's super straightforward. First message, and we'll add a few more 01:59:03.880 |
here. Again, we have to invoke because the memory type here is 01:59:10.120 |
using the LLM to create those summaries as it goes, and let's 01:59:14.360 |
see what they look like. Okay, so we can see for the first 01:59:16.920 |
message here, we have a human message and then an AI message. 01:59:22.360 |
Then we come a little bit lower down again. It's the same 01:59:24.440 |
thing. Human message is the first thing in our history here. 01:59:28.840 |
Then it's a system message. So this is at the point where 01:59:31.560 |
we've exceeded that 300 token limit and the memory type here 01:59:36.440 |
is generating those summaries. So that summary comes in as 01:59:40.120 |
this is a message and we can see, okay, the human named 01:59:43.240 |
James introduces himself and mentions he's researching 01:59:45.720 |
different types of conversational memory and so on 01:59:47.960 |
and so on. Right. Okay, cool. So we have that. Then let's come 01:59:53.480 |
down a little bit further. We can see, okay, so the summary 01:59:57.160 |
there. Okay, so that's what we have. That is 02:00:01.960 |
the implementation for the old version of this memory. Again, 02:00:07.880 |
we can see it's deprecated. So how do we implement this for 02:00:12.040 |
our more recent versions of LangChain and specifically 02:00:16.200 |
0.3? Well, again, we're using that runnable message history 02:00:20.840 |
and it looks a little more complicated than we were 02:00:24.360 |
getting before, but it's actually just, you know, it's 02:00:26.680 |
nothing too complex. We're just creating a summary as we 02:00:31.800 |
did with the previous memory type, but the decision for 02:00:36.360 |
adding to that summary is based on, in this case, actually the 02:00:39.960 |
number of messages. So I didn't go with the LangChain 02:00:43.960 |
version where it's a number of tokens. I don't like that. I 02:00:47.240 |
prefer to go with messages. So what I'm doing is saying, okay, 02:00:50.520 |
the last K messages. Okay. Once we exceed K messages, the 02:00:56.200 |
messages beyond that are going to be added to the memory. 02:01:00.280 |
Okay, cool. So let's see, we first initialize our 02:01:06.040 |
conversation summary buffer message history class with LLM 02:01:11.640 |
and K. Okay, so these two here. So LLM, of course, to create 02:01:15.320 |
summaries and K is just the limit of number of messages 02:01:18.360 |
that we want to keep before adding them to the summary or 02:01:21.560 |
dropping them from our messages and adding them to the summary. 02:01:24.920 |
Okay, so we will begin with, okay, do we have an existing 02:01:30.360 |
summary? So the reason we set this to none is we can't extract 02:01:36.840 |
the summary, the existing summary, unless it already 02:01:40.200 |
exists. And the only way we can do that is by checking, okay, 02:01:43.800 |
do we have any messages? If yes, we want to check if within 02:01:47.960 |
those messages, we have a system message because we're 02:01:50.440 |
doing the same structure as before, where the 02:01:53.720 |
first system message is actually our 02:01:56.840 |
summary. So that's what we're doing here. We're checking if 02:01:59.400 |
there is a summary message already stored within our 02:02:02.200 |
messages. Okay, so we're checking for that. If we find 02:02:08.600 |
it, we'll just do, we have this little print statement so we 02:02:11.080 |
can see that we found something and then we just make our 02:02:15.480 |
existing summary. I should actually move this to the first 02:02:20.920 |
instance here. Okay, so that existing summary will be set 02:02:26.920 |
to the first message. Okay, and this would be a system message 02:02:33.480 |
rather than a string. Cool, so we have that. Then we want to 02:02:39.640 |
add any new messages to our history. Okay, so we're extending 02:02:44.760 |
the history there, and then we're saying, okay, if the 02:02:47.560 |
length of our history exceeds the K value that we 02:02:51.480 |
set, we're going to say, okay, we found that many messages. 02:02:54.120 |
We're going to be dropping the latest. It's going to be the 02:02:56.040 |
latest two messages. This I will say here, one thing or one 02:03:01.640 |
problem with this is that we're not going to be saving that 02:03:04.840 |
many tokens if we're summarizing every two messages. 02:03:08.440 |
So what I would probably do is in an actual like production 02:03:13.480 |
setting, I would probably say let's go to twenty messages and 02:03:20.040 |
once we hit twenty messages, let's take the previous ten. 02:03:23.720 |
We're going to summarize them and put them into our summary 02:03:26.600 |
alongside any previous summary that already existed, but in 02:03:30.440 |
you know, this is also fine as well. Okay, so we say we found 02:03:36.600 |
those messages. We're going to drop the latest two messages. 02:03:40.760 |
Okay, so we pull the oldest messages out. I should say 02:03:46.200 |
not the latest. It's the oldest, not the latest. We want to 02:03:51.000 |
keep the latest and drop the oldest. So we pull out the 02:03:54.840 |
oldest messages and keep only the most recent messages. 02:03:59.240 |
Okay, then I'm saying, okay, if we don't have any old 02:04:03.720 |
messages to summarize, we don't do anything. We just return. 02:04:07.560 |
Okay, so this indicates that this has not been triggered. We 02:04:11.880 |
would hit this, but in the case this has been triggered and we 02:04:17.000 |
do have old messages, we're going to come to here. Okay, so 02:04:22.760 |
this is we can see we have a system message prompt template 02:04:26.760 |
saying: given the existing conversation summary and the new 02:04:29.480 |
messages, generate a new summary of the conversation, 02:04:32.520 |
ensuring to maintain as much relevant information as 02:04:34.760 |
possible. So if we want to be more conservative with tokens, 02:04:38.040 |
we could modify this prompt here to say keep the summary to 02:04:42.360 |
within the length of a single paragraph, for example, and 02:04:46.680 |
then we have our human message prompt template, which can 02:04:49.240 |
say, okay, here's the existing conversation summary and here 02:04:51.960 |
are new messages. Now, new messages here is actually the 02:04:55.160 |
old messages, but the way that we're framing it to the LLM 02:04:59.400 |
here is that we want to summarize the whole conversation, 02:05:02.680 |
right? It doesn't need to have the most recent messages that 02:05:05.000 |
we're storing within our buffer. It doesn't need to know 02:05:08.600 |
about those. That's irrelevant to the summary. So we just tell 02:05:11.560 |
it that we have these new messages and as far as this LLM 02:05:14.280 |
is concerned, this is like the full set of interactions. Okay, 02:05:18.600 |
so then we would format those and invoke our LLM and then 02:05:23.800 |
we'll print out our new summary so we can see what's going on 02:05:26.360 |
there and we would prepend that new summary to our 02:05:31.640 |
conversation history. Okay, and this will work so we can just 02:05:37.240 |
prepend it like this because we've already popped. Where was 02:05:43.640 |
it up here? If we have an existing summary, we already 02:05:48.600 |
popped that from the list. It's already been pulled out of 02:05:50.520 |
that list. So it's okay for us to just prepend it; we don't need 02:05:54.760 |
to do anything else, because we've already dropped 02:05:58.280 |
that initial system message if it existed. Okay, and then we 02:06:01.960 |
have the clear method as before. So that's all of the 02:06:05.640 |
logic for our conversational summary buffer memory. We 02:06:12.200 |
redefine our get chat history function with the LLM and K 02:06:18.760 |
parameters there and then we'll also want to set the 02:06:21.480 |
configurable fields again. So those are going to be 02:06:25.080 |
session_id, llm, and k. 02:06:32.280 |
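Pulling the pieces described above together, the class looks roughly like this. It is a sketch of the logic as described, keeping the last k messages verbatim and folding anything older into a summary; the notebook's exact prints and prompt wording will differ.

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory, BaseModel):
    """Keep the k most recent messages verbatim; summarize everything older."""
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOpenAI
    k: int = 4

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # If a summary already exists, it is stored as the first system message.
        existing_summary = None
        if self.messages and isinstance(self.messages[0], SystemMessage):
            existing_summary = self.messages.pop(0)

        # Add the new messages to the buffer.
        self.messages.extend(messages)

        # If we are within the limit, put the summary back and stop here.
        if len(self.messages) <= self.k:
            if existing_summary:
                self.messages = [existing_summary] + self.messages
            return

        # Otherwise, pull out the oldest messages beyond the most recent k...
        old_messages = self.messages[:-self.k]
        self.messages = self.messages[-self.k:]

        # ...and fold them, plus any existing summary, into a new summary.
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation, ensuring to "
                "maintain as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{old_messages}"
            ),
        ])
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary.content if existing_summary else "",
                old_messages="\n".join(x.content for x in old_messages),
            )
        )
        # Prepend the new summary as a single system message.
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        self.messages = []
```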
Okay, so now we can invoke. The k value to begin with is going to be four, and you can see no 02:06:37.880 |
old messages to update summary with. That's good. Let's invoke 02:06:42.520 |
this a few times and let's see what we get. Okay, so no old messages at first, then: 02:06:51.540 |
found six messages, dropping the oldest two, and then we have the new 02:06:55.460 |
summary: in the conversation, James introduces himself and 02:06:57.700 |
is interested in researching different types of 02:07:00.180 |
conversational memory. Right, so you can see there's quite a lot 02:07:03.220 |
in here at the moment. So we would definitely want to prompt 02:07:07.940 |
the LLM, the summary LLM, to keep that short. Otherwise, we're 02:07:12.100 |
just getting a ton of stuff right, but we can see that that 02:07:16.820 |
is you know it's it's working. It's functional. So let's go 02:07:20.500 |
back and see if we can prompt it to be a little more concise. 02:07:23.940 |
So we come to here, and after "maintaining as much relevant 02:07:27.460 |
information as possible", we add: however, we need to keep our 02:07:34.980 |
summary concise; the limit is a single short paragraph. Okay, 02:07:45.060 |
something like this. Let's try and let's see what we get with 02:07:48.980 |
that. Okay, so message one again and nothing to update. 02:07:54.100 |
See this so new summary you can see it's a bit shorter. It 02:07:57.700 |
doesn't have all those bullet points. Okay, so that seems 02:08:04.900 |
better. Let's see so you can see the first summary is a bit 02:08:09.620 |
shorter, but then as soon as we get to the second and third 02:08:13.700 |
summaries, the second summary is actually slightly longer than 02:08:16.980 |
the third one. Okay, so we're going to be 02:08:20.260 |
losing a bit of information, in this case more than we were 02:08:23.460 |
before, but we're saving a ton of tokens. So that's of course 02:08:27.460 |
a good thing and of course we could keep going and adding 02:08:30.500 |
many interactions here, and we should see that this 02:08:33.460 |
conversation summary should maintain that sort of 02:08:37.220 |
length of around one short paragraph. So that is it for 02:08:43.220 |
this chapter on conversational memory. We've seen a few 02:08:47.300 |
different memory types. We've implemented the old deprecated 02:08:51.140 |
versions so we can see what they were like and then we've 02:08:55.060 |
reimplemented them for the latest versions of LangChain, 02:08:58.500 |
and, to be honest, using logic where we are getting much more 02:09:02.740 |
into the weeds. In some ways that complicates 02:09:07.300 |
things, that is true, but in other ways it gives us a ton of 02:09:10.900 |
control so we can modify those memory types as we did with 02:09:14.180 |
that final summary buffer memory type. We can modify 02:09:17.940 |
those to our liking, which is incredibly useful when you're 02:09:23.060 |
actually building applications for the real world. So that is 02:09:26.340 |
it for this chapter. We'll move on to the next one. In this 02:09:29.780 |
chapter, we are going to introduce agents. Now, agents, I 02:09:34.820 |
think, are one of the most important components in the 02:09:39.300 |
world of AI and I don't see that going away anytime soon. 02:09:43.140 |
I think the majority of AI applications, the intelligent 02:09:49.220 |
part of those, will almost always be an implementation of an 02:09:53.380 |
AI agent or multiple AI agents. So in this chapter, we are just 02:09:57.940 |
going to introduce agents within the context of lang 02:10:01.780 |
chain. We're going to keep it relatively simple. We're going 02:10:05.540 |
to go into much more depth in agents in the next chapter 02:10:10.500 |
where we'll do a bit of a deep dive, but we'll focus on just 02:10:14.260 |
introducing the core concepts and of course agents within 02:10:18.900 |
lang chain here. So jumping straight into our notebook, 02:10:24.500 |
let's run our prerequisites. You'll see that we do have an 02:10:28.660 |
additional prerequisite here, which is Google search results. 02:10:31.780 |
That's because we're going to be using SerpAPI to allow 02:10:35.940 |
our LLM, as an agent, to search the web, which is one of the 02:10:41.700 |
great things about agents: they can do all of these 02:10:44.420 |
additional things that an LLM by itself obviously cannot. So 02:10:48.420 |
we'll come down to here. We have our langsmith parameters 02:10:51.700 |
again, of course. So you enter your LangChain API key if you 02:10:54.900 |
have one and now we're going to take a look at tools, which is 02:10:59.380 |
a very essential part of agents. So tools are a way for 02:11:04.740 |
us to augment our LLMs with essentially anything that we 02:11:08.900 |
can write in code. So we mentioned that we're going to 02:11:12.420 |
have a Google search tool. That Google search tool is some 02:11:15.860 |
code that gets executed by our LLM in order to search Google 02:11:20.180 |
and get some results. So a tool can be thought of as any code 02:11:25.620 |
logic, or any function in the case of Python, a function 02:11:31.380 |
that has been formatted in a way so that our LLM can 02:11:34.900 |
understand how to use it and then actually use it. Although 02:11:39.860 |
the LLM itself is not using the tool; it's more our agent 02:11:44.740 |
execution logic which uses the tool for the LLM. So we're 02:11:49.220 |
going to go ahead and actually create a few simple tools. 02:11:52.740 |
We're going to be using what is called the tool decorator from 02:11:55.380 |
LangChain, and there are a few things to keep in mind when 02:12:00.100 |
we're building tools. So for optimal performance, our tool 02:12:04.100 |
needs to be just very readable and what I mean by readable is 02:12:07.780 |
we need three main things. One is a docstring that is written in 02:12:12.660 |
natural language, and it is going to be used to explain to 02:12:15.860 |
the LLM when, why, and how it should use this tool. We should 02:12:21.460 |
also have clear parameter names. Those parameter names 02:12:25.460 |
should tell the LLM what each one of these parameters 02:12:29.780 |
are. They should be self explanatory. If they are not 02:12:33.060 |
self explanatory, we should be including an explanation for 02:12:37.860 |
those parameters within the docstring. Then finally, we 02:12:41.220 |
should have type annotations for both our parameters and 02:12:44.740 |
also what we're returning from the tool. So let's jump in and 02:12:49.060 |
see how we would implement all of that. So come down here and 02:12:52.820 |
we have, from langchain_core.tools, import tool. Okay. So these are 02:12:57.380 |
just four incredibly simple tools. We have the addition or 02:13:02.020 |
add tool, multiply, exponentiate, and the subtract 02:13:05.780 |
tools. Okay, so a few calculator-esque tools. 02:13:11.780 |
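Those tools look something like this; a minimal sketch with the docstrings and type annotations that the LLM relies on.

```python
from langchain_core.tools import tool


@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y' together."""
    return x + y


@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y' together."""
    return x * y


@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y


@tool
def subtract(x: float, y: float) -> float:
    """Subtract 'y' from 'x'."""
    return x - y
```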
Now, when we add this tool decorator, it turns each of these functions 02:13:17.140 |
into what we call a structured tool object. So you can see 02:13:20.980 |
that here. We can see we have this structured tool. We have a 02:13:26.180 |
name description. Okay. And then we have this schema. We'll 02:13:30.340 |
see this in a moment and a function right. So this 02:13:32.660 |
function is literally just the original function. It's a 02:13:36.660 |
mapping to the original function. So in this case, it's 02:13:39.700 |
the add function. Now the description we can see it's 02:13:42.820 |
coming from our dot string and of course the name as well is 02:13:46.740 |
just coming from the function name. Okay. And then we can 02:13:50.020 |
also see, let's just print the name and description, but then 02:13:54.420 |
we can also see the args schema, right. This is the 02:13:58.660 |
thing here that we can't read at the moment; to read it, we're 02:14:02.180 |
just going to look at the model JSON schema method, and then we 02:14:06.980 |
can see what that contains. 02:14:09.220 |
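For example, inspecting the add tool from the sketch above might look like this:

```python
# The decorator produced a StructuredTool; inspect what the LLM will see.
print(add.name)         # add
print(add.description)  # Add 'x' and 'y' together.

# The args schema is a Pydantic model; dump it as a JSON schema to read it.
print(add.args_schema.model_json_schema())
```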
So this actually contains everything, including the 02:14:12.260 |
properties. So we have the X. It creates a sort of title for 02:14:16.100 |
that and it also specifies the type. Okay. So the type that we 02:14:20.660 |
defined is float, which for OpenAI, I guess, gets mapped to 02:14:25.300 |
number rather than just being float, and then we also see that 02:14:28.900 |
we have this required field. So this is telling our LLM which 02:14:33.140 |
parameters are required and which ones are optional. In some 02:14:36.820 |
cases you might want an optional parameter, and we can do that here. Let's add 02:14:42.180 |
Z. That is going to be float or None. Okay. And we're just 02:14:48.340 |
going to say it defaults to 0.3. Alright. I'm going to remove 02:14:53.460 |
this in a minute because it's kind of weird, but let's just 02:14:57.140 |
see what that looks like. So you see that we now have X, Y, 02:15:02.020 |
and Z, but then in Z, we have some additional information. 02:15:06.580 |
Okay. So it can be any of it can be a number or it can just 02:15:10.020 |
be nothing. The default value for that is 0.3. Okay. And then 02:15:15.060 |
if we look here, we can see that the required field does 02:15:18.020 |
not include Z. So it's just X and Y. So it's describing the 02:15:22.980 |
full function schema for us, but let's remove that. Okay. And 02:15:28.180 |
we can see that again with our exponentiate tool similar 02:15:32.420 |
thing. Okay. So how are we going to invoke our tool? So 02:15:39.060 |
the LLM the underlying LLM is actually going to generate a 02:15:42.900 |
string. Okay. So it will look something like this. This is 02:15:46.660 |
going to be our LLM output. So it's a string that is 02:15:51.780 |
some JSON and of course to load a string into a dictionary 02:15:57.300 |
format, we just use JSON loads. Okay. So let's see that. So 02:16:03.220 |
this could be the output from our LLM. We load it into a 02:16:06.180 |
dictionary and then we get an actual dictionary. And then 02:16:09.620 |
what we would do is take our exponentiate tool, 02:16:14.820 |
access the underlying function, and then pass it the keyword 02:16:19.220 |
arguments from our dictionary here. And that will execute our tool. 02:16:26.200 |
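In code, that execution step is roughly the following, using the exponentiate tool from the earlier sketch:

```python
import json

# Example of the kind of string the LLM generates for a tool call.
llm_output_string = '{"x": 5.0, "y": 2.0}'

# Load the JSON string into a dictionary...
llm_output_dict = json.loads(llm_output_string)

# ...then call the tool's underlying function with those keyword arguments.
result = exponentiate.func(**llm_output_dict)
print(result)  # 25.0
```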
That is the tool execution logic that 02:16:29.000 |
LangChain implements, and then later on, in the next chapter, 02:16:32.520 |
we'll be implementing ourselves. Cool. So let's move 02:16:35.560 |
on to creating an agent. Now, we're going to be 02:16:38.680 |
constructing a simple tool calling agent. We're going to 02:16:41.880 |
be using LangChain Expression Language to do this. Now, we 02:16:45.720 |
will be covering LangChain Expression Language, or LCEL, 02:16:49.400 |
more in an upcoming chapter, but for now, all we need to know is 02:16:54.600 |
that our agent will be constructed using syntax and 02:16:58.840 |
components like this. So, we would start with our input 02:17:02.760 |
parameters. That is going to include our user query and of 02:17:06.040 |
course, the chat history because we need our agent to be 02:17:09.080 |
conversational and remember previous interactions within 02:17:11.720 |
the conversation. These input parameters will also include a 02:17:15.800 |
placeholder for what we call the agent scratch pad. Now, the 02:17:18.680 |
agent scratch pad is essentially where we are 02:17:21.240 |
storing the internal thoughts or the internal dialogue of the 02:17:25.400 |
agent as it is using tools and getting observations from those 02:17:28.280 |
tools and working through those multiple internal steps. So, in 02:17:34.040 |
the case that we will see, it will be using, for example, the 02:17:36.760 |
addition tool, getting the result, using the multiply tool, 02:17:39.720 |
getting the result, and then providing a final answer 02:17:42.760 |
to the user. So, let's jump in and see what it looks 02:17:46.680 |
like. Okay, so we'll just start with defining our prompt. So, 02:17:50.360 |
our prompt is going to include the system message. That's 02:17:53.480 |
nothing. We're not putting anything special in there. 02:17:56.680 |
We're going to include the chat history which is a messages 02:18:01.160 |
placeholder. Then, we include our human message and then we 02:18:05.320 |
include a placeholder for the agent scratch pad. Now, the way 02:18:08.760 |
that we implement this later is going to be slightly different 02:18:12.040 |
for the scratch pad. We'd actually use this messages 02:18:14.200 |
placeholder but this is how we use it with the built-in 02:18:17.400 |
create tool calling agent from LangChain. Next, we'll define our 02:18:21.240 |
LLM. We do need our OpenAI API key for that. So, we'll 02:18:24.920 |
enter that here like so. Okay, so come down. Okay, so we're 02:18:30.120 |
going to be creating this agent. We need conversation 02:18:33.240 |
memory and we are going to use the older conversation buffer 02:18:36.280 |
memory class rather than the newer runnable with message 02:18:39.080 |
history class. That's just because we're also using this 02:18:42.200 |
older create tool calling agent and this is the 02:18:46.760 |
older way of doing things. In the next chapter, we are going 02:18:50.040 |
to be using the more recent basically what we already 02:18:54.600 |
learned on chat history. We're going to be using all of that 02:18:57.720 |
to implement our chat history but for now, we're going to be 02:19:00.520 |
using the older method which is deprecated just as a pre 02:19:04.760 |
warning, but again, as I mentioned at the very start of the 02:19:08.200 |
course, we're starting abstract and then getting into the 02:19:11.720 |
details. So, we're going to initialize our agent. For that, 02:19:15.960 |
we need these four things: the LLM as we defined, tools as we have 02:19:20.440 |
defined, the prompt as we have defined, and then the memory, 02:19:24.520 |
which is our old conversation buffer memory. So, with all of 02:19:29.400 |
that, we are going to go ahead and create a tool calling 02:19:32.360 |
agent, and we just provide it with everything. 02:19:36.120 |
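That construction looks roughly like this; a sketch using the tools from the earlier sketch and the deprecated memory class, with illustrative prompt wording.

```python
from langchain.agents import create_tool_calling_agent
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You're a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)

tools = [add, subtract, multiply, exponentiate]

# The agent decides which tool to call; it does not execute the tool itself.
agent = create_tool_calling_agent(llm=llm, tools=tools, prompt=prompt)
```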
Okay, there we go. Now, you'll see here I didn't pass in the memory; I'm 02:19:41.400 |
passing it in down here instead. So, we're going to 02:19:44.920 |
start with this question which is what is 10.7 multiplied by 02:19:48.680 |
7.68. Okay. So, given the precision of these numbers, our 02:19:57.240 |
normal LLM would not be able to answer that. Almost definitely 02:20:02.360 |
would not be able to answer that correctly. We need a 02:20:04.920 |
external tool to answer that accurately and we'll see that 02:20:08.520 |
that is exactly what it's trying to do. So, we can see 02:20:12.440 |
that the tool agent action message here. We see that it 02:20:17.800 |
decided, okay, I'm going to use the multiply tool and here are 02:20:20.520 |
the parameters I want to use for that tool. Okay, we can see 02:20:23.720 |
X is 10.7 and Y is 7.68. You can see here that this is 02:20:28.760 |
already a dictionary, and that is because LangChain has 02:20:33.320 |
taken the string from our LLM call and already converted it 02:20:37.880 |
into a dictionary for us. Okay, so that's just it's happening 02:20:41.240 |
behind the scenes there and you can actually see if we go into 02:20:44.840 |
the details a little bit, we can see that we have these 02:20:46.840 |
arguments and this is the original string that was coming 02:20:49.400 |
from our LLM. Okay, which has already been, of course, 02:20:52.680 |
processed by LangChain. So, we have that. Now, the one thing 02:20:52.680 |
missing here is that, okay, we've got that the LLM wants 02:21:03.800 |
us to use multiply and we've got what the LLM wants us to 02:21:06.760 |
put into multiply but where's the answer, right? There is no 02:21:11.160 |
answer because the tool itself has not been executed because 02:21:14.840 |
it can't be executed by the LLM but then, okay, didn't we 02:21:19.640 |
already define our agent here? Yes, we defined the part of our 02:21:24.760 |
agent. That is, our LLM has our tools and it is going to 02:21:29.240 |
generate which tool to use, but it actually doesn't include the 02:21:33.880 |
agent execution part. The agent executor is a 02:21:40.360 |
broader thing. It's broader logic like just code logic 02:21:44.520 |
which acts as a scaffolding within which we have the 02:21:48.600 |
iteration through multiple steps of our LLM calls followed 02:21:53.560 |
by the LLM outputting what tool to use followed by us 02:21:57.320 |
actually executing that for the LLM and then providing the 02:22:01.400 |
output back into the LLM for another decision or another 02:22:05.480 |
step. So, the agent itself here is not the full agentic flow 02:22:12.440 |
that we might expect. Instead, for that, we need to implement 02:22:16.440 |
this agent executor class. This agent executor includes our 02:22:20.840 |
agent from before. Then, it also includes the tools and one 02:22:25.160 |
thing here is, okay, we already passed the tools to our agent. 02:22:27.800 |
Why do we need to pass them again? Well, the tools being 02:22:30.760 |
passed to our agent up here, that is being used. So, that is 02:22:36.280 |
essentially extracting out those function schemas and 02:22:39.240 |
passing it to our LLM so that our LLM knows how to use the 02:22:41.880 |
tools. Then, we're down here. We're passing the tools again 02:22:44.840 |
to our agent executor and this is rather than looking at how 02:22:48.920 |
to use those tools. This is just looking at, okay, I want 02:22:51.880 |
the functions for those tools so that I can actually execute 02:22:54.440 |
them for the LLM or for the agent. Okay, so that's what is 02:22:58.760 |
happening there. Now, we can also pass in our memory 02:23:02.440 |
directly. So, you see, if we scroll up a little bit here, I 02:23:06.600 |
actually had to pass in the memory like this with our agent. 02:23:11.720 |
That's just because we weren't using the agent executor. Now, 02:23:14.120 |
we have the agent executor. It's going to handle that for 02:23:16.200 |
us and another thing that's going to handle for us is 02:23:19.880 |
intermediate steps. So, you'll see in a moment that when we 02:23:23.960 |
invoke the agent executor, we don't include the intermediate 02:23:26.600 |
steps and that's because that is already handled by the 02:23:29.800 |
agent executor now. So, we'll come down. We'll set verbose 02:23:34.360 |
equal to true so we can see what is happening and then we 02:23:38.200 |
can see here, there's no intermediate steps anymore and 02:23:42.360 |
we do still pass in the chat history like this but then the 02:23:47.480 |
addition of those new interactions to our memory is 02:23:50.520 |
going to be handled by the executor. So, in fact, let me 02:23:54.920 |
actually show that very quickly before we jump in. Okay, so 02:23:59.320 |
that's currently empty. We're going to execute this. 02:24:03.400 |
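For reference, the executor construction and invocation pattern being run here is roughly this, continuing from the agent sketch above:

```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,    # here the executor needs the actual functions to run
    memory=memory,  # new interactions are written back to memory for us
    verbose=True,
)

# Chat history is still passed in, but its upkeep is handled by the executor.
agent_executor.invoke({
    "input": "What is 10.7 multiplied by 7.68?",
    "chat_history": memory.chat_memory.messages,
})
```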
Okay, we've entered the new agent executor chain, and let's 02:24:07.300 |
just have a quick look at our messages again and now you can 02:24:10.980 |
see that agent executor automatically handled the 02:24:13.940 |
addition of our human message and then the responding AI 02:24:17.700 |
message for us. Okay, which is useful. Now, what happened? So, 02:24:23.140 |
we can see that the multiply tool was invoked with these 02:24:26.820 |
parameters and then this pink text here that we got, that is 02:24:30.900 |
the observation from the tool. So, it's what the tool output 02:24:33.700 |
back to us, okay? Then, this final message here is not 02:24:37.140 |
formatted very nicely but this final message here is coming 02:24:40.420 |
from our LLM. So, the green is our LLM output. The pink is our 02:24:46.420 |
tool output, okay? So, the LLM after seeing this output says 02:24:53.700 |
10.7 multiplied by 7.68 is approximately 82.18. Okay, 02:25:01.220 |
cool. Useful and then we can also see that the chat history 02:25:04.500 |
which we already just saw. Great. So, that has been used 02:25:08.980 |
correctly. We can just also confirm that that is correct. 02:25:13.220 |
82.1759 recurring which is exactly what we get here. Okay 02:25:18.740 |
and we the reason for that is obviously our multiply tool is 02:25:22.340 |
just doing this exact operation. Cool. So, let's try 02:25:28.100 |
this with a bit of memory. So, I'm going to ask or I'm going 02:25:31.700 |
to state to the agent. Hello, my name is James. We'll leave 02:25:36.980 |
that there; it's not actually the first interaction, because 02:25:40.100 |
we already have these, but it's an early interaction with my 02:25:45.860 |
name in there. Then, we're going to try and perform 02:25:49.460 |
multiple tool calls within a single execution loop and what 02:25:52.500 |
you'll see when it is calling these tools is that it 02:25:55.220 |
can actually use multiple tools in parallel. So, for sure, I 02:25:58.420 |
think two or three of these were used in parallel, and then 02:26:01.460 |
the final subtract had to wait for those previous results. So, 02:26:05.220 |
it would have been executed afterwards and we should 02:26:08.420 |
actually be able to see this in Langsmith. So, if we go here, 02:26:13.220 |
yeah, we can see that we have this initial call and then we 02:26:17.060 |
have add, multiply, and exponentiate all used in parallel. 02:26:20.100 |
Then, we have another call which uses subtract, and then we 02:26:22.820 |
get the response. Okay, which is pretty cool and then the 02:26:27.620 |
final result there is negative eleven. Now, when you look at 02:26:32.420 |
whether the answer is accurate, I think the order here of 02:26:37.300 |
calculations is not quite correct. So, if we put the 02:26:41.380 |
actual computation here, it gets it right but otherwise, if 02:26:45.620 |
I use natural language, maybe I'm 02:26:48.260 |
phrasing it in a poor way. Okay, so, I suppose that is 02:26:53.780 |
pretty important. So, okay, if we put the computation in here, 02:26:57.940 |
we get the negative thirteen. So, it's something to be 02:27:01.460 |
careful with and probably requires a little bit of 02:27:04.660 |
prompting, and maybe examples, in order to get 02:27:08.020 |
that smooth so that it does do things in the way that we might 02:27:12.740 |
expect or maybe we as humans are just bad and misuse the 02:27:17.140 |
systems one or the other. Okay, so now, we've gone through that 02:27:21.460 |
a few times. Let's go and see if our agent can still recall 02:27:24.420 |
our name. Okay and it remembers my name is James. Good. So, it 02:27:28.500 |
still has that memory in there as well. That's good. Let's 02:27:32.020 |
move on to another quick example where we're just going 02:27:35.220 |
to use Google Search. So, we're going to be using the 02:27:37.700 |
SerpAPI. You can get the API key that you need 02:27:43.540 |
from here, so serpapi.com slash users slash sign-in, and 02:27:48.340 |
just enter that in here. You get up to 100 02:27:52.900 |
searches per month for free, so just be aware of that if 02:27:58.100 |
you overuse it. I don't think they charge you, because I don't 02:28:01.300 |
think you enter your card details straight away, but yeah, 02:28:05.060 |
just be aware of that limit. Now, there are certain tools 02:28:10.180 |
that LangChain has already built for us. So, they're 02:28:12.740 |
pre-built tools and we can just load them using the load tools 02:28:15.860 |
function. So, we do that like so. We have our load tools and 02:28:19.300 |
we just pass in the SerpAPI tool only. We can pass in more 02:28:22.980 |
there if we want to, and then we also pass in our LLM. Now, I'm 02:28:27.940 |
going to one, use that tool but I'm also going to define my 02:28:31.700 |
own tool which is to get the current location based on the 02:28:35.380 |
IP address. Now, we're in Colab at the moment. So, 02:28:37.860 |
it's actually going to get the IP address for the Colab 02:28:40.340 |
instance that I'm currently on and we'll find out where that 02:28:43.380 |
is. So, that is going to get the IP address and then it's 02:28:47.620 |
going to provide the data back to our LLM in this format here: 02:28:50.820 |
latitude, longitude, city, and 02:28:53.060 |
country. Okay? We're also going to get the current date and time. 02:28:56.660 |
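A sketch of that setup, assuming the llm defined earlier. The ipinfo.io endpoint and the exact return format are illustrative assumptions, and depending on your version load_tools may live in langchain_community instead.

```python
from datetime import datetime

import requests
from langchain.agents import load_tools
from langchain_core.tools import tool

# Prebuilt SerpAPI tool (requires the SERPAPI_API_KEY environment variable).
serpapi_tools = load_tools(["serpapi"], llm=llm)


@tool
def get_location_from_ip() -> str:
    """Get the geographical location based on the current IP address."""
    # ipinfo.io is one option for IP geolocation; the endpoint here is an assumption.
    data = requests.get("https://ipinfo.io/json").json()
    latitude, longitude = data["loc"].split(",")
    return (
        f"latitude: {latitude}, longitude: {longitude}, "
        f"city: {data['city']}, country: {data['country']}"
    )


@tool
def get_current_datetime() -> str:
    """Get the current date and time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


tools = serpapi_tools + [get_location_from_ip, get_current_datetime]
```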
So now we're going to redefine our prompt. I'm not 02:29:02.500 |
going to include chat history here. I just want this to be 02:29:04.820 |
like a one-shot thing. I'm going to redefine our agent and 02:29:09.300 |
agent executor using our new tools, which is our SerpAPI tool plus 02:29:13.780 |
the get current date time and get location from IP. Then, 02:29:17.780 |
I'm going to invoke our agent executor with I have a few 02:29:20.900 |
questions. What is the date and time right now? How is the 02:29:23.780 |
weather where I am? And please give me degrees in Celsius, so 02:29:28.740 |
when it gives me that weather, it's in Celsius. Okay, let's see what we get. 02:29:33.780 |
Okay. So, apparently, we're in Council Bluffs in the US. It is 02:29:40.680 |
13 degrees Fahrenheit which I think is absolutely freezing. 02:29:44.440 |
Oh my gosh, it is. Yes, minus ten. So, it's super cold over 02:29:48.760 |
there. And you can see that, okay, it did give us 02:29:53.000 |
Fahrenheit. So, that's that is because the tool that we're 02:29:55.320 |
using provided us with Fahrenheit which is fine but it 02:29:59.960 |
did translate that over into a estimate of Celsius for us 02:30:03.800 |
which is pretty cool. So, let's actually output that. So, we 02:30:07.640 |
get this, which is correct, the US, and approximately this temperature, 02:30:13.640 |
and we also get a description of the conditions: partly 02:30:17.240 |
cloudy with 0% precipitation lucky for them and humidity of 02:30:23.720 |
66%. Okay. All pretty cool. So, that is it for this 02:30:27.800 |
introduction to Langchain Agents. As I mentioned, next 02:30:31.080 |
chapter, we're going to dive much deeper into Agents and 02:30:34.120 |
also implement that for Langchain version 0.3. So, 02:30:37.880 |
we'll leave this chapter here and jump into the next one. In 02:30:41.320 |
this chapter, we're going to be taking a deep dive into Agents 02:30:45.800 |
with LangChain, and we're going to be covering what an 02:30:50.840 |
agent is. We're going to talk a little bit conceptually about 02:30:55.640 |
agents, the React agent, and the type of agent that we're 02:30:59.320 |
going to be building and based on that knowledge, we are 02:31:02.120 |
actually going to build out our own agent execution logic 02:31:07.880 |
which we refer to as the agent executor. So, in comparison to 02:31:12.680 |
the previous video on agents in Langchain which is more of an 02:31:17.240 |
introduction, this is far more detailed. We'll be getting into 02:31:21.480 |
the weeds a lot more with both what agents are and also agents 02:31:26.200 |
within Langchain. Now, when we talk about agents, a 02:31:30.280 |
significant part of the agent is actually relatively simple 02:31:36.520 |
code logic that iteratively runs LLM calls and processes 02:31:44.040 |
their outputs, potentially running or executing tools. The 02:31:48.760 |
exact logic for each approach to building an agent will 02:31:53.400 |
actually vary pretty significantly, but we'll focus 02:31:57.560 |
on one of those which is the React agent. Now, React is a 02:32:03.160 |
very common pattern and although being relatively old 02:32:07.560 |
now, most of the tool agents that we see used by OpenAI and 02:32:13.320 |
essentially every LLM company, they all use a very similar 02:32:17.240 |
pattern. Now, the React agent follows a pattern like this. 02:32:20.920 |
Okay, so we would have our user input up here. Okay, so our 02:32:26.760 |
input here is a question, right? Aside from the Apple 02:32:29.160 |
Remote, what other device can control the program the Apple 02:32:31.720 |
Remote was originally designed to interact with? Now, probably 02:32:35.400 |
most LLMs would actually be able to answer this directly 02:32:37.640 |
now. This is from the paper, which was a few years back. Now, 02:32:42.600 |
in this scenario, assuming our LLM didn't already know the 02:32:46.360 |
answer, there are multiple steps an LLM or an agent might 02:32:50.280 |
take in order to find out the answer. Okay, so first of 02:32:55.000 |
those is, we say our question here is: what other device can 02:32:59.160 |
control the program the Apple Remote was originally designed 02:33:01.800 |
to interact with? So the first thing is, okay, what was the 02:33:05.240 |
program that the Apple Remote was originally designed to 02:33:07.800 |
interact with? That's the first question we have here. So what 02:33:12.360 |
we do is: I need to search Apple Remote and find the program 02:33:15.240 |
it was designed to interact with. This is a reasoning step. So the LLM is 02:33:18.840 |
reasoning about what it needs to do: I need to search for 02:33:22.040 |
that and find the relevant program. So we are taking an 02:33:26.200 |
action. This is a tool call here. Okay, so we're going to 02:33:29.480 |
use the search tool and our query will be Apple remote and 02:33:33.000 |
the observation is the response we get from executing that 02:33:36.120 |
tool. Okay, so the response here will be: the Apple Remote 02:33:39.000 |
is designed to control the Front Row media center program. So now 02:33:43.320 |
we know the program the Apple Remote was originally designed 02:33:45.720 |
to interact with. Now we're going to go through another 02:33:49.480 |
iteration. Okay, so this is one iteration of our reasoning 02:33:55.160 |
action, and observation. So when we're talking about ReAct 02:33:59.960 |
here, although again, this sort of pattern is very common 02:34:03.640 |
across many agents, when we're talking about ReAct, the name 02:34:07.880 |
actually comes from the 'Re' of 02:34:12.360 |
reasoning followed by 'Act' from action. Okay, so that's where the ReAct name 02:34:17.080 |
comes from. So this is one of our ReAct agent loops or 02:34:21.400 |
iterations. We're going to go and do another one. So next 02:34:25.000 |
step we have this information; the LLM is now provided with 02:34:27.640 |
it. Now we want to do a search for Front Row. 02:34:31.800 |
Okay, so we do that. This is the reasoning step. We perform 02:34:35.960 |
the action: search Front Row. Okay, tool: search, query: Front 02:34:40.680 |
Row. Observation, this is the response: Front Row is controlled 02:34:44.600 |
by an Apple Remote or keyboard function keys. Alright, cool. 02:34:50.120 |
So we know keyboard function keys are the other device that 02:34:53.880 |
we were asking about up here. So now we have all the 02:34:58.600 |
information we need. We can provide an answer to our user. 02:35:02.760 |
So we go through another iteration here reasoning and 02:35:07.240 |
action. Our reasoning is I can now provide the answer of 02:35:11.400 |
keyboard function keys to the user. Okay, great. So then we 02:35:16.440 |
use the answer tool. It's like final answer in more common 02:35:21.960 |
tool agent use and the answer would be keyboard function 02:35:27.000 |
keys, which we then output to our user. Okay, so that is the 02:35:33.720 |
ReAct loop. Okay, so looking at this, where are we actually 02:35:40.020 |
calling an LLM, and in what way are we actually calling an LLM? 02:35:44.820 |
So we have our reasoning step: our LLM is generating the text 02:35:50.900 |
here, right? The LLM is generating, okay, what should I 02:35:53.700 |
do? Then our LLM is going to generate the input parameters 02:35:59.620 |
to our action step here. Those input parameters and 02:36:05.460 |
the tool being used will be taken by our code logic, our 02:36:08.580 |
agent executor logic, and they will be used to execute some 02:36:11.940 |
code from which we will get an output. That output might be 02:36:16.180 |
taken directly to our observation or our LLM might 02:36:19.460 |
take that output and then generate an observation based 02:36:22.500 |
on that. It depends on how you've implemented everything. 02:36:27.380 |
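To make that loop concrete, here is a minimal, hypothetical sketch of the kind of code logic being described. The `llm.generate` helper and the `step.tool` / `step.args` fields are placeholders, not LangChain APIs; we build the real thing later in this chapter.

```python
# Pseudocode sketch of the ReAct-style loop described above. The names
# `llm.generate`, `step.tool`, and `step.args` are hypothetical placeholders.
def react_loop(llm, tools: dict, query: str, max_iterations: int = 3):
    scratchpad = []  # reasoning / action / observation steps so far
    for _ in range(max_iterations):
        step = llm.generate(query=query, scratchpad=scratchpad)  # reasoning + action
        if step.tool == "final_answer":
            return step.args["answer"]               # the agent is done
        observation = tools[step.tool](**step.args)  # execute the chosen tool
        scratchpad.append((step, observation))       # feed the observation back in
    return None  # iteration limit hit without a final answer
```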
So our LLM could potentially be being used at every single 02:36:32.660 |
step there and of course that will repeat through every 02:36:37.860 |
iteration. So we have further iterations down here. So you're 02:36:41.540 |
potentially using an LLM multiple times throughout this 02:36:44.740 |
whole process, which of course in terms of latency and token 02:36:48.020 |
cost, it does mean that you're going to be paying more for an 02:36:52.100 |
agent than you are with just a standard LLM, but that is of 02:36:55.940 |
course expected because you have all of these different 02:36:58.740 |
things going on. But the idea is that what you can get out of 02:37:02.820 |
an agent is of course much better than what you can get 02:37:05.780 |
out of an LLM alone. So when we're looking at all of this, 02:37:11.060 |
all of this iterative chain of thought and tool use, all this 02:37:16.260 |
needs to be controlled by what we call the agent executor, 02:37:19.380 |
which is our code logic, which is hitting our LLM, processing 02:37:23.380 |
its outputs, and repeating that process until we get to our 02:37:27.060 |
answer. So breaking that part down, what does it actually 02:37:30.900 |
look like? It looks kind of like this. So we have our user 02:37:34.900 |
input goes into our LLM, okay, and then we move on to the 02:37:39.540 |
reasoning and action steps. Is the action the answer? If it is 02:37:44.500 |
the answer, as we saw here with the answer action, 02:37:50.660 |
so true, we would just go straight to 02:37:54.180 |
our outputs. Otherwise, we're going to use our selected tool. 02:37:57.620 |
The agent executor is going to handle all this. It's going to 02:38:00.980 |
execute our tool, and then from that, we get our three 02:38:05.460 |
reasoning, action, and observation steps as inputs and outputs, and then 02:38:09.300 |
we're feeding all that information back into our LLM, 02:38:11.940 |
okay? In which case, we go back through that loop. So we 02:38:15.860 |
could be looping for a little while until we get to that 02:38:19.060 |
final output. Okay, so let's go across to the code. We're going 02:38:23.620 |
to be going into the agent executor notebook. We'll open 02:38:26.580 |
that up in Colab, and we'll go ahead and just install our 02:38:30.500 |
prerequisites. Nothing different here. It's just 02:38:34.820 |
LangChain and LangSmith, optionally, as before. Again, 02:38:38.980 |
optionally, the LangChain API key if you do want to use 02:38:41.540 |
LangSmith. Okay, and then we'll come down to our first 02:38:47.060 |
section, where it's going to define a few quick tools. I'm 02:38:51.220 |
not necessarily going to go through these because we've 02:38:54.660 |
already covered them in the agent introduction, but very 02:38:58.580 |
quickly, from langchain_core.tools we're just importing this tool 02:39:02.180 |
decorator, which transforms each of our functions here into 02:39:06.820 |
what we would call a structured tool object. This 02:39:10.740 |
thing here. Okay, which we can see. Let's just have a quick 02:39:14.660 |
look here, and then if we want to, we can extract all of the 02:39:18.820 |
key information from that structured tool using 02:39:21.860 |
these attributes here. So name, 02:39:24.180 |
description, and args_schema (via its model JSON schema), which give us 02:39:28.740 |
essentially how the LLM should use our function. Okay, so I'm 02:39:34.900 |
going to keep pushing through that. 02:39:40.660 |
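As a reminder of what that looks like, here is a small sketch of the tool decorator pattern; the exact tool functions in the notebook may differ, but the attributes shown are real LangChain ones.

```python
from langchain_core.tools import tool

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y' together."""
    return x + y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y' together."""
    return x * y

tools = [add, multiply]

# The decorator wraps each function in a StructuredTool carrying the metadata
# the LLM needs in order to call it.
print(add.name)         # "add"
print(add.description)  # the docstring
print(add.args)         # JSON-schema style description of the parameters
```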
Now, very quickly again, we did cover this in the intro video, so I don't want to 02:39:44.420 |
necessarily go over it again in too much detail, but our 02:39:48.580 |
agent executor logic is going to need this part. So we're 02:39:52.660 |
going to be getting a string from our LLM. We're going to be 02:39:55.780 |
loading that into a dictionary object, and we're going to be 02:39:59.060 |
using that to actually execute our tool, as we do here using 02:40:02.980 |
keyword arguments. Okay, like that. Okay, so with the tools 02:40:09.620 |
out of the way, let's take a look at how we create our 02:40:12.340 |
agent. So when I say agent here, I'm specifically talking 02:40:16.820 |
about the part that is generating our reasoning step, 02:40:21.460 |
then generating which tool and what the input parameters to 02:40:27.140 |
that tool will be. Then the rest of that is not actually 02:40:30.340 |
covered by the agent. Okay, the rest of that would be covered 02:40:33.380 |
by the agent execution logic, which would be taking the tool 02:40:37.140 |
to be used, the parameters, executing the tool, getting 02:40:41.220 |
the response, aka the observation, and then iterating 02:40:45.060 |
through that until the LLM is satisfied and we have enough 02:40:47.940 |
information to answer a question. So looking at that, 02:40:52.740 |
our agent will look something like this. It's pretty simple. 02:40:56.020 |
So we have our input parameters, including the chat 02:40:58.500 |
history and the user query, 02:41:01.780 |
and actually we would also have 02:41:04.900 |
any intermediate steps that have happened in here as well. We 02:41:08.500 |
have our prompt template, and then we have our LLM bound 02:41:12.340 |
with tools. So let's see how all this would look starting 02:41:16.500 |
with, we'll define our prompt template. So it's going to look 02:41:20.340 |
like this. We have our system message: you're a helpful 02:41:24.340 |
assistant; when answering a user's question, you should use one of 02:41:26.900 |
the tools provided; after using a tool, the tool output will be provided 02:41:29.380 |
in the scratchpad below, okay, which we're naming here; if you 02:41:33.860 |
have an answer in the scratchpad, you should not use any 02:41:36.580 |
more tools and instead answer directly to the user. Okay, so 02:41:40.420 |
we have that as our system message. We could obviously 02:41:43.300 |
modify that based on what we're actually doing. Then following 02:41:47.620 |
our system message, we're going to have our chat history, so any 02:41:50.420 |
previous interactions between the user and the AI. Then we 02:41:54.180 |
have our current message from the user, okay, which will be 02:41:57.860 |
fed into the input field there. And then following this, we 02:42:01.780 |
have our agent's scratch pad or the intermediate thoughts. So 02:42:05.140 |
this is where things like the LLM deciding, okay, this is what 02:42:09.540 |
I need to do. This is how I'm going to do it, aka the tool 02:42:12.900 |
call. And this is the observation. That's where all 02:42:16.020 |
of that information will be going, right? So each of those 02:42:18.980 |
you want to pass in as a message, okay? And the way that 02:42:23.380 |
will look is that any tool call generation from the LLM, so 02:42:28.020 |
when the LLM is saying, use this tool, please, that will be 02:42:31.780 |
an AI message. And then the responses from our tool, so the 02:42:37.140 |
observations, they will be returned as tool messages. 02:42:42.180 |
Great. So we'll run that to define our prompt template. 02:42:46.180 |
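For reference, a sketch of that prompt might look like the following; the exact system wording and variable names in the notebook may differ slightly.

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You're a helpful assistant. When answering a user's question you "
        "should first use one of the tools provided. After using a tool the "
        "tool output will be provided in the 'scratchpad' below. If you have "
        "an answer in the scratchpad you should not use any more tools and "
        "instead answer directly to the user."
    )),
    MessagesPlaceholder(variable_name="chat_history"),    # previous user/AI turns
    ("human", "{input}"),                                  # the current user query
    MessagesPlaceholder(variable_name="agent_scratchpad")  # tool calls + observations
])
```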
We're going to define our LLM. So we're going to be using 02:42:49.700 |
GPT-4o mini with a temperature of zero because we 02:42:54.100 |
want less creativity here, particularly when we're doing 02:42:56.820 |
tool calling. There's just no need for us to use a high 02:43:00.500 |
temperature here. So we need to enter our OpenAI API key, which 02:43:03.780 |
we would get from platform.openai.com. We enter this, 02:43:08.100 |
then we're going to continue and we're just going to add 02:43:11.140 |
tools to our LLM here, okay? These, and we're going to bind 02:43:18.180 |
them here. Then we have tool choice any. So tool choice any, 02:43:23.060 |
we'll see in a moment, I'll go through this a little bit more 02:43:25.860 |
in a second, but that's going to essentially force a tool 02:43:29.540 |
call. And you can also put required, which is actually a 02:43:32.420 |
bit more, it's a bit clearer, but I'm using any here, so I'll 02:43:36.500 |
stick with it. So these are our tools we're going through. We 02:43:40.100 |
have our inputs into the agent runnable. We have our prompt 02:43:44.980 |
template and then that will get fed into our LLM. So let's run 02:43:49.140 |
that. Now we would invoke the agent part of everything here 02:43:54.100 |
with this. Okay, so let's see what it outputs. This is 02:43:56.820 |
important. So I'm asking, what is 10 + 10? Obviously that should 02:44:00.420 |
use the addition tool, and we can actually see that happening. 02:44:03.620 |
So the agent message content is actually empty here. This is 02:44:07.940 |
where you'd usually get an answer, but if we go and have a 02:44:11.380 |
look, we have additional keyword args. In there we have 02:44:14.580 |
tool calls and then we have function arguments. Okay, so 02:44:19.060 |
we're calling a function. Arguments for that function are 02:44:22.020 |
this. Okay, so we can see this is string. Again, the way that 02:44:26.580 |
we would parse that is we do JSON loads and that becomes 02:44:29.620 |
dictionary and then we can see which function is being called 02:44:32.740 |
and it is the add function and that is all we need in order to 02:44:36.420 |
actually execute our function or our tool. Okay, we can see 02:44:42.740 |
it's a lot more detail here. Now, what do we do from here? 02:44:47.780 |
We're going to map the tool name to the tool function and 02:44:50.660 |
then we're just going to execute the tool function with 02:44:52.580 |
the generated args, i.e. those. I'll also just point out 02:44:57.380 |
quickly that here we are getting the dictionary 02:45:00.100 |
directly, whereas somewhere else 02:45:08.820 |
in here we're parsing this out ourselves; we don't necessarily need 02:45:11.300 |
to do that, because I think on the LangChain side they're 02:45:14.580 |
doing it for us. So we're already getting that, so the JSON 02:45:19.540 |
loads we don't necessarily need here. Okay, so we're just 02:45:22.900 |
creating this tool name to function mapping dictionary 02:45:26.660 |
here. So we're taking the well the tool names and we're just 02:45:30.420 |
mapping those back to our tool functions and this is coming 02:45:33.140 |
from our tools list. So that tools list that we defined 02:45:36.820 |
here. Okay, and we can even just see quickly that will 02:45:41.140 |
include everything or each of the tools we define there. 02:45:44.820 |
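A sketch of that mapping and execution step, assuming `tools` is the list of structured tools from earlier and `out` is the AIMessage returned by invoking the agent (LangChain has already parsed the arguments into a dictionary for us, so `json.loads` is only needed if you read the raw string arguments):

```python
# Map each tool's name back to its underlying Python function.
name2tool = {t.name: t.func for t in tools}

# Take the first tool call the LLM generated and execute it.
tool_call = out.tool_calls[0]  # e.g. {"name": "add", "args": {"x": 10, "y": 10}, "id": ...}
tool_out = name2tool[tool_call["name"]](**tool_call["args"])
print(tool_out)                # 20
```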
Okay, that's all it is. Now, we're going to execute using 02:45:49.860 |
our name to tool mapping. Okay, so this here will get us the 02:45:54.660 |
function. So we'll get us this function and then to that 02:45:58.580 |
function, we're going to pass the arguments that we 02:46:02.420 |
generated. Okay. Let's see what it looks like. Alright, so the 02:46:08.180 |
response to the observation is twenty. Now, we are going to 02:46:14.180 |
feed that back into our LLM using the tool message and 02:46:19.140 |
we're actually going to put a little bit of text around this 02:46:21.540 |
to make it a little bit nicer. We don't necessarily need to 02:46:24.420 |
do this, to be completely honest. We could just return 02:46:24.420 |
the answer directly; I don't even think 02:46:29.220 |
there would really be any difference. So, we could do 02:46:33.220 |
either. In some cases, that could be very useful. In other 02:46:40.020 |
cases, like here, it doesn't really make too much 02:46:42.340 |
difference, particularly because we have this tool call 02:46:44.980 |
ID and what this tool call ID is doing is it's being used by 02:46:48.660 |
OpenAI. It's being read by the LLM so that the LLM knows that 02:46:54.180 |
the response we got here is actually mapped back to the 02:46:59.940 |
tool execution that it's identified here because you see 02:47:04.020 |
that we have this ID. Alright, we have an ID here. The LLM is 02:47:08.020 |
going to see the ID. It's going to see the ID that we pass back 02:47:12.340 |
in here and it's going to see those two are connected. So, 02:47:14.900 |
you can see, okay, this is the tool I called and this is a 02:47:17.540 |
response I got from it. Because of that, you don't necessarily 02:47:20.740 |
need to say which tool you used here. You can. It depends on 02:47:25.620 |
what you're doing. Okay. So, what do we get here? We have, 02:47:32.580 |
okay, just running everything again. We've added our tool 02:47:35.780 |
call. So, that's the original AI message that includes, okay, 02:47:39.060 |
use that tool and then we have the tool execution, tool 02:47:41.940 |
message, which is the observation. We map those to 02:47:46.500 |
the agent scratchpad and then what do we get? We have an AI 02:47:49.540 |
message but the content is empty again, which is 02:47:52.420 |
interesting because we said to our LLM up here, if you have an 02:47:57.940 |
answer in the scratchpad, you should not use any more tools 02:48:01.140 |
and instead answer directly to the user. So, why is our LLM 02:48:07.860 |
not answering? Well, the reason for that is down here, we 02:48:13.620 |
specify tool choice equals any, which again, it's the same as 02:48:19.060 |
tool choice required, which is telling the LLM that it cannot 02:48:24.180 |
actually answer directly. It has to use a tool and I usually 02:48:28.900 |
do this, right? I would usually put tool choice equals any or 02:48:32.180 |
required and force the LLM to use a tool every single time. 02:48:37.780 |
So, then the question is, if it has to use a tool every time, 02:48:41.220 |
how does it answer our user? Well, we'll see in a moment. 02:48:47.220 |
First, I just want to show you the two options essentially 02:48:51.380 |
that we have. The second is what I would usually use but 02:48:53.700 |
let's start with the first. So, the first option is that we 02:48:57.700 |
set tool choice equal to auto and this tells the LLM that it 02:49:01.540 |
can either use a tool or it can answer the user directly using 02:49:06.580 |
the final answer or using that content field. So, if we run 02:49:11.460 |
that, like we're specifying tool choice as auto, we run 02:49:14.740 |
that, let's invoke, okay? Initially, you see, ah, wait, 02:49:20.100 |
there's still no content. That's because we didn't add 02:49:23.140 |
anything into the agent scratch pad here. There's no 02:49:25.460 |
information, right? It's all empty. Actually, it's empty 02:49:30.260 |
because, sorry, so here, you have the chat history that's 02:49:32.820 |
empty. We didn't specify the agent scratch pad and the 02:49:38.260 |
reason that we can do that is because we're using, if you 02:49:40.340 |
look here, we're using get. So, essentially, it's saying, 02:49:43.700 |
try and get agent scratch pad from this dictionary but if it 02:49:46.420 |
hasn't been provided, we're just going to give an empty 02:49:49.300 |
list. So, that's why we don't need to specify it 02:49:52.820 |
here. But that means that, oh, okay, the agent doesn't 02:49:56.980 |
actually know anything here. It hasn't used the tool yet. So, 02:50:01.300 |
we're going to just go through our iteration again, right? So, 02:50:04.020 |
we're going to get our tool output. We're going to use that 02:50:07.300 |
to create the tool message and then we're going to add our 02:50:11.380 |
tool call from the AI and the observation. We're going to 02:50:15.620 |
pass those to the agent scratch pad and this time, we'll see. 02:50:19.700 |
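A sketch of that second pass, assuming `agent` is the runnable we built, `query` is the user question, `out` is the AIMessage containing the tool call, and `tool_out` is the result of executing it:

```python
from langchain_core.messages import ToolMessage

# The tool_call_id ties this observation back to the specific call the LLM made.
tool_msg = ToolMessage(
    content=f"The {tool_call['name']} tool returned {tool_out}",
    tool_call_id=tool_call["id"],
)

out2 = agent.invoke({
    "input": query,
    "chat_history": [],
    "agent_scratchpad": [out, tool_msg],  # AI tool call + its observation
})
print(out2.content)  # with tool_choice="auto" we now get a direct answer
```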
We run that. Okay, now, we get the content, okay? So, now, it's 02:50:24.980 |
not calling. You see here, there's no tool call or 02:50:27.460 |
anything going on. We just get content. So, that is, this is a 02:50:34.260 |
standard way of doing or building a tool calling agent. 02:50:38.420 |
The other option which I mentioned, this is what I 02:50:40.740 |
usually go with. So, number two here, I would usually create a 02:50:45.700 |
final answer tool. So, why would we even do that? Why would we 02:50:53.140 |
create a final answer tool rather than just use this 02:50:55.380 |
method, which actually works perfectly well? So, why 02:50:59.140 |
would we not just use this? There are a few reasons. The 02:51:03.060 |
main ones are that with option two where we're forcing tool 02:51:07.620 |
calling, this removes the possibility of the agent using 02:51:11.940 |
that content field directly and the reason, at least, the 02:51:16.740 |
reason I found this good when building agents in the past is 02:51:19.620 |
that occasionally, when you do want to use a tool, it's 02:51:22.660 |
actually going to go with the content field and it can get 02:51:25.860 |
quite annoying and use the content field quite frequently 02:51:29.380 |
when you actually do want it to be using one of the tools and 02:51:34.100 |
this is particularly noticeable with smaller models. With 02:51:39.380 |
bigger models, it's not as common although it does still 02:51:42.740 |
happen. Now, the second thing that I quite like about using a 02:51:47.060 |
tool as your final answer is that you can enforce a 02:51:52.740 |
structured output in your answer. So, this is something 02:51:55.460 |
we saw in, I think, the first, yes, the first LangChain 02:52:00.100 |
example, where we were using the structured output feature of 02:52:05.060 |
LangChain, and what that actually is, the structured 02:52:08.260 |
output feature of LangChain, it's actually just a tool call, 02:52:11.700 |
right? So, it's forcing a tool call from your LLM. It's just 02:52:15.060 |
abstracted away so you don't realize that that's what it's 02:52:17.220 |
doing but that is what it's doing. So, I find that 02:52:22.020 |
structured outputs are very useful particularly when you 02:52:25.940 |
have a lot of code around your agent. So, when that output 02:52:30.420 |
needs to go downstream into some logic, that can be very 02:52:35.780 |
useful because you can, you have a reliable output format 02:52:40.420 |
that you know is going to be output and it's also incredibly 02:52:43.860 |
useful if you have multiple outputs or multiple fields that 02:52:47.860 |
you need to generate for. So, those can be very useful. Now, 02:52:53.780 |
to implement this, so to implement option two, we need 02:52:56.500 |
to create a final answer tool. As with our other tools, 02:53:02.020 |
we're actually going to provide a description, and you can or 02:53:05.860 |
you cannot do this. You can also just have the tool return 02:53:10.260 |
none and use the generated action as 02:53:16.340 |
essentially what you're going to send out of your agent 02:53:19.700 |
execution logic, or you can actually just execute the tool 02:53:23.700 |
and pass that information directly through. Perhaps, in 02:53:27.220 |
some cases, you might have some additional post-processing for 02:53:30.740 |
your final answer. Maybe you do some checks to make sure it 02:53:33.220 |
hasn't said anything weird. You could add that in this tool 02:53:37.300 |
here, but yeah, in this case, we're just going to pass those 02:53:41.060 |
through directly. 02:53:48.820 |
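A sketch of that final answer tool and the redefined agent; the docstring and field names are an approximation of what's described above, and `tools`, `prompt`, `llm`, and `name2tool` are assumed from the earlier sketches.

```python
from langchain_core.tools import tool

@tool
def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Use this tool to provide the final answer to the user, along with a
    list of the names of the tools you used to produce it."""
    # We just pass the generated arguments straight through; any checks or
    # post-processing on the answer could be added here instead.
    return {"answer": answer, "tools_used": tools_used}

# Add it to the name-to-tool mapping so our executor logic can call it.
name2tool["final_answer"] = final_answer.func

# Rebuild the agent, still forcing a tool call on every step.
agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
    }
    | prompt
    | llm.bind_tools(tools + [final_answer], tool_choice="any")
)
```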
So, let's run this. We've added the final answer tool to our name-to- 02:53:51.460 |
tool mapping, so our agent can now use it. We redefine our 02:53:56.100 |
agent, setting tool choice to any because we're forcing the 02:53:59.460 |
tool choice here and let's go with what is ten plus ten. See 02:54:04.180 |
what happens. Okay, we get this, right? We can also, one 02:54:08.900 |
thing, nice thing here is that we don't need to check is our 02:54:11.460 |
output in the content field or is it in the tool calls field? 02:54:14.500 |
We know it's going to be in the tool calls field because 02:54:16.500 |
we're forcing that tool use which is quite nice. So, okay, 02:54:19.860 |
we know we're using the add tool and these are the 02:54:22.500 |
arguments. Great. We go or go through that process again. 02:54:27.380 |
We're going to create our tool message and then we're going to 02:54:30.260 |
add those messages into our scratchpad or intermediate 02:54:33.460 |
steps, and then we can see again, ah, okay, the content field is 02:54:38.100 |
empty. That is expected. We're forcing tool use. There's no way that 02:54:42.580 |
this can have anything inside it, but then if we come 02:54:48.020 |
down here to our tool calls, nice. Final answer, answer, ten 02:54:54.100 |
plus ten equals twenty. Alright? We also have this. 02:54:58.820 |
Tools used. Where is tools used coming from? Okay, well, I 02:55:01.620 |
mentioned before that you can add additional things or 02:55:06.020 |
outputs when you're using this tool used for your final 02:55:09.700 |
answer. So, if you just come up here to here, you can see that 02:55:14.820 |
I asked the LLM to use that tools used field which I 02:55:18.980 |
defined here. It's a list of strings. Use this to tell me 02:55:23.140 |
what tools you use in your answer, right? So, I'm getting 02:55:26.260 |
the normal answer but I'm also getting this information as 02:55:28.900 |
well which is kind of nice. So, that's where that is coming 02:55:31.620 |
from. See that? Okay. So, we have our actual answer here and 02:55:36.260 |
then we just have some additional information, okay? 02:55:38.980 |
We've also defined a type here. It's just a list of strings 02:55:41.620 |
which is really nice. It's giving us a lot of control over 02:55:43.940 |
what we're outputting which is perfect. That's, you know, when 02:55:46.580 |
you're building with agents, the biggest problem in most 02:55:52.340 |
cases is control of your LLM. So, here, we're getting a 02:55:58.100 |
honestly pretty unbelievable amount of control over what our 02:56:02.740 |
LLM is going to be doing which is perfect for when you're 02:56:07.060 |
building in the real world. So, this is everything that we 02:56:12.580 |
need. This is our answer and we would of course be passing 02:56:15.460 |
that downstream into whatever logic our AI application would 02:56:22.020 |
be using, okay? So, maybe that goes directly to a front end 02:56:26.020 |
and we're displaying this as our answer and we're maybe 02:56:29.460 |
providing some information about, okay, where did this 02:56:31.780 |
answer come from or maybe there's some additional steps 02:56:34.980 |
downstream where we're actually doing some more processing or 02:56:39.060 |
transformations but yeah, we have that. That's great. Now, 02:56:43.540 |
everything we've just done here, we've been executing 02:56:45.940 |
everything one by one and that's to help us understand 02:56:50.980 |
what process we go through when we're building an agent 02:56:55.220 |
executor. But we're not going to want to do that all the time, 02:57:00.500 |
are we? Most of the time, we probably want to abstract all 02:57:04.180 |
this away and that's what we're going to do now. So, we're 02:57:07.860 |
going to build essentially everything we've just taken. 02:57:11.140 |
We're going to abstract that and abstract it away into a 02:57:15.220 |
custom agent executor class. So, let's have a quick look at 02:57:20.020 |
what we're doing here. Although it's literally just 02:57:22.340 |
what we just did, okay? So, custom agent executor. We 02:57:27.860 |
initialize it. We set this max iterations; I'll talk about 02:57:31.060 |
this in a moment. Initializing it is also going to set our 02:57:34.820 |
chat history to just being empty. Okay, good. So, it's a 02:57:38.980 |
new agent. There should be no chat history in this case. Then 02:57:42.180 |
we actually define our agent, right? So, that part of logic 02:57:45.380 |
that is going to be taking our inputs and generating what to 02:57:48.900 |
do next aka what tool call to do, okay? And we set everything 02:57:53.460 |
as attributes of our class and then we're going to define an 02:57:58.020 |
invoke method. This invoke method is going to take an 02:58:02.420 |
input which is just a string. So, it's going to be our 02:58:04.500 |
message from the user and what it's going to do is it's going 02:58:09.460 |
to iterate through essentially everything we just did, okay? 02:58:14.980 |
Until we hit the final answer tool, okay? So, well, 02:58:18.820 |
what does that mean? We have our tool call, right? Which is 02:58:23.780 |
we're just invoking our agent, right? So, it's going to 02:58:26.980 |
generate what tool to use and what parameters should go into 02:58:29.700 |
that, okay? And that's an AI message. So, we would append 02:58:35.460 |
that to our agent stretch pad and then we're going to use the 02:58:38.820 |
information from our tool call. So, the name of the tool and 02:58:42.020 |
the args and also the ID. We're going to use all of that 02:58:45.860 |
information to execute our tool and then provide the 02:58:51.140 |
observation back to our LLM, okay? So, execute our tool here. 02:58:55.860 |
We then format the tool output into a tool message. See here 02:59:00.580 |
that I'm just using the output directly. I'm not adding 02:59:03.620 |
that additional information there. We do need to always 02:59:08.180 |
pass in the tool call ID so that our LLM knows which output 02:59:12.900 |
is mapped to which tool. I didn't mention this before in 02:59:16.580 |
this video at least but that is that's important when we have 02:59:19.380 |
multiple tool calls happening in parallel because that can 02:59:22.500 |
happen. When we have multiple tool calls happening in 02:59:25.220 |
parallel, let's say we have ten tool calls, all those 02:59:28.100 |
responses might come back at different times. So, then the 02:59:31.380 |
order of those can get messed up. So, we wouldn't necessarily 02:59:35.780 |
always see that it's a AI message beginning a tool call 02:59:41.060 |
followed by the answer to that tool call. Instead, it might be 02:59:44.900 |
AI message followed by like ten different tool call responses. 02:59:49.620 |
So, you need to have those IDs in there, okay? So, then we 02:59:54.260 |
pass our tool output back to our Agent Scratchpad or 02:59:58.660 |
intermediate steps. I'm sending a print in here so that we can 03:00:02.500 |
see what's happening whilst everything is running. Then we 03:00:05.060 |
increment this count number. We'll talk about that in a 03:00:08.580 |
moment. So, coming past that, we say, okay, if the tool name 03:00:12.660 |
here is final answer, that means we should stop, okay? So, 03:00:18.580 |
once we get the final answer, that means we can actually 03:00:20.980 |
extract our final answer from the final tool call, okay? And 03:00:25.940 |
in this case, I'm going to say that we're going to extract the 03:00:31.220 |
answer from the tool call or the observation. We're going to 03:00:35.300 |
extract the answer that was generated. We're going to pass 03:00:38.260 |
that into our chat history. So, we're going to have our user 03:00:41.860 |
message. This is the one the user came up with followed by 03:00:45.380 |
our answer which is just the natural answer field and that's 03:00:49.700 |
simply an AI message. But then we're actually going to be 03:00:52.660 |
including all of the information. So, this is the 03:00:55.780 |
answer, the natural language answer, and also the tools used 03:01:01.220 |
output. We're going to be feeding all of that out to some 03:01:04.900 |
downstream process as preferred. So, we have that. Now, 03:01:10.900 |
one thing that can happen if we're not careful is that our 03:01:15.460 |
agent executor may run many, many times and particularly if 03:01:20.660 |
we've done something wrong in our logic because we're 03:01:23.140 |
building these things, it can happen that maybe we've not 03:01:26.980 |
connected the observation back up into our agent executor 03:01:32.260 |
logic and in that case, what we might see is our agent 03:01:34.980 |
executor runs again and again and again and I mean, that's 03:01:38.020 |
fine. We're going to stop it but if we don't realize 03:01:42.020 |
straight away and we're doing a lot of LLM calls that can get 03:01:44.980 |
quite expensive quite quickly. So, what we can do is we can 03:01:49.060 |
set a limit, right? So, that's what we've done up here with 03:01:51.220 |
this max iterations. We said, okay, if we go past three max 03:01:54.740 |
iterations by default, I'm going to say stop, alright? So, 03:01:58.660 |
that's why we have the count here. While count is less than 03:02:02.820 |
the max iterations, we're going to keep going. Once we hit the 03:02:06.820 |
number of max iterations, we stop, okay? So, the while loop 03:02:09.860 |
will just stop looping, okay? So, it just protects us in case 03:02:14.900 |
of that and it also potentially maybe at some point, your agent 03:02:19.140 |
might be doing too much to answer a question. So, this 03:02:22.260 |
will force it to stop and just provide an answer. Although, if 03:02:25.860 |
that does happen, I just realized there's a bit of a 03:02:28.980 |
fault in the logic here. If that does happen, we wouldn't 03:02:31.940 |
necessarily have the answer here, right? So, we'd probably 03:02:35.700 |
want to handle that nicely but in this scenario, it's a very 03:02:40.260 |
simple use case. We're not going to see that happening. So, 03:02:44.260 |
we initialize our custom agent executor and then we invoke it, 03:02:50.740 |
okay? And let's see what happens. Alright, there we go. 03:02:54.340 |
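Pulling together everything we just walked through, a condensed sketch of that class might look like this (assuming the `agent` runnable and `name2tool` mapping from the earlier sketches; the notebook's actual implementation may differ in the details):

```python
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

class CustomAgentExecutor:
    def __init__(self, max_iterations: int = 3):
        self.chat_history = []              # new agent, so no history yet
        self.max_iterations = max_iterations
        self.agent = agent                  # the prompt | llm.bind_tools(...) runnable

    def invoke(self, input: str) -> dict | None:
        count = 0
        agent_scratchpad = []
        while count < self.max_iterations:
            # 1) The agent decides which tool to call and with what arguments.
            out = self.agent.invoke({
                "input": input,
                "chat_history": self.chat_history,
                "agent_scratchpad": agent_scratchpad,
            })
            agent_scratchpad.append(out)

            # 2) Execute the tool (assuming a single tool call per step).
            tool_call = out.tool_calls[0]
            tool_out = name2tool[tool_call["name"]](**tool_call["args"])

            # 3) Feed the observation back in, tied to the call by its id.
            agent_scratchpad.append(
                ToolMessage(content=str(tool_out), tool_call_id=tool_call["id"])
            )
            print(f"{count}: {tool_call['name']}({tool_call['args']})")
            count += 1

            # 4) Stop once the final_answer tool has been used.
            if tool_call["name"] == "final_answer":
                self.chat_history.extend([
                    HumanMessage(content=input),
                    AIMessage(content=tool_call["args"]["answer"]),
                ])
                return tool_call["args"]    # natural-language answer + tools_used
        return None  # iteration limit hit without a final answer

agent_executor = CustomAgentExecutor()
agent_executor.invoke("What is 10 + 10?")
```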
So, that just wrapped everything into a single invoke. 03:03:00.740 |
So, everything is handled for us. We could say, okay, what is 03:03:05.220 |
ten plus ten, or we can modify that and ask a multiplication instead, for example, 03:03:12.260 |
and that will go through. We'll use the multiply tool instead 03:03:15.060 |
and then we'll come back to the final answer again, okay? So, 03:03:18.420 |
we can see that with this custom agent executor, we've 03:03:22.580 |
built an agent and we have a lot more control over everything 03:03:27.060 |
that is going on in here. One thing that we would probably 03:03:33.300 |
need to add in this scenario is right now, I'm assuming that 03:03:36.500 |
only one tool call will happen at once and it's also why I'm 03:03:39.460 |
asking here. I'm not asking a complicated question because I 03:03:42.500 |
don't want it to go and try and execute multiple tool calls at 03:03:46.340 |
once which can happen. So, let's just try this. Okay. So, 03:03:52.660 |
this is actually completely fine. So, this did just execute 03:03:55.620 |
it one after the other. So, you can see that when asking this 03:04:00.500 |
more complicated question, it first did the exponentiate tool 03:04:05.300 |
followed by the add tool and then it actually gave us our 03:04:07.620 |
final answer which is cool. Also told us we use both of 03:04:11.540 |
those tools which it did but one thing that we should just 03:04:16.420 |
be aware of is that from OpenAI, OpenAI can actually 03:04:20.420 |
execute multiple tool calls in parallel. So, by specifying 03:04:24.980 |
that we're just taking index zero here, the first tool call, we're actually assuming 03:04:28.660 |
that we're only ever going to be calling one tool at any one 03:04:32.420 |
time which is not always going to be the case. So, you'd 03:04:35.140 |
probably need to add a little bit of extra logic there in 03:04:37.380 |
case of scenarios if you're building an agent that is 03:04:41.300 |
likely to be running parallel tool calls. But yeah, you can 03:04:45.060 |
see here actually it's completely fine. So, it's 03:04:47.620 |
running one after the other. Okay. So, with that, we built 03:04:51.140 |
our agent executor. I know there's a lot to that and of 03:04:55.860 |
course, you can just use the very abstract agent executor 03:04:59.060 |
in LangChain, but I think it's very good to understand what is 03:05:03.140 |
actually going on to build our own agent executor in this 03:05:06.420 |
case and it sets you up nicely for building more complicated 03:05:10.500 |
or use case specific agent logic as well. So, that is it 03:05:17.300 |
for this chapter. In this chapter, we're going to be 03:05:20.180 |
taking a look at the LangChain Expression Language (LCEL). We'll be 03:05:23.460 |
looking at the runnables, the serial and parallel execution of 03:05:27.940 |
those, the runnable passthrough, and essentially how we 03:05:32.500 |
use LCEL to its full capacity. Now, to do that well, what I 03:05:38.900 |
want to do is actually start by looking at the traditional 03:05:42.820 |
approach to building chains in LangChain. So, to do that, 03:05:48.260 |
we're going to go over to the LCEL chapter and open that 03:05:51.860 |
up in Colab. Okay. So, let's come down. We'll do the 03:05:56.900 |
prerequisites. As before, nothing major in here. The one 03:06:00.820 |
thing that is new is docarray, because later on, as you'll 03:06:04.180 |
see, we're going to be using this as an example of the 03:06:08.980 |
parallel capabilities in LCEL. If you want to use LangSmith, 03:06:13.620 |
you just need to add in your LangChain API key. Okay. And 03:06:16.820 |
then let's, okay. So, now, let's dive into the traditional 03:06:20.980 |
approach to chains in LangChain. So, the LLMChain, I 03:06:27.540 |
think, is probably one of the first things introduced in 03:06:30.420 |
LangChain, if I'm not wrong. This takes a prompt and feeds 03:06:33.780 |
it into an LLM and that's it. You can also, you can add 03:06:39.540 |
like output parsing to that as well but that's optional. I 03:06:44.260 |
don't think we're going to cover it here. So, what that 03:06:47.860 |
might look like is we have, for example, this prompt 03:06:50.340 |
template here. Give me a small report on topic. Okay. So, 03:06:54.420 |
that would be our prompt template. We'd set up as we 03:06:57.860 |
usually do with the prompt templates as we've seen 03:07:01.540 |
before. We then define our LLM. We need our API key for 03:07:08.180 |
this which as usual, we would get from platform.openai.com. 03:07:14.020 |
Then, we go ahead. I'm just showing you that you can invoke 03:07:18.580 |
the LLM there. Then, we go ahead and actually define an output 03:07:23.460 |
parser. So, we do do this. I wasn't sure we did, but we will. 03:07:26.740 |
We then define our LLMChain like this. Okay. So, LLMChain, with 03:07:31.220 |
our prompt, our LLM, and our output parser. Okay. This 03:07:36.740 |
is the traditional approach. So, I would then say, okay, 03:07:42.660 |
retrieval augmented generation, and what it's going to do is 03:07:44.820 |
it's going to give me a little report back on RAG. Okay. 03:07:49.620 |
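A sketch of that traditional LLMChain setup (assuming an OpenAI API key is already set; LLMChain still works but is deprecated):

```python
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = PromptTemplate.from_template("Give me a small report about {topic}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
output_parser = StrOutputParser()

# The deprecated, pre-LCEL way of gluing these together.
chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)
result = chain.invoke({"topic": "retrieval augmented generation"})
```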
It takes a moment but you can see that that's what we get 03:07:51.940 |
here. We can format that nicely as we usually do and we get, 03:07:57.780 |
okay, look, we get a nice little report. However, the LLMChain 03:08:01.620 |
is, one, quite restrictive, right? We have to 03:08:05.380 |
have particular parameters that have been predefined as 03:08:09.220 |
being usable, which is, you know, restrictive, and it's also 03:08:13.060 |
been deprecated. So, you know, this isn't the standard way of 03:08:17.620 |
doing this anymore but we can still use it. However, the 03:08:21.700 |
preferred method to building this and building anything else 03:08:25.140 |
really, or chains in general in LangChain, is using LCEL, right? 03:08:29.540 |
And it's super simple, right? So, we just actually take the 03:08:32.100 |
prompt LLM and output parser that we had before and then we 03:08:35.060 |
just chain them together with these pipe operators. So, the 03:08:38.420 |
pipe operator here is saying, take what is output from here 03:08:41.860 |
and input it into here. Take what is output from here and 03:08:45.380 |
put it into here. That's all it does. It's super simple. So, 03:08:49.700 |
put those together and we invoke it in the same way and 03:08:52.820 |
we'll get the same output, okay? And that's what we get. 03:08:58.500 |
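The same three components, but chained with the LCEL pipe operator instead of LLMChain:

```python
# The output of each component is piped in as the input to the next.
lcel_chain = prompt | llm | output_parser

result = lcel_chain.invoke({"topic": "retrieval augmented generation"})
```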
There is actually a slight difference on what we're 03:09:01.220 |
getting out from there. You can see here we got actually a 03:09:04.500 |
dictionary but that is pretty much the same, okay? So, we get 03:09:09.460 |
that and as before, we can display that in Markdown with 03:09:14.260 |
this, okay? So, we saw just now that we have this pipe 03:09:18.100 |
operator here. It's not really standard Python syntax to use 03:09:26.260 |
this or at least it's definitely not common. It's an 03:09:29.940 |
aberration of the intended use of Python, I think. But anyway, 03:09:35.380 |
it does, it looks cool and when you understand it, I kinda get 03:09:41.460 |
why they do it because it does make things quite simple in 03:09:44.260 |
comparison to what it could be otherwise. So, I kinda get it. 03:09:47.860 |
It's a little bit weird, but it's what they're doing, and since I'm 03:09:51.060 |
teaching it, that's what we're going to learn. So, 03:09:55.780 |
what is that pipe operator actually doing? Well, it's as I 03:10:04.020 |
mentioned, it's taking the output from this, putting it as 03:10:06.340 |
input into whatever is on the right, but how does that 03:10:10.260 |
actually work? Well, let's actually implement it 03:10:14.580 |
ourselves without line chain. So, we're going to create this 03:10:17.380 |
class called Runnable. This class, when we initialize it, 03:10:20.580 |
it's going to take a function, okay? So, this is literally a 03:10:23.460 |
Python function. It's going to take that and it's going to 03:10:28.180 |
essentially turn it into what we would call a Runnable in 03:10:31.780 |
line chain and what does that actually mean? Well, it doesn't 03:10:34.740 |
really mean anything. It just means that when you use run the 03:10:40.180 |
invoke method on it, it's going to call that function in the 03:10:43.140 |
way that you would have done otherwise, alright? So, using 03:10:46.340 |
just function, you know, brackets, open, parameters, 03:10:50.100 |
brackets, close. It's going to do that but it's also going to 03:10:53.460 |
add this method, the or method, which is __or__ in 03:10:59.060 |
typical Python syntax. Now, this or method is essentially 03:11:03.620 |
going to take your Runnable function, the one that you 03:11:07.140 |
initialize with and it's also going to take an other 03:11:10.900 |
function, okay? This other function is actually going to 03:11:14.260 |
be a Runnable, I believe. Yes, it's going to be a Runnable 03:11:17.860 |
just like this and what it's going to do is it's going to 03:11:22.180 |
run this Runnable based on the output of your current 03:11:28.020 |
Runnable, okay? That's what this or method is going to do. Seems a 03:11:32.340 |
bit weird maybe but I'll explain in a moment. We'll see 03:11:35.380 |
why that works. So, I'm going to chain a few functions 03:11:39.540 |
together using this or method. So, first, we're just 03:11:44.660 |
going to turn them all into Runnables, okay? So, these are 03:11:47.620 |
normal functions as you can see, normal Python functions. 03:11:50.660 |
We then turn them into this Runnable using our Runnable 03:11:53.380 |
class. Then, look what we can do, right? So, we're going to 03:11:59.460 |
create a chain that is going to be our Runnable chained with 03:12:05.460 |
another Runnable chained with another Runnable, okay? Let's 03:12:09.140 |
see what happens. So, we're going to invoke that chain of 03:12:12.500 |
Runnables with three. So, what is this going to do? Okay, we 03:12:17.540 |
start with three. We're going to add five to three. So, we'll 03:12:21.220 |
get eight. Then, we're going to subtract five from eight to 03:12:25.940 |
give us three again and then we're going to multiply three 03:12:32.420 |
by five to give us fifteen and we can invoke that and we get 03:12:37.860 |
fifteen, okay? Pretty cool. So, that is interesting. How does 03:12:43.780 |
that relate to the pipe operator? Well, that pipe 03:12:48.020 |
operator in Python is actually a shortcut for the or method. 03:12:52.820 |
So, what we just implemented is the pipe operator. So, we can 03:12:56.980 |
actually run that now with the pipe operator here and we'll 03:13:00.660 |
get the same. We'll get fifteen, right? So, that's that's 03:13:03.540 |
what LangChain is doing. Like, under the hood, that is what 03:13:06.900 |
that pipe operator is. It's just chaining together these 03:13:10.500 |
multiple Runnables as we'd call them using their own internal 03:13:14.740 |
or operator, okay? Which is cool. I will give them that. 03:13:19.140 |
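A minimal sketch of that idea — a tiny Runnable class whose `__or__` method chains functions together, which is essentially what the LCEL pipe operator does (the notebook's version may differ slightly in how it wraps the chained function):

```python
class Runnable:
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Return a new Runnable that feeds our output into `other`.
        return Runnable(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.func(x)

def add_five(x): return x + 5
def sub_five(x): return x - 5
def mul_five(x): return x * 5

chain = Runnable(add_five) | Runnable(sub_five) | Runnable(mul_five)
print(chain.invoke(3))   # (3 + 5 - 5) * 5 = 15

# LangChain's RunnableLambda is the same idea:
from langchain_core.runnables import RunnableLambda
lc_chain = RunnableLambda(add_five) | RunnableLambda(sub_five) | RunnableLambda(mul_five)
print(lc_chain.invoke(3))  # 15
```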
It's kind of a cool way of doing this. It's creative. I 03:13:22.340 |
wouldn't have thought about it myself. So, yeah, that is a 03:13:27.620 |
pipe operator. Then, we have these Runnable things, okay? So, 03:13:31.300 |
this is different to the Runnable I just defined 03:13:34.020 |
here. That one we defined ourselves; it's not a 03:13:37.220 |
LangChain thing, we didn't get it from LangChain. Instead, 03:13:42.180 |
this RunnableLambda object here, that is actually exactly 03:13:48.100 |
the same as what we just defined, alright? So, what we 03:13:50.740 |
did here with our Runnable, this RunnableLambda is the same 03:13:57.140 |
thing but in LangChain, okay? So, if we use that, okay? We 03:14:01.780 |
use that to now define three Runnables from the functions 03:14:06.100 |
that we defined earlier. We can actually chain those together 03:14:09.300 |
now using the pipe operator. You could also chain 03:14:12.820 |
them together if you want with the or operator, right? So, we 03:14:18.740 |
could do what we did earlier. We can invoke that, okay? Or, as 03:14:24.340 |
we were doing originally, we use the pipe operator. Exactly 03:14:28.580 |
the same. So, this RunnableLambda from LangChain is just 03:14:31.620 |
what we just built with our Runnable. Cool. So, we have 03:14:35.540 |
that. Now, let's try and do something a little more 03:14:38.820 |
interesting. We're going to generate a report and then edit it with some custom 03:14:43.140 |
functionality, okay? So, give me a small report about topic, 03:14:47.140 |
okay? We'll go through here. We're going to get our report 03:14:51.780 |
on AI, okay? So, we have this. You can see that AI is 03:14:57.540 |
mentioned many times in here. Then, we're going to take a 03:15:04.820 |
very simple function, right? So, I'm just going to extract 03:15:07.700 |
the fact. This is basically going to take, what is it, see, it's taking 03:15:12.260 |
everything after the first split. Okay. So, we're actually trying to remove the 03:15:17.300 |
introduction here. I'm not sure if this actually will work as 03:15:20.740 |
expected but it's it's fine. Try it anyway but then more 03:15:27.620 |
importantly, we're going to replace this word, okay? So, 03:15:30.500 |
we're going to replace an old word with a new word. Our old 03:15:32.820 |
word is going to be AI. Our new word is going to be Skynet, 03:15:35.700 |
okay? So, we can wrap both of these functions as Runnable 03:15:40.820 |
Lambdas, okay? We can add those as additional steps inside our 03:15:45.380 |
entire chain, alright? So, we're going to extract, try and 03:15:48.900 |
remove the introduction although I think it needs a bit 03:15:51.540 |
more processing than just splitting here and then we're 03:15:55.060 |
going to replace the word. We need that actually to be AI. 03:16:01.540 |
Okay. So, now we get Artificial Intelligence Skynet refers to 03:16:07.200 |
the simulation of human intelligence processes by 03:16:09.040 |
machines and then we have narrow Skynet, weak Skynet, and 03:16:13.360 |
strong Skynet. Applications of Skynet. Skynet technology is 03:16:17.600 |
being applied in numerous fields including all these 03:16:19.760 |
things. Scary. Despite its potential, Skynet poses several 03:16:24.800 |
challenges. Systems can perpetuate existing biases. It 03:16:29.680 |
raises significant privacy concerns. It can be exploited 03:16:34.160 |
for malicious purposes, okay? So, we have all these, you know, 03:16:38.800 |
it's just a silly little example. We can see also the 03:16:41.440 |
introduction didn't work here. The reason for that is because 03:16:44.400 |
our introduction includes multiple new lines here. So, I 03:16:48.400 |
would actually, if I want to remove the introduction, we 03:16:51.280 |
should remove it from here, I think. This is something I would never 03:16:56.240 |
actually recommend you do, because it's not very 03:17:00.960 |
flexible. It's not very robust, but it's just so I can show you that 03:17:06.640 |
that is actually working. So, this extract fact runnable, 03:17:10.560 |
right? So, now we're essentially just removing the 03:17:13.840 |
introduction, right? Why would we want to do that? I don't 03:17:17.440 |
know but it's there just so you can see that we can have 03:17:20.880 |
multiple of these runnable operations running and they 03:17:24.880 |
can be whatever you want them to be. Okay, it is worth 03:17:28.400 |
knowing that the inputs to our functions here were all single 03:17:32.880 |
arguments, okay? If you have a function that accepts 03:17:37.280 |
multiple arguments, you can handle that in a few ways; this is the way I would 03:17:40.080 |
probably do it. One of the 03:17:44.000 |
ways that you can do that is to actually write your function to 03:17:48.320 |
accept multiple values but pass them through a 03:17:50.800 |
single argument. So, just a single x, which would be 03:17:53.600 |
like a dictionary or something, and then just unpack them 03:17:56.560 |
within the function and use them as needed. That's just, 03:17:59.040 |
you know, one way you can do it. Now, we also have these 03:18:02.000 |
different runnable objects that we can use. So, here we have 03:18:06.080 |
runnable parallel and runnable pass-through. It's kind of 03:18:10.480 |
self-explanatory to some degree. So, let me just go 03:18:13.680 |
through those. So, runnable parallel allows you to run 03:18:17.360 |
multiple runnable instances in parallel. Runnable pass-through 03:18:23.040 |
may be less self-explanatory, allows us to pass a variable 03:18:26.880 |
through to the next runnable without modifying it, okay? So, 03:18:30.960 |
let's see how they would work. So, we're going to come down 03:18:33.600 |
here and we're going to set up these two docarray in-memory stores, or 03:18:37.280 |
essentially, two sources of information, and we're going to 03:18:42.080 |
need our LLM to pull information from both of these sources of 03:18:46.560 |
information in parallel, which is going to look like this. So, 03:18:49.600 |
we have these two sources of information, vector store A and 03:18:53.440 |
vector store B; this is our docarray A and docarray B. These 03:18:58.960 |
are both going to be fed in as context into our prompt. Then, 03:19:02.960 |
our LLM is going to use all of that to answer the question. 03:19:07.520 |
Okay. So, to actually implement that, we have our, we need an 03:19:12.080 |
embedding model. So, use OpenAI embeddings. We have our 03:19:15.520 |
vector store A, vector store B. They're not, you know, real, 03:19:19.440 |
full-on vector stores here. We're just 03:19:22.480 |
passing in a very small amount of information to both. So, 03:19:26.320 |
we're saying, okay, we're going to create an in-memory vector 03:19:30.400 |
store using these two bits of information. So, say half 03:19:33.680 |
the information is here; this would be an irrelevant piece of 03:19:36.000 |
information. Then, we have the relevant information which is 03:19:38.800 |
DeepSeek V3 was released in December 2024. Okay. Then, we're 03:19:44.160 |
going to have some other information in our other vector 03:19:46.960 |
store. Again, irrelevant piece here and relevant piece here. 03:19:51.200 |
Okay. The DeepSeek V3 LLM is a mixture of experts model with 03:19:55.840 |
671 billion parameters at its largest. Okay. So, based on 03:20:02.160 |
that, we're also going to build this prompt string. So, we're 03:20:04.960 |
going to pass in both of those contexts into our prompt. Now, 03:20:07.840 |
I'm going to ask a question. We don't actually need, we don't 03:20:12.320 |
need that bit and actually, we don't even need that bit. What 03:20:16.000 |
am I doing? So, we just need this. So, we have the both the 03:20:19.040 |
contexts and we would run them through our prompt template. 03:20:23.520 |
Okay. So, we have our system prompt template which is this 03:20:28.240 |
and then we're just going to have, okay, our question is 03:20:30.160 |
going to go into here as a user message. Cool. So, we have that 03:20:35.120 |
and then, let me make this easier to read. We're going to 03:20:40.640 |
convert both of those to retrievers which just means we 03:20:43.440 |
can retrieve stuff from them and we're going to use this 03:20:46.800 |
runnable parallel to run both of these in parallel, right? So, 03:20:54.240 |
these have been both being run in parallel but then we're also 03:20:56.960 |
running our question in parallel because this needs to 03:20:58.880 |
be essentially passed through this component without us 03:21:03.600 |
modifying anything. So, when we look at this here, it's almost 03:21:07.680 |
like, okay, this section here would be our runnable parallel 03:21:12.960 |
and these are being run in parallel but also our query is 03:21:17.600 |
being passed through. So, it's almost like there's another 03:21:20.480 |
line there which is our runnable pass through, okay? So, 03:21:22.880 |
that's what we're doing here. These are running in parallel. 03:21:25.920 |
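A sketch of that parallel retrieval step, assuming `vecstore_a` and `vecstore_b` are the two in-memory vector stores created above and `prompt` / `llm` are the prompt template and chat model (the placeholder names `context_a`, `context_b`, and `question` are assumptions about the prompt):

```python
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

retriever_a = vecstore_a.as_retriever()
retriever_b = vecstore_b.as_retriever()

retrieval = RunnableParallel({
    "context_a": retriever_a,           # both retrievers run in parallel
    "context_b": retriever_b,
    "question": RunnablePassthrough(),  # the query passes through unchanged
})

chain = retrieval | prompt | llm
out = chain.invoke("What architecture does the model DeepSeek released in December use?")
print(out.content)
```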
One of them is a pass-through. I need to run here. I just 03:21:34.480 |
realized we're using the deprecated embeddings import here. Just 03:21:38.800 |
switch it to this one, from langchain-openai. We run that, run 03:21:44.160 |
this, run that, and now this is set up, okay? So, we then put 03:21:54.320 |
our initial step together. So, this, using our runnable parallel and runnable 03:21:58.320 |
pass through, is our initial step. We then have our 03:22:02.240 |
prompt and the LLM, all chained together with the 03:22:06.960 |
usual, you know, the usual pipe operator, okay? And now, we're 03:22:11.680 |
going to invoke a question: what architecture does the model 03:22:14.160 |
DeepSeek released in December use, okay? So, for the LLM to 03:22:18.880 |
answer this question, it's going to need 03:22:21.840 |
the information about the DeepSeek model that was released 03:22:24.640 |
in December, which we have specified in one half here, and 03:22:30.800 |
then it also needs to know what architecture that model uses 03:22:33.280 |
which is defined in the other half over here, okay? So, let's 03:22:39.040 |
run this, okay? There we go. The DeepSeek V3 model released in 03:22:45.040 |
December 2024 is a mixture of experts model with 671 billion 03:22:49.840 |
parameters, okay? So, a mixture of experts and this many 03:22:53.200 |
parameters. Pretty cool. So, we've put together our pipeline 03:22:58.240 |
using LCEL, using the pipe operator, the runnables, 03:23:02.800 |
specifically, we've looked at the runnable parallel, runnable 03:23:06.160 |
pass through, and also the runnable lambdas. So, that's it 03:23:09.200 |
for this chapter on LCEL and we'll move on to the next one. 03:23:13.600 |
In this chapter, we're going to cover streaming and async in 03:23:17.920 |
LangChain. Now, both using async code and using streaming 03:23:23.200 |
are incredibly important components of I think almost 03:23:28.320 |
any conversational chat interface or at least any good 03:23:32.880 |
conversational chat interface. For async, if your application 03:23:38.080 |
is not async and you're spending a load of time in your 03:23:42.480 |
API or whatever else waiting for LLM calls because a lot of 03:23:45.920 |
those are behind APIs, you are waiting and your application is 03:23:50.880 |
doing nothing because you've written synchronous code and 03:23:54.080 |
that, well, there are many problems with that. Mainly, it 03:23:57.760 |
doesn't scale. So, async code generally performs much better 03:24:02.160 |
and especially for AI where a lot of the time, we're kind of 03:24:06.320 |
waiting for API calls. So, async is incredibly important 03:24:09.680 |
for that. For streaming, now, streaming is slightly different 03:24:13.920 |
thing. So, let's say I ask it to tell me a story, okay? I'm 03:24:21.120 |
using GPT-4 here. It's a bit slower. So, we can actually 03:24:23.760 |
stream. We can see that token by token, this text is being 03:24:27.200 |
produced and sent to us. Now, this is not just a visual 03:24:30.480 |
thing. This is the LLM when it is generating tokens or words, 03:24:38.240 |
it is generating them one by one and that's because these 03:24:41.760 |
LLMs literally generate tokens one by one. So, they're looking 03:24:45.600 |
at all of the previous tokens in order to generate the next 03:24:48.240 |
one and then generate next one, generate next one. Now, that's 03:24:50.720 |
how they work. So, when we are implementing streaming, we're 03:24:56.800 |
getting that feed of tokens directly from the LLM through 03:25:00.160 |
to our, you know, our back end or our front end. That is what 03:25:03.520 |
we see when we see that token by token interface, right? So, 03:25:07.520 |
that's one thing. One other thing that I can do that, let 03:25:12.080 |
me switch across to GPT-4o is I can say, okay, we just got this 03:25:16.480 |
story. I'm going to ask, are there any standard storytelling 03:25:26.480 |
techniques used above? Please use search. 03:25:35.440 |
Okay. So, look, we get this very briefly there. We saw that 03:25:42.240 |
it was searching the web. Now, we told the LLM to use the 03:25:46.240 |
search tool, but what actually happened is that the LLM 03:25:51.600 |
output some tokens to say that it's going to use a search 03:25:56.320 |
tool, and it also would have output 03:26:00.240 |
the tokens saying what that search query would have been, 03:26:02.720 |
although we didn't see it there. But what the ChatGPT 03:26:07.760 |
interface is doing there, so it received those tokens saying, 03:26:11.440 |
hey, I'm going to use the search tool. It doesn't just send us 03:26:14.400 |
those tokens like it does with the standard tokens here. 03:26:17.040 |
Instead, it used those tokens to show us that searching the 03:26:22.960 |
web little text box. So, streaming is not just the 03:26:28.000 |
streaming of these direct tokens. It's also the streaming 03:26:33.120 |
of these intermediate steps that the LLM may be thinking 03:26:36.640 |
through which is particularly important when it comes to 03:26:40.960 |
agents and agentic interfaces. So, it's also a feature thing, 03:26:45.280 |
right? Streaming doesn't just look nice. It's also a feature. 03:26:49.360 |
Then, finally, of course, when we're looking at this, okay, 03:26:53.200 |
let's say we go back to GPT-4 and I say, okay, use all of 03:27:02.640 |
this information to generate a long story for me, 03:27:11.200 |
right? And, okay, we are getting the first token now. So, we 03:27:16.320 |
know something is happening. We need to start reading. Now, 03:27:19.120 |
imagine if we were not streaming anything here and 03:27:22.400 |
we're just waiting, right? We're still waiting now. We're 03:27:25.200 |
still waiting and we wouldn't see anything. We're just like, 03:27:28.240 |
oh, it's just blank or maybe there's a little loading 03:27:30.800 |
spinner. So, we'd still be waiting and even now, we're 03:27:37.280 |
still waiting, right? This is an extreme example but can you 03:27:44.720 |
imagine just waiting for so long and not seeing anything as 03:27:48.080 |
a user, right? Now, just now, we would have got our answer if 03:27:52.240 |
we were not streaming. I mean, that would be painful as a 03:27:56.560 |
user. You'd not want to wait especially in a chat interface. 03:28:00.880 |
You don't want to wait that long. It's okay when, for 03:28:03.680 |
example, deep research takes a long time to process but you 03:28:07.840 |
know it's going to take a long time to process and it's a 03:28:10.000 |
different use case, right? You're getting a report. This is 03:28:13.440 |
a chat interface and yes, most messages are not going to take 03:28:18.560 |
that long to generate. We're also probably not going to be 03:28:22.320 |
using GPT-4 depending on, I don't know, maybe some people 03:28:25.440 |
still do but in some scenarios, it's painful to need to wait 03:28:30.640 |
that long, okay? And it's also the same for agents. It's nice 03:28:34.560 |
when you're using agents to get an update on, okay, we're using 03:28:37.600 |
this tool. It's using this tool. This is how it's using 03:28:39.680 |
them. Perplexity, for example, have a very nice example of 03:28:43.840 |
this. So, okay, what's this? OpenAI co-founder joins 03:28:48.240 |
Murati's startup. Let's see, right. So, we see this is 03:28:51.200 |
really nice. We're using ProSearch. It's searching for 03:28:53.920 |
news, showing us the results, like we're getting all this 03:28:57.200 |
information as we're waiting which is really cool and it 03:29:01.840 |
helps us understand what is actually happening, right? It's 03:29:05.040 |
not needed in all use cases but it's super nice to have those 03:29:08.480 |
intermediate steps, right? So, then we're not waiting and I 03:29:11.600 |
think this bit probably also streamed but it was just super 03:29:14.240 |
fast. So, I didn't see it but that's pretty cool. So, 03:29:18.640 |
streaming is pretty important. Let's dive into our example. 03:29:23.920 |
Okay, we'll open that in Colab and off we go. So, starting with 03:29:28.000 |
the prerequisites, same as always, LangChain, optionally 03:29:32.320 |
LangSmith. We'll also enter our LangChain API key if you'd 03:29:36.160 |
like to use LangSmith. We'll also enter our OpenAI API key. 03:29:40.240 |
So, that is platform.openai.com and then as usual, we can just 03:29:45.200 |
invoke our LLM, right? So, we have that. It's working. Now, 03:29:50.160 |
let's see how we would stream with AStream, okay? So, 03:29:54.880 |
whenever a method, so stream is actually a method as well, we 03:29:58.800 |
could use that but it's not async, right? So, whenever we 03:30:01.760 |
see a method in LangChain that has an 'a' prefix onto what would be 03:30:06.320 |
another method, that's the async version of that method. So, we 03:30:12.560 |
can actually stream using async super easily using just LLM 03:30:19.680 |
AStream, okay? Now, this is just an example and to be 03:30:25.280 |
completely honest, you probably will not be able to use this in 03:30:28.720 |
an actual application but it's just an example and we're going 03:30:32.400 |
to see how we would use this or how we would stream 03:30:35.680 |
asynchronously in an application further down in 03:30:39.040 |
this notebook. So, starting with this, you can see here that 03:30:44.480 |
we're getting these tokens, right? We're just appending it 03:30:46.800 |
to tokens here. We don't actually need to do that. I 03:30:48.800 |
don't think we're using this but maybe we, yeah, we'll do it 03:30:52.480 |
here. It's fine. So, we're just appending the tokens as they 03:30:56.400 |
come back from our LLM, appending it to this. We'll see 03:31:00.000 |
what that is in a moment and then I'm just printing the 03:31:03.680 |
token content, right? So, the content of the token. So, in 03:31:08.240 |
this case, that would be N. In the next, it would be LP, and 03:31:11.440 |
so on and so on. So, you can see, for the 03:31:14.720 |
most part, it tends to be word level, but it can also be 03:31:18.800 |
sub-word level, as you can see with NLP being split up. So, 03:31:24.320 |
you know, they get broken up in various ways. Then, adding 03:31:29.120 |
this pipe character onto the end here. So, we can see, okay, 03:31:33.360 |
where are our individual tokens? Then, we also have 03:31:36.720 |
Flush. So, Flush, you can actually turn this off and 03:31:40.320 |
it's still going to stream. You're still going to see 03:31:41.840 |
everything, but it's going to come through a bit more 03:31:43.920 |
chunkily, like bit by bit. When we use flush, it 03:31:48.800 |
forces the console to update what is being shown to us 03:31:53.680 |
immediately, alright? So, we get a much smoother output when 03:31:58.560 |
we're looking at this, versus when flush is 03:32:02.160 |
not set to true. So, yeah, when you're printing, that is good 03:32:05.840 |
to do just so you can see. You don't necessarily need to. 03:32:08.640 |
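To make that concrete, here is a minimal sketch of the astream pattern being described; the model name is an assumption:

import asyncio

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # assumed model name

async def stream_demo():
    tokens = []
    # astream is the async counterpart of the synchronous stream method
    async for chunk in llm.astream("What does NLP stand for?"):
        tokens.append(chunk)  # keep the AIMessageChunk objects for later
        # end="|" makes the token boundaries visible; flush=True updates the console immediately
        print(chunk.content, end="|", flush=True)
    return tokens

# tokens = asyncio.run(stream_demo())  # in a notebook you would just await stream_demo()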
Okay. Now, we added all those tokens to the tokens list so 03:32:12.960 |
we can have a look at each individual object that was 03:32:15.600 |
returned to us, right? This is interesting. So, you see that 03:32:18.640 |
we have the AI message chunk, right? That's an object and 03:32:22.640 |
then you have the content. The first one's actually empty. 03:32:26.000 |
Second one has that N for NLP and yeah, I mean, that's all we 03:32:31.120 |
really need to know. They're very simple objects but they're 03:32:34.240 |
actually quite useful because just look at this, right? So, 03:32:38.640 |
we can add each one of our AI message chunks, right? Let's 03:32:42.640 |
see what that does. It doesn't create a list. It creates this, 03:32:45.920 |
right? So, we still just have one AI message chunk but it's 03:32:51.600 |
combined the content within those AI message chunks which 03:32:55.440 |
is kind of cool, right? So, for example, like we could remove 03:32:59.440 |
these, right? And then we just see NLP. So, it's kind of nice 03:33:05.440 |
little feature there. I do. I actually quite like that. But 03:33:10.640 |
you do need to just be a little bit careful because obviously 03:33:12.800 |
you can do that the wrong way and you're going to get like a 03:33:16.720 |
I don't know what that is. Some weird token salad. So, yeah, 03:33:21.360 |
you need to just make sure you are going to be merging those 03:33:24.480 |
in the correct order unless you, I don't know, unless you're 03:33:28.160 |
doing something weird. Okay, cool. So, streaming, that was 03:33:32.720 |
streaming from an LLM. Let's have a look at streaming with 03:33:35.600 |
agents. So, it gets a bit more complicated, to be 03:33:41.120 |
completely honest, but things need to 03:33:45.680 |
get a bit more complicated so that we can implement this in, 03:33:49.280 |
for example, an API, right? That is kind of a 03:33:52.800 |
necessary thing in any case. So, to just very quickly, we're 03:33:58.560 |
going to construct our agent executor like we did in the 03:34:01.440 |
agent execution chapter. And for that, for the agent 03:34:06.160 |
executor, we're going to need tools, chat prompt template, LLM, 03:34:09.600 |
agent, and the agent executor itself, okay? Very quickly, I'm 03:34:13.360 |
not going to go through these in detail. We just define our 03:34:16.320 |
tools. We have add, multiply, exponentiate, subtract, and 03:34:20.080 |
define our answer tool. Merge those into a single list of 03:34:23.200 |
tools. Then, we have our prompt template. Again, same as 03:34:27.680 |
before, we just have system message, we have chat history, 03:34:30.640 |
we have a query, and then we have the agent scratch pad for 03:34:34.960 |
those intermediate steps. Then, we define our agent using 03:34:39.760 |
LCEL. LCEL works quite well with both streaming and async, by 03:34:44.000 |
the way. It supports both out of the box, which is nice. So, we 03:34:49.840 |
define our agent. Then, coming down here, we're going to 03:34:54.800 |
create the agent executor. This is the same as before, right? 03:34:58.240 |
So, there's nothing new in here, I don't think. So, just 03:35:01.520 |
initialize our agent there. Then, we're 03:35:06.960 |
looping through, and there's nothing 03:35:11.920 |
new there. So, we're just invoking our 03:35:15.600 |
agent, seeing if there's a tool call. This is slightly, we 03:35:20.480 |
could shift this to before or after. It doesn't actually 03:35:22.320 |
matter that much. So, we're checking if it's the final 03:35:25.440 |
answer. If not, we continue, execute our tools, and so on. 03:35:30.640 |
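As a recap, here is a rough sketch of that agent definition with LCEL; the tool bodies, prompt wording, and model name are assumptions based on the description rather than the notebook verbatim:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def add(x: float, y: float) -> float:
    """Add x and y."""
    return x + y

@tool
def final_answer(answer: str, tools_used: list[str]) -> str:
    """Provide the final answer to the user."""
    return answer

tools = [add, final_answer]  # plus multiply, exponentiate, subtract in the notebook

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the tools provided."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # assumed model name

# tool_choice="any" forces the LLM to always respond with a tool call
agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")
)

The tool_choice="any" part is what forces the LLM to always answer with a tool call, which matters later when we look at why the content field of the streamed chunks is usually empty.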
Okay, cool. So, then, we can invoke that. Okay, we go, what 03:35:37.440 |
is 10 plus 10? There we go, right? So, we have our agent 03:35:43.040 |
executor, it is working. Now, when we are running our agent 03:35:50.240 |
executor, with every new query, if we're putting this into an 03:35:54.000 |
API, we're probably going to need to provide it with a fresh 03:35:59.200 |
callback handler. Okay, so, this is the callback handler is 03:36:02.480 |
what's going to handle taking the tokens that are being 03:36:05.520 |
generated by our LLM or agent and giving them to some other 03:36:10.160 |
piece of code. Like, for example, the streaming 03:36:12.960 |
response for an API, and our callback handler is going to 03:36:18.560 |
put those tokens in a queue, in our case, and then our, for 03:36:23.840 |
example, the streaming object is going to pick them up from 03:36:26.880 |
the queue and put them wherever they need to be. So, to allow 03:36:32.080 |
us to do that with every new query, rather than us needing 03:36:35.440 |
to initialize everything when we actually initialize our 03:36:39.600 |
agent, we can add a configurable field to our LLM, 03:36:43.360 |
okay? So, we set the configurable fields here. Oh, 03:36:46.960 |
also, one thing is that we set streaming equal to true, that's 03:36:50.320 |
very minor thing, but just so you see that there, we do do 03:36:54.080 |
that. So, we add some configurable fields to our LLM, 03:36:57.200 |
which means we can basically pass an object in for these on 03:37:00.640 |
every new invocation. So, we set our configurable field, it's 03:37:06.000 |
going to be called callbacks, and we just add a description, 03:37:09.440 |
right? Nothing more to it. So, this will now allow us to 03:37:13.120 |
provide that field when we're invoking our agent, okay? Now, 03:37:21.120 |
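before moving on, here is a minimal sketch of what that configurable field setup looks like; treat it as an approximation of the notebook code, and note that the model name is an assumption:

from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",  # assumed model name
    temperature=0,
    streaming=True,       # stream tokens rather than returning one final message
).configurable_fields(
    callbacks=ConfigurableField(
        id="callbacks",
        name="callbacks",
        description="A list of callbacks to use for streaming",
    )
)

With that in place, we can pass a fresh callback handler in the config of every invocation. Next,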
we need to define our callback handler, and as I mentioned, 03:37:25.680 |
what is basically going to be happening is this callback 03:37:28.000 |
handler is going to be passing tokens into our async IO queue 03:37:33.200 |
object, and then we're going to be picking them up from the 03:37:36.960 |
queue elsewhere, okay? So, we can call it a queue callback 03:37:40.640 |
handler, okay? And that is inheriting from the async 03:37:44.560 |
callback handler, because we want all this to be done 03:37:46.480 |
asynchronously, because we're thinking here about, okay, how 03:37:49.280 |
do we implement all this stuff within APIs and actual real 03:37:52.880 |
world code, and we do want to be doing all this in async. So, 03:37:58.080 |
let me execute that, and I'll just explain a little bit of 03:38:00.240 |
what we're looking at. So, we have the initialization, right? 03:38:03.520 |
There's nothing specific here. What we really want to be 03:38:08.560 |
doing is we want to be setting our queue object, assigning 03:38:11.760 |
that to the class attributes, and then there's also this 03:38:15.840 |
final answer seen flag, which we're setting to false. So, what 03:38:15.840 |
we're going to be using that for is our LLM will be 03:38:24.240 |
streaming tokens to us whilst it's using its tool calling, 03:38:29.360 |
and we might not want to display those immediately, or 03:38:31.600 |
we might want to display them in a different way. So, by 03:38:34.560 |
setting this final answer seen flag to false, whilst our LLM is 03:38:34.560 |
outputting those tool tokens, we can handle them in a 03:38:44.240 |
different way, and then as soon as we see that it's done with 03:38:47.360 |
the tool calls and it's onto the final answer, which is 03:38:49.600 |
actually another tool call, but once we see that it's onto the 03:38:52.160 |
final answer tool call, we can set this to true, and then we 03:38:56.240 |
can start processing our tokens in a different way, 03:38:59.360 |
essentially. So, we have that. Then, we have this 03:39:03.840 |
__aiter__ method. This is required for any async generator object. 03:39:11.280 |
So, what that is going to be doing is going to be iterating 03:39:13.680 |
through, right? So, it's a generator. It's going to be 03:39:16.400 |
going iterating through and saying, okay, if our queue is 03:39:19.760 |
empty, right? This is the queue that we set up here. If it's 03:39:22.800 |
empty, wait a moment, right? We use the sleep method here, and 03:39:27.360 |
this is an async sleep method. This is super important. We're 03:39:30.960 |
using, we're awaiting for an asynchronous sleep, right? So, 03:39:35.040 |
whilst we're waiting for that 0.1 seconds, 03:39:38.880 |
our code can be doing other things, right? That 03:39:43.360 |
is important. If we use, I think the standard is time 03:39:47.280 |
dot sleep, that is not asynchronous, and so it will 03:39:50.560 |
actually block the thread for that 0.1 seconds. So, we don't 03:39:54.880 |
want that to happen. Generally, our queue should probably not 03:39:58.000 |
be empty that frequently given how quickly tokens are going to 03:40:01.680 |
be added to the queue. So, the only way that this would 03:40:05.440 |
potentially be empty is maybe our LLM stops. Maybe there's 03:40:10.720 |
like a connection interruption for a, you know, a brief second 03:40:13.600 |
or something, and no tokens are added. So, in that case, we 03:40:17.280 |
don't actually do anything. We don't keep checking the queue. 03:40:19.680 |
We just wait a moment, okay? And then, we check again. Now, 03:40:24.320 |
if it was empty, we wait, and then, we continue on to the 03:40:28.080 |
next iteration. Otherwise, it probably won't be empty. We get 03:40:33.040 |
whatever is inside our queue. We get that out, pull 03:40:36.160 |
it out. Then, we say, okay, if that token is a done token, 03:40:42.640 |
we're going to return. So, we're going to stop this 03:40:45.760 |
generator, right? We're finished. Otherwise, if it's 03:40:49.680 |
something else, we're going to yield that token which means 03:40:52.480 |
we're returning that token, but then, we're continuing through 03:40:55.520 |
that loop again, right? So, that is our generator logic. 03:41:01.760 |
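As a rough sketch, the queue and generator side of that handler might look like this; the sentinel string and the exact class internals are assumptions based on the description:

import asyncio

from langchain_core.callbacks import AsyncCallbackHandler

class QueueCallbackHandler(AsyncCallbackHandler):
    """Puts streamed tokens onto an asyncio queue and lets us iterate over them."""

    def __init__(self, queue: asyncio.Queue):
        self.queue = queue
        self.final_answer_seen = False

    async def __aiter__(self):
        while True:
            if self.queue.empty():
                # async sleep, so other tasks can run while we wait for new tokens
                await asyncio.sleep(0.1)
                continue
            token_or_done = self.queue.get_nowait()
            if token_or_done == "<<DONE>>":  # assumed sentinel string
                # the LLM has finished, so stop the generator
                return
            if token_or_done:
                yield token_or_done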
Then, we have some other methods here. These are 03:41:05.360 |
LangChain-specific, okay? We have on_llm_new_token and we 03:41:10.400 |
have on_llm_end. Starting with on_llm_new_token, this is 03:41:14.960 |
basically when an LLM returns a token to us. LangChain is 03:41:18.400 |
going to run or execute this method, okay? This is the 03:41:23.280 |
method that will be called. What this is going to do is 03:41:27.200 |
it's going to go into the keyword arguments. It's going 03:41:29.200 |
to get the chunk object. So, this is coming from our LLM. If 03:41:33.280 |
there is something in that chunk, it's going to check for 03:41:37.440 |
a final answer tool call first, okay? So, we get our tool 03:41:41.680 |
calls and we say, if the name within our chunk, right? 03:41:46.400 |
Probably, this will be empty for most of the tokens we return, 03:41:49.520 |
right? So, you remember before when we're looking at the 03:41:52.640 |
chunks here, this is what we're looking at, right? The 03:41:56.160 |
content for us is actually always going to be empty and 03:41:58.320 |
instead, we're actually going to get the additional keyword 03:42:00.720 |
args here and inside there, we're going to have our tool 03:42:03.600 |
calling, our tool calls as we saw in the previous videos, 03:42:08.480 |
right? So, that's what we're extracting. We're extracting 03:42:10.800 |
that information. That's why we're going additional keyword 03:42:13.760 |
args, right? And get those tool, the tool call information, 03:42:18.800 |
right? Or it will be none, right? So, if it is none, I 03:42:23.360 |
don't think it ever would be none to be honest. It would be 03:42:25.840 |
strange if it's none. I think that means something would be 03:42:28.080 |
wrong. Okay, so here, we're using the Walrus operator. So, 03:42:31.120 |
the Walrus operator, what it's doing here is whilst we're 03:42:34.880 |
checking the if logic here, whilst we do that, it's also 03:42:39.840 |
assigning whatever is inside this. It's assigning over to 03:42:44.160 |
tool calls and then with the if we're checking whether tool 03:42:48.240 |
calls is something or none, right? Because we're using get 03:42:52.640 |
here. So, if this get operation fails and there is no tool 03:42:56.640 |
calls, this object here will be equal to none which gets 03:43:01.360 |
assigned to tool calls here and then this if none will return 03:43:06.160 |
false and this logic will not run, okay? And it will just 03:43:09.680 |
continue. If this is true, so if there is something returned 03:43:13.520 |
here, we're going to check if that something returned is 03:43:16.400 |
using the function name or tool name, final answer. If it is, 03:43:20.560 |
we're going to set that final answer seen flag equal to true. 03:43:23.040 |
Otherwise, we're just going to add our chunk into the queue, 03:43:27.760 |
okay? We use put_nowait here because we're using 03:43:30.560 |
async. Otherwise, if you were not using async, you 03:43:33.600 |
might just use put. You'd 03:43:39.360 |
use put if it's just synchronous code, but I don't 03:43:43.200 |
think I've ever implemented this synchronously. So, it 03:43:46.240 |
would actually just be put_nowait for async, okay? And 03:43:49.440 |
then we return. So, we have that. Then, we have on_llm_end, okay? 03:43:56.480 |
So, this is when LangChain sees that the LLM has returned or 03:44:02.080 |
indicated that it is finished with the response. LangChain 03:44:06.480 |
will call this. So, you have to be aware that this will happen 03:44:13.120 |
multiple times during an agent execution because if you think 03:44:17.440 |
within our agent executor, we're hitting the LLM multiple 03:44:22.080 |
times. We have that first step where it's deciding, oh, I'm 03:44:25.600 |
going to use the add tool or the multiply tool and then that 03:44:29.120 |
response gets back to us. We execute that tool and then we 03:44:33.360 |
pass the output from that tool and or the original user query 03:44:36.960 |
in the chat history, we pass that back to our LLM again, 03:44:39.680 |
right? So, that's another call to our LLM that's going to come 03:44:42.560 |
back. It's going to finish or it's going to give us something 03:44:45.120 |
else, right? So, there's multiple LLM calls happening 03:44:48.640 |
throughout our agent execution logic. So, this on_llm_end method 03:44:53.200 |
will actually get called at the end of every single one of 03:44:55.680 |
those LLM calls. Now, if we get to the end of a LLM call and it 03:45:02.480 |
was just a tool invocation. So, we had the, you 03:45:05.600 |
know, it called the add tool. We don't want to put the done 03:45:11.280 |
token into our queue because when the done token is added to 03:45:14.640 |
our queue, we're going to stop iterating, okay? Instead, if it 03:45:20.880 |
was just a tool call, we're going to say step end, right? 03:45:24.240 |
And we'll actually get this token back. So, this is useful 03:45:27.920 |
on, for example, the front end, you could have, okay, I've 03:45:32.560 |
used the add tool. These are the parameters and it's the end 03:45:36.560 |
of the step. So, you could have that your tool call is being 03:45:40.640 |
used on some front end and as soon as it sees step end, it 03:45:43.840 |
knows, okay, we're done with that. Here was the response, 03:45:46.720 |
right? And it can just show you that and we're going to use 03:45:49.680 |
that. We'll see that soon but let's say we get to the final 03:45:53.280 |
answer tool. We're on the final answer tool and then we get 03:45:56.400 |
this signal that the LLM has finished. Then, we need to stop 03:46:01.920 |
iterating. Otherwise, our stream generator is just going 03:46:06.000 |
to keep going forever, right? Nothing's going to stop it or 03:46:08.880 |
maybe it will time out. I don't think it will though. So, at 03:46:13.200 |
that point, we need to send, okay, stop, right? We need to 03:46:16.800 |
say we're done, and then that will come back 03:46:19.760 |
here to our async iterator, and it will 03:46:25.360 |
return and stop the generator, okay? So, that's the core 03:46:30.960 |
logic that we have inside that. I know there's a lot going on 03:46:34.240 |
there, but we need all of this. So, it's important to be 03:46:38.400 |
aware of it. 03:46:43.040 |
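To make those two callback methods a bit more concrete, here is a rough sketch of how they might look as methods on the QueueCallbackHandler sketched earlier; the sentinel strings like "<<DONE>>" and "<<STEP_END>>" are illustrative assumptions:

# rough sketch of the two callback methods, as methods on the
# QueueCallbackHandler class sketched earlier
async def on_llm_new_token(self, *args, **kwargs) -> None:
    chunk = kwargs.get("chunk")
    if chunk:
        # tool calls live in the additional_kwargs of the message chunk
        if tool_calls := chunk.message.additional_kwargs.get("tool_calls"):
            if tool_calls[0]["function"].get("name") == "final_answer":
                # we're onto the final answer tool call now
                self.final_answer_seen = True
                return
    self.queue.put_nowait(chunk)

async def on_llm_end(self, *args, **kwargs) -> None:
    # called at the end of every LLM call within the agent execution loop
    if self.final_answer_seen:
        self.queue.put_nowait("<<DONE>>")      # assumed sentinel: stop iterating
    else:
        self.queue.put_nowait("<<STEP_END>>")  # assumed sentinel: end of a tool step

Okay. So, now, let's see how we might actually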
call our agent with all of the streaming in this way. So, 03:46:49.360 |
we're going to initialize our queue. We're going to use that 03:46:53.120 |
to initialize a streamer, okay? Using the custom streamer 03:46:56.400 |
that we just set up. Custom callback handler, whatever you 03:46:59.040 |
want to call it, okay? Then, I'm going to define a function. 03:47:03.200 |
So, this is an asynchronous function. It has to be, if 03:47:05.840 |
we're using async, and what it's going to do is it's going to 03:47:09.200 |
call our agent with a config here, and we're going to pass it 03:47:14.720 |
the callback, which is the streamer, right? 03:47:18.320 |
Now, here, I'm not calling the agent executor. I'm just calling 03:47:20.800 |
the agent, right? So, if we come back up here, we're 03:47:25.360 |
calling this, right? So, that's not going to include all the 03:47:28.720 |
tool execution logic and importantly, we're calling the 03:47:32.960 |
agent with the config that uses callbacks, right? So, this 03:47:37.840 |
configurable field here from our LLM is actually being 03:47:40.720 |
fed through and it propagates through to our agent object as 03:47:43.360 |
well to the runnable serializable, right? So, that's 03:47:47.200 |
what we're executing here. We see agent with config and we're 03:47:50.560 |
passing in those callbacks which is just one actually, 03:47:54.000 |
okay? So, that sets up our agent and then we invoke it with 03:47:58.240 |
a stream, okay? Like we did before and we're just going to 03:48:01.760 |
return everything. So, let's run that, okay? And we see all 03:48:07.280 |
the token or the chunk objects that have been returned and 03:48:10.480 |
this is useful to understand what we're actually doing up 03:48:14.080 |
here, right? So, when we're doing this chunk message, 03:48:17.920 |
additional keyword arguments, right? We can see that in here. 03:48:20.960 |
So, this would be the chunk message object. We get the 03:48:24.640 |
additional keyword logs. We're going to tool calls and we get 03:48:28.480 |
the information here. So, we have the ID for that tool call 03:48:31.040 |
which we saw in the previous chapters. Then, we have our 03:48:35.760 |
function, right? So, the function includes the name, 03:48:39.760 |
right? So, we know what tool we're calling from this first 03:48:42.560 |
chunk but we don't know the arguments, right? Those 03:48:44.960 |
arguments are going to be streamed to us. So, we can see 03:48:47.600 |
them begin to come through in the next chunk. So, next chunk 03:48:51.920 |
is just it's just the first token for the add function, 03:48:56.640 |
right? And we can see these all come together over multiple 03:49:00.640 |
steps and we actually get all of our arguments, okay? That's 03:49:05.600 |
pretty cool. So, actually one thing I would like to show you 03:49:10.000 |
here as well. So, if we just do token equals tokens, sorry. 03:49:25.460 |
Okay. We have all of our tokens in here now. Alright, see that 03:49:31.260 |
they're all AI message chunks. So, we can actually add those 03:49:35.500 |
together, right? So, let's we'll go with these here and 03:49:39.340 |
based on these, we're going to get all of the arguments, okay? 03:49:42.540 |
So, this is kind of interesting. So, it's tokens one onwards. 03:49:51.420 |
Alright, so we have these, and actually we just want to add 03:49:56.240 |
those together. So, I'm going to go with tk equals tokens one, and 03:50:07.760 |
then for token in, we're going to go from the second onwards, I'm 03:50:13.700 |
going to do tk plus token, right? And let's see what tk looks 03:50:23.780 |
like. Okay. So, now you see that it's kind of merged all those 03:50:28.180 |
arguments here. Sorry, that should be plus equals. Okay. So, run that and 03:50:34.500 |
you can see here that it's merged those arguments. It 03:50:36.900 |
didn't get all of them. So, I kind of missed some at the end 03:50:38.980 |
there but it's merging them, right? So, you can see that 03:50:42.020 |
logic where it's, you know, before it was adding the 03:50:45.060 |
content from various chunks. It also does the same for the 03:50:49.460 |
other parameters within your chunk object which is I think 03:50:53.220 |
it's pretty cool and you can see here the name wasn't 03:50:55.940 |
included. That's because we started on token one rather than 03:50:59.700 |
token zero, where the name was. So, if we actually start from 03:51:02.660 |
token zero and just pull them all in there, 03:51:06.660 |
alright? So, from zero onwards, we're going to get a complete 03:51:12.820 |
AI message chunk which includes the name here and all of those 03:51:17.940 |
arguments, and you'll see also here, right, it populates 03:51:21.220 |
everything which is pretty cool. Okay. So, we have that. 03:51:26.900 |
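To illustrate that merging, assuming tokens is the list of AIMessageChunk objects we collected above, and with the printed output being purely illustrative:

# merge the streamed chunks back into one AIMessageChunk; the tool call
# name and argument fragments are concatenated for us as we add them
merged = tokens[0]
for token in tokens[1:]:
    merged += token

print(merged.additional_kwargs["tool_calls"][0]["function"])
# illustrative output: {'name': 'add', 'arguments': '{"x": 10, "y": 10}'}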
Now, based on this, we're going to want to modify our custom 03:51:29.700 |
agent executor because we're streaming everything, right? 03:51:34.500 |
So, we want to add streaming inside our agent executor which 03:51:38.020 |
we're doing here, right? So, this is async def stream and 03:51:42.180 |
we're doing async for token in the astream, okay? So, this 03:51:47.620 |
is like the very first instance. If output is None, 03:51:51.220 |
we're just going to be adding our token, or the chunk, 03:51:55.140 |
sorry, to our output, like the first token becomes our output. 03:52:00.740 |
Otherwise, we're just appending our tokens to the output, okay? 03:52:06.660 |
If the token content is empty, which it should be, right? 03:52:09.860 |
Because we're using tool calls all the time. We're just going 03:52:12.340 |
to print content, okay? I just added these so we 03:52:16.580 |
print everything. I just want to be able to see that. 03:52:19.540 |
I wouldn't expect this to run because we're saying it has to 03:52:22.900 |
use tool calling, okay? So, within our agent, if we come up 03:52:28.180 |
to here, we said tool choice any. So, it's been forced to 03:52:30.980 |
use tool calling. So, it should never really be returning 03:52:34.100 |
anything inside the content field but just in case it's 03:52:36.980 |
there, right? So, we'll see if that is actually true. Then, 03:52:40.740 |
we're just getting out our tool calls information, okay? From 03:52:44.820 |
our chunk and we're going to say, okay, if there's something 03:52:46.900 |
in there, we're going to print what is in there, okay? And 03:52:49.540 |
then, we're going to extract our tool name. If there is some, 03:52:52.500 |
if there's a tool name, I'm going to show you the tool name. 03:52:55.780 |
Then, we're going to get the ARGs and if the ARGs are not 03:52:58.740 |
empty, we're going to see what we get in there, okay? And then 03:53:03.060 |
from all of this, we're actually going to merge all of 03:53:05.380 |
it into our AI message, right? Because we're merging 03:53:08.980 |
everything as we're going through, we're merging 03:53:10.420 |
everything into outputs as I showed you before, okay? Cool. 03:53:13.860 |
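Here is a rough sketch of that streaming method, assuming agent is the LCEL agent and streamer is the queue callback handler from before; names and details are approximate:

# rough sketch of the streaming method inside the custom agent executor
async def stream(query: str):
    output = None
    # pass the queue callback handler in via the configurable callbacks field
    agent_with_callbacks = agent.with_config(callbacks=[streamer])
    async for token in agent_with_callbacks.astream({
        "input": query,
        "chat_history": [],
    }):
        if output is None:
            output = token   # the first chunk becomes our output
        else:
            output += token  # later chunks get merged into it
        # tool names and argument fragments could be printed here as they arrive
    return output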
And then, we're just awaiting our stream that will like kick 03:53:16.340 |
it off, okay? And then, we do the standard agent executor 03:53:20.420 |
stuff again here, right? So, we're just pulling out tool 03:53:23.380 |
name, tool logs, tool call ID and then we're using all that 03:53:26.100 |
to execute our tool here and then we're creating a new tool 03:53:29.700 |
message and passing that back in. And then also here, I move 03:53:33.300 |
the break for the final answer into the final step. So, that 03:53:37.780 |
is our custom agent executor with streaming and let's see 03:53:41.220 |
what, let's see what it does, okay? Set verbose equal to 03:53:45.380 |
true, so we see all those print statements, okay? So, you can 03:53:52.340 |
kind of see it's a little bit messy but you can see we have 03:53:55.700 |
tool calls that had some stuff inside it, had add here and 03:54:00.740 |
what we're printing out here is we're printing out the full AI 03:54:03.380 |
message chunk with tool calls and then I'm just printing out, 03:54:06.900 |
okay, what are we actually pulling out from that? So, 03:54:09.460 |
these are actually coming from the same thing, okay? And then 03:54:12.740 |
the same here, right? So, we're looking at the full message 03:54:15.300 |
and then we're looking, okay, we're getting this argument out 03:54:18.340 |
from it, okay? So, we can see everything that is being pulled 03:54:22.180 |
out, you know, chunk by chunk or token by token and that's it, 03:54:27.380 |
okay? So, we could just get everything like that. However, 03:54:31.060 |
right, so I'm printing everything so we can see that 03:54:33.300 |
streaming. What if I don't print, okay? So, we're setting 03:54:37.380 |
verbose, or by default, verbose is equal to false here, so we run that again. 03:54:50.980 |
Cool. We got nothing. So, the reason we got nothing is 03:54:58.480 |
because we're not printing. But if you're 03:55:04.560 |
building an API, for example, you're pulling your tokens 03:55:08.160 |
through, you can't print them to a front end or 03:55:15.440 |
return them as the output of your API. Printing goes to your 03:55:20.560 |
terminal, right? Your console window. It doesn't go anywhere 03:55:24.080 |
else. Instead, what we want to do is we actually want to get 03:55:29.040 |
those tokens out, right? But how do we do that, right? 03:55:33.760 |
So, we printed them, but another place that those tokens 03:55:37.680 |
are is in our queue, right? Because we set them up to go to 03:55:41.680 |
the queue. So, we can actually pull them out of our queue 03:55:48.480 |
whilst our agent executor is running and then we can do 03:55:52.560 |
whatever we want with them because our code is async. So, 03:55:54.800 |
it can be doing multiple things at the same time. So, whilst 03:55:58.000 |
our code is running the agent executor, whilst that is 03:56:02.000 |
happening, our code can also be pulling out from our queue 03:56:05.680 |
tokens that are in there and sending them to like an API, 03:56:11.120 |
for example, right? Or whatever downstream logic you have. So, 03:56:15.680 |
let's see what that looks like. We start by just initializing 03:56:19.040 |
our queue, initializing our streamer with that queue. Then 03:56:22.080 |
we create a task. So, this is basically saying, okay, I want 03:56:26.400 |
to run this but don't run it right now. I'm not ready yet. 03:56:29.760 |
The reason that I say I'm not ready yet is because I also 03:56:33.440 |
want to define here my async loop which is going to be 03:56:38.000 |
printing those tokens, right? But this is async, right? So, 03:56:41.360 |
we set this up. This is like get ready to run this. Because 03:56:45.520 |
it is async, this is running, right? This is just running. 03:56:49.760 |
Like it's there. It's already running. So, we get this. We 03:56:52.640 |
continue. We continue. None of this is actually executed 03:56:56.160 |
yet, right? Only here when we await the task that we set up 03:57:02.560 |
here. Only then does our agent executor run and our async 03:57:10.080 |
object here begin getting tokens, right? And here, again, 03:57:14.080 |
I'm printing but I don't need to print. I could I could have 03:57:17.280 |
like a let's say where this is within an API or something. 03:57:23.440 |
Let's say I'm saying, okay, send token to XYZ, right? 03:57:31.700 |
That's sending a token somewhere, or maybe 03:57:34.340 |
we're yielding this to some sort of streamer object within 03:57:38.500 |
our API, right? We can do whatever we want with those 03:57:40.900 |
tokens, okay? I'm just printing them cuz I want to actually see 03:57:44.420 |
them, okay? But just important here is that we're not printing 03:57:49.300 |
them within our agent executor. We're printing them outside the 03:57:52.580 |
agent executor. We've got them out and we can put them 03:57:55.860 |
wherever we want which is perfect when you're building an 03:57:58.820 |
actual sort of real world use case where you're using an API 03:58:01.220 |
or something else. 03:58:03.940 |
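Here is a rough sketch of that pattern, assuming the QueueCallbackHandler and agent_executor from above; the invoke signature is an assumption based on the description:

import asyncio

async def run_and_stream(query: str):
    queue = asyncio.Queue()
    streamer = QueueCallbackHandler(queue)
    # schedule the executor, but don't block on it yet
    # (the invoke signature here is an assumption based on the description)
    task = asyncio.create_task(agent_executor.invoke(query, streamer))
    # while the executor runs, pull tokens out of the queue via the streamer
    async for token in streamer:
        # we just print, but we could equally send each token to an API response
        print(token, end="", flush=True)
    await task  # make sure the executor has fully finished

# await run_and_stream("What is 10 + 10?")  # in a notebook / async context

Okay, so let's run that. Let's see what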
we get. Look at that. We get all of the information we could 03:58:08.580 |
need and a little bit more, right? Because now, we're using 03:58:12.580 |
the agent executor and now, we can also see how we have this 03:58:16.740 |
step end, right? So, I know or I know just from looking at this, 03:58:21.060 |
right? This is my first tool use. So, what tool is it? Let's 03:58:25.620 |
have a look. It's the add tool and then, we have these 03:58:29.140 |
arguments. So, I can then pass them, right? Downstream. Then, 03:58:32.740 |
we have the next tool use which is here, down here. So, then, 03:58:37.940 |
we can then pass them in the way that we like. So, that's 03:58:42.100 |
pretty cool. Let's see, right? So, we're 03:58:47.060 |
getting those things out. Can we do something with 03:58:50.900 |
them before we print and show them? Yes, let's 03:58:54.660 |
see, okay? So, we're now modifying our loop here. Same 03:58:59.860 |
stuff, right? We're still initializing our queue, 03:59:02.580 |
initializing our streamer, initializing our tasks, okay? 03:59:06.020 |
And we're still doing this async for token streamer, okay? 03:59:09.860 |
But then, we're doing stuff with our tokens. So, I'm saying, 03:59:13.460 |
okay, if we're on stream end, I'm not actually gonna print 03:59:17.300 |
stream end. I'm gonna print new line, okay? Otherwise, if we're 03:59:21.940 |
getting a tool call here, we're going to say, if that tool call 03:59:26.260 |
is the tool name, I am going to print calling tool name, okay? 03:59:32.500 |
If it's the arguments, I'm going to print the tool 03:59:36.020 |
argument, and I'm gonna set end to nothing so that we don't 03:59:38.740 |
go onto a new line. So, we're actually gonna be streaming 03:59:41.460 |
everything, okay? So, let's just see what this looks like. 03:59:55.420 |
You see that? So, it goes very fast. So, it's kinda hard to 03:59:59.200 |
see it. I'm gonna slow it down so you can see. So, you can see 04:00:02.800 |
that we, as soon as we get the tool name, we stream that 04:00:07.040 |
we're calling the add tool. Then, we stream token by token, 04:00:10.560 |
the actual arguments for that tool. Then, for the next one, 04:00:13.680 |
again, we do the same. We're calling this tool name. Then, 04:00:16.880 |
we're streaming token by token again. We're processing 04:00:20.240 |
everything downstream from outside of the agent executor 04:00:24.560 |
and this is an essential thing to be able to do when we're 04:00:27.920 |
actually implementing streaming and async and everything else 04:00:32.480 |
in an actual application. So, I know that's a lot but it's 04:00:38.960 |
important. So, that is it for our chapter on streaming and 04:00:43.360 |
async. I hope it's all been useful. Thanks. Now, we're on 04:00:47.200 |
to the final capstone chapter. We're going to be taking 04:00:51.280 |
everything that we've learned so far and using it to build a 04:00:56.640 |
actual chat application. Now, the chat application is what 04:01:00.400 |
you can see right now and we can go into this and ask some 04:01:04.400 |
pretty interesting questions and because it's an agent 04:01:06.960 |
because as I've accessed these tools, it will be able to 04:01:09.440 |
answer them for us. So, we'll see inside our application that 04:01:12.800 |
we can ask questions that require tool use such as this 04:01:17.040 |
and because of the streaming that we've implemented, we can 04:01:19.600 |
see all this information in real time. So, we can see that 04:01:22.160 |
SerpAPI tool is being used, that these are the queries. We 04:01:25.280 |
saw all that was in parallel as well. So, each one of those 04:01:29.200 |
tools were being used in parallel. We've modified the 04:01:31.840 |
code a little bit to enable that and we see that we have 04:01:36.160 |
the answer. We can also see the structured output being used 04:01:39.520 |
here. So, we can see our answer followed by the tools used 04:01:43.440 |
here and then we could ask follow-up questions as well 04:01:45.920 |
because it's conversational. So, say how is the weather in 04:01:54.960 |
Okay, that's pretty cool. So, this is what we're going to be 04:02:04.540 |
building. We are, of course, going to be focusing on the 04:02:07.580 |
API, the backend. I'm not a front-end engineer, so I can't 04:02:11.340 |
take you through that but the code is there. So, for those of 04:02:14.380 |
you that do want to go through the front-end code, you can, of 04:02:17.260 |
course, go and do that but we'll be focusing on how we 04:02:20.380 |
build the API that powers all of this using, of course, 04:02:24.220 |
everything that we've learned so far. So, let's jump into it. 04:02:27.340 |
The first thing we're going to want to do is clone this repo. 04:02:30.700 |
So, we'll copy this URL. This is the repo, Aurelio Labs 04:02:34.860 |
LangChain Course, and you just clone the repo like so. I've 04:03:41.340 |
already done this so I'm not going to do it again. Instead, 04:02:44.940 |
I'll just navigate to the LangChain Course repo. Now, 04:03:49.340 |
there's a few setup things that you do need to do. All of 04:02:53.020 |
those can be found in the README. So, we just open a new 04:02:57.740 |
tab here and I'll open the README. Okay, so this explains 04:03:03.180 |
everything we need. We have, if you were running this locally 04:03:06.860 |
already, you will have seen this or you will have already 04:03:09.580 |
done all this but for those of you that haven't, we'll go 04:03:12.460 |
through quickly now. So, you will need to install the uv 04:03:18.140 |
library. So, this is how we manage our Python environment, 04:03:22.700 |
our packages. We use uv. On Mac, you would install it like 04:03:27.980 |
so. If you're on Windows or Linux, just double check how 04:03:32.620 |
you would install over here. Once you have installed this, 04:03:36.700 |
you would then go to install Python. So, uv Python install. 04:03:42.780 |
Then, we want to create our VM, our virtual environment 04:03:47.580 |
using that version of Python. So, uv venv here. Then, as you can 04:03:53.820 |
see here, we need to activate that virtual environment which 04:03:57.420 |
I did miss from here. So, let me quickly add that. So, you 04:04:02.060 |
just run that. For me, I'm using Fish. So, I just add 04:04:05.740 |
Fish onto the end there, but if you're using Bash or ZSH, I 04:04:08.380 |
think you can you can just run that directly. And then, 04:04:11.100 |
finally, we need to sync, i.e. install all of our packages 04:04:16.700 |
using uv sync. And you see that will install everything for 04:04:20.940 |
you. Great. So, we have that and we can go ahead and actually 04:04:26.940 |
open Cursor or VS Code and then we should find ourselves 04:04:32.220 |
within Cursor or VS Code. So, in here, you'll find a few 04:04:37.740 |
things that we will need. So, first is environment variables. 04:04:42.780 |
So, we can come over to here and we have OpenAI API Key, 04:04:47.100 |
LangChain API Key, and SerpAPI API Key. Create a copy of 04:04:50.940 |
this and you'd make this your .env file or if you want to 04:04:56.780 |
run it with source, you can. I like to use a mac.env file 04:05:01.820 |
when I'm on Mac and I just add export onto the start there and 04:05:05.740 |
then enter my API keys. Now, I actually already have these in 04:05:10.140 |
this local.mac.env file which over in my terminal, I would 04:05:15.420 |
just activate with source again like that. Now, we'll need that 04:05:20.540 |
when we are running our API and application later but for now, 04:05:24.940 |
let's just focus on understanding what the API 04:05:28.380 |
actually looks like. So, navigating into the 09 Capstone 04:05:33.340 |
chapter, we'll find a few things. What we're going to 04:05:37.020 |
focus on is the API here and we have a couple of notebooks 04:05:41.260 |
that help us just understand, okay, what are we actually 04:05:44.780 |
doing here? So, let me give you a quick overview of the API 04:05:49.260 |
first. So, the API, we're using FastAPI for this. We have a 04:05:53.340 |
few functions in here. The one that we'll start with is this. 04:05:57.420 |
Okay. So, this is our post endpoint for invoke and this 04:06:01.900 |
essentially sends something to our LLM and begins a streaming 04:06:05.980 |
response. So, we can go ahead and actually start the API and 04:06:09.980 |
we can just see what this looks like. So, we'll go into 04:06:13.180 |
chapter 09 Capstone API after setting our environment 04:06:18.060 |
variables here, and we just want to do uv run uvicorn 04:06:23.260 |
main:app --reload. We don't need the reload flag, but if we're 04:06:26.620 |
modifying the code, that can be useful. Okay, and we can see 04:06:29.820 |
that our API is now running on localhost port 8000 and 04:06:37.340 |
if we go to our browser, we can actually open the docs for our 04:06:41.180 |
API. So, we go to 8000 slash docs. Okay, we just see that we 04:06:45.900 |
have that single invoke method. It extracts the content and it 04:06:51.420 |
gives us a small amount of information there. Now, we 04:06:54.780 |
could try it out here. So, if we say, say, hello, we can run 04:07:00.860 |
that and we'll see that we get a response. We get this. Okay. 04:07:08.140 |
Now, the thing that we're missing here is that this is 04:07:10.380 |
actually being streamed back to us. Okay. So, this is not a 04:07:15.340 |
just a direct response. This is a stream. To see that, we're 04:07:19.020 |
going to navigate over to here to this streaming testing 04:07:21.980 |
notebook and we'll run this. So, we are using requests here. 04:07:28.540 |
We are not just doing a, you know, the standard post request 04:07:32.940 |
because we want to stream the output and then print the 04:07:35.900 |
output as we are receiving them. Okay. So, that's why this 04:07:41.100 |
look, it's a little more complicated than just a typical 04:07:43.340 |
request request.get. So, what we're doing here is we're 04:07:49.340 |
starting our session which is our post request and then we're 04:07:53.580 |
just iterating through the content as we receive it from 04:07:57.340 |
that request. When we receive a token, right? Because sometimes 04:08:00.940 |
this might be none. We print that, okay, and we have 04:08:04.700 |
flush equals true, as we've used in the past. 04:08:08.780 |
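In other words, something roughly like this, where the URL and the way content is passed are assumptions based on the invoke endpoint we just looked at:

import requests

def stream_from_api(content: str):
    # stream the response rather than waiting for the full body
    with requests.post(
        "http://localhost:8000/invoke",   # assumed local API URL
        params={"content": content},      # assumed: content passed as a query parameter
        stream=True,
    ) as response:
        for chunk in response.iter_content(decode_unicode=True):
            if chunk:  # skip empty keep-alive chunks
                print(chunk, end="", flush=True)

So, let's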
define that and then let's just ask a simple question. What is 04:08:15.100 |
Okay, and we saw that it was pretty quick. So, it 04:08:19.440 |
generated this response first and then it went ahead and 04:08:23.680 |
actually continued streaming with all of this. Okay and we 04:08:29.120 |
can see that there are these special tokens are being 04:08:31.360 |
provided. This is to help the front end basically decide, 04:08:36.240 |
okay, what should go where? So, here where we're showing these 04:08:41.280 |
multiple steps of tool use and the parameters. The way the 04:08:46.160 |
front end is deciding how to display those is it's just it's 04:08:50.800 |
being provided the single stream, but it has these step 04:08:53.600 |
tokens. It has a step token, has a step name, then it has the 04:08:57.120 |
parameters, followed by the sort of end-of-step token, and 04:09:01.200 |
it's looking at each one of these and then the one step 04:09:04.960 |
name that it treats differently is where it will see the final 04:09:08.800 |
answer step name. When it sees the final step name rather than 04:09:11.840 |
displaying this tool use interface, it instead begins 04:09:15.680 |
streaming the tokens directly like a typical chat interface 04:09:20.320 |
and if we look at what we actually get in our final 04:09:23.120 |
answer, it's not just the answer itself, right? So, we 04:09:26.720 |
have the answer here. This is streamed into that typical chat 04:09:32.640 |
output but then we also have tools used and then this is 04:09:36.240 |
added into the little boxes that we have below the chat 04:09:40.800 |
here. So, there's quite a lot going on just within this 04:09:44.000 |
little stream. Now, we can try with some other questions here. 04:09:48.880 |
So, we can say, okay, tell me about the latest news in the 04:09:50.960 |
world. You can see that there's a little bit of a wait here 04:09:52.960 |
whilst it's waiting to get the response and then, yeah, 04:09:56.160 |
it's streaming a lot of stuff quite quickly, okay? So, there's 04:10:00.160 |
a lot coming through here, okay? And then we can ask other 04:10:03.840 |
questions like, okay, this one here, how cold is it in Oslo 04:10:06.880 |
right now? What is five multiplied by five, right? So, these two 04:10:10.800 |
are going to be executed in parallel and then it will after 04:10:14.800 |
it has the answers for those, the agent will use another 04:10:18.400 |
multiply tool to multiply those two values together and all of 04:10:21.920 |
that will get streamed, okay? And then, as we saw earlier, we 04:10:26.640 |
have the what is the current date and time in these places. 04:10:29.440 |
Same thing. So, three questions. There are three 04:10:32.560 |
questions here. What is the current date and time in Dubai? 04:10:34.640 |
What is the current date and time in Tokyo and what is the 04:10:36.720 |
current date and time in Berlin? Those three questions 04:10:40.880 |
get executed in parallel against the SerpAPI search tool and 04:10:45.200 |
then all answers get returned within that final answer, okay? 04:10:49.520 |
So, that is how our API is working. Now, let's dive a 04:10:55.360 |
little bit into the code and understand how it is working. 04:11:00.240 |
So, there are a lot of important things here. There's 04:11:03.280 |
some complexity but at the same time, we try to make this as 04:11:06.160 |
simple as possible as well. So, this is just fast API syntax 04:11:10.480 |
here with the app post invoke. So, just our invoke endpoint. 04:11:15.040 |
We consume some content which is a string and then if you 04:11:19.040 |
remember from the agent executor deep dive, which is 04:11:22.480 |
what we've implemented here or a modified version of that, we 04:11:27.520 |
have to initialize our async IO queue and our streamer which 04:11:32.160 |
is the queue callback handler which I believe is exactly the 04:11:35.520 |
same as what we defined in that earlier chapter. There's no 04:11:38.800 |
differences there. So, we define that and then we return 04:11:43.520 |
this streaming response object, right? Again, this is a fast 04:11:46.960 |
API thing. This is so that you are streaming a response. That 04:11:50.880 |
streaming response has a few attributes here which again are 04:11:55.040 |
fast API things or just generic API things. So, some headers 04:12:00.000 |
giving instructions to the API and then the media type here 04:12:03.440 |
which is text/event-stream. You can also use, I think, 04:12:07.360 |
text/plain possibly as well, but I believe the standard here would 04:12:12.000 |
be to use event stream and then the more important part for us 04:12:16.400 |
is this token generator, okay? So, what is this token 04:12:20.480 |
generator? Well, it is this function that we've defined up 04:12:24.080 |
here. Now, if you, again, if you remember that earlier 04:12:27.760 |
chapter, at the end of the chapter, we set up a for loop 04:12:33.280 |
where we're printing out different tokens in various 04:12:36.320 |
formats. So, we're kind of post processing them before 04:12:40.320 |
deciding how to display them. That's exactly what we're doing 04:12:43.520 |
here. So, in this block here, we're looping through every 04:12:50.400 |
token that we're receiving from our streamer. We're looping 04:12:54.720 |
through and we're just saying, okay, if this is the end of a 04:12:58.240 |
step, we're going to yield this end-of-step token, which we 04:13:02.640 |
saw here, okay? So, it's this end-of-step token there. 04:13:07.680 |
Otherwise, if this is a tool call, so again, we've got that 04:13:11.280 |
walrus operator here. So, what we're doing is saying, okay, 04:13:14.720 |
get the tool calls out from our current message. If there is 04:13:19.760 |
something there. So, if this is not none, we're going to execute 04:13:23.360 |
what is inside here and what is being executed inside here is 04:13:27.200 |
we're checking for the tool name. If we have the tool name, 04:13:30.160 |
we return this, okay? So, we have the start of step token, 04:13:35.040 |
the start of the step name token, the tool name or step 04:13:39.680 |
name, whichever those you want to call it, and then the end of 04:13:42.560 |
the step name token, okay? And then this, of course, comes 04:13:48.560 |
through to the front end like that, okay? That's what we have 04:13:52.320 |
there. Otherwise, we should only be seeing the tool name 04:13:55.680 |
returned as part of first token for every step. After that, it 04:13:59.520 |
should just be tool arguments. So, in this case, we say, okay, 04:14:03.440 |
if we have those tool or function arguments, we're going 04:14:06.480 |
to just return them directly. So, then that is the part that 04:14:09.840 |
would stream all of this here, okay? Like these would be 04:14:13.600 |
individual tokens, right? For example, right? So, we might 04:14:16.800 |
have the open curly brackets followed by query could be a 04:14:20.960 |
token, the latest could be a token, world could be a token, 04:14:24.640 |
news could be a token, etc. Okay? So, that is what is 04:14:28.160 |
happening there. This should not get executed, but we 04:14:32.720 |
just handle it just in case we have any issues 04:14:36.320 |
with tokens being returned there. We're just gonna print 04:14:39.040 |
this error and we're going to continue with the streaming, but 04:14:43.600 |
that should not really be happening. Cool. So, that is 04:14:47.120 |
our token streaming loop. 04:14:53.920 |
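Putting that together, the endpoint looks roughly like this; the sentinel strings, the step markers, and the agent_executor.invoke signature are assumptions based on what we saw in the stream above, and QueueCallbackHandler is assumed to be imported from the agent code:

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_generator(content: str, streamer: "QueueCallbackHandler"):
    # kick off the agent executor as a background task
    task = asyncio.create_task(agent_executor.invoke(content, streamer, verbose=True))
    async for token in streamer:
        if token == "<<STEP_END>>":          # assumed sentinel from the callback handler
            yield "</step>"                  # illustrative end-of-step marker
        elif tool_calls := token.message.additional_kwargs.get("tool_calls"):
            if tool_name := tool_calls[0]["function"].get("name"):
                yield f"<step><step_name>{tool_name}</step_name>"  # illustrative markers
            if tool_args := tool_calls[0]["function"].get("arguments"):
                yield tool_args
    await task

@app.post("/invoke")
async def invoke(content: str):
    queue: asyncio.Queue = asyncio.Queue()
    streamer = QueueCallbackHandler(queue)
    # stream the generator's output back to the client as it is produced
    return StreamingResponse(
        token_generator(content, streamer),
        media_type="text/event-stream",
    )

Now, the way that we are picking up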
tokens from our stream object here is of course through our 04:14:57.840 |
agent execution logic which is happening in parallel, okay? So, 04:15:02.000 |
all of this is asynchronous. We have this async definition 04:15:04.720 |
here. So, all of this is happening asynchronously. So, 04:15:08.640 |
what has happened here is here, we have created a task which is 04:15:14.320 |
the agent executor invoke and we passing our content, we're 04:15:17.840 |
passing that streamer which we're gonna be pulling tokens 04:15:20.160 |
from, and we also set verbose to true. We can actually 04:15:24.160 |
remove that but that would just allow us to see additional 04:15:27.600 |
output in our terminal window if we want it. I don't think 04:15:32.640 |
there's anything particularly interesting to look at in there 04:15:36.400 |
but particularly if you are debugging that can be useful. 04:15:40.000 |
So, we create our task here but this does not begin the task. 04:15:45.440 |
Alright, this is a async IO create task but this does not 04:15:49.840 |
begin until we await it down here. So, what is happening 04:15:53.520 |
here is essentially this code here is still being run or in 04:15:58.880 |
like we're in an asynchronous loop here, but then we await 04:16:02.800 |
this task. As soon as we await this task, tokens will 04:16:06.320 |
start being placed within our queue which then get picked up 04:16:10.480 |
by the streamer object here. So, then this begins receiving 04:16:14.880 |
tokens. I know async is always a little bit more confusing 04:16:20.880 |
given the strange order of things but that is essentially 04:16:25.040 |
what is happening. You can imagine all this is essentially 04:16:27.680 |
being executed all at the same time. So, we have that. So, 04:16:32.800 |
anything else to go through here? I don't think so. It's 04:16:35.520 |
all sort of boilerplate stuff for FastAPI rather than the 04:16:39.040 |
actual AI code itself. So, we have that as our streaming 04:16:43.600 |
function. Now, let's have a look at the agent code itself. 04:16:48.720 |
Okay. So, agent code. Where would that be? So, we're using 04:16:52.400 |
this agent execute invoke and we're importing this from the 04:16:56.720 |
agent file. So, we can have a look in here for this. Now, you 04:17:01.840 |
can see straight away, we're pulling in our API keys here. 04:17:06.000 |
Just make sure that you do have those. Now, all of 04:17:10.000 |
this, okay? This is what we've seen before in that agent 04:17:14.800 |
executor deep dive chapter. This is all practically the 04:17:19.280 |
same. So, we have our LLM. We've set those configurable fields 04:17:25.280 |
as we did in the earlier chapters. That configurable 04:17:28.240 |
field is for our callbacks. We have our prompt. This has been 04:17:31.760 |
modified a little bit. So, essentially, just telling it, 04:17:36.080 |
okay, make sure you use the tools provided. We say you must 04:17:40.480 |
use the final answer tool to provide a final answer to the user, and 04:18:43.680 |
one thing that I added that I noticed every now and again. So, 04:17:47.360 |
I have explicitly said, use tools to answer the user's 04:17:50.400 |
current question, not previous questions. So, I found with 04:17:54.800 |
this setup, it will occasionally, if I just have a 04:17:58.720 |
little bit of small talk with the agent and beforehand I was 04:18:02.080 |
asking questions about, okay, like what was the weather in 04:18:04.720 |
this place or that place, the agent will kind of hang on to 04:18:08.000 |
those previous questions and try and use a tool again to 04:18:11.600 |
answer and that is just something that you can more or 04:18:14.240 |
less prompt out of it, okay? So, we have that. This is all 04:18:18.400 |
exactly the same as before, okay? So, we have our chat 04:18:21.200 |
history to make this conversational. We have our 04:18:23.920 |
human message and then our agent scratch pad so that our 04:18:27.040 |
agent can think through multiple tool use messages. 04:18:30.960 |
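As a rough sketch, and not the literal prompt from the course repo, that kind of prompt template looks something like this in LangChain, assuming the usual chat_history and agent_scratchpad placeholder names:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You are a helpful assistant. Use the tools provided to answer the "
        "user's CURRENT question, not previous questions. You MUST use the "
        "final_answer tool to provide your final answer to the user."
    )),
    MessagesPlaceholder(variable_name="chat_history"),     # conversational memory
    ("human", "{input}"),                                   # the user's new message
    MessagesPlaceholder(variable_name="agent_scratchpad"),  # tool calls + observations
])
```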
Great. So, we also have the article class. So, this is to 04:18:36.080 |
process results from SERP API. We have our SERP API function 04:18:42.160 |
here. I will talk about that a little more in a moment 04:18:45.040 |
because this is also a little bit different to what we 04:18:46.800 |
covered before. What we covered before with SERP API, if you 04:18:51.200 |
remember, was synchronous because we're using the SERP 04:18:55.040 |
API client directly or the SERP API tool directly from 04:18:59.840 |
LangChain and because we want everything to be asynchronous, 04:19:03.920 |
we have had to recreate that tool in an asynchronous fashion 04:19:09.600 |
which we'll talk about a little bit later. But for now, let's 04:19:13.360 |
move on from that. We can see our final answer being used 04:19:18.000 |
here. So, this is I think we define the exact same thing 04:19:21.920 |
before probably in that deep dive chapter again where we 04:19:25.040 |
have just the answer and the tools that have been used. 04:19:29.200 |
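For reference, a final answer tool along those lines can be sketched like this; the field names mirror what is described here, but treat it as an approximation rather than the exact course code:

```python
from langchain_core.tools import tool

@tool
async def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Provide the final natural-language answer plus the list of tools used."""
    # there is no real work to do here: the structured arguments ARE the output
    return {"answer": answer, "tools_used": tools_used}
```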
Great. So, we have that. One thing that is a little 04:19:32.640 |
different here is when we are defining our name to tool 04:19:38.480 |
function. So, this takes a tool name and it maps it to a tool 04:19:43.680 |
function. When we have synchronous tools, we actually 04:19:48.800 |
use tool funk here. Okay. So, rather than tool coroutine, it 04:19:53.440 |
would be tool funk. However, we are using asynchronous tools 04:19:59.200 |
and so this is actually tool coroutine and this is why 04:20:04.960 |
if you come up here, I've made every single tool 04:20:08.320 |
asynchronous. Now, that is not really necessary for a tool 04:20:13.360 |
like final answer because there's no API calls 04:20:16.560 |
happening. An API call is a very typical scenario where 04:20:20.400 |
you do want to use async because if you make an API call 04:20:23.840 |
with a synchronous function, your code is just going to be 04:20:26.800 |
waiting for the response from the API while the API is 04:20:31.440 |
processing and doing whatever it's doing. So, that is an 04:20:36.080 |
ideal scenario where you would want to use async because 04:20:38.960 |
rather than your code just waiting for the response from 04:20:42.880 |
the API, it can instead go and do something else whilst it's 04:20:46.320 |
waiting, right? So, that's an ideal scenario where you'd use 04:20:49.360 |
async which is why we would use it for example with the 04:20:51.760 |
SERP API tool here but for final answer and for all of 04:20:56.320 |
these calculator tools that we've built, there's actually 04:21:00.720 |
no need to have these as async because our code is just 04:21:05.920 |
running through. It's executing this code. There's no waiting 04:21:09.280 |
involved. So, it doesn't necessarily make sense to have 04:21:12.080 |
these asynchronous. However, by making them asynchronous, it 04:21:16.160 |
means that I can do tool coroutine for all of them 04:21:19.440 |
rather than saying, oh, if this tool is synchronous, use 04:21:23.520 |
tool.func whereas if this one is async, use tool.coroutine. 04:21:28.000 |
So, it just simplifies the code for us a lot more but yeah, not 04:21:33.040 |
directly necessary but it does help us write cleaner code 04:21:36.800 |
here. This is also true later on because we actually have to 04:21:41.280 |
await our tool calls which we can see over here, right? So, 04:21:46.880 |
we have to await those tool calls. That would get messier 04:21:50.960 |
if we were using a mix of sync and async tools. 04:21:56.880 |
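Concretely, with every tool defined as async, the mapping might look roughly like this (a simplified sketch with a couple of toy tools rather than the full set from the app):

```python
from langchain_core.tools import tool

@tool
async def add(x: float, y: float) -> float:
    """Add two numbers together."""
    return x + y  # no I/O here, but async anyway so every tool exposes .coroutine

@tool
async def multiply(x: float, y: float) -> float:
    """Multiply two numbers together."""
    return x * y

tools = [add, multiply]  # plus final_answer, the SERP API tool, etc. in the real app

# tool name -> async callable; for sync tools this would have to be t.func instead
name2tool = {t.name: t.coroutine for t in tools}

# executing a tool call chosen by the LLM then looks like:
#   observation = await name2tool[tool_call["name"]](**tool_call["args"])
```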
So, we have that. We have our Q callback handler. This is 04:22:00.320 |
again, that's the same as before. So, I'm not going to go 04:22:03.520 |
through that. We covered that 04:22:06.080 |
in the earlier deep dive chapter. We have our execute 04:22:09.600 |
tool function here. Again, that is asynchronous. This just 04:22:13.120 |
helps us, you know, clean up code a little bit. This would, 04:22:16.640 |
I think in the deep dive chapter, we had this directly 04:22:20.000 |
place within our agent executor function and you can do that. 04:22:23.840 |
It's fine. It's just a bit cleaner to kind of pull this 04:22:26.880 |
out and we can also add more type annotations here which I 04:22:30.480 |
like. So, execute tool expects us to provide an AI message 04:22:34.400 |
which includes a tool call within it and it will return us 04:22:38.640 |
a tool message. 04:22:44.480 |
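A simplified version of that helper, with the type annotations described, might look as follows; the course version closes over the name2tool mapping rather than taking it as a parameter, so this is an approximation:

```python
from langchain_core.messages import AIMessage, ToolMessage

async def execute_tool(ai_msg: AIMessage, name2tool: dict) -> ToolMessage:
    """Run the single tool call carried by this AI message and wrap the result."""
    tool_call = ai_msg.tool_calls[0]
    # look up the async tool and call it with the LLM-provided arguments
    observation = await name2tool[tool_call["name"]](**tool_call["args"])
    return ToolMessage(
        content=str(observation),
        tool_call_id=tool_call["id"],  # ties the observation back to the request
    )
```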
Okay. The agent executor, this is all the same as before and we're actually not even using verbose here so 04:22:48.240 |
we could fully remove it but I will leave it. Of course, if 04:22:51.040 |
you would like to use that, you can just add a if verbose and 04:22:54.400 |
then log or print some stuff where you need it. Okay. So, 04:22:59.760 |
what do we have in here? We have our streaming function. So, 04:23:02.720 |
this is what actually calls our agent, right? So, we have a 04:23:08.800 |
query. This will call our agent just here and we could even 04:23:14.080 |
make this a little clearer. So, for example, this could be 04:23:17.200 |
configured agent because this is not the response. 04:23:22.320 |
This is a configured agent. So, I think this is maybe a little 04:23:25.360 |
clearer. So, we are configuring our agent with our callbacks, 04:23:29.520 |
okay? Which is just our streamer. Then we're iterating 04:23:32.880 |
through the tokens that are returned by our agent using astream 04:23:37.040 |
here. Okay? And as we are iterating through this because 04:23:41.920 |
we pass our streamer to the callbacks here, what that is 04:23:46.400 |
going to do is every single token that our agent returns is 04:23:52.320 |
gonna get processed through our queue callback handler here. 04:23:57.280 |
Okay? So, these on_llm_new_token and on_llm_end methods are going to get 04:24:03.360 |
executed and then all of those tokens you can see here are 04:24:07.360 |
passed to our queue. Okay? Then, we come up here and we 04:24:11.040 |
have this __aiter__. So, this __aiter__ method here is used 04:24:16.000 |
over in our API by that token generator 04:24:22.660 |
to pick up from the queue the tokens that have been put in 04:24:28.420 |
the queue by these other methods here. Okay? So, it's 04:24:32.260 |
putting tokens into the queue and pulling them out with this. 04:24:38.020 |
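Roughly, a queue-based callback handler like that can be sketched as below. It is heavily simplified, it only forwards plain token strings and uses a <<DONE>> sentinel, whereas the course handler also deals with tool-call chunks, but the shape is the same:

```python
import asyncio
from langchain_core.callbacks import AsyncCallbackHandler

class QueueCallbackHandler(AsyncCallbackHandler):
    """Push streamed LLM tokens onto an asyncio.Queue and iterate over them."""

    def __init__(self, queue: asyncio.Queue):
        self.queue = queue

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        # called for every streamed token: just park it on the queue
        if token:
            await self.queue.put(token)

    async def on_llm_end(self, response, **kwargs) -> None:
        # signal consumers that this LLM call has finished
        await self.queue.put("<<DONE>>")

    async def __aiter__(self):
        # lets `async for token in streamer` pull tokens back out of the queue
        while True:
            token = await self.queue.get()
            if token == "<<DONE>>":
                return
            yield token
```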
Okay? So, that is just happening in parallel as well as 04:24:41.460 |
this code is running here. Now, the reason that we extract the 04:24:45.380 |
tokens out here is that we want to pull out our tokens and we 04:24:49.460 |
append them all to our outputs. Now, those outputs that becomes 04:24:53.780 |
a list of AI messages which are essentially the AI telling us 04:24:58.660 |
what tool to use and what parameters to pass to each one 04:25:02.580 |
of those tools. This is very similar to what we covered in 04:25:06.180 |
that deep dive chapter but the one thing that I have modified 04:25:09.380 |
here is I've enabled us to use parallel tool calls. So, that 04:25:17.460 |
is what we see here with these four lines of code. We're 04:25:21.060 |
saying, okay, if our tool call includes an ID, that means we 04:25:24.660 |
have a new tool call or a new AI message. So, what we do is 04:25:29.940 |
we append that AI message which is the AI message chunk to our 04:25:35.060 |
outputs and then following that, if we don't get an ID, 04:25:38.180 |
that means we're getting the tool arguments. So, following 04:25:41.780 |
that, we're just adding our AI message chunk to the most 04:25:46.420 |
recent AI message chunk from our outputs. Okay, so what that 04:25:50.260 |
will do is it will create that list of AI messages. It'll be 04:25:56.500 |
like, you know, AI message one and then this will just append 04:26:01.780 |
everything to that AI message one. Then, we'll get our next 04:26:05.700 |
AI message chunk. This will then just append everything to 04:26:09.220 |
that until we get a complete AI message and so on and so on. 04:26:13.780 |
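In code, that accumulation step looks something like this, assuming the streamer yields AIMessageChunk objects; it is a sketch of the idea rather than the exact lines from the repo:

```python
from langchain_core.messages import AIMessageChunk

async def collect_tool_calls(streamer) -> list[AIMessageChunk]:
    """Accumulate streamed chunks into one complete entry per tool call."""
    outputs: list[AIMessageChunk] = []
    async for chunk in streamer:
        if chunk.tool_call_chunks and chunk.tool_call_chunks[0].get("id"):
            # a chunk carrying a tool_call id starts a new tool call / AI message
            outputs.append(chunk)
        elif outputs:
            # no id means argument fragments: merge into the most recent message
            outputs[-1] = outputs[-1] + chunk
    # AIMessageChunk supports `+`, so each entry ends up as one complete tool call
    return outputs
```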
Okay. So, what we do here is, we've collected all of 04:26:19.780 |
our AI message chunk objects. Then, finally, what we do is 04:26:23.460 |
just transform all those AI message chunk objects into 04:26:26.580 |
actual AI message objects and then return them from our 04:26:29.700 |
function which we then receive over here. So, into the tool 04:26:33.780 |
calls variable. Okay. Now, this is very similar to the deep 04:26:38.980 |
dive chapter. Again, we're going through that count, that 04:26:42.660 |
loop where we have a max iterations at which point we 04:26:45.300 |
will just stop but until then, we continue iterating through 04:26:50.660 |
and making more tool calls, executing those tool calls, and 04:26:53.700 |
so on. So, what is going on here? Let's see. So, we got our 04:26:58.580 |
tool calls. This is going to be a list of AI message objects. 04:27:02.660 |
Then, what we do with those AI message objects is we pass them 04:27:07.060 |
to this execute tool function. If you remember, what is that? 04:27:10.500 |
That is this function here. So, we pass each AI message 04:27:15.140 |
individually to this function and that will execute the tool 04:27:20.260 |
for us and then return us that observation from the tool. 04:27:25.620 |
Okay. So, that is what you see happening here but this is an 04:27:30.660 |
async method. So, typically, what you'd have to do is you'd 04:27:34.100 |
have to do await execute tool and we could do that. So, we 04:27:38.420 |
could do a, okay, let me make this a little bigger for us. 04:27:42.660 |
Okay. And so, what we could do, for example, which might be a 04:27:45.700 |
bit clearer is you could do tool obs equals an empty list 04:27:51.220 |
and what you could do is you can say for tool call, oops, in 04:27:56.180 |
tool calls, the tool observation is we're going to 04:28:00.980 |
append execute tool call which would have to be in a wait. So, 04:28:06.100 |
we'd actually put the await in there and what this would do is 04:28:09.460 |
actually the exact same thing as what we're doing here. The 04:28:12.740 |
difference being that we're doing this tool by tool. Okay. 04:28:17.540 |
So, we are, we're executing async here but we're doing them 04:28:22.340 |
sequentially whereas what we can do which is better is we 04:28:25.780 |
can use async gather. So, what this does is gathers all those 04:28:30.260 |
coroutines and then we await them all at the same time to 04:28:34.180 |
run them all asynchronously. They all begin at the same time 04:28:37.780 |
or almost exactly the same time and we get those responses 04:28:42.500 |
kind of in parallel but of course it's async so it's not 04:28:46.260 |
fully in parallel but practically in parallel. 04:28:50.260 |
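Side by side, the sequential version versus the gathered version looks roughly like this, assuming an execute_tool coroutine like the one sketched earlier (here taking just the AI message, as described for the course code):

```python
import asyncio

async def run_tools_sequentially(tool_calls):
    # still async under the hood, but each call waits for the previous one
    tool_obs = []
    for tool_call in tool_calls:
        tool_obs.append(await execute_tool(tool_call))
    return tool_obs

async def run_tools_concurrently(tool_calls):
    # schedule every coroutine, then await them together; slow network-bound
    # tools (like the SERP API call) overlap instead of queueing up
    return await asyncio.gather(*(execute_tool(tc) for tc in tool_calls))
```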
Cool. So, we have that and then, okay, we get all of our tool 04:28:54.900 |
observations from that. So, that's all of our tool messages 04:28:57.620 |
and then one interesting thing here is if we, 04:29:01.700 |
let's say we have all of our AI messages with all of our tool 04:29:04.980 |
calls and we just append all of those to our agent scratchpad. 04:29:09.460 |
Alright. So, let's say here we're just like, oh, okay, 04:29:11.860 |
agent scratchpad extend and then we would just have, okay, 04:29:17.700 |
we'd have our tool calls and then we do agent scratchpad 04:29:22.820 |
extend tool obs. Alright. So, what is happening here is this 04:29:27.780 |
would essentially give us something that looks like this. 04:29:33.700 |
So, we'd have our AI message, say, I'm just gonna put, okay, 04:29:38.660 |
we'll just put tool call IDs in here to simplify it a little 04:29:41.380 |
bit. This would be tool call ID A. Then, we would have AI 04:29:46.900 |
message, tool call ID B. Then, we'd have tool message. Let's 04:29:54.740 |
just remove this content field. I don't want that and tool 04:29:59.140 |
message, tool call ID B, right? So, it would look something 04:30:02.660 |
like this. So, the order is the tool message is not following 04:30:07.140 |
the AI message which you would think, okay, we have this tool 04:30:10.420 |
call ID. That's probably fine but actually, when we're 04:30:12.980 |
running this, if you add these two agents scratchpad in this 04:30:16.340 |
order, what you'll see is your response just hangs like 04:30:21.300 |
nothing. Nothing happens when you come through to your second 04:30:25.860 |
iteration of your agent call. So, actually, what you need to 04:30:29.620 |
do is these need to be sorted so that they are actually in 04:30:33.060 |
order and it doesn't necessarily matter 04:30:36.740 |
which order in terms of like A or B or C or whatever you use. 04:30:40.500 |
So, you could have this order. We have AI message, tool 04:30:43.460 |
message, AI message, tool message, just as long as you 04:30:46.180 |
have your tool call IDs are both together or you could, you 04:30:49.620 |
know, invert this for example, right? So, you could have this, 04:30:54.580 |
right? And that will work as well. It's essentially just as 04:30:58.180 |
long as you have your AI message followed by your tool 04:31:01.140 |
message and both of those are sharing that tool call ID. You 04:31:04.260 |
need to make sure you have that order, okay? So, that of course 04:31:09.140 |
would not happen if we do this and instead, what we need to do 04:31:13.700 |
is something like this, okay? So, if I make this a little 04:31:18.580 |
easier to read, okay? So, we're taking the tool call ID. We are 04:31:23.780 |
pointing it to the tool observation and we're doing 04:31:26.500 |
that for every tool call and tool observation within like a 04:31:29.860 |
zip of those, okay? Then, what we're saying is for each tool 04:31:35.060 |
call within our tool calls, we are extending our agent 04:31:38.820 |
scratchpad with that tool call followed by the tool 04:31:43.300 |
observation message which is the tool message. So, this would 04:31:46.420 |
be our, this is the AI message and that is the tool messages 04:31:51.860 |
down there, okay? So, that is always happening and that is 04:31:54.900 |
how we get this correct order which will run. Otherwise, 04:31:59.620 |
things will not run. So, that's important to be aware of, okay? 04:32:04.020 |
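Put as code, the interleaving described above is roughly the following; it is a sketch, and it assumes the tool observations carry the matching tool_call_id, as LangChain ToolMessage objects do:

```python
from langchain_core.messages import AIMessage, ToolMessage

def interleave(tool_calls: list[AIMessage], tool_obs: list[ToolMessage]) -> list:
    """Return [AIMessage, ToolMessage, AIMessage, ToolMessage, ...] pairs."""
    # map each tool_call id to the ToolMessage that answered it
    id2obs = {obs.tool_call_id: obs for obs in tool_obs}
    scratchpad_update = []
    for call in tool_calls:
        call_id = call.tool_calls[0]["id"]
        # every AI message must be immediately followed by its matching tool message
        scratchpad_update.extend([call, id2obs[call_id]])
    return scratchpad_update

# agent_scratchpad.extend(interleave(tool_calls, tool_obs))
```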
Now, we're almost done. I know we've just been 04:32:07.220 |
through quite a lot. So, we continue, we increment our 04:32:10.820 |
count as we were doing before and then we need to check for 04:32:13.300 |
the final answer tool, okay? And because we're running these 04:32:16.260 |
tools in parallel, okay? Because we're allowing multiple 04:32:19.460 |
tool calls in one step, we can't just look at the most 04:32:23.300 |
recent tool and look if it is, it has the name final answer. 04:32:26.260 |
Instead, we need to iterate through all of our tool calls 04:32:28.740 |
and check if any of them have the name final answer. If they 04:32:32.020 |
do, we say, okay, we extract that final answer call. We 04:32:35.620 |
extract the final answer as well. So, this is the direct 04:32:38.660 |
text content and we say, okay, we have found the final answer. 04:32:42.900 |
So, this will be set to true, okay? Which should happen 04:32:45.940 |
every time but let's say if our agent gets stuck in a loop of 04:32:50.660 |
calling multiple tools, this might not happen before we 04:32:55.300 |
break based on the max iterations here. So, we might 04:32:58.820 |
end up breaking based on max iterations rather than we found 04:33:02.340 |
a final answer, okay? So, that can happen. So, anyway, if we 04:33:07.460 |
find that final answer, we break out of this for loop here 04:33:11.220 |
and then, of course, we do need to break out of our while loop 04:33:14.420 |
which is here. So, we say, if we found the final answer, 04:33:17.380 |
break, okay? Cool. So, we have that. 04:33:24.100 |
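That check over the parallel tool calls can be sketched like this (again an approximation, not the exact course code):

```python
from langchain_core.messages import AIMessage

def find_final_answer(tool_calls: list[AIMessage]) -> dict | None:
    """Return the final_answer tool's arguments if any parallel call produced one."""
    for call in tool_calls:
        tool_call = call.tool_calls[0]
        if tool_call["name"] == "final_answer":
            # e.g. {"answer": "...", "tools_used": ["add", "serpapi"]}
            return tool_call["args"]
    return None

# inside the agent's while loop:
#   final_answer_call = find_final_answer(tool_calls)
#   if final_answer_call is not None:
#       break
```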
Finally, after all of that, this is how, you know, we've executed our tools, our 04:33:26.900 |
agent has gone through its steps and iterations, we've been through 04:33:32.980 |
those. Finally, we come down to here where we say, okay, we're 04:33:37.220 |
gonna add that final output to our chat history. So, this is 04:33:40.980 |
just going to be the text content, right? So, this here 04:33:45.140 |
gets the direct answer but then, what we do is we return the 04:33:50.180 |
full final answer call. The full final answer call is 04:33:52.740 |
basically this here, right? So, this answer and tools used but 04:33:57.220 |
of course, populated. So, we're saying here that if we have a 04:34:00.820 |
final answer, okay? If we have that, we're going to return the 04:34:05.620 |
final answer call which was generated by our LLM. 04:34:09.300 |
Otherwise, we're gonna return this one. So, this is in the 04:34:12.340 |
scenario that maybe the agent got caught in a loop and just 04:34:15.540 |
kept iterating. If that happens, we'll say it will come 04:34:19.220 |
back with, okay, no answer found and it will just return, 04:34:22.100 |
okay, we didn't use any tools which is not technically true 04:34:25.620 |
but this is like an exception handling event. So, 04:34:30.020 |
it ideally shouldn't happen but it's not really a big deal if 04:34:34.660 |
we're saying, okay, there were no tools used in my opinion 04:34:37.620 |
anyway. Cool. So, we have all of that and yeah, we just, we 04:34:44.340 |
initialize our agent executor and then, I mean, that is our 04:34:48.900 |
agent execution code. The one last thing we wanna go through 04:34:52.020 |
is the SERP API tool which we will do in a moment. Okay. So, 04:34:57.300 |
SERP API. Let's see what, let's see how we build our SERP API 04:35:04.260 |
tool. Okay, so, we'll start with the synchronous SERP API. 04:35:10.900 |
Now, the reason we're starting with this is that it's actually, 04:35:13.700 |
it's just a bit simpler. So, I'll show you this quickly 04:35:16.500 |
before we move on to the async implementation which is what 04:35:19.300 |
we're using within our app. So, we want to get our SERP API 04:35:23.700 |
API key. So, I'll run that and we just enter it at the top 04:35:28.260 |
there. And this will run. So, we're going to use the SERP 04:35:34.500 |
API SDK first. We're importing Google search and these are the 04:35:38.340 |
input parameters. So, we have our API key. We're using, we 04:35:38.340 |
say we want to use Google. Our question is the query, so 04:35:41.220 |
q for query. We're searching for the latest news in the 04:35:45.220 |
world and we'll return quite a lot of stuff. You can see 04:35:52.580 |
there's a ton of stuff in there, right? Now, what we want 04:35:58.900 |
is contained within this organic results key. So, we can 04:36:02.180 |
run that and we'll see, okay, it's talking about, you know, 04:36:06.500 |
various things. Pretty recent stuff at the moment. So, we can 04:36:10.340 |
tell, okay, that is, that is in fact working. Now, this is 04:36:14.340 |
quite messy. So, what I would like to do first is just clean 04:36:17.780 |
that up a little bit. So, we define this article base model 04:36:21.620 |
which is Pydantic and we're saying, okay, from a set of 04:36:25.780 |
results. Okay. So, we're going to iterate through each of 04:36:28.420 |
these. We're going to extract the title, source link, and the 04:36:33.620 |
snippet. So, you can see title, source, link, and snippet here. 04:36:42.340 |
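For reference, that Article model is along these lines; the field and method names approximate what is shown on screen, so treat this as a sketch:

```python
from pydantic import BaseModel

class Article(BaseModel):
    title: str
    source: str
    link: str
    snippet: str

    @classmethod
    def from_serpapi_result(cls, result: dict) -> "Article":
        # pick out just the fields we care about from one organic result
        return cls(
            title=result["title"],
            source=result["source"],
            link=result["link"],
            snippet=result["snippet"],
        )

# articles = [Article.from_serpapi_result(r) for r in results["organic_results"]]
```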
Okay. So, that's all useful. We'll run that and what we do 04:36:46.740 |
is we go through each of the results in organic results and 04:36:51.220 |
we just load them into our article using this class method 04:36:54.020 |
here and then we can see, okay, let's have a look at what those 04:36:58.740 |
look like. It's much nicer. Okay, we get this nicely 04:37:04.260 |
formatted object here. Cool. That's great. Now, all of this, 04:37:10.340 |
what we just did here. So, this is using SERP API's SDK which is 04:37:14.660 |
great. Super easy to use. The problem is that they don't 04:37:17.700 |
offer an async SDK which is a shame but it's not that hard 04:37:22.820 |
for us to set up ourselves. So, typically, with asynchronous 04:37:28.260 |
requests, what we can use is the aiohttp library. Well, 04:37:34.900 |
you can see what we're doing here. So, this is equivalent to 04:37:39.220 |
requests.get. Okay. That's essentially what we're doing 04:37:44.580 |
here and the equivalent is literally this. Okay. So, this 04:37:49.860 |
is the equivalent using requests that we are running 04:37:53.380 |
here but we're using async code. So, we're using an aiohttp 04:37:58.820 |
ClientSession and then session.get. Okay. With this 04:38:03.540 |
async with here and then we just await our response. So, 04:38:06.340 |
this is all, yeah, this is what we do rather than this to make 04:38:10.980 |
our code async. So, it's really simple and then the output that 04:38:14.980 |
we get is exactly the same, right? So, we still get this 04:38:17.860 |
exact same output. So, that means, of course, that we can 04:38:21.300 |
use that articles method like this in the exact same way and 04:38:26.660 |
we get the same result. There's no need to make this 04:38:30.420 |
article from SERP API results async because again, like this, 04:38:35.700 |
this bit of code here is fully local. It's just our Python 04:38:39.540 |
running everything. So, this does not need to be async, okay? 04:38:44.820 |
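For comparison, the two versions of the same GET request look roughly like this; the endpoint and parameter names follow SERP API's documented search API, but the exact parameters the course passes may differ slightly:

```python
import aiohttp
import requests

params = {"api_key": "YOUR_SERPAPI_KEY", "engine": "google", "q": "latest world news"}

# synchronous: the whole program blocks while SERP API does its work
def search_sync() -> dict:
    return requests.get("https://serpapi.com/search", params=params).json()

# asynchronous: while we await the response, the event loop can do other work
async def search_async() -> dict:
    async with aiohttp.ClientSession() as session:
        async with session.get("https://serpapi.com/search", params=params) as resp:
            return await resp.json()
```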
And we can see that we get literally the exact same result 04:38:48.580 |
there. So, with that, we have everything that we would need 04:38:52.420 |
to build a fully asynchronous SERP API tool which is exactly 04:38:56.340 |
what we do here for LangChain. So, we import those tools and I 04:39:00.580 |
mean, there's nothing, is there anything different here? No. 04:39:03.380 |
Alright, this is exactly what we just did but I will run 04:39:06.420 |
this because I would like to show you very quickly this. 04:39:11.220 |
Okay. So, this is how we were initially calling our tools in 04:39:15.860 |
previous chapters because we were okay mostly with using the 04:39:19.860 |
the synchronous tools. However, you can see that the func here 04:39:26.100 |
is just empty. Alright, so if I do type, it's just a NoneType. 04:39:30.660 |
That is because well, this is an async function, okay? It's an 04:39:37.220 |
async tool. Sorry. So, it was defined with async here and 04:39:41.860 |
what happens when you do that is you get this coroutine object. 04:39:47.460 |
So, rather than func which is it isn't here, you get that 04:39:52.260 |
coroutine. If we then modify this which would be kinda, okay, 04:39:57.300 |
let's just remove all the asyncs here and the await. If we 04:40:03.540 |
modify that like so and then we look at the SERP API 04:40:07.860 |
structured tool, we go across, we see that we now get that 04:40:12.020 |
func, okay? So, that is just the difference between a 04:40:15.940 |
sync structured tool versus an async structured tool. If we 04:40:19.620 |
change it back to async, okay, now we have the coroutine again. So, it's 04:40:26.660 |
important to be aware of that and of course, we run using 04:40:33.300 |
the SERP API coroutine. So, that is how we build the 04:40:38.660 |
SERP API tool and there's nothing. I mean, that is 04:40:42.740 |
exactly what we did here. So, I don't need to, I don't think we 04:40:45.380 |
need to go through that any further. So, yeah, I think that 04:40:49.780 |
is basically all of our code behind this API. With all of 04:40:54.340 |
that, we can then go ahead. So, we have our API running 04:40:57.780 |
already. Let's go ahead and actually run also our front 04:41:02.340 |
end. So, we're gonna go to Documents, Aurelio, LangChain 04:41:06.340 |
course and then we want to go to chapters zero nine capstone 04:41:12.100 |
app and you will need to have NPM installed. So, to do that, 04:41:16.420 |
what do we do? We can take a look at this answer for 04:41:19.460 |
example. This is probably what I would recommend, okay? So, I 04:41:23.060 |
would run brew install node followed by brew install npm 04:41:26.900 |
if you're on Mac. Of course, it's different if you're on 04:41:28.740 |
Linux or Windows. Once you have those, you can do npm install 04:41:33.060 |
and this will just install all of 04:41:37.460 |
the node packages that we 04:41:41.780 |
need and then we can just run npm run dev, okay? And now, we 04:41:48.260 |
have our app running on localhost 3000. So, we can come over to 04:41:52.820 |
here, open that up and we have our application. You can 04:41:57.140 |
ignore this. So, in here, we can begin just asking 04:42:00.500 |
questions, okay? So, we can start with a quick question. 04:42:07.380 |
So, we have our streaming happening here. It said the 04:42:12.200 |
agent wants to use the add tool and these are the input 04:42:14.760 |
parameters to the add tool and then we get the streamed 04:42:17.880 |
response. So, this is the final answer tool where we're 04:42:21.800 |
outputting that answer key and value and then here, we're 04:42:25.240 |
outputting that tool used key and value which is just an 04:42:29.000 |
array of the tools being used, which is just the add tool. So, 04:42:32.840 |
we have that. Then, let's ask another question. This time, 04:42:36.520 |
we'll trigger SERP API with tell me about the latest news 04:42:39.880 |
in the world. Okay. So, we can see that's using SERP API and 04:42:46.040 |
the query is latest world news and then it comes down here 04:42:51.560 |
and we actually get some citations here which is kind of 04:42:53.800 |
cool. So, you can also come through to here, okay? And it 04:42:58.040 |
takes us through to here. So, that's pretty cool. 04:43:01.080 |
Unfortunately, I just lost my chat. So, fine. Let me ask that again. 04:43:10.040 |
Okay. We can see that the tool used was SERP API there. Now, let's 04:43:19.360 |
continue with the next question from our notebook which is how 04:43:23.840 |
cold is it right now? What is five multiplied by five and 04:43:27.440 |
what do you get when multiplying those two numbers 04:43:29.760 |
together? I'm just gonna modify that to say in Celsius so that 04:43:35.760 |
I can understand. Thank you. Okay. So, for this one, we can 04:43:38.640 |
see what did we get? So, we got current temperature in Oslo. We 04:43:42.800 |
got multiply five by five which is our second question and then 04:43:47.200 |
we also got subtract. Interesting, at first I didn't know 04:43:52.320 |
why it did that. It's kind of weird. So, it decided to use... 04:43:56.880 |
oh, okay. So, then here it was. Okay, that 04:44:03.520 |
kind of makes sense. Does that make sense? Roughly. Okay. So, 04:44:07.440 |
I think the conversion from Fahrenheit to Celsius is, say, like 04:44:12.080 |
subtract thirty-two. Okay. Yes. So, to go from Fahrenheit to 04:44:18.000 |
Celsius, you are basically doing Fahrenheit minus 04:44:22.720 |
thirty-two and then you're multiplying by this number 04:44:24.880 |
here, which I assume the AI only did roughly. Okay. 04:44:30.960 |
So, subtracting thirty-two from thirty-six would have given us 04:44:33.520 |
four and it gave us approximately two. So, if you 04:44:36.800 |
think, okay, multiplying by this is practically multiplying by 04:44:40.400 |
0.5. So, halving the value and that would give us roughly two 04:44:45.120 |
degrees. So, that's what this was doing here. Kind of interesting. 04:44:48.560 |
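To sanity-check that, the exact conversion is C = (F - 32) × 5/9, so for a reading of roughly 36°F:

```python
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32) * 5 / 9

print(round(fahrenheit_to_celsius(36), 1))  # ~2.2 °C, close to what the agent produced
```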
Okay, cool. So, we've gone through. We have 04:44:53.520 |
seen how to build a fully fledged chat application using 04:44:59.280 |
what we've learned throughout the course and we've built 04:45:02.400 |
quite a lot. If you think about this application, you're 04:45:06.160 |
getting the real-time updates on what tools are being used, 04:45:10.160 |
the parameters being input to those tools, and then that is 04:45:12.640 |
all being returned in a streamed output and even in a 04:45:17.440 |
structured output for your final answer including the 04:45:19.760 |
answer and the tools that we use. So, of course, you know 04:45:23.920 |
what we built here is fairly limited but it's super easy to 04:45:27.920 |
extend this. Like, maybe something that you might 04:45:31.360 |
want to go and do is take what we've built here and like fork 04:45:35.360 |
this application and just go and add different tools to it 04:45:38.160 |
and see what happens because this is very extensible. You 04:45:42.000 |
can do a lot with it but yeah, that is the end of the course. 04:45:46.400 |
Of course, this is just the beginning of whatever it is 04:45:50.800 |
you're wanting to learn or build with AI. Treat this as 04:45:55.200 |
the beginning and just go out and find all the other cool 04:45:59.040 |
interesting stuff that you can go and build. So, I hope this 04:46:03.120 |
course has been useful, informative, and gives you an 04:46:08.960 |
advantage in whatever it is you're going out to build. So, 04:46:12.800 |
thank you very much for watching and taking the course 04:46:15.680 |
and sticking through right to the end. I know it's pretty 04:46:18.720 |
long so I appreciate it a lot and I hope you get a lot out of