LangChain Mastery in 2025 | Full 5 Hour Course


Chapters

0:00 Course Introduction
4:24 CH1 When to Use LangChain
13:28 CH2 Getting Started
14:14 Local Course Setup (Optional)
17:00 Colab Setup
18:11 Initializing our OpenAI LLMs
22:34 LLM Prompting
28:48 Creating a LLM Chain with LCEL
33:59 Another Text Generation Pipeline
37:11 Structured Outputs in LangChain
41:56 Image Generation in LangChain
46:59 CH3 LangSmith
49:36 LangSmith Tracing
55:45 CH4 Prompts
67:21 Using our LLM with Templates
72:39 Few-shot Prompting
78:56 Chain of Thought Prompting
85:25 CH5 LangChain Chat Memory
89:51 ConversationBufferMemory
98:39 ConversationBufferWindowMemory
107:57 ConversationSummaryMemory
117:33 ConversationSummaryBufferMemory
129:29 CH6 LangChain Agents Intro
136:34 Creating an Agent
140:56 Agent Executor
147:30 Web Search Agent
150:41 CH7 Agent Deep Dive
160:08 Creating an Agent with LCEL
176:40 Building a Custom Agent Executor
185:19 CH8 LCEL
189:14 LCEL Pipe Operator
193:28 LangChain RunnableLambda
198:00 LangChain Runnable Parallel and Passthrough
203:13 CH9 Streaming
209:22 Basic LangChain Streaming
213:29 Streaming with Agents
231:26 Custom Agent and Streaming
240:46 CH10 Capstone
245:25 API Build
252:14 API Token Generator
256:44 Agent Executor in API
274:50 Async SerpAPI Tool
280:53 Running the App
284:49 Course Completion!

Transcript

Welcome to the AI engineer's guide to LangChain. This is a full course that will take you from the assumption that you know nothing about LangChain to being able to proficiently use the framework, whether within LangChain, within LangGraph, or even elsewhere, using the fundamentals that you will learn in this course.

Now, this course will be broken up into multiple chapters. We're going to start by talking a little bit about what LangChain is, when we should really be using it, and when maybe we don't want to use it. We'll talk about the pros and cons, and also about the wider LangChain ecosystem, not just the LangChain framework itself.

From there, we'll introduce LangChain, and we'll just have a look at a few examples before diving into essentially the basics of the framework. Now, I will just note that all of this is for LangChain 0.3, which is the latest current version. That being said, we will cover a little bit of where LangChain comes from as well.

So we'll be looking at pre 0.3 version methods for doing things, so that we can understand, okay, that's the old way of doing things, how do we do it now, now that we're in version 0.3? And also, how do we dive a little deeper into those methods as well and kind of customize those.

From there, we'll be diving into what I believe is the future of AI. I mean, it's the now and the short term, potentially even further into the future. And that is agents. We'll be spending a lot of time on agents. So we'll be starting with a simple introduction to agents.

So that is: how can we build an agent that is simple? What are the main components of agents? What do they look like? And then we'll be diving much deeper into them, and we'll be building out our own agent executor, which is kind of like a framework around the AI components of an agent; we're building our own.

And once we've done our deep dive on agents, we'll be diving into LangChain Expression Language, which we'll be using throughout this course. LangChain Expression Language is the recommended way of using LangChain, and the expression language, or LCEL, takes kind of a break from standard Python syntax.

So there's a bit of weirdness in there. And yes, we'll be using it throughout the course, but we're leaving the LCEL chapter until later on in the course, because that is where we really want to dive into the fundamentals of LCEL. The idea is that by that point, you already have a good grasp of at least the basics of LCEL before we really dig in. After that, we'll be digging into streaming, which is an essential UX feature of AI applications in general; streaming can just improve the user experience massively.

And it's not just about streaming tokens, you know, that interface where, word by word, the AI is generating text on the screen. Streaming is more than just that: it is also the ability, if you've seen the interface of Perplexity, where as the agent is thinking, you're getting an update of what the agent is thinking about, what tools it is using, and how it is using those tools.

That's also another essential feature, and we need a good understanding of streaming to build it. So we'll be taking a look at all of that as well. Then, finally, we'll be topping it off with a capstone project, where we will be building our own AI agent application that incorporates all of these features. We're going to have an agent that can use tools and web search, we'll be using streaming, and we'll see all of this in a nice interface that we can work with.

So that, as an overview of the course, is of course very high level; there's a ton of stuff in here. And truly, this course can take you from wherever you are with LangChain at the moment, whether you're a beginner, you've used it a bit, or you're even intermediate; you're probably going to learn a fair bit from it.

So without any further ado, let's dive in to the first chapter. Okay, so in the first chapter of the course, we're going to focus on when we should actually use LangChain, and when we should use something else. Now, through this chapter, we're not really going to focus too much on the code.

Well, you know, every other chapter is very code focused, but this one is a little more theoretical. What is LangChain? Where does it fit in? When should I use it? When should I not? So I want to just start by framing this: LangChain is one of, if not the most popular open source framework within the Python ecosystem, at least for AI.

It works pretty well for a lot of things, and it also works terribly for a lot of things, to be completely honest. There are massive pros and massive cons to using LangChain. Here, we're just going to discuss a few of those and see how LangChain compares a little bit against other frameworks.

So the very first question we should be asking ourselves is: do we even need a framework? Is a framework actually needed when we can just hit an API? You have the OpenAI API, other APIs, Mistral, and so on, and we can get a response from an LLM in around five lines of code; for those, it's incredibly simple.
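To make that concrete, here is roughly what a direct API call looks like with the official OpenAI Python SDK; the model name and prompt are just placeholders, and this is a sketch rather than code from the course:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model name works here
    messages=[{"role": "user", "content": "Hello, world!"}],
)
print(response.choices[0].message.content)
```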

However, that can change very quickly. When we start talking about agents, or retrieval-augmented generation, research assistants, all this sort of stuff, those use cases and methods can suddenly get quite complicated when we're outside of frameworks. And that's not necessarily a bad thing, right? It can be incredibly useful to be able to just understand everything that is going on and build it yourself.

But the problem is that to do that, you need time; you need to learn all the intricacies of building these things, and the intricacies of these methods themselves, like how do they even work? And that kind of runs in the opposite direction of what we see with AI at the moment, which is that AI is being integrated into the world at an incredibly fast rate.

And because of this, most engineers coming into the space are not from a machine learning or AI background; most people don't necessarily have any experience with this. A lot of the engineers coming in could be DevOps engineers, generic backend Python engineers, even front end engineers, coming in and building all these things, which is great, but they don't necessarily have the experience, and that might be you as well.

And that's not a bad thing, because the idea is that obviously you're going to learn and you're going to pick up a lot of these things. And in this scenario, there's quite a good argument for using a framework, because a framework means that you can get started faster. And a framework like LangChain abstracts away a lot of stuff.

And that's a big complaint that a lot of people will have with LangChain. But that abstracting away of many things is also what made LangChain popular, because it means that you can come in not really knowing what, for example, RAG is, and you can implement a RAG pipeline and get the benefits of it without really needing to understand it.

And yes, there's an argument against that as well, just implementing something without really understanding it. But as we'll see throughout the course, it is possible to work with LangChain in a way, as we will in this course, where you implement these things in an abstract way, and then break them apart and start understanding the intricacies at least a little bit.

So that can actually be pretty good. However, again, circling back to what we said at the start, if your application is just very simple, you know, you need to generate some text based on some basic input, maybe you should just use an API; that's completely valid as well.

Now, we just said, okay, a lot of people coming to LangChain might not be from an AI background. So another question for a lot of these engineers might be: okay, if I want to learn about RAG, agents, all these things, should I skip LangChain and just try and build it from scratch myself?

Well, LangChain can help a lot with that learning journey. So you can start very abstract, and as you gradually begin to understand the framework better, you can strip away more and more of those abstractions and get more into the details. And in my opinion, this gradual shift towards more explicit code, with less abstraction, is a really nice feature.

And it's also what we focus on, right? Throughout this course, that's what we're going to be doing. We're going to start abstract, strip away the abstractions, and get more explicit with what we're building. So for example, for building an agent in LangChain, there's this very simple and incredibly abstract create tools agent method that we can use.

And it just creates a tool agent for you; it doesn't tell you anything. So you can use that, right, and we will use that initially in the course. But then you can actually go from that to defining your full agent execution logic, which is basically a tool call to OpenAI: you're going to be getting that tool information back, but then you've got to figure out, okay, how am I going to execute that?

How am I going to store this information? And then how am I going to iterate through this? So we're going to be seeing that stripping away of abstractions as we work through, as we build agents, as we build our streaming use case, among many other things; even with chat memory, we'll see it there as well.

So LangChain can act as the on-ramp to your AI learning experience. Then what you might find, and I do think this is quite true for most people, depends on whether you're really serious about AI engineering, whether that's what you want to do, like that's your focus. That isn't for everyone, for certain; a lot of people just want to understand a bit of AI, and they want to continue doing what they're doing and just integrate AI here and there.

And maybe, you know, if that's your focus, you might stick with LangChain; there's not necessarily a reason to move on. But there's the other scenario, where you're thinking, okay, I want to get really good at this, I want to just learn as much as I can, and I'm going to dedicate basically the short term future of my career to becoming an AI engineer.

Then LangChain might be the on-ramp, it might be your initial learning curve. But then, after you've become competent with LangChain, you might actually find that you want to move on to other frameworks. And that doesn't necessarily mean that you're going to have wasted your time with LangChain. Because, one, LangChain is the thing helping you learn.

And two, one of the main frameworks that I recommend a lot of people move on to is actually LangGraph, which is still within the LangChain ecosystem, and it still uses a lot of LangChain objects and methods, and, of course, concepts as well. So even if you do move on from LangChain, you may move on to something like LangGraph, which you need to know LangChain for anyway.

And let's say you do move on to another framework instead. In that scenario, the concepts that you learn from LangChain are still pretty important. So to just finish up this chapter, I want to summarize on that question of whether you should be using LangChain. What's important to remember is that LangChain does abstract a lot.

Now, this abstraction of LangChain is both a strength and a weakness. With more experience, those abstractions can feel like a limitation. And that is why we sort of go with the idea that LangChain is really good to get started with, but as the project grows in complexity, or the engineers get more experience, they might move on to something like LangGraph, which, in any case, is going to be using LangChain to some degree.

So in either one of those scenarios, LangChain is going to be a core tool in an AI engineer's toolkit. So it's worth learning, in our opinion. But of course, it comes with its weaknesses, and it's just good to be aware that it's not a perfect framework.

But for the most part, you will learn a lot from it, and you will be able to build a lot with it. So with all of that, we'll move on to our first sort of hands-on chapter with LangChain, where we'll just introduce LangChain and some of the essential concepts. We're not going to dive too much into the syntax, but we're still going to understand a little bit of what we can do with it.

Okay, so moving on to our next chapter, getting started with LangChain. In this chapter, we're going to be introducing LangChain by building a simple LLM-powered assistant that will do various things for us. It will be multimodal, generating some text, generating images, generating some structured outputs; it will do a few things.

Now to get started, we will go over to the course repo, all of the code, all the chapters are in here, there are two ways of running this, either locally or in Google Colab, we would recommend running in Google Colab, because it's just a lot simpler with environments. But you can also run it locally.

And actually, for the capstone, we will be running it locally; there's no way of us doing that in Colab. So if you would like to run everything locally, I'll show you how quickly now. If you would like to run in Colab, which I would recommend at least for the first notebook chapters, just skip ahead; there will be chapter points in the timeline of the video.

So for running it locally, we just come down to here. This actually tells you everything that you need. You will need to install uv; this is the Python package and project manager that we recommend. You don't need to use uv, it's up to you.

uv is very simple, and it works really well, so I would recommend it. So you would install it with this command here. This is on Mac, so it will be different if you are on Windows or otherwise; you can look at the installation guide there and it will tell you what to do.

And so before we actually do this, what I will do is go ahead and just clone this repo. So we'll come into here; I'm going to create like a temp directory for me, because I already have the langchain-course in there. And what I'm going to do is just git clone the langchain-course repo.

Okay, so you will also need to install git if you don't have that. Okay, so we have that; then what we'll do is copy this. So this will install Python 3.12.7 for us with this command, and then this will create a new venv within that, using the Python 3.12.7 that we've installed.

And then uv sync will actually look at the pyproject.toml file, which is like the package specification for the repo, and use that to install everything that we need. Now, we should actually make sure that we are within the langchain-course directory, and then, yes, we can run those three.

There we go. So everything should install with that. Now, if you are in Cursor, you can just do cursor dot, or we can run code dot if in VS Code; I'll just be running this. And then I've opened up the course. Now, within that course, you have your notebooks, and you just run through these, making sure you select your kernel and Python environment, and making sure you're using the correct venv from here.

So that should pop up already as this .venv bin python; you'll click that and then you can run through. When you are running locally, don't run these; you don't need to, you've already installed everything. These are specifically for Colab. So that is running things locally.

Now let's have a look at running things in Colab. So for running everything in Colab, we have our notebooks in here, we click through, and then we have each of the chapters through here. So starting with the first chapter, the introduction, which is where we are now. So what you can do to open this in Colab is either just click this Colab button here.

Or if you really want to, for example, maybe this is not loading for you, what you can do is you can copy the URL at the top here, you can go over to Colab, you can go to open GitHub, and then just paste that in there and press enter.

And there we go, we have our notebook. Okay, so we're in. Now, what we will do first is just install the prerequisites. So we have a few LangChain packages here: langchain-core, langchain-openai because we're using OpenAI, and langchain-community, which is needed for running what we're running.

Okay, so that has installed everything for us. So we can move on to our first step, which is initializing our LLM. We're going to be using GPT-4o mini, which is a slightly smaller but faster and also cheaper model, and still very good, from OpenAI. So what we need to do here is get an API key.

Okay, so for getting the API key, we're going to go to OpenAI's website. You can see here that we're opening platform.openai.com, and then we're going to go into settings, organization, API keys. So you can copy that or just click it from here. Okay, so I'm going to go ahead and create a new secret key; actually, just in case you're looking for where this is.

It's settings, organization, API keys again, okay; create a new API key. I'm going to call it langchain course. I'll just put it under Semantic Router, that's just my organization; you put it wherever you want it to be. And then you would copy your API key. You can see mine here; I'm obviously going to revoke that before you see this, but you can try and use it if you really like.

So I'm going to copy that, and I'm going to place it into this little box here. You could also just put your full API key in here, it's up to you, but this little box just makes things easier. Now, what we've basically done there is just pass in our API key, and we're setting our OpenAI model to GPT-4o mini.

And what we're going to be doing now is essentially just connecting and setting up our LLM parameters with LangChain. So we run that, we say okay, we're using GPT-4o mini, and we're also setting ourselves up to use two different LLMs here, or two of the same LLM with slightly different settings.
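As a rough sketch (assuming the OPENAI_API_KEY environment variable is already set, and paraphrasing the notebook rather than copying it), initializing those two LLMs looks something like this:

```python
from langchain_openai import ChatOpenAI

# the same model twice, differing only in temperature
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)           # deterministic-leaning outputs
creative_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.9)  # more varied, "creative" outputs
```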

So the first of those is an LLM with a temperature setting of zero. The temperature setting basically controls the randomness of the output of your LLM. And the way that it works is, when an LLM is predicting the next token, or next word, in a sequence, it'll actually provide a probability for all of the tokens within the LLM's knowledge base, i.e. what the LLM has been trained on.

So what we do when we set a temperature of zero is we say: you are going to give us the token with the highest probability according to you, okay. Whereas when we set a temperature of 0.9, what we're saying is, okay, there's actually an increased probability of you giving us a token that is not the token with the highest probability according to the LLM.

But what that tends to do is give us more sort of creative outputs. So that's what the temperature does. So we are creating a normal LLM and then a more creative LLM with this. So what are we going to be building? We're going to be taking a draft article from the Aurelio learning page, and we're going to be using LangChain to generate various things that we might find helpful as well.

You know, we have this article draft, we're editing it and kind of finalizing it. So what are those things going to be? You can see them here. We have the title for the article, then a description, an SEO friendly description specifically. For the third one, we're going to be getting the LLM to provide us advice on an existing paragraph and essentially write a new paragraph for us from that existing paragraph.

And what it's going to do, this is the structured output part, is write a new version of that paragraph for us, and it's going to give us advice on where we can improve our writing. Then we're going to generate a thumbnail hero image for our article, so a nice image that you would put at the top.

So here, we're just going to input our article, you can put something else in here if you like. Essentially, this is just a big article that's written a little while back on agents. And now we can go ahead and start preparing our prompts, which are essentially the instructions for our LLM.

So LangChain comes with a lot of different utilities for prompts, and we're going to dive into them in a lot more detail. But I do want to just give you the essentials now, just so you can understand what we're looking at, at least conceptually. So prompts for chat agents are, at a minimum, broken up into three components.

Those are the system prompt, this provides instructions to our LLM on how it should behave, what its objective is, and how it should go about achieving that objective. Generally, system prompts are going to be a bit longer than what we have here, depending on the use case, then we have our user prompts.

So these are user written messages. Usually, sometimes we might want to pre populate those if we want to encourage a particular type of conversational patterns from our agent. But for the most part, yes, these are going to be user generated. Then we have our AI prompts. So these are, of course, AI generated.

And again, in some cases, we might want to generate those ourselves beforehand or within a conversation if we have a particular reason for doing so. But for the most part, you can assume that these are actually user and AI generated. Now, LangChain provides us with templates for each one of these prompt types.

Let's go ahead and have a look at what these look like within LangChain. So to begin, we are looking at this one. We have our system message prompt template and human message prompt template, the user message that we saw before. So we have these two. The system prompt, keeping it quite simple here, is "you are an AI system that helps generate article titles", right.

So the first component we want to generate is the article title, so we're telling the AI that's what we want it to do. And then here, we're actually providing kind of like a template for the user input. So yes, as I mentioned, user input can be fully generated by the user, it might be only partially user generated, it might be setting up a conversation beforehand which a user would later use, or, in this scenario, we're actually creating a template, and what the user provides us will actually just be inserted here inside article.

And that's why we have these input variables. So what this is going to do is, okay, we have all of these instructions around here, they're all going to be provided to OpenAI as if it is the user saying this, but it will actually just be this part here that the user will be providing, okay.

And we might want to also format this a little nicer; it kind of depends, this will work as it is. But we can also put something like this to make it a little bit clearer to the LLM: okay, what is the article, and where does it sit in the prompt? So we have that. You can see in this scenario there's not that much difference in what the system prompt and user prompt are doing.

And this is, it's a particular scenario, it varies when you get into the more conversational stuff, as we will do later, you'll see that the user prompt is generally more fully user generated, or mostly user generated. And much of these types of instructions, we might actually be putting into the system prompt, it varies.

And we'll see throughout the course, many different ways of using these different types of prompts in various different places. Then you'll see here, so I just want to show you how this is working, we can use this format method on our user prompt here to actually insert something within the article input here.

So we're going to go use prompt format, and then we pass in something for article. Okay. And we can also maybe format this a little nicer, but I'll just show you this for now. So we have our human message. And then inside content, this is the text that we had, right, you can see that we have all this, right.

And this is what we wrote before we wrote all this, except from this part, we didn't write this, instead of this, we had article, right. So let's format this a little nicer so that we can see. Okay, so this is exactly what we wrote up here, exactly the same, except from now we have test string instead of article.

So later, when we insert our article, it's going to go inside there. It's like an f-string in Python, okay. And this is, again, one of the things where people might complain about LangChain; this sort of thing can seem excessive, because you could just do this with an f-string.

But there are, as we'll see later, particularly when you're streaming, just really helpful features that come with using LangChain's built-in prompt templates, or at least the message objects that we will see. So, you know, we need to keep that in mind. Again, as things get more complicated, LangChain can be a bit more useful.
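Pulling those pieces together, a minimal sketch of the two templates and the format call might look like this; the exact template wording here is paraphrased from the video rather than copied from the notebook:

```python
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# system prompt: sets up how the assistant should behave
system_prompt = SystemMessagePromptTemplate.from_template(
    "You are an AI assistant that helps generate article titles."
)

# user prompt: everything around {article} is fixed; only the article text is user-provided
user_prompt = HumanMessagePromptTemplate.from_template(
    "You are tasked with creating a name for an article.\n"
    "The article is here for you to examine:\n\n---\n\n{article}\n\n---\n\n"
    "Only output the article name, nothing else."
)

# format fills in the {article} placeholder, much like an f-string
print(user_prompt.format(article="TEST STRING"))
```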

So, chat prompt template, this is basically just going to take what we have here, our system prompt, user prompts, we could also include some AI prompts in there. And what it's going to do is merge both of those. And then when we do format, what it's going to do is put both of those together into a chat history.

Okay, so let's see what that looks like. First, in a more messy way. Okay, so you can see we have just the content, right? So it doesn't include the whole, you know, before we had human message, we're not include, we're not seeing anything like that here. Instead, we're just seeing the string.

So now let's switch back to print. And we can see that what we have is our system message here, it's just prefixed with this system. And then we have human, and it's prefixed by human, and then it continues, right? So that's, that's all it's doing is just kind of merging those in some sort of chat log, we could also put in like AI messages, and they would appear in there as well.

Okay, so we have that. Now, that is our prompt template. Let's put that together with an LLM to create what, in the past, LangChain would have called an LLM chain. Now, we wouldn't necessarily call it an LLM chain, because we're not using the LLMChain abstraction; it's not super important if that doesn't make sense, we'll go into it in more detail later, particularly in the LCEL chapter.

So what will this chain do? Think of LangChain as just chains; we're chaining together these multiple components. It will perform the steps of prompt formatting, so that's what I just showed you, then LLM generation, so sending our prompt to OpenAI, getting a response, and getting that output. You can also add another step here if you want to format that in a particular way; we're going to be outputting it in a particular format so that we can feed it into the next step more easily.

But there are also things called output parsers, which parse your output in a more dynamic or complicated way, depending on what you're doing. So this is our first look at LCEL. I don't want us to focus too much on the syntax here, because we will be doing that later.

But I do want you to just understand what is actually happening here. And logically, what are we writing? So all we really need to know right now is we define our inputs with the first dictionary segment here. Alright, so this is a, you know, our inputs, which we have defined already, okay.

So if we come up to our user prompt here, we said input variable is our article, right. And we might have also added input variables to the system prompt here as well. In that case, you know, let's say we had your AI assistant called name, right, that helps generate article titles.

In this scenario, we might have input variables, name here, right. And then what we would have to do down here is we would also have to pass that in, right. So it also we would have article, we would also have name. So basically, we just need to make sure that in here, we're including the variables that we have defined as input variables for our, our first prompts.

Okay, so we can actually go ahead and let's add that. So we can see it in action. So run this again, and just include that or reinitialize our first prompt. So we see that. And if we just have a look at what that means for this format function here, it means we'll also need to pass in a name, okay, and call it Joe.

Okay, so Joe, the AI, right; you're an AI assistant called Joe now. So we have Joe, our AI, and that is going to be fed in through these input variables. Then we have this pipe operator. The pipe operator is basically saying: whatever is to the left of the pipe operator, which in this case would be this, is going to go into whatever is on the right of the pipe operator.

It's that simple. Again, we'll dive into this and kind of break it apart in the LCEL chapter, but for now, that's all we need to know. So this is going to go into our first prompt; that is going to format everything, it's going to add the name and the article that we've provided into our first prompt.

And it's going to output that, right. Then we have our pipe operator here, so the output of this is going to go into the input of our next step, our creative LLM. That is then going to generate some tokens, it's going to generate our output, and that output is going to be an AI message.

And as you saw before, if I take this bit out, within those message objects we have this content field, okay. So we are actually going to extract the content field out from our AI message to just get the content. And that is what we do here. So we get the AI message out from our LLM.

And then we're extracting the content from that AI message object, and we're going to pass it into a dictionary that just contains article title, like so. Okay, we don't need to do that, we could just get the AI message directly; I just want to show you how we are using this sort of chain in LCEL.

So once we have set up our chain, we then call it or execute it using the invoke method. Into that we will need to pass in those variables. So we have our article already, but we also gave our AI name now. So let's add that. And we'll run this.
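Here is a minimal sketch of that chain and the invoke call, assuming the first_prompt and creative_llm objects from the earlier snippets, and assuming the system template was updated to include a {name} variable as described above:

```python
chain_one = (
    {
        "article": lambda x: x["article"],  # map the input dict onto the prompt's variables
        "name": lambda x: x["name"],
    }
    | first_prompt       # format the chat prompt
    | creative_llm       # generate with the temperature=0.9 LLM
    | (lambda ai_msg: {"article_title": ai_msg.content})  # pull the text out of the AIMessage
)

# `article` is the draft article text loaded earlier in the notebook
article_title = chain_one.invoke({"article": article, "name": "Joe"})
```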

Okay, so Joe has generated us an article title: unlocking the future, the rise of neuro-symbolic AI agents. Cool, a much better name than what I gave the article, which was AI agents are neuro-symbolic systems. I don't think I did too bad. Okay, so we have that. Now, let's continue.

And what we're going to be doing is building more of these types of LLM chain pipelines, where we're feeding in some prompts, generating something, getting something out and doing something with it. So as mentioned, we have the title; we're now moving on to the description. So I want to generate a description.

So we have our human message prompt template. This is actually going to follow a similar format as before. We probably also want to redefine this, because I think I'm using the same system message there. So let's go ahead and modify that. Or what we could also do is just remove the name now, because I've shown you that.

So what we could do is you're an AI system that helps build good articles, right, build good articles. And we could just use this as our, you know, generic system prompt now. So let's say that's our new system prompt. Now we have our user prompt, you're tasked with creating a description for the article, the articles here for you to examine article, here is the article title.

Okay, so we need the article title now as well, and our input variables. Now we're going to output an SEO friendly article description. And we're just saying, just to be certain here, do not output anything other than the description. So you know, sometimes an LLM might say, Hey, look, this is what I generated for you.

The reason I think this is good is because so on and so on and so on. Right? If you're programmatically taking some output from an LLM, you don't want all of that fluff around what the LLM has generated; you just want exactly what you've asked it for. Okay, because otherwise you need to parse that out with code, and it can get messy, and it's also just far less reliable.

So we're just saying do not output anything else. Then we're putting all of these together, so the system prompt and the second user prompt, this one here, putting those together into a new chat prompt template. And then we're going to feed all of that into another LCEL chain, as we have here, to generate our description.

So let's go ahead; we invoke that as before, just making sure we add in the article title that we got from before. And let's see what we get. Okay, so we have this "explore the transformative potential of neuro-symbolic AI agents", and it's a little bit long, to be honest.

But yeah, you can see what it's doing here, right. And of course, we could then go in; we see this is kind of too long for an SEO friendly description, not really what we want. So we can modify this: output the SEO friendly description, and, let me put it on a new line, make sure we don't exceed, say, 200 characters, or maybe it's even less for SEO, I don't have a clue.

I would just say 120 characters; do not output anything other than the description, right. So we could just go back, modify our prompting, and see what that generates again. Okay, so it's much shorter, probably too short now, but that's fine. Cool. So we have that, we have our summary, and it's now in this dictionary format that we have here.

Cool. Now the third step, we want to consume that first article variable with our full article. And we're going to generate a few different output fields. So for this, we're going to be using the structured output feature. So let's scroll down, we'll see what that is, what that looks like.

So structured output is essentially where we're forcing the LLM to output a dictionary with these particular fields, okay. And we can modify this quite a bit, but in this scenario, what I want to do is I want there to be an original paragraph, right, so I just want it to regenerate the original paragraph, because I'm lazy and I don't want to extract it out. Then I want to get the new edited paragraph, this is the LLM-generated improved paragraph, and then we want to get some feedback, because we don't want to just automate ourselves, we want to augment ourselves and get better with AI, rather than just having it do everything for us.

So that's what we do here. And you can see that here we're using this Pydantic object. What Pydantic allows us to do is define these particular fields, and it also allows us to assign these descriptions to a field, and LangChain is actually going to go ahead and read all of this, right; it even reads the types.

So for example, we could put integer here, and we could actually get a numeric score for our paragraph, right; we can try that. So let's just try that quickly, I'll show you. So, a numeric score. In fact, let's just ignore that; let's not put anything else there.

So I'm going to put "constructive feedback on the original paragraph" just in here. So let's see what happens. Okay, so we have that. And what I'm going to do is take our creative LLM and use this with structured output method. That's actually going to modify that LLM class, creating a new LLM class that forces the LLM to use this structure for the output, right, passing in Paragraph in here.

Using this, we're creating this new structured LLM. So let's run that and see what happens. Okay, so we're going to modify our chain accordingly; maybe what I can do is also just remove this bit for now, so we can just see what the structured LLM outputs directly. And let's see.

Okay, so now you can see that we actually have that paragraph object, right, the one we defined up here, which is kind of cool. And then in there, we have the original paragraph, right. So this is where this is coming from. I definitely remember writing something that looks a lot like that.

So I think that is correct. We have the edited paragraph, so this is what it thinks is better. And then, interestingly, the feedback is three, which is weird, right? Because here we said constructive feedback on the original paragraph. But when we use this with structured output, what LangChain is doing is essentially performing a tool call to OpenAI.

And what a tool call can do is force a particular structure in the output of an LLM. So when we say feedback has to be an integer, no matter what we put here, it's going to give us an integer. How do you provide constructive feedback with an integer? It doesn't really make sense.

But because we've set that limitation, that restriction here, that is what it does. It just gives us the numeric value. So I'm going to shift that to string. And then let's rerun this, see what we get. Okay, we should now see that we actually do get constructive feedback. Alright, so yeah, you can see it's quite, quite long.

So the original paragraph effectively communicates limitations with neural AI systems in performing certain tasks. However, it could benefit from slightly improved clarity and conciseness. For example, the phrase was becoming clear can be made more direct by changing it to became evident. Yeah, true. Thank you very much. So yeah, now we actually get that that feedback, which is pretty nice.
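For reference, the final structured output setup (with feedback switched back to a string) looks roughly like this; the field names and descriptions below are paraphrased rather than the exact ones from the notebook:

```python
from pydantic import BaseModel, Field

class Paragraph(BaseModel):
    original_paragraph: str = Field(description="The original paragraph")
    edited_paragraph: str = Field(description="The improved, edited paragraph")
    feedback: str = Field(description="Constructive feedback on the original paragraph")

# returns a new runnable that forces the LLM's output into the Paragraph structure
structured_llm = creative_llm.with_structured_output(Paragraph)
```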

Now let's add in this final step to our chain. Okay, and it's just going to pull out our paragraph object here and extract into a dictionary, we don't necessarily need to do this. Honestly, I actually kind of prefer it within this paragraph object. But just so we can see how we would pass things on the other side of the chain.

Okay, so now we can see we've extracted that out. Cool. So we have all of that interesting feedback again. But let's leave it there for the text part of this. Now let's have a look at the sort of multimodal features that we can work with. So this is, you know, maybe one of those things that's kind of seems a bit more abstracted, a little bit complicated, where it maybe could be improved.

But you know, we're not going to really be focusing too much on the multimodal stuff, we'll still be focusing on language, but I did want to just show you very quickly. So we want this article to look better, okay. We want to generate a prompt based on the article itself that we can then pass to DALL-E, the image generation model from OpenAI, which will then generate an image, like a thumbnail image, for us.

Okay. So the first step of that is we're actually going to get an LLM to generate that. Alright, so we have our prompt that we're going to use for that. So I'm gonna say generate a prompt with less than 500 characters to generate an image based on the following article.

Okay, so that's our prompt. Yeah, super simple. We're using the generic prompt template here; you can use that or you can use the user prompt template, it's up to you, this is just the generic prompt template. Then what we're going to do, based on what this outputs, is feed that into this generate and display image function via the image prompt parameter. That is going to use the DALL-E API wrapper from LangChain, it's going to run that image prompt, and we're going to get a URL out from that, essentially.

And then we're going to read that using scikit-image here, right; it's going to read that image URL, get the image data, and then we're just going to display it. Okay, so pretty straightforward. Now, again, there is an LCEL thing here that we're doing: we have this RunnableLambda thing. When we're running custom functions within LCEL, we need to wrap them within this RunnableLambda. I don't want to go too much into what this is doing here, because we do cover it in the LCEL chapter.

But all you really need to know is: we have a custom function, we wrap it in RunnableLambda, and then what we get from that we can use within the LCEL syntax here. So what are we doing here? Let's figure it out. We are taking our original image prompt that we defined just up here, right, and the input variable to that is article.

Okay, we have our article data being input here, feeding that into our prompt. From there, we get our message, which we then feed into our LLM. From the LLM, it's going to generate us an image prompt, like a prompt for generating our image for this article. We can even print that out, so that we can see what it generates, because I'm also kind of curious.
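Put together, the image pipeline looks roughly like the sketch below. The DallEAPIWrapper import path and behaviour here are from memory and may differ slightly from the notebook, so treat this as an illustration under those assumptions rather than the exact course code:

```python
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_community.utilities.dalle_image_generator import DallEAPIWrapper
from skimage import io as skio
import matplotlib.pyplot as plt

image_prompt = PromptTemplate.from_template(
    "Generate a prompt with less than 500 characters to generate an image "
    "based on the following article: {article}"
)

def generate_and_display_image(image_prompt_text: str) -> None:
    image_url = DallEAPIWrapper().run(image_prompt_text)  # DALL-E returns a URL to the image
    plt.imshow(skio.imread(image_url))                    # download and display it
    plt.axis("off")
    plt.show()

# custom functions must be wrapped in RunnableLambda to sit inside an LCEL chain
image_gen_chain = (
    {"article": lambda x: x["article"]}
    | image_prompt
    | llm
    | (lambda msg: msg.content)
    | RunnableLambda(generate_and_display_image)
)
```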

Okay, so we'll just run that. And then let's see, it will feed that content into our runnable, which is basically this function here, and we'll see what it generates. Okay, don't expect anything amazing from DALL-E, it's not the best, to be honest, but at least we see how to use it.

Okay, so we can see the prompt that was used here, create an image that visually represents the concept of neuro symbolic agents depict a futuristic interface where a large language model interacts with traditional code, symbolizing integration of, oh, my gosh, something computation include elements like a brain to represent neural networks, gears or circuits or symbolic logic, and a web of connections illustrating vast use cases of AI agents.

Oh, my gosh, look at all that. Big prompt. Then we get this. So you know, DALL-E is interesting, I would say. We could even take this and just see what it comes up with in something like Midjourney; you can see these way cooler images that we get from just another image generation model, far better, but pretty cool, honestly.

So in terms of generating images, the phrasing of the prompt itself is actually pretty good. The image, you know, could be better. But that's it, right. So with all of that, we've seen a little introduction to what we might be building with LangChain. So that's it for our introduction chapter.

As I mentioned, we don't want to go too much into what each of these things is doing; I just really want to focus on, okay, this is kind of how we're building something with LangChain, this is the overall flow. We don't really want to be focusing too much on, okay, what exactly LCEL is doing, or what exactly this prompt thing is that we're setting up; we're going to be focusing much more on all of those things in the upcoming chapters.

So for now, we've just seen a little bit of what we can build before diving in, in more detail. Okay, so now we're going to take a look at AI observability using LangSmith. Now, LangSmith is another piece of the broader LangChain ecosystem. Its focus is on allowing us to see what our LLMs, agents, etc., are actually doing.

And it's something that we would definitely recommend using if you are going to be using LangChain and LangGraph. Now let's take a look at how we would set LangSmith up, which is incredibly simple. So I'm going to open this in Colab, and I'm just going to install the prerequisites here.

You'll see these are all the same as before, but we now have the langsmith library here as well. Now, we are going to be using LangSmith throughout the course, so in all the following chapters we're going to be importing LangSmith, and it will be tracking everything we're doing. But you don't need LangSmith to go through the course; it's an optional dependency.

But as mentioned, I would recommend it. So we'll come down to here. And the first thing that we will need is the LangChain API key. Now, we do need an API key, but it does come with a reasonable free tier. So we can see here, they have each of the plans.

And this is the one that we are on by default. So it's free for one user, up to 5,000 traces per month. If you're building out an application, I think it's fairly easy to go beyond that, but it really depends on what you're building. So it's a good place to start.

And then of course, you can upgrade as required. So we would go to smith.langchain.com, and you can see here that this will log me in automatically. I have all of these tracing projects; these are all from me running the various chapters of the course. If you do use LangSmith throughout the course, your LangSmith dashboard will end up looking something like this.

Now, what we need is an API key. So we go over to settings, we have API keys, and we're just going to create an API key. Because we're just going through some personal learning right now, I would go with personal access token, we can give a name or description if you want.

Okay, and we'll just copy that. And then we come over to our notebook, and we enter our API key there. And that is all we actually need to do; that's absolutely everything. I suppose the one thing to be aware of is that you should set your LangChain project to whatever project you're working within.
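In practice, enabling LangSmith tracing is just a handful of environment variables; the project name below is only an example:

```python
import os
from getpass import getpass

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass("LangSmith API key: ")
os.environ["LANGCHAIN_PROJECT"] = "langchain-course-langsmith-openai"  # example project name
```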

So of course, within the course, we have individual project names for each chapter. But for your own projects, you should make sure this is something that you recognize and is useful to you. So LangSmith actually does a lot without us needing to do anything. So we can go through, let's just initialize our LLM and start invoking it, and see what LangSmith returns to us.

So we'll need our OpenAI API key; enter it here. And then let's just invoke "hello". Okay, so nothing has changed on this end, right? It was running code, there's nothing different here. However, now if we go to LangSmith, I'm going to go back to my dashboard. Okay, and you can see that the order of these projects just changed a little bit.

And that's because the most recently used project, this one at the top, the LangChain course LangSmith OpenAI project, which is the current chapter we're in, was just triggered. So I can go into here, and I can see, oh, look at this. So we actually have something in the LangSmith UI. And all we did was enter our LangChain API key.

That's all we did. And we set some environment variables. And that's it. So we can actually click through to this and it will give us more information. So you can see what was the input, what was the output, and some other metadata here. You see, you know, there's not that much in here.

However, when we do the same for agents, we'll get a lot more information. So I can even show you a quick example from the future chapters. If we come through to agents intro here, for example. And we just take a look at one of these. Okay, so we have this input and output, but then on the left here, we get all of this information.

And the reason we get all this information is because agents are performing multiple LLM calls, etc, etc. So there's a lot more going on. So you can see, okay, what was the first LLM call, and then we get these tool use traces, we get another LLM call, another tool use and another LLM call.

So you can see all this information, which is incredibly useful and incredibly easy to get, because all I did when setting this up in that agent chapter was simply set the API key and the environment variables, as we have done just now. So you get a lot out of very little effort with LangSmith, which is great.

So let's return to our LangSmith project here, and let's invoke some more. Now, I've already shown you that we're going to see a lot of things just by default, but we can also add other things that LangSmith wouldn't typically trace. So to do that, we will just import the traceable decorator from langsmith.

And then let's make these random functions traceable within LangSmith. Okay, so we run those; we have three here. We're going to generate a random number, we're going to modify how long a function takes and also generate a random number, and then in this one, we're going to either return "no error" or we're going to raise an error.

So we're going to see how LangSmith handles these different scenarios. So let's just iterate through and run those a few times; it's going to run each one of those 10 times. Okay, so let's see what happens. So they're running; let's go over to our LangSmith UI and see what is happening over here.

So we can see that everything is updating, we're adding that information through. And we can see if we go into a couple of these, we can see a little more information. So the input and the output took three seconds. See random error here. In this scenario, random error passed without any issues.

Let me just refresh the page quickly. Okay, so now we have the rest of the information. And we can see that occasionally, if there is an error from our random error function, it is signified with this. And we can see the traceback as well that was returned there, which is useful.

Okay, so we can see if an error has been raised, we have to see what that error is. We can see the various latencies of these functions. So you can see that varying throughout here. We see all the inputs to each one of our functions, and then of course the outputs.

So we can see a lot in there, which is pretty good. Now, another thing that we can do is we can actually filter. So if we come to here, we can add a filter. Let's filter for errors. That would be value error. And then we just get all of the cases where one of our functions has returned or raised an error or value error specifically.

Okay, so that's useful. And then yeah, there's various other filters that we can add there. So we could add a name, for example, if we wanted to look for the generate string delay function only, we could also do that. Okay, and then we can see the varying latencies of that function as well.

Cool. So we have that. Now, one final thing that we might want to do is make those function names a bit more descriptive or easier to search for, for example. And we can do that by setting the name in the traceable decorator, like so. So let's run that.
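A sketch of what those traced functions might look like; the function bodies and the "Chit Chat Maker" name here are illustrative rather than the exact notebook code:

```python
import random
import time
from langsmith import traceable

@traceable  # traced under the function's own name
def generate_random_number() -> int:
    return random.randint(0, 100)

@traceable(name="Chit Chat Maker")  # traced under a custom, more searchable name
def generate_string_delay(input_str: str) -> str:
    time.sleep(random.randint(1, 3))  # simulate variable latency
    return f"{input_str} - {random.randint(0, 100)}"

for _ in range(10):
    generate_random_number()
    generate_string_delay("hello")
```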

Run this a few times. And then let's jump over to LangSmith again and go into the LangSmith project. Okay, and you can see those coming through as well. So then we could also search for those based on that new name. What was it, Chit Chat Maker, like so. And then we can see all the information being streamed through to LangSmith.

So that is our introduction to LangSmith. There is really not all that much to go through here; it's very easy to set up. And as we've seen, it gives us a lot of observability into what we are building. We will be using this throughout the course, though we don't rely on it too much.

It's a completely optional dependency, so if you don't want to use LangSmith, you don't need to, but it's there and I would recommend doing so. So that's it for this chapter; we'll move on to the next one. Now we're going to move on to the chapter on prompts in LangChain.

Now, prompts, they seem like a simple concept, and they are a simple concept, but there's actually quite a lot to them when you start diving into them. And they truly have been a very fundamental part of what has propelled us forwards from pre LLM times to the current LLM times.

You have to think: until LLMs became widespread, the way to fine-tune an AI model or ML model back then was to get loads of data for your particular use case, and spend a load of time training your specific transformer, or part of the transformer, to essentially adapt it for that particular task.

That could take a long time. Depending on the task, it could take you months, or sometimes, if it was a simpler task, probably days, potentially weeks. Now, the interesting thing with LLMs is that rather than needing to go through this whole fine-tuning process to modify a model for one task over another task, we just prompt it differently; we literally tell the model, hey, I want you to do this in this particular way.

And that is a paradigm shift: what you're doing is so much faster, it's going to take you a couple of minutes rather than days, weeks, or months. And LLMs are incredibly powerful when it comes to just generalizing across these many different tasks. So prompts, which carry those instructions, are a fundamental part of that.

Now, LangChain naturally has many functionalities around prompts, and we can build very dynamic prompting pipelines that modify the structure and content of what we're actually feeding into our LLM depending on different variables, different inputs. And we'll see that in this chapter. So we're going to work through prompting within the scope of a RAG example.

So let's start by just dissecting the various parts of a prompt that we might expect to see for a use case like RAG. Our typical prompt for RAG, or retrieval-augmented generation, will include rules for the LLM. And this you will see in most prompts, if not all; this part of the prompt sets up the behavior of the LLM.

That is, how it should be responding to user queries, what sort of personality it should be taking on, what it should be focusing on when it is responding, and any particular rules or boundaries that we want to set. And really, what we're trying to do here is simply provide as much information as possible to the LLM about what we're doing; we just want to give the LLM context as to the place that it finds itself in.

Because an LLM has no idea where it is; it just takes in some information and spits out information. If the only information it receives is from the user's query, it doesn't know the context. What is the application that it's within?

What is its objective? What is its aim? What are the boundaries? All of this, we need to just assume the LLM has absolutely no idea about, because it truly does not. So provide as much context as you can, but it's important that we don't overdo it. We see this all the time: people will over-prompt an LLM. You want to be concise, you don't want fluff.

And in general, for every single part of your prompt, the more concise and less fluffy you can make it, the better. Now, those rules or instructions are typically in the system prompt of your LLM. The second part is context, which is RAG specific. The context refers to some sort of external information that you're feeding into your LLM.

We may have received this information from a web search, a database query, or, quite often in the case of RAG, a vector database. This external information that we provide is essentially the RA, the retrieval augmentation, of RAG. We are augmenting the knowledge of our LLM, which is contained within the LLM's model weights.

We're augmenting that knowledge with some external knowledge; that's what we're doing here. Now, for chat LLMs, this context is typically placed within a conversational context, within the user or assistant messages, and with more recent models, it can also be placed within tool messages as well. Then we have the question, which is pretty straightforward.

This is the query from the user. It's usually a user message, of course. There might be some additional formatting around this; you might add a little bit of extra context, or you might add some additional instructions. If you find that your LLM sometimes veers off the rules that you've set within the system prompt, you might append or prefix something here.

But for the most part, it's probably just going to be the user's input. And finally, so those are all the inputs for our prompt, we have the output that we get, the answer from the assistant. Again, that's not even specific to RAG; it's just what you would expect from a chat LLM or any LLM.

And of course, that would be an assistant message. So, putting all of that together in an actual prompt, you can see everything we have here. We have the rules for our prompt, the instructions; we're just saying, okay, answer the question based on the context below, and if you cannot answer the question using the information, answer with "I don't know".

Then we have some context here. In this scenario, because it's the first message, we might put that context into the system prompt. But that may also be turned around. If you, for example, have an agent, you might have your question up here before the context.

And then that would be coming from a user message, and this context would follow the question and be recognized as a tool message; it could be fed in that way as well. It depends on what sort of structure you're going for. But you can do either: you can feed it into the system message if it's less conversational, whereas if it's more conversational, you might feed it in as a tool message.

Okay, and then we have a user query, which is here. And then we'd have the AI answer, and obviously that would be generated here. So let's switch across to the code. We're in the LangChain course repo, notebooks, 03 prompts. I'm just going to open this in Colab.

Okay, scroll down, and we'll start just by installing the prerequisites. So we just have the various libraries. Again, as I mentioned before, LangSmith is optional, you don't need to install it. But if you would like to see your traces and everything in LangSmith, then I would recommend doing that.

And if you are using LangSmith, you will need to enter your API key here. Again, if you're not using LangSmith, you don't need to enter anything here, you just skip that cell. Okay, cool. And let's jump into the basic prompting then. So we're going to start with this prompt.

So, answer the user's query based on the context below. We're just structuring what we just saw in code. And we're going to be using the ChatPromptTemplate, because generally speaking, we're using chat LLMs in most cases nowadays. So we have our chat prompt template, and that is going to contain a list of messages, a system message to begin with, which is just going to contain this.

And we're feeding in the context within that there. And we have our user query here. Okay. So we'll run this. And if we take a look here, we haven't specified what our input variables are, okay. But we can see that we have query. And we have context up here, right?

So we can see that, okay, these are the input variables, we just haven't explicitly defined them here. So let's just confirm with this that LangChain did pick those up. And we can see that it did. So it has context and query as our input variables for the prompt template that we just defined.

Okay, so we can also see the structure of our templates. Let's have a look. Okay, so we can see that within messages here, we have a system message prompt template, the way that we define this, you can see here that we have from messages and this will consume various different structures.

So you can see here that from_messages takes a sequence of message-like representations. So we could pass in a system prompt template object and then a user prompt template object, or we can just use a tuple like this. And this actually defines, okay, this is the system, this is a user, and you could also do assistant or tool messages and so on here as well, using the same structure.

And then we can look in here, and of course that is being translated into the system message prompt template and human message prompt template. We have our input variables in there, and we have the template too. Okay. Now, let's continue. We'll see here what I just said, so we're importing our SystemMessagePromptTemplate and HumanMessagePromptTemplate.

And you can see we're using the same from_messages method here, right? And you can see, so a sequence of message-like representations, what that actually means can vary, right? So here we have SystemMessagePromptTemplate.from_template for the prompt, and from_template for the query; there are various ways that you might want to do this, it just depends on how explicit you want to be.

Generally speaking, I think, for myself, I would prefer that we stick with the objects themselves and be explicit. But it is definitely a little harder to parse when you're reading it. So I understand why you might also prefer this: it's definitely cleaner, and it does look simpler.

So it just depends, I suppose, on preference. Okay. So you see, again, this is exactly the same: our chat prompt template, and it contains this and this. You probably want to see the exact output, so here are the messages, exactly the same as what I put before.
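As a rough sketch of the two equivalent approaches being described here (the template wording and variable names are illustrative, not the exact notebook code):

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

# Illustrative RAG-style system prompt.
system_text = (
    "Answer the user's query based on the context below. If you cannot answer "
    "the question using the provided information, answer with \"I don't know\".\n\n"
    "Context: {context}"
)

# Form 1: plain (role, template) tuples.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", system_text),
    ("user", "{query}"),
])

# Form 2: explicit prompt template objects, equivalent but more verbose.
prompt_template_explicit = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_text),
    HumanMessagePromptTemplate.from_template("{query}"),
])

# LangChain infers the input variables from the {placeholders}.
print(prompt_template.input_variables)  # ['context', 'query']
```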

Cool. So we have all that. Let's see how we would invoke our LLM with these. We're going to be using gpt-4o-mini again; we do need our API key, so enter that. And we'll just initialize our LLM. We are going with a low temperature here, so less randomness, or less creativity.

And in many cases, this is actually what I would be doing. The reason in this scenario that we're going with low temperature is we're doing rag. And if you remember, before we scroll up a little bit here, our template says, answer the user's query based on the context below.

If you cannot answer the question using the provided information, answer with "I don't know", right. So just from reading that, we know that we want our LLM to be as truthful and accurate as possible. A more creative LLM is going to struggle with that and is more likely to hallucinate.

Whereas a low creativity or low temperature LLM will probably stick with the rules a little better. So again, it depends on your use case. You know, if you're creative writing, you might want to go with a higher temperature there. But for things like rag, where the information being output should be accurate, and truthful.

It's important, I think, that we keep temperature low. I talked about that a little bit here. So of course, a lower temperature of zero makes the LLM's output more deterministic, which in theory should lead to less hallucination. Okay, so we're going to go with LCEL again here. For those of you that have used LangChain in the past, this is equivalent to an LLMChain object.

So our prompt template is being fed into our LLM, and from that we have this pipeline. Now let's see how we would use that pipeline. So we're going to create some context here. This is some context around Aurelio AI, mentioning that we built semantic routers, semantic chunkers, an AI platform, and development services.

We mention, and I think we specifically ask about this later on in the example, the LangChain experts little piece of information. Now, most LLMs would not have been trained on the recent internet, and the fact that this came in September 2024 is relatively recent. So a lot of LLMs out of the box, you wouldn't expect them to know that.

So that is a good little bit of information to ask about. So we invoke: we have our query, "what do we do?", and we have that context. We're feeding that into the pipeline that we defined here. Alright, so when we invoke, that is automatically going to take query and context and actually feed them into our prompt template.
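Roughly, the LCEL pipeline and invocation look something like this (the context string is shortened and illustrative):

```python
from langchain_openai import ChatOpenAI

# Low temperature for a RAG-style task where we want factual answers.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

# The pipe operator chains the prompt template into the LLM (LCEL).
pipeline = prompt_template | llm

context = (
    "Aurelio AI is an AI company that builds open source frameworks such as "
    "Semantic Router and Semantic Chunkers, alongside an AI platform and "
    "development services."
)

# invoke maps the dict keys onto the prompt's input variables.
result = pipeline.invoke({"query": "What does Aurelio AI do?", "context": context})
print(result.content)
```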

Okay. If we want to, we can also be a little more explicit. So you'll probably see me doing this throughout the course, because I do like to be explicit with everything, to be honest. Okay, and this is doing the same thing; well, you'll see it in a moment.

This is doing the exact same thing. Again, this is just an LCEL thing. All I'm doing in this scenario is saying, okay, take the query key from the input dictionary, and then also take the context key from that input dictionary. So this is doing the exact same thing.

The reason that we might want to write this is mainly for clarity, to be honest, just to explicitly say, okay, these are the inputs, because otherwise we don't really have them in the code other than within our original prompts up here, which is not super clear. So I think it's usually a good idea to just be more explicit with these things.

And of course, if you decide you're going to modify things a little bit, let's say you modify this input down the line, you can still feed in the same input here; you're just mapping it between different keys, essentially. Or if you would like to modify that, say you need to lowercase it on the way in or something, you can do that.
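The more explicit version described here puts a dictionary of callables at the front of the chain, which LCEL coerces into a parallel runnable. A minimal sketch, assuming the same prompt_template and llm as above:

```python
# Explicitly map the incoming dict keys onto the prompt's input variables.
pipeline = (
    {
        "query": lambda x: x["query"],
        "context": lambda x: x["context"],
    }
    | prompt_template
    | llm
)
```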

So you have that. I'll just redefine that, actually, and we'll invoke again. Okay, we see that it does the exact same thing. So this is an AI message generated by the LLM: expertise in building AI agents, several open source frameworks, the router, the AI platform. Okay, right.

So it provides them. It has everything other than the LangChain experts thing; it didn't mention that, but we'll test it on that later. Okay, so on to few-shot prompting. This is a specific prompting technique. Now, many state-of-the-art LLMs are very good at instruction following.

So you'll find that few-shot prompting is less common now than it used to be, at least for these bigger, more state-of-the-art models. But when you start using smaller models, which is not really what we can use here, let's say an open-source model like Llama 3 or Llama 2, which is much smaller, you will probably need to consider things like few-shot prompting.

Although, that being said, with OpenAI models, at least the current OpenAI models, this is not so important. Nonetheless, it can be useful. So the idea behind few-shot prompting is that you are providing a few examples to your LLM of how it should behave before you are actually going into the main part of the conversation.

So let's see how that would look. So we create an example prompt. So we have our human and AI. So human input AI response. So we're basically setting up okay, this with this type of input, you should provide this type of output. That's what we're doing here. And we're just going to provide some examples.

Okay, so we have our input, here's query one, here's answer one, right? I just want to show you how it works; this is not what we'd actually feed into our LLM. Then, with both these examples and our example prompt, we feed both of these into LangChain's FewShotChatMessagePromptTemplate.

Okay. And, well, you'll see what we get out of it. It basically formats everything and structures everything for us. And when using this, of course, it depends: let's say you see that your user is talking about a particular topic, and you would like to guide your LLM to talk about that particular topic in a particular way.
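A minimal sketch of the few-shot structure being described, with placeholder examples purely to show the mechanics (not the exact examples from the notebook):

```python
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# The per-example structure: a human input followed by the ideal AI response.
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

# Placeholder examples; in practice these would demonstrate the desired behavior.
examples = [
    {"input": "Here is query 1", "output": "Here is answer 1"},
    {"input": "Here is query 2", "output": "Here is answer 2"},
]

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# The few-shot block slots in between the system prompt and the live user query.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "Answer the user's query based on the context below.\n\nContext: {context}"),
    few_shot_prompt,
    ("user", "{query}"),
])
```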

Right. So you could identify that the user is talking about that topic, either with a keyword match or a semantic similarity match. And based on that, you might want to modify these examples that you feed into your FewShotChatMessagePromptTemplate. And obviously, that could be what you do for topic A; for topic B, you might have another set of examples that you feed into this.

All this time, your example prompt remains the same, but you're just modifying the examples that are going in so that they're more relevant to whatever it is your user is actually talking about. So that can be useful. Let's see an example of that. So if we were using a tiny LLM, its ability would be limited, although I think we're probably fine here.

We're going to say: answer the user's query based on the context below. Always answer in markdown format, you know, being very specific; this is our system prompt. Okay, that's nice. But what we've kind of said here is, okay, always answer in markdown format, and when doing so, please provide headers, short summaries, follow with bullet points, then conclude.

Okay, so you see this here, so we get this overview of Aurelio AI, you have this, and this is actually quite good. But if we come down here, what I specifically want is to always follow this structure. Alright, so we have the double header for the topic, a summary, a header, a couple of bullet points.

And then I always want to follow this pattern where the "to conclude" is always bold. You know, I want to be very specific on what I want. And to be fully honest, with GPT-4o mini, you can actually just prompt most of this in. But for the sake of the example, we're going to provide a few-shot prompt with examples instead to get this.

So we're going to provide one example here. Second example here. And you'll see we're just following that same pattern, we're just setting up the pattern that the LM should use. So we're going to set that up here, we have our main header, a little summary, some sub headers, bullet points, sub header, bullet points, bullet points to conclude, so on and so on.

Same with this one here. Okay. And let's see what we got. So this is the structure of our new few-shot prompt template. You can see what all this looks like. Let's come down, and we're basically going to insert that directly into our chat prompt template.

So we have from_messages, the system prompt, the user prompt, and then we have these in there, so let me actually show you very quickly. Right, so we just have this FewShotChatMessagePromptTemplate, which will be fed into the middle here; run that, and then feed all this back into our pipeline.

Okay, and this will, you know, modify the structure so that we have that bold "to conclude" at the end here. You can see it nicely here. So we get a bit more of that exact structure that we wanted. Again, with GPT-4o models and many other OpenAI models, you don't really need to do this, but you will see it in other examples.

We do have an example of this where we're using a Llama model, and we're using, I think, Llama 2, if I'm not wrong. And you can see that adding this few-shot prompt template is actually a very good way of getting those smaller, less capable models to follow your instructions.

So, really, when you're working with smaller LLMs, this can be super useful, but even for SOTA models like GPT-4o, if you do find that you're struggling with the prompting, and it's just not quite following exactly what you want it to do, this is a very good technique for actually getting it to follow a very strict structure or behavior.

Okay, so moving on, we have chain of thought prompting. So this is a more common prompting technique that encourages the LLM to think through its reasoning or its thoughts step by step. So it's a chain of thought. The idea behind this is like, okay, in math class, when you're a kid, the teachers would always push you to put down your, your working out, right?

And there's multiple reasons for that. One of them is to get you to think because they know in a lot of cases, actually, you know, you're a kid and you're in a rush and you don't really care about this test. And the, you know, they're just trying to get you to slow down a little bit, and actually put down your reasoning.

And that kind of forced you to think, oh, actually, I'm skipping a little bit in my head, because I'm trying to just do everything up here. If I write it down, all of a sudden, it's like, Oh, actually, I'm, yeah, I need to actually do that slightly differently, you realize, okay, you're probably rushing a little bit.

Now, I'm not saying an LLM is rushing, but it's a similar effect by an LLM writing everything down, they tend to actually get things right more frequently. And at the same time, also similar to when you're a child and a teacher is reviewing your exam work by having the LLM write down its reasoning, you as a as a human or engineer, you can see where the LLM went wrong, if it did go wrong, which can be very useful when you're trying to diagnose problems.

So with chain of thought, we should see fewer hallucinations and generally better performance. Now, to implement chain of thought in LangChain, there's no specific LangChain object that does that. Instead, it's just prompting. Okay, so let's go down and just see how we might do that.

Okay, so: be a helpful assistant and answer the user's question; you must answer the question directly without any other text or explanation. That's our no-chain-of-thought system prompt. I will just note here, especially with OpenAI, again, this is one of those things where you'll see it more with the smaller models.

Most LLMs are actually trained to use chain of thought prompting by default. So we're actually specifically telling it here, you must answer the question directly without any other text or explanation. Okay, so we're actually kind of reverse prompting it to not use chain of thought. Otherwise, by default, it actually will try and do that because it's been trained to.

That's how relevant chain of thought is. Okay, so I'm going to ask how many keystrokes are needed to type the numbers from 1 to 500. We set up our LLM chain pipeline, and we're going to just invoke our query, and we'll see what we get.

The total number of keystrokes needed to type the numbers from 1 to 500 is 1,511, it says. The actual answer, as I've written here, is 1,392. So without chain of thought, it is hallucinating. Okay, now let's go ahead and see, with chain-of-thought prompting, what does it do? So: be a helpful assistant and answer the user's question.

To answer the question, you must list systematically and in precise detail all sub-problems that need to be solved to answer the question. Solve each sub-problem INDIVIDUALLY, you have to shout at the LLM sometimes to get it to listen, and in sequence. Finally, use everything you've worked through to provide the final answer.
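A rough sketch of the two pipelines being compared here; the system prompts are paraphrased from the walkthrough rather than copied exactly:

```python
from langchain_core.prompts import ChatPromptTemplate

no_cot_system = (
    "Be a helpful assistant and answer the user's question.\n"
    "You MUST answer the question directly without any other text or explanation."
)

cot_system = (
    "Be a helpful assistant and answer the user's question.\n"
    "To answer the question, you must:\n"
    "- List systematically and in precise detail all sub-problems that need to be "
    "solved to answer the question.\n"
    "- Solve each sub-problem INDIVIDUALLY and in sequence.\n"
    "- Finally, use everything you have worked through to provide the final answer."
)

query = "How many keystrokes are needed to type the numbers from 1 to 500?"

no_cot_pipeline = ChatPromptTemplate.from_messages(
    [("system", no_cot_system), ("user", "{query}")]
) | llm
cot_pipeline = ChatPromptTemplate.from_messages(
    [("system", cot_system), ("user", "{query}")]
) | llm

print(no_cot_pipeline.invoke({"query": query}).content)  # typically a wrong guess
print(cot_pipeline.invoke({"query": query}).content)     # reasons step by step
```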

Okay, so we're forcing it to go through the full problem there. We can remove that, so run that. Again, I don't know why we have context there; I'll remove that. And let's see. You can see straight away that it's taking a lot longer to generate the output.

That's because it's generating so many more tokens, so that's just one drawback of this. But let's see what we have. So, to determine how many keystrokes are needed to type those numbers, it is breaking the problem down into several sub-problems: count the number of digits from 1 to 9, from 10 to 99, and so on, and count the digits in the number 500.

Okay, interesting. So that's how it's breaking it up: sum the digit counts from the previous steps. So we go through the total digits, and we see this: okay, 9 digits for these, 180 for here, 1,200 for here, and then, of course, 3 here. So it takes all of those, sums those digits, and actually comes to the right answer.
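We can verify that 1,392 figure ourselves with a quick check:

```python
# 9 one-digit numbers, 90 two-digit numbers, 400 three-digit numbers (100-499),
# plus the 3 digits of 500 itself:
# 9*1 + 90*2 + 400*3 + 3 = 9 + 180 + 1200 + 3 = 1392
print(sum(len(str(n)) for n in range(1, 501)))  # 1392
```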

Okay, so that is, you know, the difference with chain of thought versus without. Without it, we just get the wrong answer, basically guessing. With chain of thought, we get the right answer just by the LLM writing down its reasoning and breaking the problem down into multiple parts, which I find super interesting.

So that's pretty cool. Now, I will just say, as we mentioned before, most LLMs nowadays are actually trained to use chain-of-thought prompting by default. So let's just see what happens if we don't mention anything, right? Be a helpful assistant and answer the user's questions. So we're not telling it not to think through its reasoning, and we're not telling it to think through its reasoning.

Let's just see what it does. Okay, so you can see, again, it's actually doing the exact same reasoning, okay, it doesn't, it doesn't give us like the sub problems at the start, but it is going through and it's breaking everything apart. Okay, which is quite interesting. And we get the same correct answer.

So the formatting here is slightly different. It's probably a little cleaner, actually, although I think, I don't know. Here, we get a lot more information. So both are fine. And in this scenario, we actually do get the right answer as well. So you can see that that chain of thought prompting has actually been quite literally trained into the model.

And you'll see that with most, well, I think all state-of-the-art LLMs. Okay, cool. So that is our chapter on prompting. Again, we're focusing very much on the fundamentals of prompting there, and of course tying that back to the actual objects and methods within LangChain.

But for now, that's it for prompting, and we'll move on to the next chapter. In this chapter, we're going to be taking a look at conversational memory in LangChain. We're going to be taking a look at the core chat memory components that have really been in LangChain since the start, but are essentially no longer in the library.

And we'll be seeing how we actually implement those historic conversational memory utilities in the new versions of LangChain, so 0.3. Now, as a pre-warning, this chapter is fairly long, but that is because conversational memory is just such a critical part of chatbots and agents. Conversational memory is what allows them to remember previous interactions.

And without it, our chatbots and agents would just be responding to the most recent message without any understanding of previous interactions within a conversation. So they would just not be conversational. And depending on the type of conversation, we might want to go with various approaches to how we remember those interactions within a conversation.

Now, throughout this chapter, we're going to be focusing on these four memory types. We'll be referring to these, and I'll be showing you how each one of these works. But what we're really focusing on is rewriting these for the latest version of LangChain using what's called the RunnableWithMessageHistory.

So we're going to be essentially taking a look at the original implementations for each of these four original memory types, and then we'll be rewriting them with the RunnableWithMessageHistory class. So, just taking a look at each of these four very quickly: conversation buffer memory is, I think, the simplest, most intuitive of these memory types.

It is literally just: you have your messages, they come into this object, they are stored in this object as essentially a list, and when you need them again, it will return them to you. There's nothing else to it, super simple. Then the conversation buffer window memory, okay, so a new word in the middle there, window.

This works in pretty much the same way. But those messages that it has stored, it's not going to return all of them for you. Instead, it's just going to return the most recent, let's say the most recent three, for example. Okay, and that is defined by a parameter k.

Conversation summary memory: rather than keeping track of the entire interaction history directly, what it's doing is, as those interactions come in, it's actually going to take them and compress them into a smaller little summary of what has been within that conversation. And as every new interaction comes in, it's going to do that and keep iterating on that summary.

And then that is what's going to be returned to us when we need it. And finally, we have the conversation summary buffer memory. The buffer part of this is actually referring to something very similar to the buffer window memory, but rather than it being the most recent k messages, it's looking at the number of tokens within your memory, and it's returning the most recent k tokens.

That's what the buffer part is there. And then it's also merging that with the summary memory here. So essentially, what you're getting is almost like a list of the most recent messages based on the token length rather than the number of interactions, plus a summary, which would come at the top here.

So you get kind of both. The idea is that obviously this summary here would maintain all of your interactions in a very compressed form. So you're, you're losing less information, and you're still maintaining, you know, maybe the very first interaction, the user might have introduced themselves, giving you their name, hopefully, that would be maintained within the summary, and it would not be lost.

And then you have almost like high resolution on the most recent k messages or k tokens from your memory. Okay, so let's jump over to the code. We're going into the 04 chat memory notebook; open that in Colab. Okay, now here we are, let's go ahead and install the prerequisites and run everything. Again, you can use LangSmith or not, it is up to you.

Enter that, and let's come down and start. So first, we'll just initialize our LLM, using gpt-4o-mini in this example, again with low temperature. And we're going to start with conversation buffer memory. This is the original version of this memory type. So let me, where are we, we're here.

So memory is ConversationBufferMemory, and return_messages needs to be set to true. The reason that we set return_messages to true, as it mentions up here, is that if you do not do this, it's going to be returning your chat history as a string to the LLM, whereas chat LLMs nowadays expect message objects.

So yeah, you just want to be returning these as messages rather than as strings. Otherwise, you're going to get some kind of strange behavior out of your LLMs if you pass them strings. So you do want to make sure that it's true; I think by default, it might not be true.

But this is deprecated, right? It does tell you here, as a deprecation warning; this is coming from older LangChain, but it's a good place to start just to understand this. And then we're going to rewrite this with the runnables, which is the recommended way of doing it nowadays.

Okay, so adding messages to our memory, we're going to write this. It's just a conversation: user, AI, user, AI, and so on, random chat. The main things to note here are that I do provide my name, and we have the model's name, right towards the start of those interactions.

Okay, so I'm just going to add all of those, we do it like this. Okay, then we can just see, we can load our history, like so. So let's just see what we have there. Okay, so we have human message, AI message, human message, right? This is exactly what we showed you just here.

It's just in that message format from LangChain. So we can do that. Alternatively, we can actually do this: we can get our memory, we initialize the conversation buffer memory as we did before, and we can actually add these messages directly into our memory like that.

So we can use this add_user_message, add_ai_message, and so on, load again, and it's going to give us the exact same thing. Again, there are multiple ways to do the same thing. Cool. So we have that. To pass all of this into our LLM, again, this is all deprecated stuff that we're going to learn how to do properly in a moment.

But this is how LangChain did it in the past. So, to pass all of this into our LLM, we'd be using this ConversationChain, right? Again, this is deprecated; nowadays we would be using LCEL for this. But I just want to show you how this would all go together.
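As a rough sketch of that deprecated pattern (shown only for comparison; the messages here are illustrative):

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Deprecated pattern: a memory object that stores every message, plus a chain
# that feeds the whole history into the LLM on each turn.
memory = ConversationBufferMemory(return_messages=True)
memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, what's up? I'm an AI model called Zeta.")

chain = ConversationChain(llm=llm, memory=memory)
```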

And then we would invoke: okay, what is my name again? Let's run that, and we'll see what we get. It's remembering everything, remember; this conversation buffer memory doesn't drop messages, it just remembers everything. And honestly, with the sort of large context windows of many LLMs, that might be what you do.

It depends on how long you expect the conversation to go on for, but you probably, in most cases, would get away with this. Okay, so let's see what we get. I say, what is my name again? Let's see what it gives me: it says your name is James.

Great, thank you, that works. Now, as I mentioned, all of this I just showed you is actually deprecated; that's the old way of doing things. Let's see how we actually do this in modern, up-to-date LangChain. So we're using this RunnableWithMessageHistory. To implement that, we will need to use LCEL.

And for that, we will need to just define our prompt template and LLM as we usually would. So we're going to set up our system prompt, which is just a helpful assistant called Zeta, and we're going to put in this messages placeholder. That's important. Essentially, that is where our chat history coming from our conversation buffer memory is going to be inserted, right?

So that chat history is going to be inserted after our system prompt, but before our most recent query, which is going to be inserted last here. So the MessagesPlaceholder item, that's important, and we use that throughout the course as well. We use it both for chat history and, as we'll see later on, we also use it for the intermediate thoughts that an agent goes through.

So it's important to remember that little thing. We'll link our prompt template to our LLM. Again, if we would like, we could also add in, I think we only have the query here, oh, we would probably also want our history as well, but I'm not going to do that right now.

Okay, so we have our pipeline, and we can go ahead and actually define our RunnableWithMessageHistory. Now, this class or object, when we are initializing it, does require a few items; we can see them here. So we see that we have our pipeline with history. You can see here, right, we have that history_messages_key; this has to align with what we provided as the messages placeholder in our pipeline, right?

So we have our pipeline prompt template here, and here, right. So that's where it's coming from. It's coming from messages placeholder, the variable name is history, right? That's important. That links to this. Then for the input messages key here, we have query that, again, links to this. Okay, so both important to have that.

The other thing that is important is obviously we're passing in that pipeline from before. But then we also have this get session history. Basically, what this is doing is it saying, okay, I need to get the list of messages that make up my chat history that are going to be inserted into this variable.

So that is a function that we define, okay. And within this function, what we're trying to do here is actually replicate what we have with the previous conversation buffer memory. Okay, so that's what we're doing here. So it's very simple, right? So we have this in memory chat message history.

Okay, so that's just the object that we're going to be returning. What this will do is take a session ID. The session ID is essentially a unique identifier, so that each interaction within a single conversation is being mapped to a specific conversation. So you don't have overlapping; let's say you have multiple users using the same system, you want to have a unique session ID for each one of those.

Okay, and what it's doing is saying, okay, if the session ID is not in the chat map, which is this empty dictionary we defined here, we are going to initialize that session with an InMemoryChatMessageHistory. That's it, and we return. And all it's going to do is basically append our messages; they will be appended within that chat map session ID, and they're going to get returned.

There's nothing else to it, to be honest. So we invoke our runnable, let's see what we get. I need to run this. Okay, note that we do have this config, so we have the session ID, that's to again, as I mentioned, keep different conversations separate. Okay, so we've run that.
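Putting those pieces together, a minimal sketch of the pattern looks roughly like this (session ID and prompt wording are illustrative):

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant called Zeta."),
    MessagesPlaceholder(variable_name="history"),  # chat history is injected here
    ("user", "{query}"),
])
pipeline = prompt_template | llm

chat_map = {}  # session_id -> message history object

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    # One history object per session, created on first use.
    if session_id not in chat_map:
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",      # matches the {query} placeholder
    history_messages_key="history",  # matches the MessagesPlaceholder variable
)

pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_123"}},
)
```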

Now let's run a few more. So, what is my name again? Let's see if it remembers. Your name is James, how can I help you today, James? Okay. So what we've just done there is literally conversation buffer memory, but for up-to-date LangChain, with LCEL and runnables.

So the recommended way of doing it nowadays. So that's a very simple example. Okay, there's really not that much to it. It gets a little more complicated as we start thinking about the different types of memory. Although with that being said, it's not massively complicated, we're only really going to be changing the way that we're getting our interactions.

So let's, let's dive into that and see how we will do something similar with the conversation buffer window memory. But first, let's actually just understand okay, what is the conversation buffer window memory. So as I mentioned, near the start, it's going to keep track of the last K messages.

So there are a few things to keep in mind here. More messages does mean more tokens sent with each request, and if we have more tokens in each request, it means that we're increasing the latency of our responses and also the cost. So with the previous memory type, we're just sending everything.

And because we're sending everything that is going to be increasing our costs, it's going to be increasing our latency for every message, especially as the conversation gets longer and longer. And we don't, we might not necessarily want to do that. So with this conversation buffer window memory, we're going to say, okay, just return me the most recent messages.

Okay, so let's, well, let's see how that would work. Here, we're going to return the most recent four messages. Okay, we are again, make sure we've turned messages is set to true. Again, this is deprecated. This is just the old way of doing it. In a moment, we'll see the updated way of doing this.

We'll add all of our messages. Okay, so we have this. And just see here, right, so we've added in all these messages, there's more than four messages here. And we can actually see that here. So we have human message, AI, human, AI, human, AI, human, AI. Right. So we've got four pairs of human AI interactions there.

But up here, we added more than four pairs. So four pairs only takes us back to here, "I'm researching different types of conversational memory." And if we take a look here, the first message we have is indeed "I'm researching different types of conversational memory."

So it's cut off these two here, which will be a bit problematic when we ask it what our name is. Okay, so let's just see. We're going to be using the ConversationChain object again; remember, that is deprecated. And I'm going to say, what is my name again? Let's see what it says.

"I'm sorry, I don't know your name or any personal information; if you like, you can tell me your name." Right, so it doesn't actually remember. That's kind of a negative of the conversation buffer window memory. Of course, to fix that in this scenario, we might just want to increase k; maybe we keep around the previous eight interaction pairs, and it will actually remember.

So what's my name again, your name is James. So now it remembers, we just modified how much is remembering. But of course, you know, there's pros and cons to this, it really depends on what you're trying to build. So let's take a look at how we would actually implement this with the runnable with message history.

Okay, so it's getting a little more complicated here, although it's not that complicated, as we'll see. So we have a BufferWindowMessageHistory; we're creating a class here, and this class is going to inherit from the BaseChatMessageHistory object from LangChain, and all of our other message history objects will do the same thing. Before, with the InMemoryChatMessageHistory object, we were basically replicating the buffer memory.

So we didn't actually need to do anything there; we didn't need to define our own class. In this case, we do. So we follow the same pattern that LangChain follows with this BaseChatMessageHistory, and you can see a few of the functions here that are important.

So add_messages and clear are the ones that we're going to be focusing on; we also need to have messages, which is this object attribute here. We're just implementing the synchronous methods here. If we wanted to support async, we would have to add aadd_messages, aget_messages, and aclear as well.

So let's go ahead and do that. We have messages, we have k; again, we're looking at remembering the most recent k messages only, so it's important that we have that variable. We are adding messages through this class; this is going to be used by LangChain within our runnable.

So we need to make sure that we do have this method. And all we're going to be doing is extending the self.messages list here, and then we're actually just going to be trimming that down so that we're not remembering anything beyond those, you know, most recent k messages that we have set from here.

And then we also have the clear method as well. So we need to include that that's just going to clear the history. Okay, so it's not this isn't complicated, right? It just gives us this nice default standard interface for message history. And we just need to make sure we're following that pattern.

Okay, I've included this print here just so we can see what's happening. So we have that. And now, for that get_chat_history function that we defined earlier, rather than using the built-in object, we're going to be using our own object, which is the BufferWindowMessageHistory that we defined just here.

Okay. So if session ID is not in the chat map, as we did before, we're going to be initializing our buffer window message history, we're setting k up here with a default value of four, and then we just return it. Okay, and that is it. So let's run this, we have our runnable with message history, we have all of these variables, which are exactly the same as before.

But then we also have these variables here with this history_factory_config. And this is where, if we have new variables that we've added to our message history, in this case the k that we have down here, we need to provide that to LangChain and tell it this is a new configurable field.

Okay. And we've also added it for the session ID here as well. So we're just being explicit and have everything in that. So we have that and we run. Okay, now let's go ahead and invoke and see what we get. Okay, so important here, this history factory config, that is kind of being fed through into our invoke so that we can actually modify those variables from here.

Okay, so we have config configurable, session ID, okay, we'll just put whatever we want in here. And then we also have the number k. Okay, so remember the previous four interactions, I think in this one, we're doing something slightly different. I think we're remembering the four interactions rather than the previous four interaction pairs.
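A sketch of how those configurable fields can be wired up and then set at invoke time (names, descriptions, and the session ID value are illustrative):

```python
from langchain_core.runnables import ConfigurableFieldSpec

chat_map = {}

def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="Unique identifier for a conversation.",
            default="",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="Number of most recent messages to keep.",
            default=4,
        ),
    ],
)

# Both fields can now be set per invocation via the config dict.
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_k4", "k": 4}},
)
```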

Okay, so, my name is James. We're going to go through, I'm just going to actually clear this and start again. And we're going to use the exact same add_user_message and add_ai_message that we used before, which is manually inserting all of that into our history, so that we can then just see, okay, what is the result.

And you can see that k equals four is actually, unlike before where we were saving the most recent four interaction pairs, now saving the most recent four interactions, not pairs, just interactions. And honestly, I just think that's clearer. I think it's weird that a value of four for k would actually save the most recent eight messages.

Right? I think that's odd. So I'm just not replicating that weirdness. We could if we wanted to, I just don't like it, so I'm not doing that. And anyway, we can see from messages that we're returning just the four most recent messages, which would be these four.

Okay, cool. So, just using the runnable, we've replicated the old way of having a window memory. And okay, I'm going to say, what is my name again? As before, it's not going to remember. So we can come to here: "I'm sorry, but I don't have access to personal information," and so on and so on.

"If you'd like to tell me your name..." It doesn't know. Now let's try a new one, where we initialize a new session. So we're going with a new session ID here; that's going to create a new conversation. And this time, we're going to set k to 14.

Okay, great. I'm going to manually insert the other messages as we did before. Okay, and we can see all of those you can see at the top here, we are still maintaining that Hi, my name is James message. Now let's see if it remembers my name. Your name is James.

Okay, there we go. Cool. So that is working. We can also see, so we just added this, what is my name again, let's just see if did that get added to our list of messages. Right, what is my name again? Nice. And then we also have the response, your name is James.

So just by invoking this, because we're using the, the runnable with message history, it's just automatically adding all of that into our message history, which is nice. Cool. Alright, so that is the buffer window memory. Now we are going to take a look at how we might do something a little more complicated, which is the summaries.

Okay, so when you think about the summary, you know, what are we doing, we're actually taking the messages, we're using the LLM call to summarize them, to compress them, and then we're storing them within messages. So let's see how we would actually do that. So to start with, let's just see how it was done in old LangChain.

So we have our ConversationSummaryMemory; we go through that, and let's just see what we get. So again, same interactions. I'm just invoking, invoking, invoking; I'm not adding these directly to the messages, because it actually needs to go through that summarization process. And if we have a look, we can see it happening.

Okay, current conversation: "Hello there, my name is James", and the AI is generating. Then, current conversation: the human introduces himself as James, the AI greets James warmly and expresses its readiness to chat and assist, inquiring about how his day is going. Right, so it's summarizing the previous interactions. And then, after that summary, we have the most recent human message, and then the AI is going to generate its response.

Okay, and that continues going. And you see that the final summary here is going to be a lot longer, and it's different from that first summary. Of course, after asking about his day, he mentions that he's researching different types of conversational memory. The AI responds enthusiastically, explaining that conversational memory includes short-term memory, long-term memory, contextual memory, and personalized memory, and then inquires if James is focused on a specific type of memory.

Okay, cool. So we get essentially the summary is just getting longer and longer as we go. But at some point, the idea is that it's not going to keep growing. And it should actually be shorter than if you were saving every single interaction, whilst maintaining as much of the information as possible.

But of course, you're not going to maintain all of the information that you would with, for example, the the buffer memory, right with the summary, you are going to lose information, but hopefully less information than if you're just cutting interactions. So you're trying to reduce your token count whilst maintaining as much information as possible.

Now, let's go and ask, what is my name again? It should be able to answer, because we can see in the summary here that I introduced myself as James. Okay, the response: "Your name is James. How is your research going?" So it has that. Cool. Let's see how we'd implement that.

So again, as before, we're going to go with a ConversationSummaryMessageHistory. We're going to be importing a system message, and we're going to be using that not for the LLM that we're chatting with, but for the LLM that will be generating our summary. So actually, that is not quite correct; it creates a summary. Not that it matters, it's just the docstring.

So we have our messages and we also have the LLM, so a different attribute here to what we had before. When we initialize a ConversationSummaryMessageHistory, we need to be passing in our LLM. We have the same methods as before, add_messages and clear. And what we're doing is, as messages come in, we extend our current messages, but then we're modifying those.

So we construct our instructions to make a summary. So that is here, we have the system prompt, given the existing conversation summary and the new messages, generate a new summary of the conversation, ensuring to maintain as much relevant information as possible. Then we have a human message here, through that we're passing the existing summary.

And then we're passing in the new messages. So we format those and invoke the LM. And then what we're doing is in the messages, we're actually replacing the existing history that we had before with a new history, which is the single system summary message. Let's see what we get.

As before, we have that get_chat_history, exactly the same as before. The only real difference is that we're passing in the LLM parameter here. And of course, as we're passing the LLM parameter in here, it does also mean that we're going to have to include that in the ConfigurableFieldSpec, and that we're going to need to include it when we're invoking our pipeline.
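Putting the pieces just described together, a rough sketch might look like this (the prompt wording is paraphrased, and it already passes message contents rather than full message objects, which is the small fix applied a little later in the walkthrough):

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate

class ConversationSummaryMessageHistory(BaseChatMessageHistory):
    """Stores the whole conversation as a single, continually updated summary."""

    def __init__(self, llm):
        self.messages: list[BaseMessage] = []
        self.llm = llm  # the LLM that writes the summary, not the chat LLM

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # The current summary (if any) is stored as a single system message.
        existing_summary = self.messages[0].content if self.messages else ""

        summary_prompt = ChatPromptTemplate.from_messages([
            ("system",
             "Given the existing conversation summary and the new messages, "
             "generate a new summary of the conversation, ensuring to maintain "
             "as much relevant information as possible."),
            ("human",
             "Existing conversation summary:\n{existing_summary}\n\n"
             "New messages:\n{messages}"),
        ])
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary,
                messages=[x.content for x in messages],
            )
        )
        # Replace the stored history with the single summary system message.
        self.messages = [SystemMessage(content=new_summary.content)]

    def clear(self) -> None:
        self.messages = []

chat_map = {}

def get_chat_history(session_id: str, llm) -> ConversationSummaryMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = ConversationSummaryMessageHistory(llm=llm)
    return chat_map[session_id]
```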

So we run that and pass in the LLM. Now, of course, one side effect of generating summaries of everything is that we're actually, you know, generating more, so you are actually using quite a lot of tokens. Whether or not you are saving tokens actually depends on the length of the conversation.

As the conversation gets longer, if you're storing everything, after a little while the token usage is actually going to increase beyond the summary approach. So if, in your use case, you expect to have shorter conversations, you would be saving money and tokens by just using the standard buffer memory, whereas if you're expecting very long conversations, you would be saving tokens and money by using the summary history.

Okay, so let's see what we got from that. We have a summary of the conversation: James introduced himself by saying, "Hi, my name is James." The AI responded warmly, saying, "Hi, James." The interaction includes details about token usage. Okay, so we actually included everything here, which we probably should not have done.

Why did we do that? So in here, we're including all of the content from the message objects. So I think maybe if we just do x.content for x in messages, that should resolve it. Okay, there we go, so we quickly fixed that. So yeah, before, we were passing in the entire message object, which obviously includes all of this information.

Whereas actually we just want to be passing in the content. So we modified that and now we're getting what we'd expect. Okay, cool. And then we can keep going. So as we as we keep going, the summary should get more abstract. Like as we just saw here, it's literally just giving us the messages directly almost.

Okay, so we're getting the summary there and we can keep going. We're going to add more messages to that. So we'll send those, get a response, send again, get a response. And we're just invoking all of that, and that will of course be adding everything into our message history.

Okay, cool. So we've run that. Let's see what the latest summary is. Okay, and then we have this. So this is a summary that we have instead of our chat history. Okay, cool. Now, finally, let's see what's my name again. We can just double check. You know, it has my name in there.

So it should be able to tell us. Okay, cool. So your name is James. Pretty interesting. So let's have a quick look over at Langsmith. So the reason I want to do this is just to point out, okay, the different essentially token usage that we're getting with each one of these.

Okay, so we can see that we have these RunnableWithMessageHistory runs, which we could probably improve the naming of. But we can see, okay, how long has each one of these taken, and how many tokens are they using? Come back to here, we have this RunnableWithMessageHistory. We'll go through a few of these, maybe to here, I think.

You can see here, this is that first interaction where we're using the buffer memory. And we can see how many tokens we use here. So 112 tokens when we're asking what is my name again. Okay, then we modified this to include, I think it was like 14 interactions or something on those lines, obviously increases the number of tokens that we're using, right?

So we can see that actually happening all in Langsmith, which is quite nice. And we can compare, okay, how many tokens is each one of these using. Now, this is looking at the buffer window. And if we come down to here and look at this one, so this is using our summary.

Okay, so the summary with "what is my name again" actually used more tokens in this scenario, right? Which is interesting, because we're trying to compress information. The reason there's more is because there haven't been that many interactions. As the conversation length increases with the summary, this total number of tokens, especially if we prompt it correctly to keep the summary short, should remain relatively small.

Whereas with the buffer memory, that will just keep increasing and increasing as the conversation gets longer. So useful little way of using Langsmith there to just kind of figure out, okay, in terms of tokens and costs of what we're looking at for each of these memory types. Okay, so our final memory type acts as a mix of the summary memory and the buffer memory.

So what it's going to do is keep the buffer up until an n number of tokens. And then once a message exceeds the n number of token limit for the buffer, it is actually going to be added into our summary. So this memory has the benefit of remembering in detail the most recent interactions whilst also not having the limitation of using too many tokens as a conversation gets longer and even potentially exceeding context windows if you try super hard.

So this is a very interesting approach. Now, as before, let's try the original way of implementing this, then we will go ahead and use our updated method for implementing it. So we come down to here, and we're going to do from langchain.memory import ConversationSummaryBufferMemory. Okay, a few things here.

The LLM for the summary, the n number of tokens that we can keep before messages get added to the summary, and then return_messages, of course. You can see again this is deprecated. We use the ConversationChain, we're just passing our memory there, and then we can chat.
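A sketch of that deprecated setup, assuming the 300-token buffer mentioned below:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory

# Deprecated pattern: keep the raw message buffer up to ~300 tokens,
# fold anything older into an LLM-written summary.
memory = ConversationSummaryBufferMemory(
    llm=llm,               # LLM used to write the rolling summary
    max_token_limit=300,   # token budget for the raw message buffer
    return_messages=True,
)
chain = ConversationChain(llm=llm, memory=memory)
chain.invoke({"input": "Hi, my name is James"})
```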

Okay, so super straightforward first message. We'll add a few more here. Again, we have to invoke because how memory type here is using LLM to create those summaries as it goes and let's see what they look like. Okay, so we can see for the first message here, we have a human message and then an AI message.

Then we come a little bit lower down again. It's the same thing. Human message is the first thing in our history here. Then it's a system message. So this is at the point where we've exceeded that 300 token limit and the memory type here is generating those summaries. So that summary comes in as this is a message and we can see, okay, the human named James introduces himself and mentions he's researching different types of conversational memory and so on and so on.

Right. Okay, cool. So we have that. Then let's come down a little bit further. We can see, okay, so the summary there. Okay, so that's what we that's what we have. That is the implementation for the old version of this memory. Again, we can see it's deprecated. So how do we implement this for our more recent versions of LangChain and specifically 0.3?

Well, again, we're using that RunnableWithMessageHistory, and it looks a little more complicated than what we had before, but it's actually, you know, nothing too complex. We're just creating a summary as we did with the previous memory type, but the decision for adding to that summary is based on, in this case, the number of messages.

So I didn't go with the LangChain version where it's a number of tokens. I don't like that. I prefer to go with messages. So what I'm doing is saying, okay, the last K messages. Okay. Once we exceed K messages, the messages beyond that are going to be added to the memory.

Okay, cool. So let's see, we first initialize our conversation summary buffer message history class with LLM and K. Okay, so these two here. So LLM, of course, to create summaries and K is just the limit of number of messages that we want to keep before adding them to the summary or dropping them from our messages and adding them to the summary.

Okay, so we will begin with, okay, do we have an existing summary? So the reason we set this to none is we can't extract the summary, the existing summary, unless it already exists. And the only way we can do that is by checking, okay, do we have any messages?

If yes, we want to check whether, within those messages, we have a system message, because we're using the same structure as what we have up here, where that first system message is actually our summary. So that's what we're doing here: we're checking if there is a summary message already stored within our messages.

Okay, so we're checking for that. If we find it, we'll just do, we have this little print statement so we can see that we found something and then we just make our existing summary. I should actually move this to the first instance here. Okay, so that existing summary will be set to the first message.

Okay, and this would be a system message rather than a string. Cool, so we have that. Then we want to add any new messages to our history, so we're extending the history there, and then we're saying, okay, if the length of our history exceeds the k value that we set, we're going to say, okay, we found that many messages.

We're going to be dropping the latest. It's going to be the latest two messages. This I will say here, one thing or one problem with this is that we're not going to be saving that many tokens if we're summarizing every two messages. So what I would probably do is in an actual like production setting, I would probably say let's go to twenty messages and once we hit twenty messages, let's take the previous ten.

We're going to summarize them and put them into our summary alongside any previous summary that already existed, but, you know, this is also fine as well. Okay, so we say we found those messages, we're going to drop the latest two messages. Okay, so we pull the oldest messages out.

I should say not the latest. It's the oldest, not the latest. We want to keep the latest and drop the oldest. So we pull out the oldest messages and keep only the most recent messages. Okay, then I'm saying, okay, if we don't have any old messages to summarize, we don't do anything.

We just return. Okay, so this indicates that this has not been triggered. We would hit this, but in the case this has been triggered and we do have old messages, we're going to come to here. Okay, so this is we can see we have a system message prompt template saying giving the existing conversation summary in the new messages generate a new summary of the conversation, ensuring to maintain as much relevant information as possible.

So if we want to be more conservative with tokens, we could modify this prompt here to say keep the summary to within the length of a single paragraph, for example, and then we have our human message prompt template, which can say, okay, here's the existing conversation summary and here are new messages.

Now, new messages here is actually the old messages, but the way that we're framing it to the LLM here is that we want to summarize the whole conversation, right? It doesn't need to have the most recent messages that we're storing within our buffer. It doesn't need to know about those.

That's irrelevant to the summary. So we just tell it that we have these new messages and as far as this LLM is concerned, this is like the full set of interactions. Okay, so then we would format those and invoke our LLM and then we'll print out our new summary so we can see what's going on there and we would prepend that new summary to our conversation history.

Okay, and this works, so we can just prepend it like this, because we've already popped it. Where was it? Up here: if we have an existing summary, we already popped that from the list. It's already been pulled out of that list, so it's okay for us to just prepend; we don't need to do anything else because we've already dropped that initial system message if it existed.

Okay, and then we have the clear method as before. So that's all of the logic for our conversational summary buffer memory. We redefine our get chat history function with the LLM and K parameters there, and then we'll also want to set the configurable fields again. Those are just going to be called session_id, llm, and k.
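For reference, a rough sketch of what that add_messages logic might look like in code. The class shape and the exact prompt wording here are assumptions based on the description above, not the course's exact implementation:

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate


class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory):
    def __init__(self, llm, k: int):
        self.messages: list[BaseMessage] = []
        self.llm = llm
        self.k = k

    def add_messages(self, messages: list[BaseMessage]) -> None:
        existing_summary = None
        # if the first stored message is a SystemMessage, it is our running summary
        if self.messages and isinstance(self.messages[0], SystemMessage):
            print(">> Found existing summary")
            existing_summary = self.messages.pop(0)
        # add the new messages to the history
        self.messages.extend(messages)
        # only summarize once the buffer exceeds k messages
        if len(self.messages) <= self.k:
            print(">> No old messages to update summary with")
            return
        print(f">> Found {len(self.messages)} messages, dropping the oldest ones")
        # pull the oldest messages out and keep only the most recent k
        old_messages = self.messages[:-self.k]
        self.messages = self.messages[-self.k:]
        # summarization prompt (wording is an assumption)
        summary_prompt = ChatPromptTemplate.from_messages([
            ("system",
             "Given the existing conversation summary and the new messages, "
             "generate a new summary of the conversation, maintaining as much "
             "relevant information as possible. Keep it to a single short paragraph."),
            ("human",
             "Existing summary:\n{existing_summary}\n\nNew messages:\n{old_messages}"),
        ])
        new_summary = self.llm.invoke(summary_prompt.format_messages(
            existing_summary=existing_summary.content if existing_summary else "",
            old_messages="\n".join(str(m.content) for m in old_messages),
        ))
        print(f">> New summary: {new_summary.content}")
        # prepend the refreshed summary as the first (system) message
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        self.messages = []
```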

Okay, so now we can invoke it. The k value, to begin with, is going to be four. Okay, so you can see: no old messages to update summary with. That's good. Let's invoke this a few times and let's see what we get. Okay, so no old messages to update summary with.

Found six messages, dropping the oldest two, and then we have the new summary of the conversation: James introduces himself and is interested in researching different types of conversational memory. Right, so you can see there's quite a lot in here at the moment. So we would definitely want to prompt the LLM, the summary LLM, to keep that short.

Otherwise, we're just getting a ton of stuff, right? But we can see that it's working. It's functional. So let's go back and see if we can prompt it to be a little more concise. So we come to here, to "trying to maintain as much relevant information as possible".

However, we need to keep our summary concise. The limit is a single short paragraph. Okay, something like this. Let's try that and see what we get. Okay, so message one again and nothing to update. Then see this, the new summary: you can see it's a bit shorter. It doesn't have all those bullet points.

Okay, so that seems better. Let's see: you can see the first summary is a bit shorter, but then as soon as we get to the second and third summaries, the second summary is actually slightly longer than the third one. Okay, so we're going to be losing a bit of information in this case, more than we were before, but we're saving a ton of tokens.

So that's of course a good thing, and of course we could keep going and adding many interactions here, and we should see that this conversation summary should maintain that sort of length of around one short paragraph. So that is it for this chapter on conversational memory.

We've seen a few different memory types. We've implemented the old deprecated versions so we can see what they were like, and then we've reimplemented them for the latest versions of LangChain, honestly using logic where we are getting much more into the weeds.

Okay, that complicates things, that is true, but in other ways it gives us a ton of control, so we can modify those memory types as we did with that final summary buffer memory type. We can modify those to our liking, which is incredibly useful when you're actually building applications for the real world.

So that is it for this chapter. We'll move on to the next one. In this chapter, we are going to introduce agents. Now, agents, I think, are one of the most important components in the world of AI, and I don't see that going away anytime soon. I think for the majority of AI applications, the intelligent part of those will almost always be an implementation of an AI agent, or of multiple AI agents.

So in this chapter, we are just going to introduce agents within the context of LangChain. We're going to keep it relatively simple. We're going to go into much more depth on agents in the next chapter, where we'll do a bit of a deep dive, but here we'll focus on just introducing the core concepts and, of course, agents within LangChain.

So jumping straight into our notebook, let's run our prerequisites. You'll see that we do have an additional prerequisite here, which is google-search-results. That's because we're going to be using the SerpAPI to allow our LLM, as an agent, to search the web, which is one of the great things about agents: they can do all of these additional things that an LLM by itself obviously cannot.

So we'll come down to here. We have our LangSmith parameters again, of course, so you enter your LangChain API key if you have one, and now we're going to take a look at tools, which are a very essential part of agents. So tools are a way for us to augment our LLMs with essentially anything that we can write in code.

So we mentioned that we're going to have a Google search tool. That Google search tool is some code that gets executed for our LLM in order to search Google and get some results. So a tool can be thought of as any code logic or any function, in the case of Python, a function that has been formatted in a way so that our LLM can understand how to use it and then actually use it.

Although the LLM itself is not using the tool, it's more our agent execution logic which uses the tool for the LLM. So we're going to go ahead and actually create a few simple tools. We're going to be using what is called the tool decorator from LangChain, and there are a few things to keep in mind when we're building tools.

So for optimal performance, our tool needs to be very readable, and what I mean by readable is we need three main things. One is a docstring that is written in natural language and is going to be used to explain to the LLM when, why, and how it should use this tool.

We should also have clear parameter names. Those parameter names should tell the LLM what each one of these parameters is. They should be self-explanatory. If they are not self-explanatory, we should be including an explanation for those parameters within the docstring. Then finally, we should have type annotations for both our parameters and also what we're returning from the tool.

So let's jump in and see how we would implement all of that. So come down here and we have langchain_core.tools import tool. Okay. So these are just four incredibly simple tools. We have the add tool, the multiply, the exponentiate, and the subtract tools. Okay. So a few calculator-esque tools.
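The tools in the notebook are along these lines; this is a sketch, and the exact docstrings may differ slightly:

```python
from langchain_core.tools import tool

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y' together."""
    return x + y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y' together."""
    return x * y

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y

@tool
def subtract(x: float, y: float) -> float:
    """Subtract 'x' from 'y'."""
    return y - x
```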

Now when we add this tool decorator, it is turning each of these tools into what we call a structured tool object. So you can see that here. We can see we have this structured tool. We have a name and description. Okay. And then we have this schema, which we'll see in a moment, and a function, right?

So this function is literally just the original function. It's a mapping to the original function. So in this case, it's the add function. Now the description, we can see, is coming from our docstring, and of course the name as well is just coming from the function name. Okay.

And then we can also see, let's just print the name and description, but then we can also see the args schema, right? So this thing here that we can't read at the moment; to read it, we're just going to look at the model_json_schema method, and then we can see what that contains, which is all of this information.

So this actually contains everything, including the properties. So we have the X. It creates a sort of title for that and it also specifies the type. Okay. So the type that we defined is float; float, for OpenAI, I guess, is mapped to number rather than just being float, and then we also see that we have this required field.

So this is telling our LLM which parameters are required and which ones are optional. So, you know, in some cases you would want optional parameters, and we can even do that here. Let's add Z. That is going to be float or None. Okay. And we're just going to say its default is 0.3. Alright.

I'm going to remove this in a minute because it's kind of weird, but let's just see what that looks like. So you see that we now have X, Y, and Z, but then in Z, we have some additional information. Okay. So it can be any of it can be a number or it can just be nothing.

The default value for that is 0.3. Okay. And then if we look here, we can see that the required field does not include Z. So it's just X and Y. So it's describing the full function schema for us, but let's remove that. Okay. And we can see that again with our exponentiate tool similar thing.

Okay. So how are we going to invoke our tool? So the LLM, the underlying LLM, is actually going to generate a string. Okay. So it will look something like this. This is going to be our LLM output. So it's a string that is some JSON, and of course to load a string into a dictionary format, we just use json.loads.

Okay. So let's see that. So this could be the output from our LLM. We load it into a dictionary and then we get an actual dictionary. And then what we would do is we can take our exponentiate tool, we access the underlying function, and then we pass it the keyword arguments from our dictionary here.
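As a quick sketch of that step, with purely illustrative argument values:

```python
import json

# example of the kind of string the LLM generates for a tool call
llm_output_string = '{"x": 5, "y": 2}'
llm_output_dict = json.loads(llm_output_string)  # parse the JSON string into a dict

# access the underlying function and pass the args as keyword arguments
exponentiate.func(**llm_output_dict)  # -> 25.0
```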

Okay. And that will execute our tool. That is the tool execution logic that LangChain implements, and later on, in the next chapter, we'll be implementing it ourselves. Cool. So let's move on to creating an agent. Now, we're going to be constructing a simple tool calling agent. We're going to be using LangChain Expression Language to do this.

Now, we will be covering LangChain Expression Language, or LCEL, more in an upcoming chapter, but for now, all we need to know is that our agent will be constructed using syntax and components like this. So, we would start with our input parameters. That is going to include our user query and, of course, the chat history, because we need our agent to be conversational and remember previous interactions within the conversation.

These input parameters will also include a placeholder for what we call the agent scratch pad. Now, the agent scratch pad is essentially where we are storing the internal thoughts or the internal dialogue of the agent as it is using tools and getting observations from those tools and working through those multiple internal steps.

So, in the case that we will see, it will be using, for example, the addition tool, getting the result, using the multiply tool, getting the result, and then providing a final answer to the user. So, let's jump in and see what it looks like. Okay, so we'll just start with defining our prompt.

So, our prompt is going to include the system message. That's nothing special; we're not putting anything special in there. We're going to include the chat history, which is a messages placeholder. Then, we include our human message, and then we include a placeholder for the agent scratch pad. Now, the way that we implement this later is going to be slightly different for the scratch pad. We'd actually use this messages placeholder, but this is how we use it with the built-in create_tool_calling_agent from LangChain.
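A sketch of that prompt might look something like this; the exact system message wording is an assumption:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You're a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    # the scratch pad placeholder expected by create_tool_calling_agent
    ("placeholder", "{agent_scratchpad}"),
])
```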

We'd actually use this messages placeholder but this is how we use it with the built-in create tool agent from LinkedIn. Next, we'll define our LM. We do need our opening our API key for that. So, we'll enter that here like so. Okay, so come down. Okay, so we're going to be creating this agent.

We need conversation memory, and we are going to use the older ConversationBufferMemory class rather than the newer RunnableWithMessageHistory class. That's just because we're also using this older create_tool_calling_agent, and this is the older way of doing things. In the next chapter, we are going to be using the more recent approach, basically what we already learned on chat history.

We're going to be using all of that to implement our chat history, but for now, we're going to be using the older method, which is deprecated, just as a pre-warning. But again, as I mentioned at the very start of the course, we're starting abstract and then we're getting into the details.

So, we're going to initialize our agent. For that, we need these four things: the LLM, as we defined; the tools, as we have defined; the prompt, as we have defined; and then the memory, which is our old ConversationBufferMemory. So, with all of that, we are going to go ahead and create a tool calling agent, and then we just provide it with everything.
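Roughly, that looks like this; the model name and memory settings here are assumptions:

```python
from langchain.agents import create_tool_calling_agent
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# the older, deprecated memory class used in this intro chapter
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

tools = [add, subtract, multiply, exponentiate]
agent = create_tool_calling_agent(llm=llm, tools=tools, prompt=prompt)
```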

Okay, there we go. Now, you'll see here I didn't pass in the memory. I'm passing it in down here instead. So, we're going to start with this question which is what is 10.7 multiplied by 7.68. Okay. So, given the precision of these numbers, our normal LLM would not be able to answer that.

Almost definitely would not be able to answer that correctly. We need a external tool to answer that accurately and we'll see that that is exactly what it's trying to do. So, we can see that the tool agent action message here. We see that it decided, okay, I'm going to use the multiply tool and here are the parameters I want to use for that tool.

Okay, we can see X is 10.7 and Y is 7.68. You can see here that this is already a dictionary, and that is because LangChain has taken the string from our LLM call and already converted it into a dictionary for us. Okay, so that's just happening behind the scenes there, and you can actually see, if we go into the details a little bit, that we have these arguments, and this is the original string that was coming from our LLM.

Okay, which has already been, of course, processed by LangChain. So, we have that. Now, the one thing missing here is that, okay, we've got that the LLM wants us to use multiply and we've got what the LLM wants us to put into multiply, but where's the answer, right?

There is no answer, because the tool itself has not been executed, because it can't be executed by the LLM. But then, okay, didn't we already define our agent here? Yes, we defined part of our agent. That is, our LLM has our tools and it is going to generate which tool to use, but it actually doesn't include the agent execution part. The agent executor is a broader thing.

It's broader logic, just code logic, which acts as a scaffolding within which we have the iteration through multiple steps: our LLM calls, followed by the LLM outputting what tool to use, followed by us actually executing that for the LLM, and then providing the output back into the LLM for another decision or another step.

So, the agent itself here is not the full agentic flow that we might expect. Instead, for that, we need to implement this agent executor class. This agent executor includes our agent from before. Then, it also includes the tools and one thing here is, okay, we already passed the tools to our agent.

Why do we need to pass them again? Well, the tools being passed to our agent up here, that is essentially extracting out those function schemas and passing them to our LLM so that our LLM knows how to use the tools. Then, we're down here.

We're passing the tools again to our agent executor, and this, rather than looking at how to use those tools, is just saying: okay, I want the functions for those tools so that I can actually execute them for the LLM, or for the agent. Okay, so that's what is happening there.

Now, we can also pass in our memory directly. So, you see, if we scroll up a little bit here, I actually had to pass in the memory like this with our agent. That's just because we weren't using the agent executor. Now, we have the agent executor. It's going to handle that for us and another thing that's going to handle for us is intermediate steps.

So, you'll see in a moment that when we invoke the agent executor, we don't include the intermediate steps, and that's because that is already handled by the agent executor now. So, we'll come down. We'll set verbose equal to true so we can see what is happening, and then we can see here there are no intermediate steps anymore. We do still pass in the chat history like this, but the addition of those new interactions to our memory is going to be handled by the executor.
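Putting that together, a sketch of the executor setup and call might look like this:

```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,    # passed again here so the executor can actually run them
    memory=memory,  # new interactions are written to memory for us
    verbose=True,   # print each step as it happens
)

agent_executor.invoke({
    "input": "what is 10.7 multiplied by 7.68?",
    "chat_history": memory.chat_memory.messages,
})
```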

So, in fact, let me actually show that very quickly before we jump in. Okay, so that's currently empty. We're going to execute this. Okay, we've entered that new agent executor chain, and let's just have a quick look at our messages again, and now you can see that the agent executor automatically handled the addition of our human message and then the responding AI message for us.

Okay, which is useful. Now, what happened? So, we can see that the multiply tool was invoked with these parameters and then this pink text here that we got, that is the observation from the tool. So, it's what the tool output back to us, okay? Then, this final message here is not formatted very nicely but this final message here is coming from our LLM.

So, the green is our LLM output. The pink is our tool output, okay? So, the LLM after seeing this output says 10.7 multiplied by 7.68 is approximately 82.18. Okay, cool. Useful and then we can also see that the chat history which we already just saw. Great. So, that has been used correctly.

We can just also confirm that that is correct: 82.1759 recurring, which is exactly what we get here. Okay, and the reason for that is obviously that our multiply tool is just doing this exact operation. Cool. So, let's try this with a bit of memory. So, I'm going to ask, or I'm going to state to the agent:

Hello, my name is James. It's not actually the first interaction, because we already have these, but it's an early interaction with my name in there. Then, we're going to try and perform multiple tool calls within a single execution loop, and what you'll see when it is calling these tools is that it can actually use multiple tools in parallel.

So, I think two or three of these were used in parallel, and then the final subtract had to wait for those previous results, so it would have been executed afterwards. We should actually be able to see this in LangSmith. So, if we go here, yeah, we can see that we have this initial call, and then we have add, multiply, and exponentiate used in parallel.

Then, we have another call which uses subtract, and then we get the response. Okay, which is pretty cool, and then the final result there is negative eleven. Now, when you look at whether the answer is accurate, I think the order of calculations here is not quite correct. So, if we put the actual computation here, it gets it right, but otherwise, if I use natural language, maybe I'm phrasing it in a poor way.

Okay, so, I suppose that is pretty important. So, okay, if we put the computation in here, we get negative thirteen. So, it's something to be careful with, and it probably requires a little bit of prompting and maybe some examples in order to get that smooth, so that it does do things in the way that we might expect. Or maybe we as humans are just bad and misuse the systems, one or the other.

Okay, so now, we've gone through that a few times. Let's go and see if our agent can still recall our name. Okay and it remembers my name is James. Good. So, it still has that memory in there as well. That's good. Let's move on to another quick example where we're just going to use Google Search.

So, we're going to be using the SerpAPI. You can get the API key that you need from here, so serpapi.com, the user sign-in page, and just enter it in here. You get up to 100 searches per month for free.

So, just be aware of that if you overuse it. I don't think they charge you, because I don't think you enter your card details straight away, but just be aware of that limit. Now, there are certain tools that LangChain has already built for us. So, they're pre-built tools, and we can just load them using the load_tools function.
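Loading a pre-built tool is roughly a one-liner, something like this, assuming the SerpAPI key is available (e.g. via the SERPAPI_API_KEY environment variable):

```python
from langchain.agents import load_tools

# load the pre-built SerpAPI search tool; more tool names could be passed here
toolbox = load_tools(["serpapi"], llm=llm)
```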

So, we do that like so. We have our load_tools and we just pass in the SerpAPI tool only. We can pass in more there if we want to, and then we also pass in our LLM. Now, I'm going to, one, use that tool, but I'm also going to define my own tool, which is to get the current location based on the IP address.

Now, we're in Colab at the moment, so it's actually going to get the IP address for the Colab instance that I'm currently on, and we'll find out where that is. So, that is going to get the IP address, and then it's going to provide the data back to our LLM in this format here.

So, that's going to be latitude, longitude, city, and country. Okay? We're also going to get the current date and time. So, now we're going to redefine our prompt. I'm not going to include chat history here. I just want this to be like a one-shot thing. I'm going to redefine our agent and agent executor using our new tools, which are our SerpAPI tool plus the get current date time and get location from IP tools.

Then, I'm going to invoke our agent executor with I have a few questions. What is the date and time right now? How is the weather where I am? And please give me degrees in Celsius. So, when it gives me that weather. Okay and let's see what we get. Okay.

So, apparently, we're in Council Bluffs in the US. It is 13 degrees Fahrenheit, which I think is absolutely freezing. Oh my gosh, it is. Yes, minus ten. So, it's super cold over there. And you can see that, okay, it did give us Fahrenheit. That is because the tool that we're using provided us with Fahrenheit, which is fine, but it did translate that over into an estimate of Celsius for us, which is pretty cool.

So, let's actually output that. So, we get this, which is correct, with the US temperature approximately this, and we also get a description of the conditions: partly cloudy with 0% precipitation, lucky for them, and humidity of 66%. Okay. All pretty cool. So, that is it for this introduction to LangChain agents.

As I mentioned, next chapter we're going to dive much deeper into agents and also implement them for LangChain version 0.3. So, we'll leave this chapter here and jump into the next one. In this chapter, we're going to be taking a deep dive into agents with LangChain, and we're going to be covering what an agent is.

We're going to talk a little bit conceptually about agents, the ReAct agent, and the type of agent that we're going to be building, and based on that knowledge, we are actually going to build out our own agent execution logic, which we refer to as the agent executor. So, in comparison to the previous video on agents in LangChain, which is more of an introduction, this is far more detailed.

We'll be getting into the weeds a lot more with both what agents are and also agents within Langchain. Now, when we talk about agents, a significant part of the agent is actually relatively simple code logic that iteratively runs LLM calls and processes their outputs, potentially running or executing tools.

The exact logic for each approach to building an agent will actually vary pretty significantly, but we'll focus on one of those, which is the ReAct agent. Now, ReAct is a very common pattern and, although it's relatively old now, most of the tool agents that we see used by OpenAI and essentially every LLM company use a very similar pattern.

Now, the ReAct agent follows a pattern like this. Okay, so we would have our user input up here. Okay, so our input here is a question, right? Aside from the Apple Remote, what other device can control the program the Apple Remote was originally designed to interact with? Now, probably most LLMs would actually be able to answer this directly now.

This is from the ReAct paper, which was a few years back. Now, in this scenario, assuming our LLM didn't already know the answer, there are multiple steps an LLM or an agent might take in order to find out the answer. Okay, so the first of those is: we say our question here is, what other device can control the program the

Apple Remote was originally designed to interact with? So the first thing is, okay, what was the program that the Apple Remote was originally designed to interact with? That's the first question we have here. So what we do is: I need to search Apple Remote and find the program it was originally designed to interact with.

This is a reasoning step. So the LLM is reasoning about what it needs to do: I need to search for that and find the program. So we are taking an action. This is a tool call here. Okay, so we're going to use the search tool and our query will be Apple Remote, and the observation is the response we get from executing that tool.

Okay, so the response here will be: the Apple Remote is designed to control the Front Row media center. So now we know the program the Apple Remote was originally designed to interact with. Now we're going to go through another iteration. Okay, so this is one iteration of our reasoning, action, and observation.

So when we're talking about ReAct here, although again this sort of pattern is very common across many agents, the name actually comes from reasoning, the "Re" of reasoning, followed by action. Okay, so that's where the ReAct comes from. So this is one of our ReAct agent loops or iterations.

We're going to go and do another one. So at the next step we have this information; the LLM is now provided with this information. Now we want to do a search for Front Row. Okay, so we do that. This is the reasoning step. We perform the action: search Front Row. Okay, tool: search, query: Front Row, observation.

This is the response: Front Row is controlled by an Apple Remote or keyboard function keys. Alright, cool. So we know keyboard function keys are the other device that we were asking about up here. So now we have all the information we need. We can provide an answer to our user.

So we go through another iteration here, reasoning and action. Our reasoning is: I can now provide the answer of keyboard function keys to the user. Okay, great. So then we use the answer tool, which is like final answer in more common tool agent use, and the answer would be keyboard function keys, which we then output to our user.

Okay, so that is the react loop. Okay, so looking at this. Where are we actually calling an LLM and in what way are we actually calling an LLM? So we have our reasoning step. Our LLM is generating the text here, right? So LLM is generating. Okay. What should I do then?

Our LLM is going to generate the input parameters to our action step here. Those input parameters and the tool being used will be taken by our code logic, our agent executor logic, and they will be used to execute some code, from which we will get an output.

That output might be taken directly to our observation or our LLM might take that output and then generate an observation based on that. It depends on how you've implemented everything. So our LLM could potentially be being used at every single step there and of course that will repeat through every iteration.

So we have further iterations down here. So you're potentially using an LLM multiple times throughout this whole process, which of course in terms of latency and token cost, it does mean that you're going to be paying more for an agent than you are with just a standard LLM, but that is of course expected because you have all of these different things going on.

But the idea is that what you can get out of an agent is of course much better than what you can get out of an LLM alone. So when we're looking at all of this, all of this iterative chain of thought and tool use, all this needs to be controlled by what we call the agent executor, which is our code logic, which is hitting our LLM, processing its outputs, and repeating that process until we get to our answer.

So breaking that part down, what does it actually look like? It looks kind of like this. So we have our user input goes into our LLM, okay, and then we move on to the reasoning and action steps. Is the action the answer? If it is the answer, so as we saw here, where is the answer?

If the action is the answer, so true, we would just go straight to our outputs. Otherwise, we're going to use our selected tool. The agent executor is going to handle all of this. It's going to execute our tool, and from that we get our reasoning, action, and observation inputs and outputs, and then we're feeding all that information back into our LLM, okay?

In which case, we go back through that loop. So we could be looping for a little while until we get to that final output. Okay, so let's go across to the code. We're going to be going into the agent executor notebook. We'll open that up in Colab, and we'll go ahead and just install our prerequisites.

Nothing different here. It's just LangChain, and LangSmith optionally, as before. Again, optionally, the LangChain API key if you do want to use LangSmith. Okay, and then we'll come down to our first section, where we're going to define a few quick tools. I'm not necessarily going to go through these because we've already covered them in the agent introduction, but very quickly: from langchain_core.tools, we're just importing this tool decorator, which transforms each of our functions here into what we would call a structured tool object.

This thing here. Okay, which we can see. Let's just have a quick look here, and then if we want to, we can extract all of the key information from that structured tool using these attributes: name, description, args_schema, and model_json_schema, which give us essentially how the LLM should use our function.

Okay, so I'm going to keep pushing through that. Now, very quickly again, we did cover this in the intro video, so I don't want to necessarily go over it again in too much detail, but our agent executor logic is going to need this part. So we're going to be getting a string from our LLM.

We're going to be loading that into a dictionary object, and we're going to be using that to actually execute our tool as we do here using keyword arguments. Okay, like that. Okay, so with the tools out of the way, let's take a look at how we create our agent.

So when I say agent here, I'm specifically talking about the part that is generating our reasoning step, then generating which tool and what the input parameters to that tool will be. Then the rest of that is not actually covered by the agent. Okay, the rest of that would be covered by the agent execution logic, which would be taking the tool to be used, the parameters, executing the tool, getting the response, aka the observation, and then iterating through that until the LLM is satisfied and we have enough information to answer a question.

So looking at that, our agent will look something like this. It's pretty simple. So we have our input parameters, including the chat history and the user query, and it would actually also have any intermediate steps that have happened in here as well.

We have our prompt template, and then we have our LLM bound with tools. So let's see how all this would look, starting with defining our prompt template. So it's going to look like this. We have our system message: you're a helpful assistant. When answering a user's question, you should first use one of the tools provided.

After using a tool, the tool output will be provided in the scratch pad below, okay, which we're naming here. If you have an answer in the scratch pad, you should not use any more tools and instead answer directly to the user. Okay, so we have that as our system message. We could obviously modify that based on what we're actually doing.

Then following our system message, we're going to have our chat history, so any previous interactions between the user and the AI. Then we have our current message from the user, okay, which will be fed into the input field there. And then following this, we have our agent's scratch pad or the intermediate thoughts.

So this is where things like the LLM deciding, okay, this is what I need to do. This is how I'm going to do it, aka the tool call. And this is the observation. That's where all of that information will be going, right? So each of those you want to pass in as a message, okay?

And the way that will look is that any tool call generation from the LLM, so when the LLM is saying, use this tool, please, that will be an AI message. And then the responses from our tool, so the observations, they will be returned as tool messages. Great. So we'll run that to define our prompt template.

We're going to define our LLM. So we're going to be using gpt-4o-mini with a temperature of zero, because we want less creativity here, particularly when we're doing tool calling. There's just no need for us to use a high temperature here. So we need to enter our OpenAI API key, which we would get from platform.openai.com.

We enter this, then we're going to continue and we're just going to add tools to our LLM here, okay? These, and we're going to bind them here. Then we have tool choice any. So tool choice any, we'll see in a moment, and I'll go through this a little bit more in a second, but that's going to essentially force a tool call.

And you can also put required, which is actually a bit clearer, but I'm using any here, so I'll stick with it. So these are the tools we're going through. We have our inputs into the agent runnable, we have our prompt template, and then that will get fed into our LLM.
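A sketch of that agent runnable, as described; the lambda-based input mapping here is an assumption about how the inputs are wired in:

```python
agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x.get("chat_history", []),
        # default to an empty scratch pad if none is provided
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")  # force a tool call every time
)
```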

So let's run that. Now we would invoke the agent part of everything here with this. Okay, so let's see what it outputs. This is important. So I'm asking: what is 10 + 10? Obviously that should use the addition tool, and we can actually see that happening. So the agent message content is actually empty here.

This is where you'd usually get an answer, but if we go and have a look, we have additional keyword args. In there we have tool calls, and then we have the function arguments. Okay, so we're calling a function, and the arguments for that function are this. Okay, so we can see this is a string.

Again, the way that we would parse that is we do json.loads and that becomes a dictionary, and then we can see which function is being called, and it is the add function, and that is all we need in order to actually execute our function, or our tool. Okay, and we can see a lot more detail here.

Now, what do we do from here? We're going to map the tool name to the tool function and then we're just going to execute the tool function with the generated args, i.e. those. I'll also just point out quickly that here we are getting the dictionary directly, which I think is coming from somewhere else in this, which is here.

Okay, so even that step here where we're parsing this out, we don't necessarily need to do, because I think on the LangChain side they're doing it for us. So we're already getting that, so the json.loads we don't necessarily need here. Okay, so we're just creating this tool name to function mapping dictionary here.

So we're taking the tool names and we're just mapping those back to our tool functions, and this is coming from our tools list. So that tools list that we defined here. Okay, and we can even just see quickly that it will include each of the tools we defined there.

Okay, that's all it is. Now, we're going to execute using our name to tool mapping. Okay, so this here will get us the function, and then to that function we're going to pass the arguments that we generated. Okay. Let's see what it looks like.
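A minimal sketch of that mapping and execution step, assuming out is the AI message returned by invoking the agent:

```python
# map tool name -> underlying function so we can execute what the LLM asked for
name2tool = {t.name: t.func for t in tools}

tool_call = out.tool_calls[0]  # the tool call the LLM generated
tool_output = name2tool[tool_call["name"]](**tool_call["args"])
```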

Alright, so the response, the observation, is twenty. Now, we are going to feed that back into our LLM using the tool message, and we're actually going to put a little bit of text around this to make it a little bit nicer. We don't necessarily need to do this, to be completely honest.

We could just return the answer directly. I don't even think there would really be any difference. So, we could do either. In some cases, that could be very useful. In other cases, like here, it doesn't really make too much difference, particularly because we have this tool call ID, and what this tool call ID is doing is it's being used by OpenAI.

It's being read by the LLM so that the LLM knows that the response we got here is actually mapped back to the tool execution that it's identified here because you see that we have this ID. Alright, we have an ID here. The LLM is going to see the ID.

It's going to see the ID that we pass back in here and it's going to see those two are connected. So, you can see, okay, this is the tool I called and this is a response I got from it. Because of that, you don't necessarily need to say which tool you used here.

You can. It depends on what you're doing. Okay. So, what do we get here? We have, okay, just running everything again. We've added our tool call. So, that's the original AI message that says, okay, use that tool, and then we have the tool execution, the tool message, which is the observation.

We map those to the agent scratch pad, and then what do we get? We have an AI message, but the content is empty again, which is interesting, because we said to our LLM up here: if you have an answer in the scratch pad, you should not use any more tools and instead answer directly to the user.

So, why is our LLM not answering? Well, the reason for that is down here, we specify tool choice equals any, which again, it's the same as tool choice required, which is telling the LLM that it cannot actually answer directly. It has to use a tool and I usually do this, right?

I would usually put tool choice equals any or required and force the LLM to use a tool every single time. So, then the question is, if it has to use a tool every time, how does it answer our user? Well, we'll see in a moment. First, I just want to show you the two options essentially that we have.

The second is what I would usually use but let's start with the first. So, the first option is that we set tool choice equal to auto and this tells the LLM that it can either use a tool or it can answer the user directly using the final answer or using that content field.

So, if we run that, specifying tool choice as auto, and then invoke, okay? Initially, you see, ah, wait, there's still no content. That's because we didn't add anything into the agent scratch pad here. There's no information, right? It's all empty. Actually, sorry, so here, you have the chat history, that's empty.

We didn't specify the agent scratch pad and the reason that we can do that is because we're using, if you look here, we're using get. So, essentially, it's saying, try and get agent scratch pad from this dictionary but if it hasn't been provided, we're just going to give an empty list.

So, that's why we don't need to specify it here. But that means that, oh, okay, the agent doesn't actually know anything here. It hasn't used the tool yet. So, we're going to just go through our iteration again, right? So, we're going to get our tool output. We're going to use that to create the tool message and then we're going to add our tool call from the AI and the observation.

We're going to pass those to the agent scratch pad and this time, we'll see. We run that. Okay, now we get the content, okay? So, now it's not calling a tool. You see here, there's no tool call or anything going on. We just get content. So, this is a standard way of building a tool calling agent.

The other option, which I mentioned, is what I usually go with. So, number two here, I would usually create a final answer tool. So, why would we even do that? Why would we create a final answer tool when this first method, you know, works perfectly well? Why would we not just use it? There are a few reasons.

So, why would we not just use this? There are a few reasons. The main ones are that with option two where we're forcing tool calling, this removes possibility of an agent using that content field directly and the reason, at least, the reason I found this good when building agents in the past is that occasionally, when you do want to use a tool, it's actually going to go with the content field and it can get quite annoying and use the content field quite frequently when you actually do want it to be using one of the tools and this is particularly noticeable with smaller models.

With bigger models, it's not as common, although it does still happen. Now, the second thing that I quite like about using a tool as your final answer is that you can enforce a structured output in your answer. So, this is something we saw in, I think, the first LangChain example, where we were using the structured output feature of LangChain, and what that actually is, the structured output feature of LangChain, is just a tool call, right?

So, it's forcing a tool call from your LLM. It's just abstracted away so you don't realize that that's what it's doing but that is what it's doing. So, I find that structured outputs are very useful particularly when you have a lot of code around your agent. So, when that output needs to go downstream into some logic, that can be very useful because you can, you have a reliable output format that you know is going to be output and it's also incredibly useful if you have multiple outputs or multiple fields that you need to generate for.

So, those can be very useful. Now, to implement option two, we need to create a final answer tool. As with our other tools, we're actually going to provide a description, and you can or you cannot do this. So, you can also just return None and use the generated action as essentially what you're going to send out of your agent execution logic, or you can actually just execute the tool and pass that information directly through.

Perhaps, in some cases, you might have some additional post processing for your final answer. Maybe you do some checks to make sure it hasn't said anything weird. You could add that in this tool here but yeah, in this case, we're just going to pass those through directly. So, let's run this.
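A sketch of that final answer tool, with the extra tools_used field described below; the field names follow the description, but treat the exact signature as an assumption:

```python
from langchain_core.tools import tool

@tool
def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Use this tool to provide your final answer to the user.
    Also report which tools you used via 'tools_used'."""
    # we just pass things through; any post-processing or checks could go here
    return {"answer": answer, "tools_used": tools_used}
```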

We've added, where are we? Final answer. We've added the final answer tool to our name-to-tool mapping, so our agent can now use it. We redefine our agent, setting tool choice to any, because we're forcing the tool choice here, and let's go with: what is ten plus ten? See what happens.

Okay, we get this, right? One nice thing here is that we don't need to check whether our output is in the content field or in the tool calls field. We know it's going to be in the tool calls field because we're forcing that tool use, which is quite nice.

So, okay, we know we're using the add tool and these are the arguments. Great. We go through that process again. We're going to create our tool message and then we're going to add those messages into our scratch pad, or intermediate steps, and then we can see again, ah, okay, the content field is empty.

That is expected. We're forcing tool use. There's no way that this can have anything inside it, but then if we come down here to our tool calls: nice. Final answer, answer, ten plus ten equals twenty. Alright? We also have this: tools used. Where is tools used coming from?

Okay, well, I mentioned before that you can add additional things or outputs when you're using a tool for your final answer. So, if you just come up here, you can see that I asked the LLM to use that tools used field, which I defined here. It's a list of strings.

Use this to tell me what tools you use in your answer, right? So, I'm getting the normal answer but I'm also getting this information as well which is kind of nice. So, that's where that is coming from. See that? Okay. So, we have our actual answer here and then we just have some additional information, okay?

We've also defined a type here. It's just a list of strings, which is really nice. It's giving us a lot of control over what we're outputting, which is perfect. When you're building with agents, the biggest problem in most cases is control of your LLM. So, here, we're getting an honestly pretty unbelievable amount of control over what our LLM is going to be doing, which is perfect for when you're building in the real world.

So, this is everything that we need. This is our answer and we would of course be passing that downstream into whatever logic our AI application would be using, okay? So, maybe that goes directly to a front end and we're displaying this as our answer and we're maybe providing some information about, okay, where did this answer come from or maybe there's some additional steps downstream where we're actually doing some more processing or transformations but yeah, we have that.

That's great. Now, everything we've just done here, we've been executing everything one by one and that's to help us understand what process we go through when we're building an agent executor. But we're not going to want to do that all the time, are we? Most of the time, we probably want to abstract all this away and that's what we're going to do now.

So, we're going to build essentially everything we've just taken. We're going to abstract that and abstract it away into a custom agent executor class. So, let's have a quick look at what we're doing here. Although it's literally just what we just did, okay? So, custom agent executor. We initialize it.

We set this max iterations; I'll talk about this in a moment. When we initialize it, that is going to set our chat history to just being empty. Okay, good. So, it's a new agent; there should be no chat history in this case. Then we actually define our agent, right? So, that part of the logic that is going to be taking our inputs and generating what to do next, aka what tool call to make, okay?

And we set everything as attributes of our class and then we're going to define an invoke method. This invoke method is going to take an input which is just a string. So, it's going to be our message from the user and what it's going to do is it's going to iterate through essentially everything we just did, okay?

Until we hit the final answer tool, okay? So, what does that mean? We have our tool call, right? We're just invoking our agent, right? So, it's going to generate what tool to use and what parameters should go into that, okay? And that's an AI message.

So, we would append that to our agent scratch pad, and then we're going to use the information from our tool call. So, the name of the tool and the args and also the ID. We're going to use all of that information to execute our tool and then provide the observation back to our LLM, okay?

So, we execute our tool here. We then format the tool output into a tool message. See here that I'm just using the output directly. I'm not adding that additional information there. We do need to always pass in the tool call ID so that our LLM knows which output is mapped to which tool.

I didn't mention this before in this video at least but that is that's important when we have multiple tool calls happening in parallel because that can happen. When we have multiple tool calls happening in parallel, let's say we have ten tool calls, all those responses might come back at different times.

So, then the order of those can get messed up. So, we wouldn't necessarily always see that it's a AI message beginning a tool call followed by the answer to that tool call. Instead, it might be AI message followed by like ten different tool call responses. So, you need to have those IDs in there, okay?

So, then we pass our tool output back to our agent scratch pad, or intermediate steps. I'm adding a print in here so that we can see what's happening whilst everything is running. Then we increment this count number. We'll talk about that in a moment. So, coming past that, we say, okay, if the tool name here is final answer, that means we should stop, okay?

So, once we get the final answer, that means we can actually extract our final answer from the final tool call, okay? And in this case, I'm going to say that we're going to extract the answer from the tool call or the observation. We're going to extract the answer that was generated.

We're going to pass that into our chat history. So, we're going to have our user message. This is the one the user came up with followed by our answer which is just the natural answer field and that's simply an AI message. But then we're actually going to be including all of the information.

So, this is the answer, the natural language answer, and also the tools used output. We're going to be feeding all of that out to some downstream process as preferred. So, we have that. Now, one thing that can happen if we're not careful is that our agent executor may run many, many times, particularly if we've done something wrong in our logic. Because we're building these things, it can happen that maybe we've not connected the observation back up into our agent executor logic, and in that case, what we might see is our agent executor running again and again and again. And, I mean, that's fine.

We're going to stop it but if we don't realize straight away and we're doing a lot of LLM calls that can get quite expensive quite quickly. So, what we can do is we can set a limit, right? So, that's what we've done up here with this max iterations. We said, okay, if we go past three max iterations by default, I'm going to say stop, alright?

So, that's why we have the count here. While count is less than the max iterations, we're going to keep going. Once we hit the number of max iterations, we stop, okay? So, the while loop will just stop looping, okay? So, it just protects us in case of that and it also potentially maybe at some point, your agent might be doing too much to answer a question.

So, this will force it to stop and just provide an answer. Although, if that does happen, I just realized there's a bit of a fault in the logic here: if that does happen, we wouldn't necessarily have the answer here, right? So, we'd probably want to handle that nicely, but in this scenario, it's a very simple use case. We're not going to see that happening.
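Pulling all of that together, a sketch of the custom executor might look roughly like this. It reuses the agent runnable and the name2tool mapping from earlier in the chapter, and, as just noted, the max-iterations exit isn't handled gracefully:

```python
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

class CustomAgentExecutor:
    def __init__(self, max_iterations: int = 3):
        self.chat_history: list = []
        self.max_iterations = max_iterations
        # the runnable defined earlier: inputs | prompt | llm.bind_tools(..., tool_choice="any")
        self.agent = agent

    def invoke(self, input: str) -> dict:
        count = 0
        agent_scratchpad = []
        while count < self.max_iterations:
            # the agent generates the next tool call as an AI message
            out = self.agent.invoke({
                "input": input,
                "chat_history": self.chat_history,
                "agent_scratchpad": agent_scratchpad,
            })
            agent_scratchpad.append(out)
            # execute the requested tool with the generated args
            tool_call = out.tool_calls[0]
            tool_name = tool_call["name"]
            tool_args = tool_call["args"]
            tool_out = name2tool[tool_name](**tool_args)
            # feed the observation back; the tool_call_id links it to the request
            agent_scratchpad.append(
                ToolMessage(content=str(tool_out), tool_call_id=tool_call["id"])
            )
            print(f"{count}: {tool_name}({tool_args})")
            count += 1
            if tool_name == "final_answer":
                break
        # store the natural-language answer in chat history and return everything
        answer = tool_args["answer"]
        self.chat_history.extend([
            HumanMessage(content=input),
            AIMessage(content=answer),
        ])
        return tool_args
```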

We're not going to see that happening. So, we initialize our custom agent executor and then we invoke it, okay? And let's see what happens. Alright, there we go. So, that just wrapped everything into a single invoke. So, everything is handled for us. We could say, okay, what is ten?

You know, we can modify that and say 7.4 for example and that will go through. We'll use the multiply tool instead and then we'll come back to the final answer again, okay? So, we can see that with this custom agent executor, we've built an agent and we have a lot more control over everything that is going on in here.

One thing that we would probably need to add in this scenario is right now, I'm assuming that only one tool call will happen at once and it's also why I'm asking here. I'm not asking a complicated question because I don't want it to go and try and execute multiple tool calls at once which can happen.

So, let's just try this. Okay. So, this is actually completely fine. So, this did just execute it one after the other. So, you can see that when asking this more complicated question, it first did the exponentiate tool followed by the add tool and then it actually gave us our final answer which is cool.

It also told us we used both of those tools, which it did. But one thing that we should just be aware of is that OpenAI can actually execute multiple tool calls in parallel. So, by only ever taking index zero here, we're actually assuming that we're only ever going to be calling one tool at any one time, which is not always going to be the case.

So, you'd probably need to add a little bit of extra logic there in case of scenarios if you're building an agent that is likely to be running parallel tool calls. But yeah, you can see here actually it's completely fine. So, it's running one after the other. Okay. So, with that, we built our agent executor.

I know there's a lot to that and, of course, you can just use the very abstract AgentExecutor in LangChain, but I think it's very good to understand what is actually going on, to build our own agent executor as we did in this case, and it sets you up nicely for building more complicated or use case specific agent logic as well.

So, that is it for this chapter. In this chapter, we're going to be taking a look at LangChain Expression Language. We'll be looking at the runnables, the serializable and parallel variants of those, the runnable passthrough, and essentially how we use LCEL to its full capacity. Now, to do that well, what I want to do is actually start by looking at the traditional approach to building chains in LangChain.

So, to do that, we're going to go over to the LCEL chapter and open that up in Colab. Okay. So, let's come down. We'll do the prerequisites. As before, nothing major in here. The one thing that is new is docarray, because later on, as you'll see, we're going to be using this as an example of the parallel capabilities in LCEL.

If you want to use LangSmith, you just need to add in your LangChain API key. Okay. So, now, let's dive into the traditional approach to chains in LangChain. So, the LLMChain, I think, is probably one of the first things introduced in LangChain, if I'm not wrong.

This takes a prompt and feeds it into an LLM and that's it. You can also, you can add like output parsing to that as well but that's optional. I don't think we're going to cover it here. So, what that might look like is we have, for example, this prompt template here.

Give me a small report on topic. Okay. So, that would be our prompt template. We'd set up as we usually do with the prompt templates as we've seen before. We then define our LLM. We need our API key for this which as usual, we would get from platform.openai.com. Then, we go ahead.

I'm just showing you that you can invoke the LLM there. Then, we go ahead and actually define an output parser. So, we do do this; I wasn't sure we did. We then define our LLMChain like this. Okay. So, LLMChain: we pass in our prompt, our LLM, and our output parser.
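Something like this, roughly; the model and prompt wording here follow the description above:

```python
from langchain.chains import LLMChain
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Give me a small report on {topic}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
output_parser = StrOutputParser()

# the traditional (now deprecated) LLMChain approach
chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)
result = chain.invoke({"topic": "retrieval augmented generation"})
```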

Okay. This is the traditional approach. So, I would then say, okay, retrieval augmented generation, and what it's going to do is give me a little report back on RAG. Okay. It takes a moment, but you can see that that's what we get here. We can format that nicely, as we usually do, and we get, okay, look, a nice little report.

However, the LLMChain is, one, quite restrictive, right? We have to have particular parameters that have been predefined as being usable, which is, you know, restrictive, and it's also been deprecated. So, this isn't the standard way of doing this anymore, but we can still use it.

However, the preferred method for building this, and building anything else really, or chains in general in LangChain, is using LCEL, right? And it's super simple, right? So, we just actually take the prompt, LLM, and output parser that we had before, and then we just chain them together with these pipe operators.

So, the pipe operator here is saying, take what is output from here and input it into here. Take what is output from here and put it into here. That's all it does. It's super simple. So, put those together and we invoke it in the same way and we'll get the same output, okay?
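In other words, the LCEL equivalent is something like:

```python
# the LCEL version of the same chain: the output of each step pipes into the next
lcel_chain = prompt | llm | output_parser
result = lcel_chain.invoke({"topic": "retrieval augmented generation"})
```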

And that's what we get. There is actually a slight difference on what we're getting out from there. You can see here we got actually a dictionary but that is pretty much the same, okay? So, we get that and as before, we can display that in Markdown with this, okay?

So, we saw just now that we have this pipe operator here. It's not really standard Python syntax to use this, or at least it's definitely not common. It's an aberration of the intended use of Python, I think. But anyway, it looks cool and, when you understand it, I kinda get why they do it, because it does make things quite simple in comparison to what it could be otherwise.

So, I kinda get it. It's a little bit weird, but it's what they're doing, so that's what we're going to learn. So, what is that pipe operator actually doing? Well, as I mentioned, it's taking the output from this and putting it as input into whatever is on the right. But how does that actually work?

Well, let's actually implement it ourselves without LangChain. So, we're going to create this class called Runnable. This class, when we initialize it, is going to take a function, okay? So, this is literally a Python function. It's going to take that and it's going to essentially turn it into what we would call a Runnable in LangChain. And what does that actually mean?

Well, it doesn't really mean anything. It just means that when you run the invoke method on it, it's going to call that function in the way that you would have done otherwise, alright? So, using just the function, you know, brackets open, parameters, brackets close. It's going to do that, but it's also going to add this method, this __or__ method.

Now, this __or__ method, in typical Python syntax, is essentially going to take your Runnable function, the one that you initialized with, and it's also going to take another function, okay? This other function is actually going to be a Runnable, I believe. Yes, it's going to be a Runnable just like this, and what it's going to do is run that other Runnable on the output of your current Runnable, okay?

That's what this __or__ method is going to do. Seems a bit weird maybe, but I'll explain in a moment. We'll see why that works. So, I'm going to chain a few functions together using this __or__ method. First, we're just going to turn them all into Runnables, okay? These are normal functions as you can see, normal Python functions.

We then turn them into this Runnable using our Runnable class. Then, look what we can do, right? So, we're going to create a chain that is going to be our Runnable chained with another Runnable chained with another Runnable, okay? Let's see what happens. So, we're going to invoke that chain of Runnables with three.

So, what is this going to do? Okay, we start with three. We're going to add five to three, so we'll get eight. Then, we're going to subtract five from eight to give us three again, and then we're going to multiply three by five to give us fifteen. And we can invoke that and we get fifteen, okay?

Pretty cool. So, that is interesting. How does that relate to the pipe operator? Well, the pipe operator in Python is actually a shortcut for the __or__ method. So, what we just implemented is the pipe operator. So, we can actually run that now with the pipe operator here and we'll get the same.

We'll get fifteen, right? So, that's what LangChain is doing. Under the hood, that is what the pipe operator is. It's just chaining together these multiple Runnables, as we'd call them, using their own internal __or__ operator, okay? Which is cool. I will give them that. It's kind of a cool way of doing this.
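To make that concrete, here's a minimal sketch of the idea, not LangChain's actual implementation, and the course notebook's exact code may differ slightly: a class that wraps a function, gives it an invoke method, and implements __or__ so the pipe operator can chain Runnables together.

```python
# Minimal sketch: wrap a function so it has .invoke() and supports
# chaining via __or__, which Python's | operator maps to.
class Runnable:
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Return a new Runnable that feeds our output into `other`
        def chained(*args, **kwargs):
            return other.invoke(self.invoke(*args, **kwargs))
        return Runnable(chained)

    def invoke(self, *args, **kwargs):
        return self.func(*args, **kwargs)


def add_five(x):
    return x + 5

def sub_five(x):
    return x - 5

def mul_five(x):
    return x * 5

add_five = Runnable(add_five)
sub_five = Runnable(sub_five)
mul_five = Runnable(mul_five)

# Chain with the __or__ method directly...
chain = add_five.__or__(sub_five).__or__(mul_five)
print(chain.invoke(3))  # 3 + 5 = 8, 8 - 5 = 3, 3 * 5 = 15

# ...or equivalently with the pipe operator, which calls __or__ under the hood
chain = add_five | sub_five | mul_five
print(chain.invoke(3))  # 15
```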

It's creative. I wouldn't have thought of it myself. So, yeah, that is the pipe operator. Then, we have these Runnable objects from LangChain, okay? These are different to the Runnable I just defined here. That one we defined ourselves. It's not a LangChain thing. We didn't get it from LangChain.

Instead, this RunnableLambda object here, that is actually exactly the same as what we just defined, alright? So, what we did here with our own Runnable, this RunnableLambda is the same thing but in LangChain, okay? So, we use that to now define three Runnables from the functions that we defined earlier.

We can actually chain those together now using the pipe operator. You could also chain them together, if you want, with the __or__ method, right? So, we could do what we did earlier and invoke that, okay? Or, as we were doing originally, we use the pipe operator. Exactly the same.
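As a rough sketch of what that looks like with LangChain's own RunnableLambda (reusing the arithmetic functions from the example above):

```python
from langchain_core.runnables import RunnableLambda

# Wrap plain Python functions in LangChain's RunnableLambda
add_five = RunnableLambda(lambda x: x + 5)
sub_five = RunnableLambda(lambda x: x - 5)
mul_five = RunnableLambda(lambda x: x * 5)

# Chain them with the pipe operator and invoke as before
chain = add_five | sub_five | mul_five
print(chain.invoke(3))  # -> 15
```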

So, this RunnableLambda from LangChain is just what we built with our own Runnable. Cool. So, we have that. Now, let's try and do something a little more interesting. We're going to generate a report and we're going to try and edit that report using this functionality, okay?

So, give me a small report about some topic, okay? We'll go through here. We're going to get our report on AI, okay? So, we have this. You can see that AI is mentioned many times in here. Then, we're going to take a very simple function, right? This extract facts function.

This is basically going to take, what is it, the first split of the text. Okay. So, we're actually trying to remove the introduction here. I'm not sure this will actually work as expected, but it's fine, we'll try it anyway. Then, more importantly, we're going to replace this word, okay?

So, we're going to replace an old word with a new word. Our old word is going to be AI. Our new word is going to be Skynet, okay? So, we can wrap both of these functions as RunnableLambdas, okay? We can add those as additional steps inside our entire chain, alright?

So, we're going to extract, or try to remove, the introduction, although I think it needs a bit more processing than just splitting here, and then we're going to replace the word. We need that to actually be AI. Run that, run this. Okay. So, now we get "Artificial Intelligence (Skynet) refers to the simulation of human intelligence processes by machines" and then we have narrow Skynet, weak Skynet, and strong Skynet.

Applications of Skynet. Skynet technology is being applied in numerous fields including all these things. Scary. Despite its potential, Skynet poses several challenges. Systems can perpetuate existing biases. It raises significant privacy concerns. It can be exploited for malicious purposes, okay? So, we have all of that; it's just a silly little example.

We can also see the introduction removal didn't work here. The reason for that is that our introduction includes multiple new lines here. So, if I want to remove the introduction, we should actually remove it from here, I think. I would never actually recommend you do that, because it's not very flexible.

It's not very robust, but it's just so I can show you that it is actually working. So, this extract facts runnable, right? Now we're essentially just removing the introduction. Why would we want to do that? I don't know, but it's there just so you can see that we can have multiple of these runnable operations running, and they can be whatever you want them to be.

Okay, it is worth knowing that the inputs to our functions here were all single arguments, okay? If you have a function that accepts multiple arguments, there are multiple ways you can handle that. The way I would probably do it is to write your function so that it accepts those multiple values through a single argument.

So, just a single x, which would be a dictionary or something, and then you unpack the values within the function and use them as needed, as in the sketch below. That's just one way you can do it.
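Here's a small illustrative sketch of that pattern; the function and key names are hypothetical, not from the course notebook:

```python
from langchain_core.runnables import RunnableLambda

# One way to handle "multiple arguments": accept a single dict and unpack
# the values inside the function.
def replace_word(inputs: dict) -> str:
    text = inputs["text"]
    return text.replace(inputs["old_word"], inputs["new_word"])

replace_runnable = RunnableLambda(replace_word)

replace_runnable.invoke({
    "text": "AI is everywhere",
    "old_word": "AI",
    "new_word": "Skynet",
})  # -> "Skynet is everywhere"
```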

Now, we also have these other runnable objects that we can use: RunnableParallel and RunnablePassthrough. They're kind of self-explanatory to some degree, but let me just go through them. RunnableParallel allows you to run multiple runnable instances in parallel. RunnablePassthrough, which may be less self-explanatory, allows us to pass a variable through to the next runnable without modifying it, okay? So, let's see how they would work.

So, we're going to come down here and we're going to set up two small document stores, essentially two sources of information, and we're going to need our LLM to pull information from both of these sources in parallel, which is going to look like this. So, we have these two sources of information, vector store A and vector store B.

These are both going to be fed in as context into our prompt. Then, our LLM is going to use all of that to answer the question. Okay. So, to actually implement that, we need an embedding model. So, we use OpenAI embeddings.

We have our vector store A and vector store B. They're not, you know, full-on vector databases here. We're just passing in a very small amount of information to both. So, we're saying, okay, we're going to create an in-memory vector store using these two bits of information. So, we say half the information is here; this one would be an irrelevant piece of information.

Then, we have the relevant information, which is that DeepSeek V3 was released in December 2024. Okay. Then, we're going to have some other information in our other vector store. Again, an irrelevant piece here and a relevant piece here. Okay. The DeepSeek V3 LLM is a mixture of experts model with 671 billion parameters at its largest.

Okay. So, based on that, we're also going to build this prompt string. So, we're going to pass both of those contexts into our prompt. Now, I'm going to ask a question. We don't actually need that bit and, actually, we don't even need that bit. What am I doing?

So, we just need this. So, we have both of the contexts and we run them through our prompt template. Okay. So, we have our system prompt template, which is this, and then our question is going to go in here as a user message.

Cool. So, we have that and then, let me make this easier to read, we're going to convert both of those to retrievers, which just means we can retrieve stuff from them, and we're going to use this RunnableParallel to run both of them in parallel, right? So, these are both being run in parallel, but we're also passing our question alongside them, because it needs to be passed through this component without us modifying anything.

So, when we look at this here, it's almost like, okay, this section here would be our RunnableParallel and these are being run in parallel, but also our query is being passed through. So, it's almost like there's another line there, which is our RunnablePassthrough, okay? So, that's what we're doing here.

These are running in parallel. One of them is a passthrough. I need to run here. I just realized we're using the deprecated embeddings import here. Just switch it to this one from langchain-openai. We run that, run this, run that, and now this is set up, okay? So, we then put together our initial step.

So, this, using our RunnableParallel and RunnablePassthrough, is our initial step. We then have our prompt, and these get chained together with the usual pipe operator, okay? And now, we're going to invoke a question: what architecture does the DeepSeek model released in December use, okay?

So, for the LLM to answer this question, it's going to need the information about which DeepSeek model was released in December, which we have specified in one half here, and then it also needs to know what architecture that model uses, which is defined in the other half over here, okay?

So, let's run this, okay? There we go. The DeepSeek V3 model released in December 2024 is a mixture of experts model with 671 billion parameters, okay? So, a mixture of experts and this many parameters. Pretty cool. So, we've put together our pipeline using LCEL, using the pipe operator and the runnables; specifically, we've looked at RunnableParallel, RunnablePassthrough, and also RunnableLambda.
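Pulling the pieces together, here's a rough sketch of how that parallel retrieval chain might look; the exact texts, prompt wording, and model name are assumptions on my part rather than the notebook's exact code:

```python
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Two tiny "vector stores", each holding one irrelevant and one relevant fact
vecstore_a = InMemoryVectorStore.from_texts(
    ["half the information is here",
     "DeepSeek V3 was released in December 2024"],
    embedding=embeddings,
)
vecstore_b = InMemoryVectorStore.from_texts(
    ["the other half of the information is here",
     "The DeepSeek V3 LLM is a mixture of experts model with 671B parameters"],
    embedding=embeddings,
)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Answer the question using the contexts below.\n\n"
     "Context A: {context_a}\n\nContext B: {context_b}"),
    ("user", "{question}"),
])

# Both retrievers run in parallel; the question is passed through untouched
retrieval = RunnableParallel({
    "context_a": vecstore_a.as_retriever(),
    "context_b": vecstore_b.as_retriever(),
    "question": RunnablePassthrough(),
})

chain = retrieval | prompt | ChatOpenAI(model="gpt-4o-mini") | StrOutputParser()
chain.invoke("What architecture does the DeepSeek model released in December use?")
```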

So, that's it for this chapter on LCEL and we'll move on to the next one. In this chapter, we're going to cover streaming and async in LangChain. Now, both using async code and using streaming are incredibly important components of, I think, almost any conversational chat interface, or at least any good conversational chat interface.

For async, if your application is not async and you're spending a load of time in your API or wherever else waiting for LLM calls, because a lot of those are behind APIs, then your application is doing nothing while it waits because you've written synchronous code, and, well, there are many problems with that.

Mainly, it doesn't scale. So, async code generally performs much better, especially for AI where a lot of the time we're waiting for API calls. So, async is incredibly important for that. Now, streaming is a slightly different thing. So, let's say I ask it to tell me a story, okay?

I'm using GPT-4 here. It's a bit slower, so we can actually see the stream. We can see that, token by token, this text is being produced and sent to us. Now, this is not just a visual thing. When the LLM is generating tokens or words, it is generating them one by one, because these LLMs literally generate tokens one by one.

So, they're looking at all of the previous tokens in order to generate the next one and then generate next one, generate next one. Now, that's how they work. So, when we are implementing streaming, we're getting that feed of tokens directly from the LLM through to our, you know, our back end or our front end.

That is what we see when we see that token by token interface, right? So, that's one thing. One other thing that I can do, let me switch across to GPT-4o, is I can say, okay, we just got this story, and I'm going to ask, are there any standard storytelling techniques followed or used above?

Please use search. Okay. So, look, we saw very briefly there that it was searching the web. We told the LLM to use the search tool, and the LLM then output some tokens saying that it's going to use the search tool, and it also would have output the tokens saying what that search query would be, although we didn't see them there.

But what the ChatGPT interface is doing there is, it received those tokens saying, hey, I'm going to use the search tool, and it doesn't just send us those tokens like it does with the standard tokens here. Instead, it used those tokens to show us that little "searching the web" text box.

So, streaming is not just the streaming of these direct tokens. It's also the streaming of these intermediate steps that the LLM may be thinking through which is particularly important when it comes to agents and agentic interfaces. So, it's also a feature thing, right? Streaming doesn't just look nice. It's also a feature.

Then, finally, of course, when we're looking at this, okay, let's say we go back to GPT-4 and I say, okay, use all of this information to generate a long story for me, right? And, okay, we are getting the first token now. So, we know something is happening. We need to start reading.

Now, imagine if we were not streaming anything here and we're just waiting, right? We're still waiting now. We're still waiting and we wouldn't see anything. We're just like, oh, it's just blank or maybe there's a little loading spinner. So, we'd still be waiting and even now, we're still waiting, right?

This is an extreme example but can you imagine just waiting for so long and not seeing anything as a user, right? Now, just now, we would have got our answer if we were not streaming. I mean, that would be painful as a user. You'd not want to wait especially in a chat interface.

You don't want to wait that long. It's okay with, for example, deep research, which takes a long time to process, but there you know it's going to take a long time and it's a different use case, right? You're getting a report. This is a chat interface and, yes, most messages are not going to take that long to generate.

We're also probably not going to be using GPT-4 depending on, I don't know, maybe some people still do but in some scenarios, it's painful to need to wait that long, okay? And it's also the same for agents. It's nice when you're using agents to get an update on, okay, we're using this tool.

It's using this tool. This is how it's using them. Perplexity, for example, have a very nice example of this. So, okay, what's this? OpenAI co-founder joins Murati's startup. Let's see, right. So, we see this is really nice. We're using Pro Search. It's searching for news, showing us the results. We're getting all this information as we're waiting, which is really cool, and it helps us understand what is actually happening, right?

It's not needed in all use cases but it's super nice to have those intermediate steps, right? So, then we're not waiting and I think this bit probably also streamed but it was just super fast. So, I didn't see it but that's pretty cool. So, streaming is pretty important. Let's dive into our example.

Okay, we'll open that in Colab and off we go. So, starting with the prerequisites, same as always, LangChain, optionally LangSmith. We'll also enter our LangChain API key if you'd like to use LangSmith. We'll also enter our OpenAI API key. So, that is platform.openai.com and then as usual, we can just invoke our LLM, right?

So, we have that. It's working. Now, let's see how we would stream with astream, okay? So, stream is actually a method as well, and we could use that, but it's not async, right? Whenever we see a method in LangChain that has an "a" prefix on what would be another method, that's the async version of that method.

So, we can actually stream using async super easily using just llm.astream, okay? Now, this is just an example and, to be completely honest, you probably will not be able to use this directly in an actual application, but we're going to see how we would stream asynchronously in an application further down in this notebook.

So, starting with this, you can see here that we're getting these tokens, right? We're just appending them to this tokens list here. We don't strictly need to do that, but we will use it a little later, so it's fine. So, we're just appending the tokens as they come back from our LLM.

We'll see what that is in a moment, and then I'm just printing the token content, right? So, the content of the token. In this case, that would be this first token, then this next one, and so on and so on. You can see that for the most part it tends to be word level, but it can also be sub-word level, as you see here where a single word gets split across tokens.

So, you know, they get broken up in various ways. Then, we're adding this pipe character onto the end here so we can see where our individual tokens are. Then, we also have flush. You can actually turn this off and it's still going to stream, you're still going to see everything, but it's going to come through a bit more in chunks.

You can see it comes through kind of bit by bit. When we use flush, it forces the console to update what is being shown to us immediately, alright? So, we get a much smoother stream when we're looking at this, much smoother versus when flush is not set to true.
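As a minimal sketch of that streaming loop (notebook-style, since top-level async for needs a running event loop, and assuming llm is the chat model initialized earlier):

```python
# Collect chunks and print token-by-token; flush=True forces the console
# to update immediately so the stream looks smooth.
tokens = []

async for chunk in llm.astream("Tell me about NLP"):
    tokens.append(chunk)
    print(chunk.content, end="|", flush=True)
```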

So, yeah, when you're printing, that is good to do just so you can see it clearly. You don't necessarily need to. Okay. Now, we added all those tokens to the tokens list, so we can have a look at each individual object that was returned to us, right? This is interesting. So, you see that we have the AIMessageChunk, right?

That's an object, and then you have the content. The first one's actually empty. The second one has that N for NLP and, yeah, that's all we really need to know. They're very simple objects, but they're actually quite useful, because just look at this, right? We can add our AIMessageChunks together, right?

Let's see what that does. It doesn't create a list. It creates this, right? So, we still just have one AIMessageChunk, but it has combined the content within those AIMessageChunks, which is kind of cool, right? So, for example, we could remove these, right? And then we just see NLP.

So, it's a nice little feature there. I actually quite like that. But you do need to be a little bit careful, because obviously you can do that in the wrong order and you're going to get, I don't know what that is, some weird token salad.
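For example, a minimal way to merge the whole list back into a single message, assuming the chunks are still in order:

```python
from functools import reduce
from operator import add

# AIMessageChunk supports "+": adding chunks merges their content
# (and other fields) rather than creating a list.
merged = reduce(add, tokens)
print(merged.content)  # the full streamed response as one string
```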

So, yeah, you just need to make sure you are merging those in the correct order, unless, I don't know, unless you're doing something weird. Okay, cool. So, that was streaming from an LLM. Let's have a look at streaming with agents. It gets a bit more complicated, to be completely honest.

But things are going to get a bit more complicated so that we can implement this in, for example, an API, right? That's kind of a necessary thing in any case. So, very quickly, we're going to construct our agent executor like we did in the agent executor chapter.

And for that, for the agent executor, we're going to need tools, a chat prompt template, the LLM, the agent, and the agent executor itself, okay? Very quickly, I'm not going to go through these in detail. We just define our tools. We have add, multiply, exponentiate, subtract, and we define our final answer tool. Merge those into a single list of tools.

Then, we have our prompt template. Again, same as before, we just have a system message, we have chat history, we have a query, and then we have the agent scratchpad for those intermediate steps. Then, we define our agent using LCEL. LCEL works quite well with both streaming and async, by the way.

It supports both out of the box, which is nice. So, we define our agent. Then, coming down here, we're going to create the agent executor. This is the same as before, right? There's nothing new in here, I don't think. We just initialize our agent there and then, yeah, we're looping through.

Yeah, nothing new there. So, we're invoking our agent and seeing if there's a tool call. We could shift this to before or after, it doesn't actually matter that much. So, we're checking if it's the final answer. If not, we continue, execute our tools, and so on.

Okay, cool. So, then, we can invoke that. Okay, we go, what is 10 plus 10? There we go, right? So, we have our agent executor, it is working. Now, when we are running our agent executor, with every new query, if we're putting this into an API, we're probably going to need to provide it with a fresh callback handler.

Okay, so, the callback handler is what's going to handle taking the tokens that are being generated by our LLM or agent and giving them to some other piece of code, like, for example, the streaming response for an API. Our callback handler is going to put those tokens in a queue, in our case, and then, for example, the streaming object is going to pick them up from the queue and put them wherever they need to be.

So, to allow us to do that with every new query, rather than us needing to initialize everything when we actually initialize our agent, we can add a configurable field to our LLM, okay? So, we set the configurable fields here. Oh, also, one thing is that we set streaming equal to true. That's a very minor thing, but just so you see it there, we do do that.

So, we add some configurable fields to our LLM, which means we can basically pass an object in for these on every new invocation. So, we set our configurable field, it's going to be called callbacks, and we just add a description, right? Nothing more to it. This will now allow us to provide that field when we're invoking our agent, okay?
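A rough sketch of that setup might look like this; the model name and description text are assumptions on my part, but configurable_fields and ConfigurableField are the LangChain pieces being described:

```python
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

# streaming=True so tokens are emitted as they're generated; the callbacks
# field is made configurable so we can pass a fresh handler per invocation.
llm = ChatOpenAI(model="gpt-4o-mini", streaming=True).configurable_fields(
    callbacks=ConfigurableField(
        id="callbacks",
        name="callbacks",
        description="A list of callbacks to use for streaming",
    )
)
```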

Now, we need to define our callback handler, and as I mentioned, what's basically going to be happening is that this callback handler is going to be passing tokens into our asyncio queue object, and then we're going to be picking them up from the queue elsewhere, okay? So, we can call it a queue callback handler, okay?

And that inherits from the AsyncCallbackHandler, because we want all of this to be done asynchronously; we're thinking here about how we implement all this stuff within APIs and actual real-world code, and we do want to be doing all of this in async. So, let me execute that, and I'll just explain a little bit of what we're looking at.

So, we have the initialization, right? There's nothing specific here. What we really want to be doing is setting our queue object, assigning that to the class attributes, and then there's also this final_answer_seen attribute, which we're setting to false. What we're going to be using that for is: our LLM will be streaming tokens to us whilst it's doing its tool calling, and we might not want to display those immediately, or we might want to display them in a different way.

So, by keeping this final_answer_seen as false whilst our LLM is outputting those tool tokens, we can handle them in a different way, and then as soon as we see that it's done with the tool calls and it's onto the final answer, which is actually another tool call, we can set this to true, and then we can start processing our tokens in a different way, essentially.

So, we have that. Then, we have this __aiter__ method. This is what makes the object an async iterator. What it's going to be doing is iterating through, right? It's a generator. It's going to iterate through and say, okay, if our queue is empty, right?

This is the queue that we set up here. If it's empty, wait a moment, right? We use the sleep method here, and this is an async sleep. This is super important. We're awaiting an asynchronous sleep, right? So, whilst we're waiting for that 0.1 seconds, our code can be doing other things, right?

That is important. If we used the standard time.sleep, that is not asynchronous, and so it would actually block the thread for that 0.1 seconds. We don't want that to happen. Generally, our queue should probably not be empty that frequently given how quickly tokens are going to be added to the queue.

So, the only way that this would potentially be empty is maybe our LLM stops. Maybe there's like a connection interruption for a, you know, a brief second or something, and no tokens are added. So, in that case, we don't actually do anything. We don't keep checking the queue. We just wait a moment, okay?

And then we check again. Now, if it was empty, we wait and then continue on to the next iteration. Otherwise, and it probably won't be empty, we get whatever is inside our queue and pull it out. Then, we say, okay, if that token is a done token, we're going to return.

So, we're going to stop this generator, right? We're finished. Otherwise, if it's something else, we're going to yield that token, which means we're returning that token but then continuing through that loop again, right? So, that is our generator logic. Then, we have some other methods here. These are LangChain specific, okay?

We have on_llm_new_token and we have on_llm_end. Starting with on_llm_new_token: this is basically, when an LLM returns a token to us, LangChain is going to run or execute this method, okay? This is the method that will be called. What it's going to do is go into the keyword arguments.

It's going to get the chunk object. This is coming from our LLM. If there is something in that chunk, it's going to check for a final answer tool call first, okay? So, we get our tool calls and we check the name within our chunk, right? The content will probably be empty for most of the tokens we return, right?

So, remember before when we were looking at the chunks here, this is what we were looking at, right? The content for us is actually always going to be empty and, instead, we're going to get the additional keyword args here, and inside there we're going to have our tool calls, as we saw in the previous chapters, right?

So, that's what we're extracting. That's why we're going into additional keyword args, right, and getting that tool call information. Or it will be None, right? Although I don't think it ever would be None, to be honest. It would be strange if it were None.

I think that would mean something is wrong. Okay, so here, we're using the walrus operator. What the walrus operator is doing here is: whilst we're checking the if logic, it's also assigning whatever is inside this expression over to tool_calls, and then with the if we're checking whether tool_calls is something or None, right?

Because we're using get here. So, if this get operation finds nothing and there are no tool calls, this object here will be equal to None, which gets assigned to tool_calls, and then the if on None will evaluate to false, this logic will not run, okay, and it will just continue.
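As a small, generic illustration of the walrus pattern being described (not the exact handler code):

```python
# Equivalent to: tool_calls = message.get("tool_calls"); if tool_calls: ...
message = {"tool_calls": [{"function": {"name": "add"}}]}

if tool_calls := message.get("tool_calls"):
    print(tool_calls[0]["function"]["name"])  # "add"
# if .get() had returned None, the block simply would not run
```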

If this is true, so if there is something returned here, we're going to check if that something is using the function name, or tool name, final_answer. If it is, we're going to set that final_answer_seen equal to true. Otherwise, we're just going to add our chunk into the queue, okay?

We use put_nowait here because we're using async. If you were not using async, you'd just use put for synchronous code, but I don't think I've ever implemented this synchronously. So, it's put_nowait for async, okay?

And then return. So, we have that. Then, we have on_llm_end, okay? So, when LangChain sees that the LLM has returned or indicated that it is finished with the response, LangChain will call this. You have to be aware that this will happen multiple times during an agent execution, because within our agent executor we're hitting the LLM multiple times.

We have that first step where it's deciding, oh, I'm going to use the add tool or the multiply tool and then that response gets back to us. We execute that tool and then we pass the output from that tool and or the original user query in the chat history, we pass that back to our LLM again, right?

So, that's another call to our LLM that's going to come back. It's going to finish or it's going to give us something else, right? So, there are multiple LLM calls happening throughout our agent execution logic, and this on_llm_end method will actually get called at the end of every single one of those LLM calls.

Now, if we get to the end of an LLM call and it was just a tool invocation, so it called, you know, the add tool, we don't want to put the done token into our queue, because when the done token is added to our queue, we're going to stop iterating, okay?

Instead, if it was just a tool call, we're going to say step end, right? And we'll actually get this token back. So, this is useful on, for example, the front end, you could have, okay, I've used the add tool. These are the parameters and it's the end of the step.

So, you could show that your tool call is being used on some front end and, as soon as it sees step end, it knows, okay, we're done with that, here was the response, right? And it can just show you that. We're going to use that, we'll see it soon, but let's say we get to the final answer tool.

We're on the final answer tool and then we get this signal that the LLM has finished. Then, we need to stop iterating. Otherwise, our stream generator is just going to keep going forever, right? Nothing's going to stop it, or maybe it will time out, but I don't think it will.

So, at that point, we need to send, okay, stop, right? We need to say we're done, and then that will come back here to our async iterator and it will return and stop the generator, okay? So, that's the core logic that we have inside that.
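Putting the description above together, here is a sketch of what such a queue callback handler could look like. The attribute names and the literal done and step-end tokens are illustrative; the course notebook's exact values and branching may differ slightly.

```python
import asyncio
from langchain_core.callbacks import AsyncCallbackHandler


class QueueCallbackHandler(AsyncCallbackHandler):
    def __init__(self, queue: asyncio.Queue):
        self.queue = queue
        self.final_answer_seen = False

    async def __aiter__(self):
        # async generator: pull tokens from the queue as they arrive
        while True:
            if self.queue.empty():
                await asyncio.sleep(0.1)  # async sleep, doesn't block the loop
                continue
            token_or_done = self.queue.get_nowait()
            if token_or_done == "<<DONE>>":
                return  # stop the generator
            yield token_or_done

    async def on_llm_new_token(self, *args, **kwargs) -> None:
        chunk = kwargs.get("chunk")
        if chunk:
            # tool calls arrive via additional_kwargs rather than .content
            if tool_calls := chunk.message.additional_kwargs.get("tool_calls"):
                if tool_calls[0]["function"]["name"] == "final_answer":
                    self.final_answer_seen = True
                else:
                    self.queue.put_nowait(chunk)  # non-blocking put for async
        return

    async def on_llm_end(self, *args, **kwargs) -> None:
        # called at the end of every LLM call within the agent loop
        if self.final_answer_seen:
            self.queue.put_nowait("<<DONE>>")
        else:
            self.queue.put_nowait("<<STEP_END>>")
        return
```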

I know there's a lot going on there, but we need all of this, so it's important to be aware of it. Okay. So, now let's see how we might actually call our agent with all of this streaming in place. We're going to initialize our queue and use that to initialize a streamer, okay?

Using the custom streamer that we just set up, or custom callback handler, whatever you want to call it, okay? Then, I'm going to define a function. This is an asynchronous function. It has to be if we're using async. What it's going to do is call our agent with a config here, and we're going to pass it that callback, which is the streamer, right?

Now, here, I'm not calling the agent executor. I'm just calling the agent, right? So, if we come back up here, we're calling this, right? That's not going to include all the tool execution logic. Importantly, we're calling the agent with the config that uses callbacks, right? So, this configurable field here from our LLM is actually being fed through, and it propagates through to our agent object as well, to the runnable serializable, right?

So, that's what we're executing here. We see agent with config and we're passing in those callbacks, which is just one callback actually, okay? That sets up our agent and then we invoke it with astream, okay, like we did before, and we're just going to return everything. So, let's run that, okay?
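A rough sketch of that call, assuming the QueueCallbackHandler from above and that the agent expects input and chat_history keys (those key names, and the exact config shape, are assumptions on my part):

```python
import asyncio

queue = asyncio.Queue()
streamer = QueueCallbackHandler(queue)

async def stream(query: str):
    # attach the fresh callback handler for this one invocation
    agent_with_callbacks = agent.with_config(callbacks=[streamer])
    return [
        token
        async for token in agent_with_callbacks.astream(
            {"input": query, "chat_history": []}
        )
    ]

tokens = await stream("What is 10 + 10?")  # notebook-style top-level await
```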

And we see all the token or chunk objects that have been returned, and this is useful for understanding what we're actually doing up here, right? So, when we're doing this chunk message, additional keyword arguments, we can see that in here. So, this would be the chunk message object.

We get the additional keyword args, we go into tool calls, and we get the information here. So, we have the ID for that tool call, which we saw in the previous chapters. Then, we have our function, right? The function includes the name, so we know what tool we're calling from this first chunk, but we don't know the arguments, right?

Those arguments are going to be streamed to us. So, we can see them begin to come through in the next chunk. The next chunk is just the first token of the arguments for the add function, right? And we can see these all come together over multiple steps, and we actually get all of our arguments, okay?

That's pretty cool. So, actually, one thing I would like to show you here as well. If we just do tokens equals an empty list, sorry, and we do tokens.append(token), okay, we have all of our tokens in here now. Alright, you can see that they're all AIMessageChunks. So, we can actually add those together, right?

So, we'll go with these here and, based on these, we're going to get all of the arguments, okay? This is kind of interesting. So, it's index one until, I think, the second to last maybe. Alright, so we have these and we just want to add those together.

So, I'm going to start with tokens[1] and I'm just going to do a for loop: for token in, we're going to go from the second onwards, and do tk plus token, right? And let's see what tk looks like at the end here. Okay. So, now you see that it has kind of merged all those arguments here.

Sorry, that should be plus equals. Okay. So, run that and you can see here that it has merged those arguments. It didn't get all of them, I kind of missed some at the end there, but it is merging them, right? So, you can see that logic where, before, it was adding the content from various chunks.

It also does the same for the other parameters within your chunk object, which I think is pretty cool. And you can see here the name wasn't included. That's because we started on token one, not on token zero where the name was. So, if we actually start from token zero and just pull them all in there, alright?

So, adding from token zero onwards, we're going to get a complete AIMessageChunk which includes the name here and all of those arguments, and you'll also see here, right, it populates everything, which is pretty cool. Okay. So, we have that. Now, based on this, we're going to want to modify our custom agent executor, because we're streaming everything, right?

So, we want to add streaming inside our agent executor, which we're doing here, right? So, this is async def stream and we're doing async for token in the astream call, okay? So, for the very first instance, if output is None, we're just going to be adding our token.

So, the chunk, sorry, to our output; the first token becomes our output. Otherwise, we're just appending our tokens to the output, okay? Then, if the token content is not empty, which it should be empty, right, because we're using tool calls all the time, we're just going to print the content, okay?

I just added these so that we print everything; I just want to be able to see it. I wouldn't expect this one to run, because we're saying it has to use tool calling, okay? So, within our agent, if we come up to here, we set tool choice to any.

So, it's been forced to use tool calling. So, it should never really be returning anything inside the content field but just in case it's there, right? So, we'll see if that is actually true. Then, we're just getting out our tool calls information, okay? From our chunk and we're going to say, okay, if there's something in there, we're going to print what is in there, okay?

And then, we're going to extract our tool name. If there is some, if there's a tool name, I'm going to show you the tool name. Then, we're going to get the ARGs and if the ARGs are not empty, we're going to see what we get in there, okay? And then from all of this, we're actually going to merge all of it into our AI message, right?

Because we're merging everything as we're going through, we're merging everything into the output as I showed you before, okay? Cool. And then, we're just awaiting our stream, which will kick it off, okay? And then, we do the standard agent executor stuff again here, right? So, we're just pulling out the tool name, tool args, and tool call ID, and then we're using all of that to execute our tool here, and then we're creating a new tool message and passing that back in.

And then also here, I moved the break for the final answer into the final step. So, that is our custom agent executor with streaming. Let's see what it does, okay? We set verbose equals true, so we see all those print statements, okay? You can see it's a little bit messy, but you can see we have tool calls that had some stuff inside, it had add here, and what we're printing out here is the full AIMessageChunk with tool calls, and then I'm just printing out, okay, what are we actually pulling out from that?

So, these are actually coming from the same thing, okay? And then the same here, right? So, we're looking at the full message and then we're looking, okay, we're getting this argument out from it, okay? So, we can see everything that is being pulled out, you know, chunk by chunk or token by token and that's it, okay?

So, we could just get everything like that. However, right, so I'm printing everything so we can see that streaming. What if I don't print, okay? So, we're setting verbose or by default, verbose is equal to false here. So, what happens if we invoke now? Let's see. Okay. Cool. We got nothing.

So, the reason we got nothing is because we're not printing. But if you're building an API, for example, and you're pulling your tokens through, you can't print them to a front end or print them as the output of your API. Printing goes to your terminal, right?

Your console window. It doesn't go anywhere else. Instead, what we want to do is actually get those tokens out, right? But how do we do that? So, we printed them, but another place those tokens exist is in our queue, right? Because we set them up to go to the queue.

So, we can actually pull them out of our queue whilst our agent executor is running and then we can do whatever we want with them because our code is async. So, it can be doing multiple things at the same time. So, whilst our code is running the agent executor, whilst that is happening, our code can also be pulling out from our queue tokens that are in there and sending them to like an API, for example, right?

Or whatever downstream logic you have. So, let's see what that looks like. We start by just initializing our queue, initializing our streamer with that queue. Then we create a task. So, this is basically saying, okay, I want to run this but don't run it right now. I'm not ready yet.

The reason I say I'm not ready yet is because I also want to define my async loop here, which is going to be printing those tokens, right? But this is all async, right? So, we set this up; it's like, get ready to run this.

It's set up and sitting there, ready to go. So, we get this, we continue, we continue, but none of it is actually executed yet, right? Only here, when we await the task that we set up, only then does our agent executor run and our async loop here begin getting tokens, right?

And here, again, I'm printing, but I don't need to print. Let's say this is within an API or something; let's say I'm calling, okay, send token to XYZ, right? That's sending the token somewhere. Or maybe we're yielding it to some sort of streamer object within our API, right?

We can do whatever we want with those tokens, okay? I'm just printing them because I want to actually see them, okay? The important thing here is that we're not printing them within our agent executor. We're printing them outside the agent executor. We've got them out and we can put them wherever we want, which is perfect when you're building an actual real-world use case where you're using an API or something else.
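Sketched out, that pattern looks roughly like this (notebook-style with top-level await, and assuming the custom agent executor's invoke accepts the query and the streamer as described above):

```python
import asyncio

queue = asyncio.Queue()
streamer = QueueCallbackHandler(queue)

# schedule the agent executor but don't block on it yet
task = asyncio.create_task(agent_executor.invoke("What is 10 + 10?", streamer))

# meanwhile, consume tokens from the streamer as they land in the queue;
# here we just print, but this is where you'd yield to an API response
async for token in streamer:
    print(token, end="", flush=True)

# make sure the agent executor task has fully finished
await task
```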

Okay, so let's run that. Let's see what we get. Look at that. We get all of the information we could need and a little bit more, right? Because now, we're using the agent executor and now, we can also see how we have this step end, right? So, I know or I know just from looking at this, right?

This is my first tool use. So, what tool is it? Let's have a look. It's the add tool and then we have these arguments. So, I can then parse them downstream, right? Then, we have the next tool use, which is here, down here. So, we can then parse them in whatever way we like.

So, that's pretty cool. Let's see, right? We're getting those things out. Can we do something with them before we print and show them? Yes, let's see, okay? So, we're now modifying our loop here. Same stuff, right? We're still initializing our queue, initializing our streamer, initializing our task, okay?

And we're still doing this async for token in streamer, okay? But then we're doing stuff with our tokens. So, I'm saying, okay, if we're on a step end, I'm not actually going to print step end, I'm going to print a new line, okay? Otherwise, if we're getting a tool call here, we're going to say: if that tool call contains the tool name, I am going to print "calling" plus the tool name, okay?

If it's the arguments, I'm going to print the tool argument and set the print end to an empty string so that we don't go onto a new line. So, we're actually going to be streaming everything, okay? So, let's just see what this looks like. Oh, my bad. I just added that. Okay.

You see that? So, it goes very fast. So, it's kinda hard to see it. I'm gonna slow it down so you can see. So, you can see that we, as soon as we get the tool name, we stream that we're calling the add tool. Then, we stream token by token, the actual arguments for that tool.

Then, for the next one, again, we do the same. We're calling this tool name. Then, we're streaming token by token again. We're processing everything downstream from outside of the agent executor and this is an essential thing to be able to do when we're actually implementing streaming and async and everything else in an actual application.

So, I know that's a lot, but it's important. So, that is it for our chapter on streaming and async. I hope it's all been useful. Thanks. Now, we're on to the final capstone chapter. We're going to be taking everything that we've learned so far and using it to build an actual chat application.

Now, the chat application is what you can see right now. We can go into this and ask some pretty interesting questions and, because it's an agent and it has access to these tools, it will be able to answer them for us. So, we'll see inside our application that we can ask questions that require tool use, such as this, and because of the streaming that we've implemented, we can see all of this information in real time.

So, we can see that the SerpAPI tool is being used and that these are the queries. We saw all of that was happening in parallel as well; each one of those tool calls was being run in parallel. We've modified the code a little bit to enable that, and we see that we have the answer.

We can also see the structured output being used here. So, we can see our answer followed by the tools used here and then we could ask follow-up questions as well because it's conversational. So, say how is the weather in each of those cities? Okay, that's pretty cool. So, this is what we're going to be building.

We are, of course, going to be focusing on the API, the backend. I'm not a front-end engineer, so I can't take you through that, but the code is there. So, for those of you that do want to go through the front-end code, you can, of course, go and do that, but we'll be focusing on how we build the API that powers all of this using, of course, everything that we've learned so far.

So, let's jump into it. The first thing we're going to want to do is clone this repo. So, we'll copy this URL. This is the Aurelio Labs LangChain course repo and you just clone it like so. I've already done this, so I'm not going to do it again. Instead, I'll just navigate to the course repo.

Now, there's a few setup things that you do need to do. All of those can be found in the README. So, we just open a new tab here and I'll open the README. Okay, so this explains everything we need. We have, if you were running this locally already, you will have seen this or you will have already done all this but for those of you that haven't, we'll go through quickly now.

So, you will need to install the uv tool. This is how we manage our Python environment and our packages. We use uv. On Mac, you would install it like so. If you're on Windows or Linux, just double check how you would install it over here. Once you have installed this, you would then go ahead and install Python.

So, uv python install. Then, we want to create our venv, our virtual environment, using that version of Python. So, uv venv here. Then, as you can see here, we need to activate that virtual environment, which I did miss from here. So, let me quickly add that. You just run that.

For me, I'm using fish, so I just add the fish suffix onto the end there, but if you're using Bash or Zsh, I think you can just run that directly. And then, finally, we need to sync, i.e. install all of our packages, using uv sync. And you see that will install everything for you.

Great. So, we have that and we can go ahead and actually open Cursor or VS Code, and then we should find ourselves within Cursor or VS Code. In here, you'll find a few things that we will need. First is environment variables. So, we can come over to here and we have the OpenAI API key, the LangChain API key, and the SerpAPI API key.

Create a copy of this and make it your .env file or, if you want to run it with source, you can; I like to use a mac.env when I'm on Mac and I just add export onto the start of each line and then enter my API keys. Now, I actually already have these in this local mac.env file which, over in my terminal, I would just activate with source again like that.

Now, we'll need that when we are running our API and application later but for now, let's just focus on understanding what the API actually looks like. So, navigating into the 09 Capstone chapter, we'll find a few things. What we're going to focus on is the API here and we have a couple of notebooks that help us just understand, okay, what are we actually doing here?

So, let me give you a quick overview of the API first. So, the API, we're using FastAPI for this. We have a few functions in here. The one that we'll start with is this. Okay. So, this is our post endpoint for invoke and this essentially sends something to our LLM and begins a streaming response.

So, we can go ahead and actually start the API and see what this looks like. We'll go into the chapter 09 capstone API directory after setting our environment variables here, and we just want to do uv run uvicorn main:app --reload. We don't need the reload flag, but if we're modifying the code, it can be useful.

Okay, and we can see that our API is now running on localhost port 8000 and if we go to our browser, we can actually open the docs for our API. So, we go to 8000 slash docs. Okay, we just see that we have that single invoke method. It extracts the content and it gives us a small amount of information there.

Now, we could try it out here. So, if we just say hello, we can run that and we'll see that we get a response. We get this. Okay. Now, the thing that we're missing here is that this is actually being streamed back to us. Okay. So, this is not just a direct response.

This is a stream. To see that, we're going to navigate over to here to this streaming testing notebook and we'll run this. So, we are using requests here. We are not just doing a, you know, the standard post request because we want to stream the output and then print the output as we are receiving them.

Okay. So, that's why this looks a little more complicated than just a typical requests.get. What we're doing here is starting our session, which is our post request, and then we're just iterating through the content as we receive it from that request. When we receive a token, right?

Because sometimes this might be None, we check for that and then we print the token. Okay, and we have that flush equals true, which we've used in the past. So, let's define that and then just ask a simple question: what is five plus five? Okay, and we saw that it was pretty quick.
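A minimal sketch of that client-side streaming loop; whether the endpoint takes the content as a query parameter or a JSON body is an assumption here:

```python
import requests

# Stream the /invoke endpoint's response and print chunks as they arrive
with requests.post(
    "http://localhost:8000/invoke",
    params={"content": "What is five plus five?"},
    stream=True,
) as response:
    for chunk in response.iter_content(chunk_size=None):
        if chunk:  # skip keep-alive chunks that come back as None/empty
            print(chunk.decode("utf-8"), end="", flush=True)
```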

So, it generated this response first and then it went ahead and actually continued streaming with all of this. Okay, and we can see that there are these special tokens being provided. This is to help the front end basically decide, okay, what should go where. So, here we're showing these multiple steps of tool use and the parameters.

The way the front end decides how to display those is: it's just being provided the single stream, but it has these step tokens. It has a step, it has a step name, then it has the parameters followed by the end-of-step token, and it's looking at each one of these. The one step name that it treats differently is when it sees the final answer step name.

When it sees the final answer step name, rather than displaying this tool use interface, it instead begins streaming the tokens directly, like a typical chat interface. And if we look at what we actually get in our final answer, it's not just the answer itself, right? So, we have the answer here.

This is streamed into that typical chat output but then we also have tools used and then this is added into the little boxes that we have below the chat here. So, there's quite a lot going on just within this little stream. Now, we can try with some other questions here.

So, we can say, okay, tell me about the latest news in the world. You can see that there's a little bit of a wait here whilst it's waiting to get the response and then, yeah, it's streaming a lot of stuff quite quickly, okay? So, there's a lot coming through here, okay?

And then we can ask other questions like, okay, this one here: how cold is it in Oslo right now, and what is five multiplied by five, right? So, these two are going to be executed in parallel and then, after it has the answers for those, the agent will use another multiply tool call to multiply those two values together, and all of that will get streamed, okay?

And then, as we saw earlier, we have the what is the current date and time in these places. Same thing. So, three questions. There are three questions here. What is the current date and time in Dubai? What is the current date and time in Tokyo and what is the current date and time in Berlin?

Those three questions get executed in parallel against the SerpAPI search tool and then all the answers get returned within that final answer, okay? So, that is how our API is working. Now, let's dive a little bit into the code and understand how it is working. There are a lot of important things here.

There's some complexity, but at the same time we've tried to make this as simple as possible. So, this is just FastAPI syntax here with the app.post invoke decorator, so just our invoke endpoint. We consume some content, which is a string, and then, if you remember from the agent executor deep dive, which is what we've implemented here, or a modified version of it, we have to initialize our asyncio queue and our streamer, which is the queue callback handler, and I believe that is exactly the same as what we defined in that earlier chapter.

There are no differences there. So, we define that and then we return this StreamingResponse object, right? Again, this is a FastAPI thing. This is so that you are streaming a response. That streaming response has a few attributes here which, again, are FastAPI things or just generic API things.

So, some headers giving instructions to the client and then the media type here, which is text/event-stream. You can also use, I think, text/plain possibly as well, but I believe the standard here would be to use the event stream type, and then the more important part for us is this token generator, okay?
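As a sketch of how that endpoint hangs together, assuming the QueueCallbackHandler and a token_generator along the lines described in this chapter (the exact header values and generator signature are illustrative):

```python
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/invoke")
async def invoke(content: str):
    queue: asyncio.Queue = asyncio.Queue()
    streamer = QueueCallbackHandler(queue)  # handler from the earlier chapter
    # token_generator consumes the streamer and yields formatted tokens
    return StreamingResponse(
        token_generator(content, streamer),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", "Connection": "keep-alive"},
    )
```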

So, what is this token generator? Well, it is this function that we've defined up here. Again, if you remember that earlier chapter, at the end of it we set up a for loop where we're printing out different tokens in various formats. So, we're kind of post-processing them before deciding how to display them.

That's exactly what we're doing here. So, in this block here, we're looping through every token that we're receiving from our streamer. We're looping through and we're just saying, okay, if this is the end of a step, we're going to yield this end-of-step token, which we saw here, okay?

So, it's this end-of-step token there. Otherwise, if this is a tool call, and again we've got that walrus operator here, what we're doing is saying, okay, get the tool calls out from our current message. If there is something there, so if this is not None, we're going to execute what is inside here, and what's being executed inside here is a check for the tool name.

If we have the tool name, we return this, okay? So, we have the start of step token, the start of the step name token, the tool name or step name, whichever those you want to call it, and then the end of the step name token, okay? And then this, of course, comes through to the front end like that, okay?

That's what we have there. Otherwise, we should only be seeing the tool name returned as part of first token for every step. After that, it should just be tool arguments. So, in this case, we say, okay, if we have those tool or function arguments, we're going to just return them directly.

So, that is the part that would stream all of this here, okay? These would be individual tokens, right? For example, we might have the open curly bracket as a token, followed by query as a token, then latest could be a token, world could be a token, news could be a token, etc.

Okay? So, that is what is happening there. This last branch should not get executed, but we handle it just in case. If we have any issues with tokens being returned there, we're just going to print this error and continue with the streaming, but that should not really be happening.

Cool. So, that is our token streaming loop. Now, the way that we are picking up tokens from our stream object here is of course through our agent execution logic which is happening in parallel, okay? So, all of this is asynchronous. We have this async definition here. So, all of this is happening asynchronously.

So, what has happened here is that we have created a task, which is the agent executor invoke, and we're passing our content, we're passing that streamer which we're going to be pulling tokens from, and we also set verbose to true. We can actually remove that, but it just allows us to see additional output in our terminal window if we want it.

I don't think there's anything particularly interesting to look at in there, but if you are debugging, that can be useful. So, we create our task here, but this does not begin the task. Alright, this is an asyncio create task, but it does not begin until we await it down here.

So, what is happening here is essentially that this code here is still being run, we're in an asynchronous loop here, but then we await this task. As soon as we await this task, tokens will start being placed within our queue, which then get picked up by the streamer object here.

So, then this begins receiving tokens. I know async is always a little bit more confusing given the strange order of things but that is essentially what is happening. You can imagine all this is essentially being executed all at the same time. So, we have that. So, anything else to go through here?

I don't think so. It's all sort of boilerplate stuff for FastAPI rather than the actual AI code itself. So, we have that as our streaming function. Now, let's have a look at the agent code itself. Okay. So, agent code. Where would that be? We're using this agent executor invoke and we're importing it from the agent file.

So, we can have a look in here for this. Now, you can see straight away, we're pulling in our API keys here. Just make sure that you do have those. Now, all of this, okay, is what we've seen before in that agent executor deep dive chapter.

This is all practically the same. So, we have our LLM. We've set those configurable fields as we did in the earlier chapters; that configurable field is for our callbacks. We have our prompt. This has been modified a little bit, essentially just telling it, okay, make sure you use the tools provided.

We say you must use the final answer tool to provide a final answer to the user. One thing I added, because I noticed it every now and again, is an explicit instruction: use tools to answer the user's current question, not previous questions. I found with this setup that if I have a little bit of small talk with the agent, and beforehand I was asking questions like, okay, what was the weather in this place or that place, the agent will occasionally hang on to those previous questions and try to use a tool again to answer them. That is something you can more or less prompt out of it, okay?

So, we have that. This is all exactly the same as before, okay? We have our chat history to make this conversational, we have our human message, and then our agent scratchpad so that our agent can think through multiple tool use messages. Great. We also have the Article class.

So, this is to process results from SerpAPI. We have our SerpAPI function here; I will talk about that a little more in a moment because it is also a little different to what we covered before. What we covered before with SerpAPI, if you remember, was synchronous, because we were using the SerpAPI client, or the SerpAPI tool, directly from LangChain. Because we want everything to be asynchronous, we have had to recreate that tool in an asynchronous fashion, which we'll talk about a little later.

But for now, let's move on from that. We can see our final answer tool being used here. I think we defined the exact same thing before, probably in that deep dive chapter again, where we have just the answer and the tools that have been used. Great. So, we have that.

One thing that is a little different here is how we are defining our name-to-tool mapping. So, this takes a tool name and maps it to a tool function. When we have synchronous tools, we would actually use tool.func here; okay, so rather than tool.coroutine, it would be tool.func.

However, we are using asynchronous tools, and so this is actually tool.coroutine, and this is why, if you come up here, I've made every single tool asynchronous. Now, that is not really necessary for a tool like final answer, because there are no API calls happening. An API call is the typical scenario where you do want to use async, because if you make an API call with a synchronous function, your code just sits waiting for the response while the API is processing and doing whatever it's doing.

So, that is an ideal scenario for async, because rather than your code just waiting for the response from the API, it can go and do something else whilst it's waiting, right? That's why we would use it, for example, with the SerpAPI tool here. But for final answer, and for all of these calculator tools that we've built, there's actually no need to have these as async, because our code is just running straight through.

It's executing this code; there's no waiting involved. So, it doesn't necessarily make sense to have these asynchronous. However, by making them asynchronous, it means that I can use tool.coroutine for all of them, rather than saying, oh, if this tool is synchronous, use tool.func, whereas if this one is async, use tool.coroutine.

So, it simplifies the code for us a lot. It's not strictly necessary, but it does help us write cleaner code here. This is also true later on, because we actually have to await our tool calls, which we can see over here, right? So, we have to await those tool calls.
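
As a rough sketch of what that uniform mapping looks like when every tool is declared async (the tool bodies here are simplified stand-ins, not the course's exact implementations):

```python
from langchain_core.tools import tool

# declaring even the purely-local tools as async keeps the mapping uniform
@tool
async def add(x: float, y: float) -> float:
    """Add two numbers together."""
    return x + y

@tool
async def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Return the final answer and the tools used to reach it."""
    return {"answer": answer, "tools_used": tools_used}

tools = [add, final_answer]

# because every tool is async we can always use .coroutine;
# with sync tools we would need .func instead
name2tool = {t.name: t.coroutine for t in tools}

# later, executing a tool call always looks the same:
#   result = await name2tool[tool_name](**tool_args)
```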

That would get messier if we were mixing some sync tools and some async tools. So, we have that. We have our queue callback handler. This is the same as before, so I'm not going to go through it again; we covered it in the earlier deep dive chapter.

We have our execute tool function here. Again, that is asynchronous. This just helps us clean up the code a little bit. In the deep dive chapter, I think we had this placed directly within our agent executor function, and you can do that, it's fine; it's just a bit cleaner to pull this out, and we can also add more type annotations here, which I like.

So, execute tool expects us to provide an AI message which includes a tool call within it, and it will return us a tool message. Okay. Agent executor: this is all the same as before, and we're actually not even using verbose here, so we could fully remove it, but I will leave it.
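
A minimal sketch of what such a helper could look like, assuming LangChain's AIMessage and ToolMessage types and the hypothetical name2tool mapping from the sketch above; the real function in the course code may differ in its details:

```python
from langchain_core.messages import AIMessage, ToolMessage

async def execute_tool(tool_call: AIMessage) -> ToolMessage:
    # the AI message carries exactly one tool call here
    call = tool_call.tool_calls[0]
    tool_name = call["name"]
    tool_args = call["args"]
    # look up the async tool and await it
    observation = await name2tool[tool_name](**tool_args)
    return ToolMessage(
        content=f"{observation}",
        tool_call_id=call["id"],
    )
```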

Of course, if you would like to use that, you can just add an if verbose check and then log or print some stuff where you need it. Okay. So, what do we have in here? We have our streaming function. This is what actually calls our agent, right? So, we have a query.

This will call our agent just here, and we could even make this a little clearer. So, for example, this could be called configured agent, because this is not the response; it is a configured agent. I think that is maybe a little clearer. So, we are configuring our agent with our callbacks, okay?

Which is just our streamer. Then we're iterating through the tokens that are returned by our agent using astream here, okay? And as we are iterating through this, because we pass our streamer to the callbacks here, every single token that our agent returns is going to get processed through our queue callback handler here.

Okay? So, these on_llm_new_token and on_llm_end methods are going to get executed, and all of those tokens, as you can see here, are passed to our queue. Okay? Then, we come up here and we have this aiter method. This aiter method is used by our generator over in our API, by that token generator.

It picks up from the queue the tokens that have been put there by those other methods. Okay? So, one side is putting tokens into the queue and this is pulling them out, and that is happening in parallel with the code running here. Now, the reason that we extract the tokens out here is that we want to pull out our tokens and append them all to our outputs.
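
For reference, a queue-based callback handler along these lines can be sketched as follows. It assumes LangChain's AsyncCallbackHandler base class, and the end-of-stream sentinel is purely illustrative; the handler from the earlier deep dive chapter has more logic around deciding when the whole run is actually finished:

```python
import asyncio
from langchain_core.callbacks import AsyncCallbackHandler

class QueueCallbackHandler(AsyncCallbackHandler):
    """Pushes LLM tokens onto an asyncio.Queue and exposes them as an async iterator."""

    def __init__(self, queue: asyncio.Queue):
        self.queue = queue

    async def __aiter__(self):
        # pulled by the API's token generator
        while True:
            token = await self.queue.get()
            if token == "<<DONE>>":  # illustrative sentinel, not the course's exact token
                return
            yield token

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        # every token the LLM produces lands on the queue
        await self.queue.put(token)

    async def on_llm_end(self, *args, **kwargs) -> None:
        # signal that this generation step has finished
        await self.queue.put("<<DONE>>")
```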

Now, those outputs become a list of AI messages, which are essentially the AI telling us what tool to use and what parameters to pass to each one of those tools. This is very similar to what we covered in that deep dive chapter, but the one thing that I have modified here is that I've enabled us to use parallel tool calls.

So, that is what we see here with these four lines of code. We're saying, okay, if our tool call includes an ID, that means we have a new tool call, or a new AI message. So, what we do is append that AI message, which is the AI message chunk, to our outputs. Then, following that, if we don't get an ID, that means we're getting the tool arguments.

So, in that case, we just add our AI message chunk to the most recent AI message chunk in our outputs. Okay, so what that will do is create that list of AI messages. It'll be like, you know, AI message one, and then this will just append everything to that AI message one.

Then, we'll get our next AI message chunk, and this will append everything to that until we get a complete AI message, and so on. Okay. So, here we've collected all of our AI message chunk objects. Then, finally, we transform all of those AI message chunk objects into actual AI message objects and return them from our function, which we then receive over here.
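
Here is a rough sketch of that accumulation logic, assuming the stream yields LangChain AIMessageChunk objects; the function name and the exact shape of the stream are illustrative:

```python
from langchain_core.messages import AIMessage, AIMessageChunk

async def collect_tool_calls(response) -> list[AIMessage]:
    """Accumulate streamed AIMessageChunk objects into complete AIMessage objects."""
    outputs: list[AIMessageChunk] = []
    async for token in response:  # response: the agent's astream(...) output
        if chunks := token.tool_call_chunks:
            if chunks[0].get("id"):
                # a chunk carrying an id starts a new tool call (a new AI message)
                outputs.append(token)
            elif outputs:
                # no id: these chunks carry argument tokens for the latest call
                outputs[-1] += token
    # turn the accumulated chunks into full AIMessage objects
    return [
        AIMessage(content=chunk.content, tool_calls=chunk.tool_calls)
        for chunk in outputs
    ]
```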

So, into the tool calls variable. Okay. Now, this is very similar to the deep dive chapter. Again, we're going through that loop where we have a max iterations count, at which point we will just stop, but until then we continue iterating: making more tool calls, executing those tool calls, and so on.

So, what is going on here? Let's see. So, we got our tool calls. This is going to be a list of AI message objects. Then, what we do with those AI message objects is we pass them to this execute tool function. If you remember, what is that? That is this function here.

So, we pass each AI message individually to this function, and that will execute the tool for us and then return us that observation from the tool. Okay, that is what you see happening here, but this is an async method. So, typically, what you'd have to do is await execute tool, and we could do that.

So, we could do that; okay, let me make this a little bigger for us. What we could do, for example, which might be a bit clearer, is set tool observations equal to an empty list, and then for each tool call in our tool calls, append the result of execute tool for that call, which would have to be awaited.

So, we'd actually put the await in there, and what this would do is exactly the same thing as what we're doing here, the difference being that we're doing it tool by tool. Okay. So, we are executing async here, but we're doing them sequentially, whereas what we can do, which is better, is use asyncio gather.

What this does is gather all those coroutines, and then we await them all at the same time to run them all asynchronously. They all begin at, or almost at, the same time, and we get those responses more or less in parallel. Of course, it's async, so it's not truly parallel, but practically it is.
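
To make the two options concrete, here is a sketch contrasting the sequential version with the gathered version, reusing the hypothetical execute_tool helper from above:

```python
import asyncio

async def run_tools_sequentially(tool_calls):
    # each tool call only starts after the previous one has finished
    tool_obs = []
    for tool_call in tool_calls:
        tool_obs.append(await execute_tool(tool_call))
    return tool_obs

async def run_tools_concurrently(tool_calls):
    # gather the coroutines and await them together: all calls run concurrently
    return await asyncio.gather(
        *(execute_tool(tool_call) for tool_call in tool_calls)
    )
```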

Cool. So, we have that, and from it we get all of our tool observations, which are all of our tool messages. Then one interesting thing here: let's say we have all of our AI messages with all of our tool calls, and we just append all of those to our agent scratchpad.

Alright. So, let's say here we just do agent scratchpad extend with our tool calls, and then agent scratchpad extend with our tool observations. What happens then is that this would essentially give us something that looks like this.

So, we'd have our AI message; I'm just going to put tool call IDs in here to simplify it a little bit. This would be tool call ID A. Then, we would have another AI message with tool call ID B. Then, we'd have a tool message; let's just remove this content field, I don't want that.

And then another tool message with tool call ID B, right? So, it would look something like this. The problem with this order is that each tool message is not directly following its AI message. You would think, okay, we have this tool call ID, that's probably fine, but actually, when we're running this, if you add these to the agent scratchpad in this order, what you'll see is that your response just hangs.

Nothing happens when you come through to the second iteration of your agent call. So, these actually need to be sorted so that they are in order, and it doesn't necessarily matter which order in terms of A or B or C or whatever you use.

So, you could have this order: AI message, tool message, AI message, tool message, just as long as each pair sharing a tool call ID is together, or you could invert it, for example, right? So, you could have this, and that will work as well.

Essentially, as long as you have your AI message followed by its tool message, and both of those share the same tool call ID, you're fine. You need to make sure you have that order, okay? That, of course, would not happen if we do this, and instead, what we need to do is something like this, okay?

So, if I make this a little easier to read: we're taking the tool call ID and pointing it to the tool observation, and we're doing that for every tool call and tool observation within a zip of those, okay? Then, for each tool call within our tool calls, we extend our agent scratchpad with that tool call followed by the tool observation message, which is the tool message.

So, this is the AI message, and that is the tool message down there, okay? That is how we get this correct order, which will run; otherwise, things will not run. So, that's important to be aware of, okay? Now, we're almost done.
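
A sketch of that ordering step might look like this, assuming tool_calls and tool_obs are the lists produced earlier and agent_scratchpad is the running message list; the key point is that each AI message is immediately followed by the tool message sharing its tool_call_id:

```python
# map each tool_call_id to its tool observation (the ToolMessage)
id2tool_obs = {
    ai_msg.tool_calls[0]["id"]: tool_msg
    for ai_msg, tool_msg in zip(tool_calls, tool_obs)
}

# extend the scratchpad pairwise: AI message, then its matching tool message
for ai_msg in tool_calls:
    agent_scratchpad.extend([
        ai_msg,
        id2tool_obs[ai_msg.tool_calls[0]["id"]],
    ])
```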

I know we've just been through quite a lot. So, we continue, we increment our count as we were doing before, and then we need to check for the final answer tool, okay? And because we're running these tools in parallel, because we're allowing multiple tool calls in one step, we can't just look at the most recent tool and check whether it has the name final answer.

Instead, we need to iterate through all of our tool calls and check if any of them have the name final answer. If they do, we extract that final answer call, and we extract the final answer as well, so that is the direct text content, and we say, okay, we have found the final answer.

So, this will be set to true, okay? That should happen every time, but if our agent gets stuck in a loop of calling multiple tools, it might not happen before we break based on the max iterations here. So, we might end up breaking based on max iterations rather than because we found a final answer, okay?

So, that can happen. Anyway, if we find that final answer, we break out of this for loop here, and then, of course, we do need to break out of our while loop, which is here. So, we say: if we found the final answer, break, okay? Cool. So, we have that.
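
Putting the iteration control together, the outer loop could be sketched roughly as below. It reuses the hypothetical helpers from the earlier sketches (collect_tool_calls, run_tools_concurrently) plus an assumed stream_step function that runs one LLM step, so treat it as an outline of the logic rather than the course's exact agent executor:

```python
async def run_agent(query: str, max_iterations: int = 3) -> dict:
    agent_scratchpad: list = []
    count, found_final_answer = 0, False
    final_answer_call = {"answer": "No answer found", "tools_used": []}

    while count < max_iterations:
        # one agent step: stream the LLM output and collect its tool calls
        tool_calls = await collect_tool_calls(stream_step(query, agent_scratchpad))
        # execute all requested tools concurrently
        tool_obs = await run_tools_concurrently(tool_calls)
        # interleave AI messages with their matching tool messages (as above)
        id2obs = {m.tool_calls[0]["id"]: o for m, o in zip(tool_calls, tool_obs)}
        for m in tool_calls:
            agent_scratchpad.extend([m, id2obs[m.tool_calls[0]["id"]]])
        count += 1

        # because tool calls run in parallel, check every one for final_answer
        for m in tool_calls:
            if m.tool_calls[0]["name"] == "final_answer":
                final_answer_call = m.tool_calls[0]["args"]
                found_final_answer = True
                break
        if found_final_answer:
            break

    return final_answer_call
```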

Finally, after all of that, so after our agent has been through its steps and iterations and we've processed those, we come down to here, where we add that final output to our chat history. This is just going to be the text content, right?

So, that is this here, the direct answer, but then what we do is return the full final answer call. The full final answer call is basically this here, right? So, the answer and the tools used, but of course populated. So, we're saying here that if we have a final answer, okay?

If we have that, we're going to return the final answer call, which was generated by our LLM. Otherwise, we're going to return this one. This covers the scenario where the agent got caught in a loop and just kept iterating. If that happens, it will come back with, okay, no answer found, and it will say no tools were used, which is not technically true, but this is an exception handling path.

So, it ideally shouldn't happen, and if it does, saying there were no tools used is not really a big deal, in my opinion anyway. Cool. So, we have all of that, and then we just initialize our agent executor, and that is our agent execution code.

The one last thing we want to go through is the SerpAPI tool, which we will do in a moment. Okay, so SerpAPI: let's see how we build our SerpAPI tool. We'll start with the synchronous SerpAPI version, and the reason we're starting with this is that it's just a bit simpler.

So, I'll show you this quickly before we move on to the async implementation, which is what we're using within our app. First, we want to get our SerpAPI API key. I'll run that, we enter it at the top there, and this will run. We're going to use the SerpAPI SDK first.

We're importing GoogleSearch, and these are the input parameters. So, we have our API key, we say we want to use Google as the engine, and our question goes into the q parameter, q for query. We're searching for the latest news in the world, and this will return quite a lot of stuff.

You can see there's a ton of stuff in there, right? Now, what we want is contained within this organic results key. So, we can run that and we'll see, okay, it's talking about various things, pretty recent stuff at the moment. So, we can tell that it is in fact working.
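
For reference, the synchronous call with the SerpAPI SDK (the google-search-results package) looks roughly like this; the environment variable name is just an example:

```python
import os
from serpapi import GoogleSearch

search = GoogleSearch({
    "api_key": os.environ["SERPAPI_API_KEY"],  # your SerpAPI key
    "engine": "google",
    "q": "latest news in the world",
})
results = search.get_dict()

# the useful search hits live under the "organic_results" key
for item in results["organic_results"]:
    print(item["title"], item["link"])
```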

Now, this is quite messy, so what I would like to do first is just clean that up a little bit. So, we define this Article base model, which is Pydantic, and we're saying, okay, from a set of results we're going to iterate through each of them and extract the title, source, link, and snippet.

So, you can see title, source, link, and snippet here. Okay, that's all useful. We'll run that, and then we go through each of the results in organic results and load them into our Article using this class method here, and then we can have a look at what those look like.

It's much nicer; we get this nicely formatted object here. Cool, that's great. Now, all of this, what we just did here, is using SerpAPI's SDK, which is great, super easy to use. The problem is that they don't offer an async SDK, which is a shame, but it's not that hard for us to set up ourselves.
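
A minimal version of that Article model might look like the following. The field names match what is shown in the video, but the classmethod name is an assumption, and it reuses the results dict from the previous sketch:

```python
from pydantic import BaseModel

class Article(BaseModel):
    title: str
    source: str
    link: str
    snippet: str

    @classmethod
    def from_serpapi_result(cls, result: dict) -> "Article":
        # pull out just the fields we care about from one organic result
        return cls(
            title=result["title"],
            source=result["source"],
            link=result["link"],
            snippet=result["snippet"],
        )

articles = [Article.from_serpapi_result(r) for r in results["organic_results"]]
```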

So, typically, with asynchronous requests, what we can use is the aiohttp library. You can see what we're doing here: this is equivalent to requests.get. Okay, that's essentially what we're doing here, and the equivalent is literally this. So, this is the equivalent of the requests version we ran here, but using async code.

So, we're using aiohttp's ClientSession and then session.get, okay, with this async with here, and then we just await our response. So, this is what we do, rather than that, to make our code async. It's really simple, and the output that we get is exactly the same, right?

So, we still get this exact same output. That means, of course, that we can use that articles method in the exact same way, and we get the same result. There's no need to make this Article from SerpAPI results method async, because this bit of code here is fully local.
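
The async equivalent with aiohttp is roughly this, assuming SerpAPI's standard JSON search endpoint and the same parameters as the synchronous call above:

```python
import os
import asyncio
import aiohttp

async def serpapi_search(query: str) -> list[dict]:
    params = {
        "api_key": os.environ["SERPAPI_API_KEY"],
        "engine": "google",
        "q": query,
    }
    # aiohttp's ClientSession + session.get is the async stand-in for requests.get
    async with aiohttp.ClientSession() as session:
        async with session.get("https://serpapi.com/search", params=params) as resp:
            results = await resp.json()
    return results["organic_results"]

# e.g. asyncio.run(serpapi_search("latest news in the world"))
```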

It's just our Python running everything, so this does not need to be async. Okay, and we can see that we get literally the exact same result there. So, with that, we have everything that we would need to build a fully asynchronous SerpAPI tool, which is exactly what we do here for LangChain.

So, we import those tools, and I mean, is there anything different here? No, this is exactly what we just did, but I will run it because I would like to show you this very quickly. Okay. So, this is how we were initially calling our tools in previous chapters, because we were mostly okay with using the synchronous tools.

However, you can see that the func attribute here is just empty. Alright, so if I check the type, it's just NoneType. That is because this is an async function, okay, an async tool, sorry. It was defined with async here, and what happens when you do that is you get this coroutine attribute instead.

So, rather than func, which isn't there, you get that coroutine. If we then modify this, okay, let's just remove all the asyncs here and the await, and then look at the SerpAPI structured tool, we see that we now get that func, okay?

So, that is just the difference between an async structured tool and a sync structured tool. Switching back to async, we have coroutine again. So, it's important to be aware of that, and of course, we run using the SerpAPI coroutine. That's how we build the SerpAPI tool, and there's nothing more to it.
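
To see that difference concretely, here is a small sketch showing that an async tool populates .coroutine while its sync counterpart populates .func; the tool bodies are placeholders, not the real SerpAPI tool:

```python
from langchain_core.tools import tool

@tool
async def serpapi_async(query: str) -> str:
    """Async web search (placeholder body for illustration)."""
    return f"results for {query}"

@tool
def serpapi_sync(query: str) -> str:
    """Sync web search (placeholder body for illustration)."""
    return f"results for {query}"

print(type(serpapi_async.func))       # <class 'NoneType'>
print(type(serpapi_async.coroutine))  # <class 'function'> (an async function)
print(type(serpapi_sync.func))        # <class 'function'>
print(type(serpapi_sync.coroutine))   # <class 'NoneType'>
```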

That is exactly what we did here, so I don't think we need to go through it any further. So, yeah, I think that is basically all of our code behind this API. With all of that, we can go ahead; we have our API running already.

Let's go ahead and also run our front end. So, we're going to go to Documents, then the Aurelio LangChain course folder, and then into the chapters folder, the 09 capstone chapter, and the app directory. You will need to have npm installed; to do that, we can take a look at this answer, for example.

This is probably what I would recommend, okay? If you're on Mac, I would run brew install node followed by brew install npm; of course, it's different if you're on Linux or Windows. Once you have those, you can do npm install, which will install all of the node packages that we need, and then we can just run npm run dev, okay?

And now, we have our app running on localhost 3000. So, we can come over to here, open that up, and we have our application. You can ignore this. In here, we can begin just asking questions, okay? So, we can start with a quick question: what is five plus five?

So, we have our streaming happening here. It said the agent wants to use the add tool, and these are the input parameters to the add tool, and then we get the streamed response. This is the final answer tool, where we're outputting that answer key and value, and then here we're outputting the tools used key and value, which is just an array of the tools that were used, which is just the add function.

So, we have that. Then, let's ask another question. This time, we'll trigger SerpAPI with: tell me about the latest news in the world. Okay. So, we can see that it's using SerpAPI and the query is latest world news, and then it comes down here and we actually get some citations, which is kind of cool.

So, you can also click through to one of these, okay? And it takes us through to the source. So, that's pretty cool. Unfortunately, I just lost my chat, so, fine, I can ask that question again. Okay. We can see that tools used shows SerpAPI there. Now, let's continue with the next question from our notebook, which is: how cold is it right now?

What is five multiplied by five, and what do you get when multiplying those two numbers together? I'm just going to modify that to say in Celsius so that I can understand it. Okay. So, for this one, what did we get? We got the current temperature in Oslo.

We got multiply five by five, which is our second question, and then we also got subtract. Interesting; at first I wasn't sure why it did that, it seemed kind of weird. Then, looking at it, okay, so here it was. Okay, that kind of makes sense.

Does that make sense? Roughly. Okay. So, the conversion from Fahrenheit to Celsius is basically subtract thirty-two. So, to go from Fahrenheit to Celsius, you take Fahrenheit minus thirty-two and then multiply by this number here, which I assume the AI did not do exactly.

I roughly checked it myself. Okay. So, subtracting thirty-two from thirty-six would have given us four, and it gave us approximately two. Multiplying by that factor is practically multiplying by 0.5, so roughly halving the value, and that would give us roughly two degrees. So, that's what it was doing here.
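
For reference, the conversion the agent was approximating is the standard Fahrenheit-to-Celsius formula, assuming the reported temperature really was around 36°F:

```latex
C = (F - 32) \times \tfrac{5}{9}, \qquad (36 - 32) \times \tfrac{5}{9} \approx 2.2\,^{\circ}\mathrm{C}
```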

Kind of interesting. Okay, cool. So, we've gone through and seen how to build a fully fledged chat application using what we've learned throughout the course, and we've built quite a lot. If you think about this application, you're getting real-time updates on which tools are being used and the parameters being passed to those tools, and then that is all being returned in a streamed output, and even in a structured output for your final answer, including the answer and the tools that were used.

So, of course, what we built here is fairly limited, but it's super easy to extend. Something you might want to go and do is take what we've built here, fork this application, and just go and add different tools to it and see what happens, because this is very extensible.

You can do a lot with it but yeah, that is the end of the course. Of course, this is just the beginning of whatever it is you're wanting to learn or build with AI. Treat this as the beginning and just go out and find all the other cool interesting stuff that you can go and build.

So, I hope this course has been useful, informative, and gives you an advantage in whatever it is you're going out to build. So, thank you very much for watching and taking the course and sticking through right to the end. I know it's pretty long so I appreciate it a lot and I hope you get a lot out of it.

Thanks. Bye. (gentle music)