
LangChain Mastery in 2025 | Full 5 Hour Course


Chapters

0:00 Course Introduction
4:24 CH1 When to Use LangChain
13:28 CH2 Getting Started
14:14 Local Course Setup (Optional)
17:00 Colab Setup
18:11 Initializing our OpenAI LLMs
22:34 LLM Prompting
28:48 Creating a LLM Chain with LCEL
33:59 Another Text Generation Pipeline
37:11 Structured Outputs in LangChain
41:56 Image Generation in LangChain
46:59 CH3 LangSmith
49:36 LangSmith Tracing
55:45 CH4 Prompts
67:21 Using our LLM with Templates
72:39 Few-shot Prompting
78:56 Chain of Thought Prompting
85:25 CH5 LangChain Chat Memory
89:51 ConversationBufferMemory
98:39 ConversationBufferWindowMemory
107:57 ConversationSummaryMemory
117:33 ConversationSummaryBufferMemory
129:29 CH6 LangChain Agents Intro
136:34 Creating an Agent
140:56 Agent Executor
147:30 Web Search Agent
150:41 CH7 Agent Deep Dive
160:08 Creating an Agent with LCEL
176:40 Building a Custom Agent Executor
185:19 CH8 LCEL
189:14 LCEL Pipe Operator
193:28 LangChain RunnableLambda
198:00 LangChain Runnable Parallel and Passthrough
203:13 CH9 Streaming
209:22 Basic LangChain Streaming
213:29 Streaming with Agents
231:26 Custom Agent and Streaming
240:46 CH10 Capstone
245:25 API Build
252:14 API Token Generator
256:44 Agent Executor in API
274:50 Async SerpAPI Tool
280:53 Running the App
284:49 Course Completion!

Whisper Transcript

00:00:00.000 | Welcome to the AI engineer's guide to LangChain.
00:00:03.320 | This is a full course that will take you from the assumption
00:00:08.400 | that you know nothing about LangChain to being able to
00:00:12.440 | proficiently use the framework, either, you know, within
00:00:17.480 | LangChain, within LangGraph, or even elsewhere, from the
00:00:22.360 | fundamentals that you will learn in this course. Now, this course
00:00:26.400 | will be broken up into multiple chapters, we're going to start
00:00:29.960 | by talking a little bit about what LangChain is, and when we
00:00:33.920 | should really be using it, and when maybe we don't want to use
00:00:36.560 | it. We'll talk about the pros and cons, and also about the
00:00:40.040 | wider LangChain ecosystem, not just about the LangChain
00:00:43.880 | framework itself. From there, we'll introduce LangChain, and
00:00:48.040 | we'll just have a look at a few examples before diving into
00:00:51.080 | essentially the basics of the framework. Now, I will just note
00:00:55.720 | that all of this is for LangChain 0.3. So that is the latest
00:01:00.360 | current version. Although that being said, we will cover a
00:01:04.640 | little bit of where LangChain comes from as well. So we'll be
00:01:07.840 | looking at pre 0.3 version methods for doing things, so
00:01:13.320 | that we can understand, okay, that's the old way of doing
00:01:16.200 | things, how do we do it now, now that we're in version 0.3? And
00:01:20.280 | also, how do we dive a little deeper into those methods as
00:01:23.200 | well and kind of customize those. From there, we'll be
00:01:25.920 | diving into what I believe is the future of AI. I mean, it's
00:01:33.400 | the now and the short term, potentially even further into
00:01:36.840 | the future. And that is agents. We'll be spending a lot of time
00:01:40.880 | on agents. So we'll be starting with a simple introduction to
00:01:45.560 | agents. So that is how can we build an agent that is simple?
00:01:51.080 | What are the main components of agents? What do they look like?
00:01:53.880 | And then we'll be diving much deeper into them. And we'll be
00:01:57.640 | building out our own agent executor, which is kind of like a
00:02:01.320 | framework around the AI components of an agent, we're
00:02:06.280 | building our own. And once we've done our deep dive on agents,
00:02:10.480 | we'll be diving into LangChain Expression Language, which we'll
00:02:14.360 | be using throughout this course. So LangChain Expression
00:02:17.160 | Language is the recommended way of using LangChain. And the
00:02:21.680 | expression language, or LCEL, takes kind of a break from
00:02:25.240 | standard Python syntax. So there's a bit of weirdness in
00:02:30.400 | there. And yes, we'll be using it throughout the course. But
00:02:34.000 | we're leaving the LCEL chapter until a bit later on in
00:02:39.160 | the course, because by that point we really want to dive into the
00:02:41.320 | fundamentals of LCEL. The idea is that by
00:02:45.360 | then, you already have a good grasp of at least how to
00:02:47.720 | use the basics of LCEL before we really dig in.
00:02:51.960 | Then we'll be digging into streaming, which is an essential UX
00:02:56.200 | feature of AI applications in general. Streaming can just
00:03:01.120 | improve the user experience massively. And it's not just
00:03:04.920 | about streaming tokens, you know, that interface where you
00:03:07.960 | have word by word, the AI is generating text on the screen,
00:03:12.360 | streaming is more than just that. It is also the ability, if
00:03:16.920 | you've seen the interface of Perplexity, where as the agent
00:03:20.400 | is thinking, you're getting an update of what the agent is
00:03:23.800 | thinking about, what tools it is using, and how it is using those
00:03:27.320 | tools. That's also another essential feature that we need
00:03:30.600 | to have a good understanding of streaming to build. So we'll
00:03:33.960 | also be taking a look at all of that. Then, finally, we'll
00:03:38.080 | be topping it off with a capstone project where we will
00:03:42.320 | be building our own AI agent application that is going to
00:03:47.040 | incorporate all of these features, we're going to have an
00:03:50.240 | agent that can use tools, web search, we'll be using
00:03:53.080 | streaming, and we'll see all of this in a nice interface that we
00:03:57.760 | can work with. So what I've just gone through as an overview is, of
00:04:01.360 | course, very high level, there's
00:04:04.840 | a ton of stuff in here. And truly, this course can take you
00:04:07.960 | from, you know, wherever you are with LangChain at the moment,
00:04:11.160 | and whether you're a beginner or you've used it a bit or even
00:04:14.040 | intermediate, and you're probably going to learn a fair
00:04:17.000 | bit from it. So without any further ado, let's dive in to
00:04:22.280 | the first chapter. Okay, so the first chapter of the course,
00:04:27.120 | we're going to focus on when should we actually use
00:04:30.480 | LangChain? And when should we use something else? Now, through
00:04:34.320 | this chapter, we're not really going to focus too much on the
00:04:36.800 | code. Well, you know, every other chapter is very code
00:04:40.520 | focused. But this one is a little more just theoretical.
00:04:44.000 | Why LangChain? Where does it fit in? When should I use it? When
00:04:46.800 | should I not? So I want to just start by framing this. LangChain
00:04:51.560 | is one of, if not the most popular open source framework
00:04:57.360 | within the Python ecosystem, at least for AI. It works pretty
00:05:01.640 | well for a lot of things. And also works terribly for a lot of
00:05:04.600 | things as well, to be completely honest. There are massive pros,
00:05:08.000 | massive cons to using LangChain. Here, we're just going to
00:05:10.600 | discuss a few of those and see how LangChain maybe compares a
00:05:14.760 | little bit against other frameworks. So the very first
00:05:19.040 | question we should be asking ourselves is, do we even need a
00:05:22.680 | framework? Is a framework actually needed when we can just
00:05:28.480 | hit an API, you have the OpenAI API, other APIs, Mistral, so on,
00:05:32.840 | and we can get a response from an LLM in five lines of code on
00:05:36.960 | average for those, it's incredibly, incredibly simple. However, that
00:05:42.000 | can change very quickly. When we start talking about agents, or
00:05:47.080 | retrieval augmented generation, research assistants, all this
00:05:51.120 | sort of stuff, those use cases and methods can suddenly get
00:05:57.560 | quite complicated when we're outside of frameworks. And
00:06:02.640 | that's not necessarily a bad thing. Right? It can be
00:06:06.200 | incredibly useful to be able to just understand everything that
00:06:11.360 | is going on and build it yourself. But the problem is
00:06:15.680 | that to do that, you need time, like you need to learn all the
00:06:19.560 | intricacies of building these things, the intricacies of these
00:06:22.120 | methods themselves, like what, you know, how do they even work?
00:06:24.840 | And that kind of runs in the opposite direction of what we
00:06:28.840 | see with AI at the moment, which is AI is being integrated into
00:06:32.160 | the world at an incredibly fast rate. And because of this, most
00:06:38.880 | engineers coming into the space are not from a machine learning
00:06:43.280 | or AI background, most people don't necessarily have any
00:06:46.880 | experience with these systems, a lot of the engineers coming in
00:06:50.840 | could be DevOps engineers, generic backend Python
00:06:53.920 | engineers, even front end engineers coming in and
00:06:57.120 | building all these things, which is great, but they don't
00:07:00.400 | necessarily have the experience and that, you know, that might
00:07:02.920 | be you as well. And that's not a bad thing. Because the idea is
00:07:06.480 | that obviously you're going to learn and you're going to pick
00:07:08.560 | up a lot of these things. And in this scenario, there's quite a
00:07:12.520 | good argument for using a framework, because a framework
00:07:16.320 | means that you can get started faster. And a framework like
00:07:20.080 | LangChain, it abstracts away a lot of stuff. And that's a big
00:07:24.800 | complaint that a lot of people will have with LangChain. But
00:07:28.560 | that abstracting away of many things is also what made
00:07:33.120 | LangChain popular, because it means that you can come in not
00:07:35.680 | really knowing, okay, what, you know, RAG is, for example, and
00:07:39.240 | you can implement a RAG pipeline, get the benefits of it
00:07:42.080 | without really needing to understand it. And yes, there's
00:07:44.760 | an argument against that as well, just implementing
00:07:47.680 | something without really understanding it. But as we'll
00:07:50.440 | see throughout the course, it is possible to work with
00:07:54.640 | LangChain in a way, as we will in this course, where you kind
00:08:00.600 | of implement these things in an abstract way, and then break
00:08:03.400 | them apart, and start understanding the intricacies at
00:08:07.000 | least a little bit. So that can actually be pretty good.
00:08:10.640 | However, again, circling back to what we said at the start, if
00:08:17.000 | the idea or your application is just a very simple, you know,
00:08:20.360 | you need to generate some text based on some basic input,
00:08:23.920 | maybe you should just use an API, that's completely valid as
00:08:27.120 | well. Now, we just said, okay, a lot of people coming to
00:08:35.840 | LangChain might not be from an AI background. So another
00:08:35.840 | question for a lot of these engineers might be, okay, if I
00:08:38.120 | want to learn about, you know, RAG, agents, all these things,
00:08:42.520 | should I skip LangChain and just try and build it from scratch
00:08:46.800 | myself? Well, LangChain can help a lot with that learning
00:08:50.920 | journey. So you can start very abstract. And as you gradually
00:08:56.480 | begin to understand the framework better, you can strip
00:09:00.520 | away more and more of those abstractions and get more into
00:09:03.560 | the details. And in my opinion, this gradual shift towards more
00:09:08.800 | explicit code, with less abstraction, is a really nice
00:09:14.560 | feature. And it's also what we focus on, right? Throughout
00:09:17.680 | this course, that's what we're going to be doing. We're going
00:09:19.520 | to start abstract, stripping away the abstractions, and getting
00:09:23.240 | more explicit with what we're building. So for example,
00:09:26.000 | building an agent in LangChain, there's this very simple and
00:09:31.120 | incredibly abstract create tool-calling agent method that we can use.
00:09:36.080 | And it just creates a tool agent for you. It doesn't tell
00:09:41.160 | you anything. So you can use that, right. And we will use
00:09:46.320 | that initially in the course, but then you can actually go
00:09:49.800 | from that to defining your full agent execution logic, which is
00:09:56.120 | basically a tool call to OpenAI, you're going to be getting
00:09:59.880 | that tool information back, but then you've got to figure out,
00:10:02.280 | okay, how am I going to execute that? How am I going to store
00:10:04.720 | this information? And then how am I going to iterate through
00:10:07.520 | this? So we're going to be seeing that stripping away
00:10:11.560 | abstractions as we work through, as we build agents, as
00:10:15.360 | we build our streaming use case, among many other things,
00:10:18.960 | even chat memory, we'll see it there as well. So LangChain can
00:10:23.160 | act as the on-ramp to your AI learning experience. Then what
00:10:29.040 | you might find, and I do think this is quite true, for most
00:10:33.080 | people, is that if you're really serious about AI
00:10:37.360 | engineering, and that's what you want to do, like that's your
00:10:39.640 | focus, right, which isn't for everyone, for certain, a lot of
00:10:43.840 | people just want to understand a bit of AI, and they want to
00:10:46.360 | continue doing what they're doing, and just integrate AI
00:10:49.000 | here and there. And maybe those, you know, if that's your focus,
00:10:51.760 | you might stick with LangChain, there's not necessarily a reason
00:10:55.360 | to move on. But in the other scenario, where you're thinking,
00:10:59.880 | okay, I want to get really good at this, I want to just learn as
00:11:04.400 | much as I can. And I'm going to dedicate basically my, you know,
00:11:07.960 | short-term future of my career to becoming an AI engineer.
00:11:14.080 | Then LangChain might be the on-ramp, it might be your initial
00:11:18.280 | learning curve. But then after you've become competent with
00:11:21.680 | LangChain, you might actually find that you want to move on to
00:11:24.080 | other frameworks. And that doesn't necessarily mean that
00:11:26.600 | you're going to have wasted your time with LangChain. Because
00:11:30.000 | one, LangChain is the thing helping you learn. And two, one
00:11:33.640 | of the main frameworks that I recommend a lot of people to
00:11:37.080 | move on to is actually LangGraph, which is still within the
00:11:40.200 | LangChain ecosystem, and it still uses a lot of LangChain
00:11:43.840 | objects and methods. And, of course, concepts as well. So
00:11:48.720 | even if you do move on from LangChain, you may move on to
00:11:52.200 | something like LangGraph, which you need to know LangChain for
00:11:56.000 | anyway. And let's say you do move on to another framework
00:11:58.800 | instead. In that scenario, the concepts that you learn from
00:12:02.280 | LangChain are still pretty important. So to just finish up
00:12:05.600 | this chapter, I just want to summarize on that question of
00:12:10.160 | should you be using LangChain? What's important to remember is
00:12:14.400 | that LangChain does abstract a lot. Now, this abstraction of
00:12:18.680 | LangChain is both a strength and a weakness. With more
00:12:23.240 | experience, those abstractions can feel like a limitation. And
00:12:28.720 | that is why we sort of go with the idea that LangChain is
00:12:34.920 | really good to get started with. But as the project grows in
00:12:38.320 | complexity, or the engineers get more experience, they might move
00:12:41.040 | on to something like LangGraph, which, in any case, is going to
00:12:44.520 | be using LangChain to some degree. So in either one of
00:12:48.160 | those scenarios, LangChain is going to be a core tool in an AI
00:12:55.960 | engineer's toolkit. So it's worth learning in our opinion.
00:12:59.000 | But of course, it comes with its
00:13:02.280 | weaknesses. And it's just good to be aware that it's not a
00:13:04.920 | perfect framework. But for the most part, you will learn a lot
00:13:08.840 | from it, and you will be able to build a lot with it. So with all
00:13:13.120 | of that, we'll move on to our first sort of hands on chapter
00:13:17.840 | with LangChain, where we'll just introduce LangChain, some of the
00:13:22.800 | essential concepts, we're not going to dive too much into the
00:13:25.240 | syntax, but we're still going to understand a little bit of what
00:13:27.200 | we can do with it. Okay, so moving on to our next chapter,
00:13:29.880 | getting started with LangChain. In this chapter, we're going to
00:13:33.720 | be introducing LangChain by building a simple LLM-powered
00:13:38.040 | assistant that will do various things for us, it will be
00:13:41.040 | multimodal, generating some text, generating images,
00:13:44.760 | generating some structured outputs, it will do a few things.
00:13:47.960 | Now to get started, we will go over to the course repo, all of
00:13:53.240 | the code, all the chapters are in here, there are two ways of
00:13:56.840 | running this, either locally or in Google Colab, we would
00:14:01.000 | recommend running in Google Colab, because it's just a lot
00:14:03.880 | simpler with environments. But you can also run it locally. And
00:14:07.880 | actually, for the capstone, we will be running it locally,
00:14:11.640 | there's no way of us doing that in Colab. So if you would like
00:14:16.200 | to run everything locally, I'll show you how quickly now if you
00:14:19.520 | would like to run in Colab, which I would recommend at least
00:14:22.680 | for the first notebook chapters, just skip ahead, there will be
00:14:27.520 | chapter points in the timeline of the video. So for only
00:14:32.960 | running it locally, we just come down to here. So this actually
00:14:36.840 | tells you everything that you need. So you will need to
00:14:40.920 | install uv. Alright, so this is the package manager that we
00:14:45.000 | recommend for Python and package management, you
00:14:48.840 | don't need to use uv, it's up to you. uv is very simple, it
00:14:54.440 | works really well. So I would recommend that. So you would
00:14:57.680 | install it with this command here. This is on Mac, so it will
00:15:02.320 | be different if you are on Windows or elsewhere;
00:15:06.040 | you can look at the installation guide there and it will tell
00:15:08.560 | you what to do. And so before we actually do this, what I will
00:15:12.680 | do is go ahead and just clone this repo. So we'll come into
00:15:18.400 | here, I'm going to create like a temp directory for me because
00:15:21.680 | I already have the LangChain course in there. And what I'm
00:15:25.840 | going to do is just git clone the LangChain course repo. Okay, so you
00:15:29.800 | will also need to install git if you don't have that. Okay, so
00:15:34.880 | we have that, then what we'll do is copy this. Okay, so this
00:15:39.000 | will install Python 3.12.7 for us with this command, then this
00:15:44.360 | will create a new venv within that, using the Python 3.12.7 that
00:15:50.960 | we've installed. And then uv sync will actually be looking at
00:15:55.760 | the pyproject.toml file, that's like the package installation
00:16:00.520 | for the repo and using that to install everything that we need.
00:16:05.200 | Now, we should actually make sure that we are within the
00:16:08.160 | LangChain course directory. And then yes, we can run those
00:16:12.080 | three. There we go. So everything should install with
00:16:17.080 | that. Now, if you are in cursor, you can just do cursor dot or we
00:16:25.000 | can run code dot if in VS code, I'll just be running this. And
00:16:29.960 | then I've opened up the course. Now within that course, you have
00:16:34.080 | your notebooks and then you just run through these making sure
00:16:36.880 | you select your kernel, Python environment and making sure
00:16:39.960 | you're using the correct venv from here. So that should pop up
00:16:44.400 | already as this venv bin Python, and you'll click that and then
00:16:48.800 | you can run it through. When you are running locally, don't run
00:16:52.520 | these, you don't need to, you've already installed everything.
00:16:55.320 | This specifically is for Colab. So that is running
00:16:59.720 | things locally. Now let's have a look at running things in Colab.
00:17:05.080 | So for running everything in Colab, we have our notebooks in
00:17:09.160 | here, we click through, and then we have each of the chapters
00:17:12.400 | through here. So starting with the first chapter, the
00:17:16.120 | introduction, which is where we are now. So what you can do to
00:17:21.000 | open this in Colab is either just click this Colab button
00:17:24.600 | here. Or if you really want to, for example, maybe this is not
00:17:30.480 | loading for you, what you can do is you can copy the URL at the
00:17:34.840 | top here, you can go over to Colab, you can go to open GitHub,
00:17:40.920 | and then just paste that in there and press enter. And there
00:17:46.360 | we go, we have our notebook. Okay, so we're in now, what we
00:17:51.720 | will do first is just install the prerequisites. So we have
00:17:55.880 | LangChain, just a few LangChain packages here: langchain-core,
00:17:59.680 | langchain-openai because we're using OpenAI, and
00:18:04.120 | langchain-community, which is needed for running what we're running.
00:18:07.600 | Okay, so that has installed everything for us. So we can
00:18:12.200 | move on to our first step, which is initializing our LLM. So
00:18:18.800 | we're going to be using GPT-4o mini, which is a slightly smaller
00:18:23.280 | but fast, and also cheaper, model that is also very good
00:18:27.400 | from OpenAI. So what we need to do here is get an API key. Okay,
00:18:33.320 | so for getting the API key, we're going to go to OpenAI's
00:18:37.240 | website. And you can see here that we're opening platform.
00:18:40.560 | openai.com. And then we're going to go into settings organization
00:18:44.200 | API keys. So you can copy that or just click it from here.
00:18:49.160 | Okay, so I'm going to go ahead and create a new secret key to
00:18:54.120 | actually just in case you're kind of looking for where this
00:18:57.080 | is. It's settings organization API keys again, okay, create a
00:19:01.600 | new API key, I'm going to call it LangChain course. I'll just
00:19:08.240 | put it under semantic router, that's just my organization, you
00:19:11.600 | put it wherever you want it to be. And then you would copy your
00:19:16.320 | API key, you can see mine here, I'm obviously going to revert
00:19:20.040 | that before you see this, but you can try and use it if you
00:19:22.400 | really like. So I'm going to copy that. And I'm going to
00:19:25.080 | place it into this little box here. You could also just place
00:19:29.680 | it, put your full API key in here, it's up to you. But this
00:19:34.560 | little box just makes things easier. Now, what we've
00:19:39.040 | basically done there is just pass in our API key, we're
00:19:41.360 | setting our OpenAI model GPT-4o mini. And what we're going to be
00:19:45.880 | doing now is essentially just connecting and setting up our
00:19:49.960 | LLM parameters with LangChain. So we run that, we say okay,
00:19:55.680 | we're using GPT-4o mini. And we're also setting ourselves up
00:19:59.880 | to use two different LLMs here, or two of the same LLM with
00:20:04.560 | slightly different settings. So the first of those is an LLM
00:20:08.240 | with a temperature setting of zero. The temperature setting
00:20:11.480 | basically controls almost the randomness of the output of
00:20:17.160 | your LLM. And the way that it works is when an LLM is
00:20:22.520 | predicting the next token, or next word in a sequence, it'll
00:20:28.040 | provide a probability actually for all of the tokens within the
00:20:31.360 | LLM's vocabulary, i.e. what the LLM has been trained on. So
00:20:35.280 | what we do when we set a temperature of zero is we say
00:20:38.400 | you are going to give us the token with highest probability
00:20:43.960 | according to you, okay. Whereas when we set a temperature of
00:20:48.160 | 0.9, what we're saying is, okay, there's actually an increased
00:20:52.280 | probability of you giving us a token that according to your
00:20:57.720 | generated output is not the token with the highest
00:21:01.000 | probability according to the LLM. But what that tends to do
00:21:04.240 | is give us more sort of creative outputs. So that's what the
00:21:08.120 | temperature does. So we are creating a normal LLM and then
00:21:12.680 | a more creative LLM with this.
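As a rough sketch (not the exact notebook code), initializing these two ChatOpenAI instances might look something like this, assuming the OpenAI API key has been provided:

```python
import os
from getpass import getpass
from langchain_openai import ChatOpenAI

# Provide the OpenAI API key (the notebook uses a small input box for this).
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass("OpenAI API key: ")

# Temperature 0: always pick the highest-probability token (more deterministic).
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

# Temperature 0.9: sample lower-probability tokens more often (more "creative").
creative_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.9)
```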
00:21:16.480 | So what are we going to be building? We're going to be taking a draft article from the
00:21:24.960 | Aurelio learning page, and we're going to be using LangChain to
00:21:29.040 | generate various things that we might find helpful as well. You
00:21:34.000 | know, we have this article draft and we're editing it and just
00:21:36.520 | kind of like finalizing it. So what are those going to be? You
00:21:39.920 | can see them here. We have the title for the article,
00:21:43.040 | description, an SEO friendly description, specifically. The
00:21:47.840 | third one, we're going to be getting the LLM to provide us
00:21:50.720 | advice on an existing paragraph and essentially writing a new
00:21:54.280 | paragraph for us from that existing paragraph. And what
00:21:57.440 | it's going to do, this is the structured output part is going
00:22:00.960 | to write a new version of that paragraph for us. And it's going
00:22:03.800 | to give us advice on where we can improve our writing. Then
00:22:07.000 | we're going to generate a thumbnail hero image for our
00:22:11.080 | article. So a nice image that you would put at the top. So
00:22:14.520 | here, we're just going to input our article, you can put
00:22:19.040 | something else in here if you like. Essentially, this is just
00:22:22.200 | a big article that's written a little while back on agents. And
00:22:28.680 | now we can go ahead and start preparing our prompts, which are
00:22:32.480 | essentially the instructions for our LLM. So LangChain comes
00:22:36.960 | with a lot of different like utilities for prompts, and we're
00:22:41.800 | going to dive into them in a lot more detail. But I do want to
00:22:44.160 | just give you the essentials now, just so you can understand
00:22:48.040 | what we're looking at, at least conceptually. So prompts for
00:22:51.480 | chat agents are at a minimum broken up into three
00:22:55.000 | components. Those are the system prompt, this provides
00:22:58.800 | instructions to our LLM on how it should behave, what its
00:23:01.600 | objective is, and how it should go about achieving that
00:23:04.640 | objective. Generally, system prompts are going to be a bit
00:23:08.440 | longer than what we have here, depending on the use case, then
00:23:11.880 | we have our user prompts. So these are user written
00:23:15.200 | messages. Usually, sometimes we might want to pre populate
00:23:18.680 | those if we want to encourage a particular type of
00:23:21.400 | conversational patterns from our agent. But for the most part,
00:23:26.640 | yes, these are going to be user generated. Then we have our AI
00:23:30.920 | prompts. So these are, of course, AI generated. And again,
00:23:35.200 | in some cases, we might want to generate those ourselves
00:23:38.360 | beforehand or within a conversation if we have a
00:23:41.960 | particular reason for doing so. But for the most part, you can
00:23:45.080 | assume that these are actually user and AI generated. Now,
00:23:49.480 | LangChain provides us with templates for each one of these
00:23:54.480 | prompt types. Let's go ahead and have a look at what these look
00:23:58.600 | like within LangChain. So to begin, we are looking at this
00:24:03.560 | one. So we have our system message prompt template and
00:24:07.920 | human message prompt template, the user one we saw before. So we have these
00:24:12.120 | two. The system prompt, keeping it quite simple here: you are an AI
00:24:15.520 | system that helps generate article titles, right. So our
00:24:18.640 | first component we want to generate is article title. So
00:24:22.160 | we're telling the AI, that's what we want it to do. And then
00:24:26.600 | here, right. So here, we're actually providing kind of like
00:24:32.680 | a template for a user input. So yes, as I mentioned, user input
00:24:40.000 | can be, it can be fully generated by user, it might be
00:24:44.920 | kind of not generated by user, it might be setting up a
00:24:48.400 | conversation beforehand, which a user would later use, or in
00:24:52.320 | this scenario, we're actually creating a template, and the
00:24:57.040 | what the user will provide us will actually just be inserted
00:25:04.400 | here inside article. And that's why we have this input
00:25:04.400 | variables. So what this is going to do is okay, we have all of
00:25:09.800 | these instructions around here, they're all going to be
00:25:12.920 | provided to OpenAI as if it is the user saying this, but it
00:25:16.760 | will actually just be this here, that user will be providing,
00:25:21.800 | okay. And we might want to also format this a little nicer, it
00:25:24.680 | kind of depends, this will work as it is. But we can also put,
00:25:28.320 | you know, something like this to make it a little bit clearer
00:25:31.400 | to the LLM. Okay, what is the article? Where are the prompts?
00:25:36.840 | So we have that, you can see in this scenario, there's not that
00:25:42.680 | much difference to what the system prompt and user prompt is
00:25:45.120 | doing. And this is, it's a particular scenario, it varies
00:25:48.440 | when you get into the more conversational stuff, as we will
00:25:50.920 | do later, you'll see that the user prompt is generally more
00:25:55.640 | fully user generated, or mostly user generated. And much of
00:26:01.160 | these types of instructions, we might actually be putting into
00:26:04.960 | the system prompt, it varies. And we'll see throughout the
00:26:07.680 | course, many different ways of using these different types of
00:26:11.560 | prompts in various different places. Then you'll see here, so
00:26:16.400 | I just want to show you how this is working, we can use this
00:26:20.120 | format method on our user prompt here to actually insert
00:26:24.640 | something within the article input here. So we're going to
00:26:29.840 | call user prompt dot format, and then we pass in something for article.
00:26:32.920 | Okay. And we can also maybe format this a little nicer, but
00:26:37.240 | I'll just show you this for now. So we have our human message.
00:26:39.800 | And then inside content, this is the text that we had, right, you
00:26:43.200 | can see that we have all this, right. And this is what we wrote
00:26:46.000 | before, we wrote all this, except for this part, we didn't write
00:26:50.000 | this, instead of this, we had article, right. So let's format
00:26:55.920 | this a little nicer so that we can see. Okay, so this is
00:26:59.600 | exactly what we wrote up here, exactly the same, except for
00:27:02.600 | now we have test string instead of article. So later, when we
00:27:06.840 | insert our article, it's going to go inside there.
00:27:10.520 | It's basically like an f-string in Python, okay. And
00:27:14.440 | this is again, this is one of the things where people might
00:27:16.760 | complain about LangChain, you know, this sort of thing can,
00:27:20.000 | you know, seem excessive, because you could just do this
00:27:23.120 | with an f-string. But there are, as we'll see later, particularly
00:27:26.240 | when you're streaming, just really helpful features that
00:27:29.960 | come with using LangChain's kind of built-in prompt templates,
00:27:35.360 | or at least message objects that we will see. So, you know, we
00:27:42.160 | need to keep that in mind. Again, as things get more
00:27:45.080 | complicated, LangChain can be a bit more useful. So, chat
00:27:48.880 | prompt template, this is basically just going to take
00:27:52.680 | what we have here, our system prompt, user prompts, we could
00:27:55.120 | also include some AI prompts in there. And what it's going to do
00:27:59.560 | is merge both of those. And then when we do format, what it's
00:28:05.400 | going to do is put both of those together into a chat history.
00:28:09.120 | Okay, so let's see what that looks like. First, in a more
00:28:13.080 | messy way. Okay, so you can see we have just the content, right?
00:28:18.840 | So it doesn't include the whole, you know, before we had human
00:28:22.120 | message, we're not seeing anything like
00:28:24.520 | that here. Instead, we're just seeing the string. So now let's
00:28:28.680 | switch back to print. And we can see that what we have is our
00:28:33.880 | system message here, it's just prefixed with this system. And
00:28:37.160 | then we have human, and it's prefixed by human, and then it
00:28:39.840 | continues, right? So that's, that's all it's doing is just
00:28:42.320 | kind of merging those in some sort of chat log, we could also
00:28:45.000 | put in like AI messages, and they would appear in there as
00:28:47.680 | well. Okay, so we have that.
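As a minimal sketch of the prompt setup being described here (the exact wording in the notebook differs, this is just illustrative):

```python
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
)

# System prompt: how the model should behave.
system_prompt = SystemMessagePromptTemplate.from_template(
    "You are an AI assistant that helps generate article titles."
)

# User prompt template: {article} becomes an input variable.
user_prompt = HumanMessagePromptTemplate.from_template(
    "Here is the article for you to examine:\n\n---\n\n{article}\n\n---"
)

# Formatting the user prompt alone returns a HumanMessage with {article} filled in.
print(user_prompt.format(article="TEST STRING"))

# ChatPromptTemplate merges the messages into a single chat-style prompt.
first_prompt = ChatPromptTemplate.from_messages([system_prompt, user_prompt])
print(first_prompt.format(article="TEST STRING"))
```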
00:28:52.280 | Now, that is our prompt template. Let's put that together with an LLM to create
00:28:55.600 | what would, in the past, in LangChain have been called an LLM chain.
00:28:59.800 | Now, we wouldn't necessarily call it an LLM chain, because
00:29:03.040 | we're not using the LLM chain abstraction, it's not super
00:29:05.880 | important, if that doesn't make sense, we'll go into it in more
00:29:09.160 | detail later, particularly in the LCEL chapter. So what
00:29:15.080 | this chain will do, think LangChain is just chains, we're
00:29:20.240 | chaining together these multiple components, it will perform the
00:29:24.040 | steps: prompt formatting, so that's what I just showed you;
00:29:27.160 | LLM generation, so sending our prompt to OpenAI, getting a
00:29:32.960 | response and getting that output. So you can also add
00:29:37.280 | another step here, if you want to format that in a particular
00:29:39.840 | way, we're going to be outputting that in a particular
00:29:42.440 | format so that we can feed it into the next step more easily.
00:29:45.040 | But there are also things called output parsers, which parse
00:29:48.800 | your output in a more dynamic or complicated way, depending on
00:29:53.840 | what you're doing. So this is our first look at LCEL, I don't
00:29:58.360 | want us to focus too much on the syntax here, because we will be
00:30:01.000 | doing that later. But I do want you to just understand what is
00:30:04.840 | actually happening here. And logically, what are we writing?
00:30:10.880 | So all we really need to know right now is we define our
00:30:15.680 | inputs with the first dictionary segment here. Alright, so this
00:30:20.000 | is a, you know, our inputs, which we have defined already,
00:30:24.080 | okay. So if we come up to our user prompt here, we said input
00:30:30.840 | variable is our article, right. And we might have also added
00:30:34.000 | input variables to the system prompt here as well. In that
00:30:36.880 | case, you know, let's say we had "you're an AI assistant called name",
00:30:43.920 | right, that helps generate article titles. In this
00:30:48.720 | scenario, we might have input variables, name here, right. And
00:30:55.360 | then what we would have to do down here is we would also have
00:31:00.560 | to pass that in, right. So it also we would have article, we
00:31:04.200 | would also have name. So basically, we just need to make
00:31:09.720 | sure that in here, we're including the variables that we
00:31:13.720 | have defined as input variables for our, our first prompts.
00:31:17.680 | Okay, so we can actually go ahead and let's add that. So we
00:31:21.120 | can see it in action. So run this again, and just include
00:31:26.640 | that or reinitialize our first prompt. So we see that. And if
00:31:32.320 | we just have a look at what that means for this format function
00:31:35.480 | here, it means we'll also need to pass in a name, okay, and
00:31:39.280 | call it Joe. Okay, so Joe, the AI, right, so you're an AI
00:31:43.880 | assistant called Joe now. Okay, so we have Joe, our AI, that is
00:31:48.400 | going to be fed in through these input variables. Then we have
00:31:51.560 | this pipe operator, the pipe operator is basically saying
00:31:54.960 | whatever is to the left of the pipe operator, which in this
00:31:58.320 | case would be this is going to go into whatever is on the right
00:32:02.360 | of the pipe operator. It's that simple. Again, we'll dive into
00:32:06.240 | this and kind of break it apart in the LCEL chapter. But for now,
00:32:09.320 | that's all we need to know. So this is going to go into our
00:32:13.120 | first prompt, that is going to format everything, it's going to add
00:32:16.680 | the name and the article that we've provided into our first
00:32:19.320 | prompt. And it's going to output that, right, output that we have
00:32:23.240 | our pipe operator here. So the output of this is going to go
00:32:26.160 | into the input of our next step, our creative LLM, then that is
00:32:32.800 | going to generate some tokens, it's going to generate our
00:32:35.320 | output, that output is going to be an AI message. And as you saw
00:32:40.600 | before, if I take this bit out, within those message objects, we
00:32:47.200 | have this content field, okay, so we are actually going to
00:32:50.720 | extract the content field out from our AI message to just get
00:32:56.640 | the content. And that is what we do here. So we get the AI
00:32:59.680 | message out from our LLM. And then we're extracting the content
00:33:03.040 | from that AI message object. And we're going to pass it into a
00:33:05.800 | dictionary that just contains article title, like so. Okay, we
00:33:10.200 | don't need to do that, we can just get the AI message
00:33:12.440 | directly. I just want to show you how we are using this sort
00:33:17.120 | of chain in LCEL. So once we have set up our chain, we then
00:33:23.000 | call it or execute it using the invoke method. Into that we will
00:33:27.400 | need to pass in those variables. So we have our article already,
00:33:30.800 | but we also gave our AI name now. So let's add that. And
00:33:34.520 | we'll run this. Okay, so Joe has generated us an article title,
00:33:42.440 | unlocking the future, the rise of neuro symbolic AI agents.
00:33:46.600 | Cool, much better name than what I gave the article, which was
00:33:50.440 | AI agents are neuro symbolic systems. I don't think I did too
00:33:55.320 | bad. Okay, so we have that.
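For reference, a chain like the one just described could be sketched roughly as below, reusing the first_prompt and creative_llm objects assumed above (and assuming the system prompt was given a {name} input variable as discussed):

```python
# Each | pipes the output of one step into the next; the leading dict maps
# the chain's input dictionary onto the prompt's input variables.
chain_one = (
    {
        "article": lambda x: x["article"],
        "name": lambda x: x["name"],
    }
    | first_prompt                                        # format system + user messages
    | creative_llm                                        # call the model -> AIMessage
    | (lambda ai_msg: {"article_title": ai_msg.content})  # extract just the text
)

article_title = chain_one.invoke({"article": article, "name": "Joe"})
print(article_title)
```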
00:34:01.160 | Now, let's continue. And what we're going to be doing is building more of these types of LLM chain
00:34:05.560 | pipelines, where we're feeding in some prompts, we're
00:34:09.560 | generating something, getting something and doing something
00:34:12.440 | with it. So as mentioned, we have the title, we're now moving
00:34:16.840 | on to the description. So I want to generate a description. So we
00:34:19.680 | have our human message prompt template. So this is actually
00:34:22.160 | going to go into a similar format as before, we probably
00:34:27.840 | also want to redefine this because I think I'm using the
00:34:30.680 | same system message there. So let's go ahead and
00:34:35.280 | modify that. Or what we could also do is let's just remove the
00:34:41.360 | name now because I've shown you that. So what we could do is
00:34:46.080 | you're an AI system that helps build good articles, right,
00:34:52.000 | build good articles. And we could just use this as our, you
00:34:56.520 | know, generic system prompt now. So let's say that's our new
00:35:00.120 | system prompt. Now we have our user prompt, you're tasked with
00:35:03.000 | creating a description for the article, the article is here for
00:35:05.320 | you to examine, and here is the article title. Okay, so we
00:35:09.000 | need the article title now as well, and our input variables.
00:35:11.920 | Now we're going to output an SEO friendly article description.
00:35:15.320 | And we're just saying, just to be certain here, do not output
00:35:18.680 | anything other than the description. So you know,
00:35:21.160 | sometimes an LLM might say, Hey, look, this is what I generated
00:35:24.960 | for you. The reason I think this is good is because so on and so
00:35:27.520 | on and so on. Right? If you're programmatically taking some
00:35:31.120 | output from an LLM, you don't want all of that fluff around
00:35:34.640 | what the LLM has generated, you just want exactly what you've
00:35:38.120 | asked it for. Okay, because otherwise, you need to parse that out
00:35:40.920 | with code, and it can get messy, and also just far less reliable.
00:35:44.840 | So we're just saying do not put anything else. Then we're
00:35:48.560 | putting all of these together. So system prompt and the second
00:35:50.880 | user prompt, this one here, putting those together into a
00:35:54.560 | new chat prompt template. And then we're going to feed all of
00:35:58.520 | that into another LCEL chain, as we have here, to generate our description.
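A rough sketch of this second chain, with its two input variables, might look like the following (the prompt wording is approximate, and system_prompt, llm, article, and article_title are assumed from the earlier steps):

```python
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

second_user_prompt = HumanMessagePromptTemplate.from_template(
    "You are tasked with creating a description for the article. "
    "The article is here for you to examine:\n\n---\n\n{article}\n\n---\n\n"
    "Here is the article title '{article_title}'.\n\n"
    "Output the SEO friendly article description. Do not output anything "
    "other than the description."
)

second_prompt = ChatPromptTemplate.from_messages([system_prompt, second_user_prompt])

chain_two = (
    {
        "article": lambda x: x["article"],
        "article_title": lambda x: x["article_title"],
    }
    | second_prompt
    | llm
    | (lambda ai_msg: {"summary": ai_msg.content})
)

article_description = chain_two.invoke(
    {"article": article, "article_title": article_title["article_title"]}
)
```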
00:36:04.360 | So let's go ahead, we invoke that as before, we're
00:36:07.800 | just making sure we add in the article title that we got from
00:36:10.960 | before. And let's see what we get. Okay, so we have this
00:36:15.400 | "explore the transformative potential of neuro-symbolic AI
00:36:18.280 | agents in..." It's a little bit long, to be honest. But yeah, you can see
00:36:23.160 | what it's doing here. Right. And of course, we could then go in,
00:36:26.160 | we see this kind of too long, like SEO friendly description,
00:36:30.200 | not, not really. So we can modify this. I'll put the SEO
00:36:35.440 | friendly description, make sure we don't exceed, let me put on
00:36:42.440 | a new line, make sure we don't exceed, say 200 characters, or
00:36:46.760 | maybe it's even less for SEO, I don't have a clue. I
00:36:49.960 | would just say 120 characters, do not output anything other than
00:36:53.600 | the description. Right. So we could just go back, modify our
00:36:56.400 | prompting, see what that generates again. Okay, so much
00:37:00.640 | shorter, probably too short now, but that's fine. Cool. So we
00:37:08.000 | have that, we have our description, and that's now in
00:37:08.000 | this dictionary format that we have here. Cool. Now the third
00:37:12.600 | step, we want to consume that first article variable with our
00:37:17.240 | full article. And we're going to generate a few different output
00:37:22.200 | fields. So for this, we're going to be using the structured
00:37:26.520 | output feature. So let's scroll down, we'll see what that is,
00:37:31.920 | what that looks like. So structured output is essentially
00:37:36.120 | we're forcing the LLM, like it has to output a dictionary with
00:37:40.960 | these particular fields. Okay. And we can modify this quite a
00:37:45.440 | bit. But in this scenario, what I want to do is I want there to
00:37:49.600 | be an original paragraph, right, so I just want it to regenerate
00:37:52.720 | the original paragraph, because I'm lazy, and I don't want to
00:37:54.720 | extract it out, then I want to get the new edited paragraph,
00:37:59.520 | this is the LLM-generated improved paragraph, and then we
00:38:03.600 | want to get some feedback because we don't want to just
00:38:06.120 | automate ourselves, we want to augment ourselves and get better
00:38:10.760 | with AI rather than just having it do everything for us. So
00:38:14.880 | that's what we do here. And you can see that here we're using
00:38:18.320 | this Pydantic object. And what Pydantic allows us to do is
00:38:21.960 | define these particular fields. And it also allows us to assign
00:38:25.840 | these descriptions to a field, and LangChain is actually going
00:38:29.000 | to go ahead and read all of this, right, it even reads these. So for
00:38:32.760 | example, we could put integer here, and we could actually get
00:38:35.640 | a numeric score for our paragraph, right, we can try
00:38:40.280 | that, right. So let's just try that quickly,
00:38:42.600 | I'll show you. So, numeric score. In fact, let's
00:38:48.400 | even just ignore, let's not put anything here. So I'm going to
00:38:51.600 | put constructive feedback on the original paragraph, I'll just put
00:38:54.200 | that in here. So let's see what happens. Okay, so we have that.
00:38:58.200 | And what I'm going to do is I'm going to get our creative LLM,
00:39:01.600 | I'm going to use this with_structured_output method. And
00:39:04.320 | that's actually going to modify that LLM class, create a new LLM
00:39:07.320 | class that forces the LLM to use this structure for the output, right,
00:39:12.440 | so passing in Paragraph into here. Using this, we're creating
00:39:15.880 | this new structured LLM. So let's run that and see what happens.
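A rough sketch of the Pydantic model and structured-output LLM being described (field names and descriptions are illustrative):

```python
from pydantic import BaseModel, Field

class Paragraph(BaseModel):
    original_paragraph: str = Field(description="The original paragraph")
    edited_paragraph: str = Field(description="The improved, edited paragraph")
    # Using int here is what produces the odd numeric "feedback" seen below;
    # switching the type to str gives written feedback instead.
    feedback: int = Field(description="Constructive feedback on the original paragraph")

# with_structured_output returns a new runnable whose output is forced into the
# Paragraph schema (under the hood this uses OpenAI tool/function calling).
structured_llm = creative_llm.with_structured_output(Paragraph)
```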
00:39:21.360 | Okay, so we're going to modify our chain accordingly, maybe
00:39:25.960 | what I can do is also just remove this bit for now. So we
00:39:30.960 | can just see what the structured LLM outputs directly. And let's
00:39:34.800 | see. Okay, so now you can see that we actually have that
00:39:41.080 | paragraph object, right, the one we defined up here, which is
00:39:43.800 | kind of cool. And then in there, we have the original
00:39:46.760 | paragraph, right. So this is where this is coming from. I
00:39:51.200 | definitely remember writing something that looks a lot like
00:39:54.160 | that. So I think that is correct. We have the edited
00:39:57.160 | paragraph. So this is okay, what it thinks is better. And then
00:40:00.960 | interestingly, the feedback is three, which is weird, right?
00:40:05.400 | Because here we said the constructive feedback on the
00:40:08.760 | original paragraph. But what we're doing when we use this
00:40:12.080 | with_structured_output, what LangChain is doing is
00:40:15.480 | essentially performing a tool call to OpenAI. And what a tool
00:40:19.160 | call can do is force a particular structure in the
00:40:22.480 | output of an LLM. So when we say feedback has to be an integer,
00:40:27.080 | no matter what we put here, it's going to give us an integer.
00:40:30.200 | Because how do you provide constructive feedback with an
00:40:33.480 | integer? It doesn't really make sense. But because we've set that
00:40:37.200 | limitation, that restriction here, that is what it does. It
00:40:41.760 | just gives us the numeric value. So I'm going to shift that to
00:40:45.680 | string. And then let's rerun this, see what we get. Okay, we
00:40:49.360 | should now see that we actually do get constructive feedback.
00:40:52.640 | Alright, so yeah, you can see it's quite, quite long. So the
00:40:56.480 | original paragraph effectively communicates limitations with
00:40:59.040 | neural AI systems in performing certain tasks. However, it could
00:41:03.080 | benefit from slightly improved clarity and conciseness. For
00:41:06.400 | example, the phrase was becoming clear can be made more direct by
00:41:09.960 | changing it to became evident. Yeah, true. Thank you very much.
00:41:15.240 | So yeah, now we actually get that that feedback, which is
00:41:19.480 | pretty nice. Now let's add in this final step to our chain.
00:41:24.440 | Okay, and it's just going to pull out our paragraph object
00:41:28.960 | here and extract it into a dictionary, we don't necessarily
00:41:31.960 | need to do this. Honestly, I actually kind of prefer it
00:41:34.280 | within this paragraph object. But just so we can see how we
00:41:38.680 | would parse things on the other side of the chain. Okay, so now
00:41:43.680 | we can see we've extracted that out. Cool. So we have all of
00:41:49.120 | that interesting feedback again. But let's leave it there for the
00:41:54.560 | text part of this. Now let's have a look at the sort of
00:41:58.360 | multimodal features that we can work with. So this is, you know,
00:42:02.400 | maybe one of those things that kind of seems a bit more
00:42:04.600 | abstracted, a little bit complicated, where it maybe
00:42:08.120 | could be improved. But you know, we're not going to really be
00:42:10.920 | focusing too much on the multimodal stuff, we'll still be
00:42:13.440 | focusing on language, but I did want to just show you very
00:42:16.280 | quickly. So we want this article to look better. Okay, we want to
00:42:22.000 | generate a prompt based on the article itself, that we can then
00:42:28.640 | pass to DALL-E, the image generation model from OpenAI,
00:42:32.600 | that will then generate an image like a like a thumbnail image
00:42:36.320 | for us. Okay. So the first step of that is we're actually going
00:42:41.160 | to get an LLM to generate that. Alright, so we have our prompt
00:42:44.760 | that we're going to use for that. So I'm gonna say generate
00:42:47.200 | a prompt with less than 500 characters to generate an image
00:42:51.600 | based on the following article. Okay, so that's our prompt.
00:42:55.240 | Yeah, super simple. We're using the generic prompt template
00:42:58.920 | here, you can use that you can use user prompt template, it's
00:43:02.480 | up to you. This is just like the generic prompt template, then
00:43:06.560 | what we're going to be doing is based on what this outputs,
00:43:11.120 | we're then going to feed that in to this generate and display
00:43:15.000 | image function via the image prompt parameter that is going
00:43:19.320 | to use the DALL-E API wrapper from LangChain, it's going to run
00:43:23.560 | that image prompt, and we're going to get a URL out from
00:43:26.720 | that, essentially. And then we're going to read that using
00:43:29.960 | skimage here, right, so it's going to read that image URL,
00:43:33.000 | going to get the image data, and then we're just going to display
00:43:36.120 | it. Okay, so pretty straightforward. Now, again, this
00:43:42.200 | is an LCEL thing here that we're doing, we have this
00:43:46.160 | RunnableLambda thing, when we're running functions within
00:43:50.480 | LCEL, we need to wrap them within this RunnableLambda, I,
00:43:54.400 | you know, I don't want to go too much into what this is doing
00:43:57.720 | here, because we do cover it in the LCEL chapter. But it's just,
00:44:01.760 | you know, all you really need to know is we have a custom
00:44:04.040 | function, wrap it in RunnableLambda. And then what we get
00:44:07.840 | from that we can use within this here, right, the LCEL
00:44:12.000 | syntax. So what are we doing here, let's figure this out, we
00:44:15.960 | are taking our original image prompt that we defined just up
00:44:19.840 | here, right, input variable to that is article. Okay, we have
00:44:25.800 | our article data being input here, feeding that into our
00:44:29.120 | prompt. From there, we get our message that we then feed into
00:44:33.640 | our LLM. From the LLM, it's going to generate us, like, an image
00:44:37.960 | prompt, like a prompt for generating our image for this
00:44:41.520 | article, we can even, let's print that out, so that we can
00:44:45.920 | see what it generates, because I'm also kind of curious. Okay,
00:44:49.920 | so we'll just run that. And then let's see, it will feed in that
00:44:55.480 | content into our runnable, which is basically this function here.
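As a rough sketch of the wiring being described (the generate_and_display_image helper and the display details are assumptions here, not the exact notebook code, and llm and article come from the earlier steps):

```python
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_community.utilities.dalle_image_generator import DallEAPIWrapper
from skimage import io as skio
import matplotlib.pyplot as plt

image_prompt = PromptTemplate.from_template(
    "Generate a prompt with less than 500 characters to generate an image "
    "based on the following article: {article}"
)

def generate_and_display_image(image_prompt_text: str) -> None:
    # The DALL-E wrapper sends the prompt to the image API and returns an image URL.
    image_url = DallEAPIWrapper().run(image_prompt_text)
    image_data = skio.imread(image_url)  # read the image straight from the URL
    plt.imshow(image_data)
    plt.axis("off")
    plt.show()

# Custom functions must be wrapped in RunnableLambda to sit inside an LCEL chain.
image_chain = (
    {"article": lambda x: x["article"]}
    | image_prompt
    | llm
    | (lambda ai_msg: ai_msg.content)      # extract the generated image prompt
    | RunnableLambda(generate_and_display_image)
)

image_chain.invoke({"article": article})
```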
00:45:00.080 | And we'll see what it generates. Okay, don't expect anything
00:45:03.880 | amazing from DALL-E, it's not the best, to be honest,
00:45:07.720 | but at least we see how to use it. Okay, so we can see the
00:45:12.800 | prompt that was used here, create an image that visually
00:45:15.280 | represents the concept of neuro symbolic agents depict a
00:45:18.360 | futuristic interface where a large language model interacts
00:45:22.120 | with traditional code, symbolizing integration of, oh,
00:45:25.440 | my gosh, something computation include elements like a brain to
00:45:29.880 | represent neural networks, gears or circuits or symbolic logic,
00:45:34.600 | and a web of connections illustrating vast use cases of
00:45:38.480 | AI agents. Oh, my gosh, look at all that. Big prompt, then we
00:45:44.480 | get this. So you know, DALL-E is interesting, I would say, we
00:45:48.160 | could even take this, let's just see what that comes up with in
00:45:51.880 | something like Midjourney, you can see these way cooler images
00:45:56.640 | that we get from just another image generation model far
00:45:59.640 | better, but pretty cool, honestly. So in terms of
00:46:02.800 | generating images, the phrasing of the prompt itself is
00:46:06.600 | actually pretty good. The image, you know, could be better. But
00:46:11.440 | that's it, right. So with all of that, we've seen a little
00:46:15.760 | introduction to what we might be building with LangChain.
00:46:18.520 | So that's it for our introduction chapter. As I
00:46:21.560 | mentioned, we don't want to go too much into what each of these
00:46:24.800 | things is doing, I just really want to focus on, okay, this is
00:46:33.800 | kind of how we're building something with LangChain. This
00:46:37.880 | is the overall flow. We don't really want to be focusing too
00:46:42.960 | much on, okay, what exactly LCEL is doing, or what exactly, you
00:46:42.960 | know, this prompt thing is that we're setting up, we're going to
00:46:47.080 | be focusing much more on all of those things, and much more in
00:46:50.880 | the upcoming chapters. So for now, we've just seen a little
00:46:55.600 | bit of what we can build before diving in, in more detail. Okay,
00:46:59.760 | so now we're going to take a look at AI observability using
00:47:04.680 | LangSmith. Now, LangSmith is another piece of the broader
00:47:08.720 | LangChain ecosystem. Its focus is on allowing us to see what
00:47:14.960 | our LLMs, agents, etc, are actually doing. And it's
00:47:18.840 | something that we would definitely recommend using if
00:47:21.720 | you are going to be using LangChain and LangGraph. Now let's
00:47:24.200 | take a look at how we would set LangSmith up, which is
00:47:27.600 | incredibly simple. So I'm going to open this in Colab. And I'm
00:47:31.960 | just going to install the prerequisites here. You'll see
00:47:35.120 | these are all the same as before, but we now have the
00:47:37.280 | LangSmith library here as well. Now, we are going to be using
00:47:41.320 | LangSmith throughout the course. So in all the following chapters,
00:47:45.200 | we're going to be importing LangSmith, and that will be
00:47:48.440 | tracking everything we're doing. But you don't need
00:47:50.720 | LangSmith to go through the course, it's an optional
00:47:53.680 | dependency. But as mentioned, I would recommend it. So we'll
00:47:57.240 | come down to here. And first thing that we will need is the
00:48:00.040 | LangChain API key. Now we do need an API key, but that does
00:48:04.600 | come with a reasonable free tier. So we can see here, they
00:48:09.640 | have each of the plans. And this is the one that we are by
00:48:13.160 | default on. So it's free for one user, up to 5,000 traces per
00:48:20.200 | month. If you're building out an application, I think it's
00:48:23.080 | fairly easy to go beyond that, but it really depends on what
00:48:26.000 | you're building. So it's a good place to start with. And then of
00:48:29.640 | course, you can upgrade as required. So we would go to
00:48:35.000 | smith.langchain.com. And you can see here that this will log me
00:48:40.040 | in automatically, I have all of these tracing projects, these
00:48:43.560 | are all from me running the various chapters of the course.
00:48:46.360 | If you do use LangSmith throughout the course, your
00:48:49.560 | LangSmith dashboard will end up looking something like this.
00:48:52.640 | Now, what we need is an API key. So we go over to settings, we
00:48:58.800 | have API keys, and we're just going to create an API key.
00:49:02.240 | Because we're just going through some personal learning right
00:49:05.120 | now, I would go with personal access token, we can give a name
00:49:08.400 | or description if you want. Okay, and we'll just copy that.
00:49:12.000 | And then we come over to our notebook, and we enter our API
00:49:15.200 | key there. And that is all we actually need to do. That's
00:49:18.240 | absolutely everything. I suppose the one thing to be aware of is
00:49:21.320 | that you should set your LangChain project to whatever
00:49:24.280 | project you're working within. Within the course, we have
00:49:27.320 | individual project names for each chapter. But for your
00:49:30.800 | own projects, you should make sure this is something
00:49:33.320 | that you recognize and is useful to you.
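For reference, the setup described here boils down to a few environment variables. A minimal sketch, assuming the standard LANGCHAIN_* tracing variables and a placeholder project name:

```python
import os
from getpass import getpass

# Enable LangSmith tracing and point it at a named project.
# These are the commonly used LangSmith environment variables;
# adjust the project name to something meaningful for your own work.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = getpass("Enter LangSmith API key: ")
os.environ["LANGCHAIN_PROJECT"] = "langchain-course-langsmith-openai"
```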
00:49:37.840 | So LangSmith actually does a lot without us needing to do anything more.
00:49:40.680 | We can go through, initialize our LLM, and start invoking it
00:49:43.960 | to see what LangSmith returns to us. So
00:49:48.480 | we'll need our OpenAI API key, enter it here. And then let's
00:49:53.560 | just invoke hello. Okay, so nothing has changed on this end,
00:49:58.720 | right? So it was running code, there's nothing different here.
00:50:01.320 | However, now if we go to Langsmith, I'm going to go back
00:50:05.640 | to my dashboard. Okay, and you can see that the order of
00:50:10.120 | these projects just changed a little bit. And that's because
00:50:13.000 | the most recently used project, this one at the top, Langchain
00:50:16.600 | course Langsmith OpenAI, which is the current chapter we're in,
00:50:20.200 | that was just triggered. So I can go into here, I can see, oh,
00:50:24.360 | look at this. So we actually have something in the Langsmith
00:50:27.640 | UI. And all we did was enter our Langchain API key. That's all we
00:50:31.720 | did. And we set some environment variables. And that's it. So we
00:50:34.840 | can actually click through to this and it will give us more
00:50:36.640 | information. So you can see what was the input, what was the
00:50:40.440 | output, and some other metadata here. You see, you know, there's
00:50:45.640 | not that much in here. However, when we do the same for agents,
00:50:50.840 | we'll get a lot more information. So I can even show
00:50:54.360 | you a quick example from the future chapters. If we come
00:50:59.120 | through to agents intro here, for example. And we just take a
00:51:04.040 | look at one of these. Okay, so we have this input and output,
00:51:08.440 | but then on the left here, we get all of this information. And
00:51:11.800 | the reason we get all this information is because agents
00:51:14.200 | are performing multiple LLM calls, etc, etc. So there's a
00:51:18.800 | lot more going on. So you can see, okay, what was the first
00:51:21.880 | LLM call, and then we get these tool use traces, we get another
00:51:26.120 | LLM call, another tool use and another LLM call. So you can see
00:51:30.200 | all this information, which is incredibly useful and incredibly
00:51:33.600 | easy to do. Because all I did when setting this up in that
00:51:37.120 | agent chapter was simply set the API key and the environment
00:51:41.120 | variables as we have done just now. So you get a lot out of a
00:51:46.040 | very little effort with Langsmith, which is great. So
00:51:49.120 | let's return to our Langsmith project here. And let's invoke
00:51:53.040 | some more. Now I've already shown you, you know, we're going
00:51:56.480 | to see a lot of things just by default. But we can also add
00:51:59.760 | other things that Langsmith wouldn't typically trace. So to
00:52:05.080 | do that, we will just import a traceable decorator from
00:52:08.280 | Langsmith. And then let's make these just random functions
00:52:13.600 | traceable within Langsmith. Okay, so we run those, we have
00:52:19.000 | three here. So one is going to generate a random number; one is
00:52:22.600 | going to add a random delay to how long the function takes and
00:52:27.960 | also generate a random output; and in the last one, we're going to either
00:52:31.720 | return "no error" or raise an error. So
00:52:36.200 | we're going to see how LangSmith handles these
00:52:38.880 | different scenarios.
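A minimal sketch of what such traceable functions might look like; the function names and bodies here are hypothetical stand-ins for the notebook's examples, and note the optional name argument, which is what the later "chitchat maker" search relies on:

```python
import random
import time

from langsmith import traceable


@traceable  # traced as a run in the active LangSmith project
def generate_random_number() -> int:
    """Return a random integer; inputs and outputs show up in the trace."""
    return random.randint(0, 100)


@traceable(name="Chitchat Maker")  # a custom, more searchable run name
def generate_string_delay(input_str: str) -> str:
    """Sleep for a random time so traces show varying latency."""
    delay = random.randint(1, 5)
    time.sleep(delay)
    return f"{input_str} (delayed by {delay}s)"


@traceable
def random_error() -> str:
    """Randomly raise, so we can see how errors surface in LangSmith."""
    if random.random() < 0.5:
        raise ValueError("Random error")
    return "No error"
```

Calling each of these a handful of times is enough to populate the project with traces, including the error traces shown in the UI.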
00:52:43.160 | So let's just iterate through and run each one of those ten
00:52:46.280 | times. Okay, so let's see what happens. They're running, so
00:52:52.040 | let's go over to our Langsmith UI and see what is happening
00:52:55.840 | over here. So we can see that everything is updating, we're
00:52:58.640 | adding that information through. And we can see if we go into a
00:53:01.600 | couple of these, we can see a little more information. So the
00:53:04.520 | input and the output took three seconds. See random error here.
00:53:11.200 | In this scenario, random error passed without any issues. Let
00:53:15.480 | me just refresh the page quickly. Okay, so now we have
00:53:20.200 | the rest of the information. And we can see that occasionally,
00:53:23.840 | if there is an error from our random error function, it is
00:53:26.800 | signified with this. And we can see the traceback as well that
00:53:31.520 | was returned there, which is useful. Okay, so we can see if
00:53:34.200 | an error has been raised, and we get to see what that error is.
00:53:37.400 | We can see the various latencies of these functions. So you can
00:53:42.600 | see that varying throughout here. We see all the inputs to
00:53:47.640 | each one of our functions, and then of course the outputs. So
00:53:51.600 | we can see a lot in there, which is pretty good. Now, another
00:53:55.800 | thing that we can do is we can actually filter. So if we come
00:53:59.920 | to here, we can add a filter. Let's filter for errors. That
00:54:04.760 | would be value error. And then we just get all of the cases
00:54:09.240 | where one of our functions has returned or raised an error or
00:54:13.240 | value error specifically. Okay, so that's useful. And then
00:54:17.360 | yeah, there's various other filters that we can add there.
00:54:21.160 | So we could add a name, for example, if we wanted to look
00:54:24.640 | for the generate string delay function only, we could also do
00:54:30.560 | that. Okay, and then we can see the varying latencies of that
00:54:34.880 | function as well. Cool. So we have that. Now, one final thing
00:54:40.760 | that we might want to do is maybe we want to make those
00:54:43.680 | function names a bit more descriptive or easy to search
00:54:47.920 | for, for example. And we can do that by setting the name in the
00:54:51.200 | traceable decorator, like so. So let's run that. Run this a few
00:54:56.120 | times. And then let's jump over to Langsmith again, go into
00:55:01.160 | Langsmith project. Okay, and you can see those coming through as
00:55:04.200 | well. So then we could also search for those based on that
00:55:07.560 | new name. So what was it, chit chat maker, like so. And then
00:55:12.040 | we can see all the information being streamed through to
00:55:16.560 | Langsmith. So that is our introduction to Langsmith. There
00:55:21.160 | is really not all that much to go through here. It's very easy
00:55:25.200 | to set up. And as we've seen, it gives us a lot of
00:55:27.640 | observability into what we are building. And we will be using
00:55:32.880 | this throughout the course, we don't rely on it too much. It's
00:55:35.600 | a completely optional dependency. So if you don't want
00:55:38.000 | to use Langsmith, you don't need to, but it's there and I would
00:55:40.560 | recommend doing so. So that's it for this chapter, we'll move on
00:55:43.800 | to the next one. Now we're going to move on to the chapter on
00:55:48.560 | prompts in Langchain. Now, prompts, they seem like a simple
00:55:53.040 | concept, and they are a simple concept, but there's actually
00:55:55.320 | quite a lot to them when you start diving into them. And they
00:55:59.720 | truly have been a very fundamental part of what has
00:56:04.480 | propelled us forwards from pre LLM times to the current LLM
00:56:09.360 | times. You have to remember, until LLMs became widespread, the way
00:56:14.520 | to fine-tune an AI or ML model back then was to get loads
00:56:22.720 | of data for your particular use case and spend a load of time training
00:56:26.840 | your specific transformer, or part of the transformer, to
00:56:30.960 | essentially adapt it for that particular task. That could take
00:56:35.120 | a long time. Depending on the task, it could take you months,
00:56:40.840 | or sometimes, if it was a simpler task, it might take
00:56:44.480 | days, potentially weeks. Now, the interesting
00:56:48.720 | thing with LLMs is that rather than needing to go through this
00:56:53.960 | whole fine tuning process to modify a model for one task over
00:57:00.520 | another task, rather than doing that, we just prompt it
00:57:03.400 | differently, we literally tell the model, hey, I want you to do
00:57:07.360 | this in this particular way. And that is a paradigm shift: what
00:57:12.480 | you're doing is so much faster, it's going to take you, you
00:57:15.600 | know, a couple of minutes, rather than days, weeks, or
00:57:18.400 | months. And LLMs are incredibly powerful when it comes to just
00:57:23.200 | generalizing, you know, across these many different
00:57:26.200 | tasks. So prompts, which control those instructions, are a
00:57:31.480 | fundamental part of that. Now, LangChain naturally has many
00:57:36.560 | functionalities around prompts. And we can build very dynamic
00:57:40.320 | prompting pipelines that modify the structure and content of
00:57:44.360 | what we're actually feeding into our LLM, depending on different
00:57:47.800 | variables, different inputs. And we'll see that in this chapter.
00:57:51.920 | So we're going to work through prompting within the scope of a
00:57:57.160 | RAG example. So let's start by just dissecting the various
00:58:01.840 | parts of a prompt that we might expect to see for a use case
00:58:06.040 | like RAG. So our typical prompt for RAG or retrieval,
00:58:11.200 | augmented generation will include rules for the LLM. And
00:58:15.960 | you will see this in most prompts, if not all. This
00:58:21.440 | part of the prompt sets up the behavior of the LLM: how
00:58:26.840 | it should be responding to user queries, what sort of
00:58:30.560 | personality it should be taking on, what it should be focusing on
00:58:34.360 | when it is responding, and any particular rules or boundaries
00:58:37.800 | that we want to set. And really, what we're trying to do here is
00:58:42.240 | just to simply provide as much information as possible to the
00:58:47.200 | LLM about what we're doing, we just want to give the LLM
00:58:59.200 | context as to the place that it finds itself in. Because an LLM
00:59:02.840 | has no idea where it is; it just takes in some
00:59:05.800 | information and spits out information. If the only
00:59:08.680 | information it receives is the user's query,
00:59:12.840 | it doesn't know the context: what is the
00:59:16.880 | application that it is within? What is its objective? What is its
00:59:16.880 | aim? What are the boundaries? All of this, we need to just
00:59:21.400 | assume the LLM has absolutely no idea about because it truly
00:59:26.360 | does not. So provide as much context as we can, but it's
00:59:32.280 | important that we don't overdo it. We see this all the
00:59:36.040 | time: people will over-prompt an LLM. You want to be concise,
00:59:40.320 | you don't want fluff. And in general, every single part of
00:59:44.280 | your prompt, the more concise and less fluffy, you can make it
00:59:47.760 | the better. Now, those rules or instructions are typically in
00:59:51.560 | the system prompt of your LLM. Now, the second one is context,
00:59:55.800 | which is RAG specific. The context refers to some sort of
00:59:59.960 | external information that you're feeding into your LLM. We may
01:00:04.920 | have received this information from web search, database query
01:00:09.600 | or quite often in this case of RAG, it's a vector database.
01:00:14.000 | This external information that we provide is essentially the
01:00:19.120 | "RA", the retrieval augmentation, of RAG. We are augmenting the
01:00:25.880 | knowledge of our LLM, which the knowledge of our LLM is
01:00:29.720 | contained within the LLM model weights. We're augmenting that
01:00:33.600 | knowledge with some external knowledge. That's what we're
01:00:36.520 | doing here. Now for chat LLMs, this context is typically
01:00:43.320 | placed conversationally within the user or
01:00:48.720 | assistant messages. And with more recent models, it can also
01:00:54.320 | be placed within tool messages as well. Then we have
01:00:58.760 | the question, which is pretty straightforward. This is the
01:01:01.560 | query from the user, and it's usually a user
01:01:06.680 | message, of course. There might be some additional formatting
01:01:10.960 | around this, you might add a little bit of extra context, or
01:01:14.680 | you might add some additional instructions. If you find that
01:01:18.240 | your LLM sometimes veers off the rules that you've set within
01:01:21.760 | the system prompt, you might append or prefix something here.
01:01:26.520 | But for the most part, it's probably just going to be the
01:01:28.600 | user's input. And finally, so these are all the inputs for our
01:01:33.800 | prompt here is going to be the output that we get. So the
01:01:37.760 | answer from the assistant. Again, I mean, that's not even
01:01:41.480 | specific to RAG, it's just what you would expect in a chat LLM
01:01:45.680 | or any LLM. And of course, that would be an assistant message.
01:01:49.600 | So putting all of that together in an actual prompt, so you can
01:01:53.440 | see everything we have here. So we have the rules for our
01:01:57.320 | prompt here, the instructions, we're just saying, okay, answer
01:02:00.360 | the question based on the context below. If you cannot
01:02:02.440 | answer the question using the information, answer with "I don't
01:02:05.680 | know". Then we have some context here. Okay, in this scenario,
01:02:11.200 | that context that we're feeding in here, because it's the first
01:02:14.680 | message, we might put that into the system prompt. But that may
01:02:18.160 | also be turned around. Okay, if you, for example, have an
01:02:21.640 | agent, you might have your question up here before the
01:02:25.760 | context. And then that would be coming from a user message. And
01:02:30.000 | then this context would follow the question and be recognized
01:02:34.600 | as a tool message, so it would be fed in that way as well. It
01:02:38.920 | depends on what sort of structure you're going for.
01:02:41.520 | But you can do either: you can feed it into the system message
01:02:43.960 | if it's less conversational, whereas if it's more
01:02:47.920 | conversational, you might feed it in as a tool message. Okay,
01:02:50.760 | and then we have a user query, which is here. And then we'd
01:02:54.160 | have the AI answer. Okay, and obviously, that would be
01:02:57.120 | generated here. Okay, so let's switch across to the code. We're
01:03:01.520 | in the LangChain course repo, notebooks, 03 prompts.
01:03:05.320 | I'm just going to open this in Colab. Okay, scroll down, and
01:03:09.280 | we'll start just by installing the prerequisites. Okay, so we
01:03:13.120 | just have the various libraries, again, as I mentioned before,
01:03:16.360 | langsmith is optional, you don't need to install it. But if you
01:03:19.360 | would like to see your traces and everything in langsmith,
01:03:22.560 | then I would recommend doing that. And if you are using
01:03:25.680 | langsmith, you will need to enter your API key here. Again,
01:03:29.760 | if you're not using langsmith, you don't need to enter
01:03:32.000 | anything here, you just skip that cell. Okay, cool. And let's
01:03:36.160 | jump into basic prompting then. So we're going to start
01:03:41.080 | with this prompt: answer the user's query based on the context
01:03:43.600 | below. So we're just structuring what we just saw in code. And
01:03:49.200 | we're going to be using the chat prompt template, because
01:03:52.480 | generally speaking, we're using chat LLMs in most cases
01:03:57.720 | nowadays. So we have our chat prompt template, and that is
01:04:01.760 | going to contain a list of messages: a system message to
01:04:05.440 | begin with, which is just going to contain this, and we're
01:04:08.800 | feeding the context in within that. And then we have our
01:04:13.640 | user query here.
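As a sketch, that template setup looks roughly like this; the prompt wording is paraphrased from the example and the variable names are illustrative:

```python
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_messages([
    # System message: the rules, plus the retrieved context
    ("system",
     "Answer the user's query based on the context below. "
     "If you cannot answer the question using the provided "
     "information, answer with \"I don't know\".\n\n"
     "Context: {context}"),
    # Human message: the user's query
    ("user", "{query}"),
])

# LangChain infers the input variables from the {placeholders}
print(prompt_template.input_variables)  # ['context', 'query']
```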
01:04:20.920 | Okay, so we'll run this. And if we take a look here, we haven't
01:04:26.400 | specified what our input variables are, but we can see that we
01:04:31.680 | have query, and we have context up here. So these are the input
01:04:34.320 | variables, we just haven't explicitly defined them here. So
01:04:39.160 | let's just confirm with this that LangChain did pick those
01:04:44.040 | up. And we can see that it did. So it has context and query as
01:04:46.720 | our input variables for the prompt template that we just
01:04:50.560 | defined. Okay, so we can also see the structure of our
01:04:55.280 | templates. Let's have a look. Okay, so we can see that within
01:05:00.760 | messages here, we have a system message prompt template, the way
01:05:05.160 | that we define this, you can see here that we have from messages
01:05:08.160 | and this will consume various different structures. So you can
01:05:14.680 | see here that from_messages takes a sequence of
01:05:19.760 | message-like representations. So we could pass in a system prompt
01:05:24.240 | template object, and then a user prompt template object. Or we
01:05:30.600 | can just use a tuple like this. And this actually defines okay,
01:05:33.920 | the system, this is a user, and you could also do assistant or
01:05:38.360 | tool messages and stuff here as well using the same structure.
01:05:42.280 | And then we can look in here. And of course, that is being
01:05:45.880 | translated into the system message prompt template and
01:05:50.080 | human message prompt template. Okay. We have our input
01:05:54.680 | variables in there. And we have the template too. Okay. Now,
01:05:59.880 | let's continue. We'll see here what I just said, so we're
01:06:05.400 | importing our system message prompt template and human
01:06:08.240 | message prompt template. And you can see we're using the same
01:06:11.200 | from_messages method here, right? And you can see again:
01:06:15.520 | a sequence of message-like representations. What that actually
01:06:19.440 | means, you know, can vary, right? So here we
01:06:23.160 | have system message prompt template from template, prompt
01:06:25.880 | here from template query, you know, there's various ways that
01:06:28.600 | you might want to do this, it just depends on how explicit you
01:06:32.960 | want to be. Generally speaking, I think, for myself, I would
01:06:38.960 | prefer that we stick with the objects themselves, and be
01:06:43.400 | explicit. But it is definitely a little harder to parse when
01:06:46.960 | you're reading this. So I understand why you might
01:06:50.520 | also prefer this: it's definitely cleaner, and it
01:06:53.560 | does look simpler. So it just depends, I suppose, on
01:06:58.480 | preference. Okay. So you see, again, this is exactly the same.
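The more explicit variant being described might look roughly like this, a sketch using the message prompt template classes directly:

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

system_prompt = SystemMessagePromptTemplate.from_template(
    "Answer the user's query based on the context below. "
    "If you cannot answer the question using the provided "
    "information, answer with \"I don't know\".\n\n"
    "Context: {context}"
)
user_prompt = HumanMessagePromptTemplate.from_template("{query}")

# from_messages accepts the explicit template objects just as readily
# as the ("system", "...") / ("user", "...") tuples used above.
prompt_template = ChatPromptTemplate.from_messages([system_prompt, user_prompt])
```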
01:07:05.640 | Okay, we have our chat prompt template, and it contains this
01:07:08.600 | and this. Okay. You probably want to see the exact output, so
01:07:14.080 | that is messages. Okay, exactly the same as what I put before.
01:07:19.880 | Cool. So we have all that. Let's see how we would invoke our LLM
01:07:25.800 | with these. We're going to be using GPT-4o mini again, and we do
01:07:30.280 | need our API key. So enter that. And we'll just initialize our
01:07:37.280 | LLM, we are going with a low temperature here. So less
01:07:41.120 | randomness, or less creativity. And in many cases, this is
01:07:46.840 | actually what I would be doing. The reason in this scenario that
01:07:51.400 | we're going with low temperature is we're doing rag. And if you
01:07:55.680 | remember, before we scroll up a little bit here, our template
01:07:59.000 | says, answer the user's query based on the context below. If
01:08:01.680 | you cannot answer the question using the provided
01:08:04.680 | information, answer with "I don't know", right. So just from
01:08:09.760 | reading that we know that we want our LLM to be as truthful
01:08:15.320 | and accurate as possible. So a more creative LLM is going to
01:08:19.720 | struggle with that and is more likely to hallucinate. Whereas a
01:08:25.080 | low creativity or low temperature LLM will probably
01:08:29.160 | stick with the rules a little better. So again, it depends on
01:08:32.320 | your use case. You know, if you're creative writing, you
01:08:35.120 | might want to go with a higher temperature there. But for
01:08:38.440 | things like rag, where the information being output should
01:08:42.120 | be accurate, and truthful. It's important, I think that we keep
01:08:47.600 | temperature low. Okay. I talked about that a little bit here. So
01:08:51.840 | of course, lower temperature zero makes the LLMs output more
01:08:56.000 | deterministic, which in theory should lead to less
01:08:59.040 | hallucination. Okay, so we're going to go with LCEL again here.
01:09:03.240 | For those of you that used LangChain in the past, this
01:09:06.480 | is equivalent to an LLM chain object. So our prompt template
01:09:10.840 | is being fed into our LLM. Okay. And from now we have this
01:09:16.800 | pipeline. Now let's see how we would use that pipeline. So
01:09:22.120 | we're going to create some context here. So this is some
01:09:27.160 | context around Aurelio AI. It mentions that we built Semantic
01:09:32.960 | Router, Semantic Chunkers, an AI platform, and development
01:09:38.800 | services. We also mention, and I think we specifically ask about this
01:09:43.960 | later on in the example, the LangChain experts piece
01:09:47.160 | of information. Now, most LLMs would not have been trained on
01:09:51.920 | the recent internet, and this came out in September
01:09:55.680 | 2024, which is relatively recent. So a lot of LLMs, out of the box, you
01:10:00.400 | wouldn't expect to know that. So that is a good little
01:10:05.320 | bit of information to ask it about. So we invoke: we have our
01:10:08.880 | query, asking what Aurelio AI does, and we have that context. Okay, so
01:10:13.320 | we're feeding that into the pipeline that we defined here.
01:10:16.120 | Alright, so when we invoke, that is automatically going to take
01:10:19.920 | query and context and feed them into our prompt
01:10:23.800 | template.
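Putting that together, the LCEL pipeline and its invocation look roughly like this; the model name and the abbreviated context string are assumptions, and `prompt_template` is the template sketched above:

```python
from langchain_openai import ChatOpenAI

# Low temperature for a RAG-style task where we want factual answers.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

# LCEL: the prompt's output feeds straight into the LLM,
# equivalent to the older LLMChain object.
pipeline = prompt_template | llm

context = (
    "Aurelio AI is an AI company building tooling for AI engineers, "
    "including open source frameworks such as Semantic Router and "
    "Semantic Chunkers, an AI platform, and development services."
)

result = pipeline.invoke({"query": "What does Aurelio AI do?", "context": context})
print(result.content)
```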
01:10:30.040 | Okay. If we want to, we can also be a little more explicit. You'll
01:10:34.280 | probably see me doing this throughout the course, because I do
01:10:39.040 | like to be explicit with everything, to be honest. And this is
01:10:49.640 | doing the same thing, well, you'll see in a
01:10:53.240 | moment that it's doing the exact same thing. Again, this is just
01:10:57.800 | an LCEL thing. So all I'm doing in this scenario is I'm saying,
01:11:04.760 | okay, take that from the dictionary query. And then also
01:11:10.160 | take from that input dictionary, the context key. Okay, so this
01:11:19.000 | is doing the exact same thing. The reason that we might want to
01:11:22.240 | write this is mainly for clarity, to be honest, just too
01:11:26.520 | explicit, say, okay, these are the inputs, because otherwise,
01:11:29.240 | we don't really have them in the code other than within our
01:11:33.360 | original prompts up here, which is not super clear. So I think
01:11:39.400 | it's usually a good idea to just be more explicit with these
01:11:41.720 | things. And of course, if you decide you're going to modify
01:11:45.160 | things a little bit, let's say you modify this input down the
01:11:48.880 | line, you can still feed in the same input here, you're just
01:11:52.240 | mapping it between different keys, essentially. Or if you
01:11:56.040 | would like to modify that input, say you need to lowercase it on the
01:11:59.720 | way in or something, you can do that there.
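A sketch of that more explicit version: the input mapping is just a dictionary of lambdas at the front of the chain, which LCEL coerces into a RunnableParallel (assumes `prompt_template`, `llm`, and `context` from the sketches above):

```python
pipeline = (
    {
        # Pull each input out of the incoming dictionary explicitly;
        # this is also where you could rename or preprocess inputs,
        # e.g. "context": lambda x: x["context"].lower()
        "query": lambda x: x["query"],
        "context": lambda x: x["context"],
    }
    | prompt_template
    | llm
)

result = pipeline.invoke({"query": "What does Aurelio AI do?", "context": context})
```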
01:12:06.200 | So I'll just redefine that, actually, and we'll invoke again. Okay, we see
01:12:13.440 | that it does the exact same thing. So this
01:12:17.600 | is an AI message generated by the LLM. Okay, expertise in
01:12:22.440 | building AI agents, several open source frameworks, the router, the AI
01:12:27.400 | platform. Okay, right. So it has provided
01:12:32.840 | everything, other than the LangChain experts thing; it
01:12:35.280 | didn't mention that. But we'll test it on that later
01:12:39.080 | on. Okay, so on to few-shot prompting. This is a specific
01:12:43.040 | prompting technique. Now, many state-of-the-art, SOTA, LLMs
01:12:48.440 | are very good at instruction following. So you'll find that
01:12:52.400 | few-shot prompting is less common now than it used to be,
01:12:56.240 | at least for these bigger, more state-of-the-art models.
01:13:00.480 | But when you start using smaller models, which isn't really what we
01:13:05.240 | can use here, let's say an open source model like Llama
01:13:09.400 | 3, or Llama 2, which is much smaller, you will probably
01:13:15.080 | need to consider things like few-shot prompting. Although, that
01:13:18.920 | being said, with OpenAI models, at least the current OpenAI
01:13:24.440 | models, this is not so important. Nonetheless, it can
01:13:27.920 | be useful. So the idea behind few-shot prompting is that you are
01:13:31.880 | providing a few examples to your LLM of how it should behave
01:13:36.760 | before you are actually going into the main part of the
01:13:42.520 | conversation. So let's see how that would look. So we create an
01:13:51.520 | example prompt. So we have our human and AI: human input, AI
01:13:54.760 | response. So we're basically setting up, okay, with this
01:13:57.960 | type of input, you should provide this type of output.
01:14:01.760 | That's what we're doing here. And we're just going to provide
01:14:05.880 | some examples. Okay, so we have our input, "here is query 1",
01:14:09.680 | and the output, "here is the answer 1", right? I just want to show
01:14:12.680 | you how it works; this is not what we'd actually feed into our
01:14:16.960 | LLM. Then, with both these examples and our example prompt,
01:14:21.680 | we'd feed both of these into LangChain's few-shot chat
01:14:26.720 | message prompt template. And you'll see what we get
01:14:30.480 | out of it: it basically formats everything and
01:14:35.920 | structures everything for us.
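A minimal sketch of that few-shot setup; the example inputs and outputs here are placeholders:

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)

# How each individual example is rendered: a human turn followed
# by the ideal AI response.
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

# Placeholder examples demonstrating the input -> output pattern.
examples = [
    {"input": "Here is query 1", "output": "Here is the answer 1"},
    {"input": "Here is query 2", "output": "Here is the answer 2"},
]

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# The formatted examples can then sit between the system prompt and
# the live user query inside a larger ChatPromptTemplate.
print(few_shot_prompt.format())
```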
01:14:42.280 | Okay. And how you use this, of course, depends. Let's say you see that your user is talking about
01:14:47.240 | a particular topic, and you would like to guide your LLM to
01:14:50.760 | talk about that particular topic in a particular way. So
01:14:50.760 | you could identify that the user is talking about that topic,
01:14:53.840 | either like a keyword match or a semantic similarity match. And
01:14:58.080 | based on that, you might want to modify these examples that you
01:15:01.240 | feed into your few-shot chat message prompt template. And
01:15:06.080 | then obviously, that could be what you do for topic A; for
01:15:08.960 | topic B, you might have another set of examples that you feed
01:15:12.120 | into this. All this time, your example prompt is remaining the
01:15:15.800 | same, but you're just modifying the examples that are going in
01:15:18.480 | so that they're more relevant to whatever it is your user is
01:15:21.520 | actually talking about. So that can be useful. Let's see an
01:15:25.360 | example of that. So if we were using a tiny LLM, its ability
01:15:29.800 | would be limited, although I think we're probably fine
01:15:33.160 | here. We're going to say: answer the user's query based on the
01:15:36.760 | context below. Always answer in markdown format. You know, being
01:15:40.120 | very specific, this is our system prompt. Okay, that's
01:15:44.320 | nice. But what we've said here is, okay, always
01:15:48.200 | answer in markdown format, and when doing so, please
01:15:53.440 | provide headers, short summaries, and follow with bullet
01:15:55.920 | points, then conclude. Okay, so you see this here: we
01:16:01.560 | get this overview of Aurelio AI, and this is actually
01:16:05.160 | quite good. But if we come down here, what I specifically want
01:16:09.800 | is to always follow this structure. Alright, so we have
01:16:13.880 | the double header for the topic, summary, header, a couple of
01:16:20.120 | bullet points. And then I always want to follow this pattern
01:16:22.320 | where it's like to conclude, always, it's always bold. You
01:16:26.120 | know, I want to be very specific on what I want. And to be, you
01:16:30.400 | know, fully honest, with GPT-4o mini, you can actually just
01:16:35.200 | prompt most of this in. But for the sake of the example, we're
01:16:38.560 | going to provide a few examples in a few-shot prompt template
01:16:43.760 | instead to get this. So we're going to provide one
01:16:46.920 | example here. Second example here. And you'll see we're just
01:16:51.360 | following that same pattern, we're just setting up the
01:16:53.160 | pattern that the LM should use. So we're going to set that up
01:16:58.400 | here, we have our main header, a little summary, some sub
01:17:03.720 | headers, bullet points, sub header, bullet points, bullet
01:17:06.240 | points to conclude, so on and so on. Same with this one here.
01:17:09.640 | Okay. And let's see what we got. Okay, so this is the structure
01:17:20.000 | of our new few-shot prompt template. You can see what all
01:17:24.800 | this looks like. Let's come down and we're going to do, we're
01:17:28.840 | basically going to insert that directly into our chat prompt
01:17:32.280 | template. So we have from messages, system prompt, user
01:17:37.600 | prompt, and then we have in there, these, so let me actually
01:17:42.960 | show you very quickly. Right, so we just have this few-shot
01:17:48.720 | chat message prompt template, which will be fed into the
01:17:51.320 | middle here, run that, and then feed all this back into our
01:17:54.840 | pipeline. Okay, and this will, you know, modify the structure
01:17:58.440 | so that we have that bold to conclude at the end here. Okay,
01:18:01.880 | you can see nicely here that we get a bit more of that
01:18:05.880 | exact structure that we wanted. Again, with GPT-4o
01:18:10.160 | models and many other OpenAI models, you don't really need to
01:18:14.120 | do this, but you will see it in other examples. We do have an
01:18:17.600 | example of this where we're using a Llama and we're using, I
01:18:21.760 | think Llama 2, if I'm not wrong. And you can see that adding this
01:18:26.680 | few-shot prompt template is actually a very good way of
01:18:31.280 | getting those smaller, less capable models to follow your
01:18:34.600 | instructions. So this is really, when you're working with a
01:18:38.000 | smaller LLM, this can be super useful. But even for SOTA models
01:18:41.360 | like GPT-4o, if you do find that you're struggling with the
01:18:45.640 | prompting, it's just not quite following exactly what you want
01:18:48.520 | it to do. This is a very good technique for actually getting
01:18:53.240 | it to follow a very strict structure or behavior. Okay, so
01:18:57.200 | moving on, we have chain of thought prompting. So this is a
01:19:01.720 | more common prompting technique that encourages the LLM to
01:19:06.320 | think through its reasoning or its thoughts step by step. So
01:19:11.480 | it's a chain of thought. The idea behind this is like, okay,
01:19:15.040 | in math class, when you're a kid, the teachers would always
01:19:19.280 | push you to put down your, your working out, right? And there's
01:19:24.400 | multiple reasons for that. One of them is to get you to think
01:19:26.960 | because they know in a lot of cases, actually, you know,
01:19:29.400 | you're a kid and you're in a rush and you don't really care
01:19:31.400 | about this test. And the, you know, they're just trying to get
01:19:35.680 | you to slow down a little bit, and actually put down your
01:19:39.360 | reasoning. And that kind of forced you to think, oh,
01:19:41.280 | actually, I'm skipping a little bit in my head, because I'm
01:19:44.320 | trying to just do everything up here. If I write it down, all
01:19:47.480 | of a sudden, it's like, Oh, actually, I'm, yeah, I need to
01:19:50.720 | actually do that slightly differently, you realize, okay,
01:19:53.280 | you're probably rushing a little bit. Now, I'm not saying an LLM
01:19:55.960 | is rushing, but it's a similar effect by an LLM writing
01:19:58.920 | everything down, they tend to actually get things right more
01:20:03.880 | frequently. And at the same time, also similar to when
01:20:07.720 | you're a child and a teacher is reviewing your exam work by
01:20:11.360 | having the LLM write down its reasoning, you as a human
01:20:15.920 | or engineer, you can see where the LLM went wrong, if it did
01:20:20.200 | go wrong, which can be very useful when you're trying to
01:20:22.480 | diagnose problems. So with chain of thought, we should see
01:20:26.240 | fewer hallucinations, and generally better performance. Now,
01:20:30.360 | to implement chain of thought in LangChain, there's no
01:20:32.320 | specific LangChain object that does that. Instead, it's
01:20:35.800 | just prompting. Okay, so let's go down and just see how
01:20:39.320 | we might do that. Okay, so: be a helpful assistant and answer the
01:20:42.960 | user's question. You must answer the question directly without
01:20:46.200 | any other text or explanation. Okay, so that's our no-chain-of-
01:20:50.520 | thought system prompt. I will just note here, especially with
01:20:53.840 | OpenAI. Again, this is one of those things where you'll see
01:20:57.040 | it more with the smaller models. Most LLMs are actually trained
01:21:00.120 | to use chain of thought prompting by default. So we're
01:21:03.120 | actually specifically telling it here, you must answer the
01:21:05.880 | question directly without any other text or explanation. Okay,
01:21:09.800 | so we're actually kind of reverse prompting it to not use
01:21:13.000 | chain of thought. Otherwise, by default, it actually will try
01:21:17.000 | and do that because it's been trained to. That's how that's
01:21:19.600 | how relevant chain of thought is. Okay, so I'm going to say
01:21:23.280 | how many keystrokes are needed to type the numbers from
01:21:26.640 | 1 to 500. Okay, we set up our LLM-chain-style pipeline. And
01:21:32.720 | we're going to just invoke our query. And we'll see what we
01:21:35.760 | get. Total number of keystrokes needed to type numbers from one
01:21:40.520 | to 500 is 1,511. The actual answer, as I've written here, is
01:21:47.280 | 1,392. So without chain of thought, it's hallucinating. Okay, now let's
01:21:52.720 | go ahead and see okay with chain of thought prompting, what does
01:21:55.920 | it do? So be helpful assistant answer users question. To answer
01:22:00.480 | the question, you must list systematically and in precise
01:22:04.160 | detail all sub problems that are needed to be solved to answer
01:22:07.600 | the question. Solve each sub problem individually, you have
01:22:11.720 | to shout at the LLM sometimes to get them to listen. And in
01:22:14.920 | sequence. Finally, use everything you've worked
01:22:18.120 | through to provide the final answer. Okay, so we're getting
01:22:20.480 | it we're forcing it to kind of go through the full problem
01:22:24.320 | there. We can remove that. So run that. Again, I don't know
01:22:29.720 | why we have context there. I'll remove that. And let's see. You
01:22:37.040 | can see straightaway, that's taking a lot longer to generate
01:22:40.640 | the output. That's because it's generating so many more tokens.
01:22:43.000 | So that's just one one drawback of this. But let's see what we
01:22:46.320 | have. So to determine how many keystrokes to tie those numbers,
01:22:50.200 | we is breaking down several sub problems to count number of
01:22:54.080 | digits from one to 910 to 99. So on account digits and number
01:22:59.920 | 500. Okay, interesting. So that's how it's breaking it up.
01:23:04.040 | Some more digits counts in the previous steps. So we go
01:23:07.720 | through total digits. And we see this, okay, nine digits for
01:23:12.680 | those for here 180 for here 1200. And then, of course, three
01:23:20.480 | here. So it gets all those sums those digits and actually comes
01:23:25.600 | to the right answer. Okay, so that that is, you know, that's
01:23:29.200 | the difference with with chain of thought versus without. So
01:23:32.960 | without it, we just get the wrong answer, basically
01:23:35.800 | guessing. With chain of thought, we get the right answer just by
01:23:40.480 | the LLM writing down its reasoning and breaking the
01:23:43.720 | problem down into multiple parts, which is, I found that
01:23:47.160 | super interesting that it does that. So that's pretty cool.
01:23:52.080 | Now, I will just say, as we mentioned
01:23:55.800 | before, most LLMs nowadays are actually trained to use chain of
01:23:59.120 | thought prompting by default. So let's just see if we don't
01:24:02.360 | mention anything, right? Be a helpful assistant and answer
01:24:04.440 | these users questions. So we're not telling it not to think
01:24:07.560 | through its reasoning, and we're not telling it to think through
01:24:10.800 | its reasoning. Let's just see what it does. Okay, so you can
01:24:15.560 | see, again, it's actually doing the exact same reasoning, okay,
01:24:22.000 | it doesn't, it doesn't give us like the sub problems at the
01:24:24.480 | start, but it is going through and it's breaking everything
01:24:27.480 | apart. Okay, which is quite interesting. And we get the
01:24:31.040 | same correct answer. So the formatting here is slightly
01:24:34.000 | different. It's probably a little cleaner, actually,
01:24:36.800 | although I think, I don't know. Here, we get a lot more
01:24:41.560 | information. So both are fine. And in this scenario, we
01:24:46.640 | actually do get the right answer as well. So you can see that
01:24:50.080 | that chain of thought prompting has actually been quite
01:24:54.200 | literally trained into the model. And you'll see that with
01:25:04.480 | most, well, I think all state-of-the-art LLMs. Okay, cool. So that
01:25:04.480 | is our chapter on prompting. Again, we're focusing very much
01:25:09.960 | on a lot of the fundamentals of prompting there. And of course,
01:25:14.880 | tying that back to the actual objects and methods within
01:25:19.600 | LangChain. But for now, that's it for prompting. And we'll move
01:25:23.360 | on to the next chapter. In this chapter, we're going to be
01:25:26.360 | taking a look at conversational memory in LangChain. We're
01:25:30.960 | going to be taking a look at the core, like chat memory
01:25:35.280 | components that have really been in LangChain since the
01:25:39.200 | start, but are essentially no longer in the library. And we'll
01:25:43.800 | be seeing how we actually implement those historic
01:25:48.000 | conversational memory utilities in the new versions of
01:25:53.680 | LangChain, so 0.3. Now as a pre-warning, this chapter is
01:25:57.720 | fairly long. But that is because conversational memory is just
01:26:02.640 | such a critical part of chatbots and agents. Conversational
01:26:07.440 | memory is what allows them to remember previous interactions.
01:26:11.120 | And without it, our chatbots and agents would just be responding
01:26:15.680 | to the most recent message without any understanding of
01:26:19.760 | previous interactions within a conversation. So they would just
01:26:23.160 | not be conversational. And depending on the type of
01:26:27.960 | conversation, we might want to go with various approaches to
01:26:32.080 | how we remember those interactions within a
01:26:36.720 | conversation. Now throughout this chapter, we're going to be
01:26:39.040 | focusing on these four memory types. We'll be referring to
01:26:43.640 | these and I'll be showing you actually how each one of these
01:26:46.400 | works. But what we're really focusing on is rewriting these
01:26:50.680 | for the latest version of LangChain using what's
01:26:54.480 | called the RunnableWithMessageHistory
01:26:59.120 | class. So we're going to be essentially taking a look at the
01:27:05.320 | original implementations of each of these four original
01:27:08.960 | memory types, and then we'll be rewriting them with the
01:27:12.200 | RunnableWithMessageHistory class. So just taking a look at each of
01:27:16.880 | these four very quickly. Conversational buffer memory is
01:27:20.840 | I think the simplest, most intuitive of these memory types.
01:27:24.840 | It is literally just you have your messages, they come in to
01:27:31.160 | this object, they are stored in this object as essentially a
01:27:35.000 | list. And when you need them again, it will return them to
01:27:39.080 | you. There's nothing, nothing else to it, super simple. The
01:27:42.760 | conversation buffer window memory, okay, so there's a new word in the
01:27:46.600 | middle: window. This works in pretty much the same way.
01:27:50.880 | But those messages that it has stored, it's not going to return
01:27:54.680 | all of them for you. Instead, it's just going to return the
01:27:57.720 | most recent, let's say the most recent three, for example. Okay,
01:28:02.200 | and that is defined by a parameter k. Conversational
01:28:05.560 | summary memory, rather than keeping track of the entire
01:28:09.640 | interaction memory directly, what it's doing is as those
01:28:13.800 | interactions come in, it's actually going to take them and
01:28:17.640 | it's going to compress them into a smaller little summary of what
01:28:21.720 | has been within that conversation. And as every new
01:28:25.760 | interaction is coming in, it's going to do that, and I keep
01:28:28.440 | iterating on that summary. And then that is going to return to
01:28:32.080 | us when we need it. And finally, we have the conversational
01:28:34.640 | summary buffer memory. So the buffer
01:28:40.760 | part of this is actually referring to a very similar thing
01:28:44.360 | to the buffer window memory, but rather than it being the most recent k
01:28:48.880 | messages, it's looking at the number of tokens within your
01:28:51.600 | memory, and it's returning the most recent k tokens. That's
01:28:58.320 | what the buffer part is there. And then it's also merging that
01:29:02.560 | with the summary memory here. So essentially, what you're
01:29:06.360 | getting is almost like a list of the most recent messages based
01:29:10.280 | on the token length rather than the number of interactions,
01:29:13.160 | plus a summary, which would come at the top here. So you get
01:29:18.240 | kind of both. The idea is that obviously this summary here
01:29:22.560 | would maintain all of your interactions in a very compressed
01:29:27.800 | form. So you're, you're losing less information, and you're
01:29:31.160 | still maintaining, you know, maybe the very first
01:29:33.880 | interaction, the user might have introduced themselves, giving
01:29:36.880 | you their name, hopefully, that would be maintained within the
01:29:40.760 | summary, and it would not be lost. And then you have almost
01:29:44.040 | like high resolution on the most recent k tokens from your
01:29:50.440 | memory. Okay, so let's jump over to the code, we're going into
01:29:53.840 | the 04 chat memory notebook, open that in Colab. Okay, now
01:29:57.720 | here we are, let's go ahead and install the prerequisites, run
01:30:02.240 | all. We again can use LangSmith or not; it is up to you.
01:30:08.280 | Enter that. And let's come down and start. So first, we'll just
01:30:13.560 | initialize our LLM, using GPT-4o mini in this example, again with low
01:30:19.320 | temperature. And we're going to start with conversation buffer
01:30:23.000 | memory. Okay, so this is the original version of this memory
01:30:30.400 | type. So let me see, where are we? We're here. So we have
01:30:35.760 | ConversationBufferMemory, and return_messages
01:30:38.560 | needs to be set to true. So the reason that we set return
01:30:42.640 | messages to true, as it mentions up here, is that if you do not do this,
01:30:47.600 | it's going to be returning your chat history as a string to the
01:30:51.800 | LLM, whereas chat LLMs nowadays expect message
01:30:58.480 | objects. So yeah, you just want to be returning these as
01:31:02.840 | messages rather than as strings. Okay. Otherwise, you're
01:31:06.480 | going to get some kind of strange behavior out of your
01:31:09.360 | LLMs if you send them strings. So you do want to make sure
01:31:12.160 | that it's true. I think by default, it might not be true.
01:31:15.640 | But this is coming, this is deprecated, right? It does tell
01:31:18.360 | you here, as deprecation warning, this is coming from
01:31:22.360 | older line chain, but it's a good place to start just to
01:31:25.000 | understand this. And then we're going to rewrite this with the
01:31:27.560 | runnables, which is the recommended way of doing so
01:31:30.360 | nowadays. Okay, so adding messages to our memory, we're
01:31:34.880 | going to write this, okay, so it's just a conversation:
01:31:38.920 | user, AI, user, AI, and so on, just a random chat. The main things to note here
01:31:44.040 | are that I do provide my name and we have the model's name, right
01:31:47.360 | towards the start of those interactions. Okay, so I'm just
01:31:50.440 | going to add all of those, we do it like this. Okay, then we can
01:31:57.040 | just see, we can load our history, like so. So let's just
01:32:02.800 | see what we have there. Okay, so we have human message, AI
01:32:06.520 | message, human message, right? This is exactly what we showed
01:32:10.200 | you just here. It's just in that message format from LangChain.
01:32:13.720 | Okay, so we can do that. Alternatively, we can actually
01:32:18.240 | do this. So we can get our memory, we initialize the
01:32:21.120 | conversation buffer memory as we did before. And we can
01:32:24.360 | actually add these messages directly into our memory like
01:32:28.360 | that. So we can use this add user message, add AI message, so
01:32:31.440 | on, so on, load again, and it's going to give us the exact same
01:32:34.680 | thing. Again, there's multiple ways to do the same thing. Cool.
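As a quick sketch, that now-deprecated pattern of initializing the buffer memory and adding messages directly looks roughly like this:

```python
from langchain.memory import ConversationBufferMemory

# Deprecated in LangChain 0.3, but useful for understanding the idea:
# return_messages=True returns message objects rather than one string.
memory = ConversationBufferMemory(return_messages=True)

memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, I'm an AI model called Zeta.")
memory.chat_memory.add_user_message(
    "I'm researching the different types of conversational memory."
)
memory.chat_memory.add_ai_message("That's interesting, what are some examples?")

# Returns {'history': [HumanMessage(...), AIMessage(...), ...]}
print(memory.load_memory_variables({}))
```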
01:32:38.280 | So we have that. Now, to pass all of this into our LLM (again, this is
01:32:42.920 | all deprecated stuff, and we're going to learn how to do it
01:32:45.000 | properly in a moment, but this is how LangChain did it in
01:32:48.760 | the past), we'd be using this
01:32:53.680 | conversation chain, right? Again, this is deprecated.
01:32:57.600 | Nowadays, we would be using LCEL for this. So I just want to
01:33:02.760 | show you how this would all go together. And then we would
01:33:05.280 | invoke, okay, what is my name again, let's run that. And we'll
01:33:10.040 | see what we get. It is remembering everything, remember: this
01:33:13.240 | conversation buffer memory doesn't drop messages, it just
01:33:17.160 | remembers everything. And honestly, with the high
01:33:21.920 | context windows of many LLMs, that might be what you do. It
01:33:25.200 | depends on how long you expect the conversation to go on for,
01:33:27.760 | but you probably in most cases would get away with
01:33:30.960 | this. Okay, so let's see what we get. I say, what is my
01:33:36.080 | name again? Okay, let's see what it gives me. It says your name is
01:33:39.760 | James. Great. Thank you. That works. Now, as I mentioned, all
01:33:45.200 | of this I just showed you is actually deprecated. That's the
01:33:47.280 | old way of doing things. Let's see how we actually do this in
01:33:50.520 | modern or up to date blank chain. So we're using this
01:33:54.440 | runnable with message history. To implement that, we will need
01:33:58.800 | to use LSL. And for that we will need to just define prompt
01:34:03.080 | templates or LM as we usually would. Okay, so we're going to
01:34:06.600 | set up our system prompt, which is just a helpful system called
01:34:10.880 | Zeta. Okay, we're going to put in this messages placeholder.
01:34:15.360 | Okay, so that's important. Essentially, that is where our
01:34:19.720 | messages are coming from our conversation buffer memory is
01:34:24.360 | going to be inserted, right? So it's going to be that chat
01:34:27.400 | history is going to be inserted after our system prompt, but
01:34:30.960 | before our most recent query, which is going to be inserted
01:34:34.360 | last here. Okay, so messages placeholder item, that's
01:34:38.800 | important. And we use that throughout the course as well.
01:34:41.600 | So we use it both for chat history, and we'll see later on,
01:34:47.960 | we also use it for the intermediate thoughts that an
01:34:51.920 | agent would go through as well. So important to remember that
01:34:56.320 | little thing. We'll link our prompt template to our LLM.
01:35:01.320 | Again, if we would like, we could also add in the inputs; I think we
01:35:05.880 | only have the query here, and we would probably also want our
01:35:09.360 | history as well. But I'm not going to do that right now.
01:35:09.360 | Okay, so we have our pipeline. And we can go ahead and actually
01:35:13.680 | define our runnable with message history. Now this class or
01:35:18.120 | object when we are initializing it does require a few items, we
01:35:21.360 | can see them here. Okay, so we see that we have our pipeline
01:35:25.400 | with history. So it's basically going to be, you can see
01:35:28.720 | here, right, we have that history messages key, right, this
01:35:32.120 | here has to align with what we provided as a messages
01:35:36.120 | placeholder in our pipeline, right? So we have our pipeline
01:35:41.240 | prompt template here, and here, right. So that's where it's
01:35:45.200 | coming from. It's coming from messages placeholder, the
01:35:47.120 | variable name is history, right? That's important. That links to
01:35:51.920 | this. Then for the input messages key here, we have query
01:35:56.360 | that, again, links to this. Okay, so both important to have
01:36:02.680 | that. The other thing that is important is obviously we're
01:36:06.480 | passing in that pipeline from before. But then we also have
01:36:09.480 | this get session history. Basically, what this is doing is
01:36:12.840 | it saying, okay, I need to get the list of messages that make
01:36:16.280 | up my chat history that are going to be inserted into this
01:36:19.200 | variable. So that is a function that we define, okay. And within
01:36:23.960 | this function, what we're trying to do here is actually
01:36:26.640 | replicate what we have with the previous conversation buffer
01:36:33.000 | memory. Okay, so that's what we're doing here. So it's very
01:36:36.880 | simple, right? So we have this in memory chat message history.
01:36:42.880 | Okay, so that's just the object that we're going to be
01:36:44.840 | returning. What this will do is take a session ID; the
01:36:48.560 | session ID is essentially a unique identifier, so that each
01:36:52.560 | interaction within a single conversation is
01:36:56.200 | being mapped to that specific conversation. So you don't have
01:36:58.960 | overlapping conversations; let's say you have multiple users using the same
01:37:01.480 | system, you want to have a unique session ID for each one
01:37:03.960 | of those. Okay, and what it's doing is saying, okay, if the
01:37:07.080 | session ID is not in the chat map, which is this empty
01:37:10.400 | dictionary we defined here, we are going to initialize that
01:37:15.000 | session with an in memory, chat message history. Okay, that's
01:37:21.040 | it. And we return it. Okay, and all that's going to do is
01:37:25.040 | basically append our messages; they will be appended
01:37:28.560 | within this chat map under the session ID, and they're going to get
01:37:32.560 | returned. There's nothing else to it, to be honest.
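A sketch of the full modern setup described here: a prompt with a messages placeholder, an LCEL pipeline, a session-keyed history store, and the RunnableWithMessageHistory wrapper (`llm` as initialized earlier; names are illustrative):

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant called Zeta."),
    MessagesPlaceholder(variable_name="history"),  # chat history slots in here
    ("user", "{query}"),                           # most recent query comes last
])

pipeline = prompt_template | llm

# One in-memory history object per session ID.
chat_map: dict[str, InMemoryChatMessageHistory] = {}

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_chat_history,
    input_messages_key="query",      # matches the {query} placeholder
    history_messages_key="history",  # matches the MessagesPlaceholder name
)

response = pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_123"}},
)
```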
01:37:38.000 | So we invoke our runnable and let's see what we get. I need to run this.
01:37:42.720 | Okay, note that we do have this config, so we have the session
01:37:48.800 | ID; that's, again, as I mentioned, to keep different
01:37:51.600 | conversations separate. Okay, so we've run that. Now let's run a
01:37:55.440 | few more. So what is my name again, let's see if it
01:37:58.800 | remembers. Your name is James. How can I help you today, James?
01:38:02.840 | Okay. So what we've just done there is literally
01:38:08.360 | conversation buffer memory, but for up-to-date LangChain, with
01:38:14.640 | LCEL and with runnables. So the recommended way of doing it
01:38:19.040 | nowadays. So that's a very simple example. Okay, there's
01:38:23.240 | really not that much to it. It gets a little more complicated
01:38:28.200 | as we start thinking about the different types of memory.
01:38:30.760 | Although with that being said, it's not massively complicated,
01:38:33.760 | we're only really going to be changing the way that we're
01:38:36.160 | getting our interactions. So let's, let's dive into that and
01:38:42.080 | see how we will do something similar with the conversation
01:38:45.120 | buffer window memory. But first, let's actually just understand
01:38:48.240 | okay, what is the conversation buffer window memory. So as I
01:38:51.560 | mentioned, near the start, it's going to keep track of the last
01:38:53.880 | K messages. So there's a few things to keep in mind here.
01:38:58.600 | More messages does mean more tokens sent with each request.
01:39:02.600 | And if we have more tokens in each request, it means that
01:39:05.320 | we're increasing the latency of our responses and also the cost.
01:39:08.360 | So with the previous memory type, we're just sending
01:39:12.200 | everything. And because we're sending everything that is going
01:39:15.440 | to be increasing our costs, it's going to be increasing our
01:39:17.400 | latency for every message, especially as the conversation
01:39:20.120 | gets longer and longer. And we don't, we might not necessarily
01:39:22.760 | want to do that. So with this conversation buffer window
01:39:27.000 | memory, we're going to say, okay, just return me the most
01:39:30.360 | recent messages. Okay, so let's, well, let's see how that would
01:39:36.000 | work. Here, we're going to return the most recent four
01:39:38.960 | messages. Okay, again, we make sure return_messages
01:39:42.720 | is set to true. Again, this is deprecated. This is just the
01:39:46.320 | old way of doing it. In a moment, we'll see the updated
01:39:49.760 | way of doing this. We'll add all of our messages. Okay, so we
01:39:55.640 | have this. And just see here, right, so we've added in all
01:40:01.000 | these messages, there's more than four messages here. And we
01:40:03.680 | can actually see that here. So we have human message, AI,
01:40:07.400 | human, AI, human, AI, human, AI. Right. So we've got four pairs
01:40:13.440 | of human-AI interactions there. But up here, we added more
01:40:17.560 | than four pairs. So four pairs would take us back all the way
01:40:21.440 | to here, I'm researching different types of
01:40:25.200 | conversational memory. Okay, and if we take a look here, the
01:40:29.200 | most the first message we have is I'm researching different
01:40:32.040 | types of conversational memory. So it's cut off these two here,
01:40:35.800 | which will be a bit problematic when we ask you what our name
01:40:38.720 | is. Okay, so let's just see. We're going to be using the
01:40:41.400 | ConversationChain object again; again, remember that it is
01:40:44.600 | deprecated. And I want to say "what is my name again?" Let's
01:40:48.360 | see what it says. "I'm sorry, I don't have access to
01:40:53.920 | your name or any personal information. If you like, you
01:40:55.920 | can tell me your name." Right, so it doesn't actually remember.
01:40:58.360 | So that's kind of like a negative of the conversation
01:41:04.160 | buffer window memory. Of course, to fix that in this
01:41:08.160 | scenario, we might just want to increase k; maybe we keep around
01:41:11.480 | the previous eight interaction pairs, and it will actually
01:41:15.400 | remember. So, "what's my name again?" "Your name is James." So
01:41:19.200 | now it remembers; we just modified how much it is
01:41:21.680 | remembering. But of course, you know, there are pros and cons to
01:41:24.600 | this; it really depends on what you're trying to build.
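For reference, the deprecated window-memory API we have just been running looks roughly like this (a sketch; `llm` is assumed to be the chat model initialized earlier):

```python
# Deprecated, pre-0.3 style API, shown only for comparison.
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory

# k=4 keeps only the most recent four human-AI interaction pairs.
memory = ConversationBufferWindowMemory(k=4, return_messages=True)
chain = ConversationChain(llm=llm, memory=memory)

chain.invoke({"input": "What is my name again?"})
```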
01:41:28.120 | So let's take a look at how we would actually implement this with
01:41:31.880 | the runnable with message history. Okay, so it's getting a
01:41:37.520 | little more involved here, although it's really not
01:41:41.680 | that complicated, as we'll see. Okay, so we have a buffer
01:41:46.000 | window message history, we're creating a class here, this
01:41:49.400 | class is going to inherit from the base chat message history
01:41:53.320 | object from LangChain. Okay, and all of our other message
01:41:58.320 | history objects do the same thing. Before, with the in-memory
01:42:02.520 | message history object, that was basically replicating the buffer
01:42:06.120 | memory, so we didn't actually need to do anything; we didn't
01:42:10.240 | need to define our own class there. In this case, we do.
01:42:14.760 | So we follow the same pattern that LangChain follows with
01:42:19.800 | this base chat message history. And you can see a few of the
01:42:22.520 | functions here that are important. So add_messages and
01:42:25.760 | clear are the ones that we're going to be focusing on; we also need
01:42:28.320 | to have messages, which is this object attribute here. Okay, so
01:42:32.120 | we're just implementing the synchronous methods here. If we
01:42:37.680 | want this to support async, we would
01:42:40.440 | have to add an aadd_messages, an aget_messages, and an aclear as
01:42:45.760 | well. So let's go ahead and do that. We have messages, we have
01:42:49.800 | k again; we're looking at remembering the top k messages
01:42:52.840 | or most recent k messages only. So it's important that we have
01:42:56.440 | that variable. We are adding messages through this class;
01:43:00.280 | this is going to be used by LangChain within our runnable. So
01:43:04.080 | we need to make sure that we do have this method. And all we're
01:43:06.800 | going to be doing is extending the self.messages list here. And
01:43:11.480 | then we're actually just going to be trimming that down so that
01:43:13.600 | we're not remembering anything beyond those, you know, most
01:43:18.480 | recent k messages that we have set from here. And then we also
01:43:24.160 | have the clear method as well. So we need to include that;
01:43:26.920 | it's just going to clear the history. Okay, so this
01:43:30.120 | isn't complicated, right? It just gives us this nice, default,
01:43:34.160 | standard interface for message history, and we just need to
01:43:38.280 | make sure we're following that pattern. Okay, I've included
01:43:41.600 | this print here just so we can see what's happening.
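A minimal sketch of the class being described (not the exact notebook code; the pydantic-style field declarations are one reasonable way to satisfy the interface):

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage
from pydantic import BaseModel, Field

class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    """Chat message history that only keeps the most recent k messages."""
    messages: list[BaseMessage] = Field(default_factory=list)
    k: int = 4

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add new messages, then trim the buffer down to the last k."""
        print(f"add_messages called with {len(messages)} message(s)")
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        """Wipe the history entirely."""
        self.messages = []
```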
01:43:44.800 | Okay, so we have that. Now, for the get_chat_history function that
01:43:50.240 | we defined earlier, rather than using the built in method, we're
01:43:54.040 | going to be using our own object, which is a buffer window
01:43:57.520 | message history, which we defined just here. Okay. So if
01:44:02.800 | session ID is not in the chat map, as we did before, we're
01:44:05.800 | going to be initializing our buffer window message history,
01:44:08.480 | we're setting k up here with a default value of four, and then
01:44:12.320 | we just return it. Okay, and that is it. So let's run this,
01:44:16.200 | we have our runnable with message history, we have all of
01:44:20.360 | these variables, which are exactly the same as before. But
01:44:23.480 | then we also have these variables here with this history
01:44:26.600 | factory config. And this is where if we have new variables
01:44:34.040 | that we've added to our message history, in this case, k that we
01:44:38.680 | have down here, we need to provide that to LangChain and
01:44:42.480 | tell it this is a new configurable field. Okay. And
01:44:45.680 | we've also added one for the session ID here as well, so
01:44:48.640 | we're just being explicit and including everything. So we
01:44:52.240 | have that and we run it. Now let's go ahead and invoke it and see what we get.
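The wiring being described looks roughly like this (a sketch; the defaults and descriptions are illustrative, and `pipeline` / `get_chat_history` are the objects defined above):

```python
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    # Every extra parameter of get_chat_history (session_id and k here)
    # must be declared so it can be set via `config` at invoke time.
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="Number of most recent messages to keep",
            default=4,
        ),
    ],
)

# Both values are then passed through the configurable dict when invoking.
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_k4", "k": 4}},
)
```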
01:44:58.160 | Okay, so importantly, this history
01:45:02.680 | factory config, that is kind of being fed through into our
01:45:06.240 | invoke so that we can actually modify those variables from
01:45:09.840 | here. Okay, so we have config configurable, session ID, okay,
01:45:13.880 | we'll just put whatever we want in here. And then we also have
01:45:16.400 | the number k. Okay, so remember the previous four interactions,
01:45:22.640 | I think in this one, we're doing something slightly different. I
01:45:25.360 | think we're remembering the four interactions rather than the
01:45:28.560 | previous four interaction pairs. Okay, so my name is James,
01:45:32.560 | we're going to go through I'm just going to actually clear
01:45:35.400 | this. And I'm going to start again. And we're going to use
01:45:38.040 | the exact same add user message and AI message that we used
01:45:41.880 | before, which is manually inserting all that into our
01:45:44.240 | history, so that we can then just see, okay, what is the
01:45:47.840 | result. And you can see that with k equals four, unlike
01:45:52.360 | before where we were saving the most recent four interaction
01:45:56.920 | pairs, we're now saving the most recent four interactions, not
01:46:03.000 | pairs, just interactions. And honestly, I just think that's
01:46:06.480 | clearer. I think it's weird that the number four for k would
01:46:10.760 | actually save the most recent eight messages. Right? I think
01:46:14.960 | that's odd. So I'm just not replicating that weirdness. We
01:46:19.160 | could if we wanted to, I just don't like it. So I'm not doing
01:46:23.800 | that. And anyway, we can see from messages that we're
01:46:26.960 | returning just the four most recent messages, which
01:46:31.160 | would be these four. Okay, cool. So, just using the
01:46:35.160 | runnable, we've replicated the old way of having a window
01:46:40.640 | memory. And okay, I'm going to say what is my name again, as
01:46:44.200 | before, it's not going to remember. So we can come to
01:46:47.000 | here: "I'm sorry, but I don't have access to personal
01:46:48.680 | information," and so on and so on, "if you'd like to tell me your
01:46:51.360 | name..." It doesn't know. Now let's try a new one, where we
01:46:55.640 | initialize a new session. Okay, so we're going with ID k 14. So
01:47:01.240 | that's going to create a new conversation there. And we're
01:47:03.760 | going to say, we're going to set k to 14. Okay, great. I'm
01:47:09.320 | going to manually insert the other messages as we did
01:47:12.760 | before. Okay, and we can see all of those you can see at the
01:47:15.880 | top here, we are still maintaining that Hi, my name is
01:47:18.520 | James message. Now let's see if it remembers my name. Your name
01:47:23.480 | is James. Okay, there we go. Cool. So that is working. We
01:47:28.360 | can also see, so we just added this, what is my name again,
01:47:31.960 | let's just see if did that get added to our list of messages.
01:47:36.440 | Right, what is my name again? Nice. And then we also have the
01:47:39.640 | response, your name is James. So just by invoking this, because
01:47:43.320 | we're using the, the runnable with message history, it's just
01:47:47.800 | automatically adding all of that into our message history,
01:47:51.800 | which is nice. Cool. Alright, so that is the buffer window
01:47:56.920 | memory. Now we are going to take a look at how we might do
01:48:01.480 | something a little more complicated, which is the
01:48:03.880 | summaries. Okay, so when you think about the summary, you
01:48:07.080 | know, what are we doing, we're actually taking the messages,
01:48:10.680 | we're using the LLM call to summarize them, to compress
01:48:14.760 | them, and then we're storing them within messages. So let's
01:48:18.360 | see how we would actually do that. So to start with, let's
01:48:23.720 | just see how it was done in old LangChain. So we have our
01:48:27.000 | ConversationSummaryMemory; we go through that. And let's just
01:48:33.160 | see what we get. So again, same interactions. Right, I'm just
01:48:38.600 | invoking, invoking, invoking, I'm not adding these directly
01:48:42.120 | to the messages, because it actually needs to go through a
01:48:46.520 | like that summarization process. And if we have a look, we can
01:48:50.520 | see it happening. Okay, current conversation. So sorry,
01:48:54.680 | current conversation. Hello there, my name is James, AI is
01:48:57.880 | generating. Current conversation, the human introduces
01:49:01.320 | himself as James, AI greets James warmly and expresses its
01:49:04.760 | readiness to chat and assist, inquiring about how his day is
01:49:08.200 | going. Right, so it's summarizing the previous
01:49:11.640 | interactions. And then we have, you know, after that summary, we
01:49:15.720 | have the most recent human message, and then the AI is
01:49:18.520 | going to generate its response. Okay, and that continues going,
01:49:22.200 | continues going. And you see that the final summary here is
01:49:25.240 | going to be a lot longer. Okay, and it's different from that
01:49:28.280 | first summary, of course: asking about his day, he mentions that
01:49:31.160 | he's researching different types of conversational memory.
01:49:33.640 | The AI responds enthusiastically, explaining that
01:49:36.280 | conversational memory includes short term memory, long term
01:49:38.760 | memory, contextual memory, personalized memory, and then
01:49:41.080 | inquires if James is focused on the specific type of memory.
01:49:44.680 | Okay, cool. So we get essentially the summary is just
01:49:48.760 | getting longer and longer as we go. But at some point, the idea
01:49:52.520 | is that it's not going to keep growing. And it should actually
01:49:55.560 | be shorter than if you were saving every single
01:49:57.640 | interaction, whilst maintaining as much of the information as
01:50:01.960 | possible. But of course, you're not going to maintain all of
01:50:06.280 | the information that you would with, for example, the
01:50:09.720 | buffer memory. Right, with the summary, you are going to lose
01:50:13.640 | information, but hopefully less information than if you're just
01:50:17.960 | cutting interactions. So you're trying to reduce your token
01:50:21.880 | count whilst maintaining as much information as possible.
01:50:26.520 | Now, let's go and ask "what is my name again?" It should be able
01:50:30.360 | to answer, because we can see in the summary here that I
01:50:34.200 | introduced myself as James. Okay, response: "Your name is
01:50:38.360 | James. How is your research going?" Okay, so it has that. Cool.
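For reference, the deprecated summary-memory setup we just ran is roughly this (a sketch; `llm` is the chat model from earlier):

```python
# Deprecated, pre-0.3 style API, shown only for comparison.
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory

# The same LLM chats and also writes the running summary.
memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=memory)

chain.invoke({"input": "What is my name again?"})
```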
01:50:42.920 | Let's see how we'd implement that. So again, as before, we're
01:50:46.600 | going to go with that conversation summary message
01:50:50.760 | history. We're going to be importing a system message;
01:50:53.560 | we're going to be using that not for the LLM that we're chatting
01:50:56.040 | with, but for the LLM that will be generating our summary. So
01:51:00.520 | actually, that is not quite correct; it says "create a
01:51:04.760 | summary", not that it matters, it's just the docstring. So
01:51:07.880 | we have our messages and we also have the LLM. So a different
01:51:10.520 | attribute here to what we had before. When we initialize a
01:51:14.440 | conversation summary message history, we need to be passing
01:51:17.640 | in our LLM. We have the same methods as before; we have add
01:51:21.720 | messages and clear. And what we're doing is, as messages
01:51:25.240 | come in, we extend our current messages with them, but then we're
01:51:29.720 | modifying those. So we construct our instructions to
01:51:35.560 | make a summary. So that is here, we have the system prompt,
01:51:40.280 | given the existing conversation summary and the new messages,
01:51:43.240 | generate a new summary of the conversation, ensuring to
01:51:45.400 | maintain as much relevant information as possible. Then
01:51:48.920 | we have a human message here, through which we're passing the
01:51:52.360 | existing summary. And then we're passing in the new
01:51:56.840 | messages. So we format those and invoke the LLM.
01:52:04.040 | And then what we're doing is in the messages, we're actually
01:52:10.040 | replacing the existing history that we had before with a new
01:52:14.440 | history, which is the single system summary message. Let's
01:52:20.040 | see what we get. As before, we have that get chat history
01:52:23.160 | exactly the same as before. The only real difference is that
01:52:29.400 | we're passing in the LLM parameter here. And of course,
01:52:33.080 | as we're passing in the LLM parameter here, it does also
01:52:34.760 | mean that we're going to have to include that in the
01:52:39.160 | configurable field spec, and that we're going to need to
01:52:44.520 | include it when we're invoking our pipeline. So we
01:52:51.160 | run that and pass in the LLM. Now, of course, one side effect of
01:52:52.920 | generating summaries of everything is that we're
01:52:56.760 | actually generating more text. So you are
01:53:00.600 | actually using quite a lot of tokens. Whether or not you are
01:53:00.600 | saving tokens or not actually depends on the length of a
01:53:03.080 | conversation. As the conversation gets longer, if
01:53:05.880 | you're storing everything, after a little while the
01:53:09.480 | token usage is actually going to increase. So if in your use
01:53:13.720 | case you expect to have shorter conversations, you would be
01:53:17.800 | saving money and tokens by just using the standard buffer
01:53:22.120 | memory. Whereas if you're expecting very long
01:53:25.080 | conversations, you would be saving tokens and money by
01:53:28.440 | using the summary history. Okay, so let's see what we got
01:53:33.160 | from that. We have a summary of the conversation: James
01:53:35.400 | introduced himself by saying, "Hi, my name is James." The AI
01:53:37.800 | responded warmly, "Hi, James." The interaction even includes
01:53:40.600 | details about token usage. Okay, so we actually included
01:53:45.960 | everything here, which we probably should not have done.
01:53:49.400 | Why did we do that? Well, in here, we're passing in the full
01:53:57.240 | message objects rather than just the content from the
01:54:03.720 | messages. So I think if we just do "x.content" for
01:54:10.200 | "x" in messages, that should resolve that.
01:54:16.280 | Okay, there we go. So we quickly fixed that. So yeah, before
01:54:21.160 | we're passing in the entire message object, which obviously
01:54:23.560 | includes all of this information. Whereas actually
01:54:26.200 | we just want to be passing in the content. So we modified
01:54:30.360 | that and now we're getting what we'd expect. Okay, cool. And
01:54:35.640 | then we can keep going. So as we as we keep going, the
01:54:38.600 | summary should get more abstract. Like as we just saw
01:54:42.920 | here, it's literally just giving us the messages directly
01:54:46.440 | almost. Okay, so we're getting the summary there and we can
01:54:50.120 | keep going. We're going to add just more messages to that. So
01:54:53.080 | we'll see, as we send those, we're getting a
01:54:57.720 | response. Send again, get a response. And we're just adding
01:55:01.000 | all of that, invoking all of that, and that will of course be
01:55:03.960 | adding everything into our message history. Okay, cool. So
01:55:08.440 | we've run that. Let's see what the latest summary is.
01:55:13.560 | Okay, and then we have this. So this is a summary that we have
01:55:16.820 | instead of our chat history. Okay, cool. Now, finally, let's
01:55:23.860 | see what's my name again. We can just double check. You know,
01:55:26.980 | it has my name in there. So it should be able to tell us.
01:55:31.460 | Okay, cool. So your name is James. Pretty interesting. So
01:55:38.680 | let's have a quick look over at LangSmith. So the reason I
01:55:43.080 | want to do this is just to point out, okay, essentially the
01:55:46.600 | different token usage that we're getting with each one of
01:55:48.840 | these. Okay, so we can see that we have these runnable
01:55:51.400 | with message history runs, whose naming could probably be
01:55:54.200 | improved. But we can see, okay, how long is each one of these
01:55:59.000 | taken? How many tokens are they also using? Come back to here.
01:56:03.800 | We have this runnable message history. This is, we'll go
01:56:07.320 | through a few of these, maybe to here, I think. You can see
01:56:11.400 | here, this is that first interaction where we're using
01:56:13.880 | the buffer memory. And we can see how many tokens we use
01:56:18.280 | here. So 112 tokens when we're asking what is my name again.
01:56:22.280 | Okay, then we modified this to include, I think it was like
01:56:27.880 | 14 interactions or something on those lines, obviously
01:56:30.520 | increases the number of tokens that we're using, right? So we
01:56:33.160 | can see that actually happening all in LangSmith, which is
01:56:36.200 | quite nice. And we can compare, okay, how many tokens is each
01:56:38.920 | one of these using. Now, this is looking at the buffer window.
01:56:43.960 | And if we come down to here and look at this one, so this is
01:56:47.640 | using our summary. Okay, so the summary with "what is my name
01:56:51.560 | again?" actually used more tokens in this scenario, right? Which
01:56:54.520 | is interesting, because we're trying to compress information.
01:56:57.640 | The reason there's more is because there
01:56:59.880 | haven't been that many interactions. As the
01:57:02.680 | conversation length increases with the summary, this total
01:57:08.120 | number of tokens, especially if we prompt it correctly to keep
01:57:10.600 | that low, that should remain relatively small. Whereas with
01:57:16.040 | the buffer memory, that will just keep increasing and
01:57:19.560 | increasing as the conversation gets longer. So useful little
01:57:25.000 | way of using LangSmith there to just kind of figure out, okay,
01:57:28.920 | in terms of tokens and costs of what we're looking at for each
01:57:32.200 | of these memory types. Okay, so our final memory type acts as a
01:57:37.720 | mix of the summary memory and the buffer memory. So what it's
01:57:42.440 | going to do is keep the buffer up until an n number of tokens.
01:57:48.440 | And then once a message exceeds the n number of token limit for
01:57:52.760 | the buffer, it is actually going to be added into our
01:57:56.760 | summary. So this memory has the benefit of remembering in
01:58:02.600 | detail the most recent interactions whilst also not
01:58:07.000 | having the limitation of using too many tokens as a
01:58:12.440 | conversation gets longer and even potentially exceeding
01:58:15.400 | context windows if you try super hard. So this is a very
01:58:19.480 | interesting approach. Now as before, let's try the original
01:58:23.880 | way of implementing this. Then we will go ahead and use our
01:58:29.000 | update method for implementing this. So we come down to here
01:58:32.680 | and we're going to do Lang chain memory import conversation
01:58:36.360 | summary buffer memory. Okay, a few things here. LLM for
01:58:41.480 | summary. We have the n number of tokens that we can keep
01:58:46.200 | before they get added to the summary and then return
01:58:49.160 | messages, of course. Okay, you can see again that this is
01:58:51.560 | deprecated. We use the ConversationChain, and we
01:58:56.040 | just pass our memory in there, and then we can chat.
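For reference, that deprecated setup is roughly this (a sketch; the 300-token limit matches what we use here, and `llm` is the chat model from earlier):

```python
# Deprecated, pre-0.3 style API, shown only for comparison.
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,               # the LLM that writes the summary
    max_token_limit=300,   # keep raw messages up to ~300 tokens, then summarize
    return_messages=True,
)
chain = ConversationChain(llm=llm, memory=memory)
```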
01:58:59.640 | Okay, so the first message is super straightforward. We'll add a few more
01:59:03.880 | here. Again, we have to invoke because our memory type here is
01:59:10.120 | using the LLM to create those summaries as it goes, and let's
01:59:14.360 | see what they look like. Okay, so we can see for the first
01:59:16.920 | message here, we have a human message and then an AI message.
01:59:22.360 | Then we come a little bit lower down again. It's the same
01:59:24.440 | thing. Human message is the first thing in our history here.
01:59:28.840 | Then it's a system message. So this is at the point where
01:59:31.560 | we've exceeded that 300 token limit and the memory type here
01:59:36.440 | is generating those summaries. So that summary comes in as
01:59:40.120 | this is a message and we can see, okay, the human named
01:59:43.240 | James introduces himself and mentions he's researching
01:59:45.720 | different types of conversational memory and so on
01:59:47.960 | and so on. Right. Okay, cool. So we have that. Then let's come
01:59:53.480 | down a little bit further. We can see, okay, the summary
01:59:57.160 | there. Okay, so that's what we have. That is
02:00:01.960 | the implementation for the old version of this memory. Again,
02:00:07.880 | we can see it's deprecated. So how do we implement this for
02:00:12.040 | our more recent versions of LangChain and specifically
02:00:16.200 | 0.3? Well, again, we're using that runnable message history
02:00:20.840 | and it looks a little more complicated than we were
02:00:24.360 | getting before, but it's actually just, you know, it's
02:00:26.680 | nothing too complex. We're just creating a summary as we
02:00:31.800 | did with the previous memory type, but the decision for
02:00:36.360 | adding to that summary is based on, in this case, actually the
02:00:39.960 | number of messages. So I didn't go with the LangChain
02:00:43.960 | version where it's a number of tokens. I don't like that. I
02:00:47.240 | prefer to go with messages. So what I'm doing is saying, okay,
02:00:50.520 | the last K messages. Okay. Once we exceed K messages, the
02:00:56.200 | messages beyond that are going to be added to the memory.
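The shape of that decision can be sketched as a small standalone helper (illustrative only; `summarize` is a hypothetical callable standing in for the LLM summarization step described below):

```python
from langchain_core.messages import BaseMessage, SystemMessage

def trim_and_summarize(
    history: list[BaseMessage],
    new_messages: list[BaseMessage],
    k: int,
    summarize,  # hypothetical: (existing_summary, old_messages) -> SystemMessage
) -> list[BaseMessage]:
    """Keep the last k messages verbatim; fold anything older into the summary."""
    existing_summary = None
    if history and isinstance(history[0], SystemMessage):
        existing_summary = history.pop(0)        # the running summary sits at position 0
    history = history + list(new_messages)
    if len(history) <= k:                        # nothing old enough to fold in yet
        return ([existing_summary] if existing_summary else []) + history
    old, recent = history[:-k], history[-k:]     # split off the oldest messages
    return [summarize(existing_summary, old)] + recent
```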
02:01:00.280 | Okay, cool. So let's see, we first initialize our
02:01:06.040 | conversation summary buffer message history class with LLM
02:01:11.640 | and K. Okay, so these two here. So LLM, of course, to create
02:01:15.320 | summaries and K is just the limit of number of messages
02:01:18.360 | that we want to keep before adding them to the summary or
02:01:21.560 | dropping them from our messages and adding them to the summary.
02:01:24.920 | Okay, so we will begin with, okay, do we have an existing
02:01:30.360 | summary? So the reason we set this to none is we can't extract
02:01:36.840 | the summary, the existing summary, unless it already
02:01:40.200 | exists. And the only way we can do that is by checking, okay,
02:01:43.800 | do we have any messages? If yes, we want to check if within
02:01:47.960 | those messages, we have a system message, because we're
02:01:50.440 | using the same structure as before, where the
02:01:53.720 | system message, that first system message, is actually our
02:01:56.840 | summary. So that's what we're doing here. We're checking if
02:01:59.400 | there is a summary message already stored within our
02:02:02.200 | messages. Okay, so we're checking for that. If we find
02:02:08.600 | it, we'll just do, we have this little print statement so we
02:02:11.080 | can see that we found something and then we just make our
02:02:15.480 | existing summary. I should actually move this to the first
02:02:20.920 | instance here. Okay, so that existing summary will be set
02:02:26.920 | to the first message. Okay, and this would be a system message
02:02:33.480 | rather than a string. Cool, so we have that. Then we want to
02:02:39.640 | add any new messages to our history. Okay, so we're sending
02:02:44.760 | the history there and then we're saying, okay, if the
02:02:47.560 | length of our history is exceeds the K value that we
02:02:51.480 | set, we're going to say, okay, we found that many messages.
02:02:54.120 | We're going to be dropping the latest. It's going to be the
02:02:56.040 | latest two messages. This I will say here, one thing or one
02:03:01.640 | problem with this is that we're not going to be saving that
02:03:04.840 | many tokens if we're summarizing every two messages.
02:03:08.440 | So what I would probably do is in an actual like production
02:03:13.480 | setting, I would probably say let's go to twenty messages and
02:03:20.040 | once we hit twenty messages, let's take the previous ten.
02:03:23.720 | We're going to summarize them and put them into our summary
02:03:26.600 | alongside any previous summary that already existed, but in
02:03:30.440 | you know, this is also fine as well. Okay, so we say we found
02:03:36.600 | those messages. We're going to drop the latest two messages.
02:03:40.760 | Okay, so we pull the oldest messages out. I should say
02:03:46.200 | not the latest. It's the oldest, not the latest. We want to
02:03:51.000 | keep the latest and drop the oldest. So we pull out the
02:03:54.840 | oldest messages and keep only the most recent messages.
02:03:59.240 | Okay, then I'm saying, okay, if we don't have any old
02:04:03.720 | messages to summarize, we don't do anything. We just return.
02:04:07.560 | Okay, so this indicates that this has not been triggered. We
02:04:11.880 | would hit this, but in the case this has been triggered and we
02:04:17.000 | do have old messages, we're going to come to here. Okay, so
02:04:22.760 | this is we can see we have a system message prompt template
02:04:26.760 | saying giving the existing conversation summary in the new
02:04:29.480 | messages generate a new summary of the conversation,
02:04:32.520 | ensuring to maintain as much relevant information as
02:04:34.760 | possible. So if we want to be more conservative with tokens,
02:04:38.040 | we could modify this prompt here to say keep the summary to
02:04:42.360 | within the length of a single paragraph, for example, and
02:04:46.680 | then we have our human message prompt template, which can
02:04:49.240 | say, okay, here's the existing conversation summary and here
02:04:51.960 | are new messages. Now, new messages here is actually the
02:04:55.160 | old messages, but the way that we're framing it to the LLM
02:04:59.400 | here is that we want to summarize the whole conversation,
02:05:02.680 | right? It doesn't need to have the most recent messages that
02:05:05.000 | we're storing within our buffer. It doesn't need to know
02:05:08.600 | about those. That's irrelevant to the summary. So we just tell
02:05:11.560 | it that we have these new messages and as far as this LLM
02:05:14.280 | is concerned, this is like the full set of interactions. Okay,
02:05:18.600 | so then we would format those and invoke our LLM and then
02:05:23.800 | we'll print out our new summary so we can see what's going on
02:05:26.360 | there and we would prepend that new summary to our
02:05:31.640 | conversation history. Okay, and this works; we can just
02:05:37.240 | prepend it like this because we've already popped it. Where was
02:05:43.640 | it? Up here: if we have an existing summary, we already
02:05:48.600 | popped that from the list. It's already been pulled out of
02:05:50.520 | that list. So it's okay for us to just prepend; we don't need
02:05:54.760 | to do anything else, because we've already dropped
02:05:58.280 | that initial system message if it existed. Okay, and then we
02:06:01.960 | have the clear method as before. So that's all of the
02:06:05.640 | logic for our conversational summary buffer memory. We
02:06:12.200 | redefine our get chat history function with the LLM and k
02:06:18.760 | parameters there, and then we'll also want to set the
02:06:21.480 | configurable fields again. So those are going to be
02:06:25.080 | session ID, LLM, and k. Okay, so now we can invoke. The k value
02:06:32.280 | to begin with is going to be four. Okay, so you can see no
02:06:37.880 | old messages to update summary with. That's good. Let's invoke
02:06:42.520 | this a few times and let's see what we get. Okay, so no old
02:06:47.080 | messages to update summary with.
02:06:51.540 | Found six messages, dropping the oldest two, and then we have the new
02:06:55.460 | summary: in the conversation, James introduces himself and
02:06:57.700 | says he is interested in researching different types of
02:07:00.180 | conversational memory. Right, so you can see there's quite a lot
02:07:03.220 | in here at the moment. So we would definitely want to prompt
02:07:07.940 | the summary LLM to keep that short. Otherwise, we're
02:07:12.100 | just getting a ton of stuff, right? But we can see that it
02:07:16.820 | is, you know, working. It's functional. So let's go
02:07:20.500 | back and see if we can prompt it to be a little more concise.
02:07:23.940 | So we come to here, and after "maintain as much relevant
02:07:27.460 | information as possible", we add: "However, we need to keep our
02:07:34.980 | summary concise. The limit is a single short paragraph." Okay,
02:07:45.060 | something like this. Let's try it and see what we get with
02:07:48.980 | that. Okay, so message one again and nothing to update.
02:07:54.100 | You can see this new summary is a bit shorter. It
02:07:57.700 | doesn't have all those bullet points. Okay, so that seems
02:08:04.900 | better. Let's see. So you can see the first summary is a bit
02:08:09.620 | shorter, but then as soon as we get to the second and third
02:08:13.700 | summaries, the second summary is actually slightly longer than
02:08:16.980 | the third one. Okay, so we're going to be
02:08:20.260 | losing a bit of information in this case, more than we were
02:08:23.460 | before, but we're saving a ton of tokens. So that's of course
02:08:27.460 | a good thing and of course we could keep going and adding
02:08:30.500 | many interactions here and we should see that this
02:08:33.460 | conversation summary will be it should maintain that sort of
02:08:37.220 | length of around one short paragraph. So that is it for
02:08:43.220 | this chapter on conversational memory. We've seen a few
02:08:47.300 | different memory types. We've implemented the old deprecated
02:08:51.140 | versions so we can see what they were like and then we've
02:08:55.060 | reimplemented them for the latest versions of LangChain,
02:08:58.500 | and, to be honest, using logic where we are getting much more
02:09:02.740 | into the weeds. And that, in some ways, complicates
02:09:07.300 | things, that is true, but in other ways it gives us a ton of
02:09:10.900 | control so we can modify those memory types as we did with
02:09:14.180 | that final summary buffer memory type. We can modify
02:09:17.940 | those to our liking, which is incredibly useful when you're
02:09:23.060 | actually building applications for the real world. So that is
02:09:26.340 | it for this chapter. We'll move on to the next one. In this
02:09:29.780 | chapter, we are going to introduce agents. Now, agents, I
02:09:34.820 | think, are one of the most important components in the
02:09:39.300 | world of AI, and I don't see that going away anytime soon.
02:09:43.140 | I think for the majority of AI applications, the intelligent
02:09:49.220 | part of those will almost always be an implementation of an
02:09:53.380 | AI agent or multiple AI agents. So in this chapter, we are just
02:09:57.940 | going to introduce agents within the context of
02:10:01.780 | LangChain. We're going to keep it relatively simple. We're going
02:10:05.540 | to go into much more depth in agents in the next chapter
02:10:10.500 | where we'll do a bit of a deep dive, but we'll focus on just
02:10:14.260 | introducing the core concepts and of course agents within
02:10:18.900 | LangChain here. So, jumping straight into our notebook,
02:10:24.500 | let's run our prerequisites. You'll see that we do have an
02:10:28.660 | additional prerequisite here, which is Google search results.
02:10:31.780 | That's because we're going to be using the SerpAPI to allow
02:10:35.940 | our LLM, as an agent, to search the web, which is one of the
02:10:41.700 | great things about agents: they can do all of these
02:10:44.420 | additional things that an LLM by itself obviously cannot. So
02:10:48.420 | we'll come down to here. We have our LangSmith parameters
02:10:51.700 | again, of course. So you enter your LangChain API key if you
02:10:54.900 | have one, and now we're going to take a look at tools, which is
02:10:59.380 | a very essential part of agents. So tools are a way for
02:11:04.740 | us to augment our LLMs with essentially anything that we
02:11:08.900 | can write in code. So we mentioned that we're going to
02:11:12.420 | have a Google search tool; that Google search tool is some
02:11:15.860 | code that gets executed by our LLM in order to search Google
02:11:20.180 | and get some results. So a tool can be thought of as any code
02:11:25.620 | logic or any function, in the case of Python, a function
02:11:31.380 | that has been formatted in a way so that our LLM can
02:11:34.900 | understand how to use it and then actually use it. Although
02:11:39.860 | the LLM itself is not using the tool; it's more our agent
02:11:44.740 | execution logic which uses the tool for the LLM. So we're
02:11:49.220 | going to go ahead and actually create a few simple tools.
02:11:52.740 | We're going to be using what is called the tool decorator from
02:11:55.380 | LangChain, and there are a few things to keep in mind when
02:12:00.100 | we're building tools. So for optimal performance, our tool
02:12:04.100 | needs to be just very readable, and what I mean by readable is
02:12:07.780 | we need three main things. One is a docstring that is written in
02:12:12.660 | natural language and is going to be used to explain to
02:12:15.860 | the LLM when, why, and how it should use this tool. We should
02:12:21.460 | also have clear parameter names. Those parameter names
02:12:25.460 | should tell the LLM what each one of these parameters
02:12:29.780 | is. They should be self-explanatory. If they are not
02:12:33.060 | self-explanatory, we should be including an explanation for
02:12:37.860 | those parameters within the docstring. Then finally, we
02:12:41.220 | should have type annotations for both our parameters and
02:12:44.740 | also what we're returning from the tool. So let's jump in and
02:12:49.060 | see how we would implement all of that. So we come down here and
02:12:52.820 | we have `from langchain_core.tools import tool`.
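The four tools being described look roughly like this (a sketch following the notebook's pattern):

```python
from langchain_core.tools import tool

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y' together."""
    return x + y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y' together."""
    return x * y

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y

@tool
def subtract(x: float, y: float) -> float:
    """Subtract 'y' from 'x'."""
    return x - y
```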
02:12:57.380 | Okay, so these are just four incredibly simple tools. We have the addition or
02:13:02.020 | add tool, multiply, exponentiate, and the subtract
02:13:05.780 | tools. Okay. So a few calculator-style tools. Now, when we
02:13:11.780 | add this tool decorator, it is turning each of these tools
02:13:17.140 | into what we call a structured tool object. So you can see
02:13:20.980 | that here. We can see we have this structured tool. We have a
02:13:26.180 | name description. Okay. And then we have this schema. We'll
02:13:30.340 | see this in a moment and a function right. So this
02:13:32.660 | function is literally just the original function. It's a
02:13:36.660 | mapping to the original function. So in this case, it's
02:13:39.700 | the add function. Now, the description, we can see, is
02:13:42.820 | coming from our docstring, and of course the name as well is
02:13:46.740 | just coming from the function name. Okay. And then we can
02:13:50.020 | also see let's just print the name and description, but then
02:13:54.420 | we can also see the args schema, right? So this
02:13:58.660 | thing here that we can't read at the moment, to read it, we're
02:14:02.180 | just going to look at the model JSON schema method, and then we
02:14:06.980 | can see what that contains, which is all of this
02:14:09.220 | information. So this actually contains everything, including
02:14:12.260 | properties. So we have the X. It creates a sort of title for
02:14:16.100 | that and it also specifies the type. Okay. So the type that we
02:14:20.660 | define is float; for OpenAI, I guess, that gets mapped to
02:14:25.300 | number rather than just being float, and then we also see that
02:14:28.900 | we have this required field. So this is telling our LLM which
02:14:33.140 | parameters are required and which ones are optional. So, you
02:14:36.820 | know, in some cases you would have optional ones; we can even do that here. Let's do
02:14:42.180 | Z. That is going to be float or none. Okay. And we're just
02:14:48.340 | going to say it is 0.3. Alright. I'm going to remove
02:14:53.460 | this in a minute because it's kind of weird, but let's just
02:14:57.140 | see what that looks like. So you see that we now have X, Y,
02:15:02.020 | and Z, but then in Z, we have some additional information.
02:15:06.580 | Okay. So it can be any of it can be a number or it can just
02:15:10.020 | be nothing. The default value for that is 0.3. Okay. And then
02:15:15.060 | if we look here, we can see that the required field does
02:15:18.020 | not include Z. So it's just X and Y. So it's describing the
02:15:22.980 | full function schema for us, but let's remove that. Okay. And
02:15:28.180 | we can see that again with our exponentiate tool similar
02:15:32.420 | thing. Okay. So how are we going to invoke our tool? Well,
02:15:39.060 | the underlying LLM is actually going to generate a
02:15:42.900 | string. Okay. So it will look something like this. This is
02:15:46.660 | going to be our LLM output. So it's a string that is
02:15:51.780 | some JSON and of course to load a string into a dictionary
02:15:57.300 | format, we just use JSON loads. Okay. So let's see that. So
02:16:03.220 | this could be the output from our LLM. We load it into a
02:16:06.180 | dictionary and then we get an actual dictionary. And then
02:16:09.620 | what we would do is we can take our exponentiate tool. We
02:16:14.820 | access the underlying function and then we pass it the keyword
02:16:19.220 | arguments from our dictionary here. Okay. And that will
02:16:26.200 | execute our tool. That is the tool execution logic that
02:16:29.000 | LangChain implements, and which, later on in the next chapter,
02:16:32.520 | we'll be implementing ourselves. Cool.
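Concretely, the pattern just described looks something like this (a sketch; the argument values are arbitrary):

```python
import json

# The LLM emits its chosen tool arguments as a JSON string, e.g.:
llm_output_string = '{"x": 5, "y": 2}'

# Load the string into a dictionary...
llm_output_dict = json.loads(llm_output_string)

# ...then call the function behind the structured tool with those values
# as keyword arguments. This is what the executor does for the LLM.
result = exponentiate.func(**llm_output_dict)  # 5 ** 2 -> 25.0
```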
02:16:35.560 | So let's move on to creating an agent. Now, we're going to be
02:16:38.680 | constructing a simple tool calling agent. We're going to
02:16:41.880 | be using LangChain Expression Language to do this. Now, we
02:16:45.720 | will be covering LangChain Expression Language, or LCEL,
02:16:49.400 | more in an upcoming chapter, but for now, all we need to know is
02:16:54.600 | that our agent will be constructed using syntax and
02:16:58.840 | components like this. So, we would start with our input
02:17:02.760 | parameters. That is going to include our user query and of
02:17:06.040 | course, the chat history because we need our agent to be
02:17:09.080 | conversational and remember previous interactions within
02:17:11.720 | the conversation. These input parameters will also include a
02:17:15.800 | placeholder for what we call the agent scratch pad. Now, the
02:17:18.680 | agent scratch pad is essentially where we are
02:17:21.240 | storing the internal thoughts or the internal dialogue of the
02:17:25.400 | agent as it is using tools and getting observations from those
02:17:28.280 | tools and working through those multiple internal steps. So, in
02:17:34.040 | the case that we will see, it will be using, for example, the
02:17:36.760 | addition tool, getting the result using the multiply tool,
02:17:39.720 | getting the result, and then providing a final answer
02:17:42.760 | back to the user. So, let's jump in and see what it looks
02:17:46.680 | like. Okay, so we'll just start with defining our prompt. So,
02:17:50.360 | our prompt is going to include the system message. That's
02:17:53.480 | nothing. We're not putting anything special in there.
02:17:56.680 | We're going to include the chat history which is a messages
02:18:01.160 | placeholder. Then, we include our human message and then we
02:18:05.320 | include a placeholder for the agent scratchpad. Now, the way
02:18:08.760 | that we implement this later is going to be slightly different
02:18:12.040 | for the scratchpad; we'd actually use the messages
02:18:14.200 | placeholder, but this is how we use it with the built-in
02:18:17.400 | create tool calling agent from LangChain. Next, we'll define our
02:18:21.240 | LLM. We do need our OpenAI API key for that. So, we'll
02:18:24.920 | enter that here like so. Okay, so come down. Okay, so we're
02:18:30.120 | going to be creating this agent. We need conversation
02:18:33.240 | memory and we are going to use the older conversation buffer
02:18:36.280 | memory class rather than the newer runnable with message
02:18:39.080 | history class. That's just because we're also using this
02:18:42.200 | older create tool calling agent and this is the
02:18:46.760 | older way of doing things. In the next chapter, we are going
02:18:50.040 | to be using the more recent basically what we already
02:18:54.600 | learned on chat history. We're going to be using all of that
02:18:57.720 | to implement our chat history but for now, we're going to be
02:19:00.520 | using the older method which is deprecated just as a pre
02:19:04.760 | warning, but again, as I mentioned at the very start of the
02:19:08.200 | course, we're starting abstract and then we're getting into the
02:19:11.720 | details. So, we're going to initialize our agent.
02:19:15.960 | We need these four things. LLM as we defined. Tools as we have
02:19:20.440 | defined. Prompt as we have defined and then the memory
02:19:24.520 | which is our old conversation buffer memory. So, with all of
02:19:29.400 | that, we are going to go ahead and create a tool calling
02:19:32.360 | agent, and we just provide it with everything. Okay, there we go.
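Put together, that setup looks roughly like this (a sketch; the system message text and input key names are illustrative, and `llm` / `tools` are the objects defined above):

```python
from langchain.agents import create_tool_calling_agent
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", "You're a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),  # the conversation so far
    ("human", "{input}"),                               # the user's new query
    ("placeholder", "{agent_scratchpad}"),              # the agent's tool calls and observations
])

# Older, deprecated memory class, used here to match the older agent API.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

agent = create_tool_calling_agent(llm=llm, tools=tools, prompt=prompt)
```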
02:19:36.120 | Now, you'll see here that I didn't pass in the memory; I'm
02:19:41.400 | passing it in down here instead. So, we're going to
02:19:44.920 | start with this question which is what is 10.7 multiplied by
02:19:48.680 | 7.68. Okay. So, given the precision of these numbers, our
02:19:57.240 | normal LLM would not be able to answer that. Almost definitely
02:20:02.360 | would not be able to answer that correctly. We need a
02:20:04.920 | external tool to answer that accurately and we'll see that
02:20:08.520 | that is exactly what it's trying to do. So, we can see
02:20:12.440 | that the tool agent action message here. We see that it
02:20:17.800 | decided, okay, I'm going to use the multiply tool and here are
02:20:20.520 | the parameters I want to use for that tool. Okay, we can see
02:20:23.720 | X is 10.7 and Y is 7.68. You can see here that this is
02:20:28.760 | already a dictionary, and that is because LangChain has
02:20:33.320 | taken the string from our LLM call and already converted it
02:20:37.880 | into a dictionary for us. Okay, so that's just happening
02:20:41.240 | behind the scenes there and you can actually see if we go into
02:20:44.840 | the details a little bit, we can see that we have these
02:20:46.840 | arguments and this is the original string that was coming
02:20:49.400 | from our LLM. Okay, which has already been, of course,
02:20:52.680 | processed by LangChain. So, we have that. Now, the one thing
02:20:58.280 | missing here is that, okay, we've got that the LLM wants
02:21:03.800 | us to use multiply and we've got what the LLM wants us to
02:21:06.760 | put into multiply but where's the answer, right? There is no
02:21:11.160 | answer because the tool itself has not been executed because
02:21:14.840 | it can't be executed by the LLM but then, okay, didn't we
02:21:19.640 | already define our agent here? Yes, we defined the part of our
02:21:24.760 | agent. That is, our LLM has our tools and it is going to
02:21:29.240 | decide which tool to use, but it actually doesn't include the
02:21:33.880 | agent execution part which is, okay, the agent executor is a
02:21:40.360 | broader thing. It's broader logic like just code logic
02:21:44.520 | which acts as a scaffolding within which we have the
02:21:48.600 | iteration through multiple steps of our LLM calls followed
02:21:53.560 | by the LLM outputting what tool to use followed by us
02:21:57.320 | actually executing that for the LLM and then providing the
02:22:01.400 | output back into the LLM for another decision or another
02:22:05.480 | step. So, the agent itself here is not the full agentic flow
02:22:12.440 | that we might expect. Instead, for that, we need to implement
02:22:16.440 | this agent executor class. This agent executor includes our
02:22:20.840 | agent from before. Then, it also includes the tools and one
02:22:25.160 | thing here is, okay, we already passed the tools to our agent.
02:22:27.800 | Why do we need to pass them again? Well, the tools being
02:22:30.760 | passed to our agent up here, that is being used. So, that is
02:22:36.280 | essentially extracting out those function schemas and
02:22:39.240 | passing it to our LLM so that our LLM knows how to use the
02:22:41.880 | tools. Then, we're down here. We're passing the tools again
02:22:44.840 | to our agent executor and this is rather than looking at how
02:22:48.920 | to use those tools. This is just looking at, okay, I want
02:22:51.880 | the functions for those tools so that I can actually execute
02:22:54.440 | them for the LLM or for the agent. Okay, so that's what is
02:22:58.760 | happening there. Now, we can also pass in our memory
02:23:02.440 | directly. So, you see, if we scroll up a little bit here, I
02:23:06.600 | actually had to pass in the memory like this with our agent.
02:23:11.720 | That's just because we weren't using the agent executor. Now,
02:23:14.120 | we have the agent executor. It's going to handle that for
02:23:16.200 | us and another thing that's going to handle for us is
02:23:19.880 | intermediate steps. So, you'll see in a moment that when we
02:23:23.960 | invoke the agent executor, we don't include the intermediate
02:23:26.600 | steps and that's because that is already handled by the
02:23:29.800 | agent executor now. So, we'll come down. We'll set verbose
02:23:34.360 | equal to true so we can see what is happening and then we
02:23:38.200 | can see here, there's no intermediate steps anymore and
02:23:42.360 | we do still pass in the chat history like this but then the
02:23:47.480 | addition of those new interactions to our memory is
02:23:50.520 | going to be handled by the executor. So, in fact, let me
02:23:54.920 | actually show that very quickly before we jump in. Okay, so
02:23:59.320 | that's currently empty. We're going to execute this.
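The executor setup being run here looks roughly like this (a sketch; exactly how the chat history is injected depends on how the memory and prompt keys are wired, so treat the invoke call as illustrative):

```python
from langchain.agents import AgentExecutor

# The executor wraps the agent with the looping logic: call the LLM, run
# whichever tool it asks for, feed the observation back, repeat until done.
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,    # passed again so the executor can actually run the tools
    memory=memory,  # the executor now reads and writes the chat history for us
    verbose=True,   # print each step so we can watch what is happening
)

agent_executor.invoke({"input": "What is 10.7 multiplied by 7.68?"})
```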
02:24:03.400 | Okay, we've entered that new agent executor chain, and let's
02:24:07.300 | just have a quick look at our messages again, and now you can
02:24:10.980 | see that agent executor automatically handled the
02:24:13.940 | addition of our human message and then the responding AI
02:24:17.700 | message for us. Okay, which is useful. Now, what happened? So,
02:24:23.140 | we can see that the multiply tool was invoked with these
02:24:26.820 | parameters and then this pink text here that we got, that is
02:24:30.900 | the observation from the tool. So, it's what the tool output
02:24:33.700 | back to us, okay? Then, this final message here is not
02:24:37.140 | formatted very nicely but this final message here is coming
02:24:40.420 | from our LLM. So, the green is our LLM output. The pink is our
02:24:46.420 | tool output, okay? So, the LLM after seeing this output says
02:24:53.700 | 10.7 multiplied by 7.68 is approximately 82.18. Okay,
02:25:01.220 | cool. Useful and then we can also see that the chat history
02:25:04.500 | which we already just saw. Great. So, that has been used
02:25:08.980 | correctly. We can just also confirm that that is correct.
02:25:13.220 | 82.1759 recurring which is exactly what we get here. Okay
02:25:18.740 | and we the reason for that is obviously our multiply tool is
02:25:22.340 | just doing this exact operation. Cool. So, let's try
02:25:28.100 | this with a bit of memory. So, I'm going to ask, or I'm going
02:25:31.700 | to state to the agent: "Hello, my name is James." We'll leave
02:25:36.980 | that as, well, it's not actually the first interaction, because
02:25:40.100 | we already have these, but it's an early interaction with my
02:25:45.860 | name in there. Then, we're going to try and perform
02:25:49.460 | multiple tool calls within a single execution loop and what
02:25:52.500 | you'll see with when it is calling these tools is that you
02:25:55.220 | can actually use multiple tools in parallel. So, for sure, I
02:25:58.420 | think two or three of these were used in parallel, and then
02:26:01.460 | the final subtract had to wait for those previous results. So,
02:26:05.220 | it would have been executed afterwards, and we should
02:26:08.420 | actually be able to see this in LangSmith. So, if we go here,
02:26:13.220 | yeah, we can see that we have this initial call, and then we
02:26:17.060 | have add, multiply, and exponentiate all used in parallel.
02:26:20.100 | Then, we have another call, which is subtract, and then we
02:26:22.820 | get the response. Okay, which is pretty cool and then the
02:26:27.620 | final result there is negative eleven. Now, when you look at
02:26:32.420 | whether the answer is accurate, I think the order here of
02:26:37.300 | calculations is not quite correct. So, if we put the
02:26:41.380 | actual computation here, it gets it right, but otherwise, if
02:26:45.620 | I use natural language, well, maybe I'm
02:26:48.260 | phrasing it in a poor way. Okay, so I suppose that is
02:26:53.780 | pretty important. So, okay, if we put the computation in here,
02:26:57.940 | we get the negative thirteen. So, it's something to be
02:27:01.460 | careful with, and it probably requires a little bit of
02:27:04.660 | prompting and maybe some examples in order to get
02:27:08.020 | that smooth, so that it does do things in the way that we might
02:27:12.740 | expect, or maybe we as humans are just bad and misuse the
02:27:17.140 | systems, one or the other. Okay, so now we've gone through that
02:27:21.460 | a few times. Let's go and see if our agent can still recall
02:27:24.420 | our name. Okay and it remembers my name is James. Good. So, it
02:27:28.500 | still has that memory in there as well. That's good. Let's
02:27:32.020 | move on to another quick example where we're just going
02:27:35.220 | to use Google Search. So, we're going to be using the
02:27:37.700 | SerpAPI. You can get the API key that you need
02:27:43.540 | from here, serpapi.com, slash user, slash sign in, and
02:27:48.340 | just enter that in here. You get up to 100
02:27:52.900 | searches per month for free, so just be aware of that if
02:27:58.100 | you overuse it. I don't think they charge you, because I don't
02:28:01.300 | think you enter your card details straight away, but yeah,
02:28:05.060 | just be aware of that limit. Now, there are certain tools
02:28:10.180 | that LangChain has already built for us. So, they're
02:28:12.740 | pre-built tools, and we can just load them using the load tools
02:28:15.860 | function. So, we do that like so. We have our load tools, and
02:28:19.300 | we just pass in the SerpAPI tool only. We can pass in more
02:28:22.980 | there if we want to, and then we also pass in our LLM. Now, I'm
02:28:27.940 | going to, one, use that tool, but I'm also going to define my
02:28:31.700 | own tool, which is to get the current location based on the
02:28:35.380 | IP address. Now, this is we're in Colab at the moment. So,
02:28:37.860 | it's actually going to get the IP address for the Colab
02:28:40.340 | instance that I'm currently on, and we'll find out where that
02:28:43.380 | is. So, that is going to get the IP address, and then it's
02:28:47.620 | going to provide the data back to our LLM in this format here.
02:28:50.820 | So, that will be latitude, longitude, city, and
02:28:53.060 | country. Okay? We're also going to get the current date and
02:28:56.660 | time.
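The tool setup being described looks roughly like this (a sketch; the geo-IP endpoint and exact return format are illustrative choices, not necessarily what the notebook uses):

```python
from datetime import datetime

import requests
from langchain.agents import load_tools
from langchain_core.tools import tool

# Pre-built SerpAPI search tool (requires a SerpAPI key to be configured).
tools = load_tools(["serpapi"], llm=llm)

@tool
def get_current_datetime() -> str:
    """Return the current date and time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

@tool
def get_location_from_ip() -> str:
    """Get the latitude, longitude, city, and country for the current IP address."""
    data = requests.get("https://ipinfo.io/json").json()  # illustrative geo-IP service
    lat, lon = data["loc"].split(",")
    return (
        f"latitude: {lat}, longitude: {lon}, "
        f"city: {data['city']}, country: {data['country']}"
    )

tools = tools + [get_current_datetime, get_location_from_ip]
```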
02:29:02.500 | So, now, we're going to redefine our prompt. I'm not going to include chat history here; I just want this to be
02:29:04.820 | like a one-shot thing. I'm going to redefine our agent and
02:29:09.300 | agent executor using our new tools which is our SEB API plus
02:29:13.780 | the get current date time and get location from IP. Then,
02:29:17.780 | I'm going to invoke our agent executor with I have a few
02:29:20.900 | questions. What is the date and time right now? How is the
02:29:23.780 | weather where I am? And please give me degrees in Celsius. So,
02:29:28.740 | when it gives me that weather. Okay and let's see what we get.
02:29:33.780 | Okay. So, apparently, we're in Council Bluffs in the US. It is
02:29:40.680 | 13 degrees Fahrenheit which I think is absolutely freezing.
02:29:44.440 | Oh my gosh, it is. Yes, minus ten. So, it's super cold over
02:29:48.760 | there. And you can see that, okay, it did give us
02:29:53.000 | Fahrenheit. That is because the tool that we're
02:29:55.320 | using provided us with Fahrenheit, which is fine, but it
02:29:59.960 | did translate that over into an estimate of Celsius for us,
02:30:03.800 | which is pretty cool. So, let's actually output that. So, we
02:30:07.640 | get this, which is correct, with the US location, approximately this,
02:30:13.640 | and we also get a description of the conditions: partly
02:30:17.240 | cloudy with 0% precipitation, lucky for them, and humidity of
02:30:23.720 | 66%. Okay. All pretty cool. So, that is it for this
02:30:27.800 | introduction to LangChain agents. As I mentioned, next
02:30:31.080 | chapter, we're going to dive much deeper into agents and
02:30:34.120 | also implement that for LangChain version 0.3. So,
02:30:37.880 | we'll leave this chapter here and jump into the next one. In
02:30:41.320 | this chapter, we're going to be taking a deep dive into agents
02:30:45.800 | with LangChain, and we're going to be covering what an
02:30:50.840 | agent is. We're going to talk a little bit conceptually about
02:30:55.640 | agents, the ReAct agent, and the type of agent that we're
02:30:59.320 | going to be building and based on that knowledge, we are
02:31:02.120 | actually going to build out our own agent execution logic
02:31:07.880 | which we refer to as the agent executor. So, in comparison to
02:31:12.680 | the previous video on agents in Langchain which is more of an
02:31:17.240 | introduction, this is far more detailed. We'll be getting into
02:31:21.480 | the weeds a lot more with both what agents are and also agents
02:31:26.200 | within Langchain. Now, when we talk about agents, a
02:31:30.280 | significant part of the agent is actually relatively simple
02:31:36.520 | code logic that iteratively runs LLM calls and processes
02:31:44.040 | their outputs, potentially running or executing tools. The
02:31:48.760 | exact logic for each approach to building an agent will
02:31:53.400 | actually vary pretty significantly, but we'll focus
02:31:57.560 | on one of those, which is the ReAct agent. Now, ReAct is a
02:32:03.160 | very common pattern, and although it is relatively old
02:32:07.560 | now, most of the tool agents that we see used by OpenAI and
02:32:13.320 | essentially every LLM company all use a very similar
02:32:17.240 | pattern. Now, the ReAct agent follows a pattern like this.
02:32:20.920 | Okay, so we would have our user input up here. Okay, so our
02:32:26.760 | input here is a question, right? "Aside from the Apple
02:32:29.160 | Remote, what other device can control the program the Apple
02:32:31.720 | Remote was originally designed to interact with?" Now, probably
02:32:35.400 | most LLMs would actually be able to answer this directly
02:32:37.640 | now. This is from the paper, which was a few years back. Now,
02:32:42.600 | in this scenario, assuming our LLM didn't already know the
02:32:46.360 | answer, there are multiple steps an LLM or an agent might
02:32:50.280 | take in order to find out the answer. Okay, so first of
02:32:55.000 | those is we say our question here is what other device can
02:32:59.160 | control the program the Apple remote was originally designed
02:33:01.800 | to interact with. So the first thing is, okay, what was the
02:33:05.240 | program that the Apple remote was originally designed to
02:33:07.800 | interact with? That's the first question we have here. So what
02:33:12.360 | we do is I need to search Apple remote and find a program
02:33:15.240 | that's useful. This is a reasoning step. So the LLM is
02:33:18.840 | reasoning about what it needs to do. I need to search for
02:33:22.040 | that and find a program that's useful. So we are taking an
02:33:26.200 | action. This is a tool call here. Okay, so we're going to
02:33:29.480 | use the search tool and our query will be Apple remote and
02:33:33.000 | the observation is the response we get from executing that
02:33:36.120 | tool. Okay, so the response here will be the Apple remote
02:33:39.000 | is designed to control the Front Row media center. So now
02:33:43.320 | we know the program Apple remote was originally designed
02:33:45.720 | to interact with. Now we're going to go through another
02:33:49.480 | iteration. Okay, so this is one iteration of our reasoning
02:33:55.160 | action and observation. So when we're talking about react
02:33:59.960 | here, although again, this sort of pattern is very common
02:34:03.640 | across many agents when we're talking about react, the name
02:34:07.880 | actually is reasoning or the first two characters of
02:34:12.360 | reasoning followed by action. Okay, so that's where the react
02:34:17.080 | comes from. So this is one of our react agent loops or
02:34:21.400 | iterations. We're going to go and do another one. So next
02:34:25.000 | step we have this information. The LLM is now provided with
02:34:27.640 | this information. Now we want to do a search for front row.
02:34:31.800 | Okay, so we do that. This is the reasoning step. We perform
02:34:35.960 | the action search front row. Okay, tool search query front
02:34:40.680 | row observation. This is the response front row is controlled
02:34:44.600 | by an Apple remote or keyboard function keys. Alright, cool.
02:34:50.120 | So we know keyboard function keys are the other device that
02:34:53.880 | we were asking about up here. So now we have all the
02:34:58.600 | information we need. We can provide an answer to our user.
02:35:02.760 | So we go through another iteration here reasoning and
02:35:07.240 | action. Our reasoning is I can now provide the answer of
02:35:11.400 | keyboard function keys to the user. Okay, great. So then we
02:35:16.440 | use the answer tool. It's like final answer in more common
02:35:21.960 | tool agent use and the answer would be keyboard function
02:35:27.000 | keys, which we then output to our user. Okay, so that is the
02:35:33.720 | react loop. Okay, so looking at this. Where are we actually
02:35:40.020 | calling an LLM and in what way are we actually calling an LLM?
02:35:44.820 | So we have our reasoning step. Our LLM is generating the text
02:35:50.900 | here, right? So LLM is generating. Okay. What should I
02:35:53.700 | do then? Our LLM is going to generate the input parameters
02:35:59.620 | to our action step here. Then those input parameters and
02:36:05.460 | the tool being used will be taken by our code logic, our
02:36:08.580 | agent executor logic, and they will be used to execute some
02:36:11.940 | code in which we will get an output. That output might be
02:36:16.180 | taken directly to our observation or our LLM might
02:36:19.460 | take that output and then generate an observation based
02:36:22.500 | on that. It depends on how you've implemented everything.
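For reference, here is a purely illustrative, self-contained sketch of that reasoning, action, observation loop with the LLM and the tools stubbed out. None of this is LangChain code; the fake_llm and tools names are made up for illustration only.

```python
from typing import Callable

def fake_llm(scratchpad: list[str]) -> dict:
    """Stand-in for a real LLM call: pick a tool or give a final answer."""
    if not scratchpad:
        return {"reasoning": "I need to search for the Apple remote.",
                "tool": "search", "args": {"query": "Apple remote"}}
    return {"reasoning": "I can now answer the question.",
            "tool": "final_answer",
            "args": {"answer": "keyboard function keys"}}

# Stand-in tools; a real agent would call a search API here.
tools: dict[str, Callable] = {
    "search": lambda query: (
        "Front Row is controlled by an Apple remote or keyboard function keys."
    ),
}

scratchpad: list[str] = []
while True:
    step = fake_llm(scratchpad)            # reasoning + action
    if step["tool"] == "final_answer":     # stop once we have an answer
        print(step["args"]["answer"])
        break
    observation = tools[step["tool"]](**step["args"])  # execute the tool
    scratchpad.append(f"{step['reasoning']} -> {observation}")
```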
02:36:27.380 | So our LLM could potentially be being used at every single
02:36:32.660 | step there and of course that will repeat through every
02:36:37.860 | iteration. So we have further iterations down here. So you're
02:36:41.540 | potentially using an LLM multiple times throughout this
02:36:44.740 | whole process, which of course in terms of latency and token
02:36:48.020 | cost, it does mean that you're going to be paying more for an
02:36:52.100 | agent than you are with just a standard LLM, but that is of
02:36:55.940 | course expected because you have all of these different
02:36:58.740 | things going on. But the idea is that what you can get out of
02:37:02.820 | an agent is of course much better than what you can get
02:37:05.780 | out of an LLM alone. So when we're looking at all of this,
02:37:11.060 | all of this iterative chain of thought and tool use, all this
02:37:16.260 | needs to be controlled by what we call the agent executor,
02:37:19.380 | which is our code logic, which is hitting our LLM, processing
02:37:23.380 | its outputs, and repeating that process until we get to our
02:37:27.060 | answer. So breaking that part down, what does it actually
02:37:30.900 | look like? It looks kind of like this. So we have our user
02:37:34.900 | input goes into our LLM, okay, and then we move on to the
02:37:39.540 | reasoning and action steps. Is the action the answer? If it is
02:37:44.500 | the answer, so as we saw here, where is the answer? If the
02:37:50.660 | action is the answer, so true, we would just go straight to
02:37:54.180 | our outputs. Otherwise, we're going to use our selected tool.
02:37:57.620 | Agent executor is going to handle all this. It's going to
02:38:00.980 | execute our tool, and then from that, we get our three
02:38:05.460 | reasoning, action, observation, inputs, and outputs, and then
02:38:09.300 | we're feeding all that information back into our LLM,
02:38:11.940 | okay? In which case, we go back through that loop. So we
02:38:15.860 | could be looping for a little while until we get to that
02:38:19.060 | final output. Okay, so let's go across to the code. We're going
02:38:23.620 | to be going into the agent executor notebook. We'll open
02:38:26.580 | that up in Colab, and we'll go ahead and just install our
02:38:30.500 | prerequisites. Nothing different here. It's just
02:38:34.820 | LangChain, LangSmith optionally, as before. Again,
02:38:38.980 | optionally, a LangChain API key if you do want to use
02:38:41.540 | Langsmith. Okay, and then we'll come down to our first
02:38:47.060 | section, where it's going to define a few quick tools. I'm
02:38:51.220 | not necessarily going to go through these because we've
02:38:54.660 | already covered them in the agent introduction, but very
02:38:58.580 | quickly, from LangChain core tools, we're just importing this tool
02:39:02.180 | decorator, which transforms each of our functions here into
02:39:06.820 | what we would call a structured tool object. This
02:39:10.740 | thing here. Okay, which we can see. Let's just have a quick
02:39:14.660 | look here, and then if we want to, we can extract all of the
02:39:18.820 | sort of key information from that structured tool using
02:39:21.860 | these parameters here or attributes. So name,
02:39:24.180 | description, and args schema, model JSON schema, which give us
02:39:28.740 | essentially how the LLM should use our function. Okay, so I'm
02:39:34.900 | going to keep pushing through that. Now, very quickly again,
02:39:40.660 | we did cover this in the intro video, so I don't want to
02:39:44.420 | necessarily go over it again in too much detail, but our
02:39:48.580 | agent executor logic is going to need this part. So we're
02:39:52.660 | going to be getting a string from our LLM. We're going to be
02:39:55.780 | loading that into a dictionary object, and we're going to be
02:39:59.060 | using that to actually execute our tool as we do here using
02:40:02.980 | keyword arguments. Okay, like that. Okay, so with the tools
02:40:09.620 | out of the way, let's take a look at how we create our
02:40:12.340 | agent. So when I say agent here, I'm specifically talking
02:40:16.820 | about the part that is generating our reasoning step,
02:40:21.460 | then generating which tool and what the input parameters to
02:40:27.140 | that tool will be. Then the rest of that is not actually
02:40:30.340 | covered by the agent. Okay, the rest of that would be covered
02:40:33.380 | by the agent execution logic, which would be taking the tool
02:40:37.140 | to be used, the parameters, executing the tool, getting
02:40:41.220 | the response, aka the observation, and then iterating
02:40:45.060 | through that until the LLM is satisfied and we have enough
02:40:47.940 | information to answer a question. So looking at that,
02:40:52.740 | our agent will look something like this. It's pretty simple.
02:40:56.020 | So we have our input parameters, including the chat
02:40:58.500 | history and the user query, and actually would also have
02:41:04.900 | any intermediate steps that have happened in here as well. We
02:41:08.500 | have our prompt template, and then we have our LLM binded
02:41:12.340 | with tools. So let's see how all this would look starting
02:41:16.500 | with, we'll define our prompt template. So it's going to look
02:41:20.340 | like this. We have our system message, you're a helpful
02:41:24.340 | assistant. When answering a user's questions, you should use one
02:41:26.900 | of the tools provided. After using a tool, the tool output will be provided
02:41:29.380 | in the scratchpad below, okay, which we're naming here. If you
02:41:33.860 | have an answer in the scratchpad, you should not use any
02:41:36.580 | more tools and instead answer directly to the user. Okay, so
02:41:40.420 | we have that as our system message. We could obviously
02:41:43.300 | modify that based on what we're actually doing. Then following
02:41:47.620 | our system message, we're going to have our chat history, so any
02:41:50.420 | previous interactions between the user and the AI. Then we
02:41:54.180 | have our current message from the user, okay, which will be
02:41:57.860 | fed into the input field there. And then following this, we
02:42:01.780 | have our agent's scratch pad or the intermediate thoughts. So
02:42:05.140 | this is where things like the LLM deciding, okay, this is what
02:42:09.540 | I need to do. This is how I'm going to do it, aka the tool
02:42:12.900 | call. And this is the observation. That's where all
02:42:16.020 | of that information will be going, right? So each of those
02:42:18.980 | you want to pass in as a message, okay? And the way that
02:42:23.380 | will look is that any tool call generation from the LLM, so
02:42:28.020 | when the LLM is saying, use this tool, please, that will be
02:42:31.780 | an AI message. And then the responses from our tool, so the
02:42:37.140 | observations, they will be returned as tool messages.
02:42:42.180 | Great. So we'll run that to define our prompt template.
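As a rough sketch, a prompt template along the lines described here might be built like this; the exact system prompt wording comes from this walkthrough and may differ slightly from the notebook.

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You're a helpful assistant. When answering a user's question you "
        "should first use one of the tools provided. After using a tool the "
        "tool output will be provided in the scratchpad below. If you have "
        "an answer in the scratchpad you should not use any more tools and "
        "instead answer directly to the user."
    )),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])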
02:42:46.180 | We're going to define our LLM. So we're going to be using
02:42:49.700 | GPT-4o mini with a temperature of zero because we
02:42:54.100 | want less creativity here, particularly when we're doing
02:42:56.820 | tool calling. There's just no need for us to use a high
02:43:00.500 | temperature here. So we need to enter our OpenAI API key, which
02:43:03.780 | we would get from platform.openai.com. We enter this,
02:43:08.100 | then we're going to continue and we're just going to add
02:43:11.140 | tools to our LLM here, okay? These, and we're going to bind
02:43:18.180 | them here. Then we have tool choice any. So tool choice any,
02:43:23.060 | we'll see in a moment, I'll go through this a little bit more
02:43:25.860 | in a second, but that's going to essentially force a tool
02:43:29.540 | call. And you can also put required, which is actually a
02:43:32.420 | bit more, it's a bit clearer, but I'm using any here, so I'll
02:43:36.500 | stick with it. So these are our tools we're going through. We
02:43:40.100 | have our inputs into the agent runnable. We have our prompt
02:43:44.980 | template and then that will get fed into our LLM. So let's run
02:43:49.140 | that. Now we would invoke the agent part of everything here
02:43:54.100 | with this. Okay, so let's see what it outputs. This is
02:43:56.820 | important. So I'm asking, what is 10 + 10? Obviously that should
02:44:00.420 | use the addition tool and we can actually see that happening.
02:44:03.620 | So the agent message content is actually empty here. This is
02:44:07.940 | where you'd usually get an answer, but if we go and have a
02:44:11.380 | look, we have additional keyword args. In there we have
02:44:14.580 | tool calls and then we have function arguments. Okay, so
02:44:19.060 | we're calling a function. Arguments for that function are
02:44:22.020 | this. Okay, so we can see this is string. Again, the way that
02:44:26.580 | we would parse that is we do JSON loads and that becomes
02:44:29.620 | dictionary and then we can see which function is being called
02:44:32.740 | and it is the add function and that is all we need in order to
02:44:36.420 | actually execute our function or our tool. Okay, we can see
02:44:42.740 | it in a lot more detail here. Now, what do we do from here?
02:44:47.780 | We're going to map the tool name to the tool function and
02:44:50.660 | then we're just going to execute the tool function with
02:44:52.580 | the generated args, i.e. those. I'll also just point out
02:44:57.380 | quickly that here we are getting the dictionary
02:45:00.100 | directly, which I think is coming from somewhere else in
02:45:02.820 | this, which is here. Okay, so even that step
02:45:08.820 | here where we're parsing this out, we don't necessarily need
02:45:14.580 | to do that because I think on the LangChain side, they're
02:45:14.580 | doing it for us. So we're already getting that. So JSON
02:45:19.540 | loads we don't necessarily need here. Okay, so we're just
02:45:22.900 | creating this tool name to function mapping dictionary
02:45:26.660 | here. So we're taking the tool names and we're just
02:45:30.420 | mapping those back to our tool functions and this is coming
02:45:33.140 | from our tools list. So that tools list that we defined
02:45:36.820 | here. Okay, and we can even just see quickly that will
02:45:41.140 | include everything or each of the tools we define there.
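As a reference, a minimal sketch of that kind of tool definition, name-to-function mapping, and tool execution might look like this; the exact tools in the notebook may differ.

```python
import json
from langchain_core.tools import tool

# @tool wraps each function in a StructuredTool object with a name,
# description, and args schema that the LLM can read.
@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y' together."""
    return x + y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y' together."""
    return x * y

tools = [add, multiply]

# Map each tool's name back to its underlying function so we can execute
# whatever the LLM asks for.
name2tool = {t.name: t.func for t in tools}

# The LLM returns tool arguments as a JSON string, e.g. '{"x": 10, "y": 10}'.
args = json.loads('{"x": 10, "y": 10}')
print(name2tool["add"](**args))  # 20.0
```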
02:45:44.820 | Okay, that's all it is. Now, we're going to execute using
02:45:49.860 | our name to tool mapping. Okay, so this here will get us the
02:45:54.660 | function. So we'll get us this function and then to that
02:45:58.580 | function, we're going to pass the arguments that we
02:46:02.420 | generated. Okay. Let's see what it looks like. Alright, so the
02:46:08.180 | response to the observation is twenty. Now, we are going to
02:46:14.180 | feed that back into our LLM using the tool message and
02:46:19.140 | we're actually going to put a little bit of text around this
02:46:21.540 | to make it a little bit nicer. We don't necessarily need to
02:46:24.420 | do this to be completely honest. We could just return
02:46:29.220 | the answer directly and I don't even think
02:46:33.220 | there would really be any difference. So, we could do
02:46:36.980 | either. In some cases, that could be very useful. In other
02:46:40.020 | cases, like here, it doesn't really make too much
02:46:42.340 | difference, particularly because we have this tool call
02:46:44.980 | ID and what this tool call ID is doing is it's being used by
02:46:48.660 | OpenAI. It's being read by the LLM so that the LLM knows that
02:46:54.180 | the response we got here is actually mapped back to the
02:46:59.940 | tool execution that it's identified here because you see
02:47:04.020 | that we have this ID. Alright, we have an ID here. The LLM is
02:47:08.020 | going to see the ID. It's going to see the ID that we pass back
02:47:12.340 | in here and it's going to see those two are connected. So,
02:47:14.900 | you can see, okay, this is the tool I called and this is a
02:47:17.540 | response I got from it. Because of that, you don't necessarily
02:47:20.740 | need to say which tool you used here. You can. It depends on
02:47:25.620 | what you're doing. Okay. So, what do we get here? We have,
02:47:32.580 | okay, just running everything again. We've added our tool
02:47:35.780 | call. So, that's the original AI message that includes, okay,
02:47:39.060 | use that tool and then we have the tool execution, tool
02:47:41.940 | message, which is the observation. We map those to
02:47:46.500 | the agent scratchpad and then what do we get? We have an AI
02:47:49.540 | message but the content is empty again, which is
02:47:52.420 | interesting because we said to our LLM up here, if you have an
02:47:57.940 | answer in the scratchpad, you should not use any more tools
02:48:01.140 | and instead answer directly to the user. So, why is our LLM
02:48:07.860 | not answering? Well, the reason for that is down here, we
02:48:13.620 | specify tool choice equals any, which again, it's the same as
02:48:19.060 | tool choice required, which is telling the LLM that it cannot
02:48:24.180 | actually answer directly. It has to use a tool and I usually
02:48:28.900 | do this, right? I would usually put tool choice equals any or
02:48:32.180 | required and force the LLM to use a tool every single time.
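For reference, this is roughly how that agent runnable might be put together, as a sketch assuming the prompt and tools from earlier; switching tool_choice between "any" (or "required") and "auto" is exactly the choice being discussed here.

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# "any"/"required" forces a tool call on every generation; "auto" lets the
# model choose between calling a tool and answering via the content field.
agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")
)
```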
02:48:37.780 | So, then the question is, if it has to use a tool every time,
02:48:41.220 | how does it answer our user? Well, we'll see in a moment.
02:48:47.220 | First, I just want to show you the two options essentially
02:48:51.380 | that we have. The second is what I would usually use but
02:48:53.700 | let's start with the first. So, the first option is that we
02:48:57.700 | set tool choice equal to auto and this tells the LLM that it
02:49:01.540 | can either use a tool or it can answer the user directly using
02:49:06.580 | the final answer or using that content field. So, if we run
02:49:11.460 | that, like we're specifying tool choice as auto, we run
02:49:14.740 | that, let's invoke, okay? Initially, you see, ah, wait,
02:49:20.100 | there's still no content. That's because we didn't add
02:49:23.140 | anything into the agent scratch pad here. There's no
02:49:25.460 | information, right? It's all empty. Actually, it's empty
02:49:30.260 | because, sorry, so here, you have the chat history that's
02:49:32.820 | empty. We didn't specify the agent scratch pad and the
02:49:38.260 | reason that we can do that is because we're using, if you
02:49:40.340 | look here, we're using get. So, essentially, it's saying,
02:49:43.700 | try and get agent scratch pad from this dictionary but if it
02:49:46.420 | hasn't been provided, we're just going to give an empty
02:49:49.300 | list. So, that's why we don't need to specify it
02:49:52.820 | here. But that means that, oh, okay, the agent doesn't
02:49:56.980 | actually know anything here. It hasn't used the tool yet. So,
02:50:01.300 | we're going to just go through our iteration again, right? So,
02:50:04.020 | we're going to get our tool output. We're going to use that
02:50:07.300 | to create the tool message and then we're going to add our
02:50:11.380 | tool call from the AI and the observation. We're going to
02:50:15.620 | pass those to the agent scratch pad and this time, we'll see.
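A sketch of what one of those iterations might look like in code, assuming the agent runnable and name2tool mapping from above.

```python
from langchain_core.messages import ToolMessage

out = agent.invoke({"input": "What is 10 + 10?", "chat_history": []})
tool_call = out.tool_calls[0]                       # {"name", "args", "id"}
observation = name2tool[tool_call["name"]](**tool_call["args"])

tool_msg = ToolMessage(
    content=f"The {tool_call['name']} tool returned {observation}",
    tool_call_id=tool_call["id"],    # lets the LLM map this result to its call
)

# Feed the AI tool-call message plus the observation back in via the agent
# scratchpad, then invoke the agent again for the next step.
out = agent.invoke({
    "input": "What is 10 + 10?",
    "chat_history": [],
    "agent_scratchpad": [out, tool_msg],
})
print(out.content)
```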
02:50:19.700 | We run that. Okay, now, we get the content, okay? So, now, it's
02:50:24.980 | not calling. You see here, there's no tool call or
02:50:27.460 | anything going on. We just get content. So, that is, this is a
02:50:34.260 | standard way of doing or building a tool calling agent.
02:50:38.420 | The other option which I mentioned, this is what I
02:50:40.740 | usually go with. So, number two here, I would usually create a
02:50:45.700 | final answer tool. So, why would we even do that? Why would we
02:50:53.140 | create a final answer tool rather than just, you know, using this
02:50:55.380 | method, which actually works perfectly fine. So, why
02:50:59.140 | would we not just use this? There are a few reasons. The
02:51:03.060 | main ones are that with option two where we're forcing tool
02:51:07.620 | calling, this removes the possibility of the agent using
02:51:11.940 | that content field directly and the reason, at least, the
02:51:16.740 | reason I found this good when building agents in the past is
02:51:19.620 | that occasionally, when you do want to use a tool, it's
02:51:22.660 | actually going to go with the content field and it can get
02:51:25.860 | quite annoying and use the content field quite frequently
02:51:29.380 | when you actually do want it to be using one of the tools and
02:51:34.100 | this is particularly noticeable with smaller models. With
02:51:39.380 | bigger models, it's not as common although it does still
02:51:42.740 | happen. Now, the second thing that I quite like about using a
02:51:47.060 | tool as your final answer is that you can enforce a
02:51:52.740 | structured output in your answer. So, this is something
02:51:55.460 | we saw in, I think, the first, yes, the first LangChain
02:52:00.100 | example where we were using the structured output tool of
02:52:05.060 | LangChain and what that actually is, the structured
02:52:08.260 | output feature of LangChain, it's actually just a tool call,
02:52:11.700 | right? So, it's forcing a tool call from your LLM. It's just
02:52:15.060 | abstracted away so you don't realize that that's what it's
02:52:17.220 | doing but that is what it's doing. So, I find that
02:52:22.020 | structured outputs are very useful particularly when you
02:52:25.940 | have a lot of code around your agent. So, when that output
02:52:30.420 | needs to go downstream into some logic, that can be very
02:52:35.780 | useful because you can, you have a reliable output format
02:52:40.420 | that you know is going to be output and it's also incredibly
02:52:43.860 | useful if you have multiple outputs or multiple fields that
02:52:47.860 | you need to generate for. So, those can be very useful. Now,
02:52:53.780 | to implement this, so to implement option two, we need
02:52:56.500 | to create a final answer tool. As with our other tools,
02:53:02.020 | we're actually going to provide a description, and you can or
02:53:05.860 | cannot do this. So, you can also just return
02:53:10.260 | none and actually just use the generated action as
02:53:16.340 | essentially what you're going to send out of your agent
02:53:19.700 | execution logic or you can actually just execute the tool
02:53:23.700 | and just pass that information directly through. Perhaps, in
02:53:27.220 | some cases, you might have some additional post processing for
02:53:30.740 | your final answer. Maybe you do some checks to make sure it
02:53:33.220 | hasn't said anything weird. You could add that in this tool
02:53:37.300 | here but yeah, in this case, we're just going to pass those
02:53:41.060 | through directly.
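A minimal sketch of a final answer tool along those lines; the answer and tools_used field names follow this walkthrough, and the reuse of the earlier tools, prompt, and llm is only indicated in comments.

```python
from langchain_core.tools import tool

@tool
def final_answer(answer: str, tools_used: list[str]) -> str:
    """Use this to provide your final answer to the user. The 'tools_used'
    field should list the names of the tools you used."""
    # Here we simply pass the generated arguments straight through; any
    # post-processing of the final answer could be added in this function.
    return f"{answer}\n\nTools used: {tools_used}"

# Assuming the tools, name2tool mapping, prompt and llm from earlier:
# name2tool[final_answer.name] = final_answer.func
# agent = prompt | llm.bind_tools(tools + [final_answer], tool_choice="any")
```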
02:53:48.820 | So, let's run this. We've added the final answer tool to our named
02:53:51.460 | tool mapping, so our agent can now use it. We redefine our
02:53:56.100 | agent, setting tool choice to any because we're forcing the
02:53:59.460 | tool choice here and let's go with what is ten plus ten. See
02:54:04.180 | what happens. Okay, we get this, right? We can also, one
02:54:08.900 | thing, nice thing here is that we don't need to check is our
02:54:11.460 | output in the content field or is it in the tool calls field?
02:54:14.500 | We know it's going to be in the tool calls field because
02:54:16.500 | we're forcing that tool use which is quite nice. So, okay,
02:54:19.860 | we know we're using the add tool and these are the
02:54:22.500 | arguments. Great. We go through that process again.
02:54:27.380 | We're going to create our tool message and then we're going to
02:54:30.260 | add those messages into our scratchpad or intermediate
02:54:33.460 | steps and then we can see again, ah, okay, the content field is
02:54:38.100 | empty. That is expected. We're forcing tool use, so there's no way
02:54:42.580 | that this can have anything inside it but then if we come
02:54:48.020 | down here to our tool calls, nice. Final answer, answer, ten
02:54:54.100 | plus ten equals twenty. Alright? We also have this.
02:54:58.820 | Tools used. Where is tools used coming from? Okay, well, I
02:55:01.620 | mentioned before that you can add additional things or
02:55:06.020 | outputs when you're using a tool for your final
02:55:09.700 | answer. So, if you just come up here to here, you can see that
02:55:14.820 | I asked the LLM to use that tools used field which I
02:55:18.980 | defined here. It's a list of strings. Use this to tell me
02:55:23.140 | what tools you use in your answer, right? So, I'm getting
02:55:26.260 | the normal answer but I'm also getting this information as
02:55:28.900 | well which is kind of nice. So, that's where that is coming
02:55:31.620 | from. See that? Okay. So, we have our actual answer here and
02:55:36.260 | then we just have some additional information, okay?
02:55:38.980 | We've also defined a type here. It's just a list of strings
02:55:41.620 | which is really nice. It's giving us a lot of control over
02:55:43.940 | what we're outputting which is perfect. That's, you know, when
02:55:46.580 | you're building with agents, the biggest problem in most
02:55:52.340 | cases is control of your LLM. So, here, we're getting a
02:55:58.100 | honestly pretty unbelievable amount of control over what our
02:56:02.740 | LLM is going to be doing which is perfect for when you're
02:56:07.060 | building in the real world. So, this is everything that we
02:56:12.580 | need. This is our answer and we would of course be passing
02:56:15.460 | that downstream into whatever logic our AI application would
02:56:22.020 | be using, okay? So, maybe that goes directly to a front end
02:56:26.020 | and we're displaying this as our answer and we're maybe
02:56:29.460 | providing some information about, okay, where did this
02:56:31.780 | answer come from or maybe there's some additional steps
02:56:34.980 | downstream where we're actually doing some more processing or
02:56:39.060 | transformations but yeah, we have that. That's great. Now,
02:56:43.540 | everything we've just done here, we've been executing
02:56:45.940 | everything one by one and that's to help us understand
02:56:50.980 | what process we go through when we're building an agent
02:56:55.220 | executor. But we're not going to want to do that all the time,
02:57:00.500 | are we? Most of the time, we probably want to abstract all
02:57:04.180 | this away and that's what we're going to do now. So, we're
02:57:07.860 | going to build essentially everything we've just taken.
02:57:11.140 | We're going to abstract that and abstract it away into a
02:57:15.220 | custom agent executor class. So, let's have a quick look at
02:57:20.020 | what we're doing here. Although it's literally just
02:57:22.340 | what we just did, okay? So, custom agent executor. We
02:57:27.860 | initialize it. We set this max iterations. I'll talk about
02:57:31.060 | this in a moment. We initialize it. That is going to set our
02:57:34.820 | chat history to just being empty. Okay, good. So, it's a
02:57:38.980 | new agent. There should be no chat history in this case. Then
02:57:42.180 | we actually define our agent, right? So, that part of logic
02:57:45.380 | that is going to be taking our inputs and generating what to
02:57:48.900 | do next aka what tool call to do, okay? And we set everything
02:57:53.460 | as attributes of our class and then we're going to define an
02:57:58.020 | invoke method. This invoke method is going to take an
02:58:02.420 | input which is just a string. So, it's going to be our
02:58:04.500 | message from the user and what it's going to do is it's going
02:58:09.460 | to iterate through essentially everything we just did, okay?
02:58:14.980 | Until we hit the final answer tool, okay? So, well,
02:58:18.820 | what does that mean? We have our tool call, right? Which is
02:58:23.780 | we're just invoking our agent, right? So, it's going to
02:58:26.980 | generate what tool to use and what parameters should go into
02:58:29.700 | that, okay? And that's an AI message. So, we would append
02:58:35.460 | that to our agent scratchpad and then we're going to use the
02:58:38.820 | information from our tool call. So, the name of the tool and
02:58:42.020 | the args and also the ID. We're going to use all of that
02:58:45.860 | information to execute our tool and then provide the
02:58:51.140 | observation back to our LLM, okay? So, execute our tool here.
02:58:55.860 | We then format the tool output into a tool message. See here
02:59:00.580 | that I'm just using the output directly. I'm not adding
02:59:03.620 | that additional information there. We do need to always
02:59:08.180 | pass in the tool call ID so that our LLM knows which output
02:59:12.900 | is mapped to which tool. I didn't mention this before in
02:59:16.580 | this video at least but that is that's important when we have
02:59:19.380 | multiple tool calls happening in parallel because that can
02:59:22.500 | happen. When we have multiple tool calls happening in
02:59:25.220 | parallel, let's say we have ten tool calls, all those
02:59:28.100 | responses might come back at different times. So, then the
02:59:31.380 | order of those can get messed up. So, we wouldn't necessarily
02:59:35.780 | always see that it's a AI message beginning a tool call
02:59:41.060 | followed by the answer to that tool call. Instead, it might be
02:59:44.900 | AI message followed by like ten different tool call responses.
02:59:49.620 | So, you need to have those IDs in there, okay? So, then we
02:59:54.260 | pass our tool output back to our Agent Scratchpad or
02:59:58.660 | intermediate steps. I'm putting a print in here so that we can
03:00:02.500 | see what's happening whilst everything is running. Then we
03:00:05.060 | increment this count number. We'll talk about that in a
03:00:08.580 | moment. So, coming past that, we say, okay, if the tool name
03:00:12.660 | here is final answer, that means we should stop, okay? So,
03:00:18.580 | once we get the final answer, that means we can actually
03:00:20.980 | extract our final answer from the final tool call, okay? And
03:00:25.940 | in this case, I'm going to say that we're going to extract the
03:00:31.220 | answer from the tool call or the observation. We're going to
03:00:35.300 | extract the answer that was generated. We're going to pass
03:00:38.260 | that into our chat history. So, we're going to have our user
03:00:41.860 | message. This is the one the user came up with followed by
03:00:45.380 | our answer which is just the natural answer field and that's
03:00:49.700 | simply an AI message. But then we're actually going to be
03:00:52.660 | including all of the information. So, this is the
03:00:55.780 | answer, the natural language answer and also the tools used
03:01:01.220 | output. We're going to be feeding all of that out to some
03:01:04.900 | downstream process as preferred. So, we have that. Now,
03:01:10.900 | one thing that can happen if we're not careful is that our
03:01:15.460 | agent executor may run many, many times and particularly if
03:01:20.660 | we've done something wrong in our logic because we're
03:01:23.140 | building these things, it can happen that maybe we've not
03:01:26.980 | connected the observation back up into our agent executor
03:01:32.260 | logic and in that case, what we might see is our agent
03:01:34.980 | executor runs again and again and again and I mean, that's
03:01:38.020 | fine. We're going to stop it but if we don't realize
03:01:42.020 | straight away and we're doing a lot of LLM calls that can get
03:01:44.980 | quite expensive quite quickly. So, what we can do is we can
03:01:49.060 | set a limit, right? So, that's what we've done up here with
03:01:51.220 | this max iterations. We said, okay, if we go past three max
03:01:54.740 | iterations by default, I'm going to say stop, alright? So,
03:01:58.660 | that's why we have the count here. While count is less than
03:02:02.820 | the max iterations, we're going to keep going. Once we hit the
03:02:06.820 | number of max iterations, we stop, okay? So, the while loop
03:02:09.860 | will just stop looping, okay? So, it just protects us in case
03:02:14.900 | of that and it also potentially maybe at some point, your agent
03:02:19.140 | might be doing too much to answer a question. So, this
03:02:22.260 | will force it to stop and just provide an answer. Although, if
03:02:25.860 | that does happen, I just realized there's a bit of a
03:02:28.980 | fault in the logic here. If that does happen, we wouldn't
03:02:31.940 | necessarily have the answer here, right? So, we'd probably
03:02:35.700 | want to handle that nicely but in this scenario, it's a very
03:02:40.260 | simple use case. We're not going to see that happening.
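Pulling the pieces together, a condensed sketch of the kind of custom agent executor class described above might look like this, assuming the agent runnable and name2tool mapping from earlier; it also keeps the limitation just mentioned, in that the final answer is only guaranteed if the loop ends on the final answer tool.

```python
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

class CustomAgentExecutor:
    def __init__(self, max_iterations: int = 3):
        self.chat_history: list = []
        self.max_iterations = max_iterations
        self.agent = agent  # the prompt | llm.bind_tools(...) runnable

    def invoke(self, input: str) -> dict:
        count = 0
        agent_scratchpad: list = []
        while count < self.max_iterations:
            # Reasoning + action: the agent decides which tool to call.
            tool_call = self.agent.invoke({
                "input": input,
                "chat_history": self.chat_history,
                "agent_scratchpad": agent_scratchpad,
            })
            agent_scratchpad.append(tool_call)
            # Execute the chosen tool and record the observation.
            call = tool_call.tool_calls[0]
            tool_out = name2tool[call["name"]](**call["args"])
            agent_scratchpad.append(
                ToolMessage(content=f"{tool_out}", tool_call_id=call["id"])
            )
            print(f"{count}: {call['name']}({call['args']})")
            count += 1
            if call["name"] == "final_answer":
                break
        # Note: this assumes the loop ended on the final_answer tool; a more
        # robust version would handle hitting the iteration limit gracefully.
        final_answer = call["args"]["answer"]
        self.chat_history.extend([
            HumanMessage(content=input),
            AIMessage(content=final_answer),
        ])
        return call["args"]  # e.g. {"answer": ..., "tools_used": [...]}

agent_executor = CustomAgentExecutor()
print(agent_executor.invoke("What is 10 + 10?"))
```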
03:02:44.260 | So, we initialize our custom agent executor and then we invoke it,
03:02:50.740 | okay? And let's see what happens. Alright, there we go.
03:02:54.340 | So, that just wrapped everything into a single invoke.
03:03:00.740 | So, everything is handled for us. We could modify that and say, okay, what is
03:03:05.220 | ten multiplied by 7.4, for example,
03:03:12.260 | and that will go through. We'll use the multiply tool instead
03:03:15.060 | and then we'll come back to the final answer again, okay? So,
03:03:18.420 | we can see that with this custom agent executor, we've
03:03:22.580 | built an agent and we have a lot more control over everything
03:03:27.060 | that is going on in here. One thing that we would probably
03:03:33.300 | need to add in this scenario is right now, I'm assuming that
03:03:36.500 | only one tool call will happen at once and it's also why I'm
03:03:39.460 | asking here. I'm not asking a complicated question because I
03:03:42.500 | don't want it to go and try and execute multiple tool calls at
03:03:46.340 | once which can happen. So, let's just try this. Okay. So,
03:03:52.660 | this is actually completely fine. So, this did just execute
03:03:55.620 | it one after the other. So, you can see that when asking this
03:04:00.500 | more complicated question, it first did the exponentiate tool
03:04:05.300 | followed by the add tool and then it actually gave us our
03:04:07.620 | final answer which is cool. Also told us we use both of
03:04:11.540 | those tools which it did but one thing that we should just
03:04:16.420 | be aware of is that from OpenAI, OpenAI can actually
03:04:20.420 | execute multiple tool calls in parallel. So, by specifying
03:04:24.980 | that we're just using index zero here, we're actually assuming
03:04:28.660 | that we're only ever going to be calling one tool at any one
03:04:32.420 | time which is not always going to be the case. So, you'd
03:04:35.140 | probably need to add a little bit of extra logic there in
03:04:37.380 | case of scenarios if you're building an agent that is
03:04:41.300 | likely to be running parallel tool calls. But yeah, you can
03:04:45.060 | see here actually it's completely fine. So, it's
03:04:47.620 | running one after the other. Okay. So, with that, we built
03:04:51.140 | our agent executor. I know there's a lot to that and of
03:04:55.860 | course, you can just use the very abstract agent executor
03:04:59.060 | in LangChain but I think it's very good to understand what is
03:05:03.140 | actually going on to build our own agent executor in this
03:05:06.420 | case and it sets you up nicely for building more complicated
03:05:10.500 | or use case specific agent logic as well. So, that is it
03:05:17.300 | for this chapter. In this chapter, we're going to be
03:05:20.180 | taking a look at LangChain Expression Language. We'll be
03:05:23.460 | looking at the runnables, the runnable serializable and parallel,
03:05:27.940 | the runnable passthrough and essentially how we
03:05:32.500 | use LCEL in its full capacity. Now, to do that well, what I
03:05:38.900 | want to do is actually start by looking at the traditional
03:05:42.820 | approach to building chains in line chain. So, to do that,
03:05:48.260 | we're going to go over to the LCEL chapter and open that
03:05:51.860 | up in Colab. Okay. So, let's come down. We'll do the
03:05:56.900 | prerequisites. As before, nothing major in here. The one
03:06:00.820 | thing that is new is docarray because later on, as you'll
03:06:04.180 | see, we're going to be using this as an example of the
03:06:08.980 | parallel capabilities in LCEL. If you want to use LangSmith,
03:06:13.620 | you just need to add in your LangChain API key. Okay. And
03:06:16.820 | then let's, okay. So, now, let's dive into the traditional
03:06:20.980 | approach to chains in LangChain. So, the LLMChain, I
03:06:27.540 | think it's probably one of the first things introduced in
03:06:30.420 | LangChain, if I'm not wrong. This takes a prompt and feeds
03:06:33.780 | it into an LLM and that's it. You can also, you can add
03:06:39.540 | like output parsing to that as well but that's optional. I
03:06:44.260 | don't think we're going to cover it here. So, what that
03:06:47.860 | might look like is we have, for example, this prompt
03:06:50.340 | template here. Give me a small report on topic. Okay. So,
03:06:54.420 | that would be our prompt template. We'd set up as we
03:06:57.860 | usually do with the prompt templates as we've seen
03:07:01.540 | before. We then define our LLM. We need our API key for
03:07:08.180 | this which as usual, we would get from platform.openai.com.
03:07:14.020 | Then, we go ahead. I'm just showing you that you can invoke
03:07:18.580 | the LLM there. Then, we go ahead actually define a output
03:07:23.460 | parser. So, we do do this, I wasn't sure we did, but we will
03:07:26.740 | then define our LLMChain like this. Okay. So, LLMChain, we
03:07:31.220 | have our prompt and our LLM and our output parser. Okay. This
03:07:36.740 | is the traditional approach. So, I would then say, okay,
03:07:42.660 | retrieval augmented generation and what it's going to do is
03:07:44.820 | it's going to give me a little report back on RAG. Okay.
03:07:49.620 | It takes a moment but you can see that that's what we get
03:07:51.940 | here. We can format that nicely as we usually do and we get,
03:07:57.780 | okay, look, we get a nice little report. However, the
03:08:01.620 | LLMChain is, one, quite restrictive, right? We have to
03:08:05.380 | have like particular parameters that have been predefined as
03:08:09.220 | being usable which is, you know, restrictive and it's also
03:08:13.060 | been deprecated. So, you know, this isn't the standard way of
03:08:17.620 | doing this anymore but we can still use it. However, the
03:08:21.700 | preferred method to building this and building anything else
03:08:25.140 | really or chains in general in LangChain is using LCEL, right?
03:08:29.540 | And it's super simple, right? So, we just actually take the
03:08:32.100 | prompt LLM and output parser that we had before and then we
03:08:35.060 | just chain them together with these pipe operators. So, the
03:08:38.420 | pipe operator here is saying, take what is output from here
03:08:41.860 | and input it into here. Take what is output from here and
03:08:45.380 | put it into here. That's all it does. It's super simple. So,
03:08:49.700 | put those together and we invoke it in the same way and
03:08:52.820 | we'll get the same output, okay? And that's what we get.
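As a rough sketch, the LCEL version of this chain looks something like the following; the exact prompt wording and output parser in the notebook may differ slightly.

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Give me a small report on {topic}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
output_parser = StrOutputParser()

# Each pipe feeds the output of the left component into the right one.
lcel_chain = prompt | llm | output_parser
report = lcel_chain.invoke({"topic": "retrieval augmented generation"})
print(report)
```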
03:08:58.500 | There is actually a slight difference on what we're
03:09:01.220 | getting out from there. You can see here we got actually a
03:09:04.500 | dictionary but that is pretty much the same, okay? So, we get
03:09:09.460 | that and as before, we can display that in Markdown with
03:09:14.260 | this, okay? So, we saw just now that we have this pipe
03:09:18.100 | operator here. It's not really standard Python syntax to use
03:09:26.260 | this or at least it's definitely not common. It's an
03:09:29.940 | aberration of the intended use of Python, I think. But anyway,
03:09:35.380 | it does, it looks cool and when you understand it, I kinda get
03:09:41.460 | why they do it because it does make things quite simple in
03:09:44.260 | comparison to what it could be otherwise. So, I kinda get it.
03:09:47.860 | It's a little bit weird but it's what they're doing and I'm
03:09:51.060 | teaching it, so that's what we're going to learn. So,
03:09:55.780 | what is that pipe operator actually doing? Well, it's as I
03:10:04.020 | mentioned, it's taking the output from this, putting it as
03:10:06.340 | input into what is ever on the right but how does that
03:10:10.260 | actually work? Well, let's actually implement it
03:10:14.580 | ourselves without line chain. So, we're going to create this
03:10:17.380 | class called Runnable. This class, when we initialize it,
03:10:20.580 | it's going to take a function, okay? So, this is literally a
03:10:23.460 | Python function. It's going to take that and it's going to
03:10:28.180 | essentially turn it into what we would call a Runnable in
03:10:31.780 | line chain and what does that actually mean? Well, it doesn't
03:10:34.740 | really mean anything. It just means that when you run the
03:10:40.180 | invoke method on it, it's going to call that function in the
03:10:43.140 | way that you would have done otherwise, alright? So, using
03:10:46.340 | just function, you know, brackets, open, parameters,
03:10:50.100 | brackets, close. It's going to do that but it's also going to
03:10:53.460 | add this method, this or method, which is __or__ in
03:10:59.060 | typical Python syntax. Now, this or method is essentially
03:11:03.620 | going to take your Runnable function, the one that you
03:11:07.140 | initialize with and it's also going to take an other
03:11:10.900 | function, okay? This other function is actually going to
03:11:14.260 | be a Runnable, I believe. Yes, it's going to be a Runnable
03:11:17.860 | just like this and what it's going to do is it's going to
03:11:22.180 | run this Runnable based on the output of your current
03:11:28.020 | Runnable, okay? That's what this or method is going to do. Seems a
03:11:32.340 | bit weird maybe but I'll explain in a moment. We'll see why that works.
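A minimal sketch of a Runnable class like the one being described, where __or__ chains the wrapped functions together; the function names are assumptions for illustration.

```python
class Runnable:
    """Wrap a plain function so it can be chained with other Runnables."""
    def __init__(self, func):
        self.func = func

    def invoke(self, *args, **kwargs):
        # Calling invoke simply calls the wrapped function.
        return self.func(*args, **kwargs)

    def __or__(self, other):
        # `self | other`: feed our output into the other Runnable.
        def chained(*args, **kwargs):
            return other.invoke(self.invoke(*args, **kwargs))
        return Runnable(chained)

def add_five(x):
    return x + 5

def sub_five(x):
    return x - 5

def mul_five(x):
    return x * 5

# Chain them via the __or__ method explicitly (the pipe operator is a
# shortcut for exactly this, as discussed below).
chain = Runnable(add_five).__or__(Runnable(sub_five)).__or__(Runnable(mul_five))
print(chain.invoke(3))  # (3 + 5 - 5) * 5 = 15
```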
03:11:35.380 | So, I'm going to chain a few functions
03:11:39.540 | together using this or method. So, first, we're just
03:11:44.660 | going to turn them all into Runnables, okay? So, these are
03:11:47.620 | normal functions as you can see, normal Python functions.
03:11:50.660 | We then turn them into this Runnable using our Runnable
03:11:53.380 | class. Then, look what we can do, right? So, we're going to
03:11:59.460 | create a chain that is going to be our Runnable chained with
03:12:05.460 | another Runnable chained with another Runnable, okay? Let's
03:12:09.140 | see what happens. So, we're going to invoke that chain of
03:12:12.500 | Runnables with three. So, what is this going to do? Okay, we
03:12:17.540 | start with three. We're going to add five to three. So, we'll
03:12:21.220 | get eight. Then, we're going to subtract five from eight to
03:12:25.940 | give us three again and then we're going to multiply three
03:12:32.420 | by five to give us fifteen and we can invoke that and we get
03:12:37.860 | fifteen, okay? Pretty cool. So, that is interesting. How does
03:12:43.780 | that relate to the pipe operator? Well, that pipe
03:12:48.020 | operator in Python is actually a shortcut for the or method, __or__.
03:12:52.820 | So, what we just implemented is the pipe operator. So, we can
03:12:56.980 | actually run that now with the pipe operator here and we'll
03:13:00.660 | get the same. We'll get fifteen, right? So, that's that's
03:13:03.540 | what LangChain is doing. Like, under the hood, that is what
03:13:06.900 | that pipe operator is. It's just chaining together these
03:13:10.500 | multiple Runnables as we'd call them using their own internal
03:13:14.740 | or operator, okay? Which is cool. I will give them that.
03:13:19.140 | It's kind of a cool way of doing this. It's creative. I
03:13:22.340 | wouldn't have thought about it myself. So, yeah, that is a
03:13:27.620 | pipe operator. Then, we have these Runnable things, okay? So,
03:13:31.300 | this is different to the Runnable I just defined
03:13:34.020 | here. This is one we defined ourselves. It's not a
03:13:37.220 | LangChain thing. We didn't get this from LangChain. Instead,
03:13:42.180 | this RunnableLambda object here, that is actually exactly
03:13:48.100 | the same as what we just defined, alright? So, what we
03:13:50.740 | did here, this Runnable; the RunnableLambda is the same
03:13:57.140 | thing but in LangChain, okay? So, if we use that, okay? We
03:14:01.780 | use that to now define three Runnables from the functions
03:14:06.100 | that we defined earlier. We can actually chain those together
03:14:09.300 | now using the pipe operator. You could also chain
03:14:12.820 | them together if you want with the or operator, right? So, we
03:14:18.740 | could do what we did earlier. We can invoke that, okay? Or as
03:14:24.340 | we were doing originally, we choose pipe operator. Exactly
03:14:28.580 | the same. So, this RunnableLambda from LangChain is just
03:14:31.620 | what we just built with our Runnable class. Cool. So, we have
03:14:35.540 | that. Now, let's try and do something a little more
03:14:38.820 | interesting. We're going to generate a report and we're
03:14:40.740 | going to try and edit that report using this
03:14:43.140 | functionality, okay? So, give me a small report about topic,
03:14:47.140 | okay? We'll go through here. We're going to get our report
03:14:51.780 | on AI, okay? So, we have this. You can see that AI is
03:14:57.540 | mentioned many times in here. Then, we're going to take a
03:15:04.820 | very simple function, right? So, I'm just going to extract
03:15:07.700 | the fact. This is basically going to take, what is it, see, it's taking
03:15:12.260 | out the first part. Okay. So, we're actually trying to remove the
03:15:17.300 | introduction here. I'm not sure if this actually will work as
03:15:20.740 | expected but it's fine. We'll try it anyway but then more
03:15:27.620 | importantly, we're going to replace this word, okay? So,
03:15:30.500 | we're going to replace an old word with a new word. Our old
03:15:32.820 | word is going to be AI. Our new word is going to be Skynet,
03:15:35.700 | okay? So, we can wrap both of these functions as Runnable
03:15:40.820 | Lambdas, okay? We can add those as additional steps inside our
03:15:45.380 | entire chain, alright? So, we're going to extract, try and
03:15:48.900 | remove the introduction although I think it needs a bit
03:15:51.540 | more processing than just splitting here and then we're
03:15:55.060 | going to replace the word. We need that actually to be AI.
03:15:58.340 | Run that, run this.
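A sketch of those two post-processing steps wrapped as RunnableLambda objects; the exact split and replace logic in the notebook may differ, and lcel_chain is assumed to be the report-generating chain from earlier.

```python
from langchain_core.runnables import RunnableLambda

def extract_fact(x: str) -> str:
    # Naively drop everything before the first blank line (the introduction).
    return "\n".join(x.split("\n\n")[1:]) if "\n\n" in x else x

def replace_word(x: str) -> str:
    return x.replace("AI", "Skynet")

chain = lcel_chain | RunnableLambda(extract_fact) | RunnableLambda(replace_word)
print(chain.invoke({"topic": "AI"}))
```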
03:16:01.540 | Okay. So, now we get Artificial Intelligence Skynet refers to
03:16:07.200 | the simulation of human intelligence processes by
03:16:09.040 | machines and then we have narrow Skynet, weak Skynet, and
03:16:13.360 | strong Skynet. Applications of Skynet. Skynet technology is
03:16:17.600 | being applied in numerous fields including all these
03:16:19.760 | things. Scary. Despite its potential, Skynet poses several
03:16:24.800 | challenges. Systems can perpetuate existing biases. It
03:16:29.680 | raises significant privacy concerns. It can be exploited
03:16:34.160 | for malicious purposes, okay? So, we have all these, you know,
03:16:38.800 | it's just a silly little example. We can see also the
03:16:41.440 | introduction didn't work here. The reason for that is because
03:16:44.400 | our introduction includes multiple new lines here. So, I
03:16:48.400 | would actually, if I want to remove the introduction, we
03:16:51.280 | should remove it from here, I think. This is, I would never
03:16:56.240 | actually recommend you do that because it's not very
03:17:00.960 | flexible. It's not very robust but just so I show you that
03:17:06.640 | that is actually working. So, this extract fact runnable,
03:17:10.560 | right? So, now we're essentially just removing the
03:17:13.840 | introduction, right? Why would we want to do that? I don't
03:17:17.440 | know but it's there just so you can see that we can have
03:17:20.880 | multiple of these runnable operations running and they
03:17:24.880 | can be whatever you want them to be. Okay, it is worth
03:17:28.400 | knowing that the inputs to our functions here were all single
03:17:32.880 | arguments, okay? If you have a function that is accepting
03:17:37.280 | multiple arguments, you can do that in the way that I would
03:17:40.080 | probably do it or you can do it in multiple ways. One of the
03:17:44.000 | ways that you can do that is actually write your function to
03:17:48.320 | accept multiple arguments but actually do them through a
03:17:50.800 | single argument. So, just like a single like x which would be
03:17:53.600 | like a dictionary or something and then just unpack them
03:17:56.560 | within the function and use them as needed. That's just,
03:17:59.040 | you know, one way you can do it. Now, we also have these
03:18:02.000 | different runnable objects that we can use. So, here we have
03:18:06.080 | runnable parallel and runnable pass-through. It's kind of
03:18:10.480 | self-explanatory to some degree. So, let me just go
03:18:13.680 | through those. So, runnable parallel allows you to run
03:18:17.360 | multiple runnable instances in parallel. Runnable pass-through
03:18:23.040 | may be less self-explanatory, allows us to pass a variable
03:18:26.880 | through to the next runnable without modifying it, okay? So,
03:18:30.960 | let's see how they would work. So, we're going to come down
03:18:33.600 | here and we're going to set up these two docarray stores or
03:18:37.280 | obviously, it's two sources of information and we're going to
03:18:42.080 | need our LLM to pull information from both of these sources of
03:18:46.560 | information in parallel which is going to look like this. So,
03:18:49.600 | we have these two sources of information, vector store A,
03:18:53.440 | vector store B. This is our docarray A and docarray B. These
03:18:58.960 | are both going to be fed in as context into our prompt. Then,
03:19:02.960 | our LLM is going to use all of that to answer the question.
03:19:07.520 | Okay. So, to actually implement that, we have our, we need an
03:19:12.080 | embedding model. So, use OpenAI embeddings. We have our
03:19:15.520 | vector store A, vector store B. They're not, you know, real
03:19:19.440 | vectors. They're not full-on vectors here. We're just
03:19:22.480 | passing in a very small amount of information to both. So,
03:19:26.320 | we're saying, okay, we're going to create an in-memory vector
03:19:30.400 | store using these two bits of information. So, let's say half
03:19:33.680 | the information is here; this would be an irrelevant piece of
03:19:36.000 | information. Then, we have the relevant information which is
03:19:38.800 | DeepSeek V3 was released in December 2024. Okay. Then, we're
03:19:44.160 | going to have some other information in our other vector
03:19:46.960 | store. Again, irrelevant piece here and relevant piece here.
03:19:51.200 | Okay. The DeepSeek V3 LLM is a mixture of experts model with
03:19:55.840 | 671 billion parameters at its largest. Okay. So, based on
03:20:02.160 | that, we're also going to build this prompt string. So, we're
03:20:04.960 | going to pass in both of those contexts into our prompt. Now,
03:20:07.840 | I'm going to ask a question. We don't actually need, we don't
03:20:12.320 | need that bit and actually, we don't even need that bit. What
03:20:16.000 | am I doing? So, we just need this. So, we have both the
03:20:19.040 | contexts and we would run them through our prompt template.
03:20:23.520 | Okay. So, we have our system prompt template which is this
03:20:28.240 | and then we're just going to have, okay, our question is
03:20:30.160 | going to go into here as a user message. Cool. So, we have that
03:20:35.120 | and then, let me make this easier to read. We're going to
03:20:40.640 | convert both of those to retrievers which just means we
03:20:43.440 | can retrieve stuff from them and we're going to use this
03:20:46.800 | runnable parallel to run both of these in parallel, right? So,
03:20:54.240 | these are both being run in parallel but then we're also
03:20:56.960 | running our question in parallel because this needs to
03:20:58.880 | be essentially passed through this component without us
03:21:03.600 | modifying anything. So, when we look at this here, it's almost
03:21:07.680 | like, okay, this section here would be our runnable parallel
03:21:12.960 | and these are being run in parallel but also our query is
03:21:17.600 | being passed through. So, it's almost like there's another
03:21:20.480 | line there which is our runnable pass through, okay? So,
03:21:22.880 | that's what we're doing here. These are running in parallel.
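A sketch of that parallel retrieval step, assuming retriever_a and retriever_b built from the two vector stores, plus the prompt and llm defined above; the context_a, context_b, and question keys are assumed to match the prompt's input variables.

```python
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

retrieval = RunnableParallel(
    {
        "context_a": retriever_a,            # fetch docs from vector store A
        "context_b": retriever_b,            # fetch docs from vector store B
        "question": RunnablePassthrough(),   # pass the query through as-is
    }
)

chain = retrieval | prompt | llm

out = chain.invoke(
    "What architecture does the model DeepSeek released in December use?"
)
print(out.content)
```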
03:21:25.920 | One of them is a pass through. I need to rerun here. I just
03:21:34.480 | realized here we're using the deprecated embeddings. Just
03:21:38.800 | switch it to this, so langchain-openai. We run that, run
03:21:44.160 | this, run that and now this is set up, okay? So, we then have
03:21:54.320 | our initial step, which is using our runnable parallel and runnable
03:21:58.320 | pass through. We then have our
03:22:02.240 | prompt. Now, these are all chained together with the
03:22:06.960 | usual, you know, the usual pipe operator, okay? And now, we're
03:22:11.680 | going to invoke a question. What architecture does the model
03:22:14.160 | DeepSeek released in December use, okay? So, for the LLM to
03:22:18.880 | answer this question, it's going to need
03:22:21.840 | the information about the DeepSeek model that was released
03:22:24.640 | in December which we have specified in one half here and
03:22:30.800 | then it also needs to know what architecture that model uses
03:22:33.280 | which is defined in the other half over here, okay? So, let's
03:22:39.040 | run this, okay? There we go. The DeepSeek V3 model released in
03:22:45.040 | December 2024 is a mixture of experts model with 671 billion
03:22:49.840 | parameters, okay? So, a mixture of experts and this many
03:22:53.200 | parameters. Pretty cool. So, we've put together our pipeline
03:22:58.240 | using LCEL, using the pipe operator, the runnables,
03:23:02.800 | specifically, we've looked at the runnable parallel, runnable
03:23:06.160 | pass through, and also the runnable lambdas. So, that's it
03:23:09.200 | for this chapter on LCEL and we'll move on to the next one.
03:23:13.600 | In this chapter, we're going to cover streaming and async in
03:23:17.920 | lang chain. Now, both using async code and using streaming
03:23:23.200 | are incredibly important components of I think almost
03:23:28.320 | any conversational chat interface or at least any good
03:23:32.880 | conversational chat interface. For async, if your application
03:23:38.080 | is not async and you're spending a load of time in your
03:23:42.480 | API or whatever else waiting for LLM calls because a lot of
03:23:45.920 | those are behind APIs, you are waiting and your application is
03:23:50.880 | doing nothing because you've written synchronous code and
03:23:54.080 | that, well, there are many problems with that. Mainly, it
03:23:57.760 | doesn't scale. So, async code generally performs much better
03:24:02.160 | and especially for AI where a lot of the time, we're kind of
03:24:06.320 | waiting for API calls. So, async is incredibly important
03:24:09.680 | for that. For streaming, now, streaming is slightly different
03:24:13.920 | thing. So, let's say I want to tell me a story, okay? I'm
03:24:21.120 | using GPT-4 here. It's a bit slower. So, we can actually
03:24:23.760 | stream. We can see that token by token, this text is being
03:24:27.200 | produced and sent to us. Now, this is not just a visual
03:24:30.480 | thing. This is the LLM when it is generating tokens or words,
03:24:38.240 | it is generating them one by one and that's because these
03:24:41.760 | LLMs literally generate tokens one by one. So, they're looking
03:24:45.600 | at all of the previous tokens in order to generate the next
03:24:48.240 | one and then generate next one, generate next one. Now, that's
03:24:50.720 | how they work. So, when we are implementing streaming, we're
03:24:56.800 | getting that feed of tokens directly from the LLM through
03:25:00.160 | to our, you know, our back end or our front end. That is what
03:25:03.520 | we see when we see that token by token interface, right? So,
03:25:07.520 | that's one thing. One other thing that I can do is, let
03:25:12.080 | me switch across to GPT-4o, and I can say, okay, we just got this
03:25:16.480 | story. I'm going to ask, are there any standard storytelling
03:25:26.480 | techniques used above? Please use search.
03:25:35.440 | Okay. So, look, we get this very briefly there. We saw that
03:25:42.240 | it was searching the web. And the way that works is: we
03:25:46.240 | told the LLM it could use the search tool, and
03:25:51.600 | the LLM then output some tokens to say
03:25:56.320 | that it's going to use the search tool, and it also would have output
03:26:00.240 | the tokens saying what that search query would be,
03:26:02.720 | although we didn't see them there. But, what the ChatGPT
03:26:07.760 | interface is doing there, so it received those tokens saying,
03:26:11.440 | hey, I'm going to use the search tool. It doesn't just send us
03:26:14.400 | those tokens like it does with the standard tokens here.
03:26:17.040 | Instead, it used those tokens to show us that 'searching the
03:26:22.960 | web' little text box. So, streaming is not just the
03:26:28.000 | streaming of these direct tokens. It's also the streaming
03:26:33.120 | of these intermediate steps that the LLM may be thinking
03:26:36.640 | through which is particularly important when it comes to
03:26:40.960 | agents and agentic interfaces. So, it's also a feature thing,
03:26:45.280 | right? Streaming doesn't just look nice. It's also a feature.
03:26:49.360 | Then, finally, of course, when we're looking at this, okay,
03:26:53.200 | let's say we go back to GPT-4 and I say, okay, use all of
03:27:02.640 | this information to generate a long story for me,
03:27:11.200 | right? And, okay, we are getting the first token now. So, we
03:27:16.320 | know something is happening. We need to start reading. Now,
03:27:19.120 | imagine if we were not streaming anything here and
03:27:22.400 | we're just waiting, right? We're still waiting now. We're
03:27:25.200 | still waiting and we wouldn't see anything. We're just like,
03:27:28.240 | oh, it's just blank or maybe there's a little loading
03:27:30.800 | spinner. So, we'd still be waiting and even now, we're
03:27:37.280 | still waiting, right? This is an extreme example but can you
03:27:44.720 | imagine just waiting for so long and not seeing anything as
03:27:48.080 | a user, right? Now, just now, we would have got our answer if
03:27:52.240 | we were not streaming. I mean, that would be painful as a
03:27:56.560 | user. You'd not want to wait especially in a chat interface.
03:28:00.880 | You don't want to wait that long. It's okay with, okay, for
03:28:03.680 | example, deep research takes a long time to process but you
03:28:07.840 | know it's going to take a long time to process and it's a
03:28:10.000 | different use case, right? You're getting a report. This is
03:28:13.440 | a chat interface and yes, most messages are not going to take
03:28:18.560 | that long to generate. We're also probably not going to be
03:28:22.320 | using GPT-4 depending on, I don't know, maybe some people
03:28:25.440 | still do but in some scenarios, it's painful to need to wait
03:28:30.640 | that long, okay? And it's also the same for agents. It's nice
03:28:34.560 | when you're using agents to get an update on, okay, we're using
03:28:37.600 | this tool. It's using this tool. This is how it's using
03:28:39.680 | them. Perplexity, for example, have a very nice example of
03:28:43.840 | this. So, okay, what's this? OpenAI co-founder joins
03:28:48.240 | Murati's startup. Let's see, right. So, we see this is
03:28:51.200 | really nice. We're using ProSearch. It's searching for
03:28:53.920 | news, showing us the results, like we're getting all this
03:28:57.200 | information as we're waiting which is really cool and it
03:29:01.840 | helps us understand what is actually happening, right? It's
03:29:05.040 | not needed in all use cases but it's super nice to have those
03:29:08.480 | intermediate steps, right? So, then we're not waiting and I
03:29:11.600 | think this bit probably also streamed but it was just super
03:29:14.240 | fast. So, I didn't see it but that's pretty cool. So,
03:29:18.640 | streaming is pretty important. Let's dive into our example.
03:29:23.920 | Okay, we'll open that in Colab and off we go. So, starting with
03:29:28.000 | the prerequisites, same as always, LangChain, optionally
03:29:32.320 | LangSmith. We'll also enter our LangChain API key if you'd
03:29:36.160 | like to use LangSmith. We'll also enter our OpenAI API key.
03:29:40.240 | So, that is platform.openai.com and then as usual, we can just
03:29:45.200 | invoke our LLM, right? So, we have that. It's working. Now,
03:29:50.160 | let's see how we would stream with astream, okay? So,
03:29:54.880 | stream is actually a method as well, and we
03:29:58.800 | could use that, but it's not async, right? So, whenever we
03:30:01.760 | see a method in LangChain that has an 'a' prefixed onto what would be
03:30:06.320 | another method, that's the async version of it. So, we
03:30:12.560 | can actually stream using async super easily using just
03:30:19.680 | llm.astream, okay? Now, this is just an example and to be
03:30:25.280 | completely honest, you probably will not be able to use this in
03:30:28.720 | an actual application but it's just an example and we're going
03:30:32.400 | to see how we would use this or how we would stream
03:30:35.680 | asynchronously in an application further down in
03:30:39.040 | this notebook. So, starting with this, you can see here that
03:30:44.480 | we're getting these tokens, right? We're just appending it
03:30:46.800 | to tokens here. We don't actually need to do that. I
03:30:48.800 | don't think we're using this but maybe we, yeah, we'll do it
03:30:52.480 | here. It's fine. So, we're just appending the tokens as they
03:30:56.400 | come back from our LLM, appending it to this. We'll see
03:31:00.000 | what that is in a moment and then I'm just printing the
03:31:03.680 | token content, right? So, the content of the token. So, in
03:31:08.240 | this case, that would be 'N'. In the next, it would be 'LP'. It
03:31:11.440 | would be 'stands', 'for', and so on and so on. So, you can see for the
03:31:14.720 | most part, it tends to be word level, but it can also be
03:31:18.800 | sub-word level as you see, sent, is one word, of course. So,
03:31:24.320 | you know, they get broken up in various ways. Then, adding
03:31:29.120 | this pipe character onto the end here. So, we can see, okay,
03:31:33.360 | where are our individual tokens? Then, we also have
03:31:36.720 | Flush. So, Flush, you can actually turn this off and
03:31:40.320 | it's still going to stream. You're still going to see
03:31:41.840 | everything, but it's going to come through a bit more
03:31:43.920 | in chunks, like bit by bit. When we use flush, it
03:31:48.800 | forces the console to update what is being shown to us
03:31:53.680 | immediately, alright? So, we get a much smoother stream when we're
03:31:58.560 | looking at this, versus when flush is
03:32:02.160 | not set to true. So, yeah, when you're printing, that is good
03:32:05.840 | to do just so you can see. You don't necessarily need to.
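As a rough sketch of what that cell is doing (the model name and prompt are assumptions), the astream loop looks something like this:

```python
import asyncio
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # assumed model name

async def stream_tokens():
    tokens = []
    # astream is the async counterpart of stream; it yields AIMessageChunk objects
    async for chunk in llm.astream("What does NLP stand for?"):
        tokens.append(chunk)
        # print each chunk's content with a pipe separator so token boundaries are visible;
        # flush=True forces the console to update immediately for a smoother stream
        print(chunk.content, end="|", flush=True)
    return tokens

tokens = asyncio.run(stream_tokens())  # in a notebook you would just `await stream_tokens()`
```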
03:32:08.640 | Okay. Now, we added all those tokens to the tokens list so
03:32:12.960 | we can have a look at each individual object that was
03:32:15.600 | returned to us, right? This is interesting. So, you see that
03:32:18.640 | we have the AI message chunk, right? That's an object and
03:32:22.640 | then you have the content. The first one's actually empty.
03:32:26.000 | Second one has that N for NLP and yeah, I mean, that's all we
03:32:31.120 | really need to know. They're very simple objects but they're
03:32:34.240 | actually quite useful because just look at this, right? So,
03:32:38.640 | we can add each one of our AI message chunks, right? Let's
03:32:42.640 | see what that does. It doesn't create a list. It creates this,
03:32:45.920 | right? So, we still just have one AI message chunk but it's
03:32:51.600 | combined the content within those AI message chunks which
03:32:55.440 | is kind of cool, right? So, for example, like we could remove
03:32:59.440 | these, right? And then we just see NLP. So, it's kind of nice
03:33:05.440 | little feature there. I do. I actually quite like that. But
03:33:10.640 | you do need to just be a little bit careful because obviously
03:33:12.800 | you can do that the wrong way and you're going to get like a
03:33:16.720 | I don't know what that is. Some weird token salad. So, yeah,
03:33:21.360 | you need to just make sure you are going to be merging those
03:33:24.480 | in the correct order unless you, I don't know, unless you're
03:33:28.160 | doing something weird. Okay, cool. So, streaming, that was
03:33:32.720 | streaming from an LLM. Let's have a look at streaming with
03:33:35.600 | agents. It gets a bit more complicated, to be
03:33:41.120 | completely honest. But things need to
03:33:45.680 | get a bit more complicated so that we can implement this in,
03:33:49.280 | for example, an API, right? That is kind of a
03:33:52.800 | necessary thing in any case. So, just very quickly, we're
03:33:58.560 | going to construct our agent executor like we did in the
03:34:01.440 | agent execution chapter. And for that, for the agent
03:34:06.160 | executor, we're going to need tools, a chat prompt template, an LLM,
03:34:09.600 | the agent, and the agent executor itself, okay? Very quickly, I'm
03:34:13.360 | not going to go through these in detail. We just define our
03:34:16.320 | tools. We have add, multiply, exponentiate, subtract, and
03:34:20.080 | define our answer tool. Merge those into a single list of
03:34:23.200 | tools. Then, we have our prompt template. Again, same as
03:34:27.680 | before, we just have system message, we have chat history,
03:34:30.640 | we have a query, and then we have the agent scratch pad for
03:34:34.960 | those intermediate steps. Then, we define our agent using
03:34:39.760 | LCEL. LCEL works quite well with both streaming and async, by
03:34:44.000 | the way. It supports both out of the box, which is nice. So, we
03:34:49.840 | define our agent. Then, coming down here, we're going to
03:34:54.800 | create the agent executor. This is the same as before, right?
03:34:58.240 | So, there's nothing new in here, I don't think. We just
03:35:01.520 | initialize our agent there. Then we're
03:35:06.960 | looping through; nothing
03:35:11.920 | new there. So, we're just invoking our
03:35:15.600 | agent, seeing if there's a tool call. We
03:35:20.480 | could shift this check to before or after. It doesn't actually
03:35:22.320 | matter that much. So, we're checking if it's the final
03:35:25.440 | answer. If not, we continue, execute our tools, and so on.
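A condensed sketch of that agent setup is below; the tool bodies, system prompt wording, and model name are simplified placeholders rather than the course's exact definitions.

```python
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

@tool
def add(x: float, y: float) -> float:
    """Add x and y."""
    return x + y

@tool
def final_answer(answer: str, tools_used: list[str]) -> str:
    """Return the final answer to the user, plus the tools that were used."""
    return answer

tools = [add, final_answer]  # the notebook also defines multiply, exponentiate, subtract

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Always use a tool; finish with final_answer."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# LCEL: map the inputs through, fill the prompt, then force the LLM to call a tool
agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")
)
```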
03:35:30.640 | Okay, cool. So, then, we can invoke that. Okay, we go, what
03:35:37.440 | is 10 plus 10? There we go, right? So, we have our agent
03:35:43.040 | executor, it is working. Now, when we are running our agent
03:35:50.240 | executor, with every new query, if we're putting this into an
03:35:54.000 | API, we're probably going to need to provide it with a fresh
03:35:59.200 | callback handler. Okay, so, the callback handler is
03:36:02.480 | what's going to handle taking the tokens that are being
03:36:05.520 | generated by our LLM or agent and giving them to some other
03:36:10.160 | piece of code. Like, for example, the streaming
03:36:12.960 | response for an API, and our callback handler is going to
03:36:18.560 | put those tokens in a queue, in our case, and then our, for
03:36:23.840 | example, the streaming object is going to pick them up from
03:36:26.880 | the queue and put them wherever they need to be. So, to allow
03:36:32.080 | us to do that with every new query, rather than us needing
03:36:35.440 | to initialize everything when we actually initialize our
03:36:39.600 | agent, we can add a configurable field to our LLM,
03:36:43.360 | okay? So, we set the configurable fields here. Oh,
03:36:46.960 | also, one thing is that we set streaming equal to true. That's a
03:36:50.320 | very minor thing, but just so you see that there, we do
03:36:54.080 | that. So, we add some configurable fields to our LLM,
03:36:57.200 | which means we can basically pass an object in for these on
03:37:00.640 | every new invocation. So, we set our configurable field, it's
03:37:06.000 | going to be called callbacks, and we just add a description,
03:37:09.440 | right? Nothing more to it. So, this will now allow us to
03:37:13.120 | provide that field when we're invoking our agent, okay?
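That configurable-fields setup looks roughly like this (a sketch; the model name is an assumption):

```python
from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

# streaming=True so tokens are emitted as they are generated; making callbacks a
# configurable field lets us pass a fresh callback handler on every invocation
llm = ChatOpenAI(model="gpt-4o-mini", streaming=True).configurable_fields(
    callbacks=ConfigurableField(
        id="callbacks",
        name="callbacks",
        description="A list of callback handlers to pass to the LLM at runtime",
    )
)
```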
03:37:21.120 | Now, we need to define our callback handler, and as I mentioned,
03:37:25.680 | what is basically going to be happening is this callback
03:37:28.000 | handler is going to be passing tokens into our async IO queue
03:37:33.200 | object, and then we're going to be picking them up from the
03:37:36.960 | queue elsewhere, okay? So, we can call it a queue callback
03:37:40.640 | handler, okay? And that is inheriting from the async
03:37:44.560 | callback handler, because we want all this to be done
03:37:46.480 | asynchronously, because we're thinking here about, okay, how
03:37:49.280 | do we implement all this stuff within APIs and actual real
03:37:52.880 | world code, and we do want to be doing all this in async. So,
03:37:58.080 | let me execute that, and I'll just explain a little bit of
03:38:00.240 | what we're looking at. So, we have the initialization, right?
03:38:03.520 | There's nothing specific here. What we really want to be
03:38:08.560 | doing is we want to be setting our queue object, assigning
03:38:11.760 | that to the class attributes, and then there's also this
03:38:15.840 | final answer seen flag, which we're setting to false. So, what
03:38:19.440 | we're going to be using that for is our LLM will be
03:38:24.240 | streaming tokens to us whilst it's using its tool calling,
03:38:29.360 | and we might not want to display those immediately, or
03:38:31.600 | we might want to display them in a different way. So, by
03:38:34.560 | setting this final answer seen flag to false, whilst our LLM is
03:38:41.440 | outputting those tool tokens, we can handle them in a
03:38:44.240 | different way, and then as soon as we see that it's done with
03:38:47.360 | the tool calls and it's onto the final answer, which is
03:38:49.600 | actually another tool call, but once we see that it's onto the
03:38:52.160 | final answer tool call, we can set this to true, and then we
03:38:56.240 | can start processing our tokens in a different way,
03:38:59.360 | essentially. So, we have that. Then, we have this
03:39:03.840 | __aiter__ method. This is required for any async generator object.
03:39:11.280 | So, what that is going to be doing is going to be iterating
03:39:13.680 | through, right? So, it's a generator. It's going to be
03:39:16.400 | going iterating through and saying, okay, if our queue is
03:39:19.760 | empty, right? This is the queue that we set up here. If it's
03:39:22.800 | empty, wait a moment, right? We use the sleep method here, and
03:39:27.360 | this is an async sleep method. This is super important. We're
03:39:30.960 | awaiting an asynchronous sleep, right? So,
03:39:35.040 | whilst we're waiting for that 0.1 seconds,
03:39:38.880 | our code can be doing other things, right? That
03:39:43.360 | is important. If we instead use the standard
03:39:47.280 | time.sleep, that is not asynchronous, and so it will
03:39:50.560 | actually block the thread for that 0.1 seconds. So, we don't
03:39:54.880 | want that to happen. Generally, our queue should probably not
03:39:58.000 | be empty that frequently given how quickly tokens are going to
03:40:01.680 | be added to the queue. So, the only way that this would
03:40:05.440 | potentially be empty is maybe our LLM stops. Maybe there's
03:40:10.720 | like a connection interruption for a, you know, a brief second
03:40:13.600 | or something, and no tokens are added. So, in that case, we
03:40:17.280 | don't actually do anything. We don't keep checking the queue.
03:40:19.680 | We just wait a moment, okay? And then, we check again. Now,
03:40:24.320 | if it was empty, we wait, and then, we continue on to the
03:40:28.080 | next iteration. Otherwise, it probably won't be empty. We get
03:40:33.040 | whatever is inside our queue. We get that out, pull
03:40:36.160 | it out. Then, we say, okay, if that token is a done token,
03:40:42.640 | we're going to return. So, we're going to stop this
03:40:45.760 | generator, right? We're finished. Otherwise, if it's
03:40:49.680 | something else, we're going to yield that token which means
03:40:52.480 | we're returning that token, but then, we're continuing through
03:40:55.520 | that loop again, right? So, that is our generator logic.
03:41:01.760 | Then, we have some other methods here. These are
03:41:05.360 | LangChain-specific, okay? We have on_llm_new_token and we
03:41:10.400 | have on_llm_end. Starting with on_llm_new_token, this is
03:41:14.960 | basically when an LLM returns a token to us. LangChain is
03:41:18.400 | going to run or execute this method, okay? This is the
03:41:23.280 | method that will be called. What this is going to do is
03:41:27.200 | it's going to go into the keyword arguments. It's going
03:41:29.200 | to get the chunk object. So, this is coming from our LLM. If
03:41:33.280 | there is something in that chunk, it's going to check for
03:41:37.440 | a final answer tool call first, okay? So, we get our tool
03:41:41.680 | calls and we say, if the name within our chunk, right?
03:41:49.520 | Probably, this will be empty in most of the tokens we return,
03:41:49.520 | right? So, you remember before when we're looking at the
03:41:52.640 | chunks here, this is what we're looking at, right? The
03:41:56.160 | content for us is actually always going to be empty and
03:41:58.320 | instead, we're actually going to get the additional keyword
03:42:00.720 | args here and inside there, we're going to have our tool
03:42:03.600 | calling, our tool calls as we saw in the previous videos,
03:42:08.480 | right? So, that's what we're extracting. We're extracting
03:42:10.800 | that information. That's why we're going additional keyword
03:42:13.760 | args, right? And get those tool, the tool call information,
03:42:18.800 | right? Or it will be none, right? So, if it is none, I
03:42:23.360 | don't think it ever would be none to be honest. It would be
03:42:25.840 | strange if it's none. I think that means something would be
03:42:28.080 | wrong. Okay, so here, we're using the Walrus operator. So,
03:42:31.120 | the Walrus operator, what it's doing here is whilst we're
03:42:34.880 | checking the if logic here, whilst we do that, it's also
03:42:39.840 | assigning whatever is inside this. It's assigning over to
03:42:44.160 | tool calls and then with the if we're checking whether tool
03:42:48.240 | calls is something or none, right? Because we're using get
03:42:52.640 | here. So, if this get operation fails and there is no tool
03:42:56.640 | calls, this object here will be equal to none which gets
03:43:01.360 | assigned to tool calls here, and then this if check will evaluate to
03:43:06.160 | false and this logic will not run, okay? And it will just
03:43:09.680 | continue. If this is true, so if there is something returned
03:43:13.520 | here, we're going to check if that something returned is
03:43:16.400 | using the function name or tool name, final answer. If it is,
03:43:20.560 | we're going to set that final answer scene equal to true.
03:43:23.040 | Otherwise, we're just going to add our chunk into the queue,
03:43:27.760 | okay? We use put_nowait here because we're using
03:43:30.560 | async. Otherwise, if you were not using async, I think you
03:43:33.600 | might just use put. Okay, you
03:43:39.360 | would use put if it's just synchronous code, but I don't
03:43:43.200 | think I've ever implemented this synchronously. So, it
03:43:46.240 | would actually just be put_nowait for async, okay? And
03:43:49.440 | then return. So, we have that. Then, we have on_llm_end, okay?
03:43:56.480 | So, this is when LangChain sees that the LLM has returned or
03:44:02.080 | indicated that it is finished with the response. LangChain
03:44:06.480 | will call this. So, you have to be aware that this will happen
03:44:13.120 | multiple times during an agent execution because if you think
03:44:17.440 | within our agent executor, we're hitting the LLM multiple
03:44:22.080 | times. We have that first step where it's deciding, oh, I'm
03:44:25.600 | going to use the add tool or the multiply tool and then that
03:44:29.120 | response gets back to us. We execute that tool and then we
03:44:33.360 | pass the output from that tool and or the original user query
03:44:36.960 | in the chat history, we pass that back to our LLM again,
03:44:39.680 | right? So, that's another call to our LLM that's going to come
03:44:42.560 | back. It's going to finish or it's going to give us something
03:44:45.120 | else, right? So, there's multiple LLM calls happening
03:44:48.640 | throughout our agent execution logic. So, this on_llm_end method
03:44:53.200 | will actually get called at the end of every single one of
03:44:55.680 | those LLM calls. Now, if we get to the end of an LLM call and it
03:45:02.480 | was just a tool invocation. So, we had the, you
03:45:05.600 | know, it called the add tool. We don't want to put the done
03:45:11.280 | token into our queue because when the done token is added to
03:45:14.640 | our queue, we're going to stop iterating, okay? Instead, if it
03:45:20.880 | was just a tool call, we're going to say step end, right?
03:45:24.240 | And we'll actually get this token back. So, this is useful
03:45:27.920 | on, for example, the front end, you could have, okay, I've
03:45:32.560 | used the add tool. These are the parameters and it's the end
03:45:36.560 | of the step. So, you could have that your tool call is being
03:45:40.640 | used on some front end and as soon as it sees step end, it
03:45:43.840 | knows, okay, we're done with that. Here was the response,
03:45:46.720 | right? And it can just show you that and we're going to use
03:45:49.680 | that. We'll see that soon but let's say we get to the final
03:45:53.280 | answer tool. We're on the final answer tool and then we get
03:45:56.400 | this signal that the LLM has finished, then we need to stop
03:46:01.920 | iterating. Otherwise, our stream generator is just going
03:46:06.000 | to keep going forever, right? Nothing's going to stop it, or
03:46:08.880 | maybe it will time out. I don't think it will, though. So, at
03:46:13.200 | that point, we need to send, okay, stop, right? We need to
03:46:16.800 | say we're done, and then that will come back
03:46:19.760 | here to our async iterator and it will
03:46:25.360 | return and stop the generator, okay?
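Putting the whole handler together, a minimal sketch might look like the following. The "<<DONE>>" and "<<STEP_END>>" sentinel strings, the exact final_answer check, and the class name are assumptions for illustration; the course code may differ in detail.

```python
import asyncio
from langchain_core.callbacks import AsyncCallbackHandler

class QueueCallbackHandler(AsyncCallbackHandler):
    """Callback handler that puts streamed tokens onto an asyncio.Queue."""

    def __init__(self, queue: asyncio.Queue):
        self.queue = queue
        self.final_answer_seen = False

    async def __aiter__(self):
        # async generator: pull tokens off the queue until the done signal arrives
        while True:
            if self.queue.empty():
                # non-blocking wait; time.sleep here would block the event loop
                await asyncio.sleep(0.1)
                continue
            token_or_done = await self.queue.get()
            if token_or_done == "<<DONE>>":  # assumed sentinel string
                return
            if token_or_done:
                yield token_or_done

    async def on_llm_new_token(self, *args, **kwargs) -> None:
        # called by LangChain for every new token the LLM generates
        chunk = kwargs.get("chunk")
        if chunk and (tool_calls := chunk.message.additional_kwargs.get("tool_calls")):
            # flip the flag once the LLM switches to the final_answer tool
            if tool_calls[0].get("function", {}).get("name") == "final_answer":
                self.final_answer_seen = True
        self.queue.put_nowait(chunk)

    async def on_llm_end(self, *args, **kwargs) -> None:
        # called at the end of every LLM call; only stop once the final answer is done
        if self.final_answer_seen:
            self.queue.put_nowait("<<DONE>>")
        else:
            self.queue.put_nowait("<<STEP_END>>")  # assumed sentinel string
```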
03:46:30.960 | So, that's the core logic that we have inside that. I know there's a lot going on
03:46:34.240 | there, but we need all of it. So, it's important to be
03:46:38.400 | aware of it. Okay. So, now, let's see how we might actually
03:46:43.040 | call our agent with all of the streaming in this way. So,
03:46:49.360 | we're going to initialize our queue. We're going to use that
03:46:53.120 | to initialize a streamer, okay? Using the custom streamer
03:46:56.400 | that we just set up, the custom callback handler, whatever you
03:46:59.040 | want to call it, okay? Then, I'm going to define a function.
03:47:03.200 | So, this is an asynchronous function. It has to be if
03:47:05.840 | we're using async, and what it's going to do is it's going to
03:47:09.200 | call our agent with a config here, and we're going to pass it
03:47:14.720 | the callback, which is the streamer, right?
03:47:18.320 | Now, here, I'm not calling the agent executor. I'm just calling
03:47:20.800 | the agent, right? So, if we come back up here, we're
03:47:25.360 | calling this, right? So, that's not going to include all the
03:47:28.720 | tool execution logic and importantly, we're calling the
03:47:32.960 | agent with the config that uses callbacks, right? So, this
03:47:40.720 | configurable field here from our LLM is actually being
03:47:40.720 | fed through and it propagates through to our agent object as
03:47:43.360 | well to the runnable serializable, right? So, that's
03:47:47.200 | what we're executing here. We see agent with config and we're
03:47:50.560 | passing in those callbacks which is just one actually,
03:47:54.000 | okay? So, that sets up our agent, and then we invoke it with
03:47:58.240 | astream, okay? Like we did before, and we're just going to
03:48:01.760 | return everything.
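As a sketch, reusing the hypothetical QueueCallbackHandler and the agent from the earlier sketches:

```python
import asyncio

queue = asyncio.Queue()
streamer = QueueCallbackHandler(queue)

async def stream(query: str) -> list:
    # pass a fresh callback handler in at invocation time via the config
    agent_with_callbacks = agent.with_config(callbacks=[streamer])
    tokens = []
    async for token in agent_with_callbacks.astream({
        "input": query,
        "chat_history": [],
        "agent_scratchpad": [],
    }):
        tokens.append(token)
    return tokens

tokens = await stream("What is 10 + 10?")  # run inside a notebook / async context
```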
03:48:07.280 | So, let's run that, okay? And we see all of the token or chunk objects that have been returned, and
03:48:10.480 | this is useful to understand what we're actually doing up
03:48:14.080 | here, right? So, when we're doing this chunk message,
03:48:17.920 | additional keyword arguments, right? We can see that in here.
03:48:20.960 | So, this would be the chunk message object. We get the
03:48:24.640 | additional keyword args. We're going into tool calls, and we get
03:48:28.480 | the information here. So, we have the ID for that tool call
03:48:31.040 | which we saw in the previous chapters. Then, we have our
03:48:35.760 | function, right? So, the function includes the name,
03:48:39.760 | right? So, we know what tool we're calling from this first
03:48:42.560 | chunk but we don't know the arguments, right? Those
03:48:44.960 | arguments are going to be streamed to us. So, we can see
03:48:47.600 | them begin to come through in the next chunk. So, next chunk
03:48:51.920 | is just it's just the first token for the add function,
03:48:56.640 | right? And we can see these all come together over multiple
03:49:00.640 | steps and we actually get all of our arguments, okay? That's
03:49:05.600 | pretty cool. So, actually one thing I would like to show you
03:49:10.000 | here as well. So, if we just do tokens equals an empty list, sorry,
03:49:17.520 | and we do
03:49:20.320 | tokens.append(token).
03:49:25.460 | Okay. We have all of our tokens in here now. Alright, see that
03:49:31.260 | they're all AI message chunks. So, we can actually add those
03:49:35.500 | together, right? So, let's we'll go with these here and
03:49:39.340 | based on these, we're going to get all of the arguments, okay?
03:49:42.540 | So, this is kind of interesting. So, it's one until
03:49:46.220 | I think like the second to last maybe.
03:49:51.420 | Alright, so we have these and actually we just want to add
03:49:56.240 | those together. So, I'm going to go with tk equals tokens[1] and I'm
03:50:01.920 | just going to go for:
03:50:07.760 | for token in, we're going to go from the second onwards, I'm
03:50:13.700 | going to do tk plus token, right? And let's see what tk looks
03:50:19.460 | like at the end here. tk.
03:50:23.780 | Okay. So, now you see that it's kind of merged with all those
03:50:28.180 | arguments here. Sorry, plus equal. Okay. So, run that and
03:50:34.500 | you can see here that it's merged those arguments. It
03:50:36.900 | didn't get all of them. So, I kind of missed some at the end
03:50:38.980 | there but it's merging them, right? So, you can see that
03:50:42.020 | logic where it's, you know, before it was adding the
03:50:45.060 | content from various chunks. It also does the same for the
03:50:49.460 | other parameters within your chunk object which is I think
03:50:53.220 | it's pretty cool and you can see here the name wasn't
03:50:55.940 | included. That's because we started on token one, not on
03:50:59.700 | token zero where the name was. So, if we actually started from
03:51:02.660 | token zero and let's just pull them in there,
03:51:06.660 | alright? So, from one onwards, we're going to get a complete
03:51:12.820 | AI message chunk which includes the name here and all of those
03:51:17.940 | arguments, and you'll see also here, right? It populates
03:51:21.220 | everything, which is pretty cool. Okay. So, we have that.
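In code, that chunk merging is just repeated addition. A sketch, assuming tokens is the list of AIMessageChunk objects collected above:

```python
# start from the first chunk (which carries the tool name) and add the rest on
merged = tokens[0]
for token in tokens[1:]:
    merged += token  # chunk addition concatenates content and tool-call arguments

# the merged chunk now carries the full tool name plus the complete argument string
tool_call = merged.additional_kwargs["tool_calls"][0]
print(tool_call["function"]["name"], tool_call["function"]["arguments"])
```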
03:51:26.900 | Now, based on this, we're going to want to modify our custom
03:51:29.700 | agent executor because we're streaming everything, right?
03:51:34.500 | So, we want to add streaming inside our agent executor which
03:51:38.020 | we're doing here, right? So, this is async def stream and
03:51:42.180 | we're using async for token in the astream, okay? So, this
03:51:47.620 | is like the very first instance: if output is None,
03:51:51.220 | we're just going to be adding our token. So, the chunk,
03:51:55.140 | sorry, to our output like a first token becomes our output.
03:52:00.740 | Otherwise, we're just appending our tokens to the output, okay?
03:52:06.660 | If the token content is empty, which it should be, right?
03:52:09.860 | Because we're using tool calls all the time. We're just going
03:52:12.340 | to print content, okay? I just added these so we see, like,
03:52:16.580 | print everything. I just want to be able to see that.
03:52:19.540 | I wouldn't expect this to run because we're saying it has to
03:52:22.900 | use tool calling, okay? So, within our agent, if we come up
03:52:28.180 | to here, we said tool choice any. So, it's been forced to
03:52:30.980 | use tool calling. So, it should never really be returning
03:52:34.100 | anything inside the content field but just in case it's
03:52:36.980 | there, right? So, we'll see if that is actually true. Then,
03:52:40.740 | we're just getting out our tool calls information, okay? From
03:52:44.820 | our chunk and we're going to say, okay, if there's something
03:52:46.900 | in there, we're going to print what is in there, okay? And
03:52:49.540 | then, we're going to extract our tool name. If there is
03:52:52.500 | a tool name, I'm going to show you the tool name.
03:52:55.780 | Then, we're going to get the ARGs and if the ARGs are not
03:52:58.740 | empty, we're going to see what we get in there, okay? And then
03:53:03.060 | from all of this, we're actually going to merge all of
03:53:05.380 | it into our AI message, right? Because we're merging
03:53:08.980 | everything as we're going through, we're merging
03:53:10.420 | everything into outputs as I showed you before, okay? Cool.
03:53:13.860 | And then, we're just awaiting our stream that will like kick
03:53:16.340 | it off, okay? And then, we do the standard agent executor
03:53:20.420 | stuff again here, right? So, we're just pulling out tool
03:53:23.380 | name, tool logs, tool call ID and then we're using all that
03:53:26.100 | to execute our tool here and then we're creating a new tool
03:53:29.700 | message and passing that back in. And then also here, I moved
03:53:33.300 | the break for the final answer into the final step. So, that
03:53:37.780 | is our custom agent executor with streaming and let's see
03:53:41.220 | what it does, okay? We set verbose equals
03:53:45.380 | true, so we see all those print statements, okay? So, you can
03:53:52.340 | kind of see it's a little bit messy but you can see we have
03:53:55.700 | tool calls that had some stuff inside it, had add here and
03:54:00.740 | what we're printing out here is we're printing out the full AI
03:54:03.380 | message chunk with tool calls and then I'm just printing out,
03:54:06.900 | okay, what are we actually pulling out from that? So,
03:54:09.460 | these are actually coming from the same thing, okay? And then
03:54:12.740 | the same here, right? So, we're looking at the full message
03:54:15.300 | and then we're looking, okay, we're getting this argument out
03:54:18.340 | from it, okay? So, we can see everything that is being pulled
03:54:22.180 | out, you know, chunk by chunk or token by token and that's it,
03:54:27.380 | okay? So, we could just get everything like that. However,
03:54:31.060 | right, so I'm printing everything so we can see that
03:54:33.300 | streaming. What if I don't print, okay? So, we're setting
03:54:37.380 | verbose or by default, verbose is equal to false here. So,
03:54:41.860 | what happens if we invoke now? Let's see.
03:54:46.900 | Okay.
03:54:50.980 | Cool. We got nothing. So, the reason we got nothing is
03:54:58.480 | because we're not printing. But if you're
03:55:04.560 | building an API, for example, and you're pulling your tokens
03:55:08.160 | through, you can't print them to a front end, or
03:55:15.440 | print them as the output of your API. Printing goes to your
03:55:20.560 | terminal, right? Your console window. It doesn't go anywhere
03:55:24.080 | else. Instead, what we want to do is we actually want to get
03:55:29.040 | those tokens out, right? But how do we do that, right?
03:55:33.760 | So, we printed them, but another place that those tokens
03:55:37.680 | are is in our queue, right? Because we set them up to go to
03:55:41.680 | the queue. So, we can actually pull them out of our queue
03:55:48.480 | whilst our agent executor is running and then we can do
03:55:52.560 | whatever we want with them because our code is async. So,
03:55:54.800 | it can be doing multiple things at the same time. So, whilst
03:55:58.000 | our code is running the agent executor, whilst that is
03:56:02.000 | happening, our code can also be pulling out from our queue
03:56:05.680 | tokens that are in there and sending them to like an API,
03:56:11.120 | for example, right? Or whatever downstream logic you have. So,
03:56:15.680 | let's see what that looks like. We start by just initializing
03:56:19.040 | our queue, initializing our streamer with that queue. Then
03:56:22.080 | we create a task. So, this is basically saying, okay, I want
03:56:26.400 | to run this but don't run it right now. I'm not ready yet.
03:56:29.760 | The reason that I say I'm not ready yet is because I also
03:56:33.440 | want to define here my async loop which is going to be
03:56:38.000 | printing those tokens, right? But this is async, right? So,
03:56:41.360 | we set this up. This is like, get ready to run this. Because
03:56:45.520 | it is async, this is just sitting there,
03:56:49.760 | ready to go. So, we get this. We
03:56:52.640 | continue. None of this is actually executed
03:56:56.160 | yet, right? Only here, when we await the task that we set up
03:57:02.560 | here, only then does our agent executor run and our async
03:57:10.080 | object here begin getting tokens, right? And here, again,
03:57:14.080 | I'm printing but I don't need to print. I could have,
03:57:17.280 | let's say, this is within an API or something.
03:57:23.440 | Let's say I'm saying, okay, send token to XYZ, right?
03:57:31.700 | That's sending a token somewhere, or maybe
03:57:34.340 | we're yielding this to some sort of streamer object within
03:57:38.500 | our API, right? We can do whatever we want with those
03:57:40.900 | tokens, okay? I'm just printing them cuz I want to actually see
03:57:44.420 | them, okay? But just important here is that we're not printing
03:57:49.300 | them within our agent executor. We're printing them outside the
03:57:52.580 | agent executor. We've got them out and we can put them
03:57:55.860 | wherever we want which is perfect when you're building an
03:57:58.820 | actual sort of real world use case where you're using an API
03:58:01.220 | or something else. Okay, so let's run that. Let's see what
03:58:03.940 | we get. Look at that. We get all of the information we could
03:58:08.580 | need and a little bit more, right? Because now, we're using
03:58:12.580 | the agent executor and now, we can also see how we have this
03:58:16.740 | step end, right? So, I know just from looking at this,
03:58:21.060 | right, this is my first tool use. So, what tool is it? Let's
03:58:25.620 | have a look. It's the add tool and then, we have these
03:58:29.140 | arguments. So, I can then parse them, right? Downstream. Then,
03:58:32.740 | we have the next tool use which is here, down here. So, then,
03:58:37.940 | we can then parse them in the way that we like. So, that's
03:58:42.100 | pretty cool. I mean, let's see, right? So, we're
03:58:47.060 | getting those things out. Can we do something with
03:58:50.900 | them before we print and show them? Yes, let's
03:58:54.660 | see, okay? So, we're now modifying our loop here. Same
03:58:59.860 | stuff, right? We're still initializing our queue,
03:59:02.580 | initializing our streamer, initializing our tasks, okay?
03:59:06.020 | And we're still doing this async for token streamer, okay?
03:59:09.860 | But then, we're doing stuff with our tokens. So, I'm saying,
03:59:13.460 | okay, if we're on stream end, I'm not actually gonna print
03:59:17.300 | stream end. I'm gonna print new line, okay? Otherwise, if we're
03:59:21.940 | getting a tool call here, we're going to say, if that tool call
03:59:26.260 | is the tool name, I am going to print calling tool name, okay?
03:59:32.500 | If it's the arguments, I'm going to print the tool
03:59:36.020 | argument, and I'm going to end with nothing so that we don't
03:59:38.740 | go onto a new line. So, we're actually gonna be streaming
03:59:41.460 | everything, okay? So, let's just see what this looks like.
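A sketch of that pattern, again reusing the hypothetical QueueCallbackHandler, sentinel strings, and custom agent executor from above (the executor's invoke signature is an assumption):

```python
import asyncio

queue = asyncio.Queue()
streamer = QueueCallbackHandler(queue)

# schedule the agent executor; it pushes tokens onto the queue as it runs
task = asyncio.create_task(agent_executor.invoke("What is 10 + 10?", streamer))

# meanwhile, consume tokens from the queue and decide how to display each one
async for token in streamer:
    if token == "<<STEP_END>>":
        print()  # step finished, move to a new line
    elif tool_calls := token.message.additional_kwargs.get("tool_calls"):
        if tool_name := tool_calls[0].get("function", {}).get("name"):
            print(f"Calling {tool_name}...", flush=True)
        if tool_args := tool_calls[0].get("function", {}).get("arguments"):
            print(tool_args, end="", flush=True)

await task  # make sure the executor has fully finished (inside an async context)
```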
03:59:47.420 | Oh, my bad. I just added that. Okay.
03:59:55.420 | You see that? So, it goes very fast. So, it's kinda hard to
03:59:59.200 | see it. I'm gonna slow it down so you can see. So, you can see
04:00:02.800 | that we, as soon as we get the tool name, we stream that
04:00:07.040 | we're calling the add tool. Then, we stream token by token,
04:00:10.560 | the actual arguments for that tool. Then, for the next one,
04:00:13.680 | again, we do the same. We're calling this tool name. Then,
04:00:16.880 | we're streaming token by token again. We're processing
04:00:20.240 | everything downstream from outside of the agent executor
04:00:24.560 | and this is an essential thing to be able to do when we're
04:00:27.920 | actually implementing streaming and async and everything else
04:00:32.480 | in an actual application. So, I know that's a lot but it's
04:00:38.960 | important. So, that is it for our chapter on streaming and
04:00:43.360 | async. I hope it's all been useful. Thanks. Now, we're on
04:00:47.200 | to the final capstone chapter. We're going to be taking
04:00:51.280 | everything that we've learned so far and using it to build an
04:00:56.640 | actual chat application. Now, the chat application is what
04:01:00.400 | you can see right now and we can go into this and ask some
04:01:04.400 | pretty interesting questions, and because it's an agent
04:01:06.960 | that has access to these tools, it will be able to
04:01:09.440 | answer them for us. So, we'll see inside our application that
04:01:12.800 | we can ask questions that require tool use such as this
04:01:17.040 | and because of the streaming that we've implemented, we can
04:01:19.600 | see all this information in real time. So, we can see that
04:01:22.160 | the SerpAPI tool is being used, that these are the queries. We
04:01:25.280 | saw all that was in parallel as well. So, each one of those
04:01:29.200 | tools were being used in parallel. We've modified the
04:01:31.840 | code a little bit to enable that and we see that we have
04:01:36.160 | the answer. We can also see the structured output being used
04:01:39.520 | here. So, we can see our answer followed by the tools used
04:01:43.440 | here and then we could ask follow-up questions as well
04:01:45.920 | because it's conversational. So, say how is the weather in
04:01:51.200 | each of those cities?
04:01:54.960 | Okay, that's pretty cool. So, this is what we're going to be
04:02:04.540 | building. We are, of course, going to be focusing on the
04:02:07.580 | API, the backend. I'm not front-end engineer so I can't
04:02:11.340 | take you through that but the code is there. So, for those of
04:02:14.380 | you that do want to go through the front-end code, you can, of
04:02:17.260 | course, go and do that but we'll be focusing on how we
04:02:20.380 | build the API that powers all of this using, of course,
04:02:24.220 | everything that we've learned so far. So, let's jump into it.
04:02:27.340 | The first thing we're going to want to do is clone this repo.
04:02:30.700 | So, we'll copy this URL. This is the repo, Aurelio Labs
04:02:34.860 | LangChain Course, and you just clone the repo like so. I've
04:02:41.340 | already done this so I'm not going to do it again. Instead,
04:02:44.940 | I'll just navigate to the LangChain course repo. Now,
04:02:49.340 | there's a few setup things that you do need to do. All of
04:02:53.020 | those can be found in the README. So, we just open a new
04:02:57.740 | tab here and I'll open the README. Okay, so this explains
04:03:03.180 | everything we need. We have, if you were running this locally
04:03:06.860 | already, you will have seen this or you will have already
04:03:09.580 | done all this but for those of you that haven't, we'll go
04:03:12.460 | through quickly now. So, you will need to install the uv
04:03:18.140 | library. So, this is how we manage our Python environment,
04:03:22.700 | our packages. We use uv. On Mac, you would install it like
04:03:27.980 | so. If you're on Windows or Linux, just double check how
04:03:32.620 | you would install over here. Once you have installed this,
04:03:36.700 | you would then go to install Python. So, uv Python install.
04:03:42.780 | Then, we want to create our venv, our virtual environment,
04:03:47.580 | using that version of Python. So, uv venv here. Then, as you can
04:03:53.820 | see here, we need to activate that virtual environment which
04:03:57.420 | I did miss from here. So, let me quickly add that. So, you
04:04:02.060 | just run that. For me, I'm using fish. So, I just add
04:04:05.740 | fish onto the end there, but if you're using Bash or Zsh, I
04:04:08.380 | think you can just run that directly. And then,
04:04:11.100 | finally, we need to sync, i.e. install all of our packages
04:04:16.700 | using uv sync. And you see that will install everything for
04:04:20.940 | you. Great. So, we have that and we can go ahead and actually
04:04:26.940 | open Cursor or VS Code and then we should find ourselves
04:04:32.220 | within Cursor or VS Code. So, in here, you'll find a few
04:04:37.740 | things that we will need. So, first is environment variables.
04:04:42.780 | So, we can come over to here, and we have an OpenAI API key, a
04:04:47.100 | LangChain API key, and a SerpAPI API key. Create a copy of
04:04:50.940 | this and you'd make this your .env file or if you want to
04:04:56.780 | run it with source, you can, well, I like to use Mac.env
04:05:01.820 | when I'm on Mac and I just add export onto the start there and
04:05:05.740 | then enter my API keys. Now, I actually already have these in
04:05:10.140 | this local.mac.env file which over in my terminal, I would
04:05:15.420 | just activate with source again like that. Now, we'll need that
04:05:20.540 | when we are running our API and application later but for now,
04:05:24.940 | let's just focus on understanding what the API
04:05:28.380 | actually looks like. So, navigating into the 09 Capstone
04:05:33.340 | chapter, we'll find a few things. What we're going to
04:05:37.020 | focus on is the API here and we have a couple of notebooks
04:05:41.260 | that help us just understand, okay, what are we actually
04:05:44.780 | doing here? So, let me give you a quick overview of the API
04:05:49.260 | first. So, the API, we're using FastAPI for this. We have a
04:05:53.340 | few functions in here. The one that we'll start with is this.
04:05:57.420 | Okay. So, this is our post endpoint for invoke and this
04:06:01.900 | essentially sends something to our LLM and begins a streaming
04:06:05.980 | response. So, we can go ahead and actually start the API and
04:06:09.980 | we can just see what this looks like. So, we'll go into
04:06:13.180 | chapter 09 Capstone API after setting our environment
04:06:18.060 | variables here, and we just want to do uv run uvicorn
04:06:23.260 | main:app --reload. We don't need --reload, but if we're
04:06:26.620 | modifying the code, that can be useful. Okay, and we can see
04:06:29.820 | that our API is now running on localhost port 8000 and
04:06:37.340 | if we go to our browser, we can actually open the docs for our
04:06:41.180 | API. So, we go to 8000 slash docs. Okay, we just see that we
04:06:45.900 | have that single invoke method. It extracts the content and it
04:06:51.420 | gives us a small amount of information there. Now, we
04:06:54.780 | could try it out here. So, if we say, say, hello, we can run
04:07:00.860 | that and we'll see that we get a response. We get this. Okay.
04:07:08.140 | Now, the thing that we're missing here is that this is
04:07:10.380 | actually being streamed back to us. Okay. So, this is not a
04:07:15.340 | just a direct response. This is a stream. To see that, we're
04:07:19.020 | going to navigate over to here to this streaming testing
04:07:21.980 | notebook and we'll run this. So, we are using requests here.
04:07:28.540 | We are not just doing a, you know, the standard post request
04:07:32.940 | because we want to stream the output and then print the
04:07:35.900 | output as we are receiving it. Okay. So, that's why this
04:07:41.100 | looks a little more complicated than just a typical
04:07:43.340 | requests.get. So, what we're doing here is we're
04:07:49.340 | starting our session which is our post request and then we're
04:07:53.580 | just iterating through the content as we receive it from
04:07:57.340 | that request. When we receive a token, right? Because sometimes
04:08:00.940 | this might be none. We print that. Okay and we have that
04:08:04.700 | flush equals true, as we have used in the past. So, let's
04:08:08.780 | define that and then let's just ask a simple question. What is
04:08:12.140 | five plus five?
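A rough sketch of that streaming request, assuming the invoke endpoint takes content as a query parameter:

```python
import requests

# stream the response from the local /invoke endpoint and print tokens as they arrive
with requests.post(
    "http://localhost:8000/invoke",
    params={"content": "What is five plus five?"},  # assumed parameter name
    stream=True,
) as response:
    # chunk_size=None yields data as soon as it arrives, in whatever size it arrives
    for chunk in response.iter_content(chunk_size=None):
        if chunk:
            print(chunk.decode("utf-8"), end="", flush=True)
```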
04:08:15.100 | Okay, and we saw that it was pretty quick. So, it
04:08:19.440 | generated this response first and then it went ahead and
04:08:23.680 | actually continued streaming with all of this. Okay and we
04:08:29.120 | can see that there are these special tokens being
04:08:31.360 | provided. This is to help the front end basically decide,
04:08:36.240 | okay, what should go where? So, here where we're showing these
04:08:41.280 | multiple steps of tool use and the parameters. The way the
04:08:46.160 | front end is deciding how to display those is it's just it's
04:08:50.800 | being provided the single stream but it has these set
04:08:53.600 | tokens. Has a step, has a set name, then it has the
04:08:57.120 | parameters followed by the sort of ending of the set token and
04:09:01.200 | it's looking at each one of these and then the one step
04:09:04.960 | name that it treats differently is where it will see the final
04:09:08.800 | answer step name. When it sees the final answer step name, rather than
04:09:11.840 | displaying this tool use interface, it instead begins
04:09:15.680 | streaming the tokens directly like a typical chat interface
04:09:20.320 | and if we look at what we actually get in our final
04:09:23.120 | answer, it's not just the answer itself, right? So, we
04:09:26.720 | have the answer here. This is streamed into that typical chat
04:09:32.640 | output but then we also have tools used and then this is
04:09:36.240 | added into the little boxes that we have below the chat
04:09:40.800 | here. So, there's quite a lot going on just within this
04:09:44.000 | little stream. Now, we can try with some other questions here.
04:09:48.880 | So, we can say, okay, tell me about the latest news in the
04:09:50.960 | world. You can see that there's a little bit of a wait here
04:09:52.960 | whilst it's waiting to get the response and then, yeah,
04:09:56.160 | it's streaming a lot of stuff quite quickly, okay? So, there's
04:10:00.160 | a lot coming through here, okay? And then we can ask other
04:10:03.840 | questions like, okay, this one here, how cold is it in Oslo
04:10:06.880 | right now? Is five multiplied by five, right? So, these two
04:10:10.800 | are going to be executed in parallel and then it will after
04:10:14.800 | it has the answers for those, the agent will use another
04:10:18.400 | multiply tool to multiply those two values together and all of
04:10:21.920 | that will get streamed, okay? And then, as we saw earlier, we
04:10:26.640 | have the what is the current date and time in these places.
04:10:29.440 | Same thing. So, three questions. There are three
04:10:32.560 | questions here. What is the current date and time in Dubai?
04:10:34.640 | What is the current date and time in Tokyo and what is the
04:10:36.720 | current date and time in Berlin? Those three questions
04:10:40.880 | get executed in parallel against the SerpAPI search tool and
04:10:45.200 | then all answers get returned within that final answer, okay?
04:10:49.520 | So, that is how our API is working. Now, let's dive a
04:10:55.360 | little bit into the code and understand how it is working.
04:11:00.240 | So, there are a lot of important things here. There's
04:11:03.280 | some complexity but at the same time, we try to make this as
04:11:06.160 | simple as possible as well. So, this is just fast API syntax
04:11:10.480 | here with the app post invoke. So, just our invoke endpoint.
04:11:15.040 | We consume some content which is a string and then if you
04:11:22.480 | remember from the agent executor deep dive, which is
04:11:22.480 | what we've implemented here or a modified version of that, we
04:11:27.520 | have to initialize our async IO queue and our streamer which
04:11:32.160 | is the queue callback handler which I believe is exactly the
04:11:35.520 | same as what we defined in that earlier chapter. There's no
04:11:38.800 | differences there. So, we define that and then we return
04:11:43.520 | this streaming response object, right? Again, this is a fast
04:11:46.960 | API thing. This is so that you are streaming a response. That
04:11:50.880 | streaming response has a few attributes here which again are
04:11:55.040 | fast API things or just generic API things. So, some headers
04:12:00.000 | giving instructions to the API and then the media type here
04:12:03.440 | which is text/event-stream. You can also use, I think, text/
04:12:07.360 | plain possibly as well, but I believe the standard here would
04:12:12.000 | be to use event stream and then the more important part for us
04:12:16.400 | is this token generator, okay? So, what is this token
04:12:20.480 | generator? Well, it is this function that we've defined up
04:12:24.080 | here. Now, if you, again, if you remember that earlier
04:12:27.760 | chapter, at the end of the chapter, we set up a for loop
04:12:33.280 | where we're printing out different tokens in various
04:12:36.320 | formats. So, we're kind of post processing them before
04:12:40.320 | deciding how to display them. That's exactly what we're doing
04:12:43.520 | here. So, in this block here, we're looping through every
04:12:50.400 | token that we're receiving from our streamer. We're looping
04:12:54.720 | through and we're just saying, okay, if this is the end of a
04:12:58.240 | step, we're going to yield this end-of-step token, which we
04:13:02.640 | saw here, okay? So, it's this end-of-step token there.
04:13:07.680 | Otherwise, if this is a tool call, so again, we've got that
04:13:11.280 | walrus operator here. So, what we're doing is saying, okay,
04:13:14.720 | get the tool calls out from our current message. If there is
04:13:19.760 | something there. So, if this is not none, we're going to execute
04:13:23.360 | what is inside here and what is being executed inside here is
04:13:27.200 | we're checking for the tool name. If we have the tool name,
04:13:30.160 | we return this, okay? So, we have the start of step token,
04:13:35.040 | the start of the step name token, the tool name or step
04:13:39.680 | name, whichever of those you want to call it, and then the end of
04:13:42.560 | the step name token, okay? And then this, of course, comes
04:13:48.560 | through to the front end like that, okay? That's what we have
04:13:52.320 | there. Otherwise, we should only be seeing the tool name
04:13:55.680 | returned as part of the first token for every step. After that, it
04:13:59.520 | should just be tool arguments. So, in this case, we say, okay,
04:14:03.440 | if we have those tool or function arguments, we're going
04:14:06.480 | to just return them directly. So, then that is the part that
04:14:09.840 | would stream all of this here, okay? Like these would be
04:14:13.600 | individual tokens, right? For example, right? So, we might
04:14:16.800 | have the open curly brackets followed by query could be a
04:14:20.960 | token, the latest could be a token, world could be a token,
04:14:24.640 | news could be a token, etc. Okay? So, that is what is
04:14:28.160 | happening there. This should not get executed, but we
04:14:32.720 | just handle it just in case we have any issues
04:14:36.320 | with tokens being returned there. We're just going to print
04:14:39.040 | this error and we're going to continue with the streaming, but
04:14:43.600 | that should not really be happening. Cool. So, that is
04:14:47.120 | our token streaming loop. Now, the way that we are picking up
04:14:53.920 | tokens from our stream object here is of course through our
04:14:57.840 | agent execution logic which is happening in parallel, okay? So,
04:15:02.000 | all of this is asynchronous. We have this async definition
04:15:04.720 | here. So, all of this is happening asynchronously. So,
04:15:08.640 | what has happened here is, we have created a task, which is
04:15:14.320 | the agent executor invoke, and we're passing our content, we're
04:15:17.840 | passing that streamer which we're going to be pulling tokens
04:15:20.160 | from, and we also set verbose to true. We can actually
04:15:24.160 | remove that but that would just allow us to see additional
04:15:27.600 | output in our terminal window if we want it. I don't think
04:15:32.640 | there's anything particularly interesting to look at in there
04:15:36.400 | but particularly if you are debugging that can be useful.
04:15:40.000 | So, we create our task here but this does not begin the task.
04:15:45.440 | Alright, this is an asyncio create_task, but this does not
04:15:49.840 | begin until we await it down here. So, what is happening
04:15:53.520 | here is essentially this code here is still being run,
04:15:58.880 | like we're in an asynchronous loop here, but then we await
04:16:02.800 | this task. As soon as we await this task, tokens will
04:16:06.320 | start being placed within our queue, which then get picked up
04:16:10.480 | by the streamer object here. So, then this begins receiving
04:16:14.880 | tokens. I know async is always a little bit more confusing
04:16:20.880 | given the strange order of things but that is essentially
04:16:25.040 | what is happening. You can imagine all this is essentially
04:16:27.680 | being executed all at the same time. So, we have that. So,
04:16:32.800 | anything else to go through here? I don't think so. It's
04:16:35.520 | all sort of boilerplate stuff for FastAPI rather than the
04:16:39.040 | actual AI code itself. So, we have that as our streaming
04:16:43.600 | function.
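Pulling those pieces together, a stripped-down sketch of the endpoint might look like this. The step-token strings and the QueueCallbackHandler / agent_executor objects are the assumed versions from the earlier sketches, not necessarily the exact ones in the repo:

```python
import asyncio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_generator(content: str):
    queue = asyncio.Queue()
    streamer = QueueCallbackHandler(queue)
    # schedule the agent executor; it pushes tokens onto the queue as it runs
    task = asyncio.create_task(agent_executor.invoke(content, streamer))
    async for token in streamer:
        if token == "<<STEP_END>>":
            yield "</step>"  # assumed step-end marker for the front end
        elif tool_calls := token.message.additional_kwargs.get("tool_calls"):
            if tool_name := tool_calls[0].get("function", {}).get("name"):
                yield f"<step><step_name>{tool_name}</step_name>"
            if tool_args := tool_calls[0].get("function", {}).get("arguments"):
                yield tool_args
    await task

@app.post("/invoke")
async def invoke(content: str):
    return StreamingResponse(
        token_generator(content),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache"},  # assumed header; the repo sets its own
    )
```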
04:16:48.720 | Now, let's have a look at the agent code itself. Okay. So, where would that be? We're using
04:16:52.400 | this agent execute invoke and we're importing this from the
04:16:56.720 | agent file. So, we can have a look in here for this. Now, you
04:17:01.840 | can see straight away, we're pulling in our API keys here.
04:17:06.000 | Just, yeah, make sure that you do have those. Now, the rest of
04:17:10.000 | our code, okay? This is what we've seen before in that agent
04:17:14.800 | executor deep dive chapter. This is all practically the
04:17:19.280 | same. So, we have our LLM. We've set those configurable fields
04:17:25.280 | as we did in the earlier chapters. That configurable
04:17:28.240 | field is for our callbacks. We have our prompt. This has been
04:17:31.760 | modified a little bit. So, essentially, just telling it,
04:17:36.080 | okay, make sure you use the tools provided. We say you must
04:17:40.480 | use the final answer tool to provide a final answer to the user and
04:17:43.680 | one thing that I added because of something I noticed every now and again. So,
04:17:47.360 | I have explicitly said, use tools to answer the user's
04:17:50.400 | current question, not previous questions. So, I found with
04:17:54.800 | this setup, it will occasionally, if I just have a
04:17:58.720 | little bit of small talk with the agent and beforehand I was
04:18:02.080 | asking questions about, okay, like what was the weather in
04:18:04.720 | this place or that place, the agent will kind of hang on to
04:18:08.000 | those previous questions and try and use a tool again to
04:18:11.600 | answer and that is just something that you can more or
04:18:14.240 | less prompt out of it, okay? So, we have that. This is all
04:18:18.400 | exactly the same as before, okay? So, we have our chat
04:18:21.200 | history to make this conversational. We have our
04:18:23.920 | human message and then our agent scratch pad so that our
04:18:27.040 | agent can think through multiple tool use messages.
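Roughly, that prompt looks something like this (the exact system wording in the course code will differ a little):

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You are a helpful assistant. Use the tools provided to answer the "
        "user's CURRENT question, not previous questions. You MUST use the "
        "final_answer tool to provide a final answer to the user."
    )),
    MessagesPlaceholder(variable_name="chat_history"),      # keeps it conversational
    ("human", "{input}"),                                    # the user's current question
    MessagesPlaceholder(variable_name="agent_scratchpad"),   # room for tool-use steps
])
```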
04:18:30.960 | Great. So, we also have the article class. So, this is to
04:18:36.080 | process results from SERP API. We have our SERP API function
04:18:42.160 | here. I will talk about that a little more in a moment
04:18:45.040 | because this is also a little bit different to what we
04:18:46.800 | covered before. What we covered before with SERP API, if you
04:18:51.200 | remember, was synchronous because we're using the SERP
04:18:55.040 | API client directly or the SERP API tool directly from
04:18:59.840 | LangChain and because we want everything to be asynchronous,
04:19:03.920 | we have had to recreate that tool in a asynchronous fashion
04:19:09.600 | which we'll talk about a little bit later. But for now, let's
04:19:13.360 | move on from that. We can see our final answer being used
04:19:18.000 | here. So, I think we defined the exact same thing
04:19:21.920 | before probably in that deep dive chapter again where we
04:19:25.040 | have just the answer and the tools that have been used.
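As a reminder, that final answer tool is essentially just a structured pass-through; a sketch of it, assuming the same two fields:

```python
from langchain_core.tools import tool

@tool
async def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Use this to provide your final answer to the user."""
    # nothing to compute; the value is in forcing the LLM to produce this structure
    return {"answer": answer, "tools_used": tools_used}
```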
04:19:29.200 | Great. So, we have that. One thing that is a little
04:19:32.640 | different here is when we are defining our name to tool
04:19:38.480 | function. So, this takes a tool name and it maps it to a tool
04:19:43.680 | function. When we have synchronous tools, we actually
04:19:48.800 | use tool.func here. Okay. So, rather than tool.coroutine, it
04:19:53.440 | would be tool.func. However, we are using asynchronous tools
04:19:59.200 | and so this is actually tool.coroutine and this is why
04:20:04.960 | if you come up here, I've made every single tool
04:20:08.320 | asynchronous. Now, that is not really necessary for a tool
04:20:13.360 | like final answer because there's no API calls
04:20:16.560 | happening. An API call is a very typical scenario where
04:20:20.400 | you do want to use async because if you make an API call
04:20:23.840 | with a synchronous function, your code is just going to be
04:20:26.800 | waiting for the response from the API while the API is
04:20:31.440 | processing and doing whatever it's doing. So, that is an
04:20:36.080 | ideal scenario where you would want to use async because
04:20:38.960 | rather than your code just waiting for the response from
04:20:42.880 | the API, it can instead go and do something else whilst it's
04:20:46.320 | waiting, right? So, that's an ideal scenario where you'd use
04:20:49.360 | async which is why we would use it for example with the
04:20:51.760 | SERP API tool here but for final answer and for all of
04:20:56.320 | these calculator tools that we've built, there's actually
04:21:00.720 | no need to have these as async because our code is just
04:21:05.920 | running through. It's executing this code. There's no waiting
04:21:09.280 | involved. So, it doesn't necessarily make sense to have
04:21:12.080 | these asynchronous. However, by making them asynchronous, it
04:21:16.160 | means that I can do tool coroutine for all of them
04:21:19.440 | rather than saying, oh, if this tool is synchronous, use
04:21:23.520 | tool.func whereas if this one is async, use tool.coroutine.
04:21:28.000 | So, it just simplifies the code for us a lot more but yeah, not
04:21:33.040 | directly necessary but it does help us write cleaner code
04:21:36.800 | here. This is also true later on because we actually have to
04:21:41.280 | await our tool calls which we can see over here, right? So,
04:21:46.880 | we have to await those tool calls. That would get messier
04:21:50.960 | if we were using a mix of some sync tools and some async tools.
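So the mapping itself stays trivial; a sketch, assuming `tools` is our list of (all async) tool objects:

```python
# every tool is async, so we can map each name straight to its .coroutine
name2tool = {t.name: t.coroutine for t in tools}

# with a mix of sync and async tools we would need something messier, e.g.:
# name2tool = {t.name: (t.coroutine or t.func) for t in tools}
```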
04:21:56.880 | So, we have that. We have our queue callback handler. This is
04:22:00.320 | again the same as before, so I'm not going to go
04:22:03.520 | through that again. We covered that
04:22:06.080 | in the earlier deep dive chapter. We have our execute
04:22:09.600 | tool function here. Again, that is asynchronous. This just
04:22:13.120 | helps us, you know, clean up the code a little bit.
04:22:16.640 | I think in the deep dive chapter, we had this placed directly
04:22:20.000 | within our agent executor function and you can do that.
04:22:23.840 | It's fine. It's just a bit cleaner to kind of pull this
04:22:26.880 | out and we can also add more type annotations here which I
04:22:30.480 | like. So, execute tool expects us to provide an AI message
04:22:34.400 | which includes a tool call within it and it will return us
04:22:38.640 | a tool message. Okay. Agent executor, this is all the same
04:22:44.480 | as before and we're actually not even using verbose here so
04:22:48.240 | we could fully remove it but I will leave it. Of course, if
04:22:51.040 | you would like to use that, you can just add an if verbose check and
04:22:54.400 | then log or print some stuff where you need it. Okay. So,
04:22:59.760 | what do we have in here? We have our streaming function. So,
04:23:02.720 | this is what actually calls our agent, right? So, we have a
04:23:08.800 | query. This will call our agent just here and we could even
04:23:14.080 | make this a little clearer. So, for example, this could be
04:23:17.200 | configured agent because this is not the response.
04:23:22.320 | This is a configured agent. So, I think this is maybe a little
04:23:25.360 | clearer. So, we are configuring our agent with our callbacks,
04:23:29.520 | okay? Which is just our streamer. Then we're iterating
04:23:32.880 | through the tokens that are returned by our agent using astream
04:23:37.040 | here. Okay? And as we are iterating through this because
04:23:41.920 | we pass our streamer to the callbacks here, what that is
04:23:46.400 | going to do is every single token that our agent returns is
04:23:52.320 | gonna get processed through our queue callback handler here.
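For reference, a simplified sketch of what that handler does; the marker strings are assumptions, and the course version handles the end-of-run signal a little differently:

```python
import asyncio
from langchain_core.callbacks import AsyncCallbackHandler

class QueueCallbackHandler(AsyncCallbackHandler):
    """Pushes streamed tokens onto an asyncio.Queue so that another
    coroutine (the API's token generator) can pull them back out."""

    def __init__(self, queue: asyncio.Queue):
        self.queue = queue

    async def __aiter__(self):
        # replay whatever the callbacks below put onto the queue
        while True:
            token = await self.queue.get()
            if token == "<<DONE>>":  # hypothetical end-of-run marker
                return
            yield token

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        # keep the whole chunk where possible, since it carries tool call metadata
        self.queue.put_nowait(kwargs.get("chunk") or token)

    async def on_llm_end(self, *args, **kwargs) -> None:
        # one LLM call finished, i.e. one agent step is done
        self.queue.put_nowait("<<STEP_END>>")
```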
04:23:57.280 | Okay? So, these on_llm_new_token and on_llm_end methods are going to get
04:24:03.360 | executed and then all of those tokens, you can see here, are
04:24:07.360 | passed to our queue. Okay? Then, we come up here and we
04:24:11.040 | have this __aiter__ method. This __aiter__ method here is used by
04:24:16.000 | that token generator over in our API
04:24:22.660 | to pick up from the queue the tokens that have been put in
04:24:28.420 | the queue by these other methods here. Okay? So, it's
04:24:32.260 | putting tokens into the queue and pulling them out with this.
04:24:38.020 | Okay? So, that is just happening in parallel as well as
04:24:41.460 | this code is running here. Now, the reason that we extract the
04:24:45.380 | tokens out here is that we want to pull out our tokens and we
04:24:49.460 | append them all to our outputs. Now, those outputs that becomes
04:24:53.780 | a list of AI messages which are essentially the AI telling us
04:24:58.660 | what tool to use and what parameters to pass to each one
04:25:02.580 | of those tools. This is very similar to what we covered in
04:25:06.180 | that deep dive chapter but the one thing that I have modified
04:25:09.380 | here is I've enabled us to use parallel tool calls. So, that
04:25:17.460 | is what we see here with these four lines of code. We're
04:25:21.060 | saying, okay, if our tool call includes an ID, that means we
04:25:24.660 | have a new tool call or a new AI message. So, what we do is
04:25:29.940 | we append that AI message which is the AI message chunk to our
04:25:35.060 | outputs and then following that, if we don't get an ID,
04:25:38.180 | that means we're getting the tool arguments. So, following
04:25:41.780 | that, we're just adding our AI message chunk to the most
04:25:46.420 | recent AI message chunk from our outputs. Okay, so what that
04:25:50.260 | will do is it will create that list of AI messages. It'll be
04:25:56.500 | like, you know, AI message one and then this will just append
04:26:01.780 | everything to that AI message one. Then, we'll get our next
04:26:05.700 | AI message chunk. This will then just append everything to
04:26:09.220 | that until we get a complete AI message and so on and so on.
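A rough sketch of that accumulation logic, assuming the agent streams AIMessageChunk objects and that `configured_agent` and `chat_history` exist as described:

```python
from langchain_core.messages import AIMessage, AIMessageChunk

async def stream(query: str) -> list[AIMessage]:
    outputs: list[AIMessageChunk] = []
    async for chunk in configured_agent.astream({
        "input": query,
        "chat_history": chat_history,
    }):
        if chunk.tool_call_chunks and chunk.tool_call_chunks[0].get("id"):
            # an id means a brand new tool call is starting
            outputs.append(chunk)
        elif outputs:
            # no id means these tokens are arguments for the latest tool call
            outputs[-1] += chunk
    # turn the accumulated chunks into complete AIMessage objects
    return [AIMessage(content=c.content, tool_calls=c.tool_calls) for c in outputs]
```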
04:26:13.780 | Okay. So, at this point, we've collected all of
04:26:19.780 | our AI message chunk objects. Then, finally, what we do is
04:26:23.460 | just transform all those AI message chunk objects into
04:26:26.580 | actual AI message objects and then return them from our
04:26:29.700 | function which we then receive over here. So, into the tool
04:26:33.780 | calls variable. Okay. Now, this is very similar to the deep
04:26:38.980 | dive chapter. Again, we're going through that count, that
04:26:42.660 | loop where we have a max iterations at which point we
04:26:45.300 | will just stop but until then, we continue iterating through
04:26:50.660 | and making more tool calls, executing those tool calls, and
04:26:53.700 | so on. So, what is going on here? Let's see. So, we got our
04:26:58.580 | tool calls. This is going to be a list of AI message objects.
04:27:02.660 | Then, what we do with those AI message objects is we pass them
04:27:07.060 | to this execute tool function. If you remember, what is that?
04:27:10.500 | That is this function here. So, we pass each AI message
04:27:15.140 | individually to this function and that will execute the tool
04:27:20.260 | for us and then return us that observation from the tool.
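That helper is small; a sketch of it, assuming the name2tool mapping from earlier and one tool call per message:

```python
from langchain_core.messages import AIMessage, ToolMessage

async def execute_tool(tool_call: AIMessage) -> ToolMessage:
    call = tool_call.tool_calls[0]  # each AIMessage carries one tool call here
    # look up the async tool and await it with the generated arguments
    observation = await name2tool[call["name"]](**call["args"])
    return ToolMessage(content=f"{observation}", tool_call_id=call["id"])
```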
04:27:25.620 | Okay. So, that is what you see happening here but this is an
04:27:30.660 | async method. So, typically, what you'd have to do is you'd
04:27:34.100 | have to do await execute tool and we could do that. So, we
04:27:38.420 | could do a, okay, let me make this a little bigger for us.
04:27:42.660 | Okay. And so, what we could do, for example, which might be a
04:27:45.700 | bit clearer is you could do tool obs equals an empty list
04:27:51.220 | and then say: for each tool call in
04:27:56.180 | tool calls, we're going to
04:28:00.980 | append execute tool of that tool call, which would have to be awaited. So,
04:28:06.100 | we'd actually put the await in there and what this would do is
04:28:09.460 | actually the exact same thing as what we're doing here. The
04:28:12.740 | difference being that we're doing this tool by tool. Okay.
04:28:17.540 | So, we're executing async here but we're doing the calls
04:28:22.340 | sequentially, whereas what we can do, which is better, is we
04:28:25.780 | can use asyncio.gather. So, what this does is gather all those
04:28:30.260 | coroutines and then we await them all at the same time to
04:28:34.180 | run them all concurrently. They all begin at more or less
04:28:37.780 | the same time and we get those responses back together;
04:28:42.500 | of course it's async, so it's not truly parallel,
04:28:46.260 | but practically it behaves as if it were.
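Side by side, the two approaches look roughly like this:

```python
import asyncio

# sequential: each tool finishes before the next one even starts
# tool_obs = []
# for tool_call in tool_calls:
#     tool_obs.append(await execute_tool(tool_call))

# concurrent: gather the coroutines and await them together
tool_obs = await asyncio.gather(*(execute_tool(tc) for tc in tool_calls))
```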
04:28:50.260 | Cool. So, we have that and then, okay, we get all of our tool
04:28:54.900 | observations from that. So, that's all of our tool messages
04:28:57.620 | and then one interesting thing here is if we,
04:29:01.700 | let's say we have all of our AI messages with all of our tool
04:29:04.980 | calls and we just append all of those to our agent scratchpad.
04:29:09.460 | Alright. So, let's say here we just do
04:29:11.860 | agent scratchpad extend with
04:29:17.700 | our tool calls and then we do agent scratchpad
04:29:22.820 | extend with our tool obs. Alright. So, what is happening here is this
04:29:27.780 | would essentially give us something that looks like this.
04:29:33.700 | So, we'd have our AI message, say, I'm just gonna put, okay,
04:29:38.660 | we'll just put tool call IDs in here to simplify it a little
04:29:41.380 | bit. This would be tool call ID A. Then, we would have AI
04:29:46.900 | message, tool call ID B. Then, we'd have a tool message with tool call ID A. Let's
04:29:54.740 | just remove this content field, I don't want that. And a tool
04:29:59.140 | message with tool call ID B, right? So, it would look something
04:30:02.660 | like this. So, in this order, each tool message is not following
04:30:07.140 | its AI message. You would think, okay, we have the tool
04:30:10.420 | call IDs, so that's probably fine, but actually, when we're
04:30:12.980 | running this, if you add these to the agent scratchpad in this
04:30:16.340 | order, what you'll see is your response just hangs.
04:30:21.300 | Nothing happens when you come through to your second
04:30:25.860 | iteration of your agent call. So, actually, what you need to
04:30:29.620 | do is sort these so that they are actually in
04:30:33.060 | order, and it doesn't necessarily matter
04:30:36.740 | which order in terms of like A or B or C or whatever you use.
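The zip-based fix we walk through in a moment looks roughly like this (the tool call attribute access is an assumption about the message structure):

```python
# map each tool_call_id to the ToolMessage it produced
id2tool_obs = {
    tc.tool_calls[0]["id"]: obs
    for tc, obs in zip(tool_calls, tool_obs)
}

# extend the scratchpad so every AIMessage is immediately followed by
# the ToolMessage sharing its tool_call_id
for tc in tool_calls:
    agent_scratchpad.extend([tc, id2tool_obs[tc.tool_calls[0]["id"]]])
```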
04:30:40.500 | So, you could have this order. We have AI message, tool
04:30:43.460 | message, AI message, tool message, just as long as the
04:30:46.180 | messages sharing a tool call ID are together, or you could, you
04:30:49.620 | know, invert this for example, right? So, you could have this,
04:30:54.580 | right? And that will work as well. It's essentially just as
04:30:58.180 | long as you have your AI message followed by your tool
04:31:01.140 | message and both of those are sharing that tool call ID. You
04:31:04.260 | need to make sure you have that order, okay? So, that of course
04:31:09.140 | would not happen if we do this and instead, what we need to do
04:31:13.700 | is something like this, okay? So, if I make this a little
04:31:18.580 | easier to read, okay? So, we're taking the tool call ID. We are
04:31:23.780 | pointing it to the tool observation and we're doing
04:31:26.500 | that for every tool call and tool observation within like a
04:31:29.860 | zip of those, okay? Then, what we're saying is for each tool
04:31:35.060 | call within our tool calls, we are extending our agent
04:31:38.820 | scratchpad with that tool call followed by the tool
04:31:43.300 | observation message which is the tool message. So, this would
04:31:46.420 | be our AI message and those are the tool messages
04:31:51.860 | down there, okay? So, that is always happening and that is
04:31:54.900 | how we get this correct order which will run. Otherwise,
04:31:59.620 | things will not run. So, that's important to be aware of,
04:32:04.020 | okay? Now, we're almost done. I know we've just been
04:32:07.220 | through quite a lot. So, we continue, we increment our
04:32:10.820 | count as we were doing before and then we need to check for
04:32:13.300 | the final answer tool, okay? And because we're running these
04:32:16.260 | tools in parallel, okay? Because we're allowing multiple
04:32:19.460 | tool calls in one step, we can't just look at the most
04:32:23.300 | recent tool call and check whether it has the name final answer.
04:32:26.260 | Instead, we need to iterate through all of our tool calls
04:32:28.740 | and check if any of them have the name final answer. If they
04:32:32.020 | do, we say, okay, we extract that final answer call. We
04:32:35.620 | extract the final answer as well. So, this is the direct
04:32:38.660 | text content and we say, okay, we have found the final answer.
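That check might look roughly like this, with hypothetical variable names:

```python
found_final_answer = False
for tc in tool_calls:
    if tc.tool_calls[0]["name"] == "final_answer":
        final_answer_call = tc.tool_calls[0]                # full call: answer + tools_used
        final_answer = final_answer_call["args"]["answer"]  # just the text answer
        found_final_answer = True
        break
```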
04:32:42.900 | So, this will be set to true, okay? Which should happen
04:32:45.940 | every time but let's say if our agent gets stuck in a loop of
04:32:50.660 | calling multiple tools, this might not happen before we
04:32:55.300 | break based on the max iterations here. So, we might
04:33:02.340 | end up breaking based on max iterations rather than because we found
04:33:07.460 | a final answer, okay? So, that can happen. So, anyway, if we
04:33:07.460 | find that final answer, we break out of this for loop here
04:33:11.220 | and then, of course, we do need to break out of our while loop
04:33:14.420 | which is here. So, we say, if we found the final answer,
04:33:17.380 | break, okay? Cool. So, we have that. Finally, after all of
04:33:24.100 | that. So, that is how our agent executes its tools and works
04:33:26.900 | through its steps and iterations; we've been through
04:33:32.980 | those. Finally, we come down to here where we say, okay, we're
04:33:37.220 | gonna add that final output to our chat history. So, this is
04:33:40.980 | just going to be the text content, right? So, this here
04:33:45.140 | gets the direct answer, but then, what we do is we return the
04:33:50.180 | full final answer call. The full final answer call is
04:33:52.740 | basically this here, right? So, this answer and tools used but
04:33:57.220 | of course, populated. So, we're saying here that if we have a
04:34:00.820 | final answer, okay? If we have that, we're going to return the
04:34:05.620 | final answer call which was generated by our LLM.
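So the return at the end is roughly:

```python
# return the structured final answer if we found one, otherwise a fallback
return final_answer_call if found_final_answer else {
    "answer": "No answer found",
    "tools_used": [],
}
```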
04:34:09.300 | Otherwise, we're gonna return this one. So, this is in the
04:34:12.340 | scenario that maybe the agent got caught in a loop and just
04:34:15.540 | kept iterating. If that happens, we'll say it will come
04:34:19.220 | back with, okay, no answer found, and it will just return
04:34:22.100 | that we didn't use any tools, which is not technically true,
04:34:25.620 | but this is like an exception handling event. So,
04:34:30.020 | it ideally shouldn't happen, and it's not really a big deal if
04:34:34.660 | we're saying, okay, there were no tools used, in my opinion
04:34:37.620 | anyway. Cool. So, we have all of that and yeah, we just
04:34:44.340 | initialize our agent executor and then, I mean, that is our
04:34:48.900 | agent execution code. The one last thing we wanna go through
04:34:52.020 | is the SERP API tool which we will do in a moment. Okay. So,
04:34:57.300 | SERP API. Let's see how we build our SERP API
04:35:04.260 | tool. Okay, so, we'll start with the synchronous SERP API.
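A rough sketch of that synchronous version (the key and variable names here are assumptions; the notebook code may differ slightly):

```python
from serpapi import GoogleSearch

params = {
    "api_key": SERPAPI_API_KEY,  # assumes the key has already been loaded
    "engine": "google",
    "q": "latest news in the world",
}
results = GoogleSearch(params).get_dict()["organic_results"]
```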
04:35:10.900 | Now, the reason we're starting with this is that
04:35:13.700 | it's just a bit simpler. So, I'll show you this quickly
04:35:16.500 | before we move on to the async implementation which is what
04:35:19.300 | we're using within our app. So, we want to get our SERP API
04:35:23.700 | API key. So, I'll run that and we just enter it at the top
04:35:28.260 | there. And this will run. So, we're going to use the SERP
04:35:34.500 | API SDK first. We're importing Google search and these are the
04:35:38.340 | input parameters. So, we have our API key. We
04:35:41.220 | say we want to use Google, and our question is set with q. So,
04:35:45.220 | q for query. We're searching for the latest news in the
04:35:48.340 | world and we'll return quite a lot of stuff. You can see
04:35:52.580 | there's a ton of stuff in there, right? Now, what we want
04:35:58.900 | is contained within this organic results key. So, we can
04:36:02.180 | run that and we'll see, okay, it's talking about, you know,
04:36:06.500 | various things. Pretty recent stuff at the moment. So, we can
04:36:10.340 | tell, okay, that is, that is in fact working. Now, this is
04:36:14.340 | quite messy. So, what I would like to do first is just clean
04:36:17.780 | that up a little bit. So, we define this article base model
04:36:21.620 | which is Pydantic and we're saying, okay, from a set of
04:36:25.780 | results. Okay. So, we're going to iterate through each of
04:36:28.420 | these. We're going to extract the title, source, link, and the
04:36:33.620 | snippet. So, you can see title, source, link, and snippet here.
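A sketch of that Pydantic model and its class method (the method name here is a stand-in for whatever the course code calls it):

```python
from pydantic import BaseModel

class Article(BaseModel):
    title: str
    source: str
    link: str
    snippet: str

    @classmethod
    def from_serpapi_result(cls, result: dict) -> "Article":
        # each entry in "organic_results" carries these fields
        return cls(
            title=result["title"],
            source=result["source"],
            link=result["link"],
            snippet=result["snippet"],
        )

articles = [Article.from_serpapi_result(r) for r in results]
```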
04:36:42.340 | Okay. So, that's all useful. We'll run that and what we do
04:36:46.740 | is we go through each of the results in organic results and
04:36:51.220 | we just load them into our article using this class method
04:36:54.020 | here and then we can see, okay, let's have a look at what those
04:36:58.740 | look like. It's much nicer. Okay, we get this nicely
04:37:04.260 | formatted object here. Cool. That's great. Now, all of this,
04:37:10.340 | what we just did here. So, this is using SERP API's SDK which is
04:37:14.660 | great. Super easy to use. The problem is that they don't
04:37:17.700 | offer an async SDK, which is a shame, but it's not that hard
04:37:22.820 | for us to set up ourselves. So, typically, with asynchronous
04:37:28.260 | requests, what we can use is the aiohttp library. Well,
04:37:34.900 | you can see what we're doing here. So, this is equivalent to
04:37:39.220 | requests.get. Okay. That's essentially what we're doing
04:37:44.580 | here and the equivalent is literally this. Okay. So, this
04:37:49.860 | is the equivalent using requests that we are running
04:37:58.820 | here but we're using async code. So, we're using aiohttp
04:38:03.540 | ClientSession and then session.get. Okay. With this
04:38:06.340 | async with here and then we just await our response. So,
04:38:06.340 | this is all, yeah, this is what we do rather than this to make
04:38:10.980 | our code async. So, it's really simple and then the output that
04:38:14.980 | we get is exactly the same, right? So, we still get this
04:38:17.860 | exact same output. So, that means, of course, that we can
04:38:21.300 | use that articles method like this in the exact same way and
04:38:26.660 | we get the same result. There's no need to make this
04:38:30.420 | article from SERP API results method async because, again,
04:38:35.700 | this bit of code here is fully local. It's just our Python
04:38:39.540 | running everything. So, this does not need to be async. Okay
04:38:44.820 | and we can see that we get literally the exact same result
04:38:48.580 | there. So, with that, we have everything that we would need
04:38:52.420 | to build a fully asynchronous SERP API tool which is exactly
04:38:56.340 | what we do here for LangChain. So, we import those tools and, I
04:39:00.580 | mean, is there anything different here? No.
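Put together, the async tool looks roughly like this; the endpoint URL and parameter names are assumptions based on SerpAPI's docs rather than a copy of the course code:

```python
import aiohttp
from langchain_core.tools import tool

@tool
async def serpapi(query: str) -> list[Article]:
    """Search the web with SerpAPI and return cleaned-up articles."""
    params = {"api_key": SERPAPI_API_KEY, "engine": "google", "q": query}
    async with aiohttp.ClientSession() as session:
        # the non-blocking equivalent of requests.get
        async with session.get(
            "https://serpapi.com/search.json", params=params
        ) as resp:
            results = await resp.json()
    return [Article.from_serpapi_result(r) for r in results["organic_results"]]
```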
04:39:03.380 | Alright, this is exactly what we just did but I will run
04:39:06.420 | this because I would like to show you very quickly this.
04:39:11.220 | Okay. So, this is how we were initially calling our tools in
04:39:15.860 | previous chapters because we were okay mostly with using the
04:39:19.860 | synchronous tools. However, you can see that the func here
04:39:26.100 | is just empty. Alright, so if I do type, it's just a NoneType.
04:39:30.660 | That is because well, this is an async function, okay? It's an
04:39:37.220 | async tool. Sorry. So, it was defined with async here and
04:39:41.860 | what happens when you do that is you get this coroutine object.
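You can see that for yourself with a quick check like this:

```python
print(type(serpapi.func))       # <class 'NoneType'> for an async-only tool
print(type(serpapi.coroutine))  # <class 'function'> (the async def underneath)
```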
04:39:47.460 | So, rather than func, which isn't set here, you get that
04:39:52.260 | coroutine. If we then modify this, okay,
04:39:57.300 | let's just remove all the asyncs here and the await. If we
04:40:03.540 | modify that like so and then we look at the SERP API
04:40:07.860 | structured tool, we go across, we see that we now get that
04:40:12.020 | func, okay? So, that is just the difference between a
04:40:15.940 | synchronous structured tool, which gives you func, versus an
04:40:19.620 | asynchronous one, which gives you coroutine. If we switch it back to async, now we have coroutine again. So, it's
04:40:26.660 | important to be aware of that and, of course, we run using
04:40:33.300 | the SERP API coroutine. So, that is how we build the
04:40:38.660 | SERP API tool and there's nothing more to it. I mean, that is
04:40:42.740 | exactly what we did here. So, I don't think we
04:40:45.380 | need to go through that any further. So, yeah, I think that
04:40:49.780 | is basically all of our code behind this API. With all of
04:40:54.340 | that, we can then go ahead. So, we have our API running
04:40:57.780 | already. Let's go ahead and actually run also our front
04:41:02.340 | end. So, we're gonna go to Documents, Aurelio, LangChain
04:41:06.340 | course and then we want to go to chapters zero nine capstone
04:41:12.100 | app and you will need to have NPM installed. So, to do that,
04:41:16.420 | what do we do? We can take a look at this answer for
04:41:19.460 | example. This is probably what I would recommend, okay? So, I
04:41:23.060 | would run brew install node followed by brew install npm
04:41:26.900 | if you're on Mac; of course, it's different if you're on
04:41:28.740 | Linux or Windows. Once you have those, you can do npm install
04:41:37.460 | and this will just install all of the node packages that we
04:41:41.780 | need and then we can just run npm run dev, okay? And now, we
04:41:48.260 | have our app running on localhost:3000. So, we can come over to
04:41:52.820 | here, open that up and we have our application. You can
04:41:57.140 | ignore this. So, in here, we can begin just asking
04:42:00.500 | questions, okay? So, we can start with a quick question.
04:42:04.020 | What is five plus five?
04:42:07.380 | Okay. So, we have our streaming happening here. It said the
04:42:12.200 | agent wants to use the add tool and these are the input
04:42:14.760 | parameters to the add tool and then we get the streamed
04:42:17.880 | response. So, this is the final answer tool where we're
04:42:21.800 | outputting that answer key and value and then here, we're
04:42:25.240 | outputting that tools used key and value, which is just an
04:42:29.000 | array of the tools being used, which here is just the add function. So,
04:42:32.840 | we have that. Then, let's ask another question. This time,
04:42:36.520 | we'll trigger SERP API with tell me about the latest news
04:42:39.880 | in the world. Okay. So, we can see that's using SERP API and
04:42:46.040 | the query is latest world news and then it comes down here
04:42:51.560 | and we actually get some citations here which is kind of
04:42:53.800 | cool. So, you can also come through to here, okay? And it
04:42:58.040 | takes us through to here. So, that's pretty cool.
04:43:01.080 | Unfortunately, I just lost my chat. So, fine. Let me
04:43:07.080 | ask that question again.
04:43:10.040 | Okay. We can see that tools used shows SERP API there. Now, let's
04:43:19.360 | continue with the next question from our notebook which is how
04:43:23.840 | cold is it right now? What is five multiplied by five and
04:43:27.440 | what do you get when multiplying those two numbers
04:43:29.760 | together? I'm just gonna modify that to say in Celsius so that
04:43:35.760 | I can understand. Thank you. Okay. So, for this one, we can
04:43:38.640 | see what did we get? So, we got current temperature in Oslo. We
04:43:42.800 | got multiply five by five which is our second question and then
04:43:47.200 | we also got subtract. Interesting, at first I didn't know
04:43:52.320 | why it did that. It seemed kind of weird. So, why did it decide to use subtract?
04:43:56.880 | Oh, okay. So, then here it was. Okay, that
04:44:03.520 | kind of makes sense. Does that make sense? Roughly. Okay. So,
04:44:07.440 | I think the conversion from Fahrenheit to Celsius is, say,
04:44:12.080 | subtract thirty-two. Okay. Yes. So, to go from Fahrenheit to
04:44:18.000 | Celsius, you are basically doing Fahrenheit minus
04:44:22.720 | thirty-two and then you're multiplying by this number
04:44:24.880 | here, which I assume the AI did not do exactly; it did it roughly. Okay.
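For reference, the conversion works out like this, assuming a reading of roughly 36 Fahrenheit:

```python
f = 36                 # roughly the Fahrenheit reading returned for Oslo
c = (f - 32) * 5 / 9   # standard Fahrenheit-to-Celsius conversion
print(round(c, 1))     # -> 2.2, i.e. roughly two degrees Celsius
```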
04:44:30.960 | So, subtracting thirty-two from roughly thirty-six would have given us
04:44:33.520 | four and it gave us approximately two. So, if you
04:44:36.800 | think, okay, multiplying by this is practically multiplying by
04:44:40.400 | 0.5, so halving the value, and that would give us roughly two
04:44:45.120 | degrees. So, that's what this was doing here. Kind of
04:44:48.560 | interesting. Okay, cool. So, we've gone through. We have
04:44:53.520 | seen how to build a fully fledged chat application using
04:44:59.280 | what we've learned throughout the course and we've built
04:45:02.400 | quite a lot. If you think about this application, you're
04:45:06.160 | getting the real-time updates on what tools are being used,
04:45:10.160 | the parameters being input to those tools, and then that is
04:45:12.640 | all being returned in a streamed output and even in a
04:45:17.440 | structured output for your final answer including the
04:45:19.760 | answer and the tools that we use. So, of course, you know
04:45:23.920 | what we built here is fairly limited but it's super easy to
04:45:27.920 | extend this like you could maybe something that you might
04:45:31.360 | want to go and do is take what we've built here and like fork
04:45:35.360 | this application and just go and add different tools to it
04:45:38.160 | and see what happens because this is very extensible. You
04:45:42.000 | can do a lot with it but yeah, that is the end of the course.
04:45:46.400 | Of course, this is just the beginning of whatever it is
04:45:50.800 | you're wanting to learn or build with AI. Treat this as
04:45:55.200 | the beginning and just go out and find all the other cool
04:45:59.040 | interesting stuff that you can go and build. So, I hope this
04:46:03.120 | course has been useful, informative, and gives you an
04:46:08.960 | advantage in whatever it is you're going out to build. So,
04:46:12.800 | thank you very much for watching and taking the course
04:46:15.680 | and sticking through right to the end. I know it's pretty
04:46:18.720 | long so I appreciate it a lot and I hope you get a lot out of
04:46:23.760 | it. Thanks. Bye.
04:46:42.720 | (gentle music)