LangChain Mastery in 2025 | Full 5 Hour Course

Chapters
0:00 Course Introduction
4:24 CH1 When to Use LangChain
13:28 CH2 Getting Started
14:14 Local Course Setup (Optional)
17:00 Colab Setup
18:11 Initializing our OpenAI LLMs
22:34 LLM Prompting
28:48 Creating an LLM Chain with LCEL
33:59 Another Text Generation Pipeline
37:11 Structured Outputs in LangChain
41:56 Image Generation in LangChain
46:59 CH3 LangSmith
49:36 LangSmith Tracing
55:45 CH4 Prompts
67:21 Using our LLM with Templates
72:39 Few-shot Prompting
78:56 Chain of Thought Prompting
85:25 CH5 LangChain Chat Memory
89:51 ConversationBufferMemory
98:39 ConversationBufferWindowMemory
107:57 ConversationSummaryMemory
117:33 ConversationSummaryBufferMemory
129:29 CH6 LangChain Agents Intro
136:34 Creating an Agent
140:56 Agent Executor
147:30 Web Search Agent
150:41 CH7 Agent Deep Dive
160:08 Creating an Agent with LCEL
176:40 Building a Custom Agent Executor
185:19 CH8 LCEL
189:14 LCEL Pipe Operator
193:28 LangChain RunnableLambda
198:00 LangChain Runnable Parallel and Passthrough
203:13 CH9 Streaming
209:22 Basic LangChain Streaming
213:29 Streaming with Agents
231:26 Custom Agent and Streaming
240:46 CH10 Capstone
245:25 API Build
252:14 API Token Generator
256:44 Agent Executor in API
274:50 Async SerpAPI Tool
280:53 Running the App
284:49 Course Completion!
Welcome to the AI engineer's guide to LangChain. This is a full course that will take you from the assumption that you know nothing about LangChain to being able to use the framework proficiently, whether that is within LangChain itself, within LangGraph, or even elsewhere, building on the fundamentals that you will learn here.

The course is broken up into multiple chapters. We're going to start by talking a little about what LangChain is, when we should really be using it, and when we maybe don't want to use it. We'll talk about the pros and cons, and also about the wider LangChain ecosystem, not just the LangChain framework itself. From there, we'll introduce LangChain and look at a few examples before diving into the basics of the framework. Note that all of this is for LangChain 0.3, the latest version at the time of recording. That being said, we will also cover a little of where LangChain comes from, looking at pre-0.3 ways of doing things so that we can understand the old approach, how we do it now in 0.3, and how we can dive a little deeper into those methods and customize them.

From there, we'll dive into what I believe is the future of AI, or at least the present and the short-term future: agents. We'll be spending a lot of time on agents. We'll start with a simple introduction: how can we build a simple agent, what are the main components of an agent, and what do they look like? Then we'll dive much deeper and build our own agent executor, which is essentially the framework around the AI components of an agent; we'll be building our own.

Once we've done our deep dive on agents, we'll look at the LangChain Expression Language (LCEL), which we'll be using throughout the course. LCEL is the recommended way of using LangChain, and it takes a bit of a break from standard Python syntax, so there is some weirdness in there. We use it throughout the course, but we leave the dedicated LCEL chapter until later, because by that point you will already have a good grasp of the basics of LCEL and we can really dig into its fundamentals.

Then we'll dig into streaming, which is an essential UX feature of AI applications in general; it can improve the user experience massively. And it's not just about streaming tokens, the interface where the AI generates text word by word on the screen. If you've seen the Perplexity interface, where as the agent is thinking you get updates on what it is thinking about, what tools it is using, and how it is using those tools: that is another essential feature that requires a good understanding of streaming to build. So we'll be taking a look at all of that too.

Finally, we'll top it all off with a capstone project, where we will build our own AI agent application that incorporates all of these features. We'll have an agent that can use tools and web search, we'll be using streaming, and we'll see all of this in a nice interface that we can work with.

That overview of the course is, of course, very high level, and there's a ton of stuff in here. This course can take you from wherever you are with LangChain at the moment, whether you're a beginner, you've used it a bit, or you're intermediate, and you're probably going to learn a fair bit from it. So, without any further ado, let's dive into the first chapter.
Okay, so in this first chapter of the course we're going to focus on when we should actually use LangChain and when we should use something else. Throughout this chapter we're not really going to focus on code; every other chapter is very code focused, but this one is a little more theoretical. Why LangChain? Where does it fit in? When should I use it, and when should I not?

I want to start by framing this. LangChain is one of, if not the most popular open source AI framework within the Python ecosystem. It works pretty well for a lot of things, and it also works terribly for a lot of things, to be completely honest. There are massive pros and massive cons to using LangChain. Here we're just going to discuss a few of those and see how LangChain compares against other frameworks.

The very first question we should be asking ourselves is: do we even need a framework? Is a framework actually needed when we can just hit an API? With the OpenAI API, Mistral, and others, we can get a response from an LLM in roughly five lines of code; it's incredibly simple. However, that can change very quickly. When we start talking about agents, retrieval-augmented generation (RAG), research assistants, and so on, those use cases and methods can suddenly get quite complicated outside of a framework. That's not necessarily a bad thing: it can be incredibly useful to understand everything that is going on and build it yourself. The problem is that doing so takes time; you need to learn the intricacies of building these things and of the methods themselves, i.e. how they even work. And that runs in the opposite direction of what we see with AI at the moment, which is being integrated into the world at an incredibly fast rate. Because of this, most engineers coming into the space are not from a machine learning or AI background. A lot of the engineers coming in are DevOps engineers, generic backend Python engineers, even frontend engineers, and they are building all of these things, which is great, but they don't necessarily have the experience. That might be you as well, and that's not a bad thing, because the idea is that you're going to learn and pick up a lot of these things.

In that scenario there's quite a good argument for using a framework, because a framework means you can get started faster. A framework like LangChain abstracts away a lot of stuff. That's a big complaint a lot of people have with LangChain, but that abstraction is also what made LangChain popular: it means you can come in not really knowing what RAG is, for example, and still implement a RAG pipeline and get the benefits of it without fully understanding it. Yes, there's an argument against implementing something without really understanding it, but as we'll see throughout the course, it is possible to work with LangChain in a way where you first implement these things in an abstract way and then break them apart and start understanding the intricacies, at least a little bit. So that can actually be pretty good. However, circling back to what we said at the start: if your application is very simple, say you just need to generate some text based on some basic input, maybe you should just use an API directly; that's completely valid as well.
Now, we just said that a lot of people coming to LangChain might not be from an AI background. So another question for a lot of these engineers might be: if I want to learn about RAG, agents, and all of these things, should I skip LangChain and just try to build it from scratch myself? Well, LangChain can help a lot with that learning journey. You can start very abstract, and as you gradually begin to understand the framework better, you can strip away more and more of those abstractions and get more into the details. In my opinion, this gradual shift towards more explicit code with less abstraction is a really nice feature, and it's also what we focus on throughout this course: starting abstract, stripping away the abstractions, and getting more explicit with what we're building.

For example, for building an agent in LangChain there is a very simple and incredibly abstract create-agent method that we can use. It creates a tool-using agent for you, and it doesn't tell you anything about what's going on. You can use that, and we will use it initially in the course, but then you can go from that to defining your full agent execution logic: making a tool call to OpenAI, getting the tool information back, and then figuring out how to execute the tool, how to store that information, and how to iterate through the loop. We're going to see that stripping away of abstractions as we work through the course, as we build agents, as we build our streaming use case, and even with chat memory, among many other things.

So LangChain can act as the on-ramp to your AI learning experience. Then, what you might find (and I do think this is true for most people) depends on how serious you are about AI engineering. A lot of people just want to understand a bit of AI, continue doing what they're doing, and integrate AI here and there; if that's your focus, you might stick with LangChain, and there's not necessarily a reason to move on. But in the other scenario, where you're thinking you want to get really good at this, learn as much as you can, and dedicate the short-term future of your career to becoming an AI engineer, then LangChain might be the on-ramp, your initial learning curve. After you've become competent with LangChain, you might actually find that you want to move on to other frameworks. That doesn't mean you will have wasted your time with LangChain. One, LangChain is the thing helping you learn; and two, one of the main frameworks that I recommend people move on to is LangGraph, which is still within the LangChain ecosystem and still uses a lot of LangChain objects, methods, and of course concepts. So even if you do move on from LangChain, you may move on to something like LangGraph, for which you need to know LangChain anyway. And even if you move on to another framework instead, the concepts that you learn from LangChain are still pretty important.
So, to finish up this chapter, I just want to summarize on the question of whether you should be using LangChain. What's important to remember is that LangChain does abstract a lot, and that abstraction is both a strength and a weakness. With more experience, those abstractions can start to feel like a limitation. That is why we go with the idea that LangChain is really good to get started with, but as a project grows in complexity, or as engineers gain experience, they might move on to something like LangGraph, which in any case still uses LangChain to some degree. In either one of those scenarios, LangChain is going to be a core tool in an AI engineer's toolkit, so in our opinion it's worth learning. Of course, it comes with its weaknesses, and it's good to be aware that it's not a perfect framework, but for the most part you will learn a lot from it and you will be able to build a lot with it.

With all of that, we'll move on to our first hands-on chapter with LangChain, where we'll introduce LangChain and some of its essential concepts. We're not going to dive too much into the syntax, but we are still going to get a sense of what we can do with it.
Okay, so moving on to our next chapter: getting started with LangChain. In this chapter we're going to introduce LangChain by building a simple LLM-powered assistant that will do various things for us. It will be multimodal: generating some text, generating images, and generating structured outputs; it will do a few things.

To get started, we go over to the course repo; all of the code for all of the chapters is in there. There are two ways of running it: either locally or in Google Colab. We would recommend running in Google Colab, because it's just a lot simpler with environments, but you can also run it locally, and for the capstone we will actually be running things locally, since there's no way of doing that in Colab. So if you would like to run everything locally, I'll quickly show you how now; if you would like to run in Colab, which I would recommend at least for the first notebook chapters, just skip ahead; there are chapter points in the timeline of the video.

For running locally, we come down to the section of the README that tells you everything you need. You will need to install uv, which is the Python package and project manager that we recommend; you don't need to use uv, it's up to you, but it is very simple and works really well, so I would recommend it. You install it with the command shown there. That command is for Mac, so if you are on Windows or elsewhere, look at the installation guide and it will tell you what to do. Before we do that, I will go ahead and clone the repo. I'm going to create a temp directory, because I already have the LangChain course in there, and then just git clone the langchain-course repo (you will also need to install git if you don't have it). Once we have that, we copy the next commands: the first installs Python 3.12.7 for us, the next creates a new venv using the Python 3.12.7 that we've installed, and then uv sync looks at the pyproject.toml file, which defines the packages for the repo, and uses it to install everything we need. We should make sure that we are inside the langchain-course directory, and then we can run those three commands. There we go; everything should install with that.

Now, if you are in Cursor you can just run cursor . (or code . if you're in VS Code); I'll just run that, and it opens up the course. Within the course you have your notebooks, and you just run through them, making sure you select your kernel and Python environment, and making sure you're using the correct venv from the repo. That should pop up as the venv's bin Python; click that and you can run through. When you are running locally, don't run the install cells at the top of each notebook; you don't need to, because you've already installed everything. Those cells are specifically for Colab. So that is running things locally.
Now let's have a look at running things in Colab. For running everything in Colab, we have our notebooks in the repo; we click through, and we have each of the chapters there, starting with the first chapter, the introduction, which is where we are now. To open it in Colab you can either just click the Colab button at the top of the notebook, or, if that isn't loading for you, copy the notebook URL at the top, go over to Colab, choose to open from GitHub, paste the URL in there, and press enter. And there we go, we have our notebook.

So we're in. The first thing we will do is install the prerequisites. We have a few LangChain packages here: langchain-core, langchain-openai (because we're using OpenAI), and langchain-community, which is needed for what we're running. Okay, that has installed everything for us, so we can move on to our first step, which is initializing our LLM. We're going to use GPT-4o mini, which is a smaller, faster, and cheaper model from OpenAI that is still very good. What we need here is an API key. To get one, we go to OpenAI's website; you can see we're opening platform.openai.com and then going to Settings, Organization, API keys (you can copy that path or just click the link from the notebook). I'm going to go ahead and create a new secret key; again, in case you're looking for where this is, it's Settings, Organization, API keys. I'll create a new API key and call it "langchain course". I'll put it under Semantic Router, which is just my organization; you put it wherever you want it to be. Then you copy your API key. You can see mine here; I'm obviously going to revoke it before you see this, but you can try to use it if you really like. I'm going to copy that and paste it into the little prompt box in the notebook. You could also just put your full API key directly in the code; it's up to you, but the little box just makes things easier.
Now, what we've basically done there is pass in our API key and set our OpenAI model to GPT-4o mini. What we're doing next is essentially connecting and setting up our LLM parameters with LangChain. We run that, saying we're using GPT-4o mini, and we also set ourselves up to use two different LLMs here, or rather two instances of the same LLM with slightly different settings. The first is an LLM with a temperature setting of zero. The temperature setting essentially controls the randomness of the output of your LLM. The way it works is that when an LLM is predicting the next token (the next word in a sequence), it produces a probability for every token it knows about. When we set a temperature of zero, we're saying: give us the token with the highest probability according to the model. When we set a temperature of 0.9, we're saying there is an increased probability of it giving us a token that is not the highest-probability token according to the LLM, and that tends to produce more creative outputs. That's what temperature does. So here we are creating a normal LLM and a more creative LLM.
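As a rough sketch of this setup (assuming the langchain-openai package is installed and you have an OpenAI API key; the exact notebook code may differ), the two models might be initialized like so:

    # Minimal sketch of the two-model setup described above.
    import os
    from getpass import getpass
    from langchain_openai import ChatOpenAI

    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key: ")

    # Deterministic model: temperature 0 always picks the highest-probability token.
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

    # "Creative" model: a higher temperature samples lower-probability tokens more often.
    creative_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.9)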
So what are we going to be building? We're going to take a draft article from the Aurelio learning page, and we're going to use LangChain to generate various things that we might find helpful while we're editing and finalizing that article draft. What are those things? You can see them in the notebook: a title for the article; a description, specifically an SEO-friendly description; third, we'll get the LLM to provide advice on an existing paragraph and essentially write a new paragraph from it (this is the structured output part: it will write a new version of that paragraph for us and give us advice on where we can improve our writing); and then we'll generate a thumbnail or hero image for the article, a nice image you would put at the top. Here we just input our article; you can put something else in here if you like. Essentially, this is a big article on agents that was written a little while back.

Now we can go ahead and start preparing our prompts, which are essentially the instructions for our LLM. LangChain comes with a lot of different utilities for prompts, and we're going to dive into them in much more detail later, but I do want to give you the essentials now so that you understand what we're looking at, at least conceptually. Prompts for chat agents are, at a minimum, broken up into three components. First is the system prompt: this provides instructions to our LLM on how it should behave, what its objective is, and how it should go about achieving that objective. Generally, system prompts will be a bit longer than what we have here, depending on the use case. Then we have our user prompts: these are user-written messages. Sometimes we might want to pre-populate them if we want to encourage a particular kind of conversational pattern from our agent, but for the most part these are going to be user generated. Then we have our AI prompts: these are, of course, AI generated. Again, in some cases we might want to write those ourselves beforehand, or within a conversation if we have a particular reason for doing so, but for the most part you can assume that user and AI messages really are user and AI generated. LangChain provides us with templates for each one of these prompt types.
prompt types. Let's go ahead and have a look at what these look 00:23:58.600 |
like within line chain. So to begin, we are looking at this 00:24:03.560 |
one. So we have our system message prompt template and 00:24:07.920 |
human messages, the user that we saw before. So we have these 00:24:12.120 |
two system prompt, keeping it quite simple here, you are a AI 00:24:15.520 |
system that helps generate article titles, right. So our 00:24:18.640 |
first component we want to generate is article title. So 00:24:22.160 |
we're telling the AI, that's what we want it to do. And then 00:24:26.600 |
here, right. So here, we're actually providing kind of like 00:24:32.680 |
a template for a user input. So yes, as I mentioned, user input 00:24:40.000 |
can be, it can be fully generated by user, it might be 00:24:44.920 |
kind of not generated by user, it might be setting up a 00:24:48.400 |
conversation beforehand, which a user would later use, or in 00:24:52.320 |
this scenario, we're actually creating a template, and the 00:24:57.040 |
what the user will provide us will actually just be inserted 00:25:00.800 |
here inside article. And that's why we have this import 00:25:04.400 |
variables. So what this is going to do is okay, we have all of 00:25:09.800 |
these instructions around here, they're all going to be 00:25:12.920 |
provided to open AI as if it is the user saying this, but it 00:25:16.760 |
will actually just be this here, that user will be providing, 00:25:21.800 |
okay. And we might want to also format this a little nicer, it 00:25:24.680 |
kind of depends, this will work as it is. But we can also put, 00:25:28.320 |
you know, something like this to make it a little bit clearer 00:25:31.400 |
to the LM. Okay, what is the article? Where are the prompts? 00:25:36.840 |
So we have that, you can see in this scenario, there's not that 00:25:42.680 |
much difference to what the system prompt and user prompt is 00:25:45.120 |
doing. And this is, it's a particular scenario, it varies 00:25:48.440 |
when you get into the more conversational stuff, as we will 00:25:50.920 |
do later, you'll see that the user prompt is generally more 00:25:55.640 |
fully user generated, or mostly user generated. And much of 00:26:01.160 |
these types of instructions, we might actually be putting into 00:26:04.960 |
the system prompt, it varies. And we'll see throughout the 00:26:07.680 |
course, many different ways of using these different types of 00:26:11.560 |
prompts in various different places. Then you'll see here, so 00:26:16.400 |
I just want to show you how this is working, we can use this 00:26:20.120 |
format method on our user prompt here to actually insert 00:26:24.640 |
something within the article input here. So we're going to go 00:26:29.840 |
use prompt format, and then we pass in something for article. 00:26:32.920 |
Okay. And we can also maybe format this a little nicer, but 00:26:37.240 |
I'll just show you this for now. So we have our human message. 00:26:39.800 |
And then inside content, this is the text that we had, right, you 00:26:43.200 |
can see that we have all this, right. And this is what we wrote 00:26:46.000 |
before we wrote all this, except from this part, we didn't write 00:26:50.000 |
this, instead of this, we had article, right. So let's format 00:26:55.920 |
this a little nicer so that we can see. Okay, so this is 00:26:59.600 |
exactly what we wrote up here, exactly the same, except from 00:27:02.600 |
now we have test string instead of article. So later, when we 00:27:06.840 |
insert our article, it's going to go inside there, slowly 00:27:10.520 |
soon. It's like it's an it's an F string in Python, okay. And 00:27:14.440 |
this is again, this is one of the things where people might 00:27:16.760 |
complain about line chain, you know, this sort of thing can be, 00:27:20.000 |
you know, it seems excessive, because you could just do this 00:27:23.120 |
with an F string. But there are, as we'll see later, particularly 00:27:26.240 |
when you're streaming, just really helpful features that 00:27:29.960 |
come with using line chains kind of built in prompt templates, 00:27:35.360 |
or at least message objects that we will see. So, you know, we 00:27:42.160 |
need to keep that in mind. Again, as things get more 00:27:45.080 |
complicated, line chain can be a bit more useful. So, chat 00:27:48.880 |
prompt template, this is basically just going to take 00:27:52.680 |
what we have here, our system prompt, user prompts, we could 00:27:55.120 |
also include some AI prompts in there. And what it's going to do 00:27:59.560 |
is merge both of those. And then when we do format, what it's 00:28:05.400 |
going to do is put both of those together into a chat history. 00:28:09.120 |
Okay, so let's see what that looks like. First, in a more 00:28:13.080 |
messy way. Okay, so you can see we have just the content, right? 00:28:18.840 |
So it doesn't include the whole, you know, before we had human 00:28:22.120 |
message, we're not include, we're not seeing anything like 00:28:24.520 |
that here. Instead, we're just seeing the string. So now let's 00:28:28.680 |
switch back to print. And we can see that what we have is our 00:28:33.880 |
system message here, it's just prefixed with this system. And 00:28:37.160 |
then we have human, and it's prefixed by human, and then it 00:28:39.840 |
continues, right? So that's, that's all it's doing is just 00:28:42.320 |
kind of merging those in some sort of chat log, we could also 00:28:45.000 |
put in like AI messages, and they would appear in there as 00:28:47.680 |
well. Okay, so we have that. Now, that is our prompt 00:28:52.280 |
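As a rough sketch of the templates described in this section (the exact prompt wording in the course notebook differs, and the {name} variable is the one we add to the system prompt a little further below):

    # Hedged sketch of the system/user prompt templates and their merge.
    from langchain_core.prompts import (
        SystemMessagePromptTemplate,
        HumanMessagePromptTemplate,
        ChatPromptTemplate,
    )

    system_prompt = SystemMessagePromptTemplate.from_template(
        "You are an AI assistant called {name} that helps generate article titles."
    )
    user_prompt = HumanMessagePromptTemplate.from_template(
        "Here is the article for you to examine:\n\n---\n\n{article}\n\n---\n\n"
        "Suggest a concise title for it."
    )

    # format() fills a placeholder much like an f-string and returns a message.
    print(user_prompt.format(article="TEST STRING"))

    # ChatPromptTemplate merges the messages into a single chat history.
    first_prompt = ChatPromptTemplate.from_messages([system_prompt, user_prompt])
    print(first_prompt.format(article="TEST STRING", name="Joe"))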
Let's put that together with an LLM to create what, in the past, LangChain would have called an LLM chain. We wouldn't necessarily call it an LLMChain now, because we're not using the LLMChain abstraction; that's not super important, and if it doesn't make sense we'll go into it in more detail later, particularly in the LCEL chapter. Think of LangChain as just chains: we're chaining together multiple components. This chain will perform the steps of prompt formatting (which is what I just showed you), LLM generation (sending our prompt to OpenAI and getting a response), and getting that output. You can also add another step if you want to format the result in a particular way; we're going to output it in a particular format so that we can feed it into the next step more easily. There are also things called output parsers, which parse your output in a more dynamic or complicated way, depending on what you're doing.

So this is our first look at LCEL. I don't want us to focus too much on the syntax here, because we will do that later, but I do want you to understand what is actually happening and, logically, what we are writing.
All we really need to know right now is that we define our inputs with the first dictionary segment here. These are our inputs, which we have already defined: if we come up to our user prompt, we set its input variable to article. We might also have added input variables to the system prompt. For example, say the system prompt was "You are an AI assistant called {name} that helps generate article titles"; in that scenario we would have the input variable name there as well, and then down in the chain we would also have to pass that in, so alongside article we would also have name. Basically, we just need to make sure that the dictionary includes the variables we have defined as input variables for our prompts. Let's go ahead and add that so we can see it in action: we run this again to reinitialize our first prompt with the name variable included. Looking at what that means for the format function, it means we'll also need to pass in a name; let's call our AI assistant Joe. So we have Joe, our AI, and that name is going to be fed in through these input variables.

Then we have the pipe operator. The pipe operator is basically saying that whatever is on the left of it (in this case, our inputs) is going to go into whatever is on the right of it. It's that simple. Again, we'll dive into this and break it apart in the LCEL chapter.
But for now, that's all we need to know. So our inputs go into our first prompt, which inserts the name and the article that we've provided into the template and outputs the formatted prompt. Then we have another pipe operator, so that output goes into the input of the next step, our creative LLM. That generates some tokens, and its output is an AIMessage; as you saw before, within those message objects we have the content field. So we extract the content field out of the AIMessage to get just the text, and that is what the final step does: it takes the AIMessage from the LLM, extracts its content, and passes it into a dictionary that just contains article_title. We don't strictly need to do that; we could take the AIMessage directly. I just want to show you how we use this sort of chain in LCEL.

Once we have set up our chain, we call it (execute it) using the invoke method, and into that we pass our variables. We already have our article, but we also gave our AI a name, so let's add that and run it. Okay, so Joe has generated an article title for us: "Unlocking the Future: The Rise of Neuro-Symbolic AI Agents". Cool, a much better name than what I gave the article, which was "AI Agents Are Neuro-Symbolic Systems", though I don't think I did too badly.
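Putting the pieces together, a minimal sketch of this first LCEL chain might look as follows, reusing the first_prompt and creative_llm objects from the earlier sketches and assuming article holds the article text loaded above:

    # Hedged sketch of the first chain: inputs -> prompt -> LLM -> dict.
    chain_one = (
        {
            "article": lambda x: x["article"],  # pull each input out of the invoke dict
            "name": lambda x: x["name"],
        }
        | first_prompt                                   # fill the prompt template
        | creative_llm                                   # call the chat model
        | (lambda msg: {"article_title": msg.content})   # keep just the generated text
    )

    article_title = chain_one.invoke({"article": article, "name": "Joe"})["article_title"]
    print(article_title)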
Okay, so we have that. Now let's continue. What we're going to do is build more of these kinds of LLM chain pipelines, where we feed in some prompts, generate something, get the result, and do something with it. As mentioned, we have the title; we now move on to the description. So we have another HumanMessagePromptTemplate, which follows a similar format to before. We probably also want to redefine the system prompt, because I was using the same one as before; in fact, let's just remove the name variable now that I've shown you it and make it a generic system prompt: "You are an AI assistant that helps build good articles." Then we have our user prompt: you are tasked with creating a description for the article, the article is here for you to examine, and here is the article title. So we need the article title as an input variable now as well. We ask it to output an SEO-friendly article description, and just to be certain we add: do not output anything other than the description. Sometimes an LLM will say something like "Hey, look, this is what I generated for you, and the reason I think this is good is because..." and so on. If you're programmatically taking output from an LLM, you don't want all of that fluff around what it has generated; you just want exactly what you asked for. Otherwise you need to parse it out with code, which can get messy and is also far less reliable. So we just say: do not output anything else.

Then we put all of these together, the system prompt and this second user prompt, into a new ChatPromptTemplate, and we feed that into another LCEL chain to generate our description. We invoke that as before, making sure we add in the article title that we got from the previous chain, and let's see what we get. Okay, we get "Explore the transformative potential of neuro-symbolic AI agents...", which is a little long, to be honest, but you can see what it's doing. Of course, we can then go in and adjust: this is too long for an SEO-friendly description, so we modify the prompt to say "Output the SEO-friendly description. Make sure you do not exceed 120 characters" (I don't actually have a clue what the ideal length for SEO is, maybe it's even less), and again "Do not output anything other than the description." We go back, modify our prompt, and generate again. Now it's much shorter, probably too short, but that's fine. Cool, so we have that, and our description is now in the dictionary format that we defined.
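For reference, a rough sketch of this second chain might look like the following; the prompt wording and the "summary" key are illustrative rather than the exact notebook code, and article_title is the output of the first chain:

    # Hedged sketch of the description chain, consuming the article and its title.
    from langchain_core.prompts import (
        SystemMessagePromptTemplate,
        HumanMessagePromptTemplate,
        ChatPromptTemplate,
    )

    system_prompt = SystemMessagePromptTemplate.from_template(
        "You are an AI assistant that helps build good articles."
    )
    second_user_prompt = HumanMessagePromptTemplate.from_template(
        "You are tasked with creating an SEO-friendly description for the article "
        "below. Do not exceed 120 characters and do not output anything other than "
        "the description.\n\nArticle:\n{article}\n\nTitle: {article_title}"
    )
    second_prompt = ChatPromptTemplate.from_messages([system_prompt, second_user_prompt])

    chain_two = (
        {
            "article": lambda x: x["article"],
            "article_title": lambda x: x["article_title"],
        }
        | second_prompt
        | llm                                       # the temperature-0 model
        | (lambda msg: {"summary": msg.content})
    )

    summary = chain_two.invoke({"article": article, "article_title": article_title})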
Now, for the third step, we want to consume that first article variable containing our full article, and we're going to generate a few different output fields. For this we're going to use the structured output feature, so let's scroll down and see what that looks like. Structured output essentially means we force the LLM to output an object with particular fields. We can modify this quite a bit, but in this scenario I want an original paragraph field (I just want it to return the original paragraph, because I'm lazy and don't want to extract it myself), an edited paragraph field, which is the LLM-generated improved paragraph, and then some feedback, because we don't want to just automate ourselves; we want to augment ourselves and get better with AI rather than simply being handed the answer. That's what we do here, and you can see that we're using a Pydantic object. Pydantic allows us to define these particular fields and to attach a description to each field, and LangChain actually reads all of this, including the types. For example, we could make the feedback field an int and we would get a numeric score for our paragraph. Let's actually try that quickly, I'll show you: I set the type to int and keep the description as constructive feedback on the original paragraph, and we'll see what happens.

Then I take our creative LLM and use the with_structured_output method. That essentially creates a new LLM object that forces the LLM to use this structure for its output, passing in our paragraph class. With this, we create our new structured LLM. Let's run that and see what happens.
Okay, so we modify our chain accordingly; maybe I'll also remove the final extraction step for now so that we can see what the structured LLM outputs directly. Let's see. Now you can see that we actually get that paragraph object back, the one we defined above, which is kind of cool. In there we have the original paragraph (I definitely remember writing something that looks a lot like that, so I think it's correct), we have the edited paragraph, which is what the model thinks is better, and then, interestingly, the feedback is 3, which is weird, right? Because we said the field is constructive feedback on the original paragraph. But what LangChain is doing when we use with_structured_output is essentially performing a tool call to OpenAI, and a tool call can force a particular structure in the output of an LLM. So when we say the feedback has to be an integer, no matter what we put in the description it's going to give us an integer. Constructive feedback as an integer doesn't really make sense, but because we set that restriction, that's what it does: it just gives us a numeric value. So I'm going to switch that field back to a string, rerun it, and see what we get. Now we do get constructive feedback, and it's quite long: "The original paragraph effectively communicates limitations of neural AI systems in performing certain tasks. However, it could benefit from slightly improved clarity and conciseness. For example, the phrase 'was becoming clear' can be made more direct by changing it to 'became evident'." True, thank you very much. So now we actually get that feedback, which is pretty nice.
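A minimal sketch of this structured-output step, with the feedback field set back to a string as discussed, might look like this (the field names and the third_prompt template here are assumptions, not the exact notebook code):

    # Hedged sketch of structured output via a Pydantic schema.
    from pydantic import BaseModel, Field
    from langchain_core.prompts import ChatPromptTemplate

    class Paragraph(BaseModel):
        original_paragraph: str = Field(description="The original paragraph")
        edited_paragraph: str = Field(description="An improved version of the paragraph")
        feedback: str = Field(description="Constructive feedback on the original paragraph")

    # Assumed prompt asking the model to pick and improve one paragraph.
    third_prompt = ChatPromptTemplate.from_template(
        "Choose one paragraph from the article below, rewrite it to be clearer, "
        "and give feedback on the original.\n\nArticle:\n{article}"
    )

    # with_structured_output wraps the model call so the schema is enforced.
    structured_llm = creative_llm.with_structured_output(Paragraph)

    chain_three = {"article": lambda x: x["article"]} | third_prompt | structured_llm

    paragraph = chain_three.invoke({"article": article})
    print(paragraph.feedback)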
Now let's add in the final step to our chain, which just pulls the fields out of our paragraph object and extracts them into a dictionary. We don't necessarily need to do this; honestly, I kind of prefer it inside the paragraph object, but it shows how we would pass things on at the other side of the chain. So now we can see we've extracted that out, along with that interesting feedback again. Let's leave it there for the text part of this.

Now let's have a look at the multimodal features we can work with. This is maybe one of those areas that feels a bit more abstracted and a little complicated, where LangChain could maybe be improved. We're not going to focus too much on the multimodal stuff; we'll still be focusing on language, but I did want to show you this very quickly. We want this article to look better, so we want to generate a prompt based on the article itself that we can then pass to DALL-E, the image generation model from OpenAI, which will generate an image for us, like a thumbnail image. The first step is to get an LLM to generate that image prompt. So we have the prompt we're going to use for that: "Generate a prompt with less than 500 characters to generate an image based on the following article." Okay, so that's our prompt.
It's super simple, and we're using the generic PromptTemplate here; you could use the user prompt template instead, it's up to you. Then, based on what this outputs, we feed the result into a generate-and-display-image function via its image prompt parameter. That function uses the DALL-E API wrapper from LangChain (it runs the image prompt and essentially gives us a URL back), then reads that image URL with skimage to get the image data, and finally displays it. Pretty straightforward. Now, there is again an LCEL detail here: the RunnableLambda. When we want to run our own functions within LCEL, we need to wrap them in a RunnableLambda. I don't want to go too much into what it is doing here, because we cover that in the LCEL chapter; all you really need to know is that if we have a custom function, we wrap it in a RunnableLambda, and then what we get back can be used within the LCEL syntax.
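As a sketch of this image step (assuming langchain_community, scikit-image, and matplotlib are installed; the function name and prompt wording are illustrative, not the exact notebook code):

    # Hedged sketch of the image-generation chain using a RunnableLambda.
    from langchain_core.prompts import PromptTemplate
    from langchain_core.runnables import RunnableLambda
    from langchain_community.utilities.dalle_image_generator import DallEAPIWrapper
    from skimage import io
    import matplotlib.pyplot as plt

    image_prompt = PromptTemplate.from_template(
        "Generate a prompt with less than 500 characters to generate an image "
        "based on the following article: {article}"
    )

    def generate_and_display_image(image_prompt_text: str) -> None:
        # Run DALL-E with the generated prompt (uses the OpenAI key from the
        # environment) and display the image behind the returned URL.
        image_url = DallEAPIWrapper().run(image_prompt_text)
        image_data = io.imread(image_url)
        plt.imshow(image_data)
        plt.axis("off")
        plt.show()

    # Wrap the plain Python function so it can be used inside an LCEL chain.
    image_chain = (
        {"article": lambda x: x["article"]}
        | image_prompt
        | llm
        | (lambda msg: msg.content)
        | RunnableLambda(generate_and_display_image)
    )

    image_chain.invoke({"article": article})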
So what are we doing here? Let's figure it out. We take our original image prompt, which we defined just above and whose input variable is article; our article data is fed in and goes into that prompt. From there we get a message that we feed into our LLM, and the LLM generates an image prompt, i.e. a prompt for generating an image for this article. Let's print that out so we can see what it generates, because I'm also kind of curious. Then that content is fed into our runnable, which is basically the display function, and we'll see what it produces. Don't expect anything amazing from DALL-E; it's not the best, to be honest, but at least we see how to use it.

Okay, so we can see the prompt that was used: create an image that visually represents the concept of neuro-symbolic agents; depict a futuristic interface where a large language model interacts with traditional code, symbolizing the integration of (oh my gosh) some kind of computation; include elements like a brain to represent neural networks, gears or circuits for symbolic logic, and a web of connections illustrating the vast use cases of AI agents. A big prompt, and then we get this image. DALL-E is interesting, I would say. We could even take that same prompt and see what it comes up with in something like Midjourney; you get way cooler images from other image generation models, but this is pretty cool, honestly. So in terms of generating images, the prompt itself is actually pretty good; the image could be better.
But that's it. With all of that, we've seen a little introduction to what we might build with LangChain, and that's it for our introduction chapter. As I mentioned, we don't want to go too deep into what each of these things is doing; I just really wanted to focus on how we build something with LangChain and what the overall flow looks like. We don't want to focus too much yet on exactly what LCEL is doing, or exactly what these prompt objects are that we're setting up; we're going to focus much more on all of those things in the upcoming chapters. For now, we've just seen a little of what we can build before diving in in more detail.
Okay, so now we're going to take a look at AI observability using LangSmith. LangSmith is another piece of the broader LangChain ecosystem. Its focus is on allowing us to see what our LLMs, agents, and so on are actually doing, and it's something we would definitely recommend using if you are going to be using LangChain and LangGraph. Let's take a look at how we would set LangSmith up, which is incredibly simple. I'm going to open this chapter in Colab and install the prerequisites; you'll see these are all the same as before, but we now have the langsmith library as well. We're going to be using LangSmith throughout the course, so in all the following chapters we'll be importing LangSmith and it will be tracking everything we do. You don't need LangSmith to go through the course, it's an optional dependency, but as mentioned, I would recommend it.

The first thing we will need is a LangSmith API key. We do need an API key, but it comes with a reasonable free tier. You can see the plans here; the one we're on by default is free for one user with up to 5,000 traces per month. If you're building out an application, I think it's fairly easy to go beyond that, but it really depends on what you're building; it's a good place to start, and you can upgrade as required. So we go to smith.langchain.com. It logs me in automatically, and I have all of these tracing projects; they're all from me running the various chapters of the course, and if you use LangSmith throughout the course, your dashboard will end up looking something like this. What we need is an API key, so we go to Settings, then API keys, and create an API key. Because we're just doing some personal learning right now, I would go with a personal access token; you can give it a name or description if you want. We copy that, come over to our notebook, and enter the API key there. That is all we actually need to do; that's absolutely everything. The one thing to be aware of is that you should set your LangChain project name to whatever project you're working within. Within the course we have individual project names for each chapter, but for your own projects you should make sure it's something that you recognize and is useful to you.
actually does a lot without needing to do anything. So we 00:49:40.680 |
can actually go through, let's just initialize our LLM and 00:49:43.960 |
start invoking it and seeing what Langsmith returns to us. So 00:49:48.480 |
we'll need our OpenAI API key, enter it here. And then let's 00:49:53.560 |
just invoke hello. Okay, so nothing has changed on this end, 00:49:58.720 |
right? So it was running code, there's nothing different here. 00:50:01.320 |
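To make that concrete, here is a rough sketch of the setup in code; the project name is just a placeholder, and I'm assuming the standard LangSmith environment variables here:

```python
import os
from getpass import getpass

from langchain_openai import ChatOpenAI

# LangSmith tracing is switched on purely through environment variables.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass("LangSmith API key: ")
# Use a project name that you will recognise in the LangSmith dashboard.
os.environ["LANGCHAIN_PROJECT"] = "langchain-course-langsmith-openai"

# Nothing changes on the code side: we initialise and invoke the LLM as usual,
# and the trace simply appears in the LangSmith UI.
llm = ChatOpenAI(model="gpt-4o-mini", api_key=getpass("OpenAI API key: "))
llm.invoke("hello")
```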
However, now if we go to Langsmith, I'm going to go back 00:50:05.640 |
to my dashboard. Okay, and you can see that the order of 00:50:10.120 |
these projects just changed a little bit. And that's because 00:50:13.000 |
the most recently used project, this one at the top, Langchain 00:50:16.600 |
course Langsmith OpenAI, which is the current chapter we're in, 00:50:20.200 |
that was just triggered. So I can go into here, I can see, oh, 00:50:24.360 |
look at this. So we actually have something in the Langsmith 00:50:27.640 |
UI. And all we did was enter our Langchain API key. That's all we 00:50:31.720 |
did. And we set some environment variables. And that's it. So we 00:50:34.840 |
can actually click through to this and it will give us more 00:50:36.640 |
information. So you can see what was the input, what was the 00:50:40.440 |
output, and some other metadata here. You see, you know, there's 00:50:45.640 |
not that much in here. However, when we do the same for agents, 00:50:50.840 |
we'll get a lot more information. So I can even show 00:50:54.360 |
you a quick example from the future chapters. If we come 00:50:59.120 |
through to agents intro here, for example. And we just take a 00:51:04.040 |
look at one of these. Okay, so we have this input and output, 00:51:08.440 |
but then on the left here, we get all of this information. And 00:51:11.800 |
the reason we get all this information is because agents 00:51:14.200 |
are performing multiple LLM calls, etc, etc. So there's a 00:51:18.800 |
lot more going on. So you can see, okay, what was the first 00:51:21.880 |
LLM call, and then we get these tool use traces, we get another 00:51:26.120 |
LLM call, another tool use and another LLM call. So you can see 00:51:30.200 |
all this information, which is incredibly useful and incredibly 00:51:33.600 |
easy to do. Because all I did when setting this up in that 00:51:37.120 |
agent chapter was simply set the API key and the environment 00:51:41.120 |
variables as we have done just now. So you get a lot out of a 00:51:46.040 |
very little effort with Langsmith, which is great. So 00:51:49.120 |
let's return to our Langsmith project here. And let's invoke 00:51:53.040 |
some more. Now I've already shown you, you know, we're going 00:51:56.480 |
to see a lot of things just by default. But we can also add 00:51:59.760 |
other things that Langsmith wouldn't typically trace. So to 00:52:05.080 |
do that, we will just import a traceable decorator from 00:52:08.280 |
Langsmith. And then let's make these just random functions 00:52:13.600 |
traceable within Langsmith. Okay, so we run those, we have 00:52:19.000 |
three here. So we're going to generate a random number, we're 00:52:22.600 |
going to modify how long a function takes and also generate 00:52:27.960 |
a random number. And then in this one, we're going to either 00:52:31.720 |
return this no error, or we're going to raise an error. So 00:52:36.200 |
we're going to see how Langsmith handles these 00:52:38.880 |
different scenarios. So let's just iterate through and run 00:52:43.160 |
those a few times. So it's going to run each one of those 10 00:52:46.280 |
times. Okay, so let's see what happens. So they're running, 00:52:52.040 |
let's go over to our Langsmith UI and see what is happening 00:52:55.840 |
over here. So we can see that everything is updating, we're 00:52:58.640 |
adding that information through. And we can see if we go into a 00:53:01.600 |
couple of these, we can see a little more information. So the 00:53:04.520 |
input and the output took three seconds. See random error here. 00:53:11.200 |
In this scenario, random error passed without any issues. Let 00:53:15.480 |
me just refresh the page quickly. Okay, so now we have 00:53:20.200 |
the rest of the information. And we can see that occasionally, 00:53:23.840 |
if there is an error from our random error function, it is 00:53:26.800 |
signified with this. And we can see the traceback as well that 00:53:31.520 |
was returned there, which is useful. Okay, so we can see if 00:53:34.200 |
an error has been raised, we have to see what that error is. 00:53:37.400 |
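For reference, a minimal sketch of the kind of traceable functions being described; the function names and bodies here are stand-ins, only the @traceable decorator itself is the actual LangSmith API:

```python
import random
import time

from langsmith import traceable


@traceable  # anything decorated like this shows up as a run in LangSmith
def generate_random_number() -> int:
    return random.randint(0, 100)


@traceable
def generate_string_delay(input_str: str) -> str:
    # sleep for a random amount of time so we see varying latencies in the UI
    time.sleep(random.randint(1, 5))
    return f"{input_str} delayed"


@traceable
def random_error() -> str:
    # roughly half the time this raises, so we can see how errors are traced
    if random.random() < 0.5:
        raise ValueError("random error")
    return "no error"


for _ in range(10):
    generate_random_number()
    generate_string_delay("hello")
    try:
        random_error()
    except ValueError:
        pass
```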
We can see the various latencies of these functions. So you can 00:53:42.600 |
see that varying throughout here. We see all the inputs to 00:53:47.640 |
each one of our functions, and then of course the outputs. So 00:53:51.600 |
we can see a lot in there, which is pretty good. Now, another 00:53:55.800 |
thing that we can do is we can actually filter. So if we come 00:53:59.920 |
to here, we can add a filter. Let's filter for errors. That 00:54:04.760 |
would be value error. And then we just get all of the cases 00:54:09.240 |
where one of our functions has returned or raised an error or 00:54:13.240 |
value error specifically. Okay, so that's useful. And then 00:54:17.360 |
yeah, there's various other filters that we can add there. 00:54:21.160 |
So we could add a name, for example, if we wanted to look 00:54:24.640 |
for the generate string delay function only, we could also do 00:54:30.560 |
that. Okay, and then we can see the varying latencies of that 00:54:34.880 |
function as well. Cool. So we have that. Now, one final thing 00:54:40.760 |
that we might want to do is maybe we want to make those 00:54:43.680 |
function names a bit more descriptive or easy to search 00:54:47.920 |
for, for example. And we can do that by setting the name of the 00:54:51.200 |
traceable decorator, like so. So let's run that. Run this a few 00:54:56.120 |
times. And then let's jump over to Langsmith again, go into 00:55:01.160 |
Langsmith project. Okay, and you can see those coming through as 00:55:04.200 |
well. So then we could also search for those based on that 00:55:07.560 |
new name. So what was it, chit chat maker, like so. And then 00:55:12.040 |
we can see all the information being streamed through to 00:55:16.560 |
Langsmith. So that is our introduction to Langsmith. There 00:55:21.160 |
is really not all that much to go through here. It's very easy 00:55:25.200 |
to set up. And as we've seen, it gives us a lot of 00:55:27.640 |
observability into what we are building. And we will be using 00:55:32.880 |
this throughout the course, we don't rely on it too much. It's 00:55:35.600 |
a completely optional dependency. So if you don't want 00:55:38.000 |
to use Langsmith, you don't need to, but it's there and I would 00:55:40.560 |
recommend doing so. So that's it for this chapter, we'll move on 00:55:43.800 |
to the next one. Now we're going to move on to the chapter on 00:55:48.560 |
prompts in Langchain. Now, prompts, they seem like a simple 00:55:53.040 |
concept, and they are a simple concept, but there's actually 00:55:55.320 |
quite a lot to them when you start diving into them. And they 00:55:59.720 |
truly have been a very fundamental part of what has 00:56:04.480 |
propelled us forwards from pre LLM times to the current LLM 00:56:09.360 |
times. You have to think until LLMs became widespread, the way 00:56:14.520 |
to fine tune an AI model or ML model back then was to get loads 00:56:22.720 |
of data for your particular use case, and spend a load of time training 00:56:26.840 |
your specific transformer or part of the transformer to 00:56:30.960 |
essentially adapt it for that particular task. That could take 00:56:35.120 |
a long time. Depending on the task, it could take you months 00:56:40.840 |
or sometimes, if it was a simpler task, it might take 00:56:44.480 |
probably days, potentially weeks. Now, the interesting 00:56:48.720 |
thing with LLMs is that rather than needing to go through this 00:56:53.960 |
whole fine tuning process to modify a model for one task over 00:57:00.520 |
another task, rather than doing that, we just prompt it 00:57:03.400 |
differently, we literally tell the model, hey, I want you to do 00:57:07.360 |
this in this particular way. And that is a paradigm shift in what 00:57:12.480 |
you're doing is so much faster, it's going to take you, you 00:57:15.600 |
know, a couple of minutes, rather than days, weeks, or 00:57:18.400 |
months. And LLMs are incredibly powerful when it comes to just 00:57:23.200 |
generalizing to, you know, across these many different 00:57:26.200 |
tasks. So prompts, which control those instructions are a 00:57:31.480 |
fundamental part of that. Now, LangChain naturally has many 00:57:36.560 |
functionalities around prompts. And we can build very dynamic 00:57:40.320 |
prompting pipelines that modify the structure and content of 00:57:44.360 |
what we're actually feeding into our LLM, depending on different 00:57:47.800 |
variables, different inputs. And we'll see that in this chapter. 00:57:51.920 |
So we're going to work through prompting within the scope of a 00:57:57.160 |
RAG example. So let's start by just dissecting the various 00:58:01.840 |
parts of a prompt that we might expect to see for a use case 00:58:06.040 |
like RAG. So our typical prompt for RAG, or retrieval 00:58:11.200 |
augmented generation, will include rules for the LLM. And 00:58:15.960 |
this you will see in most prompts, if not all: this 00:58:21.440 |
part of the prompt sets up the behavior of the LLM. That is how 00:58:26.840 |
it should be responding to user queries, what sort of 00:58:30.560 |
personality it should be taking on, what it should be focusing on 00:58:34.360 |
when it is responding, and any particular rules or boundaries 00:58:37.800 |
that we want to set. And really, what we're trying to do here is 00:58:42.240 |
just to simply provide as much information as possible to the 00:58:47.200 |
LLM about what we're doing, we just want to give the LLM 00:58:53.480 |
context as to the place that it finds itself in. Because an LLM 00:58:59.200 |
has no idea where it is, it just takes in some 00:59:02.840 |
information and spits out information. If the only 00:59:05.800 |
information it receives is from the user's query, 00:59:08.680 |
it, you know, doesn't know the context: what is the 00:59:12.840 |
application that it is within? What is its objective? What is its 00:59:16.880 |
aim? What are the boundaries? All of this, we need to just 00:59:21.400 |
assume the LLM has absolutely no idea about because it truly 00:59:26.360 |
does not. So as much context as we can provide, but it's 00:59:32.280 |
important that we don't overdo it. It's, we see this all the 00:59:36.040 |
time, people will over prompt an LLM, you want to be concise, 00:59:40.320 |
you don't want fluff. And in general, every single part of 00:59:44.280 |
your prompt, the more concise and less fluffy, you can make it 00:59:47.760 |
the better. Now, those rules or instructions are typically in 00:59:51.560 |
the system prompt of your LLM. Now, the second one is context, 00:59:55.800 |
which is RAG specific. The context refers to some sort of 00:59:59.960 |
external information that you're feeding into your LLM. We may 01:00:04.920 |
have received this information from web search, database query 01:00:09.600 |
or quite often in this case of RAG, it's a vector database. 01:00:14.000 |
This external information that we provide is essentially the 01:00:19.120 |
RA, the retrieval augmentation, of RAG. We are augmenting the 01:00:25.880 |
knowledge of our LLM, which the knowledge of our LLM is 01:00:29.720 |
contained within the LLM model weights. We're augmenting that 01:00:33.600 |
knowledge with some external knowledge. That's what we're 01:00:36.520 |
doing here. Now for chat LLMs, this context is typically 01:00:43.320 |
placed within a conversational context within the user or 01:00:48.720 |
assistant messages. And with more recent models, it can also 01:00:54.320 |
be placed within tool messages as well. Then we have 01:00:58.760 |
the questions, pretty straightforward. This is the 01:01:01.560 |
query from the user. This is more, it's usually a user 01:01:06.680 |
message, of course. There might be some additional formatting 01:01:10.960 |
around this, you might add a little bit of extra context, or 01:01:14.680 |
you might add some additional instructions. If you find that 01:01:18.240 |
your LLM sometimes veers off the rules that you've set within 01:01:21.760 |
the system prompt, you might append or prefix something here. 01:01:26.520 |
But for the most part, it's probably just going to be the 01:01:28.600 |
user's input. And finally, so these are all the inputs for our 01:01:33.800 |
prompt here is going to be the output that we get. So the 01:01:37.760 |
answer from the assistant. Again, I mean, that's not even 01:01:41.480 |
specific to RAG, it's just what you would expect in a chat LLM 01:01:45.680 |
or any LLM. And of course, that would be an assistant message. 01:01:49.600 |
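So, roughly, the skeleton we're describing looks like this (the wording is a paraphrase of the prompt we'll see in a moment):

```python
# 1) Rules / instructions: usually the system prompt.
# 2) Context: the external (retrieved) information, RAG-specific.
# 3) Question: the user's query.
# 4) Answer: the assistant's generated response.

system_prompt = """Answer the user's query based on the context below.
If you cannot answer the question using the provided information,
answer with "I don't know".

Context: {context}"""

user_prompt = "{query}"
# ...and the answer comes back as an assistant (AI) message.
```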
So putting all of that together in an actual prompt, so you can 01:01:53.440 |
see everything we have here. So we have the rules for our 01:01:57.320 |
prompt here, the instructions, we're just saying, okay, answer 01:02:00.360 |
the question based on the context below. If you cannot 01:02:02.440 |
answer the question using the information, answer with "I don't 01:02:05.680 |
know". Then we have some context here. Okay, in this scenario, 01:02:11.200 |
that context that we're feeding in here, because it's the first 01:02:14.680 |
message, we might put that into the system prompt. But that may 01:02:18.160 |
also be turned around. Okay, if you, for example, have an 01:02:21.640 |
agent, you might have your question up here before the 01:02:25.760 |
context. And then that would be coming from a user message. And 01:02:30.000 |
then this context would follow the question and be recognized 01:02:34.600 |
as a tool message, it would be fed in that way as well. It 01:02:38.920 |
depends on what sort of structure you're going for there. 01:02:41.520 |
But you can do either: you can feed it into the system message 01:02:43.960 |
if it's less conversational, whereas if it's more 01:02:47.920 |
conversational, you might feed it in as a tool message. Okay, 01:02:50.760 |
and then we have a user query, which is here. And then we'd 01:02:54.160 |
have the AI answer. Okay, and obviously, that would be 01:02:57.120 |
generated here. Okay, so let's switch across to the code. We're 01:03:01.520 |
in the LangChain course repo, notebooks, 03 prompts, 01:03:05.320 |
I'm just going to open this in Colab. Okay, scroll down, and 01:03:09.280 |
we'll start just by installing the prerequisites. Okay, so we 01:03:13.120 |
just have the various libraries, again, as I mentioned before, 01:03:16.360 |
langsmith is optional, you don't need to install it. But if you 01:03:19.360 |
would like to see your traces and everything in langsmith, 01:03:22.560 |
then I would recommend doing that. And if you are using 01:03:25.680 |
langsmith, you will need to enter your API key here. Again, 01:03:29.760 |
if you're not using langsmith, you don't need to enter 01:03:32.000 |
anything here, you just skip that cell. Okay, cool. And let's 01:03:36.160 |
jump into the basic prompting then. So we're going to start 01:03:41.080 |
with this prompt: answer the user's query based on the context 01:03:43.600 |
below. So we're just structuring what we just saw in code. And 01:03:49.200 |
we're going to be using the chat prompt template, because 01:03:52.480 |
generally speaking, we're using chat LLMs in most cases 01:03:57.720 |
nowadays. So we have our chat prompt template, and that is 01:04:01.760 |
going to contain a list of messages, system message to 01:04:05.440 |
begin with, which is just going to contain this. And we're 01:04:08.800 |
feeding in the context within that there. And we have our 01:04:13.640 |
user query here. Okay. So we'll run this. And if we take a look 01:04:20.920 |
here, we haven't specified what our input variables are, okay. 01:04:26.400 |
But we can see that we have query. And we have context up 01:04:31.680 |
here, right? So we can see that, okay, these are the input 01:04:34.320 |
variables, we just haven't explicitly defined them here. So 01:04:39.160 |
let's just confirm with this, that LangChain did pick those 01:04:44.040 |
up. And we can see that it did. So it has context and query as 01:04:46.720 |
our input variables for the prompt template that we just 01:04:50.560 |
defined. Okay, so we can also see the structure of our 01:04:55.280 |
templates. Let's have a look. Okay, so we can see that within 01:05:00.760 |
messages here, we have a system message prompt template, the way 01:05:05.160 |
that we define this, you can see here that we have from messages 01:05:08.160 |
and this will consume various different structures. So you can 01:05:14.680 |
see here that, for messages, it takes a sequence of 01:05:19.760 |
message-like representations. So we could pass in a system prompt 01:05:24.240 |
template object, and then a user prompt template object. Or we 01:05:30.600 |
can just use a tuple like this. And this actually defines okay, 01:05:33.920 |
the system, this is a user, and you could also do assistant or 01:05:38.360 |
tool messages and stuff here as well using the same structure. 01:05:42.280 |
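As a minimal sketch of that tuple form (the prompt wording mirrors what we set up above):

```python
from langchain_core.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_messages([
    # (role, template) tuples: roles such as "system", "user"/"human", "ai"
    ("system", (
        "Answer the user's query based on the context below. If you cannot "
        "answer the question using the provided information, answer with "
        '"I don\'t know".\n\nContext: {context}'
    )),
    ("user", "{query}"),
])

# LangChain infers the input variables from the {placeholders} in the templates.
print(prompt_template.input_variables)  # ['context', 'query']
```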
And then we can look in here. And of course, that is being 01:05:45.880 |
translated into the system message prompt template and 01:05:50.080 |
human message prompt template. Okay. We have our input 01:05:54.680 |
variables in there. And we have the template too. Okay. Now, 01:05:59.880 |
let's continue. We'll see here what I just said, so we're 01:06:05.400 |
importing our system message prompt template and human 01:06:08.240 |
message prompt template. And you can see we're using the same 01:06:11.200 |
from messages method here. Right? And you can see so 01:06:15.520 |
sequence of message like representation. It's just, you 01:06:19.440 |
know, what that actually means. It can vary, right? So here we 01:06:23.160 |
have system message prompt template from template, prompt 01:06:25.880 |
here from template query, you know, there's various ways that 01:06:28.600 |
you might want to do this, it just depends on how explicit you 01:06:32.960 |
want to be. Generally speaking, I think, for myself, I would 01:06:38.960 |
prefer that we stick with the objects themselves, and be 01:06:43.400 |
explicit. But it is definitely a little harder to parse when 01:06:46.960 |
you're reading this. So I understand why you might 01:06:50.520 |
also prefer this: it's definitely cleaner, and it 01:06:53.560 |
does look simpler. So it just depends, I suppose, on 01:06:58.480 |
preference. Okay. So you see, again, this is exactly the same. 01:07:05.640 |
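For comparison, a sketch of the more explicit version using the message prompt template objects directly, equivalent to the tuples above:

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

system_prompt = SystemMessagePromptTemplate.from_template(
    "Answer the user's query based on the context below. If you cannot "
    "answer the question using the provided information, answer with "
    '"I don\'t know".\n\nContext: {context}'
)
user_prompt = HumanMessagePromptTemplate.from_template("{query}")

# from_messages accepts these objects just as happily as the tuples above
prompt_template = ChatPromptTemplate.from_messages([system_prompt, user_prompt])
```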
Okay, it's our chat prompt template, and it contains this 01:07:08.600 |
and this. Okay. You probably want to see the exact output. So 01:07:14.080 |
it was messages. Okay, exactly the same as what I put before. 01:07:19.880 |
Cool. So we have all that. Let's see how we would invoke our LLM 01:07:25.800 |
with these. We're going to be using GPT-4o mini again, we do 01:07:30.280 |
need our API key. So enter that. And we'll just initialize our 01:07:37.280 |
LLM, we are going with a low temperature here. So less 01:07:41.120 |
randomness, or less creativity. And in many cases, this is 01:07:46.840 |
actually what I would be doing. The reason in this scenario that 01:07:51.400 |
we're going with low temperature is we're doing rag. And if you 01:07:55.680 |
remember, before we scroll up a little bit here, our template 01:07:59.000 |
says, answer the user's query based on the context below. If 01:08:01.680 |
you cannot answer the question using the provided 01:08:04.680 |
information, answer with "I don't know", right. So just from 01:08:09.760 |
reading that we know that we want our LLM to be as truthful 01:08:15.320 |
and accurate as possible. So a more creative LLM is going to 01:08:19.720 |
struggle with that and is more likely to hallucinate. Whereas a 01:08:25.080 |
low creativity or low temperature LLM will probably 01:08:29.160 |
stick with the rules a little better. So again, it depends on 01:08:32.320 |
your use case. You know, if you're creative writing, you 01:08:35.120 |
might want to go with a higher temperature there. But for 01:08:38.440 |
things like rag, where the information being output should 01:08:42.120 |
be accurate, and truthful. It's important, I think that we keep 01:08:47.600 |
temperature low. Okay. I talked about that a little bit here. So 01:08:51.840 |
of course, lower temperature zero makes the LLMs output more 01:08:56.000 |
deterministic, which in theory should lead to less 01:08:59.040 |
hallucination. Okay, so we're gonna go with LCEL again here. 01:09:03.240 |
This is, for those of you that used LangChain in the past, 01:09:06.480 |
equivalent to an LLMChain object. So our prompt template 01:09:10.840 |
is being fed into our LLM. Okay. And now we have this 01:09:16.800 |
pipeline. Now let's see how we would use that pipeline. So 01:09:22.120 |
gonna get some, create some context here. So this is some 01:09:27.160 |
context around Aurelio AI. It mentions that we built the semantic 01:09:32.960 |
router, semantic chunkers, an AI platform, and development 01:09:38.800 |
services. We mention, I think we specifically outline this 01:09:43.960 |
later on in the example, the LangChain experts bit, a little piece 01:09:47.160 |
of information. Now, most LLMs would have not been trained on 01:09:51.920 |
the recent internet. So the fact that this came in September 01:09:55.680 |
2024, is relatively recent. So a lot of LLMs out of the box, you 01:10:00.400 |
wouldn't expect them to know that. So that is a good little 01:10:05.320 |
bit of information to ask you about. So we invoke, we have our 01:10:08.880 |
query. So what do we do? And we have that context. Okay, so 01:10:13.320 |
we're feeding that into that pipeline that we defined here. 01:10:16.120 |
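As a sketch, the whole pipeline and the invoke call look roughly like this; the context string is a shortened paraphrase of the one in the notebook, and the query wording is mine:

```python
from getpass import getpass

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.0,
    api_key=getpass("OpenAI API key: "),
)

# LCEL: the prompt's output is piped straight into the LLM
# (the modern equivalent of the old LLMChain).
pipeline = prompt_template | llm

context = (
    "Aurelio AI is an AI company that has built the semantic router and "
    "semantic chunkers open source libraries, an AI platform, and offers "
    "development services. The team became LangChain Experts in September 2024."
)

result = pipeline.invoke({"query": "What do Aurelio AI do?", "context": context})
print(result.content)

# An equivalent, more explicit form maps the input dict onto the prompt variables:
pipeline = (
    {
        "query": lambda x: x["query"],
        "context": lambda x: x["context"],
    }
    | prompt_template
    | llm
)
```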
Alright, so when we invoke that is automatically going to take 01:10:19.920 |
query and context and actually feed it into our prompt 01:10:23.800 |
template. Okay. If we want to, we can also be a little more 01:10:30.040 |
explicit. So you probably see me doing this throughout the 01:10:34.280 |
course. Because I do like to be explicit with everything, to be 01:10:39.040 |
honest. And you'll probably see me doing this. Okay, and this is 01:10:49.640 |
doing the same thing. Well, you'll see it will in the 01:10:53.240 |
moment. This is doing the exact same thing. Again, this is just 01:10:57.800 |
an LCEL thing. So all I'm doing in this scenario is I'm saying, 01:11:04.760 |
okay, take that from the dictionary query. And then also 01:11:10.160 |
take from that input dictionary, the context key. Okay, so this 01:11:19.000 |
is doing the exact same thing. The reason that we might want to 01:11:22.240 |
write this is mainly for clarity, to be honest, just to be 01:11:26.520 |
explicit and say, okay, these are the inputs, because otherwise, 01:11:29.240 |
we don't really have them in the code other than within our 01:11:33.360 |
original prompts up here, which is not super clear. So I think 01:11:39.400 |
it's usually a good idea to just be more explicit with these 01:11:41.720 |
things. And of course, if you decide you're going to modify 01:11:45.160 |
things a little bit, let's say you modify this input down the 01:11:48.880 |
line, you can still feed in the same input here, you're just 01:11:52.240 |
mapping it between different keys, essentially. Or if you 01:11:56.040 |
would like to just modify that, you need to lowercase it on the 01:11:59.720 |
way in or something, you can do. So you have that, I'll just 01:12:06.200 |
redefine that, actually. And we'll invoke again. Okay, we see 01:12:13.440 |
that it does the exact same thing. Okay, so ready. So this 01:12:17.600 |
is an AI message just generated by the LLM. Okay, expertise in 01:12:22.440 |
building AI agents, several open source frameworks, router, AI 01:12:27.400 |
platform. Okay, right. So it provided 01:12:32.840 |
everything other than the LangChain experts thing, it 01:12:35.280 |
didn't mention that. But we will, yeah, we'll test it later 01:12:39.080 |
on that. Okay, so on to few-shot prompting. This is a specific 01:12:43.040 |
prompting technique. Now, many state-of-the-art, or SOTA, LLMs 01:12:48.440 |
are very good at instruction following. So you'll find that 01:12:52.400 |
few-shot prompting is less common now than it used to be, 01:12:56.240 |
at least for these bigger, more state-of-the-art models. 01:13:00.480 |
But when you start using smaller models, which is not really what we can 01:13:05.240 |
use here, but let's say you're using an open source model like Llama 01:13:09.400 |
3 or Llama 2, which is much smaller, you will probably 01:13:15.080 |
need to consider things like few shot prompting. Although that 01:13:18.920 |
being said, with open AI models, at least the current open AI 01:13:24.440 |
models, this is not so important. Nonetheless, it can 01:13:27.920 |
be useful. So the idea behind few-shot prompting is that you are 01:13:31.880 |
providing a few examples to your LLM of how it should behave 01:13:36.760 |
before you are actually going into the main part of the 01:13:42.520 |
conversation. So let's see how that would look. So we create an 01:13:46.800 |
example prompt. So we have our human and AI. So human input AI 01:13:51.520 |
response. So we're basically setting up okay, this with this 01:13:54.760 |
type of input, you should provide this type of output. 01:13:57.960 |
That's what we're doing here. And we're just going to provide 01:14:01.760 |
some examples. Okay, so we have our input, here's query one, 01:14:05.880 |
here's the answer one, right? This is just, I just want to show 01:14:09.680 |
you how it works. This is not what we'd actually feed into our 01:14:12.680 |
LLM. Then, with both these examples and our example prompt, we 01:14:16.960 |
would feed both of these into LangChain's few-shot chat 01:14:21.680 |
message prompt template. Okay. And well, you'll see what we get 01:14:26.720 |
out of it. Okay, so basically it formats everything and 01:14:30.480 |
structures everything for us. Okay. And using this, of course, 01:14:35.920 |
it depends on let's say you see that your user is talking about 01:14:42.280 |
a particular topic. And you would like to guide your LLM to 01:14:47.240 |
talk about that particular topic in a particular way. Right. So 01:14:50.760 |
you could identify that the user is talking about that topic, 01:14:53.840 |
either like a keyword match or a semantic similarity match. And 01:14:58.080 |
based on that, you might want to modify these examples that you 01:15:01.240 |
feed into your few shot chat message prompt template. And 01:15:06.080 |
then obviously, for that could be what you do with topic A for 01:15:08.960 |
topic B, you might have another set of examples that you feed 01:15:12.120 |
into this. All this time, your example prompt is remaining the 01:15:15.800 |
same, but you're just modifying the examples that are going in 01:15:18.480 |
so that they're more relevant to whatever it is your user is 01:15:21.520 |
actually talking about. So that can be useful. Let's see an 01:15:25.360 |
example of that. So when we are using a tiny LLM, its ability 01:15:29.800 |
would be limited, although I think we were probably fine 01:15:33.160 |
here. We're going to say, answer the user query based on the 01:15:36.760 |
context below. Always answer in markdown format, you know, being 01:15:40.120 |
very specific, this is our system prompt. Okay, that's 01:15:44.320 |
nice. But what we've kind of said here is, okay, always 01:15:48.200 |
answer in markdown format. But when doing so, please 01:15:53.440 |
provide headers, short summaries, and follow bullet 01:15:55.920 |
points, then conclude. Okay, so you see this here, okay, so we 01:16:01.560 |
get this overview of array, you have this and this is actually 01:16:05.160 |
quite good. But if we come down here, what I specifically want 01:16:09.800 |
is to always follow this structure. Alright, so we have 01:16:13.880 |
the double header for the topic, summary, header, a couple of 01:16:20.120 |
bullet points. And then I always want to follow this pattern 01:16:22.320 |
where it's like to conclude, always, it's always bold. You 01:16:26.120 |
know, I want to be very specific on what I want. And to be, you 01:16:30.400 |
know, fully honest, with GPT-4o mini, you can actually just 01:16:35.200 |
prompt most of this in. But for the sake of the example, we're 01:16:38.560 |
going to provide a few examples in a few-shot prompt 01:16:43.760 |
instead to get this. So we're going to provide one 01:16:46.920 |
example here. Second example here. And you'll see we're just 01:16:51.360 |
following that same pattern, we're just setting up the 01:16:53.160 |
pattern that the LM should use. So we're going to set that up 01:16:58.400 |
here, we have our main header, a little summary, some sub 01:17:03.720 |
headers, bullet points, sub header, bullet points, bullet 01:17:06.240 |
points to conclude, so on and so on. Same with this one here. 01:17:09.640 |
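A minimal sketch of how those pieces fit together; the system prompt wording and the example content here are stand-ins for the fuller versions in the notebook:

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)

new_system_prompt = (
    "Answer the user's query based on the context below. Always answer in "
    "markdown format, with headers, short summaries, and bullet points, and "
    "finish with a bold 'To conclude' line.\n\nContext: {context}"
)

# One human/AI pair defines the *shape* of each example.
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

# The examples themselves set up the markdown pattern we want the LLM to imitate.
examples = [
    {
        "input": "Can you explain gravity?",
        "output": "## Gravity\n\nGravity is ...\n\n**To conclude**, gravity ...",
    },
    {
        "input": "How do plants make energy?",
        "output": "## Photosynthesis\n\nPlants convert ...\n\n**To conclude**, plants ...",
    },
]

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# The few-shot block slots in between the system prompt and the live user query.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", new_system_prompt),
    few_shot_prompt,
    ("user", "{query}"),
])
```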
Okay. And let's see what we got. Okay, so this is the structure 01:17:20.000 |
of our new few-shot prompt template. You can see what all 01:17:24.800 |
this looks like. Let's come down and we're going to do, we're 01:17:28.840 |
basically going to insert that directly into our chat prompt 01:17:32.280 |
template. So we have from messages, system prompt, user 01:17:37.600 |
prompt, and then we have in there, these, so let me actually 01:17:42.960 |
show you very quickly. Right, so we just have this few-shot 01:17:48.720 |
chat message prompt template, which will be fed into the 01:17:51.320 |
middle here, run that, and then feed all this back into our 01:17:54.840 |
pipeline. Okay, and this will, you know, modify the structure 01:17:58.440 |
so that we have that bold to conclude at the end here. Okay, 01:18:01.880 |
you can see nicely here. So we get a bit more of that, the 01:18:05.880 |
exact structure that we were getting again with GPT 4.0 01:18:10.160 |
models and many other OpenAI models, you don't really need to 01:18:14.120 |
do this, but you will see it in other examples. We do have an 01:18:17.600 |
example of this where we're using a Llama and we're using, I 01:18:21.760 |
think Llama 2, if I'm not wrong. And you can see that adding this 01:18:26.680 |
few-shot prompt template is actually a very good way of 01:18:31.280 |
getting those smaller, less capable models to follow your 01:18:34.600 |
instructions. So this is really, when you're working with a 01:18:38.000 |
smaller LLMs, this can be super useful, but even for SOTA models 01:18:41.360 |
like GPT-4o, if you do find that you're struggling with the 01:18:45.640 |
prompting, it's just not quite following exactly what you want 01:18:48.520 |
it to do. This is a very good technique for actually getting 01:18:53.240 |
it to follow a very strict structure or behavior. Okay, so 01:18:57.200 |
moving on, we have chain of thought prompting. So this is a 01:19:01.720 |
more common prompting technique that encourages the LLM to 01:19:06.320 |
think through its reasoning or its thoughts step by step. So 01:19:11.480 |
it's a chain of thought. The idea behind this is like, okay, 01:19:15.040 |
in math class, when you're a kid, the teachers would always 01:19:19.280 |
push you to put down your working out, right? And there's 01:19:24.400 |
multiple reasons for that. One of them is to get you to think 01:19:26.960 |
because they know in a lot of cases, actually, you know, 01:19:29.400 |
you're a kid and you're in a rush and you don't really care 01:19:31.400 |
about this test. And the, you know, they're just trying to get 01:19:35.680 |
you to slow down a little bit, and actually put down your 01:19:39.360 |
reasoning. And that kind of forced you to think, oh, 01:19:41.280 |
actually, I'm skipping a little bit in my head, because I'm 01:19:44.320 |
trying to just do everything up here. If I write it down, all 01:19:47.480 |
of a sudden, it's like, Oh, actually, I'm, yeah, I need to 01:19:50.720 |
actually do that slightly differently, you realize, okay, 01:19:53.280 |
you're probably rushing a little bit. Now, I'm not saying an LLM 01:19:55.960 |
is rushing, but it's a similar effect by an LLM writing 01:19:58.920 |
everything down, they tend to actually get things right more 01:20:03.880 |
frequently. And at the same time, also similar to when 01:20:07.720 |
you're a child and a teacher is reviewing your exam work by 01:20:11.360 |
having the LLM write down its reasoning, you as a human 01:20:15.920 |
or engineer, you can see where the LLM went wrong, if it did 01:20:20.200 |
go wrong, which can be very useful when you're trying to 01:20:22.480 |
diagnose problems. So with chain of thought, we should see 01:20:26.240 |
fewer hallucinations, and generally better performance. Now 01:20:30.360 |
to implement chain of thought in LangChain, there's no 01:20:32.320 |
specific LangChain object that does that. Instead, it's 01:20:35.800 |
just prompting. Okay, so let's go down and just see how 01:20:39.320 |
we might do that. Okay, so be helpful assistant answer the 01:20:42.960 |
user question, you must answer the question directly without 01:20:46.200 |
any other text or explanation. Okay, so that's our no chain of 01:20:50.520 |
thought system prompt. I will just note here, especially with 01:20:53.840 |
OpenAI. Again, this is one of those things where you'll see 01:20:57.040 |
it more with the smaller models. Most LLMs are actually trained 01:21:00.120 |
to use chain of thought prompting by default. So we're 01:21:03.120 |
actually specifically telling it here, you must answer the 01:21:05.880 |
question directly without any other text or explanation. Okay, 01:21:09.800 |
so we're actually kind of reverse prompting it to not use 01:21:13.000 |
chain of thought. Otherwise, by default, it actually will try 01:21:17.000 |
and do that because it's been trained to. That's how that's 01:21:19.600 |
how relevant chain of thought is. Okay, so I'm going to say 01:21:23.280 |
how many keystrokes you need to type in, type the numbers from 01:21:26.640 |
one to 500. Okay, we set up our like LLM chain pipeline. And 01:21:32.720 |
we're going to just invoke our query. And we'll see what we 01:21:35.760 |
get. Total number of keystrokes needed to type numbers from one 01:21:40.520 |
to 500 is 1511. The actual answer, as I've written here, is 01:21:47.280 |
1392. Without chain of thought, it is hallucinating. Okay, now let's 01:21:52.720 |
go ahead and see okay with chain of thought prompting, what does 01:21:55.920 |
it do? So be helpful assistant answer users question. To answer 01:22:00.480 |
the question, you must list systematically and in precise 01:22:04.160 |
detail all sub problems that are needed to be solved to answer 01:22:07.600 |
the question. Solve each sub problem individually, you have 01:22:11.720 |
to shout at the LLM sometimes to get them to listen. And in 01:22:14.920 |
sequence. Finally, use everything you've worked 01:22:18.120 |
through to provide the final answer. Okay, so we're getting 01:22:20.480 |
it we're forcing it to kind of go through the full problem 01:22:24.320 |
there. We can remove that. So run that. Again, I don't know 01:22:29.720 |
why we have context there. I'll remove that. And let's see. You 01:22:37.040 |
can see straightaway, that's taking a lot longer to generate 01:22:40.640 |
the output. That's because it's generating so many more tokens. 01:22:43.000 |
So that's just one drawback of this. But let's see what we 01:22:46.320 |
have. So, to determine how many keystrokes to type those numbers, 01:22:50.200 |
it is breaking it down into several sub problems: count the number of 01:22:54.080 |
digits from one to nine, from 10 to 99, and so on, and count the digits in the number 01:22:59.920 |
500. Okay, interesting. So that's how it's breaking it up. 01:23:04.040 |
Then it sums the digit counts from the previous steps. So we go 01:23:07.720 |
through the total digits. And we see this, okay, nine digits for 01:23:12.680 |
this range, 180 for here, 1200 for here. And then, of course, three 01:23:20.480 |
here. So it sums all those digits and actually comes 01:23:25.600 |
to the right answer. Okay, so that is, you know, that's 01:23:29.200 |
the difference with chain of thought versus without. So 01:23:32.960 |
without it, we just get the wrong answer, basically 01:23:35.800 |
guessing. With chain of thought, we get the right answer just by 01:23:40.480 |
the LLM writing down its reasoning and breaking the 01:23:43.720 |
problem down into multiple parts, which is, I found that 01:23:47.160 |
super interesting that it does that. So that's pretty cool. 01:23:52.080 |
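The ground truth is easy to check for ourselves, which makes a nice sanity check on the model's working:

```python
# Digits needed to type the numbers 1 to 500:
#   1-9:       9 numbers x 1 digit  =    9
#   10-99:    90 numbers x 2 digits =  180
#   100-499: 400 numbers x 3 digits = 1200
#   500:       1 number  x 3 digits =    3
# Total = 9 + 180 + 1200 + 3 = 1392
print(sum(len(str(n)) for n in range(1, 501)))  # 1392
```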
Now, I will just say: as we mentioned 01:23:55.800 |
before, most LLMs nowadays are actually trained to use chain of 01:23:59.120 |
thought prompting by default. So let's just see if we don't 01:24:02.360 |
mention anything, right? Be a helpful assistant and answer 01:24:04.440 |
these users questions. So we're not telling it not to think 01:24:07.560 |
through its reasoning, and we're not telling it to think through 01:24:10.800 |
its reasoning. Let's just see what it does. Okay, so you can 01:24:15.560 |
see, again, it's actually doing the exact same reasoning, okay, 01:24:22.000 |
it doesn't give us like the sub problems at the 01:24:24.480 |
start, but it is going through and it's breaking everything 01:24:27.480 |
apart. Okay, which is quite interesting. And we get the 01:24:31.040 |
same correct answer. So the formatting here is slightly 01:24:34.000 |
different. It's probably a little cleaner, actually, 01:24:36.800 |
although I think, I don't know. Here, we get a lot more 01:24:41.560 |
information. So both are fine. And in this scenario, we 01:24:46.640 |
actually do get the right answer as well. So you can see that 01:24:50.080 |
chain of thought prompting has actually been quite 01:24:54.200 |
literally trained into the model. And you'll see that with 01:24:58.560 |
most, well, I think all state-of-the-art LLMs. Okay, cool. So that 01:25:04.480 |
is our chapter on prompting. Again, we're focusing very much 01:25:09.960 |
on a lot of the fundamentals of prompting there. And of course, 01:25:14.880 |
tying that back to the actual objects and methods within 01:25:19.600 |
LangChain. But for now, that's it for prompting. And we'll move 01:25:23.360 |
on to the next chapter. In this chapter, we're going to be 01:25:26.360 |
taking a look at conversational memory in LangChain. We're 01:25:30.960 |
going to be taking a look at the core, like chat memory 01:25:35.280 |
components that have really been in LangChain since the 01:25:39.200 |
start, but are essentially no longer in the library. And we'll 01:25:43.800 |
be seeing how we actually implement those historic 01:25:48.000 |
conversational memory utilities in the new versions of 01:25:53.680 |
LangChain, so 0.3. Now as a pre-warning, this chapter is 01:25:57.720 |
fairly long. But that is because conversational memory is just 01:26:02.640 |
such a critical part of chatbots and agents. Conversational 01:26:07.440 |
memory is what allows them to remember previous interactions. 01:26:11.120 |
And without it, our chatbots and agents would just be responding 01:26:15.680 |
to the most recent message without any understanding of 01:26:19.760 |
previous interactions within a conversation. So they would just 01:26:23.160 |
not be conversational. And depending on the type of 01:26:27.960 |
conversation, we might want to go with various approaches to 01:26:36.720 |
conversational memory. Now throughout this chapter, we're going to be 01:26:39.040 |
focusing on these four memory types. We'll be referring to 01:26:43.640 |
these and I'll be showing you actually how each one of these 01:26:46.400 |
works. But what we're really focusing on is rewriting these 01:26:50.680 |
for the latest version of LangChain using the runnable with message 01:26:59.120 |
history. So we're going to be essentially taking a look at the 01:27:05.320 |
original implementations for each of these four original 01:27:08.960 |
memory types, and then we'll be rewriting them with the 01:27:12.200 |
RunnableWithMessageHistory class. So just taking a look at each of 01:27:16.880 |
these four very quickly. Conversational buffer memory is 01:27:20.840 |
I think the simplest, most intuitive of these memory types. 01:27:24.840 |
It is literally just you have your messages, they come in to 01:27:31.160 |
this object, they are stored in this object as essentially a 01:27:35.000 |
list. And when you need them again, it will return them to 01:27:39.080 |
you. There's nothing, nothing else to it, super simple. The 01:27:42.760 |
conversation buffer window memory, okay, so new word in the 01:27:46.600 |
middle of the window. This works in pretty much the same way. 01:27:50.880 |
But those messages that it has stored, it's not going to return 01:27:54.680 |
all of them for you. Instead, it's just going to return the 01:27:57.720 |
most recent, let's say the most recent three, for example. Okay, 01:28:02.200 |
and that is defined by a parameter k. Conversational 01:28:05.560 |
summary memory, rather than keeping track of the entire 01:28:09.640 |
interaction memory directly, what it's doing is as those 01:28:13.800 |
interactions come in, it's actually going to take them and 01:28:17.640 |
it's going to compress them into a smaller little summary of what 01:28:21.720 |
has been within that conversation. And as every new 01:28:25.760 |
interaction is coming in, it's going to do that, and I keep 01:28:28.440 |
iterating on that summary. And then that is going to return to 01:28:32.080 |
us when we need it. And finally, we have the conversational 01:28:34.640 |
summary buffer memory. So the buffer 01:28:40.760 |
part of this is actually referring to a very similar thing 01:28:44.360 |
to the buffer window memory, but rather than it being the most recent k 01:28:48.880 |
messages, it's looking at the number of tokens within your 01:28:51.600 |
memory, and it's returning the most recent k tokens. That's 01:28:58.320 |
what the buffer part is there. And then it's also merging that 01:29:02.560 |
with the summary memory here. So essentially, what you're 01:29:06.360 |
getting is almost like a list of the most recent messages based 01:29:10.280 |
on the token length rather than the number of interactions, 01:29:13.160 |
plus a summary, which would come at the top here. So you get 01:29:18.240 |
kind of both. The idea is that obviously this summary here 01:29:22.560 |
would maintain all of your interactions in a very compressed 01:29:27.800 |
form. So you're losing less information, and you're 01:29:31.160 |
still maintaining, you know, maybe the very first 01:29:33.880 |
interaction, the user might have introduced themselves, giving 01:29:36.880 |
you their name, hopefully, that would be maintained within the 01:29:40.760 |
summary, and it would not be lost. And then you have almost 01:29:44.040 |
like high resolution on the most recent k or k tokens from your 01:29:50.440 |
memory. Okay, so let's jump over to the code, we're going into 01:29:53.840 |
the 04 chat memory notebook, open that in Colab. Okay, now 01:29:57.720 |
here we are, let's go ahead and install the prerequisites, run 01:30:02.240 |
all. We, again, can or cannot use LangSmith, it is up to you. 01:30:08.280 |
Enter that. And let's come down and start. So first, we'll just 01:30:13.560 |
initialize our LLM, using GPT-4o mini in this example, again, low 01:30:19.320 |
temperature. And we're going to start with conversation buffer 01:30:23.000 |
memory. Okay, so this is the original version of this memory 01:30:30.400 |
type. So let me, where are we, we're here. So, memory is 01:30:35.760 |
conversation buffer memory, and the return messages flag 01:30:38.560 |
needs to be set to true. So the reason that we set return 01:30:42.640 |
messages to true, as it mentions up here, is if you do not do this, 01:30:47.600 |
it's going to be returning your chat history as a string to an 01:30:51.800 |
LLM. Whereas, well, chat LLMs nowadays would expect message 01:30:58.480 |
objects. So yeah, you just want to be returning these as 01:31:02.840 |
messages rather than as strings. Okay. Otherwise, yeah, you're 01:31:06.480 |
going to get some kind of strange behavior out from your 01:31:09.360 |
LLMs if you return them strings. So you do want to make sure 01:31:12.160 |
that it's true. I think by default, it might not be true. 01:31:15.640 |
But this is coming, this is deprecated, right? It does tell 01:31:18.360 |
you here, a deprecation warning: this is coming from 01:31:22.360 |
older LangChain, but it's a good place to start just to 01:31:25.000 |
understand this. And then we're going to rewrite this with the 01:31:27.560 |
runnables, which is the recommended way of doing so 01:31:30.360 |
nowadays. Okay, so adding messages to our memory, we're 01:31:34.880 |
going to write this, okay, so it's just a conversation, 01:31:38.920 |
user AI user AI, so on, random chat, main things to note here 01:31:44.040 |
is I do provide my name, we have the model's name, right 01:31:47.360 |
towards the start of those interactions. Okay, so I'm just 01:31:50.440 |
going to add all of those, we do it like this. Okay, then we can 01:31:57.040 |
just see, we can load our history, like so. So let's just 01:32:02.800 |
see what we have there. Okay, so we have human message, AI 01:32:06.520 |
message, human message, right? This is exactly what we showed 01:32:10.200 |
you just here. It's just in that message format from LangChain. 01:32:13.720 |
Okay, so we can do that. Alternatively, we can actually 01:32:18.240 |
do this. So we can get our memory, we initialize the 01:32:21.120 |
conversation buffer memory as we did before. And we can 01:32:24.360 |
actually add these messages directly into our memory like 01:32:28.360 |
that. So we can use this add user message, add AI message, so 01:32:31.440 |
on, so on, load again, and it's going to give us the exact same 01:32:34.680 |
thing. Again, there's multiple ways to do the same thing. Cool. 01:32:38.280 |
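For reference, a sketch of the (deprecated) pattern being described; expect deprecation warnings if you run something like this on LangChain 0.3:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

# return_messages=True gives us message objects rather than one long string
memory = ConversationBufferMemory(return_messages=True)
memory.chat_memory.add_user_message("Hi, my name is James")
memory.chat_memory.add_ai_message("Hey James, I'm an AI model called Zeta. How can I help?")

print(memory.load_memory_variables({}))  # the full message history

# The (also deprecated) ConversationChain wires the memory and LLM together.
chain = ConversationChain(llm=llm, memory=memory)
chain.invoke({"input": "What is my name again?"})
```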
So we have that. To pass all of this into our LLM (again, this is 01:32:42.920 |
all deprecated stuff, we're going to learn how to do it 01:32:45.000 |
properly in a moment, but this is how LangChain did it in 01:32:48.760 |
the past), we'd be using this 01:32:53.680 |
conversation chain, right? Again, this is deprecated. 01:32:57.600 |
Nowadays, we would be using LCEL for this. So I just want to 01:33:02.760 |
show you how this would all go together. And then we would 01:33:05.280 |
invoke, okay, what is my name again, let's run that. And we'll 01:33:10.040 |
see what we get is remembering everything, remember, so this 01:33:13.240 |
conversation buffer memory, it doesn't drop messages, it just 01:33:17.160 |
remembers everything. Right. And honestly, with the sort of large 01:33:21.920 |
context windows of many LLMs, that might be what you do. It 01:33:25.200 |
depends on how long you expect the conversation to go on for, 01:33:27.760 |
but you probably in most cases would get away with 01:33:30.960 |
this. Okay, so what, let's see what we get. I say, what is my 01:33:36.080 |
name again? Okay, let's see what it gives me says your name is 01:33:39.760 |
James. Great. Thank you. That works. Now, as I mentioned, all 01:33:45.200 |
of this I just showed you is actually deprecated. That's the 01:33:47.280 |
old way of doing things. Let's see how we actually do this in 01:33:50.520 |
modern, up-to-date LangChain. So we're using this 01:33:54.440 |
runnable with message history. To implement that, we will need 01:33:58.800 |
to use LCEL. And for that we will need to just define our prompt 01:34:03.080 |
template and LLM as we usually would. Okay, so we're going to 01:34:06.600 |
set up our system prompt, which is just a helpful assistant called 01:34:10.880 |
Zeta. Okay, we're going to put in this messages placeholder. 01:34:15.360 |
Okay, so that's important. Essentially, that is where our 01:34:19.720 |
messages are coming from our conversation buffer memory is 01:34:24.360 |
going to be inserted, right? So it's going to be that chat 01:34:27.400 |
history is going to be inserted after our system prompt, but 01:34:30.960 |
before our most recent query, which is going to be inserted 01:34:34.360 |
last here. Okay, so messages placeholder item, that's 01:34:38.800 |
important. And we use that throughout the course as well. 01:34:41.600 |
So we use it both for chat history, and we'll see later on, 01:34:44.800 |
we also use it for the intermediate thoughts that a 01:34:47.960 |
agent would go through as well. So important to remember that 01:34:51.920 |
little thing. We'll link our prompt template to our LLM. 01:34:56.320 |
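A sketch of that prompt template and pipeline, with the placeholder named history and the input named query as described:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant called Zeta."),
    # the chat history is inserted here, between the system prompt
    # and the most recent user query
    MessagesPlaceholder(variable_name="history"),
    ("user", "{query}"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
pipeline = prompt_template | llm
```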
Again, if we would like, we could also add in the I think we 01:35:01.320 |
only have the query here. Oh, we would probably also want our 01:35:05.880 |
history as well. But I'm not going to do that right now. 01:35:09.360 |
Okay, so we have our pipeline. And we can go ahead and actually 01:35:13.680 |
define our runnable with message history. Now this class or 01:35:18.120 |
object when we are initializing it does require a few items, we 01:35:21.360 |
can see them here. Okay, so we see that we have our pipeline 01:35:25.400 |
with history. So it's basically going to be, you can see 01:35:28.720 |
here, right, we have that history messages key, right, this 01:35:32.120 |
here has to align with what we provided as a messages 01:35:36.120 |
placeholder in our pipeline, right? So we have our pipeline 01:35:41.240 |
prompt template here, and here, right. So that's where it's 01:35:45.200 |
coming from. It's coming from messages placeholder, the 01:35:47.120 |
variable name is history, right? That's important. That links to 01:35:51.920 |
this. Then for the input messages key here, we have query 01:35:56.360 |
that, again, links to this. Okay, so both important to have 01:36:02.680 |
that. The other thing that is important is obviously we're 01:36:06.480 |
passing in that pipeline from before. But then we also have 01:36:09.480 |
this get session history. Basically, what this is doing is 01:36:12.840 |
it saying, okay, I need to get the list of messages that make 01:36:16.280 |
up my chat history that are going to be inserted into this 01:36:19.200 |
variable. So that is a function that we define, okay. And within 01:36:23.960 |
this function, what we're trying to do here is actually 01:36:26.640 |
replicate what we have with the previous conversation buffer 01:36:33.000 |
memory. Okay, so that's what we're doing here. So it's very 01:36:36.880 |
simple, right? So we have this in memory chat message history. 01:36:42.880 |
Okay, so that's just the object that we're going to be 01:36:44.840 |
returning. What this will do is it will take a session ID, the 01:36:48.560 |
session ID is essentially like a unique identifier so that each 01:36:52.560 |
conversational interaction within a single conversation is 01:36:56.200 |
being mapped to a specific conversation. So you don't have 01:36:58.960 |
overlapping, let's say you have multiple users using the same 01:37:01.480 |
system, you want to have a unique session ID for each one 01:37:03.960 |
of those. Okay, and what it's doing is saying, okay, if the 01:37:07.080 |
session ID is not in the chat map, which is this empty 01:37:10.400 |
dictionary we defined here, we are going to initialize that 01:37:15.000 |
session with an in memory, chat message history. Okay, that's 01:37:21.040 |
it. And we return. Okay, and all that's going to do is it's 01:37:25.040 |
going to basically append our messages, they will be appended 01:37:28.560 |
within this chat map session ID, and they're going to get 01:37:32.560 |
returned. There's nothing else to it, to be honest. So we 01:37:38.000 |
invoke our runnable, let's see what we get. I need to run this. 01:37:42.720 |
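For reference, a minimal sketch of the pieces we just walked through; the session bookkeeping is simplified here:

```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# one chat history per session ID, so separate conversations never overlap
chat_map = {}

def get_chat_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = InMemoryChatMessageHistory()
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",      # matches the {query} input in the prompt
    history_messages_key="history",  # matches the MessagesPlaceholder variable
)

pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_123"}},
)
pipeline_with_history.invoke(
    {"query": "What is my name again?"},
    config={"configurable": {"session_id": "id_123"}},
)
```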
Okay, note that we do have this config, so we have the session 01:37:48.800 |
ID, that's to again, as I mentioned, keep different 01:37:51.600 |
conversations separate. Okay, so we've run that. Now let's run a 01:37:55.440 |
few more. So what is my name again, let's see if it 01:37:58.800 |
remembers. Your name is James. How can I help you today, James? 01:38:02.840 |
Okay. So what we've just done there is literally 01:38:08.360 |
conversation buffer memory, but for up-to-date LangChain, with 01:38:14.640 |
LCEL, with runnables. So it's the recommended way of doing it 01:38:19.040 |
nowadays. So that's a very simple example. Okay, there's 01:38:23.240 |
really not that much to it. It gets a little more complicated 01:38:28.200 |
as we start thinking about the different types of memory. 01:38:30.760 |
Although with that being said, it's not massively complicated, 01:38:33.760 |
we're only really going to be changing the way that we're 01:38:36.160 |
getting our interactions. So let's, let's dive into that and 01:38:42.080 |
see how we will do something similar with the conversation 01:38:45.120 |
buffer window memory. But first, let's actually just understand 01:38:48.240 |
okay, what is the conversation buffer window memory. So as I 01:38:51.560 |
mentioned, near the start, it's going to keep track of the last 01:38:53.880 |
K messages. So there's a few things to keep in mind here. 01:38:58.600 |
More messages does mean more tokens sent with each request. 01:39:02.600 |
And if we have more tokens in each request, it means that 01:39:05.320 |
we're increasing the latency of our responses and also the cost. 01:39:08.360 |
So with the previous memory type, we're just sending 01:39:12.200 |
everything. And because we're sending everything that is going 01:39:15.440 |
to be increasing our costs, it's going to be increasing our 01:39:17.400 |
latency for every message, especially as the conversation 01:39:20.120 |
gets longer and longer. And we don't, we might not necessarily 01:39:22.760 |
want to do that. So with this conversation buffer window 01:39:27.000 |
memory, we're going to say, okay, just return me the most 01:39:30.360 |
recent messages. Okay, so let's, well, let's see how that would 01:39:36.000 |
work. Here, we're going to return the most recent four 01:39:38.960 |
messages. Okay, we are again, make sure we've turned messages 01:39:42.720 |
is set to true. Again, this is deprecated. This is just the 01:39:46.320 |
old way of doing it. In a moment, we'll see the updated 01:39:49.760 |
way of doing this. We'll add all of our messages. Okay, so we 01:39:55.640 |
have this. And just see here, right, so we've added in all 01:40:01.000 |
these messages, there's more than four messages here. And we 01:40:03.680 |
can actually see that here. So we have human message, AI, 01:40:07.400 |
human, AI, human, AI, human, AI. Right. So we've got four pairs 01:40:13.440 |
of human AI interactions there. But up here, we do have more 01:40:17.560 |
than four pairs. So four pairs doesn't take us back all the way through the 01:40:25.200 |
conversational memory. Okay, and if we take a look here, the 01:40:29.200 |
first message we have is I'm researching different 01:40:32.040 |
types of conversational memory. So it's cut off these two here, 01:40:35.800 |
which will be a bit problematic when we ask it what our name 01:40:38.720 |
is. Okay, so let's just see, we're going to be using 01:40:41.400 |
conversation chain object again, again, remember that is 01:40:44.600 |
deprecated. And I want to say what is my name again, let's 01:40:48.360 |
see, let's see what it says. "I'm sorry, I don't know 01:40:53.920 |
your name or any personal information; if you like, you 01:40:55.920 |
can tell me your name." Right, so it doesn't actually remember. 01:40:58.360 |
So that's kind of like a negative of the conversation 01:41:04.160 |
buffer window memory. Of course, to fix that in this 01:41:08.160 |
scenario, we might just want to increase K; maybe we keep around 01:41:11.480 |
the previous eight interaction pairs, and it will actually 01:41:15.400 |
remember. So what's my name again, your name is James. So 01:41:19.200 |
now it remembers; we just modified how much it is 01:41:21.680 |
remembering. But of course, you know, there's pros and cons to 01:41:24.600 |
this, it really depends on what you're trying to build. So let's 01:41:28.120 |
take a look at how we would actually implement this with 01:41:31.880 |
the runnable with message history. Okay, so getting a 01:41:37.520 |
little more complicated here, although it's really 01:41:41.680 |
not that complicated, as we'll see. Okay, so we have a buffer 01:41:46.000 |
window message history, we're creating a class here, this 01:41:49.400 |
class is going to inherit from the base chat message history 01:41:53.320 |
object from LangChain. Okay, and all of our other message 01:41:58.320 |
history objects will do the same thing. Before, the in-memory 01:42:02.520 |
message history object was basically replicating the buffer 01:42:06.120 |
memory, so we didn't actually need to do anything; we didn't 01:42:10.240 |
need to define our own class there. But in this case, we do. 01:42:14.760 |
So we follow the same pattern that LangChain follows with 01:42:19.800 |
this base chat message history. And you can see a few of the 01:42:22.520 |
functions here that are important. So add messages and 01:42:25.760 |
clear are the ones that we're going to be focusing on; we also need 01:42:28.320 |
to have messages, which is this object attribute here. Okay, so 01:42:32.120 |
we're just implementing the synchronous methods here. If we 01:42:37.680 |
want this to support async, we would 01:42:40.440 |
have to add aadd_messages, aget_messages, and aclear as 01:42:45.760 |
well. So let's go ahead and do that. We have messages, we have 01:42:49.800 |
k again, we're looking at remembering the top k messages 01:42:52.840 |
or most recent k messages only. So it's important that we have 01:42:56.440 |
that variable, we are adding messages through this class, 01:43:00.280 |
this is going to be used by line chain within our runnable. So 01:43:04.080 |
we need to make sure that we do have this method. And all we're 01:43:06.800 |
going to be doing is extending the self.messages list here. And 01:43:11.480 |
then we're actually just going to be trimming that down so that 01:43:13.600 |
we're not remembering anything beyond those, you know, most 01:43:18.480 |
recent k messages that we have set from here. And then we also 01:43:24.160 |
have the clear method as well. So we need to include that 01:43:26.920 |
that's just going to clear the history. Okay, so this 01:43:30.120 |
isn't complicated, right? It just gives us this nice default, 01:43:34.160 |
standard interface for message history, and we just need to 01:43:38.280 |
make sure we're following that pattern. Okay, I've included 01:43:41.600 |
a print statement here just so we can see what's happening. 01:43:44.800 |
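The class being described looks roughly like this; a minimal sketch, and the exact defaults and print statements in the notebook may differ.

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage
from pydantic import BaseModel, Field


class BufferWindowMessageHistory(BaseChatMessageHistory, BaseModel):
    """Chat message history that keeps only the k most recent messages."""
    messages: list[BaseMessage] = Field(default_factory=list)
    k: int = 4

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # Add the new messages, then trim down to the most recent k.
        self.messages.extend(messages)
        self.messages = self.messages[-self.k:]

    def clear(self) -> None:
        # Wipe the history entirely.
        self.messages = []
```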
So we have that. Now, for the get chat history function that 01:43:50.240 |
we defined earlier, rather than using the built in method, we're 01:43:54.040 |
going to be using our own object, which is a buffer window 01:43:57.520 |
message history, which we defined just here. Okay. So if 01:44:02.800 |
session ID is not in the chat map, as we did before, we're 01:44:05.800 |
going to be initializing our buffer window message history, 01:44:08.480 |
we're setting k up here with a default value of four, and then 01:44:12.320 |
we just return it. Okay, and that is it. So let's run this, 01:44:16.200 |
we have our runnable with message history, we have all of 01:44:20.360 |
these variables, which are exactly the same as before. But 01:44:23.480 |
then we also have these variables here with this history 01:44:26.600 |
factory config. And this is where if we have new variables 01:44:34.040 |
that we've added to our message history, in this case, k that we 01:44:38.680 |
have down here, we need to provide that to LangChain and 01:44:42.480 |
tell it this is a new configurable field. Okay. And 01:44:45.680 |
we've also added one for the session ID here as well, so 01:44:48.640 |
we're just being explicit and including everything there. 01:44:52.240 |
So we have that, and we run it. 01:44:58.160 |
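Wiring that up looks roughly like this; a sketch that assumes the pipeline and the BufferWindowMessageHistory class from the earlier sketches. The history_factory_config list is where the extra k field gets declared.

```python
from langchain_core.runnables import ConfigurableFieldSpec
from langchain_core.runnables.history import RunnableWithMessageHistory

chat_map = {}

def get_chat_history(session_id: str, k: int = 4) -> BufferWindowMessageHistory:
    if session_id not in chat_map:
        chat_map[session_id] = BufferWindowMessageHistory(k=k)
    return chat_map[session_id]

pipeline_with_history = RunnableWithMessageHistory(
    pipeline,
    get_session_history=get_chat_history,
    input_messages_key="query",
    history_messages_key="history",
    history_factory_config=[
        ConfigurableFieldSpec(
            id="session_id",
            annotation=str,
            name="Session ID",
            description="The session ID to use for the chat history",
            default="id_default",
        ),
        ConfigurableFieldSpec(
            id="k",
            annotation=int,
            name="k",
            description="The number of messages to keep in the history",
            default=4,
        ),
    ],
)

# Both configurable fields are then set per invocation through the config.
pipeline_with_history.invoke(
    {"query": "Hi, my name is James"},
    config={"configurable": {"session_id": "id_k4", "k": 4}},
)
```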
Okay, now let's go ahead and invoke and see what we get. So, important here: this history 01:45:02.680 |
factory config, that is kind of being fed through into our 01:45:06.240 |
invoke so that we can actually modify those variables from 01:45:09.840 |
here. Okay, so we have config configurable, session ID, okay, 01:45:13.880 |
we'll just put whatever we want in here. And then we also have 01:45:16.400 |
the number k. Okay, so remember the previous four interactions, 01:45:22.640 |
I think in this one, we're doing something slightly different. I 01:45:25.360 |
think we're remembering the four interactions rather than the 01:45:28.560 |
previous four interaction pairs. Okay, so my name is James, 01:45:32.560 |
we're going to go through I'm just going to actually clear 01:45:35.400 |
this. And I'm going to start again. And we're going to use 01:45:38.040 |
the exact same add user message and AI message that we used 01:45:41.880 |
before, which is manually inserting all that into our 01:45:44.240 |
history, so that we can then just see, okay, what is the 01:45:47.840 |
result. And you can see that k equals four is, unlike 01:45:52.360 |
before where we were saving the last four interaction 01:45:56.920 |
pairs, now saving the most recent four interactions, not 01:46:03.000 |
pairs, just interactions. And honestly, I just think that's 01:46:06.480 |
clearer. I think it's weird that the number four for k would 01:46:10.760 |
actually save the most recent eight messages. Right? I think 01:46:14.960 |
that's odd. So I'm just not replicating that weirdness. We 01:46:19.160 |
could if we wanted to, I just don't like it. So I'm not doing 01:46:23.800 |
that. And anyway, we can see from messages that we're 01:46:26.960 |
returning just the four most recent messages, which 01:46:31.160 |
would be these four. Okay, cool. So just using the 01:46:35.160 |
runnable, we've replicated the old way of having a window 01:46:40.640 |
memory. And okay, I'm going to say what is my name again, as 01:46:44.200 |
before, it's not going to remember. So we can come to 01:46:47.000 |
here, I'm sorry, but I don't have access to personal 01:46:48.680 |
information and so on and so on. If you like to tell me your 01:46:51.360 |
name, it doesn't know. Now let's try a new one, where we 01:46:55.640 |
initialize a new session. Okay, so we're going with ID k 14. So 01:47:01.240 |
that's going to create a new conversation there. And we're 01:47:03.760 |
going to say, we're going to set k to 14. Okay, great. I'm 01:47:09.320 |
going to manually insert the other messages as we did 01:47:12.760 |
before. Okay, and we can see all of those you can see at the 01:47:15.880 |
top here, we are still maintaining that Hi, my name is 01:47:18.520 |
James message. Now let's see if it remembers my name. Your name 01:47:23.480 |
is James. Okay, there we go. Cool. So that is working. We 01:47:28.360 |
can also see, so we just added this, what is my name again, 01:47:31.960 |
let's just see if did that get added to our list of messages. 01:47:36.440 |
Right, what is my name again? Nice. And then we also have the 01:47:39.640 |
response, your name is James. So just by invoking this, because 01:47:43.320 |
we're using the, the runnable with message history, it's just 01:47:47.800 |
automatically adding all of that into our message history, 01:47:51.800 |
which is nice. Cool. Alright, so that is the buffer window 01:47:56.920 |
memory. Now we are going to take a look at how we might do 01:48:01.480 |
something a little more complicated, which is the 01:48:03.880 |
summaries. Okay, so when you think about the summary, you 01:48:07.080 |
know, what are we doing, we're actually taking the messages, 01:48:10.680 |
we're using the LLM call to summarize them, to compress 01:48:14.760 |
them, and then we're storing them within messages. So let's 01:48:18.360 |
see how we would actually do that. So to start with, let's 01:48:23.720 |
just see how it was done in old LangChain, using the 01:48:27.000 |
ConversationSummaryMemory class. 01:48:33.160 |
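That older approach is roughly this; a minimal sketch of the deprecated API.

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# The summary memory needs its own LLM to generate the running summary.
memory = ConversationSummaryMemory(llm=llm)
chain = ConversationChain(llm=llm, memory=memory)
```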
We'll go through that and see what we get. So again, same interactions. Right, I'm just 01:48:38.600 |
invoking, invoking, invoking, I'm not adding these directly 01:48:42.120 |
to the messages, because it actually needs to go through a 01:48:46.520 |
like that summarization process. And if we have a look, we can 01:48:50.520 |
see it happening. Okay, current conversation. So sorry, 01:48:54.680 |
current conversation. Hello there, my name is James, AI is 01:48:57.880 |
generating. Current conversation, the human introduces 01:49:01.320 |
himself as James, AI greets James warmly and expresses its 01:49:04.760 |
readiness to chat and assist, inquiring about how his day is 01:49:08.200 |
going. Right, so it's summarizing the previous 01:49:11.640 |
interactions. And then we have, you know, after that summary, we 01:49:15.720 |
have the most recent human message, and then the AI is 01:49:18.520 |
going to generate its response. Okay, and that continues going, 01:49:22.200 |
continues going. And you see that the final summary here is 01:49:25.240 |
going to be a lot longer. Okay, and it's different from that 01:49:25.240 |
first summary, of course: asking about his day, he mentions that 01:49:28.280 |
he's researching different types of conversational memory. 01:49:33.640 |
The AI responds enthusiastically, explaining that 01:49:36.280 |
conversational memory includes short term memory, long term 01:49:38.760 |
memory, contextual memory, personalized memory, and then 01:49:41.080 |
inquires if James is focused on the specific type of memory. 01:49:44.680 |
Okay, cool. So we get essentially the summary is just 01:49:48.760 |
getting longer and longer as we go. But at some point, the idea 01:49:52.520 |
is that it's not going to keep growing. And it should actually 01:49:55.560 |
be shorter than if you were saving every single 01:49:57.640 |
interaction, whilst maintaining as much of the information as 01:50:01.960 |
possible. But of course, you're not going to maintain all of 01:50:06.280 |
the information that you would with, for example, the 01:50:09.720 |
buffer memory. Right, with the summary, you are going to lose 01:50:13.640 |
information, but hopefully less information than if you're just 01:50:17.960 |
cutting interactions. So you're trying to reduce your token 01:50:21.880 |
count whilst maintaining as much information as possible. 01:50:26.520 |
Now, let's go and ask what is my name again, it should be able 01:50:30.360 |
to answer because we can see in the summary here that I 01:50:34.200 |
introduced myself as James. Okay, response, your name is 01:50:38.360 |
James. How is your research going? Okay, so it has that. Cool. 01:50:42.920 |
Let's see how we'd implement that. So again, as before, we're 01:50:46.600 |
going to go with that conversation summary message 01:50:50.760 |
history, we're going to be importing a system message, 01:50:53.560 |
we're going to be using that not for the LM that we're chatting 01:50:56.040 |
with, but for the LM that will be generating our summary. So 01:51:00.520 |
actually, that is not quite correct, there's create a 01:51:04.760 |
summary, not that it matters, it's just the docker string. So 01:51:07.880 |
we have our messages and we also have the LM. So different 01:51:10.520 |
tribute here to what we had before. When we initialize a 01:51:14.440 |
conversation summary message history, we need to be passing 01:51:17.640 |
in our LLM. We have the same methods as before: we have add 01:51:21.720 |
messages and clear. And what we're doing is, as messages 01:51:25.240 |
come in, we extend our current messages with them, but then we're 01:51:29.720 |
modifying those. So we construct our instructions to 01:51:35.560 |
make a summary. So that is here, we have the system prompt, 01:51:40.280 |
given the existing conversation summary and the new messages, 01:51:43.240 |
generate a new summary of the conversation, ensuring to 01:51:45.400 |
maintain as much relevant information as possible. Then 01:51:48.920 |
we have a human message here, through that we're passing the 01:51:52.360 |
existing summary. And then we're passing in the new 01:51:56.840 |
messages. So we format those and invoke the LLM. 01:52:04.040 |
And then, in the messages, we're actually 01:52:10.040 |
replacing the existing history that we had before with a new 01:52:14.440 |
history, which is a single system summary message. 01:52:20.040 |
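Put together, that class looks roughly like this. It is a sketch of the logic as described (including the x.content fix that comes up shortly); the exact prompt wording in the notebook may differ.

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class ConversationSummaryMessageHistory(BaseChatMessageHistory, BaseModel):
    """Chat history stored as a single, continuously updated summary."""
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOpenAI  # the LLM used to generate the summary

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # Any existing history is at most one system message holding the summary.
        existing_summary = self.messages[0].content if self.messages else ""
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation, ensuring to "
                "maintain as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{new_messages}"
            ),
        ])
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary,
                # Pass only the message content, not the full message objects.
                new_messages="\n".join(x.content for x in messages),
            )
        )
        # Replace the whole history with a single system summary message.
        self.messages = [SystemMessage(content=new_summary.content)]

    def clear(self) -> None:
        self.messages = []
```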
Let's see what we get. As before, we have the get chat history function, 01:52:23.160 |
exactly the same as before. The only real difference is that 01:52:26.440 |
we're passing in the LLM parameter here. And of course, 01:52:29.400 |
as we're passing in the LLM parameter here, it does also 01:52:33.080 |
mean that we're going to have to include that in the 01:52:34.760 |
configurable field spec, and that we're going to need to 01:52:39.160 |
include that when we're invoking our pipeline. So we 01:52:44.520 |
run that, pass in the LLM. Now, of course, one side effect of 01:52:51.160 |
generating summaries of everything is that we're 01:52:52.920 |
actually, you know, generating more. So you are 01:52:56.760 |
actually using quite a lot of tokens. Whether or not you are 01:53:00.600 |
saving tokens or not actually depends on the length of a 01:53:03.080 |
conversation. As the conversation gets longer, if 01:53:05.880 |
you're storing everything, after a little while that the 01:53:09.480 |
token usage is actually going to increase. So if in your use 01:53:13.720 |
case you expect to have shorter conversations, you would be 01:53:17.800 |
saving money and tokens by just using the standard buffer 01:53:22.120 |
memory. Whereas if you're expecting very long 01:53:25.080 |
conversations, you would be saving tokens and money by 01:53:28.440 |
using the summary history. Okay, so let's see what we got 01:53:33.160 |
from that. We have a summary of the conversation. James 01:53:35.400 |
introduced himself by saying, "Hi, my name is James." The AI 01:53:37.800 |
responded warmly, "Hi, James." The interaction includes 01:53:40.600 |
details about token usage. Okay, so we actually included 01:53:45.960 |
everything here, which we probably should not have done. 01:53:49.400 |
Why did we do that? So in here, we're including the full 01:54:03.720 |
message objects. So maybe if we just do "x.content" for each message instead... 01:54:16.280 |
Okay, there we go. So we quickly fixed that. So yeah, before 01:54:21.160 |
we're passing in the entire message object, which obviously 01:54:23.560 |
includes all of this information. Whereas actually 01:54:26.200 |
we just want to be passing in the content. So we modified 01:54:30.360 |
that and now we're getting what we'd expect. Okay, cool. And 01:54:35.640 |
then we can keep going. So as we keep going, the 01:54:38.600 |
summary should get more abstract. Like as we just saw 01:54:42.920 |
here, it's literally just giving us the messages directly 01:54:46.440 |
almost. Okay, so we're getting the summary there and we can 01:54:50.120 |
keep going. We're going to add just more messages to that. So 01:54:53.080 |
we'll see, as we send those, we're getting a 01:54:57.720 |
response. Send again, get a response. And we're just 01:55:01.000 |
invoking all of that, and that will of course be 01:55:03.960 |
adding everything into our message history. Okay, cool. So 01:55:08.440 |
we've run that. Let's see what the latest summary is. 01:55:13.560 |
Okay, and then we have this. So this is a summary that we have 01:55:16.820 |
instead of our chat history. Okay, cool. Now, finally, let's 01:55:23.860 |
see what's my name again. We can just double check. You know, 01:55:26.980 |
it has my name in there. So it should be able to tell us. 01:55:31.460 |
Okay, cool. So your name is James. Pretty interesting. So 01:55:38.680 |
let's have a quick look over at Langsmith. So the reason I 01:55:43.080 |
want to do this is just to point out, okay, the different 01:55:46.600 |
essentially token usage that we're getting with each one of 01:55:48.840 |
these. Okay, so we can see that we have these runnable 01:55:51.400 |
message history, which are probably improved in naming 01:55:54.200 |
there. But we can see, okay, how long is each one of these 01:55:59.000 |
taken? How many tokens are they also using? Come back to here. 01:56:03.800 |
We have this runnable message history. This is, we'll go 01:56:07.320 |
through a few of these, maybe to here, I think. You can see 01:56:11.400 |
here, this is that first interaction where we're using 01:56:13.880 |
the buffer memory. And we can see how many tokens we use 01:56:18.280 |
here. So 112 tokens when we're asking what is my name again. 01:56:22.280 |
Okay, then we modified this to include, I think it was like 01:56:27.880 |
14 interactions or something along those lines, which obviously 01:56:30.520 |
increases the number of tokens that we're using, right? So we 01:56:33.160 |
can see that actually happening all in Langsmith, which is 01:56:36.200 |
quite nice. And we can compare, okay, how many tokens is each 01:56:38.920 |
one of these using. Now, this is looking at the buffer window. 01:56:43.960 |
And if we come down to here and look at this one, so this is 01:56:47.640 |
using our summary. Okay, so summary with what is my name 01:56:51.560 |
again, actually used more tokens in this scenario, right? Which 01:56:54.520 |
is interesting because we're trying to compress information. 01:56:57.640 |
The reason there's more here is that the conversation is still short. As the 01:57:02.680 |
conversation length increases, with the summary, the total 01:57:08.120 |
number of tokens, especially if we prompt it correctly to keep 01:57:10.600 |
that low, should remain relatively small. Whereas with 01:57:16.040 |
the buffer memory, that will just keep increasing and 01:57:19.560 |
increasing as the conversation gets longer. So useful little 01:57:25.000 |
way of using Langsmith there to just kind of figure out, okay, 01:57:28.920 |
in terms of tokens and costs of what we're looking at for each 01:57:32.200 |
of these memory types. Okay, so our final memory type acts as a 01:57:37.720 |
mix of the summary memory and the buffer memory. So what it's 01:57:42.440 |
going to do is keep the buffer up until an n number of tokens. 01:57:48.440 |
And then once a message exceeds the n number of token limit for 01:57:52.760 |
the buffer, it is actually going to be added into our 01:57:56.760 |
summary. So this memory has the benefit of remembering in 01:58:02.600 |
detail the most recent interactions whilst also not 01:58:07.000 |
having the limitation of using too many tokens as a 01:58:12.440 |
conversation gets longer and even potentially exceeding 01:58:15.400 |
context windows if you try super hard. So this is a very 01:58:19.480 |
interesting approach. Now as before, let's try the original 01:58:23.880 |
way of implementing this. Then we will go ahead and use our 01:58:29.000 |
updated method for implementing this. So we come down to here, 01:58:32.680 |
and we're going to do, from langchain.memory, import 01:58:36.360 |
ConversationSummaryBufferMemory. Okay, a few things here: the LLM for the 01:58:41.480 |
summary. We have the n number of tokens that we can keep 01:58:46.200 |
before they get added to the summary and then return 01:58:49.160 |
messages, of course. Okay, you can see again this is 01:58:51.560 |
deprecated. We use the ConversationChain, and then we're 01:58:56.040 |
just passing our memory in there, and then we can chat. 01:58:59.640 |
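A minimal sketch of that deprecated setup:

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Keep raw messages up to roughly 300 tokens; older ones get summarized.
memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=300,
    return_messages=True,
)
chain = ConversationChain(llm=llm, memory=memory)
```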
Okay, so it's super straightforward. First message, and we'll add a few more 01:59:03.880 |
here. Again, we have to invoke because the memory type here is 01:59:10.120 |
using the LLM to create those summaries as it goes, and let's 01:59:14.360 |
see what they look like. Okay, so we can see for the first 01:59:16.920 |
message here, we have a human message and then an AI message. 01:59:22.360 |
Then we come a little bit lower down again. It's the same 01:59:24.440 |
thing. Human message is the first thing in our history here. 01:59:28.840 |
Then it's a system message. So this is at the point where 01:59:31.560 |
we've exceeded that 300 token limit and the memory type here 01:59:36.440 |
is generating those summaries. So that summary comes in as 01:59:40.120 |
this is a message and we can see, okay, the human named 01:59:43.240 |
James introduces himself and mentions he's researching 01:59:45.720 |
different types of conversational memory and so on 01:59:47.960 |
and so on. Right. Okay, cool. So we have that. Then let's come 01:59:53.480 |
down a little bit further. We can see, okay, so the summary 01:59:57.160 |
there. Okay, so that's what we have. That is 02:00:01.960 |
the implementation for the old version of this memory. Again, 02:00:07.880 |
we can see it's deprecated. So how do we implement this for 02:00:12.040 |
our more recent versions of LangChain and specifically 02:00:16.200 |
0.3? Well, again, we're using that runnable message history 02:00:20.840 |
and it looks a little more complicated than we were 02:00:24.360 |
getting before, but it's actually just, you know, it's 02:00:26.680 |
nothing too complex. We're just creating a summary as we 02:00:31.800 |
did with the previous memory type, but the decision for 02:00:36.360 |
adding to that summary is based on, in this case, actually the 02:00:39.960 |
number of messages. So I didn't go with the LangChain 02:00:43.960 |
version where it's a number of tokens. I don't like that. I 02:00:47.240 |
prefer to go with messages. So what I'm doing is saying, okay, 02:00:50.520 |
the last K messages. Okay. Once we exceed K messages, the 02:00:56.200 |
messages beyond that are going to be added to the memory. 02:01:00.280 |
Okay, cool. So let's see, we first initialize our 02:01:06.040 |
conversation summary buffer message history class with LLM 02:01:11.640 |
and K. Okay, so these two here. So LLM, of course, to create 02:01:15.320 |
summaries and K is just the limit of number of messages 02:01:18.360 |
that we want to keep before adding them to the summary or 02:01:21.560 |
dropping them from our messages and adding them to the summary. 02:01:24.920 |
Okay, so we will begin with, okay, do we have an existing 02:01:30.360 |
summary? So the reason we set this to none is we can't extract 02:01:36.840 |
the summary, the existing summary, unless it already 02:01:40.200 |
exists. And the only way we can do that is by checking, okay, 02:01:43.800 |
do we have any messages? If yes, we want to check if within 02:01:47.960 |
those messages, we have a system message because we're 02:01:50.440 |
doing the same structure as before, where the 02:01:53.720 |
first system message is actually our 02:01:56.840 |
summary. So that's what we're doing here. We're checking if 02:01:59.400 |
there is a summary message already stored within our 02:02:02.200 |
messages. Okay, so we're checking for that. If we find 02:02:08.600 |
it, we'll just do, we have this little print statement so we 02:02:11.080 |
can see that we found something and then we just make our 02:02:15.480 |
existing summary. I should actually move this to the first 02:02:20.920 |
instance here. Okay, so that existing summary will be set 02:02:26.920 |
to the first message. Okay, and this would be a system message 02:02:33.480 |
rather than a string. Cool, so we have that. Then we want to 02:02:39.640 |
add any new messages to our history. Okay, so we're extending 02:02:44.760 |
the history there, and then we're saying, okay, if the 02:02:47.560 |
length of our history exceeds the K value that we 02:02:51.480 |
set, we're going to say, okay, we found that many messages. 02:02:54.120 |
We're going to be dropping the latest. It's going to be the 02:02:56.040 |
latest two messages. This I will say here, one thing or one 02:03:01.640 |
problem with this is that we're not going to be saving that 02:03:04.840 |
many tokens if we're summarizing every two messages. 02:03:08.440 |
So what I would probably do is in an actual like production 02:03:13.480 |
setting, I would probably say let's go to twenty messages and 02:03:20.040 |
once we hit twenty messages, let's take the previous ten. 02:03:23.720 |
We're going to summarize them and put them into our summary 02:03:26.600 |
alongside any previous summary that already existed, but in 02:03:30.440 |
you know, this is also fine as well. Okay, so we say we found 02:03:36.600 |
those messages. We're going to drop the latest two messages. 02:03:40.760 |
Okay, so we pull the oldest messages out. I should say 02:03:46.200 |
not the latest. It's the oldest, not the latest. We want to 02:03:51.000 |
keep the latest and drop the oldest. So we pull out the 02:03:54.840 |
oldest messages and keep only the most recent messages. 02:03:59.240 |
Okay, then I'm saying, okay, if we don't have any old 02:04:03.720 |
messages to summarize, we don't do anything. We just return. 02:04:07.560 |
Okay, so this indicates that this has not been triggered. We 02:04:11.880 |
would hit this, but in the case this has been triggered and we 02:04:17.000 |
do have old messages, we're going to come to here. Okay, so 02:04:22.760 |
this is we can see we have a system message prompt template 02:04:26.760 |
saying: given the existing conversation summary and the new 02:04:29.480 |
messages, generate a new summary of the conversation, 02:04:32.520 |
ensuring to maintain as much relevant information as 02:04:34.760 |
possible. So if we want to be more conservative with tokens, 02:04:38.040 |
we could modify this prompt here to say keep the summary to 02:04:42.360 |
within the length of a single paragraph, for example, and 02:04:46.680 |
then we have our human message prompt template, which can 02:04:49.240 |
say, okay, here's the existing conversation summary and here 02:04:51.960 |
are new messages. Now, new messages here is actually the 02:04:55.160 |
old messages, but the way that we're framing it to the LLM 02:04:59.400 |
here is that we want to summarize the whole conversation, 02:05:02.680 |
right? It doesn't need to have the most recent messages that 02:05:05.000 |
we're storing within our buffer. It doesn't need to know 02:05:08.600 |
about those. That's irrelevant to the summary. So we just tell 02:05:11.560 |
it that we have these new messages and as far as this LLM 02:05:14.280 |
is concerned, this is like the full set of interactions. Okay, 02:05:18.600 |
so then we would format those and invoke our LLM and then 02:05:23.800 |
we'll print out our new summary so we can see what's going on 02:05:26.360 |
there and we would prepend that new summary to our 02:05:31.640 |
conversation history. Okay, and this will work so we can just 02:05:37.240 |
prepend it like this because we've already popped. Where was 02:05:43.640 |
it up here? If we have an existing summary, we already 02:05:48.600 |
popped that from the list. It's already been pulled out of 02:05:50.520 |
that list. So it's okay for us to just prepend it; we don't need 02:05:54.760 |
to do anything else, because we've already dropped 02:05:58.280 |
that initial system message if it existed. Okay, and then we 02:06:01.960 |
have the clear method as before. So that's all of the 02:06:05.640 |
logic for our conversational summary buffer memory. We 02:06:12.200 |
redefine our get chat history function with the LLM and K 02:06:18.760 |
parameters there and then we'll also want to set the 02:06:21.480 |
configurable fields again. So those are going to be 02:06:25.080 |
session_id, llm, and k. 02:06:32.280 |
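Pulling the pieces described above together, the class looks roughly like this. It is a sketch of the logic as described, keeping the last k messages verbatim and folding anything older into a summary; the notebook's exact prints and prompt wording will differ.

```python
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field


class ConversationSummaryBufferMessageHistory(BaseChatMessageHistory, BaseModel):
    """Keep the k most recent messages verbatim; summarize everything older."""
    messages: list[BaseMessage] = Field(default_factory=list)
    llm: ChatOpenAI
    k: int = 4

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # If a summary already exists, it is stored as the first system message.
        existing_summary = None
        if self.messages and isinstance(self.messages[0], SystemMessage):
            existing_summary = self.messages.pop(0)

        # Add the new messages to the buffer.
        self.messages.extend(messages)

        # If we are within the limit, put the summary back and stop here.
        if len(self.messages) <= self.k:
            if existing_summary:
                self.messages = [existing_summary] + self.messages
            return

        # Otherwise, pull out the oldest messages beyond the most recent k...
        old_messages = self.messages[:-self.k]
        self.messages = self.messages[-self.k:]

        # ...and fold them, plus any existing summary, into a new summary.
        summary_prompt = ChatPromptTemplate.from_messages([
            SystemMessagePromptTemplate.from_template(
                "Given the existing conversation summary and the new messages, "
                "generate a new summary of the conversation, ensuring to "
                "maintain as much relevant information as possible."
            ),
            HumanMessagePromptTemplate.from_template(
                "Existing conversation summary:\n{existing_summary}\n\n"
                "New messages:\n{old_messages}"
            ),
        ])
        new_summary = self.llm.invoke(
            summary_prompt.format_messages(
                existing_summary=existing_summary.content if existing_summary else "",
                old_messages="\n".join(x.content for x in old_messages),
            )
        )
        # Prepend the new summary as a single system message.
        self.messages = [SystemMessage(content=new_summary.content)] + self.messages

    def clear(self) -> None:
        self.messages = []
```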
Okay, so now we can invoke. The k value to begin with is going to be four, and you can see no 02:06:37.880 |
old messages to update summary with. That's good. Let's invoke 02:06:42.520 |
this a few times and let's see what we get. Okay, so no old messages at first, then: 02:06:51.540 |
found six messages, dropping the oldest two, and then we have the new 02:06:55.460 |
summary: in the conversation, James introduces himself and 02:06:57.700 |
is interested in researching different types of 02:07:00.180 |
conversational memory. Right, so you can see there's quite a lot 02:07:03.220 |
in here at the moment. So we would definitely want to prompt 02:07:07.940 |
the LLM, the summary LLM, to keep that short. Otherwise, we're 02:07:12.100 |
just getting a ton of stuff right, but we can see that that 02:07:16.820 |
is you know it's it's working. It's functional. So let's go 02:07:20.500 |
back and see if we can prompt it to be a little more concise. 02:07:23.940 |
So we come to here, and after "maintaining as much relevant 02:07:27.460 |
information as possible", we add: however, we need to keep our 02:07:34.980 |
summary concise; the limit is a single short paragraph. Okay, 02:07:45.060 |
something like this. Let's try and let's see what we get with 02:07:48.980 |
that. Okay, so message one again and nothing to update. 02:07:54.100 |
See this so new summary you can see it's a bit shorter. It 02:07:57.700 |
doesn't have all those bullet points. Okay, so that seems 02:08:04.900 |
better. Let's see so you can see the first summary is a bit 02:08:09.620 |
shorter, but then as soon as we get to the second and third 02:08:13.700 |
summaries, the second summary is actually slightly longer than 02:08:16.980 |
the third one. Okay, so we're going to be 02:08:20.260 |
losing a bit of information, in this case more than we were 02:08:23.460 |
before, but we're saving a ton of tokens. So that's of course 02:08:27.460 |
a good thing and of course we could keep going and adding 02:08:30.500 |
many interactions here, and we should see that this 02:08:33.460 |
conversation summary should maintain that sort of 02:08:37.220 |
length of around one short paragraph. So that is it for 02:08:43.220 |
this chapter on conversational memory. We've seen a few 02:08:47.300 |
different memory types. We've implemented the old deprecated 02:08:51.140 |
versions so we can see what they were like and then we've 02:08:55.060 |
reimplemented them for the latest versions of LangChain, 02:08:58.500 |
and, to be honest, using logic where we are getting much more 02:09:02.740 |
into the weeds. In some ways that complicates 02:09:07.300 |
things, that is true, but in other ways it gives us a ton of 02:09:10.900 |
control so we can modify those memory types as we did with 02:09:14.180 |
that final summary buffer memory type. We can modify 02:09:17.940 |
those to our liking, which is incredibly useful when you're 02:09:23.060 |
actually building applications for the real world. So that is 02:09:26.340 |
it for this chapter. We'll move on to the next one. In this 02:09:29.780 |
chapter, we are going to introduce agents. Now, agents, I 02:09:34.820 |
think, are one of the most important components in the 02:09:39.300 |
world of AI and I don't see that going away anytime soon. 02:09:43.140 |
I think the majority of AI applications, the intelligent 02:09:49.220 |
part of those, will almost always be an implementation of an 02:09:53.380 |
AI agent or multiple AI agents. So in this chapter, we are just 02:09:57.940 |
going to introduce agents within the context of lang 02:10:01.780 |
chain. We're going to keep it relatively simple. We're going 02:10:05.540 |
to go into much more depth in agents in the next chapter 02:10:10.500 |
where we'll do a bit of a deep dive, but we'll focus on just 02:10:14.260 |
introducing the core concepts and of course agents within 02:10:18.900 |
lang chain here. So jumping straight into our notebook, 02:10:24.500 |
let's run our prerequisites. You'll see that we do have an 02:10:28.660 |
additional prerequisite here, which is Google search results. 02:10:31.780 |
That's because we're going to be using SerpAPI to allow 02:10:35.940 |
our LLM, as an agent, to search the web, which is one of the 02:10:41.700 |
great things about agents: they can do all of these 02:10:44.420 |
additional things that an LLM by itself obviously cannot. So 02:10:48.420 |
we'll come down to here. We have our langsmith parameters 02:10:51.700 |
again, of course. So you enter your LangChain API key if you 02:10:54.900 |
have one and now we're going to take a look at tools, which is 02:10:59.380 |
a very essential part of agents. So tools are a way for 02:11:04.740 |
us to augment our LLMs with essentially anything that we 02:11:08.900 |
can write in code. So we mentioned that we're going to 02:11:12.420 |
have a Google search tool. That Google search tool is some 02:11:15.860 |
code that gets executed by our LLM in order to search Google 02:11:20.180 |
and get some results. So a tool can be thought of as any code 02:11:25.620 |
logic, or any function in the case of Python, a function 02:11:31.380 |
that has been formatted in a way so that our LLM can 02:11:34.900 |
understand how to use it and then actually use it. Although 02:11:39.860 |
the LLM itself is not using the tool; it's more our agent 02:11:44.740 |
execution logic which uses the tool for the LLM. So we're 02:11:49.220 |
going to go ahead and actually create a few simple tools. 02:11:52.740 |
We're going to be using what is called the tool decorator from 02:11:55.380 |
LangChain, and there are a few things to keep in mind when 02:12:00.100 |
we're building tools. So for optimal performance, our tool 02:12:04.100 |
needs to be just very readable and what I mean by readable is 02:12:07.780 |
we need three main things. One is a docstring that is written in 02:12:12.660 |
natural language, and it is going to be used to explain to 02:12:15.860 |
the LLM when, why, and how it should use this tool. We should 02:12:21.460 |
also have clear parameter names. Those parameter names 02:12:25.460 |
should tell the LLM what each one of these parameters 02:12:29.780 |
are. They should be self explanatory. If they are not 02:12:33.060 |
self explanatory, we should be including an explanation for 02:12:37.860 |
those parameters within the docstring. Then finally, we 02:12:41.220 |
should have type annotations for both our parameters and 02:12:44.740 |
also what we're returning from the tool. So let's jump in and 02:12:49.060 |
see how we would implement all of that. So come down here and 02:12:52.820 |
we have, from langchain_core.tools, import tool. Okay. So these are 02:12:57.380 |
just four incredibly simple tools. We have the addition or 02:13:02.020 |
add tool, multiply, exponentiate, and the subtract 02:13:05.780 |
tools. Okay, so a few calculator-esque tools. 02:13:11.780 |
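Those tools look something like this; a minimal sketch with the docstrings and type annotations that the LLM relies on.

```python
from langchain_core.tools import tool


@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y' together."""
    return x + y


@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y' together."""
    return x * y


@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y


@tool
def subtract(x: float, y: float) -> float:
    """Subtract 'y' from 'x'."""
    return x - y
```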
Now, when we add this tool decorator, it turns each of these functions 02:13:17.140 |
into what we call a structured tool object. So you can see 02:13:20.980 |
that here. We can see we have this structured tool. We have a 02:13:26.180 |
name description. Okay. And then we have this schema. We'll 02:13:30.340 |
see this in a moment and a function right. So this 02:13:32.660 |
function is literally just the original function. It's a 02:13:36.660 |
mapping to the original function. So in this case, it's 02:13:39.700 |
the add function. Now the description we can see it's 02:13:42.820 |
coming from our dot string and of course the name as well is 02:13:46.740 |
just coming from the function name. Okay. And then we can 02:13:50.020 |
also see, let's just print the name and description, but then 02:13:54.420 |
we can also see the args schema, right. This is the 02:13:58.660 |
thing here that we can't read at the moment; to read it, we're 02:14:02.180 |
just going to look at the model JSON schema method, and then we 02:14:06.980 |
can see what that contains. 02:14:09.220 |
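For example, inspecting the add tool from the sketch above might look like this:

```python
# The decorator produced a StructuredTool; inspect what the LLM will see.
print(add.name)         # add
print(add.description)  # Add 'x' and 'y' together.

# The args schema is a Pydantic model; dump it as a JSON schema to read it.
print(add.args_schema.model_json_schema())
```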
So this actually contains everything, including the 02:14:12.260 |
properties. So we have the X. It creates a sort of title for 02:14:16.100 |
that and it also specifies the type. Okay. So the type that we 02:14:20.660 |
defined is float, which for OpenAI, I guess, gets mapped to 02:14:25.300 |
number rather than just being float, and then we also see that 02:14:28.900 |
we have this required field. So this is telling our LLM which 02:14:33.140 |
parameters are required and which ones are optional. In some 02:14:36.820 |
cases you might want an optional parameter, and we can do that here. Let's add 02:14:42.180 |
Z. That is going to be float or None. Okay. And we're just 02:14:48.340 |
going to say it defaults to 0.3. Alright. I'm going to remove 02:14:53.460 |
this in a minute because it's kind of weird, but let's just 02:14:57.140 |
see what that looks like. So you see that we now have X, Y, 02:15:02.020 |
and Z, but then in Z, we have some additional information. 02:15:06.580 |
Okay. So it can be any of it can be a number or it can just 02:15:10.020 |
be nothing. The default value for that is 0.3. Okay. And then 02:15:15.060 |
if we look here, we can see that the required field does 02:15:18.020 |
not include Z. So it's just X and Y. So it's describing the 02:15:22.980 |
full function schema for us, but let's remove that. Okay. And 02:15:28.180 |
we can see that again with our exponentiate tool similar 02:15:32.420 |
thing. Okay. So how are we going to invoke our tool? So 02:15:39.060 |
the LLM the underlying LLM is actually going to generate a 02:15:42.900 |
string. Okay. So it will look something like this. This is 02:15:46.660 |
going to be our LLM output. So it's a string that is 02:15:51.780 |
some JSON and of course to load a string into a dictionary 02:15:57.300 |
format, we just use JSON loads. Okay. So let's see that. So 02:16:03.220 |
this could be the output from our LLM. We load it into a 02:16:06.180 |
dictionary and then we get an actual dictionary. And then 02:16:09.620 |
what we would do is take our exponentiate tool, 02:16:14.820 |
access the underlying function, and then pass it the keyword 02:16:19.220 |
arguments from our dictionary here. And that will execute our tool. 02:16:26.200 |
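In code, that execution step is roughly the following, using the exponentiate tool from the earlier sketch:

```python
import json

# Example of the kind of string the LLM generates for a tool call.
llm_output_string = '{"x": 5.0, "y": 2.0}'

# Load the JSON string into a dictionary...
llm_output_dict = json.loads(llm_output_string)

# ...then call the tool's underlying function with those keyword arguments.
result = exponentiate.func(**llm_output_dict)
print(result)  # 25.0
```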
That is the tool execution logic that 02:16:29.000 |
LangChain implements, and then later on, in the next chapter, 02:16:32.520 |
we'll be implementing ourselves. Cool. So let's move 02:16:35.560 |
on to creating an agent. Now, we're going to be 02:16:38.680 |
constructing a simple tool calling agent. We're going to 02:16:41.880 |
be using LangChain Expression Language to do this. Now, we 02:16:45.720 |
will be covering LangChain Expression Language, or LCEL, 02:16:49.400 |
more in an upcoming chapter, but for now, all we need to know is 02:16:54.600 |
that our agent will be constructed using syntax and 02:16:58.840 |
components like this. So, we would start with our input 02:17:02.760 |
parameters. That is going to include our user query and of 02:17:06.040 |
course, the chat history because we need our agent to be 02:17:09.080 |
conversational and remember previous interactions within 02:17:11.720 |
the conversation. These input parameters will also include a 02:17:15.800 |
placeholder for what we call the agent scratch pad. Now, the 02:17:18.680 |
agent scratch pad is essentially where we are 02:17:21.240 |
storing the internal thoughts or the internal dialogue of the 02:17:25.400 |
agent as it is using tools and getting observations from those 02:17:28.280 |
tools and working through those multiple internal steps. So, in 02:17:34.040 |
the case that we will see, it will be using, for example, the 02:17:36.760 |
addition tool, getting the result, using the multiply tool, 02:17:39.720 |
getting the result, and then providing a final answer 02:17:42.760 |
to the user. So, let's jump in and see what it looks 02:17:46.680 |
like. Okay, so we'll just start with defining our prompt. So, 02:17:50.360 |
our prompt is going to include the system message. That's 02:17:53.480 |
nothing. We're not putting anything special in there. 02:17:56.680 |
We're going to include the chat history which is a messages 02:18:01.160 |
placeholder. Then, we include our human message and then we 02:18:05.320 |
include a placeholder for the agent scratch pad. Now, the way 02:18:08.760 |
that we implement this later is going to be slightly different 02:18:12.040 |
for the scratch pad. We'd actually use this messages 02:18:14.200 |
placeholder but this is how we use it with the built-in 02:18:17.400 |
create tool calling agent from LangChain. Next, we'll define our 02:18:21.240 |
LLM. We do need our OpenAI API key for that. So, we'll 02:18:24.920 |
enter that here like so. Okay, so come down. Okay, so we're 02:18:30.120 |
going to be creating this agent. We need conversation 02:18:33.240 |
memory and we are going to use the older conversation buffer 02:18:36.280 |
memory class rather than the newer runnable with message 02:18:39.080 |
history class. That's just because we're also using this 02:18:42.200 |
older create tool calling agent and this is the 02:18:46.760 |
older way of doing things. In the next chapter, we are going 02:18:50.040 |
to be using the more recent basically what we already 02:18:54.600 |
learned on chat history. We're going to be using all of that 02:18:57.720 |
to implement our chat history but for now, we're going to be 02:19:00.520 |
using the older method which is deprecated just as a pre 02:19:04.760 |
warning, but again, as I mentioned at the very start of the 02:19:08.200 |
course, we're starting abstract and then getting into the 02:19:11.720 |
details. So, we're going to initialize our agent. For that, 02:19:15.960 |
we need these four things: the LLM as we defined, tools as we have 02:19:20.440 |
defined, the prompt as we have defined, and then the memory, 02:19:24.520 |
which is our old conversation buffer memory. So, with all of 02:19:29.400 |
that, we are going to go ahead and create a tool calling 02:19:32.360 |
agent, and we just provide it with everything. 02:19:36.120 |
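That construction looks roughly like this; a sketch using the tools from the earlier sketch and the deprecated memory class, with illustrative prompt wording.

```python
from langchain.agents import create_tool_calling_agent
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You're a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
)

tools = [add, subtract, multiply, exponentiate]

# The agent decides which tool to call; it does not execute the tool itself.
agent = create_tool_calling_agent(llm=llm, tools=tools, prompt=prompt)
```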
Okay, there we go. Now, you'll see here I didn't pass in the memory; I'm 02:19:41.400 |
passing it in down here instead. So, we're going to 02:19:44.920 |
start with this question which is what is 10.7 multiplied by 02:19:48.680 |
7.68. Okay. So, given the precision of these numbers, our 02:19:57.240 |
normal LLM would not be able to answer that. Almost definitely 02:20:02.360 |
would not be able to answer that correctly. We need a 02:20:04.920 |
external tool to answer that accurately and we'll see that 02:20:08.520 |
that is exactly what it's trying to do. So, we can see 02:20:12.440 |
that the tool agent action message here. We see that it 02:20:17.800 |
decided, okay, I'm going to use the multiply tool and here are 02:20:20.520 |
the parameters I want to use for that tool. Okay, we can see 02:20:23.720 |
X is 10.7 and Y is 7.68. You can see here that this is 02:20:28.760 |
already a dictionary, and that is because LangChain has 02:20:33.320 |
taken the string from our LLM call and already converted it 02:20:37.880 |
into a dictionary for us. Okay, so that's just it's happening 02:20:41.240 |
behind the scenes there and you can actually see if we go into 02:20:44.840 |
the details a little bit, we can see that we have these 02:20:46.840 |
arguments and this is the original string that was coming 02:20:49.400 |
from our LLM. Okay, which has already been, of course, 02:20:52.680 |
processed by LangChain. So, we have that. Now, the one thing 02:20:52.680 |
missing here is that, okay, we've got that the LLM wants 02:21:03.800 |
us to use multiply and we've got what the LLM wants us to 02:21:06.760 |
put into multiply but where's the answer, right? There is no 02:21:11.160 |
answer because the tool itself has not been executed because 02:21:14.840 |
it can't be executed by the LLM but then, okay, didn't we 02:21:19.640 |
already define our agent here? Yes, we defined the part of our 02:21:24.760 |
agent. That is, our LLM has our tools and it is going to 02:21:29.240 |
generate which tool to use, but it actually doesn't include the 02:21:33.880 |
agent execution part. The agent executor is a 02:21:40.360 |
broader thing. It's broader logic like just code logic 02:21:44.520 |
which acts as a scaffolding within which we have the 02:21:48.600 |
iteration through multiple steps of our LLM calls followed 02:21:53.560 |
by the LLM outputting what tool to use followed by us 02:21:57.320 |
actually executing that for the LLM and then providing the 02:22:01.400 |
output back into the LLM for another decision or another 02:22:05.480 |
step. So, the agent itself here is not the full agentic flow 02:22:12.440 |
that we might expect. Instead, for that, we need to implement 02:22:16.440 |
this agent executor class. This agent executor includes our 02:22:20.840 |
agent from before. Then, it also includes the tools and one 02:22:25.160 |
thing here is, okay, we already passed the tools to our agent. 02:22:27.800 |
Why do we need to pass them again? Well, the tools being 02:22:30.760 |
passed to our agent up here, that is being used. So, that is 02:22:36.280 |
essentially extracting out those function schemas and 02:22:39.240 |
passing it to our LLM so that our LLM knows how to use the 02:22:41.880 |
tools. Then, we're down here. We're passing the tools again 02:22:44.840 |
to our agent executor and this is rather than looking at how 02:22:48.920 |
to use those tools. This is just looking at, okay, I want 02:22:51.880 |
the functions for those tools so that I can actually execute 02:22:54.440 |
them for the LLM or for the agent. Okay, so that's what is 02:22:58.760 |
happening there. Now, we can also pass in our memory 02:23:02.440 |
directly. So, you see, if we scroll up a little bit here, I 02:23:06.600 |
actually had to pass in the memory like this with our agent. 02:23:11.720 |
That's just because we weren't using the agent executor. Now, 02:23:14.120 |
we have the agent executor. It's going to handle that for 02:23:16.200 |
us and another thing that's going to handle for us is 02:23:19.880 |
intermediate steps. So, you'll see in a moment that when we 02:23:23.960 |
invoke the agent executor, we don't include the intermediate 02:23:26.600 |
steps and that's because that is already handled by the 02:23:29.800 |
agent executor now. So, we'll come down. We'll set verbose 02:23:34.360 |
equal to true so we can see what is happening and then we 02:23:38.200 |
can see here, there's no intermediate steps anymore and 02:23:42.360 |
we do still pass in the chat history like this but then the 02:23:47.480 |
addition of those new interactions to our memory is 02:23:50.520 |
going to be handled by the executor. So, in fact, let me 02:23:54.920 |
actually show that very quickly before we jump in. Okay, so 02:23:59.320 |
that's currently empty. We're going to execute this. 02:24:03.400 |
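For reference, the executor construction and invocation pattern being run here is roughly this, continuing from the agent sketch above:

```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,    # here the executor needs the actual functions to run
    memory=memory,  # new interactions are written back to memory for us
    verbose=True,
)

# Chat history is still passed in, but its upkeep is handled by the executor.
agent_executor.invoke({
    "input": "What is 10.7 multiplied by 7.68?",
    "chat_history": memory.chat_memory.messages,
})
```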
Okay, we've entered the new agent executor chain, and let's 02:24:07.300 |
just have a quick look at our messages again and now you can 02:24:10.980 |
see that agent executor automatically handled the 02:24:13.940 |
addition of our human message and then the responding AI 02:24:17.700 |
message for us. Okay, which is useful. Now, what happened? So, 02:24:23.140 |
we can see that the multiply tool was invoked with these 02:24:26.820 |
parameters and then this pink text here that we got, that is 02:24:30.900 |
the observation from the tool. So, it's what the tool output 02:24:33.700 |
back to us, okay? Then, this final message here is not 02:24:37.140 |
formatted very nicely but this final message here is coming 02:24:40.420 |
from our LLM. So, the green is our LLM output. The pink is our 02:24:46.420 |
tool output, okay? So, the LLM after seeing this output says 02:24:53.700 |
10.7 multiplied by 7.68 is approximately 82.18. Okay, 02:25:01.220 |
cool. Useful and then we can also see that the chat history 02:25:04.500 |
which we already just saw. Great. So, that has been used 02:25:08.980 |
correctly. We can just also confirm that that is correct. 02:25:13.220 |
82.1759 recurring which is exactly what we get here. Okay 02:25:18.740 |
and we the reason for that is obviously our multiply tool is 02:25:22.340 |
just doing this exact operation. Cool. So, let's try 02:25:28.100 |
this with a bit of memory. So, I'm going to ask or I'm going 02:25:31.700 |
to state to the agent. Hello, my name is James. We'll leave 02:25:36.980 |
that there; it's not actually the first interaction, because 02:25:40.100 |
we already have these, but it's an early interaction with my 02:25:45.860 |
name in there. Then, we're going to try and perform 02:25:49.460 |
multiple tool calls within a single execution loop and what 02:25:52.500 |
you'll see when it is calling these tools is that it 02:25:55.220 |
can actually use multiple tools in parallel. So, for sure, I 02:25:58.420 |
think two or three of these were used in parallel, and then 02:26:01.460 |
the final subtract had to wait for those previous results. So, 02:26:05.220 |
it would have been executed afterwards and we should 02:26:08.420 |
actually be able to see this in Langsmith. So, if we go here, 02:26:13.220 |
yeah, we can see that we have this initial call and then we 02:26:17.060 |
have add, multiply, and exponentiate all used in parallel. 02:26:20.100 |
Then, we have another call which uses subtract, and then we 02:26:22.820 |
get the response. Okay, which is pretty cool and then the 02:26:27.620 |
final result there is negative eleven. Now, when you look at 02:26:32.420 |
whether the answer is accurate, I think the order here of 02:26:37.300 |
calculations is not quite correct. So, if we put the 02:26:41.380 |
actual computation here, it gets it right but otherwise, if 02:26:45.620 |
I use natural language, maybe I'm 02:26:48.260 |
phrasing it in a poor way. Okay, so, I suppose that is 02:26:53.780 |
pretty important. So, okay, if we put the computation in here, 02:26:57.940 |
we get the negative thirteen. So, it's something to be 02:27:01.460 |
careful with and probably requires a little bit of 02:27:04.660 |
prompting, and maybe examples, in order to get 02:27:08.020 |
that smooth so that it does do things in the way that we might 02:27:12.740 |
expect or maybe we as humans are just bad and misuse the 02:27:17.140 |
systems one or the other. Okay, so now, we've gone through that 02:27:21.460 |
a few times. Let's go and see if our agent can still recall 02:27:24.420 |
our name. Okay and it remembers my name is James. Good. So, it 02:27:28.500 |
still has that memory in there as well. That's good. Let's 02:27:32.020 |
move on to another quick example where we're just going 02:27:35.220 |
to use Google Search. So, we're going to be using the 02:27:37.700 |
SerpAPI. You can get the API key that you need 02:27:43.540 |
from here, so serpapi.com slash users slash sign-in, and 02:27:48.340 |
just enter that in here. You get up to 100 02:27:52.900 |
searches per month for free, so just be aware of that if 02:27:58.100 |
you overuse it. I don't think they charge you, because I don't 02:28:01.300 |
think you enter your card details straight away, but yeah, 02:28:05.060 |
just be aware of that limit. Now, there are certain tools 02:28:10.180 |
that LangChain has already built for us. So, they're 02:28:12.740 |
pre-built tools and we can just load them using the load tools 02:28:15.860 |
function. So, we do that like so. We have our load tools and 02:28:19.300 |
we just pass in the SerpAPI tool only. We can pass in more 02:28:22.980 |
there if we want to, and then we also pass in our LLM. Now, I'm 02:28:27.940 |
going to one, use that tool but I'm also going to define my 02:28:31.700 |
own tool which is to get the current location based on the 02:28:35.380 |
IP address. Now, we're in Colab at the moment. So, 02:28:37.860 |
it's actually going to get the IP address for the Colab 02:28:40.340 |
instance that I'm currently on and we'll find out where that 02:28:43.380 |
is. So, that is going to get the IP address and then it's 02:28:47.620 |
going to provide the data back to our LLM in this format here: 02:28:50.820 |
latitude, longitude, city, and 02:28:53.060 |
country. Okay? We're also going to get the current date and time. 02:28:56.660 |
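A sketch of that setup, assuming the llm defined earlier. The ipinfo.io endpoint and the exact return format are illustrative assumptions, and depending on your version load_tools may live in langchain_community instead.

```python
from datetime import datetime

import requests
from langchain.agents import load_tools
from langchain_core.tools import tool

# Prebuilt SerpAPI tool (requires the SERPAPI_API_KEY environment variable).
serpapi_tools = load_tools(["serpapi"], llm=llm)


@tool
def get_location_from_ip() -> str:
    """Get the geographical location based on the current IP address."""
    # ipinfo.io is one option for IP geolocation; the endpoint here is an assumption.
    data = requests.get("https://ipinfo.io/json").json()
    latitude, longitude = data["loc"].split(",")
    return (
        f"latitude: {latitude}, longitude: {longitude}, "
        f"city: {data['city']}, country: {data['country']}"
    )


@tool
def get_current_datetime() -> str:
    """Get the current date and time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


tools = serpapi_tools + [get_location_from_ip, get_current_datetime]
```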
So now we're going to redefine our prompt. I'm not 02:29:02.500 |
going to include chat history here. I just want this to be 02:29:04.820 |
like a one-shot thing. I'm going to redefine our agent and 02:29:09.300 |
agent executor using our new tools, which is our SerpAPI tool plus 02:29:13.780 |
the get current date time and get location from IP. Then, 02:29:17.780 |
I'm going to invoke our agent executor with I have a few 02:29:20.900 |
questions. What is the date and time right now? How is the 02:29:23.780 |
weather where I am? And please give me degrees in Celsius, so 02:29:28.740 |
when it gives me that weather, it's in Celsius. Okay, let's see what we get. 02:29:33.780 |
Okay. So, apparently, we're in Council Bluffs in the US. It is 02:29:40.680 |
13 degrees Fahrenheit which I think is absolutely freezing. 02:29:44.440 |
Oh my gosh, it is. Yes, minus ten. So, it's super cold over 02:29:48.760 |
there. And you can see that, okay, it did give us 02:29:53.000 |
Fahrenheit. So, that's that is because the tool that we're 02:29:55.320 |
using provided us with Fahrenheit which is fine but it 02:29:59.960 |
did translate that over into a estimate of Celsius for us 02:30:03.800 |
which is pretty cool. So, let's actually output that. So, we 02:30:07.640 |
get this, which is correct, the US, and approximately this temperature, 02:30:13.640 |
and we also get a description of the conditions: partly 02:30:17.240 |
cloudy with 0% precipitation lucky for them and humidity of 02:30:23.720 |
66%. Okay. All pretty cool. So, that is it for this 02:30:27.800 |
introduction to Langchain Agents. As I mentioned, next 02:30:31.080 |
chapter, we're going to dive much deeper into Agents and 02:30:34.120 |
also implement that for Langchain version 0.3. So, 02:30:37.880 |
we'll leave this chapter here and jump into the next one. In 02:30:41.320 |
this chapter, we're going to be taking a deep dive into Agents 02:30:45.800 |
with LangChain, and we're going to be covering what an 02:30:50.840 |
agent is. We're going to talk a little bit conceptually about 02:30:55.640 |
agents, the React agent, and the type of agent that we're 02:30:59.320 |
going to be building and based on that knowledge, we are 02:31:02.120 |
actually going to build out our own agent execution logic 02:31:07.880 |
which we refer to as the agent executor. So, in comparison to 02:31:12.680 |
the previous video on agents in Langchain which is more of an 02:31:17.240 |
introduction, this is far more detailed. We'll be getting into 02:31:21.480 |
the weeds a lot more with both what agents are and also agents 02:31:26.200 |
within Langchain. Now, when we talk about agents, a 02:31:30.280 |
significant part of the agent is actually relatively simple 02:31:36.520 |
code logic that iteratively runs LLM calls and processes 02:31:44.040 |
their outputs, potentially running or executing tools. The 02:31:48.760 |
exact logic for each approach to building an agent will 02:31:53.400 |
actually vary pretty significantly, but we'll focus 02:31:57.560 |
on one of those which is the React agent. Now, React is a 02:32:03.160 |
very common pattern and although being relatively old 02:32:07.560 |
now, most of the tool agents that we see used by OpenAI and 02:32:13.320 |
essentially every LLM company, they all use a very similar 02:32:17.240 |
pattern. Now, the React agent follows a pattern like this. 02:32:20.920 |
Okay, so we would have our user input up here. Okay, so our 02:32:26.760 |
input here is a question, right? Aside from the Apple 02:32:29.160 |
Remote, what other device can control the program the Apple 02:32:31.720 |
Remote was originally designed to interact with? Now, probably 02:32:35.400 |
most LLMs would actually be able to answer this directly 02:32:37.640 |
now. This is from the paper, which was a few years back. Now, 02:32:42.600 |
in this scenario, assuming our LLM didn't already know the 02:32:46.360 |
answer, there are multiple steps an LLM or an agent might 02:32:50.280 |
take in order to find out the answer. Okay, so first of 02:32:55.000 |
those is, we say our question here is: what other device can 02:32:59.160 |
control the program the Apple Remote was originally designed 02:33:01.800 |
to interact with? So the first thing is, okay, what was the 02:33:05.240 |
program that the Apple Remote was originally designed to 02:33:07.800 |
interact with? That's the first question we have here. So what 02:33:12.360 |
we do is: I need to search Apple Remote and find the program 02:33:15.240 |
it was designed to interact with. This is a reasoning step. So the LLM is 02:33:18.840 |
reasoning about what it needs to do: I need to search for 02:33:22.040 |
that and find the relevant program. So we are taking an 02:33:26.200 |
action. This is a tool call here. Okay, so we're going to 02:33:29.480 |
use the search tool and our query will be Apple remote and 02:33:33.000 |
the observation is the response we get from executing that 02:33:36.120 |
tool. Okay, so the response here will be: the Apple Remote 02:33:39.000 |
is designed to control the Front Row media center program. So now 02:33:43.320 |
we know the program the Apple Remote was originally designed 02:33:45.720 |
to interact with. Now we're going to go through another 02:33:49.480 |
iteration. Okay, so this is one iteration of our reasoning 02:33:55.160 |
action, and observation. So when we're talking about ReAct 02:33:59.960 |
here, although again, this sort of pattern is very common 02:34:03.640 |
across many agents, when we're talking about ReAct, the name 02:34:07.880 |
actually comes from the 'Re' of 02:34:12.360 |
reasoning followed by 'Act' from action. Okay, so that's where the ReAct name 02:34:17.080 |
comes from. So this is one of our ReAct agent loops or 02:34:21.400 |
iterations. We're going to go and do another one. So next 02:34:25.000 |
step we have this information; the LLM is now provided with 02:34:27.640 |
it. Now we want to do a search for Front Row. 02:34:31.800 |
Okay, so we do that. This is the reasoning step. We perform 02:34:35.960 |
the action: search Front Row. Okay, tool: search, query: Front 02:34:40.680 |
Row. Observation, this is the response: Front Row is controlled 02:34:44.600 |
by an Apple Remote or keyboard function keys. Alright, cool. 02:34:50.120 |
So we know keyboard function keys are the other device that 02:34:53.880 |
we were asking about up here. So now we have all the 02:34:58.600 |
information we need. We can provide an answer to our user. 02:35:02.760 |
So we go through another iteration here reasoning and 02:35:07.240 |
action. Our reasoning is I can now provide the answer of 02:35:11.400 |
keyboard function keys to the user. Okay, great. So then we 02:35:16.440 |
use the answer tool. It's like final answer in more common 02:35:21.960 |
tool agent use and the answer would be keyboard function 02:35:27.000 |
keys, which we then output to our user. Okay, so that is the 02:35:33.720 |
ReAct loop. Okay, so looking at this, where are we actually 02:35:40.020 |
calling an LLM, and in what way are we actually calling an LLM? 02:35:44.820 |
So we have our reasoning step: our LLM is generating the text 02:35:50.900 |
here, right? The LLM is generating, okay, what should I 02:35:53.700 |
do? Then our LLM is going to generate the input parameters 02:35:59.620 |
to our action step here. Those input parameters and 02:36:05.460 |
the tool being used will be taken by our code logic, our 02:36:08.580 |
agent executor logic, and they will be used to execute some 02:36:11.940 |
code from which we will get an output. That output might be 02:36:16.180 |
taken directly to our observation or our LLM might 02:36:19.460 |
take that output and then generate an observation based 02:36:22.500 |
on that. It depends on how you've implemented everything. 02:36:27.380 |
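To make that loop concrete, here is a minimal, hypothetical sketch of the kind of code logic being described. The `llm.generate` helper and the `step.tool` / `step.args` fields are placeholders, not LangChain APIs; we build the real thing later in this chapter.

```python
# Pseudocode sketch of the ReAct-style loop described above. The names
# `llm.generate`, `step.tool`, and `step.args` are hypothetical placeholders.
def react_loop(llm, tools: dict, query: str, max_iterations: int = 3):
    scratchpad = []  # reasoning / action / observation steps so far
    for _ in range(max_iterations):
        step = llm.generate(query=query, scratchpad=scratchpad)  # reasoning + action
        if step.tool == "final_answer":
            return step.args["answer"]               # the agent is done
        observation = tools[step.tool](**step.args)  # execute the chosen tool
        scratchpad.append((step, observation))       # feed the observation back in
    return None  # iteration limit hit without a final answer
```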
So our LLM could potentially be being used at every single 02:36:32.660 |
step there and of course that will repeat through every 02:36:37.860 |
iteration. So we have further iterations down here. So you're 02:36:41.540 |
potentially using an LLM multiple times throughout this 02:36:44.740 |
whole process, which of course in terms of latency and token 02:36:48.020 |
cost, it does mean that you're going to be paying more for an 02:36:52.100 |
agent than you are with just a standard LLM, but that is of 02:36:55.940 |
course expected because you have all of these different 02:36:58.740 |
things going on. But the idea is that what you can get out of 02:37:02.820 |
an agent is of course much better than what you can get 02:37:05.780 |
out of an LLM alone. So when we're looking at all of this, 02:37:11.060 |
all of this iterative chain of thought and tool use, all this 02:37:16.260 |
needs to be controlled by what we call the agent executor, 02:37:19.380 |
which is our code logic, which is hitting our LLM, processing 02:37:23.380 |
its outputs, and repeating that process until we get to our 02:37:27.060 |
answer. So breaking that part down, what does it actually 02:37:30.900 |
look like? It looks kind of like this. So we have our user 02:37:34.900 |
input goes into our LLM, okay, and then we move on to the 02:37:39.540 |
reasoning and action steps. Is the action the answer? If it is 02:37:44.500 |
the answer, as we saw here with the answer action, 02:37:50.660 |
so true, we would just go straight to 02:37:54.180 |
our outputs. Otherwise, we're going to use our selected tool. 02:37:57.620 |
The agent executor is going to handle all this. It's going to 02:38:00.980 |
execute our tool, and then from that, we get our three 02:38:05.460 |
reasoning, action, and observation steps as inputs and outputs, and then 02:38:09.300 |
we're feeding all that information back into our LLM, 02:38:11.940 |
okay? In which case, we go back through that loop. So we 02:38:15.860 |
could be looping for a little while until we get to that 02:38:19.060 |
final output. Okay, so let's go across to the code. We're going 02:38:23.620 |
to be going into the agent executor notebook. We'll open 02:38:26.580 |
that up in Colab, and we'll go ahead and just install our 02:38:30.500 |
prerequisites. Nothing different here. It's just 02:38:34.820 |
LangChain and LangSmith, optionally, as before. Again, 02:38:38.980 |
optionally, the LangChain API key if you do want to use 02:38:41.540 |
LangSmith. Okay, and then we'll come down to our first 02:38:47.060 |
section, where it's going to define a few quick tools. I'm 02:38:51.220 |
not necessarily going to go through these because we've 02:38:54.660 |
already covered them in the agent introduction, but very 02:38:58.580 |
quickly, from langchain_core.tools we're just importing this tool 02:39:02.180 |
decorator, which transforms each of our functions here into 02:39:06.820 |
what we would call a structured tool object. This 02:39:10.740 |
thing here. Okay, which we can see. Let's just have a quick 02:39:14.660 |
look here, and then if we want to, we can extract all of the 02:39:18.820 |
key information from that structured tool using 02:39:21.860 |
these attributes here. So name, 02:39:24.180 |
description, and args_schema (via its model JSON schema), which give us 02:39:28.740 |
essentially how the LLM should use our function. Okay, so I'm 02:39:34.900 |
going to keep pushing through that. 02:39:40.660 |
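As a reminder of what that looks like, here is a small sketch of the tool decorator pattern; the exact tool functions in the notebook may differ, but the attributes shown are real LangChain ones.

```python
from langchain_core.tools import tool

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y' together."""
    return x + y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y' together."""
    return x * y

tools = [add, multiply]

# The decorator wraps each function in a StructuredTool carrying the metadata
# the LLM needs in order to call it.
print(add.name)         # "add"
print(add.description)  # the docstring
print(add.args)         # JSON-schema style description of the parameters
```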
Now, very quickly again, we did cover this in the intro video, so I don't want to 02:39:44.420 |
necessarily go over it again in too much detail, but our 02:39:48.580 |
agent executor logic is going to need this part. So we're 02:39:52.660 |
going to be getting a string from our LLM. We're going to be 02:39:55.780 |
loading that into a dictionary object, and we're going to be 02:39:59.060 |
using that to actually execute our tool, as we do here using 02:40:02.980 |
keyword arguments. Okay, like that. Okay, so with the tools 02:40:09.620 |
out of the way, let's take a look at how we create our 02:40:12.340 |
agent. So when I say agent here, I'm specifically talking 02:40:16.820 |
about the part that is generating our reasoning step, 02:40:21.460 |
then generating which tool and what the input parameters to 02:40:27.140 |
that tool will be. Then the rest of that is not actually 02:40:30.340 |
covered by the agent. Okay, the rest of that would be covered 02:40:33.380 |
by the agent execution logic, which would be taking the tool 02:40:37.140 |
to be used, the parameters, executing the tool, getting 02:40:41.220 |
the response, aka the observation, and then iterating 02:40:45.060 |
through that until the LLM is satisfied and we have enough 02:40:47.940 |
information to answer a question. So looking at that, 02:40:52.740 |
our agent will look something like this. It's pretty simple. 02:40:56.020 |
So we have our input parameters, including the chat 02:40:58.500 |
history and the user query, 02:41:01.780 |
and actually we would also have 02:41:04.900 |
any intermediate steps that have happened in here as well. We 02:41:08.500 |
have our prompt template, and then we have our LLM bound 02:41:12.340 |
with tools. So let's see how all this would look starting 02:41:16.500 |
with, we'll define our prompt template. So it's going to look 02:41:20.340 |
like this. We have our system message: you're a helpful 02:41:24.340 |
assistant; when answering a user's question, you should use one of 02:41:26.900 |
the tools provided; after using a tool, the tool output will be provided 02:41:29.380 |
in the scratchpad below, okay, which we're naming here; if you 02:41:33.860 |
have an answer in the scratchpad, you should not use any 02:41:36.580 |
more tools and instead answer directly to the user. Okay, so 02:41:40.420 |
we have that as our system message. We could obviously 02:41:43.300 |
modify that based on what we're actually doing. Then following 02:41:47.620 |
our system message, we're going to have our chat history, so any 02:41:50.420 |
previous interactions between the user and the AI. Then we 02:41:54.180 |
have our current message from the user, okay, which will be 02:41:57.860 |
fed into the input field there. And then following this, we 02:42:01.780 |
have our agent's scratch pad or the intermediate thoughts. So 02:42:05.140 |
this is where things like the LLM deciding, okay, this is what 02:42:09.540 |
I need to do. This is how I'm going to do it, aka the tool 02:42:12.900 |
call. And this is the observation. That's where all 02:42:16.020 |
of that information will be going, right? So each of those 02:42:18.980 |
you want to pass in as a message, okay? And the way that 02:42:23.380 |
will look is that any tool call generation from the LLM, so 02:42:28.020 |
when the LLM is saying, use this tool, please, that will be 02:42:31.780 |
an AI message. And then the responses from our tool, so the 02:42:37.140 |
observations, they will be returned as tool messages. 02:42:42.180 |
Great. So we'll run that to define our prompt template. 02:42:46.180 |
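For reference, a sketch of that prompt might look like the following; the exact system wording and variable names in the notebook may differ slightly.

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You're a helpful assistant. When answering a user's question you "
        "should first use one of the tools provided. After using a tool the "
        "tool output will be provided in the 'scratchpad' below. If you have "
        "an answer in the scratchpad you should not use any more tools and "
        "instead answer directly to the user."
    )),
    MessagesPlaceholder(variable_name="chat_history"),    # previous user/AI turns
    ("human", "{input}"),                                  # the current user query
    MessagesPlaceholder(variable_name="agent_scratchpad")  # tool calls + observations
])
```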
We're going to define our LLM. So we're going to be using 02:42:49.700 |
GPT-4o mini with a temperature of zero because we 02:42:54.100 |
want less creativity here, particularly when we're doing 02:42:56.820 |
tool calling. There's just no need for us to use a high 02:43:00.500 |
temperature here. So we need to enter our OpenAI API key, which 02:43:03.780 |
we would get from platform.openai.com. We enter this, 02:43:08.100 |
then we're going to continue and we're just going to add 02:43:11.140 |
tools to our LLM here, okay? These, and we're going to bind 02:43:18.180 |
them here. Then we have tool choice any. So tool choice any, 02:43:23.060 |
we'll see in a moment, I'll go through this a little bit more 02:43:25.860 |
in a second, but that's going to essentially force a tool 02:43:29.540 |
call. And you can also put required, which is actually a 02:43:32.420 |
bit more, it's a bit clearer, but I'm using any here, so I'll 02:43:36.500 |
stick with it. So these are our tools we're going through. We 02:43:40.100 |
have our inputs into the agent runnable. We have our prompt 02:43:44.980 |
template and then that will get fed into our LLM. So let's run 02:43:49.140 |
that. Now we would invoke the agent part of everything here 02:43:54.100 |
with this. Okay, so let's see what it outputs. This is 02:43:56.820 |
important. So I'm asking, what is 10 + 10? Obviously that should 02:44:00.420 |
use the addition tool, and we can actually see that happening. 02:44:03.620 |
So the agent message content is actually empty here. This is 02:44:07.940 |
where you'd usually get an answer, but if we go and have a 02:44:11.380 |
look, we have additional keyword args. In there we have 02:44:14.580 |
tool calls and then we have function arguments. Okay, so 02:44:19.060 |
we're calling a function. Arguments for that function are 02:44:22.020 |
this. Okay, so we can see this is string. Again, the way that 02:44:26.580 |
we would parse that is we do JSON loads and that becomes 02:44:29.620 |
dictionary and then we can see which function is being called 02:44:32.740 |
and it is the add function and that is all we need in order to 02:44:36.420 |
actually execute our function or our tool. Okay, we can see 02:44:42.740 |
it's a lot more detail here. Now, what do we do from here? 02:44:47.780 |
We're going to map the tool name to the tool function and 02:44:50.660 |
then we're just going to execute the tool function with 02:44:52.580 |
the generated args, i.e. those. I'll also just point out 02:44:57.380 |
quickly that here we are getting the dictionary 02:45:00.100 |
directly, whereas somewhere else 02:45:08.820 |
in here we're parsing this out ourselves; we don't necessarily need 02:45:11.300 |
to do that, because I think on the LangChain side they're 02:45:14.580 |
doing it for us. So we're already getting that, so the JSON 02:45:19.540 |
loads we don't necessarily need here. Okay, so we're just 02:45:22.900 |
creating this tool name to function mapping dictionary 02:45:26.660 |
here. So we're taking the well the tool names and we're just 02:45:30.420 |
mapping those back to our tool functions and this is coming 02:45:33.140 |
from our tools list. So that tools list that we defined 02:45:36.820 |
here. Okay, and we can even just see quickly that will 02:45:41.140 |
include everything or each of the tools we define there. 02:45:44.820 |
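A sketch of that mapping and execution step, assuming `tools` is the list of structured tools from earlier and `out` is the AIMessage returned by invoking the agent (LangChain has already parsed the arguments into a dictionary for us, so `json.loads` is only needed if you read the raw string arguments):

```python
# Map each tool's name back to its underlying Python function.
name2tool = {t.name: t.func for t in tools}

# Take the first tool call the LLM generated and execute it.
tool_call = out.tool_calls[0]  # e.g. {"name": "add", "args": {"x": 10, "y": 10}, "id": ...}
tool_out = name2tool[tool_call["name"]](**tool_call["args"])
print(tool_out)                # 20
```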
Okay, that's all it is. Now, we're going to execute using 02:45:49.860 |
our name to tool mapping. Okay, so this here will get us the 02:45:54.660 |
function. So we'll get us this function and then to that 02:45:58.580 |
function, we're going to pass the arguments that we 02:46:02.420 |
generated. Okay. Let's see what it looks like. Alright, so the 02:46:08.180 |
response to the observation is twenty. Now, we are going to 02:46:14.180 |
feed that back into our LLM using the tool message and 02:46:19.140 |
we're actually going to put a little bit of text around this 02:46:21.540 |
to make it a little bit nicer. We don't necessarily need to 02:46:24.420 |
do this, to be completely honest. We could just return 02:46:24.420 |
the answer directly; I don't even think 02:46:29.220 |
there would really be any difference. So, we could do 02:46:33.220 |
either. In some cases, that could be very useful. In other 02:46:40.020 |
cases, like here, it doesn't really make too much 02:46:42.340 |
difference, particularly because we have this tool call 02:46:44.980 |
ID and what this tool call ID is doing is it's being used by 02:46:48.660 |
OpenAI. It's being read by the LLM so that the LLM knows that 02:46:54.180 |
the response we got here is actually mapped back to the 02:46:59.940 |
tool execution that it's identified here because you see 02:47:04.020 |
that we have this ID. Alright, we have an ID here. The LLM is 02:47:08.020 |
going to see the ID. It's going to see the ID that we pass back 02:47:12.340 |
in here and it's going to see those two are connected. So, 02:47:14.900 |
you can see, okay, this is the tool I called and this is a 02:47:17.540 |
response I got from it. Because of that, you don't necessarily 02:47:20.740 |
need to say which tool you used here. You can. It depends on 02:47:25.620 |
what you're doing. Okay. So, what do we get here? We have, 02:47:32.580 |
okay, just running everything again. We've added our tool 02:47:35.780 |
call. So, that's the original AI message that includes, okay, 02:47:39.060 |
use that tool and then we have the tool execution, tool 02:47:41.940 |
message, which is the observation. We map those to 02:47:46.500 |
the agent scratchpad and then what do we get? We have an AI 02:47:49.540 |
message but the content is empty again, which is 02:47:52.420 |
interesting because we said to our LLM up here, if you have an 02:47:57.940 |
answer in the scratchpad, you should not use any more tools 02:48:01.140 |
and instead answer directly to the user. So, why is our LLM 02:48:07.860 |
not answering? Well, the reason for that is down here, we 02:48:13.620 |
specify tool choice equals any, which again, it's the same as 02:48:19.060 |
tool choice required, which is telling the LLM that it cannot 02:48:24.180 |
actually answer directly. It has to use a tool and I usually 02:48:28.900 |
do this, right? I would usually put tool choice equals any or 02:48:32.180 |
required and force the LLM to use a tool every single time. 02:48:37.780 |
So, then the question is, if it has to use a tool every time, 02:48:41.220 |
how does it answer our user? Well, we'll see in a moment. 02:48:47.220 |
First, I just want to show you the two options essentially 02:48:51.380 |
that we have. The second is what I would usually use but 02:48:53.700 |
let's start with the first. So, the first option is that we 02:48:57.700 |
set tool choice equal to auto and this tells the LLM that it 02:49:01.540 |
can either use a tool or it can answer the user directly using 02:49:06.580 |
the final answer or using that content field. So, if we run 02:49:11.460 |
that, like we're specifying tool choice as auto, we run 02:49:14.740 |
that, let's invoke, okay? Initially, you see, ah, wait, 02:49:20.100 |
there's still no content. That's because we didn't add 02:49:23.140 |
anything into the agent scratch pad here. There's no 02:49:25.460 |
information, right? It's all empty. Actually, it's empty 02:49:30.260 |
because, sorry, so here, you have the chat history that's 02:49:32.820 |
empty. We didn't specify the agent scratch pad and the 02:49:38.260 |
reason that we can do that is because we're using, if you 02:49:40.340 |
look here, we're using get. So, essentially, it's saying, 02:49:43.700 |
try and get agent scratch pad from this dictionary but if it 02:49:46.420 |
hasn't been provided, we're just going to give an empty 02:49:49.300 |
list. So, that's why we don't need to specify it 02:49:52.820 |
here. But that means that, oh, okay, the agent doesn't 02:49:56.980 |
actually know anything here. It hasn't used the tool yet. So, 02:50:01.300 |
we're going to just go through our iteration again, right? So, 02:50:04.020 |
we're going to get our tool output. We're going to use that 02:50:07.300 |
to create the tool message and then we're going to add our 02:50:11.380 |
tool call from the AI and the observation. We're going to 02:50:15.620 |
pass those to the agent scratch pad and this time, we'll see. 02:50:19.700 |
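A sketch of that second pass, assuming `agent` is the runnable we built, `query` is the user question, `out` is the AIMessage containing the tool call, and `tool_out` is the result of executing it:

```python
from langchain_core.messages import ToolMessage

# The tool_call_id ties this observation back to the specific call the LLM made.
tool_msg = ToolMessage(
    content=f"The {tool_call['name']} tool returned {tool_out}",
    tool_call_id=tool_call["id"],
)

out2 = agent.invoke({
    "input": query,
    "chat_history": [],
    "agent_scratchpad": [out, tool_msg],  # AI tool call + its observation
})
print(out2.content)  # with tool_choice="auto" we now get a direct answer
```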
We run that. Okay, now, we get the content, okay? So, now, it's 02:50:24.980 |
not calling. You see here, there's no tool call or 02:50:27.460 |
anything going on. We just get content. So, that is, this is a 02:50:34.260 |
standard way of doing or building a tool calling agent. 02:50:38.420 |
The other option which I mentioned, this is what I 02:50:40.740 |
usually go with. So, number two here, I would usually create a 02:50:45.700 |
final answer tool. So, why would we even do that? Why would we 02:50:53.140 |
create a final answer tool rather than just use this 02:50:55.380 |
method, which actually works perfectly well? So, why 02:50:59.140 |
would we not just use this? There are a few reasons. The 02:51:03.060 |
main ones are that with option two where we're forcing tool 02:51:07.620 |
calling, this removes the possibility of the agent using 02:51:11.940 |
that content field directly and the reason, at least, the 02:51:16.740 |
reason I found this good when building agents in the past is 02:51:19.620 |
that occasionally, when you do want to use a tool, it's 02:51:22.660 |
actually going to go with the content field and it can get 02:51:25.860 |
quite annoying and use the content field quite frequently 02:51:29.380 |
when you actually do want it to be using one of the tools and 02:51:34.100 |
this is particularly noticeable with smaller models. With 02:51:39.380 |
bigger models, it's not as common although it does still 02:51:42.740 |
happen. Now, the second thing that I quite like about using a 02:51:47.060 |
tool as your final answer is that you can enforce a 02:51:52.740 |
structured output in your answer. So, this is something 02:51:55.460 |
we saw in, I think, the first, yes, the first LangChain 02:52:00.100 |
example, where we were using the structured output feature of 02:52:05.060 |
LangChain, and what that actually is, the structured 02:52:08.260 |
output feature of LangChain, it's actually just a tool call, 02:52:11.700 |
right? So, it's forcing a tool call from your LLM. It's just 02:52:15.060 |
abstracted away so you don't realize that that's what it's 02:52:17.220 |
doing but that is what it's doing. So, I find that 02:52:22.020 |
structured outputs are very useful particularly when you 02:52:25.940 |
have a lot of code around your agent. So, when that output 02:52:30.420 |
needs to go downstream into some logic, that can be very 02:52:35.780 |
useful because you can, you have a reliable output format 02:52:40.420 |
that you know is going to be output and it's also incredibly 02:52:43.860 |
useful if you have multiple outputs or multiple fields that 02:52:47.860 |
you need to generate for. So, those can be very useful. Now, 02:52:53.780 |
to implement this, so to implement option two, we need 02:52:56.500 |
to create a final answer tool. As with our other tools, 02:53:02.020 |
we're actually going to provide a description, and you can or 02:53:05.860 |
you cannot do this. You can also just have the tool return 02:53:10.260 |
none and use the generated action as 02:53:16.340 |
essentially what you're going to send out of your agent 02:53:19.700 |
execution logic, or you can actually just execute the tool 02:53:23.700 |
and pass that information directly through. Perhaps, in 02:53:27.220 |
some cases, you might have some additional post-processing for 02:53:30.740 |
your final answer. Maybe you do some checks to make sure it 02:53:33.220 |
hasn't said anything weird. You could add that in this tool 02:53:37.300 |
here, but yeah, in this case, we're just going to pass those 02:53:41.060 |
through directly. 02:53:48.820 |
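A sketch of that final answer tool and the redefined agent; the docstring and field names are an approximation of what's described above, and `tools`, `prompt`, `llm`, and `name2tool` are assumed from the earlier sketches.

```python
from langchain_core.tools import tool

@tool
def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Use this tool to provide the final answer to the user, along with a
    list of the names of the tools you used to produce it."""
    # We just pass the generated arguments straight through; any checks or
    # post-processing on the answer could be added here instead.
    return {"answer": answer, "tools_used": tools_used}

# Add it to the name-to-tool mapping so our executor logic can call it.
name2tool["final_answer"] = final_answer.func

# Rebuild the agent, still forcing a tool call on every step.
agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
    }
    | prompt
    | llm.bind_tools(tools + [final_answer], tool_choice="any")
)
```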
So, let's run this. We've added the final answer tool to our name-to- 02:53:51.460 |
tool mapping, so our agent can now use it. We redefine our 02:53:56.100 |
agent, setting tool choice to any because we're forcing the 02:53:59.460 |
tool choice here and let's go with what is ten plus ten. See 02:54:04.180 |
what happens. Okay, we get this, right? We can also, one 02:54:08.900 |
thing, nice thing here is that we don't need to check is our 02:54:11.460 |
output in the content field or is it in the tool calls field? 02:54:14.500 |
We know it's going to be in the tool calls field because 02:54:16.500 |
we're forcing that tool use which is quite nice. So, okay, 02:54:19.860 |
we know we're using the add tool and these are the 02:54:22.500 |
arguments. Great. We go or go through that process again. 02:54:27.380 |
We're going to create our tool message and then we're going to 02:54:30.260 |
add those messages into our scratchpad or intermediate 02:54:33.460 |
steps, and then we can see again, ah, okay, the content field is 02:54:38.100 |
empty. That is expected. We're forcing tool use. There's no way that 02:54:42.580 |
this can have anything inside it, but then if we come 02:54:48.020 |
down here to our tool calls, nice. Final answer, answer, ten 02:54:54.100 |
plus ten equals twenty. Alright? We also have this. 02:54:58.820 |
Tools used. Where is tools used coming from? Okay, well, I 02:55:01.620 |
mentioned before that you can add additional things or 02:55:06.020 |
outputs when you're using this tool used for your final 02:55:09.700 |
answer. So, if you just come up here to here, you can see that 02:55:14.820 |
I asked the LLM to use that tools used field which I 02:55:18.980 |
defined here. It's a list of strings. Use this to tell me 02:55:23.140 |
what tools you use in your answer, right? So, I'm getting 02:55:26.260 |
the normal answer but I'm also getting this information as 02:55:28.900 |
well which is kind of nice. So, that's where that is coming 02:55:31.620 |
from. See that? Okay. So, we have our actual answer here and 02:55:36.260 |
then we just have some additional information, okay? 02:55:38.980 |
We've also defined a type here. It's just a list of strings 02:55:41.620 |
which is really nice. It's giving us a lot of control over 02:55:43.940 |
what we're outputting which is perfect. That's, you know, when 02:55:46.580 |
you're building with agents, the biggest problem in most 02:55:52.340 |
cases is control of your LLM. So, here, we're getting a 02:55:58.100 |
honestly pretty unbelievable amount of control over what our 02:56:02.740 |
LLM is going to be doing which is perfect for when you're 02:56:07.060 |
building in the real world. So, this is everything that we 02:56:12.580 |
need. This is our answer and we would of course be passing 02:56:15.460 |
that downstream into whatever logic our AI application would 02:56:22.020 |
be using, okay? So, maybe that goes directly to a front end 02:56:26.020 |
and we're displaying this as our answer and we're maybe 02:56:29.460 |
providing some information about, okay, where did this 02:56:31.780 |
answer come from or maybe there's some additional steps 02:56:34.980 |
downstream where we're actually doing some more processing or 02:56:39.060 |
transformations but yeah, we have that. That's great. Now, 02:56:43.540 |
everything we've just done here, we've been executing 02:56:45.940 |
everything one by one and that's to help us understand 02:56:50.980 |
what process we go through when we're building an agent 02:56:55.220 |
executor. But we're not going to want to do that all the time, 02:57:00.500 |
are we? Most of the time, we probably want to abstract all 02:57:04.180 |
this away and that's what we're going to do now. So, we're 02:57:07.860 |
going to build essentially everything we've just taken. 02:57:11.140 |
We're going to abstract that and abstract it away into a 02:57:15.220 |
custom agent executor class. So, let's have a quick look at 02:57:20.020 |
what we're doing here. Although it's literally just 02:57:22.340 |
what we just did, okay? So, custom agent executor. We 02:57:27.860 |
initialize it. We set this max iterations; I'll talk about 02:57:31.060 |
this in a moment. Initializing it is also going to set our 02:57:34.820 |
chat history to just being empty. Okay, good. So, it's a 02:57:38.980 |
new agent. There should be no chat history in this case. Then 02:57:42.180 |
we actually define our agent, right? So, that part of logic 02:57:45.380 |
that is going to be taking our inputs and generating what to 02:57:48.900 |
do next aka what tool call to do, okay? And we set everything 02:57:53.460 |
as attributes of our class and then we're going to define an 02:57:58.020 |
invoke method. This invoke method is going to take an 02:58:02.420 |
input which is just a string. So, it's going to be our 02:58:04.500 |
message from the user and what it's going to do is it's going 02:58:09.460 |
to iterate through essentially everything we just did, okay? 02:58:14.980 |
Until we hit the final answer tool, okay? So, well, 02:58:18.820 |
what does that mean? We have our tool call, right? Which is 02:58:23.780 |
we're just invoking our agent, right? So, it's going to 02:58:26.980 |
generate what tool to use and what parameters should go into 02:58:29.700 |
that, okay? And that's an AI message. So, we would append 02:58:35.460 |
that to our agent stretch pad and then we're going to use the 02:58:38.820 |
information from our tool call. So, the name of the tool and 02:58:42.020 |
the args and also the ID. We're going to use all of that 02:58:45.860 |
information to execute our tool and then provide the 02:58:51.140 |
observation back to our LLM, okay? So, execute our tool here. 02:58:55.860 |
We then format the tool output into a tool message. See here 02:59:00.580 |
that I'm just using the output directly. I'm not adding 02:59:03.620 |
that additional information there. We do need to always 02:59:08.180 |
pass in the tool call ID so that our LLM knows which output 02:59:12.900 |
is mapped to which tool. I didn't mention this before in 02:59:16.580 |
this video at least but that is that's important when we have 02:59:19.380 |
multiple tool calls happening in parallel because that can 02:59:22.500 |
happen. When we have multiple tool calls happening in 02:59:25.220 |
parallel, let's say we have ten tool calls, all those 02:59:28.100 |
responses might come back at different times. So, then the 02:59:31.380 |
order of those can get messed up. So, we wouldn't necessarily 02:59:35.780 |
always see that it's a AI message beginning a tool call 02:59:41.060 |
followed by the answer to that tool call. Instead, it might be 02:59:44.900 |
AI message followed by like ten different tool call responses. 02:59:49.620 |
So, you need to have those IDs in there, okay? So, then we 02:59:54.260 |
pass our tool output back to our Agent Scratchpad or 02:59:58.660 |
intermediate steps. I'm sending a print in here so that we can 03:00:02.500 |
see what's happening whilst everything is running. Then we 03:00:05.060 |
increment this count number. We'll talk about that in a 03:00:08.580 |
moment. So, coming past that, we say, okay, if the tool name 03:00:12.660 |
here is final answer, that means we should stop, okay? So, 03:00:18.580 |
once we get the final answer, that means we can actually 03:00:20.980 |
extract our final answer from the final tool call, okay? And 03:00:25.940 |
in this case, I'm going to say that we're going to extract the 03:00:31.220 |
answer from the tool call or the observation. We're going to 03:00:35.300 |
extract the answer that was generated. We're going to pass 03:00:38.260 |
that into our chat history. So, we're going to have our user 03:00:41.860 |
message. This is the one the user came up with followed by 03:00:45.380 |
our answer which is just the natural answer field and that's 03:00:49.700 |
simply an AI message. But then we're actually going to be 03:00:52.660 |
including all of the information. So, this is the 03:00:55.780 |
answer, the natural language answer, and also the tools used 03:01:01.220 |
output. We're going to be feeding all of that out to some 03:01:04.900 |
downstream process as preferred. So, we have that. Now, 03:01:10.900 |
one thing that can happen if we're not careful is that our 03:01:15.460 |
agent executor may run many, many times and particularly if 03:01:20.660 |
we've done something wrong in our logic because we're 03:01:23.140 |
building these things, it can happen that maybe we've not 03:01:26.980 |
connected the observation back up into our agent executor 03:01:32.260 |
logic and in that case, what we might see is our agent 03:01:34.980 |
executor runs again and again and again and I mean, that's 03:01:38.020 |
fine. We're going to stop it but if we don't realize 03:01:42.020 |
straight away and we're doing a lot of LLM calls that can get 03:01:44.980 |
quite expensive quite quickly. So, what we can do is we can 03:01:49.060 |
set a limit, right? So, that's what we've done up here with 03:01:51.220 |
this max iterations. We said, okay, if we go past three max 03:01:54.740 |
iterations by default, I'm going to say stop, alright? So, 03:01:58.660 |
that's why we have the count here. While count is less than 03:02:02.820 |
the max iterations, we're going to keep going. Once we hit the 03:02:06.820 |
number of max iterations, we stop, okay? So, the while loop 03:02:09.860 |
will just stop looping, okay? So, it just protects us in case 03:02:14.900 |
of that and it also potentially maybe at some point, your agent 03:02:19.140 |
might be doing too much to answer a question. So, this 03:02:22.260 |
will force it to stop and just provide an answer. Although, if 03:02:25.860 |
that does happen, I just realized there's a bit of a 03:02:28.980 |
fault in the logic here. If that does happen, we wouldn't 03:02:31.940 |
necessarily have the answer here, right? So, we'd probably 03:02:35.700 |
want to handle that nicely but in this scenario, it's a very 03:02:40.260 |
simple use case. We're not going to see that happening. So, 03:02:44.260 |
we initialize our custom agent executor and then we invoke it, 03:02:50.740 |
okay? And let's see what happens. Alright, there we go. 03:02:54.340 |
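Pulling together everything we just walked through, a condensed sketch of that class might look like this (assuming the `agent` runnable and `name2tool` mapping from the earlier sketches; the notebook's actual implementation may differ in the details):

```python
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

class CustomAgentExecutor:
    def __init__(self, max_iterations: int = 3):
        self.chat_history = []              # new agent, so no history yet
        self.max_iterations = max_iterations
        self.agent = agent                  # the prompt | llm.bind_tools(...) runnable

    def invoke(self, input: str) -> dict | None:
        count = 0
        agent_scratchpad = []
        while count < self.max_iterations:
            # 1) The agent decides which tool to call and with what arguments.
            out = self.agent.invoke({
                "input": input,
                "chat_history": self.chat_history,
                "agent_scratchpad": agent_scratchpad,
            })
            agent_scratchpad.append(out)

            # 2) Execute the tool (assuming a single tool call per step).
            tool_call = out.tool_calls[0]
            tool_out = name2tool[tool_call["name"]](**tool_call["args"])

            # 3) Feed the observation back in, tied to the call by its id.
            agent_scratchpad.append(
                ToolMessage(content=str(tool_out), tool_call_id=tool_call["id"])
            )
            print(f"{count}: {tool_call['name']}({tool_call['args']})")
            count += 1

            # 4) Stop once the final_answer tool has been used.
            if tool_call["name"] == "final_answer":
                self.chat_history.extend([
                    HumanMessage(content=input),
                    AIMessage(content=tool_call["args"]["answer"]),
                ])
                return tool_call["args"]    # natural-language answer + tools_used
        return None  # iteration limit hit without a final answer

agent_executor = CustomAgentExecutor()
agent_executor.invoke("What is 10 + 10?")
```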
So, that just wrapped everything into a single invoke. 03:03:00.740 |
So, everything is handled for us. We could say, okay, what is 03:03:05.220 |
ten plus ten, or we can modify that and ask a multiplication instead, for example, 03:03:12.260 |
and that will go through. We'll use the multiply tool instead 03:03:15.060 |
and then we'll come back to the final answer again, okay? So, 03:03:18.420 |
we can see that with this custom agent executor, we've 03:03:22.580 |
built an agent and we have a lot more control over everything 03:03:27.060 |
that is going on in here. One thing that we would probably 03:03:33.300 |
need to add in this scenario is right now, I'm assuming that 03:03:36.500 |
only one tool call will happen at once and it's also why I'm 03:03:39.460 |
asking here. I'm not asking a complicated question because I 03:03:42.500 |
don't want it to go and try and execute multiple tool calls at 03:03:46.340 |
once which can happen. So, let's just try this. Okay. So, 03:03:52.660 |
this is actually completely fine. So, this did just execute 03:03:55.620 |
it one after the other. So, you can see that when asking this 03:04:00.500 |
more complicated question, it first did the exponentiate tool 03:04:05.300 |
followed by the add tool and then it actually gave us our 03:04:07.620 |
final answer which is cool. Also told us we use both of 03:04:11.540 |
those tools which it did but one thing that we should just 03:04:16.420 |
be aware of is that from OpenAI, OpenAI can actually 03:04:20.420 |
execute multiple tool calls in parallel. So, by specifying 03:04:24.980 |
that we're just taking index zero here, the first tool call, we're actually assuming 03:04:28.660 |
that we're only ever going to be calling one tool at any one 03:04:32.420 |
time which is not always going to be the case. So, you'd 03:04:35.140 |
probably need to add a little bit of extra logic there in 03:04:37.380 |
case of scenarios if you're building an agent that is 03:04:41.300 |
likely to be running parallel tool calls. But yeah, you can 03:04:45.060 |
see here actually it's completely fine. So, it's 03:04:47.620 |
running one after the other. Okay. So, with that, we built 03:04:51.140 |
our agent executor. I know there's a lot to that and of 03:04:55.860 |
course, you can just use the very abstract agent executor 03:04:59.060 |
in LangChain, but I think it's very good to understand what is 03:05:03.140 |
actually going on to build our own agent executor in this 03:05:06.420 |
case and it sets you up nicely for building more complicated 03:05:10.500 |
or use case specific agent logic as well. So, that is it 03:05:17.300 |
for this chapter. In this chapter, we're going to be 03:05:20.180 |
taking a look at the LangChain Expression Language (LCEL). We'll be 03:05:23.460 |
looking at the runnables, the serial and parallel execution of 03:05:27.940 |
those, the runnable passthrough, and essentially how we 03:05:32.500 |
use LCEL to its full capacity. Now, to do that well, what I 03:05:38.900 |
want to do is actually start by looking at the traditional 03:05:42.820 |
approach to building chains in LangChain. So, to do that, 03:05:48.260 |
we're going to go over to the LCEL chapter and open that 03:05:51.860 |
up in Colab. Okay. So, let's come down. We'll do the 03:05:56.900 |
prerequisites. As before, nothing major in here. The one 03:06:00.820 |
thing that is new is docarray, because later on, as you'll 03:06:04.180 |
see, we're going to be using this as an example of the 03:06:08.980 |
parallel capabilities in LCEL. If you want to use LangSmith, 03:06:13.620 |
you just need to add in your LangChain API key. Okay. And 03:06:16.820 |
then let's, okay. So, now, let's dive into the traditional 03:06:20.980 |
approach to chains in LangChain. So, the LLMChain, I 03:06:27.540 |
think, is probably one of the first things introduced in 03:06:30.420 |
LangChain, if I'm not wrong. This takes a prompt and feeds 03:06:33.780 |
it into an LLM and that's it. You can also, you can add 03:06:39.540 |
like output parsing to that as well but that's optional. I 03:06:44.260 |
don't think we're going to cover it here. So, what that 03:06:47.860 |
might look like is we have, for example, this prompt 03:06:50.340 |
template here. Give me a small report on topic. Okay. So, 03:06:54.420 |
that would be our prompt template. We'd set up as we 03:06:57.860 |
usually do with the prompt templates as we've seen 03:07:01.540 |
before. We then define our LLM. We need our API key for 03:07:08.180 |
this which as usual, we would get from platform.openai.com. 03:07:14.020 |
Then, we go ahead. I'm just showing you that you can invoke 03:07:18.580 |
the LLM there. Then, we go ahead and actually define an output 03:07:23.460 |
parser. So, we do do this. I wasn't sure we did, but we will. 03:07:26.740 |
We then define our LLMChain like this. Okay. So, LLMChain, with 03:07:31.220 |
our prompt, our LLM, and our output parser. Okay. This 03:07:36.740 |
is the traditional approach. So, I would then say, okay, 03:07:42.660 |
retrieval augmented generation, and what it's going to do is 03:07:44.820 |
it's going to give me a little report back on RAG. Okay. 03:07:49.620 |
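A sketch of that traditional LLMChain setup (assuming an OpenAI API key is already set; LLMChain still works but is deprecated):

```python
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = PromptTemplate.from_template("Give me a small report about {topic}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)
output_parser = StrOutputParser()

# The deprecated, pre-LCEL way of gluing these together.
chain = LLMChain(prompt=prompt, llm=llm, output_parser=output_parser)
result = chain.invoke({"topic": "retrieval augmented generation"})
```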
It takes a moment but you can see that that's what we get 03:07:51.940 |
here. We can format that nicely as we usually do and we get, 03:07:57.780 |
okay, look, we get a nice little report. However, the LLMChain 03:08:01.620 |
is, one, quite restrictive, right? We have to 03:08:05.380 |
have particular parameters that have been predefined as 03:08:09.220 |
being usable, which is, you know, restrictive, and it's also 03:08:13.060 |
been deprecated. So, you know, this isn't the standard way of 03:08:17.620 |
doing this anymore but we can still use it. However, the 03:08:21.700 |
preferred method to building this and building anything else 03:08:25.140 |
really, or chains in general in LangChain, is using LCEL, right? 03:08:29.540 |
And it's super simple, right? So, we just actually take the 03:08:32.100 |
prompt LLM and output parser that we had before and then we 03:08:35.060 |
just chain them together with these pipe operators. So, the 03:08:38.420 |
pipe operator here is saying, take what is output from here 03:08:41.860 |
and input it into here. Take what is output from here and 03:08:45.380 |
put it into here. That's all it does. It's super simple. So, 03:08:49.700 |
put those together and we invoke it in the same way and 03:08:52.820 |
we'll get the same output, okay? And that's what we get. 03:08:58.500 |
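The same three components, but chained with the LCEL pipe operator instead of LLMChain:

```python
# The output of each component is piped in as the input to the next.
lcel_chain = prompt | llm | output_parser

result = lcel_chain.invoke({"topic": "retrieval augmented generation"})
```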
There is actually a slight difference on what we're 03:09:01.220 |
getting out from there. You can see here we got actually a 03:09:04.500 |
dictionary but that is pretty much the same, okay? So, we get 03:09:09.460 |
that and as before, we can display that in Markdown with 03:09:14.260 |
this, okay? So, we saw just now that we have this pipe 03:09:18.100 |
operator here. It's not really standard Python syntax to use 03:09:26.260 |
this or at least it's definitely not common. It's an 03:09:29.940 |
aberration of the intended use of Python, I think. But anyway, 03:09:35.380 |
it does, it looks cool and when you understand it, I kinda get 03:09:41.460 |
why they do it because it does make things quite simple in 03:09:44.260 |
comparison to what it could be otherwise. So, I kinda get it. 03:09:47.860 |
It's a little bit weird, but it's what they're doing, and since I'm 03:09:51.060 |
teaching it, that's what we're going to learn. So, 03:09:55.780 |
what is that pipe operator actually doing? Well, it's as I 03:10:04.020 |
mentioned, it's taking the output from this, putting it as 03:10:06.340 |
input into whatever is on the right, but how does that 03:10:10.260 |
actually work? Well, let's actually implement it 03:10:14.580 |
ourselves without line chain. So, we're going to create this 03:10:17.380 |
class called Runnable. This class, when we initialize it, 03:10:20.580 |
it's going to take a function, okay? So, this is literally a 03:10:23.460 |
Python function. It's going to take that and it's going to 03:10:28.180 |
essentially turn it into what we would call a Runnable in 03:10:31.780 |
line chain and what does that actually mean? Well, it doesn't 03:10:34.740 |
really mean anything. It just means that when you use run the 03:10:40.180 |
invoke method on it, it's going to call that function in the 03:10:43.140 |
way that you would have done otherwise, alright? So, using 03:10:46.340 |
just function, you know, brackets, open, parameters, 03:10:50.100 |
brackets, close. It's going to do that but it's also going to 03:10:53.460 |
add this method, the or method, which is __or__ in 03:10:59.060 |
typical Python syntax. Now, this or method is essentially 03:11:03.620 |
going to take your Runnable function, the one that you 03:11:07.140 |
initialize with and it's also going to take an other 03:11:10.900 |
function, okay? This other function is actually going to 03:11:14.260 |
be a Runnable, I believe. Yes, it's going to be a Runnable 03:11:17.860 |
just like this and what it's going to do is it's going to 03:11:22.180 |
run this Runnable based on the output of your current 03:11:28.020 |
Runnable, okay? That's what this or method is going to do. Seems a 03:11:32.340 |
bit weird maybe but I'll explain in a moment. We'll see 03:11:35.380 |
why that works. So, I'm going to chain a few functions 03:11:39.540 |
together using this or method. So, first, we're just 03:11:44.660 |
going to turn them all into Runnables, okay? So, these are 03:11:47.620 |
normal functions as you can see, normal Python functions. 03:11:50.660 |
We then turn them into this Runnable using our Runnable 03:11:53.380 |
class. Then, look what we can do, right? So, we're going to 03:11:59.460 |
create a chain that is going to be our Runnable chained with 03:12:05.460 |
another Runnable chained with another Runnable, okay? Let's 03:12:09.140 |
see what happens. So, we're going to invoke that chain of 03:12:12.500 |
Runnables with three. So, what is this going to do? Okay, we 03:12:17.540 |
start with three. We're going to add five to three. So, we'll 03:12:21.220 |
get eight. Then, we're going to subtract five from eight to 03:12:25.940 |
give us three again and then we're going to multiply three 03:12:32.420 |
by five to give us fifteen and we can invoke that and we get 03:12:37.860 |
fifteen, okay? Pretty cool. So, that is interesting. How does 03:12:43.780 |
that relate to the pipe operator? Well, that pipe 03:12:48.020 |
operator in Python is actually a shortcut for the or method. 03:12:52.820 |
So, what we just implemented is the pipe operator. So, we can 03:12:56.980 |
actually run that now with the pipe operator here and we'll 03:13:00.660 |
get the same. We'll get fifteen, right? So, that's that's 03:13:03.540 |
what LangChain is doing. Like, under the hood, that is what 03:13:06.900 |
that pipe operator is. It's just chaining together these 03:13:10.500 |
multiple Runnables as we'd call them using their own internal 03:13:14.740 |
or operator, okay? Which is cool. I will give them that. 03:13:19.140 |
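A minimal sketch of that idea — a tiny Runnable class whose `__or__` method chains functions together, which is essentially what the LCEL pipe operator does (the notebook's version may differ slightly in how it wraps the chained function):

```python
class Runnable:
    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Return a new Runnable that feeds our output into `other`.
        return Runnable(lambda x: other.invoke(self.invoke(x)))

    def invoke(self, x):
        return self.func(x)

def add_five(x): return x + 5
def sub_five(x): return x - 5
def mul_five(x): return x * 5

chain = Runnable(add_five) | Runnable(sub_five) | Runnable(mul_five)
print(chain.invoke(3))   # (3 + 5 - 5) * 5 = 15

# LangChain's RunnableLambda is the same idea:
from langchain_core.runnables import RunnableLambda
lc_chain = RunnableLambda(add_five) | RunnableLambda(sub_five) | RunnableLambda(mul_five)
print(lc_chain.invoke(3))  # 15
```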
It's kind of a cool way of doing this. It's creative. I 03:13:22.340 |
wouldn't have thought about it myself. So, yeah, that is a 03:13:27.620 |
pipe operator. Then, we have these Runnable things, okay? So, 03:13:31.300 |
this is different to the Runnable I just defined 03:13:34.020 |
here. That one we defined ourselves; it's not a 03:13:37.220 |
LangChain thing, we didn't get it from LangChain. Instead, 03:13:42.180 |
this RunnableLambda object here, that is actually exactly 03:13:48.100 |
the same as what we just defined, alright? So, what we 03:13:50.740 |
did here with our Runnable, this RunnableLambda is the same 03:13:57.140 |
thing but in LangChain, okay? So, if we use that, okay? We 03:14:01.780 |
use that to now define three Runnables from the functions 03:14:06.100 |
that we defined earlier. We can actually chain those together 03:14:09.300 |
now using the pipe operator. You could also chain 03:14:12.820 |
them together if you want with the or operator, right? So, we 03:14:18.740 |
could do what we did earlier. We can invoke that, okay? Or, as 03:14:24.340 |
we were doing originally, we use the pipe operator. Exactly 03:14:28.580 |
the same. So, this RunnableLambda from LangChain is just 03:14:31.620 |
what we just built with our Runnable. Cool. So, we have 03:14:35.540 |
that. Now, let's try and do something a little more 03:14:38.820 |
interesting. We're going to generate a report and then edit it with some custom 03:14:43.140 |
functionality, okay? So, give me a small report about topic, 03:14:47.140 |
okay? We'll go through here. We're going to get our report 03:14:51.780 |
on AI, okay? So, we have this. You can see that AI is 03:14:57.540 |
mentioned many times in here. Then, we're going to take a 03:15:04.820 |
very simple function, right? So, I'm just going to extract 03:15:07.700 |
the fact. This is basically going to take, what is it, see, it's taking 03:15:12.260 |
everything after the first split. Okay. So, we're actually trying to remove the 03:15:17.300 |
introduction here. I'm not sure if this actually will work as 03:15:20.740 |
expected but it's it's fine. Try it anyway but then more 03:15:27.620 |
importantly, we're going to replace this word, okay? So, 03:15:30.500 |
we're going to replace an old word with a new word. Our old 03:15:32.820 |
word is going to be AI. Our new word is going to be Skynet, 03:15:35.700 |
okay? So, we can wrap both of these functions as Runnable 03:15:40.820 |
Lambdas, okay? We can add those as additional steps inside our 03:15:45.380 |
entire chain, alright? So, we're going to extract, try and 03:15:48.900 |
remove the introduction although I think it needs a bit 03:15:51.540 |
more processing than just splitting here and then we're 03:15:55.060 |
going to replace the word. We need that actually to be AI. 03:16:01.540 |
Okay. So, now we get Artificial Intelligence Skynet refers to 03:16:07.200 |
the simulation of human intelligence processes by 03:16:09.040 |
machines and then we have narrow Skynet, weak Skynet, and 03:16:13.360 |
strong Skynet. Applications of Skynet. Skynet technology is 03:16:17.600 |
being applied in numerous fields including all these 03:16:19.760 |
things. Scary. Despite its potential, Skynet poses several 03:16:24.800 |
challenges. Systems can perpetuate existing biases. It 03:16:29.680 |
raises significant privacy concerns. It can be exploited 03:16:34.160 |
for malicious purposes, okay? So, we have all these, you know, 03:16:38.800 |
it's just a silly little example. We can see also the 03:16:41.440 |
introduction didn't work here. The reason for that is because 03:16:44.400 |
our introduction includes multiple new lines here. So, I 03:16:48.400 |
would actually, if I want to remove the introduction, we 03:16:51.280 |
should remove it from here, I think. This is something I would never 03:16:56.240 |
actually recommend you do, because it's not very 03:17:00.960 |
flexible. It's not very robust, but it's just so I can show you that 03:17:06.640 |
that is actually working. So, this extract fact runnable, 03:17:10.560 |
right? So, now we're essentially just removing the 03:17:13.840 |
introduction, right? Why would we want to do that? I don't 03:17:17.440 |
know but it's there just so you can see that we can have 03:17:20.880 |
multiple of these runnable operations running and they 03:17:24.880 |
can be whatever you want them to be. Okay, it is worth 03:17:28.400 |
knowing that the inputs to our functions here were all single 03:17:32.880 |
arguments, okay? If you have a function that accepts 03:17:37.280 |
multiple arguments, you can handle that in a few ways; this is the way I would 03:17:40.080 |
probably do it. One of the 03:17:44.000 |
ways that you can do that is to actually write your function to 03:17:48.320 |
accept multiple values but pass them through a 03:17:50.800 |
single argument. So, just a single x, which would be 03:17:53.600 |
like a dictionary or something, and then just unpack them 03:17:56.560 |
within the function and use them as needed. That's just, 03:17:59.040 |
you know, one way you can do it. Now, we also have these 03:18:02.000 |
different runnable objects that we can use. So, here we have 03:18:06.080 |
runnable parallel and runnable pass-through. It's kind of 03:18:10.480 |
self-explanatory to some degree. So, let me just go 03:18:13.680 |
through those. So, runnable parallel allows you to run 03:18:17.360 |
multiple runnable instances in parallel. Runnable pass-through 03:18:23.040 |
may be less self-explanatory, allows us to pass a variable 03:18:26.880 |
through to the next runnable without modifying it, okay? So, 03:18:30.960 |
let's see how they would work. So, we're going to come down 03:18:33.600 |
here and we're going to set up these two docarray in-memory stores, or 03:18:37.280 |
essentially, two sources of information, and we're going to 03:18:42.080 |
need our LLM to pull information from both of these sources of 03:18:46.560 |
information in parallel, which is going to look like this. So, 03:18:49.600 |
we have these two sources of information, vector store A and 03:18:53.440 |
vector store B; this is our docarray A and docarray B. These 03:18:58.960 |
are both going to be fed in as context into our prompt. Then, 03:19:02.960 |
our LLM is going to use all of that to answer the question. 03:19:07.520 |
Okay. So, to actually implement that, we have our, we need an 03:19:12.080 |
embedding model. So, use OpenAI embeddings. We have our 03:19:15.520 |
vector store A, vector store B. They're not, you know, real, 03:19:19.440 |
full-on vector stores here. We're just 03:19:22.480 |
passing in a very small amount of information to both. So, 03:19:26.320 |
we're saying, okay, we're going to create an in-memory vector 03:19:30.400 |
store using these two bits of information. So, say half 03:19:33.680 |
the information is here; this would be an irrelevant piece of 03:19:36.000 |
information. Then, we have the relevant information which is 03:19:38.800 |
DeepSeek V3 was released in December 2024. Okay. Then, we're 03:19:44.160 |
going to have some other information in our other vector 03:19:46.960 |
store. Again, irrelevant piece here and relevant piece here. 03:19:51.200 |
Okay. The DeepSeek V3 LLM is a mixture of experts model with 03:19:55.840 |
671 billion parameters at its largest. Okay. So, based on 03:20:02.160 |
that, we're also going to build this prompt string. So, we're 03:20:04.960 |
going to pass in both of those contexts into our prompt. Now, 03:20:07.840 |
I'm going to ask a question. We don't actually need, we don't 03:20:12.320 |
need that bit and actually, we don't even need that bit. What 03:20:16.000 |
am I doing? So, we just need this. So, we have the both the 03:20:19.040 |
contexts and we would run them through our prompt template. 03:20:23.520 |
Okay. So, we have our system prompt template which is this 03:20:28.240 |
and then we're just going to have, okay, our question is 03:20:30.160 |
going to go into here as a user message. Cool. So, we have that 03:20:35.120 |
and then, let me make this easier to read. We're going to 03:20:40.640 |
convert both of those to retrievers which just means we 03:20:43.440 |
can retrieve stuff from them and we're going to use this 03:20:46.800 |
runnable parallel to run both of these in parallel, right? So, 03:20:54.240 |
these have been both being run in parallel but then we're also 03:20:56.960 |
running our question in parallel because this needs to 03:20:58.880 |
be essentially passed through this component without us 03:21:03.600 |
modifying anything. So, when we look at this here, it's almost 03:21:07.680 |
like, okay, this section here would be our runnable parallel 03:21:12.960 |
and these are being run in parallel but also our query is 03:21:17.600 |
being passed through. So, it's almost like there's another 03:21:20.480 |
line there which is our runnable pass through, okay? So, 03:21:22.880 |
that's what we're doing here. These are running in parallel. 03:21:25.920 |
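A sketch of that parallel retrieval step, assuming `vecstore_a` and `vecstore_b` are the two in-memory vector stores created above and `prompt` / `llm` are the prompt template and chat model (the placeholder names `context_a`, `context_b`, and `question` are assumptions about the prompt):

```python
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

retriever_a = vecstore_a.as_retriever()
retriever_b = vecstore_b.as_retriever()

retrieval = RunnableParallel({
    "context_a": retriever_a,           # both retrievers run in parallel
    "context_b": retriever_b,
    "question": RunnablePassthrough(),  # the query passes through unchanged
})

chain = retrieval | prompt | llm
out = chain.invoke("What architecture does the model DeepSeek released in December use?")
print(out.content)
```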
One of them is a pass-through. I need to run here. I just 03:21:34.480 |
realized we're using the deprecated embeddings import here. Just 03:21:38.800 |
switch it to this one, from langchain-openai. We run that, run 03:21:44.160 |
this, run that, and now this is set up, okay? So, we then put 03:21:54.320 |
our initial step together. So, this, using our runnable parallel and runnable 03:21:58.320 |
pass through, is our initial step. We then have our 03:22:02.240 |
prompt and the LLM, all chained together with the 03:22:06.960 |
usual, you know, the usual pipe operator, okay? And now, we're 03:22:11.680 |
going to invoke a question: what architecture does the model 03:22:14.160 |
DeepSeek released in December use, okay? So, for the LLM to 03:22:18.880 |
answer this question, it's going to need 03:22:21.840 |
the information about the DeepSeek model that was released 03:22:24.640 |
in December, which we have specified in one half here, and 03:22:30.800 |
then it also needs to know what architecture that model uses 03:22:33.280 |
which is defined in the other half over here, okay? So, let's 03:22:39.040 |
run this, okay? There we go. The DeepSeek V3 model released in 03:22:45.040 |
December 2024 is a mixture of experts model with 671 billion 03:22:49.840 |
parameters, okay? So, a mixture of experts and this many 03:22:53.200 |
parameters. Pretty cool. So, we've put together our pipeline 03:22:58.240 |
using LCEL, using the pipe operator, the runnables, 03:23:02.800 |
specifically, we've looked at the runnable parallel, runnable 03:23:06.160 |
pass through, and also the runnable lambdas. So, that's it 03:23:09.200 |
for this chapter on LCEL and we'll move on to the next one. 03:23:13.600 |
In this chapter, we're going to cover streaming and async in 03:23:17.920 |
LangChain. Now, both using async code and using streaming 03:23:23.200 |
are incredibly important components of I think almost 03:23:28.320 |
any conversational chat interface or at least any good 03:23:32.880 |
conversational chat interface. For async, if your application 03:23:38.080 |
is not async and you're spending a load of time in your 03:23:42.480 |
API or whatever else waiting for LLM calls because a lot of 03:23:45.920 |
those are behind APIs, you are waiting and your application is 03:23:50.880 |
doing nothing because you've written synchronous code and 03:23:54.080 |
that, well, there are many problems with that. Mainly, it 03:23:57.760 |
doesn't scale. So, async code generally performs much better 03:24:02.160 |
and especially for AI where a lot of the time, we're kind of 03:24:06.320 |
waiting for API calls. So, async is incredibly important 03:24:09.680 |
for that. For streaming, now, streaming is slightly different 03:24:13.920 |
thing. So, let's say I ask it to tell me a story, okay? I'm 03:24:21.120 |
using GPT-4 here. It's a bit slower. So, we can actually 03:24:23.760 |
stream. We can see that token by token, this text is being 03:24:27.200 |
produced and sent to us. Now, this is not just a visual 03:24:30.480 |
thing. This is the LLM when it is generating tokens or words, 03:24:38.240 |
it is generating them one by one and that's because these 03:24:41.760 |
LLMs literally generate tokens one by one. So, they're looking 03:24:45.600 |
at all of the previous tokens in order to generate the next 03:24:48.240 |
one and then generate next one, generate next one. Now, that's 03:24:50.720 |
how they work. So, when we are implementing streaming, we're 03:24:56.800 |
getting that feed of tokens directly from the LLM through 03:25:00.160 |
to our, you know, our back end or our front end. That is what 03:25:03.520 |
we see when we see that token by token interface, right? So, 03:25:07.520 |
that's one thing. One other thing that I can do that, let 03:25:12.080 |
me switch across to GPT-4o is I can say, okay, we just got this 03:25:16.480 |
story. I'm going to ask, are there any standard storytelling 03:25:26.480 |
techniques used above? Please use search. 03:25:35.440 |
Okay. So, look, we get this very briefly there. We saw that 03:25:42.240 |
it was searching the web. Now, we told the LLM to use the 03:25:46.240 |
search tool, but what actually happened is that the LLM 03:25:51.600 |
output some tokens to say that it's going to use a search 03:25:56.320 |
tool, and it also would have output 03:26:00.240 |
the tokens saying what that search query would have been, 03:26:02.720 |
although we didn't see it there. But what the ChatGPT 03:26:07.760 |
interface is doing there, so it received those tokens saying, 03:26:11.440 |
hey, I'm going to use the search tool. It doesn't just send us 03:26:14.400 |
those tokens like it does with the standard tokens here. 03:26:17.040 |
Instead, it used those tokens to show us that searching the 03:26:22.960 |
web little text box. So, streaming is not just the 03:26:28.000 |
streaming of these direct tokens. It's also the streaming 03:26:33.120 |
of these intermediate steps that the LLM may be thinking 03:26:36.640 |
through which is particularly important when it comes to 03:26:40.960 |
agents and agentic interfaces. So, it's also a feature thing, 03:26:45.280 |
right? Streaming doesn't just look nice. It's also a feature. 03:26:49.360 |
Then, finally, of course, when we're looking at this, okay, 03:26:53.200 |
let's say we go back to GPT-4 and I say, okay, use all of 03:27:02.640 |
this information to generate a long story for me, 03:27:11.200 |
right? And, okay, we are getting the first token now. So, we 03:27:16.320 |
know something is happening. We need to start reading. Now, 03:27:19.120 |
imagine if we were not streaming anything here and 03:27:22.400 |
we're just waiting, right? We're still waiting now. We're 03:27:25.200 |
still waiting and we wouldn't see anything. We're just like, 03:27:28.240 |
oh, it's just blank or maybe there's a little loading 03:27:30.800 |
spinner. So, we'd still be waiting and even now, we're 03:27:37.280 |
still waiting, right? This is an extreme example but can you 03:27:44.720 |
imagine just waiting for so long and not seeing anything as 03:27:48.080 |
a user, right? Now, just now, we would have got our answer if 03:27:52.240 |
we were not streaming. I mean, that would be painful as a 03:27:56.560 |
user. You'd not want to wait especially in a chat interface. 03:28:00.880 |
You don't want to wait that long. It's okay when, for 03:28:03.680 |
example, deep research takes a long time to process but you 03:28:07.840 |
know it's going to take a long time to process and it's a 03:28:10.000 |
different use case, right? You're getting a report. This is 03:28:13.440 |
a chat interface and yes, most messages are not going to take 03:28:18.560 |
that long to generate. We're also probably not going to be 03:28:22.320 |
using GPT-4 depending on, I don't know, maybe some people 03:28:25.440 |
still do but in some scenarios, it's painful to need to wait 03:28:30.640 |
that long, okay? And it's also the same for agents. It's nice 03:28:34.560 |
when you're using agents to get an update on, okay, we're using 03:28:37.600 |
this tool. It's using this tool. This is how it's using 03:28:39.680 |
them. Perplexity, for example, have a very nice example of 03:28:43.840 |
this. So, okay, what's this? OpenAI co-founder joins 03:28:48.240 |
Murati's startup. Let's see, right. So, we see this is 03:28:51.200 |
really nice. We're using ProSearch. It's searching for 03:28:53.920 |
news, showing us the results, like we're getting all this 03:28:57.200 |
information as we're waiting which is really cool and it 03:29:01.840 |
helps us understand what is actually happening, right? It's 03:29:05.040 |
not needed in all use cases but it's super nice to have those 03:29:08.480 |
intermediate steps, right? So, then we're not waiting and I 03:29:11.600 |
think this bit probably also streamed but it was just super 03:29:14.240 |
fast. So, I didn't see it but that's pretty cool. So, 03:29:18.640 |
streaming is pretty important. Let's dive into our example. 03:29:23.920 |
Okay, we'll open that in Colab and off we go. So, starting with 03:29:28.000 |
the prerequisites, same as always, LangChain, optionally 03:29:32.320 |
LangSmith. We'll also enter our LangChain API key if you'd 03:29:36.160 |
like to use LangSmith. We'll also enter our OpenAI API key. 03:29:40.240 |
So, that is platform.openai.com and then as usual, we can just 03:29:45.200 |
invoke our LLM, right? So, we have that. It's working. Now, 03:29:50.160 |
let's see how we would stream with AStream, okay? So, 03:29:54.880 |
whenever a method, so stream is actually a method as well, we 03:29:58.800 |
could use that but it's not async, right? So, whenever we 03:30:01.760 |
see a method in LangChain that has an 'a' prefix onto what would be 03:30:06.320 |
another method, that's the async version of that method. So, we 03:30:12.560 |
can actually stream using async super easily using just LLM 03:30:19.680 |
AStream, okay? Now, this is just an example and to be 03:30:25.280 |
completely honest, you probably will not be able to use this in 03:30:28.720 |
an actual application but it's just an example and we're going 03:30:32.400 |
to see how we would use this or how we would stream 03:30:35.680 |
asynchronously in an application further down in 03:30:39.040 |
this notebook. So, starting with this, you can see here that 03:30:44.480 |
we're getting these tokens, right? We're just appending it 03:30:46.800 |
to tokens here. We don't actually need to do that. I 03:30:48.800 |
don't think we're using this but maybe we, yeah, we'll do it 03:30:52.480 |
here. It's fine. So, we're just appending the tokens as they 03:30:56.400 |
come back from our LLM, appending it to this. We'll see 03:31:00.000 |
what that is in a moment and then I'm just printing the 03:31:03.680 |
token content, right? So, the content of the token. So, in 03:31:08.240 |
this case, that would be N. In the next, it would be LP, and 03:31:11.440 |
so on and so on. So, you can see, for the 03:31:14.720 |
most part, it tends to be word level, but it can also be 03:31:18.800 |
sub-word level, as you can see with NLP being split up. So, 03:31:24.320 |
you know, they get broken up in various ways. Then, adding 03:31:29.120 |
this pipe character onto the end here. So, we can see, okay, 03:31:33.360 |
where are our individual tokens? Then, we also have 03:31:36.720 |
Flush. So, Flush, you can actually turn this off and 03:31:40.320 |
it's still going to stream. You're still going to see 03:31:41.840 |
everything, but it's going to come through a bit more 03:31:43.920 |
chunkily, like bit by bit. When we use flush, it 03:31:48.800 |
forces the console to update what is being shown to us 03:31:53.680 |
immediately, alright? So, we get a much smoother output when 03:31:58.560 |
we're looking at this, versus when flush is 03:32:02.160 |
not set to true. So, yeah, when you're printing, that is good 03:32:05.840 |
to do just so you can see. You don't necessarily need to. 03:32:08.640 |
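To make that concrete, here is a minimal sketch of the astream pattern being described; the model name is an assumption:

import asyncio

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # assumed model name

async def stream_demo():
    tokens = []
    # astream is the async counterpart of the synchronous stream method
    async for chunk in llm.astream("What does NLP stand for?"):
        tokens.append(chunk)  # keep the AIMessageChunk objects for later
        # end="|" makes the token boundaries visible; flush=True updates the console immediately
        print(chunk.content, end="|", flush=True)
    return tokens

# tokens = asyncio.run(stream_demo())  # in a notebook you would just await stream_demo()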
Okay. Now, we added all those tokens to the tokens list so 03:32:12.960 |
we can have a look at each individual object that was 03:32:15.600 |
returned to us, right? This is interesting. So, you see that 03:32:18.640 |
we have the AI message chunk, right? That's an object and 03:32:22.640 |
then you have the content. The first one's actually empty. 03:32:26.000 |
Second one has that N for NLP and yeah, I mean, that's all we 03:32:31.120 |
really need to know. They're very simple objects but they're 03:32:34.240 |
actually quite useful because just look at this, right? So, 03:32:38.640 |
we can add each one of our AI message chunks, right? Let's 03:32:42.640 |
see what that does. It doesn't create a list. It creates this, 03:32:45.920 |
right? So, we still just have one AI message chunk but it's 03:32:51.600 |
combined the content within those AI message chunks which 03:32:55.440 |
is kind of cool, right? So, for example, like we could remove 03:32:59.440 |
these, right? And then we just see NLP. So, it's kind of nice 03:33:05.440 |
little feature there. I do. I actually quite like that. But 03:33:10.640 |
you do need to just be a little bit careful because obviously 03:33:12.800 |
you can do that the wrong way and you're going to get like a 03:33:16.720 |
I don't know what that is. Some weird token salad. So, yeah, 03:33:21.360 |
you need to just make sure you are going to be merging those 03:33:24.480 |
in the correct order unless you, I don't know, unless you're 03:33:28.160 |
doing something weird. Okay, cool. So, streaming, that was 03:33:32.720 |
streaming from an LLM. Let's have a look at streaming with 03:33:35.600 |
agents. So, it gets a bit more complicated, to be 03:33:41.120 |
completely honest, but things need to 03:33:45.680 |
get a bit more complicated so that we can implement this in, 03:33:49.280 |
for example, an API, right? That is kind of a 03:33:52.800 |
necessary thing in any case. So, to just very quickly, we're 03:33:58.560 |
going to construct our agent executor like we did in the 03:34:01.440 |
agent execution chapter. And for that, for the agent 03:34:06.160 |
executor, we're going to need tools, chat prompt template, LLM, 03:34:09.600 |
agent, and the agent executor itself, okay? Very quickly, I'm 03:34:13.360 |
not going to go through these in detail. We just define our 03:34:16.320 |
tools. We have add, multiply, exponentiate, subtract, and 03:34:20.080 |
define our answer tool. Merge those into a single list of 03:34:23.200 |
tools. Then, we have our prompt template. Again, same as 03:34:27.680 |
before, we just have system message, we have chat history, 03:34:30.640 |
we have a query, and then we have the agent scratch pad for 03:34:34.960 |
those intermediate steps. Then, we define our agent using 03:34:39.760 |
LCEL. LCEL works quite well with both streaming and async, by 03:34:44.000 |
the way. It supports both out of the box, which is nice. So, we 03:34:49.840 |
define our agent. Then, coming down here, we're going to 03:34:54.800 |
create the agent executor. This is the same as before, right? 03:34:58.240 |
So, there's nothing new in here, I don't think. So, just 03:35:01.520 |
initialize our agent there. Then, we're 03:35:06.960 |
looping through, and there's nothing 03:35:11.920 |
new there. So, we're just invoking our 03:35:15.600 |
agent, seeing if there's a tool call. This is slightly, we 03:35:20.480 |
could shift this to before or after. It doesn't actually 03:35:22.320 |
matter that much. So, we're checking if it's the final 03:35:25.440 |
answer. If not, we continue, execute our tools, and so on. 03:35:30.640 |
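As a recap, here is a rough sketch of that agent definition with LCEL; the tool bodies, prompt wording, and model name are assumptions based on the description rather than the notebook verbatim:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def add(x: float, y: float) -> float:
    """Add x and y."""
    return x + y

@tool
def final_answer(answer: str, tools_used: list[str]) -> str:
    """Provide the final answer to the user."""
    return answer

tools = [add, final_answer]  # plus multiply, exponentiate, subtract in the notebook

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the tools provided."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # assumed model name

# tool_choice="any" forces the LLM to always respond with a tool call
agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")
)

The tool_choice="any" part is what forces the LLM to always answer with a tool call, which matters later when we look at why the content field of the streamed chunks is usually empty.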
Okay, cool. So, then, we can invoke that. Okay, we go, what 03:35:37.440 |
is 10 plus 10? There we go, right? So, we have our agent 03:35:43.040 |
executor, it is working. Now, when we are running our agent 03:35:50.240 |
executor, with every new query, if we're putting this into an 03:35:54.000 |
API, we're probably going to need to provide it with a fresh 03:35:59.200 |
callback handler. Okay, so, this is the callback handler is 03:36:02.480 |
what's going to handle taking the tokens that are being 03:36:05.520 |
generated by our LLM or agent and giving them to some other 03:36:10.160 |
piece of code. Like, for example, the streaming 03:36:12.960 |
response for an API, and our callback handler is going to 03:36:18.560 |
put those tokens in a queue, in our case, and then our, for 03:36:23.840 |
example, the streaming object is going to pick them up from 03:36:26.880 |
the queue and put them wherever they need to be. So, to allow 03:36:32.080 |
us to do that with every new query, rather than us needing 03:36:35.440 |
to initialize everything when we actually initialize our 03:36:39.600 |
agent, we can add a configurable field to our LLM, 03:36:43.360 |
okay? So, we set the configurable fields here. Oh, 03:36:46.960 |
also, one thing is that we set streaming equal to true, that's 03:36:50.320 |
very minor thing, but just so you see that there, we do do 03:36:54.080 |
that. So, we add some configurable fields to our LLM, 03:36:57.200 |
which means we can basically pass an object in for these on 03:37:00.640 |
every new invocation. So, we set our configurable field, it's 03:37:06.000 |
going to be called callbacks, and we just add a description, 03:37:09.440 |
right? Nothing more to it. So, this will now allow us to 03:37:13.120 |
provide that field when we're invoking our agent, okay? Now, 03:37:21.120 |
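before moving on, here is a minimal sketch of what that configurable field setup looks like; treat it as an approximation of the notebook code, and note that the model name is an assumption:

from langchain_core.runnables import ConfigurableField
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",  # assumed model name
    temperature=0,
    streaming=True,       # stream tokens rather than returning one final message
).configurable_fields(
    callbacks=ConfigurableField(
        id="callbacks",
        name="callbacks",
        description="A list of callbacks to use for streaming",
    )
)

With that in place, we can pass a fresh callback handler in the config of every invocation. Next,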
we need to define our callback handler, and as I mentioned, 03:37:25.680 |
what is basically going to be happening is this callback 03:37:28.000 |
handler is going to be passing tokens into our async IO queue 03:37:33.200 |
object, and then we're going to be picking them up from the 03:37:36.960 |
queue elsewhere, okay? So, we can call it a queue callback 03:37:40.640 |
handler, okay? And that is inheriting from the async 03:37:44.560 |
callback handler, because we want all this to be done 03:37:46.480 |
asynchronously, because we're thinking here about, okay, how 03:37:49.280 |
do we implement all this stuff within APIs and actual real 03:37:52.880 |
world code, and we do want to be doing all this in async. So, 03:37:58.080 |
let me execute that, and I'll just explain a little bit of 03:38:00.240 |
what we're looking at. So, we have the initialization, right? 03:38:03.520 |
There's nothing specific here. What we really want to be 03:38:08.560 |
doing is we want to be setting our queue object, assigning 03:38:11.760 |
that to the class attributes, and then there's also this 03:38:15.840 |
final answer seen flag, which we're setting to false. So, what 03:38:15.840 |
we're going to be using that for is our LLM will be 03:38:24.240 |
streaming tokens to us whilst it's using its tool calling, 03:38:29.360 |
and we might not want to display those immediately, or 03:38:31.600 |
we might want to display them in a different way. So, by 03:38:34.560 |
setting this final answer seen flag to false, whilst our LLM is 03:38:34.560 |
outputting those tool tokens, we can handle them in a 03:38:44.240 |
different way, and then as soon as we see that it's done with 03:38:47.360 |
the tool calls and it's onto the final answer, which is 03:38:49.600 |
actually another tool call, but once we see that it's onto the 03:38:52.160 |
final answer tool call, we can set this to true, and then we 03:38:56.240 |
can start processing our tokens in a different way, 03:38:59.360 |
essentially. So, we have that. Then, we have this 03:39:03.840 |
__aiter__ method. This is required for any async generator object. 03:39:11.280 |
So, what that is going to be doing is going to be iterating 03:39:13.680 |
through, right? So, it's a generator. It's going to be 03:39:16.400 |
going iterating through and saying, okay, if our queue is 03:39:19.760 |
empty, right? This is the queue that we set up here. If it's 03:39:22.800 |
empty, wait a moment, right? We use the sleep method here, and 03:39:27.360 |
this is an async sleep method. This is super important. We're 03:39:30.960 |
using, we're awaiting for an asynchronous sleep, right? So, 03:39:35.040 |
whilst we're waiting for that 0.1 seconds, 03:39:38.880 |
our code can be doing other things, right? That 03:39:43.360 |
is important. If we use, I think the standard is time 03:39:47.280 |
dot sleep, that is not asynchronous, and so it will 03:39:50.560 |
actually block the thread for that 0.1 seconds. So, we don't 03:39:54.880 |
want that to happen. Generally, our queue should probably not 03:39:58.000 |
be empty that frequently given how quickly tokens are going to 03:40:01.680 |
be added to the queue. So, the only way that this would 03:40:05.440 |
potentially be empty is maybe our LLM stops. Maybe there's 03:40:10.720 |
like a connection interruption for a, you know, a brief second 03:40:13.600 |
or something, and no tokens are added. So, in that case, we 03:40:17.280 |
don't actually do anything. We don't keep checking the queue. 03:40:19.680 |
We just wait a moment, okay? And then, we check again. Now, 03:40:24.320 |
if it was empty, we wait, and then, we continue on to the 03:40:28.080 |
next iteration. Otherwise, it probably won't be empty. We get 03:40:33.040 |
whatever is inside our queue. We get that out, pull 03:40:36.160 |
it out. Then, we say, okay, if that token is a done token, 03:40:42.640 |
we're going to return. So, we're going to stop this 03:40:45.760 |
generator, right? We're finished. Otherwise, if it's 03:40:49.680 |
something else, we're going to yield that token which means 03:40:52.480 |
we're returning that token, but then, we're continuing through 03:40:55.520 |
that loop again, right? So, that is our generator logic. 03:41:01.760 |
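As a rough sketch, the queue and generator side of that handler might look like this; the sentinel string and the exact class internals are assumptions based on the description:

import asyncio

from langchain_core.callbacks import AsyncCallbackHandler

class QueueCallbackHandler(AsyncCallbackHandler):
    """Puts streamed tokens onto an asyncio queue and lets us iterate over them."""

    def __init__(self, queue: asyncio.Queue):
        self.queue = queue
        self.final_answer_seen = False

    async def __aiter__(self):
        while True:
            if self.queue.empty():
                # async sleep, so other tasks can run while we wait for new tokens
                await asyncio.sleep(0.1)
                continue
            token_or_done = self.queue.get_nowait()
            if token_or_done == "<<DONE>>":  # assumed sentinel string
                # the LLM has finished, so stop the generator
                return
            if token_or_done:
                yield token_or_done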
Then, we have some other methods here. These are 03:41:05.360 |
LangChain-specific, okay? We have on_llm_new_token and we 03:41:10.400 |
have on_llm_end. Starting with on_llm_new_token, this is 03:41:14.960 |
basically when an LLM returns a token to us. LangChain is 03:41:18.400 |
going to run or execute this method, okay? This is the 03:41:23.280 |
method that will be called. What this is going to do is 03:41:27.200 |
it's going to go into the keyword arguments. It's going 03:41:29.200 |
to get the chunk object. So, this is coming from our LLM. If 03:41:33.280 |
there is something in that chunk, it's going to check for 03:41:37.440 |
a final answer tool call first, okay? So, we get our tool 03:41:41.680 |
calls and we say, if the name within our chunk, right? 03:41:46.400 |
Probably, this will be empty for most of the tokens we return, 03:41:49.520 |
right? So, you remember before when we're looking at the 03:41:52.640 |
chunks here, this is what we're looking at, right? The 03:41:56.160 |
content for us is actually always going to be empty and 03:41:58.320 |
instead, we're actually going to get the additional keyword 03:42:00.720 |
args here and inside there, we're going to have our tool 03:42:03.600 |
calling, our tool calls as we saw in the previous videos, 03:42:08.480 |
right? So, that's what we're extracting. We're extracting 03:42:10.800 |
that information. That's why we're going additional keyword 03:42:13.760 |
args, right? And get those tool, the tool call information, 03:42:18.800 |
right? Or it will be none, right? So, if it is none, I 03:42:23.360 |
don't think it ever would be none to be honest. It would be 03:42:25.840 |
strange if it's none. I think that means something would be 03:42:28.080 |
wrong. Okay, so here, we're using the Walrus operator. So, 03:42:31.120 |
the Walrus operator, what it's doing here is whilst we're 03:42:34.880 |
checking the if logic here, whilst we do that, it's also 03:42:39.840 |
assigning whatever is inside this. It's assigning over to 03:42:44.160 |
tool calls and then with the if we're checking whether tool 03:42:48.240 |
calls is something or none, right? Because we're using get 03:42:52.640 |
here. So, if this get operation fails and there is no tool 03:42:56.640 |
calls, this object here will be equal to none which gets 03:43:01.360 |
assigned to tool calls here and then this if none will return 03:43:06.160 |
false and this logic will not run, okay? And it will just 03:43:09.680 |
continue. If this is true, so if there is something returned 03:43:13.520 |
here, we're going to check if that something returned is 03:43:16.400 |
using the function name or tool name, final answer. If it is, 03:43:20.560 |
we're going to set that final answer seen flag equal to true. 03:43:23.040 |
Otherwise, we're just going to add our chunk into the queue, 03:43:27.760 |
okay? We use put_nowait here because we're using 03:43:30.560 |
async. Otherwise, if you were not using async, you 03:43:33.600 |
might just use put. You'd 03:43:39.360 |
use put if it's just synchronous code, but I don't 03:43:43.200 |
think I've ever implemented this synchronously. So, it 03:43:46.240 |
would actually just be put_nowait for async, okay? And 03:43:49.440 |
then we return. So, we have that. Then, we have on_llm_end, okay? 03:43:56.480 |
So, this is when LangChain sees that the LLM has returned or 03:44:02.080 |
indicated that it is finished with the response. LangChain 03:44:06.480 |
will call this. So, you have to be aware that this will happen 03:44:13.120 |
multiple times during an agent execution because if you think 03:44:17.440 |
within our agent executor, we're hitting the LLM multiple 03:44:22.080 |
times. We have that first step where it's deciding, oh, I'm 03:44:25.600 |
going to use the add tool or the multiply tool and then that 03:44:29.120 |
response gets back to us. We execute that tool and then we 03:44:33.360 |
pass the output from that tool and or the original user query 03:44:36.960 |
in the chat history, we pass that back to our LLM again, 03:44:39.680 |
right? So, that's another call to our LLM that's going to come 03:44:42.560 |
back. It's going to finish or it's going to give us something 03:44:45.120 |
else, right? So, there's multiple LLM calls happening 03:44:48.640 |
throughout our agent execution logic. So, this on_llm_end method 03:44:53.200 |
will actually get called at the end of every single one of 03:44:55.680 |
those LLM calls. Now, if we get to the end of a LLM call and it 03:45:02.480 |
was just a tool invocation. So, we had the, you 03:45:05.600 |
know, it called the add tool. We don't want to put the done 03:45:11.280 |
token into our queue because when the done token is added to 03:45:14.640 |
our queue, we're going to stop iterating, okay? Instead, if it 03:45:20.880 |
was just a tool call, we're going to say step end, right? 03:45:24.240 |
And we'll actually get this token back. So, this is useful 03:45:27.920 |
on, for example, the front end, you could have, okay, I've 03:45:32.560 |
used the add tool. These are the parameters and it's the end 03:45:36.560 |
of the step. So, you could have that your tool call is being 03:45:40.640 |
used on some front end and as soon as it sees step end, it 03:45:43.840 |
knows, okay, we're done with that. Here was the response, 03:45:46.720 |
right? And it can just show you that and we're going to use 03:45:49.680 |
that. We'll see that soon but let's say we get to the final 03:45:53.280 |
answer tool. We're on the final answer tool and then we get 03:45:56.400 |
this signal that the LLM has finished. Then, we need to stop 03:46:01.920 |
iterating. Otherwise, our stream generator is just going 03:46:06.000 |
to keep going forever, right? Nothing's going to stop it or 03:46:08.880 |
maybe it will time out. I don't think it will though. So, at 03:46:13.200 |
that point, we need to send, okay, stop, right? We need to 03:46:16.800 |
say we're done, and then that will come back 03:46:19.760 |
here to our async iterator, and it will 03:46:25.360 |
return and stop the generator, okay? So, that's the core 03:46:30.960 |
logic that we have inside that. I know there's a lot going on 03:46:34.240 |
there, but we need all of this. So, it's important to be 03:46:38.400 |
aware of it. 03:46:43.040 |
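To make those two callback methods a bit more concrete, here is a rough sketch of how they might look as methods on the QueueCallbackHandler sketched earlier; the sentinel strings like "<<DONE>>" and "<<STEP_END>>" are illustrative assumptions:

# rough sketch of the two callback methods, as methods on the
# QueueCallbackHandler class sketched earlier
async def on_llm_new_token(self, *args, **kwargs) -> None:
    chunk = kwargs.get("chunk")
    if chunk:
        # tool calls live in the additional_kwargs of the message chunk
        if tool_calls := chunk.message.additional_kwargs.get("tool_calls"):
            if tool_calls[0]["function"].get("name") == "final_answer":
                # we're onto the final answer tool call now
                self.final_answer_seen = True
                return
    self.queue.put_nowait(chunk)

async def on_llm_end(self, *args, **kwargs) -> None:
    # called at the end of every LLM call within the agent execution loop
    if self.final_answer_seen:
        self.queue.put_nowait("<<DONE>>")      # assumed sentinel: stop iterating
    else:
        self.queue.put_nowait("<<STEP_END>>")  # assumed sentinel: end of a tool step

Okay. So, now, let's see how we might actually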
call our agent with all of the streaming in this way. So, 03:46:49.360 |
we're going to initialize our queue. We're going to use that 03:46:53.120 |
to initialize a streamer, okay? Using the custom streamer 03:46:56.400 |
that we just set up. Custom callback handler, whatever you 03:46:59.040 |
want to call it, okay? Then, I'm going to define a function. 03:47:03.200 |
So, this is an asynchronous function. It has to be, if 03:47:05.840 |
we're using async, and what it's going to do is it's going to 03:47:09.200 |
call our agent with a config here, and we're going to pass it 03:47:14.720 |
the callback, which is the streamer, right? 03:47:18.320 |
Now, here, I'm not calling the agent executor. I'm just calling 03:47:20.800 |
the agent, right? So, if we come back up here, we're 03:47:25.360 |
calling this, right? So, that's not going to include all the 03:47:28.720 |
tool execution logic and importantly, we're calling the 03:47:32.960 |
agent with the config that uses callbacks, right? So, this 03:47:37.840 |
configurable field here from our LLM is actually being 03:47:40.720 |
fed through and it propagates through to our agent object as 03:47:43.360 |
well to the runnable serializable, right? So, that's 03:47:47.200 |
what we're executing here. We see agent with config and we're 03:47:50.560 |
passing in those callbacks which is just one actually, 03:47:54.000 |
okay? So, that sets up our agent and then we invoke it with 03:47:58.240 |
a stream, okay? Like we did before and we're just going to 03:48:01.760 |
return everything. So, let's run that, okay? And we see all 03:48:07.280 |
the token or the chunk objects that have been returned and 03:48:10.480 |
this is useful to understand what we're actually doing up 03:48:14.080 |
here, right? So, when we're doing this chunk message, 03:48:17.920 |
additional keyword arguments, right? We can see that in here. 03:48:20.960 |
So, this would be the chunk message object. We get the 03:48:24.640 |
additional keyword logs. We're going to tool calls and we get 03:48:28.480 |
the information here. So, we have the ID for that tool call 03:48:31.040 |
which we saw in the previous chapters. Then, we have our 03:48:35.760 |
function, right? So, the function includes the name, 03:48:39.760 |
right? So, we know what tool we're calling from this first 03:48:42.560 |
chunk but we don't know the arguments, right? Those 03:48:44.960 |
arguments are going to be streamed to us. So, we can see 03:48:47.600 |
them begin to come through in the next chunk. So, next chunk 03:48:51.920 |
is just it's just the first token for the add function, 03:48:56.640 |
right? And we can see these all come together over multiple 03:49:00.640 |
steps and we actually get all of our arguments, okay? That's 03:49:05.600 |
pretty cool. So, actually one thing I would like to show you 03:49:10.000 |
here as well. So, if we just do token equals tokens, sorry. 03:49:25.460 |
Okay. We have all of our tokens in here now. Alright, see that 03:49:31.260 |
they're all AI message chunks. So, we can actually add those 03:49:35.500 |
together, right? So, let's we'll go with these here and 03:49:39.340 |
based on these, we're going to get all of the arguments, okay? 03:49:42.540 |
So, this is kind of interesting. So, it's tokens one onwards. 03:49:51.420 |
Alright, so we have these, and actually we just want to add 03:49:56.240 |
those together. So, I'm going to go with tk equals tokens one, and 03:50:07.760 |
then for token in, we're going to go from the second onwards, I'm 03:50:13.700 |
going to do tk plus token, right? And let's see what tk looks 03:50:23.780 |
like. Okay. So, now you see that it's kind of merged all those 03:50:28.180 |
arguments here. Sorry, that should be plus equals. Okay. So, run that and 03:50:34.500 |
you can see here that it's merged those arguments. It 03:50:36.900 |
didn't get all of them. So, I kind of missed some at the end 03:50:38.980 |
there but it's merging them, right? So, you can see that 03:50:42.020 |
logic where it's, you know, before it was adding the 03:50:45.060 |
content from various chunks. It also does the same for the 03:50:49.460 |
other parameters within your chunk object which is I think 03:50:53.220 |
it's pretty cool and you can see here the name wasn't 03:50:55.940 |
included. That's because we started on token one rather than 03:50:59.700 |
token zero, where the name was. So, if we actually start from 03:51:02.660 |
token zero and just pull them all in there, 03:51:06.660 |
alright? So, from zero onwards, we're going to get a complete 03:51:12.820 |
AI message chunk which includes the name here and all of those 03:51:17.940 |
arguments, and you'll see also here, right, it populates 03:51:21.220 |
everything which is pretty cool. Okay. So, we have that. 03:51:26.900 |
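To illustrate that merging, assuming tokens is the list of AIMessageChunk objects we collected above, and with the printed output being purely illustrative:

# merge the streamed chunks back into one AIMessageChunk; the tool call
# name and argument fragments are concatenated for us as we add them
merged = tokens[0]
for token in tokens[1:]:
    merged += token

print(merged.additional_kwargs["tool_calls"][0]["function"])
# illustrative output: {'name': 'add', 'arguments': '{"x": 10, "y": 10}'}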
Now, based on this, we're going to want to modify our custom 03:51:29.700 |
agent executor because we're streaming everything, right? 03:51:34.500 |
So, we want to add streaming inside our agent executor which 03:51:38.020 |
we're doing here, right? So, this is async def stream and 03:51:42.180 |
we're doing async for token in the astream, okay? So, this 03:51:47.620 |
is like the very first instance. If output is None, 03:51:51.220 |
we're just going to be adding our token, or the chunk, 03:51:55.140 |
sorry, to our output, like the first token becomes our output. 03:52:00.740 |
Otherwise, we're just appending our tokens to the output, okay? 03:52:06.660 |
If the token content is empty, which it should be, right? 03:52:09.860 |
Because we're using tool calls all the time. We're just going 03:52:12.340 |
to print content, okay? I just added these so we 03:52:16.580 |
print everything. I just want to be able to see that. 03:52:19.540 |
I wouldn't expect this to run because we're saying it has to 03:52:22.900 |
use tool calling, okay? So, within our agent, if we come up 03:52:28.180 |
to here, we said tool choice any. So, it's been forced to 03:52:30.980 |
use tool calling. So, it should never really be returning 03:52:34.100 |
anything inside the content field but just in case it's 03:52:36.980 |
there, right? So, we'll see if that is actually true. Then, 03:52:40.740 |
we're just getting out our tool calls information, okay? From 03:52:44.820 |
our chunk and we're going to say, okay, if there's something 03:52:46.900 |
in there, we're going to print what is in there, okay? And 03:52:49.540 |
then, we're going to extract our tool name. If there is some, 03:52:52.500 |
if there's a tool name, I'm going to show you the tool name. 03:52:55.780 |
Then, we're going to get the ARGs and if the ARGs are not 03:52:58.740 |
empty, we're going to see what we get in there, okay? And then 03:53:03.060 |
from all of this, we're actually going to merge all of 03:53:05.380 |
it into our AI message, right? Because we're merging 03:53:08.980 |
everything as we're going through, we're merging 03:53:10.420 |
everything into outputs as I showed you before, okay? Cool. 03:53:13.860 |
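Here is a rough sketch of that streaming method, assuming agent is the LCEL agent and streamer is the queue callback handler from before; names and details are approximate:

# rough sketch of the streaming method inside the custom agent executor
async def stream(query: str):
    output = None
    # pass the queue callback handler in via the configurable callbacks field
    agent_with_callbacks = agent.with_config(callbacks=[streamer])
    async for token in agent_with_callbacks.astream({
        "input": query,
        "chat_history": [],
    }):
        if output is None:
            output = token   # the first chunk becomes our output
        else:
            output += token  # later chunks get merged into it
        # tool names and argument fragments could be printed here as they arrive
    return output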
And then, we're just awaiting our stream that will like kick 03:53:16.340 |
it off, okay? And then, we do the standard agent executor 03:53:20.420 |
stuff again here, right? So, we're just pulling out tool 03:53:23.380 |
name, tool logs, tool call ID and then we're using all that 03:53:26.100 |
to execute our tool here and then we're creating a new tool 03:53:29.700 |
message and passing that back in. And then also here, I move 03:53:33.300 |
the break for the final answer into the final step. So, that 03:53:37.780 |
is our custom agent executor with streaming and let's see 03:53:41.220 |
what, let's see what it does, okay? Set verbose equal to 03:53:45.380 |
true, so we see all those print statements, okay? So, you can 03:53:52.340 |
kind of see it's a little bit messy but you can see we have 03:53:55.700 |
tool calls that had some stuff inside it, had add here and 03:54:00.740 |
what we're printing out here is we're printing out the full AI 03:54:03.380 |
message chunk with tool calls and then I'm just printing out, 03:54:06.900 |
okay, what are we actually pulling out from that? So, 03:54:09.460 |
these are actually coming from the same thing, okay? And then 03:54:12.740 |
the same here, right? So, we're looking at the full message 03:54:15.300 |
and then we're looking, okay, we're getting this argument out 03:54:18.340 |
from it, okay? So, we can see everything that is being pulled 03:54:22.180 |
out, you know, chunk by chunk or token by token and that's it, 03:54:27.380 |
okay? So, we could just get everything like that. However, 03:54:31.060 |
right, so I'm printing everything so we can see that 03:54:33.300 |
streaming. What if I don't print, okay? So, we're setting 03:54:37.380 |
verbose, or by default, verbose is equal to false here, so we run that again. 03:54:50.980 |
Cool. We got nothing. So, the reason we got nothing is 03:54:58.480 |
because we're not printing. But if you're 03:55:04.560 |
building an API, for example, you're pulling your tokens 03:55:08.160 |
through, you can't print them to a front end or 03:55:15.440 |
return them as the output of your API. Printing goes to your 03:55:20.560 |
terminal, right? Your console window. It doesn't go anywhere 03:55:24.080 |
else. Instead, what we want to do is we actually want to get 03:55:29.040 |
those tokens out, right? But how do we do that, right? 03:55:33.760 |
So, we printed them, but another place that those tokens 03:55:37.680 |
are is in our queue, right? Because we set them up to go to 03:55:41.680 |
the queue. So, we can actually pull them out of our queue 03:55:48.480 |
whilst our agent executor is running and then we can do 03:55:52.560 |
whatever we want with them because our code is async. So, 03:55:54.800 |
it can be doing multiple things at the same time. So, whilst 03:55:58.000 |
our code is running the agent executor, whilst that is 03:56:02.000 |
happening, our code can also be pulling out from our queue 03:56:05.680 |
tokens that are in there and sending them to like an API, 03:56:11.120 |
for example, right? Or whatever downstream logic you have. So, 03:56:15.680 |
let's see what that looks like. We start by just initializing 03:56:19.040 |
our queue, initializing our streamer with that queue. Then 03:56:22.080 |
we create a task. So, this is basically saying, okay, I want 03:56:26.400 |
to run this but don't run it right now. I'm not ready yet. 03:56:29.760 |
The reason that I say I'm not ready yet is because I also 03:56:33.440 |
want to define here my async loop which is going to be 03:56:38.000 |
printing those tokens, right? But this is async, right? So, 03:56:41.360 |
we set this up. This is like get ready to run this. Because 03:56:45.520 |
it is async, this is running, right? This is just running. 03:56:49.760 |
Like it's there. It's already running. So, we get this. We 03:56:52.640 |
continue. We continue. None of this is actually executed 03:56:56.160 |
yet, right? Only here when we await the task that we set up 03:57:02.560 |
here. Only then does our agent executor run and our async 03:57:10.080 |
object here begin getting tokens, right? And here, again, 03:57:14.080 |
I'm printing but I don't need to print. I could I could have 03:57:17.280 |
like a let's say where this is within an API or something. 03:57:23.440 |
Let's say I'm saying, okay, send token to XYZ, right? 03:57:31.700 |
That's sending a token somewhere, or maybe 03:57:34.340 |
we're yielding this to some sort of streamer object within 03:57:38.500 |
our API, right? We can do whatever we want with those 03:57:40.900 |
tokens, okay? I'm just printing them cuz I want to actually see 03:57:44.420 |
them, okay? But just important here is that we're not printing 03:57:49.300 |
them within our agent executor. We're printing them outside the 03:57:52.580 |
agent executor. We've got them out and we can put them 03:57:55.860 |
wherever we want which is perfect when you're building an 03:57:58.820 |
actual sort of real world use case where you're using an API 03:58:01.220 |
or something else. 03:58:03.940 |
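Here is a rough sketch of that pattern, assuming the QueueCallbackHandler and agent_executor from above; the invoke signature is an assumption based on the description:

import asyncio

async def run_and_stream(query: str):
    queue = asyncio.Queue()
    streamer = QueueCallbackHandler(queue)
    # schedule the executor, but don't block on it yet
    # (the invoke signature here is an assumption based on the description)
    task = asyncio.create_task(agent_executor.invoke(query, streamer))
    # while the executor runs, pull tokens out of the queue via the streamer
    async for token in streamer:
        # we just print, but we could equally send each token to an API response
        print(token, end="", flush=True)
    await task  # make sure the executor has fully finished

# await run_and_stream("What is 10 + 10?")  # in a notebook / async context

Okay, so let's run that. Let's see what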
we get. Look at that. We get all of the information we could 03:58:08.580 |
need and a little bit more, right? Because now, we're using 03:58:12.580 |
the agent executor and now, we can also see how we have this 03:58:16.740 |
step end, right? So, I know or I know just from looking at this, 03:58:21.060 |
right? This is my first tool use. So, what tool is it? Let's 03:58:25.620 |
have a look. It's the add tool and then, we have these 03:58:29.140 |
arguments. So, I can then pass them, right? Downstream. Then, 03:58:32.740 |
we have the next tool use which is here, down here. So, then, 03:58:37.940 |
we can then pass them in the way that we like. So, that's 03:58:42.100 |
pretty cool. Let's see, right? So, we're 03:58:47.060 |
getting those things out. Can we do something with 03:58:50.900 |
them before we print and show them? Yes, let's 03:58:54.660 |
see, okay? So, we're now modifying our loop here. Same 03:58:59.860 |
stuff, right? We're still initializing our queue, 03:59:02.580 |
initializing our streamer, initializing our tasks, okay? 03:59:06.020 |
And we're still doing this async for token streamer, okay? 03:59:09.860 |
But then, we're doing stuff with our tokens. So, I'm saying, 03:59:13.460 |
okay, if we're on stream end, I'm not actually gonna print 03:59:17.300 |
stream end. I'm gonna print new line, okay? Otherwise, if we're 03:59:21.940 |
getting a tool call here, we're going to say, if that tool call 03:59:26.260 |
is the tool name, I am going to print calling tool name, okay? 03:59:32.500 |
If it's the arguments, I'm going to print the tool 03:59:36.020 |
argument, and I'm gonna set end to nothing so that we don't 03:59:38.740 |
go onto a new line. So, we're actually gonna be streaming 03:59:41.460 |
everything, okay? So, let's just see what this looks like. 03:59:55.420 |
You see that? So, it goes very fast. So, it's kinda hard to 03:59:59.200 |
see it. I'm gonna slow it down so you can see. So, you can see 04:00:02.800 |
that we, as soon as we get the tool name, we stream that 04:00:07.040 |
we're calling the add tool. Then, we stream token by token, 04:00:10.560 |
the actual arguments for that tool. Then, for the next one, 04:00:13.680 |
again, we do the same. We're calling this tool name. Then, 04:00:16.880 |
we're streaming token by token again. We're processing 04:00:20.240 |
everything downstream from outside of the agent executor 04:00:24.560 |
and this is an essential thing to be able to do when we're 04:00:27.920 |
actually implementing streaming and async and everything else 04:00:32.480 |
in an actual application. So, I know that's a lot but it's 04:00:38.960 |
important. So, that is it for our chapter on streaming and 04:00:43.360 |
async. I hope it's all been useful. Thanks. Now, we're on 04:00:47.200 |
to the final capstone chapter. We're going to be taking 04:00:51.280 |
everything that we've learned so far and using it to build a 04:00:56.640 |
actual chat application. Now, the chat application is what 04:01:00.400 |
you can see right now and we can go into this and ask some 04:01:04.400 |
pretty interesting questions and because it's an agent 04:01:06.960 |
because as I've accessed these tools, it will be able to 04:01:09.440 |
answer them for us. So, we'll see inside our application that 04:01:12.800 |
we can ask questions that require tool use such as this 04:01:17.040 |
and because of the streaming that we've implemented, we can 04:01:19.600 |
see all this information in real time. So, we can see that 04:01:22.160 |
SerpAPI tool is being used, that these are the queries. We 04:01:25.280 |
saw all that was in parallel as well. So, each one of those 04:01:29.200 |
tools were being used in parallel. We've modified the 04:01:31.840 |
code a little bit to enable that and we see that we have 04:01:36.160 |
the answer. We can also see the structured output being used 04:01:39.520 |
here. So, we can see our answer followed by the tools used 04:01:43.440 |
here and then we could ask follow-up questions as well 04:01:45.920 |
because it's conversational. So, say how is the weather in 04:01:54.960 |
Okay, that's pretty cool. So, this is what we're going to be 04:02:04.540 |
building. We are, of course, going to be focusing on the 04:02:07.580 |
API, the backend. I'm not a front-end engineer, so I can't 04:02:11.340 |
take you through that but the code is there. So, for those of 04:02:14.380 |
you that do want to go through the front-end code, you can, of 04:02:17.260 |
course, go and do that but we'll be focusing on how we 04:02:20.380 |
build the API that powers all of this using, of course, 04:02:24.220 |
everything that we've learned so far. So, let's jump into it. 04:02:27.340 |
The first thing we're going to want to do is clone this repo. 04:02:30.700 |
So, we'll copy this URL. This is the repo, Aurelio Labs 04:02:34.860 |
LangChain Course, and you just clone the repo like so. I've 04:03:41.340 |
already done this so I'm not going to do it again. Instead, 04:02:44.940 |
I'll just navigate to the LangChain Course repo. Now, 04:03:49.340 |
there's a few setup things that you do need to do. All of 04:02:53.020 |
those can be found in the README. So, we just open a new 04:02:57.740 |
tab here and I'll open the README. Okay, so this explains 04:03:03.180 |
everything we need. We have, if you were running this locally 04:03:06.860 |
already, you will have seen this or you will have already 04:03:09.580 |
done all this but for those of you that haven't, we'll go 04:03:12.460 |
through quickly now. So, you will need to install the uv 04:03:18.140 |
library. So, this is how we manage our Python environment, 04:03:22.700 |
our packages. We use uv. On Mac, you would install it like 04:03:27.980 |
so. If you're on Windows or Linux, just double check how 04:03:32.620 |
you would install over here. Once you have installed this, 04:03:36.700 |
you would then go to install Python. So, uv Python install. 04:03:42.780 |
Then, we want to create our VM, our virtual environment 04:03:47.580 |
using that version of Python. So, uv venv here. Then, as you can 04:03:53.820 |
see here, we need to activate that virtual environment which 04:03:57.420 |
I did miss from here. So, let me quickly add that. So, you 04:04:02.060 |
just run that. For me, I'm using Fish. So, I just add 04:04:05.740 |
Fish onto the end there, but if you're using Bash or ZSH, I 04:04:08.380 |
think you can you can just run that directly. And then, 04:04:11.100 |
finally, we need to sync, i.e. install all of our packages 04:04:16.700 |
using uv sync. And you see that will install everything for 04:04:20.940 |
you. Great. So, we have that and we can go ahead and actually 04:04:26.940 |
open Cursor or VS Code and then we should find ourselves 04:04:32.220 |
within Cursor or VS Code. So, in here, you'll find a few 04:04:37.740 |
things that we will need. So, first is environment variables. 04:04:42.780 |
So, we can come over to here and we have OpenAI API Key, 04:04:47.100 |
LangChain API Key, and SerpAPI API Key. Create a copy of 04:04:50.940 |
this and you'd make this your .env file or if you want to 04:04:56.780 |
run it with source, you can. I like to use a mac.env file 04:05:01.820 |
when I'm on Mac and I just add export onto the start there and 04:05:05.740 |
then enter my API keys. Now, I actually already have these in 04:05:10.140 |
this local.mac.env file which over in my terminal, I would 04:05:15.420 |
just activate with source again like that. Now, we'll need that 04:05:20.540 |
when we are running our API and application later but for now, 04:05:24.940 |
let's just focus on understanding what the API 04:05:28.380 |
actually looks like. So, navigating into the 09 Capstone 04:05:33.340 |
chapter, we'll find a few things. What we're going to 04:05:37.020 |
focus on is the API here and we have a couple of notebooks 04:05:41.260 |
that help us just understand, okay, what are we actually 04:05:44.780 |
doing here? So, let me give you a quick overview of the API 04:05:49.260 |
first. So, the API, we're using FastAPI for this. We have a 04:05:53.340 |
few functions in here. The one that we'll start with is this. 04:05:57.420 |
Okay. So, this is our post endpoint for invoke and this 04:06:01.900 |
essentially sends something to our LLM and begins a streaming 04:06:05.980 |
response. So, we can go ahead and actually start the API and 04:06:09.980 |
we can just see what this looks like. So, we'll go into 04:06:13.180 |
chapter 09 Capstone API after setting our environment 04:06:18.060 |
variables here, and we just want to do uv run uvicorn 04:06:23.260 |
main:app --reload. We don't need the reload flag, but if we're 04:06:26.620 |
modifying the code, that can be useful. Okay, and we can see 04:06:29.820 |
that our API is now running on localhost port 8000 and 04:06:37.340 |
if we go to our browser, we can actually open the docs for our 04:06:41.180 |
API. So, we go to 8000 slash docs. Okay, we just see that we 04:06:45.900 |
have that single invoke method. It extracts the content and it 04:06:51.420 |
gives us a small amount of information there. Now, we 04:06:54.780 |
could try it out here. So, if we say, say, hello, we can run 04:07:00.860 |
that and we'll see that we get a response. We get this. Okay. 04:07:08.140 |
Now, the thing that we're missing here is that this is 04:07:10.380 |
actually being streamed back to us. Okay. So, this is not a 04:07:15.340 |
just a direct response. This is a stream. To see that, we're 04:07:19.020 |
going to navigate over to here to this streaming testing 04:07:21.980 |
notebook and we'll run this. So, we are using requests here. 04:07:28.540 |
We are not just doing a, you know, the standard post request 04:07:32.940 |
because we want to stream the output and then print the 04:07:35.900 |
output as we are receiving them. Okay. So, that's why this 04:07:41.100 |
look, it's a little more complicated than just a typical 04:07:43.340 |
request request.get. So, what we're doing here is we're 04:07:49.340 |
starting our session which is our post request and then we're 04:07:53.580 |
just iterating through the content as we receive it from 04:07:57.340 |
that request. When we receive a token, right? Because sometimes 04:08:00.940 |
this might be none. We print that, okay, and we have 04:08:04.700 |
flush equals true, as we've used in the past. 04:08:08.780 |
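In other words, something roughly like this, where the URL and the way content is passed are assumptions based on the invoke endpoint we just looked at:

import requests

def stream_from_api(content: str):
    # stream the response rather than waiting for the full body
    with requests.post(
        "http://localhost:8000/invoke",   # assumed local API URL
        params={"content": content},      # assumed: content passed as a query parameter
        stream=True,
    ) as response:
        for chunk in response.iter_content(decode_unicode=True):
            if chunk:  # skip empty keep-alive chunks
                print(chunk, end="", flush=True)

So, let's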
define that and then let's just ask a simple question. What is 04:08:15.100 |
Okay, and we saw that it was pretty quick. So, it 04:08:19.440 |
generated this response first and then it went ahead and 04:08:23.680 |
actually continued streaming with all of this. Okay and we 04:08:29.120 |
can see that there are these special tokens are being 04:08:31.360 |
provided. This is to help the front end basically decide, 04:08:36.240 |
okay, what should go where? So, here where we're showing these 04:08:41.280 |
multiple steps of tool use and the parameters. The way the 04:08:46.160 |
front end is deciding how to display those is it's just it's 04:08:50.800 |
being provided the single stream, but it has these step 04:08:53.600 |
tokens. It has a step token, has a step name, then it has the 04:08:57.120 |
parameters, followed by the sort of end-of-step token, and 04:09:01.200 |
it's looking at each one of these and then the one step 04:09:04.960 |
name that it treats differently is where it will see the final 04:09:08.800 |
answer step name. When it sees the final step name rather than 04:09:11.840 |
displaying this tool use interface, it instead begins 04:09:15.680 |
streaming the tokens directly like a typical chat interface 04:09:20.320 |
and if we look at what we actually get in our final 04:09:23.120 |
answer, it's not just the answer itself, right? So, we 04:09:26.720 |
have the answer here. This is streamed into that typical chat 04:09:32.640 |
output but then we also have tools used and then this is 04:09:36.240 |
added into the little boxes that we have below the chat 04:09:40.800 |
here. So, there's quite a lot going on just within this 04:09:44.000 |
little stream. Now, we can try with some other questions here. 04:09:48.880 |
So, we can say, okay, tell me about the latest news in the 04:09:50.960 |
world. You can see that there's a little bit of a wait here 04:09:52.960 |
whilst it's waiting to get the response and then, yeah, 04:09:56.160 |
it's streaming a lot of stuff quite quickly, okay? So, there's 04:10:00.160 |
a lot coming through here, okay? And then we can ask other 04:10:03.840 |
questions like, okay, this one here, how cold is it in Oslo 04:10:06.880 |
right now? What is five multiplied by five, right? So, these two 04:10:10.800 |
are going to be executed in parallel and then it will after 04:10:14.800 |
it has the answers for those, the agent will use another 04:10:18.400 |
multiply tool to multiply those two values together and all of 04:10:21.920 |
that will get streamed, okay? And then, as we saw earlier, we 04:10:26.640 |
have the what is the current date and time in these places. 04:10:29.440 |
Same thing. So, three questions. There are three 04:10:32.560 |
questions here. What is the current date and time in Dubai? 04:10:34.640 |
What is the current date and time in Tokyo and what is the 04:10:36.720 |
current date and time in Berlin? Those three questions 04:10:40.880 |
get executed in parallel against the SerpAPI search tool and 04:10:45.200 |
then all answers get returned within that final answer, okay? 04:10:49.520 |
So, that is how our API is working. Now, let's dive a 04:10:55.360 |
little bit into the code and understand how it is working. 04:11:00.240 |
So, there are a lot of important things here. There's 04:11:03.280 |
some complexity but at the same time, we try to make this as 04:11:06.160 |
simple as possible as well. So, this is just fast API syntax 04:11:10.480 |
here with the app post invoke. So, just our invoke endpoint. 04:11:15.040 |
We consume some content which is a string and then if you 04:11:19.040 |
remember from the agent executor deep dive, which is 04:11:22.480 |
what we've implemented here or a modified version of that, we 04:11:27.520 |
have to initialize our async IO queue and our streamer which 04:11:32.160 |
is the queue callback handler which I believe is exactly the 04:11:35.520 |
same as what we defined in that earlier chapter. There's no 04:11:38.800 |
differences there. So, we define that and then we return 04:11:43.520 |
this streaming response object, right? Again, this is a fast 04:11:46.960 |
API thing. This is so that you are streaming a response. That 04:11:50.880 |
streaming response has a few attributes here which again are 04:11:55.040 |
fast API things or just generic API things. So, some headers 04:12:00.000 |
giving instructions to the API and then the media type here 04:12:03.440 |
which is text/event-stream. You can also use, I think, 04:12:07.360 |
text/plain possibly as well, but I believe the standard here would 04:12:12.000 |
be to use event stream and then the more important part for us 04:12:16.400 |
is this token generator, okay? So, what is this token 04:12:20.480 |
generator? Well, it is this function that we've defined up 04:12:24.080 |
here. Now, if you, again, if you remember that earlier 04:12:27.760 |
chapter, at the end of the chapter, we set up a for loop 04:12:33.280 |
where we're printing out different tokens in various 04:12:36.320 |
formats. So, we're kind of post processing them before 04:12:40.320 |
deciding how to display them. That's exactly what we're doing 04:12:43.520 |
here. So, in this block here, we're looping through every 04:12:50.400 |
token that we're receiving from our streamer. We're looping 04:12:54.720 |
through and we're just saying, okay, if this is the end of a 04:12:58.240 |
step, we're going to yield this end-of-step token, which we 04:13:02.640 |
saw here, okay? So, it's this end-of-step token there. 04:13:07.680 |
Otherwise, if this is a tool call, so again, we've got that 04:13:11.280 |
walrus operator here. So, what we're doing is saying, okay, 04:13:14.720 |
get the tool calls out from our current message. If there is 04:13:19.760 |
something there. So, if this is not none, we're going to execute 04:13:23.360 |
what is inside here and what is being executed inside here is 04:13:27.200 |
we're checking for the tool name. If we have the tool name, 04:13:30.160 |
we return this, okay? So, we have the start of step token, 04:13:35.040 |
the start of the step name token, the tool name or step 04:13:39.680 |
name, whichever those you want to call it, and then the end of 04:13:42.560 |
the step name token, okay? And then this, of course, comes 04:13:48.560 |
through to the front end like that, okay? That's what we have 04:13:52.320 |
there. Otherwise, we should only be seeing the tool name 04:13:55.680 |
returned as part of first token for every step. After that, it 04:13:59.520 |
should just be tool arguments. So, in this case, we say, okay, 04:14:03.440 |
if we have those tool or function arguments, we're going 04:14:06.480 |
to just return them directly. So, then that is the part that 04:14:09.840 |
would stream all of this here, okay? Like these would be 04:14:13.600 |
individual tokens, right? For example, right? So, we might 04:14:16.800 |
have the open curly brackets followed by query could be a 04:14:20.960 |
token, the latest could be a token, world could be a token, 04:14:24.640 |
news could be a token, etc. Okay? So, that is what is 04:14:28.160 |
happening there. This should not get executed, but we 04:14:32.720 |
just handle it just in case we have any issues 04:14:36.320 |
with tokens being returned there. We're just gonna print 04:14:39.040 |
this error and we're going to continue with the streaming, but 04:14:43.600 |
that should not really be happening. Cool. So, that is 04:14:47.120 |
our token streaming loop. 04:14:53.920 |
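Putting that together, the endpoint looks roughly like this; the sentinel strings, the step markers, and the agent_executor.invoke signature are assumptions based on what we saw in the stream above, and QueueCallbackHandler is assumed to be imported from the agent code:

import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def token_generator(content: str, streamer: "QueueCallbackHandler"):
    # kick off the agent executor as a background task
    task = asyncio.create_task(agent_executor.invoke(content, streamer, verbose=True))
    async for token in streamer:
        if token == "<<STEP_END>>":          # assumed sentinel from the callback handler
            yield "</step>"                  # illustrative end-of-step marker
        elif tool_calls := token.message.additional_kwargs.get("tool_calls"):
            if tool_name := tool_calls[0]["function"].get("name"):
                yield f"<step><step_name>{tool_name}</step_name>"  # illustrative markers
            if tool_args := tool_calls[0]["function"].get("arguments"):
                yield tool_args
    await task

@app.post("/invoke")
async def invoke(content: str):
    queue: asyncio.Queue = asyncio.Queue()
    streamer = QueueCallbackHandler(queue)
    # stream the generator's output back to the client as it is produced
    return StreamingResponse(
        token_generator(content, streamer),
        media_type="text/event-stream",
    )

Now, the way that we are picking up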
tokens from our stream object here is of course through our 04:14:57.840 |
agent execution logic which is happening in parallel, okay? So, 04:15:02.000 |
all of this is asynchronous. We have this async definition 04:15:04.720 |
here. So, all of this is happening asynchronously. So, 04:15:08.640 |
what has happened here is here, we have created a task which is 04:15:14.320 |
the agent executor invoke and we passing our content, we're 04:15:17.840 |
passing that streamer which we're gonna be pulling tokens 04:15:20.160 |
from, and we also set verbose to true. We can actually 04:15:24.160 |
remove that but that would just allow us to see additional 04:15:27.600 |
output in our terminal window if we want it. I don't think 04:15:32.640 |
there's anything particularly interesting to look at in there 04:15:36.400 |
but particularly if you are debugging that can be useful. 04:15:40.000 |
So, we create our task here but this does not begin the task. 04:15:45.440 |
Alright, this is a async IO create task but this does not 04:15:49.840 |
begin until we await it down here. So, what is happening 04:15:53.520 |
here is essentially this code here is still being run or in 04:15:58.880 |
like we're in an asynchronous loop here, but then we await 04:16:02.800 |
this task. As soon as we await this task, tokens will 04:16:06.320 |
start being placed within our queue which then get picked up 04:16:10.480 |
by the streamer object here. So, then this begins receiving 04:16:14.880 |
tokens. I know async is always a little bit more confusing 04:16:20.880 |
given the strange order of things but that is essentially 04:16:25.040 |
what is happening. You can imagine all this is essentially 04:16:27.680 |
being executed all at the same time. So, we have that. So, 04:16:32.800 |
anything else to go through here? I don't think so. It's 04:16:35.520 |
all sort of boilerplate stuff for FastAPI rather than the 04:16:39.040 |
actual AI code itself. So, we have that as our streaming 04:16:43.600 |
function. Now, let's have a look at the agent code itself. 04:16:48.720 |
Okay. So, agent code. Where would that be? So, we're using 04:16:52.400 |
this agent execute invoke and we're importing this from the 04:16:56.720 |
agent file. So, we can have a look in here for this. Now, you 04:17:01.840 |
can see straight away, we're pulling in our API keys here. 04:17:06.000 |
Just make sure that you do have those. Now, all of 04:17:10.000 |
this, okay? This is what we've seen before in that agent 04:17:14.800 |
executor deep dive chapter. This is all practically the 04:17:19.280 |
same. So, we have our LLM. We've set those configurable fields 04:17:25.280 |
as we did in the earlier chapters. That configurable 04:17:28.240 |
field is for our callbacks. We have our prompt. This has been 04:17:31.760 |
modified a little bit. So, essentially, just telling it, 04:17:36.080 |
okay, make sure you use the tools provided. We say you must 04:17:40.480 |
use the final answer tool to provide a final answer to the user, and 04:18:43.680 |
one thing that I added that I noticed every now and again. So, 04:17:47.360 |
I have explicitly said, use tools to answer the user's 04:17:50.400 |
current question, not previous questions. So, I found with 04:17:54.800 |
this setup, it will occasionally, if I just have a 04:17:58.720 |
little bit of small talk with the agent and beforehand I was 04:18:02.080 |
asking questions about, okay, like what was the weather in 04:18:04.720 |
this place or that place, the agent will kind of hang on to 04:18:08.000 |
those previous questions and try and use a tool again to 04:18:11.600 |
answer and that is just something that you can more or 04:18:14.240 |
less prompt out of it, okay? So, we have that. This is all 04:18:18.400 |
exactly the same as before, okay? So, we have our chat 04:18:21.200 |
history to make this conversational. We have our 04:18:23.920 |
human message and then our agent scratch pad so that our 04:18:27.040 |
agent can think through multiple tool use messages. 04:18:30.960 |
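As a rough sketch, and not the literal prompt from the course repo, that kind of prompt template looks something like this in LangChain, assuming the usual chat_history and agent_scratchpad placeholder names:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages([
    ("system", (
        "You are a helpful assistant. Use the tools provided to answer the "
        "user's CURRENT question, not previous questions. You MUST use the "
        "final_answer tool to provide your final answer to the user."
    )),
    MessagesPlaceholder(variable_name="chat_history"),     # conversational memory
    ("human", "{input}"),                                   # the user's new message
    MessagesPlaceholder(variable_name="agent_scratchpad"),  # tool calls + observations
])
```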
Great. So, we also have the article class. So, this is to 04:18:36.080 |
process results from SERP API. We have our SERP API function 04:18:42.160 |
here. I will talk about that a little more in a moment 04:18:45.040 |
because this is also a little bit different to what we 04:18:46.800 |
covered before. What we covered before with SERP API, if you 04:18:51.200 |
remember, was synchronous because we're using the SERP 04:18:55.040 |
API client directly or the SERP API tool directly from 04:18:59.840 |
LangChain and because we want everything to be asynchronous, 04:19:03.920 |
we have had to recreate that tool in an asynchronous fashion 04:19:09.600 |
which we'll talk about a little bit later. But for now, let's 04:19:13.360 |
move on from that. We can see our final answer being used 04:19:18.000 |
here. So, this is I think we define the exact same thing 04:19:21.920 |
before probably in that deep dive chapter again where we 04:19:25.040 |
have just the answer and the tools that have been used. 04:19:29.200 |
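For reference, a final answer tool along those lines can be sketched like this; the field names mirror what is described here, but treat it as an approximation rather than the exact course code:

```python
from langchain_core.tools import tool

@tool
async def final_answer(answer: str, tools_used: list[str]) -> dict:
    """Provide the final natural-language answer plus the list of tools used."""
    # there is no real work to do here: the structured arguments ARE the output
    return {"answer": answer, "tools_used": tools_used}
```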
Great. So, we have that. One thing that is a little 04:19:32.640 |
different here is when we are defining our name to tool 04:19:38.480 |
function. So, this takes a tool name and it maps it to a tool 04:19:43.680 |
function. When we have synchronous tools, we actually 04:19:48.800 |
use tool funk here. Okay. So, rather than tool coroutine, it 04:19:53.440 |
would be tool funk. However, we are using asynchronous tools 04:19:59.200 |
and so this is actually tool coroutine and this is why 04:20:04.960 |
if you come up here, I've made every single tool 04:20:08.320 |
asynchronous. Now, that is not really necessary for a tool 04:20:13.360 |
like final answer because there's no API calls 04:20:16.560 |
happening. An API call is a very typical scenario where 04:20:20.400 |
you do want to use async because if you make an API call 04:20:23.840 |
with a synchronous function, your code is just going to be 04:20:26.800 |
waiting for the response from the API while the API is 04:20:31.440 |
processing and doing whatever it's doing. So, that is an 04:20:36.080 |
ideal scenario where you would want to use async because 04:20:38.960 |
rather than your code just waiting for the response from 04:20:42.880 |
the API, it can instead go and do something else whilst it's 04:20:46.320 |
waiting, right? So, that's an ideal scenario where you'd use 04:20:49.360 |
async which is why we would use it for example with the 04:20:51.760 |
SERP API tool here but for final answer and for all of 04:20:56.320 |
these calculator tools that we've built, there's actually 04:21:00.720 |
no need to have these as async because our code is just 04:21:05.920 |
running through. It's executing this code. There's no waiting 04:21:09.280 |
involved. So, it doesn't necessarily make sense to have 04:21:12.080 |
these asynchronous. However, by making them asynchronous, it 04:21:16.160 |
means that I can do tool coroutine for all of them 04:21:19.440 |
rather than saying, oh, if this tool is synchronous, use 04:21:23.520 |
tool.func whereas if this one is async, use tool.coroutine. 04:21:28.000 |
So, it just simplifies the code for us a lot more but yeah, not 04:21:33.040 |
directly necessary but it does help us write cleaner code 04:21:36.800 |
here. This is also true later on because we actually have to 04:21:41.280 |
await our tool calls which we can see over here, right? So, 04:21:46.880 |
we have to await those tool calls. That would get messier 04:21:50.960 |
if we were using a mix of sync and async tools. 04:21:56.880 |
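Concretely, with every tool defined as async, the mapping might look roughly like this (a simplified sketch with a couple of toy tools rather than the full set from the app):

```python
from langchain_core.tools import tool

@tool
async def add(x: float, y: float) -> float:
    """Add two numbers together."""
    return x + y  # no I/O here, but async anyway so every tool exposes .coroutine

@tool
async def multiply(x: float, y: float) -> float:
    """Multiply two numbers together."""
    return x * y

tools = [add, multiply]  # plus final_answer, the SERP API tool, etc. in the real app

# tool name -> async callable; for sync tools this would have to be t.func instead
name2tool = {t.name: t.coroutine for t in tools}

# executing a tool call chosen by the LLM then looks like:
#   observation = await name2tool[tool_call["name"]](**tool_call["args"])
```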
So, we have that. We have our Q callback handler. This is 04:22:00.320 |
again, that's the same as before. So, I'm not going to go 04:22:03.520 |
through that. We covered that 04:22:06.080 |
in the earlier deep dive chapter. We have our execute 04:22:09.600 |
tool function here. Again, that is asynchronous. This just 04:22:13.120 |
helps us, you know, clean up code a little bit. This would, 04:22:16.640 |
I think in the deep dive chapter, we had this directly 04:22:20.000 |
place within our agent executor function and you can do that. 04:22:23.840 |
It's fine. It's just a bit cleaner to kind of pull this 04:22:26.880 |
out and we can also add more type annotations here which I 04:22:30.480 |
like. So, execute tool expects us to provide an AI message 04:22:34.400 |
which includes a tool call within it and it will return us 04:22:38.640 |
a tool message. 04:22:44.480 |
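A simplified version of that helper, with the type annotations described, might look as follows; the course version closes over the name2tool mapping rather than taking it as a parameter, so this is an approximation:

```python
from langchain_core.messages import AIMessage, ToolMessage

async def execute_tool(ai_msg: AIMessage, name2tool: dict) -> ToolMessage:
    """Run the single tool call carried by this AI message and wrap the result."""
    tool_call = ai_msg.tool_calls[0]
    # look up the async tool and call it with the LLM-provided arguments
    observation = await name2tool[tool_call["name"]](**tool_call["args"])
    return ToolMessage(
        content=str(observation),
        tool_call_id=tool_call["id"],  # ties the observation back to the request
    )
```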
Okay. The agent executor, this is all the same as before and we're actually not even using verbose here so 04:22:48.240 |
we could fully remove it but I will leave it. Of course, if 04:22:51.040 |
you would like to use that, you can just add a if verbose and 04:22:54.400 |
then log or print some stuff where you need it. Okay. So, 04:22:59.760 |
what do we have in here? We have our streaming function. So, 04:23:02.720 |
this is what actually calls our agent, right? So, we have a 04:23:08.800 |
query. This will call our agent just here and we could even 04:23:14.080 |
make this a little clearer. So, for example, this could be 04:23:17.200 |
configured agent because this is not the response. 04:23:22.320 |
This is a configured agent. So, I think this is maybe a little 04:23:25.360 |
clearer. So, we are configuring our agent with our callbacks, 04:23:29.520 |
okay? Which is just our streamer. Then we're iterating 04:23:32.880 |
through the tokens that are returned by our agent using astream 04:23:37.040 |
here. Okay? And as we are iterating through this because 04:23:41.920 |
we pass our streamer to the callbacks here, what that is 04:23:46.400 |
going to do is every single token that our agent returns is 04:23:52.320 |
gonna get processed through our queue callback handler here. 04:23:57.280 |
Okay? So, these on_llm_new_token and on_llm_end methods are going to get 04:24:03.360 |
executed and then all of those tokens you can see here are 04:24:07.360 |
passed to our queue. Okay? Then, we come up here and we 04:24:11.040 |
have this __aiter__. So, this __aiter__ method here is used 04:24:16.000 |
over in our API by that token generator 04:24:22.660 |
to pick up from the queue the tokens that have been put in 04:24:28.420 |
the queue by these other methods here. Okay? So, it's 04:24:32.260 |
putting tokens into the queue and pulling them out with this. 04:24:38.020 |
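Roughly, a queue-based callback handler like that can be sketched as below. It is heavily simplified, it only forwards plain token strings and uses a <<DONE>> sentinel, whereas the course handler also deals with tool-call chunks, but the shape is the same:

```python
import asyncio
from langchain_core.callbacks import AsyncCallbackHandler

class QueueCallbackHandler(AsyncCallbackHandler):
    """Push streamed LLM tokens onto an asyncio.Queue and iterate over them."""

    def __init__(self, queue: asyncio.Queue):
        self.queue = queue

    async def on_llm_new_token(self, token: str, **kwargs) -> None:
        # called for every streamed token: just park it on the queue
        if token:
            await self.queue.put(token)

    async def on_llm_end(self, response, **kwargs) -> None:
        # signal consumers that this LLM call has finished
        await self.queue.put("<<DONE>>")

    async def __aiter__(self):
        # lets `async for token in streamer` pull tokens back out of the queue
        while True:
            token = await self.queue.get()
            if token == "<<DONE>>":
                return
            yield token
```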
Okay? So, that is just happening in parallel as well as 04:24:41.460 |
this code is running here. Now, the reason that we extract the 04:24:45.380 |
tokens out here is that we want to pull out our tokens and we 04:24:49.460 |
append them all to our outputs. Now, those outputs that becomes 04:24:53.780 |
a list of AI messages which are essentially the AI telling us 04:24:58.660 |
what tool to use and what parameters to pass to each one 04:25:02.580 |
of those tools. This is very similar to what we covered in 04:25:06.180 |
that deep dive chapter but the one thing that I have modified 04:25:09.380 |
here is I've enabled us to use parallel tool calls. So, that 04:25:17.460 |
is what we see here with these four lines of code. We're 04:25:21.060 |
saying, okay, if our tool call includes an ID, that means we 04:25:24.660 |
have a new tool call or a new AI message. So, what we do is 04:25:29.940 |
we append that AI message which is the AI message chunk to our 04:25:35.060 |
outputs and then following that, if we don't get an ID, 04:25:38.180 |
that means we're getting the tool arguments. So, following 04:25:41.780 |
that, we're just adding our AI message chunk to the most 04:25:46.420 |
recent AI message chunk from our outputs. Okay, so what that 04:25:50.260 |
will do is it will create that list of AI messages. It'll be 04:25:56.500 |
like, you know, AI message one and then this will just append 04:26:01.780 |
everything to that AI message one. Then, we'll get our next 04:26:05.700 |
AI message chunk. This will then just append everything to 04:26:09.220 |
that until we get a complete AI message and so on and so on. 04:26:13.780 |
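In code, that accumulation step looks something like this, assuming the streamer yields AIMessageChunk objects; it is a sketch of the idea rather than the exact lines from the repo:

```python
from langchain_core.messages import AIMessageChunk

async def collect_tool_calls(streamer) -> list[AIMessageChunk]:
    """Accumulate streamed chunks into one complete entry per tool call."""
    outputs: list[AIMessageChunk] = []
    async for chunk in streamer:
        if chunk.tool_call_chunks and chunk.tool_call_chunks[0].get("id"):
            # a chunk carrying a tool_call id starts a new tool call / AI message
            outputs.append(chunk)
        elif outputs:
            # no id means argument fragments: merge into the most recent message
            outputs[-1] = outputs[-1] + chunk
    # AIMessageChunk supports `+`, so each entry ends up as one complete tool call
    return outputs
```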
Okay. So, what we do here is, we've collected all of 04:26:19.780 |
our AI message chunk objects. Then, finally, what we do is 04:26:23.460 |
just transform all those AI message chunk objects into 04:26:26.580 |
actual AI message objects and then return them from our 04:26:29.700 |
function which we then receive over here. So, into the tool 04:26:33.780 |
calls variable. Okay. Now, this is very similar to the deep 04:26:38.980 |
dive chapter. Again, we're going through that count, that 04:26:42.660 |
loop where we have a max iterations at which point we 04:26:45.300 |
will just stop but until then, we continue iterating through 04:26:50.660 |
and making more tool calls, executing those tool calls, and 04:26:53.700 |
so on. So, what is going on here? Let's see. So, we got our 04:26:58.580 |
tool calls. This is going to be a list of AI message objects. 04:27:02.660 |
Then, what we do with those AI message objects is we pass them 04:27:07.060 |
to this execute tool function. If you remember, what is that? 04:27:10.500 |
That is this function here. So, we pass each AI message 04:27:15.140 |
individually to this function and that will execute the tool 04:27:20.260 |
for us and then return us that observation from the tool. 04:27:25.620 |
Okay. So, that is what you see happening here but this is an 04:27:30.660 |
async method. So, typically, what you'd have to do is you'd 04:27:34.100 |
have to do await execute tool and we could do that. So, we 04:27:38.420 |
could do a, okay, let me make this a little bigger for us. 04:27:42.660 |
Okay. And so, what we could do, for example, which might be a 04:27:45.700 |
bit clearer is you could do tool obs equals an empty list 04:27:51.220 |
and what you could do is you can say for tool call, oops, in 04:27:56.180 |
tool calls, the tool observation is we're going to 04:28:00.980 |
append execute tool call which would have to be in a wait. So, 04:28:06.100 |
we'd actually put the await in there and what this would do is 04:28:09.460 |
actually the exact same thing as what we're doing here. The 04:28:12.740 |
difference being that we're doing this tool by tool. Okay. 04:28:17.540 |
So, we are, we're executing async here but we're doing them 04:28:22.340 |
sequentially whereas what we can do which is better is we 04:28:25.780 |
can use async gather. So, what this does is gathers all those 04:28:30.260 |
coroutines and then we await them all at the same time to 04:28:34.180 |
run them all asynchronously. They all begin at the same time 04:28:37.780 |
or almost exactly the same time and we get those responses 04:28:42.500 |
kind of in parallel but of course it's async so it's not 04:28:46.260 |
fully in parallel but practically in parallel. 04:28:50.260 |
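Side by side, the sequential version versus the gathered version looks roughly like this, assuming an execute_tool coroutine like the one sketched earlier (here taking just the AI message, as described for the course code):

```python
import asyncio

async def run_tools_sequentially(tool_calls):
    # still async under the hood, but each call waits for the previous one
    tool_obs = []
    for tool_call in tool_calls:
        tool_obs.append(await execute_tool(tool_call))
    return tool_obs

async def run_tools_concurrently(tool_calls):
    # schedule every coroutine, then await them together; slow network-bound
    # tools (like the SERP API call) overlap instead of queueing up
    return await asyncio.gather(*(execute_tool(tc) for tc in tool_calls))
```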
Cool. So, we have that and then, okay, we get all of our tool 04:28:54.900 |
observations from that. So, that's all of our tool messages 04:28:57.620 |
and then one interesting thing here is if we, 04:29:01.700 |
let's say we have all of our AI messages with all of our tool 04:29:04.980 |
calls and we just append all of those to our agent scratchpad. 04:29:09.460 |
Alright. So, let's say here we're just like, oh, okay, 04:29:11.860 |
agent scratchpad extend and then we would just have, okay, 04:29:17.700 |
we'd have our tool calls and then we do agent scratchpad 04:29:22.820 |
extend tool obs. Alright. So, what is happening here is this 04:29:27.780 |
would essentially give us something that looks like this. 04:29:33.700 |
So, we'd have our AI message, say, I'm just gonna put, okay, 04:29:38.660 |
we'll just put tool call IDs in here to simplify it a little 04:29:41.380 |
bit. This would be tool call ID A. Then, we would have AI 04:29:46.900 |
message, tool call ID B. Then, we'd have tool message. Let's 04:29:54.740 |
just remove this content field. I don't want that and tool 04:29:59.140 |
message, tool call ID B, right? So, it would look something 04:30:02.660 |
like this. So, the order is the tool message is not following 04:30:07.140 |
the AI message which you would think, okay, we have this tool 04:30:10.420 |
call ID. That's probably fine but actually, when we're 04:30:12.980 |
running this, if you add these two agents scratchpad in this 04:30:16.340 |
order, what you'll see is your response just hangs like 04:30:21.300 |
nothing. Nothing happens when you come through to your second 04:30:25.860 |
iteration of your agent call. So, actually, what you need to 04:30:29.620 |
do is these need to be sorted so that they are actually in 04:30:33.060 |
order and it doesn't necessarily matter 04:30:36.740 |
which order in terms of like A or B or C or whatever you use. 04:30:40.500 |
So, you could have this order. We have AI message, tool 04:30:43.460 |
message, AI message, tool message, just as long as you 04:30:46.180 |
have your tool call IDs are both together or you could, you 04:30:49.620 |
know, invert this for example, right? So, you could have this, 04:30:54.580 |
right? And that will work as well. It's essentially just as 04:30:58.180 |
long as you have your AI message followed by your tool 04:31:01.140 |
message and both of those are sharing that tool call ID. You 04:31:04.260 |
need to make sure you have that order, okay? So, that of course 04:31:09.140 |
would not happen if we do this and instead, what we need to do 04:31:13.700 |
is something like this, okay? So, if I make this a little 04:31:18.580 |
easier to read, okay? So, we're taking the tool call ID. We are 04:31:23.780 |
pointing it to the tool observation and we're doing 04:31:26.500 |
that for every tool call and tool observation within like a 04:31:29.860 |
zip of those, okay? Then, what we're saying is for each tool 04:31:35.060 |
call within our tool calls, we are extending our agent 04:31:38.820 |
scratchpad with that tool call followed by the tool 04:31:43.300 |
observation message which is the tool message. So, this would 04:31:46.420 |
be our, this is the AI message and that is the tool messages 04:31:51.860 |
down there, okay? So, that is always happening and that is 04:31:54.900 |
how we get this correct order which will run. Otherwise, 04:31:59.620 |
things will not run. So, that's important to be aware of, okay? 04:32:04.020 |
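Put as code, the interleaving described above is roughly the following; it is a sketch, and it assumes the tool observations carry the matching tool_call_id, as LangChain ToolMessage objects do:

```python
from langchain_core.messages import AIMessage, ToolMessage

def interleave(tool_calls: list[AIMessage], tool_obs: list[ToolMessage]) -> list:
    """Return [AIMessage, ToolMessage, AIMessage, ToolMessage, ...] pairs."""
    # map each tool_call id to the ToolMessage that answered it
    id2obs = {obs.tool_call_id: obs for obs in tool_obs}
    scratchpad_update = []
    for call in tool_calls:
        call_id = call.tool_calls[0]["id"]
        # every AI message must be immediately followed by its matching tool message
        scratchpad_update.extend([call, id2obs[call_id]])
    return scratchpad_update

# agent_scratchpad.extend(interleave(tool_calls, tool_obs))
```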
Now, we're almost done. I know we've just been 04:32:07.220 |
through quite a lot. So, we continue, we increment our 04:32:10.820 |
count as we were doing before and then we need to check for 04:32:13.300 |
the final answer tool, okay? And because we're running these 04:32:16.260 |
tools in parallel, okay? Because we're allowing multiple 04:32:19.460 |
tool calls in one step, we can't just look at the most 04:32:23.300 |
recent tool and look if it is, it has the name final answer. 04:32:26.260 |
Instead, we need to iterate through all of our tool calls 04:32:28.740 |
and check if any of them have the name final answer. If they 04:32:32.020 |
do, we say, okay, we extract that final answer call. We 04:32:35.620 |
extract the final answer as well. So, this is the direct 04:32:38.660 |
text content and we say, okay, we have found the final answer. 04:32:42.900 |
So, this will be set to true, okay? Which should happen 04:32:45.940 |
every time but let's say if our agent gets stuck in a loop of 04:32:50.660 |
calling multiple tools, this might not happen before we 04:32:55.300 |
break based on the max iterations here. So, we might 04:32:58.820 |
end up breaking based on max iterations rather than we found 04:33:02.340 |
a final answer, okay? So, that can happen. So, anyway, if we 04:33:07.460 |
find that final answer, we break out of this for loop here 04:33:11.220 |
and then, of course, we do need to break out of our while loop 04:33:14.420 |
which is here. So, we say, if we found the final answer, 04:33:17.380 |
break, okay? Cool. So, we have that. 04:33:24.100 |
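That check over the parallel tool calls can be sketched like this (again an approximation, not the exact course code):

```python
from langchain_core.messages import AIMessage

def find_final_answer(tool_calls: list[AIMessage]) -> dict | None:
    """Return the final_answer tool's arguments if any parallel call produced one."""
    for call in tool_calls:
        tool_call = call.tool_calls[0]
        if tool_call["name"] == "final_answer":
            # e.g. {"answer": "...", "tools_used": ["add", "serpapi"]}
            return tool_call["args"]
    return None

# inside the agent's while loop:
#   final_answer_call = find_final_answer(tool_calls)
#   if final_answer_call is not None:
#       break
```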
Finally, after all of that, this is how, you know, we've executed our tools, our 04:33:26.900 |
agent has gone through its steps and iterations, we've been through 04:33:32.980 |
those. Finally, we come down to here where we say, okay, we're 04:33:37.220 |
gonna add that final output to our chat history. So, this is 04:33:40.980 |
just going to be the text content, right? So, this here 04:33:45.140 |
gets the direct answer but then, what we do is we return the 04:33:50.180 |
full final answer call. The full final answer call is 04:33:52.740 |
basically this here, right? So, this answer and tools used but 04:33:57.220 |
of course, populated. So, we're saying here that if we have a 04:34:00.820 |
final answer, okay? If we have that, we're going to return the 04:34:05.620 |
final answer call which was generated by our LLM. 04:34:09.300 |
Otherwise, we're gonna return this one. So, this is in the 04:34:12.340 |
scenario that maybe the agent got caught in a loop and just 04:34:15.540 |
kept iterating. If that happens, we'll say it will come 04:34:19.220 |
back with, okay, no answer found and it will just return, 04:34:22.100 |
okay, we didn't use any tools which is not technically true 04:34:25.620 |
but this is like an exception handling event. So, 04:34:30.020 |
it ideally shouldn't happen but it's not really a big deal if 04:34:34.660 |
we're saying, okay, there were no tools used in my opinion 04:34:37.620 |
anyway. Cool. So, we have all of that and yeah, we just, we 04:34:44.340 |
initialize our agent executor and then, I mean, that is our 04:34:48.900 |
agent execution code. The one last thing we wanna go through 04:34:52.020 |
is the SERP API tool which we will do in a moment. Okay. So, 04:34:57.300 |
SERP API. Let's see what, let's see how we build our SERP API 04:35:04.260 |
tool. Okay, so, we'll start with the synchronous SERP API. 04:35:10.900 |
Now, the reason we're starting with this is that it's actually, 04:35:13.700 |
it's just a bit simpler. So, I'll show you this quickly 04:35:16.500 |
before we move on to the async implementation which is what 04:35:19.300 |
we're using within our app. So, we want to get our SERP API 04:35:23.700 |
API key. So, I'll run that and we just enter it at the top 04:35:28.260 |
there. And this will run. So, we're going to use the SERP 04:35:34.500 |
API SDK first. We're importing Google search and these are the 04:35:38.340 |
input parameters. So, we have our API key. We're using, we 04:35:38.340 |
say we want to use Google. Our question is the query, so 04:35:41.220 |
q for query. We're searching for the latest news in the 04:35:45.220 |
world and we'll return quite a lot of stuff. You can see 04:35:52.580 |
there's a ton of stuff in there, right? Now, what we want 04:35:58.900 |
is contained within this organic results key. So, we can 04:36:02.180 |
run that and we'll see, okay, it's talking about, you know, 04:36:06.500 |
various things. Pretty recent stuff at the moment. So, we can 04:36:10.340 |
tell, okay, that is, that is in fact working. Now, this is 04:36:14.340 |
quite messy. So, what I would like to do first is just clean 04:36:17.780 |
that up a little bit. So, we define this article base model 04:36:21.620 |
which is Pydantic and we're saying, okay, from a set of 04:36:25.780 |
results. Okay. So, we're going to iterate through each of 04:36:28.420 |
these. We're going to extract the title, source link, and the 04:36:33.620 |
snippet. So, you can see title, source, link, and snippet here. 04:36:42.340 |
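For reference, that Article model is along these lines; the field and method names approximate what is shown on screen, so treat this as a sketch:

```python
from pydantic import BaseModel

class Article(BaseModel):
    title: str
    source: str
    link: str
    snippet: str

    @classmethod
    def from_serpapi_result(cls, result: dict) -> "Article":
        # pick out just the fields we care about from one organic result
        return cls(
            title=result["title"],
            source=result["source"],
            link=result["link"],
            snippet=result["snippet"],
        )

# articles = [Article.from_serpapi_result(r) for r in results["organic_results"]]
```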
Okay. So, that's all useful. We'll run that and what we do 04:36:46.740 |
is we go through each of the results in organic results and 04:36:51.220 |
we just load them into our article using this class method 04:36:54.020 |
here and then we can see, okay, let's have a look at what those 04:36:58.740 |
look like. It's much nicer. Okay, we get this nicely 04:37:04.260 |
formatted object here. Cool. That's great. Now, all of this, 04:37:10.340 |
what we just did here. So, this is using SERP API's SDK which is 04:37:14.660 |
great. Super easy to use. The problem is that they don't 04:37:17.700 |
offer an async SDK which is a shame but it's not that hard 04:37:22.820 |
for us to set up ourselves. So, typically, with asynchronous 04:37:28.260 |
requests, what we can use is the aiohttp library. Well, 04:37:34.900 |
you can see what we're doing here. So, this is equivalent to 04:37:39.220 |
requests.get. Okay. That's essentially what we're doing 04:37:44.580 |
here and the equivalent is literally this. Okay. So, this 04:37:49.860 |
is the equivalent using requests that we are running 04:37:53.380 |
here but we're using async code. So, we're using an aiohttp 04:37:58.820 |
ClientSession and then session.get. Okay. With this 04:38:03.540 |
async with here and then we just await our response. So, 04:38:06.340 |
this is all, yeah, this is what we do rather than this to make 04:38:10.980 |
our code async. So, it's really simple and then the output that 04:38:14.980 |
we get is exactly the same, right? So, we still get this 04:38:17.860 |
exact same output. So, that means, of course, that we can 04:38:21.300 |
use that articles method like this in the exact same way and 04:38:26.660 |
we get the same result. There's no need to make this 04:38:30.420 |
article from SERP API results async because again, like this, 04:38:35.700 |
this bit of code here is fully local. It's just our Python 04:38:39.540 |
running everything. So, this does not need to be async, okay? 04:38:44.820 |
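For comparison, the two versions of the same GET request look roughly like this; the endpoint and parameter names follow SERP API's documented search API, but the exact parameters the course passes may differ slightly:

```python
import aiohttp
import requests

params = {"api_key": "YOUR_SERPAPI_KEY", "engine": "google", "q": "latest world news"}

# synchronous: the whole program blocks while SERP API does its work
def search_sync() -> dict:
    return requests.get("https://serpapi.com/search", params=params).json()

# asynchronous: while we await the response, the event loop can do other work
async def search_async() -> dict:
    async with aiohttp.ClientSession() as session:
        async with session.get("https://serpapi.com/search", params=params) as resp:
            return await resp.json()
```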
And we can see that we get literally the exact same result 04:38:48.580 |
there. So, with that, we have everything that we would need 04:38:52.420 |
to build a fully asynchronous SERP API tool which is exactly 04:38:56.340 |
what we do here for LangChain. So, we import those tools and I 04:39:00.580 |
mean, there's nothing, is there anything different here? No. 04:39:03.380 |
Alright, this is exactly what we just did but I will run 04:39:06.420 |
this because I would like to show you very quickly this. 04:39:11.220 |
Okay. So, this is how we were initially calling our tools in 04:39:15.860 |
previous chapters because we were okay mostly with using the 04:39:19.860 |
the synchronous tools. However, you can see that the func here 04:39:26.100 |
is just empty. Alright, so if I do type, it's just a NoneType. 04:39:30.660 |
That is because well, this is an async function, okay? It's an 04:39:37.220 |
async tool. Sorry. So, it was defined with async here and 04:39:41.860 |
what happens when you do that is you get this coroutine object. 04:39:47.460 |
So, rather than func which is it isn't here, you get that 04:39:52.260 |
coroutine. If we then modify this which would be kinda, okay, 04:39:57.300 |
let's just remove all the asyncs here and the await. If we 04:40:03.540 |
modify that like so and then we look at the SERP API 04:40:07.860 |
structured tool, we go across, we see that we now get that 04:40:12.020 |
func, okay? So, that is just the difference between a 04:40:15.940 |
sync structured tool versus an async structured tool. If we 04:40:19.620 |
change it back to async, okay, now we have the coroutine again. So, it's 04:40:26.660 |
important to be aware of that and of course, we run using 04:40:33.300 |
the SERP API coroutine. So, that is how we build the 04:40:38.660 |
SERP API tool and there's nothing. I mean, that is 04:40:42.740 |
exactly what we did here. So, I don't need to, I don't think we 04:40:45.380 |
need to go through that any further. So, yeah, I think that 04:40:49.780 |
is basically all of our code behind this API. With all of 04:40:54.340 |
that, we can then go ahead. So, we have our API running 04:40:57.780 |
already. Let's go ahead and actually run also our front 04:41:02.340 |
end. So, we're gonna go to Documents, Aurelio, LangChain 04:41:06.340 |
course and then we want to go to chapters zero nine capstone 04:41:12.100 |
app and you will need to have NPM installed. So, to do that, 04:41:16.420 |
what do we do? We can take a look at this answer for 04:41:19.460 |
example. This is probably what I would recommend, okay? So, I 04:41:23.060 |
would run brew install node followed by brew install npm 04:41:26.900 |
if you're on Mac. Of course, it's different if you're on 04:41:28.740 |
Linux or Windows. Once you have those, you can do npm install 04:41:33.060 |
and this will just install all of 04:41:37.460 |
the node packages that we 04:41:41.780 |
need and then we can just run npm run dev, okay? And now, we 04:41:48.260 |
have our app running on localhost 3000. So, we can come over to 04:41:52.820 |
here, open that up and we have our application. You can 04:41:57.140 |
ignore this. So, in here, we can begin just asking 04:42:00.500 |
questions, okay? So, we can start with a quick question. 04:42:07.380 |
So, we have our streaming happening here. It said the 04:42:12.200 |
agent wants to use the add tool and these are the input 04:42:14.760 |
parameters to the add tool and then we get the streamed 04:42:17.880 |
response. So, this is the final answer tool where we're 04:42:21.800 |
outputting that answer key and value and then here, we're 04:42:25.240 |
outputting that tool used key and value which is just an 04:42:29.000 |
array of the tools being used, which is just the add tool. So, 04:42:32.840 |
we have that. Then, let's ask another question. This time, 04:42:36.520 |
we'll trigger SERP API with tell me about the latest news 04:42:39.880 |
in the world. Okay. So, we can see that's using SERP API and 04:42:46.040 |
the query is latest world news and then it comes down here 04:42:51.560 |
and we actually get some citations here which is kind of 04:42:53.800 |
cool. So, you can also come through to here, okay? And it 04:42:58.040 |
takes us through to here. So, that's pretty cool. 04:43:01.080 |
Unfortunately, I just lost my chat. So, fine. Let me ask that again. 04:43:10.040 |
Okay. We can see that the tool used was SERP API there. Now, let's 04:43:19.360 |
continue with the next question from our notebook which is how 04:43:23.840 |
cold is it right now? What is five multiplied by five and 04:43:27.440 |
what do you get when multiplying those two numbers 04:43:29.760 |
together? I'm just gonna modify that to say in Celsius so that 04:43:35.760 |
I can understand. Thank you. Okay. So, for this one, we can 04:43:38.640 |
see what did we get? So, we got current temperature in Oslo. We 04:43:42.800 |
got multiply five by five which is our second question and then 04:43:47.200 |
we also got subtract. Interesting, at first I didn't know 04:43:52.320 |
why it did that. It's kind of weird. So, it decided to use... 04:43:56.880 |
oh, okay. So, then here it was. Okay, that 04:44:03.520 |
kind of makes sense. Does that make sense? Roughly. Okay. So, 04:44:07.440 |
I think the conversion from Fahrenheit to Celsius is, say, like 04:44:12.080 |
subtract thirty-two. Okay. Yes. So, to go from Fahrenheit to 04:44:18.000 |
Celsius, you are basically doing Fahrenheit minus 04:44:22.720 |
thirty-two and then you're multiplying by this number 04:44:24.880 |
here, which I assume the AI only did roughly. Okay. 04:44:30.960 |
So, subtracting thirty-two from thirty-six would have given us 04:44:33.520 |
four and it gave us approximately two. So, if you 04:44:36.800 |
think, okay, multiplying by this is practically multiplying by 04:44:40.400 |
0.5. So, halving the value and that would give us roughly two 04:44:45.120 |
degrees. So, that's what this was doing here. Kind of interesting. 04:44:48.560 |
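To sanity-check that, the exact conversion is C = (F - 32) × 5/9, so for a reading of roughly 36°F:

```python
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32) * 5 / 9

print(round(fahrenheit_to_celsius(36), 1))  # ~2.2 °C, close to what the agent produced
```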
Okay, cool. So, we've gone through. We have 04:44:53.520 |
seen how to build a fully fledged chat application using 04:44:59.280 |
what we've learned throughout the course and we've built 04:45:02.400 |
quite a lot. If you think about this application, you're 04:45:06.160 |
getting the real-time updates on what tools are being used, 04:45:10.160 |
the parameters being input to those tools, and then that is 04:45:12.640 |
all being returned in a streamed output and even in a 04:45:17.440 |
structured output for your final answer including the 04:45:19.760 |
answer and the tools that we use. So, of course, you know 04:45:23.920 |
what we built here is fairly limited but it's super easy to 04:45:27.920 |
extend this. Like, maybe something that you might 04:45:31.360 |
want to go and do is take what we've built here and like fork 04:45:35.360 |
this application and just go and add different tools to it 04:45:38.160 |
and see what happens because this is very extensible. You 04:45:42.000 |
can do a lot with it but yeah, that is the end of the course. 04:45:46.400 |
Of course, this is just the beginning of whatever it is 04:45:50.800 |
you're wanting to learn or build with AI. Treat this as 04:45:55.200 |
the beginning and just go out and find all the other cool 04:45:59.040 |
interesting stuff that you can go and build. So, I hope this 04:46:03.120 |
course has been useful, informative, and gives you an 04:46:08.960 |
advantage in whatever it is you're going out to build. So, 04:46:12.800 |
thank you very much for watching and taking the course 04:46:15.680 |
and sticking through right to the end. I know it's pretty 04:46:18.720 |
long so I appreciate it a lot and I hope you get a lot out of