
Prompt Templating and Techniques in LangChain


Chapters

0:00 Prompts are Fundamental to LLMs
2:13 Building Good LLM Prompts
7:13 LangChain Prompts Code Setup
11:36 Using our LLM with Templates
16:54 Few-shot Prompting
23:11 Chain of Thought Prompting

Transcript

Now we're going to move on to the chapter on prompts in LangChain. Prompts seem like a simple concept, and they are, but there's actually quite a lot to them once you start diving in, and they truly have been a fundamental part of what has propelled us forward from pre-LLM times to the current LLM era.

You have to remember that until LLMs became widespread, the way to fine-tune an AI or ML model was to gather loads of data for your particular use case and spend a lot of time training your specific transformer, or part of the transformer, to essentially adapt it for that particular task.

That could take a long time. Depending on the task, it could take months, or for a simpler task perhaps days or weeks. The interesting thing with LLMs is that rather than going through this whole fine-tuning process to adapt a model from one task to another, we just prompt it differently. We literally tell the model, "hey, I want you to do this in this particular way", and that is a paradigm shift: it's so much faster, taking a couple of minutes rather than days, weeks, or months, and LLMs are incredibly powerful when it comes to generalizing across many different tasks.

So prompts, which carry those instructions, are a fundamental part of that. LangChain naturally has many functionalities around prompts, and we can build very dynamic prompting pipelines that modify the structure and content of what we're feeding into our LLM depending on different variables and inputs. We'll see that in this chapter, where we work through prompting within the scope of a RAG example.

So let's start by dissecting the various parts of a prompt that we might expect to see for a use case like RAG. A typical prompt for RAG, or retrieval augmented generation, will include rules for the LLM, and this is something you will see in most prompts, if not all.

This part of the prompt sets up the behavior of the LLM: how it should respond to user queries, what sort of personality it should take on, what it should focus on when responding, and any particular rules or boundaries we want to set. Really, what we're trying to do here is provide as much information as possible to the LLM about what we're doing.

We just want to give the LLM context as to the place it finds itself in, because the LLM has no idea where it is: it simply takes in some information and spits out information. If the only information it receives is the user's query, it doesn't know the context: what application it is within, what its objective is, what its aim is, what the boundaries are.

We need to assume the LLM has absolutely no idea about any of this, because it truly does not. So provide as much context as you can, but it's important not to overdo it. We see this all the time: people will over-prompt an LLM. You want to be concise, you don't want fluff, and in general, for every single part of your prompt, the more concise and less fluffy you can make it the better.

Those rules or instructions typically go in the system prompt of your LLM. The second component is context, which is RAG-specific. The context refers to some sort of external information that you are feeding into your LLM. We may have received this information from a web search, a database query, or, quite often in the case of RAG, a vector database.

That external information we provide is essentially the "RA" of RAG: retrieval augmentation. We are augmenting the knowledge of our LLM, which is contained within the model weights, with some external knowledge. That's what we're doing here.

For chat LLMs this context is typically placed within the conversation, in the user or assistant messages, and with more recent models it can also be placed within tool messages. Then we have the question. This is pretty straightforward: it's the query from the user, and it's usually a user message, of course.

There might be some additional formatting around this. You might add a little extra context, or some additional instructions if you find that your LLM sometimes veers off the rules you've set within the system prompt. You might append or prefix something here, but for the most part it's probably just going to be the user's input.

So those are all the inputs to our prompt. Finally, there is the output we get: the answer from the assistant. That's not even specific to RAG; it's just what you would expect from any chat LLM, and of course it would be an assistant message.

Putting all of that together into an actual prompt, you can see everything we have here. We have the rules for our prompt, the instructions: answer the question based on the context below; if you cannot answer the question using the provided information, answer with "I don't know".

Then we have some context. In this scenario, because it's the first message, we might put that context into the system prompt. But that may also be turned around: if you have an agent, for example, you might have your question before the context.

In that case the question would come from a user message, and the context would follow the question as a tool message; it would be fed in that way as well. It depends on what sort of structure you're going for, but you can do either.

You can feed the context into the system message if the use case is less conversational, whereas if it's more conversational we might feed it in as a tool message. Then we have the user query, and finally the AI answer, which is what gets generated.
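Laid out as code, the pieces described above fit together roughly like this; the wording is an approximation of the template shown on screen, not a verbatim copy:

```python
# Rough layout of a RAG prompt (wording is approximate, not the exact template
# used later in the notebook).

# 1. Rules / instructions plus the retrieved context -> system message.
system_prompt = """Answer the user's query based on the context below.
If you cannot answer the question using the provided information, answer
with "I don't know".

Context: {context}"""

# 2. The question from the user -> user message.
user_prompt = "{query}"

# 3. The answer -> an assistant (AI) message, generated by the LLM.
```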

Okay, so let's switch across to the code. We're in the LangChain course repo, notebook 03 on prompts, and I'm just going to open this in Colab. Let's scroll down and start by installing the prerequisites; we just have the various libraries again. As I mentioned before, LangSmith is optional.

You don't need to install it, but if you would like to see your traces and everything in LangSmith then I would recommend doing so. If you are using LangSmith you will need to enter your API key here; if you're not, you don't need to enter anything.

You just skip that cell. Okay, cool. Let's jump into basic prompting then. We're going to start with this prompt: answer the user's query based on the context below. We're just structuring in code what we just saw, and we're going to use the ChatPromptTemplate, because generally speaking we're using chat LLMs in most cases nowadays.

So we have our ChatPromptTemplate, and it contains a list of messages: a system message to begin with, which holds the instructions and has the context fed in within it, and then our user query. We'll run this, and if we take a look, we haven't specified what our input variables are.

But we can see that we have query and we have context up here. These are the input variables; we just haven't explicitly defined them. So let's confirm that LangChain did pick those up, and we can see that it did.

It has context and query as the input variables for the prompt template we just defined. We can also see the structure of our templates. If we have a look, within messages we have a SystemMessagePromptTemplate. The way we defined all of this, you can see, is with from_messages, and this will consume various different structures.

You can see that from_messages takes a sequence of message-like representations. So we could pass in a SystemMessagePromptTemplate object followed by a HumanMessagePromptTemplate object, or we can just use a tuple like this, which says "this is a system message, this is a user message". You could also define assistant or tool messages here using the same structure.
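As a rough sketch of what those cells do (the prompt wording is approximate, and import paths can differ slightly between LangChain versions):

```python
from langchain.prompts import ChatPromptTemplate

# Tuple form: each entry is (role, template string); LangChain converts these
# into the corresponding message prompt template objects for us.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", (
        "Answer the user's query based on the context below. If you cannot "
        "answer the question using the provided information, answer with "
        '"I don\'t know".\n\nContext: {context}'
    )),
    ("user", "{query}"),
])

# The input variables are inferred from the {placeholders} in the templates.
print(prompt_template.input_variables)  # e.g. ['context', 'query']
```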

Then we can look in here, and of course that is being translated into the SystemMessagePromptTemplate and HumanMessagePromptTemplate. We have our input variables in there, and then we have the template too. Now let's continue, and we'll see what I just meant.

So we're importing SystemMessagePromptTemplate and HumanMessagePromptTemplate, and you can see we're using the same from_messages method. It still takes a sequence of message-like representations; it's just that what "message-like" actually means can vary. Here we have SystemMessagePromptTemplate.from_template with the prompt and HumanMessagePromptTemplate.from_template with the query. There are various ways you might want to do this; it just depends on how explicit you want to be.

Generally speaking, I would prefer to stick with the objects themselves and be explicit, but it is definitely a little harder to parse when you're reading it, so I understand why you might also prefer the tuple form: it's cleaner and does look simpler. It just depends on preference.
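The more explicit, object-based equivalent looks roughly like this, again as a sketch rather than the notebook's exact code:

```python
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)

system_prompt = (
    "Answer the user's query based on the context below. If you cannot "
    "answer the question using the provided information, answer with "
    '"I don\'t know".\n\nContext: {context}'
)

# Explicit form: build each message prompt template object ourselves, then pass
# the sequence to from_messages; this is equivalent to the tuple form above.
prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(system_prompt),
    HumanMessagePromptTemplate.from_template("{query}"),
])
```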

You can see again that this is exactly the same: a ChatPromptTemplate containing the same two messages. You'll probably want to see the exact output, so we print it as messages, and it is exactly the same as what was output before. Cool, so we have all that; now let's see how we would invoke our LLM with these templates. We're going to use gpt-4o-mini again, and we do need our OpenAI API key, so enter that and we'll initialize our LLM. We're going with a low temperature here, so less randomness, or less creativity, and in many cases this is actually what I would be doing.

The reason we're going with low temperature in this scenario is that we're doing RAG. If you remember, scrolling up a little, our template says: answer the user's query based on the context below; if you cannot answer the question using the provided information, answer with "I don't know". Just from reading that, we know we want our LLM to be as truthful and accurate as possible. A more creative LLM is going to struggle with that and is more likely to hallucinate, whereas a low-creativity, low-temperature LLM will probably stick to the rules a little better. Again, it depends on your use case: for creative writing you might go with a higher temperature, but for things like RAG, where the output should be accurate and truthful, I think it's important to keep temperature low. I talk about that a little here: a lower temperature of 0 makes the LLM's output more deterministic, which in theory should lead to less hallucination.

We're going to use LCEL here; for those of you that have used LangChain in the past, this is equivalent to an LLMChain object. Our prompt template is piped into our LLM, and from that we have a pipeline. Now let's see how we would use that pipeline.

We'll create some context here. This is just some context around Aurelio AI, mentioning that we've built semantic routers and semantic chunkers, that there's an AI platform and development services, and, the bit we specifically ask about later in the example, that we are LangChain Experts. Most LLMs will not have been trained on the recent internet, and this news came in September 2024, which is relatively recent, so a lot of LLMs out of the box wouldn't be expected to know it. That makes it a good little piece of information to ask about.

So we invoke: we have our query asking what Aurelio AI does, and we have that context, and we're feeding both into the pipeline we defined. When we invoke it, it automatically takes query and context and feeds them into our prompt template. If we want to, we can also be a little more explicit. You will probably see me doing this throughout the course, because I do like to be explicit with everything, to be honest. This does the same thing: all I'm doing is saying, take the query key from the input dictionary, and also take the context key from that input dictionary. The reason we might want to write it this way is mainly clarity: it states explicitly what the inputs are, because otherwise they only appear within our original prompt up here, which is not super clear. So I think it's usually a good idea to be more explicit with these things. And of course, if you decide to modify things down the line, say you rename the input key, you can still feed in the same input here; you're just mapping between different keys. Or if you'd like to modify the value on the way in, for example lowercase it, you can do that too. So I'll just redefine that and we'll invoke again, and we see that this is the exact same thing.
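Putting that together, a minimal sketch of the pipeline might look like the following. The context passage is a paraphrase and the query wording is an assumption based on the description above; prompt_template is the ChatPromptTemplate from the earlier sketch:

```python
import os
from getpass import getpass

from langchain_openai import ChatOpenAI

# prompt_template is the ChatPromptTemplate defined in the earlier sketch.

# OpenAI API key (skipped if it is already set in the environment).
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass("OpenAI API key: ")

# temperature=0.0 -> less randomness, which suits RAG: we want answers grounded
# in the provided context rather than "creative" ones.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

# LCEL: piping the prompt template into the LLM gives us a runnable pipeline,
# the equivalent of the older LLMChain object.
pipeline = prompt_template | llm

# Paraphrased context; the notebook uses a longer passage about Aurelio AI.
context = (
    "Aurelio AI is an AI company building open source frameworks such as "
    "semantic routers and semantic chunkers, alongside an AI platform and "
    "development services. In September 2024 the company became LangChain Experts."
)

# The dict keys map onto the prompt template's input variables.
result = pipeline.invoke({"query": "What does Aurelio AI do?", "context": context})
print(result.content)

# The more explicit version of the same pipeline: state the expected input keys
# up front and map them through lambdas before they reach the prompt template.
pipeline = (
    {
        "query": lambda x: x["query"],
        "context": lambda x: x["context"],
    }
    | prompt_template
    | llm
)
result = pipeline.invoke({"query": "What does Aurelio AI do?", "context": context})
```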
So the response is about Aurelio AI; this is an AIMessage generated by the LLM. It mentions expertise in building AI agents, several open-source frameworks, and the Aurelio AI platform, so it covers everything there other than the "LangChain Experts" part, which it didn't mention. We'll test that later on.

Okay, so on to few-shot prompting. This is a specific prompting technique. Many state-of-the-art LLMs are very good at instruction following, so you'll find that few-shot prompting is less common now than it used to be, at least for the bigger, more state-of-the-art models. But when you start using smaller models, say an open-source model like Llama 3 or Llama 2, which is much smaller, you will probably need to consider things like few-shot prompting. That being said, with OpenAI models, at least the current ones, it's not so important; nonetheless it can be useful.

The idea behind few-shot prompting is that you provide a few examples to your LLM of how it should behave before you get into the main part of the conversation. Let's see how that would look. We create an example prompt with a human part and an AI part: a human input and an AI response. We're basically saying, with this type of input you should provide this type of output. Then we provide some examples; here it's just "here is query one" and "here is answer one", which is only to show you how it works, not what we'd actually feed into our LLM. We then feed both the examples and the example prompt into LangChain's FewShotChatMessagePromptTemplate, and as you'll see from the output, it formats and structures everything for us.

How you use this depends on the situation. Say you detect that the user is talking about a particular topic, either via a keyword match or a semantic similarity match, and you would like to guide your LLM to talk about that topic in a particular way. Based on that, you might modify the examples you feed into your FewShotChatMessagePromptTemplate: one set of examples for topic A, another set for topic B. The example prompt stays the same the whole time; you're just modifying the examples so they're more relevant to whatever your user is actually talking about. That can be useful.
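A minimal sketch of that mechanism, with purely illustrative example content and placeholder variable names ({input}/{output}), might look like this:

```python
from langchain.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

# The example prompt defines the shape of a single example: a human input
# followed by the AI response we would like to see for that kind of input.
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

# Illustrative examples only; the notebook's real examples spell out the full
# markdown structure (main header, summary, bullet points, a bold "To conclude").
examples = [
    {"input": "Here is query #1", "output": "Here is the answer #1"},
    {"input": "Here is query #2", "output": "Here is the answer #2"},
]

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

# The few-shot block slots into the middle of the full chat prompt, between the
# system message and the real user query.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", (
        "Answer the user's query based on the context below. Always answer in "
        "markdown format.\n\nContext: {context}"
    )),
    few_shot_prompt,
    ("user", "{query}"),
])
```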
Now let's see an example of that. If we were using a tiny LLM, its ability to follow instructions would be limited, although I think we're probably fine here. We're going to say: answer the user's query based on the context below, and always answer in markdown format, being very specific in the system prompt. We've also said that when doing so it should provide headers, short summaries, follow with bullet points, and then conclude. Running that, we already get a decent overview, and it's actually quite good. But if we come down here, what I specifically want is for it to always follow this structure: a double header for the topic, a summary, a header, a couple of bullet points, and then always this pattern where "To conclude" is in bold. I want to be very specific about what I want, and to be fully honest, with gpt-4o-mini you can actually just prompt most of this in, but for the sake of the example we're going to provide few-shot examples instead.

So we provide one example here and a second example here, and you can see we're just following that same pattern; we're setting up the pattern that the LLM should use. We have our main header, a little summary, some subheaders with bullet points, and "To conclude" at the end, and the same with the second example. Let's see what we got: this is the structure of our new few-shot prompt template, and you can see what it all looks like. Coming down, we're basically going to insert that directly into our chat prompt template, so we have from_messages with the system prompt, the few-shot template, and then the user prompt. Let me show you very quickly: we just have this FewShotChatMessagePromptTemplate, which is fed into the middle here, and then we feed all of this back into our pipeline. This modifies the structure so that we get that bold "To conclude" at the end, and we can see that nicely here: we get exactly the structure we wanted.

Again, with GPT-4o models and many other OpenAI models you don't really need to do this, but you will see it in other examples. We do have an example where we're using Llama, I think Llama 2 if I'm not wrong, and you can see that adding this few-shot prompt template is a very good way of getting those smaller, less capable models to follow your instructions. So this is really useful when you're working with those smaller LLMs, but even for state-of-the-art models like GPT-4o, if you find that you're struggling with the prompting and the model is just not quite following exactly what you want it to do, this is a very good technique for getting it to follow a very strict structure or behavior.

Moving on, we have chain-of-thought prompting. This is a more common prompting technique that encourages the LLM to think through its reasoning, or its thoughts, step by step: a chain of thoughts. The idea behind this is that in math class, when you're a kid, the teachers would always push you to write down your working. There are multiple reasons for that, and one of them is to get you to think: they know that in a lot of cases you're a kid, you're in a rush, you don't really care about the test, and they're trying to get you to slow down a little and actually put down your reasoning. That forces you to notice, "actually, I'm skipping a step in my head because I'm trying to do everything up here; if I write it down, I realize I need to do that slightly differently." You realize you're probably rushing a little. Now, I'm not saying an LLM is rushing, but it's a similar effect: by writing everything down, LLMs tend to get things right more frequently. And at the same time, similar to a teacher reviewing your exam work, by having the LLM write down its reasoning you as a human or engineer can see where the LLM went wrong, if it did go wrong, which can be very useful when you're trying to diagnose problems.
With chain of thought we should see fewer hallucinations and generally better performance. Now, to implement chain of thought in LangChain, there are no specific LangChain objects for it; it's just prompting. So let's go down and see how we might do that. Our first system prompt is: "Be a helpful assistant and answer the user's question. You MUST answer the question directly without any other text or explanation." That's our no-chain-of-thought system prompt. I will just note here, especially with OpenAI, and again this is one of those things you'll see more with the smaller models, that most LLMs are actually trained to use chain-of-thought prompting by default. So we're specifically telling it here to answer the question directly without any other text or explanation; we're essentially reverse-prompting it to not use chain of thought, because otherwise by default it will try to, since it has been trained to. That's how relevant chain of thought is.

Our question is: how many keystrokes are needed to type the numbers from 1 to 500? We set up our pipeline, invoke our query, and see what we get: the total number of keystrokes needed to type the numbers from 1 to 500 is 1,511. The actual answer, as I've written here, is 1,392, so without chain of thought it is hallucinating.

Now let's see what chain-of-thought prompting does. The system prompt becomes: "Be a helpful assistant and answer the user's question. To answer the question, you must list systematically and in precise detail all sub-problems that need to be solved to answer the question, solve each sub-problem INDIVIDUALLY" (you have to shout at the LLM sometimes to get it to listen) "and in sequence, then finally use everything you have worked through to provide the final answer." So we're forcing it to go through the full problem. We can remove the stray context variable here, since we don't need it, and run that again. You can see straight away that it takes a lot longer to generate the output, because it's generating so many more tokens; that's one drawback of this approach. But let's see what we get.
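A sketch of the two pipelines side by side; the prompt wording is an approximation of what's read out above, and the last line checks the ground-truth answer directly:

```python
from langchain.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

# No-chain-of-thought prompt: explicitly forbid any working out.
no_cot_system = (
    "Be a helpful assistant and answer the user's question.\n\n"
    "You MUST answer the question directly without any other text or explanation."
)

# Chain-of-thought prompt: force the model to list and solve sub-problems first.
cot_system = (
    "Be a helpful assistant and answer the user's question.\n\n"
    "To answer the question, you must:\n"
    "- List systematically and in precise detail all sub-problems that need to "
    "be solved to answer the question.\n"
    "- Solve each sub-problem INDIVIDUALLY and in sequence.\n"
    "- Finally, use everything you have worked through to provide the final answer."
)

query = "How many keystrokes are needed to type the numbers from 1 to 500?"

def run(system_prompt: str) -> str:
    prompt = ChatPromptTemplate.from_messages([
        ("system", system_prompt),
        ("user", "{query}"),
    ])
    return (prompt | llm).invoke({"query": query}).content

print(run(no_cot_system))  # tends to guess, e.g. 1511 in the video
print(run(cot_system))     # works through the sub-problems and lands on 1392

# Ground truth: 9 one-digit + 90 two-digit + 400 three-digit numbers, plus "500".
print(sum(len(str(n)) for n in range(1, 501)))  # 1392
```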

To determine how many keystrokes are needed, the model breaks the problem down into several sub-problems: count the number of digits from 1 to 9, from 10 to 99, and so on, and count the digits in the number 500. Interesting, so that's how it breaks it up. It then counts the digits from the previous steps, and going through the totals we see 9 digits for the first group, 180 for the next, 1,200 for the next, and of course 3 for 500. It sums all of those digits and actually comes to the right answer.

That is the difference between chain of thought and no chain of thought: without it we just get a wrong answer, basically a guess; with chain of thought we get the right answer, just by the LLM writing down its reasoning and breaking the problem down into multiple parts. I found it super interesting that it does that; it's pretty cool.

Now, as we mentioned before, most LLMs nowadays are actually trained to use chain-of-thought prompting by default, so let's see what happens if we don't mention anything: just "be a helpful assistant and answer the user's question". We're not telling it not to think through its reasoning, and we're not telling it to think through its reasoning. You can see that it actually does the exact same sort of reasoning. It doesn't give us the sub-problems at the start, but it does go through and break everything apart, which is quite interesting, and we get the same correct answer. The formatting here is slightly different, probably a little cleaner actually, although we get a lot more information with the explicit chain-of-thought prompt; both are fine, and in this scenario we do get the right answer as well. So you can see that chain-of-thought prompting has quite literally been trained into the model, and you'll see that with most, I think all, state-of-the-art LLMs.

Cool, so that is our chapter on prompting. We focused very much on the fundamentals of prompting and tied that back to the actual objects and methods within LangChain. For now, that's it for prompting, and we'll move on to the next chapter.