
Fine-tuning OpenAI's GPT-3.5 for LangChain Agents


Chapters

0:00 Fine-tuning GPT-3.5 Turbo
1:44 Downloading the Training Data
2:57 Why Fine-tune an Agent
4:19 Training Data Format
6:10 Running OpenAI Fine-Tuning
9:04 Using Fine-Tuned GPT-3.5 in LangChain
11:01 Chatting with the Fine-Tuned Agent

Transcript

Today we're going to take a look at fine-tuning GPT-3.5. This is a very new feature released by OpenAI that gives us the ability to fine-tune our own custom GPT-3.5 models. We can feed it a ton of our own conversations, the ideal conversations we would like our chatbot or conversational agent to have, and with those OpenAI will fine-tune a GPT-3.5 model and give it back to us as a custom model that we can then retrieve from the API.

We'll basically get a unique ID for each model. That's really cool, because there are only a few things we can do to improve these models. We can do a ton of prompt engineering, which takes a long time but usually doesn't get us all the way there.

We can also do retrieval augmentation, which again helps in some cases but not in others. If we add fine-tuning into that mix, we have a lot of what we need to get almost any type of behavior we'd like from our chatbots. So this is a very important feature, and one that is going to become a key part of the toolkit of anyone developing AI applications.

A big part of fine-tuning is always data preparation and collection. We're going to touch on that in this video, but I'm going to leave the majority of it for another video. It is very interesting, though, so I definitely want to go into the details of how I'm building these datasets.

For now, we'll jump straight into the fine-tuning component. So, we're going to get started with this notebook. We have a few installs here, and we go ahead and install those. Then we get our dataset. Again, this is one that I've pre-built, and I will say there wasn't much manual work in it.

Building it is pretty much all automated. You can see what we have here: the records are all different, but the system message for each one is the same, and then below it you have essentially a conversation, or a small segment of a conversation.

All of these are generated by GPT-4, so we can almost think of this fine-tuning as a form of knowledge distillation, where you take one model's knowledge and distill it into another model through some sort of fine-tuning approach, which is what we're doing here. Now let's look at the records in the first example.

We have the system role, which is the template system prompt from LangChain. We have the user role, and in there you can see we have tools. That's the most important bit, because what I want to do here is fine-tune GPT-3.5 to be a better agent, and the issue I have with GPT-3.5 as an agent is that it really struggles to use this JSON format.

You can see it here: it just seems to struggle with it a lot. GPT-4 does very well, which is why I've used GPT-4 to generate this dataset; GPT-3.5, not so much. So we've created this dataset to essentially teach it how to better use that format.

Now, this focuses on a retrieval tool only. Naturally, in a real use case we would want more diverse conversations, not just a single tool: we'd want multiple tools, and we'd also want conversations that don't necessarily use tools at all. But for this example, I think this is good enough.

So I'm just going to save this to a local JSON lines file. If we open it, all we will see is this: we have our messages. This is the format we need to follow when we're training: each line is a dictionary with a "messages" key holding the list of messages we're going to be training on.
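As a minimal sketch of that step, assuming `conversations` holds the list of records built above and that the output file is named conversations.jsonl (both names are illustrative):

```python
import json

# Write one training conversation per line, in OpenAI's chat
# fine-tuning format: {"messages": [{"role": ..., "content": ...}, ...]}
with open("conversations.jsonl", "w") as f:
    for record in conversations:
        f.write(json.dumps(record) + "\n")
```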

Each one of these represents a single conversation. Like I said, the first system message is the same for all of them. That's something you may want to modify, or you might just use the particular format you're already using for the chatbot you'd like to train.

Next we have a message with the user role, which contains the tools the assistant can use. These are all the same again, and then we get to here, where things begin to change. This is the end of the user message, where we have a question from the user.

That is followed by the assistant responding that it wants to use the vector search tool, using the JSON format we've set up, and what we have in there is a search query: the main focus of the technical report. You can see this is coming from the user's question.

All of this is generated by GPT-4. We said: okay, here is a question based on some information from a dataset; make a search query, which is what we have there. Then we continue: we get the retrieved context, and near the end we get an answer as well.

Finally the assistant produces a JSON "Final Answer" action: the focus of the technical report is such and such. That's our format.
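For reference, the two assistant actions have roughly this shape, shown here as Python dicts; the tool name and text are illustrative, not the exact strings from the dataset:

```python
# First the assistant picks a tool...
tool_action = {
    "action": "Vector Search Tool",
    "action_input": "main focus of the technical report",
}

# ...then, once the retrieved context comes back, it answers the user.
final_action = {
    "action": "Final Answer",
    "action_input": "The focus of the technical report is ...",
}
```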

Next we upload the file we'll be using for fine-tuning, our conversations JSON lines file, using OpenAI's file create method. This is a new method in the OpenAI client, so you do need to make sure you have updated the client; the version I'm using here is 0.27.9. We run this and we'll get a file ID.
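A minimal sketch of the upload, assuming the pre-1.0 openai client and that OPENAI_API_KEY is set in the environment:

```python
import openai  # pre-1.0 client, e.g. openai==0.27.9

# Upload the training file; purpose="fine-tune" tells OpenAI how it's used.
res = openai.File.create(
    file=open("conversations.jsonl", "rb"),
    purpose="fine-tune",
)
file_id = res["id"]
```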

We take that file ID and use it to create a fine-tuning job. This is the actual fine-tuning of our model, and it's where we specify which model we would like to fine-tune.

At the moment gpt-3.5-turbo is our only option. So we run that. Sometimes it can take a little while for the uploaded file to become available; if this pops up with an error, it just means the file isn't available yet, so you just need to wait a little bit.
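Sketched with the same client, the job creation looks something like this:

```python
# Create the fine-tuning job from the uploaded file. If the file is
# still being processed, this call errors; wait a moment and retry.
job = openai.FineTuningJob.create(
    training_file=file_id,
    model="gpt-3.5-turbo",
)
job_id = job["id"]
```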

It doesn't take too long, though. So now we can see we have this fine-tuning job running, and it says finished_at: null. That's because it obviously takes a little bit of time for the model to be fine-tuned, so we essentially just need to wait until this is no longer null; once it isn't, we will also get our fine-tuned model ID, which we'll need to use later on.

We'll get the job ID from here, and using that job ID we can retrieve the information for our fine-tuning job, like so. You can see it still hasn't finished, and it won't actually finish for a while. You can also check the events, which will just say: created fine-tuning job, then fine-tuning job actually started.
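Checking on the job might look like this, a sketch against the same pre-1.0 client:

```python
# Retrieve the job; "finished_at" stays null until training completes.
job = openai.FineTuningJob.retrieve(job_id)
print(job["status"], job["finished_at"])

# The event log shows entries like "Created fine-tuning job" and
# "Fine-tuning job started".
events = openai.FineTuningJob.list_events(id=job_id, limit=10)
for event in events["data"]:
    print(event["message"])
```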

Beyond the events, what you can do is set up a loop that every 100 seconds calls the API and checks whether the fine-tuning job has finished. You can either do that or you can check your email.
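A simple version of that loop, under the same assumptions, which also grabs the fine-tuned model's ID once the job is done:

```python
import time

# Poll every 100 seconds until the job reports a finish time.
while True:
    job = openai.FineTuningJob.retrieve(job_id)
    if job["finished_at"] is not None:
        break
    time.sleep(100)

# The ID we'll pass to LangChain below.
ft_model_id = job["fine_tuned_model"]
```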

Once your fine-tuning job has finished, OpenAI will also send you an email. This does take a little bit of time; I think it was at least 10 to 20 minutes when I ran it, probably more like 20 or so. So what I'm going to do is stop this and just come over to the model that I fine-tuned before this video.

That one is already ready, so I'm going to go ahead and use it rather than waiting. Once we have our fine-tuned model, we get the ID from here, which is what I mentioned before, and then I'm just going to test it.

So I'm going to use this fine-tuned model in a conversational agent through LangChain. To do so, we need the fine-tuned model ID. I'm going to use a retrieval chain; the retrieval chain in LangChain I find a little bit slow and overly complicated, so I'm going to load my own custom retrieval chain from here instead.

We'll be using it as a retrieval tool, and essentially it just makes fewer LLM calls, so retrieval will be faster. From there we just want to create our conversational agent as we typically would. We have an LLM here, which just uses our fine-tuned model name.

We're using conversation buffer window memory, and we have this vector DB chain, which is going to be our tool, which we initialize here. The tool is using Pinecone, so if you're following along you can just get your environment and your API key from there. Once we have initialized all of those, we can come down to initializing our agent.

It's a chat conversational ReAct description agent, and we just run that; a sketch of the whole setup is below. Now we can ask it: tell me about Llama 2. I should point out that in order to use this tool you would need to index your data first. I have another notebook for that and I'll leave a link to it at the top of the video, so you can run through the whole thing if you really want to.
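Here's a minimal sketch of that setup, assuming the pre-0.1 LangChain API and using a placeholder function in place of the custom Pinecone-backed retrieval chain from the video (the tool name and the placeholder's return value are illustrative):

```python
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.agents import initialize_agent, Tool

# The fine-tuned model is used like any other chat model, via its ID.
llm = ChatOpenAI(model_name=ft_model_id, temperature=0)

# Keep the last few turns of conversation as chat history.
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=5, return_messages=True
)

# Placeholder for the custom retrieval chain; swap in a real vector
# search over your indexed data.
def vector_search(query: str) -> str:
    return "relevant context retrieved from the vector database"

tools = [
    Tool(
        name="Vector Search Tool",
        func=vector_search,
        description="Returns relevant context from the knowledge base.",
    )
]

agent = initialize_agent(
    agent="chat-conversational-react-description",
    tools=tools,
    llm=llm,
    memory=memory,
    verbose=True,
)

agent("Tell me about Llama 2")
```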

Let's go with "tell me about Llama 2". It's just going to give us an answer, but what we want to see is that it's correctly using the JSON format, which it is doing, which is really cool. You can see it emitted the vector search tool action. I'm printing out more than I need to here; I'm printing out the vectors because I was debugging earlier.

Then we have a second JSON agent action, which is to provide the final answer to the user: Llama 2 is a collection of pre-trained and fine-tuned large language models, and so on. Okay, cool. That's one question; let's try another: what makes Llama 2 so special?

I think this is pulling from the same context, but we can come up to here (again, I'm printing out too much, but it's fine): Llama 2 is special because it features a collection of pre-trained and fine-tuned large language models optimized for dialogue use cases, and so on.

It's also supposed to say that the Llama 2 models outperform open-source chat models on most benchmarks tested and may be a substitute for closed-source models. Okay, looks good. Now: tell me about Llama 2 red teaming. Again, what we're looking for is the correct JSON format, which it handles nicely, and we get, I think, the same good answer again.

So it all looks good, everything's in the format we would want, and that is how we fine-tune a GPT-3.5 model on our own dataset. Like I said, a big part of this is actually building the dataset; this was just a very quick demo of how to do the fine-tuning.

Going forward, what I want to do is put together a longer video on fine-tuning where we'll dive into actually building the dataset, which is honestly the more important part of all this. But at least we've now seen the different endpoints of the OpenAI client that we need to use when fine-tuning.

For now, I'll leave it there. I hope this has been helpful and interesting. Thank you very much for watching, and I will see you again in the next one. Bye!