Fine-tuning OpenAI's GPT 3.5 for LangChain Agents
Chapters
0:00 Fine-tuning GPT 3.5 Turbo
1:44 Downloading the Training Data
2:57 Why Fine-tune an Agent
4:19 Training Data Format
6:10 Running OpenAI Fine-Tuning
9:04 Using Fine-Tuned GPT 3.5 in LangChain
11:01 Chatting with the Fine-Tuned Agent
00:00:00.000 |
Today we're going to take a look at fine-tuning GPT 3.5. This is a very new feature that has been 00:00:06.640 |
released by OpenAI and it gives us the ability to fine-tune our own custom GPT 3.5 models. 00:00:14.080 |
So we can feed it a ton of our own conversations that are the ideal conversations we would like 00:00:22.160 |
our chatbot or conversational agent to have and with those OpenAI will fine-tune a GPT 3.5 model 00:00:31.520 |
and give that back to us as a sort of custom model that we can then just retrieve from the API 00:00:39.040 |
given a unique ID that we'll get for each model. So that's really cool because 00:00:46.960 |
with these models, I mean there's a few things that we can do to improve them. We can do a ton 00:00:53.520 |
of prompt engineering. It takes a long time but usually doesn't get us all the way there. We can 00:00:58.720 |
also do retrieval augmentation. Again it helps in some cases but not in others and if we add 00:01:04.720 |
fine-tuning into that mix we actually have a lot of what we need in order to get almost any type of 00:01:12.720 |
behavior that we'd like from our chatbots. So this is a very important feature and one that 00:01:20.000 |
is going to become a key part of anyone's toolkit who is developing AI applications. 00:01:25.600 |
So a big part of fine-tuning is always data preparation and collection. We're going to 00:01:31.280 |
touch on that in this video but I'm actually going to leave the majority of that for another video 00:01:35.840 |
although it is very interesting so I definitely want to go into details of how I'm building these 00:01:40.000 |
datasets. But for now we'll just jump straight into the fine-tuning component. Okay so we're 00:01:45.920 |
going to get started with this notebook here. We have a few installs here. We go ahead and install those, something like the cell below. 00:01:54.160 |
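For reference, the install cell is probably something like this; the exact package list is my guess based on what gets used later in the video (OpenAI, LangChain, Pinecone, and a pre-built dataset):

```python
# Assumed install cell: openai, langchain and pinecone-client are used later in the video.
# The datasets library is included on the assumption the pre-built data comes from Hugging Face.
!pip install -qU openai==0.27.9 langchain pinecone-client datasets
```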
We're going to get our dataset. So again this is one that I've pre-built. I will say there wasn't 00:02:01.280 |
much manual work in this. It's kind of all automated to build it. Essentially you can 00:02:08.080 |
kind of see what we have here. These are all different examples, but the system message for 00:02:13.520 |
each one of these is the same, and then you have essentially a conversation or a small 00:02:20.480 |
segment of a conversation down here. All of these are generated by GPT-4 so we can almost think of 00:02:27.840 |
this fine-tuning as a form of knowledge distillation which is where you would use one 00:02:34.720 |
model's knowledge and distill that knowledge into another model through some sort of fine-tuning 00:02:41.360 |
approach, which is what we're doing here. Now you can see, in example one, the records here. 00:02:50.160 |
We have the role system. This is the template system prompt from LangChain. We have this role 00:02:58.640 |
user, and then, you can see, we have tools in here, right? So that's the bit that is the most 00:03:05.920 |
important, because what I want to do here is actually fine-tune GPT-3.5 to be a better 00:03:15.920 |
agent and the issue that I have with GPT-3.5 as an agent is that it really struggles to use this 00:03:26.160 |
JSON format. So you can kind of see it here. It just seems to struggle with it a lot. GPT-4 does 00:03:34.000 |
very well and that's why I've used GPT-4 to generate this data set. GPT-3.5 not so much. 00:03:40.560 |
So we've created this data set in order to essentially teach it how to better use that 00:03:46.000 |
format. Okay now this is just focusing on almost like a retrieval tool only. Naturally in a real 00:03:56.000 |
use case we would want more diverse conversations not just using a single tool. We'd want multiple 00:04:02.880 |
tools and we'd also want conversations that are not necessarily always using tools but for this 00:04:09.120 |
example, I think this is good enough. So I'm just going to save this to a local JSON lines file, roughly like the snippet below. 00:04:16.160 |
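A minimal sketch of that save step, where training_data stands in for the list of conversation records loaded above:

```python
import json

# training_data is assumed to be a list of {"messages": [...]} records, one per conversation
with open("conversations.jsonl", "w") as f:
    for record in training_data:
        f.write(json.dumps(record) + "\n")
```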
We can see over here, we can open this, and all we will see is this. So we have our messages. So this is 00:04:23.600 |
the format we need to follow when we're training. So we have a dictionary of messages, and 00:04:30.560 |
then we have a list of multiple messages that we're going to be training on. Each one of these 00:04:35.840 |
represents a single conversation. Okay we can go across like I said the first system message is the 00:04:43.040 |
same for all of them. Again that's maybe something you'd want to modify or maybe you just use your 00:04:50.720 |
particular format that you're using for a chatbot that you'd like to train. So here we have the next 00:04:56.800 |
message, role user. We have the tools that the assistant can use. Okay, see, these are all going 00:05:04.080 |
to be the same again and then we get to here and this is where things begin to change. So here we 00:05:09.520 |
have the end of the user message which is where we have a question from the user. That is followed 00:05:16.480 |
by the assistant responding with I want to use a vector search tool and it uses this JSON format 00:05:24.080 |
that we've set up. And what we have here is like a search query: 'main focus of technical report'. 00:05:32.400 |
This technical report... and you see, this is kind of coming from this. Right, all of this is generated 00:05:38.880 |
by GPT-4. So we said, okay, this is a question based on some information that we have from a dataset. 00:05:46.480 |
Make a search query which is what we have there and then continue and we get the context and we 00:05:55.120 |
get, later on, an answer as well near the end. Okay, here the assistant has a JSON action, 'Final Answer': 00:06:02.640 |
'the focus of the technical report', so on and so on. Right, that's our format; a rough sketch of a single training record is below. 00:06:10.960 |
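Here's an illustrative sketch of what one record looks like; the exact prompt text, question, and tool name come from the real dataset, so treat these strings as placeholders:

```python
# Illustrative only: the real system prompt, tool description and tool name
# come from the dataset itself
example_record = {
    "messages": [
        {
            "role": "system",
            # LangChain's template system prompt, identical across all records
            "content": "Assistant is a large language model trained by OpenAI. ..."
        },
        {
            "role": "user",
            # The user turn contains the TOOLS block plus the user's question
            "content": (
                "TOOLS\n------\nAssistant can ask the user to use tools to look up "
                "information that may be helpful...\n\nUSER'S INPUT\n"
                "--------------------\nWhat is the main focus of the technical report?"
            )
        },
        {
            "role": "assistant",
            # The assistant answers with the JSON action format (in the real data
            # this is wrapped in a markdown json code fence)
            "content": (
                '{"action": "Vector Search Tool", '
                '"action_input": "main focus of technical report"}'
            )
        }
    ]
}
```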
So then we want to go on to here, where we are going to first upload the files that we're going to be using for fine 00:06:17.760 |
tuning. So that's our conversation JSON lines file. We're going to be using this OpenAI File 00:06:24.880 |
create method, which is a new method in the OpenAI client, so you do need to make sure you 00:06:31.840 |
have updated the client; the version I'm using here is 0.27.9. Okay, cool. From that, we run this 00:06:43.200 |
and we'll get a file ID. We need to take that, so we get it from here, roughly as in the snippet below. 00:06:50.880 |
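Roughly, assuming the pre-1.0 openai client mentioned above:

```python
import openai

openai.api_key = "YOUR_OPENAI_API_KEY"  # or set the OPENAI_API_KEY environment variable

# Upload the training file (pre-1.0 openai client, i.e. 0.27.9)
res = openai.File.create(
    file=open("conversations.jsonl", "rb"),
    purpose="fine-tune"
)
file_id = res["id"]
print(file_id)
```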
We're then going to use that file ID to create a fine-tuning job. Okay, so this is the actual fine-tuning of our 00:06:57.600 |
model, and we specify what model we would like to fine-tune; at the moment, gpt-3.5-turbo is our only option. The call looks something like the snippet below. 00:07:04.560 |
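Something like:

```python
# Create the fine-tuning job from the uploaded file
res = openai.FineTuningJob.create(
    training_file=file_id,
    model="gpt-3.5-turbo"  # the only model currently available for this
)
job_id = res["id"]
print(res)
```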
So we run that. I should also say, sometimes it can take a little bit of time for this file to be 00:07:13.600 |
available, but if it pops up with an error here, that means it just isn't 00:07:19.680 |
available yet. So you just need to wait a little bit. It doesn't take too long though. Okay so now 00:07:25.520 |
we can see we have this fine-tuning job running, and it says finished_at: null. That's because 00:07:33.040 |
obviously it takes a little bit of time for the model to be fine tuned. So we essentially need to 00:07:40.080 |
just wait until this is not null anymore and once that is not null we will also get our fine tuned 00:07:48.080 |
model ID, which we'll need to use later on. So we'll get the job ID, which is 00:07:58.640 |
here, and using that job ID we can just retrieve the information for our fine-tuning job, like in the snippet below. 00:08:06.720 |
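A minimal version of that check:

```python
# finished_at stays null (None) until training completes, at which point
# fine_tuned_model holds the new model's ID
res = openai.FineTuningJob.retrieve(job_id)
print(res["finished_at"], res["fine_tuned_model"])
```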
Okay so you see it still hasn't finished and it won't actually finish for a while. You can also 00:08:14.960 |
check the events; these will just say, hey look, 'created fine-tuning job' and then 'fine-tuning job actually 00:08:22.720 |
started'. Okay, and beyond the events, what you can do is set this up so it will 00:08:29.200 |
just keep looping through and every 100 seconds it's going to call the API and check if the fine 00:08:36.720 |
tuning job has finished. You can either do that or you can check your emails, because OpenAI, once your 00:08:43.200 |
fine-tuning job has finished, will also send you an email. Both options are sketched below. 00:08:49.760 |
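Both of those might look roughly like this:

```python
from time import sleep

# The most recent events for the job ("created fine-tuning job", "started", ...)
print(openai.FineTuningJob.list_events(id=job_id, limit=10))

# Or poll every 100 seconds until the job reports a finish time
while True:
    res = openai.FineTuningJob.retrieve(job_id)
    if res["finished_at"] is not None:
        ft_model = res["fine_tuned_model"]
        print(f"Fine-tuned model ready: {ft_model}")
        break
    sleep(100)
```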
Okay, so this does take a little bit of time. I think it was at least 10-20 minutes when I ran it, so probably more 00:08:57.920 |
along 20 or so minutes. So what I'm going to do is just pause this or stop this and I am just going 00:09:05.040 |
to come to here which is the model that I fine-tuned before this video. So this one is already 00:09:13.280 |
ready, and I'm just going to go ahead and use this rather than waiting. Now, okay, once 00:09:19.120 |
we have our fine-tuned model, we get the ID from here, which is what I 00:09:24.400 |
mentioned before, and then I'm just going to test it with a quick chat completion call, sketched below. 00:09:31.200 |
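A quick sanity check could be something like this; the question is just an example:

```python
# ft_model is the fine-tuned model ID, e.g. "ft:gpt-3.5-turbo-0613:..."
res = openai.ChatCompletion.create(
    model=ft_model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi, how are you today?"}
    ],
    temperature=0
)
print(res["choices"][0]["message"]["content"])
```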
Okay, so I'm actually going to use this fine-tuned model in a conversational agent through LangChain. So to do so, we need 00:09:39.040 |
the fine-tuned model ID. I'm going to use a retrieval chain. The retrieval chain in LangChain 00:09:49.360 |
I find is a little bit slow and overly complicated so I'm going to just load my own custom chain 00:09:57.440 |
from here, like a retrieval chain. A retrieval tool is what we'll be using it as, and essentially 00:10:04.560 |
it just makes fewer LLM calls, so retrieval will be faster. Okay, and from there, we just want to 00:10:13.520 |
load or create our conversational agent as we typically would. So we have an LLM here, where we're 00:10:22.560 |
just using our fine-tuned model name, and we're using conversation buffer window memory, roughly as below. 00:10:29.760 |
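Roughly like this; the parameter values here are placeholders rather than necessarily what's in the notebook:

```python
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory

# The LLM is just ChatOpenAI pointed at the fine-tuned model ID
llm = ChatOpenAI(
    model_name=ft_model,
    openai_api_key="YOUR_OPENAI_API_KEY",
    temperature=0.5  # placeholder value
)

# Keep only the last k turns of the conversation in memory
memory = ConversationBufferWindowMemory(
    memory_key="chat_history",  # must match the memory key in the agent prompt
    k=5,
    return_messages=True
)
```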
We have this vector DB chain, which is going to be our tool, which we initialize here. So this tool 00:10:37.440 |
is using Pinecone, so if you're following along, you can just get your environment and your API 00:10:43.680 |
key from here. A rough sketch of that tool setup, assuming an existing index, is below. 00:10:52.240 |
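This is a hedged stand-in for the custom chain in the notebook, using LangChain's stock RetrievalQA over an existing Pinecone index; the index name here is hypothetical:

```python
import pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.chains import RetrievalQA
from langchain.agents import Tool

pinecone.init(
    api_key="YOUR_PINECONE_API_KEY",
    environment="YOUR_PINECONE_ENV"  # e.g. "us-west1-gcp"
)

# Wrap the existing index (built in the separate indexing notebook) as a vector store
embed = OpenAIEmbeddings(model="text-embedding-ada-002")
index = pinecone.Index("llama-2-papers")  # hypothetical index name
vectorstore = Pinecone(index, embed.embed_query, "text")

# The video uses a faster custom retrieval chain; RetrievalQA is a simple stand-in
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)

tools = [
    Tool(
        name="Vector Search Tool",  # should match the tool name used in the training data
        func=qa.run,
        description="Use this tool to answer questions about the indexed documents."
    )
]
```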
Okay, cool. So once we have initialized all of those, we can come down to here, which is initializing our agent, a conversational react description agent, so a conversational agent, something like the sketch below. 00:10:57.680 |
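Something like this, assuming the chat-conversational-react-description agent type, which matches the JSON action format described earlier:

```python
from langchain.agents import initialize_agent

# The chat-conversational agent emits the same JSON action format the model
# was fine-tuned on; the exact agent string is my assumption
agent = initialize_agent(
    agent="chat-conversational-react-description",
    tools=tools,
    llm=llm,
    memory=memory,
    verbose=True
)
```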
We just run that. Cool, now we can say, okay, tell me about LLAMA2, and I should 00:11:06.240 |
just point out that in order to use this tool here you would need to index your data first. 00:11:13.680 |
So I have another notebook for that I'll leave a link to that at the top of the video so you can 00:11:18.960 |
run through that if you really want to go through the whole thing. So, okay, let's go with 'tell me about LLAMA2', calling the agent as in the snippet below. 00:11:24.000 |
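For example:

```python
# The agent should first emit a Vector Search Tool action, then a final answer
agent("tell me about Llama 2")
```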
It's just going to give us an answer, but what we want to see is that it's 00:11:28.400 |
correctly using this JSON format, which it is doing, which is really cool. So yeah, we see the 00:11:37.520 |
vector search tool action; it did that. I'm printing out more than I need to here, I'm printing out the 00:11:44.640 |
vector because I was debugging earlier. We have a second JSON agent action, which is to 00:11:54.400 |
provide the final answer to the user and it is LLAMA2 is a collection of pre-trained fine-tuned 00:11:59.840 |
large language models so on and so on right. Okay cool so that is one question let's try another. 00:12:09.440 |
What makes LLAMA2 so special? I think this is pulling from like the same context but we can 00:12:18.080 |
come up to here. Again, I'm printing out too much, but it's fine. So LLAMA2 is special because it 00:12:27.680 |
features a collection of pre-trained and fine-tuned large language models, 00:12:31.920 |
optimized for dialogue use cases, these models called... so on and so on. I think it's supposed to say 00:12:37.280 |
Llama 2 outperforms open-source chat models on most benchmarks tested, and may be a substitute for closed 00:12:44.160 |
source models. Okay looks good now tell me about LLAMA2 red teaming. Again what we're looking for 00:12:53.680 |
is the correct JSON format, which it handles nicely, and yeah, we get, I think, the 00:13:02.320 |
same good answer again. So all looks good, everything's in the format that we would want, 00:13:08.240 |
and yeah, that is how we would fine-tune a GPT 3.5 model on our own dataset. Again, like I said, a big 00:13:18.320 |
part of this is actually building the dataset; this is just a very quick demo example 00:13:25.120 |
on how to do this. So what I want to do going forwards is put together a longer video 00:13:31.760 |
on fine-tuning where we'll kind of dive into actually building the dataset which is honestly 00:13:37.280 |
the more important part of all of this but at least we've seen the different endpoints for 00:13:44.800 |
the OpenAI client that we need to use when we're fine-tuning. So for now I'll leave it there I hope 00:13:51.120 |
this has been helpful and interesting so thank you very much for watching and I will see you again