
Fine-tuning OpenAI's GPT 3.5 for LangChain Agents


Chapters

0:00 Fine-tuning GPT 3.5 Turbo
1:44 Downloading the Training Data
2:57 Why Fine-tune an Agent
4:19 Training Data Format
6:10 Running OpenAI Fine-Tuning
9:04 Using Fine-Tuned GPT 3.5 in LangChain
11:01 Chatting with the Fine-Tuned Agent

Whisper Transcript

00:00:00.000 | Today we're going to take a look at fine-tuning GPT 3.5. This is a very new feature that has been
00:00:06.640 | released by OpenAI and it gives us the ability to fine-tune our own custom GPT 3.5 models.
00:00:14.080 | So we can feed it a ton of our own conversations that are the ideal conversations we would like
00:00:22.160 | our chatbot or conversational agent to have and with those OpenAI will fine-tune a GPT 3.5 model
00:00:31.520 | and give that back to us as a sort of custom model that we can then retrieve from the API.
00:00:39.040 | We'll basically get a unique ID for each model. So that's really cool because
00:00:46.960 | with these models, I mean there's a few things that we can do to improve them. We can do a ton
00:00:53.520 | of prompt engineering. It takes a long time but usually doesn't get us all the way there. We can
00:00:58.720 | also do retrieval augmentation. Again it helps in some cases but not in others and if we add
00:01:04.720 | fine-tuning into that mix we actually have a lot of what we need in order to get almost any type of
00:01:12.720 | behavior that we'd like from our chatbots. So this is a very important feature and one that
00:01:20.000 | is going to become a key part of anyone's toolkit who is developing AI applications.
00:01:25.600 | So a big part of fine-tuning is always data preparation and collection. We're going to
00:01:31.280 | touch on that in this video but I'm actually going to leave the majority of that for another video
00:01:35.840 | although it is very interesting so I definitely want to go into details of how I'm building these
00:01:40.000 | datasets. But for now we'll just jump straight into the fine-tuning component. Okay so we're
00:01:45.920 | going to get started with this notebook here. We have a few installs here. We go ahead install those.
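As a rough sketch, the installs likely cover the OpenAI client, LangChain, Pinecone, and Hugging Face datasets; the exact package list and pinned versions in the notebook may differ:

```python
# Notebook-style install cell. The package list is an assumption based on what
# the video uses later: the OpenAI client for fine-tuning, LangChain for the
# agent, pinecone-client for the vector store, and datasets for the pre-built
# training data. Exact versions may differ from the notebook.
!pip install -qU openai langchain pinecone-client datasets
```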
00:01:54.160 | We're going to get our dataset. So again this is one that I've pre-built. I will say there wasn't
00:02:01.280 | much manual work in this. It's kind of all automated to build it. Essentially you can
00:02:08.080 | kind of see what we have here. These are all actually different but the system message for
00:02:13.520 | each one of these is the same but then you have like essentially like a conversation or a small
00:02:20.480 | segment of a conversation down here. All of these are generated by GPT-4 so we can almost think of
00:02:27.840 | this fine-tuning as a form of knowledge distillation which is where you would use one
00:02:34.720 | model's knowledge and distill that knowledge into another model through some sort of fine-tuning
00:02:41.360 | approach, which is what we're doing here. Now you can see in example one the records here.
00:02:50.160 | We have the role system. This is the template system prompt from LangChain. We have this role
00:02:58.640 | user, and then, you can see, we have tools in here, right. So that's the bit that is the most
00:03:05.920 | important because what I want to do here is actually fine-tune ChatGPT, or GPT-3.5, to be a better
00:03:15.920 | agent and the issue that I have with GPT-3.5 as an agent is that it really struggles to use this
00:03:26.160 | JSON format. So you can kind of see it here. It just seems to struggle with it a lot. GPT-4 does
00:03:34.000 | very well and that's why I've used GPT-4 to generate this data set. GPT-3.5 not so much.
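For reference, the JSON action format that LangChain's conversational agent prompt asks the model to produce looks roughly like this; the tool name and query are illustrative, not taken from the actual dataset:

```json
{
    "action": "Vector Search Tool",
    "action_input": "main findings of the Llama 2 technical report"
}
```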
00:03:40.560 | So we've created this data set in order to essentially teach it how to better use that
00:03:46.000 | format. Okay now this is just focusing on almost like a retrieval tool only. Naturally in a real
00:03:56.000 | use case we would want more diverse conversations not just using a single tool. We'd want multiple
00:04:02.880 | tools and we'd also want conversations that are not necessarily always using tools but for this
00:04:09.120 | example I think this is good enough. So I'm just going to save this to a local JSON lines file. We
00:04:16.160 | can see over here we can open this and all we will see is this. So we have our messages. So this is
00:04:23.600 | the format we need to follow when we're training. So we have a like a dictionary of messages and
00:04:30.560 | then we have a list of multiple messages that we're going to be training on. Each one of these
00:04:35.840 | represents a single conversation. Okay we can go across like I said the first system message is the
00:04:43.040 | same for all of them. Again that's maybe something you'd want to modify or maybe you just use your
00:04:50.720 | particular format that you're using for a chatbot that you'd like to train. So here we have the next
00:04:56.800 | message, role user. We have the tools that the assistant can use. Okay, see, these are all going
00:05:04.080 | to be the same again and then we get to here and this is where things begin to change. So here we
00:05:09.520 | have the end of the user message which is where we have a question from the user. That is followed
00:05:16.480 | by the assistant responding with I want to use a vector search tool and it uses this JSON format
00:05:24.080 | that we've set up. And what we have here is like a search query. Okay main focus of technical report.
00:05:32.400 | This technical report, and you see this is kind of coming from this. Right, all of this is generated
00:05:38.880 | by GPT-4. So we said okay, this is a question based on some information that we have from a data set.
00:05:46.480 | Make a search query which is what we have there and then continue and we get the context and we
00:05:55.120 | get later on an answer as well near the end. Okay here assistant JSON action final answer.
00:06:02.640 | The focus of the technical report, so on and so on, is this. Right, that's our format. So then we want to
00:06:10.960 | go on to here where we are going to first upload the files that we're going to be using for fine
00:06:17.760 | tuning. So that's our conversation JSON lines file. We're going to be using this OpenAI file
00:06:24.880 | create which is obviously a new method in the OpenAI client. So you do need to make sure you
00:06:31.840 | have updated the client. The version I'm using here is 0.27.9. Okay, cool. From that, we run this
00:06:43.200 | and we'll get a file ID. We need to take that, so we get it from here, get the file ID, and we're
00:06:50.880 | going to use it to create a fine tuning job. Okay so this is the actual like the fine tuning of our
00:06:57.600 | model and we specify what model we would like to fine tune. At the moment this is our only option.
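Putting those two steps together, here is a minimal sketch using the pre-1.0 OpenAI Python client described in the video; the file name, system prompt, and message contents are heavily shortened placeholders, not the real dataset:

```python
import json
import openai

openai.api_key = "OPENAI_API_KEY"  # placeholder

# Each line of the JSONL file is one training conversation in chat format.
# The system prompt, tool text, and assistant reply below are illustrative only.
example = {
    "messages": [
        {"role": "system", "content": "Assistant is a large language model..."},
        {"role": "user", "content": "TOOLS\n------\nAssistant can use a Vector Search Tool...\n\nUSER'S INPUT\n--------------------\nWhat is the main focus of the technical report?"},
        {"role": "assistant", "content": '{"action": "Vector Search Tool", "action_input": "main focus of technical report"}'},
    ]
}
with open("conversations.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Upload the training file, then start the fine-tuning job against gpt-3.5-turbo.
uploaded = openai.File.create(file=open("conversations.jsonl", "rb"), purpose="fine-tune")
file_id = uploaded["id"]

job = openai.FineTuningJob.create(training_file=file_id, model="gpt-3.5-turbo")
job_id = job["id"]
print(job_id)
```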
00:07:04.560 | So we run that. I should also say, sometimes it can take a little bit of time for this file to be
00:07:13.600 | available, but if it pops up with an error here, that means it just isn't
00:07:19.680 | available yet. So you just need to wait a little bit. It doesn't take too long though. Okay, so now
00:07:25.520 | we can see we have this fine-tuning job running and it says finished_at is null. That's because
00:07:33.040 | obviously it takes a little bit of time for the model to be fine tuned. So we essentially need to
00:07:40.080 | just wait until this is not null anymore and once that is not null we will also get our fine tuned
00:07:48.080 | model ID, which we'll need to use later on. So yeah, we'll get the job ID, which is
00:07:58.640 | here, and using that job ID we can just retrieve the information for our fine-tuning job like so.
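A sketch of checking on the job and waiting for it to finish, following the pattern described here; job_id is the ID returned by the create call above:

```python
import time
import openai

job_id = "ftjob-..."  # placeholder: the ID returned by FineTuningJob.create

# One-off status check, plus the event log ("created fine-tuning job",
# "fine-tuning job started", and so on).
print(openai.FineTuningJob.retrieve(job_id))
print(openai.FineTuningJob.list_events(id=job_id, limit=10))

# Poll every 100 seconds until the job reports a finish time, then grab the
# fine-tuned model ID that we pass to LangChain later.
while True:
    job = openai.FineTuningJob.retrieve(job_id)
    if job["finished_at"] is not None:
        fine_tuned_model = job["fine_tuned_model"]
        break
    time.sleep(100)

print(fine_tuned_model)
```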
00:08:06.720 | Okay so you see it still hasn't finished and it won't actually finish for a while. You can also
00:08:14.960 | check this, so this will just say, hey look, created fine-tuning job, and then, fine-tuning job actually
00:08:22.720 | started. Okay, and yeah, those are the events. What you can do is set this up so it will
00:08:29.200 | just keep looping through, and every 100 seconds it's going to call the API and check if the fine
00:08:36.720 | tuning job has finished. You can either do that or you can check your emails. So OpenAI, once your
00:08:43.200 | fine tuning job has finished will also send you an email. Okay so this does take a little bit of
00:08:49.760 | time. I think it was at least 10-20 minutes when I ran it, probably more
00:08:57.920 | like 20 or so minutes. So what I'm going to do is just pause this, or stop this, and I'm just going
00:09:05.040 | to come to here which is the model that I fine-tuned before this video. So this one is already
00:09:13.280 | ready, and I'm just going to go ahead and use this rather than waiting. Now, okay, once
00:09:19.120 | we have our fine-tuned model, we get the ID from here, which is what I
00:09:24.400 | mentioned before, and then I'm just going to test this. Okay, so I'm actually going to
00:09:31.200 | use this fine-tuned model in a conversational agent through LangChain. So to do so, we need
00:09:39.040 | the fine-tuned model ID. I'm going to use a retrieval chain. The retrieval chain in LangChain
00:09:49.360 | I find is a little bit slow and overly complicated, so I'm going to just load my own custom chain
00:09:57.440 | from here, a retrieval chain. A retrieval tool is what we'll be using it as, and essentially
00:10:04.560 | it just makes fewer LLM calls, so retrieval will be faster. Okay, and from there we just want to
00:10:13.520 | load, or create, our conversational agent as we typically would. So we have an LLM here where we're
00:10:22.560 | just using our fine-tuned model name. We're using conversation buffer window memory.
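A rough sketch of that setup in LangChain follows; the fine-tuned model ID and API key are placeholders, and the retrieval tool is stubbed out here since the Pinecone-backed chain is covered next:

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory

# The fine-tuned model is addressed by the ID returned from the fine-tuning job.
llm = ChatOpenAI(
    model_name="ft:gpt-3.5-turbo-0613:your-org::abc123",  # placeholder ID
    temperature=0.0,
    openai_api_key="OPENAI_API_KEY",  # placeholder
)

# Keep the last few exchanges of the conversation in the prompt.
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=5, return_messages=True
)

# Stand-in for the custom Pinecone-backed retrieval chain used in the video.
def vector_search(query: str) -> str:
    return "retrieved context for: " + query

tools = [
    Tool(
        name="Vector Search Tool",  # illustrative name
        func=vector_search,
        description="Searches the indexed documents and returns relevant context.",
    )
]

# Conversational agent that emits the JSON action format the model was
# fine-tuned on. The agent type is an assumption based on the prompt shown earlier.
agent = initialize_agent(
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    llm=llm,
    tools=tools,
    memory=memory,
    verbose=True,
)
```

From there, something like agent("tell me about Llama 2") runs a query through the agent and its tool.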
00:10:29.760 | We have this vector db chain which is going to be our tool which we initialize here. So this tool
00:10:37.440 | is using Pinecone so if you're following along you can just get your environment and your API
00:10:43.680 | key from here. Okay cool so once we have initialized all of those we can come down to here which is
00:10:52.240 | initializing our agent. It's a conversational ReAct description agent, so a conversational agent,
00:10:57.680 | and we just run that. Cool. Now we can say, okay, tell me about LLAMA2. And actually, I should
00:11:06.240 | just point out that in order to use this tool here you would need to index your data first.
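For completeness, a minimal sketch of what that indexing step might look like with the older pinecone-client and OpenAI embedding APIs; the index name, keys, and document chunks are all placeholders:

```python
import openai
import pinecone

openai.api_key = "OPENAI_API_KEY"  # placeholder
pinecone.init(api_key="PINECONE_API_KEY", environment="PINECONE_ENV")  # placeholders

# Create the index once, sized for OpenAI's text-embedding-ada-002 vectors.
index_name = "llama-2-paper"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=1536, metric="cosine")
index = pinecone.Index(index_name)

# Embed document chunks and upsert them with their text stored as metadata.
chunks = ["Llama 2 is a collection of pretrained and fine-tuned LLMs..."]  # placeholder data
embeds = openai.Embedding.create(input=chunks, model="text-embedding-ada-002")
vectors = [
    (f"chunk-{i}", record["embedding"], {"text": chunks[i]})
    for i, record in enumerate(embeds["data"])
]
index.upsert(vectors=vectors)
```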
00:11:13.680 | So I have another notebook for that I'll leave a link to that at the top of the video so you can
00:11:18.960 | run through that if you really want to go through the whole thing. So okay, let's go with
00:11:24.000 | tell me about LLAMA2 it's just going to give us an answer but what we want to see is that it's
00:11:28.400 | correctly using this JSON format, which it is doing, which is really cool. So yeah, we see the
00:11:37.520 | vector search tool action, it did that. I'm printing out more than I need to here, I'm printing out the
00:11:44.640 | vector because I was debugging earlier. We have a second JSON, or agent action, which is to
00:11:54.400 | provide the final answer to the user and it is LLAMA2 is a collection of pre-trained fine-tuned
00:11:59.840 | large language models so on and so on right. Okay cool so that is one question let's try another.
00:12:09.440 | What makes LLAMA2 so special? I think this is pulling from like the same context but we can
00:12:18.080 | come up to here again I'm printing out too much but it's fine. So LLAMA2 is special because it
00:12:27.680 | features a collection of pre-trained and fine-tuned large language models,
00:12:31.920 | optimized for dialogue use cases, these models, called, so on and so on. I think it's supposed to say
00:12:37.280 | LLAMA2 outperforms open source chat models on most benchmarks tested and may be a substitute for closed
00:12:44.160 | source models. Okay, looks good. Now: tell me about LLAMA2 red teaming. Again, what we're looking for
00:12:53.680 | is the correct JSON format, which it handles nicely, and yeah, we get, I think, the same
00:13:02.320 | good answer again, right. So all looks good, everything's in the format that we would want
00:13:08.240 | and yeah that is how we would fine-tune a GPT 3.5 model on our own dataset. Again like I said a big
00:13:18.320 | part of this is actually building the dataset; this is just a very quick sort of demo example
00:13:25.120 | on how to do this. So what I want to do going forwards is put together a longer video
00:13:31.760 | on fine-tuning where we'll kind of dive into actually building the dataset which is honestly
00:13:37.280 | the more important part of all of this but at least we've seen the different endpoints for
00:13:44.800 | the OpenAI client that we need to use when we're fine-tuning. So for now I'll leave it there I hope
00:13:51.120 | this has been helpful and interesting so thank you very much for watching and I will see you again
00:13:56.400 | in the next one. Bye!