LoRA Fine-tuning Tiny LLMs as Expert Agents

Chapters
0:00 LoRA Fine-tuning Agents
1:34 NeMo Microservices
3:19 NeMo Deployment
7:49 Deploying NeMo Microservices
16:54 xLAM Dataset Preparation
26:49 Train Validation Test Split
28:59 NeMo Data Store and Entity Store
34:14 LoRA Training with NeMo Customizer
42:03 Deploying NIMs
47:10 Chat Completion with NVIDIA NIMs
49:47 NVIDIA NeMo Microservices
00:00:00.000 |
Today we are going to be taking a look at how we can take off-the-shelf LLMs and fine-tune them 00:00:07.300 |
on our own datasets so that they can become better agents. Now, one of the fundamental 00:00:14.280 |
requirements of LLMs in agentic workflows is that they can reliably use function calls. It is that 00:00:24.320 |
ability to do function calling that takes us from just having an LLM generating text to having an LLM 00:00:31.540 |
that is reviewing our codebases, writing code, opening PRs, checking our emails, doing web search, 00:00:38.980 |
and so on. At its core it's an LLM, but it's an LLM that can do function calling. Now, despite 00:00:46.400 |
this, until very recently, a lot of the LLMs being built and released 00:00:55.920 |
were good, but they were not very good at function calling. Fortunately for us, even 00:01:02.100 |
with those LLMs we can very easily fine-tune them to make them actually pretty good at function 00:01:10.600 |
calling, and therefore we can use these tiny LLMs, like the one-billion-parameter 00:01:15.860 |
model we're going to be using, as agents, which is pretty cool. So in this deep dive that's really what we're focusing on: 00:01:22.580 |
how we can fine-tune these tiny LLMs, or bigger LLMs if you really want to, to be better agents, and we'll 00:01:31.320 |
see how far we can push a tiny one-billion-parameter LLM. Now, fine-tuning LLMs is pretty heavy; you need a lot of 00:01:40.040 |
compute. Fortunately, we have access to NVIDIA's LaunchPad, and with LaunchPad I have 00:01:47.080 |
access to H100 GPUs. So we're going to be training on a single 00:01:52.140 |
H100, though I can also use more if I want. As well as LaunchPad, we are also going to be using 00:02:00.320 |
the NeMo microservices. Now, NeMo microservices is from NVIDIA, and the goal of the microservices 00:02:09.040 |
is to make fine-tuning and hosting your own LLMs, and then also running them in production, much 00:02:17.000 |
easier, or even just realistic to do at all. There are all these different components 00:02:24.660 |
within the microservices that we can use. One of the big ones is the customizer. The customizer is 00:02:30.700 |
what we are going to be fine-tuning with, and there's also the evaluator. I'm not actually 00:02:35.380 |
going to use that here, although I may do in the future, but that is of course for evaluating 00:02:40.280 |
the models that you've fine-tuned and comparing them to where you began. And there is NVIDIA NIM, which 00:02:46.060 |
is essentially the LLM hosting service, and that comes with the standard OpenAI-compatible 00:02:53.420 |
endpoints, so you can do chat completions and everything as you would with OpenAI. So we have 00:02:59.300 |
all that. There are a few other things here that I'm not going to go into, but you can get an idea of what 00:03:04.700 |
they are anyway. So you have the curator, which is for creating datasets to train your models; the 00:03:09.940 |
retriever, which is essentially RAG; and then also guardrails, which is of course protective 00:03:16.720 |
guardrails for LLMs when they're in production. Now, we're going to be deploying this service, and we're 00:03:24.120 |
going to be using a few different components. I've covered most of those here. So the guardrails 00:03:30.020 |
and evaluator, we're not using those, so we can just ignore them; they're off to the side there. 00:03:33.640 |
The rest of these things we are using. So we'll be diving into each one of these components as we get 00:03:39.660 |
to them, but let me just give you a quick overview of what this is here now, and treat this as almost a 00:03:46.120 |
map of what we're going to be building and going through. So to begin with, we are going to be 00:03:52.480 |
pulling together a dataset. So the dataset, of course, is going to come in up here, 00:03:57.580 |
all right, this is the dataset, we're going to bring that in, and we're going to be sending that over here 00:04:01.600 |
to the data store. Okay, so the data store. Every single one of these components is accessible via an API, and 00:04:09.380 |
these all get deployed at once within our broader microservices deployment. So we'll see all that 00:04:17.020 |
later, but anyway, so we have our dataset, we do some data preparation, we're going to do that first, 00:04:21.960 |
so in the middle here, but then we're going to be putting everything in our data store. 00:04:26.840 |
We leave it in the data store there. We do register that dataset in the entity store. The entity store 00:04:31.820 |
is essentially where we register all of our datasets and models so that all of the other components 00:04:38.460 |
within our microservices can read that and see what is accessible within the service. 00:04:44.520 |
Then we have the customizer. So, as I mentioned, the customizer is what handles the training of our models. 00:04:52.200 |
So the customizer is going to take a base model that we have defined. The way that we've done this is 00:05:00.020 |
we're setting the base model, which is going to be Llama 3.2 1B Instruct. We're setting that as 00:05:06.780 |
the base model, and we do that within the deployment setup of our microservices. So we'll see that soon. 00:05:13.580 |
It's going to then load our train and validation datasets from the data store component, and then it's 00:05:20.280 |
going to run a training job based on a set of training parameters that we are going to provide. 00:05:25.000 |
So that's things like the learning rate, dropout, number of training epochs, and so on. 00:05:31.100 |
We'll see all of that later. And once that training job is complete, our new custom model will be registered 00:05:40.460 |
with our entity store, which means it will be accessible, or at least viewable, from our other components. 00:05:47.420 |
Then at that point, we will be using the deployment management component which you can see over here. 00:05:53.820 |
And what the deployment management is doing is deploying what we would call NIMs. 00:05:59.100 |
Now, a NIM is a container from NVIDIA that has been built for GPU-accelerated tasks. 00:06:07.420 |
So of course, hosting an LLM and running inference on it is the sort of thing that you would deploy within a NIM. 00:06:13.820 |
Now, when we deploy one of these LLM or model NIMs, that will then become usable by our NIM proxy. 00:06:25.020 |
The NIM proxy component is essentially just an API where we are going to send requests to. 00:06:30.940 |
So a chat completion request, for example, and based on the model that we provided, 00:06:35.980 |
it's going to route that to a particular NIM. 00:06:39.580 |
Now, we actually don't deploy NIMs for each specific custom model that we've built. 00:06:46.300 |
Okay, the only NIMs here are actually these two things. 00:06:55.820 |
And the way that we use them to run our custom models is that our custom models over here all have a base model parameter. 00:07:06.940 |
That base model parameter is going to define all of our custom models here as having a base model of Llama 3.2 1B Instruct, in this case. 00:07:15.740 |
And that means that our NIM proxy will know, okay, I'm using this NIM, but I'm using the model weights from this custom model. 00:07:23.340 |
Okay, so in essence, what this NIM becomes is a container for our custom model. 00:07:31.020 |
So at a high level, that is what we are going to be building. 00:07:37.180 |
Now, I know for sure this is going to be a fairly long one, but we are going to go through this step by step, starting with the deployment, and then working through each one of those steps I described. 00:07:49.340 |
Okay, so we're going to start by deploying our NeMo microservices. 00:07:53.660 |
So to do that, I have this little notebook that just takes us through everything. 00:07:58.700 |
Now, there are various ways of deploying this in the docs. 00:08:05.340 |
You'll see that they have this beginner tutorial, and this does take you through some alternative ways of setting this up. 00:08:13.580 |
So you can also try this, and potentially it will be simpler. 00:08:18.220 |
Although for me, it was simpler to do what I'm going to show you, which is a little more hands-on. 00:08:26.860 |
Essentially, what we're going to be doing is downloading helm charts for our NeMo microservices. 00:08:34.460 |
And to get those helm charts, you will need an NGC account. 00:08:47.500 |
And this is going to take me through to their NGC catalog. 00:08:51.580 |
Now in here, you can find a ton of stuff, you know, that we are actually going to be using. 00:08:57.740 |
And I'll take you through this as we, as, as we need them. 00:09:01.500 |
But to begin with, I'm going to type in here: NeMo microservices. 00:09:06.860 |
And you can see that there is this NeMo microservices helm chart. 00:09:11.340 |
This is a helm chart which bundles together all the individual helm charts of the various microservices. 00:09:20.700 |
So we basically deploy this and it gives us everything that we need. 00:09:26.220 |
So that's the customizer, the evaluator, guardrails, 00:09:28.700 |
and then also the data store, entity store, deployment management, the NIM proxy, the operator, and so on. 00:09:38.060 |
And okay, right now, the latest version here is 25.4.0. 00:09:44.700 |
You can also check up here for the various versions if you want to use a different one. 00:09:51.580 |
So I just press that, it gave me the fetch command, and we can come over here. 00:09:58.220 |
Okay, so this is what we're going to be doing just here. 00:10:02.540 |
So that is how we navigate and find the various helm charts that we're using for deployment here. 00:10:07.820 |
But before we actually do that, we can't access this without providing our NGC API key. 00:10:28.140 |
So generate a key, then come back over here and just run this cell and enter your API key. 00:10:42.140 |
The username we send alongside it is literally the string "$oauthtoken". 00:10:48.860 |
So this is a helm login thing, not specific to the microservices. 00:10:52.060 |
We need to include the values that we would like to set. 00:10:57.820 |
So these are going to override the default values within the helm chart. 00:11:01.260 |
The only one that we 100% need here is the NGC API key. 00:11:06.940 |
Then here I'm adding the Llama 3.2 1B Instruct model so that my customizer later can access that as a model to actually optimize. 00:11:15.740 |
You could also deploy that through the deployment management component otherwise. 00:11:21.260 |
But yeah, here you can just deploy it upfront, and it's ready to go when we need it. 00:11:29.900 |
This was to handle a bug I was getting where there was essentially no storage for my customizer to save the model to, or even, I think, to write logs to. 00:11:41.580 |
So you may need that, you may not; as far as I'm aware, it's either a bug or my misuse of the system that required it. 00:11:53.020 |
So we create our YAML file here and then we actually use that YAML file to deploy our service. 00:12:02.460 |
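For reference, here is a rough sketch of those two steps. The chart URL pattern and release name are assumptions based on NGC's standard helm repository layout, so copy the exact fetch command the catalog gives you:

```python
import os
import subprocess

NGC_API_KEY = os.environ["NGC_API_KEY"]
CHART_VERSION = "25.4.0"  # latest at the time of recording
# Assumed URL pattern; copy the exact one from the NGC catalog's fetch command.
CHART_URL = (
    "https://helm.ngc.nvidia.com/nvidia/nemo-microservices/charts/"
    f"nemo-microservices-helm-chart-{CHART_VERSION}.tgz"
)

# NGC uses the literal string "$oauthtoken" as the username for chart access.
subprocess.run(
    ["helm", "fetch", CHART_URL,
     "--username", "$oauthtoken", "--password", NGC_API_KEY],
    check=True,
)

# Install into the namespace we create beforehand, overriding chart defaults
# with our values.yaml (which carries the NGC API key and the base model
# we want the customizer to see).
subprocess.run(
    ["helm", "install", "nemo",
     f"nemo-microservices-helm-chart-{CHART_VERSION}.tgz",
     "--namespace", "nemo", "-f", "values.yaml"],
    check=True,
)
```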
So before we do that, we do just create a namespace in Kubernetes. 00:12:11.900 |
Then, oh, the other thing that I also added here is, again, to handle that storage class issue I was seeing. 00:12:18.380 |
This here is going to look for all the default storage class values within our deployment 00:12:27.820 |
and replace them with the new storage class that I have set, which is this, okay, NFS client. 00:12:34.620 |
And you will see the rule here is: add the storage class if it is none, right? 00:12:40.700 |
So if there's already a storage class set within the Helm chart, it's not going to replace it. 00:12:45.820 |
It's only going to replace it if that storage class has been set to none, okay? 00:12:56.220 |
So this is going to apply this logic, this replacement logic within our cluster. 00:13:02.060 |
I've already created it, so I'm going to get all of these "already exists" messages. 00:13:04.860 |
But when you run this the first time, you shouldn't see those. 00:13:10.140 |
So now, if you like, you can just confirm that you've filled out all the required values. 00:13:20.700 |
That can be especially good when it comes to debugging, if you have issues with your deployment. 00:13:26.380 |
Now, we would install the Helm chart into the cluster. 00:13:31.340 |
So mine is going to tell me installation failed because I've already installed it, okay? 00:13:36.700 |
So mine is already there, so I don't actually need to do that. 00:13:39.820 |
But once you have done that, you can actually check your cluster and just see what is in there. 00:13:45.580 |
And you should, for the most part, see either initializing or running. 00:13:50.620 |
Sometimes, especially for, I think, the entity store or the data store, you'll see a crash loop backoff. 00:13:57.180 |
If you run this again, you should see, after not that long, that the status for that will turn to running. 00:14:09.020 |
If you see any errors or crash loop backoffs here that are not going away, or a pending status that doesn't seem to be going away, there's probably an issue. 00:14:17.980 |
And you should look into that and try and figure things out. 00:14:22.460 |
The first time you do run this, you will almost definitely see that the NeMo operator controller manager, 00:14:32.140 |
this one here, is stuck in, I think, a crash loop backoff. 00:14:37.020 |
And the reason for this is that the scheduler dependency, which is Volcano here, is not installed yet. 00:14:46.700 |
So we just need to run this cell here, and this will install Volcano. 00:14:51.260 |
And then you would restart your NeMo operator controller pod. 00:14:57.020 |
You would get the name for that pod from here. 00:15:03.580 |
And you would come in here and just run this; that will delete the pod, and it will automatically restart. 00:15:11.260 |
And that should fix the problem. 00:15:13.740 |
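A minimal sketch of that fix; the Volcano manifest URL is the project's standard installer, and the pod name is a placeholder you would copy from kubectl get pods:

```python
import subprocess

# Install the Volcano scheduler, which the NeMo operator depends on.
subprocess.run(
    ["kubectl", "apply", "-f",
     "https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml"],
    check=True,
)

# Delete the crash-looping operator pod; Kubernetes recreates it automatically.
# Replace the pod name with the one from `kubectl get pods -n nemo`.
subprocess.run(
    ["kubectl", "delete", "pod",
     "nemo-operator-controller-manager-xxxxx", "-n", "nemo"],
    check=True,
)
```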
Now, one other thing that we will need to do is set our NVCR image pull secret. 00:15:18.540 |
So this is our ability to access NVIDIA's container registry, which is needed later on. 00:15:26.060 |
When we're training our models, we need to pull the relevant containers from NVCR. 00:15:32.380 |
And if we, if we don't have the secret set, we will not be able to access them. 00:15:36.540 |
So we'll get, I think, a forbidden error, if I'm not wrong. 00:15:40.460 |
So we do first just delete the existing secret. 00:15:44.380 |
I think there is just like a placeholder string in there by default. 00:15:48.700 |
Then what we would do is create a new secret for the NVCR image pull secret. 00:15:55.180 |
For that, we do just need to pass in, essentially, our login details here. 00:16:04.540 |
And first, we can just check, okay, it's there, it exists. 00:16:11.180 |
And, yeah, I would recommend doing this: 00:16:15.020 |
You can just confirm that the secret has been read in correctly. 00:16:19.180 |
So if you run this cell, you will be able to see what is in there. 00:16:22.380 |
So that can just be useful, especially if you're wanting to debug things or you're running 00:16:27.340 |
this for the first time, just so you understand what is going on. 00:16:29.820 |
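Something along these lines, assuming the secret name the chart references is nvcrimagepullsecret (check yours with kubectl get secrets):

```python
import os
import subprocess

NGC_API_KEY = os.environ["NGC_API_KEY"]

# Remove the placeholder secret that ships with the deployment.
subprocess.run(["kubectl", "delete", "secret", "nvcrimagepullsecret", "-n", "nemo"])

# Recreate it with real credentials; "$oauthtoken" is again the literal username.
subprocess.run(
    ["kubectl", "create", "secret", "docker-registry", "nvcrimagepullsecret",
     "--docker-server=nvcr.io",
     "--docker-username=$oauthtoken",
     f"--docker-password={NGC_API_KEY}",
     "-n", "nemo"],
    check=True,
)
```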
And yeah, that actually should be everything for the deployment. 00:16:34.940 |
There is another step later where we're deploying our NIM, but we will run through that when 00:16:41.180 |
we get to it. So with that, our whole deployment is ready to go. 00:16:46.140 |
And we're ready to start running through that whole data preparation, data storage, customization, and deployment flow. 00:16:54.220 |
Now, for our dataset, we are going to be using a dataset from Salesforce, which they use to train their large action models. 00:17:04.620 |
Now, these large action models are models that do a ton of function calling. 00:17:11.180 |
So the dataset that trains those is obviously pretty good for us to fine-tune our own LLM to be a 00:17:18.620 |
better agent, through better function calling, of course. 00:17:22.060 |
So we need to go ahead and grab that dataset. 00:17:33.820 |
You will need a Hugging Face account for this, by the way. 00:17:46.700 |
And what you will probably see when you scroll down here is that you don't have access to this dataset yet. 00:17:51.340 |
And that's because you need to agree to their terms and conditions for using the dataset. 00:17:55.180 |
So, you know, go ahead and agree to those terms and conditions. 00:17:59.180 |
And then, once you've done that, you still cannot programmatically download this dataset just yet. 00:18:09.100 |
Because, you know, your account has approved those terms and conditions, 00:18:13.100 |
but you need to prove that it's your account requesting that data programmatically. 00:18:16.700 |
So to do that, we need to go into our settings. 00:18:21.980 |
We go to access tokens and we're going to create a new token here. 00:18:28.140 |
For this token, you can have fine-grained access here if you want. 00:18:34.060 |
Or otherwise, I would just recommend read-only; that is fine. 00:18:38.460 |
Give it a little name, and then create that token and you're good to go. 00:18:41.980 |
I've already created mine down here, so I'm not going to create a new one. 00:18:46.380 |
So once you have that token, we need to jump into here and just run this cell to log in with it. 00:18:57.900 |
Now what we can do is go and download the data. 00:19:01.740 |
And in the dataset card here, we can see the dataset structure. 00:19:06.140 |
So you have an ID, obviously, for each record, a query, which is the user query, and an answer. 00:19:13.500 |
But it is not a direct answer; it's a tool call or function call. 00:19:19.740 |
So for those of you that have built tool-calling agents before, 00:19:25.340 |
with OpenAI, LangChain, and everything else, there is always a tools parameter, 00:19:30.700 |
or function schema parameter. 00:19:34.300 |
And within that, you always provide a list of function schemas, 00:19:42.140 |
which tell the LLM: okay, what tools do I have access to? 00:19:49.820 |
Now, I can show you what that looks like, even. 00:19:53.980 |
So in here, yeah, we have that user query and the answers. 00:19:58.220 |
You can see here, there are multiple answers. 00:20:02.780 |
That is because the query is: okay, where can I find live giveaways for beta access and games, right? 00:20:08.460 |
What the agent in this training dataset decided to do is use this live giveaways by type tool. 00:20:19.820 |
So it's basically looking for, okay, live giveaways by type, and it's pulling in the beta type. 00:20:25.980 |
But the user is asking for beta access and games, right? 00:20:30.300 |
So in parallel, the LLM is also calling the same tool, but with the type game. 00:20:37.820 |
Now, in our training, we're actually just going to filter for records that have a single function 00:20:44.780 |
call, because not all models can support parallel function calling. 00:20:49.260 |
So yeah, we're just sticking with the single one. 00:20:53.980 |
So we'll only have records where there is a single function call. 00:20:57.580 |
But, you know, the same applies: if you want to train parallel function calling 00:21:03.020 |
and your base model is able to, you can; you just process the data as we do later, 00:21:10.620 |
And then we have the tools argument here where we see the list of tools that this 00:21:15.260 |
LLM has access to, which is just this live giveaways by type tool. 00:21:22.540 |
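If you want to follow along, loading the dataset looks roughly like this. It's gated, so you need to have accepted the terms and be logged in with your token:

```python
from datasets import load_dataset

# Requires `huggingface-cli login` (or token=...) after accepting the terms.
xlam = load_dataset("Salesforce/xlam-function-calling-60k", split="train")

record = xlam[0]
print(record["query"])    # the user query
print(record["answers"])  # JSON string: one or more function calls
print(record["tools"])    # JSON string: the function schemas the LLM can use
```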
Now, there is a little bit of a problem with this dataset, which is that it's not 00:21:27.020 |
quite in the correct format that we need in order to train with it. 00:21:31.740 |
The NeMo Customizer expects our data to be in the OpenAI-compatible format. 00:21:37.100 |
That means that both the messages and the tool schemas, the function schemas, need to be in 00:21:43.340 |
the standard OpenAI format, and the xLAM dataset is not in that format. 00:21:50.780 |
So we actually need to modify this a little bit, and I'll talk you through, 00:21:54.700 |
okay, how we are doing that and what is actually going on there. 00:22:05.420 |
In fact, if it wasn't clear, a function schema is just describing what a function does and what parameters it takes. 00:22:12.940 |
So for example, we have this multiply function, very simple. 00:22:23.020 |
When we run this function through a method to turn it into a function schema, which is what 00:22:28.700 |
we're doing here, you see that it turns into this structure, okay? 00:22:32.620 |
This structure, that is the function schema, and it's also an OpenAI function schema. 00:22:38.300 |
So just so you're aware, this is basically what we're doing here, okay? 00:22:52.300 |
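For that multiply example, an OpenAI-format function schema looks like this. This is a hand-written sketch of the structure, not the exact output of the schema-generation method used in the notebook:

```python
def multiply(x: float, y: float) -> float:
    """Multiply two numbers together."""
    return x * y

# The OpenAI-format function schema describing `multiply`.
multiply_schema = {
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "Multiply two numbers together.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "number", "description": "First number."},
                "y": {"type": "number", "description": "Second number."},
            },
            "required": ["x", "y"],
        },
    },
}
```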
You can see here that, okay, this is the xLAM dataset. 00:22:56.940 |
There are a few things that are missing, okay? 00:22:59.100 |
The first thing is that we need this type: function field, which I've put up here, okay? 00:23:05.820 |
So you can see, compare this with the function schema up here. 00:23:18.460 |
Then we need to put everything else within a function key here. 00:23:21.820 |
So name, description, and parameters are all going to go inside this dictionary. 00:23:29.100 |
In xLAM, we have parameters, and that goes straight into type and then description and so on. 00:23:34.940 |
In the OpenAI format, the parameters have a type object, and then they have a properties dictionary inside 00:23:41.420 |
there, where we're putting all this other information. 00:23:43.340 |
And then finally, the other little thing is that the types in the xLAM dataset are using Python type names. 00:23:53.180 |
So for example, str, which is the Python type name, which xLAM uses, would become string; 00:23:59.100 |
the full word, that's how you would name it here. 00:24:02.220 |
And then a list of anything would become array, okay? 00:24:06.140 |
And there's a full table here where I've written down all of the various Python formats and their OpenAI-format equivalents. 00:24:16.060 |
So we're going to be handling all of this in some logic which I have written in here, okay? 00:24:21.980 |
So we're just normalizing the type here: looking at the type and converting it into an OpenAI type. 00:24:25.900 |
Then we actually need to restructure things; we looked at the structural difference between those two function schemas before. 00:24:36.780 |
We're converting from the xLAM structure into the OpenAI structure. 00:24:40.700 |
And as part of this, we also normalize our types, right? 00:24:46.860 |
So here, we're converting using all of these various type mappings, okay? 00:24:55.020 |
And then what we can do, okay, this is the xLAM tool schema. 00:24:59.260 |
If we convert this with our new function, we can see that it successfully turns the xLAM format into the OpenAI format. 00:25:11.420 |
Okay, so that is for our tool or function schema. 00:25:14.940 |
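A simplified sketch of that conversion logic; the notebook's actual implementation handles more edge cases, and the tool dict shown at the end is illustrative:

```python
import json

# Python-style type names used by xLAM, mapped to OpenAI/JSON-Schema types.
TYPE_MAP = {
    "str": "string", "int": "integer", "float": "number",
    "bool": "boolean", "list": "array", "dict": "object",
}

def normalize_type(xlam_type: str) -> str:
    # xLAM types can look like "str, optional" or "List[int]";
    # keep it simple here and fall back to "string".
    base = xlam_type.split(",")[0].strip().lower()
    for py_type, oa_type in TYPE_MAP.items():
        if base.startswith(py_type):
            return oa_type
    return "string"

def xlam_tool_to_openai(tool: dict) -> dict:
    # Move each parameter into a "properties" dict with normalized types.
    properties = {
        name: {
            "type": normalize_type(param.get("type", "str")),
            "description": param.get("description", ""),
        }
        for name, param in tool.get("parameters", {}).items()
    }
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": {"type": "object", "properties": properties},
        },
    }

# Illustrative xLAM-style tool definition:
xlam_tool = {
    "name": "live_giveaways_by_type",
    "description": "Retrieve live giveaways filtered by type.",
    "parameters": {"type": {"type": "str", "description": "e.g. beta, game"}},
}
print(json.dumps(xlam_tool_to_openai(xlam_tool), indent=2))
```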
Another thing that we need to do: for the actual assistant message, where it's calling a particular function or tool, 00:25:21.260 |
we also need to handle that in a particular way, which is a lot simpler. 00:25:27.100 |
So we are saying, okay, if we just have one tool call, we're going to keep it. 00:25:32.380 |
If there are more tool calls in this assistant message, 00:25:35.660 |
we're going to discard this record and just skip it. 00:25:38.460 |
That, again, is just to keep things simple for us. 00:25:42.460 |
But you do need to do that for LLMs that don't support parallel function calling. 00:25:46.620 |
And yeah, I mean, we're just restructuring that. 00:26:01.340 |
And then here we can see this is our formatted version. 00:26:10.620 |
So we actually need to go through and process everything, like all of our messages. 00:26:16.860 |
So that's the user message and also the assistant message, which is here. 00:26:23.980 |
And this is before processing and this is after processing. 00:26:33.100 |
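A sketch of that per-record restructuring, reusing the converter from above. The message layout follows the standard OpenAI chat format, which is what the customizer expects, but verify field names against the docs for your version:

```python
import json

def xlam_record_to_openai(record: dict) -> dict | None:
    answers = json.loads(record["answers"])
    # Skip records with parallel calls; we train on single calls only.
    if len(answers) != 1:
        return None
    call = answers[0]
    return {
        "messages": [
            {"role": "user", "content": record["query"]},
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [{
                    "type": "function",
                    "function": {
                        "name": call["name"],
                        # OpenAI represents arguments as a JSON string.
                        "arguments": json.dumps(call["arguments"]),
                    },
                }],
            },
        ],
        "tools": [xlam_tool_to_openai(t) for t in json.loads(record["tools"])],
    }
```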
So now we can go through and do that for the full data set. 00:26:43.180 |
And then we can see, okay, this is the first record has been cleaned up. 00:26:49.420 |
And now what we can do is work on separating these out into our train, validation, and test splits. 00:26:55.260 |
So when we're fine-tuning models, we have our training data, which is a 00:27:00.060 |
segmented part of the data that we are actually showing to the LLM during fine-tuning. 00:27:07.260 |
Then we have a small slice of that data set, which is left for validation. 00:27:12.140 |
So at the end of every training epoch, we're going to run 00:27:16.540 |
the current version of the LLM, after it's been trained for an epoch, on that validation data. 00:27:21.660 |
And we just report that performance back to ourselves. 00:27:27.660 |
Then there is the test split. We're actually not necessarily going to use that here, but if you're evaluating, 00:27:33.500 |
you would reserve that test split for the evaluation step, which comes after training. 00:27:38.700 |
You're going to basically test again on this test dataset, 00:27:43.260 |
and that will inform you as to your almost-final performance for your model. 00:27:49.340 |
So to do all this, we first are going to shuffle our data, and then I'm going to split our data into those three sets. 00:27:58.540 |
We do 70% train data, followed by 15% and 15% for the validation and test data, 00:28:06.620 |
which you can see here in the actual number of records, yeah. 00:28:13.340 |
If you are going to run that through evaluation, there's a slightly different format for that. 00:28:16.540 |
So you would format it a little bit differently. 00:28:19.660 |
And finally, we're just going to save all those. 00:28:22.540 |
So our training file, validation file and test file. 00:28:27.420 |
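Putting the preparation together, a minimal sketch of the shuffle, the 70/15/15 split, and the save. The file names are what we'll reference when uploading; adjust as you like:

```python
import json
import random

random.seed(42)  # reproducible shuffle

# Convert every record, dropping the parallel-call ones (None).
records = [r for r in (xlam_record_to_openai(rec) for rec in xlam) if r]
random.shuffle(records)

n = len(records)
train = records[: int(0.7 * n)]
validation = records[int(0.7 * n): int(0.85 * n)]
test = records[int(0.85 * n):]

# The customizer consumes JSON Lines files.
for name, split in [("training", train), ("validation", validation), ("test", test)]:
    with open(f"{name}.jsonl", "w") as f:
        for rec in split:
            f.write(json.dumps(rec) + "\n")
```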
And we're going to be uploading those into our data store and registering them now. 00:28:33.260 |
So, I mean, that's already a lot, I know, but that is the data preparation stuff out of the way. 00:28:40.540 |
So we have so far outlined what we're going to do with all the various components for our fine-tuning pipeline. 00:28:48.620 |
We have then deployed our NeMo microservices, and then we have prepared our dataset for training. 00:28:58.300 |
So we can jump into actually using the microservices now. 00:29:03.980 |
In this pipeline of microservices, what we first need to do is, of course, set up our training data. 00:29:14.620 |
That just means taking our training data, which we've just saved locally right now, 00:29:19.660 |
and giving it to our NeMo microservices. 00:29:25.580 |
And basically, we're going to put it in the data store. 00:29:38.300 |
So, the first thing we're going to do, okay, because we're going to need to reference all these IP addresses, 00:29:43.980 |
and we're going to be hitting the various APIs throughout this notebook, 00:29:47.260 |
is to take a look at our deployment and see, okay, what are the IP addresses for each of our services? 00:29:56.940 |
We have the customizer, the data store, and so on. 00:30:01.260 |
So these four, and also our NIM proxy, we're going to be using. 00:30:04.860 |
So we need to make sure that we pull these IPs in. 00:30:08.460 |
So, for the customizer here, and the rest, we're going to pull them in down here. 00:30:14.060 |
Now, the IP addresses for you, when you're running this, will be different to what you see here. 00:30:18.620 |
So it's important that you do copy those across. 00:30:24.700 |
And also, you do need to have http at the start there. 00:30:28.860 |
So the only thing you need to be changing is the IP addresses in the middle there. 00:30:36.220 |
And the other thing, okay, this is less important, but you can modify it if you want: the dataset name. 00:30:43.420 |
So this is what we're going to be using when we put our dataset in the data store and reference it later. 00:30:48.620 |
So you can modify that if you want to call it something else. 00:30:54.460 |
Now, the first thing we need to do, for both our entity store and the data store, is create our namespace. 00:31:03.420 |
Now, the namespace for both of these is going to be equivalent to the namespace that we've set up here. 00:31:15.020 |
Now, the first time you run this, you should get two 200 responses. 00:31:18.700 |
The reason I'm getting 409s here is because I've already run this. 00:31:33.340 |
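Roughly, the two requests look like this. The endpoint paths follow the NeMo microservices tutorials; treat them as assumptions and check the API reference for your version:

```python
import requests

NAMESPACE = "xlam-tutorial"                          # illustrative name
ENTITY_STORE_URL = "http://<entity-store-ip>:8000"   # IPs from your deployment
NDS_URL = "http://<data-store-ip>:3000"

# Entity store namespace.
requests.post(f"{ENTITY_STORE_URL}/v1/namespaces", json={"id": NAMESPACE})
# Data store namespace.
requests.post(f"{NDS_URL}/v1/datastore/namespaces", data={"namespace": NAMESPACE})
```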
Now we are going to upload the data to our microservices. 00:31:38.380 |
And for that, we're using the Hugging Face API client. 00:31:41.980 |
Now, the reason that we're using this is that the Hugging Face API client is just very good at data 00:31:50.060 |
processing and fast transfer of data between various places. 00:31:55.580 |
But we're not actually uploading to Hugging Face here. 00:32:01.900 |
This is going to go directly to our data store. 00:32:05.020 |
And the data store has this Hugging Face endpoint, which is kind of like what we do with OpenAI compatibility: 00:32:17.820 |
they've made it Hugging Face API compatible as well. 00:32:26.460 |
You can see that the repo ID is just the namespace followed by the dataset name. 00:32:31.420 |
That is similar to if we go over to Hugging Face: 00:32:40.060 |
there is the namespace, which in this case is Salesforce, followed by the dataset name, which is the xLAM function-calling 60k. 00:32:48.860 |
It's the same thing, but just locally, within our microservices cluster. 00:32:55.100 |
Okay, now we can go ahead and create our repo. 00:32:59.260 |
You will find that once you've run this once and you run it again, it will not recreate the repo, because it already exists. 00:33:12.460 |
So if I run it again, it's not going to show me anything. 00:33:18.860 |
And now what we need to do is upload our training, validation, and test datasets. 00:33:22.620 |
And we do that with this Hugging Face API upload file method, 00:33:25.980 |
where we're just pointing it to each one of our data files. 00:33:28.380 |
Okay, so that's the training, validation, and test data. 00:33:38.620 |
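In sketch form, using huggingface_hub's standard client pointed at the data store; the /v1/hf endpoint path is from the NeMo tutorials, so verify it for your version:

```python
from huggingface_hub import HfApi

DATASET_NAME = "xlam-ft-dataset"          # whatever you chose earlier
repo_id = f"{NAMESPACE}/{DATASET_NAME}"   # namespace/dataset-name, as on the Hub

# Same client, different endpoint: the data store speaks the Hugging Face Hub API.
hf_api = HfApi(endpoint=f"{NDS_URL}/v1/hf", token="")

hf_api.create_repo(repo_id, repo_type="dataset", exist_ok=True)
for fname in ["training.jsonl", "validation.jsonl", "test.jsonl"]:
    hf_api.upload_file(
        path_or_fileobj=fname,
        path_in_repo=fname,
        repo_id=repo_id,
        repo_type="dataset",
    )
```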
And now what we can do is register the dataset that we just created with our NeMo entity store. 00:33:45.180 |
So all we're going to do here is say, okay, this is the URL to the files that we just created. 00:33:50.780 |
And so it's at the Hugging Face-style datasets path, 00:33:54.140 |
and then we have the namespace and dataset name again. 00:33:57.340 |
So all we're doing is just posting that to the entity store. 00:34:01.020 |
Now the entity store knows that we have this dataset. 00:34:05.260 |
And we can just confirm that it has been registered correctly. 00:34:17.900 |
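The registration call is a single POST, something like this; the exact payload shape is an assumption based on the tutorials:

```python
import requests

resp = requests.post(
    f"{ENTITY_STORE_URL}/v1/datasets",
    json={
        "name": DATASET_NAME,
        "namespace": NAMESPACE,
        "description": "xLAM single-call subset in OpenAI chat format",
        # hf:// URL pointing at the files we pushed to the data store.
        "files_url": f"hf://datasets/{NAMESPACE}/{DATASET_NAME}",
    },
)
resp.raise_for_status()
print(resp.json())
```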
So, training: although super exciting, there's actually not a lot to it nowadays. 00:34:28.700 |
We can get some nice charts and everything here, though. 00:34:31.020 |
So I'll just explain and go through, okay, how we can check in on the progress of our customization, 00:34:39.020 |
our training, how we can check in on the loss charts, and so on. 00:34:45.420 |
So the first thing we want to do is actually check what models, or base models, we can fine-tune from. 00:34:52.780 |
So we run this get customization configs request, and I can see, okay, we have this model. 00:34:59.820 |
Now, the reason I can see this is because earlier on, when we were deploying everything, for the customizer 00:35:05.580 |
within the values.yaml, I specified that I want this model to be available. 00:35:16.460 |
I think by default, there is a default model, which is actually this one. 00:35:21.580 |
So this is the default model that the customizer will have access to. 00:35:26.940 |
And then if we scroll down a little bit, we can see the model that I've defined as wanting 00:35:32.300 |
to have access to, which is this one here, the Llama 3.2 1B Instruct. 00:35:37.980 |
So this is the one we have set in our values.yaml. 00:35:43.500 |
The other is the model that is by default accessible by the customizer, because 00:35:52.060 |
it's set already in the pre-written values.yaml that we later override. 00:36:00.540 |
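That check is just a GET against the customizer. A sketch, with the caveat that the response shape may differ by version:

```python
import requests

CUSTOMIZER_URL = "http://<customizer-ip>:8000"

configs = requests.get(f"{CUSTOMIZER_URL}/v1/customization/configs").json()
print(configs)  # should include meta/llama-3.2-1b-instruct from our values.yaml
```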
So we have that and we can jump into actually training. 00:36:05.740 |
Now, one thing, and I would really recommend you do this, is you can get a Weights 00:36:12.460 |
and Biases API key here, and it just makes checking in on the progress of your model much easier. 00:36:18.940 |
So I would really recommend doing that. To get this, you need to go to wandb.ai and open this. 00:36:30.460 |
I think they come with a free trial period, and they might still do a sort of free 00:36:44.060 |
tier. So once you have that, come back over here, run this cell, and enter your API key. 00:36:57.100 |
So you can see there are all the various parameters that we set in there for the training job. 00:37:07.500 |
If you are running into some bugs, I've mentioned some references here to deal with them. 00:37:16.860 |
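Kicking off the LoRA job itself is one POST to the customizer. A sketch with illustrative hyperparameters; the field names follow the NeMo Customizer tutorials, so verify against the API reference for your version:

```python
import requests

resp = requests.post(
    f"{CUSTOMIZER_URL}/v1/customization/jobs",
    json={
        "config": "meta/llama-3.2-1b-instruct",  # the base model we exposed
        "dataset": {"name": DATASET_NAME, "namespace": NAMESPACE},
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "lora",
            "epochs": 2,
            "batch_size": 16,
            "learning_rate": 0.0001,
            "lora": {"adapter_dim": 32, "adapter_dropout": 0.1},
        },
    },
)
job = resp.json()
job_id = job["id"]
custom_model = job["output_model"]  # e.g. "<namespace>/<name>@cust-..."
```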
So now what we can do is go get our customization job ID, and then we can send a request to this status endpoint. 00:37:28.300 |
And we should see up here that it will probably be running. 00:37:33.660 |
Now, we can also see there's this big list of events that are happening. 00:37:45.740 |
It got access to some resources, and it started running. 00:37:54.620 |
So if we look for any pods that begin with cust, that is the 00:38:01.420 |
customizer; I believe you'll be able to see these, and we can see, okay, that the 00:38:06.780 |
first one here is already completed, and it only took 74 seconds. 00:38:11.100 |
How incredibly fast is that for a training job? Well, that one isn't the actual training; it's, 00:38:20.060 |
I believe, pulling in the details, and then it triggers the training job once it's complete. 00:38:31.660 |
You can take that pod name, and we can view the logs for that pod here. 00:38:36.460 |
So going in here, you'll see this right now. 00:38:41.420 |
After a little while, you'll start seeing training updates: 00:38:44.860 |
you know, what step is it on, and so on. But most useful, which is why I said, you know, 00:38:51.100 |
get your Weights and Biases API key, most useful is actually going over to your Weights and Biases dashboard. 00:38:57.180 |
What you should find is that NeMo will automatically create this NVIDIA NeMo 00:39:03.260 |
Customizer project within that Weights and Biases account for you. 00:39:08.060 |
And then we're going to see, well, these are a couple of past jobs I've already run showing for me here. 00:39:16.380 |
So you're going to be able to check in on your validation loss and training loss. 00:39:25.020 |
Now, I think if I look at this, the job that I've just kicked off hasn't quite started yet. 00:39:36.620 |
So this top one here, I can remove all the rest. 00:39:39.980 |
I can't really see anything because it has literally just started. 00:39:44.460 |
But once we start getting metrics coming through, I will be able to see, you know, 00:39:50.380 |
how things are going, and I'll be able to check back in every now and again to see how 00:39:55.340 |
my loss is coming along, how far along we are, and so on. 00:40:07.500 |
If you just look at this, right, I'll remove that one: 00:40:13.420 |
my validation loss here is, yeah, it's lower. 00:40:16.060 |
This one here, this yellow run, scored best for validation loss. 00:40:23.500 |
And what I can do is say, okay, I've got the run ID here. 00:40:30.940 |
So I can take that and actually use it when I'm deciding which model to use later. 00:40:37.660 |
So this ID here we can use to run the various models, which is pretty helpful. 00:40:44.460 |
Now, I actually don't want to run another training run because this can take like 45, 50 minutes. 00:40:51.340 |
So what I'm going to do is cancel this training run. 00:40:54.620 |
Of course, you probably won't want to do that, but I'm going to cancel mine. 00:40:58.540 |
And I don't necessarily know, okay, what endpoint I need to hit to cancel. 00:41:04.300 |
So what I'm going to do is navigate to the NVIDIA NeMo Microservices docs. 00:41:15.580 |
So this is the NeMo Microservices latest API index, 00:41:19.660 |
and I can come in and say, okay, I need the customizer API. 00:41:51.420 |
So, you know, we have all the information here. 00:41:54.060 |
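With the endpoint found in the docs, checking on or cancelling the job looks something like this; the paths are as documented in the customizer API reference, so treat them as assumptions for your version:

```python
import requests

# Poll the job status ("running", "completed", and so on).
status = requests.get(
    f"{CUSTOMIZER_URL}/v1/customization/jobs/{job_id}/status"
).json()
print(status)

# Or cancel the job and free up the GPU.
requests.post(f"{CUSTOMIZER_URL}/v1/customization/jobs/{job_id}/cancel")
```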
And that's great, because I wanted to use that GPU for running inference anyway. 00:42:02.700 |
So once that training job is complete, well, we will first see that in here. 00:42:08.540 |
When we do the GET to the status endpoint, we should see that the status is completed. 00:42:17.260 |
We should also see that the entity store has already registered our model. 00:42:22.140 |
So we should see, this is the latest model here: 00:42:27.260 |
the name that we gave it, followed by an at sign, and then the customizer ID that it was provided. 00:42:34.540 |
And that means that, okay, we have this model, our custom model, in our entity store. 00:42:41.740 |
But if you remember in here, we cannot actually use chat inference on our custom models, 00:42:50.220 |
unless we have a NIM for the base model already deployed. 00:43:00.060 |
What we need to do is come down here, and we're hitting the model deployments component. 00:43:05.500 |
Now, the model deployments component is what decides, okay, I'm going to go and deploy this NIM. 00:43:12.620 |
And we need to tell it, okay, which one do we want to deploy right now? 00:43:23.420 |
But the thing that you do need is, okay, you need the model. 00:43:25.660 |
And this needs to be the base model for your custom models. 00:43:29.580 |
So my custom models were trained off of this model. 00:43:32.620 |
But the thing that you need to be aware of is, okay, the image name. 00:43:42.620 |
So you actually need to go to the NGC catalog again, which we can find at catalog.ngc.nvidia.com. 00:43:51.100 |
You go into here, you go to containers here, and then you can actually filter by NVIDIA NIM, 00:43:59.180 |
And then what you have is all of the NIMs and the LLMs that you can use. 00:44:07.740 |
So I'm going to say that, well, yeah, I want the 3.2 1B model. 00:44:24.620 |
And then what I want to do is say, okay, I want to get this container. 00:44:29.820 |
And you can see that this is the latest tag's image path, right? 00:44:34.220 |
Well, you can take the whole thing, but it will also work with just this. 00:44:43.980 |
We'll see in a moment that that's not actually the latest, but it's fine. 00:44:56.780 |
So go into here, and you can see, yeah, I don't know why 1 is showing as the latest. 00:45:06.140 |
Yeah, the order is kind of messed up here, but it's fine. 00:45:09.740 |
So 1.8.5 is the latest one, as far as I can tell here. 00:45:24.140 |
So, yeah, we run this, and that is going to deploy the NIM. 00:45:30.540 |
And actually, for me, it's already deployed because I've already done that. 00:45:34.380 |
If I want to create another deployment, I can. 00:45:37.260 |
I can keep the model the same if I really want to. 00:45:43.020 |
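The deployment request, sketched out; the body shape follows the NeMo deployment management tutorials, and the image tag is the one we just found in the catalog:

```python
import requests

DEPLOYMENT_URL = "http://<deployment-management-ip>:8000"

resp = requests.post(
    f"{DEPLOYMENT_URL}/v1/deployment/model-deployments",
    json={
        "name": "llama-3.2-1b-instruct",
        "namespace": "meta",
        "config": {
            # Must be the base model our LoRA adapters were trained from.
            "model": "meta/llama-3.2-1b-instruct",
            "nim_deployment": {
                "image_name": "nvcr.io/nim/meta/llama-3.2-1b-instruct",
                "image_tag": "1.8.5",   # latest at the time of recording
                "pvc_size": "25Gi",
                "gpu": 1,
            },
        },
    },
)
resp.raise_for_status()
```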
And then we just come into here and we should see that there is a model deployment job going on. 00:45:57.100 |
There are a few different jobs that say model deployment or model download or whatever, 00:46:04.220 |
but yeah, this is the one that we're looking for. 00:46:07.180 |
That, of course, for me, completed forever ago, well, 16 hours ago. 00:46:18.140 |
We can then just confirm that the model has been picked up now by our NIM endpoint. 00:46:35.580 |
But automatically, it's also pulling in our custom models, right? 00:46:40.620 |
So as soon as our NIM proxy sees that we have the base model NIM for our custom models, 00:46:49.100 |
it's also going to load in all those custom models as well. 00:46:55.260 |
And I'm sure there are probably a few in here, right? 00:46:58.700 |
So we can see, yeah, we have another one here. 00:47:05.500 |
And I think those are all the ones that I've trained within this version of the instance, anyway. 00:47:12.700 |
And now we can actually go ahead and use the model finally. 00:47:15.260 |
So, for using the model, as I mentioned, it's OpenAI-compatible, right? 00:47:23.180 |
So that means we can use it as we would a normal OpenAI model. 00:47:26.780 |
So we're actually just using the OpenAI library here. 00:47:33.900 |
We just change the base URL to point to the NIM URL, the v1 API there. 00:47:39.420 |
And the API key doesn't matter; you know, just put whatever you want in here. 00:47:43.420 |
I think you can put basically anything, but you can put "None" if you want to be cautious. 00:47:49.180 |
Now we can test it on our test data that we set up before. 00:47:54.780 |
If we want, we can test it on whatever we want, to be honest, but we can test on this. 00:48:01.820 |
And this is: what would the diabetes risk be for a lightly active person of such-and-such a weight and height? 00:48:17.260 |
We have this assess diabetes risk tool, which is probably the one that we would use. 00:48:28.060 |
So we can just see everything there. 00:48:33.020 |
Content is none, because when there is content, that is the LLM responding directly to you. 00:48:42.140 |
We can see the tool calls: the chat completion message tool call, the tool call ID, and so on. 00:48:48.300 |
We can see that these are the parameters going into the tool. 00:49:13.180 |
We put in a lightly active person, so activity: lightly active. 00:49:21.900 |
So for the activity here, we can actually see the allowed values here. 00:49:25.100 |
We have sedentary, lightly active, and so on. 00:49:27.500 |
So it's actually putting that in correctly, which is great. 00:49:31.260 |
Then we can also stream as well, because this is, you know, OpenAI API compatible. 00:49:36.140 |
So to stream, we just set stream=True there, and we stream like so. 00:49:44.140 |
And we can see all that coming through as well. 00:49:50.140 |
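Putting the inference side together, a sketch of that usage via the standard OpenAI client; custom_model comes from the customization job earlier, the test list from our split, and the NIM proxy IP from your deployment:

```python
from openai import OpenAI

NIM_URL = "http://<nim-proxy-ip>:8000"
client = OpenAI(base_url=f"{NIM_URL}/v1", api_key="None")  # key is unused here

# Take one held-out record and send just the user turn plus its tools.
sample = test[0]
resp = client.chat.completions.create(
    model=custom_model,
    messages=[m for m in sample["messages"] if m["role"] == "user"],
    tools=sample["tools"],
)
print(resp.choices[0].message.tool_calls)  # content will be None for a tool call

# Streaming works the same way, as with OpenAI proper.
stream = client.chat.completions.create(
    model=custom_model,
    messages=[m for m in sample["messages"] if m["role"] == "user"],
    tools=sample["tools"],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
```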
So, with that, we have deployed our full microservices suite. 00:49:56.060 |
We prepared and uploaded our data, like, put it in the right places for the NeMo microservices, ran our training, and deployed our NIM. 00:50:05.500 |
And then we've just tested it at the end there. 00:50:07.900 |
Of course, in many cases, you probably want to do evaluation and everything else around that. 00:50:16.140 |
But we can already see straight away, like this is a 1 billion parameter model. 00:50:21.180 |
Just from that test, straight away, it's able to do function calling, which is, I think, really, really good. 00:50:29.340 |
And it's not something that you would typically get from such a small 1 billion parameter model. 00:50:36.620 |
And you can test it more, test it with more data, 00:50:40.380 |
and you will see that it is actually able to very competently use function calling, and use it correctly, 00:50:47.660 |
which for a model of its size is really impressive. 00:50:51.660 |
And the reason for that is because we have fine-tuned it on that function calling data set. 00:50:58.380 |
And of course, you know, you don't have to stick to that function-calling dataset. 00:51:03.660 |
If you have your own specific tools and everything that you have within your specific use case or industry, 00:51:11.580 |
whatever it is, you can fine-tune on that data as well and make these tiny LLMs highly competent agents, 00:51:20.860 |
which, in terms of just cost and performance, is, I think, really impressive. 00:51:28.700 |
Yeah, so I really like this whole process that NVIDIA built with the microservices. 00:51:37.980 |
And just the fact that you can do this so quickly and build these models is, in my opinion, really exciting. 00:51:44.700 |
Fine-tuning models, building custom models, 00:51:48.140 |
is something that has really been lost in maybe the past couple of years, with big LLMs coming out. 00:51:56.700 |
It's something that I hope this type of service makes more accessible and just a more common thing to do. 00:52:05.900 |
Because the results that you can get from it are really impressive. 00:52:09.100 |
So, yeah, that is it for this video, this big walkthrough and introduction to the NeMo Microservices and fine-tuning LLMs. 00:52:19.340 |
I hope all this has been useful and interesting, but for now I'll leave it there. 00:52:24.060 |
So, thank you very much for watching and I will see you again in the next one. Bye. 00:52:29.340 |