
No-code fine-tuning: Mark Hennings



00:00:00.000 | Hey, my name is Mark Hennings.
00:00:22.700 | I'm a serial entrepreneur,
00:00:24.020 | and I'm super excited to talk to you about
00:00:26.240 | fine-tuning large language models today without any code.
00:00:29.960 | Let's begin.
00:00:31.520 | For our purposes today,
00:00:33.040 | fine-tuning is training a foundation model for a specialized task.
00:00:37.800 | Some examples of these specialized tasks are writing any kind of copy,
00:00:41.680 | emails, blog articles, product descriptions.
00:00:44.620 | It could be scrubbing fake emails from a list,
00:00:47.200 | extracting or normalizing data,
00:00:49.180 | translating, paraphrasing, rewriting,
00:00:51.660 | qualifying a sales lead,
00:00:53.380 | ranking priority of support issues,
00:00:55.840 | detecting fraud, or flagging inappropriate content.
00:00:59.040 | These are very common tasks that businesses do every day.
00:01:02.600 | And something they have in common is that traditional programming or rule-based approaches
00:01:07.160 | do not work well for them,
00:01:09.120 | but large language models are great at them.
00:01:12.160 | They perform them easily and they can capture the nuance in the text that you're working with.
00:01:17.160 | So why should we fine-tune?
00:01:18.720 | I mean, prompt engineering is great, right?
00:01:19.720 | You can do almost all of these things with a prompt.
00:01:22.720 | Well, I'll tell you, fine-tuning is awesome.
00:01:25.720 | It's faster and cheaper because you can train a lighter model to match the quality of what you were doing with a prompt.
00:01:32.720 | It reduces the size of your prompts, allowing for longer completions.
00:01:36.280 | Training examples allow you to cover edge cases and collaborate better as a team.
00:01:41.280 | And it's naturally resistant to prompt injection attacks.
00:01:46.280 | So let's dive into some of these.
00:01:48.840 | How much faster is it really?
00:01:49.840 | Well, if you take GPT-4 and its response time per token, it's about 196 milliseconds, give or take, from the OpenAI API.
00:01:58.400 | On the same API, GPT-3.5 is 73 milliseconds.
00:02:03.400 | That's roughly three times faster.
00:02:05.400 | How much cheaper is it?
00:02:08.560 | Well, taking an example with GPT-4 versus a fine-tuned GPT-3.5, you can save 88.6%.
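For a rough sense of where a number like that comes from, here is a back-of-the-envelope comparison in Python. The per-token prices are assumptions for illustration only; the actual savings depend on current pricing and on how much shorter your prompts get.

    # Back-of-the-envelope cost comparison (prices are assumed for illustration).
    GPT4_INPUT, GPT4_OUTPUT = 0.03, 0.06    # $ per 1K tokens (assumed)
    FT35_INPUT, FT35_OUTPUT = 0.003, 0.006  # $ per 1K tokens, fine-tuned GPT-3.5 (assumed)

    prompt_tokens, completion_tokens = 1000, 500  # hypothetical request size

    gpt4_cost = prompt_tokens / 1000 * GPT4_INPUT + completion_tokens / 1000 * GPT4_OUTPUT
    ft35_cost = prompt_tokens / 1000 * FT35_INPUT + completion_tokens / 1000 * FT35_OUTPUT

    print(f"Savings: {1 - ft35_cost / gpt4_cost:.1%}")  # ~90% with these assumed prices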
00:02:16.960 | Well, then how much shorter do the prompts actually get?
00:02:19.560 | Well, I'll give you one example because it's going to vary depending on your prompt.
00:02:23.200 | But here's what a typical engineered prompt might look like.
00:02:26.360 | It has some instructions saying to, you know, write a blog post on this topic, how to write it,
00:02:33.040 | what tone to use, what to do, what not to do.
00:02:35.880 | Well, a fine-tuned model learns how we write.
00:02:39.400 | So we don't need all of those instructions.
00:02:41.200 | It learns from our training examples.
00:02:43.600 | So we're just giving it the one thing that's unique about this prompt versus another prompt,
00:02:48.400 | which is the topic that we want to write on.
00:02:50.400 | And in this very conservative example, it's 90% shorter.
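To make that concrete, here is a hypothetical before-and-after. Both prompts are made up for illustration; the exact wording and the savings will depend on your use case.

    # Hypothetical engineered prompt vs. fine-tuned prompt (both made up for illustration).
    engineered_prompt = (
        "You are an expert content writer. Write a blog post on the topic below. "
        "Use a friendly but professional tone, short paragraphs, and a clear call to action. "
        "Avoid jargon, keep it under 800 words, and do not mention competitors.\n\n"
        "Topic: fine-tuning large language models without code"
    )

    # With a fine-tuned model, the instructions live in the training examples,
    # so the prompt only needs what is unique to this request: the topic.
    fine_tuned_prompt = "Topic: fine-tuning large language models without code"

    print(f"{1 - len(fine_tuned_prompt) / len(engineered_prompt):.0%} shorter")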
00:02:56.280 | Now let's talk about collaborating as a team, right?
00:02:59.680 | Because none of us work in a vacuum.
00:03:01.600 | We work with other people.
00:03:03.440 | Imagine a GitHub repo.
00:03:04.440 | You have one file.
00:03:06.160 | Your whole code base is just one file.
00:03:07.960 | That's like your epic prompt.
00:03:09.400 | Well, with fine-tuning, now you can have multiple files like we're used to, where developers can
00:03:15.600 | work on this section of code or that section of code.
00:03:17.440 | But we're not talking about code.
00:03:19.440 | We're talking about training examples.
00:03:21.240 | So your training data is this layer that your team can work on and add to and edit and improve.
00:03:28.480 | And then that feeds into the fine-tuned model.
00:03:30.600 | So the main point is if you can get equal or better output, why wouldn't you fine-tune a model?
00:03:36.800 | Now, fine tuning is kind of a dev job right now.
00:03:41.640 | Okay?
00:03:42.640 | Let's be real.
00:03:43.160 | If you go online and you look up how to do fine tuning, you can find articles that talk about how to spin up GPU servers for training and inference.
00:03:50.360 | And you got to format your data with these ad hoc Python scripts and configure these parameters and then make API calls.
00:03:56.920 | It just looks like a dev job, but if you really break it down, why can't we just automate all of that with a user interface?
00:04:05.600 | Is that possible?
00:04:07.120 | It is possible.
00:04:10.400 | And the bar is lower than most people think to get started doing this.
00:04:13.640 | If you can get 20 examples of what you want your fine-tuned model to do, you can fine-tune a model.
00:04:19.400 | This is not traditional machine learning, where you need thousands of examples to get started
00:04:24.160 | and the data set is this impossible barrier to get past.
00:04:28.280 | No, this is something you could handwrite if you want to.
00:04:32.280 | One way to think about this is as an extension to few shot learning.
00:04:35.720 | Let's say you can have five examples of what you want a model to do in your prompt.
00:04:40.840 | Well, with fine tuning, your training example data set can be as long as you want.
00:04:45.240 | So instead of five examples, you can now have 20 or 100.
00:04:48.440 | So it seems intuitive that with more examples, the model would be able to get closer to what we want it to do.
00:04:54.920 | So here's what I propose for a dev lifecycle for large language models.
00:04:59.960 | We start with prompt engineering.
00:05:02.040 | Prompt engineering is a powerful tool.
00:05:04.120 | It allows us to create a prototype to validate the concept.
00:05:07.800 | And we can also use it to create our initial data sets for fine tuning.
00:05:12.160 | Once we have those data sets, we should fine-tune a model and evaluate it to make sure that it actually is better than the prompt-engineered version.
00:05:20.680 | And then we can test which models we can get to perform at the same level.
00:05:25.480 | Then the fine-tuned model can go into production, and from production, we can capture feedback from our users and log the examples.
00:05:31.840 | And with those examples, we can continuously improve our fine-tuned model, because now all of a sudden we have real examples that we can add back into our data set.
00:05:42.040 | So in terms of roles, I think that there's a huge opportunity for people to get into prompt engineering and fine tuning who are not developers.
00:05:49.840 | Yes, if you're a developer, you can fine-tune.
00:05:52.360 | Absolutely.
00:05:53.040 | But you shouldn't have to be the only person who can fine-tune.
00:05:56.520 | I'm a co-founder at Entry Point, and we have built the modern tooling to make this easy.
00:06:01.680 | Let's take a look at how it works.
00:06:03.160 | Here we are on the dashboard and I'm going to open the press release writer project.
00:06:07.360 | Let's take a look at my 20 examples.
00:06:11.120 | The way I created these 20 examples for a press release generator was I went online and I found 20 press releases that looked really good.
00:06:19.960 | They came from blog articles about the best press releases that you can write.
00:06:24.080 | However, I didn't have input data, so my data set was incomplete.
00:06:28.240 | But I used ChatGPT with GPT-4 to take the press release and then write a list of facts that would be needed for a professional writer to actually write such a press release.
00:06:39.520 | You know, large language models aren't great at facts.
00:06:42.960 | So providing the facts as the input makes sense to me: I want to give it a list of facts and then have it write something really polished that would be a good first draft of a press release.
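That reverse-engineering step might look something like this. The prompt wording and the use of the OpenAI Python SDK are illustrative assumptions, not the exact setup used to build these examples.

    # Sketch: reverse-engineer a "facts" input from an existing press release.
    # Prompt wording and SDK usage are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def extract_facts(press_release: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": "List the facts a professional writer would need in order to "
                           "write the following press release:\n\n" + press_release,
            }],
        )
        return response.choices[0].message.content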
00:06:52.240 | With this user interface, I have a lot of visibility into the data that I'm actually putting into my fine-tuned model,
00:06:58.040 | which I think is really important. The way this works is that we have a structured data approach.
00:07:03.440 | So when you import a CSV into Entry Point, each column becomes a field.
00:07:09.080 | Here I have the facts and here I have the press release, and these fields can be used in a template,
00:07:14.560 | just like when you're writing a mass email and you want to insert somebody's first name or personalize the emails with information about a contact record.
00:07:22.160 | You can reference these fields with the Handlebars templating language, which provides a really intuitive way to format your input and output. And with GPT-3.5 Turbo, when you fine-tune it, you can actually use the system prompt,
00:07:39.360 | which is where you can include instructions as well. That creates this really interesting hybrid between prompt engineering and fine-tuning, where you can have a small data set for fine-tuning but also give the model some instructions to help.
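As a rough idea of what such templates might look like with this project's fields, here is an illustrative sketch; the exact syntax in the Entry Point UI may differ.

    # Illustrative Handlebars-style templates referencing the CSV fields;
    # the exact syntax in the Entry Point UI may differ.
    SYSTEM_PROMPT = "You write polished press releases from a list of facts."
    PROMPT_TEMPLATE = "Facts:\n{{facts}}"
    COMPLETION_TEMPLATE = "{{press_release}}"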
00:07:47.960 | Once we have a data set like this, we can go to our fine-tunes and press the add button.
00:07:56.360 | Select the model and the platform, because this is cross-platform, and then we count your tokens and estimate your cost for you.
00:08:04.160 | This one is going to be a dollar, so hold on tight, press start, and that will get it started.
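For anyone curious how an estimate like that can be computed, here is a rough sketch with the tiktoken library. The training price per 1K tokens and the epoch count are assumptions, not Entry Point's actual numbers.

    # Sketch: estimate fine-tuning cost from a chat-format JSONL training file.
    # The price and epoch count are assumptions, not Entry Point's actual numbers.
    import json
    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

    total_tokens = 0
    with open("train.jsonl") as f:
        for line in f:
            for message in json.loads(line)["messages"]:
                total_tokens += len(enc.encode(message["content"]))

    price_per_1k = 0.008  # assumed training price, $ per 1K tokens
    epochs = 3            # assumed number of training epochs
    print(f"~{total_tokens} tokens, estimated ${total_tokens / 1000 * price_per_1k * epochs:.2f}")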
00:08:12.160 | But I have some here that are already trained, so let's go into one and use the Entry Point playground
00:08:17.960 | and see if we can actually generate a press release with our fine-tuned model.
00:08:21.040 | The list of facts here I actually wrote about the AI Engineer Summit,
00:08:26.080 | and we'll see if we can make a press release for the AI Engineer Summit.
00:08:30.960 | Let's go. All right, so this fine-tuned model created a title here and made it look like a press release.
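Under the hood, generating a completion from a fine-tuned GPT-3.5 Turbo model outside the playground might look roughly like this. The model ID and the list of facts below are made-up placeholders, not the actual ones from the demo.

    # Sketch: call a fine-tuned model directly via the OpenAI Python SDK.
    # The model ID and facts below are made-up placeholders.
    from openai import OpenAI

    client = OpenAI()

    facts = (
        "Event: AI Engineer Summit\n"
        "Audience: engineers building products with large language models\n"
        "Format: talks and workshops"
    )

    response = client.chat.completions.create(
        model="ft:gpt-3.5-turbo:my-org:press-release:abc123",  # placeholder model ID
        messages=[
            {"role": "system", "content": "You write polished press releases from a list of facts."},
            {"role": "user", "content": facts},
        ],
    )
    print(response.choices[0].message.content)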
00:08:39.760 | What I found to be a really cool workflow is to actually create a list of facts and then generate an article, read the article and then get ideas from it and go back to my list of facts and refine those.
00:08:50.160 | And then that actually becomes an iterative process to get really cool results.
00:08:54.560 | So I really enjoy fine-tuning.
00:08:56.280 | It takes a lot of the boilerplate out of the prompt and you can just focus on what's
00:08:59.840 | important for the results you want and the rest is taken care of by your training data.
00:09:04.360 | Entry Point has a lot of other cool features, like data synthesis and tools to compare the performance of your fine-tuned models.
00:09:10.960 | Unfortunately, we don't have time to go into all of that today, but I hope you will check it out.
00:09:15.240 | It's entrypointai.com and it was a pleasure speaking to you.
00:09:18.440 | I'll see you next time.