Pydantic is STILL all you need: Jason Liu

00:00:00.000 | Eliana Khanna: So, the context from last year's talk was "Pydantic All You Need."

00:00:15.800 | It was a very popular talk, you know, it kind of like kicked off my Twitter career.

00:00:20.160 | And today, I'm coming back a year later to basically say the same thing again: "Pydantic

00:00:26.200 | is still all you need."

00:00:27.680 | And really, my goal is to share with you sort of what I've learned for the past year.

00:00:31.360 | And the problem has always been the fact that if I had hired an intern to write an API for

00:00:35.980 | me and that API returns a string that I have to JSON loads into a dictionary and then just

00:00:40.600 | pray that the data was still there to begin with, I would be pretty pissed off.

00:00:44.500 | I would probably just fire them, replace them with Devon, and just prompt it to use FastAPI

00:00:49.000 | and Pydantic.

00:00:51.160 | Because I'm really tired of writing code like this, right, and this is the kind of code that

00:00:54.200 | we wrote when we had to work with things like chat GPT-3 and stuff like that.

00:00:59.380 | But there's a lot of good tools that we have in the Python ecosystem.

00:01:02.500 | And the ecosystem in all of these languages, whether it's Ecto and Elixir, Active Record,

00:01:06.680 | or anything like that, that can make our lives much, much easier.

00:01:09.880 | And so the problem is that by not having schemas and structured responses, we tend to lose

00:01:15.620 | compatibility, composability, and reliability when we build tools and write code that interact

00:01:21.300 | with external systems.

00:01:23.480 | But it seems that we're very happy with using LLMs for the exact same reason.

00:01:28.060 | And so last year, we mostly talked about how Pydantic and function calling was a great alternative

00:01:32.740 | for how we can use structured output to do a lot of additional benefits, right?

00:01:37.240 | We are able to have nested objects and nested models for modular structures.

00:01:41.400 | And then we can also use validators to improve the reliability of the systems that we build.

00:01:45.240 | And I'll talk about some of these examples.

00:01:47.240 | And so it's been about a year and a half.

00:01:50.060 | I think the big question is, what's new in Pydantic?

00:01:53.420 | What's new in the library?

00:01:54.740 | And the answer is basically nothing.

00:01:56.760 | I'm basically coming back to say that I was right, and it feels really, really good.

00:02:01.660 | It's still PIP install instructor, right?

00:02:04.000 | And since then, we've released 1.0.

00:02:07.220 | We've launched in five languages, Python, TypeScript, Ruby, Go, Elixir.

00:02:12.740 | We just built out a version in Rust.

00:02:14.800 | And again, it's mostly because this is just the exact 600 lines of code that you do not

00:02:19.020 | want to write.

00:02:20.380 | And at least in the Python library, we've seen 40% growth month over month.

00:02:24.660 | And we've only had about 2% of the coverage of the OpenAI download.

00:02:28.580 | So there's still tons of room to grow in terms of how we can make these APIs a lot more ergonomic.

00:02:35.180 | And so you saw 1.0, you might be going, Jason, what did we break in the API?

00:02:39.980 | I renamed a method, and now we support things like OLAMA and LAMACVP, along with a bunch of

00:02:46.100 | other APIs.

00:02:47.100 | So we support things like Anthropic, Cohered, Gemini, Grok, everything that you need.

00:02:52.280 | And as long as language models support more function calling capabilities, this API will

00:02:56.360 | pretty much stay standard.

00:02:58.700 | And if you haven't seen the talk last year, the general API looks like this.

00:03:01.940 | You define a Pydantic object.

00:03:04.320 | You can then, you know, patch the OpenAI client or any client that you want.

00:03:08.180 | And all you have to do is you've got to pass in response model equals user.

00:03:11.920 | Right?

00:03:12.920 | This is basically it.

00:03:13.920 | This is very similar to how fast API works.

00:03:16.860 | And you know, it took a little bit of hackiness, but now we can also leverage some of the new

00:03:20.540 | Python tooling to also infer the return type.

00:03:23.460 | And so here, because response model is a user, the object is inferred as a user object.

00:03:28.000 | You get nice red squiggly lines if you've messed up your code.

00:03:31.660 | The same thing happens when you want to create an iterable.

00:03:34.480 | Here you see that I have a single response model as a user, but I want to extract two objects.

00:03:39.280 | And as long as you set stream equals true, you're going to get each object as they return.

00:03:44.000 | And this is kind of a way of using streaming to improve the latency while having a little

00:03:47.840 | bit more structured output.

00:03:50.400 | We also have partials, right?

00:03:51.780 | The difference here is that instead of just returning a partially correct or validated JSON

00:03:56.720 | stack code, we can validate the entire object.

00:03:58.960 | And this means that if you have things like generative UI that use a structure, you can

00:04:02.540 | render that while streaming without having to write this very evil JSON stack code to figure

00:04:08.140 | out how to render this in real time.

00:04:11.420 | And so, yeah, nothing's really changed.

00:04:13.220 | You have one noun, which is the client, and you have three verbs.

00:04:15.960 | You can create, create with iterable, and create with partial based on whether or not you

00:04:19.460 | want to use streaming.

00:04:20.800 | And everything else you think about is going to be around the response model, the validator

00:04:24.840 | features that you have to build, and the messages array that you pass in.

00:04:28.080 | So if OpenAI supports some new, weird API call, as long as it fits with the messages, there's

00:04:33.020 | not going to be any breaking code.

00:04:36.040 | And that's why I think Pydantic is still all you need.

00:04:38.080 | And so the rest of this talk is basically going to be about two, really three things.

00:04:43.320 | I'm going to cover some examples of generation, in particular, around RAG and extraction.

00:04:49.000 | Then I'm just going to cover what we learned this year, and it's really not that much, right?

00:04:53.320 | Validation errors are very important, and usually they can fix any errors that we have.

00:04:58.320 | Not all language models can really support retry logic right now.

00:05:01.260 | I think that's something we're going to work towards.

00:05:03.060 | And ultimately, whether you use vision, or text, or RAG, or agents, they all benefit from

00:05:08.020 | structured outputs, right?

00:05:09.320 | Because the real idea here is we're going to be programming with data structures, which

00:05:12.560 | is something everyone knows how to do, rather than trying to, like, beg and pray to the LLM

00:05:16.720 | gods.

00:05:17.840 | And really, again, the theme of this talk is the fact that nothing really has changed.

00:05:21.300 | The language did not change.

00:05:22.760 | All we learned to do is relearn how to program.

00:05:27.060 | And so the first concept that I think many people might not have seen in Pydantic is the

00:05:31.020 | validators, right?

00:05:32.020 | Here, you can define a validator on any kind of attribute, and add additional logic that

00:05:36.080 | tells you what correct looks like.

00:05:38.460 | And so you see, in my prompt, I don't really ask the language model to uppercase all the names,

00:05:42.940 | but I can actually write Python code to verify that something is correct and throw an error

00:05:47.000 | message.

00:05:48.000 | And if I want to, I can turn retrying on, and that error message is caught by the language model,

00:05:52.720 | and then you use to correct the outputs.

00:05:54.320 | And so in this example, it is the error message that is part of the prompt, but conditionally

00:05:58.400 | added to the language model.

00:06:00.520 | And as you can see, after one API call, JSON is now all caps.

00:06:05.220 | Pretty nice.

00:06:06.420 | We can also do model-level validation.

00:06:09.140 | This is a very simple example that you might see, something like RAMP, where you're processing

00:06:13.780 | receipts.

00:06:14.780 | You might want to use a vision language model to extract the receipt data.

00:06:18.820 | There's a total cost, and the products is a list of products.

00:06:21.500 | And the validator does something a little bit more interesting.

00:06:23.100 | It says, "Make sure that the price and the quantity add up to the total cost."

00:06:27.180 | Right?

00:06:28.180 | Again, this basically doesn't really happen for 99% of the cases, but when it does happen,

00:06:33.320 | you see a red bar in Datadog, and that's really what I care about.

00:06:36.100 | And if I want to ask re-asking, I want to make sure that, again, everything is done correctly.

00:06:41.180 | So let's jump into generation, right?

00:06:42.780 | Why should I use structured outputs?

00:06:44.860 | Well, it turns out if you don't use structured outputs, the structure you get is just response

00:06:49.860 | as a content string.

00:06:50.860 | Right?

00:06:51.860 | You still get an object back out, but you're just hoping that you don't have to call JSON

00:06:55.400 | loads yourself and, you know, eat whatever cost you have in terms of parsing.

00:06:59.860 | And so a really simple example of a RAG application is not only having a content, but having a

00:07:05.840 | list of follow-up questions, right?

00:07:07.600 | The follow-up questions can be informed by the existing context, but now you're going to

00:07:11.460 | let the user show, like, "Hey, there's other questions that you can answer based on the context

00:07:15.180 | that I've put in the prompt."

00:07:16.460 | A really funny example that I've actually done in production quite a bit is just making

00:07:20.700 | sure that the links we return are valid.

00:07:23.100 | And so here, I have a very simple validator.

00:07:26.120 | I just have a regular expression, parse all the URLs, and I use post to figure out if the

00:07:30.660 | URL returns a 200.

00:07:32.380 | And now I can make sure very easily that no URLs are, you know, hallucinated.

00:07:37.380 | And in my instructions, I just say, "Well, if it's not real, just throw it out next time."

00:07:41.700 | Right?

00:07:42.700 | Don't try too hard.

00:07:43.700 | The same thing happens with retrieval augmented generation.

00:07:48.180 | We all kind of know at this point that embeddings won't really solve all the problems you have

00:07:51.680 | in search, right?

00:07:52.820 | For example, if I ask a question like, "What is the latest news from Z?"

00:07:57.100 | Like, latest news isn't something that embeddings can capture, right?

00:08:01.320 | The source of that, maybe that is relevant if you use BM25, but really there might be separate

00:08:06.020 | indices that we want to query.

00:08:07.980 | And we can do something very simple in the structured output world that makes this very reasonable,

00:08:12.460 | right?

00:08:13.460 | To find a search object, I say it has a query, a start date, an end date that is optional.

00:08:17.940 | Maybe there's a limit in case I want to see the top five results.

00:08:21.440 | And then a source that allows the language model to choose which backend I want to hit.

00:08:26.740 | And then, you know, how you actually search the endpoint is kind of an implementation detail

00:08:30.000 | that we don't care about.

00:08:31.200 | And now you just define a very simple function, you know, create search.

00:08:34.400 | It takes in a string, returns the object.

00:08:36.700 | And even the API call itself now is an implementation detail, right?

00:08:40.060 | As long as I get the search query out and it's correct, I can do a lot more.

00:08:43.680 | And in particular, like even the validations themselves, you know, I can figure out whether

00:08:48.180 | the date ranges are zero days, one day, and figure out even distributions based on the structured

00:08:52.760 | output.

00:08:53.760 | Then if I ask the question, like, what is the difference between X and Y, I can just

00:08:58.240 | turn on iterable mode.

00:08:59.240 | Now, if I ask this question, I'm going to have a search query for Y, a search query for X,

00:09:04.740 | and again, my RAG application can figure out that I can do two parallel search queries,

00:09:09.080 | collect them together, and continue on.

00:09:12.040 | And so this means that you can build a fairly sophisticated RAG application in two functions

00:09:15.980 | and two models.

00:09:17.980 | First you have the model for how you respond with the data, and then how you process a search

00:09:22.060 | query, right?

00:09:23.060 | As you can see here.

00:09:24.060 | And then you define two functions that return those objects.

00:09:29.520 | And then this is basically your advanced RAG application, right?

00:09:31.980 | You make a search query, you return multiple searches.

00:09:34.660 | You search each one, and then you pass the context into the answer question function.

00:09:38.380 | This is very, very straightforward code, but what this means is you get to render something

00:09:42.900 | very structured, and then whether or not this endpoint is used by OpenAPI, is parsed by a

00:09:47.540 | React model, again, these are all just implementation details.

00:09:50.900 | The LLM is very hidden behind the type system that we can now guarantee to be correct.

00:09:57.120 | And the last one I think is really interesting is this data extraction.

00:10:00.620 | If you want to do something like labeling, it's really easy to just say, okay, class label is

00:10:04.580 | literal of either spam or not spam, you've built a classifier.

00:10:08.300 | If you want the accuracy to improve about 15%, you can add chain of thought, right?

00:10:12.340 | And again, it's the structure that tells you how the language model works, but you still

00:10:15.460 | have good validation on whether or not you're going to get, you know, spam or some, like,

00:10:20.700 | babble on, like, you know, here's the JSON that you care about.

00:10:23.980 | You can do the same thing for things like extracting, like, structured information out of transcripts.

00:10:28.620 | Like, a very common example is people want to process transcripts.

00:10:31.740 | Now it's very structured, right?

00:10:32.740 | I have a classification in the meeting type, I've given myself a title, a list of action

00:10:37.460 | items, and a summary.

00:10:38.460 | Here, the owner is a string, but you can imagine having a validator that makes sure that the

00:10:43.060 | owners are the, at least one of the participants of the email based on some Google Calendar integration.

00:10:48.460 | Again, these are all implementation details.

00:10:51.180 | It's all up to you.

00:10:53.300 | And then lastly, you can do some really magical stuff.

00:10:55.180 | In this example, the type I've given is called table.

00:10:58.180 | It has a caption string and a very weird markdown data frame type hint.

00:11:03.060 | And here, what you can see is that I'm really just trying to extract images or tables out of

00:11:07.180 | an image.

00:11:09.180 | But this is a bit wild.

00:11:10.180 | Like, don't worry if you don't understand it.

00:11:11.460 | Basically, what we're using is we're using the new pip, uh, pep, basically to figure out

00:11:16.960 | how we can use annotations to create new type hints.

00:11:20.820 | And so this type hint is pretty advanced.

00:11:22.200 | It says that it's an instance of data frame, which means your IDE will now autocomplete all

00:11:26.640 | the data frame methods as you continue to program.

00:11:29.960 | Uh, the before evaluator says, I know M markdown is going to come out, but I want to parse

00:11:34.280 | it to a data frame.

00:11:35.400 | The serializer says, I know it's a data frame, but when I serialize it, I want it to be marked

00:11:39.240 | down.

00:11:40.240 | And then lastly, you can add additional JSON schema information, which becomes the prompt

00:11:44.320 | that you would use to send to a language model.

00:11:47.020 | But the idea here is, you know, it's really just a type system that we've defined that can

00:11:50.980 | be used by a language model.

00:11:52.440 | And then you can get pretty interesting outputs out of this, right?

00:11:55.520 | And because of the data frame, you can instantly call 2CSV or something like that without

00:11:59.740 | worrying about other implementation details.

00:12:03.120 | And so what we've seen is that we can now just generate things like date ranges, relationships,

00:12:07.540 | we can generate knowledge graphs as we've shown last year, and generally just think about DAGs

00:12:11.080 | and workflows and tables.

00:12:12.920 | And again, all we really care about is just coming up with a creative response model, having

00:12:17.980 | a good set of validators, and as models get smarter, we're only going to have to do less

00:12:22.360 | and less, right?

00:12:23.360 | This is fairly bulletproof.

00:12:26.120 | And so for the last five minutes, I really just want to share what I've learned in the past

00:12:29.520 | year, right?

00:12:30.520 | The first thing is that often one retry for models like OpenAI and Anthropic are basically

00:12:35.500 | enough, and really all you care about is having good, well-written, informative error messages,

00:12:41.120 | which has been hard for all time, but now you're more incentivized to build this out because

00:12:44.900 | this not only makes the code more readable to you, but to the language model.

00:12:48.900 | Then lastly, for the new models from like 3.5 and 4.0, they're so much faster now that we

00:12:54.620 | can actually eat the cost of latency for performance.

00:12:56.900 | So again, as these models get smarter and faster, you're still fairly bulletproof.

00:13:03.280 | One thing I've noticed in a lot of consulting that I've done is that we see 4% to 5% failure

00:13:07.780 | modes and very complex validations, but just by fine-tuning language models on function calling,

00:13:13.060 | we can get them down to zero for even simple models like Mistral or GPT-3.5.

00:13:17.880 | And lastly, structured output is here to stay, mostly because even in domains like Vision or

00:13:24.260 | RAG or Agents, really what I care about is defining the type system that I want to program

00:13:28.760 | with on top of how I want to use language models.

00:13:32.140 | Prompting is an implementation detail, the response model is an implementation detail.

00:13:36.140 | And whether or not we use something like constraint sampling that's available in Lama CBP or Lama

00:13:40.140 | or Outlines, again, the benefits I get as a programmer is sort of on a different level of abstraction.

00:13:46.140 | And then even with things like RAG and Agents, right now we think of RAG as much more like

00:13:51.600 | question-answering systems, but in larger enterprise situations, I see a lot of report generation

00:13:56.920 | as a step to make better decision-making, right?

00:14:00.260 | In Agents, a lot of it now becomes generating workflows and DAGs to then go send to an execution

00:14:04.980 | engine to do the computation ourselves rather than having some kind of React loop and hope that

00:14:09.980 | that these things terminate.

00:14:12.420 | And so really there's no new abstractions, right?

00:14:14.460 | Everything that we've done today is just reducing language models back to very classical programming.

00:14:19.140 | What I care about is that my IDE understands the types, and we just get red squiggly lines

00:14:23.380 | when things are unhappy.

00:14:25.740 | And what we've done is we've turned generative AI just to becoming generating data structures.

00:14:30.840 | You can now own the objects you define, you own the functions that you implement, you own

00:14:34.320 | the control flow, and most importantly you own the prompt because we just give you this messages

00:14:38.260 | array, and you can do anything that you want.

00:14:41.780 | And what this means to me, and I think what this means to everyone else here, is that we

00:14:45.060 | are actually turning software 3.0 and making it backwards compatible with existing software,

00:14:50.060 | right?

00:14:51.060 | We're allowing ourselves to demystify the language models and go back to a much more

00:14:53.980 | classical structure of how we program.

00:14:57.540 | And that's why I still think Pydantic is basically all we need.

00:15:00.700 | Thank you.

00:15:01.700 | Thanks.

00:15:02.700 | Thank you.

00:15:03.700 | Thank you.

00:15:04.700 | Thank you.

00:15:05.700 | Thank you.

00:15:06.700 | Thank you.

00:15:07.700 | Thank you.

00:15:08.700 | Thank you.

00:15:09.700 | Thank you.

00:15:10.700 | Thank you.

00:15:11.700 | Thank you.

00:15:12.700 | Thank you.

00:15:13.700 | Thank you.

00:15:14.700 | Thank you.

00:15:15.700 | Thank you.

00:15:16.700 | We'll see you next time.

Pydantic is STILL all you need: Jason Liu

Chapters