Pydantic is STILL all you need: Jason Liu

Chapters
0:00 Intro
1:47 What's new
6:41 Generation
7:44 Retrieval
9:11 Advanced
00:00:00.000 |
Eliana Khanna: So, the context from last year's talk was "Pydantic is All You Need." 00:00:15.800 |
It was a very popular talk, you know, it kind of like kicked off my Twitter career. 00:00:20.160 |
And today, I'm coming back a year later to basically say the same thing again: "Pydantic is still all you need." 00:00:27.680 |
And really, my goal is to share with you sort of what I've learned for the past year. 00:00:31.360 |
And the problem has always been the fact that if I had hired an intern to write an API for 00:00:35.980 |
me and that API returns a string that I have to JSON loads into a dictionary and then just 00:00:40.600 |
pray that the data was still there to begin with, I would be pretty pissed off. 00:00:44.500 |
I would probably just fire them, replace them with Devin, and just prompt it to use FastAPI instead. 00:00:51.160 |
Because I'm really tired of writing code like this, right, and this is the kind of code that 00:00:54.200 |
we wrote when we had to work with things like GPT-3 and stuff like that. 00:00:59.380 |
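That fragile pattern might look like this minimal sketch (the field names are hypothetical):

```python
import json


def get_user_fragile(llm_output: str) -> dict:
    # The "JSON-loads-and-pray" pattern: no schema, no validation.
    data = json.loads(llm_output)  # hope the model emitted valid JSON at all
    # Hope the keys exist; a small prompt change silently breaks this.
    return {"name": data["name"], "age": data["age"]}


print(get_user_fragile('{"name": "Jason", "age": 30}'))
# If the model had returned {"Name": ...} instead, this would raise KeyError.
```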
But there's a lot of good tools that we have in the Python ecosystem. 00:01:02.500 |
And the ecosystem in all of these languages, whether it's Ecto in Elixir, Active Record in Rails, 00:01:06.680 |
or anything like that, that can make our lives much, much easier. 00:01:09.880 |
And so the problem is that by not having schemas and structured responses, we tend to lose 00:01:15.620 |
compatibility, composability, and reliability when we build tools and write code that interact with these models. 00:01:23.480 |
But it seems that we're very happy with using LLMs for the exact same reason. 00:01:28.060 |
And so last year, we mostly talked about how Pydantic and function calling were a great alternative 00:01:32.740 |
for getting structured outputs, which brings a lot of additional benefits, right? 00:01:37.240 |
We are able to have nested objects and nested models for modular structures. 00:01:41.400 |
And then we can also use validators to improve the reliability of the systems that we build. 00:01:50.060 |
I think the big question is, what's new in Pydantic? 00:01:56.760 |
I'm basically coming back to say that I was right, and it feels really, really good. 00:02:07.220 |
We've launched in five languages: Python, TypeScript, Ruby, Go, and Elixir. 00:02:14.800 |
And again, it's mostly because this is just the exact 600 lines of code that you do not want to write yourself. 00:02:20.380 |
And at least in the Python library, we've seen 40% growth month over month. 00:02:24.660 |
And we're still at only about 2% of the OpenAI SDK's download numbers. 00:02:28.580 |
So there's still tons of room to grow in terms of how we can make these APIs a lot more ergonomic. 00:02:35.180 |
And so when you see 1.0, you might be going, "Jason, what did you break in the API?" 00:02:39.980 |
I renamed a method, and now we support things like Ollama and llama.cpp, along with a bunch of 00:02:47.100 |
other providers. So we support things like Anthropic, Cohere, Gemini, Groq, everything that you need. 00:02:52.280 |
And as long as language models support more function calling capabilities, this API will keep working. 00:02:58.700 |
And if you haven't seen the talk last year, the general API looks like this. 00:03:04.320 |
You can then, you know, patch the OpenAI client or any client that you want. 00:03:08.180 |
And all you have to do is pass in response_model=User. 00:03:16.860 |
And you know, it took a little bit of hackiness, but now we can also leverage some of the new 00:03:20.540 |
Python tooling to also infer the return type. 00:03:23.460 |
And so here, because the response model is a User, the object is inferred as a User object. 00:03:28.000 |
You get nice red squiggly lines if you've messed up your code. 00:03:31.660 |
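As a sketch, assuming the instructor library's 1.0-style API (the model name and prompt are illustrative; an API key and the `instructor` and `openai` packages would be needed to actually run the call):

```python
from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


def extract_user(text: str) -> User:
    # Deferred imports: this function needs `instructor` and `openai`
    # plus an API key, so nothing runs at import time.
    import instructor
    from openai import OpenAI

    # Patch/wrap the OpenAI client so create() accepts response_model.
    client = instructor.from_openai(OpenAI())

    # Because response_model=User, the return type is inferred as User,
    # so a type checker flags mistakes with red squiggly lines.
    return client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any function-calling model works
        response_model=User,
        messages=[{"role": "user", "content": f"Extract the user: {text}"}],
    )
```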
The same thing happens when you want to create an iterable. 00:03:34.480 |
Here you see that I have a single response model as a user, but I want to extract two objects. 00:03:39.280 |
And as long as you set stream equals true, you're going to get each object as they return. 00:03:44.000 |
And this is kind of a way of using streaming to improve the latency while still having a little bit of structure. 00:03:51.780 |
The difference here is that instead of just returning a partially correct, unvalidated JSON 00:03:56.720 |
string, we can validate the entire object. 00:03:58.960 |
And this means that if you have things like generative UI that use a structure, you can 00:04:02.540 |
render that while streaming without having to write very evil, hacky JSON-parsing code to figure out what's there. 00:04:13.220 |
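A hedged sketch of the streaming mode just described, again assuming the instructor-style API (the call is wrapped in a function so nothing executes without an API key):

```python
from typing import Iterable

from pydantic import BaseModel


class User(BaseModel):
    name: str
    age: int


def extract_users(text: str):
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(OpenAI())
    # response_model=Iterable[User] plus stream=True yields each *fully
    # validated* User object as soon as it is complete, instead of handing
    # you half-parsed JSON to stitch together yourself.
    users = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        response_model=Iterable[User],
        stream=True,
        messages=[{"role": "user", "content": f"Extract all users: {text}"}],
    )
    for user in users:
        yield user  # e.g. render each one in a generative UI as it arrives
```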
You have one noun, which is the client, and you have three verbs. 00:04:15.960 |
You can create, create with iterable, and create with partial, based on whether you want one object, a stream of objects, or a stream of partial objects. 00:04:20.800 |
And everything else you think about is going to be around the response model, the validator 00:04:24.840 |
features that you have to build, and the messages array that you pass in. 00:04:28.080 |
So if OpenAI supports some new, weird API call, as long as it fits with the messages, there's nothing you need to change. 00:04:36.040 |
And that's why I think Pydantic is still all you need. 00:04:38.080 |
And so the rest of this talk is basically going to be about two, really three things. 00:04:43.320 |
I'm going to cover some examples of generation, in particular, around RAG and extraction. 00:04:49.000 |
Then I'm just going to cover what we learned this year, and it's really not that much, right? 00:04:53.320 |
Validation errors are very important, and feeding them back usually fixes any errors that we have. 00:04:58.320 |
Not all language models can really support retry logic right now. 00:05:01.260 |
I think that's something we're going to work towards. 00:05:03.060 |
And ultimately, whether you use vision, or text, or RAG, or agents, they all benefit from structured outputs. 00:05:09.320 |
Because the real idea here is we're going to be programming with data structures, which 00:05:12.560 |
is something everyone knows how to do, rather than trying to, like, beg and pray to the LLM that the output is right. 00:05:17.840 |
And really, again, the theme of this talk is the fact that nothing really has changed. 00:05:22.760 |
All we learned to do is relearn how to program. 00:05:27.060 |
And so the first concept that I think many people might not have seen in Pydantic is the validator. 00:05:32.020 |
Here, you can define a validator on any kind of attribute, and add additional logic that runs when the model is constructed. 00:05:38.460 |
And so you see, in my prompt, I don't really ask the language model to uppercase all the names, 00:05:42.940 |
but I can actually write Python code to verify that something is correct and throw an error if it isn't. 00:05:48.000 |
And if I want to, I can turn retrying on, and that error message is caught by the language model, which can then correct itself. 00:05:54.320 |
And so in this example, it is the error message that is part of the prompt, but only conditionally, when validation fails. 00:06:00.520 |
And as you can see, after one API call, JSON is now all caps. 00:06:09.140 |
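The uppercase example might be sketched like this with a plain Pydantic validator (the model and the error message wording are illustrative):

```python
from pydantic import BaseModel, ValidationError, field_validator


class UserDetail(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def name_must_be_uppercase(cls, v: str) -> str:
        # The prompt never asks for uppercase; plain Python enforces it.
        if v != v.upper():
            # With retrying on, this message is fed back to the model
            # in the next prompt so it can correct its own output.
            raise ValueError(f"{v!r} is not uppercase; please uppercase it.")
        return v


UserDetail(name="JASON", age=30)       # passes validation
try:
    UserDetail(name="jason", age=30)   # fails, which would trigger a re-ask
except ValidationError as e:
    print(e)
```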
This is a very simple example that you might see at something like Ramp, where you're processing receipts. 00:06:14.780 |
You might want to use a vision language model to extract the receipt data. 00:06:18.820 |
There's a total cost, and the products is a list of products. 00:06:21.500 |
And the validator does something a little bit more interesting. 00:06:23.100 |
It says, "Make sure that the price and the quantity add up to the total cost." 00:06:28.180 |
Again, this basically doesn't really happen for 99% of the cases, but when it does happen, 00:06:33.320 |
you see a red bar in Datadog, and that's really what I care about. 00:06:36.100 |
And if I want, I can use re-asking to make sure that, again, everything is done correctly. 00:06:44.860 |
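A minimal sketch of that receipt check with a Pydantic model-level validator (the field names are assumptions):

```python
from pydantic import BaseModel, model_validator


class Product(BaseModel):
    name: str
    price: float
    quantity: int


class Receipt(BaseModel):
    products: list[Product]
    total: float

    @model_validator(mode="after")
    def line_items_add_up(self) -> "Receipt":
        # Cross-field check: the extracted line items must sum to the total.
        calculated = sum(p.price * p.quantity for p in self.products)
        if abs(calculated - self.total) > 0.01:
            raise ValueError(
                f"Line items sum to {calculated}, but total is {self.total}."
            )
        return self


receipt = Receipt(
    products=[Product(name="coffee", price=3.50, quantity=2)],
    total=7.00,
)
```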
Well, it turns out if you don't use structured outputs, what you get back is just the raw response object. 00:06:51.860 |
You still get an object back out, but you're just hoping that you don't have to call JSON 00:06:55.400 |
loads yourself and, you know, eat whatever cost you have in terms of parsing. 00:06:59.860 |
And so a really simple example of a RAG application is not only having a content field, but having a list of follow-up questions. 00:07:07.600 |
The follow-up questions can be informed by the existing context, but now you're going to 00:07:11.460 |
show the user, like, "Hey, there's other questions that you can answer based on the context we retrieved." 00:07:16.460 |
A really funny example that I've actually done in production quite a bit is just making sure that URLs actually exist. 00:07:26.120 |
I just have a regular expression to parse all the URLs, and I send a request to figure out if the URL actually resolves. 00:07:32.380 |
And now I can make sure very easily that no URLs are, you know, hallucinated. 00:07:37.380 |
And in my instructions, I just say, "Well, if it's not real, just throw it out next time." 00:07:43.700 |
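That anti-hallucination check might look like this, using a regex plus a stdlib HTTP request (the exact URL pattern and request style are assumptions):

```python
import re
import urllib.request

from pydantic import BaseModel, field_validator

URL_PATTERN = re.compile(r"https?://[^\s)\"']+")


class AnswerWithCitations(BaseModel):
    content: str

    @field_validator("content")
    @classmethod
    def urls_must_resolve(cls, v: str) -> str:
        for url in URL_PATTERN.findall(v):
            try:
                # A quick request; if it fails, treat the URL as hallucinated
                # and let the error message tell the model to drop it.
                urllib.request.urlopen(url, timeout=5)
            except Exception:
                raise ValueError(
                    f"URL {url} did not resolve; remove it from the answer."
                )
        return v


# No URLs in this text, so no network calls are made here.
answer = AnswerWithCitations(content="Pydantic validators can check URLs.")
```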
The same thing happens with retrieval augmented generation. 00:07:48.180 |
We all kind of know at this point that embeddings won't really solve all the problems you have 00:07:52.820 |
For example, if I ask a question like, "What is the latest news from Z?" 00:07:57.100 |
Like, latest news isn't something that embeddings can capture, right? 00:08:01.320 |
The source of that, maybe that is relevant if you use BM25, but really there might be separate backends you want to query. 00:08:07.980 |
And we can do something very simple in the structured output world that makes this very reasonable. 00:08:13.460 |
To define a search object, I say it has a query, a start date, an end date that is optional. 00:08:17.940 |
Maybe there's a limit in case I want to see the top five results. 00:08:21.440 |
And then a source that allows the language model to choose which backend I want to hit. 00:08:26.740 |
And then, you know, how you actually search the endpoint is kind of an implementation detail 00:08:31.200 |
And now you just define a very simple function, you know, create search. 00:08:36.700 |
And even the API call itself now is an implementation detail, right? 00:08:40.060 |
As long as I get the search query out and it's correct, I can do a lot more. 00:08:43.680 |
And in particular, like even the validations themselves, you know, I can figure out whether 00:08:48.180 |
the date ranges are zero days, one day, and figure out even distributions based on the structured outputs. 00:08:53.760 |
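The search object described above might be sketched like this (the source names and defaults are assumptions; `execute` is the implementation detail the talk mentions):

```python
import enum
from datetime import date
from typing import Optional

from pydantic import BaseModel, Field


class Source(str, enum.Enum):
    EMAIL = "email"
    CALENDAR = "calendar"


class Search(BaseModel):
    query: str = Field(description="Rewritten query to send to the backend")
    start_date: Optional[date] = None  # captures things like "latest news"
    end_date: Optional[date] = None
    limit: int = 5                     # e.g. "top five results"
    source: Source                     # lets the model pick which backend to hit

    def execute(self) -> list:
        # Implementation detail: dispatch on self.source to the right backend.
        ...


search = Search(query="latest news", source="email", start_date=date(2024, 6, 1))
```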
Then if I ask the question, like, what is the difference between X and Y, I can just ask for an iterable of search objects. 00:08:59.240 |
Now, if I ask this question, I'm going to have a search query for Y, a search query for X, 00:09:04.740 |
and again, my RAG application can figure out that I can do two parallel search queries. 00:09:12.040 |
And so this means that you can build a fairly sophisticated RAG application in two functions and two models. 00:09:17.980 |
First you have the model for how you respond with the data, and then the model for how you process a search request. 00:09:24.060 |
And then you define two functions that return those objects. 00:09:29.520 |
And then this is basically your advanced RAG application, right? 00:09:31.980 |
You make a search query, you return multiple searches. 00:09:34.660 |
You search each one, and then you pass the context into the answer question function. 00:09:38.380 |
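Put together, the whole flow reduces to something like this skeleton; the LLM calls are stubbed out here so only the control flow is shown:

```python
from pydantic import BaseModel


class SearchQuery(BaseModel):
    query: str


class Answer(BaseModel):
    content: str
    follow_up_questions: list[str] = []


def segment_searches(question: str) -> list[SearchQuery]:
    # In production: one LLM call with response_model=Iterable[SearchQuery].
    return [SearchQuery(query=question)]  # stub


def run_search(search: SearchQuery) -> str:
    # Implementation detail: any backend (BM25, vector store, SQL, ...).
    return f"results for {search.query!r}"  # stub


def answer_question(question: str, context: list[str]) -> Answer:
    # In production: one LLM call with response_model=Answer.
    return Answer(content=" ".join(context))  # stub


def rag_app(question: str) -> Answer:
    searches = segment_searches(question)        # structured LLM call #1
    context = [run_search(s) for s in searches]  # plain, parallelizable code
    return answer_question(question, context)    # structured LLM call #2
```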
This is very, very straightforward code, but what this means is you get to render something 00:09:42.900 |
very structured, and then whether or not this endpoint is served via OpenAPI or rendered by a 00:09:47.540 |
React app, again, these are all just implementation details. 00:09:50.900 |
The LLM is very hidden behind the type system that we can now guarantee to be correct. 00:09:57.120 |
And the last one I think is really interesting is this data extraction. 00:10:00.620 |
If you want to do something like labeling, it's really easy to just say, okay, class label is 00:10:04.580 |
literal of either spam or not spam, you've built a classifier. 00:10:08.300 |
If you want the accuracy to improve by about 15%, you can add chain of thought, right? 00:10:12.340 |
And again, it's the structure that tells you how the language model works, but you still 00:10:15.460 |
have good validation on whether or not you're going to get, you know, "spam" or "not spam", 00:10:20.700 |
rather than some babble like, you know, "here's the JSON that you care about." 00:10:23.980 |
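The classifier described is just a response model; asking for the reasoning field first is the chain-of-thought addition (the field names are conventional, not required):

```python
from typing import Literal

from pydantic import BaseModel, Field


class SpamClassification(BaseModel):
    # Asking the model to reason *before* committing to a label is the
    # chain-of-thought trick that tends to improve accuracy.
    chain_of_thought: str = Field(
        description="Think step by step about why this is or isn't spam."
    )
    label: Literal["spam", "not_spam"]


# The Literal guarantees exactly one of the two labels comes back,
# never a rambling preamble around the JSON you care about.
result = SpamClassification(
    chain_of_thought="Mentions a prize and an unknown link.",
    label="spam",
)
```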
You can do the same thing for things like extracting, like, structured information out of transcripts. 00:10:28.620 |
Like, a very common example is people want to process transcripts. 00:10:32.740 |
I have a classification in the meeting type, I've given myself a title, a list of action items. 00:10:38.460 |
Here, the owner is a string, but you can imagine having a validator that makes sure that the 00:10:43.060 |
owners are at least one of the participants of the meeting, based on some Google Calendar integration. 00:10:53.300 |
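A sketch of that transcript-extraction model (the meeting types and field names are illustrative):

```python
from typing import Literal

from pydantic import BaseModel


class ActionItem(BaseModel):
    description: str
    owner: str  # could be validated against calendar participants


class MeetingNotes(BaseModel):
    meeting_type: Literal["standup", "one_on_one", "planning"]  # assumed labels
    title: str
    action_items: list[ActionItem]


notes = MeetingNotes(
    meeting_type="standup",
    title="Weekly sync",
    action_items=[ActionItem(description="Ship the release", owner="Jason")],
)
```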
And then lastly, you can do some really magical stuff. 00:10:55.180 |
In this example, the type I've given is called table. 00:10:58.180 |
It has a caption string and a very weird markdown data frame type hint. 00:11:03.060 |
And here, what you can see is that I'm really just trying to extract tables out of images. 00:11:10.180 |
Like, don't worry if you don't understand it. 00:11:11.460 |
Basically, what we're using is the new PEP for annotated types to figure out 00:11:16.960 |
how we can use annotations to create new type hints. 00:11:22.200 |
It says that it's an instance of data frame, which means your IDE will now autocomplete all 00:11:26.640 |
the data frame methods as you continue to program. 00:11:29.960 |
Uh, the before-validator says, I know markdown is going to come in, but I want to parse it into a data frame. 00:11:35.400 |
The serializer says, I know it's a data frame, but when I serialize it, I want it to be markdown. 00:11:40.240 |
And then lastly, you can add additional JSON schema information, which becomes the prompt 00:11:44.320 |
that you would use to send to a language model. 00:11:47.020 |
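A sketch of that annotated type, assuming pandas and Pydantic v2 (the `md_to_df` helper is one plausible way to read a markdown table, not the only one):

```python
from io import StringIO
from typing import Annotated, Any

import pandas as pd
from pydantic import (
    BaseModel,
    BeforeValidator,
    InstanceOf,
    PlainSerializer,
    WithJsonSchema,
)


def md_to_df(data: Any) -> Any:
    # Before-validator: markdown comes in, a DataFrame comes out.
    if isinstance(data, str):
        return (
            pd.read_csv(StringIO(data), sep="|", index_col=1)
            .dropna(axis=1, how="all")  # drop the empty edge columns
            .iloc[1:]                   # drop the |---|---| separator row
            .applymap(lambda x: x.strip() if isinstance(x, str) else x)
        )
    return data


MarkdownDataFrame = Annotated[
    # InstanceOf gives the IDE (and Pydantic) a real DataFrame type,
    # so DataFrame methods autocomplete as you program.
    InstanceOf[pd.DataFrame],
    BeforeValidator(md_to_df),
    # Serializer: a DataFrame in memory, markdown on the wire.
    PlainSerializer(lambda df: df.to_markdown()),
    # Extra JSON schema info that effectively becomes part of the prompt.
    WithJsonSchema({"type": "string", "description": "A markdown table"}),
]


class Table(BaseModel):
    caption: str
    dataframe: MarkdownDataFrame


table = Table(
    caption="People",
    dataframe="| name | age |\n|------|-----|\n| Al   | 30  |",
)
```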
But the idea here is, you know, it's really just a type system that we've defined that can 00:11:52.440 |
And then you can get pretty interesting outputs out of this, right? 00:11:55.520 |
And because of the data frame, you can instantly call to_csv or something like that without any extra parsing code. 00:12:03.120 |
And so what we've seen is that we can now just generate things like date ranges, relationships, 00:12:07.540 |
we can generate knowledge graphs as we've shown last year, and generally just think about DAGs 00:12:12.920 |
And again, all we really care about is just coming up with a creative response model, having 00:12:17.980 |
a good set of validators, and as models get smarter, we're only going to have to do less of this work. 00:12:26.120 |
And so for the last five minutes, I really just want to share what I've learned in the past year. 00:12:30.520 |
The first thing is that often one retry for models like OpenAI and Anthropic is basically 00:12:35.500 |
enough, and really all you care about is having good, well-written, informative error messages, 00:12:41.120 |
which has always been hard, but now you're more incentivized to build this out because 00:12:44.900 |
this not only makes the code more readable to you, but to the language model. 00:12:48.900 |
Then lastly, the new models like GPT-3.5 and GPT-4o are so much faster now that we 00:12:54.620 |
can actually eat the latency cost of retries in exchange for performance. 00:12:56.900 |
So again, as these models get smarter and faster, you're still fairly bulletproof. 00:13:03.280 |
One thing I've noticed in a lot of consulting that I've done is that we see 4% to 5% failure 00:13:07.780 |
rates on very complex validations, but just by fine-tuning language models on function calling, 00:13:13.060 |
we can get them down to zero for even simple models like Mistral or GPT-3.5. 00:13:17.880 |
And lastly, structured output is here to stay, mostly because even in domains like Vision or 00:13:24.260 |
RAG or Agents, really what I care about is defining the type system that I want to program 00:13:28.760 |
with on top of how I want to use language models. 00:13:32.140 |
Prompting is an implementation detail, the response model is an implementation detail. 00:13:36.140 |
And whether or not we use something like constrained sampling that's available in llama.cpp or Ollama 00:13:40.140 |
or Outlines, again, the benefits I get as a programmer are sort of on a different level of abstraction. 00:13:46.140 |
And then even with things like RAG and Agents, right now we think of RAG as much more like 00:13:51.600 |
question-answering systems, but in larger enterprise situations, I see a lot of report generation 00:13:56.920 |
as a step to make better decision-making, right? 00:14:00.260 |
In Agents, a lot of it now becomes generating workflows and DAGs to then go send to an execution 00:14:04.980 |
engine to do the computation ourselves, rather than having some kind of ReAct loop and hoping that it all works out. 00:14:12.420 |
And so really there's no new abstractions, right? 00:14:14.460 |
Everything that we've done today is just reducing language models back to very classical programming. 00:14:19.140 |
What I care about is that my IDE understands the types, and we just get red squiggly lines when something is wrong. 00:14:25.740 |
And what we've done is we've turned generative AI into just generating data structures. 00:14:30.840 |
You can now own the objects you define, you own the functions that you implement, you own 00:14:34.320 |
the control flow, and most importantly you own the prompt because we just give you this messages 00:14:38.260 |
array, and you can do anything that you want. 00:14:41.780 |
And what this means to me, and I think what this means to everyone else here, is that we 00:14:45.060 |
are actually taking software 3.0 and making it backwards compatible with existing software. 00:14:51.060 |
We're allowing ourselves to demystify the language models and go back to a much more classical way of programming. 00:14:57.540 |
And that's why I still think Pydantic is basically all we need.