Back to Index

High Agency Pydantic over VC Backed Frameworks — with Jason Liu of Instructor


Chapters

0:00 Introductions
2:50 Early experiments with Generative AI at StitchFix
9:39 Design philosophy behind the Instructor library
13:17 JSON Mode vs Function Calling
14:43 Single vs parallel function calling
16:28 How many functions is too many?
20:40 How to evaluate function calling
24:01 What is Instructor good for?
26:41 The Evolution from Looping to Workflow in AI Engineering
31:58 State of the AI Engineering Stack
33:40 Why Instructor isn't VC backed
37:08 Advice on Pursuing Open Source Projects and Consulting
42:59 The Concept of High Agency and Its Importance
51:06 Prompts as Code and the Structure of AI Inputs and Outputs
53:06 The Emergence of AI Engineering as a Distinct Field

Transcript

Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners. And I'm joined by my co-host, Swyx, founder of Smol AI. Hello. We're back in the remote studio with Jason Liu from Instructor. Welcome, Jason. Hey there. Thanks for having me. Jason, you are extremely famous.

So I don't know what I'm going to do introducing you. But you're one of the Waterloo clan. There's a small cadre of you that's just completely dominating machine learning. Actually, can you list Waterloo alums that you know are just dominating and crushing it right now? So John from Rysana is doing his Inversion models.

I know there's Clive Chen. Clive Chen from Waterloo. He was one of the kids where I-- when I started the data science club, he was one of the guys who was joining in and just hanging out in the room. And then he was at Tesla, working with Karpathy. Now he's at OpenAI.

He's in my climbing club. Oh, hell yeah. Yeah. I haven't seen him in like six years now. To get in the social scene in San Francisco, you have to climb. So yeah, both in career and in rocks. Yeah, I mean, a lot of good problem solving there. But oh man, I feel like now that you put me on the spot, I don't know.

It's OK. Yeah, that was a riff. OK, but anyway, so you started a data science club at Waterloo. We can talk about that. But then also spent five years at Stitch Fix as an ML engineer. You pioneered the use of OpenAI's LLMs to increase stylist efficiency. So you must have been a very, very early user.

This was pretty early on. Yeah, I mean, this was like GPT-3. OK, so we actually were using transformers at Stitch Fix before the GPT-3 model. So we were just using transformers for recommendation systems. At that time, I was very skeptical of transformers. I was like, why do we need all this infrastructure?

We can just use matrix factorization. When GPT-2 came out, I fine-tuned my own GPT-2 to write rap lyrics. And I was like, OK, this is cute. OK, I got to go back to my real job. Who cares if I can write a rap lyric? When GPT-3 Instruct came out, again, I was very much like, why are we using a POST request to review every comment a person leaves?

We can just use classical models. So I was very against language models for the longest time. And then when ChatGPT came out, I basically just wrote a long apology letter to everyone at the company. I was like, hey, guys, I was very dismissive of some of this technology. I didn't think it would scale well.

And I am wrong. This is incredible. And I immediately just transitioned to go from computer vision recommendation systems to LLMs. But funny enough, now that we have RAG, we're kind of going back to recommendation systems. Yeah, speaking of that, I think Alessio's going to bring up-- I was going to say, we had Brian Bishop from Max on the podcast.

Did you overlap at Stitch Fix? Yeah, yeah, he was one of my main users of the recommendation framework that I had built out at Stitch Fix. Yeah, we talked a lot about RecSys, so it makes sense. So I actually, now I have adopted that line, that RAG is RecSys. And if you're trying to reinvent new concepts, you should study RecSys first, because you're going to independently reinvent a lot of concepts.

So your system was called Flight. It's a recommendation framework with over 80% adoption, servicing 350 million requests every day. Wasn't there something existing at Stitch Fix? Like, why did you have to write one from scratch? No, so I think because at Stitch Fix, a lot of the machine learning engineers and data scientists were writing production code, sort of every team's systems were very bespoke.

It's like, this team only needs to do real-time recommendations with small data, so they just have a FastAPI app with some pandas code. This other team has to do a lot more data, so they have some kind of Spark job that does some batch ETL that does a recommendation, right?

And so what happens is each team writes their code differently, and I have to come in and refactor their code. And I was like, oh, man, I'm refactoring four different code bases four different times. Wouldn't it be better if all the code quality was my fault? All right, well, let me just write this framework, force everyone else to use it, and now one person can maintain five different systems rather than five teams having their own bespoke system.

And so it was really a need of just standardizing everything. And then once you do that, you can do observability across the entire pipeline and make large, sweeping improvements in this infrastructure. If we notice that something is slow, we can detect it on the operator layer. Just, hey, hey, this team, you guys are doing this operation.

It's lowering our latency by 30%. If you just optimize your Python code here, we can probably make an extra million dollars. Like, jump on a call and figure this out. And then a lot of it was just doing all this observability work to figure out what the heck is going on and optimize this system from not only just a code perspective, but just harassing the org and saying, we need to add caching here.

We're doing duplicated work here. Let's go clean up the systems. Yeah. One more system that I'm interested in finding out more about is your similarity search system using CLIP and GPT-3 embeddings in Faiss, which you said drove over $50 million in annual revenue. So of course, they all gave all that to you, right?

No, no. I mean, it's not going up and down. But I got a little bit, so I'm pretty happy about that. But there, that was when we were fine-tuning ResNets to do image classification. And so a lot of it was, given an image, if we could predict the different attributes we have in our merchandising, and we can predict the text embeddings of the comments, then we can build an image vector or image embedding that can capture both descriptions of the clothing and sales of the clothing.

And then we would use these additional vectors to augment our recommendation system. And so with this, the recommendation system really was just around, what are similar items? What are complementary items? What are items that you would wear in a single outfit? And being able to say, on a product page, let me show you 15, 20 more things.
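
A rough sketch of that kind of similar-items lookup, with made-up dimensions and data: concatenate the image and text embeddings per product, index them in Faiss, and pull the nearest neighbors for a product page.

```python
# Illustrative only: combined image + text embeddings indexed in Faiss,
# then a nearest-neighbor query to populate a "similar items" carousel.
import faiss
import numpy as np

n_items, img_dim, txt_dim = 10_000, 512, 256
image_emb = np.random.rand(n_items, img_dim).astype("float32")  # e.g. from CLIP
text_emb = np.random.rand(n_items, txt_dim).astype("float32")   # e.g. from comment text

item_vectors = np.hstack([image_emb, text_emb])
faiss.normalize_L2(item_vectors)                 # cosine similarity via inner product

index = faiss.IndexFlatIP(img_dim + txt_dim)
index.add(item_vectors)

# "show you 15, 20 more things" on a product page:
scores, neighbor_ids = index.search(item_vectors[42:43], 20)
```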

And then what we found was like, hey, when you turn that on, you make a bunch of money. Yeah. OK, so you didn't actually use GPT-3 embeddings. You fine-tuned your own. Because I was surprised that GPT-3 worked off the shelf. OK, OK. Because at this point, we would have 3 million pieces of inventory and over a billion interactions between users and clothes.

Any kind of fine-tuning would definitely outperform some off-the-shelf model. Cool. I'm about to move on from Stitch Fix. But any other fun stories from the Stitch Fix days that you want to cover? No, I think that's basically it. I mean, the biggest one, really, was the fact that, I think, for just four years, I was so bearish on language models and just NLP in general.

I was like, oh, none of this really works. Why would I spend time focusing on this? I've got to go do the thing that makes money-- recommendations, bounding boxes, image classification. Yeah. And now I'm prompting an image model. I was like, oh, man, I was wrong. I think-- OK, so my Stitch Fix question would be, I think you have a bit of a drip.

And I don't. My primary wardrobe is free startup conference t-shirts. Should more technology brothers be using Stitch Fix? Or what's your fashion advice? Oh, man, I mean, I'm not a user of Stitch Fix, right? It's like, I enjoy going out and touching things and putting things on and trying them on, right?

I think Stitch Fix is a place where you kind of go because you want the work offloaded. Whereas I really love the clothing I buy where I have to-- when I land in Japan, I'm doing a 45-minute walk up a giant hill to find this weird denim shop. That's the stuff that really excites me.

But I think the bigger thing that's really captured is this idea that narrative matters a lot to human beings. And I think the recommendation system, that's really hard to capture. It's easy to sell-- it's easy to use AI to sell a $20 shirt. But it's really hard for AI to sell a $500 shirt.

But people are buying $500 shirts, you know what I mean? There's definitely something that we can't really capture just yet that we probably will figure out how to in the future. Well, it'll probably-- I'll put it in JSON, which is what we're going to turn to next. So then you went on a sabbatical to South Park Commons in New York, which is unusual because it's usually-- Yeah, so basically in 2020, really, I was just enjoying working a lot.

And so I was just building a lot of stuff. This is where we were making the tens of millions of dollars doing stuff. And then I had a hand injury, and so I really couldn't code anymore for about a year, two years. And so I kind of took half of it as medical leave.

The other half, I became more of a tech lead, just making sure the lights were on for the systems. And then when I went to New York, I spent some time there and kind of just wound down the tech work, did some pottery, did some jiu jitsu. And after ChatGPT came out, I was like, oh, I clearly need to figure out what is going on here because something feels very magical, and I don't understand it.

So I spent basically five months just prompting and playing around with stuff. And then afterwards, it was just my startup friends going like, hey, Jason, my investors want us to have an AI strategy. Can you help us out? And it just snowballed more and more until I was making this my full-time job.

And you had YouTube University and a journaling app, a bunch of other explorations. But it seems like the most productive or the best-known thing that came out of your time there was Instructor. Yeah, written on the bullet train in Japan. Well, tell us the origin story. Yeah, I mean, I think at some point, tools like Guardrails and Marvin came out, and those are kind of tools that use XML and Pydantic to get structured data out.

But they really were doing things sort of in the prompt. And these are built with sort of the Instruct models in mind. And I really-- like, I'd already done that in the past. At Stitch Fix, one of the things we did was we would take a request note and turn that into a JSON object that we would use to send it to our search engine, right?

So if you said, like, I wanted skinny jeans that were this size, that would turn into JSON that we would send to our internal search APIs. It always felt kind of gross. A lot of it is just, like, you read the JSON, you parse it, you make sure the names are strings and ages are numbers, and you do all this messy stuff.

But when Function Calling came out, it was very much sort of a new way of doing things. Function Calling lets you define the schema separate from the data and the instructions. And what this meant was you can kind of have a lot more complex schemas and just map them in Pydantic.

And then you can just keep those very separate. And then once you add, like, methods, you can add validators and all that kind of stuff. The one thing I really had with a lot of these libraries, though, was it was doing a lot of the string formatting themselves, which was fine when it was the instruction tune models.

You just have a string. But when you have these new chat models, you have these chat messages. And I just didn't really feel like not being able to access that for the developer was sort of a good benefit that they would get. And so I just said, let me write the most simple SDK around the OpenAI SDK, simple wrapper on the SDK, just handle the response model a bit, and kind of think of myself more like requests than actual framework that people can use.

And so the goal is, hey, this is something that you can use to build your own framework. But let me just do all the boring stuff that nobody really wants to do. People want to build their own frameworks. People don't want to build JSON parsing. And the retrying and all that other stuff.
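
A minimal sketch of that "requests, not Django" shape, assuming a recent Instructor and OpenAI SDK; the model name and fields are just illustrative.

```python
# Thin wrapper over the OpenAI SDK: pass a Pydantic model as response_model
# and get a typed object back instead of a raw string to parse.
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserDetail(BaseModel):
    name: str
    age: int


client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4-turbo",
    response_model=UserDetail,  # schema lives in code, separate from the prompt
    messages=[{"role": "user", "content": "Jason is 30 years old."}],
)
print(user.name, user.age)  # typed access with autocomplete in the IDE
```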

Yeah. Yeah, we had a little bit of this discussion before the show. But that design principle of going for being requests rather than being Django, what inspires you there? This has come from a lot of prior pain. Are there other open source projects that kind of inspired your philosophy here?

Yeah, I mean, I think it would be requests. I think it is just the obvious thing you install. If you were going to go make HTTP requests in Python, you would obviously import requests. Maybe if you want to do more async work, there's future tools. But you don't really even think about installing it.

And when you do install it, you don't think of it as like, oh, this is a requests app. No, this is just Python. The bigger question is, a lot of people ask questions like, oh, why isn't requests in the standard library? That's how I want my library to feel.

It's like, oh, if you're going to use the LLM SDKs, you're obviously going to install Instructor. And then I think the second question would be, oh, how come Instructor doesn't just go into OpenAI, go into Anthropic? If that's the conversation we're having, that's where I feel like I've succeeded.

Yeah, it's so standard, you may as well just have it in the base libraries. And the shape of the request has stayed the same. But initially, function calling was maybe equal to structured outputs for a lot of people. I think now the models also support JSON mode and some of these things.

And "return JSON or my grandma is going to die." All of that stuff is maybe still up for debate. How have you seen that evolution? Maybe what's the metagame today? Should people just forget about function calling for structured outputs? Or when is structured output, like JSON mode, the best versus not?

We'd love to get any thoughts, given that you do this every day. Yeah, I would almost say these are like different implementations of-- the real thing we care about is the fact that now we have typed responses to language models. And because we have that typed response, my IDE is a little bit happier.

I get autocomplete. If I'm using the response wrong, there's a little red squiggly line. Those are the things I care about. In terms of whether or not JSON mode is better, I usually think it's almost worse unless you want to spend less money on the prompt tokens that the function call represents.

Primarily because with JSON mode, you don't actually specify the schema. So sure, json.loads works. But really, I care a lot more than just the fact that it is JSON. I think function calling gives you a tool to specify the fact that, OK, this is a list of objects that I want.

And each object has a name or an age. And I want the age to be above 0. And I want to make sure it's parsed correctly. That's where function calling really shines. Any thoughts on single versus parallel function calling? When I first started-- so I did a presentation at our AI in Action Discord channel, and obviously, showcased Instructor.
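
A rough sketch of the kind of schema he's describing here, where the constraint (age above 0) lives on the model rather than in the prompt; the fields are illustrative.

```python
# Illustrative schema: the validation travels with the type, not the prompt.
from pydantic import BaseModel, Field


class Person(BaseModel):
    name: str
    age: int = Field(gt=0)  # "I want the age to be above 0"


class People(BaseModel):
    people: list[Person]

# Passed as response_model=People, a response with a bad age fails Pydantic
# validation instead of silently slipping through as an untyped string.
```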

One of the big things that we had before with single function calling is like when you're trying to extract lists, you have to make these funky properties that are lists to then actually return all the objects. How do you see the hack being put on the developer's plate versus more of the stuff just getting better in the model?

And I know you tweeted recently about Anthropic, for example, some lists are not lists, they're strings. And there's all of these discrepancies. I almost would prefer it if it was always a single function call. But obviously, there are the agent workflows that Instructor doesn't really support that well, but are things that ought to be done.

You could define, I think, maybe like 50 or 60 different functions in a single API call. And if it was like get the weather, or turn the lights on, or do something else, it makes a lot of sense to have these parallel function calls. But in terms of an extraction workflow, I definitely think it's probably more helpful to have everything be a single schema.

Just because you can specify relationships between these entities that you can't do in parallel function calling, you can have a single chain of thought before you generate a list of results. There's small API differences, right? Where, yeah, if it's for parallel function calling, if you do one, again, I really care about how the SDK looks.

And so it's, OK, do I always return a list of functions, or do you just want to have the actual object back out? You want to have autocomplete over that object. What's the cap for how many function definitions you can put in where it still works well? Do you have any sense on that?

I mean, for the most part, I haven't really had a need to do anything that's more than six or seven different functions. I think in the documentation, they support way more. But I don't even know if there's any good evals that have over two dozen function calls. I think if you run into issues where you have 20, or 50, or 60 function calls, I think you're much better having those specifications saved in a vector database, and then have them be retrieved.

So if there are 30 tools, you should basically be ranking them, and then using the top K to do selection a little bit better, rather than just shoving 60 functions into a single API. Yeah. Well, I mean, so I think this is relevant now, because previously, I think context limits prevented you from having more than a dozen tools anyway.
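
A hypothetical sketch of that ranking idea: embed the tool descriptions once, embed the user query, and only pass the top-k tools to the API instead of all 60. The helper names here are made up; in practice the description embeddings would be precomputed and stored in a vector database.

```python
# Hypothetical: rank tool specs by embedding similarity and keep the top k.
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> list[float]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def top_k_tools(query: str, tools: list[dict], k: int = 5) -> list[dict]:
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = lambda v: sum(x * x for x in v) ** 0.5
        return dot / (norm(a) * norm(b))

    q = embed(query)
    # In production, precompute and cache these description embeddings.
    scored = [(cosine(q, embed(t["function"]["description"])), t) for t in tools]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:k]]

# Then: client.chat.completions.create(..., tools=top_k_tools(user_msg, all_tools))
```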

And now that we have a million token context windows, Claude recently, with their new function calling release, said they can handle over 250 tools, which is insane to me. That's a lot. I would say, you're saying you don't think there's many people doing that. I think anyone with a sort of agent-like platform where you have a bunch of connectors, they would run into that problem.

Probably, you're right that they should use a vector database and kind of RAG their tools. I know Zapier has like a few thousand, like 8,000, 9,000 connectors that obviously don't fit anywhere. So yeah, I mean, I think that would be it, unless you need some kind of intelligence that chains things together, which is, I think, what Alessio is coming back to.

There is this trend about parallel function calling. I don't know what I think about that. Anthropic's version was-- I think they used multiple tools in sequence, but they're not in parallel. I haven't explored this at all. I'm just throwing this open to you as to what do you think about all these things.

You know, do we assume that all function calls could happen in any order? I think there's a lot of-- in which case, we either can assume that, or we can assume that things need to happen in some kind of sequence as a DAG. But if it's a DAG, really, that's just one JSON object that is the entire DAG, rather than going, OK, the order of the functions that I return doesn't matter.

That's just-- that's definitely just not true in practice. If I have a thing that's like, turn the lights on, unplug the power, and then turn the toaster on or something, the order matters. And it's unclear how well you can describe the importance of that reasoning to a language model yet.

I mean, I'm sure you can do it with good enough prompting, but I just haven't had any use cases where the function sequence really matters. Yeah. To me, the most interesting thing is the models are better at picking than your ranking is, usually. Like, I'm incubating a company around system integration.

And for example, with one system, there are like 780 endpoints. And if you actually try and do vector similarity, it's not that good, because the people that wrote the specs didn't have in mind making them semantically apart. They're kind of like, oh, create this, create this, create this. Versus when you give it to a model, and you put-- like in Opus, you put them all, it's quite good at picking which ones you should actually run.

And I'm curious to see if the model providers actually care about some of those workflows, or if the agent companies are actually going to build very good rankers to kind of fill that gap. Yeah, my money is on the rankers, because you can do those so easily. You could just say, well, given the embeddings of my search query and the embeddings of the description, I can just train XGBoost and just make sure that I have very high MRR, which is mean reciprocal rank.

And so the only objective is to make sure that the tools you use are in the top-n filter. That feels super straightforward, and you don't have to actually figure out how to fine-tune a language model to do tool selection anymore. Yeah, I definitely think that's the case.
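
Mean reciprocal rank itself is a one-liner to evaluate; a toy sketch with made-up data:

```python
# MRR: average of 1 / (rank of the first relevant tool) across queries.
def mean_reciprocal_rank(rankings: list[list[str]], relevant: list[str]) -> float:
    total = 0.0
    for ranked_tools, correct_tool in zip(rankings, relevant):
        if correct_tool in ranked_tools:
            total += 1.0 / (ranked_tools.index(correct_tool) + 1)
    return total / len(relevant)

# Correct tool ranked 1st in one query and 2nd in another -> (1 + 0.5) / 2 = 0.75
print(mean_reciprocal_rank(
    [["get_weather", "set_alarm"], ["set_alarm", "get_weather"]],
    ["get_weather", "get_weather"],
))
```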

Because for the most part, I imagine you either have less than three tools or more than 1,000. I don't know what kind of companies say, oh, thank God, we only have like 185 tools. And this works perfectly, right? That's right. And before we maybe move on just from this, it was interesting to me you retweeted this thing about Anthropic function calling, and it was Joshua Brown retweeting some benchmark that's like, oh my God, Anthropic function calling, so good.

And then you retweeted it, and then you tweeted later, and it's like, it's actually not that good. What's your flow for like, how do you actually test these things? Because obviously, the benchmarks are lying, right? Because the benchmark says it's good, and you said it's bad, and I trust you more than the benchmark.

How do you think about that, and then how do you evolve it over time? Yeah, it's mostly just client data. I think when-- I actually have been mostly busy with enough client work that I haven't been able to reproduce public benchmarks, and so I can't even share some of the results for Anthropic.

But I would just say, in production, we have some pretty interesting schemas, where it's iteratively building lists, where we're doing updates of lists, like we're doing in-place updates, so upserts and inserts. And in those situations, we're like, oh, yeah, we have a bunch of different parsing errors. Numbers are being returned as strings.

We were expecting lists of objects, but we're getting strings that are like the strings of JSON. So we had to call JSON parse on individual elements. Overall, I'm super happy with the Anthropic models compared to the OpenAI models. Like, Sonnet is very cost-effective. Haiku is-- in function calling, it's actually better.

But I think we just had to file down the edges a little bit, where our tests pass, but then we actually apply to production, we get half a percent of traffic having issues, where if you ask for JSON, it'll still try to talk to you. Or if you use function calling, we'll have a parse error.

And so I think these are things that are definitely going to be things that are fixed in the upcoming weeks. But in terms of the reasoning capabilities, man, it's hard to beat 70% cost reduction, especially when you're building consumer applications. If you're building something for consultants or private equity, you're charging $400.

It doesn't really matter if it's $1 or $2. But for consumer apps, it makes products viable. If you can go from GPT-4 to Sonnet, you might actually be able to price it better. I had this chart about the ELO versus the cost of all the models. And you could put trend graphs on each of those things about higher ELO equals higher cost, except for Haiku.

Haiku kind of just broke the lines, or the iso-ELOs, if you want to call it that. Cool. Before we go too far into your opinions on just the overall ecosystem, I want to make sure that we map out the surface area of Instructor. I would say that most people would be familiar with Instructor from your talks, and your tweets, and all that.

You had the number one talk from the AI Engineer Summit. Two Lius, Jason Liu and Jerry Liu. Yeah, yeah, yeah. Start with a J and then a Liu to do well. But yeah, until I actually went through your cookbook, I didn't realize the surface area. How would you categorize the use cases?

You have LLM self-critique. You have knowledge graphs in here. You have PII data sanitization. How do you characterize the people? What is the surface area of Instructor? Yeah, so this is the part that feels crazy. Because really, the difference is LLMs give you strings, and Instructor gives you data structures.

And once you get data structures again, you can do every LeetCode problem you ever thought of. And so I think there's a couple of really common applications. The first one, obviously, is extracting structured data. This is just, OK, well, I want to put in an image of a receipt.

I want to give back out a list of checkout items with a price, and a fee, and a coupon code, or whatever. That's one application. Another application really is around extracting graphs out. So one of the things we found out about these language models is that not only can you define nodes, it's really good at figuring out what are nodes and what are edges.

And so we have a bunch of examples where not only do I extract that this happens after that, but also, OK, these two are dependencies of another task. And you can do extracting complex entities that have relationships. Given a story, for example, you could extract relationships of families across different characters.
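
A rough sketch of what a graph-extraction schema like that can look like, with illustrative field names:

```python
# Illustrative graph schema: the model fills in nodes and edges, which is
# enough to capture "happens after" or "depends on" style relationships.
from pydantic import BaseModel


class Node(BaseModel):
    id: int
    label: str


class Edge(BaseModel):
    source: int
    target: int
    relation: str  # e.g. "happens_after", "depends_on", "sibling_of"


class KnowledgeGraph(BaseModel):
    nodes: list[Node]
    edges: list[Edge]

# Passed as response_model=KnowledgeGraph to an Instructor-patched client,
# e.g. to pull family relationships between characters out of a story.
```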

This can all be done by defining a graph. And then the last really big application really is just around query understanding. The idea is that any API call has some schema. And if you can define that schema ahead of time, you can use a language model to resolve a request into a much more complex request, one that an embedding could not do.

So for example, I have a really popular post called, like, "RAG Is More Than Embeddings." And effectively, if I have a question like this, what was the latest thing that happened this week? That embeds to nothing. But really, that query should just be select all data where the date time is between today and today minus seven days.

What if I said, how did my writing change between this month and last month? Again, embeddings would do nothing. But really, if you could do a group by over the month and a summarize, then you could, again, do something much more interesting. And so this really just calls out the fact that embeddings really is kind of like the lowest hanging fruit.

And using something like Instructor can really help produce a data structure. And then you can just use your computer science to reason about this data structure. Maybe you say, OK, well, I'm going to produce a graph where I want to group by each month and then summarize them jointly.

You can do that if you know how to define this data structure. In that part, you kind of run up against the LangChains of the world that used to have that. They still do have the self-querying, I think they used to call it, when we had Harrison on in our episode.
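
A sketch of the query-understanding example above, assuming an Instructor-patched client; the field names are illustrative.

```python
# "What was the latest thing that happened this week?" resolves into a
# structured date-range query rather than an embedding lookup.
from datetime import date
from pydantic import BaseModel


class SearchQuery(BaseModel):
    query: str        # rewritten search terms, if any
    start_date: date
    end_date: date


# search = client.chat.completions.create(
#     model="gpt-4-turbo",
#     response_model=SearchQuery,
#     messages=[{"role": "user",
#                "content": "What was the latest thing that happened this week?"}],
# )
# -> roughly SearchQuery(query="latest updates",
#                        start_date=<today - 7 days>, end_date=<today>)
# ...which maps onto: SELECT * FROM events WHERE created_at BETWEEN start AND end
```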

How do you see yourself interacting with the other, I guess, LLM frameworks in the ecosystem? - Yeah, I mean, if they use Instructor, I think that's totally cool. I think because it's just, again, it's just Python. It's asking, oh, how does Django interact with requests? Well, you might just make a requests.get call in a Django app.

But no one would say, oh, I went off of Django because I'm using requests now. It should, ideally, be the wrong comparison. In terms of especially the agent workflows, I think the real goal for me is to go down the LLM compiler route, which is instead of doing a ReAct-type reasoning loop, I think my belief is that we should be using workflows.

If we do this, then we always have a request and a complete workflow. We can fine-tune a model that has a better workflow. Whereas it's hard to think about how do you fine-tune a better ReAct loop. Do you want to always train it to have less looping? In which case, you want it to get the right answer the first time, in which case, it was a workflow to begin with.

- Can you define workflow? Because I think, obviously, I used to work at a workflow company, but I'm not sure this is a well-defined framework for everybody. - I'm thinking workflow in terms of a Prefect or Zapier workflow. I want to build a DAG. I want you to tell me what the nodes and edges are.

And then maybe the edges are also put in with AI. But the idea is that I want to be able to present you the entire plan and then ask you to fix things as I execute it, rather than going, hey, I couldn't parse the JSON, so I'm going to try again.

I couldn't parse the JSON, I'm going to try again. And then next thing you know, you spent $2 on OpenAI credits. Whereas with the plan, you can just say, oh, the edge between node x and y does not run. Let me just iteratively try to fix that component. Once that's fixed, go on to the next component.

And obviously, you can get into a world where, if you have enough examples of the nodes x and y, maybe you can use a vector database to find a good few-shot examples. You can do a lot if you break down the problem into that workflow and execute in that workflow, rather than looping and hoping the reasoning is good enough to generate the correct output.
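
A hypothetical sketch of that plan-then-execute idea: the model returns the whole DAG as data, and each node is run (and repaired) individually instead of re-looping the entire task. The run_or_repair callback is a made-up placeholder.

```python
# Hypothetical: the LLM returns a Plan (nodes + dependency edges) up front;
# execution walks the DAG and repairs individual nodes when they fail.
from pydantic import BaseModel, Field


class Task(BaseModel):
    id: int
    description: str
    depends_on: list[int] = Field(default_factory=list)  # edges of the DAG


class Plan(BaseModel):
    tasks: list[Task]


def execute(plan: Plan, run_or_repair) -> None:
    done: set[int] = set()
    remaining = {t.id: t for t in plan.tasks}
    while remaining:
        ready = [t for t in remaining.values() if set(t.depends_on) <= done]
        if not ready:
            raise RuntimeError("cycle or unsatisfiable dependency in the plan")
        for task in ready:
            run_or_repair(task)  # retry / fix just this node, not the whole loop
            done.add(task.id)
            del remaining[task.id]
```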

Yeah, I would say I've been hammering on Devin a lot. I got access a couple of weeks ago. And obviously, for simple tasks, it does well. For the complicated, more than 10, 20-hour tasks, I can see it-- That's a crazy comparison. We used to talk about three, four loops.

Only once it gets to hour tasks, it's hard. Yeah. Less than an hour, there's nothing. That's crazy. I mean, I don't know. Yeah, OK, maybe my goalposts have shifted. I don't know. That's incredible. I'm like sub-one-minute executions. The fact that you're talking about 10 hours is incredible. I think it's a spectrum.

I actually-- I really, really-- I think I'm going to say this every single time I bring up Devin. Let's not reward them for taking longer to do things. Do you know what I mean? Like, that's a metric that is easily abusable. Sure. Yeah. You can run a game. Yeah, but all I'm saying is you can monotonically increase the success probability over an hour.

That's winning to me. Obviously, if you run an hour and you've made no progress-- like, I think when we were in AutoGPT land, there was that one example where I wanted it to buy me a bicycle. And overnight, I spent $7 on credits, and I never found the bicycle.

Yeah, yeah. I wonder if you'll be able to purchase a bicycle. Because it actually can do things in the real world, it just needs to suspend to you for auth and stuff. But the point I was trying to make was that I can see it turning plans. Like, when it gets on-- I think one of the agent loopholes, or one of the things that is a real barrier for agents, is LLMs really like to get stuck in a lane.

And what you're talking about, what I've seen Devin do is it gets stuck in a lane, and it will just kind of change plans based on the performance of the plan itself. And it's kind of cool. Yeah, I feel like we've gone too much in the looping route. And I think a lot more plans and DAGs and data structures are probably going to come back to help fill in some holes.

Yeah. What's the interface to that? Do you see it as like an existing state machine kind of thing that connects to the LLMs, the traditional DAG players? So do you think we need something new for AI DAGs? Yeah, I mean, I think that the hard part is going to be describing visually the fact that this DAG can also change over time, and it should still be allowed to be fuzzy, right?

I think in mathematics, we have plate diagrams, and Markov chain diagrams, and recurrence states, and all that. Some of that might come into this workflow world. But to be honest, I'm not too sure. I think right now, the first steps are just how do we take this DAG idea and break it down to modular components that we can prompt better, have few-shot examples for, and ultimately fine-tune against.

But in terms of even the UI, it's hard to say what will likely win. I think people like Prefect and Zapier have a pretty good shot at doing a good job. Yeah. So you seem to use Prefect a lot. I actually worked at a Prefect competitor, Temporal. And I'm also very familiar with Dagster.

What else would you call out as particularly interesting in the AI engineering stack? Man, I almost use nothing. I just use Cursor and pytest. Oh, OK. I think that's basically it. A lot of the observability companies have-- the more observability companies I've tried, the more I just use Postgres.

Really? OK. Postgres for observability? But the issue, really, is the fact that these observability companies aren't actually doing observability for the system. It's just doing the LLM thing. I still end up using Datadog or Sentry to do latency. And so I just have those systems handle it. And then the prompt-in, prompt-out, latency, and token costs, I just put that in a Postgres table now.
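
A sketch of that "just put it in Postgres" approach; the table and column names are made up for illustration.

```python
# One table, one insert per LLM call: prompt in, response out, tokens, latency.
import time
import psycopg2

conn = psycopg2.connect("dbname=llm_logs")

def log_llm_call(prompt: str, response: str, prompt_tokens: int,
                 completion_tokens: int, started_at: float) -> None:
    latency_ms = int((time.time() - started_at) * 1000)
    with conn, conn.cursor() as cur:  # commits on success
        cur.execute(
            """
            INSERT INTO llm_calls
                (prompt, response, prompt_tokens, completion_tokens, latency_ms)
            VALUES (%s, %s, %s, %s, %s)
            """,
            (prompt, response, prompt_tokens, completion_tokens, latency_ms),
        )
```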

So you don't need 20 funded startups building LLM ops? Yeah, but I'm also an old, tired guy. Because of my background, I was like, yeah, the Python stuff I'll write myself. But I will also just use Vercel happily. Because I'm just not familiar with that world of tooling. Whereas I think I spent three good years building observability tools for recommendation systems.

And I was like, oh, compared to that, Instructor is just one call. I just have to put time start, time end, and then count the prompt token. Because I'm not doing a very complex looping behavior. I'm doing mostly workflows and extraction. Yeah, I mean, while we're on this topic, we'll just kind of get this out of the way.

You famously have decided to not be a venture-backed company. You want to do the consulting route. The obvious route for someone as successful as Instructor is like, oh, here's hosted Instructor with all tooling. And you just said you had a whole bunch of experience building observability tooling. You have the perfect background to do this, and you're not.

Yeah, isn't that sick? I think that's sick. I know. I mean, I know why, because you want to go free dive. But-- Yeah, well, yeah, because I think there's two things. One, it's like, if I tell myself I want to build requests, requests is not a venture-backed startup. I mean, one could argue whether or not Postman is.

But I think for the most part, having worked so much, I'm kind of, like, I am more interested in looking at how systems are being applied and just having access to the most interesting data. And I think I can do that more through a consulting business where I can come in and go, oh, you want to build perfect memory.

You want to build an agent. You want to build, like, automations over construction or, like, insurance and the supply chain. Or you want to handle, like, writing, like, private equity, like, mergers and acquisitions reports based off of user interviews. Those things are super fun. Whereas, like, maintaining the library, I think, is mostly just kind of, like, a utility that I try to keep up, especially because if it's not venture-backed, I have no reason to sort of go down the route of, like, trying to get 1,000 integrations.

Like, in my mind, I just go, oh, OK, 98% of the people use OpenAI. I'll support that. And if someone contributes another, like, platform, that's great. I'll merge it in. But yeah, I mean, you only added Anthropic support, like, this year. Yeah, yeah, yeah. The thing, a lot of it was just, like, you couldn't even get an API key until, like, this year, right?

Yeah, that's true, that's true. And so, OK, if I added it, like, last year, I would kind of be trying to, like, double the code base to service, you know, half a percent of all downloads. Do you think the market share will shift a lot now that Anthropic has, like, a very, very competitive offering?

I think it's still hard to get API access. I don't know if it's fully GA now, if it's GA, if you can get commercial access really easily. I don't know. I got commercial access after, like, two weeks of reaching out to their sales team. OK, yeah, so two weeks. Yeah, there's a call list here.

And then anytime you run into rate limits, just, like, ping one of the Anthropic staff members. Then maybe we need to, like, cut that part out so I don't, like, you know, spread false news. But it's a common question. Surely, just from the price perspective, it's going to make a lot of sense.

Like, if you are a business, you should totally consider, like, Sonnet, right? Like, the cost savings is just going to justify it if you actually are doing things at volume. And yeah, I think their SDK is, like, pretty good. But back to the Instructor thing, I just don't think it's a billion-dollar company.

And I think if I raise money, the first question is going to be, like, how are you going to get a billion-dollar company? And I would just go, like, man, like, if I make a million dollars as a consultant, I'm super happy. I'm, like, more than ecstatic. I can have, like, a small staff of, like, three people.

Like, it's fun. And I think a lot of my happiest founder friends are those who, like, raised the tiniest seed round, became profitable, and they're making, like, 60, 70K MRR. And they're, like, we don't even need to raise the seed round. Like, let's just keep it, like, between me and my co-founder, we'll go traveling, and it'll be a great time.

I think it's a lot of fun. - I repeat that as a seed investor in the company. I think that's, like, one of the things that people get wrong sometimes, and I see this a lot. They have an insight into, like, some new tech, like, say LLM, say AI, and they build some open source stuff, and it's like, I should just raise money and do this.

And I tell people a lot, it's like, look, you can make a lot more money doing something else than doing a startup. Like, most people that do a company could make a lot more money just working somewhere else than doing the company itself. Do you have any advice for folks that are maybe in a similar situation?

They're trying to decide, oh, should I stay in my, like, high-paid fang job and just tweet this on the side and do this on GitHub? Should I be a consultant? Like, being a consultant seems like a lot of work. It's like, you got to talk to all these people, you know?

- There's a lot to unpack, because I think the open source thing is just like, well, I'm just doing it for, like, purely for fun, and I'm doing it because I think I'm right. But part of being right is the fact that it's not a venture-backed startup. Like, I think I'm right because this is all you need.

Right? Like, you know. So I think a part of it is just, like, part of the philosophy is the fact that all you need is a very sharp blade to sort of do your work, and you don't actually need to build, like, a big enterprise. So that's one thing.

I think the other thing, too, that I've been thinking around, just because I have a lot of friends at Google that want to leave right now, it's like, man, like, what we lack is not money or, like, skill. Like, what we lack is courage. Like, you just have to do this, the hard thing, and you have to do it scared anyways, right?

In terms of, like, whether or not you do want to be a founder, I think that's just a matter of, like, optionality. But I definitely recognize that the, like, expected value of being a founder is still quite low. - It is. - Right. Like, I know as many founder breakups as I know friends who raised a seed round this year.

Right? And, like, that is, like, the reality. And, like, you know, even from my perspective, it's been tough where it's like, oh, man, like, a lot of incubators want you to have co-founders. Now you spend half the time, like, fundraising and then trying to, like, meet co-founders and find co-founders rather than building the thing.

And I was like, man, like, this is a lot of stuff, a lot of time spent out doing things I'm not really good at. I think, I do think there's a rising trend in solo founding. You know, I am solo. I think that something like 30% of, like, I think, I forget what the exact stat is, something like 30% of startups that make it to, like, Series B or something actually are solo founders.

So I think, I feel like this must-have co-founder idea mostly comes from YC, and everyone else copies it. And then, yeah, plenty of companies break up over co-founder breakups. - Yeah, and I bet it would be, like, I wonder how much of it is the people who don't have that much, like, and I hope this is not a diss to anybody, but it's like, you sort of, you go through the incubator route because you don't have, like, the social equity you would need to just sort of, like, send an email to Sequoia and be, like, "Hey, I'm going on this ride. Do you want a ticket on the rocket ship?" Right, like, that's very hard to sell.

Like, if I was to raise money, like, that's kind of, like, my message if I was to raise money is, like, "You've seen my Twitter. My life is sick. I've decided to make it much worse by being a founder because this is something I have to do. So do you want to come along? Otherwise, I'm gonna fund it myself." Like, if I can't say that, like, I don't need the money 'cause, like, I can, like, handle payroll and, like, hire an intern and get an assistant. Like, that's all fine. But, like, what I don't want to do, it's, like, I really don't want to go back to Meta. I want to, like, get two years to, like, try to find a problem we're solving. That feels like a bad time.

- Yeah. Jason is like, "I wear a YSL jacket on stage at AI Engineer Summit. I don't need your accelerator money." - And boots. You don't forget the boots.

- That's true, that's true. - You have really good boots, really good boots. But I think that is a part of it, right? I think it is just, like, optionality. And also, just, like, I'm a lot older now. I think 22-year-old Jason would have been probably too scared, and now I'm, like, too wise.

But I think it's a matter of, like, oh, if you raise money, you have to have a plan of spending it. And I'm just not that creative with spending that much money. - Yeah. I mean, to be clear, you just celebrated your 30th birthday. Happy birthday. - Yeah, it's awesome.

I'm going to Mexico next weekend. - You know, a lot older is relative to some of the folks listening. (laughing) - Staying on the career tips, I think Swyx had a great post about are you too old to get into AI? I saw one of your tweets in January '23.

You applied to, like, Figma, Notion, Cohere, Anthropic, and all of them rejected you because you didn't have enough LLM experience. I think at that time, it would be easy for a lot of people to say, oh, I kind of missed the boat, you know? I'm too late, not going to make it, you know?

Any advice for people that feel like that, you know? - Yeah, I mean, like, the biggest learning here is actually from a lot of folks in jiu-jitsu. They're like, oh, man, is it too late to start jiu-jitsu? Like, oh, I'll join jiu-jitsu once I get in more shape. Right?

It's like, there's a lot of, like, excuses. And then you say, oh, like, why should I start now? I'll be, like, 45 by the time I'm any good. And it's like, well, you'll be 45 anyways. Like, time is passing. Like, if you don't start now, you start tomorrow. You're just, like, one more day behind.

And if you're, like, if you're worried about being behind, like, today is, like, the soonest you can start. Right? And so you got to recognize that, like, maybe you just don't want it, and that's fine too. Like, if you wanted it, you would have started. Like, you know. I think a lot of these people, again, probably think of things on a too short time horizon.

But again, you know, you're going to be old anyways. You may as well just start now. - You know, one more thing on, I guess, the career advice slash sort of blogging. You always go viral for this post that you wrote on advice to young people and the lies you tell yourself.

- Oh, yeah, yeah, yeah. - You said that you were writing it for your sister. Like, why is that? - Yeah, yeah, yeah. Yeah, she was, like, bummed out about, like, you know, going to college and, like, stressing about jobs. And I was like, oh, and I really want to hear, okay.

And I just kind of, like, texted through the whole thing. It's crazy. It's got, like, 50,000 views. I'm like, I don't mind. - I mean, your average tweet has more. - But that thing is, like, you know, a 30-minute read now. - Yeah, yeah. So there's lots of stuff here, which I agree with.

You know, I also occasionally indulge in the sort of life reflection phase. There's the how to be lucky. There's the how to have higher agency. I feel like the agency thing is always a trend in SF or just in tech circles. - How do you define having high agency?

- Yeah, I mean, I'm almost, like, past the high agency phase now. Now my biggest concern is, like, okay, the agency is just, like, the norm of the vector. What also matters is the direction, right? It's, like, how pure is the shot? Yeah, I mean, I think agency is just a matter of, like, having courage and doing the thing.

That's scary, right? Like, you know, if you want to go rock climbing, it's, like, do you decide you want to go rock climbing, and then you show up to the gym, you rent some shoes, and you just fall 40 times? Or do you go, like, oh, like, I'm actually more intelligent.

Let me go research the kind of shoes that I want. Okay, like, there's flatter shoes and more inclined shoes. Like, which one should I get? Okay, let me go order the shoes on Amazon. I'll come back in three days. Like, oh, it's a little bit too tight. Maybe it's too aggressive.

I'm only a beginner. Let me go change. No, I think the higher agency person just, like, goes and, like, falls down 20 times, right? Yeah, I think the higher agency person is more focused on, like, process metrics versus outcome metrics, right? Like, from pottery, like, one thing I learned was if you want to be good at pottery, you shouldn't count, like, the number of cups or bowls you make.

You should just weigh the amount of clay you use, right? Like, the successful person says, oh, I went through 1,000 pounds of clay, 100 pounds of clay, right? The less agency person's like, oh, I made six cups, and then after I made six cups, like, there's not really, what do you do next?

No, just pounds of clay, pounds of clay. Same with the work here, right? It's like, oh, you just got to write the tweets, like, make the commits, contribute open source, like, write the documentation. There's no real outcome, it's just a process, and if you love that process, you just get really good at the thing you're doing.

- Yeah, so just to push back on this, 'cause obviously I mostly agree, how would you design performance review systems? (laughing) Because you were effectively saying we can count lines of code for developers, right? Like, did you put out-- - No, I don't think that would be the actual, like, I think if you make that an outcome, like, I can just expand a for loop, right?

I think, okay, so for performance review, this is interesting because I've mostly thought of it from the perspective of science and not engineering. Like, I've been running a lot of engineering stand-ups, primarily because there's not really that many machine learning folks. Like, the process outcome is experiments and ideas, right? Like, if you think about outcomes, what you might want to think about as an outcome is, oh, I want to improve the revenue or whatnot, but that's really hard.

But if you're someone who is going out like, okay, like this week, I want to come up with like three or four experiments that might move the needle. Okay, nothing worked. To them, they might think, oh, nothing worked, like, I suck. But to me, it's like, wow, you've closed off all these other possible avenues for, like, research.

Like, you're gonna get to the place that you're gonna figure out that direction really soon, right, like, there's no way you'd try 30 different things and none of them work. Usually, like, you know, 10 of them work, five of them work really well, two of them work really, really well, and one thing, like, you know, hits the nail on the head.

So agency lets you sort of capture the volume of experiments. And, like, experience lets you figure out, like, oh, that other half, it's not worth doing, right? Like, I think experience is gonna go, half these prompting papers don't make any sense, just use a chain of thought and just, you know, use a for loop.

But that's kind of, that's basically it, right? It's like, usually performance for me is around, like, how many experiments are you running? Like, how often are you trying? - Yeah. - When do you give up on an experiment? Because at Stitch Fix, you kind of gave up on language models, I guess, in a way, as a tool to use.

And then maybe the tools got better. They got better before, you know, you were kind of like, you were right at the time and then the tool improved. I think there are similar paths in my engineering career where I try one approach and at the time it doesn't work and then the thing changes, but then I kind of soured on that approach and I don't go back to it soon enough.

- I see. What do you think about that loop? - So usually when I, like, when I'm coaching folks and they say, like, oh, these things don't work, I'm not going to pursue them in the future. Like, one of the big things is, like, hey, the negative result is a result and this is something worth documenting.

Like, this isn't academia. Like, if it's negative, you don't just, like, not publish it. But then, like, what do you actually write down? Like, what you should write down is, like, here are the conditions. This is the inputs and the outputs we tried the experiment on. And then one thing that's really valuable is basically writing down under what conditions would I revisit these experiments, right?

It's like, these things don't work because of what we had at the time. If someone is reading this two years from now, under what conditions will we try again? That's really hard, but again, that's like another, that's like another skill you kind of learn, right? It's like, you do go back and you do experiments and you figure out why it works now.

I think a lot of it here is just, like, scaling worked. - Yeah. - Right, like, the rap lyrics thing, you know, like, that was because I did not have high enough quality data. If we phase shift and say, okay, you don't even need training data. So, oh, great, then it might just work.

- Yeah. - Different domain. - Do you have any, anything in your list that is like, it doesn't work now, but I want to try it again later? Something that people should, maybe keep in mind, you know, people always like, AGI when? You know, when are you going to know the AGI is here?

Maybe it's less than that, but any stuff that you tried recently that didn't work that you think will get there? - I mean, I think, like, the personal assistants and the writing, I've shown to myself, are just not good enough yet. So, I hired a writer and I hired a personal assistant.

So, now I'm going to basically, like, work with these people until I figure out, like, what I can actually, like, automate and what are, like, the reproducible steps, right? But, like, I think the experiment for me is, like, I'm going to go, like, pay a person, like, $1,000 a month to, like, help me improve my life and then let me, sort of, get them to help me figure out, like, what are the components and how do I actually modularize something to get it to work?

'Cause it's not just, like, OAuth, Gmail, Calendar, and, like, Notion. It's a little bit more complicated than that, but we just don't know what that is yet. Or those are two, sort of, systems that, I wish GPT-4 or Opus was actually good enough to just write me an essay, but most of the essays are still pretty bad.

- Yeah, I would say, you know, on the personal assistant side, Lindy is probably the one I've seen the most. Flo was a speaker at the summit. I don't know if you've checked it out or any other, sort of, agent assistant startups. - Not recently. I haven't tried Lindy.

It was, they were, like, behind, they were not GA last time I was considering it. - Yeah, yeah, they're not GA. - But a lot of it now, it's, like, oh, like, really what I want you to do is, like, take a look at all of my meetings and, like, write, like, a really good weekly summary email for my clients.

Remind them that I'm, like, you know, thinking of them and, like, working for them. Right? Or it's, like, I want you to notice that, like, my Mondays were way, like, way too packed and, like, block out more time and also, like, email the people to do the reschedule and then try to opt in to move them around.

And then I want you to say, oh, Jason should have, like, a 15-minute prep break after four back-to-back meetings. Those are things that, like, now I know I can prompt them in, but can it do it well? Like, before, I didn't even know that's what I wanted to prompt for.

It was, like, defragging a calendar and adding breaks so I can, like, eat lunch. Right? - Yeah, that's the AGI test. - Yeah, exactly. Compassion, right? - I think one thing that, yeah, we didn't touch on it before, but I think was interesting. You had this tweet a while ago about prompts should be code.

And then there were a lot of companies trying to build prompt engineering tooling, kind of trying to turn the prompt into a more structured thing. What's your thought today? Like, you know, now you want to turn the thinking into DAGs, like, should prompts still be code? Like, any updated ideas?

- Nah, it's the same thing, right? I think, like, you know, with Instructor, it is very much, like, the output model is defined as a code object. That code object is sent to the LLM and in return, you get a data structure. So the outputs of these models, I think, should also be code, like, code objects.

And the inputs, somewhat, should be code objects. But I think the one thing that Instructor tries to do is separate instruction, data, and the types of the output. And beyond that, I really just think that, you know, most of it should be still, like, managed pretty closely to the developer.

Like, so much of it is changing that if you give control of these systems away too early, you end up, ultimately, wanting them back. Like, many companies I know that reach out are ones where, like, oh, we're going off of the frameworks because now that we know the business outcomes we're trying to optimize for, these frameworks don't work.

Yeah, 'cause, like, we do RAG, but we want to do RAG to, like, sell you supplements or to have you, like, schedule the fitness appointment. And, like, the prompts are kind of too baked into the systems to really pull them back out and, like, start doing upselling or something.

It's really funny, but a lot of it ends up being, like, once you understand the business outcomes, you care way more about the prompt, right? - Actually, this is fun. So we were trying, in our prep for this call, we were trying to say, like, what can you, as an independent person, say that maybe me and Alessio cannot say or, you know, someone who works at a company can say?

What do you think is the market share of the frameworks? The LangChains, the LlamaIndexes, the everything else. - Oh, massive. 'Cause not everyone wants to care about the code. - Yeah. - Right? It's like, I think that's a different question to, like, what is the business model and are they going to be, like, massively profitable businesses, right?

Like, making hundreds of millions of dollars, that feels, like, so straightforward, right? 'Cause not everyone is a prompt engineer. Like, there's so much productivity to be captured in, like, back-office automations, right? It's not because they care about the prompts, that they care about managing these things. - Yeah, but those are not sort of low-code experiences, you know?

- Yeah, I think the bigger challenge is, like, okay, $100 million, probably pretty easy. It's just time and effort. And they have both, like, the manpower and the money to sort of solve those problems. I think it's just like, again, if you go the VC route, then it's like, you're talking about billions and that's really the goal.

That stuff, for me, it's, like, pretty unclear. - Okay. - But again, that is to say that, like, I sort of am building things for developers who want to use Instructor to build their own tooling. But in terms of the number of developers there are in the world versus, like, downstream consumers of these things, or even just, like, you know, think of how many companies will use, like, the Adobes and the IBMs, right?

Because they want something that's fully managed and they want something that they know will work. And if the incremental 10% requires you to hire another team of 20 people, you might not want to do it. And I think that kind of organization is really good for those bigger companies.

- And I just want to capture your thoughts on one more thing, which is, you said you wanted most of the prompts to stay close to the developer. And Hamel Husain wrote this, like, post which I really love called, like, "FU, Show Me the Prompt." I think it cites you in part of the blog post.

And I think DSPy is kind of, like, the complete antithesis of that, which is, I think, interesting. 'Cause I also hold the strong view that AI is a better prompt engineer than you are. And I don't know how to square that. I'm wondering if you have thoughts. - I think something like DSPy can work because there are, like, very short-term metrics to measure success.

Right? It is, like, did you find the PII? Or, like, did you write the multi-hop question the correct way? But in these, like, workflows that I've been managing, a lot of it is, like, are we minimizing churn and maximizing retention? Like, that's not really, like, an Optuna-style training loop, right?

Like, those things are much harder to capture. So we don't actually have those metrics for that, right? And obviously, we can figure out, like, okay, is the summary good? But then, like, how do you measure the quality of the summary, right? It's, like, that feedback loop ends up being a lot longer.
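To make that contrast concrete, here's a toy sketch; the metric, names, and numbers are hypothetical and not from the episode. A per-example score like "did you find the PII" is available the moment the model answers, which is what a DSPy-style optimization loop needs, whereas churn only shows up per cohort, weeks after a change ships.

```python
# Toy illustration of short vs. long feedback loops; names and numbers are hypothetical.

def pii_recall(predicted: set[str], gold: set[str]) -> float:
    """Short feedback loop: every example gets a score as soon as the model answers,
    so an optimizer can compare candidate prompts immediately."""
    if not gold:
        return 1.0 if not predicted else 0.0
    return len(predicted & gold) / len(gold)


print(pii_recall({"email", "ssn"}, {"email", "ssn", "phone"}))  # ~0.67

# Long feedback loop: churn has no per-example score. It is only observable per cohort,
# weeks after a prompt change ships, so it can't drive a tight optimization loop.
active_users_last_month = 1_000
retained_users = 930
monthly_churn = 1 - retained_users / active_users_last_month
print(monthly_churn)  # 0.07
```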

And then, again, when something changes, it's really hard to make sure that it works across these, like, newer models, or, again, with changes to the current process. Like, when we migrate from, like, Anthropic to OpenAI, there's just a ton of changes that are, like, infrastructure-related, not necessarily around the prompt itself.

- Any other AI engineering startups that you think should not exist before we wrap up? - No, I mean, oh, my gosh. I mean, a lot of it, again, is just, like, every time I talk to investors, it's, like, how does this make a billion dollars? Like, it doesn't. I'm gonna go back to just, like, tweeting and holding my breath underwater.

Yeah, like, I don't really pay attention too much to most of this. Like, most of the stuff I'm doing is around, like, the consumer layer, right? Like, not the consumer layer, but, like, the consumers of LLM calls. I think people just wanna move really fast and they're willing to pick these vendors, but it's, like, I don't really know if anything has really, like, blown me out of the water.

Like, I only trust myself, but that's also a function of, like, just being an old man. Like, I think, you know, many companies are definitely very happy with using most of these tools anyways, but I definitely think I occupy, like, a very small space in the AI engineering ecosystem.

- Yeah, I would say one of the challenges here, you know, you talk about dealing in that consumer-of-LLM-calls space. I think that's where AI engineering differs from ML engineering, and I think a constant disconnect, or cognitive dissonance, in this field, in the AI engineers that have sprung up, is that they're not as good as the ML engineers.

They're not as qualified. I think that, you know, you are someone who has credibility in the MLE space, and you are also, you know, a very authoritative figure in the AIE space, and-- - Authoritative? - I think so. And, you know, I think you've built the de facto leading library.

I think yours, I think Instructor should be part of the standard lib, even though I try not to use it. Like, even when I try to figure it out myself, I basically end up rebuilding Instructor, right? Like, that's a lot of the back and forth that we had over the past two days.

(laughing) But like, yeah, I think that's a fundamental thing that we're trying to figure out. Like, there's a very small supply of MLEs. Not everyone's gonna have the experience that you had, but the global demand for AI is going to far outstrip the existing MLEs.

So what do we do? Do we force everyone to go through the standard MLE curriculum, or do we make a new one? - I've got some takes. - Go. - I think a lot of these app layer startups should not be hiring MLEs, 'cause they end up churning. - Yeah, they want to work at OpenAI.

(laughing) 'Cause they're just like, "Hey guys, I joined and you have no data, and like, all I did this week was like, fix some TypeScript build errors, and like, figure out why we don't have any tests, and like, what is this framework X and Y? Like, how come, like, what am I, like, what are, like, how do you measure success?

What are your biggest outcomes? Oh, no, okay, let's not focus on that? Great, I'll focus on like, these TypeScript build errors." (laughing) And then you're just like, "What am I doing?" And then you kind of sort of feel really frustrated. And I already recognize that, because I've made offers to machine learning engineers, they've joined, and they've left in like, two months.

And the response is like, "Yeah, I think I'm going to join a research lab." So I don't even think you should be hiring these MLEs. On the other hand, what I also see a lot of is the really motivated engineer who's doing more of the engineering not being allowed to actually, like, fully pursue the AI engineering.

So they're the guy who built a demo, it got traction, now it's working, but they're still being pulled back to figure out, like, why the Google Calendar integration is not working, or, like, how to make sure that, you know, the button is loading on the page. And so I'm in a very interesting position where the companies want to hire an MLE they don't need, but they won't let the excited people who've caught the AI engineering bug go do that work more full-time.

- This is something I'm literally wrestling with, like, this week, as I just wrote something about it. This is one of the things I'm probably gonna be recommending in the future, is really thinking about like, where is the talent coming from? How much of it is internal? And do you really need to hire someone who's like, writing PyTorch code?

- Yeah, exactly. Most of the time you're not; you're gonna need someone to write Instructor code. - And you're just like, yeah, you're making this, and, like, I feel goofy all the time, just, like, prompting. It's like, oh man, I wish I just had a target data set that I could, like, train a model against.

- Yes. - And I can just say it's right or wrong. - Yeah, so, you know, I guess what Latent Space is, what the AI Engineering World's Fair is, is that we're trying to create and elevate this industry of AI engineers, where it's legitimate to actually take these motivated software engineers who wanna build more in AI and do creative things in AI, and actually say, you have the blessing, and this is a legitimate sub-specialty of software engineering.

- Yeah, I think there's gonna be a mix of that and product engineering. I think a lot more data science is gonna come in versus machine learning engineering, 'cause a lot of it now is just quantifying, like, what does the business actually want as an outcome? Right, the outcome is not a RAG app.

- Yeah. - The outcome is like, reduced churn, or something like that, but people need to figure out what that actually is, and how to measure it. - Yeah, yeah, all the data engineering tools still apply, BI layers, semantic layers, whatever. - Yeah. - Cool. - We'll see. - We'll have you back again for the World's Fair.

We don't know what you're gonna talk about, but I'm sure it's gonna be amazing. You're a very-- - The title is written. It's just, "Pydantic is still all you need." (laughing) - I'm worried about having too many all-you-need titles, because that's obviously very trendy. So, yeah, you have one of them, but I need to keep a lid on, like, everyone saying their thing is all you need.

But yeah, we'll figure it out. - Pydantic is not my thing. It's someone else's thing. - Yeah, yeah, yeah. - I think that's why it works. - Yeah, it's true. - Cool, well, it was a real pleasure to have you on. - Of course. - Everyone should go follow you on Twitter and check out Instructor.

There's also Instructor JS, I think, which I'm very happy to see. And what else? - useinstructor.com. - Anything else to plug? - useinstructor.com. We got a domain name now. - Nice, nice, awesome. Cool. - Cool. Thanks, Jason. - Thanks. (upbeat music)