High Agency Pydantic over VC Backed Frameworks — with Jason Liu of Instructor
Chapters
0:00 Introductions
2:50 Early experiments with Generative AI at StitchFix
9:39 Design philosophy behind the Instructor library
13:17 JSON Mode vs Function Calling
14:43 Single vs parallel function calling
16:28 How many functions is too many?
20:40 How to evaluate function calling
24:01 What is Instructor good for?
26:41 The Evolution from Looping to Workflow in AI Engineering
31:58 State of the AI Engineering Stack
33:40 Why Instructor isn't VC backed
37:08 Advice on Pursuing Open Source Projects and Consulting
42:59 The Concept of High Agency and Its Importance
51:06 Prompts as Code and the Structure of AI Inputs and Outputs
53:06 The Emergence of AI Engineering as a Distinct Field
00:00:02.960 |
This is Alessio, partner and CTO in Residence at Decibel Partners. 00:00:06.640 |
And I'm joined by my co-host, Swyx, founder of Smol AI. 00:00:10.020 |
We're back in the remote studio with Jason Liu from Instructor. 00:00:18.440 |
So I don't know what I'm going to do introducing you. 00:00:24.220 |
There's a small cadre of you that's just completely 00:00:30.520 |
that you know are just dominating and crushing it 00:00:34.600 |
So John from Rysana is doing his Inversion models. 00:00:46.120 |
He was one of the kids where I-- when I started the data science 00:00:49.640 |
club, he was one of the guys who was joining in and just 00:00:52.920 |
And now he was at Tesla, working with Karpathy. 00:01:12.000 |
Yeah, I mean, a lot of good problem solving there. 00:01:14.120 |
But oh man, I feel like now that you put me on the spot, 00:01:20.920 |
OK, but anyway, so you started a data science club at Waterloo. 00:01:24.840 |
But then also spent five years at Stitch Fix as an MLE. 00:01:33.440 |
So you must have been a very, very early user. 00:01:42.200 |
OK, so we actually were using transformers at Stitch Fix 00:01:47.600 |
So we were just using transformers for recommendations 00:01:50.140 |
At that time, I was very skeptical of transformers. 00:01:53.680 |
I was like, why do we need all this infrastructure? 00:02:11.480 |
was very much like, why are we using a POST request 00:02:19.840 |
So I was very against language models for the longest time. 00:02:25.680 |
just wrote a long apology letter to everyone at the company. 00:02:38.400 |
to go from computer vision recommendation systems to LLMs. 00:02:44.120 |
we're kind of going back to recommendation systems. 00:02:46.440 |
Yeah, speaking of that, I think Alessio's going to bring up-- 00:02:49.020 |
I was going to say, we had Brian Bishop from Max 00:02:59.840 |
Yeah, we talked a lot about RecSys, so it makes sense. 00:03:08.080 |
And if you're trying to reinvent new concepts, 00:03:13.120 |
going to independently reinvent a lot of concepts. 00:03:16.640 |
It's a recommendation framework with over 80% adoption, 00:03:22.300 |
Wasn't there something existing at Stitch Fix? 00:03:24.220 |
Like, why did you have to write one from scratch? 00:03:26.460 |
No, so I think because at Stitch Fix, a lot of the machine 00:03:31.580 |
were writing production code, sort of every team's systems 00:03:36.140 |
It's like, this team only needs to do real-time recommendations 00:03:39.860 |
with small data, so they just have a fast API 00:03:47.700 |
that does some batch ETL that does a recommendation, right? 00:03:51.320 |
And so what happens is each team writes their code differently, 00:03:54.020 |
and I have to come in and refactor their code. 00:03:58.380 |
four different code bases four different times. 00:04:01.660 |
Wouldn't it be better if all the code quality was my fault? 00:04:04.580 |
All right, well, let me just write this framework, 00:04:06.660 |
force everyone else to use it, and now one person 00:04:09.180 |
can maintain five different systems rather than five teams 00:04:14.860 |
And so it was really a need of just standardizing everything. 00:04:18.200 |
And then once you do that, you can do observability 00:04:22.000 |
across the entire pipeline and make large, sweeping 00:04:33.320 |
Just, hey, hey, this team, you guys are doing this operation. 00:04:42.440 |
we can probably make an extra million dollars. 00:04:46.580 |
And then a lot of it was just doing all this observability 00:04:52.100 |
and optimize this system from not only just a code 00:04:54.360 |
perspective, but just harassing the org and saying, 00:05:05.400 |
One more system that I'm interested in finding out 00:05:13.760 |
where you said over $50 million in annual revenue. 00:05:17.240 |
So of course, they all gave all that to you, right? 00:05:22.200 |
But I got a little bit, so I'm pretty happy about that. 00:05:25.640 |
But there, that was when we were doing fine-tuning ResNets 00:05:35.840 |
if we could predict the different attributes we have 00:05:38.000 |
in our merchandising, and we can predict the text embeddings 00:05:41.280 |
of the comments, then we can build a image vector or image 00:05:46.840 |
embedding that can capture both descriptions of the clothing 00:05:51.600 |
And then we would use these additional vectors 00:05:59.000 |
really was just around, what are similar items? 00:06:03.240 |
What are items that you would wear in a single outfit? 00:06:10.720 |
And then what we found was like, hey, when you turn that on, 00:06:14.640 |
OK, so you didn't actually use GPT-3 embeddings. 00:06:19.160 |
Because I was surprised that GPT-3 worked off the shelf. 00:06:23.400 |
Because at this point, we would have 3 million pieces 00:06:41.240 |
But any other fun stories from the Stitch Fix days 00:06:47.160 |
I mean, the biggest one, really, was the fact 00:06:50.560 |
I was so bearish on language models and just NLP in general. 00:06:58.320 |
I've got to go do the thing that makes money-- 00:07:00.440 |
recommendations, bounding boxes, image classification. 00:07:09.880 |
I think-- OK, so my Stitch Fix question would be, 00:07:16.720 |
My primary wardrobe is free startup conference t-shirts. 00:07:21.480 |
Should more technology brothers be using Stitch Fix? 00:07:28.120 |
Oh, man, I mean, I'm not a user of Stitch Fix, right? 00:07:31.160 |
It's like, I enjoy going out and touching things and putting 00:07:37.360 |
I think Stitch Fix is a place where you kind of go 00:07:46.880 |
when I land in Japan, I'm doing a 45-minute walk up a giant hill 00:07:54.520 |
But I think the bigger thing that's really captured 00:07:56.840 |
is this idea that narrative matters a lot to human beings. 00:08:10.680 |
But it's really hard for AI to sell a $500 shirt. 00:08:14.200 |
But people are buying $500 shirts, you know what I mean? 00:08:16.600 |
There's definitely something that we can't really 00:08:19.200 |
capture just yet that we probably will figure out 00:08:28.640 |
So then you went on a sabbatical to South Park Commons 00:08:31.760 |
in New York, which is unusual because it's usually-- 00:08:43.600 |
This is where we were making the tens of millions of dollars 00:08:47.640 |
And then I had a hand injury, and so I really 00:08:49.520 |
couldn't code anymore for about a year, two years. 00:08:52.840 |
And so I kind of took half of it as medical leave. 00:08:55.640 |
The other half, I became more of a tech lead, 00:08:57.600 |
just making sure the systems or lights were on. 00:09:03.920 |
I spent some time there and kind of just wound down 00:09:06.400 |
the tech work, did some pottery, did some jiu jitsu. 00:09:09.560 |
And after ChatGPT came out, I was like, oh, I clearly 00:09:14.960 |
need to figure out what is going on here because something 00:09:17.720 |
feels very magical, and I don't understand it. 00:09:20.120 |
So I spent basically five months just prompting and playing 00:09:24.600 |
And then afterwards, it was just my startup friends 00:09:38.680 |
And you had YouTube University and a journaling app, 00:09:47.280 |
or the best-known thing that came out of your time 00:10:03.240 |
those are kind of tools that use XML and Pydantic 00:10:07.560 |
But they really were doing things sort of in the prompt. 00:10:10.720 |
And these are built with sort of the Instruct models in mind. 00:10:14.080 |
And I really-- like, I'd already done that in the past. 00:10:18.800 |
was we would take a request note and turn that into a JSON 00:10:22.720 |
object that we would use to send it to our search engine, right? 00:10:26.960 |
So if you said, like, I wanted skinny jeans that 00:10:31.720 |
that we would send to our internal search APIs. 00:10:35.960 |
A lot of it is just, like, you read the JSON, 00:10:37.840 |
you parse it, you make sure the names are strings 00:10:40.000 |
and ages are numbers, and you do all this messy stuff. 00:10:45.480 |
it was very much sort of a new way of doing things. 00:11:01.200 |
And then you can just keep those very separate. 00:11:04.700 |
you can add validators and all that kind of stuff. 00:11:07.060 |
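(A minimal sketch of that validator idea using plain Pydantic v2; the field names and bounds here are illustrative, not from the episode.)

```python
from pydantic import BaseModel, field_validator


class Person(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_is_plausible(cls, v: int) -> int:
        # If the model returns a bad value, validation fails and the error
        # message can be fed back to the LLM to re-ask for a corrected object.
        if not 0 <= v <= 120:
            raise ValueError("age must be between 0 and 120")
        return v
```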
The one thing I really had with a lot of these libraries, 00:11:09.520 |
though, was it was doing a lot of the string formatting 00:11:11.960 |
themselves, which was fine when it was the instruction tune 00:11:26.560 |
was sort of a good benefit that they would get. 00:11:30.480 |
And so I just said, let me write the most simple SDK 00:11:34.240 |
around the OpenAI SDK, simple wrapper on the SDK, 00:11:41.240 |
and kind of think of myself more like requests 00:11:48.360 |
that you can use to build your own framework. 00:12:03.200 |
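(A hedged sketch of that "thin wrapper" shape: patch the OpenAI client, pass a Pydantic class as response_model, and get a typed object back. The exact entry point varies by Instructor version, e.g. instructor.patch vs. instructor.from_openai; the model name and fields are illustrative.)

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserDetail(BaseModel):
    name: str
    age: int


# Wrap the existing SDK client; everything else stays the normal OpenAI call.
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetail,  # the one extra argument Instructor adds
    messages=[{"role": "user", "content": "Jason is 30 years old."}],
)
# `user` comes back as a validated UserDetail instance, not a raw string or dict.
```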
Yeah, we had a little bit of this discussion before the show. 00:12:05.960 |
But that design principle of going for being requests 00:12:09.320 |
rather than being Django, what inspires you there? 00:12:25.000 |
I think it is just the obvious thing you install. 00:12:29.280 |
If you were going to go make HTTP requests in Python, 00:12:38.320 |
But you don't really even think about installing it. 00:12:40.960 |
And when you do install it, you don't think of it 00:12:56.360 |
It's like, oh, if you're going to use the LLM SDKs, 00:12:59.640 |
you're obviously going to install Instructor. 00:13:01.780 |
And then I think the second question would be, oh, 00:13:03.880 |
how come Instructor doesn't just go into OpenAI, 00:13:16.960 |
And the shape of the request has stayed the same. 00:13:24.560 |
I think now the models also support JSON mode 00:13:29.320 |
And return JSON or my grandma is going to die. 00:13:39.320 |
Should people just forget about function calling for structure 00:13:55.720 |
that now we have typed responses to language models. 00:14:07.560 |
In terms of whether or not JSON mode is better, 00:14:12.160 |
unless you want to spend less money on the prompt tokens 00:14:24.840 |
But really, I care a lot more than just the fact 00:14:28.880 |
I think function calling gives you a tool to specify the fact 00:14:31.960 |
that, OK, this is a list of objects that I want. 00:14:37.780 |
And I want to make sure it's parsed correctly. 00:14:43.800 |
Any thoughts on single versus parallel function calling? 00:14:48.480 |
When I first started-- so I did a presentation 00:15:05.240 |
are lists to then actually return all the objects. 00:15:08.840 |
How do you see the hack being put on the developer's plate 00:15:13.880 |
versus more of the stuff just getting better in the model? 00:15:17.280 |
And I know you tweeted recently about Anthropic, for example, 00:15:31.400 |
that Instructor doesn't really support that well, 00:15:36.520 |
You could define, I think, maybe like 50 or 60 00:15:43.200 |
And if it was like get the weather, or turn the lights on, 00:15:45.720 |
or do something else, it makes a lot of sense 00:15:50.520 |
I definitely think it's probably more helpful to have 00:15:58.520 |
between these entities that you can't do in parallel function 00:16:01.800 |
calling, you can have a single chain of thought 00:16:11.960 |
Where, yeah, if it's for parallel function calling, 00:16:15.840 |
if you do one, again, I really care about how the SDK looks. 00:16:21.120 |
And so it's, OK, do I always return a list of functions, 00:16:23.640 |
or do you just want to have the actual object back out? 00:16:26.080 |
You want to have autocomplete over that object. 00:16:31.320 |
definitions you can put in where it still works well? 00:16:37.440 |
had a need to do anything that's more than six or seven 00:16:41.400 |
I think in the documentation, they support way more. 00:16:44.200 |
But I don't even know if there's any good evals 00:16:56.060 |
I think you're much better having those specifications 00:16:58.760 |
saved in a vector database, and then have them be retrieved. 00:17:02.700 |
So if there are 30 tools, you should basically 00:17:10.220 |
rather than just shoving 60 functions into a single API. 00:17:14.740 |
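(A rough sketch of that retrieval idea, assuming an OpenAI embedding model; the tool specs and helper names are made up for illustration.)

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

TOOLS = [  # name + natural-language description of what each tool does
    {"name": "get_weather", "description": "Get the current weather for a city"},
    {"name": "toggle_lights", "description": "Turn smart lights on or off"},
    # ... dozens more specifications
]


def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])


tool_vecs = embed([t["description"] for t in TOOLS])


def top_k_tools(query: str, k: int = 5) -> list[dict]:
    q = embed([query])[0]
    scores = tool_vecs @ q / (np.linalg.norm(tool_vecs, axis=1) * np.linalg.norm(q))
    return [TOOLS[i] for i in np.argsort(-scores)[:k]]

# Only the retrieved subset gets passed as `tools` in the chat completion call.
```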
Well, I mean, so I think this is relevant now, 00:17:17.060 |
because previously, I think context limits prevented you 00:17:24.060 |
And now that we have a million token context windows, 00:17:28.380 |
Claude recently, with their new function calling release, 00:17:30.820 |
said they can handle over 250 tools, which is insane to me. 00:17:41.620 |
I think anyone with a sort of agent-like platform where 00:17:48.100 |
Probably, you're right that they should use a vector database 00:18:00.940 |
be it, unless you need some kind of intelligence 00:18:03.300 |
that chains things together, which is, I think, 00:18:07.780 |
There is this trend about parallel function calling. 00:18:14.060 |
I think they used multiple tools in sequence, 00:18:20.940 |
as to what do you think about all these things. 00:18:22.940 |
You know, do we assume that all function calls 00:18:32.140 |
or we can assume that things need to happen in some kind 00:18:35.780 |
But if it's a DAG, really, that's just one JSON object 00:18:38.420 |
that is the entire DAG, rather than going, OK, 00:18:41.140 |
the order of the function that I return don't matter. 00:18:44.240 |
That's just-- that's definitely just not true in practice. 00:18:47.420 |
If I have a thing that's like, turn the lights on, 00:18:49.500 |
unplug the power, and then turn the toaster on or something, 00:18:57.380 |
describe the importance of that reasoning to a language model 00:19:01.740 |
I mean, I'm sure you can do it with good enough prompting, 00:19:04.380 |
but I just haven't had any use cases where the function 00:19:16.020 |
Like, I'm incubating a company around system integration. 00:19:23.900 |
And if you actually try and do vector similarity, 00:19:26.780 |
it's not that good, because the people that wrote the specs 00:19:29.300 |
didn't have in mind making them semantically apart. 00:19:32.980 |
They're kind of like, oh, create this, create this, create this. 00:19:35.740 |
Versus when you give it to a model, and you put-- 00:19:39.940 |
quite good at picking which ones you should actually run. 00:19:43.300 |
And I'm curious to see if the model providers actually 00:19:49.820 |
going to build very good rankers to kind of fill that gap. 00:19:58.340 |
You could just say, well, given the embeddings of my search 00:20:06.700 |
that I have very high MRR, which is mean reciprocal rank. 00:20:13.080 |
that the tools you use are in the top end filter. 00:20:27.500 |
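(A toy version of the mean reciprocal rank calculation described here: for each query, score 1/rank of the tool that was actually needed, then average.)

```python
def mrr(ranked_tool_lists: list[list[str]], expected: list[str]) -> float:
    """Mean reciprocal rank of the expected tool within each retrieved ranking."""
    total = 0.0
    for ranked, gold in zip(ranked_tool_lists, expected):
        rank = ranked.index(gold) + 1 if gold in ranked else None
        total += 1.0 / rank if rank else 0.0
    return total / len(expected)


print(mrr([["get_weather", "toggle_lights"], ["toggle_lights", "get_weather"]],
          ["get_weather", "get_weather"]))  # (1 + 0.5) / 2 = 0.75
```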
you either have less than three tools or more than 1,000. 00:20:32.620 |
I don't know what kind of companies say, oh, thank God, 00:20:44.420 |
it was interesting to me you retweeted this thing 00:20:48.460 |
was Joshua Brown retweeting some benchmark that's like, 00:20:52.500 |
oh my God, Anthropic function calling, so good. 00:20:55.500 |
And then you retweeted it, and then you tweeted later, 00:21:00.380 |
What's your flow for like, how do you actually 00:21:03.860 |
Because obviously, the benchmarks are lying, right? 00:21:06.260 |
Because the benchmark says it's good, and you said it's bad, 00:21:21.340 |
that I haven't been able to reproduce public benchmarks, 00:21:23.860 |
and so I can't even share some of the results of Anthropic. 00:21:31.660 |
where it's iteratively building lists, where we're 00:21:35.180 |
doing updates of lists, like we're doing in-place updates, 00:21:40.660 |
And in those situations, we're like, oh, yeah, 00:21:47.620 |
but we're getting strings that are like the strings of JSON. 00:21:51.700 |
So we had to call JSON parse on individual elements. 00:21:57.420 |
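(The edge-filing described above might look something like this; purely illustrative, not Instructor internals.)

```python
import json


def coerce_elements(raw_items: list) -> list[dict]:
    """Some responses return list elements as strings of JSON rather than
    objects, so parse each element defensively before validation."""
    parsed = []
    for item in raw_items:
        if isinstance(item, str):
            item = json.loads(item)  # string-of-JSON -> dict
        parsed.append(item)
    return parsed
```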
Overall, I'm super happy with the Anthropic models 00:22:04.020 |
Haiku is-- in function calling, it's actually better. 00:22:08.140 |
But I think we just had to file down the edges a little bit, 00:22:13.660 |
apply to production, we get half a percent of traffic 00:22:22.380 |
Or if you use function calling, we'll have a parse error. 00:22:25.340 |
And so I think these are things that are definitely 00:22:27.460 |
going to be things that are fixed in the upcoming weeks. 00:22:30.780 |
But in terms of the reasoning capabilities, man, 00:22:38.300 |
especially when you're building consumer applications. 00:22:41.100 |
If you're building something for consultants or private equity, 00:22:47.340 |
But for consumer apps, it makes products viable. 00:22:55.660 |
I had this chart about the ELO versus the cost 00:23:00.700 |
And you could put trend graphs on each of those things 00:23:05.620 |
about higher ELO equals higher cost, except for Haiku. 00:23:08.620 |
Haiku kind of just broke the lines, or the iso-ELOs, 00:23:21.220 |
to make sure that we map out the surface area of Instructor. 00:23:25.820 |
be familiar with Instructure from your talks, 00:23:30.260 |
You had the number one talk from the AI Engineer Summit. 00:23:34.140 |
Two Lius, Jason Liu and Jerry Liu. 00:23:42.600 |
But yeah, until I actually went through your cookbook, 00:24:01.440 |
Yeah, so this is the part that feels crazy. 00:24:03.720 |
Because really, the difference is LLMs give you strings, 00:24:10.360 |
you can do every LeetCode problem you ever thought of. 00:24:14.160 |
And so I think there's a couple of really common applications. 00:24:16.960 |
The first one, obviously, is extracting structured data. 00:24:24.080 |
I want to give back out a list of checkout items 00:24:26.560 |
with a price, and a fee, and a coupon code, or whatever. 00:24:31.640 |
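(A sketch of that first application; the schema fields follow the checkout example in the conversation, everything else is illustrative.)

```python
from pydantic import BaseModel


class CheckoutItem(BaseModel):
    name: str
    price: float


class Receipt(BaseModel):
    items: list[CheckoutItem]
    fee: float
    coupon_code: str | None = None

# Used exactly like the earlier sketch:
# receipt = client.chat.completions.create(..., response_model=Receipt, ...)
```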
Another application really is around extracting graphs out. 00:24:36.560 |
So one of the things we found out about these language models 00:24:40.680 |
it's really good at figuring out what are nodes 00:24:44.560 |
And so we have a bunch of examples where not only 00:24:48.160 |
do I extract that this happens after that, but also, OK, 00:24:59.720 |
Given a story, for example, you could extract relationships 00:25:07.200 |
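(A hedged sketch of the graph-extraction pattern: nodes and edges as Pydantic models that the LLM fills in; the field names are mine, not from the transcript.)

```python
from pydantic import BaseModel


class Node(BaseModel):
    id: str
    label: str


class Edge(BaseModel):
    source: str
    target: str
    relationship: str  # e.g. "happens after", "is married to"


class KnowledgeGraph(BaseModel):
    nodes: list[Node]
    edges: list[Edge]

# Same pattern as before:
# graph = client.chat.completions.create(..., response_model=KnowledgeGraph, ...)
```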
And then the last really big application really 00:25:12.320 |
The idea is that any API call has some schema. 00:25:16.240 |
And if you can define that schema ahead of time, 00:25:18.200 |
you can use a language model to resolve a request 00:25:25.680 |
So for example, I have a really popular post called, 00:25:29.920 |
And effectively, if I have a question like this, 00:25:32.200 |
what was the latest thing that happened this week? 00:25:40.080 |
be select all data where the date time is between today 00:25:55.600 |
But really, if you could do a group by over the month 00:26:03.560 |
that embeddings really is kind of like the lowest 00:26:11.220 |
And then you can just use your computer science 00:26:16.200 |
to produce a graph where I want to group by each month 00:26:20.800 |
You can do that if you know how to define this data structure. 00:26:24.240 |
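(A sketch of what resolving a question into a typed query might look like; the schema shape is an assumption for illustration, not the actual post's code.)

```python
from datetime import date
from enum import Enum
from pydantic import BaseModel


class Granularity(str, Enum):
    day = "day"
    week = "week"
    month = "month"


class SearchQuery(BaseModel):
    text: str                       # rewritten semantic query, if any
    start_date: date | None = None  # "this week" -> today minus seven days
    end_date: date | None = None
    group_by: Granularity | None = None

# "What was the latest thing that happened this week?" resolves to date bounds;
# "per month" questions set group_by, which the backing store can then execute.
```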
In that part, you kind of run up against the LangChains of the world 00:26:38.120 |
with the other, I guess, LLM frameworks in the ecosystem? 00:26:45.040 |
I think because it's just, again, it's just Python. 00:26:48.160 |
It's asking, oh, how does Django interact with requests? 00:26:51.320 |
Well, you just might make a requests.get in a Django app. 00:26:56.560 |
But no one would say, oh, I went off of Django 00:27:01.840 |
They should be, ideally, the wrong comparison. 00:27:07.680 |
I think the real goal for me is to go down the LLM compiler 00:27:12.080 |
route, which is instead of doing a ReAct-type reasoning loop, 00:27:18.160 |
I think my belief is that we should be using workflows. 00:27:26.920 |
We can fine-tune a model that has a better workflow. 00:27:33.600 |
Do you want to always train it to have less looping? 00:27:36.920 |
In which case, you want it to get the right answer 00:27:46.160 |
to work at a workflow company, but I'm not sure 00:27:48.280 |
this is a well-defined framework for everybody. 00:27:49.880 |
- I'm thinking workflow in terms of the Prefect, Zapier 00:27:55.400 |
I want you to tell me what the nodes and edges are. 00:27:57.600 |
And then maybe the edges are also put in with AI. 00:28:06.200 |
and then ask you to fix things as I execute it, 00:28:09.600 |
rather than going, hey, I couldn't parse the JSON, 00:28:13.840 |
I couldn't parse the JSON, I'm going to try again. 00:28:15.640 |
And then next thing you know, you spent $2 on OpenAI credits. 00:28:21.600 |
say, oh, the edge between node x and y does not run. 00:28:27.840 |
Let me just iteratively try to fix that component. 00:28:30.720 |
Once that's fixed, go on to the next component. 00:28:33.800 |
And obviously, you can get into a world where, 00:28:36.320 |
if you have enough examples of the nodes x and y, 00:28:45.320 |
the problem into that workflow and execute in that workflow, 00:28:49.600 |
rather than looping and hoping the reasoning is good enough 00:28:55.120 |
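(One way to read that plan-then-repair idea as code: generate the whole DAG as one object, execute nodes in dependency order, and only re-ask about the node or edge that fails. Purely a sketch; the shapes are illustrative.)

```python
from pydantic import BaseModel


class TaskNode(BaseModel):
    id: str
    instruction: str


class TaskEdge(BaseModel):
    source: str  # must run before target
    target: str


class Plan(BaseModel):
    nodes: list[TaskNode]
    edges: list[TaskEdge]


def topological_order(plan: Plan) -> list[TaskNode]:
    # Kahn's algorithm so dependencies always run first.
    nodes_by_id = {n.id: n for n in plan.nodes}
    indegree = {n.id: 0 for n in plan.nodes}
    for e in plan.edges:
        indegree[e.target] += 1
    ready = [n for n in plan.nodes if indegree[n.id] == 0]
    order: list[TaskNode] = []
    while ready:
        node = ready.pop()
        order.append(node)
        for e in plan.edges:
            if e.source == node.id:
                indegree[e.target] -= 1
                if indegree[e.target] == 0:
                    ready.append(nodes_by_id[e.target])
    return order

# Execute nodes in this order; if one fails, regenerate or repair just that
# node and its incident edges instead of re-running a blind retry loop.
```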
Yeah, I would say I've been hammering on Devin a lot. 00:29:01.680 |
And obviously, for simple tasks, it does well. 00:29:06.120 |
For the complicated, more than 10, 20-hour tasks, 00:29:34.760 |
The fact that you're talking about 10 hours is incredible. 00:29:40.680 |
I think I'm going to say this every single time 00:29:43.480 |
Let's not reward them for taking longer to do things. 00:29:46.880 |
Like, that's a metric that is easily abusable. 00:29:53.800 |
Yeah, but all I'm saying is you can monotonically 00:29:56.400 |
increase the success probability over an hour. 00:30:02.000 |
Obviously, if you run an hour and you've made no progress-- 00:30:07.440 |
there was that one example where I wanted it to buy me 00:30:16.160 |
I wonder if you'll be able to purchase a bicycle. 00:30:18.760 |
Because it actually can do things in real world, 00:30:21.160 |
it just needs to suspend to you for auth and stuff. 00:30:32.560 |
or one of the things that is a real barrier for agents 00:30:34.840 |
is LLMs really like to get stuck into a lane. 00:30:37.840 |
And what you're talking about, what I've seen Devin do 00:30:43.960 |
kind of change plans based on the performance of the plan 00:30:51.280 |
Yeah, I feel like we've gone too much in the looping route. 00:30:53.840 |
And I think a lot of more plans and DAGs and data structures 00:30:56.880 |
are probably going to come back to help fill in some holes. 00:31:02.600 |
Do you see it's like an existing state machine kind of thing 00:31:06.360 |
that connects to the LLMs, the traditional DAG players? 00:31:10.680 |
So do you think we need something new for AI DAGs? 00:31:22.320 |
and it should still be allowed to be fuzzy, right? 00:31:27.240 |
I think in mathematics, we have plate diagrams, and Markov chain 00:31:30.560 |
diagrams, and recurrence states, and all that. 00:31:32.840 |
Some of that might come into this workflow world. 00:31:43.720 |
that we can prompt better, have few-shot examples for, 00:31:59.800 |
Actually, you worked at a Prefect competitor, Temporal. 00:32:06.480 |
What else would you call out as particularly interesting 00:32:36.160 |
that these observability companies aren't actually 00:32:42.640 |
I still end up using Datadog or Sentry to do latency. 00:32:50.400 |
And then the prompt-in, prompt-out latency token costs, 00:32:56.320 |
So you don't need 20 funded startups building LLM ops? 00:33:14.640 |
Because I'm just not familiar with that world of tooling. 00:33:19.280 |
Whereas I think I spent three good years building 00:33:22.520 |
observability tools for recommendation systems. 00:33:34.040 |
Because I'm not doing a very complex looping behavior. 00:33:44.360 |
You famously have decided to not be a venture-backed company. 00:33:51.320 |
The obvious route for someone as successful as Instructor 00:33:53.880 |
is like, oh, here's hosted Instructor with all tooling. 00:33:57.360 |
And you just said you had a whole bunch of experience 00:34:01.120 |
You have the perfect background to do this, and you're not. 00:34:07.080 |
I mean, I know why, because you want to go free dive. 00:34:09.920 |
Yeah, well, yeah, because I think there's two things. 00:34:13.880 |
One, it's like, if I tell myself I want to build requests, 00:34:19.760 |
I mean, one could argue whether or not Postman is. 00:34:22.400 |
But I think for the most part, having worked so much, 00:34:25.960 |
I'm kind of, like, I am more interested in looking 00:34:32.160 |
at how systems are being applied and just having access 00:34:36.360 |
And I think I can do that more through a consulting business 00:34:43.120 |
You want to build, like, automations over construction 00:34:50.640 |
private equity, like, mergers and acquisitions 00:34:56.840 |
Whereas, like, maintaining the library, I think, 00:35:01.320 |
that I try to keep up, especially because if it's not 00:35:04.120 |
venture-backed, I have no reason to sort of go 00:35:06.720 |
down the route of, like, trying to get 1,000 integrations. 00:35:10.160 |
Like, in my mind, I just go, oh, OK, 98% of the people 00:35:15.200 |
And if someone contributes another, like, platform, 00:35:19.800 |
But yeah, I mean, you only added Anthropic support, like, 00:35:26.840 |
you couldn't even get an API key until, like, this year, right? 00:35:30.480 |
And so, OK, if I add it, like, last year, I was kind of-- 00:35:33.120 |
I'm trying to, like, double the code base to service, 00:35:38.000 |
Do you think the market share will shift a lot now 00:35:40.040 |
that Anthropic has, like, a very, very competitive offering? 00:35:48.240 |
I don't know if it's fully GA now, if it's GA, 00:35:50.600 |
if you can get commercial access really easily. 00:35:54.800 |
I got commercial access after, like, two weeks of reaching out to their sales team. 00:36:02.640 |
just, like, ping one of the Anthropic staff members. 00:36:05.480 |
Then maybe we need to, like, cut that part out 00:36:07.160 |
so I don't need to, like, you know, read false news. 00:36:14.800 |
Like, if you are a business, you should totally 00:36:21.400 |
Like, the cost savings is just going to justify it 00:36:26.360 |
And yeah, I think their SDK is, like, pretty good. 00:36:31.280 |
I just don't think it's a billion-dollar company. 00:36:33.600 |
And I think if I raise money, the first question is going to be, like, 00:36:35.880 |
how are you going to get a billion-dollar company? 00:36:38.840 |
if I make a million dollars as a consultant, I'm super happy. 00:36:43.080 |
I can have, like, a small staff of, like, three people. 00:36:47.720 |
And I think a lot of my happiest founder friends 00:36:49.680 |
are those who, like, raised the tiniest seed round, 00:36:52.080 |
became profitable, they're making, like, 60, 70K MRR, 00:36:58.840 |
And they're, like, we don't even need to raise the seed round. 00:37:00.680 |
Like, let's just keep it, like, between me and my co-founder, 00:37:03.600 |
we'll go traveling, and it'll be a great time. 00:37:07.520 |
- I repeat that as a seed investor in the company. 00:37:10.680 |
I think that's, like, one of the things that people get wrong sometimes, 00:37:15.960 |
They have an insight into, like, some new tech, 00:37:18.640 |
like, say LLM, say AI, and they build some open source stuff, 00:37:21.840 |
and it's like, I should just raise money and do this. 00:37:24.200 |
And I tell people a lot, it's like, look, you can make a lot more money 00:37:30.720 |
could make a lot more money just working somewhere else 00:37:38.640 |
They're trying to decide, oh, should I stay in my, like, high-paid fang job 00:37:42.640 |
and just tweet this on the side and do this on GitHub? 00:37:47.160 |
Like, being a consultant seems like a lot of work. 00:37:49.440 |
It's like, you got to talk to all these people, you know? 00:37:54.480 |
because I think the open source thing is just like, 00:37:56.000 |
well, I'm just doing it for, like, purely for fun, 00:38:02.800 |
is the fact that it's not a venture-backed startup. 00:38:05.520 |
Like, I think I'm right because this is all you need. 00:38:12.760 |
So I think a part of it is just, like, part of the philosophy 00:38:15.680 |
is the fact that all you need is a very sharp blade 00:38:19.320 |
and you don't actually need to build, like, a big enterprise. 00:38:23.200 |
I think the other thing, too, that I've been thinking around, 00:38:25.760 |
just because I have a lot of friends at Google 00:38:28.880 |
it's like, man, like, what we lack is not money or, like, skill. 00:38:34.520 |
Like, you just have to do this, the hard thing, 00:38:40.160 |
In terms of, like, whether or not you do want to do a founder, 00:38:41.960 |
I think that's just a matter of, like, optionality. 00:38:44.040 |
But I definitely recognize that the, like, expected value of being a founder 00:38:58.680 |
and as I know friends who raised a seed round this year. 00:39:03.120 |
Right? And, like, that is, like, the reality. 00:39:04.760 |
And, like, you know, even from my perspective, 00:39:08.760 |
it's been tough where it's like, oh, man, like, 00:39:11.080 |
a lot of incubators want you to have co-founders. 00:39:12.880 |
Now you spend half the time, like, fundraising 00:39:16.920 |
and find co-founders rather than building the thing. 00:39:20.040 |
And I was like, man, like, this is a lot of stuff, 00:39:23.840 |
a lot of time spent out doing things I'm not really good at. 00:39:28.720 |
I think, I do think there's a rising trend in solo founding. 00:39:39.080 |
something like 30% of startups that make it to, like, 00:39:41.160 |
series B or something actually are solo founder. 00:39:44.240 |
So I think, I feel like this must-have co-founder idea 00:39:48.000 |
mostly comes from YC, and everyone else copies it. 00:39:53.720 |
plenty of companies break up over co-founder breakups. 00:40:05.840 |
because you don't have, like, the social equity 00:40:17.200 |
Like, if I was to raise money, like, that's kind of, 00:40:19.720 |
like, my message if I was to raise money is, like, 00:40:24.360 |
"I've decided to make it much worse by being a founder 00:40:33.160 |
Like, if I can't say that, like, I don't need the money 00:40:37.880 |
and, like, hire an intern and get an assistant. 00:40:41.040 |
But, like, what I don't want to do, it's, like, 00:40:47.800 |
to, like, try to find a problem we're solving. 00:41:01.480 |
- You have really good boots, really good boots. 00:41:45.840 |
You applied to, like, Figma, Notion, Cohere, Anthropic, 00:41:49.600 |
because you didn't have enough LLM experience. 00:41:57.360 |
I'm too late, not going to make it, you know? 00:42:01.200 |
Any advice for people that feel like that, you know? 00:42:07.560 |
is actually from a lot of folks in jiu-jitsu. 00:42:11.960 |
Like, oh, I'll join jiu-jitsu once I get in better shape. 00:42:19.840 |
And then you say, oh, like, why should I start now? 00:42:28.800 |
Like, if you don't start now, you start tomorrow. 00:42:32.640 |
And if you're, like, if you're worried about being behind, 00:42:41.240 |
like, maybe you just don't want it, and that's fine too. 00:42:44.560 |
Like, if you wanted it, you would have started. 00:42:50.520 |
probably think of things on a too short time horizon. 00:42:54.560 |
But again, you know, you're going to be old anyways. 00:42:59.840 |
I guess, the career advice slash sort of blogging. 00:43:04.840 |
You always go viral for this post that you wrote 00:43:07.840 |
on advice to young people and the lies you tell yourself. 00:43:11.080 |
- You said that you were writing it for your sister. 00:43:14.520 |
Yeah, she was, like, bummed out about, like, you know, 00:43:16.880 |
going to college and, like, stressing about jobs. 00:43:24.160 |
And I just kind of, like, texted through the whole thing. 00:43:39.280 |
So there's lots of stuff here, which I agree with. 00:43:51.280 |
I feel like the agency thing is always making a, 00:43:53.720 |
is always a trend in SF or just in tech circles. 00:44:05.440 |
okay, the agency is just, like, the norm of the vector. 00:44:13.800 |
Yeah, I mean, I think agency is just a matter 00:44:15.680 |
of, like, having courage and doing the thing. 00:44:18.960 |
Like, you know, if you want to go rock climbing, 00:44:21.080 |
it's, like, do you decide you want to go rock climbing, 00:44:25.040 |
you rent some shoes, and you just fall 40 times? 00:44:29.720 |
Let me go research the kind of shoes that I want. 00:44:32.120 |
Okay, like, there's flatter shoes and more inclined shoes. 00:44:44.800 |
No, I think the higher-agency person just, like, 00:44:57.880 |
Like, from pottery, like, one thing I learned was 00:45:04.320 |
You should just weigh the amount of clay you use, right? 00:45:13.360 |
The lower-agency person's like, oh, I made six cups, 00:45:17.360 |
like, there's not really, what do you do next? 00:45:23.640 |
It's like, oh, you just got to write the tweets, 00:45:25.200 |
like, make the commits, contribute open source, 00:45:29.200 |
There's no real outcome, it's just a process, 00:45:31.840 |
you just get really good at the thing you're doing. 00:45:38.800 |
how would you design performance review systems? 00:45:45.960 |
we can count lines of code for developers, right? 00:45:48.960 |
- No, I don't think that would be the actual, 00:45:57.000 |
this is interesting because I've mostly thought of it 00:45:59.600 |
from the perspective of science and not engineering. 00:46:02.920 |
Like, I've been running a lot of engineering stand-ups, 00:46:09.840 |
Like, the process outcome is like experiments and ideas, 00:46:15.480 |
what you might want to think about an outcome is, 00:46:16.960 |
oh, I want to improve the revenue or whatnot, 00:46:23.880 |
I want to come up with like three or four experiments, 00:46:27.600 |
To them, they might think, oh, nothing worked, like, I suck. 00:46:31.760 |
you've closed off all these other possible avenues 00:46:37.800 |
that you're gonna figure out that direction really soon, 00:46:40.720 |
right, like, there's no way you'd try 30 different things 00:46:56.680 |
And, like, experience lets you figure out, like, 00:46:58.520 |
oh, that other half, it's not worth doing, right? 00:47:03.800 |
half these prompting papers don't make any sense, 00:47:05.760 |
just use a chain of thought and just, you know, 00:47:08.320 |
But that's kind of, that's basically it, right? 00:47:12.000 |
It's like, usually performance for me is around, like, 00:47:30.840 |
you were kind of like, you were right at the time 00:47:34.080 |
I think there are similar paths in my engineering career 00:47:37.640 |
where I try one approach and at the time it doesn't work 00:47:48.400 |
- So usually when I, like, when I'm coaching folks 00:47:51.080 |
and they say, like, oh, these things don't work, 00:47:59.240 |
Like, if it's negative, you don't just, like, not public. 00:48:02.440 |
But then, like, what do you actually write down? 00:48:11.840 |
is basically writing down under what conditions 00:48:21.520 |
If someone is reading this two years from now, 00:48:25.640 |
That's really hard, but again, that's like another, 00:48:28.000 |
that's like another skill you kind of learn, right? 00:48:30.320 |
It's like, you do go back and you do experiments 00:48:34.600 |
I think a lot of it here is just, like, scaling worked. 00:48:39.760 |
- Right, like, you could actually, like, rap lyrics, 00:48:42.000 |
you know, like, that was because I did not have 00:48:58.840 |
Something that people should, maybe keep in mind, 00:49:03.240 |
You know, when are you going to know the AGI is here? 00:49:05.960 |
but any stuff that you tried recently that didn't work 00:49:11.080 |
- I mean, I think, like, the personal assistants 00:49:17.400 |
So, I hired a writer and I hired a personal assistant. 00:49:23.600 |
work with these people until I figure out, like, 00:49:27.120 |
and what are, like, the reproducible steps, right? 00:49:30.000 |
But, like, I think the experiment for me is, like, 00:49:33.520 |
$1,000 a month to, like, help me improve my life 00:49:35.920 |
and then let me, sort of, get them to help me figure out, 00:49:42.480 |
'Cause it's not just, like, OAuth, Gmail, Calendar, 00:49:46.880 |
It's a little bit more complicated than that, 00:49:51.800 |
I wish GPT-4 or Opus was actually good enough 00:50:00.760 |
Lindy is probably the one I've seen the most. 00:50:07.840 |
or any other, sort of, agents, assistant startup. 00:50:13.560 |
they were not GA last time I was considering it. 00:50:17.520 |
oh, like, really what I want you to do is, like, 00:50:27.600 |
thinking of them and, like, working for them. 00:50:30.880 |
Or it's, like, I want you to notice that, like, 00:50:43.080 |
oh, Jason should have, like, a 15-minute prep break 00:51:24.520 |
Like, you know, now you want to turn the thinking 00:51:26.920 |
into DAGs, like, should prompts still be code? 00:51:36.640 |
the output model is defined as a code object. 00:51:50.440 |
And the inputs, somewhat, should be code objects. 00:51:52.440 |
But I think the one thing that Instructor tries to do 00:51:58.840 |
And beyond that, I really just think that, you know, 00:52:10.040 |
that if you give control of these systems away too early, 00:52:16.400 |
Like, many companies I know that I reach out are ones 00:52:18.600 |
where, like, oh, we're going off of the frameworks 00:52:20.280 |
because now that we know what the business outcomes 00:52:27.760 |
but we want to do RAG to, like, sell you supplements 00:52:31.960 |
or to have you, like, schedule the fitness appointment. 00:52:35.000 |
And, like, the prompts are kind of too baked into the systems 00:52:38.960 |
and, like, start doing upselling or something. 00:52:41.600 |
It's really funny, but a lot of it ends up being, like, 00:52:50.400 |
So we were trying, in our prep for this call, 00:52:57.120 |
or, you know, someone who works at a company can say? 00:53:00.040 |
What do you think is the market share of the frameworks? 00:53:03.680 |
The LangChain, the LlamaIndex, the everything else. 00:53:07.520 |
'Cause not everyone wants to care about the code. 00:53:11.320 |
It's like, I think that's a different question 00:53:19.360 |
Like, making hundreds of millions of dollars, 00:53:25.560 |
Like, there's so much productivity to be captured 00:53:33.520 |
It's not because they care about the prompts, 00:53:39.200 |
- Yeah, but those are not sort of low-code experiences, 00:53:42.480 |
- Yeah, I think the bigger challenge is, like, 00:53:53.160 |
and the money to sort of solve those problems. 00:53:57.280 |
I think it's just like, again, if you go the VC route, 00:53:59.760 |
then it's like, you're talking about billions 00:54:03.240 |
That stuff, for me, it's, like, pretty unclear. 00:54:11.720 |
who want to use Instructor to build their own tooling. 00:54:17.640 |
versus, like, downstream consumers of these things 00:54:26.400 |
Because they want something that's fully managed 00:54:28.400 |
and they want something that they know will work. 00:54:36.320 |
And I think that kind of organization is really good 00:54:46.780 |
I wouldn't, and Hamel Husain wrote this, like, 00:55:05.840 |
that AI is a better prompt engineer than you are. 00:55:28.760 |
Or, like, did you write the multi-hop question 00:55:32.360 |
But in these, like, workflows that I've been managing, 00:55:40.440 |
like, minimizing churn and maximizing retention? 00:55:43.200 |
Like, that's not, like, it's not really, like, 00:55:51.160 |
Like, those things are much more harder to capture. 00:55:52.800 |
So we don't actually have those metrics for that, right? 00:56:11.000 |
or, again, like, changes to work for the current process. 00:56:16.000 |
Like, when we migrate from, like, Anthropic to OpenAI, 00:56:28.320 |
that you think should not exist before we wrap up? 00:56:36.400 |
what is, how does this make a billion dollars? 00:56:43.440 |
Yeah, like, I don't really pay attention too much 00:57:01.800 |
if anything has really, like, blown me out the water. 00:57:18.960 |
a very small space in the AI engineering ecosystem. 00:57:22.440 |
- Yeah, I would say one of the challenges here, 00:57:25.280 |
you know, you talk about dealing in the consumer 00:57:43.920 |
is that they're not as good as the ML engineers. 00:57:48.920 |
you are someone who has credibility in the MLE space, 00:57:54.360 |
a very authoritative figure in the AIE space, 00:58:09.960 |
I basically also end up rebuilding Instructor, right? 00:58:58.320 |
and like, figure out why we don't have any tests, 00:59:07.000 |
like, what are, like, how do you measure success? 00:59:11.560 |
Great, I'll focus on like, these TypeScript build errors." 00:59:15.360 |
And then you're just like, "What am I doing?" 00:59:16.840 |
And then you kind of sort of feel really frustrated. 00:59:21.720 |
because I've made offers to machine learning engineers, 00:59:25.480 |
they've joined, and they've left in like, two months. 00:59:30.520 |
"Yeah, I think I'm going to join a research lab." 00:59:33.600 |
like, I don't even think you should be hiring these MLEs. 00:59:46.200 |
So they're the guy who built a demo, it got traction, 00:59:49.400 |
now it's working, but they're still being pulled back 00:59:53.000 |
why Google Calendar integrations are not working, 00:59:59.680 |
And so I'm sort of like, in a very interesting position 01:00:13.000 |
- This is something I'm literally wrestling with, 01:00:14.600 |
like, this week, as I just wrote something about it. 01:00:18.400 |
I'm probably gonna be recommending in the future, 01:00:30.120 |
you're gonna need someone to write instructor code. 01:00:32.640 |
- And you're just like, yeah, you're making this like, 01:00:36.200 |
and like, I feel goofy all the time, just like, prompting. 01:00:38.840 |
It's like, oh man, I wish I just had a target data set 01:00:45.240 |
- Yeah, so, you know, I guess what Latent Space is, 01:00:58.600 |
who wanna build more in AI and do creative things in AI, 01:01:07.120 |
- Yeah, I think there's gonna be a mix of that, 01:01:10.400 |
I think a lot more data science is gonna come in 01:01:16.640 |
like, what does the business actually want as an outcome? 01:01:26.120 |
but people need to figure out what that actually is, 01:01:28.800 |
- Yeah, yeah, all the data engineering tools still apply, 01:01:38.160 |
- We'll have you back again for the World's Fair. 01:01:52.320 |
- I'm worried about having too many all-you-need titles, 01:02:10.200 |
- Cool, well, it was a real pleasure to have you on.