Agents @ Work: Dust.tt — with Stanislas Polu
00:00:06.160 |
This is Alessio, partner and CTO at Decibel Partners, 00:00:09.120 |
and I'm joined by my co-host, Swyx, founder of Smol.ai. 00:00:12.160 |
- Hey, and today we're in the studio with Stan, welcome. 00:00:17.800 |
- And you have had a very distinguished career. 00:00:29.480 |
Oracle, Totems, Stripe, and then OpenAI pre-ChatGPT. 00:00:35.840 |
About two years ago, you left OpenAI to start Dust. 00:00:38.280 |
I think you were one of the first OpenAI alum founders. 00:00:41.800 |
- Yeah, I think it was about at the same time 00:00:43.320 |
as the Adept guys, so it was that first wave. 00:00:46.120 |
- Yeah, and people really loved the David episode. 00:01:05.640 |
You know, you were at Stripe for almost five years. 00:01:07.840 |
There are a lot of Stripe alums going into OpenAI. 00:01:12.120 |
- Yeah, so I think the buses of Stripe people 00:01:14.880 |
really started flowing in, I guess, after ChatGPT. 00:01:32.600 |
- You had a pretty high job at OpenAI at the time, 00:01:39.720 |
and you want to make them think it's awesome, 00:01:44.000 |
- I was, like, maybe 16, so it was 25 years ago. 00:01:47.360 |
Then the first big exposure to AI would be at Stanford. 00:01:50.680 |
And I'm going to, like, disclose how old I am 00:01:54.240 |
because, at the time, it was a class taught by Andrew Ng. 00:02:01.320 |
It was Haar features for vision and the A* algorithm. 00:02:11.680 |
But, you know, that cat face or the human face 00:02:16.760 |
Went to, hesitated doing a PhD, more in systems. 00:02:26.400 |
did a gazillion mistakes, got acquired by Stripe. 00:02:34.280 |
Felt like it was the time you had the Atari games, 00:02:36.440 |
you had the self-driving craziness at the time. 00:02:41.280 |
It felt like the Atari games were incredible, 00:03:01.400 |
Discovering new math would be very foundational. 00:03:04.480 |
but it's not as direct as driving people around. 00:03:10.320 |
kind of a bit of time where I started exploring. 00:03:15.000 |
on trying to get RC cars to drive autonomously. 00:03:25.080 |
because it was like probably very operational. 00:03:34.480 |
and what if, because of a bug I wrote, I killed a family? 00:03:39.880 |
And so I just decided, like, no, that's just too crazy. 00:03:45.800 |
We were trying to apply transformers to code fuzzing. 00:03:52.200 |
and tries to mutate the inputs of a library to find bugs. 00:03:58.320 |
and do reinforcement learning with the signal 00:04:03.960 |
Didn't work at all because the transformers are so slow 00:04:09.960 |
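The loop he describes, mutate a library's inputs and use new coverage as the reinforcement signal, can be sketched without any transformer in it. Everything below (the toy target, the mutation operators, the coverage-as-reward bookkeeping) is an illustrative assumption, not the original system:

```python
import random

def target(data: bytes) -> set[str]:
    """Toy library under test: returns the set of branches the input covers."""
    covered = {"entry"}
    if len(data) > 3:
        covered.add("len>3")
        if data[0] == 0x42:
            covered.add("magic")
            if data[1:3] == b"hi":
                covered.add("deep")
    return covered

def mutate(data: bytes) -> bytes:
    """Randomly flip, insert, or drop a byte (the mutation a model would propose)."""
    buf = bytearray(data or b"\x00")
    op = random.choice(["flip", "insert", "drop"])
    i = random.randrange(len(buf))
    if op == "flip":
        buf[i] = random.randrange(256)
    elif op == "insert":
        buf.insert(i, random.randrange(256))
    elif len(buf) > 1:
        del buf[i]
    return bytes(buf)

def fuzz(rounds: int = 5000, seed: int = 0) -> set[str]:
    """Coverage-guided loop: keep any input that exercises a new branch."""
    random.seed(seed)
    corpus = [b"AAAA"]
    seen: set[str] = set()
    for _ in range(rounds):
        child = mutate(random.choice(corpus))
        cov = target(child)
        if not cov <= seen:  # new coverage is the reward signal
            seen |= cov
            corpus.append(child)
    return seen

print(sorted(fuzz()))
```

Replacing `mutate` with a slow transformer forward pass per input is exactly where the throughput problem he mentions shows up.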
And then I started getting interested in math and AI. 00:04:15.320 |
And at the same time, OpenAI was kind of starting 00:04:17.320 |
the reasoning team that was tackling that project as well. 00:04:27.560 |
I don't know how much you want to dig into that. 00:04:29.360 |
The way to find your way to OpenAI when you're in Paris 00:04:31.480 |
was kind of an interesting adventure as well. 00:04:33.040 |
- Please, and I want to note, this was a two month journey. 00:04:51.520 |
- No, the truth is that I moved back to Paris through Stripe. 00:04:54.600 |
And I just felt the hardship of being remote from your team 00:05:11.600 |
obviously you had worked with Greg, but not anyone else. 00:05:20.640 |
that I was a good engineer through Greg, I presume, 00:05:29.320 |
"Hey, come pass interviews, it's gonna be fun." 00:05:35.800 |
So I go to SF, go through the interview process, 00:05:38.680 |
get an offer, and so I get Bob McGrew on the phone 00:05:42.640 |
"Hey, Stan, it's awesome, you've got an offer. 00:05:47.400 |
"I'm not coming to SF, I'm based in Paris 00:06:06.400 |
and that's how I kind of started working at OpenAI, 00:06:25.440 |
and in particular in the context of formal mathematics. 00:06:28.760 |
The motivation was simple, transformers are very creative, 00:06:33.880 |
and formal math systems have the ability to verify a proof, 00:06:38.760 |
and the tactics they can use to solve problems 00:06:42.320 |
are very mechanical, so you miss the creativity. 00:06:44.840 |
And so the idea was to try to explore both together, 00:06:53.040 |
A formal system, just to give a little bit of context, 00:07:04.000 |
If it type-checks, it means that the proof is correct. 00:07:15.320 |
So the truth is that what you code in involves tactics 00:07:18.720 |
that may involve computation to search for solutions, 00:07:28.200 |
The verification of the proof at the very low level 00:07:32.760 |
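As a concrete illustration of the proof-as-program idea, here is a tiny Lean 4 snippet (Lean is one such formal system; this exact example is mine, not from the episode). The kernel accepts a theorem exactly when the proof term type-checks, while tactics are the more mechanical search steps mentioned above:

```lean
-- A proof is a term: the checker accepts the theorem iff this term type-checks.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- The same statement via a tactic: a mechanical procedure searches for the term.
example (a b : Nat) : a + b = b + a := by
  omega
```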
- How quickly do you run into halting problem, 00:07:37.840 |
and possibilities where you're just like that? 00:07:41.480 |
It was really trying to solve very easy problems. 00:07:58.040 |
because that MATH benchmark includes AMC problems, 00:08:00.800 |
AMC 10, AMC 12, so these are the easy ones, 00:08:19.400 |
because I don't think we'll touch on this again. 00:08:23.520 |
and then more recently with DeepMind scoring silver 00:08:37.920 |
I mean, from my perspective, spent three years on that. 00:08:43.840 |
He was at FAIR, was working on some problems. 00:08:50.120 |
And we cracked a few problems here and there, 00:09:07.360 |
I think there's nothing too magical in their approach, 00:09:11.000 |
There's a Dan Silver talk from seven days ago 00:09:13.360 |
where it goes a little bit into more details. 00:09:22.920 |
So we can dig into what autoformalization means, 00:09:26.640 |
- Let's talk about the tail end, maybe, of the OpenAI. 00:09:30.520 |
"I'm gonna work on math and do all of these things." 00:09:33.000 |
I saw on one of your blog posts, you mentioned 00:09:47.680 |
And then you left just before ChatGPT was released, 00:09:49.920 |
but tell people a bit more about the research path 00:09:57.040 |
there's always been a large chunk of the compute 00:09:58.800 |
that was reserved to train the GPTs, which makes sense. 00:10:04.520 |
Most of the compute was going to a product called Nest, 00:10:09.440 |
And then you had a bunch of, let's say remote, 00:10:12.840 |
not core research teams that were trying to explore 00:10:23.040 |
where your question was going, is that in those labs, 00:10:27.480 |
So by definition, you shouldn't be managing them. 00:10:30.400 |
But in that space, there's a managing tool that is great, 00:10:35.280 |
Basically, by managing the compute allocation, 00:10:49.320 |
but if it was not aligned with OpenAI mission, 00:10:51.640 |
and that's fair, you wouldn't get the compute allocation. 00:10:55.160 |
As it happens, solving math was very much aligned 00:11:01.200 |
And so I was lucky to generally get the compute 00:11:05.920 |
- What do you need to show as incremental results 00:11:15.760 |
that it's going to be aligned with the company. 00:11:17.520 |
So it's much easier than to go into something 00:11:20.960 |
You have to show incremental progress, I guess. 00:11:23.080 |
It's like you ask for a certain amount of compute 00:11:33.880 |
And a strong negative result is actually often 00:11:40.320 |
And then it generally goes into, as any organization, 00:11:44.320 |
you would have kind of people finding your project 00:11:47.120 |
or any other project kind of cool and fancy. 00:12:02.400 |
because you're going in a different direction 00:12:15.600 |
like the results you were kind of bringing back to him 00:12:44.200 |
He would really coach me as a trainee researcher, I guess, 00:12:52.600 |
he was the one showing the North Star, right? 00:13:04.800 |
flock the different teams together towards an objective. 00:13:08.360 |
- I would say like the public perception of him 00:13:10.240 |
is that he was the strongest believer in scaling. 00:13:13.640 |
- He was, he has always pursued like the compression thesis. 00:13:19.320 |
What does the public not know about how he works? 00:13:22.400 |
- I think he's really focused on building the vision 00:13:25.160 |
and communicating the vision within the company, 00:13:28.680 |
I was personally surprised that he spent so much time, 00:13:31.760 |
you know, working on communicating that vision 00:13:34.160 |
and getting the teams to work together versus-- 00:13:40.040 |
it's the belief in compression and scaling compute. 00:13:43.560 |
I remember when I started working on the reasoning team, 00:14:02.400 |
- And was it according to the neural scaling laws, 00:14:13.360 |
basically at the time of GPT-3 being released 00:14:17.000 |
But before that, there really was a strong belief in scale. 00:14:20.960 |
I think it was just the belief that the transformer 00:14:26.960 |
and that this was just a question of scaling. 00:14:34.120 |
I didn't work, weirdly, I didn't work that much with Greg 00:14:43.400 |
One thing about Sam Altman, he really impressed me 00:14:46.000 |
because when I joined, he had joined not that long ago, 00:14:49.920 |
and it felt like he was kind of a very high-level CEO. 00:14:59.000 |
to go into the subjects within a year or something, 00:15:02.320 |
all the way to a situation where when I was having lunch 00:15:07.800 |
he would just quite know deeply what I was doing. 00:15:14.320 |
- Yeah, with no ML, but I didn't have any either, 00:15:19.560 |
But I think you can, it's a question about really, 00:15:24.320 |
the very technicalities of how things are done, 00:15:29.400 |
and what's being done, and what are the recent results, 00:15:35.240 |
and that really impressed me, given the size at the time 00:15:40.560 |
- Yeah, I mean, you've been, you were a founder before. 00:15:52.600 |
because most of the time, you operate at a very high level, 00:15:55.080 |
but being able to go deep down and being in the known 00:15:57.520 |
of what's happening on the ground is something 00:16:02.320 |
That's not a place in which I ever was as a founder, 00:16:05.440 |
because first company, we went all the way to 10 people. 00:16:17.280 |
I mean, Stripe was also like a huge rocket ship. 00:16:19.840 |
- Stripe, I was a founder, so I was, like at OpenAI, 00:16:34.960 |
This year, we've also had a similar management shakeup, 00:16:39.120 |
Can you compare what it was like going through that split 00:16:42.960 |
And then like, does that have any similarities now? 00:16:46.000 |
Like, are we gonna see a new Anthropic emerge 00:16:55.360 |
because they had been training GPT-3, it was a success. 00:17:03.080 |
What I understood of it is that there was a disagreement 00:17:11.640 |
was the fact that we started working on the API 00:17:14.160 |
and wanted to make those models available through an API. 00:17:32.480 |
And I think it's just because we were mostly a research org 00:17:37.960 |
that some divergence in some teams, some people leave, 00:17:46.200 |
- Yeah, very deep bench, like just a lot of talent. 00:17:49.640 |
- So that was the OpenAI part of the history. 00:17:53.280 |
- So then you leave OpenAI in September, 2022. 00:18:07.960 |
rather than going back into doing some more research 00:18:13.280 |
So going through OpenAI was really kind of the PhD 00:18:32.320 |
I'm not a trained, formally trained researcher. 00:18:35.080 |
And it wasn't kind of necessarily an ambition of mine 00:18:45.960 |
But at the time I decided that I wanted to go back 00:18:56.040 |
and if we believe the timelines might not be too long, 00:18:58.680 |
it's actually the last train leaving a station 00:19:01.640 |
After that, it's going to be computers all the way down. 00:19:12.680 |
And the motivation for starting a company was pretty simple. 00:19:20.360 |
So it was pre-ChatGPT, but GPT-4 was ready since, 00:19:23.800 |
I mean, it had been ready for a few months internally. 00:19:27.520 |
The capabilities are there to create an insane amount 00:19:34.360 |
The revenue of OpenAI at the time was ridiculously small 00:19:39.080 |
And so the thesis was there's probably a lot to be done 00:19:45.960 |
Let's talk a bit more about the form factor, maybe. 00:20:03.040 |
which was kind of like the browser extension. 00:20:19.440 |
It was almost inconceivable to just build a product 00:20:24.400 |
Though at the time there was a few companies doing that, 00:20:26.280 |
the one on marketing, I don't remember its name, Jasper. 00:20:56.760 |
I had the strong belief from my research time 00:21:05.680 |
Basically, if you just have one example, you overfit. 00:21:15.800 |
on a multi-step workflow, you start parallelizing stuff. 00:21:21.960 |
you just have like a messy stream of tokens going out 00:21:25.440 |
and it's very hard to observe what's going there. 00:21:33.080 |
the output of each interaction with the model 00:21:35.760 |
and dig into there through a new UI, which is-- 00:21:41.000 |
I mean, Dust is entirely open source even today. 00:21:47.080 |
The reason why is because we're not open source 00:21:48.600 |
because we're not doing an open source strategy. 00:21:53.080 |
We're open source because we can and it's fun. 00:21:59.680 |
- But I think that downside is a big fallacy. 00:22:04.800 |
but the value of Dust is not the current state. 00:22:22.120 |
you can be extremely transparent and just show the code. 00:22:28.600 |
you can just point to the issue, show the pull request. 00:22:33.120 |
Oh, PR welcome, that doesn't happen that much. 00:22:41.120 |
they really enjoy seeing the pull requests advancing 00:22:45.280 |
And then the downsides are mostly around security. 00:22:48.440 |
You never want to do security by obfuscation. 00:22:58.160 |
because if you're doing anything like bug bounties 00:22:58.160 |
you just give much more tools to the bug bounty hunters 00:23:01.840 |
I don't believe in the value of the code base per se. 00:23:13.400 |
- I think it's really the people that are on the code base 00:23:15.120 |
that have the value and the go-to-market and the product 00:23:18.040 |
and all of those things that are around the code base. 00:23:20.960 |
Obviously, that's not true for every code base. 00:23:28.560 |
I would buy that you don't want to be open source. 00:23:36.080 |
- I signed up for XP1, I was looking, January, 2023. 00:23:44.760 |
how did you feel having to push a product out 00:23:47.080 |
that was using this model that was so inferior? 00:23:55.200 |
that maybe doesn't quite work with the model today, 00:23:57.080 |
but you're just expecting the new model to be better? 00:23:59.360 |
- Yeah, so actually, XP1 was even on a smaller one 00:24:02.920 |
that was the post-ChatGPT release small version, 00:24:08.880 |
but it was the small version of ChatGPT, basically. 00:24:15.440 |
but at the same time, I think XP1 was designed, 00:24:18.080 |
was an experiment, but was designed as a way to be useful 00:24:22.560 |
If you just want to extract data from a LinkedIn page, 00:24:26.840 |
If you want to summarize an article on a newspaper, 00:24:31.000 |
And so it was really a question of trying to find a product 00:24:41.240 |
So that was kind of a, there's a bit of a frustration 00:24:44.880 |
and you know that you don't have access to it yet, 00:24:46.520 |
but it's also interesting to try to find a product 00:24:51.360 |
- And we highlighted XP1 in our Anatomy of Autonomy post 00:25:08.800 |
and then you kind of got to where Dust is today. 00:25:12.640 |
of what Dust is today and the core thesis behind it. 00:25:16.920 |
So Dust, we really want to build the infrastructure 00:25:19.280 |
so that companies can deploy agents within their teams. 00:25:25.680 |
because we strongly believe in the emergence of use cases 00:25:28.280 |
from the people having access to creating an agent 00:25:32.600 |
They have to be tinkerers, they have to be curious, 00:25:35.120 |
but they can, like anybody can create an agent 00:25:51.800 |
you have to build the pipes such that the agents 00:25:53.560 |
can take action, can access the web, et cetera. 00:25:58.040 |
Maintaining connections to Notion, Slack, GitHub, 00:26:04.280 |
It is boring work, boring infrastructure work, 00:26:06.840 |
but that's something that we know is extremely valuable 00:26:09.440 |
in the same way that Stripe is extremely valuable 00:26:18.640 |
And there it's fascinating because everything started 00:26:21.760 |
from the conversational interface, obviously, 00:26:26.160 |
but we're only scratching the surface, right? 00:26:29.280 |
I think we are at the Pong level of LLM productization, 00:26:41.640 |
So this is really, our mission is to really create 00:26:48.520 |
to just get away all the work that can be automated 00:26:54.080 |
- And can you just comment on different takes 00:26:57.320 |
So maybe at the most open, it's like auto-GPT. 00:27:01.200 |
It's just kind of like, just try and do anything. 00:27:09.440 |
They're super hands-on with each individual customer 00:27:15.880 |
between this is magic, this is exposed to you, 00:27:18.120 |
especially in a market where most people don't know 00:27:25.680 |
So the auto-GPT approach obviously is extremely exciting, 00:27:28.920 |
but we know that the agentic capability of models 00:27:37.760 |
Same with XP1, and where it works is pretty simple. 00:27:40.440 |
It's simple workflows that involve a couple of tools 00:27:51.120 |
you just want people to put it in the instructions. 00:27:57.200 |
pick up that document, do the work that I want 00:27:59.800 |
in the format I want, and give me the results. 00:28:06.080 |
it's mostly using English for people to program a workflow 00:28:16.000 |
would you say it's kind of like a LLM Zapier type of thing? 00:28:21.720 |
It's still very, you're programming with English? 00:28:25.760 |
So you're just saying, oh, do this, and then that. 00:28:31.320 |
You say, when I give you the command X, do this. 00:28:41.720 |
You just need to describe what the tasks are 00:28:41.720 |
supposed to be and make the tools available to the agent. 00:28:49.200 |
The tool can be querying into a structured database. 00:29:03.400 |
sending an email, clicking on a button in the admin, 00:29:09.320 |
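Mechanically, "make the tool available to the agent" usually means registering a described tool and dispatching the model's function call onto it. The registry, the tool names, and the simulated model output below are all hypothetical, not Dust's real API:

```python
import json

# Hypothetical tool registry: a description the model sees, a parameter
# schema, and the callable that actually runs the tool.
TOOLS = {
    "search_notion": {
        "description": "Semantic search over synced Notion pages.",
        "parameters": {"query": "string"},
        "run": lambda query: [f"page matching {query!r}"],
    },
    "send_email": {
        "description": "Send an email on the user's behalf.",
        "parameters": {"to": "string", "body": "string"},
        "run": lambda to, body: f"sent to {to}",
    },
}

def dispatch(model_output: str):
    """Parse the model's function call (a JSON blob) and execute the tool."""
    call = json.loads(model_output)
    tool = TOOLS[call["name"]]
    return tool["run"](**call["arguments"])

# Simulated model output: the model picked a tool and filled its parameters.
print(dispatch('{"name": "search_notion", "arguments": {"query": "Q3 OKRs"}}'))
```

With good instructions, the model's only jobs are choosing the `name` and filling `arguments`, which is why precise instructions make this reliable.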
- Today, we maintain most of the integrations. 00:29:17.000 |
But the reality is that, the reality of the market today 00:29:22.280 |
And so it's mostly us maintaining the integration. 00:29:25.400 |
As an example, a very good source of information 00:29:30.880 |
because Salesforce is basically a database and a UI, 00:29:43.200 |
And the type of support, or real native support, 00:29:46.040 |
will be slightly more complex than just OAuth-ing into it, 00:29:52.520 |
oh, you want to connect your Salesforce to us? 00:29:54.440 |
Give us the SOQL, that's the Salesforce Object Query Language. 00:29:58.600 |
Give us the queries you want us to run on it, 00:30:03.040 |
So that's interesting how not only integrations are cool, 00:30:06.200 |
and some of them require a bit of work on the user, 00:30:08.480 |
and for some of them that are really valuable to our users, 00:30:21.520 |
In that case, so we do have browser automation 00:30:24.240 |
for all the use cases and apply the public web, 00:30:35.560 |
- I mean, what I've been saying for a long time, 00:30:40.040 |
that you're gonna stand in front of your computer 00:30:50.760 |
And if the APIs are there, we should use them. 00:31:14.000 |
the scale-ups that are between 500 and 5,000 people, 00:31:17.040 |
tech companies, most of the SaaS they use have APIs. 00:31:21.080 |
Not as an interesting question for the open web, 00:31:26.280 |
that involve websites that don't necessarily have APIs. 00:31:29.240 |
And the current state of web integration from, 00:31:35.360 |
I don't even know if they have web navigation, 00:31:38.040 |
The current state of affair is really, really broken, 00:31:41.320 |
You have basically search and headless browsing. 00:31:44.000 |
But headless browsing, I think everybody's doing 00:31:46.840 |
basically body.innerText and feed that into the model. 00:31:56.200 |
that are exploring the capability of rendering a webpage 00:32:03.200 |
so that's basically the place where to click in the page 00:32:06.200 |
through that process, expose the actions to the model, 00:32:12.760 |
which is not a big page of a full DOM that is very noisy, 00:32:19.320 |
back to the original page and take the action. 00:32:24.000 |
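The naive baseline described above, grab `body.innerText` and hand the model one flat string, can be approximated with the stdlib HTML parser. This is a sketch of that baseline (not of the rendering-based approaches), and the sample page is made up:

```python
from html.parser import HTMLParser

class InnerText(HTMLParser):
    """Collect visible text, skipping script/style/head, like body.innerText."""
    SKIP = {"script", "style", "head"}

    def __init__(self):
        super().__init__()
        self.chunks: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def inner_text(html: str) -> str:
    parser = InnerText()
    parser.feed(html)
    return "\n".join(parser.chunks)

page = ("<html><head><title>x</title></head><body>"
        "<h1>Pricing</h1><script>var a=1;</script><p>$29/mo</p></body></html>")
print(inner_text(page))
```

This keeps the words but throws away layout and clickable structure, which is exactly why the action-extraction approaches described above are more promising.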
and that will kind of change the level of things 00:32:29.720 |
That I feel exciting, but I also feel that the bulk 00:32:33.120 |
of the useful stuff that you can do within the company 00:32:36.000 |
can be done through API, the data can be retrieved by API, 00:32:40.640 |
- For listeners, I'll note that you're basically 00:32:44.520 |
- Exactly, exactly, I've seen it since summer. 00:32:47.200 |
- Adept is where it is, and Dust is where it is, 00:32:51.640 |
- Can we just quickly comment on function calling? 00:32:54.760 |
- You mentioned you don't need the models to be that smart 00:33:02.600 |
Is there any room for improvement left in function calling, 00:33:05.760 |
or do you feel you usually consistently get always 00:33:08.120 |
the right response, the right parameters, and all that? 00:33:12.200 |
because if the instructions are good and precise, 00:33:16.960 |
and the model just looks at the instructions and follows them 00:33:19.160 |
and says, oh, it's probably talking about that action, 00:33:21.360 |
and I'm gonna use it, and the parameters are kind of 00:33:28.520 |
kind of an AutoGPT-esque level in the instructions, 00:33:31.080 |
and provide 16 different tools to your model, 00:33:33.520 |
yes, we're seeing the models in that state making mistakes. 00:33:37.080 |
And there is obviously some progress that can be made 00:33:41.680 |
on the capabilities, but the interesting part 00:33:56.720 |
like pushing our users to create rather simple agents, 00:33:59.880 |
is that once you have those working really well, 00:34:03.040 |
you can create meta-agents that use the agents as actions, 00:34:06.040 |
and all of a sudden, you can kind of have a hierarchy 00:34:08.640 |
of responsibility that will probably get you almost 00:34:14.200 |
It requires the construction of intermediary artifacts, 00:34:24.400 |
in a specific channel, or shipped, or shared in Slack. 00:34:27.280 |
We have a weekly meeting where we have a table 00:34:32.040 |
We're not writing that weekly meeting table anymore. 00:34:34.240 |
We have an assistant that just goes and finds the right data 00:34:52.040 |
about our financials and our progress and our ARR, 00:34:57.320 |
those graphs directly, and those assistants work great. 00:35:00.720 |
By creating those assistants that cover those small parts 00:35:02.840 |
of that weekly meeting, slowly, we're getting to, 00:35:05.200 |
in a world where we'll have a weekly meeting assistant, 00:35:07.880 |
we'll just call it, you don't need to prompt it, 00:35:16.760 |
and that's an objective for us, to us using Dust, get there, 00:35:20.280 |
you're saving, I don't know, an hour of company time 00:35:24.400 |
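The meta-agent idea, simple agents exposed as actions to a higher-level agent, can be sketched as plain function composition. The keyword routing below stands in for the model's tool choice, and all names are illustrative:

```python
from typing import Callable

# An "agent" here is just a named callable from a task to a result.
Agent = Callable[[str], str]

def make_agent(name: str, handler: Callable[[str], str]) -> Agent:
    def run(task: str) -> str:
        return f"[{name}] {handler(task)}"
    return run

# Two simple agents of the kind users build first.
sales_agent = make_agent("sales", lambda t: f"ARR figures for {t}")
support_agent = make_agent("support", lambda t: f"ticket stats for {t}")

def make_meta_agent(name: str, sub_agents: dict[str, Agent]) -> Agent:
    # A real meta-agent would let the model pick sub-agents; keyword
    # routing keeps this sketch runnable and deterministic.
    def run(task: str) -> str:
        parts = [agent(task) for key, agent in sub_agents.items() if key in task]
        return f"[{name}] " + " | ".join(parts)
    return run

weekly_meeting = make_meta_agent(
    "weekly", {"sales": sales_agent, "support": support_agent}
)
print(weekly_meeting("sales and support, week 42"))
```

Once the small agents work reliably, the hierarchy of responsibility falls out of composing them, which is the point made above.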
- Yeah, that's my pet topic of NPM for agents. 00:35:27.480 |
It's like, how do you build dependency graphs of agents 00:35:31.560 |
Because why do I have to rebuild some of the smaller levels 00:35:36.280 |
- I have a quick follow-up question on agents 00:35:42.640 |
both from like Microsoft and even in startups. 00:35:53.440 |
I don't know, is there should be a protocol format? 00:35:56.000 |
- To be completely honest, the state we are at right now 00:36:00.280 |
So we haven't even explored yet the meta agents. 00:36:11.600 |
If you go to a company, random SaaS B2B company, 00:36:19.880 |
and you tell them build some tooling for yourself, 00:36:23.640 |
If you tell them build AutoGPT, they'll go, "Auto what?" 00:36:29.040 |
you're very much focused on non-technical users. 00:36:33.120 |
You mention instruction instead of system prompt, right? 00:36:40.680 |
who kind of pushed us to create a friendly product. 00:36:45.320 |
I was knee-deep into AI when I started, obviously. 00:36:48.600 |
And my co-founder, Gabriel, was at Stripe as well. 00:36:55.040 |
Was at Alan, a healthcare company in Paris, after that. 00:37:05.920 |
to make that technology not scary to end users. 00:37:17.880 |
And so we were very proactive and very deliberate 00:37:20.440 |
about creating a brand that feels not too scary 00:37:23.000 |
and creating a wording and a language, as you say, 00:37:31.200 |
- And another big point that David had about Adept 00:37:40.120 |
How's that different when you're interacting with APIs 00:37:48.760 |
- Yep, so I think that goes back to the DNA of the companies 00:38:00.080 |
and that's why they raised a large amount of money, 00:38:43.440 |
it is even for us human extremely hard to decide 00:38:52.600 |
So being extremely, extremely, extremely pragmatic here, 00:38:57.440 |
We have to build a product that satisfies the end users 00:39:04.240 |
person that is building the agent can iterate on it. 00:39:06.880 |
As a second step, maybe later when we start training model 00:39:10.920 |
we can optimize around that for each of those companies. 00:39:18.520 |
the same way all SaaS now kind of offers APIs 00:39:27.440 |
so that then you can use agents like Red Team, 00:39:34.760 |
I think it really going to depend on how much, 00:39:37.280 |
because you need to simulate to generate data, 00:39:44.880 |
or are we just going to be using frontier models as they are? 00:39:48.880 |
On that question, I don't have a strong opinion. 00:39:51.600 |
It might be the case that we'll be training models 00:39:59.360 |
that as you get big and you want to really own your product, 00:40:02.880 |
you're going to have to own the model as well. 00:40:05.680 |
Owning the model doesn't mean doing the pre-training, 00:40:09.440 |
but at least having an internal post-training 00:40:18.440 |
then there might be incentives for the SaaS' of the world 00:40:33.440 |
- So that's an incentive. - Yeah, they got to sell seats. 00:40:39.680 |
I'm sure you've used many, probably not just OpenAI. 00:40:42.320 |
Would you characterize some models as better than others? 00:40:47.600 |
What have been the trends in models over the last two years? 00:40:54.320 |
And at times it's the OpenAI model that is the best, 00:40:58.560 |
at times it's the Anthropic models that is the best. 00:41:06.440 |
- Yeah, so when you create an assistant or an agent, 00:41:08.400 |
you can just say, "Oh, I'm going to run it on GPT-4, 00:41:13.040 |
- Don't you think for the non-technical user, 00:41:18.320 |
So we move the default to the latest model that is cool, 00:41:26.600 |
you would have to go in advance and go pick your model. 00:41:37.240 |
- And do you care most about function calling 00:41:43.160 |
because there's nothing worse than a function call, 00:41:47.280 |
including incorrect parameters or being a bit off 00:41:49.800 |
because it just drives the whole interaction off. 00:41:55.640 |
- Yeah, these days, it's funny how the comparison 00:42:03.800 |
I personally don't have proof, but I know many people, 00:42:19.800 |
They kind of innovated in an interesting way, 00:42:23.200 |
but it's that they have that kind of chain of thought step 00:42:26.560 |
whenever you use a Claude model or Sonnet model 00:42:31.880 |
when you just interact with it just for answering questions, 00:42:35.040 |
but when you use function calling, you get that step, 00:42:36.720 |
and it really helps getting better function calling. 00:42:41.520 |
with the Berkeley team that runs that leaderboard this week. 00:42:45.960 |
It was V1 like two months ago, and then V2, V3. 00:42:51.240 |
And then the third place is xLAM from Salesforce, 00:43:01.920 |
- But arguably o1-mini has been in line for that. 00:43:15.560 |
- It's funny because I've been doing research for three years 00:43:25.080 |
is that when we manage to activate the company, 00:43:27.960 |
The highest penetration we have is 88% daily active users 00:43:34.880 |
The kind of average penetration and activation 00:43:39.960 |
is something like more like 60 to 70% weekly active. 00:43:43.760 |
So we basically have the entire company interacting with us. 00:43:54.400 |
because there is so many places where you can create products 00:43:57.840 |
or do stuff that will give you the 80% with the work you do, 00:44:02.560 |
whereas deciding if it's GPT-4 or GPT-4 Turbo or et cetera, 00:44:07.160 |
you know, it'll just give you the 5% improvement. 00:44:11.720 |
- But the reality is that you want to focus on the places 00:44:17.680 |
But that's something that we'll have to do eventually 00:44:20.840 |
- It's funny 'cause in some ways the model labs 00:44:33.880 |
You're not really limited by quality of model. 00:44:36.280 |
- Right now we are limited by, yes, the infrastructure part, 00:44:45.000 |
to all the data they need to do the job they want to do. 00:44:50.760 |
that are starting to provide integrations as a service, right? 00:44:57.840 |
about how you chunk stuff and how you process information 00:45:15.040 |
And the reality is that if you look at Notion, 00:45:24.320 |
to actually make it available to models in a useful way. 00:45:28.280 |
Because you get all the blocks, details, et cetera, 00:45:32.160 |
- Because also it's for data scientists and not for AI. 00:45:34.920 |
- The reality of Notion is that sometimes you have a, 00:45:37.920 |
so when you have a page, there's a lot of structure in it, 00:45:47.120 |
Sometimes those databases are real tabular data. 00:46:01.400 |
And to really get a very high-quality interaction 00:46:27.180 |
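A sketch of the flattening problem just described: Notion-style pages arrive as trees of typed blocks, and making them "available to models in a useful way" means linearizing that tree into text. The block shapes below are simplified stand-ins, not the real Notion API schema:

```python
def flatten(block: dict, depth: int = 0) -> str:
    """Linearize a tree of typed blocks into indented plain text for a model."""
    prefix = {"heading": "# ", "bullet": "- "}.get(block["type"], "")
    line = "  " * depth + prefix + block.get("text", "")
    children = [flatten(child, depth + 1) for child in block.get("children", [])]
    return "\n".join([line] + children)

# A toy page: a heading with a paragraph and a bullet nested under it.
page = {
    "type": "heading", "text": "Q3 Planning",
    "children": [
        {"type": "paragraph", "text": "Goals for the quarter."},
        {"type": "bullet", "text": "Ship Salesforce connector"},
    ],
}
print(flatten(page))
```

The hard part in practice is everything this sketch ignores: inline databases, mixed tabular and free-form content, and deciding what structure to keep versus drop.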
You know, he wants to put the AI in there, but, you know. 00:46:33.940 |
that are like sneakily hard that you're tackling 00:46:43.540 |
is really building the infra that works for those agents, 00:46:53.020 |
that will be useful to a non-negligible set of your users. 00:47:00.200 |
that shouldn't be conversational interactions, 00:47:04.020 |
Basically, know that we have the firehose of information 00:47:22.340 |
because you can just sift through much more information. 00:47:31.620 |
"I wanna be updated when there is a piece of information 00:47:40.140 |
"It says the opposite of what you have in that paragraph. 00:47:42.160 |
"Maybe you wanna update or just ping that person." 00:47:44.540 |
I think there is a lot to be explored on the product layer 00:47:56.660 |
- One thing you keep mentioning about infra work, 00:48:00.900 |
and serving that in a very consumer-friendly way. 00:48:04.560 |
You always talk about infra being additional sources, 00:48:09.180 |
But I'm also interested in the vertical infra. 00:48:11.180 |
There is an orchestrator underlying all these things, 00:48:20.580 |
you have to wait for something to be executed 00:48:24.740 |
I used to work on an orchestrator as well, Temporal. 00:48:42.800 |
And you would say, "Why is it so complicated?" 00:48:51.040 |
like managing the entire set of stuff that needs to happen 00:49:02.240 |
And whenever we see that piece of information goes through, 00:49:05.080 |
maybe trigger workflows because to run agents, 00:49:18.520 |
of replacing Temporal. - Building orchestrators. 00:49:28.040 |
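A toy version of what an orchestrator like Temporal provides around each step, durable retries with backoff, so syncing connectors and triggering workflows survives flaky upstream APIs. The flaky activity and parameters here are invented for illustration:

```python
import time

def run_activity(fn, *, retries: int = 3, backoff: float = 0.01):
    """Run one workflow activity, retrying with exponential backoff on failure."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries: surface the failure to the workflow
            time.sleep(backoff * 2 ** attempt)

# A hypothetical connector-sync step that fails twice before succeeding.
calls = {"n": 0}
def flaky_sync():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("upstream API hiccup")
    return "synced"

print(run_activity(flaky_sync))
```

A real orchestrator adds durability (state survives process restarts) and scheduling on top, which is exactly the machinery that makes "buy" attractive here.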
I think in that case, when you're a high-growth company, 00:49:31.740 |
your buy-build trade-off is very much on the side of buy, 00:49:37.600 |
you can focus on your core competency, et cetera. 00:49:41.040 |
we're starting to see the post-high-growth company, 00:49:54.840 |
- No, no, I know, of course they say it's true, 00:50:32.760 |
And then it makes sense to just scratch the SaaS away. 00:50:46.600 |
you don't have the capabilities to reduce SaaS cost. 00:50:54.800 |
new category of companies that might remove some SaaS. 00:50:57.560 |
- Yeah, Alessio's firm has an interesting thesis 00:51:06.520 |
You know, ideally, it's all a labor interface 00:51:08.520 |
where you're asking somebody to do something for you, 00:51:15.400 |
- Are you paying for Temporal Cloud or are you self-hosting? 00:51:24.200 |
- That's why as a shareholder, I like to hear that. 00:51:29.960 |
I just want a list for other founders to think about. 00:51:34.680 |
anything interesting there that you build or buy? 00:51:37.560 |
- I mean, there's always an interesting question. 00:51:39.320 |
We've been building a lot around the interface 00:51:44.880 |
the original version was an orchestration platform, 00:51:55.240 |
and so we continued building upon, and we own it. 00:52:02.400 |
- I would say LiteLLM is the current open source consensus. 00:52:15.840 |
It started as pure JavaScript, not TypeScript, 00:52:18.160 |
and I think you want to, if you're wondering, 00:52:21.000 |
oh, I want to go fast, I'll do a little bit of JavaScript. 00:52:26.800 |
- So interesting, you are a research engineer 00:52:29.440 |
who came out of OpenAI and bet on TypeScript. 00:52:31.880 |
- Well, the reality is that if you're building a product, 00:52:34.400 |
you're going to be doing a lot of JavaScript, right? 00:52:39.160 |
It's a great platform, and our internal service 00:52:47.320 |
The Next.js story, it's interesting because Next.js 00:52:49.400 |
is obviously the king of the world in JavaScript land, 00:52:51.720 |
but recently, ChatGPT just rewrote from Next.js to Remix. 00:52:58.920 |
That is like the biggest news in front-end world in a while. 00:53:04.640 |
you predicted the first billion-dollar company 00:53:08.160 |
and you said that's basically like a sign of AGI, 00:53:10.160 |
once we get there, and you said it had already been started. 00:53:16.440 |
- That quote was probably independently invented, 00:53:25.920 |
I hypothesize it was maybe already being started, 00:53:34.240 |
I guess we're going to have to wait for it a little bit, 00:53:36.920 |
and I think it's because the Dusts of the world don't exist, 00:53:39.600 |
and so you don't have that thing that lets you run those, 00:54:04.160 |
with a lot of assistance from machines to achieve your job. 00:54:07.720 |
That would be great, and that I believe in a bit more. 00:54:15.720 |
but it's basically like so many people are focused on, 00:54:18.000 |
oh, it's kind of like displaced jobs and whatnot, 00:54:20.600 |
but I'm like, there's so much work that people don't do 00:54:24.240 |
and maybe the question is that you just don't scale 00:54:31.960 |
and then people using Dust will be two people. 00:54:31.960 |
- So my hot take is I actually know what vertical 00:54:39.960 |
- There's already two of us, so we're at max capacity. 00:54:46.840 |
but his team is, he's got about like 200 people, 00:54:58.680 |
he sold his company for 250 million to Spotify, 00:54:58.680 |
so he's not going to hit that billionaire status. 00:55:11.320 |
by a bunch of agents, dust agents, to do all this stuff, 00:55:16.000 |
because then ultimately it's just the brand, the curation. 00:55:27.280 |
I think it was the Pinterest or Dropbox founder who said at the time, 00:55:27.280 |
when you're CEO, you mostly have an editorial position. 00:55:46.360 |
Like I write commentary, I choose between four options. 00:55:52.240 |
you build up your brand through those many decisions. 00:56:00.680 |
you have an upcoming podcast with NotebookLM, 00:56:23.280 |
Any final kind of like call to action hiring? 00:56:25.560 |
It's like, obviously people should buy the product 00:56:29.040 |
And no, I think we didn't dive into the vertical 00:56:37.040 |
We spike at penetration and that's just awesome 00:56:53.360 |
But the potential within the company after that is limited. 00:56:58.000 |
We're true believers of the horizontal approach 00:57:03.400 |
But I think it's an interesting thing to think about 00:57:17.200 |
- Yeah, I'll provide you my response on that. 00:57:21.280 |
And it's basically your sense on the products 00:57:26.160 |
In other words, if you're trying to be as many things 00:57:33.640 |
And in the future, if we want to choose to spin off platforms 00:57:33.640 |
for other things, we can because we have that brand. 00:57:44.480 |
that like, here's the info that we use for search 00:57:48.960 |
you can always have lateral movement within companies, 00:57:48.960 |
I don't really mean the platform as the platform platform. 00:58:12.200 |
there are so many operations within the company. 00:58:12.200 |
Some of them have been extremely rationalized by the market, 00:58:24.760 |
But there are so many operations that make up a company