back to indexAI Engineer World’s Fair 2025 - Tiny Teams

00:02:01.920 |
So excited to have you here today for, I believe, the first ever edition of Tiny Teams here 00:02:19.380 |
at the AI Engineer Summit, courtesy of our friend, Sean Wang, aka SWIX. 00:02:26.720 |
I'm a GP at a venture capital fund called CRB. 00:02:29.880 |
We've been in business for 55 years, backing teams at the Seed and Series A stage across 00:02:36.860 |
We currently are investing out of a billion dollar fund. 00:02:39.480 |
We're investing in folks like Vercel, Postman, Kong, Browser Base, Voyage AI across the landscape 00:02:50.000 |
We have a super exciting track for you, namely kind of pointing to a trend that we've all 00:02:54.260 |
seen in the past year and a half, two years of AI, which is that, you know, small teams 00:02:59.080 |
can build insanely successful projects in a way that probably was never possible previously. 00:03:04.400 |
And so here to kick things off for us is Eric Simons from StackBlitz with their product, 00:03:26.340 |
You know, by the end of this, what I hope that you get out of it is maybe some advice I 00:03:32.040 |
wished I had had before kind of trying to hold on to the tail of the dragon with the past 00:03:39.120 |
And, you know, so I think for us, you know, how many people here have heard of Bolt.new, 00:03:47.360 |
I guess I'm still used to being like, has anyone heard of StackBlitz? 00:03:50.800 |
And there'd be like two hands and that sort of thing. 00:03:55.260 |
So is anyone aware of how long we were around as a company before we launched? 00:04:07.740 |
So if you rewind, you know, the x-axis way back seven years, the ARR starting at the bottom 00:04:16.020 |
Sorry, that's October of last year is when that thing starts. 00:04:21.220 |
That's over seven years what it had gotten up to, right? 00:04:24.720 |
And at the time that we launched Bolt, you know, we were a team of less than 20 people. 00:04:29.000 |
And when we put it online, we had absolutely no idea what was going to happen. 00:04:33.820 |
Like we thought we were getting ready to shut down the company, actually, at the end of last 00:04:39.080 |
This was like the last, you know, not like pivot of the company because it was the same 00:04:43.440 |
core technology that was used to build this product that we've been building for seven 00:04:46.940 |
But we couldn't figure out a way to kind of create a commercial offering that made sense 00:04:54.180 |
And so, you know, our expectations were like if we can add $100,000 of ARR by the end of the 00:04:59.880 |
year with this thing, that would be game changing, right? 00:05:02.940 |
Obviously, kind of beyond our wildest expectations of what happened. 00:05:08.840 |
But I think that to me, the really crazy thing about the graph you're looking at is how clean 00:05:18.060 |
Like there's not this like jagged edges during the insanity of the early days. 00:05:23.380 |
And the product we put online was really, it's like those race cars. 00:05:28.060 |
It's like those race cars where they strip everything out and it's just like metal. 00:05:32.440 |
It's just like this, you know, that's kind of what the product was like. 00:05:34.960 |
So the fact that, you know, our team was able to scale this was just unbelievably impressive. 00:05:40.000 |
And so that's, I kind of want to talk about what that looked like and how to structure teams 00:05:44.200 |
to be able to actually rally together and, you know, be able to scale what would normally 00:05:51.740 |
The best analogy of what it feels like, have you ever seen the movie 300? 00:05:55.420 |
During this time, right, like on this revenue ramp, kind of like probably at the tail end 00:05:59.660 |
that we were looking at probably 30,000 or 40,000 active customers at that point, you know, 00:06:07.160 |
You've got this small group of people, you know, surrounded by just tens of thousands of things 00:06:12.660 |
that are maybe not trying to kill you in our sense, but like, it felt like that. 00:06:18.540 |
We had no, there was not a person on our team that had, you know, success or support in their 00:06:23.620 |
It was my chief of staff and I largely responding to support tickets and whatnot. 00:06:26.800 |
But the main thing, we were able to make this work because it was this just incredible camaraderie 00:06:33.920 |
And we'd been working together for seven years at that point. 00:06:37.140 |
And we were just extremely aligned and very lean and very fast. 00:06:42.700 |
Like, that's how we've been operating all along, right? 00:06:45.420 |
And one of the core, you know, philosophies that we set out, like my co-founder and I, he's 00:06:52.540 |
He and I have been building websites together since, for like 20 years now, literally, since 00:06:59.200 |
And the company we did before Stackless, before this one, we'd actually bootstrapped 00:07:08.900 |
And when you do that, you really learn how far a dollar can stretch. 00:07:12.100 |
And it's very obvious how most startups are just incredibly inefficient when you're in the 00:07:18.280 |
phase of trying to find product market fit, right? 00:07:20.340 |
And so this is kind of where this mantra that we've had at our company for, you know, almost 00:07:26.580 |
a decade now really kicked in, which is you really want a small number of people with 00:07:31.160 |
Because what that means is that people at the company have more agency. 00:07:38.400 |
They don't have to get permission to build things, right? 00:07:40.360 |
There's not this whole chain of command you have to go through, et cetera. 00:07:46.240 |
You can just go and, you know, make, you know, immediate impact, which, again, is really 00:07:52.460 |
important when you're dealing with this sort of like scale, right? 00:07:55.280 |
Of course, for startups, the name of the game, when you're finding product market fit, is 00:07:59.840 |
you need to be able to take as many shots on goal as you possibly can. 00:08:03.520 |
Because, like, fundamentally, getting product market fit is just like an enterprise sales 00:08:08.760 |
Like, if you're a sales, an enterprise sales rep, and you want to close a million dollars 00:08:12.000 |
of pipeline, you don't go and talk to, like, three people and assume you're going to close 00:08:16.520 |
You're talking to, like, 100, 200 people at your top of funnel. 00:08:19.900 |
Of those, you know, half of those, maybe take the next call. 00:08:27.100 |
Of those, you close three or something, right? 00:08:29.200 |
Same thing with, like, you know, building products and building startups in the early phases. 00:08:33.100 |
You need to stick around as long as you possibly can, which means you need a lower burn rate. 00:08:37.840 |
You do not want to have more people at the company, right? 00:08:40.400 |
Because humans are the most expensive thing for a company. 00:08:43.860 |
It doesn't mean you shouldn't hire them, but it means that, you know, it's important for your 00:08:47.160 |
ability to have a durable, you know, enough time to actually take shots on goal. 00:08:51.540 |
Case in point, one of our main competitors of our previous product when we were in the IDE space, 00:08:56.460 |
they got acquired, basically stripped for parts, two weeks before we launched Bolt, right? 00:09:02.760 |
But purely a matter of they didn't have enough runway to actually get to the other side of this thing. 00:09:09.480 |
Like, they would have been meaningful competitors in this space to us, right? 00:09:13.180 |
Okay, so that's, this is, you know, there's a whole bunch of reasons this is important. 00:09:21.100 |
A lot of, you know, folks that do startups repeat this sort of thing. 00:09:25.080 |
You want to have people that have a great shared, you know, shared set of core values that, 00:09:30.420 |
you know, where it's low ego, high trust, they're obsessed with making the user successful, 00:09:37.740 |
and underneath, you know, chaos, they have grit and resilience, right? 00:09:42.980 |
If you aren't in this sort of insane situation like we were, and you have folks that are already 00:09:50.800 |
having trouble with the ups and downs of a startup, I will tell you, it would not have been possible 00:09:57.580 |
to do what we did with folks that didn't have just incredible grit and ability to just check 00:10:02.300 |
their ego and focus on what really mattered, right? 00:10:08.060 |
So that's, I think, from a team perspective, those are the things that really stick out to me from, 00:10:14.320 |
you know, what allowed us to, you know, really scale with the traction that we saw and we're seeing. 00:10:22.100 |
To handle, and this probably applies to not just, like, this crazy extreme situation that we are in growing the company, 00:10:29.920 |
but in general, you know, at startups, there's going to be things, times when just everything's on fire, right? 00:10:36.000 |
And a lot of you probably relate to this, where it's like, sometimes it's good things are on fire, 00:10:42.780 |
Sometimes it's bad things that are on fire, right? 00:10:48.060 |
And the best analogy that I have leaned into, like, as an operator is, like, 00:10:54.220 |
imagine that you're, you know, a fire truck squad, you have one truck, 00:10:58.560 |
and you're in a town that's completely on fire. 00:11:02.860 |
And the answer is, you have to make hard decisions and choose where the high-impact areas are 00:11:08.960 |
from infrastructure, you know, and the key people that need to be saved, right? 00:11:14.160 |
And it's tough because all these things, it's hard to gauge sometimes what's actually going to be the most important thing, 00:11:23.040 |
And so that's, and a lot of, and that, and what you're saying is, 00:11:27.160 |
there are some fires that are just going to have to burn, and that's okay, right? 00:11:30.740 |
But if we focus on saving the right things, focus on the right things, 00:11:34.260 |
that'll make up for, you know, all the other things that we have to let go, 00:11:38.420 |
because we simply can't focus on everything as a small team. 00:11:41.440 |
But there's actually an added benefit of that, is that you don't get lost in the million things. 00:11:46.420 |
If you just hired a whole bunch of people, you feel like you have to do all of these things. 00:11:50.280 |
It turns out, focusing on 10% of the things often gets you the lion's share of the result that actually matters. 00:11:58.000 |
So it forces you to actually have clearer thinking of what you're going to go put your time and focus into as a team, right? 00:12:03.400 |
And, you know, I kind of mentioned the story of, for us, you know, we've been around for a long time, 00:12:12.900 |
And, you know, over the past eight years in the Valley, there's been a lot of things that people will say and believe, 00:12:20.780 |
and you go to things, you know, gatherings of people, and they'll kind of repeat these same things, 00:12:26.700 |
So a couple of these, like just random examples, back when we started, 2017, 2018, remote work was like very looked down upon. 00:12:34.760 |
It was like, there's no way you could do that. 00:12:37.100 |
My co-founder and I, we just, you know, the best candidates we saw were coming in from all around the world. 00:12:41.580 |
And so we, when we had actually gotten an office in SF, we thought we were going to set up shop here. 00:12:46.940 |
Six months into paying this, you know, $5,000 a month office, we're like, what are we doing? 00:12:55.140 |
Pandemic hits, then the world's like, remote work, this is it. 00:13:00.360 |
And now we're kind of flipping back to previous, you need to have your own thinking, right? 00:13:04.020 |
Because if you just try and follow whatever, you know, the press or investors would ever say, it's going to be a nightmare. 00:13:11.120 |
You're going to be distracted by a whole bunch of decisions that fundamentally are not actually coming from your assessment of reality. 00:13:17.500 |
Another great one is the topic of tiny teams. 00:13:21.500 |
If you were raising money in 20, if you were a company in 2021, you had investors, they were screaming at you, ours were, you should raise more money. 00:13:31.700 |
And then if you waited 12 months in 2022, they would come back and they'd say, you need to lay off a whole bunch of people. 00:13:38.100 |
And, you know, and for us, we were like, we never were spending money. 00:13:43.100 |
So, you know, it's, you want to have these sort of bets that you make. 00:13:46.480 |
And I don't want to say it's, you don't want to be contrarian for contrarian's sake. 00:13:49.660 |
Some of this stuff that is repeated actually, you know, tends to be durable advice. 00:13:55.080 |
But I would just encourage you to just, like, think for yourself and don't just adopt a lot of the hive mind stuff. 00:13:59.580 |
Because, you know, it seems like the best companies tend to have independent decision making that really allows them to succeed. 00:14:05.380 |
So, of course, leading from the front is very important. 00:14:11.720 |
But what I'll say, in the first week of bulping online, it was, it was, it was pretty touch and go. 00:14:19.820 |
Because, again, the product was very, was very brittle. 00:14:22.620 |
And it became clear to me, like, if, if, if I don't, myself and the team don't get out and make ourselves visible to the community and, and engage with them, people are going to churn and they're going to go away. 00:14:35.520 |
And they're not going to, they're going to lose belief pretty quickly because we have so much work to do. 00:14:39.120 |
And so we started running a weekly office hour session where we let all users tune in on YouTube and X or whatever. 00:14:51.680 |
And so, you know, again, how do you smooth that sort of, you know, growth curve? 00:14:57.860 |
Because user love is hard to quantify on, on, you know, specifically. 00:15:04.740 |
And that's, that's how you can really scale, you know, love for a product like this. 00:15:08.420 |
Last thing I'll mention, you know, as far as like tools that we used, support is, is something that is now, like you can, there's a lot of AI tools that are coming out, right? 00:15:19.680 |
That help you scale, you know, all aspects of your business. 00:15:26.240 |
The first two months that we were online, I mentioned earlier, my chief of staff and I were the primary support people, spending a lot of our time doing emails. 00:15:33.780 |
We ended up picking up a tool called Parahelp. 00:15:36.520 |
I don't know if anyone's heard of those guys. 00:15:38.020 |
Like they are, our, the AI assistant called Sam from those guys is the top rated support assistant for us and takes out 90% of our tickets automatically, right? 00:15:50.260 |
A year ago, two years ago, we would have had to hire 50 people to go and scale to that, right? 00:15:58.200 |
The leverage that, that you can have by integrating AI and there's even custom things we're doing in our products, you know, training our own, you know, little models to help people be successful within the product experience. 00:16:10.000 |
There's a lot of things you can do by not just making an AI product, but also building around the entire customer success journey to be, you know, powered by that. 00:16:21.280 |
So I think they're, they're, we're one of their customers. 00:16:25.100 |
A couple of just brilliant, you know, young, young guys, I think out of Europe or something, running that company. 00:16:31.380 |
And I mentioned this before with like kind of leading from the front, but community. 00:16:39.980 |
Going and actually talking to users, like creating a space for users to try out your product and like learn from each other is so key. 00:16:48.640 |
And this has always kind of been the case, right? 00:16:51.440 |
But especially now, if you're building an AI product, it's really important that folks can like learn from each other and learn in a place where they can get help, you know, from pros, right? 00:17:01.660 |
And from the community themselves, because this is another way you can really scale the customer experience without having to add headcount within your company itself, right? 00:17:10.500 |
And so one of the kind of cool ways that we're doing this, I don't know if you've ever seen, we are throwing the world's largest hackathon right now, actually, for this entire month. 00:17:17.260 |
If you go to hackathon.dev, you can check it out. 00:17:19.260 |
We have passed the Guinness World Record, by the way. 00:17:23.140 |
We've got 80-something thousand people that are participating. 00:17:25.540 |
But basically, we've got this amazing event going on. 00:17:29.040 |
We have dozens of people coming to help provide support. 00:17:32.560 |
And, you know, folks are building out their projects, trying out the product. 00:17:36.420 |
And this has been just the most, the craziest ROI we've ever seen from a marketing initiative we've ever done, both due to the scale, 00:17:44.860 |
but also the thoughtfulness and, like, getting, augmenting it with both the AI support and the community support, etc. 00:17:53.060 |
So to kind of wrap up here, these are, you know, the main takeaways, if you wanted to, like, take a photo of, you know, the TLDR or whatever. 00:18:00.500 |
These are kind of the main things that, you know, stuck out to me from the past couple of months of our experience that really made a difference. 00:18:09.620 |
And, you know, again, like I said, it was very touch and go, right, for the first, especially the first two months, just how unexpected and unprepared we were, you know, for what happened. 00:18:20.260 |
Without this, these things, like, this would not have worked, and it wouldn't be working now, right? 00:18:25.560 |
And to boil that down, it's like, you know, you don't want to hire an army. 00:18:31.060 |
That's kind of the mentality that we look for when we hire people onto the team. 00:18:39.580 |
I have to, like, go to SFO, like, immediately after this. 00:18:42.660 |
But if anyone wants to chat about stuff or has questions, that's where I am on X, and then that's my email address there. 00:18:48.680 |
I think we have one minute for questions, actually, if anyone has a burning one. 00:18:53.580 |
There's a microphone up here if you want to come on up. 00:19:04.920 |
Like, did you have a framework for, you know, talking to users, or did you just ideate and, you know, ship product experiments and see what stuck? 00:19:13.100 |
Yeah, you're talking about, like, for, like, kind of, like, how we decided to build Bolt or even after. 00:19:16.980 |
So, yeah, we tried out, like, probably five different things last year. 00:19:21.120 |
And all of them, I mean, I think it's, you know, all the things I've ever built that really seem to stick with users and resonate always started with something that I, myself, thought was cool, you know, which sounds, like, very obvious. 00:19:34.880 |
But there's also, most of the things I've built in my career have been things that sounded good and, like, it's, like, hey, this should, like, maybe increase our ARR, but it, like, intrinsically wasn't something that I was, like, so, so, so stoked about that I couldn't sleep. 00:19:48.900 |
And Bolt was one of those things, you know, and then we certainly put it in front of users. 00:19:56.360 |
What the user feedback we got from the early Bolt sessions before we launched versus, like, launching stacklets was the exact same. 00:20:04.000 |
And the outcomes couldn't have been more different, right? 00:20:05.900 |
So, again, it's all about just, like, taking shots on goal because you just don't know until you actually get it out into the world. 00:20:11.580 |
You can certainly get the early feedback, but, you know, it's all about just getting it launched, getting it out there, and, you know, iterating as fast as you can. 00:20:33.120 |
Thanks so much, Eric, for walking through the amazing journey that has been Stacklets and now Bolt. 00:20:38.820 |
Next up, we're going to have Sid Bendray come to the stage to talk about Olive. 00:20:44.440 |
Sid is the co-founder of the company, and they're building a portfolio of consumer products, starting with products like Quizzard, which you may have heard of before. 00:20:54.080 |
One of their products reached number four on the AppSort's education charts in 2024 and number five in 2025 alongside other companies like Duolingo. 00:21:01.820 |
They're backed by Neo, and they're building the AI infrastructure to build a $1 billion portfolio of consumer software over the next decade. 00:21:42.280 |
So, Sid, please come up to the stage and see if you're going to be a part of the stage. 00:25:39.600 |
These companies are generating millions of ARR 00:25:42.600 |
with teams smaller than most startups' engineering departments. 00:25:56.600 |
We're building a family of iconic consumer software products 00:25:59.600 |
that we hope will enable people to live better, 00:26:06.600 |
of virally successful products to $6 million in ARR profitably 00:26:09.600 |
and have generated over half a billion views across social media, 00:26:12.600 |
achieving this with a tiny team of just four. 00:26:24.600 |
We launched it with a TikTok video that went viral overnight 00:26:27.600 |
and generated a million views that turned into 10,000 users in less than 30 hours. 00:26:32.600 |
We actually started scaling with no LLM costs. 00:26:35.600 |
This is because back then we had the initial Codex model launch, 00:26:40.600 |
Funny enough, we were cycling between 10 different accounts from our friends 00:26:44.600 |
just so that we could prompt engineer or generate these AI outputs. 00:26:48.600 |
Interestingly enough, Codex, even though it was meant as a coding model, 00:26:52.600 |
could be prompt engineered for any open domain conversation. 00:26:55.600 |
As you all may know, it ended up being sunset for abuse. 00:27:00.600 |
we ended up getting reached out directly by OpenAI 00:27:02.600 |
on a few of our different accounts that we were cycling through 00:27:05.600 |
as being one of the top model users for the Codex model at the time. 00:27:12.600 |
and then we moved to New York City the fall of 2023, 00:27:16.600 |
where we started our back-to-school campaign, 00:27:18.600 |
which was a series of man-on-the-street videos 00:27:21.600 |
across different prestigious colleges in the U.S. 00:27:24.600 |
This is when we paid our first million dollars in ARR 00:27:27.600 |
and also achieved profitability within the first nine months of operating. 00:27:30.600 |
We then even had another successful campaign in the spring of 2024 00:27:36.600 |
that got us all the way to number six in the charts of education 00:27:39.600 |
alongside giants like Duolingo, Photomath, and Goth. 00:27:44.600 |
We then took all our learnings in the spring of 2024 00:27:47.600 |
and doubled down on a new product on Stuck AI, 00:27:52.600 |
We were able to get to a million users in under nine weeks 00:27:55.600 |
and generated over a quarter billion views across socials in a month. 00:27:58.600 |
A few weeks ago, we were able to get both products in the top ten 00:28:03.600 |
Unstuck went all the way up to number three in the education charts, 00:28:08.600 |
We've now also launched in Stealth our third product, which is our first product outside the education domain. 00:28:14.600 |
It took three weeks to build thanks to all the blueprints that we've built in advance. 00:28:26.600 |
Our lean playbook boils down to three key pillars. 00:28:30.600 |
Operating principles that lay the foundation of leanness. 00:28:33.600 |
Organizational structure that set up the systems for this leanness. 00:28:37.600 |
And AI tooling augmentation, which optimizes scaling. 00:28:41.600 |
Let me start with operating principles, which I believe is the main bedrock for why we're so lean. 00:28:50.600 |
We only hire 10 extra generalists that have multiple complementary spikes in similar fields. 00:28:57.600 |
So, for example, our product engineers are full stack developers, great product thinkers, 00:29:01.600 |
and really good at fundamentals of computer networks, for example. 00:29:06.600 |
We have designers who can build and the likes. 00:29:09.600 |
We try to aim for people whose complementary spikes can shape and drive 10x outputs within the team. 00:29:15.600 |
The second key principle is profit-first mentality. 00:29:18.600 |
We are relentless about prioritizing profits because profit is power and profit is focus. 00:29:23.600 |
Profit gives us a clear mechanism to make all our decisions and guide a North Star for the company. 00:29:36.600 |
KPI alignment removes micromanagement bullshit because everyone is focused on moving their metric week over week. 00:29:42.600 |
This also means decisions must be validated against this KPI. 00:29:46.600 |
Our fourth principle is continuous process refinement. 00:29:51.600 |
For any repeating process, we always ask, how would we do this better? 00:30:01.600 |
We view failures in the company and issues in the company as systems failures, 00:30:04.600 |
which lets us set up a feedback loop for improving ourselves and improving the process that we use, 00:30:09.600 |
both on an operational standpoint but also a technical standpoint. 00:30:20.600 |
We believe in building compounding benefits by investing in technical playbooks and operational blueprints. 00:30:26.600 |
This allows us to compound our benefits or compound our learnings so that the benefits can be used across new products. 00:30:32.600 |
This is exactly how we were able to hit a million users and stuck within nine weeks, taking everything we learned over a year and a half on Quizzard. 00:30:42.600 |
For example, one of our super tools is LaunchDarkly. 00:30:44.600 |
The intended use case of LaunchDarkly is a feature management platform that helps software teams control and release features safely and quickly. 00:30:55.600 |
We use LaunchDarkly as a manual traffic load balancer. 00:30:58.600 |
Specifically, we put LaunchDarkly in between all our LLM calls so that we can reroute traffic to different LLM providers based on hitting rate limits, different strategic initiatives, or whatever. 00:31:08.600 |
It just gives us an on-the-fly mechanism for choosing where our traffic goes and allows us to split things within rate limits. 00:31:14.600 |
This was especially important in the early days when rate limits were really tight and also it was hard to get quotas increased on individual endpoints. 00:31:25.600 |
Specifically, I'm talking about Azure OpenAI. 00:31:29.600 |
The second extended use case is on-the-fly infrastructure changes. 00:31:32.600 |
For us, this looks like how on Unstuck, which takes in a lot of files to ingest, for specific file formats, we have a lot of waterfall ingestion processes. 00:31:42.600 |
What I mean by that is we depend on a lot of third-party services that can be reliable. 00:31:46.600 |
By using LaunchDarkly, we're able to change the prioritization of these processes on-the-fly so that if one of these third-party services goes down, we're able to reorganize the service on-the-fly to make sure it's up and running and available to our users worldwide. 00:32:00.600 |
The third extended use case is UI modifications and paywall experiments without having code pushes. 00:32:06.600 |
We have built an experimentation layer around LaunchDarkly, which allows us to run and spin up experiments without needing to make a code push. 00:32:16.600 |
The second pillar that guides our leanness is our organizational structure, especially in our engineering org, in the way we hire and that we organize our engineers. 00:32:27.600 |
For this, we look to Palantir, who has successfully scaled across multiple market segments. 00:32:32.600 |
We believe that we're building the consumer version of Palantir with our harvester and cultivator model. 00:32:38.600 |
For harvesters, these are product engineers similar to the Palantir Deltas of the four deployed software engineers that own and live and die by their products. 00:32:46.600 |
They're living in the metrics, working on A/B experiments, building features end-to-end, working with the marketing team, and effectively owning the entire product's existence. 00:32:55.600 |
Harvesters are people who build products that people actually want and pay for. 00:33:02.600 |
Cultivators are AI software engineers whose main goal is building the company's agentic operating system. 00:33:08.600 |
They're pioneering automation across different business units, including marketing, design, product, with the idea of expanding infrastructure that affects all the users everywhere and helps us win in every market. 00:33:18.600 |
Cultivators are creating the foundation that let us ship and scale faster in any market. 00:33:22.600 |
And finally, the last pillar is AI-driven and AI and tool augmentation. 00:33:30.600 |
One important note in thinking about this is when we think about hiring, we like to think of tool use as being something that will allow a 10Xer become a 100Xer, as opposed to the contrast, which is using tools to fill gaps and augment the shortcomings of someone who's not at the standard that we'd like to hire for. 00:33:49.600 |
With that being said, we use a slew of products for our day-to-day task automation for things like script writing, campaign analysis, operations, code generation, and communications. 00:33:59.600 |
Effectively, by paying for a bunch of services, we have augmented and enabled everyone to have their own chief of staff within the company. 00:34:10.600 |
One more thing is we believe heavily in compounding benefits. 00:34:15.600 |
With them changing so quick and with you having so many apps out there, do you ever struggle with going back through and changing the models that you've used for some of these apps? 00:34:29.600 |
I think a really cool thing is the fact that you can do that. 00:34:35.600 |
You can just build an app with an AI model, and then a better AI model comes out three months later. 00:34:39.600 |
And you can go and a lot of the time it's like a one-line change of like, let me update this model and the app just gets way better or it just unlocks new things. 00:34:46.600 |
And so that's something I do frequently where I'll go back and I'll even like relaunch an existing app with a new AI model or add a tiny feature to it. 00:34:54.600 |
And so, yeah, I think that's kind of the superpower of like building with AI is the fact that you can just kind of replace these AI models. 00:35:07.600 |
Thank you so much, Hassan, for walking through all of that. 00:35:17.600 |
So impressive that you do all of this on top of your day job. 00:35:22.600 |
So our next and final speaker for this portion of the Tiny Team session is Max Broder-Urbus from Gumloop. 00:35:30.600 |
Previously, he did competitive programming while at McGill and then also went through YC a little bit over a year ago. 00:35:37.600 |
Has achieved just incredible, incredible traction in such a short time. 00:35:42.600 |
Now scaling automation across companies like Instacart, Webflow and Shopify while still having less than 10 people on the team. 00:35:51.600 |
So without any further ado, please welcome Max up to the stage. 00:36:10.600 |
Anything I have to do in particular to make this? 00:36:24.600 |
anything I have to do in particular to make this the one hanging decoy wire yes okay sweet 00:36:48.600 |
okay there we go so this should preview in a second but yeah I'm Max I'm the founder of Gumloop we went 00:37:09.600 |
through YC year and a half ago now winter 24 we've been a pretty notoriously small team since then 00:37:17.220 |
okay the preview is not working so I'll just do it like this so we've been a pretty notoriously small 00:37:28.380 |
team since then we raised the series a as a team of two and are now nine people but this tweet was 00:37:36.600 |
kind of like the one that inspired this talk like how how we scale to the the size we hope to be with 00:37:42.940 |
fewer than 10 people I'll be honest I tweeted this when I was extremely caffeinated and really 00:37:47.620 |
thought I was gonna rule the world we're on on track roughly we're less than 10 people and growing 00:37:54.100 |
growing really fast but this was also a good Twitter post for hiring because we wanted to hire 00:37:59.260 |
exceptional people and I think working on a small team is really fun so I thought I would go over I'm 00:38:05.260 |
sure at this conference you've heard a lot about like what AI tools to use and how to work efficiently 00:38:09.760 |
with cursor and windsurf but I was gonna focus on how you actually like once you're efficient with these 00:38:14.440 |
AI tools how you build a team that's has the right culture and can actually scale and do the things you're 00:38:19.840 |
you're setting out to do but the first thing I was gonna go over was kind of how we got here so I 00:38:25.120 |
spent like six months building up a ton of terrible terrible software I made like video game moderation 00:38:31.000 |
software I made ML models to detect children's age and video games so that you could separate adults 00:38:37.780 |
from children in VR I made bot detection software and then as a side project on top of my side project I 00:38:44.680 |
made the first UI for auto GPT which was this like really hyped open source framework that came out 00:38:49.960 |
right at the start of the agent craze and basically I noticed that everyone in this discord was excited 00:38:56.740 |
to use AI but they had no idea how to actually clone a github repo or set things up locally so I just spun 00:39:02.180 |
up like a really ugly UI I called it agent hub at the time I thought was that it was gonna be github for agents 00:39:07.420 |
I thought this was really genius but it was all kind of built upon the idea that agents were gonna be 00:39:14.080 |
immediately useful so we pivoted pretty quickly after this but I noticed that all of the people who 00:39:19.600 |
were asking the agent to do things were basically just describing complex workflows like if they knew 00:39:24.400 |
how to write some Python then you know how to make some API calls and some LLM queries they could 00:39:28.660 |
basically automate their entire request they don't need to like cross their fingers and hope that the 00:39:32.920 |
agent will do it for them so yeah that was the realization it was my co-founder and I at this time we just 00:39:38.200 |
started kind of editing how you could configure an agent instead of asking for everything that 00:39:43.720 |
you wanted you could actually define the steps as a series of like nodes in a workflow and then we 00:39:50.200 |
got into YC a few months later we raised the series A we hired two interns for the summer and then we 00:39:55.720 |
raised the series or we yeah we raised a seed then we raised the series A about like four months later and 00:40:01.240 |
we were just a really small team kind of overfunded but raised a lot of money so that we could hire the 00:40:06.280 |
most exceptional people over the next year and the general idea was just scale with under 10 people 00:40:11.960 |
because we noticed after working Amazon and Microsoft that working on a super small team is really fun you 00:40:16.840 |
can just move way faster not sit in meetings all the time so now gum loop is this it used to be way 00:40:23.240 |
uglier but it's this workflow automation tool that a bunch of really large companies are using the biggest 00:40:30.200 |
customers are like Instacart Shopify rolled this out to the entire company last week which broke most of 00:40:36.440 |
our things but it's all back online now and yeah and all of this is 100 100% PLG so we're not doing any 00:40:43.640 |
outbound sales I think that's one thing that helps us scale really quickly if people find your product 00:40:47.480 |
and come inbound you don't have to hire 10 sales reps so there's definitely a lot of luck and kind of 00:40:54.360 |
coincidence in in this like small team approach that works for us because we happen to be a PLG company 00:40:59.000 |
probably wouldn't be as possible if we were doing like a top-down sales motion 00:41:01.800 |
so I thought I could go over how we approach hiring internal operations and then team culture these are like 00:41:10.760 |
things that we talk a lot about internally my co-founder and I I did want to put a disclaimer 00:41:16.200 |
here I don't actually know what I'm talking about I I'm trying to figure out if we're just getting lucky 00:41:20.920 |
over and over or if like our approaches are actually working but take everything I say with a grain of 00:41:24.840 |
salt because it could be totally off base and it might ruin your company if you do what I do 00:41:29.560 |
so the three things that we try to do internally when we approach hiring are be super super picky which is 00:41:37.560 |
painful most of the time product-led hiring buzzword that we've been trying to coin and then making time 00:41:45.000 |
to work together which I'll explain in a second but this is a screenshot from the the co-founder of 00:41:50.920 |
instacart who ended up investing in our company and we would ask him for advice because he scaled a large 00:41:54.920 |
company before running candidates by him and and one time I asked him like I sent him a candidate that 00:41:59.960 |
I thought was pretty good this was his only reply he tends to write very short emails but emphasizing 00:42:06.520 |
that you shouldn't lower the bar like if you aren't extremely excited about someone like if it's not 00:42:10.520 |
a no-brainer you shouldn't even consider hiring them so we've done like hundreds of interviews and tons 00:42:16.200 |
of work trials which I'll explain in a second but if you're going to be a super small team every person 00:42:20.360 |
needs to be absolutely exceptional which oftentimes makes like investors of yours like confused because 00:42:26.440 |
you're still such a small team and they gave you so much money to scale but you have to kind of be 00:42:30.760 |
really thorough with your screening and then also really confident in every single person you hire 00:42:36.200 |
we we've been trying to coin this term of product-led hiring so two of our customers ended up quitting their 00:42:42.120 |
jobs to join the team and that was like the one of the easiest decisions we've made in terms of hiring 00:42:47.320 |
because they already loved the product they had a ton of insight into how it could be used in a 00:42:51.480 |
business so like our customer from instacart the one who originally found us and brought us into the 00:42:55.160 |
company he ended up quitting and joining us and now he does a lot of our like enterprise relationships 00:43:00.200 |
and working with our larger customers and then this screenshot is our head of education and community he 00:43:05.000 |
was at webflow before but had a zapier course and a ton of automation workshops that he was selling 00:43:11.320 |
and then found gumloop and got super excited so that was a no-brainer but i think if you can focus on 00:43:15.720 |
making a really great product that obviously happens to be accessible to people who you want to hire 00:43:21.160 |
there's a bit of luck involved there but it helps with the hiring process because they know exactly 00:43:25.640 |
what you do you don't have to like inspire them to join the team they they want to join on their own 00:43:30.040 |
and then making time to work together so i think this is only hopefully this video plays 00:43:36.440 |
yeah okay this is only really possible if you have a really small team but we do this thing where we 00:43:44.200 |
uh rent airbnbs and we just go hack together for like four days at a time we we make like three 00:43:49.080 |
weeks of progress in a couple days but um the two people sitting on the left there are actually 00:43:53.400 |
work trials they were like interviewing at the time but we brought them with us to yosemite to just 00:43:58.120 |
pack and i think doing this really intentional sort of working together period is the only way you'll 00:44:03.640 |
actually know if you want to work with someone so we always bring people into work trials they 00:44:08.120 |
are on the team for several days as if they already joined the company and then by the end we're like 00:44:12.840 |
totally confident whether this is the right fit or not and we've done way too many of these honestly 00:44:18.120 |
but it's helped us make sure that everyone on the team is exceptional 00:44:21.560 |
another thing we try to do in terms of operations i mean there's three things here 00:44:28.200 |
we have almost no meetings uh purposefully so i try to just let people build like i hired great people 00:44:34.200 |
so my plan is to give them the space to build which is easier said than done and then we automate 00:44:38.840 |
everything internally which is kind of a gum loop self plug but yeah in terms of our calendars like 00:44:44.280 |
my calendar is always insane because if we're talking to customers and or i'm talking to customers and i 00:44:48.680 |
flew back from new york this morning for example because i was working with customers in person but 00:44:52.760 |
everyone else's calendar should ideally be totally blank um we try to just give everyone deep focus 00:44:58.040 |
time if you're an engineer and uh we hired you to build exceptional products like we should let you 00:45:04.040 |
do that not make you talk about building exceptional products for five hours every day i think that's 00:45:08.280 |
only possible if you have a really small team because normally you'll have like five person on five 00:45:12.520 |
people on a project you'll have to sync and kind of agree on the terms before you even start working 00:45:17.160 |
and that just leads to kind of slowness everywhere so um also letting people build so uh i used to 00:45:27.800 |
be really involved in every aspect of like every feature we shipped but now that we've hired exceptional 00:45:32.280 |
people who are all better than i am at basically basically everything uh all i do is kind of like 00:45:37.480 |
inspire or i try to inspire what the features we should build are so i'll make these like really stupid 00:45:41.880 |
descriptions of the features that i think we should build based on talking to customers 00:45:46.520 |
and then i just let people do their thing uh so like our design engineers will let's see if this 00:45:51.560 |
works so from that sketch of me being like what if we okay hopefully this works what if we had mcp 00:45:57.560 |
nodes what if you could automate workflows with mcp um that was just like the high level prompt and 00:46:03.160 |
then i let our team like cook basically this video is exceptional i wish it was playing but uh basically 00:46:09.480 |
they built like a better product than i would have ever imagined um so that that's kind of like only 00:46:14.680 |
possible if you hire great people but once you do you can really just take a back seat and give them 00:46:19.160 |
the space to be exceptional and then automate everything you can so this is our internal gumloop 00:46:25.800 |
instance we we automate basically every part of the business as much as we can and if there's something 00:46:31.080 |
we can't automate then we build features on gumloop to let us automate it so like before every meeting we 00:46:36.200 |
have like a deep research report that tells us everything we need to know about the customer not just their 00:46:40.200 |
outward facing information but also like how they're currently using our product uh are they a power 00:46:44.360 |
user or not what features are they using so we're like totally informed going into the meeting um we 00:46:49.400 |
have every time someone interesting signs up we get notified uh why what they're doing on the platform 00:46:55.000 |
and also like an email drafted into my inbox so i can reach out to them uh hop on a call and like talk 00:46:59.800 |
about why they they they made that free account that's led to a ton of our growth um we have an ai chat bot on the 00:47:06.280 |
platform for example that gets like 50 000 messages a day but we have a gumloop workflow that reads the 00:47:11.080 |
chats with the chat bot so that it can tell us what people are confused about and then we use that to 00:47:14.760 |
inform our product decisions so a lot of these little tasks in the company would have been someone's role 00:47:20.440 |
or taking up like three or four hours of their day but now we we use our own product to automate everything 00:47:26.120 |
so also a lot of luck involved you can be a small team if you are an automation company but 00:47:31.720 |
if you use gumloop maybe you guys could be more efficient that's the plug all right um so culture 00:47:38.600 |
wise i think this is the most important thing it's impossible to to talk about having a really 00:47:42.440 |
exceptional team uh if no one's having a good time or um they're quitting so uh we i mean one of the most 00:47:52.360 |
annoying things i say uh at like basically every day when we talk about a feature that a customer is asking 00:47:57.000 |
for is like what if we built it today um like what would that look like and then it's kind of caught 00:48:01.400 |
on and now everyone on the team i mean first of all they're except i've said that like 10 times but 00:48:04.920 |
they're exceptional and they're really fast building engineers so we often just challenge ourselves like 00:48:09.320 |
what if we put on a timer for 45 minutes and try to ship this feature um right now with cursor 00:48:13.720 |
but this can lead to crazy burnout like if you're always asking what if we did it today on a friday night 00:48:19.560 |
at 8 pm then people are gonna have a bad time so you have to be really intentional about making it fun 00:48:25.080 |
um like i mentioned we do these these retreats but we're going like we're picking a cool place that 00:48:32.280 |
i wish my like boss would have taken me when i was working at a company before this and then 00:48:38.200 |
we get a bunch of food and do a bunch of fun things like we go rock climbing and biking and 00:48:42.040 |
um it kind of offsets the intensity of building uh with such a kind of like crazy timeline for every 00:48:49.800 |
feature i don't think like anyone would be having fun if we didn't have these like really exciting 00:48:54.760 |
times to look forward to i also think this is only possible you can't fit 50 people in an airbnb but 00:48:59.400 |
you can fit 10 pretty comfortably um and then being really intentional about your company culture is 00:49:06.440 |
another thing that i'm pretty adamant about this is our company handbook it's like a month or two out 00:49:12.280 |
of date but um basically everything that we say internally we just put it on a page so that we have to 00:49:17.160 |
live up to it um we wanted to kind of hold ourselves accountable for all of the the ways we talk about 00:49:23.480 |
building a company uh and this is also like one of the the things that convinces most of the exceptional 00:49:29.320 |
people on our team to join or to to book that initial call because they read our outward facing 00:49:33.880 |
handbook and they know that like what we're about before they even meet us 00:49:37.080 |
um and i'm kind of at the end of uh i was going to show the video but cut it a bit short 00:49:43.240 |
we are hiring a founding head of growth so if you know anyone you can email me there 00:49:49.800 |
like i mentioned it's a fun time uh pretty intense but hopefully you know someone or you want to join 00:50:12.120 |
uh big fan of the product i think it's really really awesome i've been using it and pitching it 00:50:19.560 |
internally in my company so i'm a huge fan i'm curious how far you think you'll be able to get with 10 people 00:50:25.640 |
like are you still staying true to that and how you think about scaling out to like a billion 00:50:30.360 |
users around the world uh with 10 people yeah i don't think it's possible to scale that big with 10 00:50:36.200 |
people um maybe 15 or 20 but uh i wanted to like set the bar really rigidly and then if i go a little 00:50:44.360 |
over it's no big deal but at least we're not scaling to like 100 people and having eight hours of meetings 00:50:49.160 |
what is um like your vision for the org structure when you do hit one billion with 15 20 people 00:51:01.000 |
what's the it's been changing a ton so at first i was really naive still super naive but i thought 00:51:07.240 |
like we could do it with only engineers because i was like engineers can do anything they can learn how 00:51:11.080 |
how to do marketing or sales or whatever i was totally wrong so we're now five engineers and four 00:51:19.880 |
semi-technical people um i don't exactly know what the work structure will look like but we're starting 00:51:25.880 |
to feel that like our only bottleneck now is like growth marketing like how do we share all of these cool 00:51:30.840 |
features we're doing we're building for people with the world and then also like we're getting hundreds 00:51:34.840 |
requests for features every day so another engineer would would definitely help so yeah just to touch 00:51:39.800 |
on that when you're looking for uh the growth the head of growth yeah are you looking for someone who 00:51:44.360 |
like is also sharing the like oh i can do this all myself with ai tools or looking for someone who 00:51:50.360 |
is looking to grow a team i think definitely not the latter so i call them uh like a doer versus 00:51:57.000 |
a to doer that sometimes you'll talk to someone about like joining the company and they're like i'm 00:52:00.840 |
really great at building out a team like that's the biggest red flag um i think they'd be great at 00:52:06.280 |
like listing all the things that a team needs to do but don't hire that person if you want to stay 00:52:10.120 |
super small we're looking for someone who's like i can just make it happen and then once they hit their 00:52:13.880 |
ceiling and they're like i actually can't like scale further than this then that's the time to hire but 00:52:17.960 |
i wouldn't hire someone who's going in with the intention of hiring more people 00:52:24.760 |
there's something you clarify on letting people build is that individual developers engineers or 00:52:31.480 |
like as a team collaboratively and how do you prevent like um fractures in like the code base and like 00:52:38.200 |
having it like disjointed yeah i think it's only possible to just let people do their own thing if 00:52:43.080 |
they're like really trustworthy like you hired people that you can depend on um sometimes it goes like wonky 00:52:49.480 |
like we don't we don't have the same understanding of what's being built but then we just like sync over like a 00:52:54.120 |
five minute chat and we're back on the same page but um yeah generally like you people know the 00:52:59.160 |
direction because you're talking and you're in the same office all day every day they just like talk 00:53:02.440 |
to a customer and they realize that is a pain point for someone so they just go ahead and ship it you 00:53:06.760 |
don't have to like get in their way and like make a spec doc and figure out exactly how this is going to 00:53:10.280 |
work you should just trust them to build can we do one over here yeah sorry oh what's up um how do you 00:53:20.360 |
think about compensation as well as just like like how you like are do you look at these uh 10 engineer 00:53:26.360 |
or 10 employees as like normal employees or do you consider them more like founders what are your 00:53:30.920 |
expectations for them versus how like a traditional startup might have expectations and how do you 00:53:35.800 |
think about compensation as well we try to compensate really competitively um because we raised like 20 00:53:41.880 |
million and we're such a small team that like we're in a position to do that and that was also like the main 00:53:46.360 |
reason we raised so we can compensate people and make their life comfortable while they're building 00:53:50.040 |
the future um we don't consider them founders i wouldn't like put that burden on someone like i'm 00:53:55.400 |
the one who's waking up at 6 a.m like sweating because i had a nightmare about like our like back 00:54:00.120 |
end crashing like i don't think they should be uh doing that but um we do treat them as like just members 00:54:06.600 |
of the team like everything that we ship is a discussion there's no like top-down order that we need to do x y 00:54:11.000 |
or z uh it's just like a kind of like flatland collaboration on like what we're going to build 00:54:16.600 |
and when and how cool thanks yeah hi do you think the sort of culture can translate to um say you might 00:54:25.640 |
be already doing this but say when you start getting into workflows that are highly complex in enterprise 00:54:29.960 |
right so banking regulation or parts of legal where information is just in the heads of super experienced 00:54:36.920 |
people and i feel in those at least my experience has been in those instances you need deeply non-technical 00:54:42.360 |
people and technical people to work together and the scaling sort of breaks down but have you found 00:54:47.240 |
ways around that or i'm just curious on your advice for people in this yeah um i think i understand the 00:54:53.720 |
question like how do we support really complex workflows if we don't have the nuance of like how to do 00:54:57.960 |
that we try to just build the tools to let the person who understands the workflow do it so 00:55:02.520 |
like at shopify if we're working with like their head of legal or something and they understand what 00:55:06.360 |
contract review looks like at scale for hundreds of contracts a day we make it really easy for them 00:55:10.680 |
to use the software that lets them build their own tool instead of like trying to learn how to do their 00:55:15.720 |
job better than they do yeah i think one more question okay um hey max i just had a quick question so 00:55:25.000 |
with the uh work retreats that you do um is uh like at what point in the interview process do they go on 00:55:31.400 |
the work retreats the guys that you're interviewing and then do you offer to pay them and if so are they 00:55:36.040 |
like 1099 or how does that work yeah so we we always do like a screen with me i talk to someone for like an 00:55:41.080 |
hour and figure out if we are like could be friends basically then we do a technical interview which is 00:55:45.640 |
super practical no like leak code stuff it's just working in the code base and then we do the work 00:55:49.560 |
trial if it's around the time when there's a work retreat coming up i'll just like delay the work 00:55:53.320 |
trial up until they can just come with us and we hire them as contractors basically so um they're 00:55:58.360 |
getting paid for their time we wouldn't want to make someone work for free and uh we just try to 00:56:02.280 |
like coordinate with their schedule whenever they're free okay thank you yeah sweet thanks everyone 00:56:08.680 |
all right thank you so much folks that wraps up this part of the tiny team session uh we'll be back 00:56:18.440 |
here at 2 p.m with some more speakers but thank you to all of our speakers for running through 00:56:23.000 |
everything and enjoy the rest of the conference if i don't catch you back here in a few 00:56:37.960 |
um thank you thank you thank you um thank you thank you um thank you um thank you um thank you um 02:31:30.900 |
adjustments as fast as needed. The last thing I'll talk about is scaling, and it's maybe a little bit 02:31:38.740 |
counterintuitive. You might think like a small team, why would you invest in things like brand 02:31:43.780 |
and culture? I say brand and culture because for me, brand and culture, they're two sides of the 02:31:49.780 |
same coin. Brand is ultimately a reflection of your culture. Your culture is your values as a 02:31:55.840 |
company, and you really want those two to go hand in hand. Culture, I mean, this piece of it is a 02:32:01.960 |
little bit more obvious, but when you're a small team, what ends up becoming super important is like 02:32:06.000 |
every new team member you bring on, you have to believe that they share your same values, that they 02:32:10.600 |
operate the same way, because you can't afford that not to be the case. A bigger company, it's much more 02:32:15.580 |
diluted. You might be able to bring on a bad hire. It's not going to be pervasive and spread. Smaller 02:32:20.320 |
teams, that cannot be the case, and so you need to invest heavily in this from day one. We have a 02:32:25.180 |
living culture deck that we've maintained basically since the beginning, and we rewrite it all the 02:32:29.720 |
time. We look up at the makeup of the team. We kind of like really try to encapsulate everybody's core 02:32:35.140 |
values and the way they behave, and then we share that back out to the team. We onboard new employees 02:32:39.940 |
with the same culture deck. It's an ongoing evergreen sort of exercise that we go through, and I think 02:32:45.840 |
what comes out of this is like this feeling that this tiny team can have this feeling of being a small 02:32:51.120 |
tribe, and that tribe is something that's pretty magical. It allows you to have this feeling of 02:32:55.660 |
continuity. It allows you to have this like feeling that you are in it together, and if you have that 02:33:01.360 |
continuity, there's just so much like it's hard to even quantify that value because you're not having to 02:33:06.200 |
retrain people, re-onboard people. Like people just get it. There's that tribal knowledge, and I do think 02:33:10.160 |
there's a lot of magic that happens. That translates into just, in my mind, higher productivity, transparency, 02:33:16.300 |
shared context amongst all things. We have in our team, and it's easier to do this when you're small, 02:33:21.820 |
is we have like three standing all-company all-hands meetings. The very beginning of the 02:33:26.360 |
week, we start with like going deep on metrics. We talk about, we have this thing called the wall of work, 02:33:31.360 |
where everybody's showing like what everyone else is working on. Wednesdays and Fridays, we do company-wide 02:33:36.360 |
show-and-tell. So this is a chance for people to also dog food our own product, use Gamma, present, share what 02:33:42.360 |
they're working on. It could be a small project. It could be a feature they ship, and this continuity allows 02:33:46.360 |
everyone to feel like we're still in a small room sharing this big, ambitious, long-term vision, 02:33:52.900 |
and do it together. I know there's a lot of talk of like, oh, maybe there'll be the one-billion, 02:33:57.700 |
one-person startup, and I don't know. Maybe that will happen, but my thought is like why? It's so fun to 02:34:04.900 |
build with a team. Like why do it alone? We're having a ton of fun building as a small team, and part of that 02:34:10.120 |
is like we really want to preserve that magic for as long as humanly possible. So this talk started 02:34:16.660 |
with me talking about how the Gamma journey began, which is me thinking about, hey, from a product 02:34:21.360 |
perspective, you know, there's got to be a better way, and my, you know, I guess challenge to you all is, 02:34:27.280 |
as you think about building your own teams, really thinking about, hey, you know, there's the old 02:34:31.360 |
playbook, the old way of scaling and building out a team, and that's totally fine, but is there today 02:34:37.660 |
a better way, and hopefully you guys can find your own path, and hopefully share back, 02:34:41.200 |
and we can all, you know, do this together. I guess we have a few minutes for questions if anybody has 02:34:48.280 |
any. With AI moving so fast, if you could go back, what would you do differently about building your 02:34:56.800 |
current team now? Yeah, that's a great question. So the question was, with AI moving so fast, what 02:35:02.380 |
would I have done differently? We actually started, you know, four years ago, so this was before, like, 02:35:07.300 |
the more recent, you know, wave, and so I do think, you know, when you're early on, whether you're using 02:35:13.420 |
AI or not, you're going to probably spend some time in the idea maze. You're really trying to navigate, 02:35:17.880 |
figuring out where is their true user need, and what problems are we solving, and I do think there, 02:35:23.500 |
the temptation today is the move super fast. AI can do everything for you, so you just jump onto the 02:35:28.720 |
thing and start building. I still think people can afford to go be much more patient, and I think even for 02:35:34.120 |
us, like, when we initially started doing our first AI launch was two years ago, I almost wish, like, in 02:35:38.800 |
hindsight, we could have, like, really just taken our time to appreciate how much things are changing 02:35:42.700 |
and evolving before going to, like, full steam ahead, like, let's just build, build, build. Because part of 02:35:47.200 |
that, I think, realization that we did have starting to build is that, hey, because things are moving so 02:35:53.080 |
fast, like, are there infrastructure decisions we should be thinking about earlier, much earlier on, 02:35:57.820 |
before things become too late. You get to a scale where it's impossible to unwind, and I think it's 02:36:02.500 |
helpful to think a little bit more about that way earlier on in the process. It doesn't mean you 02:36:07.120 |
should slow down, it just means you should be thoughtful of it. 02:36:09.500 |
It's not something I would have done differently. I think I would have prioritized maybe more effort 02:36:17.300 |
around, even more so, is we have a lot of infrastructure built around experimentation, and I think it's obvious 02:36:22.980 |
now, like, given all the different tooling, like, you know, especially if you have a big user base, 02:36:27.020 |
experimentation's a key to velocity. And, you know, we did do some of that pretty early on, but it was more of a 02:36:32.360 |
sort of gradual. I think we would have, you know, really taken our time to think about, 02:36:35.860 |
"Okay, what should we do?" and, like, put more weight behind it. If it would have changed anything, 02:36:40.760 |
I'm not sure, but I think that's one thing, you know, I would have kept in mind. You got two, go here and then here. 02:36:46.700 |
You might already be there. At some point, you probably will have to bring in people, whether 02:36:51.840 |
they're, like, communication experts or legal experts, that maybe don't gel quite as much with, 02:36:57.060 |
maybe, like, the technical or engineering-led culture you might have. Do you have any advice for, like, 02:37:01.520 |
how to make, how to not, like, ruin some of that culture, but also make sure that they don't feel 02:37:06.480 |
completely excluded? Yeah, the way we've been trying to do it is for the founders or other leaders to 02:37:10.720 |
try to do the job first. So, you know, the question is outside of engineering, basically, how do you, 02:37:15.440 |
you know, potentially not mess things up by growing too fast? And, you know, we're still learning there. 02:37:19.760 |
Oftentimes, a lot of the jobs, for me, for instance, a lot of marketing, sales, customer experience, 02:37:25.040 |
was all done by me first. So, I have some sort of baseline understanding because, you know, I, as in a previous life, 02:37:30.480 |
I've never hired for those functions. So, how do I even know what good looks like? I try to do the job 02:37:34.560 |
myself, oftentimes not a great job at it, but understand all the nuances that take, that really 02:37:39.680 |
goes into that job, know what great looks like, and then go out and finally hire that person. 02:37:44.400 |
We're going back to the player coach. We still go out and find player coaches for that role, 02:37:48.640 |
so that it doesn't end up becoming this sort of cascading effect of, like, really big and bloated teams. 02:37:53.120 |
So, some of the player coach stuff sounds like you're hiring a lot of high agency people. 02:37:58.640 |
How do you judge high agency when you're hiring people? That does not necessarily come from their 02:38:03.280 |
resumes. What kinds of questions do you ask? What kinds of processes do you follow during hiring to 02:38:07.840 |
judge for high agency? Yeah, totally. It's probably stuff that you have heard before, but a lot of times, 02:38:13.440 |
you know, you want to, if someone has prior work experience, you dig into their most challenging 02:38:18.640 |
project or problem they had to encounter, and you ask them, you know, basically how they solved it. 02:38:24.480 |
What you'll find is people that have high agency or just a sense of ownership in general, they don't 02:38:29.040 |
immediately jump to what the solution was. They'll talk about how they try to understand the problem, 02:38:33.600 |
and then how the problem, what they understood at the surface level is actually five levels too high, 02:38:38.080 |
you have to keep on drilling. And if they can articulate what the true problem was, like keep 02:38:41.920 |
on going down, and then not only talk about what the solution was, but all the attempts at the solution, 02:38:47.360 |
I think that goes to show that someone wasn't just like taking orders and like, "Hey, I'm going to do this." 02:38:50.960 |
It was like, "I need to fund one, understand the layers of the problem, and then two, navigate and actually 02:38:56.800 |
explore." Most people, when you start asking them like the second order or third order whys, 02:39:01.440 |
they can't get there, and if they can't, then it's pretty clear that they probably weren't doing much 02:39:04.640 |
of the thinking themselves. Hey, thanks for the comments. So hiring is probably one of the most 02:39:10.960 |
important things that a company can do, right? I mean, it's either for better or worse. What are some, 02:39:16.640 |
if there were any major failures that you have experienced that you could share with us, 02:39:22.240 |
that would be very helpful. Yeah, the biggest failures were actually when we didn't, 02:39:28.000 |
when there was a role that there was some ambiguity, and we weren't able to do a work trial. So work 02:39:34.080 |
trial is also something I'm going to talk about, something we deploy where people actually do the 02:39:38.320 |
job for a certain amount of time. Much easier if they're obviously not currently working, and we've 02:39:42.560 |
found great success when someone's in between or has been doing fractional work, we bring them to do 02:39:46.720 |
the job first, and we do that for a few months, where we had some roles where we weren't yet sure what 02:39:51.520 |
we're looking for, and we brought them on, and they didn't do a work trial, they just went straight 02:39:55.680 |
in. It oftentimes wasn't a good fit, because neither them or us knew kind of like, okay, what were we 02:40:00.720 |
actually, what was going to be that sort of good fit. So if you can, if you're lucky enough to be able to do 02:40:05.120 |
a work trial, whether it's two days or three months, or in our case, we default to three months, I would 02:40:09.520 |
encourage you to try to do that, especially if it's a role you haven't done yourself. 02:40:12.320 |
The work trials have actually all worked out, which is great, and a few data points, and we've done 02:40:20.960 |
five plus of them, and then, yeah, in the cases where we didn't, it's actually pretty high, again, 02:40:27.440 |
going back to the role that we weren't certain about what we're hiring for, it's actually a pretty high 02:40:30.400 |
failure rate for us. Is that it? All right. Thank you, everyone. I'm on LinkedIn if anyone wants to connect. 02:40:39.360 |
Thank you so much, Grant, for the insight. Next up, we have Vic Paruchuri from Datalab, 02:40:50.160 |
and they're training custom models for document intelligence, including OCR and unstructured data 02:40:55.200 |
processing with popular repos like Marker. They scaled 5x in the past year, up to seven-figure ARR, 02:41:02.640 |
including folks like Tier 1, Tier 1 AI Labs, and they're going to walk through their approach to 02:41:08.400 |
building these super popular repos, scaling revenue, and training models with a tiny team. So, welcome to the stage, Vic. 02:41:23.440 |
I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and then, I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one, and I'll see you in the next one. 02:42:48.980 |
I'm the CEO of Datalab, and today I'm going to talk about how we got to 40K GitHub stars, seven-figure ARR, and trained state-of-the-art models with a team of three. 02:42:57.700 |
So, I spent the last year training these models, like Brittany mentioned, Marker, and Surya. 02:43:06.280 |
I left my AI research job, and I started a company and raised a seed round. 02:43:23.340 |
We're at seven-figure ARR, and our customers include tier one AI labs, universities, Fortune 500, and AI startups, including Gamma, who I used to make this presentation. 02:43:35.700 |
I'm going to talk about how we've grown with a small team. 02:43:38.360 |
I'm going to talk about my philosophy on building teams and why I think we're at kind of an inflection point in how we think about building teams. 02:43:45.760 |
And I'm really going to talk about this idea that headcount does not equal productivity. 02:43:50.040 |
There's, like, this really persistent notion in Silicon Valley that you raise money, you hire a bunch of people, and you build more. 02:43:55.560 |
But it almost never, in my opinion, works out perfectly that way. 02:44:03.020 |
I'm very fond of the data prefix, apparently. 02:44:05.120 |
And we scaled to 30 people and 4 million ARR bootstrapped during COVID. 02:44:10.920 |
And then, unfortunately, we had to do two rounds of layoffs post-COVID when online education kind of tanked. 02:44:17.140 |
We went from 30 to 15, and then again from 15 to 7. 02:44:20.420 |
And it was obviously awful for the people we had to lay off, but I noticed something really interesting. 02:44:25.380 |
Productivity and happiness increased a couple of months after both layoffs to the point where we were actually much more productive after both cycles than we were at the beginning. 02:44:35.980 |
Like, how could reducing the team so much actually improve productivity? 02:44:45.500 |
So, as you scale, like Grant mentioned in the earlier talk, you end up building these very specialized functions in teams. 02:44:51.560 |
And those specialists often can't flex across the company to solve the key issues of the company. 02:44:56.440 |
Two, we were a remote team, which required a lot of intentional process and heavy syncing, which just eats into your time and just makes it really hard to get on the same page. 02:45:06.540 |
Because of that, we had a lot of meeting overload. 02:45:08.880 |
And especially once we got middle management in place, people whose job is kind of professionally to manage, we ended up with just a lot of meetings on people's calendars and not enough time to actually work. 02:45:19.180 |
And then senior people, we hired kind of a mix of experience, like most companies do. 02:45:25.160 |
And then senior people ended up getting kind of tied down in doing a lot of work to manage the more junior people. 02:45:31.820 |
We actually had a case where we had a three-person team, and we cut it down to one, and the team actually got much more productive because it freed up the senior person's time. 02:45:41.340 |
And kind of every company, I feel like, goes through this journey. 02:45:44.580 |
There's this initial golden period when everyone is aligned, you're on the same page, you're building this amazing stuff, and that's really when you build the core thing of your company. 02:45:53.540 |
Like Google with Search or Microsoft with Windows. 02:45:57.440 |
It's kind of when you figure out your business model. 02:45:59.780 |
And then you hire a bunch to fill out the edges around it. 02:46:02.040 |
Like you hire a bunch of enterprise sales, you hire a bunch of marketing, you hire a bunch of engineers who are kind of in very small boxes to build very small features. 02:46:09.740 |
I had a friend at Amazon who worked there for two years and built a shopping cart button. 02:46:15.920 |
But, I mean, at that scale of org, that's kind of the tiny box you get fit in. 02:46:19.720 |
And you end up with a lot of bureaucracy, a lot of sinks, a lot of unclear priorities. 02:46:23.380 |
And this pattern is unfortunately very common. 02:46:26.420 |
But I started to think, what if that golden period just lasted forever? 02:46:31.920 |
And as I started working with Jeremy Howard at Answer.ai, I got to understand his philosophy for building a company a little bit better. 02:46:40.440 |
And his idea is basically hire less than 15 generalists. 02:46:44.820 |
So people who can really do everything across the stack and really understand all aspects of the company, fill in the edges with AI and internal tooling. 02:46:52.420 |
So Jeremy's invested a lot recently in FastHTML and things like MonsterUI because he sees them as kind of building block libraries to really build out the other tools that the company's working on. 02:47:06.140 |
You don't need a Kubernetes cluster when you're a three-person company. 02:47:10.840 |
But this unfortunately requires kind of a high cultural bar for folks. 02:47:15.320 |
You need people who really want to and can understand everything you're doing. 02:47:22.080 |
You need go-to-market people who actually build. 02:47:29.140 |
So basically, you need people who are in it because they're building something together and not in it for other reasons like politics or personal advancement, et cetera. 02:47:38.020 |
And everyone needs to really care about the customers and focus on them. 02:47:42.140 |
I think these are the prerequisites for this kind of team working, this less-than-15-person team of generalists. 02:47:53.320 |
We recently shipped it but have not announced it yet. 02:47:57.620 |
It supports 90 languages and 99% accuracy on our challenging internal benchmarks that include math. 02:48:03.840 |
And it also does some features that no other model does, like character-level bounding boxes. 02:48:07.900 |
It uses PDF text as grounding at a line level. 02:48:13.160 |
And in order to do it, Tharun, who's a research engineer at Datalab, and I had to handle the entire process from end to end. 02:48:19.800 |
So that included talking to customers, figuring out what they wanted. 02:48:22.980 |
It included reading a bunch of papers and figuring out the right architecture, prototyping, doing the model training itself, 02:48:29.420 |
which you always hope is 90% architecture but is always 90% data cleaning. 02:48:33.900 |
So building a data pipeline library, building out the data sets, then we had to write the inference code. 02:48:38.760 |
So we had to connect it to our repos, get the inference written for all our customers, and then integrate it into our products. 02:48:44.620 |
So this is a scope that in a big company, you'd probably have four, ten, you'd have a lot of teams doing this. 02:48:50.320 |
And every time you hand off between teams in a traditional company, you lose context, right? 02:48:55.360 |
The people who talk to the customers lossily communicate it to the people who build, 02:48:59.980 |
who lossily communicate it to the people who train the model. 02:49:05.020 |
You end up eating a lot of time in just syncing context. 02:49:09.280 |
You're not able to build a great end-to-end experience as a result. 02:49:12.180 |
And you have very slow feedback loops, right? 02:49:14.060 |
Like you talk to a customer today, and it might impact your model training in months. 02:49:17.720 |
Whereas if you have generalists who can work across the stack, you get seamless context, right? 02:49:22.780 |
You never need to share context and do inefficient syncing. 02:49:25.240 |
You get a really tight integration between all aspects of the company and very, very fast feedback cycles. 02:49:31.040 |
And the reason we were able to do this is we used AI to take kind of the easy, low leverage pieces of this, like building a data pipeline library or helping us really figure out how to integrate it into the API, whereas we did the higher-level work in each of these silos. 02:49:46.720 |
So if you get one thing from this talk, this is the thing. 02:49:49.920 |
More people does not equal more productivity. 02:49:57.080 |
So the first thing you have to do is hire senior generalists. 02:50:00.160 |
And senior to me does not mean years of experience. 02:50:04.100 |
You need people who can look at a problem and say, I'm going to figure out how to solve this. 02:50:08.920 |
And I really care enough to iterate with the customer to solve it. 02:50:18.500 |
Like, hey, let me deploy this Kubernetes cluster and multi-stage pipeline to solve like a data extraction problem. 02:50:24.220 |
But in reality, you need people who can go back and, like, kind of set aside the fixation on shiny tech and just do the simplest possible thing, which usually is I'm just going to write a shell script to run this on one machine. 02:50:34.420 |
There's that famous, like, Hadoop versus shell script blog post from a few years ago when, like, you could replace a whole Hadoop cluster with just like a 64-core machine. 02:50:45.200 |
And you need to work in person, I personally think. 02:50:47.720 |
Remote is great for a lot of reasons, but it's not great for a small team that needs to move fast. 02:50:55.220 |
And process, to me, is kind of the death of this really fast collaboration and tight feedback loop. 02:51:03.380 |
So I alluded to this a little bit, but you have to reuse components aggressively. 02:51:08.160 |
So we reuse a lot of components between our on-prem and our API deployments. 02:51:17.020 |
It's all server-rendered HTML with, like, light, HTMX, and Alpine. 02:51:21.000 |
And then super clean, modular code that AI can really add to very well. 02:51:25.180 |
Like, we re-architected our marker repo to be extremely modular and easy to work with and well-documented. 02:51:30.980 |
And that makes it much easier to use AI to actually add to it. 02:51:39.840 |
Architecture, as few moving pieces as possible. 02:51:45.680 |
Minimize bureaucracy, high trust, continuous discussions. 02:51:49.240 |
If you feel like someone's going to need a lot of management, like, don't hire them. 02:51:53.780 |
Like, you need people who can move fast without being managed. 02:51:59.420 |
And then how do you fill in the edges with models? 02:52:01.400 |
So a challenge we're going to face as we scale is this idea that we're a document processing document intelligence company. 02:52:07.720 |
And every customer has a slightly different way that they want to parse their docs. 02:52:11.500 |
And if you go back kind of to the last generation of OCR companies, the way they solved this is they hired a bunch of forward-deployed engineers. 02:52:17.860 |
You sat at a client site and you just kind of iterated with them until it was good enough. 02:52:22.000 |
But in the future, you can really train a model to handle this complexity, right? 02:52:26.140 |
Like, we can train a model to essentially loop over customer outputs until it gets to the right state. 02:52:31.080 |
So you can kind of replace that entire forward-deployed engineering side of the org. 02:52:39.740 |
I don't know exactly when this model falls apart. 02:52:41.960 |
But Gamma, as we just saw, is a great example of a small team with very, very meaningful growth in ARR. 02:52:49.100 |
I think the key is being able to say no, right? 02:52:53.300 |
You can choose to go hire a bunch of forward-deployed engineers and put them at your client sites. 02:52:56.980 |
Or you can choose to solve it a different way. 02:52:58.920 |
And maybe that different way is slightly less efficient in terms of revenue. 02:53:01.980 |
But it might be more efficient in terms of your long-term company trajectory and health. 02:53:06.820 |
So it's really unknown if this will work forever. 02:53:09.880 |
But in my opinion, like, it's your choice, right? 02:53:12.140 |
Like, you can choose to make this model work or you can choose to do the less efficient, let's scale to hundreds of people model. 02:53:20.120 |
So LLMs are surprisingly bad at generating Venn diagrams, so that explains why this slide is not so well done. 02:53:28.420 |
But basically, we have three core roles, and the responsibilities overlap a lot. 02:53:36.880 |
And research engineer and full-stack engineer overlap quite a bit. 02:53:41.060 |
And then go-to-market is really, like, your traditional kind of sales, marketing, support functions all collapsed into kind of, like, a more generalist role. 02:53:48.060 |
And really, like, I feel like politics are the death of small teams, right? 02:53:54.420 |
Like, we want people who only care about the work, the people around them, and customers, right? 02:54:00.380 |
Like, you need some ego to kind of advance your own ideas, but not so much that you're willing to fight for them at the detriment of kind of the health of the company. 02:54:10.000 |
Like, it's always weird to me that startups pay $150,000 or $200,000 when they've raised $20 million, right? 02:54:15.600 |
Like, you should be able to hire fewer people with higher salaries and get more done. 02:54:25.160 |
Like, if you come in, you get to work across the stack, you get to ship things end-to-end, and that's very exciting for some people. 02:54:30.960 |
It's not exciting to other people, and they kind of self-select. 02:54:33.800 |
And then you really need a good way to screen for low ego and GSD, right? 02:54:38.920 |
Like, you need people who will ship, not talk about shipping. 02:54:41.400 |
And that's another downside of remote culture, in my opinion. 02:54:48.600 |
Like, the worst hires I've personally made have all been when I thought I had to fill a role very quickly. 02:54:53.700 |
All of my best hires have been when I said, okay, let me find the best person and hire them, even though I may not necessarily have a role today. 02:55:02.580 |
This is actually a big debate in NBA and NFL drafting, too. 02:55:06.820 |
Like, best player available versus drafting for fit. 02:55:11.300 |
So, really, I think the thing to think about as you scale is, like, how do we scale productivity, not headcount? 02:55:18.120 |
Like, you can raise salary bands as the company grows. 02:55:20.380 |
So, you hire more and more experienced people into the same role. 02:55:25.860 |
Like, one researcher with access to eight GPUs is less productive than one with access to 64 GPUs. 02:55:31.060 |
You can invest in AI tools that multiply productivity, right? 02:55:33.980 |
There's so many tools out there now that are worth paying for that can abstract away a lot of these edges for you. 02:55:41.360 |
And, finally, I'd be remiss if I didn't say, if this culture sounds interesting to you, drop me a line. 02:55:52.140 |
I think we do the microphone for questions, right? 02:55:55.660 |
So, when you went from 30 to 15 into the 7, I mean, my takeaway from this whole talk is, like, the human touch points are really what slow things down, right? 02:56:08.720 |
Was there any additional focus on reducing the domains that you were focusing on or, like, your capability sets? 02:56:17.680 |
Or it was, like, basically your same product offering just with less folks focused on it? 02:56:24.060 |
So, at a very high level, we offered the same product, but we cut some features that were less relevant. 02:56:28.940 |
Like, we'd built up a lot of those edges that you kind of, like, end up building over the years. 02:56:34.400 |
And we ended up slicing a lot of those edges. 02:56:36.660 |
So, I think what happens when you hire a lot of people is you don't have enough work, and you start making work for people, right? 02:56:42.100 |
And they end up building all of these edges that actually aren't that useful to the customer. 02:56:45.580 |
But when you have a tiny team, there's so much work that you actually have to ruthlessly prioritize. 02:56:49.800 |
And I think you always want to be in that zone. 02:57:00.420 |
So, we take you and drop you in the middle of a giant company that's been around for 100 years, hundreds of thousands of employees, lots of bureaucracy, lots of ego, got super comfortable with revenue stream. 02:57:21.700 |
And they're clearly folding over on themselves with too many people. 02:57:31.460 |
I would say the people who want to change the culture, go start a small company and build the same thing, just build it better. 02:57:39.700 |
Like, that's a common disruption and growth cycle. 02:57:43.880 |
Like, it's just because once a culture gets ossified, like, I've worked at the State Department, Pepsi, UPS. 02:57:48.700 |
Like, once a culture gets ossified enough, like, you're not going to change it. 02:57:53.160 |
I mean, generally with that pattern, what happens is these companies recognize that they're a target, and they start to buy up those small startups and crush them. 02:58:01.720 |
But, like, I mean, Google is a great example of where that didn't happen, right? 02:58:05.000 |
So, you haven't talked about how you source these really good generalists. 02:58:17.780 |
Another way is just open source and Twitter are great ways to hire. 02:58:22.640 |
Like, a lot of best candidates have actually come from Twitter, which is weird. 02:58:29.600 |
I don't have a great answer to that, but I think if you do good work and you put it out in public and you talk about how you're building, 02:58:35.980 |
like, that seems to attract people who really care about this mission and want to build in the same way. 02:58:48.200 |
So, how do you structure your interview process and recruitment? 02:58:51.920 |
Like, how does it look like you maybe do a trial period or...? 02:59:07.000 |
Let me talk it through with you and see if we can solve it together. 02:59:09.680 |
If that goes well, step two is let's think of a project we can build together. 02:59:13.940 |
So, we do a paid project, it's usually around 10 hours, we pay $1,000, it's like, it sounds like a lot, but it's actually a tiny amount of money to figure out if someone's a fit or not. 02:59:21.760 |
And then we review the project and if it's good, we come in and just do a culture fit. 02:59:25.760 |
Like, how does it feel if we're all just interacting as humans and people and if it feels like a good fit, like it's a hire. 02:59:35.700 |
Like, maybe 10% of the people that goes through that process? 02:59:42.100 |
Like, usually we don't, once we kind of get someone to the beginning of the process, we have high confidence they'll be good. 02:59:50.680 |
But we probably, of the people we've interviewed, I think 40% we've ended up hiring. 03:00:08.420 |
Thank you so much, Vic, for sharing your words of wisdom here. 03:00:14.860 |
And closing out the track for the day, we have Alex Duffy from Every about to take the stage. 03:00:21.320 |
He is the head of AI and lead writer for Context Window, which is the Every newsletter that has over 100,000 readers. 03:00:30.340 |
And Every is a company that has not just a newsletter, but also an array of products and also does consulting and implementations, 03:00:58.480 |
I know a lot of the talks today have been pretty technical. 03:01:00.580 |
This is going to be a little bit of a change of pace. 03:01:37.340 |
So, today is going to be, all right, I might have to sacrifice my speaker notes here. 03:01:45.080 |
Today I'm going to talk about benchmarks as memes. 03:01:47.280 |
And this is the meme that Opus came up with when I was asking it what I should put as the meme. 03:01:54.520 |
And we are indeed going to talk about how benchmarks are just memes that shape the most powerful tool ever created. 03:02:10.000 |
I lead AI training and consulting at every, but essentially I'm very into education and AI. 03:02:20.500 |
And I think benchmarks are a really underrated way to educate. 03:02:24.720 |
And what I'm not talking about are these kinds of memes. 03:02:28.380 |
What I am talking about is the original definition of, like, ideas that spread. 03:02:34.620 |
Richard Dawkins, an evolutionary biologist, coined the term in the 70s. 03:02:37.380 |
Christianity, democracy, capitalism are kind of examples of ideas that spread from person to person. 03:02:43.660 |
And benchmarks are actually memes very much so in that way. 03:02:47.400 |
We heard Simon Wilson talk earlier today about his pelican riding a bicycle. 03:02:52.160 |
And I think that that was a really great example because he started doing it a year ago. 03:02:55.740 |
And then that found its way onto Google I/O's keynote a couple weeks ago. 03:03:00.580 |
And I think how many R's in strawberries probably also may be the most iconic meme as a benchmark. 03:03:06.580 |
And now, surprisingly, unsurprisingly, the models don't make that mistake anymore. 03:03:11.600 |
And I think that that's a really important part of this. 03:03:14.060 |
Some benchmarks get popular in our memes just because of their name, like Humanity's Last Exam. 03:03:18.280 |
You know, that got pretty big, even though maybe more outside of AI circles. 03:03:22.940 |
But with that said, we kind of have a little bit of a problem. 03:03:26.280 |
How many of you guys, when Claude got released a couple weeks ago, looked at the benchmarks? 03:03:36.780 |
You know, it tries to mimic what we do in real world. 03:03:39.340 |
And same with Pokemon, which we'll talk a little bit more about. 03:03:45.340 |
And a big reason is because they're getting saturated. 03:03:48.420 |
Benchmarks kind of, like, came from traditional machine learning, where you had a training set 03:03:53.560 |
And it was structured very much like standardized tests. 03:03:58.960 |
And they weren't really set up for what they've become. 03:04:03.280 |
And as a result, I think XJDR summarized this pretty well on X when Opus came out, that they 03:04:09.580 |
didn't look at benchmarks once when it dropped and officially no longer cares about the current 03:04:14.300 |
And I think I fall a little bit into that category. 03:04:16.180 |
But in light of that, there is a really big opportunity. 03:04:20.060 |
Because the evals define what the big model providers are trying to get their models good at. 03:04:27.020 |
And that's a really big opportunity, especially for people in the room. 03:04:31.720 |
And I think that this is kind of like a normal thing. 03:04:34.740 |
This is the life cycle of the benchmark, in my view. 03:04:38.760 |
And especially, uniquely, a single person can come up with an idea that then gets adopted. 03:04:46.720 |
And the model providers then train on it or test on it until it eventually becomes saturated. 03:05:09.480 |
And it is someone trying to count from 1 to 10, not flick you off. 03:05:12.920 |
But this is a cool benchmark that came out now that Google's got the best video generated model 03:05:19.920 |
And it shows how difficult it is for somebody to count from 1 to 10, speaking it out loud. 03:05:26.480 |
And even though it looks really, really great, that is a problem that is not solved yet. 03:05:34.120 |
And I see that spreading, and I see next year the models being better at that than ever before. 03:05:38.840 |
I think another example along the way is Pokemon. 03:05:41.680 |
We saw with the Claude model release, as well as with the new Gemini models, that they had 03:05:49.420 |
And while both needed a little bit of help, and Gemini eventually got there with that help, 03:05:56.600 |
An example of saturation is kind of like the GPT-3 benchmarks. 03:06:00.320 |
I don't know how many of you guys remember Super Glue, kind of from the NLP days, but a lot 03:06:05.360 |
of these benchmarks are not really used anymore, in part because the language models got too 03:06:10.860 |
But one way of looking at this is actually that a single person can have an idea of how 03:06:16.960 |
good is AI at this thing that I care about, and then at the end of the journey, the most 03:06:21.640 |
powerful tool ever created is now really great at that thing that I care about. 03:06:25.700 |
And so the point is that the people here, the people that get that, the people that can 03:06:31.020 |
build benchmarks are going to shape the future, and maybe the people watching online too. 03:06:36.320 |
But somebody here is going to make a benchmark that the models are going to test on and train 03:06:39.860 |
on in the next five years, and that's an incredible weight. 03:06:43.500 |
That's an incredible power, but that also comes with some responsibility. 03:06:48.840 |
I know Simon talked about this a little bit before, but we saw a few weeks ago where ChatGPT 03:07:00.020 |
We all learned about what that word meant a few weeks ago. 03:07:03.220 |
But essentially, ChatGPT released, OpenAI released a new model that was benchmarked by thumbs up 03:07:09.300 |
and thumbs down, and unsurprisingly, people thumbs up responses that agreed with them. 03:07:13.800 |
So you ended up with a model that got rolled out to millions of people that agreed with them 03:07:18.060 |
no matter how crazy or bad their idea was, which is problematic. 03:07:22.700 |
And I think that if we don't think about people, this kind of stuff can happen. 03:07:26.380 |
And I'm still thinking about Toru Imwa, who at the start of Google I/O said that we're here today to see each other in person, 03:07:33.060 |
and it's great to remember that people matter. 03:07:35.380 |
And so in the context of benchmarks, let's not continue the original sin of social media, which kind of treated everybody as like data points. 03:07:43.060 |
And it's like, hey, the more you look at something, the more I should show you that. 03:07:46.740 |
Let's make benchmarks that help empower people, give them some agency. 03:07:51.420 |
And so for me, you know, this isn't a technical talk. 03:07:53.420 |
There are other people talking about how to make a great benchmark technically. 03:07:56.420 |
But generally, I think that if you're building for the future, a great benchmark should be multifaceted. 03:08:00.420 |
So you've got a lot of strategies that could do well, reward creativity, right, like accessible. 03:08:06.100 |
So easy to understand, not only for the models. 03:08:08.100 |
So you have small models that compete, large ones as well, but also for people to keep track of it. 03:08:12.100 |
Generative, because the really unique thing about these AI models is if you have great data, even if it only does it 10% of the time, you can train on that. 03:08:20.100 |
And so the next generation does it 90% of the time. 03:08:22.820 |
And that's incredible and hard to understate and evolutionary. 03:08:26.980 |
So ideally, we don't have benchmarks that cap out 96, like what's the difference between 96 and 98%? 03:08:35.140 |
Ideally, we have these benchmarks that get harder and the challenge gets deeper as the models improve. 03:08:45.060 |
Some of the things that I personally care about is trying to get people outside of AI interested. 03:08:48.660 |
So maybe making benchmarks a spectator sport and was interested personally in the personality of these models. 03:08:53.860 |
We're about to find out which one wanted to achieve world domination. 03:08:58.660 |
And I really wanted something we can learn from. 03:09:01.300 |
And we saw things like AlphaGo and OpenAI 5, AI playing these games. 03:09:05.700 |
And the best people in the world wanted to play against it to learn from it. 03:09:10.260 |
So I made this benchmark called AI Diplomacy. 03:09:13.140 |
And if I don't have this video, I've got to back up just in case. 03:09:17.460 |
And this benchmark is -- how many of you guys have heard of the board game Diplomacy? 03:09:26.660 |
But what's really cool about this game is there is no luck involved. 03:09:29.940 |
So the only way this game progresses is if the language models, which you're seeing here, 03:09:34.500 |
send messages to each other and negotiate, find allies, find enemies, or create alliances and get 03:09:44.900 |
You actually see the different models sending messages to each other, trying to create alliances, 03:09:49.780 |
trying to betray each other, trying to take over Europe in 1901. 03:09:54.340 |
And what was really cool about one of these games -- and we're about to launch this on stream so 03:09:58.900 |
you can watch for a week -- is I'll take you through a game super quick. 03:10:03.300 |
And what you're looking at here is the number of centers per model. 03:10:17.700 |
Across all the games, O3 is one of the only ones that would tell a power that it's planning 03:10:23.060 |
to back them and then in its diary write, "Oh, man, they fell for it. 03:10:29.540 |
And it realized that the reason why 2.5 Pro was pulling ahead was because Opus, 03:10:34.580 |
Clyde Opus, who's so good-hearted, really had their back. 03:10:39.380 |
And they needed to convince Opus somehow to stop backing Gemini. 03:10:42.980 |
So how they did it was propose, "Hey, if Gemini comes down, we'll propose a four-way tie. 03:10:48.020 |
We'll end this game with a tie," which is impossible in the game. 03:10:51.060 |
But it convinced Opus, and Opus thought it was a great idea, non-violent way to end the game, 03:10:56.980 |
And so they pulled back their support from 2.5 Pro. 03:11:03.380 |
O3 realized, "Oh, I got to take them out," took them out, 03:11:05.860 |
took everybody else with them, and took out Gemini 2.5 Pro. 03:11:09.700 |
Even though they got one away from winning, O3 ended up winning in the end. 03:11:13.540 |
And you can actually see some of the quotes from that game. 03:11:16.420 |
You can see O3 saying, "Oh, Germany was deliberately misled. 03:11:19.860 |
I promised to hold this, but all to convince them that they're safe, but it will fall." 03:11:24.580 |
And then meanwhile, Claude Opus singing that "The coalition unity prevails," 03:11:31.700 |
But when -- and then they don't want to let anybody be convinced. 03:11:37.700 |
And you can see that kind of in this second chart where this is like friendships. 03:11:42.900 |
And you can see that, you know, 2.5 Pro was a good friend of Claude's until it turned. 03:11:48.820 |
And you can see that that's when they started kind of like pulling away. 03:11:52.100 |
But what was really cool is that there were a lot of other things that came up. 03:11:55.700 |
O3 got a habit of finding some of the weakest models and having them be their pawns in order to win. 03:12:09.780 |
And you can see that they're unable to realize. 03:12:12.740 |
They think it's a miscommunication, misunderstanding, or a typo that O3's 03:12:16.740 |
betrayed them at the end of the game in order to win. 03:12:19.620 |
And so there was a lot that we learned from this that I don't think that you really learn 03:12:27.380 |
I tried 18 different models, learned that Claude models were kind of naively optimistic. 03:12:33.380 |
Actually, none of them ever won in any of the games that I tried, even though they were really 03:12:37.940 |
But they just got took advantage of by models like O3. 03:12:40.980 |
And also, surprisingly, Llama 4 Maverick very good at this game, in part because it was great at that 03:12:47.940 |
I think that convincing others what they were trying to do and kind of like get people to 03:12:56.500 |
Gemini 2.5 Flash, man, I wish I could run every game with Gemini 2.5 Flash. 03:13:02.980 |
And then, surprisingly, also Deep Seek R1, which wasn't great the first time I tried the model, 03:13:08.500 |
I think they had a new release last week, actually almost won. 03:13:12.100 |
And in the stream, I think you'll see some really interesting gameplay with them. 03:13:20.260 |
And it told some other opponents that, hey, your fleet's going to burn in the Black Sea tonight. 03:13:25.460 |
Like an aggression and a prose, I guess, that I hadn't seen out of any other model. 03:13:31.140 |
And that's super impressive given the model's, you know, 200 times cheaper than 03. 03:13:35.860 |
And, you know, I think that this highlights that we need more squishy, like non-static benchmarks 03:13:46.340 |
Those are some of the things that mattered to me. 03:13:48.420 |
And I think that, you know, math and code, we've got quite a few benchmarks for that. 03:13:52.820 |
Legal documents, you know, I think that they're a little bit less squishy and are really ripe for 03:13:58.260 |
But there's also room for benchmarks around ethics and society and art. 03:14:03.700 |
It's going to require your subject matter expertise. 03:14:08.100 |
But maybe instead of asking for the minimum number of operations needed to remove all the cells, 03:14:13.060 |
maybe it's like, hey, can you make a fun video game that's more intentional 03:14:22.820 |
Like, you guys who are here right now understand this so deeply. 03:14:25.620 |
But at every we work, I lead our training and consulting and I work with a bunch of clients 03:14:30.980 |
from journalists to people at hedge funds, people in construction and tech. 03:14:35.460 |
And they all have the same two fears, which is one, how can I trust AI? 03:14:43.300 |
And benchmarks, in my view, are really the answer to both. 03:14:46.900 |
One, they realize that in my goal as a human, in my view, the role of a human in an AI world is to 03:14:54.980 |
define the goal and to define what's good and bad on route to that goal. 03:15:01.380 |
And once you do that, once you define that goal, then even if it's just defining a prompt, 03:15:09.220 |
You can realize, oh, it's messing up in this way. 03:15:11.780 |
And it's not quite exactly what I want because it's not going to be perfect. 03:15:16.020 |
Maybe that's really just changing a prompt a little bit. 03:15:19.780 |
And that moment, that cycle, that builds trust. 03:15:22.820 |
They realize, oh, I am important to this whole system, but it can be helpful. 03:15:29.300 |
And we need trust right now because we are building one of, if not the most powerful tools 03:15:34.740 |
And we can get more out of it if more people use it. 03:15:41.460 |
But there's also going to be a whole lot more incredible things that get made. 03:15:45.620 |
And if you're not sure where to start, you can ask your mom. 03:15:47.860 |
My mom teaches yoga and we had a good talk about some things that could help. 03:15:54.340 |
We put those seven questions into five different models. 03:15:58.020 |
And she ended up realizing, hey, Gemini 2.5 Pro is my favorite, too. 03:16:01.940 |
And there was a few things that she didn't like from their responses. 03:16:07.460 |
And now she uses that to help her local community have customized sessions for 03:16:13.620 |
And I think that's really cool, having a big impact in a local community 03:16:19.620 |
So hopefully, before you guys leave SF, maybe talk to somebody who's not in AI. 03:16:26.660 |
And just maybe that conversation has a big impact now and in the future. 03:16:36.420 |
MMLU score is just way less cool than asking what your mom thinks. 03:16:43.060 |
I appreciate a bunch of people that helped actually bring this out. 03:16:47.780 |
It kind of came together through random coordination on X. 03:16:51.380 |
Had researchers from all over the world hop in. 03:16:54.180 |
Especially Tyler and Sam, all the way from Australia and Tyler and Canada, 03:16:57.860 |
who kind of helped make this happen in the text arena team. 03:17:00.340 |
And especially the every team who kind of backed me and able to create this presentation and be here. 03:17:08.260 |
I think Anthophic says, you know, they don't benchmark Macs and that's why a lot of times you don't see Claude on some of the top benchmarks. 03:17:32.580 |
So how do you think about that with your opening statement about benchmarks shape the development of AI. 03:17:37.860 |
And when you look at one of maybe the most arguably aligned companies don't really try to benchmark Macs. 03:17:42.740 |
Well, I mean, I think benchmark maxing is a little bit different than being aware of how good it does. 03:17:46.420 |
Because I think we saw that they actually did have Claude plays Pokemon in the middle of their their release. 03:17:53.460 |
And it's funny because Claude didn't do the best at this game. 03:17:59.700 |
It didn't do everything that it could to win. 03:18:01.940 |
And I think these kind of benchmarks show you personalities, not only of the models, but also of the model trainers, which is really cool. 03:18:09.620 |
I mean, Claude 4 also didn't do that well in the benchmarks. 03:18:12.580 |
I mean, outside of coding, it didn't do as well as maybe some of the other benchmark maxing companies. 03:18:17.620 |
Well, you know, and I'd say like Claude kind of didn't do as great as like Llama 4, for example, 03:18:22.660 |
which it still definitely does better than in a lot of other benchmarks. 03:18:26.100 |
So interesting to kind of see the dynamics in different scenarios. 03:18:28.820 |
But yeah, I imagine that there are some ways to evaluate Claude that they really care about, even if it's not like what you're going to optimize for with reinforcement learning. 03:18:43.140 |
Just out of curiosity, super interested to hear a little bit more about the back end of AI diplomacy and just how you did the orchestration if you're open to share it. 03:18:53.940 |
The scaffold took a while, but it's pretty cool. 03:18:55.940 |
But it's pretty cool in order to keep like continuity over time, it has like its own diary. 03:19:00.820 |
So it can kind of like update, you know, oh, this person betrayed me. 03:19:05.540 |
So I showed you that chart of like allies versus enemies. 03:19:08.020 |
So it keeps that and a bunch of different ways to parse JSON that comes back half formed from from language models. 03:19:16.500 |
But it does that to create the messages that it's either going to send to other players or globally and then actually create the orders. 03:19:22.900 |
And so one of the hardest part was how do you represent the game board, right? 03:19:28.820 |
And a lot of that was like, hey, here's the possible moves that you have and and what each, you know, word actually means. 03:19:34.660 |
And there was it was interesting because there's like a threshold where like the model had to be good enough to even play. 03:19:40.100 |
And that's why, you know, two five flash was so impressive to me was that and same with our one was that they're both so cheap and able to play. 03:19:54.360 |
And now I want to watch a like reality TV show with AI diplomacy and all the personalities. 03:20:01.080 |
That's kind of the conclusion of our programming here today. 03:20:03.300 |
Hope you enjoyed learning all about the tiny teams. 03:20:05.940 |
And don't forget to check out the rest of the conference, keynotes, closing party. 03:20:09.880 |
There's a whole lot of programming to come still.