Shipping Products When You Don't Know What they Can Do — Ben Stein, Teammates

00:00:00.040 |
Yeah, I mean the actual title has curse words in it. I will probably be cursing a lot. I didn't 00:00:18.720 |
know if I would get into the track if I actually published the curse words. I'm one of the founders 00:00:23.280 |
of Teammates. I'm gonna wear my product manager hat today. I'm assuming this room is like mostly 00:00:28.560 |
product folks, probably product minded engineers as well, but I'm gonna just like wear the product 00:00:33.180 |
hat. A little bit about Teammates, very quickly: we make a platform for designing and managing an 00:00:40.020 |
entire digital workforce. So in AI engineer parlance, right, we're building agents, but I would think of 00:00:45.480 |
it like two ticks up from that because what we really believe in is the experience, the interaction 00:00:51.000 |
patterns of humans and computers working together. So I want to talk to you about my favorite teammate. 00:00:56.000 |
This is Stacy, Stacy Hand. She actually got promoted since this slide. She's an L3 engineer right now 00:01:01.880 |
on our team. She's awesome. She looks like a hamster. All of our customers get to design whatever teammates 00:01:07.460 |
and avatars they want. They give them personalities. It's all really fun. And Stacy lives inside all of 00:01:13.220 |
our collaboration tools. So she has a Google Workspace account, right, for Gmail. She has a Slack account. 00:01:19.100 |
We truly leaned into giving all of our teammates identity. And she sends emails, or I forward her emails. And she hangs out in Slack, like in the public channels. And she's Gen Alpha, which like 00:01:32.840 |
is I don't know what I feel really old I don't know what she's talking about she's constantly 00:01:36.920 |
like six seven and I'm like what are you talking about and I can tell from this room that none of 00:01:41.080 |
you are have 12 year olds no okay there you go so yeah you're rolling your eyes as well but anyway 00:01:46.920 |
this is Stacy and this is sort of how my sales pitch goes right it's it's you know a little more 00:01:52.040 |
formal than this but like this is generally the pitch and I got asked a question at some point 00:01:58.760 |
recently which was oh yeah more the pitch right she'd like it shares Google Docs Google Sheets 00:02:03.720 |
and she said hey or a customer said hey can I tag my teammate in a Google Doc comment and this 00:02:09.800 |
gave me pause because I was like well I had never actually thought about that before and so in the 00:02:14.600 |
back of my mind I'm like well of course you can the question is like what's going to happen so I'm like 00:02:18.520 |
okay so I'm like you know doing math in my head I'm like okay well we don't have webhooks she probably 00:02:23.080 |
won't or like a webhook from the comment okay but she's going to get the email notification in the email 00:02:27.720 |
that comes from Google does it have the comment and the content or maybe a link well I'm like 00:02:33.000 |
I have no idea right I actually don't know what's going to happen and this was like the impetus for 00:02:38.920 |
this talk is like how do I ship a product how do I develop a product how do I talk to customers how do 00:02:43.880 |
I instill trust when I don't know what my own product can do and like it's really weird and 00:02:51.160 |
sometimes I'm like well is this just because I'm an idiot and like well since it's my talk here I'm 00:02:54.840 |
going to say no and sometimes like well is this because what we're building is so far 00:02:59.160 |
out there right these are like truly autonomous agents that can use any and it's like I don't 00:03:04.040 |
think that's it either I think what's happening is the product management discipline is going to undergo 00:03:09.240 |
a transformation a shift an evolution whatever you call it that is super profound and we may or may not 00:03:14.760 |
totally realize it yet because I think in the engineering world we're like oh well we have 00:03:19.320 |
uh you know tools in our IDEs and we have codegen and like we sort of are starting to squint and 00:03:23.400 |
understand maybe how the discipline is changing I don't think we really understand how product 00:03:30.440 |
development is changing and evolving and like what are the new tools and practices and how do we forget 00:03:34.520 |
everything we've learned in the past um why is this true right if it's if the answer is not Ben's an 00:03:41.400 |
idiot and the answer is uh not this is we're way out there it's two reasons number one if all our 00:03:46.600 |
products are built on top of LLMs and plus or minus they are like we don't know and we can never know 00:03:51.960 |
what the LLMs know right so it's like inherently in what we're building is like we don't know what the 00:03:56.840 |
foundation is like you don't have to know what your database like how it works but like you generally 00:04:00.440 |
know that it's like the surface area the interface that's exposed we don't understand this for the 00:04:05.160 |
the models and the other thing is the expectations from customers are just boundless right we're just 00:04:11.320 |
like hey here's a text box I mean it's kind of a good interface but like essentially we're like 00:04:14.920 |
here's a free text box and if it's anything other than like a help me write button you're essentially 00:04:19.000 |
inviting customers and users to just do whatever they want right so we have this like boundless surface 00:04:25.240 |
area built on top of a product that we don't understand and so the question now is like how do we adapt 00:04:31.640 |
so that's me let me actually pick on this google doc comment thing for a second right so if I was 00:04:37.160 |
wearing my like traditional PM hat I'm like okay well I need to make a feature that's going to 00:04:41.800 |
read and respond to google doc comments and so in my head I'm like okay well does Stacy have access 00:04:51.720 |
to the google doc if she gets tagged in the comment should she reply directly in the comment should she 00:04:56.920 |
reply at all what happens if somebody else comments in the thread what if someone comments in 00:05:01.560 |
the thread that's not addressed to her what if it's someone else but it's what if it's her doc 00:05:05.880 |
and someone else commented to someone else but she gets the note like there's just so much to like 00:05:10.360 |
think about and reason about and so I'm like okay well I'm not building a google doc commenting product so 00:05:16.360 |
I'm not going to spec all of those things out and like what's worse is like you also probably want to 00:05:21.800 |
tag her in linear tickets right and what's what's the book like if you give a mouse a cookie right it's 00:05:26.120 |
like if you give a mouse a cookie well you probably want to like tag her in Figma as well and you probably want to 00:05:31.480 |
tag her in LinkedIn posts like and so we're not a team that's building a generic commenting reply 00:05:38.360 |
agent system right so then the question is like what are we supposed to do right as like a product 00:05:43.240 |
manager who realizes okay I have this like boundless surface area how does the practice need to change 00:05:48.520 |
right unless you're this is the core of like what I want to what I want to talk about today 00:05:54.040 |
so I'll do like three highfalutin ivory tower ideas and then I'll talk through some like 00:05:58.520 |
practical ways to to make this real the first one is this mindset shift to like think in affordances 00:06:05.960 |
and not like specific requirements so it's not if you know as a user if Stacy replies in the comment thread 00:06:13.960 |
and she has really like that's not how we would think about it anymore it's the affordance or she 00:06:17.800 |
has affordances to comment or she has affordances to communicate or or to email or to collaborate and 00:06:23.640 |
we're going to trust the LLMs we're going to trust the agentic workflow the work planning like all of the 00:06:28.520 |
things inside of our um you know our beautiful 12-factor agent we're going to assume that that will 00:06:33.160 |
understand but it's the affordances that we need to think about not the individual features which is 00:06:37.720 |
really weird and it's not typically how product people have ever thought before 00:06:40.920 |
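One way to picture the shift from requirements to affordances is as a data structure. The Python sketch below is illustrative only: `Affordance`, `Teammate`, `grant`, `prompt_context`, and `missing_for` are hypothetical names, not Teammates' actual implementation. Instead of one spec per feature (reply in the Google Doc thread, reply in Linear, and so on), it declares a few broad capabilities and leaves how to use them to the model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Affordance:
    """A broad capability, not a per-feature requirement (hypothetical type)."""
    name: str
    description: str


@dataclass
class Teammate:
    name: str
    affordances: dict[str, Affordance] = field(default_factory=dict)

    def grant(self, affordance: Affordance) -> None:
        """Give the teammate a capability it may choose to use."""
        self.affordances[affordance.name] = affordance

    def prompt_context(self) -> str:
        """Render the capabilities as text that a planner prompt could include."""
        lines = [f"- {a.name}: {a.description}" for a in self.affordances.values()]
        return f"{self.name} can:\n" + "\n".join(lines)

    def missing_for(self, needed: set[str]) -> set[str]:
        """Return the capabilities a request would need that were never granted."""
        return needed - set(self.affordances)


# Assumed example data: three broad affordances instead of dozens of feature specs.
stacy = Teammate("Stacy")
stacy.grant(Affordance("comment", "read and reply to comments she is tagged in"))
stacy.grant(Affordance("email", "send and receive email"))
stacy.grant(Affordance("collaborate", "open documents that are shared with her"))

print(stacy.prompt_context())
print(stacy.missing_for({"comment", "webhook"}))
```

The design choice is that the product team reviews the list of affordances, while what happens inside any one comment thread is left to the agent and then discovered by trying it.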
and I would say actually this goes even further which is behavior is emergent and this was the 00:06:47.960 |
other thing that I did not expect at all like starting in this space was uh we don't not only do we not 00:06:55.240 |
know if things work sometimes they do and they work in ways we didn't expect and so I feel like our job 00:07:00.360 |
as product people is to discover functionality is what are the right building blocks right what are the 00:07:06.040 |
right lego bricks that we either give our engineering team our product our customers let them compose 00:07:12.120 |
and can we discover emergent behavior and that is one of the reasons that like this is the most exciting 00:07:17.080 |
time I've ever built because we're actually building things and then discovering what they can do themselves 00:07:21.240 |
and that sort of became the new job in a sense is discovering what's possible because if you asked me 00:07:26.760 |
I could not sit down in front of a google doc and be like oh let me like type out what this thing 00:07:30.680 |
should I can't I don't know how to do it and well friend even if I could how do I then communicate it 00:07:36.760 |
right so how do we communicate to a development team to a backlog how do you communicate exactly what 00:07:43.480 |
should be happening it's like Figma doesn't like have the affordances for this right my my PRD doesn't like 00:07:49.720 |
have the affordance for like well you should probably talk a little bit less gen alpha because 00:07:53.880 |
you're making Ben feel old or like hey you should be really like how do we communicate and express these 00:07:58.280 |
these concepts right so I think these are like the three you know high level uh ways that um 00:08:04.680 |
our practice needs to change but like let's make it a little more concrete okay so evals 00:08:11.960 |
I'm talking about evals okay it's really hard to make a slide with graphics of evals I feel bad for the 00:08:18.280 |
eval come like how do you illustrate an eval so I'm going to make you just look at pictures of 00:08:22.520 |
various teammates from you know across all of our customers um okay who hates raising their hand at 00:08:28.920 |
conferences when the speaker asks them okay awesome so here's my question which is okay for the engineers 00:08:34.840 |
here who like legit like don't lie like writes and runs their evals good number and of the product people 00:08:43.720 |
who has visibility into the evals there's a that's not bad and and do you look at them just because 00:08:50.760 |
you have the visibility all right one one and a half two okay great so I would posit that evals 00:08:57.400 |
actually I'll back up right so we all talk about evals we're all going to be embarrassed to say that 00:09:00.600 |
we don't really know what they are evals are a testing framework for probabilistic AI for agents right 00:09:07.800 |
like if we think about the uh deterministic code right I withdraw $100 from the ATM my bank account 00:09:15.000 |
should have $100 less right great and I can test that and I can write code to test that when the test is 00:09:20.840 |
like was she snarky in slack it's like well how do you test that how do you write that test right so we 00:09:27.320 |
come up with this whole new discipline of evals which is well she should be a little bit snarky and a little 00:09:34.040 |
bit funny but not mean and then we hand it off to another LLM to say okay well hey was that reply like 00:09:39.720 |
did it meet that criteria and how often did it um it doesn't have to be 100% right so she should be like 00:09:47.480 |
pretty snarky but like not mean 80% of the time or whatever the uh uh business logic that you want 00:09:53.960 |
right so these are evals and this is the world of evals but here's what I would posit which is it is the only way 00:09:59.720 |
that we know what our software can do right and which is why I love the idea of product people 00:10:06.360 |
looking at the evals right looking at uh because they become the new specification for the product 00:10:11.720 |
right and so as we're watching you know if you're downstairs in the expo gallery you're seeing like 00:10:15.800 |
new software it's like hey bring the team in and this little bit reminds me of like the old you know 00:10:19.960 |
for the the old timers here like behavior driven development there was this period of time and it's like 00:10:23.880 |
oh the business people are going to write the tests and that will get converted to code and then the code will 00:10:28.120 |
run and like the truth is like no one ever wanted to do that like no business I don't even know who 00:10:32.120 |
a business person is but like they weren't going to do that but I actually think this is different 00:10:36.200 |
and I think this is pretty um a meaningful way to actually understand what the product can do 00:10:41.800 |
and a little bit begin to specify what it can do 00:10:44.280 |
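The contrast between a deterministic test and an eval can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a real eval framework: `withdraw`, `pass_rate`, `eval_passes`, and the sample replies are made up for illustration, and `toy_judge` is a keyword rule standing in for the second LLM that would grade each reply in practice.

```python
from typing import Callable


def withdraw(balance: int, amount: int) -> int:
    """Deterministic code: the same inputs always give the same output."""
    return balance - amount


# A classic test is one exact assertion.
assert withdraw(500, 100) == 400


def pass_rate(replies: list[str], judge: Callable[[str], bool]) -> float:
    """Fraction of replies that the judge says meet the criteria."""
    if not replies:
        raise ValueError("need at least one reply to evaluate")
    return sum(judge(reply) for reply in replies) / len(replies)


def eval_passes(replies: list[str], judge: Callable[[str], bool], threshold: float) -> bool:
    """An eval passes when enough replies meet the criteria, not necessarily all."""
    return pass_rate(replies, judge) >= threshold


def toy_judge(reply: str) -> bool:
    """Hypothetical stand-in for an LLM judge: 'a little snarky but not mean'."""
    text = reply.lower()
    snarky = "oh sure" in text or "obviously" in text
    mean = "idiot" in text
    return snarky and not mean


# Assumed sample of generated Slack replies.
SAMPLE_REPLIES = [
    "Oh sure, I'll fix the build again.",
    "Obviously the tests pass on my machine.",
    "Oh sure, let me just read your mind.",
    "Obviously you are an idiot.",
    "Obviously, happy to help with that.",
]

print(pass_rate(SAMPLE_REPLIES, toy_judge))
print(eval_passes(SAMPLE_REPLIES, toy_judge, threshold=0.8))
```

The threshold is the business decision: the same five replies pass an 80% bar and fail a 90% bar, so the number chosen is effectively part of the product spec.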
okay so I have vibe coding for a second which we which we all do we all talk about we don't talk about 00:10:51.240 |
vibe coding in a in a way that's really constructive and how do I sort of say this it's very very hard 00:10:59.720 |
I think I kind of was like oh you can't do it in Figma you can't do it in a PRD like what do I really 00:11:03.720 |
mean well it's very hard to like sit down in front of a blank piece of paper and um write what the teammate 00:11:11.240 |
the agent experience should be it's just really hard it's hard to like imagine it and it's not until you feel 00:11:18.200 |
it I mean so much of what we're doing in this like human computer interface is visceral it's feel 00:11:23.080 |
it is like oh well like do they ask too many questions like how many questions is too many 00:11:28.040 |
oh it wouldn't it be great if they clarified exactly what you meant well it turns out that's really 00:11:32.440 |
annoying but when I wrote like the first spec I'm like then the teammate should ask a lot of clarifying 00:11:36.680 |
questions and we gave it to users and they're like this sucks and I was like how would I have ever known that 00:11:41.400 |
and the answer is because it's so easy to prototype and vibe code something and get the feels and so 00:11:47.880 |
this is the next thing that I'm like pretty excited about as a new product management tool it is being 00:11:52.920 |
able to feel and experience what it's like to interact with a computer but uh without just like uh writing it 00:12:03.160 |
or hoping that you have a clickable prototype that will work I will also mention that we have to be careful 00:12:07.880 |
with vibe coding because I do not mean sit in the meeting and say to the engineering team how come 00:12:12.600 |
this is taking two weeks I finished the feature during the meeting like that doesn't that doesn't win 00:12:19.000 |
you any points right so it is no no this is never going to production but what this does is it gives you 00:12:25.240 |
the feel the the experience right and so this is like the only way I know to like actually test and 00:12:30.840 |
feel it out but do you um do you remember like the the Claude um "certainly" issue certainly I mean certainly 00:12:37.240 |
it was this period right every time you asked Claude it would be like certainly and like that probably like 00:12:41.240 |
seemed really good when you're testing it for the very first time and then like the fourth time when 00:12:44.760 |
you're like hey can you do my taxes like certainly can you write my like acceptance speech certainly 00:12:49.480 |
like but this is actually really annoying but you don't realize that until you experience it so 00:12:54.120 |
like that's why I like the vibe coding okay so great we did all this development and then the 00:13:00.040 |
question is like hey we pushed to prod does it work like I told you I don't know the question is like how 00:13:05.560 |
do you test how do you like know that it's going to do uh the things that you said it was going to do 00:13:11.000 |
and I sort of alluded to this I'll go through this quickly is just really discover discover the 00:13:14.680 |
functionality and there's an old joke I'll tell the joke QA engineer walks into a bar orders a beer orders two 00:13:23.080 |
beers orders zero beers orders negative one beers orders a lizard orders a beer with an emoji right 00:13:29.000 |
it's like great this like bar is good to open and the first customer walks in asks where the bathroom 00:13:33.720 |
is and the bar blows up right like great great old joke it's kind of how I feel these days like I just 00:13:41.080 |
sit in I'm like oh you know it'd be cool if they were to like start posting comments on LinkedIn about 00:13:47.160 |
what if what if they were like every time I added like a track to my Spotify account they can like 00:13:51.800 |
there's just like crazy ideas but this is where like the emergent behavior comes from right and so 00:13:56.920 |
is this mindset of like let's just try let's just experiment and it's it's this like kind of growth 00:14:01.640 |
mindset shift from like I'm going to write the features and the requirements to no we're going 00:14:08.840 |
this was a little bit unexpected for me and this is 00:14:17.000 |
how do you sort of report to engineering and then have things fixed by engineering and what counts 00:14:23.000 |
as a bug in this world and that is really really strange and I think as sort of I don't know if it's 00:14:29.080 |
like just a product role or maybe in a support role like how do you know what is appropriate to escalate to 00:14:34.040 |
put onto the backlog to flag as a bug right it's like I'll keep picking on on Stacy you know she 00:14:39.720 |
she gives me a really hard time so it's fine it's like hey she used too many emojis like put it in in 00:14:45.960 |
in linear it's like well it's not really a bug like show me in this spec where you told me not to use too 00:14:50.760 |
many emojis right so it's almost like um like in our tickets it's like oh you know closed done closed 00:14:57.720 |
duplicate we need like closed: LLMs be like crazy yo like I don't know how to fix this like just because 00:15:03.800 |
it's probabilistically generated so how do we know if it's right or wrong how do you know if it's a 00:15:07.400 |
feature if it's a bug right I think there's this element of um credibility that we need to build up 00:15:12.680 |
it's like hey we actually under we understand that for some use cases like 80% is good enough right this 00:15:20.840 |
eval we'll talk about evals if it's passing 90% of the time like that's a go it falls below 90% right 00:15:26.760 |
that's red and we're not going to ship it so actually come back to evals for a second because if the eval 00:15:31.880 |
becomes the spec and we can say hey we said at you know a hundred percent even though this is 00:15:37.640 |
probability you should never give a refund if a customer like can't prove that they bought the 00:15:41.560 |
thing or whatever like it is it's like great that is our metric and we could say yeah this is a bug 00:15:46.360 |
but if it's just a a feel becomes really difficult again this was totally unexpected that like uh debugging 00:15:53.560 |
and assigning bugs would become like uh controversial okay customers so this part is 00:16:00.440 |
uh i found this really weird right so i think about like not wearing my like founder hat but wearing my 00:16:07.160 |
like typical product manager hat right like i go into a customer meeting usually go with a salesperson 00:16:12.120 |
like i'm gonna play a role right and so what's the role well i'm either gonna play like visionary i'm 00:16:17.560 |
gonna like hey here's our vision for the product here's our roadmap for the future like let me help 00:16:22.200 |
you understand customer like how you're gonna come along on this journey with us or uh sometimes i'll 00:16:28.280 |
play the role of honest broker right it's like listen sales is like giving you a whole bunch of like 00:16:33.320 |
just like selling you a bunch of vaporware let me tell you what's real let me tell you like um exactly 00:16:38.760 |
what you can expect and that's a role you play right now usually preface this with like the sales team 00:16:42.440 |
beforehand it's like yeah i'm going to be the honest broker and like we'll give the customer confidence 00:16:46.120 |
today i'm like okay i told you our vision for the future our roadmap and the customer's like you're 00:16:53.640 |
full of like none of this actually works i'm like right i can't really paint the vision because no one 00:16:57.880 |
actually believes it it sounds like witchcraft and then i'm like oh well then i'll be the honest 00:17:01.320 |
broker and i'll tell you how things work but i just told you i have no idea how it works right so 00:17:05.000 |
this became very strange because i can't play either of the roles that i'm supposed to be playing 00:17:09.000 |
the future sounds like witchcraft the present is literally i don't know so how do we do this 00:17:13.960 |
i'll tell you how i've been doing it now i don't know if this is like 00:17:19.080 |
a 2025 answer or if this is like a durable answer like if we believe that all of our products are for 00:17:23.560 |
like for all time going to be probabilistic then like we probably have to figure out how this world works 00:17:27.400 |
what i've been doing now is really saying look we're inventing the future together right we're pulling 00:17:33.080 |
the future forward the reason you are talking to like a crazy startup like this and you are thinking 00:17:37.480 |
truly about like the future of how you know ai and agents are going to transform your business 00:17:42.040 |
is because you are a future thinker and we are going to do it together and it's a little bit like 00:17:45.560 |
hey let's compliment the customer let's like but it's not just like a false you know uh uh blowing smoke 00:17:51.160 |
it's like no truly we need to figure this out together and you know for 2025 i think that's actually 00:17:56.440 |
the thing that is working the best uh best for me it's like no no we have to do it together and honestly if 00:18:01.960 |
you are expecting something different like it's not time it's not time for you to like embrace this 00:18:08.520 |
world because this is this is the the way this world is going to work and so i don't know i'll 00:18:14.680 |
conclude with like i've never had more fun building i've never felt like both more inept and like more 00:18:21.080 |
excited about what what i'm doing or just the experience of throwing something out in the world and 00:18:25.960 |
then just like having my jaw drop like i can't believe this happened and not only that when we 00:18:31.000 |
upgrade the models that are like underneath them they just suddenly get smarter and that's really 00:18:35.320 |
weird too right it's like all of a sudden they start checking their work they're like oh yeah 00:18:40.200 |
i just did a query to make sure that the row is properly inserted and i was like who told you to do 00:18:45.720 |
that i'm like i don't know it just seemed like a good idea i'm like that is a good idea okay i wish i 00:18:50.360 |
thought of that but anyway but i think this is the new world that we're working in um the 00:18:55.960 |
discipline the product discipline i think is going to change for everyone and it's going to change 00:19:01.320 |
faster than we expect and we all need to like adapt to just like operating in a world and forget so much 00:19:07.800 |
of what we used to know right a lot of the core core ideas listen to customers solve their problems like 00:19:11.960 |
all of that obviously still applies but the tools the techniques that we've like relied on forever 00:19:16.600 |
i think are all getting upended and so anyway glad you're all at the ai engineer conference it's 00:19:21.240 |
awesome to have product people here working together because you know we all have to uh you know build 00:19:25.240 |
awesome products together so thank you very much