
Shipping Products When You Don't Know What they Can Do — Ben Stein, Teammates



00:00:00.040 | Yeah, I mean the actual title has curse words in it. I will probably be cursing a lot. I didn't
00:00:18.720 | know if I would get into the track if I actually published the curse words. I'm one of the founders
00:00:23.280 | of Teammates. I'm gonna wear my product manager hat today. I'm assuming this room is like mostly
00:00:28.560 | product folks, probably product minded engineers as well, but I'm gonna just like wear the product
00:00:33.180 | hat. A little bit about Teammates, very quickly: we make a platform for designing and managing an
00:00:40.020 | entire digital workforce. So in AI engineer parlance, right, we're building agents, but I would think of
00:00:45.480 | it like two ticks up from that because what we really believe in is the experience, the interaction
00:00:51.000 | patterns of humans and computers working together. So I want to talk to you about my favorite teammate.
00:00:56.000 | This is Stacy, Stacy Hand. She actually got promoted since this slide. She's an L3 engineer right now
00:01:01.880 | on our team. She's awesome. She looks like a hamster. All of our customers get to design whatever teammates
00:01:07.460 | and avatars they want. They give them personalities. It's all really fun. And Stacy lives inside all of
00:01:13.220 | our collaboration tools. So she has a Google Workspace account, right, for Gmail. She has a Slack account.
00:01:19.100 | We truly leaned into giving all of our teammates identity. And she sends emails, or I forward her emails. And she hangs out in Slack, like in the public channels. And she's Gen Alpha, which like
00:01:32.840 | is I don't know what I feel really old I don't know what she's talking about she's constantly
00:01:36.920 | like six seven and I'm like what are you talking about and I can tell from this room that none of
00:01:41.080 | you are have 12 year olds no okay there you go so yeah you're rolling your eyes as well but anyway
00:01:46.920 | this is Stacy and this is sort of how my sales pitch goes right it's it's you know a little more
00:01:52.040 | formal than this but like this is generally the pitch and I got asked a question at some point
00:01:58.760 | recently which was oh yeah more the pitch right she'd like it shares Google Docs Google Sheets
00:02:03.720 | and she said hey or a customer said hey can I tag my teammate in a Google Doc comment and this
00:02:09.800 | gave me pause because I was like well I had never actually thought about that before and so in the
00:02:14.600 | back of my mind I'm like well of course you can your question is like what's going to happen so I'm like
00:02:18.520 | okay so I'm like you know doing math in my head I'm like okay well we don't have webhooks she probably
00:02:23.080 | won't or like a webhook from the comment okay but she's going to get the email notification in the email
00:02:27.720 | that comes from Google does it have the comment and the content or maybe a link well I'm like
00:02:33.000 | I have no idea right I actually don't know what's going to happen and this was like the impetus for
00:02:38.920 | this talk is like how do I ship a product how do I develop a product how do I talk to customers how do
00:02:43.880 | I instill trust when I don't know what my own product can do and like it's really weird and
00:02:51.160 | sometimes I'm like well is this just because I'm an idiot and like well since it's my talk here I'm
00:02:54.840 | going to say no and sometimes like well is this because what we're building is so far
00:02:59.160 | out there right these are like truly autonomous agents that can use any and it's like I don't
00:03:04.040 | think that's it either I think what's happening is the product management discipline is going to undergo
00:03:09.240 | a transformation a shift in evolution whatever you call it that is super profound and we may or may not
00:03:14.760 | totally realize it yet because I think in the engineering world we're like oh well we have
00:03:19.320 | uh you know tools in our IDEs and we have codegen and like we sort of are starting to squint and
00:03:23.400 | understanding maybe how the discipline is changing I don't think we really understand how product
00:03:30.440 | development is changing and evolving and like what are the new tools and practices and how do we forget
00:03:34.520 | everything we've learned in the past um why is this true right if it's if the answer is not Ben's an
00:03:41.400 | idiot and the answer is uh not this is we're way out there it's two reasons number one if all our
00:03:46.600 | products are built on top of LLMs and plus or minus they are like we don't know and we can never know
00:03:51.960 | what the LLMs know right so it's like inherently in what we're building is like we don't know what the
00:03:56.840 | foundation is like you don't have to know what your database like how it works but like you generally
00:04:00.440 | know that it's like the surface area the interface that's exposed we don't understand this for the
00:04:05.160 | the models and the other thing is the expectations from customers are just boundless right we're just
00:04:11.320 | like hey here's a text box I mean it's kind of a good interface but like essentially we're like
00:04:14.920 | here's a free text box and if it's anything other than like a help me write button you're essentially
00:04:19.000 | inviting customers and users to just do whatever they want right so we have this like boundless surface
00:04:25.240 | area built on top of a product that we don't understand and so the question now is like how do we adapt
00:04:31.640 | so that's me let me actually pick on this google doc comment thing for a second right so if I was
00:04:37.160 | wearing my like traditional PM hat I'm like okay well I need to make a feature that's going to
00:04:41.800 | read and respond to google doc comments and so in my head I'm like okay well does Stacy have access
00:04:51.720 | to the google doc if she gets tagged in the comment should she reply directly in the comment should she
00:04:56.920 | reply at all what happens if somebody else comments in the thread what if someone comments in
00:05:01.560 | the thread that's not addressed to her what if it's someone else but it's what if it's her doc
00:05:05.880 | and someone else commented to someone else but she gets the note like there's just so much to like
00:05:10.360 | think about and reason about and so I'm like okay well I'm not building a google doc commenting product so
00:05:16.360 | I'm not going to spec all of those things out and like what's worse is like you also probably want to
00:05:21.800 | tag her in linear tickets right and what's what's the book like if you give a mouse a cookie right it's
00:05:26.120 | like if you give a mouse a cookie well you probably want to like tag her in Figma as well and you probably want to
00:05:31.480 | tag her in LinkedIn posts like and so we're not a team that's building a generic commenting reply
00:05:38.360 | agent system right so then the question is like what are we supposed to do right as like a product
00:05:43.240 | manager who realizes okay I have this like boundless surface area how does the practice need to change
00:05:48.520 | right unless you're this is the core of like what I want to what I want to talk about today
00:05:54.040 | so I'll do like three highfalutin ivory tower ideas and then I'll talk through some like
00:05:58.520 | practical ways to to make this real the first one is this mindset shift to like think in affordances
00:06:05.960 | and not like specific requirements so it's not if you know as a user if Stacy replies in the comment thread
00:06:13.960 | and she has really like that's not how we would think about it anymore it's the affordance or she
00:06:17.800 | has affordances to comment or she has affordances to communicate or or to email or to collaborate and
00:06:23.640 | we're going to trust the LLMs we're going to trust the agentic workflow the work planning like all of the
00:06:28.520 | things inside of our um you know our beautiful 12-factor agent we're going to assume that that we'll
00:06:33.160 | understand but it's the affordances that we need to think about not the individual features which is
00:06:37.720 | really weird and it's not typically how product people have ever thought before
00:06:40.920 | and I would say actually this goes even further which is behavior is emergent and this was the
00:06:47.960 | other thing that I did not expect at all like starting in this space was uh we don't not only do we not
00:06:55.240 | know if things work sometimes they do and they work in ways we didn't expect and so I feel like our job
00:07:00.360 | as product people is to discover functionality is what are the right building blocks right what are the
00:07:06.040 | right lego bricks that we either give our engineering team our product our customers let them compose
00:07:12.120 | and can we discover emergent behavior and that is one of the reasons that like this is the most exciting
00:07:17.080 | time I've ever built because we're actually building things and then discovering what they can do themselves
00:07:21.240 | and that sort of became the new job in a sense is discovering what's possible because if you asked me
00:07:26.760 | I could not sit down in front of a google doc and be like oh let me like type out what this thing
00:07:30.680 | should I can't I don't know how to do it and well friend even if I could how do I then communicate it
00:07:36.760 | right so how do you we communicate to a development team to a backlog how do you communicate exactly what
00:07:43.480 | should be happening it's like Figma doesn't like have the affordances for this right my my PRD doesn't like
00:07:49.720 | have the affordance for like well you should probably talk a little bit less gen alpha because
00:07:53.880 | you're making Ben feel old or like hey you should be really like how do we communicate and express these
00:07:58.280 | these concepts right so I think these are like the three you know high level uh ways that um
00:08:04.680 | our practice needs to change but like let's make it a little more concrete okay so evals
00:08:11.960 | I'm talking about evals okay it's really hard to make a slide with graphics of evals I feel bad for the
00:08:18.280 | eval come like how do you illustrate an eval so I'm going to make you just look at pictures of
00:08:22.520 | various teammates from you know across all of our customers um okay who hates raising their hand at
00:08:28.920 | conferences when the speaker asks them okay awesome so here's my question which is okay for the engineers
00:08:34.840 | here who like legit like don't lie like writes and runs their evals good number and of the product people
00:08:43.720 | who has visibility into the evals there's a that's not bad and and do you look at them just because
00:08:50.760 | you have the visibility all right one one and a half two okay great so I would posit that evals
00:08:57.400 | actually I'll back up right so we all talk about evals we're all going to be embarrassed to say that
00:09:00.600 | we don't really know what they are evals are a testing framework for probabilistic AI for agents right
00:09:07.800 | like if we think about the uh deterministic code right I withdraw $100 from the ATM my bank account
00:09:15.000 | should have $100 less right great and I can test that and I can write code to test that when the test is
00:09:20.840 | like was she snarky in slack it's like well how do you test that how do you write that test right so we
00:09:27.320 | come up with this whole new discipline of evals which is well she should be a little bit snarky and a little
00:09:34.040 | bit funny but not mean and then we hand it off to another LLM to say okay well hey was that reply like
00:09:39.720 | did it meet that criteria and how often did it um it doesn't have to be 100% right so she should be like
00:09:47.480 | pretty snarky but like not mean 80 percent of the time or whatever the uh uh business logic that you want
00:09:53.960 | right so these are evals and this is the world of evals but here's what I would posit which is it is the only way
00:09:59.720 | that we know what our software can do right and which is why I love the idea of product people
00:10:06.360 | looking at the evals right looking at uh because they become the new specification for the product
00:10:11.720 | right and so as we're watching you know if you're downstairs in the expo gallery you're seeing like
00:10:15.800 | new software it's like hey bring the team in and this little bit reminds me of like the old you know
00:10:19.960 | for the the old timers here like behavior driven development there was this period of time and it's like
00:10:23.880 | oh the business people are going to write the tests and that will get converted to code and then the code will
00:10:28.120 | run and like the truth is like no one ever wanted to do that like no business I don't even know who
00:10:32.120 | a business person is but like they weren't going to do that but I actually think this is different
00:10:36.200 | and I think this is pretty um a meaningful way to actually understand what the product can do
00:10:41.800 | and a little bit begin to specify what it can do
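The eval loop described here can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Teammates implementation: `run_eval`, `EvalResult`, and `keyword_judge` are hypothetical names, and the judge is a trivial keyword check standing in for the second LLM so that the sketch runs offline. The criterion text and the 80% threshold come from the talk; the sample replies are made up.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical type: (criterion, reply) -> did the reply meet the criterion?
# In practice this would wrap a call to a second LLM acting as the grader.
Judge = Callable[[str, str], bool]


@dataclass
class EvalResult:
    pass_rate: float   # fraction of sampled replies the judge accepted
    threshold: float   # minimum acceptable pass rate for this behavior

    @property
    def ship(self) -> bool:
        # Go / no-go: the eval passes when the pass rate meets the threshold.
        return self.pass_rate >= self.threshold


def run_eval(replies: Sequence[str], criterion: str,
             judge: Judge, threshold: float) -> EvalResult:
    """Grade each sampled reply against the criterion and compute a pass rate."""
    if not replies:
        raise ValueError("need at least one reply to evaluate")
    passed = sum(1 for reply in replies if judge(criterion, reply))
    return EvalResult(pass_rate=passed / len(replies), threshold=threshold)


def keyword_judge(criterion: str, reply: str) -> bool:
    """Stand-in judge (assumption): treats a reply as 'mean' if it insults the user."""
    return "idiot" not in reply.lower()


# Made-up sample replies; a real run would sample them from the agent.
replies = [
    "Sure, I'll fix the build. Again.",
    "Bold of you to merge on a Friday.",
    "Done. You're welcome.",
    "Only an idiot would write this query.",
    "On it, six seven.",
]

result = run_eval(replies, "a little snarky and funny, but not mean",
                  keyword_judge, threshold=0.8)
print(f"pass rate {result.pass_rate:.0%}, ship: {result.ship}")
```

The threshold is the product decision: a tone check like this one might be acceptable at 80%, while a hard business rule would be run with `threshold=1.0`, in which case the same five replies would fail.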
00:10:44.280 | okay so I have vibe coding for a second which we which we all do we all talk about we don't talk about
00:10:51.240 | vibe coding in a in a way that's really constructive and how do I sort of say this it's very very hard
00:10:59.720 | I think I kind of was like oh you can't do it in Figma you can't do it in a PRD like what do I really
00:11:03.720 | mean well it's very hard to like sit down in front of a blank piece of paper and um write what the teammate
00:11:11.240 | the agent experience should be it's just really hard it's hard to like imagine it and it's not until you feel
00:11:18.200 | it I mean so much of what we're doing in this like human computer interface is visceral it's feel
00:11:23.080 | it is like oh well like do they ask too many questions like how many questions is too many
00:11:28.040 | oh it wouldn't it be great if they clarified exactly what you meant well it turns out that's really
00:11:32.440 | annoying but when I wrote like the first spec I'm like then the teammate should ask a lot of clarifying
00:11:36.680 | questions and we gave it to users and they're like this sucks and I was like how would I have ever known that
00:11:41.400 | and the answer is because it's so easy to prototype and vibe code something and get the feels and so
00:11:47.880 | this is the next thing that I'm like pretty excited about as a new product management tool it is being
00:11:52.920 | able to feel and experience what it's like to interact with a computer but uh without just like uh writing it
00:12:03.160 | or hoping that you have a clickable prototype that will work I will also mention that we have to be careful
00:12:07.880 | with vibe coding because I do not mean sit in the meeting and say to the engineering team how come
00:12:12.600 | this is taking two weeks I finished the feature during the meeting like that doesn't that doesn't win
00:12:19.000 | you any points right so it is no no this is never going to production but what this does is it gives you
00:12:25.240 | the feel the the experience right and so this is like the only way I know to like actually test and
00:12:30.840 | feel it out but do you um do you remember like the the Claude um certainty issue certainly I mean certainly
00:12:37.240 | it was this period right every time you asked Claude it'd be like certainly and like that probably like
00:12:41.240 | seemed really good when you're testing it for the very first time and then like the fourth time when
00:12:44.760 | you're like hey can you do my taxes like certainly can you write my like acceptance speech certainly
00:12:49.480 | like but this is actually really annoying but you don't realize that until you experience it so
00:12:54.120 | like that's why I like the vibe coding okay so great we did all this development and then the
00:13:00.040 | question is like hey we pushed to prod does it work like I told you I don't know the question is like how
00:13:05.560 | do you test how do you like know that it's going to do uh the things that you said it was going to do
00:13:11.000 | and I sort of alluded to this I'll go through this quickly is just really discover discover the
00:13:14.680 | functionality and there's an old joke I'll tell the joke QA engineer walks into a bar orders a beer orders two
00:13:23.080 | beers orders zero beers orders negative one beers orders a lizard orders a beer with a emoji right
00:13:29.000 | it's like great this like bar is good to open and the first customer walks in asks where the bathroom
00:13:33.720 | is and the bar blows up right like great great old joke it's kind of how I feel these days like I just
00:13:41.080 | sit in I'm like oh you know it'd be cool if they were to like start posting comments on LinkedIn about
00:13:47.160 | what if what if they were like every time I added like a track to my Spotify account they can like
00:13:51.800 | there's just like crazy ideas but this is where like the emergent behavior comes from right and so
00:13:56.920 | is this mindset of like let's just try let's just experiment and it's it's this like kind of growth
00:14:01.640 | mindset shift from like I'm going to write the features and the requirements to no we're going
00:14:08.200 | to figure it out
00:14:08.840 | this was a little bit unexpected for me and this is
00:14:17.000 | how do you sort of report to engineering and then have things fixed by engineering and what counts
00:14:23.000 | as a bug in this world and that is really really strange and I think as sort of I don't know if it's
00:14:29.080 | like just a product role or maybe in a support role like how do you know what is appropriate to escalate to
00:14:34.040 | put onto the backlog to flag as a bug right it's like I'll keep picking on on Stacy you know she
00:14:39.720 | she gives me a really hard time so it's fine it's like hey she used too many emojis like put it in in
00:14:45.960 | in linear it's like well it's not really a bug like show me in this spec where you told me not to use too
00:14:50.760 | many emojis right so it's almost like um like in our tickets it's like oh you know closed done closed
00:14:57.720 | duplicate we need like closed LLMs be like crazy yo like I don't know how to fix this like just because
00:15:03.800 | it's probabilistically generated so how do we know if it's right or wrong how do you know if it's a
00:15:07.400 | feature if it's a bug right I think there's this element of um credibility that we need to build up
00:15:12.680 | it's like hey we actually we understand that for some use cases like 80% is good enough right this
00:15:20.840 | eval we'll talk about evals if it's passing 90% of the time like that's a go it falls below 90% right
00:15:26.760 | that's red and we're not going to ship it so actually come back to evals for a second because if the eval
00:15:31.880 | becomes the spec and we can say hey we said at you know a hundred percent even though this is
00:15:37.640 | probability you should never give a refund if a customer like can't prove that they bought the
00:15:41.560 | thing or whatever like it is it's like great that is our metric and we could say yeah this is a bug
00:15:46.360 | but if it's just a a feel becomes really difficult again this was totally unexpected that like uh debugging
00:15:53.560 | and assigning bugs would become like uh controversial okay customers so this part is
00:16:00.440 | uh i found this really weird right so i think about like not wearing my like founder hat but wearing my
00:16:07.160 | like typical product manager hat right like i go into a customer meeting usually go with a salesperson
00:16:12.120 | like i'm gonna play a role right and so what's the role well i'm either gonna play like visionary i'm
00:16:17.560 | gonna like hey here's our vision for the product here's our roadmap for the future like let me help
00:16:22.200 | you understand customer like how you're gonna come along on this journey with us or uh sometimes i'll
00:16:28.280 | play the role of honest broker right it's like listen sales is like giving you a whole bunch of like
00:16:33.320 | just like selling you a bunch of vaporware let me tell you what's real let me tell you like um exactly
00:16:38.760 | what you can expect and that's a role you play right now usually preface this with like the sales team
00:16:42.440 | beforehand it's like yeah i'm going to be the honest broker and like we'll give the customer confidence
00:16:46.120 | today i'm like okay i told you our vision for the future our roadmap and the customer's like you're
00:16:53.640 | full of like none of this actually works i'm like right i can't really paint the vision because no one
00:16:57.880 | actually believes it it sounds like witchcraft and then i'm like oh well then i'll be the honest
00:17:01.320 | broker and i'll tell you how things work but i just told you i have no idea how it works right so
00:17:05.000 | this became very strange because i can't play either of the roles that i'm supposed to be playing
00:17:09.000 | the future sounds like witchcraft the present is literally i don't know so how do we do this
00:17:13.960 | i'll tell you how i've been doing it now i don't know if this is like
00:17:19.080 | a 2025 answer or if this is like a durable answer like if we believe that all of our products are for
00:17:23.560 | like for all time going to be probabilistic then like we probably have to figure out how this world works
00:17:27.400 | what i've been doing now is really saying look we're inventing the future together right we're pulling
00:17:33.080 | the future forward the reason you are talking to like a crazy startup like this and you are thinking
00:17:37.480 | truly about like the future of how you know ai and agents are going to transform your business
00:17:42.040 | is because you are a future thinker and we are going to do it together and it's a little bit like
00:17:45.560 | hey let's compliment the customer let's like but it's not just like a false you know uh uh blowing smoke
00:17:51.160 | it's like no truly we need to figure this out together and you know for 2025 i think that's actually
00:17:56.440 | the thing that is working the best uh best for me it's like no no we have to do it together and honestly if
00:18:01.960 | you are expecting something different like it's not time it's not time for you to like embrace this
00:18:08.520 | world because this is this is the the way this world is going to work and so i don't know i'll
00:18:14.680 | conclude with like i've never had more fun building i've never felt like both more inept and like more
00:18:21.080 | excited about what what i'm doing or just the experience of throwing something out in the world and
00:18:25.960 | then just like having my jaw drop like i can't believe this happened and not only that when we
00:18:31.000 | upgrade the models that are like underneath them they just suddenly get smarter and that's really
00:18:35.320 | weird too right it's like all of a sudden they start checking their work they're like oh yeah
00:18:40.200 | i just did a query to make sure that the row is properly inserted and i was like who told you to do
00:18:45.720 | that i'm like i don't know it just seemed like a good idea i'm like that is a good idea okay i wish i
00:18:50.360 | thought of that but anyway but i think this is the new world that we're working in um the
00:18:55.960 | discipline the product discipline i think is going to change for everyone and it's going to change
00:19:01.320 | faster than we expect and we all need to like adapt to just like operating in a world and forget so much
00:19:07.800 | of what we used to know right a lot of the core core ideas listen to customers several problems like
00:19:11.960 | all of that obviously still applies but the tools the techniques that we've like relied on forever
00:19:16.600 | i think are all getting upended and so anyway glad you're all at the ai engineer conference it's
00:19:21.240 | awesome to have product people here working together because you know we all have to uh you know build
00:19:25.240 | awesome products together so thank you very much