
Shipping Products When You Don't Know What They Can Do — Ben Stein, Teammates


Transcript

Yeah, I mean, the actual title has curse words in it. I will probably be cursing a lot. I didn't know if I would get into the track if I actually published the curse words. I'm one of the founders of Teammates. I'm going to wear my product manager hat today. I'm assuming this room is mostly product folks, probably product-minded engineers as well, but I'm just going to wear the product hat.

A little bit about Teammates, very quickly: we make a platform for designing and managing an entire digital workforce. So in AI engineer parlance, we're building agents, but I would think of it as two ticks up from that, because what we really believe in is the experience, the interaction patterns of humans and computers working together.

So I want to talk to you about my favorite teammate. This is Stacy, Stacy Hand. She actually got promoted since this slide. She's an L3 engineer right now on our team. She's awesome. She looks like a hamster. All of our customers get to design whatever teammates and avatars they want.

They give them personalities. It's all really fun. And Stacy lives inside all of our collaboration tools. So she has a Google Workspace account, right, for Gmail. She has a Slack account. We truly leaned into giving all of our teammates identity. And she sends emails, or I forward her emails.

And she hangs out in Slack, in the public channels. And she's Gen Alpha, which, I don't know, I feel really old. I don't know what she's talking about. She's constantly like, "six, seven." And I'm like, what are you talking about? And I can tell from this room that none of you have 12-year-olds.

No? Okay, there you go. So yeah, you're rolling your eyes as well. But anyway, this is Stacy, and this is sort of how my sales pitch goes. It's a little more formal than this, but this is generally the pitch.

And I got asked a question at some point recently. Oh yeah, more of the pitch: she shares Google Docs and Google Sheets. And a customer said, "Hey, can I tag my teammate in a Google Doc comment?" And this gave me pause, because I had never actually thought about that before. In the back of my mind I'm like, well, of course you can; the question is what's going to happen. So I'm doing math in my head. Okay, well, we don't have webhooks, or a webhook from the comment. Okay, but she's going to get the email notification. Does the email that comes from Google have the comment and the content, or maybe just a link? I have no idea. I actually don't know what's going to happen.

And this was the impetus for this talk: how do I ship a product, how do I develop a product, how do I talk to customers, how do I instill trust, when I don't know what my own product can do? It's really weird. Sometimes I'm like, is this just because I'm an idiot? Well, since it's my talk, I'm going to say no. And sometimes I'm like, is this because what we're building is so far out there? These are truly autonomous agents that can use any... and I don't think that's it either. I think what's happening is that the product management discipline is going to undergo a transformation, a shift, an evolution, whatever you call it, that is super profound, and we may or may not totally realize it yet. Because I
think in the engineering world we're like, oh, well, we have tools in our IDEs and we have codegen, and we're sort of starting to squint and understand how the discipline is changing. I don't think we really understand how product development is changing and evolving. What are the new tools and practices? And how do we forget everything we've learned in the past?

Why is this true? If the answer is not "Ben's an idiot," and the answer is not "we're way out there," it's two reasons. Number one: if all our products are built on top of LLMs, and plus or minus they are, we don't know and we can never know what the LLMs know. So inherent in what we're building is that we don't know what the foundation is. You don't have to know how your database works, but you generally know the surface area, the interface that's exposed. We don't understand this for the models.

The other thing is that the expectations from customers are just boundless. We're just like, hey, here's a text box. I mean, it's kind of a good interface, but essentially we're saying here's a free text box, and if it's anything other than a "help me write" button, you're inviting customers and users to do whatever they want. So we have this boundless surface area built on top of a product that we don't understand. And the question now is, how do we adapt?

So that's me. Let me actually pick on this Google Doc comment thing for a second. If I was wearing my traditional PM hat, I'd be like, okay, I need to make a feature that's going to read and respond to Google Doc comments. And so in my head I'm like: does Stacy have access to the Google Doc? If she gets tagged in the comment, should she reply directly in the comment? Should she reply at all? What happens if somebody else comments in the thread? What if someone comments in the thread and it's not addressed to her?
What if it's her doc, and someone else commented to someone else, but she gets the note? There's just so much to think about and reason about. And I'm like, okay, I'm not building a Google Doc commenting product, so I'm not going to spec all of those things out. And what's worse is that you also probably want to tag her in Linear tickets. What's the book, If You Give a Mouse a Cookie? It's like, if you give a mouse a cookie, well, you probably want to tag her in Figma as well, and you probably want to tag her in LinkedIn posts. And we're not a team that's building a generic comment-reply agent system.

So then the question is, what are we supposed to do as a product manager who realizes, okay, I have this boundless surface area? How does the practice need to change? This is the core of what I want to talk about today. I'll do three highfalutin, ivory-tower ideas, and then I'll talk through some practical ways to make this real.

The first one is a mindset shift: think in affordances, not specific requirements. It's not, "As a user, if Stacy replies in the comment thread and she has..." That's really not how we would think about it anymore. It's that she has affordances to comment, or affordances to communicate, or to email, or to collaborate. And we're going to trust the LLMs, we're going to trust the agentic workflow, the work planning, all of the things inside of our beautiful 12-factor agent. We're going to assume that it will understand. But it's the affordances that we need to think about, not the individual features, which is really weird, and it's not how product people have typically thought before.

And I would say this goes even further, which is that behavior is emergent. This was the other thing that I did not expect at all
starting in this space: not only do we not know if things work, sometimes they do, and they work in ways we didn't expect. So I feel like our job as product people is to discover functionality. What are the right building blocks, the right Lego bricks, that we give our engineering team, our product, our customers, and let them compose? And can we discover emergent behavior? That is one of the reasons this is the most exciting time I've ever been building, because we're actually building things and then discovering what they can do themselves. And that sort of became the new job, in a sense: discovering what's possible. Because if you asked me, I could not sit down in front of a Google Doc and be like, oh, let me type out what this thing should do. I can't. I don't know how to do it.

And even if I could, how do I then communicate it? How do we communicate to a development team, to a backlog, exactly what should be happening? Figma doesn't have the affordances for this. My PRD doesn't have the affordance for, "Well, you should probably talk a little bit less Gen Alpha, because you're making Ben feel old," or, "Hey, you should be really..." How do we communicate and express these concepts? So I think these are the three high-level ways that our practice needs to change. But let's make it a little more concrete.

Okay, so, evals. I'm talking about evals. It's really hard to make a slide with graphics of evals. I feel bad for the eval com... like, how do you illustrate an eval? So I'm going to make you just look at pictures of various teammates from across all of our customers.

Okay, who hates raising their hand at conferences when the speaker asks them to? Okay, awesome. So here's my question. For the engineers here, and legit, don't lie: who writes and runs their evals? Good number. And of the product people,
who has visibility into the evals? That's not bad. And do you look at them, just because you have the visibility? All right: one, one and a half, two. Okay, great.

So I would posit that evals... actually, I'll back up. We all talk about evals, and we're all going to be embarrassed to say that we don't really know what they are. Evals are a testing framework for probabilistic AI, for agents. If we think about deterministic code: I withdraw $100 from the ATM, my bank account should have $100 less. Great, I can write code to test that. When the test is "Was she snarky in Slack?", how do you write that test? So we come up with this whole new discipline of evals, which is: she should be a little bit snarky and a little bit funny, but not mean. And then we hand it off to another LLM to say, hey, did that reply meet that criteria? And how often did it? It doesn't have to be 100%. She should be pretty snarky but not mean 80% of the time, or whatever business logic you want. So these are evals, and this is the world of evals.

But here's what I would posit: it is the only way that we know what our software can do. Which is why I love the idea of product people looking at the evals, because they become the new specification for the product. And so, if you're downstairs in the expo gallery, you're seeing new software; it's like, hey, bring the team in. This reminds me a little bit of, for the old-timers here, behavior-driven development. There was this period of time where it was like, oh, the business people are going to write the tests, and that will get converted to code, and then the code will run. And the truth is, no one ever wanted to do that. No business person, and I don't even know who a business person is, but they were never
going to do that. But I actually think this is different, and I think this is a pretty meaningful way to actually understand what the product can do and, a little bit, begin to specify what it can do.

Okay, so, vibe coding for a second, which we all do and we all talk about. We don't talk about vibe coding in a way that's really constructive. How do I say this? I was kind of like, oh, you can't do it in Figma, you can't do it in a PRD. What do I really mean? Well, it's very hard to sit down in front of a blank piece of paper and write what the teammate, the agent experience, should be. It's hard to imagine it, and it's not until you feel it... I mean, so much of what we're doing in this human-computer interface is visceral. It's feel. It's like, do they ask too many questions? How many questions is too many? Oh, wouldn't it be great if they clarified exactly what you meant? Well, it turns out that's really annoying. But when I wrote the first spec, I was like, "Then the teammate should ask a lot of clarifying questions." And we gave it to users and they were like, this sucks. And I was like, how would I have ever known that? And the answer is that it's so easy to prototype and vibe code something and get the feels.

So this is the next thing that I'm pretty excited about as a new product management tool: being able to feel and experience what it's like to interact with a computer, without just writing it up or hoping that you have a clickable prototype that will work. I will also mention that we have to be careful with vibe coding, because I do not mean sit in the meeting and say to the engineering team, "How come this is taking two weeks? I finished the feature during the meeting." That doesn't win you any points. It's, no, no, this is never going to production. But what this does is give you the feel, the experience. And so this
is the only way I know to actually test and feel it out. Do you remember the Claude "certainly" issue? There was this period where every time you asked Claude something, it would be like, "Certainly!" And that probably seemed really good when you were testing it for the very first time. And then the fourth time, when you're like, hey, can you do my taxes? "Certainly!" Can you write my acceptance speech? "Certainly!" This is actually really annoying, but you don't realize that until you experience it. So that's why I like the vibe coding.

Okay, so great, we did all this development, and then the question is, hey, we pushed to prod, does it work? Like I told you, I don't know. The question is, how do you test? How do you know that it's going to do the things that you said it was going to do? I sort of alluded to this, so I'll go through it quickly: it's really just discover the functionality.

There's an old joke. I'll tell the joke. A QA engineer walks into a bar. Orders a beer. Orders two beers. Orders zero beers. Orders negative one beers. Orders a lizard. Orders a beer with an emoji. It's like, great, this bar is good to open. And the first customer walks in, asks where the bathroom is, and the bar blows up. Great old joke. It's kind of how I feel these days. I just sit and I'm like, oh, you know, it'd be cool if they were to start posting comments on LinkedIn. Or what if, every time I added a track to my Spotify account, they could... There are just crazy ideas. But this is where the emergent behavior comes from. It's this mindset of, let's just try, let's just experiment. It's this kind of growth-mindset shift from "I'm going to write the features and the requirements" to "no, we're going to figure it out."

This next one was a little bit unexpected for me: how do you report to engineering, and then have things
fixed by engineering? And what counts as a bug in this world? That is really, really strange. I don't know if it's just a product role or maybe a support role, but how do you know what is appropriate to escalate, to put onto the backlog, to flag as a bug? I'll keep picking on Stacy; she gives me a really hard time, so it's fine. It's like, "Hey, she used too many emojis, put it in Linear." And it's like, well, it's not really a bug. Show me in the spec where you told me not to use too many emojis. It's almost like, in our tickets, we have "closed: done" and "closed: duplicate," and we need "closed: LLMs be crazy, yo." I don't know how to fix this, just because it's probabilistically generated. So how do we know if it's right or wrong? How do you know if it's a feature or a bug?

I think there's this element of credibility that we need to build up. It's like, hey, we understand that for some use cases, 80% is good enough. Take an eval, since we talked about evals: if it's passing 90% of the time, that's a go; if it falls below 90%, that's red and we're not going to ship it. So I actually come back to evals for a second, because if the eval becomes the spec, we can say: hey, we said that 100% of the time, even though this is probabilistic, you should never give a refund if a customer can't prove that they bought the thing, or whatever it is. Great, that is our metric, and we can say, yeah, this is a bug. But if it's just a feel, it becomes really difficult. Again, this was totally unexpected, that debugging and assigning bugs would become controversial.

Okay, customers. I found this part really weird. Think about me not wearing my founder hat, but wearing my typical product manager hat. I go into a customer meeting, usually with a salesperson, and I'm going to play a role. And so,
what's the role? Well, I'm either going to play visionary: hey, here's our vision for the product, here's our roadmap for the future, let me help you understand, customer, how you're going to come along on this journey with us. Or sometimes I'll play the role of honest broker. It's like, listen, sales is just selling you a bunch of vaporware. Let me tell you what's real. Let me tell you exactly what you can expect. And that's a role you play. I usually preface this with the sales team beforehand: yeah, I'm going to be the honest broker, and we'll give the customer confidence.

Today, I'm like, okay, I told you our vision for the future, our roadmap, and the customer's like, you're full of... like, none of this actually works. And I'm like, right, I can't really paint the vision, because no one actually believes it. It sounds like witchcraft. And then I'm like, oh, well, then I'll be the honest broker and I'll tell you how things work. But I just told you, I have no idea how it works. So this became very strange, because I can't play either of the roles that I'm supposed to be playing. The future sounds like witchcraft; the present is literally "I don't know."

So how do we do this? I'll tell you how I've been doing it. Now, I don't know if this is a 2025 answer or a durable answer. If we believe that all of our products are, for all time, going to be probabilistic, then we probably have to figure out how this world works. What I've been doing now is really saying: look, we're inventing the future together. We're pulling the future forward. The reason you are talking to a crazy startup like this, and thinking truly about the future of how AI and agents are going to transform your business, is because you are a future thinker, and we are going to do it together. And it's a little bit like, hey, let's compliment the customer. But it's not
just blowing smoke. It's like, no, truly, we need to figure this out together. And for 2025, I think that's actually the thing that is working the best for me. It's like, no, no, we have to do it together. And honestly, if you are expecting something different, it's not time for you to embrace this world, because this is the way this world is going to work.

And so, I don't know, I'll conclude with this: I've never had more fun building. I've never felt both more inept and more excited about what I'm doing, or just the experience of throwing something out in the world and then having my jaw drop, like, I can't believe this happened. And not only that: when we upgrade the models that are underneath them, they just suddenly get smarter, and that's really weird too. All of a sudden they start checking their work. They're like, "Oh yeah, I just did a query to make sure that the row was properly inserted." And I was like, who told you to do that? And they're like, "I don't know, it just seemed like a good idea." And I'm like, that is a good idea. I wish I'd thought of that.

But anyway, I think this is the new world that we're working in. The product discipline, I think, is going to change for everyone, and it's going to change faster than we expect. We all need to adapt to operating in this world and forget so much of what we used to know. A lot of the core ideas, listen to customers, solve problems, all of that obviously still applies. But the tools and techniques that we've relied on forever, I think, are all getting upended. So anyway, glad you're all at the AI Engineer conference. It's awesome to have product people here working together, because we all have to build awesome products together. So thank you very much.
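
Editor's sketch: the "eval becomes the spec" idea from the talk, where a soft behavior only has to pass some fraction of the time while a hard rule like the refund one has to pass every time, might look roughly like the Python below. This is not Teammates' tooling. Every name here (`EvalSpec`, `pass_rate`, `gate`, the sample replies) is hypothetical, and the two judge functions are trivial stand-ins for what the talk describes as handing the reply to another LLM.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical emoji set, just enough for the sample replies below.
EMOJI = {"🙂", "🎉", "🚀"}


@dataclass(frozen=True)
class EvalSpec:
    """One line of the 'spec': a judged criterion plus a required pass rate."""
    name: str
    judge: Callable[[str], bool]  # stand-in for an LLM-as-judge call (assumption)
    min_pass_rate: float          # e.g. 0.8 for tone, 1.0 for hard rules


def pass_rate(spec: EvalSpec, replies: Sequence[str]) -> float:
    """Fraction of sampled replies that the judge accepts."""
    if not replies:
        raise ValueError("need at least one reply to evaluate")
    passed = sum(1 for reply in replies if spec.judge(reply))
    return passed / len(replies)


def gate(spec: EvalSpec, replies: Sequence[str]) -> bool:
    """True means 'go'; False means 'red, do not ship'."""
    return pass_rate(spec, replies) >= spec.min_pass_rate


def few_emojis(reply: str) -> bool:
    # Toy judge: at most two emojis per reply.
    return sum(ch in EMOJI for ch in reply) <= 2


def no_unverified_refund(reply: str) -> bool:
    # Toy judge: the reply must not announce a refund.
    return "refund issued" not in reply.lower()


# Made-up sample outputs, standing in for repeated runs of an agent.
REPLIES = [
    "On it 🙂",
    "Done! 🎉🎉🎉🎉",
    "Sure, I filed the ticket.",
    "Refund issued, no receipt needed.",
    "Fixed. 🚀",
]

TONE_SPEC = EvalSpec("not too many emojis", few_emojis, min_pass_rate=0.8)
REFUND_SPEC = EvalSpec("never refund without proof", no_unverified_refund, min_pass_rate=1.0)

if __name__ == "__main__":
    for spec in (TONE_SPEC, REFUND_SPEC):
        status = "go" if gate(spec, REPLIES) else "red"
        print(f"{spec.name}: {pass_rate(spec, REPLIES):.0%} -> {status}")
```

With a threshold written down per criterion, "she used too many emojis" becomes a bug only when the pass rate drops below the agreed number, which is the distinction the talk draws between a metric and a feel.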