The era of unbounded products: Designing for Multimodal IO: Ben Hylak


Chapters

0:00 Intro
0:30 Ben's background
3:24 The past
6:46 Structure
13:07 Version control
16:20 The future

Transcript

I'm so excited to be here with you guys today, at what I think is probably the coolest AI conference in the world, at such an exciting time in history, especially for AI products. If you don't already know me from either demos on Twitter or sometimes probably ill-advised spicy takes on Twitter, my name is Ben Hylak, and I'm the founder of Dawn.

So at Dawn, we help some of the best companies in the world, everyone from GitHub to Can of Soup, build better, more predictable AI products. My entire life, I've been really obsessed with building and designing unbounded products. Unbounded products are products that transcend the mouse and the monitor in some way, right?

So for me, that started with robotics. I think the first one was when I was in eighth grade. Eventually, rockets at SpaceX. So these are very unbounded products. And then most recently, I was on the design team for the Apple Vision Pro for four years. So we designed the first version of Vision OS.

I think that AI makes products less bounded than they've ever been, right? You can type, you can talk, you can show images or show video just like we just saw. You can also sort of plead, you can bargain, you can confide, right? These are very interesting sort of input modalities.

And this unboundedness often makes products unpredictable, right? Confusing, hard to understand. Users assume your product can do things that it can't. They try to do those things. It doesn't work. And they walk away thinking that it can't even do the things that it can. When you talk to people, specifically people that are not in this room, about how they use ChatGPT, how they learn to use it, it's often word of mouth.

Right? So they hear one of their friends say that they used it for travel planning, and then they go use it for travel planning. A lot of us in this room, especially people that are more technical, often learn through trial and error, right?

So we just keep trying, keep trying. We keep trying because we know that these models are good, right? We know that it's impressive. But a lot of people are not like that. They don't do the trial and error thing, right? So they try it once. It doesn't work. They don't try again.

And so, this talk is about making good AI products. And to that end, I'm going to cover just three things. So those three things are the past, right? So how have products become more unbounded, and what has worked for unbounded products in the past? The present, which is AI products today.

What are sort of good design patterns and bad design patterns? And then the third point is going to be the future, right? So again, just three things. Just the past, the present, and the future. Easy. So let's start with the past. So most software that we use lives on a screen, right?

And you use it primarily by swiping, clicking, and tapping, right? When you click something, whatever the developer expected to happen is what happens, depending on how good of a developer you are. It's easy for users to understand what your app can do.

They look. They see the buttons. They get it. It's also very easy for you to understand what your users are doing. You just add an Amplitude or Mixpanel call on a button press, and you see what they did.
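As a concrete illustration, here's a minimal sketch of that kind of deterministic instrumentation using Mixpanel's browser SDK; the selector, event name, and properties are invented examples, not anything from the talk:

```typescript
// A minimal sketch of click instrumentation with Mixpanel's browser SDK.
// The selector, event name, and properties are illustrative assumptions.
import mixpanel from "mixpanel-browser";

mixpanel.init("YOUR_PROJECT_TOKEN");

const exportButton = document.querySelector<HTMLButtonElement>("#export");
exportButton?.addEventListener("click", () => {
  // One deterministic event per press: you know exactly what the user did.
  mixpanel.track("Export Clicked", { source: "toolbar" });
});
```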

So if you think about it, one of the biggest changes to this, before the last two years, was multi-touch, right? And this is just, instead of one pointer, you have two. But just by adding that second pointer, you get relative distance, you get rotation. And just this one little change largely made the smartphone possible, largely made it easy to use a screen that small.

And now it's just getting crazy, right? It's like we have unbounded products everywhere. The products are so unbounded. You have software, you know, just freely roaming the streets of San Francisco, getting attacked by fiery mobs, right? So this is getting crazy. And so I want to talk a little bit about just one unbounded product that I got to work on, which is the Vision Pro.

And what I want to talk about is just three lessons that we learned while we were designing it, lessons that I think aren't as intuitive looking from the outside in. So I think that unbounded products are often defined by this "what if" question. Like, when we were starting, it's like users get themselves into the craziest situations.

So something as simple as, oh, well, what if someone's in the living room and then they move to the bedroom and they lay down on their bed, right? What should happen to your apps? If you're designing Mac OS that's on a laptop, you don't have to worry about that.

But that's something we had to think about. And there's hundreds of more questions like this, right? What if someone's on a plane? What if someone's next to their friend? What if someone has a disability of some sort, right? Like, they cannot move. They can't move their neck. They're bedridden.

So all of these what ifs. And I think this is, again, what defines unbounded products, right? All of us that are building AI products, we're constantly thinking, you know, oh, like, what if someone puts in this? What if someone puts in that? And there's evals, et cetera, et cetera.

And so without structure, you just have chaos, right? You have a blank slate. You have all these what ifs, infinite world of possibilities. And so it's really on us as product designers to add structure. And structure is what creates clarity. So again, I want to talk about three ways we added structure.

The first was highlighting what matters and doing it really fast. So the first thing you see in Vision OS is a home screen. It has apps, it has people, and it has environments. So those are the things that we think matter when you're using Vision OS. So they're the first thing you see.

It might not sound that novel, and it's not. In a lot of ways, it's the same thing that happens on your iPhone. But compare it to the VR products that came before: when you look at their menus, it's very hard to understand what the thing is actually good for.

The second point is hierarchy. Hierarchy is what gives unbounded products a shape and a purpose, right? It's what helps users understand what it's good for, what they should use it for. So again, we have the home menu. That's kind of where everything starts and ends for Vision OS. We have windows.

They have bounds. You can resize them and move them. And any individual window can go full screen, right? So that was our hierarchy. The last point, which is really important, and I think the easiest way to make unbounded products feel intuitive, is familiarity.

It was something we hit when we were building Dawn. Our first prototype was this star cluster thing that you could explore. It was really fun. Nowadays, it looks a lot more like this, which is, you know, tables. And we have graphs and examples. Again, it's just structure, clarity.

And I think that it's no accident that, you know, the TV app on Vision OS looks a lot like the TV app on TV OS, right? It's not an accident. It wasn't laziness. When people are sort of in uncharted territory, you want to give them as many signs of home as possible.

It's not an accident. Same thing for Control Center, right? When people see Vision OS, they already know how to use it. So again, these three points: highlighting what matters and bringing it to the forefront, establishing hierarchy, and then leveraging familiarity. All right.

And so that was the past. Now we're going to talk about the present, and specifically about AI products. We're going to talk about ways, both good and bad, that products have been incorporating structure into their AI features. It's really important to note that the right structure is very unique to your app, right?

That's the whole point, is that it gives your app a shape. It helps your user understand what it's actually for. So let's take something like Dot, right? Dot is a companion. Dot is sort of a journal, at least for me. And so the structure they added was that if you pinch out, you can see each day separated, right?

It feels a lot like a journal. And if you tap a person or two people, in this case my co-founders, I can see this, again, structured information about them and a timeline every time I mention them to Dot. And so again, you're pulling that structure out of the chat.

I think this next one does a really good job of using structure to make the experience feel more like a search engine and less like ChatGPT, less like a chat you're having a conversation with, right? And they do this by pulling your query up top as a title, highlighting the sources the answer came from, and then having the answer below that, right?

And then having that take up the full page regardless. So it makes it feel, again, more like one shot, less like something you're having a back and forth with. Now I want to talk about sort of an anti-pattern I've seen, which is in the Vercel chatbot demo.

I think Vercel does some of the coolest design work in the entire world, but I didn't like this one. It's this idea of almost ephemeral UI, but inside the flow of chat, right? And I get the appeal, right? So actually, if we go back here, sorry, this was a video I wanted to show: you have a slider, right?

So you inquire about, say, buying Doge, and it shows you this UI, so you can adjust exactly how much instead of having to do it over text. It could be good. The problem is that when it's stuck inside this unstructured thing, it starts floating away as I try to ask follow-ups, right?

And then at some point, I even have two of them, right? So I go back up to the first one, I press purchase, and now I'm interacting with something that's completely different. It reminds me a lot of the house in Up, right? It just kind of floats up, up, and away.

So instead of trying to put structured stuff into this unstructured thing, I think the answer is you pull it out, right? You pull it off to the side. And what that means is that as the conversation continues, you can just sort of update that structure without disrupting where the user is.
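In code, that pattern might look something like this minimal sketch, where structured updates ride alongside the transcript instead of living inside it; all the names here are hypothetical:

```typescript
// A sketch of the "pull structure out of the chat" pattern: the conversation
// stream and the structured side panel are separate state, and an assistant
// turn may carry a patch that updates the panel in place rather than
// appending new UI to the transcript. All names are hypothetical.
interface PurchasePanel {
  asset: string;    // e.g. "DOGE"
  quantity: number; // adjusted via the slider, not over text
}

interface ChatMessage {
  role: "user" | "assistant";
  text: string;
  panelPatch?: Partial<PurchasePanel>; // structured update, not inline UI
}

function applyTurn(panel: PurchasePanel, msg: ChatMessage): PurchasePanel {
  // The transcript keeps scrolling; the panel just updates where it is.
  return msg.panelPatch ? { ...panel, ...msg.panelPatch } : panel;
}
```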

And that's exactly what Claude did with artifacts, right? I think the reason it's so successful is that they pulled out the structure, which is the app you're working on and iterating on, from the actual conversation. So as you make changes, you can even go between the versions here without having to scroll in the conversation, right?

So it's beautiful. And it actually brings us to another thing that I think has been really effective for AI apps, which is this concept of version control. This was actually one of the original shipping ChatGPT features, which is kind of crazy. But if you edit a message, you can go between the versions, right?

And it actually maintains this entire tree. It's very complicated. But it's super powerful. With v0, Vercel did something, again, amazing, where it feels extremely familiar, almost like you're working on Google Slides or something. But you can go back and iterate, keep iterating on UI without having to be afraid that you're losing something, right?
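For what it's worth, here's a rough sketch of the data structure behind that kind of message editing, assuming a tree where an edit forks a sibling branch; this is a reconstruction, not ChatGPT's or Vercel's actual implementation:

```typescript
// A sketch of a message-version tree: editing a message creates a sibling
// version, and every version keeps its own downstream branch, so nothing
// is ever lost. This is a reconstruction, not any product's real code.
interface MessageNode {
  id: string;
  content: string;
  parentId: string | null;
  children: string[]; // each child starts one branch of the conversation
}

class ConversationTree {
  private nodes = new Map<string, MessageNode>();

  add(content: string, parentId: string | null): MessageNode {
    const node: MessageNode = {
      id: crypto.randomUUID(),
      content,
      parentId,
      children: [],
    };
    this.nodes.set(node.id, node);
    if (parentId) this.nodes.get(parentId)?.children.push(node.id);
    return node;
  }

  // An edit becomes a sibling of the original, so you can flip between
  // versions; both keep whatever replies came after them.
  edit(originalId: string, newContent: string): MessageNode {
    const original = this.nodes.get(originalId);
    if (!original) throw new Error(`unknown message ${originalId}`);
    return this.add(newContent, original.parentId);
  }
}
```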

So again, versions. Again, I think familiarity is really one of the most important things for unbounded products. I think Claude did an excellent job with this. Again, I'm hyping them up here. But ChatGPT introduced memory across all of your chats, right? Completely unbounded. So when I tell it something about, you know, some sort of medical problem, and then I'm working on JavaScript, it knows that, which is very weird to me.

I think this idea of projects and the structure of a project is very familiar. So sharing context across a project makes more sense. Agents, on the other hand, are extremely unfamiliar to most people, this idea of having all these different tasks and feeding data between steps.

But you know what is familiar? Spreadsheets, right? Spreadsheets are extremely familiar to -- not to me, actually, but to a lot of people. And I think the only real uses of agents I've seen in the real world are spreadsheets. So this is Clay, right? And each column is essentially a step that the agent takes, which the user defines.

So it's going across, building up context across the spreadsheet. And each row you treat almost like an eval, right? So you run the first 10 rows, get it right, and then you run the next 50,000, 100,000. And you can see here, eventually, you end up with a personalized email as the last column, with all these steps in between.
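A rough sketch of that spreadsheet-as-agent shape, with placeholder step functions rather than Clay's actual API:

```typescript
// A sketch of the spreadsheet-as-agent pattern: each column is one agent
// step that can read everything earlier columns produced for its row, and
// you validate on a small batch before running the whole table. The step
// functions are placeholders, not Clay's API.
type Row = Record<string, string>;

interface ColumnStep {
  name: string;
  run: (row: Row) => Promise<string>; // e.g. enrich, research, draft email
}

async function runSheet(rows: Row[], steps: ColumnStep[], limit?: number) {
  const batch = rows.slice(0, limit ?? rows.length);
  for (const row of batch) {
    for (const step of steps) {
      row[step.name] = await step.run(row); // later columns see earlier ones
    }
  }
  return batch;
}

// Eval-style workflow: eyeball the first 10 rows, then run the other 50,000.
// await runSheet(rows, steps, 10);
// await runSheet(rows, steps);
```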

The next thing that I think is extremely effective in helping people understand what your app is for, and skipping all the noise of prompt hacking and prompt engineering, is examples and presets. ChatGPT, I think, was the first for this, where they had these suggestions: message to comfort a friend, plan a relaxing day, and so on.

v0 does an awesome job with this, right? They're not just having those suggestions below; they also have an explore page where you can see what other users are doing, what's actually working, right? Again, trying to shortcut this blank canvas problem. Notion as well, right?

They have a simple menu where you can change the tone of text instead of having to write, you know, "you are a very concise GPT," whatever, right? So you're just using these tried and proven things that Notion can validate.
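Something like this minimal sketch is all a tone menu needs to be; the preset strings here are invented for illustration, not Notion's:

```typescript
// A sketch of menu-driven tone presets: a few options the developer has
// already validated, each mapping to a prompt fragment the user never has
// to write. The fragments are invented, not Notion's actual prompts.
const TONE_PRESETS = {
  professional: "Rewrite the text in a formal, businesslike register.",
  casual: "Rewrite the text in a relaxed, conversational register.",
  concise: "Rewrite the text as briefly as possible without losing meaning.",
} as const;

type Tone = keyof typeof TONE_PRESETS;

function buildPrompt(tone: Tone, userText: string): string {
  return `${TONE_PRESETS[tone]}\n\n${userText}`;
}
```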

And that last point brings us to the future, right? So where are interfaces going? Linus gave an awesome talk last year where he described prompt engineering as, I think his metaphor was, a llama trying to drive a car with a pool noodle from the back seat. And there's some real truth to this, right?

And so I think, first of all, the future has a lot less prompt engineering. And we're already seeing this, right? We're already seeing this with generative images, the way that Apple designed it, where you're mixing and matching these different concepts. There's a ton of demos of this on Twitter.

Essentially, you're going between emotions in a more intuitive way. And then just yesterday, Figma released this way of adjusting the tone of text, right? Where you're going between professional, casual, expanded, concise. The problem with this is that casual means a lot of different things, right? Casual for a Fortune 500 company and casual for a direct-to-consumer cosmetics brand running ads on TikTok are very different things.

Casual when talking to your best friend and casual when talking to a coworker are different too. So how do we avoid being reductive when trying to offer these sorts of presets? And the answer is, you just, I don't know exactly how many zeros I put here, but you just million-x or billion-x the number of presets, right?

So you have enough presets for everything. And I think sparse autoencoders show a really promising path towards that. If you guys have tried Golden Gate Claude: you can identify the one feature of Golden Gate Bridge-ness and amplify it, and it makes Claude obsessed with the Golden Gate Bridge specifically.

My friend Gittes has an amazing demo of this, but for manipulating images, right? So you can see here he's increasing the amount of play of light and shadow, increasing the amount of serene forest streams or Venetian canals, in, again, a very controllable and predictable way.
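The core move, stripped to a sketch: a sparse autoencoder hands you a direction in activation space for one concept, and you nudge activations along it. Plain arrays stand in for a real model's hidden state here; this is the idea behind Golden Gate Claude, not Anthropic's code:

```typescript
// A sketch of SAE-style feature steering: add a scaled feature direction
// to an activation vector. Plain arrays stand in for real hidden states.
function steer(
  activation: number[],
  featureDirection: number[], // one SAE feature, e.g. "Golden Gate Bridge-ness"
  strength: number,           // > 0 amplifies the concept, < 0 suppresses it
): number[] {
  return activation.map((v, i) => v + strength * featureDirection[i]);
}

// A slider over `strength` becomes the controllable, predictable dial the
// talk describes, instead of rewording a prompt and hoping.
```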

Okay, so now we have a million or a billion options, whatever. How do we avoid too many options? I think this gets to point three, which is ranked presets. These are presets that are personalized, searchable, and even invoked through natural language. They might not even be directly visible to the user. So the user types in something like "more friendly," and you pull up the corresponding presets, like kindness, how close you are, how confrontational it is. Again, maybe they're directly editing them, maybe they're not.
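A sketch of how that lookup might work, retrieving the nearest presets by embedding similarity; `embed` is an assumed embedding function, not a specific API:

```typescript
// A sketch of ranked presets: embed the user's free-text request and
// surface the nearest developer-defined presets instead of showing all
// million of them. `embed` is an assumed function, not a specific API.
interface Preset {
  name: string; // e.g. "kindness", "closeness", "confrontational"
  vector: number[];
}

declare function embed(text: string): Promise<number[]>;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function rankPresets(query: string, presets: Preset[], k = 3) {
  const q = await embed(query); // e.g. "more friendly"
  return presets
    .map((p) => ({ preset: p, score: cosine(q, p.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```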

And this gets to the second-to-last point, which is developer-defined personalization. As soon as you're able to define those sorts of features, you can start tuning them per user, in a way that you can't do with just text prompts today, right?

Because text prompts are sort of this fragile house of cards, where if you remove one word, the whole output changes. So instead, you're able to tune it per user. And the last point, which becomes especially true as your app becomes increasingly different per user, is shifting from evals to analytics.

For a lot of domains, I don't think there's going to be some objectively correct answer. For "who was the first president," yes, but for the right tone of a summary for a specific user, I don't think so. And so I think that increasingly, it's going to be about how you understand whether you're meeting the needs of your users and what they're asking for.

So that's it. Thank you so much. Oh yeah, we'll skip this one. And thank you so much for coming. *outro music*