I want to start with a question. Does anybody remember what accounting looked like in the early 1900s? Yeah, me neither. But from what I gather, it was super frustrating. And it involved a lot of writing letters and numbers, annotating in margins, performing calculations by hand. You can probably look at these pages and sense how frustrating it was by looking at how many things are crossed out and all the inkblots on the page.
So thankfully, this isn't how the job's done these days. So in 1979, VisiCalc totally changed the game, and this was the first spreadsheet for personal computers. It became an essential tool for accountants, at least until Lotus 1-2-3 was launched four years later. And the innovation here wasn't performing the calculations automatically.
We already had calculators and computers to do that for us. But instead, the innovation was having the structured interface that stacked those automatic calculations together into formulas so that when you change the value of a cell or you add a row to your spreadsheet, all of the spreadsheet numbers would be updated live.
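Just to make that idea concrete, here's a tiny sketch in Python of what that kind of live recalculation looks like. The cell names and formulas are completely made up for illustration; the point is just that changing one input re-evaluates everything built on top of it.

```python
# Toy "spreadsheet": each cell is either a raw value or a formula over other cells.
# Changing one input and re-evaluating updates every number that depends on it.

cells = {
    "revenue": 1200.0,
    "costs": 800.0,
    "profit": lambda c: c["revenue"] - c["costs"],
    "margin": lambda c: c["profit"] / c["revenue"],
}

def evaluate(cells):
    """Resolve formulas against the current values, recomputing the whole sheet."""
    resolved = {name: v for name, v in cells.items() if not callable(v)}
    pending = {name: v for name, v in cells.items() if callable(v)}
    while pending:
        for name, formula in list(pending.items()):
            try:
                resolved[name] = formula(resolved)
                del pending[name]
            except KeyError:
                pass  # depends on a formula we haven't resolved yet
    return resolved

print(evaluate(cells))   # profit 400.0, margin ~0.33
cells["costs"] = 900.0   # change one cell...
print(evaluate(cells))   # ...and profit and margin update with it
```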
So instead of spending all day doing calculations or manually updating the rows and columns, accountants could now spend more time worrying about the actual numbers. Okay. Fortunately or unfortunately, this isn't a spreadsheet conference, so let's get back to talking about AI. So one of the things I'm most interested in is what are the best ways to combine our new AI superpowers with the interfaces that we use today, or more importantly, the interfaces that we want to use tomorrow.
So often, when people talk about building interfaces with AI, they refer to these two distinct poles, whether it's automation or augmentation. In essence, automation takes rote tasks and does them for the user, which is really great for anything that's super tedious or boring, like copy and pasting data into a table or doing calculations by hand.
And in contrast, augmentation gives the user a new ability or improves their existing abilities, which is awesome for things that are creative or nuanced, things we don't really trust models with yet, like analyzing data. And I think this contrast often ignores how related these two concepts really are. Automation has become a bit of a buzzword or a trigger word where people are worried about their jobs being automated.
And I think this is a very valid concern, and I kind of want to reframe this dichotomy. So instead, I think augmentation is composed of smaller automations. If our end goal is to augment tasks or jobs, we'll still need to automate parts of them. So, for example, if the end goal is analyzing data, automating the smaller tasks, like aggregating the data into a table or generating visualizations from that table, is going to help you focus on your end goal.
So if we go back to our spreadsheet example, we can think of each cell, and the calculations that create it, as having been automated away. And no one really thinks of spreadsheets as taking people's jobs. Instead, Excel, what I'm showing here, which is kind of like the current king of spreadsheets, is an essential tool for people who interact with things like financial data.
If we automate these parts behind the scenes, that's the first step towards achieving the goal of augmenting working with data. So in the future, we can easily imagine having this table aggregated automatically, or the formulas written for us. And having all this work done helps augment us in our greater goal of analyzing and understanding the data.
This is one of the reasons why you might hear me say some things, like chatbots aren't necessarily the future. I think that these flexible general tools like calculators and chatbots are wonderful, but then adding the structured interface around them makes them so much more powerful for a ton of different use cases.
What we want is something where the technology behind chatbots is embedded into the interfaces, where we're still driving, but the model's automating away the smaller tasks that we find so frustrating. So what might these interfaces look like? Before answering that question, I want to introduce one more concept, the ladder of abstraction.
So the basic idea here is that the exact same object can be represented at many different levels of detail. So I think maps are a good example of this. We take this interface for granted, but Google Maps and other digital maps are incredibly compelling interfaces. They're so well designed, and they support different tasks involving navigation and localization at different scales.
So here we are at the most zoomed in scale, and we can see all of the different structures within the Monterey Bay Aquarium. We can see individual buildings, the names, the icons for them, maybe routes between the buildings. And this is great for navigating around the aquarium, but maybe not so great for getting to the aquarium.
As we zoom out, all of these buildings get smaller because they're further away, but that's not the only thing that happens. So at these more zoomed out levels, Google Maps actually starts hiding information, so I can't see the buildings inside of the aquarium anymore, or their icons or names.
But instead I can see city streets and different restaurants, and this will support a different set of tasks, like finding a restaurant or destination and getting to that place. Zooming out even further, we lose those city streets and stores, and instead we look at highways and terrain. And again, we have a different task here, this level supports longer range travel, getting to and from Monterey.
And then, if we go all the way out, we're mostly looking at the shape of states or countries. So if we tried to keep all of that information at higher zoom levels, it would be completely incomprehensible. There's really only so much information we can fit in our brains, and so many pixels on a screen.
And most of that detail isn't relevant for the task we're trying to do anyway. So, you could wonder, can we use AI to bring these kinds of principles to other types of interfaces? For example, what would happen if I zoomed out on a book? What would that even look like?
Typically, when we read a book, we're looking at every single word, but that's not the only level we think about. When remembering books we've read in the past, or summarizing a book for a friend, we're more concerned with overall topics and plots than specific wording. And now that we have access to language models, which are amazing at summarizing and transforming text, how can we use them to change the way we read and write?
So, here's a quick demo I put together of the first five chapters of Peter Pan. And there's no tricks here. I'm just scrolling through the first chapter. So, if we take this and we use an LLM to zoom out, we can see each paragraph change to a one-sentence summary.
And we have a mini-map to the right, and you can kind of see how many fewer words there are on the page and how much more quickly I could read this. If we zoom out another level, we can see summaries of, say, ten paragraphs at once. And again, you can see on the mini-map, we have way less text to read.
And then, finally, at that highest zoom level, we've reduced each chapter to one sentence. And here, we can fit five chapters on one page. So, if I were writing Peter Pan and I wanted to do something like tweak the pacing or modify the plot structure, viewing the text at this highest zoom level, editing it, and then zooming back in to see how that changed the raw text would be a much nicer workflow than keeping all the parts in your head as you change it word by word.
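For anyone curious how a demo like this could be wired up, here's a rough sketch. I'm using the OpenAI Python client and a one-sentence summarization prompt purely as stand-ins; the file name, model, and chunk sizes are all placeholders, not how the actual demo was built.

```python
# Sketch: build "zoom levels" for a book by summarizing recursively with an LLM.
# The model, prompt, and file are placeholders; any summarization call would do.
from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    """Ask the model for a one-sentence summary of a chunk of text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Summarize this passage in one sentence:\n\n{text}"}],
    )
    return response.choices[0].message.content

def zoom_out(chunks: list[str], group_size: int) -> list[str]:
    """Collapse every `group_size` chunks into a single one-sentence summary."""
    groups = [chunks[i:i + group_size] for i in range(0, len(chunks), group_size)]
    return [summarize("\n\n".join(group)) for group in groups]

paragraphs = open("peter_pan_ch1-5.txt").read().split("\n\n")  # level 0: the raw text
level_1 = zoom_out(paragraphs, group_size=1)   # one sentence per paragraph
level_2 = zoom_out(level_1, group_size=10)     # one sentence per ~10 paragraphs
level_3 = zoom_out(level_2, group_size=5)      # roughly one sentence per chapter
```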
So, another way to think about a book at a high level is with a story arc. And this describes the mood mapped over an entire story. You might be familiar with Kurt Vonnegut's graphical representation of the most common story arcs. For example, we have Man in a Hole, where the main character gets in trouble, gets out of it, and ends up better for the experience, which you'll see in stories like The Hobbit or The Wizard of Oz or Alice in Wonderland.
What if we could take the sentiment of all the sections in a book and plot that on a graph? And then, if we wanted to edit the story, we could go ahead and tweak parts of that graph and see how the raw text changes. I mainly highlight this because I'm super excited to see how we use AI to innovate on writing tools within the next few years.
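As a hedged sketch of what that could look like, assuming the same kind of LLM call as before, you could score the mood of each section and plot it. The prompt and the scoring scale here are things I made up for illustration, not a real writing tool.

```python
# Sketch: plot a Vonnegut-style story arc by scoring each section's mood with an LLM.
import matplotlib.pyplot as plt
from openai import OpenAI

client = OpenAI()

def mood_score(section: str) -> float:
    """Rate a section's fortune from -1 (misfortune) to 1 (good fortune)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "On a scale from -1 (misfortune) to 1 (good fortune), "
                              "rate the mood of this passage. Reply with only a number.\n\n"
                              + section}],
    )
    return float(response.choices[0].message.content.strip())

sections = open("peter_pan_ch1-5.txt").read().split("\n\n")
scores = [mood_score(s) for s in sections]

plt.plot(scores)
plt.xlabel("section")
plt.ylabel("fortune")
plt.title("Story arc")
plt.show()
```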
But first, let's combine the two concepts. So, the first concept is augmentation as stacked automations, and the second concept is traversing the ladder of abstraction for different tasks. How might this look in a more general product? So, I'm on the design team of a startup here in SF named Adept.
And at Adept, we're focused on training AI to use software, read screens, and take actions the way humans do. And our end goal is to make knowledge work easier, that is, any work done on a computer. So, after speaking with a lot of people about what they do day to day at their jobs, we've found that much of knowledge work involves getting information, transforming or reasoning about it, and then acting on that information.
So, given this really common workflow, one of the things we've been thinking about is what might it mean to zoom out on any piece of information? So, we have some sketches where we're exploring what that might feel like or what it might enable us to do. I thought it would be really fun to share one of those with you all today.
All right. So, completely hypothetical situation. Let's say I was going to an awesome conference in San Francisco. What I would do first is I would go to Airbnb. I'd find listings near the venue. I'd click into the detail page of one of the listings. And there's all this generic information that should work for everybody.
But I have specific criteria that will help me decide whether or not it's the right thing to book. So, I'm going to be digging through this page looking for things like how close is it to the venue? Is there a coffee maker? Does it have good Wi-Fi? That kind of thing.
This kind of decision would be much easier if I could zoom out just a little. Get rid of all the branding and standard information that isn't really important to me right now. And focus on my deciding factors. So, to start, I can see the name of the listing, maybe the rating, a quick summary, and the total price.
And this is all pretty generic so far. But I know this conference is at the esteemed Hotel Nikko. And I'm typically going to be looking at a map to find places near that venue. But if I could just extract the walking minutes to the hotel and put that right on the page, that would be really helpful.
And maybe if that's a little bit far, I can figure out what the closest BART station to the listing is and then add the walk to BART there as well, as a backup way to get to the hotel. Another thing that's really important to me is the Wi-Fi speed.
I know I'm going to be working on my talk the night before, true story. So, I'm going to need really fast internet. So, I can use AI to pull out the relevant reviews and summarize them as positive or negative to really quickly judge whether the Wi-Fi is going to work or not.
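As a rough sketch of what that extraction could look like, and this is purely illustrative rather than how we actually build things at Adept, you could have an LLM fill in a small structured record of your deciding factors from the raw page text. The field names, prompt, and model here are all hypothetical.

```python
# Sketch: pull one traveler's deciding factors out of a raw listing page with an LLM.
# The field names, prompt, and model are hypothetical, not a real product schema.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "From the listing text below, return JSON with the keys total_price_usd, "
    "walk_minutes_to_hotel_nikko, walk_minutes_to_nearest_bart, and wifi_reviews "
    "(a list of objects, each with keys quote and sentiment, where sentiment is "
    "positive or negative).\n\nListing text:\n"
)

def zoom_out_listing(listing_text: str) -> dict:
    """Return a zoomed-out view of a listing: just the deciding factors, as JSON."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT + listing_text}],
    )
    return json.loads(response.choices[0].message.content)
```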
Additionally, Airbnb usually has like 50 vanity photos for any given listing. And I really just want one photo of the bedroom or living room or kitchen. So, if I could just pull those out and put them on the page, that would help me a lot. And then, most importantly, at this higher zoom level, I want to preserve the ability to act on this information.
So, directly from this page, I can go ahead and reserve this listing or send a message to the host without going back to Airbnb. That would be really helpful and keep me in control. And I never really know whether staying at an Airbnb or hotel is going to be a better deal.
So, typically, I'll also look at hotel listings. And it's pretty great to be able to see that same elevated view no matter which site I'm looking at. Additionally, if I'm going to compare the hotel with the Airbnb listing, having these similar views side by side is going to give me a really easy comparison between the two of them.
But what if I wanted to look at 50 listings? Comparing 50 of these individual views would still be a lot of work. Zooming out a level, I can look at a spreadsheet for all 50 listings with my deciding factors all laid out for easy comparison. So, I can quickly eyeball the distribution for total price, get a sense of how quick the walks are for each of the listings, how many positive Wi-Fi reviews there are.
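Continuing the same hypothetical sketch, that spreadsheet view is basically the extraction above run over every listing and dumped into one table. The `zoom_out_listing` helper and the column names are still the made-up ones from before.

```python
# Sketch: run the same extraction over every listing and compare them in one table.
import pandas as pd

listing_texts = []  # fill with the raw page text for the ~50 listings you're considering
rows = [zoom_out_listing(text) for text in listing_texts]

table = pd.DataFrame(rows)
table["positive_wifi_reviews"] = table["wifi_reviews"].apply(
    lambda reviews: sum(r.get("sentiment") == "positive" for r in reviews)
)
print(table[["total_price_usd", "walk_minutes_to_hotel_nikko", "positive_wifi_reviews"]]
      .sort_values("total_price_usd"))
```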
Importantly, I can still take action on this level. So, if I see a listing that's a clear winner, I can go ahead and book it right here instead of going back to Airbnb or Hotels.com. But sometimes the decision isn't so clear cut or it's more multifaceted than having the cheapest or the closest listing.
So, if I zoom out another level, each listing has been abstracted into a circle on a scatter plot. And these are colored by the Wi-Fi reviews. You can see the cheapest listings on the left of this plot with the most expensive ones on the right and the closest ones to the hotel near the bottom.
And I can pretty quickly see that there's this cluster of listings that are the cheapest and the closest and they also have good Wi-Fi. But I just realized my flight gets in at 9:00 a.m. But thankfully, I can still initiate actions from this view. So, I can circle these, send a message to all the listings within this cluster, ask them about their policy on early check-ins.
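And here's the sketch for that last level, again entirely hypothetical: arbitrary thresholds stand in for the circle I'd draw, and the `message_host` action is a made-up placeholder for whatever the system would do on my behalf.

```python
# Sketch: one more zoom level; each listing becomes a point, colored by Wi-Fi reviews,
# and a selected cluster can still kick off an action.
import matplotlib.pyplot as plt

plt.scatter(
    table["total_price_usd"],
    table["walk_minutes_to_hotel_nikko"],
    c=table["positive_wifi_reviews"],
    cmap="RdYlGn",
)
plt.xlabel("total price (USD)")
plt.ylabel("walk to the venue (minutes)")
plt.colorbar(label="positive Wi-Fi reviews")
plt.show()

# Hypothetical action: message every host in the cheap-and-close cluster.
# The thresholds stand in for circling the cluster by hand; message_host is made up.
cluster = table[(table["total_price_usd"] < 900) & (table["walk_minutes_to_hotel_nikko"] < 15)]
for _, listing in cluster.iterrows():
    message_host(listing, "Hi! My flight lands at 9:00 a.m. Is an 11:00 a.m. early check-in possible?")
```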
And whichever one responds first saying I can check in at 11:00 a.m., I'm going to go ahead and book. So, as we saw, there are so many tasks that are best served by a specific zoom level. And what we're currently doing is manually abstracting that information in our heads.
So, in this example, digging through 50 different Airbnb or hotel listings, we're keeping all of the previous ones in our heads to try to find the best one. And this takes a lot of mental energy. I know I titled my talk Climbing the Ladder of Abstraction. That was partially to not rip off Bret Victor, who has a talk titled Up and Down the Ladder of Abstraction.
It's a great talk. But I'm not trying to argue that higher levels are better. Instead, what I'm trying to argue is that we can use AI to generate these different levels, glue them together, and make it easy to move between them. And I think this could completely change the way that we work with information.
So, this is one of the many great explorations we're doing at Adept to make all computer work easier. We're going to have a lot more to share in the near future. Stay tuned. And then, to sum up, there's three things that I would love for you to take away from this talk.
The first is that augmenting tasks is going to look a lot like automating their smaller, tedious parts. No one thinks of spreadsheets as taking people's jobs, and the digital spreadsheet is exactly the kind of innovation that I want to see in the next few years. Secondly, we often think about information at different levels of abstraction, so let's make this easier by using AI to generate and act on those different levels.
And then lastly, this is the kind of thinking we're doing at Adept. Feel free to follow us or follow along and check in; we're at Adept.ai. All right. Thanks for listening.