The following is a conversation with Boris Sofman, who is the Senior Director of Engineering and Head of Trucking at Waymo, the autonomous vehicle company, formerly the Google Self-Driving Car Project. Before that, Boris was the co-founder and CEO of Anki, a robotics company that created Cosmo, which in my opinion, is one of the most incredible social robots ever built.
It's a toy robot, but one with an emotional intelligence that creates a fun and engaging human-robot interaction. It was truly sad for me to see Anki shut down when it did. I had high hopes for those little robots. We talk about this story and the future of autonomous trucks, vehicles, and robotics in general.
I spoke with Steve Viselli recently on episode 237 about the human side of trucking. This episode looks more at the robotics side. This is the Lex Friedman Podcast. To support it, please check out our sponsors in the description. And now, here's my conversation with Boris Sofman. Who is your favorite robot in science fiction, books or movies?
- WALL-E and R2-D2, where they were able to convey such an incredible degree of intent, emotion, and kind of character attachment without having any language whatsoever. And just purely through the richness of emotional interaction. So those were fantastic. And then the Terminator series, just like really pretty wide range, right?
But I kind of love this dynamic where you have this like incredible Terminator itself that Arnold played. And then he was kind of like the inferior, like previous generation version that was like totally outmatched, you know, in terms of kind of specs by the new one, but you know, still kind of like held his own.
And so it was kind of interesting where you realize how many levels there are on the spectrum from human to kind of potentials in AI and robotics to futures. And so, yeah, that movie really, as much as it was like kind of a direct world in a way, was actually quite fascinating, gets the imagination going.
- Well, from an engineering perspective, both the movies you mentioned, WALL-E and Terminator, the first one, is probably achievable. You know, humanoid robot, maybe not with like the realism in terms of skin and so on, but that humanoid form, we have that humanoid form. It seems like a compelling form.
Maybe the challenge is it's super expensive to build, but you can imagine, maybe not a machine of war, but you can imagine Terminator type robots walking around. And then the same, obviously, with WALL-E, you've basically, so for people who don't know, you created the company Anki that created a small robot with a big personality called Cosmo that just does exactly what WALL-E does, which is somehow with very few basic visual tools is able to communicate a depth of emotion.
And that's fascinating. But then again, the humanoid form is super compelling. So like Cosmo is very distant from a humanoid form. And then the Terminator has a humanoid form. And you can imagine both of those actually being in our society. - It's true. And it's interesting because it was very intentional to go really far away from human form when you think about a character like Cosmo or like WALL-E, where you can completely rethink the constraints you put on that character, what tools you leverage, and then how you actually create a personality and a level of intelligence interactivity that actually matches the constraints that you're under, whether it's mechanical or sensors or AI of the day.
- This is why I was always really surprised by how much energy people put towards trying to replicate human form in a robot, because you actually take on some pretty significant kind of constraints and downsides when you do that. The first of which is obviously the cost, where it's just the articulation of a human body is just so magical in both the precision as well as the dimensionality, that to replicate that even in its reasonably close form takes like a giant amount of joints and actuators and motion and sensors and encoders and so forth.
But then you're almost like setting an expectation that the closer you try to get to human form, the more you expect the strengths to match. And that's not the way AI works, is there's places where you're way stronger and there's places where you're weaker. And by moving away from human form, you can actually change the rules and embrace your strengths and bypass your weaknesses.
- And at the same time, the human form has way too many degrees of freedom to play with. It's kind of counterintuitive, just as you're saying, but when you have fewer constraints, it's almost harder to master the communication of emotion. Like you see this with cartoons, like stick figures, you can communicate quite a lot with just very minimal, like two dots for eyes and a line for a smile.
I think you can almost communicate arbitrary levels of emotion with just two dots and a line. And like, that's enough. And if you focus on just that, you can communicate the full range. And then you, like, if you do that, then you can focus on the actual magic of human and dot line interaction versus all the engineering mess.
- That's right. Like dimensionality, voice, all these sorts of things, they actually become a crutch where you get lost in a search space almost. And so some of the best animators that we've worked with, they almost like study when they come up, kind of in building their expertise by forcing these projects where all you have is like a ball that can like kind of jump and manipulate itself or like really, really like aggressive constraints where you're forced to kind of extract the deepest level of emotion.
And so in a lot of ways, you know, when we thought about Cosmo, I was like, you're right. If we had to like describe it in like one small phrase, it was bringing a Pixar character to life in the real world. And so it's what we were going for.
And in a lot of ways, what was interesting is that with like WALL-E, which we studied incredibly deeply, and in fact, some of our team were, you know, kind of had worked previously at Pixar and on that project, they intentionally constrained WALL-E as well, even though in an animated film, you could do whatever you wanted to because it forced you to like really saturate the smaller amount of dimensions.
But you sometimes end up getting a far more beautiful output because you're pushing at the extremes of this emotional space in a way that you just wouldn't because you get lost in a surface area if you have like something that is just infinitely articulable. - So if we backtrack a little bit, you thought of Cosmo in 2011, in 2013, actually designed and built it.
What is Anki? What is Cosmo? I guess, who is Cosmo? - Who is Cosmo? - What was the vision behind this incredible little robot? - We started Anki back in, like while we were still in graduate school. So myself and my two co-founders, we were PhD students in the Robotics Institute at Carnegie Mellon.
And so we were studying robotics, AI, machine learning, kind of different areas. One of my co-founders was working on walking robots, you know, for a period of time. And so we all had a bit of a really deep, kind of a deeper passion for applications of robotics and AI where there's like a spectrum where there's people that get like really fascinated by the theory of AI and machine learning robotics, where whether it gets applied in the near future or not is less of a kind of factor on them, but they love the pursuit of the like the challenge.
And that's necessary. And there's a lot of incredible breakthroughs that happen there. We're probably closer to the other end of the spectrum where we love the technology and all the evolution of it, but we were really driven by applications. Like how can you really reinvent experiences and functionality and build value that wouldn't have been possible without these approaches?
And that's what drove us. And we had a kind of some experiences through previous jobs and internships where we like got to see the applied side of robotics. And at that time, there was actually relatively few applications of robotics that were outside of, you know, pure research or industrial applications, military applications and so forth.
There were very few outside of it. So maybe, you know, iRobot was like one exception and then maybe there were a few others, but for the most part, there weren't that many. And so we got excited about consumer applications of robotics where you could leverage way higher levels of intelligence through software to create value and experiences that were just not possible in those fields today.
And we saw kind of a pretty wide range of applications that varied in the complexity of what it would take to actually solve those. And what we wanted to do was to commercialize this into a company, but actually do a bottoms up approach where we could have a huge impact in a space that was ripe to have an impact at that time and then build up off of that and move into other areas.
And entertainment became the place to start because you had relatively little innovation in a toy space, an entertainment space. You had these really rich experiences in video games and movies, but there was like this chasm in between. And so we thought that we could really reinvent that experience. And there was a really fascinating transition technically that was happening at the time where the cost of components was plummeting because of the mobile phone industry and then the smartphone industry.
And so the cost of a microcontroller, of a camera, of a motor, of memory, of microphones, cameras was dropping by orders of magnitude. And then on top of that, with the iPhone coming out in 2000, I think it was 2007, I believe. It started to become apparent within a couple of years that this could become a really incredible interface device and the brain with much more computation behind a physical world experience that wouldn't have been possible previously.
And so we really got excited about that and how we push all the complexity from the physical world into software by using really inexpensive components, but putting huge amounts of complexity into the AI side. And so Cosmo became our second product and then the one that we're probably most proud of.
The idea there was to create a physical character that had enough understanding and awareness of the physical world around it and the context that mattered to feel like he was alive. And to be able to have these emotional connections and experiences with people that you would typically only find inside of a movie.
And the motivation very much was Pixar. We had an incredible respect and appreciation for what they were able to build in this really beautiful fashion and film. But it was always, one, it was virtual, and two, it was like a story on rails that had no interactivity to it.
It was very fixed and it obviously had a magic to it, but where you really start to hit a different level of experiences when you're actually able to physically interact with that robot. - And then that was your idea with Anki, like the first product was the cars. So basically you take a toy, you add intelligence into it in the same way you would add intelligence into AI systems within a video game, but you're now bringing it into the physical space.
So the idea is really brilliant, which is you're basically bringing video games to life. - Exactly, that's exactly right. We literally use that exact same phrase because in the case of "Drive," this was a parallel of the racing genre. And the goal was to effectively have a physical racing experience, but have a virtual state at all times that matches what's happening in the physical world.
And then you can have a video game off of that and you can have different characters, different traits for the cars, weapons and interactions and special abilities and all these sorts of things that you think of virtually, but then you can have it physically. And one of the things that we were really surprised by that really stood out and immediately led us to really accelerate the path towards Cosmo is that things that feel like they're really constrained and simple in the physical world, they have an amplified impact on people where the exact same experience virtually would not have anywhere near the impact, but seeing it physically really stood out.
And so effectively with "Drive," we were creating a video game engine for the physical world. And then with Cosmo, we expanded that video game engine to create a character and kind of an animation and interaction engine on top of it that allowed us to start to create these much more rich experiences.
And a lot of those elements were almost like a proving ground for what would human-robot interaction feel like in a domain that's much more forgiving, where you can make mistakes in a game. It's okay if a car goes off the track or if Cosmo makes a mistake. And what's funny is actually, we were so worried about that.
In reality, we realized very quickly that those mistakes can be endearing. And if you make a mistake, as long as you realize you make a mistake and have the right emotional reaction to it, it builds even more empathy with the character. - That's brilliant. Exactly. So the thing you're optimizing for is fun.
You have so much more freedom to fail, to explore. And also in the toy space, all of this is really brilliant. I got to ask you, backtrack. It seems for a roboticist to take a jump into the direction of fun is a brilliant move. Because one, you have the freedom to explore and to design all those kinds of things.
And you can also build cheap robots. If you're not chasing perfection and toys, it's understood that you can go cheaper, which means in robot, it's still expensive, but it's actually affordable by a large number of people. That's a really brilliant space to explore. - Yeah, that's right. And in fact, we realized pretty quickly that perfection is actually not fun.
Because in a traditional roboticist sense, the first path planner, and this is the part that I worked on out of the gate, was a lot of the AI systems where you have these vehicles and cars racing, making optimal maneuvers to try to get ahead. And you realize very quickly that that's actually not fun because you want the chaos from mistakes.
And so you start to intentionally almost add noise to the system in order to create more of a realism in the exact same way that a human player might start really ineffective and inefficient and then start to increase their quality bar as they progress. And there is a really, really aggressive constraint that's forced on you by being a consumer product where the price point matters a ton, particularly in an entertainment where you can't make a thousand dollar product unless you're gonna meet the expectations of a thousand dollar product.
And so in order to make this work, your cost of goods had to be well under a hundred dollars. In the case of Cosmo, we got it under $50, end to end fully packaged and delivered. - And it was under $200 of the cost of retail. Okay, if we sit down at the early stages, if we go back to that, and you're sitting down and thinking about what Cosmo looks like from a design perspective and from a cost perspective, I imagine that was part of the conversation.
First of all, what came first? Did you have a cost in mind? Is there a target you're trying to chase? Did you have a vision in mind, like size? Did you have, 'cause there's a lot of unique qualities to Cosmo, so for people who don't know, they should definitely check it out.
There's a display, there's eyes on the display and those eyes can, it's pretty low resolution eyes, right? But they still able to convey a lot of emotion. And there's this arm, like that-- - Lift sort of-- - Lift stuff, but there's something about arm movement that adds even more kind of depth.
It's like the face communicates emotion and sadness and disappointment and happiness. And then the arms kind of communicates, I'm trying here. - Yeah. - I'm doing my best in this complicated world. - Exactly, so it's interesting because like, all of Cosmo is only four degrees of freedom and two of them are the two treads, which is for basic movement.
And so you literally have only a head that goes up and down, a lift that goes up and down, and then your two wheels. And you have sound and a screen, and a low resolution screen. And with that, it's actually pretty incredible what you can come up with, where, like you said, it's a really interesting give and take because there's a lot of ideas far beyond that, obviously, as you can imagine, where, like you said, how big is it?
How much degrees of freedom? What does he look like? What does he sound like? How does he communicate? It's a formula that actually scales way beyond entertainment. This is the formula for human kind of robot interface more generally, is you almost have this triangle between the physical aspects of it, the mechanics, the industrial design, what's mass producible, the cost constraints and so forth.
You have the AI side of how do you understand the world around you, interact intelligently with it, execute what you want to execute, so perceive the environment, make intelligent decisions, and move forward. And then you have the character side of it. Most companies that have done anything in human-robot interaction really miss the mark or under-invest in the character side of it.
They over-invest in the mechanical side of it, and then varied results on the AI side of it. And so the thinking is that you put more mechanical flexibility into it, you're gonna do better. You don't necessarily, you actually create a much higher bar for a higher ROI because now your price point goes up, your expectations go up, and if the AI can't meet it or the overall experience isn't there, you miss the mark.
- So who, like how did you, through those conversations, get the cost down so much and make it so simple? Like, there's a big theme here because you come from the mecca of robotics, which is Carnegie Mellon University, robotics. For all the people I've interacted with that come from there or just from the world experts at robotics, they don't, they would never build something like Cosmo.
And so where did that come from, the simplicity? - It came from this combination of a team that we had. It was quite cool because, and by the way, you ask anybody that's experienced in the toy entertainment space, you'll never sell a product over $99. That was fundamentally false, and we believed it to be false.
It was because the experience had to kind of meet the mark. And so we pushed past that amount, but there was a pressure where the higher you go, the more seasonal you become and the tougher it becomes. And so on the cost side, we very quickly partnered up with some previous contacts that we worked with where, just as an example, our head of mechanical engineering was one of the earliest heads of engineering at Logitech and has a billion units of consumer products in circulation that he's worked on.
So like crazy low cost, high volume consumer product experience. We had a really great mechanical engineering team and just a very practical mindset where we were not gonna compromise on feasibility in the market in order to chase something that would be an enabler. And we pushed a huge amount of expectations onto the software team where, yes, we're gonna use cheap, noisy motors and sensors, but we're gonna fix it on the software side.
Then we found on the design and character side, there was a faction that was more from like a game design background that thought that it should be very games-driven Cosmo, where you create a whole bunch of games experiences and it's all about like game mechanics. And then there was a fashion which my co-founder and I are the most involved in this, like really believed in, which was character-driven.
And the argument is that you will never compete with what you can do virtually from a game standpoint, but you actually, on the character side, put this into your wheelhouse and put it more towards your advantage because a physical character has a massively higher impact physically than virtually. - Okay, can I just pause on that?
'Cause this is so brilliant. For people who don't know, Cosmo plays games with you, but there's also a depth of character. And I actually, when I was playing with it, I wondered exactly what is the compelling aspect of this? Because to me, obviously I'm biased, but to me, the character, what I enjoyed most, honestly, or what got me to return to it is the character.
- That's right. - That's a fascinating discussion of, you're right. Ultimately, you cannot compete on the quality of the gaming experience. - It's too restrictive. The physical world is just too restrictive and you don't have a graphics engine, it's like all this. But on the character side, and clearly we moved in that direction as like kind of the winning path.
And we partnered up with this really, we immediately went towards Pixar and Carlos Bena, he was one of, had been at Pixar for nine years. He'd worked on tons of the movies, including "Wall-E" and others, and just immediately kind of spoke the language and it just clicked on how you think about that like kind of magic and drive.
And then we built out a team with him as like a really kind of prominent kind of driver of this with different types of backgrounds and animators and character developers where we put these constraints on the team, but then got them to really try to create magic despite that.
And we converged on this system that was at the overlap of character and the character AI that were, if you imagine the dimensionality of emotions, happy, sad, angry, surprised, confused, scared, like you think of these extreme emotions, we almost like kind of put this challenge to kind of populate this library of responses on how do you show the extreme response that like goes to the extreme spectrum on angry or frustrated or whatever.
And so that gave us a lot of intuition and learnings. And then we started parameterizing them where it wasn't just a fixed recording, but they were parameterized and had randomness to them where you could have infinite permutations of happy and surprised and so forth. And then we had a behavioral engine that took the context from the real world and would interpret it and then create kind of probability mappings on what sort of responses you would have that actually made sense.
And so if Cosmos saw you for the first time in a day, he'd be really surprised and happy in the same way that the first time you walk in and like your toddler sees you, they're so happy, but they're not gonna be that happy for the entirety of your next two hours.
But like you have this like spike in response, or if you leave him alone for too long, he gets bored and starts causing trouble and like nudging things off the table. Or if you beat him in a game, the most enjoyable emotions are him getting frustrated and grumpy to a point where our testers and our customers would be like, I had to let him win because I don't want him to be upset.
And so you start to like create this feedback loop where you see how powerful those emotions are. And just to give you an example, something as simple as eye contact. You don't think about it in a movie, just like it kind of happens like, camera angles and so forth, but that's not really a prominent source of interaction.
What happens when a physical character like Cosmo, when he makes eye contact with you, it built universal kind of connection, kids all the way through adults. And it was truly universal. It was not like people stopped caring after 10, 12 years old. And so we started doing experiments and we found something as simple as increasing the amount of eye contact, like the amount of times in a minute that he'll look over for your approval to like kind of make eye contact, just by I think doubling it.
We increased the play time engagement by 40%. Like you see these sort of like kind of interactions where you build that empathy. And so we studied pets, we studied virtual characters. There's like a lot of times actually dogs are one of the perfect, most perfect influencers behind these sort of interactions.
And what we realized is that the games were not there to entertain you. The games were to create context to bring out the character. And if you think about the types of games that you played, they were relatively simple, but they were always ones to create scenarios of either tension or winning or losing or surprise or whatever the case might be.
And they were purely there to just like create context to where an emotion could feel intelligent and not random. And in the end, it was all about the character. - So yeah, there's so many elements to play with here. So you said dogs. What lessons do we draw from cats who don't seem to give a damn about you?
Is that just another character? Is it another? - It's just another character. And so you could almost like in the early explorations, we thought that it would be really incredible if you had a diversity of characters where you almost help encourage which direction it goes, just like in a role-playing game.
And you had, like, think of like the seven dwarves sort of. And initially we even thought that it would be amazing if like their characters actually help them have strengths and weaknesses in some, like whatever they end up doing. Like some are scared, some are arrogant, some are super warm and like kind of friendly.
And in the end we focused on one because it made it very clear that, hey, we got to build out enough depth here because you're kind of trying to expand. It's almost like how long can you maintain a fiction that this character is alive to where the person's explorations don't hit a boundary, which happens almost immediately with typical toys.
And even with video games, how long can we create that immersive experience to where you expand the boundary? And one of the things we realized is that you're just way more forgiving when something has a personality and it's physical. That is the key that unlocks robotics interacting in the physical world more generally is that when you don't have a personality and you make a mistake as a robot, the stupid robot made a mistake.
Why is it not perfect? When you have a character and you make a mistake, you have empathy and it becomes endearing and you're way more forgiving. And that was the key that was like, I think goes far, far beyond entertainment. - It actually builds the depth of the personality, the mistakes.
So let me ask the movie "Her" question then. How, so "Cosmos" feels like the early days of something that will obviously be prevalent throughout society at a scale that we cannot even imagine. My sense is it seems obvious that these kinds of characters will permeate society and we'll be friends with them.
We'll be interacting with them in different ways. In a way we, I mean, you don't think of it this way, but when you play video games, they're often cold and impersonal, but even then you think about role-playing games, you become friends with certain characters in that game. They don't remember much about you.
They're just telling a story. It's exactly what you're saying. They exist in that virtual world. But if they acknowledge that you exist in this physical world, if the characters in the game remember that you exist, that you, like for me, like Lex, they understand that I'm a human being who has like hopes and dreams and so on.
It seems like there's going to be like billions, if not trillions of Cosmos in the world. So if we look at that future, there's several questions to ask. How intelligent does that future Cosmo need to be to create fulfilling relationships like friendships? - Yeah, it's a great question. And part of it was the recognition that it's going to take time to get there, because it has to be a lot more intelligent, because what's good enough to be a magical experience for an eight-year-old, it's a higher bar to do that, be like a pet in the home, or to help with functional interface in an office environment or in a home or and so forth.
And so, and the idea was that you build on that and you kind of get there. And as technology becomes more prevalent and less expensive and so forth, you can start to kind of work up to it. But you're absolutely right at the end of the day, we almost equated it to how the touchscreen created like this really novel interface to physical kind of devices like this.
This is the extension of it where you have much richer physical interaction in the real world. This is the enabler for it. And it shows itself in a few kind of really obvious places. So just take something as simple as a voice assistant. You will never, most people will never tolerate an Alexa or a Google Home, just starting a conversation proactively when you weren't kind of expecting it, because it feels weird.
It's like you were listening and like, and then now you're kind of, it feels intrusive. But if you had a character, like a cat that touches and gets your attention, or a toddler, like you never think twice about it. And what we found really kind of immediately is that these types of characters like Cosmo and they would like roam around and kind of get your attention.
And we had a future version that was always on kind of called Vector. People were way more forgiving. And so you could initiate interaction in a way that is not acceptable for machines. And in general, you know, there's a lot of ways to customize it, but it makes people who are skeptical of technology much more comfortable with it.
There was like, there were a couple of really, really prominent examples of this. So when we launched in Europe, and so we were in, I think like a dozen countries, if I remember correctly, but like we went pretty aggressively in launching in Germany and France and UK. And we were very worried in Europe because there's obviously like a really, a socially higher bar for privacy and security where you've heard about how many companies have had troubles on things that might've been okay in the US, but like are just not okay in Germany and France in particular.
And so we were worried about this because you have, you know, Cosmo, who's, you know, in a future product Vector, like where you have cameras, you have microphones, it's kind of connected and like you're playing with kids and like in these experiences, and you're like, this is like ripe to be like a nightmare if you're not careful.
And the journalists are like notoriously like really, really tough on these sorts of things. We were shocked and we prepared so much for what we would have to encounter. We were shocked in that not once from any journalists or customer did we have any complaints beyond like a really casual kind of question.
And it was because of the character where when the conversation came up, it was almost like, well, of course he has to see and hear how else is he gonna be alive and interacting with you? And it completely disarmed this like fear of technology that enabled this interaction to be much more fluid.
And again, like entertainment was a proving ground, but that is like, you know, there's like ingredients there that carry over to a lot of other elements down the road. - That's hilarious that we're a lot less concerned about privacy if the thing is value and charisma. I mean, that's true for all of human to human interactions.
- It's an understanding of intent where like, well, he's looking at me, he can see me. If he's not looking at me, he can't see me. Right, so it's almost like you're communicating intent and with that intent, people were like kind of a more understanding and calmer. And it's interesting, it was just the earliest kind of version of starting to experiment with this, but it wasn't an enabler.
And then you have like completely different dimensions where like, you know, kids with autism had like an incredible connection with Cosmo that just went beyond anything we'd ever seen. And we have like these just letters that we would receive from parents. And we had some research projects kind of going on with some universities on studying this.
But there's an interesting dimension there that got unlocked that just hadn't existed before that has these really interesting kind of links into society and a potential building block of future experience. - So if you look out into the future, do you think we will have beyond a particular game, you know, a companion like Her, like the movie Her, or like a Cosmo that's kind of asks you how your day went too, right?
You know, like a friend. How many years away from that do you think we are? What's your intuition? - Good question. So I think the idea of a different type of character, like more closer to like kind of a pet style companionship, it will come way faster. And there's a few reasons.
One is like to do something like in Her, that's like effectively almost general AI. And the bar is so high that if you miss it by a bit, you hit the uncanny valley where it just becomes creepy and like, and not appealing. Because the closer you try to get to a human in form and interface and voice, the harder it becomes.
Whereas you have way more flexibility on still landing a really great experience if you embrace the idea of a character. And that's why, one of the other reasons why we didn't have a voice, and also why like a lot of video game characters, like Sims, for example, does not have a voice when you think about it.
It wasn't just a cost savings like for them. It was actually for all of these purposes, it was because when you have a voice, you immediately narrow down the appeal to some particular demographic or age range or kind of style or gender. If you don't have a voice, people interpret what they want to interpret.
And an eight year old might get a very different interpretation than a 40 year old, but you create a dynamic range. And so you just, you can lean into these advantages much more in something that doesn't resemble a human. And so that'll come faster. I don't know when a human like, that's just still like Matt, just complete R&D at this point.
The chat interfaces are getting way more interesting and richer, but it's still a long way to go to kind of pass the test of, you know. - Well, let me, like, let's consider like, let me play devil's advocate. So Google is a very large company that's servicing, it's creating a very compelling product that wants to provide a service to a lot of people.
But let's go outside of that. You said characters. - Yeah. - It feels like, and you also said that it requires general intelligence to be a successful participant in a relationship, which could explain why I'm single. This is very, but the, I honestly want to push back on that a little bit because I feel like, is it possible that if you're just good at playing a character, in a movie, there's a bunch of characters.
If you just understand what creates compelling characters, and then you just are that character, and you exist in the world, and other people find you, and they connect with you, just like you do when you talk to somebody at a bar. I like this character, this character's kind of shady, I don't like them.
You pick the ones that you like, and, you know, maybe it's somebody that's, reminds you of your father or mother, I don't know what it is, but the Freudian thing, but there's some kind of connection that happens, and that's the cosmo you connect to. That's the future cosmo you connect, and that's, so I guess the statement I'm trying to make, is it possible to achieve a depth of friendship without solving general intelligence?
- I think so, and it's about intelligent kind of constraints, right? And just, you set expectations and constraints, such that in the space that's left, you can be successful. And so, you can do that by having a very focused domain that you can operate in. For example, you're a customer support agent for a particular product, and you create intelligence and a good interface around that.
Or, you know, kind of in the personal companionship side, you can't be everything to, across the board, you kind of solve those constraints. And I think it's possible. My worry is, right now, I don't see anybody that has picked up on where kind of cosmo left off, and is pushing on it in the same way.
And so, I don't know if it's the sort of thing where, similar to like how, you know, in dot-com, there were all these concepts that we considered, like, you know, that didn't work out, or like, failed, or like, were too early or whatnot, and then 20 years later, you have these, like, incredible successes on almost the same concept.
Like, it might be that sort of thing where, like, there's another pass at it that happens in five years or in 10 years. But it does feel like that appreciation of that, like, the three-legged stool, if you will, between, like, you know, the hardware, the AI, and the character, that balance, it's hard to, I'm not aware of anywhere right now where, like, that same kind of aggressive drive with the value on the character is happening.
And so- - To me, just a prediction, exactly as you said, something that looks awfully a lot like cosmo, not in the actual physical form, but in the three-legged stool, something like that, in some number of years, will be a trillion-dollar company. I don't understand, like, it's obvious to me that, like, character, not just as robotic companions, but in all our computers, they'll be there.
It's like, Clippy was, like, two legs of that stool or something like that. I mean, those are all different attempts. What's really confusing to me is they're born, these attempts, and everybody gets excited, and for some reason, they die, and then nobody else tries to pick it up. And then maybe a few years later, a crazy guy like you comes around with just enough brilliance and vision to create this thing and is born, a lot of people love it.
A lot of people get excited, but maybe the timing is not right yet. And then when the timing is right, it just blows up. It just keeps blowing up more and more until it just blows up, and I guess everything in the full span of human civilization collapses eventually.
- And that wouldn't surprise me at all. And, like, what's gonna be different in another five years or 10 years or whatnot? Physical component costs will continue to come down in price, and mobile devices and computation's gonna become more and more prevalent, as well as cloud as a big tool to offload cost.
AI is gonna be a massive transformation compared to what we dealt with, where everything from voice understanding to just kind of a broader contextual understanding and mapping of semantics and understanding scenes and so forth. And then the character side will continue to kind of progress as well, 'cause that magic does exist, it just exists in different forms.
And you have just the brilliance of the tapping and animation and these other areas where that was a big unlock in film, obviously. And so I think, yeah, the pieces can reconnect and the building blocks are actually gonna be way more impressive than they were five years ago. - So in 2019, Anki, the company that created Cosmo, the company that you started, had to shut down.
How did you feel at that time? - Yeah, it was tough. That was a really emotional stretch and it was a really tough year. Like about a year ahead of that was actually a pretty brutal stretch because we were kind of life or death on many, many moments, just navigating these insane kind of just ups and downs and barriers.
And the thing that made it, just rewinding a tiny bit, like what ended up being really challenging about it as a business where is, from a commercial standpoint and customer reception standpoint, there was a lot of things you could point to that were like pretty big successes. So millions of units, got to like pretty serious revenue, like kind of close to a hundred million annual revenue, number one kind of product in kind of various categories.
But it was pretty expensive, ended up being very seasonal where something like 85% of our volume was in Q4 because it was a present and it was expensive to market it and explain it and so forth. And even though the volume was like really sizable and like the reviews are really fantastic, forecasting and planning for it and managing the cash operations was just brutal.
Like it was absolutely brutal. You don't think about this when you're starting a company or when you have a few million in revenue because it's just your biggest costs are kind of just your head count and operations and everything's ahead of you. But we got to a point where, if you look at the entire year, you have to operate your company, pay all the people and so forth.
You have to pay for the manufacturing, the marketing and everything else to do your sales in mostly November, December and then get paid in December, January by retailers. And those swings were really rough and just made it like so difficult because the more successfully became, the more wild those swings became because you'd have to like spend, tens of millions of dollars on inventory, tens of millions of dollars on marketing and tens of millions of dollars on payroll and everything else.
- And then the bigger dip and then you're waiting for the key four. - Yeah, and it's not a business that like is recurring kind of month to month and predictable. And it's just, and then you're walking in your forecast in July, maybe August, if you're lucky. And it's also like very hit driven and seasonal where like you don't have the sort of continued kind of slow growth like you do in some other consumer electronics industries.
And so before then like hardware kind of like went out of favor too. And so you had Fitbit and GoPro drop from 10 billion revenue to 1 billion revenue and hardware companies are getting valued at like 1X revenue oftentimes, which is tough, right? And so we effectively kind of got caught in the middle where we were trying to quickly evolve out of entertainment and move into some other categories, but you can't let go of that business because like that's what you're valued on, that's what you're raising money on.
But there was no path to kind of pure profitability just there because it was such, specific type of price points and so forth. And so we tried really hard to make that transition. And yeah, we had a financing round that fell apart at the last second. And effectively there was just no path to kind of get through that and get to the next kind of like holiday season.
And so we ended up selling some of the assets and kind of winding down the company. It was brutal. Like I was very transparent with the company like in the team while we were going through it, where actually despite how challenging that period was, very few people left. I mean, like people love the vision, the team, the culture of the like kind of chemistry and kind of what we were doing.
There was just a huge amount of pride there. And then we wanted to see it through. And we felt like we had a shot to kind of get through these checkpoints. We ended up, and I mean by brutal, I mean like literally like days of cash, like three, four different times runway, like in the year, you know, kind of before it, where you're like playing games of chicken on negotiating credit line timelines and like repayment terms and how to get like a bridge loan from an investor.
It's just like level of stress that like as hard as things might be anywhere else, like you'll never come close to that where you feel that like responsibility for 200 plus people, right? And so we were very transparent during our fundraise on who we're talking to, the challenges that we have, how it's going and when things are going well, when things were tough.
And so it wasn't a complete shock when it happened, but it was just very emotional where like, you know, when we announced it finally, that like, you know, we basically were just like watching kind of like, you know, the runway and trying to kind of time it. And when we realized that like, we didn't have any more outs, we wanted to like kind of wind it down, make sure that it was like clean and, you know, we could like kind of take care of people the best we could, but they like broke down crying at all hands and somebody else had to step in for a bit.
And like, it was just very, very emotional, but the beautiful part is like afterwards, like everybody stayed at the office till like two, three in the morning, just like drinking and hanging out and telling stories and celebrating. And it was just like one of the best, for many people it was like the best kind of work experience that they had and there was a lot of pride in what we did.
And it wasn't anything obvious we could point to that like, hey, if only we had done that different, things would have been completely different. It was just like the physics didn't line up. And, but the experience was pretty incredible, but it was hard. Like it was, it had this feeling that there was just like incredible beauty in both the technology and products and the team that, you know, there's a lot there that like in the, you know, right context could have been pretty incredible, but it was emotional just.
- Yeah, just thinking, I mean, just looking at this company, like you said, the product and technology, but the vision, the implementation, you got the cost down very low. And the compelling, the nature of the product was great. So many robotics companies failed at this, at the robot was too expensive.
It didn't have the personality. It didn't really provide any value, like a sufficient value to justify the price. So like you succeeded where basically every single other robotics company, or most of them, they're like go in the category of social robotics have kind of failed. And I mean, it's quite tragic.
I remember reading that, I'm not sure if I talked to you before that happened or not, but I remember, you know, I'm distant from this. I remember being heartbroken reading that because like if Cosmo is not gonna succeed, what is going to succeed? 'Cause that to me was incredible.
Like it was an incredible idea. Cost is down, it's just like the most minimal design in physical form that you could do. It's really compelling, the balance of games. So it's a fun toy. It's a great gift for all kinds of age groups, right? It's just, it's compelling in every single way.
And it seemed like it was a huge success and it failing was, I don't know, there was heartbreak on many levels for me just as an external observer is I was thinking, how hard is it to run a business? That's what I was thinking. Like if this failed, this must have failed because it's obviously not like, yeah, it's business.
Maybe it's some aspect of the manufacturing and so on, but I'm now realizing it's also not just that, it's sales, marketing, all those. - It's everything, right? Like how do you explain something that's like a new category to people that like have all these predispositions? And so like, it had some of the hardest elements of, if you were to pick a business, it had some of the hardest customer dynamics because like to sell a $150 product, you got to convince both the child that wanted and the parents to agree that it's valuable.
So you're having like this dual prong marketing challenge. You have manufacturing, you have like really high precision on the components that you need, you have the AI challenges. So there were a lot of tough elements, but it was this feeling where like, it was just really great alignment of unique strength across kind of like all these different areas, just incredible like, you know, kind of character and animation team between this, like Carlos and it was like a character director, Dave that came on board and like, really great people there, the AI side, the manufacturing, the, you know, where like never missing a launch, right?
And actually, you know, he kind of hit that quality. It was, yeah, it was heartbreaking, but here's one neat thing is like, we had so much like fan mail from kind of kids, parents, like I actually like, there was a bunch that collected in the end that I actually saved and like, I never, it was too emotional to open it and I still haven't opened it.
And so I actually have this giant envelope of like a stack this much of like letters from, you know, kids and families, just like every, you got a permutation you can imagine. And so planning to kind of, I don't know, maybe like a five year, you know, five year to some year reunion, just inviting everybody over and we'll just like kind of dig into it and kind of bring back some memories, but you know, good impact.
- And well, I think there will be companies, maybe Waymo and Google will be somehow involved that will carry this flag forward and will make you proud, whether you're involved or not. I think this is one of the greatest robotics companies in the history of robotics. So you should be proud.
It's still tragic to know that, you know, 'cause you read all the stories of Apple and let's see, SpaceX and like companies that were just on the verge of failure several times through that story. And they just, it's almost like a roll of the dice, they succeeded. And here's a roll of the dice that just happened to go.
- And that's the appreciation that like, when you really like talk to a lot of the founders, like everybody goes through those moments and sometimes it really is a matter of like, you know, timing, a little bit of luck, like some things are just out of your control and you get a much deeper appreciation for just the dimensionality of that challenge.
But the great thing is, is that like a lot of the team actually like stayed together. And so there were actually, you know, a couple of companies that we kind of kept big chunks of the team together and we actually kind of helped align this, you know, to help people out as well.
And one of them was Waymo where a majority of the AI and robotics team actually had the exact background that you would look for in like kind of AV space. And it was a space that a lot of us like, you know, worked on in grad school, we're always passionate about and ended up, you know, maybe the time, you know, serendipitous timings from another perspective where like kind of landed in a really unique circumstance that's actually been quite exciting too.
- So it's interesting to ask you just your thoughts. Cosmos still lives on under Dreamlabs, I think. Is that, are you tracking the progress there or is it too much pain? Is that something that you're excited to see where that goes? - So keeping an eye on it, of course, just out of curiosity and obviously just kind of care for product line, I think it's deceptive how complex it is to manufacture and evolve that product line.
And the amount of experiences that are required to complete the picture and be able to move that forward. And I think that's gonna make it pretty hard to do something really substantial with it. It would be cool if like even the product in the way it was, was able to be manufactured.
- Yes. - Again, that would-- - Which is the current goal, I suppose. - Yeah, which would be neat. But I think it's deceptive how tricky that is on like everything from the quality control, the details, and then like technology changes that forces you to, Rick, reinvent and update certain things.
So I haven't been super close to it, but just kind of keeping an eye on it. - Yeah, it's really interesting how, it's deceptively difficult, just as you're saying. For example, those same folks, and I've spoken with them, they're, they partnered up with Rick and Morty creators to do the Butter Robot.
- Yeah. - I love the idea. I just recently, I'd kind of half-assed watched Rick and Morty previously, but now I just watched like the first season. It's such a brilliant show. I like, I did not understand how brilliant that show is. And obviously I think in season one is where the Butter Robot comes along for just a few minutes or whatever.
But I just fell in love with the Butter Robot. The sort of the, that particular character, just like you said, there's characters, you can create personalities, you can create, and that particular robot who's doing a particular task realizes, you know, this, like realizes, ask the existential question, the myth of Sisyphus question that Camus writes about.
It's like, is this all there is? Is he, moves butter? But you know, that realization, that's a beautiful little realization for a robot that my purpose is very limited to this particular task. It's humor, of course, it's darkness, it's a beautiful mix. But so they wanna release that Butter Robot, but something tells me that to do the same depth of personality as Cosmo had, the same richness, it would be on the manufacturing, on the AI, on the storytelling, on the design, it's going to be very, very difficult.
It could be a cool sort of toy for Rick and Morty fans, but to create the same depth of existential angst that the Butter Robot symbolizes is really, that's the brave effort you've succeeded at with Cosmo, but it's not easy, it's really difficult. - You can fail in almost any one of the kind of dimensions and like, and yeah, it takes, you know, yeah, you need convergence of a lot of different skill sets to try to pull that off, yeah.
On this topic, let me ask you for some advice, because as I've been watching Rick and Morty, as I told myself, I have to build the Butter Robot, just as a hobby project. And so I got a nice platform for it with treads and there's a camera that moves up and down and so on.
I'll probably paint it, but the question I'd like to ask, there's obvious technical questions I'm fine with, communication, the personality, storytelling, all those kinds of things. I think I understand the process of that, but how do you know when you got it right? So with Cosmo, how did you know this is great?
Like, or something is off, like, is this brainstorming with the team? Do you know it when you see it? Is it like love at first sight? It's like, this is right. Or like, I guess if we think of it as an optimization space, is there uncanny valley where you're like, this is not right, or this is right, or are a lot of characters right?
- Yeah, we stayed away from uncanny valley just by having such a different mapping where it didn't try to look like a dog or a human or anything like that. And so you avoided having like a weird pseudo similarity, but not quite hitting the mark. But you could like just fall flat where just like a personality or a character emotion just didn't feel right.
And so it actually mirrored very closely to kind of the iterations that a character director at Pixar would have, where you're running through it and you can virtually kind of like see what it'll look like. We created a plugin to where, we actually used like Maya, the simulation, you know, the animation tools.
And then we created a plugin that perfectly matched it to the physical one. And so you could like test it out virtually and then push a button and see it physically play out. And there's like subtle differences. And so you want to like make sure that that feedback loop is super easy to be able to test it live.
And then sometimes like you would just feel it that it's right and intuitively know. And then you'd also do, we did user testing, but it was very, very often that like the, like if we found it magical, it would scale and be magical more broadly. There were not too many cases where like, like we were pretty decent about not like getting to it, you know, geeking out or getting too attached to something that was super unique to us, but trying to kind of like, you know, put a customer hat on and does it truly kind of feel magical?
And so in a lot of ways, we just give a lot of autonomy to the character team to really think about the, you know, character board and mood boards and storyboards and like what's the background of this character and how would they react? And they went through a process that's actually pretty familiar, but now had to operate under these unique constraints.
But the moment where it felt right kind of took a fairly similar journey than like as a character in an animated film, actually. - That's quite cool. - Well, the thing that's really important to me, and I wonder if it's possible, well, I hope it's possible, pretty sure it's possible, is for me, even though I know how it works, to make sure there's sufficient randomness in the process, probably because it would be machine learning based, that I'm surprised that I don't, I'm surprised by certain reactions, I'm surprised by certain communication.
Maybe that's in a form of a question, were you surprised by certain things Cosmo did, like certain interactions? - Yeah, we made it intentionally, so that there would be some surprise than like a decent amount of variability in how he'd respond in certain circumstances. And so in the end, like it's, this isn't generally AI, this is a giant like spectrum and library of like parameterized kind of emotional responses and an emotional engine that would like kind of map your current state of the game, your emotions, the world, the people are playing with you all so forth to what's happening.
But we could make it feel spontaneous by creating enough diversity and randomness, but still within the bounds of what felt like very realistic to make that work. And then what was really neat is that we could get statistics on how much of that space we were saturating, and then add more animations and more diversity in the places that would get hit more often, so that you stay ahead of the curve and maximize the chance that it stays feeling alive.
And so, but then when you like combine it, like the permutations and kind of like the combinations of emotions stitched together, sometimes surprised us because you see them in isolation, but when you actually see them and you see them live, relative to some event that happened in the game or whatnot, like it was kind of cool to see the combination of the two.
And not too different than other robotics applications where like you get so used to thinking about like the modules of a system and how things progress through a tech stack, that the real magic is when all the pieces come together and you start getting the right emergent behavior in a way that's easy to lose when you just kind of go too deep into any one piece of it.
- Yeah, when the system is sufficiently complex, there is something like emergent behavior and that's where the magic is. As a human being, you could still appreciate the beauty of that magic at the system level. First of all, thank you for humoring me on this. It's really, really fascinating.
I think a lot of people would love this. I'd love to just, one last thing on the butter robot, I promise. (laughs) In terms of speech, Cosmo is able to communicate so much with just movement and face. Do you think speech is too much of a degree of freedom?
Like is speech a feature or a bug of deep interaction? Or emotional interaction? - Yeah, for a product, it's too deep right now. It's just not real. You would immediately break the fiction because the state of the art is just not good enough. And that's on top of just narrowing down the demographic where like the way you speak to an adult versus the way you speak to a child is very different.
Yet a dog is able to appeal to everybody. And so right now there is no speech system that is like rich enough and subtly realistic enough to feel appropriate. And so we very, very quickly kind of like moved away from it. Now, speech understanding is a different matter where understanding intent, that's a really valuable input.
But giving it back requires like a way, way higher bar given kind of where today's world is. And so that realization that you can do surprisingly much with either no speech or kind of tonal like the way, you know, Wally R2-D2 and kind of other characters are able to.
It's quite powerful and it generalizes across cultures and across ages really, really well. I think we're gonna be in that world for a little while where it's still very much an unsolved problem on how to like make something. It touches on the Uncanny Valley thing. So if you have legs and you're a big humanoid looking thing, you have very different expectations and a much narrower degree of what's gonna be acceptable by society than if you're a robot like Cosmo or Wally and you can, or some other form where you can kind of like reinvent the character.
Speech has that same property where speech is so well understood in terms of expectations by humans that you have far less flexibility on how to deviate from that and lean into your strengths and avoid weaknesses. - But I wonder if there is, obviously there's certain kinds of speech that activates the Uncanny Valley and breaks the illusion faster.
So I guess my intuition is we will solve certain, we would be able to create some speech-based personalities sooner than others. So for example, I could think of a robot that doesn't know English and is learning English, right? Those kinds of personalities. - A fiction where you're like, you're intentionally kind of like getting a toddler level of speech.
So that's exactly right. So you can't have like tie it into the experience where it is a more limited character or you embrace the lack of emotions as part, or the lack of, sorry, dynamic range in the speech kind of capabilities, emotions as like part of the character itself.
And you've seen that in like kind of fictional characters as well, but- - That's why this podcast works. - And yeah, and like, and you kind of had that with like, I don't know, I guess like data and some of the other, you know, what's it like. - Yeah, exactly.
- But yeah, so you have to, and that becomes a constraint that lets you meet the bar. - See, I honestly think like also if you add drunk and angry, that gives you more constraints that allow you to be a dumber from an NLP perspective. Like there's certain aspects.
So if you modify human behavior, like, so forget the sort of artificial thing where you don't know English, toddler thing. We, if you just look at the full range of humans, I think we, there's certain situations where we put up with like lower level of intelligence in our communication.
Like if somebody is drunk, we understand the situation, that they're probably under the influence, like we understand that they're not going to be making any sense. Anger is another one like that. I'm sure there's a lot of other kind of situations. - Yeah. - So yeah, again, language, loss in translation, that kind of stuff that I think if you play with that, what is it, the Ukrainian boy that passed the Turing test, you know, play with those ideas.
I think that's really interesting. And then you can create compelling characters, but you're right, that's a dangerous sort of road to walk because you're adding degrees of freedom that can get you in trouble. - Yeah, and that's why like you have these big pushes that like for most of the last decade plus, like where you'd have like full like human replicas of robots, where I'd be down to like skin and like kind of in some places.
My personal feeling is like, man, like that's not the direction that's most fruitful right now. - Beautiful art. - Yeah. - It's not in terms of a rich, deep, fulfilling experience. Yeah, you're right. - Yeah, and creating a minefield of potential places to feel off and then you're sidestepping where like the biggest kind of functional AI challenges are to actually have, you know, kind of like really rich productivity that actually kind of justifies, you know, kind of the higher price points.
And that's part of the challenges is like, yeah, like robots are gonna get to like thousands of dollars, tens of thousands of dollars and so forth, but you can imagine what sort of expectation of value that comes with it. And so that's where you wanna be able to invest the time and depth.
And so going down the full human replica route creates a gigantic distraction and really, really high bar that can end up sucking up so much of your resources. - So it's weird to say, but you happen to be one of the greatest at this point roboticists ever because you created this little guy, you were part obviously of a great team that created the little guy with a deep personality and are now switching to an entirely, well, maybe not entirely, but a different, fascinating, impactful robotics problem, which is autonomous driving and more specifically, the biggest version of autonomous driving, which is autonomous trucking.
So you are at Waymo now. Can you give us a big picture overview? What is Waymo? What is Waymo driver? What is Waymo One? What is Waymo Via? Can you give an overview of the company and the vision behind the company? - For sure. Waymo, by the way, it's just, it's been eyeopening on just how incredible that the people and the talent is and how in one company you almost have to create, I don't know, 30 companies worth of like technology and capability to like kind of solve the full spectrum of it.
So yeah, so I've been at Waymo since 2019, so about two and a half years. So Waymo is focused on building what we call a driver, which is creating the ability to have autonomous driving across different environments, vehicle platforms, domains and use cases. You know, as you know, it got started in 2009.
It was a lot, almost like an immediate successor to the Grand Challenge and Urban Challenges that were like incredible kind of catalysts for this whole space. And so Google started this project and then eventually Waymo spun out and so what Waymo is doing is creating the systems, both hardware, software, infrastructure, everything that goes into it to enable and to commercialize autonomous driving.
This hits on consumer transportation and ride sharing and kind of vehicles and urban environments. And as you mentioned, it hits on autonomous trucking to transport goods. So in a lot of ways, it's transporting people and transporting goods, but at the end of the day, the underlying capabilities that are required to do that are surprisingly better aligned than one might expect, where it's the fundamentals of being able to understand the world around you, process it, make intelligent decisions and prove that we are at a level of safety that enables large-scale autonomy.
- So from a branding perspective, sort of Waymo driver is the system that's irrespective of a particular vehicle it's operating in. You have a set of sensors that perceive the world, can act in that world and move this, whatever the vehicle is through the world. - That's right. And so in the same way that you have a driver's license and like your ability to drive isn't tied to a particular make and model of a car.
And of course there's special licenses for other types of vehicles, but the fundamentals of a human driver very, very largely carry over. And then there's uniquenesses related to a particular environment or domain or a particular vehicle type that kind of add some extra additive challenges. But that's exactly right.
It's the underlying systems that enable a physical vehicle without a human driver to very successfully accomplish the tasks that previously wasn't possible without a hundred percent human driving. - And then there's Waymo One, which is the transporting people from a brand perspective. And just in case we refer to it so people know, and then there's Waymo Via, which is the trucking component.
Why Via, by the way? What is that? What is that? Is it just like a cool sounding name that just, like, is there an interesting story there? Just, it is a pretty cool sounding name. - It's a cool sounding name. I mean, when you think about it, it's just like, well, we're gonna transport it via this and that.
So it's just kind of like an allusion to the mechanics of transporting something. And it is a pretty good grouping. And the interesting thing is that even the groupings kind of blur where Waymo One is like human transportation and there's a fully autonomous service in the Phoenix area that like every day is transporting people.
And it's pretty incredible to like, just, you know, see that operate at reasonably large scale and just kind of happen. And then on the Via side, it doesn't even have to be, like long haul trucking is a, like a major focus of ours, but down the road, you can stitch together the vehicle transportation as well for local delivery.
Also, in a lot of this requirements for local delivery overlap very heavily with consumer transportation. Obviously, you know, given that you're operating on a lot of the same roads and navigating the same safety challenges. And so, yeah, and Waymo very much is a, you know, multi-product company that has ambitions in both.
They have different challenges and both are tremendous opportunities. But the cool thing is, is that there's a huge amount of leverage and this kind of core technology stack now gets pushed on by both sides. And that adds its own unique challenges, but the success case is that the challenges that you push on, they get leveraged across all platforms and all domains.
- From an engineer perspective, the teams are integrated. - It's a mix. So there's a huge amount of centralized kind of core teams that support all applications. And so you think of something like the hardware team that develops the lasers, the compute, integrates into vehicle platforms. This is an experience that carries over across, you know, any application that we'd have, and they have been full with both.
Then there's like really unique perception challenges, planning challenges, like other types of challenges where there's a huge amount of leverage on a core tech stack, but then there's like dedicated teams that think of how do you deal with a unique challenge? For example, an articulated trailer with varying loads that completely changes the physical dynamics of a vehicle that doesn't exist on a car, but it becomes one of the most important kind of unique new challenges on a truck.
- So what's the long-term dream of Waymo via the autonomous trucking effort that Waymo's doing? - Yeah, so we're starting with developing L4 autonomy for class A trucks. These are 53 foot trailers that capture like a big, a pretty sizable percentage of the goods transportation in the country. Long-term, the opportunity is obviously to expand to much more diverse types of vehicles, types of goods transportation, and start to really expand in both the volume and the route feasibility that's possible.
And so just like we did on the car side, you start with a single route with a very specific operating kind of domain and constraints that allow you to solve the problem. But then over time, you start to really try to push against those boundaries and open up deeper feasibility across routes, across surface streets, across environmental conditions, across the type of goods that you carry, the versatility of those goods, and how little supervision is necessary to just start to scale this network.
And long-term, there's actually, it's a pretty incredible enabler where, today you have already a giant shortage of truck drivers. It's over 80,000 truck driver shortage that's expected to grow to hundreds of thousands in the years ahead. You have really, really quickly increasing demand from e-commerce and just distribution of where people are located.
You have one of the deepest safety challenges of any profession in the US where there's a huge, huge, huge kind of challenge around fatigue and around kind of the long routes that are driven. And even beyond kind of the cost and necessity of it, there are fundamental constraints built into our logistics network that are tied to the type of human constraints and regulatory constraints that are tied to trucking today.
For example, our limits on how long a driver can be driving in a single day before they're not allowed to drive anymore, which is a very important safety constraint. What that does is it enforces limitations on how far jumps with a single driver could be and makes you very subject to availability of drivers, which influences where warehouses are built, which influences how goods are transported, which influences costs.
And so you start to have an opportunity on everything from plugging into existing fleets and brokerages and the existing logistics network and just immediately start to have a huge opportunity to add value from cost and driving fuel insurance and safety standpoint, all the way to completely reinventing the logistics network across the United States and enabling something completely different than what it looks like today.
- Yeah, I had, it'll be published before this, had a great conversation with Steve Vigeli, who we talked about the manual driving. He echoed many of the same things that you were talking about, but we talked about much of the fascinating human stories of truck drivers. He was also was a truck driver for a bit as a grad student to try to understand the depth of the problem.
He's a fascinating- - Fascinating, why it's, we have some drivers that have 4 million miles of lifetime driving experience. It's pretty incredible. And yeah, it's, you know, learning from them, like some of them are on the road for 300 days a year. It's a very unique type of lifestyle.
- So there's fascinating stuff there. Just like you said, there's a shortage of actually people, truck drivers taking the job counter to what is, I think is publicly believed. So there's an excess of jobs and a shortage of people to take up those jobs. And just like you said, it's such a difficult problem.
And these are experts at driving, at solving this particular problem. And it's fascinating to learn from them, to understand, you know, how hard is this problem? And that's the question I wanna ask you from a perception, from a robotics perspective. What's your sense of how difficult is autonomous trucking?
Maybe you can comment on which scenarios are super difficult, which are more manageable. Is there a way to kind of convert into words how difficult the problem is? - Yeah, it's a good question. So there's, and as you can expect, it's a mix. Some things become a lot easier, or at least more flexible.
Some things are harder. And so, you know, on the things that are like, the tailwinds, the benefits. A big focus of automating trucking, especially initially, is really focusing on the long haul freeway stretch of it, where that's where a majority of the value is captured. On a freeway, you have a lot more structure and a lot more consistency across freeways across the US, compared to surface streets, where you have a way higher dimensionality of what can happen, lack of structure, lack of consistency and variability across cities.
So you can leverage that consistency to tackle, at least in that respect, a more constrained AI problem, which has some benefits to it. You can itemize much more of the sort of things you might encounter and so forth. And so those are benefits. - Is there a canonical freeway and city we should be thinking about?
Like, is there a standard thing that's brought up in conversation often? Like, here's a stretch of road. What is it? Like when people talk about traveling across country, they'll talk about New York, San Francisco. Is that the route? Like, is there a stretch of road that's like nice and clean?
And then there's like cities with difficulties in them that you kind of think of as the canonical problem to solve here. - Right. So starting with the car side, Waymo very intentionally picked the Phoenix area and the San Francisco area as a follow-up once we hit driverless, where when you think of consumer transportation and ride sharing kind of economy, a big percentage of that market is captured in the densest cities in the United States.
And so really pushing at and solving San Francisco becomes a really huge opportunity and importance. And places one dot on kind of like the spectrum of like kind of complexity. The Phoenix area, starting with Chandler and then like kind of expanding more broadly in the Phoenix metropolitan area, it's I believe the fastest growing city in the US.
It's a kind of a higher medium sized city, but growing quickly and still captures a really wide range of kind of like complexities. And so getting to driverless there actually exposes you to a lot of the building blocks you need for the more complicated environments. And so in a lot of ways, there's a thesis that if you start to kind of place a few of these kind of dots where San Francisco has these types of unique challenges, dense pedestrians, all this like complexity, especially when you get into the downtown areas and so forth, and Phoenix has like a really interesting kind of spectrum of challenges, maybe other ones like LA kind of add freeway focus and so forth.
You start to kind of cover the full set of features that you might expect, and it becomes faster and faster if you have the right systems and the right organization to then open up the fifth city and the 10th city and the 20th city. On trucking, there's similar properties where obviously there's uniquenesses in freeways when you get into really dense environments, and then the real opportunity to then get even more value is to think about how you expand with like some of the surface challenges.
But for example, right now, we're looking, we have a big facility that we're finishing building in Q1 in Dallas area that'll allow us to do testing from the Dallas area on routes like Dallas to Houston, Dallas to Phoenix, going out East and- - Dallas to Austin. - Austin, so that triangle- - Waymo should come to Austin.
- Well, Waymo, the car side was in Austin for a while. - Yes, I know, come back. - Yeah, but trucking is actually, Texas is one of the best places to start because of both volume, regulatory, where there's a lot of benefits. On trucking, a huge opportunity is Port of LA going East.
So in a lot of ways, a lot of the work is to start to stitch together a network and converge to Port of LA where you have the biggest port in the United States. And the amount of goods going East from there is pretty tremendous. And then obviously there's kind of channels everywhere, and then you have extra complexities as you get into like snow and inclement weather and so forth.
But what's interesting about trucking is every single route segment that you add increases the value of the whole network. And so it has this kind of network effect and cumulative effect that's very unique. And so there's all these dimensions that we think about. And so in a lot of ways, Dallas has a really unique hub that opens up a lot of options has become a really valuable lever.
- So the million questions I could ask you, first of all, you mentioned level four. For people who totally don't know, there's these levels of automation that level four refers to kind of the first step that you could recognize as fully autonomous driving. Level five is really fully autonomous driving.
And level four is kind of fully autonomous driving. And then there are specific definitions depending on who you ask what that actually means. But for you, what does the level four mean? And you mentioned freeway. Let's say like there's three parts of long haul trucking. Maybe I'm wrong in this, but there's freeway driving, there's like truck stop, and then there's more urban-y type of area.
So which of those do you want to tackle? Which of them do you include under level four? Like, how do you think about this problem? What do you focus on? Where's the biggest impact to be had in the short term? - So the goal is to, we got to get to market as fast as we can, because the moment you get to market, you just learn so much and it influences everything that you do.
And it is, I mean, it's one of the experiences I carried over from before, is that you add constraints, you figure out the right compromises, you do whatever it takes, because getting to market is so critical. - But here, with autonomous driving, you can get to market in so many different ways.
- That's right. And so one of the simplifications that we intentionally have put on is using what we call transfer hubs, where you can imagine depots that are at the entry points to metropolitan areas, like let's say Dallas, like the hub that we're building, which does a few things that are very valuable.
So from a first product standpoint, you can automate transfer hub to transfer hub, and that path from the transfer hub to the full freeway route can be a very intentional single route that you can select for the features that you feel you wanna handle at that point in time.
- Now you build a hub specifically designed for autonomous trucking. - And that's what's gonna happen actually, like you need to come out in January and check it out, 'cause it's gonna be really cool. It's not only is it our main operating headquarters for our fleet there, but it will be the first fully ground up designed driverless hub for autonomous trucks in terms of where do they enter, where do they depart?
How do you think about the flow of people, goods, everything? It's quite cool, and it's really beautiful on how it's thought through. And so early on, it is totally reasonable to do the last five miles manually to get to the final kind of depot to avoid having to solve the general surface street problem, which is obviously very complex.
Now, when the time comes, and we are increasingly, well, already we're pushing on some of this, but we will increasingly be pushing on surface street capabilities to build out the value chain to go all the way depot to depot instead of transfer hub to transfer hub. And we have probably the best advantages in the world because of all the Waymo experience on surface streets, but that's not the highest ROI right now where the highest ROI is-- - Hub to hub.
- Hub to hub and get the routes going. And so when you ask what's L4, L4 can be applied to any domain, operating domain or scope, but it's effectively for the places where we say we're ready for autonomous operation. We are 100% operating through as a self-driving truck with no human behind the wheel.
That is L4 autonomy. And it doesn't mean that you operate in every condition. It doesn't mean you operate on every road, but for a particularly well-defined area, operating conditions, routes, kind of domain, you are fully autonomous. And that's the difference between L4 and L5. And most people would agree that at least anytime in the foreseeable future, L5 is just not even really worth thinking about because there's always gonna be these extremes.
And so it's a race and almost like a game where you think of what is the sequence of expanded capabilities that create the most value and teach us the most and create this feedback loop where we're building out and unlocking more and more capability over time. - I gotta ask you, just curious.
So first of all, I have to, when I'm allowed, visit the Dallas facility 'cause it's super cool. It's like robot on the giving and the receiving end. It's the truck is a robot and the hub is a robot. - Yeah, it's gotta be very robot friendly. - Yeah, that's great.
I will feel at home. What's the sensor suite like on the hub if you can just high level mention it? Is, does the hub have like lidars and like, is the truck doing most of the intelligence or is the hub also intelligent? - Yeah, so most of it will be the truck and everything is like connected.
Like, so we have our servers where we know exactly where every truck is. We know exactly what's happening at a hub. And so you can imagine like a large backend system that over time starts to manage timings, goods, delivery windows, all these sorts of things. And so you don't actually need to, there might be special cases where that is valuable to equip some sensors in the hub, but a majority of the intelligence is gonna be on the truck because whatever's relevant to the truck, relevant should be seen by the truck and can be relayed remotely for any sort of kind of cognizance or decision-making.
But there's a distinct type of workflow where, where do you check trucks? Where do you want them to enter? What if there's many operating at once? Where's the staging area to depart? How do you set up the flow of humans and human cars and traffic so that you minimize the interaction between humans and kind of self-driving trucks?
And then how do you even intelligently select the locations of these transfer hubs that are both really great service locations for a metropolitan area? And there could be over time, many of them for a metropolitan area, while at the same time, leaning into the path of least resistance to lean into your current capabilities and strengths so that you minimize the amount of work that's necessary to unlock the next kind of big bar.
- I have a million questions. So first, is the goal to have no human in the truck? - The goal is to have no human in the truck. Now, of course, right now we're testing with expert operators and so forth, but the goal is to... Now, there might be circumstances where it makes sense to have a human and obviously these trucks can also be manually driven.
So sometimes like we talk with our fleet partners about how you can buy Waymo equipped Dymor truck down the road and on the routes that are autonomous, it's autonomous. On the routes that are not, it's human driven. Maybe there's L2 functionality that add safety systems and so forth. But as soon as they become, as soon as we expand in software, the availability of driverless routes, the hardware is forward compatible to just now start using them in real time.
And so you can imagine this mixed use, but at the end of the day, the largest value proposition is where you're able to have no constraints on how you can operate this truck. And it's 100% autonomous with nobody inside. - That's amazing. So let me ask on a logistics front, 'cause you mentioned that also opportunity to revamp or for built from scratch, some of the ideas around logistics.
I don't wanna throw too much shade, but from talking to Steve, my understanding is logistics is not perhaps as great as it could be in the current trucking environment. I'm not, maybe you can break down why, but there's probably competing companies. There's just a mess. Maybe some of it is literally just it's old school.
Like it's just like, it's not computerized. Like truckers are almost like contractors. There's an independence and there's not a nice interface where they can communicate where they're going, where they're at, all those kinds of things. And so it just feels like there's so much opportunity to digitize everything to where you could optimize the use of human time, optimize the use of all kinds of resources.
How much are you thinking about that problem? How fascinating is that problem? How difficult does it, how much opportunity is there to revolutionize the space of logistics in autonomous trucking, in trucking period? - It's pretty fascinating. It's, this is one of the most motivating aspects of all this where like, yes, there's like a mountain of problems that are like, you want to, you have to solve to get to like the first checkpoints and first driverless and so forth.
And inevitably like in a space like this, you plug in initially into the existing kind of system and start to kind of learn and iterate. But that opportunity is massive. And so a couple of the factors that play into it. So first of all, there's obviously just the physical constraints of driving time, driver availability.
Some fleets have a 95% attrition rate, you know, right now because of just this demands and like, you know, kind of gaps in competition and so forth. And then it's also incredibly fragmented where you would be shocked at like, when you look at industries, like, and you think of the top 10 players, like the biggest fleets, like the Walmarts and FedEx's and so forth, the percentage of the overall trucking market that's captured by the top 10 or 50 fleets is surprisingly small.
The average kind of truck operation is like a one to five truck, you know, family business. And so there's just like a huge amount of like fragmentation, which makes for really interesting challenges in kind of stitching together through like bulletin boards and brokerages and some people run their own fleets.
And this world's kind of like evolving, but it is one of the less digitized and optimized worlds that there is. And the part that is optimized is optimized to the constraints of today. And even within the constraints of today, this is a $900 billion industry in the US and it's continuing to grow.
- It feels like from a business perspective, if I were to predict that whilst trying to solve the autonomous trucking problem, Waymo might solve first the logistics problem. Like, 'cause that would already be a huge impact. So on the way to solving autonomous trucking, the human driven, like there's so much opportunity to significantly improve the human driven trucking, the timing, the logistics.
So you use humans optimally. - The handoffs, the like, you know. Well, even the, I mean, you get really ambitious, you start to expand this beyond like, how does the fulfillment center work? And like, how does the transfer hub work? How does the warehouse work to, I mean, there's a lot of opportunities to start to automate these chains.
And a lot of the inefficiency today is because like, you have a delay, like Port of LA has a bunch of ships right now waiting outside of it because they can't dock because there's not enough labor inside of the Port of LA. That means there's a big backlog of trucks, which means there's a big backlog of deliveries, which means the drivers aren't where they need to be.
And so you have this like huge chain reaction and your feasibility of readjusting in this network is low because everything's tied to humans and manual kind of processes or distributed processes across a whole bunch of players. And so one of the biggest enablers is, yes, we have to solve autonomous trucking first.
And that, by the way, that's not like an overnight thing. That's decades of continued kind of expansion and work. But the first checkpoint in the first route is like, is not that far off. But once you start enabling and you start to learn about how the constraints of autonomous trucking, which are very, very different than the constraints of human trucking, and again, strengths and weaknesses, how do you then start to leverage that and rethink a flow of goods more broadly?
And this is where like the learnings of like really partnering with some of the largest fleets in the US and the sort of learnings that they have about the industry and the sort of needs that they have and what would change if you just like really broke this one constraint that like holds up the whole network?
Or what if you enabled this other constraint? That actually drives the roadmap in a lot of ways because this is not like an all or nothing problem. It's, you know, you start to kind of unlock more and more functionality over time, which functionality most enables this optimization ends up being kind of part of the discussion.
But you're totally right. Like you fast forward to like, you know, five years, 10 years, 15 years, and you think about like very generalized capability of automation and logistics, as well as the ability to like poke into how those handoffs work. The efficiency goes far beyond just direct cost of today's like unit economics of a truck.
They go towards reinventing the entire system in the same way that, you know, you see, you know, these other industries that, like when you get to enough scale, you can really rethink how you build around your new set of capabilities, not the old set of capabilities. - Yeah, use the analogy metaphor or whatever that autonomous trucking is like email versus mail.
And then with email, you're still doing the communication, but it opens up all kinds of, varieties of communication that you didn't anticipate. - That's right, constraints are just completely different. And yeah, there's definitely a property of that here. And we're also still learning about it because there is a lot of really fascinating and sometimes really elegant things that the industry has done where there's companies whose entire existence is around, despite the constraints, optimizing as much as they can out of it.
And those lessons do carry over, but it's an interesting kind of merger of worlds to think about like, well, what if this was completely different? How would we approach it? And the interesting thing is that for a really, really, really long time, it's actually gonna be the merger between how to use autonomy and how to use humans that leans into each of their strengths.
- Yeah, and then we're back to Cosmo, human robot interaction. So, and the interesting thing about Waymo is because there's the passenger vehicle, the human, the transportation of humans and transportation of goods, you could see over time, they may kind of meld together more because you'll probably have like zero occupancy vehicles moving around.
So you have transportation of goods for short distances and then for slightly longer distances and then slightly longer, and then there'll be this, then you just see the difference between a passenger vehicle and a truck is just size and you can have different sizes and all that kind of stuff.
And at the core, you can have a Waymo driver that doesn't, as long as you have the same sense of suite, you can just think of it as one problem. - And that's why over time, these do kind of converge where in a lot of ways, a lot of the challenges we're solving are freeway driving, which are going to carry over very well to the vehicles, to the car side.
But there are like then unique challenges, like you have a very different dynamics in your vehicle where you have to see much further out in order to have the proper like response time because you have an 80,000 pound fully loaded truck. That's a very, very different type of braking profile than a car.
You have a really interesting kind of dynamic limits because of the trailer where you actually, it's very, very hard to like physically like flip a car or do something like physically, like most risk in a car is from just collisions. It's very hard to like in any normal operation to do something other than like, unless you hit something to actually kind of like roll over something.
On a truck, you actually have to drive much closer to the physical bounds of the safety limits, but you actually have like real constraints because you could have a really interesting interactions between the cabin and the trailer. There's something called jackknifing. If you turn too quickly, you have roll risks and so forth.
And so we spent a huge amount of time understanding those boundaries and those boundaries change based on the load that you have, which is also an interesting difference. And you have to propagate through that, through the algorithm so that you're leveraging your dynamic range, but always staying within a safety balance, but understanding what those safety bounds are.
And so we have this like really cool test facility where we like take it to the max and actually, imagine a truck with these giant training wheels on the back of the trailer and you're pushing it past the safety limits in order to like try to actually see where it rolls.
And so you define this high dimensional boundary, which then gets captured in software to stay safe and actually do the right thing. But it's kind of fascinating, the sort of kind of challenges you have there. But then all of these things drive really interesting challenges from perception to unique behavior prediction challenges.
And obviously in Planner where you have to think about merging and creating gaps with a 53 foot trailer and so forth. And then obviously the platform itself is very different. We have different numbers of sensors, sometimes types of sensors, and you also have unique blind spots that you have because of the trailer, which you have to think about.
And so it's a really interesting spectrum. And in the end, you try to capture these special cases in a way that is cleanly augmentations of the existing tech stack, because a majority of what we're solving is actually generalizable to freeway driving and different platforms. And over time, they all start to kind of merge ideally where the things that are unique are as minimal as possible.
And that's where you get the most leverage. And that's why Waymo can do, take on $2 trillion opportunities and have been nowhere near 2X the cost or investment or size. In fact, it's much, much smaller than that because of the high degree of leverage. - So what kind of sensor suite they can speak to that a long haul truck needs to have?
LiDAR, vision, how many, what are we talking about here? - Yeah, so it's more than the car. So very loose, you can think of as like 2X, but it varies depending on the sensor. And so we have like dozens of cameras, radar, and then multiple LiDAR as well. You'll see one difference where the cars have a central main sensor pod on the roof in the middle, and then some kind of hood sensors for blind spots.
The truck moves to two main sensor pods on the outsides where you would typically have the mirrors next to the driver. They effectively go as far out as possible, kind of up to the- - Up front. - Kind of on the cabin, not all the way in the front, but like kind of where the mirrors for the driver would be.
And so those are the main sensor pods. And the reason they're there is because if you had one in the middle, the trailer's higher than the cabin and you would be occluded with this like awkward wedge. - Too much occlusion. - Too much occlusion. And so then you would add a lot of complexity to the software to make up for that and just unnecessary complexity.
- There's so many probably fascinating design choices. - Really cool. - 'Cause you can probably bring up a LiDAR higher and have it in the center or something. You could have all kinds of choices to make the decisions here that ultimately probably will define the industry. - Right, but by having two on the side, there's actually multiple benefits.
So one is like you're just beyond the trailer, so you can see fully flush with the trailer. And so you eliminate most of your blind spot except for right behind the trailer, which is great because now the software carries over really well. And the same perception system you use on the car side, largely that architecture can carry over and you can retrain some models and so forth, but you leverage it a lot.
It also actually helps with redundancy where there's a really nice built-in redundancy for all the LiDAR cameras and radar where you can afford to have any one of them fail and you're still okay. And at scale, every one of them will fail. And you will be able to detect when one of them fails because they don't, because the redundancy, they're giving you the data that's inconsistent with the rest of the- - That's right.
And it's not just like they no longer give data. It could be like they're fouled or they stop giving data where some electrical thing gets cut or part of your compute goes down. So what's neat is that like you have way more sensors, part of his field of view and occlusions, part of it's redundancy, and then part of it is new use cases.
So there's new types of sensors to optimize for long range and kind of the sensing horizon that we look for on our vehicles that is unique to trucks because it actually is like kind of much like further out than a car. But a majority are actually used across both cars and trucks.
And so we use the same compute, the same fundamental baseline sensors, cameras, radar, IMUs. And so you get a great leverage from all of the infrastructure and the hardware development as a result. - So what about cameras? What role does, so LIDAR is this rich set of information that has its strengths, has some weaknesses.
Camera is this rich source of information that has some strengths, has its weaknesses. What role does LIDAR play? What role does vision cameras play in this beautiful problem of autonomous trucking? - It is beautiful. There's like so much that comes together. - And how much, at which point do they come together?
- Yeah. So I'll start with LIDAR. So LIDAR has been like way most, one of way most big strengths and advantages where we developed our own LIDAR in-house where many generations in, both in cost and functionality, it is the best in the space. - Which generation? 'Cause I know there's this cool, I love versions that are increasing.
Which version of the hardware stack is it currently? Officially, publicly? - So some parts iterate more than others. I'm trying to remember on the sensor side. So the entire self-driving system, which includes sensors and compute is fifth generation. - I can't wait until there's like iPhone style like announcements for like new versions of the Waymo hardware stack.
- Well, we try to be careful 'cause man, when you change the hardware, it takes a lot to like retrain the models and everything. So we just went through that and going from the Pacificus to the Jaguars. And so the Jaguars and the trucks have the same generation now.
But yeah, the LiDAR is, it's incredible. And so Waymo has leaned into that as a strength. And so a lot of the near range perception system that obviously kind of carries over a lot from the car side uses LiDAR as a very prominent kind of like primary sensor, but then obviously everything has its strengths and weaknesses.
And so in the near range, LiDAR is a gigantic advantage and it has its weaknesses on, when it comes to occlusions in certain areas, rain and weather, like things like that. But it's an incredible sensor and it gives you incredible density, perfect location, precision and consistency, which is a very valuable property to be able to kind of apply ML approach.
- Can you elaborate consistency? - Yeah, when you have a camera, the position of the sun, the time of the day, various of the properties can have a big impact, whether there's glare, the field of view, things like that. - So consistent. - The signal. - With in the face of a changing external environment, the signal.
- Yeah, daytime, nighttime, it's about 3D physical existence. In effect, like you're seeing beams of light that physically bounce off of something and come back. And so whatever the conditional conditions are, like the shape of a human sensor reading from a human or from a car or from an animal, like you have a reliability there, which ends up being valuable for kind of like the long tail of challenges.
Now, LiDAR is the first sensor to drop off in terms of range and ours has a really good range, but at the end of the day, it drops off. And so particularly for trucks, on top of the general redundancy that you want for near range and complements through cameras and radar for occlusions and for complimentary information and so forth, when you get the long range, you have to be radar and camera primary because your LiDAR data will fundamentally drop off after a period of time, and you have to be able to see kind of objects further out.
Now, cameras have the incredible range where you get a high density, high resolution camera, you can get data well past a kilometer and it's like really potentially a huge value. Now, the signal drops off, the noise is higher, detecting is harder, classifying is harder. And one that you may not think about localizing is harder because you can be off by like two meters and where something's located a kilometer away.
And that's the difference between being on the shoulder and being in your lane. And so you have like interesting challenges there that you have to solve, which have a bunch of approaches that come into it. Radar is interesting because it also has longer range than LiDAR and it gives you speed information.
So it becomes very, very useful for dynamic information of traffic flow, vehicle motions, animals, pedestrians, like just things that might be useful signals. And it helps with weather conditions where radar actually penetrates weather conditions in a better way than other sensors. And so it's just, it's kind of interesting where we've kind of started to converge towards not thinking about a problem as a LiDAR problem or a camera problem or radar problem, but it's a fusion problem where these are all like large scale ML problems where you put data into the system.
And in many cases, you just look for the signals that might be present in the union of all of these and leave it to the system as much as possible to start to really identify how to extract that. And then there's places we have to intervene and actually include more, but no single sensor is in a great position to like really solve this problem and end without a huge extra challenge.
- That's fascinating. There's a question that's probably still an open question is at which point do you fuse them? Do you solve the perception problem for each sensor suite individually, the LiDAR suite and the camera suite, or do you do some kind of heterogeneous fusion or do you fuse at the very beginning?
Is there a good answer or at least an inkling of intuitions you can come up with? - Yeah, so people refer to this as like early fusion or late fusion. So late fusion might be that you have like the camera pipeline, the LiDAR pipeline, and then you like fuse them.
And like when it gets to like final semantics and classification and tracking, you like kind of fuse them together and figure out which one's best. There's more and more evidence that early fusion is important. And that is because late fusion does not allow you to pick up on the complementary strengths and weaknesses of the sensors.
Weather's a great example where if you do early fusion, you have an incredibly hard problem for any single sensor in rain to solve that problem because you have reflections from the LiDAR, you have weird kind of noise from the camera, blah, blah, blah, right? But the combination of all of them can help you filter and help you get to the real signal that then gets you as close as possible to the original stack.
And be much more fluid about the strengths and weaknesses where your camera is much more susceptible to like kind of fouling on the actual lens from like rain or random stuff. Whereas like you might be a little bit more resilient in other sensors. And so there's an element of logic that always happens late in the game, but that fusion early on, actually, especially as you move towards ML and large-scale data-driven approaches, just maximizes your ability to pull out the best signal you can out of each modality before you start making constraining decisions that end up being hard to unwind late in the stack.
- So how much of this is a machine learning problem? What role does ML, machine learning, play in this whole problem of autonomous driving, autonomous trucking? - It's massive and it's increasing over time. If you go back to the grand challenge days in the early days of kind of AV development, there was ML, but it was not in like kind of the mass scale data style of ML.
It was like learning models, but in a more structured kind of way. And it was a lot of heuristic and search-based approaches and planning and so forth. You can make a lot of progress with these types of approaches kind of across the board, an almost deceptive amount of progress.
We can get pretty far, but then you start to really grind the further you get in some parts of the stack if you don't have an ability to absorb a massive amount of experience in a way that scales very sub-linearly in terms of human labor and human attention. And so when you look at the stack, the perception side is probably the first to get really revolutionized by ML.
And it goes back many years because ML for like computer vision and these types of approaches kind of took off was a lot of the like early kind of push in deep learning. And so there's always a debate on, you know, the spectrum between kind of like end to end ML, which, you know, is a little bit kind of like too far to how you architect it to where you have modules, but enough ability to think about long tail problems and so forth.
But at the end of the day, you have big parts of the system that are very ML and data driven. And we're increasingly moving that direction all the way across the board, including behavior where even when it's not like a gigantic ML problem that covers like a giant swath end to end, more and more parts of the system have this property where you want to be able to put more data into it and it gets better.
And that has been one of the realizations as you drive tens of millions of miles and try to like solve new expansions of domains without regressing in your old ones, it becomes intractable for a human to approach that in the way that traditionally robotics has kind of approached some elements of the tech stack.
- So are you trying to create a data pipeline specifically for the trucking problem? This is it, like how much leveraging of the autonomous driving is there in terms of data collection? - Yeah. - And how unique is the data required for the trucking problem? - So we use all the same infrastructure.
So labeling workflows, ML workflows, everything. So that actually carries over quite well. We heavily reuse the data even, where almost every model that we have on a truck, we started with the latest car model. - Cool. - And- - So it's almost like a good back car model. - Yeah, it's like you can think of like, despite the different domain and different numbers of sensors and position of sensors, there's a lot of signals that carry over across driving.
And so it's almost like pre-training and getting a big boost out of the gate where you can reduce the amount of data you need by a lot. And it goes both ways actually. And so we're increasingly thinking about our data strategy on how we leverage both of these. So you think about, you know, how other agents react to a truck.
Yeah, it's a little bit different, but the fundamentals are actually like, what will other vehicles in the road do? There's a lot of carry over that's possible. And in fact, just to give you an example, we're constantly kind of like adding more data from the trucking side. But as of right now, when we think of our, like one of our models, behavior prediction for other agents on the road, like vehicles.
85% of that data comes from cars. And a lot of that 85% comes from surface streets because we just had so much of it and it was really valuable. And so we're adding in more and more, particularly in the areas where we need more data, but you get a huge boost out of the gate.
- Just all different visual characteristics of roads, lane markings, pedestrians, all that, that's still relevant. - It's all still relevant. And then just the fundamentals of how, you know, you detect the car, does it really change that much? Whether you're detecting it from a car or a truck, the fundamentals of how a person will walk around your vehicle is it, it'll change a little bit, but the basics, like there's a lot of signal in there that as a starting point to a network can actually be very valuable.
Now we do have some very unique challenges where there's a sparsity of events on a freeway. The frequency of events happening on a freeway, whether it's, you know, interesting, you know, objects in the road or incidents, or even like from a human benchmark, like how often does a human have a accident on a freeway is far more sparse than on a surface street.
And so that leads to really interesting data problems where you can't just drive infinitely to encounter all the different permutations of things you might encounter. And so there you get into interesting tools like structure testing and data collection, data augmentation, and so forth. And so there's really interesting kind of technical challenges that push some of the research that enables these new suites of approaches.
- What role does simulation play? - Really good question. So Waymo simulates about a thousand miles for every mile it drives. So you think of-- - In both, so across the board. - Across the board, yeah. So you think of, for example, well, if we've driven, you know, over 20 million miles, that's over 20 billion miles in simulation.
Now, how do you use simulation? It's a multi-purpose. So you use it for basic development. So you want to do, make sure you have regression prevention and protection of everything you're doing, right? That's an easy one. When you encounter something interesting in the world, let's say there was an issue with how the vehicle behaved versus an ideal human.
You can play that back in simulation and start augmenting your system and seeing how you would have reacted to that scenario with this improvement or this new area. You can create scenarios that become part of your regression set after that point, right? Then you start getting into like really, really like kind of hill climbing where you say, "Hey, I need to improve this system.
I have these metrics that are really correlated with final performance. How do I know how well I'm doing?" The actual physical driving is the least efficient form of testing. And it's expensive, it's time consuming. So grabbing a large scale batch of historical data and simulating it to get a signal of over these last, or just random sample of a hundred thousand miles, how has this metric changed versus where we are today?
You can do that far more efficiently in simulation than just driving with that new system on board, right? And then you go all the way to the validation phase where to actually see your human relative safety of like how well you're performing on the car side or the trucking side relative to a human.
A lot of that safety case is actually driven by taking all of the physical operational driving, which probably includes a lot of interventions where the driver took over just in case. And then you simulate those forward and see if would anything have happened? And in most cases, the answer is no, but you can simulate it forward.
And you can even start to do really interesting things where you add virtual agents to create harder environments. You can fuzz the locations of physical agents. You can muck with the scene and stress test the scenario from a whole bunch of different dimensions. And effectively you're trying to like more efficiently sample this like infinite dimensional space, but try to encounter the problems as fast as possible.
Because what most people don't realize is the hardest problem in autonomous driving is actually the evaluation problem in many ways, not the actual autonomy problem. And so if you could in theory evaluate perfectly and instantaneously, you can solve that problem in a really fast feedback loop quite well. But the hardest part is being really smart about this suite of approaches on how can you get an accurate signal on how well you're doing as quickly as possible in a way that correlates to physical driving.
- Can you explain the evaluation problem? Which metric are you evaluating towards? Are we talking about safety? What are the performance metrics that we're talking about? - So in the end you care about end safety. Like that's in the end what keeps you, like that's what's deceptive where there's a lot of companies that have like a great demo.
The path from like a really great demo to being able to go driverless can be deceptively long, even when that demo looks like it's driverless quality. And the difference is that the thing that keeps you from going driverless is not the stuff you encounter in a demo, it's the stuff that you encounter once in a hundred thousand miles or 500,000 miles.
And so that is at the root of what is most challenging about going driverless because any issue you encounter you can go and fix it, but how do you know you didn't create five other issues that you haven't encountered yet? So those learnings, like those were painful learnings in Waymo's history that Waymo went through and led to us then finally being able to go driverless in Phoenix and now are at the heart of how we develop.
Evaluation is simultaneously evaluating final kind of end safety of how ready are you to go driverless, which may be as direct as what is your collision, human relative kind of collision rate for all these types of scenarios and severities to make sure that you're better than a human bar by a good amount.
But that's not actually the most useful for development. For development, it's much more kind of analog metrics that are part of the art of finding how, what are the properties of driving that give you a way quicker signal that's more sensitive than a collision that can correlate to the quality you care about and push the feedback loop to all of your development.
A lot of these are, for example, comparisons to human drivers, like manual drivers. How do you do relative to a human driver in various dimensions or various circumstances? - Can I ask you a tricky question? So if I brought you a truck, how would you test it? Okay, Alan Turing came along and you said- - This one's, can't tell if it's a human driver or autonomous driver.
- Yeah, exactly. But not the human because humans are flawed. - It's different, but yeah. How do you actually know you're ready, basically? Like, how do you know it's good enough? Yeah, and by the way, this is the reason why Waymo released a safety framework for the car side because one, it sets the bar so nobody cuts below it, and does something bad for the field that causes an accident.
Two, it's to start the conversation on framing what does this need to look like? Same thing we'll end up doing for the trucking side. It ends up being different portfolio of approaches. There's easy things like, are you compliant with all these fundamental rules of the road? Like, you never drive above the speed limit.
That's actually pretty easy. You can fundamentally prove that it's either impossible to violate that rule or that in these, you can itemize the scenarios where that comes up and you can do a test and show that you passed that test and therefore you can handle that scenario. And so those are like traditional structure testing kind of system engineering approaches where you can just, like fault rates is another example where when something fails, how do you deal with it?
You're not gonna drive and randomly wait for it to fail. You're gonna force a failure and make sure that you can handle it in close courses and simulation or on the road. And run through all the permutations of failures which you can oftentimes for some parts of system itemize like hardware.
The hardest part is behavioral where you have just infinite situations that could in theory happen. And you wanna figure out the combinations of approaches that can work there. You can probably pass the Turing test pretty quickly even if you're not like completely ready for driverless because the events that are really kind of like hard will not happen that often.
Just to give you a perspective, a human has a serious accident on a freeway like a truck driver on a freeway has, there's a serious event happens once every 1.3 million miles and something that actually has like a really serious injuries, 28 million miles. And so those are really rare.
And so you could have a driver that looks like it's ready to go, but you have no signal on what happens there. And so that's where you start to get creative on combinations of sampling and statistical arguments, focused structured arguments where you can kind of simulate those scenarios and show that you can handle them and metrics that are correlated with what you care about but you can measure much more quickly and get to a right answer.
And that's what makes it pretty hard. And in the end, you end up borrowing a lot of properties from aerospace and like space shuttles and so forth where you don't get the chance to launch it a million times just to say you're ready because it's too expensive to fail.
And so you go through a huge amount of kind of structured approaches in order to validate it. And then by thoroughness, you can make a strong argument that you're ready to go. This is actually a harder problem in a lot of ways though because you can think of a space shuttle as getting to a fixed point and then you kind of like or an airplane and you like freeze the software and then you like prove it and you're good to go.
Here you have to get to a driverless quality bar but then continue to aggressively change the software even while you're driverless. And so- - But in also the full range of environment that there's an external environment with a shuttle is you're basically testing the like the systems, the internal stuff.
- Yeah. - And you have a lot of control in the external stuff. - Yeah, and the hard part is how do you know you didn't get worse in something that you just changed? - Yes, sure. - And so in a lot of ways like the Turing test starts to fail pretty quickly because you start to feel driverless quality pretty early in that curve.
If you think about it, right? Like in most kind of really good AV demos maybe you'll sit there for 30 minutes, right? - Yeah. - So you've driven 15 miles or something like that. To go driverless, like what's the sort of rate of issues that you need to have you won't even encounter?
- So let's try something different then. Let's try a different version of the Turing test which is like an IQ test. So there's these difficult questions of increasing difficulty. They're designed, you don't know them ahead of time. Nobody knows the answer to them. - Right. - And so is it possible to in the future orchestrate basically really difficult-- - Obstacle course almost of like-- - Yeah, that maybe change every year and that represent, if you can pass these, they don't necessarily represent the full spectrum.
- That's it, yeah. They won't be conclusive, but you can at least get a really quick read and filter. - Yeah, like you're able to, 'cause you didn't know them ahead of time. Like, I don't know, probably-- - Like construction zones, failures-- - Or driving anywhere in Russia. - Yeah, yeah, exactly.
- Snow. - Weather, cut-ins, dense traffic, kind of merging lane closures, animal foreign objects on a road that pop out on short notice, mechanical failures, sensor breaking, tire popped, weird behaviors by other vehicles, like a hard brake, something reckless that they've done, fouling of sensors, like bugs or birds, you know, poop or something.
But yeah, like you have these like kind of like extreme conditions where like you have a nasty construction zone where everything shuts down and you have to like, you know, get pulled to the other side of the freeway with a temporary lane like that, right? Those are sort of conditions where we do that to ourselves, right?
We itemize everything that could possibly happen to give you a starting point to how to think about what you need to develop. And at the end of the day, there's no substitute for real miles. Like if you think of traditional ML, like you know how there's like a validation set where you hold out some data and like real world driving is the ultimate validation set.
That's the, in the end, like the cleanest signal. But you can do a really good job on creating an obstacle course. And you're absolutely right, like at the end, if there was such a thing as automating and kind of a readiness, it would be these extreme conditions like a red light runner, right?
A really reckless pedestrian that's jaywalking, a cyclist that, you know, makes like a really awkward maneuver. That's actually what keeps you from going driverless. Like in the end, that is the long tail. - Yeah, and it's interesting to think about that. That to me is the Turing test. Turing test means a lot of things, but to me in driving, the Turing test is exactly this validation set that is handcrafted.
There's a, I don't know if you know him. There's a guy named Francois Chollet. He designed, he thinks about like how to design a test for general intelligence. He designs these IQ tests for machines. And the validation set for him is handcrafted. And that it requires like human genius or ingenuity to create a really good test.
And you hold, you truly hold it out. It's an interesting perspective on the validation set, which is like, make that as hard as possible. Not a generic representation of the data, but this is the hardest. - The hardest stuff. Yeah, you know, it's like, go. Like you'll never fully itemize like all the world states that you'll expand.
And so you have to come up with different approaches. And this is where you start hitting the struggles of ML, where ML is fantastic at optimizing the average case. It's a really unique craft to think about how you deal with the worst case, which is what we care about in the Navy space, when using an ML system on something that occurs like super infrequently.
So like, you don't care about the worst case really on ads, because if you miss a few, it's not a big deal, but you do care about it on the driving side. And so typically like, you'll never fully enumerate the world. And so you have to take a step back and abstract away, what are the signals that you care about and the properties of a driver that correlate to defensive driving and avoiding nasty situations that even though you'll always be surprised by things you'll encounter, you feel good about your ability to generalize from what you've learned.
- All right, let me ask you a tricky question. So to me, the two companies that are building at scale some of the most incredible robots ever built is Waymo and Tesla. So there's very distinct approaches, technically, philosophically in these two systems. Let me ask you to play sort of devil's advocate and then the devil's advocate to the devil's advocate.
It's a bit of a race, of course, everyone can win. But if Waymo wins this race to level four, why would they win? What aspect of the approach do you think would be the winning aspect? And if Tesla wins, why would they win? And which aspect of their approach would be the reason?
Just building some intuition, almost not from a business perspective, for me, that's just technically. - Yeah. - Yeah, and we could summarize, I think maybe you can correct me, one of the more distinct aspects is Waymo has a richer suite of sensors, there's LiDAR and vision. Tesla now removed radar, they do vision only.
Tesla has a larger fleet of vehicles operated by humans, so it's already deployed on the field and it's larger, what do you call it, operational domain. And then Waymo is more focused on a specific domain and growing it with fewer vehicles. So both are fascinating approaches, both are, I think, there's a lot of brilliant ideas, nobody knows the answer.
So I'd love to get your comments on this lay of the land. - Yeah, for sure. So maybe I'll start with Waymo. And you're right, both incredible companies and just a gigantic respect to everything Tesla's accomplished and how they pushed the field forward as well. So on the Waymo side, there is a fundamental advantage in the fact that it is focused and geared towards L4 from the very beginning.
We've customized the sensor suite for it, the hardware, the compute, the infrastructure, the tech stack, and all of the investment inside the company. That's deceptively important because there's like a giant spectrum of problems you have to solve in order to really do this from infrastructure to hardware, to autonomy stack, to the safety framework.
And that's an advantage because there's a reason why it's the fifth generation hardware and why all of those learnings went into the Dymor program. It becomes such an advantage because you learn a lot as you drive and you optimize for the best information you have, but fundamentally, there's a big, big jump, like every order of magnitude that you drive in numbers of miles and what you learn and the gap from really kind of like decent progress for L2 and so forth to what it takes to actually go L4.
And at the end of the day, there's a feeling that Waymo has, there's a long way to go, nobody's won, but there's a lot of advantages in all of these buckets where it's the only company that has shipped a fully driverless service where you can go and you can use it and it's at a decently sizable scale.
And those learnings can feed forward into how to solve the more general problem. - And you see this process you've deployed in Chandler, you don't know the timeline exactly, but you could see the steps, they seem almost incremental, the steps. - It's become more engineering than totally blind R&D.
- 'Cause it works in one place and then you move to another place and you grow it this way. - And just to give you an example, we fundamentally changed our hardware and our software stack almost entirely from what went driverless in Phoenix to what is the current generation of the system on both sides because the things that got us to driverless, even though it got to driverless way beyond human relative safety, it is fundamentally not well set up to scale in an exponential fashion without getting into huge kind of scaling pains.
And so those learnings, you just can't shortcut. And so that's an advantage. And so there's a lot of open challenges to kind of get through technical organizational, like how do you solve problems that are increasingly broad and complex like this work on multiple products. But there's a few in that, okay, like balls in our court, there's a headstart there, now we gotta go and solve it.
And I think that focus on L4, it's a fundamentally different problem. If you think about it, like let's say we were designing an L2 truck that was meant to be safer and help a human, you could do that with far less sensors, far less complexity and provide value very quickly, arguably with what we already have today, just packaged up in a good product.
But you would take a huge risk in having a gap from even the like compute and sensors, not to mention the software to then jump from that system to an L4 system. So it's a huge risk basically. - So again, allow me to be the person that plays the devil's advocate and let's argue for the Tesla approach.
So what you just laid out makes perfect sense and is exactly right. There are some open questions here, which is it's possible that investing more in faster data collection, which is essentially what Tesla's doing, will get us there faster if the sensor suite doesn't matter as much and machine learning can do a lot of the work.
This is the open question is, how much is the thing you mentioned before, how much of driving can be end to end learned? That's the open question. Obviously the Waymo and the vision only machine learning approach will solve driving eventually both. The question is of timeline, what's faster. - That's right.
And what you mentioned, like if I were to make the opposite argument, like what puts Tesla in the strongest position, it's data. That is their like superpower where they have an access to real world data effectively with like a safety driver. And they found a way to like get paid by safety drivers versus safer safety drivers.
It's brilliant, right? But all joking aside, like one, it is incredible that they've built a business that's incredibly successful that can now be a foundation and bootstrap kind of like really aggressive investment in autonomy space. If you can do it, that's always like an incredible kind of advantage. And then the data aspect of it, it is a giant amount of data.
If you can use it the right way to then solve the problem, but the ability to collect and filter through the things to the things that matter at real world scale, at like a large distribution, that is huge. Like it's a big advantage. And so then the question becomes, can you use it in our right way?
And do you have the right software systems and hardware systems in order to solve the problem? And you're right that in the longterm, there's no reason to believe that pure camera systems can't solve the problem that humans obviously are solving with vision systems. But- - Question is when? - It's a risk.
It's a big risk. So there's no argument that it's not a risk, right? And it's already such a hard problem. And so much of that problem, by the way, is even beyond the perception side, some of the hardest elements of the problem in our behavioral side and decision-making and the long tail safety case, if you are adding risk and complexity on the input side from perception, you're now making a really, really hard problem which on its own is still almost insurmountably hard, even harder.
And so the question is just how much. And this is where you can easily get into a little bit of a kind of a trap where similar to how you, how do you evaluate how good an AV company's product is? Like you go and you do a trial, kind of a test run with them, a demo run, which they've kind of optimized like crazy and so forth.
And like, and it feels good. Do you put any weight in that, right? You know that that gap is kind of like pretty large still. Same thing on the like perception case, like the long tail of computer vision is really, really hard. And there's a lot of ways that that can come up.
And even if it doesn't happen that often at all, when you think about the safety bar and what it takes to actually go full driverless, not like incredible assistance driverless, but full driverless, that bar gets crazy high. And not only do you have to solve it on the behavioral side, but now you have to push computer vision beyond arguably where it's ever been pushed.
And so you now on top of the broader AV challenge, you have a really hard perception challenge as well. - So there's perception, there's planning, there's human robot interaction. To me, what's fascinating about what Tesla is doing is in this march towards level four, because it's in the hands of so many humans, you get to see video, you get to see humans.
I mean, forget companies, forget businesses. It's fascinating for humans to be interacting with robots. - That's incredible. And they're actually helping kind of push it forward. And that is valuable by the way, where even for us, a decent percentage of our data is human driving. We intentionally have humans drive higher percentage than you might expect, because that creates some of the best signals to train the autonomy.
And so that is on its own value. - So together we're kind of learning about this problem in an applied sense, just like you had with Cosmo. When you're chasing an actual product that people are going to use, robot-based product that people are going to use, you have to contend with the reality of what it takes to build a robot that successfully perceives the world and operates in the world, and what it takes to have a robot that interacts with other humans in the world.
And that's like, to me, one of the most interesting problems humans have ever undertaken, because you're in trying to create an intelligent agent that operates in a human world, you're also understanding the nature of intelligence itself. Like how hard is driving? It's still not answered to me. I still don't understand.
- All the subtle cues, even little things like your interaction with a pedestrian where you look at each other and just go, okay, go. That's hard to do without a human driver, right? And you're missing that dimension. How do you communicate that? So there's really, really interesting kind of elements here.
Now here's what's beautiful. Can you imagine that when autonomous driving is solved, how much of the technology foundation of that space can go and have tremendous, just transformative impacts on other problem areas and other spaces that have subsets of these same problems? It's just incredible to think about that.
- It's both a pro and a con. With autonomous driving is so safety critical. Once you solve it, it's beautiful because there's so many applications that are a lot less safety critical. But it's also the con of that is it's so hard to solve. And the same journalists that you mentioned that get excited for a demo are the ones who write long articles about the failure of your company if there's one accident that's based on a robot.
And it's just society's so tense and waiting for failure of robots. You're in such a high stake environment. Failure has such a high cost. - And it slows down development. - It slows down development. - Yeah, the team definitely noticed that once you go driverless, like we're driverless in Phoenix and you continue to iterate, your iteration pace slows down because your fear of regression forces so much more rigor that obviously you have to find a compromise on like, okay, well, how often do we release driverless builds?
Because every time you release a driverless build, you have to go through this validation process, which is very expensive and so forth. So it is interesting. It's like, it is one of the hardest things. There's no other industry where like, you would not like, you wouldn't release the products way, way quicker when you start to kind of provide even portions of the value that you provide.
Healthcare maybe is the other one. - Health, that's right. - But at the same time, right? Like we've gotten there where you think of like surgery, right? Like you have surgery, there's always a risk, but like, it's really, really bounded. You know that there's an accident rate when you go out and drive your car today, right?
Like, and you know what the fatality rate in the US is per year. We're not banning driving because there was a car accident, but the bar for us is way higher. And we hold ourselves very serious to it, where you have to not only be better than a human, but you probably have to like at scale, be far better than a human by a big margin.
And you have to be able to like really, really thoughtfully explain all of the ways that we validate that becomes very comfortable for humans to understand, because a bunch of jargon that we use internally just doesn't compute at the end of the day. We have to be able to explain to society, how do we quantify the risk and acknowledge that there is some non-zero risk, but it's far above a human, you know, relative safety.
- See, here's the thing to push back a little bit and bring Cosmo back in the conversation. You said something quite brilliant at the beginning of this conversation, I think probably applies for autonomous driving, which is, you know, there's this desire to make autonomous cars more safer than human driven cars.
But if you create a product that's really compelling and is able to explain both the leadership and the engineers and the product itself can communicate intent, then I think people may be able to be willing to put up with the thing that might be even riskier than humans, because they understand the value of taking risks.
You mentioned the speed limit. Humans understand the value of going over the speed limit. Humans understand the value of like going fast through a yellow light. To take, and when you're in Manhattan streets, pushing through, crossing pedestrians, they understand that. I mean, this is a much more tense topic of discussion, so this is just me talking.
So in Cosmo's case, there was something about the way this particular robot communicated, the energy it brought, the intent it was able to communicate to the humans, that you understood that of course it needs to have a camera. Of course it needs to have this information. And in that same way, to me, of course a car needs to take risks.
Of course there's going to be accidents. If you want a car that never has an accident, have a car that just doesn't go anywhere. But that's tricky, because that's not a robotics problem. - And many accidents are not even due to you, right? Obviously it's, so there's a big difference though.
That's not a personal decision. You're also impacting obviously kind of the rest of the road, and we're facilitating it, right? And so there's a higher kind of ethical and moral bar, which obviously then translates into, as a society, and from a regulatory standpoint, kind of like what comes out of it, where it's hard for us to ever see this even being a debate in the sense that like, you have to be beyond reproach from a safety standpoint, because if you're wrong about this, you could set the entire field back a decade, right?
- See, this is me speaking. I think if we look into the future, there will be, I personally believe, this is me speaking, that there will be less and less focus on safety. That's still very, very high. - Yeah, meaning like after autonomy is very common and accepted. - But not so common that it's everywhere, but there has to be a transition, because I think for innovation, just like you were saying, to explore ideas, you have to take risks.
And I think if autonomy in the near term is to become prevalent in society, I think people need to be more willing to understand the nature of risk, the value of risk. It's very difficult, you're right, of course, with driving, but that's the fascinating nature of it. It's a life and death situation that brings value to millions of people.
So you have to figure out what do we value about this world? How much do we value, how deeply do we want to avoid hurting other humans? - That's right. And there is a point where like, you can imagine a scenario where Waymo has a system that is even when it's like kind of beyond human relative safety and provably statistically will save lives, there is a thoughtful navigation of that fact versus just kind of society readiness and perception and education of society and regulators and everything else where like, it's multi-dimensional and it's not a purely logical argument, but ironically, the logic can actually help with the emotions.
And just like any technology, there's early adopters and then there's kind of like a curve that happens after it. - But eventually celebrities, you get the rock in a Waymo vehicle and then everybody just comes. - And then everybody's comes down 'cause the rock likes it, yeah. (laughing) - If you post the- - Yeah, and it's like, it's an open question on how this plays out.
I mean, maybe we're pleasantly surprised and it just like, people just realize that this is such a enabler of life and like efficiency and cost and everything that there's a pull, like at some point I should fully believe that this will go from a thoughtful kind of movement and tiptoeing and like kind of like a push to society realizes how wonderful of an enabler this could become and it becomes more of a pull and hard to know exactly how that'll play out, but at the end of the day, like both the goods transportation and the people transportation side of it has that property where it's not easy.
There's a lot of open questions and challenges to navigate and there's obviously the technical problems to solve as a kind of prerequisite, but they have such an opportunity that is on a scale that very few industries in the last 20, 30 years have even had a chance to tackle that I maybe were pleasantly surprised by how much that tipping point, like in a really short amount of time actually turns into a societal pull to kind of embrace the benefits of this.
- Yeah, I hope so. It seems like in the recent few decades, there's been tipping points for technologies where like overnight things change. It's like from taxis to ride sharing services, all that shift. I mean, there's just shift after shift after shift that requires digitization and technology. I hope we're pleasantly surprised in this.
So there's millions of long haul trucks now in the United States. Do you see a future where there's millions of way more trucks and maybe just broadly speaking, way more vehicles, just like ants running around the United States, freeways and local roads? - Yeah, and other countries too. You look back decades from now and it might be one of those things that just feels so natural and then it becomes almost like this kind of interesting kind of oddity that we had none of it, like kind of decades earlier.
And it'll take a long time to grow in scale. Very different challenges appear at every stage. But over time, like this is one of the most enabling technologies that we have in the world today. It'll feel like, how was the world before the internet? How was the world before mobile phones?
Like it's gonna have that sort of a feeling to it on both sides. - It's hard to predict the future, but do you sometimes think about weird ways it might change the world? Like surprising ways. So obviously there's more direct ways where like there's increases efficiency. It will enable a lot of kind of logistics, optimizations kind of things.
It will change probably our roadways and all that kind of stuff. But it could also change society in some kind of interesting ways. Do you ever think about how it might change cities? How it might change our lives? All that kind of stuff. - You can imagine city where people live versus work becoming more distributed because the pain of commuting becomes different, just easier.
There's a lot of options that open up. The layout of cities themselves and how you think about car storage and parking obviously just enables a completely different type of experience in urban environments. I think there was like a statistic that something like 30% of the traffic in cities during rush hour is caused by pursuit of parking or something like some really high stats.
So those obviously kind of open up a lot of options. Flexibility on goods will enable new industries and businesses that never existed before because now the efficiency becomes more palatable. Good delivery, timing, consistency and flexibility is gonna change. The way we distribute the logistics network will change. The way we then can integrate with warehousing, with shipping ports.
You can start to think about greater automation through the whole kind of stack and how that supply chain, the ripples become much more agile versus like very grindy the way they are today where just the adaptation is like very tough and there's like a lot of constraints that we have.
I think it'll be great for the environment. It'll be great for safety where like probably about 95% of accidents today statistically are due to just attention or things that are preventable with the strengths of automation. Yeah, and it'll be one of those things where like industries will shift, but the net creation is gonna be massively positive.
And then we just have to be thoughtful about the negative implications that will happen in local places and adjust for those. But I'm an optimist in general for the technology where you could argue a negative on any new technology, but you start to kind of see that if there is a big demand for something like this, that in almost all cases, that like it's an enabling factor that's gonna kind of propagate through society.
And particularly as life expectancies get longer and so forth, like there's just a lot more need for a greater percentage of the population to kind of just be serviced with a high level of efficiency because otherwise we're gonna have a really hard time kind of scaling to what's ahead in the next 50 years.
- Yeah, and you're absolutely right. Every technology has negative consequences and positive consequences. And we tend to focus on the negative a little bit too much. In fact, autonomous trucks are often brought up as an example of artificial intelligence and robots in general taking our jobs. And as we've talked about briefly here, we talk a lot with Steve, you know, that it is a concern that automation will take away certain jobs, it'll create other jobs.
So there's temporary pain, hopefully temporary, but pain is pain and people suffer. And that human suffering is really important to think about how, but trucking is, I mean, there's a lot written on this, is I would say far from the thing that will cause the most pain. - Yeah, there's even more positive properties about trucking where not only is there just a huge shortage which is gonna increase, the average age of truck drivers is getting closer to 50 because the younger people aren't wanting to come into it.
They're trying to like incentivize, lower the age limit, like all these sorts of things. And the demand is just gonna increase. And the least favorable, like, I mean, it depends on the person, but in most cases, the least favorable types of routes are the massive long haul routes where you're on the road away from your family 300 plus days a year.
- Steve's talked about the pain of those kinds of routes from a family perspective. You're basically away from family. It's not just hours, you work insane hours, but it's also just time away from family. - Obesity rate is through the roof because you're just sitting all day. Like it's really, really tough.
And that's also where like the biggest kind of safety risk is because of fatigue. And so when you think of the gradual evolution of how trucking comes in, first of all, it's not overnight. It's gonna take decades to kind of phase in all the, like there's just a long, long, long road ahead, but the routes and the portions of trucking that are gonna require humans the longest and benefit the most from humans are the short haul and most complicated kind of more urban routes, which are also the more pleasant ones, which are less continual driving time, more flexibility on like geography and location.
And you get to kind of sleep at your own home. - And very importantly, if you optimize the logistics, you're going to use humans much better and thereby pay them much better. Because like one of the biggest problems is truck drivers currently are paid by like how much they drive.
So they really feel the pain of inefficient logistics. Because like if they're just sitting around for hours, which they often do not driving, waiting, they're not getting paid for that time. - That's right. - So like logistics has a significant impact on the quality of life of a truck driver.
- And a high percentage of trucks are like empty because of inefficiencies in the system. Yeah, it's one of those things where like, and the other thing is when you increase the efficiency of a system like this, the overall net like volume of the system tends to increase, right?
Like the entire market cap of trucking is going to go up when the efficiency improves and facilitates both growth in industries and better utilization of trucking. And so that on its own just creates more and more demand, which of all the places where AI comes in and starts to really kind of reshape an industry, this is one of those where like, there's just a lot of positives that for at least any time in the foreseeable future seem really lined up in a good way to kind of come in and help with the shortage and start to kind of optimize for the routes that are most dangerous and most painful.
- Yeah, so this is true for trucking, but if we zoom out broader, automation and AI does technology broadly, I would say, but automation is a thing that has a potential in the next couple of decades to shift the kind of jobs available to humans. And so that results in, like I said, human suffering because people lose their jobs, there's economic pain there.
And there's also a pain of meaning. So for a lot of people, work is a source of meaning, it's a source of identity, of pride, pride in getting good at the job, pride in craftsmanship and excellence, which is what truck drivers talk about. But this is true for a lot of jobs.
And is that something you think about as a sort of a robotic is zooming out from the trucking thing, like where do you think it would be harder to find activity and work that's a source of identity, a source of meaning in the future? - I do think about it because you want to make sure that you worry about the entire system, like not just like the part of how it plays in it, but what are the ripple effects of it down the road.
And on enough of a time window, there's a lot of opportunity to put in the right policies, the right opportunities to kind of reshape and retrain and find those openings. And so just to give you a few examples, both trucking and cars, we have remote assistance facilities that are there to interface with customers and monitor vehicles and provide like very focused kind of assistance on kind of areas where the vehicle may want to request help in understanding an environment.
So those are jobs that kind of get created and supported. I remember like taking a tour of one of the Amazon facilities where you've probably seen the Kiva Systems robots, where you have these orange robots that have automated the warehouse, like kind of picking and collecting of items. And it's like really elegant and beautiful way.
It's actually one of my favorite applications of robotics of all time. Like I think it kind of came across that company like 2006, it was just amazing. And what was the-- - Warehouse robots that transport little things. - So basically instead of a person going and walking around and picking the seven items in your order, these robots go and pick up a shelf and move it over in a row where like the seven shelves that contain the seven items are lined up in a laser or whatever points to what you need to get.
And you go and pick it and you place it to fill the order. And so the people are fulfilling the final orders. What was interesting about that is that when I was asking them about like kind of the impact on labor, when they transitioned that warehouse, the throughput increased so much that the jobs shifted towards the final fulfillment, even though the robots took over entirely the search of the items themselves and the labor, the job stayed like nobody, like there was actually the same amount of jobs roughly that were necessary, but the throughput increased by like, I think over two X or some amount, right?
Like, so you have these situations that are not zero sum games in this really interesting way. And the optimist to me thinks that there's these types of solutions in almost any industry where the growth that's enabled creates opportunities that you can then leverage, but you gotta be intentional about finding those and really helping make those links.
Because even if you make the argument that like, there's a net positive, locally there's always tough hits that you gotta be very careful about. - That's right. You have to have an understanding of that link because there's a short period of time, whether training is acquired or just mental transition or physical or whatever is acquired, that's still gonna be short-term pain.
The uncertainty of it, there's families involved. I mean, it's exceptionally, it's difficult on a human level and you have to really think about that. You can't just look at economic metrics always, it's human beings. - That's right. And you can't even just take it as like, okay, well, we need to like subsidize or whatever, because like there is an element of just personal pride where majority of people don't wanna just be okay, but like they wanna actually like have a craft, like you said, and have a mission and feel like they're having a really positive impact.
And so my personal belief is that there's a lot of transferability and skillset that is possible, especially if you create a bridge and an investment to enable it. And to some degree, that's our responsibility as well, this process. - You mentioned Kiva Robots, Amazon. Let me ask you about the Astro Robot, which is, I don't know if you've seen it.
Amazon has announced that it's a home robot that they have, the screen looks awfully a lot like Cosmo, has I think different vision probably. What are your thoughts about like home robotics in this kind of space? There's been quite a bunch of home robots, social robots that very unfortunately have closed their doors for various reasons, perhaps they were too expensive, there's been manufacturing challenges, all that kind of stuff.
What are your thoughts about Amazon getting into this space? - Yeah, we had some signs that they're getting into it like long, long, long ago. Maybe they were a little bit too interested in Cosmo and I drew in our conversations, but they're also very good partners actually for us as we kind of just integrated a lot of shared technology.
- If I could also get your thoughts on, you could think of Alexa as a robot as well, Echo. Do you see those as fundamentally different just because you can move and look around, is that fundamentally different than the thing that just sits in place? - It opens up options, but my first reaction is I think like, I have my doubts that this one's gonna hit the mark because I think for the price point that it's at and the kind of functionality and value propositions that they're trying to put out, it's still searching for the kill application that justifies, I think it was like a $1,500 price point or kind of somewhere on there.
That's a really high bar. So there's enthusiasts and early adopters will obviously kind of pursue it, but you have to really, really hit a high mark at that price point, which we always tried to, we were always very cautious about jumping too quickly to the more advanced systems that we really wanted to make, but would have raised the bar so much you have to be able to hit it in today's cost structures and technologies.
The mobility is an angle that hasn't been utilized, but it has to be utilized in the right way. And so that's gonna be the biggest challenge is like, can you meet the bar of what the mass market consumer, like think like our neighbors, our friends, parents, like would they find a deep, deep value like in this at a mass scale that justifies the price point?
I think that's in the end, one of the biggest challenges for robotics, especially consumer robotics, where you have to kind of meet that bar, it becomes very, very hard. - And there's also the higher bar, just like you were saying with Cosmo of, you know, a thing that can look one way and then turn around and look at you, that's either a super desirable quality or super undesirable quality, depending on how much you trust the thing.
- That's right. So there's a problem of trust to solve there. There's a problem of personality. It's the thing that is the quote unquote problem that Cosmo solved so well, is that you trust the thing. And that has to do with the company, with the leadership, with the intent that's communicated by the device and the company and everything together.
- Yeah, exactly right. And so, and I think they also have to retrace some of the like warnings on the character side where like, as usual, I think that's the place where it's a, a lot of companies are great at the hardware side of it and can, you know, think about those elements.
And then there's like, you know, the thinking about the AI challenges, particularly with the advantage of Alexa is a pretty huge boost for them. The character side of it for technology companies is pretty new, novel territory. And so that will take some iterations, but yeah, I mean, I hope, I hope there's continued progress in this space and that thread doesn't kind of go dormant for too long.
And it's not, you know, it's gonna take a while to kind of evolve into like the ideal applications, but you know, this is one of Amazon's, I guess like you could call it, it's definitely like part of their DNA, but in many cases is also strength where they're very willing to like iterate kind of aggressively and move quickly.
- Not take risks. - And take risks. - You have deep pockets so you can kind of. - Yeah, and then maybe have more misfires than an apple would, but you know, it's different styles and different approaches. And you know, at the end of the day, it's like there's a few familiar kind of elements there for sure, which was, you know, kind of.
- Homage. Is one way to put it. So why is it so hard at a high level to build a robotics company, a robotics company that lives for a long time? So if you look at, so I thought Cosmo for sure would live for a very long time. That to me was exceptionally successful vision and idea and implementation.
iRobot is an example of a company that has pivoted in all the right ways to survive and arguably thrive by focusing on the, having like a, have a driver that constantly provides profit, which is the vacuum cleaner. And of course there's like Amazon, what they're doing is they're almost like taking risks so they can afford it because they have other sources of revenue, right?
But outside of those examples, most robotics companies fail. - Yeah. - Why do they fail? Why is it so hard to run a robotics company? - iRobot's impressive because they found a really, really great fit of where the technology could satisfy a really clear use case and need. And they did it well, and they didn't try to overshoot from a cost to benefit standpoint.
Robotics is hard because it like tends to be more expensive. It combines way more technologies than a lot of other types of companies do. If I were to like say one thing that is maybe the biggest risk in like a robotics company failing is that it can be either a technology in search of a application, or they try to bite off a kind of an offering that has a mismatch in kind of price to function.
And just the mass market appeal isn't there. And consumer products are just hard. It's just, I mean, after all the years in it, like definitely kind of feel a lot of the battle scars because you have, you know, you not only do you have to like hit the function, but you have to educate and explain, get awareness up, deal with different types of consumers.
Like, you know, there's a reason why a lot of technologies sometimes start in the enterprise space and then kind of continue forward in the consumer space. Even like, you know, you see AR like starting to kind of make that shift with HoloLens and so forth in some ways, consumers and price points that they're willing to kind of be attracted in a mass market way.
And I don't mean like, you know, 10,000 enthusiasts bought it, but I mean like, you know, 2 million, 10 million, 50 million, like mass market kind of interest, you know, have bought it. That bar is very, very high. And typically robotics is novel enough and non-standardized enough to where it pushes on price points so much that you can easily get out of range where the capabilities in today's technology or just the function that was picked just doesn't line up.
And so that product market fit is very important. - So the space of killer apps or rather super compelling apps is much smaller because it's easy to get outside of the price range. - Yeah. - For most consumers. - And it's not constant, right? Like, that's why like we picked off entertainment because the quality was just so low in physical entertainment that we felt we could leapfrog that and still create a really compelling offering at a price point that was defensible.
And we, like that proved out to be true. And over time, that same opportunity opens up in healthcare, in home applications, in commercial applications and kind of broader, more generalized interface. But there's missing pieces in order for that to happen. And all of those have to be present for it to line up.
And we see these sort of trends in technology where, you know, kind of technologies that start in one place evolve and kind of grow to another. Some things start in gaming, some things start in space or aerospace and then kind of move into the consumer market. And sometimes it's just a timing thing, right?
Where how many stabs at what became the iPhone were there over the 20 years before that just weren't quite ready in the function relative to the kind of price point and complexity. - And sometimes it's a small detail of the implementation that makes all the difference, which is design is so important.
- Something, yeah, like the new generation UX, right? - Yeah. - And that's, it's tough. And oftentimes all of them have to be there and it has to be like a perfect storm. And, but yeah, history repeats itself in a lot of ways in a lot of these trends, which is pretty fascinating.
- Well, let me ask you about the humanoid form. What do you think about the Tesla bot and humanoid robotics in general? So obviously to me, autonomous driving, Waymo and the other companies working in the space, that seems to be a great place to invest in potential revolutionary application of robotics, application, folks, application.
What's the role of humanoid robotics? Do you think Tesla bot is ridiculous? Do you think it's super promising? Do you think it's interesting, full of mystery? Nobody knows. What do you think about this thing? - Yeah, I think today humanoid form robotics is research. There's very few situations where you actually need a humanoid form to solve a problem.
If you think about it, right, like wheels are more efficient than legs. There's joints and degrees of freedom beyond a certain point, just add a lot of complexity and cost, right? So if you're doing a humanoid robot, oftentimes it's in the pursuit of a humanoid robot, not in the pursuit of an application for the time being.
Especially when you have like kind of the gaps in interface and, you know, kind of AI that we kind of talk about today. So anything Elon does, I'm interested in following. So there's an element of that world. - No matter how crazy. - How crazy it is. It's like, you know, I'll pay attention and I'm curious to see what comes out of it.
So it's like, you can't ever, you know, ignore it. But, you know, it's definitely far afield from their kind of core business, obviously. And- - What was interesting to me, is I've disagreed with Elon a lot about this, is to me, the compelling aspect of the humanoid form and a lot of kind of robots, Cosmo, for example, is a human robot interaction part.
From Elon Musk's perspective, Tesla bot has nothing to do with the human. It's a form that's effective for the factory because the factory is designed for humans. But to me, the reason you might want to argue for the humanoid form is because, you know, at a party, it's a nice way to fit into the party.
The humanoid form has a compelling notion to it in the same way that Cosmo is compelling. I would argue, if we were arguing about this, that it's cheaper to build a Cosmo, like that form. But if you wanted to make an argument, which I have with Jim Keller about, you know, you could actually make a humanoid robot for pretty cheap, it's possible.
And then the question is, all right, if you're using an application where it can be flawed, it can have a personality and be flawed in the same way that Cosmo is, then maybe it's interesting for integration to human society. That's, to me, is an interesting application of a humanoid form, 'cause humans are drawn, like I mentioned to you, legged robots, we're drawn to legs and limbs and body language and all that kind of stuff.
And even a face, even if you don't have the facial features, which you might not want to have to reduce the creepiness factor, all that kind of stuff. But yeah, that, to me, the humanoid form is compelling. But in terms of that being the right form for the factory environment, I'm not so sure.
- Yeah, for the factory environment, like right off the bat, what are you optimizing for? Is it strength? Is it mobility? Is it versatility, right? Like that changes completely the look and feel of the robot that you create. You know, and almost certainly the human form is over-designed for some dimensions and constrained for some dimensions.
And so, like, what are you grasping? Is it big? Is it little, right? And how do you customize it and make it customizable for the different needs, if that was the optimization, right? And then, you know, for the other one, I could totally be wrong. You know, I still feel that the closer you try to get to a human, the more you're subject to the biases of what a human should be, and you lose flexibility to shift away from your weaknesses and towards your strengths.
And that changes over time, but there's ways to make really approachable and natural interfaces for robotic kind of characters and, you know, and kind of deployments in these applications that do not at all look like a human directly, but that actually creates way more flexibility and capability and role and forgiveness and interface and everything else.
- Yeah, it's interesting, but I'm still confused by the magic I see in legged robots. - Yeah, so there is a magic. So I'm absolutely amazed at it from a technical curiosity standpoint and like the magic that like the Boston Dynamics team can do from, you know, like from walking and jumping and so forth.
Now, like there's been a long journey to try to find an application for that sort of technology, but wow, that's incredible technology, right? - Yes. - Then you kind of go towards, okay, are you working back from a goal of what you're trying to solve or are you working forward from a technology and then looking for a solution?
And I think that's where it's a kind of a bidirectional search oftentimes, but you got to, the two have to meet. And that's where humanoid robots is kind of close to that in that like it is a decision about a form factor and a technology that it forces that doesn't have a clear justification on why that's the killer app or, you know, from the other end.
- I think the core fascinating idea with the Tesla bot is the one that's carried by Waymo as well is when you're solving the general robotics problem of perception control, where there's the very clear applications of driving. It's as you get better and better at it, when you have like Waymo driver, the whole world starts to kind of start to look like a robotics problem.
So it's very interesting for now. - Detection, classification, segmentation, tracking, planning, like it's, yeah. - So there's no reason, I mean, I'm not speaking for Waymo here, but you know, moving goods, there's no reason transformer like this thing couldn't take the goods up an elevator, you know? Like slowly expand what it means to move goods and expand more and more of the world into a robotics problem.
- Well, that's right. And you start to like, think of it as an end-to-end robotics problem from like loading from, you know, from everything. And even like the truck itself, you know, today's generation is integrating into today's understanding of what a vehicle is, right? The Pacifica, Jaguar, the Freightliners from Daimler.
There's nothing that stops these, us from like down the road after like starting to get to scale, to like expand these partnerships to really rethink what would the next generation of a truck look like that is actually optimized for autonomy, not for today's world. And maybe that means a very different type of trailer.
Maybe that like, there's a lot of things you could rethink on that front, which is on its own, very, very exciting. - Let me ask you, like I said, you went to the Mecca of robotics, which is CMU, Carnegie Mellon University. You got a PhD there. So maybe by way of advice and maybe by way of story and memories, what does it take to get a PhD in robotics at CMU?
And maybe you can throw in there some advice for people who are thinking about doing work in artificial intelligence and robotics and are thinking about whether to get a PhD. - It's funny, I asked you what, I was at CMU for undergrad as well and didn't know anything about robotics coming in and was doing, you know, electrical computer engineering, computer science, and really got more and more into kind of AI and then fell in love with autonomous driving.
And at that point, like that was just by a big margin, like such a incredible, like central spot of investment in that area. And so what I would say is that like robotics, like for all the progress that's happened is still a really young field. There's a huge amount of opportunity.
Now that opportunity shifted where something like autonomous driving has moved from being very research and academics driven to being commercial driven where you see the investments happening in commercial. Now there's other areas that are much younger and you see like kind of grasping and manipulation, making kind of the same sort of journey that like autonomy made, and there's other areas as well.
What I would say is the space moves very quickly. Anything you do a PhD in, like it is in most areas will evolve and change as technology changes and constraints change and hardware changes and the world changes. And so the beautiful thing about robotics is it's super broad. It's not a narrow space at all, and it could be a million different things in a million different industries.
And so it's a great opportunity to come in and get a broad foundation on AI, machine learning, computer vision, systems, hardware, sensors, all these separate things. You do need to like go deep and find something that you're like really, really passionate about. Obviously, like just like any PhD, this is like a five, six year kind of endeavor.
And you have to love it enough to go super deep to learn all the things necessary to be super deeply functioning in that area and then contribute to it in a way that hasn't been done before. And in robotics, that probably means more breadth because robotics is rarely kind of like one particular kind of narrow technology.
And it means being able to collaborate with teams where like one of the coolest aspects of like the experience that I like kind of cherish in our PhD is that we actually had a pretty large AV project that for that time was like a pretty serious initiative where you got to like partner with a larger team and you had the experts in perception and the experts in planning and the staff and the mechanical engineers.
- It was a DARPA challenge. - So I was working on a project called UPI back then, which was basically the off-road version of the DARPA challenge. It was a DARPA funded project for basically like a large off-road vehicle that you would like drop and then give it a waypoint 10 kilometers away and it would have to navigate a completely unstructured-- - In an off-road environment.
- Yeah, so like forest, ditches, rocks, vegetation. And so it was like a really, really interesting kind of a hard problem where like wheels would be off to my shoulders. It's like gigantic, right? - Yeah, by the way, AV for people stands for autonomous vehicles. - Autonomous vehicles, yeah, sorry.
And so what I think is like the beauty of robotics, but also kind of like the expectation is that there's spaces in computer science where you can be very, very narrow and deep. Robotics, the necessity, but also the beauty of it is that it forces you to be excited about that breadth and that partnership across different disciplines that enable it.
But that also opens up so many more doors where you can go and you can do robotics in almost any category where robotics isn't really an industry. It's like AI, right? It's like the application of physical automation to all these other worlds. And so you can do robotic surgery, you can do vehicles, you can do factory automation, you can do healthcare, you can do like leverage the AI around the sensing to think about static sensors and scene understanding.
So I think that's gotta be the expectation and the excitement and it breeds people that are probably a little bit more collaborative and more excited about working in teams. - If I could briefly comment on the fact that the robotics people I've met in my life from CMU and MIT, they're really happy people.
- Yeah. - Because I think it's the collaborative thing. I think you don't... (laughs) You're not like sitting in like the fourth basement. - Yes, exactly. Which when you're doing machine learning purely software, it's very tempting to just disappear into your own hole and never collaborate. And that breeds a little bit more of the silo mentality of like, I have a problem.
It's almost like negative to talk to somebody else or something like that. But robotics folks are just very collaborative, very friendly and just... And there's also an energy of like, you get to confront the physics of reality often, which is humbling and also exciting. So it's humbling when it fails and exciting when it finally works.
- It's like a purity of the passion. You got to remember that like right now, like robotics and AI is like just all the rage and autonomous vehicles and all this. Like 15 years ago and 20 years ago, like it wasn't that deeply lucrative. People that went into robotics, they did it because they were like, thought it was just the coolest thing in the world.
They wanted to like make physical things intelligent in the real world. And so there's like a raw passion where they went into it for the right reasons and so forth. So it's really great space. And that organizational challenge, by the way, like when you think about the challenges in AV, we talk a lot about the technical challenges.
The organizational challenges through the roof, where you think about what it takes to build an AV system and you have companies that are now thousands of people. And you look at other really hard technical problems, like an operating system, it's pretty well established. Like you kind of know that there's a file system, there's virtual memory, there's this, there's that, there's like caching and like, and there's like a really reasonably well-established modularity and APIs and so forth.
And so you can kind of like scale it in an efficient fashion that doesn't exist anywhere near to that level of maturity in autonomous driving right now. And tech stacks are being reinvented, organizational structures are being reinvented. You have problems like pedestrians that are not isolated problems. They're part sensing, part behavior prediction, part planning, part evaluation.
And like one of the biggest challenges is actually how do you solve these problems where the mental capacity of a human is starting to get strained on how do you organize it and think about it, where you have this like multi-dimensional matrix that needs to all work together. And so that makes it kind of cool as well, because it's not like solved at all from, you know, like what does it take to actually scale this?
Right? And then you look at like other gigantic challenges that have, you know, that have been successful and are way more mature, there's a stability to it. And like, maybe the autonomous vehicle space will get there, but right now, just as many technical challenges as they are, they're like organizational challenges on how do you like solve these problems that touch on so many different areas and efficiently tackle them while like maintaining progress among all these constraints while scaling.
- By way of advice, what advice would you give to somebody thinking about doing a robotic startup? You mentioned Cosmo, somebody that wanted to carry the Cosmo flag forward, the Anki flag forward. Looking back at your experience, looking forward at a future that will obviously have such robots, what advice would you give to that person?
- Yeah, it was the greatest experience ever. And it's like, there's something you, there are things you learn navigating a startup that you'll never like, it was very hard to encounter that in like a typical kind of work environment. And it's just, it's wonderful. You gotta be ready for it.
It's not as like, you know, the glamor of a startup, there's just like just brutal emotional swings up and down. And so having co-founders actually helps a ton. Like I would not, could not imagine doing it solo, but having at least somebody where on your darkest days, you can kind of like really openly just like have that conversation and, you know, lean onto somebody that's in the thick of it with you, helps a lot.
What I would say-- - What was the nature of darkest days and the emotional swings? Is it worried about the funding? Is it worried about whether any of your ideas are any good or ever were good? Is it like the self-doubt? Is it like facing new challenges that have nothing to do with the technology, like organizational, human resources, that kind of stuff?
- Yeah, you come from a world in school where you feel that you put in a lot of effort and you'll get the right result and input translates proportional to output. And, you know, you need to solve the set or do whatever and just kind of get it done.
Now, PhD tests out a little bit, but at the end of the day, you put in the effort, you tend to like kind of come out with your enough results to you kind of get a PhD. In the startup space, like, you know, like you could talk to 50 investors and they just don't see your vision and it doesn't matter how hard you kind of tried and pitched.
You could work incredibly hard and you have a manufacturing defect. And if you don't fix it, you're out of business. You need to raise money by a certain date. And there's a, you got to have this milestone in order to like have a good pitch and you do it.
You have to have this talent and you just don't have it inside the company. Or, you know, you have to get 200 people or however many people kind of like along with you and kind of buy in the journey. You're like disagreeing with an investor and they're your investor.
So it's just like, you know, it's like, there's no walking away from it, right? So, and it tends to be like those things where you just kind of get clobbered in so many different ways that like things end up being harder than you expect. And it's like such a gauntlet, but you learn so much in the process.
And there's a lot of people that actually end up rooting for you and helping you like from the outside and you get good, great mentors and you like get find fantastic people that step up in the company. And you have this like magical period where everybody's like, it's life or death for the company, but like you're all fighting for the same thing and it's the most satisfying kind of journey ever.
The things that make it easier and that I would recommend is like be really, really thoughtful about the application. Like there's a saying of like, kind of, you know, team and execution and market and like kind of how important are each of those. And oftentimes the market wins and you come at it thinking that if you're smart enough and you work hard enough and you're like have the right talent to team and so forth, like you'll always kind of find a way through.
And it's surprising how much dynamics are driven by the industry you're in and the timing of you entering that industry. And so just Waymo is a great example of it. There is, I don't know if there'll ever be another company or suite of companies that has raised and continues to spend so much money at such an early phase of revenue generation and productization, you know, from a P&L standpoint.
Like it's an anomaly, like by any measure of any industry that's ever existed, except for maybe the US space program, right? But it's like a multiple trillion dollar opportunities, which is so unusual to find that size of a market that just the progress that shows the de-risking of it, you could apply whatever discounts you want off of that trillion dollar market and it still justifies the investment that is happening because like being successful in that space makes all the investments feel trivial.
Now, by the same consequence, like the size of the market, the size of the target audience, the ability to capture that market share, how hard that's gonna be, who the accumbents, like that's probably one of the lessons I appreciate like more than anything else, where like those things really, really do matter.
And oftentimes can dominate the quality of the team or execution, because if you miss the timing or you do it in the wrong space, or you run into like the institutional kind of headwinds of a particular environment, like let's say you have the greatest idea in the world, but you barrel into healthcare, but it takes 10 years to innovate in healthcare because of a lot of challenges, right?
Like there's fundamental laws of physics that you have to think about. And so the combination of like Anki and Waymo kind of drives that point home for me, where you can do a ton if you have the right market, the right opportunity, the right way to explain it, and you show the progress in the right sequence, it actually can really significantly change the course of your journey and startup.
- How much of it is understanding the market and how much of it is creating a new market? So how do you think about, like the space of robotics is really interesting. You said exactly right, the space of applications is small, relative to the cost involved. So how much is like truly revolutionary thinking about like what is the application?
And then, yeah, but so like creating something that- - Didn't exist. - Didn't really exist. Like, this is pretty obvious to me, the whole space of home robotics, just everything that Cosmo did, I guess you could talk to it as a toy and people will understand it, but Cosmo is much more than a toy.
And I don't think people fully understand the value of that. You have to create it and the product will communicate it. Like just like the iPhone, nobody understood the value of no keyboard and a thing that can do web browsing. I don't think they understood the value of that until you create it.
- Yeah, having a foot in the door and an entry point still helps because at the end of the day, like an iPhone replaced your phone. And so it had a fundamental purpose and it has all these things that it did better, right? And so then you could do ABC on top of it.
And then like, you even remember the early commercials where it's always like one application of what it could do and then you get a phone call. And so that was intentionally sending a message, something familiar, but then like, you can send a text message, you can listen to music, you can surf the web, right?
And so, autonomous driving obviously anchors on that as well. You don't have to explain to somebody the functionality of an autonomous truck, right? Like there's nuances around it, but the functionality makes sense. In the home, you have a fundamental advantage. Like we always thought about this because it was so painful to explain to people what our products did and how to communicate that super cleanly, especially when something was so experiential.
And so you compare like Anki to Nest. Nest had some beautiful products where they started scaling and like actually found like really great success. And they had like really clean and beautiful marketing messaging because they anchored on reinventing existing categories where it was a smart thermostat, right? And like, and so you kind of are able to take what's familiar, anchor that understanding, and then explain what's better about it.
- That's funny, you're right. Cosmo is like totally new thing. Like what is this thing? - It's we struggle. We spent like a lot of money on marketing. We had a hard, like we actually had far greater efficiency on Cosmo than anything else 'cause we found a way to capture the emotion in some little shorts to kind of wean into the personality in our marketing.
And it became viral where like we had these kind of videos that would like go and get like hundreds of thousands of views and like kind of like get spread and sometimes millions of views. And so, but it was like really, really hard. And so finding a way to kind of like anchor on something that's familiar but then grow into something that's not is an advantage.
But then again, like you don't have like their successes otherwise like Alexa never had a comp, right? You could argue that that's very novel and very new. And there's a lot of other examples that kind of created kind of a category out of like Kiva Systems. I mean, they like came in and they like, enterprise is a little easier because if you can, it's less susceptible to this because if you can argue a clear value proposition it's a more logical conversation that you can have with customers.
It's not, it's a little bit less emotional and kind of subjective, but. - Yeah, in the home you have to. - Yeah, it's like a home robot. It's like, what does that mean? And so then you really have to be crisp about the value proposition and what like really makes it worth it.
Like, and we, by the way, went to that same order. We almost like, we almost hit a wall coming out of 2013 where we were so big on explaining why our stuff was so high tech and all the kind of like great technology in it and how cool it is and so forth to having to make a super hard pivot on why is it fun and why does the random kind of family of four need this.
Right, like, so it's learnings, but that's the challenge. And I think like robotics tends to sometimes fall into the new category problem, but then you gotta be really crisp about why it needs to exist. - Well, I think some of robotics, depending on the category, depending on the application is a little bit of a marketing challenge.
And I don't mean, I mean, that's the kind of marketing that Waymo's doing, that Tesla is doing, is like showing off incredible engineering, incredible technology, but convincing, like you said, a family of four, that this is transformative for your life. This is fun. - They don't care how much tech is in your thing.
- They don't, they really don't care. - It's like, they need to know why they want it, so. - And some of that is just marketing. - Yeah, and that's why like Roomba, like, yes, they didn't, you know, like go and have this like, you know, huge, huge ramp into like the entirety of like kind of AI robotics and so forth.
But like, they built a really great business in the vacuum cleaner world. And like, everybody understands what a vacuum cleaner is. Most people are annoyed by doing it. And now you have one that like kind of does it itself, the various degrees of quality, but that is so compelling that like, it's easy to understand.
And like, and they had a very kind of, and I think they have like 15% of the vacuum cleaner market. So it's like pretty successful, right? - I think we need more of those types of thoughtful stepping stones in robotics, but the opportunities are becoming bigger because hardware is cheaper, computes cheaper, clouds cheaper, and AI is better.
So there's a lot of opportunity. - If we zoom out from specifically startups and robotics, what advice do you have to high school students, college students about career and living a life that you'd be proud of? You lived one heck of a life. You're very successful in several domains.
If you can convert that into a generalizable potion, what advice would you give? - Yeah, it's a very good question. So it's very hard to go into a space that you're not passionate about and push hard enough to be, you know, to like maximize your potential in it. And so there's always kind of like the saying of like, okay, follow your passion, great.
Try to find the overlap of where your passion overlaps with like a growing opportunity and need in the world, where it's not too different than the startup kind of argument that we talked about, where if you are- - Where your passion meets the market. - Right, you know what I mean?
Like, 'cause it's like, that's a beautiful thing where like, you can do what you love, but it's also just opens up tons of opportunities 'cause the world's ready for it, right? Like, and so like, if you're interested in technology, that might point to like, go and study machine learning, because you don't have to decide what career you're gonna go into, but it's gonna be such a versatile space that's gonna be at the root of like everything that's gonna be in front of us, that you can have eight different careers in different industries and be an absolute expert in this like kind of tool set that you wield that can go and be applied.
And that doesn't apply to just technology, right? It could be the exact same thing if you wanna, you know, same thought process applies to design, to marketing, to, you know, to sales, to anything, but that versatility where you like, when you're in a space that's gonna continue to grow, it's just like, what company do you join?
One that just is gonna grow and the growth creates opportunities where the surface area is just gonna increase and the problems will never get stale. And you can have, you know, many like, and so you go into a career where you have that sort of growth in the world that you're in, you end up having so much more opportunity that organically just appears, and you can then have more shots on goal to find like that killer overlap of timing and passion and skill set and point in life where you can like, you know, just really be motivated and fall in love with something.
And then at the same time, like find a balance. Like there's been times in my life where I worked like a little bit too obsessively and, you know, and crazy. And I, you know, think we kind of like tried to correct it, you know, kind of the right opportunities, but, you know, I think I probably appreciate a lot more now friendships to go way back, you know, family and things like that.
And I kind of have the personality where I could use, like I have like so much desire to really try to optimize, like, you know, what I'm working on that I can easily go to kind of an extreme. And now I'm trying to like kind of find that balance and make sure that I have the friendships, the family, like relationship with the kids, everything that like, I don't, I push really, really hard, but it kind of find a balance.
And I think people can be happy on actually many kind of extremes on that spectrum, but it's easy to kind of inadvertently make a choice by how you approach it, that then becomes really hard to unwind. And so being very thoughtful about kind of all of those dimensions makes a lot of sense.
And so, I mean, those are all interrelated, but at the end of the day- - Love, passion and love. Love towards, you said- - Yeah, family, friends. - Family. And hopefully one day, if your work pans out, Boris, is love towards robots. - Love towards robots. Not the creepy kind, the good kind.
- The good kind. Just friendship and fun. - Yeah, it's like another dimension to just how we interface with the world. Yeah. - Boris, you're one of my favorite human beings, roboticist. You've created some incredible robots and I think inspired countless people. And like I said, I hope Cosmo, I hope your work with Anki lives on.
And I can't wait to see what you do with Waymo. I mean, if we're talking about artificial intelligence technology, there's the potential to revolutionize so much of our world. That's it right there. So thank you so much for the work you've done and thank you for spending your valuable time talking with me.
- Thanks, Alex. - Thanks for listening to this conversation with Boris Sofman. To support this podcast, please check out our sponsors in the description. And now let me leave you with some words from Isaac Asimov. If you were to insist I was a robot, you might not consider me capable of love in some mystic human sense.
Thank you for listening and hope to see you next time. (upbeat music) (upbeat music)