
Why Google failed to make GPT-3 -- with David Luan of Adept


Chapters

0:00 Introduction of David Luan, CEO and co-founder of Adept
1:14 David's background and career trajectory
3:20 Transition from reinforcement learning to transformers in the AI industry
5:35 History and development of GPT models at OpenAI and Google
13:08 Adept's $420 million funding rounds
13:38 Explanation of what Adept does and their vision for AI agents
19:20 Reasons for Adept becoming more public-facing
21:00 Adept's critical path and research directions (Persimmon, Fuyu, ACT-1)
26:23 How AI agents should interact with software and impact product development
30:37 Analogies between AI agents and self-driving car development
32:42 Balancing reliability, cost, speed and generality in AI agents
35:11 Adept's unique positioning and advantages in the AI industry
37:30 Potential of foundation models for robotics
39:22 Core research questions and reasons to work at Adept
40:57 David's closing thoughts on the AI agent space and industrialization of AI

Transcript

(upbeat music) ♪ Podcast, we dive right in ♪ ♪ Exploring the world, the guts where new begins begin ♪ ♪ David Luan, the founder ♪ ♪ A visionary in his own right ♪ ♪ Building autonomous agents ♪ ♪ Taking us to new heights, yeah ♪ - Hey everyone, welcome to the Latent Space Podcast.

This is Alessio, Partner and CTO-in-Residence at Decibel Partners, and I'm joined by my co-host, Swyx, founder of Smol.ai. - Hey, and today we have David Luan, CEO, co-founder of Adept in the studio, welcome. - Yeah, thanks for having me. - Been a while in the works, I met you socially at one of those VC events, and you said that you were interested in coming on, and glad we finally were able to make this happen.

- Yeah, happy to be a part of it. - So we like to introduce the speaker, and then also just have you talk a little bit about what's not on your LinkedIn, what people should just generally know about you. You started a company in college, Dextro, which was the first real-time video detection and classification API, and that was your route to getting acquired into Axon, where you were director of AI.

Then you were the 30th hire at OpenAI? - Yeah, 30, 35, something around there. - Something like that. VP of Eng for two and a half years, two years and a bit, briefly served as tech lead of large models at Google, and then in 2022 started Adept. So that's the sort of brief CV.

- Yeah, more or less. - Yeah, is there anything else you wanna fill in the blanks, or people should know more about? - I guess the broader story was, I joined OpenAI fairly early, and then did that for about, yeah, two and a half to three years, leading engineering there.

It's really funny, I think the second or third day of my time at OpenAI, Greg and Ilya pulled me into a room and were like, "Hey, you should take over our direction. We'll go mostly do IC work." So that was fun, just coalescing a bunch of teams out of a couple of early initiatives that had already happened.

At the company, the Dota effort was going pretty hard, and then more broadly we were trying to put some bigger picture direction around what we were doing with basic research. So I spent a lot of time doing that. And then at Google, I led Google's LLM efforts, but also co-led Google Brain, was one of the Brain leads more broadly.

And I think there's been a couple of different eras of AI research, right? And if we count everything before 2012 as prehistory, which people hate when I say that, you kinda had this "you and your three best friends write a research paper that changes the world" period from like 2012 to 2017.

And then I think the game changed in 2017, and like most labs didn't realize it, but we at OpenAI really did. I think in large part helped by Ilya's constant beating of the drum that the world would be covered in data centers, and I think-- - Scale is all they need.

- Yeah, well, I think we had conviction in that, but it wasn't until we started seeing results that it became clear that that was where we had to go. But also part of it as well was, for OpenAI, when I first joined, I think one of the jobs that I had to do was, how do I tell a differentiated vision for who we were technically? Compared to, hey, we're just a smaller Google Brain, or, you work at OpenAI if you live in SF and don't wanna commute to Mountain View or don't wanna live in London, right?

That's like not enough to hang your technical identity on as a company. And so what we really did, and I spent a lot of time pushing this, is just, how do we get ourselves focused on a certain class of giant swings and bets, right? Like how do you flip the script from you just do bottom-up research to more about, how do you leave some room for that, but really make it about, what are the big scientific outcomes that you wanna show?

And then you just solve them at all costs, whether or not you care about novelty and all that stuff. And that became the dominant model for a couple years, right? And then what's changed now is I think that like the number one driver of AI progress over the next couple of years is gonna be the deep co-design and co-evolution of like product and users for feedback and actual technology.

And I think labs that retool to go do that are gonna do really well. And that's a big part of why I started Adept. - You mentioned Dota. Any memories from the switch from RL to transformers at the time, and how the industry was evolving more on the LLM side and leaving behind some of the more agent simulation work?

- You know, I actually think, zooming way out, that agents are just absolutely the correct long-term direction, right? You just go define what AGI is, right? You're like, hey, well, first off, actually, I don't love AGI definitions that involve human replacement, because I don't think that's actually how it's gonna happen.

I think even this definition of, AGI is something that outperforms humans at economically valuable tasks, carries kind of an implicit view of the world about what's gonna be the role of people. I think what I'm more interested in is a definition of AGI that's oriented around a model that can do anything a human can do on a computer.

And I think if you go think about that, which is like super tractable, then agents are just a natural consequence of that definition. And so what did all the work we did on RL and stuff like that get us? It got us a really clear formulation: you have a goal and you wanna maximize the reward, right?

The natural LLM formulation doesn't come with that out of the box, right? So I think that we, as a field, got a lot right by thinking about, hey, how do we solve problems of that caliber? And then the thing we forgot is that de novo RL is a pretty terrible way to get there quickly.

Why are we rediscovering all the knowledge about the world? Years ago, I had a debate with a Berkeley professor as to what it will actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there, right?

- The biological basis theory. - I think we are ignoring the fact that you have a giant shortcut, which is you can behavioral clone everything humans already know. And that's what we solved with LLMs. We've solved behavioral cloning everything that humans already know, right? So today, maybe LLMs are behavioral cloning every word that gets written on the internet.

In the future, you know, now the multimodal models are becoming more of a thing, where we're behavioral cloning the visual world. But really what we're just gonna have is a universal byte model, right? Where tokens of data that have high signal come in, and then all of those patterns are learned by the model, and then you can regurgitate any combination out, right?

So like, text in to voice out, image in to, I don't know, other image out or video out or whatever, like these mappings, right? All just gonna be learned by this universal behavioral cloner. And so I'm glad we figured that out. And I think now we're back to the era of, how do we combine this with all of the lessons we learned during the RL period? And that's what's gonna drive progress.
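To make the "universal byte model" idea concrete, here is a deliberately tiny sketch: treat any modality as a stream of bytes, and "behavioral clone" it by learning next-byte statistics. A real system would be a large transformer over tokens; a bigram count table is the smallest possible stand-in, and nothing here reflects an actual OpenAI or Adept system.

```python
import numpy as np

# Toy "universal behavioral cloner": any modality is just bytes, and the
# model learns next-byte statistics from whatever streams it sees.
counts = np.ones((256, 256))  # Laplace-smoothed next-byte counts

def clone(stream: bytes) -> None:
    """Behavioral-clone a byte stream by counting next-byte transitions."""
    for prev, nxt in zip(stream, stream[1:]):
        counts[prev, nxt] += 1

def generate(prompt: bytes, n: int, rng=np.random.default_rng(0)) -> bytes:
    """Regurgitate a new combination out of the learned byte patterns."""
    out = bytearray(prompt)
    for _ in range(n):
        probs = counts[out[-1]] / counts[out[-1]].sum()
        out.append(int(rng.choice(256, p=probs)))
    return bytes(out)

clone(b"text in, voice out. image in, video out. ")
print(generate(b"te", 20))
```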

- Interesting. I'm still gonna press you for a few more early OpenAI stories before we turn to the Adept stuff. On your personal site, which I love, 'cause it's really nice, like personal, you know, story context around your history.

- I need to update it, it's so old. - Yeah, it's so out of date. But you mentioned GPT-2. Did you overlap with GPT-1? I think you did, right? - I actually don't quite remember. I think I was joining right around. - Right around that? - I was right around that, yeah.

- Yeah, the canonical story was Alec, you know, just kind of came in and was like very obsessed with transformers and applying them to like Reddit sentiment analysis. - Yeah, yeah, sentiment, that's right, sentiment neuron, all that stuff. - The history of GPT, as far as you know, you know, according to you.

- Ah, okay, history of GPT, according to me, that's a pretty good question. So I think the real story of GPT starts at Google, of course, right? Because that's where transformers sort of came about. The number one shocking thing to me was that, and this is like a consequence of the way that Google's organized, where like, again, like you and your three best friends write papers, right?

Okay, so zooming way out. I think about my job when I was a full-time research leader as a little bit of a portfolio allocator, right? So I've got really, really smart people. My job is to convince people to coalesce around a small number of really good ideas and then run them over the finish line.

My job is not actually to promote a million ideas that never have critical mass. And then as the ideas start coming together and some of them start working well, my job is to nudge resources towards the things that are really working and then start disbanding some of the things that are not working, right?

That muscle did not exist during my time at Google. And I think had they had it, what they would have done would be to say, hey, Noam Shazeer, you're a brilliant guy, you know how to scale these things up. Like, here's half of all of our TPUs. And then I think they would have destroyed us.

- He clearly wanted it too. He's talking about trillion parameter models in 2017. - Yeah, and so I think this gets to the core of the GPT story, right? Which is that, and I'm jumping around historically, right? But like, after GPT-2, we were all really excited about GPT-2, I can tell you more stories about that.

It was the last paper that I even got to really touch before everything became more about just like building a research org. You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right?

Google has all this compute, Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, you know, we're like, we're probably just doing duplicative research to what he's doing, right?

He's got this decoder-only transformer that's probably gonna get there before we do. And I was like, but please just let this model finish, right? And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LLM effort, and I was one of the Brain leads, you know, it became really clear why, right?

At the time, there was a thing called the brain credit marketplace. And did you guys remember the brain credit marketplace? - No, I never heard of this. - Oh, so it's actually, you can ask any Googler, it's like just like a thing that they do. - I mean, look, like yeah, limited resources, you gotta have some kind of marketplace, right?

- You could. - Sometimes it's explicit, sometimes it's just political favors. - You could, and so then like, basically everyone's assigned a credit, right? So if you have a credit, you get to buy end chips according to supply and demand. So if you wanna go do a giant job, you gotta convince like 19 or 20 of your colleagues not to do work.

And if that's how it works, it's like, it's really hard to get that bottom up critical mass to go scale these things. And like, and the team at Google were fighting valiantly, but like, we were able to beat them simply because we took big swings and we focused. And I think, again, that's like part of the narrative of like this phase one of AI, right?

Of like this modern AI era to phase two. And I think in the same way, I think phase three companies can out execute phase two companies because of the same like asymmetry of success. - Yeah, I think it's underrated how much Nvidia worked with you in the early days as well.

I think maybe, I think it was Jensen, I'm not sure who circulated a recent photo of him delivering the first DGX to you guys. - I think Jensen has been a complete legend and a mastermind throughout. I have so much respect for Nvidia, it is unreal. - But did OpenAI kind of give their requirements and co-design it, or did you just work with whatever Nvidia gave you?

- So we work really closely with them. There's, I'm not sure I can share all the stories, but like, I think like examples of ones that I've found particularly interesting. So Scott Gray is amazing. And I really like working with him. He was on one of my teams, the supercomputing team, which Chris Berner runs and Chris Berner still does a lot of stuff in that.

But as a result, we had very close ties to Nvidia. Actually, one of my co-founders at Adept, Erich Elsen, was also one of the early GPGPU people. And so he and Scott, and like Brian Catanzaro at Nvidia, and Jonah and Ian at Nvidia, I think all were very close.

And we're all sort of part of this group of just like, how do we push these chips to the absolute limit? And I think that kind of collaboration helped quite a bit. One interesting set of stuff is just knowing in the A100 generation that 2:4 structured sparsity was gonna be a thing.

Is that something that we wanna go look into, right? And figure out if that's something that we could actually use for model training. And I think more and more people realize this, but like six years ago, or even three years ago, people refused to accept it. This era of AI is really a story of compute. It's really the story of how do you more efficiently map actual usable model flops to compute, right?
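For context on the sparsity feature mentioned here, assuming it refers to the A100's 2:4 fine-grained structured sparsity: in every group of four weights, at most two are nonzero, which the tensor cores can exploit for roughly double matmul throughput. A minimal numpy sketch of the pruning pattern:

```python
import numpy as np

def prune_2_of_4(weights: np.ndarray) -> np.ndarray:
    """Zero the two smallest-magnitude weights in every group of four."""
    w = weights.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(w), axis=1)[:, :2]  # two smallest per group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.random.randn(2, 8).astype(np.float32)
sparse_w = prune_2_of_4(w)
# every group of four now has at most two nonzeros
assert (sparse_w.reshape(-1, 4) != 0).sum(axis=1).max() <= 2
```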

- Yeah, cool. Is there another, you know, sort of GPT-2, GPT-3 story that, you know, you love to get out there, that you think is underappreciated for the amount of work that people put into it?

- So two interesting GPT-2 stories. - Love it. - I spent a good bit of time just sprinting to help Alec get the paper out. And I remember one of the most entertaining moments, we were writing the modeling section. And I'm pretty sure the modeling section was like the shortest modeling section of any ML, like reasonably legitimate ML paper to that moment.

It was like, section three: model. This is a standard vanilla decoder-only transformer with these particular things. It was like a paragraph long, if I remember correctly. And both of us were just looking at it, being like, man, the OGs in the field are gonna hate this.

They're gonna say no novelty. Like, why'd you guys do this work? So now it's funny to look at in hindsight that it was kind of a pivotal kind of paper. But I think it was one of the early ones where we just leaned fully into all we care about is solving problems in AI and not about like, hey, like, is there like four different, like really simple ideas that are cloaked in mathematical language that doesn't actually help move the field forward?

- Right. And it's like, you innovate on maybe like data set and scaling and not so much the architecture. - Yeah. I mean, now, I mean, like we all know how it works now, right? Which is that like, there's a collection of really hard won knowledge that you get only by being at the frontiers of scale.

And that hard won knowledge, a lot of it's not published. A lot of it is like stuff that like, it's actually not even easily reducible to what looks like a typical academic paper. But yeah, that's the stuff that helps differentiate one scaling program from another. - Yeah. You had a second one?

- Hilariously enough, the last meeting we did with Microsoft before Microsoft invested in OpenAI, Sam Altman, myself, and our CFO flew up to Seattle to do the final pitch meeting. And I'd been a founder before, so I always had like a tremendous amount of anxiety about partner meetings, which this basically is what it was, because it's like Kevin Scott and Satya and Amy Hood.

And it was my job to give the technical slides about, you know, what's the path to AGI, what's our research portfolio, all of this stuff. But it was also my job to give the GPT-2 demo. We had a slightly bigger version of GPT-2 that we had just cut maybe a day or two before this flight up.

As we all know now, model behaviors you find predictable at one checkpoint are not predictable in another checkpoint. And so like, I'd spent all this time trying to figure out how to keep this thing on rails, prevent it from saying anything bad. But I had my canned demos, but I knew I had to go turn it around over to like Satya and Kevin and let them type anything in.

And that just, that really kept me up all night. - Nice. - Yeah. - That must have helped you, talking about partners meeting, you raised 420 million for ADAPT. The last round was a $350 million Series B, so I'm sure you do great in partners meetings. - Pitching Phoenix.

- Nice. - No, that's a high compliment coming from a VC. - Yeah, no, I mean, you're doing great already. Let's talk about Adept. And we were doing pre-prep, and you mentioned that maybe a lot of people don't understand what Adept is. So usually we try and introduce the product and then have the founders fill in the blanks, but maybe let's do the reverse.

Like what is ADAPT? - Yeah, so I think ADAPT is like the least understood company in the like broader space of foundation models plus agents. So I'll give some color and I'll explain what it is, and I'll explain also why it's actually pretty different from what people would have guessed.

So the goal for Adept is, we basically wanna build an AI agent that can help humans do anything a human does on a computer. And so what that really means is, we want this thing to be super good at turning natural language goal specifications into the correct set of end steps, and then also have all the correct sensors and actuators to go get that thing done for you across any software tool that you already use.
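As a rough sketch of the loop described here (natural-language goal in, concrete steps out, sensors and actuators to execute them), consider this toy in Python. Every name is hypothetical; Adept has not published this interface.

```python
from typing import Callable

def run_agent(
    goal: str,
    plan: Callable[[str, str], list[str]],  # (goal, observation) -> steps
    act: Callable[[str], str],              # actuator: do one step, re-observe
    observe: Callable[[], str],             # sensor: read current app state
    max_steps: int = 20,
) -> str:
    obs = observe()
    for step in plan(goal, obs)[:max_steps]:  # goal -> end steps
        obs = act(step)                       # click/type/call, then look again
    return obs

# Toy usage with a fake two-step workflow.
state = {"screen": "login page"}
transitions = {"log in": "dashboard", "open report": "report view"}
result = run_agent(
    goal="open this week's report",
    plan=lambda g, o: ["log in", "open report"],
    act=lambda s: (state.update(screen=transitions[s]), state["screen"])[1],
    observe=lambda: state["screen"],
)
print(result)  # -> report view
```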

And so the end vision of this is effectively like, I think in a couple of years, everyone's gonna have access to like an AI teammate that they can delegate arbitrary tasks to at work, and then also be able to use it as a sounding board and like just be way, way, way more productive, right?

And just like changes the shape of every job from something where you're mostly doing execution to something where you're mostly actually doing like these core liberal arts skills of like, what should I be doing and why, right? I find this like really exciting and motivating because I think it's actually a pretty different vision for how AGI will play out.

I think systems like Adept are the most likely systems to be proto-AGIs. But I think the ways in which we are really counterintuitive to everybody is that we've actually been really quiet, because we are not a developer company. We don't sell APIs. We don't sell open source models.

We also don't sell bottom-up products. Like, we're not a thing that you go and click and download the extension, and we want more users signing up for that thing. We're actually an enterprise company. So what we do is, we work with a range of different companies, some late-stage, multi-thousand-person startups, some Fortune 500s, et cetera.

And what we do for them is we basically give them an out-of-the-box solution where big, complex workflows that their employees do every day can be delegated to the model. So we look a little different from other companies in that, in order to go build this full agent thing, the most important thing you gotta get right is reliability.

So initially, zooming way back, one of the first things that Adept did was we released this demo called ACT-1, right? ACT-1 was like pretty cool. It's kind of become a hello-world thing for people to show agent demos: going to Redfin and asking to buy a house somewhere.

'Cause we did that in the original ACT-1 demo, and showed Google Sheets and all this other stuff. But over the last year since that has come out, there's been a lot of really cool demos. And you go play with them and you realize they work 60% of the time.

But since we've always been focused on how do we build an amazing enterprise product, and enterprises can't use anything that isn't in the nines of reliability, we've actually had to go down a slightly different tech tree than what you might find in the prompt engineering sort of plays in the agent space to get that reliability.

And we've decided to prioritize reliability over all else. So one of our use cases is crazy enough that it actually ends with a physical truck being sent to a place as the result of the agent workflow. And if that works like 60% of the time, you're just blowing money and sending poor truck drivers places.

- Interesting. We had one of our investment teams has this idea of services as software. I'm actually giving a talk at NVIDIA GTC about this, but basically software as a service, you're wrapping user productivity in software with agents and services as software is replacing things that you would ask somebody to do and the software just does it for you.

When you think about these use cases, do the users still go in and like look at the agent kind of like doing the things and can intervene or like are these like fully removed from them? Like the truck thing is like, does the truck just show up or like are there people in the middle like checking in?

- Yeah, so actually what's been really interesting is, you could question whether they're fundamental, but I think there's two current flaws in the framing for services as software, or I think what you just said. I think that one of them is, in our experience as we've been rolling out Adept, the people who actually do the jobs are the most excited about it, because they don't go from I do this job to I don't do this job.

They go from, I do this job for everything, including the shitty rote stuff, to, I'm a supervisor. And it's pretty magical when you watch the thing being used, because now it parallelizes a bunch of the things that you had to do sequentially by hand as a human. And you can just click into any one of them and be like, hey, I wanna watch the trajectory that the agent went through to go solve this. And the nice thing about agent execution, as opposed to LLM generations, is that a good chunk of the time when the agent fails to execute, it doesn't give you the wrong result.

It just fails to execute and the whole trajectory is just broken and dead and the agent knows it, right? So then those are the ones that the human then goes and solves and so then they become a troubleshooter. They work on the more challenging stuff. They get way, way more stuff done and they're really excited about it.
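A minimal sketch of that supervision pattern, assuming the property David describes, that agent failures are detectable rather than silently wrong. The names are illustrative, not Adept's actual system:

```python
class TrajectoryBroken(Exception):
    """Raised when the agent can tell its own execution is dead."""

def run_with_oversight(task, agent_execute, human_resolve, retries=2):
    last_error = None
    for _ in range(retries + 1):
        try:
            return agent_execute(task)      # full trajectory succeeds, or
        except TrajectoryBroken as err:     # ...fails detectably, never a
            last_error = err                # silently wrong result
    return human_resolve(task, last_error)  # human steps in as troubleshooter
```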

I think the second piece of it that we've found is like our strategy as a company is to always be an augmentation company and I think one, out of principle, that's something we really care about but two, actually, if you're framing yourself as an augmentation company, you're always gonna live in the world where you're solving tasks that are a little too hard for what the model can do today and still needs a human to provide oversight, provide clarifications, provide human feedback and that's how you build a data flywheel.

That's how you actually learn from the smartest humans how to solve things models can't do today and so I actually think that being an augmentation company forces you to go develop your core AI capabilities faster than someone who's saying, ah, okay, my job's to deliver you a lights-off solution for X.

- Yeah, it's interesting because we've seen two parts of the market. One is we have one company that does agents for SOC analysts. People just don't have them, you know, and just they cannot attract the talent to do it and similarly in software development, you have Copilot, which is the augmentation product and then you have Sweep.dev, any of these products, which is like, they just do the whole thing.

I'm really curious to see how that evolves. I agree that today, the reliability's so important in the enterprise that they just don't use most of them. Yeah, no, that's cool. But it's great to hear the story, because I think from the outside, people are like, oh, Adept, they do ACT-1, they do Persimmon, they do Fuyu, they do all these-- - It's just the public stuff.

- It's just the public stuff, and so I think you're gonna find, so one of the things we haven't shared before is we're completely sold out for Q1. And so I think-- - Sold out of what? - Sold out of bandwidth to go onboard more customers. We're working really hard to make that less of a bottleneck, but our expectation is that we're gonna be significantly more public about the broader product shape and the new types of customers we wanna attract later this year.

So I think that clarification will happen by default. - Why have you become more public? You know, the whole push is, you're sold out, you're mainly enterprise, but you're also clearly putting effort towards being more open or releasing more things. - I think we just flipped over that way fairly recently.

I think that, like, that's a good question. I think it actually boils down to two things. The public narrative is really forming around agents as being the most important thing. And I'm really glad that's happening because when we started the company in January, 2022, like everybody in the field knew about the agents thing from RL, right?

But like the general public had no conception of what it was. They would still hang their narrative hat on the tree of like, everything's a chatbot, right? And so I think now, I think one of the things that I really care about is that when people think agent, they actually think the right thing, right?

Like all sorts of different things are being called agents. Chatbots are being called agents. Things that make a function call are being called agents. Like to me, an agent is something that you can give a goal and get an end step workflow done correctly in the minimum number of steps, right?

And so that's a big part of why. And I think the other part is because I think it's always good for people to be more aware of Adept as they think about what the next thing they wanna do in their careers. And I think the field is quickly pivoting in a world where foundation models are looking more and more commodity.

And I think a huge amount of gain is gonna happen from how do you use foundation models as like the well-learned behavioral cloner to go solve agents. And I think people who wanna do agents research should really come to Adept. - Yeah, excellent. When you say agents have become more part of the public narrative, are there specific things that you point to?

So I'll name a few. Bill Gates, in his blog posts, mentioning that agents are the future. I'm the guy who made OSes, and I think agents are the next thing. So Bill Gates, I'll call that out. And then maybe Sam Altman also saying agents are the future for OpenAI.

- And before that even, I think there was something like, Cade Metz wrote a New York Times piece about it. Right now, in a bid to differentiate, I'm seeing AI startups that used to just brand themselves as an AI company now brand themselves as an AI agent company.

It's just like, it's a term. I just feel like people really wanna-- - From the VC side, it's a bit mixed. - Is it? - As in like, I think there are a lot of VCs where like, I would not touch any agent startups 'cause like-- - Why is that?

- Well, you tell me. (laughs) - I think a lot of VCs that are maybe less technical don't understand the limitations of the-- - No, that's not fair. - No, no, no, no, I think like-- - You think so? - No, no, I think like the, what is possible today and like what is worth investing in, you know?

And I think like, I mean, people look at you and say, "Wow, these guys are building agents. They needed 400 million to do it." So a lot of VCs are maybe like, "Oh, I would rather invest in something that is tacking on AI to an existing thing, which is easier to get to market, and kind of get some of the flywheel going." But I'm also surprised a lot of founders just don't wanna do agents.

It's not even the funding. Like, sometimes we look around and it's like, "Why is nobody doing agents for X?" And it's like-- - Wow. - I don't get it. - That's good to know, actually. I never knew that before. My sense from my limited perspective is there's a new agent company popping up every day.

So maybe I'm missing something. - There are, there are. But like I have advised people to take agents off of their title because it's so diluted. - It's now so diluted, yeah. - So then it doesn't stand for anything. - Yeah, that's a really good point. - So anyway, I do want to also cover, so like, you know, you're a portfolio allocator.

You have, like, people know about Persimmon, people know about Fuyu and Fuyu-Heavy. Can you take us through how you think about the evolution of that, and what people should think about what that means for Adept's research directions? - The critical path for Adept is we want to build agents that can do higher and higher levels of abstraction things over time, all while keeping an insanely high reliability standard.

Because that's what turns us from research into something that customers want. And if you build agents with a really high reliability standard but are continually pushing the level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flywheel.

That's the critical path for the company. Everything we do is in service of that. So if you go zoom way, way back to ACT-1 days, right? The core thing behind ACT-1 is, can we teach a large model, basically, how to even actuate your computer? And I think we were one of the first places to have solved that, and shown it, and shown the generalization that you get when you give it various different workflows and text.

But I think from there on out, what we really realized was that, in order to get reliability, and also because companies just do things in various different ways, you actually want these models to get a lot better at having some specification of guardrails for what they actually should be doing.

And I think in conjunction with that, a giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that needs to kind of be the base for some of these agents. And so like, back then we had to do a ton of research, basically, on how do we actually make that possible?

Well, first off, back in 2023, I forget the exact month, there were no multimodal models really that you could use for things like this. And so we pushed really hard on stuff like the Fuyu architecture. I think one big hangover from the primarily academic focus for multimodal models is that most multimodal models are primarily trained on natural images, cat and dog photos, stuff that's come out of the camera.

- COCO. - Yeah, right, and COCO is awesome. Like, I love COCO, I love TY. It's really helped the field, right? But that's the build one thing. I actually think it's really clear today: multimodal models are the default foundation model, right? It's just gonna supplant LLMs.

Like, why wouldn't you just train a giant multimodal model? And so for that though, where are they gonna be the most useful? They're gonna be most useful in knowledge work tasks. That's where the majority of economic value is gonna be. It's not in cats and dogs, right? And so if that's what it is, what do you need to train on?

I need to train on charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so at Adept, we spent a lot of time building that. And so the public Fuyu models and stuff aren't trained on our actual corpus; they're trained on some other stuff.

But you take a lot of that data, and then you make it really fast and make it really good at things like dense OCR on screens. And now you have the right raw putty to go make a good agent. So that's kind of the modeling side.
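To ground the corpus point, here is what such a knowledge-work-weighted mixture might look like as a config. The categories are the ones named above; the weights are invented for illustration and are not Adept's.

```python
# Hypothetical pre-training data mixture, weighted toward knowledge work.
PRETRAIN_MIXTURE = {
    "ui_screenshots": 0.30,
    "charts_graphs_tables": 0.25,
    "pdfs_invoices_receipts": 0.20,
    "web_text": 0.15,
    "natural_images": 0.10,  # cat-and-dog photos, deliberately down-weighted
}
assert abs(sum(PRETRAIN_MIXTURE.values()) - 1.0) < 1e-9
```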

We've kind of only announced some of that stuff. We haven't really announced much of the agents work. But if you put those together with the correct product form factor, and I think the product form factor also really matters. I think we're seeing, and you guys probably see this a little bit more than I do, but we're seeing a little bit of a pushback against the tyranny of chatbots as a form factor.

And I think that the reason why the form factor matters is the form factor changes what data you collect in the human feedback loop. And so I think we've spent a lot of time doing full of like vertical integration of all these bits in order to get to where we are.

- Yeah. I'll plug Amelia Wattenberger's talk at our conference, where she gave a little bit of the thinking behind what else could exist other than chatbots, what you could do if you could delegate to reliable agents. - Totally. - And yeah. I mean, so I was kind of excited at Adept Experiments, or Adept Workflows.

I don't know what the official name for it is. I was like, okay, like this is something I can use, but it seems like it's just an experiment for now. It's not your product. - Yeah. So we just use experiments as like a way to go push various ideas on the design side to some people and just like get them to play with it.

And actually the experiments code base underpins the actual product, but it's like just the code base itself is like a kind of like a skeleton for us to go deploy arbitrary cards on the side. - Yep. Yeah, makes sense. Yeah. I was gonna say, I would love to talk about the interaction layer.

So you train a model to see UI, but then there's the question of, how do you actually act on the UI? I think there were some rumors about OpenAI building agents that kind of manage the endpoint, so the whole computer, whereas you're more at the browser level.

And I know I read in one of your papers, you have a different representation, kind of like you don't just take the DOM and act on it, you do a lot more stuff. How do you think about the best way the models will interact with software, and how the development of products is gonna change with that in mind as more and more of the work is done by agents instead of people?

- There's so much surface area here. And it's actually one of the things I'm really excited about. And it's like, it's funny because like, I've spent most of my time doing research stuff, but there's like a whole new ball game that I've been learning about and I find it really cool.

So I would say the best analogy I have for why Adept is pursuing a path of being able to just use your computer like a human, plus of course being able to call APIs: calling APIs is the easy part, being able to use your computer like a human is the hard part.

It's in the same way why people are excited about humanoid robotics, right? Like in a world where you had T equals infinity, right? You're probably gonna have various different form factors that robots could just be in and like all the specialization but the fact is that humans live in a human environment.

So having a humanoid robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? Like if you go itemize out the number of things you wanna do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero.

And so then many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing.

And so I think this is actually the most practical path, and because it's the most practical path, I think a lot of success will come from going down it. So what you're likely to see is, I kind of think about these early days of the agent interaction layer as a little bit like, do y'all remember Windows 3.1, those days?

Okay, I might be too old for you guys on this, but back in the day, Windows 3.1, right? We had this transition period between pure command line being the default to this new world where the GUI is the default, and then you drop into the command line for programmer things, right?

The old way was you booted your computer up, DOS booted, and then it would give you the C:\ prompt, and you typed Windows and you hit enter, and then you got put into Windows. And then the GUI kind of became a layer above the command line. I think the same thing is gonna happen with agent interfaces: today the GUI is the base layer, and the agent just controls the current GUI layer plus APIs.

And in the future, as more and more trust is built towards agents, and more and more things can be done by agents, and more UIs for agents are actually generative in and of themselves, then that just becomes the standard interaction layer. And if that becomes the standard interaction layer, what changes for software is that a lot of software is gonna be either systems of record or certain customized workflow execution engines. And a lot of how you actually do stuff will be controlled at the agent layer.
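A sketch of the hybrid interaction layer this implies: one agent-level action space covering both "control the current GUI" and "call an API". The types are hypothetical, not a real product surface.

```python
from dataclasses import dataclass

@dataclass
class Click:      # GUI path: act like a human on the screen
    x: int
    y: int

@dataclass
class CallApi:    # API path: go straight to the system of record
    endpoint: str
    payload: dict

def dispatch(action) -> None:
    if isinstance(action, Click):
        print(f"clicking at ({action.x}, {action.y})")
    elif isinstance(action, CallApi):
        print(f"POST {action.endpoint} {action.payload}")

dispatch(Click(120, 340))
dispatch(CallApi("/crm/log_call", {"duration_min": 15}))
```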

And a lot of how you actually do stuff will be controlled at the agent layer. - And you think so like the Rabbit interface is more like it would like, you're not actually seeing the app that the model interacts with, you're just saying, hey, I need to log this call on Salesforce.

And like, you're never actually going on salesforce.com directly as the user. - I can see that being a model. I don't know enough about what using Rabbit in real life will actually be like to comment on that particular thing. But I think the broader idea is that, you know, you have a goal, right?

The agent knows how to break your goal down into steps. The agent knows how to use the underlying software and systems of record to achieve that goal for you. The agent maybe presents you information in a custom way that's only relevant to your particular goal. That all just really leads to a world where you don't really need to ever interface with the apps underneath, unless you're a power user for some niche thing.

- General question. So first of all, I think like this whole, the sort of input mode conversation, I wonder if you have any analogies that you like with self-driving? Because I do think like, there's a little bit of like how the model should perceive the world. And, you know, the primary split in self-driving is LIDAR versus camera.

And I feel like most agent companies that I'm tracking are all moving towards the camera approach, which is like-- - The multimodal approach that we're doing. - The non-multimodal vision, very, very heavy vision. All the Fuyu stuff that you're doing, you're focusing on that, including charts and tables and-- - Yeah.

- Do you find like inspiration there from like, the self-driving world? - That's a good question. I think sometimes the most useful inspiration I've found from self-driving is the levels analogy. And I think that's great. - Level one to five. - I think that's awesome. But I think that our number one goal is for agents not to look like self-driving, in that we wanna minimize the chances that agents are sort of a thing that you just have to bang your head at for a long time to get to like two discontinuous milestones, which is basically what's happened in self-driving.

We wanna be living in a world where you have the data flywheel immediately, and that takes you all the way up to the top. But similarly, compared to self-driving, two things that people really undervalue: one, it's really easy to get the driving-a-car-down-Highway-101-on-a-sunny-day demo, right?

That actually doesn't prove anything anymore. And I think the second thing is that, as a non-self-driving expert, I think one of the things that we believe really strongly is that everyone undervalues the importance of really good sensors and actuators. And actually, a lot of what's helped us get a lot of reliability is a really strong focus on, actually, why does the model not do this thing?

And a non-trivial amount of the time, the model doesn't actually do the thing because, whether you're Wizard-of-Oz-ing it yourself or you have unreliable actuators, you just can't do the thing. And so we've had to fix a lot of those problems. - Yeah, makes sense.

I was slightly surprised just because I do generally consider the Waymo's that we see all around San Francisco as the most, I guess, real case of agents that we have, you know, in very material ways. - Oh, that's absolutely true. I think they've done an awesome job, but it has taken a long time for self-driving to mature.

Like from when it entered the consciousness, and the driving-down-101-on-a-sunny-day moment happened, to now, right? So I want to see that more compressed. - And then, you know, Cruise, you know, RIP recently. So, one more thing, just going back on this reliability thing. Something I have been holding in my head that I'm curious to get your commentary on: I think there's a trade-off between reliability and generality, or I want to broaden reliability into just general production readiness and enterprise readiness and scale.

'Cause you have reliability, you also have cost, you also have speed. Speed is a huge emphasis for Adept. The tendency, or the temptation, is to reduce generality in order to improve reliability, cost, and speed. Do you perceive a trade-off?

Do you have any insights that, that solve those trade-offs for you guys? - There's definitely a trade-off if you're at the Pareto frontier. I think a lot of folks aren't actually at the Pareto frontier. And I think the way you get there is basically like, how do you frame the fundamental agent problem in a way that just continues to benefit from data?

And I think that, I think like one of the main ways of like being able to solve that particular trade-off is like, you basically just want to formulate the problem such that every particular use case just looks like you collecting more data to go make that use case possible.

I think that's how you really solve it. Then you get into the other problems, like, okay, are you overfitting on these end use cases, right? But you're not doing a thing where you're being super prescriptive about the end steps that the model can only do, for example.

- I mean, so then the question becomes kind of, do you have one sort of house model that you then customize for each customer and you're fine-tuning them on like each customer's specific use case? - Yeah, we're not sharing that one. - You're not sharing that. It's tempting because, but like that doesn't look like AGI to me.

You know what I mean? Like that is just, you have a good base model and then you fine-tune it to others. - Yeah, yeah, yeah. I mean, I think for what it's worth, I think there's like two paths to a lot more capability coming out of the model set that we all are training these days.

I think one path is you figure out how to spend compute and turn it into data. And so in that path, right, I consider search, RL, all the things that we all love in this era as part of that path, like self-play, all that stuff.

The second path is, how do you get super competent, high-intelligence demonstrations from humans? And I think the right way to move forward is you kind of want to combine the two. The first one gives you maximum sample efficiency for a little second, but I think that it's gonna be hard to be running at max speed towards AGI without actually solving a bit of both.

- Yeah, any insights on, you haven't talked much about synthetic data as far as I can tell. Probably this is a bit of a, too much of a trend right now, but any insights on using synthetic data to augment the expensive human data? - The best part about framing AGI as being able to help people do things on computers is you have an environment.

- Yes. - So. (laughs) - So you can simulate all of it. - You could do a lot of stuff when you have an environment. - Yeah. - We were having dinner for our one year anniversary. - Congrats. - Yeah, thank you. Raza from HumanLoop was there and we mentioned you were coming on the pod with, this is our first.

- So he submitted a question. - Yeah, this is our first, I guess, like mailbag question. He asked: when you started, GPT-4 didn't exist, and now you've had GPT-4 Vision, which can help you build a lot of those things. How do you think about the things that are unique to you as Adept, and, going back to the research direction that you want to take the team in, what you want people to come work on at Adept, versus what has maybe now become commoditized that you didn't expect everybody would have access to?

- Yeah, that's a really good question. I think implicit in that question, and I wish he were here too, so he could push back on my assumption about his question. But I think implicit in that question is a calculus of where advantage accrues in the overall ML stack.

And maybe part of the assumption is that advantage accrues solely to base model scaling. But I actually believe pretty strongly that the way that you really win is that you have to go build an agent stack that is much more than that of the base model itself. And so I think like that is like always gonna be a giant advantage of vertical integration.

I think it lets us do things like have a really, really fast base model that is really good at agent things but isn't SOTA at cat and dog photos. It's pretty good at cat and dog photos, it's just not SOTA at cat and dog photos. So we're allocating our capacity wisely, that's one thing that you really get to do.

I also think that the other thing that is pretty important now in the broader foundation modeling space is, despite any potential concerns about how good agents are as a startup area, which we were talking about earlier, I feel super good that we're doing foundation models in service of agents, and all of the reward within Adept is flowing from, can we make a better agent?

Because right now, I think we all see that, if you're training on publicly available web data, you put in the flops and you do reasonable things, then you get decent results. And if you just double the amount of compute, then you get predictably better results. And so like, I think pure play foundation model companies are just gonna be pinched by how good the next couple of llamas are gonna be.

And the next good open source thing, and then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models. I think it's gonna commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents.

- So you don't consider yourself a pure play foundation model company? - No, because if we were a pure play foundation model company, we would be training general foundation models that do summarization and all this other-- - Right, you're dedicated towards the agent. - Yeah, and our business is an agent business.

We're not here to sell you tokens, right? And I think selling tokens, unless there's like a-- - We're not here to sell you tokens. I love it. - It's like, if you have a particular area of specialty, then you won't get caught in the fact that everyone's just scaling to ridiculous levels of compute.

But if you don't have a specialty, I think it's gonna be a little tougher. - Interesting. Are you interested in robotics at all? - Personally fascinated by robotics. I have always loved robotics. - No, but embodied agents as a business. Figure is like a big, also sort of OpenAI-affiliated company that raised a lot of money.

- Yeah, I think it's cool. I think, I mean, I don't know exactly what they're doing, but-- - Robots. - Yeah, well, I mean, that's, yeah. - What question would you ask if we had them on? Like, what would you ask them? - Oh, I just wanna understand what their overall strategy is gonna be between now and when there's reliable stuff to be deployed.

But honestly, I just don't know enough about it. - And if I told you, hey, fire your entire workforce, warehouse workforce, and put robots in there. Like, isn't that a strategy? - Oh, yeah, yeah, sorry, I'm not questioning whether they're doing smart things. I hope I didn't come off that way.

- No, no, no, no, you didn't. - It's just like, I genuinely don't know what they're doing as much. But I think like, look, I think there's two things. One, I'm so excited for someone to train a foundation model of robots. Like, it's just, I think it's just gonna work.

Like, I will die on this hill. I mean, again, this whole time, we've been on this podcast just continually saying, these models are basically behavioral cloners, right? So let's go behavioral clone all this robot behavior, right? And then you figure out everything else you have to do in order to teach it how to solve new problems.

Like, that's gonna work. I'm super stoked for that. I think, unlike what we're doing with helping humans with knowledge work, it just sounds like a more zero-sum, job-replacement play, right? And I'm personally less excited about that. - We had Kanjun from Imbue on the podcast. - Another guest.

- Yeah, we asked her why people should go work there and not at Adept. - Oh, that's so funny. - Well, she said, you know, there's space for everybody in this market. We're all doing interesting work. And she said they're really excited about building an operating system for agents.

And for her, the biggest research thing was getting models better at reasoning and planning for these agents. The reverse question to you: why should people be excited to come work at Adept instead of Imbue? And maybe what are the core research questions that people should be passionate about to have fun at Adept?

- Yeah, first off, I think that, I'm sure you guys believe this too, but the AI space, to the extent there's an AI space, and the AI agent space are both, exactly as she likely said, colossal opportunities, and people are just gonna end up winning in different areas, and a lot of companies are gonna do well.

So I really don't feel that zero-sum-ness at all. I would say, to change the zero-sum framing: why should you be at Adept? I think there's two huge reasons to be at Adept. I think one of them is that everything we do is in the service of useful agents.

Like, we're not a research lab. We do a lot of research in service of that goal, but we don't think about ourselves as a classic research lab at all. And I think the second reason to work at Adept is, if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here.

And I think the examples for why that's true is, like, for example, like, our evaluations, they're not academic evals. They're not, like, simulator evals. They're, like, okay, like, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to do.

We can't do them at all. We've turned those into evals. Like, solve it, right? I think that's really cool. Everybody knows a lot of these evals are pretty saturated, and for the new ones that even are not saturated, you look at some of them and you're like, is this actually useful, right?

I think that's a degree of practicality that really helps. Like, we're equally excited about the same problems around reasoning and planning and generalization and all of this stuff, but they're very grounded in actual needs right now, which is really cool. - Yeah, this has been a wonderful dive.

You know, I wish we had more time, but, you know, I would just leave it kind of open to you. I think you have broad thoughts, you know, just about the agent space, but also just the general AI space. Any sort of rants or things that are helpful for you right now?

- Any rants? - Mining you for just general. - Wow, okay, so Amelia's already made the rant better than I have, but, like, not just chatbots is, like, kind of rant one. Rant two is, like, AI's really been the story of compute and compute plus data and ways in which you could change one for the other.

And I think as much as our research community is really smart, like, we have made many, many advancements, and that's gonna continue to be important, but, like, now I think the game is increasingly changing, and, like, the rapid industrialization era has begun, and I think we, unfortunately, have to embrace it.

- Yep, excellent. - Awesome, David, thank you so much for your time. - Cool, yeah, thanks, guys, this was fun. (upbeat music)