
Why Google failed to make GPT-3 -- with David Luan of Adept


Chapters

0:00 Introduction of David Luan, CEO and co-founder of Adept
1:14 David's background and career trajectory
3:20 Transition from reinforcement learning to transformers in the AI industry
5:35 History and development of GPT models at OpenAI and Google
13:08 Adept's $420 million funding rounds
13:38 Explanation of what Adept does and their vision for AI agents
19:20 Reasons for Adept becoming more public-facing
21:00 Adept's critical path and research directions (Persimmon, Fuyu, Act One)
26:23 How AI agents should interact with software and impact product development
30:37 Analogies between AI agents and self-driving car development
32:42 Balancing reliability, cost, speed and generality in AI agents
35:11 Adept's unique positioning and advantages in the AI industry
37:30 Potential of foundation models for robotics
39:22 Core research questions and reasons to work at Adept
40:57 David's closing thoughts on the AI agent space and industrialization of AI

Whisper Transcript

00:00:00.000 | (upbeat music)
00:00:02.580 | ♪ Podcast, we dive right in ♪
00:00:07.580 | ♪ Exploring the world, the guts where new begins begin ♪
00:00:12.760 | ♪ David Luan, the founder ♪
00:00:19.180 | ♪ A visionary in his own right ♪
00:00:23.160 | ♪ Building autonomous agents ♪
00:00:26.900 | ♪ Taking us to new heights, yeah ♪
00:00:32.040 | - Hey everyone, welcome to the Latent Space Podcast.
00:00:34.480 | This is Alessio, partner and CTO
00:00:36.280 | in Residence at Decibel Partners,
00:00:37.720 | and I'm joined by my co-host, Swyx,
00:00:39.560 | founder of Smol.ai.
00:00:40.840 | - Hey, and today we have David Luan,
00:00:42.840 | CEO, co-founder of Adept, in the studio, welcome.
00:00:45.400 | - Yeah, thanks for having me.
00:00:46.320 | - Been a while in the works,
00:00:47.340 | I met you socially at one of those VC events,
00:00:50.560 | and you said that you were interested in coming on,
00:00:52.640 | and glad we finally were able to make this happen.
00:00:54.680 | - Yeah, happy to be a part of it.
00:00:57.000 | - So we like to introduce the speaker,
00:01:00.360 | and then also just have you talk a little bit
00:01:02.440 | about what's not on your LinkedIn,
00:01:03.800 | what people should just generally know about you.
00:01:06.560 | You started a company in college,
00:01:08.840 | which was the first real-time
00:01:10.800 | video detection classification API that was Dextro,
00:01:15.480 | and that was your route to getting acquired into Axon,
00:01:18.200 | where you were director of AI.
00:01:20.180 | Then you were the 30th hire at OpenAI?
00:01:23.840 | - Yeah, 30, 35, something around there.
00:01:25.880 | - Something like that.
00:01:27.520 | VP of Eng for two and a half years, two years and a bit,
00:01:31.520 | briefly served as tech lead of large models at Google,
00:01:36.520 | and then in 2022 started Adept.
00:01:40.360 | So that's the sort of brief CV.
00:01:42.780 | - Yeah, more or less.
00:01:43.800 | - Yeah, is there anything else
00:01:44.640 | you wanna fill in the blanks,
00:01:45.600 | or people should know more about?
00:01:47.440 | - I guess the broader story was I joined OpenAI fairly early,
00:01:52.440 | and then did that for about, yeah,
00:01:53.840 | two and a half to three years leading engineering there.
00:01:56.520 | It's really funny, I think second or third day
00:01:59.720 | of my time at OpenAI, Greg and Ilya pulled me in a room
00:02:03.920 | and were like, "Hey, you should take over our direction.
00:02:08.920 | "We'll go, mostly do IC work."
00:02:12.640 | So that was fun, just coalescing a bunch of teams
00:02:15.300 | out of a couple of early initiatives
00:02:18.480 | that had already happened.
00:02:19.320 | The company, the Dota effort was going pretty hard,
00:02:21.840 | and then more broadly trying to put some
00:02:24.880 | bigger picture direction around
00:02:25.840 | what we were doing with basic research.
00:02:27.200 | So I spent a lot of time doing that.
00:02:28.660 | And then at Google, so I led Google's LLM efforts,
00:02:32.520 | but also co-led Google Brain,
00:02:33.960 | was one of the brain leads more broadly.
00:02:36.480 | And I think there's been a couple of different eras
00:02:39.320 | of AI research, right?
00:02:40.600 | And if we count everything before 2012 as prehistory,
00:02:44.060 | which people hate it when I say that,
00:02:46.040 | you kinda had this like you and your three best friends
00:02:48.040 | write a research paper that changes the world period
00:02:50.200 | from like 2012 to 2017.
00:02:52.640 | And then from, and I think the game changed in 2017,
00:02:56.080 | and like most labs didn't realize it,
00:02:57.520 | but we at OpenAI really did.
00:02:59.040 | I think in large part helped by like Ilya's
00:03:01.240 | constant beating of the drum
00:03:02.440 | that the world would be covered in data centers,
00:03:04.220 | and like, and I think--
00:03:05.400 | - Skills that they need.
00:03:06.360 | - Yeah, well, like I think we had conviction in that,
00:03:08.640 | but it wasn't until we started seeing results
00:03:10.360 | that it became clear that that was where we had to go.
00:03:12.600 | But also part of it as well was like for OpenAI,
00:03:14.600 | like when I first joined,
00:03:15.680 | I think one of the jobs that I had to do was
00:03:17.480 | how do I tell a differentiated vision
00:03:19.240 | for who we were technically,
00:03:21.040 | compared to, hey, we're just a smaller Google Brain,
00:03:23.360 | or like we're Google,
00:03:24.200 | like you work at OpenAI if you live in SF
00:03:26.120 | and don't wanna commute to Mountain View
00:03:27.480 | or don't wanna live in London, right?
00:03:29.120 | That's like not enough to like hang your technical identity
00:03:31.760 | on as a company.
00:03:32.580 | And so like what we really did was,
00:03:34.480 | and I spent a lot of time pushing this,
00:03:35.760 | is just how do we get ourselves focused on
00:03:38.360 | a certain class of like giant swings and bets, right?
00:03:42.080 | Like how do you flip the script from
00:03:44.460 | you just do bottom-up research to more about like,
00:03:47.480 | how do you like leave some room for that,
00:03:49.040 | to really make it about like,
00:03:50.100 | what are the big scientific outcomes that you wanna show?
00:03:53.640 | And then you just solve them at all costs,
00:03:55.360 | whether or not you care about novelty and all that stuff.
00:03:57.680 | And that became the dominant model for a couple years, right?
00:04:00.740 | And then what's changed now is I think that like
00:04:05.560 | the number one driver of AI progress
00:04:07.000 | over the next couple of years
00:04:07.880 | is gonna be the deep co-design and co-evolution
00:04:10.320 | of like product and users for feedback
00:04:12.820 | and actual technology.
00:04:14.400 | And I think labs that retool to go do that
00:04:16.320 | are gonna do really well.
00:04:17.200 | And that's a big part of why I started Adept.
00:04:19.080 | - You mentioned Dota.
00:04:20.200 | Any memories thinking from like the switch from RL
00:04:23.840 | to Transformers at the time and kind of how the industry
00:04:27.040 | was evolving more in the LLM side
00:04:29.800 | and leaving behind some of the more agent simulation work?
00:04:33.760 | - You know, I actually think that people,
00:04:35.960 | like zooming way out,
00:04:36.920 | I think agents are just absolutely
00:04:38.480 | the correct long-term direction, right?
00:04:39.880 | You just gotta define what AGI is, right?
00:04:41.720 | You're like, hey, like, well, first off,
00:04:43.320 | actually, I don't love AGI definitions
00:04:45.120 | that involve human replacement
00:04:46.440 | because I don't think that's actually
00:04:47.400 | how it's gonna happen.
00:04:48.520 | I think even this definition of like AGI
00:04:50.000 | is something that outperforms humans
00:04:51.240 | at economically valuable tasks
00:04:52.720 | is kind of an implicit view of the world
00:04:56.680 | about what's gonna be the role of people.
00:04:59.620 | I think what I'm more interested in
00:05:01.900 | is a definition of AGI that's oriented around
00:05:04.200 | a model that can do anything
00:05:05.200 | a human can do on a computer.
00:05:06.900 | And I think if you go think about that,
00:05:08.920 | which is like super tractable,
00:05:10.580 | then agent is just a natural consequence
00:05:13.960 | of that definition.
00:05:15.120 | And so like, what did all the work we did
00:05:17.800 | on RL and stuff like that get us
00:05:19.440 | was it got us a really clear formulation,
00:05:21.900 | like you have a goal
00:05:22.740 | and you wanna maximize the goal
00:05:23.760 | and you wanna maximize reward, right?
00:05:25.560 | Like natural LLM formulation
00:05:27.000 | doesn't come with that out of the box, right?
00:05:28.960 | So like, I think that we, as a field,
00:05:32.080 | got a lot right by thinking about,
00:05:33.460 | hey, how do we solve problems of that caliber?
00:05:35.720 | And then the thing we forgot is like,
00:05:37.840 | like de novo RL is like a pretty terrible way
00:05:40.360 | to get there quickly.
00:05:41.260 | Why are we rediscovering all the knowledge
00:05:42.880 | about the world?
00:05:43.960 | Like years ago, I had a debate
00:05:45.120 | with a Berkeley professor as to like,
00:05:48.180 | like what will it actually take to build AGI?
00:05:50.480 | And his view is basically that you have to reproduce
00:05:52.980 | all the flops that went into evolution
00:05:56.080 | in order to be able to get there, right?
00:05:57.200 | - The biological basis theory.
00:05:59.280 | - I think like we are ignoring the fact
00:06:01.760 | that you have a giant shortcut,
00:06:02.840 | which is you can behavioral clone
00:06:04.120 | everything humans already know.
00:06:05.760 | And that's what we solved with LLMs.
00:06:07.080 | We've solved behavioral cloning
00:06:08.480 | everything that humans already know, right?
00:06:10.000 | So like today, maybe LLMs is like behavioral cloning
00:06:12.720 | every word that gets written on the internet.
00:06:14.920 | In the future, you know, like now the multimodal models
00:06:17.360 | are becoming more of a thing
00:06:18.280 | where behavioral cloning the visual world,
00:06:20.080 | but really what we're just gonna have
00:06:21.560 | is like a universal byte model, right?
00:06:23.560 | Where like tokens of data that have high signal come in,
00:06:27.320 | and then all of those patterns are like learned by the model
00:06:30.280 | and then you can regurgitate any combination out, right?
00:06:32.400 | So like, like text in to voice out,
00:06:34.520 | like image in to, I don't know,
00:06:36.040 | like to other image out or video out or whatever,
00:06:38.780 | like these like mappings, right?
00:06:40.080 | Like all just gonna be learned
00:06:41.160 | by this universal behavioral cloner.
00:06:43.160 | And so I'm glad we figured that out.
00:06:44.960 | And I think now we're back to the era of like,
00:06:48.440 | how do we combine this with all of the lessons we learned
00:06:51.320 | during the RL period?
00:06:53.120 | And that's what's gonna drive progress.
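
(A rough sketch, in our own notation rather than anything from OpenAI or Adept, of the two training objectives being contrasted here: RL optimizes an explicit goal through a reward signal, while standard LLM pre-training is pure behavioral cloning of human-written tokens, with no goal or reward term anywhere in the objective.)

    % RL formulation: a policy \pi_\theta maximizes expected return for a reward r
    J_{\mathrm{RL}}(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t} r(s_t, a_t) \right]

    % LLM pre-training (behavioral cloning of text): maximize next-token log-likelihood
    J_{\mathrm{LM}}(\theta) = \mathbb{E}_{x \sim \mathcal{D}}\left[ \sum_{t} \log p_\theta(x_t \mid x_{<t}) \right]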
00:06:54.920 | - Interesting.
00:06:55.760 | I'm still gonna pressure you for a little,
00:06:57.280 | a few more early opening eyes stories
00:06:58.880 | before we turn to the adept stuff.
00:07:00.480 | On your personal site, which I love,
00:07:02.360 | 'cause it's really nice, like personal, you know,
00:07:05.120 | story context around like your history.
00:07:07.160 | - I need to update it, it's so old.
00:07:08.640 | - Yeah, it's so out of date.
00:07:10.820 | But you mentioned GPT-2.
00:07:12.920 | Did you overlap with GPT-1?
00:07:14.200 | I think you did, right?
00:07:15.360 | - I actually don't quite remember.
00:07:17.720 | I think I was joining right around.
00:07:19.600 | - Right around that? - I was right around that, yeah.
00:07:20.840 | - Yeah, the canonical story was Alec,
00:07:23.840 | you know, just kind of came in and was like very obsessed
00:07:26.840 | with transformers and applying them
00:07:29.720 | to like Reddit sentiment analysis.
00:07:32.200 | - Yeah, yeah, sentiment, that's right,
00:07:34.040 | sentiment neuron, all that stuff.
00:07:35.540 | - The history of GPT, as far as you know,
00:07:38.120 | you know, according to you.
00:07:38.960 | - Ah, okay, history of GPT, according to me,
00:07:40.940 | that's a pretty good question.
00:07:41.860 | So I think the real story of GPT starts at Google,
00:07:45.420 | of course, right?
00:07:46.340 | Because that's where transformers sort of came about.
00:07:49.780 | The number one shocking thing to me was that,
00:07:53.480 | and this is like a consequence of the way
00:07:54.820 | that Google's organized, where like, again,
00:07:56.260 | like you and your three best friends write papers, right?
00:07:58.460 | Okay, so zooming way out.
00:07:59.940 | I think about my job when I was a full-time research leader
00:08:03.220 | as a little bit of a portfolio allocator, right?
00:08:05.620 | So I've got really, really smart people.
00:08:08.260 | My job is to convince people to coalesce
00:08:10.720 | around a small number of really good ideas
00:08:12.680 | and then run them over the finish line.
00:08:14.280 | My job is not actually to promote a million ideas
00:08:17.200 | that never have critical mass.
00:08:18.680 | And then as the ideas start coming together
00:08:20.080 | and some of them start working well,
00:08:21.600 | my job is to nudge resources towards the things
00:08:24.600 | that are really working and then start disbanding
00:08:27.160 | some of the things that are not working, right?
00:08:29.200 | That muscle did not exist during my time at Google.
00:08:33.240 | And I think had they had it, what they would have done
00:08:35.400 | would be say, hey, Noam Shazeer, you're a brilliant guy,
00:08:38.240 | you know how to scale these things up.
00:08:39.880 | Like, here's half of all of our TPUs.
00:08:42.480 | And then I think they would have destroyed us.
00:08:44.880 | - He clearly wanted it too.
00:08:45.800 | He's talking about trillion parameter models in 2017.
00:08:48.080 | - Yeah, and so I think this gets to the core
00:08:49.660 | of the GPT story, right?
00:08:50.840 | Which is that, and I'm jumping around historically, right?
00:08:53.440 | But like, after GPT-2, we were all really excited
00:08:55.800 | about GPT-2, I can tell you more stories about that.
00:08:58.760 | It was the last paper that I even got to really touch
00:09:01.440 | before everything became more about
00:09:02.800 | just like building a research org.
00:09:04.800 | You know, every day we were scaling up GPT-3,
00:09:07.840 | I would wake up and just be stressed.
00:09:10.160 | And I was stressed because, you know,
00:09:11.840 | you just look at the facts, right?
00:09:13.220 | Google has all this compute, Google has all the people
00:09:15.880 | who invented all of these underlying technologies.
00:09:18.480 | There's a guy named Noam who's really smart,
00:09:20.000 | who's already gone and done this talk
00:09:22.500 | about how he wants a trillion parameter model.
00:09:24.920 | And I'm just like, you know, we're like,
00:09:26.880 | we're probably just doing duplicative research
00:09:28.960 | to what he's doing, right?
00:09:30.040 | He's got this like decoder only transformer
00:09:32.600 | that's probably gonna get there before we do.
00:09:34.660 | And I was like, but like, please just like
00:09:36.520 | let this model finish, right?
00:09:38.920 | And it turned out the whole time
00:09:40.580 | that they just couldn't get critical mass.
00:09:42.440 | So during my year where I led the Google LM effort
00:09:45.520 | and like, and I was one of the brain leads,
00:09:48.220 | you know, it became really clear why, right?
00:09:50.760 | At the time, there was a thing called
00:09:52.600 | the brain credit marketplace.
00:09:54.840 | And did you guys remember the brain credit marketplace?
00:09:57.040 | - No, I never heard of this.
00:09:57.880 | - Oh, so it's actually, you can ask any Googler,
00:10:00.160 | it's like just like a thing that they do.
00:10:02.360 | - I mean, look, like yeah, limited resources,
00:10:04.840 | you gotta have some kind of marketplace, right?
00:10:06.880 | - You could. - Sometimes it's explicit,
00:10:08.140 | sometimes it's just political favors.
00:10:10.140 | - You could, and so then like,
00:10:12.560 | basically everyone's assigned a credit, right?
00:10:14.320 | So if you have a credit, you get to buy N chips
00:10:17.440 | according to supply and demand.
00:10:19.080 | So if you wanna go do a giant job,
00:10:20.820 | you gotta convince like 19 or 20 of your colleagues
00:10:22.860 | not to do work.
00:10:24.160 | And if that's how it works, it's like,
00:10:27.020 | it's really hard to get that bottom up critical mass
00:10:30.800 | to go scale these things.
00:10:31.880 | And like, and the team at Google were fighting valiantly,
00:10:35.040 | but like, we were able to beat them
00:10:36.880 | simply because we took big swings and we focused.
00:10:40.840 | And I think, again, that's like part of the narrative
00:10:43.000 | of like this phase one of AI, right?
00:10:45.280 | Of like this modern AI era to phase two.
00:10:48.360 | And I think in the same way, I think phase three companies
00:10:51.000 | can out execute phase two companies
00:10:52.920 | because of the same like asymmetry of success.
00:10:56.120 | - Yeah, I think it's underrated how much Nvidia
00:10:59.080 | worked with you in the early days as well.
00:11:01.040 | I think maybe, I think it was Jensen,
00:11:02.840 | I'm not sure who circulated a recent photo
00:11:06.720 | of him delivering the first DGX to you guys.
00:11:10.800 | - I think Jensen has been a complete legend
00:11:15.120 | and a mastermind throughout.
00:11:17.280 | I have so much respect for Nvidia, it is unreal.
00:11:20.120 | - But like, did OpenAI kind of give
00:11:21.680 | their requirements and co-design it,
00:11:23.480 | or did you just work with whatever Nvidia gave them?
00:11:26.840 | - So we work really closely with them.
00:11:29.120 | There's, I'm not sure I can share all the stories,
00:11:31.320 | but like, I think like examples of ones
00:11:33.480 | that I've found particularly interesting.
00:11:35.000 | So Scott Gray is amazing.
00:11:37.680 | And I really like working with him.
00:11:39.200 | He was on one of my teams, the supercomputing team,
00:11:43.120 | which Chris Berner runs and Chris Berner
00:11:44.800 | still does a lot of stuff in that.
00:11:46.360 | But as a result, like we had very close ties to Nvidia.
00:11:50.640 | Actually, one of my co-founders at Adept, Erich Elsen,
00:11:52.720 | was also one of the early GPGPU people.
00:11:55.120 | And so he and Scott and like Bryan Catanzaro at Nvidia
00:11:58.640 | and Jonah and Ian at Nvidia, I think all were very close.
00:12:02.640 | And we're all sort of part of this group of just like,
00:12:04.240 | how do we push these chips to the absolute limit?
00:12:07.080 | And I think like that kind of collaboration
00:12:09.520 | helped quite a bit.
00:12:10.680 | One interesting set of stuff is just like,
00:12:12.400 | knowing in the A100 generation that like 2:4 sparsity
00:12:15.080 | was gonna be a thing.
00:12:15.920 | Is that something that we wanna go look into, right?
00:12:18.480 | And figure out if that's something
00:12:19.400 | that we could actually use for model training.
00:12:21.200 | And I think more and more people realize this,
00:12:22.920 | but like six years ago, or even three years ago,
00:12:26.360 | people refused to accept it.
00:12:28.040 | Like this era of AI is really a story of compute.
00:12:30.160 | It's really the story of how do you more efficiently map
00:12:32.760 | like actual usable model flops to compute, right?
00:12:37.080 | - Yeah, cool.
00:12:38.240 | Is there another, you know, sort of GPT-2, 3 story
00:12:42.160 | that like, you know, you love to get out there
00:12:45.040 | that I think you think is like underappreciated
00:12:47.040 | for like the amount of work that people put into it?
00:12:49.080 | - So two interesting GPT-2 stories.
00:12:51.160 | - Love it.
00:12:52.000 | - I spent a good bit of time just sprinting
00:12:53.800 | to help Alec get the paper out.
00:12:55.840 | And I remember one of the most entertaining moments,
00:12:59.880 | we were writing the modeling section.
00:13:01.840 | And I'm pretty sure the modeling section
00:13:03.320 | was like the shortest modeling section of any ML,
00:13:05.720 | like reasonably legitimate ML paper to that moment.
00:13:08.480 | It was like section three model,
00:13:10.160 | like this is a standard vanilla decoder only transformer
00:13:13.200 | with like these particular things.
00:13:14.960 | It was like a paragraph long, if I remember correctly.
00:13:17.240 | And both of us were just looking at the same,
00:13:18.880 | being like, man, like the OGs in the field
00:13:21.240 | are gonna hate this.
00:13:22.080 | They're gonna say no novelty.
00:13:23.760 | Like, why'd you guys do this work?
00:13:26.080 | So now it's funny to look at in hindsight
00:13:29.000 | that it was kind of a pivotal kind of paper.
00:13:31.800 | But I think it was one of the early ones
00:13:33.520 | where we just leaned fully into all we care about
00:13:36.080 | is solving problems in AI and not about like,
00:13:38.920 | hey, like, is there like four different,
00:13:40.960 | like really simple ideas
00:13:41.960 | that are cloaked in mathematical language
00:13:44.160 | that doesn't actually help move the field forward?
00:13:47.920 | - Right.
00:13:48.760 | And it's like, you innovate on maybe like data set
00:13:50.880 | and scaling and not so much the architecture.
00:13:53.560 | - Yeah.
00:13:55.280 | I mean, now, I mean,
00:13:56.160 | like we all know how it works now, right?
00:13:57.720 | Which is that like,
00:13:58.680 | there's a collection of really hard won knowledge
00:14:00.480 | that you get only by being at the frontiers of scale.
00:14:03.360 | And that hard won knowledge,
00:14:04.880 | a lot of it's not published.
00:14:06.080 | A lot of it is like stuff that like,
00:14:07.840 | it's actually not even easily reducible
00:14:09.560 | to what looks like a typical academic paper.
00:14:12.120 | But yeah, that's the stuff that helps differentiate
00:14:14.480 | one scaling program from another.
00:14:16.280 | - Yeah.
00:14:17.120 | You had a second one?
00:14:17.960 | - Hilariously enough,
00:14:19.040 | the last meeting we did with Microsoft
00:14:21.880 | before Microsoft invested in OpenAI,
00:14:24.680 | Sam Altman, myself, and our CFO flew up to Seattle
00:14:27.960 | to do the final pitch meeting.
00:14:29.640 | And I'd been a founder before,
00:14:31.000 | so I always had like a tremendous amount of anxiety
00:14:33.680 | about partner meetings,
00:14:34.960 | which this basically is what it was,
00:14:36.400 | because it's like Kevin Scott and Satya and Amy Hood.
00:14:40.320 | And it was my job to give the technical slides about,
00:14:43.400 | you know, what's the path to AGI,
00:14:44.640 | what's our research portfolio, all of this stuff.
00:14:47.200 | But it was also my job to give the GPT-2 demo.
00:14:50.000 | We had a slightly bigger version of GPT-2
00:14:52.040 | that we had just cut maybe a day or two
00:14:54.480 | before this flight up.
00:14:55.800 | As we all know now,
00:14:56.800 | model behaviors you find predictable at one checkpoint
00:14:59.480 | are not predictable in another checkpoint.
00:15:01.160 | And so like, I'd spent all this time trying to figure out
00:15:03.040 | how to keep this thing on rails,
00:15:05.120 | prevent it from saying anything bad.
00:15:06.680 | But I had my canned demos,
00:15:08.240 | but I knew I had to go turn it around
00:15:10.160 | over to like Satya and Kevin and let them type anything in.
00:15:14.040 | And that just, that really kept me up all night.
00:15:17.000 | - Nice.
00:15:18.760 | - Yeah.
00:15:19.600 | - That must have helped you,
00:15:20.440 | talking about partners meeting,
00:15:21.600 | you raised 420 million for Adept.
00:15:25.440 | The last round was a $350 million Series B,
00:15:28.040 | so I'm sure you do great in partners meetings.
00:15:30.040 | - Pitching Phoenix.
00:15:31.240 | - Nice.
00:15:32.080 | - No, that's a high compliment coming from a VC.
00:15:34.360 | - Yeah, no, I mean, you're doing great already.
00:15:36.800 | Let's talk about Adept.
00:15:38.600 | And we were doing pre-prep,
00:15:41.240 | and you mentioned that maybe a lot of people
00:15:42.520 | don't understand what Adept is.
00:15:43.800 | So usually we try and introduce the product
00:15:46.240 | and then have the founders fill in the blanks,
00:15:47.880 | but maybe let's do the reverse.
00:15:49.240 | Like what is Adept?
00:15:50.880 | - Yeah, so I think Adept is like the least understood company
00:15:54.800 | in the like broader space of foundation models plus agents.
00:15:58.480 | So I'll give some color and I'll explain what it is,
00:16:02.280 | and I'll explain also why it's actually pretty different
00:16:06.120 | from what people would have guessed.
00:16:07.560 | So the goal for Adept is we basically wanna build
00:16:11.760 | an AI agent that can basically help humans do anything
00:16:15.640 | a human does on a computer.
00:16:17.160 | And so what that really means is like,
00:16:19.480 | we want this thing to be super good at turning
00:16:21.680 | natural language, like goal specifications, right?
00:16:25.240 | Into the correct set of end steps,
00:16:27.480 | and then also have all the correct sensors and actuators
00:16:29.920 | to go get that thing done for you across any software tool
00:16:32.440 | that you already use.
00:16:33.520 | And so the end vision of this is effectively like,
00:16:35.680 | I think in a couple of years,
00:16:36.640 | everyone's gonna have access to like an AI teammate
00:16:38.840 | that they can delegate arbitrary tasks to at work,
00:16:42.560 | and then also be able to use it as a sounding board
00:16:45.000 | and like just be way, way, way more productive, right?
00:16:47.840 | And just like changes the shape of every job
00:16:50.600 | from something where you're mostly doing execution
00:16:52.400 | to something where you're mostly actually doing
00:16:53.880 | like these core liberal arts skills of like,
00:16:55.960 | what should I be doing and why, right?
00:16:58.160 | I find this like really exciting and motivating
00:16:59.760 | because I think it's actually a pretty different vision
00:17:02.840 | for how AGI will play out.
00:17:04.200 | I think like systems like Adept
00:17:06.120 | are the most likely systems to be proto-AGIs.
00:17:09.800 | But I think the ways in which we are really counterintuitive
00:17:12.180 | to everybody is that we've actually been really quiet
00:17:14.840 | because we are not a developer company.
00:17:18.520 | We don't sell APIs.
00:17:19.520 | We don't sell open source models.
00:17:21.320 | We also don't sell bottom up products.
00:17:24.200 | Like we're not a thing that you go and click
00:17:26.520 | and download the extension
00:17:28.520 | and like we want more users signing up for that thing.
00:17:30.480 | We're actually an enterprise company.
00:17:31.760 | So what we do is we have,
00:17:33.040 | we work with like a range of different companies,
00:17:36.600 | some like late stage, like multi-thousand people startups,
00:16:40.680 | some Fortune 500s, et cetera.
00:17:42.560 | And what we do for them is we basically give them
00:17:45.240 | an out of the box solution where like big complex workflows
00:17:49.240 | that their employees do every day
00:17:50.580 | could be delegated to the model.
00:17:52.280 | So we look a little different from other companies
00:17:54.360 | in that like in order to go build this full agent thing,
00:17:57.740 | the most important thing you gotta get right is reliability.
00:18:00.600 | I think over the last year or two.
00:18:02.080 | So initially zooming way back
00:18:03.680 | when one of the first things that Adept did
00:18:05.480 | was we released this demo called Act One, right?
00:18:08.880 | Act One was like pretty cool.
00:18:09.900 | It's like kind of become a hello world thing
00:18:11.600 | for people to show agent demos
00:18:13.220 | by going to Redfin and asking to buy a house somewhere.
00:18:15.960 | 'Cause like we did that in the original Act One demo
00:18:18.280 | and like showed that, showed like Google Sheets,
00:18:20.440 | like all this other stuff.
00:18:22.240 | But over the last like year since that has come out,
00:18:26.600 | there's been a lot of really cool demos.
00:18:30.400 | And you go play with them
00:18:31.240 | and you realize they work 60% of the time.
00:18:33.240 | But since we've always been focused
00:18:34.760 | on how do we build an amazing enterprise product,
00:18:36.820 | like enterprises like don't want,
00:18:38.640 | can't use anything that isn't in the nines of reliability.
00:18:41.800 | And so we've actually had to go down
00:18:43.040 | a slightly different tech tree
00:18:44.280 | than what you might find in the prompt engineering
00:18:47.240 | sort of plays in the agent space to get that reliability.
00:18:52.240 | And we've decided to prioritize reliability over all else.
00:18:54.600 | So like one of our use cases is crazy enough
00:18:56.980 | that it actually ends with a physical truck
00:18:59.400 | being sent to a place as the result of the agent workflow.
00:19:02.880 | And if you're like, if that works like 60% of the time,
00:19:05.240 | you're just like blowing money
00:19:06.440 | and poor truck drivers going places.
00:19:08.360 | - Interesting.
00:19:10.280 | We had one of our investment teams
00:19:12.480 | has this idea of services as software.
00:19:14.640 | I'm actually giving a talk at NVIDIA GTC about this,
00:19:17.320 | but basically software as a service,
00:19:19.800 | you're wrapping user productivity in software
00:19:23.120 | with agents and services as software
00:19:25.080 | is replacing things that you would ask somebody to do
00:19:29.280 | and the software just does it for you.
00:19:31.320 | When you think about these use cases,
00:19:33.240 | do the users still go in and like look at the agent
00:19:37.480 | kind of like doing the things and can intervene
00:19:39.280 | or like are these like fully removed from them?
00:19:41.600 | Like the truck thing is like, does the truck just show up
00:19:43.700 | or like are there people in the middle like checking in?
00:19:46.200 | - Yeah, so actually what's been really interesting
00:19:47.920 | is you could question whether they're fundamental,
00:19:49.840 | but I think there's two current flaws
00:19:51.000 | in the framing for services as software
00:19:53.600 | or I think what you just said.
00:19:55.080 | I think that one of them is like in our experience
00:19:56.960 | as we've been rolling out Adept,
00:19:59.400 | the people who actually do the jobs
00:20:00.960 | are the most excited about it
00:20:02.520 | because they don't go from I do this job
00:20:04.360 | to I don't do this job.
00:20:05.300 | They go from I do this job for everything,
00:20:07.600 | including the shitty rote stuff to I'm a supervisor
00:20:11.320 | and I literally like, it's pretty magical
00:20:13.120 | when you watch the thing being used
00:20:15.060 | because like now it parallelizes a bunch of the things
00:20:17.840 | that you had to do sequentially by hand as a human
00:20:20.640 | and you can just click into any one of them,
00:20:22.100 | be like, hey, I wanna watch the trajectory
00:20:23.600 | that the agent went through to go solve this
00:20:26.280 | and the nice thing about agent execution
00:20:28.400 | as opposed to like LLM generations
00:20:30.640 | is that a good chunk of the time
00:20:32.560 | when the agent fails to execute,
00:20:34.060 | it doesn't give you the wrong result.
00:20:35.240 | It just fails to execute
00:20:36.320 | and the whole trajectory is just broken and dead
00:20:37.840 | and the agent knows it, right?
00:20:39.280 | So then those are the ones that the human then goes
00:20:41.400 | and solves and so then they become a troubleshooter.
00:20:43.200 | They work on the more challenging stuff.
00:20:44.880 | They get way, way more stuff done
00:20:46.320 | and they're really excited about it.
00:20:47.880 | I think the second piece of it that we've found
00:20:51.160 | is like our strategy as a company
00:20:53.660 | is to always be an augmentation company
00:20:57.220 | and I think one, out of principle,
00:20:59.500 | that's something we really care about
00:21:01.120 | but two, actually, if you're framing yourself
00:21:04.260 | as an augmentation company,
00:21:06.100 | you're always gonna live in the world
00:21:07.500 | where you're solving tasks that are a little too hard
00:21:10.000 | for what the model can do today
00:21:11.740 | and still needs a human to provide oversight,
00:21:15.100 | provide clarifications, provide human feedback
00:21:17.620 | and that's how you build a data flywheel.
00:21:19.340 | That's how you actually learn from the smartest humans
00:21:21.380 | how to solve things models can't do today
00:21:23.220 | and so I actually think that being an augmentation company
00:21:25.980 | forces you to go develop your core AI capabilities faster
00:21:30.060 | than someone who's saying,
00:21:30.900 | ah, okay, my job's to deliver you
00:21:32.780 | a lights-off solution for X.
00:21:35.120 | - Yeah, it's interesting
00:21:36.340 | because we've seen two parts of the market.
00:21:39.100 | One is we have one company
00:21:40.740 | that does agents for SOC analysts.
00:21:43.340 | People just don't have them, you know,
00:21:45.020 | and just they cannot attract the talent to do it
00:21:47.020 | and similarly in software development,
00:21:49.260 | you have Copilot, which is the augmentation product
00:21:51.740 | and then you have Sweep.dev, any of these products,
00:21:54.860 | which is like, they just do the whole thing.
00:21:57.580 | I'm really curious to see how that evolves.
00:21:59.580 | I agree that today, the reliability's so important
00:22:02.500 | in the enterprise that they just don't use most of them.
00:22:05.560 | Yeah, no, that's cool.
00:22:08.380 | But it's great to hear the story
00:22:09.780 | because I think from the outside,
00:22:10.900 | people are like, oh, Dev, they do Act One,
00:22:13.180 | they do Persimon, they do Fuyu, they do all these--
00:22:15.500 | - It's just the public stuff.
00:22:16.660 | - It's just public stuff and so I think you're gonna find,
00:22:19.580 | so one of the things we haven't shared before
00:22:21.300 | is we're completely sold out for Q1.
00:22:23.340 | And so I think-- - Sold out of what?
00:22:25.340 | - Sold out of bandwidth to go onboard more customers.
00:22:27.740 | I think we're like working really hard to go,
00:22:29.820 | like make that less of a bottleneck,
00:22:31.860 | but you could, but our expectation is that,
00:22:35.740 | I think we're gonna be significantly more public
00:22:37.500 | about the broader product shape
00:22:39.740 | and the new types of customers
00:22:41.980 | we wanna attract later this year.
00:22:43.100 | So I think that clarification will happen by default.
00:22:46.620 | - Why have you become more public?
00:22:48.700 | You know, if the whole push has,
00:22:50.060 | you're sold out, you're my enterprise,
00:22:51.780 | but you're also clearly putting effort
00:22:53.100 | towards being more open or releasing more things.
00:22:56.700 | - I think we just flipped over that way fairly recently.
00:22:59.820 | I think that, like, that's a good question.
00:23:01.180 | I think it actually boils down to two things.
00:23:03.300 | The public narrative is really forming around agents
00:23:05.420 | as being the most important thing.
00:23:07.140 | And I'm really glad that's happening
00:23:08.540 | because when we started the company in January, 2022,
00:23:11.100 | like everybody in the field knew
00:23:13.700 | about the agents thing from RL, right?
00:23:15.340 | But like the general public had no conception
00:23:17.260 | of what it was.
00:23:18.100 | They would still hang their narrative hat
00:23:19.660 | on the tree of like, everything's a chatbot, right?
00:23:23.020 | And so I think now,
00:23:24.540 | I think one of the things that I really care about
00:23:26.260 | is that when people think agent,
00:23:27.860 | they actually think the right thing, right?
00:23:29.660 | Like all sorts of different things
00:23:31.660 | are being called agents.
00:23:32.500 | Chatbots are being called agents.
00:23:33.500 | Things that make a function call are being called agents.
00:23:35.260 | Like to me, an agent is something that you can give a goal
00:23:38.140 | and get an end step workflow done correctly
00:23:40.300 | in the minimum number of steps, right?
00:23:42.180 | And so that's a big part of why.
00:23:44.180 | And I think the other part is because
00:23:45.380 | I think it's always good for people
00:23:46.660 | to be more aware of Adept as they think about
00:23:48.220 | what the next thing they wanna do in their careers.
00:23:50.060 | And I think the field is quickly pivoting
00:23:52.420 | in a world where foundation models
00:23:54.380 | are looking more and more commodity.
00:23:56.580 | And I think a huge amount of gain is gonna happen
00:23:59.300 | from how do you use foundation models
00:24:01.940 | as like the well-learned behavioral cloner
00:24:05.700 | to go solve agents.
00:24:06.620 | And I think people who wanna do agents research
00:24:08.540 | should really come to Adept.
00:24:10.300 | - Yeah, excellent.
00:24:11.660 | When you say agents have become
00:24:13.100 | more part of the public narrative,
00:24:14.900 | are there specific things that you point to?
00:24:17.060 | So I'll name a few.
00:24:19.060 | Bill Gates, in his blog posts,
00:24:21.100 | mentioning that agents are the future.
00:24:23.020 | I'm the guy who made OSes,
00:24:24.540 | and I think agents are the next thing.
00:24:26.580 | So Bill Gates, I'll call that out.
00:24:28.380 | And then maybe Sam Altman also saying
00:24:29.900 | agents are the future for OpenAI.
00:24:31.580 | - And before that even, I think there was something
00:24:33.580 | like New York Times, Kate Metz
00:24:35.140 | wrote a New York Times piece about it.
00:24:37.180 | Right now, in a bit to differentiate,
00:24:39.020 | I'm seeing AI startups that used to just brand themselves
00:24:41.140 | as an AI company now brand themselves as an AI agent company.
00:24:44.060 | It's just like, it's a term.
00:24:45.260 | I just feel like people really wanna--
00:24:46.100 | - From the VC side, it's a bit mixed.
00:24:47.620 | - Is it?
00:24:48.460 | - As in like, I think there are a lot of VCs
00:24:49.820 | where like, I would not touch any agent startups
00:24:51.820 | 'cause like--
00:24:52.660 | - Why is that?
00:24:53.540 | - Well, you tell me.
00:24:54.380 | (laughs)
00:24:56.020 | - I think a lot of VCs that are maybe less technical
00:24:59.020 | don't understand the limitations of the--
00:25:00.940 | - No, that's not fair.
00:25:01.940 | - No, no, no, no, I think like--
00:25:02.780 | - You think so?
00:25:03.620 | - No, no, I think like the, what is possible today
00:25:06.380 | and like what is worth investing in, you know?
00:25:08.420 | And I think like, I mean, people look at you and say,
00:25:10.500 | "Wow, these guys are building agents.
00:25:12.020 | "They needed 400 million to do it."
00:25:13.860 | So a lot of VCs are maybe like,
00:25:15.260 | "Oh, I would rather invest in something
00:25:17.260 | "that is like tacking on AI to an existing thing,
00:25:20.000 | "which is like easier to get the market
00:25:21.440 | "and kind of get some of the flag wheel going."
00:25:24.260 | But I'm also surprised a lot of funders
00:25:26.660 | just don't wanna do agents.
00:25:27.900 | It's not even the funding.
00:25:28.820 | Like, sometimes we look around and it's like,
00:25:30.700 | "Why is nobody doing agents for X?"
00:25:33.300 | And it's like--
00:25:34.140 | - Wow.
00:25:34.980 | - I don't get it.
00:25:36.220 | - That's good to know, actually.
00:25:37.620 | I never knew that before.
00:25:39.580 | My sense from my limited perspective
00:25:42.220 | is there's a new agent and company popping up every day.
00:25:44.300 | So maybe I'm missing something.
00:25:45.140 | - There are, there are.
00:25:46.740 | But like I have advised people to take agents
00:25:48.780 | off of their title because it's so diluted.
00:25:52.060 | - It's now so diluted, yeah.
00:25:53.460 | - So then it doesn't stand for anything.
00:25:55.300 | - Yeah, that's a really good point.
00:25:56.500 | - So anyway, I do want to also cover,
00:25:59.340 | so like, you know, you're a portfolio allocator.
00:26:02.100 | You have like people know about Persimmon,
00:26:04.660 | people know about Fuyu and Fuyu Heavy.
00:26:06.700 | Can you take us through like how you think about
00:26:08.660 | that evolution of that and what people should think about
00:26:11.700 | what that means for adept sort of research directions?
00:26:14.980 | - The critical path for adept is we want to build
00:26:17.940 | agents that can do higher and higher level
00:26:20.300 | of abstraction things over time,
00:26:22.100 | all while keeping an insanely high reliability standard.
00:26:24.940 | Because that's what turns us from research
00:26:27.020 | into something that customers want.
00:26:28.580 | And if you build agents with a really high
00:26:30.380 | reliability standard but are continuing pushing
00:26:32.100 | a level of abstraction, you then learn from your users
00:26:34.420 | how to get that next level of abstraction faster.
00:26:36.180 | So that's how you actually build the data flywheel.
00:26:38.540 | That's the critical path for the company.
00:26:40.140 | Everything we do is in service of that.
00:26:41.780 | So if you go zoom way, way back to Act One days, right?
00:26:44.820 | Like the core thing behind Act One is,
00:26:46.180 | can we teach a large model, basically,
00:26:50.780 | how to even actuate your computer?
00:26:52.740 | And I think we were one of the first places
00:26:54.500 | to have solved that and shown it
00:26:56.340 | and shown the generalization that you get
00:26:57.780 | when you give it various different workflows and text.
00:27:00.420 | But I think from there on out,
00:27:01.980 | we really realized was that like,
00:27:03.380 | in order to get reliability,
00:27:06.540 | and also like companies just do things
00:27:07.980 | in various different ways,
00:27:08.820 | you actually want these models to be able
00:27:10.260 | to get a lot better at having some specification
00:27:14.060 | of some guardrails for what it actually should be doing.
00:27:16.340 | And I think in conjunction with that,
00:27:17.980 | a giant thing that was really necessary
00:27:20.100 | is really fast multimodal models
00:27:22.420 | that are really good at understanding knowledge work
00:27:24.140 | and really good at understanding screens.
00:27:25.940 | And that needs to kind of be the base
00:27:27.660 | for some of these agents.
00:27:29.620 | And so like, back then we had to do a ton of research,
00:27:32.620 | basically, on how do we actually make that possible?
00:27:34.780 | Well, first off, like, back in 2020,
00:27:37.220 | I forget the exact month of '23,
00:27:39.620 | like there were no multimodal models really
00:27:41.460 | that you could use for things like this.
00:27:43.500 | And so we pushed really hard on stuff
00:27:45.420 | like the Fuyu architecture.
00:27:46.860 | I think one big hangover from primarily academic focus
00:27:51.860 | for multimodal models is like,
00:27:53.580 | most multimodal models are primarily trained
00:27:55.340 | on like natural images, cat and dog photos,
00:27:57.740 | stuff that's come out of the camera.
00:27:59.100 | - COCO.
00:27:59.940 | - Yeah, right, and COCO is awesome.
00:28:01.300 | Like, I love COCO, I love TY.
00:28:03.020 | Like, it's like, it's really helped the field, right?
00:28:05.300 | But like, that's the build one thing.
00:28:06.980 | I actually think like, like, it's really clear today,
00:28:09.340 | multimodal models are the default foundation model, right?
00:28:12.020 | It's just gonna supplant LLMs.
00:28:13.300 | Like, why wouldn't you just train a giant multimodal model?
00:28:16.940 | And so for that though,
00:28:18.020 | like, where are they gonna be the most useful?
00:28:19.420 | They're gonna be most useful in knowledge work tasks.
00:28:21.600 | That's where the majority of economic value is gonna be.
00:28:23.340 | It's not in cat and dogs, right?
00:28:25.420 | And so if that's what it is, what do you need to train?
00:28:27.580 | I need to train on like charts, graphs, tables, invoices,
00:28:29.780 | PDFs, receipts, unstructured data, UIs.
00:28:32.060 | Like, that's just a totally different pre-training corpus.
00:28:35.380 | And so at Adept, we spent a lot of time building that.
00:28:37.900 | And so like, the public Fuyu models and stuff
00:28:39.900 | aren't trained on our actual corpus,
00:28:41.700 | they're trained on some other stuff.
00:28:42.840 | But you take a lot of that data
00:28:44.600 | and then you make it really fast,
00:28:46.380 | make it really good at things like,
00:28:47.780 | like dense OCR on screens.
00:28:50.640 | And then now you have like the right,
00:28:52.360 | like a raw putty to go make a good agent.
00:28:54.540 | So that's kind of like some of the modeling side.
00:28:56.540 | We've kind of only announced some of that stuff.
00:28:58.060 | We haven't really announced much of the agents work.
00:29:01.220 | But that if you put those together
00:29:02.900 | with the correct product form factor,
00:29:04.940 | and I think the product form factor also really matters.
00:29:07.320 | I think like we're seeing,
00:29:09.620 | and you guys probably see this a little bit more than I do,
00:29:11.600 | but like we're seeing like a little bit of a pushback
00:29:15.220 | against like the tyranny of chatbots as form factor.
00:29:18.420 | And I think that the reason why the form factor matters
00:29:21.180 | is the form factor changes what data you collect
00:29:23.020 | in the human feedback loop.
00:29:24.500 | And so I think we've spent a lot of time doing
00:29:27.660 | full vertical integration of all these bits
00:29:30.740 | in order to get to where we are.
00:29:32.180 | - Yeah.
00:29:33.020 | I'll plug Amelia Wattenberger's talk at our conference
00:29:36.620 | where she gave a little bit of the thinking
00:29:38.740 | behind like what else exists other than chatbots
00:29:41.260 | that if you could delegate to reliable agents,
00:29:43.140 | you could do.
00:29:43.980 | - Totally.
00:29:44.800 | - And yeah.
00:29:46.500 | I mean, so I was kind of excited at Adept Experiments
00:29:49.900 | or Adept Workflows.
00:29:50.740 | I don't know what the official name for it is.
00:29:52.900 | I was like, okay, like this is something I can use,
00:29:55.500 | but it seems like it's just an experiment for now.
00:29:57.180 | It's not your product.
00:29:58.420 | - Yeah.
00:29:59.240 | So we just use experiments as like a way to go push
00:30:01.400 | various ideas on the design side to some people
00:30:05.400 | and just like get them to play with it.
00:30:06.640 | And actually the experiments code base underpins
00:30:11.640 | the actual product,
00:30:14.520 | but it's like just the code base itself
00:30:16.960 | is like a kind of like a skeleton
00:30:18.600 | for us to go deploy arbitrary cards on the side.
00:30:20.760 | - Yep.
00:30:21.580 | Yeah, makes sense.
00:30:22.540 | Yeah.
00:30:23.440 | I was gonna say,
00:30:24.280 | I would love to talk about the interaction layer.
00:30:25.920 | So you train a model to see UI,
00:30:28.860 | but then there's the question of like,
00:30:30.480 | how do you actually act on the UI?
00:30:32.160 | I think there were some rumors about OpenAI building agents
00:30:35.160 | that kind of like manage the endpoint.
00:30:37.160 | So the whole computer,
00:30:39.040 | you're more at the browser level.
00:30:41.360 | Like, and I know I read in one of your papers,
00:30:44.200 | you have like a different representation,
00:30:48.600 | kind of like you don't just take the DOM and act on it.
00:30:48.600 | You do a lot more stuff.
00:30:50.320 | How do you think about the best way the models will interact
00:30:53.520 | with the software and like how the development of products
00:30:56.680 | is gonna change with that in mind
00:30:58.320 | as more and more of the work is done by agents
00:31:00.480 | instead of people?
00:31:01.400 | - There's so much surface area here.
00:31:02.720 | And it's actually one of the things I'm really excited about.
00:31:04.320 | And it's like, it's funny because like,
00:31:06.360 | I've spent most of my time doing research stuff,
00:31:08.880 | but there's like a whole new ball game
00:31:10.520 | that I've been learning about and I find it really cool.
00:31:13.640 | So I would say the best analogy I have
00:31:18.640 | to why Adept is pursuing a path of being able to just use
00:31:23.640 | your computer like a human,
00:31:26.400 | plus of course being able to call APIs
00:31:28.000 | is the easy part,
00:31:29.400 | like being able to use your computer
00:31:30.240 | like a human is a hard part.
00:31:31.480 | It's in the same way why people are excited
00:31:32.800 | about humanoid robotics, right?
00:31:34.320 | Like in a world where you had T equals infinity, right?
00:31:37.520 | You're probably gonna have various different form factors
00:31:39.400 | that robots could just be in and like all the specialization
00:31:42.320 | but the fact is that humans live in a human environment.
00:31:44.560 | So having a humanoid robot lets you do things that humans do
00:31:47.440 | without changing everything along the way.
00:31:49.840 | It's the same thing for software, right?
00:31:51.680 | Like if you go itemize out the number of things
00:31:55.120 | you wanna do on your computer,
00:31:56.560 | for which every step has an API,
00:31:59.240 | those numbers of workflows add up pretty close to zero.
00:32:01.920 | And so then many points along the way,
00:32:03.640 | you need the ability to actually control your computer
00:32:05.500 | like a human.
00:32:06.340 | It also lets you learn from human usage of computers
00:32:09.240 | as a source of training data that you don't get
00:32:10.720 | if you have to somehow figure out
00:32:14.080 | how every particular step needs to be
00:32:15.400 | some particular custom private API thing.
00:32:18.280 | And so I think like this is actually
00:32:19.520 | the most practical path.
00:32:20.880 | I think because it's the most practical path,
00:32:22.500 | I think a lot of success will come
00:32:24.840 | from going down this path.
00:32:26.240 | So what you're likely to see
00:32:27.280 | is you're gonna end up seeing agents
00:32:28.720 | that sort of like,
00:32:29.560 | I kind of think about this early days
00:32:31.160 | of the agent interaction layer level is a little bit like,
00:32:34.400 | do y'all remember Windows 3.1, like those days?
00:32:38.080 | Okay, I might be too old for you guys on this,
00:32:41.320 | but like back in the day, Windows 3.1, right?
00:32:43.840 | Like the way we had this transition period
00:32:46.400 | between like pure command line, right?
00:32:48.560 | being like the default, to this new world where
00:32:51.520 | the GUI is the default,
00:32:52.360 | and then you drop into the command line
00:32:53.520 | for like programmer things, right?
00:32:55.320 | The old way was you booted your computer up,
00:32:57.920 | DOS booted,
00:32:59.000 | and then it would give you the C colon slash thing,
00:33:01.360 | and you typed Windows and you hit enter,
00:33:03.160 | and then you got put into Windows.
00:33:05.260 | And then like GUI kind of became a layer
00:33:08.440 | above the command line.
00:33:09.740 | I think the same thing is gonna happen
00:33:12.440 | with agent interfaces,
00:33:13.840 | is like today we'll be having the GUI
00:33:16.360 | is like the base layer,
00:33:18.200 | and then the agent just controls
00:33:20.160 | the current GUI layer plus APIs.
00:33:22.840 | And in the future,
00:33:24.000 | as more and more trust is built towards agents,
00:33:25.800 | and more and more things can be done by agents,
00:33:27.720 | and more UIs for agents are actually generative
00:33:29.880 | in and of themselves,
00:33:31.240 | then that just becomes a standard interaction layer.
00:33:33.800 | And if that becomes a standard interaction layer,
00:33:35.600 | like what changes for software
00:33:37.080 | is that like a lot of software
00:33:38.880 | is gonna be either systems of record,
00:33:40.680 | or like certain customized workflow execution engines.
00:33:44.240 | And a lot of how you actually do stuff
00:33:46.520 | will be controlled at the agent layer.
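
(To make the "GUI plus APIs as the agent's action space" idea concrete, here is a minimal, hypothetical sketch; the class names, the model.next_action call, and the computer interface are illustrative assumptions on our part, not Adept's actual product or API.)

    from dataclasses import dataclass
    from typing import Optional, Union

    # Hypothetical action space: the agent can drive the GUI like a human
    # (click/type against what it sees on screen) or call an API where one exists.
    @dataclass
    class Click:
        x: int
        y: int

    @dataclass
    class TypeText:
        text: str

    @dataclass
    class CallApi:
        endpoint: str
        payload: dict

    Action = Union[Click, TypeText, CallApi]

    def run_agent(goal: str, model, computer, max_steps: int = 50) -> bool:
        """Illustrative loop: observe the screen, ask a multimodal model for the
        next step toward the goal, execute it, and stop when the model says done."""
        for _ in range(max_steps):
            screenshot = computer.capture_screen()         # sensor: what a human would see
            action: Optional[Action] = model.next_action(goal, screenshot)
            if action is None:                             # model signals the workflow is complete
                return True
            if isinstance(action, CallApi):                # easy path: an API exists for this step
                computer.call_api(action.endpoint, action.payload)
            elif isinstance(action, Click):                # hard path: actuate the GUI directly
                computer.click(action.x, action.y)
            elif isinstance(action, TypeText):
                computer.type_text(action.text)
        return False  # failed trajectory: surface it to a human supervisor rather than guess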
00:33:48.120 | - And you think so like the Rabbit interface
00:33:50.680 | is more like it would like,
00:33:51.780 | you're not actually seeing the app
00:33:53.280 | that the model interacts with,
00:33:54.680 | you're just saying,
00:33:55.940 | hey, I need to log this call on Salesforce.
00:33:57.960 | And like, you're never actually going
00:33:59.440 | on salesforce.com directly as the user.
00:34:02.340 | - I can see that being a model.
00:34:03.400 | I think I don't know enough about how,
00:34:06.560 | what using Rabbit in real life will actually be like
00:34:09.320 | to comment on that particular thing.
00:34:11.200 | But I think the broader,
00:34:13.400 | I think the broader idea that like,
00:34:15.600 | that like, you know, you have a goal, right?
00:34:18.320 | The agent knows how to break your goal down into steps.
00:34:20.320 | The agent knows how to use the underlying software
00:34:22.600 | and systems of record to achieve that goal for you.
00:34:24.880 | The agent maybe presents you information in a custom way
00:34:27.880 | that's only relevant to your particular goal.
00:34:30.940 | That all just really leads to a world
00:34:32.440 | where you don't really need to ever interface
00:34:35.820 | with the apps underneath,
00:34:36.840 | unless you're a power user for some niche thing.
00:34:38.760 | - General question.
00:34:39.960 | So first of all, I think like this whole,
00:34:42.200 | the sort of input mode conversation,
00:34:44.980 | I wonder if you have any analogies
00:34:46.920 | that you like with self-driving?
00:34:49.120 | Because I do think like,
00:34:50.880 | there's a little bit of like how the model
00:34:52.680 | should perceive the world.
00:34:54.100 | And, you know, the primary split in self-driving
00:34:56.440 | is LIDAR versus camera.
00:34:58.800 | And I feel like most agent companies that I'm tracking
00:35:03.080 | are all moving towards camera approach, which is like--
00:35:05.760 | - The multimodal approach that we're doing.
00:35:06.600 | - The non-multimodal vision, very, very heavy vision.
00:35:11.520 | All the Fuyu stuff that you're doing,
00:35:11.520 | you're focusing on that, including charts and tables and--
00:35:15.760 | - Yeah.
00:35:16.840 | - Do you find like inspiration there from like,
00:35:19.600 | the self-driving world?
00:35:22.020 | - That's a good question.
00:35:23.880 | I think sometimes the most useful inspiration
00:35:26.240 | I've found from self-driving is the levels analogy.
00:35:31.240 | And I think that's great. - Level one to five.
00:35:32.480 | - I think that's awesome.
00:35:34.280 | But I think that our number one goal is for agents
00:35:36.800 | not to look like self-driving,
00:35:38.480 | in that we wanna minimize the chances
00:35:40.720 | that agents are sort of a thing
00:35:42.320 | that you just have to bang your head at for a long time
00:35:45.880 | to get to like two discontinuous milestones,
00:35:47.940 | which is basically what's happened in self-driving.
00:35:50.440 | We wanna be living in a world
00:35:51.520 | where you have the data flywheel immediately,
00:35:53.720 | and that takes you all the way up to the top.
00:35:55.600 | But similarly, I mean, like compared to self-driving,
00:35:58.240 | like two things that people really undervalue:
00:36:00.680 | one is it's really easy to get the, like,
00:36:03.080 | driving a car down Highway 101 on a sunny day demo, right?
00:36:06.680 | Like that actually doesn't prove anything anymore.
00:36:09.320 | And I think the second thing is that
00:36:12.040 | as a non-self-driving expert,
00:36:13.920 | I think one of the things that we believe really strongly
00:36:18.100 | is that everyone undervalues the importance
00:36:22.700 | of really good sensors and actuators.
00:36:25.020 | And actually a lot of what's helped us
00:36:26.720 | get a lot of reliability is like a really strong focus
00:36:29.820 | on like actually why does the model not do this thing?
00:36:32.460 | And a non-trivial amount of the time,
00:36:33.800 | the model doesn't actually do the thing
00:36:36.100 | because if you're Wizard of Oz-ing it yourself,
00:36:38.260 | or if you have unreliable actuators, you can't do the thing.
00:36:41.580 | And so we've had to fix a lot of those problems.
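As an aside, a hedged sketch of the kind of bookkeeping this implies: separate "the actuator failed to execute a correct action" from "the model chose the wrong action" so each failure mode can be measured and fixed on its own. The function and parameter names are invented for illustration, not Adept's code.

```python
# Illustrative sketch: distinguish actuator failures from model failures so that
# reliability work targets the right layer. Not Adept's implementation.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def execute_with_diagnostics(action, actuator, verifier, max_retries: int = 2) -> bool:
    """actuator(action) performs the UI/API action; verifier() checks the post-state."""
    for attempt in range(max_retries + 1):
        try:
            actuator(action)                      # e.g. click, type, API call
        except Exception as exc:                  # actuator failure, not a model failure
            log.warning("actuator_error attempt=%d action=%r err=%s", attempt, action, exc)
            continue
        if verifier():                            # did the world end up in the intended state?
            return True
        log.warning("verification_failed attempt=%d action=%r", attempt, action)
    log.error("action_failed action=%r", action)  # only now blame the model's choice
    return False
```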
00:36:43.860 | - Yeah, makes sense.
00:36:45.360 | I was slightly surprised just because
00:36:47.160 | I do generally consider the Waymo's
00:36:49.200 | that we see all around San Francisco
00:36:51.000 | as the most, I guess, real case of agents that we have,
00:36:55.000 | you know, in very material ways.
00:36:57.760 | - Oh, that's absolutely true.
00:36:58.600 | I think they've done an awesome job,
00:37:00.000 | but it has taken a long time for self-driving to mature.
00:37:02.960 | Like from when it entered the consciousness
00:37:05.280 | and the 101, the driving down 101 on a sunny day moment
00:37:08.680 | happened to now, right?
00:37:10.720 | So I want to see that more compressed.
00:37:11.560 | - And then, you know, Cruise, you know, RIP recently.
00:37:15.180 | So, and then one more thing on just like,
00:37:18.140 | just going back on this reliability thing,
00:37:21.140 | something I have been holding in my head
00:37:24.060 | that I'm curious to get your commentary on is there's,
00:37:25.960 | I think there's a trade-off
00:37:26.800 | between reliability and generality,
00:37:29.020 | or I want to broaden reliability
00:37:30.780 | into just general like sort of production readiness
00:37:32.660 | and enterprise readiness scale.
00:37:34.180 | 'Cause you have reliability, you also have cost,
00:37:35.700 | you also have speed.
00:37:36.660 | Speed is a huge emphasis for ADEPT.
00:37:39.740 | All of that tends towards wanting to reduce generality:
00:37:44.520 | the tendency or the temptation is to reduce generality
00:37:47.520 | to improve reliability, to improve cost, and to improve speed.
00:37:50.540 | Do you perceive a trade-off?
00:37:54.080 | Do you have any insights that,
00:37:56.360 | that solve those trade-offs for you guys?
00:37:59.000 | - There's definitely a trade-off
00:38:00.640 | if you're at the Pareto frontier.
00:38:03.000 | I think a lot of folks aren't actually
00:38:04.680 | at the Pareto frontier.
00:38:05.840 | And I think the way you get there is basically like,
00:38:09.320 | how do you frame the fundamental agent problem
00:38:11.920 | in a way that just continues to benefit from data?
00:38:15.320 | And I think that, I think like one of the main ways
00:38:19.200 | of like being able to solve that particular trade-off
00:38:21.640 | is like, you basically just want to formulate the problem
00:38:26.640 | such that every particular use case
00:38:29.080 | just looks like you collecting more data
00:38:30.640 | to go make that use case possible.
00:38:32.160 | I think that's how you really solve it.
00:38:33.280 | Then you get into the other problems like,
00:38:34.680 | okay, are you overfitting on these end use cases, right?
00:38:36.760 | But like, you're not doing a thing
00:38:38.360 | where you're like being super prescriptive
00:38:39.900 | about the end steps
00:38:42.400 | that the model can only do, for example.
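A minimal sketch of that framing, assuming a hypothetical trajectory schema: every new customer use case just appends (observation, action) trajectories into one shared pool that the single agent model trains on, rather than a bespoke, prescriptive script per workflow.

```python
# "Every use case is just more data": record trajectories per use case into one
# shared training pool. Field names and paths are assumptions for illustration.
import json, time
from pathlib import Path

DATA_DIR = Path("trajectories")
DATA_DIR.mkdir(exist_ok=True)

def record_trajectory(use_case: str, steps: list[dict]) -> Path:
    """Append one demonstrated or executed trajectory for a given use case."""
    record = {
        "use_case": use_case,          # e.g. "log_sales_call", "file_expense_report"
        "timestamp": time.time(),
        "steps": steps,                # [{"observation": ..., "action": ...}, ...]
    }
    out = DATA_DIR / f"{use_case}.jsonl"
    with out.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return out

# Every new workflow becomes more rows in the same pool, not a hard-coded script.
record_trajectory("log_sales_call", [{"observation": "crm_home", "action": "click:new_call"}])
```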
00:38:44.200 | - I mean, so then the question becomes kind of,
00:38:47.740 | do you have one sort of house model
00:38:49.640 | that you then customize for each customer
00:38:52.920 | and you're fine-tuning them
00:38:53.880 | on like each customer's specific use case?
00:38:55.640 | - Yeah, we're not sharing that one.
00:38:57.000 | - You're not sharing that.
00:38:59.520 | It's tempting because,
00:39:00.440 | but like that doesn't look like AGI to me.
00:39:02.440 | You know what I mean?
00:39:03.280 | Like that is just, you have a good base model
00:39:05.160 | and then you fine-tune it to others.
00:39:07.040 | - Yeah, yeah, yeah.
00:39:09.080 | I mean, I think for what it's worth,
00:39:10.680 | I think there's like two paths to a lot more capability
00:39:15.680 | coming out of the model set
00:39:17.300 | that we all are training these days.
00:39:19.120 | I think one path is you figure out how to spend,
00:39:21.920 | compute and turn it into data.
00:39:23.840 | I think the other path, and so like in that path, right,
00:39:26.280 | I consider search, RL, all the things that we all,
00:39:29.920 | that we all love in this era as part of that path,
00:39:32.720 | like self-play, all that stuff.
00:39:34.740 | The second path is how do you get like super competent,
00:39:39.740 | high intelligence demonstrations from humans.
00:39:44.940 | And I think the right way to move forward
00:39:46.940 | is you kind of want to combine the two.
00:39:48.700 | Like the first one gives you maximum sample efficiency
00:39:51.540 | for a little bit,
00:39:53.260 | but I think that it's gonna be hard
00:39:55.260 | to be running at max speed towards AGI
00:39:59.100 | without actually solving a bit of both.
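As a rough illustration of combining the two paths David names, here is a hedged sketch that mixes reward-filtered model rollouts (the compute-to-data path, e.g. search or self-play) with human demonstrations into one training stream. The mixing ratio, reward threshold, and field names are all assumptions.

```python
# Sketch: interleave high-quality human demonstrations with verifier-filtered
# model-generated rollouts. Illustrative only; thresholds and ratios are assumed.
import random

def build_training_mix(human_demos: list[dict],
                       model_rollouts: list[dict],
                       human_fraction: float = 0.5,
                       min_reward: float = 0.8) -> list[dict]:
    """Filter model rollouts by a reward/verifier score, then interleave with demos."""
    good_rollouts = [r for r in model_rollouts if r.get("reward", 0.0) >= min_reward]
    n_total = min(len(human_demos), len(good_rollouts)) * 2 or len(human_demos)
    n_human = int(n_total * human_fraction)
    mix = random.sample(human_demos, min(n_human, len(human_demos))) + \
          random.sample(good_rollouts, min(n_total - n_human, len(good_rollouts)))
    random.shuffle(mix)
    return mix

demos = [{"source": "human", "steps": ["..."]}]
rollouts = [{"source": "self_play", "reward": 0.9, "steps": ["..."]}]
print(len(build_training_mix(demos, rollouts)))
```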
00:40:00.660 | - Yeah, any insights on,
00:40:03.180 | you haven't talked much about synthetic data
00:40:04.940 | as far as I can tell.
00:40:06.000 | Probably this is a bit of a, too much of a trend right now,
00:40:11.020 | but any insights on using synthetic data
00:40:12.820 | to augment the expensive human data?
00:40:15.380 | - The best part about framing AGI
00:40:17.740 | as being able to help people do things on computers
00:40:20.360 | is you have an environment.
00:40:21.420 | - Yes.
00:40:22.260 | - So.
00:40:23.100 | (laughs)
00:40:24.140 | - So you can simulate all of it.
00:40:25.420 | - You could do a lot of stuff when you have an environment.
00:40:27.820 | - Yeah.
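A toy sketch of why having an environment matters for synthetic data: roll the current agent out in a simulated app, check the end state programmatically, and keep only successful trajectories as training data. The SimulatedCRM environment and random policy below are invented purely for illustration.

```python
# Synthetic data from an environment: generate rollouts, keep the ones whose end
# state satisfies the goal check. Everything here is a toy stand-in.
import random

class SimulatedCRM:
    """Tiny stand-in for a software environment with a checkable goal state."""
    def __init__(self):
        self.calls_logged = 0
    def step(self, action: str):
        if action == "log_call":
            self.calls_logged += 1
    def goal_reached(self) -> bool:
        return self.calls_logged >= 1

def generate_synthetic_trajectories(n_rollouts: int = 100) -> list[list[str]]:
    kept = []
    for _ in range(n_rollouts):
        env, trajectory = SimulatedCRM(), []
        for _ in range(3):  # a random policy standing in for the current agent
            action = random.choice(["log_call", "open_menu", "noop"])
            env.step(action)
            trajectory.append(action)
        if env.goal_reached():  # programmatic reward: no human labeling needed
            kept.append(trajectory)
    return kept

print(f"kept {len(generate_synthetic_trajectories())} successful rollouts")
```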
00:40:28.660 | - We were having dinner for our one year anniversary.
00:40:30.700 | - Congrats.
00:40:31.980 | - Yeah, thank you.
00:40:33.300 | Raza from HumanLoop was there
00:40:34.900 | and we mentioned you were coming on the pod with,
00:40:37.900 | this is our first.
00:40:38.740 | - So he submitted a question.
00:40:39.560 | - Yeah, this is our first, I guess, like mailbag question.
00:40:42.480 | He asked, when you started, GPT-4 didn't exist,
00:40:46.460 | now you've had GPT-4 Vision,
00:40:48.420 | which can help you build a lot of those things.
00:40:51.580 | How do you think about the things that are unique to you
00:40:54.860 | as ADEPT and like going back to like the,
00:40:56.980 | maybe research direction that you want to take the team
00:40:59.500 | and what you want people to come work on at ADEPT
00:41:02.260 | versus what is maybe now become commoditized
00:41:05.020 | that you didn't expect everybody would have access to?
00:41:07.540 | - Yeah, that's a really good question.
00:41:09.200 | I think implicit in that question,
00:41:11.340 | and I wish he were here too,
00:41:12.700 | so he could push back on my assumption about his question.
00:41:15.900 | But I think implicit in that question is like,
00:41:20.140 | is a calculus of where does advantage accrue
00:41:24.280 | in the overall ML stack.
00:41:26.120 | And maybe part of the assumption is that advantage accrues
00:41:29.480 | solely to base model scaling.
00:41:31.740 | But I actually believe pretty strongly
00:41:33.500 | that the way that you really win
00:41:36.700 | is that you have to go build an agent stack
00:41:41.700 | that is much more than that of the base model itself.
00:41:45.220 | And so I think like that is like always gonna be
00:41:48.260 | a giant advantage of vertical integration.
00:41:50.160 | I think like it lets us do things
00:41:51.340 | like have a really, really fast base model
00:41:53.060 | is really good at agent things,
00:41:54.520 | but is bad at cat and dog photos.
00:41:56.060 | It's pretty good at cat and dog photos.
00:41:57.460 | It's not like SOTA at cat and dog photos.
00:42:00.880 | So like we're allocating our capacity wisely,
00:42:04.840 | is like one thing that you really get to do.
00:42:06.880 | I also think that the other thing
00:42:08.240 | that is pretty important now
00:42:10.600 | in the broader foundation modeling space is like,
00:42:13.280 | I feel despite any potential concerns about,
00:42:17.520 | like how good is agents as like a startup area,
00:42:21.200 | that we were talking about earlier,
00:42:22.680 | I feel super good that we're doing foundation models
00:42:26.200 | in service of agents and all of the reward
00:42:28.260 | within ADEPT is flowing from, can we make a better agent?
00:42:31.620 | Because right now, I think we all see that,
00:42:34.500 | if you're training on publicly available web data,
00:42:37.500 | you put in the flops and you do reasonable things,
00:42:40.100 | then you get decent results.
00:42:41.780 | And if you just double the amount of compute,
00:42:43.740 | then you get predictably better results.
00:42:45.340 | And so like, I think pure play foundation model companies
00:42:48.800 | are just gonna be pinched by how good
00:42:52.240 | the next couple of llamas are gonna be.
00:42:53.940 | And the next good open source thing,
00:42:56.720 | and then seeing the really big players
00:43:00.120 | put ridiculous amounts of compute
00:43:01.800 | behind just training these base foundation models.
00:43:04.200 | I think it's gonna commoditize a lot of the regular LLMs
00:43:08.920 | and soon regular multimodal models.
00:43:10.680 | So I feel really good that we're just focused on agents.
00:43:13.240 | - So you don't consider yourself
00:43:14.600 | a pure play foundation model company?
00:43:16.560 | - No, because if we were a pure play foundation model
00:43:18.200 | company, we would be training general foundation models
00:43:21.680 | that do summarization and all this other--
00:43:24.520 | - Right, you're dedicated towards the agent.
00:43:26.800 | - Yeah, and our business is an agent business.
00:43:28.740 | We're not here to sell you tokens, right?
00:43:30.340 | And I think selling tokens,
00:43:32.600 | unless there's like a-- - We're not here
00:43:34.720 | to sell you tokens.
00:43:35.540 | I love it.
00:43:36.380 | - It's like, if you have a particular area of specialty,
00:43:41.080 | then you won't get caught in the fact that
00:43:43.940 | everyone's just scaling to ridiculous levels of compute.
00:43:47.500 | But if you don't have a specialty,
00:43:48.500 | I find that, I think it's gonna be a little tougher.
00:43:51.060 | - Interesting.
00:43:51.900 | Are you interested in robotics at all?
00:43:53.860 | - Personally fascinated by robotics.
00:43:55.140 | I always love, have always loved robotics.
00:43:57.620 | - No, but embodied agents as a business,
00:43:59.420 | Figure is like a big, also sort of OpenAI-affiliated
00:44:02.420 | company that raised a lot of money.
00:44:04.100 | - Yeah, I think it's cool.
00:44:05.260 | I think, I mean, I don't know exactly what they're doing,
00:44:09.040 | but-- - Robots.
00:44:10.640 | - Yeah, well, I mean, that's, yeah.
00:44:13.860 | - What question would you ask if we had them on?
00:44:15.460 | Like, what would you ask them?
00:44:16.840 | - Oh, I just wanna understand what their overall strategy
00:44:19.020 | is gonna be between now and when there's reliable stuff
00:44:21.540 | to be deployed.
00:44:22.940 | But honestly, I just don't know enough about it.
00:44:24.500 | - And if I told you, hey, fire your entire workforce,
00:44:28.060 | warehouse workforce, and put robots in there.
00:44:30.820 | Like, isn't that a strategy?
00:44:33.100 | - Oh, yeah, yeah, sorry, I'm not questioning
00:44:35.100 | whether they're doing smart things.
00:44:36.820 | I hope I didn't come off that way.
00:44:38.700 | - No, no, no, no, you didn't.
00:44:39.540 | - It's just like, I genuinely don't know
00:44:40.940 | what they're doing as much.
00:44:42.520 | But I think like, look, I think there's two things.
00:44:46.820 | One, I'm so excited for someone to train
00:44:50.300 | a foundation model of robots.
00:44:52.060 | Like, it's just, I think it's just gonna work.
00:44:54.820 | Like, I will die on this hill.
00:44:56.940 | I mean, like, again, this whole time,
00:44:59.260 | like, we've been on this podcast just continually saying,
00:45:01.620 | you know, like, these models are basically
00:45:04.820 | behavioral cloners, right?
00:45:05.860 | So let's go behavioral clone all this, like,
00:45:07.300 | robot behavior, right?
00:45:08.380 | And then you figure out everything else you have to do
00:45:10.580 | in order to teach you how to solve new problems.
00:45:12.200 | Like, that's gonna work.
00:45:13.540 | I'm super stoked for that.
00:45:15.880 | I think, unlike what we're doing with
00:45:19.020 | helping humans with knowledge work,
00:45:21.120 | it just sounds like a more zero-sum,
00:45:24.300 | like, job replacement play, right?
00:45:26.420 | And I'm personally less excited about that.
00:45:29.380 | - We had Kanjun from Imbue on the podcast.
00:45:33.820 | - Another guest.
00:45:34.860 | - Yeah, we asked her why people should go work there
00:45:37.340 | and not at ADEPT.
00:45:38.460 | - Oh, that's so funny.
00:45:39.520 | - So I wanna, her, well, she said, you know,
00:45:44.660 | there's space for everybody in this market.
00:45:46.720 | We're all doing interesting work.
00:45:48.080 | And she said, they're really excited about building
00:45:50.740 | an operating system for agents.
00:45:52.160 | And for her, the biggest research thing was, like,
00:45:55.040 | getting models better at reasoning
00:45:56.760 | and planning for these agents.
00:45:59.240 | The reverse question to you, you know,
00:46:01.360 | why should people be excited to come work at ADEPT
00:46:03.800 | instead of Imbue?
00:46:04.920 | And maybe what are, like, the core research questions
00:46:08.360 | that people should be passionate about
00:46:09.800 | to have fun at ADEPT?
00:46:12.080 | - Yeah, first off, I think that,
00:46:15.220 | I'm sure you guys believe this too,
00:46:16.620 | but, like, the AI space, to the extent there's an AI space
00:46:21.140 | and the AI agent space are both, like, exactly,
00:46:24.580 | as she likely said, like, I think colossal opportunities
00:46:28.340 | and, like, people are just gonna end up winning
00:46:31.380 | in different areas and people are all just gonna,
00:46:33.140 | a lot of companies are gonna do well.
00:46:35.420 | So I really don't feel that zero-sum thing at all.
00:46:37.980 | I would say, like, to, like, change the zero-sum framing
00:46:40.660 | is, like, why should you be at ADEPT?
00:46:43.400 | I think there's two huge reasons to be at ADEPT.
00:46:46.360 | I think one of them is, like, everything we do
00:46:49.720 | is in the service of, like, useful agents.
00:46:52.040 | Like, we're not a research lab.
00:46:53.600 | Like, we do a lot of research in service of that goal,
00:46:56.280 | but we don't think about ourselves
00:46:58.280 | as, like, a classic research lab at all.
00:47:00.480 | And I think the second reason to work at ADEPT
00:47:02.840 | is if you believe that actually having customers
00:47:05.380 | and a reward signal from customers
00:47:06.960 | lets you build AGI faster, which we really believe,
00:47:10.300 | then you should come here.
00:47:11.260 | And I think the examples for why that's true is, like,
00:47:13.660 | for example, like, our evaluations,
00:47:15.840 | they're not academic evals.
00:47:17.000 | They're not, like, simulator evals.
00:47:20.100 | They're, like, okay, like, we have a customer
00:47:21.940 | that really needs us to do these particular things.
00:47:24.480 | We can do some of them.
00:47:25.320 | These are the ones they want us to do.
00:47:26.140 | We can't do them at all.
00:47:26.980 | We've turned those into evals.
00:47:28.080 | Like, solve it, right?
00:47:29.440 | Like, I think that's really cool.
00:47:30.760 | Like, everybody knows a lot of these evals
00:47:32.200 | are, like, pretty saturated,
00:47:33.560 | and the new ones that even are not saturated,
00:47:35.360 | you look at someone and you're, like,
00:47:36.200 | is this actually useful, right?
00:47:37.920 | I think that's a degree of, like,
00:47:41.340 | of, like, practicality that really helps.
00:47:43.180 | Like, we're equally excited about the same problems
00:47:45.980 | around reasoning and planning and generalization
00:47:50.540 | and all of this stuff, but it's, like,
00:47:52.940 | they're very grounded in actual needs right now,
00:47:55.420 | which is really cool.
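A hedged sketch of the customer-driven eval idea described above: each task a customer actually needs becomes a row with a pass/fail check, and the score is simply the fraction the agent can complete. The task names and checker functions below are made up for illustration.

```python
# Turning customer-requested tasks into an eval set: one (task, checker) pair per
# request, scored by completion rate. Illustrative only; tasks are hypothetical.
from typing import Callable

CustomerEval = tuple[str, Callable[[], bool]]   # (task description, outcome checker)

def run_customer_evals(evals: list[CustomerEval]) -> float:
    passed = 0
    for task, check in evals:
        ok = check()  # run the agent on the task, verify the real-world outcome
        print(f"[{'PASS' if ok else 'FAIL'}] {task}")
        passed += ok
    return passed / len(evals)

# Example: two tasks a (hypothetical) customer asked for, one currently failing.
evals = [
    ("Log a call against the right account", lambda: True),
    ("Reconcile this week's invoices",        lambda: False),
]
print(f"score: {run_customer_evals(evals):.0%}")
```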
00:47:56.580 | - Yeah, this has been a wonderful dive.
00:47:59.060 | You know, I wish we had more time,
00:48:00.060 | but, you know, I would just leave it kind of open to you.
00:48:01.980 | I think you have broad thoughts, you know,
00:48:04.620 | just about the agent space,
00:48:05.580 | but also just the general AI space.
00:48:06.960 | Any sort of rants or things
00:48:09.380 | that are helpful for you right now?
00:48:11.400 | - Any rants?
00:48:12.640 | - Mining you for just general.
00:48:15.180 | - Wow, okay, so Amelia's already made the rant
00:48:17.680 | better than I have, but, like, not just chatbots
00:48:21.000 | is, like, kind of rant one.
00:48:23.120 | Rant two is, like, AI's really been the story of compute
00:48:28.120 | and compute plus data and ways in which
00:48:30.680 | you could change one for the other.
00:48:32.560 | And I think as much as our research community
00:48:37.560 | is really smart, like, we have made many, many advancements,
00:48:41.280 | and that's gonna continue to be important,
00:48:43.480 | but, like, now I think the game is increasingly changing,
00:48:47.160 | and, like, the rapid industrialization era has begun,
00:48:52.160 | and I think we, unfortunately, have to embrace it.
00:48:54.680 | - Yep, excellent.
00:48:55.520 | - Awesome, David, thank you so much for your time.
00:48:57.760 | - Cool, yeah, thanks, guys, this was fun.
00:48:59.900 | (upbeat music)