Why Google failed to make GPT-3 -- with David Luan of Adept
Chapters
0:00 Introduction of David Luan, CEO and co-founder of Adept
1:14 David's background and career trajectory
3:20 Transition from reinforcement learning to transformers in the AI industry
5:35 History and development of GPT models at OpenAI and Google
13:08 Adept's $420 million funding rounds
13:38 Explanation of what Adept does and their vision for AI agents
19:20 Reasons for Adept becoming more public-facing
21:00 Adept's critical path and research directions (Persimmon, Fuyu, ACT-1)
26:23 How AI agents should interact with software and impact product development
30:37 Analogies between AI agents and self-driving car development
32:42 Balancing reliability, cost, speed and generality in AI agents
35:11 Adept's unique positioning and advantages in the AI industry
37:30 Potential of foundation models for robotics
39:22 Core research questions and reasons to work at Adept
40:57 David's closing thoughts on the AI agent space and industrialization of AI
- Hey everyone, welcome to the Latent Space Podcast. 00:00:42.840 |
CEO, co-founder of Adept in the studio, welcome. 00:00:47.340 |
I met you socially at one of those VC events, 00:00:50.560 |
and you said that you were interested in coming on, 00:00:52.640 |
and glad we finally were able to make this happen. 00:01:00.360 |
and then also just have you talk a little bit 00:01:03.800 |
what people should just generally know about you. 00:01:10.800 |
video detection classification API that was Dextro, 00:01:15.480 |
and that was your route to getting acquired into Axon, 00:01:27.520 |
VP of Eng for two and a half years, two years and a bit, 00:01:31.520 |
briefly served as tech lead of large models at Google, 00:01:47.440 |
- I guess a broader story was joined OpenAI fairly early, 00:01:53.840 |
two and a half to three years leading engineering there. 00:01:56.520 |
It's really funny, I think second or third day 00:01:59.720 |
of my time at OpenAI, Greg and Ilya pulled me in a room 00:02:03.920 |
and were like, "Hey, you should take over our direction. 00:02:12.640 |
So that was fun, just coalescing a bunch of teams 00:02:19.320 |
The company, the Dota effort was going pretty hard, 00:02:28.660 |
And then at Google, so I led Google's LLM efforts, 00:02:36.480 |
And I think there's been a couple of different eras 00:02:40.600 |
And if we count everything before 2012 as prehistory, 00:02:46.040 |
you kinda had this like you and your three best friends 00:02:48.040 |
write a research paper that changes the world period 00:02:52.640 |
And then from, and I think the game changed in 2017, 00:03:02.440 |
that the world would be covered in data centers, 00:03:06.360 |
- Yeah, well, like I think we had conviction in that, 00:03:08.640 |
but it wasn't until we started seeing results 00:03:10.360 |
that it became clear that that was where we had to go. 00:03:12.600 |
But also part of it as well was like for OpenAI, 00:03:21.040 |
compared to, hey, we're just a smaller Google Brain, 00:03:29.120 |
That's like not enough to like hang your technical identity 00:03:38.360 |
a certain class of like giant swings and bets, right? 00:03:44.460 |
you just do bottom-up research to more about like, 00:03:50.100 |
what are the big scientific outcomes that you wanna show? 00:03:55.360 |
whether or not you care about novelty and all that stuff. 00:03:57.680 |
And that became the dominant model for a couple years, right? 00:04:00.740 |
And then what's changed now is I think that like 00:04:07.880 |
is gonna be the deep co-design and co-evolution 00:04:17.200 |
And that's a big part of why I started Adept. 00:04:20.200 |
Any memories thinking from like the switch from RL 00:04:23.840 |
to Transformers at the time and kind of how the industry 00:04:29.800 |
and leaving behind some of the more agent simulation work? 00:05:01.900 |
is a definition of AGI that's oriented around 00:05:27.000 |
doesn't come with that out of the box, right? 00:05:33.460 |
hey, how do we solve problems of that caliber? 00:05:37.840 |
like de novo RL is like a pretty terrible way 00:05:48.180 |
like what will it actually take to build AGI? 00:05:50.480 |
And his view is basically that you have to reproduce 00:06:10.000 |
So like today, maybe LLMs is like behavioral cloning 00:06:12.720 |
every word that gets written on the internet. 00:06:14.920 |
In the future, you know, like now the multimodal models 00:06:23.560 |
Where like tokens of data that have high signal come in, 00:06:27.320 |
and then all of those patterns are like learned by the model 00:06:30.280 |
and then you can regurgitate any combination out, right? 00:06:36.040 |
like to other image out or video out or whatever, 00:06:44.960 |
And I think now we're back to the era of like, 00:06:48.440 |
how do we combine this with all of the lessons we learned 00:07:02.360 |
'cause it's really nice, like personal, you know, 00:07:19.600 |
- Right around that? - I was right around that, yeah. 00:07:23.840 |
you know, just kind of came in and was like very obsessed 00:07:41.860 |
So I think the real story of GPT starts at Google, 00:07:46.340 |
Because that's where transformers sort of came about. 00:07:49.780 |
The number one shocking thing to me was that, 00:07:56.260 |
like you and your three best friends write papers, right? 00:07:59.940 |
I think about my job when I was a full-time research leader 00:08:03.220 |
as a little bit of a portfolio allocator, right? 00:08:14.280 |
My job is not actually to promote a million ideas 00:08:21.600 |
my job is to nudge resources towards the things 00:08:24.600 |
that are really working and then start disbanding 00:08:27.160 |
some of the things that are not working, right? 00:08:29.200 |
That muscle did not exist during my time at Google. 00:08:33.240 |
And I think had they had it, what they would have done 00:08:35.400 |
would be say, hey, Noam Shazeer, you're a brilliant guy, 00:08:42.480 |
And then I think they would have destroyed us. 00:08:45.800 |
He's talking about trillion parameter models in 2017. 00:08:50.840 |
Which is that, and I'm jumping around historically, right? 00:08:53.440 |
But like, after GPT-2, we were all really excited 00:08:55.800 |
about GPT-2, I can tell you more stories about that. 00:08:58.760 |
It was the last paper that I even got to really touch 00:09:04.800 |
You know, every day we were scaling up GPT-3, 00:09:13.220 |
Google has all this compute, Google has all the people 00:09:15.880 |
who invented all of these underlying technologies. 00:09:22.500 |
about how he wants a trillion parameter model. 00:09:26.880 |
we're probably just doing duplicative research 00:09:32.600 |
that's probably gonna get there before we do. 00:09:42.440 |
So during my year where I led the Google LLM effort 00:09:54.840 |
And did you guys remember the brain credit marketplace? 00:09:57.880 |
- Oh, so it's actually, you can ask any Googler, 00:10:02.360 |
- I mean, look, like yeah, limited resources, 00:10:04.840 |
you gotta have some kind of marketplace, right? 00:10:12.560 |
basically everyone's assigned a credit, right? 00:10:14.320 |
So if you have a credit, you get to buy N chips 00:10:20.820 |
you gotta convince like 19 or 20 of your colleagues 00:10:27.020 |
it's really hard to get that bottom up critical mass 00:10:31.880 |
And like, and the team at Google were fighting valiantly, 00:10:36.880 |
simply because we took big swings and we focused. 00:10:40.840 |
And I think, again, that's like part of the narrative 00:10:48.360 |
And I think in the same way, I think phase three companies 00:10:52.920 |
because of the same like asymmetry of success. 00:10:56.120 |
- Yeah, I think it's underrated how much Nvidia 00:11:17.280 |
I have so much respect for Nvidia, it is unreal. 00:11:23.480 |
or you just work with whatever Nvidia gave them. 00:11:29.120 |
There's, I'm not sure I can share all the stories, 00:11:39.200 |
He was on one of my teams, the supercomputing team, 00:11:46.360 |
But as a result, like we had very close ties to Nvidia. 00:11:50.640 |
Actually, one of my co-founders at Adept, Erich Elsen, 00:11:55.120 |
And so he and Scott and like Bryan Catanzaro at Nvidia 00:11:58.640 |
and Jonah and Ian at Nvidia, I think all were very close. 00:12:02.640 |
And we're all sort of part of this group of just like, 00:12:04.240 |
how do we push these chips to the absolute limit? 00:12:12.400 |
knowing the A100 generation that like 2:4 sparsity 00:12:15.920 |
Is that something that we wanna go look into, right? 00:12:19.400 |
that we could actually use for model training. 00:12:21.200 |
And I think more and more people realize this, 00:12:22.920 |
but like six years ago, or even three years ago, 00:12:28.040 |
Like this era of AI is really a story of compute. 00:12:30.160 |
It's really the story of how do you more efficiently map 00:12:32.760 |
like actual usable model flops to compute, right? 00:12:38.240 |
Is there another, you know, sort of GPT-2, 3 story 00:12:42.160 |
that like, you know, you love to get out there 00:12:45.040 |
that I think you think is like underappreciated 00:12:47.040 |
for like the amount of work that people put into it? 00:12:55.840 |
And I remember one of the most entertaining moments, 00:13:03.320 |
was like the shortest modeling section of any ML, 00:13:05.720 |
like reasonably legitimate ML paper to that moment. 00:13:10.160 |
like this is a standard vanilla decoder only transformer 00:13:14.960 |
It was like a paragraph long, if I remember correctly. 00:13:17.240 |
And both of us were just looking at the same, 00:13:33.520 |
where we just leaned fully into all we care about 00:13:36.080 |
is solving problems in AI and not about like, 00:13:44.160 |
that doesn't actually help move the field forward? 00:13:48.760 |
And it's like, you innovate on maybe like data set 00:13:50.880 |
and scaling and not so much the architecture. 00:13:58.680 |
there's a collection of really hard won knowledge 00:14:00.480 |
that you get only by being at the frontiers of scale. 00:14:12.120 |
But yeah, that's the stuff that helps differentiate 00:14:24.680 |
Sam Altman, myself, and our CFO flew up to Seattle 00:14:31.000 |
so I always had like a tremendous amount of anxiety 00:14:36.400 |
because it's like Kevin Scott and Satya and Amy Hood. 00:14:40.320 |
And it was my job to give the technical slides about, 00:14:44.640 |
what's our research portfolio, all of this stuff. 00:14:47.200 |
But it was also my job to give the GPT-2 demo. 00:14:56.800 |
model behaviors you find predictable at one checkpoint 00:15:01.160 |
And so like, I'd spent all this time trying to figure out 00:15:10.160 |
over to like Satya and Kevin and let them type anything in. 00:15:14.040 |
And that just, that really kept me up all night. 00:15:28.040 |
so I'm sure you do great in partners meetings. 00:15:32.080 |
- No, that's a high compliment coming from a VC. 00:15:34.360 |
- Yeah, no, I mean, you're doing great already. 00:15:46.240 |
and then have the founders fill in the blanks, 00:15:50.880 |
- Yeah, so I think Adept is like the least understood company 00:15:54.800 |
in the like broader space of foundation models plus agents. 00:15:58.480 |
So I'll give some color and I'll explain what it is, 00:16:02.280 |
and I'll explain also why it's actually pretty different 00:16:07.560 |
So the goal for Adept is we basically wanna build 00:16:11.760 |
an AI agent that can basically help humans do anything 00:16:19.480 |
we want this thing to be super good at turning 00:16:21.680 |
natural language, like goal specifications, right? 00:16:27.480 |
and then also have all the correct sensors and actuators 00:16:29.920 |
to go get that thing done for you across any software tool 00:16:33.520 |
And so the end vision of this is effectively like, 00:16:36.640 |
everyone's gonna have access to like an AI teammate 00:16:38.840 |
that they can delegate arbitrary tasks to at work, 00:16:42.560 |
and then also be able to use it as a sounding board 00:16:45.000 |
and like just be way, way, way more productive, right? 00:16:50.600 |
from something where you're mostly doing execution 00:16:52.400 |
to something where you're mostly actually doing 00:16:58.160 |
I find this like really exciting and motivating 00:16:59.760 |
because I think it's actually a pretty different vision 00:17:06.120 |
are the most likely systems to be proto-AGIs. 00:17:09.800 |
But I think the ways in which we are really counterintuitive 00:17:12.180 |
to everybody is that we've actually been really quiet 00:17:28.520 |
and like we want more users signing up for that thing. 00:17:33.040 |
we work with like a range of different companies, 00:17:36.600 |
some like late stage, like multi-thousand people startups, 00:17:42.560 |
And what we do for them is we basically give them 00:17:45.240 |
an out of the box solution where like big complex workflows 00:17:52.280 |
So we look a little different from other companies 00:17:54.360 |
in that like in order to go build this full agent thing, 00:17:57.740 |
the most important thing you gotta get right is reliability. 00:18:05.480 |
was we released this demo called ACT-1, right? 00:18:13.220 |
by going to Redfin and asking to buy a house somewhere. 00:18:15.960 |
'Cause like we did that in the original ACT-1 demo 00:18:18.280 |
and like showed that, showed like Google Sheets, 00:18:22.240 |
But over the last like year since that has come out, 00:18:34.760 |
on how do we build an amazing enterprise product, 00:18:38.640 |
can't use anything that isn't in the nines of reliability. 00:18:44.280 |
than what you might find in the prompt engineering 00:18:47.240 |
sort of plays in the agent space to get that reliability. 00:18:52.240 |
And we've decided to prioritize reliability over all else. 00:18:59.400 |
being sent to a place as the result of the agent workflow. 00:19:02.880 |
And if you're like, if that works like 60% of the time, 00:19:14.640 |
I'm actually giving a talk at NVIDIA GTC about this, 00:19:19.800 |
you're wrapping user productivity in software 00:19:25.080 |
is replacing things that you would ask somebody to do 00:19:33.240 |
do the users still go in and like look at the agent 00:19:37.480 |
kind of like doing the things and can intervene 00:19:39.280 |
or like are these like fully removed from them? 00:19:41.600 |
Like the truck thing is like, does the truck just show up 00:19:43.700 |
or like are there people in the middle like checking in? 00:19:46.200 |
- Yeah, so actually what's been really interesting 00:19:47.920 |
is you could question whether they're fundamental, 00:19:55.080 |
I think that one of them is like in our experience 00:20:07.600 |
including the shitty rote stuff to I'm a supervisor 00:20:15.060 |
because like now it parallelizes a bunch of the things 00:20:17.840 |
that you had to do sequentially by hand as a human 00:20:36.320 |
and the whole trajectory is just broken and dead 00:20:39.280 |
So then those are the ones that the human then goes 00:20:41.400 |
and solves and so then they become a troubleshooter. 00:20:47.880 |
I think the second piece of it that we've found 00:21:01.120 |
but two, actually, if you're framing yourself 00:21:07.500 |
where you're solving tasks that are a little too hard 00:21:11.740 |
and still needs a human to provide oversight, 00:21:15.100 |
provide clarifications, provide human feedback 00:21:19.340 |
That's how you actually learn from the smartest humans 00:21:23.220 |
and so I actually think that being an augmentation company 00:21:25.980 |
forces you to go develop your core AI capabilities faster 00:21:45.020 |
and just they cannot attract the talent to do it 00:21:49.260 |
you have Copilot, which is the augmentation product 00:21:51.740 |
and then you have Sweep.dev, any of these products, 00:21:59.580 |
I agree that today, the reliability's so important 00:22:02.500 |
in the enterprise that they just don't use most of them. 00:22:13.180 |
they do Persimmon, they do Fuyu, they do all these-- 00:22:16.660 |
- It's just public stuff and so I think you're gonna find, 00:22:19.580 |
so one of the things we haven't shared before 00:22:25.340 |
- Sold out of bandwidth to go onboard more customers. 00:22:27.740 |
I think we're like working really hard to go, 00:22:35.740 |
I think we're gonna be significantly more public 00:22:43.100 |
So I think that clarification will happen by default. 00:22:53.100 |
towards being more open or releasing more things. 00:22:56.700 |
- I think we just flipped over that way fairly recently. 00:23:01.180 |
I think it actually boils down to two things. 00:23:03.300 |
The public narrative is really forming around agents 00:23:08.540 |
because when we started the company in January, 2022, 00:23:15.340 |
But like the general public had no conception 00:23:19.660 |
on the tree of like, everything's a chatbot, right? 00:23:24.540 |
I think one of the things that I really care about 00:23:33.500 |
Things that make a function call are being called agents. 00:23:35.260 |
Like to me, an agent is something that you can give a goal 00:23:46.660 |
to be more aware of Adept as they think about 00:23:48.220 |
what the next thing they wanna do in their careers. 00:23:56.580 |
And I think a huge amount of gain is gonna happen 00:24:06.620 |
And I think people who wanna do agents research 00:24:31.580 |
- And before that even, I think there was something 00:24:39.020 |
I'm seeing AI startups that used to just brand themselves 00:24:41.140 |
as an AI company now brand themselves as an AI agent company. 00:24:49.820 |
where like, I would not touch any agent startups 00:24:56.020 |
- I think a lot of VCs that are maybe less technical 00:25:03.620 |
- No, no, I think like the, what is possible today 00:25:06.380 |
and like what is worth investing in, you know? 00:25:08.420 |
And I think like, I mean, people look at you and say, 00:25:17.260 |
"that is like tacking on AI to an existing thing, 00:25:21.440 |
"and kind of get some of the flag wheel going." 00:25:28.820 |
Like, sometimes we look around and it's like, 00:25:42.220 |
is there's a new agent and company popping up every day. 00:25:46.740 |
But like I have advised people to take agents 00:25:59.340 |
so like, you know, you're a portfolio allocator. 00:26:06.700 |
Can you take us through like how you think about 00:26:08.660 |
that evolution of that and what people should think about 00:26:11.700 |
what that means for adept sort of research directions? 00:26:14.980 |
- The critical path for adept is we want to build 00:26:22.100 |
all while keeping an insanely high reliability standard. 00:26:30.380 |
reliability standard but are continuing pushing 00:26:32.100 |
a level of abstraction, you then learn from your users 00:26:34.420 |
how to get that next level of abstraction faster. 00:26:36.180 |
So that's how you actually build the data flow. 00:26:41.780 |
So if you go zoom way, way back to ACT-1 days, right? 00:26:57.780 |
when you give it various different workflows and text. 00:27:10.260 |
to get a lot better at having some specification 00:27:14.060 |
of some guardrails for what it actually should be doing. 00:27:22.420 |
that are really good at understanding knowledge work 00:27:29.620 |
And so like, back then we had to do a ton of research, 00:27:32.620 |
basically, on how do we actually make that possible? 00:27:46.860 |
I think one big hangover from primarily academic focus 00:28:03.020 |
Like, it's like, it's really helped the field, right? 00:28:06.980 |
I actually think like, like, it's really clear today, 00:28:09.340 |
multimodal models are the default foundation model, right? 00:28:13.300 |
Like, why wouldn't you just train a giant multimodal model? 00:28:18.020 |
like, where are they gonna be the most useful? 00:28:19.420 |
They're gonna be most useful in knowledge work tasks. 00:28:21.600 |
That's where the majority economic value is gonna be. 00:28:25.420 |
And so if that's what it is, what do you need to train? 00:28:27.580 |
I need to train on like charts, graphs, tables, invoices, 00:28:32.060 |
Like, that's just a totally different pre-training corpus. 00:28:35.380 |
And so at Adept, we spent a lot of time building that. 00:28:37.900 |
And so, like, the public Fuyu stuff 00:28:54.540 |
So that's kind of like some of the modeling side. 00:28:56.540 |
We've kind of only announced some of that stuff. 00:28:58.060 |
We haven't really announced much of the agents work. 00:29:04.940 |
and I think the product form factor also really matters. 00:29:09.620 |
and you guys probably see this a little bit more than I do, 00:29:11.600 |
but like we're seeing like a little bit of a pushback 00:29:15.220 |
against like the tyranny of chatbots as form factor. 00:29:18.420 |
And I think that the reason why the form factor matters 00:29:21.180 |
is the form factor changes what data you collect 00:29:24.500 |
And so I think we've spent a lot of time doing full 00:29:27.660 |
of like vertical integration of all these bits 00:29:33.020 |
I'll plug Amelia Wattenberger's talk at our conference 00:29:38.740 |
behind like what else exists other than chatbots 00:29:41.260 |
that if you could delegate to reliable agents, 00:29:46.500 |
I mean, so I was kind of excited at Adept Experiments 00:29:50.740 |
I don't know what the official name for it is. 00:29:52.900 |
I was like, okay, like this is something I can use, 00:29:55.500 |
but it seems like it's just an experiment for now. 00:29:59.240 |
So we just use experiments as like a way to go push 00:30:01.400 |
various ideas on the design side to some people 00:30:06.640 |
And actually the experiments code base underpins 00:30:18.600 |
for us to go deploy arbitrary cards on the side. 00:30:24.280 |
I would love to talk about the interaction layer. 00:30:32.160 |
I think there were some rumors about OpenAI building agents 00:30:35.160 |
that are kind of like they manage the end point. 00:30:41.360 |
Like, and I know I read in one of your papers, 00:30:46.120 |
kind of like you don't just take the DOM and act on it. 00:30:50.320 |
How do you think about the best way the models will interact 00:30:53.520 |
with the software and like how the development of products 00:30:58.320 |
as more and more of the work is done by agents 00:31:02.720 |
And it's actually one of the things I'm really excited about. 00:31:06.360 |
I've spent most of my time doing research stuff, 00:31:10.520 |
that I've been learning about and I find it really cool. 00:31:18.640 |
to why Adept is pursuing a path of being able to just use 00:31:34.320 |
Like in a world where you had T equals infinity, right? 00:31:37.520 |
You're probably gonna have various different form factors 00:31:39.400 |
that robots could just be in and like all the specialization 00:31:42.320 |
but the fact is that humans live in a human environment. 00:31:44.560 |
So having a humanoid robot lets you do things that humans do 00:31:51.680 |
Like if you go itemize out the number of things 00:31:59.240 |
those numbers of workflows add up pretty close to zero. 00:32:03.640 |
you need the ability to actually control your computer 00:32:06.340 |
It also lets you learn from human usage of computers 00:32:09.240 |
as a source of training data that you don't get 00:32:20.880 |
I think because it's the most practical path, 00:32:31.160 |
of the agent interaction layer level is a little bit like, 00:32:34.400 |
do y'all remember Windows 3.1, like those days? 00:32:38.080 |
Okay, I might be too old for you guys on this, 00:32:41.320 |
but like back in the day, Windows 3.1, right? 00:32:59.000 |
and then it would give you the C colon slash thing, 00:33:24.000 |
as more and more trust is built towards agents, 00:33:25.800 |
and more and more things can be done by agents, 00:33:27.720 |
and more UIs for agents are actually generative 00:33:31.240 |
then that just becomes a standard interaction layer. 00:33:33.800 |
And if that becomes a standard interaction layer, 00:33:40.680 |
or like certain customized workflow execution engines. 00:34:06.560 |
what using Rabbit in real life will actually be like 00:34:18.320 |
The agent knows how to break your goal down into steps. 00:34:20.320 |
The agent knows how to use the underlying software 00:34:22.600 |
and systems of record to achieve that goal for you. 00:34:24.880 |
The agent maybe presents you information in a custom way 00:34:27.880 |
that's only relevant to your particular goal. 00:34:32.440 |
where you don't really need to ever interface 00:34:36.840 |
unless you're a power user for some niche thing. 00:34:54.100 |
And, you know, the primary split in self-driving 00:34:58.800 |
And I feel like most agent companies that I'm tracking 00:35:03.080 |
are all moving towards camera approach, which is like-- 00:35:06.600 |
- The non-multimodal vision, very, very heavy vision. 00:35:11.520 |
you're focusing on that, including charts and tables and-- 00:35:16.840 |
- Do you find like inspiration there from like, 00:35:23.880 |
I think sometimes the most useful inspiration 00:35:26.240 |
I've found from self-driving is the levels analogy. 00:35:31.240 |
And I think that's great. - Level one to five. 00:35:34.280 |
But I think that our number one goal is for agents 00:35:42.320 |
that you just have to bang your head at for a long time 00:35:47.940 |
which is basically what's happened in self-driving. 00:35:51.520 |
where you have the data flywheel immediately, 00:35:53.720 |
and that takes you all the way up to the top. 00:35:55.600 |
But similarly, I mean, like compared to self-driving, 00:35:58.240 |
like two things that people really undervalue 00:36:03.080 |
driving a car down highway 101 on a sunny day demo, right? 00:36:06.680 |
Like that actually doesn't prove anything anymore. 00:36:13.920 |
I think one of the things that we believe really strongly 00:36:26.720 |
get a lot of reliability is like a really strong focus 00:36:29.820 |
on like actually why does the model not do this thing? 00:36:33.800 |
the time the model doesn't actually do the thing 00:36:36.100 |
is because if you're Wizard of Oz-ing it yourself, 00:36:38.260 |
or if you have unreliable actuators, you can't do the thing. 00:36:41.580 |
And so we've had to fix a lot of those problems. 00:36:51.000 |
as the most, I guess, real case of agents that we have, 00:37:00.000 |
but it has taken a long time for self-driving to mature. 00:37:05.280 |
and the 101, the driving down 101 on a sunny day moment 00:37:11.560 |
- And then, you know, Cruise, you know, RIP recently. 00:37:11.560 |
that I'm curious to get your commentary on is there's, 00:37:30.780 |
into just general like sort of production readiness 00:37:34.180 |
'Cause you have reliability, you also have cost, 00:37:39.740 |
All of that, 00:37:44.520 |
the tendency or the temptation, is to reduce generality 00:37:47.520 |
to improve reliability and to improve cost, improve speed. 00:38:05.840 |
And I think the way you get there is basically like, 00:38:09.320 |
how do you frame the fundamental agent problem 00:38:11.920 |
in a way that just continues to benefit from data? 00:38:15.320 |
And I think that, I think like one of the main ways 00:38:19.200 |
of like being able to solve that particular trade-off 00:38:21.640 |
is like, you basically just want to formulate the problem 00:38:34.680 |
okay, are you overfitting on these end use cases, right? 00:38:44.200 |
- I mean, so then the question becomes kind of, 00:39:03.280 |
Like that is just, you have a good base model 00:39:10.680 |
I think there's like two paths to a lot more capability 00:39:19.120 |
I think one path is you figure out how to spend, 00:39:23.840 |
I think the other path, and so like in that path, right, 00:39:26.280 |
I consider search, RL, all the things that we all, 00:39:29.920 |
that we all love in this era as part of that path, 00:39:34.740 |
The second path is how do you get like super competent, 00:39:39.740 |
high intelligence demonstrations from humans. 00:39:48.700 |
Like the first one gives you maximum sample efficiency 00:40:06.000 |
Probably this is a bit of a, too much of a trend right now, 00:40:17.740 |
as being able to help people do things on computers 00:40:25.420 |
- You could do a lot of stuff when you have an environment. 00:40:28.660 |
- We were having dinner for our one year anniversary. 00:40:34.900 |
and we mentioned you were coming on the pod with, 00:40:39.560 |
- Yeah, this is our first, I guess, like mailbag question. 00:40:42.480 |
He asked, when you started, GPT-4 didn't exist, 00:40:48.420 |
which can now help you build a lot of those things. 00:40:51.580 |
How do you think about the things that are unique to you 00:40:56.980 |
maybe research direction that you want to take the team 00:40:59.500 |
and what you want people to come work on at Adept 00:41:05.020 |
that you didn't expect everybody would have access to? 00:41:12.700 |
so he can push back on my assumption about his question. 00:41:15.900 |
But I think implicit in that question is like, 00:41:26.120 |
And maybe part of the assumption is that advantage accrues 00:41:41.700 |
that is much more than that of the base model itself. 00:41:45.220 |
And so I think like that is like always gonna be 00:42:00.880 |
So like we're allocating our capacity wisely, 00:42:10.600 |
in the broader foundation modeling space is like, 00:42:17.520 |
like how good is agents as like a startup area, 00:42:22.680 |
I feel super good that we're doing foundation models 00:42:28.260 |
within Adept is flowing from, can we make a better agent? 00:42:34.500 |
if you're training on publicly available web data, 00:42:37.500 |
you put in the flops and you do reasonable things, 00:42:41.780 |
And if you just double the amount of compute, 00:42:45.340 |
And so like, I think pure play foundation model companies 00:43:01.800 |
behind just training these base foundation models. 00:43:04.200 |
I think it's gonna commoditize a lot of the regular LLMs 00:43:10.680 |
So I feel really good that we're just focused on agents. 00:43:16.560 |
- No, because if we were a pure play foundation model 00:43:18.200 |
company, we would be training general foundation models 00:43:26.800 |
- Yeah, and our business is an agent business. 00:43:36.380 |
- It's like, if you have a particular area of specialty, 00:43:43.940 |
everyone's just scaling to ridiculous levels of compute. 00:43:48.500 |
I find that, I think it's gonna be a little tougher. 00:43:59.420 |
Figure is like a big, also sort of OpenAI-affiliated 00:44:05.260 |
I think, I mean, I don't know exactly what they're doing, 00:44:13.860 |
- What question would you ask if we had them on? 00:44:16.840 |
- Oh, I just wanna understand what their overall strategy 00:44:19.020 |
is gonna be between now and when there's reliable stuff 00:44:22.940 |
But honestly, I just don't know enough about it. 00:44:24.500 |
- And if I told you, hey, fire your entire workforce, 00:44:28.060 |
warehouse workforce, and put robots in there. 00:44:42.520 |
But I think like, look, I think there's two things. 00:44:52.060 |
Like, it's just, I think it's just gonna work. 00:44:59.260 |
like, we've been on this podcast just continually saying, 00:45:08.380 |
And then you figure out everything else you have to do 00:45:10.580 |
in order to teach you how to solve new problems. 00:45:29.380 |
- We had Kanjun from Imbue on the podcast. 00:45:34.860 |
- Yeah, we asked her why people should go work there 00:45:48.080 |
And she said, they're really excited about building 00:45:52.160 |
And for her, the biggest research thing was, like, 00:46:01.360 |
why should people be excited to come work at Adept 00:46:04.920 |
And maybe what are, like, the core research questions 00:46:16.620 |
but, like, the AI space, to the extent there's an AI space 00:46:21.140 |
and the AI agent space are both, like, exactly, 00:46:24.580 |
as she likely said, like, I think colossal opportunities 00:46:28.340 |
and, like, people are just gonna end up winning 00:46:31.380 |
in different areas and people are all just gonna, 00:46:35.420 |
So I really don't feel that zero-sum thing at all. 00:46:37.980 |
I would say, like, to, like, change the zero-sum framing 00:46:43.400 |
I think there's two huge reasons to be at Adept. 00:46:46.360 |
I think one of them is, like, everything we do 00:46:53.600 |
Like, we do a lot of research in service of that goal, 00:47:00.480 |
And I think the second reason to work at ADAPT 00:47:02.840 |
is if you believe that actually having customers 00:47:06.960 |
lets you build AGI faster, which we really believe, 00:47:11.260 |
And I think the examples for why that's true is, like, 00:47:20.100 |
They're, like, okay, like, we have a customer 00:47:21.940 |
that really needs us to do these particular things. 00:47:33.560 |
and the new ones that even are not saturated, 00:47:43.180 |
Like, we're equally excited about the same problems 00:47:45.980 |
around reasoning and planning and generalization 00:47:52.940 |
they're very grounded in actual needs right now, 00:48:00.060 |
but, you know, I would just leave it kind of open to you. 00:48:15.180 |
- Wow, okay, so Amelia's already made the rant 00:48:17.680 |
better than I have, but, like, not just chatbots 00:48:23.120 |
Rant two is, like, AI's really been the story of compute 00:48:32.560 |
And I think as much as our research community 00:48:37.560 |
is really smart, like, we have made many, many advancements, 00:48:43.480 |
but, like, now I think the game is increasingly changing, 00:48:47.160 |
and, like, the rapid industrialization era has begun, 00:48:52.160 |
and I think we, unfortunately, have to embrace it. 00:48:55.520 |
- Awesome, David, thank you so much for your time.