[00:00:00.000 --> 00:00:02.580] (upbeat music) [00:00:02.580 --> 00:00:07.740] - Hello, hello. [00:00:07.740 --> 00:00:09.920] This is swyx back again with part two [00:00:09.920 --> 00:00:10.960] of our NeurIPS coverage. [00:00:10.960 --> 00:00:13.060] This time we're gonna cover startups [00:00:13.060 --> 00:00:15.000] and it's a special episode [00:00:15.000 --> 00:00:17.200] because this is the last episode of 2023. [00:00:17.200 --> 00:00:19.820] We are definitely looking back at the year [00:00:19.820 --> 00:00:21.800] with rose-colored glasses. [00:00:21.800 --> 00:00:23.000] This has been a fantastic year. [00:00:23.000 --> 00:00:25.280] We only started this podcast in February [00:00:25.280 --> 00:00:26.840] and it's grown so much. [00:00:26.840 --> 00:00:28.520] Thanks to all of you who've listened [00:00:28.520 --> 00:00:30.720] and given feedback and shared it with your friends. [00:00:30.720 --> 00:00:33.320] And we actually managed to invite a few [00:00:33.320 --> 00:00:35.560] of our former guests back on the pod [00:00:35.560 --> 00:00:36.480] together with some new friends [00:00:36.480 --> 00:00:37.520] and probably some new voices [00:00:37.520 --> 00:00:38.840] that you're gonna be hearing next year. [00:00:38.840 --> 00:00:42.160] So this is not a hard-hitting interview series. [00:00:42.160 --> 00:00:44.480] You know, it's not that kind of interview. [00:00:44.480 --> 00:00:47.960] It's not that kind of podcast where we try to go too deep. [00:00:47.960 --> 00:00:49.440] Today we're just gonna go broad [00:00:49.440 --> 00:00:52.000] and we're just gonna check in on a bunch of startups [00:00:52.000 --> 00:00:54.680] that we like and monitor and that were present at NeurIPS. [00:00:54.680 --> 00:00:57.560] So first up is Jonathan Frankle of MosaicML. [00:00:57.560 --> 00:01:01.840] We last talked to him in May for the MPT-7B episode. [00:01:01.840 --> 00:01:03.280] That's episode 13. [00:01:03.280 --> 00:01:04.920] And I have to say that was one of the best performing [00:01:04.920 --> 00:01:05.880] episodes of the whole year. [00:01:05.880 --> 00:01:07.920] So you're welcome to go back and listen to that [00:01:07.920 --> 00:01:09.040] if you missed it. [00:01:09.040 --> 00:01:10.680] And since then they were bought by Databricks [00:01:10.680 --> 00:01:12.120] for $1.3 billion. [00:01:12.120 --> 00:01:14.360] And actually during the interview, [00:01:14.360 --> 00:01:16.080] they were in the process of getting acquired. [00:01:16.080 --> 00:01:17.440] They just couldn't say anything about it, [00:01:17.440 --> 00:01:19.520] but it's definitely one of the biggest news stories of the year. [00:01:19.520 --> 00:01:21.320] And you can listen to what it's like [00:01:21.320 --> 00:01:23.440] or what's going through Jonathan's mind back then [00:01:23.440 --> 00:01:26.520] as well as now today, six months later. [00:01:26.520 --> 00:01:28.280] - Hey Jonathan, welcome back to the pod. [00:01:28.280 --> 00:01:29.120] - Thank you so much. [00:01:29.120 --> 00:01:30.880] This is an interesting place to have the pod [00:01:30.880 --> 00:01:33.640] under the overpass of interstate whatever it is. [00:01:33.640 --> 00:01:36.520] - Yeah, interstate whatever in the city of New Orleans. [00:01:36.520 --> 00:01:37.840] Yeah, it's really good to see you. [00:01:37.840 --> 00:01:41.560] Since you were last on the pod, Mosaic got acquired. [00:01:41.560 --> 00:01:42.400] - Yeah, thank you. [00:01:42.400 --> 00:01:44.920] I think you really deserve all the credit for this.
[00:01:44.920 --> 00:01:46.560] - No, you guys were sitting on that news [00:01:46.560 --> 00:01:49.040] and we didn't know what was gonna happen. [00:01:49.040 --> 00:01:52.040] But I did come away from your interview [00:01:52.040 --> 00:01:53.520] with a very, very high impression of like, [00:01:53.520 --> 00:01:54.960] you guys are in a perfect place, perfect time [00:01:54.960 --> 00:01:58.560] and it makes a lot of sense to join forces with Databricks. [00:01:58.560 --> 00:02:00.440] - Yeah, they're kind of, I mean, [00:02:00.440 --> 00:02:03.080] I will say we really didn't want to get acquired. [00:02:03.080 --> 00:02:04.120] - You did not? [00:02:04.120 --> 00:02:06.720] - We didn't, I mean, we loved being independent, [00:02:06.720 --> 00:02:08.840] like we loved doing our own thing, [00:02:08.840 --> 00:02:10.480] but this just made too much sense. [00:02:10.480 --> 00:02:15.120] Like, you know, they do data, we do LLMs, [00:02:15.120 --> 00:02:17.920] we both do enterprises, we're all a bunch of academics. [00:02:17.920 --> 00:02:19.680] Like it was just kind of, [00:02:19.680 --> 00:02:21.040] we couldn't think of a better match. [00:02:21.040 --> 00:02:24.000] And so it just, we kind of came to the conclusion like, [00:02:24.000 --> 00:02:27.040] okay, I guess we can't not do this, like it's too perfect. [00:02:27.040 --> 00:02:29.480] - Yeah, yeah, and you've done a bunch of other podcasts [00:02:29.480 --> 00:02:31.840] on the acquisition, so I don't, we don't need to retread, [00:02:31.840 --> 00:02:32.720] I'll send people that way. [00:02:32.720 --> 00:02:34.560] Just like, what's new in Mosaic World? [00:02:34.560 --> 00:02:37.360] - In Mosaic World, honestly, like we're just cooking. [00:02:37.360 --> 00:02:39.680] I think we've been a little quiet lately, [00:02:39.680 --> 00:02:41.520] or at least we look quiet from the outside. [00:02:41.520 --> 00:02:43.120] It is certainly not that we haven't been busy [00:02:43.120 --> 00:02:44.760] and it's certainly not that, you know, [00:02:44.760 --> 00:02:45.960] we're not doing cool stuff. [00:02:45.960 --> 00:02:47.960] Part of it is that, you know, getting acquired, [00:02:47.960 --> 00:02:49.360] there's a bit of administrivia involved. [00:02:49.360 --> 00:02:51.200] You know, we had to go through new employee orientation, [00:02:51.200 --> 00:02:53.160] get health insurance, you know, [00:02:53.160 --> 00:02:55.440] meet our amazing new colleagues. [00:02:55.440 --> 00:02:56.480] Part of it is like, you know, [00:02:56.480 --> 00:02:58.160] the field has moved toward bigger stuff [00:02:58.160 --> 00:03:00.320] and we've moved toward bigger stuff. [00:03:00.320 --> 00:03:02.040] So I think we'll have some exciting stuff [00:03:02.040 --> 00:03:02.920] to talk about soon, [00:03:02.920 --> 00:03:05.880] but my philosophy is always like, speak through the work. [00:03:05.880 --> 00:03:07.200] So I don't wanna hype, I don't wanna like, [00:03:07.200 --> 00:03:08.860] get people excited, you know. [00:03:08.860 --> 00:03:10.360] You'll see the work and you judge for yourself. [00:03:10.360 --> 00:03:12.000] - Yeah, you talk about the industry [00:03:12.000 --> 00:03:13.120] moving towards bigger stuff. [00:03:13.120 --> 00:03:14.760] What trends are notable to you [00:03:14.760 --> 00:03:16.360] in the, let's say, second half of this year? [00:03:16.360 --> 00:03:18.240] - Everybody's figured out how to build LLMs. 
[00:03:18.240 --> 00:03:21.440] Like, it's no longer a coveted skill of, you know, [00:03:21.440 --> 00:03:22.280] a handful of people, [00:03:22.280 --> 00:03:24.240] but now we've all become LLM builders. [00:03:24.240 --> 00:03:26.560] The field has kind of narrowed in aperture again. [00:03:26.560 --> 00:03:28.440] And, you know, and yesterday when we were all figuring out [00:03:28.440 --> 00:03:30.220] how to train ImageNet, you know, [00:03:30.220 --> 00:03:32.160] now we're all figuring out how to build really big, [00:03:32.160 --> 00:03:33.400] really powerful models. [00:03:33.400 --> 00:03:36.400] And like, that's now just an assumed skill. [00:03:36.400 --> 00:03:38.480] The rest is kind of, what do you do with that skill? [00:03:38.480 --> 00:03:39.320] How do you build a product? [00:03:39.320 --> 00:03:40.140] How do you differentiate? [00:03:40.140 --> 00:03:41.000] What cool thing can you do [00:03:41.000 --> 00:03:42.740] that's different from everybody else? [00:03:42.740 --> 00:03:45.400] That's gonna determine kind of, you know, [00:03:45.400 --> 00:03:47.160] what 2024 is gonna be like. [00:03:47.160 --> 00:03:49.240] - Yeah, I guess, like, a lot of people are banking [00:03:49.240 --> 00:03:51.160] on multi-modal being, like, [00:03:51.160 --> 00:03:53.520] well, 2024 being the year of multi-modal LLMs. [00:03:53.520 --> 00:03:56.540] I feel like that's a little bit too broad a brush. [00:03:56.540 --> 00:03:59.500] I don't know, like, what's valuable on that front? [00:03:59.500 --> 00:04:02.120] - I mean, so multi-modal is gonna be a huge deal. [00:04:02.120 --> 00:04:04.160] Like, it's, but it's already a huge deal. [00:04:04.160 --> 00:04:05.520] Like, we can make multi-modal models. [00:04:05.520 --> 00:04:07.780] - The LLaVA paper author I also interviewed on this pod. [00:04:07.780 --> 00:04:09.160] - Yeah, like, LLaVA's amazing. [00:04:09.160 --> 00:04:11.320] Like, you know, I've been playing with it a bunch personally. [00:04:11.320 --> 00:04:12.520] It's awesome. [00:04:12.520 --> 00:04:14.360] And we've got Bard, and we've got Gemini, [00:04:14.360 --> 00:04:16.600] and we've got GPT-4V, and, you know, [00:04:16.600 --> 00:04:17.800] I'm sure there are gonna be plenty more [00:04:17.800 --> 00:04:19.340] where that came from. [00:04:19.340 --> 00:04:21.540] I think the question is, as with all good things, you know, [00:04:21.540 --> 00:04:24.720] cool promise is different than, like, delivering value. [00:04:24.720 --> 00:04:25.560] - Yeah. [00:04:25.560 --> 00:04:27.040] - And I'm really curious, like, you know, [00:04:27.040 --> 00:04:29.200] whether people genuinely do use this [00:04:29.200 --> 00:04:30.280] in real production settings, [00:04:30.280 --> 00:04:32.200] in the settings that will actually pay off [00:04:32.200 --> 00:04:33.520] the huge investment that's made [00:04:33.520 --> 00:04:34.960] to build these multi-modal models? [00:04:34.960 --> 00:04:35.800] - Right. [00:04:35.800 --> 00:04:36.620] - I'm also kind of curious, like, [00:04:36.620 --> 00:04:37.960] are we gonna start to see some big [00:04:37.960 --> 00:04:39.480] open-source multi-modal models? [00:04:39.480 --> 00:04:41.440] Like, you know, we've got LLaVA. [00:04:41.440 --> 00:04:42.960] It's moving in the right direction.
[00:04:42.960 --> 00:04:44.280] But, like, is somebody gonna, you know, [00:04:44.280 --> 00:04:46.840] build something that looks a lot like GPT-4V, [00:04:46.840 --> 00:04:48.040] or something on that trajectory, [00:04:48.040 --> 00:04:50.600] and kind of start another arms race in that direction? [00:04:50.600 --> 00:04:52.120] Like, it'll be interesting to see. [00:04:52.120 --> 00:04:53.560] I'm honestly pretty curious, [00:04:53.560 --> 00:04:55.080] and I'm watching with bated breath [00:04:55.080 --> 00:04:56.040] for what everybody does. [00:04:56.040 --> 00:04:58.480] - Yeah, well, I think in our chat earlier today, [00:04:58.480 --> 00:05:01.520] you said, you know, we kind of live in a diverse world [00:05:01.520 --> 00:05:04.160] where, like, every company has kind of found its niche, [00:05:04.160 --> 00:05:07.400] maybe, if you wanna go through that logic. [00:05:07.400 --> 00:05:10.400] - Yeah, I'm kind of, like, I think there are, you know, [00:05:10.400 --> 00:05:12.280] there are the optimistic and pessimistic scenarios [00:05:12.280 --> 00:05:13.120] for where we go. [00:05:13.120 --> 00:05:14.800] Like, you know, I don't know. [00:05:14.800 --> 00:05:16.200] I kind of think there's a boring scenario [00:05:16.200 --> 00:05:18.480] where everybody basically is building these giant LLMs, [00:05:18.480 --> 00:05:20.600] and maybe language, you know, image models, [00:05:20.600 --> 00:05:22.760] or what have you, and they're all kind of the same. [00:05:22.760 --> 00:05:24.280] It's just you've got the Google version, [00:05:24.280 --> 00:05:26.360] and the OpenAI version, and the Amazon version, [00:05:26.360 --> 00:05:28.640] and it almost feels like cloud providers in some sense. [00:05:28.640 --> 00:05:32.080] Like, you know, what distinguishes AWS from GCP? [00:05:32.080 --> 00:05:33.640] It's kind of, you know, where you are. [00:05:33.640 --> 00:05:35.520] - Slightly different consoles and configurations. [00:05:35.520 --> 00:05:36.720] - Yeah, it's like different interface, [00:05:36.720 --> 00:05:37.560] and maybe you prefer one, [00:05:37.560 --> 00:05:39.440] or maybe, like, you've been using one for a while, [00:05:39.440 --> 00:05:40.760] and, like, you're used to it, [00:05:40.760 --> 00:05:42.640] or your IT person really likes this one, [00:05:42.640 --> 00:05:44.640] 'cause, you know, they used to work at that company, [00:05:44.640 --> 00:05:46.080] or what have you. [00:05:46.080 --> 00:05:47.320] That would be a pretty boring world, [00:05:47.320 --> 00:05:49.840] but I think that's unlikely to be the case. [00:05:49.840 --> 00:05:51.800] I'm kind of, I'm looking at, like, you know, [00:05:51.800 --> 00:05:54.160] all the cool stuff coming out of Gemini, [00:05:54.160 --> 00:05:56.360] you know, all the cool stuff coming out of OpenAI, [00:05:56.360 --> 00:05:57.720] and then, like, I'm looking at Adobe. [00:05:57.720 --> 00:05:59.200] Like, they're building-- - Firefly, really? [00:05:59.200 --> 00:06:01.400] - Firefly, they're building, like, a different model [00:06:01.400 --> 00:06:02.800] with a creative perspective. [00:06:02.800 --> 00:06:04.600] Like, I'm kind of looking at this and going, [00:06:04.600 --> 00:06:06.400] maybe we'll have a wide diversity of models, [00:06:06.400 --> 00:06:09.200] and everybody will be building models, [00:06:09.200 --> 00:06:11.240] like, just by virtue of the fact that we need so much data [00:06:11.240 --> 00:06:12.800] to build any of these models. 
[00:06:12.800 --> 00:06:14.200] Everybody's gonna play to their strengths, [00:06:14.200 --> 00:06:16.040] and, you know, use every resource [00:06:16.040 --> 00:06:16.920] they have at their disposal, [00:06:16.920 --> 00:06:19.880] and Google has, you know, they have YouTube. [00:06:19.880 --> 00:06:20.720] I don't know if they're using it, [00:06:20.720 --> 00:06:22.400] but, like, that's a cool resource. [00:06:22.400 --> 00:06:25.200] OpenAI has put a ton of energy into text data. [00:06:25.200 --> 00:06:27.160] Adobe, like, gets creative people, [00:06:27.160 --> 00:06:28.500] and, like, there are a few other companies [00:06:28.500 --> 00:06:29.680] where that came from. [00:06:29.680 --> 00:06:32.160] So I'm kind of, like, I'm honestly curious [00:06:32.160 --> 00:06:33.440] if we're gonna just see, like, [00:06:33.440 --> 00:06:35.320] really different models for different people, [00:06:35.320 --> 00:06:37.560] and, I don't know, that's a pretty cool world to live in. [00:06:37.560 --> 00:06:39.480] Like, we won't see this arms race, [00:06:39.480 --> 00:06:41.120] we'll just kind of see, like, diversity. [00:06:41.120 --> 00:06:42.960] >> Yeah, and we shouldn't forget Bloomberg, [00:06:42.960 --> 00:06:44.800] which teased Bloomberg GPT, [00:06:44.800 --> 00:06:46.480] but that's a source of significant tokens, [00:06:46.480 --> 00:06:47.320] the financial world. [00:06:47.320 --> 00:06:49.000] >> Yeah, yeah, like, it's, I mean, [00:06:49.000 --> 00:06:51.560] my whole business on the Mosaic and Databricks side [00:06:51.560 --> 00:06:53.960] is, you know, helping people leverage the data they have, [00:06:53.960 --> 00:06:56.400] so I'm kind of, I'm excited about a world of diversity, [00:06:56.400 --> 00:06:58.620] because, you know, it's, not only do we have, like, [00:06:58.620 --> 00:07:01.760] these crazy diverse foundation models at the largest scales, [00:07:01.760 --> 00:07:03.560] but everybody embraces whatever they have. [00:07:03.560 --> 00:07:05.320] Like, our friends at Repl.it do a code model, [00:07:05.320 --> 00:07:07.200] and, you know, I don't know, [00:07:07.200 --> 00:07:08.680] Bloomberg does another finance model, [00:07:08.680 --> 00:07:10.560] and, like, somebody does a healthcare model, [00:07:10.560 --> 00:07:12.760] and, like, everybody draws on their strengths, [00:07:12.760 --> 00:07:14.200] and that's a cool world. [00:07:14.200 --> 00:07:16.920] >> Are you bullish every company training their own model? [00:07:16.920 --> 00:07:19.200] Sorry, that's a stupid question to ask you. [00:07:19.200 --> 00:07:20.720] (laughing) [00:07:20.720 --> 00:07:22.000] >> I mean, I think, you know, [00:07:22.000 --> 00:07:23.200] I'll give you the honest answer, [00:07:23.200 --> 00:07:24.440] 'cause I think it's, you know, [00:07:24.440 --> 00:07:26.360] the business Mosaic answer is, [00:07:26.360 --> 00:07:27.640] oh yeah, I'm super bullish, [00:07:27.640 --> 00:07:29.920] like, everybody should train their own model, [00:07:29.920 --> 00:07:31.560] come do it on Databricks right now. [00:07:31.560 --> 00:07:33.080] >> Like, you should start from a base model [00:07:33.080 --> 00:07:34.040] that everyone shares, right? [00:07:34.040 --> 00:07:35.120] Like, that's kind of useful. [00:07:35.120 --> 00:07:37.540] >> Maybe, or, like, you work your way up. 
[00:07:37.540 --> 00:07:39.960] I think it's, like, there's a journey [00:07:39.960 --> 00:07:41.600] with playing with any of these models [00:07:41.600 --> 00:07:43.200] that may or may not end with training your own, [00:07:43.200 --> 00:07:44.880] depending on where you go on that journey. [00:07:44.880 --> 00:07:46.640] Like, you start by playing with an API, [00:07:46.640 --> 00:07:48.080] and maybe you do some retrieval, [00:07:48.080 --> 00:07:49.720] and maybe you do some fine-tuning, [00:07:49.720 --> 00:07:51.120] and then maybe you build your own model. [00:07:51.120 --> 00:07:51.960] >> Yeah. [00:07:51.960 --> 00:07:52.780] >> But, like, it's a journey, [00:07:52.780 --> 00:07:55.520] and I think there's a destination many people will get to [00:07:55.520 --> 00:07:56.720] that involves training their own model. [00:07:56.720 --> 00:07:58.420] >> Yeah, totally. [00:07:58.420 --> 00:07:59.520] What other trends are going on [00:07:59.520 --> 00:08:01.680] that you're liking, or seeing, or hating? [00:08:01.680 --> 00:08:04.000] >> Honestly, you know, maybe this gets to the question [00:08:04.000 --> 00:08:06.480] of, like, you know, overall impressions of NeurIPS. [00:08:06.480 --> 00:08:09.840] Like, I thought this was a pretty garden-variety NeurIPS, [00:08:09.840 --> 00:08:11.920] in some sense, which feels weird to say [00:08:11.920 --> 00:08:13.800] in the age of, you know, ChatGPT, [00:08:13.800 --> 00:08:15.360] and everything else that's happened in the past year. [00:08:15.360 --> 00:08:16.200] >> Yeah. [00:08:16.200 --> 00:08:17.240] >> But this felt like the most normal conference [00:08:17.240 --> 00:08:19.640] I've had since, like, 2019. [00:08:19.640 --> 00:08:21.200] You know, I mean, we've had a pandemic in between [00:08:21.200 --> 00:08:24.680] and everything, but, like, the past couple years, [00:08:24.680 --> 00:08:26.000] actually, internally at Mosaic, [00:08:26.000 --> 00:08:27.880] I always do a long write-up of every conference [00:08:27.880 --> 00:08:28.820] and the trends I see. [00:08:28.820 --> 00:08:29.660] >> Okay. [00:08:29.660 --> 00:08:30.960] >> Like, some public, some that are more relevant [00:08:30.960 --> 00:08:33.320] to what we're doing, and, like, a lot of the write-ups [00:08:33.320 --> 00:08:34.480] I've done over the past year or two [00:08:34.480 --> 00:08:37.960] have been, like, all about, like, the unease. [00:08:37.960 --> 00:08:42.720] Sometimes it was just, like, my write-up for ICML 2022 [00:08:42.720 --> 00:08:45.720] was all about people capitulating to scale, [00:08:45.720 --> 00:08:48.320] and the five stages of grief, and, you know, [00:08:48.320 --> 00:08:49.560] how different people were responding. [00:08:49.560 --> 00:08:52.960] Academics, people at Google Brain, back when it existed, [00:08:52.960 --> 00:08:54.280] you know, all that stuff. [00:08:54.280 --> 00:08:57.240] And it almost looks quaint to think about [00:08:57.240 --> 00:08:59.440] that it was insightful to say people have capitulated [00:08:59.440 --> 00:09:03.860] to scale in this day and age where, you know, [00:09:03.860 --> 00:09:05.880] tens of billions of parameters looks mundane. [00:09:05.880 --> 00:09:06.720] >> Yeah. [00:09:06.720 --> 00:09:09.120] This kind of felt like, okay, the academics [00:09:09.120 --> 00:09:10.880] are trying to find their way forward. [00:09:10.880 --> 00:09:12.480] It's no longer just kind of coping and ignoring, [00:09:12.480 --> 00:09:14.320] but, like, trying to find their way forward.
[00:09:14.320 --> 00:09:16.960] The industry folks are doing their thing. [00:09:16.960 --> 00:09:19.080] A lot more people keeping secrets than used to, [00:09:19.080 --> 00:09:20.640] but it's still, like, you know, [00:09:20.640 --> 00:09:22.160] a lot of people also aren't keeping secrets [00:09:22.160 --> 00:09:23.560] and can talk about what they're doing still. [00:09:23.560 --> 00:09:24.400] >> Yeah. [00:09:24.400 --> 00:09:26.320] >> So it kind of felt like, you know, equilibrium. [00:09:26.320 --> 00:09:28.080] I don't know how long it'll last, [00:09:28.080 --> 00:09:30.120] but this was a lot less of a frantic [00:09:30.120 --> 00:09:32.660] and stressful conference than I think I'm used to, [00:09:32.660 --> 00:09:33.880] at least in the past couple years. [00:09:33.880 --> 00:09:36.040] You know, I'm in a new role, in some sense. [00:09:36.040 --> 00:09:36.960] I'm on the business side now. [00:09:36.960 --> 00:09:38.160] I'm on the industry side. [00:09:38.160 --> 00:09:39.000] >> Yeah. [00:09:39.000 --> 00:09:40.160] >> And I'm trying to find my own path. [00:09:40.160 --> 00:09:42.600] But I felt like a lot of us have changed roles [00:09:42.600 --> 00:09:45.200] in some sense as the past couple years have, [00:09:45.200 --> 00:09:47.400] you know, have taken place and everybody's moved around [00:09:47.400 --> 00:09:48.720] and figured out what they want to do. [00:09:48.720 --> 00:09:50.480] But we've all kind of found our place at this point. [00:09:50.480 --> 00:09:53.360] I feel like, you know, we may be in different places, [00:09:53.360 --> 00:09:56.280] but the ecosystem, the community has kind of sustained [00:09:56.280 --> 00:09:58.080] with, you know, a bunch of new PhD students [00:09:58.080 --> 00:09:59.560] and all that good stuff. [00:09:59.560 --> 00:10:00.400] >> Yeah, yeah. [00:10:00.400 --> 00:10:01.600] >> Like, it's kind of, you know, I don't know, [00:10:01.600 --> 00:10:03.000] it's nature healing in some sense [00:10:03.000 --> 00:10:06.320] from the insanity of the past couple of years. [00:10:06.320 --> 00:10:08.600] And a reminder that, you know, we're all kind of small pieces [00:10:08.600 --> 00:10:11.520] in a much bigger, you know, ecosystem and community. [00:10:11.520 --> 00:10:13.200] >> Yeah, and it's still growing though. [00:10:13.200 --> 00:10:15.160] Apparently the latest stats was something [00:10:15.160 --> 00:10:17.240] like 15,000 attendees this year. [00:10:17.240 --> 00:10:18.680] >> Oh my God. [00:10:18.680 --> 00:10:20.160] Oh my God. [00:10:20.160 --> 00:10:23.140] I will say one big difference, you know, [00:10:23.140 --> 00:10:24.760] in the time right before the pandemic, [00:10:24.760 --> 00:10:26.180] deep learning was getting so popular, [00:10:26.180 --> 00:10:28.960] the conferences would sell out the day registration opened. [00:10:28.960 --> 00:10:31.040] Like as a student, you'd have to rush to register [00:10:31.040 --> 00:10:32.600] or you wouldn't even get to go. [00:10:32.600 --> 00:10:33.640] That I don't think is happening anymore. [00:10:33.640 --> 00:10:34.480] >> This year's easier, yeah. [00:10:34.480 --> 00:10:36.760] And they're also live streaming stuff, you know, so. [00:10:36.760 --> 00:10:38.160] >> Yeah, but it's kind of interesting that like, [00:10:38.160 --> 00:10:39.960] I guess we've adjusted to the huge capacity [00:10:39.960 --> 00:10:42.800] and everything that's, you know, going on. 
[00:10:42.800 --> 00:10:44.900] But it's, you know, even so with it getting bigger, [00:10:44.900 --> 00:10:48.320] it didn't feel that different to be honest. [00:10:48.320 --> 00:10:49.600] Maybe it's just that I joined the community [00:10:49.600 --> 00:10:50.960] when things were already big. [00:10:50.960 --> 00:10:51.800] >> Yeah. [00:10:51.800 --> 00:10:52.640] >> But like, you know, there were some journalists here, [00:10:52.640 --> 00:10:54.360] some VCs here, but that's always been the case. [00:10:54.360 --> 00:10:55.360] >> Yeah, it's always been the case. [00:10:55.360 --> 00:10:59.280] You always have, you know, overrated, underrated papers. [00:10:59.280 --> 00:11:01.620] We will maybe save the overrated stuff for later, [00:11:01.620 --> 00:11:03.820] but any underrated stuff that you want to highlight [00:11:03.820 --> 00:11:05.440] from this year, it doesn't have to be at the conference, [00:11:05.440 --> 00:11:07.680] but just want to remind you for underrated papers [00:11:07.680 --> 00:11:08.800] that people should pay attention to. [00:11:08.800 --> 00:11:10.240] >> I'm going to flip this a different way. [00:11:10.240 --> 00:11:11.080] >> Okay. [00:11:11.080 --> 00:11:12.720] >> Because I'm not a fan of overrated or underrated [00:11:12.720 --> 00:11:15.680] and I'm not like, I'm not a fan of passing judgment on stuff. [00:11:15.680 --> 00:11:17.880] I just don't like, far be it from me, [00:11:17.880 --> 00:11:20.480] I like, one of my big gripes is like, [00:11:20.480 --> 00:11:21.920] we shouldn't have best paper awards. [00:11:21.920 --> 00:11:24.360] Like, and I say that having gotten one back in the day. [00:11:24.360 --> 00:11:26.100] So I feel like I have the ability to say that [00:11:26.100 --> 00:11:27.280] not just out of bitterness, [00:11:27.280 --> 00:11:29.560] but out of like recognition that it's dumb. [00:11:29.560 --> 00:11:30.400] >> Sure. [00:11:30.400 --> 00:11:32.420] >> But test of time though, test of time is great. [00:11:32.420 --> 00:11:33.380] >> Test of time is awesome. [00:11:33.380 --> 00:11:34.220] >> Yeah. [00:11:34.220 --> 00:11:36.300] >> You know, and you know, [00:11:36.300 --> 00:11:40.160] I look forward to everybody using lottery tickets in 2029. [00:11:40.160 --> 00:11:42.780] No, if you're working on lottery tickets, you know, [00:11:42.780 --> 00:11:44.340] there's a lot of other cool stuff out there, [00:11:44.340 --> 00:11:47.140] but I think it's really, I'll turn that question into like, [00:11:47.140 --> 00:11:49.540] what areas should academics be thinking about? [00:11:49.540 --> 00:11:51.340] I don't know, what would I work on as a PhD student right now [00:11:51.340 --> 00:11:52.940] or what would I recommend a student work on? [00:11:52.940 --> 00:11:54.920] And all the biggest questions in the field [00:11:54.920 --> 00:11:57.180] come down to how you measure and how you evaluate. [00:11:57.180 --> 00:11:59.420] Those are just such fundamental questions [00:11:59.420 --> 00:12:00.760] until we know how to measure things, [00:12:00.760 --> 00:12:02.720] until we know how to evaluate anything, [00:12:02.720 --> 00:12:04.120] you can't really even do any science. [00:12:04.120 --> 00:12:05.580] We don't know what we're even talking about. [00:12:05.580 --> 00:12:06.420] >> Yeah. [00:12:06.420 --> 00:12:08.440] >> And so I'm also thinking a lot about like synthetic data. 
[00:12:08.440 --> 00:12:10.620] Can we generate useful evaluation sets [00:12:10.620 --> 00:12:13.380] for all the little properties we want to find about an LLM? [00:12:13.380 --> 00:12:15.580] Creating data sets is really hard, [00:12:15.580 --> 00:12:17.180] but a model can help us do that. [00:12:17.180 --> 00:12:19.020] So I'm kind of curious, like, you know, [00:12:19.020 --> 00:12:22.300] can we bootstrap the evaluation process with synthetic data, [00:12:22.300 --> 00:12:24.740] figure out good ways to help ourselves build good data sets, [00:12:24.740 --> 00:12:26.420] and then, you know, from there, [00:12:26.420 --> 00:12:28.540] maybe we can start to really take a bite [00:12:28.540 --> 00:12:29.860] out of the evaluation questions [00:12:29.860 --> 00:12:31.700] and get moving on the actual science [00:12:31.700 --> 00:12:33.740] of understanding what's going on with these LLMs. [00:12:33.740 --> 00:12:37.060] All that seems very academically viable. [00:12:37.060 --> 00:12:38.820] None of those require huge amounts of compute. [00:12:38.820 --> 00:12:40.780] They require creativity, ingenuity, [00:12:40.780 --> 00:12:42.260] but that's an abundance in academia, [00:12:42.260 --> 00:12:43.460] even when compute isn't. [00:12:43.460 --> 00:12:46.760] >> Yeah, I would say that that's actually one thing [00:12:46.760 --> 00:12:48.980] I've had a big delta on for this year. [00:12:48.980 --> 00:12:50.060] >> Yeah, tell me more, I'm curious. [00:12:50.060 --> 00:12:51.860] >> Synthetic data, I always thought it was, [00:12:51.860 --> 00:12:53.940] you're just kind of sampling from a known distribution anyway [00:12:53.940 --> 00:12:55.100] that you know is imperfect [00:12:55.100 --> 00:12:56.900] and doesn't match human preferences. [00:12:57.740 --> 00:13:00.460] And it's Kanjun from Imbue [00:13:00.460 --> 00:13:01.860] that actually changed my mind on this. [00:13:01.860 --> 00:13:02.700] >> Oh, tell me more. [00:13:02.700 --> 00:13:04.500] That is a smart person you're talking to. [00:13:04.500 --> 00:13:05.860] >> She's like, you actually don't want [00:13:05.860 --> 00:13:06.860] to match human preferences. [00:13:06.860 --> 00:13:10.380] You want to spike it in different ways, in useful ways. [00:13:10.380 --> 00:13:13.260] And so you want to synthesize data in useful ways [00:13:13.260 --> 00:13:15.520] that don't necessarily match human preferences. [00:13:15.520 --> 00:13:17.100] And once she said that, I was like, oh, okay, [00:13:17.100 --> 00:13:20.340] I think I'm actually sold on this as a viable practice. [00:13:20.340 --> 00:13:22.140] >> I would actually make a completely different argument, [00:13:22.140 --> 00:13:23.220] but she's right. [00:13:23.220 --> 00:13:24.900] So I'm probably going to make a wrong argument now [00:13:24.900 --> 00:13:26.940] because Kanjun is pretty much always right. [00:13:26.940 --> 00:13:29.580] And when she disagrees with me, it means I'm wrong. [00:13:29.580 --> 00:13:32.660] But the way that I look at it is synthetic data [00:13:32.660 --> 00:13:37.660] is not about, it's not about relying solely on the model. [00:13:37.660 --> 00:13:39.700] We as computer scientists love the idea [00:13:39.700 --> 00:13:42.300] that once you automate something, you fully automate it. [00:13:42.300 --> 00:13:44.140] It's really about how do you reduce [00:13:44.140 --> 00:13:45.740] the amount of work necessary [00:13:45.740 --> 00:13:48.300] to create something that's truly useful. 
[00:13:48.300 --> 00:13:50.520] And so synthetic data is not about [00:13:50.520 --> 00:13:52.860] can we whip up a data set automatically [00:13:52.860 --> 00:13:54.180] and then make a model better? [00:13:54.180 --> 00:13:56.660] It's about how can you use human time most effectively? [00:13:56.660 --> 00:13:58.940] And maybe labeling data or creating a data set from scratch [00:13:58.940 --> 00:14:01.140] is not the most effective use of human time. [00:14:01.140 --> 00:14:04.620] Maybe it's curating a data set that a model generated, [00:14:04.620 --> 00:14:06.260] you know, to pick the examples you like most [00:14:06.260 --> 00:14:07.940] and edit a few of them. [00:14:07.940 --> 00:14:09.700] When I think about the millions of different [00:14:09.700 --> 00:14:11.540] small properties of LLMs we want to study, [00:14:11.540 --> 00:14:13.660] like in some sense, the unit tests of LLMs [00:14:13.660 --> 00:14:14.700] that we want to develop, you know, [00:14:14.700 --> 00:14:17.260] that's going to require a bunch of tiny eval sets [00:14:17.260 --> 00:14:19.700] on specific really niche things. [00:14:19.700 --> 00:14:22.640] It's really hard for a human to just write from scratch. [00:14:22.640 --> 00:14:24.500] Nobody has the time or patience for that. [00:14:24.500 --> 00:14:26.500] If a model can help you do it and you can curate, [00:14:26.500 --> 00:14:28.460] you don't end up in a full feedback loop. [00:14:28.460 --> 00:14:29.540] You have a human there, [00:14:29.540 --> 00:14:31.940] but you're just making better use of your time. [00:14:31.940 --> 00:14:32.780] - Yeah, that makes sense. [00:14:32.780 --> 00:14:35.860] I would just observe that this sounds like weak labeling. [00:14:35.860 --> 00:14:38.340] And I talked to Raza Habib from Humanloop [00:14:38.340 --> 00:14:40.220] who actually pivoted away from weak labeling. [00:14:40.220 --> 00:14:41.260] - Interesting, tell me more. [00:14:41.260 --> 00:14:42.980] - I don't know, I just think it might have [00:14:42.980 --> 00:14:44.220] just been too early. [00:14:44.220 --> 00:14:45.920] I'm still a believer. [00:14:45.920 --> 00:14:48.100] - This is the thing about all of deep learning, [00:14:48.100 --> 00:14:50.620] like you never know whether you're too early, [00:14:50.620 --> 00:14:53.580] and too early is often six months too early. [00:14:53.580 --> 00:14:54.920] It's no longer the like, you know, [00:14:54.920 --> 00:14:57.840] Yoshua Bengio and everybody being 20 years too early. [00:14:57.840 --> 00:14:58.740] - Or Schmidhuber. [00:14:58.740 --> 00:14:59.760] - And Schmidhuber, of course. [00:14:59.760 --> 00:15:02.600] We have to salute, you know, Schmidhuber as well. [00:15:02.600 --> 00:15:05.040] It's not like being 20 years too early. [00:15:05.040 --> 00:15:07.040] It's like, you might be six months too early [00:15:07.040 --> 00:15:09.200] and some crazy thing is going to happen, [00:15:09.200 --> 00:15:11.160] or like something will finally click. [00:15:11.160 --> 00:15:12.680] And there goes that. [00:15:12.680 --> 00:15:13.760] - Yeah, yeah, totally. [00:15:13.760 --> 00:15:15.880] Cool, we're almost at probably your destination. [00:15:15.880 --> 00:15:17.120] The workshops tomorrow, you said, [00:15:17.120 --> 00:15:19.040] are like kind of the highlights for you for NeurIPS? [00:15:19.040 --> 00:15:19.880] - Yeah, yeah. [00:15:19.880 --> 00:15:20.880] That used to be my workshop strategy. [00:15:20.880 --> 00:15:23.840] I don't, I haven't, I picked out a few, but.
[00:15:23.840 --> 00:15:24.840] - Oh, wander. [00:15:24.840 --> 00:15:26.280] - How do you do NeurIPS well, basically? [00:15:26.280 --> 00:15:28.400] - Wander and go to a lot of the poster sessions. [00:15:28.400 --> 00:15:30.860] Like, the talks at workshops are always great, [00:15:30.860 --> 00:15:32.000] but you know, often, honestly, [00:15:32.000 --> 00:15:34.280] the workshops are pretty eclectic in terms of talks. [00:15:34.280 --> 00:15:35.800] You try your best as a workshop organizer [00:15:35.800 --> 00:15:37.240] to put together a coherent program, [00:15:37.240 --> 00:15:38.680] but you know, presenters are gonna do [00:15:38.680 --> 00:15:40.080] what presenters are gonna do, [00:15:40.080 --> 00:15:41.320] and you can't really stop that. [00:15:41.320 --> 00:15:44.480] But instead, you know, I love the poster sessions, [00:15:44.480 --> 00:15:46.440] 'cause like, you get students who are working [00:15:46.440 --> 00:15:48.840] on like really crazy creative stuff [00:15:48.840 --> 00:15:50.600] that isn't even ready for the conference yet. [00:15:50.600 --> 00:15:53.040] Like, you're actually seeing things [00:15:53.040 --> 00:15:54.560] that have not been put out on Twitter yet, [00:15:54.560 --> 00:15:56.440] and that's such a nice change from NeurIPS, [00:15:56.440 --> 00:15:58.400] where all the conference papers have been out for months, [00:15:58.400 --> 00:15:59.240] if not longer. [00:15:59.240 --> 00:16:00.680] - Oh, wait, I observed the opposite. [00:16:00.680 --> 00:16:03.240] Things that have been on Twitter for like forever [00:16:03.240 --> 00:16:04.780] are now out of date, and there are posters, [00:16:04.780 --> 00:16:06.600] because that's how long it takes to submit a paper. [00:16:06.600 --> 00:16:07.440] - Yeah, yeah. [00:16:07.440 --> 00:16:08.280] - So it's the other way. [00:16:08.280 --> 00:16:10.040] - But for the workshop poster sessions, [00:16:10.040 --> 00:16:12.160] it's the workshop poster sessions that are awesome, [00:16:12.160 --> 00:16:13.600] because you're truly seeing stuff [00:16:13.600 --> 00:16:15.480] that was created this fall, [00:16:15.480 --> 00:16:16.320] may not have been archived yet, [00:16:16.320 --> 00:16:17.640] nobody's talked about it, [00:16:17.640 --> 00:16:19.560] probably makes no sense yet, [00:16:19.560 --> 00:16:21.440] but may evolve into something really cool. [00:16:21.440 --> 00:16:22.280] - Interesting. [00:16:22.280 --> 00:16:23.600] - And so, and you also, like, [00:16:23.600 --> 00:16:25.400] there's not as much competition to talk to the people, [00:16:25.400 --> 00:16:26.960] you can just kind of chill. [00:16:26.960 --> 00:16:28.920] So I love to like wander from poster session [00:16:28.920 --> 00:16:30.800] to poster session throughout the workshops, [00:16:30.800 --> 00:16:31.840] 'cause like, that's my favorite part. [00:16:31.840 --> 00:16:33.240] I don't know, I can hear, you know, [00:16:33.240 --> 00:16:35.920] somewhat important people talk any time, [00:16:35.920 --> 00:16:37.640] but it's like talking to the people [00:16:37.640 --> 00:16:40.800] and seeing, like, getting a glimpse of what might be ahead. [00:16:40.800 --> 00:16:41.620] - Yeah. 
[00:16:41.620 --> 00:16:42.460] - You know, being able to say like, [00:16:42.460 --> 00:16:44.200] oh my gosh, I remember seeing the poster for this paper [00:16:44.200 --> 00:16:46.320] that a year later becomes very important, [00:16:46.320 --> 00:16:48.660] and like, kind of asking yourself, you know, [00:16:48.660 --> 00:16:50.760] is this nonsense or is this brilliant? [00:16:50.760 --> 00:16:52.080] And like, not actually knowing the answer, [00:16:52.080 --> 00:16:53.440] not having 50 million people on Twitter [00:16:53.440 --> 00:16:55.000] having told you the answer. [00:16:55.000 --> 00:16:56.880] That's kind of, I don't know, it's fun. [00:16:56.880 --> 00:16:58.440] It takes me back to like, [00:16:58.440 --> 00:17:00.240] what the conferences were like for me, [00:17:00.240 --> 00:17:02.600] you know, when I was early in my career. [00:17:02.600 --> 00:17:04.240] Like, you know, it was just kind of some random people [00:17:04.240 --> 00:17:05.080] coming and chatting with me, [00:17:05.080 --> 00:17:07.360] and you never really knew what was important [00:17:07.360 --> 00:17:09.560] and what wasn't, but it was all kind of cool and fun. [00:17:09.560 --> 00:17:11.760] - You formulate a hypothesis and, you know, [00:17:11.760 --> 00:17:13.040] search that way. [00:17:13.040 --> 00:17:14.560] Yeah, so I'm looking forward to tomorrow. [00:17:14.560 --> 00:17:16.120] If you find anything interesting, just let me know, [00:17:16.120 --> 00:17:17.600] and I'll go interview them. [00:17:17.600 --> 00:17:18.760] I've been recording sessions [00:17:18.760 --> 00:17:20.720] with poster presenters all the time. [00:17:20.720 --> 00:17:23.560] And I wanted to expose people who don't come to NeurIPS, [00:17:23.560 --> 00:17:25.280] like, that this is what goes on. [00:17:25.280 --> 00:17:27.960] And there's so much, I found, so much talent [00:17:27.960 --> 00:17:29.960] that does a lot of work [00:17:29.960 --> 00:17:31.520] that you don't hear about online, [00:17:31.520 --> 00:17:32.360] 'cause they're just not online, [00:17:32.360 --> 00:17:35.040] or they just don't have the reach that, you know, I do. [00:17:35.040 --> 00:17:36.780] So like, I want to give them that reach. [00:17:36.780 --> 00:17:38.300] - Yeah, I think there's like, you know, [00:17:38.300 --> 00:17:40.240] I'll say two things kind of to close up. [00:17:40.240 --> 00:17:41.740] One is kind of that like, [00:17:41.740 --> 00:17:44.640] I feel like there's now so much hype attached to NeurIPS [00:17:44.640 --> 00:17:46.240] and ICLR and ICML, [00:17:46.240 --> 00:17:48.840] just by virtue of the hype that's attached to the field. [00:17:48.840 --> 00:17:49.960] I don't know, this like, [00:17:49.960 --> 00:17:51.560] feels pretty mundane and boring to me. [00:17:51.560 --> 00:17:54.440] Like, it's really cool, but it's also just, you know, [00:17:54.440 --> 00:17:56.000] it's just a bunch of academics, like, [00:17:56.000 --> 00:17:57.600] walking around having boring conversations, [00:17:57.600 --> 00:18:00.200] getting coffee and like, pretending to party. [00:18:00.200 --> 00:18:01.440] I definitely, my experience of-- [00:18:01.440 --> 00:18:02.680] - Pretending to party, I love it. [00:18:02.680 --> 00:18:04.160] - No, I'll say that, you know, I'll tell you-- [00:18:04.160 --> 00:18:05.000] - It's true, it's so true.
[00:18:05.000 --> 00:18:07.360] - My experience of NeurIPS last year, like, [00:18:07.360 --> 00:18:08.880] I don't know, these conferences have a reputation [00:18:08.880 --> 00:18:10.800] of being over the top with industry parties [00:18:10.800 --> 00:18:11.920] and things like that. [00:18:11.920 --> 00:18:15.600] And my impression was that was probably true in 2017. [00:18:15.600 --> 00:18:19.660] Like, that year is known as the NeurIPS that broke NeurIPS, [00:18:19.660 --> 00:18:20.500] for various reasons. [00:18:20.500 --> 00:18:21.320] I wasn't there at that time, [00:18:21.320 --> 00:18:22.860] that was before I was even in the field. [00:18:22.860 --> 00:18:25.280] But my experience last year, especially post-pandemic, [00:18:25.280 --> 00:18:29.200] was a whole generation of students had like, heard stories, [00:18:29.200 --> 00:18:31.160] and these stories had been built up in their minds, [00:18:31.160 --> 00:18:33.180] and they were trying to live out the fantasy [00:18:33.180 --> 00:18:34.440] of what they thought NeurIPS had been like. [00:18:34.440 --> 00:18:36.120] So these very boring happy hours, [00:18:36.120 --> 00:18:38.960] people tried to turn into ragers and it was hilarious. [00:18:38.960 --> 00:18:40.600] It was just adorable in some sense. [00:18:40.600 --> 00:18:42.880] So, you know, it's worth remembering, like, you know, [00:18:42.880 --> 00:18:44.300] there's the fantasy and there's the reality, [00:18:44.300 --> 00:18:45.560] and the reality is, you know, [00:18:45.560 --> 00:18:47.700] it's a boring industry conference where people are, [00:18:47.700 --> 00:18:49.400] or academic conference with some industry component [00:18:49.400 --> 00:18:50.960] where people are trying to make money [00:18:50.960 --> 00:18:52.480] and convince people to look at their posters [00:18:52.480 --> 00:18:54.040] and get a few citations and-- [00:18:54.040 --> 00:18:55.560] - Lots of hiring, lots of hiring. [00:18:55.560 --> 00:18:56.400] - Lots of hiring. [00:18:56.400 --> 00:18:57.360] - Lots of hiring. [00:18:57.360 --> 00:18:59.920] - I think things have really settled into a new normal. [00:18:59.920 --> 00:19:02.160] And, you know, with all the hype and all the craziness [00:19:02.160 --> 00:19:03.600] over the past couple years, [00:19:03.600 --> 00:19:05.760] people feel like everything is just exploding [00:19:05.760 --> 00:19:06.720] and changing all the time. [00:19:06.720 --> 00:19:07.900] Like, you see those LinkedIn posts [00:19:07.900 --> 00:19:09.400] of everything has just changed. [00:19:09.400 --> 00:19:10.240] - I hate those. [00:19:10.240 --> 00:19:11.160] I hate those so much. [00:19:11.160 --> 00:19:12.220] I hate LinkedIn. [00:19:12.220 --> 00:19:14.840] If anyone is a LinkedIn influencer, I hate you. [00:19:14.840 --> 00:19:16.280] (laughing) [00:19:16.280 --> 00:19:19.960] But, you know, it's kind of like, this felt like, okay, [00:19:19.960 --> 00:19:21.540] like, maybe there's a steady state again. [00:19:21.540 --> 00:19:23.480] Maybe we can all catch our breath a bit. [00:19:23.480 --> 00:19:25.400] And it kind of felt like after a pandemic, [00:19:25.400 --> 00:19:27.280] after all the technical development that's happened [00:19:27.280 --> 00:19:28.760] in the past couple of years, like-- [00:19:28.760 --> 00:19:29.600] - It's nice. [00:19:29.600 --> 00:19:30.420] - We can chill. [00:19:30.420 --> 00:19:31.260] - It's nice. [00:19:31.260 --> 00:19:32.080] - Like, we can kind of breathe a little bit. 
[00:19:32.080 --> 00:19:33.500] And there's something really nice about that. [00:19:33.500 --> 00:19:34.400] - Yeah, love that. [00:19:34.400 --> 00:19:36.200] Well, it's so nice to have you on again [00:19:36.200 --> 00:19:37.400] and chat and catch up. [00:19:37.400 --> 00:19:38.240] - Thank you so much. [00:19:38.240 --> 00:19:39.060] It's good to see you. [00:19:39.060 --> 00:19:40.160] - Thanks for jumping on. [00:19:40.160 --> 00:19:41.280] - In case it wasn't obvious, [00:19:41.280 --> 00:19:44.320] that was not up to the usual standards of our recordings [00:19:44.320 --> 00:19:45.720] because that was a walking interview. [00:19:45.720 --> 00:19:48.080] I was carrying these portable mics all over NeurIPS. [00:19:48.080 --> 00:19:51.380] And really the only way to schedule podcast interviews [00:19:51.380 --> 00:19:54.560] with people, especially busy people like Jonathan at NeurIPS, [00:19:54.560 --> 00:19:56.320] is to show up with a portable mic, [00:19:56.320 --> 00:19:58.080] shove it in their face and talk to them. [00:19:58.080 --> 00:19:59.680] And that's what the majority [00:19:59.680 --> 00:20:02.580] of the podcast conversations are for this episode, [00:20:02.580 --> 00:20:04.400] because that's the only way I can like, [00:20:04.400 --> 00:20:06.880] I see someone, grab someone, do you have 15 minutes [00:20:06.880 --> 00:20:08.180] and talk through something. [00:20:08.180 --> 00:20:09.020] That's what happens. [00:20:09.020 --> 00:20:10.440] That's how we schedule interviews [00:20:10.440 --> 00:20:12.040] with a whole bunch of people that we would not get otherwise. [00:20:12.040 --> 00:20:13.880] NeurIPS is too chaotic [00:20:13.880 --> 00:20:16.080] to schedule anything else. [00:20:16.080 --> 00:20:17.960] One takeaway from Jonathan's interview, [00:20:17.960 --> 00:20:20.320] which I want to highlight, apart from the whole [00:20:20.320 --> 00:20:22.080] "it's the new normal" conversation, [00:20:22.080 --> 00:20:25.200] is the focus on synthetic data generation. [00:20:25.200 --> 00:20:28.320] This is a recurring theme that is continually coming up [00:20:28.320 --> 00:20:31.520] from my conversations with literally everybody in the space. [00:20:31.520 --> 00:20:33.120] And how do you do it right? [00:20:33.120 --> 00:20:36.520] How do you do it with the blessing of OpenAI? [00:20:36.520 --> 00:20:38.360] ByteDance was recently banned from OpenAI [00:20:38.360 --> 00:20:42.140] because they were considered to be distilling from GPT-4, [00:20:42.140 --> 00:20:44.420] which is not allowed under the Terms of Service. [00:20:44.420 --> 00:20:46.380] I've heard that they're not the only company [00:20:46.380 --> 00:20:48.260] that is accused of or being thought of [00:20:48.260 --> 00:20:50.460] or rumored to be doing that. [00:20:50.460 --> 00:20:51.460] Probably the right approach [00:20:51.460 --> 00:20:53.540] is something that looks like DeepMind's approach, [00:20:53.540 --> 00:20:55.940] which on Monday of NeurIPS published a paper [00:20:55.940 --> 00:20:57.860] called "Beyond Human Data: Scaling Self-Training [00:20:57.860 --> 00:20:59.580] for Problem Solving with Language Models." [00:20:59.580 --> 00:21:03.060] And the concept is honestly not that complicated.
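(Editor's note: as a rough, hedged sketch of the shape of that idea before the details -- sample candidate solutions from a model, keep only the ones an external check verifies, and train on the survivors -- here is a toy Python illustration. Everything in it is a hypothetical stand-in: the stub "model" is a random function rather than PaLM 2, and the arithmetic check stands in for the kind of verifiers the approach relies on, such as final-answer checks for math or unit tests for code.)

```python
import random

# Toy sketch of self-training on verifiable domains: sample candidate
# answers from a (stub) model, keep only the ones an external check
# verifies, and treat the survivors as synthetic training data.

def stub_model_answer(a, b):
    """Stand-in for sampling a solution from a language model;
    right most of the time, off by one otherwise."""
    return a + b + random.choice([0, 0, 0, 1, -1])

def is_verifiably_correct(a, b, answer):
    """Math-style verifier: recompute the ground truth.
    For code, this step would be running unit tests instead."""
    return answer == a + b

def build_synthetic_set(problems, samples_per_problem=8):
    """Keep only model-generated answers that pass the external check;
    this filtered set is what the next fine-tuning round would train on."""
    kept = []
    for a, b in problems:
        for _ in range(samples_per_problem):
            answer = stub_model_answer(a, b)
            if is_verifiably_correct(a, b, answer):
                kept.append((f"What is {a} + {b}?", str(answer)))
    return kept

if __name__ == "__main__":
    problems = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(10)]
    data = build_synthetic_set(problems)
    print(f"kept {len(data)} verified examples out of {10 * 8} samples")
```

The only load-bearing part of the sketch is the filter: nothing goes into the next round's training data unless an automatic, external check says it is correct.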
[00:21:03.060 --> 00:21:05.840] For the domains of math and for coding, [00:21:05.840 --> 00:21:09.340] they were able to computer-generate data for training on, [00:21:09.340 --> 00:21:11.060] and they found that training PaLM 2 [00:21:11.060 --> 00:21:13.220] on that synthetically generated data [00:21:13.220 --> 00:21:16.460] improved its results and performance on the benchmarks [00:21:16.460 --> 00:21:17.820] for those relevant domains. [00:21:17.820 --> 00:21:20.340] It makes sense that we can scale [00:21:20.340 --> 00:21:22.180] beyond human data on those dimensions. [00:21:22.180 --> 00:21:24.140] That's the trivially easy stuff. [00:21:24.140 --> 00:21:25.260] And the question is, [00:21:25.260 --> 00:21:28.700] how do you scale beyond the verifiably correct? [00:21:28.700 --> 00:21:30.820] If you listen to part one of our NeurIPS coverage, [00:21:30.820 --> 00:21:31.820] we talked about DPO, [00:21:31.820 --> 00:21:35.620] which is more efficient usage of existing information. [00:21:35.620 --> 00:21:37.920] So not exactly using synthetic information, [00:21:37.920 --> 00:21:39.760] but just as a sneak peek of 2024, [00:21:39.760 --> 00:21:41.760] we've actually already recorded an episode [00:21:41.760 --> 00:21:44.320] with Nathan Lambert now of the Allen Institute [00:21:44.320 --> 00:21:46.560] on RLHF and RLAIF. [00:21:46.560 --> 00:21:48.640] And I think those approaches might scale [00:21:48.640 --> 00:21:51.720] beyond just the narrow domains of math and code. [00:21:51.720 --> 00:21:53.960] So next up is someone who's new to the pod, [00:21:53.960 --> 00:21:54.800] but not new to me. [00:21:54.800 --> 00:21:57.560] I've talked with Lin from Fireworks a bunch [00:21:57.560 --> 00:21:58.680] over the past few months, [00:21:58.680 --> 00:22:02.240] and they've definitely blown up in the inference space. [00:22:02.240 --> 00:22:03.060] So in some sense, [00:22:03.060 --> 00:22:04.760] you can think of Fireworks as a competitor [00:22:04.760 --> 00:22:07.600] to Together AI or Replicate [00:22:07.600 --> 00:22:09.880] or any other sort of inference serving platform [00:22:09.880 --> 00:22:11.120] that you might think about, [00:22:11.120 --> 00:22:12.280] but they have a really good team [00:22:12.280 --> 00:22:14.680] and they've been doing some very good work with Mistral. [00:22:14.680 --> 00:22:16.720] Lin and her team have an amazing track record, [00:22:16.720 --> 00:22:18.680] which you hear about in the interview, [00:22:18.680 --> 00:22:20.360] and their customer list is pretty stellar too. [00:22:20.360 --> 00:22:22.520] So it's worth checking out and checking in [00:22:22.520 --> 00:22:25.920] on the inference business with Lin Qiao from Fireworks AI. [00:22:25.920 --> 00:22:29.040] - Rewind, we can do all that because this will be edited. [00:22:29.040 --> 00:22:31.760] Okay, so who are you and what is Fireworks? [00:22:31.760 --> 00:22:33.800] - Hey Sean, we started Fireworks last year, [00:22:33.800 --> 00:22:36.120] and me and a few founding engineers, [00:22:36.120 --> 00:22:37.760] we had been working at Meta [00:22:37.760 --> 00:22:42.000] on building AI platforms, and specifically PyTorch, for five years.
[00:22:42.000 --> 00:22:43.760] When we started PyTorch, [00:22:43.760 --> 00:22:47.080] it was a framework for researchers, [00:22:47.080 --> 00:22:49.760] and we took the mission to build one framework [00:22:49.760 --> 00:22:51.760] for both production and research [00:22:51.760 --> 00:22:54.320] and streamline the research-to-production transition, [00:22:54.320 --> 00:22:56.840] operating PyTorch at a huge scale [00:22:56.840 --> 00:22:59.120] for Meta and for the industry. [00:22:59.120 --> 00:23:03.400] So by the time we left last year, [00:23:03.400 --> 00:23:06.680] it was running more than five trillion inferences per day [00:23:06.680 --> 00:23:09.040] across 50 data centers for Meta. [00:23:09.040 --> 00:23:12.240] And we feel like this is a great impact we have landed. [00:23:12.240 --> 00:23:16.200] But when we look at the industry, it's really, really behind. [00:23:16.200 --> 00:23:21.200] And we founded Fireworks to really bring this expertise [00:23:21.200 --> 00:23:26.240] to help the industry adopt AI in a faster way, [00:23:26.240 --> 00:23:30.920] adopt the state-of-the-art research into production [00:23:30.920 --> 00:23:33.360] in a very streamlined way. [00:23:33.360 --> 00:23:35.640] And why Fireworks the name? [00:23:35.640 --> 00:23:37.760] Because PyTorch holds fire, [00:23:37.760 --> 00:23:39.880] and we want this fire to be everywhere. [00:23:39.880 --> 00:23:42.080] That's why we came up with our name, Fireworks. [00:23:42.080 --> 00:23:43.480] - Nice, nice. [00:23:43.480 --> 00:23:46.720] Well, there's also Lightning, and Lightning AI [00:23:46.720 --> 00:23:49.400] is kind of a spinoff of that effort. [00:23:49.400 --> 00:23:50.240] - Right, right. [00:23:50.240 --> 00:23:52.640] - And basically, I think there are multiple teams [00:23:52.640 --> 00:23:56.120] working on better inference for PyTorch. [00:23:56.120 --> 00:23:57.720] Could you elaborate? [00:23:57.720 --> 00:23:59.040] How do you see the landscape [00:23:59.040 --> 00:24:01.340] of sort of inference-as-a-service companies? [00:24:01.340 --> 00:24:02.980] I don't know if you consider yourself that, [00:24:02.980 --> 00:24:05.140] like infrastructure companies in general, I guess. [00:24:05.140 --> 00:24:10.140] - Right, so I think when we think about [00:24:10.140 --> 00:24:13.860] inference optimization, there are different angles, right? [00:24:13.860 --> 00:24:17.840] I still think the PyTorch team, when I was there and now, [00:24:17.840 --> 00:24:21.940] now the PyTorch team, they are still doing a great job [00:24:21.940 --> 00:24:24.300] pushing for PyTorch performance optimization [00:24:24.300 --> 00:24:26.220] across training and inference [00:24:26.220 --> 00:24:28.440] through the PyTorch Compile project. [00:24:29.420 --> 00:24:34.420] The goal here is to, hey, keep the simple [00:24:34.420 --> 00:24:37.420] PyTorch programming API, [00:24:37.420 --> 00:24:40.380] which is really good for researchers, [00:24:40.380 --> 00:24:42.100] and then take the heavy lifting [00:24:42.100 --> 00:24:44.340] of doing optimization in an automatic way. [00:24:44.340 --> 00:24:48.260] But then, because the PyTorch team supports [00:24:48.260 --> 00:24:50.480] and sustains a broad community, [00:24:50.480 --> 00:24:53.660] the workload is much more diversified [00:24:53.660 --> 00:24:56.020] when they think about optimization. [00:24:56.460 --> 00:25:00.340] And here, at Fireworks, we take the same philosophy.
[00:25:00.340 --> 00:25:02.220] We want to keep the simple API [00:25:02.220 --> 00:25:04.420] of the PyTorch programming language, [00:25:04.420 --> 00:25:06.700] and take the heavy lifting of the optimization, [00:25:06.700 --> 00:25:11.140] but with a more specific target at industry verticals, right? [00:25:11.140 --> 00:25:14.420] For example, when we started the company, [00:25:14.420 --> 00:25:17.440] we started from ranking and recommendation, [00:25:17.440 --> 00:25:20.320] and we have a product around that. [00:25:20.320 --> 00:25:24.540] And then, later on, the customers we engage with, [00:25:24.540 --> 00:25:26.700] they're asking us, hey, can we help on GenAI? [00:25:26.700 --> 00:25:29.180] Because all the GenAI models are PyTorch models, [00:25:29.180 --> 00:25:31.600] they're bigger, they're more complex, [00:25:31.600 --> 00:25:34.500] they're even harder to operate and optimize. [00:25:34.500 --> 00:25:37.740] So then we started a vertical on GenAI [00:25:37.740 --> 00:25:39.420] across large language models, [00:25:39.420 --> 00:25:41.900] and image generation, and other modalities as well. [00:25:41.900 --> 00:25:44.100] But because we focus on verticals, [00:25:44.100 --> 00:25:47.580] we can afford to take a much more specialized [00:25:47.580 --> 00:25:49.700] optimization approach. [00:25:49.700 --> 00:25:52.900] And that is complementary to PyTorch Compile, [00:25:52.900 --> 00:25:56.220] where PyTorch is driving for a broader audience. [00:25:56.220 --> 00:25:57.940] So that's where we are. [00:25:57.940 --> 00:26:01.780] And I will say, because of our PyTorch expertise, [00:26:01.780 --> 00:26:04.980] we are the best when it comes to [00:26:04.980 --> 00:26:08.940] performance optimization across the following areas, right? [00:26:08.940 --> 00:26:11.620] The performance for GenAI models is pretty complicated [00:26:11.620 --> 00:26:13.540] because there's no one bottleneck [00:26:13.540 --> 00:26:16.060] from a system resource consumption point of view. [00:26:16.060 --> 00:26:20.840] The bottleneck can be scattered across CPU-to-GPU communication, [00:26:20.840 --> 00:26:23.220] the compute itself, memory bandwidth, [00:26:23.220 --> 00:26:25.280] and many other things. [00:26:25.280 --> 00:26:30.280] So we developed a very special scaling algorithm [00:26:30.280 --> 00:26:35.660] that allows us to tackle those bottlenecks independently [00:26:35.660 --> 00:26:38.580] instead of blending them together. [00:26:38.580 --> 00:26:41.460] So that's a very unique thing we are doing. [00:26:41.460 --> 00:26:44.460] The second is we build custom kernels [00:26:44.460 --> 00:26:48.860] across attention, especially multi-query attention, [00:26:48.860 --> 00:26:53.080] matmul, all-reduce, and those custom kernels [00:26:53.080 --> 00:26:56.160] outperform anything in the industry. [00:26:56.160 --> 00:27:02.600] Yeah, we also do a lot of adaptive techniques [00:27:02.600 --> 00:27:04.960] so that when we run the inference, [00:27:04.960 --> 00:27:07.240] its performance will get better. [00:27:07.240 --> 00:27:08.560] The more you run the same workload, [00:27:08.560 --> 00:27:11.120] it will start to adapt to the workload [00:27:11.120 --> 00:27:12.360] and become better and better. [00:27:12.360 --> 00:27:16.600] So across all this, that enables us to be [00:27:16.600 --> 00:27:21.460] in the leading position as a GenAI inference provider.
[00:27:21.460 --> 00:27:23.020] - Just to give people a mental image, [00:27:23.020 --> 00:27:24.260] obviously they can go to the website, [00:27:24.260 --> 00:27:26.660] you have a self-serve option that people can try out. [00:27:26.660 --> 00:27:29.760] You mostly have a library of existing [00:27:29.760 --> 00:27:31.180] popular open source models. [00:27:31.180 --> 00:27:33.100] You just started creating your own models, [00:27:33.100 --> 00:27:34.540] which we can talk about. [00:27:34.540 --> 00:27:35.820] I didn't know that, that's super exciting. [00:27:35.820 --> 00:27:40.220] You actually recently enabled Mixtral [00:27:40.220 --> 00:27:42.340] in one day after their release [00:27:42.340 --> 00:27:44.500] by reverse engineering the code? [00:27:44.500 --> 00:27:45.340] - That's right. [00:27:45.480 --> 00:27:46.320] - That's a high level. [00:27:46.320 --> 00:27:48.400] - Yeah, so I think we did that twice. [00:27:48.400 --> 00:27:51.320] The first time was when Mistral 7B got released, [00:27:51.320 --> 00:27:52.440] the same day. [00:27:52.440 --> 00:27:53.360] They released in the morning, [00:27:53.360 --> 00:27:56.400] then in the afternoon we launched Mistral 7B. [00:27:56.400 --> 00:27:58.380] We were the first to get it working. [00:27:58.380 --> 00:28:01.720] - And this is basically, they release weights but no code. [00:28:01.720 --> 00:28:03.920] And then you have to implement code by guessing the-- [00:28:03.920 --> 00:28:07.360] - Right, for Mixtral, that happened last week. [00:28:07.360 --> 00:28:11.960] They only released the weights, and there's no code. [00:28:11.960 --> 00:28:16.020] And I think it's really fun for us, right? [00:28:16.020 --> 00:28:21.020] Because thanks to the technology we developed over time, [00:28:21.020 --> 00:28:28.200] we actually built a slew of componentized libraries, [00:28:28.200 --> 00:28:33.020] so enabling new models [00:28:33.020 --> 00:28:35.120] is not built from scratch every time. [00:28:35.120 --> 00:28:39.060] Because all these models [00:28:39.060 --> 00:28:42.880] share a similar kind of model architecture underneath, [00:28:42.880 --> 00:28:44.260] with different components, [00:28:44.260 --> 00:28:47.700] and that's why we have the velocity and the speed. [00:28:47.700 --> 00:28:49.880] But it was actually fun to hack it. [00:28:49.880 --> 00:28:54.900] Dima, whose full name is Dmytro Dzhulgakov. [00:28:54.900 --> 00:28:55.740] - Your CTO. [00:28:55.740 --> 00:28:57.080] - Yeah, our CTO. [00:28:57.080 --> 00:29:00.160] He basically took the Llama model [00:29:00.160 --> 00:29:03.720] and tried to retrofit it to the Mixtral weights, [00:29:03.720 --> 00:29:04.560] and it worked. [00:29:04.560 --> 00:29:06.080] It worked, we were thrilled. [00:29:06.080 --> 00:29:08.520] Oh, it's actually working pretty well. [00:29:08.520 --> 00:29:11.440] But on top of that, it was just a base model. [00:29:11.440 --> 00:29:13.900] It's not an instruct-tuned model. [00:29:13.900 --> 00:29:16.640] It's not really usable for chat. [00:29:16.640 --> 00:29:19.360] And then overnight, we tuned a chat model [00:29:19.360 --> 00:29:22.160] and deployed it to Poe bots, [00:29:22.160 --> 00:29:26.840] and it's used by many other users already at high scale. [00:29:26.840 --> 00:29:29.160] And the feedback is really, really good. [00:29:29.160 --> 00:29:30.600] Of course, now we switched to Mixtral [00:29:30.600 --> 00:29:32.140] Instruct as the official version, [00:29:32.140 --> 00:29:34.320] but we still keep getting users' feedback.
[00:29:34.320 --> 00:29:36.800] Our overnight tuned chat model [00:29:36.800 --> 00:29:38.600] sometimes even performed better. [00:29:38.600 --> 00:29:39.440] - Wow. [00:29:39.440 --> 00:29:41.120] - So, yeah, so that's what we do. [00:29:41.120 --> 00:29:44.760] When it comes to velocity to quality [00:29:44.760 --> 00:29:48.480] and velocity to speed, [00:29:48.480 --> 00:29:50.080] we are the best company in the industry. [00:29:50.080 --> 00:29:51.920] - Yeah, mentioning speed, I should also mention [00:29:51.920 --> 00:29:54.240] that a lot of AI engineers listening to the podcast [00:29:54.240 --> 00:29:56.680] would be familiar with the Vercel AI Playground, [00:29:56.680 --> 00:29:59.240] which you are the primary provider for, right? [00:29:59.240 --> 00:30:00.600] I mean, that's the one that's most visible [00:30:00.600 --> 00:30:01.440] 'cause they name you, [00:30:01.440 --> 00:30:02.720] but I don't know if there's any other [00:30:02.720 --> 00:30:04.240] that you serve that you can name [00:30:04.240 --> 00:30:06.640] as you're the sort of inference provider. [00:30:06.640 --> 00:30:08.880] - Here's just kind of a very highly selective list [00:30:08.880 --> 00:30:09.720] of the customers. - Yeah, of course, [00:30:09.720 --> 00:30:10.960] it's not exhaustive. [00:30:10.960 --> 00:30:13.400] - Yeah, we get the marketing rights. [00:30:13.400 --> 00:30:14.240] - Yeah. [00:30:14.240 --> 00:30:17.280] - So we already serve Tome. [00:30:17.280 --> 00:30:19.600] They're doing really good PowerPoint generation. [00:30:19.600 --> 00:30:21.840] If you haven't used that, please try it out. [00:30:21.840 --> 00:30:22.840] It's really cool. [00:30:22.840 --> 00:30:24.960] - Yeah, I used it for my keynote for my conference. [00:30:24.960 --> 00:30:25.880] - Oh, that's fantastic. [00:30:25.880 --> 00:30:30.520] - Yeah, I used like a magic trackpad to present the Tome, [00:30:30.520 --> 00:30:32.600] and then obviously whenever I need to generate images, [00:30:32.600 --> 00:30:34.140] I actually generate it from inside of Tome. [00:30:34.140 --> 00:30:35.880] So I was using Fireworks without knowing it. [00:30:35.880 --> 00:30:37.440] - That's fantastic. [00:30:37.440 --> 00:30:41.120] We also serve the Copilot kind of applications. [00:30:41.120 --> 00:30:43.680] For example, Sourcegraph released Cody. [00:30:43.680 --> 00:30:46.720] - By the time this releases, [00:30:46.720 --> 00:30:49.680] we'll release our episode with Sourcegraph and Steve Yegge. [00:30:49.680 --> 00:30:50.520] - Oh, that's great. [00:30:50.520 --> 00:30:51.360] That's great. - Yeah, we recorded one. [00:30:51.360 --> 00:30:52.640] We're good friends. [00:30:52.640 --> 00:30:56.360] - We also are the inference backend provider for Poe. [00:30:56.360 --> 00:30:58.480] That is a very popular chatbot, [00:30:58.480 --> 00:31:00.680] and Poe is building-- [00:31:00.680 --> 00:31:03.400] - Wait, doesn't Poe just use Anthropic or GPT? [00:31:03.400 --> 00:31:04.240] - At the beginning. [00:31:04.240 --> 00:31:05.060] - Oh, okay, now they have their own models. [00:31:05.060 --> 00:31:08.760] - Yeah, they are going big on open source models. [00:31:08.760 --> 00:31:09.600] - I see. [00:31:09.600 --> 00:31:12.680] - To provide a variety of [00:31:12.680 --> 00:31:15.360] different experiences, [00:31:15.360 --> 00:31:18.860] and much better performance. [00:31:18.860 --> 00:31:22.960] And of course, from their point of view, cost efficiency.
[00:31:22.960 --> 00:31:25.360] There are many other big enterprises, [00:31:25.360 --> 00:31:27.320] for example, DoorDash. [00:31:27.320 --> 00:31:28.160] They're using us. [00:31:28.160 --> 00:31:28.980] - Did they say for what? [00:31:28.980 --> 00:31:31.960] - Yeah, so we actually, yeah, [00:31:31.960 --> 00:31:35.780] we released a ranking and recommendation stack with them [00:31:35.780 --> 00:31:37.900] to power their main business. [00:31:37.900 --> 00:31:40.660] Because when you go to their website, [00:31:40.660 --> 00:31:42.980] there is a lot of ranking and recommendation happening, [00:31:42.980 --> 00:31:47.500] including ads and restaurant [00:31:47.500 --> 00:31:48.940] and search recommendation and so on. [00:31:48.940 --> 00:31:50.660] - One thing I wonder about is, [00:31:50.660 --> 00:31:51.900] for something like a DoorDash, [00:31:51.900 --> 00:31:54.560] and I'm a bit newer to RecSys in general, [00:31:54.560 --> 00:31:56.740] shouldn't those be pre-computed? [00:31:56.740 --> 00:31:59.060] Like, why does it have to be fast or live? [00:31:59.060 --> 00:32:00.740] It doesn't have to be live, right? [00:32:00.740 --> 00:32:03.220] - Actually, there is a lot of dynamism, right? [00:32:03.220 --> 00:32:07.460] Because your personal preference may change, right? [00:32:07.460 --> 00:32:09.380] It's also quickly learning. [00:32:09.380 --> 00:32:12.020] And their distribution channel, [00:32:12.020 --> 00:32:14.660] their participating restaurants may change, [00:32:14.660 --> 00:32:16.140] their menus may change. [00:32:16.140 --> 00:32:19.960] There's a lot of dynamism in the matching criteria here. [00:32:19.960 --> 00:32:23.540] And as I worked at Meta for a long time, [00:32:23.540 --> 00:32:28.380] I know that doing highly adaptive ranking recommendation, [00:32:28.380 --> 00:32:30.600] personalized ranking recommendation, [00:32:30.600 --> 00:32:32.620] yields the best performance [00:32:32.620 --> 00:32:36.220] when it comes to relevance and revenue. [00:32:36.220 --> 00:32:38.660] - Yeah, I'm just asking offline versus online. [00:32:38.660 --> 00:32:43.420] I don't know how sensitive this is to latency requirements. [00:32:43.420 --> 00:32:44.460] - Oh, yeah, yeah. [00:32:44.460 --> 00:32:48.740] No, so a lot of the time, [00:32:48.740 --> 00:32:53.740] of course at big companies, people do online training. [00:32:53.740 --> 00:32:56.580] But for those enterprises, [00:32:56.580 --> 00:32:59.920] I haven't seen the need to go online training yet. [00:32:59.920 --> 00:33:04.140] So usually training is offline, but it's periodic, right? [00:33:04.140 --> 00:33:06.700] You have to refresh with new information [00:33:06.700 --> 00:33:10.500] and then you launch and deploy periodically, yeah. [00:33:10.500 --> 00:33:13.560] - Okay, and so I teased this earlier. [00:33:13.560 --> 00:33:15.540] I didn't know that you had your own models [00:33:15.540 --> 00:33:17.020] that you're also training. [00:33:17.020 --> 00:33:18.780] So you just released a clean LLaVA. [00:33:18.780 --> 00:33:20.380] - Yeah. [00:33:20.380 --> 00:33:21.200] - What's the story behind that? [00:33:21.200 --> 00:33:25.540] - Right, so I think everyone knows about GPT-4V [00:33:25.540 --> 00:33:28.040] and the kind of the space of multimodality, right?
[00:33:28.980 --> 00:33:32.360] I think as I talked about in one of the interviews [00:33:32.360 --> 00:33:34.600] when I was at Meta for PyTorch, [00:33:34.600 --> 00:33:35.840] at the end, the moderator asked me, [00:33:35.840 --> 00:33:38.040] "Hey, what do I think of the future?" [00:33:38.040 --> 00:33:39.840] My answer is multimodality. [00:33:39.840 --> 00:33:41.360] 'Cause we live in a world [00:33:41.360 --> 00:33:44.400] that has so many different modalities [00:33:44.400 --> 00:33:49.400] across image, audio, text, video, and so many other things. [00:33:49.400 --> 00:33:53.920] And that mix is our world and the real-world experience. [00:33:53.920 --> 00:33:57.680] So yeah, we really think multimodality [00:33:57.680 --> 00:34:00.640] will be a very important aspect. [00:34:00.640 --> 00:34:05.640] So we took the very popular LLaVA model from Microsoft, [00:34:05.640 --> 00:34:10.680] but it has the kind of GPT-4-generated training data. [00:34:10.680 --> 00:34:13.960] So we replaced that with our own training data [00:34:13.960 --> 00:34:16.920] to make sure it's commercially usable. [00:34:16.920 --> 00:34:19.080] Yeah, we're super excited about this. [00:34:19.080 --> 00:34:20.400] - Yeah, I mean, it sounds like [00:34:20.400 --> 00:34:22.720] you'll be exploring more models as well [00:34:22.720 --> 00:34:24.560] and just putting them all on your platform, [00:34:24.560 --> 00:34:26.560] and you're the fastest way to access them. [00:34:26.560 --> 00:34:27.400] We're here at NeurIPS. [00:34:27.400 --> 00:34:29.200] You're talking to a lot of industry folks. [00:34:29.200 --> 00:34:30.920] Any other top of mind conversations [00:34:30.920 --> 00:34:31.880] that you're just hearing a lot [00:34:31.880 --> 00:34:33.600] that may be surprising to people? [00:34:33.600 --> 00:34:38.600] - So I mostly talked with many startups that are emerging. [00:34:38.600 --> 00:34:44.000] So number one, it's really refreshing to me, [00:34:44.000 --> 00:34:46.560] though not surprising, that there's so much [00:34:46.560 --> 00:34:49.760] product innovation that's happening across the board. [00:34:49.760 --> 00:34:51.800] So much energy there, [00:34:51.800 --> 00:34:54.660] and a lot of those are built on top of GenAI. [00:34:55.840 --> 00:34:56.980] Of course it's not surprising, [00:34:56.980 --> 00:35:00.720] but it's kind of validating that fundamentally [00:35:00.720 --> 00:35:04.800] innovative technology can reboot [00:35:04.800 --> 00:35:06.520] a huge part of the industry. [00:35:06.520 --> 00:35:09.560] So that's really, really refreshing. [00:35:09.560 --> 00:35:14.560] The second is, I think there are a lot more conversations of, hey, [00:35:14.560 --> 00:35:20.120] how do we think about working together, right? [00:35:20.120 --> 00:35:24.720] How do we build a bigger, more interesting product [00:35:24.720 --> 00:35:26.680] for a broader audience together? [00:35:26.680 --> 00:35:30.160] I think those conversations are very, very interesting to me. [00:35:30.160 --> 00:35:31.000] - Yeah, yeah. [00:35:31.000 --> 00:35:31.820] Okay, very cool. [00:35:31.820 --> 00:35:33.800] And you're also here to hire or recruit? [00:35:33.800 --> 00:35:34.640] - Oh, yeah, absolutely. [00:35:34.640 --> 00:35:35.460] - Maybe put out a call. [00:35:35.460 --> 00:35:36.300] Who are you looking for? [00:35:36.300 --> 00:35:37.120] What's the profile? [00:35:37.120 --> 00:35:41.560] - Yeah, we are definitely growing very fast as a company.
[00:35:41.560 --> 00:35:45.480] We are looking for systems engineers, as, [00:35:45.480 --> 00:35:50.480] hey, we already have rock-solid inference serving, [00:35:50.480 --> 00:35:54.200] but we are scaling it quickly and aggressively. [00:35:54.200 --> 00:35:59.120] So anyone with cloud infrastructure experience [00:35:59.120 --> 00:36:02.040] who can move really fast, join us. [00:36:02.040 --> 00:36:05.920] We are also looking for researchers [00:36:05.920 --> 00:36:10.280] who have a lot of experience and understand data a lot, [00:36:10.280 --> 00:36:12.200] understand quality a lot, [00:36:12.200 --> 00:36:14.760] and can quickly help our customers [00:36:14.760 --> 00:36:15.880] get to high quality, [00:36:15.880 --> 00:36:19.320] whether through training our own models [00:36:19.320 --> 00:36:21.080] or fine-tuning the models [00:36:21.080 --> 00:36:25.480] and building task-specific fine-tuning services. [00:36:25.480 --> 00:36:28.960] Those are the areas we are pushing really aggressively on. [00:36:28.960 --> 00:36:31.280] And of course, we are hiring across the board [00:36:31.280 --> 00:36:35.520] for go-to-market people, all the way from marketing [00:36:35.520 --> 00:36:37.720] to solution architects, sales reps, and so on. [00:36:37.720 --> 00:36:38.600] - Yeah, yeah. [00:36:38.600 --> 00:36:39.440] Nice. [00:36:39.440 --> 00:36:40.260] Seems like you're scaling very quickly. [00:36:40.260 --> 00:36:41.360] Thanks for coming on. [00:36:41.360 --> 00:36:43.080] - Oh, thank you for having me. [00:36:43.080 --> 00:36:43.920] Cool. [00:36:43.920 --> 00:36:44.760] - That's it. [00:36:44.760 --> 00:36:45.580] - When I first met Fireworks, [00:36:45.580 --> 00:36:46.760] I was very impressed by their team, [00:36:46.760 --> 00:36:49.120] but since then I've been more impressed by the execution. [00:36:49.120 --> 00:36:51.280] And my guess is that this will not be the only time [00:36:51.280 --> 00:36:53.720] that you'll hear about them on the Latent Space pod. [00:36:53.720 --> 00:36:56.120] So far in organizing and editing this podcast, [00:36:56.120 --> 00:36:57.680] I've been trying to bias towards [00:36:57.680 --> 00:37:00.120] reintroducing previous guests of the pod [00:37:00.120 --> 00:37:03.960] as a form of end of year check-in episode with friends. [00:37:03.960 --> 00:37:06.120] But so many of them actually mentioned Fireworks. [00:37:06.120 --> 00:37:08.600] You'll see later with Cursor and Perplexity [00:37:08.600 --> 00:37:10.400] that I had to put Fireworks first, [00:37:10.400 --> 00:37:13.060] just because that many people have interacted with them, [00:37:13.060 --> 00:37:15.640] used them and love them, or compete with them. [00:37:15.640 --> 00:37:17.280] I think it's a really interesting open question [00:37:17.280 --> 00:37:19.840] as to how much moat any one inference [00:37:19.840 --> 00:37:22.360] or commodity infrastructure provider can have. [00:37:22.360 --> 00:37:24.640] The people who are not in the business say there's no moat. [00:37:24.640 --> 00:37:26.720] And the people who are in the business, like Lin, [00:37:26.720 --> 00:37:28.720] see tons of moat in the software that they write, [00:37:28.720 --> 00:37:30.440] which obviously is proprietary to them. [00:37:30.440 --> 00:37:32.160] It's also interesting to see them start training [00:37:32.160 --> 00:37:33.660] and releasing their own models.
[00:37:33.660 --> 00:37:35.400] And Fireworks released a LLaVA variant, [00:37:35.400 --> 00:37:38.780] which we covered in our previous NeurIPS episode [00:37:38.780 --> 00:37:40.480] as one of the best papers of 2023. [00:37:40.480 --> 00:37:43.080] So I highly encourage you to check out that conversation [00:37:43.080 --> 00:37:44.600] with Haotian if you're interested. [00:37:44.600 --> 00:37:46.560] So I say all that to preface the conversation [00:37:46.560 --> 00:37:48.160] that we're gonna have with the next two guests. [00:37:48.160 --> 00:37:49.540] The first is a return guest, [00:37:49.540 --> 00:37:52.180] which is Aman Sanger from cursor.so. [00:37:52.180 --> 00:37:54.200] We had them on in August to talk about [00:37:54.200 --> 00:37:58.080] their amazing rise to power as the AI-first code editor. [00:37:58.080 --> 00:38:00.600] They've definitely exploded all over my timeline. [00:38:00.600 --> 00:38:01.920] And at the time of the interview, [00:38:01.920 --> 00:38:06.920] I myself was a VS Code, Cody, Codeium, Copilot fan. [00:38:06.920 --> 00:38:09.320] And since then, I've actually switched my own workflow [00:38:09.320 --> 00:38:11.400] over to Cursor because of the better workflow [00:38:11.400 --> 00:38:12.280] that they provide. [00:38:12.280 --> 00:38:13.560] But still, there's a lot of open questions [00:38:13.560 --> 00:38:14.560] around their business. [00:38:14.560 --> 00:38:17.360] Just like Mosaic, during our podcast interview, [00:38:17.360 --> 00:38:19.800] they were actually sitting on a fundraise, [00:38:19.800 --> 00:38:21.960] and they had recently announced their fundraise [00:38:21.960 --> 00:38:22.780] with OpenAI. [00:38:22.780 --> 00:38:24.520] So let's check in on Cursor. [00:38:24.520 --> 00:38:25.360] Okay, cool. [00:38:25.360 --> 00:38:26.180] So I'm back with Aman. [00:38:26.180 --> 00:38:27.020] Hey. [00:38:27.020 --> 00:38:27.860] - Hey, how's it going? [00:38:27.860 --> 00:38:28.680] - Hard to catch you. [00:38:28.680 --> 00:38:29.800] You're a difficult man to find. [00:38:29.800 --> 00:38:30.960] - I guess so. [00:38:30.960 --> 00:38:31.920] - You've been exploring NeurIPS [00:38:31.920 --> 00:38:33.240] and you also announced your fundraise [00:38:33.240 --> 00:38:34.360] since our last episode. [00:38:34.360 --> 00:38:35.200] - Yeah. [00:38:35.200 --> 00:38:37.520] So we raised $8 million from OpenAI. [00:38:37.520 --> 00:38:39.020] They've been a fantastic partner [00:38:39.020 --> 00:38:40.500] and I think it was a great decision. [00:38:40.500 --> 00:38:41.520] - Yeah. [00:38:41.520 --> 00:38:42.820] OpenAI uses you themselves. [00:38:42.820 --> 00:38:44.780] - Yes, we have a lot of OpenAI users [00:38:44.780 --> 00:38:47.080] and we're growing pretty fast inside the org. [00:38:47.080 --> 00:38:48.780] The thing that we like to say is like, [00:38:48.780 --> 00:38:52.480] Cursor is the means by which research happens faster, right? [00:38:52.480 --> 00:38:54.540] Like as we make programming happen faster and faster, [00:38:54.540 --> 00:38:56.480] as we make programmers much more efficient, [00:38:56.480 --> 00:38:59.040] we're making researchers more efficient. [00:38:59.040 --> 00:39:00.180] And the bottleneck for research [00:39:00.180 --> 00:39:02.100] is really just implementation.
[00:39:02.100 --> 00:39:04.200] If you can come up with an idea [00:39:04.200 --> 00:39:06.700] and then actually have the code, [00:39:06.700 --> 00:39:09.980] have the experiment all written for you immediately, [00:39:09.980 --> 00:39:11.740] research just happens much faster. [00:39:11.740 --> 00:39:13.280] And so that's the goal that we're working towards. [00:39:13.280 --> 00:39:16.480] And I think we're a tiny bit of the way there [00:39:16.480 --> 00:39:18.120] with a lot of OpenAI users. [00:39:18.120 --> 00:39:18.960] - Yeah. [00:39:18.960 --> 00:39:21.640] What's the funniest or most interesting sort of feedback [00:39:21.640 --> 00:39:24.240] you get from OpenAI people versus regular coders? [00:39:24.240 --> 00:39:25.600] Like do they prompt differently [00:39:25.600 --> 00:39:26.760] because they work at OpenAI? [00:39:26.760 --> 00:39:29.240] - So they actually probably have less feedback [00:39:29.240 --> 00:39:30.440] than some of our other users [00:39:30.440 --> 00:39:31.600] who are less familiar with language models, [00:39:31.600 --> 00:39:33.280] 'cause they know what the deficiencies are. [00:39:33.280 --> 00:39:34.840] They kind of know what's going on underneath the hood. [00:39:34.840 --> 00:39:37.700] - Yeah, you can probably give them interesting input [00:39:37.700 --> 00:39:39.480] on what people are trying and failing with. [00:39:39.480 --> 00:39:40.680] - Yeah, that's true. [00:39:40.680 --> 00:39:41.860] We do give them a lot of feedback [00:39:41.860 --> 00:39:44.560] on a lot of their early alphas and whatnot. [00:39:44.560 --> 00:39:47.400] - And so you've been tearing up the Twitters recently, [00:39:47.400 --> 00:39:48.400] putting in some effort. [00:39:48.400 --> 00:39:50.040] What are your sort of top messages [00:39:50.040 --> 00:39:52.000] that have been really resonating with people? [00:39:52.000 --> 00:39:55.840] - I was a big fan of the KV caching tweet. [00:39:55.840 --> 00:39:58.360] It's surprising that not too many people, [00:39:58.360 --> 00:40:00.740] it seemed like not too many people knew about this before. [00:40:00.740 --> 00:40:01.580] - Yeah. [00:40:01.580 --> 00:40:04.800] So when people learn about transformers, [00:40:04.800 --> 00:40:08.080] it's actually not in the documented literature [00:40:08.080 --> 00:40:09.580] and the academic side of things [00:40:09.580 --> 00:40:12.200] that KV caching is a common industry practice. [00:40:12.200 --> 00:40:13.040] - Yeah. [00:40:13.040 --> 00:40:14.320] - You only find out when you talk to industry people [00:40:14.320 --> 00:40:16.040] that you have a KV cache. [00:40:16.040 --> 00:40:18.480] - So when you say KV cache, it's really confusing [00:40:18.480 --> 00:40:21.760] because the KV cache can itself be cached, right? [00:40:21.760 --> 00:40:23.000] It's almost like a double caching. [00:40:23.000 --> 00:40:25.160] But the key idea here is, [00:40:25.160 --> 00:40:27.600] well, let's look at all the big closed model providers. [00:40:27.600 --> 00:40:29.520] They all have these chat models. [00:40:29.520 --> 00:40:32.440] And with chats and with conversations, [00:40:32.440 --> 00:40:35.040] the first N conversation messages are always fixed. [00:40:35.040 --> 00:40:37.040] And that means the first, let's say, [00:40:37.040 --> 00:40:39.120] N tokens are going to be fixed.
[00:40:39.120 --> 00:40:41.080] And that means when I put the next token in, [00:40:41.080 --> 00:40:43.360] why do I need to redo all the work [00:40:43.360 --> 00:40:45.900] of re-computing the keys and values [00:40:45.900 --> 00:40:47.140] for those first N tokens? [00:40:47.140 --> 00:40:48.440] - Yeah. [00:40:48.440 --> 00:40:51.360] - And a standard inference trick for this [00:40:51.360 --> 00:40:54.200] is you take those keys and values [00:40:54.200 --> 00:40:57.120] and you move them from GPU RAM to CPU RAM. [00:40:57.120 --> 00:40:57.960] - Yeah. [00:40:57.960 --> 00:41:00.120] - You store them there for some period of time [00:41:00.120 --> 00:41:01.160] before they're evicted. [00:41:01.160 --> 00:41:03.160] And then if another request comes in [00:41:03.160 --> 00:41:04.520] with a matching prefix, [00:41:04.520 --> 00:41:06.980] the matching original conversation history, [00:41:06.980 --> 00:41:08.400] you just load those back into GPU RAM [00:41:08.400 --> 00:41:11.360] and you save a ton of time on compute. [00:41:11.360 --> 00:41:13.440] Your time to first token goes down. [00:41:13.440 --> 00:41:14.840] And then because you're saving on compute, [00:41:14.840 --> 00:41:16.880] you can increase your throughput. [00:41:16.880 --> 00:41:18.720] And this is a trick that you don't really see [00:41:18.720 --> 00:41:20.800] in any of the open source inference engines. [00:41:20.800 --> 00:41:22.320] - So you don't see that, [00:41:22.320 --> 00:41:24.220] but people implement it on top of them, right? [00:41:24.220 --> 00:41:25.060] - Yes. [00:41:25.060 --> 00:41:26.560] Well, my understanding is, [00:41:26.560 --> 00:41:27.400] Together, for example, [00:41:27.400 --> 00:41:28.800] I think is implementing this. [00:41:28.800 --> 00:41:30.360] - Yeah, and I just talked with Lin Qiao [00:41:30.360 --> 00:41:31.200] from Fireworks as well, [00:41:31.200 --> 00:41:32.560] they're doing just that. - Yeah. [00:41:32.560 --> 00:41:34.240] - So one of the interesting, [00:41:34.240 --> 00:41:36.800] oh, I always assumed that it's because of personalization. [00:41:36.800 --> 00:41:38.560] Like, hey, in my system prompt, [00:41:38.560 --> 00:41:39.520] I have today's date. [00:41:39.520 --> 00:41:40.920] I'm gonna have to update that once a day. [00:41:40.920 --> 00:41:41.740] Fine. [00:41:41.740 --> 00:41:42.580] No big deal. [00:41:42.580 --> 00:41:43.420] - Yeah. [00:41:43.420 --> 00:41:46.000] - But maybe if people have more customized prompts. [00:41:46.000 --> 00:41:48.880] But you said there's some kind of cache eviction policy [00:41:48.880 --> 00:41:51.920] where if there's a 95% match, [00:41:51.920 --> 00:41:53.800] you use the cache. [00:41:53.800 --> 00:41:56.200] - Yeah, I don't know what the exact eviction policy would be. [00:41:56.200 --> 00:41:57.600] You could probably use, [00:41:57.600 --> 00:41:59.480] assume you have, I don't know, [00:41:59.480 --> 00:42:01.760] 100 gigabytes of space per device. [00:42:01.760 --> 00:42:02.720] Probably a lot more, actually. [00:42:02.720 --> 00:42:06.500] Probably up to a terabyte of CPU RAM per device. [00:42:06.500 --> 00:42:07.660] Or maybe per machine. [00:42:07.660 --> 00:42:11.320] You could just do something like least recently used. [00:42:11.320 --> 00:42:14.560] And then if you start to use up more space [00:42:14.560 --> 00:42:15.640] than exists on device, [00:42:15.640 --> 00:42:18.520] you just evict the least recently used request. [00:42:18.520 --> 00:42:21.380] - You are a consumer mostly of the GPT-4 API.
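(Editor's note: before we get back to the conversation, here is a toy sketch of the prefix-caching trick Aman just described. The shapes, the four-entry capacity, and the plain LRU policy are all invented for illustration; no provider's actual implementation is being shown.)

```python
from collections import OrderedDict

import torch


class PrefixKVCache:
    """Toy prefix cache: keeps (keys, values) for recent conversation prefixes
    in host memory and evicts the least recently used entry when full."""

    def __init__(self, max_entries: int = 4):
        self.max_entries = max_entries
        self.store = OrderedDict()  # prefix token ids -> (K, V) on CPU

    def put(self, prefix_tokens: tuple, k: torch.Tensor, v: torch.Tensor) -> None:
        # Offload to CPU RAM; on a real server these tensors came off the GPU.
        self.store[prefix_tokens] = (k.to("cpu"), v.to("cpu"))
        self.store.move_to_end(prefix_tokens)
        if len(self.store) > self.max_entries:
            self.store.popitem(last=False)  # evict the least recently used entry

    def get(self, prefix_tokens: tuple):
        if prefix_tokens not in self.store:
            return None  # cache miss: the prefix has to be prefilled again
        self.store.move_to_end(prefix_tokens)
        k, v = self.store[prefix_tokens]
        # On a hit you would load these back onto the GPU (e.g. k.to("cuda"))
        # and skip prefill for the shared prefix, which cuts time to first token.
        return k, v


cache = PrefixKVCache()
prefix = (1, 5, 9, 42)  # token ids of the fixed system/chat prefix
cache.put(prefix, torch.randn(1, 8, 4, 64), torch.randn(1, 8, 4, 64))
hit = cache.get(prefix)  # reuse the stored keys/values instead of recomputing
```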
[00:42:21.380 --> 00:42:22.440] - Yes. [00:42:22.440 --> 00:42:23.920] - They don't really expose this. [00:42:23.920 --> 00:42:25.000] - They don't. - In the API. [00:42:25.000 --> 00:42:26.480] How does this affect you? [00:42:26.480 --> 00:42:29.000] - I think it's actually pretty important [00:42:29.000 --> 00:42:31.600] to understand what's going on underneath the hood [00:42:31.600 --> 00:42:33.560] to take advantage of these things. [00:42:33.560 --> 00:42:35.980] So we use dedicated instances. [00:42:35.980 --> 00:42:39.500] - So they expose their capability to you. [00:42:39.500 --> 00:42:40.340] - Like somewhat. [00:42:40.340 --> 00:42:41.740] But the key thing is, [00:42:41.740 --> 00:42:43.460] they expose very little, actually. [00:42:43.460 --> 00:42:44.900] And you can-- - Isn't that weird? [00:42:44.900 --> 00:42:46.500] - I mean, yeah, but the only way [00:42:46.500 --> 00:42:48.140] that you can really take advantage of this, [00:42:48.140 --> 00:42:50.160] and I kind of had another tweet about this, [00:42:50.160 --> 00:42:53.020] is you need to really understand [00:42:53.020 --> 00:42:54.060] what's going on underneath the hood [00:42:54.060 --> 00:42:58.420] so you can then plan for when memory utilization is spiking, [00:42:58.420 --> 00:43:01.780] based on how many tokens you're currently using [00:43:01.780 --> 00:43:03.900] or how much memory you can speculate the instance is using, [00:43:03.900 --> 00:43:06.780] or when you are getting a lot of cache hits [00:43:06.780 --> 00:43:10.340] so you don't expect to be using as much compute, [00:43:10.340 --> 00:43:12.020] which means you can then increase your throughput [00:43:12.020 --> 00:43:13.420] without worrying about [00:43:13.420 --> 00:43:15.300] latency spiking or things going down. [00:43:15.300 --> 00:43:17.220] - Yeah, and I don't know if you've, [00:43:17.220 --> 00:43:20.820] I've taken this thought to quite an extreme level. [00:43:20.820 --> 00:43:23.020] Like you can use this to cache RAG stuff, [00:43:23.020 --> 00:43:24.020] like RAG results. [00:43:24.020 --> 00:43:24.860] - Yeah. [00:43:24.860 --> 00:43:26.700] - And just general prompts, right? [00:43:26.700 --> 00:43:27.660] - You can, you can. [00:43:27.660 --> 00:43:29.700] So I did have another tweet about this [00:43:29.700 --> 00:43:31.620] where, no one's done this yet, [00:43:31.620 --> 00:43:32.460] to my knowledge, [00:43:32.460 --> 00:43:34.660] and I think this would be very, very hard to do, [00:43:34.660 --> 00:43:37.820] but you could technically cache the entirety [00:43:37.820 --> 00:43:41.620] of some corpus in something like S3 [00:43:41.620 --> 00:43:46.620] if you have a model which has smaller sized keys and values. [00:43:46.620 --> 00:43:49.180] So this would be, instead of full multi-head attention, [00:43:49.180 --> 00:43:52.620] it could be something like grouped query attention, [00:43:52.620 --> 00:43:54.740] which is, I think, usually around 8x smaller, [00:43:54.740 --> 00:43:59.740] or even multi-query, which can be 64 to 256x smaller. [00:44:00.220 --> 00:44:01.980] And so then what that means is [00:44:01.980 --> 00:44:05.420] you can actually read the cached keys and values from blob storage [00:44:05.420 --> 00:44:07.620] if you have everything really optimized. [00:44:07.620 --> 00:44:10.500] You can read it into RAM a decent bit faster [00:44:10.500 --> 00:44:13.260] than it would actually take to [00:44:13.260 --> 00:44:15.060] re-compute the KV cache.
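(Editor's note: to make the "8x / 64-256x smaller" figures concrete, here is a back-of-the-envelope calculation of the KV cache for one long prompt. All of the model dimensions below are illustrative, not any specific model's configuration; with these numbers MQA comes out about 32x smaller, and models with more query heads get the larger factors Aman cites.)

```python
# Rough KV-cache sizes for a single long prompt, per attention scheme.
layers, q_heads, head_dim, seq_len = 32, 32, 128, 32_000
bytes_per_elem = 2  # fp16


def kv_bytes(kv_heads: int) -> int:
    # Two tensors (K and V) per layer, each [kv_heads, seq_len, head_dim].
    return 2 * layers * kv_heads * seq_len * head_dim * bytes_per_elem


mha = kv_bytes(kv_heads=32)  # full multi-head attention: one KV head per query head
gqa = kv_bytes(kv_heads=4)   # grouped-query attention: 8 query heads share a KV head
mqa = kv_bytes(kv_heads=1)   # multi-query attention: one shared KV head

for name, b in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {b / 2**30:.1f} GiB  ({mha / b:.0f}x smaller than the MHA baseline)")
# MHA: ~15.6 GiB, GQA: ~2.0 GiB (8x), MQA: ~0.5 GiB (32x) with these toy numbers
```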
[00:44:15.060 --> 00:44:17.540] I think that'll be very tricky to implement, [00:44:17.540 --> 00:44:19.740] and I think there are actually not too many use cases [00:44:19.740 --> 00:44:20.660] where it would be useful. [00:44:20.660 --> 00:44:23.660] I think code bases are actually one where it could be. [00:44:23.660 --> 00:44:24.780] - Yeah. [00:44:24.780 --> 00:44:27.020] My final observation on this is, [00:44:27.940 --> 00:44:30.980] OpenAI had the opportunity to offer caching to people [00:44:30.980 --> 00:44:33.220] with the Assistants API, and again, [00:44:33.220 --> 00:44:34.340] they're charging you for the whole thing [00:44:34.340 --> 00:44:37.340] every single time you send a message to the Assistants API. [00:44:37.340 --> 00:44:41.740] And I find it, is there some explanation? [00:44:41.740 --> 00:44:44.780] Is it just like a, we can do it, so we're gonna do it? [00:44:44.780 --> 00:44:47.380] - It's tricky when you're not using, [00:44:47.380 --> 00:44:49.500] I don't know what they're doing underneath the hood, [00:44:49.500 --> 00:44:52.180] but if you assume they're doing something like [00:44:52.180 --> 00:44:55.820] caching at a machine level, these are serverless endpoints. [00:44:55.820 --> 00:44:57.220] - I assume they're not, they're serverless, right? [00:44:57.220 --> 00:45:00.260] So you have to load, unload, and that causes a cold start, [00:45:00.260 --> 00:45:02.500] and that's a problem for them. [00:45:02.500 --> 00:45:06.180] - So it's relatively trivial when you have [00:45:06.180 --> 00:45:08.600] server-based endpoints or dedicated instances, [00:45:08.600 --> 00:45:11.020] but with serverless it's probably quite tricky to get right. [00:45:11.020 --> 00:45:13.060] I mean, I'm not really confident [00:45:13.060 --> 00:45:15.420] as to what their decision-making was there, [00:45:15.420 --> 00:45:17.660] but I'd imagine it's much more difficult to get right. [00:45:17.660 --> 00:45:18.500] - Got it. [00:45:18.500 --> 00:45:20.700] What was your second tweet that we prepped? [00:45:20.700 --> 00:45:22.660] - One of them that I thought was interesting [00:45:22.660 --> 00:45:25.340] was generating a retrieval dataset. [00:45:25.340 --> 00:45:26.900] - Yes, synthetic data. [00:45:26.900 --> 00:45:28.140] - Using synthetic data. [00:45:28.140 --> 00:45:30.460] I mean, the key thing here is there's a lot of [00:45:30.460 --> 00:45:33.060] using synthetic data, like the outputs of models, [00:45:33.060 --> 00:45:35.420] to actually train weaker models, [00:45:35.420 --> 00:45:38.060] and so a lot of people have done this with GPT-4 outputs. [00:45:38.060 --> 00:45:41.140] This actually, I think, requires, I guess, [00:45:41.140 --> 00:45:45.220] the claim that you can train on GPT-4 outputs [00:45:45.220 --> 00:45:47.820] and you'll still get pretty good models out of that. [00:45:47.820 --> 00:45:48.660] - Yeah, it's a little selfish. [00:45:48.660 --> 00:45:49.580] - Yeah, which seems reasonable, [00:45:49.580 --> 00:45:51.460] but we're actually relying on a weaker claim, [00:45:51.460 --> 00:45:53.820] because all we're doing is, I mean, [00:45:53.820 --> 00:45:56.100] people can check out the tweets and see it in more detail, [00:45:56.100 --> 00:45:58.580] but GPT-4 is quite good at this task [00:45:58.580 --> 00:46:03.580] of ordering four candidate documents [00:46:03.580 --> 00:46:07.060] given a query as to their relevance to the query, right?
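(Editor's note: Aman fleshes this recipe out in the next exchange: a strong model orders small batches of candidates, and the noisy judgments are aggregated with a TrueSkill-style rating rather than raw Elo. Below is a rough, illustrative sketch of that kind of pipeline, simplified to pairwise comparisons; the judge function is a stand-in for the real GPT-4 call, and nothing here reflects Cursor's actual code.)

```python
# pip install trueskill
import trueskill


def build_ranking(doc_ids, pairwise_winner):
    """Aggregate noisy pairwise judgments into one ordering over documents.

    `pairwise_winner(a, b)` stands in for asking a strong model (e.g. GPT-4)
    which of two candidate documents better answers the query.
    """
    ratings = {d: trueskill.Rating() for d in doc_ids}
    # All pairs shown for brevity; in practice you would sample a subset of
    # comparisons (or judge small batches list-wise) to keep API costs down.
    for i, a in enumerate(doc_ids):
        for b in doc_ids[i + 1:]:
            winner = pairwise_winner(a, b)
            loser = b if winner == a else a
            ratings[winner], ratings[loser] = trueskill.rate_1vs1(
                ratings[winner], ratings[loser]
            )
    # Higher mean skill = judged more relevant; this ordering becomes
    # training data for a small, fast re-ranker.
    return sorted(doc_ids, key=lambda d: ratings[d].mu, reverse=True)


# Fake judge that prefers lower-numbered ids, standing in for the LLM call.
print(build_ranking(["d1", "d2", "d3", "d4"], lambda a, b: min(a, b)))
# ['d1', 'd2', 'd3', 'd4']
```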
[00:46:07.060 --> 00:46:08.740] That's like, there have been papers that show this, [00:46:08.740 --> 00:46:11.980] like list-wise re-ranking, and it works really well. [00:46:11.980 --> 00:46:14.620] So if you do that for enough documents, [00:46:14.620 --> 00:46:15.920] and you do it in an efficient way, [00:46:15.920 --> 00:46:18.140] which we kind of use a variant of Elo [00:46:18.140 --> 00:46:19.860] called TrueSkill to do, [00:46:19.860 --> 00:46:24.580] you can then get a really high-quality re-ranking dataset, [00:46:24.580 --> 00:46:27.140] really high-quality ordering over, [00:46:27.140 --> 00:46:32.100] let's say, 100 candidate documents given some query. [00:46:32.100 --> 00:46:34.300] So we use GPT-4 kind of in the loop [00:46:34.300 --> 00:46:36.460] for doing a bunch of different synthetic data stuff. [00:46:36.460 --> 00:46:39.140] This is one of them, and I feel like more people [00:46:39.140 --> 00:46:40.820] should be doing it for this kind of stuff. [00:46:40.820 --> 00:46:44.340] - Yeah, yeah, I think people are exploring [00:46:44.340 --> 00:46:47.460] synthetic data a lot in the back half of this year, [00:46:47.460 --> 00:46:51.060] using models as judges [00:46:51.060 --> 00:46:53.020] and models as synthetic data generators. [00:46:53.020 --> 00:46:54.740] - Yeah, I think models as judges [00:46:54.740 --> 00:46:57.060] is almost certainly going to work. [00:46:57.060 --> 00:46:59.060] If you use (mumbles) it's a very easy task. [00:46:59.060 --> 00:47:00.460] I think this is a very easy task. [00:47:00.460 --> 00:47:02.820] - This is how we do RLAIF? [00:47:02.820 --> 00:47:04.620] - Yeah, yeah, though it's interesting. [00:47:04.620 --> 00:47:07.740] RLAIF, I was looking at that paper again, [00:47:07.740 --> 00:47:09.940] and it seemed to really be good for, [00:47:09.940 --> 00:47:12.180] if you look at it compared to RLHF, [00:47:12.180 --> 00:47:14.900] it helped with harmlessness. [00:47:14.900 --> 00:47:17.980] I don't believe it actually helped in helpfulness. [00:47:17.980 --> 00:47:20.200] - It helped to achieve the Pareto optimal trade-off, [00:47:20.200 --> 00:47:22.620] which is no decline in the other two. [00:47:22.620 --> 00:47:25.340] - I think if you compare it to RLHF, [00:47:25.340 --> 00:47:26.660] it was pretty neck-and-neck. [00:47:26.660 --> 00:47:29.700] I don't think there's a statistically significant difference [00:47:29.700 --> 00:47:32.540] with helpfulness, at least, but it is interesting. [00:47:32.540 --> 00:47:35.140] RLAIF is just effectively getting better [00:47:35.140 --> 00:47:37.380] at censoring the model rather than improving [00:47:37.380 --> 00:47:40.100] its capabilities, its helpfulness. [00:47:40.100 --> 00:47:43.500] It'll do that as well as RLHF, [00:47:43.500 --> 00:47:45.260] but it doesn't offer anything additional there, [00:47:45.260 --> 00:47:46.820] which kind of makes sense to me. [00:47:46.820 --> 00:47:50.060] - First impressions of NeurIPS? [00:47:50.060 --> 00:47:50.880] - I mean, very interesting. [00:47:50.880 --> 00:47:52.300] Lots of very smart people. [00:47:52.300 --> 00:47:55.100] I've had lots of very interesting conversations. [00:47:55.100 --> 00:47:56.380] I'll probably be back next year. [00:47:56.380 --> 00:47:57.780] - I was kind of lukewarm on it coming in, [00:47:57.780 --> 00:47:59.860] 'cause everyone goes like, "Oh, it's a big conference.
[00:47:59.860 --> 00:48:01.820] "It's hard to navigate," and all that, [00:48:01.820 --> 00:48:03.960] but then you run into a few papers, [00:48:03.960 --> 00:48:05.180] people, authors, they're interesting, [00:48:05.180 --> 00:48:07.180] and then you're here, a bunch of other people [00:48:07.180 --> 00:48:08.260] I want to meet are all here. [00:48:08.260 --> 00:48:10.980] It's a nice way to get everyone in one place [00:48:10.980 --> 00:48:13.220] and just catch up on everything. [00:48:13.220 --> 00:48:15.100] The house parties are fun. [00:48:15.100 --> 00:48:17.020] Yesterday was just a lot of parties. [00:48:17.020 --> 00:48:19.140] I don't know. [00:48:19.140 --> 00:48:21.060] To me, it's very overwhelming, [00:48:21.060 --> 00:48:23.940] but I think the more exposures or epochs [00:48:23.940 --> 00:48:26.100] that you have on NeurIPS, the better, [00:48:26.100 --> 00:48:28.300] and I'm basically trying to do this audio experience [00:48:28.300 --> 00:48:29.300] to try to bring people in, [00:48:29.300 --> 00:48:31.660] 'cause there's many people who have just never come, [00:48:31.660 --> 00:48:34.100] but they should get a sense of what's going on here. [00:48:34.100 --> 00:48:35.860] I find there are people here [00:48:35.860 --> 00:48:36.900] who you've never heard of on Twitter. [00:48:36.900 --> 00:48:38.140] They're not on Twitter. [00:48:38.140 --> 00:48:40.500] They just know more, 'cause they've just done the work. [00:48:40.500 --> 00:48:43.180] - Exactly, yeah. - They've read everything. [00:48:43.180 --> 00:48:44.780] Have you seen the DataComp paper? [00:48:44.780 --> 00:48:45.620] - I don't know. [00:48:45.620 --> 00:48:47.620] - I'll walk you over and show you. [00:48:47.620 --> 00:48:49.820] I was very impressed by their work. [00:48:49.820 --> 00:48:51.700] These people, they just come out of nowhere, [00:48:51.700 --> 00:48:53.220] and once a year, they do this, [00:48:53.220 --> 00:48:54.300] and this is the place to find them, [00:48:54.300 --> 00:48:55.740] so that's why I'm here. [00:48:55.740 --> 00:48:57.660] - Yeah, I mean, I completely agree. [00:48:57.660 --> 00:49:00.860] There's really such a good congregation [00:49:00.860 --> 00:49:02.860] of very good researchers, right? [00:49:02.860 --> 00:49:04.300] - Yeah, are you trying to hire them? [00:49:04.300 --> 00:49:05.620] Hey, let's make a hiring call. [00:49:05.620 --> 00:49:08.180] - Yeah, I mean, look, I think right now, [00:49:08.180 --> 00:49:10.340] we're a very small, very strong team. [00:49:10.340 --> 00:49:11.780] - You were five last time. [00:49:11.780 --> 00:49:15.020] - Yeah, so we are seven now. [00:49:15.020 --> 00:49:17.540] Only six engineers, though, so very small team. [00:49:17.540 --> 00:49:18.900] - You have more millions than people. [00:49:18.900 --> 00:49:20.860] (laughing) [00:49:20.860 --> 00:49:21.700] - Yeah. [00:49:21.700 --> 00:49:24.740] Look, we're a very small team, [00:49:24.740 --> 00:49:25.860] and we're looking to grow the team, [00:49:25.860 --> 00:49:27.580] but we're looking to grow it very carefully and slowly, [00:49:27.580 --> 00:49:29.940] 'cause I think a lot of companies [00:49:29.940 --> 00:49:31.580] fall into the pitfall of hiring too quickly. [00:49:31.580 --> 00:49:32.460] - Yes. [00:49:32.460 --> 00:49:34.740] - So yeah, we're really looking for fantastic people. [00:49:34.740 --> 00:49:38.740] We're seeing incredible traction, incredible growth.
[00:49:38.740 --> 00:49:41.460] There's a lot more really interesting problems to tackle, [00:49:41.460 --> 00:49:44.100] and people should check out our blog post on that, [00:49:44.100 --> 00:49:46.860] 'cause I think it's very exciting, the kinds of things. [00:49:46.860 --> 00:49:47.860] - The fundraising post? [00:49:47.860 --> 00:49:49.020] - Yeah, there's a fundraising post, [00:49:49.020 --> 00:49:50.500] and then we kind of link there. [00:49:50.500 --> 00:49:51.620] There's a problems post if you go [00:49:51.620 --> 00:49:54.460] to anysphere.co/problems2023. [00:49:54.460 --> 00:49:58.660] There's lots of interesting work to do, [00:49:58.660 --> 00:50:01.300] and I think we have a really good chance of being the team [00:50:01.300 --> 00:50:02.860] that can crack codegen. [00:50:02.860 --> 00:50:04.620] So it's a really exciting space. [00:50:04.620 --> 00:50:06.940] I think you'd be joining a very small, strong team. [00:50:06.940 --> 00:50:07.980] And so yeah, if you're interested [00:50:07.980 --> 00:50:10.500] in working with us at Cursor, we'd love to talk. [00:50:10.500 --> 00:50:13.620] You can just reach out to aman@cursor.sh. [00:50:13.620 --> 00:50:14.940] - Nice, .sh, oh, okay. [00:50:14.940 --> 00:50:15.860] - Yeah, well-- [00:50:15.860 --> 00:50:16.700] - I thought it was .so-- [00:50:16.700 --> 00:50:18.700] - We might try to get .ai or .com. [00:50:18.700 --> 00:50:20.260] We'll see, we'll see. [00:50:20.260 --> 00:50:21.260] Cool, well, thanks for dropping by. [00:50:21.260 --> 00:50:22.100] - Yeah, for sure. [00:50:22.100 --> 00:50:23.100] - Thanks for having me. [00:50:23.100 --> 00:50:25.700] - So there again, you see one of the topics [00:50:25.700 --> 00:50:27.860] that I highlighted from my conversation with Jonathan Frankle, [00:50:27.860 --> 00:50:29.220] which is why I put it at the start, [00:50:29.220 --> 00:50:32.220] which is synthetic data generation in all its glory. [00:50:32.220 --> 00:50:35.900] And for Aman and Cursor, they're particularly interested [00:50:35.900 --> 00:50:39.740] in LLMs as rankers, or LLMs as judges. [00:50:39.740 --> 00:50:43.540] And that seems to be generally a more blessed way [00:50:43.540 --> 00:50:46.300] than directly distilling the output of LLMs. [00:50:46.300 --> 00:50:49.620] And you can look out for our episode with Nathan in 2024 [00:50:49.620 --> 00:50:50.940] to go deeper on that. [00:50:50.940 --> 00:50:52.420] Another founder that recently raised [00:50:52.420 --> 00:50:54.580] and is the talk of the AI community, [00:50:54.580 --> 00:50:57.860] particularly with Guillermo Rauch and Tobi Lütke [00:50:57.860 --> 00:51:00.660] recently endorsing the product, is Aravind Srinivas [00:51:00.660 --> 00:51:03.900] of Perplexity AI, which started off being, [00:51:03.900 --> 00:51:06.660] maybe we will construct SQL queries for you. [00:51:06.660 --> 00:51:09.500] And they went to, maybe we'll construct SQL queries [00:51:09.500 --> 00:51:11.180] on your Twitter stream for you. [00:51:11.180 --> 00:51:14.140] And now they've blown up as a potential Google replacement, [00:51:14.140 --> 00:51:16.020] which is a huge increase in ambition, [00:51:16.020 --> 00:51:18.780] but they have the web app and the mobile apps to prove it. [00:51:18.780 --> 00:51:21.020] So here's Aravind with Perplexity. [00:51:21.020 --> 00:51:24.860] - And so congrats on all your success with Perplexity.
[00:51:24.860 --> 00:51:26.380] The two most recent accomplishments [00:51:26.380 --> 00:51:28.700] which I have seen, at least on my feed, are, [00:51:28.700 --> 00:51:31.700] one, you hit a million people on your mobile app. [00:51:31.700 --> 00:51:32.540] That's huge. [00:51:32.540 --> 00:51:35.860] - On both platforms, Android and iOS, independently. [00:51:35.860 --> 00:51:38.300] - Is that because of your slick video editing skills? [00:51:38.300 --> 00:51:41.400] - Actually, we have a good brand marketing designer. [00:51:41.400 --> 00:51:44.180] But I mean, more than everything else, [00:51:44.180 --> 00:51:47.820] I think the app's really good, fast. [00:51:47.820 --> 00:51:49.380] We spent a lot of time on it. [00:51:49.380 --> 00:51:53.100] In fact, our first rollout of the app was not that great. [00:51:53.100 --> 00:51:54.860] It was slow, it used to crash. [00:51:54.860 --> 00:51:56.660] Users complained, and we listened to that, [00:51:56.660 --> 00:51:58.780] and recruited a good mobile team, [00:51:58.780 --> 00:52:00.500] and made it much faster and more reliable. [00:52:00.500 --> 00:52:02.340] - Any technical decisions that drove that? [00:52:02.340 --> 00:52:04.700] Is it React Native, that's slow, or something else? [00:52:04.700 --> 00:52:05.700] - It's all native. [00:52:05.700 --> 00:52:08.420] There's no, we're not on one common React stack. [00:52:08.420 --> 00:52:10.100] And the reason to do that is that's the only way [00:52:10.100 --> 00:52:12.740] to make the apps feel fast, right? [00:52:12.740 --> 00:52:15.720] And I believe ChatGPT also does this. [00:52:15.720 --> 00:52:17.220] They don't use React Native. [00:52:17.220 --> 00:52:19.340] - And then the other accomplishment is PPLX Online, [00:52:19.340 --> 00:52:21.540] which you're showing on screen here. [00:52:21.540 --> 00:52:23.860] What are the headline things that people should know [00:52:23.860 --> 00:52:25.580] if they haven't heard of PPLX Online? [00:52:25.580 --> 00:52:27.780] - Well, it's like the only LLM API [00:52:27.780 --> 00:52:29.580] that has no knowledge cutoff. [00:52:29.580 --> 00:52:32.040] So if you're a developer, and you just wanna prototype [00:52:32.040 --> 00:52:34.060] products that need information from the web [00:52:34.060 --> 00:52:37.140] or have no knowledge cutoff, this is the only way to do that. [00:52:37.140 --> 00:52:38.900] And it's super fast, pretty accurate. [00:52:38.900 --> 00:52:41.100] You have two versions, a 7B and a 70B. [00:52:41.100 --> 00:52:43.860] The 7B is super fast, the 70B is a little slower, [00:52:43.860 --> 00:52:45.740] but also better quality. [00:52:45.740 --> 00:52:47.660] And we plan to bring it up in the context [00:52:47.660 --> 00:52:49.900] of the Mixtral MoE as well, [00:52:49.900 --> 00:52:50.940] which has been recently released. [00:52:50.940 --> 00:52:52.060] - Yeah, I think you've been pretty transparent [00:52:52.060 --> 00:52:53.300] that they are fine-tuned from Llama 2. [00:52:53.300 --> 00:52:56.140] - That's right, we are not in the business of pre-training. [00:52:56.140 --> 00:52:57.680] - But what do you fine-tune for, [00:52:57.680 --> 00:52:59.500] between Llama 2 and what you have? [00:52:59.500 --> 00:53:01.780] - Yeah, we fine-tune for summarization, [00:53:01.780 --> 00:53:03.420] the ability to take a bunch of sources [00:53:03.420 --> 00:53:05.300] and accurately give you a nice summary.
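(Editor's note: a hedged sketch of what calling the pplx-online models Aravind describes looked like at the time. Perplexity's API follows the OpenAI-compatible chat-completions shape at api.perplexity.ai; the model names below are the ones from their announcement and may have changed since, and the API key is assumed to be in your environment.)

```python
import os

import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"},
    json={
        # "pplx-7b-online" was the fast option; "pplx-70b-online" slower but stronger.
        "model": "pplx-7b-online",
        "messages": [
            {"role": "user", "content": "What happened in AI news today?"}
        ],
    },
    timeout=60,
)
# OpenAI-compatible response shape: the answer is grounded in fresh web results.
print(resp.json()["choices"][0]["message"]["content"])
```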
[00:53:05.300 --> 00:53:08.220] - And you are, I think, the only provider right now [00:53:08.220 --> 00:53:10.260] with online access or whatever. [00:53:10.260 --> 00:53:14.220] But also Grok has access to Twitter, [00:53:14.220 --> 00:53:15.140] which you don't have. [00:53:15.140 --> 00:53:16.940] And they will release an API at some point. [00:53:16.940 --> 00:53:20.980] - If they release it, we'll be happy to use it. [00:53:20.980 --> 00:53:23.980] Our goal is to just give accurate answers on the web. [00:53:23.980 --> 00:53:26.040] And Twitter is just one part of the web. [00:53:26.040 --> 00:53:29.000] Their vision is like Twitter is the everything app. [00:53:29.000 --> 00:53:31.580] We believe there's information out there [00:53:31.580 --> 00:53:34.580] that exists outside of Twitter that's also super valuable. [00:53:34.580 --> 00:53:36.340] In fact, you can even make an argument [00:53:36.340 --> 00:53:38.700] that information outside Twitter [00:53:38.700 --> 00:53:40.380] may even be a lot more valuable [00:53:40.380 --> 00:53:41.900] than information within Twitter, [00:53:41.900 --> 00:53:44.660] because most of the links that get shared on Twitter [00:53:44.660 --> 00:53:46.300] are all from outside anyway. [00:53:46.300 --> 00:53:48.100] So the only thing you miss out on [00:53:48.100 --> 00:53:50.460] is a specific person's opinion. [00:53:50.460 --> 00:53:52.660] And usually journalists pick up on that [00:53:52.660 --> 00:53:53.780] and write web articles. [00:53:53.780 --> 00:53:55.800] So it's all gonna diffuse. [00:53:55.800 --> 00:53:58.260] Good ideas usually diffuse to the rest of the web. [00:53:58.260 --> 00:54:00.060] So we're not really missing out much. [00:54:00.060 --> 00:54:01.580] - It's a different source of data. [00:54:01.580 --> 00:54:03.320] - Yeah, it's a different source of data. [00:54:03.320 --> 00:54:05.780] Also, it's all about what do you want. [00:54:05.780 --> 00:54:07.380] Is your source, your citation, [00:54:07.380 --> 00:54:10.400] an already highly curated human artifact, [00:54:10.400 --> 00:54:11.900] or is it like some tweet? [00:54:11.900 --> 00:54:13.700] These are all questions worth asking. [00:54:13.700 --> 00:54:14.940] - One thing that you do show off, [00:54:14.940 --> 00:54:16.700] so I was watching you demo just now. [00:54:16.700 --> 00:54:18.340] You have sentence-by-sentence citations. [00:54:18.340 --> 00:54:19.180] - That's right, yeah. [00:54:19.180 --> 00:54:20.580] - That's a design choice. [00:54:20.580 --> 00:54:23.620] Because realistically, your source articles [00:54:23.620 --> 00:54:25.940] actually overlap the full paragraph. [00:54:25.940 --> 00:54:29.300] So why did you choose to impose sentence by sentence? [00:54:29.300 --> 00:54:30.660] - That's how we write papers. [00:54:30.660 --> 00:54:31.980] I'm an academic. [00:54:31.980 --> 00:54:33.500] Every sentence you write in a paper [00:54:33.500 --> 00:54:35.300] needs to have a corresponding citation. [00:54:35.300 --> 00:54:37.060] - As a user, it can be confusing. [00:54:37.060 --> 00:54:38.660] Like when I click that link, [00:54:38.660 --> 00:54:40.500] maybe it's like the third paragraph. [00:54:40.500 --> 00:54:41.320] - That's right. [00:54:41.320 --> 00:54:43.740] We can do better in like exactly navigating you [00:54:43.740 --> 00:54:45.220] to the right part of the link. [00:54:45.220 --> 00:54:46.380] But we're looking into all that. [00:54:46.380 --> 00:54:47.380] - Yeah, of course.
[00:54:47.380 --> 00:54:49.500] I mean, I do see you as like a search engine first [00:54:49.500 --> 00:54:51.340] with a very good language model team. [00:54:51.340 --> 00:54:52.500] - That's right, yeah. [00:54:52.500 --> 00:54:53.320] - Right? [00:54:53.320 --> 00:54:54.160] - Yeah, answer engine. [00:54:54.160 --> 00:54:54.980] I would call it an answer engine. [00:54:54.980 --> 00:54:55.820] - Answer engine. [00:54:55.820 --> 00:54:57.940] You are doing a really good job with that. [00:54:57.940 --> 00:55:00.300] I also noticed in your PPLX blog post [00:55:00.300 --> 00:55:02.220] that you also talked about the FreshLLMs paper. [00:55:02.220 --> 00:55:03.060] - That's right. [00:55:03.060 --> 00:55:03.900] - Maybe could you introduce that, [00:55:03.900 --> 00:55:05.980] and did you talk to the authors? [00:55:05.980 --> 00:55:07.780] Are they here at NeurIPS? [00:55:07.780 --> 00:55:10.060] - I did not talk to the authors. [00:55:10.060 --> 00:55:14.020] It's not like we took a lot of inspiration from it, [00:55:14.020 --> 00:55:19.020] but it made sense to attribute the citation to them. [00:55:19.020 --> 00:55:21.580] - Yeah, to credit the intellectual background. [00:55:21.580 --> 00:55:24.260] What do you look for at NeurIPS, at a conference like this? [00:55:24.260 --> 00:55:27.340] - We're here for recruiting good, strong researchers [00:55:27.340 --> 00:55:29.120] to join our team, [00:55:29.120 --> 00:55:31.300] especially if they're more focused on shipping models [00:55:31.300 --> 00:55:34.540] to a search product used by millions of people. [00:55:34.540 --> 00:55:35.380] - Awesome. [00:55:35.380 --> 00:55:37.860] We'll talk about your hiring call to action in a bit. [00:55:37.860 --> 00:55:40.460] I'm also interested in labs, like Perplexity Labs. [00:55:40.460 --> 00:55:41.300] - Yeah. [00:55:41.300 --> 00:55:42.160] - It seems like a place for you guys [00:55:42.160 --> 00:55:43.940] to experiment with serving models. [00:55:43.940 --> 00:55:45.060] - That's right. [00:55:45.060 --> 00:55:49.460] Yeah, everybody thinks you start off as a wrapper [00:55:49.460 --> 00:55:51.900] and then one magic day you just switch over [00:55:51.900 --> 00:55:55.940] from 3.5 to your own model. [00:55:55.940 --> 00:55:57.380] That's not how it works in practice. [00:55:57.380 --> 00:55:59.860] Your GPUs crash or your nodes are not working [00:55:59.860 --> 00:56:01.780] or Kubernetes doesn't work as expected [00:56:01.780 --> 00:56:06.620] and requests are not getting the throughput required. [00:56:06.620 --> 00:56:07.900] You optimize for latency, [00:56:07.900 --> 00:56:10.680] but then you are worse on throughput, [00:56:10.680 --> 00:56:12.900] so you're not able to handle spikes in requests. [00:56:12.900 --> 00:56:15.080] So all these things can happen, right? [00:56:15.080 --> 00:56:18.180] So you only know about these if you start small [00:56:18.180 --> 00:56:20.900] and serve a playground where people come [00:56:20.900 --> 00:56:24.400] and test your own infrastructure and see how it holds up, [00:56:24.400 --> 00:56:25.660] and then take the lessons from there [00:56:25.660 --> 00:56:28.100] and use them to serve it in production, right? [00:56:28.100 --> 00:56:30.340] So Labs is sort of our playground [00:56:30.340 --> 00:56:33.860] for testing open-source models and our in-house models [00:56:33.860 --> 00:56:34.700] that have been fine-tuned [00:56:34.700 --> 00:56:37.620] for factual accuracy and helpfulness.
[00:56:37.620 --> 00:56:41.020] And it's a nice way for people to test open-source models [00:56:41.020 --> 00:56:42.780] if they're curious about it, [00:56:42.780 --> 00:56:43.860] especially if they think about them [00:56:43.860 --> 00:56:45.860] as alternatives to ChatGPT. [00:56:45.860 --> 00:56:47.240] And then it's also a nice way for us [00:56:47.240 --> 00:56:49.700] to battle-test our infrastructure. [00:56:49.700 --> 00:56:51.540] Same thing goes for the API. [00:56:51.540 --> 00:56:53.180] It's not like I believe these APIs [00:56:53.180 --> 00:56:57.260] are gonna take over the GPT-3.5 API or something, [00:56:57.260 --> 00:56:58.820] but it's a nice way for developers [00:56:58.820 --> 00:57:01.100] who want an alternative to explore, [00:57:01.100 --> 00:57:03.860] especially those who wanna use faster, smaller models, [00:57:03.860 --> 00:57:05.220] like the 7B models. [00:57:05.220 --> 00:57:07.060] And it's also a good way for us to know [00:57:07.060 --> 00:57:09.620] how we can handle search requests and things like that. [00:57:09.620 --> 00:57:10.460] - Yeah. [00:57:10.460 --> 00:57:12.540] I mean, so I wanna push back on this. [00:57:12.540 --> 00:57:15.060] You said your playground is a way to battle-test, [00:57:15.060 --> 00:57:17.100] but I think you would probably get orders [00:57:17.100 --> 00:57:19.660] of magnitude more traffic on your main app [00:57:19.660 --> 00:57:20.900] than your side app. [00:57:20.900 --> 00:57:23.300] - Look, we can't just directly ship to the main app, right? [00:57:23.300 --> 00:57:24.700] And you can never simulate-- [00:57:24.700 --> 00:57:25.660] - It's like a staging environment. [00:57:25.660 --> 00:57:26.980] - Yeah, it's like a staging environment. [00:57:26.980 --> 00:57:29.300] But it's not just meant to be staging. [00:57:29.300 --> 00:57:31.980] I don't wanna downplay the importance of Labs. [00:57:31.980 --> 00:57:34.140] Labs is sort of one of the few places [00:57:34.140 --> 00:57:36.740] on the internet today for you to go and explore [00:57:36.740 --> 00:57:38.620] and compare different open-source models. [00:57:38.620 --> 00:57:42.460] And it also tells the user how fast our inference is. [00:57:42.460 --> 00:57:45.060] We give you all the metrics like tokens per second, [00:57:45.060 --> 00:57:46.380] the time to first token. [00:57:46.380 --> 00:57:48.500] It's also a very transparent way to communicate [00:57:48.500 --> 00:57:50.180] the speed of our infrastructure, [00:57:50.180 --> 00:57:52.780] which helps us also recruit good talent for infrastructure. [00:57:52.780 --> 00:57:55.220] - Yeah, but I think you're pretty opinionated [00:57:55.220 --> 00:57:56.980] that you are an app company first. [00:57:56.980 --> 00:57:57.820] - Yeah, we are an app company. [00:57:57.820 --> 00:57:58.660] - You're not an inference company. [00:57:58.660 --> 00:57:59.480] - That's right. [00:57:59.480 --> 00:58:00.320] - You just happen to have-- [00:58:00.320 --> 00:58:02.140] - We're not competing with Together AI or-- [00:58:02.140 --> 00:58:02.980] - Fireworks. [00:58:02.980 --> 00:58:04.540] - Fireworks or like OctoML. [00:58:04.540 --> 00:58:05.380] - Yeah. [00:58:05.380 --> 00:58:06.380] - You know, there are like too many of them actually, [00:58:06.380 --> 00:58:08.300] honestly, and-- [00:58:08.300 --> 00:58:09.380] - What do you think they need to do to win, [00:58:09.380 --> 00:58:10.620] as an objective third party?
[00:58:10.620 --> 00:58:13.100] - I think they need to raise an insane amount of capital [00:58:13.100 --> 00:58:16.660] and subsidize the cost so much and capture the market, [00:58:16.660 --> 00:58:18.300] or it's basically gonna be impossible [00:58:18.300 --> 00:58:20.660] because you're all offering the same thing, more or less. [00:58:20.660 --> 00:58:22.860] And NVIDIA's basically commoditizing it, right? [00:58:22.860 --> 00:58:26.780] Like with TRT-LM and like Megatron and things like that. [00:58:26.780 --> 00:58:30.460] So most people's stacks are gonna get standardized. [00:58:30.460 --> 00:58:32.060] So then why am I paying you? [00:58:32.060 --> 00:58:33.620] I'm paying you for the GPUs then. [00:58:33.620 --> 00:58:34.460] - Right. [00:58:34.460 --> 00:58:35.500] - That's the game you can only pay it, [00:58:35.500 --> 00:58:37.460] like it's an economy of scale thing. [00:58:37.460 --> 00:58:39.780] - Which you're also buying your own GPUs [00:58:39.780 --> 00:58:40.700] and running your own stack. [00:58:40.700 --> 00:58:41.540] - That's right, that's right. [00:58:41.540 --> 00:58:43.660] But we care about buying GPUs to serve our own product [00:58:43.660 --> 00:58:45.820] more than helping other people serve their products. [00:58:45.820 --> 00:58:48.180] - Yeah, what have you learned being like a, [00:58:48.180 --> 00:58:50.220] I don't know, I feel like you're both an infra CEO [00:58:50.220 --> 00:58:52.940] and an application sort of product CEO. [00:58:52.940 --> 00:58:53.780] How do you balance that? [00:58:53.780 --> 00:58:56.020] - Yeah, it's difficult, but you know, [00:58:56.020 --> 00:58:58.180] one thing exists in service of the other, right? [00:58:58.180 --> 00:59:00.140] Infrastructure exists in service of the product. [00:59:00.140 --> 00:59:01.540] You always have to remember that. [00:59:01.540 --> 00:59:02.820] For some people, product exists [00:59:02.820 --> 00:59:05.060] in service of the infrastructure. [00:59:05.060 --> 00:59:06.140] That's not how we are. [00:59:06.140 --> 00:59:08.820] - What does Perplexity become a year, two years from now? [00:59:08.820 --> 00:59:09.980] - Hopefully like a lot more people [00:59:09.980 --> 00:59:12.500] start using it as a Google replacement. [00:59:12.500 --> 00:59:13.340] - I see. [00:59:13.340 --> 00:59:14.660] You already, I read some stats somewhere, [00:59:14.660 --> 00:59:15.860] you're 10% of Bing traffic. [00:59:15.860 --> 00:59:17.980] - I don't know about that, but. [00:59:17.980 --> 00:59:19.300] - Someone was measuring like a third party, [00:59:19.300 --> 00:59:20.620] like similar type of thing. [00:59:20.620 --> 00:59:23.580] - Yeah, maybe for, actually for Bing chat, [00:59:23.580 --> 00:59:26.300] we might be even further ahead. [00:59:26.300 --> 00:59:27.140] - Okay. [00:59:27.140 --> 00:59:29.220] - Like there's just Perplexity versus Bing chat, [00:59:29.220 --> 00:59:32.340] not Bing.com, which is crazy given that [00:59:32.340 --> 00:59:34.120] they have so much distribution, right? [00:59:34.120 --> 00:59:34.960] - Oh yeah. [00:59:34.960 --> 00:59:35.780] - And marketing power. [00:59:35.780 --> 00:59:37.780] - But you are more AI native than they are? [00:59:37.780 --> 00:59:38.620] - That's right. [00:59:38.620 --> 00:59:39.440] - In a sense. [00:59:39.440 --> 00:59:40.280] - That's right. [00:59:40.280 --> 00:59:41.100] - You are a different search index. [00:59:41.100 --> 00:59:41.940] Like you have your own crawlers and everything. 
[00:59:41.940 --> 00:59:43.820] - Yeah, we have our own crawlers and indexes, yeah. [00:59:43.820 --> 00:59:45.300] - So like if I don't want Bing, [00:59:45.300 --> 00:59:47.340] then I use your stuff and maybe you turn-- [00:59:47.340 --> 00:59:48.420] - That's right, yeah, that's right. [00:59:48.420 --> 00:59:49.260] - That's cool. [00:59:49.260 --> 00:59:50.080] So what are you hiring? [00:59:50.080 --> 00:59:50.920] What are you looking to hire? [00:59:50.920 --> 00:59:52.860] What should people demonstrate when joining you? [00:59:52.860 --> 00:59:54.460] I think you have a very strong perspective [00:59:54.460 --> 00:59:55.820] on the kind of culture that you're building. [00:59:55.820 --> 00:59:57.280] - Yeah, I mean, we work pretty hard [00:59:57.280 --> 00:59:59.860] and like we wanna get stuff done fast. [00:59:59.860 --> 01:00:03.420] So if you enjoy like fast shipping cycles and-- [01:00:03.420 --> 01:00:04.860] - Can you give illustrations? [01:00:04.860 --> 01:00:06.340] Like what do you mean by that? [01:00:06.340 --> 01:00:07.540] - You know, every two weeks, [01:00:07.540 --> 01:00:09.540] like we have some announcements we make. [01:00:09.540 --> 01:00:12.860] So we work on very clear, precise projects [01:00:12.860 --> 01:00:14.620] that have like clear deliverables [01:00:14.620 --> 01:00:17.820] and we kind of constantly wanna keep improving the product. [01:00:17.820 --> 01:00:20.740] So as a machine learning research engineer, [01:00:20.740 --> 01:00:22.720] if you're excited about like training models [01:00:22.720 --> 01:00:24.340] and shipping them to production [01:00:24.340 --> 01:00:27.340] for such a useful use case like consumer search [01:00:27.340 --> 01:00:31.340] and wanna do it at the same velocity as us, [01:00:31.340 --> 01:00:33.100] like a startup rather than a big company [01:00:33.100 --> 01:00:34.300] that has to wait for several months [01:00:34.300 --> 01:00:35.620] to get something into production, [01:00:35.620 --> 01:00:38.360] that's a unique spot like to be in, right? [01:00:38.360 --> 01:00:40.820] And you also wanna be part of a growing exponential [01:00:40.820 --> 01:00:42.460] rather than something that's trying [01:00:42.460 --> 01:00:44.780] to defend its territory, right? [01:00:44.780 --> 01:00:45.700] - Defend its territory? [01:00:45.700 --> 01:00:46.740] - Yeah, like Google. [01:00:46.740 --> 01:00:48.220] Google's defending. - I see, yeah, yeah, yeah. [01:00:48.220 --> 01:00:49.860] - So they attack. [01:00:49.860 --> 01:00:50.940] So you wanna be an attacking team. [01:00:50.940 --> 01:00:52.740] - Have you heard, like what does Google say about you? [01:00:52.740 --> 01:00:54.260] Like are they interested in buying you? [01:00:54.260 --> 01:00:57.380] - I think they're being pretty appreciative [01:00:57.380 --> 01:00:59.140] and respectful of the product, right? [01:00:59.140 --> 01:01:02.260] - But like SGE is not great for some reason. [01:01:02.260 --> 01:01:03.100] - Yeah. [01:01:03.100 --> 01:01:07.140] By the way, I don't think Google people are not talented. [01:01:07.140 --> 01:01:09.900] Like they're probably more talented than we are. [01:01:09.900 --> 01:01:14.700] I think it's just that their incentives are not clear [01:01:14.700 --> 01:01:16.740] and they might have to cannibalize [01:01:16.740 --> 01:01:18.900] their own business model to like-- [01:01:18.900 --> 01:01:20.820] - This is the classic innovators dilemma, right? [01:01:20.820 --> 01:01:21.660] - Exactly, yeah. 
[01:01:21.660 --> 01:01:24.580] - They have a cash cow and they're trying to preserve that. [01:01:24.580 --> 01:01:28.660] You don't have ads, but you're serving subscriptions [01:01:28.660 --> 01:01:30.140] and that's the main business model for now. [01:01:30.140 --> 01:01:30.980] - As of today, yeah. [01:01:30.980 --> 01:01:31.900] - Yeah, that's it. [01:01:31.900 --> 01:01:33.560] Well, thank you very much. - Cool. [01:01:33.560 --> 01:01:35.300] - All the people that we talked to so far [01:01:35.300 --> 01:01:36.580] and some of the best founders I know, [01:01:36.580 --> 01:01:39.500] whether or not they're in AI, are fierce nerds. [01:01:39.500 --> 01:01:41.460] And our event definitely reminds me [01:01:41.460 --> 01:01:43.540] of the fierce nerds concept. [01:01:43.540 --> 01:01:46.020] But I don't think I'm the best person to tell that story. [01:01:46.020 --> 01:01:48.540] Maybe I'll tag in Sean Puri. [01:01:48.540 --> 01:01:50.640] - Yeah, have you ever read that Paul Graham blog post [01:01:50.640 --> 01:01:52.140] called "Fierce Nerds"? [01:01:52.140 --> 01:01:53.180] - No, what is it? [01:01:53.180 --> 01:01:54.020] - It's an amazing post. [01:01:54.020 --> 01:01:55.500] I'm gonna read you a couple pieces of it, [01:01:55.500 --> 01:01:57.700] but it's one of those like, Paul Graham, I think is, [01:01:57.700 --> 01:01:59.060] somebody said this earlier, they go, [01:01:59.060 --> 01:02:00.020] what's that guy, Andrew Tate? [01:02:00.020 --> 01:02:01.420] They started some tweet that was really funny. [01:02:01.420 --> 01:02:04.300] It was, "Paul Graham was my Andrew Tate growing up." [01:02:04.300 --> 01:02:06.260] - Same. - Which is just like, [01:02:06.260 --> 01:02:07.100] so funny. [01:02:07.100 --> 01:02:08.880] It's such a funny, it's such a deep cut joke, [01:02:08.880 --> 01:02:11.540] but if you get it, you're like, it just hits the spot. [01:02:11.540 --> 01:02:13.060] All right, so he wrote this post and he goes, [01:02:13.060 --> 01:02:15.500] "Most people think of nerds as quiet, [01:02:15.500 --> 01:02:17.000] "sort of like diffident people, right? [01:02:17.000 --> 01:02:18.700] "Just sort of like passive. [01:02:18.700 --> 01:02:20.520] "And in most social situations, they are. [01:02:20.520 --> 01:02:23.180] "They're quiet and they're not the star quarterback [01:02:23.180 --> 01:02:24.520] "in the middle of the gym, right? [01:02:24.520 --> 01:02:25.580] "They're kind of a fish out of water [01:02:25.580 --> 01:02:27.100] "in a bunch of different things." [01:02:27.100 --> 01:02:28.680] He goes, "But this is an illusion [01:02:28.680 --> 01:02:32.000] "because that only happens when non-nerds observe them [01:02:32.000 --> 01:02:34.960] "'cause they're observing them in non-nerdy situations. [01:02:34.960 --> 01:02:36.900] "So you see a nerd at prom, [01:02:36.900 --> 01:02:39.940] "you just see them as a quiet sort of passive nerd. [01:02:39.940 --> 01:02:41.100] "There's no alpha in them. [01:02:41.100 --> 01:02:44.160] "But in fact, some nerds are quite fierce. [01:02:44.160 --> 01:02:46.660] "Fierce nerds are a small but interesting group. 
[01:02:46.660 --> 01:02:48.380] "They are extremely competitive, [01:02:48.380 --> 01:02:49.840] "more competitive, I would say, [01:02:49.840 --> 01:02:51.540] "than competitive non-nerds [01:02:51.540 --> 01:02:54.200] "because the competition is more personal to them, [01:02:54.200 --> 01:02:56.780] "partly because they're not emotionally mature [01:02:56.780 --> 01:02:58.320] "enough to distance themselves from it, [01:02:58.320 --> 01:03:00.600] "but also because there's less randomness [01:03:00.600 --> 01:03:02.520] "in the types of competition that they engage in. [01:03:02.520 --> 01:03:06.380] "Therefore, they're justified in making it more personal." [01:03:06.380 --> 01:03:07.200] - I'll cut it off there. [01:03:07.200 --> 01:03:09.740] That's a clip from the "My First Million" podcast. [01:03:09.740 --> 01:03:11.420] And that's a story about how Dharmesh Shah, [01:03:11.420 --> 01:03:13.500] the HubSpot CTO, is a fierce nerd. [01:03:13.500 --> 01:03:15.420] And I really like that concept [01:03:15.420 --> 01:03:18.500] because, one, it helps to validate that nerds can also win [01:03:18.500 --> 01:03:21.980] and why nerds can sometimes win more than regular people. [01:03:21.980 --> 01:03:24.660] And obviously, for more, you can read that Paul Graham essay. [01:03:24.660 --> 01:03:27.000] But I think Arvind is a fierce nerd, [01:03:27.000 --> 01:03:29.340] and I think Perplexity is a fierce nerd company. [01:03:29.340 --> 01:03:31.340] They do have competition, though. [01:03:31.340 --> 01:03:33.300] It's not like Perplexity is the only company [01:03:33.300 --> 01:03:34.460] going after Google, [01:03:34.460 --> 01:03:36.420] not the only company going after search. [01:03:36.420 --> 01:03:37.540] One of my favorite parts [01:03:37.540 --> 01:03:40.100] in compiling these ensemble episodes [01:03:40.100 --> 01:03:43.460] is juxtaposing two competitors next to each other [01:03:43.460 --> 01:03:46.300] or people who disagree or have different worldviews. [01:03:46.300 --> 01:03:47.820] Like, you just heard Perplexity. [01:03:47.820 --> 01:03:50.580] You just heard Arvind dunk on all the infrastructure companies [01:03:50.580 --> 01:03:52.500] including Fireworks, which we just had on. [01:03:52.500 --> 01:03:53.780] Now, I'm not the right person [01:03:53.780 --> 01:03:55.460] to tell you who's right and who's wrong, [01:03:55.460 --> 01:03:57.780] but I know for a fact that they cannot all be right, [01:03:57.780 --> 01:03:59.020] and that's what's fascinating. [01:03:59.020 --> 01:04:00.020] That's what makes the market. [01:04:00.020 --> 01:04:02.180] So next, in full disclosure, is a personal friend of mine. [01:04:02.180 --> 01:04:04.300] It's Will Bryk from Metaphor Systems. [01:04:04.300 --> 01:04:06.620] Metaphor launched end of 2022 [01:04:06.620 --> 01:04:09.180] with an AI search engine narrative as well, [01:04:09.180 --> 01:04:12.180] but their approach is more of a pre-trained [01:04:12.180 --> 01:04:16.620] LLM research engine as opposed to Arvind's answer engine. [01:04:16.620 --> 01:04:18.740] These are all very minor differences in the end. [01:04:18.740 --> 01:04:20.740] At the end of the day, people want to punch in a query [01:04:20.740 --> 01:04:23.380] and get results, and Metaphor's approach is different.
[01:04:23.380 --> 01:04:25.400] They are going after the infrastructure play [01:04:25.400 --> 01:04:27.540] rather than the application plus infrastructure play, [01:04:27.540 --> 01:04:29.500] and it's just nice to contrast them together, [01:04:29.500 --> 01:04:31.580] and I'll leave the conclusions to you. [01:04:31.580 --> 01:04:32.660] What is Metaphor? [01:04:32.660 --> 01:04:34.980] - Metaphor is a search engine over the internet, [01:04:34.980 --> 01:04:38.620] but it's better than Google at handling complex queries. [01:04:38.620 --> 01:04:40.220] - Okay, why is that? [01:04:40.220 --> 01:04:41.040] - Why is that? [01:04:41.040 --> 01:04:44.180] Because we train a search algorithm from scratch [01:04:44.180 --> 01:04:46.420] to handle complex queries, basically. [01:04:46.420 --> 01:04:47.660] It's a totally different algorithm, yeah. [01:04:47.660 --> 01:04:49.060] - Why are you at NeurIPS? [01:04:49.060 --> 01:04:51.580] - I'm at NeurIPS because we want to learn [01:04:51.580 --> 01:04:52.700] about all the cool things people are working on, [01:04:52.700 --> 01:04:54.780] and also because we want to hire some crazy, [01:04:54.780 --> 01:04:57.180] good researchers to help build the future of search. [01:04:57.180 --> 01:04:58.240] - Metaphor has a search engine. [01:04:58.240 --> 01:04:59.820] That's what you launched last year, [01:04:59.820 --> 01:05:01.700] and then you also released an API, [01:05:01.700 --> 01:05:02.980] and I've actually been using the API. [01:05:02.980 --> 01:05:06.260] It's actually really good for augmenting LLMs with search. [01:05:06.260 --> 01:05:08.540] I don't know how much to which you want to lean [01:05:08.540 --> 01:05:11.220] being an app versus an infrastructure company. [01:05:11.220 --> 01:05:14.100] - Yeah, so we're leaning towards search infrastructure, [01:05:14.100 --> 01:05:15.660] so we really see ourselves as like, [01:05:15.660 --> 01:05:17.820] we want people to build applications on top of us. [01:05:17.820 --> 01:05:20.180] We see the future as everyone will use LLMs [01:05:20.180 --> 01:05:21.020] as the interface to everything, [01:05:21.020 --> 01:05:23.940] and we want to be powering the search to underlies that. [01:05:23.940 --> 01:05:26.100] I think we want people to build really cool UIs [01:05:26.100 --> 01:05:27.900] on top of our search, but the hard part, [01:05:27.900 --> 01:05:29.180] and the thing that we're focusing on, [01:05:29.180 --> 01:05:30.500] is really good search results. [01:05:30.500 --> 01:05:32.200] - Yeah, can you give examples? [01:05:32.200 --> 01:05:34.060] You have some really cool examples, [01:05:34.060 --> 01:05:36.660] like tweets, and books, and PDFs, and stuff. [01:05:36.660 --> 01:05:38.900] - People really get excited about researchers [01:05:38.900 --> 01:05:40.380] working on something similar to them [01:05:40.380 --> 01:05:42.220] in the Bay Area, or something like that. [01:05:42.220 --> 01:05:43.060] People have actually met-- [01:05:43.060 --> 01:05:43.880] - Oh, yeah, yeah, yeah. [01:05:43.880 --> 01:05:45.300] Competitive Intel research as well. [01:05:45.300 --> 01:05:47.580] - People have met people in real life based on searches, [01:05:47.580 --> 01:05:49.740] because the results are so high quality, [01:05:49.740 --> 01:05:52.180] and they're not SEO spammed in any way. 
[01:05:52.180 --> 01:05:54.380] It's just exactly what you're asking for, [01:05:54.380 --> 01:05:57.220] that it's cool to see that digital information [01:05:57.220 --> 01:06:00.340] to real-world interaction thing happen. [01:06:00.340 --> 01:06:03.340] - I actually also interviewed Arvind from Perplexity, [01:06:03.340 --> 01:06:07.140] who I feel like is also in that search domain, [01:06:07.140 --> 01:06:09.460] but he's less focused on search infrastructure, [01:06:09.460 --> 01:06:11.460] he's more focused on just being a search engine. [01:06:11.460 --> 01:06:13.460] I don't know if you've compared yourself [01:06:13.460 --> 01:06:15.260] to Perplexity in that way. [01:06:15.260 --> 01:06:16.660] - Yeah, I know, we get asked this a lot. [01:06:16.660 --> 01:06:18.140] I mean, Perplexity is doing a great job [01:06:18.140 --> 01:06:20.980] at combining LLMs with search results, [01:06:20.980 --> 01:06:23.900] and that does make for a better search engine. [01:06:23.900 --> 01:06:26.660] That is the future of the user interaction, [01:06:26.660 --> 01:06:29.740] but we're just more focused on the search results themselves [01:06:29.740 --> 01:06:31.460] and really trying to handle the queries [01:06:31.460 --> 01:06:33.540] that Google and Bing are not good at. [01:06:33.540 --> 01:06:34.380] - Yeah. [01:06:34.380 --> 01:06:37.420] - So, I mean, we want people to build LLM-style interactions [01:06:37.420 --> 01:06:39.020] on top of our thing as well. [01:06:39.020 --> 01:06:40.700] - Wait, so you say Google and Bing are not good at it. [01:06:40.700 --> 01:06:42.060] Do you think that people will use you [01:06:42.060 --> 01:06:44.220] in complement to Google and Bing, [01:06:44.220 --> 01:06:46.140] or do you just completely replace that? [01:06:46.140 --> 01:06:46.980] - At least in the beginning. [01:06:46.980 --> 01:06:48.860] Like, we're gonna be used in places [01:06:48.860 --> 01:06:51.140] where Google and Bing don't work well. [01:06:51.140 --> 01:06:53.980] So, I mean, if your application wants to know the weather, [01:06:53.980 --> 01:06:55.980] or wants to know that Taylor Swift song, [01:06:55.980 --> 01:06:57.300] basically, if your application knows [01:06:57.300 --> 01:06:58.780] the right keywords to search with, [01:06:58.780 --> 01:07:00.820] then sure, Google and Bing are gonna be fine for you. [01:07:00.820 --> 01:07:03.220] But if you want to make these complex, [01:07:03.220 --> 01:07:06.620] almost metaphorical queries with natural language, [01:07:06.620 --> 01:07:08.000] which are really the most powerful ones, [01:07:08.000 --> 01:07:09.420] then you should be using Metaphor. [01:07:09.420 --> 01:07:10.260] - Yeah, yeah. [01:07:10.260 --> 01:07:11.420] I was actually walking from your, [01:07:11.420 --> 01:07:13.500] we were walking from your sushi party [01:07:13.500 --> 01:07:16.220] that you just had, like, it's like a recruiting event. [01:07:16.220 --> 01:07:17.060] - I hope the food was good. [01:07:17.060 --> 01:07:18.060] - It was pretty good, it was pretty good. [01:07:18.060 --> 01:07:19.660] I love me a little bit of sushi. 
[01:07:19.660 --> 01:07:20.820] And I was actually talking to people [01:07:20.820 --> 01:07:22.840] about your auto-prompting feature, [01:07:22.840 --> 01:07:24.460] 'cause a lot of people, [01:07:24.460 --> 01:07:26.380] I was, there was someone from Midjourney there, [01:07:26.380 --> 01:07:28.460] and they were saying how DALL-E 3 [01:07:28.460 --> 01:07:30.060] also does sort of auto-prompting, [01:07:30.060 --> 01:07:31.420] or rewriting of the prompts. [01:07:31.420 --> 01:07:32.260] - Yeah. [01:07:32.260 --> 01:07:33.300] - Is there an art to auto-prompting? [01:07:33.300 --> 01:07:34.140] How do you feel that? [01:07:34.140 --> 01:07:35.300] How do you feel about your auto-prompting feature, [01:07:35.300 --> 01:07:36.140] basically? [01:07:36.140 --> 01:07:37.500] - Yeah, auto-prompt is like, we convert, [01:07:37.500 --> 01:07:38.780] we use ChatGPT, basically, [01:07:38.780 --> 01:07:41.340] to convert the queries that come into the search engine [01:07:41.340 --> 01:07:44.860] into queries that are formatted for Metaphor's models. [01:07:44.860 --> 01:07:48.820] Because Metaphor is trained to predict links given text, [01:07:48.820 --> 01:07:52.360] so the model really, like, the best way to prompt Metaphor [01:07:52.360 --> 01:07:55.460] is to search in a way where a link naturally follows, [01:07:55.460 --> 01:07:56.380] which can be confusing, [01:07:56.380 --> 01:07:57.380] so we have this auto-prompt [01:07:57.380 --> 01:07:58.420] that converts into the right format. [01:07:58.420 --> 01:07:59.260] - Yeah. [01:07:59.260 --> 01:08:00.540] - You can kind of think of Metaphor as in the same state [01:08:00.540 --> 01:08:02.660] as, like, what GPT-3 was in. [01:08:02.660 --> 01:08:04.040] I don't know if you guys remember, but, [01:08:04.040 --> 01:08:04.880] or if you remember-- [01:08:04.880 --> 01:08:05.720] - It's not instruction tuned, yeah. [01:08:05.720 --> 01:08:07.780] - Yeah, it's like, you know, two years ago, [01:08:07.780 --> 01:08:08.900] GPT-3 was auto-complete, [01:08:08.900 --> 01:08:10.100] so you had to, like, prompt it [01:08:10.100 --> 01:08:11.860] in order to get the best output from it. [01:08:11.860 --> 01:08:12.740] It had a lot of power, [01:08:12.740 --> 01:08:14.060] but it just had this weird user interface. [01:08:14.060 --> 01:08:15.740] Metaphor's in a similar situation. [01:08:15.740 --> 01:08:17.600] The problem is when you RLHF, you, like, [01:08:17.600 --> 01:08:18.780] and we've tried this, like, [01:08:18.780 --> 01:08:20.700] it does reduce the power of the model, [01:08:20.700 --> 01:08:23.340] and, like, it's just okay to, [01:08:23.340 --> 01:08:25.820] because, like, often we're using this auto-prompt, [01:08:25.820 --> 01:08:28.180] like, it's okay to keep this model the way it is, [01:08:28.180 --> 01:08:31.020] requiring this auto-complete type of search. [01:08:31.020 --> 01:08:33.420] - And, yeah, would you call yourself a search LLM? [01:08:33.420 --> 01:08:35.000] Like, very, very long ago, [01:08:35.000 --> 01:08:36.140] the original pitch for Metaphor [01:08:36.140 --> 01:08:37.660] that I heard from you guys was, [01:08:37.660 --> 01:08:41.460] you're an LLM that predicts links instead of tokens. [01:08:41.460 --> 01:08:42.580] - Oh, well, an LLM is, yeah, [01:08:42.580 --> 01:08:45.540] I mean, LLM is, like, it's modeling, like, [01:08:45.540 --> 01:08:46.660] yeah, usually, like, language, [01:08:46.660 --> 01:08:47.500] and we're not really, [01:08:47.500 --> 01:08:49.060] we're not exactly generating the links.
[01:08:49.060 --> 01:08:50.460] We're, we search over an index. [01:08:50.460 --> 01:08:51.820] - Yeah, yeah, they're not hallucinated at all, right? [01:08:51.820 --> 01:08:52.820] They're actually from an index. [01:08:52.820 --> 01:08:54.580] - Yeah, I wouldn't call it a search LLM. [01:08:54.580 --> 01:08:55.620] - Okay. - It's more like a, [01:08:55.620 --> 01:08:56.620] really, a search engine. [01:08:56.620 --> 01:08:57.460] - Search engine. - You might even think [01:08:57.460 --> 01:08:58.540] of it as a research engine. [01:08:58.540 --> 01:08:59.380] - Yeah. - And there are a lot [01:08:59.380 --> 01:09:00.580] of different ways we're trying to explain it. [01:09:00.580 --> 01:09:02.500] I mean, I think we're, like, using terms [01:09:02.500 --> 01:09:06.100] that were developed in an old era for a new type of thing, [01:09:06.100 --> 01:09:08.060] so we might have to invent new words, [01:09:08.060 --> 01:09:09.580] or wait until they are created. [01:09:09.580 --> 01:09:10.540] - Yeah, yeah. [01:09:10.540 --> 01:09:12.020] What else should people know about Metaphor in general? [01:09:12.020 --> 01:09:14.300] Like, what other interesting work are you guys doing? [01:09:14.300 --> 01:09:16.140] - I think just, like, the vision is super exciting, [01:09:16.140 --> 01:09:19.040] and I think people don't realize how exciting the vision is. [01:09:19.040 --> 01:09:21.020] Basically, the vision is to solve search. [01:09:21.020 --> 01:09:21.860] What does that mean? [01:09:21.860 --> 01:09:24.180] It means that no matter how complex the query, [01:09:24.180 --> 01:09:25.420] Metaphor should be able to handle it. [01:09:25.420 --> 01:09:28.820] So we're talking, like, AI researchers, similar to you, [01:09:28.820 --> 01:09:31.480] who are in the Bay Area, who've worked on Rust before, [01:09:31.480 --> 01:09:33.280] who went to so-and-so college, [01:09:33.280 --> 01:09:36.060] who would be a great candidate for this startup. [01:09:36.060 --> 01:09:37.980] Whatever it is, we should be able to handle it, [01:09:37.980 --> 01:09:40.180] and language models are powerful enough [01:09:40.180 --> 01:09:42.500] to understand language at the level of a human. [01:09:42.500 --> 01:09:43.780] So you should theoretically be able [01:09:43.780 --> 01:09:44.980] to make a system like this. [01:09:44.980 --> 01:09:46.500] It's just a matter of how fast can it be. [01:09:46.500 --> 01:09:47.340] - Yeah. - And we wanna make [01:09:47.340 --> 01:09:49.700] these things, like, do all those complex queries really fast. [01:09:49.700 --> 01:09:51.620] And imagine if you could do this, [01:09:51.620 --> 01:09:54.180] imagine if this was, like, possible, [01:09:54.180 --> 01:09:56.100] and then you combine that with, like, [01:09:56.100 --> 01:09:57.740] you know, GPT-4, GPT-5, [01:09:57.740 --> 01:10:00.220] and that's how we want our customers to combine us, [01:10:00.220 --> 01:10:01.740] you know, combine us with GPT-4, GPT-5. [01:10:01.740 --> 01:10:02.860] Suddenly, now, you have the ability [01:10:02.860 --> 01:10:05.380] to literally answer any information query, [01:10:05.380 --> 01:10:07.060] no matter how complex. [01:10:07.060 --> 01:10:08.380] That, like, the entire world's knowledge [01:10:08.380 --> 01:10:09.380] is at your fingertips. [01:10:09.380 --> 01:10:10.860] That's, like, insane. (laughing) [01:10:10.860 --> 01:10:12.620] Like, we've basically become all-knowing. [01:10:12.620 --> 01:10:13.980] - Yeah. - You know, omnipotent. 
[01:10:13.980 --> 01:10:15.260] - Yeah, that's-- - Omniscient. [01:10:15.260 --> 01:10:16.140] (laughing) [01:10:16.140 --> 01:10:17.700] - Omniscient, and then omnipotent. [01:10:17.700 --> 01:10:18.540] - Omniscient, now-- - Knowledge is power, right? [01:10:18.540 --> 01:10:19.700] - Right, sorry, I skipped a step. [01:10:19.700 --> 01:10:22.500] - No, no, no, yeah, I can do that sort of QED proof [01:10:22.500 --> 01:10:24.700] of why omniscient equals omnipotent. [01:10:24.700 --> 01:10:26.980] I am very excited about you guys. [01:10:26.980 --> 01:10:28.980] You know, I've seen you grow literally [01:10:28.980 --> 01:10:33.340] from your living room, and it's definitely not over. [01:10:33.340 --> 01:10:37.100] What's it like having a meme-y celebrity CTO [01:10:37.100 --> 01:10:38.300] who keeps tweeting viral shit? [01:10:38.300 --> 01:10:39.140] - No, I mean, I love it. [01:10:39.140 --> 01:10:40.780] Like, Jeff literally just goes, [01:10:40.780 --> 01:10:41.820] Jeff has figured out Twitter. [01:10:41.820 --> 01:10:42.980] He just knows how to go viral, [01:10:42.980 --> 01:10:44.260] because he has really good takes, [01:10:44.260 --> 01:10:46.220] and we often throw up a party [01:10:46.220 --> 01:10:47.940] in response to his viral tweets. [01:10:47.940 --> 01:10:50.300] - So, you wanna talk about the Andrew Huberman party? [01:10:50.300 --> 01:10:51.420] - Yeah, okay, so he had a tweet [01:10:51.420 --> 01:10:53.460] that was like, Andrew Huberman has single-handedly [01:10:53.460 --> 01:10:54.820] destroyed the SF social scene, [01:10:54.820 --> 01:10:57.220] 'cause everyone, whatever, is sober at parties [01:10:57.220 --> 01:10:58.300] and goes home early. [01:10:58.300 --> 01:11:00.780] And so, of course, we had an anti-Huberman party, [01:11:00.780 --> 01:11:02.300] where everyone stayed late, [01:11:02.300 --> 01:11:05.020] and we had a bunch of beer, and everyone-- [01:11:05.020 --> 01:11:07.220] - Well, my favorite was all over the apartment [01:11:07.220 --> 01:11:08.220] that we had the party in. [01:11:08.220 --> 01:11:09.940] You plastered quotes from Andrew Huberman [01:11:09.940 --> 01:11:11.060] about how alcohol's bad for you. [01:11:11.060 --> 01:11:13.100] - Right, alcohol will destroy your brain, [01:11:13.100 --> 01:11:14.420] and all these things. [01:11:14.420 --> 01:11:16.580] I mean, look, everything in balance, right? [01:11:16.580 --> 01:11:17.560] We should have fun in life, [01:11:17.560 --> 01:11:19.460] but also, you know, be safe and everything. [01:11:19.460 --> 01:11:21.500] But, and then, he had another tweet [01:11:21.500 --> 01:11:24.380] about how he was gonna go on a date, [01:11:24.380 --> 01:11:25.340] but the girl ghosted him, [01:11:25.340 --> 01:11:28.100] and that allowed him to have focus on coding that night. [01:11:28.100 --> 01:11:30.460] So, of course, we had to have a ghosted-in-SF party, [01:11:30.460 --> 01:11:32.220] where everyone came to code together, [01:11:32.220 --> 01:11:33.900] 'cause you're already gonna be ghosted on Friday night. [01:11:33.900 --> 01:11:35.660] You might as well code together while you're at it. [01:11:35.660 --> 01:11:37.580] - Yeah, I love that part of the social scene, [01:11:37.580 --> 01:11:39.900] and I think Metaphor is also really driving that somehow. [01:11:39.900 --> 01:11:41.420] So, congrats for all you do, [01:11:41.420 --> 01:11:43.700] and it's just nice to check in with you. 
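To make the auto-prompt idea from that conversation concrete: because the underlying model is trained to predict links that follow text, a raw question tends to work better once it is rewritten as a statement that a link would naturally complete. The toy Python sketch below only illustrates that framing; in practice an LLM does the rewrite, and none of this is Metaphor's actual code.

```python
# Illustrative only: reframe a question as text that a link would naturally
# follow, which is the shape a link-prediction model expects. In practice an
# LLM performs this rewrite; a simple template stands in for that call here.

def toy_auto_prompt(query: str) -> str:
    """Hypothetical stand-in for the LLM rewrite step."""
    topic = query.strip().rstrip("?")
    return f"Here is a great article about {topic}:"

print(toy_auto_prompt("AI researchers in the Bay Area who have worked on Rust?"))
# Here is a great article about AI researchers in the Bay Area who have worked on Rust:
```

The same trick is why Will compares Metaphor to pre-instruction-tuning GPT-3: the raw model is powerful but wants completion-style input, so the rewrite layer supplies that shape on the user's behalf.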
[01:11:43.700 --> 01:11:46.980] - I've personally been enjoying the Metaphor approach [01:11:46.980 --> 01:11:49.100] to LLM search APIs. [01:11:49.100 --> 01:11:53.540] I've often said this in the context of the capabilities of GPTs. [01:11:53.540 --> 01:11:54.940] So, if you think about it, [01:11:54.940 --> 01:11:58.420] what are the capabilities of ChatGPT as it is today, [01:11:58.420 --> 01:12:00.960] as well as GPTs as announced on Dev Day, right? [01:12:00.960 --> 01:12:02.580] There's the LLM base layer, [01:12:02.580 --> 01:12:05.780] but then you tack on three core capabilities on top of it, [01:12:05.780 --> 01:12:08.060] right, one is Retrieval-Augmented Generation, [01:12:08.060 --> 01:12:10.140] where you upload files and then you do RAG on it, [01:12:10.140 --> 01:12:12.260] and second is a Code Interpreter, [01:12:12.260 --> 01:12:15.060] where you generate code in a sandbox, [01:12:15.060 --> 01:12:17.060] and then you run code and you correct code, [01:12:17.060 --> 01:12:19.020] and finally you execute it. [01:12:19.020 --> 01:12:22.660] And third is you have a search feature. [01:12:22.660 --> 01:12:23.940] And so, we have a bunch of companies [01:12:23.940 --> 01:12:25.680] competing for the RAG functionality. [01:12:25.680 --> 01:12:27.260] You can check out our episodes [01:12:27.260 --> 01:12:30.820] with Harrison of LangChain and Jerry of LlamaIndex this year. [01:12:30.820 --> 01:12:31.740] There's a bunch of companies [01:12:31.740 --> 01:12:33.660] competing for the Code Interpreter capability. [01:12:33.660 --> 01:12:34.700] There's obviously Replit, [01:12:34.700 --> 01:12:38.220] but then abstractly, there's also Deno and Val Town, [01:12:38.220 --> 01:12:41.180] and anyone who runs code is in that game, basically. [01:12:41.180 --> 01:12:44.300] But what is surprisingly uncontested is open web search, [01:12:44.300 --> 01:12:46.660] and so far, I think it's Perplexity and Metaphor [01:12:46.660 --> 01:12:49.240] that are leading the pack in their different approaches. [01:12:49.240 --> 01:12:54.120] One, the PPLX API is an integrated LLM + search API, [01:12:54.120 --> 01:12:57.860] and then two is Metaphor, which is search-only, [01:12:57.860 --> 01:13:00.580] and you kind of bring your own LLMs. [01:13:00.580 --> 01:13:02.200] For our next guest, we're actually going to go over [01:13:02.200 --> 01:13:04.580] to our last return guest, [01:13:04.580 --> 01:13:06.320] which is one of our most recent hits, [01:13:06.320 --> 01:13:08.780] which is Jeremy Howard, previously of Fast.ai, [01:13:08.780 --> 01:13:10.280] but now of Answer.ai. [01:13:10.280 --> 01:13:12.000] It seems that all people want is answers, [01:13:12.000 --> 01:13:14.600] and Jeremy doesn't have them, but he has questions. [01:13:14.600 --> 01:13:17.820] - Outside of the Decibel event recording. [01:13:17.820 --> 01:13:19.160] - I realized I had to be the interviewer, [01:13:19.160 --> 01:13:20.300] and I was like, "I probably should buy a wine." [01:13:20.300 --> 01:13:21.400] - And I had to pick a wine, [01:13:21.400 --> 01:13:24.020] and Sean told me, "Pick the most expensive one." [01:13:24.020 --> 01:13:24.860] - Yeah, it's on Decibel, anyway. [01:13:24.860 --> 01:13:26.460] - Because Decibel is paying for it. [01:13:26.460 --> 01:13:28.840] The one I'm having is from a $160 bottle, [01:13:28.840 --> 01:13:29.760] and it's really good. [01:13:29.760 --> 01:13:31.720] - And I did the same. [01:13:31.720 --> 01:13:33.100] - And I'm not having any wine.
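And to make the "search-only, bring your own LLM" contrast concrete, the pattern is roughly: call a search API, stuff the returned titles, URLs, and snippets into a prompt, and hand that prompt to whatever model you run. The sketch below is purely illustrative; search() and complete() are hypothetical placeholders, not the real Metaphor or PPLX clients.

```python
# Sketch of "search-only, bring your own LLM": search() and complete() are
# hypothetical placeholders for a real search API and a real model call.

def search(query: str, k: int = 3) -> list[dict]:
    """Stand-in for a web search API returning titles, URLs, and snippets."""
    return [
        {"title": f"Result {i} for '{query}'",
         "url": f"https://example.com/{i}",
         "snippet": "..."}
        for i in range(k)
    ]

def complete(prompt: str) -> str:
    """Stand-in for whatever LLM you bring (GPT-4, an open model, etc.)."""
    return f"(answer grounded in {prompt.count('https://')} retrieved sources)"

def answer(query: str) -> str:
    results = search(query)
    context = "\n".join(
        f"- {r['title']} ({r['url']}): {r['snippet']}" for r in results
    )
    prompt = f"Using only these sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    return complete(prompt)

print(answer("What is new in open web search APIs?"))
```

The integrated approach collapses those two calls into one endpoint, which is the basic trade-off between the two APIs mentioned above.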
[01:13:33.100 --> 01:13:35.320] (laughing) [01:13:35.320 --> 01:13:37.060] - Could we go around and identify voices [01:13:37.060 --> 01:13:38.500] for people listening? [01:13:38.500 --> 01:13:39.660] Maybe Tanishq, you want to start? [01:13:39.660 --> 01:13:41.740] - Sure, my name is Tanishq Abraham. [01:13:41.740 --> 01:13:44.120] I am the CEO of MedARC, [01:13:44.120 --> 01:13:46.620] which is a medical AI research organization. [01:13:46.620 --> 01:13:49.320] I also work as a research director at Stability AI, [01:13:49.320 --> 01:13:51.240] and I've been collaborating with Jeremy Howard [01:13:51.240 --> 01:13:54.820] for more than a year, a couple years maybe, [01:13:54.820 --> 01:13:57.020] and he's also the president of MedARC, [01:13:57.020 --> 01:13:59.700] and he's been heavily involved in my venture as well. [01:13:59.700 --> 01:14:02.020] - And you have a podcast together, which I really enjoyed. [01:14:02.020 --> 01:14:05.020] - Oh yeah, yes, Jeremy had me on his podcast, which was-- [01:14:05.020 --> 01:14:06.860] - Your first and only episode, or what the hell? [01:14:06.860 --> 01:14:07.700] - Yeah. [01:14:07.700 --> 01:14:08.820] (laughing) [01:14:08.820 --> 01:14:11.420] It turns out that maintaining a podcast is hard. [01:14:12.380 --> 01:14:15.500] - It's easy, just shove microphones in front of people. [01:14:15.500 --> 01:14:17.660] - So I'm Jeremy Howard, this is my voice, [01:14:17.660 --> 01:14:22.660] and as of today, I'm Jeremy Howard of Answer.ai, I guess. [01:14:22.660 --> 01:14:25.500] - And repeat guest on Latent Space. [01:14:25.500 --> 01:14:27.300] Your last episode did really well [01:14:27.300 --> 01:14:28.700] in terms of the number of views. [01:14:28.700 --> 01:14:30.100] - Yeah, you guys are good interviewers. [01:14:30.100 --> 01:14:32.020] - Well, also you dropped a lot of spice, [01:14:32.020 --> 01:14:34.300] which is what we like as podcasters. [01:14:34.300 --> 01:14:36.060] We also have Jess Leao on for the first time, hey. [01:14:36.060 --> 01:14:38.940] - Yes, hello, I'm Jess Leao, and I'm a partner at Decibel. [01:14:38.940 --> 01:14:40.020] Excited to be here. [01:14:40.020 --> 01:14:41.620] Excited to be providing the wine also. [01:14:41.620 --> 01:14:42.740] - Standing in for Alessio. [01:14:42.740 --> 01:14:43.580] - Oh, so good. [01:14:43.580 --> 01:14:45.980] - Alessio ditched us tonight, right? [01:14:45.980 --> 01:14:47.940] So you're the better replacement. [01:14:47.940 --> 01:14:50.500] - Yeah, it's good because in a previous conference, [01:14:50.500 --> 01:14:53.780] Alessio was wearing my badge and replacing me, [01:14:53.780 --> 01:14:55.300] so now I can be Alessio for tonight. [01:14:55.300 --> 01:14:56.140] - Well, you just work-- [01:14:56.140 --> 01:14:58.300] - A shorter version of Alessio, basically. [01:14:58.300 --> 01:14:59.540] (laughing) [01:14:59.540 --> 01:15:02.820] - So today was the Answer.ai announcement, [01:15:02.820 --> 01:15:03.980] maybe you wanna cover that? [01:15:03.980 --> 01:15:05.780] Just what should people know about it? [01:15:05.780 --> 01:15:07.020] - What should people know about it? [01:15:07.020 --> 01:15:09.180] Oh, I don't know, man. [01:15:09.180 --> 01:15:11.380] - You went from Fast.ai to the dark side now. [01:15:11.380 --> 01:15:12.220] - No, it's not at all. [01:15:12.220 --> 01:15:13.060] - To the for-profit. [01:15:13.060 --> 01:15:13.880] - It is the light side. [01:15:13.880 --> 01:15:15.860] - It is actually, it is the light side.
[01:15:15.860 --> 01:15:20.100] Fast.ai, look, I spent the last week in San Francisco, [01:15:20.100 --> 01:15:22.900] and the amount of love I received for Fast.ai [01:15:22.900 --> 01:15:24.180] was overwhelming. [01:15:24.180 --> 01:15:26.860] I couldn't believe how many people told me [01:15:26.860 --> 01:15:28.460] it changed their life, you know? [01:15:28.460 --> 01:15:32.940] Which is just amazing, but I have to say, [01:15:32.940 --> 01:15:36.740] it's actually time to be rejuvenated. [01:15:36.740 --> 01:15:38.180] The mission is the same. [01:15:38.180 --> 01:15:40.340] Bring AI to as many people as possible. [01:15:40.340 --> 01:15:45.620] But now, we can't do it on the back of my bank account. [01:15:45.620 --> 01:15:48.600] I've been paying for everything, well, and my wife. [01:15:48.600 --> 01:15:49.440] We can't afford it anymore. [01:15:49.440 --> 01:15:50.440] - But you've had donations and stuff. [01:15:50.440 --> 01:15:51.280] - No, no, no, nothing. [01:15:51.280 --> 01:15:54.940] - But you were very steadfastly against donations, [01:15:54.940 --> 01:15:55.780] I remember this. [01:15:55.780 --> 01:15:57.980] - Yeah, no donations, no revenue of any kind, [01:15:57.980 --> 01:15:59.200] totally independent. [01:15:59.200 --> 01:16:04.580] But now, I think we can do a better job [01:16:04.580 --> 01:16:07.580] by having a bank account with money in it. [01:16:07.580 --> 01:16:10.660] So, thank you, Jess, for sending us money. [01:16:10.660 --> 01:16:11.860] - Jess, what is it-- - We're happy to provide. [01:16:11.860 --> 01:16:13.540] - What is it like when someone like Jeremy comes [01:16:13.540 --> 01:16:16.540] and goes, "We need a bank account." [01:16:16.540 --> 01:16:20.340] - You know, there are some people that you go through a pitch with [01:16:20.340 --> 01:16:22.060] and then there's some people that you email [01:16:22.060 --> 01:16:23.700] and you start prepping the wire, [01:16:23.700 --> 01:16:26.140] and I would say that Jeremy fell into the latter. [01:16:26.140 --> 01:16:28.780] - Oh, no, I didn't even ask for this money. [01:16:28.780 --> 01:16:30.820] I was just gonna have a chat with Alessio [01:16:30.820 --> 01:16:33.420] to get some advice, and then Jess turned up, [01:16:33.420 --> 01:16:35.920] and Jess's other partner, Jon, turned up, [01:16:35.920 --> 01:16:37.380] and was like, "What are you guys doing here?" [01:16:37.380 --> 01:16:39.820] And they're like, "Oh, we'd like to give you money." [01:16:39.820 --> 01:16:42.780] So, I was like, "Oh, okay." [01:16:42.780 --> 01:16:44.020] So, that was good. [01:16:44.020 --> 01:16:45.140] They have good taste, right? [01:16:45.140 --> 01:16:46.100] - Yeah. [01:16:46.100 --> 01:16:47.780] - I've talked to you a bit, [01:16:47.780 --> 01:16:49.100] especially at the Modular conference, [01:16:49.100 --> 01:16:50.420] which I'm wearing the badge of. [01:16:50.420 --> 01:16:51.260] - Nice hoodie, yeah. [01:16:51.260 --> 01:16:52.380] - Yeah, the hoodie is really nice. [01:16:52.380 --> 01:16:54.740] So, you're interested in fine-tuning. [01:16:54.740 --> 01:16:58.340] You're interested in fundamental research. [01:16:58.340 --> 01:17:00.340] Could you list out the main areas of interest, maybe? [01:17:00.340 --> 01:17:05.100] - I mean, basically, the interest is in making AI [01:17:05.100 --> 01:17:07.260] as useful and valuable as possible. [01:17:07.260 --> 01:17:08.100] - Yeah.
[01:17:08.100 --> 01:17:10.980] - That's how we make it as accessible as possible, [01:17:10.980 --> 01:17:12.440] as widely used as possible, [01:17:12.440 --> 01:17:16.860] help as many people as we can with this technology, right? [01:17:16.860 --> 01:17:18.860] So, how do we do that? [01:17:18.860 --> 01:17:21.900] It needs to be cheaper, it needs to be faster, [01:17:21.900 --> 01:17:23.460] it needs to be easier to use, [01:17:23.460 --> 01:17:25.060] and it needs to be more integrated [01:17:25.060 --> 01:17:27.260] into people's day-to-day lives, [01:17:27.260 --> 01:17:29.260] into the stuff that they do. [01:17:29.260 --> 01:17:32.820] This is hard, you know? [01:17:32.820 --> 01:17:36.980] And so, in the end, I guess I was inspired [01:17:36.980 --> 01:17:40.260] by Thomas Edison's Invention Factory [01:17:40.260 --> 01:17:41.700] in the late 19th century, [01:17:41.700 --> 01:17:43.180] where they had the same situation. [01:17:43.180 --> 01:17:46.560] They were like, "Oh, look, electricity's been invented. [01:17:46.560 --> 01:17:49.920] "Okay, what do we do with this? [01:17:49.920 --> 01:17:51.220] "It's a source of power. [01:17:51.220 --> 01:17:53.180] "I don't know." [01:17:53.180 --> 01:17:55.240] And they're like, "Oh, let's create the record player, [01:17:55.240 --> 01:17:57.700] "and the light bulb, and the refrigerator." [01:17:57.700 --> 01:18:00.700] And, you know, it's like, recognizing [01:18:00.700 --> 01:18:01.820] that now you have electricity, [01:18:01.820 --> 01:18:04.140] you can make all these things, that's hard. [01:18:04.140 --> 01:18:06.180] It requires really smart researchers [01:18:06.180 --> 01:18:09.020] who deeply understand the underlying technology, [01:18:09.020 --> 01:18:12.260] recognize, like, oh, there are some gaps here, [01:18:12.260 --> 01:18:14.860] but they could be filled if we, like, [01:18:14.860 --> 01:18:17.860] use this different kind of filament, or whatever. [01:18:17.860 --> 01:18:22.860] And so, you actually need, like, deep technical experts [01:18:22.860 --> 01:18:26.100] who also have the, like, curiosity, [01:18:26.100 --> 01:18:28.260] and playfulness, and spontaneity, [01:18:28.260 --> 01:18:29.980] to, like, think, like, oh, what if the world [01:18:29.980 --> 01:18:32.260] had this new thing in it? [01:18:32.260 --> 01:18:35.300] I wonder if we could put that thing in the world now, [01:18:35.300 --> 01:18:37.620] but we have AI. [01:18:37.620 --> 01:18:39.780] - Yeah, you were very complimentary of, like, [01:18:39.780 --> 01:18:41.100] the open source, so we last met [01:18:41.100 --> 01:18:44.180] at the open source meetup, as well. [01:18:44.180 --> 01:18:45.860] We met so many times. [01:18:45.860 --> 01:18:47.540] And you're very complimentary of, like, [01:18:47.540 --> 01:18:49.380] their approach towards just trying things, [01:18:49.380 --> 01:18:52.220] like model stacking, for example. [01:18:52.220 --> 01:18:53.380] Is that the kind of people [01:18:53.380 --> 01:18:55.020] that you're looking to collaborate with? [01:18:55.020 --> 01:18:59.060] - I think partly, you know, I'm deeply involved [01:18:59.060 --> 01:19:00.780] in the open source community, [01:19:00.780 --> 01:19:03.980] and I wanna continue to do that, you know? 
[01:19:03.980 --> 01:19:08.980] They, all the best kind of models outside [01:19:08.980 --> 01:19:12.180] of your kind of open AI and stuff [01:19:12.180 --> 01:19:15.180] are all created by the open source community, [01:19:15.180 --> 01:19:19.420] at the moment, through just trying crazy things. [01:19:19.420 --> 01:19:22.780] But it'll be a mix, you know? [01:19:22.780 --> 01:19:24.460] I also wanna work really closely [01:19:24.460 --> 01:19:29.060] with the best academics in the world, you know? [01:19:29.060 --> 01:19:31.260] And I also wanna collaborate with the people [01:19:32.300 --> 01:19:34.380] in parts of the world we've never even heard of, [01:19:34.380 --> 01:19:35.500] who never get a chance, [01:19:35.500 --> 01:19:38.420] because nobody gave them a chance. [01:19:38.420 --> 01:19:39.980] And, you know, so one of the things [01:19:39.980 --> 01:19:42.700] we're gonna be doing a lot of is, like, [01:19:42.700 --> 01:19:45.580] recruiting in really weird ways, you know, [01:19:45.580 --> 01:19:50.380] to find those people who are underappreciated. [01:19:50.380 --> 01:19:51.580] - Would it be, like, a challenge, [01:19:51.580 --> 01:19:52.420] like a Kaggle-type challenge? [01:19:52.420 --> 01:19:54.780] - Yeah, like, Kaggle-y kind of things, [01:19:54.780 --> 01:19:57.660] and, you know, basically find ways, you know, [01:19:57.660 --> 01:20:02.460] or through, like, open source bounties and stuff like that. [01:20:02.460 --> 01:20:04.900] Like, basically give people an opportunity [01:20:04.900 --> 01:20:08.460] to show that they can do amazing shit [01:20:08.460 --> 01:20:09.700] that nobody else can do. [01:20:09.700 --> 01:20:10.540] - Yeah. [01:20:10.540 --> 01:20:11.820] - Doesn't matter how old they are, [01:20:11.820 --> 01:20:13.820] or where they live, or what color their skin is, [01:20:13.820 --> 01:20:15.260] or whatever, you know? [01:20:15.260 --> 01:20:17.980] - Yeah, I think what the Fast.AI community has shown [01:20:17.980 --> 01:20:19.100] is that there are a lot of people [01:20:19.100 --> 01:20:21.060] who don't have a traditional background [01:20:21.060 --> 01:20:22.740] that are really talented people. [01:20:22.740 --> 01:20:26.460] And I think, yeah, it's great that that was there for, [01:20:26.460 --> 01:20:27.580] that the Fast.AI community was there, [01:20:27.580 --> 01:20:30.020] and that Jeremy continues to highlight [01:20:30.020 --> 01:20:30.860] those talents as well. [01:20:30.860 --> 01:20:32.540] - Actually, so let me give props to Tanishq [01:20:32.540 --> 01:20:33.980] as an example, right? [01:20:33.980 --> 01:20:38.980] So, Tanishq is the CEO of a research lab [01:20:38.980 --> 01:20:40.980] of which I'm the president, Medak. [01:20:40.980 --> 01:20:43.180] And he, how old are you, Tanishq? [01:20:43.180 --> 01:20:45.180] - I'm 20 years old, which is why I'm not drinking the wine. [01:20:45.180 --> 01:20:47.500] - So you're not a drinker at this wine bar. [01:20:47.500 --> 01:20:49.780] - You know, so like, Tanishq's a great example [01:20:49.780 --> 01:20:53.580] of somebody that most people wouldn't hire as a CEO, [01:20:53.580 --> 01:20:54.460] but why the hell not? [01:20:54.460 --> 01:20:57.020] Like, he finished high school 10 years ago. [01:20:57.020 --> 01:20:58.620] He finished high school at 10. [01:20:58.620 --> 01:21:00.980] You know, he had his first degree at, what, 14. [01:21:00.980 --> 01:21:04.500] Like, he's somebody who's, you know, done it twice. 
[01:21:04.500 --> 01:21:06.340] - I mean, that's somewhat, like, [01:21:06.340 --> 01:21:10.260] he went after the traditional accreditation, [01:21:10.260 --> 01:21:13.020] the pieces of paper that you would pursue [01:21:13.020 --> 01:21:15.260] to show yourself as qualified. [01:21:15.260 --> 01:21:19.780] So, in a way, he's part of that status quo. [01:21:19.780 --> 01:21:24.780] - In a way, but you know, unfortunately, people are ageist. [01:21:24.780 --> 01:21:25.660] - Yes, they are. [01:21:25.660 --> 01:21:28.540] - So, and I'll also note that I never actually [01:21:28.540 --> 01:21:30.860] did a computer science degree or anything like this. [01:21:30.860 --> 01:21:34.900] My start with AI was actually through the fast AI course. [01:21:34.900 --> 01:21:37.980] So, yeah, and so it's been a long journey since then, yeah. [01:21:37.980 --> 01:21:40.740] - What would you ask him about Answer? [01:21:40.740 --> 01:21:42.580] - 'Cause I already know a lot of what's going on [01:21:42.580 --> 01:21:43.420] at the company. [01:21:43.420 --> 01:21:44.420] - What is he not saying? [01:21:44.420 --> 01:21:45.780] Is he too humble to say? [01:21:45.780 --> 01:21:48.020] - I think what he's not saying is he already, you know, [01:21:48.020 --> 01:21:50.980] has a great team of researchers that, you know, [01:21:50.980 --> 01:21:53.660] there are already two researchers that are at Answer AI [01:21:53.660 --> 01:21:55.580] that are amazing researchers that I've had the chance [01:21:55.580 --> 01:21:59.580] to also interact with over the past maybe a year or so, [01:21:59.580 --> 01:22:01.420] closely, and also just more generally. [01:22:01.420 --> 01:22:04.100] I'm looking forward to seeing what Answer does. [01:22:04.100 --> 01:22:05.460] And I'm really excited to continue [01:22:05.460 --> 01:22:06.780] to collaborate with Jeremy. [01:22:06.780 --> 01:22:10.180] I think this will be even better for me. [01:22:10.180 --> 01:22:12.060] Like, I'm selfishly, I'm very excited [01:22:12.060 --> 01:22:14.740] because I think it'll be better for me, you know, [01:22:14.740 --> 01:22:17.880] to work closely with Jeremy as well. [01:22:17.880 --> 01:22:20.780] Even though, you know, he's in his own research lab, [01:22:20.780 --> 01:22:22.860] but I think the collaborations that will come out of this [01:22:22.860 --> 01:22:24.820] will be, will just be amazing. [01:22:24.820 --> 01:22:26.700] So that's what I'm excited for. [01:22:26.700 --> 01:22:29.900] - And Jeremy, last time you were on the podcast, [01:22:29.900 --> 01:22:30.980] you said that, you know, [01:22:30.980 --> 01:22:32.460] one of the most consistent pieces of advice [01:22:32.460 --> 01:22:35.180] that you always give is that people just need to show up, [01:22:35.180 --> 01:22:37.420] follow through, do the work, that stuff. [01:22:37.420 --> 01:22:38.500] Obviously, Tanish did that. [01:22:38.500 --> 01:22:41.020] - Yeah, so Tanish is one of those rare people, right? [01:22:41.020 --> 01:22:42.060] - But like, what, like, [01:22:42.060 --> 01:22:43.980] I feel like Tanish is more special than that. [01:22:43.980 --> 01:22:45.820] Like, what else did he do really well? [01:22:45.820 --> 01:22:47.940] - Yeah, so I mean, [01:22:47.940 --> 01:22:51.300] God, how old were you when I first came across you? [01:22:51.300 --> 01:22:53.340] Like 15 or something, maybe? [01:22:53.340 --> 01:22:54.180] - Wait, what? [01:22:54.180 --> 01:22:55.220] That's so long? - Yeah. 
[01:22:55.220 --> 01:22:57.160] - 'Cause he only took Fast.ai a year and a half ago. [01:22:57.160 --> 01:22:58.000] - No, no, no, no. [01:22:58.000 --> 01:22:59.660] He was a Fast.ai student back then. [01:22:59.660 --> 01:23:00.500] - Okay. [01:23:00.500 --> 01:23:04.900] - And, you know, he kind of got on the forum, [01:23:04.900 --> 01:23:07.820] helped answer questions, you know, [01:23:07.820 --> 01:23:09.860] asked interesting questions of his own. [01:23:09.860 --> 01:23:16.100] To stick with that for five years, [01:23:16.100 --> 01:23:18.020] that's tenacity, you know? [01:23:18.020 --> 01:23:20.820] And the last course we did [01:23:20.820 --> 01:23:23.260] was the hardest course we've ever had. [01:23:23.260 --> 01:23:24.380] It was the diffusion course. [01:23:24.380 --> 01:23:27.580] It was the first ever stable diffusion course. [01:23:27.580 --> 01:23:30.380] And none of us knew what the hell was going on. [01:23:30.380 --> 01:23:31.700] And, you know, he was the one [01:23:31.700 --> 01:23:35.580] who slogged through the math, [01:23:35.580 --> 01:23:37.800] figured out what the hell all those Greek letters were saying [01:23:37.800 --> 01:23:41.940] and did the first math of stable diffusion video [01:23:41.940 --> 01:23:44.420] that, as far as I know, that ever existed. [01:23:44.420 --> 01:23:48.940] You did that with Wassim, right? [01:23:48.940 --> 01:23:50.420] Along with Wassim. [01:23:50.420 --> 01:23:54.140] So, you know, he slogs through difficult shit. [01:23:54.140 --> 01:23:57.780] And the thing that I noticed now is like, [01:23:57.780 --> 01:23:59.420] you know, Tanishka's kind of famous, [01:23:59.420 --> 01:24:01.460] or was kind of famous, as a child prodigy. [01:24:01.460 --> 01:24:02.900] - Yes, you did a TED Talk when you were 14. [01:24:02.900 --> 01:24:04.300] - He was on Child Genius. [01:24:04.300 --> 01:24:05.340] He did a TED Talk when he was 14. [01:24:05.340 --> 01:24:06.540] - I was nine when I did it. [01:24:06.540 --> 01:24:08.060] - He was nine, okay. [01:24:08.060 --> 01:24:09.900] And like, so I kind of thought like, [01:24:09.900 --> 01:24:13.340] oh, things are easy for child prodigies. [01:24:13.340 --> 01:24:16.900] You know, they're so smart that they just, it's easy. [01:24:16.900 --> 01:24:18.420] And I'm like, oh no. [01:24:18.420 --> 01:24:21.380] Actually, Tanishka's nearly as dumb as me. [01:24:21.380 --> 01:24:25.140] And so he just works really, he just works really hard. [01:24:25.140 --> 01:24:27.380] And he's like, "Tanishka, what does this mean?" [01:24:27.380 --> 01:24:29.380] He's like, "I don't know." [01:24:29.380 --> 01:24:32.660] Like, oh, okay, we better figure it out. [01:24:32.660 --> 01:24:35.300] And so that's been interesting to see that like, [01:24:35.300 --> 01:24:38.900] actually, child prodigies have to work [01:24:38.900 --> 01:24:40.780] really, really hard as well, you know. [01:24:40.780 --> 01:24:42.580] That's part of what makes them a child prodigy [01:24:42.580 --> 01:24:45.420] is that they're tenacious and they don't give up [01:24:45.420 --> 01:24:47.300] even over five years. [01:24:47.300 --> 01:24:48.700] - Does it look that way to you? [01:24:48.700 --> 01:24:49.540] Is that what you-- [01:24:49.540 --> 01:24:50.380] - Yeah, I think so. [01:24:50.380 --> 01:24:51.780] And I think, again, part of it-- [01:24:51.780 --> 01:24:53.580] - You agree you're nearly as dumb as me? [01:24:53.580 --> 01:24:54.620] (laughing) [01:24:54.620 --> 01:24:55.940] - No. 
[01:24:55.940 --> 01:24:57.220] - Say it again for the pod. [01:24:57.220 --> 01:25:00.220] - I think Jeremy's trying to trick me here. [01:25:00.220 --> 01:25:04.140] But I think the Fast AI community has been so friendly [01:25:04.140 --> 01:25:06.460] that it's been a really pleasant experience [01:25:06.460 --> 01:25:08.300] to stay with that community. [01:25:08.300 --> 01:25:11.180] And I think that has also enabled my tenacity, [01:25:11.180 --> 01:25:13.380] 'cause I enjoy being in that community so much. [01:25:13.380 --> 01:25:15.460] So that's why I've stuck around in that community [01:25:15.460 --> 01:25:16.380] for so long. [01:25:16.380 --> 01:25:18.540] So without that, without the community [01:25:18.540 --> 01:25:21.020] that Jeremy has built, I don't think there's any way-- [01:25:21.020 --> 01:25:21.860] - It supports you. [01:25:21.860 --> 01:25:23.140] I had the same with Free Code Camp. [01:25:23.140 --> 01:25:26.380] - So, you know, I think a lot of it has to do with-- [01:25:26.380 --> 01:25:27.300] - I'm gonna cry. [01:25:27.300 --> 01:25:29.860] - I think a lot of it has to do with [01:25:29.860 --> 01:25:31.460] building good communities. [01:25:31.460 --> 01:25:33.820] And Jeremy has done a really good job of doing that. [01:25:33.820 --> 01:25:35.020] And it's actually a lot of hard work [01:25:35.020 --> 01:25:36.980] to build a good community and to nurture [01:25:36.980 --> 01:25:38.220] and grow that community. [01:25:38.220 --> 01:25:40.620] And I've been in many communities [01:25:40.620 --> 01:25:42.620] and I've kind of observed how different communities [01:25:42.620 --> 01:25:44.060] in the AI field have grown. [01:25:44.060 --> 01:25:46.220] And Fast AI still is one of the best communities [01:25:46.220 --> 01:25:47.580] that I've had a chance to be a part of. [01:25:47.580 --> 01:25:49.580] So, you know, again, props to Jeremy [01:25:49.580 --> 01:25:51.260] for doing that as well. [01:25:51.260 --> 01:25:54.340] - I'm so embarrassed right now. [01:25:54.340 --> 01:25:55.540] - I wanna give you the perspective. [01:25:55.540 --> 01:25:57.180] You've been an AI investor for a while. [01:25:57.180 --> 01:25:58.020] - Yeah. [01:25:58.020 --> 01:26:02.020] - And how do you view this community and this moment here? [01:26:02.020 --> 01:26:03.980] - The one thing I will say to the conversation [01:26:03.980 --> 01:26:06.580] that we're just having that I think is awesome is-- [01:26:06.580 --> 01:26:07.420] - We can move here a little bit. [01:26:07.420 --> 01:26:10.780] - Yeah, people keep coming and drinking more wine. [01:26:10.780 --> 01:26:11.620] - It's great, it's a mobile studio. [01:26:11.620 --> 01:26:13.620] - Yeah, we're truly a mobile studio, [01:26:13.620 --> 01:26:15.260] middle of New Orleans, let's go. [01:26:16.220 --> 01:26:18.660] One of my favorite heuristics as an investor [01:26:18.660 --> 01:26:22.580] is distance traveled rather than just your, [01:26:22.580 --> 01:26:25.740] rather than just like what do I see today [01:26:25.740 --> 01:26:28.020] in your resume or whatnot. [01:26:28.020 --> 01:26:32.340] Because I think if you just go by a certain pedigree [01:26:32.340 --> 01:26:34.980] or credential or whatnot, you miss a lot of people [01:26:34.980 --> 01:26:36.980] who have traveled a really big distance, [01:26:36.980 --> 01:26:39.180] who didn't have advantages to certain opportunities [01:26:39.180 --> 01:26:41.260] or came from different places or not from the US. 
[01:26:41.260 --> 01:26:42.860] Like you name all the different, [01:26:42.860 --> 01:26:44.820] you know, all the different lists. [01:26:44.820 --> 01:26:47.380] And I always try to look for those kinds of people [01:26:47.380 --> 01:26:49.100] because they're the ones that are always [01:26:49.100 --> 01:26:51.340] pushing the frontier and like really run through walls. [01:26:51.340 --> 01:26:53.940] And I think this conversation is a good example of that. [01:26:53.940 --> 01:26:55.900] - I mean, no one has a longer distance traveled than Jeremy. [01:26:55.900 --> 01:26:56.740] - 100%. [01:26:56.740 --> 01:26:57.580] (laughing) [01:26:57.580 --> 01:26:59.700] Well, literally and in the sense-- [01:26:59.700 --> 01:27:01.820] - Literally from Australia, yes. [01:27:01.820 --> 01:27:04.620] - And when we were, and I think when we were meeting [01:27:04.620 --> 01:27:06.500] last week, you were talking about this [01:27:06.500 --> 01:27:08.940] a little bit around looking for engineers [01:27:08.940 --> 01:27:11.020] and people in places that aren't necessarily [01:27:11.020 --> 01:27:12.340] where everyone else would be looking, [01:27:12.340 --> 01:27:14.100] but that has yielded some of the best, [01:27:14.100 --> 01:27:16.380] like deepest relationships you've had, right? [01:27:16.380 --> 01:27:17.260] - Oh, absolutely. [01:27:17.260 --> 01:27:20.420] I mean, companies turn resources [01:27:20.420 --> 01:27:22.740] into valuable products and services, right? [01:27:22.740 --> 01:27:25.140] Like what are the resources that we suck in? [01:27:25.140 --> 01:27:28.060] It's like, it's people and GPUs, you know? [01:27:28.060 --> 01:27:28.900] - And money. [01:27:28.900 --> 01:27:31.300] - And it's like, and well, we need the money [01:27:31.300 --> 01:27:34.220] to get those GPUs and the people, right? [01:27:34.220 --> 01:27:39.220] And like, the GPUs are, you know, reasonably, [01:27:39.220 --> 01:27:42.900] like here you can replace one with another, no worries. [01:27:42.900 --> 01:27:45.380] So it's actually the competitive advantage, [01:27:45.380 --> 01:27:47.780] the thing that makes you different is the people. [01:27:47.780 --> 01:27:52.780] So this is the most important thing [01:27:52.780 --> 01:27:57.860] for us to achieve our mission is to build this team, [01:27:57.860 --> 01:28:00.020] you know, to build this really special team. [01:28:00.020 --> 01:28:02.100] And I, you know, I think the way to do that [01:28:02.100 --> 01:28:04.820] and the way I've always built teams is to say, [01:28:04.820 --> 01:28:05.940] is to look at people and say like, [01:28:05.940 --> 01:28:09.580] okay, where is this person now [01:28:09.580 --> 01:28:12.540] and what would it have taken them to get there? [01:28:12.540 --> 01:28:15.860] You know, like, so if somebody is like, you know, [01:28:15.860 --> 01:28:18.940] was kicked out of high school, you know, [01:28:18.940 --> 01:28:22.420] because they were dyslexic or because somebody, like, [01:28:22.420 --> 01:28:24.100] grew up in the mountains of Bangladesh [01:28:24.100 --> 01:28:27.900] and didn't have a PC until they were 16 or, you know, [01:28:27.900 --> 01:28:32.900] somebody, you know, a woman who grew up [01:28:32.900 --> 01:28:35.780] in an environment where she had to fight against, [01:28:35.780 --> 01:28:38.300] like, institutionalized sexism or whatever.
[01:28:38.300 --> 01:28:39.980] It's like, these are the people to me, [01:28:39.980 --> 01:28:42.020] I just kind of go like, okay, [01:28:42.020 --> 01:28:46.300] this person's gone from like negative 43 up to 99. [01:28:46.300 --> 01:28:47.740] - Yes, overcome a lot. [01:28:47.740 --> 01:28:49.660] - That's a kick-ass amazing person, [01:28:49.660 --> 01:28:51.820] whereas somebody who's gone from like 98 to 99 [01:28:51.820 --> 01:28:53.500] is like, okay, it's cool. [01:28:53.500 --> 01:28:56.580] But they're probably not the people [01:28:56.580 --> 01:28:59.700] who are gonna like change the world. [01:28:59.700 --> 01:29:02.300] And so we want to be a small team [01:29:02.300 --> 01:29:03.820] where like literally every person in it [01:29:03.820 --> 01:29:06.780] is somebody who can change the world. [01:29:06.780 --> 01:29:09.060] And the nice thing is when you're in a small team like that, [01:29:09.060 --> 01:29:11.020] it's just really enjoyable [01:29:11.020 --> 01:29:16.020] because everybody's like just really great to be around, [01:29:16.020 --> 01:29:21.100] you know, really inspiring. [01:29:21.100 --> 01:29:22.900] And so, yeah, that's why we're kind of looking [01:29:22.900 --> 01:29:27.900] for these extremely special individuals. [01:29:27.900 --> 01:29:29.700] - Yeah, cool. [01:29:29.700 --> 01:29:32.340] So that's a hiring call explicitly, you know, [01:29:32.340 --> 01:29:34.340] if anyone's listening who fits that profile [01:29:34.340 --> 01:29:36.740] and really wants to work with you, [01:29:36.740 --> 01:29:38.020] they should reach out, right? [01:29:38.020 --> 01:29:38.860] - Yes, absolutely. [01:29:38.860 --> 01:29:41.340] - And now we have a website to send people to. [01:29:41.340 --> 01:29:46.700] So I was gonna wrap it up with just overall NeurIPS tips, [01:29:46.700 --> 01:29:48.860] right, like what is it like to be at NeurIPS this year [01:29:48.860 --> 01:29:50.340] if you've been here before? [01:29:50.340 --> 01:29:53.420] And also like, what's your best tip for doing NeurIPS right? [01:29:53.420 --> 01:29:55.500] Anyone can take it. [01:29:55.500 --> 01:29:56.700] - I guess I'll start. [01:29:56.700 --> 01:29:57.740] This is my second NeurIPS, [01:29:57.740 --> 01:29:59.900] so maybe I don't have a lot of experience with it, [01:29:59.900 --> 01:30:03.580] but I mean, I've been enjoying it a lot so far. [01:30:03.580 --> 01:30:06.740] For me, I think it's about networking with people [01:30:06.740 --> 01:30:07.880] and that's the best part of NeurIPS [01:30:07.880 --> 01:30:10.380] because at the end of the day, [01:30:10.380 --> 01:30:13.080] AI moves so fast that half of these papers [01:30:13.080 --> 01:30:14.580] are already kind of outdated. [01:30:14.580 --> 01:30:16.580] (laughing) [01:30:16.580 --> 01:30:18.340] Like, you know, we've already seen like-- [01:30:18.340 --> 01:30:19.540] - They were written months ago, right? [01:30:19.540 --> 01:30:20.380] - Yeah, yeah. [01:30:20.380 --> 01:30:21.220] - They were approved months ago, too. [01:30:21.220 --> 01:30:22.060] - In order to get here, they had to be reviewed. [01:30:22.060 --> 01:30:22.900] - Exactly. [01:30:22.900 --> 01:30:25.620] So, you know, we're already seeing the second version [01:30:25.620 --> 01:30:27.660] or the third version of a lot of these models already [01:30:27.660 --> 01:30:30.500] and, you know, so, I mean, it's, for me-- [01:30:30.500 --> 01:30:32.220] - So arXiv is all you need? [01:30:32.220 --> 01:30:34.180] - arXiv is all you need, I guess, yeah.
[01:30:34.180 --> 01:30:36.500] So for me, the value comes out of talking with people [01:30:36.500 --> 01:30:38.100] and meeting with people and networking [01:30:38.100 --> 01:30:40.020] and that's why we're coming to events like these [01:30:40.020 --> 01:30:42.980] that, to network and, you know, make these connections [01:30:42.980 --> 01:30:46.540] and, you know, I actually meet a lot of collaborators [01:30:46.540 --> 01:30:48.060] and other researchers at all these conferences. [01:30:48.060 --> 01:30:50.060] - And just to be clear, when you say networking, [01:30:50.060 --> 01:30:53.740] like, it's not like networking in that sense [01:30:53.740 --> 01:30:54.760] of like getting ahead. [01:30:54.760 --> 01:30:57.100] It's a kind of a really nerdy kind of networking. [01:30:57.100 --> 01:31:00.620] So like, earlier, Tanishq and I were at another reception [01:31:00.620 --> 01:31:03.100] where it's like, "Oh, there's Albert Gu. [01:31:03.100 --> 01:31:04.720] "He's the guy that like two days ago [01:31:04.720 --> 01:31:06.180] "released the Mamba paper." [01:31:06.180 --> 01:31:08.580] And we got to him and said like, "Oh, you know, [01:31:08.580 --> 01:31:10.720] "we had a conversation about state-space models [01:31:10.720 --> 01:31:12.320] "and why he's using that and what he thinks [01:31:12.320 --> 01:31:14.300] "the opportunities and limitations are [01:31:14.300 --> 01:31:15.740] "and is there still room for attention?" [01:31:15.740 --> 01:31:18.780] And like, so when we say networking, you know, [01:31:18.780 --> 01:31:21.740] we mean like geeking out on deep conversations [01:31:21.740 --> 01:31:24.140] about people's academic areas of interest. [01:31:24.140 --> 01:31:26.500] - Yeah, I always follow up the question of like, [01:31:26.500 --> 01:31:27.880] "Okay, like what's your name, where you work [01:31:27.880 --> 01:31:29.100] "and then what are your interests?" [01:31:29.100 --> 01:31:30.620] And then we try to go from there. [01:31:30.620 --> 01:31:33.740] - Yeah, just like what paper did you write last or? [01:31:33.740 --> 01:31:34.680] - You know, I will say one thing. [01:31:34.680 --> 01:31:36.780] So even though the posters, there are a bunch [01:31:36.780 --> 01:31:39.260] that truly you go by and even the people presenting [01:31:39.260 --> 01:31:40.500] are like, "Yeah, this is kind of out of date." [01:31:40.500 --> 01:31:43.460] The one hack that's really fun is a lot of those people [01:31:43.460 --> 01:31:45.580] are also already working on the next thing [01:31:45.580 --> 01:31:48.020] and they can give you sort of an early preview [01:31:48.020 --> 01:31:51.020] of something that actually is not on arXiv yet. [01:31:51.020 --> 01:31:53.140] And so that I actually have always, [01:31:53.140 --> 01:31:54.620] my favorite parts of the conference [01:31:54.620 --> 01:31:56.360] are actually just walking around the poster session, [01:31:56.360 --> 01:31:57.820] shaking hands with people who are presenting [01:31:57.820 --> 01:32:00.460] and learning about what they're most excited about, [01:32:00.460 --> 01:32:02.340] what they're working on, what are some of the new things. [01:32:02.340 --> 01:32:03.660] So I find that really fun. [01:32:03.660 --> 01:32:07.420] And also in my case, since I'm a VC, [01:32:07.420 --> 01:32:10.060] my best tip is throw an event with a lot of good wine [01:32:10.060 --> 01:32:12.700] and let the people come. [01:32:12.700 --> 01:32:14.700] - Yeah, excellent. [01:32:14.700 --> 01:32:16.300] Jeremy, you have any tips?
[01:32:16.300 --> 01:32:19.000] - I mean, like Tanishq, this is only my second NeurIPS. [01:32:19.000 --> 01:32:22.700] But I've been to quite a few conferences in general [01:32:22.700 --> 01:32:24.700] and my tip, number one tip for all conferences [01:32:24.700 --> 01:32:26.340] is don't go to any sessions. [01:32:26.340 --> 01:32:27.540] - Yeah, just stay outside and talk. [01:32:27.540 --> 01:32:31.140] - Like, whatever they're saying, they're saying it very, very slowly, [01:32:31.140 --> 01:32:32.700] and they're probably not an expert [01:32:32.700 --> 01:32:34.740] at verbal communication either. [01:32:34.740 --> 01:32:36.400] You can probably get the better version [01:32:36.400 --> 01:32:37.540] by just reading the damn paper [01:32:37.540 --> 01:32:38.860] that they're reading out to you. [01:32:38.860 --> 01:32:40.500] So don't bother with that. [01:32:40.500 --> 01:32:44.700] So like, yeah, hang outside in the hallway, [01:32:44.700 --> 01:32:46.860] look on the app to see who else is around [01:32:46.860 --> 01:32:50.180] and reach out to them and try and find a group [01:32:50.180 --> 01:32:53.500] of six or so interesting people to go and check out [01:32:53.500 --> 01:32:58.500] the local Louisiana sausage special outlet with, whatever. [01:32:58.500 --> 01:33:00.400] Yeah, that's-- [01:33:00.400 --> 01:33:01.240] - Reception hopping. [01:33:01.240 --> 01:33:02.300] - Yeah, reception hopping. [01:33:02.300 --> 01:33:03.780] This is our fourth reception tonight. [01:33:03.780 --> 01:33:05.060] - Oh my God. [01:33:05.060 --> 01:33:06.500] - Fourth and best, right, Jeremy? [01:33:06.500 --> 01:33:07.540] - Oh, fourth and best. [01:33:07.540 --> 01:33:09.700] This is why we came to this one last, [01:33:09.700 --> 01:33:13.540] so we can hang out here until the wine's finished. [01:33:13.540 --> 01:33:14.420] - So a lot of people hate [01:33:14.420 --> 01:33:17.300] on the official NeurIPS conference app, Whova, [01:33:17.300 --> 01:33:19.540] but I kind of like it because of one thing, [01:33:19.540 --> 01:33:20.820] people can organize their own meetups [01:33:20.820 --> 01:33:21.660] and list it here and-- [01:33:21.660 --> 01:33:22.940] - It's awesome, it's awesome. [01:33:22.940 --> 01:33:23.780] - It's actually really good. [01:33:23.780 --> 01:33:25.060] - Yeah, so I'm Brazilian [01:33:25.060 --> 01:33:27.340] and there's a Brazil, like, little chat. [01:33:27.340 --> 01:33:29.060] And it's so fun, everyone's talking in Portuguese, [01:33:29.060 --> 01:33:31.060] talking all the time, they're sharing all the things that, [01:33:31.060 --> 01:33:32.820] and these are people talking about, actually, [01:33:32.820 --> 01:33:35.780] like, interesting concepts in Portuguese. [01:33:35.780 --> 01:33:37.620] So it's actually really fun. [01:33:37.620 --> 01:33:38.460] I love the app. [01:33:38.460 --> 01:33:39.740] - And I didn't even know you were Brazilian, [01:33:39.740 --> 01:33:40.580] so I love it so much. [01:33:40.580 --> 01:33:41.400] - I am, yes. [01:33:41.400 --> 01:33:42.240] - Yeah, Leão, with the little squiggly. [01:33:42.240 --> 01:33:45.740] - Leão, yeah, my accent kind of, like, trips people. [01:33:45.740 --> 01:33:48.100] And it also trips people when I say something incorrectly [01:33:48.100 --> 01:33:50.700] and you can't really tell, but I'm, like, really Brazilian. [01:33:50.700 --> 01:33:53.620] - Yeah, well, we should do a steakhouse next time. [01:33:53.620 --> 01:33:54.460] - Oh, yes, please. [01:33:54.460 --> 01:33:56.180] - Yeah, that's one of those dinners.
[01:33:56.180 --> 01:33:57.020] - Done, done. [01:33:57.020 --> 01:33:57.860] - Churrascarias, right? [01:33:57.860 --> 01:33:59.060] - Yeah, churrascaria. [01:33:59.060 --> 01:34:00.100] - Exactly. [01:34:00.100 --> 01:34:01.660] My favorite was, there was a meetup [01:34:01.660 --> 01:34:03.460] for people who are interested in sushi. [01:34:03.460 --> 01:34:04.300] That was the meetup. [01:34:04.300 --> 01:34:05.140] - I love it, yeah. [01:34:05.140 --> 01:34:06.740] - There was, like, nothing machine learning about it. [01:34:06.740 --> 01:34:08.340] - So at ICML, it was really fun. [01:34:08.340 --> 01:34:09.580] There was one meetup that I went to [01:34:09.580 --> 01:34:11.380] that was just, like, swimming in the morning [01:34:11.380 --> 01:34:12.220] because it was in Hawaii. [01:34:12.220 --> 01:34:13.560] It was actually kind of awesome. [01:34:13.560 --> 01:34:15.400] And then people were, like, actually discussing, like, [01:34:15.400 --> 01:34:17.740] super legit topics in the ocean. [01:34:17.740 --> 01:34:19.300] - I'm actually kind of sad I missed out on ICML, [01:34:19.300 --> 01:34:22.060] but, like, it felt indulgent to go to Hawaii for that. [01:34:22.060 --> 01:34:22.900] - Yeah. [01:34:22.900 --> 01:34:24.980] - Okay, well, I just wanted to bring it to a close. [01:34:24.980 --> 01:34:26.700] The last thing I was gonna say is, [01:34:26.700 --> 01:34:27.740] Jeremy, I don't know if you know, [01:34:27.740 --> 01:34:32.440] I picked your meme as the best meme of November 2023. [01:34:32.440 --> 01:34:34.100] It was "laundry buddy." [01:34:34.100 --> 01:34:35.340] (all laughing) [01:34:35.340 --> 01:34:37.580] So, what's up with "laundry buddy"? [01:34:37.580 --> 01:34:38.660] Why do you hate it so much? [01:34:38.660 --> 01:34:39.500] What did it do to you? [01:34:39.500 --> 01:34:40.340] - No! [01:34:40.340 --> 01:34:41.860] (all laughing) [01:34:41.860 --> 01:34:43.300] No! [01:34:43.300 --> 01:34:44.860] It did nothing to me. [01:34:44.860 --> 01:34:46.140] - For people who are out of the loop, what did you do? [01:34:46.140 --> 01:34:48.060] - I couldn't have walked it back more. [01:34:48.060 --> 01:34:49.620] (all laughing) [01:34:49.620 --> 01:34:51.100] - Jeremy did walk it back on Twitter. [01:34:51.100 --> 01:34:54.080] - You really gonna make me revisit my shame? [01:34:54.080 --> 01:34:54.920] - I just think it's a fun story. [01:34:54.920 --> 01:34:57.620] - Okay, just for your show, I'm gonna revisit my shame. [01:34:57.620 --> 01:34:58.720] - Some people don't know, some people don't know. [01:34:58.720 --> 01:35:01.580] - I made a bold claim that "laundry buddy" [01:35:01.580 --> 01:35:06.580] was not the peak of open AI's path [01:35:06.580 --> 01:35:11.220] to societally beneficial artificial general intelligence. [01:35:11.220 --> 01:35:12.060] I was wrong. [01:35:12.060 --> 01:35:13.900] (all laughing) [01:35:13.900 --> 01:35:17.060] It is, in fact, very much on that path. [01:35:17.060 --> 01:35:21.940] It is well loved to be able to know [01:35:21.940 --> 01:35:24.740] that the world's best artificial intelligence [01:35:24.740 --> 01:35:26.820] can help you figure out how to sort out [01:35:26.820 --> 01:35:28.780] your whites and your colors, [01:35:28.780 --> 01:35:31.660] whether to use powder or pods, [01:35:31.660 --> 01:35:35.500] and what to do if you get a stain [01:35:35.500 --> 01:35:37.480] and you don't have laundry nearby. 
[01:35:37.480 --> 01:35:41.980] It's special, it's important, [01:35:41.980 --> 01:35:43.860] and it's a part of my life [01:35:43.860 --> 01:35:45.820] that I will never want to be without. [01:35:46.780 --> 01:35:49.180] - I love that the, so the ChatGPT app [01:35:49.180 --> 01:35:50.520] now has an official Twitter account, [01:35:50.520 --> 01:35:52.340] and they even got in on the "laundry buddy" meme, [01:35:52.340 --> 01:35:53.700] which is amazing to me. [01:35:53.700 --> 01:35:56.420] - I actually spent a couple of hours this morning [01:35:56.420 --> 01:35:59.140] hanging out with Boris Power from OpenAI, [01:35:59.140 --> 01:36:03.820] who was in there batting for "laundry buddy" from the start. [01:36:03.820 --> 01:36:04.900] (all laughing) [01:36:04.900 --> 01:36:07.700] - Wait, there's an anti and pro "laundry buddy"? [01:36:07.700 --> 01:36:08.540] - No, I mean, he was just [01:36:08.540 --> 01:36:11.140] a particularly strong enthusiast, right? [01:36:11.140 --> 01:36:14.640] He had the grace to not even bring it up, unlike you. [01:36:14.640 --> 01:36:16.260] (all laughing) [01:36:16.260 --> 01:36:18.100] - I had to, I had to, it was so funny. [01:36:18.100 --> 01:36:20.140] I cracked up so much, it was great. [01:36:20.140 --> 01:36:21.580] Well, thanks for chatting, [01:36:21.580 --> 01:36:23.700] and I'll return you back to your evenings. [01:36:23.700 --> 01:36:25.900] - May your clothes be well laundered. [01:36:25.900 --> 01:36:26.740] - Thanks for having us. [01:36:26.740 --> 01:36:27.580] - Cheers. - Thank you. [01:36:27.580 --> 01:36:29.140] - Thanks. [01:36:29.140 --> 01:36:30.260] - That was Jeremy Howard, [01:36:30.260 --> 01:36:32.820] together with Tanishq Abraham and Jess Leão. [01:36:32.820 --> 01:36:35.300] Tanishq and Jeremy recorded a podcast separately, [01:36:35.300 --> 01:36:36.660] so if you want to learn more about Tanishq, [01:36:36.660 --> 01:36:38.780] he's done long-form interviews in more detail [01:36:38.780 --> 01:36:41.460] than I can cover, because it's a lot of biomedical stuff, [01:36:41.460 --> 01:36:42.340] and that's one of the areas [01:36:42.340 --> 01:36:44.300] that we are not very knowledgeable on. [01:36:44.300 --> 01:36:47.460] And for Jess Leão, she was an investor in Mosaic, [01:36:47.460 --> 01:36:49.220] is one of the newest partners at Decibel, [01:36:49.220 --> 01:36:52.060] and led the round in Answer.ai. [01:36:52.060 --> 01:36:53.540] Next, we're going to go to some people [01:36:53.540 --> 01:36:56.100] on the show floor of the NeurIPS Expo. [01:36:56.100 --> 01:36:58.220] They're not people I had prior relationships with, [01:36:58.220 --> 01:37:00.100] but they're still doing interesting work nonetheless. [01:37:00.100 --> 01:37:02.580] And the first is, we're going to check in with Cerebras, [01:37:02.580 --> 01:37:05.620] which is not only producing giant, wafer-scale chips, [01:37:05.620 --> 01:37:07.940] but also publishing interesting research. [01:37:07.940 --> 01:37:10.460] So here's my conversation with Joel Hestness, [01:37:10.460 --> 01:37:13.040] Principal Research Scientist at Cerebras Systems. [01:37:13.040 --> 01:37:15.860] - That started working about a year ago. [01:37:15.860 --> 01:37:18.580] We started building out multi-box systems [01:37:18.580 --> 01:37:20.800] so that we could do cluster-level training, [01:37:20.800 --> 01:37:24.140] so larger-scale models, and so this last year, [01:37:24.140 --> 01:37:28.300] we've just been showing off what it's capable of.
[01:37:28.300 --> 01:37:29.980] So early this year, we started [01:37:29.980 --> 01:37:33.380] with our Cerebras-GPT models. [01:37:33.380 --> 01:37:35.900] Those showed compute-optimal scaling, [01:37:35.900 --> 01:37:40.460] so Chinchilla-style scaling, but open-source. [01:37:40.460 --> 01:37:42.920] All those models, we released open-source. [01:37:42.920 --> 01:37:46.720] Based on that work, we got the attention [01:37:46.720 --> 01:37:48.000] of a few different groups. [01:37:48.000 --> 01:37:50.060] One of them was the OpenTensor Foundation, [01:37:50.060 --> 01:37:51.800] and they came to us and said, [01:37:51.800 --> 01:37:54.900] hey, we want a great three-billion-parameter model, [01:37:54.900 --> 01:37:58.360] something that's easy to deploy, [01:37:58.360 --> 01:37:59.880] like on a laptop or something, [01:37:59.880 --> 01:38:03.400] and we wanted very general language capabilities, [01:38:03.400 --> 01:38:05.120] long sequence length. [01:38:05.120 --> 01:38:08.280] And so we trained the BTLM language model for that. [01:38:09.500 --> 01:38:12.800] Concurrently with that, we also had an engagement [01:38:12.800 --> 01:38:16.600] that started up with Group 42 in the United Arab Emirates, [01:38:16.600 --> 01:38:20.080] so that's this poster, Core42. [01:38:20.080 --> 01:38:24.860] They had interest in training large Arabic language models, [01:38:24.860 --> 01:38:27.420] so the first demos that we did for them [01:38:27.420 --> 01:38:29.920] were just Arabic models, but then they said, [01:38:29.920 --> 01:38:33.760] let's do multilingual Arabic and English. [01:38:33.760 --> 01:38:36.640] So we've been training the Jais 13 billion [01:38:36.640 --> 01:38:40.720] and 30 billion-parameter models this year. [01:38:40.720 --> 01:38:42.680] We've released both of those publicly. [01:38:42.680 --> 01:38:46.080] The first version of the 30 billion just came out, [01:38:46.080 --> 01:38:49.360] and the quality of that model in Arabic [01:38:49.360 --> 01:38:52.880] is better than any other public models currently, [01:38:52.880 --> 01:38:55.320] and then in English, it's competitive [01:38:55.320 --> 01:38:58.400] with models like Falcon 40B. [01:38:58.400 --> 01:39:01.740] So we're on a good track there. [01:39:01.740 --> 01:39:04.840] More releases to come through Core42. [01:39:04.840 --> 01:39:07.120] We're excited to have that be open-source [01:39:07.120 --> 01:39:09.760] and to contribute to the community there. [01:39:09.760 --> 01:39:14.080] - Yeah, anecdotally, since we're already chatting, [01:39:14.080 --> 01:39:15.720] so we might as well keep going, [01:39:15.720 --> 01:39:20.720] but the UAE also notably has the Falcon or TII Institute. [01:39:20.720 --> 01:39:23.440] Are they related, are they competing with each other? [01:39:23.440 --> 01:39:24.280] What's going on? [01:39:24.280 --> 01:39:26.360] - Initially, there was a little bit of competition.
[01:39:26.360 --> 01:39:30.360] They're funded by different people, different groups, [01:39:30.360 --> 01:39:33.360] but there is a countrywide effort going on [01:39:33.360 --> 01:39:34.840] in the United Arab Emirates [01:39:34.840 --> 01:39:38.480] to consolidate a lot of their AI efforts, [01:39:38.480 --> 01:39:41.440] and so that's why we're seeing very impressive [01:39:41.440 --> 01:39:45.560] and good pushes towards let's make it open, [01:39:45.560 --> 01:39:47.800] let's collaborate some more, [01:39:47.800 --> 01:39:49.720] and so there might be opportunities in the future [01:39:49.720 --> 01:39:52.680] for us to coordinate directly with TII, [01:39:52.680 --> 01:39:55.360] and we have looked at things like their data sets, [01:39:55.360 --> 01:39:58.280] like RefinedWeb, so there has been some exchange so far. [01:39:58.280 --> 01:40:00.920] - Yeah, with the macrodata refinements process [01:40:00.920 --> 01:40:01.760] that I don't know if you know. [01:40:01.760 --> 01:40:02.760] - Yes. - It was a reference [01:40:02.760 --> 01:40:03.900] to an Apple TV show. [01:40:03.900 --> 01:40:06.560] - Okay, Severance, anyway. - Interesting. [01:40:06.560 --> 01:40:08.640] - It's my fun fact. [01:40:08.640 --> 01:40:10.000] A little bit editor's note. [01:40:10.000 --> 01:40:13.080] The TII Institute people were actually there at NeurIPS [01:40:13.080 --> 01:40:14.640] presenting a poster on RefinedWeb, [01:40:14.640 --> 01:40:17.600] the data set that they did for Falcon 180B and 40B, [01:40:17.600 --> 01:40:19.480] so I asked them about the name. [01:40:19.480 --> 01:40:21.040] - My last question is about the name. [01:40:21.040 --> 01:40:23.000] - Is it from Apple, is it from Severance? [01:40:23.000 --> 01:40:24.340] - Yes. (laughs) [01:40:24.340 --> 01:40:25.800] - So what's the story, what's the? [01:40:25.800 --> 01:40:27.840] - No, it's just like, basically in the end, [01:40:27.840 --> 01:40:30.320] we had someone look at the data every now and then, [01:40:30.320 --> 01:40:31.760] like go through the thing, [01:40:31.760 --> 01:40:33.880] and that's like looking at the scary numbers. [01:40:33.880 --> 01:40:35.440] So, you know, this was the macrodata refinements. [01:40:35.440 --> 01:40:37.040] - You know, nobody comments about this. [01:40:37.040 --> 01:40:37.880] - I know. [01:40:37.880 --> 01:40:39.520] - I was like, wait, I saw this in Severance. [01:40:39.520 --> 01:40:40.560] - Yeah, I know. [01:40:40.560 --> 01:40:42.480] - Right, like, I was like, this is a good joke, [01:40:42.480 --> 01:40:44.520] 'cause it's exactly what you do when you do filtering. [01:40:44.520 --> 01:40:45.660] - Exactly. [01:40:45.660 --> 01:40:47.320] - If you haven't seen Severance, it's a great show, [01:40:47.320 --> 01:40:49.360] it's on Apple TV, great watch for the holidays, [01:40:49.360 --> 01:40:51.760] pretty short, and it's interesting. [01:40:51.760 --> 01:40:54.200] I guess you can call it AI-related now. [01:40:54.200 --> 01:40:56.640] - But it's cool that, well, so one of the things [01:40:56.640 --> 01:40:58.400] I often get asked about, 'cause we have listeners [01:40:58.400 --> 01:41:00.640] in a lot of different countries, [01:41:00.640 --> 01:41:03.280] should every country have their own model, you know? 
[01:41:03.280 --> 01:41:04.880] - I think this is a really tough question, [01:41:04.880 --> 01:41:09.060] because the volume of data in different languages [01:41:09.060 --> 01:41:12.480] is power-law, Zipf's-law distributed, [01:41:12.480 --> 01:41:16.440] so the number of low-resource languages is massive. [01:41:16.440 --> 01:41:20.360] We're talking over 100 languages that are low-resource. [01:41:20.360 --> 01:41:24.280] You just have too few tokens to do a lot with [01:41:24.280 --> 01:41:26.160] in the language modeling context, [01:41:26.160 --> 01:41:28.060] so it's much harder to deal with those. [01:41:28.060 --> 01:41:31.280] Now there, we've actually seen a few different techniques [01:41:31.280 --> 01:41:35.180] at NeurIPS that are targeting those sorts of settings, [01:41:35.180 --> 01:41:39.340] and they're doing things like train a base language model [01:41:39.340 --> 01:41:42.840] in English, and then do a transfer process [01:41:42.840 --> 01:41:45.400] where you co-train with both languages. [01:41:45.400 --> 01:41:46.760] - That makes a lot of sense. [01:41:46.760 --> 01:41:48.240] - It makes a lot of sense. [01:41:48.240 --> 01:41:51.280] In that setting, you wanna get the knowledge representation [01:41:51.280 --> 01:41:54.120] from one language, and then try to adapt the style-- [01:41:54.120 --> 01:41:56.600] - Grammar. - Grammar, syntax, I guess, [01:41:56.600 --> 01:41:58.840] the easier part. - Yeah. [01:41:58.840 --> 01:42:02.760] - Arabic is a sort of medium-resource language. [01:42:02.760 --> 01:42:04.160] There, I think it makes more sense [01:42:04.160 --> 01:42:07.040] to try to mix two languages if you wanna do multilingual, [01:42:07.040 --> 01:42:10.280] and then it helps you do things like translation. [01:42:10.280 --> 01:42:12.800] And then higher-resource languages, [01:42:12.800 --> 01:42:15.880] so if you're talking European languages, [01:42:15.880 --> 01:42:20.480] French, Spanish, German, those I think you can do [01:42:20.480 --> 01:42:24.400] probably from scratch in those languages, [01:42:24.400 --> 01:42:28.740] and probably pretty easy to do multilinguality also. [01:42:28.740 --> 01:42:29.580] - Yeah. [01:42:29.580 --> 01:42:32.560] - So, yeah, it's definitely a very interesting [01:42:32.560 --> 01:42:35.980] open direction we're pushing for. [01:42:35.980 --> 01:42:38.760] In fact, I'd maybe reference-- [01:42:38.760 --> 01:42:40.160] - The workshop. - We have a multilingual [01:42:40.160 --> 01:42:44.960] workshop on Friday where we've invited a bunch of groups [01:42:44.960 --> 01:42:47.640] to come and give talks about their experiences [01:42:47.640 --> 01:42:50.360] with training different language models. [01:42:50.360 --> 01:42:52.080] - Cool, well, people can check out the authors. [01:42:52.080 --> 01:42:55.160] I'm sure this is published and findable online. [01:42:55.160 --> 01:42:56.100] - Yes. [01:42:56.100 --> 01:43:00.440] - Cool, so we should probably get to intros a little bit. [01:43:00.440 --> 01:43:02.400] I mean, we're already recording. [01:43:02.400 --> 01:43:03.400] Who are you and what do you work on, [01:43:03.400 --> 01:43:04.480] and what does your team work on? [01:43:04.480 --> 01:43:05.760] - So my name's Joel Hestness. [01:43:05.760 --> 01:43:08.900] I'm a principal research scientist at Cerebras Systems, [01:43:08.900 --> 01:43:13.360] and I'm the lead of our core machine learning group.
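[Editor's note: a minimal sketch of the three regimes Joel describes, written as hypothetical token-mixture schedules. The phase names and the mixture fractions are illustrative assumptions, not Cerebras' actual recipes.]

    def mixture_schedule(resource_level):
        """Return (phase_name, {language: fraction_of_tokens}) pairs for a training run."""
        if resource_level == "low":
            # Too few target-language tokens: pre-train on English, then co-train so the
            # model transfers its knowledge and mostly has to adapt grammar and style.
            return [("pretrain", {"en": 1.0}), ("co-train", {"en": 0.7, "target": 0.3})]
        if resource_level == "medium":
            # e.g. Arabic: mix both languages from the start (also helps translation).
            return [("pretrain", {"en": 0.5, "target": 0.5})]
        if resource_level == "high":
            # e.g. French/Spanish/German: enough tokens to train from scratch.
            return [("pretrain", {"target": 1.0})]
        raise ValueError(f"unknown resource level: {resource_level}")

    for level in ("low", "medium", "high"):
        print(level, mixture_schedule(level))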
[01:43:13.360 --> 01:43:15.000] - So I've helped us bring up [01:43:15.000 --> 01:43:17.700] our foundation language models first, [01:43:17.700 --> 01:43:20.120] and helped kind of set some of the direction [01:43:20.120 --> 01:43:21.880] for expanding outward from there. [01:43:21.880 --> 01:43:24.440] So we started by expanding out a lot [01:43:24.440 --> 01:43:29.220] on the common language functionality, [01:43:29.220 --> 01:43:32.160] and now we're expanding into other places [01:43:32.160 --> 01:43:34.900] where transformer models can be used, [01:43:34.900 --> 01:43:39.120] so targeting things like multimodal [01:43:39.120 --> 01:43:41.480] and other workloads that are similar. [01:43:41.480 --> 01:43:42.320] - Okay. [01:43:42.320 --> 01:43:45.840] - So a lot of our effort has been bringing this up [01:43:45.840 --> 01:43:49.120] and coordinating with the broader Cerebras organization [01:43:49.120 --> 01:43:52.700] to lower these applications down, [01:43:52.700 --> 01:43:56.360] get them compiled to run at efficiency on our hardware. [01:43:56.360 --> 01:43:58.880] So there's been a lot of performance optimization, [01:43:58.880 --> 01:44:00.480] making sure numerics are correct [01:44:00.480 --> 01:44:02.600] for training large models, [01:44:02.600 --> 01:44:05.720] making sure things train stably, things like that. [01:44:05.720 --> 01:44:06.560] - Yeah. [01:44:06.560 --> 01:44:09.760] - So yeah, we're focusing on scaling out right now, [01:44:09.760 --> 01:44:12.120] getting much larger clusters. [01:44:12.120 --> 01:44:15.120] We've sold a couple already, and-- [01:44:15.120 --> 01:44:15.960] - To G42. [01:44:15.960 --> 01:44:20.320] - To G42, and yeah, exciting things to come there, I think. [01:44:20.320 --> 01:44:21.160] - Exciting things to come. [01:44:21.160 --> 01:44:23.160] So we're gonna cover some of the other posters [01:44:23.160 --> 01:44:25.880] that you have here, but one thing I guess I, [01:44:25.880 --> 01:44:29.840] people are very unfamiliar with anything but NVIDIA. [01:44:29.840 --> 01:44:33.040] What should people know when working with a Cerebras chip? [01:44:33.040 --> 01:44:35.800] - Sure, yeah, I think maybe people might be familiar [01:44:35.800 --> 01:44:37.320] with our wafer. [01:44:37.320 --> 01:44:38.160] - Yeah. [01:44:38.160 --> 01:44:40.920] - So Cerebras uses a full wafer for our processor [01:44:40.920 --> 01:44:43.620] instead of cutting the wafer apart into pieces. [01:44:44.480 --> 01:44:46.400] If you cut it apart, you end up packaging it [01:44:46.400 --> 01:44:48.000] into a bunch of different cards, [01:44:48.000 --> 01:44:49.560] and then you package those into a box. [01:44:49.560 --> 01:44:50.400] - Then you have to network them, yeah. [01:44:50.400 --> 01:44:52.040] - And then you have to network them all together [01:44:52.040 --> 01:44:54.200] with a bunch of extra software. [01:44:54.200 --> 01:44:56.960] That's very complicated for large-scale applications, [01:44:56.960 --> 01:44:58.680] and so instead of doing that, [01:44:58.680 --> 01:45:01.080] we leave it together on a single wafer. [01:45:01.080 --> 01:45:01.920] - Got it. [01:45:01.920 --> 01:45:04.360] - That single wafer goes in a single big box [01:45:04.360 --> 01:45:07.560] that's, the performance is roughly equivalent, [01:45:07.560 --> 01:45:12.560] our CS-2 box is roughly equivalent to maybe 20 A100 GPUs, [01:45:13.160 --> 01:45:16.940] and you can program it like running on a single GPU, [01:45:16.940 --> 01:45:19.000] so it's just much easier to use.
[01:45:19.000 --> 01:45:21.760] - Nice, and is it cost-effective as well? [01:45:21.760 --> 01:45:23.720] I assume it is, 'cause you're saving [01:45:23.720 --> 01:45:24.540] a whole bunch of overhead. [01:45:24.540 --> 01:45:27.760] - Right, so we aim, so the manufacturing process, [01:45:27.760 --> 01:45:30.200] it has a lot lower cost because we don't have to deal [01:45:30.200 --> 01:45:35.200] with as many moving parts, fewer points of failure, [01:45:35.200 --> 01:45:38.680] reliability is quite good, and we try to, [01:45:38.680 --> 01:45:43.680] we aim to be price-performance comparable to GPU systems. [01:45:43.680 --> 01:45:47.600] - Cool, awesome, that's the hardware stuff. [01:45:47.600 --> 01:45:49.760] We're also gonna talk about the streaming things in a bit, [01:45:49.760 --> 01:45:52.280] but yeah, I'd love to, whatever you wanna pick next [01:45:52.280 --> 01:45:54.520] from your work this year. [01:45:54.520 --> 01:45:58.240] - Just give an overview of some of our research directions. [01:45:58.240 --> 01:46:02.640] So our hardware is, it has native support [01:46:02.640 --> 01:46:05.220] for completely unstructured sparsity. [01:46:06.600 --> 01:46:09.280] What that means is we can send in, [01:46:09.280 --> 01:46:11.480] say, if we're using the weight streaming mode, [01:46:11.480 --> 01:46:14.120] which I mentioned, a weight that comes in, [01:46:14.120 --> 01:46:17.360] we can do a vector multiply with some activations, [01:46:17.360 --> 01:46:20.400] so you can use that in your matrix multiplies [01:46:20.400 --> 01:46:24.160] on the wafer, but you can do that on a per-weight basis. [01:46:24.160 --> 01:46:25.360] - You don't need to load the whole thing at once. [01:46:25.360 --> 01:46:26.680] - You don't need to load the whole thing [01:46:26.680 --> 01:46:29.880] to do matrix multiply, so what that means [01:46:29.880 --> 01:46:31.920] is we can do unstructured sparsity, [01:46:31.920 --> 01:46:34.140] just send in the weights that you actually wanna use [01:46:34.140 --> 01:46:36.320] in the matrix multiply, and you can get [01:46:36.320 --> 01:46:37.960] a sparse matrix multiply. [01:46:37.960 --> 01:46:40.240] - Isn't the classic argument [01:46:40.240 --> 01:46:41.680] against that kind of sparsity [01:46:41.680 --> 01:46:43.200] that the decision actually takes longer [01:46:43.200 --> 01:46:45.100] than just doing the math anyway? [01:46:45.100 --> 01:46:49.200] Like the branching, the sort of Turing-complete branching. [01:46:49.200 --> 01:46:52.600] - That's a, yeah, so part of the approach [01:46:52.600 --> 01:46:56.280] that we're using is a weight sparse approach, [01:46:56.280 --> 01:47:00.760] which means the sparsity is in the model itself, [01:47:00.760 --> 01:47:02.380] and so then while you're training that, [01:47:02.380 --> 01:47:04.420] you'd prefer those weights to be [01:47:04.420 --> 01:47:07.000] the same sparsity structure for a while. [01:47:07.000 --> 01:47:07.840] - Okay. [01:47:07.840 --> 01:47:08.800] - So there are techniques that train-- [01:47:08.800 --> 01:47:10.920] - Some kind of constraints, some regularization thing. [01:47:10.920 --> 01:47:14.500] - Right, yeah, so the early works in this [01:47:14.500 --> 01:47:16.600] are things like the lottery ticket hypothesis, [01:47:16.600 --> 01:47:19.320] where you'd find the, yeah, chatting-- [01:47:19.320 --> 01:47:22.080] - Jonathan Frankle's like 10 feet from us.
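[Editor's note: a minimal sketch of the unstructured weight sparsity Joel is describing, emulated in NumPy. On most hardware this is just a masked dense matmul; the point of streaming weights one at a time is that the zeroed weights never need to be sent or multiplied at all. The shapes and the 75% sparsity level are assumptions for illustration.]

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, sparsity = 512, 512, 0.75

    W = rng.standard_normal((d_out, d_in)).astype(np.float32)
    mask = rng.random(W.shape) > sparsity       # keep roughly 25% of weights
    W_sparse = W * mask                         # unstructured: no block or row pattern

    x = rng.standard_normal((d_in,)).astype(np.float32)

    dense_result = W_sparse @ x                 # what the math defines

    # "Stream only the surviving weights": iterate over the nonzeros explicitly.
    rows, cols = np.nonzero(W_sparse)
    streamed = np.zeros(d_out, dtype=np.float32)
    for r, c, w in zip(rows, cols, W_sparse[rows, cols]):
        streamed[r] += w * x[c]

    assert np.allclose(dense_result, streamed, atol=1e-4)
    print(f"nonzero weights used: {len(rows)} / {W.size}")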
[01:47:22.080 --> 01:47:26.180] - And there you find the mask [01:47:26.180 --> 01:47:28.040] by doing some heavy-duty training, [01:47:28.040 --> 01:47:31.440] and then you rewind and retrain the model from scratch. [01:47:31.440 --> 01:47:33.940] Now that's static sparse, so that you have [01:47:33.940 --> 01:47:36.360] the same weight sparsity all the way throughout. [01:47:36.360 --> 01:47:38.420] That works great on our hardware. [01:47:38.420 --> 01:47:41.580] We have, however, added a bunch of new functionality [01:47:41.580 --> 01:47:43.700] that's sort of beta in our recent release [01:47:43.700 --> 01:47:46.260] that allows you to change the sparsity throughout training, [01:47:46.260 --> 01:47:49.600] and so that's something that's being used [01:47:49.600 --> 01:47:52.880] in recent research works, like the Rigging [01:47:52.880 --> 01:47:56.620] the Lottery work, so RigL, [01:47:56.620 --> 01:47:59.060] and then another one called SET, [01:48:00.260 --> 01:48:05.020] a different approach to deciding how to change the sparsity, [01:48:05.020 --> 01:48:08.220] but those updates happen infrequently enough [01:48:08.220 --> 01:48:11.500] that it doesn't harm the performance on our hardware. [01:48:11.500 --> 01:48:13.260] - That's cool, awesome. [01:48:13.260 --> 01:48:18.260] So this is, Sparse-IFT is the paper that you published. [01:48:18.260 --> 01:48:22.300] - Yes, so our Sparse-IFT work looks at different ways [01:48:22.300 --> 01:48:26.280] that you can swap out layers for sparse versions [01:48:26.280 --> 01:48:29.620] using the same FLOPs that might be able to get you [01:48:29.620 --> 01:48:31.420] better representation capability. [01:48:31.420 --> 01:48:35.580] So if you have pressure in your representation [01:48:35.580 --> 01:48:37.860] that's in your activations, for instance, [01:48:37.860 --> 01:48:39.900] let's widen the layer and sparsify it [01:48:39.900 --> 01:48:41.740] to give the model more activations. [01:48:41.740 --> 01:48:43.860] You can store more in those activations. [01:48:43.860 --> 01:48:47.180] Those end up staying dense. [01:48:47.180 --> 01:48:50.540] So our results here show that we can get something [01:48:50.540 --> 01:48:53.460] like a two to three X performance improvement [01:48:53.460 --> 01:48:57.100] at 75% sparse, or you could flip it around [01:48:57.100 --> 01:49:00.780] and you can get, for the same FLOPs, [01:49:00.780 --> 01:49:04.100] a better model by sometimes three to five percent. [01:49:04.100 --> 01:49:05.820] - That's probably, budget-wise, [01:49:05.820 --> 01:49:07.860] I guess you're choosing between pre-training and inference [01:49:07.860 --> 01:49:10.460] just like many people, like what you're optimizing for. [01:49:10.460 --> 01:49:12.500] - Yes. - That's great, awesome. [01:49:12.500 --> 01:49:14.420] And what else are you leading? [01:49:14.420 --> 01:49:19.380] - So I'm also working on some of the pre-training efforts [01:49:19.380 --> 01:49:22.420] that we're doing that look at things like gradient noise [01:49:22.420 --> 01:49:25.460] to estimate good batch sizing and make sure [01:49:25.460 --> 01:49:28.140] that we're making efficient use of the compute. [01:49:28.140 --> 01:49:32.500] So there are techniques, so we have a poster, [01:49:32.500 --> 01:49:34.820] the Efficient and Approximate Per-Example [01:49:34.820 --> 01:49:37.180] Gradient Norms paper. - Oh my god. [01:49:37.180 --> 01:49:40.140] Per example? - This is, yes.
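[Editor's note: a sketch of the iso-FLOP bookkeeping behind the "widen the layer and sparsify it" idea above. This is illustrative arithmetic under the assumption that both dimensions of a square weight matrix are widened; it is not necessarily the exact set of transformations used in the Sparse-IFT paper. If you widen both dimensions by k and keep only a (1 - s) fraction of the weights, FLOPs scale by k^2 * (1 - s), so holding FLOPs fixed gives k = 1 / sqrt(1 - s).]

    import math

    d_in = d_out = 1024
    dense_flops = 2 * d_in * d_out            # multiply-accumulates for one matmul

    for s in (0.5, 0.75, 0.9):
        k = 1 / math.sqrt(1 - s)              # iso-FLOP widening factor
        sparse_flops = 2 * (k * d_in) * (k * d_out) * (1 - s)
        print(f"sparsity={s:.2f}  widen x{k:.2f}  flops ratio={sparse_flops / dense_flops:.3f}")

    # Every ratio prints ~1.0: same compute budget, but a wider, higher-capacity layer.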
[01:49:40.140 --> 01:49:43.180] So this is at the, we have this published [01:49:43.180 --> 01:49:45.740] at the WANT workshop at NeurIPS, [01:49:45.740 --> 01:49:50.740] and the basic idea is gradient norm calculations are, [01:49:50.740 --> 01:49:53.420] typically if you wanted to do [01:49:53.420 --> 01:49:55.420] the gradient norm calculation, [01:49:55.420 --> 01:49:57.740] you'd wanna aggregate all the gradients together [01:49:57.740 --> 01:49:59.740] and then calculate the norm. [01:49:59.740 --> 01:50:01.460] And you do that over your batch. [01:50:01.460 --> 01:50:04.060] So that's, it's helpful if you wanna measure [01:50:04.060 --> 01:50:06.100] some training dynamics, but if you wanna look [01:50:06.100 --> 01:50:08.580] at something like critical batch size [01:50:08.580 --> 01:50:11.580] to understand how well is my model training [01:50:11.580 --> 01:50:14.100] in terms of efficiency, you actually want [01:50:14.100 --> 01:50:17.020] to have sub-batches, you wanna understand [01:50:17.020 --> 01:50:19.260] the grad norms of the sub-batches also. [01:50:19.260 --> 01:50:21.940] You use that and then the large batch grad norm, [01:50:21.940 --> 01:50:24.500] you can calculate noise statistics. [01:50:24.500 --> 01:50:26.300] Like signal to noise maybe. - Yeah. [01:50:26.300 --> 01:50:29.980] - If you use this technique that was defined [01:50:29.980 --> 01:50:32.900] by one of my teammates, Gavia, [01:50:32.900 --> 01:50:36.620] we can do an approximation that allows us [01:50:36.620 --> 01:50:40.440] to take some, run some statistics over activations [01:50:40.440 --> 01:50:44.580] and run some statistics over the delta gradient [01:50:44.580 --> 01:50:47.500] values coming back, and then you can take [01:50:47.500 --> 01:50:50.300] a dot product, an element-wise product of those, [01:50:50.300 --> 01:50:52.060] now it's much more compute efficient, [01:50:52.060 --> 01:50:55.540] to calculate for each example, this is an approximation [01:50:55.540 --> 01:50:58.960] of the grad norm for that sample. [01:50:58.960 --> 01:51:02.140] And then you can arbitrarily kind of combine [01:51:02.140 --> 01:51:06.540] those back together to get estimates of gradient noise. [01:51:06.540 --> 01:51:09.220] - Okay. [01:51:09.220 --> 01:51:12.740] - So this is something where we improved the, [01:51:12.740 --> 01:51:17.340] we improved the compute requirements. [01:51:17.340 --> 01:51:20.820] We use this in a few different contexts currently, [01:51:20.820 --> 01:51:25.340] but it improves the compute requirement for this [01:51:25.340 --> 01:51:28.400] from, for high dimensional tensors, [01:51:28.400 --> 01:51:31.620] from the dimension of the tensor down to linear, [01:51:31.620 --> 01:51:33.080] linear time computation. [01:51:33.080 --> 01:51:34.060] - Nice. [01:51:34.060 --> 01:51:35.720] And do you, is there like, [01:51:35.720 --> 01:51:39.940] I forget what this, what is this called, [01:51:39.940 --> 01:51:42.000] it's kind of like an annealing curve or something [01:51:42.000 --> 01:51:44.600] where you use this technique at the start [01:51:44.600 --> 01:51:47.660] to initialize and then eventually you sort of [01:51:47.660 --> 01:51:49.500] wean yourself off it? [01:51:49.500 --> 01:51:53.260] - So if you, so this is something you do wanna track [01:51:53.260 --> 01:51:54.100] throughout training. [01:51:54.100 --> 01:51:54.920] - Yeah.
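[Editor's note: a minimal sketch of the classic per-example gradient-norm shortcut for a single linear layer, which appears to be the flavor of approximation being described; the exact method in the WANT workshop paper may differ. For y = W a, the per-example weight gradient is the outer product of the output gradient and the activation, so its squared norm is just the product of the two vector norms, with no need to materialize a separate copy of W's gradient per sample.]

    import numpy as np

    rng = np.random.default_rng(0)
    B, d_in, d_out = 8, 64, 32
    a = rng.standard_normal((B, d_in))        # activations flowing into the layer
    delta = rng.standard_normal((B, d_out))   # gradients flowing back out of the layer

    # Cheap route: two row-norms and a product per example (linear in layer size).
    cheap = (delta ** 2).sum(axis=1) * (a ** 2).sum(axis=1)

    # Expensive route for comparison: build each per-example weight gradient explicitly.
    expensive = np.array([np.sum(np.outer(delta[i], a[i]) ** 2) for i in range(B)])

    assert np.allclose(cheap, expensive)
    print("per-example squared grad norms:", np.round(cheap[:4], 2))

These per-example (or per-sub-batch) norms, combined with the full-batch gradient norm, are what feed the gradient-noise and critical-batch-size estimates Joel mentions.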
[01:51:54.920 --> 01:51:58.220] - Especially if you're doing like phase training [01:51:58.220 --> 01:52:00.540] or if you're changing the data distribution or something, [01:52:00.540 --> 01:52:02.460] it's really helpful to have these statistics [01:52:02.460 --> 01:52:06.580] to decide, am I using an appropriate batch size [01:52:06.580 --> 01:52:09.660] that I'm getting good generalization with the new data? [01:52:09.660 --> 01:52:12.420] It helps you set learning rates and things. [01:52:12.420 --> 01:52:16.160] So this is something you'd wanna track throughout training. [01:52:16.160 --> 01:52:20.700] It gives you an estimate of how big the batch size could be. [01:52:20.700 --> 01:52:22.100] - Yeah, excellent. [01:52:22.100 --> 01:52:22.940] Very cool. [01:52:22.940 --> 01:52:27.620] Any, one more? [01:52:27.620 --> 01:52:33.060] - Sure, so then given that we have a sparse accelerator, [01:52:33.060 --> 01:52:35.400] we're also looking at applications [01:52:35.400 --> 01:52:37.740] where you can deploy sparse models. [01:52:37.740 --> 01:52:41.560] And part of our work is figuring out [01:52:41.560 --> 01:52:43.380] how to find those sparse models [01:52:43.380 --> 01:52:45.660] that you'd use in a deployment setting. [01:52:45.660 --> 01:52:49.340] And so we have other work that's related [01:52:49.340 --> 01:52:54.140] to the SparseGPT work that's been recently released, [01:52:54.140 --> 01:52:59.140] where we do some pruning after dense pre-training [01:52:59.140 --> 01:53:03.500] and we do some retraining to get the capabilities [01:53:03.500 --> 01:53:06.640] of the model back up before you would put it in deployment. [01:53:06.640 --> 01:53:07.860] - How much of it can you get back? [01:53:07.860 --> 01:53:09.620] - Actually, I'm not totally familiar. [01:53:09.620 --> 01:53:12.260] This is work from my team members. [01:53:12.260 --> 01:53:16.120] I know we can do, so for large, very large language models [01:53:16.120 --> 01:53:19.780] that have not been trained on a huge number of tokens, [01:53:19.780 --> 01:53:23.080] you can do easily upwards of 50% sparsity [01:53:23.080 --> 01:53:28.080] and fully recover the upstream losses from this retraining. [01:53:28.080 --> 01:53:33.360] So this is a really big next step challenge [01:53:33.360 --> 01:53:36.600] for a lot of the organizations that we work with. [01:53:36.600 --> 01:53:38.720] They're interested in, now I have, [01:53:38.720 --> 01:53:41.000] they're able to pre-train a very large model [01:53:41.000 --> 01:53:42.440] with the hardware, now they're interested [01:53:42.440 --> 01:53:45.940] in figuring out how to deploy it in an efficient manner. [01:53:45.940 --> 01:53:48.440] So we're working with a few different groups on this. [01:53:48.440 --> 01:53:52.160] So we're working with Qualcomm [01:53:52.160 --> 01:53:55.360] and another group called Neural Magic [01:53:55.360 --> 01:53:58.420] that does inference for these large models. [01:53:58.420 --> 01:54:00.600] - Yeah, amazing. [01:54:00.600 --> 01:54:04.480] I was gonna ask if you need the same dataset to retrain, [01:54:04.480 --> 01:54:06.140] but it looks like you train on the Pile. [01:54:06.140 --> 01:54:08.160] So I guess that's a no. [01:54:08.160 --> 01:54:10.160] - Yes, you can actually shift here. [01:54:10.160 --> 01:54:14.120] Obviously, different data distribution [01:54:14.120 --> 01:54:16.220] means you have to be a little bit careful [01:54:16.220 --> 01:54:17.760] about how you do the retraining.
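[Editor's note: a minimal sketch of the prune-then-retrain recipe being described, with plain magnitude pruning standing in for SparseGPT-style pruning and a toy PyTorch model and random data standing in for a real LLM.]

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
    x, y = torch.randn(512, 128), torch.randint(0, 10, (512,))
    loss_fn = nn.CrossEntropyLoss()

    # 1) Prune: zero the smallest 50% of weights in each Linear, remember the masks.
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            threshold = w.abs().flatten().kthvalue(w.numel() // 2).values
            masks[name] = (w.abs() > threshold).float()
            w.mul_(masks[name])

    # 2) Retrain briefly to recover quality, re-applying the masks after each step
    #    so the sparsity pattern stays fixed.
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for step in range(100):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear):
                module.weight.data.mul_(masks[name])

    print(f"final loss: {loss.item():.3f}, ~50% of weights per layer still zero")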
[01:54:17.760 --> 01:54:19.800] So I think there are a few different things [01:54:19.800 --> 01:54:23.240] we've learned about different learning rate warmups, [01:54:23.240 --> 01:54:26.120] different learning rate levels, I guess, [01:54:26.120 --> 01:54:29.000] because if you're doing a big distribution shift, [01:54:29.000 --> 01:54:31.600] you wanna allow the model to shift a little bit, [01:54:31.600 --> 01:54:34.160] and so you want a slightly higher learning rate. [01:54:34.160 --> 01:54:36.380] - But like, for example, you pruned Llama 2, [01:54:36.380 --> 01:54:39.100] and we don't know what the original dataset was. [01:54:39.100 --> 01:54:40.900] - Yeah, I mean, well, so we kind of know [01:54:40.900 --> 01:54:43.180] that Llama 2 is a little bit similar [01:54:43.180 --> 01:54:46.500] to something like SlimPajama and Llama 1, [01:54:46.500 --> 01:54:49.560] but yeah, it is definitely a different dataset. [01:54:49.560 --> 01:54:53.140] We do know that the Pile and SlimPajama [01:54:53.140 --> 01:54:55.840] have a fair bit of overlap in some things, [01:54:55.840 --> 01:54:59.120] but it is definitely a different distribution, yeah. [01:55:00.740 --> 01:55:05.040] - So this is a lot of work that our Applied ML team, [01:55:05.040 --> 01:55:07.320] our Applied ML team is working on. [01:55:07.320 --> 01:55:09.080] We're expanding that team currently, by the way, [01:55:09.080 --> 01:55:11.080] so Cerebras is hiring, for anybody [01:55:11.080 --> 01:55:13.200] listening who's interested. [01:55:13.200 --> 01:55:17.320] You can check out our website, cerebras.net/join-us, [01:55:17.320 --> 01:55:19.760] if you'd like to check it out. [01:55:19.760 --> 01:55:24.080] Send us your resume, and we'll take a look. [01:55:24.080 --> 01:55:26.860] - Yeah, thanks for spending some time with us. [01:55:26.860 --> 01:55:29.700] Before we go, what's one NeurIPS tip [01:55:29.700 --> 01:55:32.160] that you wanna give to people if they're attending NeurIPS? [01:55:32.160 --> 01:55:33.440] How do you do NeurIPS right? [01:55:33.440 --> 01:55:35.080] - How do you do NeurIPS right? [01:55:35.080 --> 01:55:39.200] Well, so it's grown roughly 5X [01:55:39.200 --> 01:55:41.080] in the time that I've been attending NeurIPS, [01:55:41.080 --> 01:55:44.040] so it gets more overwhelming every year, [01:55:44.040 --> 01:55:49.040] so pace yourself, and I like that they've kind of backed off [01:55:49.040 --> 01:55:53.360] a bit on the talks in favor of poster sessions. [01:55:53.360 --> 01:55:55.280] Just, you gotta go wander around, [01:55:55.280 --> 01:55:58.480] you gotta talk to people, you gotta check out posters [01:55:58.480 --> 01:56:03.480] and kind of let stuff sink in and ask questions, so yeah. [01:56:03.480 --> 01:56:05.940] - Yeah, excellent, well thanks so much for your time. [01:56:05.940 --> 01:56:07.020] - Definitely, thanks. - Thank you. [01:56:07.020 --> 01:56:08.340] - That's it. [01:56:08.340 --> 01:56:10.340] - I think Cerebras is doing very interesting work here. [01:56:10.340 --> 01:56:12.180] Most people know them for their hardware, [01:56:12.180 --> 01:56:13.900] but I think they're doing very interesting work [01:56:13.900 --> 01:56:15.740] on the software and LLM training side, [01:56:15.740 --> 01:56:18.440] and I'd be interested to have them on again in 2024.
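[Editor's note: a sketch of the retraining schedule idea Joel mentions above, a short warmup into a somewhat higher peak learning rate, then a decay, for retraining a pruned model on a shifted data distribution. All of the specific numbers are illustrative assumptions.]

    import math

    def retraining_lr(step, total_steps=10_000, warmup_steps=500,
                      peak_lr=3e-4, final_lr=3e-5):
        """Linear warmup followed by cosine decay; peak_lr is set a bit higher than a
        typical fine-tuning LR so the model can move with the new data distribution."""
        if step < warmup_steps:
            return peak_lr * (step + 1) / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))

    for s in (0, 250, 500, 5_000, 10_000):
        print(s, f"{retraining_lr(s):.2e}")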
[01:56:18.440 --> 01:56:21.780] So next we're gonna go walk down the floor to Voxel51, [01:56:21.780 --> 01:56:24.100] which is not a company I've actually come across before, [01:56:24.100 --> 01:56:26.820] but it seems to be an interesting pair [01:56:26.820 --> 01:56:28.340] together with the next guest as well. [01:56:28.340 --> 01:56:30.460] So this is another one of those situations [01:56:30.460 --> 01:56:32.780] where I get to put two competitors next to each other [01:56:32.780 --> 01:56:35.180] and let you decide as to how they differ [01:56:35.180 --> 01:56:37.000] and how they talk about themselves. [01:56:37.000 --> 01:56:39.020] - Sure, my name is Jason Corso. [01:56:39.020 --> 01:56:41.660] I'm the co-founder and chief scientist at Voxel51. [01:56:41.660 --> 01:56:44.220] I'm also on the faculty of EECS and robotics [01:56:44.220 --> 01:56:45.660] at the University of Michigan. [01:56:45.660 --> 01:56:48.860] So Voxel51 is a spin-out of my lab. [01:56:48.860 --> 01:56:52.260] We make a toolkit for AI engineers [01:56:52.260 --> 01:56:55.500] that sits on top of things like PyTorch and TensorFlow, [01:56:55.500 --> 01:56:58.940] and I think of it like a model and dataset debugger. [01:56:58.940 --> 01:57:00.400] The key problem that we face [01:57:00.400 --> 01:57:02.720] is not that we can go download datasets [01:57:02.720 --> 01:57:03.920] and then train models on them, [01:57:03.920 --> 01:57:05.060] or even with foundation models, [01:57:05.060 --> 01:57:06.540] go pull one off the shelf [01:57:06.540 --> 01:57:08.900] and then expect it to work exactly the way you want. [01:57:08.900 --> 01:57:12.020] The problem is really the co-development of a dataset [01:57:12.020 --> 01:57:14.380] to then go and actually use one of those models [01:57:14.380 --> 01:57:16.380] or train or fine-tune your own model. [01:57:16.380 --> 01:57:18.860] So 51 lets you represent the data [01:57:18.860 --> 01:57:21.660] that you're using or building alongside your models [01:57:21.660 --> 01:57:25.660] in a way that is extensible, visualizable, and flexible [01:57:25.660 --> 01:57:30.660] so that you can write simple single lines of code in Python [01:57:30.660 --> 01:57:33.200] to do queries of your datasets and your models, [01:57:33.200 --> 01:57:35.160] like show me the corner cases [01:57:35.160 --> 01:57:38.700] where model A is outperforming model B, and it's outdoors, [01:57:38.700 --> 01:57:42.440] or show me intersections in my B2D data, [01:57:42.440 --> 01:57:44.620] or let me visualize my embeddings [01:57:44.620 --> 01:57:46.540] that are either just vision [01:57:46.540 --> 01:57:48.860] or point cloud-based or multimodal, [01:57:48.860 --> 01:57:50.220] and then visually interact with them [01:57:50.220 --> 01:57:52.420] with lassoing on the 3D embedding. [01:57:52.420 --> 01:57:55.380] - Is the concept of active learning still in vogue, [01:57:55.380 --> 01:57:57.660] or is it not cool these days? (laughs) [01:57:57.660 --> 01:58:01.100] - Well, I mean, so 51 is a pretty flexible ecosystem [01:58:01.100 --> 01:58:02.100] of capabilities. [01:58:02.100 --> 01:58:05.340] The heart of it really is that data-centric data model [01:58:05.340 --> 01:58:06.380] of unstructured data. [01:58:06.380 --> 01:58:09.220] So we support images, video, and point clouds.
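[Editor's note: the toolkit Jason is calling "51" is FiftyOne, Voxel51's open-source library. Below is a rough sketch of the query style he describes. The dataset name, the two prediction fields ("model_a", "model_b"), and the "outdoor" tag are hypothetical stand-ins; substitute your own fields.]

    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset("my_driving_dataset")  # assumed: an existing dataset

    # Score both models' detections against ground truth; this adds per-sample
    # eval_a_tp / eval_a_fp / eval_a_fn (and eval_b_*) counter fields.
    dataset.evaluate_detections("model_a", gt_field="ground_truth", eval_key="eval_a")
    dataset.evaluate_detections("model_b", gt_field="ground_truth", eval_key="eval_b")

    # "Show me the corner cases where model A is outperforming model B, and it's outdoors."
    view = (
        dataset
        .match(F("tags").contains("outdoor"))
        .match(F("eval_a_tp") - F("eval_a_fp") > F("eval_b_tp") - F("eval_b_fp"))
        .sort_by("eval_b_fp", reverse=True)
    )

    session = fo.launch_app(view)  # inspect the resulting samples visually

The resulting view opens in the FiftyOne App, which is also where the embedding-lassoing style of interaction he mentions happens.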
[01:58:09.220 --> 01:58:10.740] You can, in fact, there's a blog [01:58:10.740 --> 01:58:13.300] that one of my colleagues at Voxel51 [01:58:13.300 --> 01:58:15.240] wrote maybe a month ago [01:58:15.240 --> 01:58:18.220] on how to implement an active learning workflow [01:58:18.220 --> 01:58:19.180] on top of 51. [01:58:19.180 --> 01:58:20.260] So it's plausible. - Seems like it'll [01:58:20.260 --> 01:58:21.700] lend itself easily. - Yeah, exactly. [01:58:21.700 --> 01:58:22.540] It's plausible. [01:58:22.540 --> 01:58:24.140] I mean, the challenge with active learning [01:58:24.140 --> 01:58:26.300] is, will just more data help, [01:58:26.300 --> 01:58:27.980] or do you need the right data? [01:58:27.980 --> 01:58:28.820] - Of course, the right data, yeah. [01:58:28.820 --> 01:58:30.940] - And I think that's kind of a, [01:58:30.940 --> 01:58:33.000] that's the question, I think, right, so yeah. [01:58:33.000 --> 01:58:34.880] - Is it primarily vision that you work on, [01:58:34.880 --> 01:58:35.780] or is it just anything? [01:58:35.780 --> 01:58:38.700] - Yeah, so my experience is in computer vision, [01:58:38.700 --> 01:58:42.240] mostly video understanding and imaging problems. [01:58:42.240 --> 01:58:43.740] So that's where we got started. [01:58:43.740 --> 01:58:45.740] However, the software is pretty flexible, [01:58:45.740 --> 01:58:47.420] so you can add your own data type. [01:58:47.420 --> 01:58:49.580] Like, you know, we're considering adding audio, [01:58:49.580 --> 01:58:54.100] adding text, IoT, you know, like temporal signals. [01:58:54.100 --> 01:58:55.740] But right now, it's images, video, and point clouds. [01:58:55.740 --> 01:58:57.020] - I've often heard it said that, you know, [01:58:57.020 --> 01:58:59.540] the best researchers and the best engineers [01:58:59.540 --> 01:59:00.820] are really the people who get their hands dirty [01:59:00.820 --> 01:59:02.020] in the data sets. [01:59:02.020 --> 01:59:03.300] - Oh yeah, you have to get your hands dirty. [01:59:03.300 --> 01:59:06.060] And this is, so in some sense, the whole company exists [01:59:06.060 --> 01:59:08.100] because I was worried no one was getting [01:59:08.100 --> 01:59:09.660] their hands dirty enough, right? [01:59:09.660 --> 01:59:11.980] Like, they were just expecting to take a data set, [01:59:11.980 --> 01:59:13.980] take a model, and then train it once, [01:59:13.980 --> 01:59:15.980] and then out pops, like, your usable thing. [01:59:15.980 --> 01:59:17.140] No, that's not the way it works, right? [01:59:17.140 --> 01:59:20.340] This is a hard problem in building intuition, [01:59:20.340 --> 01:59:22.980] building a comfort, or like, an ability to take [01:59:22.980 --> 01:59:25.540] a 10-million-sample data set and find, like, [01:59:25.540 --> 01:59:28.660] the 1,000 samples that are giving you this problem here. [01:59:28.660 --> 01:59:30.740] It's hard to do, and that's what 51 really lets you do. [01:59:30.740 --> 01:59:32.640] - Yeah, yeah, what's the name, actually? [01:59:32.640 --> 01:59:33.980] I have to ask. [01:59:33.980 --> 01:59:35.580] - Well, we had 50 bad ideas.
[01:59:39.140 --> 01:59:41.220] - Well, that's the way we say now, [01:59:41.220 --> 01:59:44.500] but the actual original way we got started as a company [01:59:44.500 --> 01:59:48.460] was as a video understanding as a service platform, [01:59:48.460 --> 01:59:50.540] and so that's why, so the voxel in the name [01:59:50.540 --> 01:59:53.540] is in the space-time volume of pixels, you know? [01:59:53.540 --> 01:59:56.380] And 51 was just to elicit ideas of Area 51. [01:59:56.380 --> 01:59:57.980] Like, can you find the right voxel? [01:59:57.980 --> 01:59:58.800] Is it there? [01:59:58.800 --> 01:59:59.640] That kind of thing. [01:59:59.640 --> 02:00:01.500] We've subsequently way pivoted away from that, [02:00:01.500 --> 02:00:04.380] as most startups will do at some point in their journey. [02:00:04.380 --> 02:00:06.020] - Yeah, it makes the domain easier to buy. [02:00:06.020 --> 02:00:07.740] - Sure, exactly. [02:00:07.740 --> 02:00:10.060] - So, anything else people should know about your platform? [02:00:10.060 --> 02:00:11.680] Like, top use cases, top customers [02:00:11.680 --> 02:00:12.780] that you always brag about? [02:00:12.780 --> 02:00:14.540] - Sure, well, I mean, it is open source, right? [02:00:14.540 --> 02:00:17.100] So, as long as you have the three key assumptions, [02:00:17.100 --> 02:00:19.400] local data, one user, one machine, [02:00:19.400 --> 02:00:20.940] there's no limitation on the machine learning [02:00:20.940 --> 02:00:22.460] that you can do with 51. [02:00:22.460 --> 02:00:24.420] When you want to violate one of those assumptions, [02:00:24.420 --> 02:00:27.360] like work on a team, or work in the cloud, or whatever, [02:00:27.360 --> 02:00:29.200] then we have an enterprise product [02:00:29.200 --> 02:00:31.220] that you would talk to us to purchase, basically, [02:00:31.220 --> 02:00:33.340] and that's kind of like a Google Drive layer [02:00:33.340 --> 02:00:34.940] on top of the open source one. [02:00:34.940 --> 02:00:35.980] - Very reasonable. [02:00:35.980 --> 02:00:38.440] - Yeah, the only, I mean, we sell to, [02:00:38.440 --> 02:00:39.900] a lot of companies do use it. [02:00:39.900 --> 02:00:41.820] I'm not gonna name 'em here, [02:00:41.820 --> 02:00:42.780] but you can go to the website, [02:00:42.780 --> 02:00:45.700] there's a logo wall of those we can name. [02:00:45.700 --> 02:00:47.280] But it'd be great if you're listening [02:00:47.280 --> 02:00:49.020] to give us a GitHub star. [02:00:49.020 --> 02:00:51.640] That's our, like, we're here at NeurIPS to get users [02:00:51.640 --> 02:00:52.480] to get stars, right? - Stars for swag. [02:00:52.480 --> 02:00:54.780] - Stars for swag, you got it. [02:00:54.780 --> 02:00:55.620] - Yeah, excellent. [02:00:55.620 --> 02:00:58.700] You published a guide to doing CVPR right. [02:00:58.700 --> 02:00:59.540] - I did. [02:00:59.540 --> 02:01:00.460] - We're here at NeurIPS. [02:01:00.460 --> 02:01:02.180] What would be your guide for doing NeurIPS right? [02:01:02.180 --> 02:01:03.500] - So, how to do NeurIPS right? [02:01:03.500 --> 02:01:04.460] I think there's some key things [02:01:04.460 --> 02:01:05.660] of doing large conferences right. [02:01:05.660 --> 02:01:08.660] One is, like, don't expect to do too much per day, right? 
[02:01:08.660 --> 02:01:10.260] So, what I've always done, [02:01:10.260 --> 02:01:13.100] even when conferences were like a quarter of the size [02:01:13.100 --> 02:01:15.540] or less, like, for any one day, [02:01:15.540 --> 02:01:18.040] identify five to 10 papers in the morning [02:01:18.040 --> 02:01:20.700] that I just wanna understand for that day, right? [02:01:20.700 --> 02:01:23.060] So, then I will make sure though to spend time [02:01:23.060 --> 02:01:26.120] with that poster presenter at the oral talk. [02:01:26.120 --> 02:01:27.220] To me, that's the key. [02:01:27.220 --> 02:01:29.060] And then, at the end of that day, [02:01:29.060 --> 02:01:32.180] I do tend to write a summary from my own brain, [02:01:32.180 --> 02:01:33.500] my own notes of what I did, [02:01:33.500 --> 02:01:36.640] like what the key points were for those papers. [02:01:36.640 --> 02:01:38.540] That's definitely one winning strategy [02:01:38.540 --> 02:01:39.740] for a big conference like this. [02:01:39.740 --> 02:01:41.900] - All right, any other advice for people building [02:01:41.900 --> 02:01:44.900] or any papers that you're excited for this year? [02:01:44.900 --> 02:01:47.860] - Well, I mean, advice, I don't know. [02:01:47.860 --> 02:01:48.740] If you don't know your data, [02:01:48.740 --> 02:01:50.060] then you don't know what you're doing [02:01:50.060 --> 02:01:52.220] is the way I would probably say it. [02:01:52.220 --> 02:01:54.140] And indeed, like getting close to your data [02:01:54.140 --> 02:01:56.420] is part of the model building process, right? [02:01:56.420 --> 02:01:58.460] Like, just to say it again, [02:01:58.460 --> 02:02:00.740] I think of it as a co-development process [02:02:00.740 --> 02:02:04.820] of data sets and models, not of a model training problem. [02:02:04.820 --> 02:02:06.580] - Yeah, I actually had a really interesting chat [02:02:06.580 --> 02:02:08.220] with someone from Cerebras, actually, [02:02:08.220 --> 02:02:09.900] where they talked about how they were doing evals [02:02:09.900 --> 02:02:14.020] on their loss per region on a data set [02:02:14.020 --> 02:02:16.300] as they were training their large language models [02:02:16.300 --> 02:02:18.300] so that they could increase the exposure [02:02:18.300 --> 02:02:20.860] on a specific subdomain if they saw that specifically, [02:02:20.860 --> 02:02:22.680] like loss was not progressing as well [02:02:22.680 --> 02:02:23.800] in that particular subdomain. [02:02:23.800 --> 02:02:25.300] So it's kind of like online training [02:02:25.300 --> 02:02:28.420] and watching their models evolve while they're training. [02:02:28.420 --> 02:02:30.820] - Yeah, I guess it sounds like on specific subsets [02:02:30.820 --> 02:02:32.780] of the data, which is really important. [02:02:32.780 --> 02:02:33.780] - Cool, well, thanks so much for your time. [02:02:33.780 --> 02:02:35.380] - Thanks very much, nice to chat with you, Sean. [02:02:35.380 --> 02:02:36.540] - Coming from data engineering, [02:02:36.540 --> 02:02:39.460] it's pretty interesting to see this space develop. [02:02:39.460 --> 02:02:41.380] It's interesting, also, that a lot of them [02:02:41.380 --> 02:02:43.060] emphasize open source, which we'll see [02:02:43.060 --> 02:02:45.700] with the next speaker, which is Brandon from Nomic. [02:02:45.700 --> 02:02:47.140] - Who are you and what's Nomic? [02:02:47.140 --> 02:02:49.020] - Yeah, hey, everyone, my name is Brandon Duderstadt. [02:02:49.020 --> 02:02:51.420] I'm a co-founder and CEO of Nomic.
[02:02:51.420 --> 02:02:53.020] Nomic's a company that does many things, [02:02:53.020 --> 02:02:55.320] but we have two main products right now. [02:02:55.320 --> 02:02:57.140] One of them is GPT4All, [02:02:57.140 --> 02:02:58.780] which is an open-source ecosystem [02:02:58.780 --> 02:03:00.820] of low-resource language models. [02:03:00.820 --> 02:03:03.300] So it lets you do things like run, you know, [02:03:03.300 --> 02:03:06.260] Mistral 7b fine-tuned on OpenOrca on a MacBook [02:03:06.260 --> 02:03:09.780] or, you know, some esoteric GPU, things like this. [02:03:09.780 --> 02:03:11.700] The second product is a tool called Atlas. [02:03:11.700 --> 02:03:14.220] It lets you explore massive unstructured data sets [02:03:14.220 --> 02:03:15.820] in your web browser. [02:03:15.820 --> 02:03:17.420] Since we're here at NeurIPS, a lot of people [02:03:17.420 --> 02:03:19.180] seem to respond to calling it [02:03:19.180 --> 02:03:21.220] massive clickable t-SNE as a service. [02:03:21.220 --> 02:03:24.660] - Yes, I was actually thinking, is it t-SNE or UMAP? [02:03:24.660 --> 02:03:26.740] - Yeah, so it turns out, if you squint closely enough, [02:03:26.740 --> 02:03:28.540] they're the same algorithm, up to a choice [02:03:28.540 --> 02:03:29.820] of low-dimensional kernel. [02:03:29.820 --> 02:03:32.540] So we optimize the t-SNE objective function. [02:03:32.540 --> 02:03:33.900] One of our pieces of IP is we have [02:03:33.900 --> 02:03:35.940] the world's fastest optimizer for it. [02:03:35.940 --> 02:03:39.220] So if you take, say, the NVIDIA Rapids UMAP implementation, [02:03:39.220 --> 02:03:41.620] which is kind of the fastest version of this in the wild, [02:03:41.620 --> 02:03:43.740] off the shelf and run it on Wikipedia [02:03:43.740 --> 02:03:45.540] on the biggest machine on AWS, [02:03:45.540 --> 02:03:47.100] it's gonna take you a couple of days [02:03:47.100 --> 02:03:48.480] to actually get that map, [02:03:48.480 --> 02:03:50.660] but we can do it in about four hours. [02:03:50.660 --> 02:03:51.500] - Oh, excellent. [02:03:51.500 --> 02:03:52.420] - Yeah, it lets you make the maps [02:03:52.420 --> 02:03:54.100] part of your iterative daily workflow [02:03:54.100 --> 02:03:56.040] as opposed to having to wait a week to get them. [02:03:56.040 --> 02:03:56.880] - Nice. [02:03:56.880 --> 02:03:58.860] We'll throw a video on this on the show notes, [02:03:58.860 --> 02:04:00.860] but maybe you could sort of narratively [02:04:00.860 --> 02:04:01.700] show what you're showing. [02:04:01.700 --> 02:04:04.380] Like, you showed a TikTok example and a Twitter example, [02:04:04.380 --> 02:04:06.100] right, so these are really for visualizing [02:04:06.100 --> 02:04:07.780] massive multimodal data sets. [02:04:07.780 --> 02:04:09.980] - Yeah, so the fundamental thesis behind the tool [02:04:09.980 --> 02:04:12.220] is that the shape of data that people have [02:04:12.220 --> 02:04:14.820] has fundamentally changed as a result of generative. [02:04:14.820 --> 02:04:16.740] Instead of having these big Excel spreadsheets [02:04:16.740 --> 02:04:19.800] of tabular things, you now have vectors plus metadata, [02:04:19.800 --> 02:04:21.980] and we need to rethink visualization [02:04:21.980 --> 02:04:24.700] and the implications of that for the visualization stack.
[02:04:24.700 --> 02:04:26.380] You are kind of seeing at the database layer [02:04:26.380 --> 02:04:28.420] that's starting to penetrate with vector DBs and stuff, [02:04:28.420 --> 02:04:31.080] but I think there's gonna be radical implications [02:04:31.080 --> 02:04:32.940] for that change all the way up the stack. [02:04:32.940 --> 02:04:34.320] And so you can use it on, you know, [02:04:34.320 --> 02:04:35.580] getting back to your original question, [02:04:35.580 --> 02:04:39.120] Twitter data, TikTok data, images, sounds, text, [02:04:39.120 --> 02:04:40.680] anything that you can stuff into a vector, [02:04:40.680 --> 02:04:42.680] which is pretty much anything these days, [02:04:42.680 --> 02:04:44.140] you can map and you can understand. [02:04:44.140 --> 02:04:44.980] - Yeah. [02:04:44.980 --> 02:04:46.940] Can I bring my own custom embeddings [02:04:46.940 --> 02:04:48.380] and see the impact of that? [02:04:48.380 --> 02:04:49.220] - You can. [02:04:49.220 --> 02:04:50.860] So there's two ways to get data into the platform. [02:04:50.860 --> 02:04:52.780] One way is bring your own embeddings, [02:04:52.780 --> 02:04:54.520] and then you just pip install nomic, [02:04:54.520 --> 02:04:57.200] from nomic import atlas, and then atlas.map_embeddings. [02:04:57.200 --> 02:04:58.300] You supply your embeddings, [02:04:58.300 --> 02:05:00.340] you supply metadata on top of them, [02:05:00.340 --> 02:05:01.620] and then a couple minutes later, [02:05:01.620 --> 02:05:03.700] you'll get a web link back to a map [02:05:03.700 --> 02:05:05.260] where you can click on it and fly around it. [02:05:05.260 --> 02:05:06.460] If you just have raw data, [02:05:06.460 --> 02:05:08.020] we have a bunch of out-of-the-box embedders [02:05:08.020 --> 02:05:10.220] that we develop and we work with partners to develop [02:05:10.220 --> 02:05:12.680] that you can use to map it out of the box as well. [02:05:12.680 --> 02:05:13.520] - Yeah. [02:05:13.520 --> 02:05:16.380] And this is not open source, but GPT4All is. [02:05:16.380 --> 02:05:18.620] - So there are aspects of the platform that are open source. [02:05:18.620 --> 02:05:20.660] The entire thing runs on a graphics engine [02:05:20.660 --> 02:05:22.300] that we developed called Deep Scatter. [02:05:22.300 --> 02:05:23.480] It's the only tool out there [02:05:23.480 --> 02:05:25.260] that can render a billion point scatter plots [02:05:25.260 --> 02:05:26.380] in a web browser. [02:05:26.380 --> 02:05:27.740] And to do that, you have to, again, [02:05:27.740 --> 02:05:29.300] kind of fundamentally rethink how graphics [02:05:29.300 --> 02:05:30.620] in the browser works from the ground up. [02:05:30.620 --> 02:05:32.060] That is available source, [02:05:32.060 --> 02:05:33.940] but unfortunately it's not fully open source. [02:05:33.940 --> 02:05:34.780] - It's okay. [02:05:34.780 --> 02:05:36.380] Yeah, you don't have to apologize for anything. [02:05:36.380 --> 02:05:37.940] - I do have to. [02:05:37.940 --> 02:05:39.780] I wish we could open source everything, [02:05:39.780 --> 02:05:42.380] but we are unfortunately subject to capitalism, [02:05:42.380 --> 02:05:43.220] and so we cannot. [02:05:43.220 --> 02:05:45.460] But in the limit, I would love to open source everything. [02:05:45.460 --> 02:05:47.460] - I also maybe heard you in another introduction [02:05:47.460 --> 02:05:49.980] talk about this as like Looker for language models. [02:05:49.980 --> 02:05:51.700] Like, elaborate more about that?
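As a rough sketch of the bring-your-own-embeddings flow Brandon describes: the package, module, and atlas.map_embeddings call are the ones he names, but the argument names beyond the embeddings and metadata are from memory and may differ across versions of the nomic client, and the API key, vectors, and fields below are placeholders.

```python
import numpy as np
import nomic
from nomic import atlas

nomic.login("YOUR_NOMIC_API_KEY")  # placeholder API key

# Pretend these came from your own embedding model
embeddings = np.random.rand(10_000, 768)
metadata = [
    {"id": i, "text": f"document {i}", "loss": float(np.random.rand())}
    for i in range(10_000)
]

# Uploads the vectors plus metadata; a few minutes later the returned
# project points at an interactive map you can open in the browser
project = atlas.map_embeddings(embeddings=embeddings, data=metadata)
print(project)  # prints a link to the map
```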
[02:05:51.700 --> 02:05:52.540] - Yeah. [02:05:52.540 --> 02:05:53.860] - Do you have a query language? [02:05:53.860 --> 02:05:56.180] What are you thinking about as the overall vision? [02:05:56.180 --> 02:05:58.420] - Yeah, so I wanna bring it back to the analogy [02:05:58.420 --> 02:06:00.940] of like the new shape of data disrupting the stack, right? [02:06:00.940 --> 02:06:02.140] So the first place we see it hitting [02:06:02.140 --> 02:06:03.540] is at the database layer. [02:06:03.540 --> 02:06:05.060] Things, you know, we see vector databases. [02:06:05.060 --> 02:06:06.500] There's a million of them nowadays. [02:06:06.500 --> 02:06:08.220] I think that that change is gonna propagate [02:06:08.220 --> 02:06:09.660] all the way up the stack. [02:06:09.660 --> 02:06:10.840] And we are interested in, you know, [02:06:10.840 --> 02:06:12.620] what happens to the BI analytics, [02:06:12.620 --> 02:06:14.220] you know, visualization layer. [02:06:14.220 --> 02:06:15.820] And so really what we're thinking of this as [02:06:15.820 --> 02:06:18.100] is sort of like a Tableau for unstructured data [02:06:18.100 --> 02:06:20.420] or a Looker or Power BI or something like this, [02:06:20.420 --> 02:06:22.980] where we've built the entire visualization system [02:06:22.980 --> 02:06:24.720] with embeddings as a first class citizen. [02:06:24.720 --> 02:06:27.380] And so that enables a lot of different actions. [02:06:27.380 --> 02:06:28.500] Some are already in the platform. [02:06:28.500 --> 02:06:30.500] Some I can't tease yet, unfortunately. [02:06:30.500 --> 02:06:33.680] But having embeddings as a first class primitive [02:06:33.680 --> 02:06:36.300] enables a lot of like very, very useful things [02:06:36.300 --> 02:06:38.740] that you're not gonna be able to get unless you have that. [02:06:38.740 --> 02:06:41.380] - What do people use Atlas for? [02:06:41.380 --> 02:06:43.900] Like just maybe list out some more use cases [02:06:43.900 --> 02:06:44.980] that might not be obvious [02:06:44.980 --> 02:06:46.860] from people just thinking about visualization. [02:06:46.860 --> 02:06:48.340] - Yeah, so we'll start with the most technical [02:06:48.340 --> 02:06:49.580] and we'll go to the least technical. [02:06:49.580 --> 02:06:51.620] A lot of ML engineers use it to understand [02:06:51.620 --> 02:06:53.760] and evaluate their models and training data. [02:06:53.760 --> 02:06:55.500] So we just did some work with Hugging Face [02:06:55.500 --> 02:06:57.180] on their OBELICS data set, [02:06:57.180 --> 02:06:59.280] which they use to train their IDEFICS model, [02:06:59.280 --> 02:07:02.340] doing some evaluation and training data analysis, [02:07:02.340 --> 02:07:03.660] looking at what areas of their-- [02:07:03.660 --> 02:07:04.620] - We actually interviewed those guys. [02:07:04.620 --> 02:07:06.860] I was in Paris and I talked to Leo and-- [02:07:06.860 --> 02:07:07.700] - And Victor. [02:07:07.700 --> 02:07:08.520] - Yeah, yeah. [02:07:08.520 --> 02:07:09.360] - Yeah, those guys are sick. [02:07:09.360 --> 02:07:10.540] But yeah, so we worked with them on this [02:07:10.540 --> 02:07:12.460] and we discovered a couple of things [02:07:12.460 --> 02:07:13.300] in their training data [02:07:13.300 --> 02:07:15.140] that they should have like actually cleaned out of it. [02:07:15.140 --> 02:07:17.480] There was like a bunch of end of sentence tokens [02:07:17.480 --> 02:07:19.620] to be replaced that made it through, stuff like this.
[02:07:19.620 --> 02:07:20.460] Some really garbage content. [02:07:20.460 --> 02:07:21.900] - Do you do anomaly detection? [02:07:21.900 --> 02:07:24.260] Or is that up to people to code themselves? [02:07:24.260 --> 02:07:26.260] - Yeah, so the anomalies usually manifest [02:07:26.260 --> 02:07:28.620] as like the little moons on the outside of the map. [02:07:28.620 --> 02:07:29.460] - Oh, sure, okay. [02:07:29.460 --> 02:07:30.280] - And then you can just like hit 'em [02:07:30.280 --> 02:07:32.660] with the little lasso tool and stuff like this. [02:07:32.660 --> 02:07:34.380] But one of the things about the Hugging Face map [02:07:34.380 --> 02:07:36.460] that I found fascinating was [02:07:36.460 --> 02:07:39.180] because we supply like a topic model out of the box, [02:07:39.180 --> 02:07:41.420] you can look at things like are there topics [02:07:41.420 --> 02:07:43.780] where the loss tends to like cluster together? [02:07:43.780 --> 02:07:44.940] And for the Hugging Face model, [02:07:44.940 --> 02:07:47.980] there was this high loss mode in the poetry topic, [02:07:47.980 --> 02:07:49.500] which I thought was super interesting. [02:07:49.500 --> 02:07:51.220] And so I've got two theories for it. [02:07:51.220 --> 02:07:53.900] One is that poetry includes the distinct subversion [02:07:53.900 --> 02:07:55.820] of like common linguistic patterns. [02:07:55.820 --> 02:07:58.220] And so of course, language models will be bad at it. [02:07:58.220 --> 02:08:00.060] But the more perhaps optimistic theory [02:08:00.060 --> 02:08:01.420] is that poetry captures something [02:08:01.420 --> 02:08:02.420] that's fundamentally human [02:08:02.420 --> 02:08:04.340] that the machines have not grasped yet. [02:08:04.340 --> 02:08:06.540] The pragmatic version, I think, [02:08:06.540 --> 02:08:07.540] is probably what's happening, [02:08:07.540 --> 02:08:09.700] but I like to be optimistic, so. [02:08:09.700 --> 02:08:11.460] - IDEFICS is a visual data set. [02:08:11.460 --> 02:08:12.620] And you were-- - It's multimodal. [02:08:12.620 --> 02:08:14.500] - Yeah, okay, so they have poetry in there. [02:08:14.500 --> 02:08:15.420] - Yep. - Interesting. [02:08:15.420 --> 02:08:17.000] - It's sort of interleaved webpages [02:08:17.000 --> 02:08:18.700] of like it'll be an image and then some poetry. [02:08:18.700 --> 02:08:20.460] - Right, so that's the more technical side. [02:08:20.460 --> 02:08:22.240] - And then coming down to the less technical side, [02:08:22.240 --> 02:08:24.060] you know, a lot of our customer base at this point [02:08:24.060 --> 02:08:26.340] is like consulting type companies. [02:08:26.340 --> 02:08:28.060] And they find the product really useful [02:08:28.060 --> 02:08:31.100] for connecting domain experts with large data sets. [02:08:31.100 --> 02:08:32.620] So generally what will happen [02:08:32.620 --> 02:08:33.940] is you'll have these domain experts, [02:08:33.940 --> 02:08:36.120] be it like a doctor or someone in regulation, [02:08:36.120 --> 02:08:38.020] someone with subject matter expertise, [02:08:38.020 --> 02:08:39.820] that'll be handed this massive set of documents [02:08:39.820 --> 02:08:41.740] from a client and be like, I don't even know where to start. [02:08:41.740 --> 02:08:43.140] I don't even know what's in this.
[02:08:43.140 --> 02:08:45.580] And so a couple of the consulting partners we work with [02:08:45.580 --> 02:08:48.180] actually now have a KPI that's like timed to Atlas, [02:08:48.180 --> 02:08:49.860] where it's like how quickly from the data set [02:08:49.860 --> 02:08:52.360] hitting the company does it get to Atlas [02:08:52.360 --> 02:08:54.480] so that we can send an analyst the map [02:08:54.480 --> 02:08:56.120] and they can start to explore it. [02:08:56.120 --> 02:08:58.120] And so we're really excited about enabling [02:08:58.120 --> 02:09:00.840] sort of traditionally non-technical people [02:09:00.840 --> 02:09:02.960] to explore and analyze these massive data sets [02:09:02.960 --> 02:09:04.560] with this no-code interface. [02:09:04.560 --> 02:09:05.400] - You know what you should do? [02:09:05.400 --> 02:09:06.640] You should hook up with Google. [02:09:06.640 --> 02:09:07.680] Doesn't Google have a big set [02:09:07.680 --> 02:09:09.760] of like publicly available data sets? [02:09:09.760 --> 02:09:11.760] - Yeah, so we've actually done a couple of collaborations [02:09:11.760 --> 02:09:13.600] with Google Cloud on some of those data sets. [02:09:13.600 --> 02:09:15.320] We can maybe link the blog posts or something. [02:09:15.320 --> 02:09:16.360] - Sure, yeah. - Yeah. [02:09:16.360 --> 02:09:17.200] - Okay, awesome. [02:09:17.200 --> 02:09:18.040] Just on NeurIPS in general, [02:09:18.040 --> 02:09:19.380] you've been here a number of years. [02:09:19.380 --> 02:09:20.920] What do you look for when you come to NeurIPS? [02:09:20.920 --> 02:09:23.000] Any tips that you have for people coming to NeurIPS? [02:09:23.000 --> 02:09:24.040] - Oh, that's a good one. [02:09:24.040 --> 02:09:27.200] Yeah, big tip is just like if you see someone cool, [02:09:27.200 --> 02:09:28.400] like they're probably nice, [02:09:28.400 --> 02:09:30.940] so chase them down and like have them talk to you. [02:09:30.940 --> 02:09:31.780] - I love it. [02:09:31.780 --> 02:09:33.080] - Shove a microphone in their face. [02:09:33.080 --> 02:09:33.920] - Yeah, yeah. [02:09:33.920 --> 02:09:34.760] (laughing) [02:09:34.760 --> 02:09:35.580] No, I love it. [02:09:35.580 --> 02:09:37.240] But it was like my second NeurIPS or something, [02:09:37.240 --> 02:09:39.560] I saw Oriol Vinyals walk by [02:09:39.560 --> 02:09:42.600] and he had just done like the StarCraft stuff [02:09:42.600 --> 02:09:44.520] and I was like, okay, this guy is sick. [02:09:44.520 --> 02:09:46.080] He's doing some really cutting edge stuff. [02:09:46.080 --> 02:09:48.040] So I like ran up and asked him for life advice [02:09:48.040 --> 02:09:50.280] and he was so down to earth and like chatted with me [02:09:50.280 --> 02:09:53.080] for a bunch of time about like modeling and life [02:09:53.080 --> 02:09:55.280] and you know, how to think about my career and stuff. [02:09:55.280 --> 02:09:57.980] And so like, yeah, if you see a hero, like shoot your shot. [02:09:57.980 --> 02:09:59.800] - Yeah, yeah, very, very, very cool. [02:09:59.800 --> 02:10:02.400] Any papers that you're keen on this year [02:10:02.400 --> 02:10:05.360] or like maybe really affected you in previous years? [02:10:05.360 --> 02:10:06.880] - Oh, that's a good one too. [02:10:06.880 --> 02:10:08.480] This year, I think QLoRA's here, [02:10:08.480 --> 02:10:10.160] which I think is like a very, very interesting-- [02:10:10.160 --> 02:10:11.360] - Tim, I think is tomorrow. [02:10:11.360 --> 02:10:12.200] - Yeah, yeah, yeah, yeah.
[02:10:12.200 --> 02:10:14.660] It's a very interesting set of implications [02:10:14.660 --> 02:10:16.080] for like the low resource world. [02:10:16.080 --> 02:10:16.920] - Can you elaborate? [02:10:16.920 --> 02:10:19.080] - Yeah, so one of the things we think a lot about at Nomic [02:10:19.080 --> 02:10:21.760] is the accessibility of AI technology. [02:10:21.760 --> 02:10:23.760] And one of the things that's become very clear to us [02:10:23.760 --> 02:10:25.520] and I think everyone this year is like, [02:10:25.520 --> 02:10:27.720] there's the GPU rich and the GPU poor. [02:10:27.720 --> 02:10:29.320] And so I think methods that make it [02:10:29.320 --> 02:10:31.420] so that anyone in the world can interact [02:10:31.420 --> 02:10:33.640] with this technology like QLoRA [02:10:33.640 --> 02:10:36.120] are just like so, so, so valuable. [02:10:36.120 --> 02:10:38.080] And so I think any research into like [02:10:38.080 --> 02:10:39.800] low resource training of models [02:10:39.800 --> 02:10:41.160] and low resource deployment of models [02:10:41.160 --> 02:10:42.800] is just gonna be so good for everybody, [02:10:42.800 --> 02:10:44.440] especially like the open source community [02:10:44.440 --> 02:10:45.640] that I really love to see it, so. [02:10:45.640 --> 02:10:47.120] - Yeah, you just reminded me. [02:10:47.120 --> 02:10:50.320] So talking about, we forgot to talk about GPT4All. [02:10:50.320 --> 02:10:52.280] Very, very early win, I think, [02:10:52.280 --> 02:10:54.040] in the overall space of things. [02:10:54.040 --> 02:10:56.200] But now, more recently in my mind, [02:10:56.200 --> 02:10:59.120] llama.cpp has come out to be its own platform. [02:10:59.120 --> 02:11:01.360] Ollama's emerging as like a thing, [02:11:01.360 --> 02:11:03.000] like there's a bunch of ways [02:11:03.000 --> 02:11:04.920] in which people run models locally. [02:11:04.920 --> 02:11:06.520] How should people think about GPT4All [02:11:06.520 --> 02:11:07.600] in the context of all that? [02:11:07.600 --> 02:11:09.760] - Yeah, so one thing that a lot of people don't realize [02:11:09.760 --> 02:11:12.480] is that a lot of the core contributors to llama.cpp [02:11:12.480 --> 02:11:13.820] actually work at Nomic. [02:11:13.820 --> 02:11:16.000] And so I guess the operant advice here [02:11:16.000 --> 02:11:17.800] is just like play nice with open source, right? [02:11:17.800 --> 02:11:19.600] Like GPT4All is this thing [02:11:19.600 --> 02:11:21.580] that's gonna be free forever for our community. [02:11:21.580 --> 02:11:23.320] We're gonna keep trying to improve it [02:11:23.320 --> 02:11:26.240] as our Discord recommends and as people call for. [02:11:26.240 --> 02:11:30.240] But if we can do things like go and contribute [02:11:30.240 --> 02:11:31.920] to other open source projects that are high impact, [02:11:31.920 --> 02:11:32.960] we're going to, right? [02:11:32.960 --> 02:11:37.760] And so the hope here is that as economic pressures apply, [02:11:37.760 --> 02:11:39.080] open source stays collaborative [02:11:39.080 --> 02:11:41.240] is really, really the goal for us, I think. [02:11:41.240 --> 02:11:42.060] - Okay, cool. [02:11:42.060 --> 02:11:42.900] Well, that's it. [02:11:42.900 --> 02:11:43.740] Any other last words? [02:11:43.740 --> 02:11:45.080] What are you looking for? [02:11:45.080 --> 02:11:46.580] How do people find you? [02:11:46.580 --> 02:11:49.880] - Yeah, you can follow us on Twitter at nomic_ai.
[02:11:49.880 --> 02:11:52.640] You can also find our website, nomic.ai. [02:11:52.640 --> 02:11:54.600] - Hiring engineers, researchers? [02:11:54.600 --> 02:11:58.080] - Yeah, we're always looking for super interesting people. [02:11:58.080 --> 02:11:59.560] Yeah, come chat about interesting things [02:11:59.560 --> 02:12:00.760] in our Discord, really. [02:12:00.760 --> 02:12:02.240] You can visit our website and stuff. [02:12:02.240 --> 02:12:03.520] But really the best way to get involved [02:12:03.520 --> 02:12:05.960] is like make some maps, do some open source work. [02:12:05.960 --> 02:12:08.920] Like a lot of the people that we hired [02:12:08.920 --> 02:12:10.920] in this last kind of spree of hiring [02:12:10.920 --> 02:12:12.920] were like big open source contributors. [02:12:12.920 --> 02:12:15.020] And so like, yeah, just give back to the community [02:12:15.020 --> 02:12:17.460] and then, you know, we'll try and find you and boost you. [02:12:17.460 --> 02:12:18.300] - Yeah, awesome. [02:12:18.300 --> 02:12:19.120] Well, thanks so much for your time. [02:12:19.120 --> 02:12:20.280] - Yeah, take care. [02:12:20.280 --> 02:12:21.900] - I think the way that Nomic's embracing [02:12:21.900 --> 02:12:23.960] and supporting open source AI is encouraging [02:12:23.960 --> 02:12:26.260] and I think more companies should learn from that. [02:12:26.260 --> 02:12:27.100] But they're definitely far [02:12:27.100 --> 02:12:29.940] from the only open source AI company out there. [02:12:29.940 --> 02:12:32.020] Lightning AI is one of the oldest, I guess, [02:12:32.020 --> 02:12:34.140] if you can call that old in the space. [02:12:34.140 --> 02:12:37.320] And I happened to catch Luca, the CTO, at their booth [02:12:37.320 --> 02:12:40.100] at NeurIPS, where they were there to launch Lightning Studio, [02:12:40.100 --> 02:12:41.880] which is their new development environment. [02:12:41.880 --> 02:12:42.720] Hey Luca, welcome. [02:12:42.720 --> 02:12:44.760] Good to see that you guys are launching a new product today. [02:12:44.760 --> 02:12:45.600] - Yeah, sure. [02:12:45.600 --> 02:12:46.780] It's super exciting. [02:12:46.780 --> 02:12:48.400] It's the result of many months, [02:12:48.400 --> 02:12:51.420] if not years of work and realizations. [02:12:51.420 --> 02:12:52.820] - So maybe let's establish a baseline. [02:12:52.820 --> 02:12:55.180] Most people will have heard of PyTorch Lightning. [02:12:55.180 --> 02:12:56.540] What was the evolution to Lightning AI? [02:12:56.540 --> 02:12:59.260] - Yeah, so PyTorch Lightning is a very healthy, [02:12:59.260 --> 02:13:01.600] has a very healthy community of people using it. [02:13:01.600 --> 02:13:06.060] We are at 5.5 downloads, about 80 million downloads in, [02:13:06.060 --> 02:13:07.880] sorry, 5.5 million downloads. [02:13:07.880 --> 02:13:08.720] - Of course. [02:13:08.720 --> 02:13:11.080] - Per month, about 80 million downloads in total. [02:13:11.080 --> 02:13:14.840] And it's one of the frameworks that comes from the era [02:13:14.840 --> 02:13:18.160] of traditional, quote unquote, deep learning, [02:13:18.160 --> 02:13:22.560] that is one of the main actors in the Gen AI space. [02:13:22.560 --> 02:13:25.080] Because, for example, Stable Diffusion was trained [02:13:25.080 --> 02:13:27.480] using PyTorch Lightning, a bunch of models. [02:13:27.480 --> 02:13:30.520] PyTorch Lightning powers NeMo from NVIDIA. [02:13:30.520 --> 02:13:34.000] - Yeah, their custom chip design language model.
[02:13:34.000 --> 02:13:37.560] - Yeah, so basically, PyTorch Lightning has evolved [02:13:37.560 --> 02:13:40.320] and grown into Gen AI. [02:13:40.320 --> 02:13:43.720] And with the release of 2.0, 2.1, [02:13:43.720 --> 02:13:47.400] we've tried to make it better and better for use cases [02:13:47.400 --> 02:13:50.360] where you have very large models and you have a hard time [02:13:50.360 --> 02:13:54.260] not going out of memory, so. (laughs) [02:13:54.260 --> 02:13:57.480] And to distribute it with, PyTorch Lightning has always [02:13:57.480 --> 02:14:01.760] been very focused on distributed training. [02:14:01.760 --> 02:14:04.500] It's one of the things that it did the best. [02:14:05.420 --> 02:14:09.200] But when models get very large, I think that's where [02:14:09.200 --> 02:14:11.000] we improved a lot this year. [02:14:11.000 --> 02:14:14.640] We also launched Fabric, Lightning Fabric, [02:14:14.640 --> 02:14:17.680] which is a, it's a framework, it's a companion framework [02:14:17.680 --> 02:14:21.200] to PyTorch Lightning, where you get all the constituents [02:14:21.200 --> 02:14:23.260] of the Lightning trainer, but now you can write [02:14:23.260 --> 02:14:24.600] your own training loops. [02:14:24.600 --> 02:14:28.560] So for people doing very optimized stuff, very bespoke, [02:14:28.560 --> 02:14:33.800] I don't know, collective calls, they want to place them [02:14:33.800 --> 02:14:36.680] where they want, they want to fully own the training loop, [02:14:36.680 --> 02:14:39.240] or they're doing stuff like reinforcement learning [02:14:39.240 --> 02:14:42.040] where it's not the traditional training loop, [02:14:42.040 --> 02:14:43.480] you can still do it with the trainer, [02:14:43.480 --> 02:14:47.520] but it's a bit more difficult. [02:14:47.520 --> 02:14:50.920] Then Fabric lets you just write your for loops. [02:14:50.920 --> 02:14:55.360] But we'll still abstract away strategies, precision plugins, [02:14:55.360 --> 02:14:58.560] the logging, the aggregation of metrics, and all this stuff. [02:14:58.560 --> 02:15:00.940] I like to think about these frameworks as frameworks [02:15:00.940 --> 02:15:04.560] that reduce the surface area for mistakes. [02:15:04.560 --> 02:15:07.560] Because mistakes nowadays, well, a few years ago-- [02:15:07.560 --> 02:15:09.580] >> Cost a lot of money. >> Mistakes, exactly, right? [02:15:09.580 --> 02:15:13.300] They cost a lot of time to a PhD student, [02:15:13.300 --> 02:15:14.960] right now they cost a lot of money, [02:15:14.960 --> 02:15:18.960] so you don't want to make too many mistakes there. [02:15:18.960 --> 02:15:22.680] And TorchMetrics is a third project that we have [02:15:22.680 --> 02:15:24.680] that is very healthy and is powering [02:15:24.680 --> 02:15:26.040] a lot of the metric computation. [02:15:26.040 --> 02:15:28.780] Again, you don't want to compute accuracy [02:15:28.780 --> 02:15:32.140] and aggregate it across a multi-machine job [02:15:32.140 --> 02:15:33.140] in the wrong way, right? [02:15:33.140 --> 02:15:35.560] Because you'll get wrong indications [02:15:35.560 --> 02:15:37.720] and it's really easy to do it incorrectly. [02:15:37.720 --> 02:15:41.360] And this year we started doing-- [02:15:41.360 --> 02:15:43.360] >> Yeah, as you mentioned, these are mostly open source. [02:15:43.360 --> 02:15:45.360] >> Yeah, these are all 100% open source. [02:15:45.360 --> 02:15:47.680] >> I think Fabric in particular was pretty popular.
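To make the "you write the loop, Fabric handles the rest" idea concrete, here is a minimal sketch of a Fabric training loop on a toy model. It is an illustration of the pattern Luca describes rather than code from Lightning's docs, and the precision and strategy arguments are the knobs you would normally change for large-model work:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from lightning.fabric import Fabric

# Fabric takes care of device placement, precision, and distribution;
# switching to precision="bf16-mixed" or strategy="ddp" needs no loop changes.
fabric = Fabric(accelerator="auto", devices=1)
fabric.launch()

model = nn.Linear(32, 2)  # toy stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = fabric.setup(model, optimizer)

data = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
loader = fabric.setup_dataloaders(DataLoader(data, batch_size=16))

for epoch in range(2):
    for x, y in loader:          # batches arrive on the right device already
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        fabric.backward(loss)    # replaces loss.backward(); precision-aware
        optimizer.step()
```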
[02:15:47.680 --> 02:15:50.200] >> Yeah, yeah, so Fabric has powered also [02:15:50.200 --> 02:15:54.360] our language model repositories, [02:15:54.360 --> 02:15:55.760] Lit-LLaMA and LitGPT. [02:15:57.040 --> 02:16:00.080] Basically, back when LLaMA was originally released, [02:16:00.080 --> 02:16:04.920] I, and me, of course, and-- [02:16:04.920 --> 02:16:06.760] >> The weights were leaked. [02:16:06.760 --> 02:16:07.800] The weights were leaked. [02:16:07.800 --> 02:16:09.600] >> Yeah, the weights, yeah, exactly, exactly. [02:16:09.600 --> 02:16:12.680] But at some point there was a model [02:16:12.680 --> 02:16:15.560] being published by Meta as well. [02:16:15.560 --> 02:16:19.640] It was a GPL license, so we didn't really like that. [02:16:19.640 --> 02:16:22.920] And so we say, why don't we take NanoGPT, [02:16:22.920 --> 02:16:25.640] because I was working with NanoGPT at the time, [02:16:25.640 --> 02:16:27.720] and turn it into LLaMA. [02:16:27.720 --> 02:16:29.160] And that started the whole thing [02:16:29.160 --> 02:16:31.840] of minimal implementation, single file, [02:16:31.840 --> 02:16:34.200] you have everything there, you have no layers to go through [02:16:34.200 --> 02:16:37.560] to understand how your layers are implemented. [02:16:37.560 --> 02:16:41.480] And that became something that became very popular [02:16:41.480 --> 02:16:43.400] within many organizations. [02:16:43.400 --> 02:16:44.680] So it's still very popular. [02:16:44.680 --> 02:16:47.780] So the LLM Efficiency Challenge, [02:16:47.780 --> 02:16:50.600] the starter kit had LitGPT in it. [02:16:50.600 --> 02:16:52.540] And LitGPT today supports many models, [02:16:52.540 --> 02:16:53.900] many different models. [02:16:53.900 --> 02:16:55.880] But it's very easy to get to the bottom [02:16:55.880 --> 02:16:58.840] of the implementation of every single thing. [02:16:58.840 --> 02:16:59.800] >> Yeah, it's a-- >> So it's very hackable. [02:16:59.800 --> 02:17:01.400] >> Yeah, it's one-- >> My philosophy is-- [02:17:01.400 --> 02:17:02.240] >> File. [02:17:02.240 --> 02:17:05.600] >> Make it hackable before you make it fast, right? [02:17:05.600 --> 02:17:07.280] Because more people can contribute to it, [02:17:07.280 --> 02:17:10.320] and we have contributors being very successful. [02:17:10.320 --> 02:17:13.000] There have been initiatives of models [02:17:13.000 --> 02:17:17.140] being pre-trained using that, like TinyLlama. [02:17:17.140 --> 02:17:21.200] And 360AI, I think, a few days ago came out, [02:17:21.200 --> 02:17:24.680] and they said they used Lit-LLaMA to pre-train [02:17:24.680 --> 02:17:26.200] their seven billion parameter model. [02:17:26.200 --> 02:17:27.880] So it's great. [02:17:27.880 --> 02:17:30.600] And a lot of those learnings went back into Fabric [02:17:30.600 --> 02:17:32.200] and back into PyTorch Lightning. [02:17:32.200 --> 02:17:34.680] And this is how we're kind of growing organically [02:17:34.680 --> 02:17:36.840] towards supporting some AI use cases. [02:17:36.840 --> 02:17:38.700] >> Is there an example of one of those learnings [02:17:38.700 --> 02:17:43.120] from those outside usages of LitGPT? [02:17:43.120 --> 02:17:44.560] Oh, Lit-LLaMA, I guess. [02:17:44.560 --> 02:17:45.400] >> Sorry, can you say that? [02:17:45.400 --> 02:17:46.880] >> What's an example of one of those learnings [02:17:46.880 --> 02:17:50.000] that you got from 360 contributing back?
[02:17:50.960 --> 02:17:54.160] >> Well, 360 is very young in the sense [02:17:54.160 --> 02:17:57.480] that we just learned, I think, the day before yesterday [02:17:57.480 --> 02:18:00.040] that they used us, so it's great. [02:18:00.040 --> 02:18:01.880] We're very happy about that. [02:18:01.880 --> 02:18:05.040] From TinyLlama, they did some optimizations [02:18:05.040 --> 02:18:06.060] on top of our code. [02:18:06.060 --> 02:18:10.960] And they trained a 1.1 billion parameter model [02:18:10.960 --> 02:18:12.680] on three trillion tokens. [02:18:12.680 --> 02:18:14.200] I think they're still doing that. [02:18:14.200 --> 02:18:16.000] I don't think they're done. [02:18:16.000 --> 02:18:19.280] And then some of the improvements that they made, [02:18:19.280 --> 02:18:22.880] then we upstreamed it to our, like for example, [02:18:22.880 --> 02:18:27.040] I think chunked cross-entropy, [02:18:27.040 --> 02:18:29.860] some kernels that they were using. [02:18:29.860 --> 02:18:35.040] And then we were happy to see that even our data set [02:18:35.040 --> 02:18:37.400] that we optimized because it chunks your data [02:18:37.400 --> 02:18:41.120] and it can stream very quickly, worked for them. [02:18:41.120 --> 02:18:45.000] So it's kind of a mutual thing that we're doing. [02:18:45.000 --> 02:18:47.800] And also, all the quantization support. [02:18:47.800 --> 02:18:50.320] For example, right now, Fabric and PyTorch Lightning [02:18:50.320 --> 02:18:52.660] support bitsandbytes natively. [02:18:52.660 --> 02:18:57.420] And it's basically one of the few solutions [02:18:57.420 --> 02:19:01.440] where you can use quantization on any kind of model [02:19:01.440 --> 02:19:04.580] and not just the model that the original authors [02:19:04.580 --> 02:19:05.720] decided to support. [02:19:05.720 --> 02:19:07.200] Yeah, so it's kind of flexible. [02:19:07.200 --> 02:19:10.600] >> But here today, I think the main thing we're doing today [02:19:10.600 --> 02:19:12.040] is launching our platform. [02:19:12.040 --> 02:19:14.000] >> Yeah, you just launched Studio today. [02:19:14.000 --> 02:19:14.920] >> Yeah, exactly. [02:19:14.920 --> 02:19:18.280] Lightning Studio, again, is a result of many months [02:19:18.280 --> 02:19:20.520] and years of work. [02:19:20.520 --> 02:19:24.600] It basically lets you build AI at scale, [02:19:24.600 --> 02:19:26.160] but it feels like it's your laptop. [02:19:26.160 --> 02:19:30.680] So to me, it's kind of the first time I've seen a platform [02:19:30.680 --> 02:19:33.720] not leaking the abstraction of orchestration [02:19:33.720 --> 02:19:35.080] on the cloud and so on. [02:19:35.080 --> 02:19:38.000] Literally, there's nothing to learn, right? [02:19:38.000 --> 02:19:40.960] >> You put VS Code in the browser and then you add all that. [02:19:40.960 --> 02:19:43.060] >> You can even connect from your local VS Code [02:19:43.060 --> 02:19:43.900] and code there. [02:19:43.900 --> 02:19:45.920] >> You have the whole machine, it's a whole machine. [02:19:45.920 --> 02:19:47.720] >> Yeah, it's a cloud development environment. [02:19:47.720 --> 02:19:48.760] >> Exactly. [02:19:48.760 --> 02:19:51.360] And it's built around reproducible environment. [02:19:51.360 --> 02:19:52.200] Yeah, exactly.
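On the bitsandbytes support Luca mentions above, enabling quantization through Fabric looks roughly like the following in recent Lightning releases. The plugin name and arguments are written from memory and may differ by version, so treat this as a sketch and check the Lightning docs; it also assumes a CUDA GPU and the bitsandbytes package are available.

```python
import torch
from lightning.fabric import Fabric
from lightning.fabric.plugins import BitsandbytesPrecision  # name as of recent Lightning versions

# Quantize the model's Linear layers to 4-bit NF4 when Fabric sets the module up.
precision = BitsandbytesPrecision(mode="nf4", dtype=torch.bfloat16)
fabric = Fabric(accelerator="cuda", devices=1, plugins=precision)
fabric.launch()

# Any nn.Module works here; it does not have to be a model whose original
# authors shipped quantization support.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)
model = fabric.setup_module(model)  # weights get swapped for 4-bit bnb layers here
```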
[02:19:52.200 --> 02:19:54.520] But when you go in there, it's not that you need to build [02:19:54.520 --> 02:19:57.600] your Docker container, you just go in there, [02:19:57.600 --> 02:19:58.840] you present it with a machine, [02:19:58.840 --> 02:20:00.360] you can start working immediately. [02:20:00.360 --> 02:20:02.780] If you pip install something and then you decide [02:20:02.780 --> 02:20:06.240] to switch instance type, your dependencies will carry over. [02:20:06.240 --> 02:20:10.620] Or if you decide to duplicate my studio, [02:20:10.620 --> 02:20:12.360] everything that I set up on that studio, [02:20:12.360 --> 02:20:14.960] from the environment to the data, the code, [02:20:14.960 --> 02:20:18.120] the checkpoints eventually that I put there, [02:20:18.120 --> 02:20:21.140] you will find them and so you will spend zero time [02:20:21.140 --> 02:20:21.980] setting up your environment. [02:20:21.980 --> 02:20:23.520] >> So are you snapshotting memory? [02:20:23.520 --> 02:20:24.720] How does this work? [02:20:24.720 --> 02:20:27.080] >> Well, that's secret sauce. [02:20:27.080 --> 02:20:29.040] >> You're not using containers, you said. [02:20:29.040 --> 02:20:33.680] >> Yeah, well, I mean, we do, if you think about it, [02:20:33.680 --> 02:20:38.680] then it's not too complicated fundamentally, [02:20:38.680 --> 02:20:41.620] but it's very complicated to actually get [02:20:41.620 --> 02:20:43.280] the perfect experience out of it. [02:20:43.280 --> 02:20:45.720] >> Maybe describe your design constraints, [02:20:45.720 --> 02:20:47.840] what are you optimizing for? [02:20:47.840 --> 02:20:49.480] >> We're optimizing for velocity. [02:20:49.480 --> 02:20:53.160] So we don't want people to spend time thinking [02:20:53.160 --> 02:20:55.080] about things they shouldn't think about. [02:20:55.080 --> 02:20:57.600] Like when you're coding on a machine [02:20:57.600 --> 02:21:01.060] and you now want four GPUs, you should just be able [02:21:01.060 --> 02:21:03.600] to get four GPUs and keep working, right? [02:21:03.600 --> 02:21:06.680] Without thinking about, oh, now I need to go to a console, [02:21:06.680 --> 02:21:10.120] spin things up, for my environment, attach drives, [02:21:10.120 --> 02:21:12.800] like these are all things you shouldn't think about. [02:21:12.800 --> 02:21:17.320] And again, it goes back to limiting the surface area [02:21:17.320 --> 02:21:18.600] for mistakes, right? [02:21:18.600 --> 02:21:21.200] Because you can do what you're good at [02:21:21.200 --> 02:21:23.280] and not do what you shouldn't mess with. [02:21:23.280 --> 02:21:24.640] >> It's like the fabric philosophy [02:21:24.640 --> 02:21:26.540] that's expanded to the dev environment. [02:21:26.540 --> 02:21:28.160] >> Exactly, exactly. [02:21:28.160 --> 02:21:29.600] And so, yeah, we're very excited. [02:21:29.600 --> 02:21:32.080] You can do small things, like in Colab, [02:21:32.080 --> 02:21:34.960] except that your data is persistent [02:21:34.960 --> 02:21:37.980] and you can switch off and switch on [02:21:37.980 --> 02:21:39.480] and everything will be there. [02:21:40.480 --> 02:21:43.120] Or you can even train large language models. [02:21:43.120 --> 02:21:48.120] >> Yeah, what are the larger customers doing? [02:21:48.120 --> 02:21:49.320] What are you doing for them? [02:21:49.320 --> 02:21:51.080] Because I feel like this might be targeted [02:21:51.080 --> 02:21:52.960] towards the smaller customers. 
[02:21:52.960 --> 02:21:57.160] >> No, actually, we work with very big [02:21:57.160 --> 02:22:01.160] financial institutions and we're actually [02:22:01.160 --> 02:22:03.280] pre-training models ourselves. [02:22:03.280 --> 02:22:08.000] So the scale at which you can operate is pretty large. [02:22:08.000 --> 02:22:09.760] It's not like, it looks like something [02:22:09.760 --> 02:22:12.360] that you can do small stuff with, which is true. [02:22:12.360 --> 02:22:13.600] It's super smooth there. [02:22:13.600 --> 02:22:16.080] But if you need to launch a job on 100 GPUs, [02:22:16.080 --> 02:22:18.440] you can just do it, provided that you have the machines. [02:22:18.440 --> 02:22:22.780] But we manage reservations, so we can target reservations. [02:22:22.780 --> 02:22:26.040] Or you can attach your own cloud account [02:22:26.040 --> 02:22:28.880] and negotiate your quotas with your cloud provider [02:22:28.880 --> 02:22:31.340] and we'll just orchestrate on your cloud account. [02:22:31.340 --> 02:22:34.760] >> Yeah, any cloud providers you would shout out as, [02:22:34.760 --> 02:22:36.400] particularly, I mean, people know the big three clouds, [02:22:36.400 --> 02:22:38.320] but any other providers that you would shout out [02:22:38.320 --> 02:22:40.360] as very good partners to work with so far? [02:22:40.360 --> 02:22:42.680] >> Right now, we've been focusing on AWS. [02:22:42.680 --> 02:22:45.400] We'll expand, of course, because-- [02:22:45.400 --> 02:22:46.960] >> Yeah, everyone needs everyone else. [02:22:46.960 --> 02:22:47.800] >> Yeah, exactly. [02:22:47.800 --> 02:22:49.640] >> Apparently, Oracle's doing very well. [02:22:49.640 --> 02:22:51.700] >> Yeah, yeah, we talked to Oracle. [02:22:51.700 --> 02:22:54.840] We talked to most of the cloud providers out there. [02:22:54.840 --> 02:22:57.660] To us, it's more a matter of sequencing. [02:22:57.660 --> 02:22:59.240] We have a very good relationship, of course, [02:22:59.240 --> 02:23:00.880] with AWS right now. [02:23:00.880 --> 02:23:03.940] They've been supporting us for the launch and so on. [02:23:03.940 --> 02:23:07.680] But surely, we'll get into getting the best machines [02:23:07.680 --> 02:23:09.540] for our customers. [02:23:09.540 --> 02:23:10.520] >> Yeah. [02:23:10.520 --> 02:23:13.240] >> And in the near future, we'll also support [02:23:13.240 --> 02:23:16.120] on-prem clusters in terms of orchestration, [02:23:16.120 --> 02:23:19.220] like Slurm as an orchestrator or as a scheduler. [02:23:19.220 --> 02:23:21.400] >> People have mixed feelings about Slurm. [02:23:21.400 --> 02:23:22.720] >> Well, yeah, but in this case, [02:23:22.720 --> 02:23:23.820] you don't have to deal with it, right? [02:23:23.820 --> 02:23:24.660] >> Yes, yeah, yeah. [02:23:24.660 --> 02:23:27.960] >> We take away the pain and you still can orchestrate [02:23:27.960 --> 02:23:28.940] on top of that. [02:23:29.880 --> 02:23:34.880] It's still not out, but it will come in the near future. [02:23:34.880 --> 02:23:37.120] >> Yeah. [02:23:37.120 --> 02:23:39.460] >> We're already doing that with some companies. [02:23:39.460 --> 02:23:41.900] >> Yeah, so I want to talk about the workshop [02:23:41.900 --> 02:23:42.740] that you're doing on Friday. [02:23:42.740 --> 02:23:43.580] >> Yeah. [02:23:43.580 --> 02:23:44.500] >> The Efficiency Challenge. [02:23:44.500 --> 02:23:45.340] >> Yep. [02:23:45.340 --> 02:23:46.160] >> Was it motivated by a paper? [02:23:46.160 --> 02:23:48.260] I saw it's sort of like a cramming paper. 
[02:23:48.260 --> 02:23:50.940] What's the maximum you can do with one day of compute? [02:23:50.940 --> 02:23:52.020] Something like that? [02:23:52.020 --> 02:23:55.660] >> Yeah, so we know it is because they, [02:23:56.540 --> 02:24:00.000] Mark Saroufim and the other organizers [02:24:00.000 --> 02:24:03.280] ended up choosing LitGPT as one of the models [02:24:03.280 --> 02:24:05.800] for the starter kit, and we were happy about it, of course. [02:24:05.800 --> 02:24:06.640] >> Yeah, yeah. [02:24:06.640 --> 02:24:08.520] >> And so we said, yeah, what we can do together. [02:24:08.520 --> 02:24:12.360] And we ended up, and we really like the principle. [02:24:12.360 --> 02:24:16.160] So we believe smaller models can empower [02:24:16.160 --> 02:24:21.320] people a lot, getting control and understanding [02:24:21.320 --> 02:24:22.960] how to extract value from AI. [02:24:22.960 --> 02:24:26.280] And so I think there's a dire need [02:24:26.280 --> 02:24:29.800] for the consolidation, getting smaller, [02:24:29.800 --> 02:24:34.140] getting more efficiency, and getting the result you want [02:24:34.140 --> 02:24:36.080] in the shortest time as possible. [02:24:36.080 --> 02:24:39.240] And that's how the velocity will increase [02:24:39.240 --> 02:24:44.240] and how eventually open source will get there [02:24:44.240 --> 02:24:47.840] on par, if not beyond, what's available [02:24:47.840 --> 02:24:49.340] in the closed source world. [02:24:49.340 --> 02:24:52.560] So we are fully supportive of that. [02:24:52.560 --> 02:24:54.760] The way we ended up contributing is [02:24:54.760 --> 02:24:59.160] we maintained a public leaderboard, [02:24:59.160 --> 02:25:00.200] and it was a nice experience [02:25:00.200 --> 02:25:02.080] because we integrated with Discord. [02:25:02.080 --> 02:25:03.780] There was a Discord channel. [02:25:03.780 --> 02:25:06.560] >> This is for the Efficiency Challenge Discord? [02:25:06.560 --> 02:25:09.080] >> Yeah, exactly, the Efficiency Challenge Discord, [02:25:09.080 --> 02:25:11.600] and we set up an agent that was running [02:25:11.600 --> 02:25:16.240] on a few of our machines, and people could submit [02:25:16.240 --> 02:25:20.920] through a DM to the bot so that the bot [02:25:20.920 --> 02:25:25.440] would then spin up a job, run things in a queue, [02:25:25.440 --> 02:25:28.520] get back the results from evaluation, [02:25:28.520 --> 02:25:34.960] and then essentially get a ranking on where they were. [02:25:34.960 --> 02:25:38.360] And that, I think, helped a lot, [02:25:38.360 --> 02:25:41.560] motivating people to compete against each other, [02:25:41.560 --> 02:25:43.400] but in a very constructive way. [02:25:43.400 --> 02:25:45.920] And to be honest, in the first month, [02:25:45.920 --> 02:25:48.360] it's been very, very bumpy with that. [02:25:48.360 --> 02:25:50.200] It was all new infrastructure, [02:25:50.200 --> 02:25:52.300] and we were doing it in spare time, [02:25:52.300 --> 02:25:56.240] so it wasn't the best of the experience. [02:25:56.240 --> 02:25:59.800] So together with the community that was on there, [02:25:59.800 --> 02:26:02.800] they helped us figure out what was not going well, [02:26:02.800 --> 02:26:05.120] and I think at the end, we had more [02:26:05.120 --> 02:26:08.400] than 1,000 submissions that were successful.
[02:26:08.400 --> 02:26:10.720] Many more submissions that didn't complete [02:26:10.720 --> 02:26:14.680] because of submission problems, like user code problems, [02:26:14.680 --> 02:26:16.640] but there were more than 1,000 submissions [02:26:16.640 --> 02:26:20.260] that were actually fully evaluated on that leaderboard. [02:26:20.260 --> 02:26:23.200] - So the challenge is over, [02:26:23.200 --> 02:26:25.140] but I don't know if you've done the analysis [02:26:25.140 --> 02:26:28.400] on anything to learn from the winning entries. [02:26:28.400 --> 02:26:32.240] - So we've been, the rest of the organizers, yes, [02:26:32.240 --> 02:26:36.160] they have put a lot of effort in the next three weeks, [02:26:36.160 --> 02:26:38.960] four weeks, to reevaluate everything, [02:26:38.960 --> 02:26:42.600] run the first ones from scratch, [02:26:42.600 --> 02:26:44.520] and they've done an amazing job. [02:26:45.860 --> 02:26:47.400] And some of the code that we wrote [02:26:47.400 --> 02:26:49.360] for the public leaderboard ended up [02:26:49.360 --> 02:26:53.340] being part of this evaluation infrastructure. [02:26:53.340 --> 02:26:55.160] I was very, very busy with the launch. [02:26:55.160 --> 02:26:56.000] - Of course. [02:26:56.000 --> 02:26:57.280] - So I didn't participate there, [02:26:57.280 --> 02:26:58.120] so I'm super curious-- - I'll try to talk [02:26:58.120 --> 02:26:59.400] to Sebastien. - On Friday. [02:26:59.400 --> 02:27:00.640] - Yeah. - Yeah, yeah, yeah. [02:27:00.640 --> 02:27:01.960] - Okay. [02:27:01.960 --> 02:27:06.740] - What will be the details from the winners. [02:27:06.740 --> 02:27:07.580] - Yes. [02:27:07.580 --> 02:27:09.120] - But I must say that all the community [02:27:09.120 --> 02:27:12.720] has been super nice, they were super constructive. [02:27:12.720 --> 02:27:15.140] I remember when Mistral first came out, [02:27:15.140 --> 02:27:17.360] there was a huge thread of people [02:27:17.360 --> 02:27:19.080] just getting in there, analyzing it, [02:27:19.080 --> 02:27:21.840] trying to find it, it was so much energy [02:27:21.840 --> 02:27:25.280] that we definitely want to push it forward [02:27:25.280 --> 02:27:27.900] and we'll create public studios [02:27:27.900 --> 02:27:29.880] with evaluation frameworks on them. [02:27:29.880 --> 02:27:30.960] And we want to enable this kind of-- [02:27:30.960 --> 02:27:32.720] - So studios are shareable, of course. [02:27:32.720 --> 02:27:33.560] - Yes. - Yes, that makes sense. [02:27:33.560 --> 02:27:35.660] - Not only shareable between you and me, [02:27:35.660 --> 02:27:37.840] but also community-wide. - Yeah, yeah, yeah, yeah. [02:27:37.840 --> 02:27:39.420] - So there will be a lot of things, [02:27:39.420 --> 02:27:41.200] you can go in there, use your pre-credits [02:27:41.200 --> 02:27:43.640] to just run the evaluation on your model, [02:27:43.640 --> 02:27:44.480] you can do that. [02:27:44.480 --> 02:27:46.040] - Yeah, great. [02:27:46.040 --> 02:27:48.320] So last question, I've been asking everybody this. [02:27:48.320 --> 02:27:50.760] You've been coming to NeurIPS for many years, [02:27:50.760 --> 02:27:53.360] what are your NeurIPS tips? [02:27:53.360 --> 02:27:56.980] - Oh wow, yeah, go to posters, which I cannot do, but. [02:27:56.980 --> 02:27:58.440] (laughing) [02:27:58.440 --> 02:28:00.840] - But okay, so I've been to like, [02:28:00.840 --> 02:28:02.920] there's been three poster sessions so far. 
[02:28:02.920 --> 02:28:04.640] The popular ones are just crowded, [02:28:04.640 --> 02:28:05.480] there's just no way. [02:28:05.480 --> 02:28:07.080] - Yeah, I think there's not, yeah, [02:28:07.080 --> 02:28:08.640] I don't like to go to popular ones. [02:28:08.640 --> 02:28:10.560] - Yeah, okay, the less popular ones. [02:28:10.560 --> 02:28:11.480] - Yeah, yeah. - Just talk to them. [02:28:11.480 --> 02:28:15.400] - So I had so many super engaging conversations, [02:28:15.400 --> 02:28:18.720] even in topics like, even something is not apparently [02:28:18.720 --> 02:28:21.640] something that you should focus on. [02:28:21.640 --> 02:28:22.480] - Yeah. [02:28:22.480 --> 02:28:25.080] - Your brain will oxygenate itself a lot, [02:28:25.080 --> 02:28:27.120] and typically after these conferences, [02:28:27.120 --> 02:28:30.700] I always come back with a head full of ideas, [02:28:30.700 --> 02:28:35.700] and so I would say get enriched as much as you can [02:28:35.700 --> 02:28:36.760] by interacting with people, [02:28:36.760 --> 02:28:38.620] having very honest conversation with them. [02:28:38.620 --> 02:28:40.160] - Yeah. [02:28:40.160 --> 02:28:42.720] I had an off the record conversation with one [02:28:42.720 --> 02:28:44.120] of the presenters who said like, [02:28:44.120 --> 02:28:46.920] yeah, I don't, this paper I'm presenting, [02:28:46.920 --> 02:28:47.840] I don't believe in it. [02:28:47.840 --> 02:28:49.000] (laughing) [02:28:49.000 --> 02:28:51.120] I was like, wow, that's really honest. [02:28:51.120 --> 02:28:53.200] 'Cause they submitted it months ago, right? [02:28:53.200 --> 02:28:55.560] And since then, the world has moved on. [02:28:55.560 --> 02:28:59.660] - Yeah, well, that's part of the struggle. [02:28:59.660 --> 02:29:03.960] I don't know how it must be being a postdoc [02:29:03.960 --> 02:29:08.920] or a PhD student, or even a master's student, [02:29:08.920 --> 02:29:11.920] nowadays in AI, it must be so stressful. [02:29:11.920 --> 02:29:12.760] (laughing) [02:29:12.760 --> 02:29:13.580] - Yeah, it is. [02:29:13.580 --> 02:29:15.360] - Back in the day, it was a lot easier. [02:29:15.360 --> 02:29:16.680] - You know, the prices have got bigger. [02:29:16.680 --> 02:29:18.160] - Yeah, for sure, for sure. [02:29:18.160 --> 02:29:19.640] - Okay, well, thank you so much for your time, [02:29:19.640 --> 02:29:21.100] and congrats on your launch. [02:29:21.100 --> 02:29:23.700] - Yeah, I think we'll, you know, [02:29:23.700 --> 02:29:25.440] meet each other on the platform, maybe. [02:29:25.440 --> 02:29:27.160] - Yes, I definitely will try it out. [02:29:27.160 --> 02:29:28.680] - Thank you. - Thanks a lot, bye. [02:29:28.680 --> 02:29:31.400] - A few of my AI engineer and ML engineer friends [02:29:31.400 --> 02:29:32.660] checked out Lightning Studio, [02:29:32.660 --> 02:29:33.520] and they were pretty impressed, [02:29:33.520 --> 02:29:36.320] so I'm personally interested to check it out next year. [02:29:36.320 --> 02:29:37.680] But last but not least, [02:29:37.680 --> 02:29:40.080] I want to give the mic to Jay Alammar [02:29:40.080 --> 02:29:41.800] of Cohere and LLM University, [02:29:41.800 --> 02:29:44.180] but more importantly, of the Illustrated Transformer, [02:29:44.180 --> 02:29:46.000] and he is now writing a new book. [02:29:46.000 --> 02:29:48.360] We're here with Jay Alammar, educator of many things. [02:29:48.360 --> 02:29:50.320] I've learned so much from you.
[02:29:50.320 --> 02:29:52.440] I literally, it's one of those moments [02:29:52.440 --> 02:29:54.200] where at NeurIPS, you just kind of see someone walking, [02:29:54.200 --> 02:29:55.520] and I'm like, "Is that Jay?" [02:29:55.520 --> 02:29:57.800] And then I had to get your attention a few times, [02:29:57.800 --> 02:30:00.120] but it's so nice to finally meet you. [02:30:00.120 --> 02:30:02.700] - It's great to meet you, and great to be here, [02:30:02.700 --> 02:30:04.540] and sort of meet all kinds of brilliant folks. [02:30:04.540 --> 02:30:06.460] I've watched your stuff, [02:30:06.460 --> 02:30:09.360] and sort of been watching the revolution, [02:30:09.360 --> 02:30:11.520] and how you're helping sort of crystallize [02:30:11.520 --> 02:30:15.080] people's thinking about this new domain of AI engineering, [02:30:15.080 --> 02:30:18.240] and so I think the title is very helpful [02:30:18.240 --> 02:30:20.800] as categorizing that class. [02:30:20.800 --> 02:30:22.680] - Yeah, trying to do for my audience [02:30:22.680 --> 02:30:25.360] what you do for just general ML education, [02:30:25.360 --> 02:30:28.000] which is, I think, something that you've really done [02:30:28.000 --> 02:30:29.000] an incredible job of. [02:30:29.000 --> 02:30:30.480] - Yeah, no, it's wonderful. [02:30:30.480 --> 02:30:32.080] It's what the community needs, definitely. [02:30:32.080 --> 02:30:36.060] As machine learning and AI sort of goes out of research, [02:30:36.060 --> 02:30:37.660] and goes into industry, and people-- [02:30:37.660 --> 02:30:39.740] - It's a different persona, different background. [02:30:39.740 --> 02:30:40.580] There's the kind of people, [02:30:40.580 --> 02:30:42.100] and one of the reasons I'm doing this recording [02:30:42.100 --> 02:30:44.460] is the kind of people that follow my stuff [02:30:44.460 --> 02:30:48.180] don't come here, and maybe they shouldn't, right? [02:30:48.180 --> 02:30:50.720] Like, some of this is too in-depth. [02:30:50.720 --> 02:30:52.780] But I'm curious, you've been to many NeurIPS. [02:30:52.780 --> 02:30:54.460] What is your general take of the vibe? [02:30:54.460 --> 02:30:55.480] What are people talking about? [02:30:55.480 --> 02:30:56.660] What's top of mind? [02:30:56.660 --> 02:30:59.260] - There's a lot of LLMs that's interesting to see. [02:30:59.260 --> 02:31:00.540] - Suddenly a lot of interest, right? [02:31:00.540 --> 02:31:02.840] - Yes, yes, that is, let's say, [02:31:02.840 --> 02:31:06.960] maybe, possibly a new development in NeurIPS. [02:31:06.960 --> 02:31:09.800] That's the area that's growing. [02:31:09.800 --> 02:31:12.000] And a couple of interesting keywords, [02:31:12.000 --> 02:31:14.760] or groups, or directions are diffusion, [02:31:14.760 --> 02:31:16.640] even diffusion for text models. [02:31:16.640 --> 02:31:17.480] That's interesting. [02:31:17.480 --> 02:31:18.600] - Yeah, there was a paper on that yesterday. [02:31:18.600 --> 02:31:19.440] - Yeah. [02:31:19.440 --> 02:31:22.280] - I'm not too, what's the point of diffusion for text? [02:31:22.280 --> 02:31:24.840] Don't people want to stream things out? [02:31:24.840 --> 02:31:26.240] - Well, I mean, if you think of, [02:31:26.240 --> 02:31:28.080] like, on the application side, [02:31:28.080 --> 02:31:30.860] autoregressive generation has some problems. [02:31:30.860 --> 02:31:33.260] So if the model makes a mistake with token five, [02:31:33.260 --> 02:31:34.500] you're stuck with that problem. 
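A minimal sketch of the failure mode Jay describes here, assuming the Hugging Face transformers library and the public gpt2 checkpoint (neither is named in the episode): greedy autoregressive decoding appends one token at a time, and every later step conditions on whatever has already been emitted, so a bad pick at "token five" can never be revised.

    # Sketch only: greedy decoding with a small causal LM, showing that each
    # generated token becomes part of the fixed prefix for all later steps.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer("The Eiffel Tower is located in", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(10):
            logits = model(ids).logits          # shape: [1, seq_len, vocab]
            next_id = logits[0, -1].argmax()    # greedy choice for the next token
            # Once appended, this token is conditioned on by every later step;
            # if it was a poor choice, the model cannot go back and change it.
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(ids[0]))

Iterative schemes like text diffusion (or the search over branches that comes up next) are attempts to relax exactly this one-way commitment.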
[02:31:34.500 --> 02:31:36.460] - Yes, that's what Tree of Thought solves, [02:31:36.460 --> 02:31:38.180] which the Tree of Thought guy was here. [02:31:38.180 --> 02:31:40.940] - Yeah, so it's one, let's say, one avenue. [02:31:40.940 --> 02:31:41.780] - Yes. [02:31:41.780 --> 02:31:43.420] - But it's like, maybe if you, [02:31:43.420 --> 02:31:48.140] if the model does not fall into a mistake in that way, [02:31:48.140 --> 02:31:51.140] you can unlock new, sort of, different applications. [02:31:51.140 --> 02:31:53.140] But also all the image generation stuff, [02:31:53.140 --> 02:31:53.980] that's really where. [02:31:53.980 --> 02:31:55.580] - Yeah, well, I'll make a plug. [02:31:55.580 --> 02:31:57.500] I actually had a, so there's a lot of house parties [02:31:57.500 --> 02:31:59.700] that happen after NeurIPS, which is fantastic. [02:31:59.700 --> 02:32:01.820] I ran into a guy from Midjourney for the first time. [02:32:01.820 --> 02:32:04.380] They have this new storytelling section, [02:32:04.380 --> 02:32:06.020] and they are actually exploring text diffusion [02:32:06.020 --> 02:32:07.340] because of storytelling. [02:32:07.340 --> 02:32:09.400] Because you have to generate a coherent story, [02:32:09.400 --> 02:32:10.500] just like you would an image. [02:32:10.500 --> 02:32:11.340] - True, true. [02:32:11.340 --> 02:32:13.340] - So I would buy that as a use case. [02:32:13.340 --> 02:32:14.360] - That is fascinating. [02:32:14.360 --> 02:32:16.620] And then with the agent stuff, like, [02:32:16.620 --> 02:32:18.740] if you're interested in the future of agents, [02:32:18.740 --> 02:32:21.580] which is, there's a lot of the reasoning stuff. [02:32:21.580 --> 02:32:23.340] The reasoning research in NeurIPS [02:32:23.340 --> 02:32:26.180] will most likely, sort of, inform the upcoming, [02:32:26.180 --> 02:32:27.940] what's gonna happen in agents next. [02:32:27.940 --> 02:32:30.380] So, chain of thought, tree of thought, [02:32:30.380 --> 02:32:33.700] that domain of research, for me, is very fascinating. [02:32:33.700 --> 02:32:36.780] Because it's gonna be applied very quickly. [02:32:36.780 --> 02:32:39.460] The ReAct paper comes out, it's in LangChain, [02:32:39.460 --> 02:32:40.940] everybody's sort of using it. [02:32:40.940 --> 02:32:43.140] Everybody has a sense of what agents are. [02:32:43.140 --> 02:32:44.780] But that really shows you the potential [02:32:44.780 --> 02:32:46.440] of what they're gonna be in the future. [02:32:46.440 --> 02:32:49.140] We're still in early days on agents. [02:32:49.140 --> 02:32:51.860] - Any other, like, top of mind sessions? [02:32:51.860 --> 02:32:53.480] Did you go to the Chris Ré one this morning? [02:32:53.480 --> 02:32:54.660] I thought that was pretty cool. [02:32:54.660 --> 02:32:57.580] - I have that one and a couple of the other, sort of, [02:32:57.580 --> 02:32:59.820] keynotes, I'll be re-watching them. [02:32:59.820 --> 02:33:02.300] But mostly, I'm just talking with people, [02:33:02.300 --> 02:33:04.580] recording video, that's been my, [02:33:04.580 --> 02:33:06.180] and sort of trying to orient myself. [02:33:06.180 --> 02:33:08.340] It's an overwhelming amount of content, [02:33:08.340 --> 02:33:10.860] and people, and posters, and talks. [02:33:10.860 --> 02:33:13.260] And so, I've been, yeah, looking at visualizations of, [02:33:13.260 --> 02:33:15.220] you know, these are the papers at NeurIPS. [02:33:15.220 --> 02:33:17.100] These are the ones that could be interesting to you.
[02:33:17.100 --> 02:33:17.940] - Yeah, people have published, like, [02:33:17.940 --> 02:33:20.540] t-SNE things of them, and that's good. [02:33:20.540 --> 02:33:22.280] But, like, it's not as good as just, kind of, [02:33:22.280 --> 02:33:23.120] seeing the vibes. [02:33:23.120 --> 02:33:25.260] I actually think the conference organizers [02:33:25.260 --> 02:33:26.980] do a good job of curating, like, [02:33:26.980 --> 02:33:29.740] what the, you know, oral session papers should be. [02:33:29.740 --> 02:33:30.580] - True. [02:33:30.580 --> 02:33:31.980] - You know, like, I've generally found them, like, [02:33:31.980 --> 02:33:32.940] generally very insightful. [02:33:32.940 --> 02:33:34.020] I just found out about DataComp [02:33:34.020 --> 02:33:35.420] from one of the oral sessions. [02:33:35.420 --> 02:33:37.100] I don't know if you've seen them. [02:33:37.100 --> 02:33:38.980] They're effectively a new ImageNet. [02:33:38.980 --> 02:33:39.820] - Oh, nice. [02:33:39.820 --> 02:33:41.220] - Which is, like, oh, that's cool. [02:33:41.220 --> 02:33:42.060] New benchmark. [02:33:42.060 --> 02:33:43.260] And, yeah, I mean, like, it's, [02:33:43.260 --> 02:33:45.060] to me, I'm taking it all in. [02:33:45.060 --> 02:33:48.180] So, it's impressive how many people do so much work, [02:33:48.180 --> 02:33:49.340] and you've never heard of them. [02:33:49.340 --> 02:33:51.700] - That's true, that's true, yeah. [02:33:51.700 --> 02:33:53.200] - And they're conversant in all the techniques, [02:33:53.200 --> 02:33:54.680] all the papers, all the stuff that you see online. [02:33:54.680 --> 02:33:55.520] - Yeah, yeah. [02:33:55.520 --> 02:33:56.440] - They're just not online. [02:33:56.440 --> 02:33:57.280] - That's true. [02:33:57.280 --> 02:33:58.120] - And they just do research quietly, [02:33:58.120 --> 02:33:59.840] and then once a year, they show up here. [02:33:59.840 --> 02:34:01.960] - Yeah, and sometimes you meet somebody here, [02:34:01.960 --> 02:34:03.880] and you're, like, and they would mention, [02:34:03.880 --> 02:34:05.660] they worked on that other paper, [02:34:05.660 --> 02:34:07.680] and it's a paper that you're very familiar with. [02:34:07.680 --> 02:34:10.000] And then you go into their Google Scholar or something, [02:34:10.000 --> 02:34:11.480] and you're, like, I've been reading [02:34:11.480 --> 02:34:14.660] this person's work for years, [02:34:14.660 --> 02:34:16.320] but the name never really, sort of, [02:34:16.320 --> 02:34:18.760] specifically popped up until you meet them in person. [02:34:18.760 --> 02:34:20.080] So, that's why it's, yeah, [02:34:20.080 --> 02:34:22.540] it's definitely an interesting experience. [02:34:22.540 --> 02:34:24.400] - Yeah, no particular order, but have you had those, [02:34:24.400 --> 02:34:26.060] like, any underrated person you would call out [02:34:26.060 --> 02:34:28.020] as, like, hey, everyone should pay more attention [02:34:28.020 --> 02:34:30.020] to the work that this person's doing? [02:34:30.020 --> 02:34:31.980] - One thing that comes across, [02:34:31.980 --> 02:34:34.340] which is why workshops are good, [02:34:34.340 --> 02:34:36.340] and we can get into that, sort of, later, [02:34:36.340 --> 02:34:39.260] is David Bau's work on interpretability [02:34:39.260 --> 02:34:43.100] and editing language models and editing their knowledge [02:34:43.100 --> 02:34:45.900] was one thing that, sort of, really stood out to me [02:34:45.900 --> 02:34:49.540] after I've met David and, sort of, heard about his work.
[02:34:49.540 --> 02:34:51.200] - Editing by editing weights? [02:34:51.200 --> 02:34:54.460] - Yeah, they have a method of editing the model, exactly, yes. [02:34:54.460 --> 02:34:56.000] - Is this the one where they played Go? [02:34:56.000 --> 02:34:57.880] - This is ROME. - And they flipped a-- [02:34:57.880 --> 02:35:00.760] - This is where they convince a model [02:35:00.760 --> 02:35:03.340] using that method that the Eiffel Tower [02:35:03.340 --> 02:35:05.480] is in Rome and not in Paris. [02:35:05.480 --> 02:35:09.240] And then they have subsequent methods of, let's say, [02:35:09.240 --> 02:35:12.000] if you make 100 edits like that, the model degrades. [02:35:12.000 --> 02:35:14.000] So, they have subsequent work on, [02:35:14.000 --> 02:35:16.640] okay, this is a better method to do many more of that. [02:35:16.640 --> 02:35:19.940] But also things like, and I've seen, like, Logit Lens [02:35:19.940 --> 02:35:21.580] and, sort of, where in the model [02:35:21.580 --> 02:35:24.220] is this token being suggested? [02:35:24.220 --> 02:35:26.540] Like, is it at layer one or is it at layer five [02:35:26.540 --> 02:35:30.580] or is it at layer, that localization is interesting work. [02:35:30.580 --> 02:35:32.700] - Yeah, so you do all these interviews on your YouTube, [02:35:32.700 --> 02:35:33.580] we'll send people there. [02:35:33.580 --> 02:35:35.740] Is this part of your work at Cohere, or? [02:35:35.740 --> 02:35:36.580] - A little bit, yes. [02:35:36.580 --> 02:35:37.820] - How does this, what is your deal? [02:35:37.820 --> 02:35:40.140] - Yeah, that's true. [02:35:40.140 --> 02:35:44.740] So, these, a bunch of them go on the Cohere YouTube channel [02:35:44.740 --> 02:35:46.740] and the Cohere socials as well. [02:35:46.740 --> 02:35:48.420] So, yeah, my work at Cohere, [02:35:48.420 --> 02:35:50.340] I get to learn in public, basically. [02:35:50.340 --> 02:35:51.180] - I love that. [02:35:51.180 --> 02:35:53.580] - So, Cohere builds language models for embeddings, [02:35:53.580 --> 02:35:55.020] re-ranking, and generation. [02:35:55.020 --> 02:35:57.440] And through selling them, I get to see what, [02:35:57.440 --> 02:35:59.440] how industry's solving problems with them. [02:35:59.440 --> 02:36:01.680] And that, to me, is very fascinating. [02:36:01.680 --> 02:36:03.780] To see the technology coming out of research [02:36:03.780 --> 02:36:07.340] and then how it goes into industry and how people use them, [02:36:07.340 --> 02:36:09.580] how people, sort of, need to be educated [02:36:09.580 --> 02:36:11.540] on the best ways of using them. [02:36:11.540 --> 02:36:14.860] That view, to me, is something I'm lucky to have. [02:36:14.860 --> 02:36:17.260] - Yeah, yeah, it's a good job to get, to be honest. [02:36:17.260 --> 02:36:19.300] If you love that stuff, you might as well get paid to do it. [02:36:19.300 --> 02:36:20.140] You probably don't know this, [02:36:20.140 --> 02:36:21.860] but I actually have written a book on learning in public, [02:36:21.860 --> 02:36:24.520] and I am a big advocate of getting developers [02:36:24.520 --> 02:36:26.020] and engineers to learn in public. [02:36:26.020 --> 02:36:27.620] - Well, you do it so well, so, yeah. [02:36:27.620 --> 02:36:29.500] - Yeah, this is my way of doing it. [02:36:29.500 --> 02:36:30.780] One final piece is, you know, [02:36:30.780 --> 02:36:32.780] you've written a lot of foundational work on, [02:36:32.780 --> 02:36:33.620] like, transformers.
[02:36:33.620 --> 02:36:35.920] A lot of people are talking about the state-space models [02:36:35.920 --> 02:36:37.940] and what happens after the transformers. [02:36:37.940 --> 02:36:39.080] Do you have personal views on that? [02:36:39.080 --> 02:36:40.080] - Not yet, not yet. [02:36:40.080 --> 02:36:40.920] I'm on the lookout. [02:36:40.920 --> 02:36:43.540] So, there are always new ideas that, you know, [02:36:43.540 --> 02:36:46.080] there's maybe poster number 502 here [02:36:46.080 --> 02:36:47.860] that nobody paid attention to. [02:36:47.860 --> 02:36:50.540] Maybe in six months, we'll see that, [02:36:50.540 --> 02:36:53.460] oh, it crushes everything else on. [02:36:53.460 --> 02:36:56.220] So, that is always something you can never, sort of, expect. [02:36:56.220 --> 02:36:59.060] - Yeah, my favorite fact about the transformer paper, [02:36:59.060 --> 02:37:00.960] it itself was not accepted as, [02:37:00.960 --> 02:37:02.500] it was like a poster-only paper, right? [02:37:02.500 --> 02:37:03.340] - That's true. [02:37:03.340 --> 02:37:04.180] - I don't know the story behind that. [02:37:04.180 --> 02:37:06.600] - It was a big deal for machine translation. [02:37:06.600 --> 02:37:07.440] - Yes. [02:37:07.440 --> 02:37:08.260] - But it's like, okay, yeah, [02:37:08.260 --> 02:37:09.100] there's a cool translation paper. [02:37:09.100 --> 02:37:09.940] - It's one of many, right? [02:37:09.940 --> 02:37:10.780] - Yeah, they already have BERT. [02:37:10.780 --> 02:37:12.180] - One new attention method. [02:37:12.180 --> 02:37:14.740] We had Bahdanau attention, and we had Luong attention, [02:37:14.740 --> 02:37:17.020] and like, now we have also, you know, one more. [02:37:17.020 --> 02:37:18.300] But then BERT comes out, and it's like, [02:37:18.300 --> 02:37:19.860] okay, this is more than translation. [02:37:19.860 --> 02:37:21.060] And then GPT comes out, and it's like, [02:37:21.060 --> 02:37:22.260] oh, this can generate text. [02:37:22.260 --> 02:37:24.300] - I'm still missing a good survey paper [02:37:24.300 --> 02:37:26.460] on everything that happens since Attention Is All You Need. [02:37:26.460 --> 02:37:27.820] Like, the evolution towards [02:37:27.820 --> 02:37:30.140] the modern decoder-only paradigm. [02:37:30.140 --> 02:37:30.980] - Okay. [02:37:30.980 --> 02:37:32.820] - And I feel like someone needs to write that. [02:37:32.820 --> 02:37:34.420] (laughing) [02:37:34.420 --> 02:37:36.340] Everyone's too busy inventing new things [02:37:36.340 --> 02:37:37.980] to stop and write what happens. [02:37:37.980 --> 02:37:39.100] - Because it's a massive thing. [02:37:39.100 --> 02:37:41.180] There are a few people who, [02:37:41.180 --> 02:37:45.060] because there's a lot of work on different kinds of attention [02:37:45.060 --> 02:37:46.420] for the transformer, specifically. [02:37:46.420 --> 02:37:48.940] How to improve it for this problem, for that problem. [02:37:48.940 --> 02:37:52.060] But one thing that I'm doing is rewriting [02:37:52.060 --> 02:37:54.620] the Illustrated Transformer with the ideas [02:37:54.620 --> 02:37:57.060] that have stood the test of time since then. [02:37:57.060 --> 02:37:59.820] So it's like, six years after, which ideas. [02:37:59.820 --> 02:38:01.180] So people are using RoPE-- [02:38:01.180 --> 02:38:02.140] - FlashAttention. [02:38:02.140 --> 02:38:03.980] - FlashAttention. - ALiBi.
[02:38:03.980 --> 02:38:05.820] - And then, yeah, RoPE and ALiBi, [02:38:05.820 --> 02:38:08.540] let's say positional encodings, localized attention. [02:38:08.540 --> 02:38:10.980] So some ideas that people are continuing [02:38:10.980 --> 02:38:12.780] to use over and over. - Grouped query. [02:38:12.780 --> 02:38:15.260] - Multi-query and grouped multi-query, yes. [02:38:15.260 --> 02:38:16.580] - And then sliding window. [02:38:16.580 --> 02:38:18.980] - Not yet, so it's in Mistral, [02:38:18.980 --> 02:38:21.860] but we maybe need to see it in more work. [02:38:21.860 --> 02:38:23.900] - My conspiracy theory about Mistral is that, [02:38:23.900 --> 02:38:25.540] so the Mistral paper heavily features [02:38:25.540 --> 02:38:27.740] sliding window attention, and everyone is like, bullshit. [02:38:27.740 --> 02:38:28.780] Like, come on. [02:38:28.780 --> 02:38:30.260] - I mean, because you see it. [02:38:30.260 --> 02:38:32.340] You saw that for two years after the transformer, [02:38:32.340 --> 02:38:33.780] everybody was proposing new ideas. [02:38:33.780 --> 02:38:36.380] And if you put this in the transformer, [02:38:36.380 --> 02:38:37.420] it does better on this. [02:38:37.420 --> 02:38:38.980] But then, what stands the test of time? [02:38:38.980 --> 02:38:42.740] The vanilla transformer really stood the test of time, [02:38:42.740 --> 02:38:44.740] and did better than even a lot of these [02:38:44.740 --> 02:38:47.020] "enhanced" enhancements. [02:38:47.020 --> 02:38:49.340] But these ones, let's say, stood the test of time. [02:38:49.340 --> 02:38:52.940] So this rewriting is gonna be part of the book [02:38:52.940 --> 02:38:54.940] I'm sort of currently writing. [02:38:54.940 --> 02:38:55.780] - Oh, you're writing a book? [02:38:55.780 --> 02:38:57.300] - Yes, writing a book for O'Reilly [02:38:57.300 --> 02:39:00.380] called Hands-On Large Language Models, [02:39:00.380 --> 02:39:01.740] including this as a chapter. [02:39:01.740 --> 02:39:04.460] So if you want an updated Illustrated Transformer, [02:39:04.460 --> 02:39:05.300] that's gonna be a part of it. [02:39:05.300 --> 02:39:06.300] - Yeah, well, when you launch your book, [02:39:06.300 --> 02:39:08.180] you should come on and do a full episode with us. [02:39:08.180 --> 02:39:09.020] - That'd be amazing. [02:39:09.020 --> 02:39:10.060] - Yeah, exactly. [02:39:10.060 --> 02:39:11.860] And then, just general NeurIPS tips, you know, [02:39:11.860 --> 02:39:13.900] as an attendee, like, if people are coming [02:39:13.900 --> 02:39:15.980] for the first time, what would you advise them to do? [02:39:15.980 --> 02:39:18.420] - I really love the visualization by, [02:39:18.420 --> 02:39:21.860] I posted about this, by Hendrik Strobelt and Ben Hoover, [02:39:21.860 --> 02:39:24.180] of the t-SNE of all the papers. [02:39:24.180 --> 02:39:26.860] But also, it's clustered, so if you're interested [02:39:26.860 --> 02:39:29.180] in language models, that is clustered. [02:39:29.180 --> 02:39:30.300] - I use that for my planning. [02:39:30.300 --> 02:39:31.900] - It's so useful. [02:39:31.900 --> 02:39:33.900] Like, these things are absolutely incredible. [02:39:33.900 --> 02:39:35.940] I got to meet Hendrik, and they have so, [02:39:35.940 --> 02:39:38.180] a lot of very interesting ideas there. [02:39:38.180 --> 02:39:40.540] It helps you sort of orient yourself.
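For anyone who wants to build the kind of paper map being described, a rough sketch, assuming the sentence-transformers and scikit-learn packages plus your own list of paper titles or abstracts (the three titles below are just stand-ins): embed the text, project it to 2D with t-SNE, and plot the clusters.

    # Sketch only: embed paper titles/abstracts, then project with t-SNE.
    from sentence_transformers import SentenceTransformer
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    papers = [
        "Tree of Thoughts: Deliberate Problem Solving with Large Language Models",
        "Direct Preference Optimization: Your Language Model is Secretly a Reward Model",
        "QLoRA: Efficient Finetuning of Quantized LLMs",
    ]  # in practice, thousands of accepted-paper abstracts

    model = SentenceTransformer("all-MiniLM-L6-v2")   # small general-purpose embedder
    embeddings = model.encode(papers)                 # shape: (n_papers, 384)

    # perplexity must be smaller than the number of samples; with a real
    # corpus the default value is fine
    coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(embeddings)

    plt.scatter(coords[:, 0], coords[:, 1])
    for (x, y), title in zip(coords, papers):
        plt.annotate(title[:30], (x, y), fontsize=7)
    plt.title("Toy t-SNE map of paper embeddings")
    plt.show()

The same embeddings can also back the semantic search mentioned next, since a query like "agent papers" can be matched by vector similarity rather than exact keywords.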
[02:39:40.540 --> 02:39:42.780] And I've also seen work, kind of like it, [02:39:42.780 --> 02:39:45.580] but where you can do semantic search on, [02:39:45.580 --> 02:39:48.060] so you can say, you know, agent papers, [02:39:48.060 --> 02:39:51.460] and it doesn't need to match the actual keywords. [02:39:51.460 --> 02:39:53.700] With Cohere, we have a demo on RAG [02:39:53.700 --> 02:39:55.060] on NeurIPS papers as well. [02:39:55.060 --> 02:39:56.700] So, you can ask a question, you're like, [02:39:56.700 --> 02:39:58.780] okay, I'm interested in LLM and efficiency, [02:39:58.780 --> 02:40:01.060] it'll say, okay, this paper, this paper, this paper. [02:40:01.060 --> 02:40:03.500] And it's retrieval augmented sort of generation. [02:40:03.500 --> 02:40:04.460] So these are the three tools, [02:40:04.460 --> 02:40:08.100] but I think we need a lot more of these tools [02:40:08.100 --> 02:40:09.660] to make sense of this. [02:40:09.660 --> 02:40:11.260] - I need it for the meetups too. [02:40:11.260 --> 02:40:12.700] You know, in the conference app, [02:40:12.700 --> 02:40:14.860] there's all these meetups for very specific things. [02:40:14.860 --> 02:40:15.780] - That's true. [02:40:15.780 --> 02:40:18.180] - I started one for Singaporeans, [02:40:18.180 --> 02:40:19.540] 'cause I'm a Singaporean in tech. [02:40:19.540 --> 02:40:22.340] And yeah, there's a bunch of very, very specific, [02:40:22.340 --> 02:40:24.740] like running meetups, nothing to do with tech specifically, [02:40:24.740 --> 02:40:26.980] but this is also a social event, right? [02:40:26.980 --> 02:40:27.820] Like, that you're meeting-- [02:40:27.820 --> 02:40:28.660] - Okay, yeah. [02:40:28.660 --> 02:40:30.540] You wouldn't happen to be at EMNLP. [02:40:30.540 --> 02:40:31.380] - No, why? [02:40:31.380 --> 02:40:32.220] - 'Cause some people did that, [02:40:32.220 --> 02:40:34.220] because it was like last week [02:40:34.220 --> 02:40:36.620] and some people went to EMNLP in Singapore [02:40:36.620 --> 02:40:38.100] and then flew back here. [02:40:38.100 --> 02:40:39.780] - That's a tough call. [02:40:39.780 --> 02:40:41.020] Yeah, I'm not gonna do that. [02:40:41.020 --> 02:40:42.660] - That's rough, that's rough. [02:40:42.660 --> 02:40:43.500] - Well, thanks very much. [02:40:43.500 --> 02:40:44.660] It's a pleasure to have you on, [02:40:44.660 --> 02:40:45.500] pleasure to meet in person. [02:40:45.500 --> 02:40:47.140] - So good to meet, love your work. [02:40:47.140 --> 02:40:47.980] - Thank you. [02:40:47.980 --> 02:40:48.820] - Keep doing it. [02:40:48.820 --> 02:40:50.700] - Any calls to action for people while you're here? [02:40:50.700 --> 02:40:54.100] - Well, I'm JayAlammar on Twitter and YouTube, [02:40:54.100 --> 02:40:57.500] and we have LLM University, LLM.University. [02:40:57.500 --> 02:41:00.620] Like I collaborate with Luis and Meor Amer. [02:41:00.620 --> 02:41:02.860] - Yeah, some of the best YouTube, [02:41:02.860 --> 02:41:06.100] like very short, but like very comprehensive, authoritative. [02:41:06.100 --> 02:41:08.980] - I'm very lucky to collaborate with these folks. [02:41:08.980 --> 02:41:09.820] - Yeah. [02:41:09.820 --> 02:41:11.500] - It's incredible, but yeah, thanks. [02:41:11.500 --> 02:41:12.740] - Yeah, thanks for doing all that. [02:41:12.740 --> 02:41:13.580] Thank you. [02:41:13.580 --> 02:41:14.420] - Appreciate it. [02:41:14.420 --> 02:41:17.140] - Okay, and that's it for our NeurIPS coverage [02:41:17.140 --> 02:41:20.820] and for Latent Space Pod in 2023.
[02:41:20.820 --> 02:41:22.660] We are still doing a listener survey. [02:41:22.660 --> 02:41:24.980] So if you are listening through here, [02:41:24.980 --> 02:41:26.180] you're definitely a big fan. [02:41:26.180 --> 02:41:27.980] We definitely want to hear from you. [02:41:27.980 --> 02:41:29.100] What do you like about the podcast? [02:41:29.100 --> 02:41:30.900] What do you want to hear for 2024? [02:41:30.900 --> 02:41:33.740] We've got a couple of really good episodes already recorded [02:41:33.740 --> 02:41:35.180] for the start of 2024. [02:41:35.180 --> 02:41:36.540] So we're going to start the year strong [02:41:36.540 --> 02:41:39.180] and come out to the one year anniversary of Latent Space. [02:41:39.180 --> 02:41:40.860] So thanks for all your support. [02:41:40.860 --> 02:41:43.820] Have a wonderful end of the year, and we'll see you soon. [02:41:43.820 --> 02:41:45.180] DJ, hit the outro. [02:41:46.140 --> 02:41:48.740] (upbeat music) [02:41:48.740 --> 02:41:51.340] (upbeat music) [02:41:51.340 --> 02:41:53.660] (upbeat music)