[00:00:00.000 --> 00:00:02.580] (upbeat music) [00:00:02.580 --> 00:00:07.740] - Hello, hello. [00:00:07.740 --> 00:00:09.920] This is swyx back again with part two [00:00:09.920 --> 00:00:10.960] of our NeurIPS coverage. [00:00:10.960 --> 00:00:13.060] This time we're gonna cover startups [00:00:13.060 --> 00:00:15.000] and it's a special episode [00:00:15.000 --> 00:00:17.200] because this is the last episode of 2023. [00:00:17.200 --> 00:00:19.820] We are definitely looking back at the year [00:00:19.820 --> 00:00:21.800] with rose-colored glasses. [00:00:21.800 --> 00:00:23.000] This has been a fantastic year. [00:00:23.000 --> 00:00:25.280] We only started this podcast in February [00:00:25.280 --> 00:00:26.840] and it's grown so much. [00:00:26.840 --> 00:00:28.520] Thanks to all of you who've listened [00:00:28.520 --> 00:00:30.720] and given feedback and shared it with your friends. [00:00:30.720 --> 00:00:33.320] And we actually managed to invite a few [00:00:33.320 --> 00:00:35.560] of our former guests back on the pod [00:00:35.560 --> 00:00:36.480] together with some new friends [00:00:36.480 --> 00:00:37.520] and probably some new voices [00:00:37.520 --> 00:00:38.840] that you're gonna be hearing next year. [00:00:38.840 --> 00:00:42.160] So this is not a hard-hitting interview series. [00:00:42.160 --> 00:00:44.480] You know, it's not that kind of interview. [00:00:44.480 --> 00:00:47.960] It's not that kind of podcast where we try to go too deep. [00:00:47.960 --> 00:00:49.440] Today we're just gonna go broad [00:00:49.440 --> 00:00:52.000] and we're just gonna check in on a bunch of startups [00:00:52.000 --> 00:00:54.680] that we like and monitor and that were present at NeurIPS. [00:00:54.680 --> 00:00:57.560] So first up is Jonathan Frankle of MosaicML. [00:00:57.560 --> 00:01:01.840] We last talked to him in May for the MPT-7B episode. [00:01:01.840 --> 00:01:03.280] That's episode 13. [00:01:03.280 --> 00:01:04.920] And I have to say that was one of the best performing [00:01:04.920 --> 00:01:05.880] episodes of the whole year. [00:01:05.880 --> 00:01:07.920] So you're welcome to go back and listen to that [00:01:07.920 --> 00:01:09.040] if you missed it. [00:01:09.040 --> 00:01:10.680] And since then they were bought by Databricks [00:01:10.680 --> 00:01:12.120] for $1.3 billion. [00:01:12.120 --> 00:01:14.360] And actually during the interview, [00:01:14.360 --> 00:01:16.080] they were in the process of getting acquired. [00:01:16.080 --> 00:01:17.440] They just couldn't say anything about it, [00:01:17.440 --> 00:01:19.520] but it's definitely one of the biggest news stories of the year. [00:01:19.520 --> 00:01:21.320] And you can listen to what it's like [00:01:21.320 --> 00:01:23.440] or what's going through Jonathan's mind back then [00:01:23.440 --> 00:01:26.520] as well as now today, six months later. [00:01:26.520 --> 00:01:28.280] - Hey Jonathan, welcome back to the pod. [00:01:28.280 --> 00:01:29.120] - Thank you so much. [00:01:29.120 --> 00:01:30.880] This is an interesting place to have the pod [00:01:30.880 --> 00:01:33.640] under the overpass of interstate whatever it is. [00:01:33.640 --> 00:01:36.520] - Yeah, interstate whatever in the city of New Orleans. [00:01:36.520 --> 00:01:37.840] Yeah, it's really good to see you. [00:01:37.840 --> 00:01:41.560] Since you were last on the pod, Mosaic got acquired. [00:01:41.560 --> 00:01:42.400] - Yeah, thank you. [00:01:42.400 --> 00:01:44.920] I think you really deserve all the credit for this.
[00:01:44.920 --> 00:01:46.560] - No, you guys were sitting on that news [00:01:46.560 --> 00:01:49.040] and we didn't know what was gonna happen. [00:01:49.040 --> 00:01:52.040] But I did come away from your interview [00:01:52.040 --> 00:01:53.520] with a very, very high impression of like, [00:01:53.520 --> 00:01:54.960] you guys are in a perfect place, perfect time [00:01:54.960 --> 00:01:58.560] and it makes a lot of sense to join forces with Databricks. [00:01:58.560 --> 00:02:00.440] - Yeah, they're kind of, I mean, [00:02:00.440 --> 00:02:03.080] I will say we really didn't want to get acquired. [00:02:03.080 --> 00:02:04.120] - You did not? [00:02:04.120 --> 00:02:06.720] - We didn't, I mean, we loved being independent, [00:02:06.720 --> 00:02:08.840] like we loved doing our own thing, [00:02:08.840 --> 00:02:10.480] but this just made too much sense. [00:02:10.480 --> 00:02:15.120] Like, you know, they do data, we do LLMs, [00:02:15.120 --> 00:02:17.920] we both do enterprises, we're all a bunch of academics. [00:02:17.920 --> 00:02:19.680] Like it was just kind of, [00:02:19.680 --> 00:02:21.040] we couldn't think of a better match. [00:02:21.040 --> 00:02:24.000] And so it just, we kind of came to the conclusion like, [00:02:24.000 --> 00:02:27.040] okay, I guess we can't not do this, like it's too perfect. [00:02:27.040 --> 00:02:29.480] - Yeah, yeah, and you've done a bunch of other podcasts [00:02:29.480 --> 00:02:31.840] on the acquisition, so I don't, we don't need to retread, [00:02:31.840 --> 00:02:32.720] I'll send people that way. [00:02:32.720 --> 00:02:34.560] Just like, what's new in Mosaic World? [00:02:34.560 --> 00:02:37.360] - In Mosaic World, honestly, like we're just cooking. [00:02:37.360 --> 00:02:39.680] I think we've been a little quiet lately, [00:02:39.680 --> 00:02:41.520] or at least we look quiet from the outside. [00:02:41.520 --> 00:02:43.120] It is certainly not that we haven't been busy [00:02:43.120 --> 00:02:44.760] and it's certainly not that, you know, [00:02:44.760 --> 00:02:45.960] we're not doing cool stuff. [00:02:45.960 --> 00:02:47.960] Part of it is that, you know, getting acquired, [00:02:47.960 --> 00:02:49.360] there's a bit of administrivia involved. [00:02:49.360 --> 00:02:51.200] You know, we had to go through new employee orientation, [00:02:51.200 --> 00:02:53.160] get health insurance, you know, [00:02:53.160 --> 00:02:55.440] meet our amazing new colleagues. [00:02:55.440 --> 00:02:56.480] Part of it is like, you know, [00:02:56.480 --> 00:02:58.160] the field has moved toward bigger stuff [00:02:58.160 --> 00:03:00.320] and we've moved toward bigger stuff. [00:03:00.320 --> 00:03:02.040] So I think we'll have some exciting stuff [00:03:02.040 --> 00:03:02.920] to talk about soon, [00:03:02.920 --> 00:03:05.880] but my philosophy is always like, speak through the work. [00:03:05.880 --> 00:03:07.200] So I don't wanna hype, I don't wanna like, [00:03:07.200 --> 00:03:08.860] get people excited, you know. [00:03:08.860 --> 00:03:10.360] You'll see the work and you judge for yourself. [00:03:10.360 --> 00:03:12.000] - Yeah, you talk about the industry [00:03:12.000 --> 00:03:13.120] moving towards bigger stuff. [00:03:13.120 --> 00:03:14.760] What trends are notable to you [00:03:14.760 --> 00:03:16.360] in the, let's say, second half of this year? [00:03:16.360 --> 00:03:18.240] - Everybody's figured out how to build LLMs. 
[00:03:18.240 --> 00:03:21.440] Like, it's no longer a coveted skill of, you know, [00:03:21.440 --> 00:03:22.280] a handful of people, [00:03:22.280 --> 00:03:24.240] but now we've all become LLM builders. [00:03:24.240 --> 00:03:26.560] The field has kind of narrowed in aperture again. [00:03:26.560 --> 00:03:28.440] And, you know, and yesterday when we were all figuring out [00:03:28.440 --> 00:03:30.220] how to train ImageNet, you know, [00:03:30.220 --> 00:03:32.160] now we're all figuring out how to build really big, [00:03:32.160 --> 00:03:33.400] really powerful models. [00:03:33.400 --> 00:03:36.400] And like, that's now just an assumed skill. [00:03:36.400 --> 00:03:38.480] The rest is kind of, what do you do with that skill? [00:03:38.480 --> 00:03:39.320] How do you build a product? [00:03:39.320 --> 00:03:40.140] How do you differentiate? [00:03:40.140 --> 00:03:41.000] What cool thing can you do [00:03:41.000 --> 00:03:42.740] that's different from everybody else? [00:03:42.740 --> 00:03:45.400] That's gonna determine kind of, you know, [00:03:45.400 --> 00:03:47.160] what 2024 is gonna be like. [00:03:47.160 --> 00:03:49.240] - Yeah, I guess, like, a lot of people are banking [00:03:49.240 --> 00:03:51.160] on multi-modal being, like, [00:03:51.160 --> 00:03:53.520] well, 2024 being the year of multi-modal LLMs. [00:03:53.520 --> 00:03:56.540] I feel like that's a little bit too broad a brush. [00:03:56.540 --> 00:03:59.500] I don't know, like, what's valuable on that front? [00:03:59.500 --> 00:04:02.120] - I mean, so multi-modal is gonna be a huge deal. [00:04:02.120 --> 00:04:04.160] Like, it's, but it's already a huge deal. [00:04:04.160 --> 00:04:05.520] Like, we can make multi-modal models. [00:04:05.520 --> 00:04:07.780] - The LLaVA paper author I also interviewed on this pod. [00:04:07.780 --> 00:04:09.160] - Yeah, like, LLaVA's amazing. [00:04:09.160 --> 00:04:11.320] Like, you know, I've been playing with it a bunch personally. [00:04:11.320 --> 00:04:12.520] It's awesome. [00:04:12.520 --> 00:04:14.360] And we've got Bard, and we've got Gemini, [00:04:14.360 --> 00:04:16.600] and we've got GPT-4V, and, you know, [00:04:16.600 --> 00:04:17.800] I'm sure there are gonna be plenty more [00:04:17.800 --> 00:04:19.340] where that came from. [00:04:19.340 --> 00:04:21.540] I think the question is, as with all good things, you know, [00:04:21.540 --> 00:04:24.720] cool promise is different than, like, delivering value. [00:04:24.720 --> 00:04:25.560] - Yeah. [00:04:25.560 --> 00:04:27.040] - And I'm really curious, like, you know, [00:04:27.040 --> 00:04:29.200] whether people genuinely do use this [00:04:29.200 --> 00:04:30.280] in real production settings, [00:04:30.280 --> 00:04:32.200] in the settings that will actually pay off [00:04:32.200 --> 00:04:33.520] the huge investment that's made [00:04:33.520 --> 00:04:34.960] to build these multi-modal models? [00:04:34.960 --> 00:04:35.800] - Right. [00:04:35.800 --> 00:04:36.620] - I'm also kind of curious, like, [00:04:36.620 --> 00:04:37.960] are we gonna start to see some big [00:04:37.960 --> 00:04:39.480] open-source multi-modal models? [00:04:39.480 --> 00:04:41.440] Like, you know, we've got LLaVA. [00:04:41.440 --> 00:04:42.960] It's moving in the right direction.
[00:04:42.960 --> 00:04:44.280] But, like, is somebody gonna, you know, [00:04:44.280 --> 00:04:46.840] build something that looks a lot like GPT-4V, [00:04:46.840 --> 00:04:48.040] or something on that trajectory, [00:04:48.040 --> 00:04:50.600] and kind of start another arms race in that direction? [00:04:50.600 --> 00:04:52.120] Like, it'll be interesting to see. [00:04:52.120 --> 00:04:53.560] I'm honestly pretty curious, [00:04:53.560 --> 00:04:55.080] and I'm watching with bated breath [00:04:55.080 --> 00:04:56.040] for what everybody does. [00:04:56.040 --> 00:04:58.480] - Yeah, well, I think in our chat earlier today, [00:04:58.480 --> 00:05:01.520] you said, you know, we kind of live in a diverse world [00:05:01.520 --> 00:05:04.160] where, like, every company has kind of found its niche, [00:05:04.160 --> 00:05:07.400] maybe, if you wanna go through that logic. [00:05:07.400 --> 00:05:10.400] - Yeah, I'm kind of, like, I think there are, you know, [00:05:10.400 --> 00:05:12.280] there are the optimistic and pessimistic scenarios [00:05:12.280 --> 00:05:13.120] for where we go. [00:05:13.120 --> 00:05:14.800] Like, you know, I don't know. [00:05:14.800 --> 00:05:16.200] I kind of think there's a boring scenario [00:05:16.200 --> 00:05:18.480] where everybody basically is building these giant LLMs, [00:05:18.480 --> 00:05:20.600] and maybe language, you know, image models, [00:05:20.600 --> 00:05:22.760] or what have you, and they're all kind of the same. [00:05:22.760 --> 00:05:24.280] It's just you've got the Google version, [00:05:24.280 --> 00:05:26.360] and the OpenAI version, and the Amazon version, [00:05:26.360 --> 00:05:28.640] and it almost feels like cloud providers in some sense. [00:05:28.640 --> 00:05:32.080] Like, you know, what distinguishes AWS from GCP? [00:05:32.080 --> 00:05:33.640] It's kind of, you know, where you are. [00:05:33.640 --> 00:05:35.520] - Slightly different consoles and configurations. [00:05:35.520 --> 00:05:36.720] - Yeah, it's like different interface, [00:05:36.720 --> 00:05:37.560] and maybe you prefer one, [00:05:37.560 --> 00:05:39.440] or maybe, like, you've been using one for a while, [00:05:39.440 --> 00:05:40.760] and, like, you're used to it, [00:05:40.760 --> 00:05:42.640] or your IT person really likes this one, [00:05:42.640 --> 00:05:44.640] 'cause, you know, they used to work at that company, [00:05:44.640 --> 00:05:46.080] or what have you. [00:05:46.080 --> 00:05:47.320] That would be a pretty boring world, [00:05:47.320 --> 00:05:49.840] but I think that's unlikely to be the case. [00:05:49.840 --> 00:05:51.800] I'm kind of, I'm looking at, like, you know, [00:05:51.800 --> 00:05:54.160] all the cool stuff coming out of Gemini, [00:05:54.160 --> 00:05:56.360] you know, all the cool stuff coming out of OpenAI, [00:05:56.360 --> 00:05:57.720] and then, like, I'm looking at Adobe. [00:05:57.720 --> 00:05:59.200] Like, they're building-- - Firefly, really? [00:05:59.200 --> 00:06:01.400] - Firefly, they're building, like, a different model [00:06:01.400 --> 00:06:02.800] with a creative perspective. [00:06:02.800 --> 00:06:04.600] Like, I'm kind of looking at this and going, [00:06:04.600 --> 00:06:06.400] maybe we'll have a wide diversity of models, [00:06:06.400 --> 00:06:09.200] and everybody will be building models, [00:06:09.200 --> 00:06:11.240] like, just by virtue of the fact that we need so much data [00:06:11.240 --> 00:06:12.800] to build any of these models. 
[00:06:12.800 --> 00:06:14.200] Everybody's gonna play to their strengths, [00:06:14.200 --> 00:06:16.040] and, you know, use every resource [00:06:16.040 --> 00:06:16.920] they have at their disposal, [00:06:16.920 --> 00:06:19.880] and Google has, you know, they have YouTube. [00:06:19.880 --> 00:06:20.720] I don't know if they're using it, [00:06:20.720 --> 00:06:22.400] but, like, that's a cool resource. [00:06:22.400 --> 00:06:25.200] OpenAI has put a ton of energy into text data. [00:06:25.200 --> 00:06:27.160] Adobe, like, gets creative people, [00:06:27.160 --> 00:06:28.500] and, like, there are a few other companies [00:06:28.500 --> 00:06:29.680] where that came from. [00:06:29.680 --> 00:06:32.160] So I'm kind of, like, I'm honestly curious [00:06:32.160 --> 00:06:33.440] if we're gonna just see, like, [00:06:33.440 --> 00:06:35.320] really different models for different people, [00:06:35.320 --> 00:06:37.560] and, I don't know, that's a pretty cool world to live in. [00:06:37.560 --> 00:06:39.480] Like, we won't see this arms race, [00:06:39.480 --> 00:06:41.120] we'll just kind of see, like, diversity. [00:06:41.120 --> 00:06:42.960] >> Yeah, and we shouldn't forget Bloomberg, [00:06:42.960 --> 00:06:44.800] which teased Bloomberg GPT, [00:06:44.800 --> 00:06:46.480] but that's a source of significant tokens, [00:06:46.480 --> 00:06:47.320] the financial world. [00:06:47.320 --> 00:06:49.000] >> Yeah, yeah, like, it's, I mean, [00:06:49.000 --> 00:06:51.560] my whole business on the Mosaic and Databricks side [00:06:51.560 --> 00:06:53.960] is, you know, helping people leverage the data they have, [00:06:53.960 --> 00:06:56.400] so I'm kind of, I'm excited about a world of diversity, [00:06:56.400 --> 00:06:58.620] because, you know, it's, not only do we have, like, [00:06:58.620 --> 00:07:01.760] these crazy diverse foundation models at the largest scales, [00:07:01.760 --> 00:07:03.560] but everybody embraces whatever they have. [00:07:03.560 --> 00:07:05.320] Like, our friends at Repl.it do a code model, [00:07:05.320 --> 00:07:07.200] and, you know, I don't know, [00:07:07.200 --> 00:07:08.680] Bloomberg does another finance model, [00:07:08.680 --> 00:07:10.560] and, like, somebody does a healthcare model, [00:07:10.560 --> 00:07:12.760] and, like, everybody draws on their strengths, [00:07:12.760 --> 00:07:14.200] and that's a cool world. [00:07:14.200 --> 00:07:16.920] >> Are you bullish every company training their own model? [00:07:16.920 --> 00:07:19.200] Sorry, that's a stupid question to ask you. [00:07:19.200 --> 00:07:20.720] (laughing) [00:07:20.720 --> 00:07:22.000] >> I mean, I think, you know, [00:07:22.000 --> 00:07:23.200] I'll give you the honest answer, [00:07:23.200 --> 00:07:24.440] 'cause I think it's, you know, [00:07:24.440 --> 00:07:26.360] the business Mosaic answer is, [00:07:26.360 --> 00:07:27.640] oh yeah, I'm super bullish, [00:07:27.640 --> 00:07:29.920] like, everybody should train their own model, [00:07:29.920 --> 00:07:31.560] come do it on Databricks right now. [00:07:31.560 --> 00:07:33.080] >> Like, you should start from a base model [00:07:33.080 --> 00:07:34.040] that everyone shares, right? [00:07:34.040 --> 00:07:35.120] Like, that's kind of useful. [00:07:35.120 --> 00:07:37.540] >> Maybe, or, like, you work your way up. 
[00:07:37.540 --> 00:07:39.960] I think it's, like, there's a journey [00:07:39.960 --> 00:07:41.600] with playing with any of these models [00:07:41.600 --> 00:07:43.200] that may or may not end with training your own, [00:07:43.200 --> 00:07:44.880] depending on where you go on that journey. [00:07:44.880 --> 00:07:46.640] Like, you start by playing with an API, [00:07:46.640 --> 00:07:48.080] and maybe you do some retrieval, [00:07:48.080 --> 00:07:49.720] and maybe you do some fine-tuning, [00:07:49.720 --> 00:07:51.120] and then maybe you build your own model. [00:07:51.120 --> 00:07:51.960] >> Yeah. [00:07:51.960 --> 00:07:52.780] >> But, like, it's a journey, [00:07:52.780 --> 00:07:55.520] and I think there's a destination many people will get to [00:07:55.520 --> 00:07:56.720] that involves training their own model. [00:07:56.720 --> 00:07:58.420] >> Yeah, totally. [00:07:58.420 --> 00:07:59.520] What other trends are going on [00:07:59.520 --> 00:08:01.680] that you're liking, or seeing, or hating? [00:08:01.680 --> 00:08:04.000] >> Honestly, you know, maybe this gets to the question [00:08:04.000 --> 00:08:06.480] of, like, you know, overall impressions of NeurIPS. [00:08:06.480 --> 00:08:09.840] Like, I thought this was a pretty garden-variety NeurIPS, [00:08:09.840 --> 00:08:11.920] in some sense, which feels weird to say [00:08:11.920 --> 00:08:13.800] in the age of, you know, ChatGPT, [00:08:13.800 --> 00:08:15.360] and everything else that's happened in the past year. [00:08:15.360 --> 00:08:16.200] >> Yeah. [00:08:16.200 --> 00:08:17.240] >> But this felt like the most normal conference [00:08:17.240 --> 00:08:19.640] I've had since, like, 2019. [00:08:19.640 --> 00:08:21.200] You know, I mean, we've had a pandemic in between [00:08:21.200 --> 00:08:24.680] and everything, but, like, the past couple years, [00:08:24.680 --> 00:08:26.000] actually, internally at Mosaic, [00:08:26.000 --> 00:08:27.880] I always do a long write-up of every conference [00:08:27.880 --> 00:08:28.820] and the trends I see. [00:08:28.820 --> 00:08:29.660] >> Okay. [00:08:29.660 --> 00:08:30.960] >> Like, some public, some that are more relevant [00:08:30.960 --> 00:08:33.320] to what we're doing, and, like, a lot of the write-ups [00:08:33.320 --> 00:08:34.480] I've done over the past year or two [00:08:34.480 --> 00:08:37.960] have been, like, all about, like, the unease. [00:08:37.960 --> 00:08:42.720] Sometimes it was just, like, my write-up for ICML 2022 [00:08:42.720 --> 00:08:45.720] was all about people capitulating to scale, [00:08:45.720 --> 00:08:48.320] and the five stages of grief, and, you know, [00:08:48.320 --> 00:08:49.560] how different people were responding. [00:08:49.560 --> 00:08:52.960] Academics, people at Google Brain, back when it existed, [00:08:52.960 --> 00:08:54.280] you know, all that stuff. [00:08:54.280 --> 00:08:57.240] And it almost looks quaint to think about [00:08:57.240 --> 00:08:59.440] that it was insightful to say people have capitulated [00:08:59.440 --> 00:09:03.860] to scale in this day and age where, you know, [00:09:03.860 --> 00:09:05.880] tens of billions of parameters looks mundane. [00:09:05.880 --> 00:09:06.720] >> Yeah. [00:09:06.720 --> 00:09:09.120] This kind of felt like, okay, the academics [00:09:09.120 --> 00:09:10.880] are trying to find their way forward. [00:09:10.880 --> 00:09:12.480] It's no longer just kind of coping and ignoring, [00:09:12.480 --> 00:09:14.320] but, like, trying to find their way forward.
[00:09:14.320 --> 00:09:16.960] The industry folks are doing their thing. [00:09:16.960 --> 00:09:19.080] A lot more people keeping secrets than used to, [00:09:19.080 --> 00:09:20.640] but it's still, like, you know, [00:09:20.640 --> 00:09:22.160] a lot of people also aren't keeping secrets [00:09:22.160 --> 00:09:23.560] and can talk about what they're doing still. [00:09:23.560 --> 00:09:24.400] >> Yeah. [00:09:24.400 --> 00:09:26.320] >> So it kind of felt like, you know, equilibrium. [00:09:26.320 --> 00:09:28.080] I don't know how long it'll last, [00:09:28.080 --> 00:09:30.120] but this was a lot less of a frantic [00:09:30.120 --> 00:09:32.660] and stressful conference than I think I'm used to, [00:09:32.660 --> 00:09:33.880] at least in the past couple years. [00:09:33.880 --> 00:09:36.040] You know, I'm in a new role, in some sense. [00:09:36.040 --> 00:09:36.960] I'm on the business side now. [00:09:36.960 --> 00:09:38.160] I'm on the industry side. [00:09:38.160 --> 00:09:39.000] >> Yeah. [00:09:39.000 --> 00:09:40.160] >> And I'm trying to find my own path. [00:09:40.160 --> 00:09:42.600] But I felt like a lot of us have changed roles [00:09:42.600 --> 00:09:45.200] in some sense as the past couple years have, [00:09:45.200 --> 00:09:47.400] you know, have taken place and everybody's moved around [00:09:47.400 --> 00:09:48.720] and figured out what they want to do. [00:09:48.720 --> 00:09:50.480] But we've all kind of found our place at this point. [00:09:50.480 --> 00:09:53.360] I feel like, you know, we may be in different places, [00:09:53.360 --> 00:09:56.280] but the ecosystem, the community has kind of sustained [00:09:56.280 --> 00:09:58.080] with, you know, a bunch of new PhD students [00:09:58.080 --> 00:09:59.560] and all that good stuff. [00:09:59.560 --> 00:10:00.400] >> Yeah, yeah. [00:10:00.400 --> 00:10:01.600] >> Like, it's kind of, you know, I don't know, [00:10:01.600 --> 00:10:03.000] it's nature healing in some sense [00:10:03.000 --> 00:10:06.320] from the insanity of the past couple of years. [00:10:06.320 --> 00:10:08.600] And a reminder that, you know, we're all kind of small pieces [00:10:08.600 --> 00:10:11.520] in a much bigger, you know, ecosystem and community. [00:10:11.520 --> 00:10:13.200] >> Yeah, and it's still growing though. [00:10:13.200 --> 00:10:15.160] Apparently the latest stats was something [00:10:15.160 --> 00:10:17.240] like 15,000 attendees this year. [00:10:17.240 --> 00:10:18.680] >> Oh my God. [00:10:18.680 --> 00:10:20.160] Oh my God. [00:10:20.160 --> 00:10:23.140] I will say one big difference, you know, [00:10:23.140 --> 00:10:24.760] in the time right before the pandemic, [00:10:24.760 --> 00:10:26.180] deep learning was getting so popular, [00:10:26.180 --> 00:10:28.960] the conferences would sell out the day registration opened. [00:10:28.960 --> 00:10:31.040] Like as a student, you'd have to rush to register [00:10:31.040 --> 00:10:32.600] or you wouldn't even get to go. [00:10:32.600 --> 00:10:33.640] That I don't think is happening anymore. [00:10:33.640 --> 00:10:34.480] >> This year's easier, yeah. [00:10:34.480 --> 00:10:36.760] And they're also live streaming stuff, you know, so. [00:10:36.760 --> 00:10:38.160] >> Yeah, but it's kind of interesting that like, [00:10:38.160 --> 00:10:39.960] I guess we've adjusted to the huge capacity [00:10:39.960 --> 00:10:42.800] and everything that's, you know, going on. 
[00:10:42.800 --> 00:10:44.900] But it's, you know, even so with it getting bigger, [00:10:44.900 --> 00:10:48.320] it didn't feel that different to be honest. [00:10:48.320 --> 00:10:49.600] Maybe it's just that I joined the community [00:10:49.600 --> 00:10:50.960] when things were already big. [00:10:50.960 --> 00:10:51.800] >> Yeah. [00:10:51.800 --> 00:10:52.640] >> But like, you know, there were some journalists here, [00:10:52.640 --> 00:10:54.360] some VCs here, but that's always been the case. [00:10:54.360 --> 00:10:55.360] >> Yeah, it's always been the case. [00:10:55.360 --> 00:10:59.280] You always have, you know, overrated, underrated papers. [00:10:59.280 --> 00:11:01.620] We will maybe save the overrated stuff for later, [00:11:01.620 --> 00:11:03.820] but any underrated stuff that you want to highlight [00:11:03.820 --> 00:11:05.440] from this year, it doesn't have to be at the conference, [00:11:05.440 --> 00:11:07.680] but just want to remind you for underrated papers [00:11:07.680 --> 00:11:08.800] that people should pay attention to. [00:11:08.800 --> 00:11:10.240] >> I'm going to flip this a different way. [00:11:10.240 --> 00:11:11.080] >> Okay. [00:11:11.080 --> 00:11:12.720] >> Because I'm not a fan of overrated or underrated [00:11:12.720 --> 00:11:15.680] and I'm not like, I'm not a fan of passing judgment on stuff. [00:11:15.680 --> 00:11:17.880] I just don't like, far be it from me, [00:11:17.880 --> 00:11:20.480] I like, one of my big gripes is like, [00:11:20.480 --> 00:11:21.920] we shouldn't have best paper awards. [00:11:21.920 --> 00:11:24.360] Like, and I say that having gotten one back in the day. [00:11:24.360 --> 00:11:26.100] So I feel like I have the ability to say that [00:11:26.100 --> 00:11:27.280] not just out of bitterness, [00:11:27.280 --> 00:11:29.560] but out of like recognition that it's dumb. [00:11:29.560 --> 00:11:30.400] >> Sure. [00:11:30.400 --> 00:11:32.420] >> But test of time though, test of time is great. [00:11:32.420 --> 00:11:33.380] >> Test of time is awesome. [00:11:33.380 --> 00:11:34.220] >> Yeah. [00:11:34.220 --> 00:11:36.300] >> You know, and you know, [00:11:36.300 --> 00:11:40.160] I look forward to everybody using lottery tickets in 2029. [00:11:40.160 --> 00:11:42.780] No, if you're working on lottery tickets, you know, [00:11:42.780 --> 00:11:44.340] there's a lot of other cool stuff out there, [00:11:44.340 --> 00:11:47.140] but I think it's really, I'll turn that question into like, [00:11:47.140 --> 00:11:49.540] what areas should academics be thinking about? [00:11:49.540 --> 00:11:51.340] I don't know, what would I work on as a PhD student right now [00:11:51.340 --> 00:11:52.940] or what would I recommend a student work on? [00:11:52.940 --> 00:11:54.920] And all the biggest questions in the field [00:11:54.920 --> 00:11:57.180] come down to how you measure and how you evaluate. [00:11:57.180 --> 00:11:59.420] Those are just such fundamental questions [00:11:59.420 --> 00:12:00.760] until we know how to measure things, [00:12:00.760 --> 00:12:02.720] until we know how to evaluate anything, [00:12:02.720 --> 00:12:04.120] you can't really even do any science. [00:12:04.120 --> 00:12:05.580] We don't know what we're even talking about. [00:12:05.580 --> 00:12:06.420] >> Yeah. [00:12:06.420 --> 00:12:08.440] >> And so I'm also thinking a lot about like synthetic data. 
[00:12:08.440 --> 00:12:10.620] Can we generate useful evaluation sets [00:12:10.620 --> 00:12:13.380] for all the little properties we want to find about an LLM? [00:12:13.380 --> 00:12:15.580] Creating data sets is really hard, [00:12:15.580 --> 00:12:17.180] but a model can help us do that. [00:12:17.180 --> 00:12:19.020] So I'm kind of curious, like, you know, [00:12:19.020 --> 00:12:22.300] can we bootstrap the evaluation process with synthetic data, [00:12:22.300 --> 00:12:24.740] figure out good ways to help ourselves build good data sets, [00:12:24.740 --> 00:12:26.420] and then, you know, from there, [00:12:26.420 --> 00:12:28.540] maybe we can start to really take a bite [00:12:28.540 --> 00:12:29.860] out of the evaluation questions [00:12:29.860 --> 00:12:31.700] and get moving on the actual science [00:12:31.700 --> 00:12:33.740] of understanding what's going on with these LLMs. [00:12:33.740 --> 00:12:37.060] All that seems very academically viable. [00:12:37.060 --> 00:12:38.820] None of those require huge amounts of compute. [00:12:38.820 --> 00:12:40.780] They require creativity, ingenuity, [00:12:40.780 --> 00:12:42.260] but that's an abundance in academia, [00:12:42.260 --> 00:12:43.460] even when compute isn't. [00:12:43.460 --> 00:12:46.760] >> Yeah, I would say that that's actually one thing [00:12:46.760 --> 00:12:48.980] I've had a big delta on for this year. [00:12:48.980 --> 00:12:50.060] >> Yeah, tell me more, I'm curious. [00:12:50.060 --> 00:12:51.860] >> Synthetic data, I always thought it was, [00:12:51.860 --> 00:12:53.940] you're just kind of sampling from a known distribution anyway [00:12:53.940 --> 00:12:55.100] that you know is imperfect [00:12:55.100 --> 00:12:56.900] and doesn't match human preferences. [00:12:57.740 --> 00:13:00.460] And it's Kanjun from Imbue [00:13:00.460 --> 00:13:01.860] that actually changed my mind on this. [00:13:01.860 --> 00:13:02.700] >> Oh, tell me more. [00:13:02.700 --> 00:13:04.500] That is a smart person you're talking to. [00:13:04.500 --> 00:13:05.860] >> She's like, you actually don't want [00:13:05.860 --> 00:13:06.860] to match human preferences. [00:13:06.860 --> 00:13:10.380] You want to spike it in different ways, in useful ways. [00:13:10.380 --> 00:13:13.260] And so you want to synthesize data in useful ways [00:13:13.260 --> 00:13:15.520] that don't necessarily match human preferences. [00:13:15.520 --> 00:13:17.100] And once she said that, I was like, oh, okay, [00:13:17.100 --> 00:13:20.340] I think I'm actually sold on this as a viable practice. [00:13:20.340 --> 00:13:22.140] >> I would actually make a completely different argument, [00:13:22.140 --> 00:13:23.220] but she's right. [00:13:23.220 --> 00:13:24.900] So I'm probably going to make a wrong argument now [00:13:24.900 --> 00:13:26.940] because Kanjun is pretty much always right. [00:13:26.940 --> 00:13:29.580] And when she disagrees with me, it means I'm wrong. [00:13:29.580 --> 00:13:32.660] But the way that I look at it is synthetic data [00:13:32.660 --> 00:13:37.660] is not about, it's not about relying solely on the model. [00:13:37.660 --> 00:13:39.700] We as computer scientists love the idea [00:13:39.700 --> 00:13:42.300] that once you automate something, you fully automate it. [00:13:42.300 --> 00:13:44.140] It's really about how do you reduce [00:13:44.140 --> 00:13:45.740] the amount of work necessary [00:13:45.740 --> 00:13:48.300] to create something that's truly useful. 
[00:13:48.300 --> 00:13:50.520] And so synthetic data is not about [00:13:50.520 --> 00:13:52.860] can we whip up a data set automatically [00:13:52.860 --> 00:13:54.180] and then make a model better? [00:13:54.180 --> 00:13:56.660] It's about how can you use human time most effectively? [00:13:56.660 --> 00:13:58.940] And maybe labeling data or creating a data set from scratch [00:13:58.940 --> 00:14:01.140] is not the most effective use of human time. [00:14:01.140 --> 00:14:04.620] Maybe it's curating a data set that a model generated, [00:14:04.620 --> 00:14:06.260] you know, to pick the examples you like most [00:14:06.260 --> 00:14:07.940] and edit a few of them. [00:14:07.940 --> 00:14:09.700] When I think about the millions of different [00:14:09.700 --> 00:14:11.540] small properties of LLMs we want to study, [00:14:11.540 --> 00:14:13.660] like in some sense, the unit tests of LLMs [00:14:13.660 --> 00:14:14.700] that we want to develop, you know, [00:14:14.700 --> 00:14:17.260] that's going to require a bunch of tiny eval sets [00:14:17.260 --> 00:14:19.700] on specific really niche things. [00:14:19.700 --> 00:14:22.640] It's really hard for a human to just write from scratch. [00:14:22.640 --> 00:14:24.500] Nobody has the time or patience for that. [00:14:24.500 --> 00:14:26.500] If a model can help you do it and you can curate, [00:14:26.500 --> 00:14:28.460] you don't end up in a full feedback loop. [00:14:28.460 --> 00:14:29.540] You have a human there, [00:14:29.540 --> 00:14:31.940] but you're just making better use of your time. [00:14:31.940 --> 00:14:32.780] - Yeah, that makes sense. [00:14:32.780 --> 00:14:35.860] I would just observe that this sounds like weak labeling. [00:14:35.860 --> 00:14:38.340] And I talked to Raza Habib from Humanloop [00:14:38.340 --> 00:14:40.220] who actually pivoted away from weak labeling. [00:14:40.220 --> 00:14:41.260] - Interesting, tell me more. [00:14:41.260 --> 00:14:42.980] - I don't know, I just think it might have [00:14:42.980 --> 00:14:44.220] just been too early. [00:14:44.220 --> 00:14:45.920] I'm still a believer. [00:14:45.920 --> 00:14:48.100] - This is the thing about all of deep learning, [00:14:48.100 --> 00:14:50.620] like you never know whether you're too early, [00:14:50.620 --> 00:14:53.580] and too early is often six months too early. [00:14:53.580 --> 00:14:54.920] It's no longer the like, you know, [00:14:54.920 --> 00:14:57.840] Yoshua Bengio and everybody being 20 years too early. [00:14:57.840 --> 00:14:58.740] - Or Schmidhuber. [00:14:58.740 --> 00:14:59.760] - And Schmidhuber, of course. [00:14:59.760 --> 00:15:02.600] We have to salute, you know, Schmidhuber as well. [00:15:02.600 --> 00:15:05.040] It's not like being 20 years too early. [00:15:05.040 --> 00:15:07.040] It's like, you might be six months too early [00:15:07.040 --> 00:15:09.200] and some crazy thing is going to happen, [00:15:09.200 --> 00:15:11.160] or like something will finally click. [00:15:11.160 --> 00:15:12.680] And there goes that. [00:15:12.680 --> 00:15:13.760] - Yeah, yeah, totally. [00:15:13.760 --> 00:15:15.880] Cool, we're almost at probably your destination. [00:15:15.880 --> 00:15:17.120] The workshops tomorrow, you said, [00:15:17.120 --> 00:15:19.040] are like kind of the highlights for you for NeurIPS? [00:15:19.040 --> 00:15:19.880] - Yeah, yeah. [00:15:19.880 --> 00:15:20.880] That used to be my workshop strategy. [00:15:20.880 --> 00:15:23.840] I don't, I haven't, I picked out a few, but.
[00:15:23.840 --> 00:15:24.840] - Oh, wander. [00:15:24.840 --> 00:15:26.280] - How do you do NeurIPS well, basically? [00:15:26.280 --> 00:15:28.400] - Wander and go to a lot of the poster sessions. [00:15:28.400 --> 00:15:30.860] Like, the talks at workshops are always great, [00:15:30.860 --> 00:15:32.000] but you know, often, honestly, [00:15:32.000 --> 00:15:34.280] the workshops are pretty eclectic in terms of talks. [00:15:34.280 --> 00:15:35.800] You try your best as a workshop organizer [00:15:35.800 --> 00:15:37.240] to put together a coherent program, [00:15:37.240 --> 00:15:38.680] but you know, presenters are gonna do [00:15:38.680 --> 00:15:40.080] what presenters are gonna do, [00:15:40.080 --> 00:15:41.320] and you can't really stop that. [00:15:41.320 --> 00:15:44.480] But instead, you know, I love the poster sessions, [00:15:44.480 --> 00:15:46.440] 'cause like, you get students who are working [00:15:46.440 --> 00:15:48.840] on like really crazy creative stuff [00:15:48.840 --> 00:15:50.600] that isn't even ready for the conference yet. [00:15:50.600 --> 00:15:53.040] Like, you're actually seeing things [00:15:53.040 --> 00:15:54.560] that have not been put out on Twitter yet, [00:15:54.560 --> 00:15:56.440] and that's such a nice change from NeurIPS, [00:15:56.440 --> 00:15:58.400] where all the conference papers have been out for months, [00:15:58.400 --> 00:15:59.240] if not longer. [00:15:59.240 --> 00:16:00.680] - Oh, wait, I observed the opposite. [00:16:00.680 --> 00:16:03.240] Things that have been on Twitter for like forever [00:16:03.240 --> 00:16:04.780] are now out of date, and there are posters, [00:16:04.780 --> 00:16:06.600] because that's how long it takes to submit a paper. [00:16:06.600 --> 00:16:07.440] - Yeah, yeah. [00:16:07.440 --> 00:16:08.280] - So it's the other way. [00:16:08.280 --> 00:16:10.040] - But for the workshop poster sessions, [00:16:10.040 --> 00:16:12.160] it's the workshop poster sessions that are awesome, [00:16:12.160 --> 00:16:13.600] because you're truly seeing stuff [00:16:13.600 --> 00:16:15.480] that was created this fall, [00:16:15.480 --> 00:16:16.320] may not have been archived yet, [00:16:16.320 --> 00:16:17.640] nobody's talked about it, [00:16:17.640 --> 00:16:19.560] probably makes no sense yet, [00:16:19.560 --> 00:16:21.440] but may evolve into something really cool. [00:16:21.440 --> 00:16:22.280] - Interesting. [00:16:22.280 --> 00:16:23.600] - And so, and you also, like, [00:16:23.600 --> 00:16:25.400] there's not as much competition to talk to the people, [00:16:25.400 --> 00:16:26.960] you can just kind of chill. [00:16:26.960 --> 00:16:28.920] So I love to like wander from poster session [00:16:28.920 --> 00:16:30.800] to poster session throughout the workshops, [00:16:30.800 --> 00:16:31.840] 'cause like, that's my favorite part. [00:16:31.840 --> 00:16:33.240] I don't know, I can hear, you know, [00:16:33.240 --> 00:16:35.920] somewhat important people talk any time, [00:16:35.920 --> 00:16:37.640] but it's like talking to the people [00:16:37.640 --> 00:16:40.800] and seeing, like, getting a glimpse of what might be ahead. [00:16:40.800 --> 00:16:41.620] - Yeah. 
[00:16:41.620 --> 00:16:42.460] - You know, being able to say like, [00:16:42.460 --> 00:16:44.200] oh my gosh, I remember seeing the poster for this paper [00:16:44.200 --> 00:16:46.320] that a year later becomes very important, [00:16:46.320 --> 00:16:48.660] and like, kind of asking yourself, you know, [00:16:48.660 --> 00:16:50.760] is this nonsense or is this brilliant? [00:16:50.760 --> 00:16:52.080] And like, not actually knowing the answer, [00:16:52.080 --> 00:16:53.440] not having 50 million people on Twitter [00:16:53.440 --> 00:16:55.000] having told you the answer. [00:16:55.000 --> 00:16:56.880] That's kind of, I don't know, it's fun. [00:16:56.880 --> 00:16:58.440] It takes me back to like, [00:16:58.440 --> 00:17:00.240] what the conferences were like for me, [00:17:00.240 --> 00:17:02.600] you know, when I was early in my career. [00:17:02.600 --> 00:17:04.240] Like, you know, it was just kind of some random people [00:17:04.240 --> 00:17:05.080] coming and chatting with me, [00:17:05.080 --> 00:17:07.360] and you never really knew what was important [00:17:07.360 --> 00:17:09.560] and what wasn't, but it was all kind of cool and fun. [00:17:09.560 --> 00:17:11.760] - You formulate a hypothesis and, you know, [00:17:11.760 --> 00:17:13.040] search that way. [00:17:13.040 --> 00:17:14.560] Yeah, so I'm looking forward to tomorrow. [00:17:14.560 --> 00:17:16.120] If you find anything interesting, just let me know, [00:17:16.120 --> 00:17:17.600] and I'll go interview them. [00:17:17.600 --> 00:17:18.760] I've been recording sessions [00:17:18.760 --> 00:17:20.720] with poster presenters all the time. [00:17:20.720 --> 00:17:23.560] And I wanted to expose people who don't come to NeurIPS, [00:17:23.560 --> 00:17:25.280] like, that this is what goes on. [00:17:25.280 --> 00:17:27.960] And there's so much, I found, so much talent [00:17:27.960 --> 00:17:29.960] that does a lot of work [00:17:29.960 --> 00:17:31.520] that you don't hear about online, [00:17:31.520 --> 00:17:32.360] 'cause they're just not online, [00:17:32.360 --> 00:17:35.040] or they just don't have the reach that, you know, I do. [00:17:35.040 --> 00:17:36.780] So like, I want to give them that reach. [00:17:36.780 --> 00:17:38.300] - Yeah, I think there's like, you know, [00:17:38.300 --> 00:17:40.240] I'll say two things kind of to close up. [00:17:40.240 --> 00:17:41.740] One is kind of that like, [00:17:41.740 --> 00:17:44.640] I feel like there's now so much hype attached to NeurIPS [00:17:44.640 --> 00:17:46.240] and ICLR and ICML, [00:17:46.240 --> 00:17:48.840] just by virtue of the hype that's attached to the field. [00:17:48.840 --> 00:17:49.960] I don't know, this like, [00:17:49.960 --> 00:17:51.560] feels pretty mundane and boring to me. [00:17:51.560 --> 00:17:54.440] Like, it's really cool, but it's also just, you know, [00:17:54.440 --> 00:17:56.000] it's just a bunch of academics, like, [00:17:56.000 --> 00:17:57.600] walking around having boring conversations, [00:17:57.600 --> 00:18:00.200] getting coffee and like, pretending to party. [00:18:00.200 --> 00:18:01.440] I definitely, my experience of-- [00:18:01.440 --> 00:18:02.680] - Pretending to party, I love it. [00:18:02.680 --> 00:18:04.160] - No, I'll say that, you know, I'll tell you-- [00:18:04.160 --> 00:18:05.000] - It's true, it's so true.
[00:18:05.000 --> 00:18:07.360] - My experience of NeurIPS last year, like, [00:18:07.360 --> 00:18:08.880] I don't know, these conferences have a reputation [00:18:08.880 --> 00:18:10.800] of being over the top with industry parties [00:18:10.800 --> 00:18:11.920] and things like that. [00:18:11.920 --> 00:18:15.600] And my impression was that was probably true in 2017. [00:18:15.600 --> 00:18:19.660] Like, that year is known as the NeurIPS that broke NeurIPS, [00:18:19.660 --> 00:18:20.500] for various reasons. [00:18:20.500 --> 00:18:21.320] I wasn't there at that time, [00:18:21.320 --> 00:18:22.860] that was before I was even in the field. [00:18:22.860 --> 00:18:25.280] But my experience last year, especially post-pandemic, [00:18:25.280 --> 00:18:29.200] was a whole generation of students had like, heard stories, [00:18:29.200 --> 00:18:31.160] and these stories had been built up in their minds, [00:18:31.160 --> 00:18:33.180] and they were trying to live out the fantasy [00:18:33.180 --> 00:18:34.440] of what they thought NeurIPS had been like. [00:18:34.440 --> 00:18:36.120] So these very boring happy hours, [00:18:36.120 --> 00:18:38.960] people tried to turn into ragers and it was hilarious. [00:18:38.960 --> 00:18:40.600] It was just adorable in some sense. [00:18:40.600 --> 00:18:42.880] So, you know, it's worth remembering, like, you know, [00:18:42.880 --> 00:18:44.300] there's the fantasy and there's the reality, [00:18:44.300 --> 00:18:45.560] and the reality is, you know, [00:18:45.560 --> 00:18:47.700] it's a boring industry conference where people are, [00:18:47.700 --> 00:18:49.400] or academic conference with some industry component [00:18:49.400 --> 00:18:50.960] where people are trying to make money [00:18:50.960 --> 00:18:52.480] and convince people to look at their posters [00:18:52.480 --> 00:18:54.040] and get a few citations and-- [00:18:54.040 --> 00:18:55.560] - Lots of hiring, lots of hiring. [00:18:55.560 --> 00:18:56.400] - Lots of hiring. [00:18:56.400 --> 00:18:57.360] - Lots of hiring. [00:18:57.360 --> 00:18:59.920] - I think things have really settled into a new normal. [00:18:59.920 --> 00:19:02.160] And, you know, with all the hype and all the craziness [00:19:02.160 --> 00:19:03.600] over the past couple years, [00:19:03.600 --> 00:19:05.760] people feel like everything is just exploding [00:19:05.760 --> 00:19:06.720] and changing all the time. [00:19:06.720 --> 00:19:07.900] Like, you see those LinkedIn posts [00:19:07.900 --> 00:19:09.400] of everything has just changed. [00:19:09.400 --> 00:19:10.240] - I hate those. [00:19:10.240 --> 00:19:11.160] I hate those so much. [00:19:11.160 --> 00:19:12.220] I hate LinkedIn. [00:19:12.220 --> 00:19:14.840] If anyone is a LinkedIn influencer, I hate you. [00:19:14.840 --> 00:19:16.280] (laughing) [00:19:16.280 --> 00:19:19.960] But, you know, it's kind of like, this felt like, okay, [00:19:19.960 --> 00:19:21.540] like, maybe there's a steady state again. [00:19:21.540 --> 00:19:23.480] Maybe we can all catch our breath a bit. [00:19:23.480 --> 00:19:25.400] And it kind of felt like after a pandemic, [00:19:25.400 --> 00:19:27.280] after all the technical development that's happened [00:19:27.280 --> 00:19:28.760] in the past couple of years, like-- [00:19:28.760 --> 00:19:29.600] - It's nice. [00:19:29.600 --> 00:19:30.420] - We can chill. [00:19:30.420 --> 00:19:31.260] - It's nice. [00:19:31.260 --> 00:19:32.080] - Like, we can kind of breathe a little bit. 
[00:19:32.080 --> 00:19:33.500] And there's something really nice about that. [00:19:33.500 --> 00:19:34.400] - Yeah, love that. [00:19:34.400 --> 00:19:36.200] Well, it's so nice to have you on again [00:19:36.200 --> 00:19:37.400] and chat and catch up. [00:19:37.400 --> 00:19:38.240] - Thank you so much. [00:19:38.240 --> 00:19:39.060] It's good to see you. [00:19:39.060 --> 00:19:40.160] - Thanks for jumping on. [00:19:40.160 --> 00:19:41.280] - In case it wasn't obvious, [00:19:41.280 --> 00:19:44.320] that was not up to the usual standards of our recordings [00:19:44.320 --> 00:19:45.720] because that was a walking interview. [00:19:45.720 --> 00:19:48.080] I was carrying these portable mics all over NeurIPS. [00:19:48.080 --> 00:19:51.380] And really the only way to schedule podcast interviews [00:19:51.380 --> 00:19:54.560] with people, especially busy people like Jonathan at NeurIPS, [00:19:54.560 --> 00:19:56.320] is to show up with a portable mic, [00:19:56.320 --> 00:19:58.080] shove it in their face and talk to them. [00:19:58.080 --> 00:19:59.680] And that's what the majority [00:19:59.680 --> 00:20:02.580] of the podcast conversations are for this episode, [00:20:02.580 --> 00:20:04.400] because that's the only way I can like, [00:20:04.400 --> 00:20:06.880] I see someone, grab someone, do you have 15 minutes [00:20:06.880 --> 00:20:08.180] and talk through something. [00:20:08.180 --> 00:20:09.020] That's what happens. [00:20:09.020 --> 00:20:10.440] That's how we schedule interviews [00:20:10.440 --> 00:20:12.040] with a whole bunch of people that we would not get otherwise. [00:20:12.040 --> 00:20:13.880] NeurIPS is too chaotic [00:20:13.880 --> 00:20:16.080] to schedule anything else. [00:20:16.080 --> 00:20:17.960] One takeaway from Jonathan's interview, [00:20:17.960 --> 00:20:20.320] which I want to highlight, apart from the whole [00:20:20.320 --> 00:20:22.080] "it's the new normal" conversation, [00:20:22.080 --> 00:20:25.200] is the focus on synthetic data generation. [00:20:25.200 --> 00:20:28.320] This is a recurring theme that is continually coming up [00:20:28.320 --> 00:20:31.520] from my conversations with literally everybody in the space. [00:20:31.520 --> 00:20:33.120] And how do you do it right? [00:20:33.120 --> 00:20:36.520] How do you do it with the blessing of OpenAI? [00:20:36.520 --> 00:20:38.360] ByteDance was recently banned from OpenAI [00:20:38.360 --> 00:20:42.140] because they were considered to be distilling from GPT-4, [00:20:42.140 --> 00:20:44.420] which is not allowed under the Terms of Service. [00:20:44.420 --> 00:20:46.380] I've heard that they're not the only company [00:20:46.380 --> 00:20:48.260] that is accused of or being thought of [00:20:48.260 --> 00:20:50.460] or rumored to be doing that. [00:20:50.460 --> 00:20:51.460] Probably the right approach [00:20:51.460 --> 00:20:53.540] is something that looks like DeepMind's approach, [00:20:53.540 --> 00:20:55.940] which on Monday of NeurIPS published a paper [00:20:55.940 --> 00:20:57.860] called "Beyond Human Data: Scaling Self-Training [00:20:57.860 --> 00:20:59.580] for Problem Solving with Language Models." [00:20:59.580 --> 00:21:03.060] And the concept is honestly not that complicated.
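(Editor's note: as a rough, hedged sketch of the shape of that idea before the details -- sample candidate solutions from a model, keep only the ones an external check verifies, and train on the survivors -- here is a toy Python illustration. Everything in it is a hypothetical stand-in: the stub "model" is a random function rather than PaLM 2, and the arithmetic check stands in for the kind of verifiers the approach relies on, such as final-answer checks for math or unit tests for code.)

```python
import random

# Toy sketch of self-training on verifiable domains: sample candidate
# answers from a (stub) model, keep only the ones an external check
# verifies, and treat the survivors as synthetic training data.

def stub_model_answer(a, b):
    """Stand-in for sampling a solution from a language model;
    right most of the time, off by one otherwise."""
    return a + b + random.choice([0, 0, 0, 1, -1])

def is_verifiably_correct(a, b, answer):
    """Math-style verifier: recompute the ground truth.
    For code, this step would be running unit tests instead."""
    return answer == a + b

def build_synthetic_set(problems, samples_per_problem=8):
    """Keep only model-generated answers that pass the external check;
    this filtered set is what the next fine-tuning round would train on."""
    kept = []
    for a, b in problems:
        for _ in range(samples_per_problem):
            answer = stub_model_answer(a, b)
            if is_verifiably_correct(a, b, answer):
                kept.append((f"What is {a} + {b}?", str(answer)))
    return kept

if __name__ == "__main__":
    problems = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(10)]
    data = build_synthetic_set(problems)
    print(f"kept {len(data)} verified examples out of {10 * 8} samples")
```

The only load-bearing part of the sketch is the filter: nothing goes into the next round's training data unless an automatic, external check says it is correct.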
[00:21:03.060 --> 00:21:05.840] For the domains of math and for coding, [00:21:05.840 --> 00:21:09.340] they were able to computer-generate data for training on, [00:21:09.340 --> 00:21:11.060] and they found that training PaLM 2 [00:21:11.060 --> 00:21:13.220] on that synthetically generated data [00:21:13.220 --> 00:21:16.460] improved its results and performance on the benchmarks [00:21:16.460 --> 00:21:17.820] for those relevant domains. [00:21:17.820 --> 00:21:20.340] It makes sense that we can scale [00:21:20.340 --> 00:21:22.180] beyond human data on those dimensions. [00:21:22.180 --> 00:21:24.140] That's the trivially easy stuff. [00:21:24.140 --> 00:21:25.260] And the question is, [00:21:25.260 --> 00:21:28.700] how do you scale beyond the verifiably correct? [00:21:28.700 --> 00:21:30.820] If you listen to part one of our NeurIPS coverage, [00:21:30.820 --> 00:21:31.820] we talked about DPO, [00:21:31.820 --> 00:21:35.620] which is more efficient usage of existing information. [00:21:35.620 --> 00:21:37.920] So not exactly using synthetic information, [00:21:37.920 --> 00:21:39.760] but just as a sneak peek of 2024, [00:21:39.760 --> 00:21:41.760] we've actually already recorded an episode [00:21:41.760 --> 00:21:44.320] with Nathan Lambert now of the Allen Institute [00:21:44.320 --> 00:21:46.560] on RLHF and RLAIF. [00:21:46.560 --> 00:21:48.640] And I think those approaches might scale [00:21:48.640 --> 00:21:51.720] beyond just the narrow domains of math and code. [00:21:51.720 --> 00:21:53.960] So next up is someone who's new to the pod, [00:21:53.960 --> 00:21:54.800] but not new to me. [00:21:54.800 --> 00:21:57.560] I've talked with Lin from Fireworks a bunch [00:21:57.560 --> 00:21:58.680] over the past few months, [00:21:58.680 --> 00:22:02.240] and they've definitely blown up in the inference space. [00:22:02.240 --> 00:22:03.060] So in some sense, [00:22:03.060 --> 00:22:04.760] you can think of Fireworks as a competitor [00:22:04.760 --> 00:22:07.600] to Together AI or Replicate [00:22:07.600 --> 00:22:09.880] or any other sort of inference serving platform [00:22:09.880 --> 00:22:11.120] that you might think about, [00:22:11.120 --> 00:22:12.280] but they have a really good team [00:22:12.280 --> 00:22:14.680] and they've been doing some very good work with Mistral. [00:22:14.680 --> 00:22:16.720] Lin and her team have an amazing track record, [00:22:16.720 --> 00:22:18.680] which you hear about in the interview, [00:22:18.680 --> 00:22:20.360] and their customer list is pretty stellar too. [00:22:20.360 --> 00:22:22.520] So it's worth checking out and checking in [00:22:22.520 --> 00:22:25.920] on the inference business with Lin Qiao from Fireworks AI. [00:22:25.920 --> 00:22:29.040] - Rewind, we can do all that because this will be edited. [00:22:29.040 --> 00:22:31.760] Okay, so who are you and what is Fireworks? [00:22:31.760 --> 00:22:33.800] - Hey Sean, we started Fireworks last year, [00:22:33.800 --> 00:22:36.120] and me and a few founding engineers, [00:22:36.120 --> 00:22:37.760] we had been working at Meta [00:22:37.760 --> 00:22:42.000] on building AI platforms, and specifically PyTorch, for five years.
[00:22:42.000 --> 00:22:43.760] When we started PyTorch, [00:22:43.760 --> 00:22:47.080] it was a framework for researchers, [00:22:47.080 --> 00:22:49.760] and we took the mission to build one framework [00:22:49.760 --> 00:22:51.760] for both production and research [00:22:51.760 --> 00:22:54.320] and streamline the research-to-production transition, [00:22:54.320 --> 00:22:56.840] operating PyTorch at a huge scale [00:22:56.840 --> 00:22:59.120] for Meta and for the industry. [00:22:59.120 --> 00:23:03.400] So by the time we left last year, [00:23:03.400 --> 00:23:06.680] it was running more than five trillion inferences per day [00:23:06.680 --> 00:23:09.040] across 50 data centers for Meta. [00:23:09.040 --> 00:23:12.240] And we feel like this is a great impact we have landed. [00:23:12.240 --> 00:23:16.200] But when we look at the industry, it's really, really behind. [00:23:16.200 --> 00:23:21.200] And we founded Fireworks to really bring this expertise [00:23:21.200 --> 00:23:26.240] to help the industry adopt AI in a faster way, [00:23:26.240 --> 00:23:30.920] adopt the state-of-the-art research into production [00:23:30.920 --> 00:23:33.360] in a very streamlined way. [00:23:33.360 --> 00:23:35.640] And why Fireworks the name? [00:23:35.640 --> 00:23:37.760] Because PyTorch holds fire, [00:23:37.760 --> 00:23:39.880] and we want this fire to be everywhere. [00:23:39.880 --> 00:23:42.080] That's why we came up with our name, Fireworks. [00:23:42.080 --> 00:23:43.480] - Nice, nice. [00:23:43.480 --> 00:23:46.720] Well, there's also Lightning, and Lightning AI [00:23:46.720 --> 00:23:49.400] is kind of a spinoff of that effort. [00:23:49.400 --> 00:23:50.240] - Right, right. [00:23:50.240 --> 00:23:52.640] - And basically, I think there are multiple teams [00:23:52.640 --> 00:23:56.120] working on better inference for PyTorch. [00:23:56.120 --> 00:23:57.720] Could you elaborate? [00:23:57.720 --> 00:23:59.040] How do you see the landscape [00:23:59.040 --> 00:24:01.340] of sort of inference-as-a-service companies? [00:24:01.340 --> 00:24:02.980] I don't know if you consider yourself that, [00:24:02.980 --> 00:24:05.140] like infrastructure companies in general, I guess. [00:24:05.140 --> 00:24:10.140] - Right, so I think when we think about [00:24:10.140 --> 00:24:13.860] inference optimization, there are different angles, right? [00:24:13.860 --> 00:24:17.840] I still think the PyTorch team, when I was there and now, [00:24:17.840 --> 00:24:21.940] now the PyTorch team, they are still doing a great job [00:24:21.940 --> 00:24:24.300] pushing for PyTorch performance optimization [00:24:24.300 --> 00:24:26.220] across training and inference [00:24:26.220 --> 00:24:28.440] through the PyTorch Compile project. [00:24:29.420 --> 00:24:34.420] The goal here is to, hey, keep the simple [00:24:34.420 --> 00:24:37.420] PyTorch programming API, [00:24:37.420 --> 00:24:40.380] which is really good for researchers, [00:24:40.380 --> 00:24:42.100] and then take the heavy lifting [00:24:42.100 --> 00:24:44.340] of doing optimization in an automatic way. [00:24:44.340 --> 00:24:48.260] But then, because the PyTorch team supports [00:24:48.260 --> 00:24:50.480] and sustains a broad community, [00:24:50.480 --> 00:24:53.660] the workload is much more diversified [00:24:53.660 --> 00:24:56.020] when they think about optimization. [00:24:56.460 --> 00:25:00.340] And here, at Fireworks, we take the same philosophy.
[00:25:00.340 --> 00:25:02.220] We want to keep the simple API [00:25:02.220 --> 00:25:04.420] of the PyTorch programming language, [00:25:04.420 --> 00:25:06.700] and take the heavy lifting of the optimization, [00:25:06.700 --> 00:25:11.140] but with a more specific target at industry verticals, right? [00:25:11.140 --> 00:25:14.420] For example, when we started the company, [00:25:14.420 --> 00:25:17.440] we started from ranking and recommendation, [00:25:17.440 --> 00:25:20.320] and we have a product around that. [00:25:20.320 --> 00:25:24.540] And then, later on, the customers we engage with, [00:25:24.540 --> 00:25:26.700] they're asking us, hey, can we help on GenAI? [00:25:26.700 --> 00:25:29.180] Because all the GenAI models are PyTorch models, [00:25:29.180 --> 00:25:31.600] they're bigger, they're more complex, [00:25:31.600 --> 00:25:34.500] they're even harder to operate and optimize. [00:25:34.500 --> 00:25:37.740] So then we started a vertical on GenAI [00:25:37.740 --> 00:25:39.420] across large language models, [00:25:39.420 --> 00:25:41.900] and image generation, and other modalities as well. [00:25:41.900 --> 00:25:44.100] But because we focus on verticals, [00:25:44.100 --> 00:25:47.580] we can afford to take a much more specialized [00:25:47.580 --> 00:25:49.700] optimization approach. [00:25:49.700 --> 00:25:52.900] And that is complementary to PyTorch Compile, [00:25:52.900 --> 00:25:56.220] where PyTorch is driving for a broader audience. [00:25:56.220 --> 00:25:57.940] So that's where we are. [00:25:57.940 --> 00:26:01.780] And I will say, because of our PyTorch expertise, [00:26:01.780 --> 00:26:04.980] we are the best when it comes to [00:26:04.980 --> 00:26:08.940] performance optimization across the following areas, right? [00:26:08.940 --> 00:26:11.620] The performance for GenAI models is pretty complicated [00:26:11.620 --> 00:26:13.540] because there's no one bottleneck [00:26:13.540 --> 00:26:16.060] from a system resource consumption point of view. [00:26:16.060 --> 00:26:20.840] The bottleneck can be scattered across CPU-to-GPU communication, [00:26:20.840 --> 00:26:23.220] the compute itself, memory bandwidth, [00:26:23.220 --> 00:26:25.280] and many other things. [00:26:25.280 --> 00:26:30.280] So we developed a very special scaling algorithm [00:26:30.280 --> 00:26:35.660] that allows us to tackle those bottlenecks independently [00:26:35.660 --> 00:26:38.580] instead of blending them together. [00:26:38.580 --> 00:26:41.460] So that's a very unique thing we are doing. [00:26:41.460 --> 00:26:44.460] The second is we build custom kernels [00:26:44.460 --> 00:26:48.860] across attention, especially multi-query attention, [00:26:48.860 --> 00:26:53.080] matmul, all-reduce, and those custom kernels [00:26:53.080 --> 00:26:56.160] outperform anything in the industry. [00:26:56.160 --> 00:27:02.600] Yeah, we also do a lot of adaptive techniques [00:27:02.600 --> 00:27:04.960] so that when we run the inference, [00:27:04.960 --> 00:27:07.240] its performance will get better. [00:27:07.240 --> 00:27:08.560] The more you run the same workload, [00:27:08.560 --> 00:27:11.120] it will start to adapt to the workload [00:27:11.120 --> 00:27:12.360] and become better and better. [00:27:12.360 --> 00:27:16.600] So across all this, that enables us to be [00:27:16.600 --> 00:27:21.460] in the leading position as a GenAI inference provider.
[00:27:21.460 --> 00:27:23.020] - Just to give people a mental image, [00:27:23.020 --> 00:27:24.260] obviously they can go to the website, [00:27:24.260 --> 00:27:26.660] you have a self-serve option that people can try out. [00:27:26.660 --> 00:27:29.760] You mostly have a library of existing [00:27:29.760 --> 00:27:31.180] popular open source models. [00:27:31.180 --> 00:27:33.100] You just started creating your own models, [00:27:33.100 --> 00:27:34.540] which we can talk about. [00:27:34.540 --> 00:27:35.820] I didn't know that, that's super exciting. [00:27:35.820 --> 00:27:40.220] You actually recently enabled Mixtral [00:27:40.220 --> 00:27:42.340] in one day after their release [00:27:42.340 --> 00:27:44.500] by reverse engineering the code? [00:27:44.500 --> 00:27:45.340] - That's right. [00:27:45.480 --> 00:27:46.320] - That's a high level. [00:27:46.320 --> 00:27:48.400] - Yeah, so I think we did that twice. [00:27:48.400 --> 00:27:51.320] The first time was when Mistral 7B got released, [00:27:51.320 --> 00:27:52.440] the same day. [00:27:52.440 --> 00:27:53.360] They released in the morning, [00:27:53.360 --> 00:27:56.400] then in the afternoon we launched Mistral 7B. [00:27:56.400 --> 00:27:58.380] We were the first to get it working. [00:27:58.380 --> 00:28:01.720] - And this is basically, they release weights but no code. [00:28:01.720 --> 00:28:03.920] And then you have to implement code by guessing the-- [00:28:03.920 --> 00:28:07.360] - Right, for Mixtral, that happened last week. [00:28:07.360 --> 00:28:11.960] They only released the weights, and there's no code. [00:28:11.960 --> 00:28:16.020] And I think it's really fun for us, right? [00:28:16.020 --> 00:28:21.020] Because thanks to the technology we developed over time, [00:28:21.020 --> 00:28:28.200] we actually built a slew of componentized libraries, [00:28:28.200 --> 00:28:33.020] so enabling new models [00:28:33.020 --> 00:28:35.120] is not built from scratch every time. [00:28:35.120 --> 00:28:39.060] Because all these models [00:28:39.060 --> 00:28:42.880] share a similar kind of model architecture underneath, [00:28:42.880 --> 00:28:44.260] with different components, [00:28:44.260 --> 00:28:47.700] and that's why we have the velocity and the speed. [00:28:47.700 --> 00:28:49.880] But it was actually fun to hack it. [00:28:49.880 --> 00:28:54.900] Dima, whose full name is Dmytro Dzhulgakov. [00:28:54.900 --> 00:28:55.740] - Your CTO. [00:28:55.740 --> 00:28:57.080] - Yeah, our CTO. [00:28:57.080 --> 00:29:00.160] He basically took the Llama model [00:29:00.160 --> 00:29:03.720] and tried to retrofit it to the Mixtral weights, [00:29:03.720 --> 00:29:04.560] and it worked. [00:29:04.560 --> 00:29:06.080] It worked, we were thrilled. [00:29:06.080 --> 00:29:08.520] Oh, it's actually working pretty well. [00:29:08.520 --> 00:29:11.440] But on top of that, it was just a base model. [00:29:11.440 --> 00:29:13.900] It's not an instruct-tuned model. [00:29:13.900 --> 00:29:16.640] It's not really usable for chat. [00:29:16.640 --> 00:29:19.360] And then overnight, we tuned a chat model [00:29:19.360 --> 00:29:22.160] and deployed it to Poe bots, [00:29:22.160 --> 00:29:26.840] and it's used by many other users already at high scale. [00:29:26.840 --> 00:29:29.160] And the feedback is really, really good. [00:29:29.160 --> 00:29:30.600] Of course, now we switched to Mixtral [00:29:30.600 --> 00:29:32.140] Instruct as the official version, [00:29:32.140 --> 00:29:34.320] but we still keep getting users' feedback.
[00:29:34.320 --> 00:29:36.800] Our overnight tuned chat model [00:29:36.800 --> 00:29:38.600] sometimes even performed better. [00:29:38.600 --> 00:29:39.440] - Wow. [00:29:39.440 --> 00:29:41.120] - So, yeah, so that's what we do. [00:29:41.120 --> 00:29:44.760] When it comes to velocity to quality [00:29:44.760 --> 00:29:48.480] and velocity to speed, [00:29:48.480 --> 00:29:50.080] we are the best company in the industry. [00:29:50.080 --> 00:29:51.920] - Yeah, mentioning speed, I should also mention [00:29:51.920 --> 00:29:54.240] that a lot of AI engineers listening to the podcast [00:29:54.240 --> 00:29:56.680] would be familiar with the Vercel AI Playground, [00:29:56.680 --> 00:29:59.240] which you are the primary provider for, right? [00:29:59.240 --> 00:30:00.600] I mean, that's the one that's most visible [00:30:00.600 --> 00:30:01.440] 'cause they name you, [00:30:01.440 --> 00:30:02.720] but I don't know if there's any other [00:30:02.720 --> 00:30:04.240] that you serve that you can name [00:30:04.240 --> 00:30:06.640] as you're the sort of inference provider. [00:30:06.640 --> 00:30:08.880] - Here's just kind of a very highly selective list [00:30:08.880 --> 00:30:09.720] of the customers. - Yeah, of course, [00:30:09.720 --> 00:30:10.960] it's not exhaustive. [00:30:10.960 --> 00:30:13.400] - Yeah, we get the marketing rights. [00:30:13.400 --> 00:30:14.240] - Yeah. [00:30:14.240 --> 00:30:17.280] - So we already serve Tome. [00:30:17.280 --> 00:30:19.600] They're doing really good PowerPoint generation. [00:30:19.600 --> 00:30:21.840] If you haven't used that, please try it out. [00:30:21.840 --> 00:30:22.840] It's really cool. [00:30:22.840 --> 00:30:24.960] - Yeah, I used it for my keynote for my conference. [00:30:24.960 --> 00:30:25.880] - Oh, that's fantastic. [00:30:25.880 --> 00:30:30.520] - Yeah, I used like a magic trackpad to present the Tome, [00:30:30.520 --> 00:30:32.600] and then obviously whenever I need to generate images, [00:30:32.600 --> 00:30:34.140] I actually generate it from inside of Tome. [00:30:34.140 --> 00:30:35.880] So I was using Fireworks without knowing it. [00:30:35.880 --> 00:30:37.440] - That's fantastic. [00:30:37.440 --> 00:30:41.120] We also serve the Copilot kind of applications. [00:30:41.120 --> 00:30:43.680] For example, Sourcegraph released Cody. [00:30:43.680 --> 00:30:46.720] - By the time this releases, [00:30:46.720 --> 00:30:49.680] we'll release our episode with Sourcegraph and Steve Yegge. [00:30:49.680 --> 00:30:50.520] - Oh, that's great. [00:30:50.520 --> 00:30:51.360] That's great. - Yeah, we recorded one. [00:30:51.360 --> 00:30:52.640] We're good friends. [00:30:52.640 --> 00:30:56.360] - We also are the inference backend provider for Poe. [00:30:56.360 --> 00:30:58.480] That is a very popular chatbot, [00:30:58.480 --> 00:31:00.680] and Poe is building-- [00:31:00.680 --> 00:31:03.400] - Wait, doesn't Poe just use Anthropic or GPT? [00:31:03.400 --> 00:31:04.240] - At the beginning. [00:31:04.240 --> 00:31:05.060] - Oh, okay, now they have their own models. [00:31:05.060 --> 00:31:08.760] - Yeah, they are going big on open source models. [00:31:08.760 --> 00:31:09.600] - I see. [00:31:09.600 --> 00:31:12.680] - To provide a variety of [00:31:12.680 --> 00:31:15.360] different experiences, [00:31:15.360 --> 00:31:18.860] and much better performance. [00:31:18.860 --> 00:31:22.960] And of course, from their point of view, cost efficiency.
[00:31:22.960 --> 00:31:25.360] There are many other big enterprises, [00:31:25.360 --> 00:31:27.320] for example, DoorDash. [00:31:27.320 --> 00:31:28.160] They're using us. [00:31:28.160 --> 00:31:28.980] - Did they say for what? [00:31:28.980 --> 00:31:31.960] - Yeah, so we actually, yeah, [00:31:31.960 --> 00:31:35.780] we released a ranking and recommendation stack with them [00:31:35.780 --> 00:31:37.900] to power their main business. [00:31:37.900 --> 00:31:40.660] Because when you go to their website, [00:31:40.660 --> 00:31:42.980] there is a lot of ranking and recommendation happening, [00:31:42.980 --> 00:31:47.500] including ads and restaurant [00:31:47.500 --> 00:31:48.940] and search recommendation and so on. [00:31:48.940 --> 00:31:50.660] - One thing I wonder about is, [00:31:50.660 --> 00:31:51.900] for something like a DoorDash, [00:31:51.900 --> 00:31:54.560] and I'm a bit newer to RecSys in general, [00:31:54.560 --> 00:31:56.740] shouldn't those be pre-computed? [00:31:56.740 --> 00:31:59.060] Like, why does it have to be fast or live? [00:31:59.060 --> 00:32:00.740] It doesn't have to be live, right? [00:32:00.740 --> 00:32:03.220] - Actually, there is a lot of dynamism, right? [00:32:03.220 --> 00:32:07.460] Because your personal preference may change, right? [00:32:07.460 --> 00:32:09.380] It's also quickly learning. [00:32:09.380 --> 00:32:12.020] And their distribution channel, [00:32:12.020 --> 00:32:14.660] their participating restaurants may change, [00:32:14.660 --> 00:32:16.140] their menus may change. [00:32:16.140 --> 00:32:19.960] There's a lot of dynamism in the matching criteria here. [00:32:19.960 --> 00:32:23.540] And as I worked at Meta for a long time, [00:32:23.540 --> 00:32:28.380] I know that doing highly adaptive ranking recommendation, [00:32:28.380 --> 00:32:30.600] personalized ranking recommendation, [00:32:30.600 --> 00:32:32.620] yields the best performance [00:32:32.620 --> 00:32:36.220] when it comes to relevance and revenue. [00:32:36.220 --> 00:32:38.660] - Yeah, I'm just asking offline versus online. [00:32:38.660 --> 00:32:43.420] I don't know how sensitive this is to latency requirements. [00:32:43.420 --> 00:32:44.460] - Oh, yeah, yeah. [00:32:44.460 --> 00:32:48.740] No, so a lot of the time, [00:32:48.740 --> 00:32:53.740] of course at big companies, people do online training. [00:32:53.740 --> 00:32:56.580] But for those enterprises, [00:32:56.580 --> 00:32:59.920] I haven't seen the need to go online training yet. [00:32:59.920 --> 00:33:04.140] So usually training is offline, but it's periodic, right? [00:33:04.140 --> 00:33:06.700] You have to refresh with new information [00:33:06.700 --> 00:33:10.500] and then you launch and deploy periodically, yeah. [00:33:10.500 --> 00:33:13.560] - Okay, and so I teased this earlier. [00:33:13.560 --> 00:33:15.540] I didn't know that you had your own models [00:33:15.540 --> 00:33:17.020] that you're also training. [00:33:17.020 --> 00:33:18.780] So you just released a clean LLaVA. [00:33:18.780 --> 00:33:20.380] - Yeah. [00:33:20.380 --> 00:33:21.200] - What's the story behind that? [00:33:21.200 --> 00:33:25.540] - Right, so I think everyone knows about GPT-4V [00:33:25.540 --> 00:33:28.040] and the kind of the space of multimodality, right?
[00:33:28.980 --> 00:33:32.360] I think as I talked about in one of the interviews [00:33:32.360 --> 00:33:34.600] when I was at Meta for PyTorch, [00:33:34.600 --> 00:33:35.840] at the end, the moderator asked me, [00:33:35.840 --> 00:33:38.040] "Hey, what do I think of the future?" [00:33:38.040 --> 00:33:39.840] My answer is multimodality. [00:33:39.840 --> 00:33:41.360] 'Cause we live in a world [00:33:41.360 --> 00:33:44.400] that has so many different modalities [00:33:44.400 --> 00:33:49.400] across image, audio, text, video, and so many other things. [00:33:49.400 --> 00:33:53.920] And that mix is our world and the real-world experience. [00:33:53.920 --> 00:33:57.680] So yeah, we really think multimodality [00:33:57.680 --> 00:34:00.640] will be a very important aspect. [00:34:00.640 --> 00:34:05.640] So we took the very popular LLaVA model from Microsoft, [00:34:05.640 --> 00:34:10.680] but it has the kind of GPT-4-generated training data. [00:34:10.680 --> 00:34:13.960] So we replaced that with our own training data [00:34:13.960 --> 00:34:16.920] to make sure it's commercially usable. [00:34:16.920 --> 00:34:19.080] Yeah, we're super excited about this. [00:34:19.080 --> 00:34:20.400] - Yeah, I mean, it sounds like [00:34:20.400 --> 00:34:22.720] you'll be exploring more models as well [00:34:22.720 --> 00:34:24.560] and just putting them all on your platform, [00:34:24.560 --> 00:34:26.560] and you're the fastest way to access them. [00:34:26.560 --> 00:34:27.400] We're here at NeurIPS. [00:34:27.400 --> 00:34:29.200] You're talking to a lot of industry folks. [00:34:29.200 --> 00:34:30.920] Any other top of mind conversations [00:34:30.920 --> 00:34:31.880] that you're just hearing a lot [00:34:31.880 --> 00:34:33.600] that may be surprising to people? [00:34:33.600 --> 00:34:38.600] - So I mostly talked with many startups that are emerging. [00:34:38.600 --> 00:34:44.000] So number one, it's really refreshing to me, [00:34:44.000 --> 00:34:46.560] though not surprising, that there's so much [00:34:46.560 --> 00:34:49.760] product innovation that's happening across the board. [00:34:49.760 --> 00:34:51.800] So much energy there, [00:34:51.800 --> 00:34:54.660] and a lot of those are built on top of GenAI. [00:34:55.840 --> 00:34:56.980] Of course it's not surprising, [00:34:56.980 --> 00:35:00.720] but it's kind of validating that fundamentally [00:35:00.720 --> 00:35:04.800] innovative technology can reboot [00:35:04.800 --> 00:35:06.520] a huge part of the industry. [00:35:06.520 --> 00:35:09.560] So that's really, really refreshing. [00:35:09.560 --> 00:35:14.560] The second is, I think there are a lot more conversations of, hey, [00:35:14.560 --> 00:35:20.120] how do we think about working together, right? [00:35:20.120 --> 00:35:24.720] How do we build a bigger, more interesting product [00:35:24.720 --> 00:35:26.680] for a broader audience together? [00:35:26.680 --> 00:35:30.160] I think those conversations are very, very interesting to me. [00:35:30.160 --> 00:35:31.000] - Yeah, yeah. [00:35:31.000 --> 00:35:31.820] Okay, very cool. [00:35:31.820 --> 00:35:33.800] And you're also here to hire or recruit? [00:35:33.800 --> 00:35:34.640] - Oh, yeah, absolutely. [00:35:34.640 --> 00:35:35.460] - Maybe put out a call. [00:35:35.460 --> 00:35:36.300] Who are you looking for? [00:35:36.300 --> 00:35:37.120] What's the profile? [00:35:37.120 --> 00:35:41.560] - Yeah, we are definitely growing very fast as a company.
[00:35:41.560 --> 00:35:45.480] We are looking for systems engineers, as, [00:35:45.480 --> 00:35:50.480] hey, we already have rock-solid inference serving, [00:35:50.480 --> 00:35:54.200] but we are scaling it quickly and aggressively. [00:35:54.200 --> 00:35:59.120] So anyone with cloud infrastructure experience [00:35:59.120 --> 00:36:02.040] who can move really fast, join us. [00:36:02.040 --> 00:36:05.920] We are also looking for researchers [00:36:05.920 --> 00:36:10.280] who have a lot of experience and understand data a lot, [00:36:10.280 --> 00:36:12.200] understand quality a lot, [00:36:12.200 --> 00:36:14.760] and can quickly help our customers [00:36:14.760 --> 00:36:15.880] get to high quality, [00:36:15.880 --> 00:36:19.320] whether through training our own models [00:36:19.320 --> 00:36:21.080] or fine-tuning the models [00:36:21.080 --> 00:36:25.480] and building task-specific fine-tuning services. [00:36:25.480 --> 00:36:28.960] Those are the areas we are pushing really aggressively on. [00:36:28.960 --> 00:36:31.280] And of course, we are hiring across the board [00:36:31.280 --> 00:36:35.520] for go-to-market people, all the way from marketing [00:36:35.520 --> 00:36:37.720] to solution architects, sales reps, and so on. [00:36:37.720 --> 00:36:38.600] - Yeah, yeah. [00:36:38.600 --> 00:36:39.440] Nice. [00:36:39.440 --> 00:36:40.260] Seems like you're scaling very quickly. [00:36:40.260 --> 00:36:41.360] Thanks for coming on. [00:36:41.360 --> 00:36:43.080] - Oh, thank you for having me. [00:36:43.080 --> 00:36:43.920] Cool. [00:36:43.920 --> 00:36:44.760] - That's it. [00:36:44.760 --> 00:36:45.580] - When I first met Fireworks, [00:36:45.580 --> 00:36:46.760] I was very impressed by their team, [00:36:46.760 --> 00:36:49.120] but since then I've been more impressed by the execution. [00:36:49.120 --> 00:36:51.280] And my guess is that this will not be the only time [00:36:51.280 --> 00:36:53.720] that you'll hear about them on the Latent Space pod. [00:36:53.720 --> 00:36:56.120] So far in organizing and editing this podcast, [00:36:56.120 --> 00:36:57.680] I've been trying to bias towards [00:36:57.680 --> 00:37:00.120] reintroducing previous guests of the pod [00:37:00.120 --> 00:37:03.960] as a form of end of year check-in episode with friends. [00:37:03.960 --> 00:37:06.120] But so many of them actually mentioned Fireworks. [00:37:06.120 --> 00:37:08.600] You'll see later with Cursor and Perplexity [00:37:08.600 --> 00:37:10.400] that I had to put Fireworks first, [00:37:10.400 --> 00:37:13.060] just because that many people have interacted with them, [00:37:13.060 --> 00:37:15.640] used them and love them, or compete with them. [00:37:15.640 --> 00:37:17.280] I think it's a really interesting open question [00:37:17.280 --> 00:37:19.840] as to how much moat any one inference [00:37:19.840 --> 00:37:22.360] or commodity infrastructure provider can have. [00:37:22.360 --> 00:37:24.640] The people who are not in the business say there's no moat. [00:37:24.640 --> 00:37:26.720] And the people who are in the business, like Lin, [00:37:26.720 --> 00:37:28.720] see tons of moat in the software that they write, [00:37:28.720 --> 00:37:30.440] which obviously is proprietary to them. [00:37:30.440 --> 00:37:32.160] It's also interesting to see them start training [00:37:32.160 --> 00:37:33.660] and releasing their own models.
[00:37:33.660 --> 00:37:35.400] And Fireworks released a LLaVA variant, [00:37:35.400 --> 00:37:38.780] which we covered in our previous NeurIPS episode [00:37:38.780 --> 00:37:40.480] as one of the best papers of 2023. [00:37:40.480 --> 00:37:43.080] So I highly encourage you to check out that conversation [00:37:43.080 --> 00:37:44.600] with Haotian if you're interested. [00:37:44.600 --> 00:37:46.560] So I say all that to preface the conversation [00:37:46.560 --> 00:37:48.160] that we're gonna have with the next two guests. [00:37:48.160 --> 00:37:49.540] The first is a return guest, [00:37:49.540 --> 00:37:52.180] which is Aman Sanger from cursor.so. [00:37:52.180 --> 00:37:54.200] We had them on in August to talk about [00:37:54.200 --> 00:37:58.080] their amazing rise to power as the AI-first code editor. [00:37:58.080 --> 00:38:00.600] They've definitely exploded all over my timeline. [00:38:00.600 --> 00:38:01.920] And at the time of the interview, [00:38:01.920 --> 00:38:06.920] I myself was a VS Code, Cody, Codeium, Copilot fan. [00:38:06.920 --> 00:38:09.320] And since then, I've actually switched my own workflow [00:38:09.320 --> 00:38:11.400] over to Cursor because of the better workflow [00:38:11.400 --> 00:38:12.280] that they provide. [00:38:12.280 --> 00:38:13.560] But still, there's a lot of open questions [00:38:13.560 --> 00:38:14.560] around their business. [00:38:14.560 --> 00:38:17.360] Just like Mosaic, during our podcast interview, [00:38:17.360 --> 00:38:19.800] they were actually sitting on a fundraise, [00:38:19.800 --> 00:38:21.960] and they had recently announced their fundraise [00:38:21.960 --> 00:38:22.780] with OpenAI. [00:38:22.780 --> 00:38:24.520] So let's check in on Cursor. [00:38:24.520 --> 00:38:25.360] Okay, cool. [00:38:25.360 --> 00:38:26.180] So I'm back with Aman. [00:38:26.180 --> 00:38:27.020] Hey. [00:38:27.020 --> 00:38:27.860] - Hey, how's it going? [00:38:27.860 --> 00:38:28.680] - Hard to catch you. [00:38:28.680 --> 00:38:29.800] You're a difficult man to find. [00:38:29.800 --> 00:38:30.960] - I guess so. [00:38:30.960 --> 00:38:31.920] - You've been exploring NeurIPS [00:38:31.920 --> 00:38:33.240] and you also announced your fundraise [00:38:33.240 --> 00:38:34.360] since our last episode. [00:38:34.360 --> 00:38:35.200] - Yeah. [00:38:35.200 --> 00:38:37.520] So we raised $8 million from OpenAI. [00:38:37.520 --> 00:38:39.020] They've been a fantastic partner [00:38:39.020 --> 00:38:40.500] and I think it was a great decision. [00:38:40.500 --> 00:38:41.520] - Yeah. [00:38:41.520 --> 00:38:42.820] OpenAI uses you themselves. [00:38:42.820 --> 00:38:44.780] - Yes, we have a lot of OpenAI users [00:38:44.780 --> 00:38:47.080] and we're growing pretty fast inside the org. [00:38:47.080 --> 00:38:48.780] The thing that we like to say is like, [00:38:48.780 --> 00:38:52.480] Cursor is the means by which research happens faster, right? [00:38:52.480 --> 00:38:54.540] Like as we make programming happen faster and faster, [00:38:54.540 --> 00:38:56.480] as we make programmers much more efficient, [00:38:56.480 --> 00:38:59.040] we're making researchers more efficient. [00:38:59.040 --> 00:39:00.180] And the bottleneck for research [00:39:00.180 --> 00:39:02.100] is really just implementation.
[00:39:02.100 --> 00:39:04.200] If you can come up with an idea [00:39:04.200 --> 00:39:06.700] and then actually have the code, [00:39:06.700 --> 00:39:09.980] have the experiment all written for you immediately, [00:39:09.980 --> 00:39:11.740] research just happens much faster. [00:39:11.740 --> 00:39:13.280] And so that's the goal that we're working towards. [00:39:13.280 --> 00:39:16.480] And I think we're a tiny bit of the way there [00:39:16.480 --> 00:39:18.120] with a lot of OpenAI users. [00:39:18.120 --> 00:39:18.960] - Yeah. [00:39:18.960 --> 00:39:21.640] What's the funniest or most interesting sort of feedback [00:39:21.640 --> 00:39:24.240] you get from OpenAI people versus regular coders? [00:39:24.240 --> 00:39:25.600] Like do they prompt differently [00:39:25.600 --> 00:39:26.760] because they work at OpenAI? [00:39:26.760 --> 00:39:29.240] - So they actually probably have less feedback [00:39:29.240 --> 00:39:30.440] than some of our other users [00:39:30.440 --> 00:39:31.600] who are less familiar with language models, [00:39:31.600 --> 00:39:33.280] 'cause they know what the deficiencies are. [00:39:33.280 --> 00:39:34.840] They kind of know what's going on underneath the hood. [00:39:34.840 --> 00:39:37.700] - Yeah, you can probably give them interesting input [00:39:37.700 --> 00:39:39.480] on what people are trying and failing with. [00:39:39.480 --> 00:39:40.680] - Yeah, that's true. [00:39:40.680 --> 00:39:41.860] We do give them a lot of feedback [00:39:41.860 --> 00:39:44.560] on a lot of their early alphas and whatnot. [00:39:44.560 --> 00:39:47.400] - And so you've been tearing up the Twitters recently, [00:39:47.400 --> 00:39:48.400] putting in some effort. [00:39:48.400 --> 00:39:50.040] What are your sort of top messages [00:39:50.040 --> 00:39:52.000] that have been really resonating with people? [00:39:52.000 --> 00:39:55.840] - I was a big fan of the KV caching tweet. [00:39:55.840 --> 00:39:58.360] It's surprising that not too many people, [00:39:58.360 --> 00:40:00.740] it seemed like not too many people knew about this before. [00:40:00.740 --> 00:40:01.580] - Yeah. [00:40:01.580 --> 00:40:04.800] So when people learn about transformers, [00:40:04.800 --> 00:40:08.080] it's actually not in the documented literature [00:40:08.080 --> 00:40:09.580] and the academic side of things [00:40:09.580 --> 00:40:12.200] that KV caching is a common industry practice. [00:40:12.200 --> 00:40:13.040] - Yeah. [00:40:13.040 --> 00:40:14.320] - You only find out when you talk to industry people [00:40:14.320 --> 00:40:16.040] that you have a KV cache. [00:40:16.040 --> 00:40:18.480] - So when you say KV cache, it's really confusing [00:40:18.480 --> 00:40:21.760] because the KV cache can itself be cached, right? [00:40:21.760 --> 00:40:23.000] It's almost like a double caching. [00:40:23.000 --> 00:40:25.160] But the key idea here is, [00:40:25.160 --> 00:40:27.600] well, let's look at all the big closed model providers. [00:40:27.600 --> 00:40:29.520] They all have these chat models. [00:40:29.520 --> 00:40:32.440] And with chats and with conversations, [00:40:32.440 --> 00:40:35.040] the first N conversation messages are always fixed. [00:40:35.040 --> 00:40:37.040] And that means the first, let's say, [00:40:37.040 --> 00:40:39.120] N tokens are going to be fixed.
[00:40:39.120 --> 00:40:41.080] And that means when I put the next token in, [00:40:41.080 --> 00:40:43.360] why do I need to redo all the work [00:40:43.360 --> 00:40:45.900] of re-computing the keys and values [00:40:45.900 --> 00:40:47.140] for those first N tokens? [00:40:47.140 --> 00:40:48.440] - Yeah. [00:40:48.440 --> 00:40:51.360] - And a standard inference trick for this [00:40:51.360 --> 00:40:54.200] is you take those keys and values [00:40:54.200 --> 00:40:57.120] and you move them from GPU RAM to CPU RAM. [00:40:57.120 --> 00:40:57.960] - Yeah. [00:40:57.960 --> 00:41:00.120] - You store them there for some period of time [00:41:00.120 --> 00:41:01.160] before they're evicted. [00:41:01.160 --> 00:41:03.160] And then if another request comes in [00:41:03.160 --> 00:41:04.520] with a matching prefix, [00:41:04.520 --> 00:41:06.980] the matching original conversation history, [00:41:06.980 --> 00:41:08.400] you just load those back into GPU RAM [00:41:08.400 --> 00:41:11.360] and you save a ton of time on compute. [00:41:11.360 --> 00:41:13.440] Your time to first token goes down. [00:41:13.440 --> 00:41:14.840] And then because you're saving on compute, [00:41:14.840 --> 00:41:16.880] you can increase your throughput. [00:41:16.880 --> 00:41:18.720] And this is a trick that you don't really see [00:41:18.720 --> 00:41:20.800] in any of the open source inference engines. [00:41:20.800 --> 00:41:22.320] - So you don't see that, [00:41:22.320 --> 00:41:24.220] but people implement it on top of them, right? [00:41:24.220 --> 00:41:25.060] - Yes. [00:41:25.060 --> 00:41:26.560] Well, my understanding is, [00:41:26.560 --> 00:41:27.400] Together, for example, [00:41:27.400 --> 00:41:28.800] I think is implementing this. [00:41:28.800 --> 00:41:30.360] - Yeah, and I just talked with Lin Qiao [00:41:30.360 --> 00:41:31.200] from Fireworks as well, [00:41:31.200 --> 00:41:32.560] they're doing just that. - Yeah. [00:41:32.560 --> 00:41:34.240] - So one of the interesting, [00:41:34.240 --> 00:41:36.800] oh, I always assumed that it's because of personalization. [00:41:36.800 --> 00:41:38.560] Like, hey, in my system prompt, [00:41:38.560 --> 00:41:39.520] I have today's date. [00:41:39.520 --> 00:41:40.920] I'm gonna have to update that once a day. [00:41:40.920 --> 00:41:41.740] Fine. [00:41:41.740 --> 00:41:42.580] No big deal. [00:41:42.580 --> 00:41:43.420] - Yeah. [00:41:43.420 --> 00:41:46.000] - But maybe if people have more customized prompts. [00:41:46.000 --> 00:41:48.880] But you said there's some kind of cache eviction policy [00:41:48.880 --> 00:41:51.920] where if there's a 95% match, [00:41:51.920 --> 00:41:53.800] you use the cache. [00:41:53.800 --> 00:41:56.200] - Yeah, I don't know what the exact eviction policy would be. [00:41:56.200 --> 00:41:57.600] You could probably use, [00:41:57.600 --> 00:41:59.480] assume you have, I don't know, [00:41:59.480 --> 00:42:01.760] 100 gigabytes of space per device. [00:42:01.760 --> 00:42:02.720] Probably a lot more, actually. [00:42:02.720 --> 00:42:06.500] Probably up to a terabyte of CPU RAM per device. [00:42:06.500 --> 00:42:07.660] Or maybe per machine. [00:42:07.660 --> 00:42:11.320] You could just do something like least recently used. [00:42:11.320 --> 00:42:14.560] And then if you start to use up more space [00:42:14.560 --> 00:42:15.640] than exists on device, [00:42:15.640 --> 00:42:18.520] you just evict the least recently used request. [00:42:18.520 --> 00:42:21.380] - You are a consumer mostly of the GPT-4 API.
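(Editor's note: before we get back to the conversation, here is a toy sketch of the prefix-caching trick Aman just described. The shapes, the four-entry capacity, and the plain LRU policy are all invented for illustration; no provider's actual implementation is being shown.)

```python
from collections import OrderedDict

import torch


class PrefixKVCache:
    """Toy prefix cache: keeps (keys, values) for recent conversation prefixes
    in host memory and evicts the least recently used entry when full."""

    def __init__(self, max_entries: int = 4):
        self.max_entries = max_entries
        self.store = OrderedDict()  # prefix token ids -> (K, V) on CPU

    def put(self, prefix_tokens: tuple, k: torch.Tensor, v: torch.Tensor) -> None:
        # Offload to CPU RAM; on a real server these tensors came off the GPU.
        self.store[prefix_tokens] = (k.to("cpu"), v.to("cpu"))
        self.store.move_to_end(prefix_tokens)
        if len(self.store) > self.max_entries:
            self.store.popitem(last=False)  # evict the least recently used entry

    def get(self, prefix_tokens: tuple):
        if prefix_tokens not in self.store:
            return None  # cache miss: the prefix has to be prefilled again
        self.store.move_to_end(prefix_tokens)
        k, v = self.store[prefix_tokens]
        # On a hit you would load these back onto the GPU (e.g. k.to("cuda"))
        # and skip prefill for the shared prefix, which cuts time to first token.
        return k, v


cache = PrefixKVCache()
prefix = (1, 5, 9, 42)  # token ids of the fixed system/chat prefix
cache.put(prefix, torch.randn(1, 8, 4, 64), torch.randn(1, 8, 4, 64))
hit = cache.get(prefix)  # reuse the stored keys/values instead of recomputing
```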
[00:42:21.380 --> 00:42:22.440] - Yes. [00:42:22.440 --> 00:42:23.920] - They don't really expose this. [00:42:23.920 --> 00:42:25.000] - They don't. - In the API. [00:42:25.000 --> 00:42:26.480] How does this affect you? [00:42:26.480 --> 00:42:29.000] - I think it's actually pretty important [00:42:29.000 --> 00:42:31.600] to understand what's going on underneath the hood [00:42:31.600 --> 00:42:33.560] to take advantage of these things. [00:42:33.560 --> 00:42:35.980] So we use dedicated instances. [00:42:35.980 --> 00:42:39.500] - So they expose their capability to you. [00:42:39.500 --> 00:42:40.340] - Like somewhat. [00:42:40.340 --> 00:42:41.740] But the key thing is, [00:42:41.740 --> 00:42:43.460] they expose very little, actually. [00:42:43.460 --> 00:42:44.900] And you can-- - Isn't that weird? [00:42:44.900 --> 00:42:46.500] - I mean, yeah, but the only way [00:42:46.500 --> 00:42:48.140] that you can really take advantage of this, [00:42:48.140 --> 00:42:50.160] and I kind of had another tweet about this, [00:42:50.160 --> 00:42:53.020] is you need to really understand [00:42:53.020 --> 00:42:54.060] what's going on underneath the hood [00:42:54.060 --> 00:42:58.420] so you can then plan for when memory utilization is spiking, [00:42:58.420 --> 00:43:01.780] based on how many tokens you're currently using [00:43:01.780 --> 00:43:03.900] or how much memory you can speculate the instance is using, [00:43:03.900 --> 00:43:06.780] or when you are getting a lot of cache hits [00:43:06.780 --> 00:43:10.340] so you don't expect to be using as much compute, [00:43:10.340 --> 00:43:12.020] which means you can then increase your throughput [00:43:12.020 --> 00:43:13.420] without worrying about [00:43:13.420 --> 00:43:15.300] latency spiking or things going down. [00:43:15.300 --> 00:43:17.220] - Yeah, and I don't know if you've, [00:43:17.220 --> 00:43:20.820] I've taken this thought to quite an extreme level. [00:43:20.820 --> 00:43:23.020] Like you can use this to cache RAG stuff, [00:43:23.020 --> 00:43:24.020] like RAG results. [00:43:24.020 --> 00:43:24.860] - Yeah. [00:43:24.860 --> 00:43:26.700] - And just general prompts, right? [00:43:26.700 --> 00:43:27.660] - You can, you can. [00:43:27.660 --> 00:43:29.700] So I did have another tweet about this [00:43:29.700 --> 00:43:31.620] where, no one's done this yet, [00:43:31.620 --> 00:43:32.460] to my knowledge, [00:43:32.460 --> 00:43:34.660] and I think this would be very, very hard to do, [00:43:34.660 --> 00:43:37.820] but you could technically cache the entirety [00:43:37.820 --> 00:43:41.620] of some corpus in something like S3 [00:43:41.620 --> 00:43:46.620] if you have a model which has smaller sized keys and values. [00:43:46.620 --> 00:43:49.180] So this would be, instead of full multi-head attention, [00:43:49.180 --> 00:43:52.620] it could be something like grouped query attention, [00:43:52.620 --> 00:43:54.740] which is, I think, usually around 8x smaller, [00:43:54.740 --> 00:43:59.740] or even multi-query, which can be 64 to 256x smaller. [00:44:00.220 --> 00:44:01.980] And so then what that means is [00:44:01.980 --> 00:44:05.420] you can actually read the cached keys and values from blob storage [00:44:05.420 --> 00:44:07.620] if you have everything really optimized. [00:44:07.620 --> 00:44:10.500] You can read it into RAM a decent bit faster [00:44:10.500 --> 00:44:13.260] than it would actually take to [00:44:13.260 --> 00:44:15.060] re-compute the KV cache.
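(Editor's note: to make the "8x / 64-256x smaller" figures concrete, here is a back-of-the-envelope calculation of the KV cache for one long prompt. All of the model dimensions below are illustrative, not any specific model's configuration; with these numbers MQA comes out about 32x smaller, and models with more query heads get the larger factors Aman cites.)

```python
# Rough KV-cache sizes for a single long prompt, per attention scheme.
layers, q_heads, head_dim, seq_len = 32, 32, 128, 32_000
bytes_per_elem = 2  # fp16


def kv_bytes(kv_heads: int) -> int:
    # Two tensors (K and V) per layer, each [kv_heads, seq_len, head_dim].
    return 2 * layers * kv_heads * seq_len * head_dim * bytes_per_elem


mha = kv_bytes(kv_heads=32)  # full multi-head attention: one KV head per query head
gqa = kv_bytes(kv_heads=4)   # grouped-query attention: 8 query heads share a KV head
mqa = kv_bytes(kv_heads=1)   # multi-query attention: one shared KV head

for name, b in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {b / 2**30:.1f} GiB  ({mha / b:.0f}x smaller than the MHA baseline)")
# MHA: ~15.6 GiB, GQA: ~2.0 GiB (8x), MQA: ~0.5 GiB (32x) with these toy numbers
```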
[00:44:15.060 --> 00:44:17.540] I think that'll be very tricky to implement, [00:44:17.540 --> 00:44:19.740] and I think there are actually not too many use cases [00:44:19.740 --> 00:44:20.660] where it would be useful. [00:44:20.660 --> 00:44:23.660] I think code bases are actually one where it could be. [00:44:23.660 --> 00:44:24.780] - Yeah. [00:44:24.780 --> 00:44:27.020] My final observation on this is, [00:44:27.940 --> 00:44:30.980] OpenAI had the opportunity to offer caching to people [00:44:30.980 --> 00:44:33.220] with the Assistants API, and again, [00:44:33.220 --> 00:44:34.340] they're charging you for the whole thing [00:44:34.340 --> 00:44:37.340] every single time you send a message to the Assistants API. [00:44:37.340 --> 00:44:41.740] And I find it, is there some explanation? [00:44:41.740 --> 00:44:44.780] Is it just like a, we can do it, so we're gonna do it? [00:44:44.780 --> 00:44:47.380] - It's tricky when you're not using, [00:44:47.380 --> 00:44:49.500] I don't know what they're doing underneath the hood, [00:44:49.500 --> 00:44:52.180] but if you assume they're doing something like [00:44:52.180 --> 00:44:55.820] caching at a machine level, these are serverless endpoints. [00:44:55.820 --> 00:44:57.220] - I assume they're not, they're serverless, right? [00:44:57.220 --> 00:45:00.260] So you have to load, unload, and that causes a cold start, [00:45:00.260 --> 00:45:02.500] and that's a problem for them. [00:45:02.500 --> 00:45:06.180] - So it's relatively trivial when you have [00:45:06.180 --> 00:45:08.600] server-based endpoints or dedicated instances, [00:45:08.600 --> 00:45:11.020] but with serverless it's probably quite tricky to get right. [00:45:11.020 --> 00:45:13.060] I mean, I'm not really confident [00:45:13.060 --> 00:45:15.420] as to what their decision-making was there, [00:45:15.420 --> 00:45:17.660] but I'd imagine it's much more difficult to get right. [00:45:17.660 --> 00:45:18.500] - Got it. [00:45:18.500 --> 00:45:20.700] What was your second tweet that we prepped? [00:45:20.700 --> 00:45:22.660] - One of them that I thought was interesting [00:45:22.660 --> 00:45:25.340] was generating a retrieval dataset. [00:45:25.340 --> 00:45:26.900] - Yes, synthetic data. [00:45:26.900 --> 00:45:28.140] - Using synthetic data. [00:45:28.140 --> 00:45:30.460] I mean, the key thing here is there's a lot of [00:45:30.460 --> 00:45:33.060] using synthetic data, like the outputs of models, [00:45:33.060 --> 00:45:35.420] to actually train weaker models, [00:45:35.420 --> 00:45:38.060] and so a lot of people have done this with GPT-4 outputs. [00:45:38.060 --> 00:45:41.140] This actually, I think, requires, I guess, [00:45:41.140 --> 00:45:45.220] the claim that you can train on GPT-4 outputs [00:45:45.220 --> 00:45:47.820] and you'll still get pretty good models out of that. [00:45:47.820 --> 00:45:48.660] - Yeah, it's a little selfish. [00:45:48.660 --> 00:45:49.580] - Yeah, which seems reasonable, [00:45:49.580 --> 00:45:51.460] but we're actually relying on a weaker claim, [00:45:51.460 --> 00:45:53.820] because all we're doing is, I mean, [00:45:53.820 --> 00:45:56.100] people can check out the tweets and see it in more detail, [00:45:56.100 --> 00:45:58.580] but GPT-4 is quite good at this task [00:45:58.580 --> 00:46:03.580] of ordering four candidate documents [00:46:03.580 --> 00:46:07.060] given a query as to their relevance to the query, right?
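(Editor's note: Aman fleshes this recipe out in the next exchange: a strong model orders small batches of candidates, and the noisy judgments are aggregated with a TrueSkill-style rating rather than raw Elo. Below is a rough, illustrative sketch of that kind of pipeline, simplified to pairwise comparisons; the judge function is a stand-in for the real GPT-4 call, and nothing here reflects Cursor's actual code.)

```python
# pip install trueskill
import trueskill


def build_ranking(doc_ids, pairwise_winner):
    """Aggregate noisy pairwise judgments into one ordering over documents.

    `pairwise_winner(a, b)` stands in for asking a strong model (e.g. GPT-4)
    which of two candidate documents better answers the query.
    """
    ratings = {d: trueskill.Rating() for d in doc_ids}
    # All pairs shown for brevity; in practice you would sample a subset of
    # comparisons (or judge small batches list-wise) to keep API costs down.
    for i, a in enumerate(doc_ids):
        for b in doc_ids[i + 1:]:
            winner = pairwise_winner(a, b)
            loser = b if winner == a else a
            ratings[winner], ratings[loser] = trueskill.rate_1vs1(
                ratings[winner], ratings[loser]
            )
    # Higher mean skill = judged more relevant; this ordering becomes
    # training data for a small, fast re-ranker.
    return sorted(doc_ids, key=lambda d: ratings[d].mu, reverse=True)


# Fake judge that prefers lower-numbered ids, standing in for the LLM call.
print(build_ranking(["d1", "d2", "d3", "d4"], lambda a, b: min(a, b)))
# ['d1', 'd2', 'd3', 'd4']
```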
[00:46:07.060 --> 00:46:08.740] That's like, there have been papers that show this, [00:46:08.740 --> 00:46:11.980] like list-wise re-ranking, and it works really well. [00:46:11.980 --> 00:46:14.620] So if you do that for enough documents, [00:46:14.620 --> 00:46:15.920] and you do it in an efficient way, [00:46:15.920 --> 00:46:18.140] which we kind of use a variant of Elo [00:46:18.140 --> 00:46:19.860] called TrueSkill to do, [00:46:19.860 --> 00:46:24.580] you can then get a really high-quality re-ranking dataset, [00:46:24.580 --> 00:46:27.140] really high-quality ordering over, [00:46:27.140 --> 00:46:32.100] let's say, 100 candidate documents given some query. [00:46:32.100 --> 00:46:34.300] So we use GPT-4 kind of in the loop [00:46:34.300 --> 00:46:36.460] for doing a bunch of different synthetic data stuff. [00:46:36.460 --> 00:46:39.140] This is one of them, and I feel like more people [00:46:39.140 --> 00:46:40.820] should be doing it for this kind of stuff. [00:46:40.820 --> 00:46:44.340] - Yeah, yeah, I think people are exploring [00:46:44.340 --> 00:46:47.460] synthetic data a lot in the back half of this year, [00:46:47.460 --> 00:46:51.060] using models as judges [00:46:51.060 --> 00:46:53.020] and models as synthetic data generators. [00:46:53.020 --> 00:46:54.740] - Yeah, I think models as judges [00:46:54.740 --> 00:46:57.060] is almost certainly going to work. [00:46:57.060 --> 00:46:59.060] If you use (mumbles) it's a very easy task. [00:46:59.060 --> 00:47:00.460] I think this is a very easy task. [00:47:00.460 --> 00:47:02.820] - This is how we do RLAIF? [00:47:02.820 --> 00:47:04.620] - Yeah, yeah, though it's interesting. [00:47:04.620 --> 00:47:07.740] RLAIF, I was looking at that paper again, [00:47:07.740 --> 00:47:09.940] and it seemed to really be good for, [00:47:09.940 --> 00:47:12.180] if you look at it compared to RLHF, [00:47:12.180 --> 00:47:14.900] it helped with harmlessness. [00:47:14.900 --> 00:47:17.980] I don't believe it actually helped in helpfulness. [00:47:17.980 --> 00:47:20.200] - It helped to achieve the Pareto optimal trade-off, [00:47:20.200 --> 00:47:22.620] which is no decline in the other two. [00:47:22.620 --> 00:47:25.340] - I think if you compare it to RLHF, [00:47:25.340 --> 00:47:26.660] it was pretty neck-and-neck. [00:47:26.660 --> 00:47:29.700] I don't think there's a statistically significant difference [00:47:29.700 --> 00:47:32.540] with helpfulness, at least, but it is interesting. [00:47:32.540 --> 00:47:35.140] RLAIF is just effectively getting better [00:47:35.140 --> 00:47:37.380] at censoring the model rather than improving [00:47:37.380 --> 00:47:40.100] its capabilities, its helpfulness. [00:47:40.100 --> 00:47:43.500] It'll do that as well as RLHF, [00:47:43.500 --> 00:47:45.260] but it doesn't offer anything additional there, [00:47:45.260 --> 00:47:46.820] which kind of makes sense to me. [00:47:46.820 --> 00:47:50.060] - First impressions of NeurIPS? [00:47:50.060 --> 00:47:50.880] - I mean, very interesting. [00:47:50.880 --> 00:47:52.300] Lots of very smart people. [00:47:52.300 --> 00:47:55.100] I've had lots of very interesting conversations. [00:47:55.100 --> 00:47:56.380] I'll probably be back next year. [00:47:56.380 --> 00:47:57.780] - I was kind of lukewarm on it coming in, [00:47:57.780 --> 00:47:59.860] 'cause everyone goes like, "Oh, it's a big conference.
[00:47:59.860 --> 00:48:01.820] "It's hard to navigate," and all that, [00:48:01.820 --> 00:48:03.960] but then you run into a few papers, [00:48:03.960 --> 00:48:05.180] people, authors, they're interesting, [00:48:05.180 --> 00:48:07.180] and then you're here, a bunch of other people [00:48:07.180 --> 00:48:08.260] I want to meet are all here. [00:48:08.260 --> 00:48:10.980] It's a nice way to get everyone in one place [00:48:10.980 --> 00:48:13.220] and just catch up on everything. [00:48:13.220 --> 00:48:15.100] The house parties are fun. [00:48:15.100 --> 00:48:17.020] Yesterday was just a lot of parties. [00:48:17.020 --> 00:48:19.140] I don't know. [00:48:19.140 --> 00:48:21.060] To me, it's very overwhelming, [00:48:21.060 --> 00:48:23.940] but I think the more exposures or epochs [00:48:23.940 --> 00:48:26.100] that you have on NeurIPS, the better, [00:48:26.100 --> 00:48:28.300] and I'm basically trying to do this audio experience [00:48:28.300 --> 00:48:29.300] to try to bring people in, [00:48:29.300 --> 00:48:31.660] 'cause there's many people who have just never come, [00:48:31.660 --> 00:48:34.100] but they should get a sense of what's going on here. [00:48:34.100 --> 00:48:35.860] I find there are people here [00:48:35.860 --> 00:48:36.900] who you've never heard of on Twitter. [00:48:36.900 --> 00:48:38.140] They're not on Twitter. [00:48:38.140 --> 00:48:40.500] They just know more, 'cause they've just done the work. [00:48:40.500 --> 00:48:43.180] - Exactly, yeah. - They've read everything. [00:48:43.180 --> 00:48:44.780] Have you seen the DataComp paper? [00:48:44.780 --> 00:48:45.620] - I don't know. [00:48:45.620 --> 00:48:47.620] - I'll walk you over and show you. [00:48:47.620 --> 00:48:49.820] I was very impressed by their work. [00:48:49.820 --> 00:48:51.700] These people, they just come out of nowhere, [00:48:51.700 --> 00:48:53.220] and once a year, they do this, [00:48:53.220 --> 00:48:54.300] and this is the place to find them, [00:48:54.300 --> 00:48:55.740] so that's why I'm here. [00:48:55.740 --> 00:48:57.660] - Yeah, I mean, I completely agree. [00:48:57.660 --> 00:49:00.860] There's really such a good congregation [00:49:00.860 --> 00:49:02.860] of very good researchers, right? [00:49:02.860 --> 00:49:04.300] - Yeah, are you trying to hire them? [00:49:04.300 --> 00:49:05.620] Hey, let's make a hiring call. [00:49:05.620 --> 00:49:08.180] - Yeah, I mean, look, I think right now, [00:49:08.180 --> 00:49:10.340] we're a very small, very strong team. [00:49:10.340 --> 00:49:11.780] - You were five last time. [00:49:11.780 --> 00:49:15.020] - Yeah, so we are seven now. [00:49:15.020 --> 00:49:17.540] Only six engineers, though, so very small team. [00:49:17.540 --> 00:49:18.900] - You have more millions than people. [00:49:18.900 --> 00:49:20.860] (laughing) [00:49:20.860 --> 00:49:21.700] - Yeah. [00:49:21.700 --> 00:49:24.740] Look, we're a very small team, [00:49:24.740 --> 00:49:25.860] and we're looking to grow the team, [00:49:25.860 --> 00:49:27.580] but we're looking to grow it very carefully and slowly, [00:49:27.580 --> 00:49:29.940] 'cause I think a lot of companies [00:49:29.940 --> 00:49:31.580] fall into the pitfall of hiring too quickly. [00:49:31.580 --> 00:49:32.460] - Yes. [00:49:32.460 --> 00:49:34.740] - So yeah, we're really looking for fantastic people. [00:49:34.740 --> 00:49:38.740] We're seeing incredible traction, incredible growth.
[00:49:38.740 --> 00:49:41.460] There's a lot more really interesting problems to tackle, [00:49:41.460 --> 00:49:44.100] and people should check out our blog post on that, [00:49:44.100 --> 00:49:46.860] 'cause I think it's very exciting, the kinds of things. [00:49:46.860 --> 00:49:47.860] - The fundraising post? [00:49:47.860 --> 00:49:49.020] - Yeah, there's a fundraising post, [00:49:49.020 --> 00:49:50.500] and then we kind of link there. [00:49:50.500 --> 00:49:51.620] There's a problems post if you go [00:49:51.620 --> 00:49:54.460] to anysphere.co/problems2023. [00:49:54.460 --> 00:49:58.660] There's lots of interesting work to do, [00:49:58.660 --> 00:50:01.300] and I think we have a really good chance of being the team [00:50:01.300 --> 00:50:02.860] that can crack codegen. [00:50:02.860 --> 00:50:04.620] So it's a really exciting space. [00:50:04.620 --> 00:50:06.940] I think you'd be joining a very small, strong team. [00:50:06.940 --> 00:50:07.980] And so yeah, if you're interested [00:50:07.980 --> 00:50:10.500] in working with us at Cursor, we'd love to talk. [00:50:10.500 --> 00:50:13.620] You can just reach out to aman@cursor.sh. [00:50:13.620 --> 00:50:14.940] - Nice, .sh, oh, okay. [00:50:14.940 --> 00:50:15.860] - Yeah, well-- [00:50:15.860 --> 00:50:16.700] - I thought it was .so-- [00:50:16.700 --> 00:50:18.700] - We might try to get .ai or .com. [00:50:18.700 --> 00:50:20.260] We'll see, we'll see. [00:50:20.260 --> 00:50:21.260] Cool, well, thanks for dropping by. [00:50:21.260 --> 00:50:22.100] - Yeah, for sure. [00:50:22.100 --> 00:50:23.100] - Thanks for having me. [00:50:23.100 --> 00:50:25.700] - So there again, you see one of the topics [00:50:25.700 --> 00:50:27.860] that I highlighted from my conversation with Jonathan Frankle, [00:50:27.860 --> 00:50:29.220] which is why I put it at the start, [00:50:29.220 --> 00:50:32.220] which is synthetic data generation in all its glory. [00:50:32.220 --> 00:50:35.900] And for Aman and Cursor, they're particularly interested [00:50:35.900 --> 00:50:39.740] in LLMs as rankers, or LLMs as judges. [00:50:39.740 --> 00:50:43.540] And that seems to be generally a more blessed way [00:50:43.540 --> 00:50:46.300] than directly distilling the output of LLMs. [00:50:46.300 --> 00:50:49.620] And you can look out for our episode with Nathan in 2024 [00:50:49.620 --> 00:50:50.940] to go deeper on that. [00:50:50.940 --> 00:50:52.420] Another founder that recently raised [00:50:52.420 --> 00:50:54.580] and is the talk of the AI community, [00:50:54.580 --> 00:50:57.860] particularly with Guillermo Rauch and Tobi Lütke [00:50:57.860 --> 00:51:00.660] recently endorsing the product, is Aravind Srinivas [00:51:00.660 --> 00:51:03.900] of Perplexity AI, which started off being, [00:51:03.900 --> 00:51:06.660] maybe we will construct SQL queries for you. [00:51:06.660 --> 00:51:09.500] And they went to, maybe we'll construct SQL queries [00:51:09.500 --> 00:51:11.180] on your Twitter stream for you. [00:51:11.180 --> 00:51:14.140] And now they've blown up as a potential Google replacement, [00:51:14.140 --> 00:51:16.020] which is a huge increase in ambition, [00:51:16.020 --> 00:51:18.780] but they have the web app and the mobile apps to prove it. [00:51:18.780 --> 00:51:21.020] So here's Aravind with Perplexity. [00:51:21.020 --> 00:51:24.860] - And so congrats on all your success with Perplexity.
[00:51:24.860 --> 00:51:26.380] The two most recent accomplishments [00:51:26.380 --> 00:51:28.700] which I have seen, at least on my feed, are, [00:51:28.700 --> 00:51:31.700] one, you hit a million people on your mobile app. [00:51:31.700 --> 00:51:32.540] That's huge. [00:51:32.540 --> 00:51:35.860] - On both platforms, Android and iOS, independently. [00:51:35.860 --> 00:51:38.300] - Is that because of your slick video editing skills? [00:51:38.300 --> 00:51:41.400] - Actually, we have a good brand marketing designer. [00:51:41.400 --> 00:51:44.180] But I mean, more than everything else, [00:51:44.180 --> 00:51:47.820] I think the app's really good, fast. [00:51:47.820 --> 00:51:49.380] We spent a lot of time on it. [00:51:49.380 --> 00:51:53.100] In fact, our first rollout of the app was not that great. [00:51:53.100 --> 00:51:54.860] It was slow, it used to crash. [00:51:54.860 --> 00:51:56.660] Users complained, and we listened to that, [00:51:56.660 --> 00:51:58.780] and recruited a good mobile team, [00:51:58.780 --> 00:52:00.500] and made it much faster and more reliable. [00:52:00.500 --> 00:52:02.340] - Any technical decisions that drove that? [00:52:02.340 --> 00:52:04.700] Is it React Native, that's slow, or something else? [00:52:04.700 --> 00:52:05.700] - It's all native. [00:52:05.700 --> 00:52:08.420] There's no, we're not on one common React stack. [00:52:08.420 --> 00:52:10.100] And the reason to do that is that's the only way [00:52:10.100 --> 00:52:12.740] to make the apps feel fast, right? [00:52:12.740 --> 00:52:15.720] And I believe ChatGPT also does this. [00:52:15.720 --> 00:52:17.220] They don't use React Native. [00:52:17.220 --> 00:52:19.340] - And then the other accomplishment is PPLX Online, [00:52:19.340 --> 00:52:21.540] which you're showing on screen here. [00:52:21.540 --> 00:52:23.860] What are the headline things that people should know [00:52:23.860 --> 00:52:25.580] if they haven't heard of PPLX Online? [00:52:25.580 --> 00:52:27.780] - Well, it's like the only LLM API [00:52:27.780 --> 00:52:29.580] that has no knowledge cutoff. [00:52:29.580 --> 00:52:32.040] So if you're a developer, and you just wanna prototype [00:52:32.040 --> 00:52:34.060] products that need information from the web [00:52:34.060 --> 00:52:37.140] or have no knowledge cutoff, this is the only way to do that. [00:52:37.140 --> 00:52:38.900] And it's super fast, pretty accurate. [00:52:38.900 --> 00:52:41.100] You have two versions, a 7B and a 70B. [00:52:41.100 --> 00:52:43.860] The 7B is super fast, the 70B is a little slower, [00:52:43.860 --> 00:52:45.740] but also better quality. [00:52:45.740 --> 00:52:47.660] And we plan to bring it up in the context [00:52:47.660 --> 00:52:49.900] of the Mixtral MoE as well, [00:52:49.900 --> 00:52:50.940] which has been recently released. [00:52:50.940 --> 00:52:52.060] - Yeah, I think you've been pretty transparent [00:52:52.060 --> 00:52:53.300] that they are fine-tuned from Llama 2. [00:52:53.300 --> 00:52:56.140] - That's right, we are not in the business of pre-training. [00:52:56.140 --> 00:52:57.680] - But what do you fine-tune for, [00:52:57.680 --> 00:52:59.500] between Llama 2 and what you have? [00:52:59.500 --> 00:53:01.780] - Yeah, we fine-tune for summarization, [00:53:01.780 --> 00:53:03.420] the ability to take a bunch of sources [00:53:03.420 --> 00:53:05.300] and accurately give you a nice summary.
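(Editor's note: a hedged sketch of what calling the pplx-online models Aravind describes looked like at the time. Perplexity's API follows the OpenAI-compatible chat-completions shape at api.perplexity.ai; the model names below are the ones from their announcement and may have changed since, and the API key is assumed to be in your environment.)

```python
import os

import requests

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"},
    json={
        # "pplx-7b-online" was the fast option; "pplx-70b-online" slower but stronger.
        "model": "pplx-7b-online",
        "messages": [
            {"role": "user", "content": "What happened in AI news today?"}
        ],
    },
    timeout=60,
)
# OpenAI-compatible response shape: the answer is grounded in fresh web results.
print(resp.json()["choices"][0]["message"]["content"])
```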
[00:53:05.300 --> 00:53:08.220] - And you are, I think, the only provider right now [00:53:08.220 --> 00:53:10.260] with online access or whatever. [00:53:10.260 --> 00:53:14.220] But also Grok has access to Twitter, [00:53:14.220 --> 00:53:15.140] which you don't have. [00:53:15.140 --> 00:53:16.940] And they will release an API at some point. [00:53:16.940 --> 00:53:20.980] - If they release it, we'll be happy to use it. [00:53:20.980 --> 00:53:23.980] Our goal is to just give accurate answers on the web. [00:53:23.980 --> 00:53:26.040] And Twitter is just one part of the web. [00:53:26.040 --> 00:53:29.000] Their vision is like Twitter is the everything app. [00:53:29.000 --> 00:53:31.580] We believe there's information out there [00:53:31.580 --> 00:53:34.580] that exists outside of Twitter that's also super valuable. [00:53:34.580 --> 00:53:36.340] In fact, you can even make an argument [00:53:36.340 --> 00:53:38.700] that information outside Twitter [00:53:38.700 --> 00:53:40.380] may even be a lot more valuable [00:53:40.380 --> 00:53:41.900] than information within Twitter, [00:53:41.900 --> 00:53:44.660] because most of the links that get shared on Twitter [00:53:44.660 --> 00:53:46.300] are all from outside anyway. [00:53:46.300 --> 00:53:48.100] So the only thing you miss out on [00:53:48.100 --> 00:53:50.460] is a specific person's opinion. [00:53:50.460 --> 00:53:52.660] And usually journalists pick up on that [00:53:52.660 --> 00:53:53.780] and write web articles. [00:53:53.780 --> 00:53:55.800] So it's all gonna diffuse. [00:53:55.800 --> 00:53:58.260] Good ideas usually diffuse to the rest of the web. [00:53:58.260 --> 00:54:00.060] So we're not really missing out much. [00:54:00.060 --> 00:54:01.580] - It's a different source of data. [00:54:01.580 --> 00:54:03.320] - Yeah, it's a different source of data. [00:54:03.320 --> 00:54:05.780] Also, it's all about what do you want. [00:54:05.780 --> 00:54:07.380] Is your source, your citation, [00:54:07.380 --> 00:54:10.400] an already highly curated human artifact, [00:54:10.400 --> 00:54:11.900] or is it like some tweet? [00:54:11.900 --> 00:54:13.700] These are all questions worth asking. [00:54:13.700 --> 00:54:14.940] - One thing that you do show off, [00:54:14.940 --> 00:54:16.700] so I was watching you demo just now. [00:54:16.700 --> 00:54:18.340] You have sentence-by-sentence citations. [00:54:18.340 --> 00:54:19.180] - That's right, yeah. [00:54:19.180 --> 00:54:20.580] - That's a design choice. [00:54:20.580 --> 00:54:23.620] Because realistically, your source articles [00:54:23.620 --> 00:54:25.940] actually overlap the full paragraph. [00:54:25.940 --> 00:54:29.300] So why did you choose to impose sentence by sentence? [00:54:29.300 --> 00:54:30.660] - That's how we write papers. [00:54:30.660 --> 00:54:31.980] I'm an academic. [00:54:31.980 --> 00:54:33.500] Every sentence you write in a paper [00:54:33.500 --> 00:54:35.300] needs to have a corresponding citation. [00:54:35.300 --> 00:54:37.060] - As a user, it can be confusing. [00:54:37.060 --> 00:54:38.660] Like when I click that link, [00:54:38.660 --> 00:54:40.500] maybe it's like the third paragraph. [00:54:40.500 --> 00:54:41.320] - That's right. [00:54:41.320 --> 00:54:43.740] We can do better in like exactly navigating you [00:54:43.740 --> 00:54:45.220] to the right part of the link. [00:54:45.220 --> 00:54:46.380] But we're looking into all that. [00:54:46.380 --> 00:54:47.380] - Yeah, of course.
[00:54:47.380 --> 00:54:49.500] I mean, I do see you as like a search engine first [00:54:49.500 --> 00:54:51.340] with a very good language model team. [00:54:51.340 --> 00:54:52.500] - That's right, yeah. [00:54:52.500 --> 00:54:53.320] - Right? [00:54:53.320 --> 00:54:54.160] - Yeah, answer engine. [00:54:54.160 --> 00:54:54.980] I would call it an answer engine. [00:54:54.980 --> 00:54:55.820] - Answer engine. [00:54:55.820 --> 00:54:57.940] You are doing a really good job with that. [00:54:57.940 --> 00:55:00.300] I also noticed in your PPLX blog post [00:55:00.300 --> 00:55:02.220] that you also talked about the FreshLLMs paper. [00:55:02.220 --> 00:55:03.060] - That's right. [00:55:03.060 --> 00:55:03.900] - Maybe could you introduce that, [00:55:03.900 --> 00:55:05.980] and did you talk to the authors? [00:55:05.980 --> 00:55:07.780] Are they here at NeurIPS? [00:55:07.780 --> 00:55:10.060] - I did not talk to the authors. [00:55:10.060 --> 00:55:14.020] It's not like we took a lot of inspiration from it, [00:55:14.020 --> 00:55:19.020] but it made sense to attribute the citation to them. [00:55:19.020 --> 00:55:21.580] - Yeah, to credit the intellectual background. [00:55:21.580 --> 00:55:24.260] What do you look for at NeurIPS, at a conference like this? [00:55:24.260 --> 00:55:27.340] - We're here for recruiting good, strong researchers [00:55:27.340 --> 00:55:29.120] to join our team, [00:55:29.120 --> 00:55:31.300] especially if they're more focused on shipping models [00:55:31.300 --> 00:55:34.540] to a search product used by millions of people. [00:55:34.540 --> 00:55:35.380] - Awesome. [00:55:35.380 --> 00:55:37.860] We'll talk about your hiring call to action in a bit. [00:55:37.860 --> 00:55:40.460] I'm also interested in labs, like Perplexity Labs. [00:55:40.460 --> 00:55:41.300] - Yeah. [00:55:41.300 --> 00:55:42.160] - It seems like a place for you guys [00:55:42.160 --> 00:55:43.940] to experiment with serving models. [00:55:43.940 --> 00:55:45.060] - That's right. [00:55:45.060 --> 00:55:49.460] Yeah, everybody thinks you start off as a wrapper [00:55:49.460 --> 00:55:51.900] and then one magic day you just switch over [00:55:51.900 --> 00:55:55.940] from 3.5 to your own model. [00:55:55.940 --> 00:55:57.380] That's not how it works in practice. [00:55:57.380 --> 00:55:59.860] Your GPUs crash or your nodes are not working [00:55:59.860 --> 00:56:01.780] or Kubernetes doesn't work as expected [00:56:01.780 --> 00:56:06.620] and requests are not getting the throughput required. [00:56:06.620 --> 00:56:07.900] You optimize for latency, [00:56:07.900 --> 00:56:10.680] but then you are worse on throughput, [00:56:10.680 --> 00:56:12.900] so you're not able to handle spikes in requests. [00:56:12.900 --> 00:56:15.080] So all these things can happen, right? [00:56:15.080 --> 00:56:18.180] So you only know about these if you start small [00:56:18.180 --> 00:56:20.900] and serve a playground where people come [00:56:20.900 --> 00:56:24.400] and test your own infrastructure and see how it holds up, [00:56:24.400 --> 00:56:25.660] and then take the lessons from there [00:56:25.660 --> 00:56:28.100] and use them to serve it in production, right? [00:56:28.100 --> 00:56:30.340] So Labs is sort of our playground [00:56:30.340 --> 00:56:33.860] for testing open-source models and our in-house models [00:56:33.860 --> 00:56:34.700] that have been fine-tuned [00:56:34.700 --> 00:56:37.620] for factual accuracy and helpfulness.
[00:56:37.620 --> 00:56:41.020] And it's a nice way for people to test open-source models [00:56:41.020 --> 00:56:42.780] if they're curious about it, [00:56:42.780 --> 00:56:43.860] especially if they think about them [00:56:43.860 --> 00:56:45.860] as alternatives to ChatGPT. [00:56:45.860 --> 00:56:47.240] And then it's also a nice way for us [00:56:47.240 --> 00:56:49.700] to battle-test our infrastructure. [00:56:49.700 --> 00:56:51.540] Same thing goes for the API. [00:56:51.540 --> 00:56:53.180] It's not like I believe these APIs [00:56:53.180 --> 00:56:57.260] are gonna take over the GPT-3.5 API or something, [00:56:57.260 --> 00:56:58.820] but it's a nice way for developers [00:56:58.820 --> 00:57:01.100] who want an alternative to explore, [00:57:01.100 --> 00:57:03.860] especially those who wanna use faster, smaller models, [00:57:03.860 --> 00:57:05.220] like the 7B models. [00:57:05.220 --> 00:57:07.060] And it's also a good way for us to know [00:57:07.060 --> 00:57:09.620] how we can handle search requests and things like that. [00:57:09.620 --> 00:57:10.460] - Yeah. [00:57:10.460 --> 00:57:12.540] I mean, so I wanna push back on this. [00:57:12.540 --> 00:57:15.060] You said your playground is a way to battle-test, [00:57:15.060 --> 00:57:17.100] but I think you would probably get orders [00:57:17.100 --> 00:57:19.660] of magnitude more traffic on your main app [00:57:19.660 --> 00:57:20.900] than your side app. [00:57:20.900 --> 00:57:23.300] - Look, we can't just directly ship to the main app, right? [00:57:23.300 --> 00:57:24.700] And you can never simulate-- [00:57:24.700 --> 00:57:25.660] - It's like a staging environment. [00:57:25.660 --> 00:57:26.980] - Yeah, it's like a staging environment. [00:57:26.980 --> 00:57:29.300] But it's not just meant to be staging. [00:57:29.300 --> 00:57:31.980] I don't wanna downplay the importance of Labs. [00:57:31.980 --> 00:57:34.140] Labs is sort of one of the few places [00:57:34.140 --> 00:57:36.740] on the internet today for you to go and explore [00:57:36.740 --> 00:57:38.620] and compare different open-source models. [00:57:38.620 --> 00:57:42.460] And it also tells the user how fast our inference is. [00:57:42.460 --> 00:57:45.060] We give you all the metrics like tokens per second, [00:57:45.060 --> 00:57:46.380] the time to first token. [00:57:46.380 --> 00:57:48.500] It's also a very transparent way to communicate [00:57:48.500 --> 00:57:50.180] the speed of our infrastructure, [00:57:50.180 --> 00:57:52.780] which helps us also recruit good talent for infrastructure. [00:57:52.780 --> 00:57:55.220] - Yeah, but I think you're pretty opinionated [00:57:55.220 --> 00:57:56.980] that you are an app company first. [00:57:56.980 --> 00:57:57.820] - Yeah, we are an app company. [00:57:57.820 --> 00:57:58.660] - You're not an inference company. [00:57:58.660 --> 00:57:59.480] - That's right. [00:57:59.480 --> 00:58:00.320] - You just happen to have-- [00:58:00.320 --> 00:58:02.140] - We're not competing with Together AI or-- [00:58:02.140 --> 00:58:02.980] - Fireworks. [00:58:02.980 --> 00:58:04.540] - Fireworks or like OctoML. [00:58:04.540 --> 00:58:05.380] - Yeah. [00:58:05.380 --> 00:58:06.380] - You know, there are like too many of them actually, [00:58:06.380 --> 00:58:08.300] honestly, and-- [00:58:08.300 --> 00:58:09.380] - What do you think they need to do to win, [00:58:09.380 --> 00:58:10.620] as an objective third party?
[00:58:10.620 --> 00:58:13.100] - I think they need to raise an insane amount of capital [00:58:13.100 --> 00:58:16.660] and subsidize the cost so much and capture the market, [00:58:16.660 --> 00:58:18.300] or it's basically gonna be impossible [00:58:18.300 --> 00:58:20.660] because you're all offering the same thing, more or less. [00:58:20.660 --> 00:58:22.860] And NVIDIA's basically commoditizing it, right? [00:58:22.860 --> 00:58:26.780] Like with TRT-LM and like Megatron and things like that. [00:58:26.780 --> 00:58:30.460] So most people's stacks are gonna get standardized. [00:58:30.460 --> 00:58:32.060] So then why am I paying you? [00:58:32.060 --> 00:58:33.620] I'm paying you for the GPUs then. [00:58:33.620 --> 00:58:34.460] - Right. [00:58:34.460 --> 00:58:35.500] - That's the game you can only pay it, [00:58:35.500 --> 00:58:37.460] like it's an economy of scale thing. [00:58:37.460 --> 00:58:39.780] - Which you're also buying your own GPUs [00:58:39.780 --> 00:58:40.700] and running your own stack. [00:58:40.700 --> 00:58:41.540] - That's right, that's right. [00:58:41.540 --> 00:58:43.660] But we care about buying GPUs to serve our own product [00:58:43.660 --> 00:58:45.820] more than helping other people serve their products. [00:58:45.820 --> 00:58:48.180] - Yeah, what have you learned being like a, [00:58:48.180 --> 00:58:50.220] I don't know, I feel like you're both an infra CEO [00:58:50.220 --> 00:58:52.940] and an application sort of product CEO. [00:58:52.940 --> 00:58:53.780] How do you balance that? [00:58:53.780 --> 00:58:56.020] - Yeah, it's difficult, but you know, [00:58:56.020 --> 00:58:58.180] one thing exists in service of the other, right? [00:58:58.180 --> 00:59:00.140] Infrastructure exists in service of the product. [00:59:00.140 --> 00:59:01.540] You always have to remember that. [00:59:01.540 --> 00:59:02.820] For some people, product exists [00:59:02.820 --> 00:59:05.060] in service of the infrastructure. [00:59:05.060 --> 00:59:06.140] That's not how we are. [00:59:06.140 --> 00:59:08.820] - What does Perplexity become a year, two years from now? [00:59:08.820 --> 00:59:09.980] - Hopefully like a lot more people [00:59:09.980 --> 00:59:12.500] start using it as a Google replacement. [00:59:12.500 --> 00:59:13.340] - I see. [00:59:13.340 --> 00:59:14.660] You already, I read some stats somewhere, [00:59:14.660 --> 00:59:15.860] you're 10% of Bing traffic. [00:59:15.860 --> 00:59:17.980] - I don't know about that, but. [00:59:17.980 --> 00:59:19.300] - Someone was measuring like a third party, [00:59:19.300 --> 00:59:20.620] like similar type of thing. [00:59:20.620 --> 00:59:23.580] - Yeah, maybe for, actually for Bing chat, [00:59:23.580 --> 00:59:26.300] we might be even further ahead. [00:59:26.300 --> 00:59:27.140] - Okay. [00:59:27.140 --> 00:59:29.220] - Like there's just Perplexity versus Bing chat, [00:59:29.220 --> 00:59:32.340] not Bing.com, which is crazy given that [00:59:32.340 --> 00:59:34.120] they have so much distribution, right? [00:59:34.120 --> 00:59:34.960] - Oh yeah. [00:59:34.960 --> 00:59:35.780] - And marketing power. [00:59:35.780 --> 00:59:37.780] - But you are more AI native than they are? [00:59:37.780 --> 00:59:38.620] - That's right. [00:59:38.620 --> 00:59:39.440] - In a sense. [00:59:39.440 --> 00:59:40.280] - That's right. [00:59:40.280 --> 00:59:41.100] - You are a different search index. [00:59:41.100 --> 00:59:41.940] Like you have your own crawlers and everything. 
[00:59:41.940 --> 00:59:43.820] - Yeah, we have our own crawlers and indexes, yeah. [00:59:43.820 --> 00:59:45.300] - So like if I don't want Bing, [00:59:45.300 --> 00:59:47.340] then I use your stuff and maybe you turn-- [00:59:47.340 --> 00:59:48.420] - That's right, yeah, that's right. [00:59:48.420 --> 00:59:49.260] - That's cool. [00:59:49.260 --> 00:59:50.080] So what are you hiring? [00:59:50.080 --> 00:59:50.920] What are you looking to hire? [00:59:50.920 --> 00:59:52.860] What should people demonstrate when joining you? [00:59:52.860 --> 00:59:54.460] I think you have a very strong perspective [00:59:54.460 --> 00:59:55.820] on the kind of culture that you're building. [00:59:55.820 --> 00:59:57.280] - Yeah, I mean, we work pretty hard [00:59:57.280 --> 00:59:59.860] and like we wanna get stuff done fast. [00:59:59.860 --> 01:00:03.420] So if you enjoy like fast shipping cycles and-- [01:00:03.420 --> 01:00:04.860] - Can you give illustrations? [01:00:04.860 --> 01:00:06.340] Like what do you mean by that? [01:00:06.340 --> 01:00:07.540] - You know, every two weeks, [01:00:07.540 --> 01:00:09.540] like we have some announcements we make. [01:00:09.540 --> 01:00:12.860] So we work on very clear, precise projects [01:00:12.860 --> 01:00:14.620] that have like clear deliverables [01:00:14.620 --> 01:00:17.820] and we kind of constantly wanna keep improving the product. [01:00:17.820 --> 01:00:20.740] So as a machine learning research engineer, [01:00:20.740 --> 01:00:22.720] if you're excited about like training models [01:00:22.720 --> 01:00:24.340] and shipping them to production [01:00:24.340 --> 01:00:27.340] for such a useful use case like consumer search [01:00:27.340 --> 01:00:31.340] and wanna do it at the same velocity as us, [01:00:31.340 --> 01:00:33.100] like a startup rather than a big company [01:00:33.100 --> 01:00:34.300] that has to wait for several months [01:00:34.300 --> 01:00:35.620] to get something into production, [01:00:35.620 --> 01:00:38.360] that's a unique spot like to be in, right? [01:00:38.360 --> 01:00:40.820] And you also wanna be part of a growing exponential [01:00:40.820 --> 01:00:42.460] rather than something that's trying [01:00:42.460 --> 01:00:44.780] to defend its territory, right? [01:00:44.780 --> 01:00:45.700] - Defend its territory? [01:00:45.700 --> 01:00:46.740] - Yeah, like Google. [01:00:46.740 --> 01:00:48.220] Google's defending. - I see, yeah, yeah, yeah. [01:00:48.220 --> 01:00:49.860] - So they attack. [01:00:49.860 --> 01:00:50.940] So you wanna be an attacking team. [01:00:50.940 --> 01:00:52.740] - Have you heard, like what does Google say about you? [01:00:52.740 --> 01:00:54.260] Like are they interested in buying you? [01:00:54.260 --> 01:00:57.380] - I think they're being pretty appreciative [01:00:57.380 --> 01:00:59.140] and respectful of the product, right? [01:00:59.140 --> 01:01:02.260] - But like SGE is not great for some reason. [01:01:02.260 --> 01:01:03.100] - Yeah. [01:01:03.100 --> 01:01:07.140] By the way, I don't think Google people are not talented. [01:01:07.140 --> 01:01:09.900] Like they're probably more talented than we are. [01:01:09.900 --> 01:01:14.700] I think it's just that their incentives are not clear [01:01:14.700 --> 01:01:16.740] and they might have to cannibalize [01:01:16.740 --> 01:01:18.900] their own business model to like-- [01:01:18.900 --> 01:01:20.820] - This is the classic innovators dilemma, right? [01:01:20.820 --> 01:01:21.660] - Exactly, yeah. 
[01:01:21.660 --> 01:01:24.580] - They have a cash cow and they're trying to preserve that. [01:01:24.580 --> 01:01:28.660] You don't have ads, but you're serving subscriptions [01:01:28.660 --> 01:01:30.140] and that's the main business model for now. [01:01:30.140 --> 01:01:30.980] - As of today, yeah. [01:01:30.980 --> 01:01:31.900] - Yeah, that's it. [01:01:31.900 --> 01:01:33.560] Well, thank you very much. - Cool. [01:01:33.560 --> 01:01:35.300] - All the people that we talked to so far [01:01:35.300 --> 01:01:36.580] and some of the best founders I know, [01:01:36.580 --> 01:01:39.500] whether or not they're in AI, are fierce nerds. [01:01:39.500 --> 01:01:41.460] And our event definitely reminds me [01:01:41.460 --> 01:01:43.540] of the fierce nerds concept. [01:01:43.540 --> 01:01:46.020] But I don't think I'm the best person to tell that story. [01:01:46.020 --> 01:01:48.540] Maybe I'll tag in Sean Puri. [01:01:48.540 --> 01:01:50.640] - Yeah, have you ever read that Paul Graham blog post [01:01:50.640 --> 01:01:52.140] called "Fierce Nerds"? [01:01:52.140 --> 01:01:53.180] - No, what is it? [01:01:53.180 --> 01:01:54.020] - It's an amazing post. [01:01:54.020 --> 01:01:55.500] I'm gonna read you a couple pieces of it, [01:01:55.500 --> 01:01:57.700] but it's one of those like, Paul Graham, I think is, [01:01:57.700 --> 01:01:59.060] somebody said this earlier, they go, [01:01:59.060 --> 01:02:00.020] what's that guy, Andrew Tate? [01:02:00.020 --> 01:02:01.420] They started some tweet that was really funny. [01:02:01.420 --> 01:02:04.300] It was, "Paul Graham was my Andrew Tate growing up." [01:02:04.300 --> 01:02:06.260] - Same. - Which is just like, [01:02:06.260 --> 01:02:07.100] so funny. [01:02:07.100 --> 01:02:08.880] It's such a funny, it's such a deep cut joke, [01:02:08.880 --> 01:02:11.540] but if you get it, you're like, it just hits the spot. [01:02:11.540 --> 01:02:13.060] All right, so he wrote this post and he goes, [01:02:13.060 --> 01:02:15.500] "Most people think of nerds as quiet, [01:02:15.500 --> 01:02:17.000] "sort of like diffident people, right? [01:02:17.000 --> 01:02:18.700] "Just sort of like passive. [01:02:18.700 --> 01:02:20.520] "And in most social situations, they are. [01:02:20.520 --> 01:02:23.180] "They're quiet and they're not the star quarterback [01:02:23.180 --> 01:02:24.520] "in the middle of the gym, right? [01:02:24.520 --> 01:02:25.580] "They're kind of a fish out of water [01:02:25.580 --> 01:02:27.100] "in a bunch of different things." [01:02:27.100 --> 01:02:28.680] He goes, "But this is an illusion [01:02:28.680 --> 01:02:32.000] "because that only happens when non-nerds observe them [01:02:32.000 --> 01:02:34.960] "'cause they're observing them in non-nerdy situations. [01:02:34.960 --> 01:02:36.900] "So you see a nerd at prom, [01:02:36.900 --> 01:02:39.940] "you just see them as a quiet sort of passive nerd. [01:02:39.940 --> 01:02:41.100] "There's no alpha in them. [01:02:41.100 --> 01:02:44.160] "But in fact, some nerds are quite fierce. [01:02:44.160 --> 01:02:46.660] "Fierce nerds are a small but interesting group. 
[01:02:46.660 --> 01:02:48.380] "They are extremely competitive, [01:02:48.380 --> 01:02:49.840] "more competitive, I would say, [01:02:49.840 --> 01:02:51.540] "than competitive non-nerds [01:02:51.540 --> 01:02:54.200] "because the competition is more personal to them, [01:02:54.200 --> 01:02:56.780] "partly because they're not emotionally mature [01:02:56.780 --> 01:02:58.320] "enough to distance themselves from it, [01:02:58.320 --> 01:03:00.600] "but also because there's less randomness [01:03:00.600 --> 01:03:02.520] "in the types of competition that they engage in. [01:03:02.520 --> 01:03:06.380] "Therefore, they're justified in making it more personal." [01:03:06.380 --> 01:03:07.200] - I'll cut it off there. [01:03:07.200 --> 01:03:09.740] That's a clip from the "My First Million" podcast. [01:03:09.740 --> 01:03:11.420] And that's a story about how Dharmesh Shah, [01:03:11.420 --> 01:03:13.500] the HubSpot CTO, is a fierce nerd. [01:03:13.500 --> 01:03:15.420] And I really like that concept [01:03:15.420 --> 01:03:18.500] because, one, it helps to validate that nerds can also win [01:03:18.500 --> 01:03:21.980] and why nerds can sometimes win more than regular people. [01:03:21.980 --> 01:03:24.660] And obviously, for more, you can read that Paul Graham essay. [01:03:24.660 --> 01:03:27.000] But I think Arvind is a fierce nerd, [01:03:27.000 --> 01:03:29.340] and I think Perplexity is a fierce nerd company. [01:03:29.340 --> 01:03:31.340] They do have competition, though. [01:03:31.340 --> 01:03:33.300] It's not like Perplexity is the only company [01:03:33.300 --> 01:03:34.460] going after Google, [01:03:34.460 --> 01:03:36.420] not the only company going after search. [01:03:36.420 --> 01:03:37.540] One of my favorite parts [01:03:37.540 --> 01:03:40.100] in compiling these ensemble episodes [01:03:40.100 --> 01:03:43.460] is juxtaposing two competitors next to each other [01:03:43.460 --> 01:03:46.300] or people who disagree or have different worldviews. [01:03:46.300 --> 01:03:47.820] Like, you just heard Perplexity. [01:03:47.820 --> 01:03:50.580] You just heard Arvind dunk on all the infrastructure companies [01:03:50.580 --> 01:03:52.500] including Fireworks, which we just had on. [01:03:52.500 --> 01:03:53.780] Now, I'm not the right person [01:03:53.780 --> 01:03:55.460] to tell you who's right and who's wrong, [01:03:55.460 --> 01:03:57.780] but I know for a fact that they cannot all be right, [01:03:57.780 --> 01:03:59.020] and that's what's fascinating. [01:03:59.020 --> 01:04:00.020] That's what makes the market. [01:04:00.020 --> 01:04:02.180] So next, in full disclosure, is a personal friend of mine. [01:04:02.180 --> 01:04:04.300] It's Will Bryk from Metaphor Systems. [01:04:04.300 --> 01:04:06.620] Metaphor launched end of 2022 [01:04:06.620 --> 01:04:09.180] with an AI search engine narrative as well, [01:04:09.180 --> 01:04:12.180] but their approach is more of a pre-trained [01:04:12.180 --> 01:04:16.620] LLM research engine as opposed to Arvind's answer engine. [01:04:16.620 --> 01:04:18.740] These are all very minor differences in the end. [01:04:18.740 --> 01:04:20.740] At the end of the day, people want to punch in a query [01:04:20.740 --> 01:04:23.380] and get results, and Metaphor's approach is different.
[01:04:23.380 --> 01:04:25.400] They are going after the infrastructure play [01:04:25.400 --> 01:04:27.540] rather than the application plus infrastructure play, [01:04:27.540 --> 01:04:29.500] and it's just nice to contrast them together, [01:04:29.500 --> 01:04:31.580] and I'll leave the conclusions to you. [01:04:31.580 --> 01:04:32.660] What is Metaphor? [01:04:32.660 --> 01:04:34.980] - Metaphor is a search engine over the internet, [01:04:34.980 --> 01:04:38.620] but it's better than Google at handling complex queries. [01:04:38.620 --> 01:04:40.220] - Okay, why is that? [01:04:40.220 --> 01:04:41.040] - Why is that? [01:04:41.040 --> 01:04:44.180] Because we train a search algorithm from scratch [01:04:44.180 --> 01:04:46.420] to handle complex queries, basically. [01:04:46.420 --> 01:04:47.660] It's a totally different algorithm, yeah. [01:04:47.660 --> 01:04:49.060] - Why are you at NeurIPS? [01:04:49.060 --> 01:04:51.580] - I'm at NeurIPS because we want to learn [01:04:51.580 --> 01:04:52.700] about all the cool things people are working on, [01:04:52.700 --> 01:04:54.780] and also because we want to hire some crazy, [01:04:54.780 --> 01:04:57.180] good researchers to help build the future of search. [01:04:57.180 --> 01:04:58.240] - Metaphor has a search engine. [01:04:58.240 --> 01:04:59.820] That's what you launched last year, [01:04:59.820 --> 01:05:01.700] and then you also released an API, [01:05:01.700 --> 01:05:02.980] and I've actually been using the API. [01:05:02.980 --> 01:05:06.260] It's actually really good for augmenting LLMs with search. [01:05:06.260 --> 01:05:08.540] I don't know how much to which you want to lean [01:05:08.540 --> 01:05:11.220] being an app versus an infrastructure company. [01:05:11.220 --> 01:05:14.100] - Yeah, so we're leaning towards search infrastructure, [01:05:14.100 --> 01:05:15.660] so we really see ourselves as like, [01:05:15.660 --> 01:05:17.820] we want people to build applications on top of us. [01:05:17.820 --> 01:05:20.180] We see the future as everyone will use LLMs [01:05:20.180 --> 01:05:21.020] as the interface to everything, [01:05:21.020 --> 01:05:23.940] and we want to be powering the search to underlies that. [01:05:23.940 --> 01:05:26.100] I think we want people to build really cool UIs [01:05:26.100 --> 01:05:27.900] on top of our search, but the hard part, [01:05:27.900 --> 01:05:29.180] and the thing that we're focusing on, [01:05:29.180 --> 01:05:30.500] is really good search results. [01:05:30.500 --> 01:05:32.200] - Yeah, can you give examples? [01:05:32.200 --> 01:05:34.060] You have some really cool examples, [01:05:34.060 --> 01:05:36.660] like tweets, and books, and PDFs, and stuff. [01:05:36.660 --> 01:05:38.900] - People really get excited about researchers [01:05:38.900 --> 01:05:40.380] working on something similar to them [01:05:40.380 --> 01:05:42.220] in the Bay Area, or something like that. [01:05:42.220 --> 01:05:43.060] People have actually met-- [01:05:43.060 --> 01:05:43.880] - Oh, yeah, yeah, yeah. [01:05:43.880 --> 01:05:45.300] Competitive Intel research as well. [01:05:45.300 --> 01:05:47.580] - People have met people in real life based on searches, [01:05:47.580 --> 01:05:49.740] because the results are so high quality, [01:05:49.740 --> 01:05:52.180] and they're not SEO spammed in any way. 
[01:05:52.180 --> 01:05:54.380] It's just exactly what you're asking for, [01:05:54.380 --> 01:05:57.220] that it's cool to see that digital information [01:05:57.220 --> 01:06:00.340] to real-world interaction thing happen. [01:06:00.340 --> 01:06:03.340] - I actually also interviewed Arvind from Perplexity, [01:06:03.340 --> 01:06:07.140] who I feel like is also in that search domain, [01:06:07.140 --> 01:06:09.460] but he's less focused on search infrastructure, [01:06:09.460 --> 01:06:11.460] he's more focused on just being a search engine. [01:06:11.460 --> 01:06:13.460] I don't know if you've compared yourself [01:06:13.460 --> 01:06:15.260] to Perplexity in that way. [01:06:15.260 --> 01:06:16.660] - Yeah, I know, we get asked this a lot. [01:06:16.660 --> 01:06:18.140] I mean, Perplexity is doing a great job [01:06:18.140 --> 01:06:20.980] at combining LLMs with search results, [01:06:20.980 --> 01:06:23.900] and that does make for a better search engine. [01:06:23.900 --> 01:06:26.660] That is the future of the user interaction, [01:06:26.660 --> 01:06:29.740] but we're just more focused on the search results themselves [01:06:29.740 --> 01:06:31.460] and really trying to handle the queries [01:06:31.460 --> 01:06:33.540] that Google and Bing are not good at. [01:06:33.540 --> 01:06:34.380] - Yeah. [01:06:34.380 --> 01:06:37.420] - So, I mean, we want people to build LLM-style interactions [01:06:37.420 --> 01:06:39.020] on top of our thing as well. [01:06:39.020 --> 01:06:40.700] - Wait, so you say Google and Bing are not good at it. [01:06:40.700 --> 01:06:42.060] Do you think that people will use you [01:06:42.060 --> 01:06:44.220] in complement to Google and Bing, [01:06:44.220 --> 01:06:46.140] or do you just completely replace that? [01:06:46.140 --> 01:06:46.980] - At least in the beginning. [01:06:46.980 --> 01:06:48.860] Like, we're gonna be used in places [01:06:48.860 --> 01:06:51.140] where Google and Bing don't work well. [01:06:51.140 --> 01:06:53.980] So, I mean, if your application wants to know the weather, [01:06:53.980 --> 01:06:55.980] or wants to know that Taylor Swift song, [01:06:55.980 --> 01:06:57.300] basically, if your application knows [01:06:57.300 --> 01:06:58.780] the right keywords to search with, [01:06:58.780 --> 01:07:00.820] then sure, Google and Bing are gonna be fine for you. [01:07:00.820 --> 01:07:03.220] But if you want to make these complex, [01:07:03.220 --> 01:07:06.620] almost metaphorical queries with natural language, [01:07:06.620 --> 01:07:08.000] which are really the most powerful ones, [01:07:08.000 --> 01:07:09.420] then you should be using Metaphor. [01:07:09.420 --> 01:07:10.260] - Yeah, yeah. [01:07:10.260 --> 01:07:11.420] I was actually walking from your, [01:07:11.420 --> 01:07:13.500] we were walking from your sushi party [01:07:13.500 --> 01:07:16.220] that you just had, like, it's like a recruiting event. [01:07:16.220 --> 01:07:17.060] - I hope the food was good. [01:07:17.060 --> 01:07:18.060] - It was pretty good, it was pretty good. [01:07:18.060 --> 01:07:19.660] I love me a little bit of sushi. 
[01:07:19.660 --> 01:07:20.820] And I was actually talking to people [01:07:20.820 --> 01:07:22.840] about your auto-prompting feature, [01:07:22.840 --> 01:07:24.460] 'cause a lot of people, [01:07:24.460 --> 01:07:26.380] I was, there was someone from Midjourney there, [01:07:26.380 --> 01:07:28.460] and they were saying how DALL-E 3 [01:07:28.460 --> 01:07:30.060] also does sort of auto-prompting, [01:07:30.060 --> 01:07:31.420] or rewriting of the prompts. [01:07:31.420 --> 01:07:32.260] - Yeah. [01:07:32.260 --> 01:07:33.300] - Is there an art to auto-prompting? [01:07:33.300 --> 01:07:34.140] How do you feel that? [01:07:34.140 --> 01:07:35.300] How do you feel about your auto-prompting feature, [01:07:35.300 --> 01:07:36.140] basically? [01:07:36.140 --> 01:07:37.500] - Yeah, auto-prompt is like, we convert, [01:07:37.500 --> 01:07:38.780] we use ChatGPT, basically, [01:07:38.780 --> 01:07:41.340] to convert the queries that come into the search engine [01:07:41.340 --> 01:07:44.860] into queries that are formatted for Metaphor's models. [01:07:44.860 --> 01:07:48.820] Because Metaphor is trained to predict links given text, [01:07:48.820 --> 01:07:52.360] so the model really, like, the best way to prompt Metaphor [01:07:52.360 --> 01:07:55.460] is to search in a way where a link naturally follows, [01:07:55.460 --> 01:07:56.380] which can be confusing, [01:07:56.380 --> 01:07:57.380] so we have this auto-prompt [01:07:57.380 --> 01:07:58.420] that converts into the right format. [01:07:58.420 --> 01:07:59.260] - Yeah. [01:07:59.260 --> 01:08:00.540] - You can kind of think of Metaphor as in the same state [01:08:00.540 --> 01:08:02.660] as, like, what GPT-3 was in. [01:08:02.660 --> 01:08:04.040] I don't know if you guys remember, but, [01:08:04.040 --> 01:08:04.880] or if you remember-- [01:08:04.880 --> 01:08:05.720] - It's not instruction tuned, yeah. [01:08:05.720 --> 01:08:07.780] - Yeah, it's like, you know, two years ago, [01:08:07.780 --> 01:08:08.900] GPT-3 was auto-complete, [01:08:08.900 --> 01:08:10.100] so you had to, like, prompt it [01:08:10.100 --> 01:08:11.860] in order to get the best output from it. [01:08:11.860 --> 01:08:12.740] It had a lot of power, [01:08:12.740 --> 01:08:14.060] but it just had this weird user interface. [01:08:14.060 --> 01:08:15.740] Metaphor's in a similar situation. [01:08:15.740 --> 01:08:17.600] The problem is when you RLHF, you, like, [01:08:17.600 --> 01:08:18.780] and we've tried this, like, [01:08:18.780 --> 01:08:20.700] it does reduce the power of the model, [01:08:20.700 --> 01:08:23.340] and, like, it's just okay to, [01:08:23.340 --> 01:08:25.820] because, like, often we're using this auto-prompt, [01:08:25.820 --> 01:08:28.180] like, it's okay to keep this model the way it is, [01:08:28.180 --> 01:08:31.020] requiring this auto-complete type of search. [01:08:31.020 --> 01:08:33.420] - And, yeah, would you call yourself a search LLM? [01:08:33.420 --> 01:08:35.000] Like, very, very long ago, [01:08:35.000 --> 01:08:36.140] the original pitch for Metaphor [01:08:36.140 --> 01:08:37.660] that I heard from you guys was, [01:08:37.660 --> 01:08:41.460] you're an LLM that predicts links instead of tokens. [01:08:41.460 --> 01:08:42.580] - Oh, well, an LLM is, yeah, [01:08:42.580 --> 01:08:45.540] I mean, LLM is, like, it's modeling, like, [01:08:45.540 --> 01:08:46.660] yeah, usually, like, language, [01:08:46.660 --> 01:08:47.500] and we're not really, [01:08:47.500 --> 01:08:49.060] we're not exactly generating the links.
[01:08:49.060 --> 01:08:50.460] We're, we search over an index. [01:08:50.460 --> 01:08:51.820] - Yeah, yeah, they're not hallucinated at all, right? [01:08:51.820 --> 01:08:52.820] They're actually from an index. [01:08:52.820 --> 01:08:54.580] - Yeah, I wouldn't call it a search LLM. [01:08:54.580 --> 01:08:55.620] - Okay. - It's more like a, [01:08:55.620 --> 01:08:56.620] really, a search engine. [01:08:56.620 --> 01:08:57.460] - Search engine. - You might even think [01:08:57.460 --> 01:08:58.540] of it as a research engine. [01:08:58.540 --> 01:08:59.380] - Yeah. - And there are a lot [01:08:59.380 --> 01:09:00.580] of different ways we're trying to explain it. [01:09:00.580 --> 01:09:02.500] I mean, I think we're, like, using terms [01:09:02.500 --> 01:09:06.100] that were developed in an old era for a new type of thing, [01:09:06.100 --> 01:09:08.060] so we might have to invent new words, [01:09:08.060 --> 01:09:09.580] or wait until they are created. [01:09:09.580 --> 01:09:10.540] - Yeah, yeah. [01:09:10.540 --> 01:09:12.020] What else should people know about Metaphor in general? [01:09:12.020 --> 01:09:14.300] Like, what other interesting work are you guys doing? [01:09:14.300 --> 01:09:16.140] - I think just, like, the vision is super exciting, [01:09:16.140 --> 01:09:19.040] and I think people don't realize how exciting the vision is. [01:09:19.040 --> 01:09:21.020] Basically, the vision is to solve search. [01:09:21.020 --> 01:09:21.860] What does that mean? [01:09:21.860 --> 01:09:24.180] It means that no matter how complex the query, [01:09:24.180 --> 01:09:25.420] Metaphor should be able to handle it. [01:09:25.420 --> 01:09:28.820] So we're talking, like, AI researchers, similar to you, [01:09:28.820 --> 01:09:31.480] who are in the Bay Area, who've worked on Rust before, [01:09:31.480 --> 01:09:33.280] who went to so-and-so college, [01:09:33.280 --> 01:09:36.060] who would be a great candidate for this startup. [01:09:36.060 --> 01:09:37.980] Whatever it is, we should be able to handle it, [01:09:37.980 --> 01:09:40.180] and language models are powerful enough [01:09:40.180 --> 01:09:42.500] to understand language at the level of a human. [01:09:42.500 --> 01:09:43.780] So you should theoretically be able [01:09:43.780 --> 01:09:44.980] to make a system like this. [01:09:44.980 --> 01:09:46.500] It's just a matter of how fast can it be. [01:09:46.500 --> 01:09:47.340] - Yeah. - And we wanna make [01:09:47.340 --> 01:09:49.700] these things, like, do all those complex queries really fast. [01:09:49.700 --> 01:09:51.620] And imagine if you could do this, [01:09:51.620 --> 01:09:54.180] imagine if this was, like, possible, [01:09:54.180 --> 01:09:56.100] and then you combine that with, like, [01:09:56.100 --> 01:09:57.740] you know, GPT-4, GPT-5, [01:09:57.740 --> 01:10:00.220] and that's how we want our customers to combine us, [01:10:00.220 --> 01:10:01.740] you know, combine us with GPT-4, GPT-5. [01:10:01.740 --> 01:10:02.860] Suddenly, now, you have the ability [01:10:02.860 --> 01:10:05.380] to literally answer any information query, [01:10:05.380 --> 01:10:07.060] no matter how complex. [01:10:07.060 --> 01:10:08.380] That, like, the entire world's knowledge [01:10:08.380 --> 01:10:09.380] is at your fingertips. [01:10:09.380 --> 01:10:10.860] That's, like, insane. (laughing) [01:10:10.860 --> 01:10:12.620] Like, we've basically become all-knowing. [01:10:12.620 --> 01:10:13.980] - Yeah. - You know, omnipotent. 
[01:10:13.980 --> 01:10:15.260] - Yeah, that's-- - Omniscient. [01:10:15.260 --> 01:10:16.140] (laughing) [01:10:16.140 --> 01:10:17.700] - Omniscient, and then omnipotent. [01:10:17.700 --> 01:10:18.540] - Omniscient, now-- - Knowledge is power, right? [01:10:18.540 --> 01:10:19.700] - Right, sorry, I skipped a step. [01:10:19.700 --> 01:10:22.500] - No, no, no, yeah, I can do that sort of QED proof [01:10:22.500 --> 01:10:24.700] of why omniscient equals omnipotent. [01:10:24.700 --> 01:10:26.980] I am very excited about you guys. [01:10:26.980 --> 01:10:28.980] You know, I've seen you grow literally [01:10:28.980 --> 01:10:33.340] from your living room, and it's definitely not over. [01:10:33.340 --> 01:10:37.100] What's it like having a meme-y celebrity CTO [01:10:37.100 --> 01:10:38.300] who keeps tweeting viral shit? [01:10:38.300 --> 01:10:39.140] - No, I mean, I love it. [01:10:39.140 --> 01:10:40.780] Like, Jeff literally just goes, [01:10:40.780 --> 01:10:41.820] Jeff has figured out Twitter. [01:10:41.820 --> 01:10:42.980] He just knows how to go viral, [01:10:42.980 --> 01:10:44.260] because he has really good takes, [01:10:44.260 --> 01:10:46.220] and we often throw up a party [01:10:46.220 --> 01:10:47.940] in response to his viral tweets. [01:10:47.940 --> 01:10:50.300] - So, you wanna talk about the Andrew Huberman party? [01:10:50.300 --> 01:10:51.420] - Yeah, okay, so he had a tweet [01:10:51.420 --> 01:10:53.460] that was like, Andrew Huberman has single-handedly [01:10:53.460 --> 01:10:54.820] destroyed the SF social scene, [01:10:54.820 --> 01:10:57.220] 'cause everyone, whatever, is sober at parties [01:10:57.220 --> 01:10:58.300] and goes home early. [01:10:58.300 --> 01:11:00.780] And so, of course, we had an anti-Huberman party, [01:11:00.780 --> 01:11:02.300] where everyone stayed late, [01:11:02.300 --> 01:11:05.020] and we had a bunch of beer, and everyone-- [01:11:05.020 --> 01:11:07.220] - Well, my favorite was all over the apartment [01:11:07.220 --> 01:11:08.220] that we had the party in. [01:11:08.220 --> 01:11:09.940] You plastered quotes from Andrew Huberman [01:11:09.940 --> 01:11:11.060] about how alcohol's bad for you. [01:11:11.060 --> 01:11:13.100] - Right, alcohol will destroy your brain, [01:11:13.100 --> 01:11:14.420] and all these things. [01:11:14.420 --> 01:11:16.580] I mean, look, everything in balance, right? [01:11:16.580 --> 01:11:17.560] We should have fun in life, [01:11:17.560 --> 01:11:19.460] but also, you know, be safe and everything. [01:11:19.460 --> 01:11:21.500] But, and then, he had another tweet [01:11:21.500 --> 01:11:24.380] about how he was gonna go on a date, [01:11:24.380 --> 01:11:25.340] but the girl ghosted him, [01:11:25.340 --> 01:11:28.100] and that allowed him to have focus on coding that night. [01:11:28.100 --> 01:11:30.460] So, of course, we had to have a ghosted-in-SF party, [01:11:30.460 --> 01:11:32.220] where everyone came to code together, [01:11:32.220 --> 01:11:33.900] 'cause you're already gonna be ghosted on Friday night. [01:11:33.900 --> 01:11:35.660] You might as well code together while you're at it. [01:11:35.660 --> 01:11:37.580] - Yeah, I love that part of the social scene, [01:11:37.580 --> 01:11:39.900] and I think Metaphor is also really driving that somehow. [01:11:39.900 --> 01:11:41.420] So, congrats for all you do, [01:11:41.420 --> 01:11:43.700] and it's just nice to check in with you. 
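To make the auto-prompt idea from that conversation concrete: because the underlying model is trained to predict links that follow text, a raw question tends to work better once it is rewritten as a statement that a link would naturally complete. The toy Python sketch below only illustrates that framing; in practice an LLM does the rewrite, and none of this is Metaphor's actual code.

```python
# Illustrative only: reframe a question as text that a link would naturally
# follow, which is the shape a link-prediction model expects. In practice an
# LLM performs this rewrite; a simple template stands in for that call here.

def toy_auto_prompt(query: str) -> str:
    """Hypothetical stand-in for the LLM rewrite step."""
    topic = query.strip().rstrip("?")
    return f"Here is a great article about {topic}:"

print(toy_auto_prompt("AI researchers in the Bay Area who have worked on Rust?"))
# Here is a great article about AI researchers in the Bay Area who have worked on Rust:
```

The same trick is why Will compares Metaphor to pre-instruction-tuning GPT-3: the raw model is powerful but wants completion-style input, so the rewrite layer supplies that shape on the user's behalf.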
[01:11:43.700 --> 01:11:46.980] - I've personally been enjoying the Metaphor approach [01:11:46.980 --> 01:11:49.100] to LLM search APIs. [01:11:49.100 --> 01:11:53.540] I've often said this in the context of the capabilities of GPTs. [01:11:53.540 --> 01:11:54.940] So, if you think about it, [01:11:54.940 --> 01:11:58.420] what are the capabilities of ChatGPT as it is today, [01:11:58.420 --> 01:12:00.960] as well as GPTs as announced on Dev Day, right? [01:12:00.960 --> 01:12:02.580] There's the LLM base layer, [01:12:02.580 --> 01:12:05.780] but then you tack on three core capabilities on top of it, [01:12:05.780 --> 01:12:08.060] right, one is Retrieval-Augmented Generation, [01:12:08.060 --> 01:12:10.140] where you upload files and then you do RAG on it, [01:12:10.140 --> 01:12:12.260] and second is a Code Interpreter, [01:12:12.260 --> 01:12:15.060] where you generate code in a sandbox, [01:12:15.060 --> 01:12:17.060] and then you run code and you correct code, [01:12:17.060 --> 01:12:19.020] and finally you execute it. [01:12:19.020 --> 01:12:22.660] And third is you have a search feature. [01:12:22.660 --> 01:12:23.940] And so, we have a bunch of companies [01:12:23.940 --> 01:12:25.680] competing for the RAG functionality. [01:12:25.680 --> 01:12:27.260] You can check out our episodes [01:12:27.260 --> 01:12:30.820] with Harrison of LangChain and Jerry of LlamaIndex this year. [01:12:30.820 --> 01:12:31.740] There's a bunch of companies [01:12:31.740 --> 01:12:33.660] competing for the Code Interpreter capability. [01:12:33.660 --> 01:12:34.700] There's obviously Replit, [01:12:34.700 --> 01:12:38.220] but then abstractly, there's also Deno and Val Town, [01:12:38.220 --> 01:12:41.180] and anyone who runs code is in that game, basically. [01:12:41.180 --> 01:12:44.300] But what is surprisingly uncontested is open web search, [01:12:44.300 --> 01:12:46.660] and so far, I think it's Perplexity and Metaphor [01:12:46.660 --> 01:12:49.240] that are leading the pack in their different approaches. [01:12:49.240 --> 01:12:54.120] One, the PPLX API is an integrated LLM + search API, [01:12:54.120 --> 01:12:57.860] and then two is Metaphor, which is search-only, [01:12:57.860 --> 01:13:00.580] and you kind of bring your own LLMs. [01:13:00.580 --> 01:13:02.200] For our next guest, we're actually going to go over [01:13:02.200 --> 01:13:04.580] to our last return guest, [01:13:04.580 --> 01:13:06.320] which is one of our most recent hits, [01:13:06.320 --> 01:13:08.780] which is Jeremy Howard, previously of Fast.ai, [01:13:08.780 --> 01:13:10.280] but now of Answer.ai. [01:13:10.280 --> 01:13:12.000] It seems that all people want is answers, [01:13:12.000 --> 01:13:14.600] and Jeremy doesn't have them, but he has questions. [01:13:14.600 --> 01:13:17.820] - Outside of the Decibel event recording. [01:13:17.820 --> 01:13:19.160] - I realized I had to be the interviewer, [01:13:19.160 --> 01:13:20.300] and I was like, "I probably should buy a wine." [01:13:20.300 --> 01:13:21.400] - And I had to pick a wine, [01:13:21.400 --> 01:13:24.020] and Sean told me, "Pick the most expensive one." [01:13:24.020 --> 01:13:24.860] - Yeah, it's on Decibel, anyway. [01:13:24.860 --> 01:13:26.460] - Because Decibel is paying for it. [01:13:26.460 --> 01:13:28.840] The one I'm having is from a $160 bottle, [01:13:28.840 --> 01:13:29.760] and it's really good. [01:13:29.760 --> 01:13:31.720] - And I did the same. [01:13:31.720 --> 01:13:33.100] - And I'm not having any wine.
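And to make the "search-only, bring your own LLM" contrast concrete, the pattern is roughly: call a search API, stuff the returned titles, URLs, and snippets into a prompt, and hand that prompt to whatever model you run. The sketch below is purely illustrative; search() and complete() are hypothetical placeholders, not the real Metaphor or PPLX clients.

```python
# Sketch of "search-only, bring your own LLM": search() and complete() are
# hypothetical placeholders for a real search API and a real model call.

def search(query: str, k: int = 3) -> list[dict]:
    """Stand-in for a web search API returning titles, URLs, and snippets."""
    return [
        {"title": f"Result {i} for '{query}'",
         "url": f"https://example.com/{i}",
         "snippet": "..."}
        for i in range(k)
    ]

def complete(prompt: str) -> str:
    """Stand-in for whatever LLM you bring (GPT-4, an open model, etc.)."""
    return f"(answer grounded in {prompt.count('https://')} retrieved sources)"

def answer(query: str) -> str:
    results = search(query)
    context = "\n".join(
        f"- {r['title']} ({r['url']}): {r['snippet']}" for r in results
    )
    prompt = f"Using only these sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    return complete(prompt)

print(answer("What is new in open web search APIs?"))
```

The integrated approach collapses those two calls into one endpoint, which is the basic trade-off between the two APIs mentioned above.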
[01:13:33.100 --> 01:13:35.320] (laughing) [01:13:35.320 --> 01:13:37.060] - Could we go around and identify voices [01:13:37.060 --> 01:13:38.500] for people listening? [01:13:38.500 --> 01:13:39.660] Maybe Tanishq, you want to start? [01:13:39.660 --> 01:13:41.740] - Sure, my name is Tanishq Abraham. [01:13:41.740 --> 01:13:44.120] I am the CEO of MedARC, [01:13:44.120 --> 01:13:46.620] which is a medical AI research organization. [01:13:46.620 --> 01:13:49.320] I also work as a research director at Stability AI, [01:13:49.320 --> 01:13:51.240] and I've been collaborating with Jeremy Howard [01:13:51.240 --> 01:13:54.820] for more than a year, a couple years maybe, [01:13:54.820 --> 01:13:57.020] and he's also the president of MedARC, [01:13:57.020 --> 01:13:59.700] and he's been heavily involved in my venture as well. [01:13:59.700 --> 01:14:02.020] - And you have a podcast together, which I really enjoyed. [01:14:02.020 --> 01:14:05.020] - Oh yeah, yes, Jeremy had me on his podcast, which was-- [01:14:05.020 --> 01:14:06.860] - Your first and only episode, or what the hell? [01:14:06.860 --> 01:14:07.700] - Yeah. [01:14:07.700 --> 01:14:08.820] (laughing) [01:14:08.820 --> 01:14:11.420] It turns out that maintaining a podcast is hard. [01:14:12.380 --> 01:14:15.500] - It's easy, just shove microphones in front of people. [01:14:15.500 --> 01:14:17.660] - So I'm Jeremy Howard, this is my voice, [01:14:17.660 --> 01:14:22.660] and as of today, I'm Jeremy Howard of Answer.ai, I guess. [01:14:22.660 --> 01:14:25.500] - And repeat guest on Latent Space. [01:14:25.500 --> 01:14:27.300] Your last episode did really well [01:14:27.300 --> 01:14:28.700] in terms of the number of views. [01:14:28.700 --> 01:14:30.100] - Yeah, you guys are good interviewers. [01:14:30.100 --> 01:14:32.020] - Well, also you dropped a lot of spice, [01:14:32.020 --> 01:14:34.300] which is what we like as podcasters. [01:14:34.300 --> 01:14:36.060] We also have Jess Leao on for the first time, hey. [01:14:36.060 --> 01:14:38.940] - Yes, hello, I'm Jess Leao, and I'm a partner at Decibel. [01:14:38.940 --> 01:14:40.020] Excited to be here. [01:14:40.020 --> 01:14:41.620] Excited to be providing the wine also. [01:14:41.620 --> 01:14:42.740] - Standing in for Alessio. [01:14:42.740 --> 01:14:43.580] - Oh, so good. [01:14:43.580 --> 01:14:45.980] - Alessio ditched us tonight, right? [01:14:45.980 --> 01:14:47.940] So you're the better replacement. [01:14:47.940 --> 01:14:50.500] - Yeah, it's good because in a previous conference, [01:14:50.500 --> 01:14:53.780] Alessio was wearing my badge and replacing me, [01:14:53.780 --> 01:14:55.300] so now I can be Alessio for tonight. [01:14:55.300 --> 01:14:56.140] - Well, you just work-- [01:14:56.140 --> 01:14:58.300] - A shorter version of Alessio, basically. [01:14:58.300 --> 01:14:59.540] (laughing) [01:14:59.540 --> 01:15:02.820] - So today was the Answer.ai announcement, [01:15:02.820 --> 01:15:03.980] maybe you wanna cover that? [01:15:03.980 --> 01:15:05.780] Just what should people know about it? [01:15:05.780 --> 01:15:07.020] - What should people know about it? [01:15:07.020 --> 01:15:09.180] Oh, I don't know, man. [01:15:09.180 --> 01:15:11.380] - You went from Fast.ai to the dark side now. [01:15:11.380 --> 01:15:12.220] - No, it's not at all. [01:15:12.220 --> 01:15:13.060] - To the for-profit. [01:15:13.060 --> 01:15:13.880] - It is the light side. [01:15:13.880 --> 01:15:15.860] - It is actually, it is the light side.
[01:15:15.860 --> 01:15:20.100] Fast.ai, look, I spent the last week in San Francisco, [01:15:20.100 --> 01:15:22.900] and the amount of love I received for Fast.ai [01:15:22.900 --> 01:15:24.180] was overwhelming. [01:15:24.180 --> 01:15:26.860] I couldn't believe how many people told me [01:15:26.860 --> 01:15:28.460] it changed their life, you know? [01:15:28.460 --> 01:15:32.940] Which is just amazing, but I have to say, [01:15:32.940 --> 01:15:36.740] it's actually time to be rejuvenated. [01:15:36.740 --> 01:15:38.180] The mission is the same. [01:15:38.180 --> 01:15:40.340] Bring AI to as many people as possible. [01:15:40.340 --> 01:15:45.620] But now, we can't do it on the back of my bank account. [01:15:45.620 --> 01:15:48.600] I've been paying for everything, well, and my wife. [01:15:48.600 --> 01:15:49.440] We can't afford it anymore. [01:15:49.440 --> 01:15:50.440] - But you've had donations and stuff. [01:15:50.440 --> 01:15:51.280] - No, no, no, nothing. [01:15:51.280 --> 01:15:54.940] - But you were very steadfastly against donations, [01:15:54.940 --> 01:15:55.780] I remember this. [01:15:55.780 --> 01:15:57.980] - Yeah, no donations, no revenue of any kind, [01:15:57.980 --> 01:15:59.200] totally independent. [01:15:59.200 --> 01:16:04.580] But now, I think we can do a better job [01:16:04.580 --> 01:16:07.580] by having a bank account with money in it. [01:16:07.580 --> 01:16:10.660] So, thank you, Jess, for sending us money. [01:16:10.660 --> 01:16:11.860] - Jess, what is it-- - We're happy to provide. [01:16:11.860 --> 01:16:13.540] - What is it like when someone like Jeremy comes [01:16:13.540 --> 01:16:16.540] and goes, "We need a bank account." [01:16:16.540 --> 01:16:20.340] - You know, there are some people that you go through a pitch with [01:16:20.340 --> 01:16:22.060] and then there's some people that you email [01:16:22.060 --> 01:16:23.700] and you start prepping the wire, [01:16:23.700 --> 01:16:26.140] and I would say that Jeremy fell into the latter. [01:16:26.140 --> 01:16:28.780] - Oh, no, I didn't even ask for this money. [01:16:28.780 --> 01:16:30.820] I was just gonna have a chat with Alessio [01:16:30.820 --> 01:16:33.420] to get some advice, and then Jess turned up, [01:16:33.420 --> 01:16:35.920] and Jess's other partner, Jon, turned up, [01:16:35.920 --> 01:16:37.380] and was like, "What are you guys doing here?" [01:16:37.380 --> 01:16:39.820] And they're like, "Oh, we'd like to give you money." [01:16:39.820 --> 01:16:42.780] So, I was like, "Oh, okay." [01:16:42.780 --> 01:16:44.020] So, that was good. [01:16:44.020 --> 01:16:45.140] They have good taste, right? [01:16:45.140 --> 01:16:46.100] - Yeah. [01:16:46.100 --> 01:16:47.780] - I've talked to you a bit, [01:16:47.780 --> 01:16:49.100] especially at the Modular conference, [01:16:49.100 --> 01:16:50.420] which I'm wearing the badge of. [01:16:50.420 --> 01:16:51.260] - Nice hoodie, yeah. [01:16:51.260 --> 01:16:52.380] - Yeah, the hoodie is really nice. [01:16:52.380 --> 01:16:54.740] So, you're interested in fine-tuning. [01:16:54.740 --> 01:16:58.340] You're interested in fundamental research. [01:16:58.340 --> 01:17:00.340] Could you list out the main areas of interest, maybe? [01:17:00.340 --> 01:17:05.100] - I mean, basically, the interest is in making AI [01:17:05.100 --> 01:17:07.260] as useful and valuable as possible. [01:17:07.260 --> 01:17:08.100] - Yeah.
[01:17:08.100 --> 01:17:10.980] - That's how we make it as accessible as possible, [01:17:10.980 --> 01:17:12.440] as widely used as possible, [01:17:12.440 --> 01:17:16.860] help as many people as we can with this technology, right? [01:17:16.860 --> 01:17:18.860] So, how do we do that? [01:17:18.860 --> 01:17:21.900] It needs to be cheaper, it needs to be faster, [01:17:21.900 --> 01:17:23.460] it needs to be easier to use, [01:17:23.460 --> 01:17:25.060] and it needs to be more integrated [01:17:25.060 --> 01:17:27.260] into people's day-to-day lives, [01:17:27.260 --> 01:17:29.260] into the stuff that they do. [01:17:29.260 --> 01:17:32.820] This is hard, you know? [01:17:32.820 --> 01:17:36.980] And so, in the end, I guess I was inspired [01:17:36.980 --> 01:17:40.260] by Thomas Edison's Invention Factory [01:17:40.260 --> 01:17:41.700] in the late 19th century, [01:17:41.700 --> 01:17:43.180] where they had the same situation. [01:17:43.180 --> 01:17:46.560] They were like, "Oh, look, electricity's been invented. [01:17:46.560 --> 01:17:49.920] "Okay, what do we do with this? [01:17:49.920 --> 01:17:51.220] "It's a source of power. [01:17:51.220 --> 01:17:53.180] "I don't know." [01:17:53.180 --> 01:17:55.240] And they're like, "Oh, let's create the record player, [01:17:55.240 --> 01:17:57.700] "and the light bulb, and the refrigerator." [01:17:57.700 --> 01:18:00.700] And, you know, it's like, recognizing [01:18:00.700 --> 01:18:01.820] that now you have electricity, [01:18:01.820 --> 01:18:04.140] you can make all these things, that's hard. [01:18:04.140 --> 01:18:06.180] It requires really smart researchers [01:18:06.180 --> 01:18:09.020] who deeply understand the underlying technology, [01:18:09.020 --> 01:18:12.260] recognize, like, oh, there are some gaps here, [01:18:12.260 --> 01:18:14.860] but they could be filled if we, like, [01:18:14.860 --> 01:18:17.860] use this different kind of filament, or whatever. [01:18:17.860 --> 01:18:22.860] And so, you actually need, like, deep technical experts [01:18:22.860 --> 01:18:26.100] who also have the, like, curiosity, [01:18:26.100 --> 01:18:28.260] and playfulness, and spontaneity, [01:18:28.260 --> 01:18:29.980] to, like, think, like, oh, what if the world [01:18:29.980 --> 01:18:32.260] had this new thing in it? [01:18:32.260 --> 01:18:35.300] I wonder if we could put that thing in the world now, [01:18:35.300 --> 01:18:37.620] but we have AI. [01:18:37.620 --> 01:18:39.780] - Yeah, you were very complimentary of, like, [01:18:39.780 --> 01:18:41.100] the open source, so we last met [01:18:41.100 --> 01:18:44.180] at the open source meetup, as well. [01:18:44.180 --> 01:18:45.860] We met so many times. [01:18:45.860 --> 01:18:47.540] And you're very complimentary of, like, [01:18:47.540 --> 01:18:49.380] their approach towards just trying things, [01:18:49.380 --> 01:18:52.220] like model stacking, for example. [01:18:52.220 --> 01:18:53.380] Is that the kind of people [01:18:53.380 --> 01:18:55.020] that you're looking to collaborate with? [01:18:55.020 --> 01:18:59.060] - I think partly, you know, I'm deeply involved [01:18:59.060 --> 01:19:00.780] in the open source community, [01:19:00.780 --> 01:19:03.980] and I wanna continue to do that, you know? 
[01:19:03.980 --> 01:19:08.980] They, all the best kind of models outside [01:19:08.980 --> 01:19:12.180] of your kind of open AI and stuff [01:19:12.180 --> 01:19:15.180] are all created by the open source community, [01:19:15.180 --> 01:19:19.420] at the moment, through just trying crazy things. [01:19:19.420 --> 01:19:22.780] But it'll be a mix, you know? [01:19:22.780 --> 01:19:24.460] I also wanna work really closely [01:19:24.460 --> 01:19:29.060] with the best academics in the world, you know? [01:19:29.060 --> 01:19:31.260] And I also wanna collaborate with the people [01:19:32.300 --> 01:19:34.380] in parts of the world we've never even heard of, [01:19:34.380 --> 01:19:35.500] who never get a chance, [01:19:35.500 --> 01:19:38.420] because nobody gave them a chance. [01:19:38.420 --> 01:19:39.980] And, you know, so one of the things [01:19:39.980 --> 01:19:42.700] we're gonna be doing a lot of is, like, [01:19:42.700 --> 01:19:45.580] recruiting in really weird ways, you know, [01:19:45.580 --> 01:19:50.380] to find those people who are underappreciated. [01:19:50.380 --> 01:19:51.580] - Would it be, like, a challenge, [01:19:51.580 --> 01:19:52.420] like a Kaggle-type challenge? [01:19:52.420 --> 01:19:54.780] - Yeah, like, Kaggle-y kind of things, [01:19:54.780 --> 01:19:57.660] and, you know, basically find ways, you know, [01:19:57.660 --> 01:20:02.460] or through, like, open source bounties and stuff like that. [01:20:02.460 --> 01:20:04.900] Like, basically give people an opportunity [01:20:04.900 --> 01:20:08.460] to show that they can do amazing shit [01:20:08.460 --> 01:20:09.700] that nobody else can do. [01:20:09.700 --> 01:20:10.540] - Yeah. [01:20:10.540 --> 01:20:11.820] - Doesn't matter how old they are, [01:20:11.820 --> 01:20:13.820] or where they live, or what color their skin is, [01:20:13.820 --> 01:20:15.260] or whatever, you know? [01:20:15.260 --> 01:20:17.980] - Yeah, I think what the Fast.AI community has shown [01:20:17.980 --> 01:20:19.100] is that there are a lot of people [01:20:19.100 --> 01:20:21.060] who don't have a traditional background [01:20:21.060 --> 01:20:22.740] that are really talented people. [01:20:22.740 --> 01:20:26.460] And I think, yeah, it's great that that was there for, [01:20:26.460 --> 01:20:27.580] that the Fast.AI community was there, [01:20:27.580 --> 01:20:30.020] and that Jeremy continues to highlight [01:20:30.020 --> 01:20:30.860] those talents as well. [01:20:30.860 --> 01:20:32.540] - Actually, so let me give props to Tanishq [01:20:32.540 --> 01:20:33.980] as an example, right? [01:20:33.980 --> 01:20:38.980] So, Tanishq is the CEO of a research lab [01:20:38.980 --> 01:20:40.980] of which I'm the president, Medak. [01:20:40.980 --> 01:20:43.180] And he, how old are you, Tanishq? [01:20:43.180 --> 01:20:45.180] - I'm 20 years old, which is why I'm not drinking the wine. [01:20:45.180 --> 01:20:47.500] - So you're not a drinker at this wine bar. [01:20:47.500 --> 01:20:49.780] - You know, so like, Tanishq's a great example [01:20:49.780 --> 01:20:53.580] of somebody that most people wouldn't hire as a CEO, [01:20:53.580 --> 01:20:54.460] but why the hell not? [01:20:54.460 --> 01:20:57.020] Like, he finished high school 10 years ago. [01:20:57.020 --> 01:20:58.620] He finished high school at 10. [01:20:58.620 --> 01:21:00.980] You know, he had his first degree at, what, 14. [01:21:00.980 --> 01:21:04.500] Like, he's somebody who's, you know, done it twice. 
[01:21:04.500 --> 01:21:06.340] - I mean, that's somewhat, like, [01:21:06.340 --> 01:21:10.260] he went after the traditional accreditation, [01:21:10.260 --> 01:21:13.020] the pieces of paper that you would pursue [01:21:13.020 --> 01:21:15.260] to show yourself as qualified. [01:21:15.260 --> 01:21:19.780] So, in a way, he's part of that status quo. [01:21:19.780 --> 01:21:24.780] - In a way, but you know, unfortunately, people are ageist. [01:21:24.780 --> 01:21:25.660] - Yes, they are. [01:21:25.660 --> 01:21:28.540] - So, and I'll also note that I never actually [01:21:28.540 --> 01:21:30.860] did a computer science degree or anything like this. [01:21:30.860 --> 01:21:34.900] My start with AI was actually through the fast AI course. [01:21:34.900 --> 01:21:37.980] So, yeah, and so it's been a long journey since then, yeah. [01:21:37.980 --> 01:21:40.740] - What would you ask him about Answer? [01:21:40.740 --> 01:21:42.580] - 'Cause I already know a lot of what's going on [01:21:42.580 --> 01:21:43.420] at the company. [01:21:43.420 --> 01:21:44.420] - What is he not saying? [01:21:44.420 --> 01:21:45.780] Is he too humble to say? [01:21:45.780 --> 01:21:48.020] - I think what he's not saying is he already, you know, [01:21:48.020 --> 01:21:50.980] has a great team of researchers that, you know, [01:21:50.980 --> 01:21:53.660] there are already two researchers that are at Answer AI [01:21:53.660 --> 01:21:55.580] that are amazing researchers that I've had the chance [01:21:55.580 --> 01:21:59.580] to also interact with over the past maybe a year or so, [01:21:59.580 --> 01:22:01.420] closely, and also just more generally. [01:22:01.420 --> 01:22:04.100] I'm looking forward to seeing what Answer does. [01:22:04.100 --> 01:22:05.460] And I'm really excited to continue [01:22:05.460 --> 01:22:06.780] to collaborate with Jeremy. [01:22:06.780 --> 01:22:10.180] I think this will be even better for me. [01:22:10.180 --> 01:22:12.060] Like, I'm selfishly, I'm very excited [01:22:12.060 --> 01:22:14.740] because I think it'll be better for me, you know, [01:22:14.740 --> 01:22:17.880] to work closely with Jeremy as well. [01:22:17.880 --> 01:22:20.780] Even though, you know, he's in his own research lab, [01:22:20.780 --> 01:22:22.860] but I think the collaborations that will come out of this [01:22:22.860 --> 01:22:24.820] will be, will just be amazing. [01:22:24.820 --> 01:22:26.700] So that's what I'm excited for. [01:22:26.700 --> 01:22:29.900] - And Jeremy, last time you were on the podcast, [01:22:29.900 --> 01:22:30.980] you said that, you know, [01:22:30.980 --> 01:22:32.460] one of the most consistent pieces of advice [01:22:32.460 --> 01:22:35.180] that you always give is that people just need to show up, [01:22:35.180 --> 01:22:37.420] follow through, do the work, that stuff. [01:22:37.420 --> 01:22:38.500] Obviously, Tanish did that. [01:22:38.500 --> 01:22:41.020] - Yeah, so Tanish is one of those rare people, right? [01:22:41.020 --> 01:22:42.060] - But like, what, like, [01:22:42.060 --> 01:22:43.980] I feel like Tanish is more special than that. [01:22:43.980 --> 01:22:45.820] Like, what else did he do really well? [01:22:45.820 --> 01:22:47.940] - Yeah, so I mean, [01:22:47.940 --> 01:22:51.300] God, how old were you when I first came across you? [01:22:51.300 --> 01:22:53.340] Like 15 or something, maybe? [01:22:53.340 --> 01:22:54.180] - Wait, what? [01:22:54.180 --> 01:22:55.220] That's so long? - Yeah. 
[01:22:55.220 --> 01:22:57.160] - 'Cause he only took Fast.ai a year and a half ago. [01:22:57.160 --> 01:22:58.000] - No, no, no, no. [01:22:58.000 --> 01:22:59.660] He was a Fast.ai student back then. [01:22:59.660 --> 01:23:00.500] - Okay. [01:23:00.500 --> 01:23:04.900] - And, you know, he kind of got on the forum, [01:23:04.900 --> 01:23:07.820] helped answer questions, you know, [01:23:07.820 --> 01:23:09.860] asked interesting questions of his own. [01:23:09.860 --> 01:23:16.100] To stick with that for five years, [01:23:16.100 --> 01:23:18.020] that's tenacity, you know? [01:23:18.020 --> 01:23:20.820] And the last course we did [01:23:20.820 --> 01:23:23.260] was the hardest course we've ever had. [01:23:23.260 --> 01:23:24.380] It was the diffusion course. [01:23:24.380 --> 01:23:27.580] It was the first ever stable diffusion course. [01:23:27.580 --> 01:23:30.380] And none of us knew what the hell was going on. [01:23:30.380 --> 01:23:31.700] And, you know, he was the one [01:23:31.700 --> 01:23:35.580] who slogged through the math, [01:23:35.580 --> 01:23:37.800] figured out what the hell all those Greek letters were saying [01:23:37.800 --> 01:23:41.940] and did the first math of stable diffusion video [01:23:41.940 --> 01:23:44.420] that, as far as I know, that ever existed. [01:23:44.420 --> 01:23:48.940] You did that with Wassim, right? [01:23:48.940 --> 01:23:50.420] Along with Wassim. [01:23:50.420 --> 01:23:54.140] So, you know, he slogs through difficult shit. [01:23:54.140 --> 01:23:57.780] And the thing that I noticed now is like, [01:23:57.780 --> 01:23:59.420] you know, Tanishka's kind of famous, [01:23:59.420 --> 01:24:01.460] or was kind of famous, as a child prodigy. [01:24:01.460 --> 01:24:02.900] - Yes, you did a TED Talk when you were 14. [01:24:02.900 --> 01:24:04.300] - He was on Child Genius. [01:24:04.300 --> 01:24:05.340] He did a TED Talk when he was 14. [01:24:05.340 --> 01:24:06.540] - I was nine when I did it. [01:24:06.540 --> 01:24:08.060] - He was nine, okay. [01:24:08.060 --> 01:24:09.900] And like, so I kind of thought like, [01:24:09.900 --> 01:24:13.340] oh, things are easy for child prodigies. [01:24:13.340 --> 01:24:16.900] You know, they're so smart that they just, it's easy. [01:24:16.900 --> 01:24:18.420] And I'm like, oh no. [01:24:18.420 --> 01:24:21.380] Actually, Tanishka's nearly as dumb as me. [01:24:21.380 --> 01:24:25.140] And so he just works really, he just works really hard. [01:24:25.140 --> 01:24:27.380] And he's like, "Tanishka, what does this mean?" [01:24:27.380 --> 01:24:29.380] He's like, "I don't know." [01:24:29.380 --> 01:24:32.660] Like, oh, okay, we better figure it out. [01:24:32.660 --> 01:24:35.300] And so that's been interesting to see that like, [01:24:35.300 --> 01:24:38.900] actually, child prodigies have to work [01:24:38.900 --> 01:24:40.780] really, really hard as well, you know. [01:24:40.780 --> 01:24:42.580] That's part of what makes them a child prodigy [01:24:42.580 --> 01:24:45.420] is that they're tenacious and they don't give up [01:24:45.420 --> 01:24:47.300] even over five years. [01:24:47.300 --> 01:24:48.700] - Does it look that way to you? [01:24:48.700 --> 01:24:49.540] Is that what you-- [01:24:49.540 --> 01:24:50.380] - Yeah, I think so. [01:24:50.380 --> 01:24:51.780] And I think, again, part of it-- [01:24:51.780 --> 01:24:53.580] - You agree you're nearly as dumb as me? [01:24:53.580 --> 01:24:54.620] (laughing) [01:24:54.620 --> 01:24:55.940] - No. 
[01:24:55.940 --> 01:24:57.220] - Say it again for the pod. [01:24:57.220 --> 01:25:00.220] - I think Jeremy's trying to trick me here. [01:25:00.220 --> 01:25:04.140] But I think the Fast AI community has been so friendly [01:25:04.140 --> 01:25:06.460] that it's been a really pleasant experience [01:25:06.460 --> 01:25:08.300] to stay with that community. [01:25:08.300 --> 01:25:11.180] And I think that has also enabled my tenacity, [01:25:11.180 --> 01:25:13.380] 'cause I enjoy being in that community so much. [01:25:13.380 --> 01:25:15.460] So that's why I've stuck around in that community [01:25:15.460 --> 01:25:16.380] for so long. [01:25:16.380 --> 01:25:18.540] So without that, without the community [01:25:18.540 --> 01:25:21.020] that Jeremy has built, I don't think there's any way-- [01:25:21.020 --> 01:25:21.860] - It supports you. [01:25:21.860 --> 01:25:23.140] I had the same with Free Code Camp. [01:25:23.140 --> 01:25:26.380] - So, you know, I think a lot of it has to do with-- [01:25:26.380 --> 01:25:27.300] - I'm gonna cry. [01:25:27.300 --> 01:25:29.860] - I think a lot of it has to do with [01:25:29.860 --> 01:25:31.460] building good communities. [01:25:31.460 --> 01:25:33.820] And Jeremy has done a really good job of doing that. [01:25:33.820 --> 01:25:35.020] And it's actually a lot of hard work [01:25:35.020 --> 01:25:36.980] to build a good community and to nurture [01:25:36.980 --> 01:25:38.220] and grow that community. [01:25:38.220 --> 01:25:40.620] And I've been in many communities [01:25:40.620 --> 01:25:42.620] and I've kind of observed how different communities [01:25:42.620 --> 01:25:44.060] in the AI field have grown. [01:25:44.060 --> 01:25:46.220] And Fast AI still is one of the best communities [01:25:46.220 --> 01:25:47.580] that I've had a chance to be a part of. [01:25:47.580 --> 01:25:49.580] So, you know, again, props to Jeremy [01:25:49.580 --> 01:25:51.260] for doing that as well. [01:25:51.260 --> 01:25:54.340] - I'm so embarrassed right now. [01:25:54.340 --> 01:25:55.540] - I wanna give you the perspective. [01:25:55.540 --> 01:25:57.180] You've been an AI investor for a while. [01:25:57.180 --> 01:25:58.020] - Yeah. [01:25:58.020 --> 01:26:02.020] - And how do you view this community and this moment here? [01:26:02.020 --> 01:26:03.980] - The one thing I will say to the conversation [01:26:03.980 --> 01:26:06.580] that we're just having that I think is awesome is-- [01:26:06.580 --> 01:26:07.420] - We can move here a little bit. [01:26:07.420 --> 01:26:10.780] - Yeah, people keep coming and drinking more wine. [01:26:10.780 --> 01:26:11.620] - It's great, it's a mobile studio. [01:26:11.620 --> 01:26:13.620] - Yeah, we're truly a mobile studio, [01:26:13.620 --> 01:26:15.260] middle of New Orleans, let's go. [01:26:16.220 --> 01:26:18.660] One of my favorite heuristics as an investor [01:26:18.660 --> 01:26:22.580] is distance traveled rather than just your, [01:26:22.580 --> 01:26:25.740] rather than just like what do I see today [01:26:25.740 --> 01:26:28.020] in your resume or whatnot. [01:26:28.020 --> 01:26:32.340] Because I think if you just go by a certain pedigree [01:26:32.340 --> 01:26:34.980] or credential or whatnot, you miss a lot of people [01:26:34.980 --> 01:26:36.980] who have traveled a really big distance, [01:26:36.980 --> 01:26:39.180] who didn't have advantages to certain opportunities [01:26:39.180 --> 01:26:41.260] or came from different places or not from the US. 
[01:26:41.260 --> 01:26:42.860] Like you name all the different, [01:26:42.860 --> 01:26:44.820] you know, all the different lists. [01:26:44.820 --> 01:26:47.380] And I always try to look for those kinds of people [01:26:47.380 --> 01:26:49.100] because they're the ones that are always [01:26:49.100 --> 01:26:51.340] pushing the frontier and like really run through walls. [01:26:51.340 --> 01:26:53.940] And I think this conversation is a good example of that. [01:26:53.940 --> 01:26:55.900] - I mean, no one has a longer distance traveled than Jeremy. [01:26:55.900 --> 01:26:56.740] - 100%. [01:26:56.740 --> 01:26:57.580] (laughing) [01:26:57.580 --> 01:26:59.700] Well, literally and in the sense-- [01:26:59.700 --> 01:27:01.820] - Literally from Australia, yes. [01:27:01.820 --> 01:27:04.620] - And when we were, and I think when we were meeting [01:27:04.620 --> 01:27:06.500] last week, you were talking about this [01:27:06.500 --> 01:27:08.940] a little bit around looking for engineers [01:27:08.940 --> 01:27:11.020] and people in places that aren't necessarily [01:27:11.020 --> 01:27:12.340] where everyone else would be looking, [01:27:12.340 --> 01:27:14.100] but that has yielded some of the best, [01:27:14.100 --> 01:27:16.380] like deepest relationships you've had, right? [01:27:16.380 --> 01:27:17.260] - Oh, absolutely. [01:27:17.260 --> 01:27:20.420] I mean, companies turn resources [01:27:20.420 --> 01:27:22.740] into valuable products and services, right? [01:27:22.740 --> 01:27:25.140] Like what are the resources that we suck in? [01:27:25.140 --> 01:27:28.060] It's like, it's people and GPUs, you know? [01:27:28.060 --> 01:27:28.900] - And money. [01:27:28.900 --> 01:27:31.300] - And it's like, and well, we need the money [01:27:31.300 --> 01:27:34.220] to get those GPUs and the people, right? [01:27:34.220 --> 01:27:39.220] And like, the GPUs are, you know, reasonably, [01:27:39.220 --> 01:27:42.900] like here you can replace one with another, no worries. [01:27:42.900 --> 01:27:45.380] So it's actually the competitive advantage, [01:27:45.380 --> 01:27:47.780] the thing that makes you different is the people. [01:27:47.780 --> 01:27:52.780] So this is the most important thing [01:27:52.780 --> 01:27:57.860] for us to achieve our mission is to build this team, [01:27:57.860 --> 01:28:00.020] you know, to build this really special team. [01:28:00.020 --> 01:28:02.100] And I, you know, I think the way to do that [01:28:02.100 --> 01:28:04.820] and the way I've always built teams is to say, [01:28:04.820 --> 01:28:05.940] is to look at people and say like, [01:28:05.940 --> 01:28:09.580] okay, where is this person now [01:28:09.580 --> 01:28:12.540] and what would it have taken them to get there? [01:28:12.540 --> 01:28:15.860] You know, like, so if somebody is like, you know, [01:28:15.860 --> 01:28:18.940] was kicked out of high school, you know, [01:28:18.940 --> 01:28:22.420] because they were dyslexic or because somebody, like, [01:28:22.420 --> 01:28:24.100] grew up in the mountains of Bangladesh [01:28:24.100 --> 01:28:27.900] and didn't have a PC until they were 16 or, you know, [01:28:27.900 --> 01:28:32.900] somebody, you know, a woman who grew up [01:28:32.900 --> 01:28:35.780] in an environment where she had to fight against, [01:28:35.780 --> 01:28:38.300] like, institutionalized sexism or whatever.
[01:28:38.300 --> 01:28:39.980] It's like, these are the people to me, [01:28:39.980 --> 01:28:42.020] I just kind of go like, okay, [01:28:42.020 --> 01:28:46.300] this person's gone from like negative 43 up to 99. [01:28:46.300 --> 01:28:47.740] - Yes, overcome a lot. [01:28:47.740 --> 01:28:49.660] - That's a kick-ass amazing person, [01:28:49.660 --> 01:28:51.820] whereas somebody who's gone from like 98 to 99 [01:28:51.820 --> 01:28:53.500] is like, okay, it's cool. [01:28:53.500 --> 01:28:56.580] But they're probably not the people [01:28:56.580 --> 01:28:59.700] who are gonna like change the world. [01:28:59.700 --> 01:29:02.300] And so we want to be a small team [01:29:02.300 --> 01:29:03.820] where like literally every person in it [01:29:03.820 --> 01:29:06.780] is somebody who can change the world. [01:29:06.780 --> 01:29:09.060] And the nice thing is when you're in a small team like that, [01:29:09.060 --> 01:29:11.020] it's just really enjoyable [01:29:11.020 --> 01:29:16.020] because everybody's like just really great to be around, [01:29:16.020 --> 01:29:21.100] you know, really inspiring. [01:29:21.100 --> 01:29:22.900] And so, yeah, that's why we're kind of looking [01:29:22.900 --> 01:29:27.900] for these extremely special individuals. [01:29:27.900 --> 01:29:29.700] - Yeah, cool. [01:29:29.700 --> 01:29:32.340] So that's a hiring call explicitly, you know, [01:29:32.340 --> 01:29:34.340] if anyone's listening who fits that profile [01:29:34.340 --> 01:29:36.740] and really wants to work with you, [01:29:36.740 --> 01:29:38.020] they should reach out, right? [01:29:38.020 --> 01:29:38.860] - Yes, absolutely. [01:29:38.860 --> 01:29:41.340] - And now we have a website to send people to. [01:29:41.340 --> 01:29:46.700] So I was gonna wrap it up with just overall NeurIPS tips, [01:29:46.700 --> 01:29:48.860] right, like what is it like to be at NeurIPS this year [01:29:48.860 --> 01:29:50.340] if you've been here before? [01:29:50.340 --> 01:29:53.420] And also like, what's your best tip for doing NeurIPS right? [01:29:53.420 --> 01:29:55.500] Anyone can take it. [01:29:55.500 --> 01:29:56.700] - I guess I'll start. [01:29:56.700 --> 01:29:57.740] This is my second NeurIPS, [01:29:57.740 --> 01:29:59.900] so maybe I don't have a lot of experience with it, [01:29:59.900 --> 01:30:03.580] but I mean, I've been enjoying it a lot so far. [01:30:03.580 --> 01:30:06.740] For me, I think it's about networking with people [01:30:06.740 --> 01:30:07.880] and that's the best part of NeurIPS [01:30:07.880 --> 01:30:10.380] because at the end of the day, [01:30:10.380 --> 01:30:13.080] AI moves so fast that half of these papers [01:30:13.080 --> 01:30:14.580] are already kind of outdated. [01:30:14.580 --> 01:30:16.580] (laughing) [01:30:16.580 --> 01:30:18.340] Like, you know, we've already seen like-- [01:30:18.340 --> 01:30:19.540] - They were written months ago, right? [01:30:19.540 --> 01:30:20.380] - Yeah, yeah. [01:30:20.380 --> 01:30:21.220] - They were approved months ago, too. [01:30:21.220 --> 01:30:22.060] - In order to get here, they had to be reviewed. [01:30:22.060 --> 01:30:22.900] - Exactly. [01:30:22.900 --> 01:30:25.620] So, you know, we're already seeing the second version [01:30:25.620 --> 01:30:27.660] or the third version of a lot of these models already [01:30:27.660 --> 01:30:30.500] and, you know, so, I mean, it's, for me-- [01:30:30.500 --> 01:30:32.220] - So arXiv is all you need? [01:30:32.220 --> 01:30:34.180] - arXiv is all you need, I guess, yeah.
[01:30:34.180 --> 01:30:36.500] So for me, the value comes out of talking with people [01:30:36.500 --> 01:30:38.100] and meeting with people and networking [01:30:38.100 --> 01:30:40.020] and that's why we're coming to events like these [01:30:40.020 --> 01:30:42.980] that, to network and, you know, make these connections [01:30:42.980 --> 01:30:46.540] and, you know, I actually meet a lot of collaborators [01:30:46.540 --> 01:30:48.060] and other researchers at all these conferences. [01:30:48.060 --> 01:30:50.060] - And just to be clear, when you say networking, [01:30:50.060 --> 01:30:53.740] like, it's not like networking in that sense [01:30:53.740 --> 01:30:54.760] of like getting ahead. [01:30:54.760 --> 01:30:57.100] It's a kind of a really nerdy kind of networking. [01:30:57.100 --> 01:31:00.620] So like, earlier, Tanishq and I were at another reception [01:31:00.620 --> 01:31:03.100] where it's like, "Oh, there's Albert Gu. [01:31:03.100 --> 01:31:04.720] "He's the guy that like two days ago [01:31:04.720 --> 01:31:06.180] "released the Mamba paper." [01:31:06.180 --> 01:31:08.580] And we got to him and said like, "Oh, you know, [01:31:08.580 --> 01:31:10.720] "we had a conversation about state-space models [01:31:10.720 --> 01:31:12.320] "and why he's using that and what he thinks [01:31:12.320 --> 01:31:14.300] "the opportunities and limitations are [01:31:14.300 --> 01:31:15.740] "and is there still room for attention?" [01:31:15.740 --> 01:31:18.780] And like, so when we say networking, you know, [01:31:18.780 --> 01:31:21.740] we mean like geeking out on deep conversations [01:31:21.740 --> 01:31:24.140] about people's academic areas of interest. [01:31:24.140 --> 01:31:26.500] - Yeah, I always follow up the question of like, [01:31:26.500 --> 01:31:27.880] "Okay, like what's your name, where you work [01:31:27.880 --> 01:31:29.100] "and then what are your interests?" [01:31:29.100 --> 01:31:30.620] And then we try to go from there. [01:31:30.620 --> 01:31:33.740] - Yeah, just like what paper did you write last or? [01:31:33.740 --> 01:31:34.680] - You know, I will say one thing. [01:31:34.680 --> 01:31:36.780] So even though the posters, there are a bunch [01:31:36.780 --> 01:31:39.260] that truly you go by and even the people presenting [01:31:39.260 --> 01:31:40.500] are like, "Yeah, this is kind of out of date." [01:31:40.500 --> 01:31:43.460] The one hack that's really fun is a lot of those people [01:31:43.460 --> 01:31:45.580] are also already working on the next thing [01:31:45.580 --> 01:31:48.020] and they can give you sort of an early preview [01:31:48.020 --> 01:31:51.020] of something that actually is not on arXiv yet. [01:31:51.020 --> 01:31:53.140] And so that I actually have always, [01:31:53.140 --> 01:31:54.620] my favorite parts of the conference [01:31:54.620 --> 01:31:56.360] are actually just walking around the poster session, [01:31:56.360 --> 01:31:57.820] shaking hands with people who are presenting [01:31:57.820 --> 01:32:00.460] and learning about what they're most excited about, [01:32:00.460 --> 01:32:02.340] what they're working on, what are some of the new things. [01:32:02.340 --> 01:32:03.660] So I find that really fun. [01:32:03.660 --> 01:32:07.420] And also in my case, since I'm a VC, [01:32:07.420 --> 01:32:10.060] my best tip is throw an event with a lot of good wine [01:32:10.060 --> 01:32:12.700] and let the people come. [01:32:12.700 --> 01:32:14.700] - Yeah, excellent. [01:32:14.700 --> 01:32:16.300] Jeremy, you have any tips?
[01:32:16.300 --> 01:32:19.000] - I mean, like Tanishq, this is only my second NeurIPS. [01:32:19.000 --> 01:32:22.700] But I've been to quite a few conferences in general [01:32:22.700 --> 01:32:24.700] and my tip, number one tip for all conferences [01:32:24.700 --> 01:32:26.340] is don't go to any sessions. [01:32:26.340 --> 01:32:27.540] - Yeah, just stay outside and talk. [01:32:27.540 --> 01:32:31.140] - Like, whatever they're saying, they're saying it very, very slowly, [01:32:31.140 --> 01:32:32.700] and they're probably not an expert [01:32:32.700 --> 01:32:34.740] at verbal communication either. [01:32:34.740 --> 01:32:36.400] You can probably get the better version [01:32:36.400 --> 01:32:37.540] by just reading the damn paper [01:32:37.540 --> 01:32:38.860] that they're reading out to you. [01:32:38.860 --> 01:32:40.500] So don't bother with that. [01:32:40.500 --> 01:32:44.700] So like, yeah, hang outside in the hallway, [01:32:44.700 --> 01:32:46.860] look on the app to see who else is around [01:32:46.860 --> 01:32:50.180] and reach out to them and try and find a group [01:32:50.180 --> 01:32:53.500] of six or so interesting people to go and check out [01:32:53.500 --> 01:32:58.500] the local Louisiana sausage special outlet with, whatever. [01:32:58.500 --> 01:33:00.400] Yeah, that's-- [01:33:00.400 --> 01:33:01.240] - Reception hopping. [01:33:01.240 --> 01:33:02.300] - Yeah, reception hopping. [01:33:02.300 --> 01:33:03.780] This is our fourth reception tonight. [01:33:03.780 --> 01:33:05.060] - Oh my God. [01:33:05.060 --> 01:33:06.500] - Fourth and best, right, Jeremy? [01:33:06.500 --> 01:33:07.540] - Oh, fourth and best. [01:33:07.540 --> 01:33:09.700] This is why we came to this one last, [01:33:09.700 --> 01:33:13.540] so we can hang out here until the wine's finished. [01:33:13.540 --> 01:33:14.420] - So a lot of people hate [01:33:14.420 --> 01:33:17.300] on the official NeurIPS conference app, Whova, [01:33:17.300 --> 01:33:19.540] but I kind of like it because of one thing, [01:33:19.540 --> 01:33:20.820] people can organize their own meetups [01:33:20.820 --> 01:33:21.660] and list it here and-- [01:33:21.660 --> 01:33:22.940] - It's awesome, it's awesome. [01:33:22.940 --> 01:33:23.780] - It's actually really good. [01:33:23.780 --> 01:33:25.060] - Yeah, so I'm Brazilian [01:33:25.060 --> 01:33:27.340] and there's a Brazil, like, little chat. [01:33:27.340 --> 01:33:29.060] And it's so fun, everyone's talking in Portuguese, [01:33:29.060 --> 01:33:31.060] talking all the time, they're sharing all the things that, [01:33:31.060 --> 01:33:32.820] and these are people talking about, actually, [01:33:32.820 --> 01:33:35.780] like, interesting concepts in Portuguese. [01:33:35.780 --> 01:33:37.620] So it's actually really fun. [01:33:37.620 --> 01:33:38.460] I love the app. [01:33:38.460 --> 01:33:39.740] - And I didn't even know you were Brazilian, [01:33:39.740 --> 01:33:40.580] so I love it so much. [01:33:40.580 --> 01:33:41.400] - I am, yes. [01:33:41.400 --> 01:33:42.240] - Yeah, Leão, with the little squiggly. [01:33:42.240 --> 01:33:45.740] - Leão, yeah, my accent kind of, like, trips people. [01:33:45.740 --> 01:33:48.100] And it also trips people when I say something incorrectly [01:33:48.100 --> 01:33:50.700] and you can't really tell, but I'm, like, really Brazilian. [01:33:50.700 --> 01:33:53.620] - Yeah, well, we should do a steakhouse next time. [01:33:53.620 --> 01:33:54.460] - Oh, yes, please. [01:33:54.460 --> 01:33:56.180] - Yeah, that's one of those dinners.
[01:33:56.180 --> 01:33:57.020] - Done, done. [01:33:57.020 --> 01:33:57.860] - Churrascarias, right? [01:33:57.860 --> 01:33:59.060] - Yeah, churrascaria. [01:33:59.060 --> 01:34:00.100] - Exactly. [01:34:00.100 --> 01:34:01.660] My favorite was, there was a meetup [01:34:01.660 --> 01:34:03.460] for people who are interested in sushi. [01:34:03.460 --> 01:34:04.300] That was the meetup. [01:34:04.300 --> 01:34:05.140] - I love it, yeah. [01:34:05.140 --> 01:34:06.740] - There was, like, nothing machine learning about it. [01:34:06.740 --> 01:34:08.340] - So at ICML, it was really fun. [01:34:08.340 --> 01:34:09.580] There was one meetup that I went to [01:34:09.580 --> 01:34:11.380] that was just, like, swimming in the morning [01:34:11.380 --> 01:34:12.220] because it was in Hawaii. [01:34:12.220 --> 01:34:13.560] It was actually kind of awesome. [01:34:13.560 --> 01:34:15.400] And then people were, like, actually discussing, like, [01:34:15.400 --> 01:34:17.740] super legit topics in the ocean. [01:34:17.740 --> 01:34:19.300] - I'm actually kind of sad I missed out on ICML, [01:34:19.300 --> 01:34:22.060] but, like, it felt indulgent to go to Hawaii for that. [01:34:22.060 --> 01:34:22.900] - Yeah. [01:34:22.900 --> 01:34:24.980] - Okay, well, I just wanted to bring it to a close. [01:34:24.980 --> 01:34:26.700] The last thing I was gonna say is, [01:34:26.700 --> 01:34:27.740] Jeremy, I don't know if you know, [01:34:27.740 --> 01:34:32.440] I picked your meme as the best meme of November 2023. [01:34:32.440 --> 01:34:34.100] It was "laundry buddy." [01:34:34.100 --> 01:34:35.340] (all laughing) [01:34:35.340 --> 01:34:37.580] So, what's up with "laundry buddy"? [01:34:37.580 --> 01:34:38.660] Why do you hate it so much? [01:34:38.660 --> 01:34:39.500] What did it do to you? [01:34:39.500 --> 01:34:40.340] - No! [01:34:40.340 --> 01:34:41.860] (all laughing) [01:34:41.860 --> 01:34:43.300] No! [01:34:43.300 --> 01:34:44.860] It did nothing to me. [01:34:44.860 --> 01:34:46.140] - For people who are out of the loop, what did you do? [01:34:46.140 --> 01:34:48.060] - I couldn't have walked it back more. [01:34:48.060 --> 01:34:49.620] (all laughing) [01:34:49.620 --> 01:34:51.100] - Jeremy did walk it back on Twitter. [01:34:51.100 --> 01:34:54.080] - You really gonna make me revisit my shame? [01:34:54.080 --> 01:34:54.920] - I just think it's a fun story. [01:34:54.920 --> 01:34:57.620] - Okay, just for your show, I'm gonna revisit my shame. [01:34:57.620 --> 01:34:58.720] - Some people don't know, some people don't know. [01:34:58.720 --> 01:35:01.580] - I made a bold claim that "laundry buddy" [01:35:01.580 --> 01:35:06.580] was not the peak of open AI's path [01:35:06.580 --> 01:35:11.220] to societally beneficial artificial general intelligence. [01:35:11.220 --> 01:35:12.060] I was wrong. [01:35:12.060 --> 01:35:13.900] (all laughing) [01:35:13.900 --> 01:35:17.060] It is, in fact, very much on that path. [01:35:17.060 --> 01:35:21.940] It is well loved to be able to know [01:35:21.940 --> 01:35:24.740] that the world's best artificial intelligence [01:35:24.740 --> 01:35:26.820] can help you figure out how to sort out [01:35:26.820 --> 01:35:28.780] your whites and your colors, [01:35:28.780 --> 01:35:31.660] whether to use powder or pods, [01:35:31.660 --> 01:35:35.500] and what to do if you get a stain [01:35:35.500 --> 01:35:37.480] and you don't have laundry nearby. 
[01:35:37.480 --> 01:35:41.980] It's special, it's important, [01:35:41.980 --> 01:35:43.860] and it's a part of my life [01:35:43.860 --> 01:35:45.820] that I will never want to be without. [01:35:46.780 --> 01:35:49.180] - I love that the, so the ChatGPT app [01:35:49.180 --> 01:35:50.520] now has an official Twitter account, [01:35:50.520 --> 01:35:52.340] and they even got in on the "laundry buddy" meme, [01:35:52.340 --> 01:35:53.700] which is amazing to me. [01:35:53.700 --> 01:35:56.420] - I actually spent a couple of hours this morning [01:35:56.420 --> 01:35:59.140] hanging out with Boris Power from OpenAI, [01:35:59.140 --> 01:36:03.820] who was in there batting for "laundry buddy" from the start. [01:36:03.820 --> 01:36:04.900] (all laughing) [01:36:04.900 --> 01:36:07.700] - Wait, there's an anti and pro "laundry buddy"? [01:36:07.700 --> 01:36:08.540] - No, I mean, he was just [01:36:08.540 --> 01:36:11.140] a particularly strong enthusiast, right? [01:36:11.140 --> 01:36:14.640] He had the grace to not even bring it up, unlike you. [01:36:14.640 --> 01:36:16.260] (all laughing) [01:36:16.260 --> 01:36:18.100] - I had to, I had to, it was so funny. [01:36:18.100 --> 01:36:20.140] I cracked up so much, it was great. [01:36:20.140 --> 01:36:21.580] Well, thanks for chatting, [01:36:21.580 --> 01:36:23.700] and I'll return you back to your evenings. [01:36:23.700 --> 01:36:25.900] - May your clothes be well laundered. [01:36:25.900 --> 01:36:26.740] - Thanks for having us. [01:36:26.740 --> 01:36:27.580] - Cheers. - Thank you. [01:36:27.580 --> 01:36:29.140] - Thanks. [01:36:29.140 --> 01:36:30.260] - That was Jeremy Howard, [01:36:30.260 --> 01:36:32.820] together with Tanishq Abraham and Jess Leão. [01:36:32.820 --> 01:36:35.300] Tanishq and Jeremy recorded a podcast separately, [01:36:35.300 --> 01:36:36.660] so if you want to learn more about Tanishq, [01:36:36.660 --> 01:36:38.780] he's done long-form interviews in more detail [01:36:38.780 --> 01:36:41.460] than I can cover, because it's a lot of biomedical stuff, [01:36:41.460 --> 01:36:42.340] and that's one of the areas [01:36:42.340 --> 01:36:44.300] that we are not very knowledgeable on. [01:36:44.300 --> 01:36:47.460] And for Jess Leão, she was an investor in Mosaic, [01:36:47.460 --> 01:36:49.220] is one of the newest partners at Decibel, [01:36:49.220 --> 01:36:52.060] and led the round in Answer.ai. [01:36:52.060 --> 01:36:53.540] Next, we're going to go to some people [01:36:53.540 --> 01:36:56.100] on the show floor of the NeurIPS Expo. [01:36:56.100 --> 01:36:58.220] They're not people I had prior relationships with, [01:36:58.220 --> 01:37:00.100] but they're still doing interesting work nonetheless. [01:37:00.100 --> 01:37:02.580] And the first is, we're going to check in with Cerebras, [01:37:02.580 --> 01:37:05.620] which is not only producing giant, wafer-scale chips, [01:37:05.620 --> 01:37:07.940] but also publishing interesting research. [01:37:07.940 --> 01:37:10.460] So here's my conversation with Joel Hestness, [01:37:10.460 --> 01:37:13.040] Principal Research Scientist at Cerebras Systems. [01:37:13.040 --> 01:37:15.860] - That started working about a year ago. [01:37:15.860 --> 01:37:18.580] We started building out multi-box systems [01:37:18.580 --> 01:37:20.800] so that we could do cluster-level training, [01:37:20.800 --> 01:37:24.140] so larger-scale models, and so this last year, [01:37:24.140 --> 01:37:28.300] we've just been showing off what it's capable of.
[01:37:28.300 --> 01:37:29.980] So early this year, we started [01:37:29.980 --> 01:37:33.380] with our Cerebras-GPT models. [01:37:33.380 --> 01:37:35.900] Those showed compute-optimal scaling, [01:37:35.900 --> 01:37:40.460] so Chinchilla-style scaling, but open-source. [01:37:40.460 --> 01:37:42.920] All those models, we released open-source. [01:37:42.920 --> 01:37:46.720] Based on that work, we got the attention [01:37:46.720 --> 01:37:48.000] of a few different groups. [01:37:48.000 --> 01:37:50.060] One of them was the OpenTensor Foundation, [01:37:50.060 --> 01:37:51.800] and they came to us and said, [01:37:51.800 --> 01:37:54.900] hey, we want a great three-billion-parameter model, [01:37:54.900 --> 01:37:58.360] something that's easy to deploy, [01:37:58.360 --> 01:37:59.880] like on a laptop or something, [01:37:59.880 --> 01:38:03.400] and we wanted very general language capabilities, [01:38:03.400 --> 01:38:05.120] long sequence length. [01:38:05.120 --> 01:38:08.280] And so we trained the BTLM language model for that. [01:38:09.500 --> 01:38:12.800] Concurrently with that, we also had an engagement [01:38:12.800 --> 01:38:16.600] that started up with Group 42 in the United Arab Emirates, [01:38:16.600 --> 01:38:20.080] so that's this poster, Core42. [01:38:20.080 --> 01:38:24.860] They had interest in training large Arabic language models, [01:38:24.860 --> 01:38:27.420] so the first demos that we did for them [01:38:27.420 --> 01:38:29.920] were just Arabic models, but then they said, [01:38:29.920 --> 01:38:33.760] let's do multilingual Arabic and English. [01:38:33.760 --> 01:38:36.640] So we've been training the Jais 13 billion [01:38:36.640 --> 01:38:40.720] and 30 billion-parameter models this year. [01:38:40.720 --> 01:38:42.680] We've released both of those publicly. [01:38:42.680 --> 01:38:46.080] The first version of the 30 billion just came out, [01:38:46.080 --> 01:38:49.360] and the quality of that model in Arabic [01:38:49.360 --> 01:38:52.880] is better than any other public models currently, [01:38:52.880 --> 01:38:55.320] and then in English, it's competitive [01:38:55.320 --> 01:38:58.400] with models like Falcon 40B. [01:38:58.400 --> 01:39:01.740] So we're on a good track there. [01:39:01.740 --> 01:39:04.840] More releases to come through Core42. [01:39:04.840 --> 01:39:07.120] We're excited to have that be open-source [01:39:07.120 --> 01:39:09.760] and to contribute to the community there. [01:39:09.760 --> 01:39:14.080] - Yeah, anecdotally, since we're already chatting, [01:39:14.080 --> 01:39:15.720] so we might as well keep going, [01:39:15.720 --> 01:39:20.720] but the UAE also notably has the Falcon or TII Institute. [01:39:20.720 --> 01:39:23.440] Are they related, are they competing with each other? [01:39:23.440 --> 01:39:24.280] What's going on? [01:39:24.280 --> 01:39:26.360] - Initially, there was a little bit of competition.
[01:39:26.360 --> 01:39:30.360] They're funded by different people, different groups, [01:39:30.360 --> 01:39:33.360] but there is a countrywide effort going on [01:39:33.360 --> 01:39:34.840] in the United Arab Emirates [01:39:34.840 --> 01:39:38.480] to consolidate a lot of their AI efforts, [01:39:38.480 --> 01:39:41.440] and so that's why we're seeing very impressive [01:39:41.440 --> 01:39:45.560] and good pushes towards let's make it open, [01:39:45.560 --> 01:39:47.800] let's collaborate some more, [01:39:47.800 --> 01:39:49.720] and so there might be opportunities in the future [01:39:49.720 --> 01:39:52.680] for us to coordinate directly with TII, [01:39:52.680 --> 01:39:55.360] and we have looked at things like their data sets, [01:39:55.360 --> 01:39:58.280] like RefinedWeb, so there has been some exchange so far. [01:39:58.280 --> 01:40:00.920] - Yeah, with the macrodata refinements process [01:40:00.920 --> 01:40:01.760] that I don't know if you know. [01:40:01.760 --> 01:40:02.760] - Yes. - It was a reference [01:40:02.760 --> 01:40:03.900] to an Apple TV show. [01:40:03.900 --> 01:40:06.560] - Okay, Severance, anyway. - Interesting. [01:40:06.560 --> 01:40:08.640] - It's my fun fact. [01:40:08.640 --> 01:40:10.000] A little bit editor's note. [01:40:10.000 --> 01:40:13.080] The TII Institute people were actually there at NeurIPS [01:40:13.080 --> 01:40:14.640] presenting a poster on RefinedWeb, [01:40:14.640 --> 01:40:17.600] the data set that they did for Falcon 180B and 40B, [01:40:17.600 --> 01:40:19.480] so I asked them about the name. [01:40:19.480 --> 01:40:21.040] - My last question is about the name. [01:40:21.040 --> 01:40:23.000] - Is it from Apple, is it from Severance? [01:40:23.000 --> 01:40:24.340] - Yes. (laughs) [01:40:24.340 --> 01:40:25.800] - So what's the story, what's the? [01:40:25.800 --> 01:40:27.840] - No, it's just like, basically in the end, [01:40:27.840 --> 01:40:30.320] we had someone look at the data every now and then, [01:40:30.320 --> 01:40:31.760] like go through the thing, [01:40:31.760 --> 01:40:33.880] and that's like looking at the scary numbers. [01:40:33.880 --> 01:40:35.440] So, you know, this was the macrodata refinements. [01:40:35.440 --> 01:40:37.040] - You know, nobody comments about this. [01:40:37.040 --> 01:40:37.880] - I know. [01:40:37.880 --> 01:40:39.520] - I was like, wait, I saw this in Severance. [01:40:39.520 --> 01:40:40.560] - Yeah, I know. [01:40:40.560 --> 01:40:42.480] - Right, like, I was like, this is a good joke, [01:40:42.480 --> 01:40:44.520] 'cause it's exactly what you do when you do filtering. [01:40:44.520 --> 01:40:45.660] - Exactly. [01:40:45.660 --> 01:40:47.320] - If you haven't seen Severance, it's a great show, [01:40:47.320 --> 01:40:49.360] it's on Apple TV, great watch for the holidays, [01:40:49.360 --> 01:40:51.760] pretty short, and it's interesting. [01:40:51.760 --> 01:40:54.200] I guess you can call it AI-related now. [01:40:54.200 --> 01:40:56.640] - But it's cool that, well, so one of the things [01:40:56.640 --> 01:40:58.400] I often get asked about, 'cause we have listeners [01:40:58.400 --> 01:41:00.640] in a lot of different countries, [01:41:00.640 --> 01:41:03.280] should every country have their own model, you know? 
[01:41:03.280 --> 01:41:04.880] - I think this is a really tough question, [01:41:04.880 --> 01:41:09.060] because the volume of data in different languages [01:41:09.060 --> 01:41:12.480] is power-law, Zipf's-law distributed, [01:41:12.480 --> 01:41:16.440] so the number of low-resource languages is massive. [01:41:16.440 --> 01:41:20.360] We're talking over 100 languages that are low-resource. [01:41:20.360 --> 01:41:24.280] You just have too few tokens to do a lot with [01:41:24.280 --> 01:41:26.160] in the language modeling context, [01:41:26.160 --> 01:41:28.060] so it's much harder to deal with those. [01:41:28.060 --> 01:41:31.280] Now there, we've actually seen a few different techniques [01:41:31.280 --> 01:41:35.180] at NeurIPS that are targeting those sorts of settings, [01:41:35.180 --> 01:41:39.340] and they're doing things like train a base language model [01:41:39.340 --> 01:41:42.840] in English, and then do a transfer process [01:41:42.840 --> 01:41:45.400] where you co-train with both languages. [01:41:45.400 --> 01:41:46.760] - That makes a lot of sense. [01:41:46.760 --> 01:41:48.240] - It makes a lot of sense. [01:41:48.240 --> 01:41:51.280] In that setting, you wanna get the knowledge representation [01:41:51.280 --> 01:41:54.120] from one language, and then try to adapt the style-- [01:41:54.120 --> 01:41:56.600] - Grammar. - Grammar, syntax, I guess, [01:41:56.600 --> 01:41:58.840] the easier part. - Yeah. [01:41:58.840 --> 01:42:02.760] - Arabic is a sort of medium-resource language. [01:42:02.760 --> 01:42:04.160] There, I think it makes more sense [01:42:04.160 --> 01:42:07.040] to try to mix two languages if you wanna do multilingual, [01:42:07.040 --> 01:42:10.280] and then it helps you do things like translation. [01:42:10.280 --> 01:42:12.800] And then higher-resource languages, [01:42:12.800 --> 01:42:15.880] so if you're talking European languages, [01:42:15.880 --> 01:42:20.480] French, Spanish, German, those I think you can do [01:42:20.480 --> 01:42:24.400] probably from scratch in those languages, [01:42:24.400 --> 01:42:28.740] and probably pretty easy to do multilinguality also. [01:42:28.740 --> 01:42:29.580] - Yeah. [01:42:29.580 --> 01:42:32.560] - So, yeah, it's definitely a very interesting [01:42:32.560 --> 01:42:35.980] open direction we're pushing for. [01:42:35.980 --> 01:42:38.760] In fact, I'd maybe reference-- [01:42:38.760 --> 01:42:40.160] - The workshop. - We have a multilingual [01:42:40.160 --> 01:42:44.960] workshop on Friday where we've invited a bunch of groups [01:42:44.960 --> 01:42:47.640] to come and give talks about their experiences [01:42:47.640 --> 01:42:50.360] with training different language models. [01:42:50.360 --> 01:42:52.080] - Cool, well, people can check out the authors. [01:42:52.080 --> 01:42:55.160] I'm sure this is published and findable online. [01:42:55.160 --> 01:42:56.100] - Yes. [01:42:56.100 --> 01:43:00.440] - Cool, so we should probably get to intros a little bit. [01:43:00.440 --> 01:43:02.400] I mean, we're already recording. [01:43:02.400 --> 01:43:03.400] Who are you and what do you work on, [01:43:03.400 --> 01:43:04.480] and what does your team work on? [01:43:04.480 --> 01:43:05.760] - So my name's Joel Hestness. [01:43:05.760 --> 01:43:08.900] I'm a principal research scientist at Cerebras Systems, [01:43:08.900 --> 01:43:13.360] and I'm the lead of our core machine learning group.
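[Editor's note: a minimal sketch of the three regimes Joel describes, written as hypothetical token-mixture schedules. The phase names and the mixture fractions are illustrative assumptions, not Cerebras' actual recipes.]

    def mixture_schedule(resource_level):
        """Return (phase_name, {language: fraction_of_tokens}) pairs for a training run."""
        if resource_level == "low":
            # Too few target-language tokens: pre-train on English, then co-train so the
            # model transfers its knowledge and mostly has to adapt grammar and style.
            return [("pretrain", {"en": 1.0}), ("co-train", {"en": 0.7, "target": 0.3})]
        if resource_level == "medium":
            # e.g. Arabic: mix both languages from the start (also helps translation).
            return [("pretrain", {"en": 0.5, "target": 0.5})]
        if resource_level == "high":
            # e.g. French/Spanish/German: enough tokens to train from scratch.
            return [("pretrain", {"target": 1.0})]
        raise ValueError(f"unknown resource level: {resource_level}")

    for level in ("low", "medium", "high"):
        print(level, mixture_schedule(level))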
[01:43:13.360 --> 01:43:15.000] - So I've helped us bring up [01:43:15.000 --> 01:43:17.700] our foundation language models first, [01:43:17.700 --> 01:43:20.120] and helped kind of set some of the direction [01:43:20.120 --> 01:43:21.880] for expanding outward from there. [01:43:21.880 --> 01:43:24.440] So we started by expanding out a lot [01:43:24.440 --> 01:43:29.220] on the common language functionality, [01:43:29.220 --> 01:43:32.160] and now we're expanding into other places [01:43:32.160 --> 01:43:34.900] where transformer models can be used, [01:43:34.900 --> 01:43:39.120] so targeting things like multimodal [01:43:39.120 --> 01:43:41.480] and other workloads that are similar. [01:43:41.480 --> 01:43:42.320] - Okay. [01:43:42.320 --> 01:43:45.840] - So a lot of our effort has been bringing this up [01:43:45.840 --> 01:43:49.120] and coordinating with the broader Cerebras organization [01:43:49.120 --> 01:43:52.700] to lower these applications down, [01:43:52.700 --> 01:43:56.360] get them compiled to run at efficiency on our hardware. [01:43:56.360 --> 01:43:58.880] So there's been a lot of performance optimization, [01:43:58.880 --> 01:44:00.480] making sure numerics are correct [01:44:00.480 --> 01:44:02.600] for training large models, [01:44:02.600 --> 01:44:05.720] making sure things train stably, things like that. [01:44:05.720 --> 01:44:06.560] - Yeah. [01:44:06.560 --> 01:44:09.760] - So yeah, we're focusing on scaling out right now, [01:44:09.760 --> 01:44:12.120] getting much larger clusters. [01:44:12.120 --> 01:44:15.120] We've sold a couple already, and-- [01:44:15.120 --> 01:44:15.960] - To G42. [01:44:15.960 --> 01:44:20.320] - To G42, and yeah, exciting things to come there, I think. [01:44:20.320 --> 01:44:21.160] - Exciting things to come. [01:44:21.160 --> 01:44:23.160] So we're gonna cover some of the other posters [01:44:23.160 --> 01:44:25.880] that you have here, but one thing I guess I, [01:44:25.880 --> 01:44:29.840] people are very unfamiliar with anything but NVIDIA. [01:44:29.840 --> 01:44:33.040] What should people know when working with a Cerebras chip? [01:44:33.040 --> 01:44:35.800] - Sure, yeah, I think maybe people might be familiar [01:44:35.800 --> 01:44:37.320] with our wafer. [01:44:37.320 --> 01:44:38.160] - Yeah. [01:44:38.160 --> 01:44:40.920] - So Cerebras uses a full wafer for our processor [01:44:40.920 --> 01:44:43.620] instead of cutting the wafer apart into pieces. [01:44:44.480 --> 01:44:46.400] If you cut it apart, you end up packaging it [01:44:46.400 --> 01:44:48.000] into a bunch of different cards, [01:44:48.000 --> 01:44:49.560] and then you package those into a box. [01:44:49.560 --> 01:44:50.400] - Then you have to network them, yeah. [01:44:50.400 --> 01:44:52.040] - And then you have to network them all together [01:44:52.040 --> 01:44:54.200] with a bunch of extra software. [01:44:54.200 --> 01:44:56.960] That's very complicated for large-scale applications, [01:44:56.960 --> 01:44:58.680] and so instead of doing that, [01:44:58.680 --> 01:45:01.080] we leave it together on a single wafer. [01:45:01.080 --> 01:45:01.920] - Got it. [01:45:01.920 --> 01:45:04.360] - That single wafer goes in a single big box [01:45:04.360 --> 01:45:07.560] that's, the performance is roughly equivalent, [01:45:07.560 --> 01:45:12.560] our CS-2 box is roughly equivalent to maybe 20 A100 GPUs, [01:45:13.160 --> 01:45:16.940] and you can program it like running on a single GPU, [01:45:16.940 --> 01:45:19.000] so it's just much easier to use.
[01:45:19.000 --> 01:45:21.760] - Nice, and is it cost-effective as well? [01:45:21.760 --> 01:45:23.720] I assume it is, 'cause you're saving [01:45:23.720 --> 01:45:24.540] a whole bunch of overhead. [01:45:24.540 --> 01:45:27.760] - Right, so we aim, so the manufacturing process, [01:45:27.760 --> 01:45:30.200] it has a lot lower cost because we don't have to deal [01:45:30.200 --> 01:45:35.200] with as many moving parts, fewer points of failure, [01:45:35.200 --> 01:45:38.680] reliability is quite good, and we try to, [01:45:38.680 --> 01:45:43.680] we aim to be price-performance comparable to GPU systems. [01:45:43.680 --> 01:45:47.600] - Cool, awesome, that's the hardware stuff. [01:45:47.600 --> 01:45:49.760] We're also gonna talk about the streaming things in a bit, [01:45:49.760 --> 01:45:52.280] but yeah, I'd love to, whatever you wanna pick next [01:45:52.280 --> 01:45:54.520] from your work this year. [01:45:54.520 --> 01:45:58.240] - Just give an overview of some of our research directions. [01:45:58.240 --> 01:46:02.640] So our hardware is, it has native support [01:46:02.640 --> 01:46:05.220] for completely unstructured sparsity. [01:46:06.600 --> 01:46:09.280] What that means is we can send in, [01:46:09.280 --> 01:46:11.480] say, if we're using the weight streaming mode, [01:46:11.480 --> 01:46:14.120] which I mentioned, a weight that comes in, [01:46:14.120 --> 01:46:17.360] we can do a vector multiply with some activations, [01:46:17.360 --> 01:46:20.400] so you can use that in your matrix multiplies [01:46:20.400 --> 01:46:24.160] on the wafer, but you can do that on a per-weight basis. [01:46:24.160 --> 01:46:25.360] - You don't need to load the whole thing at once. [01:46:25.360 --> 01:46:26.680] - You don't need to load the whole thing [01:46:26.680 --> 01:46:29.880] to do matrix multiply, so what that means [01:46:29.880 --> 01:46:31.920] is we can do unstructured sparsity, [01:46:31.920 --> 01:46:34.140] just send in the weights that you actually wanna use [01:46:34.140 --> 01:46:36.320] in the matrix multiply, and you can get [01:46:36.320 --> 01:46:37.960] a sparse matrix multiply. [01:46:37.960 --> 01:46:40.240] - Isn't the classic argument [01:46:40.240 --> 01:46:41.680] against that kind of sparsity [01:46:41.680 --> 01:46:43.200] that the decision actually takes longer [01:46:43.200 --> 01:46:45.100] than just doing the math anyway? [01:46:45.100 --> 01:46:49.200] Like the branching, the sort of Turing-complete branching. [01:46:49.200 --> 01:46:52.600] - That's a, yeah, so part of the approach [01:46:52.600 --> 01:46:56.280] that we're using is a weight sparse approach, [01:46:56.280 --> 01:47:00.760] which means the sparsity is in the model itself, [01:47:00.760 --> 01:47:02.380] and so then while you're training that, [01:47:02.380 --> 01:47:04.420] you'd prefer those weights to be [01:47:04.420 --> 01:47:07.000] the same sparsity structure for a while. [01:47:07.000 --> 01:47:07.840] - Okay. [01:47:07.840 --> 01:47:08.800] - So there are techniques that train-- [01:47:08.800 --> 01:47:10.920] - Some kind of constraints, some regularization thing. [01:47:10.920 --> 01:47:14.500] - Right, yeah, so the early works in this [01:47:14.500 --> 01:47:16.600] are things like the lottery ticket hypothesis, [01:47:16.600 --> 01:47:19.320] where you'd find the, yeah, chatting-- [01:47:19.320 --> 01:47:22.080] - Jonathan Frankle's like 10 feet from us.
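[Editor's note: a minimal sketch of the unstructured weight sparsity Joel is describing, emulated in NumPy. On most hardware this is just a masked dense matmul; the point of streaming weights one at a time is that the zeroed weights never need to be sent or multiplied at all. The shapes and the 75% sparsity level are assumptions for illustration.]

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_out, sparsity = 512, 512, 0.75

    W = rng.standard_normal((d_out, d_in)).astype(np.float32)
    mask = rng.random(W.shape) > sparsity       # keep roughly 25% of weights
    W_sparse = W * mask                         # unstructured: no block or row pattern

    x = rng.standard_normal((d_in,)).astype(np.float32)

    dense_result = W_sparse @ x                 # what the math defines

    # "Stream only the surviving weights": iterate over the nonzeros explicitly.
    rows, cols = np.nonzero(W_sparse)
    streamed = np.zeros(d_out, dtype=np.float32)
    for r, c, w in zip(rows, cols, W_sparse[rows, cols]):
        streamed[r] += w * x[c]

    assert np.allclose(dense_result, streamed, atol=1e-4)
    print(f"nonzero weights used: {len(rows)} / {W.size}")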
[01:47:22.080 --> 01:47:26.180] - And there you find the mask [01:47:26.180 --> 01:47:28.040] by doing some heavy-duty training, [01:47:28.040 --> 01:47:31.440] and then you rewind and retrain the model from scratch. [01:47:31.440 --> 01:47:33.940] Now that's static sparse, so that you have [01:47:33.940 --> 01:47:36.360] the same weight sparsity all the way throughout. [01:47:36.360 --> 01:47:38.420] That works great on our hardware. [01:47:38.420 --> 01:47:41.580] We have, however, added a bunch of new functionality [01:47:41.580 --> 01:47:43.700] that's sort of beta in our recent release [01:47:43.700 --> 01:47:46.260] that allows you to change the sparsity throughout training, [01:47:46.260 --> 01:47:49.600] and so that's something that's being used [01:47:49.600 --> 01:47:52.880] in recent research works, like the Rigging [01:47:52.880 --> 01:47:56.620] the Lottery work, so RigL, [01:47:56.620 --> 01:47:59.060] and then another one called SET, [01:48:00.260 --> 01:48:05.020] a different approach to deciding how to change the sparsity, [01:48:05.020 --> 01:48:08.220] but those updates happen infrequently enough [01:48:08.220 --> 01:48:11.500] that it doesn't harm the performance on our hardware. [01:48:11.500 --> 01:48:13.260] - That's cool, awesome. [01:48:13.260 --> 01:48:18.260] So this is, Sparse-IFT is the paper that you published. [01:48:18.260 --> 01:48:22.300] - Yes, so our Sparse-IFT work looks at different ways [01:48:22.300 --> 01:48:26.280] that you can swap out layers for sparse versions [01:48:26.280 --> 01:48:29.620] using the same FLOPs that might be able to get you [01:48:29.620 --> 01:48:31.420] better representation capability. [01:48:31.420 --> 01:48:35.580] So if you have pressure in your representation [01:48:35.580 --> 01:48:37.860] that's in your activations, for instance, [01:48:37.860 --> 01:48:39.900] let's widen the layer and sparsify it [01:48:39.900 --> 01:48:41.740] to give the model more activations. [01:48:41.740 --> 01:48:43.860] You can store more in those activations. [01:48:43.860 --> 01:48:47.180] Those end up staying dense. [01:48:47.180 --> 01:48:50.540] So our results here show that we can get something [01:48:50.540 --> 01:48:53.460] like a two to three X performance improvement [01:48:53.460 --> 01:48:57.100] at 75% sparse, or you could flip it around [01:48:57.100 --> 01:49:00.780] and you can get, for the same FLOPs, [01:49:00.780 --> 01:49:04.100] a better model by sometimes three to five percent. [01:49:04.100 --> 01:49:05.820] - That's probably, budget-wise, [01:49:05.820 --> 01:49:07.860] I guess you're choosing between pre-training and inference [01:49:07.860 --> 01:49:10.460] just like many people, like what you're optimizing for. [01:49:10.460 --> 01:49:12.500] - Yes. - That's great, awesome. [01:49:12.500 --> 01:49:14.420] And what else are you leading? [01:49:14.420 --> 01:49:19.380] - So I'm also working on some of the pre-training efforts [01:49:19.380 --> 01:49:22.420] that we're doing that look at things like gradient noise [01:49:22.420 --> 01:49:25.460] to estimate good batch sizing and make sure [01:49:25.460 --> 01:49:28.140] that we're making efficient use of the compute. [01:49:28.140 --> 01:49:32.500] So there are techniques, so we have a poster, [01:49:32.500 --> 01:49:34.820] the Efficient and Approximate Per-Example [01:49:34.820 --> 01:49:37.180] Gradient Norms paper. - Oh my god. [01:49:37.180 --> 01:49:40.140] Per example? - This is, yes.
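[Editor's note: a sketch of the iso-FLOP bookkeeping behind the "widen the layer and sparsify it" idea above. This is illustrative arithmetic under the assumption that both dimensions of a square weight matrix are widened; it is not necessarily the exact set of transformations used in the Sparse-IFT paper. If you widen both dimensions by k and keep only a (1 - s) fraction of the weights, FLOPs scale by k^2 * (1 - s), so holding FLOPs fixed gives k = 1 / sqrt(1 - s).]

    import math

    d_in = d_out = 1024
    dense_flops = 2 * d_in * d_out            # multiply-accumulates for one matmul

    for s in (0.5, 0.75, 0.9):
        k = 1 / math.sqrt(1 - s)              # iso-FLOP widening factor
        sparse_flops = 2 * (k * d_in) * (k * d_out) * (1 - s)
        print(f"sparsity={s:.2f}  widen x{k:.2f}  flops ratio={sparse_flops / dense_flops:.3f}")

    # Every ratio prints ~1.0: same compute budget, but a wider, higher-capacity layer.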
[01:49:40.140 --> 01:49:43.180] So this is at the, we have this published [01:49:43.180 --> 01:49:45.740] at the WANT workshop at NeurIPS, [01:49:45.740 --> 01:49:50.740] and the basic idea is gradient norm calculations are, [01:49:50.740 --> 01:49:53.420] typically if you wanted to do [01:49:53.420 --> 01:49:55.420] the gradient norm calculation, [01:49:55.420 --> 01:49:57.740] you'd wanna aggregate all the gradients together [01:49:57.740 --> 01:49:59.740] and then calculate the norm. [01:49:59.740 --> 01:50:01.460] And you do that over your batch. [01:50:01.460 --> 01:50:04.060] So that's, it's helpful if you wanna measure [01:50:04.060 --> 01:50:06.100] some training dynamics, but if you wanna look [01:50:06.100 --> 01:50:08.580] at something like critical batch size [01:50:08.580 --> 01:50:11.580] to understand how well is my model training [01:50:11.580 --> 01:50:14.100] in terms of efficiency, you actually want [01:50:14.100 --> 01:50:17.020] to have sub-batches, you wanna understand [01:50:17.020 --> 01:50:19.260] the grad norms of the sub-batches also. [01:50:19.260 --> 01:50:21.940] You use that and then the large batch grad norm, [01:50:21.940 --> 01:50:24.500] you can calculate noise statistics. [01:50:24.500 --> 01:50:26.300] Like signal to noise maybe. - Yeah. [01:50:26.300 --> 01:50:29.980] - If you use this technique that was defined [01:50:29.980 --> 01:50:32.900] by one of my teammates, Gavia, [01:50:32.900 --> 01:50:36.620] we can do an approximation that allows us [01:50:36.620 --> 01:50:40.440] to take some, run some statistics over activations [01:50:40.440 --> 01:50:44.580] and run some statistics over the delta gradient [01:50:44.580 --> 01:50:47.500] values coming back, and then you can take [01:50:47.500 --> 01:50:50.300] a dot product, an element-wise product of those, [01:50:50.300 --> 01:50:52.060] now it's much more compute efficient, [01:50:52.060 --> 01:50:55.540] to calculate for each example, this is an approximation [01:50:55.540 --> 01:50:58.960] of the grad norm for that sample. [01:50:58.960 --> 01:51:02.140] And then you can arbitrarily kind of combine [01:51:02.140 --> 01:51:06.540] those back together to get estimates of gradient noise. [01:51:06.540 --> 01:51:09.220] - Okay. [01:51:09.220 --> 01:51:12.740] - So this is something where we improved the, [01:51:12.740 --> 01:51:17.340] we improved the compute requirements. [01:51:17.340 --> 01:51:20.820] We use this in a few different contexts currently, [01:51:20.820 --> 01:51:25.340] but it improves the compute requirement for this [01:51:25.340 --> 01:51:28.400] from, for high dimensional tensors, [01:51:28.400 --> 01:51:31.620] from the dimension of the tensor down to linear, [01:51:31.620 --> 01:51:33.080] linear time computation. [01:51:33.080 --> 01:51:34.060] - Nice. [01:51:34.060 --> 01:51:35.720] And do you, is there like, [01:51:35.720 --> 01:51:39.940] I forget what this, what is this called, [01:51:39.940 --> 01:51:42.000] it's kind of like an annealing curve or something [01:51:42.000 --> 01:51:44.600] where you use this technique at the start [01:51:44.600 --> 01:51:47.660] to initialize and then eventually you sort of [01:51:47.660 --> 01:51:49.500] wean yourself off it? [01:51:49.500 --> 01:51:53.260] - So if you, so this is something you do wanna track [01:51:53.260 --> 01:51:54.100] throughout training. [01:51:54.100 --> 01:51:54.920] - Yeah.
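[Editor's note: a minimal sketch of the classic per-example gradient-norm shortcut for a single linear layer, which appears to be the flavor of approximation being described; the exact method in the WANT workshop paper may differ. For y = W a, the per-example weight gradient is the outer product of the output gradient and the activation, so its squared norm is just the product of the two vector norms, with no need to materialize a separate copy of W's gradient per sample.]

    import numpy as np

    rng = np.random.default_rng(0)
    B, d_in, d_out = 8, 64, 32
    a = rng.standard_normal((B, d_in))        # activations flowing into the layer
    delta = rng.standard_normal((B, d_out))   # gradients flowing back out of the layer

    # Cheap route: two row-norms and a product per example (linear in layer size).
    cheap = (delta ** 2).sum(axis=1) * (a ** 2).sum(axis=1)

    # Expensive route for comparison: build each per-example weight gradient explicitly.
    expensive = np.array([np.sum(np.outer(delta[i], a[i]) ** 2) for i in range(B)])

    assert np.allclose(cheap, expensive)
    print("per-example squared grad norms:", np.round(cheap[:4], 2))

These per-example (or per-sub-batch) norms, combined with the full-batch gradient norm, are what feed the gradient-noise and critical-batch-size estimates Joel mentions.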
[01:51:54.920 --> 01:51:58.220] - Especially if you're doing like phase training [01:51:58.220 --> 01:52:00.540] or if you're changing the data distribution or something, [01:52:00.540 --> 01:52:02.460] it's really helpful to have these statistics [01:52:02.460 --> 01:52:06.580] to decide, am I using an appropriate batch size [01:52:06.580 --> 01:52:09.660] that I'm getting good generalization with the new data? [01:52:09.660 --> 01:52:12.420] It helps you set learning rates and things. [01:52:12.420 --> 01:52:16.160] So this is something you'd wanna track throughout training. [01:52:16.160 --> 01:52:20.700] It gives you an estimate of how big the batch size could be. [01:52:20.700 --> 01:52:22.100] - Yeah, excellent. [01:52:22.100 --> 01:52:22.940] Very cool. [01:52:22.940 --> 01:52:27.620] Any, one more? [01:52:27.620 --> 01:52:33.060] - Sure, so then given that we have a sparse accelerator, [01:52:33.060 --> 01:52:35.400] we're also looking at applications [01:52:35.400 --> 01:52:37.740] where you can deploy sparse models. [01:52:37.740 --> 01:52:41.560] And part of our work is figuring out [01:52:41.560 --> 01:52:43.380] how to find those sparse models [01:52:43.380 --> 01:52:45.660] that you'd use in a deployment setting. [01:52:45.660 --> 01:52:49.340] And so we have other work that's related [01:52:49.340 --> 01:52:54.140] to the SparseGPT work that's been recently released, [01:52:54.140 --> 01:52:59.140] where we do some pruning after dense pre-training [01:52:59.140 --> 01:53:03.500] and we do some retraining to get the capabilities [01:53:03.500 --> 01:53:06.640] of the model back up before you would put it in deployment. [01:53:06.640 --> 01:53:07.860] - How much of it can you get back? [01:53:07.860 --> 01:53:09.620] - Actually, I'm not totally familiar. [01:53:09.620 --> 01:53:12.260] This is work from my team members. [01:53:12.260 --> 01:53:16.120] I know we can do, so for large, very large language models [01:53:16.120 --> 01:53:19.780] that have not been trained on a huge number of tokens, [01:53:19.780 --> 01:53:23.080] you can do easily upwards of 50% sparsity [01:53:23.080 --> 01:53:28.080] and fully recover the upstream losses from this retraining. [01:53:28.080 --> 01:53:33.360] So this is a really big next step challenge [01:53:33.360 --> 01:53:36.600] for a lot of the organizations that we work with. [01:53:36.600 --> 01:53:38.720] They're interested in, now I have, [01:53:38.720 --> 01:53:41.000] they're able to pre-train a very large model [01:53:41.000 --> 01:53:42.440] with the hardware, now they're interested [01:53:42.440 --> 01:53:45.940] in figuring out how to deploy it in an efficient manner. [01:53:45.940 --> 01:53:48.440] So we're working with a few different groups on this. [01:53:48.440 --> 01:53:52.160] So we're working with Qualcomm [01:53:52.160 --> 01:53:55.360] and another group called Neural Magic [01:53:55.360 --> 01:53:58.420] that does inference for these large models. [01:53:58.420 --> 01:54:00.600] - Yeah, amazing. [01:54:00.600 --> 01:54:04.480] I was gonna ask if you need the same dataset to retrain, [01:54:04.480 --> 01:54:06.140] but it looks like you train on the Pile. [01:54:06.140 --> 01:54:08.160] So I guess that's a no. [01:54:08.160 --> 01:54:10.160] - Yes, you can actually shift here. [01:54:10.160 --> 01:54:14.120] Obviously, different data distribution [01:54:14.120 --> 01:54:16.220] means you have to be a little bit careful [01:54:16.220 --> 01:54:17.760] about how you do the retraining.
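[Editor's note: a minimal sketch of the prune-then-retrain recipe being described, with plain magnitude pruning standing in for SparseGPT-style pruning and a toy PyTorch model and random data standing in for a real LLM.]

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
    x, y = torch.randn(512, 128), torch.randint(0, 10, (512,))
    loss_fn = nn.CrossEntropyLoss()

    # 1) Prune: zero the smallest 50% of weights in each Linear, remember the masks.
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            threshold = w.abs().flatten().kthvalue(w.numel() // 2).values
            masks[name] = (w.abs() > threshold).float()
            w.mul_(masks[name])

    # 2) Retrain briefly to recover quality, re-applying the masks after each step
    #    so the sparsity pattern stays fixed.
    opt = torch.optim.SGD(model.parameters(), lr=0.05)
    for step in range(100):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear):
                module.weight.data.mul_(masks[name])

    print(f"final loss: {loss.item():.3f}, ~50% of weights per layer still zero")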
[01:54:17.760 --> 01:54:19.800] So I think there are a few different things [01:54:19.800 --> 01:54:23.240] we've learned about different learning rate warmups, [01:54:23.240 --> 01:54:26.120] different learning rate levels, I guess, [01:54:26.120 --> 01:54:29.000] because if you're doing a big distribution shift, [01:54:29.000 --> 01:54:31.600] you wanna allow the model to shift a little bit, [01:54:31.600 --> 01:54:34.160] and so you want a slightly higher learning rate. [01:54:34.160 --> 01:54:36.380] - But like, for example, you pruned Llama 2, [01:54:36.380 --> 01:54:39.100] and we don't know what the original dataset was. [01:54:39.100 --> 01:54:40.900] - Yeah, I mean, well, so we kind of know [01:54:40.900 --> 01:54:43.180] that Llama 2 is a little bit similar [01:54:43.180 --> 01:54:46.500] to something like SlimPajama and Llama 1, [01:54:46.500 --> 01:54:49.560] but yeah, it is definitely a different dataset. [01:54:49.560 --> 01:54:53.140] We do know that the Pile and SlimPajama [01:54:53.140 --> 01:54:55.840] have a fair bit of overlap in some things, [01:54:55.840 --> 01:54:59.120] but it is definitely a different distribution, yeah. [01:55:00.740 --> 01:55:05.040] - So this is a lot of work that our Applied ML team, [01:55:05.040 --> 01:55:07.320] our Applied ML team is working on. [01:55:07.320 --> 01:55:09.080] We're expanding that team currently, by the way, [01:55:09.080 --> 01:55:11.080] so Cerebras is hiring, for anybody [01:55:11.080 --> 01:55:13.200] listening who's interested. [01:55:13.200 --> 01:55:17.320] You can check out our website, cerebras.net/join-us, [01:55:17.320 --> 01:55:19.760] if you'd like to check it out. [01:55:19.760 --> 01:55:24.080] Send us your resume, and we'll take a look. [01:55:24.080 --> 01:55:26.860] - Yeah, thanks for spending some time with us. [01:55:26.860 --> 01:55:29.700] Before we go, what's one NeurIPS tip [01:55:29.700 --> 01:55:32.160] that you wanna give to people if they're attending NeurIPS? [01:55:32.160 --> 01:55:33.440] How do you do NeurIPS right? [01:55:33.440 --> 01:55:35.080] - How do you do NeurIPS right? [01:55:35.080 --> 01:55:39.200] Well, so it's grown roughly 5X [01:55:39.200 --> 01:55:41.080] in the time that I've been attending NeurIPS, [01:55:41.080 --> 01:55:44.040] so it gets more overwhelming every year, [01:55:44.040 --> 01:55:49.040] so pace yourself, and I like that they've kind of backed off [01:55:49.040 --> 01:55:53.360] a bit on the talks in favor of poster sessions. [01:55:53.360 --> 01:55:55.280] Just, you gotta go wander around, [01:55:55.280 --> 01:55:58.480] you gotta talk to people, you gotta check out posters [01:55:58.480 --> 01:56:03.480] and kind of let stuff sink in and ask questions, so yeah. [01:56:03.480 --> 01:56:05.940] - Yeah, excellent, well thanks so much for your time. [01:56:05.940 --> 01:56:07.020] - Definitely, thanks. - Thank you. [01:56:07.020 --> 01:56:08.340] - That's it. [01:56:08.340 --> 01:56:10.340] - I think Cerebras is doing very interesting work here. [01:56:10.340 --> 01:56:12.180] Most people know them for their hardware, [01:56:12.180 --> 01:56:13.900] but I think they're doing very interesting work [01:56:13.900 --> 01:56:15.740] on the software and LLM training side, [01:56:15.740 --> 01:56:18.440] and I'd be interested to have them on again in 2024.
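[Editor's note: a sketch of the retraining schedule idea Joel mentions above, a short warmup into a somewhat higher peak learning rate, then a decay, for retraining a pruned model on a shifted data distribution. All of the specific numbers are illustrative assumptions.]

    import math

    def retraining_lr(step, total_steps=10_000, warmup_steps=500,
                      peak_lr=3e-4, final_lr=3e-5):
        """Linear warmup followed by cosine decay; peak_lr is set a bit higher than a
        typical fine-tuning LR so the model can move with the new data distribution."""
        if step < warmup_steps:
            return peak_lr * (step + 1) / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return final_lr + 0.5 * (peak_lr - final_lr) * (1 + math.cos(math.pi * progress))

    for s in (0, 250, 500, 5_000, 10_000):
        print(s, f"{retraining_lr(s):.2e}")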
[01:56:18.440 --> 01:56:21.780] So next we're gonna go walk down the floor to Voxel51, [01:56:21.780 --> 01:56:24.100] which is not a company I've actually come across before, [01:56:24.100 --> 01:56:26.820] but it seems to be an interesting pair [01:56:26.820 --> 01:56:28.340] together with the next guest as well. [01:56:28.340 --> 01:56:30.460] So this is another one of those situations [01:56:30.460 --> 01:56:32.780] where I get to put two competitors next to each other [01:56:32.780 --> 01:56:35.180] and let you decide as to how they differ [01:56:35.180 --> 01:56:37.000] and how they talk about themselves. [01:56:37.000 --> 01:56:39.020] - Sure, my name is Jason Corso. [01:56:39.020 --> 01:56:41.660] I'm the co-founder and chief scientist at Voxel51. [01:56:41.660 --> 01:56:44.220] I'm also on the faculty of EECS and robotics [01:56:44.220 --> 01:56:45.660] at the University of Michigan. [01:56:45.660 --> 01:56:48.860] So Voxel51 is a spin-out of my lab. [01:56:48.860 --> 01:56:52.260] We make a toolkit for AI engineers [01:56:52.260 --> 01:56:55.500] that sits on top of things like PyTorch and TensorFlow, [01:56:55.500 --> 01:56:58.940] and I think of it like a model and dataset debugger. [01:56:58.940 --> 01:57:00.400] The key problem that we face [01:57:00.400 --> 01:57:02.720] is not that we can go download datasets [01:57:02.720 --> 01:57:03.920] and then train models on them, [01:57:03.920 --> 01:57:05.060] or even with foundation models, [01:57:05.060 --> 01:57:06.540] go pull one off the shelf [01:57:06.540 --> 01:57:08.900] and then expect it to work exactly the way you want. [01:57:08.900 --> 01:57:12.020] The problem is really the co-development of a dataset [01:57:12.020 --> 01:57:14.380] to then go and actually use one of those models [01:57:14.380 --> 01:57:16.380] or train or fine-tune your own model. [01:57:16.380 --> 01:57:18.860] So 51 lets you represent the data [01:57:18.860 --> 01:57:21.660] that you're using or building alongside your models [01:57:21.660 --> 01:57:25.660] in a way that is extensible, visualizable, and flexible [01:57:25.660 --> 01:57:30.660] so that you can write simple single lines of code in Python [01:57:30.660 --> 01:57:33.200] to do queries of your datasets and your models, [01:57:33.200 --> 01:57:35.160] like show me the corner cases [01:57:35.160 --> 01:57:38.700] where model A is outperforming model B, and it's outdoors, [01:57:38.700 --> 01:57:42.440] or show me intersections in my B2D data, [01:57:42.440 --> 01:57:44.620] or let me visualize my embeddings [01:57:44.620 --> 01:57:46.540] that are either just vision [01:57:46.540 --> 01:57:48.860] or point cloud-based or multimodal, [01:57:48.860 --> 01:57:50.220] and then visually interact with them [01:57:50.220 --> 01:57:52.420] with lassoing on the 3D embedding. [01:57:52.420 --> 01:57:55.380] - Is the concept of active learning still in vogue, [01:57:55.380 --> 01:57:57.660] or is it not cool these days? (laughs) [01:57:57.660 --> 01:58:01.100] - Well, I mean, so 51 is a pretty flexible ecosystem [01:58:01.100 --> 01:58:02.100] of capabilities. [01:58:02.100 --> 01:58:05.340] The heart of it really is that data-centric data model [01:58:05.340 --> 01:58:06.380] of unstructured data. [01:58:06.380 --> 01:58:09.220] So we support images, video, and point clouds.
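[Editor's note: the toolkit Jason is calling "51" is FiftyOne, Voxel51's open-source library. Below is a rough sketch of the query style he describes. The dataset name, the two prediction fields ("model_a", "model_b"), and the "outdoor" tag are hypothetical stand-ins; substitute your own fields.]

    import fiftyone as fo
    from fiftyone import ViewField as F

    dataset = fo.load_dataset("my_driving_dataset")  # assumed: an existing dataset

    # Score both models' detections against ground truth; this adds per-sample
    # eval_a_tp / eval_a_fp / eval_a_fn (and eval_b_*) counter fields.
    dataset.evaluate_detections("model_a", gt_field="ground_truth", eval_key="eval_a")
    dataset.evaluate_detections("model_b", gt_field="ground_truth", eval_key="eval_b")

    # "Show me the corner cases where model A is outperforming model B, and it's outdoors."
    view = (
        dataset
        .match(F("tags").contains("outdoor"))
        .match(F("eval_a_tp") - F("eval_a_fp") > F("eval_b_tp") - F("eval_b_fp"))
        .sort_by("eval_b_fp", reverse=True)
    )

    session = fo.launch_app(view)  # inspect the resulting samples visually

The resulting view opens in the FiftyOne App, which is also where the embedding-lassoing style of interaction he mentions happens.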
[01:58:09.220 --> 01:58:10.740] You can, in fact, there's a blog [01:58:10.740 --> 01:58:13.300] that one of my colleagues at Voxel51 [01:58:13.300 --> 01:58:15.240] wrote maybe a month ago [01:58:15.240 --> 01:58:18.220] on how to implement an active learning workflow [01:58:18.220 --> 01:58:19.180] on top of 51. [01:58:19.180 --> 01:58:20.260] So it's plausible. - Seems like it'll [01:58:20.260 --> 01:58:21.700] lend itself easily. - Yeah, exactly. [01:58:21.700 --> 01:58:22.540] It's plausible. [01:58:22.540 --> 01:58:24.140] I mean, the challenge with active learning [01:58:24.140 --> 01:58:26.300] is, will just more data help, [01:58:26.300 --> 01:58:27.980] or do you need the right data? [01:58:27.980 --> 01:58:28.820] - Of course, the right data, yeah. [01:58:28.820 --> 01:58:30.940] - And I think that's kind of a, [01:58:30.940 --> 01:58:33.000] that's the question, I think, right, so yeah. [01:58:33.000 --> 01:58:34.880] - Is it primarily vision that you work on, [01:58:34.880 --> 01:58:35.780] or is it just anything? [01:58:35.780 --> 01:58:38.700] - Yeah, so my experience is in computer vision, [01:58:38.700 --> 01:58:42.240] mostly video understanding and imaging problems. [01:58:42.240 --> 01:58:43.740] So that's where we got started. [01:58:43.740 --> 01:58:45.740] However, the software is pretty flexible, [01:58:45.740 --> 01:58:47.420] so you can add your own data type. [01:58:47.420 --> 01:58:49.580] Like, you know, we're considering adding audio, [01:58:49.580 --> 01:58:54.100] adding text, IoT, you know, like temporal signals. [01:58:54.100 --> 01:58:55.740] But right now, it's images, video, and point clouds. [01:58:55.740 --> 01:58:57.020] - I've often heard it said that, you know, [01:58:57.020 --> 01:58:59.540] the best researchers and the best engineers [01:58:59.540 --> 01:59:00.820] are really the people who get their hands dirty [01:59:00.820 --> 01:59:02.020] in the data sets. [01:59:02.020 --> 01:59:03.300] - Oh yeah, you have to get your hands dirty. [01:59:03.300 --> 01:59:06.060] And this is, so in some sense, the whole company exists [01:59:06.060 --> 01:59:08.100] because I was worried no one was getting [01:59:08.100 --> 01:59:09.660] their hands dirty enough, right? [01:59:09.660 --> 01:59:11.980] Like, they were just expecting to take a data set, [01:59:11.980 --> 01:59:13.980] take a model, and then train it once, [01:59:13.980 --> 01:59:15.980] and then out pops, like, your usable thing. [01:59:15.980 --> 01:59:17.140] No, that's not the way it works, right? [01:59:17.140 --> 01:59:20.340] This is a hard problem in building intuition, [01:59:20.340 --> 01:59:22.980] building a comfort, or like, an ability to take [01:59:22.980 --> 01:59:25.540] a 10-million-sample data set and find, like, [01:59:25.540 --> 01:59:28.660] the 1,000 samples that are giving you this problem here. [01:59:28.660 --> 01:59:30.740] It's hard to do, and that's what 51 really lets you do. [01:59:30.740 --> 01:59:32.640] - Yeah, yeah, what's the name, actually? [01:59:32.640 --> 01:59:33.980] I have to ask. [01:59:33.980 --> 01:59:35.580] - Well, we had 50 bad ideas.
[01:59:39.140 --> 01:59:41.220] - Well, that's the way we say now, [01:59:41.220 --> 01:59:44.500] but the actual original way we got started as a company [01:59:44.500 --> 01:59:48.460] was as a video understanding as a service platform, [01:59:48.460 --> 01:59:50.540] and so that's why, so the voxel in the name [01:59:50.540 --> 01:59:53.540] is in the space-time volume of pixels, you know? [01:59:53.540 --> 01:59:56.380] And 51 was just to elicit ideas of Area 51. [01:59:56.380 --> 01:59:57.980] Like, can you find the right voxel? [01:59:57.980 --> 01:59:58.800] Is it there? [01:59:58.800 --> 01:59:59.640] That kind of thing. [01:59:59.640 --> 02:00:01.500] We've subsequently way pivoted away from that, [02:00:01.500 --> 02:00:04.380] as most startups will do at some point in their journey. [02:00:04.380 --> 02:00:06.020] - Yeah, it makes the domain easier to buy. [02:00:06.020 --> 02:00:07.740] - Sure, exactly. [02:00:07.740 --> 02:00:10.060] - So, anything else people should know about your platform? [02:00:10.060 --> 02:00:11.680] Like, top use cases, top customers [02:00:11.680 --> 02:00:12.780] that you always brag about? [02:00:12.780 --> 02:00:14.540] - Sure, well, I mean, it is open source, right? [02:00:14.540 --> 02:00:17.100] So, as long as you have the three key assumptions, [02:00:17.100 --> 02:00:19.400] local data, one user, one machine, [02:00:19.400 --> 02:00:20.940] there's no limitation on the machine learning [02:00:20.940 --> 02:00:22.460] that you can do with 51. [02:00:22.460 --> 02:00:24.420] When you want to violate one of those assumptions, [02:00:24.420 --> 02:00:27.360] like work on a team, or work in the cloud, or whatever, [02:00:27.360 --> 02:00:29.200] then we have an enterprise product [02:00:29.200 --> 02:00:31.220] that you would talk to us to purchase, basically, [02:00:31.220 --> 02:00:33.340] and that's kind of like a Google Drive layer [02:00:33.340 --> 02:00:34.940] on top of the open source one. [02:00:34.940 --> 02:00:35.980] - Very reasonable. [02:00:35.980 --> 02:00:38.440] - Yeah, the only, I mean, we sell to, [02:00:38.440 --> 02:00:39.900] a lot of companies do use it. [02:00:39.900 --> 02:00:41.820] I'm not gonna name 'em here, [02:00:41.820 --> 02:00:42.780] but you can go to the website, [02:00:42.780 --> 02:00:45.700] there's a logo wall of those we can name. [02:00:45.700 --> 02:00:47.280] But it'd be great if you're listening [02:00:47.280 --> 02:00:49.020] to give us a GitHub star. [02:00:49.020 --> 02:00:51.640] That's our, like, we're here at NeurIPS to get users [02:00:51.640 --> 02:00:52.480] to get stars, right? - Stars for swag. [02:00:52.480 --> 02:00:54.780] - Stars for swag, you got it. [02:00:54.780 --> 02:00:55.620] - Yeah, excellent. [02:00:55.620 --> 02:00:58.700] You published a guide to doing CVPR right. [02:00:58.700 --> 02:00:59.540] - I did. [02:00:59.540 --> 02:01:00.460] - We're here at NeurIPS. [02:01:00.460 --> 02:01:02.180] What would be your guide for doing NeurIPS right? [02:01:02.180 --> 02:01:03.500] - So, how to do NeurIPS right? [02:01:03.500 --> 02:01:04.460] I think there's some key things [02:01:04.460 --> 02:01:05.660] of doing large conferences right. [02:01:05.660 --> 02:01:08.660] One is, like, don't expect to do too much per day, right? 
[02:01:08.660 --> 02:01:10.260] So, what I've always done, [02:01:10.260 --> 02:01:13.100] even when conferences were like a quarter of the size [02:01:13.100 --> 02:01:15.540] or less, like, for any one day, [02:01:15.540 --> 02:01:18.040] identify five to 10 papers in the morning [02:01:18.040 --> 02:01:20.700] that I just wanna understand for that day, right? [02:01:20.700 --> 02:01:23.060] So, then I will make sure though to spend time [02:01:23.060 --> 02:01:26.120] with that poster presenter at the oral talk. [02:01:26.120 --> 02:01:27.220] To me, that's the key. [02:01:27.220 --> 02:01:29.060] And then, at the end of that day, [02:01:29.060 --> 02:01:32.180] I do tend to write a summary from my own brain, [02:01:32.180 --> 02:01:33.500] my own notes of what I did, [02:01:33.500 --> 02:01:36.640] like what the key points were for those papers. [02:01:36.640 --> 02:01:38.540] That's definitely one winning strategy [02:01:38.540 --> 02:01:39.740] for a big conference like this. [02:01:39.740 --> 02:01:41.900] - All right, any other advice for people building [02:01:41.900 --> 02:01:44.900] or any papers that you're excited for this year? [02:01:44.900 --> 02:01:47.860] - Well, I mean, advice, I don't know. [02:01:47.860 --> 02:01:48.740] If you don't know your data, [02:01:48.740 --> 02:01:50.060] then you don't know what you're doing [02:01:50.060 --> 02:01:52.220] is the way I would probably say it. [02:01:52.220 --> 02:01:54.140] And indeed, like getting close to your data [02:01:54.140 --> 02:01:56.420] is part of the model building process, right? [02:01:56.420 --> 02:01:58.460] Like, just to say it again, [02:01:58.460 --> 02:02:00.740] I think of it as a co-development process [02:02:00.740 --> 02:02:04.820] of data sets and models, not of a model training problem. [02:02:04.820 --> 02:02:06.580] - Yeah, I actually had a really interesting chat [02:02:06.580 --> 02:02:08.220] with someone from Cerebras, actually, [02:02:08.220 --> 02:02:09.900] where they talked about how they were doing evals [02:02:09.900 --> 02:02:14.020] on their loss per region on a data set [02:02:14.020 --> 02:02:16.300] as they were training their large language models [02:02:16.300 --> 02:02:18.300] so that they could increase the exposure [02:02:18.300 --> 02:02:20.860] on a specific subdomain if they saw that specifically, [02:02:20.860 --> 02:02:22.680] like loss was not progressing as well [02:02:22.680 --> 02:02:23.800] in that particular subdomain. [02:02:23.800 --> 02:02:25.300] So it's kind of like online training [02:02:25.300 --> 02:02:28.420] and watching their models evolve while they're training. [02:02:28.420 --> 02:02:30.820] - Yeah, I guess it sounds like on specific subsets [02:02:30.820 --> 02:02:32.780] of the data, which is really important. [02:02:32.780 --> 02:02:33.780] - Cool, well, thanks so much for your time. [02:02:33.780 --> 02:02:35.380] - Thanks very much, nice to chat with you, Sean. [02:02:35.380 --> 02:02:36.540] - Coming from data engineering, [02:02:36.540 --> 02:02:39.460] it's pretty interesting to see this space develop. [02:02:39.460 --> 02:02:41.380] It's interesting, also, that a lot of them [02:02:41.380 --> 02:02:43.060] emphasize open source, which we'll see [02:02:43.060 --> 02:02:45.700] with the next speaker, which is Brandon from Nomic. [02:02:45.700 --> 02:02:47.140] - Who are you and what's Nomic? [02:02:47.140 --> 02:02:49.020] - Yeah, hey, everyone, my name is Brandon Duderstadt. [02:02:49.020 --> 02:02:51.420] I'm a co-founder and CEO of Nomic.
[02:02:51.420 --> 02:02:53.020] Nomic's a company that does many things, [02:02:53.020 --> 02:02:55.320] but we have two main products right now. [02:02:55.320 --> 02:02:57.140] One of them is GPT4All, [02:02:57.140 --> 02:02:58.780] which is an open-source ecosystem [02:02:58.780 --> 02:03:00.820] of low-resource language models. [02:03:00.820 --> 02:03:03.300] So it lets you do things like run, you know, [02:03:03.300 --> 02:03:06.260] Mistral 7b fine-tuned on OpenOrca on a MacBook [02:03:06.260 --> 02:03:09.780] or, you know, some esoteric GPU, things like this. [02:03:09.780 --> 02:03:11.700] The second product is a tool called Atlas. [02:03:11.700 --> 02:03:14.220] It lets you explore massive unstructured data sets [02:03:14.220 --> 02:03:15.820] in your web browser. [02:03:15.820 --> 02:03:17.420] Since we're here at NeurIPS, a lot of people [02:03:17.420 --> 02:03:19.180] seem to respond to calling it [02:03:19.180 --> 02:03:21.220] massive clickable t-SNE as a service. [02:03:21.220 --> 02:03:24.660] - Yes, I was actually thinking, is it t-SNE or UMAP? [02:03:24.660 --> 02:03:26.740] - Yeah, so it turns out, if you squint closely enough, [02:03:26.740 --> 02:03:28.540] they're the same algorithm, up to a choice [02:03:28.540 --> 02:03:29.820] of low-dimensional kernel. [02:03:29.820 --> 02:03:32.540] So we optimize the t-SNE objective function. [02:03:32.540 --> 02:03:33.900] One of our pieces of IP is we have [02:03:33.900 --> 02:03:35.940] the world's fastest optimizer for it. [02:03:35.940 --> 02:03:39.220] So if you take, say, the NVIDIA Rapids UMAP implementation, [02:03:39.220 --> 02:03:41.620] which is kind of the fastest version of this in the wild, [02:03:41.620 --> 02:03:43.740] off the shelf and run it on Wikipedia [02:03:43.740 --> 02:03:45.540] on the biggest machine on AWS, [02:03:45.540 --> 02:03:47.100] it's gonna take you a couple of days [02:03:47.100 --> 02:03:48.480] to actually get that map, [02:03:48.480 --> 02:03:50.660] but we can do it in about four hours. [02:03:50.660 --> 02:03:51.500] - Oh, excellent. [02:03:51.500 --> 02:03:52.420] - Yeah, it lets you make the maps [02:03:52.420 --> 02:03:54.100] part of your iterative daily workflow [02:03:54.100 --> 02:03:56.040] as opposed to having to wait a week to get them. [02:03:56.040 --> 02:03:56.880] - Nice. [02:03:56.880 --> 02:03:58.860] We'll throw a video on this on the show notes, [02:03:58.860 --> 02:04:00.860] but maybe you could sort of narratively [02:04:00.860 --> 02:04:01.700] show what you're showing. [02:04:01.700 --> 02:04:04.380] Like, you showed a TikTok example and a Twitter example, [02:04:04.380 --> 02:04:06.100] right, so these are really for visualizing [02:04:06.100 --> 02:04:07.780] massive multimodal data sets. [02:04:07.780 --> 02:04:09.980] - Yeah, so the fundamental thesis behind the tool [02:04:09.980 --> 02:04:12.220] is that the shape of data that people have [02:04:12.220 --> 02:04:14.820] has fundamentally changed as a result of generative. [02:04:14.820 --> 02:04:16.740] Instead of having these big Excel spreadsheets [02:04:16.740 --> 02:04:19.800] of tabular things, you now have vectors plus metadata, [02:04:19.800 --> 02:04:21.980] and we need to rethink visualization [02:04:21.980 --> 02:04:24.700] and the implications of that for the visualization stack.
[02:04:24.700 --> 02:04:26.380] You are kind of seeing at the database layer [02:04:26.380 --> 02:04:28.420] that's starting to penetrate with vector DBs and stuff, [02:04:28.420 --> 02:04:31.080] but I think there's gonna be radical implications [02:04:31.080 --> 02:04:32.940] for that change all the way up the stack. [02:04:32.940 --> 02:04:34.320] And so you can use it on, you know, [02:04:34.320 --> 02:04:35.580] getting back to your original question, [02:04:35.580 --> 02:04:39.120] Twitter data, TikTok data, images, sounds, text, [02:04:39.120 --> 02:04:40.680] anything that you can stuff into a vector, [02:04:40.680 --> 02:04:42.680] which is pretty much anything these days, [02:04:42.680 --> 02:04:44.140] you can map and you can understand. [02:04:44.140 --> 02:04:44.980] - Yeah. [02:04:44.980 --> 02:04:46.940] Can I bring my own custom embeddings [02:04:46.940 --> 02:04:48.380] and see the impact of that? [02:04:48.380 --> 02:04:49.220] - You can. [02:04:49.220 --> 02:04:50.860] So there's two ways to get data into the platform. [02:04:50.860 --> 02:04:52.780] One way is bring your own embeddings, [02:04:52.780 --> 02:04:54.520] and then you just pip install nomic, [02:04:54.520 --> 02:04:57.200] from nomic import atlas, and then atlas.map_embeddings. [02:04:57.200 --> 02:04:58.300] You supply your embeddings, [02:04:58.300 --> 02:05:00.340] you supply metadata on top of them, [02:05:00.340 --> 02:05:01.620] and then a couple minutes later, [02:05:01.620 --> 02:05:03.700] you'll get a web link back to a map [02:05:03.700 --> 02:05:05.260] where you can click on it and fly around it. [02:05:05.260 --> 02:05:06.460] If you just have raw data, [02:05:06.460 --> 02:05:08.020] we have a bunch of out-of-the-box embedders [02:05:08.020 --> 02:05:10.220] that we develop and we work with partners to develop [02:05:10.220 --> 02:05:12.680] that you can use to map it out of the box as well. [02:05:12.680 --> 02:05:13.520] - Yeah. [02:05:13.520 --> 02:05:16.380] And this is not open source, but GPT4All is. [02:05:16.380 --> 02:05:18.620] - So there are aspects of the platform that are open source. [02:05:18.620 --> 02:05:20.660] The entire thing runs on a graphics engine [02:05:20.660 --> 02:05:22.300] that we developed called Deep Scatter. [02:05:22.300 --> 02:05:23.480] It's the only tool out there [02:05:23.480 --> 02:05:25.260] that can render a billion point scatter plots [02:05:25.260 --> 02:05:26.380] in a web browser. [02:05:26.380 --> 02:05:27.740] And to do that, you have to, again, [02:05:27.740 --> 02:05:29.300] kind of fundamentally rethink how graphics [02:05:29.300 --> 02:05:30.620] in the browser works from the ground up. [02:05:30.620 --> 02:05:32.060] That is available source, [02:05:32.060 --> 02:05:33.940] but unfortunately it's not fully open source. [02:05:33.940 --> 02:05:34.780] - It's okay. [02:05:34.780 --> 02:05:36.380] Yeah, you don't have to apologize for anything. [02:05:36.380 --> 02:05:37.940] - I do have to. [02:05:37.940 --> 02:05:39.780] I wish we could open source everything, [02:05:39.780 --> 02:05:42.380] but we are unfortunately subject to capitalism, [02:05:42.380 --> 02:05:43.220] and so we cannot. [02:05:43.220 --> 02:05:45.460] But in the limit, I would love to open source everything. [02:05:45.460 --> 02:05:47.460] - I also maybe heard you in another introduction [02:05:47.460 --> 02:05:49.980] talk about this as like Looker for language models. [02:05:49.980 --> 02:05:51.700] Like, elaborate more about that?
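As a rough sketch of the bring-your-own-embeddings flow Brandon describes: the package, module, and atlas.map_embeddings call are the ones he names, but the argument names beyond the embeddings and metadata are from memory and may differ across versions of the nomic client, and the API key, vectors, and fields below are placeholders.

```python
import numpy as np
import nomic
from nomic import atlas

nomic.login("YOUR_NOMIC_API_KEY")  # placeholder API key

# Pretend these came from your own embedding model
embeddings = np.random.rand(10_000, 768)
metadata = [
    {"id": i, "text": f"document {i}", "loss": float(np.random.rand())}
    for i in range(10_000)
]

# Uploads the vectors plus metadata; a few minutes later the returned
# project points at an interactive map you can open in the browser
project = atlas.map_embeddings(embeddings=embeddings, data=metadata)
print(project)  # prints a link to the map
```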
[02:05:51.700 --> 02:05:52.540] - Yeah. [02:05:52.540 --> 02:05:53.860] - Do you have a query language? [02:05:53.860 --> 02:05:56.180] What are you thinking about as the overall vision? [02:05:56.180 --> 02:05:58.420] - Yeah, so I wanna bring it back to the analogy [02:05:58.420 --> 02:06:00.940] of like the new shape of data disrupting the stack, right? [02:06:00.940 --> 02:06:02.140] So the first place we see it hitting [02:06:02.140 --> 02:06:03.540] is at the database layer. [02:06:03.540 --> 02:06:05.060] Things, you know, we see vector databases. [02:06:05.060 --> 02:06:06.500] There's a million of them nowadays. [02:06:06.500 --> 02:06:08.220] I think that that change is gonna propagate [02:06:08.220 --> 02:06:09.660] all the way up the stack. [02:06:09.660 --> 02:06:10.840] And we are interested in, you know, [02:06:10.840 --> 02:06:12.620] what happens to the BI analytics, [02:06:12.620 --> 02:06:14.220] you know, visualization layer. [02:06:14.220 --> 02:06:15.820] And so really what we're thinking of this as [02:06:15.820 --> 02:06:18.100] is sort of like a Tableau for unstructured data [02:06:18.100 --> 02:06:20.420] or a Looker or Power BI or something like this, [02:06:20.420 --> 02:06:22.980] where we've built the entire visualization system [02:06:22.980 --> 02:06:24.720] with embeddings as a first class citizen. [02:06:24.720 --> 02:06:27.380] And so that enables a lot of different actions. [02:06:27.380 --> 02:06:28.500] Some are already in the platform. [02:06:28.500 --> 02:06:30.500] Some I can't tease yet, unfortunately. [02:06:30.500 --> 02:06:33.680] But having embeddings as a first class primitive [02:06:33.680 --> 02:06:36.300] enables a lot of like very, very useful things [02:06:36.300 --> 02:06:38.740] that you're not gonna be able to get unless you have that. [02:06:38.740 --> 02:06:41.380] - What do people use Atlas for? [02:06:41.380 --> 02:06:43.900] Like just maybe list out some more use cases [02:06:43.900 --> 02:06:44.980] that might not be obvious [02:06:44.980 --> 02:06:46.860] from people just thinking about visualization. [02:06:46.860 --> 02:06:48.340] - Yeah, so we'll start with the most technical [02:06:48.340 --> 02:06:49.580] and we'll go to the least technical. [02:06:49.580 --> 02:06:51.620] A lot of ML engineers use it to understand [02:06:51.620 --> 02:06:53.760] and evaluate their models and training data. [02:06:53.760 --> 02:06:55.500] So we just did some work with Hugging Face [02:06:55.500 --> 02:06:57.180] on their OBELICS data set, [02:06:57.180 --> 02:06:59.280] which they use to train their IDEFICS model, [02:06:59.280 --> 02:07:02.340] doing some evaluation and training data analysis, [02:07:02.340 --> 02:07:03.660] looking at what areas of their-- [02:07:03.660 --> 02:07:04.620] - We actually interviewed those guys. [02:07:04.620 --> 02:07:06.860] I was in Paris and I talked to Leo and-- [02:07:06.860 --> 02:07:07.700] - And Victor. [02:07:07.700 --> 02:07:08.520] - Yeah, yeah. [02:07:08.520 --> 02:07:09.360] - Yeah, those guys are sick. [02:07:09.360 --> 02:07:10.540] But yeah, so we worked with them on this [02:07:10.540 --> 02:07:12.460] and we discovered a couple of things [02:07:12.460 --> 02:07:13.300] in their training data [02:07:13.300 --> 02:07:15.140] that they should have like actually cleaned out of it. [02:07:15.140 --> 02:07:17.480] There was like a bunch of end of sentence tokens [02:07:17.480 --> 02:07:19.620] to be replaced that made it through, stuff like this.
[02:07:19.620 --> 02:07:20.460] Some really garbage content. [02:07:20.460 --> 02:07:21.900] - Do you do anomaly detection? [02:07:21.900 --> 02:07:24.260] Or is that up to people to code themselves? [02:07:24.260 --> 02:07:26.260] - Yeah, so the anomalies usually manifest [02:07:26.260 --> 02:07:28.620] as like the little moons on the outside of the map. [02:07:28.620 --> 02:07:29.460] - Oh, sure, okay. [02:07:29.460 --> 02:07:30.280] - And then you can just like hit 'em [02:07:30.280 --> 02:07:32.660] with the little lasso tool and stuff like this. [02:07:32.660 --> 02:07:34.380] But one of the things about the Hugging Face map [02:07:34.380 --> 02:07:36.460] that I found fascinating was [02:07:36.460 --> 02:07:39.180] because we supply like a topic model out of the box, [02:07:39.180 --> 02:07:41.420] you can look at things like are there topics [02:07:41.420 --> 02:07:43.780] where the loss tends to like cluster together? [02:07:43.780 --> 02:07:44.940] And for the Hugging Face model, [02:07:44.940 --> 02:07:47.980] there was this high loss mode in the poetry topic, [02:07:47.980 --> 02:07:49.500] which I thought was super interesting. [02:07:49.500 --> 02:07:51.220] And so I've got two theories for it. [02:07:51.220 --> 02:07:53.900] One is that poetry includes the distinct subversion [02:07:53.900 --> 02:07:55.820] of like common linguistic patterns. [02:07:55.820 --> 02:07:58.220] And so of course, language models will be bad at it. [02:07:58.220 --> 02:08:00.060] But the more perhaps optimistic theory [02:08:00.060 --> 02:08:01.420] is that poetry captures something [02:08:01.420 --> 02:08:02.420] that's fundamentally human [02:08:02.420 --> 02:08:04.340] that the machines have not grasped yet. [02:08:04.340 --> 02:08:06.540] The pragmatic version, I think, [02:08:06.540 --> 02:08:07.540] is probably what's happening, [02:08:07.540 --> 02:08:09.700] but I like to be optimistic, so. [02:08:09.700 --> 02:08:11.460] - IDEFICS is a visual data set. [02:08:11.460 --> 02:08:12.620] And you were-- - It's multimodal. [02:08:12.620 --> 02:08:14.500] - Yeah, okay, so they have poetry in there. [02:08:14.500 --> 02:08:15.420] - Yep. - Interesting. [02:08:15.420 --> 02:08:17.000] - It's sort of interleaved webpages [02:08:17.000 --> 02:08:18.700] of like it'll be an image and then some poetry. [02:08:18.700 --> 02:08:20.460] - Right, so that's the more technical side. [02:08:20.460 --> 02:08:22.240] - And then coming down to the less technical side, [02:08:22.240 --> 02:08:24.060] you know, a lot of our customer base at this point [02:08:24.060 --> 02:08:26.340] is like consulting type companies. [02:08:26.340 --> 02:08:28.060] And they find the product really useful [02:08:28.060 --> 02:08:31.100] for connecting domain experts with large data sets. [02:08:31.100 --> 02:08:32.620] So generally what will happen [02:08:32.620 --> 02:08:33.940] is you'll have these domain experts, [02:08:33.940 --> 02:08:36.120] be it like a doctor or someone in regulation, [02:08:36.120 --> 02:08:38.020] someone with subject matter expertise, [02:08:38.020 --> 02:08:39.820] that'll be handed this massive set of documents [02:08:39.820 --> 02:08:41.740] from a client and be like, I don't even know where to start. [02:08:41.740 --> 02:08:43.140] I don't even know what's in this.
[02:08:43.140 --> 02:08:45.580] And so a couple of the consulting partners we work with [02:08:45.580 --> 02:08:48.180] actually now have a KPI that's like timed to Atlas, [02:08:48.180 --> 02:08:49.860] where it's like how quickly from the data set [02:08:49.860 --> 02:08:52.360] hitting the company does it get to Atlas [02:08:52.360 --> 02:08:54.480] so that we can send an analyst the map [02:08:54.480 --> 02:08:56.120] and they can start to explore it. [02:08:56.120 --> 02:08:58.120] And so we're really excited about enabling [02:08:58.120 --> 02:09:00.840] sort of traditionally non-technical people [02:09:00.840 --> 02:09:02.960] to explore and analyze these massive data sets [02:09:02.960 --> 02:09:04.560] with this no-code interface. [02:09:04.560 --> 02:09:05.400] - You know what you should do? [02:09:05.400 --> 02:09:06.640] You should hook up with Google. [02:09:06.640 --> 02:09:07.680] Doesn't Google have a big set [02:09:07.680 --> 02:09:09.760] of like publicly available data sets? [02:09:09.760 --> 02:09:11.760] - Yeah, so we've actually done a couple of collaborations [02:09:11.760 --> 02:09:13.600] with Google Cloud on some of those data sets. [02:09:13.600 --> 02:09:15.320] We can maybe link the blog posts or something. [02:09:15.320 --> 02:09:16.360] - Sure, yeah. - Yeah. [02:09:16.360 --> 02:09:17.200] - Okay, awesome. [02:09:17.200 --> 02:09:18.040] Just on NeurIPS in general, [02:09:18.040 --> 02:09:19.380] you've been here a number of years. [02:09:19.380 --> 02:09:20.920] What do you look for when you come to NeurIPS? [02:09:20.920 --> 02:09:23.000] Any tips that you have for people coming to NeurIPS? [02:09:23.000 --> 02:09:24.040] - Oh, that's a good one. [02:09:24.040 --> 02:09:27.200] Yeah, big tip is just like if you see someone cool, [02:09:27.200 --> 02:09:28.400] like they're probably nice, [02:09:28.400 --> 02:09:30.940] so chase them down and like have them talk to you. [02:09:30.940 --> 02:09:31.780] - I love it. [02:09:31.780 --> 02:09:33.080] - Shove a microphone in their face. [02:09:33.080 --> 02:09:33.920] - Yeah, yeah. [02:09:33.920 --> 02:09:34.760] (laughing) [02:09:34.760 --> 02:09:35.580] No, I love it. [02:09:35.580 --> 02:09:37.240] But it was like my second NeurIPS or something, [02:09:37.240 --> 02:09:39.560] I saw Oriol Vinyals walk by [02:09:39.560 --> 02:09:42.600] and he had just done like the StarCraft stuff [02:09:42.600 --> 02:09:44.520] and I was like, okay, this guy is sick. [02:09:44.520 --> 02:09:46.080] He's doing some really cutting edge stuff. [02:09:46.080 --> 02:09:48.040] So I like ran up and asked him for life advice [02:09:48.040 --> 02:09:50.280] and he was so down to earth and like chatted with me [02:09:50.280 --> 02:09:53.080] for a bunch of time about like modeling and life [02:09:53.080 --> 02:09:55.280] and you know, how to think about my career and stuff. [02:09:55.280 --> 02:09:57.980] And so like, yeah, if you see a hero, like shoot your shot. [02:09:57.980 --> 02:09:59.800] - Yeah, yeah, very, very, very cool. [02:09:59.800 --> 02:10:02.400] Any papers that you're keen on this year [02:10:02.400 --> 02:10:05.360] or like maybe really affected you in previous years? [02:10:05.360 --> 02:10:06.880] - Oh, that's a good one too. [02:10:06.880 --> 02:10:08.480] This year, I think QLoRA's here, [02:10:08.480 --> 02:10:10.160] which I think is like a very, very interesting-- [02:10:10.160 --> 02:10:11.360] - Tim, I think is tomorrow. [02:10:11.360 --> 02:10:12.200] - Yeah, yeah, yeah, yeah.
[02:10:12.200 --> 02:10:14.660] It's a very interesting set of implications [02:10:14.660 --> 02:10:16.080] for like the low resource world. [02:10:16.080 --> 02:10:16.920] - Can you elaborate? [02:10:16.920 --> 02:10:19.080] - Yeah, so one of the things we think a lot about at Nomic [02:10:19.080 --> 02:10:21.760] is the accessibility of AI technology. [02:10:21.760 --> 02:10:23.760] And one of the things that's become very clear to us [02:10:23.760 --> 02:10:25.520] and I think everyone this year is like, [02:10:25.520 --> 02:10:27.720] there's the GPU rich and the GPU poor. [02:10:27.720 --> 02:10:29.320] And so I think methods that make it [02:10:29.320 --> 02:10:31.420] so that anyone in the world can interact [02:10:31.420 --> 02:10:33.640] with this technology like QLoRA [02:10:33.640 --> 02:10:36.120] are just like so, so, so valuable. [02:10:36.120 --> 02:10:38.080] And so I think any research into like [02:10:38.080 --> 02:10:39.800] low resource training of models [02:10:39.800 --> 02:10:41.160] and low resource deployment of models [02:10:41.160 --> 02:10:42.800] is just gonna be so good for everybody, [02:10:42.800 --> 02:10:44.440] especially like the open source community [02:10:44.440 --> 02:10:45.640] that I really love to see it, so. [02:10:45.640 --> 02:10:47.120] - Yeah, you just reminded me. [02:10:47.120 --> 02:10:50.320] So talking about, we forgot to talk about GPT4All. [02:10:50.320 --> 02:10:52.280] Very, very early win, I think, [02:10:52.280 --> 02:10:54.040] in the overall space of things. [02:10:54.040 --> 02:10:56.200] But now, more recently in my mind, [02:10:56.200 --> 02:10:59.120] llama.cpp has come out to be its own platform. [02:10:59.120 --> 02:11:01.360] Ollama's emerging as like a thing, [02:11:01.360 --> 02:11:03.000] like there's a bunch of ways [02:11:03.000 --> 02:11:04.920] in which people run models locally. [02:11:04.920 --> 02:11:06.520] How should people think about GPT4All [02:11:06.520 --> 02:11:07.600] in the context of all that? [02:11:07.600 --> 02:11:09.760] - Yeah, so one thing that a lot of people don't realize [02:11:09.760 --> 02:11:12.480] is that a lot of the core contributors to llama.cpp [02:11:12.480 --> 02:11:13.820] actually work at Nomic. [02:11:13.820 --> 02:11:16.000] And so I guess the operant advice here [02:11:16.000 --> 02:11:17.800] is just like play nice with open source, right? [02:11:17.800 --> 02:11:19.600] Like GPT4All is this thing [02:11:19.600 --> 02:11:21.580] that's gonna be free forever for our community. [02:11:21.580 --> 02:11:23.320] We're gonna keep trying to improve it [02:11:23.320 --> 02:11:26.240] as our Discord recommends and as people call for. [02:11:26.240 --> 02:11:30.240] But if we can do things like go and contribute [02:11:30.240 --> 02:11:31.920] to other open source projects that are high impact, [02:11:31.920 --> 02:11:32.960] we're going to, right? [02:11:32.960 --> 02:11:37.760] And so the hope here is that as economic pressures apply, [02:11:37.760 --> 02:11:39.080] open source stays collaborative [02:11:39.080 --> 02:11:41.240] is really, really the goal for us, I think. [02:11:41.240 --> 02:11:42.060] - Okay, cool. [02:11:42.060 --> 02:11:42.900] Well, that's it. [02:11:42.900 --> 02:11:43.740] Any other last words? [02:11:43.740 --> 02:11:45.080] What are you looking for? [02:11:45.080 --> 02:11:46.580] How do people find you? [02:11:46.580 --> 02:11:49.880] - Yeah, you can follow us on Twitter at nomic_ai.
[02:11:49.880 --> 02:11:52.640] You can also find our website, nomic.ai. [02:11:52.640 --> 02:11:54.600] - Hiring engineers, researchers? [02:11:54.600 --> 02:11:58.080] - Yeah, we're always looking for super interesting people. [02:11:58.080 --> 02:11:59.560] Yeah, come chat about interesting things [02:11:59.560 --> 02:12:00.760] in our Discord, really. [02:12:00.760 --> 02:12:02.240] You can visit our website and stuff. [02:12:02.240 --> 02:12:03.520] But really the best way to get involved [02:12:03.520 --> 02:12:05.960] is like make some maps, do some open source work. [02:12:05.960 --> 02:12:08.920] Like a lot of the people that we hired [02:12:08.920 --> 02:12:10.920] in this last kind of spree of hiring [02:12:10.920 --> 02:12:12.920] were like big open source contributors. [02:12:12.920 --> 02:12:15.020] And so like, yeah, just give back to the community [02:12:15.020 --> 02:12:17.460] and then, you know, we'll try and find you and boost you. [02:12:17.460 --> 02:12:18.300] - Yeah, awesome. [02:12:18.300 --> 02:12:19.120] Well, thanks so much for your time. [02:12:19.120 --> 02:12:20.280] - Yeah, take care. [02:12:20.280 --> 02:12:21.900] - I think the way that Nomic's embracing [02:12:21.900 --> 02:12:23.960] and supporting open source AI is encouraging [02:12:23.960 --> 02:12:26.260] and I think more companies should learn from that. [02:12:26.260 --> 02:12:27.100] But they're definitely far [02:12:27.100 --> 02:12:29.940] from the only open source AI company out there. [02:12:29.940 --> 02:12:32.020] Lightning AI is one of the oldest, I guess, [02:12:32.020 --> 02:12:34.140] if you can call that old in the space. [02:12:34.140 --> 02:12:37.320] And I happened to catch Luca, the CTO, at their booth [02:12:37.320 --> 02:12:40.100] at NeurIPS, where they were there to launch Lightning Studio, [02:12:40.100 --> 02:12:41.880] which is their new development environment. [02:12:41.880 --> 02:12:42.720] Hey Luca, welcome. [02:12:42.720 --> 02:12:44.760] Good to see that you guys are launching a new product today. [02:12:44.760 --> 02:12:45.600] - Yeah, sure. [02:12:45.600 --> 02:12:46.780] It's super exciting. [02:12:46.780 --> 02:12:48.400] It's the result of many months, [02:12:48.400 --> 02:12:51.420] if not years of work and realizations. [02:12:51.420 --> 02:12:52.820] - So maybe let's establish a baseline. [02:12:52.820 --> 02:12:55.180] Most people will have heard of PyTorch Lightning. [02:12:55.180 --> 02:12:56.540] What was the evolution to Lightning AI? [02:12:56.540 --> 02:12:59.260] - Yeah, so PyTorch Lightning is a very healthy, [02:12:59.260 --> 02:13:01.600] has a very healthy community of people using it. [02:13:01.600 --> 02:13:06.060] We are at 5.5 downloads, about 80 million downloads in, [02:13:06.060 --> 02:13:07.880] sorry, 5.5 million downloads. [02:13:07.880 --> 02:13:08.720] - Of course. [02:13:08.720 --> 02:13:11.080] - Per month, about 80 million downloads in total. [02:13:11.080 --> 02:13:14.840] And it's one of the frameworks that comes from the era [02:13:14.840 --> 02:13:18.160] of traditional, quote unquote, deep learning, [02:13:18.160 --> 02:13:22.560] that is one of the main actors in the Gen AI space. [02:13:22.560 --> 02:13:25.080] Because, for example, Stable Diffusion was trained [02:13:25.080 --> 02:13:27.480] using PyTorch Lightning, a bunch of models. [02:13:27.480 --> 02:13:30.520] PyTorch Lightning powers NeMo from NVIDIA. [02:13:30.520 --> 02:13:34.000] - Yeah, their custom chip design language model.
[02:13:34.000 --> 02:13:37.560] - Yeah, so basically, PyTorch Lightning has evolved [02:13:37.560 --> 02:13:40.320] and grown into Gen AI. [02:13:40.320 --> 02:13:43.720] And with the release of 2.0, 2.1, [02:13:43.720 --> 02:13:47.400] we've tried to make it better and better for use cases [02:13:47.400 --> 02:13:50.360] where you have very large models and you have a hard time [02:13:50.360 --> 02:13:54.260] not going out of memory, so. (laughs) [02:13:54.260 --> 02:13:57.480] And to distribute it with, PyTorch Lightning has always [02:13:57.480 --> 02:14:01.760] been very focused on distributed training. [02:14:01.760 --> 02:14:04.500] It's one of the things that it did the best. [02:14:05.420 --> 02:14:09.200] But when models get very large, I think that's where [02:14:09.200 --> 02:14:11.000] we improved a lot this year. [02:14:11.000 --> 02:14:14.640] We also launched Fabric, Lightning Fabric, [02:14:14.640 --> 02:14:17.680] which is a, it's a framework, it's a companion framework [02:14:17.680 --> 02:14:21.200] to PyTorch Lightning, where you get all the constituents [02:14:21.200 --> 02:14:23.260] of the Lightning trainer, but now you can write [02:14:23.260 --> 02:14:24.600] your own training loops. [02:14:24.600 --> 02:14:28.560] So for people doing very optimized stuff, very bespoke, [02:14:28.560 --> 02:14:33.800] I don't know, collective calls, they want to place them [02:14:33.800 --> 02:14:36.680] where they want, they want to fully own the training loop, [02:14:36.680 --> 02:14:39.240] or they're doing stuff like reinforcement learning [02:14:39.240 --> 02:14:42.040] where it's not the traditional training loop, [02:14:42.040 --> 02:14:43.480] you can still do it with the trainer, [02:14:43.480 --> 02:14:47.520] but it's a bit more difficult. [02:14:47.520 --> 02:14:50.920] Then Fabric lets you just write your for loops. [02:14:50.920 --> 02:14:55.360] But we'll still abstract away strategies, precision plugins, [02:14:55.360 --> 02:14:58.560] the logging, the aggregation of metrics, and all this stuff. [02:14:58.560 --> 02:15:00.940] I like to think about these frameworks as frameworks [02:15:00.940 --> 02:15:04.560] that reduce the surface area for mistakes. [02:15:04.560 --> 02:15:07.560] Because mistakes nowadays, well, a few years ago-- [02:15:07.560 --> 02:15:09.580] >> Cost a lot of money. >> Mistakes, exactly, right? [02:15:09.580 --> 02:15:13.300] They cost a lot of time to a PhD student, [02:15:13.300 --> 02:15:14.960] right now they cost a lot of money, [02:15:14.960 --> 02:15:18.960] so you don't want to make too many mistakes there. [02:15:18.960 --> 02:15:22.680] And TorchMetrics is a third project that we have [02:15:22.680 --> 02:15:24.680] that is very healthy and is powering [02:15:24.680 --> 02:15:26.040] a lot of the metric computation. [02:15:26.040 --> 02:15:28.780] Again, you don't want to compute accuracy [02:15:28.780 --> 02:15:32.140] and aggregate it across a multi-machine job [02:15:32.140 --> 02:15:33.140] in the wrong way, right? [02:15:33.140 --> 02:15:35.560] Because you'll get wrong indications [02:15:35.560 --> 02:15:37.720] and it's really easy to do it incorrectly. [02:15:37.720 --> 02:15:41.360] And this year we started doing-- [02:15:41.360 --> 02:15:43.360] >> Yeah, as you mentioned, these are mostly open source. [02:15:43.360 --> 02:15:45.360] >> Yeah, these are all 100% open source. [02:15:45.360 --> 02:15:47.680] >> I think Fabric in particular was pretty popular.
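To make the "you write the loop, Fabric handles the rest" idea concrete, here is a minimal sketch of a Fabric training loop on a toy model. It is an illustration of the pattern Luca describes rather than code from Lightning's docs, and the precision and strategy arguments are the knobs you would normally change for large-model work:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from lightning.fabric import Fabric

# Fabric takes care of device placement, precision, and distribution;
# switching to precision="bf16-mixed" or strategy="ddp" needs no loop changes.
fabric = Fabric(accelerator="auto", devices=1)
fabric.launch()

model = nn.Linear(32, 2)  # toy stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = fabric.setup(model, optimizer)

data = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
loader = fabric.setup_dataloaders(DataLoader(data, batch_size=16))

for epoch in range(2):
    for x, y in loader:          # batches arrive on the right device already
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        fabric.backward(loss)    # replaces loss.backward(); precision-aware
        optimizer.step()
```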
[02:15:47.680 --> 02:15:50.200] >> Yeah, yeah, so Fabric has powered also [02:15:50.200 --> 02:15:54.360] our language model repositories, [02:15:54.360 --> 02:15:55.760] Lit-LLaMA and LitGPT. [02:15:57.040 --> 02:16:00.080] Basically, back when LLaMA was originally released, [02:16:00.080 --> 02:16:04.920] I, and me, of course, and-- [02:16:04.920 --> 02:16:06.760] >> The weights were leaked. [02:16:06.760 --> 02:16:07.800] The weights were leaked. [02:16:07.800 --> 02:16:09.600] >> Yeah, the weights, yeah, exactly, exactly. [02:16:09.600 --> 02:16:12.680] But at some point there was a model [02:16:12.680 --> 02:16:15.560] being published by Meta as well. [02:16:15.560 --> 02:16:19.640] It was a GPL license, so we didn't really like that. [02:16:19.640 --> 02:16:22.920] And so we say, why don't we take NanoGPT, [02:16:22.920 --> 02:16:25.640] because I was working with NanoGPT at the time, [02:16:25.640 --> 02:16:27.720] and turn it into LLaMA. [02:16:27.720 --> 02:16:29.160] And that started the whole thing [02:16:29.160 --> 02:16:31.840] of minimal implementation, single file, [02:16:31.840 --> 02:16:34.200] you have everything there, you have no layers to go through [02:16:34.200 --> 02:16:37.560] to understand how your layers are implemented. [02:16:37.560 --> 02:16:41.480] And that became something that became very popular [02:16:41.480 --> 02:16:43.400] within many organizations. [02:16:43.400 --> 02:16:44.680] So it's still very popular. [02:16:44.680 --> 02:16:47.780] So the LLM Efficiency Challenge, [02:16:47.780 --> 02:16:50.600] the starter kit had LitGPT in it. [02:16:50.600 --> 02:16:52.540] And LitGPT today supports many models, [02:16:52.540 --> 02:16:53.900] many different models. [02:16:53.900 --> 02:16:55.880] But it's very easy to get to the bottom [02:16:55.880 --> 02:16:58.840] of the implementation of every single thing. [02:16:58.840 --> 02:16:59.800] >> Yeah, it's a-- >> So it's very hackable. [02:16:59.800 --> 02:17:01.400] >> Yeah, it's one-- >> My philosophy is-- [02:17:01.400 --> 02:17:02.240] >> File. [02:17:02.240 --> 02:17:05.600] >> Make it hackable before you make it fast, right? [02:17:05.600 --> 02:17:07.280] Because more people can contribute to it, [02:17:07.280 --> 02:17:10.320] and we have contributors being very successful. [02:17:10.320 --> 02:17:13.000] There have been initiatives of models [02:17:13.000 --> 02:17:17.140] being pre-trained using that, like TinyLlama. [02:17:17.140 --> 02:17:21.200] And 360AI, I think, a few days ago came out, [02:17:21.200 --> 02:17:24.680] and they said they used Lit-LLaMA to pre-train [02:17:24.680 --> 02:17:26.200] their seven billion parameter model. [02:17:26.200 --> 02:17:27.880] So it's great. [02:17:27.880 --> 02:17:30.600] And a lot of those learnings went back into Fabric [02:17:30.600 --> 02:17:32.200] and back into PyTorch Lightning. [02:17:32.200 --> 02:17:34.680] And this is how we're kind of growing organically [02:17:34.680 --> 02:17:36.840] towards supporting some AI use cases. [02:17:36.840 --> 02:17:38.700] >> Is there an example of one of those learnings [02:17:38.700 --> 02:17:43.120] from those outside usages of LitGPT? [02:17:43.120 --> 02:17:44.560] Oh, Lit-LLaMA, I guess. [02:17:44.560 --> 02:17:45.400] >> Sorry, can you say that? [02:17:45.400 --> 02:17:46.880] >> What's an example of one of those learnings [02:17:46.880 --> 02:17:50.000] that you got from 360 contributing back?
[02:17:50.960 --> 02:17:54.160] >> Well, 360 is very young in the sense [02:17:54.160 --> 02:17:57.480] that we just learned, I think, the day before yesterday [02:17:57.480 --> 02:18:00.040] that they used us, so it's great. [02:18:00.040 --> 02:18:01.880] We're very happy about that. [02:18:01.880 --> 02:18:05.040] From TinyLlama, they did some optimizations [02:18:05.040 --> 02:18:06.060] on top of our code. [02:18:06.060 --> 02:18:10.960] And they trained a 1.1 billion parameter model [02:18:10.960 --> 02:18:12.680] on three trillion tokens. [02:18:12.680 --> 02:18:14.200] I think they're still doing that. [02:18:14.200 --> 02:18:16.000] I don't think they're done. [02:18:16.000 --> 02:18:19.280] And then some of the improvements that they made, [02:18:19.280 --> 02:18:22.880] then we upstreamed it to our, like for example, [02:18:22.880 --> 02:18:27.040] I think chunked cross-entropy, [02:18:27.040 --> 02:18:29.860] some kernels that they were using. [02:18:29.860 --> 02:18:35.040] And then we were happy to see that even our data set [02:18:35.040 --> 02:18:37.400] that we optimized because it chunks your data [02:18:37.400 --> 02:18:41.120] and it can stream very quickly, worked for them. [02:18:41.120 --> 02:18:45.000] So it's kind of a mutual thing that we're doing. [02:18:45.000 --> 02:18:47.800] And also, all the quantization support. [02:18:47.800 --> 02:18:50.320] For example, right now, Fabric and PyTorch Lightning [02:18:50.320 --> 02:18:52.660] support bitsandbytes natively. [02:18:52.660 --> 02:18:57.420] And it's basically one of the few solutions [02:18:57.420 --> 02:19:01.440] where you can use quantization on any kind of model [02:19:01.440 --> 02:19:04.580] and not just the model that the original authors [02:19:04.580 --> 02:19:05.720] decided to support. [02:19:05.720 --> 02:19:07.200] Yeah, so it's kind of flexible. [02:19:07.200 --> 02:19:10.600] >> But here today, I think the main thing we're doing today [02:19:10.600 --> 02:19:12.040] is launching our platform. [02:19:12.040 --> 02:19:14.000] >> Yeah, you just launched Studio today. [02:19:14.000 --> 02:19:14.920] >> Yeah, exactly. [02:19:14.920 --> 02:19:18.280] Lightning Studio, again, is a result of many months [02:19:18.280 --> 02:19:20.520] and years of work. [02:19:20.520 --> 02:19:24.600] It basically lets you build AI at scale, [02:19:24.600 --> 02:19:26.160] but it feels like it's your laptop. [02:19:26.160 --> 02:19:30.680] So to me, it's kind of the first time I've seen a platform [02:19:30.680 --> 02:19:33.720] not leaking the abstraction of orchestration [02:19:33.720 --> 02:19:35.080] on the cloud and so on. [02:19:35.080 --> 02:19:38.000] Literally, there's nothing to learn, right? [02:19:38.000 --> 02:19:40.960] >> You put VS Code in the browser and then you add all that. [02:19:40.960 --> 02:19:43.060] >> You can even connect from your local VS Code [02:19:43.060 --> 02:19:43.900] and code there. [02:19:43.900 --> 02:19:45.920] >> You have the whole machine, it's a whole machine. [02:19:45.920 --> 02:19:47.720] >> Yeah, it's a cloud development environment. [02:19:47.720 --> 02:19:48.760] >> Exactly. [02:19:48.760 --> 02:19:51.360] And it's built around reproducible environment. [02:19:51.360 --> 02:19:52.200] Yeah, exactly.
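On the bitsandbytes support Luca mentions above, enabling quantization through Fabric looks roughly like the following in recent Lightning releases. The plugin name and arguments are written from memory and may differ by version, so treat this as a sketch and check the Lightning docs; it also assumes a CUDA GPU and the bitsandbytes package are available.

```python
import torch
from lightning.fabric import Fabric
from lightning.fabric.plugins import BitsandbytesPrecision  # name as of recent Lightning versions

# Quantize the model's Linear layers to 4-bit NF4 when Fabric sets the module up.
precision = BitsandbytesPrecision(mode="nf4", dtype=torch.bfloat16)
fabric = Fabric(accelerator="cuda", devices=1, plugins=precision)
fabric.launch()

# Any nn.Module works here; it does not have to be a model whose original
# authors shipped quantization support.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)
model = fabric.setup_module(model)  # weights get swapped for 4-bit bnb layers here
```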
[02:19:52.200 --> 02:19:54.520] But when you go in there, it's not that you need to build [02:19:54.520 --> 02:19:57.600] your Docker container, you just go in there, [02:19:57.600 --> 02:19:58.840] you present it with a machine, [02:19:58.840 --> 02:20:00.360] you can start working immediately. [02:20:00.360 --> 02:20:02.780] If you pip install something and then you decide [02:20:02.780 --> 02:20:06.240] to switch instance type, your dependencies will carry over. [02:20:06.240 --> 02:20:10.620] Or if you decide to duplicate my studio, [02:20:10.620 --> 02:20:12.360] everything that I set up on that studio, [02:20:12.360 --> 02:20:14.960] from the environment to the data, the code, [02:20:14.960 --> 02:20:18.120] the checkpoints eventually that I put there, [02:20:18.120 --> 02:20:21.140] you will find them and so you will spend zero time [02:20:21.140 --> 02:20:21.980] setting up your environment. [02:20:21.980 --> 02:20:23.520] >> So are you snapshotting memory? [02:20:23.520 --> 02:20:24.720] How does this work? [02:20:24.720 --> 02:20:27.080] >> Well, that's secret sauce. [02:20:27.080 --> 02:20:29.040] >> You're not using containers, you said. [02:20:29.040 --> 02:20:33.680] >> Yeah, well, I mean, we do, if you think about it, [02:20:33.680 --> 02:20:38.680] then it's not too complicated fundamentally, [02:20:38.680 --> 02:20:41.620] but it's very complicated to actually get [02:20:41.620 --> 02:20:43.280] the perfect experience out of it. [02:20:43.280 --> 02:20:45.720] >> Maybe describe your design constraints, [02:20:45.720 --> 02:20:47.840] what are you optimizing for? [02:20:47.840 --> 02:20:49.480] >> We're optimizing for velocity. [02:20:49.480 --> 02:20:53.160] So we don't want people to spend time thinking [02:20:53.160 --> 02:20:55.080] about things they shouldn't think about. [02:20:55.080 --> 02:20:57.600] Like when you're coding on a machine [02:20:57.600 --> 02:21:01.060] and you now want four GPUs, you should just be able [02:21:01.060 --> 02:21:03.600] to get four GPUs and keep working, right? [02:21:03.600 --> 02:21:06.680] Without thinking about, oh, now I need to go to a console, [02:21:06.680 --> 02:21:10.120] spin things up, for my environment, attach drives, [02:21:10.120 --> 02:21:12.800] like these are all things you shouldn't think about. [02:21:12.800 --> 02:21:17.320] And again, it goes back to limiting the surface area [02:21:17.320 --> 02:21:18.600] for mistakes, right? [02:21:18.600 --> 02:21:21.200] Because you can do what you're good at [02:21:21.200 --> 02:21:23.280] and not do what you shouldn't mess with. [02:21:23.280 --> 02:21:24.640] >> It's like the fabric philosophy [02:21:24.640 --> 02:21:26.540] that's expanded to the dev environment. [02:21:26.540 --> 02:21:28.160] >> Exactly, exactly. [02:21:28.160 --> 02:21:29.600] And so, yeah, we're very excited. [02:21:29.600 --> 02:21:32.080] You can do small things, like in Colab, [02:21:32.080 --> 02:21:34.960] except that your data is persistent [02:21:34.960 --> 02:21:37.980] and you can switch off and switch on [02:21:37.980 --> 02:21:39.480] and everything will be there. [02:21:40.480 --> 02:21:43.120] Or you can even train large language models. [02:21:43.120 --> 02:21:48.120] >> Yeah, what are the larger customers doing? [02:21:48.120 --> 02:21:49.320] What are you doing for them? [02:21:49.320 --> 02:21:51.080] Because I feel like this might be targeted [02:21:51.080 --> 02:21:52.960] towards the smaller customers. 
[02:21:52.960 --> 02:21:57.160] >> No, actually, we work with very big [02:21:57.160 --> 02:22:01.160] financial institutions and we're actually [02:22:01.160 --> 02:22:03.280] pre-training models ourselves. [02:22:03.280 --> 02:22:08.000] So the scale at which you can operate is pretty large. [02:22:08.000 --> 02:22:09.760] It's not like, it looks like something [02:22:09.760 --> 02:22:12.360] that you can do small stuff with, which is true. [02:22:12.360 --> 02:22:13.600] It's super smooth there. [02:22:13.600 --> 02:22:16.080] But if you need to launch a job on 100 GPUs, [02:22:16.080 --> 02:22:18.440] you can just do it, provided that you have the machines. [02:22:18.440 --> 02:22:22.780] But we manage reservations, so we can target reservations. [02:22:22.780 --> 02:22:26.040] Or you can attach your own cloud account [02:22:26.040 --> 02:22:28.880] and negotiate your quotas with your cloud provider [02:22:28.880 --> 02:22:31.340] and we'll just orchestrate on your cloud account. [02:22:31.340 --> 02:22:34.760] >> Yeah, any cloud providers you would shout out as, [02:22:34.760 --> 02:22:36.400] particularly, I mean, people know the big three clouds, [02:22:36.400 --> 02:22:38.320] but any other providers that you would shout out [02:22:38.320 --> 02:22:40.360] as very good partners to work with so far? [02:22:40.360 --> 02:22:42.680] >> Right now, we've been focusing on AWS. [02:22:42.680 --> 02:22:45.400] We'll expand, of course, because-- [02:22:45.400 --> 02:22:46.960] >> Yeah, everyone needs everyone else. [02:22:46.960 --> 02:22:47.800] >> Yeah, exactly. [02:22:47.800 --> 02:22:49.640] >> Apparently, Oracle's doing very well. [02:22:49.640 --> 02:22:51.700] >> Yeah, yeah, we talked to Oracle. [02:22:51.700 --> 02:22:54.840] We talked to most of the cloud providers out there. [02:22:54.840 --> 02:22:57.660] To us, it's more a matter of sequencing. [02:22:57.660 --> 02:22:59.240] We have a very good relationship, of course, [02:22:59.240 --> 02:23:00.880] with AWS right now. [02:23:00.880 --> 02:23:03.940] They've been supporting us for the launch and so on. [02:23:03.940 --> 02:23:07.680] But surely, we'll get into getting the best machines [02:23:07.680 --> 02:23:09.540] for our customers. [02:23:09.540 --> 02:23:10.520] >> Yeah. [02:23:10.520 --> 02:23:13.240] >> And in the near future, we'll also support [02:23:13.240 --> 02:23:16.120] on-prem clusters in terms of orchestration, [02:23:16.120 --> 02:23:19.220] like Slurm as an orchestrator or as a scheduler. [02:23:19.220 --> 02:23:21.400] >> People have mixed feelings about Slurm. [02:23:21.400 --> 02:23:22.720] >> Well, yeah, but in this case, [02:23:22.720 --> 02:23:23.820] you don't have to deal with it, right? [02:23:23.820 --> 02:23:24.660] >> Yes, yeah, yeah. [02:23:24.660 --> 02:23:27.960] >> We take away the pain and you still can orchestrate [02:23:27.960 --> 02:23:28.940] on top of that. [02:23:29.880 --> 02:23:34.880] It's still not out, but it will come in the near future. [02:23:34.880 --> 02:23:37.120] >> Yeah. [02:23:37.120 --> 02:23:39.460] >> We're already doing that with some companies. [02:23:39.460 --> 02:23:41.900] >> Yeah, so I want to talk about the workshop [02:23:41.900 --> 02:23:42.740] that you're doing on Friday. [02:23:42.740 --> 02:23:43.580] >> Yeah. [02:23:43.580 --> 02:23:44.500] >> The Efficiency Challenge. [02:23:44.500 --> 02:23:45.340] >> Yep. [02:23:45.340 --> 02:23:46.160] >> Was it motivated by a paper? [02:23:46.160 --> 02:23:48.260] I saw it's sort of like a cramming paper. 
[02:23:48.260 --> 02:23:50.940] What's the maximum you can do with one day of compute? [02:23:50.940 --> 02:23:52.020] Something like that? [02:23:52.020 --> 02:23:55.660] >> Yeah, so we know it is because they, [02:23:56.540 --> 02:24:00.000] Mark Saroufim and the other organizers [02:24:00.000 --> 02:24:03.280] ended up choosing LitGPT as one of the models [02:24:03.280 --> 02:24:05.800] for the starter kit, and we were happy about it, of course. [02:24:05.800 --> 02:24:06.640] >> Yeah, yeah. [02:24:06.640 --> 02:24:08.520] >> And so we said, yeah, what we can do together. [02:24:08.520 --> 02:24:12.360] And we ended up, and we really like the principle. [02:24:12.360 --> 02:24:16.160] So we believe smaller models can empower [02:24:16.160 --> 02:24:21.320] people a lot, getting control and understanding [02:24:21.320 --> 02:24:22.960] how to extract value from AI. [02:24:22.960 --> 02:24:26.280] And so I think there's a dire need [02:24:26.280 --> 02:24:29.800] for the consolidation, getting smaller, [02:24:29.800 --> 02:24:34.140] getting more efficiency, and getting the result you want [02:24:34.140 --> 02:24:36.080] in the shortest time as possible. [02:24:36.080 --> 02:24:39.240] And that's how the velocity will increase [02:24:39.240 --> 02:24:44.240] and how eventually open source will get there [02:24:44.240 --> 02:24:47.840] on par, if not beyond, what's available [02:24:47.840 --> 02:24:49.340] in the closed source world. [02:24:49.340 --> 02:24:52.560] So we are fully supportive of that. [02:24:52.560 --> 02:24:54.760] The way we ended up contributing is [02:24:54.760 --> 02:24:59.160] we maintained a public leaderboard, [02:24:59.160 --> 02:25:00.200] and it was a nice experience [02:25:00.200 --> 02:25:02.080] because we integrated with Discord. [02:25:02.080 --> 02:25:03.780] There was a Discord channel. [02:25:03.780 --> 02:25:06.560] >> This is for the Efficiency Challenge Discord? [02:25:06.560 --> 02:25:09.080] >> Yeah, exactly, the Efficiency Challenge Discord, [02:25:09.080 --> 02:25:11.600] and we set up an agent that was running [02:25:11.600 --> 02:25:16.240] on a few of our machines, and people could submit [02:25:16.240 --> 02:25:20.920] through a DM to the bot so that the bot [02:25:20.920 --> 02:25:25.440] would then spin up a job, run things in a queue, [02:25:25.440 --> 02:25:28.520] get back the results from evaluation, [02:25:28.520 --> 02:25:34.960] and then essentially get a ranking on where they were. [02:25:34.960 --> 02:25:38.360] And that, I think, helped a lot, [02:25:38.360 --> 02:25:41.560] motivating people to compete against each other, [02:25:41.560 --> 02:25:43.400] but in a very constructive way. [02:25:43.400 --> 02:25:45.920] And to be honest, in the first month, [02:25:45.920 --> 02:25:48.360] it's been very, very bumpy with that. [02:25:48.360 --> 02:25:50.200] It was all new infrastructure, [02:25:50.200 --> 02:25:52.300] and we were doing it in spare time, [02:25:52.300 --> 02:25:56.240] so it wasn't the best of the experience. [02:25:56.240 --> 02:25:59.800] So together with the community that was on there, [02:25:59.800 --> 02:26:02.800] they helped us figure out what was not going well, [02:26:02.800 --> 02:26:05.120] and I think at the end, we had more [02:26:05.120 --> 02:26:08.400] than 1,000 submissions that were successful.
[02:26:08.400 --> 02:26:10.720] Many more submissions that didn't complete [02:26:10.720 --> 02:26:14.680] because of submission problems, like user code problems, [02:26:14.680 --> 02:26:16.640] but there were more than 1,000 submissions [02:26:16.640 --> 02:26:20.260] that were actually fully evaluated on that leaderboard. [02:26:20.260 --> 02:26:23.200] - So the challenge is over, [02:26:23.200 --> 02:26:25.140] but I don't know if you've done the analysis [02:26:25.140 --> 02:26:28.400] on anything to learn from the winning entries. [02:26:28.400 --> 02:26:32.240] - So we've been, the rest of the organizers, yes, [02:26:32.240 --> 02:26:36.160] they have put a lot of effort in the next three weeks, [02:26:36.160 --> 02:26:38.960] four weeks, to reevaluate everything, [02:26:38.960 --> 02:26:42.600] run the first ones from scratch, [02:26:42.600 --> 02:26:44.520] and they've done an amazing job. [02:26:45.860 --> 02:26:47.400] And some of the code that we wrote [02:26:47.400 --> 02:26:49.360] for the public leaderboard ended up [02:26:49.360 --> 02:26:53.340] being part of this evaluation infrastructure. [02:26:53.340 --> 02:26:55.160] I was very, very busy with the launch. [02:26:55.160 --> 02:26:56.000] - Of course. [02:26:56.000 --> 02:26:57.280] - So I didn't participate there, [02:26:57.280 --> 02:26:58.120] so I'm super curious-- - I'll try to talk [02:26:58.120 --> 02:26:59.400] to Sebastien. - On Friday. [02:26:59.400 --> 02:27:00.640] - Yeah. - Yeah, yeah, yeah. [02:27:00.640 --> 02:27:01.960] - Okay. [02:27:01.960 --> 02:27:06.740] - What will be the details from the winners. [02:27:06.740 --> 02:27:07.580] - Yes. [02:27:07.580 --> 02:27:09.120] - But I must say that all the community [02:27:09.120 --> 02:27:12.720] has been super nice, they were super constructive. [02:27:12.720 --> 02:27:15.140] I remember when Mistral first came out, [02:27:15.140 --> 02:27:17.360] there was a huge thread of people [02:27:17.360 --> 02:27:19.080] just getting in there, analyzing it, [02:27:19.080 --> 02:27:21.840] trying to find it, it was so much energy [02:27:21.840 --> 02:27:25.280] that we definitely want to push it forward [02:27:25.280 --> 02:27:27.900] and we'll create public studios [02:27:27.900 --> 02:27:29.880] with evaluation frameworks on them. [02:27:29.880 --> 02:27:30.960] And we want to enable this kind of-- [02:27:30.960 --> 02:27:32.720] - So studios are shareable, of course. [02:27:32.720 --> 02:27:33.560] - Yes. - Yes, that makes sense. [02:27:33.560 --> 02:27:35.660] - Not only shareable between you and me, [02:27:35.660 --> 02:27:37.840] but also community-wide. - Yeah, yeah, yeah, yeah. [02:27:37.840 --> 02:27:39.420] - So there will be a lot of things, [02:27:39.420 --> 02:27:41.200] you can go in there, use your pre-credits [02:27:41.200 --> 02:27:43.640] to just run the evaluation on your model, [02:27:43.640 --> 02:27:44.480] you can do that. [02:27:44.480 --> 02:27:46.040] - Yeah, great. [02:27:46.040 --> 02:27:48.320] So last question, I've been asking everybody this. [02:27:48.320 --> 02:27:50.760] You've been coming to NeurIPS for many years, [02:27:50.760 --> 02:27:53.360] what are your NeurIPS tips? [02:27:53.360 --> 02:27:56.980] - Oh wow, yeah, go to posters, which I cannot do, but. [02:27:56.980 --> 02:27:58.440] (laughing) [02:27:58.440 --> 02:28:00.840] - But okay, so I've been to like, [02:28:00.840 --> 02:28:02.920] there's been three poster sessions so far. 
[02:28:02.920 --> 02:28:04.640] The popular ones are just crowded, [02:28:04.640 --> 02:28:05.480] there's just no way. [02:28:05.480 --> 02:28:07.080] - Yeah, I think there's not, yeah, [02:28:07.080 --> 02:28:08.640] I don't like to go to popular ones. [02:28:08.640 --> 02:28:10.560] - Yeah, okay, the less popular ones. [02:28:10.560 --> 02:28:11.480] - Yeah, yeah. - Just talk to them. [02:28:11.480 --> 02:28:15.400] - So I had so many super engaging conversations, [02:28:15.400 --> 02:28:18.720] even in topics like, even something is not apparently [02:28:18.720 --> 02:28:21.640] something that you should focus on. [02:28:21.640 --> 02:28:22.480] - Yeah. [02:28:22.480 --> 02:28:25.080] - Your brain will oxygenate itself a lot, [02:28:25.080 --> 02:28:27.120] and typically after these conferences, [02:28:27.120 --> 02:28:30.700] I always come back with a head full of ideas, [02:28:30.700 --> 02:28:35.700] and so I would say get enriched as much as you can [02:28:35.700 --> 02:28:36.760] by interacting with people, [02:28:36.760 --> 02:28:38.620] having very honest conversation with them. [02:28:38.620 --> 02:28:40.160] - Yeah. [02:28:40.160 --> 02:28:42.720] I had an off the record conversation with one [02:28:42.720 --> 02:28:44.120] of the presenters who said like, [02:28:44.120 --> 02:28:46.920] yeah, I don't, this paper I'm presenting, [02:28:46.920 --> 02:28:47.840] I don't believe in it. [02:28:47.840 --> 02:28:49.000] (laughing) [02:28:49.000 --> 02:28:51.120] I was like, wow, that's really honest. [02:28:51.120 --> 02:28:53.200] 'Cause they submitted it months ago, right? [02:28:53.200 --> 02:28:55.560] And since then, the world has moved on. [02:28:55.560 --> 02:28:59.660] - Yeah, well, that's part of the struggle. [02:28:59.660 --> 02:29:03.960] I don't know how it must be being a postdoc [02:29:03.960 --> 02:29:08.920] or a PhD student, or even a master's student, [02:29:08.920 --> 02:29:11.920] nowadays in AI, it must be so stressful. [02:29:11.920 --> 02:29:12.760] (laughing) [02:29:12.760 --> 02:29:13.580] - Yeah, it is. [02:29:13.580 --> 02:29:15.360] - Back in the day, it was a lot easier. [02:29:15.360 --> 02:29:16.680] - You know, the prices have got bigger. [02:29:16.680 --> 02:29:18.160] - Yeah, for sure, for sure. [02:29:18.160 --> 02:29:19.640] - Okay, well, thank you so much for your time, [02:29:19.640 --> 02:29:21.100] and congrats on your launch. [02:29:21.100 --> 02:29:23.700] - Yeah, I think we'll, you know, [02:29:23.700 --> 02:29:25.440] meet each other on the platform, maybe. [02:29:25.440 --> 02:29:27.160] - Yes, I definitely will try it out. [02:29:27.160 --> 02:29:28.680] - Thank you. - Thanks a lot, bye. [02:29:28.680 --> 02:29:31.400] - A few of my AI engineer and ML engineer friends [02:29:31.400 --> 02:29:32.660] checked out Lightning Studio, [02:29:32.660 --> 02:29:33.520] and they were pretty impressed, [02:29:33.520 --> 02:29:36.320] so I'm personally interested to check it out next year. [02:29:36.320 --> 02:29:37.680] But last but not least, [02:29:37.680 --> 02:29:40.080] I want to give the mic to Jay Alammar [02:29:40.080 --> 02:29:41.800] of Cohere and LLM University, [02:29:41.800 --> 02:29:44.180] but more importantly, of the Illustrated Transformer, [02:29:44.180 --> 02:29:46.000] and he is now writing a new book. [02:29:46.000 --> 02:29:48.360] We're here with Jay Alammar, educator of many things. [02:29:48.360 --> 02:29:50.320] I've learned so much from you.
[02:29:50.320 --> 02:29:52.440] I literally, it's one of those moments [02:29:52.440 --> 02:29:54.200] where at NeurIPS, you just kind of see someone walking, [02:29:54.200 --> 02:29:55.520] and I'm like, "Is that Jay?" [02:29:55.520 --> 02:29:57.800] And then I had to get your attention a few times, [02:29:57.800 --> 02:30:00.120] but it's so nice to finally meet you. [02:30:00.120 --> 02:30:02.700] - It's great to meet you, and great to be here, [02:30:02.700 --> 02:30:04.540] and sort of meet all kinds of brilliant folks. [02:30:04.540 --> 02:30:06.460] I've watched your stuff, [02:30:06.460 --> 02:30:09.360] and sort of been watching the revolution, [02:30:09.360 --> 02:30:11.520] and how you're helping sort of crystallize [02:30:11.520 --> 02:30:15.080] people's thinking about this new domain of AI engineering, [02:30:15.080 --> 02:30:18.240] and so I think the title is very helpful [02:30:18.240 --> 02:30:20.800] as categorizing that class. [02:30:20.800 --> 02:30:22.680] - Yeah, trying to do for my audience [02:30:22.680 --> 02:30:25.360] what you do for just general ML education, [02:30:25.360 --> 02:30:28.000] which is, I think, something that you've really done [02:30:28.000 --> 02:30:29.000] an incredible job of. [02:30:29.000 --> 02:30:30.480] - Yeah, no, it's wonderful. [02:30:30.480 --> 02:30:32.080] It's what the community needs, definitely. [02:30:32.080 --> 02:30:36.060] As machine learning and AI sort of goes out of research, [02:30:36.060 --> 02:30:37.660] and goes into industry, and people-- [02:30:37.660 --> 02:30:39.740] - It's a different persona, different background. [02:30:39.740 --> 02:30:40.580] There's the kind of people, [02:30:40.580 --> 02:30:42.100] and one of the reasons I'm doing this recording [02:30:42.100 --> 02:30:44.460] is the kind of people that follow my stuff [02:30:44.460 --> 02:30:48.180] don't come here, and maybe they shouldn't, right? [02:30:48.180 --> 02:30:50.720] Like, some of this is too in-depth. [02:30:50.720 --> 02:30:52.780] But I'm curious, you've been to many NeurIPS. [02:30:52.780 --> 02:30:54.460] What is your general take of the vibe? [02:30:54.460 --> 02:30:55.480] What are people talking about? [02:30:55.480 --> 02:30:56.660] What's top of mind? [02:30:56.660 --> 02:30:59.260] - There's a lot of LLMs that's interesting to see. [02:30:59.260 --> 02:31:00.540] - Suddenly a lot of interest, right? [02:31:00.540 --> 02:31:02.840] - Yes, yes, that is, let's say, [02:31:02.840 --> 02:31:06.960] maybe, possibly a new development in NeurIPS. [02:31:06.960 --> 02:31:09.800] That's the area that's growing. [02:31:09.800 --> 02:31:12.000] And a couple of interesting keywords, [02:31:12.000 --> 02:31:14.760] or groups, or directions are diffusion, [02:31:14.760 --> 02:31:16.640] even diffusion for text models. [02:31:16.640 --> 02:31:17.480] That's interesting. [02:31:17.480 --> 02:31:18.600] - Yeah, there was a paper on that yesterday. [02:31:18.600 --> 02:31:19.440] - Yeah. [02:31:19.440 --> 02:31:22.280] - I'm not too, what's the point of diffusion for text? [02:31:22.280 --> 02:31:24.840] Don't people want to stream things out? [02:31:24.840 --> 02:31:26.240] - Well, I mean, if you think of, [02:31:26.240 --> 02:31:28.080] like, on the application side, [02:31:28.080 --> 02:31:30.860] autoregressive generation has some problems. [02:31:30.860 --> 02:31:33.260] So if the model makes a mistake with token five, [02:31:33.260 --> 02:31:34.500] you're stuck with that problem. 
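A minimal sketch of the failure mode Jay describes here, assuming the Hugging Face transformers library and the public gpt2 checkpoint (neither is named in the episode): greedy autoregressive decoding appends one token at a time, and every later step conditions on whatever has already been emitted, so a bad pick at "token five" can never be revised.

    # Sketch only: greedy decoding with a small causal LM, showing that each
    # generated token becomes part of the fixed prefix for all later steps.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    ids = tokenizer("The Eiffel Tower is located in", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(10):
            logits = model(ids).logits          # shape: [1, seq_len, vocab]
            next_id = logits[0, -1].argmax()    # greedy choice for the next token
            # Once appended, this token is conditioned on by every later step;
            # if it was a poor choice, the model cannot go back and change it.
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(ids[0]))

Iterative schemes like text diffusion (or the search over branches that comes up next) are attempts to relax exactly this one-way commitment.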
[02:31:34.500 --> 02:31:36.460] - Yes, that's what Tree of Thought solves, [02:31:36.460 --> 02:31:38.180] which the Tree of Thought guy was here. [02:31:38.180 --> 02:31:40.940] - Yeah, so it's one, let's say, one avenue. [02:31:40.940 --> 02:31:41.780] - Yes. [02:31:41.780 --> 02:31:43.420] - But it's like, maybe if you, [02:31:43.420 --> 02:31:48.140] if the model does not fall into a mistake in that way, [02:31:48.140 --> 02:31:51.140] you can unlock new, sort of, different applications. [02:31:51.140 --> 02:31:53.140] But also all the image generation stuff, [02:31:53.140 --> 02:31:53.980] that's really where. [02:31:53.980 --> 02:31:55.580] - Yeah, well, I'll make a plug. [02:31:55.580 --> 02:31:57.500] I actually had a, so there's a lot of house parties [02:31:57.500 --> 02:31:59.700] that happen after NeurIPS, which is fantastic. [02:31:59.700 --> 02:32:01.820] I ran into a guy from Midjourney for the first time. [02:32:01.820 --> 02:32:04.380] They have this new storytelling section, [02:32:04.380 --> 02:32:06.020] and they are actually exploring text diffusion [02:32:06.020 --> 02:32:07.340] because of storytelling. [02:32:07.340 --> 02:32:09.400] Because you have to generate a coherent story, [02:32:09.400 --> 02:32:10.500] just like you would an image. [02:32:10.500 --> 02:32:11.340] - True, true. [02:32:11.340 --> 02:32:13.340] - So I would buy that as a use case. [02:32:13.340 --> 02:32:14.360] - That is fascinating. [02:32:14.360 --> 02:32:16.620] And then with the agent stuff, like, [02:32:16.620 --> 02:32:18.740] if you're interested in the future of agents, [02:32:18.740 --> 02:32:21.580] which is, there's a lot of the reasoning stuff. [02:32:21.580 --> 02:32:23.340] The reasoning research in NeurIPS [02:32:23.340 --> 02:32:26.180] will most likely, sort of, inform the upcoming, [02:32:26.180 --> 02:32:27.940] what's gonna happen in agents next. [02:32:27.940 --> 02:32:30.380] So, chain of thought, tree of thought, [02:32:30.380 --> 02:32:33.700] that domain of research, for me, is very fascinating. [02:32:33.700 --> 02:32:36.780] Because it's gonna be applied very quickly. [02:32:36.780 --> 02:32:39.460] The ReAct paper comes out, it's in LangChain, [02:32:39.460 --> 02:32:40.940] everybody's sort of using it. [02:32:40.940 --> 02:32:43.140] Everybody has a sense of what agents are. [02:32:43.140 --> 02:32:44.780] But that really shows you the potential [02:32:44.780 --> 02:32:46.440] of what they're gonna be in the future. [02:32:46.440 --> 02:32:49.140] We're still in early days on agents. [02:32:49.140 --> 02:32:51.860] - Any other, like, top of mind sessions? [02:32:51.860 --> 02:32:53.480] Did you go to the Chris Ré one this morning? [02:32:53.480 --> 02:32:54.660] I thought that was pretty cool. [02:32:54.660 --> 02:32:57.580] - I have that one and a couple of the other, sort of, [02:32:57.580 --> 02:32:59.820] keynotes, I'll be re-watching them. [02:32:59.820 --> 02:33:02.300] But mostly, I'm just talking with people, [02:33:02.300 --> 02:33:04.580] recording video, that's been my, [02:33:04.580 --> 02:33:06.180] and sort of trying to orient myself. [02:33:06.180 --> 02:33:08.340] It's an overwhelming amount of content, [02:33:08.340 --> 02:33:10.860] and people, and posters, and talks. [02:33:10.860 --> 02:33:13.260] And so, I've been, yeah, looking at visualizations of, [02:33:13.260 --> 02:33:15.220] you know, these are the papers at NeurIPS. [02:33:15.220 --> 02:33:17.100] These are the ones that could be interesting to you.
[02:33:17.100 --> 02:33:17.940] - Yeah, people have published, like, [02:33:17.940 --> 02:33:20.540] t-SNE things of them, and that's good. [02:33:20.540 --> 02:33:22.280] But, like, it's not as good as just, kind of, [02:33:22.280 --> 02:33:23.120] seeing the vibes. [02:33:23.120 --> 02:33:25.260] I actually think the conference organizers [02:33:25.260 --> 02:33:26.980] do a good job of curating, like, [02:33:26.980 --> 02:33:29.740] what the, you know, oral session papers should be. [02:33:29.740 --> 02:33:30.580] - True. [02:33:30.580 --> 02:33:31.980] - You know, like, I've generally found them, like, [02:33:31.980 --> 02:33:32.940] generally very insightful. [02:33:32.940 --> 02:33:34.020] I just found out about DataComp [02:33:34.020 --> 02:33:35.420] from one of the oral sessions. [02:33:35.420 --> 02:33:37.100] I don't know if you've seen them. [02:33:37.100 --> 02:33:38.980] They're effectively a new ImageNet. [02:33:38.980 --> 02:33:39.820] - Oh, nice. [02:33:39.820 --> 02:33:41.220] - Which is, like, oh, that's cool. [02:33:41.220 --> 02:33:42.060] New benchmark. [02:33:42.060 --> 02:33:43.260] And, yeah, I mean, like, it's, [02:33:43.260 --> 02:33:45.060] to me, I'm taking it all in. [02:33:45.060 --> 02:33:48.180] So, it's impressive how many people do so much work, [02:33:48.180 --> 02:33:49.340] and you've never heard of them. [02:33:49.340 --> 02:33:51.700] - That's true, that's true, yeah. [02:33:51.700 --> 02:33:53.200] - And they're conversant in all the techniques, [02:33:53.200 --> 02:33:54.680] all the papers, all the stuff that you see online. [02:33:54.680 --> 02:33:55.520] - Yeah, yeah. [02:33:55.520 --> 02:33:56.440] - They're just not online. [02:33:56.440 --> 02:33:57.280] - That's true. [02:33:57.280 --> 02:33:58.120] - And they just do research quietly, [02:33:58.120 --> 02:33:59.840] and then once a year, they show up here. [02:33:59.840 --> 02:34:01.960] - Yeah, and sometimes you meet somebody here, [02:34:01.960 --> 02:34:03.880] and you're, like, and they would mention, [02:34:03.880 --> 02:34:05.660] they worked on that other paper, [02:34:05.660 --> 02:34:07.680] and it's a paper that you're very familiar with. [02:34:07.680 --> 02:34:10.000] And then you go into their Google Scholar or something, [02:34:10.000 --> 02:34:11.480] and you're, like, I've been reading [02:34:11.480 --> 02:34:14.660] this person's work for years, [02:34:14.660 --> 02:34:16.320] but the name never really, sort of, [02:34:16.320 --> 02:34:18.760] specifically popped up until you meet them in person. [02:34:18.760 --> 02:34:20.080] So, that's why it's, yeah, [02:34:20.080 --> 02:34:22.540] it's definitely an interesting experience. [02:34:22.540 --> 02:34:24.400] - Yeah, no particular order, but have you had those, [02:34:24.400 --> 02:34:26.060] like, any underrated person you would call out [02:34:26.060 --> 02:34:28.020] as, like, hey, everyone should pay more attention [02:34:28.020 --> 02:34:30.020] to the work that this person's doing? [02:34:30.020 --> 02:34:31.980] - One thing that comes across, [02:34:31.980 --> 02:34:34.340] which is why workshops are good, [02:34:34.340 --> 02:34:36.340] and we can get into that, sort of, later, [02:34:36.340 --> 02:34:39.260] is David Bau's work on interpretability [02:34:39.260 --> 02:34:43.100] and editing language models and editing their knowledge [02:34:43.100 --> 02:34:45.900] was one thing that, sort of, really stood out to me [02:34:45.900 --> 02:34:49.540] after I've met David and, sort of, heard about his work.
[02:34:49.540 --> 02:34:51.200] - Editing by editing weights? [02:34:51.200 --> 02:34:54.460] - Yeah, they have a method of editing the model, exactly, yes. [02:34:54.460 --> 02:34:56.000] - Is this the one where they played Go? [02:34:56.000 --> 02:34:57.880] - This is ROME. - And they flipped a-- [02:34:57.880 --> 02:35:00.760] - This is where they convince a model [02:35:00.760 --> 02:35:03.340] using that method that the Eiffel Tower [02:35:03.340 --> 02:35:05.480] is in Rome and not in Paris. [02:35:05.480 --> 02:35:09.240] And then they have subsequent methods of, let's say, [02:35:09.240 --> 02:35:12.000] if you make 100 edits like that, the model degrades. [02:35:12.000 --> 02:35:14.000] So, they have subsequent work on, [02:35:14.000 --> 02:35:16.640] okay, this is a better method to do many more of that. [02:35:16.640 --> 02:35:19.940] But also things like, and I've seen, like, Logit Lens [02:35:19.940 --> 02:35:21.580] and, sort of, where in the model [02:35:21.580 --> 02:35:24.220] is this token being suggested? [02:35:24.220 --> 02:35:26.540] Like, is it at layer one or is it at layer five [02:35:26.540 --> 02:35:30.580] or is it at layer, that localization is interesting work. [02:35:30.580 --> 02:35:32.700] - Yeah, so you do all these interviews on your YouTube, [02:35:32.700 --> 02:35:33.580] we'll send people there. [02:35:33.580 --> 02:35:35.740] Is this part of your work at Cohere, or? [02:35:35.740 --> 02:35:36.580] - A little bit, yes. [02:35:36.580 --> 02:35:37.820] - How does this, what is your deal? [02:35:37.820 --> 02:35:40.140] - Yeah, that's true. [02:35:40.140 --> 02:35:44.740] So, these, a bunch of them go on the Cohere YouTube channel [02:35:44.740 --> 02:35:46.740] and the Cohere socials as well. [02:35:46.740 --> 02:35:48.420] So, yeah, my work at Cohere, [02:35:48.420 --> 02:35:50.340] I get to learn in public, basically. [02:35:50.340 --> 02:35:51.180] - I love that. [02:35:51.180 --> 02:35:53.580] - So, Cohere builds language models for embeddings, [02:35:53.580 --> 02:35:55.020] re-ranking, and generation. [02:35:55.020 --> 02:35:57.440] And through selling them, I get to see what, [02:35:57.440 --> 02:35:59.440] how industry's solving problems with them. [02:35:59.440 --> 02:36:01.680] And that, to me, is very fascinating. [02:36:01.680 --> 02:36:03.780] To see the technology coming out of research [02:36:03.780 --> 02:36:07.340] and then how it goes into industry and how people use them, [02:36:07.340 --> 02:36:09.580] how people, sort of, need to be educated [02:36:09.580 --> 02:36:11.540] on the best ways of using them. [02:36:11.540 --> 02:36:14.860] That view, to me, is something I'm lucky to have. [02:36:14.860 --> 02:36:17.260] - Yeah, yeah, it's a good job to get, to be honest. [02:36:17.260 --> 02:36:19.300] If you love that stuff, you might as well get paid to do it. [02:36:19.300 --> 02:36:20.140] You probably don't know this, [02:36:20.140 --> 02:36:21.860] but I actually have written a book on learning in public, [02:36:21.860 --> 02:36:24.520] and I am a big advocate of getting developers [02:36:24.520 --> 02:36:26.020] and engineers to learn in public. [02:36:26.020 --> 02:36:27.620] - Well, you do it so well, so, yeah. [02:36:27.620 --> 02:36:29.500] - Yeah, this is my way of doing it. [02:36:29.500 --> 02:36:30.780] One final piece is, you know, [02:36:30.780 --> 02:36:32.780] you've written a lot of foundational work on, [02:36:32.780 --> 02:36:33.620] like, transformers.
[02:36:33.620 --> 02:36:35.920] A lot of people are talking about the state-space models [02:36:35.920 --> 02:36:37.940] and what happens after the transformers. [02:36:37.940 --> 02:36:39.080] Do you have personal views on that? [02:36:39.080 --> 02:36:40.080] - Not yet, not yet. [02:36:40.080 --> 02:36:40.920] I'm on the lookout. [02:36:40.920 --> 02:36:43.540] So, there are always new ideas that, you know, [02:36:43.540 --> 02:36:46.080] there's maybe poster number 502 here [02:36:46.080 --> 02:36:47.860] that nobody paid attention to. [02:36:47.860 --> 02:36:50.540] Maybe in six months, we'll see that, [02:36:50.540 --> 02:36:53.460] oh, it crushes everything else on. [02:36:53.460 --> 02:36:56.220] So, that is always something you can never, sort of, expect. [02:36:56.220 --> 02:36:59.060] - Yeah, my favorite fact about the transformer paper, [02:36:59.060 --> 02:37:00.960] it itself was not accepted as, [02:37:00.960 --> 02:37:02.500] it was like a poster-only paper, right? [02:37:02.500 --> 02:37:03.340] - That's true. [02:37:03.340 --> 02:37:04.180] - I don't know the story behind that. [02:37:04.180 --> 02:37:06.600] - It was a big deal for machine translation. [02:37:06.600 --> 02:37:07.440] - Yes. [02:37:07.440 --> 02:37:08.260] - But it's like, okay, yeah, [02:37:08.260 --> 02:37:09.100] there's a cool translation paper. [02:37:09.100 --> 02:37:09.940] - It's one of many, right? [02:37:09.940 --> 02:37:10.780] - Yeah, they already have BERT. [02:37:10.780 --> 02:37:12.180] - One new attention method. [02:37:12.180 --> 02:37:14.740] We had Bahdanau attention, and we had Luong attention, [02:37:14.740 --> 02:37:17.020] and like, now we have also, you know, one more. [02:37:17.020 --> 02:37:18.300] But then BERT comes out, and it's like, [02:37:18.300 --> 02:37:19.860] okay, this is more than translation. [02:37:19.860 --> 02:37:21.060] And then GPT comes out, and it's like, [02:37:21.060 --> 02:37:22.260] oh, this can generate text. [02:37:22.260 --> 02:37:24.300] - I'm still missing a good survey paper [02:37:24.300 --> 02:37:26.460] on everything that happens since Attention Is All You Need. [02:37:26.460 --> 02:37:27.820] Like, the evolution towards [02:37:27.820 --> 02:37:30.140] the modern decoder-only paradigm. [02:37:30.140 --> 02:37:30.980] - Okay. [02:37:30.980 --> 02:37:32.820] - And I feel like someone needs to write that. [02:37:32.820 --> 02:37:34.420] (laughing) [02:37:34.420 --> 02:37:36.340] Everyone's too busy inventing new things [02:37:36.340 --> 02:37:37.980] to stop and write what happens. [02:37:37.980 --> 02:37:39.100] - Because it's a massive thing. [02:37:39.100 --> 02:37:41.180] There are a few people who, [02:37:41.180 --> 02:37:45.060] because there's a lot of work on different kinds of attention [02:37:45.060 --> 02:37:46.420] for the transformer, specifically. [02:37:46.420 --> 02:37:48.940] How to improve it for this problem, for that problem. [02:37:48.940 --> 02:37:52.060] But one thing that I'm doing is rewriting [02:37:52.060 --> 02:37:54.620] the Illustrated Transformer with the ideas [02:37:54.620 --> 02:37:57.060] that have stood the test of time since then. [02:37:57.060 --> 02:37:59.820] So it's like, six years after, which ideas. [02:37:59.820 --> 02:38:01.180] So people are using RoPE-- [02:38:01.180 --> 02:38:02.140] - FlashAttention. [02:38:02.140 --> 02:38:03.980] - FlashAttention. - ALiBi.
[02:38:03.980 --> 02:38:05.820] - And then, yeah, RoPE and ALiBi, [02:38:05.820 --> 02:38:08.540] let's say positional encodings, localized attention. [02:38:08.540 --> 02:38:10.980] So some ideas that people are continuing [02:38:10.980 --> 02:38:12.780] to use over and over. - Grouped query. [02:38:12.780 --> 02:38:15.260] - Multi-query and grouped multi-query, yes. [02:38:15.260 --> 02:38:16.580] - And then sliding window. [02:38:16.580 --> 02:38:18.980] - Not yet, so it's in Mistral, [02:38:18.980 --> 02:38:21.860] but we maybe need to see it in more work. [02:38:21.860 --> 02:38:23.900] - My conspiracy theory about Mistral is that, [02:38:23.900 --> 02:38:25.540] so the Mistral paper heavily features [02:38:25.540 --> 02:38:27.740] sliding window attention, and everyone is like, bullshit. [02:38:27.740 --> 02:38:28.780] Like, come on. [02:38:28.780 --> 02:38:30.260] - I mean, because you see it. [02:38:30.260 --> 02:38:32.340] You saw that for two years after the transformer, [02:38:32.340 --> 02:38:33.780] everybody was proposing new ideas. [02:38:33.780 --> 02:38:36.380] And if you put this in the transformer, [02:38:36.380 --> 02:38:37.420] it does better on this. [02:38:37.420 --> 02:38:38.980] But then, what stands the test of time? [02:38:38.980 --> 02:38:42.740] The vanilla transformer really stood the test of time, [02:38:42.740 --> 02:38:44.740] and did better than even a lot of these [02:38:44.740 --> 02:38:47.020] "enhanced" enhancements. [02:38:47.020 --> 02:38:49.340] But these ones, let's say, stood the test of time. [02:38:49.340 --> 02:38:52.940] So this rewriting is gonna be part of the book [02:38:52.940 --> 02:38:54.940] I'm sort of currently writing. [02:38:54.940 --> 02:38:55.780] - Oh, you're writing a book? [02:38:55.780 --> 02:38:57.300] - Yes, writing a book for O'Reilly [02:38:57.300 --> 02:39:00.380] called Hands-On Large Language Models, [02:39:00.380 --> 02:39:01.740] including this as a chapter. [02:39:01.740 --> 02:39:04.460] So if you want an updated Illustrated Transformer, [02:39:04.460 --> 02:39:05.300] that's gonna be a part of it. [02:39:05.300 --> 02:39:06.300] - Yeah, well, when you launch your book, [02:39:06.300 --> 02:39:08.180] you should come on and do a full episode with us. [02:39:08.180 --> 02:39:09.020] - That'd be amazing. [02:39:09.020 --> 02:39:10.060] - Yeah, exactly. [02:39:10.060 --> 02:39:11.860] And then, just general NeurIPS tips, you know, [02:39:11.860 --> 02:39:13.900] as an attendee, like, if people are coming [02:39:13.900 --> 02:39:15.980] for the first time, what would you advise them to do? [02:39:15.980 --> 02:39:18.420] - I really love the visualization by, [02:39:18.420 --> 02:39:21.860] I posted about this, by Hendrik Strobelt and Ben Hoover, [02:39:21.860 --> 02:39:24.180] of the t-SNE of all the papers. [02:39:24.180 --> 02:39:26.860] But also, it's clustered, so if you're interested [02:39:26.860 --> 02:39:29.180] in language models, that is clustered. [02:39:29.180 --> 02:39:30.300] - I use that for my planning. [02:39:30.300 --> 02:39:31.900] - It's so useful. [02:39:31.900 --> 02:39:33.900] Like, these things are absolutely incredible. [02:39:33.900 --> 02:39:35.940] I got to meet Hendrik, and they have so, [02:39:35.940 --> 02:39:38.180] a lot of very interesting ideas there. [02:39:38.180 --> 02:39:40.540] It helps you sort of orient yourself.
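For anyone who wants to build the kind of paper map being described, a rough sketch, assuming the sentence-transformers and scikit-learn packages plus your own list of paper titles or abstracts (the three titles below are just stand-ins): embed the text, project it to 2D with t-SNE, and plot the clusters.

    # Sketch only: embed paper titles/abstracts, then project with t-SNE.
    from sentence_transformers import SentenceTransformer
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    papers = [
        "Tree of Thoughts: Deliberate Problem Solving with Large Language Models",
        "Direct Preference Optimization: Your Language Model is Secretly a Reward Model",
        "QLoRA: Efficient Finetuning of Quantized LLMs",
    ]  # in practice, thousands of accepted-paper abstracts

    model = SentenceTransformer("all-MiniLM-L6-v2")   # small general-purpose embedder
    embeddings = model.encode(papers)                 # shape: (n_papers, 384)

    # perplexity must be smaller than the number of samples; with a real
    # corpus the default value is fine
    coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(embeddings)

    plt.scatter(coords[:, 0], coords[:, 1])
    for (x, y), title in zip(coords, papers):
        plt.annotate(title[:30], (x, y), fontsize=7)
    plt.title("Toy t-SNE map of paper embeddings")
    plt.show()

The same embeddings can also back the semantic search mentioned next, since a query like "agent papers" can be matched by vector similarity rather than exact keywords.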
[02:39:40.540 --> 02:39:42.780] And I've also seen work, kind of like it, [02:39:42.780 --> 02:39:45.580] but where you can do semantic search on, [02:39:45.580 --> 02:39:48.060] so you can say, you know, agent papers, [02:39:48.060 --> 02:39:51.460] and it doesn't need to match the actual keywords. [02:39:51.460 --> 02:39:53.700] With Cohere, we have a demo on RAG [02:39:53.700 --> 02:39:55.060] on NeurIPS papers as well. [02:39:55.060 --> 02:39:56.700] So, you can ask a question, you're like, [02:39:56.700 --> 02:39:58.780] okay, I'm interested in LLM and efficiency, [02:39:58.780 --> 02:40:01.060] it'll say, okay, this paper, this paper, this paper. [02:40:01.060 --> 02:40:03.500] And it's retrieval augmented sort of generation. [02:40:03.500 --> 02:40:04.460] So these are the three tools, [02:40:04.460 --> 02:40:08.100] but I think we need a lot more of these tools [02:40:08.100 --> 02:40:09.660] to make sense of this. [02:40:09.660 --> 02:40:11.260] - I need it for the meetups too. [02:40:11.260 --> 02:40:12.700] You know, in the conference app, [02:40:12.700 --> 02:40:14.860] there's all these meetups for very specific things. [02:40:14.860 --> 02:40:15.780] - That's true. [02:40:15.780 --> 02:40:18.180] - I started one for Singaporeans, [02:40:18.180 --> 02:40:19.540] 'cause I'm a Singaporean in tech. [02:40:19.540 --> 02:40:22.340] And yeah, there's a bunch of very, very specific, [02:40:22.340 --> 02:40:24.740] like running meetups, nothing to do with tech specifically, [02:40:24.740 --> 02:40:26.980] but this is also a social event, right? [02:40:26.980 --> 02:40:27.820] Like, that you're meeting-- [02:40:27.820 --> 02:40:28.660] - Okay, yeah. [02:40:28.660 --> 02:40:30.540] You wouldn't happen to be at EMNLP. [02:40:30.540 --> 02:40:31.380] - No, why? [02:40:31.380 --> 02:40:32.220] - 'Cause some people did that, [02:40:32.220 --> 02:40:34.220] because it was like last week [02:40:34.220 --> 02:40:36.620] and some people went to EMNLP in Singapore [02:40:36.620 --> 02:40:38.100] and then flew back here. [02:40:38.100 --> 02:40:39.780] - That's a tough call. [02:40:39.780 --> 02:40:41.020] Yeah, I'm not gonna do that. [02:40:41.020 --> 02:40:42.660] - That's rough, that's rough. [02:40:42.660 --> 02:40:43.500] - Well, thanks very much. [02:40:43.500 --> 02:40:44.660] It's a pleasure to have you on, [02:40:44.660 --> 02:40:45.500] pleasure to meet in person. [02:40:45.500 --> 02:40:47.140] - So good to meet, love your work. [02:40:47.140 --> 02:40:47.980] - Thank you. [02:40:47.980 --> 02:40:48.820] - Keep doing it. [02:40:48.820 --> 02:40:50.700] - Any calls to action for people while you're here? [02:40:50.700 --> 02:40:54.100] - Well, I'm JayAlammar on Twitter and YouTube, [02:40:54.100 --> 02:40:57.500] and we have LLM University, LLM.University. [02:40:57.500 --> 02:41:00.620] Like I collaborate with Luis and Meor Amer. [02:41:00.620 --> 02:41:02.860] - Yeah, some of the best YouTube, [02:41:02.860 --> 02:41:06.100] like very short, but like very comprehensive, authoritative. [02:41:06.100 --> 02:41:08.980] - I'm very lucky to collaborate with these folks. [02:41:08.980 --> 02:41:09.820] - Yeah. [02:41:09.820 --> 02:41:11.500] - It's incredible, but yeah, thanks. [02:41:11.500 --> 02:41:12.740] - Yeah, thanks for doing all that. [02:41:12.740 --> 02:41:13.580] Thank you. [02:41:13.580 --> 02:41:14.420] - Appreciate it. [02:41:14.420 --> 02:41:17.140] - Okay, and that's it for our NeurIPS coverage [02:41:17.140 --> 02:41:20.820] and for Latent Space Pod in 2023.
[02:41:20.820 --> 02:41:22.660] We are still doing a listener survey. [02:41:22.660 --> 02:41:24.980] So if you are listening through here, [02:41:24.980 --> 02:41:26.180] you're definitely a big fan. [02:41:26.180 --> 02:41:27.980] We definitely want to hear from you. [02:41:27.980 --> 02:41:29.100] What do you like about the podcast? [02:41:29.100 --> 02:41:30.900] What do you want to hear for 2024? [02:41:30.900 --> 02:41:33.740] We've got a couple of really good episodes already recorded [02:41:33.740 --> 02:41:35.180] for the start of 2024. [02:41:35.180 --> 02:41:36.540] So we're going to start the year strong [02:41:36.540 --> 02:41:39.180] and come out to the one year anniversary of Latent Space. [02:41:39.180 --> 02:41:40.860] So thanks for all your support. [02:41:40.860 --> 02:41:43.820] Have a wonderful end of the year, and we'll see you soon. [02:41:43.820 --> 02:41:45.180] DJ, hit the outro. [02:41:46.140 --> 02:41:48.740] (upbeat music) [02:41:48.740 --> 02:41:51.340] (upbeat music) [02:41:51.340 --> 02:41:53.660] (upbeat music)