
The Winds of AI Winter (Q2 Four Wars of the AI Stack Recap)


Chapters

0:00 Intro Song by Suno.ai
2:01 Swyx and Alessio in Singapore
5:49 GPU Rich vs Poors: Frontier Labs
6:35 GPU Rich Frontier Models: Claude 3.5
10:37 GPU Rich helping Poors: Llama 3.1: The Synthetic Data Model
15:41 GPU Rich helping Poors: Frontier Labs Vibe Shift - Phi 3, Gemma 2
18:26 GPU Rich: Mistral Large
21:56 GPU Rich: Nvidia + FlashAttention 3
23:45 GPU Rich helping Poors: Noam Shazeer & Character.AI
28:14 GPU Poors: On Device LLMs: Mozilla Llamafile, Chrome (Gemini Nano), Apple Intelligence
35:33 Quality Data Wars: NYT vs The Atlantic lawyer up vs partner up
37:41 Quality Data Wars: Reddit, ScarJo, RIAA vs Udio & Suno
41:03 Quality Data Wars: Synthetic Data, Jagged Intelligence, AlphaProof
45:33 Multimodality War: ChatGPT Voice Mode, OpenAI demo at AIEWF
47:34 Multimodality War: Meta Llama 3 multimodality + Chameleon
50:54 Multimodality War: PaliGemma + CoPaliGemma
52:55 Renaming RAG/Ops War to LLM OS War
55:31 LLM OS War: Ops War: Prompt Management vs Gateway vs Observability
62:57 LLM OS War: BM42 Vector DB Wars, Memory Databases, GraphRAG
66:15 LLM OS War: Agent Tooling
68:26 LLM OS War: Agent Protocols
70:43 Trend: Commoditization of Intelligence
76:45 Trend: Vertical Service as Software, AI Employees, Brightwave, Dropzone
80:44 Trend: Benchmark Frontiers after MMLU
83:31 Crowdstrike will save us from Skynet

Whisper Transcript

00:00:00.000 | (upbeat music)
00:00:02.580 | - Hey, everyone.
00:00:05.020 | Welcome to the Latent Space Podcast.
00:00:06.700 | This is Alessio, Partner and CTO-in-Residence
00:00:08.440 | at Decibel Partners.
00:00:10.260 | And today we're in the Singapore studio with Swyx.
00:00:13.780 | - Hey, this is our long-awaited one-on-one episode.
00:00:18.100 | I don't know how long ago the previous one was.
00:00:21.460 | Do you remember?
00:00:22.660 | Three, four months now?
00:00:23.500 | - Yeah, it's been a while.
00:00:25.780 | - People really enjoyed it.
00:00:26.620 | It's just really, I think our travel schedules
00:00:28.660 | have been really difficult to get this stuff together.
00:00:31.180 | And then we also had like a decent backlog
00:00:32.980 | of guests for a while.
00:00:34.500 | I think we've kind of depleted that backlog now
00:00:36.260 | and we need to build it up again.
00:00:37.540 | (laughing)
00:00:38.900 | But it's been busy and there's been a lot of news.
00:00:40.980 | So we actually get to do this like sort of rapid fire thing.
00:00:44.260 | I think some people, you know,
00:00:45.260 | the podcast has grown a lot in the last six months.
00:00:48.020 | Maybe just reintroducing like what you're up to,
00:00:50.780 | what I'm up to, and why we're here in Singapore
00:00:54.180 | and stuff like that.
00:00:55.020 | - Yeah, my first time here in Singapore,
00:00:56.900 | which has been really nice.
00:00:58.420 | This country is really amazing, I would say.
00:01:00.620 | First of all, everything feels like
00:01:02.900 | the busiest part of the city.
00:01:04.220 | Everything is skyscrapers.
00:01:05.420 | There's like plants in all the buildings,
00:01:07.460 | or at least in the areas that I've been in,
00:01:09.300 | which has been awesome.
00:01:10.140 | And I was at one of the offices kind of on the south side
00:01:13.220 | and from the 38th floor, you can see Indonesia on one side
00:01:17.340 | and you can see Malaysia on the other side.
00:01:20.300 | So it's quite small.
00:01:22.140 | One of the people there said their kid goes to school
00:01:24.700 | at the border with Malaysia, basically,
00:01:26.220 | so they could drive to Malaysia every day.
00:01:27.740 | (all laughing)
00:01:28.580 | So they could go pick her up from school.
00:01:29.900 | Yeah, and we came here, we hosted with you
00:01:32.380 | the Sovereign AI Summit Wednesday night.
00:01:34.380 | We had a lot of-
00:01:35.980 | - NVIDIA, Goldman, Temasek, Singtel.
00:01:38.420 | - GSE, Singtel.
00:01:40.180 | And we got to talk about this trend of Sovereign AI,
00:01:42.580 | which maybe we might cover on another episode,
00:01:44.860 | but basically how do you drive, if you're a country,
00:01:47.700 | how do you drive productivity growth
00:01:49.300 | in a time where populations are shrinking,
00:01:51.140 | the workforce is shrinking,
00:01:52.220 | and AI can kind of supplement a lot of this.
00:01:54.500 | And then the question is, okay,
00:01:56.460 | should I put all this money in foundation models?
00:01:58.660 | Should I put it in data centers and infrastructure?
00:02:00.980 | Should I put it in GPUs?
00:02:02.420 | Should I put it in agents and whatnot?
00:02:04.340 | So we'll touch on some of these trends in the episode,
00:02:07.260 | but it was a fun event.
00:02:08.740 | And I did not expect some of the most senior people
00:02:11.940 | at the largest financial institution in Singapore
00:02:13.940 | to ask about state space models and some of the alternatives.
00:02:17.220 | So it's great to see how advanced
00:02:19.620 | the conversation is sometimes.
00:02:21.660 | - Yeah, I think that that is mostly people trying
00:02:25.380 | to listen to jargon that is being floated around
00:02:29.300 | as like, oh, what could kill transformers?
00:02:30.620 | And then they jump straight there
00:02:32.060 | without actually exploring the fundamentals,
00:02:34.420 | the basics of what they will actually put to work.
00:02:36.500 | That's fine, it's a forum to ask questions.
00:02:38.660 | So you wanna ask about the future,
00:02:40.500 | but I feel like it's not very practical
00:02:42.700 | to spend so much time on those things.
00:02:45.420 | Part of the things that I do with Latent Space,
00:02:46.980 | especially when I travel, is to try to ask questions
00:02:51.300 | about what countries that are not the US
00:02:54.340 | and not San Francisco can do,
00:02:55.900 | because everyone feels a bit left out.
00:02:58.220 | You feel it here as well.
00:03:00.260 | And I'm trying to promote alternatives.
00:03:02.620 | I think AI engineering is one way that countries
00:03:04.740 | can capitalize on the industry
00:03:06.700 | without building a hundred billion dollar cluster,
00:03:08.540 | which is one fifth the GDP of Singapore.
00:03:10.820 | And so my pitch at the summit was that we would
00:03:19.100 | make Singapore the AI engineer nation.
00:03:20.740 | We're also working on bringing the AI Engineer conference
00:03:23.820 | to Singapore next year, together with ICLR.
00:03:25.580 | So yeah, we're just trying my best
00:03:27.940 | and I'm being looped into various government meetings
00:03:31.100 | to try to make that happen.
00:03:32.620 | - Well, we'll definitely be here next year.
00:03:34.580 | We'll be, I'll be back here very often.
00:03:37.220 | It's really nice.
00:03:38.380 | - Yeah, awesome.
00:03:39.300 | Okay, well, we have a lot of news.
00:03:42.220 | How do you think we should cover?
00:03:43.940 | - Maybe just recap since the framework of the four wars
00:03:48.020 | of AI is something that came up end of last year.
00:03:51.700 | So basically, we'll link in the show notes,
00:03:53.860 | but the end of year recap for 2023
00:03:56.660 | was basically the four wars of AI,
00:03:59.020 | which we picked GPU-rich versus GPU-poor,
00:04:02.540 | the data quality wars, the multimodality wars,
00:04:05.260 | and the RAG/Ops wars.
00:04:08.100 | So usually everything falls back under those four categories.
00:04:11.820 | So I'm pretty happy that seven months later,
00:04:14.660 | it's something that still matters.
00:04:15.980 | - It still kind of holds up.
00:04:17.100 | - Yeah, most AI stuff from eight months ago,
00:04:19.620 | it's really not that relevant anymore.
00:04:22.020 | And today, we'll try and bucket
00:04:24.940 | some of the recent news on it.
00:04:26.580 | We haven't done a monthly thing in like three months.
00:04:30.100 | So three months is a lot of stuff.
00:04:32.260 | - That's mostly because I got busy with the conference.
00:04:35.100 | But I do want to, actually,
00:04:38.300 | I do want to get back on that horse,
00:04:40.300 | or maybe just do it weekly
00:04:41.580 | so that I don't have such a big lift that I don't do it.
00:04:44.020 | I think the activation energy is the problem, really.
00:04:47.420 | So yeah, I think frontier model-wise,
00:04:50.380 | it seems like Cloud has really carved out
00:04:53.460 | a persistent space for itself.
00:04:55.500 | For a long time, I thought it was kind of like
00:04:56.940 | a clear number two to OpenAI.
00:04:58.900 | And with 3.5 Sonnet,
00:05:00.780 | at least in some of the hard benchmarks on LMSYS,
00:05:04.180 | or coding benchmarks on LMSYS,
00:05:05.700 | it is the undisputed number one model in the world,
00:05:08.340 | even with GPT-4o mini.
00:05:10.100 | And we can talk about 4o mini and benchmarking later on,
00:05:12.220 | but for Claude to be there and hold that position
00:05:14.740 | for what is more than a month now in AI time is a big deal.
00:05:19.740 | There's not much that people know publicly
00:05:22.540 | about what Anthropic did for Claude 3.5 Sonnet,
00:05:26.100 | but I think it's still a huge achievement.
00:05:28.220 | It marks the beginning of a non-OpenAI-centric world
00:05:31.580 | to the point where the people on Twitter
00:05:33.260 | have canceled ChatGPT.
00:05:35.420 | That's been a trend that's been going on for a while.
00:05:37.180 | We talked about the unbundling of ChatGPT.
00:05:39.580 | But now, new open source projects and tooling,
00:05:42.420 | they're just built for Claude.
00:05:43.620 | They don't even use OpenAI.
00:05:45.220 | That's a strategic threat to OpenAI, I think, a little bit.
00:05:48.940 | Obviously, OpenAI is so big
00:05:50.140 | that it doesn't really care about that.
00:05:51.780 | But for Anthropic, it's a big win.
00:05:53.500 | I think to see that going
00:05:55.740 | and to see Anthropic differentiating itself
00:05:57.540 | and actually implementing research.
00:05:59.900 | So the rumor is that the Scaling Monosemanticity paper
00:06:03.140 | that they put out two months ago
00:06:05.580 | was a big part of Claude 3.5 Sonnet.
00:06:08.660 | I've had off-the-record chats with people about that idea,
00:06:11.780 | and they don't agree that it is the only cause.
00:06:14.740 | So I was thinking this is the only thing that they did.
00:06:17.340 | But people say that there's about four or five other tricks
00:06:21.380 | that they haven't disclosed yet
00:06:22.460 | that went into 3.5 Sonnet.
00:06:23.940 | But the Scaling Monosemanticity paper
00:06:25.860 | is a very, very good read.
00:06:26.820 | It's a very long read.
00:06:28.060 | But it basically says that you can find control vectors,
00:06:31.660 | control features now that you can turn on
00:06:34.100 | to make it better at code without really retraining it.
00:06:37.180 | You just train a whole bunch of sparse autoencoders,
00:06:39.580 | find a bunch of features, and just say,
00:06:41.660 | let's up those features,
00:06:43.180 | and suddenly you're better at code,
00:06:44.820 | or suddenly you care a lot about the Golden Gate Bridge.
00:06:47.220 | These are the same things to the model.
00:06:48.580 | That is a huge, huge win for interpretability
00:06:51.020 | because up to now we were only doing interpretability
00:06:54.940 | on toy models, like a few million parameters,
00:06:58.220 | a model of Go or chess or whatever.
00:07:00.020 | Claude 3.5 Sonnet was interpreted
00:07:01.980 | and usefully improved using this technique.
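
For a sense of the mechanics, here is a toy sketch of SAE-style feature steering, with made-up layer sizes and a hypothetical feature index; the real setup is only described in Anthropic's paper.

```python
import torch

# Toy sketch of sparse-autoencoder (SAE) feature steering.
# An SAE is trained to reconstruct a layer's residual-stream activations
# through an overcomplete, sparse feature basis; steering scales one feature.
d_model, n_features = 512, 8192  # made-up sizes for illustration

encoder = torch.nn.Linear(d_model, n_features)
decoder = torch.nn.Linear(n_features, d_model)

def steer(resid: torch.Tensor, feature_idx: int, scale: float) -> torch.Tensor:
    feats = torch.relu(encoder(resid))  # sparse feature activations
    feats[..., feature_idx] *= scale    # turn up, say, a "good code" feature
    return decoder(feats)               # steered activations; no retraining
```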
00:07:06.220 | - Yeah, I think it would be amazing
00:07:07.620 | if we could replicate the same on the open models,
00:07:10.420 | because now we can use Llama 3.1
00:07:12.100 | to generate synthetic data for training and fine tuning.
00:07:15.620 | I think obviously Anthropic has a lot of compute
00:07:17.980 | and a lot of money.
00:07:18.820 | So once they figure out, okay,
00:07:20.540 | this is what we should make the model better at,
00:07:22.660 | they can kind of like put a lot of resources.
00:07:24.540 | I think an open source is probably gonna be
00:07:26.700 | a more distributed effort.
00:07:28.660 | Like I feel like Nous has held the crown
00:07:30.820 | of like the best fine-tuning dataset owners for a while,
00:07:33.460 | but at some point that should change, hopefully.
00:07:36.500 | Like other groups should step up.
00:07:38.740 | And I think if we can apply the same principles
00:07:41.060 | to like a model as big as 405B
00:07:43.460 | and bring them into like maybe the 7B form factor,
00:07:46.500 | that would be great.
00:07:47.340 | But yeah, Claude is great.
00:07:48.220 | I canceled ChatGPT a while ago.
00:07:50.260 | We have a really small podcaster app we run for Latent Space.
00:07:53.300 | It runs both on Claude and on OpenAI,
00:07:55.500 | and Claude is definitely better most of the time.
00:07:58.060 | It's not a benchmark, it's just vibes,
00:07:59.700 | but when the vibes are good, the vibes are good.
00:08:02.060 | - We run most of the AI News summaries on Claude as well.
00:08:05.140 | - But, and I always run it against OpenAI.
00:08:06.860 | Sometimes OpenAI wins.
00:08:08.020 | I do a daily comparison,
00:08:10.180 | but yeah, Claude is very strong at summarization
00:08:12.340 | and instruction following,
00:08:13.180 | which is something I care a lot about.
00:08:14.380 | So when you talk about frontier models,
00:08:16.420 | MMLU no longer cuts it, right?
00:08:18.020 | Like we have reached like 92 on MMLU.
00:08:20.940 | It's going to like 95, 97.
00:08:22.980 | It just means you're memorizing MMLU.
00:08:24.540 | Like there's some fundamental irreducible level of mistakes
00:08:28.580 | because of MMLU's quality.
00:08:30.580 | We talked about this with Clementine
00:08:32.540 | on the Hugging Face episode.
00:08:34.540 | And so we need to see what else,
00:08:37.220 | what is the next frontier?
00:08:38.300 | I think there are 10 directions that I outlined below,
00:08:40.900 | but we'll talk about that later.
00:08:42.060 | Yeah, should we move on to number three?
00:08:43.660 | - Yeah, Llama 3.1, I guess that too.
00:08:46.140 | We should make sure to properly differentiate between the models.
00:08:49.620 | But yeah, we have a whole episode with Thomas Scialom
00:08:52.660 | from the Meta team, which was really, really good.
00:08:56.340 | And I'm glad we got the podcast to come out
00:08:57.860 | at the same time as the model.
00:08:59.140 | - Yeah, I think we're the only ones to coordinate
00:09:01.340 | for the paper release for the big launch,
00:09:03.380 | the 405B launch.
00:09:04.500 | Zuck did a few interviews,
00:09:05.700 | but we're the only ones
00:09:06.540 | that did the technical team interview.
00:09:07.940 | - Yeah, yeah, yeah.
00:09:09.060 | I mean, they were like surfing or something
00:09:11.980 | with the Bloomberg person.
00:09:13.500 | We should get invited to surf with Zuck.
00:09:15.220 | - I would, yeah, I would be down to.
00:09:16.740 | - And for the audience, we did the technical breakdown.
00:09:19.820 | - So behind the scenes, you know, for listeners,
00:09:22.500 | one thing that we have tension about is who do we invite?
00:09:25.260 | Because obviously if we get Mark Zuckerberg,
00:09:26.940 | it'll be a big name,
00:09:27.780 | then it will cause people to download us more,
00:09:30.300 | but it will be a less technical interview
00:09:31.620 | because he's not on the research team.
00:09:33.460 | He's CEO of Meta.
00:09:36.060 | And so I think it's this constant back and forth.
00:09:38.220 | Like we want to grow as a podcast,
00:09:39.380 | but we want to serve a technical audience.
00:09:40.660 | And we're trying to do that, thread that needle,
00:09:42.900 | because our currency as podcasters
00:09:45.020 | is the people that listen to it.
00:09:46.780 | And we need big names,
00:09:48.060 | but we also need to serve our audience well.
00:09:50.060 | And I think if we don't do it well,
00:09:52.740 | this actually goes all the way back to George Hotz.
00:09:54.500 | When after he finished recording with us,
00:09:56.580 | he said, "You have two paths in the podcast world.
00:09:59.500 | Either you go be Lex Fridman or you stay small and niche."
00:10:04.100 | And we definitely like, we like our niche.
00:10:06.380 | We think it's a good niche.
00:10:07.820 | It's going to grow.
00:10:08.860 | But at the same time, I still want us to grow.
00:10:10.820 | I want us to grow on YouTube, right?
00:10:12.300 | And so that's always a meta thing.
00:10:15.260 | Not to get too meta.
00:10:16.220 | - Not that meta, the other Meta.
00:10:18.820 | - Yeah, so Llama 3, yeah.
00:10:19.820 | - I think to me, the biggest thing
00:10:21.260 | is the training on outputs.
00:10:23.260 | Like every company is just hiding the fact
00:10:25.900 | that they've been fine tuning and training on GPT-4 outputs
00:10:28.820 | and you can not technically do it,
00:10:30.820 | but obviously OpenAI is not enforcing it.
00:10:32.940 | I think now for the first time,
00:10:34.860 | there's like a clear path to how do we make a 7B model good
00:10:38.620 | without having to go through GPT-4 or going to Cloud 3.
00:10:42.580 | And we'll kind of talk about this later,
00:10:44.300 | but I think we're seeing maybe the, you know,
00:10:47.300 | not the death, but like selling the picks and shovels,
00:10:50.100 | it's kind of going away.
00:10:51.340 | And like building the vertical things
00:10:52.780 | is like where most of the value is actually getting captured,
00:10:55.300 | at least at the early stages.
00:10:57.060 | So being able to make small models better
00:10:59.780 | and specific things through a large model
00:11:02.260 | is more important than yet another 7B model
00:11:05.980 | that I can try and use.
00:11:06.900 | But at the end of the day,
00:11:07.740 | I still need to go through the large labs to fine tune.
00:11:10.260 | So that to me is the most interesting thing.
00:11:11.780 | You know, it's such a large model
00:11:13.620 | that like it's obviously amazing,
00:11:15.620 | but I don't know if a lot of people are switching
00:11:17.660 | from GPT-4 or Claude 3.5 to running 405B.
00:11:22.660 | I also don't know what the hosting options are
00:11:25.500 | as far as like scaling, you know,
00:11:27.580 | I don't know if the Fireworks and Togethers
00:11:30.260 | of the world, how much capacity
00:11:32.300 | they actually have to serve this model,
00:11:33.700 | because at the end of the day,
00:11:35.380 | it's a lot of compute if some of the big products
00:11:38.420 | will switch to it and you cannot easily run it yourself.
00:11:41.420 | So I don't know, but to me,
00:11:43.020 | this synthetic data piece
00:11:44.060 | is definitely the most interesting.
00:11:46.340 | - Yeah, I would say that it is not enough now
00:11:49.780 | to say that synthetic data is real.
00:11:52.300 | I actually shipped that in the original email
00:11:54.980 | and then I changed that in the sort of what you see now
00:11:57.460 | in the podcast description.
00:11:59.420 | But because it is so established now
00:12:01.940 | that synthetic data is real,
00:12:03.140 | therefore you need to go to the next level,
00:12:04.460 | which is, okay, what do you use it for
00:12:06.100 | and how do you use it?
00:12:07.500 | And I think that is what was interesting for Llama 3 for me,
00:12:11.180 | which you read the paper,
00:12:12.380 | 90 pages of "all killer, no filler," or something like that.
00:12:16.020 | This is what people were saying.
00:12:17.620 | Very, very like for once a frontier model
00:12:20.540 | with a proper paper instead of a marketing blog post.
00:12:23.900 | And they actually spelled out how they'd use synthetic data
00:12:27.580 | for a few different domains.
00:12:28.700 | So they have synthetic data for code, for math,
00:12:31.380 | for multilinguality, for long context, for tool use,
00:12:34.860 | and then also for ASR and voice generation.
00:12:37.340 | And I think that, yeah, okay,
00:12:39.780 | now you have the license to go distill Llama 3.1 405B,
00:12:44.780 | but how do you do that?
00:12:47.100 | That is the sort of the next frontier.
00:12:48.340 | Now you have the permission to do it, how do you do it?
00:12:50.180 | And I think that people are gonna reference Llama 3 a lot,
00:12:53.380 | but then they can use those techniques for everything else.
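
As a concrete sketch of that distillation loop, assuming a hosted 405B endpoint with an OpenAI-compatible API (the base URL and model name below are placeholders for whichever provider you use): sample teacher completions, then fine-tune a small student on them.

```python
import json
from openai import OpenAI

# Sketch: generate synthetic fine-tuning data from a large teacher model.
# base_url and model are placeholders for whichever 405B host you use.
client = OpenAI(base_url="https://example-provider/v1", api_key="YOUR_KEY")

seed_prompts = ["Explain quicksort step by step.", "Write a regex for emails."]

with open("synthetic.jsonl", "w") as f:
    for prompt in seed_prompts:
        resp = client.chat.completions.create(
            model="llama-3.1-405b-instruct",  # the teacher
            messages=[{"role": "user", "content": prompt}],
        )
        f.write(json.dumps({
            "prompt": prompt,
            "completion": resp.choices[0].message.content,
        }) + "\n")
# Filter and dedupe this file, then fine-tune an 8B student on it.
```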
00:12:56.220 | You know, in our episode with Thomas,
00:12:57.660 | he talked about like,
00:12:59.020 | I was very focused on synthetic data for pre-training
00:13:00.940 | 'cause that's my context.
00:13:02.380 | That's my conversations with Teknium from Nous
00:13:04.900 | and all the other people doing synthetic data
00:13:07.300 | for pre-training and fine tuning.
00:13:09.260 | But he was talking about post-training as well.
00:13:11.340 | And for everything here was post-training.
00:13:14.500 | In fact, I wish we had spent more time
00:13:15.940 | with Thomas on this stuff.
00:13:17.300 | We just didn't have the paper beforehand.
00:13:19.460 | (all laughing)
00:13:21.260 | But I think like when I call Llama 3
00:13:22.860 | the synthetic data model, it's that you have the license for it,
00:13:25.980 | but then you also have the roadmap, the recipe,
00:13:28.380 | because it's in the paper.
00:13:30.540 | And now everybody knows how to do this.
00:13:33.060 | And probably, you know, obviously OpenAI
00:13:36.140 | is probably laughing at us
00:13:36.980 | 'cause they did this like a year ago,
00:13:38.500 | but now it's in the open.
00:13:40.220 | - I mean, they can laugh all they want,
00:13:41.740 | but they're coming for them.
00:13:43.380 | I think, I mean, that's definitely
00:13:44.620 | the biggest vibe shift, right?
00:13:45.940 | It's like, obviously Lama 3.1 is good.
00:13:48.740 | Obviously Cloud is good.
00:13:49.940 | Maybe a year and a half ago,
00:13:51.580 | you didn't get the benefit of the doubt
00:13:53.620 | as an OpenAI competitor to be state-of-the-art.
00:13:56.140 | You know, it was kind of like,
00:13:56.980 | oh, Anthropic, yeah, these guys are cute over there.
00:13:59.260 | They're trying to do their thing, but it's not OpenAI.
00:14:01.620 | And like Llama 2 is great,
00:14:03.500 | but like it's really not a serious model.
00:14:06.060 | You know, it's like just good enough.
00:14:07.780 | I think now it's like every time Anthropic
00:14:10.140 | releases something, people are like,
00:14:12.060 | okay, this is like a serious thing.
00:14:13.420 | Whenever like Meta releases something,
00:14:15.500 | it's like, okay, they're at the same level.
00:14:17.420 | And I don't know if OpenAI is kind of like sandbagging
00:14:20.780 | the GPT-Next, you know?
00:14:22.820 | And then they kind of, you know, yesterday or today,
00:14:27.380 | they launched the SearchGPT thing behind the waitlist.
00:14:30.900 | - The Singapore confusion, when was it?
00:14:32.580 | - Yeah, when was it?
00:14:33.420 | - Yes, it happened yesterday, U.S. time,
00:14:35.260 | but today, Singapore time.
00:14:36.940 | - Thursday.
00:14:37.780 | It's been really confusing.
00:14:40.620 | But yeah, and people are kind of like,
00:14:43.980 | oh, okay, OpenAI.
00:14:43.980 | I don't know if we can take you seriously.
00:14:46.620 | - Well, no, one of the AI Grant employees,
00:14:51.580 | I think Hirsch, tweeted that, you know,
00:14:53.460 | you can skip the wait list, just go to perplexity.com.
00:14:56.220 | (laughs)
00:14:57.900 | And that was a really, really sick burn
00:15:00.180 | for the OpenAI SearchGPT waitlist.
00:15:02.580 | But their implementation will have something different.
00:15:04.820 | They probably like train a dedicated model for that,
00:15:07.100 | you know, like they will have some innovation
00:15:08.740 | that we haven't seen.
00:15:09.580 | - Yeah, data licensing, obviously.
00:15:10.940 | - Data licensing, yes.
00:15:12.700 | We're optimistic, you know, but the vibe shift is real.
00:15:15.860 | And I think that's something
00:15:16.700 | that is just worth commenting on and watching.
00:15:18.660 | And yeah, how the other labs catch up.
00:15:21.860 | I think what you said there is actually very interesting.
00:15:23.780 | The trend of successive releases is very important to watch.
00:15:27.860 | If things get less and less exciting,
00:15:30.420 | then it's a red flag for that company.
00:15:32.660 | And if things get more and more exciting,
00:15:34.220 | it means that these guys have a good team,
00:15:36.260 | they have a good plan, good ideas.
00:15:38.620 | So yeah, like I will call out, you know,
00:15:41.540 | the Microsoft Phi team as well.
00:15:43.700 | Phi-1 was kind of widely regarded
00:15:45.860 | to be overtrained on benchmarks,
00:15:47.300 | and Phi-2 and Phi-3 subsequently improved a lot as well.
00:15:50.780 | I would say also similar for Gemma, Gemma 1 and 2.
00:15:54.020 | Gemma 2 is currently leading
00:15:56.580 | in terms of the LocalLlama sort of vibe check,
00:15:59.220 | eval, informal straw poll.
00:16:02.700 | And that's only like a month after release.
00:16:04.780 | They released at the AI Engineer World's Fair.
00:16:07.380 | And, you know, like I didn't know what to think about it
00:16:10.540 | 'cause Gemma 1 wasn't like super well-received.
00:16:12.420 | It was just kind of like, here's like free-tier Gemini,
00:16:15.540 | you know, but now Gemma 2 is actually
00:16:17.780 | like a very legitimately widely used model
00:16:20.740 | by the open source and LocalLlama community.
00:16:23.420 | So that's great, until Llama 3.1 8B came along.
00:16:26.460 | (laughing)
00:16:28.260 | And so like the, and we'll talk about this also,
00:16:30.340 | like just the winds of AI winter is also like,
00:16:32.980 | what is the depreciation schedule on this model
00:16:35.820 | inference and training costs?
00:16:37.180 | Like it's very high.
00:16:39.100 | - Yeah.
00:16:39.940 | I'm curious to get your thought on Mistral.
00:16:42.300 | Everybody's favorite sparkling weights company.
00:16:45.220 | - Yeah, yeah.
00:16:46.700 | - They just released the, you know, Mistral Large Enough.
00:16:49.900 | - Large, Mistral Large 2.
00:16:51.260 | - Yeah, Large 2.
00:16:52.220 | - So this was one day after Llama 3.1,
00:16:55.220 | presumably because they were speaking at ICML,
00:16:57.340 | which is going on right now.
00:16:58.900 | By the way, Brittany is doing a guest host thing for us.
00:17:02.060 | She's running around the poster sessions doing what I do,
00:17:04.620 | which is very great.
00:17:05.460 | 'Cause I couldn't go 'cause of my visa issue.
00:17:07.100 | I have to be careful what I say here,
00:17:08.460 | but I think because we still want to respect their work,
00:17:11.060 | but Mistral Large, I would say,
00:17:12.380 | it's like not as exciting as Llama 3.
00:17:14.860 | I think that is very, very fair to say.
00:17:16.820 | It is, yes, another GPT-4 class model
00:17:20.220 | released as open weights with a research license,
00:17:23.140 | not a commercial license, but still open weights.
00:17:25.580 | And that's good for the community,
00:17:27.340 | but it is a step down in terms of the general excitement
00:17:31.140 | around Mistral compared to Llama.
00:17:32.940 | I think that would be fair to say,
00:17:34.340 | and I would say that to Mistral themselves.
00:17:36.380 | So the general hope is, and I cannot say too much,
00:17:39.460 | it's 'cause I've had offline conversations
00:17:40.860 | with people close to this.
00:17:42.940 | The general hope is that they need something more.
00:17:45.580 | Of the 10 elements of what is next
00:17:48.500 | in terms of their frontier model boundaries,
00:17:51.220 | Mistral needs to make progress there.
00:17:53.020 | They made progress here with instruction following
00:17:56.860 | and structured output and multilinguality
00:17:59.620 | and all those things.
00:18:00.940 | But I think to stand out,
00:18:02.460 | you need to basically pull a stunt.
00:18:03.900 | You need to be a superlatively good company
00:18:06.100 | in one dimension.
00:18:07.300 | And now, unfortunately, Mistral does not have that crown
00:18:09.780 | as open-source kings.
00:18:11.340 | Like a year ago, I was saying,
00:18:12.540 | Mistral are the kings of open-source AI.
00:18:14.700 | Now Meta is, they've lost their crown.
00:18:17.180 | By the way, they've also deprecated Mistral 7B,
00:18:20.500 | 8x7B, and 8x22B.
00:18:22.660 | So now there's only the closed-source models
00:18:24.820 | that are on the API platform.
00:18:25.780 | So has Mistral basically started becoming
00:18:28.820 | more of a closed-model proprietary platform?
00:18:32.020 | I don't believe that's true.
00:18:34.060 | I believe that they're still very committed to open-source,
00:18:36.940 | but they need to come up with something more
00:18:38.420 | that people can use.
00:18:39.300 | And that's a grind.
00:18:40.620 | I mean, they have, what, $600 million to do it?
00:18:44.020 | So that's still good.
00:18:46.140 | But people are waiting for what's next from them.
00:18:48.980 | - Yeah, to me, the perception was interesting.
00:18:51.620 | In the comments of the release,
00:18:52.940 | everybody was like,
00:18:53.980 | "Why do you have a non-commercial license
00:18:55.900 | "for not making any money anyway from the inference?"
00:18:58.100 | So I feel like the AI engineering tier list
00:19:02.180 | is kind of shifting in real time.
00:19:03.780 | And maybe Mistral, like you said before,
00:19:05.500 | was like, "Hey, thank God for these guys.
00:19:07.740 | "They're saving us in open-source.
00:19:09.100 | "They're kind of like speed-running
00:19:10.540 | "GPT-1, GPT-2, GPT-3 in open-source."
00:19:13.460 | But now it's like they're kind of moving away from that.
00:19:16.020 | I haven't really heard of that many people
00:19:20.580 | using them at scale commercially,
00:19:20.580 | just from discussions.
00:19:23.380 | So I'm curious to see what the next step is.
00:19:25.540 | - Yeah, but also you're sort of US-based,
00:19:27.220 | and maybe they're not focused there, right?
00:19:29.220 | So- - Yeah, no, exactly.
00:19:31.300 | - It's a very big elephant,
00:19:32.580 | and we're only touching pieces of it
00:19:33.900 | as blind leading the blind. (laughs)
00:19:37.140 | I will call out,
00:19:39.020 | they have some interesting experimentations with Mamba,
00:19:41.260 | and Mistral NeMo is actually
00:19:43.420 | on the efficiency frontier chart that I drew
00:19:46.060 | that is still relevant.
00:19:47.460 | So don't discount Mistral NeMo.
00:19:49.380 | But Mistral Large, otherwise, it's an update.
00:19:52.100 | It's a necessary update for Mistral Large V1,
00:19:54.980 | but other than that, they're just kind of holding the line,
00:19:58.060 | not really advancing the field yet.
00:20:00.860 | That'll be my statement there.
00:20:02.980 | - So those are the frontier big labs.
00:20:05.020 | - Yes.
00:20:05.860 | - And then now we're gonna shift a little bit
00:20:07.580 | towards the smaller deployable on-device solutions.
00:20:11.340 | - Yeah.
00:20:12.180 | First of all, a shout out to our friend, Tri Dao,
00:20:15.380 | who released Flash Attention 3.
00:20:16.860 | Flash Attention 2, we kind of did a deep dive on the podcast.
00:20:19.860 | He came on in the studio back then.
00:20:22.180 | It's just great to see how small groups
00:20:25.460 | can make a big impact on a whole industry,
00:20:27.340 | just like by making math better.
00:20:29.980 | So it's just great to see.
00:20:32.020 | Just wanted to give Tri a shout out.
00:20:33.540 | - Something I mentioned there,
00:20:34.820 | and it's something that always comes up,
00:20:36.580 | even in the Sovereign AI Summit that we did,
00:21:38.180 | was do NVIDIA's competitors pose any threat to NVIDIA?
00:21:43.180 | AMD, like MatX, like Etched,
00:21:48.220 | which caused a lot of noise with their Sohu chip as well.
00:20:51.100 | And just the simple fact is that
00:20:53.420 | NVIDIA has won the hardware lottery,
00:20:55.340 | and people are customizing for NVIDIA.
00:20:57.380 | Like Flash Attention 3 only works for NVIDIA,
00:20:59.580 | only works for H100s.
00:21:01.140 | And like this much work, this much scaling,
00:21:03.140 | this much validation going into this stuff
00:21:05.380 | is very difficult to replicate,
00:21:06.980 | or very expensive to replicate
00:21:08.180 | for the other hardware ecosystems.
00:21:10.020 | So not impossible.
00:21:11.500 | I actually heard a really good argument from,
00:21:15.900 | I think it is Martin Casado from A16Z,
00:21:19.380 | who was saying basically like,
00:21:20.780 | yeah, absolutely NVIDIA's hardware and ecosystem makes sense.
00:21:25.580 | And obviously that's contributed to,
00:21:27.340 | it's like, I don't know,
00:21:28.900 | it's like the most valuable company in the world right now,
00:21:31.300 | but current training runs are like
00:21:33.020 | 100 million to 200 million in cost.
00:21:35.780 | But when they go to 500 million,
00:21:37.460 | when they go to a billion,
00:21:38.300 | when they go to 1 trillion,
00:21:39.860 | then you can actually start justifying
00:21:41.980 | making custom ASICs for your run.
00:21:44.580 | And if they cut your costs by like half,
00:21:47.700 | then you make your money back in one run.
00:21:49.740 | - Yeah, yeah, yeah.
00:21:50.580 | Martin has always been a fan of custom ASIC.
00:21:53.780 | I think they wrote a really good post
00:21:55.460 | maybe a couple of years ago about cloud repatriation.
00:21:58.380 | - Oh yeah, I think he got a lot of shit for that,
00:22:00.500 | but it's becoming more consensus now, I think.
00:22:03.620 | So Noam Shazeer is blogging again,
00:22:05.140 | fantastic gifts to the world.
00:22:06.460 | This guy, nonstop bangers.
00:22:09.020 | And so he's at Character AI
00:22:12.620 | and he put up a post talking about five tricks
00:22:15.700 | that they use to serve 20% of Google search traffic
00:22:19.380 | as LLM inference.
00:22:21.060 | A lot of people were very shocked by that number,
00:22:23.060 | but I think you just have to remember
00:22:24.780 | that most conversations are multi-turn, right?
00:22:27.940 | Like in the span of one Google search,
00:22:30.380 | I will send like 10 text messages, right?
00:22:32.460 | So obviously there's a good ratio here that matters.
00:22:35.860 | It's obviously a flex of Character AI's traction
00:22:38.660 | among the kids,
00:22:40.060 | because I have tried to use Character AI since then,
00:22:42.500 | and I still cannot for the life of me get it.
00:22:44.780 | Have you tried?
00:22:46.100 | - I have tried it, but yes, definitely not.
00:22:47.860 | - Yeah, they launched like voice.
00:22:48.980 | I tried to talk to it.
00:22:49.900 | It was just so stupid.
00:22:51.340 | I just didn't like it myself.
00:22:54.300 | But this is what it means.
00:22:55.140 | - But please still come on the podcast, Noam Shazeer.
00:22:57.500 | - Sorry, what did I mean?
00:22:58.660 | - No, no, no.
00:22:59.580 | Because like I don't really understand
00:23:02.140 | like what the use case is for apart from like the therapy,
00:23:04.860 | role play, homework assistant type of stuff
00:23:07.340 | that is the norm.
00:23:08.180 | But anyway, one of the most interesting things,
00:23:10.500 | so you detailed five tricks.
00:23:12.260 | One thing that people talk a lot about
00:23:13.820 | is native int8 training.
00:23:15.420 | I got it wrong in our Thomas podcast.
00:23:17.100 | I said FP8, I meant int8.
00:23:19.060 | And I think like that is something that is an easy win.
00:23:21.180 | Like we should basically,
00:23:23.380 | when we're getting to the point
00:23:24.780 | where we're overtraining models 100 times
00:23:28.740 | past Chinchilla ratio to optimize for inference,
00:23:31.820 | the next thing is actually like,
00:23:33.020 | hey, let's stop using so much memory when training
00:23:35.940 | because we're gonna quantize it anyway for inference.
00:23:38.500 | So like just let's pre-quantize it in training.
00:23:41.500 | So that makes a lot of sense.
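
Character.AI's exact recipe isn't public beyond the blog post, but the standard way to approximate "pre-quantize in training" is fake quantization with a straight-through estimator, roughly like this sketch:

```python
import torch

# Sketch of quantization-aware training: the forward pass sees int8-rounded
# weights, while full-precision master weights still receive gradients via
# a straight-through estimator.
def fake_quant_int8(w: torch.Tensor) -> torch.Tensor:
    scale = w.abs().max().clamp_min(1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127) * scale
    return w + (q - w).detach()  # forward uses q; backward flows through w

class QuantLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, fake_quant_int8(self.weight), self.bias)
```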
00:23:42.860 | The other thing as well is this concept
00:23:44.740 | of global local hybrid architecture,
00:23:47.700 | which I think is basically going to be the norm, right?
00:23:51.420 | So he has this formula of one to five ratio
00:23:54.300 | of global attention to local attention.
00:23:56.500 | And he says that that works
00:23:59.180 | for the long-form conversations that character has.
00:24:01.460 | Okay, that's great.
00:24:02.500 | And like simultaneously we have independent research
00:24:06.020 | from other companies about similar hybrid ratios
00:24:08.940 | being the best for their research.
00:24:10.900 | So Nvidia came out
00:24:11.940 | with a Mamba transformer hybrid research thing.
00:24:14.740 | And in their estimation, you only need 7% transformers.
00:24:18.020 | Everything else can be state space models.
00:24:19.780 | Jamba also had something like
00:24:21.740 | between like six to like 30 to one.
00:24:24.500 | And basically every form of hybrid architecture
00:24:27.980 | seems to be working at the research stage.
00:24:30.500 | So I think like if we scale this,
00:24:32.060 | it makes complete sense
00:24:33.500 | that you just need a mix of architectures
00:24:35.860 | and it could well be that the transformer block
00:24:38.340 | instead of transformers being all you need,
00:24:40.100 | transformers are the global attention thing.
00:24:43.300 | And then the local attention thing
00:24:44.820 | can be the state space models,
00:24:45.980 | can be the RWKVs, can be another transformer,
00:24:48.420 | but just limited by a sliding window.
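
A toy sketch of that 1:5 interleaving; the block constructors here are placeholders for full attention versus sliding-window (or SSM) layers:

```python
# Toy sketch: one global-attention layer per five local ones.
# global_block and local_block are placeholder constructors.
def build_hybrid_stack(n_layers, global_block, local_block, local_per_global=5):
    layers = []
    for i in range(n_layers):
        if i % (local_per_global + 1) == 0:
            layers.append(global_block())  # full attention over the whole context
        else:
            layers.append(local_block())   # sliding-window attention or SSM block
    return layers
```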
00:24:50.940 | And I think like we're slowly discovering
00:24:52.980 | like the fundamental building blocks of AI.
00:24:55.700 | One is transformers,
00:24:56.660 | one is something that's local, whatever that is.
00:24:59.900 | And then, you know, who knows what else is next?
00:25:01.740 | I mean, the other stuff is adapters,
00:25:03.700 | we can talk about that.
00:25:04.820 | But yeah, headline is that Noam,
00:25:07.140 | maybe he's too confident,
00:25:08.420 | but I mean, I believe him.
00:25:10.100 | Noam thinks that he can do inference at 13x cheaper
00:25:13.100 | than the Fireworks and Togethers of the world, right?
00:25:15.780 | So like, there is a lot of room left to improve inference.
00:25:20.220 | - I mean, it does make sense, right?
00:25:21.420 | Because like, otherwise, I don't know.
00:25:23.380 | - Otherwise, character would be bankrupt.
00:25:24.220 | - Yeah, exactly.
00:25:25.060 | I was like, they would be losing a ton of money, so.
00:25:28.420 | - They are rumored to be exploring a sale.
00:25:31.060 | So I'm sure money is still an issue for them,
00:25:33.580 | but I'm also sure they're making a lot of money.
00:25:35.300 | So it's very hard to tell
00:25:36.660 | because it's not a very public company.
00:25:39.420 | - Well, I think that's one of the things
00:25:42.940 | in the market right now too,
00:25:44.140 | is like, hey, do you just want to keep building?
00:25:46.660 | Do you want to like,
00:25:47.580 | just not worry about the money and go build somewhere else?
00:25:49.540 | Kind of like maybe Inflection and Adept
00:25:52.100 | and some of these other non-acquihire acquihires,
00:25:55.060 | licensing deals and whatnot.
00:25:56.980 | So I'm curious to see what companies decide to stick with it.
00:26:01.500 | - I think Google or Meta should pay $1 billion
00:26:04.460 | for Noam alone.
00:26:05.460 | The purchase price for Character is $1 billion,
00:26:09.860 | which is super underpriced.
00:26:10.700 | - Which is nothing at their market cap, right?
00:26:12.380 | - It's nothing.
00:26:13.220 | Meta's market cap right now is $1.15 trillion
00:26:17.340 | because they're down 5%, 11% in the past month.
00:26:21.140 | - What?
00:26:22.100 | - Yeah.
00:26:22.940 | So if you pay $1 billion,
00:26:24.980 | you know, that's like 0.01% of your market cap.
00:26:28.620 | And they paid $19 billion for WhatsApp,
00:26:31.980 | and they bought about 10% of their market cap on that at the time.
00:26:34.740 | So yeah.
00:26:35.620 | - That is beyond our pay grade.
00:26:37.060 | But the last piece of the GPU rich poor wars.
00:26:39.660 | So we're going from the super GPU rich
00:26:41.260 | down to like the medium GPU rich.
00:26:42.940 | And now down to the GPU poorest is on-device models, right?
00:26:46.820 | Which is something that people are very, very excited about.
00:26:49.340 | So at my conference, Mozilla AI,
00:26:52.140 | I think was kind of like the talk of the town there
00:26:54.820 | on Llamafile.
00:26:56.020 | We had Justine Tunney come in
00:26:57.700 | and explain like some of the optimizations that they did.
00:26:59.820 | And their just general vision for on-device AI.
00:27:02.540 | I think that like, it's basically the second act of Mozilla.
00:27:07.260 | Like a lot of good with the open source browser.
00:27:10.460 | And obviously then they have since declined
00:27:13.100 | because it's very hard to keep up in that field.
00:27:15.500 | And Mozilla has had some management issues as well.
00:27:17.660 | But now that the operating system
00:27:19.420 | is moving to the AI layer,
00:27:21.460 | now they're also like, you know,
00:27:22.860 | promoting open source AI there
00:27:24.580 | and also like private AI, right?
00:27:26.420 | Like open source is synonymous with local, private
00:27:29.220 | and all the good things that people want.
00:27:31.020 | And I think their vision of like,
00:27:32.540 | even running this stuff on CPUs at a very, very fast speed
00:27:35.220 | by just like being extremely cracked.
00:27:37.140 | (laughs)
00:27:39.220 | I think it's very understated
00:27:40.940 | and we should probably try to support it more.
00:27:44.460 | And it's just amazing to host these people
00:27:47.820 | and see the progress.
00:27:49.780 | - Yeah, I think to me the biggest question about on-device,
00:27:52.420 | obviously there's a Gemini Nano,
00:27:54.100 | which is getting shipped with Chrome.
00:27:56.100 | - Yeah, so let's survey, right?
00:27:57.300 | So Llamafile is one executable
00:27:58.900 | that runs on every architecture.
00:28:00.860 | Similar for, by the way, Mojo from Modular,
00:28:03.700 | which also spoke at the conference.
00:28:05.460 | And then what else?
00:28:06.940 | llama.cpp, MLX, those kinds are all sort of at that layer.
00:28:11.260 | Then the next layer up would be the ones built
00:28:14.340 | into their products by the vendors.
00:28:18.820 | So Google Chrome is building Gemini Nano into the browser.
00:28:21.940 | The next version of Google Chrome will have Nano inside
00:28:25.260 | that you can use like window.ai.something
00:28:28.060 | and it would just call Nano.
00:28:29.620 | There'll be no download, no latency whatsoever
00:28:32.300 | 'cause it runs on your device.
00:28:33.740 | And there's Apple Intelligence as well,
00:28:35.460 | which is Apple's version, which is in the OS,
00:28:38.100 | accessible by apps.
00:28:39.540 | And then there's a long tail of others.
00:28:41.060 | But yeah, your comments on those things.
00:28:43.420 | - My biggest question is how much can you differentiate
00:28:46.900 | at that model size?
00:28:48.620 | Like how big is gonna be the performance gap
00:28:51.740 | between all these models?
00:28:53.140 | And are people gonna be aware of what model is running?
00:28:56.340 | Right now, for the large models,
00:28:58.420 | we're still pretty aware of like,
00:28:59.820 | oh, is this Sonnet 3.5, is this GPT-4,
00:29:02.940 | is this, you know, Llama 3.1 405B.
00:29:05.540 | I think the smaller you get,
00:29:07.220 | the more it's just gonna become like a utility, you know?
00:29:10.460 | So like, you're not gonna need a model router
00:29:12.620 | for like small models.
00:29:13.580 | You're not gonna need any of that.
00:29:15.260 | Like they're all gonna converge
00:29:16.780 | to like the best possible performance.
00:29:18.540 | - Actually, Apple Intelligence is the model router, I think.
00:29:21.380 | They have something like 14,
00:29:22.940 | I did a count in my newsletter, like 14 to 20 adapters.
00:29:26.820 | And so based on your use case,
00:29:27.900 | they'll route and load the adapter
00:29:30.460 | or they'll route to OpenAI.
00:29:32.420 | So there is some routing layer.
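
In open-source terms, routing like that might look roughly like the following sketch; the model and adapter names are hypothetical, and this is not Apple's actual stack:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Sketch: one shared base model, small per-task LoRA adapters swapped in
# by use case. All names below are placeholders.
BASE = "example/base-3b"
ADAPTERS = {"summarize": "example/lora-summarize",
            "proofread": "example/lora-proofread"}

tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE)
model = PeftModel.from_pretrained(base, ADAPTERS["summarize"],
                                  adapter_name="summarize")
model.load_adapter(ADAPTERS["proofread"], adapter_name="proofread")

def route(task: str, prompt: str) -> str:
    model.set_adapter(task)  # activate the task-specific LoRA weights
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(input_ids=ids, max_new_tokens=128)
    return tok.decode(out[0], skip_special_tokens=True)
```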
00:29:34.860 | To me, I think a lot of people were trying to puzzle out
00:29:37.420 | the strategic moves between OpenAI and Apple here
00:29:40.540 | because Apple is in a very good position
00:29:43.180 | to commoditize OpenAI.
00:29:44.460 | There were some rumors that Google was working with Apple
00:29:46.660 | to launch it, they did not make it for the launch,
00:29:48.780 | but presumably Apple wants to commoditize OpenAI, right?
00:29:52.020 | So, you know, when you launch,
00:29:53.900 | you can choose your preferred external AI provider
00:29:57.500 | and it's either OpenAI or Google or someone else.
00:30:00.220 | I mean, that puts Apple at the center of the world
00:30:02.620 | with the ability to make routing decisions.
00:30:05.780 | And I think that's probably good for privacy,
00:30:08.340 | probably good for the planet,
00:30:10.420 | 'cause you're not running like oversized models
00:30:13.620 | on like your spell check pass.
00:30:16.660 | And I'm generally pretty positive on it.
00:30:18.940 | Like, yeah, I'm not concerned about the capabilities issue.
00:30:22.300 | It meets their benchmarks.
00:30:23.180 | Apple put out a whole bunch of proprietary benchmarks
00:30:26.020 | 'cause they don't like to do anything
00:30:27.580 | in the way that everyone else does it.
00:30:29.740 | So like, you know, in the Apple intelligence blog posts,
00:30:31.620 | they like, I think like all of them
00:30:33.780 | were just like their internal human evaluations.
00:30:36.140 | And only one of them was an industry standard benchmark,
00:30:38.420 | which was IFEval, which is good.
00:30:40.340 | But like, you know, why didn't you also release your MMLU?
00:30:43.340 | Oh, 'cause you suck on it.
00:30:44.380 | All right.
00:30:45.220 | (laughing)
00:30:46.060 | - Well, I actually think all these models will be good.
00:30:49.300 | And on the Apple side,
00:30:50.340 | I'm curious to see what the price tag will be
00:30:52.500 | to be the default.
00:30:53.460 | Right now, Google pays them 20 billion
00:30:55.780 | to be the default search.
00:30:57.540 | - I see.
00:30:58.660 | The rumors is zero.
00:31:00.380 | - Yeah, I mean, today, even if it was 20 billion,
00:31:03.060 | that's like nothing compared to like, you know,
00:31:05.380 | NVIDIA's worth three trillion.
00:31:06.860 | So like even paying 20 billion
00:31:08.860 | to be the default AI provider,
00:31:10.980 | like would be cheap compared to search,
00:31:13.580 | given that AI is actually being
00:31:15.460 | such a core part of the experience.
00:31:17.100 | Like Google being the default
00:31:18.340 | for like Apple's phone experience
00:31:20.260 | really doesn't change anything.
00:31:22.100 | Becoming the default AI provider
00:31:23.820 | for like the Apple experience
00:31:24.980 | will be worth a lot more than this.
00:31:26.780 | - I mean, so I can justify it being zero
00:31:28.500 | instead of 20 billion,
00:31:29.340 | it's because OpenAI has to foot the inference costs, right?
00:31:31.740 | So that's a lot.
00:31:33.180 | - Well, yeah, Microsoft really is footing it,
00:31:35.660 | but again, Microsoft is worth two trillion, you know?
00:31:38.580 | - So as someone who,
00:31:40.820 | this is the web developer coming out,
00:31:42.740 | as someone who is a champion of the open web,
00:31:44.700 | Apple has been, let's just say,
00:31:47.500 | roadblock in that direction.
00:31:49.460 | I think Gemini Nano being good
00:31:50.820 | is more important than Apple intelligence
00:31:52.980 | being generally capable.
00:31:53.980 | Apple intelligence being like on-device router
00:31:57.060 | for Apple apps is good,
00:31:58.220 | but like if you care about the open web,
00:32:00.300 | you really need Gemini Nano to work.
00:32:03.020 | And we're not sure.
00:32:04.300 | Like right now we have some demos
00:32:05.900 | showing that it's fast enough,
00:32:07.540 | but we haven't had systematic tests on it.
00:32:09.860 | Along the lines of that research,
00:32:11.220 | I will highlight that Apple has also put out DataComp-LM.
00:32:15.020 | I actually interviewed DataComp at NeurIPS last year,
00:32:17.900 | and they've branched out from just vision and images
00:32:20.260 | to language models.
00:32:21.500 | And Apple has put out a reference implementation
00:32:24.180 | of the 7B language model
00:32:25.820 | that's built on top of DataComp.
00:32:27.300 | And it is better than FineWeb, which is huge
00:32:30.220 | because FineWeb was the state-of-the-art last month.
00:32:32.180 | (both laughing)
00:32:34.900 | And that's fantastic.
00:32:35.740 | So basically like Datacomp is an open data,
00:32:38.780 | open weights, open model, like super everything open.
00:32:42.460 | So there will be a lot of people
00:32:44.500 | optimizing this kind of model.
00:32:47.140 | They'll be building on architectures
00:32:48.820 | like mobile LM and small LM,
00:32:50.300 | which basically innovate in terms of like shared weights
00:32:53.060 | and shared matrices for smaller models
00:32:55.540 | so that you just optimize the amount of file size
00:32:57.820 | and memory that you take up.
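
The core of that shared-weights trick is just tying the output head to the input embedding, so the single biggest matrix in a small model is stored once; a minimal sketch:

```python
import torch

# Sketch of weight tying: the unembedding head reuses the embedding matrix,
# cutting parameters and file size (transformer blocks omitted).
class TinyLM(torch.nn.Module):
    def __init__(self, vocab: int = 32000, d_model: int = 576):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab, d_model)
        self.head = torch.nn.Linear(d_model, vocab, bias=False)
        self.head.weight = self.embed.weight  # one matrix, two roles
    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.head(self.embed(ids))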
00:32:59.700 | And I think just general trend of on-device models,
00:33:02.540 | like the only way that intelligence
00:33:04.660 | too cheap to meter happens is everything happens on-device.
00:33:08.580 | So unfortunately that means that OpenAI
00:33:10.780 | is not involved in this.
00:33:12.020 | Like OpenAI's mission is intelligence too cheap to meter,
00:33:14.300 | and they're not doing the one thing
00:33:15.500 | that needs to happen for that
00:33:16.980 | because there's no business plan
00:33:18.460 | in monetizing an API for that.
00:33:20.340 | But by definition, none of this is APIs.
00:33:22.940 | - I don't know.
00:33:23.780 | Maybe OpenAI, even Sam Altman needs to figure it out
00:33:25.980 | so they can do a--
00:33:26.940 | - Yeah, I'm excited for OpenAI phone.
00:33:28.700 | I don't know if you would buy an OpenAI phone.
00:33:30.580 | I mean, I'm very locked into the iOS ecosystem, but I mean--
00:33:33.060 | - I will not be the first person to buy it
00:33:35.020 | because I don't want to be stuck
00:33:35.980 | with like the rabbit equivalent of a iPhone,
00:33:38.340 | but I think it makes a lot of sense.
00:33:40.740 | I want their--
00:33:41.580 | - They're building a search engine now.
00:33:42.860 | The next thing is the phone.
00:33:43.980 | (laughs)
00:33:45.300 | - Exactly.
00:33:46.380 | So we'll see.
00:33:47.700 | - We'll see.
00:33:48.540 | When it comes on a wait list, we'll see.
00:33:49.780 | - Yeah, yeah, we'll review it.
00:33:51.660 | All right, so that was GPU-rich, GPU-poor.
00:33:54.860 | Maybe we just want to run quickly
00:33:56.500 | through the quality data wars.
00:33:58.620 | There's mostly drama in this section.
00:34:01.300 | There's not as much research.
00:34:03.980 | - I think there's a lot of news going in the background.
00:34:05.860 | So like the New York Times lawsuit is still ongoing.
00:34:08.820 | You know, it's just like we won't have specific things
00:34:12.340 | to update people on.
00:34:14.660 | There are specific deals that are happening all the time
00:34:17.220 | with Stack Overflow making deals with everybody,
00:34:19.820 | with like Shutterstock making deals with everybody.
00:34:22.580 | It's just, it's hard to make a single news item
00:34:25.100 | out of something that is just slowly cooking
00:34:26.460 | in the background.
00:34:27.820 | - Yeah, on the New York Times thing,
00:34:30.100 | OpenAI's strategy has been to make the New York Times
00:34:34.180 | prove that their content is actually any original
00:34:37.940 | or like actually interesting.
00:34:39.220 | - Really?
00:34:40.060 | - Yeah, so it's kind of like, you know, the I, Robot meme.
00:34:42.500 | It's like, can a robot create a beautiful new symphony?
00:34:45.900 | And the robot is like, can you?
00:34:47.860 | - I think that's what OpenAI's strategy is.
00:34:51.740 | - Yeah, I think that the danger with the lawsuit,
00:34:53.940 | because this lawsuit is very public,
00:34:55.780 | because OpenAI responded, including with Ilya,
00:34:59.060 | showing their emails with New York Times,
00:35:01.340 | saying that, "Hey, we were doing a deal.
00:35:03.860 | You were like very close to a deal.
00:35:04.980 | And then suddenly on the eve of the deal, you called it off."
00:35:08.220 | I don't think New York Times has responded to that one,
00:35:10.420 | but it's very, very strange
00:35:11.980 | because the New York Times' brand is like trying to be like,
00:35:15.580 | you know, they're supposed to be the top newspaper
00:35:17.180 | in their country.
00:35:18.580 | If OpenAI, like just, and this was my criticism of it
00:35:22.340 | at the point in time, like, okay,
00:35:23.900 | we'll just go to the next best paper,
00:35:25.540 | the Washington Post, the Financial Times,
00:35:27.500 | they're all happy to work with us.
00:35:29.180 | And then what does New York Times have?
00:35:30.700 | - Yeah, yeah, yeah.
00:35:31.940 | - So you just lost out on like a hundred million dollars,
00:35:33.740 | $200 million a year of licensing deals
00:35:36.300 | just because you wanted to pick that war,
00:35:38.020 | which ideologically,
00:35:38.980 | I think they are absolutely right to do that.
00:35:41.740 | But, you know, the other people,
00:35:44.460 | The Verge did a very good interview with,
00:35:47.220 | I think the Washington Post.
00:35:49.500 | I'm going to get the outlet wrong.
00:35:51.780 | The Verge did a very good interview
00:35:53.660 | with a newspaper owner, editor,
00:35:56.180 | on why they did the deal with OpenAI.
00:35:58.420 | And I think that listening to them on like,
00:36:00.700 | they're thinking through like the reasoning
00:36:04.060 | of like the pros and cons of picking a fight
00:36:06.300 | versus partnering, I think it's very interesting.
00:36:08.540 | - Yeah, I guess the winner in all of this is Reddit,
00:36:12.140 | which is making over $200 million just in data licensing
00:36:15.420 | to OpenAI and some of the other AI providers.
00:36:18.580 | I mean, 200 million is like more
00:36:20.580 | than most AI startups are making.
00:36:22.740 | - So I think that was an IPO play,
00:36:24.180 | 'cause Reddit conveniently did this deal before IPO, right?
00:36:27.380 | - Totally.
00:36:28.220 | - Is it like a one-time deal?
00:36:29.060 | And then, you know, the stock languishes from there?
00:36:30.700 | I don't know.
00:36:31.540 | - Yeah, no, well, their IPO is done.
00:36:34.180 | Well, I guess it's not gone down.
00:36:35.660 | So in this market, they're up 25%, I think, since IPO.
00:36:39.380 | But I saw the FTC had opened an inquiry into it
00:36:42.860 | just to like investigate.
00:36:44.020 | So I'm curious what the antitrust regulations
00:36:48.580 | are gonna be like when it comes to data.
00:36:50.260 | Obviously, acquisitions are blocked
00:36:52.220 | to prevent kind of like stifling competition.
00:36:55.100 | I wonder if for data, it will be similar,
00:36:57.180 | where, hey, you cannot actually get all of your data
00:37:00.580 | only behind $100 million plus contracts,
00:37:03.420 | because otherwise you're stopping any new company
00:37:06.620 | from building a competing product, so.
00:37:08.780 | - Yeah, that's a serious overreach of the state there.
00:37:11.780 | - Yeah, yeah, yeah.
00:37:12.620 | - So as a free market person, I want to defend.
00:37:15.020 | It is weird, I'm a free market person
00:37:16.380 | and I'm a content creator, right?
00:37:17.500 | So I want to be paid for my content.
00:37:19.780 | At the same time, I believe that, you know,
00:37:21.900 | people should be able to make their own decisions
00:37:23.540 | about all these deals.
00:37:25.380 | But UGC is a weird thing,
00:37:27.260 | 'cause UGC is contributed by volunteers.
00:37:30.740 | - Yeah.
00:37:31.580 | - And the other big news about Reddit
00:37:32.500 | is that apparently they have added to their robots.txt,
00:37:35.100 | like only Google should index us, right?
00:37:37.460 | 'Cause we did the deal with Google.
00:37:39.020 | And that's obviously blocking OpenAI from crawling them,
00:37:41.740 | Anthropic from crawling them, you know,
00:37:43.300 | Perplexity from crawling them.
00:37:44.860 | Perplexity maybe ignores all robots.txt,
00:37:47.100 | but that's a whole different other issue.
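(For reference, a robots.txt along those lines would look something like the sketch below. This is illustrative, not Reddit's actual file; the crawler names are the publicly documented user agents.)

```
# Illustrative robots.txt in the spirit of the Reddit change:
# allow the licensing partner's crawler, disallow everyone else.
User-agent: Googlebot
Allow: /

# OpenAI's crawler
User-agent: GPTBot
Disallow: /

# Anthropic's crawler
User-agent: anthropic-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Everyone not listed above
User-agent: *
Disallow: /
```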
00:37:49.140 | And then the other thing is,
00:37:49.980 | I think this is big in the sort of normie worlds.
00:37:52.700 | The actors, you know, Scarlett Johansson
00:37:55.180 | had a very, very public Apple Notes take down of OpenAI.
00:37:58.940 | Only Scarlett Johansson can do that to Sam Altman.
00:38:01.380 | And then, you know, I was very proud of my newsletter
00:38:03.500 | for that day, I called it Skyfall,
00:38:05.100 | because the voice was Sky, so I called it Skyfall.
00:38:09.300 | And, but it's true, like, that one, she can win.
00:38:13.820 | And there's a very well-established case law there.
00:38:16.300 | And the YouTubers and the music industry, the RIAA,
00:38:19.220 | like the most litigious section of the creator economy
00:38:23.300 | has gone after Udio and Suno, you know,
00:38:25.620 | Mikey from Suno, we did our podcast with him.
00:38:28.260 | And it's unclear what will happen there,
00:38:30.720 | but it's gonna be a very costly legal battle for sure.
00:38:34.420 | - Yeah, I mean, music industry and lawsuits
00:38:37.860 | name a more iconic duo, you know,
00:38:39.460 | so I think that's to be expected.
00:38:41.700 | - I think last time we talked about this,
00:38:42.740 | I was pretty optimistic that something like this
00:38:46.020 | would reach the Supreme Court.
00:38:47.740 | And with the way that the Supreme Court is making rulings,
00:38:52.260 | like we just need a judgment on whether or not
00:38:54.260 | training on data is transformative use.
00:38:57.860 | So I think it is.
00:38:59.020 | Literally, we're using transformers
00:39:00.340 | to do transformative use.
00:39:02.060 | So then it's open season for AI to do it.
00:39:04.060 | And comparatively, the content creators
00:39:05.940 | and owners will lose out, they just will.
00:39:09.100 | 'Cause right now we're paying them money
00:39:10.100 | out of fear of lawsuits.
00:39:11.540 | If the Supreme Court rules that there are no lawsuits
00:39:13.300 | to be had, then all their money disappears.
00:39:16.540 | - I think people are probably scraping Latent Space
00:39:18.660 | and we're not getting a dime, so that's what it is.
00:39:23.340 | - No, you can support with like
00:39:24.700 | an $8 a month subscription and that pays
00:39:26.820 | for our microphones and travel and stuff like that.
00:39:28.860 | Yeah, it's definitely not worth the amount of time
00:39:32.340 | we're putting into it, but it's a labor of love.
00:39:34.260 | - Yeah, exactly.
00:39:35.980 | - Synthetic data.
00:39:36.820 | - Yeah, I guess we talked about it a little bit
00:39:39.060 | before with Lama, but there was also the alpha proof thing.
00:39:43.020 | - Yes, just before I came here,
00:39:44.940 | I was working on that, et cetera.
00:39:46.740 | - Yeah, Google trained, almost got a gold medal.
00:39:49.540 | I forget what the--
00:39:50.380 | - Yes, they're one point short of the gold medal.
00:39:51.580 | - Yeah, one point short of the gold medal.
00:39:52.740 | - It's remarkable. I wish they had more questions.
00:39:55.460 | So the International Math Olympiad has six questions
00:39:59.900 | and each question is seven points.
00:40:02.140 | Every single question that the AlphaProof model tried,
00:40:06.740 | it got full marks on, it just failed on two.
00:40:09.900 | And then the cutoff was like sadly one point higher
00:40:12.660 | than that, but still like it was a very big,
00:40:15.140 | like a lot of people have been looking at IMO
00:40:17.100 | as like the next gold prize, grand prize
00:40:19.500 | in terms of what AI can achieve and betting markets
00:40:22.860 | and Eliezer Yudkowsky has updated, saying like,
00:40:25.940 | yeah, like we're pretty close.
00:40:27.500 | Like we've basically reached near gold medal status.
00:40:31.500 | We definitely reached a silver and bronze status
00:40:34.100 | and we'll probably reach gold medal next year, right?
00:40:37.140 | Which is good.
00:40:38.140 | There's also related work from Hugging Face
00:40:40.180 | on the Numina Math Competition.
00:40:41.660 | So this is on the AI Mathematical Olympiad,
00:40:44.540 | which is an easier version of the human Math Olympiad.
00:40:48.900 | This is all like related research work on search
00:40:51.900 | and verifier model assisted exploration
00:40:54.660 | of mathematical problems.
00:40:57.060 | So yeah, that's super positive.
00:40:58.300 | I don't really know much else beyond that.
00:41:00.020 | Like it's always hard to cover this kind of news
00:41:01.820 | 'cause it's not super practical
00:41:03.300 | and it also doesn't generalize.
00:41:04.540 | So one thing that people are talking about
00:41:06.500 | is this concept of jagged intelligence.
00:41:08.220 | 'Cause at the same time, we're having this discussion
00:41:10.540 | about being super human.
00:41:12.460 | You know, one of the IMO questions was solved in 19 seconds
00:41:16.700 | after we gave the question to AlphaProof.
00:41:20.500 | At the same time, language models cannot determine
00:41:23.100 | if 9.9 is smaller than or bigger than 9.11.
00:41:27.580 | And part of that is "9.11 is an inside job,"
00:41:31.060 | which is a funny joke, someone else's joke.
00:41:34.300 | I don't know, I really like that joke.
00:41:35.820 | But it's jagged intelligence.
00:41:37.100 | This is a failure to generalize because of tokenization
00:41:40.020 | or because of whatever.
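(If you want to see the tokenization angle for yourself, here is a minimal sketch using OpenAI's open source tiktoken library; the exact splits depend on which tokenizer you load.)

```python
# Minimal sketch: inspect how a BPE tokenizer splits "9.9" vs "9.11".
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding

for s in ["9.9", "9.11"]:
    ids = enc.encode(s)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{s!r} -> {pieces}")

# If the decimal part comes out as the integer token "11", the model is
# nudged toward integer comparison, one plausible source of the error.
```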
00:41:41.100 | And what we need is general intelligence.
00:41:43.100 | We've always been able to train dedicated special models
00:41:45.180 | to win prizes and do stunts.
00:41:47.140 | But the grand prize is general intelligence.
00:41:49.500 | That same model does everything.
00:41:51.700 | - Is it gonna work that way?
00:41:53.380 | I don't know.
00:41:54.220 | I think like if you look back a year and a half ago
00:41:57.260 | and you would say,
00:41:58.100 | "Can one model get to general intelligence?"
00:41:59.700 | Most people would be like, "Yeah, we can keep scaling."
00:42:02.100 | I think now it's like,
00:42:04.020 | is it gonna be more of a mix of models?
00:42:07.260 | You know, like can you actually do one model
00:42:09.100 | that does it all?
00:42:10.260 | - Yeah, absolutely.
00:42:11.460 | I think GPT-5 or Gemini 3 or whatever
00:42:15.660 | would be much more capable at this kind of stuff
00:42:19.100 | while it also serves our needs with everyday things.
00:42:22.420 | It might be completely uneconomical.
00:42:26.260 | Like why would you use a giant ass model
00:42:28.060 | to do normal stuff?
00:42:29.740 | But it is just a demonstration of proof
00:42:31.980 | that we can build super intelligence for sure.
00:42:34.900 | And then everything else follows from there.
00:42:37.300 | But right now we're just pursuing super intelligence.
00:42:40.180 | I always think about this,
00:42:41.420 | just reflecting on the GPU rich, poor stuff
00:42:46.460 | and now this AlphaGeometry stuff.
00:42:46.460 | I used to say you pursue capability first,
00:42:48.260 | then you make it more efficient.
00:42:50.140 | You make frontier model,
00:42:53.900 | then you distill it down to the 8B and 70B,
00:42:55.460 | which is what Llama 3 did.
00:42:58.780 | And by the way, also OpenAI did it with GPT-4o
00:43:00.620 | and then distilled it down to 4o Mini.
00:43:02.860 | And then Claude also did it with Opus
00:43:02.860 | and then with 3.5 Sonnet, right?
00:43:07.260 | That's the usual recipe.
00:43:07.260 | In fact, I call it part of the deployment strategy of models.
00:43:10.340 | You train a base layer, you train a large one,
00:43:12.780 | and then you distill it down.
00:43:14.020 | You add structured output generation,
00:43:15.940 | tool calling and all that.
00:43:17.020 | You add the long context.
00:43:18.060 | You add like this standard stack of stuff
00:43:20.300 | in post-training that is growing and growing
00:43:22.380 | to the point where now OpenAI has opened a team
00:43:24.580 | for mid-training that happens before post-training.
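(As a rough illustration of the distill-it-down step, here is a minimal knowledge distillation sketch: a small student matching a large teacher's token distributions. It assumes Hugging Face-style models with a .logits output; the real Llama 3.1 or 4o Mini pipelines are obviously far more involved.)

```python
# Minimal knowledge-distillation sketch, not any lab's actual recipe.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, input_ids, temperature=2.0):
    with torch.no_grad():
        teacher_logits = teacher(input_ids).logits  # frozen large model
    student_logits = student(input_ids).logits

    # Soften both distributions, then minimize KL(teacher || student).
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logprobs, t_probs, reduction="batchmean") * temperature**2
```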
00:43:28.860 | I think like one thing that I've realized
00:43:31.660 | from this AlphaGeometry thing
00:43:33.420 | is before you have capability and you have efficiency,
00:43:36.340 | there's an in-between layer of generalization
00:43:39.060 | that you need to accomplish.
00:43:40.020 | You need to do capability in one domain.
00:43:42.500 | You need to generalize it.
00:43:43.860 | Then you need to efficiencize it.
00:43:46.460 | Then you have good models.
00:43:48.020 | - That makes sense.
00:43:50.740 | I think like maybe the question is
00:43:53.020 | how many things can you make it better for
00:43:56.100 | before generalizing it, you know?
00:43:58.260 | Yeah, I don't have a good intuition for that.
00:44:00.340 | - We'll talk about that in the next thing.
00:44:02.340 | Yeah, so we can skip Nemotron's worth looking at
00:44:04.420 | if you're interested in synthetic data.
00:44:06.220 | Multimodal labeling, I think has happened a lot.
00:44:08.300 | We'll jump to multimodal now.
00:44:11.540 | - Yeah, we got a bunch of news.
00:44:13.220 | Well, the first news is that 4.0 voice is still not out
00:44:17.100 | even though the demo was great.
00:44:19.060 | I think they're starting to roll out the beta
00:44:21.340 | in the next week.
00:44:22.180 | So I subscribed back to ChatGPT+.
00:44:25.420 | - You gave in?
00:44:26.340 | - I gave in because they're rolling it out next week.
00:44:28.660 | So you better be on the cutoff
00:44:30.300 | or you're not going to get it.
00:44:31.620 | - Nice bait by Sam Altman.
00:44:33.300 | - I said this, I said when I talk about
00:44:34.660 | unbundling on ChatGPT,
00:44:35.500 | it's basically because they had nothing to offer people.
00:44:37.780 | That's why people aren't subscribing
00:44:38.940 | because why keep paying $20 a month for this, right?
00:44:41.500 | But now they have proprietary models.
00:44:42.940 | Oh yeah, I'm back in, right?
00:44:44.860 | - We're so back.
00:44:45.860 | - We're so back, we're so back.
00:44:47.220 | I will pay $200 for the Scarlett Johansson voice,
00:44:49.460 | but you know, they'll probably get sued for that.
00:44:52.500 | But yeah, the voice is coming.
00:44:55.260 | We had a demo at the World's Fair that was,
00:44:57.980 | I think the second public demo.
00:45:00.260 | Roman, I have to really give him a shout out for that.
00:45:03.140 | We had a few people drop out last minute
00:45:05.580 | and he was, he rescued the conference
00:45:08.220 | and worked really hard.
00:45:10.500 | Like, you know, I think off the scenes,
00:45:11.900 | I think something that people don't understand
00:45:13.540 | is OpenAI puts a lot of effort into their presentations
00:45:15.940 | and if it's not ready, they won't launch it.
00:45:17.700 | Like he was ready to call it off
00:45:19.060 | if we didn't make the AV work for him.
00:45:21.260 | And I think, yeah, they care about their presentation
00:45:23.740 | and how they launch things to people.
00:45:25.860 | Those minor polished details really matter.
00:45:28.380 | Just for the record, for people who don't understand
00:45:30.340 | what happened was, first of all, you can go see,
00:45:32.620 | just look for like the GPT-4o talk
00:45:34.340 | at the AI Engineer World's Fair.
00:45:35.940 | But second of all,
00:45:36.780 | because it was presented live at a conference
00:45:39.060 | with large speakers blaring next to you
00:45:40.900 | and it is a real-time voice thing.
00:45:42.460 | So it's listening to its own voice
00:45:44.340 | and it needs to distinguish between its own voice
00:45:46.780 | and between the human voice
00:45:47.980 | and it needs to ignore its own voice.
00:45:49.540 | So we had OpenAI engineers tune that for our stage
00:45:52.460 | to make this thing happen, which is absurd.
00:45:56.620 | It was so funny, but also like,
00:45:58.500 | shout out to them for doing that for us
00:46:00.740 | and for the community, right?
00:46:02.020 | Because I think people wanted an update on voice.
00:46:05.020 | - Yeah, they definitely do care about demos.
00:46:07.980 | Not much to add there.
00:46:09.300 | - Yeah. - Llama 3 voice?
00:46:10.780 | - Something that maybe is buried
00:46:12.380 | among all the Llama 3 news
00:46:13.580 | is that Llama 3 is supposed to be a multimodal model.
00:46:16.340 | It was delayed thanks to the European Union.
00:46:19.380 | Apparently, I'm not sure what the whole story there is.
00:46:22.220 | I didn't really read that much about it.
00:46:23.860 | It is coming.
00:46:24.700 | Llama 3 will be multimodal.
00:46:26.260 | It uses adapters rather than being natively multimodal.
00:46:29.980 | But I think that it's interesting to see
00:46:32.260 | the state of Meta-AI research come together
00:46:35.620 | because there were these independent threads of Voicebox
00:46:39.540 | and Seamless Communication.
00:46:41.540 | These are all projects that Meta-AI has launched
00:46:43.540 | that basically didn't really go anywhere
00:46:45.260 | because they were all one-offs.
00:46:46.860 | But now all that research is being pulled into Llama 3,
00:46:49.300 | like Llama 3 is just subsuming all of FAIR,
00:46:52.300 | all of Meta-AI into this thing.
00:46:54.660 | And yeah, you can see Voicebox mentioned
00:46:57.660 | in the Llama 3 voice adapter.
00:46:59.380 | I was kind of bearish on conformers
00:47:01.820 | because I looked at the state of existing conformer research
00:47:05.060 | in ICML, ICLR, and NeurIPS,
00:47:08.180 | and they were far, far, far behind Whisper,
00:47:10.980 | mostly because of scale,
00:47:12.020 | like the sheer amount of resources that are dedicated.
00:47:14.300 | But Meta is approaching there.
00:47:15.940 | I think they had 230,000 hours of speech recordings.
00:47:20.580 | I think Whisper is something like 600,000.
00:47:24.260 | So Meta just needs to 3X the budget on this thing
00:47:26.220 | and they'll do it.
00:47:27.220 | And we'll have open source voice.
00:47:30.980 | - Yeah, and then we can hopefully fine tune on our voice
00:47:34.500 | and then we just need to write this episode
00:47:36.460 | instead of actually recording it.
00:47:38.180 | - I should also shout out the other thing from Meta,
00:47:40.180 | which is a very, very big deal, which is Chameleon,
00:47:42.820 | which is a natively early fusion vision and language model.
00:47:47.500 | So most things are late fusion, basically.
00:47:49.460 | Like you freeze an existing language model,
00:47:51.340 | you freeze an existing vision transformer,
00:47:53.660 | then you kind of fuse them with an adapter layer.
00:47:56.140 | That is what Llama 3 is also doing.
00:47:58.460 | But Chameleon is slightly different.
00:47:59.820 | Chameleon is interleaving in the same way that Idefics,
00:48:03.940 | the sort of dataset, is doing,
00:48:06.700 | interleaving natively for image generation
00:48:09.420 | and vision and text understanding.
00:48:12.940 | And I think like once that is better understood,
00:48:16.100 | that is going to be better.
00:48:17.180 | That is the more deep learning build version of this,
00:48:20.060 | the more GPU rich version of doing all this.
00:48:23.060 | I asked Yitei this question about Chameleon in his episode,
00:48:26.660 | he did not confirm or deny,
00:48:28.020 | but I think he would agree that
00:48:29.660 | that is the right way to do multimodality.
00:48:32.340 | And now that we're proving out
00:48:34.820 | that multimodality is valuable to people,
00:48:37.820 | basically all this half-ass measures around adapters
00:48:41.340 | is going to flip to natively multimodal.
00:48:44.340 | To me, that's what GPT-4o represents.
00:48:46.020 | It is the train from scratch, fully omnimodal model,
00:48:50.620 | which is early fusion.
00:48:51.900 | So if you want to read that,
00:48:53.220 | you should read the Chameleon paper, basically.
00:48:54.740 | That's my whole point.
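(To make the late versus early fusion distinction concrete, a minimal pseudo-PyTorch sketch; the shapes, argument names, and components here are illustrative, not either model's real architecture.)

```python
# Illustrative contrast between adapter-style late fusion and
# Chameleon-style early fusion. All names are hypothetical.
import torch

def late_fusion(image, text_ids, vision_tower, adapter, llm):
    # Adapter approach: both backbones frozen, only the adapter trains.
    with torch.no_grad():
        img_feats = vision_tower(image)       # frozen ViT features
    img_embeds = adapter(img_feats)           # project into the LM's space
    return llm(prefix_embeds=img_embeds, input_ids=text_ids)

def early_fusion(image, text_ids, image_tokenizer, omni_model):
    # Chameleon approach: images become discrete tokens, interleaved
    # with text, and one transformer trains from scratch on the mix.
    img_ids = image_tokenizer(image)          # discrete image token ids
    interleaved = torch.cat([img_ids, text_ids], dim=-1)
    return omni_model(interleaved)
```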
00:48:56.340 | - And there was some of the Chameleon drama
00:48:58.900 | because the open model doesn't have image generation.
00:49:02.460 | And then there were fine tuning recipe.
00:49:04.500 | - It's so funny.
00:49:05.340 | The leads were like, "No, do not follow these instructions
00:49:08.140 | "to fine tune image generation."
00:49:10.860 | - That's just really funny.
00:49:11.940 | I don't know what the...
00:49:13.380 | Okay, so yeah, whenever image generation is concerned,
00:49:15.460 | obviously because of the Gemini issue,
00:49:17.980 | it's very tricky for large companies to release that,
00:49:22.700 | but they can remove it,
00:49:24.020 | say that they remove it,
00:49:24.860 | point out exactly where they remove it
00:49:26.300 | and let the open source community put it back in.
00:49:28.500 | (laughs)
00:49:31.340 | The last piece I had, which I kind of deleted,
00:49:32.980 | was there's a special mention,
00:49:34.980 | honorable mention of Gemma again with PaliGemma,
00:49:37.540 | which is one of the smaller releases from Google I/O.
00:49:39.780 | I think you went, right?
00:49:40.980 | So PaliGemma was mentioned in there?
00:49:43.060 | I don't know.
00:49:44.860 | It was one of the...
00:49:45.700 | - Yeah, yeah, one of the workshops.
00:49:46.540 | - Very, very small release.
00:49:48.380 | But CoPaliGemma now is being talked about a lot
00:49:50.940 | as a late interaction model
00:49:53.020 | for extracting structured text out of PDFs.
00:49:55.780 | Very, very important for business work.
00:49:57.500 | - Yeah, I know.
00:49:58.340 | - Workhorses.
00:49:59.180 | - Yes.
00:50:00.020 | - So apparently it is doing better than Amazon Textract
00:50:02.060 | and all the other state-of-the-art.
00:50:03.420 | And it's a tiny, tiny model that does this.
00:50:05.580 | And it's really interesting.
00:50:06.900 | It's a combination of Omar Khattab's
00:50:09.900 | ColBERT retrieval approach on top of a vision model.
00:50:13.820 | I was severely underestimating PaliGemma
00:50:16.500 | when it came out, but it continues to come up.
00:50:18.820 | There's a lot of trends.
00:50:20.140 | And again, this is making a lot of progress here,
00:50:22.860 | just in terms of their applications in real-world use cases.
00:50:26.060 | These are small models, but they're very, very capable
00:50:28.220 | and they're a very good basis
00:50:29.460 | to build things like CopolyGemma.
00:50:31.060 | - Yeah, no, Google has been doing great.
00:50:33.780 | I think maybe a lot of people initially wrote them off,
00:50:36.100 | but between, you know, some of the Gemini Nano stuff,
00:50:39.340 | like Gemma 2, PaliGemma.
00:50:42.540 | We'll talk about some of the KV cache and context caching.
00:50:45.820 | - Yeah, yeah, that's a RAG workhorse.
00:50:47.540 | - So there's a lot to like.
00:50:50.260 | And our friend Logan is over there now, so.
00:50:52.460 | He's excited about everything they got going on, so yeah.
00:50:55.180 | - I think there's a little bit of a fight
00:50:56.620 | between AI Studio and Vertex.
00:50:58.420 | And what Logan represents is,
00:51:00.260 | so he's moved from DevRel to PM,
00:51:02.140 | and he was PM for the Gemma 2 launch.
00:51:05.580 | Vertex has this reputation of being extremely hard to use.
00:51:08.380 | It's one reason why GCP has kind of
00:51:10.580 | fallen behind a little bit.
00:51:12.220 | And so AI Studio represents like
00:51:14.180 | the developer-friendly version of this,
00:51:16.020 | like the Netlify or Vercel to the AWS, right?
00:51:20.820 | And I think it's Google's chance to reinvent itself
00:51:23.780 | for this audience, for the AI engineer audience
00:51:25.380 | that doesn't want like five levels of auth IDs and org IDs
00:51:29.180 | and policy permissions just to get something going.
00:51:32.180 | - True, true.
00:51:33.900 | Yeah, we want to jump into RAG Ops Wars.
00:51:37.020 | - What to say here?
00:51:37.860 | I think that what RAG Ops Wars are to me,
00:51:40.820 | like the tooling around the ecosystem.
00:51:44.100 | And I might need to actually rename this war.
00:51:46.900 | - War renaming alert, what are we calling it?
00:51:49.660 | - LLM OS.
00:51:50.940 | - LLM OS.
00:51:51.780 | - Because it used to be when the only job
00:51:55.380 | for AIs to do was chatbots.
00:51:58.900 | Then RAG matters, then Ops matters.
00:52:01.620 | But now we need AIs to also write code.
00:52:03.860 | We also need AIs to work with other agents, right?
00:52:08.780 | That's not reflected in any of the other wars.
00:52:11.020 | So I think that just the whole point is
00:52:13.740 | what does an LLM plug into with the broader ecosystem
00:52:16.820 | to be more capable than an LLM can be on its own?
00:52:19.300 | - Yeah.
00:52:20.260 | - I just announced it,
00:52:21.780 | but this is something I've been thinking about a lot.
00:52:24.500 | It's a blog post I've been working on.
00:52:26.340 | Basically, my tip to other people is
00:52:28.300 | if you want to see where things are going,
00:52:29.420 | you go open up the ChatGPT GPT creator.
00:52:31.820 | Every single button on the GPT creator
00:52:33.900 | is a potential startup.
00:52:35.820 | Exa is for search.
00:52:38.380 | The knowledge RAG thing is for RAG.
00:52:40.140 | - Yeah, we invested in e2b.
00:52:42.140 | - Yeah, congrats.
00:52:42.980 | Is that announced?
00:52:43.820 | I don't know if you-
00:52:44.660 | - It's announced now.
00:52:45.500 | By the time this goes out, it'll be.
00:52:47.140 | - Briefly, what is e2b?
00:52:48.420 | - So e2b is basically a code interpreter SDK as a service.
00:52:52.060 | So you can add code interpreter to any model.
00:52:54.460 | They partner with Mistral to add that in.
00:52:56.380 | They have this open source Claude Artifacts clone
00:52:59.220 | using e2b.
00:53:00.420 | It's a, I mean, the amount of like traction
00:53:02.580 | that they've been getting in open source has been amazing.
00:53:05.060 | I think they went in like four months
00:53:07.020 | from like 10K to a million containers spun up on the cloud.
00:53:10.900 | So, I mean, you told me this maybe like nine months ago,
00:53:14.620 | 12 months ago, something like that.
00:53:16.500 | You were like, well, you literally just said
00:53:19.220 | every chat GPT plugin can be-
00:53:21.060 | - A business, a startup.
00:53:22.100 | - Can be a business startup.
00:53:23.780 | And I think now it's more clear than ever
00:53:26.300 | than the chat bots are just kind of like
00:53:29.340 | the band-aid solution, you know,
00:53:31.780 | before we build more comprehensive systems.
00:53:33.980 | And yeah, Exa just raised a Series A from Lightspeed, so.
00:53:38.740 | - I tried to get you in on that one as well.
00:53:40.420 | - Yeah, yeah, no, I read that.
00:53:42.500 | - I'm trying to be a scout, man.
00:53:43.340 | I don't know.
00:53:44.180 | - So yeah, as a VC, an early stage VC,
00:53:50.380 | like giving capabilities to the models
00:53:53.020 | is like way more important than the actual LLM ops,
00:53:57.300 | you know, the observability and like all these things,
00:53:59.380 | like those are nice, but like the way you build real value
00:54:02.980 | for a lot of the customers, it's like,
00:54:04.620 | how can this model do more than just chat with me?
00:54:07.380 | So running code, doing analysis, doing web search.
00:54:10.020 | - I might disagree with you.
00:54:13.220 | I think they're all valuable.
00:54:14.900 | They're all valuable.
00:54:15.740 | - Yeah, well. - They're all valuable.
00:54:16.660 | So I would disagree with you just on like,
00:54:18.860 | I find ops my number one problem right now
00:54:21.460 | building Smalltalk and building AI news,
00:54:24.220 | building anything that I do.
00:54:25.180 | And I don't think I'm happy with all the ops solutions
00:54:28.380 | that I've explored.
00:54:29.220 | There are some 80 something ops startups.
00:54:31.220 | - Right.
00:54:32.380 | - I nearly, you know, started one of them,
00:54:34.500 | but we'll briefly talk about this ops thing
00:54:36.260 | and then we'll go back to Rag.
00:54:37.940 | The central way I explain this thing to people
00:54:39.980 | is that all the model labs view their job as stopping
00:54:43.020 | at serving you their model over an API, right?
00:54:46.100 | That is unfortunately not everything that you need
00:54:48.660 | in order to productionize this API.
00:54:51.140 | So obviously there's all these startups.
00:54:52.580 | They're like, yeah, we are ops guys.
00:54:54.020 | We've done this for 30 years.
00:54:55.940 | We will now do this for AI.
00:54:57.780 | And 80 of them show up and they all raise money.
00:55:01.100 | And the question is like,
00:55:03.060 | what do you actually need as sort of an AI native ops layer
00:55:06.620 | versus what is just plugged into Datadog, right?
00:55:09.540 | I don't know if you have dealt with that
00:55:11.580 | because I'm not like a super ops person,
00:55:13.700 | but I appreciate the importance of this thing.
00:55:15.500 | I think there's three broad categories,
00:55:18.380 | which is frameworks, gateways, and monitoring or tracing.
00:55:23.140 | We've talked to like, I interviewed Human Loop in London
00:55:26.060 | and you've talked to a fair share of them.
00:55:28.380 | I've talked to a fair share of them.
00:55:29.500 | So the frameworks would be,
00:55:30.940 | honestly, I won't name the startup,
00:55:33.020 | but basically what this company was doing
00:55:35.340 | was charging me $49 a month to store my prompt template.
00:55:39.460 | And every time I make an inference,
00:55:40.620 | it would F-string call the prompt template
00:55:43.980 | on some variables that I supply.
00:55:45.700 | And it's charging $49 a month for unlimited storage of that.
00:55:49.460 | It's absurd, but like people want prompt management tools.
00:55:53.100 | They want to interoperate between PM and developer.
00:55:57.340 | There's some value there.
00:55:58.220 | I don't know what the right price is.
00:55:59.620 | - Yeah. - There's some price.
00:56:00.820 | - I was at, I'm sure I can share this.
00:56:02.780 | I was at the Grab office and they also treat prompts as code,
00:56:07.140 | but they build their own thing to then import the prompts.
00:56:08.580 | - Yeah, but I want to check prompts into my code base
00:56:10.140 | as a developer, right?
00:56:11.300 | But maybe, do you want it outside of the code base?
00:56:14.260 | - Well, you can have it in the code base,
00:56:15.620 | but like, what's like the prompt file?
00:56:17.780 | What's like, you know, it's not just a string.
00:56:20.980 | - It's string and model and config.
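(A minimal sketch of that "prompt file" idea, checked into the repo as string plus model plus config together; the names are made up for illustration.)

```python
# Minimal sketch of a prompt as code: template + model + decoding config.
from dataclasses import dataclass

@dataclass
class PromptSpec:
    template: str                  # f-string-style template
    model: str = "gpt-4o-mini"     # which model to call (illustrative)
    temperature: float = 0.0
    max_tokens: int = 512

    def render(self, **variables) -> str:
        return self.template.format(**variables)

summarize = PromptSpec(
    template="Summarize this article in {n} bullet points:\n\n{article}",
)
prompt = summarize.render(n=3, article="...")
# Pass (prompt, summarize.model, summarize.temperature, ...) to your client.
```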
00:56:24.100 | - Exactly, how do you pass these things?
00:56:26.220 | But I think like the problem with building frameworks
00:56:29.500 | is like frameworks generalize things that we know work.
00:56:33.500 | And like right now we don't really know what works.
00:56:35.580 | - Yeah, but some people have to try, you know,
00:56:37.180 | in the whole point of early stages,
00:56:38.340 | you try it before you know it works.
00:56:39.620 | - Yeah, but I think like the past,
00:56:42.780 | if you see the most successful open source frameworks
00:56:45.140 | that became successful businesses are frameworks
00:56:47.620 | that were built inside companies
00:56:49.300 | and then were kind of spun out as projects.
00:56:51.820 | So I think it's more about ordering.
00:56:54.020 | - Vertical-first instead of horizontal-first.
00:56:55.820 | (laughs)
00:56:56.980 | - I mean, we try to be horizontal-first, right?
00:56:58.820 | And it's like, where are all the horizontal startups?
00:57:01.820 | - There are a lot of them.
00:57:02.860 | They're just not that, they're not going to win by themselves.
00:57:07.860 | I think some of them will win by sheer excellent execution.
00:57:12.340 | And then, but like the market won't pull them.
00:57:14.500 | They will have to pull the market.
00:57:16.140 | - Oh, but that's the thing.
00:57:16.980 | It's like, you know, take like Julius, right?
00:57:20.420 | It's like, "Hey, why are you guys doing Julius?"
00:57:22.780 | It's like the same as Code Interpreter.
00:57:24.380 | And yet they're pretty successful.
00:57:26.460 | A lot of people use it
00:57:27.500 | because they're like solving a problem.
00:57:28.980 | And then-
00:57:29.820 | - They're more dedicated to it than Code Interpreter.
00:57:31.620 | - Exactly.
00:57:32.460 | So it's like, I think-
00:57:33.580 | - Just take it more seriously than (indistinct)
00:57:36.180 | - I think people underestimate how important it is
00:57:38.940 | to be very good at doing something
00:57:41.060 | versus trying to serve everybody with some of these things.
00:57:43.900 | So, yeah, I think that's a learning
00:57:45.700 | that a lot of founders are having.
00:57:47.660 | - Yes.
00:57:48.500 | Okay, so to round out the Ops world.
00:57:49.980 | So it's a three circle Venn diagram, right?
00:57:52.260 | It's frameworks, it's gateways.
00:57:54.780 | So the only job of the gateway is to just be one endpoint
00:57:57.660 | that proxies all the other endpoints, right?
00:58:00.300 | And it normalizes the APIs mostly to OpenAI's API
00:58:05.300 | just because most people started OpenAI.
00:58:07.940 | And then lastly, it's monitoring and tracing, right?
00:58:10.340 | So logging those things,
00:58:11.780 | understanding the latency, like P99 or whatever,
00:58:14.340 | and like the number of steps that you take.
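(A minimal sketch of those two circles in code: a gateway that normalizes providers behind one OpenAI-shaped call, plus just enough tracing to compute P99 later. The routing table and helper are hypothetical; real gateways also handle keys, retries, rate limits, and fallbacks.)

```python
# Minimal gateway + tracing sketch. Illustrative only.
import time

# Hypothetical routing table: model prefix -> provider base URL.
PROVIDERS = {
    "gpt": "https://api.openai.com/v1",
    "claude": "https://api.anthropic.com/v1",
    "mistral": "https://api.mistral.ai/v1",
}

def call_provider(base_url: str, model: str, messages: list) -> dict:
    # Stand-in for a real HTTP POST to {base_url}/chat/completions.
    return {"choices": [{"message": {"role": "assistant", "content": "..."}}]}

def chat(model: str, messages: list) -> dict:
    base_url = next(url for prefix, url in PROVIDERS.items()
                    if model.startswith(prefix))
    start = time.perf_counter()
    response = call_provider(base_url, model, messages)
    latency_ms = (time.perf_counter() - start) * 1000
    # The monitoring circle: log enough per request to compute P99 later.
    print({"model": model, "latency_ms": round(latency_ms, 1)})
    return response
```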
00:58:15.820 | So LangSmith is obviously very, very early on to this stuff.
00:58:18.620 | But so is Langfuse.
00:58:20.460 | So is, oh my God, like there's so many.
00:58:24.180 | I'm sure like Datadog has some like-
00:58:26.180 | - Yeah, yeah.
00:58:27.020 | - Weights and biases has some, you know.
00:58:29.100 | It's very hard for me to choose between all those things.
00:58:31.740 | So I, as a small team developer,
00:58:34.620 | wants one tool that does all these things.
00:58:36.820 | And my discovery has been
00:58:38.020 | that there's so much specialization here.
00:58:40.220 | Like everyone is like, oh yeah, we do this,
00:58:42.500 | but we don't do that.
00:58:43.340 | For the other stuff,
00:58:44.180 | we recommend these two other friends of ours.
00:58:46.420 | And I'm like, why am I integrating four tools
00:58:48.900 | when I just need one?
00:58:49.980 | They're all the same thing.
00:58:51.380 | That is my current frustration.
00:58:54.980 | The obvious frustration solution is I build my own, right?
00:58:57.780 | Which is, you know, we have 14 standards, now we have 15.
00:59:01.060 | So it's just a very messy place to be in.
00:59:03.700 | I wish there was a better solution to recommend to people
00:59:06.660 | because right now I cannot clearly recommend things.
00:59:08.940 | - Yeah, I think the biggest change in this market
00:59:11.300 | is like latency is actually not that important anymore.
00:59:14.860 | Like we lived in the past 10 years in a world
00:59:17.060 | where like 10, 15, 20 milliseconds made a big difference.
00:59:20.620 | I think today people will be happy to trade 50 milliseconds
00:59:24.140 | to get higher quality output from a model.
00:59:27.220 | So, but still all the tracing is all like,
00:59:29.620 | how long did it take?
00:59:30.660 | Like, what's the thing?
00:59:31.500 | Instead of saying, is this quality good for this output?
00:59:34.660 | Like, should you use another model?
00:59:36.180 | Like, we're just kind of taking what we did with cloud
00:59:38.860 | and putting it in LLMs instead of saying
00:59:41.460 | what actually matters when it comes to LLMs,
00:59:43.780 | what you should actually monitor.
00:59:45.780 | Like, I don't really care what my P99 is
00:59:47.780 | if the model is crap, right?
00:59:49.100 | It's like, also like, I don't own most of the models.
00:59:51.820 | So it's like, this is the GPT-4 API performance.
00:59:54.660 | It's like, okay, am I going into a moment?
00:59:56.820 | It's like, I can't do anything about it, you know?
00:59:58.820 | So I think that's maybe why the value is not there.
01:00:02.100 | Like, you know, am I supposed to pay 100K a year?
01:00:04.580 | Like I pay Datadog or whatever,
01:00:07.900 | for you to tell me that GPT-4 is slow?
01:00:10.140 | It's like, you know, I don't know.
01:00:13.940 | - I agree, it's challenging there.
01:00:15.860 | Okay, so the last piece I'll mention is briefly,
01:00:18.820 | ML Ops is still real.
01:00:20.740 | I think LLM Ops, or whatever you call this,
01:00:23.700 | AI Engineer Ops, the Ops layer on top of the LLM layer
01:00:26.540 | might follow the same evolution path as the ML Ops layer.
01:00:30.580 | And so the most impressive thing I've seen
01:00:32.820 | from the ML Ops layer is from Apple.
01:00:35.260 | When they announced Apple Intelligence,
01:00:36.660 | they also announced Talaria,
01:00:37.820 | which is their internal ML Ops tool,
01:00:39.740 | with which you can profile the performance
01:00:41.260 | of each layer of a transformer.
01:00:43.100 | And you can A/B test like a hundred different variations
01:00:46.460 | of different quantizations and stuff
01:00:47.820 | and pick the best performance.
01:00:49.460 | And I could see a straight line from there to like,
01:00:51.860 | okay, I want this, but for my AI Engineering Ops,
01:00:55.380 | like I want this level of clarity on like what I do.
01:00:59.700 | And there's a lot of internal engineering
01:01:02.340 | within these big companies
01:01:03.180 | that take their ML training very seriously.
01:01:05.020 | And I see that also happening for AI Engineering as well.
01:01:07.660 | And let's briefly talk about RAG and context caching, maybe,
01:01:10.140 | unless you have other like LLM OS stuff
01:01:12.340 | that you're excited about.
01:01:14.220 | - LLM OS stuff I'm excited about.
01:01:16.460 | No, I think that's really, a lot of it is like,
01:01:20.180 | move beyond being observability or like help
01:01:24.260 | for like making the prompt call
01:01:25.500 | and like actually being on LLM OS, you know?
01:01:28.140 | I think today it's mostly like LLM Rails, you know?
01:01:32.060 | Like there's no OS,
01:01:33.140 | but I think like actually helping people build things.
01:01:35.020 | That's why, you know, if you look at Exa or e2b,
01:01:37.500 | it's like, that's the OS, you know?
01:01:39.300 | Those are kind of like the OS primitives
01:01:41.780 | that you need around it.
01:01:43.180 | - Yeah, okay.
01:01:44.020 | So I'll mention a couple of things then.
01:01:45.580 | One layer I've been excited about publicly,
01:01:47.900 | but I haven't talked about it on this podcast
01:01:49.540 | is memory databases,
01:01:51.660 | memory layers on top of vector databases.
01:01:53.820 | The Vogue thing of last year was vector databases, right?
01:01:56.220 | Everybody had a vector database company.
01:01:59.420 | And I think the insight is that vector databases
01:02:02.380 | are too low level.
01:02:03.300 | Like they're not very useful out of the box.
01:02:05.100 | They do cosine similarity matching and retrieval,
01:02:07.660 | and that's about it.
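(For the record, the cosine similarity part really is a few lines, which is the sense in which vector databases are low level; a minimal numpy sketch.)

```python
# The core of a vanilla vector DB: normalize, dot product, take top-k.
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5):
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                  # cosine similarity per document
    return np.argsort(-scores)[:k]  # indices of the k best matches
```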
01:02:08.540 | We'll briefly maybe mention here, BM42,
01:02:10.780 | which was this whole debate between Vespa and who else?
01:02:14.540 | Qdrant,
01:02:15.660 | and I think a couple other companies also chipped in,
01:02:18.460 | but it was mainly a very, very public
01:02:20.420 | and ugly theater battle
01:02:21.460 | between benchmarking for databases.
01:02:23.860 | And the history of benchmarking for databases
01:02:25.620 | goes as far back as Larry Ellison and Oracle and all that.
01:02:28.940 | It's just very cute to see it happening
01:02:30.820 | in the vector database space.
01:02:32.660 | Some things don't change.
01:02:34.620 | But on top of that,
01:02:36.340 | I think one of the reasons I put vector databases
01:02:38.580 | inside of these wars is in order to grow,
01:02:41.460 | the vector databases have to become more frameworks.
01:02:44.220 | In order to grow,
01:02:45.060 | the ops companies have to become more frameworks, right?
01:02:47.180 | And then the framework companies have to become ops companies,
01:02:49.300 | which is what LangChain is.
01:02:51.100 | So one element of the vector databases growing,
01:02:54.020 | I've been looking for what the next direction
01:02:55.700 | of vector databases growing is,
01:02:57.420 | is memory, long conversation memory.
01:02:59.460 | I have on me this Bee,
01:03:02.660 | which is one of the personal AI wearables.
01:03:04.340 | I'm also getting the limitless personal AI wearable,
01:03:07.060 | which is like,
01:03:07.900 | I just wanted to record my whole conversation
01:03:09.660 | and just repeat back to me,
01:03:11.140 | or let me find, augment my memory.
01:03:15.300 | I'm sure Character AI has some version of this.
01:03:17.180 | Like everyone has conversation memory
01:03:19.380 | that is different from factual memory.
01:03:21.420 | And right now,
01:03:22.260 | vector database is very oriented towards factual memory,
01:03:24.340 | document retrieval, knowledge-based retrieval,
01:03:26.620 | but it's not the same thing as conversation retrieval,
01:03:28.660 | where I need to know what I've said to you,
01:03:31.420 | what I said to you yesterday,
01:03:32.340 | what I said to you a year ago, three years ago.
01:03:34.420 | And it's a different nature of retrieval, right?
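(A minimal sketch of that difference: the same similarity score, reweighted by recency, so "what did I say yesterday" beats an equally similar line from three years ago. The half-life is an arbitrary illustrative knob.)

```python
# Minimal conversation-memory sketch: similarity blended with recency.
import time

def memory_score(similarity: float, said_at: float,
                 half_life_days: float = 30.0) -> float:
    age_days = (time.time() - said_at) / 86400
    recency = 0.5 ** (age_days / half_life_days)  # exponential decay
    return similarity * recency

# A plain vector DB ranks by `similarity` alone; a memory layer ranks by
# this blended score (plus speaker, thread, importance, and so on).
```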
01:03:36.740 | So there's a, at the conference that we ran,
01:03:39.660 | graph rag was a lot of focus for people,
01:03:42.580 | the marriage of knowledge graphs and rag.
01:03:44.740 | I think that this is commonly a trap in ML
01:03:48.100 | that people are like,
01:03:48.940 | they discover that graphs are a thing for the first time.
01:03:50.900 | They're like, oh yeah, everything's a graph.
01:03:52.140 | Like the future is graphs and then nothing happens.
01:03:54.140 | Very, very common.
01:03:55.020 | This happened like three, four times
01:03:56.220 | in the industry's past as well.
01:03:58.540 | But maybe this time is different.
01:03:59.940 | - Maybe.
01:04:00.780 | (laughs)
01:04:01.620 | Unless.
01:04:02.460 | - Unless.
01:04:03.300 | (laughs)
01:04:04.500 | So, this is a fun, this is why I'm not an investor.
01:04:08.340 | Like you have to get the time that this time is different
01:04:11.620 | because no ideas are really truly new,
01:04:14.180 | but sometimes this time is different.
01:04:16.820 | (laughs)
01:04:18.060 | - Maybe.
01:04:18.900 | - And so memory databases are one form of that,
01:04:20.660 | where like they're focused on the problem of long form memory
01:04:24.180 | for agents, for assistants, for chatbots and all that.
01:04:28.620 | I definitely see that coming.
01:04:29.820 | There were some funding rounds
01:04:30.660 | that I can't really talk about in this sector
01:04:32.980 | and I've seen that happen a lot.
01:04:35.420 | Yeah, I have one more category in the LLM OS,
01:04:36.780 | but any comments on--
01:04:37.900 | - Yeah, no, I think that makes sense to me,
01:04:39.580 | that moving away from just semantic similarity,
01:04:42.660 | I think it's the most important
01:04:43.860 | because people use the same word
01:04:45.780 | with very different meanings, especially when talking.
01:04:48.620 | When writing, it's different, but yeah.
01:04:50.260 | - Yeah, the other direction that vector databases
01:04:51.780 | have gone into, which LanceDB presented at my conference,
01:04:55.060 | was multimodality.
01:04:55.940 | So Character AI uses LanceDB for multimodal embeddings.
01:04:59.900 | That's just a minor difference.
01:05:01.700 | I don't think that's like a quantum leap
01:05:03.220 | in terms of what a vector database does for you.
01:05:05.540 | The other thing that I see in the LLM OS world
01:05:07.620 | is mostly the evolution of just the ecosystem of agents,
01:05:12.620 | the agents talking to other agents
01:05:15.740 | and coordinating with other agents.
01:05:17.260 | So I interviewed Graham Neubig at ICLR
01:05:20.380 | and he since announced that they are pivoting OpenDevin
01:05:23.500 | or broadening OpenDevin into All Hands AI.
01:05:26.820 | I'm not sure about that name,
01:05:27.660 | but it is one of the three LMOS startups
01:05:32.660 | that got funded in the past two months
01:05:35.140 | that I know about, and maybe you know more.
01:05:36.620 | They're all building like this ecosystem of agents,
01:05:39.460 | working with other agents
01:05:40.300 | and all this tooling for agents.
01:05:43.060 | To me, it makes more sense.
01:05:44.060 | It is probably the biggest thing I missed
01:05:46.820 | in doing the four wars.
01:05:48.540 | The need for startups to build this ecosystem thing up.
01:05:51.880 | So the big categories have been taken.
01:05:53.260 | Search, done.
01:05:54.340 | Code interpreter, done.
01:05:55.500 | There's a long tail of others.
01:05:56.420 | So memory is emerging, then there's like other stuff.
01:06:00.700 | And so they're focusing on that.
01:06:02.740 | To me, browser is slightly different from search
01:06:05.980 | and Browserbase is another company I invested in
01:06:08.380 | that is focused on that,
01:06:09.580 | but they're not the only one in that category by any means.
01:06:12.540 | I used to tell people, go to the DevIn demo
01:06:14.900 | and look at the four things that they offer
01:06:16.260 | and each of those things is a startup.
01:06:18.900 | DevIn, since then, they spoke at the conference as well.
01:06:20.740 | Scott was super nice to me
01:06:22.220 | and actually gave me some personal time as well.
01:06:24.780 | They have an updated chart of their plans.
01:06:27.140 | Look at their plans.
01:06:27.980 | They have like 16 things.
01:06:29.100 | Each of those things is a potential startup now.
01:06:31.260 | And that is the LLM OS.
01:06:32.420 | Everyone's building towards that direction
01:06:33.660 | because they need it to do what they need to do as an agent.
01:06:36.740 | If you believe in the agent's future,
01:06:38.820 | you need all these things.
01:06:40.260 | - Yeah.
01:06:41.080 | You think the LLM OS is its own company?
01:06:45.380 | Do you think it's a open standard?
01:06:47.140 | Do you think?
01:06:48.100 | - I would love it to be open standard.
01:06:49.540 | The reality is that people want to own that standard.
01:06:51.660 | So, we actually wound down the AI Engineer Foundation
01:06:54.900 | whose first project was the Agent Protocol,
01:06:57.020 | which E2B actually donated to the foundation
01:06:59.580 | because no one's interested.
01:07:01.740 | Everyone wants to be VC-backed
01:07:04.260 | when they want to own it, right?
01:07:05.540 | So, it's too early to be open source.
01:07:08.700 | People will keep this proprietary and more power to them.
01:07:10.980 | They need to make it work.
01:07:11.820 | They need to make revenue
01:07:12.900 | before all the other stuff can happen.
01:07:14.660 | - Yeah.
01:07:15.500 | I'm really curious.
01:07:16.340 | We're investors in a bunch of agent companies.
01:07:18.260 | None of them really care
01:07:20.660 | about how to communicate with other agents.
01:07:23.020 | They're so focused internally, you know?
01:07:25.060 | But I think in the future,
01:07:26.540 | you know, it talks about this-
01:07:27.380 | - I see, you're talking about agent
01:07:28.220 | to other external agents.
01:07:29.620 | - Yeah, so I think-
01:07:30.460 | - I'm not talking about that.
01:07:31.280 | - Yeah, I wonder when, like,
01:07:33.500 | because that's where the future is going, right?
01:07:35.100 | So, today it's like intra-agent connectivity, you know?
01:07:38.540 | At some point, it's like, well,
01:07:39.620 | it's not like somebody I'm selling into a company
01:07:42.580 | and the company already uses agent X for that job.
01:07:45.580 | I need to talk to that agent, you know?
01:07:47.780 | But I think nobody really cares about that today.
01:07:49.700 | So I think that's usually it.
01:07:51.100 | - Yeah, so I think that layer right now is OpenAPI.
01:07:55.980 | Just give me a RESTful protocol,
01:07:58.180 | I can interoperate with that.
01:07:59.460 | RESTful protocol only does request-response.
01:08:01.740 | So then the next layer is something I have worked on,
01:08:03.600 | which is long-running request-response,
01:08:05.900 | which is workflows,
01:08:07.140 | which is what Temporal was supposed to do
01:08:09.140 | before, let's just say, management issues.
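(A minimal sketch of the long-running version: instead of one blocking call, the caller submits a task and polls for status. The endpoint shapes are invented for illustration, not any real agent protocol.)

```python
# Minimal long-running request/response sketch.
import uuid

TASKS: dict = {}  # in-memory store standing in for a workflow engine

def submit_task(payload: dict) -> str:
    task_id = str(uuid.uuid4())
    TASKS[task_id] = {"status": "running", "result": None, "payload": payload}
    return task_id  # the caller gets a handle back immediately

def poll_task(task_id: str) -> dict:
    return TASKS[task_id]  # {"status": "running" | "done", "result": ...}

# Agent-to-agent then becomes: POST /tasks -> {task_id}, then
# GET /tasks/{task_id} until status == "done", instead of one long request.
```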
01:08:11.820 | Yeah, but like, you know, RPC or some kind of, you know,
01:08:14.540 | I think that the dream is,
01:08:17.060 | and this is one of my problems with the LLM OS concept,
01:08:20.900 | is that do we really need to rewrite every single thing
01:08:23.420 | for AI-native use cases?
01:08:25.340 | Shouldn't the AI just use these things,
01:08:28.020 | these tools the same way as humans use them?
01:08:30.420 | Reality is, for now, yes, they need specialized APIs.
01:08:34.080 | In the distant future, when these things cost nothing,
01:08:36.780 | then they can use it the same way as humans do,
01:08:39.020 | but right now they need specialized interfaces.
01:08:40.940 | The layer between agents ideally should just be English,
01:08:44.720 | you know, like the same way that we talk,
01:08:47.180 | but like English is too under-specified,
01:08:50.580 | unstructured to make that happen, so.
01:08:53.140 | - It's interesting because we talk to each other in English,
01:08:56.420 | but then we both use tools to do things
01:08:58.900 | to then get the response back.
01:09:00.340 | - For those people who want to dive in a little bit more,
01:09:02.580 | I think AutoGen, I would definitely recommend
01:09:04.860 | looking at that, Crew AI.
01:09:06.380 | There are established frameworks now
01:09:08.220 | that are working on inter-agent communication layers,
01:09:10.780 | to coordinate them, and not necessarily externally
01:09:13.740 | from company to company, just internally as well.
01:09:16.540 | If you have multiple agents farming out work
01:09:17.980 | to do different things, you're going to need this anyway.
01:09:20.500 | And I don't think it's that hard.
01:09:23.080 | They are using English.
01:09:23.940 | They're using some mix of English and structured output.
01:09:27.560 | And yeah, if you have a better idea than that, let us know.
01:09:31.340 | - Yeah, we're listening.
01:09:32.540 | - So that's the four words discussion.
01:09:35.980 | I think I want to leave some discussion time open
01:09:38.020 | for miscellaneous trends that are happening
01:09:40.540 | in the industry that don't exactly fit in the four words
01:09:43.220 | or are a layer above the four words.
01:09:45.780 | So the first one to me is just this trend of open source.
01:09:48.820 | Obviously this overlaps a lot with the GPU poor thing,
01:09:51.420 | but I want to really call out this depreciation thing
01:09:54.700 | that I've been working on.
01:09:55.540 | Like I do think it's probably one of the bigger thesis
01:09:58.940 | that I've had in the past month,
01:10:01.280 | which is that we now have a rough idea
01:10:04.340 | of the deprecation schedule of this sort of model spend.
01:10:08.820 | And I basically drew a chart.
01:10:10.460 | I'll link it in the show notes,
01:10:11.500 | but I drew a chart of the price efficiency frontier
01:10:15.620 | of as of March, April, 2024.
01:10:19.180 | And then I had listed all the models
01:10:20.780 | that sit within that frontier.
01:10:22.860 | Haiku was the best cost per intelligence
01:10:25.460 | at that point in time.
01:10:26.740 | And then I did the same chart in July, two days ago,
01:10:30.340 | and the whole thing has moved.
01:10:32.220 | And Mistral is like deprecating their old models
01:10:35.060 | that used to be in the old frontier.
01:10:36.700 | It is so shocking how predictive and tight this band is.
01:10:41.700 | Very, very tight band,
01:10:43.340 | and the whole industry is moving the same way.
01:10:45.380 | And it's roughly one order of magnitude drop in cost
01:10:49.060 | for the same level of intelligence every four months.
01:10:52.220 | My previous number for this
01:10:53.380 | was one order of magnitude drop in cost every 12 months.
01:10:56.020 | But the timeline accelerated
01:10:57.980 | because at GPT-3, it took about a year
01:11:00.860 | to drop order of magnitude.
01:11:02.860 | But now GPT-4, it's really crazy.
01:11:05.100 | I don't know what to say about that,
01:11:06.300 | but I just want to know.
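(To make the arithmetic concrete: one order of magnitude every four months compounds brutally. A quick back-of-envelope, pure extrapolation assuming the trend holds, with a made-up starting price.)

```python
# Back-of-envelope for the deprecation curve: 10x cheaper every 4 months
# at fixed intelligence. Starting price is illustrative.
def cost(initial: float, months: float, period_months: float = 4.0) -> float:
    return initial * 10 ** (-months / period_months)

for m in [0, 4, 8, 12]:
    print(f"month {m:2d}: ${cost(30.0, m):.4f} per 1M tokens")
# month 0: $30.0000 ... month 12: $0.0300, a 1000x drop in a year.
```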
01:11:07.140 | - Do you think GPT-Next and Claude 4
01:11:10.900 | push it back down
01:11:12.860 | because they're coming out
01:11:13.700 | with higher intelligence, higher cost?
01:11:15.660 | Or is it maybe like the timeline is going down
01:11:18.700 | because new frontier models
01:11:19.980 | are not really coming out at the same rate?
01:11:22.540 | - Interesting.
01:11:23.380 | I don't know.
01:11:24.220 | That's a really good question.
01:11:25.060 | Wow, I'm stumped.
01:11:26.820 | I don't have--
01:11:27.660 | - You're like, "Wow, you got a good question."
01:11:28.500 | - Yeah, I don't have an answer.
01:11:30.340 | No, I mean, you have a good question,
01:11:31.620 | but I thought I had solved this,
01:11:33.420 | and then now you came along with it.
01:11:34.500 | The first response is something I haven't thought about.
01:11:37.140 | Yeah, yeah.
01:11:37.980 | So there's two directions here, right?
01:11:39.140 | When the cost of frontier models are going up,
01:11:41.740 | potentially like SB1047 is going to make it illegal
01:11:45.220 | to train even larger models.
01:11:46.940 | For us, I think the opposition has increased enough
01:11:49.940 | that it's not going to be a real concern for people.
01:11:52.180 | But I think every lab basically needs
01:11:54.980 | a small, medium, large play.
01:11:56.500 | And like we said,
01:11:57.540 | in the sort of model deployment framework,
01:12:00.300 | first you choose, you pursue capability,
01:12:02.700 | then you pursue generalization,
01:12:03.940 | then you pursue efficiency.
01:12:05.180 | And what we're talking about here is efficiency.
01:12:08.220 | Now we care about efficiency.
01:12:09.900 | That's definitely one of the emergent stories
01:12:11.660 | of the year that has happened,
01:12:13.460 | is efficiency matters for 4.0, 4.0 mini,
01:12:16.620 | and 3.5 Sonnet in a way that in January,
01:12:19.820 | nobody was talking about.
01:12:21.780 | And that's great.
01:12:22.860 | - Yeah.
01:12:23.700 | - Regardless of GPT-Next and Claude 4 or whatever,
01:12:26.100 | or Gemini 2,
01:12:27.580 | we will still have efficiency frontiers to pursue.
01:12:30.580 | And it seems like doing the higher capable thing
01:12:33.940 | creates the synthetic data for us
01:12:35.460 | to do the efficient thing.
01:12:37.580 | And that means lifting up the,
01:12:39.420 | like I had this difference chart
01:12:41.060 | between Llama 3.0 8B, Llama 3.0 70B,
01:12:45.460 | versus their 3.1 differences.
01:12:48.140 | And the 8B had the most uplift across all the benchmarks.
01:12:51.940 | Right, it makes sense.
01:12:52.780 | You're training from the 405B,
01:12:54.460 | you're distilling from there,
01:12:55.300 | and it's going to have the biggest lift up.
01:12:56.900 | So the best way to train more efficient models
01:12:59.220 | is to train the large model.
01:13:00.500 | - Right, yeah, yeah.
01:13:02.140 | - And then you can distill down to the rest.
01:13:04.060 | So this is fascinating from an investor point of view.
01:13:06.060 | You're like, okay, you're worried about picks and shovels,
01:13:07.820 | you're worried about investing in foundation model labs.
01:13:09.860 | And that's a matter of opinion.
01:13:11.780 | I do think that some foundation model labs
01:13:13.900 | are worth investing in
01:13:14.780 | because they do pay back very quickly.
01:13:17.340 | I think for engineers,
01:13:18.740 | the question is,
01:13:20.060 | what do you do when you know that your base cost
01:13:22.420 | is going down an order of magnitude every four months?
01:13:25.700 | How do you make those assumptions?
01:13:27.500 | And I don't know the answer to that.
01:13:28.780 | I'm just posing the question.
01:13:30.020 | I'm calling attention to it.
01:13:31.300 | Because I think that Cognition is burning money,
01:13:32.740 | if the rumors are right.
01:13:34.380 | I don't know anything from Scott.
01:13:35.660 | I haven't talked to him at all about this,
01:13:37.060 | even though he's very friendly.
01:13:38.500 | But they did that,
01:13:39.860 | they got the media attention,
01:13:41.140 | and now the cost of intelligence is going down.
01:13:43.940 | And it will be economically viable tomorrow.
01:13:46.380 | In the meantime, they have a crap ton of value
01:13:48.740 | from user data,
01:13:50.380 | and a crap ton of value from media exposure.
01:13:52.780 | And I think that the correct stunt to pull
01:13:55.060 | is to make economically non-viable startups now,
01:13:59.100 | and then wait.
01:14:00.380 | But honestly,
01:14:03.020 | basically I'm basically advocating
01:14:04.180 | for people to burn VC money.
01:14:05.260 | - Yeah, they can burn my money all they want
01:14:07.780 | if they're building something useful.
01:14:09.380 | I think the big problem,
01:14:10.740 | not a problem,
01:14:11.580 | but the price of the model comes out,
01:14:13.860 | and then people build on it.
01:14:15.180 | And then there's really no,
01:14:17.260 | the model providers don't really have a lot of leverage
01:14:20.460 | on keeping the price high.
01:14:22.100 | They just have to bring it down
01:14:23.300 | because the people downstream of them
01:14:24.860 | are not making that much money with them.
01:14:27.660 | And I wonder what's gonna be the model
01:14:29.260 | where it's like,
01:14:30.220 | this model is so good,
01:14:31.180 | I'm not putting the price down.
01:14:33.140 | Like if GPT-4o was amazing
01:14:36.300 | and was actually creating a lot of value downstream,
01:14:40.540 | people would be happy to pay.
01:14:42.100 | I think people today are not that happy with the models.
01:14:45.620 | They're good,
01:14:46.460 | but I'm not paying that much
01:14:47.300 | because I'm not really getting that much out of it.
01:14:49.380 | Like we have this AI center of excellence
01:14:51.580 | with a lot of the Fortune 500 groups,
01:14:53.540 | and there are people saving 10, 20 million a year
01:14:56.980 | like with these models doing boring stuff,
01:14:59.660 | like document translation and things like that,
01:15:01.700 | but nobody's making 100 million.
01:15:03.860 | Nobody's making 150 million.
01:15:05.780 | So like the prices just have to go down too much,
01:15:09.540 | but maybe that will change at some point.
01:15:12.060 | - Yeah, I always mention temperature two use cases, right?
01:15:15.260 | Versus temperature zero use cases
01:15:16.700 | where you need precision,
01:15:17.780 | temperature two is where you need creativity.
01:15:19.060 | What are the cases where hallucination is a feature,
01:15:20.900 | not a bug, right?
01:15:21.740 | So we're the first podcast to interview WebSim,
01:15:24.380 | and I'm still pretty positive
01:15:26.340 | about the generative part of AI.
01:15:27.820 | Like we took generative AI and we used it to do RAG.
01:15:30.580 | We have an infinite creativity engine.
01:15:34.500 | Let's go do more of that.
01:15:36.700 | So we'll hopefully do more episodes there.
01:15:38.420 | You have some stuff on agents you wanna-
01:15:40.420 | - Yeah, no, I think this is something
01:15:41.900 | that we talked a lot about,
01:15:43.460 | and we wrote this post months and months ago
01:15:46.500 | about shifting from software as a service
01:15:49.060 | to services as a software.
01:15:50.620 | And that's only more true now.
01:15:52.260 | I think like most companies that are buying AI tooling,
01:15:56.020 | they want the AI to do some sort of labor for them.
01:15:59.220 | And that's why the picks and shovels
01:16:01.060 | kind of disinterest maybe comes from a little bit.
01:16:03.780 | Most companies do not wanna buy tools to build AI.
01:16:06.100 | They want the AI,
01:16:07.380 | and they also do not want to pay a lot of money
01:16:09.660 | for something that makes employees more productive
01:16:11.740 | because the productivity gains
01:16:13.020 | are not accruing to the companies.
01:16:14.580 | They're just accruing to the employees.
01:16:16.660 | People work less, have longer lunch breaks
01:16:18.780 | because they get things done faster.
01:16:20.540 | But most companies are not making a lot more money
01:16:23.020 | by making employees productive.
01:16:24.300 | That's not true for startups.
01:16:25.940 | So if you look at most startups today in AI,
01:16:28.220 | like they're much smaller teams compared to before.
01:16:30.940 | And on the agents side,
01:16:31.780 | we have companies like Brightwave,
01:16:33.100 | which we had on the podcast.
01:16:34.780 | You're selling labor,
01:16:36.020 | which is something that people are used to paying for
01:16:38.660 | on a certain pay scale.
01:16:40.660 | So when you're doing that,
01:16:42.300 | if you ask Brightwave, they don't have it public,
01:16:44.540 | but they charge a lot of money,
01:16:46.020 | more than you would expect
01:16:47.620 | because hedge funds and like investment banking,
01:16:50.020 | investment advisors,
01:16:51.300 | they're used to paying a lot of money for research.
01:16:53.220 | It's like the labor,
01:16:54.060 | they don't even care that you use AI.
01:16:55.860 | They just want labor to be done.
01:16:57.060 | - I'll mention one pushback,
01:16:58.300 | but as a hedge fund,
01:16:59.820 | we used to pay for analyst research
01:17:01.980 | out of our brokerage cost
01:17:03.620 | and not read them.
01:17:05.260 | To me, that's my risk with Brightwave,
01:17:07.620 | but you know.
01:17:08.460 | - No, but I think the-
01:17:09.580 | - As a consumer of research, I'm like-
01:17:11.340 | - If we want to go down the rabbit hole,
01:17:13.580 | there's a lot of pressure on funds
01:17:14.940 | for like OpEx efficiency.
01:17:16.540 | So there aren't really captive researchers anymore
01:17:20.220 | at most funds.
01:17:21.060 | And like even the sell-side research is not that good.
01:17:23.300 | - Taking them from in-house to external thing.
01:17:25.740 | - Yeah. - Yeah, that makes sense.
01:17:27.820 | - So yeah, you know,
01:17:29.020 | we have Dropzone that does security analysis.
01:17:31.260 | Same, people are used to paying for managed security
01:17:33.900 | or like outsourced SOC analysts.
01:17:35.540 | They don't want to buy an AI tool
01:17:37.540 | to make the security team more productive.
01:17:39.940 | - Okay, and what specifically does Dropzone do?
01:17:41.460 | - They do SOC analysis.
01:17:43.060 | So not SOC 2, like the compliance thing,
01:17:44.860 | but it's like when you have security alerts,
01:17:46.580 | how do you investigate them?
01:17:47.820 | So large enterprises,
01:17:49.060 | they get like thousands of phishing emails,
01:17:51.180 | and then they forward them to IT,
01:17:52.540 | and an IT or security person, the tier zero,
01:17:55.660 | has to go in and say,
01:17:57.060 | that's a phishing email, that one isn't.
01:17:59.140 | So they have an agent that does that.
01:18:00.780 | So the cost to do,
01:18:02.060 | like for a human to do the analysis
01:18:03.740 | at the rate that they get paid,
01:18:05.460 | it's like $35 per alert.
01:18:07.660 | Dropzone is like $6 per alert.
01:18:10.300 | So it's a very basic economic analysis for the company,
01:18:13.900 | whether or not they want to buy it.
01:18:15.220 | It's not about, is my analyst going to have more free time?
01:18:18.540 | Like, is it more productive?
01:18:19.580 | So selling the labor is like the story
01:18:22.820 | of the market right now.
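A back-of-the-envelope version of that buy decision, using the per-alert figures quoted above; the annual alert volume is an assumed number for illustration.

```python
# Back-of-the-envelope buy decision for an AI SOC analyst.
# Per-alert costs are the figures quoted above; the volume is assumed.
human_cost_per_alert = 35.0   # $ for tier-zero human triage
ai_cost_per_alert = 6.0       # $ quoted for Dropzone
alerts_per_year = 50_000      # assumed volume for a large enterprise

annual_savings = (human_cost_per_alert - ai_cost_per_alert) * alerts_per_year
print(f"Annual savings: ${annual_savings:,.0f}")  # Annual savings: $1,450,000
```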
01:18:24.500 | - My version of this is I should start
01:18:26.220 | a consulting service today
01:18:28.060 | and then slowly automate myself,
01:18:30.060 | my employees out of a job, right?
01:18:32.180 | Is that fundable?
01:18:34.220 | - Is that fundable?
01:18:35.060 | That's a good question.
01:18:35.900 | I think whether or not,
01:18:37.100 | depends how big you want it to be.
01:18:37.940 | - This is a services company, basically.
01:18:39.940 | - Yeah, that's, I mean, that's what,
01:18:41.580 | I know now it's maybe not as good of an example,
01:18:43.540 | but CrowdStrike started as a security research firm.
01:18:48.500 | - Yeah, I mean, it's still one of the most successful
01:18:50.300 | companies of all time.
01:18:51.140 | - Yeah, yeah, yeah.
01:18:52.820 | - Yeah, it's an interesting model.
01:18:53.940 | I'm always checking my biases there.
01:18:55.780 | Anything else on the agents side of things?
01:18:58.180 | - No, that's really something
01:18:59.540 | that people should spend more time on.
01:19:00.980 | It's like, what's the end labor that I'm building?
01:19:03.940 | Because, you know, sometimes when you're being too generic
01:19:06.220 | and you want to help people build things,
01:19:08.780 | like Adept, you know, David was on the podcast
01:19:10.660 | and he said they were sold out of things,
01:19:12.420 | but they're kind of like--
01:19:13.780 | - And then he sold out himself.
01:19:14.980 | - Yeah, it's like, they're working with each company
01:19:19.020 | and the company has to invest the time
01:19:20.900 | to build with them.
01:19:21.740 | - Yeah, you need more hands-off.
01:19:23.580 | - Exactly.
01:19:24.420 | - Yeah.
01:19:25.260 | - So, and that's more verticalized.
01:19:26.620 | - Yeah, yeah.
01:19:27.460 | I'll shout out here, Jason Liu,
01:19:28.660 | he was also on the podcast and spoke at the conference.
01:19:30.940 | He has this idea of like, it's reports, not RAG.
01:19:33.980 | You want things to produce reports,
01:19:35.780 | because reports can actually get consumed.
01:19:37.900 | RAG is still too much work, still too much chatbotting.
01:19:40.300 | I'll briefly mention the new benchmarks
01:19:42.180 | I'm thinking about.
01:19:43.420 | I think everyone studying AI research,
01:19:48.060 | understanding the progress of AI and foundation models,
01:19:50.820 | needs to have in mind what is next after MMLU.
01:19:53.860 | I have 10 proposals.
01:19:55.020 | Most of them, half of them come
01:19:56.460 | from the Hugging Face episode.
01:19:58.180 | So everyone's loving Clementine.
01:20:00.700 | I want her back on.
01:20:01.740 | And she was amazing and very charismatic,
01:20:03.780 | even though she made us take down the YouTube.
01:20:06.620 | But MuSR for multi-step reasoning, MATH for math,
01:20:10.340 | IFEval for instruction following, BIG-Bench Hard.
01:20:13.100 | And code, we're now getting to the area
01:20:15.420 | that the Hugging Face leaderboard does not have.
01:20:17.460 | And I'm considering making my own
01:20:19.020 | 'cause I care about this so much.
01:20:20.620 | So MBPP is the current one that is post-HumanEval,
01:20:24.780 | 'cause HumanEval is widely known to be saturated.
01:20:26.740 | And SciCode is like the newest one
01:20:28.380 | that I would point people to.
01:20:29.660 | Context utilization, we had Mark from Gradient
01:20:31.740 | on to talk about RULER, but also ZeroSCROLLS and InfiniteBench
01:20:34.580 | were the two that Llama 3 used instead of RULER.
01:20:37.820 | But basically, something that's a little bit more rigorous
01:20:40.580 | than needle in a haystack,
01:20:42.100 | that is something that people need.
01:20:43.900 | Then you have function calling.
01:20:45.100 | Here, I think Gorilla, API-Bank, Nexus,
01:20:47.300 | pretty much consensus; I've got nothing there apart from,
01:20:49.940 | yeah, like all models need something like this.
01:20:52.720 | Vision, now multimodality,
01:20:54.820 | where vision is the most important.
01:20:56.460 | I think Vibe-Eval is actually the state of the art here.
01:20:59.020 | I, you know, open to being corrected,
01:21:00.860 | and then multilinguality.
01:21:02.020 | So basically, like these are the 10 directions, right?
01:21:04.500 | Post-MMLU, here are the frontier capabilities.
01:21:06.980 | If you're developing models,
01:21:08.380 | or if you're encountering a new model,
01:21:10.220 | evaluate them on all these elements,
01:21:11.840 | and then you have a good sense of how state of the art they are
01:21:14.500 | and what you need them for
01:21:15.780 | in terms of applying them to your use case.
01:21:17.640 | So I just want to get that out there.
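One way to use that list is as a report card you run every new model through. Here is a sketch where the benchmark names come from the list above and run_benchmark is hypothetical glue you would wire into whatever eval harness you use.

```python
# Sketch of the post-MMLU report card as data. Benchmark names are from
# the list above; run_benchmark is a hypothetical function you supply.
FRONTIER_EVALS = {
    "multi-step reasoning": ["MuSR"],
    "math": ["MATH"],
    "instruction following": ["IFEval"],
    "hard tasks": ["BIG-Bench Hard"],
    "code": ["MBPP", "SciCode"],
    "context utilization": ["RULER", "ZeroSCROLLS", "InfiniteBench"],
    "function calling": ["Gorilla", "API-Bank", "Nexus"],
    "vision": ["Vibe-Eval"],
    # multilinguality: no consensus benchmark named in the episode
}

def profile_model(model_name: str, run_benchmark) -> dict:
    # Returns {capability: {benchmark: score}} so you can see at a glance
    # how state of the art a model is on each axis.
    return {
        capability: {bench: run_benchmark(model_name, bench) for bench in benches}
        for capability, benches in FRONTIER_EVALS.items()
    }
```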
01:21:18.980 | - Yeah, and we have the ARC-AGI thing.
01:21:21.380 | How do you think about benchmarking for, you know,
01:21:24.740 | everyday things or like benchmarking for something
01:21:27.580 | that is maybe like a hard to reach goal?
01:21:30.020 | - Yeah, this has been a debate for,
01:21:32.740 | that's obviously very important
01:21:33.900 | and probably more important for product usage, right?
01:21:36.680 | Here, I'm talking about benchmarking
01:21:38.300 | for general model evals.
01:21:40.180 | And then there's a schism in the AI engineering community,
01:21:43.380 | or a criticism of the AI engineering community,
01:21:44.740 | that it did not care enough about product evals.
01:21:47.320 | So Hamel Husain led that,
01:21:49.380 | and I had a bit of disagreement with him,
01:21:51.340 | but I acknowledge that, I think that it's important.
01:21:53.980 | There was an oversight in my original AI engineer post.
01:21:56.580 | So the job of the engineer
01:21:57.900 | is to produce product-specific evals for your use case.
01:22:01.620 | And there's no way that these general academic benchmarks
01:22:04.180 | are going to do that
01:22:05.020 | because they don't know your use case.
01:22:06.180 | It's not important.
01:22:07.180 | They will correlate with your use case,
01:22:09.340 | and that is a good sign, right?
01:22:10.700 | These are very, very rigorous and thought through.
01:22:13.420 | So you want to look for correlates,
01:22:14.740 | then you want to look for specifics.
01:22:15.980 | And that's something that only you can do.
01:22:17.780 | So yeah, ARC-AGI will correlate with IQ.
01:22:20.780 | It's an IQ test, right?
01:22:22.780 | How well does IQ test correlate to job performance?
01:22:25.940 | 5%, 10%, not nothing, but not everything.
01:22:29.380 | And so it's important.
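For contrast with the general benchmarks, here is a minimal sketch of a product-specific eval; the cases and the pass criteria are invented examples, since only you know your use case.

```python
# Sketch of a product-specific eval: a handful of cases from your own
# product, graded by checks only you can write. Cases are invented here.
import re

EVAL_CASES = [
    # (prompt, predicate the model output must satisfy)
    ("Summarize this ticket: 'App crashes on login since v2.3'",
     lambda out: "crash" in out.lower() and "login" in out.lower()),
    ("Extract the version from: 'App crashes on login since v2.3'",
     lambda out: re.search(r"\b2\.3\b", out) is not None),
]

def score(generate) -> float:
    # `generate` is your model call, str -> str; returns the pass rate.
    passed = sum(check(generate(prompt)) for prompt, check in EVAL_CASES)
    return passed / len(EVAL_CASES)
```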
01:22:30.340 | - Anything else?
01:22:31.340 | - Superintelligence.
01:22:32.820 | We can, you know, we try not to talk about safety.
01:22:35.580 | My favorite safety joke from our dinner
01:22:37.420 | is that, you know, if you're worried about agents
01:22:39.220 | taking over the world
01:22:40.140 | and you need a button to take them down,
01:22:41.660 | just install CrowdStrike on every agent.
01:22:44.660 | And you have a button that has just been proved
01:22:46.740 | at the largest scale in the world
01:22:47.740 | to disable all agents, right?
01:22:49.220 | So Safe Superintelligence
01:22:51.540 | should just install CrowdStrike.
01:22:54.100 | That's what Ilya Sutskever should do.
01:22:55.940 | - That's funny, except for the CrowdStrike people.
01:22:59.780 | Awesome, man, this was great.
01:23:00.780 | I'm glad we did it.
01:23:01.620 | I'm sure we'll do it more regularly
01:23:03.300 | now that you're out of visa jail.
01:23:05.060 | - Yeah, yeah.
01:23:05.900 | I think, you know, AI News is surprisingly helpful
01:23:08.220 | for doing this.
01:23:09.460 | - Yeah.
01:23:10.300 | - Yeah.
01:23:11.140 | I had no idea when I started.
01:23:12.780 | I just thought I needed a thing to summarize discords,
01:23:15.620 | but now it's becoming a proper media company.
01:23:18.300 | Like a thousand people sign up every month.
01:23:20.740 | It's growing.
01:23:21.900 | - Cool.
01:23:22.740 | Thank you all for listening.
01:23:23.820 | - Yeah.
01:23:24.660 | - See you next time.
01:23:25.500 | - Bye.
01:23:26.340 | (upbeat music)
01:23:28.900 | (upbeat music)
01:23:31.500 | (upbeat music)
01:23:34.080 | (upbeat music)