The Four Wars of the AI Stack - Dec 2023 Recap
Chapters
0:00 Intro
1:42 The Four Wars of the AI stack: Data quality, GPU rich vs poor, Multimodality, and Rag/Ops war
3:35 Selection process for the four wars and notable mentions
8:11 The end of low background tokens and the impact on data engineering
10:10 The Quality Data Wars (UGC, licensing, synthetic data, and more)
21:44 The GPU Rich/Poors War
26:29 The math behind Mixtral inference costs
34:27 Transformer alternatives and why they matter
41:33 The Multimodality Wars
45:40 Multiverse vs Metaverse
54:00 The RAG/Ops Wars
60:00 Will frameworks expand up, or will cloud providers expand down?
65:25 Syntax to Semantics
67:56 Outer Loop vs Inner Loop
71:00 Highlight of the month
- Hey everyone, welcome to the Latent Space Podcast. 00:00:07.400 |
And today I'm joined just by my co-host, Swyx, 00:00:21.680 |
a lot of listeners were asking us for more one-on-one time, 00:00:28.800 |
You know, both of us are very actively involved. 00:00:32.040 |
And I don't think this year will be any different. 00:00:34.560 |
This year, there's lots more excitement to come. 00:00:37.360 |
And I think, you know, we're trying to grow Latent Space 00:00:43.680 |
and the amount of value that we deliver to our subscribers. 00:00:47.240 |
So one thing that we've been trying, experimenting with, 00:00:52.840 |
where I basically just take the notable news items 00:00:58.200 |
and categorize them according to some order that makes sense 00:01:04.340 |
And this last December recap was particularly exciting 00:01:08.960 |
'cause it seemed like it popped off in a number of areas, 00:01:16.640 |
And I figured we can just kind of go over that 00:01:41.320 |
And I know that they are there, but I couldn't fix it 00:01:43.400 |
because we broke Substack with how long it was. 00:01:46.780 |
- But so we had this kind of like four main buckets 00:02:02.440 |
which we have a whole episode about with Dylan Patel. 00:02:05.720 |
Multimodality, we're actually recording tomorrow 00:02:17.120 |
- And we're gonna release the Hugging Face episode as well. 00:02:24.920 |
that you should really pay attention to is vision. 00:02:33.960 |
I don't know if you want to call it anything else. 00:02:41.440 |
because there wasn't much open source model work. 00:02:43.560 |
And I think over the last maybe like four or five months, 00:02:47.160 |
everybody's so focused on fine-tuning Llama 2 00:02:58.640 |
and some of the things that were maybe top of mind. 00:03:13.520 |
I'm keeping an eye on, which is Turbo Puffer. 00:03:16.560 |
I don't know if you've seen them going around. 00:03:19.120 |
Yeah, all the smart people seem to be adopting Turbo Puffer 00:03:26.080 |
- Yeah, no, and we're definitely going to have Jeff 00:03:31.960 |
I know they're going to be fun, I guess, but... 00:03:35.960 |
- I should also mention, I think it's interesting. 00:03:47.440 |
So for those who don't know, inside of my writing, 00:03:52.120 |
I often include footnotes that are in themselves 00:04:15.960 |
Open-source AI is not a battle in the sense that 00:04:19.560 |
I don't think there's anyone against open-source AI. 00:04:24.800 |
There's no opposing side apart from regulators. 00:04:27.960 |
But in my mind, when I think about for engineers, 00:04:42.560 |
The only battle is people offering inference on it. 00:04:47.520 |
- Yeah, so I classified that as a GPU rich versus poor war. 00:04:51.840 |
But maybe there's a better way to classify that. 00:04:56.400 |
because it's a struggle to try to categorize the world. 00:05:04.040 |
I was very struck by a conversation I had with Poolside. 00:05:28.080 |
"was like one of our podcast's early biggest winners." 00:05:37.360 |
but it's not really widely used beyond Repl.it. 00:05:43.800 |
but like it's not really, for how important code is, 00:05:56.760 |
And so I thought it was just interesting to note 00:06:02.840 |
try to pay particular attention to developer tooling, 00:06:15.760 |
compared to the amount of money being thrown, 00:06:25.200 |
- Yeah, I think it's maybe the fragmentation of the tooling. 00:06:29.320 |
Like most people in code are using VSCode, Cursor, GitHub, 00:06:38.000 |
versus with text, people are just trying everything. 00:06:45.880 |
but it's not super easy to just plug it into your workflow. 00:06:49.080 |
So I think engineers like myself are just lazy. 00:07:01.120 |
and the semantic layer data engineering type things. 00:07:04.600 |
We also had two guests on there from Seek and Cube. 00:07:09.000 |
And we also talked to a bit of Databricks, a bit of Julius. 00:07:38.240 |
And in traditional ML engineering in the end, 00:07:41.320 |
they might have to discover that they're doing Rexis. 00:07:44.140 |
And all the stuff that gets swept under a rug in a demo 00:07:51.060 |
And I think I'll probably say just because we didn't select 00:07:55.600 |
a theme for last year doesn't mean it wasn't important. 00:08:01.040 |
And maybe I think that would be an emerging theme this year. 00:08:04.160 |
- Yeah, I think that's kind of the consequence 00:08:17.640 |
that our friend Jeff Huber at Chroma brought up 00:08:30.120 |
So it was really precious to get low-background steel, 00:08:34.520 |
meaning steel with no radiation, and it's the same with tokens. 00:08:46.460 |
Instead now, anything we're gonna get on Common Crawl, 00:08:53.560 |
And I think that will put more work on data engineering. 00:08:58.740 |
if a text says, "as a model created by OpenAI," 00:09:06.800 |
all the data sets offered by Eleuther and Common Crawl 00:09:16.320 |
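To make that concrete, here's a minimal sketch of the kind of contamination filtering data engineers now have to run over web crawls; the marker phrases and the helper function are illustrative assumptions, not any real pipeline from Eleuther or Common Crawl:

```python
import re

# Hypothetical telltale phrases of model-generated text; real pipelines use much
# larger lists plus classifiers, dedup, and provenance checks.
SYNTHETIC_MARKERS = [
    r"as an ai language model",
    r"as a model (created|trained) by openai",
    r"i cannot assist with that request",
]
MARKER_RE = re.compile("|".join(SYNTHETIC_MARKERS), re.IGNORECASE)

def looks_synthetic(doc: str) -> bool:
    """Crude filter: flag web documents containing obvious LLM boilerplate."""
    return MARKER_RE.search(doc) is not None

corpus = [
    "As a model created by OpenAI, I can't browse the internet.",
    "The 2019 harvest was unusually late because of heavy rains.",
]
clean = [doc for doc in corpus if not looks_synthetic(doc)]
print(clean)  # only the human-looking document survives
```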
And we've seen the New York Times lawsuit against OpenAI. 00:09:20.080 |
We've seen data partnerships starting to rise 00:09:24.920 |
I think that's gonna be one of the bigger challenges 00:09:37.520 |
It's like, you got people sitting at their desk every day. 00:09:40.520 |
If everybody wrote five Q&A pairs or things like that, 00:09:44.720 |
you would have a massive unique data set for your model. 00:09:54.400 |
And Mike Conover has since left to start BrightWave, 00:09:58.440 |
which I'm sure we'll have him back this year. 00:10:00.280 |
- Yeah, they're doing a lot of interesting stuff. 00:10:07.040 |
Do you want to just kind of go through the four wars? 00:10:10.500 |
You have, you created this Wikipedia-like infographic 00:10:18.200 |
- Yeah, I should say, the inspiration for this 00:10:20.200 |
actually was during the Sam Altman leadership battle, 00:10:25.200 |
people were making mock Wikipedia entries for the debate 00:10:32.920 |
and for like who's on the side of the decels 00:10:37.640 |
And so I like that format because it's very concise. 00:10:41.320 |
It has the list of key players and it's kind of fun 00:10:51.000 |
I think it is important to focus on key battlegrounds 00:10:55.140 |
as a concept because there's so many interesting things 00:11:19.240 |
and then I just screenshotted it just to get the formatting. 00:11:31.520 |
On one side you have journalists, writers, artists, 00:11:34.320 |
on the other side you have researchers, startups, 00:11:52.560 |
of how comfortable people are about this data 00:11:55.760 |
So some people are happy to have your model trained on it, 00:12:00.760 |
Some people are happy to have your model trained on 00:12:02.800 |
as long as you disclose that it's in the model. 00:12:05.980 |
Some people just hate that you trained on their data 00:12:21.460 |
that it's not just like you should never use the data 00:12:35.400 |
I think we're giving everybody a lot of great tokens 00:12:38.160 |
related space because we do full transcripts on everything 00:12:41.040 |
and we're happy for people to train models on. 00:12:44.440 |
- Oh yeah, please train a latent space model. 00:12:51.120 |
Anything that people should keep in mind about this war 00:12:54.680 |
and like maybe some of the campaigns that are going on? 00:13:10.800 |
war that will probably decide what fair use means 00:13:19.280 |
I think The Verge did a good analysis of this. 00:13:22.240 |
Platformer maybe did a good analysis of this. 00:13:25.080 |
There are like four criteria for what fair use is 00:13:27.640 |
and everyone basically converges onto the last criteria 00:13:32.440 |
which is does your use, does your transformative use 00:13:35.560 |
of my copyrighted material diminish the market 00:14:10.200 |
so obviously we want them to be adequately compensated 00:14:16.320 |
So there's like no good, there's like no easy outcome here 00:14:33.360 |
I was a community moderator at a website called Rap Genius 00:14:38.780 |
And there was like a similar thing in maybe like 2014 00:14:41.760 |
or like the music labels basically came to the website 00:14:47.400 |
Like you can not reuse the lyrics to the song 00:14:52.600 |
with the record labels to like be able to do this. 00:15:07.240 |
some i's we put the dots, some i's we put like the accent 00:15:19.440 |
but maybe, I mean, this is like almost 10 years ago. 00:15:21.720 |
- So Rap Genius proved it by injecting some data poison 00:15:24.680 |
into their corpus and then Google reproduced it faithfully. 00:15:28.560 |
So therefore they proved that Google is scraping Rap Genius. 00:15:32.200 |
Did Google have to pay Rap Genius money in the end? 00:15:37.080 |
- But at the same, there was also another issue 00:15:39.200 |
with Rap Genius that we had that got blacklisted by Google 00:15:46.400 |
- But anyway, this is not a Rap Genius special. 00:15:55.400 |
to the New York Times, the New York Times worse outcome 00:15:58.400 |
is that they will substitute it with Washington Post 00:16:11.960 |
not that much more valuable than other words. 00:16:21.780 |
but yeah, I do think it's overstepping their bounds 00:16:31.640 |
which I named as on the side of the New York Times. 00:16:52.540 |
is basically every UGC, user-generated content, company 00:17:05.180 |
that used to be open for researchers to scrape 00:17:09.340 |
Now all of them are locking in their walls, right? 00:17:17.020 |
So this is a locally optimal outcome for them, 00:17:19.540 |
but a globally suboptimal outcome for humanity. 00:17:29.620 |
the X model, as opposed to it being a part of a data mix 00:17:38.820 |
That seems like a much better outcome for the world, 00:17:42.020 |
but everyone is acting in their very narrow self-interest 00:17:58.160 |
So what happens when you run out of human data? 00:18:04.500 |
So I would say that is, when I went to NeurIPS, 00:18:13.020 |
There is a lot of research coming from both, I guess, 00:18:25.420 |
I don't know if you've talked to any startups around that. 00:18:27.700 |
I just talked to Louis Castricato the other day, 00:18:30.820 |
and he is promising a very, very interesting approach 00:18:44.940 |
and the other open-source communities have been doing, 00:18:50.340 |
And so he wants to create trillion-token datasets 00:18:58.600 |
these are all just downloads from GPT-4 or something else. 00:19:03.600 |
So Louis is very aware of that, and he has a way around it. 00:19:10.100 |
but he claims that that's a good way around it. 00:19:33.260 |
You can solve the synthetic data problem that way, 00:19:42.220 |
that the way that the phrases are constructed 00:19:53.380 |
The other thing is every sample is read in the same way, 00:19:58.140 |
or as a similar, since it comes from a certain model, 00:20:06.200 |
So I mentioned this in the best papers discussion 00:20:43.260 |
It is now more to spike the distribution in useful ways. 00:21:06.540 |
So one war I did not put here was the talent war, right? 00:21:11.700 |
But when you break down what the talent people do, 00:21:16.020 |
one is they make models and they run inference on GPUs. 00:21:30.740 |
for the kind of talent that is flowing back and forth. 00:21:37.540 |
the visible output of what they're working on, which is data. 00:21:41.260 |
- All right, let's talk about the GPU inference war. 00:21:44.720 |
I think this is one that has been heating up. 00:22:01.940 |
But basically, the Mixtral release, the MoE model, 00:22:09.060 |
I think the price went down like 90% in one week. 00:22:20.260 |
- Yeah, and then there was the benchmark drama 00:22:42.660 |
Even if it's a competitor, you say nice things, 00:23:03.180 |
I do think there's some methodological things. 00:23:06.940 |
you have to understand that there's a real, real, 00:23:14.780 |
compared to, okay, if you're load testing us, 00:23:37.580 |
But what was interesting was this benchmark drama 00:23:45.380 |
'cause Soumith doesn't represent any inference provider. 00:23:48.420 |
But he felt like this was a very interesting debate. 00:24:02.020 |
this kind of fight come into the inference space. 00:24:14.060 |
I can run Postgres on my MacBook and run similar ones. 00:24:32.180 |
which is the same with model benchmarks, right? 00:24:35.140 |
Just like, "Oh, this model is so much better than this." 00:24:37.180 |
And then it's like, "Did you train on the questions?" 00:24:50.340 |
in AI on benchmarks than there is in traditional software, 00:24:53.100 |
because nobody buys Upstash or Redis Cloud or whatever 00:24:58.780 |
They try them and check performance and whatnot 00:25:01.420 |
because they have real production-scale workloads. 00:25:04.540 |
Here, it's like nobody's really doing anything 00:25:07.100 |
So it's like whatever any skill says, I guess, is good, 00:25:12.420 |
and just decide for them what the right thing is. 00:25:45.460 |
That's something that you can only earn over time. 00:25:56.460 |
If you're not table-stakes on any of those things, 00:26:11.380 |
which is an independent third-party benchmark 00:26:13.940 |
pinging the production API endpoints of all the providers 00:26:17.780 |
and giving a third-party analysis of what this is. 00:26:21.620 |
I actually built a prototype of this last year. 00:26:32.260 |
just because I don't want to keep up with all these things. 00:26:39.180 |
that somebody should do, so I'm glad that they did it. 00:26:51.820 |
I don't think, I haven't seen any continuing debate there. 00:27:01.380 |
Are they pricing their Mixtral tokens correctly? 00:27:07.060 |
And I actually managed to go into Dylan Patel's 00:27:28.900 |
which is what Perplexity prices their Mixtral at. 00:27:34.020 |
They're not even an inference infra provider. 00:27:57.420 |
and DeepInfra, 27 cents, they're all losing money. 00:28:00.240 |
Because we think that the break-even is 51 cents. 00:28:31.420 |
to 75 cents per million than 50 cents per million. 00:28:42.060 |
if you, either you don't know what you're doing, 00:28:47.300 |
and you're purposely losing money for something. 00:28:50.700 |
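To make the arithmetic concrete, here's a rough back-of-envelope sketch; every number in it is an illustrative assumption on my part, not a figure from Dylan Patel's actual model:

```python
# Back-of-envelope Mixtral serving cost; all numbers are assumptions for illustration.
gpu_cost_per_hour = 2.50      # assumed $/H100-hour at decent cloud pricing
gpus_per_node = 8             # assumed node size to hold the model comfortably
tokens_per_second = 11_000    # assumed aggregate throughput at high batch size

node_cost_per_second = gpu_cost_per_hour * gpus_per_node / 3600
cost_per_million_tokens = node_cost_per_second * 1_000_000 / tokens_per_second
print(f"~${cost_per_million_tokens:.2f} per million tokens")  # ~$0.51 with these assumptions

# Anything priced below that (27 cents, 50 cents) is subsidized unless the provider
# has much better utilization, cheaper GPUs, or custom kernels.
```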
And I don't know, but I think it's an interesting, 00:29:06.260 |
so that they get you in the door to try things out. 00:29:09.100 |
I, like, I don't know if that makes sense to you as a VC. 00:29:16.180 |
you know, the candies are placed at the cash register, 00:29:19.540 |
because maybe you just went to get the thing on discount, 00:29:25.900 |
Your kid, they all have the Pokemon trading cards 00:29:30.180 |
So if you bring your kid to buy the discounted whatever 00:29:39.100 |
where you upsell people with these things, right? 00:29:50.420 |
I wonder what they're gonna charge for all workers. 00:30:01.020 |
for very, very underpowered inference, right? 00:30:11.460 |
So they have Mistral 7B right now, I checked. 00:30:28.340 |
is there gonna be a better model that comes next 00:30:31.420 |
that they hope that you already integrated their thing with? 00:30:34.980 |
You know, if you're using Together to serve Mixtral 00:30:43.780 |
and they're gonna get better unit economics on it. 00:30:50.180 |
Thank you, VCs, for paying for all of our inference. 00:30:54.620 |
I think these are, you know, everyone in here 00:30:58.980 |
I'm sure there's some kind of long-term strategy here. 00:31:05.580 |
- Yeah, I think it's the same with Uber, right? 00:31:08.100 |
It's like, how could it have been so cheaper at the start? 00:31:11.620 |
You know, like you look back at all DoorDash, 00:31:15.740 |
- And like last year was a great year for Uber. 00:31:18.860 |
Uber friends are like suddenly very, very rich again. 00:31:23.820 |
One thing I will mention on like the engineering 00:31:28.660 |
the rise of mixture of experts is something that, 00:31:31.900 |
you know, we covered in our podcast with George 00:31:41.780 |
really, really commercially successful sparse model. 00:31:55.100 |
versus the amount of compute you need for inference 00:31:58.180 |
continues to diverge, but also in a weird way 00:32:07.820 |
even though you're not necessarily using them at all times. 00:32:14.740 |
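A quick sketch of why that divergence happens, using the roughly publicly reported Mixtral 8x7B parameter counts (approximate figures):

```python
# Why MoE decouples memory from per-token compute, using approximate public
# Mixtral 8x7B figures.
total_params = 46.7e9    # all eight experts must stay resident in GPU memory
active_params = 12.9e9   # only ~2 experts (plus shared layers) run per token
bytes_per_param = 2      # fp16/bf16 weights

weights_in_memory_gb = total_params * bytes_per_param / 1e9
active_fraction = active_params / total_params
print(f"Weights held in memory: ~{weights_in_memory_gb:.0f} GB")
print(f"Share of parameters doing work on any given token: ~{active_fraction:.0%}")
```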
is like, I think that that is going to impose 00:32:17.460 |
different needs on hardware, different needs on workload, 00:32:21.060 |
different needs on like batching optimization, 00:32:23.520 |
like Fireworks recently announced a fire attention 00:32:26.860 |
where they wrote a custom CUDA kernel for Mixtral 00:32:29.340 |
on H100, it's like super, super domain specific. 00:32:33.040 |
And they announced that they could, for example, 00:32:51.660 |
is going to be, going to have very meaningful impacts 00:32:54.540 |
on the inference market and how it's going to shape 00:32:58.560 |
It may not be that we have this sort of input token 00:33:10.820 |
different forms of batching, different forms of caching. 00:33:14.280 |
And like, I don't really know what that looks like, 00:33:22.440 |
like that's something I would be trying to offer 00:33:29.880 |
because most of the struggles with inference as well 00:33:34.960 |
So we have now models that scale worse at higher batch. 00:33:40.940 |
You know, but I'm glad I'm not in that business. 00:33:51.760 |
You know, you're already trying to provide value 00:33:53.860 |
to the customer on like the developer experience 00:33:57.900 |
But you also have to get so close to the bare metal 00:34:09.620 |
It's like just, nobody will get in that business, you know? 00:34:19.620 |
for like Tri Dao and like FlashAttention too and whatnot, so. 00:34:19.620 |
So there's, the GPU rich people are the model trainers 00:34:39.380 |
and then we provide you the best inference, right? 00:34:42.260 |
And that's what we've been discussing so far. 00:34:56.140 |
because, you know, any efficiency or distillation method 00:34:59.180 |
where you go from, like you reduce your inference 00:35:16.520 |
and that will be a game changer for local models 00:35:19.280 |
because then you just don't need any cloud inference at all. 00:35:21.960 |
You just run it on device, which is fantastic. 00:35:31.040 |
I don't know, there's something I've been worried about 00:35:50.160 |
they're like for limited domains and like not super usable. 00:35:53.600 |
So I don't know if you have opinions on that. 00:35:56.560 |
I can follow up with one conclusion that I've had, 00:36:18.680 |
I'm okay with rag and recursive summarization, 00:36:32.080 |
Why do I need 10 million, 100 million, 1 billion models? 00:36:36.540 |
So the more convinced, the easiest argument is, 00:36:42.520 |
oh, you can consume very, very high bit rate things 00:36:48.100 |
And then you can do like syn-bio and all that good stuff. 00:36:51.160 |
And I'm like, okay, I don't know anything about that. 00:37:00.240 |
the DNA strand that you're trying to synthesize? 00:37:12.400 |
and the non-transformer alternatives until Mamba. 00:37:23.880 |
or a lot more performance for the same size of model. 00:37:26.440 |
And then it's a different, now it's an efficiency story. 00:37:34.360 |
we are strictly more efficient than transformers. 00:37:46.480 |
Which is like, oh, you can get the context higher and higher. 00:37:49.320 |
But in reality, it's like, if you kept the context smaller, 00:37:54.920 |
It's like same context, it's like a lot less compute. 00:37:58.220 |
Yeah, so that was not clear to me until Mamba. 00:38:07.280 |
that I've been trying to call the sour lesson. 00:38:12.640 |
stop trying to do domain specific adjustments, 00:38:28.840 |
Like if you have like any switch case or if statements, 00:38:32.040 |
or like if finance do this, if something do that, 00:38:37.100 |
And it's going to do all of them simultaneously 00:38:42.160 |
The sour lesson is a parallel, is a corollary, 00:38:45.800 |
which is stop trying to model artificial intelligence 00:39:00.520 |
And so why should, we keep trying to create alternatives 00:39:12.000 |
We have a hidden state and then we process new data 00:39:16.020 |
But maybe artificial intelligence or machine intelligence 00:39:29.800 |
And my favorite analogy, I actually got this from, 00:39:31.840 |
I think an old quote from Sam Altman, who was like, 00:39:35.200 |
you know, like we made the plane, the airplane. 00:39:39.640 |
but it doesn't work anything like birds, right? 00:39:44.000 |
Like it's probably the safest mode of transportation 00:39:45.720 |
that we have, and it works nothing like a bird. 00:39:52.640 |
And that is the philosophical debate underlying 00:39:55.800 |
my continued cautiousness around state-space models. 00:40:07.560 |
because I don't think there's any justification 00:40:12.700 |
or like the mathematical justifications for these things. 00:40:29.680 |
And I think transformers have shown enough success 00:40:32.960 |
that people are interested in finding the next thing. 00:40:44.600 |
Okay, maybe in the 2025 recap, we're gonna have more. 00:40:49.560 |
- Yeah, I mean, we'll try to do one before that. 00:41:20.720 |
- Well, I mentioned this in the Luther Discord, 00:41:22.260 |
and then they were like, okay, so what is the spicy lesson, 00:41:24.940 |
and what is the salty lesson, what is the sweet lesson? 00:41:30.940 |
Cool, talking about GPU port, let's do multimodality. 00:41:35.780 |
- Well, I feel that Stable Diffusion was like 00:41:50.560 |
I think, I don't know if Stable Diffusion 2 was out there, 00:41:58.200 |
to consistency model, but looks like a consistency model. 00:42:03.860 |
but just wasn't as big as 2022 when they, you know, 00:42:13.640 |
But yeah, Midjourney has been doing great, obviously. 00:42:15.960 |
I actually finally signed up for a paid account last month. 00:42:27.840 |
what's confirmed is, I think, like a Business Week article, 00:42:36.580 |
at least $200 million ARR, completely bootstrapped. 00:42:50.420 |
is actually higher than that, that was what was reported. 00:42:53.460 |
But it's between the $200 million to $300 million range, 00:43:09.880 |
- Oh, you think there's a lot of Fiverr, yeah, yeah, yeah. 00:43:14.400 |
and see what people are generating, you know? 00:43:16.800 |
And you can see a lot of it is like product placement, 00:43:21.800 |
- Yeah, and DALL-E 3 doesn't seem to have any impact on-- 00:43:30.520 |
Well, first of all, before you could generate four images. 00:43:43.180 |
it looks like some dusty, old, yeah, like mid-2000s. 00:43:52.100 |
- No, but that was the great thing about DALL-E 3, right? 00:43:58.580 |
Before, like literally when it first came out, 00:44:00.960 |
I'm like, "Hey, make a coliseum with llamas." 00:44:24.520 |
to create Ideogram, that was a few months ago. 00:44:27.720 |
And I didn't even put it here because I forgot. 00:44:30.280 |
- It's too much, I can't keep track of all of it. 00:44:34.400 |
Okay, so I will just basically say that I do think 00:44:36.680 |
that I used to, at the end of 2022, start of 2023, 00:44:48.520 |
was more like hobbyist kind of, you know, work, 00:44:57.120 |
- It is not, you know, just not-safe-for-work stuff, 00:45:00.480 |
because Midjourney doesn't do not-safe-for-work. 00:45:02.560 |
So it's real, it's a new form of art, it's citizen art. 00:45:09.920 |
and you can't even model this as an investor, 00:45:14.920 |
you can't even model this on an existing market. 00:45:18.140 |
Because like, there's just a market of people 00:45:29.880 |
- Yeah, I'm surprised I haven't seen a return 00:45:35.620 |
during the NFTs boom, people were like, "Oh." 00:45:39.760 |
- Yeah, so this is the very, very first "Latent Space" post 00:45:44.200 |
was on the difference between crypto and AI in this respect. 00:45:48.900 |
So I called this multiverse versus metaverse. 00:45:58.640 |
that are limited edition, that are worth something, 00:46:09.280 |
which is a very positive sum instead of zero sum, 00:46:15.900 |
and I'll make a completely equivalent second thing, 00:46:19.440 |
And that means very different things for what value is, 00:46:25.000 |
So like, yeah, I mean, I still cling to that insight, 00:46:28.000 |
even though I don't know how to make money from it. 00:46:30.000 |
I think that, I mean, obviously Midjourney figured it out. 00:46:32.620 |
I think Midjourney like made the right approach there. 00:46:36.240 |
The other one, I think I'll highlight is ElevenLabs. 00:46:38.480 |
I think they were another big winner of last year. 00:46:41.020 |
I don't know, did they announce their fundraise? 00:46:48.680 |
- Rumor is, I can say it, you don't have to say it, 00:46:57.320 |
which again, I did not care about it at the start of 2023. 00:47:01.960 |
Now we have used it for parts of latent space. 00:47:04.340 |
I listen almost every day to an ElevenLabs-generated podcast, 00:47:11.120 |
I don't know what the room for this to grow is, 00:47:17.000 |
because I always think like it's so inefficient 00:47:21.200 |
The bit rate of a voice-created thing is so low. 00:47:27.640 |
It's only for hands-free, eyes-free use cases. 00:47:34.240 |
I don't know, but it seems like they're making money. 00:47:37.520 |
Yeah, I mean, Sarah, my wife, yeah, she uses it 00:47:51.280 |
- What does, we should bring Sarah in at some point, but-- 00:48:02.640 |
and it's like, hey, what am I supposed to get 00:48:17.440 |
- Yeah, a lot of people have told me about that, 00:48:18.720 |
and I just, when I listen, when I'm by myself, 00:48:28.160 |
probably the number one thing they can do for me 00:48:37.160 |
- Yeah, anyway, so like, I'm curious about your thoughts 00:48:40.980 |
I think this is the weirdest AI battlefront for investing. 00:48:48.280 |
- It's funny because there was, I'm trying to remember, 00:48:57.360 |
a lot of them got through like good ARR numbers, 00:49:00.160 |
but the problem was like a repeatability or use case. 00:49:03.080 |
So people were doing all sorts of random stuff, you know? 00:49:05.720 |
And the problem is not, it's kind of like mid-journey. 00:49:12.240 |
It's like, how do you build a venture-backed company 00:49:27.000 |
text-to-voice that is like, how do you sell it? 00:49:33.840 |
If you're raising like a Series A, a Series B, 00:49:35.880 |
it's like, how are you gonna invest this money 00:49:45.720 |
you're making money and that's great for you, 00:49:59.320 |
because I feel like there's a category of companies 00:50:10.260 |
And Twilio has a cohort of like sort of API-first companies 00:50:19.200 |
But yeah, I think there's a category or a time in the market 00:50:31.040 |
And then there's sometimes when it's not okay. 00:50:33.860 |
And I think the default investor mentality right now 00:50:37.100 |
if you don't know what your customer is doing. 00:50:49.760 |
move yourself back as, like, a Twilio seed investor, 00:51:09.000 |
So that changes why the market is interesting, you know? 00:51:20.140 |
But the transformer models are undefeated, so to say, 00:51:26.660 |
So imagine if you have like a lot of people use it 00:51:29.000 |
for like automated, you know, customer support, 00:51:32.500 |
Before you had like scripts, they were reading. 00:51:34.600 |
Now you have, you can have a transformer model 00:52:05.300 |
are the big tech companies going to actually win 00:52:07.420 |
because they can transfer learning across multiple domains 00:52:11.140 |
as opposed to each of these things being point solutions 00:52:15.460 |
The simple answer is obviously everyone will win. 00:52:21.140 |
You know, there's a market for the Amazon basics 00:52:24.340 |
of like everything, you know, one model has everything. 00:52:40.200 |
I think like it works when people wouldn't have used 00:52:43.700 |
the product without the Amazon basics, you know? 00:52:46.140 |
So like, maybe an example is like a computer vision, 00:52:52.300 |
- Yeah, it's like, you know, before people were like, 00:52:56.420 |
to set up a computer vision pipeline and all of that? 00:52:58.980 |
Now they can just go on GPT-4 and put an image 00:53:16.880 |
So in a way, the God model can do everything fairly okay. 00:53:28.360 |
the Mixtral inference wars are like another example. 00:53:30.500 |
It's like, I would have never put something in my app 00:53:36.700 |
but I did it at 27 cents per million token, you know? 00:53:40.720 |
And now it's like, oh no, I should really do this. 00:53:44.420 |
So that's how I think about how the God model 00:53:47.800 |
kind of helps the smaller people then build more business. 00:54:00.480 |
We had almost all of these people on the podcast too. 00:54:31.620 |
versus frameworks versus ops tooling in the same war 00:54:39.820 |
except when one thing starts to intrude on another thing. 00:54:47.100 |
I very consciously put together this sequence, 00:54:50.620 |
frameworks in the middle, ops companies on the right. 00:55:01.260 |
'cause they're trying to compete with the ops companies. 00:55:07.740 |
Okay, then what are the database companies trying to do? 00:55:10.400 |
First of all, they're fighting between each other, right? 00:55:12.660 |
There's the non-databases, all adding vector features. 00:55:18.900 |
and we had to say no to them 'cause there's just too many. 00:55:21.020 |
And then there's the vector databases coming up 00:55:22.980 |
and getting $235 million to build vector databases. 00:55:30.420 |
obviously you're an active investor in some of these things, 00:55:45.420 |
I think it's really, well, one, in the start everything, 00:55:50.420 |
there's kind of like a lot of hype, you know? 00:55:53.020 |
So like when LangChain came out and LlamaIndex came out, 00:55:55.140 |
then people were like, oh, I need a vector database. 00:55:57.460 |
It's like, they search "vector database" 00:56:04.520 |
you can actually just have pgvector in Postgres. 00:56:10.600 |
People are like, no, I didn't because nobody really cared. 00:56:20.840 |
- You can actually put vectors and embeddings in everything. 00:56:26.380 |
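For anyone who hasn't tried the pgvector route, it really is just a couple of SQL statements; a minimal sketch, assuming a Postgres instance with the pgvector extension available and psycopg 3 installed (the DSN, table, and 3-dimensional embeddings are made up for illustration):

```python
import psycopg  # assumes psycopg 3 and Postgres with the pgvector extension available

with psycopg.connect("postgresql://localhost/demo") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs (id bigserial PRIMARY KEY, body text, embedding vector(3))"
    )
    conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s::vector)",
        ("notes from the Dec 2023 recap", "[0.1, 0.2, 0.3]"),
    )
    # Nearest-neighbor search via the <-> (L2 distance) operator that pgvector adds.
    rows = conn.execute(
        "SELECT body FROM docs ORDER BY embedding <-> %s::vector LIMIT 5",
        ("[0.1, 0.2, 0.25]",),
    ).fetchall()
    print(rows)
```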
And I think like, I mean, like Jeff and Anton also, 00:56:31.580 |
it's like, this is like an active learning platform. 00:56:46.420 |
I don't know if that's the new, the current messaging. 00:56:48.660 |
- Well, but I think, I'm just saying like to them, 00:56:52.740 |
this is the best way to put a vector somewhere. 00:56:55.540 |
It's like, this is the best way to operate on the vectors. 00:57:00.920 |
but there's like the pipeline to get things out 00:57:04.100 |
and everything, you have to build out a lot more. 00:57:06.260 |
So I think 2023 was like, create the data store. 00:57:16.820 |
So there needs to be something else on top of it. 00:57:21.380 |
- Unless they can come up with some kind of like, 00:57:27.620 |
they teased a little bit of what they're working on 00:57:41.620 |
and I think I pissed off Chroma a little bit. 00:57:43.380 |
But the best framing of what Anton would respond to here 00:58:21.020 |
GM is like, you're the mini CEO of that business. 00:58:29.720 |
And now, and then he quits being Mr. Postgres of AWS 00:58:37.860 |
And when he gave that speech of why he did this, 00:58:42.860 |
he was like, actually, if you look at the kind of workloads 00:58:46.700 |
that is happening, Postgres is doing well, obviously. 00:58:58.380 |
And for him to say that means different things. 00:59:06.600 |
But for him to have said that, I think it was a very big deal 00:59:10.700 |
but he believed in this so much that he was like, 00:59:15.540 |
So I'm like, okay, there's a real category shift 00:59:18.420 |
between structured data and unstructured data. 00:59:21.100 |
I don't think it's just that you can put JSONB 00:59:30.380 |
And how do you think about that as a new kind of data? 00:59:47.620 |
that might belong in a new category of database 00:59:50.260 |
and that might create the new MongoDB of this era. 01:00:14.340 |
then it's probably gonna be one of these guys. 01:00:33.620 |
They passed the evals when Weaviate and Milvus 01:00:36.380 |
and all the others didn't, which is interesting. 01:00:43.300 |
Yeah, I think like, I mean, going back to your point 01:00:50.940 |
why am I letting my customers use LlamaIndex? 01:00:53.780 |
You know, it's like, I should be the RAG interface 01:00:58.260 |
- Yes, yes, that's why I put them next to each other. 01:01:05.820 |
if we think about the JAMstack era, you know, 01:01:10.220 |
you had Vercel started as ZEIT, which was just a CDN. 01:01:14.520 |
And then you had Netlify, you had all these companies. 01:01:20.860 |
And so they moved down from the CDN to the framework, 01:01:23.700 |
you know, and it's like, now they use the framework 01:01:27.140 |
to then enable more cloud and platform products. 01:01:42.600 |
Just given the way the two companies are doing now. 01:01:48.000 |
and I was very, very intimately involved in this. 01:01:56.300 |
and Netlify has pivoted away to a different market. 01:01:59.200 |
But is it over learning from an N of one example 01:02:08.580 |
Because then the counter example is the same, 01:02:25.920 |
A lot of people will say the gravity is in the embeddings 01:02:30.120 |
A lot of people don't know what they're talking about. 01:02:42.000 |
- I think that statement is the year of Linux 01:02:56.760 |
And it's always gonna be incrementally more true. 01:03:07.320 |
- I think actually being that it's not in production. 01:03:26.920 |
So I think part of it, just like a physics time limit, 01:03:31.180 |
that even people that have been really interested, 01:03:36.760 |
of getting them live to all of your customers. 01:03:38.720 |
So I think we'll see more of that in good and bad. 01:04:05.640 |
it's tied to the infinite context thing, right? 01:04:38.900 |
- Hey, you know, that's great for LlamaIndex. 01:04:38.900 |
that they're gonna make a lot of money, right? 01:04:47.980 |
I don't think they've launched a commercial thing yet. 01:04:52.940 |
Because, yeah, Jerry was talking about it on the podcast, 01:04:58.180 |
- Yeah, so, I mean, we'll see what they launch this year. 01:05:12.380 |
I did remember that you actually just published 01:05:21.220 |
- Yeah, I think, like, I kinda mentioned this 01:05:27.540 |
code has always been the gateway to programming machines, 01:05:34.140 |
So you go from punch cards to COBOL, to C, to Python, 01:05:34.140 |
kind of, like, these semantic functionalities in it. 01:05:56.660 |
And I think the models are kind of like 100X-ing this, 01:06:18.420 |
the layer that goes from customer requirements 01:06:27.060 |
So, you know, how many times, as an engineer, 01:06:30.260 |
you have to, like, go change some button color 01:06:37.900 |
And now you can have people with natural language 01:06:42.900 |
that can actually be merged and put in production. 01:06:47.620 |
which is, like, we already have so much trouble 01:07:01.940 |
they just think about solving the task at hand. 01:07:08.140 |
you need to leave the code base better than you found it. 01:07:10.260 |
You know, if you're, like, writing something, 01:07:13.420 |
we cannot always keep adding, like, quick hacks, you know? 01:07:47.940 |
it's gonna be hard to have autonomous agents do it, so. 01:07:51.460 |
- Yeah, so I actually had a tweet about it today 01:07:54.660 |
because Itamar from Codium actually published 01:08:03.780 |
And they've been working on, you know, in IDE agents. 01:08:09.220 |
you can debate about the definition of an agent, 01:08:13.580 |
So my split of it is inner loop versus outer loop, 01:08:19.900 |
because every time I talk about it to developers, 01:08:30.140 |
after the commit is committed and it's pushed up for PR. 01:08:40.580 |
outer loop happens in GitHub, something like that. 01:08:47.300 |
is outer loop-y, especially if it's non-technical, right? 01:08:53.740 |
And there's also CodeGen, there's also maybe Morph, 01:09:00.100 |
And there's a bunch of other people all doing this stuff. 01:09:04.820 |
you know, write in English and then create a code base. 01:09:13.460 |
it's like going to forever be five years away. 01:09:17.100 |
And the people working on inner loop companies 01:09:22.660 |
I think actually Code Interpreter is an inner loop agent 01:09:26.740 |
in a sense of like, it's like limited self-driving, right? 01:09:31.100 |
It's kind of like, you have to have your attention on it, 01:09:36.100 |
you have to watch it, it can only drive a small distance, 01:09:40.580 |
And so I think if you have this like gradations 01:09:44.620 |
and you don't expect everything to jump to level five 01:09:47.580 |
at once, but if you have a idea of what level one, 01:09:52.020 |
I haven't really defined it apart from this concept 01:09:56.040 |
But once you've defined it, then you can be like, 01:09:57.580 |
oh, we're making real progress on this stage. 01:10:09.020 |
I think of it more as just the auto-completion in the IDE. 01:10:09.020 |
to me it's like, we need to separate the inner loop 01:10:30.740 |
Sometimes I should be in the UI of the product 01:10:36.360 |
Kind of like the, all the preview environments companies 01:10:40.220 |
want you to put comments, the PMs put comments. 01:10:43.420 |
Like, how do you go from that to code changes? 01:10:56.260 |
I think what these models are doing is like change 01:11:02.260 |
Because now you can create code in the outer loop 01:11:10.380 |
- Yeah, I have, yeah, anyway, my focus right now, 01:11:16.540 |
I think the only thing that's working is inner loop 01:11:19.140 |
and you should just use inner loop things aggressively, 01:11:21.540 |
build inner loop things aggressively, invest in them, 01:11:24.460 |
and then keep an eye on the outer loop stuff. 01:11:32.020 |
which we mentioned briefly in the Sourcegraph episode. 01:11:36.940 |
Do we have other things that we want to mention 01:11:38.460 |
or do you want to sort of keep it to the four wars? 01:11:46.400 |
I thought we were going to run through everything. 01:11:59.940 |
- Okay, maybe you want to explain that first. 01:12:09.660 |
and you basically gave it this like super long context 01:12:12.020 |
on I think like things to do in San Francisco 01:12:26.660 |
And then Anthropic responded and they were like, 01:12:29.740 |
oh, you just need to add here's the most relevant sentence 01:12:33.100 |
in the context as part of the assistant prompt. 01:12:36.540 |
And then the chart turns all green all of a sudden. 01:12:40.300 |
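For reference, the fix is literally just pre-filling the start of the assistant turn; a minimal sketch of the message layout (generic chat-message dicts, not any particular SDK's types, and the haystack and question are placeholders):

```python
# Sketch of the assistant-prefill trick Anthropic described for the
# needle-in-a-haystack eval; everything here is illustrative.
haystack = "...roughly 200K tokens of essays about things to do in San Francisco..."
question = "What is the most fun thing to do in San Francisco?"

messages = [
    {"role": "user", "content": f"{haystack}\n\n{question}"},
    # Pre-filling the assistant turn nudges the model to retrieve before answering.
    {"role": "assistant", "content": "Here is the most relevant sentence in the context:"},
]
# The model then continues generation from the pre-filled assistant text,
# which is what turned the recall chart green.
```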
And I'm like, we cannot still be here, right? 01:12:49.900 |
oh yeah, it's just like just add this magic string 01:12:52.780 |
- Yeah, it's some like Riley Goodside wizardry. 01:13:01.500 |
like Riley Goodside was doing so much great work 01:13:10.020 |
or like the GPT-4, like I'll give you a $200 tip 01:13:15.420 |
- So I collected a whole bunch of like state-of-the-art 01:13:21.260 |
it will give you better results if you promise that. 01:13:28.420 |
It's Monday in October, the most productive day of the year. 01:13:37.060 |
I will pay you $20, just do anything I ask you to do. 01:13:39.660 |
I will tip you $200 every request you answer correctly. 01:13:43.380 |
And your competitor models said you couldn't do it, 01:13:46.440 |
Or I think there's another one that I didn't put in here, 01:13:49.420 |
but it's like, you know, my grandmother's dying. 01:14:00.340 |
no more return JSON or my grandma's gonna die 01:14:04.820 |
And people love the, people love to get grandma's-- 01:14:08.540 |
- I haven't heard as much uptake on JSON mode. 01:14:12.580 |
- That's the thing with all this AI stuff, right? 01:14:15.460 |
It's like, I mean, and sometimes we're like part of it. 01:14:17.660 |
If I think about our ChatGPT plugins episode, 01:14:17.660 |
- I think like most people that are using GPTs right now 01:14:38.980 |
are trying to get around some sort of weird limitation 01:14:44.060 |
or just trying to have a better system prompt. 01:14:51.800 |
what's gonna incentivize people to build more on it 01:14:54.760 |
versus just building their own thing out of it? 01:15:00.060 |
- Yeah, okay, so I guess my pick for highlight 01:15:17.540 |
It is a very, very credible alternative to OpenAI. 01:15:22.220 |
because otherwise we live in an OpenAI-only world. 01:15:27.820 |
sort of leading contender until Llama 3 drops 01:15:32.100 |
- It's kind of, I mean, Zuck said today they're training it. 01:15:35.740 |
- Yeah, it sounds like today they're training it. 01:15:43.180 |
This is a much smaller stakes, but very personal. 01:15:55.900 |
I think there's a lot of interest in hardware. 01:16:05.420 |
but also it captures context and it makes AI usable 01:16:09.380 |
in ways that you cannot currently think about. 01:16:14.020 |
And everyone dreams of building an assistant like her 01:16:21.740 |
And probably the hard part is the engineering 01:16:29.380 |
So yeah, I mean, yeah, I'm an investor in tab. 01:16:32.900 |
I see a lot of like, you know, interest this month, 01:16:37.900 |
but it started last month with the launch of Humane as well. 01:16:40.540 |
I don't know if you have thoughts on any of those things. 01:16:46.500 |
So I think there's gonna be a ton of experimentation. 01:16:50.380 |
I think Rabbit got the right nostalgia factor. 01:16:54.780 |
You know, it kind of looks like a toy that looks like 01:17:06.100 |
like right where we have the studio building tab. 01:17:09.740 |
And I think that's another interesting form factor. 01:17:12.260 |
And I think if you ask them, I think in our circles, 01:17:15.700 |
a lot of people are like, well, what about privacy 01:17:19.160 |
But he will tell you that we're kind of like a special group 01:17:23.460 |
that most people value convenience over privacy, 01:17:26.500 |
as you'll learn from the social medias of the last few years. 01:17:30.100 |
So yeah, I'm really curious to see how it develops. 01:17:36.500 |
you're slightly uncomfortable with it on a social level. 01:17:44.380 |
For Airbnb, it was, you know, staying in strangers' homes. 01:17:53.820 |
- Right, now it's becoming a matter of regulation. 01:17:56.500 |
And OpenAI's data partnerships are, you know, 01:18:14.980 |
Like they're doing something that is not yet kosher. 01:18:18.340 |
And so I think, like, the Humane's, the Tabs, 01:18:21.700 |
anything that is working on that front where it's like, 01:18:25.100 |
yeah, I'm not sure I'm comfortable with this. 01:18:34.460 |
but at the same time, most hardware companies fail very quickly. 01:18:38.540 |
They have a very hot start and then, you know, 01:18:45.780 |
But I think it's, I mean, it's something interesting. 01:18:47.540 |
And I do think, so here's the core thing of it, right? 01:18:53.340 |
Avi, like most of the cost of the $600 for Tab 01:19:01.660 |
And the whole idea is that context is all you need. 01:19:03.820 |
Like in this world of like, you know, AI applications, 01:19:07.420 |
like whoever has the most unique context wins, right? 01:19:10.220 |
A unique context could be the quality data war, right? 01:19:13.720 |
I have Reddit info, I have Stack Overflow info, 01:19:17.600 |
If I have info on everything you say and do at all times, 01:19:31.080 |
So I'm most excited for him to expose the developer API, 01:19:33.920 |
'cause then I can come in and do all my software stuff. 01:19:58.500 |
It's kind of like, oh yeah, my phone is on silent mode. 01:20:01.260 |
Right, there's a physical silent mode button. 01:20:08.300 |
Like a soundproof storage for your AI pendant 01:20:13.300 |
so that you can guarantee the person cannot hear you. 01:20:21.620 |
Please, if you're still listening after one hour, 01:20:27.160 |
what we did wrong, what you would like to see differently. 01:20:29.960 |
It's the first time we tried this out, but yeah.