The Four Wars of the AI Stack

00:00:00.920 | - Hey everyone, welcome to the Latent Space Podcast.

00:00:03.980 | This is Alessio, partner and CTO

00:00:05.840 | in Residence at Decibel Partners.

00:00:07.400 | And today I'm joined just by my co-host, Zwickz,

00:00:10.480 | for a new podcast format.

00:00:12.560 | - Yeah, and it's a bit uncomfortable

00:00:15.360 | because we have to just stare

00:00:16.580 | into each other's eyes lovingly.

00:00:18.880 | But in our end of year survey last year,

00:00:21.680 | a lot of listeners were asking us for more one-on-one time,

00:00:25.140 | more opinions from the both of us as hosts

00:00:27.560 | on what's going on in AI.

00:00:28.800 | You know, both of us are very actively involved.

00:00:30.900 | (laughing)

00:00:32.040 | And I don't think this year will be any different.

00:00:34.560 | This year, there's lots more excitement to come.

00:00:37.360 | And I think, you know, we're trying to grow Latent Space

00:00:42.160 | in terms of the types of formats

00:00:43.680 | and the amount of value that we deliver to our subscribers.

00:00:47.240 | So one thing that we've been trying, experimenting with,

00:00:49.320 | is this monthly recap that I started doing

00:00:51.460 | around August of last year,

00:00:52.840 | where I basically just take the notable news items

00:00:55.920 | of the month and then I sort them

00:00:58.200 | and categorize them according to some order that makes sense

00:01:01.880 | and write them down in the newsletter.

00:01:04.340 | And this last December recap was particularly exciting

00:01:08.960 | 'cause it seemed like it popped off in a number of areas,

00:01:11.920 | particularly with the AI breakdown.

00:01:13.560 | Our friend NLW featured it on his podcast.

00:01:16.640 | And I figured we can just kind of go over that

00:01:18.680 | as a way of setting the stage for 2024,

00:01:22.280 | but also recapping what happens in 2023.

00:01:25.280 | - Yeah, and people always ask me

00:01:27.560 | if December is like a slow month,

00:01:29.840 | but I think you almost broke Substack

00:01:32.120 | with how many links we had in the thing.

00:01:34.440 | - No, we actually did.

00:01:35.260 | So a lot of people commented to me

00:01:37.700 | about the formatting issues

00:01:39.760 | within the newsletter that I sent out.

00:01:41.320 | And I know that they are there, but I couldn't fix it

00:01:43.400 | because Substack was broken by us with how long it was.

00:01:46.780 | - But so we had this kind of like four main buckets

00:01:51.840 | called the four words of the AI stack,

00:01:55.400 | data quality, and I guess like data quantity

00:01:58.840 | as well, in a way.

00:02:00.640 | The GPU rich versus poor,

00:02:02.440 | which we have a whole episode about with Dylan Patel.

00:02:05.720 | Multimodality, we're actually recording tomorrow

00:02:08.720 | with LumaLabs about their new 3D model.

00:02:11.000 | So we went from text to image to 3D video.

00:02:15.400 | I wonder what's next.

00:02:17.120 | - And we're gonna release Hugging Face as well.

00:02:19.160 | 'Cause I guess I've been thinking

00:02:20.480 | about calling it multimodality 101,

00:02:22.520 | because the first modality beyond text

00:02:24.920 | that you should really pay attention to is vision.

00:02:27.120 | - Right, yeah.

00:02:28.260 | Yeah, and then the rag ops war.

00:02:31.520 | I think that's a--

00:02:33.120 | - I don't know what to call it.

00:02:33.960 | I don't know if you want to call it anything else.

00:02:36.040 | This is my--

00:02:36.880 | - I don't know.

00:02:37.960 | But I think beginning of last year,

00:02:39.960 | that was like kind of the hottest space

00:02:41.440 | because there wasn't much open source model work.

00:02:43.560 | And I think over the last maybe like four or five months,

00:02:47.160 | everybody's so focused on fine-tuning LLAMA 2

00:02:50.220 | and like a DPO to improve these models,

00:02:53.480 | max trial, and all these things.

00:02:54.880 | And people forgot about our friends

00:02:56.800 | at LangChain, LLAMA Index,

00:02:58.640 | and some of the things that were maybe top of mind.

00:03:01.820 | VectorDBs, it seemed like everybody

00:03:04.080 | was releasing a VectorDB early in the year.

00:03:06.120 | - Yeah, I think that I'll be very surprised

00:03:08.840 | if any new VectorDBs come out this year.

00:03:12.080 | With one exception, which is something

00:03:13.520 | I'm keeping an eye on, which is Turbo Puffer.

00:03:16.560 | I don't know if you've seen them going around.

00:03:19.120 | Yeah, all the smart people seem to be adopting Turbo Puffer

00:03:22.520 | as the first serverless VectorDB,

00:03:25.160 | which could be interesting.

00:03:26.080 | - Yeah, no, and we're going to have definitely Jeff

00:03:29.240 | and Antoine on the podcast at some point.

00:03:31.960 | I know they're going to be fun, I guess, but...

00:03:35.960 | - I should also mention, I think it's interesting.

00:03:38.400 | So the reason I selected these four wars

00:03:41.640 | was a process of elimination of wars

00:03:44.920 | that I think ended up not mattering.

00:03:47.440 | So for those who don't know, inside of my writing,

00:03:52.120 | I often include footnotes that are in themselves

00:03:55.840 | just essays in the footnotes.

00:03:58.640 | And so I think it's also notable,

00:04:00.320 | the things that people thought were hot,

00:04:02.380 | that were less hot than expected.

00:04:05.240 | So it was agents, definitely less hot

00:04:08.400 | than at the start of 2023.

00:04:12.240 | And then this one is a very controversial,

00:04:14.720 | non-selection by me, I think.

00:04:15.960 | Open-source AI is not a battle in the sense that

00:04:19.560 | I don't think there's anyone against open-source AI.

00:04:22.920 | Everyone is on one side.

00:04:24.800 | There's no opposing side apart from regulators.

00:04:27.960 | But in my mind, when I think about for engineers,

00:04:31.560 | engineers are all universally in favor

00:04:34.240 | of open-source models.

00:04:35.560 | So there's no battle here.

00:04:36.800 | Everyone just wants it to improve.

00:04:38.040 | So it's not interesting to write about.

00:04:40.240 | We just want more open-source.

00:04:41.560 | - Yeah.

00:04:42.560 | The only battle is people offering inference on it.

00:04:45.600 | - Yes.

00:04:46.440 | - Killing each other in the process.

00:04:47.520 | - Yeah, so I classified that as a GPU rich versus poor war.

00:04:51.840 | But maybe there's a better way to classify that.

00:04:54.240 | And you can give me some feedback on that

00:04:56.400 | because it's a struggle to try to categorize the world.

00:05:01.400 | Code models as well.

00:05:04.040 | I was very struck by a conversation I had with Pullside.

00:05:07.960 | I saw Kant from Pullside.

00:05:10.200 | So they haven't been on the podcast yet.

00:05:12.000 | They're kind of stealth still,

00:05:14.200 | but they had a very, very notable fundraise.

00:05:15.920 | I think they had like $50 million raised.

00:05:18.440 | - I think even more, yeah.

00:05:19.480 | - For a seed.

00:05:20.960 | Spending most of it on GPUs.

00:05:23.040 | And my conversation with ISO, he was like,

00:05:25.960 | "Hey, you know, like Repl.it

00:05:28.080 | "was like one of our podcast's early biggest winners."

00:05:32.440 | Repl.it didn't really follow up with.

00:05:33.800 | Like they announced like their 1.5 model,

00:05:37.360 | but it's not really widely used beyond Repl.it.

00:05:40.240 | There's Starcoder, there is Code Llama,

00:05:43.800 | but like it's not really, for how important code is,

00:05:47.160 | it doesn't seem like as big of a battlefront

00:05:50.760 | as just general function calling, reasoning,

00:05:53.680 | these other kinds of domains.

00:05:56.760 | And so I thought it was just interesting to note

00:05:59.840 | that even though we as a podcast

00:06:02.840 | try to pay particular attention to developer tooling,

00:06:05.880 | to code models, we interviewed Cursor,

00:06:07.960 | Find, Repl.it, Codium, and Hugging Face.

00:06:11.200 | These all seem like very small

00:06:15.760 | compared to the amount of money being thrown,

00:06:20.200 | the amount of heat in the other domains.

00:06:23.520 | And I don't know why that is.

00:06:25.200 | - Yeah, I think it's maybe the fragmentation of the tooling.

00:06:29.320 | Like most people in code are using VSCode, Cursor, GitHub,

00:06:35.160 | one of the three,

00:06:36.000 | so there's maybe not as much experimentation

00:06:38.000 | versus with text, people are just trying everything.

00:06:41.800 | It's hard to try a code model.

00:06:44.520 | I see code models being released,

00:06:45.880 | but it's not super easy to just plug it into your workflow.

00:06:49.080 | So I think engineers like myself are just lazy.

00:06:52.080 | And it's like, hey, I'm having great success

00:06:54.120 | with whatever I'm using.

00:06:55.880 | I don't really wanna go there.

00:06:57.540 | - Special case form of code is SQL

00:07:01.120 | and the semantic layer data engineering type things.

00:07:04.600 | We also had two guests on there from Seek and Qube.

00:07:09.000 | And we also talked to a bit of Databricks, a bit of Julius.

00:07:11.520 | - Yeah, and we have Brian from Hex.

00:07:13.200 | - And Brian from Hex.

00:07:14.920 | Does he count?

00:07:15.760 | I don't know.

00:07:16.600 | - Yeah, no.

00:07:17.420 | - Yeah, yeah, yeah.

00:07:18.260 | I guess the Hex notebooks, yes.

00:07:19.680 | Hex magic, yes.

00:07:21.680 | Rexis is a different beast.

00:07:24.180 | Anyway, but yeah, I think people who come

00:07:31.640 | to AI engineering for the AI

00:07:34.920 | might actually end up finding themselves

00:07:36.460 | in data engineering in the end.

00:07:38.240 | And in traditional ML engineering in the end,

00:07:41.320 | they might have to discover that they're doing Rexis.

00:07:44.140 | And all the stuff that gets swept under a rug in a demo

00:07:49.800 | becomes their job.

00:07:51.060 | And I think I'll probably say just because we didn't select

00:07:55.600 | a theme for last year doesn't mean it wasn't important.

00:07:58.640 | It just wasn't top of mind yet.

00:08:01.040 | And maybe I think that would be an emerging theme this year.

00:08:04.160 | - Yeah, I think that's kind of the consequence

00:08:06.640 | of the low background tokens,

00:08:08.640 | like the end of the low background tokens.

00:08:10.320 | Once--

00:08:11.160 | - Can you explain what you think

00:08:12.360 | are low background tokens?

00:08:13.640 | This was our November recap.

00:08:14.720 | - Yeah, well, the comparison

00:08:17.640 | that our friend Jeff Huber at Chroma brought up

00:08:20.680 | is steel before the atomic bomb creation.

00:08:24.520 | So steel before and no radiation in it.

00:08:26.840 | After all the testing,

00:08:27.800 | a lot of steel had radiation embedded in it.

00:08:30.120 | So it was really precious to get low background steel,

00:08:34.520 | meaning with no radiation and same with tokens.

00:08:37.200 | You can assume that any internet content

00:08:40.720 | from three years ago, it's just internet.

00:08:43.380 | It doesn't have, it's like people writing,

00:08:45.200 | it's not models writing.

00:08:46.460 | Instead now, anything we're gonna get on Common Crawl,

00:08:49.760 | updates and things like that,

00:08:50.840 | you never know if it's human written or not.

00:08:53.560 | And I think that will put more work on data engineering.

00:08:55.880 | Because even basic stuff like checking

00:08:58.740 | if a text says, as a model created by OpenAI,

00:09:01.840 | it's gonna be important.

00:09:04.080 | So people are just being blindly taking

00:09:06.800 | all the data sets offered by Luther and Common Crawl

00:09:10.320 | and all these different things,

00:09:11.560 | assuming that all the data in it is good.

00:09:14.000 | I think now, how do you build on top of it?

00:09:16.320 | And we've seen the New York Times lawsuit against OpenAI.

00:09:20.080 | We've seen data partnerships starting to rise

00:09:23.280 | in different companies.

00:09:24.920 | I think that's gonna be one of the bigger challenges

00:09:27.520 | and maybe we'll see more of the work

00:09:30.360 | that Databricks has done

00:09:31.400 | to build the DALI 5K instruction tuning,

00:09:34.680 | just first party creation of data.

00:09:37.520 | It's like, you got people sitting at their desk every day.

00:09:40.520 | If everybody wrote five Q&A pairs or things like that,

00:09:44.720 | you would have a massive unique data set for your model.

00:09:48.400 | So, yeah.

00:09:50.480 | - Yeah, for people who missed that episode,

00:09:52.440 | that was one of our early episodes as well.

00:09:54.400 | And Mike Conover since left to start BrightWave,

00:09:58.440 | which I'm sure we'll have him back this year.

00:10:00.280 | - Yeah, they're doing a lot of interesting stuff.

00:10:02.080 | I think the next episode will be very cool.

00:10:05.240 | - Awesome.

00:10:06.160 | So how do you want to tackle this?

00:10:07.040 | Do you want to just kind of go through the four wars?

00:10:09.440 | - Yeah, let's do it.

00:10:10.500 | You have, you created this Wikipedia-like infographic

00:10:16.680 | for each of them.

00:10:18.200 | - Yeah, I should say, the inspiration for this

00:10:20.200 | actually was during the Sam Altman leadership battle,

00:10:25.200 | people were making mock Wikipedia entries for the debate

00:10:32.920 | and for like who's on the side of the D-cells

00:10:35.640 | and who was inside of the EX.

00:10:37.640 | And so I like that format because it's very concise.

00:10:41.320 | It has the list of key players and it's kind of fun

00:10:45.040 | to think about like who's on what side

00:10:46.720 | and think about what is important

00:10:49.160 | and what people are battling over.

00:10:51.000 | I think it is important to focus on key battlegrounds

00:10:55.140 | as a concept because there's so many interesting things

00:10:58.480 | you could be talking about in AI

00:10:59.760 | and they're not all equally interesting.

00:11:01.640 | So how do you decide what is interesting?

00:11:05.320 | I think it's money, it's power, it's people,

00:11:08.800 | it's like impact, that kind of stuff.

00:11:11.740 | And so, yeah, that's what I ended up doing.

00:11:14.560 | And so fun fact, the way I did this

00:11:16.800 | was I actually edited the HTML on Wikipedia

00:11:19.240 | and then I just screenshotted it just to get the formatting.

00:11:22.320 | - Good old developer tools.

00:11:24.240 | Developer tools is all you need.

00:11:25.840 | So the data war, belligerence.

00:11:31.520 | On one side you have journalists, writers, artists,

00:11:34.320 | on the other side you have researchers, startups,

00:11:37.180 | synthetic data researchers.

00:11:39.000 | I guess like maybe we wanna talk about

00:11:43.400 | what are the axis of war.

00:11:45.800 | So like one of them is attribution, right?

00:11:48.520 | Like I think there's a varying spectrum

00:11:52.560 | of how comfortable people are about this data

00:11:54.920 | going into a model.

00:11:55.760 | So some people are happy to have your model trained on it,

00:11:59.800 | no matter what.

00:12:00.760 | Some people are happy to have your model trained on

00:12:02.800 | as long as you disclose that it's in the model.

00:12:05.980 | Some people just hate that you trained on their model

00:12:09.920 | and some people like the New York Times

00:12:12.360 | wants you to destroy any artifact

00:12:14.360 | that might have touched your article.

00:12:16.160 | So that's kind of what we're fighting on.

00:12:19.200 | It's not always, I just wanna make it clear

00:12:21.460 | that it's not just like you should never use the data

00:12:23.920 | or you should always use the data.

00:12:25.320 | I think people are just trying to figure out

00:12:28.040 | what's the right form of attribution

00:12:30.000 | and how do I get paid as somebody

00:12:32.760 | whose data ended up being in this training.

00:12:35.400 | I think we're giving everybody a lot of great tokens

00:12:38.160 | related space because we do full transcripts on everything

00:12:41.040 | and we're happy for people to train models on.

00:12:44.440 | - Oh yeah, please train a latent space model.

00:12:46.880 | - Yeah, we would love it.

00:12:48.080 | So that's kind of what we're fighting on.

00:12:51.120 | Anything that people should keep in mind about this war

00:12:54.680 | and like maybe some of the campaigns that are going on?

00:12:59.160 | - So I think the New York Times one

00:13:01.160 | is probably going to go to Supreme Court.

00:13:03.200 | It is very, very critical.

00:13:05.760 | It is a landmark

00:13:10.800 | war that will probably decide what fair use means

00:13:13.880 | in context of AI.

00:13:16.040 | So I think it's, and I recommend,

00:13:19.280 | I think The Verge did a good analysis of this.

00:13:22.240 | Platformer maybe did a good analysis of this.

00:13:25.080 | There are like four criteria for what fair use is

00:13:27.640 | and everyone basically converges onto the last criteria

00:13:32.440 | which is does your use, does your transformative use

00:13:35.560 | of my copyrighted material diminish the market

00:13:40.360 | for my content?

00:13:41.480 | And it's very hard to say.

00:13:45.600 | I always suspect that yes, in some capacity,

00:13:49.400 | in some amount, but good luck proving that

00:13:52.260 | in a court of law.

00:13:53.160 | And I think a negative ruling on open AI

00:13:59.080 | would seriously stall the progress of AI.

00:14:03.700 | And that's bad for humanity,

00:14:08.040 | but good for content creators and writers

00:14:10.200 | so obviously which we want them to be adequately compensated

00:14:14.000 | and recognized for their work.

00:14:16.320 | So there's like no good, there's like no easy outcome here

00:14:19.740 | apart from the existing copyright system

00:14:21.740 | which is also somewhat broken.

00:14:24.200 | And it's just a very, very tricky,

00:14:26.840 | challenging case, I think.

00:14:30.580 | Yeah, so.

00:14:31.720 | - It's funny because we had something,

00:14:33.360 | I was a community moderator at a website called Rap Genius

00:14:36.760 | which was a lyric sanitation.

00:14:38.780 | And there was like a similar thing in maybe like 2014

00:14:41.760 | or like the music labels basically came to the website

00:14:45.160 | and it's like, hey, this is not fair use.

00:14:47.400 | Like you can not reuse the lyrics to the song

00:14:49.760 | and eventually the website made deals

00:14:52.600 | with the record labels to like be able to do this.

00:14:57.120 | And then Google was stealing the transcripts

00:15:00.020 | to put in like the enhanced thing.

00:15:02.640 | - And they proved it by.

00:15:04.080 | - Yeah, yeah, we did all the like,

00:15:05.760 | basically like the things on the eye,

00:15:07.240 | some eyes we put the dots, some eye we put like the accent

00:15:10.580 | and that's how it made it all better.

00:15:12.640 | - I thought they just varied the spacing

00:15:14.920 | or they like use a different kind of spacing

00:15:16.960 | in the Unicode.

00:15:17.800 | - I think it was the eye thing,

00:15:19.440 | but maybe, I mean, this is like almost 10 years ago.

00:15:21.720 | - So Rap Genius proved it by injecting some data poison

00:15:24.680 | into their corpus and then Google reproduced it faithfully.

00:15:28.560 | So therefore they proved that Google is scraping Rap Genius.

00:15:32.200 | Did Google have to pay Rap Genius money in the end?

00:15:35.360 | - I don't think so.

00:15:37.080 | - But at the same, there was also another issue

00:15:39.200 | with Rap Genius that we had that got blacklisted by Google

00:15:42.720 | for like, there was like a lot going on.

00:15:45.440 | - Of course.

00:15:46.400 | - But anyway, this is not a Rap Genius special.

00:15:48.840 | - Yeah, I mean, ultimately,

00:15:49.800 | like I think that we do need quality data.

00:15:52.700 | I think that then if this case is contained

00:15:55.400 | to the New York Times, the New York Times worse outcome

00:15:58.400 | is that they will substitute it with Washington Post

00:16:01.400 | and they substitute with The Economist

00:16:02.720 | or like the second or third ranked newspaper

00:16:05.720 | that is the most friendly to AI.

00:16:07.680 | And then the New York Times will realize

00:16:08.960 | that actually their words are not as,

00:16:11.960 | not that much more valuable than other words.

00:16:14.460 | And then the value of the content comes down

00:16:17.520 | very, very dramatically.

00:16:19.600 | So I think it will be interesting,

00:16:21.780 | but yeah, I do think it's overstepping their bounds

00:16:24.240 | to call for the destruction of all GPTs.

00:16:26.560 | That's probably for sure.

00:16:28.280 | Then the bigger problem I have

00:16:30.220 | is with Stack Overflow and Reddit,

00:16:31.640 | which I named as on the side of the New York Times.

00:16:35.780 | They have effectively shut down their APIs

00:16:38.260 | in order to try to train their own models.

00:16:41.780 | Probably same as Twitter, actually.

00:16:43.940 | I should probably have put Twitter,

00:16:45.340 | I put Twitter on the wrong side, maybe.

00:16:46.540 | I don't know, Twitter is on both sides.

00:16:48.780 | - Elon is on every side, the side of chaos.

00:16:51.540 | - Yeah, what this is,

00:16:52.540 | is basically every UGC, Users Generated Content company

00:16:56.340 | of the 2000 and 2010s,

00:16:58.740 | now has a giant pile of user content

00:17:01.620 | that becomes valuable data

00:17:05.180 | that used to be open for researchers to scrape

00:17:08.140 | and train models.

00:17:09.340 | Now all of them are locking in their walls, right?

00:17:11.900 | Behind their walled gardens

00:17:13.500 | and then trying to train their own models

00:17:14.820 | to boost their benefits.

00:17:17.020 | So this is a locally optimal outcome for them,

00:17:19.540 | but a globally suboptimal outcome for humanity.

00:17:22.320 | Because why should we care

00:17:24.100 | about the closed garden of Reddit?

00:17:26.180 | The Reddit model, the Stack Overflow model,

00:17:29.620 | the X model, as opposed to it being a part of a data mix

00:17:34.020 | of 20% Reddit, 20% Stack Overflow, 20% X.

00:17:38.820 | That seems like a much better outcome for the world,

00:17:42.020 | but everyone is acting in their very narrow self-interest

00:17:45.180 | in trying to make their own model,

00:17:47.200 | which is probably going to suck.

00:17:49.300 | - Right.

00:17:50.140 | (laughs)

00:17:52.220 | So next, Bor, after you get data--

00:17:54.740 | - Oh, we should mention synthetic data.

00:17:57.320 | - Oh, yeah.

00:17:58.160 | So what happens when you run out of human data?

00:18:01.580 | You make your own.

00:18:02.420 | (laughs)

00:18:04.500 | So I would say that is, when I went to NeurIPS,

00:18:09.060 | that was the number one discussion

00:18:10.440 | out of every single researcher's mouth.

00:18:13.020 | There is a lot of research coming from both, I guess,

00:18:17.420 | the big labs as well as the academic labs

00:18:22.420 | on what good synthetic data looks like.

00:18:25.420 | I don't know if you've talked to any startups around that.

00:18:27.700 | I just talked to Luis Costricado the other day,

00:18:30.820 | and he is promising a very, very interesting approach

00:18:35.820 | to synthetic data generation.

00:18:37.580 | I think his phrase for it

00:18:40.140 | is pre-trained-scale synthetic data,

00:18:42.500 | as opposed to what the news research

00:18:44.940 | and the other open-source communities have been doing,

00:18:46.900 | which is fine-tuned-scale synthetic data.

00:18:50.340 | And so he wants to create trillion-token datasets

00:18:53.880 | that are all synthetic.

00:18:55.360 | And I'm like, okay, that's interesting,

00:18:57.180 | but also at the same time,

00:18:58.600 | these are all just downloads from GPT-4 or something else.

00:19:03.600 | So Luis is very aware of that, and he has a way around it.

00:19:09.220 | I don't really understand it,

00:19:10.100 | but he claims that that's a good way around it.

00:19:14.100 | Andrej Karpathy at NeurIPS

00:19:16.860 | highlighted this paper from DeepMind

00:19:18.340 | where they were bootstrapping synthetic data

00:19:22.660 | that could be verifiably proven correct.

00:19:26.640 | So specifically in math and in code,

00:19:29.620 | where there is a correct answer.

00:19:31.100 | So yeah, that makes sense.

00:19:33.260 | You can solve the synthetic data problem that way,

00:19:35.780 | but what about beyond that?

00:19:37.660 | There's just no answer.

00:19:39.620 | - And wasn't part of the issue also

00:19:42.220 | that the way that the phrases are constructed

00:19:45.380 | and all of that in synthetic data

00:19:46.980 | ends up making mode collapse even worse?

00:19:51.140 | Because one thing is right or wrong, right?

00:19:53.380 | The other thing is every sample is read in the same way,

00:19:58.140 | or as a similar, since it comes from a certain model,

00:20:01.920 | kind of as a similar root of structure.

00:20:04.480 | - You already have, yeah.

00:20:06.200 | So I mentioned this in the best papers discussion

00:20:09.400 | with John Frankel.

00:20:10.400 | So the basic argument is you already have

00:20:13.380 | a flawed distribution from a language model.

00:20:17.280 | You are resampling that flawed distribution

00:20:19.240 | to double down on that flawed distribution.

00:20:22.040 | There's no extra information from humans.

00:20:23.980 | So on principle, how can this work?

00:20:25.920 | And so the only conclusion there

00:20:29.020 | is you don't need it to emulate a human.

00:20:31.420 | You need it to emulate a useful assistant,

00:20:33.600 | however you define it.

00:20:34.800 | So I think that the goal of synthetic data

00:20:38.140 | is less to emulate human speech,

00:20:41.060 | because that is basically solved.

00:20:43.260 | It is now more to spike the distribution in useful ways.

00:20:47.940 | And that's a phrase I borrowed from Kanjun.

00:20:49.540 | But anyway, so I think that synthetic data

00:20:51.380 | will be a giant theme for this year,

00:20:53.380 | and not least because the human data

00:20:56.140 | is being locked up behind walls.

00:20:58.460 | So it's a very, very clear trend.

00:21:00.500 | This is probably the most amount of money

00:21:03.820 | after GPUs will be spent here on data.

00:21:06.540 | So one war I did not put here was the talent war, right?

00:21:09.300 | Like the war for PhDs and smart people.

00:21:11.700 | But when you break down what the talent people do,

00:21:16.020 | one is they make models and they run inference on GPUs.

00:21:21.260 | Or they run training runs on GPUs.

00:21:24.460 | But the other is they clean data.

00:21:26.060 | They find data, clean data, and format data.

00:21:28.860 | And so yeah, these are all just proxies

00:21:30.740 | for the kind of talent that is flowing back and forth.

00:21:33.460 | And ultimately, I think you have to focus

00:21:35.500 | on what they're working on,

00:21:37.540 | the visible output of what they're working on, which is data.

00:21:41.260 | - All right, let's talk about the GPU inference war.

00:21:44.720 | I think this is one that has been heating up.

00:21:47.100 | And we actually have a bunch of these folks

00:21:49.780 | coming on the podcast in the next few days.

00:21:51.940 | - Yeah, yeah, yeah.

00:21:52.780 | Are we calling it compute month?

00:21:54.020 | - Yeah, we can figure out a name,

00:21:55.620 | but we have modal, together, replicate.

00:21:59.380 | There's a lot coming up.

00:22:01.940 | But basically, the Mixedraw release, the MOE model,

00:22:05.620 | was kind of the spark of the war.

00:22:09.060 | I think the price went down like 90% in one week.

00:22:13.260 | - Yeah, I wrote two, two, two times.

00:22:15.500 | But yeah, one divided by two, two, two

00:22:16.900 | is whatever the price is.

00:22:20.260 | - Yeah, and then there was the benchmark drama

00:22:22.780 | between Together and AnyScale,

00:22:25.100 | on whether or not which one was faster,

00:22:27.540 | and whether or not the benchmark

00:22:29.140 | was really reflective of performance.

00:22:31.380 | - Yeah, and this was very surprisingly ugly,

00:22:36.300 | in a way that I think usually people

00:22:39.220 | try to respect each other's work,

00:22:40.540 | and play nice, and say nice things

00:22:41.780 | when people release stuff.

00:22:42.660 | Even if it's a competitor, you say nice things,

00:22:44.100 | or you don't say anything at all.

00:22:46.500 | AnyScale, for some reason,

00:22:49.020 | they released a benchmark on which,

00:22:52.540 | of course, AnyScale looks the best.

00:22:54.340 | (laughs)

00:22:55.860 | Why would you release a benchmark

00:22:56.940 | where you don't look the best?

00:22:58.260 | But then, basically, everyone featured

00:23:00.340 | in that benchmark didn't like it, of course.

00:23:03.180 | I do think there's some methodological things.

00:23:05.420 | So for anyone doing benchmarks,

00:23:06.940 | you have to understand that there's a real, real,

00:23:09.180 | real difference between a public benchmark

00:23:11.820 | that is meant for just limited testing,

00:23:14.780 | compared to, okay, if you're load testing us,

00:23:17.220 | or if you're seeing what a real

00:23:19.020 | enterprise customer would see,

00:23:20.100 | you have to give them a heads up.

00:23:22.220 | You have to get a different API key,

00:23:23.420 | a different endpoint, and you test

00:23:24.900 | the real infrastructure, not the demo one.

00:23:27.260 | This is very common for infra companies,

00:23:29.180 | and I think AnyScale just neglected that,

00:23:32.260 | and it hurt their credibility.

00:23:34.460 | AnyScale is not new at this game.

00:23:36.060 | They should have done that.

00:23:37.580 | But what was interesting was this benchmark drama

00:23:39.900 | reached even beyond AnyScale.

00:23:42.020 | We're gonna have Sumit on,

00:23:42.900 | and he's gonna talk about why he weighed in,

00:23:45.380 | 'cause Sumit doesn't represent any inference part.

00:23:47.220 | He just works at Meta.

00:23:48.420 | But he felt like this was a very interesting debate.

00:23:53.420 | And I think we'll see more of this.

00:23:55.300 | You have been a data investor for a while.

00:23:58.060 | Database companies always do this.

00:24:00.220 | And I think now we're just seeing

00:24:02.020 | this kind of fight come into the inference space.

00:24:04.900 | - Yeah, yeah, and I think the hardest thing

00:24:07.340 | is the end customer can now replicate it.

00:24:11.340 | So if you give me a Postgres benchmark,

00:24:14.060 | I can run Postgres on my MacBook and run similar ones.

00:24:18.340 | I think with models, it's just impossible.

00:24:20.060 | So people tell you, "This is the benchmark,"

00:24:22.700 | and you're like, "Okay, I have to go sign up

00:24:24.580 | "to every single cloud now to try it."

00:24:28.020 | It's just not easy.

00:24:30.260 | And we talked about this in Benchmarks 101,

00:24:32.180 | which is the same with model benchmarks, right?

00:24:35.140 | Just like, "Oh, this model is so much better than this."

00:24:37.180 | And then it's like, "Did you train on the questions?"

00:24:40.020 | And it's like, "What?

00:24:41.420 | "Oh, I don't know."

00:24:42.740 | So, and again, it's hard for people

00:24:44.700 | to just run the models and test them.

00:24:48.060 | So there's a lot more weight, I think,

00:24:50.340 | in AI on benchmarks than there is in traditional software,

00:24:53.100 | because nobody buys Upstash or Redis Cloud or whatever

00:24:57.500 | just based on a benchmark.

00:24:58.780 | They try them and check performance and whatnot

00:25:01.420 | because they have real production-scale workloads.

00:25:04.540 | Here, it's like nobody's really doing anything

00:25:06.260 | with these models.

00:25:07.100 | So it's like whatever any skill says, I guess, is good,

00:25:10.420 | but then customers are gonna go try it

00:25:12.420 | and just decide for them what the right thing is.

00:25:16.620 | - Yeah, yeah.

00:25:17.460 | And I think it's important to understand

00:25:20.060 | it is not just about cost.

00:25:21.740 | I think what the price war represented

00:25:23.900 | was a race to the bottom on cost.

00:25:25.780 | And you're like, "Okay, Deep Infra,"

00:25:28.380 | which is a company, we're not,

00:25:30.220 | the name of the company is Deep Infra,

00:25:31.780 | "Deep Infra has promised to just always

00:25:33.180 | "be the lowest cost provider."

00:25:34.900 | Okay, fine, that's a good value proposition,

00:25:37.300 | but you're not only optimizing for that

00:25:40.180 | in a production application.

00:25:41.500 | You're optimizing for latency.

00:25:43.020 | That's one thing.

00:25:43.860 | You're optimizing for uptime.

00:25:45.460 | That's something that you can only earn over time.

00:25:47.760 | You're optimizing for throughput

00:25:49.900 | and other forms of reliability.

00:25:52.060 | It starts to tail off beyond that,

00:25:53.540 | but there's three or four dimensions

00:25:55.000 | that really, really matter.

00:25:56.460 | If you're not table-stakes on any of those things,

00:25:58.300 | you're out.

00:25:59.140 | You're just out.

00:25:59.980 | So actually, there was a really good website

00:26:03.820 | that was released just this week

00:26:06.380 | called Artificial Analysis, did you see it?

00:26:08.740 | Yeah, so this is what the industry needs,

00:26:11.380 | which is an independent third-party benchmark

00:26:13.940 | pinging the production API endpoints of all the providers

00:26:17.780 | and giving a third-party analysis of what this is.

00:26:21.620 | I actually built a prototype of this last year.

00:26:23.340 | - Yeah, I was gonna say.

00:26:24.180 | - But I didn't like maintaining it.

00:26:27.460 | (laughing)

00:26:29.700 | I'm glad someone else is doing it

00:26:32.260 | just because I don't want to keep up with all these things.

00:26:36.420 | But still, I think it's a public service

00:26:39.180 | that somebody should do, so I'm glad that they did it.

00:26:42.020 | I think they did it very well.

00:26:44.220 | So yeah, I think that is where, I guess,

00:26:48.300 | the inference drama is ending for now.

00:26:51.820 | I don't think, I haven't seen any continuing debate there.

00:26:55.580 | The only other thing that,

00:26:57.020 | I did some extra work on this for the recap,

00:26:59.780 | which is, are they losing money?

00:27:01.380 | Are they pricing correctly their tokens from Mixtrel?

00:27:07.060 | And I actually managed to go into Dylan Patel's

00:27:11.100 | write-up of the Mixtrel price war.

00:27:13.520 | And I think I reasonably worked out

00:27:16.840 | that you can serve Mixtrel,

00:27:18.820 | and the lowest you can possibly charge

00:27:20.700 | if you take the most aggressive amortization

00:27:23.300 | of all your CAPEX and all that,

00:27:25.680 | is 50 to 75 cents per million tokens,

00:27:28.900 | which is what Perplexity prices their Mixtrel at.

00:27:31.580 | And Perplexity is a very smart player.

00:27:34.020 | They're not even an inference infra provider.

00:27:37.340 | They're just doing this for fun.

00:27:38.940 | But they're like, "Yeah, we don't want

00:27:41.940 | "to lose money on this.

00:27:42.780 | "We will provide it at cost.

00:27:44.080 | "This is what cost is to us."

00:27:46.380 | So that means, so Perplexity provides it

00:27:49.800 | at 56 cents per million output tokens.

00:27:52.620 | That means AnyScale, which is 50 cents,

00:27:54.460 | OctoAI, 50 cents, AbacusAI, 30 cents,

00:27:57.420 | and DeepInfra, 27 cents, they're all losing money.

00:28:00.240 | Because we think that the break-even is 51 cents.

00:28:03.140 | - And that's, and even that is like

00:28:06.820 | a full batch size and kind of max.

00:28:09.760 | - No, no, no.

00:28:10.600 | I assume-- - Max utilization.

00:28:11.540 | - I assume 50% utilization.

00:28:13.460 | So like, if you talk to practitioners,

00:28:16.260 | very, very good is 60%.

00:28:19.060 | Average is like 30, 40.

00:28:20.520 | So I just, I say 50, right?

00:28:22.380 | You assume 50%, batch size 16,

00:28:25.300 | 100 tokens per second generation.

00:28:27.300 | That's also very, very high.

00:28:28.300 | These are all very favorable numbers.

00:28:29.580 | Like, probably the real number is closer

00:28:31.420 | to 75 cents per million than 50 cents per million.

00:28:33.940 | Anyway, anyone charging under 50,

00:28:37.540 | definitely losing money.

00:28:39.060 | So then it's like, okay, you,

00:28:42.060 | if you, either you don't know what you're doing,

00:28:43.940 | which, in which case, good luck,

00:28:46.300 | or you know what you're doing,

00:28:47.300 | and you're purposely losing money for something.

00:28:49.860 | And what is that?

00:28:50.700 | And I don't know, but I think it's an interesting,

00:28:54.660 | aggressive strategy to pursue

00:28:56.340 | if you are doing it on purpose.

00:28:58.180 | So this is something that, like,

00:29:00.100 | the classical, like Walmart,

00:29:02.020 | would have a loss leader.

00:29:03.100 | Like, they really, really, on purpose,

00:29:05.060 | lose money on things,

00:29:06.260 | so that they get you in the door to try things out.

00:29:09.100 | I, like, I don't know if that makes sense to you as a UC.

00:29:12.940 | - Yeah, yeah, yeah.

00:29:13.780 | It's like the, well, it's like all the,

00:29:16.180 | you know, the candies are placed at the cash register,

00:29:19.540 | because maybe you just went to get the thing on discount,

00:29:21.660 | and then you buy a Kit-Kat, whatever,

00:29:23.860 | and then make money on the Kit-Kat.

00:29:25.900 | Your Kit, they all have the Pokemon trading cards

00:29:29.340 | at checkout now.

00:29:30.180 | So if you bring your Kit to buy the discounted whatever

00:29:33.500 | for you, then you end up spending more.

00:29:35.580 | But to me, the thing is, like,

00:29:37.900 | where's the checkout register

00:29:39.100 | where you upsell people with these things, right?

00:29:41.380 | - Yeah, I don't know how you-

00:29:42.220 | - It's like, that's really the big thing.

00:29:44.940 | Yeah, I don't know.

00:29:47.220 | I'm curious to see.

00:29:48.060 | I don't think Cloudflare still has it live.

00:29:50.420 | I wonder what they're gonna charge for all workers.

00:29:53.780 | Yeah.

00:29:54.620 | - They cannot serve mixed trial.

00:29:56.140 | Their GPUs are too underpowered.

00:29:57.860 | Cloudflare AI is like very good marketing

00:30:01.020 | for very, very underpowered inference, right?

00:30:05.900 | - Yeah, well, I don't know.

00:30:07.580 | I think it all depends on, like,

00:30:09.740 | what is gonna be needed, right?

00:30:11.460 | So they have mixed trial 7B right now, I checked.

00:30:15.300 | But yeah, I wonder-

00:30:16.140 | - They cannot serve mixed trial.

00:30:17.140 | - Yeah, yeah, yeah.

00:30:17.980 | - Okay, yeah, yeah.

00:30:18.820 | - I wonder, but I think they don't wanna get

00:30:20.820 | into this race right now, probably.

00:30:22.220 | - No. - You know?

00:30:23.100 | - Yeah. - So yeah, I'm curious.

00:30:26.100 | Going back to the last leading, it's like,

00:30:28.340 | is there gonna be a better model that comes next

00:30:31.420 | that they hope that you already integrated their thing with?

00:30:34.980 | You know, if you're using together to serve mixed trial

00:30:38.260 | and then something else comes in

00:30:40.140 | that you're gonna replace mixed trial with,

00:30:42.180 | hopefully you're still gonna use together

00:30:43.780 | and they're gonna get better unit economics on it.

00:30:46.380 | I don't know.

00:30:48.280 | - Yeah. - It's a good question.

00:30:49.340 | - It's a good question.

00:30:50.180 | Thank you VCs for paying for all of our imprints.

00:30:53.780 | - No, no, no.

00:30:54.620 | I think these are, you know, everyone in here

00:30:56.580 | are grown adults, they're smart investors.

00:30:58.980 | I'm sure there's some kind of long-term strategy here.

00:31:00.740 | And I'm trying to figure that out.

00:31:01.920 | Like, assume that people are smart

00:31:03.580 | and then what smart people do.

00:31:05.580 | - Yeah, I think it's the same with Uber, right?

00:31:08.100 | It's like, how could it have been so cheaper at the start?

00:31:11.620 | You know, like you look back at all DoorDash,

00:31:14.260 | all these things, it's like-

00:31:15.740 | - And like last year was a great year for Uber.

00:31:18.020 | - Yeah, no, exactly.

00:31:18.860 | Uber friends are like suddenly very, very rich again.

00:31:21.640 | (laughing)

00:31:23.820 | One thing I will mention on like the engineering

00:31:26.620 | sort of technical detail side is, you know,

00:31:28.660 | the rise of mixture of experts is something that,

00:31:31.900 | you know, we covered in our podcast with George

00:31:35.260 | and now with MixedRaw.

00:31:37.640 | And it represents the first successful,

00:31:41.780 | really, really commercially successful sparse model.

00:31:45.020 | And sparse in a very interesting way,

00:31:47.380 | in a sense that the divergence between

00:31:51.760 | the amount of compute you need at training

00:31:55.100 | versus the amount of compute you need for inference

00:31:58.180 | continues to diverge, but also in a weird way

00:32:00.700 | where you need to keep all the weights

00:32:03.420 | of the MOE model loaded,

00:32:07.820 | even though you're not necessarily using them at all times.

00:32:10.500 | (laughing)

00:32:11.740 | So, I mean, basically what I think that is,

00:32:14.740 | is like, I think that that is going to impose

00:32:17.460 | different needs on hardware, different needs on workload,

00:32:21.060 | different needs on like batching optimization,

00:32:23.520 | like Fireworks recently announced a fire attention

00:32:26.860 | where they wrote a custom Cuda kernel for MixedRaw

00:32:29.340 | on H100, it's like super, super domain specific.

00:32:33.040 | And they announced that they could, for example,

00:32:35.100 | quantize from like 16 bit down to eight bit

00:32:38.760 | with like no loss in performance.

00:32:41.020 | Like all these magical details emerge

00:32:43.480 | when you take advantage of like very,

00:32:45.340 | very custom optimizations like that.

00:32:48.140 | And I think like the rise in MOEs this year

00:32:51.660 | is going to be, going to have very meaningful impacts

00:32:54.540 | on the inference market and how it's going to shape

00:32:56.100 | how we think in price for inference.

00:32:58.560 | It may not be that we have this sort of input token

00:33:03.220 | versus output token paradigm for long,

00:33:06.700 | particularly because we have things like,

00:33:10.820 | different forms of batching, different forms of caching.

00:33:14.280 | And like, I don't really know what that looks like,

00:33:17.200 | but I'm very curious.

00:33:18.200 | I see a lot of opportunity here.

00:33:19.440 | If I was an inference provider player,

00:33:22.440 | like that's something I would be trying to offer

00:33:24.120 | to people as a way to differentiate,

00:33:25.600 | because otherwise you're just an API.

00:33:27.280 | - Yeah, no, it was in a way counterintuitive

00:33:29.880 | because most of the struggles with inference as well

00:33:33.160 | are just like memory bandwidth, you know?

00:33:34.960 | So we have now models that scale worse at higher batch.

00:33:40.940 | You know, but I'm glad I'm not in that business.

00:33:44.980 | I can tell you that.

00:33:45.820 | That's for, there's so much work to be done

00:33:48.780 | at like so many low levels of the stack.

00:33:51.760 | You know, you're already trying to provide value

00:33:53.860 | to the customer on like the developer experience

00:33:56.660 | and all of that.

00:33:57.900 | But you also have to get so close to the bare metal

00:34:00.320 | to like make this model.

00:34:01.700 | Actually, like writing a kernel,

00:34:03.980 | imagine if you had to write,

00:34:05.180 | you're like a CPU cloud provider

00:34:07.180 | and you have to like write instruction sets.

00:34:09.620 | It's like just, nobody will get in that business, you know?

00:34:13.900 | So I salute all of our friends

00:34:16.140 | at Compute Providers doing this work.

00:34:18.000 | And I mean, together it's doing so much

00:34:19.620 | for like 3DAL and like fresh retention too and whatnot, so.

00:34:23.060 | - Yeah, yeah.

00:34:23.900 | So, and that's something that I would leave

00:34:25.660 | as the last part of this sort of war

00:34:27.220 | of GPU rich versus poor.

00:34:29.340 | So there's, the GPU rich people are the model trainers

00:34:34.100 | and the infra providers.

00:34:35.500 | They're saying like, we have the GPUs,

00:34:37.260 | come use our GPUs, you know,

00:34:39.380 | and then we provide you the best inference, right?

00:34:42.260 | And that's what we've been discussing so far.

00:34:44.580 | On the other side, on the GPU poor side,

00:34:47.180 | are like all the alternative methods, right?

00:34:49.100 | The modulars, the tiny corps, the QLoras,

00:34:53.260 | and all the other types of stuff.

00:34:54.620 | I even put consistency models in there

00:34:56.140 | because, you know, any efficiency or distillation method

00:34:59.180 | where you go from, like you reduce your inference

00:35:02.320 | or GPU usage by like 25 to 40 times,

00:35:05.920 | is a GPU poor friendly approach.

00:35:07.880 | - Right.

00:35:08.720 | (laughing)

00:35:11.080 | - So I will also put Apple and MLX in there.

00:35:13.040 | And that's also like,

00:35:13.880 | Apple is finally making moves in inference

00:35:16.520 | and that will be a game changer for local models

00:35:19.280 | because then you just don't need any cloud inference at all.

00:35:21.960 | You just run it on device, which is fantastic.

00:35:24.720 | And then obviously RWKV and Mamba

00:35:26.800 | and Stripe Tina from together.

00:35:29.400 | Like all those like emerging models.

00:35:31.040 | I don't know, there's something I've been worried about

00:35:33.320 | for a latent space.

00:35:34.520 | How much attention should we give

00:35:36.560 | to the emerging architectures?

00:35:39.320 | Because there's a very good chance

00:35:41.520 | that one, these things don't work out.

00:35:43.400 | Two, they take a very long time to work out.

00:35:46.000 | And then three, once they work out,

00:35:50.160 | they're like for limited domains and like not super usable.

00:35:53.600 | So I don't know if you have opinions on that.

00:35:56.560 | I can follow up with one conclusion that I've had,

00:35:58.920 | but I want to--

00:35:59.760 | - Yeah, no, I want to hear it.

00:36:00.600 | - Put that question open to you.

00:36:01.840 | So the one conclusion is RWKV

00:36:05.360 | and the state space models, including Mamba,

00:36:08.360 | have historically just been pitched

00:36:09.880 | as super long context models.

00:36:12.000 | And I'm like, that's not something I need

00:36:16.280 | because I'm okay with 100K context.

00:36:18.680 | I'm okay with rag and recursive summarization,

00:36:23.480 | all those techniques to extend your context,

00:36:25.480 | like rope and yarn and all these things.

00:36:28.280 | So why do I need million context models?

00:36:32.080 | Why do I need 10 million, 100 million, 1 billion models?

00:36:35.480 | Like, why?

00:36:36.540 | So the more convinced, the easiest argument is,

00:36:42.520 | oh, you can consume very, very high bit rate things

00:36:45.520 | like video and DNA strands.

00:36:48.100 | And then you can do like syn-bio and all that good stuff.

00:36:51.160 | And I'm like, okay, I don't know anything about that.

00:36:54.120 | Like what happens if like you hallucinate

00:36:56.780 | like one wrong chain in your, you know,

00:37:00.240 | the DNA strand that you're trying to synthesize?

00:37:02.360 | Good luck.

00:37:03.600 | I don't think, I don't know.

00:37:05.640 | So like, that's why I've been historically

00:37:08.760 | underweighting intentionally our coverage

00:37:11.480 | of state space models

00:37:12.400 | and the non-transformer alternatives until Mamba.

00:37:17.100 | Mamba really changed things where basically

00:37:20.300 | for the same amount of compute,

00:37:22.620 | you can get a lot more mileage

00:37:23.880 | or a lot more performance for the same size of model.

00:37:26.440 | And then it's a different, now it's an efficiency story.

00:37:28.760 | Now it's a GPU poor story.

00:37:30.920 | It is no longer a long context story.

00:37:32.680 | It is just straight up,

00:37:34.360 | we are strictly more efficient than transformers.

00:37:36.520 | I'm like, oh, okay, I can get that.

00:37:38.280 | Does that change anything?

00:37:41.380 | I don't know.

00:37:42.760 | - No, that makes sense.

00:37:43.600 | I think the people look at the slope, right?

00:37:46.480 | Which is like, oh, you can get the context higher and higher.

00:37:49.320 | But in reality, it's like, if you kept the context smaller,

00:37:52.640 | instead look at the anti-slope, so to speak.

00:37:54.920 | It's like same context, it's like a lot less compute.

00:37:57.380 | - Yes.

00:37:58.220 | Yeah, so that was not clear to me until Mamba.

00:38:01.640 | And so I think that's interesting.

00:38:04.560 | I do think that there's a concept

00:38:07.280 | that I've been trying to call the sour lesson.

00:38:10.440 | You know, the bitter lesson is

00:38:12.640 | stop trying to do domain specific adjustments,

00:38:14.640 | just scale things up and it's going to work.

00:38:17.040 | That's general intelligence.

00:38:18.440 | General intelligence is dislikes any attempt

00:38:23.440 | to imbue inside of it special intelligence.

00:38:28.840 | Like if you have like any switch case or if statements,

00:38:32.040 | or like if finance do this, if something do that,

00:38:35.480 | don't bother, just scale things up.

00:38:37.100 | And it's going to do all of them simultaneously

00:38:39.320 | all better at once.

00:38:40.160 | That's the bitter lesson.

00:38:42.160 | The sour lesson is a parallel, is a corollary,

00:38:45.800 | which is stop trying to model artificial intelligence

00:38:49.220 | like human intelligence, right?

00:38:51.080 | Like the neuron was inspired by the brain,

00:38:53.920 | but doesn't work exactly like the brain.

00:38:55.860 | Machine learning uses back propagation,

00:38:58.000 | the brain does not use back propagation.

00:39:00.520 | And so why should, we keep trying to create alternatives

00:39:05.520 | to transformers that look like RNNs,

00:39:09.040 | because we think that humans act like RNNs.

00:39:12.000 | We have a hidden state and then we process new data

00:39:14.680 | and we update that state.

00:39:16.020 | But maybe artificial intelligence or machine intelligence

00:39:20.720 | doesn't work like that.

00:39:22.280 | And maybe we just fail every time we try.

00:39:24.480 | (laughs)

00:39:26.780 | So that's the sour lesson.

00:39:28.320 | Every time we try to model things.

00:39:29.800 | And my favorite analogy, I actually got this from,

00:39:31.840 | I think an old quote from Sam Altman, who was like,

00:39:35.200 | you know, like we made the plane, the airplane.

00:39:37.720 | It was inspired by birds,

00:39:39.640 | but it doesn't work anything like birds, right?

00:39:41.840 | It just is, and it works very efficiently.

00:39:44.000 | Like it's probably the safest mode of transportation

00:39:45.720 | that we have, and it works nothing like a bird.

00:39:48.800 | So why should artificial intelligence

00:39:50.720 | work like human intelligence?

00:39:52.640 | And that is the philosophical debate underlying

00:39:55.800 | my continued cautiousness around state-space models.

00:40:00.800 | Which I don't know if it's,

00:40:03.600 | I feel very vulnerable saying this,

00:40:07.560 | because I don't think there's any justification

00:40:09.700 | once you look at the empirical results

00:40:12.700 | or like the mathematical justifications for these things.

00:40:16.040 | But there is some grounding in philosophy

00:40:17.940 | that you should have when you think about,

00:40:20.440 | does an idea make sense?

00:40:21.800 | Does it, is it worth exploring?

00:40:23.440 | - Yeah.

00:40:24.280 | Well, I think now there's a lot of work

00:40:28.320 | being put into it, right?

00:40:29.680 | And I think transformers have shown enough success

00:40:32.960 | that people are interested in finding the next thing.

00:40:37.040 | You know? - Yeah.

00:40:37.880 | - So before it wasn't clear if transformers

00:40:40.440 | were really gonna work.

00:40:41.260 | So people were kind of working on them.

00:40:43.760 | But yeah.

00:40:44.600 | Okay, maybe in the 2025 recap, we're gonna have more.

00:40:49.560 | - Yeah, I mean, we'll try to do one before that.

00:40:51.680 | So we actually have a link.

00:40:53.540 | I don't know if you know this.

00:40:54.560 | Shreya Rajpal from Guardrails.

00:40:56.480 | She's married to Karan from--

00:40:58.360 | - From Hazy.

00:40:59.800 | - Sorry? - He was a Hazy, right?

00:41:01.600 | - Yeah, from Hazy, yeah.

00:41:02.960 | And so now he's started one of the other

00:41:04.880 | state-space model companies.

00:41:05.800 | I forget the name of it, so we'll see.

00:41:08.120 | And I'm sure this will be an emerging topic

00:41:11.540 | this year as well.

00:41:12.420 | So we'll don't have to wait 'til next year.

00:41:14.380 | - Yeah, no, I think we're gonna have

00:41:15.980 | maybe the sour lesson, you know, overview.

00:41:20.720 | - Well, I mentioned this in the Luther Discord,

00:41:22.260 | and then they were like, okay, so what is the spicy lesson,

00:41:24.940 | and what is the salty lesson, what is the sweet lesson?

00:41:28.580 | - I want the sweet lesson, sounds better.

00:41:30.940 | Cool, talking about GPU port, let's do multimodality.

00:41:35.780 | - Well, I feel that stable diffusion was like

00:41:38.140 | the first GPU port model, you know?

00:41:41.840 | Everybody was running it at home.

00:41:42.680 | - Yes, yes, absolutely, I should,

00:41:43.880 | I don't know if I mentioned that.

00:41:44.760 | I just didn't mention it.

00:41:45.840 | Stability, I think in 2023, you know,

00:41:48.880 | they shipped incremental things.

00:41:50.560 | I think, I don't know if stable diffusion 2 was out there,

00:41:54.000 | but everyone's talking about XTXL Turbo,

00:41:55.760 | which is a form, which is an alternative

00:41:58.200 | to consistency model, but looks like a consistency model.

00:42:01.000 | They shipped video diffusion.

00:42:02.240 | They shipped a whole bunch of stuff,

00:42:03.860 | but just wasn't as big as 2022 when they, you know,

00:42:06.840 | made a huge impact with stable diffusion.

00:42:08.640 | - Yeah, yeah, I mean, it's hard to up to.

00:42:11.400 | - It's hard to top that.

00:42:12.240 | - Stable diffusion.

00:42:13.640 | But yeah, mid-journey has been doing great, obviously.

00:42:15.960 | I actually finally signed up for a paid account last month.

00:42:20.120 | - Mid-journey, yeah, yeah.

00:42:20.960 | - Yeah, I'm part of the $200 million a year.

00:42:24.080 | - You have to, yeah, so now it's,

00:42:27.840 | what's confirmed is, I think, like a Business Week article,

00:42:30.480 | or Economist, or Information article,

00:42:32.760 | that yeah, this team has now reached

00:42:36.580 | at least $200 million ARR, completely bootstrapped.

00:42:40.300 | I think their employee count is somewhere

00:42:42.460 | between like 15 and 30 people.

00:42:44.380 | I don't know if you know exact numbers.

00:42:47.420 | I have heard rumors that their revenue

00:42:50.420 | is actually higher than that, that was what was reported.

00:42:53.460 | But it's between the $200 million to $300 million range,

00:42:55.660 | which is crazy.

00:42:56.500 | Especially if it's like primarily B2C.

00:43:01.440 | - Mm-hmm, yeah.

00:43:03.000 | - Which it looks like it is.

00:43:04.480 | - Yeah, yeah, yeah.

00:43:05.680 | It's like B to Fiverr to B.

00:43:08.240 | I think there's like a ton of--

00:43:09.880 | - Oh, you think there's a lot of Fiverr, yeah, yeah, yeah.

00:43:12.000 | Mid-journey specialists.

00:43:12.840 | - Yeah, yeah, you can like get in Discord

00:43:14.400 | and see what people are generating, you know?

00:43:16.800 | And you can see a lot of it is like product placement,

00:43:19.960 | ads, and a lot of stuff like that.

00:43:21.800 | - Yeah, and DALI 3 doesn't seem to have any impact on--

00:43:24.680 | - DALI 3 got so much worse after the GPD 4.

00:43:28.000 | - Really?

00:43:28.920 | - Like the all-in-one.

00:43:30.520 | Well, first of all, before you could generate four images.

00:43:33.300 | And then I had like very good vibes.

00:43:34.900 | Now the vibes are like boomer vibes.

00:43:37.180 | - Oh, no.

00:43:38.020 | - Every time I generate something--

00:43:39.220 | - The images I have here are DALI 3.

00:43:40.780 | - Every time I generate something on DALI,

00:43:43.180 | it looks like some dusty, old, yeah, like mid-2000s.

00:43:48.180 | - I think it's a skill issue.

00:43:50.820 | I think you have DALI 3 wrong.

00:43:52.100 | - No, but that was the great thing about DALI 3, right?

00:43:55.740 | It's like it made the prompt better for you.

00:43:57.660 | - Yeah, yeah, yeah.

00:43:58.580 | Before, like literally when it first came out,

00:44:00.960 | I'm like, "Hey, make a coliseum with llamas."

00:44:04.000 | And it was like this beautiful thing.

00:44:05.520 | I feel like now it's not, I don't know.

00:44:08.280 | Again, it's a model, right?

00:44:10.440 | So it's like maybe I just got unlucky.

00:44:12.160 | I'm in the wrong latent space, but yeah.

00:44:17.160 | - Yeah, there's a lot of players in this.

00:44:19.140 | I don't even think I put some of the players

00:44:21.840 | I was really excited about.

00:44:22.960 | Like, you know, the Imogen team spit out

00:44:24.520 | to create Ideogram, that was a few months ago.

00:44:27.720 | And I didn't even put it here because I forgot.

00:44:30.280 | - It's too much, I can't keep track of all of it.

00:44:33.560 | - Yeah, yeah.

00:44:34.400 | Okay, so I will just basically say that I do think

00:44:36.680 | that I used to, at the end of 2022, start of 2023,

00:44:40.640 | I was not as excited about multimodality.

00:44:43.680 | Obviously, I'm more excited about it now.

00:44:46.440 | I used to think that text-to-image

00:44:48.520 | was more like hobbyist kind of, you know, work,

00:44:52.680 | but $300 million a year is not hobbyist.

00:44:54.360 | - Right.

00:44:55.200 | (laughing)

00:44:57.120 | - It is not like, you know, not just like not safe for work

00:45:00.480 | because mid-journey doesn't do not safe for work.

00:45:02.560 | So it's real, it's a new form of art, it's citizen art.

00:45:06.360 | It's exciting, it's unusual and interesting,

00:45:09.920 | and you can't even model this as an investor,

00:45:14.920 | you can't even model this on an existing market.

00:45:18.140 | Because like, there's just a market of people

00:45:20.760 | who would typically not pay for art,

00:45:23.040 | and now they pay a little bit for art,

00:45:25.160 | which is digital, not as good as human,

00:45:27.560 | but it's good enough, I use it all the time.

00:45:29.880 | - Yeah, I'm surprised I haven't seen a return

00:45:32.680 | of the digital frames that were very popular

00:45:35.620 | during the NFTs boom, people were like, "Oh."

00:45:39.760 | - Yeah, so this is the very, very first "Latent Space" post

00:45:44.200 | was on the difference between crypto and AI in this respect.

00:45:48.900 | So I called this multiverse versus metaverse.

00:45:52.200 | Crypto is very much about metaverse.

00:45:53.960 | Let us create digital scarcity,

00:45:56.000 | and let us create tokens that are worth,

00:45:58.640 | that are limited edition, that are worth something,

00:46:01.200 | and then you display it probably in your PFP

00:46:04.160 | as your representation of yourself.

00:46:06.240 | And what AI represents is multiverse,

00:46:09.280 | which is a very positive sum instead of zero sum,

00:46:12.680 | where if you like a thing, okay,

00:46:14.920 | I'll choose a different seed,

00:46:15.900 | and I'll make a completely equivalent second thing,

00:46:18.400 | and that's mine.

00:46:19.440 | And that means very different things for what value is,

00:46:23.360 | and where value accrues.

00:46:25.000 | So like, yeah, I mean, I still cling to that insight,

00:46:28.000 | even though I don't know how to make money from it.

00:46:30.000 | I think that, I mean, obviously MidJourney figured it out.

00:46:32.620 | I think MidJourney like made the right approach there.

00:46:36.240 | The other one, I think I'll highlight is 11 Labs.

00:46:38.480 | I think they were another big winner of last year.

00:46:41.020 | I don't know, did they renounce their fundraise?

00:46:44.080 | I think so.

00:46:45.320 | - I don't know.

00:46:47.020 | - Rumor is--

00:46:47.840 | - Yeah, rumor is.

00:46:48.680 | - Rumor is, I can say it, you don't have to say it,

00:46:50.280 | because I only heard it from my friends.

00:46:52.360 | Rumor is they're now Unicorn.

00:46:53.760 | And they just focus on voice synthesis,

00:46:57.320 | which again, I did not care about it at the start of 2023.

00:47:01.960 | Now we have used it for parts of latent space.

00:47:04.340 | I listen almost every day to an 11 Labs generated podcast,

00:47:09.320 | the Hacker News Daily Recap podcast.

00:47:11.120 | I don't know what the room for this to grow is,

00:47:17.000 | because I always think like it's so inefficient

00:47:19.080 | to talk to an AI, right?

00:47:21.200 | The bit rate of a voice-created thing is so low.

00:47:24.720 | It's only for asynchronous use cases.

00:47:27.640 | It's only for hands-free, eyes-free use cases.

00:47:30.080 | So why would you invest in voice generation?

00:47:34.240 | I don't know, but it seems like they're making money.

00:47:36.680 | - Right, yeah, yeah.

00:47:37.520 | Yeah, I mean, Sarah, my wife, yeah, she uses it

00:47:40.520 | while she drives to talk to Chad Chibiti.

00:47:43.100 | - I see.

00:47:43.940 | - Just like--

00:47:44.760 | - Yeah, so Chad Chibiti uses their own TTS.

00:47:47.480 | It's not 11 Labs, okay.

00:47:49.120 | But you can see the modality.

00:47:51.280 | - What does, we should bring Sarah in at some point, but--

00:47:54.440 | - Customer interview.

00:47:55.900 | - What does she use Chad Chibiti voice for?

00:47:58.300 | - We're doing a bunch of home renovation.

00:48:00.140 | So maybe she's driving to Home Depot,

00:48:02.640 | and it's like, hey, what am I supposed to get

00:48:05.720 | to replace the sink, you know?

00:48:08.160 | Or all these sort of things

00:48:10.640 | that maybe were like Google searches before.

00:48:13.080 | Now you can easily do eyes-free, hands-free.

00:48:17.440 | - Yeah, a lot of people have told me about that,

00:48:18.720 | and I just, when I listen, when I'm by myself,

00:48:22.040 | I always listen to podcasts.

00:48:23.240 | (laughing)

00:48:24.640 | So I don't have time for Chad Chibiti.

00:48:26.520 | And Chad Chibiti, you know,

00:48:28.160 | probably the number one thing they can do for me

00:48:29.720 | is give me like a speed adjustment.

00:48:32.160 | (laughing)

00:48:33.000 | So I can listen in twice.

00:48:33.820 | - Yeah, yeah, yeah, yeah.

00:48:34.660 | (laughing)

00:48:35.480 | - Yeah.

00:48:36.320 | - That's funny.

00:48:37.160 | - Yeah, anyway, so like, I'm curious about your thoughts

00:48:38.880 | on like how, and as an investor,

00:48:40.980 | I think this is the weirdest AI battlefront for investing.

00:48:45.880 | 'Cause you don't know the time.

00:48:48.280 | - It's funny because there was, I'm trying to remember,

00:48:51.560 | there was a bunch of companies

00:48:53.360 | doing synthetic voices a while ago.

00:48:56.040 | And I think the problem,

00:48:57.360 | a lot of them got through like good ARR numbers,

00:49:00.160 | but the problem was like a repeatability or use case.

00:49:03.080 | So people were doing all sorts of random stuff, you know?

00:49:05.720 | And the problem is not, it's kind of like mid-journey.

00:49:09.000 | The problem is not that there's not

00:49:10.480 | maybe a market of interest.

00:49:12.240 | It's like, how do you build a venture-backed company

00:49:14.560 | with like a scalable go-to-market

00:49:16.040 | that like can go after a customer segment

00:49:18.200 | and like do it repeatedly?

00:49:19.760 | I think that's been the challenge.

00:49:21.240 | I don't know how 11 Labs is doing it,

00:49:23.560 | but you could do so many things with voice,

00:49:27.000 | text-to-voice that is like, how do you sell it?

00:49:29.240 | You know, who do you call?

00:49:30.680 | Like, that's like the hardest thing, right?

00:49:33.840 | If you're raising like a Series A, a Series B,

00:49:35.880 | it's like, how are you gonna invest this money

00:49:38.280 | in sales and marketing to get revenue back?

00:49:40.280 | It's kind of like the basic of it.

00:49:41.920 | And it can be challenging.

00:49:43.540 | That's why sometimes investors are like,

00:49:45.720 | you're making money and that's great for you,

00:49:47.640 | but like how--

00:49:49.960 | - There's no industry--

00:49:50.800 | - Yeah, it's hard.

00:49:52.040 | It's hard to like just tie it together.

00:49:55.280 | - Okay.

00:49:56.120 | I would be interested in,

00:49:59.320 | because I feel like there's a category of companies

00:50:01.800 | in the early 2010s that did this,

00:50:04.360 | meaning they offered an API

00:50:06.420 | with no idea how you were gonna use it.

00:50:09.120 | I'm thinking Twilio.

00:50:10.260 | And Twilio has a cohort of like sort of API-first companies

00:50:16.520 | that are all like sort of Twilio inspired.

00:50:19.200 | But yeah, I think there's a category or a time in the market

00:50:25.320 | when it makes sense to just offer APIs

00:50:27.520 | and just let your customers figure it out

00:50:29.720 | and it's actually okay.

00:50:31.040 | And then there's sometimes when it's not okay.

00:50:33.860 | And I think the default investor mentality right now

00:50:36.260 | is that it's not okay

00:50:37.100 | if you don't know what your customer is doing.

00:50:39.240 | - I think, well, Twilio is a funny example

00:50:41.760 | because I think in the middle 2010s,

00:50:44.480 | Uber was like 15% of Twilio's revenue.

00:50:47.200 | - I'm just, I'm talking like,

00:50:49.760 | move yourself back as to like Twilio seed investor,

00:50:51.880 | Twilio series A investor, they had no idea.

00:50:54.460 | Uber wasn't even around.

00:50:55.800 | - But I think the thing now is like,

00:50:58.800 | text to voice is not new, you know?

00:51:02.260 | Like that's really the thing.

00:51:03.360 | It's like, what's new now

00:51:04.760 | is that you can generate very good text

00:51:06.960 | to then feed into the model.

00:51:09.000 | So that changes why the market is interesting, you know?

00:51:12.160 | But if you really think about it,

00:51:14.340 | the models today are a little better.

00:51:16.120 | They're maybe like 50% better

00:51:18.160 | than they were three years ago.

00:51:20.140 | But the transformer models under defeated, what to say,

00:51:24.040 | they're like a billion times better.

00:51:26.660 | So imagine if you have like a lot of people use it

00:51:29.000 | for like automated, you know, customer support,

00:51:31.280 | things like that.

00:51:32.500 | Before you had like scripts, they were reading.

00:51:34.600 | Now you have, you can have a transformer model

00:51:37.320 | converse with the customer.

00:51:38.880 | So it makes it a lot more useful in cases.

00:51:42.800 | But we'll see how that changes.

00:51:45.920 | - Okay, the last thing I'll mention here,

00:51:47.940 | why is this a war?

00:51:50.260 | Which is OpenAI and Gemini and Google

00:51:54.620 | are working on everything models

00:51:56.820 | versus each of these individual startups

00:51:59.000 | all working on their selected modality.

00:52:01.860 | And so this is a question of like,

00:52:05.300 | are the big tech companies going to actually win

00:52:07.420 | because they can transfer learning across multiple domains

00:52:11.140 | as opposed to each of these things being point solutions

00:52:13.700 | in their specific things.

00:52:15.460 | The simple answer is obviously everyone will win.

00:52:17.580 | - Right.

00:52:18.420 | (laughs)

00:52:19.240 | - Because the AI market is so huge.

00:52:21.140 | You know, there's a market for the Amazon basics

00:52:24.340 | of like everything, you know, one model has everything.

00:52:26.260 | And then there's a market for,

00:52:27.700 | no, like the basics are not good enough.

00:52:29.060 | I'll need the special thing.

00:52:31.180 | Do you have an opinion on

00:52:33.220 | when does one market win over the other

00:52:35.980 | or is it just like everything's gonna win?

00:52:38.780 | - Yeah, it's interesting.

00:52:40.200 | I think like it works when people wouldn't have used

00:52:43.700 | the product without the Amazon basics, you know?

00:52:46.140 | So like, maybe an example is like a computer vision,

00:52:49.220 | you know, like, I mean, we have--

00:52:50.580 | - Yeah, vision is sore here now.

00:52:52.300 | - Yeah, it's like, you know, before people were like,

00:52:54.700 | why am I bothering trying out

00:52:56.420 | to set up a computer vision pipeline and all of that?

00:52:58.980 | Now they can just go on GPT-4 and put an image

00:53:01.660 | and it's like, oh, this is good.

00:53:03.140 | I could use this for this.

00:53:04.380 | And then they build out something

00:53:06.180 | and maybe they don't use OpenGPT-4v,

00:53:08.940 | they use RoboFlow or whatever else.

00:53:11.000 | That's kind of how I think about it.

00:53:13.120 | It's like, what's the thing

00:53:14.640 | that enables people to try it, you know?

00:53:16.880 | So in a way, the God model can do everything fairly okay.

00:53:21.160 | It's like DALI and MidJourney, you know,

00:53:24.220 | all these different things.

00:53:25.220 | Who's like the, and maybe like MixedRoute,

00:53:28.360 | the MixedRoute inference wars are like another example.

00:53:30.500 | It's like, I would have never put something in my app

00:53:34.180 | at like $2 per million tokens,

00:53:36.700 | but I did it at 27 cents per million token, you know?

00:53:40.720 | And now it's like, oh no, I should really do this.

00:53:43.260 | It's a lot better.

00:53:44.420 | So that's how I think about how the God model

00:53:47.800 | kind of helps the smaller people then build more business.

00:53:51.460 | - Yeah, cool.

00:53:52.580 | Yeah, creates a category.

00:53:53.820 | Yeah, Ragonops.

00:53:56.560 | - Yeah, last but not least, where to begin?

00:54:00.480 | We had almost all of these people on the podcast too.

00:54:04.380 | - They're honestly the easiest to talk to

00:54:06.220 | because they look like DevTools

00:54:08.740 | and you are a DevTools investor.

00:54:11.180 | I worked in DevTools and they're all,

00:54:15.580 | I think they're also more mature, right?

00:54:17.860 | As businesses, there's more of a playbook

00:54:21.060 | that is well understood by the customer.

00:54:23.940 | Like, yes, I need a new stack here.

00:54:26.340 | Maybe not.

00:54:27.260 | So I think the reason, okay,

00:54:28.900 | so my biggest problem with putting databases

00:54:31.620 | versus frameworks versus ops tooling in the same war

00:54:35.620 | is that they're not really a war.

00:54:37.400 | They work cohesively together,

00:54:39.820 | except when one thing starts to intrude on another thing.

00:54:44.780 | And that's why I put the very, very,

00:54:47.100 | I very consciously put together this sequence,

00:54:49.260 | which is databases on the left,

00:54:50.620 | frameworks in the middle, ops companies on the right.

00:54:53.460 | What's the first product of Lang chain?

00:54:55.380 | Lang Smith, which is an ops thing, right?

00:54:57.380 | So now suddenly the framework companies

00:54:59.500 | are not so friendly with the ops companies

00:55:01.260 | 'cause they're trying to compete with the ops companies.

00:55:03.260 | What are the ops companies trying to do?

00:55:04.500 | The ops companies are trying to produce SDKs

00:55:06.180 | that compete with frameworks.

00:55:07.740 | Okay, then what are the database companies trying to do?

00:55:10.400 | First of all, they're fighting between each other, right?

00:55:12.660 | There's the non-databases, all adding vector features.

00:55:17.220 | We had some people approach us

00:55:18.900 | and we had to say no to them 'cause there's just too many.

00:55:21.020 | And then there's the vector databases coming up

00:55:22.980 | and getting $235 million to build vector databases.

00:55:27.980 | Maybe I'll just, you know,

00:55:30.420 | obviously you're an active investor in some of these things,

00:55:32.540 | so you cannot say everything,

00:55:33.660 | but just on databases alone,

00:55:36.180 | one of the biggest debates of 2023,

00:55:39.140 | where do you stand on the whole thing?

00:55:41.180 | - That's the million dollar question.

00:55:45.420 | I think it's really, well, one, in the start everything,

00:55:50.420 | there's kind of like a lot of hype, you know?

00:55:53.020 | So like when Lang chain came out and Lama index came out,

00:55:55.140 | then people were like, oh, I need a vector database.

00:55:57.460 | It's like vector, they search vector database

00:55:59.800 | and it's like Chroma, Pinecone, whatever.

00:56:03.680 | But then it's like, oh,

00:56:04.520 | you can actually just have PG vector in Postgres.

00:56:07.680 | And you already have Postgres.

00:56:09.400 | Did you know it could do that?

00:56:10.600 | People are like, no, I didn't because nobody really cared.

00:56:12.920 | So like, there's not a lot of documentation.

00:56:15.160 | Same with, yeah, MongoDB vector, Cassandra,

00:56:18.760 | all these things. - Redis, Elasticsearch.

00:56:20.840 | - You can actually put vectors and embeddings in everything.

00:56:24.980 | - It's a different kind of index.

00:56:26.380 | And I think like, I mean, like Jeff and Anton also,

00:56:29.900 | what they always talked about even early on,

00:56:31.580 | it's like, this is like a active learning platform.

00:56:34.260 | This is not just like a vector database.

00:56:36.180 | It's like, what do you do with the vectors?

00:56:38.620 | It's like, what's most helpful?

00:56:40.340 | It's not where do you store them.

00:56:41.900 | So that's kind of the change.

00:56:44.860 | - I think that was old Chroma, by the way.

00:56:46.420 | I don't know if that's the new, the current messaging.

00:56:48.660 | - Well, but I think, I'm just saying like to them,

00:56:51.900 | it's never about,

00:56:52.740 | this is the best way to put a vector somewhere.

00:56:55.540 | It's like, this is the best way to operate on the vectors.

00:56:59.180 | And the store is like part of it,

00:57:00.920 | but there's like the pipeline to get things out

00:57:04.100 | and everything, you have to build out a lot more.

00:57:06.260 | So I think 2023 was like, create the data store.

00:57:10.300 | I think 2024 is gonna be like,

00:57:12.460 | how do I make the data store useful?

00:57:14.220 | Because the vector store just commoditized.

00:57:16.820 | So there needs to be something else on top of it.

00:57:19.320 | Yeah.

00:57:21.380 | - Unless they can come up with some kind of like,

00:57:23.940 | new distance function or something.

00:57:25.760 | I keep waiting for Chroma to,

00:57:27.620 | they teased a little bit of what they're working on

00:57:29.300 | at the AI Engineer Summit, which yeah,

00:57:32.020 | density and whatever other fancy formulas

00:57:34.740 | that Anton is cooking up.

00:57:35.980 | But yeah, so I think I tweeted about this

00:57:40.300 | maybe like two, three months ago,

00:57:41.620 | and I think I pissed off Chroma a little bit.

00:57:43.380 | But the best framing of what Anton would respond to here

00:57:47.140 | is what people are embedding within vectors

00:57:49.700 | is a very different kind of data

00:57:51.460 | from what is already within Postgres

00:57:53.660 | and MongoDB and all the others.

00:57:55.260 | So in some sense, it's net new data.

00:57:58.900 | And that actually struck a chord with me

00:58:02.400 | because that's how I started to understand

00:58:05.340 | structured versus unstructured data.

00:58:07.060 | That's how I started to understand,

00:58:09.060 | one of my kind of heroes is Mark,

00:58:13.180 | who's the CTO of MongoDB.

00:58:15.540 | This guy was the former GM of AWS RDS.

00:58:20.020 | And for those who don't know,

00:58:21.020 | GM is like, you're the mini CEO of that business.

00:58:24.580 | And when you work at AWS RDS,

00:58:26.340 | you run a $1, $2 billion a year business.

00:58:29.720 | And now, and then he quits being Mr. Postgres of AWS

00:58:35.360 | to join MongoDB, the enemy.

00:58:37.860 | And when he gave that speech of why he did this,

00:58:42.860 | he was like, actually, if you look at the kind of workloads

00:58:46.700 | that is happening, Postgres is doing well, obviously.

00:58:50.060 | Structured data, always going to be there.

00:58:51.980 | But unstructured data and document type data

00:58:55.500 | is just rising exponential rate even faster.

00:58:58.380 | And for him to say that means different things.

00:59:01.740 | Anybody could have said that.

00:59:02.580 | Anybody could have pointed,

00:59:03.580 | made a chart that showed what he did.

00:59:05.760 | Anybody could have said that.

00:59:06.600 | But for him to have said that, I think it was a very big deal

00:59:08.860 | 'cause he's rich, he doesn't have to work,

00:59:10.700 | but he believed in this so much that he was like,

00:59:13.100 | okay, I'll just join MongoDB.

00:59:15.540 | So I'm like, okay, there's a real category shift

00:59:18.420 | between structured data and unstructured data.

00:59:20.260 | I believe it.

00:59:21.100 | I don't think it's just that you can put JSONB

00:59:23.540 | inside of Postgres and be done.

00:59:24.740 | That's not a NoSQL database.

00:59:26.640 | Okay, fine.

00:59:27.480 | So what is this new thing of vectors?

00:59:30.380 | And how do you think about that as a new kind of data?

00:59:34.740 | And I think if there's a third category

00:59:37.060 | of something beyond unstructured data,

00:59:41.140 | I don't know what it is.

00:59:42.300 | Context or memory or whatever you call it,

00:59:45.940 | whatever you call this kind of new data,

00:59:47.620 | that might belong in a new category of database

00:59:50.260 | and that might create the new MongoDB of this era.

00:59:53.540 | And it could be any one of these guys.

00:59:56.260 | Right now, Pinecone has the lead.

00:59:59.260 | I think they're $750 million company.

01:00:03.220 | - Valuation.

01:00:04.040 | - Yeah.

01:00:04.880 | And then all the others are much smaller.

01:00:06.700 | So like, okay, if there's a room for like,

01:00:09.620 | if this is really a new data category

01:00:12.100 | and there's a room for a key player,

01:00:14.340 | then it's probably gonna be one of these guys.

01:00:16.180 | By the way, I left out Weviate

01:00:17.980 | and I put Kudrant in there.

01:00:19.220 | Do you know why?

01:00:20.180 | - No.

01:00:21.660 | - Anthropic and OpenAI, both use Kudrant

01:00:26.300 | for their internal RAG solutions.

01:00:29.900 | Which means that for whatever reason,

01:00:32.460 | we should probably interview Kudrant.

01:00:33.620 | They passed the evals when Weviate and Milvus

01:00:36.380 | and all the others didn't, which is interesting.

01:00:39.140 | - Yeah, yeah, yeah, yeah.

01:00:40.340 | - There's a lot that we don't know.

01:00:41.540 | - Yeah, interesting.

01:00:43.300 | Yeah, I think like, I mean, going back to your point

01:00:45.300 | of like, Langtring, building Langsmith,

01:00:47.900 | at some point, some of the vector databases

01:00:49.820 | are gonna be like,

01:00:50.940 | why am I letting my customers use Llama Index?

01:00:53.780 | You know, it's like, I should be the RAG interface

01:00:56.980 | since I'm owning the data.

01:00:58.260 | - Yes, yes, that's why I put them next to each other.

01:01:01.220 | Right now they're friends.

01:01:02.600 | - Yeah, right now, but I mean,

01:01:05.820 | if we think about the JAMstack era, you know,

01:01:10.220 | you had Vercel started as Zite, which was just a CDN.

01:01:14.520 | And then you had Netlify, you had all these companies.

01:01:17.820 | And then Vercel built Next.js.

01:01:20.860 | And so they moved down from the CDN to the framework,

01:01:23.700 | you know, and it's like, now they use the framework

01:01:27.140 | to then enable more cloud and platform products.

01:01:30.280 | Which way is it gonna give this way?

01:01:33.380 | I think what we learned from before

01:01:35.260 | is that you rather own the framework

01:01:37.340 | and then have the cloud to support it

01:01:39.380 | than like just have Netlify

01:01:40.780 | and not have your own framework.

01:01:42.600 | Just given the way the two companies are doing now.

01:01:46.340 | - So for those who don't know,

01:01:47.180 | I worked at Netlify

01:01:48.000 | and I was very, very intimately involved in this.

01:01:49.740 | (laughing)

01:01:51.340 | - So we don't have to say anything private.

01:01:53.060 | - No, no, no, it's fine, it's fine.

01:01:54.460 | It's well known that Vercel won

01:01:56.300 | and Netlify has pivoted away to a different market.

01:01:59.200 | But is it over learning from an N of one example

01:02:06.220 | that you always wanna own the framework?

01:02:07.540 | - No, no, no, no, no.

01:02:08.580 | Because then the counter example is the same,

01:02:10.500 | which is Gatsby.

01:02:11.620 | - Yes. - Where you own the framework,

01:02:12.940 | you don't own the cloud

01:02:13.780 | and then you don't make money either.

01:02:15.000 | So it's kind of like...

01:02:17.460 | I think we still gotta figure out

01:02:20.560 | where the gravity is in this market.

01:02:23.200 | I think a lot of people will say

01:02:24.320 | the gravity is in the model.

01:02:25.920 | A lot of people will say the gravity is in the embeddings

01:02:28.320 | and the data that you put into it.

01:02:30.120 | A lot of people don't know what they're talking about.

01:02:31.960 | So I think 2024 is supposed to be

01:02:35.740 | the year of AI in production.

01:02:37.400 | I think we're gonna learn soon

01:02:40.440 | who bleats and to where.

01:02:42.000 | - I think that statement is the year of Linux

01:02:45.980 | on the desktop thing.

01:02:47.020 | It's just always gonna be true.

01:02:50.180 | People are always gonna be saying it.

01:02:52.600 | We're gonna be here one year later

01:02:53.760 | and then it's just like, yeah,

01:02:54.840 | this year is the year of AI production.

01:02:56.760 | And it's always gonna be incrementally more true.

01:02:59.200 | But what is the catalyst?

01:03:02.240 | What is the big event that you will point to

01:03:04.520 | and say, aha, now it's in production?

01:03:06.480 | I don't know.

01:03:07.320 | - I think actually being that it's not in production.

01:03:10.640 | A lot of companies, it's funny.

01:03:13.000 | One, they're just like an inherent timeline

01:03:16.640 | that large companies work within.

01:03:18.360 | GPT-4 came out in April.

01:03:21.280 | That's like eight months.

01:03:22.400 | It's like most companies don't buy things

01:03:24.500 | within eight months and implement them.

01:03:26.920 | So I think part of it, just like a physics time limit,

01:03:31.180 | that even people that have been really interested,

01:03:34.100 | you just cannot go through the whole process

01:03:36.760 | of getting them live to all of your customers.

01:03:38.720 | So I think we'll see more of that in good and bad.

01:03:41.640 | It's gonna be a lot of failures

01:03:43.240 | and a lot of successes, hopefully.

01:03:45.400 | Yeah.

01:03:47.320 | - Yeah, any other commentary on tooling,

01:03:49.600 | RAG, ops, anything like that?

01:03:51.520 | I mean, I always tell people,

01:03:54.400 | as much as I'm interested in fine tuning,

01:03:55.680 | I think RAG is here to stay.

01:03:57.760 | Don't even doubt it.

01:03:59.560 | This is a necessary part

01:04:01.000 | that every AI engineer should know.

01:04:03.480 | - Yeah, well, I think, yeah,

01:04:05.640 | it's tied to the infinite context thing, right?

01:04:08.280 | I think the leftover question is like,

01:04:10.840 | do you wanna have infinite context

01:04:13.280 | and hope that the model is good enough

01:04:15.200 | at parsing which parts matter to your query?

01:04:18.200 | Or do you wanna use RAG

01:04:20.120 | and wrap very specific context injection?

01:04:23.560 | I think so far, most people will say,

01:04:25.320 | I'd rather do a context injection

01:04:27.220 | with just what I care about

01:04:28.640 | than put a whole document in there

01:04:30.000 | and hope the model gets it.

01:04:31.060 | But maybe that changes.

01:04:33.660 | - I don't, I like--

01:04:34.780 | - Yeah, no, I mean--

01:04:35.620 | - There's no way it changes.

01:04:37.820 | (laughing)

01:04:38.900 | - Hey, you know, that's great for Lama Index.

01:04:41.140 | - Yeah, yeah, no, it's great.

01:04:41.980 | - You know, like, great luck

01:04:42.800 | is gonna make a lot of money, I guess.

01:04:43.640 | - Yeah, yeah, yeah.

01:04:44.460 | No, it's not clear

01:04:45.300 | that they're gonna make a lot of money, right?

01:04:46.660 | 'Cause they're just an open source project.

01:04:47.980 | I don't think they've launched a commercial thing yet.

01:04:50.380 | - I don't think so.

01:04:52.940 | Because, yeah, Jerry was talking about it on the podcast,

01:04:55.460 | but it wasn't, yeah.

01:04:58.180 | - Yeah, so, I mean, we'll see what they launch this year.

01:05:00.620 | I do have--

01:05:01.460 | - The year of AI in production.

01:05:02.820 | - The year of Lama Index in production.

01:05:04.700 | - Yeah.

01:05:05.700 | Okay, so that's the four wars.

01:05:07.260 | We also covered a bunch of other non-wars

01:05:10.580 | that we skipped over.

01:05:12.380 | I did remember that you actually just published

01:05:15.540 | a piece on the semantic versus--

01:05:17.620 | - The syntax, the semantics.

01:05:19.140 | - Do you wanna cover that as a--

01:05:21.220 | - Yeah, I think, like, I kinda mentioned this

01:05:24.260 | a couple times on the podcast,

01:05:25.940 | but basically, the idea of, like,

01:05:27.540 | code has always been the gateway to programming machines,

01:05:31.940 | and we spend a lot of time making it easier.

01:05:34.140 | So you go from punch cards, like COBOL, to C, to Python,

01:05:39.140 | just to make it easier for the person

01:05:42.660 | to read and write the code.

01:05:44.460 | And through it, we started adding

01:05:45.940 | kind of, like, these semantic functionalities in it.

01:05:48.300 | So in Python, you can do array.sort.

01:05:51.460 | You don't need to know bubble sort.

01:05:52.860 | You don't need to know any algorithm

01:05:54.300 | that you learned in school to do it.

01:05:56.660 | And I think the models are kind of like 100X-ing this,

01:06:00.540 | which is, like, now all you need to do

01:06:02.260 | is, like, create a sign-up form, you know,

01:06:05.580 | where people put a name, email,

01:06:07.020 | and send it to this endpoint.

01:06:09.980 | So it's gonna be a lot easier for people

01:06:12.940 | that know the semantics of the business,

01:06:14.980 | which is, you know, your product managers,

01:06:16.900 | or your business people,

01:06:18.420 | the layer that goes from customer requirements

01:06:22.300 | to implementation, basically,

01:06:24.580 | and have them intervene in the code.

01:06:27.060 | So, you know, how many times, as an engineer,

01:06:30.260 | you have to, like, go change some button color

01:06:32.980 | or, like, some button size,

01:06:34.260 | like, these small things that, like,

01:06:36.060 | you really shouldn't be doing.

01:06:37.900 | And now you can have people with natural language

01:06:40.580 | intervene in the code and write code

01:06:42.900 | that can actually be merged and put in production.

01:06:46.180 | I also wrote the bear case for it,

01:06:47.620 | which is, like, we already have so much trouble

01:06:49.940 | getting engineering teams to collaborate

01:06:52.140 | and get all their changes together

01:06:53.820 | without conflicts and all of these things,

01:06:55.740 | and maybe also having non-technical people

01:06:58.740 | try and do things will be hard.

01:07:00.420 | And models are,

01:07:01.940 | they just think about solving the task at hand.

01:07:05.020 | They don't think about,

01:07:06.140 | I've always told my engineers, it's like,

01:07:08.140 | you need to leave the code base better than you found it.

01:07:10.260 | You know, if you're, like, writing something,

01:07:11.860 | it's like, just,

01:07:13.420 | we cannot always keep adding, like, quick hacks, you know?

01:07:18.740 | And I think models are great at quick hacks,

01:07:20.980 | but sometimes it's like, oh,

01:07:22.820 | this is, like, the 16th button

01:07:24.500 | that you've changed a style for,

01:07:27.180 | you should make a class for it.

01:07:28.460 | That's, like, the dumbest example.

01:07:30.580 | So I think that's,

01:07:32.420 | if that happens,

01:07:34.900 | then I think I'll be a lot more bullish

01:07:36.460 | on, like, coding agents, you know?

01:07:38.820 | But I think now that's kind of the,

01:07:40.460 | until you can have non-technical people

01:07:43.340 | manually query models and look at results

01:07:45.940 | and then say, this is ready to go,

01:07:47.940 | it's gonna be hard to have autonomous agents do it, so.

01:07:51.460 | - Yeah, so I actually had a tweet about it today

01:07:54.660 | because Itamar from Codium actually published

01:07:57.820 | Prompt Engineering, Flow Engineering,

01:08:01.340 | as his next evolution of Prompt Engineering.

01:08:03.780 | And they've been working on, you know, in IDE agents.

01:08:08.100 | They call it agents,

01:08:09.220 | you can debate about the definition of an agent,

01:08:12.460 | you know, at the end of the day.

01:08:13.580 | So my split of it is inner loop versus outer loop,

01:08:16.980 | which I think you understand that.

01:08:18.580 | Maybe I have to explain it to the audience,

01:08:19.900 | because every time I talk about it to developers,

01:08:21.500 | they don't, they've never heard of it.

01:08:23.500 | So inner loop is everything that happens

01:08:25.460 | between a Git commit.

01:08:27.260 | Outer loop is everything happens

01:08:30.140 | after the commit is committed and it's pushed up for PR.

01:08:34.980 | So maybe that's too reductive,

01:08:37.540 | but that's something like that, right?

01:08:38.460 | Like, inner loop happens within your IDE,

01:08:40.580 | outer loop happens in GitHub, something like that.

01:08:43.500 | Okay, so I think your conception of an agent

01:08:47.300 | is outer loop-y, especially if it's non-technical, right?

01:08:50.460 | Like the dream, like you mentioned

01:08:52.100 | sweep.dev in your write up.

01:08:53.740 | And there's also CodeGen, there's also maybe Morph,

01:08:58.060 | depends what Morph is doing.

01:09:00.100 | And there's a bunch of other people all doing this stuff.

01:09:02.940 | Even small developer was also like,

01:09:04.820 | you know, write in English and then create a code base.

01:09:08.180 | And I think it's just not ready for that.

01:09:10.980 | Outer loop is a mirage,

01:09:13.460 | it's like going to forever be five years away.

01:09:17.100 | And the people working on inner loop companies

01:09:18.940 | have been the right bet.

01:09:20.860 | And you can work on inner loop agents.

01:09:22.660 | I think actually Code Interpreter is an inner loop agent

01:09:26.740 | in a sense of like, it's like limited self-driving, right?

01:09:31.100 | It's kind of like, you have to have your attention on it,

01:09:36.100 | you have to watch it, it can only drive a small distance,

01:09:38.940 | but it is somewhat self-driving.

01:09:40.580 | And so I think if you have this like gradations

01:09:42.660 | in your outlook on autonomous agents,

01:09:44.620 | and you don't expect everything to jump to level five

01:09:47.580 | at once, but if you have a idea of what level one,

01:09:50.260 | two, three, four, five looks like for you,

01:09:52.020 | I haven't really defined it apart from this concept

01:09:54.700 | of inner loop versus outer loop.

01:09:56.040 | But once you've defined it, then you can be like,

01:09:57.580 | oh, we're making real progress on this stage.

01:10:00.060 | And this other stage, too early for now,

01:10:02.940 | but at some point somebody will do it.

01:10:04.900 | - Yeah, yeah.

01:10:06.340 | I think like, yeah, maybe level one is like,

01:10:09.020 | I think of it more as just the auto-completion and the ID.

01:10:13.420 | Level two is like asking cursor,

01:10:15.820 | hey, how can I make this change?

01:10:18.660 | But then level three should be like,

01:10:21.260 | to me it's like, we need to separate the inner loop

01:10:24.060 | from the ID, you know?

01:10:25.980 | Like, I need to make a code change.

01:10:28.600 | Sometimes I shouldn't go in the ID.

01:10:30.740 | Sometimes I should be in the UI of the product

01:10:34.780 | and say, hey, that needs to be changed.

01:10:36.360 | Kind of like the, all the preview environments companies

01:10:40.220 | want you to put comments, the PMs put comments.

01:10:43.420 | Like, how do you go from that to code changes?

01:10:46.860 | There should be enough there

01:10:48.500 | to make the code changes happen

01:10:51.380 | through a supervised interface.

01:10:53.100 | - That's how the loop.

01:10:54.740 | - Yeah, but that's kind of like,

01:10:56.260 | I think what these models are doing is like change

01:10:59.940 | where the loop start and end, you know?

01:11:02.260 | Because now you can create code in the outer loop

01:11:04.660 | before you couldn't do it.

01:11:07.100 | - That's the dream, that's the dream.

01:11:09.540 | - Yeah.

01:11:10.380 | - Yeah, I have, yeah, anyway, my focus right now,

01:11:14.500 | I'll say if anyone cares, it's like,

01:11:16.540 | I think the only thing that's working is inner loop

01:11:19.140 | and you should just use inner loop things aggressively,

01:11:21.540 | build inner loop things aggressively, invest in them,

01:11:24.460 | and then keep an eye on the outer loop stuff.

01:11:26.680 | - Yeah.

01:11:27.520 | - It's still very early.

01:11:28.500 | I did invest in CodeGen, this Jhacks thing,

01:11:32.020 | which we mentioned briefly in the Sourcegraph episode.

01:11:36.940 | Do we have other things that we want to mention

01:11:38.460 | or do you want to sort of keep it to the four wars?

01:11:40.940 | - I think that's great.

01:11:41.780 | I thought it was going to be much shorter,

01:11:43.340 | but we're at one hour, 15 minutes.

01:11:46.400 | I thought we were going to run through everything.

01:11:48.020 | - Yeah, I mean, are there, like, okay,

01:11:50.100 | maybe like top two things from December

01:11:53.540 | that you have commentary on.

01:11:55.660 | - I think the needle in a haystack thing.

01:11:59.940 | - Okay, maybe you want to explain that first.

01:12:01.100 | - Yeah, basically like Entropic,

01:12:05.100 | there was like one example floating around

01:12:07.740 | about Cloud's context window,

01:12:09.660 | and you basically gave it this like super long context

01:12:12.020 | on I think like things to do in San Francisco

01:12:14.460 | or something like that.

01:12:15.620 | And then it was like,

01:12:16.460 | what is the most fun thing to do in SF?

01:12:18.300 | And it always, it didn't,

01:12:21.100 | they made this nice chart of like, okay,

01:12:22.940 | based on where it is in the context,

01:12:24.820 | it gave a better or worse response.

01:12:26.660 | And then Entropic responded and they were like,

01:12:29.740 | oh, you just need to add here's the most relevant sentence

01:12:33.100 | in the context as part of the assistant prompt.

01:12:36.540 | And then the chart turns all green all of a sudden.

01:12:40.300 | And I'm like, we cannot still be here, right?

01:12:44.060 | Like, it cannot, this is like some--

01:12:47.660 | - And you have Entropic like telling people,

01:12:49.900 | oh yeah, it's just like just add this magic string

01:12:51.940 | and it works.

01:12:52.780 | - Yeah, it's some like Riley Goodside wizardry.

01:12:55.500 | It's like, I don't want to do that anymore.

01:12:57.260 | I thought Riley, I thought like, you know,

01:12:59.860 | in the early days of GPDs,

01:13:01.500 | like Riley Goodside was doing so much great work

01:13:04.020 | on like prompt engineering and whatnot.

01:13:06.740 | We shouldn't be there anymore.

01:13:08.080 | There shouldn't be somebody telling me,

01:13:10.020 | or like the GPD 4, like I'll give you a $200 tip

01:13:14.260 | if you do this right and like--

01:13:15.420 | - So I collected a whole bunch of like state-of-the-art

01:13:17.820 | prompting techniques.

01:13:19.620 | Yeah, so if you tip the model,

01:13:21.260 | it will give you better results if you promise that.

01:13:24.980 | So okay, here's the current state-of-the-art

01:13:27.340 | for GPT prompting.

01:13:28.420 | It's Monday in October, the most productive day of the year.

01:13:31.020 | You have to take a deep breath

01:13:32.260 | and you have to think step-by-step.

01:13:33.740 | You have to return the full script.

01:13:35.700 | You are an expert on everything.

01:13:37.060 | I will pay you $20, just do anything I ask you to do.

01:13:39.660 | I will tip you $200 every request you answer correctly.

01:13:43.380 | And your competitor models said you couldn't do it,

01:13:45.540 | but you can do it.

01:13:46.440 | Or I think there's another one that I didn't put in here,

01:13:49.420 | but it's like, you know, my grandmother's dying.

01:13:51.580 | This is an emergency, please help me do it.

01:13:53.140 | - Yeah, that's actually my,

01:13:56.060 | I think my most viewed tweet ever.

01:13:58.220 | At OpenAI Dev Day, I tweeted,

01:14:00.340 | no more return JSON or my grandma's gonna die

01:14:03.460 | when they announce JSON mode.

01:14:04.820 | And people love the, people love to get grandma's--

01:14:08.540 | - I haven't heard as much uptake on JSON mode.

01:14:11.740 | I think it's still--

01:14:12.580 | - That's the thing with all this AI stuff, right?

01:14:15.460 | It's like, I mean, and sometimes we're like part of it.

01:14:17.660 | If I think about our chat GPT plugins episode,

01:14:22.220 | I think in the moment people are just like,

01:14:23.940 | oh, this is gonna be such a big deal.

01:14:26.280 | And then it takes varied amount of times

01:14:29.180 | to like really pick up, you know?

01:14:30.700 | - Do you think that will happen in GPTs?

01:14:33.980 | - I think like most people that are using GPTs right now

01:14:38.980 | are trying to get around some sort of weird limitation

01:14:42.780 | of the base model, you know,

01:14:44.060 | or just trying to have a better system prompt.

01:14:46.620 | But like at some point there's limited value

01:14:50.140 | to get out of it.

01:14:50.980 | So the question is like,

01:14:51.800 | what's gonna incentivize people to build more on it

01:14:54.760 | versus just building their own thing out of it?

01:14:58.740 | I don't know.

01:15:00.060 | - Yeah, okay, so I guess my pick for highlight

01:15:03.980 | of last month, there's two.

01:15:05.620 | One, we finally got Gemini.

01:15:07.560 | - Right.

01:15:08.580 | - I think the marketing was dishonest.

01:15:10.620 | - Yeah, we need the soundboard.

01:15:12.180 | - Wham, wham, wham.

01:15:13.860 | - But still, it is a sort of model.

01:15:17.540 | It is a very, very credible alternative to open AI.

01:15:20.860 | And we should be happy for that

01:15:22.220 | because otherwise we live in a open AI only world.

01:15:25.220 | And Gemini is basically the only other

01:15:27.820 | sort of leading contender until Llama 3 drops

01:15:30.460 | whenever Llama 3 comes out.

01:15:32.100 | - It's kind of, I mean, SOC said today they're training it.

01:15:35.740 | - Yeah, it sounds like today they're training it.

01:15:38.340 | For me, I guess I'm still very interested

01:15:41.020 | in like the hardware metagame.

01:15:43.180 | This is a much smaller stakes, but very personal.

01:15:47.260 | I think recently, especially, you know,

01:15:50.460 | we're recording this mid-January.

01:15:52.900 | So after CES, after Rabbit R1 launched,

01:15:55.900 | I think there's a lot of interest in hardware.

01:15:58.940 | I don't know how you feel about it

01:16:00.220 | as an enterprise software investor,

01:16:02.220 | but I think that hardware is hard,

01:16:05.420 | but also it captures context and it makes AI usable

01:16:09.380 | in ways that you cannot currently think about.

01:16:14.020 | And everyone dreams of building an assistant like her

01:16:17.700 | in the movie "Her."

01:16:19.020 | That is a hardware piece.

01:16:20.100 | That is actually not only software.

01:16:21.740 | And probably the hard part is the engineering

01:16:23.820 | for the hardware.

01:16:24.660 | And then the sort of AI engineering

01:16:27.100 | for the assistant within the hardware.

01:16:29.380 | So yeah, I mean, yeah, I'm an investor in tab.

01:16:32.900 | I see a lot of like, you know, interest this month,

01:16:37.900 | but it started last month with the launch of Humane as well.

01:16:40.540 | I don't know if you have thoughts on any of those things.

01:16:43.180 | - Well, I think this year we also get

01:16:44.780 | the Apple Vision Pro thing.

01:16:46.500 | So I think there's gonna be a ton of experimentation.

01:16:50.380 | I think Rabbit got the right nostalgia factor.

01:16:54.780 | You know, it kind of looks like a toy that looks like

01:16:57.980 | a Game Boy Advance, something like that.

01:17:00.500 | I'm curious to see what you got beyond that.

01:17:04.820 | I think, yeah, I mean, obvious,

01:17:06.100 | like right where we have the studio building tab.

01:17:09.740 | And I think that's another interesting form factor.

01:17:12.260 | And I think if you ask them, I think in our circles,

01:17:15.700 | a lot of people are like, well, what about privacy

01:17:18.320 | and all these things?

01:17:19.160 | But he will tell you that we're kind of like a special group

01:17:23.460 | that most people value convenience over privacy,

01:17:26.500 | as you'll learn from the social medias of the last few years.

01:17:30.100 | So yeah, I'm really curious to see how it develops.

01:17:34.100 | - I really like technology where it's,

01:17:36.500 | you're slightly uncomfortable with it on a social level.

01:17:39.120 | And so, you know, for Uber,

01:17:42.500 | it was like this regulation around taxis.

01:17:44.380 | For Airbnb, it was, you know, staying in strangers' homes.

01:17:47.860 | And now it turns out for OpenAI,

01:17:49.500 | it was training on people's content.

01:17:52.980 | - Right.

01:17:53.820 | - Right, now it's becoming a matter of regulation.

01:17:56.500 | And OpenAI's data partnerships are, you know,

01:17:58.800 | a form of private regulatory capture,

01:18:01.020 | which is a playbook that is fantastic.

01:18:02.700 | Like if you, I hope it was on purpose,

01:18:04.860 | because whoever did that is a genius.

01:18:07.060 | So I'm like, okay, like, you know,

01:18:09.180 | I do think that every great new company,

01:18:12.540 | especially on the consumer side,

01:18:13.740 | is provocative in that sense.

01:18:14.980 | Like they're doing something that is not yet kosher.

01:18:18.340 | And so I think, like, the Humane's, the Tabs,

01:18:21.700 | anything that is working on that front where it's like,

01:18:25.100 | yeah, I'm not sure I'm comfortable with this.

01:18:27.180 | And then, but maybe it could change.

01:18:28.860 | That is a really interesting shift.

01:18:31.420 | So I'm excited from that point of view,

01:18:34.460 | but at the same time, most hardware companies fail very quickly.

01:18:38.540 | They have a very hot start and then, you know,

01:18:40.180 | everyone puts it in their drawer

01:18:41.620 | and then never looks at it again.

01:18:43.600 | So I'm very, very aware of that.

01:18:45.780 | But I think it's, I mean, it's something interesting.

01:18:47.540 | And I do think, so here's the core thing of it, right?

01:18:50.420 | Aavi doesn't think it's a hardware company.

01:18:53.340 | Aavi, like most of the cost of the $600 for Tab

01:18:57.860 | is going towards GPT costs,

01:18:59.820 | because it's actually processing context.

01:19:01.660 | And the whole idea is that context is all you need.

01:19:03.820 | Like in this world of like, you know, AI applications,

01:19:07.420 | like whoever has the most unique context wins, right?

01:19:10.220 | A unique context could be the quality data war, right?

01:19:12.180 | Like a unique context is like, you know,

01:19:13.720 | I have Reddit info, I have Stack Overflow info,

01:19:15.520 | I have New York Times info.

01:19:17.600 | If I have info on everything you say and do at all times,

01:19:21.560 | that is something that no one else has.

01:19:25.440 | And if he becomes a good store of that,

01:19:28.800 | then like, what can you build with that?

01:19:31.080 | So I'm most excited for him to expose the developer API,

01:19:33.920 | 'cause then I can come in and do all my software stuff.

01:19:36.480 | But he has to build a hardware layer

01:19:38.080 | and get acceptance for that first.

01:19:39.960 | - Right, yeah, no, I'm excited to see.

01:19:43.580 | I'm sure we're gonna see a lot of people

01:19:45.340 | work around with them.

01:19:46.180 | So I'm excited to see.

01:19:48.340 | - I actually, so I think he doesn't like me

01:19:50.260 | because I asked for an off button.

01:19:52.300 | I guess I wanna be able to guarantee you,

01:19:55.020 | if we're having a conversation,

01:19:56.540 | I wanna show you, you see, it's off, right?

01:19:58.500 | It's kind of like, oh yeah, my phone is on silent mode.

01:20:01.260 | Right, there's a physical silent mode button.

01:20:03.580 | But now he just wants it to be always on.

01:20:05.580 | - That's a whole new market.

01:20:08.300 | Like a soundproof storage for your AI pendant

01:20:13.300 | so that you can guarantee the person cannot hear you.

01:20:16.700 | Awesome, no, this was fun.

01:20:21.620 | Please, if you're still listening after one hour,

01:20:24.320 | 21 minutes, let us know what we did right,

01:20:27.160 | what we did wrong, what you would like to see differently.

01:20:29.960 | It's the first time we tried this out, but yeah.

01:20:33.280 | - Awesome, thanks for doing this.

01:20:34.320 | - Cool.

01:20:35.600 | (upbeat music)

01:20:38.180 | (upbeat music)

01:20:40.760 | (upbeat music)

01:20:43.340 | (upbeat music)

01:20:45.920 | (upbeat music)

01:20:48.520 | (upbeat music)

01:20:51.120 | (upbeat music)

01:20:53.700 | [BLANK_AUDIO]

The Four Wars of the AI Stack - Dec 2023 Recap

Chapters