
The Four Wars of the AI Stack - Dec 2023 Recap


Chapters

0:00 Intro
1:42 The Four Wars of the AI stack: Data quality, GPU rich vs poor, Multimodality, and RAG/Ops war
3:35 Selection process for the four wars and notable mentions
8:11 The end of low background tokens and the impact on data engineering
10:10 The Quality Data Wars (UGC, licensing, synthetic data, and more)
21:44 The GPU Rich/Poors War
26:29 The math behind Mixtral inference costs
34:27 Transformer alternatives and why they matter
41:33 The Multimodality Wars
45:40 Multiverse vs Metaverse
54:00 The RAG/Ops Wars
60:00 Will frameworks expand up, or will cloud providers expand down?
65:25 Syntax to Semantics
67:56 Outer Loop vs Inner Loop
71:00 Highlight of the month

Whisper Transcript

00:00:00.920 | - Hey everyone, welcome to the Latent Space Podcast.
00:00:03.980 | This is Alessio, partner and CTO
00:00:05.840 | in Residence at Decibel Partners.
00:00:07.400 | And today I'm joined just by my co-host, Swyx,
00:00:10.480 | for a new podcast format.
00:00:12.560 | - Yeah, and it's a bit uncomfortable
00:00:15.360 | because we have to just stare
00:00:16.580 | into each other's eyes lovingly.
00:00:18.880 | But in our end of year survey last year,
00:00:21.680 | a lot of listeners were asking us for more one-on-one time,
00:00:25.140 | more opinions from the both of us as hosts
00:00:27.560 | on what's going on in AI.
00:00:28.800 | You know, both of us are very actively involved.
00:00:30.900 | (laughing)
00:00:32.040 | And I don't think this year will be any different.
00:00:34.560 | This year, there's lots more excitement to come.
00:00:37.360 | And I think, you know, we're trying to grow Latent Space
00:00:42.160 | in terms of the types of formats
00:00:43.680 | and the amount of value that we deliver to our subscribers.
00:00:47.240 | So one thing that we've been trying, experimenting with,
00:00:49.320 | is this monthly recap that I started doing
00:00:51.460 | around August of last year,
00:00:52.840 | where I basically just take the notable news items
00:00:55.920 | of the month and then I sort them
00:00:58.200 | and categorize them according to some order that makes sense
00:01:01.880 | and write them down in the newsletter.
00:01:04.340 | And this last December recap was particularly exciting
00:01:08.960 | 'cause it seemed like it popped off in a number of areas,
00:01:11.920 | particularly with the AI breakdown.
00:01:13.560 | Our friend NLW featured it on his podcast.
00:01:16.640 | And I figured we can just kind of go over that
00:01:18.680 | as a way of setting the stage for 2024,
00:01:22.280 | but also recapping what happens in 2023.
00:01:25.280 | - Yeah, and people always ask me
00:01:27.560 | if December is like a slow month,
00:01:29.840 | but I think you almost broke Substack
00:01:32.120 | with how many links we had in the thing.
00:01:34.440 | - No, we actually did.
00:01:35.260 | So a lot of people commented to me
00:01:37.700 | about the formatting issues
00:01:39.760 | within the newsletter that I sent out.
00:01:41.320 | And I know that they are there, but I couldn't fix it
00:01:43.400 | because Substack was broken by us with how long it was.
00:01:46.780 | - But so we had this kind of like four main buckets
00:01:51.840 | called the four words of the AI stack,
00:01:55.400 | data quality, and I guess like data quantity
00:01:58.840 | as well, in a way.
00:02:00.640 | The GPU rich versus poor,
00:02:02.440 | which we have a whole episode about with Dylan Patel.
00:02:05.720 | Multimodality, we're actually recording tomorrow
00:02:08.720 | with LumaLabs about their new 3D model.
00:02:11.000 | So we went from text to image to 3D video.
00:02:15.400 | I wonder what's next.
00:02:17.120 | - And we're gonna release the Hugging Face episode as well.
00:02:19.160 | 'Cause I guess I've been thinking
00:02:20.480 | about calling it multimodality 101,
00:02:22.520 | because the first modality beyond text
00:02:24.920 | that you should really pay attention to is vision.
00:02:27.120 | - Right, yeah.
00:02:28.260 | Yeah, and then the rag ops war.
00:02:31.520 | I think that's a--
00:02:33.120 | - I don't know what to call it.
00:02:33.960 | I don't know if you want to call it anything else.
00:02:36.040 | This is my--
00:02:36.880 | - I don't know.
00:02:37.960 | But I think beginning of last year,
00:02:39.960 | that was like kind of the hottest space
00:02:41.440 | because there wasn't much open source model work.
00:02:43.560 | And I think over the last maybe like four or five months,
00:02:47.160 | everybody's so focused on fine-tuning Llama 2
00:02:50.220 | and like DPO to improve these models,
00:02:53.480 | Mixtral, and all these things.
00:02:54.880 | And people forgot about our friends
00:02:56.800 | at LangChain, LlamaIndex,
00:02:58.640 | and some of the things that were maybe top of mind.
00:03:01.820 | VectorDBs, it seemed like everybody
00:03:04.080 | was releasing a VectorDB early in the year.
00:03:06.120 | - Yeah, I think that I'll be very surprised
00:03:08.840 | if any new VectorDBs come out this year.
00:03:12.080 | With one exception, which is something
00:03:13.520 | I'm keeping an eye on, which is Turbo Puffer.
00:03:16.560 | I don't know if you've seen them going around.
00:03:19.120 | Yeah, all the smart people seem to be adopting Turbo Puffer
00:03:22.520 | as the first serverless VectorDB,
00:03:25.160 | which could be interesting.
00:03:26.080 | - Yeah, no, and we're going to have definitely Jeff
00:03:29.240 | and Antoine on the podcast at some point.
00:03:31.960 | I know they're going to be fun, I guess, but...
00:03:35.960 | - I should also mention, I think it's interesting.
00:03:38.400 | So the reason I selected these four wars
00:03:41.640 | was a process of elimination of wars
00:03:44.920 | that I think ended up not mattering.
00:03:47.440 | So for those who don't know, inside of my writing,
00:03:52.120 | I often include footnotes that are in themselves
00:03:55.840 | just essays in the footnotes.
00:03:58.640 | And so I think it's also notable,
00:04:00.320 | the things that people thought were hot,
00:04:02.380 | that were less hot than expected.
00:04:05.240 | So it was agents, definitely less hot
00:04:08.400 | than at the start of 2023.
00:04:12.240 | And then this one is a very controversial,
00:04:14.720 | non-selection by me, I think.
00:04:15.960 | Open-source AI is not a battle in the sense that
00:04:19.560 | I don't think there's anyone against open-source AI.
00:04:22.920 | Everyone is on one side.
00:04:24.800 | There's no opposing side apart from regulators.
00:04:27.960 | But in my mind, when I think about for engineers,
00:04:31.560 | engineers are all universally in favor
00:04:34.240 | of open-source models.
00:04:35.560 | So there's no battle here.
00:04:36.800 | Everyone just wants it to improve.
00:04:38.040 | So it's not interesting to write about.
00:04:40.240 | We just want more open-source.
00:04:41.560 | - Yeah.
00:04:42.560 | The only battle is people offering inference on it.
00:04:45.600 | - Yes.
00:04:46.440 | - Killing each other in the process.
00:04:47.520 | - Yeah, so I classified that as a GPU rich versus poor war.
00:04:51.840 | But maybe there's a better way to classify that.
00:04:54.240 | And you can give me some feedback on that
00:04:56.400 | because it's a struggle to try to categorize the world.
00:05:01.400 | Code models as well.
00:05:04.040 | I was very struck by a conversation I had with Poolside.
00:05:07.960 | Eiso Kant from Poolside.
00:05:10.200 | So they haven't been on the podcast yet.
00:05:12.000 | They're kind of stealth still,
00:05:14.200 | but they had a very, very notable fundraise.
00:05:15.920 | I think they had like $50 million raised.
00:05:18.440 | - I think even more, yeah.
00:05:19.480 | - For a seed.
00:05:20.960 | Spending most of it on GPUs.
00:05:23.040 | And my conversation with Eiso, he was like,
00:05:25.960 | "Hey, you know, like Repl.it
00:05:28.080 | "was like one of our podcast's early biggest winners."
00:05:32.440 | Repl.it didn't really follow up with.
00:05:33.800 | Like they announced like their 1.5 model,
00:05:37.360 | but it's not really widely used beyond Repl.it.
00:05:40.240 | There's Starcoder, there is Code Llama,
00:05:43.800 | but like it's not really, for how important code is,
00:05:47.160 | it doesn't seem like as big of a battlefront
00:05:50.760 | as just general function calling, reasoning,
00:05:53.680 | these other kinds of domains.
00:05:56.760 | And so I thought it was just interesting to note
00:05:59.840 | that even though we as a podcast
00:06:02.840 | try to pay particular attention to developer tooling,
00:06:05.880 | to code models, we interviewed Cursor,
00:06:07.960 | Phind, Repl.it, Codium, and Hugging Face.
00:06:11.200 | These all seem like very small
00:06:15.760 | compared to the amount of money being thrown,
00:06:20.200 | the amount of heat in the other domains.
00:06:23.520 | And I don't know why that is.
00:06:25.200 | - Yeah, I think it's maybe the fragmentation of the tooling.
00:06:29.320 | Like most people in code are using VSCode, Cursor, GitHub,
00:06:35.160 | one of the three,
00:06:36.000 | so there's maybe not as much experimentation
00:06:38.000 | versus with text, people are just trying everything.
00:06:41.800 | It's hard to try a code model.
00:06:44.520 | I see code models being released,
00:06:45.880 | but it's not super easy to just plug it into your workflow.
00:06:49.080 | So I think engineers like myself are just lazy.
00:06:52.080 | And it's like, hey, I'm having great success
00:06:54.120 | with whatever I'm using.
00:06:55.880 | I don't really wanna go there.
00:06:57.540 | - Special case form of code is SQL
00:07:01.120 | and the semantic layer data engineering type things.
00:07:04.600 | We also had two guests on there from Seek and Cube.
00:07:09.000 | And we also talked to a bit of Databricks, a bit of Julius.
00:07:11.520 | - Yeah, and we have Brian from Hex.
00:07:13.200 | - And Brian from Hex.
00:07:14.920 | Does he count?
00:07:15.760 | I don't know.
00:07:16.600 | - Yeah, no.
00:07:17.420 | - Yeah, yeah, yeah.
00:07:18.260 | I guess the Hex notebooks, yes.
00:07:19.680 | Hex magic, yes.
00:07:21.680 | RecSys is a different beast.
00:07:24.180 | Anyway, but yeah, I think people who come
00:07:31.640 | to AI engineering for the AI
00:07:34.920 | might actually end up finding themselves
00:07:36.460 | in data engineering in the end.
00:07:38.240 | And in traditional ML engineering in the end,
00:07:41.320 | they might have to discover that they're doing RecSys.
00:07:44.140 | And all the stuff that gets swept under a rug in a demo
00:07:49.800 | becomes their job.
00:07:51.060 | And I think I'll probably say just because we didn't select
00:07:55.600 | a theme for last year doesn't mean it wasn't important.
00:07:58.640 | It just wasn't top of mind yet.
00:08:01.040 | And maybe I think that would be an emerging theme this year.
00:08:04.160 | - Yeah, I think that's kind of the consequence
00:08:06.640 | of the low background tokens,
00:08:08.640 | like the end of the low background tokens.
00:08:10.320 | Once--
00:08:11.160 | - Can you explain what you think
00:08:12.360 | are low background tokens?
00:08:13.640 | This was our November recap.
00:08:14.720 | - Yeah, well, the comparison
00:08:17.640 | that our friend Jeff Huber at Chroma brought up
00:08:20.680 | is steel before the atomic bomb creation.
00:08:24.520 | So steel from before had no radiation in it.
00:08:26.840 | After all the testing,
00:08:27.800 | a lot of steel had radiation embedded in it.
00:08:30.120 | So it was really precious to get low background steel,
00:08:34.520 | meaning with no radiation and same with tokens.
00:08:37.200 | You can assume that any internet content
00:08:40.720 | from three years ago, it's just internet.
00:08:43.380 | It doesn't have, it's like people writing,
00:08:45.200 | it's not models writing.
00:08:46.460 | Instead now, anything we're gonna get on Common Crawl,
00:08:49.760 | updates and things like that,
00:08:50.840 | you never know if it's human written or not.
00:08:53.560 | And I think that will put more work on data engineering.
00:08:55.880 | Because even basic stuff like checking
00:08:58.740 | if a text says, "as a model created by OpenAI,"
00:09:01.840 | is gonna be important.
00:09:04.080 | So people have just been blindly taking
00:09:06.800 | all the data sets offered by Eleuther and Common Crawl
00:09:10.320 | and all these different things,
00:09:11.560 | assuming that all the data in it is good.
00:09:14.000 | I think now, how do you build on top of it?
00:09:16.320 | And we've seen the New York Times lawsuit against OpenAI.
00:09:20.080 | We've seen data partnerships starting to rise
00:09:23.280 | in different companies.
00:09:24.920 | I think that's gonna be one of the bigger challenges
00:09:27.520 | and maybe we'll see more of the work
00:09:30.360 | that Databricks has done
00:09:31.400 | to build the Dolly 15k instruction tuning,
00:09:34.680 | just first party creation of data.
00:09:37.520 | It's like, you got people sitting at their desk every day.
00:09:40.520 | If everybody wrote five Q&A pairs or things like that,
00:09:44.720 | you would have a massive unique data set for your model.
00:09:48.400 | So, yeah.
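
As a rough illustration of the contamination check described above, here is a minimal sketch that drops documents containing telltale model-generated boilerplate; the phrase list and the example documents are illustrative assumptions, not part of any real pipeline.

```python
# Minimal sketch: filter out documents that are obviously model-generated,
# the simplest version of the "low background tokens" check discussed above.
# The phrase list and example documents are illustrative assumptions.

LLM_TELLS = [
    "as an ai language model",
    "as a model created by openai",
    "i'm sorry, but as an ai",
]

def looks_human_written(text: str) -> bool:
    """Return False if the text contains an obvious model-generated tell."""
    lowered = text.lower()
    return not any(tell in lowered for tell in LLM_TELLS)

docs = [
    "Steel forged before 1945 carries almost no background radiation.",
    "As a model created by OpenAI, I cannot browse the internet.",
]

clean = [d for d in docs if looks_human_written(d)]
print(f"kept {len(clean)} of {len(docs)} documents")  # kept 1 of 2 documents
```

In practice this kind of check is only the first pass before deduplication and quality scoring, but it captures the idea that post-2022 web scrapes need active filtering.
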
00:09:50.480 | - Yeah, for people who missed that episode,
00:09:52.440 | that was one of our early episodes as well.
00:09:54.400 | And Mike Conover has since left to start BrightWave,
00:09:58.440 | and I'm sure we'll have him back this year.
00:10:00.280 | - Yeah, they're doing a lot of interesting stuff.
00:10:02.080 | I think the next episode will be very cool.
00:10:05.240 | - Awesome.
00:10:06.160 | So how do you want to tackle this?
00:10:07.040 | Do you want to just kind of go through the four wars?
00:10:09.440 | - Yeah, let's do it.
00:10:10.500 | You have, you created this Wikipedia-like infographic
00:10:16.680 | for each of them.
00:10:18.200 | - Yeah, I should say, the inspiration for this
00:10:20.200 | actually was during the Sam Altman leadership battle,
00:10:25.200 | people were making mock Wikipedia entries for the debate
00:10:32.920 | and for like who's on the side of the decels
00:10:35.640 | and who was on the side of the e/accs.
00:10:37.640 | And so I like that format because it's very concise.
00:10:41.320 | It has the list of key players and it's kind of fun
00:10:45.040 | to think about like who's on what side
00:10:46.720 | and think about what is important
00:10:49.160 | and what people are battling over.
00:10:51.000 | I think it is important to focus on key battlegrounds
00:10:55.140 | as a concept because there's so many interesting things
00:10:58.480 | you could be talking about in AI
00:10:59.760 | and they're not all equally interesting.
00:11:01.640 | So how do you decide what is interesting?
00:11:05.320 | I think it's money, it's power, it's people,
00:11:08.800 | it's like impact, that kind of stuff.
00:11:11.740 | And so, yeah, that's what I ended up doing.
00:11:14.560 | And so fun fact, the way I did this
00:11:16.800 | was I actually edited the HTML on Wikipedia
00:11:19.240 | and then I just screenshotted it just to get the formatting.
00:11:22.320 | - Good old developer tools.
00:11:24.240 | Developer tools is all you need.
00:11:25.840 | So the data war, belligerents.
00:11:31.520 | On one side you have journalists, writers, artists,
00:11:34.320 | on the other side you have researchers, startups,
00:11:37.180 | synthetic data researchers.
00:11:39.000 | I guess like maybe we wanna talk about
00:11:43.400 | what are the axis of war.
00:11:45.800 | So like one of them is attribution, right?
00:11:48.520 | Like I think there's a varying spectrum
00:11:52.560 | of how comfortable people are about this data
00:11:54.920 | going into a model.
00:11:55.760 | So some people are happy to have your model trained on it,
00:11:59.800 | no matter what.
00:12:00.760 | Some people are happy to have your model trained on it
00:12:02.800 | as long as you disclose that it's in the model.
00:12:05.980 | Some people just hate that you trained on their data
00:12:09.920 | and some people like the New York Times
00:12:12.360 | wants you to destroy any artifact
00:12:14.360 | that might have touched your article.
00:12:16.160 | So that's kind of what we're fighting on.
00:12:19.200 | It's not always, I just wanna make it clear
00:12:21.460 | that it's not just like you should never use the data
00:12:23.920 | or you should always use the data.
00:12:25.320 | I think people are just trying to figure out
00:12:28.040 | what's the right form of attribution
00:12:30.000 | and how do I get paid as somebody
00:12:32.760 | whose data ended up being in this training.
00:12:35.400 | I think we're giving everybody a lot of great tokens
00:12:38.160 | at Latent Space because we do full transcripts on everything
00:12:41.040 | and we're happy for people to train models on them.
00:12:44.440 | - Oh yeah, please train a latent space model.
00:12:46.880 | - Yeah, we would love it.
00:12:48.080 | So that's kind of what we're fighting on.
00:12:51.120 | Anything that people should keep in mind about this war
00:12:54.680 | and like maybe some of the campaigns that are going on?
00:12:59.160 | - So I think the New York Times one
00:13:01.160 | is probably going to go to Supreme Court.
00:13:03.200 | It is very, very critical.
00:13:05.760 | It is a landmark
00:13:10.800 | war that will probably decide what fair use means
00:13:13.880 | in context of AI.
00:13:16.040 | So I think it's, and I recommend,
00:13:19.280 | I think The Verge did a good analysis of this.
00:13:22.240 | Platformer maybe did a good analysis of this.
00:13:25.080 | There are like four criteria for what fair use is
00:13:27.640 | and everyone basically converges onto the last criteria
00:13:32.440 | which is does your use, does your transformative use
00:13:35.560 | of my copyrighted material diminish the market
00:13:40.360 | for my content?
00:13:41.480 | And it's very hard to say.
00:13:45.600 | I always suspect that yes, in some capacity,
00:13:49.400 | in some amount, but good luck proving that
00:13:52.260 | in a court of law.
00:13:53.160 | And I think a negative ruling on open AI
00:13:59.080 | would seriously stall the progress of AI.
00:14:03.700 | And that's bad for humanity,
00:14:08.040 | but good for content creators and writers
00:14:10.200 | so obviously which we want them to be adequately compensated
00:14:14.000 | and recognized for their work.
00:14:16.320 | So there's like no good, there's like no easy outcome here
00:14:19.740 | apart from the existing copyright system
00:14:21.740 | which is also somewhat broken.
00:14:24.200 | And it's just a very, very tricky,
00:14:26.840 | challenging case, I think.
00:14:30.580 | Yeah, so.
00:14:31.720 | - It's funny because we had something,
00:14:33.360 | I was a community moderator at a website called Rap Genius
00:14:36.760 | which was a lyrics annotation site.
00:14:38.780 | And there was like a similar thing in maybe like 2014
00:14:41.760 | or like the music labels basically came to the website
00:14:45.160 | and it's like, hey, this is not fair use.
00:14:47.400 | Like you can not reuse the lyrics to the song
00:14:49.760 | and eventually the website made deals
00:14:52.600 | with the record labels to like be able to do this.
00:14:57.120 | And then Google was stealing the transcripts
00:15:00.020 | to put in like the enhanced thing.
00:15:02.640 | - And they proved it by.
00:15:04.080 | - Yeah, yeah, we did all the, like,
00:15:05.760 | basically like the thing with the i's,
00:15:07.240 | some i's we put the dots, some i's we put like the accent
00:15:10.580 | and that's how it made it all better.
00:15:12.640 | - I thought they just varied the spacing
00:15:14.920 | or they like use a different kind of spacing
00:15:16.960 | in the Unicode.
00:15:17.800 | - I think it was the eye thing,
00:15:19.440 | but maybe, I mean, this is like almost 10 years ago.
00:15:21.720 | - So Rap Genius proved it by injecting some data poison
00:15:24.680 | into their corpus and then Google reproduced it faithfully.
00:15:28.560 | So therefore they proved that Google is scraping Rap Genius.
00:15:32.200 | Did Google have to pay Rap Genius money in the end?
00:15:35.360 | - I don't think so.
00:15:37.080 | - But at the same, there was also another issue
00:15:39.200 | with Rap Genius that we had that got blacklisted by Google
00:15:42.720 | for like, there was like a lot going on.
00:15:45.440 | - Of course.
00:15:46.400 | - But anyway, this is not a Rap Genius special.
00:15:48.840 | - Yeah, I mean, ultimately,
00:15:49.800 | like I think that we do need quality data.
00:15:52.700 | I think that then if this case is contained
00:15:55.400 | to the New York Times, the New York Times' worst outcome
00:15:58.400 | is that OpenAI will substitute them with the Washington Post
00:16:01.400 | or substitute them with The Economist
00:16:02.720 | or like the second or third ranked newspaper
00:16:05.720 | that is the most friendly to AI.
00:16:07.680 | And then the New York Times will realize
00:16:08.960 | that actually their words are not as,
00:16:11.960 | not that much more valuable than other words.
00:16:14.460 | And then the value of the content comes down
00:16:17.520 | very, very dramatically.
00:16:19.600 | So I think it will be interesting,
00:16:21.780 | but yeah, I do think it's overstepping their bounds
00:16:24.240 | to call for the destruction of all GPTs.
00:16:26.560 | That's probably for sure.
00:16:28.280 | Then the bigger problem I have
00:16:30.220 | is with Stack Overflow and Reddit,
00:16:31.640 | which I named as on the side of the New York Times.
00:16:35.780 | They have effectively shut down their APIs
00:16:38.260 | in order to try to train their own models.
00:16:41.780 | Probably same as Twitter, actually.
00:16:43.940 | I should probably have put Twitter,
00:16:45.340 | I put Twitter on the wrong side, maybe.
00:16:46.540 | I don't know, Twitter is on both sides.
00:16:48.780 | - Elon is on every side, the side of chaos.
00:16:51.540 | - Yeah, what this is,
00:16:52.540 | is basically every UGC, user-generated content, company
00:16:56.340 | of the 2000 and 2010s,
00:16:58.740 | now has a giant pile of user content
00:17:01.620 | that becomes valuable data
00:17:05.180 | that used to be open for researchers to scrape
00:17:08.140 | and train models.
00:17:09.340 | Now all of them are locking in their walls, right?
00:17:11.900 | Behind their walled gardens
00:17:13.500 | and then trying to train their own models
00:17:14.820 | to boost their benefits.
00:17:17.020 | So this is a locally optimal outcome for them,
00:17:19.540 | but a globally suboptimal outcome for humanity.
00:17:22.320 | Because why should we care
00:17:24.100 | about the closed garden of Reddit?
00:17:26.180 | The Reddit model, the Stack Overflow model,
00:17:29.620 | the X model, as opposed to it being a part of a data mix
00:17:34.020 | of 20% Reddit, 20% Stack Overflow, 20% X.
00:17:38.820 | That seems like a much better outcome for the world,
00:17:42.020 | but everyone is acting in their very narrow self-interest
00:17:45.180 | in trying to make their own model,
00:17:47.200 | which is probably going to suck.
00:17:49.300 | - Right.
00:17:50.140 | (laughs)
00:17:52.220 | So next war, after you get data--
00:17:54.740 | - Oh, we should mention synthetic data.
00:17:57.320 | - Oh, yeah.
00:17:58.160 | So what happens when you run out of human data?
00:18:01.580 | You make your own.
00:18:02.420 | (laughs)
00:18:04.500 | So I would say that is, when I went to NeurIPS,
00:18:09.060 | that was the number one discussion
00:18:10.440 | out of every single researcher's mouth.
00:18:13.020 | There is a lot of research coming from both, I guess,
00:18:17.420 | the big labs as well as the academic labs
00:18:22.420 | on what good synthetic data looks like.
00:18:25.420 | I don't know if you've talked to any startups around that.
00:18:27.700 | I just talked to Louis Castricato the other day,
00:18:30.820 | and he is promising a very, very interesting approach
00:18:35.820 | to synthetic data generation.
00:18:37.580 | I think his phrase for it
00:18:40.140 | is pre-trained-scale synthetic data,
00:18:42.500 | as opposed to what Nous Research
00:18:44.940 | and the other open-source communities have been doing,
00:18:46.900 | which is fine-tuned-scale synthetic data.
00:18:50.340 | And so he wants to create trillion-token datasets
00:18:53.880 | that are all synthetic.
00:18:55.360 | And I'm like, okay, that's interesting,
00:18:57.180 | but also at the same time,
00:18:58.600 | these are all just downloads from GPT-4 or something else.
00:19:03.600 | So Luis is very aware of that, and he has a way around it.
00:19:09.220 | I don't really understand it,
00:19:10.100 | but he claims that that's a good way around it.
00:19:14.100 | Andrej Karpathy at NeurIPS
00:19:16.860 | highlighted this paper from DeepMind
00:19:18.340 | where they were bootstrapping synthetic data
00:19:22.660 | that could be verifiably proven correct.
00:19:26.640 | So specifically in math and in code,
00:19:29.620 | where there is a correct answer.
00:19:31.100 | So yeah, that makes sense.
00:19:33.260 | You can solve the synthetic data problem that way,
00:19:35.780 | but what about beyond that?
00:19:37.660 | There's just no answer.
00:19:39.620 | - And wasn't part of the issue also
00:19:42.220 | that the way that the phrases are constructed
00:19:45.380 | and all of that in synthetic data
00:19:46.980 | ends up making mode collapse even worse?
00:19:51.140 | Because one thing is right or wrong, right?
00:19:53.380 | The other thing is every sample is read in the same way,
00:19:58.140 | or as a similar, since it comes from a certain model,
00:20:01.920 | kind of as a similar root of structure.
00:20:04.480 | - You already have, yeah.
00:20:06.200 | So I mentioned this in the best papers discussion
00:20:09.400 | with Jonathan Frankle.
00:20:10.400 | So the basic argument is you already have
00:20:13.380 | a flawed distribution from a language model.
00:20:17.280 | You are resampling that flawed distribution
00:20:19.240 | to double down on that flawed distribution.
00:20:22.040 | There's no extra information from humans.
00:20:23.980 | So on principle, how can this work?
00:20:25.920 | And so the only conclusion there
00:20:29.020 | is you don't need it to emulate a human.
00:20:31.420 | You need it to emulate a useful assistant,
00:20:33.600 | however you define it.
00:20:34.800 | So I think that the goal of synthetic data
00:20:38.140 | is less to emulate human speech,
00:20:41.060 | because that is basically solved.
00:20:43.260 | It is now more to spike the distribution in useful ways.
00:20:47.940 | And that's a phrase I borrowed from Kanjun.
00:20:49.540 | But anyway, so I think that synthetic data
00:20:51.380 | will be a giant theme for this year,
00:20:53.380 | and not least because the human data
00:20:56.140 | is being locked up behind walls.
00:20:58.460 | So it's a very, very clear trend.
00:21:00.500 | This is probably the most amount of money
00:21:03.820 | after GPUs will be spent here on data.
00:21:06.540 | So one war I did not put here was the talent war, right?
00:21:09.300 | Like the war for PhDs and smart people.
00:21:11.700 | But when you break down what the talent people do,
00:21:16.020 | one is they make models and they run inference on GPUs.
00:21:21.260 | Or they run training runs on GPUs.
00:21:24.460 | But the other is they clean data.
00:21:26.060 | They find data, clean data, and format data.
00:21:28.860 | And so yeah, these are all just proxies
00:21:30.740 | for the kind of talent that is flowing back and forth.
00:21:33.460 | And ultimately, I think you have to focus
00:21:35.500 | on what they're working on,
00:21:37.540 | the visible output of what they're working on, which is data.
00:21:41.260 | - All right, let's talk about the GPU inference war.
00:21:44.720 | I think this is one that has been heating up.
00:21:47.100 | And we actually have a bunch of these folks
00:21:49.780 | coming on the podcast in the next few days.
00:21:51.940 | - Yeah, yeah, yeah.
00:21:52.780 | Are we calling it compute month?
00:21:54.020 | - Yeah, we can figure out a name,
00:21:55.620 | but we have Modal, Together, Replicate.
00:21:59.380 | There's a lot coming up.
00:22:01.940 | But basically, the Mixtral release, the MoE model,
00:22:05.620 | was kind of the spark of the war.
00:22:09.060 | I think the price went down like 90% in one week.
00:22:13.260 | - Yeah, I wrote two, two, two times.
00:22:15.500 | But yeah, one divided by two, two, two
00:22:16.900 | is whatever the price is.
00:22:20.260 | - Yeah, and then there was the benchmark drama
00:22:22.780 | between Together and AnyScale,
00:22:25.100 | on whether or not which one was faster,
00:22:27.540 | and whether or not the benchmark
00:22:29.140 | was really reflective of performance.
00:22:31.380 | - Yeah, and this was very surprisingly ugly,
00:22:36.300 | in a way that I think usually people
00:22:39.220 | try to respect each other's work,
00:22:40.540 | and play nice, and say nice things
00:22:41.780 | when people release stuff.
00:22:42.660 | Even if it's a competitor, you say nice things,
00:22:44.100 | or you don't say anything at all.
00:22:46.500 | AnyScale, for some reason,
00:22:49.020 | they released a benchmark on which,
00:22:52.540 | of course, AnyScale looks the best.
00:22:54.340 | (laughs)
00:22:55.860 | Why would you release a benchmark
00:22:56.940 | where you don't look the best?
00:22:58.260 | But then, basically, everyone featured
00:23:00.340 | in that benchmark didn't like it, of course.
00:23:03.180 | I do think there's some methodological things.
00:23:05.420 | So for anyone doing benchmarks,
00:23:06.940 | you have to understand that there's a real, real,
00:23:09.180 | real difference between a public benchmark
00:23:11.820 | that is meant for just limited testing,
00:23:14.780 | compared to, okay, if you're load testing us,
00:23:17.220 | or if you're seeing what a real
00:23:19.020 | enterprise customer would see,
00:23:20.100 | you have to give them a heads up.
00:23:22.220 | You have to get a different API key,
00:23:23.420 | a different endpoint, and you test
00:23:24.900 | the real infrastructure, not the demo one.
00:23:27.260 | This is very common for infra companies,
00:23:29.180 | and I think AnyScale just neglected that,
00:23:32.260 | and it hurt their credibility.
00:23:34.460 | AnyScale is not new at this game.
00:23:36.060 | They should have done that.
00:23:37.580 | But what was interesting was this benchmark drama
00:23:39.900 | reached even beyond AnyScale.
00:23:42.020 | We're gonna have Soumith on,
00:23:42.900 | and he's gonna talk about why he weighed in,
00:23:45.380 | 'cause Soumith doesn't represent any inference provider.
00:23:47.220 | He just works at Meta.
00:23:48.420 | But he felt like this was a very interesting debate.
00:23:53.420 | And I think we'll see more of this.
00:23:55.300 | You have been a data investor for a while.
00:23:58.060 | Database companies always do this.
00:24:00.220 | And I think now we're just seeing
00:24:02.020 | this kind of fight come into the inference space.
00:24:04.900 | - Yeah, yeah, and I think the hardest thing
00:24:07.340 | is the end customer can now replicate it.
00:24:11.340 | So if you give me a Postgres benchmark,
00:24:14.060 | I can run Postgres on my MacBook and run similar ones.
00:24:18.340 | I think with models, it's just impossible.
00:24:20.060 | So people tell you, "This is the benchmark,"
00:24:22.700 | and you're like, "Okay, I have to go sign up
00:24:24.580 | "to every single cloud now to try it."
00:24:28.020 | It's just not easy.
00:24:30.260 | And we talked about this in Benchmarks 101,
00:24:32.180 | which is the same with model benchmarks, right?
00:24:35.140 | Just like, "Oh, this model is so much better than this."
00:24:37.180 | And then it's like, "Did you train on the questions?"
00:24:40.020 | And it's like, "What?
00:24:41.420 | "Oh, I don't know."
00:24:42.740 | So, and again, it's hard for people
00:24:44.700 | to just run the models and test them.
00:24:48.060 | So there's a lot more weight, I think,
00:24:50.340 | in AI on benchmarks than there is in traditional software,
00:24:53.100 | because nobody buys Upstash or Redis Cloud or whatever
00:24:57.500 | just based on a benchmark.
00:24:58.780 | They try them and check performance and whatnot
00:25:01.420 | because they have real production-scale workloads.
00:25:04.540 | Here, it's like nobody's really doing anything
00:25:06.260 | with these models.
00:25:07.100 | So it's like whatever any skill says, I guess, is good,
00:25:10.420 | but then customers are gonna go try it
00:25:12.420 | and just decide for them what the right thing is.
00:25:16.620 | - Yeah, yeah.
00:25:17.460 | And I think it's important to understand
00:25:20.060 | it is not just about cost.
00:25:21.740 | I think what the price war represented
00:25:23.900 | was a race to the bottom on cost.
00:25:25.780 | And you're like, "Okay, Deep Infra,"
00:25:28.380 | which is a company, we're not,
00:25:30.220 | the name of the company is Deep Infra,
00:25:31.780 | "Deep Infra has promised to just always
00:25:33.180 | "be the lowest cost provider."
00:25:34.900 | Okay, fine, that's a good value proposition,
00:25:37.300 | but you're not only optimizing for that
00:25:40.180 | in a production application.
00:25:41.500 | You're optimizing for latency.
00:25:43.020 | That's one thing.
00:25:43.860 | You're optimizing for uptime.
00:25:45.460 | That's something that you can only earn over time.
00:25:47.760 | You're optimizing for throughput
00:25:49.900 | and other forms of reliability.
00:25:52.060 | It starts to tail off beyond that,
00:25:53.540 | but there's three or four dimensions
00:25:55.000 | that really, really matter.
00:25:56.460 | If you're not table-stakes on any of those things,
00:25:58.300 | you're out.
00:25:59.140 | You're just out.
00:25:59.980 | So actually, there was a really good website
00:26:03.820 | that was released just this week
00:26:06.380 | called Artificial Analysis, did you see it?
00:26:08.740 | Yeah, so this is what the industry needs,
00:26:11.380 | which is an independent third-party benchmark
00:26:13.940 | pinging the production API endpoints of all the providers
00:26:17.780 | and giving a third-party analysis of what this is.
00:26:21.620 | I actually built a prototype of this last year.
00:26:23.340 | - Yeah, I was gonna say.
00:26:24.180 | - But I didn't like maintaining it.
00:26:27.460 | (laughing)
00:26:29.700 | I'm glad someone else is doing it
00:26:32.260 | just because I don't want to keep up with all these things.
00:26:36.420 | But still, I think it's a public service
00:26:39.180 | that somebody should do, so I'm glad that they did it.
00:26:42.020 | I think they did it very well.
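
For a sense of what an independent probe like this involves, here is a minimal sketch that times a single completion against an OpenAI-compatible chat endpoint and estimates output tokens per second; the URL, model id, and API key variable are placeholder assumptions, and this is not how Artificial Analysis actually implements its benchmark.

```python
# Sketch of a third-party latency/throughput probe against a hosted model.
# ENDPOINT, MODEL, and PROVIDER_API_KEY are placeholder assumptions; any
# OpenAI-compatible chat completions endpoint should respond to this shape.
import os
import time
import requests

ENDPOINT = "https://example-inference-provider.com/v1/chat/completions"  # placeholder
MODEL = "mixtral-8x7b-instruct"  # placeholder model id

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Write one sentence about llamas."}],
    "max_tokens": 128,
}
headers = {"Authorization": f"Bearer {os.environ.get('PROVIDER_API_KEY', '')}"}

start = time.time()
resp = requests.post(ENDPOINT, json=payload, headers=headers, timeout=60)
elapsed = time.time() - start

completion_tokens = resp.json().get("usage", {}).get("completion_tokens", 0)
print(f"total latency: {elapsed:.2f}s")
if elapsed > 0:
    # Crude estimate: includes time-to-first-token, so it understates raw decode speed.
    print(f"~{completion_tokens / elapsed:.1f} output tokens/s")
```

The methodological point from the Anyscale drama applies here too: a probe like this hits whatever public endpoint you give it, which is not the same as load-testing a provider's real production infrastructure.
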
00:26:44.220 | So yeah, I think that is where, I guess,
00:26:48.300 | the inference drama is ending for now.
00:26:51.820 | I don't think, I haven't seen any continuing debate there.
00:26:55.580 | The only other thing that,
00:26:57.020 | I did some extra work on this for the recap,
00:26:59.780 | which is, are they losing money?
00:27:01.380 | Are they pricing their Mixtral tokens correctly?
00:27:07.060 | And I actually managed to dig into Dylan Patel's
00:27:11.100 | write-up of the Mixtral price war.
00:27:13.520 | And I think I reasonably worked out
00:27:16.840 | that you can serve Mixtral,
00:27:18.820 | and the lowest you can possibly charge
00:27:20.700 | if you take the most aggressive amortization
00:27:23.300 | of all your CAPEX and all that,
00:27:25.680 | is 50 to 75 cents per million tokens,
00:27:28.900 | which is what Perplexity prices their Mixtral at.
00:27:31.580 | And Perplexity is a very smart player.
00:27:34.020 | They're not even an inference infra provider.
00:27:37.340 | They're just doing this for fun.
00:27:38.940 | But they're like, "Yeah, we don't want
00:27:41.940 | "to lose money on this.
00:27:42.780 | "We will provide it at cost.
00:27:44.080 | "This is what cost is to us."
00:27:46.380 | So that means, so Perplexity provides it
00:27:49.800 | at 56 cents per million output tokens.
00:27:52.620 | That means AnyScale, which is 50 cents,
00:27:54.460 | OctoAI, 50 cents, AbacusAI, 30 cents,
00:27:57.420 | and DeepInfra, 27 cents, they're all losing money.
00:28:00.240 | Because we think that the break-even is 51 cents.
00:28:03.140 | - And that's, and even that is like
00:28:06.820 | a full batch size and kind of max.
00:28:09.760 | - No, no, no.
00:28:10.600 | I assume-- - Max utilization.
00:28:11.540 | - I assume 50% utilization.
00:28:13.460 | So like, if you talk to practitioners,
00:28:16.260 | very, very good is 60%.
00:28:19.060 | Average is like 30, 40.
00:28:20.520 | So I just, I say 50, right?
00:28:22.380 | You assume 50%, batch size 16,
00:28:25.300 | 100 tokens per second generation.
00:28:27.300 | That's also very, very high.
00:28:28.300 | These are all very favorable numbers.
00:28:29.580 | Like, probably the real number is closer
00:28:31.420 | to 75 cents per million than 50 cents per million.
00:28:33.940 | Anyway, anyone charging under 50,
00:28:37.540 | definitely losing money.
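
To make that arithmetic concrete, here is a minimal sketch of the break-even estimate. The batch size, per-stream throughput, and utilization match the numbers quoted above; the GPU count and amortized hourly cost are illustrative assumptions, not figures from the episode or from Dylan Patel's write-up.

```python
# Back-of-the-envelope Mixtral serving cost, matching the reasoning above.
# GPU_COUNT and GPU_COST_PER_HOUR are assumed values; batch size, tokens/s,
# and utilization are the (already generous) numbers quoted in the episode.

GPU_COUNT = 2                    # assumption: GPUs needed to hold Mixtral
GPU_COST_PER_HOUR = 0.70         # assumption: $/GPU-hour with aggressive CAPEX amortization
BATCH_SIZE = 16                  # concurrent streams
TOKENS_PER_SEC_PER_STREAM = 100  # generation speed per stream
UTILIZATION = 0.5                # fraction of capacity actually sold

tokens_per_hour = TOKENS_PER_SEC_PER_STREAM * BATCH_SIZE * UTILIZATION * 3600
cost_per_hour = GPU_COUNT * GPU_COST_PER_HOUR
break_even = cost_per_hour / tokens_per_hour * 1_000_000

print(f"break-even: ~${break_even:.2f} per million output tokens")
# With these assumptions this lands near $0.49/M; less favorable utilization
# or throughput pushes it toward the $0.75/M end of the range discussed above.
```
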
00:28:39.060 | So then it's like, okay, you,
00:28:42.060 | if you, either you don't know what you're doing,
00:28:43.940 | which, in which case, good luck,
00:28:46.300 | or you know what you're doing,
00:28:47.300 | and you're purposely losing money for something.
00:28:49.860 | And what is that?
00:28:50.700 | And I don't know, but I think it's an interesting,
00:28:54.660 | aggressive strategy to pursue
00:28:56.340 | if you are doing it on purpose.
00:28:58.180 | So this is something that, like,
00:29:00.100 | the classical, like Walmart,
00:29:02.020 | would have a loss leader.
00:29:03.100 | Like, they really, really, on purpose,
00:29:05.060 | lose money on things,
00:29:06.260 | so that they get you in the door to try things out.
00:29:09.100 | I, like, I don't know if that makes sense to you as a UC.
00:29:12.940 | - Yeah, yeah, yeah.
00:29:13.780 | It's like the, well, it's like all the,
00:29:16.180 | you know, the candies are placed at the cash register,
00:29:19.540 | because maybe you just went to get the thing on discount,
00:29:21.660 | and then you buy a Kit-Kat, whatever,
00:29:23.860 | and then make money on the Kit-Kat.
00:29:25.900 | Your kid, they all have the Pokemon trading cards
00:29:29.340 | at checkout now.
00:29:30.180 | So if you bring your kid to buy the discounted whatever
00:29:33.500 | for you, then you end up spending more.
00:29:35.580 | But to me, the thing is, like,
00:29:37.900 | where's the checkout register
00:29:39.100 | where you upsell people with these things, right?
00:29:41.380 | - Yeah, I don't know how you-
00:29:42.220 | - It's like, that's really the big thing.
00:29:44.940 | Yeah, I don't know.
00:29:47.220 | I'm curious to see.
00:29:48.060 | I don't think Cloudflare still has it live.
00:29:50.420 | I wonder what they're gonna charge for all workers.
00:29:53.780 | Yeah.
00:29:54.620 | - They cannot serve Mixtral.
00:29:56.140 | Their GPUs are too underpowered.
00:29:57.860 | Cloudflare AI is like very good marketing
00:30:01.020 | for very, very underpowered inference, right?
00:30:05.900 | - Yeah, well, I don't know.
00:30:07.580 | I think it all depends on, like,
00:30:09.740 | what is gonna be needed, right?
00:30:11.460 | So they have Mistral 7B right now, I checked.
00:30:15.300 | But yeah, I wonder-
00:30:16.140 | - They cannot serve Mixtral.
00:30:17.140 | - Yeah, yeah, yeah.
00:30:17.980 | - Okay, yeah, yeah.
00:30:18.820 | - I wonder, but I think they don't wanna get
00:30:20.820 | into this race right now, probably.
00:30:22.220 | - No. - You know?
00:30:23.100 | - Yeah. - So yeah, I'm curious.
00:30:26.100 | Going back to the last leading, it's like,
00:30:28.340 | is there gonna be a better model that comes next
00:30:31.420 | that they hope that you already integrated their thing with?
00:30:34.980 | You know, if you're using together to serve mixed trial
00:30:38.260 | and then something else comes in
00:30:40.140 | that you're gonna replace mixed trial with,
00:30:42.180 | hopefully you're still gonna use together
00:30:43.780 | and they're gonna get better unit economics on it.
00:30:46.380 | I don't know.
00:30:48.280 | - Yeah. - It's a good question.
00:30:49.340 | - It's a good question.
00:30:50.180 | Thank you VCs for paying for all of our inference.
00:30:53.780 | - No, no, no.
00:30:54.620 | I think these are, you know, everyone in here
00:30:56.580 | are grown adults, they're smart investors.
00:30:58.980 | I'm sure there's some kind of long-term strategy here.
00:31:00.740 | And I'm trying to figure that out.
00:31:01.920 | Like, assume that people are smart
00:31:03.580 | and then what smart people do.
00:31:05.580 | - Yeah, I think it's the same with Uber, right?
00:31:08.100 | It's like, how could it have been so cheaper at the start?
00:31:11.620 | You know, like you look back at all DoorDash,
00:31:14.260 | all these things, it's like-
00:31:15.740 | - And like last year was a great year for Uber.
00:31:18.020 | - Yeah, no, exactly.
00:31:18.860 | Uber friends are like suddenly very, very rich again.
00:31:21.640 | (laughing)
00:31:23.820 | One thing I will mention on like the engineering
00:31:26.620 | sort of technical detail side is, you know,
00:31:28.660 | the rise of mixture of experts is something that,
00:31:31.900 | you know, we covered in our podcast with George
00:31:35.260 | and now with Mixtral.
00:31:37.640 | And it represents the first successful,
00:31:41.780 | really, really commercially successful sparse model.
00:31:45.020 | And sparse in a very interesting way,
00:31:47.380 | in a sense that the divergence between
00:31:51.760 | the amount of compute you need at training
00:31:55.100 | versus the amount of compute you need for inference
00:31:58.180 | continues to diverge, but also in a weird way
00:32:00.700 | where you need to keep all the weights
00:32:03.420 | of the MoE model loaded,
00:32:07.820 | even though you're not necessarily using them at all times.
00:32:10.500 | (laughing)
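
Since this point is easy to lose in conversation, here is a tiny numpy sketch of top-2 expert routing; the dimensions and random weights are made-up assumptions, and each expert is just a single linear layer rather than Mixtral's full FFN, but it shows the asymmetry described above: all eight experts stay resident in memory while only two are multiplied per token.

```python
# Toy mixture-of-experts routing: all expert weights live in memory,
# but only the top-k experts do any work for a given token.
import numpy as np

D_MODEL, D_FF, N_EXPERTS, TOP_K = 64, 256, 8, 2
rng = np.random.default_rng(0)

experts = [rng.standard_normal((D_MODEL, D_FF)) * 0.02 for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]                       # chosen expert indices
    gate = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the top-k
    return sum(w * (x @ experts[i]) for w, i in zip(gate, top))

x = rng.standard_normal(D_MODEL)
y = moe_forward(x)

total_params = sum(e.size for e in experts)
active_params = TOP_K * experts[0].size
print("output shape:", y.shape)
print(f"params resident: {total_params:,}  params used per token: {active_params:,}")
```
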
00:32:11.740 | So, I mean, basically what I think that is,
00:32:14.740 | is like, I think that that is going to impose
00:32:17.460 | different needs on hardware, different needs on workload,
00:32:21.060 | different needs on like batching optimization,
00:32:23.520 | like Fireworks recently announced FireAttention,
00:32:26.860 | where they wrote a custom CUDA kernel for Mixtral
00:32:29.340 | on H100, it's like super, super domain specific.
00:32:33.040 | And they announced that they could, for example,
00:32:35.100 | quantize from like 16 bit down to eight bit
00:32:38.760 | with like no loss in performance.
00:32:41.020 | Like all these magical details emerge
00:32:43.480 | when you take advantage of like very,
00:32:45.340 | very custom optimizations like that.
00:32:48.140 | And I think like the rise in MoEs this year
00:32:51.660 | is going to be, going to have very meaningful impacts
00:32:54.540 | on the inference market and how it's going to shape
00:32:56.100 | how we think in price for inference.
00:32:58.560 | It may not be that we have this sort of input token
00:33:03.220 | versus output token paradigm for long,
00:33:06.700 | particularly because we have things like,
00:33:10.820 | different forms of batching, different forms of caching.
00:33:14.280 | And like, I don't really know what that looks like,
00:33:17.200 | but I'm very curious.
00:33:18.200 | I see a lot of opportunity here.
00:33:19.440 | If I was an inference provider player,
00:33:22.440 | like that's something I would be trying to offer
00:33:24.120 | to people as a way to differentiate,
00:33:25.600 | because otherwise you're just an API.
00:33:27.280 | - Yeah, no, it was in a way counterintuitive
00:33:29.880 | because most of the struggles with inference as well
00:33:33.160 | are just like memory bandwidth, you know?
00:33:34.960 | So we have now models that scale worse at higher batch.
00:33:40.940 | You know, but I'm glad I'm not in that business.
00:33:44.980 | I can tell you that.
00:33:45.820 | That's for, there's so much work to be done
00:33:48.780 | at like so many low levels of the stack.
00:33:51.760 | You know, you're already trying to provide value
00:33:53.860 | to the customer on like the developer experience
00:33:56.660 | and all of that.
00:33:57.900 | But you also have to get so close to the bare metal
00:34:00.320 | to like make this model.
00:34:01.700 | Actually, like writing a kernel,
00:34:03.980 | imagine if you had to write,
00:34:05.180 | you're like a CPU cloud provider
00:34:07.180 | and you have to like write instruction sets.
00:34:09.620 | It's like just, nobody will get in that business, you know?
00:34:13.900 | So I salute all of our friends
00:34:16.140 | at Compute Providers doing this work.
00:34:18.000 | And I mean, Together is doing so much
00:34:19.620 | with like Tri Dao and like FlashAttention too and whatnot, so.
00:34:23.060 | - Yeah, yeah.
00:34:23.900 | So, and that's something that I would leave
00:34:25.660 | as the last part of this sort of war
00:34:27.220 | of GPU rich versus poor.
00:34:29.340 | So there's, the GPU rich people are the model trainers
00:34:34.100 | and the infra providers.
00:34:35.500 | They're saying like, we have the GPUs,
00:34:37.260 | come use our GPUs, you know,
00:34:39.380 | and then we provide you the best inference, right?
00:34:42.260 | And that's what we've been discussing so far.
00:34:44.580 | On the other side, on the GPU poor side,
00:34:47.180 | are like all the alternative methods, right?
00:34:49.100 | The Modulars, the tiny corps, the QLoRAs,
00:34:53.260 | and all the other types of stuff.
00:34:54.620 | I even put consistency models in there
00:34:56.140 | because, you know, any efficiency or distillation method
00:34:59.180 | where you go from, like you reduce your inference
00:35:02.320 | or GPU usage by like 25 to 40 times,
00:35:05.920 | is a GPU poor friendly approach.
00:35:07.880 | - Right.
00:35:08.720 | (laughing)
00:35:11.080 | - So I will also put Apple and MLX in there.
00:35:13.040 | And that's also like,
00:35:13.880 | Apple is finally making moves in inference
00:35:16.520 | and that will be a game changer for local models
00:35:19.280 | because then you just don't need any cloud inference at all.
00:35:21.960 | You just run it on device, which is fantastic.
00:35:24.720 | And then obviously RWKV and Mamba
00:35:26.800 | and Stripe Tina from together.
00:35:29.400 | Like all those like emerging models.
00:35:31.040 | I don't know, there's something I've been worried about
00:35:33.320 | for a latent space.
00:35:34.520 | How much attention should we give
00:35:36.560 | to the emerging architectures?
00:35:39.320 | Because there's a very good chance
00:35:41.520 | that one, these things don't work out.
00:35:43.400 | Two, they take a very long time to work out.
00:35:46.000 | And then three, once they work out,
00:35:50.160 | they're like for limited domains and like not super usable.
00:35:53.600 | So I don't know if you have opinions on that.
00:35:56.560 | I can follow up with one conclusion that I've had,
00:35:58.920 | but I want to--
00:35:59.760 | - Yeah, no, I want to hear it.
00:36:00.600 | - Put that question open to you.
00:36:01.840 | So the one conclusion is RWKV
00:36:05.360 | and the state space models, including Mamba,
00:36:08.360 | have historically just been pitched
00:36:09.880 | as super long context models.
00:36:12.000 | And I'm like, that's not something I need
00:36:16.280 | because I'm okay with 100K context.
00:36:18.680 | I'm okay with RAG and recursive summarization,
00:36:23.480 | all those techniques to extend your context,
00:36:25.480 | like RoPE and YaRN and all these things.
00:36:28.280 | So why do I need million context models?
00:36:32.080 | Why do I need 10 million, 100 million, 1 billion models?
00:36:35.480 | Like, why?
00:36:36.540 | So the more convinced, the easiest argument is,
00:36:42.520 | oh, you can consume very, very high bit rate things
00:36:45.520 | like video and DNA strands.
00:36:48.100 | And then you can do like syn-bio and all that good stuff.
00:36:51.160 | And I'm like, okay, I don't know anything about that.
00:36:54.120 | Like what happens if like you hallucinate
00:36:56.780 | like one wrong chain in your, you know,
00:37:00.240 | the DNA strand that you're trying to synthesize?
00:37:02.360 | Good luck.
00:37:03.600 | I don't think, I don't know.
00:37:05.640 | So like, that's why I've been historically
00:37:08.760 | underweighting intentionally our coverage
00:37:11.480 | of state space models
00:37:12.400 | and the non-transformer alternatives until Mamba.
00:37:17.100 | Mamba really changed things where basically
00:37:20.300 | for the same amount of compute,
00:37:22.620 | you can get a lot more mileage
00:37:23.880 | or a lot more performance for the same size of model.
00:37:26.440 | And then it's a different, now it's an efficiency story.
00:37:28.760 | Now it's a GPU poor story.
00:37:30.920 | It is no longer a long context story.
00:37:32.680 | It is just straight up,
00:37:34.360 | we are strictly more efficient than transformers.
00:37:36.520 | I'm like, oh, okay, I can get that.
00:37:38.280 | Does that change anything?
00:37:41.380 | I don't know.
00:37:42.760 | - No, that makes sense.
00:37:43.600 | I think the people look at the slope, right?
00:37:46.480 | Which is like, oh, you can get the context higher and higher.
00:37:49.320 | But in reality, it's like, if you kept the context smaller,
00:37:52.640 | instead look at the anti-slope, so to speak.
00:37:54.920 | It's like same context, it's like a lot less compute.
00:37:57.380 | - Yes.
00:37:58.220 | Yeah, so that was not clear to me until Mamba.
00:38:01.640 | And so I think that's interesting.
00:38:04.560 | I do think that there's a concept
00:38:07.280 | that I've been trying to call the sour lesson.
00:38:10.440 | You know, the bitter lesson is
00:38:12.640 | stop trying to do domain specific adjustments,
00:38:14.640 | just scale things up and it's going to work.
00:38:17.040 | That's general intelligence.
00:38:18.440 | General intelligence dislikes any attempt
00:38:23.440 | to imbue inside of it special intelligence.
00:38:28.840 | Like if you have like any switch case or if statements,
00:38:32.040 | or like if finance do this, if something do that,
00:38:35.480 | don't bother, just scale things up.
00:38:37.100 | And it's going to do all of them simultaneously
00:38:39.320 | all better at once.
00:38:40.160 | That's the bitter lesson.
00:38:42.160 | The sour lesson is a parallel, is a corollary,
00:38:45.800 | which is stop trying to model artificial intelligence
00:38:49.220 | like human intelligence, right?
00:38:51.080 | Like the neuron was inspired by the brain,
00:38:53.920 | but doesn't work exactly like the brain.
00:38:55.860 | Machine learning uses back propagation,
00:38:58.000 | the brain does not use back propagation.
00:39:00.520 | And so why should, we keep trying to create alternatives
00:39:05.520 | to transformers that look like RNNs,
00:39:09.040 | because we think that humans act like RNNs.
00:39:12.000 | We have a hidden state and then we process new data
00:39:14.680 | and we update that state.
00:39:16.020 | But maybe artificial intelligence or machine intelligence
00:39:20.720 | doesn't work like that.
00:39:22.280 | And maybe we just fail every time we try.
00:39:24.480 | (laughs)
00:39:26.780 | So that's the sour lesson.
00:39:28.320 | Every time we try to model things.
00:39:29.800 | And my favorite analogy, I actually got this from,
00:39:31.840 | I think an old quote from Sam Altman, who was like,
00:39:35.200 | you know, like we made the plane, the airplane.
00:39:37.720 | It was inspired by birds,
00:39:39.640 | but it doesn't work anything like birds, right?
00:39:41.840 | It just is, and it works very efficiently.
00:39:44.000 | Like it's probably the safest mode of transportation
00:39:45.720 | that we have, and it works nothing like a bird.
00:39:48.800 | So why should artificial intelligence
00:39:50.720 | work like human intelligence?
00:39:52.640 | And that is the philosophical debate underlying
00:39:55.800 | my continued cautiousness around state-space models.
00:40:00.800 | Which I don't know if it's,
00:40:03.600 | I feel very vulnerable saying this,
00:40:07.560 | because I don't think there's any justification
00:40:09.700 | once you look at the empirical results
00:40:12.700 | or like the mathematical justifications for these things.
00:40:16.040 | But there is some grounding in philosophy
00:40:17.940 | that you should have when you think about,
00:40:20.440 | does an idea make sense?
00:40:21.800 | Does it, is it worth exploring?
00:40:23.440 | - Yeah.
00:40:24.280 | Well, I think now there's a lot of work
00:40:28.320 | being put into it, right?
00:40:29.680 | And I think transformers have shown enough success
00:40:32.960 | that people are interested in finding the next thing.
00:40:37.040 | You know? - Yeah.
00:40:37.880 | - So before it wasn't clear if transformers
00:40:40.440 | were really gonna work.
00:40:41.260 | So people were kind of working on them.
00:40:43.760 | But yeah.
00:40:44.600 | Okay, maybe in the 2025 recap, we're gonna have more.
00:40:49.560 | - Yeah, I mean, we'll try to do one before that.
00:40:51.680 | So we actually have a link.
00:40:53.540 | I don't know if you know this.
00:40:54.560 | Shreya Rajpal from Guardrails.
00:40:56.480 | She's married to Karan from--
00:40:58.360 | - From Hazy.
00:40:59.800 | - Sorry? - He was a Hazy, right?
00:41:01.600 | - Yeah, from Hazy, yeah.
00:41:02.960 | And so now he's started one of the other
00:41:04.880 | state-space model companies.
00:41:05.800 | I forget the name of it, so we'll see.
00:41:08.120 | And I'm sure this will be an emerging topic
00:41:11.540 | this year as well.
00:41:12.420 | So we'll don't have to wait 'til next year.
00:41:14.380 | - Yeah, no, I think we're gonna have
00:41:15.980 | maybe the sour lesson, you know, overview.
00:41:20.720 | - Well, I mentioned this in the Eleuther Discord,
00:41:22.260 | and then they were like, okay, so what is the spicy lesson,
00:41:24.940 | and what is the salty lesson, what is the sweet lesson?
00:41:28.580 | - I want the sweet lesson, sounds better.
00:41:30.940 | Cool, talking about GPU poor, let's do multimodality.
00:41:35.780 | - Well, I feel that Stable Diffusion was like
00:41:38.140 | the first GPU poor model, you know?
00:41:41.840 | Everybody was running it at home.
00:41:42.680 | - Yes, yes, absolutely, I should,
00:41:43.880 | I don't know if I mentioned that.
00:41:44.760 | I just didn't mention it.
00:41:45.840 | Stability, I think in 2023, you know,
00:41:48.880 | they shipped incremental things.
00:41:50.560 | I think, I don't know if Stable Diffusion 2 was out there,
00:41:54.000 | but everyone's talking about SDXL Turbo,
00:41:55.760 | which is a form, which is an alternative
00:41:58.200 | to consistency model, but looks like a consistency model.
00:42:01.000 | They shipped video diffusion.
00:42:02.240 | They shipped a whole bunch of stuff,
00:42:03.860 | but just wasn't as big as 2022 when they, you know,
00:42:06.840 | made a huge impact with stable diffusion.
00:42:08.640 | - Yeah, yeah, I mean, it's hard to up to.
00:42:11.400 | - It's hard to top that.
00:42:12.240 | - Stable diffusion.
00:42:13.640 | But yeah, Midjourney has been doing great, obviously.
00:42:15.960 | I actually finally signed up for a paid account last month.
00:42:20.120 | - Midjourney, yeah, yeah.
00:42:20.960 | - Yeah, I'm part of the $200 million a year.
00:42:24.080 | - You have to, yeah, so now it's,
00:42:27.840 | what's confirmed is, I think, like a Business Week article,
00:42:30.480 | or Economist, or Information article,
00:42:32.760 | that yeah, this team has now reached
00:42:36.580 | at least $200 million ARR, completely bootstrapped.
00:42:40.300 | I think their employee count is somewhere
00:42:42.460 | between like 15 and 30 people.
00:42:44.380 | I don't know if you know exact numbers.
00:42:47.420 | I have heard rumors that their revenue
00:42:50.420 | is actually higher than that, that was what was reported.
00:42:53.460 | But it's between the $200 million to $300 million range,
00:42:55.660 | which is crazy.
00:42:56.500 | Especially if it's like primarily B2C.
00:43:01.440 | - Mm-hmm, yeah.
00:43:03.000 | - Which it looks like it is.
00:43:04.480 | - Yeah, yeah, yeah.
00:43:05.680 | It's like B to Fiverr to B.
00:43:08.240 | I think there's like a ton of--
00:43:09.880 | - Oh, you think there's a lot of Fiverr, yeah, yeah, yeah.
00:43:12.000 | Mid-journey specialists.
00:43:12.840 | - Yeah, yeah, you can like get in Discord
00:43:14.400 | and see what people are generating, you know?
00:43:16.800 | And you can see a lot of it is like product placement,
00:43:19.960 | ads, and a lot of stuff like that.
00:43:21.800 | - Yeah, and DALL-E 3 doesn't seem to have any impact on--
00:43:24.680 | - DALL-E 3 got so much worse after the GPT-4.
00:43:28.000 | - Really?
00:43:28.920 | - Like the all-in-one.
00:43:30.520 | Well, first of all, before you could generate four images.
00:43:33.300 | And then I had like very good vibes.
00:43:34.900 | Now the vibes are like boomer vibes.
00:43:37.180 | - Oh, no.
00:43:38.020 | - Every time I generate something--
00:43:39.220 | - The images I have here are DALL-E 3.
00:43:40.780 | - Every time I generate something on DALL-E,
00:43:43.180 | it looks like some dusty, old, yeah, like mid-2000s.
00:43:48.180 | - I think it's a skill issue.
00:43:50.820 | I think you have DALL-E 3 wrong.
00:43:52.100 | - No, but that was the great thing about DALL-E 3, right?
00:43:55.740 | It's like it made the prompt better for you.
00:43:57.660 | - Yeah, yeah, yeah.
00:43:58.580 | Before, like literally when it first came out,
00:44:00.960 | I'm like, "Hey, make a coliseum with llamas."
00:44:04.000 | And it was like this beautiful thing.
00:44:05.520 | I feel like now it's not, I don't know.
00:44:08.280 | Again, it's a model, right?
00:44:10.440 | So it's like maybe I just got unlucky.
00:44:12.160 | I'm in the wrong latent space, but yeah.
00:44:17.160 | - Yeah, there's a lot of players in this.
00:44:19.140 | I don't even think I put some of the players
00:44:21.840 | I was really excited about.
00:44:22.960 | Like, you know, the Imagen team spun out
00:44:24.520 | to create Ideogram, that was a few months ago.
00:44:27.720 | And I didn't even put it here because I forgot.
00:44:30.280 | - It's too much, I can't keep track of all of it.
00:44:33.560 | - Yeah, yeah.
00:44:34.400 | Okay, so I will just basically say that I do think
00:44:36.680 | that I used to, at the end of 2022, start of 2023,
00:44:40.640 | I was not as excited about multimodality.
00:44:43.680 | Obviously, I'm more excited about it now.
00:44:46.440 | I used to think that text-to-image
00:44:48.520 | was more like hobbyist kind of, you know, work,
00:44:52.680 | but $300 million a year is not hobbyist.
00:44:54.360 | - Right.
00:44:55.200 | (laughing)
00:44:57.120 | - And it's not just, you know, not-safe-for-work stuff,
00:45:00.480 | because Midjourney doesn't do not-safe-for-work.
00:45:02.560 | So it's real, it's a new form of art, it's citizen art.
00:45:06.360 | It's exciting, it's unusual and interesting,
00:45:09.920 | and you can't even model this as an investor,
00:45:14.920 | you can't even model this on an existing market.
00:45:18.140 | Because like, there's just a market of people
00:45:20.760 | who would typically not pay for art,
00:45:23.040 | and now they pay a little bit for art,
00:45:25.160 | which is digital, not as good as human,
00:45:27.560 | but it's good enough, I use it all the time.
00:45:29.880 | - Yeah, I'm surprised I haven't seen a return
00:45:32.680 | of the digital frames that were very popular
00:45:35.620 | during the NFTs boom, people were like, "Oh."
00:45:39.760 | - Yeah, so this is the very, very first "Latent Space" post
00:45:44.200 | was on the difference between crypto and AI in this respect.
00:45:48.900 | So I called this multiverse versus metaverse.
00:45:52.200 | Crypto is very much about metaverse.
00:45:53.960 | Let us create digital scarcity,
00:45:56.000 | and let us create tokens that are worth,
00:45:58.640 | that are limited edition, that are worth something,
00:46:01.200 | and then you display it probably in your PFP
00:46:04.160 | as your representation of yourself.
00:46:06.240 | And what AI represents is multiverse,
00:46:09.280 | which is a very positive sum instead of zero sum,
00:46:12.680 | where if you like a thing, okay,
00:46:14.920 | I'll choose a different seed,
00:46:15.900 | and I'll make a completely equivalent second thing,
00:46:18.400 | and that's mine.
00:46:19.440 | And that means very different things for what value is,
00:46:23.360 | and where value accrues.
00:46:25.000 | So like, yeah, I mean, I still cling to that insight,
00:46:28.000 | even though I don't know how to make money from it.
00:46:30.000 | I think that, I mean, obviously MidJourney figured it out.
00:46:32.620 | I think MidJourney like made the right approach there.
00:46:36.240 | The other one, I think I'll highlight is 11 Labs.
00:46:38.480 | I think they were another big winner of last year.
00:46:41.020 | I don't know, did they announce their fundraise?
00:46:44.080 | I think so.
00:46:45.320 | - I don't know.
00:46:47.020 | - Rumor is--
00:46:47.840 | - Yeah, rumor is.
00:46:48.680 | - Rumor is, I can say it, you don't have to say it,
00:46:50.280 | because I only heard it from my friends.
00:46:52.360 | Rumor is they're now a unicorn.
00:46:53.760 | And they just focus on voice synthesis,
00:46:57.320 | which again, I did not care about it at the start of 2023.
00:47:01.960 | Now we have used it for parts of latent space.
00:47:04.340 | I listen almost every day to an 11 Labs generated podcast,
00:47:09.320 | the Hacker News Daily Recap podcast.
00:47:11.120 | I don't know what the room for this to grow is,
00:47:17.000 | because I always think like it's so inefficient
00:47:19.080 | to talk to an AI, right?
00:47:21.200 | The bit rate of a voice-created thing is so low.
00:47:24.720 | It's only for asynchronous use cases.
00:47:27.640 | It's only for hands-free, eyes-free use cases.
00:47:30.080 | So why would you invest in voice generation?
00:47:34.240 | I don't know, but it seems like they're making money.
00:47:36.680 | - Right, yeah, yeah.
00:47:37.520 | Yeah, I mean, Sarah, my wife, yeah, she uses it
00:47:40.520 | while she drives to talk to ChatGPT.
00:47:43.100 | - I see.
00:47:43.940 | - Just like--
00:47:44.760 | - Yeah, so ChatGPT uses their own TTS.
00:47:47.480 | It's not 11 Labs, okay.
00:47:49.120 | But you can see the modality.
00:47:51.280 | - What does, we should bring Sarah in at some point, but--
00:47:54.440 | - Customer interview.
00:47:55.900 | - What does she use ChatGPT voice for?
00:47:58.300 | - We're doing a bunch of home renovation.
00:48:00.140 | So maybe she's driving to Home Depot,
00:48:02.640 | and it's like, hey, what am I supposed to get
00:48:05.720 | to replace the sink, you know?
00:48:08.160 | Or all these sort of things
00:48:10.640 | that maybe were like Google searches before.
00:48:13.080 | Now you can easily do eyes-free, hands-free.
00:48:17.440 | - Yeah, a lot of people have told me about that,
00:48:18.720 | and I just, when I listen, when I'm by myself,
00:48:22.040 | I always listen to podcasts.
00:48:23.240 | (laughing)
00:48:24.640 | So I don't have time for ChatGPT.
00:48:26.520 | And ChatGPT, you know,
00:48:28.160 | probably the number one thing they can do for me
00:48:29.720 | is give me like a speed adjustment.
00:48:32.160 | (laughing)
00:48:33.000 | So I can listen at twice the speed.
00:48:33.820 | - Yeah, yeah, yeah, yeah.
00:48:34.660 | (laughing)
00:48:35.480 | - Yeah.
00:48:36.320 | - That's funny.
00:48:37.160 | - Yeah, anyway, so like, I'm curious about your thoughts
00:48:38.880 | on like how, and as an investor,
00:48:40.980 | I think this is the weirdest AI battlefront for investing.
00:48:45.880 | 'Cause you don't know the time.
00:48:48.280 | - It's funny because there was, I'm trying to remember,
00:48:51.560 | there was a bunch of companies
00:48:53.360 | doing synthetic voices a while ago.
00:48:56.040 | And I think the problem,
00:48:57.360 | a lot of them got through like good ARR numbers,
00:49:00.160 | but the problem was like a repeatability or use case.
00:49:03.080 | So people were doing all sorts of random stuff, you know?
00:49:05.720 | And the problem is not, it's kind of like mid-journey.
00:49:09.000 | The problem is not that there's not
00:49:10.480 | maybe a market of interest.
00:49:12.240 | It's like, how do you build a venture-backed company
00:49:14.560 | with like a scalable go-to-market
00:49:16.040 | that like can go after a customer segment
00:49:18.200 | and like do it repeatedly?
00:49:19.760 | I think that's been the challenge.
00:49:21.240 | I don't know how 11 Labs is doing it,
00:49:23.560 | but you could do so many things with voice,
00:49:27.000 | text-to-voice that is like, how do you sell it?
00:49:29.240 | You know, who do you call?
00:49:30.680 | Like, that's like the hardest thing, right?
00:49:33.840 | If you're raising like a Series A, a Series B,
00:49:35.880 | it's like, how are you gonna invest this money
00:49:38.280 | in sales and marketing to get revenue back?
00:49:40.280 | It's kind of like the basic of it.
00:49:41.920 | And it can be challenging.
00:49:43.540 | That's why sometimes investors are like,
00:49:45.720 | you're making money and that's great for you,
00:49:47.640 | but like how--
00:49:49.960 | - There's no industry--
00:49:50.800 | - Yeah, it's hard.
00:49:52.040 | It's hard to like just tie it together.
00:49:55.280 | - Okay.
00:49:56.120 | I would be interested in,
00:49:59.320 | because I feel like there's a category of companies
00:50:01.800 | in the early 2010s that did this,
00:50:04.360 | meaning they offered an API
00:50:06.420 | with no idea how you were gonna use it.
00:50:09.120 | I'm thinking Twilio.
00:50:10.260 | And Twilio has a cohort of like sort of API-first companies
00:50:16.520 | that are all like sort of Twilio inspired.
00:50:19.200 | But yeah, I think there's a category or a time in the market
00:50:25.320 | when it makes sense to just offer APIs
00:50:27.520 | and just let your customers figure it out
00:50:29.720 | and it's actually okay.
00:50:31.040 | And then there's sometimes when it's not okay.
00:50:33.860 | And I think the default investor mentality right now
00:50:36.260 | is that it's not okay
00:50:37.100 | if you don't know what your customer is doing.
00:50:39.240 | - I think, well, Twilio is a funny example
00:50:41.760 | because I think in the middle 2010s,
00:50:44.480 | Uber was like 15% of Twilio's revenue.
00:50:47.200 | - I'm just, I'm talking like,
00:50:49.760 | move yourself back as to like Twilio seed investor,
00:50:51.880 | Twilio series A investor, they had no idea.
00:50:54.460 | Uber wasn't even around.
00:50:55.800 | - But I think the thing now is like,
00:50:58.800 | text to voice is not new, you know?
00:51:02.260 | Like that's really the thing.
00:51:03.360 | It's like, what's new now
00:51:04.760 | is that you can generate very good text
00:51:06.960 | to then feed into the model.
00:51:09.000 | So that changes why the market is interesting, you know?
00:51:12.160 | But if you really think about it,
00:51:14.340 | the models today are a little better.
00:51:16.120 | They're maybe like 50% better
00:51:18.160 | than they were three years ago.
00:51:20.140 | But the transformer models underneath it, so to speak,
00:51:24.040 | they're like a billion times better.
00:51:26.660 | So imagine if you have like a lot of people use it
00:51:29.000 | for like automated, you know, customer support,
00:51:31.280 | things like that.
00:51:32.500 | Before you had like scripts, they were reading.
00:51:34.600 | Now you have, you can have a transformer model
00:51:37.320 | converse with the customer.
00:51:38.880 | So it makes it a lot more useful in cases.
00:51:42.800 | But we'll see how that changes.
00:51:45.920 | - Okay, the last thing I'll mention here,
00:51:47.940 | why is this a war?
00:51:50.260 | Which is OpenAI and Gemini and Google
00:51:54.620 | are working on everything models
00:51:56.820 | versus each of these individual startups
00:51:59.000 | all working on their selected modality.
00:52:01.860 | And so this is a question of like,
00:52:05.300 | are the big tech companies going to actually win
00:52:07.420 | because they can transfer learning across multiple domains
00:52:11.140 | as opposed to each of these things being point solutions
00:52:13.700 | in their specific things.
00:52:15.460 | The simple answer is obviously everyone will win.
00:52:17.580 | - Right.
00:52:18.420 | (laughs)
00:52:19.240 | - Because the AI market is so huge.
00:52:21.140 | You know, there's a market for the Amazon basics
00:52:24.340 | of like everything, you know, one model has everything.
00:52:26.260 | And then there's a market for,
00:52:27.700 | no, like the basics are not good enough.
00:52:29.060 | I'll need the special thing.
00:52:31.180 | Do you have an opinion on
00:52:33.220 | when does one market win over the other
00:52:35.980 | or is it just like everything's gonna win?
00:52:38.780 | - Yeah, it's interesting.
00:52:40.200 | I think like it works when people wouldn't have used
00:52:43.700 | the product without the Amazon basics, you know?
00:52:46.140 | So like, maybe an example is like a computer vision,
00:52:49.220 | you know, like, I mean, we have--
00:52:50.580 | - Yeah, vision is sort of here now.
00:52:52.300 | - Yeah, it's like, you know, before people were like,
00:52:54.700 | why am I bothering trying out
00:52:56.420 | to set up a computer vision pipeline and all of that?
00:52:58.980 | Now they can just go on GPT-4 and put an image
00:53:01.660 | and it's like, oh, this is good.
00:53:03.140 | I could use this for this.
00:53:04.380 | And then they build out something
00:53:06.180 | and maybe they don't use OpenAI GPT-4V,
00:53:08.940 | they use RoboFlow or whatever else.
00:53:11.000 | That's kind of how I think about it.
00:53:13.120 | It's like, what's the thing
00:53:14.640 | that enables people to try it, you know?
00:53:16.880 | So in a way, the God model can do everything fairly okay.
00:53:21.160 | It's like DALL-E and MidJourney, you know,
00:53:24.220 | all these different things.
00:53:25.220 | Who's like the, and maybe like Mixtral,
00:53:28.360 | the Mixtral inference wars are like another example.
00:53:30.500 | It's like, I would have never put something in my app
00:53:34.180 | at like $2 per million tokens,
00:53:36.700 | but I did it at 27 cents per million token, you know?
00:53:40.720 | And now it's like, oh no, I should really do this.
00:53:43.260 | It's a lot better.
00:53:44.420 | So that's how I think about how the God model
00:53:47.800 | kind of helps the smaller people then build more business.
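[Editor's note: the pricing point above as back-of-the-envelope arithmetic. The 50M-tokens-per-month volume is a made-up example; the per-million-token prices are the ones quoted in the conversation.]

```python
# Rough monthly cost of the same feature at the two prices quoted above.
def monthly_cost(tokens_per_month: int, usd_per_million_tokens: float) -> float:
    return tokens_per_month / 1_000_000 * usd_per_million_tokens

volume = 50_000_000  # hypothetical: 50M tokens per month through the feature

print(monthly_cost(volume, 2.00))  # 100.0 -> ~$100/month at $2 per million tokens
print(monthly_cost(volume, 0.27))  # 13.5  -> ~$13.50/month at the Mixtral price mentioned
```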
00:53:51.460 | - Yeah, cool.
00:53:52.580 | Yeah, creates a category.
00:53:53.820 | Yeah, RAG/Ops.
00:53:56.560 | - Yeah, last but not least, where to begin?
00:54:00.480 | We had almost all of these people on the podcast too.
00:54:04.380 | - They're honestly the easiest to talk to
00:54:06.220 | because they look like DevTools
00:54:08.740 | and you are a DevTools investor.
00:54:11.180 | I worked in DevTools and they're all,
00:54:15.580 | I think they're also more mature, right?
00:54:17.860 | As businesses, there's more of a playbook
00:54:21.060 | that is well understood by the customer.
00:54:23.940 | Like, yes, I need a new stack here.
00:54:26.340 | Maybe not.
00:54:27.260 | So I think the reason, okay,
00:54:28.900 | so my biggest problem with putting databases
00:54:31.620 | versus frameworks versus ops tooling in the same war
00:54:35.620 | is that they're not really a war.
00:54:37.400 | They work cohesively together,
00:54:39.820 | except when one thing starts to intrude on another thing.
00:54:44.780 | And that's why I put the very, very,
00:54:47.100 | I very consciously put together this sequence,
00:54:49.260 | which is databases on the left,
00:54:50.620 | frameworks in the middle, ops companies on the right.
00:54:53.460 | What's the first product of LangChain?
00:54:55.380 | LangSmith, which is an ops thing, right?
00:54:57.380 | So now suddenly the framework companies
00:54:59.500 | are not so friendly with the ops companies
00:55:01.260 | 'cause they're trying to compete with the ops companies.
00:55:03.260 | What are the ops companies trying to do?
00:55:04.500 | The ops companies are trying to produce SDKs
00:55:06.180 | that compete with frameworks.
00:55:07.740 | Okay, then what are the database companies trying to do?
00:55:10.400 | First of all, they're fighting between each other, right?
00:55:12.660 | There's the non-databases, all adding vector features.
00:55:17.220 | We had some people approach us
00:55:18.900 | and we had to say no to them 'cause there's just too many.
00:55:21.020 | And then there's the vector databases coming up
00:55:22.980 | and getting $235 million to build vector databases.
00:55:27.980 | Maybe I'll just, you know,
00:55:30.420 | obviously you're an active investor in some of these things,
00:55:32.540 | so you cannot say everything,
00:55:33.660 | but just on databases alone,
00:55:36.180 | one of the biggest debates of 2023,
00:55:39.140 | where do you stand on the whole thing?
00:55:41.180 | - That's the million dollar question.
00:55:45.420 | I think it's really, well, one, in the start everything,
00:55:50.420 | there's kind of like a lot of hype, you know?
00:55:53.020 | So like when LangChain came out and LlamaIndex came out,
00:55:55.140 | then people were like, oh, I need a vector database.
00:55:57.460 | It's like vector, they search vector database
00:55:59.800 | and it's like Chroma, Pinecone, whatever.
00:56:03.680 | But then it's like, oh,
00:56:04.520 | you can actually just have PG vector in Postgres.
00:56:07.680 | And you already have Postgres.
00:56:09.400 | Did you know it could do that?
00:56:10.600 | People are like, no, I didn't because nobody really cared.
00:56:12.920 | So like, there's not a lot of documentation.
00:56:15.160 | Same with, yeah, MongoDB vector, Cassandra,
00:56:18.760 | all these things. - Redis, Elasticsearch.
00:56:20.840 | - You can actually put vectors and embeddings in everything.
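[Editor's note: a minimal sketch of the "you already have Postgres" point, assuming psycopg 3 plus the pgvector helper package; the connection string, table name, and tiny 3-dimensional vectors are illustrative.]

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://localhost/mydb", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # teaches psycopg how to send/receive the vector type

conn.execute(
    "CREATE TABLE IF NOT EXISTS docs "
    "(id bigserial PRIMARY KEY, body text, embedding vector(3))"
)
conn.execute(
    "INSERT INTO docs (body, embedding) VALUES (%s, %s)",
    ("hello world", np.array([0.1, 0.2, 0.3])),
)

# Nearest neighbours by cosine distance (pgvector's <=> operator).
rows = conn.execute(
    "SELECT body FROM docs ORDER BY embedding <=> %s LIMIT 5",
    (np.array([0.1, 0.2, 0.25]),),
).fetchall()
print(rows)
```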
00:56:24.980 | - It's a different kind of index.
00:56:26.380 | And I think like, I mean, like Jeff and Anton also,
00:56:29.900 | what they always talked about even early on,
00:56:31.580 | it's like, this is like a active learning platform.
00:56:34.260 | This is not just like a vector database.
00:56:36.180 | It's like, what do you do with the vectors?
00:56:38.620 | It's like, what's most helpful?
00:56:40.340 | It's not where do you store them.
00:56:41.900 | So that's kind of the change.
00:56:44.860 | - I think that was old Chroma, by the way.
00:56:46.420 | I don't know if that's the new, the current messaging.
00:56:48.660 | - Well, but I think, I'm just saying like to them,
00:56:51.900 | it's never about,
00:56:52.740 | this is the best way to put a vector somewhere.
00:56:55.540 | It's like, this is the best way to operate on the vectors.
00:56:59.180 | And the store is like part of it,
00:57:00.920 | but there's like the pipeline to get things out
00:57:04.100 | and everything, you have to build out a lot more.
00:57:06.260 | So I think 2023 was like, create the data store.
00:57:10.300 | I think 2024 is gonna be like,
00:57:12.460 | how do I make the data store useful?
00:57:14.220 | Because the vector store just commoditized.
00:57:16.820 | So there needs to be something else on top of it.
00:57:19.320 | Yeah.
00:57:21.380 | - Unless they can come up with some kind of like,
00:57:23.940 | new distance function or something.
00:57:25.760 | I keep waiting for Chroma to,
00:57:27.620 | they teased a little bit of what they're working on
00:57:29.300 | at the AI Engineer Summit, which yeah,
00:57:32.020 | density and whatever other fancy formulas
00:57:34.740 | that Anton is cooking up.
00:57:35.980 | But yeah, so I think I tweeted about this
00:57:40.300 | maybe like two, three months ago,
00:57:41.620 | and I think I pissed off Chroma a little bit.
00:57:43.380 | But the best framing of what Anton would respond to here
00:57:47.140 | is what people are embedding within vectors
00:57:49.700 | is a very different kind of data
00:57:51.460 | from what is already within Postgres
00:57:53.660 | and MongoDB and all the others.
00:57:55.260 | So in some sense, it's net new data.
00:57:58.900 | And that actually struck a chord with me
00:58:02.400 | because that's how I started to understand
00:58:05.340 | structured versus unstructured data.
00:58:07.060 | That's how I started to understand,
00:58:09.060 | one of my kind of heroes is Mark,
00:58:13.180 | who's the CTO of MongoDB.
00:58:15.540 | This guy was the former GM of AWS RDS.
00:58:20.020 | And for those who don't know,
00:58:21.020 | GM is like, you're the mini CEO of that business.
00:58:24.580 | And when you work at AWS RDS,
00:58:26.340 | you run a $1, $2 billion a year business.
00:58:29.720 | And now, and then he quits being Mr. Postgres of AWS
00:58:35.360 | to join MongoDB, the enemy.
00:58:37.860 | And when he gave that speech of why he did this,
00:58:42.860 | he was like, actually, if you look at the kind of workloads
00:58:46.700 | that is happening, Postgres is doing well, obviously.
00:58:50.060 | Structured data, always going to be there.
00:58:51.980 | But unstructured data and document type data
00:58:55.500 | is just rising at an exponential rate, even faster.
00:58:58.380 | And for him to say that means different things.
00:59:01.740 | Anybody could have said that.
00:59:02.580 | Anybody could have pointed,
00:59:03.580 | made a chart that showed what he did.
00:59:05.760 | Anybody could have said that.
00:59:06.600 | But for him to have said that, I think it was a very big deal
00:59:08.860 | 'cause he's rich, he doesn't have to work,
00:59:10.700 | but he believed in this so much that he was like,
00:59:13.100 | okay, I'll just join MongoDB.
00:59:15.540 | So I'm like, okay, there's a real category shift
00:59:18.420 | between structured data and unstructured data.
00:59:20.260 | I believe it.
00:59:21.100 | I don't think it's just that you can put JSONB
00:59:23.540 | inside of Postgres and be done.
00:59:24.740 | That's not a NoSQL database.
00:59:26.640 | Okay, fine.
00:59:27.480 | So what is this new thing of vectors?
00:59:30.380 | And how do you think about that as a new kind of data?
00:59:34.740 | And I think if there's a third category
00:59:37.060 | of something beyond unstructured data,
00:59:41.140 | I don't know what it is.
00:59:42.300 | Context or memory or whatever you call it,
00:59:45.940 | whatever you call this kind of new data,
00:59:47.620 | that might belong in a new category of database
00:59:50.260 | and that might create the new MongoDB of this era.
00:59:53.540 | And it could be any one of these guys.
00:59:56.260 | Right now, Pinecone has the lead.
00:59:59.260 | I think they're $750 million company.
01:00:03.220 | - Valuation.
01:00:04.040 | - Yeah.
01:00:04.880 | And then all the others are much smaller.
01:00:06.700 | So like, okay, if there's a room for like,
01:00:09.620 | if this is really a new data category
01:00:12.100 | and there's a room for a key player,
01:00:14.340 | then it's probably gonna be one of these guys.
01:00:16.180 | By the way, I left out Weaviate
01:00:17.980 | and I put Qdrant in there.
01:00:19.220 | Do you know why?
01:00:20.180 | - No.
01:00:21.660 | - Anthropic and OpenAI both use Qdrant
01:00:26.300 | for their internal RAG solutions.
01:00:29.900 | Which means that for whatever reason,
01:00:32.460 | we should probably interview Qdrant.
01:00:33.620 | They passed the evals when Weaviate and Milvus
01:00:36.380 | and all the others didn't, which is interesting.
01:00:39.140 | - Yeah, yeah, yeah, yeah.
01:00:40.340 | - There's a lot that we don't know.
01:00:41.540 | - Yeah, interesting.
01:00:43.300 | Yeah, I think like, I mean, going back to your point
01:00:45.300 | of like, LangChain building LangSmith,
01:00:47.900 | at some point, some of the vector databases
01:00:49.820 | are gonna be like,
01:00:50.940 | why am I letting my customers use Llama Index?
01:00:53.780 | You know, it's like, I should be the RAG interface
01:00:56.980 | since I'm owning the data.
01:00:58.260 | - Yes, yes, that's why I put them next to each other.
01:01:01.220 | Right now they're friends.
01:01:02.600 | - Yeah, right now, but I mean,
01:01:05.820 | if we think about the JAMstack era, you know,
01:01:10.220 | you had Vercel, which started as Zeit, which was just a CDN.
01:01:14.520 | And then you had Netlify, you had all these companies.
01:01:17.820 | And then Vercel built Next.js.
01:01:20.860 | And so they moved down from the CDN to the framework,
01:01:23.700 | you know, and it's like, now they use the framework
01:01:27.140 | to then enable more cloud and platform products.
01:01:30.280 | Which way is it gonna go this time?
01:01:33.380 | I think what we learned from before
01:01:35.260 | is that you rather own the framework
01:01:37.340 | and then have the cloud to support it
01:01:39.380 | than like just have Netlify
01:01:40.780 | and not have your own framework.
01:01:42.600 | Just given the way the two companies are doing now.
01:01:46.340 | - So for those who don't know,
01:01:47.180 | I worked at Netlify
01:01:48.000 | and I was very, very intimately involved in this.
01:01:49.740 | (laughing)
01:01:51.340 | - So we don't have to say anything private.
01:01:53.060 | - No, no, no, it's fine, it's fine.
01:01:54.460 | It's well known that Vercel won
01:01:56.300 | and Netlify has pivoted away to a different market.
01:01:59.200 | But is it over learning from an N of one example
01:02:06.220 | that you always wanna own the framework?
01:02:07.540 | - No, no, no, no, no.
01:02:08.580 | Because then the counter example is the same,
01:02:10.500 | which is Gatsby.
01:02:11.620 | - Yes. - Where you own the framework,
01:02:12.940 | you don't own the cloud
01:02:13.780 | and then you don't make money either.
01:02:15.000 | So it's kind of like...
01:02:17.460 | I think we still gotta figure out
01:02:20.560 | where the gravity is in this market.
01:02:23.200 | I think a lot of people will say
01:02:24.320 | the gravity is in the model.
01:02:25.920 | A lot of people will say the gravity is in the embeddings
01:02:28.320 | and the data that you put into it.
01:02:30.120 | A lot of people don't know what they're talking about.
01:02:31.960 | So I think 2024 is supposed to be
01:02:35.740 | the year of AI in production.
01:02:37.400 | I think we're gonna learn soon
01:02:40.440 | who bleats and to where.
01:02:42.000 | - I think that statement is the year of Linux
01:02:45.980 | on the desktop thing.
01:02:47.020 | It's just always gonna be true.
01:02:50.180 | People are always gonna be saying it.
01:02:52.600 | We're gonna be here one year later
01:02:53.760 | and then it's just like, yeah,
01:02:54.840 | this year is the year of AI production.
01:02:56.760 | And it's always gonna be incrementally more true.
01:02:59.200 | But what is the catalyst?
01:03:02.240 | What is the big event that you will point to
01:03:04.520 | and say, aha, now it's in production?
01:03:06.480 | I don't know.
01:03:07.320 | - I think, actually, the thing is that it's not in production.
01:03:10.640 | A lot of companies, it's funny.
01:03:13.000 | One, they're just like an inherent timeline
01:03:16.640 | that large companies work within.
01:03:18.360 | GPT-4 came out in April.
01:03:21.280 | That's like eight months.
01:03:22.400 | It's like most companies don't buy things
01:03:24.500 | within eight months and implement them.
01:03:26.920 | So I think part of it, just like a physics time limit,
01:03:31.180 | that even people that have been really interested,
01:03:34.100 | you just cannot go through the whole process
01:03:36.760 | of getting them live to all of your customers.
01:03:38.720 | So I think we'll see more of that in good and bad.
01:03:41.640 | It's gonna be a lot of failures
01:03:43.240 | and a lot of successes, hopefully.
01:03:45.400 | Yeah.
01:03:47.320 | - Yeah, any other commentary on tooling,
01:03:49.600 | RAG, ops, anything like that?
01:03:51.520 | I mean, I always tell people,
01:03:54.400 | as much as I'm interested in fine tuning,
01:03:55.680 | I think RAG is here to stay.
01:03:57.760 | Don't even doubt it.
01:03:59.560 | This is a necessary part
01:04:01.000 | that every AI engineer should know.
01:04:03.480 | - Yeah, well, I think, yeah,
01:04:05.640 | it's tied to the infinite context thing, right?
01:04:08.280 | I think the leftover question is like,
01:04:10.840 | do you wanna have infinite context
01:04:13.280 | and hope that the model is good enough
01:04:15.200 | at parsing which parts matter to your query?
01:04:18.200 | Or do you wanna use RAG
01:04:20.120 | and do very specific context injection?
01:04:23.560 | I think so far, most people will say,
01:04:25.320 | I'd rather do a context injection
01:04:27.220 | with just what I care about
01:04:28.640 | than put a whole document in there
01:04:30.000 | and hope the model gets it.
01:04:31.060 | But maybe that changes.
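[Editor's note: a minimal sketch of "context injection with just what I care about." `embed` and `vector_store` are hypothetical stand-ins for whichever embedding model and vector database you actually use.]

```python
def build_rag_prompt(question: str, vector_store, embed, k: int = 4) -> str:
    """Retrieve the top-k chunks for the question and inject only those."""
    query_vec = embed(question)                       # e.g. an embedding API call
    chunks = vector_store.search(query_vec, top_k=k)  # nearest-neighbour lookup
    context = "\n\n".join(chunk.text for chunk in chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```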
01:04:33.660 | - I don't, I like--
01:04:34.780 | - Yeah, no, I mean--
01:04:35.620 | - There's no way it changes.
01:04:37.820 | (laughing)
01:04:38.900 | - Hey, you know, that's great for LlamaIndex.
01:04:41.140 | - Yeah, yeah, no, it's great.
01:04:41.980 | - You know, like, great luck
01:04:42.800 | is gonna make a lot of money, I guess.
01:04:43.640 | - Yeah, yeah, yeah.
01:04:44.460 | No, it's not clear
01:04:45.300 | that they're gonna make a lot of money, right?
01:04:46.660 | 'Cause they're just an open source project.
01:04:47.980 | I don't think they've launched a commercial thing yet.
01:04:50.380 | - I don't think so.
01:04:52.940 | Because, yeah, Jerry was talking about it on the podcast,
01:04:55.460 | but it wasn't, yeah.
01:04:58.180 | - Yeah, so, I mean, we'll see what they launch this year.
01:05:00.620 | I do have--
01:05:01.460 | - The year of AI in production.
01:05:02.820 | - The year of LlamaIndex in production.
01:05:04.700 | - Yeah.
01:05:05.700 | Okay, so that's the four wars.
01:05:07.260 | We also covered a bunch of other non-wars
01:05:10.580 | that we skipped over.
01:05:12.380 | I did remember that you actually just published
01:05:15.540 | a piece on the semantic versus--
01:05:17.620 | - The syntax, the semantics.
01:05:19.140 | - Do you wanna cover that as a--
01:05:21.220 | - Yeah, I think, like, I kinda mentioned this
01:05:24.260 | a couple times on the podcast,
01:05:25.940 | but basically, the idea of, like,
01:05:27.540 | code has always been the gateway to programming machines,
01:05:31.940 | and we spend a lot of time making it easier.
01:05:34.140 | So you go from punch cards, to COBOL, to C, to Python,
01:05:39.140 | just to make it easier for the person
01:05:42.660 | to read and write the code.
01:05:44.460 | And through it, we started adding
01:05:45.940 | kind of, like, these semantic functionalities in it.
01:05:48.300 | So in Python, you can do array.sort.
01:05:51.460 | You don't need to know bubble sort.
01:05:52.860 | You don't need to know any algorithm
01:05:54.300 | that you learned in school to do it.
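[Editor's note: the sorting example, spelled out. One semantic call states the intent; the bubble sort is the algorithm you would otherwise have to write by hand.]

```python
names = ["llama", "mixtral", "claude", "gemini"]

# Semantic: say what you want.
print(sorted(names))

# Syntactic: the bubble sort from school, which nobody needs to write anymore.
def bubble_sort(items):
    items = list(items)  # work on a copy
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

print(bubble_sort(names))  # same result, many more lines
```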
01:05:56.660 | And I think the models are kind of like 100X-ing this,
01:06:00.540 | which is, like, now all you need to do
01:06:02.260 | is, like, create a sign-up form, you know,
01:06:05.580 | where people put a name, email,
01:06:07.020 | and send it to this endpoint.
01:06:09.980 | So it's gonna be a lot easier for people
01:06:12.940 | that know the semantics of the business,
01:06:14.980 | which is, you know, your product managers,
01:06:16.900 | or your business people,
01:06:18.420 | the layer that goes from customer requirements
01:06:22.300 | to implementation, basically,
01:06:24.580 | and have them intervene in the code.
01:06:27.060 | So, you know, how many times, as an engineer,
01:06:30.260 | you have to, like, go change some button color
01:06:32.980 | or, like, some button size,
01:06:34.260 | like, these small things that, like,
01:06:36.060 | you really shouldn't be doing.
01:06:37.900 | And now you can have people with natural language
01:06:40.580 | intervene in the code and write code
01:06:42.900 | that can actually be merged and put in production.
01:06:46.180 | I also wrote the bear case for it,
01:06:47.620 | which is, like, we already have so much trouble
01:06:49.940 | getting engineering teams to collaborate
01:06:52.140 | and get all their changes together
01:06:53.820 | without conflicts and all of these things,
01:06:55.740 | and maybe also having non-technical people
01:06:58.740 | try and do things will be hard.
01:07:00.420 | And models are,
01:07:01.940 | they just think about solving the task at hand.
01:07:05.020 | They don't think about,
01:07:06.140 | I've always told my engineers, it's like,
01:07:08.140 | you need to leave the code base better than you found it.
01:07:10.260 | You know, if you're, like, writing something,
01:07:11.860 | it's like, just,
01:07:13.420 | we cannot always keep adding, like, quick hacks, you know?
01:07:18.740 | And I think models are great at quick hacks,
01:07:20.980 | but sometimes it's like, oh,
01:07:22.820 | this is, like, the 16th button
01:07:24.500 | that you've changed a style for,
01:07:27.180 | you should make a class for it.
01:07:28.460 | That's, like, the dumbest example.
01:07:30.580 | So I think that's,
01:07:32.420 | if that happens,
01:07:34.900 | then I think I'll be a lot more bullish
01:07:36.460 | on, like, coding agents, you know?
01:07:38.820 | But I think now that's kind of the,
01:07:40.460 | until you can have non-technical people
01:07:43.340 | manually query models and look at results
01:07:45.940 | and then say, this is ready to go,
01:07:47.940 | it's gonna be hard to have autonomous agents do it, so.
01:07:51.460 | - Yeah, so I actually had a tweet about it today
01:07:54.660 | because Itamar from Codium actually published
01:07:57.820 | "From Prompt Engineering to Flow Engineering,"
01:08:01.340 | his next evolution of prompt engineering.
01:08:03.780 | And they've been working on, you know, in IDE agents.
01:08:08.100 | They call it agents,
01:08:09.220 | you can debate about the definition of an agent,
01:08:12.460 | you know, at the end of the day.
01:08:13.580 | So my split of it is inner loop versus outer loop,
01:08:16.980 | which I think you understand that.
01:08:18.580 | Maybe I have to explain it to the audience,
01:08:19.900 | because every time I talk about it to developers,
01:08:21.500 | they don't, they've never heard of it.
01:08:23.500 | So inner loop is everything that happens
01:08:25.460 | between Git commits.
01:08:27.260 | Outer loop is everything that happens
01:08:30.140 | after the commit is pushed up for a PR.
01:08:34.980 | So maybe that's too reductive,
01:08:37.540 | but that's something like that, right?
01:08:38.460 | Like, inner loop happens within your IDE,
01:08:40.580 | outer loop happens in GitHub, something like that.
01:08:43.500 | Okay, so I think your conception of an agent
01:08:47.300 | is outer loop-y, especially if it's non-technical, right?
01:08:50.460 | Like the dream, like you mentioned
01:08:52.100 | sweep.dev in your write up.
01:08:53.740 | And there's also CodeGen, there's also maybe Morph,
01:08:58.060 | depends what Morph is doing.
01:09:00.100 | And there's a bunch of other people all doing this stuff.
01:09:02.940 | Even small developer was also like,
01:09:04.820 | you know, write in English and then create a code base.
01:09:08.180 | And I think it's just not ready for that.
01:09:10.980 | Outer loop is a mirage,
01:09:13.460 | it's like going to forever be five years away.
01:09:17.100 | And the people working on inner loop companies
01:09:18.940 | have been the right bet.
01:09:20.860 | And you can work on inner loop agents.
01:09:22.660 | I think actually Code Interpreter is an inner loop agent
01:09:26.740 | in a sense of like, it's like limited self-driving, right?
01:09:31.100 | It's kind of like, you have to have your attention on it,
01:09:36.100 | you have to watch it, it can only drive a small distance,
01:09:38.940 | but it is somewhat self-driving.
01:09:40.580 | And so I think if you have this like gradations
01:09:42.660 | in your outlook on autonomous agents,
01:09:44.620 | and you don't expect everything to jump to level five
01:09:47.580 | at once, but if you have a idea of what level one,
01:09:50.260 | two, three, four, five looks like for you,
01:09:52.020 | I haven't really defined it apart from this concept
01:09:54.700 | of inner loop versus outer loop.
01:09:56.040 | But once you've defined it, then you can be like,
01:09:57.580 | oh, we're making real progress on this stage.
01:10:00.060 | And this other stage, too early for now,
01:10:02.940 | but at some point somebody will do it.
01:10:04.900 | - Yeah, yeah.
01:10:06.340 | I think like, yeah, maybe level one is like,
01:10:09.020 | I think of it more as just the auto-completion in the IDE.
01:10:13.420 | Level two is like asking Cursor,
01:10:15.820 | hey, how can I make this change?
01:10:18.660 | But then level three should be like,
01:10:21.260 | to me it's like, we need to separate the inner loop
01:10:24.060 | from the IDE, you know?
01:10:25.980 | Like, I need to make a code change.
01:10:28.600 | Sometimes I shouldn't go in the IDE.
01:10:30.740 | Sometimes I should be in the UI of the product
01:10:34.780 | and say, hey, that needs to be changed.
01:10:36.360 | Kind of like the, all the preview environments companies
01:10:40.220 | want you to put comments, the PMs put comments.
01:10:43.420 | Like, how do you go from that to code changes?
01:10:46.860 | There should be enough there
01:10:48.500 | to make the code changes happen
01:10:51.380 | through a supervised interface.
01:10:53.100 | - That's the outer loop.
01:10:54.740 | - Yeah, but that's kind of like,
01:10:56.260 | I think what these models are doing is like changing
01:10:59.940 | where the loop starts and ends, you know?
01:11:02.260 | Because now you can create code in the outer loop,
01:11:04.660 | which before you couldn't do.
01:11:07.100 | - That's the dream, that's the dream.
01:11:09.540 | - Yeah.
01:11:10.380 | - Yeah, I have, yeah, anyway, my focus right now,
01:11:14.500 | I'll say if anyone cares, it's like,
01:11:16.540 | I think the only thing that's working is inner loop
01:11:19.140 | and you should just use inner loop things aggressively,
01:11:21.540 | build inner loop things aggressively, invest in them,
01:11:24.460 | and then keep an eye on the outer loop stuff.
01:11:26.680 | - Yeah.
01:11:27.520 | - It's still very early.
01:11:28.500 | I did invest in CodeGen, Jay Hack's thing,
01:11:32.020 | which we mentioned briefly in the Sourcegraph episode.
01:11:36.940 | Do we have other things that we want to mention
01:11:38.460 | or do you want to sort of keep it to the four wars?
01:11:40.940 | - I think that's great.
01:11:41.780 | I thought it was going to be much shorter,
01:11:43.340 | but we're at one hour, 15 minutes.
01:11:46.400 | I thought we were going to run through everything.
01:11:48.020 | - Yeah, I mean, are there, like, okay,
01:11:50.100 | maybe like top two things from December
01:11:53.540 | that you have commentary on.
01:11:55.660 | - I think the needle in a haystack thing.
01:11:59.940 | - Okay, maybe you want to explain that first.
01:12:01.100 | - Yeah, basically, like, Anthropic,
01:12:05.100 | there was like one example floating around
01:12:07.740 | about Claude's context window,
01:12:09.660 | and you basically gave it this like super long context
01:12:12.020 | on I think like things to do in San Francisco
01:12:14.460 | or something like that.
01:12:15.620 | And then it was like,
01:12:16.460 | what is the most fun thing to do in SF?
01:12:18.300 | And it didn't always get it, and
01:12:21.100 | they made this nice chart of like, okay,
01:12:22.940 | based on where it is in the context,
01:12:24.820 | it gave a better or worse response.
01:12:26.660 | And then Anthropic responded and they were like,
01:12:29.740 | oh, you just need to add here's the most relevant sentence
01:12:33.100 | in the context as part of the assistant prompt.
01:12:36.540 | And then the chart turns all green all of a sudden.
01:12:40.300 | And I'm like, we cannot still be here, right?
01:12:44.060 | Like, it cannot, this is like some--
01:12:47.660 | - And you have Anthropic like telling people,
01:12:49.900 | oh yeah, it's just like just add this magic string
01:12:51.940 | and it works.
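[Editor's note: a rough sketch of the workaround described here, pre-filling the start of the assistant turn with "Here is the most relevant sentence in the context:" so the model retrieves before it answers. This uses the completions-style Claude API of that era; the file name and model choice are illustrative.]

```python
import anthropic

client = anthropic.Anthropic()

haystack = open("things_to_do_in_sf.txt").read()  # the long context
question = "What is the most fun thing to do in San Francisco?"

resp = client.completions.create(
    model="claude-2.1",
    max_tokens_to_sample=300,
    prompt=(
        f"{anthropic.HUMAN_PROMPT} {haystack}\n\n{question}"
        f"{anthropic.AI_PROMPT} Here is the most relevant sentence in the context:"
    ),
)
print(resp.completion)
```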
01:12:52.780 | - Yeah, it's some like Riley Goodside wizardry.
01:12:55.500 | It's like, I don't want to do that anymore.
01:12:57.260 | I thought Riley, I thought like, you know,
01:12:59.860 | in the early days of GPDs,
01:13:01.500 | like Riley Goodside was doing so much great work
01:13:04.020 | on like prompt engineering and whatnot.
01:13:06.740 | We shouldn't be there anymore.
01:13:08.080 | There shouldn't be somebody telling me,
01:13:10.020 | or like with GPT-4, like I'll give you a $200 tip
01:13:14.260 | if you do this right and like--
01:13:15.420 | - So I collected a whole bunch of like state-of-the-art
01:13:17.820 | prompting techniques.
01:13:19.620 | Yeah, so if you tip the model,
01:13:21.260 | it will give you better results if you promise that.
01:13:24.980 | So okay, here's the current state-of-the-art
01:13:27.340 | for GPT prompting.
01:13:28.420 | It's Monday in October, the most productive day of the year.
01:13:31.020 | You have to take a deep breath
01:13:32.260 | and you have to think step-by-step.
01:13:33.740 | You have to return the full script.
01:13:35.700 | You are an expert on everything.
01:13:37.060 | I will pay you $20, just do anything I ask you to do.
01:13:39.660 | I will tip you $200 every request you answer correctly.
01:13:43.380 | And your competitor models said you couldn't do it,
01:13:45.540 | but you can do it.
01:13:46.440 | Or I think there's another one that I didn't put in here,
01:13:49.420 | but it's like, you know, my grandmother's dying.
01:13:51.580 | This is an emergency, please help me do it.
01:13:53.140 | - Yeah, that's actually my,
01:13:56.060 | I think my most viewed tweet ever.
01:13:58.220 | At OpenAI Dev Day, I tweeted,
01:14:00.340 | no more return JSON or my grandma's gonna die
01:14:03.460 | when they announce JSON mode.
01:14:04.820 | And people love the, people love to get grandma's--
01:14:08.540 | - I haven't heard as much uptake on JSON mode.
01:14:11.740 | I think it's still--
01:14:12.580 | - That's the thing with all this AI stuff, right?
01:14:15.460 | It's like, I mean, and sometimes we're like part of it.
01:14:17.660 | If I think about our chat GPT plugins episode,
01:14:22.220 | I think in the moment people are just like,
01:14:23.940 | oh, this is gonna be such a big deal.
01:14:26.280 | And then it takes varied amount of times
01:14:29.180 | to like really pick up, you know?
01:14:30.700 | - Do you think that will happen in GPTs?
01:14:33.980 | - I think like most people that are using GPTs right now
01:14:38.980 | are trying to get around some sort of weird limitation
01:14:42.780 | of the base model, you know,
01:14:44.060 | or just trying to have a better system prompt.
01:14:46.620 | But like at some point there's limited value
01:14:50.140 | to get out of it.
01:14:50.980 | So the question is like,
01:14:51.800 | what's gonna incentivize people to build more on it
01:14:54.760 | versus just building their own thing out of it?
01:14:58.740 | I don't know.
01:15:00.060 | - Yeah, okay, so I guess my pick for highlight
01:15:03.980 | of last month, there's two.
01:15:05.620 | One, we finally got Gemini.
01:15:07.560 | - Right.
01:15:08.580 | - I think the marketing was dishonest.
01:15:10.620 | - Yeah, we need the soundboard.
01:15:12.180 | - Wham, wham, wham.
01:15:13.860 | - But still, it is a sort of model.
01:15:17.540 | It is a very, very credible alternative to OpenAI.
01:15:20.860 | And we should be happy for that
01:15:22.220 | because otherwise we live in an OpenAI-only world.
01:15:25.220 | And Gemini is basically the only other
01:15:27.820 | sort of leading contender until Llama 3 drops
01:15:30.460 | whenever Llama 3 comes out.
01:15:32.100 | - It's kind of, I mean, Zuck said today they're training it.
01:15:35.740 | - Yeah, it sounds like today they're training it.
01:15:38.340 | For me, I guess I'm still very interested
01:15:41.020 | in like the hardware metagame.
01:15:43.180 | This is a much smaller stakes, but very personal.
01:15:47.260 | I think recently, especially, you know,
01:15:50.460 | we're recording this mid-January.
01:15:52.900 | So after CES, after Rabbit R1 launched,
01:15:55.900 | I think there's a lot of interest in hardware.
01:15:58.940 | I don't know how you feel about it
01:16:00.220 | as an enterprise software investor,
01:16:02.220 | but I think that hardware is hard,
01:16:05.420 | but also it captures context and it makes AI usable
01:16:09.380 | in ways that you cannot currently think about.
01:16:14.020 | And everyone dreams of building an assistant like her
01:16:17.700 | in the movie "Her."
01:16:19.020 | That is a hardware piece.
01:16:20.100 | That is actually not only software.
01:16:21.740 | And probably the hard part is the engineering
01:16:23.820 | for the hardware.
01:16:24.660 | And then the sort of AI engineering
01:16:27.100 | for the assistant within the hardware.
01:16:29.380 | So yeah, I mean, yeah, I'm an investor in tab.
01:16:32.900 | I see a lot of like, you know, interest this month,
01:16:37.900 | but it started last month with the launch of Humane as well.
01:16:40.540 | I don't know if you have thoughts on any of those things.
01:16:43.180 | - Well, I think this year we also get
01:16:44.780 | the Apple Vision Pro thing.
01:16:46.500 | So I think there's gonna be a ton of experimentation.
01:16:50.380 | I think Rabbit got the right nostalgia factor.
01:16:54.780 | You know, it kind of looks like a toy that looks like
01:16:57.980 | a Game Boy Advance, something like that.
01:17:00.500 | I'm curious to see what they've got beyond that.
01:17:04.820 | I think, yeah, I mean, obvious,
01:17:06.100 | like right where we have the studio building tab.
01:17:09.740 | And I think that's another interesting form factor.
01:17:12.260 | And I think if you ask them, I think in our circles,
01:17:15.700 | a lot of people are like, well, what about privacy
01:17:18.320 | and all these things?
01:17:19.160 | But he will tell you that we're kind of like a special group
01:17:23.460 | that most people value convenience over privacy,
01:17:26.500 | as we've learned from social media over the last few years.
01:17:30.100 | So yeah, I'm really curious to see how it develops.
01:17:34.100 | - I really like technology where it's,
01:17:36.500 | you're slightly uncomfortable with it on a social level.
01:17:39.120 | And so, you know, for Uber,
01:17:42.500 | it was like this regulation around taxis.
01:17:44.380 | For Airbnb, it was, you know, staying in strangers' homes.
01:17:47.860 | And now it turns out for OpenAI,
01:17:49.500 | it was training on people's content.
01:17:52.980 | - Right.
01:17:53.820 | - Right, now it's becoming a matter of regulation.
01:17:56.500 | And OpenAI's data partnerships are, you know,
01:17:58.800 | a form of private regulatory capture,
01:18:01.020 | which is a playbook that is fantastic.
01:18:02.700 | Like if you, I hope it was on purpose,
01:18:04.860 | because whoever did that is a genius.
01:18:07.060 | So I'm like, okay, like, you know,
01:18:09.180 | I do think that every great new company,
01:18:12.540 | especially on the consumer side,
01:18:13.740 | is provocative in that sense.
01:18:14.980 | Like they're doing something that is not yet kosher.
01:18:18.340 | And so I think, like, the Humane's, the Tabs,
01:18:21.700 | anything that is working on that front where it's like,
01:18:25.100 | yeah, I'm not sure I'm comfortable with this.
01:18:27.180 | And then, but maybe it could change.
01:18:28.860 | That is a really interesting shift.
01:18:31.420 | So I'm excited from that point of view,
01:18:34.460 | but at the same time, most hardware companies fail very quickly.
01:18:38.540 | They have a very hot start and then, you know,
01:18:40.180 | everyone puts it in their drawer
01:18:41.620 | and then never looks at it again.
01:18:43.600 | So I'm very, very aware of that.
01:18:45.780 | But I think it's, I mean, it's something interesting.
01:18:47.540 | And I do think, so here's the core thing of it, right?
01:18:50.420 | Avi doesn't think it's a hardware company.
01:18:53.340 | Avi, like most of the cost of the $600 for Tab
01:18:57.860 | is going towards GPT costs,
01:18:59.820 | because it's actually processing context.
01:19:01.660 | And the whole idea is that context is all you need.
01:19:03.820 | Like in this world of like, you know, AI applications,
01:19:07.420 | like whoever has the most unique context wins, right?
01:19:10.220 | A unique context could be the quality data war, right?
01:19:12.180 | Like a unique context is like, you know,
01:19:13.720 | I have Reddit info, I have Stack Overflow info,
01:19:15.520 | I have New York Times info.
01:19:17.600 | If I have info on everything you say and do at all times,
01:19:21.560 | that is something that no one else has.
01:19:25.440 | And if he becomes a good store of that,
01:19:28.800 | then like, what can you build with that?
01:19:31.080 | So I'm most excited for him to expose the developer API,
01:19:33.920 | 'cause then I can come in and do all my software stuff.
01:19:36.480 | But he has to build a hardware layer
01:19:38.080 | and get acceptance for that first.
01:19:39.960 | - Right, yeah, no, I'm excited to see.
01:19:43.580 | I'm sure we're gonna see a lot of people
01:19:45.340 | work around with them.
01:19:46.180 | So I'm excited to see.
01:19:48.340 | - I actually, so I think he doesn't like me
01:19:50.260 | because I asked for an off button.
01:19:52.300 | I guess I wanna be able to guarantee you,
01:19:55.020 | if we're having a conversation,
01:19:56.540 | I wanna show you, you see, it's off, right?
01:19:58.500 | It's kind of like, oh yeah, my phone is on silent mode.
01:20:01.260 | Right, there's a physical silent mode button.
01:20:03.580 | But now he just wants it to be always on.
01:20:05.580 | - That's a whole new market.
01:20:08.300 | Like a soundproof storage for your AI pendant
01:20:13.300 | so that you can guarantee the pendant cannot hear you.
01:20:16.700 | Awesome, no, this was fun.
01:20:21.620 | Please, if you're still listening after one hour,
01:20:24.320 | 21 minutes, let us know what we did right,
01:20:27.160 | what we did wrong, what you would like to see differently.
01:20:29.960 | It's the first time we tried this out, but yeah.
01:20:33.280 | - Awesome, thanks for doing this.
01:20:34.320 | - Cool.
01:20:35.600 | (upbeat music)