Back to Index

The Four Wars of the AI Stack - Dec 2023 Recap


Chapters

0:00 Intro
1:42 The Four Wars of the AI stack: Data quality, GPU rich vs poor, Multimodality, and RAG/Ops war
3:35 Selection process for the four wars and notable mentions
8:11 The end of low background tokens and the impact on data engineering
10:10 The Quality Data Wars (UGC, licensing, synthetic data, and more)
21:44 The GPU Rich/Poors War
26:29 The math behind Mixtral inference costs
34:27 Transformer alternatives and why they matter
41:33 The Multimodality Wars
45:40 Multiverse vs Metaverse
54:00 The RAG/Ops Wars
60:00 Will frameworks expand up, or will cloud providers expand down?
65:25 Syntax to Semantics
67:56 Outer Loop vs Inner Loop
71:00 Highlight of the month

Transcript

- Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners. And today I'm joined just by my co-host, Swyx, for a new podcast format. - Yeah, and it's a bit uncomfortable because we have to just stare into each other's eyes lovingly.

But in our end of year survey last year, a lot of listeners were asking us for more one-on-one time, more opinions from the both of us as hosts on what's going on in AI. You know, both of us are very actively involved. (laughing) And I don't think this year will be any different.

This year, there's lots more excitement to come. And I think, you know, we're trying to grow Latent Space in terms of the types of formats and the amount of value that we deliver to our subscribers. So one thing that we've been trying, experimenting with, is this monthly recap that I started doing around August of last year, where I basically just take the notable news items of the month and then I sort them and categorize them according to some order that makes sense and write them down in the newsletter.

And this last December recap was particularly exciting 'cause it seemed like it popped off in a number of areas, particularly with the AI breakdown. Our friend NLW featured it on his podcast. And I figured we can just kind of go over that as a way of setting the stage for 2024, but also recapping what happens in 2023.

- Yeah, and people always ask me if December is like a slow month, but I think you almost broke Substack with how many links we had in the thing. - No, we actually did. So a lot of people commented to me about the formatting issues within the newsletter that I sent out.

And I know that they are there, but I couldn't fix it because Substack was broken by us with how long it was. - But so we had this kind of like four main buckets called the four wars of the AI stack, data quality, and I guess like data quantity as well, in a way.

The GPU rich versus poor, which we have a whole episode about with Dylan Patel. Multimodality, we're actually recording tomorrow with LumaLabs about their new 3D model. So we went from text to image to 3D video. I wonder what's next. - And we're gonna release Hugging Face as well. 'Cause I guess I've been thinking about calling it multimodality 101, because the first modality beyond text that you should really pay attention to is vision.

- Right, yeah. Yeah, and then the RAG/Ops war. I think that's a-- - I don't know what to call it. I don't know if you want to call it anything else. This is my-- - I don't know. But I think beginning of last year, that was like kind of the hottest space because there wasn't much open source model work.

And I think over the last maybe like four or five months, everybody's so focused on fine-tuning Llama 2 and like DPO to improve these models, Mixtral, and all these things. And people forgot about our friends at LangChain, LlamaIndex, and some of the things that were maybe top of mind.

VectorDBs, it seemed like everybody was releasing a VectorDB early in the year. - Yeah, I think that I'll be very surprised if any new VectorDBs come out this year. With one exception, which is something I'm keeping an eye on, which is Turbopuffer. I don't know if you've seen them going around.

Yeah, all the smart people seem to be adopting Turbopuffer as the first serverless VectorDB, which could be interesting. - Yeah, no, and we're going to have definitely Jeff and Anton on the podcast at some point. I know they're going to be fun, I guess, but... - I should also mention, I think it's interesting.

So the reason I selected these four wars was a process of elimination of wars that I think ended up not mattering. So for those who don't know, inside of my writing, I often include footnotes that are in themselves just essays in the footnotes. And so I think it's also notable, the things that people thought were hot, that were less hot than expected.

So it was agents, definitely less hot than at the start of 2023. And then this one is a very controversial, non-selection by me, I think. Open-source AI is not a battle in the sense that I don't think there's anyone against open-source AI. Everyone is on one side. There's no opposing side apart from regulators.

But in my mind, when I think about for engineers, engineers are all universally in favor of open-source models. So there's no battle here. Everyone just wants it to improve. So it's not interesting to write about. We just want more open-source. - Yeah. The only battle is people offering inference on it.

- Yes. - Killing each other in the process. - Yeah, so I classified that as a GPU rich versus poor war. But maybe there's a better way to classify that. And you can give me some feedback on that because it's a struggle to try to categorize the world. Code models as well.

I was very struck by a conversation I had with Poolside. Eiso Kant from Poolside. So they haven't been on the podcast yet. They're kind of stealth still, but they had a very, very notable fundraise. I think they had like $50 million raised. - I think even more, yeah.

- For a seed. Spending most of it on GPUs. And my conversation with Eiso, he was like, "Hey, you know, like Replit was like one of our podcast's early biggest winners." Replit didn't really follow up. Like they announced their 1.5 model, but it's not really widely used beyond Replit.

There's StarCoder, there is Code Llama, but like it's not really, for how important code is, it doesn't seem like as big of a battlefront as just general function calling, reasoning, these other kinds of domains. And so I thought it was just interesting to note that even though we as a podcast try to pay particular attention to developer tooling, to code models, we interviewed Cursor, Phind, Replit, Codium, and Hugging Face.

These all seem like very small compared to the amount of money being thrown, the amount of heat in the other domains. And I don't know why that is. - Yeah, I think it's maybe the fragmentation of the tooling. Like most people in code are using VSCode, Cursor, GitHub, one of the three, so there's maybe not as much experimentation versus with text, people are just trying everything.

It's hard to try a code model. I see code models being released, but it's not super easy to just plug it into your workflow. So I think engineers like myself are just lazy. And it's like, hey, I'm having great success with whatever I'm using. I don't really wanna go there.

- Special case form of code is SQL and the semantic layer data engineering type things. We also had two guests on there from Seek and Cube. And we also talked to a bit of Databricks, a bit of Julius. - Yeah, and we have Brian from Hex. - And Brian from Hex.

Does he count? I don't know. - Yeah, no. - Yeah, yeah, yeah. I guess the Hex notebooks, yes. Hex Magic, yes. RecSys is a different beast. Anyway, but yeah, I think people who come to AI engineering for the AI might actually end up finding themselves in data engineering in the end.

And in traditional ML engineering in the end, they might have to discover that they're doing RecSys. And all the stuff that gets swept under the rug in a demo becomes their job. And I think I'll probably say just because we didn't select a theme for last year doesn't mean it wasn't important.

It just wasn't top of mind yet. And maybe I think that would be an emerging theme this year. - Yeah, I think that's kind of the consequence of the low background tokens, like the end of the low background tokens. Once-- - Can you explain what you think are low background tokens?

This was our November recap. - Yeah, well, the comparison that our friend Jeff Huber at Chroma brought up is steel before the atomic bomb creation. So steel before and no radiation in it. After all the testing, a lot of steel had radiation embedded in it. So it was really precious to get low background steel, meaning with no radiation and same with tokens.

You can assume that any internet content from three years ago, it's just internet. It doesn't have, it's like people writing, it's not models writing. Instead now, anything we're gonna get on Common Crawl, updates and things like that, you never know if it's human written or not. And I think that will put more work on data engineering.

Because even basic stuff like checking if a text says "as a model created by OpenAI" is gonna be important. So far people have just been blindly taking all the datasets offered by EleutherAI and Common Crawl and all these different things, assuming that all the data in them is good.
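For readers who want a concrete picture of what that "basic stuff" looks like, here is a minimal sketch of a contamination filter. The phrase list, helper name, and sample documents are illustrative assumptions, not an established data-engineering pipeline.

```python
# Minimal sketch of dropping obviously model-generated text before training.
# The phrase list and threshold-free check are illustrative, not a standard tool.
LLM_TELLS = [
    "as an ai language model",
    "as a language model trained by openai",
    "i'm sorry, but as an ai",
]

def looks_model_generated(text: str) -> bool:
    """Return True if the document contains an obvious LLM boilerplate phrase."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in LLM_TELLS)

docs = [
    "Here is my 2019 blog post about sourdough starters.",
    "As an AI language model, I cannot provide medical advice.",
]
clean = [d for d in docs if not looks_model_generated(d)]  # keeps only the first doc
```

In practice this kind of string matching only catches the laziest contamination, which is part of why the data-engineering work keeps growing.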

I think now, how do you build on top of it? And we've seen the New York Times lawsuit against OpenAI. We've seen data partnerships starting to rise in different companies. I think that's gonna be one of the bigger challenges and maybe we'll see more of the work that Databricks has done to build the Dolly 15k instruction tuning dataset, just first-party creation of data.

It's like, you got people sitting at their desk every day. If everybody wrote five Q&A pairs or things like that, you would have a massive unique data set for your model. So, yeah. - Yeah, for people who missed that episode, that was one of our early episodes as well.

And Mike Conover since left to start BrightWave, which I'm sure we'll have him back this year. - Yeah, they're doing a lot of interesting stuff. I think the next episode will be very cool. - Awesome. So how do you want to tackle this? Do you want to just kind of go through the four wars?

- Yeah, let's do it. You have, you created this Wikipedia-like infographic for each of them. - Yeah, I should say, the inspiration for this actually was during the Sam Altman leadership battle, people were making mock Wikipedia entries for the debate and for like who's on the side of the D-cells and who was inside of the EX.

And so I like that format because it's very concise. It has the list of key players and it's kind of fun to think about like who's on what side and think about what is important and what people are battling over. I think it is important to focus on key battlegrounds as a concept because there's so many interesting things you could be talking about in AI and they're not all equally interesting.

So how do you decide what is interesting? I think it's money, it's power, it's people, it's like impact, that kind of stuff. And so, yeah, that's what I ended up doing. And so fun fact, the way I did this was I actually edited the HTML on Wikipedia and then I just screenshotted it just to get the formatting.

- Good old developer tools. Developer tools is all you need. So the data war, belligerents. On one side you have journalists, writers, artists, on the other side you have researchers, startups, synthetic data researchers. I guess like maybe we wanna talk about what are the axes of the war. So like one of them is attribution, right?

Like I think there's a varying spectrum of how comfortable people are about this data going into a model. So some people are happy to have your model trained on it, no matter what. Some people are happy to have your model trained on it as long as you disclose that it's in the model.

Some people just hate that you trained on their data, and some people, like the New York Times, want you to destroy any artifact that might have touched their articles. So that's kind of what we're fighting on. It's not always, I just wanna make it clear that it's not just like you should never use the data or you should always use the data.

I think people are just trying to figure out what's the right form of attribution and how do I get paid as somebody whose data ended up being in this training. I think we're giving everybody a lot of great tokens at Latent Space because we do full transcripts on everything, and we're happy for people to train models on them.

- Oh yeah, please train a latent space model. - Yeah, we would love it. So that's kind of what we're fighting on. Anything that people should keep in mind about this war and like maybe some of the campaigns that are going on? - So I think the New York Times one is probably going to go to Supreme Court.

It is very, very critical. It is a landmark case that will probably decide what fair use means in the context of AI. So I think it's, and I recommend, I think The Verge did a good analysis of this. Platformer maybe did a good analysis of this. There are like four criteria for what fair use is, and everyone basically converges onto the last criterion, which is does your use, does your transformative use of my copyrighted material diminish the market for my content?

And it's very hard to say. I always suspect that yes, in some capacity, in some amount, but good luck proving that in a court of law. And I think a negative ruling on OpenAI would seriously stall the progress of AI. And that's bad for humanity, but good for content creators and writers, who obviously we want to be adequately compensated and recognized for their work.

So there's like no good, there's like no easy outcome here apart from the existing copyright system which is also somewhat broken. And it's just a very, very tricky, challenging case, I think. Yeah, so. - It's funny because we had something, I was a community moderator at a website called Rap Genius, which was a lyrics annotation site.

And there was like a similar thing in maybe like 2014 where the music labels basically came to the website and it's like, hey, this is not fair use. Like you cannot reuse the lyrics to the song, and eventually the website made deals with the record labels to like be able to do this.

And then Google was stealing the transcripts to put in like the enhanced thing. - And they proved it by... - Yeah, yeah, we did all the, like, basically the thing with the i's, some i's we put the dots, some i's we put like the accent, and that's how we proved it.

- I thought they just varied the spacing or they like used a different kind of spacing in the Unicode. - I think it was the i thing, but maybe, I mean, this is like almost 10 years ago. - So Rap Genius proved it by injecting some data poison into their corpus and then Google reproduced it faithfully.

So therefore they proved that Google was scraping Rap Genius. Did Google have to pay Rap Genius money in the end? - I don't think so. - But at the same time, there was also another issue where Rap Genius got blacklisted by Google, so like, there was a lot going on.

- Of course. - But anyway, this is not a Rap Genius special. - Yeah, I mean, ultimately, like I think that we do need quality data. I think that then if this case is contained to the New York Times, the New York Times worse outcome is that they will substitute it with Washington Post and they substitute with The Economist or like the second or third ranked newspaper that is the most friendly to AI.

And then the New York Times will realize that their words are actually not that much more valuable than other words. And then the value of the content comes down very, very dramatically. So I think it will be interesting, but yeah, I do think it's overstepping their bounds to call for the destruction of all GPTs.

That's probably for sure. Then the bigger problem I have is with Stack Overflow and Reddit, which I named as on the side of the New York Times. They have effectively shut down their APIs in order to try to train their own models. Probably same as Twitter, actually. I should probably have put Twitter, I put Twitter on the wrong side, maybe.

I don't know, Twitter is on both sides. - Elon is on every side, the side of chaos. - Yeah, what this is, is basically every UGC, user-generated content company of the 2000s and 2010s, now has a giant pile of user content that becomes valuable data that used to be open for researchers to scrape and train models on.

Now all of them are locking it behind their walls, right? Behind their walled gardens, and then trying to train their own models to boost their own benefit. So this is a locally optimal outcome for them, but a globally suboptimal outcome for humanity. Because why should we care about the closed garden of Reddit?

The Reddit model, the Stack Overflow model, the X model, as opposed to it being a part of a data mix of 20% Reddit, 20% Stack Overflow, 20% X. That seems like a much better outcome for the world, but everyone is acting in their very narrow self-interest in trying to make their own model, which is probably going to suck.

- Right. (laughs) So, next war, after you get data-- - Oh, we should mention synthetic data. - Oh, yeah. So what happens when you run out of human data? You make your own. (laughs) So I would say that, when I went to NeurIPS, that was the number one discussion out of every single researcher's mouth.

There is a lot of research coming from both, I guess, the big labs as well as the academic labs on what good synthetic data looks like. I don't know if you've talked to any startups around that. I just talked to Louis Castricato the other day, and he is promising a very, very interesting approach to synthetic data generation.

I think his phrase for it is pretraining-scale synthetic data, as opposed to what Nous Research and the other open-source communities have been doing, which is fine-tuning-scale synthetic data. And so he wants to create trillion-token datasets that are all synthetic. And I'm like, okay, that's interesting, but also at the same time, these are all just downloads from GPT-4 or something else.

So Louis is very aware of that, and he has a way around it. I don't really understand it, but he claims that that's a good way around it. Andrej Karpathy at NeurIPS highlighted this paper from DeepMind where they were bootstrapping synthetic data that could be verifiably proven correct. So specifically in math and in code, where there is a correct answer.

So yeah, that makes sense. You can solve the synthetic data problem that way, but what about beyond that? There's just no answer. - And wasn't part of the issue also that the way that the phrases are constructed and all of that in synthetic data ends up making mode collapse even worse?

Because one thing is right or wrong, right? The other thing is every sample reads in the same way, or has a similar, since it comes from a certain model, kind of a similar structure at the root. - You already have, yeah. So I mentioned this in the best papers discussion with Jonathan Frankle.

So the basic argument is you already have a flawed distribution from a language model. You are resampling that flawed distribution to double down on that flawed distribution. There's no extra information from humans. So on principle, how can this work? And so the only conclusion there is you don't need it to emulate a human.

You need it to emulate a useful assistant, however you define it. So I think that the goal of synthetic data is less to emulate human speech, because that is basically solved. It is now more to spike the distribution in useful ways. And that's a phrase I borrowed from Kanjun.

But anyway, so I think that synthetic data will be a giant theme for this year, and not least because the human data is being locked up behind walls. So it's a very, very clear trend. This is probably the most amount of money after GPUs will be spent here on data.

So one war I did not put here was the talent war, right? Like the war for PhDs and smart people. But when you break down what the talent people do, one is they make models and they run inference on GPUs. Or they run training runs on GPUs. But the other is they clean data.

They find data, clean data, and format data. And so yeah, these are all just proxies for the kind of talent that is flowing back and forth. And ultimately, I think you have to focus on what they're working on, the visible output of what they're working on, which is data.

- All right, let's talk about the GPU inference war. I think this is one that has been heating up. And we actually have a bunch of these folks coming on the podcast in the next few days. - Yeah, yeah, yeah. Are we calling it compute month? - Yeah, we can figure out a name, but we have Modal, Together, Replicate.

There's a lot coming up. But basically, the Mixtral release, the MoE model, was kind of the spark of the war. I think the price went down like 90% in one week. - Yeah, I wrote 2x, 2x, 2x. But yeah, one divided by two times two times two is whatever the price is.

- Yeah, and then there was the benchmark drama between Together and Anyscale, on which one was faster, and whether or not the benchmark was really reflective of performance. - Yeah, and this was very surprisingly ugly, in a way that I think usually people try to respect each other's work, and play nice, and say nice things when people release stuff.

Even if it's a competitor, you say nice things, or you don't say anything at all. Anyscale, for some reason, they released a benchmark on which, of course, Anyscale looks the best. (laughs) Why would you release a benchmark where you don't look the best? But then, basically, everyone featured in that benchmark didn't like it, of course.

I do think there's some methodological things. So for anyone doing benchmarks, you have to understand that there's a real, real, real difference between a public benchmark that is meant for just limited testing, compared to, okay, if you're load testing us, or if you're seeing what a real enterprise customer would see, you have to give them a heads up.

You have to get a different API key, a different endpoint, and you test the real infrastructure, not the demo one. This is very common for infra companies, and I think Anyscale just neglected that, and it hurt their credibility. Anyscale is not new at this game. They should have done that.

But what was interesting was this benchmark drama reached even beyond Anyscale. We're gonna have Soumith on, and he's gonna talk about why he weighed in, 'cause Soumith doesn't represent any inference provider. He just works at Meta. But he felt like this was a very interesting debate. And I think we'll see more of this.

You have been a data investor for a while. Database companies always do this. And I think now we're just seeing this kind of fight come into the inference space. - Yeah, yeah, and I think the hardest thing is the end customer cannot replicate it. So if you give me a Postgres benchmark, I can run Postgres on my MacBook and run similar ones.

I think with models, it's just impossible. So people tell you, "This is the benchmark," and you're like, "Okay, I have to go sign up "to every single cloud now to try it." It's just not easy. And we talked about this in Benchmarks 101, which is the same with model benchmarks, right?

Just like, "Oh, this model is so much better than this." And then it's like, "Did you train on the questions?" And it's like, "What? "Oh, I don't know." So, and again, it's hard for people to just run the models and test them. So there's a lot more weight, I think, in AI on benchmarks than there is in traditional software, because nobody buys Upstash or Redis Cloud or whatever just based on a benchmark.

They try them and check performance and whatnot because they have real production-scale workloads. Here, it's like nobody's really doing anything with these models. So it's like whatever Anyscale says, I guess, is good, but then customers are gonna go try it and just decide for themselves what the right thing is.

- Yeah, yeah. And I think it's important to understand it is not just about cost. I think what the price war represented was a race to the bottom on cost. And you're like, "Okay, Deep Infra," which is a company, we're not, the name of the company is Deep Infra, "Deep Infra has promised to just always "be the lowest cost provider." Okay, fine, that's a good value proposition, but you're not only optimizing for that in a production application.

You're optimizing for latency. That's one thing. You're optimizing for uptime. That's something that you can only earn over time. You're optimizing for throughput and other forms of reliability. It starts to tail off beyond that, but there's three or four dimensions that really, really matter. If you're not table-stakes on any of those things, you're out.

You're just out. So actually, there was a really good website that was released just this week called Artificial Analysis, did you see it? Yeah, so this is what the industry needs, which is an independent third-party benchmark pinging the production API endpoints of all the providers and giving a third-party analysis of what this is.
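In sketch form, such a probe is just a timed call against each provider's production endpoint. The snippet below assumes an OpenAI-compatible streaming API; the base URL, model name, and key are placeholders, and this is not how Artificial Analysis actually implements its benchmark.

```python
import time
import requests

def time_to_first_token(base_url: str, api_key: str, model: str, prompt: str):
    """Measure time-to-first-token against an OpenAI-compatible streaming endpoint."""
    start = time.time()
    resp = requests.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        },
        stream=True,
        timeout=60,
    )
    for line in resp.iter_lines():
        # Server-sent events arrive as lines like b'data: {...}'; the first
        # streamed chunk is a reasonable proxy for the first token.
        if line.startswith(b"data: ") and line != b"data: [DONE]":
            return time.time() - start
    return None

# Hypothetical usage: point it at each provider's production endpoint, not a demo one.
# print(time_to_first_token("https://api.example-provider.com/v1", "sk-...", "mixtral-8x7b", "Hello"))
```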

I actually built a prototype of this last year. - Yeah, I was gonna say. - But I didn't like maintaining it. (laughing) I'm glad someone else is doing it just because I don't want to keep up with all these things. But still, I think it's a public service that somebody should do, so I'm glad that they did it.

I think they did it very well. So yeah, I think that is where, I guess, the inference drama is ending for now. I don't think, I haven't seen any continuing debate there. The only other thing that, I did some extra work on this for the recap, which is, are they losing money?

Are they pricing their Mixtral tokens correctly? And I actually managed to go into Dylan Patel's write-up of the Mixtral price war. And I think I reasonably worked out that you can serve Mixtral, and the lowest you can possibly charge, if you take the most aggressive amortization of all your CAPEX and all that, is 50 to 75 cents per million tokens, which is what Perplexity prices their Mixtral at.

And Perplexity is a very smart player. They're not even an inference infra provider. They're just doing this for fun. But they're like, "Yeah, we don't want "to lose money on this. "We will provide it at cost. "This is what cost is to us." So that means, so Perplexity provides it at 56 cents per million output tokens.

That means Anyscale, which is 50 cents, OctoAI, 50 cents, AbacusAI, 30 cents, and DeepInfra, 27 cents, they're all losing money. Because we think that the break-even is 51 cents. - And that's, and even that is like a full batch size and kind of max-- - No, no, no. I assume-- - Max utilization.

- I assume 50% utilization. So like, if you talk to practitioners, very, very good is 60%. Average is like 30, 40. So I just, I say 50, right? You assume 50%, batch size 16, 100 tokens per second generation. That's also very, very high. These are all very favorable numbers.
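To make the arithmetic concrete, here is a back-of-envelope sketch using the favorable assumptions above (batch size 16, 100 tokens per second per stream, 50% utilization). The GPU-hour figure and GPU count are placeholders standing in for aggressively amortized CAPEX, not numbers taken from Dylan Patel's write-up.

```python
def cost_per_million_tokens(gpu_hour_cost: float, n_gpus: int, batch_size: int,
                            tokens_per_sec_per_stream: float, utilization: float) -> float:
    """Back-of-envelope serving cost per million output tokens."""
    tokens_per_hour = batch_size * tokens_per_sec_per_stream * utilization * 3600
    return (gpu_hour_cost * n_gpus) / tokens_per_hour * 1_000_000

# Placeholder: ~$0.85 per GPU-hour under aggressive multi-year amortization,
# two GPUs to hold the model. Swap in your own numbers.
print(cost_per_million_tokens(gpu_hour_cost=0.85, n_gpus=2, batch_size=16,
                              tokens_per_sec_per_stream=100, utilization=0.5))
# ~= $0.59 per million tokens, inside the 50-75 cent band discussed here.
# Halve the utilization and the break-even doubles, which is why anything
# priced under ~50 cents looks underwater.
```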

Like, probably the real number is closer to 75 cents per million than 50 cents per million. Anyway, anyone charging under 50, definitely losing money. So then it's like, okay, you, if you, either you don't know what you're doing, which, in which case, good luck, or you know what you're doing, and you're purposely losing money for something.

And what is that? And I don't know, but I think it's an interesting, aggressive strategy to pursue if you are doing it on purpose. So this is something that, like, the classical, like Walmart, would have a loss leader. Like, they really, really, on purpose, lose money on things, so that they get you in the door to try things out.

I, like, I don't know if that makes sense to you as a UC. - Yeah, yeah, yeah. It's like the, well, it's like all the, you know, the candies are placed at the cash register, because maybe you just went to get the thing on discount, and then you buy a Kit-Kat, whatever, and then make money on the Kit-Kat.

Your kid, they all have the Pokemon trading cards at checkout now. So if you bring your kid to buy the discounted whatever for you, then you end up spending more. But to me, the thing is, like, where's the checkout register where you upsell people with these things, right? - Yeah, I don't know how you- - It's like, that's really the big thing.

Yeah, I don't know. I'm curious to see. I don't think Cloudflare still has it live. I wonder what they're gonna charge for all workers. Yeah. - They cannot serve Mixtral. Their GPUs are too underpowered. Cloudflare AI is like very good marketing for very, very underpowered inference, right? - Yeah, well, I don't know.

I think it all depends on, like, what is gonna be needed, right? So they have Mistral 7B right now, I checked. But yeah, I wonder- - They cannot serve Mixtral. - Yeah, yeah, yeah. - Okay, yeah, yeah. - I wonder, but I think they don't wanna get into this race right now, probably.

- No. - You know? - Yeah. - So yeah, I'm curious. Going back to the loss leading, it's like, is there gonna be a better model that comes next that they hope you've already integrated their thing with? You know, if you're using Together to serve Mixtral and then something else comes in that you're gonna replace Mixtral with, hopefully you're still gonna use Together and they're gonna get better unit economics on it.

I don't know. - Yeah. - It's a good question. - It's a good question. Thank you VCs for paying for all of our inference. - No, no, no. I think these are, you know, everyone in here are grown adults, they're smart investors. I'm sure there's some kind of long-term strategy here.

And I'm trying to figure that out. Like, assume that people are smart and then ask what smart people would do. - Yeah, I think it's the same with Uber, right? It's like, how could it have been so cheap at the start? You know, like you look back at DoorDash, all these things, it's like- - And like last year was a great year for Uber.

- Yeah, no, exactly. Uber friends are like suddenly very, very rich again. (laughing) One thing I will mention on like the engineering sort of technical detail side is, you know, the rise of mixture of experts is something that, you know, we covered in our podcast with George and now with Mixtral.

And it represents the first successful, really, really commercially successful sparse model. And sparse in a very interesting way, in a sense that the divergence between the amount of compute you need at training versus the amount of compute you need for inference continues to diverge, but also in a weird way where you need to keep all the weights of the MOE model loaded, even though you're not necessarily using them at all times.
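A minimal sketch of that sparsity, assuming a Mixtral-style top-2 router and heavily simplified gating, is below; it is illustrative, not Mixtral's actual implementation. The point is that only two expert FFNs execute per token, while every expert's weights still have to stay resident in memory.

```python
import numpy as np

def top2_moe_layer(x, router_w, experts):
    """x: (tokens, d_model); router_w: (d_model, n_experts); experts: list of callables.
    Each token runs through only its 2 highest-scoring experts, so compute is sparse,
    but all expert weights must stay loaded because any token may pick any expert."""
    logits = x @ router_w                                   # (tokens, n_experts)
    top2 = np.argsort(logits, axis=-1)[:, -2:]              # indices of the 2 chosen experts
    scores = np.take_along_axis(logits, top2, axis=-1)
    gates = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)  # softmax over the 2
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for slot in range(2):
            e = int(top2[t, slot])
            out[t] += gates[t, slot] * experts[e](x[t])     # only 2 of n expert FFNs run
    return out

# Hypothetical usage: 8 toy "experts", echoing Mixtral's 8-expert layout.
d = 4
experts = [(lambda W: (lambda v: v @ W))(np.random.randn(d, d)) for _ in range(8)]
y = top2_moe_layer(np.random.randn(3, d), np.random.randn(d, 8), experts)
```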

(laughing) So, I mean, basically what I think that is, is like, I think that that is going to impose different needs on hardware, different needs on workload, different needs on like batching optimization. Like Fireworks recently announced FireAttention, where they wrote a custom CUDA kernel for Mixtral on H100s, it's like super, super domain specific.

And they announced that they could, for example, quantize from like 16-bit down to 8-bit with like no loss in performance. Like all these magical details emerge when you take advantage of very, very custom optimizations like that. And I think the rise of MoEs this year is going to have very meaningful impacts on the inference market and how it's going to shape how we think about pricing for inference.

It may not be that we have this sort of input token versus output token paradigm for long, particularly because we have things like, different forms of batching, different forms of caching. And like, I don't really know what that looks like, but I'm very curious. I see a lot of opportunity here.

If I was an inference provider player, like that's something I would be trying to offer to people as a way to differentiate, because otherwise you're just an API. - Yeah, no, it was in a way counterintuitive because most of the struggles with inference as well are just like memory bandwidth, you know?

So we have now models that scale worse at higher batch. You know, but I'm glad I'm not in that business. I can tell you that. That's for, there's so much work to be done at like so many low levels of the stack. You know, you're already trying to provide value to the customer on like the developer experience and all of that.

But you also have to get so close to the bare metal to like make this model. Actually, like writing a kernel, imagine if you had to write, you're like a CPU cloud provider and you have to like write instruction sets. It's like just, nobody will get in that business, you know?

So I salute all of our friends at compute providers doing this work. And I mean, Together is doing so much with, like, Tri Dao and, like, FlashAttention too and whatnot, so. - Yeah, yeah. So, and that's something that I would leave as the last part of this sort of war of GPU rich versus poor.

So there's, the GPU rich people are the model trainers and the infra providers. They're saying like, we have the GPUs, come use our GPUs, you know, and then we provide you the best inference, right? And that's what we've been discussing so far. On the other side, on the GPU poor side, are like all the alternative methods, right?

The modulars, the tiny corps, the QLoras, and all the other types of stuff. I even put consistency models in there because, you know, any efficiency or distillation method where you go from, like you reduce your inference or GPU usage by like 25 to 40 times, is a GPU poor friendly approach.

- Right. (laughing) - So I will also put Apple and MLX in there. And that's also like, Apple is finally making moves in inference and that will be a game changer for local models because then you just don't need any cloud inference at all. You just run it on device, which is fantastic.

And then obviously RWKV and Mamba and StripedHyena from Together. Like all those emerging models. I don't know, there's something I've been worried about for Latent Space. How much attention should we give to the emerging architectures? Because there's a very good chance that one, these things don't work out.

Two, they take a very long time to work out. And then three, once they work out, they're like for limited domains and like not super usable. So I don't know if you have opinions on that. I can follow up with one conclusion that I've had, but I want to-- - Yeah, no, I want to hear it.

- Put that question open to you. So the one conclusion is RWKV and the state space models, including Mamba, have historically just been pitched as super long context models. And I'm like, that's not something I need because I'm okay with 100K context. I'm okay with RAG and recursive summarization, all those techniques to extend your context, like RoPE and YaRN and all these things.

So why do I need million context models? Why do I need 10 million, 100 million, 1 billion models? Like, why? So the more convinced, the easiest argument is, oh, you can consume very, very high bit rate things like video and DNA strands. And then you can do like syn-bio and all that good stuff.

And I'm like, okay, I don't know anything about that. Like what happens if like you hallucinate like one wrong chain in your, you know, the DNA strand that you're trying to synthesize? Good luck. I don't think, I don't know. So like, that's why I've been historically underweighting intentionally our coverage of state space models and the non-transformer alternatives until Mamba.

Mamba really changed things where basically for the same amount of compute, you can get a lot more mileage or a lot more performance for the same size of model. And then it's a different, now it's an efficiency story. Now it's a GPU poor story. It is no longer a long context story.

It is just straight up, we are strictly more efficient than transformers. I'm like, oh, okay, I can get that. Does that change anything? I don't know. - No, that makes sense. I think the people look at the slope, right? Which is like, oh, you can get the context higher and higher.

But in reality, it's like, if you kept the context smaller, instead look at the anti-slope, so to speak. It's like same context, it's like a lot less compute. - Yes. Yeah, so that was not clear to me until Mamba. And so I think that's interesting. I do think that there's a concept that I've been trying to call the sour lesson.

You know, the bitter lesson is stop trying to do domain-specific adjustments, just scale things up and it's going to work. That's general intelligence. General intelligence dislikes any attempt to imbue special intelligence inside of it. Like if you have any switch cases or if statements, like if finance do this, if something else do that, don't bother, just scale things up.

And it's going to do all of them simultaneously all better at once. That's the bitter lesson. The sour lesson is a parallel, is a corollary, which is stop trying to model artificial intelligence like human intelligence, right? Like the neuron was inspired by the brain, but doesn't work exactly like the brain.

Machine learning uses back propagation, the brain does not use back propagation. And so why should, we keep trying to create alternatives to transformers that look like RNNs, because we think that humans act like RNNs. We have a hidden state and then we process new data and we update that state.

But maybe artificial intelligence or machine intelligence doesn't work like that. And maybe we just fail every time we try. (laughs) So that's the sour lesson. Every time we try to model things. And my favorite analogy, I actually got this from, I think an old quote from Sam Altman, who was like, you know, like we made the plane, the airplane.

It was inspired by birds, but it doesn't work anything like birds, right? It just is, and it works very efficiently. Like it's probably the safest mode of transportation that we have, and it works nothing like a bird. So why should artificial intelligence work like human intelligence? And that is the philosophical debate underlying my continued cautiousness around state-space models.

Which I don't know if it's, I feel very vulnerable saying this, because I don't think there's any justification once you look at the empirical results or like the mathematical justifications for these things. But there is some grounding in philosophy that you should have when you think about, does an idea make sense?

Does it, is it worth exploring? - Yeah. Well, I think now there's a lot of work being put into it, right? And I think transformers have shown enough success that people are interested in finding the next thing. You know? - Yeah. - So before it wasn't clear if transformers were really gonna work.

So people were kind of working on them. But yeah. Okay, maybe in the 2025 recap, we're gonna have more. - Yeah, I mean, we'll try to do one before that. So we actually have a link. I don't know if you know this. Shreya Rajpal from Guardrails. She's married to Karan from-- - From Hazy.

- Sorry? - He was at Hazy, right? - Yeah, from Hazy, yeah. And so now he's started one of the other state-space model companies. I forget the name of it, so we'll see. And I'm sure this will be an emerging topic this year as well. So we won't have to wait 'til next year.

- Yeah, no, I think we're gonna have maybe the sour lesson, you know, overview. - Well, I mentioned this in the EleutherAI Discord, and then they were like, okay, so what is the spicy lesson, and what is the salty lesson, what is the sweet lesson? - I want the sweet lesson, sounds better.

Cool, talking about GPU poor, let's do multimodality. - Well, I feel that Stable Diffusion was like the first GPU poor model, you know? Everybody was running it at home. - Yes, yes, absolutely, I should, I don't know if I mentioned that. I just didn't mention it. Stability, I think in 2023, you know, they shipped incremental things.

I think, I don't know if Stable Diffusion 2 was out there, but everyone's talking about SDXL Turbo, which is an alternative to consistency models, but looks like a consistency model. They shipped video diffusion. They shipped a whole bunch of stuff, but it just wasn't as big as 2022 when they, you know, made a huge impact with Stable Diffusion.

- Yeah, yeah, I mean, it's hard to top-- - It's hard to top that. - Stable Diffusion. But yeah, Midjourney has been doing great, obviously. I actually finally signed up for a paid account last month. - Midjourney, yeah, yeah. - Yeah, I'm part of the $200 million a year.

- You have to, yeah, so now it's, what's confirmed is, I think, like a Business Week article, or Economist, or Information article, that yeah, this team has now reached at least $200 million ARR, completely bootstrapped. I think their employee count is somewhere between like 15 and 30 people. I don't know if you know exact numbers.

I have heard rumors that their revenue is actually higher than that, that was what was reported. But it's between the $200 million to $300 million range, which is crazy. Especially if it's like primarily B2C. - Mm-hmm, yeah. - Which it looks like it is. - Yeah, yeah, yeah. It's like B to Fiverr to B.

I think there's like a ton of-- - Oh, you think there's a lot of Fiverr, yeah, yeah, yeah. Midjourney specialists. - Yeah, yeah, you can like get in Discord and see what people are generating, you know? And you can see a lot of it is like product placement, ads, and a lot of stuff like that.

- Yeah, and DALL-E 3 doesn't seem to have any impact on-- - DALL-E 3 got so much worse after the GPT-4. - Really? - Like the all-in-one. Well, first of all, before you could generate four images. And then I had like very good vibes. Now the vibes are like boomer vibes.

- Oh, no. - Every time I generate something-- - The images I have here are DALL-E 3. - Every time I generate something on DALL-E, it looks like some dusty, old, yeah, like mid-2000s. - I think it's a skill issue. I think you have DALL-E 3 wrong. - No, but that was the great thing about DALL-E 3, right?

It's like it made the prompt better for you. - Yeah, yeah, yeah. Before, like literally when it first came out, I'm like, "Hey, make a coliseum with llamas." And it was like this beautiful thing. I feel like now it's not, I don't know. Again, it's a model, right? So it's like maybe I just got unlucky.

I'm in the wrong latent space, but yeah. - Yeah, there's a lot of players in this. I don't even think I put some of the players I was really excited about. Like, you know, the Imogen team spit out to create Ideogram, that was a few months ago. And I didn't even put it here because I forgot.

- It's too much, I can't keep track of all of it. - Yeah, yeah. Okay, so I will just basically say that I do think that I used to, at the end of 2022, start of 2023, I was not as excited about multimodality. Obviously, I'm more excited about it now.

I used to think that text-to-image was more like hobbyist kind of, you know, work, but $300 million a year is not hobbyist. - Right. (laughing) - It is not like, you know, just not-safe-for-work stuff, because Midjourney doesn't do not-safe-for-work. So it's real, it's a new form of art, it's citizen art.

It's exciting, it's unusual and interesting, and you can't even model this as an investor, you can't even model this on an existing market. Because like, there's just a market of people who would typically not pay for art, and now they pay a little bit for art, which is digital, not as good as human, but it's good enough, I use it all the time.

- Yeah, I'm surprised I haven't seen a return of the digital frames that were very popular during the NFT boom, people were like, "Oh." - Yeah, so the very, very first "Latent Space" post was on the difference between crypto and AI in this respect. So I called this multiverse versus metaverse.

Crypto is very much about metaverse. Let us create digital scarcity, and let us create tokens that are worth, that are limited edition, that are worth something, and then you display it probably in your PFP as your representation of yourself. And what AI represents is multiverse, which is a very positive sum instead of zero sum, where if you like a thing, okay, I'll choose a different seed, and I'll make a completely equivalent second thing, and that's mine.

And that means very different things for what value is, and where value accrues. So like, yeah, I mean, I still cling to that insight, even though I don't know how to make money from it. I think that, I mean, obviously Midjourney figured it out. I think Midjourney took the right approach there.

The other one I think I'll highlight is ElevenLabs. I think they were another big winner of last year. I don't know, did they announce their fundraise? I think so. - I don't know. - Rumor is-- - Yeah, rumor is. - Rumor is, I can say it, you don't have to say it, because I only heard it from my friends.

Rumor is they're now a unicorn. And they just focus on voice synthesis, which again, I did not care about at the start of 2023. Now we have used it for parts of Latent Space. I listen almost every day to an ElevenLabs-generated podcast, the Hacker News Daily Recap podcast.

I don't know what the room for this to grow is, because I always think like it's so inefficient to talk to an AI, right? The bit rate of a voice-created thing is so low. It's only for asynchronous use cases. It's only for hands-free, eyes-free use cases. So why would you invest in voice generation?

I don't know, but it seems like they're making money. - Right, yeah, yeah. Yeah, I mean, Sarah, my wife, yeah, she uses it while she drives to talk to ChatGPT. - I see. - Just like-- - Yeah, so ChatGPT uses their own TTS. It's not ElevenLabs, okay.

But you can see the modality. - What does, we should bring Sarah in at some point, but-- - Customer interview. - What does she use the ChatGPT voice for? - We're doing a bunch of home renovation. So maybe she's driving to Home Depot, and it's like, hey, what am I supposed to get to replace the sink, you know?

Or all these sort of things that maybe were like Google searches before. Now you can easily do eyes-free, hands-free. - Yeah, a lot of people have told me about that, and I just, when I listen, when I'm by myself, I always listen to podcasts. (laughing) So I don't have time for ChatGPT.

And ChatGPT, you know, probably the number one thing they can do for me is give me like a speed adjustment. (laughing) So I can listen at 2x. - Yeah, yeah, yeah, yeah. (laughing) - Yeah. - That's funny. - Yeah, anyway, so like, I'm curious about your thoughts on like how, and as an investor, I think this is the weirdest AI battlefront for investing.

'Cause you don't know the time. - It's funny because there was, I'm trying to remember, there was a bunch of companies doing synthetic voices a while ago. And I think the problem, a lot of them got to like good ARR numbers, but the problem was like repeatability of the use case.

So people were doing all sorts of random stuff, you know? And the problem is not, it's kind of like mid-journey. The problem is not that there's not maybe a market of interest. It's like, how do you build a venture-backed company with like a scalable go-to-market that like can go after a customer segment and like do it repeatedly?

I think that's been the challenge. I don't know how ElevenLabs is doing it, but you could do so many things with voice, text-to-voice, that it's like, how do you sell it? You know, who do you call? Like, that's like the hardest thing, right? If you're raising like a Series A, a Series B, it's like, how are you gonna invest this money in sales and marketing to get revenue back?

It's kind of like the basic of it. And it can be challenging. That's why sometimes investors are like, you're making money and that's great for you, but like how-- - There's no industry-- - Yeah, it's hard. It's hard to like just tie it together. - Okay. I would be interested in, because I feel like there's a category of companies in the early 2010s that did this, meaning they offered an API with no idea how you were gonna use it.

I'm thinking Twilio. And Twilio has a cohort of like sort of API-first companies that are all like sort of Twilio inspired. But yeah, I think there's a category or a time in the market when it makes sense to just offer APIs and just let your customers figure it out and it's actually okay.

And then there's sometimes when it's not okay. And I think the default investor mentality right now is that it's not okay if you don't know what your customer is doing. - I think, well, Twilio is a funny example because I think in the middle 2010s, Uber was like 15% of Twilio's revenue.

- I'm just, I'm talking like, move yourself back as to like Twilio seed investor, Twilio series A investor, they had no idea. Uber wasn't even around. - But I think the thing now is like, text to voice is not new, you know? Like that's really the thing. It's like, what's new now is that you can generate very good text to then feed into the model.

So that changes why the market is interesting, you know? But if you really think about it, the voice models today are a little better. They're maybe like 50% better than they were three years ago. But the transformer models underneath, how do I say it, they're like a billion times better. So imagine, a lot of people use it for like automated, you know, customer support, things like that.

Before you had like scripts, they were reading. Now you have, you can have a transformer model converse with the customer. So it makes it a lot more useful in cases. But we'll see how that changes. - Okay, the last thing I'll mention here, why is this a war? Which is OpenAI and Gemini and Google are working on everything models versus each of these individual startups all working on their selected modality.

And so this is a question of like, are the big tech companies going to actually win because they can transfer learning across multiple domains as opposed to each of these things being point solutions in their specific things. The simple answer is obviously everyone will win. - Right. (laughs) - Because the AI market is so huge.

You know, there's a market for the Amazon basics of like everything, you know, one model has everything. And then there's a market for, no, like the basics are not good enough. I'll need the special thing. Do you have an opinion on when does one market win over the other or is it just like everything's gonna win?

- Yeah, it's interesting. I think like it works when people wouldn't have used the product without the Amazon Basics, you know? So like, maybe an example is like computer vision, you know, like, I mean, we have-- - Yeah, vision is sorta here now. - Yeah, it's like, you know, before people were like, why am I bothering trying to set up a computer vision pipeline and all of that?

Now they can just go on GPT-4 and put in an image and it's like, oh, this is good. I could use this for this. And then they build out something and maybe they don't use OpenAI's GPT-4V, they use Roboflow or whatever else. That's kind of how I think about it. It's like, what's the thing that enables people to try it, you know?

So in a way, the God model can do everything fairly okay. It's like DALL-E and Midjourney, you know, all these different things. Who's like the, and maybe like Mixtral, the Mixtral inference wars are like another example. It's like, I would have never put something in my app at like $2 per million tokens, but I did it at 27 cents per million tokens, you know?

And now it's like, oh no, I should really do this. It's a lot better. So that's how I think about how the God model kind of helps the smaller people then build more business. - Yeah, cool. Yeah, creates a category. Yeah, RAG/Ops. - Yeah, last but not least, where to begin?

We had almost all of these people on the podcast too. - They're honestly the easiest to talk to because they look like DevTools and you are a DevTools investor. I worked in DevTools and they're all, I think they're also more mature, right? As businesses, there's more of a playbook that is well understood by the customer.

Like, yes, I need a new stack here. Maybe not. So I think the reason, okay, so my biggest problem with putting databases versus frameworks versus ops tooling in the same war is that they're not really a war. They work cohesively together, except when one thing starts to intrude on another thing.

And that's why I put the very, very, I very consciously put together this sequence, which is databases on the left, frameworks in the middle, ops companies on the right. What's the first product of LangChain? LangSmith, which is an ops thing, right? So now suddenly the framework companies are not so friendly with the ops companies 'cause they're trying to compete with the ops companies.

What are the ops companies trying to do? The ops companies are trying to produce SDKs that compete with frameworks. Okay, then what are the database companies trying to do? First of all, they're fighting between each other, right? There's the non-vector databases, all adding vector features. We had some people approach us and we had to say no to them 'cause there are just too many.

And then there's the vector databases coming up and getting $235 million to build vector databases. Maybe I'll just, you know, obviously you're an active investor in some of these things, so you cannot say everything, but just on databases alone, one of the biggest debates of 2023, where do you stand on the whole thing?

- That's the million dollar question. I think it's really, well, one, at the start of everything, there's kind of like a lot of hype, you know? So like when LangChain came out and LlamaIndex came out, then people were like, oh, I need a vector database. So they search "vector database" and it's like Chroma, Pinecone, whatever.

But then it's like, oh, you can actually just have pgvector in Postgres. And you already have Postgres. Did you know it could do that? People are like, no, I didn't, because nobody really cared. So like, there's not a lot of documentation. Same with, yeah, MongoDB vector, Cassandra, all these things.

- Redis, Elasticsearch. - You can actually put vectors and embeddings in everything. - It's a different kind of index. And I think like, I mean, like Jeff and Anton also, what they always talked about even early on, it's like, this is like an active learning platform. This is not just like a vector database.
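For the "pgvector in the database you already have" point, here is a minimal sketch assuming Postgres with the pgvector extension and the psycopg driver; the connection string, table, and three-dimensional embeddings are placeholders, not a recommended production setup.

```python
import psycopg  # psycopg 3

# Placeholder connection string; assumes the pgvector extension is available to install.
with psycopg.connect("postgresql://localhost/mydb") as conn:
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id bigserial PRIMARY KEY, body text, embedding vector(3))"
    )
    conn.execute(
        "INSERT INTO docs (body, embedding) VALUES (%s, %s)",
        ("hello world", "[0.1, 0.2, 0.3]"),
    )
    # Nearest-neighbor search by L2 distance (<->); pgvector also supports cosine (<=>).
    rows = conn.execute(
        "SELECT body FROM docs ORDER BY embedding <-> %s::vector LIMIT 5",
        ("[0.1, 0.2, 0.25]",),
    ).fetchall()
```

The store part really is that small; the pipeline around it, getting good embeddings in and useful results out, is where the real work sits, which is the point being made here.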

It's like, what do you do with the vectors? It's like, what's most helpful? It's not where do you store them. So that's kind of the change. - I think that was old Chroma, by the way. I don't know if that's the new, the current messaging. - Well, but I think, I'm just saying like to them, it's never about, this is the best way to put a vector somewhere.

It's like, this is the best way to operate on the vectors. And the store is like part of it, but there's like the pipeline to get things out and everything, you have to build out a lot more. So I think 2023 was like, create the data store. I think 2024 is gonna be like, how do I make the data store useful?

Because the vector store just commoditized. So there needs to be something else on top of it. Yeah. - Unless they can come up with some kind of like, new distance function or something. I keep waiting for Chroma to, they teased a little bit of what they're working on at the AI Engineer Summit, which yeah, density and whatever other fancy formulas that Anton is cooking up.

But yeah, so I think I tweeted about this maybe like two, three months ago, and I think I pissed off Chroma a little bit. But the best framing of what Anton would respond to here is what people are embedding within vectors is a very different kind of data from what is already within Postgres and MongoDB and all the others.

So in some sense, it's net new data. And that actually struck a chord with me because that's how I started to understand structured versus unstructured data. That's how I started to understand, one of my kind of heroes is Mark, who's the CTO of MongoDB. This guy was the former GM of AWS RDS.

And for those who don't know, GM is like, you're the mini CEO of that business. And when you work at AWS RDS, you run a $1, $2 billion a year business. And now, and then he quits being Mr. Postgres of AWS to join MongoDB, the enemy. And when he gave that speech of why he did this, he was like, actually, if you look at the kind of workloads that is happening, Postgres is doing well, obviously.

Structured data is always going to be there. But unstructured data and document-type data is rising at an exponential rate, even faster. And for him to say that means something different. Anybody could have said that. Anybody could have made a chart that showed what he showed.

But for him to have said it, I think, was a very big deal, 'cause he's rich, he doesn't have to work, but he believed in this so much that he was like, okay, I'll just join MongoDB. So I'm like, okay, there's a real category shift between structured data and unstructured data.

I believe it. I don't think it's just that you can put JSONB inside of Postgres and be done; that's not a NoSQL database. Okay, fine. So what is this new thing of vectors, and how do you think about it as a new kind of data? And if there's a third category of something beyond unstructured data, I don't know what it is.

Context or memory or whatever you call this kind of new data, it might belong in a new category of database, and that might create the new MongoDB of this era. And it could be any one of these guys. Right now, Pinecone has the lead. I think they're a $750 million company.

- Valuation. - Yeah. And then all the others are much smaller. So, okay, if this is really a new data category and there's room for a key player, then it's probably gonna be one of these guys. By the way, I left out Weaviate and I put Qdrant in there.

Do you know why? - No. - Anthropic and OpenAI both use Qdrant for their internal RAG solutions. Which means that, for whatever reason, we should probably interview Qdrant. They passed the evals when Weaviate and Milvus and all the others didn't, which is interesting. - Yeah, yeah. - There's a lot that we don't know.

- Yeah, interesting. I think, going back to your point about LangChain building LangSmith, at some point some of the vector databases are gonna be like, why am I letting my customers use LlamaIndex? I should be the RAG interface, since I'm the one owning the data.

- Yes, yes, that's why I put them next to each other. Right now they're friends. - Yeah, right now. But if we think about the Jamstack era, you had Vercel, which started as Zeit, which was just a CDN. And then you had Netlify, you had all these companies.

And then Vercel built Next.js. So they moved down from the CDN to the framework, and now they use the framework to enable more cloud and platform products. Which way is it gonna go this time? I think what we learned from before is that you'd rather own the framework and have the cloud to support it than just be Netlify and not have your own framework.

Just given the way the two companies are doing now. - So for those who don't know, I worked at Netlify and I was very, very intimately involved in this. (laughing) - So we don't have to say anything private. - No, no, no, it's fine, it's fine. It's well known that Vercel won and Netlify has pivoted away to a different market.

But is it over-learning from an N-of-one example to say that you always wanna own the framework? - No, no, no. Because the counterexample is in the same space, which is Gatsby. - Yes. - Where you own the framework, but you don't own the cloud, and then you don't make money either.

So it's kind of like... I think we still gotta figure out where the gravity is in this market. I think a lot of people will say the gravity is in the model. A lot of people will say the gravity is in the embeddings and the data that you put into it.

A lot of people don't know what they're talking about. So 2024 is supposed to be the year of AI in production, and I think we're gonna learn soon who leads and where. - I think that statement is like "the year of Linux on the desktop." It's just always gonna be true.

People are always gonna be saying it. We're gonna be here one year later and then it's just like, yeah, this year is the year of AI production. And it's always gonna be incrementally more true. But what is the catalyst? What is the big event that you will point to and say, aha, now it's in production?

I don't know. - I think the answer is actually that it's not in production yet. It's funny: with a lot of companies there's just an inherent timeline that large companies work within. GPT-4 came out in April. That's like eight months. Most companies don't buy things and implement them within eight months.

So I think part of it is just a physical time limit: even people who have been really interested just cannot get through the whole process of getting it live to all of their customers. So I think we'll see more of that, good and bad. It's gonna be a lot of failures and a lot of successes, hopefully.

Yeah. - Yeah, any other commentary on tooling, RAG, ops, anything like that? I mean, I always tell people, as much as I'm interested in fine-tuning, I think RAG is here to stay. Don't even doubt it. This is a necessary part that every AI engineer should know. - Yeah, well, I think it's tied to the infinite context thing, right?

I think the leftover question is: do you wanna have infinite context and hope that the model is good enough at parsing which parts matter to your query? Or do you wanna use RAG and do very specific context injection? I think so far most people will say, I'd rather do context injection with just what I care about than put a whole document in there and hope the model gets it.
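
For anyone who hasn't built this, here's the shape of that "very specific context injection" as a minimal sketch; `embed()` is a hypothetical stand-in for whatever embedding model you'd actually call, and the chunking is assumed to have happened already:

```python
# Toy RAG sketch: embed the query, keep only the most similar chunks,
# and inject just those into the prompt instead of a whole document.
# embed() is a hypothetical stand-in for your embedding model of choice.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("call your embedding model here")

def top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    q = embed(query)
    scored = []
    for chunk in chunks:
        v = embed(chunk)
        score = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((score, chunk))
    return [c for _, c in sorted(scored, reverse=True)[:k]]

def build_prompt(query: str, chunks: list[str]) -> str:
    context = "\n\n".join(top_k(query, chunks))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The infinite-context alternative is the degenerate case: skip the retrieval step entirely and paste everything in, trusting the model's attention instead of your pipeline.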

But maybe that changes. - I don't, I like-- - Yeah, no, I mean-- - There's no way it changes. (laughing) - Hey, you know, that's great for LlamaIndex. - Yeah, yeah, it's great. - You know, great, they're gonna make a lot of money, I guess.

- Yeah, yeah. No, it's not clear that they're gonna make a lot of money, right? 'Cause they're just an open source project. I don't think they've launched a commercial thing yet. - I don't think so. Jerry was talking about it on the podcast, but it wasn't out yet, yeah.

- Yeah, so we'll see what they launch this year. I do have-- - The year of AI in production. - The year of LlamaIndex in production. - Yeah. Okay, so that's the four wars. We also covered a bunch of other non-wars that we skipped over. I do remember that you actually just published a piece on the semantic versus-- - The syntax to semantics piece, yeah.

- Do you wanna cover that? - Yeah. I've mentioned this a couple times on the podcast, but basically the idea is that code has always been the gateway to programming machines, and we've spent a lot of time making it easier. So you go from punch cards to COBOL, to C, to Python, just to make it easier for the person to read and write the code.

And along the way we started adding these semantic functionalities to it. So in Python, you can just call sort. You don't need to know bubble sort. You don't need to know any algorithm that you learned in school to do it. And I think the models are 100X-ing this: now all you need to say is, create a sign-up form where people put a name and email and send it to this endpoint.
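
To make that syntax-to-semantics jump concrete, here's a tiny Python sketch; the records and field names are made up:

```python
# Semantics over syntax: you state the intent ("newest signups first")
# and the language supplies the algorithm; nobody hand-writes bubble sort.
users = [
    {"name": "Ada", "signup_date": "2023-11-02"},
    {"name": "Grace", "signup_date": "2023-12-15"},
]

newest_first = sorted(users, key=lambda u: u["signup_date"], reverse=True)
print([u["name"] for u in newest_first])  # ['Grace', 'Ada']
```

The model-driven version pushes that one level higher again: you state the product requirement and the code itself becomes the generated layer.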

So it's gonna be a lot easier for people who know the semantics of the business, which is your product managers or your business people, the layer that goes from customer requirements to implementation, to intervene in the code. You know, how many times as an engineer do you have to go change some button color or some button size, these small things that you really shouldn't be doing?

And now you can have people with natural language intervene in the code and write code that can actually be merged and put in production. I also wrote the bear case for it, which is: we already have so much trouble getting engineering teams to collaborate and get all their changes together without conflicts and all of these things, and having non-technical people also try to do things may be hard.

And models just think about solving the task at hand. They don't think about the codebase. I've always told my engineers: you need to leave the code base better than you found it. If you're writing something, we cannot always keep adding quick hacks, you know?

And I think models are great at quick hacks, but sometimes it's like, oh, this is the 16th button you've changed a style for, you should make a class for it. That's the dumbest example, but if that happens, then I'll be a lot more bullish on coding agents, you know?

But I think for now, until you can have non-technical people manually query models, look at the results, and then say, this is ready to go, it's gonna be hard to have autonomous agents do it. - Yeah, so I actually had a tweet about this today, because Itamar from Codium just published flow engineering as his next evolution of prompt engineering.

And they've been working on in-IDE agents. They call them agents; you can debate the definition of an agent at the end of the day. So my split is inner loop versus outer loop, which I think you understand, but maybe I have to explain it to the audience, because every time I talk about it to developers, they've never heard of it.

So inner loop is everything that happens between Git commits. Outer loop is everything that happens after the commit is committed and pushed up for PR. Maybe that's too reductive, but it's something like that: inner loop happens within your IDE, outer loop happens in GitHub, something like that.

Okay, so I think your conception of an agent is outer loop-y, especially if it's non-technical, right? That's the dream. You mentioned sweep.dev in your write-up, and there's also CodeGen, there's also maybe Morph, depending on what Morph is doing. And there's a bunch of other people all doing this stuff.

Even Smol Developer was also, you know, write in English and then create a code base. And I think it's just not ready for that. Outer loop is a mirage; it's going to be forever five years away. And the people working on inner loop companies have been the right bet.

And you can work on inner loop agents. I think Code Interpreter is actually an inner loop agent, in the sense that it's like limited self-driving: you have to keep your attention on it, you have to watch it, it can only drive a small distance, but it is somewhat self-driving.

And so if you have these gradations in your outlook on autonomous agents, you don't expect everything to jump to level five at once, but you have an idea of what levels one through five look like for you. I haven't really defined it apart from this concept of inner loop versus outer loop.

But once you've defined it, then you can be like, oh, we're making real progress on this stage, and this other stage is too early for now, but at some point somebody will do it. - Yeah, yeah. I think maybe level one is just the autocompletion in the IDE.

Level two is asking Cursor, hey, how can I make this change? But then level three, to me, is that we need to separate the inner loop from the IDE, you know? I need to make a code change, but sometimes I shouldn't go into the IDE.

Sometimes I should be in the UI of the product and say, hey, that needs to be changed. Kind of like all the preview-environment companies that want you to put comments, where the PMs put comments. How do you go from that to code changes? There should be enough there to make the code changes happen through a supervised interface.

- That's how the loop closes. - Yeah, but that's kind of it. I think what these models are doing is changing where the loop starts and ends, because now you can create code in the outer loop, which you couldn't do before. - That's the dream, that's the dream.

- Yeah. - Anyway, my focus right now, if anyone cares: I think the only thing that's working is inner loop, and you should use inner loop things aggressively, build inner loop things aggressively, invest in them, and then keep an eye on the outer loop stuff.

- Yeah. - It's still very early. I did invest in CodeGen, Jay Hack's thing, which we mentioned briefly in the Sourcegraph episode. Do we have other things that we want to mention, or do you want to keep it to the four wars? - I think that's great.

I thought it was going to be much shorter, but we're at one hour, 15 minutes. I thought we were going to run through everything. - Yeah, okay, maybe the top two things from December that you have commentary on? - I think the needle-in-a-haystack thing.

- Okay, maybe you want to explain that first. - Yeah, basically with Anthropic, there was one example floating around about Claude's context window, where you basically gave it this super long context on, I think, things to do in San Francisco or something like that. And then the question was, what is the most fun thing to do in SF?

And they made this nice chart of, okay, based on where the answer is in the context, it gave a better or worse response. And then Anthropic responded and they were like, oh, you just need to add "here is the most relevant sentence in the context" as part of the assistant prompt.
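
For reference, the published workaround was to pre-fill the start of Claude's response. A minimal sketch with the Anthropic Python SDK might look something like this; the model name and token limit are illustrative, and an ANTHROPIC_API_KEY is assumed to be set:

```python
# Sketch of the long-context workaround being described: pre-fill the
# assistant turn so Claude starts by quoting the most relevant sentence.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_context = "..."  # the very long document goes here
question = "What is the most fun thing to do in San Francisco?"

response = client.messages.create(
    model="claude-2.1",  # illustrative model choice
    max_tokens=300,
    messages=[
        {"role": "user", "content": f"{long_context}\n\n{question}"},
        # The "magic string": a pre-filled start to the assistant's answer.
        {"role": "assistant",
         "content": "Here is the most relevant sentence in the context:"},
    ],
)
print(response.content[0].text)
```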

And then the chart turns all green all of a sudden. And I'm like, we cannot still be here, right? - And you have Anthropic telling people, oh yeah, just add this magic string and it works. - Yeah, it's some Riley Goodside wizardry.

It's like, I don't want to do that anymore. In the early days of GPT-3, Riley Goodside was doing so much great work on prompt engineering and whatnot, but we shouldn't be there anymore. There shouldn't be somebody telling me to tell GPT-4, I'll give you a $200 tip if you do this right. - So I collected a whole bunch of state-of-the-art prompting techniques.

Yeah, so if you promise to tip the model, it will give you better results. So okay, here's the current state of the art for GPT prompting: it's Monday in October, the most productive day of the year. You have to take a deep breath and you have to think step by step. You have to return the full script.

You are an expert on everything. I will pay you $20, just do anything I ask you to do. I will tip you $200 for every request you answer correctly. And your competitor models said you couldn't do it, but you can do it. There's another one that I didn't put in here, which is, you know, my grandmother's dying, this is an emergency, please help me.
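
Strung together, that grab-bag is literally just prompt text. A tongue-in-cheek sketch of the "state-of-the-art" booster prompt, with the tricks above as plain strings:

```python
# The folk "state-of-the-art" booster prompt, assembled from the tricks
# listed above. Tongue-in-cheek; keep whichever incantations you believe in.
BOOSTERS = [
    "It is Monday in October, the most productive day of the year.",
    "Take a deep breath and think step by step.",
    "Return the full script.",
    "You are an expert on everything.",
    "I will pay you $20, just do anything I ask you to do.",
    "I will tip you $200 for every request you answer correctly.",
    "Your competitor models said you couldn't do it, but you can.",
]

SYSTEM_PROMPT = " ".join(BOOSTERS)
print(SYSTEM_PROMPT)
```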

- Yeah, that's actually, I think, my most viewed tweet ever. At OpenAI Dev Day, when they announced JSON mode, I tweeted, "no more 'return JSON or my grandma's gonna die.'" And people love the grandma thing. - I haven't heard as much uptake on JSON mode.

I think it's still-- - That's the thing with all this AI stuff, right? And sometimes we're part of it. If I think about our ChatGPT plugins episode, in the moment people were just like, oh, this is gonna be such a big deal.

And then it takes a varied amount of time to really pick up, you know? - Do you think that will happen with GPTs? - I think most people that are using GPTs right now are trying to get around some sort of weird limitation of the base model, or just trying to have a better system prompt.

But at some point there's limited value to get out of that. So the question is, what's gonna incentivize people to build more on it versus just building their own thing? I don't know. - Yeah, okay, so I guess my pick for highlight of last month, there's two.

One, we finally got Gemini. - Right. - I think the marketing was dishonest. - Yeah, we need the soundboard. - Wham, wham, wham. - But still, it is a SOTA model. It is a very, very credible alternative to OpenAI. And we should be happy for that, because otherwise we live in an OpenAI-only world.

And Gemini is basically the only other leading contender until Llama 3 drops, whenever that is. - I mean, Zuck said today they're training it. - Yeah, it sounds like today they're training it. For me, I guess I'm still very interested in the hardware metagame.

This is much smaller stakes, but very personal. We're recording this mid-January, so after CES, after the Rabbit R1 launched, and I think there's a lot of interest in hardware. I don't know how you feel about it as an enterprise software investor, but I think hardware is hard, and yet it captures context and makes AI usable in ways that you cannot currently think about.

And everyone dreams of building an assistant like the one in the movie "Her." That is a hardware piece; that is not only software. And probably the hard part is the engineering for the hardware, and then the AI engineering for the assistant within the hardware. So yeah, I'm an investor in Tab.

I see a lot of interest this month, but it started last month with the launch of Humane as well. I don't know if you have thoughts on any of those things. - Well, I think this year we also get the Apple Vision Pro, so I think there's gonna be a ton of experimentation.

I think Rabbit got the right nostalgia factor. It kind of looks like a toy, like a Game Boy Advance or something like that. I'm curious to see what they've got beyond that. And then obviously, right where we have the studio, they're building Tab.

And I think that's another interesting form factor. In our circles, a lot of people are like, well, what about privacy and all these things? But he will tell you that we're kind of a special group, and that most people value convenience over privacy, as you'll learn from the social media of the last few years.

So yeah, I'm really curious to see how it develops. - I really like technology where you're slightly uncomfortable with it on a social level. For Uber, it was the regulation around taxis. For Airbnb, it was staying in strangers' homes. And now it turns out for OpenAI, it was training on people's content.

- Right. - Right, and now it's becoming a matter of regulation. And OpenAI's data partnerships are a form of private regulatory capture, which is a fantastic playbook. I hope it was on purpose, because whoever did that is a genius. So I do think that every great new company, especially on the consumer side, is provocative in that sense.

They're doing something that is not yet kosher. And so I think the Humanes, the Tabs, anything working on that front where it's like, yeah, I'm not sure I'm comfortable with this, but maybe it could change, that is a really interesting shift. So I'm excited from that point of view, but at the same time, most hardware companies fail very quickly.

They have a very hot start and then everyone puts the thing in their drawer and never looks at it again. So I'm very, very aware of that. But I think it's something interesting. And I do think, so here's the core thing of it, right?

Avi doesn't think it's a hardware company. Most of the cost of the $600 for Tab is going towards GPT costs, because it's actually processing context. And the whole idea is that context is all you need. In this world of AI applications, whoever has the most unique context wins, right?

A unique context could be the quality data war, right? A unique context is, I have Reddit info, I have Stack Overflow info, I have New York Times info. If I have info on everything you say and do at all times, that is something that no one else has.

And if he becomes a good store of that, then what can you build with it? So I'm most excited for him to expose the developer API, 'cause then I can come in and do all my software stuff. But he has to build the hardware layer and get acceptance for that first.

- Right, yeah, I'm excited to see it. I'm sure we're gonna see a lot of people play around with it. - I actually think he doesn't like me, because I asked for an off button. I wanna be able to guarantee to you, if we're having a conversation, I wanna show you: see, it's off, right?

It's kind of like, oh yeah, my phone is on silent mode; there's a physical silent mode button. But now he just wants it to be always on. - That's a whole new market: soundproof storage for your AI pendant, so you can guarantee the person cannot hear you.

Awesome, no, this was fun. Please, if you're still listening after one hour and 21 minutes, let us know what we did right, what we did wrong, and what you would like to see differently. It's the first time we've tried this format. - Awesome, thanks for doing this. - Cool.

(upbeat music)