Beating GPT-4 with Open Source Models - with Michael Royzen of Phind
Chapters
0:00 Introductions
1:02 Founding SmartLens in High School (2017)
3:44 Shifting to NLP
5:10 Sparking Interest in Long-Form Q&A (Hugging Face Demo)
8:32 Creating a Search Engine (Common Crawl, 2020)
11:29 Early Days: Hello Cognition to Phind
13:35 Phind Launch & In-Depth Look
20:58 Envisioning Phind: Integrating Reasoning with Code & Web
23:26 Exploring the Developer Productivity Landscape
26:28 Phind's Top Use Cases & Early Adoption
30:00 Behind Phind’s Rebranding (Advice from Paul Graham)
39:40 Crafting a Custom Model (Code Llama & Expanded Data)
44:34 Phind's Model: Evaluation Tactics & Metrics
47:00 Enhancing Accuracy with Reinforcement Learning
51:18 Running Models Locally: Interest & Techniques (Quantization)
67:13 Michael’s Autodidact Journey in AI Research
72:00 Lightning Round
- Hey everyone, welcome to the Latent Space Podcast. 00:00:13.840 |
and I'm joined by my co-host, Swyx, founder of Smol AI. 00:00:22.000 |
- Yeah, we are recording this in a surprisingly hot October 00:00:25.200 |
in San Francisco, and I mean, sometimes the studio works, 00:00:50.560 |
but then obviously you can fill in the blanks 00:00:53.880 |
So you actually were a high school entrepreneur. 00:01:10.720 |
the deep learning revolution was already in flow, 00:01:13.480 |
and good computer vision models were a thing. 00:01:16.940 |
And what really made me interested in deep learning 00:01:19.180 |
was I got invited to go to Apple's WWDC conference 00:01:26.960 |
'cause I was really into making iOS apps at the time. 00:01:43.760 |
And after seeing that, I was like, oh, this is cool. 00:01:52.920 |
And so I had this crazy idea where it was like, 00:02:28.200 |
Yeah, so I took that, filtered it, pre-processed it, 00:02:32.800 |
and then did a massive fine-tune on Inception V3, 00:02:40.680 |
the leading deep convolutional computer vision model at the time. 00:02:44.420 |
And to my surprise, it actually worked insanely well. 00:02:53.220 |
I think it ended up being 17,000 categories approximately 00:03:00.920 |
It worked so well that it actually worked better 00:03:07.940 |
And so, and on top of this, the model ran on the device. 00:03:14.120 |
A big part of the issue with Google Lens at the time 00:03:22.480 |
having to upload an image to a server and get it back. 00:03:28.000 |
even on the iPhones of the day in 2017, much faster. 00:03:37.620 |
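For readers who want the shape of that kind of fine-tune, here's a minimal sketch, assuming a PyTorch/torchvision setup; the ~17,000-way head matches the category count mentioned, but the weights, optimizer, and loss weighting are illustrative, not the actual SmartLens code:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 17_000  # approximate category count mentioned in the episode

# Start from an ImageNet-pretrained Inception V3 and swap both classifier heads.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    model.train()
    optimizer.zero_grad()
    # In train mode, Inception V3 also returns auxiliary logits.
    logits, aux_logits = model(images)
    loss = loss_fn(logits, labels) + 0.4 * loss_fn(aux_logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```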
And there was kind of one big spike in usage, 00:03:45.680 |
Oh, it's like a monthly or annual subscription? 00:03:48.560 |
- Even though you don't actually have any servers. 00:04:04.200 |
that the usage was surprisingly not that frequent. 00:04:08.280 |
The extent to which all three of us have a sense of sight, 00:04:11.920 |
I would think that if I lost my sense of sight, 00:04:15.320 |
The average usage of Be My Eyes per day is 1.5 times. 00:04:21.360 |
where I was also looking into image captioning, 00:04:32.360 |
well, people want to give a description of an image, 00:04:43.720 |
NVIDIA was working on this back in 2019, 2020. 00:04:46.480 |
They had some impressive, I think, face GANs, 00:04:54.560 |
But it wasn't able to take a natural language description 00:05:17.640 |
I'm still sort of working on updating the app in college. 00:05:24.280 |
hey, what if I make an enterprise version of this as well? 00:05:32.680 |
But I thought, this massive classification model 00:05:37.000 |
works so well, and it's so small, and so fast, 00:05:43.240 |
or do any of those things that you're supposed to do. 00:05:44.760 |
I was just mainly interested in building a type of backend 00:05:49.640 |
So I was mainly just doing it for myself, just to learn. 00:05:53.040 |
And so I built this enterprise classification product, 00:05:57.880 |
I'm also building an invoice processing product, 00:06:03.200 |
where using some of the aspects that I built previously, 00:06:07.640 |
although obviously it's very different from classification, 00:06:14.880 |
from an unstructured invoice through our API. 00:06:18.560 |
And that's what led me to HuggingFace for the first time, 00:06:21.800 |
'cause that involves some natural language components. 00:06:31.320 |
I used the standard BERT, and also Longformer, 00:06:37.400 |
And Longformer was interesting because it allowed, 00:06:43.000 |
Like BERT, all of the first-gen encoder-only models, 00:06:46.640 |
they only had a context window of 512 tokens. 00:06:51.840 |
There was none of this ALiBi or RoPE that we have now, 00:06:54.960 |
where we can basically massage it to be longer. 00:06:57.040 |
They were fixed, 512 absolute position encodings. 00:07:00.720 |
And so Longformer at the time was the only way 00:07:06.560 |
or ask a question about like 4,000 tokens worth of text. 00:07:10.000 |
And so I implemented Longformer, and it worked super well. 00:07:14.960 |
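As a rough illustration of why that mattered, here's a sketch with today's Hugging Face transformers API (the checkpoint and task setup are assumptions, not the original implementation): Longformer attends over up to 4,096 tokens where BERT-style encoders stop at 512.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Longformer accepts ~4,096 tokens vs. BERT's fixed 512 absolute positions.
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = AutoModelForQuestionAnswering.from_pretrained("allenai/longformer-base-4096")
# Note: the QA head on this base checkpoint is untrained; you'd fine-tune it,
# as described in the episode.

question = "What is the total amount due?"
document = "..."  # e.g. the full text of an invoice, thousands of tokens long

inputs = tokenizer(question, document, return_tensors="pt",
                   truncation=True, max_length=4096)
with torch.no_grad():
    outputs = model(**inputs)

start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax() + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))
```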
But nobody really kind of used the enterprise product. 00:07:28.960 |
And so nobody really used it, and my heart wasn't in it, 00:07:35.200 |
But a little later, I went back to Hugging Face, 00:07:35.200 |
They had this demo made by this researcher, Yacine Jernite. 00:07:42.440 |
And he called it Long-Form Question Answering. 00:07:47.480 |
And basically, it was this self-contained notebook demo 00:08:15.200 |
The demo itself, it used, I think, BART as the model. 00:08:19.720 |
for both an Elasticsearch index of Wikipedia, 00:08:24.720 |
as well as a dense index, powered by Facebook's FAISS, 00:08:24.720 |
But when it worked, I think the question in the demo was, 00:08:56.800 |
and it would know what to do and just give you the answer. 00:09:03.040 |
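The overall shape of that demo, sketched with modern libraries (the index names, sentence encoder, and checkpoints here are stand-ins, not the notebook's exact components): sparse hits from Elasticsearch plus dense hits from a FAISS index, with BART generating the final answer.

```python
import faiss
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer
from transformers import BartForConditionalGeneration, BartTokenizer

es = Elasticsearch("http://localhost:9200")
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in dense encoder
dense_index = faiss.read_index("wikipedia.faiss")    # hypothetical prebuilt index

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
generator = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def answer(question: str, passages: list[str], k: int = 5) -> str:
    # Sparse retrieval from an Elasticsearch index of Wikipedia.
    hits = es.search(index="wikipedia", query={"match": {"text": question}}, size=k)
    sparse = [h["_source"]["text"] for h in hits["hits"]["hits"]]
    # Dense retrieval from FAISS; passages[] maps row ids back to text.
    _, ids = dense_index.search(embedder.encode([question]), k)
    dense = [passages[i] for i in ids[0]]
    # Condition BART on the question plus retrieved context.
    prompt = "question: " + question + " context: " + " ".join(sparse + dense)
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
    out = generator.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```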
And I started thinking about ways to make it better. 00:09:14.280 |
it was fine-tuned on this Reddit dataset called ELI5. 00:09:23.800 |
Someone had scraped, I think, I forget who did it, 00:09:36.720 |
So we're bootstrapping this model from ELI5, 00:09:44.640 |
when doing this rag retrieval from these databases 00:09:51.280 |
And so ELI5 actually turned out to be a good dataset 00:09:54.600 |
for training these types of question-answering models 00:10:01.360 |
and at least helps the model get the format right. 00:10:19.800 |
it's able to have a reasonably high-quality output. 00:10:22.400 |
And so once I made the model as big as I can, 00:10:28.280 |
I started looking for ways to improve the index. 00:10:38.560 |
for how to make an Elasticsearch index just for Wikipedia. 00:10:42.280 |
And I was like, "Why not do all of Common Crawl?" 00:10:50.640 |
worth of AWS credits left over from the SmartLens project. 00:10:59.200 |
And so I was able to spin up a bunch of instances 00:11:01.480 |
and just process all of Common Crawl, which is massive. 00:11:04.640 |
So it's roughly like, it's terabytes of text. 00:11:12.240 |
I went to Alexa to get like the top 1000 websites 00:11:25.760 |
'cause the webpages were already included in the dump. 00:11:40.520 |
because obviously there's this massive long tail 00:11:43.120 |
of small sites that are really cool, actually. 00:11:50.040 |
which is a search engine specialized on the long tail. 00:11:53.000 |
I think they actually exclude like the top 10,000. 00:11:58.520 |
and just don't really know what their pitch is. 00:12:04.920 |
but for this, that was kind of out of the question, 00:12:16.680 |
approximately 350 million webpages through Elasticsearch. 00:12:22.680 |
So I built this index running on AWS with these webpages, 00:12:28.120 |
Like you can ask it like general common knowledge, 00:12:31.360 |
history, politics, current events, questions, 00:12:35.320 |
and it would be able to do a fast lookup in the index, 00:12:40.280 |
and it would give like a surprisingly good result. 00:12:55.360 |
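An illustrative sketch of that pipeline's shape (the domain list, index mapping, and file paths are assumptions): stream Common Crawl WARC records, keep pages from a whitelist of top domains, and bulk-index the text into Elasticsearch.

```python
from urllib.parse import urlparse
from warcio.archiveiterator import ArchiveIterator
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
TOP_DOMAINS = set(open("top_domains.txt").read().split())  # e.g. from Alexa rankings

def docs_from_warc(path: str):
    with open(path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            if urlparse(url).netloc.removeprefix("www.") not in TOP_DOMAINS:
                continue  # skip the long tail, keep only whitelisted domains
            body = record.content_stream().read()
            yield {
                "_index": "commoncrawl",
                "_source": {"url": url, "html": body.decode("utf-8", "ignore")},
            }

# Hypothetical segment filename; a real run fans this out across many instances.
helpers.bulk(es, docs_from_warc("CC-MAIN-2020-segment-00000.warc.gz"))
```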
And yeah, I was kind of shocked no one was doing this, 00:13:31.720 |
It was still a glorified summarizer, basically. 00:13:36.640 |
- I think BLOOM ended up actually coming out in 2022, 00:13:47.560 |
the BLOOM models just were never really that good, 00:13:50.320 |
which is so sad 'cause I really wanted to use them. 00:13:53.120 |
But I think they didn't train on that much data. 00:14:00.360 |
which we now know are far below Chinchilla-optimal. 00:14:05.800 |
what we're currently doing with the Phind model goes, 00:14:13.120 |
And then they didn't really do any fine tuning 00:14:16.360 |
So T0 worked well because they took the T5 models, 00:14:28.240 |
similar to GPT-3, but the models were much smaller. 00:14:30.840 |
So the models, yeah, they were pre-trained better. 00:14:46.720 |
from diverse data sources in the fall of 2021. 00:14:52.560 |
This is before Flan-T5, which came out in 2022. 00:15:04.200 |
on top of T0, I also did the Reddit ELI5 fine-tune. 00:15:16.200 |
to where I didn't get discouraged like I did previously. 00:15:23.600 |
Sometimes it would just misinterpret your answers so, 00:15:31.840 |
But for the first time, it was working reasonably well. 00:15:36.640 |
I think the BART model is like 800 million parameters, 00:15:45.200 |
And that was the very first iteration of Hello. 00:15:57.720 |
Our fine-tuned T0 model connected to our Elasticsearch index 00:16:02.160 |
of those 350 million webpages from the top 10,000 Common Crawl websites. 00:16:16.360 |
that's effectively connected to like a large enough index 00:16:21.640 |
that I would consider like an internet scale. 00:16:28.360 |
like an internet-scale, LLM-powered RAG search system 00:16:28.360 |
And around the time me and my future co-founder, Justin, 00:16:56.080 |
go to sleep, wake up the next morning at like eight 00:17:01.440 |
And I was also doing my thesis at the same time, 00:17:19.880 |
And the conclusions of my research actually kind of helped 00:17:52.920 |
And ChatGPT browsing will think that llama.cpp 00:17:52.920 |
that you can just compile with GCC and you're all good. 00:18:02.800 |
even though I'm sure somewhere in their internal prompts, 00:18:05.480 |
they have something like, if you're not sure, do a lookup. 00:18:12.520 |
And so we approached LLM powered question answering 00:18:19.440 |
We pivoted to make this for programmers in June of 2022, 00:18:25.880 |
around the time that we were getting into YC. 00:18:33.040 |
is the case where the models actually have to think. 00:18:37.160 |
the models were kind of more glorified summarization models. 00:18:42.520 |
the Google featured snippets, but on steroids. 00:18:48.280 |
the simpler questions would get commoditized. 00:18:57.560 |
to like answer the more basic kind of like summarization, 00:19:03.000 |
like current events questions with lightweight models. 00:19:05.320 |
That'll only continue to get cheaper over time. 00:19:07.680 |
And so we kind of started thinking about this trade-off 00:19:09.760 |
where LLMs are going to get both better 00:19:09.760 |
Either you can run a model of the same intelligence 00:19:25.480 |
or you can run a better model for the same price. 00:19:33.960 |
they're going to deploy, and they're already doing this 00:19:35.560 |
with SGE, they're going to deploy a relatively basic 00:19:43.160 |
about like current events, like who won the Super Bowl, 00:19:43.160 |
And the flip side of that is like more complex questions 00:19:56.160 |
and you have to solve problems and like debug code. 00:19:58.760 |
And we realized like we were much more interested 00:20:06.480 |
And so we've optimized everything that we do for that. 00:20:10.480 |
And that's a big reason why we've built Phind 00:20:10.480 |
what the emergent properties are in terms of reasoning, 00:20:25.800 |
in terms of being able to solve complex multi-step problems. 00:20:30.480 |
And I think that some of those emergent capabilities, 00:20:37.320 |
So as I think there's always an opportunity for us 00:20:48.080 |
what is the best, most advanced reasoning engine 00:20:55.200 |
that's connected to the internet that we can just provide? 00:21:10.320 |
when they have a question or when they're frustrated 00:21:18.120 |
if you're experiencing really any kind of issue 00:21:31.200 |
It has an interface in VS Code and more IDEs to come. 00:21:44.560 |
or they will find other code in your code base, 00:21:56.440 |
So, that's really the philosophy behind Phind. 00:21:56.440 |
And so, right now from a product perspective, 00:22:10.800 |
So, the VS Code extension that we launched recently 00:22:17.120 |
and it knows where to find the right code context 00:22:25.960 |
And it's not just reliant on what the model knows. 00:22:29.280 |
And it's able to figure out what it needs by itself 00:22:44.360 |
But the issue is also not everyone wants to use VS Code. 00:22:53.240 |
or they're using PyCharm or other JetBrains IDEs. 00:22:53.240 |
of all these startups doing code, doing search, et cetera. 00:23:15.320 |
But really, who everyone's competing with is ChatGPT, 00:23:37.280 |
people are happy to go somewhere else, basically. 00:23:46.760 |
people sometimes perhaps aren't even in an IDE. 00:24:03.080 |
And so, the web part of it also exists for that, 00:24:28.040 |
Yeah, so I thought the podcast with Aman was great, 00:24:37.640 |
about not having platform risk in the long term, 00:24:28.040 |
but some of the features that were mentioned, 00:24:54.600 |
We haven't yet seen, with VS Code in particular, 00:24:59.280 |
any functionality that we'd like to do yet in the IDE 00:25:09.960 |
or something that we kind of hack into there, 00:25:15.440 |
And so I think it remains to be seen where that goes. 00:25:21.800 |
is we're not trying to just be in an IDE or be an IDE. 00:25:28.440 |
and is really meant to cover the entire lifecycle 00:25:37.400 |
and I want to get from that idea to a working product. 00:25:42.600 |
of Phind is really about, is starting with that, 00:25:37.400 |
is just going to be really just the problem solving. 00:26:10.880 |
some impression about the type of traffic that you have, 00:26:19.520 |
And I don't know if you have some mental categorization 00:26:28.560 |
So the two main types of searches that we see 00:26:32.720 |
are how-to questions, like how to do X using Y tool. 00:26:37.840 |
And this historically has been our bread and butter, 00:26:44.480 |
at just going over a bunch of developer documentation 00:26:48.240 |
and figuring out exactly the part that's relevant 00:26:50.200 |
and just telling you, okay, like you can use this method. 00:26:59.920 |
people organically just started pasting in code 00:27:03.520 |
that's not working and just said, fix it for me. 00:27:19.120 |
Maybe it required like some multi-step reasoning. 00:27:25.080 |
or something found in either a Stack Overflow post 00:27:31.000 |
And so then they paste it into Phind and then Phind works. 00:27:25.080 |
So those are really those two different cases. 00:27:45.720 |
And so that's what a big part of our VS Code extension is, 00:27:51.560 |
here, just like fix it for me type of workflow. 00:27:55.880 |
Like it's in your code base, it's in the IDE. 00:28:02.560 |
But at the end of the day, like I said previously, 00:28:05.920 |
that's still a relatively, not to say it's a small part, 00:28:46.000 |
"Would that be the primary value proposition?" 00:28:48.800 |
And so what we've seen is that any model plus web search 00:28:51.960 |
is just significantly better than that model itself. 00:28:54.640 |
- Do you think that's what you got right in April? 00:28:55.920 |
Like, so you got 1500 points on Hacker News in April, 00:28:59.400 |
which is like, if you live on Hacker News a lot, 00:29:02.280 |
that is unheard of for someone so early on in your journey. 00:29:17.680 |
So we launched the very first version of Phind 00:29:17.680 |
after like the previous demo connected to our own index. 00:29:26.760 |
Like once we got into YC, we scrapped our own index 00:29:42.200 |
And over time, every time we like added some intelligence 00:29:46.040 |
to the product, a better model, we just keep launching. 00:30:08.160 |
Should we go for like the full Paul Graham story 00:30:11.320 |
- Do you wanna do it now or you wanna do it later? 00:30:15.360 |
- I think, okay, let's just start with the name for now 00:30:17.520 |
and then we can do the full Paul Graham story later. 00:30:24.960 |
he saw our name, and our domain at the time was sayhello.so. 00:30:17.520 |
you know, we just kind of broke college students. 00:30:44.120 |
because it was the first like conversational search engine. 00:30:49.240 |
that's the angle that we were approaching it from. 00:30:55.360 |
Like the sayhello, like what does that even mean? 00:30:58.520 |
And like .so, like, it's gotta be like a .com. 00:31:02.560 |
We did some time just like with Paul Graham in the room. 00:31:05.640 |
We just like looked at different domain names, 00:31:07.840 |
like different things that like popped into our head. 00:31:13.240 |
Like with the P-H-I-N-D spelling in particular. 00:31:15.720 |
- Yeah, which is not typical naming advice, right? 00:31:30.040 |
But over time, like it kind of, it kept growing on us. 00:31:40.160 |
It's owned by this elderly Canadian gentleman 00:31:42.920 |
who we got to know, and he was willing to sell it to us. 00:31:42.920 |
I mean, you know, everyone who looks at you is wondering. 00:31:56.640 |
and a lot of people actually pronounce it finned, 00:31:59.160 |
which, you know, by now is kind of, you know, 00:32:08.160 |
and then just have that redirect to P-H-I-N-D. 00:32:10.920 |
So P-H-I-N-D is like definitely the right spelling. 00:32:15.880 |
- So Bing web search, and then in August you launched V2. 00:32:29.040 |
like I don't really think of it that way in my mind. 00:32:31.120 |
There's like, there's the version we launched during, 00:32:34.760 |
which was the Bing version directed towards programmers. 00:32:40.560 |
that's why I call it like the first incarnation 00:32:43.120 |
'Cause it was already directed towards programmers. 00:32:44.800 |
We had like a code snippet search built in as well. 00:33:07.920 |
Got some traction, but really like we were only doing like, 00:33:10.720 |
I don't know, maybe like 10,000 searches a day. 00:33:17.000 |
'Cause looking back, the product like was not that good. 00:33:19.760 |
And yeah, every time we've like made an improvement 00:33:32.640 |
and importantly, like better underlying models. 00:33:35.640 |
Yeah, I would really consider every kind of iteration 00:33:40.760 |
every major version after that was when we introduced 00:33:51.520 |
when we were like, okay, our own models aren't good enough. 00:33:56.320 |
And that actually, that did lead to kind of like our first 00:34:09.960 |
But we were still kind of running into problems 00:34:15.840 |
but people were leaving because even like GPT-3.5, 00:34:15.840 |
like still not that great at doing like code-related 00:34:34.520 |
And so it was really only when GPT-4 came around in April 00:34:41.600 |
our first real opportunity to really make this thing 00:34:44.880 |
like the way that it should have been all along. 00:34:53.680 |
And so what we did was we just let anyone use GPT-4 00:35:33.600 |
And that's what like really, really made it blow up. 00:35:46.120 |
towards like really grabbing people's attention. 00:35:50.200 |
- So something I would be anxious about as a founder 00:35:53.760 |
So obviously we all remember that pretty closely. 00:36:08.840 |
Because it was like kind of de facto access to GPT-4 00:36:14.200 |
- GPT-4 was in ChatGPT from day one, I think. 00:36:14.200 |
we had people building unofficial APIs around Phind. 00:36:23.080 |
And I think OpenAI actually has the right perspective 00:36:36.000 |
"Okay, people can do whatever they want with the API. 00:36:37.520 |
If they're paying for it, they can do whatever they want. 00:36:50.240 |
to effectively crack down on those unofficial APIs, 00:37:10.640 |
how do we, like, what do we make of this, right? 00:37:19.080 |
which have just like massively, massively ballooned. 00:37:28.720 |
with the release of Llama 2, and Llama 3 on the horizon, 00:37:28.720 |
to vertical applications running their own models. 00:37:57.880 |
effectively two, two and a half years of research 00:38:05.840 |
all of like the instruction tuning techniques, RLHF, 00:38:13.520 |
and now there's all these other startups like Mistral too. 00:38:15.240 |
Like there's a bunch of very well-funded open source players 00:38:20.000 |
taking the recipe that's now known and scaling it up. 00:38:24.520 |
So I think that even if a delta exists in 2024, 00:38:24.520 |
the delta between proprietary and open source 00:38:29.440 |
than whatever the proprietary model is at the time. 00:38:54.720 |
And that's something that we're super excited about 00:38:58.200 |
'cause yeah, that brings us to kind of the Phind model 00:38:58.200 |
was to be able to return to that if that makes sense. 00:39:14.840 |
who like they want longer context in the model basically. 00:39:24.080 |
They want, and without, you know, context and retrieval 00:39:31.400 |
that if you have the space to just put the raw files 00:39:38.440 |
that is still better than chunking and retrieval. 00:39:42.760 |
with longer context, faster speed, lower cost. 00:39:46.440 |
And that's the direction that we're going with the Phind model. 00:39:46.440 |
that we can take a really good open source model 00:40:00.360 |
all of the high quality data that we can find. 00:40:12.440 |
One of the very interesting ideas that I've seen 00:40:30.960 |
So basically there's all this really high quality, 00:40:36.560 |
like human-made, human-written diff data out there 00:40:40.200 |
on every time someone makes a commit in some repo. 00:40:48.640 |
what should that code look like in the future? 00:40:55.320 |
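A hedged sketch of that idea as described (not Phind's actual pipeline): turn git commits into (before, after) training pairs so a model learns to predict the post-commit version of a file. Uses GitPython; the repo path and example format are assumptions.

```python
from git import Repo

def commit_pairs(repo_path: str, max_commits: int = 1000):
    repo = Repo(repo_path)
    for commit in repo.iter_commits(max_count=max_commits):
        if not commit.parents:
            continue  # root commits have no "before" state
        parent = commit.parents[0]
        for diff in parent.diff(commit):
            if diff.a_blob is None or diff.b_blob is None:
                continue  # skip pure file adds/deletes
            before = diff.a_blob.data_stream.read().decode("utf-8", "ignore")
            after = diff.b_blob.data_stream.read().decode("utf-8", "ignore")
            # One supervised example: the model sees `before` (plus the commit
            # message as an instruction) and is trained to produce `after`.
            yield {"instruction": commit.message.strip(),
                   "input": before, "output": after}
```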
So we ran this experiment, we trained the Phind model. 00:40:55.320 |
of the BigCode leaderboard by far, it's not close, 00:41:13.320 |
We have a 10 point gap between us and the next best model 00:41:18.320 |
on Java, JavaScript, I think C#, multilingual. 00:41:23.400 |
And what we kind of learned from that whole experience 00:41:36.360 |
And we know this because GPT-4 is able to predict 00:41:42.760 |
I've seen it predict like the specific example values 00:41:48.280 |
in the docstring, which is extremely improbable 00:41:53.560 |
So I think there's a lot of dataset contamination 00:42:15.880 |
I'm sure that, you know, a couple of months from now 00:42:17.080 |
next year, we'll be like, oh, you know, like GPT-4.5, 00:42:19.800 |
GPT-5, it's so much better, like GPT-4 is terrible. 00:42:25.640 |
And what we found is that when doing like temperature-zero 00:42:25.640 |
evals, GPT-4 is actually mostly deterministic 00:42:29.040 |
across runs in assigning scores to two different answers. 00:42:34.920 |
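As a sketch of what a temperature-zero judge loop can look like (the prompt wording and 1-10 scale are assumptions, not Phind's internal harness):

```python
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading two answers to a programming question.
Question: {question}
Answer A: {a}
Answer B: {b}
Score each answer from 1-10 for correctness, then output "A: <score> B: <score>"."""

def judge(question: str, answer_a: str, answer_b: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # temperature-zero grading is close to deterministic across runs
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, a=answer_a, b=answer_b)}],
    )
    return resp.choices[0].message.content
```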
here's what people will be asking this model dataset. 00:42:56.680 |
is just like releasing the model to our users 00:43:01.280 |
'Cause that's like the only thing that really matters 00:43:05.960 |
that it's intended for and then seeing how people react. 00:43:09.640 |
And for the most part, the incredible thing is 00:43:47.040 |
or just like better implementation than GPT-4, 00:43:57.400 |
where we've seen emerging capabilities in the Phind model 00:43:57.400 |
where like riddles with like temporal reasoning, 00:44:13.240 |
We went from not being able to do those at all 00:44:25.360 |
to being able to do them just by training on more code, 00:44:34.280 |
- Yeah, so I just wanted to make sure that we have the, 00:44:46.040 |
So unfortunately there's no Code Llama 70B. 00:44:46.040 |
If there was, that would be super cool, but there's not. 00:44:59.960 |
- Yeah, and they also did a couple of things. 00:45:08.680 |
So they actually increased the like max position embeddings 00:45:08.680 |
to give it theoretically better long context support 00:45:29.120 |
But yeah, but otherwise it's like basically Llama 2. 00:45:36.120 |
we haven't yet done anything with the model architecture 00:45:39.120 |
and we just trained it on like many, many more billions 00:45:43.960 |
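Concretely, the long-context tweaks he's describing are visible in the released Code Llama config; the values printed below come from the public Hugging Face config files and are worth double-checking against the current release.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("codellama/CodeLlama-34b-hf")
print(config.max_position_embeddings)  # 16384, vs. 4096 in Llama 2
print(config.rope_theta)               # 1000000.0, vs. 10000.0 in Llama 2
```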
And something else that we're taking a look at now 00:45:47.040 |
is using reinforcement learning for correctness. 00:45:50.360 |
One of the interesting pitfalls that we've noticed 00:45:59.160 |
sometimes is capable of getting the right answer. 00:46:09.000 |
and able to arrive at the right answer, but not always. 00:46:13.560 |
So something that we're gonna try is that like, 00:46:23.120 |
and then like use the correct answer as like a loss 00:46:27.360 |
basically to try to get it to be more correct. 00:46:31.600 |
And I think there's a high chance I think of this working 00:46:33.760 |
because it's very similar to the like RLHF method 00:46:36.880 |
where you basically show pairs of completions 00:46:40.560 |
for a given question, except the criteria is like, 00:46:48.280 |
But here, you know, we have a different criteria, 00:46:57.840 |
we just need to cajole it into being more consistent. 00:47:00.120 |
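A hedged sketch of that "RL for correctness" idea: sample several completions per question, check each against a known-correct signal (hypothetical unit tests here), and keep (correct, incorrect) pairs as preference data, analogous to RLHF pairs but with correctness as the criterion. This is one plausible instantiation, not Phind's actual method.

```python
import os
import subprocess
import tempfile

def passes_tests(code: str, test_code: str) -> bool:
    """Run a candidate solution against its tests (a real pipeline would sandbox this)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "solution.py")
        with open(path, "w") as f:
            f.write(code + "\n\n" + test_code)
        try:
            result = subprocess.run(["python", path], capture_output=True, timeout=30)
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0

def preference_pairs(question: str, completions: list[str], test_code: str) -> list[dict]:
    correct = [c for c in completions if passes_tests(c, test_code)]
    incorrect = [c for c in completions if c not in correct]
    # Each (chosen, rejected) pair can then feed a preference-tuning step
    # (a reward model + PPO, or DPO) to push the model toward consistency.
    return [{"prompt": question, "chosen": c, "rejected": r}
            for c in correct for r in incorrect]
```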
- There were a couple of things that I noticed 00:47:01.960 |
in the product that were not strange, but unique. 00:47:05.240 |
So first of all, the model can talk multiple times 00:47:13.400 |
And then you had outside of the thumbs up, thumbs down, 00:47:16.640 |
you have things like have the LLM prioritize this message 00:47:20.160 |
and its answers, or then continue from this message 00:47:29.640 |
yeah, what are like some tricks or learnings to that? 00:47:33.800 |
So yeah, that's specifically in our pair programmer mode, 00:47:40.000 |
that also like asks you clarifying questions back 00:47:46.240 |
if it doesn't fully understand what you're doing 00:47:47.800 |
and it kind of, it holds your hand a bit more. 00:47:55.240 |
to make more of an AutoGPT, where you can kind of give it 00:47:58.320 |
this problem that might take multiple searches 00:48:03.120 |
And so that's the impetus behind building that product, 00:48:11.400 |
and also be able to handle really long conversations. 00:48:14.040 |
Like people are really trying to use the pair programmer 00:48:16.320 |
to go from like, sometimes really from like basic idea 00:48:20.760 |
And so what we noticed was that we were having conversations, 00:48:25.600 |
sometimes with like 60 messages, like a hundred messages. 00:48:29.040 |
And like those become really, really challenging 00:48:31.560 |
to manage like the appropriate context window 00:48:42.520 |
or the product can continue giving good responses, 00:48:45.120 |
even if you're like 60 messages deep in a conversation. 00:48:47.880 |
So that's where the prioritized user messages 00:48:50.000 |
like comes from is like, people have asked us 00:48:56.720 |
that they want to be left in the conversation. 00:49:04.160 |
really gone a long way towards solving that problem. 00:49:07.080 |
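A toy sketch of that context-management idea (token counting is simplified and the budget is illustrative): when a conversation exceeds the model's window, keep the user-pinned "prioritized" messages and fill the remaining budget with the most recent turns.

```python
def pack_context(messages: list[dict], budget_tokens: int = 8000) -> list[dict]:
    def cost(m: dict) -> int:
        return len(m["content"]) // 4  # rough chars-per-token heuristic

    # Pinned ("prioritized") messages always stay in the window.
    pinned = [m for m in messages if m.get("pinned")]
    budget = budget_tokens - sum(cost(m) for m in pinned)

    # Fill what's left with the most recent unpinned turns.
    recent: list[dict] = []
    for m in reversed([m for m in messages if not m.get("pinned")]):
        if cost(m) > budget:
            break
        recent.insert(0, m)
        budget -= cost(m)

    # Preserve the original conversation ordering.
    keep = {id(m) for m in pinned + recent}
    return [m for m in messages if id(m) in keep]
```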
- Yeah, and then you have a run on Replit thing. 00:49:11.600 |
like learning some people trying to run the wrong code, 00:49:19.280 |
of like being a place where people can go from like idea 00:49:23.120 |
to like fully working code, having a code sandbox, 00:49:31.680 |
And Replit is great and people use that feature. 00:49:43.160 |
and then like recursively iterate on it, exactly. 00:49:50.240 |
- So Amjad has specifically told me in person 00:49:53.760 |
At the same time, he's also working on his own models. 00:49:59.600 |
Like he wants to power you, but also compete with you. 00:50:17.720 |
So like Replit approaches this problem from the IDE side. 00:50:17.720 |
And we're approaching it from the side of like an LLM 00:50:42.080 |
But we're kind of, we're approaching this problem 00:50:48.960 |
But I think that, you know, in the long, long term, 00:50:52.360 |
we have an opportunity to also just have like 00:50:56.480 |
this general kind of like technical reasoning engine product 00:51:00.680 |
that's, you know, potentially also not just for programmers 00:51:14.280 |
that eventually might go beyond like our current scope. 00:51:25.560 |
but first we gotta get the Paul Graham, Ron Conway story. 00:51:39.480 |
So the summer batch runs from June to September, 00:51:47.520 |
right around the time that many like YC startups 00:51:52.640 |
here's how we're gonna pitch investors and everything. 00:51:55.320 |
And at the same time, me and my co-founder, Justin, 00:52:03.240 |
we were thinking about building this company in New York, 00:52:10.720 |
pre-ChatGPT, pre last year, pre the AI boom, 00:52:10.720 |
- Lost its luster, yeah, like no one was here. 00:52:20.880 |
It was far from clear, like if there would be an AI boom, 00:52:29.920 |
If SF would be so back, as everyone is saying these days, 00:52:36.160 |
And so, and all of our friends, we were graduating college, 00:52:39.760 |
'cause like we happened to just graduate college 00:52:43.400 |
Like we didn't even have, I think we had a week in between. 00:52:54.760 |
from previous internships, but we both, like we, 00:53:02.520 |
at the company at which, like where I reneged my offer, 00:53:12.480 |
- Yeah, that was really great that they did that. 00:53:18.880 |
But yeah, we were both planning to be in New York. 00:53:21.240 |
And all of our friends were there from college. 00:53:24.160 |
And so like at this point, like we have this whole plan, 00:53:28.360 |
we're like on August 1st, we're gonna move to New York. 00:53:30.960 |
And we had like this Airbnb for the month of New York, 00:53:37.040 |
The day before we go to New York, I call Justin 00:53:40.840 |
and I just, I tell him like, why are we doing this? 00:53:48.720 |
that August 1st rolled around, all of our mentors 00:53:57.720 |
But like there were already signs that like something 00:54:02.520 |
even if like we didn't fully wanna admit it yet. 00:54:12.000 |
something kind of clicked when the rubber met the road 00:54:24.920 |
So we still go to New York 'cause like we have the Airbnb, 00:54:28.520 |
like we don't have any other kind of place to go 00:54:32.920 |
And New York is just unfortunately too much fun. 00:54:39.480 |
who are just, you know, like basically starting their jobs, 00:54:46.960 |
They're making all this money and they're like partying 00:54:50.400 |
And like, yeah, it's just a very distracting place to be. 00:54:52.720 |
And so we were just like sitting in this like small, 00:54:55.040 |
you know, like cramped apartment, terrible posture, 00:55:08.880 |
and he is doing office hours with a certain number 00:55:26.600 |
And like immediately, like half the spots were gone, 00:55:30.440 |
but somehow the very last spot was still available. 00:55:35.240 |
And so I picked the very, very last time slot 00:55:48.320 |
And so we made a plan that we're going to fly 00:55:50.880 |
from New York to SF and back to New York in one day 00:56:03.840 |
We meet PG, you know, we tell him about the startup. 00:56:08.040 |
And one thing I love about PG is that he gets like, 00:56:14.120 |
like you can see his eyes like really light up. 00:56:23.480 |
'Cause like, he'll just like start like, you know, 00:56:26.960 |
asking all these questions about how it works 00:56:34.520 |
- I think that like, he was asking us a lot of questions 00:56:41.240 |
'Cause like, as soon as like we told him like, 00:56:47.360 |
Like, we could really see like the gears turning 00:56:56.480 |
- And you're like 10 minutes with him, right? 00:56:57.400 |
- We had like 45, yeah, we had a decent chunk of time. 00:57:18.960 |
- And you're like, he haven't started your batch. 00:57:24.240 |
- Yeah, this is about halfway through the batch. 00:57:29.040 |
- Which when you're like not technically fundraising yet. 00:57:35.240 |
there was still a lot of issues with the product. 00:57:37.800 |
But I think like, it must have like still kind of 00:57:48.720 |
We have this dinner planned with this other friend 00:57:55.480 |
So we thought, okay, after an hour, we'll be done. 00:58:07.280 |
Or he's like, I gotta go have dinner with my wife, Jessica. 00:58:15.720 |
- Yeah, yeah, like Jessica does not get enough credit 00:58:25.560 |
'Cause she like, he understands like the technical side 00:58:44.520 |
who like we also promised to get dinner with. 00:58:47.440 |
So like, we'd love to, but like, I don't know if we can. 00:58:51.000 |
So like, yeah, so all of us just like hop in his car 00:58:54.720 |
and we go to his house and then we just like have this, 00:59:03.320 |
Like I remember him telling Jessica distinctly, 00:59:10.880 |
are like, are not gonna know what like a search result is. 00:59:23.720 |
- And you also just spoiled the booking system for PG. 00:59:27.160 |
'Cause now everyone's just gonna go after the last slot. 00:59:34.280 |
Yeah, I've met other founders that he did it this year. 00:59:41.120 |
that like YC just did like a random like scheduling system. 00:59:49.200 |
- Who is one of the most legendary angels in Silicon Valley. 00:59:54.960 |
the rest of our round came together pretty quickly. 01:00:00.480 |
like it might feel like playing favorites, right? 01:00:20.160 |
and like these accelerators in general is like, 01:00:21.920 |
YC gets like a lot of criticism from founders 01:00:23.880 |
who feel like they didn't get value out of it. 01:00:26.680 |
But like, in my view, YC is what you make of it. 01:00:31.800 |
they're like, you really got to grab this opportunity, 01:00:36.800 |
And if you do, then it could be the best thing in the world. 01:00:39.400 |
And if you don't, and if you're just kind of like a passive, 01:00:41.800 |
even like an average founder in YC, you're still gonna fail. 01:00:45.560 |
if you're average in your batch, you're gonna fail. 01:00:48.760 |
Like you have to just be exceptional in every way. 01:00:52.600 |
the rest of our round came together pretty quickly, 01:01:06.760 |
And we're just holed up in this like little house 01:01:20.680 |
we go over to the patio, where like our workstation is. 01:01:24.200 |
And Ron Conway, he's known for having like this notebook 01:01:27.840 |
that he goes around with, where he like sits down 01:01:30.960 |
with the notebook and like takes very, very detailed notes. 01:01:35.440 |
So he sits down with his notebook and he asks us like, 01:01:53.640 |
And then like he leaves a couple hours later, 01:02:10.120 |
Ron is known for writing these like one-liner emails 01:02:13.720 |
that are like very short, but very to the point. 01:02:16.400 |
And I think that's why like everyone responds to Ron. 01:02:23.160 |
He responds quickly, like tagging this VP of AI at NVIDIA. 01:02:26.440 |
And we start working with NVIDIA, which is great. 01:02:29.360 |
And something that I love about NVIDIA, by the way, 01:02:35.360 |
And at NVIDIA, they know that they're gonna win regardless. 01:02:40.360 |
So they don't care where you get the GPUs from. 01:02:45.280 |
unlike various sales reps that you might encounter 01:02:55.640 |
if you're getting NVIDIA GPUs, they're still winning. 01:03:05.440 |
- So like, so, okay, and then just to tie up this thing, 01:03:08.600 |
because it, so first of all, that's a fantastic story. 01:03:10.960 |
And like, you know, I just wanted to let you tell that 01:03:17.840 |
That you already decided to make by the time you met Ron, 01:03:20.040 |
which is we are going to have our own hardware. 01:03:22.240 |
We're gonna rack him in a data center somewhere. 01:03:26.840 |
'cause actually we don't, but we just need GPUs period. 01:03:35.400 |
and like they wanna make you commit to long terms 01:03:45.680 |
NVIDIA will kind of be to the point and be like, 01:03:53.800 |
Like they'll help you walk through what the options are. 01:04:07.960 |
they actually implemented a custom feature for us 01:04:16.400 |
Yeah, I don't think they would have done it otherwise. 01:04:18.640 |
They implemented streaming generation for T5-based models. 01:04:18.640 |
up until we switched to GPT in February, March of this year. 01:04:30.560 |
So they implemented that just for us actually, 01:04:34.120 |
And so like, they'll help you look at the complete picture 01:04:36.720 |
and then just help you get done what you need to get done. 01:04:40.400 |
- And I know one of your interests is also local models, 01:04:44.760 |
open source models and hardware kind of goes hand in hand. 01:04:54.200 |
- Yeah, so it's something that we're very interested in. 01:04:59.200 |
Because something that kind of we're hearing a lot about 01:05:09.000 |
but they wanna have it like within like their own sandbox. 01:05:11.840 |
They wanna have it like on hardware that they control. 01:05:16.040 |
in how we can get big models to run efficiently 01:05:26.480 |
Very interested in like where the whole quantization thing 01:05:32.680 |
'Cause like, obviously there are all these like great 01:05:34.080 |
quantization libraries now that go to four bit, eight bit, 01:05:52.800 |
Yeah, so we have these great quantization libraries 01:05:55.280 |
that for the most part are able to get the size down 01:06:02.520 |
like the quantized models currently are actually worse 01:06:06.520 |
And so I'm very curious if the future is something like 01:06:08.760 |
what NVIDIA is doing with their implementation of FP8, 01:06:15.760 |
Where basically once FP8 support is kind of more widespread 01:06:26.520 |
you can kind of switch between the two different FP8 formats 01:06:31.240 |
one with greater precision, one with greater range, 01:06:34.600 |
and then combine that with only not doing FP8 on every layer 01:06:57.280 |
but that's something that we're excited about 01:07:02.760 |
and other hardware once they get FP8 support as well. 01:07:16.960 |
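For reference, the "two different FP8 formats" are E4M3 (more precision) and E5M2 (more range). A tiny sketch using PyTorch's experimental FP8 dtypes (available in recent versions; deciding which layers get FP8 at all is exactly the open question he raises):

```python
import torch

x = torch.randn(4)
# E4M3: more mantissa bits -> finer precision, max representable ~448
e4m3 = x.to(torch.float8_e4m3fn)
# E5M2: more exponent bits -> wider range (max ~57344), coarser precision
e5m2 = x.to(torch.float8_e5m2)
print(e4m3.float(), e5m2.float())
```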
well, I'm fortunate to have like a decent systems background 01:07:20.000 |
from UT Austin and somewhat of a research background, 01:07:23.120 |
even though like I didn't publish any papers, 01:07:28.200 |
Like I didn't publish the thesis that I wrote 01:07:30.600 |
mainly out of time because I was doing both of that 01:07:35.480 |
and then everything was kind of one after another. 01:07:38.080 |
But like I'm very fortunate to kind of have like the systems 01:07:40.200 |
and like a bit of like a research background. 01:07:41.960 |
But yeah, for the most part, outside of that foundation, 01:07:50.720 |
Like where do you, what fire hose do you drink from? 01:07:54.000 |
So like whenever I see something that blows my mind, 01:07:56.560 |
the way that that initial Hugging Face demo did, 01:08:10.360 |
just trying to get a mental model of what is happening. 01:08:15.360 |
so I can understand like the how and the why. 01:08:15.360 |
then I can make my own hypotheses about like, 01:08:27.880 |
And here's why maybe they're correct, maybe they're wrong. 01:08:30.360 |
And here's how like I can improve on it and iterate on it. 01:08:33.600 |
And I guess that's the mindset that I approach it from 01:08:46.680 |
'cause like I would have loved to just have been able 01:08:48.560 |
to say like, hey, like I have no idea what I'm doing. 01:08:51.080 |
Can you just like be this like technical research assistant 01:08:54.360 |
and kind of hold my hand and like ask me clarifying questions 01:08:57.320 |
and like help me like formalize my assumptions 01:09:07.480 |
- Because I think you would use Phind differently 01:09:07.480 |
It's like, no, no, even like non-technical questions as well. 01:09:20.360 |
'Cause that's just something I'm curious about. 01:09:34.160 |
because of very deliberate decisions that we've made 01:09:59.520 |
So like sometimes it's slower for like simple questions, 01:10:10.160 |
call for hiring any roles you're looking for. 01:10:13.640 |
What should people know about working at Phind? 01:10:13.640 |
a lot of the work that we've done has been solely product. 01:10:28.320 |
But we also do, especially now with the Phind model, 01:10:28.320 |
in trying to apply the very latest techniques 01:10:42.320 |
to training the very, very best model for our vertical. 01:10:46.800 |
And the two go hand in hand because the product, 01:11:01.080 |
So we're doing really kind of both at the same time. 01:11:03.560 |
And so someone who like enjoys seeing both of those sides, 01:11:06.960 |
like doing something very tangible that affects the user, 01:11:11.080 |
high quality, reliable code that runs in production, 01:11:39.120 |
- Yeah, well, we already have a founding engineer technically. 01:11:56.000 |
acceleration, exploration, and then a takeaway. 01:12:21.480 |
to now like mostly a reasoning heavy product. 01:12:24.320 |
And we had no idea that this would happen this fast. 01:12:26.040 |
Like we thought that like there'd be a lot more time 01:12:29.720 |
and like many more things that needed to happen 01:12:32.600 |
before we could do some level of like intelligent reasoning 01:12:41.000 |
and it happened much faster than we could have thought. 01:12:58.440 |
being able to guarantee that the answer will be correct 01:13:25.600 |
There's a very interesting paper that came out recently. 01:13:52.160 |
Because I feel like LMQL is a little bit too structured 01:14:01.280 |
- This is only something we've begun to take a look at. 01:14:08.680 |
we're definitely interested in exploring further. 01:14:11.000 |
But something that we are like a bit further along on 01:14:12.720 |
is also like exploring reinforcement learning 01:14:15.520 |
for correctness, as opposed to only harmfulness 01:14:19.080 |
the way it has typically been used in the research. 01:14:19.080 |
Do you have internal evals for what hallucination rate is 01:14:42.120 |
We more measure like was the answer right or was it wrong? 01:14:51.240 |
like the RAG context fed into the model as well. 01:14:54.200 |
So basically, if the context was bad and the answer was bad, 01:15:10.160 |
- Harrison from LangChain has been talking about 01:15:11.640 |
this sort of two by two matrix with the RAGs people. 01:15:31.160 |
and doing them in a quantitative way is really hard, 01:15:36.600 |
that I think harnesses GPT-4 in the right way. 01:15:47.760 |
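A toy sketch of that two-by-two bookkeeping (the judging itself, human or temperature-zero GPT-4, is out of scope here; the grades below are placeholder data): score the retrieved context and the final answer separately, so retrieval failures can be distinguished from generation failures.

```python
from collections import Counter

def bucket(context_good: bool, answer_good: bool) -> str:
    return {
        (True, True): "good context, good answer",
        (True, False): "good context, bad answer",   # generation failure
        (False, True): "bad context, good answer",   # model knew it anyway
        (False, False): "bad context, bad answer",   # retrieval failure
    }[(context_good, answer_good)]

# Placeholder grades; in practice these come from a judge per eval question.
grades = [(True, True), (True, False), (False, False), (True, True)]
print(Counter(bucket(c, a) for c, a in grades))
```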
like a prompting markup language for prompting models 01:15:53.120 |
'Cause we've written some very, very complex prompts, 01:15:58.080 |
to like do like very fancy things with people's code. 01:16:17.120 |
through some other abstraction above language 01:16:22.120 |
that has been like tested to do that some of the time, 01:16:26.440 |
perhaps like combined with like formal grammar limitations 01:16:35.040 |
- These are all things that have kind of emerged directly 01:16:38.080 |
from the issues we're facing ourselves at Phind. 01:16:45.640 |
what's one message, idea you want people to remember 01:16:50.720 |
- Yeah, I think pay attention to those moments 01:17:07.800 |
'Cause I see a lot of people trying to start startups 01:17:15.560 |
or like I'm like generally interested in the space. 01:17:27.120 |
it's much easier to stay like obsessed every single day 01:18:00.160 |
that believe that like the future of solving problems 01:18:03.960 |
and making things will be just like focused more 01:18:18.760 |
is what really gets you through the tough times 01:18:21.160 |
and hopefully gets you to the other side someday. 01:18:26.560 |
I kinda wanna play "Lose Yourself" as the outro music.