Beating GPT-4 with Open Source Models - with Michael Royzen of Phind
Chapters
0:00 Introductions
1:02 Founding SmartLens in High School (2017)
3:44 Shifting to NLP
5:10 Sparking Interest in Long-Form Q&A (Hugging Face Demo)
8:32 Creating a Search Engine (Common Crawl, 2020)
11:29 Early Days: Hello Cognition to Phind
13:35 Phind Launch & In-Depth Look
20:58 Envisioning Phind: Integrating Reasoning with Code & Web
23:26 Exploring the Developer Productivity Landscape
26:28 Phind's Top Use Cases & Early Adoption
30:00 Behind Phind’s Rebranding (Advice from Paul Graham)
39:40 Crafting a Custom Model (Code Llama & Expanded Data)
44:34 Phind's Model: Evaluation Tactics & Metrics
47:00 Enhancing Accuracy with Reinforcement Learning
51:18 Running Models Locally: Interest & Techniques (Quantization)
67:13 Michael’s Autodidact Journey in AI Research
72:00 Lightning Round
- Hey everyone, welcome to the Latent Space Podcast. 00:00:13.840 |
and I'm joined by my co-host, Swyx, founder of Smol AI. 00:00:22.000 |
- Yeah, we are recording this in a surprisingly hot October 00:00:25.200 |
in San Francisco, and I mean, sometimes the studio works, 00:00:50.560 |
but then obviously you can fill in the blanks 00:00:53.880 |
So you actually were a high school entrepreneur. 00:01:10.720 |
the deep learning revolution was already in flow, 00:01:13.480 |
and good computer vision models were a thing. 00:01:16.940 |
And what really made me interested in deep learning 00:01:19.180 |
was I got invited to go to Apple's WWDC conference 00:01:26.960 |
'cause I was really into making iOS apps at the time. 00:01:43.760 |
And after seeing that, I was like, oh, this is cool. 00:01:52.920 |
And so I had this crazy idea where it was like, 00:02:28.200 |
Yeah, so I took that, filtered it, pre-processed it, 00:02:32.800 |
and then did a massive fine-tune on Inception V3, 00:02:40.680 |
the leading deep convolutional computer vision model at the time. 00:02:44.420 |
And to my surprise, it actually worked insanely well. 00:02:53.220 |
I think it ended up being 17,000 categories approximately 00:03:00.920 |
It worked so well that it actually worked better 00:03:07.940 |
And so, and on top of this, the model ran on the device. 00:03:14.120 |
A big part of the issue with Google Lens at the time 00:03:22.480 |
having to upload an image to a server and get it back. 00:03:28.000 |
even on the iPhones of the day in 2017, much faster. 00:03:37.620 |
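For readers who want the shape of that kind of fine-tune, here's a minimal sketch, assuming a PyTorch/torchvision setup; the ~17,000-way head matches the category count mentioned, but the weights, optimizer, and loss weighting are illustrative, not the actual SmartLens code:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 17_000  # approximate category count mentioned in the episode

# Start from an ImageNet-pretrained Inception V3 and swap both classifier heads.
model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
model.AuxLogits.fc = nn.Linear(model.AuxLogits.fc.in_features, NUM_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    model.train()
    optimizer.zero_grad()
    # In train mode, Inception V3 also returns auxiliary logits.
    logits, aux_logits = model(images)
    loss = loss_fn(logits, labels) + 0.4 * loss_fn(aux_logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```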
And there was kind of one big spike in usage, 00:03:45.680 |
Oh, it's like a monthly or annual subscription? 00:03:48.560 |
- Even though you don't actually have any servers. 00:04:04.200 |
that the usage was surprisingly not that frequent. 00:04:08.280 |
The extent to which all three of us have a sense of sight, 00:04:11.920 |
I would think that if I lost my sense of sight, 00:04:15.320 |
The average usage of Be My Eyes per day is 1.5 times. 00:04:21.360 |
where I was also looking into image captioning, 00:04:32.360 |
well, people want to give a description of an image, 00:04:43.720 |
NVIDIA was working on this back in 2019, 2020. 00:04:46.480 |
They had some impressive, I think, face GANs, 00:04:54.560 |
But it wasn't able to take a natural language description 00:05:17.640 |
I'm still sort of working on updating the app in college. 00:05:24.280 |
hey, what if I make an enterprise version of this as well? 00:05:32.680 |
But I thought, this massive classification model 00:05:37.000 |
works so well, and it's so small, and so fast, 00:05:43.240 |
or do any of those things that you're supposed to do. 00:05:44.760 |
I was just mainly interested in building a type of backend 00:05:49.640 |
So I was mainly just doing it for myself, just to learn. 00:05:53.040 |
And so I built this enterprise classification product, 00:05:57.880 |
I'm also building an invoice processing product, 00:06:03.200 |
where using some of the aspects that I built previously, 00:06:07.640 |
although obviously it's very different from classification, 00:06:14.880 |
from an unstructured invoice through our API. 00:06:18.560 |
And that's what led me to HuggingFace for the first time, 00:06:21.800 |
'cause that involves some natural language components. 00:06:31.320 |
I used the standard BERT, and also Longformer, 00:06:37.400 |
And Longformer was interesting because it allowed, 00:06:43.000 |
Like BERT, all of the first-gen encoder-only models, 00:06:46.640 |
they only had a context window of 512 tokens. 00:06:51.840 |
There was none of this ALiBi or RoPE that we have now, 00:06:54.960 |
where we can basically massage it to be longer. 00:06:57.040 |
They were fixed, 512 absolute position encodings. 00:07:00.720 |
And so Longformer at the time was the only way 00:07:06.560 |
or ask a question about like 4,000 tokens worth of text. 00:07:10.000 |
And so I implemented Longformer, and it worked super well. 00:07:14.960 |
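As a rough illustration of why that mattered, here's a sketch with today's Hugging Face transformers API (the checkpoint and task setup are assumptions, not the original implementation): Longformer attends over up to 4,096 tokens where BERT-style encoders stop at 512.

```python
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Longformer accepts ~4,096 tokens vs. BERT's fixed 512 absolute positions.
tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = AutoModelForQuestionAnswering.from_pretrained("allenai/longformer-base-4096")
# Note: the QA head on this base checkpoint is untrained; you'd fine-tune it,
# as described in the episode.

question = "What is the total amount due?"
document = "..."  # e.g. the full text of an invoice, thousands of tokens long

inputs = tokenizer(question, document, return_tensors="pt",
                   truncation=True, max_length=4096)
with torch.no_grad():
    outputs = model(**inputs)

start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax() + 1
print(tokenizer.decode(inputs["input_ids"][0][start:end]))
```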
But nobody really kind of used the enterprise product. 00:07:28.960 |
And so nobody really used it, and my heart wasn't in it, 00:07:35.200 |
But a little later, I went back to Hugging Face, 00:07:35.200 |
They had this demo made by this researcher, Yacine Jernite. 00:07:42.440 |
And he called it Long-Form Question Answering. 00:07:47.480 |
And basically, it was this self-contained notebook demo 00:08:15.200 |
The demo itself, it used, I think, BART as the model. 00:08:19.720 |
for both an Elasticsearch index of Wikipedia, 00:08:24.720 |
as well as a dense index, powered by Facebook's FAISS, 00:08:24.720 |
But when it worked, I think the question in the demo was, 00:08:56.800 |
and it would know what to do and just give you the answer. 00:09:03.040 |
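The overall shape of that demo, sketched with modern libraries (the index names, sentence encoder, and checkpoints here are stand-ins, not the notebook's exact components): sparse hits from Elasticsearch plus dense hits from a FAISS index, with BART generating the final answer.

```python
import faiss
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer
from transformers import BartForConditionalGeneration, BartTokenizer

es = Elasticsearch("http://localhost:9200")
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in dense encoder
dense_index = faiss.read_index("wikipedia.faiss")    # hypothetical prebuilt index

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
generator = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

def answer(question: str, passages: list[str], k: int = 5) -> str:
    # Sparse retrieval from an Elasticsearch index of Wikipedia.
    hits = es.search(index="wikipedia", query={"match": {"text": question}}, size=k)
    sparse = [h["_source"]["text"] for h in hits["hits"]["hits"]]
    # Dense retrieval from FAISS; passages[] maps row ids back to text.
    _, ids = dense_index.search(embedder.encode([question]), k)
    dense = [passages[i] for i in ids[0]]
    # Condition BART on the question plus retrieved context.
    prompt = "question: " + question + " context: " + " ".join(sparse + dense)
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1024)
    out = generator.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```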
And I started thinking about ways to make it better. 00:09:14.280 |
it was fine-tuned on this Reddit dataset called ELI5. 00:09:23.800 |
Someone had scraped, I think, I forget who did it, 00:09:36.720 |
So we're bootstrapping this model from ELI5, 00:09:44.640 |
when doing this rag retrieval from these databases 00:09:51.280 |
And so ELI5 actually turned out to be a good dataset 00:09:54.600 |
for training these types of question-answering models 00:10:01.360 |
and at least helps the model get the format right. 00:10:19.800 |
it's able to have a reasonably high-quality output. 00:10:22.400 |
And so once I made the model as big as I can, 00:10:28.280 |
I started looking for ways to improve the index. 00:10:38.560 |
for how to make an Elasticsearch index just for Wikipedia. 00:10:42.280 |
And I was like, "Why not do all of Common Crawl?" 00:10:50.640 |
worth of AWS credits left over from the SmartLens project. 00:10:59.200 |
And so I was able to spin up a bunch of instances 00:11:01.480 |
and just process all of Common Crawl, which is massive. 00:11:04.640 |
So it's roughly like, it's terabytes of text. 00:11:12.240 |
I went to Alexa to get like the top 1000 websites 00:11:25.760 |
'cause the webpages were already included in the dump. 00:11:40.520 |
because obviously there's this massive long tail 00:11:43.120 |
of small sites that are really cool, actually. 00:11:50.040 |
which is a search engine specialized on the long tail. 00:11:53.000 |
I think they actually exclude like the top 10,000. 00:11:58.520 |
and just don't really know what their pitch is. 00:12:04.920 |
but for this, that was kind of out of the question, 00:12:16.680 |
approximately 350 million webpages through Elasticsearch. 00:12:22.680 |
So I built this index running on AWS with these webpages, 00:12:28.120 |
Like you can ask it like general common knowledge, 00:12:31.360 |
history, politics, current events, questions, 00:12:35.320 |
and it would be able to do a fast lookup in the index, 00:12:40.280 |
and it would give like a surprisingly good result. 00:12:55.360 |
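An illustrative sketch of that pipeline's shape (the domain list, index mapping, and file paths are assumptions): stream Common Crawl WARC records, keep pages from a whitelist of top domains, and bulk-index the text into Elasticsearch.

```python
from urllib.parse import urlparse
from warcio.archiveiterator import ArchiveIterator
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")
TOP_DOMAINS = set(open("top_domains.txt").read().split())  # e.g. from Alexa rankings

def docs_from_warc(path: str):
    with open(path, "rb") as stream:
        for record in ArchiveIterator(stream):
            if record.rec_type != "response":
                continue
            url = record.rec_headers.get_header("WARC-Target-URI")
            if urlparse(url).netloc.removeprefix("www.") not in TOP_DOMAINS:
                continue  # skip the long tail, keep only whitelisted domains
            body = record.content_stream().read()
            yield {
                "_index": "commoncrawl",
                "_source": {"url": url, "html": body.decode("utf-8", "ignore")},
            }

# Hypothetical segment filename; a real run fans this out across many instances.
helpers.bulk(es, docs_from_warc("CC-MAIN-2020-segment-00000.warc.gz"))
```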
And yeah, I was kind of shocked no one was doing this, 00:13:31.720 |
It was still a glorified summarizer, basically. 00:13:36.640 |
- I think BLOOM ended up actually coming out in 2022, 00:13:47.560 |
the BLOOM models just were never really that good, 00:13:50.320 |
which is so sad 'cause I really wanted to use them. 00:13:53.120 |
But I think they didn't train on that much data. 00:14:00.360 |
which we now know are far below Chinchilla-optimal. 00:14:05.800 |
what we're currently doing with the Phind model goes, 00:14:13.120 |
And then they didn't really do any fine tuning 00:14:16.360 |
So T0 worked well because they took the T5 models, 00:14:28.240 |
similar to GPT-3, but the models were much smaller. 00:14:30.840 |
So the models, yeah, they were pre-trained better. 00:14:46.720 |
from diverse data sources in the fall of 2021. 00:14:52.560 |
This is before Flan-T5, which came out in 2022. 00:15:04.200 |
on top of T0, I also did the Reddit ELI5 fine-tune. 00:15:16.200 |
to where I didn't get discouraged like I did previously. 00:15:23.600 |
Sometimes it would just misinterpret your answers so, 00:15:31.840 |
But for the first time, it was working reasonably well. 00:15:36.640 |
I think the BART model is like 800 million parameters, 00:15:45.200 |
And that was the very first iteration of Hello. 00:15:57.720 |
Our fine-tuned T0 model connected to our Elasticsearch index 00:16:02.160 |
of those 350 million webpages from the top 10,000 Common Crawl websites. 00:16:16.360 |
that's effectively connected to like a large enough index 00:16:21.640 |
that I would consider like an internet scale. 00:16:28.360 |
like an internet-scale, LLM-powered RAG search system 00:16:28.360 |
And around the time me and my future co-founder, Justin, 00:16:56.080 |
go to sleep, wake up the next morning at like eight 00:17:01.440 |
And I was also doing my thesis at the same time, 00:17:19.880 |
And the conclusions of my research actually kind of helped 00:17:52.920 |
And ChatGPT browsing will think that llama.cpp 00:17:52.920 |
that you can just compile with GCC and you're all good. 00:18:02.800 |
even though I'm sure somewhere in their internal prompts, 00:18:05.480 |
they have something like, if you're not sure, do a lookup. 00:18:12.520 |
And so we approached LLM powered question answering 00:18:19.440 |
We pivoted to make this for programmers in June of 2022, 00:18:25.880 |
around the time that we were getting into YC. 00:18:33.040 |
is the case where the models actually have to think. 00:18:37.160 |
the models were kind of more glorified summarization models. 00:18:42.520 |
the Google featured snippets, but on steroids. 00:18:48.280 |
the simpler questions would get commoditized. 00:18:57.560 |
to like answer the more basic kind of like summarization, 00:19:03.000 |
like current events questions with lightweight models. 00:19:05.320 |
That'll only continue to get cheaper over time. 00:19:07.680 |
And so we kind of started thinking about this trade-off 00:19:09.760 |
where LLMs are going to get both better 00:19:09.760 |
Either you can run a model of the same intelligence 00:19:25.480 |
or you can run a better model for the same price. 00:19:33.960 |
they're going to deploy, and they're already doing this 00:19:35.560 |
with SGE, they're going to deploy a relatively basic 00:19:43.160 |
about like current events, like who won the Super Bowl, 00:19:43.160 |
And the flip side of that is like more complex questions 00:19:56.160 |
and you have to solve problems and like debug code. 00:19:58.760 |
And we realized like we were much more interested 00:20:06.480 |
And so we've optimized everything that we do for that. 00:20:10.480 |
And that's a big reason why we've built Phind 00:20:10.480 |
what the emergent properties are in terms of reasoning, 00:20:25.800 |
in terms of being able to solve complex multi-step problems. 00:20:30.480 |
And I think that some of those emergent capabilities, 00:20:37.320 |
So as I think there's always an opportunity for us 00:20:48.080 |
what is the best, most advanced reasoning engine 00:20:55.200 |
that's connected to the internet that we can just provide? 00:21:10.320 |
when they have a question or when they're frustrated 00:21:18.120 |
if you're experiencing really any kind of issue 00:21:31.200 |
It has an interface in VS Code and more IDEs to come. 00:21:44.560 |
or they will find other code in your code base, 00:21:56.440 |
So, that's really the philosophy behind Phind. 00:21:56.440 |
And so, right now from a product perspective, 00:22:10.800 |
So, the VS Code extension that we launched recently 00:22:17.120 |
and it knows where to find the right code context 00:22:25.960 |
And it's not just reliant on what the model knows. 00:22:29.280 |
And it's able to figure out what it needs by itself 00:22:44.360 |
But the issue is also not everyone wants to use VS Code. 00:22:53.240 |
or they're using PyCharm or other JetBrains IDEs. 00:22:53.240 |
of all these startups doing code, doing search, et cetera. 00:23:15.320 |
But really, who everyone's competing with is ChatGPT, 00:23:37.280 |
people are happy to go somewhere else, basically. 00:23:46.760 |
people sometimes perhaps aren't even in an IDE. 00:24:03.080 |
And so, the web part of it also exists for that, 00:24:28.040 |
Yeah, so I thought the podcast with Aman was great, 00:24:37.640 |
about not having platform risk in the long term, 00:24:28.040 |
but some of the features that were mentioned, 00:24:54.600 |
We haven't yet seen, with VS Code in particular, 00:24:59.280 |
any functionality that we'd like to do yet in the IDE 00:25:09.960 |
or something that we kind of hack into there, 00:25:15.440 |
And so I think it remains to be seen where that goes. 00:25:21.800 |
is we're not trying to just be in an IDE or be an IDE. 00:25:28.440 |
and is really meant to cover the entire lifecycle 00:25:37.400 |
and I want to get from that idea to a working product. 00:25:42.600 |
of Phind is really about, is starting with that, 00:25:37.400 |
is just going to be really just the problem solving. 00:26:10.880 |
some impression about the type of traffic that you have, 00:26:19.520 |
And I don't know if you have some mental categorization 00:26:28.560 |
So the two main types of searches that we see 00:26:32.720 |
are how-to questions, like how to do X using Y tool. 00:26:37.840 |
And this historically has been our bread and butter, 00:26:44.480 |
at just going over a bunch of developer documentation 00:26:48.240 |
and figuring out exactly the part that's relevant 00:26:50.200 |
and just telling you, okay, like you can use this method. 00:26:59.920 |
people organically just started pasting in code 00:27:03.520 |
that's not working and just said, fix it for me. 00:27:19.120 |
Maybe it required like some multi-step reasoning. 00:27:25.080 |
or something found in either a Stack Overflow post 00:27:31.000 |
And so then they paste it into Phind and then Phind works. 00:27:25.080 |
So those are really those two different cases. 00:27:45.720 |
And so that's what a big part of our VS Code extension is, 00:27:51.560 |
here, just like fix it for me type of workflow. 00:27:55.880 |
Like it's in your code base, it's in the IDE. 00:28:02.560 |
But at the end of the day, like I said previously, 00:28:05.920 |
that's still a relatively, not to say it's a small part, 00:28:46.000 |
"Would that be the primary value proposition?" 00:28:48.800 |
And so what we've seen is that any model plus web search 00:28:51.960 |
is just significantly better than that model itself. 00:28:54.640 |
- Do you think that's what you got right in April? 00:28:55.920 |
Like, so you got 1500 points on Hacker News in April, 00:28:59.400 |
which is like, if you live on Hacker News a lot, 00:29:02.280 |
that is unheard of for someone so early on in your journey. 00:29:17.680 |
So we launched the very first version of Phind 00:29:17.680 |
after like the previous demo connected to our own index. 00:29:26.760 |
Like once we got into YC, we scrapped our own index 00:29:42.200 |
And over time, every time we like added some intelligence 00:29:46.040 |
to the product, a better model, we just keep launching. 00:30:08.160 |
Should we go for like the full Paul Graham story 00:30:11.320 |
- Do you wanna do it now or you wanna do it later? 00:30:15.360 |
- I think, okay, let's just start with the name for now 00:30:17.520 |
and then we can do the full Paul Graham story later. 00:30:24.960 |
he saw our name, and our domain at the time was sayhello.so. 00:30:17.520 |
you know, we just kind of broke college students. 00:30:44.120 |
because it was the first like conversational search engine. 00:30:49.240 |
that's the angle that we were approaching it from. 00:30:55.360 |
Like the sayhello, like what does that even mean? 00:30:58.520 |
And like .so, like, it's gotta be like a .com. 00:31:02.560 |
We did some time just like with Paul Graham in the room. 00:31:05.640 |
We just like looked at different domain names, 00:31:07.840 |
like different things that like popped into our head. 00:31:13.240 |
Like with the P-H-I-N-D spelling in particular. 00:31:15.720 |
- Yeah, which is not typical naming advice, right? 00:31:30.040 |
But over time, like it kind of, it kept growing on us. 00:31:40.160 |
It's owned by this elderly Canadian gentleman 00:31:42.920 |
who we got to know, and he was willing to sell it to us. 00:31:42.920 |
I mean, you know, everyone who looks at you is wondering. 00:31:56.640 |
and a lot of people actually pronounce it finned, 00:31:59.160 |
which, you know, by now is kind of, you know, 00:32:08.160 |
and then just have that redirect to P-H-I-N-D. 00:32:10.920 |
So P-H-I-N-D is like definitely the right spelling. 00:32:15.880 |
- So Bing web search, and then in August you launched V2. 00:32:29.040 |
like I don't really think of it that way in my mind. 00:32:31.120 |
There's like, there's the version we launched during, 00:32:34.760 |
which was the Bing version directed towards programmers. 00:32:40.560 |
that's why I call it like the first incarnation 00:32:43.120 |
'Cause it was already directed towards programmers. 00:32:44.800 |
We had like a code snippet search built in as well. 00:33:07.920 |
Got some traction, but really like we were only doing like, 00:33:10.720 |
I don't know, maybe like 10,000 searches a day. 00:33:17.000 |
'Cause looking back, the product like was not that good. 00:33:19.760 |
And yeah, every time we've like made an improvement 00:33:32.640 |
and importantly, like better underlying models. 00:33:35.640 |
Yeah, I would really consider every kind of iteration 00:33:40.760 |
every major version after that was when we introduced 00:33:51.520 |
when we were like, okay, our own models aren't good enough. 00:33:56.320 |
And that actually, that did lead to kind of like our first 00:34:09.960 |
But we were still kind of running into problems 00:34:15.840 |
but people were leaving because even like GPT-3.5, 00:34:15.840 |
like still not that great at doing like code-related 00:34:34.520 |
And so it was really only when GPT-4 came around in April 00:34:41.600 |
our first real opportunity to really make this thing 00:34:44.880 |
like the way that it should have been all along. 00:34:53.680 |
And so what we did was we just let anyone use GPT-4 00:35:33.600 |
And that's what like really, really made it blow up. 00:35:46.120 |
towards like really grabbing people's attention. 00:35:50.200 |
- So something I would be anxious about as a founder 00:35:53.760 |
So obviously we all remember that pretty closely. 00:36:08.840 |
Because it was like kind of de facto access to GPT-4 00:36:14.200 |
- GPT-4 was in ChatGPT from day one, I think. 00:36:14.200 |
we had people building unofficial APIs around Phind. 00:36:23.080 |
And I think OpenAI actually has the right perspective 00:36:36.000 |
"Okay, people can do whatever they want with the API. 00:36:37.520 |
If they're paying for it, they can do whatever they want. 00:36:50.240 |
to effectively crack down on those unofficial APIs, 00:37:10.640 |
how do we, like, what do we make of this, right? 00:37:19.080 |
which have just like massively, massively ballooned. 00:37:28.720 |
with the release of Llama 2, and Llama 3 on the horizon, 00:37:28.720 |
to vertical applications running their own models. 00:37:57.880 |
effectively two, two and a half years of research 00:38:05.840 |
all of like the instruction tuning techniques, RLHF, 00:38:13.520 |
and now there's all these other startups like Mistral too. 00:38:15.240 |
Like there's a bunch of very well-funded open source players 00:38:20.000 |
taking the recipe that's now known and scaling it up. 00:38:24.520 |
So I think that even if a delta exists in 2024, 00:38:24.520 |
the delta between proprietary and open source 00:38:29.440 |
than whatever the proprietary model is at the time. 00:38:54.720 |
And that's something that we're super excited about 00:38:58.200 |
'cause yeah, that brings us to kind of the Phind model 00:38:58.200 |
was to be able to return to that if that makes sense. 00:39:14.840 |
who like they want longer context in the model basically. 00:39:24.080 |
They want, and without, you know, context and retrieval 00:39:31.400 |
that if you have the space to just put the raw files 00:39:38.440 |
that is still better than chunking and retrieval. 00:39:42.760 |
with longer context, faster speed, lower cost. 00:39:46.440 |
And that's the direction that we're going with the Phind model. 00:39:46.440 |
that we can take a really good open source model 00:40:00.360 |
all of the high quality data that we can find. 00:40:12.440 |
One of the very interesting ideas that I've seen 00:40:30.960 |
So basically there's all this really high quality, 00:40:36.560 |
like human-made, human-written diff data out there 00:40:40.200 |
on every time someone makes a commit in some repo. 00:40:48.640 |
what should that code look like in the future? 00:40:55.320 |
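A hedged sketch of that idea as described (not Phind's actual pipeline): turn git commits into (before, after) training pairs so a model learns to predict the post-commit version of a file. Uses GitPython; the repo path and example format are assumptions.

```python
from git import Repo

def commit_pairs(repo_path: str, max_commits: int = 1000):
    repo = Repo(repo_path)
    for commit in repo.iter_commits(max_count=max_commits):
        if not commit.parents:
            continue  # root commits have no "before" state
        parent = commit.parents[0]
        for diff in parent.diff(commit):
            if diff.a_blob is None or diff.b_blob is None:
                continue  # skip pure file adds/deletes
            before = diff.a_blob.data_stream.read().decode("utf-8", "ignore")
            after = diff.b_blob.data_stream.read().decode("utf-8", "ignore")
            # One supervised example: the model sees `before` (plus the commit
            # message as an instruction) and is trained to produce `after`.
            yield {"instruction": commit.message.strip(),
                   "input": before, "output": after}
```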
So we ran this experiment, we trained the Phind model. 00:40:55.320 |
of the BigCode leaderboard by far, it's not close, 00:41:13.320 |
We have a 10 point gap between us and the next best model 00:41:18.320 |
on Java, JavaScript, I think C#, multilingual. 00:41:23.400 |
And what we kind of learned from that whole experience 00:41:36.360 |
And we know this because GPT-4 is able to predict 00:41:42.760 |
I've seen it predict like the specific example values 00:41:48.280 |
in the docstring, which is extremely improbable 00:41:53.560 |
So I think there's a lot of dataset contamination 00:42:15.880 |
I'm sure that, you know, a couple of months from now 00:42:17.080 |
next year, we'll be like, oh, you know, like GPT-4.5, 00:42:19.800 |
GPT-5, it's so much better, like GPT-4 is terrible. 00:42:25.640 |
And what we found is that when doing like temperature-zero 00:42:25.640 |
evals, GPT-4 is actually mostly deterministic 00:42:29.040 |
across runs in assigning scores to two different answers. 00:42:34.920 |
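As a sketch of what a temperature-zero judge loop can look like (the prompt wording and 1-10 scale are assumptions, not Phind's internal harness):

```python
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading two answers to a programming question.
Question: {question}
Answer A: {a}
Answer B: {b}
Score each answer from 1-10 for correctness, then output "A: <score> B: <score>"."""

def judge(question: str, answer_a: str, answer_b: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # temperature-zero grading is close to deterministic across runs
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, a=answer_a, b=answer_b)}],
    )
    return resp.choices[0].message.content
```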
here's what people will be asking this model dataset. 00:42:56.680 |
is just like releasing the model to our users 00:43:01.280 |
'Cause that's like the only thing that really matters 00:43:05.960 |
that it's intended for and then seeing how people react. 00:43:09.640 |
And for the most part, the incredible thing is 00:43:47.040 |
or just like better implementation than GPT-4, 00:43:57.400 |
where we've seen emerging capabilities in the Phind model 00:43:57.400 |
where like riddles with like temporal reasoning, 00:44:13.240 |
We went from not being able to do those at all 00:44:25.360 |
to being able to do them just by training on more code, 00:44:34.280 |
- Yeah, so I just wanted to make sure that we have the, 00:44:46.040 |
So unfortunately there's no Code Llama 70B. 00:44:46.040 |
If there was, that would be super cool, but there's not. 00:44:59.960 |
- Yeah, and they also did a couple of things. 00:45:08.680 |
So they actually increased the like max position embeddings 00:45:08.680 |
to give it theoretically better long context support 00:45:29.120 |
But yeah, but otherwise it's like basically Llama 2. 00:45:36.120 |
we haven't yet done anything with the model architecture 00:45:39.120 |
and we just trained it on like many, many more billions 00:45:43.960 |
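Concretely, the long-context tweaks he's describing are visible in the released Code Llama config; the values printed below come from the public Hugging Face config files and are worth double-checking against the current release.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("codellama/CodeLlama-34b-hf")
print(config.max_position_embeddings)  # 16384, vs. 4096 in Llama 2
print(config.rope_theta)               # 1000000.0, vs. 10000.0 in Llama 2
```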
And something else that we're taking a look at now 00:45:47.040 |
is using reinforcement learning for correctness. 00:45:50.360 |
One of the interesting pitfalls that we've noticed 00:45:59.160 |
sometimes is capable of getting the right answer. 00:46:09.000 |
and able to arrive at the right answer, but not always. 00:46:13.560 |
So something that we're gonna try is that like, 00:46:23.120 |
and then like use the correct answer as like a loss 00:46:27.360 |
basically to try to get it to be more correct. 00:46:31.600 |
And I think there's a high chance I think of this working 00:46:33.760 |
because it's very similar to the like RLHF method 00:46:36.880 |
where you basically show pairs of completions 00:46:40.560 |
for a given question, except the criteria is like, 00:46:48.280 |
But here, you know, we have a different criteria, 00:46:57.840 |
we just need to cajole it into being more consistent. 00:47:00.120 |
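A hedged sketch of that "RL for correctness" idea: sample several completions per question, check each against a known-correct signal (hypothetical unit tests here), and keep (correct, incorrect) pairs as preference data, analogous to RLHF pairs but with correctness as the criterion. This is one plausible instantiation, not Phind's actual method.

```python
import os
import subprocess
import tempfile

def passes_tests(code: str, test_code: str) -> bool:
    """Run a candidate solution against its tests (a real pipeline would sandbox this)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "solution.py")
        with open(path, "w") as f:
            f.write(code + "\n\n" + test_code)
        try:
            result = subprocess.run(["python", path], capture_output=True, timeout=30)
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0

def preference_pairs(question: str, completions: list[str], test_code: str) -> list[dict]:
    correct = [c for c in completions if passes_tests(c, test_code)]
    incorrect = [c for c in completions if c not in correct]
    # Each (chosen, rejected) pair can then feed a preference-tuning step
    # (a reward model + PPO, or DPO) to push the model toward consistency.
    return [{"prompt": question, "chosen": c, "rejected": r}
            for c in correct for r in incorrect]
```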
- There were a couple of things that I noticed 00:47:01.960 |
in the product that were not strange, but unique. 00:47:05.240 |
So first of all, the model can talk multiple times 00:47:13.400 |
And then you had outside of the thumbs up, thumbs down, 00:47:16.640 |
you have things like have the LLM prioritize this message 00:47:20.160 |
and its answers, or then continue from this message 00:47:29.640 |
yeah, what are like some tricks or learnings to that? 00:47:33.800 |
So yeah, that's specifically in our pair programmer mode, 00:47:40.000 |
that also like asks you clarifying questions back 00:47:46.240 |
if it doesn't fully understand what you're doing 00:47:47.800 |
and it kind of, it holds your hand a bit more. 00:47:55.240 |
to make more of an AutoGPT, where you can kind of give it 00:47:58.320 |
this problem that might take multiple searches 00:48:03.120 |
And so that's the impetus behind building that product, 00:48:11.400 |
and also be able to handle really long conversations. 00:48:14.040 |
Like people are really trying to use the pair programmer 00:48:16.320 |
to go from like, sometimes really from like basic idea 00:48:20.760 |
And so what we noticed was that we were having conversations, 00:48:25.600 |
sometimes with like 60 messages, like a hundred messages. 00:48:29.040 |
And like those become really, really challenging 00:48:31.560 |
to manage like the appropriate context window 00:48:42.520 |
or the product can continue giving good responses, 00:48:45.120 |
even if you're like 60 messages deep in a conversation. 00:48:47.880 |
So that's where the prioritized user messages 00:48:50.000 |
like comes from is like, people have asked us 00:48:56.720 |
that they want to be left in the conversation. 00:49:04.160 |
really gone a long way towards solving that problem. 00:49:07.080 |
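A toy sketch of that context-management idea (token counting is simplified and the budget is illustrative): when a conversation exceeds the model's window, keep the user-pinned "prioritized" messages and fill the remaining budget with the most recent turns.

```python
def pack_context(messages: list[dict], budget_tokens: int = 8000) -> list[dict]:
    def cost(m: dict) -> int:
        return len(m["content"]) // 4  # rough chars-per-token heuristic

    # Pinned ("prioritized") messages always stay in the window.
    pinned = [m for m in messages if m.get("pinned")]
    budget = budget_tokens - sum(cost(m) for m in pinned)

    # Fill what's left with the most recent unpinned turns.
    recent: list[dict] = []
    for m in reversed([m for m in messages if not m.get("pinned")]):
        if cost(m) > budget:
            break
        recent.insert(0, m)
        budget -= cost(m)

    # Preserve the original conversation ordering.
    keep = {id(m) for m in pinned + recent}
    return [m for m in messages if id(m) in keep]
```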
- Yeah, and then you have a run on Replit thing. 00:49:11.600 |
like learning some people trying to run the wrong code, 00:49:19.280 |
of like being a place where people can go from like idea 00:49:23.120 |
to like fully working code, having a code sandbox, 00:49:31.680 |
And Replit is great and people use that feature. 00:49:43.160 |
and then like recursively iterate on it, exactly. 00:49:50.240 |
- So Amjad has specifically told me in person 00:49:53.760 |
At the same time, he's also working on his own models. 00:49:59.600 |
Like he wants to power you, but also compete with you. 00:50:17.720 |
So like Replit approaches this problem from the IDE side. 00:50:17.720 |
And we're approaching it from the side of like an LLM 00:50:42.080 |
But we're kind of, we're approaching this problem 00:50:48.960 |
But I think that, you know, in the long, long term, 00:50:52.360 |
we have an opportunity to also just have like 00:50:56.480 |
this general kind of like technical reasoning engine product 00:51:00.680 |
that's, you know, potentially also not just for programmers 00:51:14.280 |
that eventually might go beyond like our current scope. 00:51:25.560 |
but first we gotta get the Paul Graham, Ron Conway story. 00:51:39.480 |
So the summer batch runs from June to September, 00:51:47.520 |
right around the time that many like YC startups 00:51:52.640 |
here's how we're gonna pitch investors and everything. 00:51:55.320 |
And at the same time, me and my co-founder, Justin, 00:52:03.240 |
we were thinking about building this company in New York, 00:52:10.720 |
pre-ChatGPT, pre last year, pre the AI boom, 00:52:10.720 |
- Lost its luster, yeah, like no one was here. 00:52:20.880 |
It was far from clear, like if there would be an AI boom, 00:52:29.920 |
If SF would be so back, as everyone is saying these days, 00:52:36.160 |
And so, and all of our friends, we were graduating college, 00:52:39.760 |
'cause like we happened to just graduate college 00:52:43.400 |
Like we didn't even have, I think we had a week in between. 00:52:54.760 |
from previous internships, but we both, like we, 00:53:02.520 |
at the company at which, like where I reneged my offer, 00:53:12.480 |
- Yeah, that was really great that they did that. 00:53:18.880 |
But yeah, we were both planning to be in New York. 00:53:21.240 |
And all of our friends were there from college. 00:53:24.160 |
And so like at this point, like we have this whole plan, 00:53:28.360 |
we're like on August 1st, we're gonna move to New York. 00:53:30.960 |
And we had like this Airbnb for the month of New York, 00:53:37.040 |
The day before we go to New York, I call Justin 00:53:40.840 |
and I just, I tell him like, why are we doing this? 00:53:48.720 |
that August 1st rolled around, all of our mentors 00:53:57.720 |
But like there were already signs that like something 00:54:02.520 |
even if like we didn't fully wanna admit it yet. 00:54:12.000 |
something kind of clicked when the rubber met the road 00:54:24.920 |
So we still go to New York 'cause like we have the Airbnb, 00:54:28.520 |
like we don't have any other kind of place to go 00:54:32.920 |
And New York is just unfortunately too much fun. 00:54:39.480 |
who are just, you know, like basically starting their jobs, 00:54:46.960 |
They're making all this money and they're like partying 00:54:50.400 |
And like, yeah, it's just a very distracting place to be. 00:54:52.720 |
And so we were just like sitting in this like small, 00:54:55.040 |
you know, like cramped apartment, terrible posture, 00:55:08.880 |
and he is doing office hours with a certain number 00:55:26.600 |
And like immediately, like half the spots were gone, 00:55:30.440 |
but somehow the very last spot was still available. 00:55:35.240 |
And so I picked the very, very last time slot 00:55:48.320 |
And so we made a plan that we're going to fly 00:55:50.880 |
from New York to SF and back to New York in one day 00:56:03.840 |
We meet PG, you know, we tell him about the startup. 00:56:08.040 |
And one thing I love about PG is that he gets like, 00:56:14.120 |
like you can see his eyes like really light up. 00:56:23.480 |
'Cause like, he'll just like start like, you know, 00:56:26.960 |
asking all these questions about how it works 00:56:34.520 |
- I think that like, he was asking us a lot of questions 00:56:41.240 |
'Cause like, as soon as like we told him like, 00:56:47.360 |
Like, we could really see like the gears turning 00:56:56.480 |
- And you're like 10 minutes with him, right? 00:56:57.400 |
- We had like 45, yeah, we had a decent chunk of time. 00:57:18.960 |
- And you're like, he haven't started your batch. 00:57:24.240 |
- Yeah, this is about halfway through the batch. 00:57:29.040 |
- Which when you're like not technically fundraising yet. 00:57:35.240 |
there was still a lot of issues with the product. 00:57:37.800 |
But I think like, it must have like still kind of 00:57:48.720 |
We have this dinner planned with this other friend 00:57:55.480 |
So we thought, okay, after an hour, we'll be done. 00:58:07.280 |
Or he's like, I gotta go have dinner with my wife, Jessica. 00:58:15.720 |
- Yeah, yeah, like Jessica does not get enough credit 00:58:25.560 |
'Cause she like, he understands like the technical side 00:58:44.520 |
who like we also promised to get dinner with. 00:58:47.440 |
So like, we'd love to, but like, I don't know if we can. 00:58:51.000 |
So like, yeah, so all of us just like hop in his car 00:58:54.720 |
and we go to his house and then we just like have this, 00:59:03.320 |
Like I remember him telling Jessica distinctly, 00:59:10.880 |
are like, are not gonna know what like a search result is. 00:59:23.720 |
- And you also just spoiled the booking system for PG. 00:59:27.160 |
'Cause now everyone's just gonna go after the last slot. 00:59:34.280 |
Yeah, I've met other founders that he did it this year. 00:59:41.120 |
that like YC just did like a random like scheduling system. 00:59:49.200 |
- Who is one of the most legendary angels in Silicon Valley. 00:59:54.960 |
the rest of our round came together pretty quickly. 01:00:00.480 |
like it might feel like playing favorites, right? 01:00:20.160 |
and like these accelerators in general is like, 01:00:21.920 |
YC gets like a lot of criticism from founders 01:00:23.880 |
who feel like they didn't get value out of it. 01:00:26.680 |
But like, in my view, YC is what you make of it. 01:00:31.800 |
they're like, you really got to grab this opportunity, 01:00:36.800 |
And if you do, then it could be the best thing in the world. 01:00:39.400 |
And if you don't, and if you're just kind of like a passive, 01:00:41.800 |
even like an average founder in YC, you're still gonna fail. 01:00:45.560 |
if you're average in your batch, you're gonna fail. 01:00:48.760 |
Like you have to just be exceptional in every way. 01:00:52.600 |
the rest of our round came together pretty quickly, 01:01:06.760 |
And we're just holed up in this like little house 01:01:20.680 |
we go over to the patio, where like our workstation is. 01:01:24.200 |
And Ron Conway, he's known for having like this notebook 01:01:27.840 |
that he goes around with, where he like sits down 01:01:30.960 |
with the notebook and like takes very, very detailed notes. 01:01:35.440 |
So he sits down with his notebook and he asks us like, 01:01:53.640 |
And then like he leaves a couple hours later, 01:02:10.120 |
Ron is known for writing these like one-liner emails 01:02:13.720 |
that are like very short, but very to the point. 01:02:16.400 |
And I think that's why like everyone responds to Ron. 01:02:23.160 |
He responds quickly, like tagging this VP of AI at NVIDIA. 01:02:26.440 |
And we start working with NVIDIA, which is great. 01:02:29.360 |
And something that I love about NVIDIA, by the way, 01:02:35.360 |
And at NVIDIA, they know that they're gonna win regardless. 01:02:40.360 |
So they don't care where you get the GPUs from. 01:02:45.280 |
unlike various sales reps that you might encounter 01:02:55.640 |
if you're getting NVIDIA GPUs, they're still winning. 01:03:05.440 |
- So like, so, okay, and then just to tie up this thing, 01:03:08.600 |
because it, so first of all, that's a fantastic story. 01:03:10.960 |
And like, you know, I just wanted to let you tell that 01:03:17.840 |
That you already decided to make by the time you met Ron, 01:03:20.040 |
which is we are going to have our own hardware. 01:03:22.240 |
We're gonna rack him in a data center somewhere. 01:03:26.840 |
'cause actually we don't, but we just need GPUs period. 01:03:35.400 |
and like they wanna make you commit to long terms 01:03:45.680 |
NVIDIA will kind of be to the point and be like, 01:03:53.800 |
Like they'll help you walk through what the options are. 01:04:07.960 |
they actually implemented a custom feature for us 01:04:16.400 |
Yeah, I don't think they would have done it otherwise. 01:04:18.640 |
They implemented streaming generation for T5-based models. 01:04:18.640 |
up until we switched to GPT in February, March of this year. 01:04:30.560 |
So they implemented that just for us actually, 01:04:34.120 |
And so like, they'll help you look at the complete picture 01:04:36.720 |
and then just help you get done what you need to get done. 01:04:40.400 |
- And I know one of your interests is also local models, 01:04:44.760 |
open source models and hardware kind of goes hand in hand. 01:04:54.200 |
- Yeah, so it's something that we're very interested in. 01:04:59.200 |
Because something that kind of we're hearing a lot about 01:05:09.000 |
but they wanna have it like within like their own sandbox. 01:05:11.840 |
They wanna have it like on hardware that they control. 01:05:16.040 |
in how we can get big models to run efficiently 01:05:26.480 |
Very interested in like where the whole quantization thing 01:05:32.680 |
'Cause like, obviously there are all these like great 01:05:34.080 |
quantization libraries now that go to four bit, eight bit, 01:05:52.800 |
Yeah, so we have these great quantization libraries 01:05:55.280 |
that for the most part are able to get the size down 01:06:02.520 |
like the quantized models currently are actually worse 01:06:06.520 |
And so I'm very curious if the future is something like 01:06:08.760 |
what NVIDIA is doing with their implementation of FP8, 01:06:15.760 |
Where basically once FP8 support is kind of more widespread 01:06:26.520 |
you can kind of switch between the two different FP8 formats 01:06:31.240 |
one with greater precision, one with greater range, 01:06:34.600 |
and then combine that with only not doing FP8 on every layer 01:06:57.280 |
but that's something that we're excited about 01:07:02.760 |
and other hardware once they get FP8 support as well. 01:07:16.960 |
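For reference, the "two different FP8 formats" are E4M3 (more precision) and E5M2 (more range). A tiny sketch using PyTorch's experimental FP8 dtypes (available in recent versions; deciding which layers get FP8 at all is exactly the open question he raises):

```python
import torch

x = torch.randn(4)
# E4M3: more mantissa bits -> finer precision, max representable ~448
e4m3 = x.to(torch.float8_e4m3fn)
# E5M2: more exponent bits -> wider range (max ~57344), coarser precision
e5m2 = x.to(torch.float8_e5m2)
print(e4m3.float(), e5m2.float())
```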
well, I'm fortunate to have like a decent systems background 01:07:20.000 |
from UT Austin and somewhat of a research background, 01:07:23.120 |
even though like I didn't publish any papers, 01:07:28.200 |
Like I didn't publish the thesis that I wrote 01:07:30.600 |
mainly out of time because I was doing both of that 01:07:35.480 |
and then everything was kind of one after another. 01:07:38.080 |
But like I'm very fortunate to kind of have like the systems 01:07:40.200 |
and like a bit of like a research background. 01:07:41.960 |
But yeah, for the most part, outside of that foundation, 01:07:50.720 |
Like where do you, what fire hose do you drink from? 01:07:54.000 |
So like whenever I see something that blows my mind, 01:07:56.560 |
the way that that initial Hugging Face demo did, 01:08:10.360 |
just trying to get a mental model of what is happening. 01:08:15.360 |
so I can understand like the how and the why. 01:08:15.360 |
then I can make my own hypotheses about like, 01:08:27.880 |
And here's why maybe they're correct, maybe they're wrong. 01:08:30.360 |
And here's how like I can improve on it and iterate on it. 01:08:33.600 |
And I guess that's the mindset that I approach it from 01:08:46.680 |
'cause like I would have loved to just have been able 01:08:48.560 |
to say like, hey, like I have no idea what I'm doing. 01:08:51.080 |
Can you just like be this like technical research assistant 01:08:54.360 |
and kind of hold my hand and like ask me clarifying questions 01:08:57.320 |
and like help me like formalize my assumptions 01:09:07.480 |
- Because I think you would use Phind differently 01:09:07.480 |
It's like, no, no, even like non-technical questions as well. 01:09:20.360 |
'Cause that's just something I'm curious about. 01:09:34.160 |
because of very deliberate decisions that we've made 01:09:59.520 |
So like sometimes it's slower for like simple questions, 01:10:10.160 |
call for hiring any roles you're looking for. 01:10:13.640 |
What should people know about working at Phind? 01:10:13.640 |
a lot of the work that we've done has been solely product. 01:10:28.320 |
But we also do, especially now with the Phind model, 01:10:28.320 |
in trying to apply the very latest techniques 01:10:42.320 |
to training the very, very best model for our vertical. 01:10:46.800 |
And the two go hand in hand because the product, 01:11:01.080 |
So we're doing really kind of both at the same time. 01:11:03.560 |
And so someone who like enjoys seeing both of those sides, 01:11:06.960 |
like doing something very tangible that affects the user, 01:11:11.080 |
high quality, reliable code that runs in production, 01:11:39.120 |
- Yeah, well, we already have a founding engineer technically. 01:11:56.000 |
acceleration, exploration, and then a takeaway. 01:12:21.480 |
to now like mostly a reasoning heavy product. 01:12:24.320 |
And we had no idea that this would happen this fast. 01:12:26.040 |
Like we thought that like there'd be a lot more time 01:12:29.720 |
and like many more things that needed to happen 01:12:32.600 |
before we could do some level of like intelligent reasoning 01:12:41.000 |
and it happened much faster than we could have thought. 01:12:58.440 |
being able to guarantee that the answer will be correct 01:13:25.600 |
There's a very interesting paper that came out recently. 01:13:52.160 |
Because I feel like LMQL is a little bit too structured 01:14:01.280 |
- This is only something we've begun to take a look at. 01:14:08.680 |
we're definitely interested in exploring further. 01:14:11.000 |
But something that we are like a bit further along on 01:14:12.720 |
is also like exploring reinforcement learning 01:14:15.520 |
for correctness, as opposed to only harmfulness 01:14:19.080 |
the way it has typically been used in the research. 01:14:19.080 |
Do you have internal evals for what hallucination rate is 01:14:42.120 |
We more measure like was the answer right or was it wrong? 01:14:51.240 |
like the RAG context fed into the model as well. 01:14:54.200 |
So basically, if the context was bad and the answer was bad, 01:15:10.160 |
- Harrison from LangChain has been talking about 01:15:11.640 |
this sort of two by two matrix with the RAGs people. 01:15:31.160 |
and doing them in a quantitative way is really hard, 01:15:36.600 |
that I think harnesses GPT-4 in the right way. 01:15:47.760 |
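A toy sketch of that two-by-two bookkeeping (the judging itself, human or temperature-zero GPT-4, is out of scope here; the grades below are placeholder data): score the retrieved context and the final answer separately, so retrieval failures can be distinguished from generation failures.

```python
from collections import Counter

def bucket(context_good: bool, answer_good: bool) -> str:
    return {
        (True, True): "good context, good answer",
        (True, False): "good context, bad answer",   # generation failure
        (False, True): "bad context, good answer",   # model knew it anyway
        (False, False): "bad context, bad answer",   # retrieval failure
    }[(context_good, answer_good)]

# Placeholder grades; in practice these come from a judge per eval question.
grades = [(True, True), (True, False), (False, False), (True, True)]
print(Counter(bucket(c, a) for c, a in grades))
```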
like a prompting markup language for prompting models 01:15:53.120 |
'Cause we've written some very, very complex prompts, 01:15:58.080 |
to like do like very fancy things with people's code. 01:16:17.120 |
through some other abstraction above language 01:16:22.120 |
that has been like tested to do that some of the time, 01:16:26.440 |
perhaps like combined with like formal grammar limitations 01:16:35.040 |
- These are all things that have kind of emerged directly 01:16:38.080 |
from the issues we're facing ourselves at Phind. 01:16:45.640 |
what's one message, idea you want people to remember 01:16:50.720 |
- Yeah, I think pay attention to those moments 01:17:07.800 |
'Cause I see a lot of people trying to start startups 01:17:15.560 |
or like I'm like generally interested in the space. 01:17:27.120 |
it's much easier to stay like obsessed every single day 01:18:00.160 |
that believe that like the future of solving problems 01:18:03.960 |
and making things will be just like focused more 01:18:18.760 |
is what really gets you through the tough times 01:18:21.160 |
and hopefully gets you to the other side someday. 01:18:26.560 |
I kinda wanna play "Lose Yourself" as the outro music.