The "Normsky" architecture for AI coding agents — with Beyang Liu + Steve Yegge of Sourcegraph
Chapters
0:00 Intros & Backgrounds
6:20 How Steve's work on Grok inspired Sourcegraph for Beyang
8:53 From code search to AI coding assistant
13:18 Comparison of coding assistants and the capabilities of Cody
16:49 The importance of context (RAG) in AI coding tools
20:33 The debate between Chomsky and Norvig approaches in AI
25:02 Code completion vs Agents as the UX
30:06 Normsky: the Norvig + Chomsky models collision
36:00 How to build the right context for coding
42:00 The death of the DSL?
46:15 LSP, SCIP, Kythe, BFG, and all that fun stuff
62:00 The Sourcegraph internal stack
68:46 Building on open source models
74:35 Sourcegraph for engineering managers?
86:00 Lightning Round
This is Alessio, partner and CTO in residence at Decibel Partners. 00:00:07.600 |
And I'm joined by my co-host, Swyx, founder of Smol.ai. 00:00:10.760 |
Hey, and today we're christening our new podcast studio 00:00:16.200 |
And we have Beyang and Steve from Sourcegraph. 00:00:24.480 |
We also are just celebrating the one year anniversary of ChatGPT 00:00:30.360 |
But also we'll be talking about the GA of Cody later on today. 00:00:34.480 |
But we'll just do a quick intros of both of you. 00:00:37.320 |
Obviously, people can research you and check the show notes 00:00:40.880 |
But Beyang, you worked in computer vision at Stanford, 00:00:55.120 |
Well, the end user thing was Google Code Search. 00:00:58.100 |
That's what everyone called it, or just like CS. 00:01:00.680 |
But the brains of it were really the Trigram index and then 00:01:08.720 |
Today it's called Kythe, the open source Google one. 00:01:15.640 |
you've interviewed a bunch of other code search developers, 00:01:18.760 |
including the current developer of Kythe, right? 00:01:24.200 |
although we would love to if they're up for it. 00:01:27.480 |
We had Kelly Norton, who built a similar system at Etsy. 00:01:43.120 |
--I think heavily inspired by the Trigram index that 00:02:11.040 |
I guess the back story was, I used Google Code Search 00:02:19.360 |
and worked elsewhere, it was the single dev tool 00:02:23.840 |
I felt like my job was just a lot more tedious and much more 00:02:29.840 |
And so when Quinn and I started working together at Palantir, 00:02:32.420 |
he had also used various code search engines in open source 00:02:38.440 |
And it was just a pain point that we both felt, 00:02:49.120 |
large financial institutions, folks like that. 00:02:57.840 |
made our pain points feel small by comparison. 00:03:11.960 |
And revealed-- and you've told many, many stories. 00:03:15.160 |
I want every single listener of "Latent Space" 00:03:17.040 |
to check out Steve's YouTube, because he effectively 00:03:25.240 |
You just hit record and just went on a few rants. 00:03:34.640 |
had some interesting thoughts on just the overall Google 00:03:38.320 |
You joined Grab as head of Eng for a couple of years. 00:03:40.720 |
I'm from Singapore, so I have actually personally 00:04:04.560 |
about as a good startup that people admire or look up 00:04:08.880 |
to, on the league that you, with all your legendary experience, 00:04:18.440 |
They actually didn't even know that they were as good 00:04:22.600 |
They started hiring a bunch of people from Silicon Valley 00:04:28.880 |
could have been a little better, operational excellence 00:04:32.680 |
And the only thing about Grab is that they get criticized a lot 00:04:41.240 |
By Singaporeans who don't want to work there. 00:04:44.400 |
OK, well, I guess I'm biased because I'm here, 00:04:54.520 |
because they were more Westernized than the Sanders 00:04:57.880 |
I mean, they had their success because they are laser-focused. 00:05:02.960 |
I mean, they're executing really, really, really well. 00:05:23.200 |
because they're just out there with their sleeves rolled up, 00:05:35.400 |
Yeah, in the way that super apps don't exist in the West. 00:05:38.000 |
It's one of the greatest mysteries, enduring mysteries 00:05:48.160 |
And it was primarily because of bandwidth reasons 00:06:04.760 |
Any-- I think-- and that's also where you discover some need 00:06:11.360 |
Better programming languages, better databases, 00:06:15.000 |
I mean, I started in '95, where there was kind of nothing. 00:06:21.400 |
you first went to Grab, because you wrote that blog post, 00:06:41.560 |
Yeah, so I guess the back story, from my point of view, 00:06:44.880 |
is I had used Code Search and Grok while at Google. 00:06:49.360 |
But I didn't actually know that it was connected to you, Steve. 00:06:52.720 |
Like, I knew you from your blog posts, which were always 00:06:55.160 |
excellent, kind of like inside, very thoughtful takes on-- 00:06:59.640 |
from an engineer's perspective, on some of the challenges 00:07:08.000 |
within the context of code intelligence and code 00:07:10.120 |
understanding, was I watched a talk that you gave, 00:07:13.720 |
I think, at Stanford about Grok when you were first 00:07:20.640 |
who writes the extremely thoughtful, ranty blog posts, 00:07:27.520 |
And so that's how I knew you were kind of involved in that. 00:07:57.400 |
I had this dagger of jealousy stabbed through me, 00:08:00.400 |
piercingly, which I remember, because I am not 00:08:11.580 |
I got sucked back into the ads vortex and whatever. 00:08:14.440 |
So thank god, Sourcegraph actually kind of rescued me. 00:08:27.560 |
Is there anything else that people should know about you 00:08:51.840 |
this has been a company 10 years in the making. 00:08:54.480 |
And as Sean said, now you're at the right place. 00:08:59.520 |
Now exactly, you spent 10 years collecting all this code, 00:09:02.480 |
indexing, making it easy to surface it, and how-- 00:09:05.640 |
And also learning how to work with enterprises 00:09:07.960 |
and having them trust you with their code bases. 00:09:10.360 |
Because initially, you were only doing on-prem, right, like VPC, 00:09:15.880 |
So in the very early days, we were cloud only. 00:09:22.960 |
And that was, I think, related to the nature of the problem 00:09:27.600 |
just a critical, unignorable pain point once you're 00:09:32.920 |
And now Cody is going to be GA by the time this releases. 00:09:38.360 |
Congrats to your future self for launching this in two weeks. 00:09:42.440 |
Can you give a quick overview of just what Cody is? 00:09:45.280 |
I think everybody understands that it's an AI coding agent. 00:09:49.440 |
But a lot of companies say they have an AI coding agent. 00:09:57.680 |
from the several dozen other AI coding agents 00:10:04.320 |
when we thought about building a coding assistant that 00:10:08.360 |
would do things like code generation and question 00:10:11.800 |
think we came at it from the perspective of we've 00:10:14.600 |
spent the past decade building the world's best code 00:10:17.880 |
understanding engine for human developers, right? 00:10:26.280 |
if you want to go and dive into a large, complex code base. 00:10:30.360 |
And so our intuition was that a lot of the context 00:10:35.640 |
would also be useful context for AI developers to consume. 00:10:43.560 |
Cody is very similar to a lot of other assistants. 00:10:49.640 |
It does specific commands that automate tasks 00:10:55.640 |
like generating unit tests or adding detailed documentation. 00:11:01.080 |
But we think the core differentiator is really 00:11:08.280 |
It's a bit like saying, what's the difference between Google 00:11:12.520 |
There's not a quick checkbox list of features 00:11:15.880 |
But it really just comes down to all the attention and detail 00:11:19.000 |
that we've paid to making that context work well and be 00:11:24.760 |
For human devs, we're now kind of plugging into the AI coding 00:11:30.020 |
I mean, just to add, just to add my own perspective 00:11:40.920 |
that the LLM has available that knows about your code. 00:11:45.000 |
RAG provides basically a bridge to a lookup system 00:11:49.520 |
Whereas fine-tuning would be more like on-the-job training 00:11:54.000 |
If the LLM is a person, and you send them to a new job, 00:12:05.480 |
because the expert knows your particular code base, 00:12:12.620 |
And there's a chicken-and-egg problem, because we're like, 00:12:15.160 |
well, I'm going to ask the LLM about my code. 00:12:34.640 |
and using code search, and then starting to feel like without 00:12:40.760 |
Once you start using these-- do you guys use coding assistants? 00:12:44.400 |
I mean, we're getting to the point very quickly, right? 00:12:50.640 |
almost like you're programming without the internet, right? 00:12:53.480 |
It's like you're programming back in the '90s 00:12:59.480 |
who have no idea about coding assistants, what they are. 00:13:08.920 |
We had Codeium and Codium, very similar names. 00:13:13.180 |
Griblet, Phind, and then, of course, there's Copilot. 00:13:26.760 |
And I think it really shows the context improvement. 00:13:43.880 |
Versus Cody was like, oh, these are the major functions 00:13:51.280 |
And then the other one was, how do I start this up? 00:13:56.440 |
even though there was no start command in the package.json. 00:14:01.680 |
Most projects use npm start, so maybe this does too. 00:14:05.720 |
How do you think about open source models and private-- 00:14:12.520 |
And I think you guys use StarCoder, if I remember right. 00:14:21.080 |
I don't think they've officially announced what model they use. 00:14:24.000 |
- And I think they use a range of models based on what you're 00:14:28.960 |
No one uses the same model for inline completion 00:14:31.260 |
versus chat, because the latency requirements for-- 00:14:44.960 |
to get it to output just the code and not, like, hey, 00:14:48.480 |
here's the code you asked for, like that sort of text. 00:14:54.320 |
We've kind of designed Cody to be especially model-- 00:15:07.680 |
want to be able to integrate the best in class models, 00:15:11.040 |
whether they're proprietary or open source, into Kodi, 00:15:15.200 |
because the pace of innovation in the space is just so quick. 00:15:21.760 |
Like today, Cody uses StarCoder for inline completions. 00:15:25.640 |
And with the benefit of the context that we provide, 00:15:29.440 |
we actually show comparable completion acceptance rate 00:15:35.840 |
that folks use to evaluate inline completion quality. 00:15:39.840 |
what's the chance that you actually accept the completion 00:15:45.080 |
which is at the head of the industry right now. 00:15:47.920 |
And we've been able to do that with the Starcoder model, which 00:15:50.420 |
is open source, and the benefit of the context fetching stuff 00:15:55.020 |
And of course, a lot of like prompt engineering 00:16:03.640 |
"Cheating is All You Need" about what you're building. 00:16:07.460 |
that everybody's fighting on the same axis, which 00:16:10.000 |
is better UI and the IDE, maybe like a better chat response. 00:16:14.400 |
But data moats are kind of the most important thing. 00:16:22.280 |
How do you kind of think about what other companies are 00:16:31.840 |
I feel like you see so many people, oh, we just 00:16:34.560 |
got a new model, and it's like a better HumanEval score. 00:16:36.920 |
And it's like, wow, but maybe like that's not 00:16:42.960 |
the importance of like the actual RAG in code? 00:16:47.040 |
Yeah, I mean, I think that people weren't doing it much. 00:16:56.200 |
so within the last year, I've heard a lot of rumblings 00:16:59.840 |
Because they're undergoing a huge transformation 00:17:02.240 |
to try to, of course, get into the new world. 00:17:07.160 |
to go and train their own models or fine-tune their own models, 00:17:24.120 |
Google loves to compete with themselves, right? 00:17:27.440 |
And they had a paper on Duet, like, from a year ago. 00:17:29.880 |
And they were doing exactly what Copilot was doing, 00:17:32.040 |
which was just pulling in the local context, right? 00:17:38.440 |
because we were talking about the splitting of the models. 00:17:40.840 |
In the early days, it was the LLM did everything. 00:17:44.160 |
And then we realized that for certain use cases, 00:17:47.000 |
like completions, that a different, smaller, faster 00:17:53.040 |
actually, we expected to continue and proliferate, 00:17:56.440 |
Because fundamentally, we're a recommender engine right now. 00:18:02.080 |
We're saying, may I interest you in this code 00:18:04.200 |
right here so that you can answer my question? 00:18:09.180 |
I mean, who are the best recommenders, right? 00:18:11.020 |
There's YouTube, and Spotify, and Amazon, or whatever, right? 00:18:14.320 |
Yeah, and they all have many, many, many, many, many models, 00:18:20.640 |
and that's where we're headed in code, too, absolutely. 00:18:24.040 |
Yeah, we just did an episode we released on Wednesday, 00:18:26.880 |
in which we said RAG is like RecSys for LLMs. 00:18:30.720 |
You're basically just suggesting good content. 00:18:40.240 |
is you embed everything through a vector database. 00:18:42.720 |
You embed your query, and then you find the nearest neighbors, 00:18:49.720 |
there's sample diversity and that kind of stuff. 00:18:52.360 |
And then you're slowly gradient-descending yourself 00:18:58.040 |
which has been traditional ML for a long time, 00:19:02.840 |
Yeah, I almost think of it as a generalized search problem, 00:19:11.080 |
and get all the potential things that could be relevant, 00:19:13.840 |
and then there's typically a layer 2 re-ranking mechanism 00:19:20.240 |
to get the relevant stuff to the top of the results list. 00:19:24.400 |
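To make that two-layer setup concrete, here is a minimal sketch of a layer-1 candidate fetch over embeddings followed by a layer-2 re-rank. The embed() and rerank_score() functions are crude stand-ins, not Sourcegraph's actual implementation; only the shape of the pipeline follows the description above.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: a normalized character-frequency vector.
    # A real system would call an embedding model here.
    vec = [0.0] * 128
    for ch in text:
        vec[ord(ch) % 128] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, corpus: list[str], k1: int = 50, k2: int = 5) -> list[str]:
    qv = embed(query)
    # Layer 1: high-recall nearest-neighbor fetch over the whole corpus.
    candidates = sorted(corpus, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k1]
    # Layer 2: re-rank candidates with a costlier relevance signal.
    # Here, exact keyword overlap; a real re-ranker would be a
    # cross-encoder or a heuristic ensemble.
    def rerank_score(doc: str) -> float:
        overlap = len(set(query.lower().split()) & set(doc.lower().split()))
        return overlap + cosine(qv, embed(doc))
    return sorted(candidates, key=rerank_score, reverse=True)[:k2]
```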
Have you discovered that ranking matters a lot? 00:19:26.400 |
So the context is that I think a lot of research 00:19:37.600 |
and then apparently, Claude uses the bottom better. 00:19:44.360 |
The skill with which models are able to take advantage 00:19:47.040 |
of context is always going to be dependent on how 00:19:49.720 |
that factors into the impact on the training loss. 00:19:53.400 |
So if you want long context window models to work well, 00:19:56.240 |
then you have to have a ton of data where it's 00:20:01.200 |
and I'm going to ask a question about something that's 00:20:04.080 |
embedded deeply into it, and give me the right answer. 00:20:09.560 |
then of course you're going to have variability in terms 00:20:15.320 |
the thing that you're talking about right now, 00:20:18.280 |
to be something that we talked about recently. 00:20:20.840 |
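The evaluation being described here is often called a needle-in-a-haystack test: plant a fact at varying depths of a long context and check whether the model can retrieve it. A hedged sketch, where ask_llm() is a hypothetical stand-in for whatever completion API you use:

```python
def build_haystack(needle: str, filler: str, depth: float, n_chunks: int = 200) -> str:
    # Bury the needle at a chosen relative depth inside filler text.
    chunks = [filler] * n_chunks
    chunks.insert(int(depth * n_chunks), needle)
    return "\n".join(chunks)

def recall_at_depths(ask_llm, needle: str, question: str, answer: str) -> dict[float, bool]:
    filler = "The quick brown fox jumps over the lazy dog."
    results = {}
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        context = build_haystack(needle, filler, depth)
        reply = ask_llm(f"{context}\n\nQuestion: {question}")
        results[depth] = answer.lower() in reply.lower()
    return results
```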
Did you really just say gradient descending yourself? 00:20:24.640 |
Actually, I love that it's entered the casual lexicon. 00:20:28.520 |
My favorite version of that is how you have to p-hack papers. 00:20:43.320 |
I think the other interesting thing that you have 00:20:45.360 |
is inline-assist UX that is, I wouldn't say async, 00:20:53.240 |
So you can ask Cody to make changes on a code block, 00:20:55.840 |
and you can still edit the same file at the same time. 00:21:08.040 |
messing each other up as they make changes in the code? 00:21:12.920 |
and what do you think about where the UX is going? 00:21:18.200 |
So we actually had this feature in the very first launch 00:21:25.040 |
And you could have multiple basically LLM requests 00:21:31.200 |
And he wrote a bunch of code to handle all of the diffing 00:21:40.960 |
And it just felt like it was just a little before its time. 00:21:47.480 |
was able to be reused for where inline's sitting today. 00:22:02.360 |
and have the code update, to really like targeted features 00:22:11.320 |
And the reason for that is, I think the challenge 00:22:16.120 |
and we do want to get to the point where you could just 00:22:18.440 |
fire it, forget, and have half a dozen of these running 00:22:24.720 |
early on that a lot of people are running into now 00:22:27.200 |
when they're trying to construct agents, which 00:22:29.920 |
is the reliability of working code generation 00:22:36.280 |
is just not quite there yet in today's language models. 00:22:40.920 |
And so that kind of constrains you to an interaction 00:22:45.360 |
where the human is always like in the inner loop, 00:22:56.840 |
have to constrain it to a domain where today's language models 00:23:02.120 |
So generating unit tests, that's like a well-constrained problem, 00:23:05.520 |
or fixing a bug that shows up as a compiler error or a test 00:23:15.440 |
this class that does x, y, and z using the libraries 00:23:21.080 |
even with the benefit of really good context. 00:23:46.120 |
you don't have to have a human in the loop every time. 00:23:48.440 |
And there's also kind of like an LLM call at each stage, 00:24:15.880 |
on the feasibility of agents with purely kind 00:24:20.680 |
To your original question, like the inline interactions 00:24:24.960 |
to be more targeted, like fix the current error 00:24:38.880 |
and this is based on the user feedback that we've gotten-- 00:24:45.680 |
you don't want to have a long chat conversation 00:24:50.200 |
You'd rather just have it write the right thing 00:24:52.900 |
and then move on with your life or not have to think about it. 00:24:55.480 |
And that's what we're trying to work towards. 00:24:57.360 |
I mean, yeah, we're not going in the agent direction. 00:25:03.600 |
Instead, we're working on sort of solidifying 00:25:06.640 |
our strength, which is bringing the right context in. 00:25:12.060 |
to plug in your own context, ways for you to control 00:25:16.440 |
happens before the request goes out, et cetera. 00:25:30.720 |
They really mean greater automation, fully automated. 00:25:36.720 |
And I don't have to think about it as a human. 00:25:41.840 |
I think it's specifically the approach of, hey, 00:25:59.440 |
It's just a reality of the behavior of language models 00:26:04.840 |
And I think that's just a reflection of reality. 00:26:08.680 |
Because if you look at the way that a lot of other AI tools 00:26:14.680 |
have implemented context fetching, for instance, 00:26:23.080 |
supposedly provides codebase-level context, 00:26:27.040 |
it has an agentic approach, where you kind of look 00:26:32.920 |
And it feels like they're making multiple requests to the LLM, 00:26:43.480 |
And it's a multi-hop step, so it takes a long while. 00:26:51.800 |
And then at the end of the day, the context it fetches 00:26:59.280 |
and then maybe crawl through the reference graph a little bit. 00:27:04.840 |
That doesn't require any sort of LLM invocation at all. 00:27:08.520 |
And we can pull in much better context very quickly. 00:27:13.040 |
So it's faster, it's more reliable, it's deterministic, 00:27:20.000 |
We just don't think you should cargo cult or naively go, 00:27:25.240 |
try to implement agents on top of the LLMs that exist today. 00:27:29.760 |
I think there are a couple of other technologies 00:27:35.800 |
before we can get into these multi-stage, fully automated 00:27:40.520 |
We're very much focused on developer inner loop right now. 00:27:50.680 |
tackling the agents problem that you don't want to tackle? 00:27:56.960 |
are after maybe like the same high level problem, which 00:28:05.320 |
And can an automated system go build that software for me? 00:28:20.440 |
Coding, in some senses, is similar and dissimilar to chess. 00:28:25.620 |
I think producing code is more difficult than playing chess 00:28:33.560 |
And if you look at the best AI chess players, 00:28:38.440 |
People have showed demos where it's like, oh, yeah, 00:28:40.560 |
GPT-4 is actually a pretty decent chess move suggester. 00:28:44.760 |
But you would never build a best-in-class chess player 00:28:58.400 |
And then you have a way to explore that search space 00:29:02.880 |
There's a bunch of search algorithms, essentially, 00:29:04.920 |
where you're doing tree search in various ways. 00:29:11.840 |
You might use an LLM to generate proposals in that space 00:29:18.840 |
But the backbone is still this more formalized tree search 00:29:31.800 |
that the way that we get to this more reliable multi-step 00:29:36.000 |
workflows that can do things beyond generate unit test, 00:29:41.400 |
it's really going to be like a search-based approach, where 00:29:43.960 |
you use an LLM as kind of like an advisor or a proposal 00:29:54.560 |
But it's probably not going to be the thing that 00:29:58.400 |
Because I guess it's not the right tool for that. 00:30:07.300 |
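A minimal sketch of that division of labor: a classical best-first search supplies the backbone, and the LLM appears only as the proposal function. Here propose() (the LLM call) and evaluate() (e.g. run the tests and return a score in [0, 1]) are hypothetical stand-ins:

```python
import heapq

def tree_search(initial_state, propose, evaluate, budget: int = 100):
    # Best-first search: the heap is ordered by negated score so the
    # most promising state (e.g. highest test pass rate) pops first.
    best_score = evaluate(initial_state)
    best_state = initial_state
    frontier = [(-best_score, 0, initial_state)]
    tie = 1  # unique tiebreaker so states are never compared directly
    for _ in range(budget):
        if not frontier:
            break
        neg_score, _, state = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_score, best_state = -neg_score, state
        if best_score >= 1.0:  # e.g. all tests pass: done
            break
        for child in propose(state):  # LLM proposes candidate edits
            heapq.heappush(frontier, (-evaluate(child), tie, child))
            tie += 1
    return best_state
```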
That takes us to the philosophical Peter Norvig-type discussion. 00:30:07.300 |
Maybe you want to introduce that divide in software. 00:30:11.560 |
They're probably familiar with the classic Chomsky 00:30:24.120 |
No, actually, I was prompting you to introduce that. 00:30:27.760 |
So if you look at the history of artificial intelligence, 00:30:33.800 |
I don't know, it's probably as old as modern computers, 00:30:40.680 |
to producing a general human level of intelligence. 00:30:51.320 |
which, roughly speaking, includes large language 00:30:58.840 |
Basically, any model that you learn from data 00:31:04.400 |
most of machine learning would fall under this umbrella. 00:31:06.700 |
And that school of thought says, just learn from the data. 00:31:10.800 |
That's the approach to reaching intelligence. 00:31:16.000 |
like compilers, and parsers, and formal systems. 00:31:22.320 |
about how to construct a formal, precise system. 00:31:26.120 |
And that will be the approach to how we build 00:31:31.080 |
Lisp, for instance, was originally an attempt to-- 00:31:38.400 |
could create rules-based systems that you would call AI. 00:31:42.360 |
Yeah, and for a long time, there was this debate. 00:31:47.840 |
and others that were more in the Norvig camp. 00:31:53.760 |
is that Norvig definitely has the upper hand right now 00:31:56.840 |
with the advent of LLMs, and diffusion models, 00:31:59.280 |
and all the other recent progress in machine learning. 00:32:03.840 |
But the Chomsky-based stuff is still really useful, 00:32:17.260 |
that you want to explore with your AI dev tool. 00:32:25.600 |
It's a lot of what we've invested in the past decade 00:32:28.040 |
at Sourcegraph, and what you built with Grok. 00:32:34.480 |
construct these very precise knowledge graphs that 00:32:37.640 |
are great context providers, and great guardrails enforcers, 00:32:41.400 |
and safety checkers for the output of a more data-driven, 00:32:48.720 |
fuzzier system that uses like the Norvig-based models. 00:32:57.500 |
Basically, it's like, OK, so when I was in college, 00:33:02.000 |
I was in college learning Lisp, and Prolog, and Planning, 00:33:04.500 |
and all the deterministic Chomsky approaches to AI. 00:33:08.240 |
And I was there when Norvig basically declared it dead. 00:33:12.440 |
I was there 3,000 years ago when Norvig and Chomsky 00:33:29.160 |
He's got so many famous short posts, amazing things. 00:33:32.080 |
He had a famous talk, "The Unreasonable Effectiveness of Data," which 00:33:38.560 |
convinced everybody that the deterministic approaches had 00:33:41.360 |
failed, and that heuristic-based, data-driven, 00:33:44.280 |
statistical, stochastic approaches were better. 00:33:53.360 |
--was that, well, the steam-powered engine-- no. 00:33:58.080 |
The reason was that the deterministic stuff didn't 00:34:01.800 |
They were using Prolog, man, constraint systems 00:34:07.400 |
Today, actually, these Chomsky-style systems do scale. 00:34:11.080 |
And that's, in fact, exactly what Sourcegraph has built. 00:34:19.240 |
the marriage of the Chomsky and the Norvig models, 00:34:22.360 |
conceptual models, because we have both of them. 00:34:26.260 |
And, in fact, there's this really interesting overlap 00:34:29.760 |
between them, where the AI or our graph or our search engine 00:34:33.400 |
could potentially provide the right context for any given 00:34:35.720 |
query, which is, of course, why ranking is important. 00:34:38.360 |
But what we've really signed ourselves up for 00:34:46.760 |
you were saying that GPT-4 tends to use the front of the context window better. 00:34:53.580 |
Yeah, and so that means that if we're actually 00:35:00.920 |
to test putting it at the beginning of the window 00:35:04.280 |
make the right decision based on the LLM that you've chosen. 00:35:15.400 |
We're generating tests, fill-in-the-middle-type tests, 00:35:19.320 |
to basically fine-tune Cody's behavior there, yeah? 00:35:25.080 |
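For reference, a sketch of what such a fill-in-the-middle (FIM) test can look like: split a known-good file around a hole, ask the model to fill it, and compare. The sentinel tokens below are the ones the StarCoder family documents; other models use different sentinels, and complete() is a hypothetical stand-in for the completion endpoint.

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    # StarCoder-style FIM sentinels; model-specific.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

def fim_test(complete, source: str, hole_start: int, hole_end: int) -> bool:
    # Carve a hole out of a known-good file and check the model's fill.
    prefix = source[:hole_start]
    expected = source[hole_start:hole_end]
    suffix = source[hole_end:]
    got = complete(fim_prompt(prefix, suffix))
    return got.strip() == expected.strip()
```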
I also want to add, I have an internal pet name 00:35:28.400 |
for this hybrid architecture that I'm trying to make catch on. 00:35:45.120 |
I mean, it's obviously a portmanteau of Norvig 00:35:52.280 |
and Chomsky. It stands for non-agentic, rapid, multi-source code intelligence. 00:36:07.000 |
that we're not trying to pitch you on agent hype, right? 00:36:12.040 |
The things it does are really just use developer tools 00:36:17.680 |
like parsers and really good search indexes and things 00:36:23.200 |
Rapid, because we place an emphasis on speed. 00:36:25.440 |
We don't want to sit there waiting for multiple LLM 00:36:28.920 |
requests to return to complete a simple user request. 00:36:35.600 |
about what pieces of information and knowledge 00:36:43.680 |
and then you add in the reference graph, which 00:36:49.920 |
But then even beyond that, sources of information, 00:37:01.680 |
in your production logging system, in your chat, 00:37:09.520 |
Like there's so much context that's embedded there. 00:37:12.840 |
and you're trying to be productive in your code base, 00:37:15.080 |
you're going to go to all these different systems 00:37:16.600 |
to collect the context that you need to figure out 00:37:21.520 |
And I don't think the AI developer will be any different. 00:37:32.760 |
We hope through kind of like an open protocol 00:37:38.420 |
And this is something else that should be, I guess, 00:37:41.960 |
like accessible by December 14th in kind of like a preview 00:37:48.400 |
this notion of the code graph beyond your Git repository 00:37:51.480 |
to all the other sources where technical knowledge 00:38:03.080 |
How do you guys think about the importance of-- 00:38:05.600 |
it's almost like data pre-processing in a way, 00:38:07.800 |
which is bring it all together, tie it together, make it ready. 00:38:14.640 |
that good, what some of the innovation you guys have made? 00:38:18.240 |
We talk a lot about the context fetching, right? 00:38:20.900 |
I mean, there's a lot of ways you could answer this question. 00:38:23.400 |
But we've spent a lot of time just in this podcast 00:38:33.340 |
and you've got more context than you can fit. 00:38:42.320 |
by an embedding or a graph call or something? 00:38:46.640 |
Or do you just need the top part of the function, 00:38:53.920 |
to get each piece of context down into its smallest state, 00:39:04.800 |
And so recursive summarization and all the other techniques 00:39:07.840 |
that you've got to use to stuff stuff into that context window 00:39:12.200 |
And you have to test them across every configuration of models 00:39:22.160 |
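As a sketch of the recursive-summarization idea under a token budget: if a snippet exceeds its share of the window, summarize it; if the summary is still too big, split, shrink the halves, and summarize the join. summarize() is a hypothetical LLM call and tokens() a crude whitespace counter standing in for a real tokenizer.

```python
def tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def shrink(text: str, budget: int, summarize) -> str:
    # Fit one snippet into its budget: summarize, recurse on halves if
    # needed, and hard-truncate as a last resort so the budget holds.
    if tokens(text) <= budget:
        return text
    summary = summarize(text)
    if tokens(summary) <= budget:
        return summary
    mid = len(text) // 2
    halves = [shrink(text[:mid], budget // 2, summarize),
              shrink(text[mid:], budget // 2, summarize)]
    combined = summarize("\n".join(halves))
    return combined if tokens(combined) <= budget \
        else " ".join(combined.split()[:budget])

def pack_context(snippets: list[str], window: int, summarize) -> str:
    # Naive equal split of the window across snippets; real systems
    # would weight by relevance rank instead.
    per_item = max(1, window // max(1, len(snippets)))
    return "\n---\n".join(shrink(s, per_item, summarize) for s in snippets)
```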
to a lot of the cool stuff that people are shipping today, 00:39:26.760 |
whether you're doing like RAG or fine tuning or pre-training. 00:39:34.800 |
because it is basically garbage in, garbage out, right? 00:39:39.440 |
Like if you're feeding in garbage to the model, 00:39:53.680 |
If you're not able to extract the key components of a particular file 00:39:58.320 |
of code, separate the function signature from the body, 00:40:00.760 |
from the doc string, what are you even doing? 00:40:17.760 |
We've had a tool since computers were invented 00:40:20.120 |
that understands the structure of source code 00:40:28.760 |
is to know about the code in terms of structure. 00:40:39.400 |
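That tool is a parser. As a small illustration of pulling out exactly the pieces mentioned — signature, doc string, body — here is a sketch using Python's standard ast module (production systems typically use tree-sitter-style parsers across many languages; this is just the idea):

```python
import ast

def dissect(source: str) -> list[dict]:
    # Split each function into signature, doc string, and body so a
    # prompt builder can include or drop each piece independently.
    parts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node)
            body = node.body[1:] if doc else node.body  # skip doc stmt
            parts.append({
                "signature": f"def {node.name}({ast.unparse(node.args)})",
                "docstring": doc or "",
                "body": "\n".join(ast.unparse(stmt) for stmt in body),
            })
    return parts

print(dissect('def add(a, b):\n    """Add two numbers."""\n    return a + b'))
```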
just because now we have really good data-driven models that 00:40:45.800 |
When I called it a data moat in my cheating post, 00:40:53.000 |
because data moat sort of sounds like data lake 00:41:00.080 |
on this giant mountain of data that we had collected. 00:41:06.400 |
that can very quickly and scalably basically dissect 00:41:09.600 |
your entire code base into very small, fine-grained semantic 00:41:20.000 |
Yeah, if anything, we're hypersensitive to customer data 00:41:24.880 |
So it's not like we've taken a bunch of private data 00:41:42.000 |
I think that's a very real concern in today's day and age. 00:41:50.720 |
it's very easy both to extract that knowledge from the model 00:42:01.560 |
About a year ago, I wrote a post on LLMs for developers. 00:42:05.040 |
And one of the points I had was maybe the death of the DSL. 00:42:13.640 |
But it's not as performant, but it's really easy to read. 00:42:18.560 |
maybe they're faster, but they're more verbose. 00:42:21.760 |
And when you think about efficiency of the context 00:42:39.240 |
Do you see in the future the way we think about DSL and APIs 00:42:48.520 |
Whereas maybe it's harder to read for the human, 00:42:52.400 |
but the human is never going to write it anyway. 00:42:57.400 |
There are some data science things, like spin-up the spandex. 00:43:07.880 |
Well, so DSLs, they involve writing a grammar and a parser. 00:43:18.600 |
And we do them that way because we need them to compile, 00:43:23.240 |
and humans need to be able to read them, and so on. 00:43:30.600 |
more or less unstructured, and they'll deal with it. 00:43:35.600 |
for communicating with the LLM or packaging up 00:43:42.560 |
like that that are sort of peeking into DSL territory, 00:43:48.480 |
have to learn DSLs, like regular expressions, 00:43:53.600 |
I think you're absolutely right that the LLMs are really, 00:43:57.000 |
And I think you're going to see a lot less of people 00:44:01.080 |
They just have to know the broad capabilities, 00:44:07.560 |
I think we will see kind of like a revisiting of-- 00:44:13.400 |
is that it makes it easier to work with a lower level 00:44:17.320 |
language, but at the expense of introducing an abstraction 00:44:22.280 |
And in many cases today, without the benefit of AI code generation, 00:44:36.800 |
I think there's still places where that trade-off 00:44:40.280 |
But it's kind of like, how much of source code 00:44:45.320 |
through natural language prompting in the future? 00:44:56.200 |
Maybe for a large portion of the code that's written, 00:45:00.800 |
the DSL that is Ruby, or Python, or basically 00:45:04.840 |
any other programming language that exists today. 00:45:07.000 |
I mean, seriously, do you guys ever write SQL queries now 00:45:14.920 |
And so we have kind of passed that bridge, right? 00:45:18.200 |
Yeah, I think to me, the long-term thing is like, 00:45:25.360 |
It's like, hey-- the basic thing is like, hey, 00:45:33.080 |
And the follow-on question, do you need the engineer 00:45:38.880 |
That's kind of the agent's discussion in a way, 00:45:42.960 |
but slowly you're getting more of the atomic units 00:45:48.400 |
I kind of think of it as like, do you need a punch card 00:45:52.640 |
And so I think we're still going to have people 00:46:02.600 |
versus the higher-level, more creative tasks is going to 00:46:20.040 |
And the first step is the AI-enhanced engineer 00:46:22.440 |
that is that software developer that is no longer doing 00:46:28.280 |
because they're just enhanced by tools like yours. 00:46:35.960 |
And because we're releasing this as you go GA, 00:46:40.040 |
you hope for other people to take advantage of that? 00:46:48.820 |
to make your system, whether it's chat, or logging, 00:46:52.760 |
or whatever, accessible to an AI developer tool like Cody, 00:46:58.840 |
here is kind of like the schema by which you can provide 00:47:08.200 |
It's similar to what LSP did for kind of like standard code intelligence. 00:47:10.600 |
It's kind of like a lingua franca for providing 00:47:16.200 |
There might be also analogs to kind of the original OpenAI 00:47:20.720 |
kind of like plugins API, where it's like, hey, 00:47:27.440 |
that might be useful for an LLM-based system to consume. 00:47:31.520 |
And so at a high level, what we're trying to do 00:47:33.920 |
is define a common language for context providers 00:47:38.560 |
to provide context to other tools in the software development lifecycle. 00:47:43.640 |
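As a hedged sketch of what such a common context-provider language might look like — the type and field names here are illustrative guesses, not the actual protocol being shipped:

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    uri: str       # where the knowledge lives (repo file, wiki page, log line)
    title: str     # short human-readable label
    content: str   # the text to stuff into the prompt
    score: float   # provider's own relevance estimate, 0..1

class ContextProvider:
    """Anything that can answer: what do you know relevant to this query?"""
    def query(self, q: str, limit: int = 10) -> list[ContextItem]:
        raise NotImplementedError

def gather(providers: list[ContextProvider], q: str, limit: int = 20) -> list[ContextItem]:
    # Fan the query out to every provider, then merge by score.
    items = [item for p in providers for item in p.query(q)]
    return sorted(items, key=lambda i: i.score, reverse=True)[:limit]
```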
Do you have any critiques of LSP, by the way, 00:47:48.200 |
One of the authors wrote a really good critique recently. 00:47:59.360 |
I think LSP is great for what it did for the developer 00:48:08.120 |
it's much easier now to get code navigation up and running 00:48:13.440 |
--in a bunch of editors by speaking this protocol. 00:48:17.440 |
is looking at the different design decisions made, 00:48:30.560 |
I think the critique of LSP from a Kythe point of view 00:48:34.920 |
have an actual model, a symbolic model, of the code. 00:48:51.200 |
And that's the thing you feed into the language server. 00:48:56.860 |
that you should jump to if you click on that range. 00:48:59.000 |
So it kind of is intentionally ignorant of the fact 00:49:02.400 |
that there's a thing called a reference underneath your 00:49:04.760 |
cursor, and that's linked to a symbol definition. 00:49:07.100 |
Well, actually, that's the worst example you could have used. 00:49:09.640 |
You're right, but that's the one thing that it actually 00:49:18.240 |
Whereas Kythe attempts to model all these things explicitly. 00:49:25.520 |
And so Google's internal protocol is gRPC-based. 00:49:34.440 |
Basically, you make a heavy query to the back end, 00:49:40.920 |
So we've looked at LSP, and we think that it's just-- 00:49:45.960 |
I mean, it's a great protocol, lots and lots of support 00:49:48.740 |
But we need to push into the domain of exposing 00:49:59.160 |
developed a protocol of our own called SCIP, which is, I think, 00:50:02.020 |
at a very high level, trying to take some of the good ideas 00:50:04.440 |
from LSP and from Kythe, and merge that into a system that, 00:50:10.540 |
but I think in the long term, we hope it will 00:50:13.840 |
And I would say, OK, so here's what LSP did well. 00:50:20.840 |
"dumb" in air quotes, because I'm not ragging on it-- 00:50:30.060 |
to kind of bypass the hard problem of modeling language 00:50:35.040 |
So if all you want to do is jump to definition, 00:50:37.200 |
you don't have to come up with a universally unique naming 00:50:40.320 |
scheme for each symbol, which is actually quite challenging. 00:50:57.800 |
you're fetching this from, whether it's the public one 00:51:03.800 |
And by just going from a location-to-location-based 00:51:07.680 |
approach, you basically just throw that out the window. 00:51:11.720 |
Just make that work, and you can make that work 00:51:14.240 |
without having to deal with all the complex global naming 00:51:29.760 |
And I want to incorporate that semantic model of how 00:51:32.800 |
the code operates, or how the code relates to each other 00:51:35.880 |
at a static level, you can't do that with LSP, 00:51:44.560 |
in order to do a find references and then jump to definition, 00:51:53.600 |
And it just adds a lot of latency and complexity 00:51:58.000 |
this thing clearly references this other thing. 00:52:02.440 |
And I think that's the thing that Kythe does well. 00:52:04.440 |
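Two toy payloads make the contrast concrete: LSP answers in terms of file locations (URI plus range), while a Kythe-style model answers in terms of globally named symbols and typed edges between them. These shapes are illustrative, not the literal wire formats:

```python
# LSP-style "go to definition" answer: purely location-based.
lsp_definition_response = {
    "uri": "file:///src/auth.go",
    "range": {"start": {"line": 41, "character": 5},
              "end": {"line": 41, "character": 17}},
}

# Kythe-style fact: a named symbol, a typed edge, another named symbol.
kythe_style_fact = {
    "source": "kythe://corpus?lang=go#pkg/auth.ValidateToken",  # global symbol name
    "edge": "ref/call",
    "target": "kythe://corpus?lang=go#pkg/jwt.Parse",
}
```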
But then I think the issue that Kythe has had with adoption 00:52:07.520 |
is, because it's a more sophisticated schema, I think. 00:52:15.960 |
that you have to implement to get a Kythe implementation 00:52:24.280 |
Kythe also has the problem-- all these systems 00:52:26.560 |
have the problem, even SCIP, or at least the way 00:52:30.560 |
that they have to integrate with your build system 00:52:36.520 |
the code in a special mode to generate artifacts instead 00:52:41.440 |
by the way, earlier I was saying that xrefs were in LSP, 00:52:46.240 |
but it's actually-- I was thinking of LSP plus LSIF. 00:52:46.240 |
It's supposed to be sort of a model, a serialization 00:53:04.360 |
But it basically just does what LSP needs, the bare minimum. 00:53:13.440 |
to kind of quickly bootstrap from cold start. 00:53:15.840 |
But it's a graph model with all of the inconvenience of the API 00:53:23.960 |
So one of the things that we try to do with SCIP 00:53:32.120 |
some of the more symbolic characteristics of the code 00:53:34.960 |
that would allow us to essentially construct this 00:53:39.560 |
useful for both the human developer through SourceGraph 00:53:44.600 |
So anyway, just to finish off the graph comment 00:54:07.240 |
I should probably have to do a blog post about it 00:54:09.920 |
to walk you through exactly how they're doing it. 00:54:12.600 |
But it's a very AI-like, iterative, experimentation 00:54:16.800 |
sort of approach, where we're building a code graph based 00:54:23.640 |
But we're building it quickly with zero configuration, 00:54:25.880 |
and it doesn't have to integrate with your build system 00:54:30.680 |
And so it just happens when you install the plug-in 00:54:38.240 |
and providing that knowledge graph in the background 00:54:42.320 |
This is a bit of secret sauce that we haven't really-- 00:54:46.800 |
I don't know, we haven't advertised it very much lately. 00:54:49.800 |
But I am super excited about it, because what they do 00:54:52.480 |
is they say, all right, let's tackle function parameters 00:54:56.000 |
Cody's not doing a very good job of completing function call 00:54:58.800 |
arguments or function parameters in the definition, right? 00:55:03.840 |
And then we can actually reuse those tests for the AI context 00:55:07.760 |
So fortunately, things are kind of converging. 00:55:10.040 |
We have half a dozen really, really good context sources. 00:55:16.880 |
So anyway, BFG, you're going to hear more about it probably, 00:55:24.240 |
Yeah, I think it'll be online for December 14th. 00:55:29.640 |
BFG is probably not the public name we're going to go with. 00:55:32.720 |
I think we might call it Graph Context or something like that. 00:55:46.480 |
look at current AI inline code completion tools 00:55:50.760 |
and the errors that they make, a lot of the errors 00:55:53.400 |
that they make, even in kind of the easy single line case, 00:56:04.120 |
And it suggests a variable that you defined earlier, 00:56:08.480 |
And that's the sort of thing where it's like, well, 00:56:23.280 |
without the context of the types or any other broader 00:56:36.920 |
that any baseline intelligent human developer would 00:56:43.440 |
click some find references, and pull in that graph context 00:56:53.480 |
So that's sort of like the MVP of what BFG was. 00:57:02.920 |
that AI coding tools make just by pulling in that context. 00:57:06.840 |
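A sketch of that MVP behavior: before requesting a completion, resolve the definitions of identifiers visible near the cursor and prepend them to the prompt, so the model stops inventing out-of-scope names. find_definition() is a hypothetical hook into the code graph, not BFG's actual interface.

```python
def build_completion_prompt(file_prefix: str, identifiers: list[str], find_definition) -> str:
    # Look up each nearby identifier in the code graph; a hit yields
    # something like a function signature plus doc string.
    context_snippets = []
    for name in identifiers:
        defn = find_definition(name)
        if defn:
            context_snippets.append(f"# definition of {name}:\n{defn}")
    header = "\n\n".join(context_snippets)
    # Graph context goes ahead of the file prefix being completed.
    return f"{header}\n\n{file_prefix}" if header else file_prefix
```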
Yeah, but the graph is definitely our Chomsky side. 00:57:15.200 |
And I think it's just a very useful and also kind of nicely 00:57:18.960 |
nerdy way to describe the system that we're trying to build. 00:57:25.640 |
was trying to make earlier to your question, Alessio, about, 00:57:31.520 |
they thought, oh, are compilers going to replace programming? 00:57:36.920 |
And I think AI is just going to level us up again. 00:57:39.240 |
So programmers are still going to be building stuff 00:57:42.120 |
until agents come along, but I don't believe. 00:57:47.680 |
Yeah, to be clear, again, with the agent stuff 00:57:52.460 |
I think that's still the kind of long-term target. 00:57:57.160 |
you can have Kodi draft up an execution plan. 00:58:00.160 |
It's just not going to be the sort of thing where you can't 00:58:05.880 |
Like, we think that with Cody, it's like, you could ask Cody, 00:58:10.340 |
It would do a reasonable job of fetching context and saying, 00:58:16.480 |
can actually suggest code changes to make to those files. 00:58:19.200 |
And that's a very nice way to resolve issues, 00:58:21.640 |
because you're kind of on the rails for most of the time, 00:58:24.720 |
but then now and then you have to intervene as a human. 00:58:28.960 |
to get to complete automation, where it's like the sort 00:58:31.720 |
of thing where a non-software engineer, someone 00:58:41.520 |
that is still, I think, several key innovations away 00:58:47.400 |
And I don't think the pure transformer-based LLM 00:58:51.400 |
orchestrator model of agents that is kind of dominant today 00:58:58.960 |
Just what you're talking about triggered a thread 00:59:04.480 |
I've been working on for a little bit, which is, we're going 00:59:15.520 |
to need a bigger moat, which is a great Jaws reference for those 00:59:22.300 |
--how quickly models are evolving. 00:59:36.680 |
And actually, there's a pretty good cadence 00:59:39.240 |
from GPT-2, 3, and 4 that you can-- if you project out. 00:59:42.360 |
So this is based on George Hotz's concept of 20 petaflops being a person. 00:59:52.080 |
GPT-4 took about 100 years in terms of human years 01:00:10.680 |
And if you just project it out, GPT-9 is every human on Earth, 01:00:18.960 |
And he thinks he'll reach there by the end of the decade. 01:00:32.160 |
We're at the start of the curve with Moore's law. 01:00:37.080 |
Gordon Moore, I think, thought it would last 10 years. 01:00:45.600 |
And we're just trying to extrapolate the curve out 01:00:50.040 |
So all I'm saying is this agent stuff that we dealt 01:00:56.240 |
And I don't know how you plan when things are not 01:01:20.240 |
we hear things like things are not practical today, 01:01:30.220 |
I do think that there will be something like a Moore's law 01:01:34.920 |
I mean, definitely, I think, at the hardware level, like GPUs. 01:01:39.800 |
I think it gets a little fuzzier the higher you move up 01:01:44.400 |
But for instance, going back to the chess analogy, 01:01:50.000 |
at what point do we think that GPT-X or whatever, 01:01:54.520 |
a pure transformer-based LLM model will be state of the art 01:02:00.440 |
or outperform the best chess-playing algorithm today? 01:02:07.480 |
Where you completely overlap 01:02:13.960 |
I think that would kind of disprove the thesis that I just 01:02:16.320 |
stated, which is kind of like the pure transformer, 01:02:25.000 |
versus, oh, we actually have to take a step back and think-- 01:02:37.200 |
is going to be one piece of a system of intelligence 01:02:41.740 |
that's going to take advantage-- that we'll have to take 01:02:44.120 |
advantage of, like many other algorithms and approaches? 01:02:53.800 |
All right, sorry for that digression. 01:02:57.480 |
So one thing I did actually want to check in on, 01:03:00.000 |
because we talked a little bit about code graphs and reference 01:03:08.480 |
Well, I mean, how would you define a graph database? 01:03:18.420 |
that Postgres was performing as well as most of the graph 01:03:35.640 |
But we basically tried to dump a non-trivially sized data set, 01:03:40.260 |
but also not the whole universe of code, right? 01:03:46.180 |
compared to what we're indexing now into the database. 01:03:55.360 |
And we're like, OK, let's try another approach. 01:04:08.620 |
I mean, at the end of the day, all the databases, 01:04:14.660 |
If all your queries are single hops in this-- 01:04:20.060 |
Which they will be if you denormalize 01:04:27.100 |
Seventh normal form is just a bunch of files. 01:04:36.460 |
about the actual query load, or the traffic patterns, 01:04:46.020 |
just go with the tried and true, dumb, classic tools 01:04:52.260 |
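The "tried and true, dumb, classic tools" point fits in a few lines: store the reference graph as one denormalized edge table, and every query the product actually needs becomes a single indexed lookup rather than a graph traversal. A sketch with SQLite standing in for Postgres:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One denormalized edge table: symbol, its definition site, one reference site.
db.execute("""CREATE TABLE refs (
    symbol TEXT, def_path TEXT, def_line INT, ref_path TEXT, ref_line INT)""")
db.execute("CREATE INDEX refs_by_symbol ON refs(symbol)")
db.execute("INSERT INTO refs VALUES ('ValidateToken', 'auth.go', 42, 'server.go', 17)")

# "Find references" (and likewise "go to definition") is one indexed hop:
rows = db.execute(
    "SELECT ref_path, ref_line FROM refs WHERE symbol = ?", ("ValidateToken",)
).fetchall()
print(rows)  # [('server.go', 17)]
```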
I mean, there's a bunch of stuff 01:04:54.260 |
like that in the search domain, too, especially right now, 01:04:56.700 |
with embeddings, and vector search, and all that. 01:05:00.900 |
But classic search techniques still go very far. 01:05:04.020 |
And I don't know, I think in the next year or two maybe, 01:05:10.680 |
start to see the gap emerge, or become more obvious to more 01:05:17.060 |
people about how many of the newfangled techniques 01:05:20.100 |
actually work in practice, and yield a better product 01:05:27.880 |
a bunch of other people trying to build AI tooling. 01:05:34.320 |
Obviously, you build a lot of it proprietary, in-house, 01:05:42.020 |
do you have a prompt engineering management tool? 01:05:48.540 |
Pre-processing orchestration, do you use Airflow? 01:05:54.500 |
Ours is very duct-taped together at the moment. 01:06:06.460 |
There's the knowledge graph, the code knowledge graph 01:06:09.220 |
that we built, which is using indexers, many of which 01:06:12.620 |
are open source, that speak the SCIP protocol. 01:06:21.860 |
Traditionally, we supported regular expression search 01:06:24.540 |
and string literal search with a trigram index. 01:06:28.060 |
And we're also building more fuzzy search on top of that 01:06:31.300 |
now, kind of like natural language or keyword-based search. 01:06:36.820 |
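For readers unfamiliar with the trigram index mentioned here, a toy version: index every 3-character substring of each document, answer a literal-string query by intersecting the posting lists of the query's trigrams, then verify candidates with a direct scan. This is the idea only, nothing like a production-scale implementation:

```python
from collections import defaultdict

def trigrams(s: str) -> set[str]:
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: list[str] = []

    def add(self, text: str) -> None:
        doc_id = len(self.docs)
        self.docs.append(text)
        for t in trigrams(text):
            self.postings[t].add(doc_id)

    def search(self, literal: str) -> list[int]:
        # Candidate docs must contain every trigram of the query...
        lists = [self.postings.get(t, set()) for t in trigrams(literal)]
        candidates = set.intersection(*lists) if lists else set(range(len(self.docs)))
        # ...then a direct scan confirms real matches (no false positives).
        return [d for d in sorted(candidates) if literal in self.docs[d]]
```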
And we use a variety of open source and proprietary models. 01:06:40.140 |
We try to be pluggable with respect to different models, 01:06:42.820 |
so we can easily swap the latest model in and out 01:06:49.460 |
I'm just hunting for, is there anything out there 01:06:52.620 |
that you're like, these guys are really good. 01:06:56.700 |
So for example, you talked about recursive summarization, 01:06:59.500 |
which is something that LangChain and LlamaIndex do. 01:07:05.500 |
I think the stuff that LlamaIndex and LangChain 01:07:12.420 |
like we're still in the application end user use case 01:07:17.060 |
And so adopting an external infrastructure or middleware 01:07:25.020 |
tool just seems overly constraining right now. 01:07:29.540 |
need to be able to iterate rapidly up and down the stack. 01:07:32.260 |
But maybe at some point, there'll be a convergence, 01:07:34.620 |
and we can actually merge some of our stuff into theirs 01:07:50.620 |
Also, plug for Fireworks as an inference platform. 01:08:06.140 |
Their CEO was the co-manager of PyTorch for five years. 01:08:22.900 |
And that's made it so that we just don't have 01:08:24.820 |
to think about building up an inference stack. 01:08:27.860 |
And so that's great for us, because it allows us to focus 01:08:30.340 |
more on the data fetching, the knowledge graph, 01:08:35.500 |
and model fine-tuning, which we've also invested a bit in. 01:08:40.820 |
We've got multiple AI workstreams in progress now, 01:08:51.700 |
And the guy we hired, Rashab, is absolutely world-class. 01:08:56.140 |
And he immediately started multiple workstreams, 01:09:17.140 |
run against the benchmark, or we'll make our own benchmark 01:09:20.740 |
But we'll be forcing people into the quantitative comparisons. 01:09:24.740 |
And that's all happening under the AI program 01:09:30.420 |
heard that there's a v2 of StarCoder coming out. 01:09:41.320 |
Can you guys believe how amazing it is that the open source 01:09:44.420 |
models are competitive with GPT and Anthropic? 01:09:50.260 |
I mean, that one Googler that was predicting that open source 01:09:53.420 |
would catch up, at least he was right for completions. 01:10:06.100 |
We still use Claude and GPT-4 for chat and also commands. 01:10:11.980 |
But the ecosystem is going to continue to evolve. 01:10:24.620 |
that they're doing in kind of driving the ecosystem forward. 01:10:31.300 |
It's always kind of like a constant evaluation process. 01:10:33.980 |
I don't want to come out and say, hey, this model's 01:10:39.580 |
for the sorts of context that we're fetching now 01:10:42.460 |
and given the way that our prompt's constructed now. 01:10:44.580 |
And at the end of the day, it was like a judgment call. 01:10:53.140 |
Like, if someone comes up with a neat new context fetching 01:10:55.680 |
mechanism-- and we have a couple coming online soon-- 01:11:00.820 |
against the kind of array of models that are available 01:11:04.860 |
and see how this moves the needle across that set. 01:11:14.260 |
What did we have to build that we wish we could have used? 01:11:25.700 |
like a very nice, clean data set of both naturally occurring 01:11:34.820 |
Yeah, could someone please give us their data moat? 01:11:39.100 |
It's just like, I feel like most models today, 01:11:41.380 |
they still use a combination of The Stack and The Pile 01:11:55.020 |
I think there's still more alpha in synthetic data. 01:12:01.020 |
think fine-tuning some models on specific coding tasks 01:12:08.500 |
where it's reliable enough that we can fully automate it, 01:12:14.700 |
And synthetic data is playing a part of that. 01:12:17.060 |
But I mean, if there were like a synthetic data provider-- 01:12:19.760 |
I don't think you could construct a provider that has 01:12:25.200 |
No company in the world would be able to sell that to you. 01:12:35.940 |
I don't know if there's a business around that. 01:12:37.860 |
But that's something that we definitely love to use. 01:12:41.320 |
I mean, but that's also like the secret weapon, right? 01:12:48.220 |
So I doubt people are going to be, oh, we'll see. 01:12:57.940 |
I would say that would be the bull case for Repl.it, 01:13:01.500 |
that you want to be a coding platform where you also offer 01:13:05.980 |
And then you eventually bootstrap your own proprietary 01:13:14.580 |
this is from nobody at Repl.it that I'm hearing. 01:13:17.680 |
But also, they're just not leveraging that actively. 01:13:21.660 |
They're actually just betting on OpenAI to do a lot of that, 01:13:30.540 |
Yeah, they're definitely great at executing and-- 01:13:50.340 |
And this whole room in the new room was just like, 01:13:58.060 |
I mean, it would have real implications for us, too. 01:14:07.140 |
Yeah, I mean, that would have been the break glass plan. 01:14:13.180 |
think we'd have a lot of customers the day after being 01:14:16.140 |
like, how can you guarantee the reliability of your services 01:14:22.020 |
But I'm really happy they got things sorted out 01:14:31.340 |
So we kind of went through everything, right? 01:14:37.300 |
why inline completion is better, all of these things. 01:14:42.180 |
How does that bubble up to who manages the people, right? 01:14:46.820 |
Because as engineering managers, and I never-- 01:14:52.140 |
I was mostly helping people write their own code. 01:14:55.020 |
So even if you have the best inline completion, 01:15:04.220 |
Yeah, so that's a really interesting question. 01:15:07.580 |
And I think it sort of gets at this issue, which 01:15:10.420 |
is I think basically every AI dev tools creator or producer 01:15:22.700 |
kind of focusing on the wrong problem in a way. 01:15:26.340 |
Because the real problem of modern software development, 01:15:30.340 |
I think, is not how quickly can you write more lines of code. 01:15:34.180 |
It's really about managing the emergent complexity 01:15:41.340 |
and how to make efficient development tractable again. 01:15:47.060 |
Because the bulk of your time becomes more about understanding 01:15:51.540 |
how the system works and how the pieces fit together currently 01:15:56.140 |
so that you can update it in a way that gets you 01:16:00.220 |
your added functionality, doesn't break anything, 01:16:03.340 |
and doesn't introduce a lot of additional complexity 01:16:08.100 |
And if anything, the inner loop developer tools 01:16:15.020 |
yes, they help you get your feature done faster. 01:16:19.780 |
But they might make this problem of managing large complex code 01:16:25.820 |
Just because now, instead of having a pistol, 01:16:33.100 |
And there's going to be a bunch of natural language prompted 01:16:35.740 |
code that is generated in the future that was produced 01:16:38.500 |
by someone who doesn't even have an understanding of source 01:16:43.460 |
And so how are you going to verify the quality of that 01:16:45.780 |
and make sure it not only checks the low-level boxes, 01:16:49.820 |
but also fits architecturally in a way that's 01:16:57.980 |
have a lot of ideas around how to make code bases, 01:17:01.260 |
as they evolve, more understandable and manageable 01:17:05.020 |
to the people who really care about the code base as a whole-- 01:17:08.300 |
tech leads, engineering leaders, folks like that. 01:17:11.340 |
And it is kind of like a return to our ultimate mission 01:17:16.820 |
at Sourcegraph, which is to make code accessible to all. 01:17:19.340 |
It's not really about enabling people to write code. 01:17:21.640 |
And if anything, the original version of Sourcegraph 01:17:29.220 |
because there's already enough people doing that. 01:17:34.700 |
I mean, Quinn, myself, and you, Steve, at Google-- 01:17:54.020 |
And any developer who falls below a threshold, 01:17:56.180 |
a button lights up where the admin can fire them. 01:18:02.940 |
But I'm kind of only half tongue-in-cheek here. 01:18:06.260 |
We've got some prospects who are kind of sniffing down 01:18:15.320 |
like Beyang was saying-- much greater whole-codebase 01:18:17.700 |
understanding, which is actually something that Cody is, 01:18:20.260 |
I would argue, the best at today in the coding assistance space, 01:18:23.020 |
right, because of our search engine and the techniques 01:18:27.880 |
is so important for any sort of a manager who just 01:18:34.340 |
or whether people are writing code that's well-tested 01:18:42.580 |
This is not the developer inner loop or outer loop. 01:18:48.540 |
The manager inner loop is staring at your belly button, 01:18:54.220 |
Waiting for the next Slack message to arrive? 01:18:58.280 |
What they really want is a batch mode for these assistants 01:19:00.700 |
where you can actually take the coding assistant 01:19:08.180 |
it's told you all the security vulnerabilities. 01:19:11.980 |
It's an insanely expensive proposition, right? 01:19:14.060 |
You know, just the GPU cost, especially if you're 01:19:17.580 |
So it's better to do it at the point the code enters 01:19:20.380 |
And so now we're starting to get into developer outer loop 01:19:23.220 |
And I think that's where a lot of the-- to your question, 01:19:25.900 |
A lot of the admins and managers and the decision makers, 01:19:28.820 |
anybody who just kind of isn't coding but is involved, 01:19:32.540 |
they're going to have, I think, well, a set of tools, right? 01:19:40.980 |
Our code search actually serves that audience as well, 01:19:48.300 |
And they use our search engine and they go find it. 01:19:50.380 |
And AI is just going to make that so much easier for them. 01:19:56.180 |
to put my anecdote of how I used Cody yesterday. 01:19:59.380 |
I was actually trying to build this Twitter scraper thing. 01:20:02.020 |
And Twitter is notoriously very challenging to work with 01:20:11.960 |
There was this really big repo that had the Twitter scraper thing in it. 01:20:11.960 |
But then I noticed that on your landing page, 01:20:24.100 |
Like, I typically think of Cody as a VS Code extension. 01:20:27.900 |
But you have a web version where you just plug in any repo 01:20:44.800 |
The search thing is like, oh, this is old Sourcegraph. 01:20:55.880 |
that's hidden in the upper right hand corner. 01:21:05.660 |
Well, you didn't embed it, but you indexed it. 01:21:09.720 |
that have emerged among power users where they kind of do-- 01:21:15.780 |
You can kind of replicate that, but for arbitrary frameworks 01:21:20.340 |
Because there's also an equally hidden toggle, which you may 01:21:22.900 |
not have discovered yet, where you can actually 01:21:30.540 |
let's say you want to build a stock ticker that's 01:21:33.280 |
React-based, but uses this one tick data fetching API. 01:21:42.480 |
Track the tick data of Bank of America, Wells Fargo 01:21:55.160 |
just because the wow factor of that is just pretty incredible. 01:21:58.360 |
It's like, what if you can speak apps into existence 01:22:00.800 |
that use the frameworks and packages that you want to use? 01:22:07.380 |
It's just taking advantage of your RAG pipeline. 01:22:20.700 |
Yeah, but I guess getting back to the original question, 01:22:25.620 |
I think would be interesting for engineering leaders. 01:22:32.100 |
that you really ought to be doing with respect to, like, 01:22:34.520 |
ensuring code quality, or updating dependencies, 01:22:42.680 |
that humans find toilsome and tedious and just don't want 01:22:45.800 |
to do, but would really help uplevel the quality, security, 01:22:51.480 |
Now we potentially have a way to do that with machines. 01:23:08.520 |
to do it in the same way that you can measure marketing, 01:23:11.920 |
or sales, or other parts of the organization. 01:23:14.560 |
And I think, what is the actual way you would do this 01:23:18.000 |
that is good, if you had all the time in the world? 01:23:20.960 |
I think, as an engineering manager or an engineering 01:23:23.320 |
leader, what you would do is you would go read 01:23:25.660 |
through the Git log, maybe like line by line. 01:23:28.160 |
Be like, OK, you, Sean, these are the features 01:23:31.560 |
that you built over the past six months or a year. 01:23:36.680 |
These are the things that delivered that you helped drive. 01:23:39.120 |
Here's the stuff that you did to help your teammates. 01:23:52.760 |
Now connect that to the things that matter to the business. 01:24:05.960 |
on the metrics that moved the needle for the business 01:24:08.200 |
and ultimately show up in revenue, or stock price, 01:24:12.480 |
or whatever it is that's at the very top of any for-profit 01:24:29.380 |
Plus, it's also tedious, like reading through Git log 01:24:32.660 |
and trying to understand what a change does and summarizing 01:24:36.620 |
It's just-- it's not the most exciting work in the world. 01:24:46.260 |
does a lot of the tedium and helps you actually 01:24:50.140 |
And I think that is maybe the ultimate answer to how 01:24:55.580 |
that a CFO would be like, OK, I can buy that. 01:24:59.380 |
The work that you did impacted these core metrics 01:25:10.420 |
And that's what we really want to drive towards. 01:25:12.020 |
I think that's what we've been trying to build all along, 01:25:21.820 |
now just puts that much sooner in reach, I think. 01:25:26.740 |
But I mean, we have to focus, also, small company. 01:25:30.460 |
And so our short-term focus is lovability, right? 01:25:41.460 |
about enabling all of the non-engineering roles, 01:25:59.820 |
Which we always forget to send the questions ahead of time. 01:26:04.300 |
So we usually have three, one around acceleration, 01:26:11.780 |
something that already happened in AI that is possible today 01:26:16.740 |
I mean, just LLMs and how good the vision models are now. 01:26:24.740 |
Well, I mean, back in the day, I got my start in machine learning 01:26:35.020 |
And in those days, everything was statistical-based. 01:26:43.160 |
And so I was very bearish after that experience 01:26:54.800 |
So yeah, it came up faster than I expected it to. 01:27:04.340 |
that we're not tapping into, potentially even 01:27:12.940 |
is probably not the steady state that we're seeing long-term. 01:27:18.900 |
and you'll always have chat, and commands, and so on. 01:27:21.420 |
But I think we're going to discover a lot more. 01:27:25.820 |
some kind of new ways to get your stuff done. 01:27:30.540 |
So yeah, I think the capabilities are there today. 01:27:35.720 |
When I sit down, and I have a conversation with the LLM 01:27:41.340 |
talking to a senior engineer, or an architect, or somebody. 01:27:46.740 |
And I think that people have very different working models 01:27:50.460 |
Some people are just completion, completion, completion. 01:27:55.000 |
they write a comment, and then telling them what to do. 01:27:58.340 |
But I truly think that there are other modalities that we're 01:28:01.040 |
going to stumble across, and just kind of latently, 01:28:14.960 |
I mean, the one we talked about earlier, nonstop coding 01:28:19.140 |
a whole bunch of requests to refactor, and so on. 01:28:24.540 |
We talk about agents, that's kind of out there. 01:28:26.540 |
But I think there are kind of more inner loop type ones 01:28:31.220 |
And we haven't looked at all that multimodal yet. 01:28:41.260 |
One, which is effectively architecture diagrams 01:28:47.180 |
There's probably more alpha in synthesizing them 01:28:49.700 |
for management to see, which is, you don't need AI for that. 01:29:13.260 |
about how someone just had an always-on script, 01:29:16.540 |
just screenshotting and sending it to GPT-4 Vision 01:29:21.620 |
And it would just autonomously suggest stuff. 01:29:27.300 |
and just being a real co-pilot, rather than having 01:29:39.660 |
So the reason I know this is we actually did a hackathon, 01:29:41.980 |
where we wrote that project, but it roasted you while you did 01:29:46.820 |
it, so it's like, hey, you're on Twitter right now. 01:29:52.820 |
And that can be a fun co-pilot thing, as well. 01:29:57.860 |
Exploration, what do you think is the most interesting 01:30:02.900 |
It used to be scaling, right, with CNNs and RNNs, 01:30:15.120 |
I feel like-- do you mean like the pure model, like AI layer? 01:30:21.120 |
how do you get reliable first try working code generation? 01:30:30.380 |
Because I think if you want to get to the point 01:30:33.340 |
where you can actually be truly agentic or multi-step 01:30:40.540 |
is the single step has to be robust and reliable. 01:30:49.400 |
Because once you have that, it's a building block 01:30:51.400 |
that you can then compose into longer chains. 01:31:02.780 |
I mean, I think for me it's just like the best 01:31:11.700 |
to leverage many different forms of intelligence. 01:31:14.780 |
Calling back to that like Normsky architecture, 01:31:19.740 |
You should call it something cool like S* or R*. 01:31:24.500 |
Just one letter and then just let people speculate. 01:31:37.660 |
And I think Normsky encapsulates the two big technology areas 01:31:46.140 |
will be very important for producing really good DevTools. 01:31:51.460 |
And I think it's a big differentiator that we 01:32:00.900 |
that not all developers today are using coding assistants. 01:32:08.380 |
and it didn't immediately write a bunch of beautiful code 01:32:12.060 |
And they were like, ah, too much effort, and they left. 01:32:29.640 |
to actually make coding assistants work today. 01:32:33.880 |
they'll give you the runaround, just like doing a Google search 01:32:36.720 |
But if you're not putting that effort in and learning 01:32:39.560 |
the sort of footprint and the characteristics of how 01:32:42.600 |
LLMs behave under different query conditions and so on, 01:32:46.040 |
if you're not getting a feel for the coding assistant, 01:32:48.560 |
then you're letting this whole train just pull out 01:32:54.560 |
Yeah, thank you guys so much for coming on and being