
The "Normsky" architecture for AI coding agents — with Beyang Liu + Steve Yegge of SourceGraph


Chapters

0:00 Intros & Backgrounds
6:20 How Steve's work on Grok inspired Sourcegraph for Beyang
8:53 From code search to AI coding assistant
13:18 Comparison of coding assistants and the capabilities of Cody
16:49 The importance of context (RAG) in AI coding tools
20:33 The debate between Chomsky and Norvig approaches in AI
25:02 Code completion vs Agents as the UX
30:06 Normsky: the Norvig + Chomsky models collision
36:00 How to build the right context for coding
42:00 The death of the DSL?
46:15 LSP, Skip, Kythe, BFG, and all that fun stuff
62:00 The Sourcegraph internal stack
68:46 Building on open source models
74:35 Sourcegraph for engineering managers?
86:00 Lightning Round

Whisper Transcript

00:00:00.000 | [MUSIC PLAYING]
00:00:01.440 | Hey, everyone.
00:00:02.240 | Welcome to the Latent Space Podcast.
00:00:04.200 | This is Alessio, partner and CTO-in-Residence
00:00:06.640 | at Decibel Partners.
00:00:07.600 | And I'm joined by my co-host, swyx, founder of Smol.ai.
00:00:10.760 | Hey, and today we're christening our new podcast studio
00:00:14.560 | in the Newton.
00:00:16.200 | And we have Beyang and Steve from Sourcegraph.
00:00:19.380 | Welcome.
00:00:19.880 | Hey, thanks for having us.
00:00:21.560 | So this has been a long time coming.
00:00:23.040 | I'm very excited to have you.
00:00:24.480 | We also are just celebrating the one year anniversary of ChatGPT
00:00:28.240 | yesterday.
00:00:30.360 | But also we'll be talking about the GA of Cody later on today.
00:00:34.480 | But we'll just do a quick intros of both of you.
00:00:37.320 | Obviously, people can research you and check the show notes
00:00:39.960 | for more.
00:00:40.880 | But Beyang, you worked in computer vision at Stanford,
00:00:43.200 | and then you worked at Palantir.
00:00:44.640 | I did, yeah.
00:00:45.440 | You also interned at Google, which is--
00:00:47.120 | I did back in the day, where I got to use
00:00:48.920 | Steve's system, dev tool.
00:00:51.880 | Right.
00:00:53.400 | What was it called?
00:00:54.280 | It was called Grok.
00:00:55.120 | Well, the end user thing was Google Code Search.
00:00:58.100 | That's what everyone called it, or just like CS.
00:01:00.680 | But the brains of it were really the Trigram index and then
00:01:06.160 | Grok, which provided the reference graph.
00:01:08.720 | Today it's called Kythe, the open source Google one.
00:01:11.400 | It's sort of like Grok v3.
00:01:13.080 | On your podcast, which you've had me on,
00:01:15.640 | you've interviewed a bunch of other code search developers,
00:01:18.760 | including the current developer of Kythe, right?
00:01:21.560 | No, we didn't have any Kythe people on,
00:01:24.200 | although we would love to if they're up for it.
00:01:27.480 | We had Kelly Norton, who built a similar system at Etsy.
00:01:32.840 | It's an open source project called Hound.
00:01:35.340 | We also had Han-Wen Nienhuys, who
00:01:39.480 | created Zoekt, which is--
00:01:41.000 | That's the name I'm thinking about.
00:01:43.120 | --I think heavily inspired by the Trigram index that
00:01:46.360 | powered Google's original code search,
00:01:49.520 | and that we also now use at Sourcegraph.
00:01:51.380 | Yeah.
00:01:51.880 | So you teamed up with Quinn over 10 years
00:01:54.240 | ago to start Sourcegraph.
00:01:57.040 | And I kind of view it like--
00:02:00.040 | we'll talk more about this.
00:02:01.320 | You were indexing all code on the internet,
00:02:05.080 | and now you're in the perfect spot
00:02:07.000 | to create a coding intelligence startup.
00:02:10.440 | Yeah, yeah.
00:02:11.040 | I guess the back story was, I used Google Code Search
00:02:15.160 | while I was an intern.
00:02:16.120 | And then after I left that internship
00:02:19.360 | and worked elsewhere, it was the single dev tool
00:02:22.800 | that I missed the most.
00:02:23.840 | I felt like my job was just a lot more tedious and much more
00:02:27.760 | of a hassle without it.
00:02:29.840 | And so when Quinn and I started working together at Palantir,
00:02:32.420 | he had also used various code search engines in open source
00:02:37.040 | over the years.
00:02:38.440 | And it was just a pain point that we both felt,
00:02:41.000 | both working on code at Palantir and also
00:02:44.840 | working within Palantir's clients, which
00:02:46.520 | were a lot of Fortune 500 companies,
00:02:49.120 | large financial institutions, folks like that.
00:02:51.960 | And if anything, the pains they felt
00:02:55.880 | in dealing with large, complex code bases
00:02:57.840 | made our pain points feel small by comparison.
00:03:01.800 | And so that was really the impetus
00:03:03.160 | for starting Sourcegraph.
00:03:05.160 | Yeah, excellent.
00:03:07.000 | Steve, you famously worked at Amazon.
00:03:10.280 | I did, yep.
00:03:11.960 | And revealed-- and you've told many, many stories.
00:03:15.160 | I want every single listener of "Latent Space"
00:03:17.040 | to check out Steve's YouTube, because he effectively
00:03:20.640 | had a podcast that you didn't tell anyone
00:03:23.400 | about or something.
00:03:25.240 | You just hit record and just went on a few rants.
00:03:28.880 | I'm always here for a Stevie rant.
00:03:31.160 | Then you moved to Google, where you also
00:03:34.640 | had some interesting thoughts on just the overall Google
00:03:36.920 | culture versus Amazon.
00:03:38.320 | You joined Grab as head of Eng for a couple of years.
00:03:40.720 | I'm from Singapore, so I have actually personally
00:03:44.480 | used a lot of Grab's features.
00:03:46.560 | And it was very interesting to see
00:03:48.280 | you talk so highly of Grab's engineering
00:03:50.800 | and overall prospects, because--
00:03:53.320 | Because as a customer, it sucked.
00:03:55.320 | No, it's just like--
00:03:57.080 | no, well, being from a smaller country,
00:03:59.400 | you never see anyone from our home country
00:04:02.760 | being on a global stage or talked
00:04:04.560 | about as a good startup that people admire or look up
00:04:08.880 | to, on the league that you, with all your legendary experience,
00:04:14.520 | would consider equivalent.
00:04:16.760 | Yeah, no, absolutely.
00:04:18.440 | They actually didn't even know that they were as good
00:04:21.320 | as they were, in a sense.
00:04:22.600 | They started hiring a bunch of people from Silicon Valley
00:04:25.360 | to come in and fix it.
00:04:26.680 | And we came in and we were like, oh, we
00:04:28.880 | could have been a little better, operational excellence
00:04:30.360 | and stuff.
00:04:31.080 | But by and large, they're really sharp.
00:04:32.680 | And the only thing about Grab is that they get criticized a lot
00:04:37.160 | for being too Westernized.
00:04:39.320 | Oh, by who?
00:04:41.240 | By Singaporeans who don't want to work there.
00:04:44.400 | OK, well, I guess I'm biased because I'm here,
00:04:47.800 | but I don't see that as a problem.
00:04:51.920 | And if anything, they've had their success
00:04:54.520 | because they were more Westernized than the standard
00:04:56.760 | Singaporean tech company.
00:04:57.880 | I mean, they had their success because they are laser-focused.
00:05:01.240 | They copied Amazon.
00:05:02.960 | I mean, they're executing really, really, really well.
00:05:07.080 | For a giant--
00:05:08.480 | I was on a Slack with 2,500 engineers.
00:05:11.880 | It was like this giant waterfall that you
00:05:14.160 | could dip your toe into.
00:05:15.200 | You'd never catch up with them.
00:05:16.640 | Actually, the AI summarizers would
00:05:18.160 | have been really helpful there.
00:05:21.200 | But yeah, no, I think Grab is successful
00:05:23.200 | because they're just out there with their sleeves rolled up,
00:05:25.880 | just making it happen.
00:05:27.360 | Yeah.
00:05:27.920 | And for those who don't know, it's
00:05:29.320 | not just like Uber of Southeast Asia.
00:05:31.160 | It's also a super app.
00:05:33.000 | PayPal plus.
00:05:35.400 | Yeah, in the way that super apps don't exist in the West.
00:05:38.000 | It's one of the greatest mysteries, enduring mysteries
00:05:40.840 | of B2C, that super apps work in the East
00:05:42.820 | and don't work in the West.
00:05:43.960 | Don't understand it.
00:05:44.840 | Yeah, it's just kind of curious.
00:05:46.760 | They didn't work in India either.
00:05:48.160 | And it was primarily because of bandwidth reasons
00:05:50.200 | and smaller phones.
00:05:51.920 | That should change now.
00:05:53.320 | Should.
00:05:53.840 | And maybe we'll see a super app here.
00:05:55.360 | Yeah.
00:05:55.920 | Yeah.
00:05:56.840 | You worked on-- you retired-ish?
00:05:59.200 | I did, yeah.
00:06:00.200 | You worked on your own video game.
00:06:02.960 | Which-- any fun stories about that?
00:06:04.760 | Any-- I think-- and that's also where you discover some need
00:06:07.160 | for code search, right?
00:06:09.000 | Yeah.
00:06:09.840 | Sure, a need for a lot of stuff.
00:06:11.360 | Better programming languages, better databases,
00:06:14.040 | better everything.
00:06:15.000 | I mean, I started in '95, where there was kind of nothing.
00:06:19.480 | Yeah.
00:06:20.200 | I just want to say, I remember when
00:06:21.400 | you first went to Grab, because you wrote that blog post,
00:06:23.640 | talking about why you were excited about it,
00:06:25.480 | about the expanding Asian market.
00:06:27.440 | And our reaction was like, oh, man.
00:06:29.160 | Why didn't-- how did we miss stealing it?
00:06:32.000 | Hiring you.
00:06:32.880 | Yeah, I was like, miss that.
00:06:34.120 | Wow, I'm tired.
00:06:35.120 | Can we tell that story?
00:06:36.120 | So how did this happen?
00:06:37.640 | So you were inspired by Grok.
00:06:41.560 | Yeah, so I guess the back story, from my point of view,
00:06:44.880 | is I had used Code Search and Grok while at Google.
00:06:49.360 | But I didn't actually know that it was connected to you, Steve.
00:06:52.720 | Like, I knew you from your blog posts, which were always
00:06:55.160 | excellent, kind of like inside, very thoughtful takes on--
00:06:59.640 | from an engineer's perspective, on some of the challenges
00:07:01.920 | facing tech companies, and tech culture,
00:07:04.200 | and that sort of thing.
00:07:06.280 | But my first introduction to you,
00:07:08.000 | within the context of code intelligence and code
00:07:10.120 | understanding, was I watched a talk that you gave,
00:07:13.720 | I think, at Stanford about Grok when you were first
00:07:15.840 | building it.
00:07:16.360 | And that was very eye-opening.
00:07:18.280 | And I was like, oh, that guy, the guy
00:07:20.640 | who writes the extremely thoughtful, ranty blog posts,
00:07:24.360 | also built that system.
00:07:27.520 | And so that's how I knew you were kind of involved in that.
00:07:32.200 | And then it was kind of like, we always
00:07:34.760 | kind of wanted to hire you, but never
00:07:37.000 | knew quite how to approach you or get
00:07:40.560 | that conversation started.
00:07:42.920 | Well, we got introduced by Max, right?
00:07:45.840 | Yeah.
00:07:46.340 | He's the head of Temporal.
00:07:47.920 | Temporal, yeah.
00:07:49.760 | And yeah, I mean, it was a no-brainer.
00:07:52.040 | They called me up.
00:07:52.800 | And I noticed when Sourcegraph had come out.
00:07:55.560 | Of course, when they first came out,
00:07:57.400 | I had this dagger of jealousy stabbed through me,
00:08:00.400 | piercingly, which I remember, because I am not
00:08:02.920 | a jealous person by any means, ever.
00:08:06.040 | But boy, I was like, rah, rah, rah.
00:08:08.320 | But I was kind of busy, right?
00:08:09.800 | And just one thing led to another.
00:08:11.580 | I got sucked back into the ads vortex and whatever.
00:08:14.440 | So thank god, Sourcegraph actually kind of rescued me.
00:08:18.440 | Here's a chance to build DevTools.
00:08:20.000 | Yeah.
00:08:20.640 | That's the best.
00:08:21.360 | DevTools are the best.
00:08:23.400 | Cool.
00:08:23.900 | Well, so that's the overall intro.
00:08:25.440 | I guess we can get into Cody.
00:08:27.560 | Is there anything else that people should know about you
00:08:29.880 | before we get started?
00:08:31.400 | I mean, everybody knows I'm a musician.
00:08:34.920 | So I can juggle five balls.
00:08:39.400 | Five is good.
00:08:40.080 | Five is good.
00:08:40.800 | I've only ever managed three.
00:08:42.960 | Five's hard.
00:08:45.080 | And six, a little bit.
00:08:47.160 | That's impressive.
00:08:49.120 | So yeah, to jump into Sourcegraph,
00:08:51.840 | this has been a company 10 years in the making.
00:08:54.480 | And as Sean said, now you're at the right place.
00:08:58.200 | Phase two.
00:08:59.520 | Now exactly, you spent 10 years collecting all this code,
00:09:02.480 | indexing, making it easy to surface it, and how--
00:09:05.640 | And also learning how to work with enterprises
00:09:07.960 | and having them trust you with their code bases.
00:09:10.360 | Because initially, you were only doing on-prem, right, like VPC,
00:09:14.400 | a lot of VPC deployments.
00:09:15.880 | So in the very early days, we were cloud only.
00:09:17.800 | But the first major customers we landed
00:09:20.360 | were all on-prem, self-hosted.
00:09:22.960 | And that was, I think, related to the nature of the problem
00:09:25.880 | that we're solving, which becomes
00:09:27.600 | just a critical, unignorable pain point once you're
00:09:30.120 | above 100 devs or so.
00:09:32.320 | Yeah.
00:09:32.920 | And now Cody is going to be GA by the time this releases.
00:09:36.520 | So congrats.
00:09:38.360 | Congrats to your future self for launching this in two weeks.
00:09:42.440 | Can you give a quick overview of just what Cody is?
00:09:45.280 | I think everybody understands that it's an AI coding agent.
00:09:49.440 | But a lot of companies say they have an AI coding agent.
00:09:52.000 | So yeah, what does Cody do?
00:09:53.920 | How do people interface with it?
00:09:55.600 | Yeah, so basically, how is it different
00:09:57.680 | from the several dozen other AI coding agents
00:10:00.320 | that exist in the market now?
00:10:03.120 | I think our take--
00:10:04.320 | when we thought about building a coding assistant that
00:10:08.360 | would do things like code generation and question
00:10:10.440 | answering about your code base, I
00:10:11.800 | think we came at it from the perspective of we've
00:10:14.600 | spent the past decade building the world's best code
00:10:17.880 | understanding engine for human developers, right?
00:10:21.000 | So it's kind of your guide as a human dev
00:10:26.280 | if you want to go and dive into a large, complex code base.
00:10:30.360 | And so our intuition was that a lot of the context
00:10:33.960 | that we're providing to human developers
00:10:35.640 | would also be useful context for AI developers to consume.
00:10:40.920 | And so in terms of the feature set,
00:10:43.560 | Cody is very similar to a lot of other assistants.
00:10:45.640 | It does inline autocompletion.
00:10:47.240 | It does code base aware chat.
00:10:49.640 | It does specific commands that automate tasks
00:10:53.000 | that you might rather not want to do,
00:10:55.640 | like generating unit tests or adding detailed documentation.
00:11:01.080 | But we think the core differentiator is really
00:11:04.320 | the quality of the context, which is
00:11:06.400 | hard to describe succinctly.
00:11:08.280 | It's a bit like saying, what's the difference between Google
00:11:10.900 | and AltaVista?
00:11:12.520 | There's not a quick checkbox list of features
00:11:14.880 | that you can rattle off.
00:11:15.880 | But it really just comes down to all the attention and detail
00:11:19.000 | that we've paid to making that context work well and be
00:11:23.080 | high quality and fast.
00:11:24.760 | For human devs, we're now kind of plugging into the AI coding
00:11:27.720 | assistant as well.
00:11:29.280 | Yeah.
00:11:30.020 | I mean, just to add, just to add my own perspective
00:11:33.680 | onto what Beyang just described, I'd
00:11:37.720 | say RAG is kind of like a consultant
00:11:40.920 | that the LLM has available that knows about your code.
00:11:45.000 | RAG provides basically a bridge to a lookup system
00:11:47.400 | for the LLM, right?
00:11:49.520 | Whereas fine-tuning would be more like on-the-job training
00:11:53.240 | for somebody.
00:11:54.000 | If the LLM is a person, and you send them to a new job,
00:11:56.840 | and you do on-the-job training, that's
00:11:57.880 | what fine-tuning is like, right?
00:11:59.360 | So tuned to a specific task.
00:12:02.200 | You're always going to need that expert,
00:12:03.820 | even if you get the on-the-job training,
00:12:05.480 | because the expert knows your particular code base,
00:12:08.400 | your task, right?
00:12:10.320 | And that expert has to know your code.
00:12:12.620 | And there's a chicken-and-egg problem, because we're like,
00:12:15.160 | well, I'm going to ask the LLM about my code.
00:12:17.120 | But first, I have to explain it, right?
00:12:19.320 | It's this chicken-and-egg problem.
00:12:20.740 | That's where RAG comes in.
00:12:22.200 | And we have the best consultants, right?
00:12:24.800 | The best assistant who knows your code.
00:12:28.240 | And so when you sit down with Cody, right?
00:12:32.600 | What Beyang said earlier about going to Google
00:12:34.640 | and using code search, and then starting to feel like without
00:12:37.200 | it, his job was super tedious, yeah?
00:12:40.760 | Once you start using these-- do you guys use coding assistants?
00:12:43.600 | Yeah, right?
00:12:44.400 | I mean, we're getting to the point very quickly, right?
00:12:48.600 | Where you feel like you're kind of like--
00:12:50.640 | almost like you're programming without the internet, right?
00:12:52.840 | Or something.
00:12:53.480 | It's like you're programming back in the '90s
00:12:55.340 | without the coding assistant, yeah?
00:12:57.680 | So hopefully that helps for people
00:12:59.480 | who have no idea about coding assistants, what they are.
00:13:03.240 | Yeah.
00:13:03.960 | And I mean, going back to using them,
00:13:06.120 | we had a lot of them on the podcast already.
00:13:07.920 | We had Cursor.
00:13:08.920 | We had Codeium and Codium, very similar names.
00:13:12.680 | Yeah.
00:13:13.180 | Griblet, Phind, and then, of course, there's Copilot.
00:13:16.400 | Tabnine.
00:13:17.960 | Oh, RIP.
00:13:19.100 | No, Kite is the one that died, right?
00:13:20.640 | Oh, right.
00:13:21.400 | I don't know.
00:13:22.120 | I'm starting to get drunk.
00:13:23.940 | So you had a Copilot versus Cody blog post.
00:13:26.760 | And I think it really shows the context improvement.
00:13:31.040 | So you had two examples that stuck with me.
00:13:32.960 | One was, what does this application do?
00:13:35.500 | And the Copilot answer was like, oh, it
00:13:37.440 | uses JavaScript and NPM and this.
00:13:40.000 | And it's like, but that's not what it does.
00:13:42.520 | That's what it's built with.
00:13:43.880 | Versus Cody was like, oh, these are the major functions
00:13:47.840 | and these are the functionalities and things
00:13:49.760 | like that.
00:13:51.280 | And then the other one was, how do I start this up?
00:13:53.440 | And Copilot just said, NPM start,
00:13:56.440 | even though there was no start command in the package JSON.
00:13:59.440 | But, you know, mode collapse, right?
00:14:01.680 | Most projects use NPM start, so maybe this does too.
00:14:05.720 | How do you think about open source models and private--
00:14:10.600 | because Copilot has their own private thing.
00:14:12.520 | And I think you guys use StarCoder, if I remember right.
00:14:15.880 | - Yeah, that's correct.
00:14:17.000 | I think Copilot uses some variant of Codex.
00:14:19.780 | They're kind of cagey about it.
00:14:21.080 | I don't think they've officially announced what model they use.
00:14:24.000 | - And I think they use a range of models based on what you're
00:14:26.540 | doing.
00:14:27.240 | - Yeah, so everyone uses a range of model.
00:14:28.960 | No one uses the same model for inline completion
00:14:31.260 | versus chat, because the latency requirements for--
00:14:34.320 | - Oh, OK.
00:14:35.040 | - Well, there's fill in the middle.
00:14:36.500 | There's also what the model's trained on.
00:14:38.560 | So we actually had completions powered
00:14:40.760 | by Claude Instant for a while.
00:14:42.720 | But you had to kind of prompt hack your way
00:14:44.960 | to get it to output just the code and not, like, hey,
00:14:48.480 | here's the code you asked for, like that sort of text.
00:14:52.000 | So everyone uses a range of models.
00:14:54.320 | We've kind of designed Cody to be especially model--
00:14:59.880 | not agnostic, but pluggable.
00:15:02.400 | So one of our design considerations
00:15:05.640 | was, as the ecosystem evolves, we
00:15:07.680 | want to be able to integrate the best in class models,
00:15:11.040 | whether they're proprietary or open source, into Cody,
00:15:15.200 | because the pace of innovation in the space is just so quick.
00:15:19.680 | And I think that's been to our advantage.
00:15:21.760 | Like today, Cody uses StarCoder for inline completions.
00:15:25.640 | And with the benefit of the context that we provide,
00:15:29.440 | we actually show comparable completion acceptance rate
00:15:33.200 | metrics.
00:15:34.160 | It's kind of like the standard metric
00:15:35.840 | that folks use to evaluate inline completion quality.
00:15:38.240 | It's like, if I show you a completion,
00:15:39.840 | what's the chance that you actually accept the completion
00:15:42.240 | versus you reject it?
00:15:43.160 | And so we're at par with Copilot,
00:15:45.080 | which is at the head of the industry right now.
00:15:47.920 | And we've been able to do that with the StarCoder model, which
00:15:50.420 | is open source, and the benefit of the context fetching stuff
00:15:54.360 | that we provide.
00:15:55.020 | And of course, a lot of like prompt engineering
00:15:57.000 | and other stuff along the way.
00:16:00.400 | Yeah.
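
A rough sketch of how the completion acceptance rate Beyang describes could be computed from client-side telemetry; the event schema here is illustrative, not Sourcegraph's actual instrumentation:

```python
from dataclasses import dataclass

@dataclass
class CompletionEvent:
    """One inline completion that was shown to a user (illustrative schema)."""
    completion_id: str
    accepted: bool  # did the user keep the suggested code?

def acceptance_rate(events: list[CompletionEvent]) -> float:
    """Fraction of shown completions that were accepted."""
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)

# Example: 3 of 4 shown completions were accepted -> 0.75
shown = [CompletionEvent("a", True), CompletionEvent("b", True),
         CompletionEvent("c", False), CompletionEvent("d", True)]
print(acceptance_rate(shown))  # 0.75
```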
00:16:01.280 | And Steve, you've wrote a post called
00:16:03.640 | "Cheating is All You Need" about what you're building.
00:16:06.080 | And one of the points you made is
00:16:07.460 | that everybody's fighting on the same axis, which
00:16:10.000 | is better UI and the IDE, maybe like a better chat response.
00:16:17.000 | But data moats are kind of the most important thing.
00:16:19.960 | And you guys have like a 10-year-old moat
00:16:19.960 | with all the data you've been collecting.
00:16:22.280 | How do you kind of think about what other companies are
00:16:25.800 | doing wrong, right?
00:16:26.720 | Like, why is nobody doing this in terms
00:16:30.320 | of like really focusing on RAG?
00:16:31.840 | I feel like you see so many people, oh, we just
00:16:34.560 | got a new model, and it's like a better HumanEval.
00:16:36.920 | And it's like, wow, but maybe like that's not
00:16:39.240 | what we should really be doing, you know?
00:16:41.040 | Do you think most people underestimate
00:16:42.960 | the importance of like the actual RAG in code?
00:16:47.040 | Yeah, I mean, I think that people weren't doing it much.
00:16:51.440 | It's kind of at the edges of AI.
00:16:53.320 | It's not in the center.
00:16:54.520 | I know that when ChatGPT launched,
00:16:56.200 | so within the last year, I've heard a lot of rumblings
00:16:58.520 | from inside of Google, right?
00:16:59.840 | Because they're undergoing a huge transformation
00:17:02.240 | to try to, of course, get into the new world.
00:17:05.120 | And I heard that they told a bunch of teams
00:17:07.160 | to go and train their own models or fine-tune their own models,
00:17:09.740 | both.
00:17:10.640 | And it was a shit show, right?
00:17:12.120 | Because nobody knew how to do it.
00:17:14.240 | And they launched two coding assistants.
00:17:16.840 | One was called Codey, with an E-Y.
00:17:20.120 | And then there was--
00:17:21.160 | I don't know what happened in that one.
00:17:22.740 | And then there's Duet, right?
00:17:24.120 | Google loves to compete with themselves, right?
00:17:26.160 | They do this all the time.
00:17:27.440 | And they had a paper on Duet, like, from a year ago.
00:17:29.880 | And they were doing exactly what Copilot was doing,
00:17:32.040 | which was just pulling in the local context, right?
00:17:35.720 | But fundamentally, I thought of this
00:17:38.440 | because we were talking about the splitting of the models.
00:17:40.840 | In the early days, it was the LLM did everything.
00:17:44.160 | And then we realized that for certain use cases,
00:17:47.000 | like completions, that a different, smaller, faster
00:17:49.160 | model would be better.
00:17:50.760 | And that fragmentation of models,
00:17:53.040 | actually, we expected to continue and proliferate,
00:17:55.880 | right?
00:17:56.440 | Because fundamentally, we're a recommender engine right now.
00:18:00.040 | We're recommending code to the LLM.
00:18:02.080 | We're saying, may I interest you in this code
00:18:04.200 | right here so that you can answer my question?
00:18:06.920 | And being good at recommender engine--
00:18:09.180 | I mean, who are the best recommenders, right?
00:18:11.020 | There's YouTube, and Spotify, and Amazon, or whatever, right?
00:18:14.320 | Yeah, and they all have many, many, many, many, many models,
00:18:17.480 | right?
00:18:18.160 | All fine-tuned for very specific--
00:18:20.640 | and that's where we're headed in code, too, absolutely.
00:18:24.040 | Yeah, we just did an episode we released on Wednesday,
00:18:26.880 | which we said RAG is like RecSys for LLMs.
00:18:30.720 | You're basically just suggesting good content.
00:18:33.600 | It's like what?
00:18:34.680 | Recommendation systems.
00:18:35.760 | Oh, got it.
00:18:36.280 | Yeah, yeah, yeah.
00:18:36.960 | RecSys.
00:18:37.460 | Yeah.
00:18:37.960 | So the naive implementation of RAG
00:18:40.240 | is you embed everything through a vector database.
00:18:42.720 | You embed your query, and then you find the nearest neighbors,
00:18:45.420 | and that's your RAG.
00:18:46.600 | But actually, you need to rank it.
00:18:48.020 | And actually, you need to make sure
00:18:49.720 | there's sample diversity and that kind of stuff.
00:18:52.360 | And then you're slowly gradient-descending yourself
00:18:55.120 | towards rediscovering proper RecSys,
00:18:58.040 | which has been traditional ML for a long time,
00:19:00.080 | but approaching it from an LLM perspective.
00:19:02.840 | Yeah, I almost think of it as a generalized search problem,
00:19:06.160 | because it's a lot of the same things.
00:19:08.080 | You want your layer 1 to have high recall
00:19:11.080 | and get all the potential things that could be relevant,
00:19:13.840 | and then there's typically a layer 2 re-ranking mechanism
00:19:18.160 | that bumps up the precision, tries
00:19:20.240 | to get the relevant stuff to the top of the results list.
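
A minimal sketch of the two-layer retrieval Beyang describes, with a high-recall embedding pass followed by a precision-oriented re-ranking pass; the embedding and scoring functions below are crude stand-ins rather than Cody's actual implementation:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: a real system calls an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    vec = rng.standard_normal(256)
    return vec / np.linalg.norm(vec)

def layer1_recall(query: str, corpus: list[str], k: int = 50) -> list[str]:
    """High-recall first pass: nearest neighbors by cosine similarity."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: float(np.dot(q, embed(doc))), reverse=True)[:k]

def layer2_rerank(query: str, candidates: list[str], k: int = 10) -> list[str]:
    """Precision pass: a stand-in re-ranker (a cross-encoder in a real system)."""
    query_terms = set(query.lower().split())
    def overlap(doc: str) -> int:
        return len(query_terms & set(doc.lower().split()))
    return sorted(candidates, key=overlap, reverse=True)[:k]

def retrieve(query: str, corpus: list[str]) -> list[str]:
    return layer2_rerank(query, layer1_recall(query, corpus))
```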
00:19:24.400 | Have you discovered that ranking matters a lot?
00:19:26.400 | So the context is that I think a lot of research
00:19:30.120 | shows that one, context utilization
00:19:33.000 | matters based on model.
00:19:34.920 | GPT uses the top of the context window,
00:19:37.600 | and then apparently, Claude uses the bottom better.
00:19:40.480 | And it's lossy in the middle.
00:19:42.200 | So ranking matters.
00:19:43.320 | No, it really does.
00:19:44.360 | The skill with which models are able to take advantage
00:19:47.040 | of context is always going to be dependent on how
00:19:49.720 | that factors into the impact on the training loss.
00:19:53.400 | So if you want long context window models to work well,
00:19:56.240 | then you have to have a ton of data where it's
00:19:58.080 | like, here's a billion lines of text,
00:20:01.200 | and I'm going to ask a question about something that's
00:20:04.080 | embedded deeply into it, and give me the right answer.
00:20:07.880 | And unless you have that training set,
00:20:09.560 | then of course you're going to have variability in terms
00:20:12.040 | of where it attends to.
00:20:13.320 | And in most naturally occurring data,
00:20:15.320 | the thing that you're talking about right now,
00:20:17.200 | the thing I'm asking about, is going
00:20:18.280 | to be something that we talked about recently.
00:20:20.840 | Did you really just say gradient descending yourself?
00:20:24.640 | Actually, I love that it's entered the casual lexicon.
00:20:28.520 | My favorite version of that is how you have to p-hack papers.
00:20:32.440 | So when you throw humans at the problem,
00:20:35.120 | that's called graduate student descent.
00:20:38.600 | That's great.
00:20:39.960 | Yeah, it's really awesome.
00:20:43.320 | I think the other interesting thing that you have
00:20:45.360 | is inline-assist UX that is, I wouldn't say async,
00:20:51.280 | but it works while you can also do work.
00:20:53.240 | So you can ask Cody to make changes on a code block,
00:20:55.840 | and you can still edit the same file at the same time.
00:20:59.880 | How do you see that in the future?
00:21:01.880 | Do you see a lot of Codys running together
00:21:04.120 | at the same time?
00:21:05.800 | How do you validate also that they're not
00:21:08.040 | messing each other up as they make changes in the code?
00:21:11.200 | And maybe what are the limitations today,
00:21:12.920 | and what do you think about where the tech is going?
00:21:14.920 | I want to start with a little history,
00:21:16.040 | and then I'm going to turn it over to Beyang.
00:21:18.200 | So we actually had this feature in the very first launch
00:21:21.320 | back in June.
00:21:22.320 | Dominic wrote it.
00:21:23.200 | It was called Nonstop Cody.
00:21:25.040 | And you could have multiple basically LLM requests
00:21:28.880 | in parallel modifying your source file.
00:21:31.200 | And he wrote a bunch of code to handle all of the diffing
00:21:33.640 | logic, and you could see the regions of code
00:21:36.640 | that the LLM was going to change.
00:21:38.760 | And he was showing me demos of it.
00:21:40.960 | And it just felt like it was just a little before its time.
00:21:45.280 | But a bunch of that stuff, that scaffolding
00:21:47.480 | was able to be reused for where inline's sitting today.
00:21:52.200 | How would you characterize it today?
00:21:54.200 | Yeah, so that interface has really
00:21:56.280 | evolved from a like, hey, general purpose,
00:21:58.920 | like request anything inline in the code
00:22:02.360 | and have the code update, to really like targeted features
00:22:05.000 | like fix the bug that exists at this line,
00:22:08.720 | or request a very specific change.
00:22:11.320 | And the reason for that is, I think the challenge
00:22:14.400 | that we ran into with inline fixes--
00:22:16.120 | and we do want to get to the point where you could just
00:22:18.440 | fire it, forget, and have half a dozen of these running
00:22:21.560 | in parallel.
00:22:22.400 | But I think we ran into the challenge
00:22:24.720 | early on that a lot of people are running into now
00:22:27.200 | when they're trying to construct agents, which
00:22:29.920 | is the reliability of working code generation
00:22:36.280 | is just not quite there yet in today's language models.
00:22:40.920 | And so that kind of constrains you to an interaction
00:22:45.360 | where the human is always like in the inner loop,
00:22:47.600 | like checking the output of each response.
00:22:50.960 | And if you want that to work in a way where
00:22:54.280 | you can be asynchronous, you kind of
00:22:56.840 | have to constrain it to a domain where today's language models
00:22:59.720 | can generate reliable code well enough.
00:23:02.120 | So generating unit tests, that's like a well-constrained problem,
00:23:05.520 | or fixing a bug that shows up as a compiler error or a test
00:23:11.320 | error, that's a well-constrained problem.
00:23:13.280 | But the more general, like, hey, write me
00:23:15.440 | this class that does x, y, and z using the libraries
00:23:17.480 | that I have, that is not quite there yet,
00:23:21.080 | even with the benefit of really good context.
00:23:23.760 | It definitely moves the needle a lot,
00:23:25.400 | but we're not quite there yet to the point
00:23:27.560 | where you can just fire and forget.
00:23:29.240 | And I actually think that this is something
00:23:31.600 | that people don't broadly appreciate yet,
00:23:34.560 | because I think that everyone's chasing
00:23:36.480 | this dream of agentic execution.
00:23:39.560 | And if we're to really define that down,
00:23:41.560 | I think it implies a couple of things.
00:23:43.160 | You have a multi-step process where
00:23:44.640 | each step is fully automated, where
00:23:46.120 | you don't have to have a human in the loop every time.
00:23:48.440 | And there's also kind of like an LLM call at each stage,
00:23:52.080 | or nearly every stage in that chain.
00:23:55.960 | And based on all the work that we've
00:23:58.080 | done with the inline interactions,
00:24:01.760 | with kind of like general Cody features
00:24:06.320 | for implementing longer chains of thought,
00:24:08.160 | we're actually a little bit more bearish
00:24:11.040 | than the average AI hypefluencer out there
00:24:15.880 | on the feasibility of agents with purely kind
00:24:19.160 | of like transformer-based models.
00:24:20.680 | To your original question, like the inline interactions
00:24:23.280 | with Cody, we've actually constrained it
00:24:24.960 | to be more targeted, like fix the current error
00:24:28.000 | or make this quick fix.
00:24:29.600 | I think that that does differentiate us
00:24:31.160 | from a lot of the other tools on the market,
00:24:32.760 | because a lot of people are going
00:24:34.100 | after this snazzy inline edit interaction,
00:24:36.960 | whereas I think where we've moved--
00:24:38.880 | and this is based on the user feedback that we've gotten--
00:24:41.240 | it's like that sort of thing, it demos well,
00:24:43.840 | but when you're actually coding day-to-day,
00:24:45.680 | you don't want to have a long chat conversation
00:24:47.760 | inline with the code base.
00:24:48.840 | That's a waste of time.
00:24:50.200 | You'd rather just have it write the right thing
00:24:52.900 | and then move on with your life or not have to think about it.
00:24:55.480 | And that's what we're trying to work towards.
00:24:57.360 | I mean, yeah, we're not going in the agent direction.
00:24:59.640 | I mean, I'll believe in agents when somebody
00:25:01.480 | shows me one that works.
00:25:03.600 | Instead, we're working on sort of solidifying
00:25:06.640 | our strength, which is bringing the right context in.
00:25:10.240 | So new context sources, ways for you
00:25:12.060 | to plug in your own context, ways for you to control
00:25:14.400 | or influence the context, the mixing that
00:25:16.440 | happens before the request goes out, et cetera.
00:25:19.360 | And there's just so much low-hanging fruit
00:25:21.200 | left in that space that agents seems
00:25:23.420 | like a little bit of a boondoggle.
00:25:24.840 | Just to dive into that a little bit further,
00:25:27.120 | I think at a very high level, what do people
00:25:29.640 | mean when they say agents?
00:25:30.720 | They really mean greater automation, fully automated.
00:25:33.200 | The dream is, here's an issue.
00:25:35.400 | Go implement that.
00:25:36.720 | And I don't have to think about it as a human.
00:25:38.800 | And I think we are working towards that.
00:25:40.720 | That is the eventual goal.
00:25:41.840 | I think it's specifically the approach of, hey,
00:25:44.880 | can we have a transformer-based LLM alone
00:25:48.120 | be the backbone or the orchestrator
00:25:50.680 | of these agentic flows?
00:25:52.000 | We're a little bit more bearish today.
00:25:56.640 | You want a human in the loop.
00:25:58.240 | I mean, you kind of have to.
00:25:59.440 | It's just a reality of the behavior of language models
00:26:03.340 | that are purely transformer-based.
00:26:04.840 | And I think that's just a reflection of reality.
00:26:06.920 | And I don't think people realize that yet.
00:26:08.680 | Because if you look at the way that a lot of other AI tools
00:26:14.680 | have implemented context fetching, for instance,
00:26:18.360 | you see this in the co-pilot approach,
00:26:20.080 | where if you use the at-workspace thing that
00:26:23.080 | supposedly provides code-based level context,
00:26:27.040 | it has an agentic approach, where you kind of look
00:26:31.840 | at how it's behaving.
00:26:32.920 | And it feels like they're making multiple requests to the LLM,
00:26:36.280 | being like, what would you do in this case?
00:26:38.640 | Would you search for stuff?
00:26:39.760 | What sort of files would you gather?
00:26:42.160 | Go and read those files.
00:26:43.480 | And it's a multi-hop step, so it takes a long while.
00:26:46.440 | It's also non-deterministic.
00:26:47.920 | Because any sort of LLM invocation,
00:26:50.040 | it's like a dice roll.
00:26:51.800 | And then at the end of the day, the context it fetches
00:26:54.040 | is not that good.
00:26:55.160 | Whereas our approach is just like, OK,
00:26:56.760 | let's do some code searches that make sense,
00:26:59.280 | and then maybe crawl through the reference graph a little bit.
00:27:03.560 | That is fast.
00:27:04.840 | That doesn't require any sort of LLM invocation at all.
00:27:08.520 | And we can pull in much better context very quickly.
00:27:13.040 | So it's faster, it's more reliable, it's deterministic,
00:27:16.080 | and it yields better context quality.
00:27:17.720 | And so that's what we think.
00:27:20.000 | We just don't think you should cargo cult or naively go,
00:27:23.760 | agents are the future, let's just
00:27:25.240 | try to implement agents on top of the LLMs that exist today.
00:27:29.760 | I think there are a couple of other technologies
00:27:33.600 | or approaches that need to be refined first
00:27:35.800 | before we can get into these multi-stage, fully automated
00:27:39.520 | workflows.
00:27:40.520 | We're very much focused on developer inner loop right now.
00:27:43.480 | But you do see things eventually moving
00:27:45.400 | towards developer outer loop.
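
A minimal sketch of the non-agentic context fetching described above: a plain code search followed by a one-hop crawl of a precomputed reference graph, with no LLM call in the loop. The data structures here are illustrative, not Sourcegraph's:

```python
def keyword_search(query: str, files: dict[str, str], k: int = 5) -> list[str]:
    """Toy code search: rank file paths by how often query terms appear in them."""
    terms = set(query.lower().split())
    def hits(path: str) -> int:
        return sum(files[path].lower().count(term) for term in terms)
    return sorted(files, key=hits, reverse=True)[:k]

def expand_one_hop(seeds: list[str], reference_graph: dict[str, list[str]]) -> list[str]:
    """Add files connected to the seed files in a precomputed reference graph."""
    expanded = list(seeds)
    for path in seeds:
        for neighbor in reference_graph.get(path, []):
            if neighbor not in expanded:
                expanded.append(neighbor)
    return expanded

def fetch_context(query: str, files: dict[str, str],
                  reference_graph: dict[str, list[str]]) -> list[str]:
    # Same query and same index always yield the same context: fast and deterministic.
    return expand_one_hop(keyword_search(query, files), reference_graph)
```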
00:27:47.920 | So would you basically say that they're
00:27:50.680 | tackling the agents problem that you don't want to tackle?
00:27:54.720 | No, I would say at a high level, we
00:27:56.960 | are after maybe like the same high level problem, which
00:28:00.720 | is like, hey, I want some code written.
00:28:03.000 | I want to develop some software.
00:28:05.320 | And can an automated system go build that software for me?
00:28:12.000 | I think the approaches might be different.
00:28:14.440 | So I think the analogy in my mind
00:28:16.400 | is, think about the AI chess players.
00:28:20.440 | Coding in some senses, it's similar and dissimilar
00:28:23.040 | to chess.
00:28:24.120 | I think one question I ask is, do you
00:28:25.620 | think producing code is more difficult than playing chess
00:28:29.400 | or less difficult than playing chess?
00:28:31.600 | More?
00:28:32.560 | I think more.
00:28:33.560 | And if you look at the best AI chess players,
00:28:36.480 | yes, you can use an LLM to play chess.
00:28:38.440 | People have showed demos where it's like, oh, yeah,
00:28:40.560 | GPT-4 is actually a pretty decent chess move suggester.
00:28:44.760 | But you would never build a best-in-class chess player
00:28:49.720 | off of GPT-4 alone.
00:28:53.160 | The way that people design chess players
00:28:55.760 | is you have a search space.
00:28:58.400 | And then you have a way to explore that search space
00:29:02.240 | efficiently.
00:29:02.880 | There's a bunch of search algorithms, essentially,
00:29:04.920 | where you're doing tree search in various ways.
00:29:07.000 | And you can have heuristic functions,
00:29:10.080 | which might be powered by an LLM.
00:29:11.840 | You might use an LLM to generate proposals in that space
00:29:15.120 | that you can efficiently explore.
00:29:18.840 | But the backbone is still this more formalized tree search
00:29:24.440 | based approach rather than the LLM itself.
00:29:28.560 | And so I think my high level intuition is
00:29:31.800 | that the way that we get to this more reliable multi-step
00:29:36.000 | workflows that can do things beyond generate unit test,
00:29:41.400 | it's really going to be like a search-based approach, where
00:29:43.960 | you use an LLM as kind of like an advisor or a proposal
00:29:47.080 | function, sort of your heuristic function,
00:29:49.360 | like the A* search algorithm.
00:29:54.560 | But it's probably not going to be the thing that
00:29:57.240 | is the backbone.
00:29:58.400 | Because I guess it's not the right tool for that.
00:30:01.120 | Yeah, yeah.
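
A minimal sketch of the search-based structure Beyang gestures at, in which an LLM only proposes and scores candidate edits while the search loop and a hard verifier (compile and run the tests) stay in control; `llm_propose`, `llm_score`, and `passes_tests` are placeholders to be supplied by the caller:

```python
import heapq
from typing import Callable, Optional

def best_first_search(
    initial_code: str,
    llm_propose: Callable[[str], list[str]],   # LLM suggests candidate next versions of the code
    llm_score: Callable[[str], float],         # LLM-backed heuristic; higher means more promising
    passes_tests: Callable[[str], bool],       # hard verifier: compile and run the test suite
    max_expansions: int = 50,
) -> Optional[str]:
    """Best-first search where the LLM is only the proposal/heuristic function."""
    counter = 0  # tie-breaker so the heap never falls back to comparing code strings
    frontier = [(-llm_score(initial_code), counter, initial_code)]
    seen = {initial_code}
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, code = heapq.heappop(frontier)
        if passes_tests(code):
            return code
        for candidate in llm_propose(code):
            if candidate not in seen:
                seen.add(candidate)
                counter += 1
                heapq.heappush(frontier, (-llm_score(candidate), counter, candidate))
    return None  # search budget exhausted without a passing version
```

The model never decides when the task is done; the test suite does, which is the point of keeping the backbone outside the LLM.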
00:30:02.720 | You also have-- I can see yourself
00:30:05.680 | thinking through this, but not saying
00:30:07.300 | the words, the philosophical Peter Norvig type discussion.
00:30:11.560 | Maybe you want to introduce that divided in software.
00:30:16.120 | Yeah, definitely.
00:30:19.120 | Your listeners are savvy.
00:30:20.400 | They're probably familiar with the classic Chomsky
00:30:22.620 | versus Norvig debate.
00:30:24.120 | No, actually, I was prompting you to introduce that.
00:30:26.800 | Oh, got it.
00:30:27.760 | So if you look at the history of artificial intelligence,
00:30:32.160 | it goes way back to--
00:30:33.800 | I don't know, it's probably as old as modern computers,
00:30:36.440 | like '50s, '60s, '70s.
00:30:38.640 | People are debating on what is the path
00:30:40.680 | to producing a general human level of intelligence.
00:30:43.600 | And two schools of thought that emerged.
00:30:47.360 | One is the Norvig school of thought,
00:30:51.320 | which, roughly speaking, includes large language
00:30:55.560 | models, regression, SVMs.
00:30:58.840 | Basically, any model that you learn from data
00:31:02.380 | and is data driven, machine learning--
00:31:04.400 | most of machine learning would fall under this umbrella.
00:31:06.700 | And that school of thought says, just learn from the data.
00:31:10.800 | That's the approach to reaching intelligence.
00:31:13.320 | And then the Chomsky approach is more things
00:31:16.000 | like compilers, and parsers, and formal systems.
00:31:20.160 | So basically, let's think very carefully
00:31:22.320 | about how to construct a formal, precise system.
00:31:26.120 | And that will be the approach to how we build
00:31:29.080 | a truly intelligent system.
00:31:31.080 | Lisp, for instance, was originally an attempt to--
00:31:36.240 | I think Lisp was invented so that you
00:31:38.400 | could create rules-based systems that you would call AI.
00:31:41.560 | As a language, yeah.
00:31:42.360 | Yeah, and for a long time, there was this debate.
00:31:44.400 | There were certain AI research labs
00:31:45.860 | that were more in the Chomsky camp,
00:31:47.840 | and others that were more in the Norvig camp.
00:31:50.120 | And it's a debate that rages on today.
00:31:51.760 | And I feel like the consensus right now
00:31:53.760 | is that Norvig definitely has the upper hand right now
00:31:56.840 | with the advent of LLMs, and diffusion models,
00:31:59.280 | and all the other recent progress in machine learning.
00:32:03.840 | But the Chomsky-based stuff is still really useful,
00:32:08.080 | in my view.
00:32:09.160 | I mean, it's like parsers, compilers.
00:32:10.720 | Basically, a lot of the stuff that
00:32:12.160 | provides really good context, it provides
00:32:14.160 | kind of like the knowledge graph backbone
00:32:17.260 | that you want to explore with your AI dev tool.
00:32:21.400 | That will come from Chomsky-based tools,
00:32:23.920 | like compilers and parsers.
00:32:25.600 | It's a lot of what we've invested in the past decade
00:32:28.040 | at Sourcegraph, and what you built with Grok.
00:32:33.040 | Basically, these formal systems that
00:32:34.480 | construct these very precise knowledge graphs that
00:32:37.640 | are great context providers, and great guardrails enforcers,
00:32:41.400 | and safety checkers for the output of a more data-driven,
00:32:48.720 | fuzzier system that uses like the Norvig-based models.
00:32:54.240 | Beyang was talking about this stuff
00:32:55.780 | like it happened in the Middle Ages.
00:32:57.500 | Basically, it's like, OK, so when I was in college,
00:33:02.000 | I was in college learning Lisp, and Prolog, and Planning,
00:33:04.500 | and all the deterministic Chomsky approaches to AI.
00:33:08.240 | And I was there when Norvig basically declared it dead.
00:33:12.440 | I was there 3,000 years ago when Norvig and Chomsky
00:33:16.040 | fought on the volcano.
00:33:17.280 | When did he declare it dead?
00:33:18.520 | What do you mean he declared it dead?
00:33:20.040 | Late '90s, yeah, when I went to Google,
00:33:22.080 | Peter Norvig was already there.
00:33:24.960 | And he had basically like--
00:33:27.120 | I forget exactly where.
00:33:29.160 | He's got so many famous short posts, amazing things.
00:33:32.080 | He had a famous talk, "The Unreasonable Effectiveness
00:33:34.280 | of Data."
00:33:35.080 | Yeah, maybe that was it.
00:33:36.080 | But at some point, basically, he basically
00:33:38.560 | convinced everybody that the deterministic approaches had
00:33:41.360 | failed, and that heuristic-based, data-driven,
00:33:44.280 | statistical approaches, stochastic were better.
00:33:47.240 | The primary reason--
00:33:48.640 | I can tell you this because I was there--
00:33:50.400 | was that--
00:33:50.960 | [LAUGHTER]
00:33:53.360 | --was that, well, the steam-powered engine-- no.
00:33:55.800 | [LAUGHTER]
00:33:58.080 | The reason was that the deterministic stuff didn't
00:34:00.560 | scale, right?
00:34:01.800 | They were using Prolog, man, constraint systems
00:34:04.160 | and stuff like that.
00:34:05.200 | Well, that was a long time ago, right?
00:34:07.400 | Today, actually, these Chomsky-style systems do scale.
00:34:11.080 | And that's, in fact, exactly what Sourcegraph has built.
00:34:14.200 | And so we have a very unique--
00:34:16.240 | I love the framing that Beyang's made,
00:34:19.240 | the marriage of the Chomsky and the Norvig models,
00:34:22.360 | conceptual models, because we have both of them.
00:34:24.840 | And they're both really important.
00:34:26.260 | And, in fact, there's this really interesting overlap
00:34:29.760 | between them, where the AI or our graph or our search engine
00:34:33.400 | could potentially provide the right context for any given
00:34:35.720 | query, which is, of course, why ranking is important.
00:34:38.360 | But what we've really signed ourselves up for
00:34:40.680 | is an extraordinary amount of testing, yeah?
00:34:44.520 | Because, like you were saying, swyx,
00:34:46.760 | you were saying that GPT-4 tends to the front of the context
00:34:49.760 | window, and maybe other LLMs to the back,
00:34:51.740 | and maybe all the way to the middle.
00:34:53.580 | Yeah, and so that means that if we're actually
00:34:56.680 | verifying whether some change we've made
00:34:59.000 | has improved things, we're going to have
00:35:00.920 | to test putting it at the beginning of the window
00:35:02.480 | and at the end of the window, and maybe
00:35:04.280 | make the right decision based on the LLM that you've chosen.
00:35:06.920 | Which some of our competitors, that's
00:35:08.200 | a problem that they don't have.
00:35:09.500 | But we meet you where you are, yeah?
00:35:11.720 | And just to finish, we're writing thousands,
00:35:14.360 | tens of thousands.
00:35:15.400 | We're generating tests, fill-in-the-middle type tests
00:35:17.560 | and things, and then using our graph
00:35:19.320 | to basically fine-tune Cody's behavior there, yeah?
00:35:24.200 | Yeah.
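
A minimal sketch of the kind of placement test Steve describes: put the same known-relevant snippet at the start, middle, or end of the assembled prompt and check whether the model still answers correctly. `ask_llm` is a placeholder for whatever model endpoint is being evaluated, and `frobnicate()` is a toy question target:

```python
from typing import Callable

def build_prompt(relevant: str, filler: list[str], position: str) -> str:
    """Place the known-relevant snippet at the start, middle, or end of the context."""
    docs = list(filler)
    index = {"start": 0, "middle": len(docs) // 2, "end": len(docs)}[position]
    docs.insert(index, relevant)
    # `frobnicate()` is a hypothetical function used only as the question target.
    return "\n\n".join(docs) + "\n\nQuestion: what does frobnicate() return?"

def placement_sweep(
    relevant: str,
    filler: list[str],
    expected_answer: str,
    ask_llm: Callable[[str], str],  # placeholder for the model under test
) -> dict[str, bool]:
    """Did the model answer correctly for each placement of the relevant snippet?"""
    return {
        pos: expected_answer in ask_llm(build_prompt(relevant, filler, pos))
        for pos in ("start", "middle", "end")
    }
```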
00:35:25.080 | I also want to add, I have an internal pet name
00:35:28.400 | for this hybrid architecture that I'm trying to make catch on.
00:35:32.680 | Maybe I'll just say it here.
00:35:34.760 | Saying it publicly makes it more real.
00:35:37.320 | But I call the architecture that we've
00:35:39.560 | developed the Normsky architecture.
00:35:43.880 | And it's kind of like--
00:35:45.120 | I mean, it's obviously a portmanteau of Norvig
00:35:49.060 | and Chomsky, but the acronym, it stands
00:35:52.280 | for non-agentic, rapid, multi-source code intelligence.
00:35:58.040 | So non-agentic, because--
00:35:59.200 | Rolls right off the tongue.
00:36:01.680 | And Normsky.
00:36:02.720 | Yeah.
00:36:04.120 | Yeah.
00:36:05.400 | But it's non-agentic in the sense
00:36:07.000 | that we're not trying to pitch you on agent hype, right?
00:36:12.040 | The things it does are really just use developer tools
00:36:15.460 | developers have been using for decades now,
00:36:17.680 | like parsers and really good search indexes and things
00:36:21.000 | like that.
00:36:23.200 | Rapid, because we place an emphasis on speed.
00:36:25.440 | We don't want to sit there waiting for multiple LLM
00:36:28.920 | requests to return to complete a simple user request.
00:36:32.240 | Multi-source, because we're thinking broadly
00:36:35.600 | about what pieces of information and knowledge
00:36:39.240 | are useful context.
00:36:40.120 | So obviously starting with things
00:36:41.840 | that you can search in your code base,
00:36:43.680 | and then you add in the reference graph, which
00:36:45.640 | kind of allows you to crawl outward
00:36:47.440 | from those initial results.
00:36:49.920 | But then even beyond that, sources of information,
00:36:52.160 | like there's a lot of knowledge that's
00:36:54.120 | embedded in docs, in PRDs, or product specs,
00:37:01.680 | in your production logging system, in your chat,
00:37:07.520 | in your Slack channel, right?
00:37:09.520 | Like there's so much context that's embedded there.
00:37:11.600 | And when you're a human developer
00:37:12.840 | and you're trying to be productive in your code base,
00:37:15.080 | you're going to go to all these different systems
00:37:16.600 | to collect the context that you need to figure out
00:37:19.480 | what code you need to write.
00:37:21.520 | And I don't think the AI developer will be any different.
00:37:24.560 | It will need to pull context from all
00:37:26.640 | these different sources.
00:37:27.680 | So we're thinking broadly about how
00:37:29.280 | to integrate these into Cody.
00:37:32.760 | We hope through kind of like an open protocol
00:37:35.200 | that others can extend and implement.
00:37:38.420 | And this is something else that should be, I guess,
00:37:41.960 | like accessible by December 14th in kind of like a preview
00:37:45.240 | stage.
00:37:46.640 | But that's really about like broadening
00:37:48.400 | this notion of the code graph beyond your Git repository
00:37:51.480 | to all the other sources where technical knowledge
00:37:53.800 | and valuable context can live.
00:37:56.120 | Yeah, it becomes an artifact graph, right?
00:37:58.000 | It can link into your logs and your wikis
00:37:59.920 | and any data source, right?
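
Purely to illustrate the multi-source idea (this is not the actual protocol Beyang mentions), a sketch of a generic context-provider interface that a code host, a wiki, a logging system, or a chat archive could each implement:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class ContextItem:
    source: str   # e.g. "git", "wiki", "logs", "slack"
    title: str
    content: str
    score: float  # the provider's own relevance estimate for the query

class ContextProvider(Protocol):
    def fetch(self, query: str, limit: int) -> list[ContextItem]: ...

def gather_context(query: str, providers: list[ContextProvider],
                   limit_per_source: int = 5) -> list[ContextItem]:
    """Fan the query out to every registered source and merge the results by score."""
    items: list[ContextItem] = []
    for provider in providers:
        items.extend(provider.fetch(query, limit_per_source))
    return sorted(items, key=lambda item: item.score, reverse=True)
```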
00:38:03.080 | How do you guys think about the importance of--
00:38:05.600 | it's almost like data pre-processing in a way,
00:38:07.800 | which is bring it all together, tie it together, make it ready.
00:38:12.440 | Yeah, any thoughts on how to actually make
00:38:14.640 | that good, what some of the innovation you guys have made?
00:38:18.240 | We talk a lot about the context fetching, right?
00:38:20.900 | I mean, there's a lot of ways you could answer this question.
00:38:23.400 | But we've spent a lot of time just in this podcast
00:38:26.380 | here talking about context fetching.
00:38:27.920 | But stuffing the context into the window
00:38:30.000 | is also the bin packing problem, right?
00:38:31.840 | Because the window is not big enough
00:38:33.340 | and you've got more context than you can fit.
00:38:35.220 | You've got a ranker maybe.
00:38:36.560 | But what is that context?
00:38:40.520 | Is it a function that was returned
00:38:42.320 | by an embedding or a graph call or something?
00:38:45.280 | Do you need the whole function?
00:38:46.640 | Or do you just need the top part of the function,
00:38:50.040 | this expression here, right?
00:38:51.800 | So that art, the golf game of trying
00:38:53.920 | to get each piece of context down into its smallest state,
00:38:58.000 | possibly even summarized by another model
00:39:00.440 | before it even goes to the LLM, becomes this
00:39:02.960 | is the game that we're in, yeah?
00:39:04.800 | And so recursive summarization and all the other techniques
00:39:07.840 | that you've got to use to stuff stuff into that context window
00:39:10.560 | become critically important.
00:39:12.200 | And you have to test them across every configuration of models
00:39:15.200 | that you could possibly need.
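
A minimal sketch of the bin-packing step Steve describes: greedily fit ranked context snippets into the remaining token budget, shrinking a snippet via summarization or truncation when the whole thing will not fit. The token counter and shrinker below are crude stand-ins:

```python
def count_tokens(text: str) -> int:
    """Crude stand-in: a real system uses the target model's own tokenizer."""
    return len(text.split())

def shrink(text: str, budget: int) -> str:
    """Stand-in for summarizing or truncating a snippet that is too large to fit."""
    return " ".join(text.split()[:budget])

def pack_context(ranked_snippets: list[str], token_budget: int) -> list[str]:
    """Greedily pack ranked snippets into the remaining context-window budget."""
    packed: list[str] = []
    remaining = token_budget
    for snippet in ranked_snippets:
        cost = count_tokens(snippet)
        if cost <= remaining:
            packed.append(snippet)
            remaining -= cost
        elif remaining > 20:  # only shrink a snippet if a meaningful amount of room is left
            packed.append(shrink(snippet, remaining))
            remaining = 0
        if remaining == 0:
            break
    return packed
```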
00:39:16.800 | I think data preprocessing is probably
00:39:19.000 | the unsexy, way underappreciated secret
00:39:22.160 | to a lot of the cool stuff that people are shipping today,
00:39:26.760 | whether you're doing like RAG or fine tuning or pre-training.
00:39:31.920 | The preprocessing step matters so much
00:39:34.800 | because it is basically garbage in, garbage out, right?
00:39:39.440 | Like if you're feeding in garbage to the model,
00:39:41.600 | then it's going to output garbage.
00:39:43.560 | Concretely, for code RAG, if you're not
00:39:49.000 | doing some sort of preprocessing that
00:39:50.800 | takes advantage of a parser and is
00:39:53.680 | able to extract the key components of a particular file
00:39:58.320 | of code, separate the function signature from the body,
00:40:00.760 | from the doc string, what are you even doing?
00:40:03.080 | That's like table stakes.
00:40:05.000 | And it opens up so much more possibilities
00:40:08.760 | with which you can tune your system
00:40:12.360 | to take advantage of the signals that
00:40:15.160 | come from those different parts of the code.
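
A minimal sketch of parser-backed preprocessing in the spirit Beyang describes, using Python's standard `ast` module to split a function into signature, docstring, and body; a production system would use a multi-language parser instead:

```python
import ast

def split_functions(source: str) -> list[dict[str, str]]:
    """Split each top-level function into signature, docstring, and body text."""
    lines = source.splitlines()
    parts = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            signature = lines[node.lineno - 1].strip()  # assumes a single-line signature
            docstring = ast.get_docstring(node) or ""
            body_nodes = node.body[1:] if docstring else node.body
            body = ""
            if body_nodes:
                body = "\n".join(lines[body_nodes[0].lineno - 1:node.end_lineno])
            parts.append({"name": node.name, "signature": signature,
                          "docstring": docstring, "body": body})
    return parts

example = '''
def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
    return a + b
'''
print(split_functions(example))
# [{'name': 'add', 'signature': 'def add(a: int, b: int) -> int:',
#   'docstring': 'Return the sum of a and b.', 'body': '    return a + b'}]
```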
00:40:17.760 | We've had a tool since computers were invented
00:40:20.120 | that understands the structure of source code
00:40:23.560 | to 100% precision.
00:40:26.640 | The compiler knows everything there
00:40:28.760 | is to know about the code in terms of structure.
00:40:32.320 | Why would you not want to use that
00:40:34.080 | in a system that's trying to generate code,
00:40:36.440 | answer questions about code?
00:40:37.760 | You shouldn't throw that out the window
00:40:39.400 | just because now we have really good data-driven models that
00:40:44.320 | can do other things.
00:40:45.160 | Yeah.
00:40:45.800 | When I called it a data moat in my cheating post,
00:40:50.520 | a lot of people were confused about--
00:40:53.000 | because data moat sort of sounds like data lake
00:40:56.040 | because there's data and water and stuff.
00:40:57.960 | I don't know.
00:40:58.800 | And so they thought that we were sitting
00:41:00.080 | on this giant mountain of data that we had collected.
00:41:02.440 | But that's not what our data moat is.
00:41:04.040 | It's really a data preprocessing engine
00:41:06.400 | that can very quickly and scalably basically dissect
00:41:09.600 | your entire code base into very small, fine-grained semantic
00:41:12.900 | units and then serve it up.
00:41:15.600 | And so it's really-- it's not a data moat.
00:41:17.280 | It's a data preprocessing moat, I guess.
00:41:20.000 | Yeah, if anything, we're hypersensitive to customer data
00:41:23.380 | privacy requirements.
00:41:24.880 | So it's not like we've taken a bunch of private data
00:41:27.040 | and trained a generally available model.
00:41:29.840 | In fact, exactly the opposite.
00:41:32.000 | A lot of our customers are choosing
00:41:33.520 | Cody over Copilot and other competitors
00:41:36.480 | because we have an explicit guarantee
00:41:38.200 | that we don't do any of that.
00:41:39.400 | And we've done that from day one.
00:41:41.000 | Yeah.
00:41:42.000 | I think that's a very real concern in today's day and age.
00:41:44.740 | Because if your proprietary IP finds its way
00:41:48.120 | into the training set of any model,
00:41:50.720 | it's very easy both to extract that knowledge from the model
00:41:54.360 | and also use it to build systems that
00:41:57.640 | work on top of the institutional knowledge
00:41:59.440 | that you've built up.
00:42:01.560 | About a year ago, I wrote a post on LLMs for developers.
00:42:05.040 | And one of the points I had was maybe the death of the DSL.
00:42:08.680 | I spent most of my career writing Ruby.
00:42:10.560 | And I love Ruby.
00:42:12.120 | It's so nice to use.
00:42:13.640 | But it's not as performant, but it's really easy to read.
00:42:16.680 | And then you look at other languages,
00:42:18.560 | maybe they're faster, but they're more verbose.
00:42:21.760 | And when you think about efficiency of the context
00:42:24.280 | window, that actually matters.
00:42:27.440 | But I haven't really seen a DSL for models.
00:42:31.360 | I haven't seen code being optimized
00:42:33.720 | to be easier to put in a model context.
00:42:36.320 | And it seems like your pre-processing
00:42:38.320 | is kind of doing that.
00:42:39.240 | Do you see in the future the way we think about DSL and APIs
00:42:43.600 | and service interfaces be more focused
00:42:46.660 | on being context-friendly?
00:42:48.520 | Whereas maybe it's harder to read for the human,
00:42:52.400 | but the human is never going to write it anyway.
00:42:55.160 | We were talking on the "Hacks" podcast.
00:42:57.400 | There are some data science things, like spin-up the spandex,
00:43:01.400 | that humans are never going to write again,
00:43:03.160 | because the models can just do them very easily.
00:43:05.760 | Yeah, curious to hear your thoughts.
00:43:07.880 | Well, so DSLs, they involve writing a grammar and a parser.
00:43:14.600 | And they're like little languages, right?
00:43:18.600 | And we do them that way because we need them to compile,
00:43:23.240 | and humans need to be able to read them, and so on.
00:43:26.120 | The LLMs don't need that level of structure.
00:43:28.120 | You can throw any pile of crap at them,
00:43:30.600 | more or less unstructured, and they'll deal with it.
00:43:32.800 | So I think that's why a DSL hasn't emerged
00:43:35.600 | for communicating with the LLM or packaging up
00:43:38.420 | the context or anything.
00:43:39.420 | Maybe it will at some point, right?
00:43:40.880 | We've got tagging of context and things
00:43:42.560 | like that that are sort of peeking into DSL territory,
00:43:45.240 | right?
00:43:45.740 | But your point on do users, do people
00:43:48.480 | have to learn DSLs, like regular expressions,
00:43:50.800 | or pick your favorite, right?
00:43:52.440 | XPath.
00:43:53.600 | I think you're absolutely right that the LLMs are really,
00:43:56.140 | really good at that.
00:43:57.000 | And I think you're going to see a lot less of people
00:43:59.220 | having to slave away learning these things.
00:44:01.080 | They just have to know the broad capabilities,
00:44:03.040 | and then the LLM will take care of the rest.
00:44:06.400 | Yeah, I'd agree with that.
00:44:07.560 | I think we will see kind of like a revisiting of--
00:44:11.640 | basically, the value proposition of a DSL
00:44:13.400 | is that it makes it easier to work with a lower level
00:44:17.320 | language, but at the expense of introducing an abstraction
00:44:20.080 | layer.
00:44:22.280 | And in many cases today, without the benefit of AI code generation,
00:44:27.920 | that's totally worth it, right?
00:44:31.040 | With the benefit of AI code generation,
00:44:33.200 | I mean, I don't think all DSLs will go away.
00:44:36.800 | I think there's still places where that trade-off
00:44:38.960 | is going to be worthwhile.
00:44:40.280 | But it's kind of like, how much of source code
00:44:43.780 | do you think is going to be generated
00:44:45.320 | through natural language prompting in the future?
00:44:47.000 | Because in a way, any programming language
00:44:48.960 | is just a DSL on top of assembly, right?
00:44:52.760 | And so if people can do that, then yeah.
00:44:56.200 | Maybe for a large portion of the code that's written,
00:44:59.140 | people don't actually have to understand
00:45:00.800 | the DSL that is Ruby, or Python, or basically
00:45:04.840 | any other programming language that exists today.
00:45:07.000 | I mean, seriously, do you guys ever write SQL queries now
00:45:09.960 | without using a model of some sort?
00:45:12.080 | At least at JavaScript.
00:45:13.120 | Ever?
00:45:14.200 | Yeah, right?
00:45:14.920 | And so we have kind of passed that bridge, right?
00:45:18.200 | Yeah, I think to me, the long-term thing is like,
00:45:21.840 | is there ever going to be--
00:45:23.560 | you don't actually see the code.
00:45:25.360 | It's like, hey-- the basic thing is like, hey,
00:45:27.240 | I need a function to sum two numbers.
00:45:29.400 | And that's it.
00:45:31.240 | I don't need you to generate the code.
00:45:33.080 | And the follow-on question, do you need the engineer
00:45:35.400 | or the paycheck?
00:45:37.920 | I mean, right?
00:45:38.880 | That's kind of the agent's discussion in a way,
00:45:40.880 | where you cannot automate the agents,
00:45:42.960 | but slowly you're getting more of the atomic units
00:45:46.600 | of the work done.
00:45:48.400 | I kind of think of it as like, do you need a punch card
00:45:50.800 | operator to answer that for you?
00:45:52.640 | And so I think we're still going to have people
00:45:54.600 | in the role of a software engineer,
00:45:56.100 | but the portion of time they spend
00:45:58.920 | on these kind of low-level, tedious tasks
00:46:02.600 | versus the higher-level, more creative tasks is going to
00:46:05.480 | shift.
00:46:07.000 | No, I haven't used punch cards.
00:46:09.320 | He's looking over here.
00:46:09.320 | [LAUGHTER]
00:46:12.340 | Yeah.
00:46:12.840 | Yeah, I've been talking about--
00:46:14.520 | so we've kind of made this podcast
00:46:17.040 | about the sort of rise of the AI engineer.
00:46:20.040 | And the first step is the AI-enhanced engineer
00:46:22.440 | that is that software developer that is no longer doing
00:46:25.720 | these routine boilerplate-y type tasks,
00:46:28.280 | because they're just enhanced by tools like yours.
00:46:33.160 | So you mentioned-- you open-sourced OpenCodeGraph.
00:46:35.960 | I mean, that is a kind of DSL, maybe.
00:46:40.040 | And because you're releasing this as you go GA,
00:46:43.700 | you hope for other people to take advantage of that?
00:46:43.700 | Oh, yeah.
00:46:44.200 | I would say-- so OpenCodeGraph is not a DSL.
00:46:46.280 | It's more of a protocol.
00:46:47.280 | It's basically like, hey, if you want
00:46:48.820 | to make your system, whether it's chat, or logging,
00:46:52.760 | or whatever, accessible to an AI developer tool like Cody,
00:46:58.840 | here is kind of like the schema by which you can provide
00:47:03.000 | that context and offer hints.
00:47:04.800 | So comparisons like LSP obviously
00:47:08.200 | did this for kind of like standard code intelligence.
00:47:10.600 | It's kind of like a lingua franca for providing
00:47:12.800 | find references and go to definition.
00:47:14.520 | There's kind of like analogs to that.
00:47:16.200 | There might be also analogs to kind of the original OpenAI
00:47:20.720 | kind of like plugins API, where it's like, hey,
00:47:25.880 | there's all this context out there
00:47:27.440 | that might be useful for an LLM-based system to consume.
00:47:31.520 | And so at a high level, what we're trying to do
00:47:33.920 | is define a common language for context providers
00:47:38.560 | to provide context to other tools in the software
00:47:41.640 | development lifecycle.
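
To make that concrete, here is a hypothetical sketch of what a context provider in that spirit could look like. This is not the actual OpenCodeGraph schema; the message shapes and field names below are invented for illustration:

from dataclasses import dataclass


@dataclass
class ContextItem:
    title: str    # e.g. "Recent deploy logs for this service"
    url: str      # deep link back to the source system
    content: str  # text the AI tool may place in the prompt


def provide_context(request: dict) -> list[ContextItem]:
    """A provider receives a description of what the editor or AI tool is
    looking at and returns the context items it considers relevant."""
    if request.get("kind") == "file" and request["path"].endswith("deploy.yaml"):
        return [
            ContextItem(
                title="Recent deploy logs",
                url="https://logs.example.com/deploys/latest",
                content="Last deploy of this service failed health checks ...",
            )
        ]
    return []


# The AI tool (a coding assistant, for example) would call every registered
# provider with the same request and merge the returned items into its context.
print(provide_context({"kind": "file", "path": "service/deploy.yaml"}))
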
00:47:43.040 | Yeah.
00:47:43.640 | Do you have any critiques of LSP, by the way,
00:47:45.480 | since this is very much very close to home?
00:47:48.200 | One of the authors wrote a really good critique recently.
00:47:50.600 | Yeah.
00:47:51.100 | Oh, LSP?
00:47:51.600 | I don't think I saw that.
00:47:52.680 | Yeah, yeah.
00:47:53.180 | How LSP could have been better.
00:47:54.720 | It just came out a couple of weeks ago.
00:47:56.360 | It was a good article.
00:47:57.400 | Yeah.
00:47:57.900 | I don't know if I--
00:47:59.360 | I think LSP is great for what it did for the developer
00:48:02.760 | ecosystem.
00:48:04.200 | It's absolutely fantastic.
00:48:05.600 | Nowadays, it's very easy--
00:48:08.120 | it's much easier now to get code navigation up and running
00:48:12.340 | A bunch of editors.
00:48:13.440 | --in a bunch of editors by speaking this protocol.
00:48:15.800 | I think maybe the interesting question
00:48:17.440 | is looking at the different design decisions made,
00:48:21.440 | comparing LSP basically with Kythe.
00:48:24.240 | Because Kythe has more of a--
00:48:27.800 | I don't know, how would you describe it?
00:48:29.460 | A storage format.
00:48:30.560 | I think the critique of LSP from a Kythe point of view
00:48:33.320 | would be, with LSP, you don't actually
00:48:34.920 | have an actual model, a symbolic model, of the code.
00:48:39.920 | It's not like LSP models, hey, this function
00:48:41.720 | calls this other function.
00:48:43.240 | LSP is all range-based.
00:48:44.840 | Like, hey, your token is at line 32--
00:48:48.240 | your cursor is at line 32, column 1.
00:48:51.200 | And that's the thing you feed into the language server.
00:48:54.700 | And then it's like, OK, here's the range
00:48:56.860 | that you should jump to if you click on that range.
00:48:59.000 | So it kind of is intentionally ignorant of the fact
00:49:02.400 | that there's a thing called a reference underneath your
00:49:04.760 | cursor, and that's linked to a symbol definition.
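
For reference, this is roughly the shape of an LSP textDocument/definition exchange, written out as Python dicts. Both sides speak purely in positions and ranges; there is no symbol or call-graph model in the protocol itself:

definition_request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "textDocument/definition",
    "params": {
        "textDocument": {"uri": "file:///repo/src/app.py"},
        # 0-based line/character of the token under the cursor
        "position": {"line": 31, "character": 0},
    },
}

definition_response = {
    "jsonrpc": "2.0",
    "id": 42,
    # Just a location to jump to -- a URI plus a range of positions.
    "result": {
        "uri": "file:///repo/src/lib.py",
        "range": {
            "start": {"line": 10, "character": 4},
            "end": {"line": 10, "character": 16},
        },
    },
}
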
00:49:07.100 | Well, actually, that's the worst example you could have used.
00:49:09.640 | You're right, but that's the one thing that it actually
00:49:12.320 | did bake in, is following references.
00:49:14.800 | Sure.
00:49:15.300 | But it's sort of hardwired.
00:49:16.720 | Yeah.
00:49:18.240 | Whereas Kythe attempts to model all these things explicitly.
00:49:21.520 | And so--
00:49:22.640 | Well, so LSP's a protocol, right?
00:49:25.520 | And so Google's internal protocol is gRPC-based.
00:49:28.600 | And it's a different approach than LSP.
00:49:34.440 | Basically, you make a heavy query to the back end,
00:49:36.620 | and you get a lot of data back, and then you
00:49:38.460 | render the whole page.
00:49:40.920 | So we've looked at LSP, and we think that it's just--
00:49:44.320 | it's a little long in the tooth, right?
00:49:45.960 | I mean, it's a great protocol, lots and lots of support
00:49:48.240 | for it.
00:49:48.740 | But we need to push into the domain of exposing
00:49:52.800 | the intelligence through the protocol.
00:49:55.520 | Yeah.
00:49:56.360 | And so I would say, I mean, we've
00:49:59.160 | developed a protocol of our own called Skip, which is, I think,
00:50:02.020 | at a very high level, trying to take some of the good ideas
00:50:04.440 | from LSP and from Kythe, and merge that into a system that,
00:50:08.160 | in the near term, is useful for SourceGraph,
00:50:10.540 | but I think in the long term, we hope it will
00:50:12.400 | be useful for the ecosystem.
00:50:13.840 | And I would say, OK, so here's what LSP did well.
00:50:17.400 | LSP, by virtue of being intentionally dumb--
00:50:20.840 | "dumb" in air quotes, because I'm not ragging on it--
00:50:23.880 | Yeah.
00:50:25.600 | But what it allowed it to do is it
00:50:28.280 | allowed language service developers
00:50:30.060 | to kind of bypass the hard problem of modeling language
00:50:33.400 | semantics precisely.
00:50:35.040 | So if all you want to do is jump to definition,
00:50:37.200 | you don't have to come up with a universally unique naming
00:50:40.320 | scheme for each symbol, which is actually quite challenging.
00:50:43.600 | Because you have to think about, OK, what's
00:50:45.920 | the top scope of this name?
00:50:47.760 | Is it the source code repository?
00:50:50.240 | Is it the package?
00:50:53.320 | Does it depend on what package server
00:50:57.800 | you're fetching this from, whether it's the public one
00:51:00.360 | or the one inside your--
00:51:01.480 | anyways, naming is hard, right?
00:51:03.800 | And by just going from a location-to-location-based
00:51:07.680 | approach, you basically just throw that out the window.
00:51:09.920 | All I care about is jumping to definition.
00:51:11.720 | Just make that work, and you can make that work
00:51:14.240 | without having to deal with all the complex global naming
00:51:18.080 | things.
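
As a rough illustration of why global naming is hard, here is a hypothetical fully-qualified symbol scheme. The format is invented for this example (it is not Skip's or Kythe's actual scheme); the point is that every ambiguity mentioned above has to be encoded somewhere:

def global_symbol_name(
    registry: str,     # which package server? a public one vs. an internal mirror
    package: str,      # which package?
    version: str,      # which release of that package?
    module: str,       # which file/module inside it?
    symbol_path: str,  # which class/function/field within the module?
) -> str:
    return f"{registry}/{package}@{version}/{module}#{symbol_path}"


# Two "lodash.get" symbols that a range-based protocol like LSP never has to
# tell apart, but a knowledge graph must:
print(global_symbol_name("npmjs.com", "lodash", "4.17.21", "get.js", "get"))
print(global_symbol_name("registry.internal.example", "lodash", "4.17.21-patched", "get.js", "get"))
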
00:51:19.000 | The limitation of that approach is
00:51:20.440 | that it's harder to build on top of that
00:51:23.200 | to build a true-knowledge graph.
00:51:24.880 | If you actually want a system that says, OK,
00:51:26.840 | here's the web of functions, and here's
00:51:28.520 | how they reference each other.
00:51:29.760 | And I want to incorporate that semantic model of how
00:51:32.800 | the code operates, or how the code relates to each other
00:51:35.880 | at a static level, you can't do that with LSP,
00:51:37.920 | because you have to deal with line ranges.
00:51:39.760 | And concretely, the pain point that we found
00:51:42.280 | in using LSP for Sourcegraph is,
00:51:44.560 | in order to do a find references and then jump to definition,
00:51:48.240 | it's like a multi-hop process, because you
00:51:50.640 | have to jump to the range, and then you
00:51:52.240 | find the symbol at that range.
00:51:53.600 | And it just adds a lot of latency and complexity
00:51:55.560 | of these operations.
00:51:56.400 | Where as a human, you're like, well,
00:51:58.000 | this thing clearly references this other thing.
00:52:00.080 | Why can't you just jump me to that?
00:52:02.440 | And I think that's the thing that Kythe does well.
00:52:04.440 | But then I think the issue that Kythe has had with adoption
00:52:07.520 | is, because it's a more sophisticated schema, I think.
00:52:14.480 | And so there's basically more things
00:52:15.960 | that you have to implement to get a Kite implementation
00:52:18.400 | up and running.
00:52:19.080 | I hope I'm not like--
00:52:20.760 | correct me if I'm wrong about any of this.
00:52:22.920 | 100%.
00:52:24.280 | Kythe also has the problem-- all these systems
00:52:26.560 | have the problem, even Skip, or at least the way
00:52:29.160 | that we implemented the indexers,
00:52:30.560 | that they have to integrate with your build system
00:52:33.200 | in order to build that knowledge graph,
00:52:34.920 | because you have to basically compile
00:52:36.520 | the code in a special mode to generate artifacts instead
00:52:39.080 | of binaries.
00:52:40.200 | And I would say--
00:52:41.440 | by the way, earlier I was saying that xrefs were in LSP,
00:52:46.240 | but it's actually-- I was thinking of LSP plus LSIF.
00:52:49.780 | Ugh, LSIF.
00:52:51.680 | That's another--
00:52:53.000 | Which is actually bad.
00:52:53.920 | We can say that's bad, right?
00:52:56.360 | LSIF was not good.
00:52:58.880 | It's like Skip or Kythe.
00:53:00.040 | It's supposed to be sort of a model, a serialization
00:53:03.440 | for the code graph.
00:53:04.360 | But it basically just does what LSP needs, the bare minimum.
00:53:08.280 | LSIF is basically if you took LSP
00:53:09.720 | and turned that into a serialization format.
00:53:11.600 | So you build an index for language servers
00:53:13.440 | to kind of quickly bootstrap from cold start.
00:53:15.840 | But it's a graph model with all of the inconvenience of the API
00:53:19.640 | without an actual graph.
00:53:21.480 | And so, yeah, it's not great.
00:53:23.960 | So one of the things that we try to do with Skip
00:53:25.960 | is try to capture the best of both worlds.
00:53:27.760 | So make it easy to write an indexer,
00:53:29.400 | make the schema simple, but also model
00:53:32.120 | some of the more symbolic characteristics of the code
00:53:34.960 | that would allow us to essentially construct this
00:53:37.680 | knowledge graph that we can then make
00:53:39.560 | useful for both the human developer through SourceGraph
00:53:41.880 | and through the AI developer through Kodi.
00:53:44.600 | So anyway, just to finish off the graph comment
00:53:49.040 | is we've got a new graph that's Skip-based.
00:53:55.080 | We call it BFG internally, right?
00:53:58.380 | Beautiful something graph.
00:53:59.640 | Big friendly graph.
00:54:00.880 | Big friendly graph.
00:54:01.760 | It's a blazing fast--
00:54:02.680 | Blazing fast.
00:54:03.240 | Blazing fast graph.
00:54:04.480 | And it is blazing fast, actually.
00:54:05.940 | It's really, really interesting.
00:54:07.240 | I should probably do a blog post about it
00:54:09.920 | to walk you through exactly how they're doing it.
00:54:12.040 | Oh, please.
00:54:12.600 | But it's a very AI-like, iterative, experimentation
00:54:16.800 | sort of approach, where we're building a code graph based
00:54:20.400 | on all of our 10 years of knowledge
00:54:22.160 | about building code graphs.
00:54:23.640 | But we're building it quickly with zero configuration,
00:54:25.880 | and it doesn't have to integrate with your build system
00:54:28.840 | through some magic tricks that we have.
00:54:30.680 | And so it just happens when you install the plug-in
00:54:35.600 | that it'll be there and indexing your code
00:54:38.240 | and providing that knowledge graph in the background
00:54:40.440 | without all that build system integration.
00:54:42.320 | This is a bit of secret sauce that we haven't really--
00:54:46.800 | I don't know, we haven't advertised it very much lately.
00:54:49.800 | But I am super excited about it, because what they do
00:54:52.480 | is they say, all right, let's tackle function parameters
00:54:55.120 | today.
00:54:56.000 | Cody's not doing a very good job of completing function call
00:54:58.800 | arguments or function parameters in the definition, right?
00:55:01.920 | Yeah, we generate those thousands of tests.
00:55:03.840 | And then we can actually reuse those tests for the AI context
00:55:06.840 | as well.
00:55:07.760 | So fortunately, things are kind of converging.
00:55:10.040 | We have half a dozen really, really good context sources.
00:55:14.680 | And we mix them all together.
00:55:16.880 | So anyway, BFG, you're going to hear more about it probably,
00:55:21.760 | I would say, probably in the holidays?
00:55:24.240 | Yeah, I think it'll be online for December 14th.
00:55:28.240 | We'll probably mention it.
00:55:29.640 | BFG is probably not the public name we're going to go with.
00:55:32.720 | I think we might call it Graph Context or something like that.
00:55:36.560 | We're officially calling it BFG.
00:55:38.680 | You're going to hear first.
00:55:40.240 | BFG is just kind of like the working name.
00:55:42.000 | And it's interesting.
00:55:43.880 | So the impetus for BFG was, if you
00:55:46.480 | look at current AI inline code completion tools
00:55:50.760 | and the errors that they make, a lot of the errors
00:55:53.400 | that they make, even in kind of the easy single line case,
00:55:56.960 | are essentially type errors, right?
00:56:00.200 | You're trying to complete a function call.
00:56:04.120 | And it suggests a variable that you define earlier,
00:56:06.200 | but that variable is the wrong type.
00:56:08.480 | And that's the sort of thing where it's like, well,
00:56:10.880 | like a first year freshman CS student
00:56:14.480 | would not make that error, right?
00:56:16.440 | So why does the AI make that error?
00:56:19.040 | And the reason is, I mean, the AI is just
00:56:21.680 | suggesting things that are plausible
00:56:23.280 | without the context of the types or any other broader
00:56:28.640 | files in the code.
00:56:31.360 | And so the kind of intuition here
00:56:33.360 | is, why don't we just do the basic thing
00:56:36.920 | that any baseline intelligent human developer would
00:56:40.240 | do, which is click jump to definition,
00:56:43.440 | click some find references, and pull in that graph context
00:56:48.080 | into the context window, and then
00:56:51.360 | have it generate the completion.
00:56:53.480 | So that's sort of like the MVP of what BFG was.
00:56:56.320 | And it turns out that works really well.
00:56:58.000 | You can eliminate a lot of type errors
00:57:02.920 | that AI coding tools make just by pulling in that context.
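
A minimal, self-contained sketch of that intuition: look up the definitions of symbols near the cursor and prepend them to the completion prompt. The in-memory "graph" and the stub model call below are stand-ins, not Cody's or BFG's real implementation:

CODE_GRAPH = {
    # symbol -> its definition snippet (with type information)
    "fetch_user": "def fetch_user(user_id: int) -> User: ...",
    "User": "class User:\n    id: int\n    email: str",
}


def graph_context_for(snippet: str) -> list[str]:
    """Return definitions of every known symbol that appears in the snippet."""
    return [defn for name, defn in CODE_GRAPH.items() if name in snippet]


def build_completion_prompt(file_prefix: str) -> str:
    context = "\n\n".join(graph_context_for(file_prefix))
    return (
        "# Relevant definitions from elsewhere in the codebase:\n"
        f"{context}\n\n"
        "# Complete the following code:\n"
        f"{file_prefix}"
    )


def llm_complete(prompt: str) -> str:
    # Placeholder for a call to whatever completion model is plugged in.
    return "<completion>"


prompt = build_completion_prompt("user = fetch_user(")
print(prompt)
print(llm_complete(prompt))
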
00:57:06.840 | Yeah, but the graph is definitely our Chomsky side.
00:57:09.560 | Yeah, exactly.
00:57:10.280 | So this Chomsky-Norvig thing, I think,
00:57:12.720 | pops up in a bunch of different layers.
00:57:15.200 | And I think it's just a very useful and also kind of nicely
00:57:18.960 | nerdy way to describe the system that we're trying to build.
00:57:23.120 | By the way, I remember the point I
00:57:25.640 | was trying to make earlier to your question, Alessio, about,
00:57:28.080 | is AI going to replace programmers?
00:57:29.800 | And I was talking about how compilers--
00:57:31.520 | they thought, oh, are compilers going to replace programming?
00:57:33.640 | And what it did was it just changed
00:57:35.100 | kind of what programmers have to focus on.
00:57:36.920 | And I think AI is just going to level us up again.
00:57:39.240 | So programmers are still going to be building stuff
00:57:42.120 | until agents come along, but I don't believe.
00:57:46.280 | And so, yeah.
00:57:47.680 | Yeah, to be clear, again, with the agent stuff
00:57:50.600 | at a high level, I think we will get there.
00:57:52.460 | I think that's still the kind of long-term target.
00:57:54.840 | And I think also with Cody, it's like,
00:57:57.160 | you can have Cody draft up an execution plan.
00:58:00.160 | It's just not going to be the sort of thing where you don't
00:58:04.440 | attend to what it's doing.
00:58:05.880 | Like, we think that with Cody, it's like, you ask Cody,
00:58:08.520 | like, hey, I have this bug.
00:58:09.640 | Help me solve it.
00:58:10.340 | It would do a reasonable job of fetching context and saying,
00:58:12.840 | here are the files you should modify.
00:58:14.960 | And if you prompt it further, you
00:58:16.480 | can actually suggest code changes to make to those files.
00:58:19.200 | And that's a very nice way to resolve issues,
00:58:21.640 | because you're kind of on the rails for most of the time,
00:58:24.720 | but then now and then you have to intervene as a human.
00:58:27.600 | I just think that if we're trying
00:58:28.960 | to get to complete automation, where it's like the sort
00:58:31.720 | of thing where a non-software engineer, someone
00:58:34.600 | who has no technical expertise, can just
00:58:36.600 | speak a non-trivial feature into existence,
00:58:41.520 | that is still, I think, several key innovations away
00:58:46.360 | from happening right now.
00:58:47.400 | And I don't think the pure transformer-based LLM
00:58:51.400 | orchestrator model of agents that is kind of dominant today
00:58:56.320 | is going to get us there.
00:58:57.880 | FRANCESC CAMPOY: Yeah.
00:58:58.960 | Just what you're talking about triggered a thread
00:59:04.480 | I've been working on for a little bit, which is we're
00:59:07.400 | very much reacting to developments in models
00:59:09.920 | on a month-to-month basis.
00:59:11.960 | You had a post about, we're going
00:59:15.520 | to need a bigger moat, which is a great Jaws reference for those
00:59:19.040 | who didn't catch it.
00:59:20.000 | About how quickly--
00:59:20.760 | MARK MANDEL: I forgot all about that.
00:59:22.300 | FRANCESC CAMPOY: --how quickly models are evolving.
00:59:24.920 | But I think if you kind of look out,
00:59:26.560 | I actually caught Sam Altman on the podcast
00:59:29.200 | yesterday talking about GPT-10.
00:59:31.960 | [LAUGHTER]
00:59:32.460 | MARK MANDEL: Ooh, wow.
00:59:34.120 | Things are accelerating.
00:59:36.680 | FRANCESC CAMPOY: And actually, there's a pretty good cadence
00:59:39.240 | from GPT-2, 3, and 4 that you can-- if you project out.
00:59:42.360 | So 4 is based on George Hotz's concept of 20 petaflops
00:59:48.120 | being a human's worth of compute.
00:59:52.080 | GPT-4 took about 100 years in terms of human years
00:59:57.080 | to train, in terms of the amount of compute.
01:00:00.120 | So that's one living person.
01:00:02.800 | And every generation of GPT increases
01:00:05.460 | two orders of magnitude.
01:00:07.320 | So 5 is 100 people.
01:00:10.680 | And if you just project it out, 9 is every human on Earth,
01:00:14.880 | and 10 is every human ever.
01:00:18.960 | And he thinks he'll reach there by the end of the decade.
01:00:22.280 | MARK MANDEL: George Hotz does?
01:00:23.520 | FRANCESC CAMPOY: No, Sam Altman.
01:00:24.280 | MARK MANDEL: Oh, Sam Altman, OK.
01:00:25.200 | FRANCESC CAMPOY: Yeah.
01:00:26.080 | So I just like setting those high-level--
01:00:29.800 | you have dots on the line.
01:00:32.160 | We're at the start of the curve with Moore's law.
01:00:37.080 | Gordon Moore, I think, thought it would last 10 years.
01:00:40.120 | And he just kept drawing for another 50.
01:00:43.680 | And I think we have all these data points.
01:00:45.600 | And we're just trying to extrapolate the curve out
01:00:48.200 | to where this goes.
01:00:50.040 | So all I'm saying is this agent stuff that we discussed
01:00:54.040 | might come here by 2030.
01:00:56.240 | And I don't know how you plan when things are not
01:01:01.400 | possible today.
01:01:02.080 | And you're like, it's not worth doing.
01:01:04.640 | But we're going to be here in 2030.
01:01:06.840 | And what do we do then?
01:01:12.360 | MARK MANDEL: So is the question like--
01:01:14.000 | FRANCESC CAMPOY: There's no question.
01:01:15.500 | It's like sharing of a comment, just
01:01:17.120 | because at the back of my head, anytime
01:01:20.240 | we hear things like things are not practical today,
01:01:23.080 | I'm just like, all right, but how do we--
01:01:25.640 | MARK MANDEL: So here's a question, maybe.
01:01:28.200 | I get the whole scaling argument.
01:01:30.220 | I do think that there will be something like a Moore's law
01:01:32.640 | for AI inference.
01:01:34.920 | I mean, definitely, I think, at the hardware level, like GPUs.
01:01:39.800 | I think it gets a little fuzzier the higher you move up
01:01:42.400 | in the stack.
01:01:44.400 | But for instance, going back to the chess analogy,
01:01:50.000 | at what point do we think that GPT-X or whatever,
01:01:54.520 | a pure transformer-based LLM model will be state of the art
01:02:00.440 | or outperform the best chess-playing algorithm today?
01:02:04.680 | Because I think that is one milestone on--
01:02:07.480 | FRANCESC CAMPOY: Where you completely overlap
01:02:09.880 | search and symbolic models.
01:02:11.040 | MARK MANDEL: Yeah, exactly, because I
01:02:11.220 | think that would be--
01:02:12.680 | I mean, just to put my cards on the table,
01:02:13.960 | I think that would kind of disprove the thesis that I just
01:02:16.320 | stated, which is kind of like the pure transformer,
01:02:18.960 | just scale the transformer-based approach.
01:02:21.720 | That would be a proof point where like, hey,
01:02:23.600 | maybe that is the right approach,
01:02:25.000 | versus, oh, we actually have to take a step back and think--
01:02:28.400 | you get what I'm saying, right?
01:02:29.680 | Is the transformer going to be, like,
01:02:31.260 | the end-all-be-all of architectures,
01:02:33.080 | and is it just a matter of scaling that?
01:02:34.840 | Or are there other algorithms, and the transformer
01:02:37.200 | is going to be one piece of a system of intelligence
01:02:41.740 | that will have to take
01:02:44.120 | advantage of many other algorithms and approaches?
01:02:47.240 | FRANCESC CAMPOY: Yeah, we shall see.
01:02:49.200 | Maybe John Carmack will find it.
01:02:51.600 | MARK MANDEL: Yeah.
01:02:53.800 | FRANCESC CAMPOY: All right, sorry for that digression.
01:02:56.000 | I'm just very curious.
01:02:57.480 | So one thing I did actually want to check in on,
01:03:00.000 | because we talked a little bit about code graphs and reference
01:03:02.760 | graphs and all that.
01:03:03.640 | Do you actually use a graph database?
01:03:05.360 | No, right?
01:03:06.280 | MARK MANDEL: No.
01:03:07.120 | FRANCESC CAMPOY: Isn't it weird?
01:03:08.480 | MARK MANDEL: Well, I mean, how would you define graph database?
01:03:10.760 | FRANCESC CAMPOY: We use Postgres.
01:03:12.380 | And yeah, I saw a paper actually right
01:03:14.140 | after I joined Sourcegraph.
01:03:15.220 | There was some joint study between IBM
01:03:16.760 | and some other company that basically showed
01:03:18.420 | that Postgres was performing as well as most of the graph
01:03:20.840 | databases for most graph workloads.
01:03:22.860 | MARK MANDEL: Wow.
01:03:23.620 | In V0 of Sourcegraph, we're like,
01:03:26.400 | we're building a code graph.
01:03:27.660 | Let's use a graph database.
01:03:30.820 | I won't name the database, because I mean,
01:03:33.020 | it was like 10 years ago.
01:03:34.100 | So they're probably much better now.
01:03:35.640 | But we basically tried to dump a non-trivially sized data set,
01:03:40.260 | but also not the whole universe of code, right?
01:03:44.540 | It was a relatively small data set
01:03:46.180 | compared to what we're indexing now into the database.
01:03:48.780 | And we let it run for a week.
01:03:51.620 | And I think it segfaulted or something.
01:03:55.360 | And we're like, OK, let's try another approach.
01:03:58.800 | Let's just put everything in Postgres.
01:04:00.380 | And these days, the graph data, I mean,
01:04:03.460 | it's partially in Postgres.
01:04:04.620 | It's partially just--
01:04:05.700 | I mean, you could store them as flat files.
01:04:07.660 | FRANCESC CAMPOY: Yeah.
01:04:08.620 | I mean, at the end of the day, all the databases,
01:04:10.760 | just get me the data I want.
01:04:12.340 | Answer the queries that I need, right?
01:04:14.660 | If all your queries are single hops in this--
01:04:20.060 | MARK MANDEL: Which they will be if you denormalize
01:04:22.220 | for other use cases.
01:04:23.060 | FRANCESC CAMPOY: Yeah, exactly.
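
An illustrative (not Sourcegraph's actual) schema for that approach: one denormalized edge table, indexed in both directions, so "find references" is a single indexed lookup rather than a graph traversal. sqlite3 stands in for Postgres here:

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE reference_edges (
        from_symbol TEXT NOT NULL,   -- the referencing symbol
        to_symbol   TEXT NOT NULL,   -- the referenced (defined) symbol
        file        TEXT NOT NULL,
        line        INTEGER NOT NULL
    );
    CREATE INDEX idx_edges_from ON reference_edges (from_symbol);
    CREATE INDEX idx_edges_to   ON reference_edges (to_symbol);
""")
db.execute(
    "INSERT INTO reference_edges VALUES (?, ?, ?, ?)",
    ("checkout.handler", "payments.charge", "checkout.py", 120),
)

# "Find references to payments.charge" is one indexed, single-hop query:
rows = db.execute(
    "SELECT from_symbol, file, line FROM reference_edges WHERE to_symbol = ?",
    ("payments.charge",),
).fetchall()
print(rows)
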
01:04:24.740 | MARK MANDEL: Interesting.
01:04:25.820 | FRANCESC CAMPOY: So, yeah.
01:04:27.100 | MARK MANDEL: Seventh normal form is just a bunch of files.
01:04:29.500 | FRANCESC CAMPOY: Yeah, yeah.
01:04:30.700 | And I don't know, I feel like there's
01:04:32.340 | a bunch of stuff like that, where it's like,
01:04:34.740 | if you look past the marketing and think
01:04:36.460 | about the actual query load, or the traffic patterns,
01:04:41.900 | or the end user use cases you need to serve,
01:04:46.020 | just go with the tried and true, dumb, classic tools
01:04:49.220 | over the new-agey stuff.
01:04:50.540 | MARK MANDEL: Choose boring technology, yeah.
01:04:52.260 | FRANCESC CAMPOY: I mean, there's a bunch of stuff
01:04:54.260 | like that in the search domain, too, especially right now,
01:04:56.700 | with embeddings, and vector search, and all that.
01:05:00.900 | But classic search techniques still go very far.
01:05:04.020 | And I don't know, I think in the next year or two maybe,
01:05:07.100 | as we get past the peak AI hype, we'll
01:05:10.680 | start to see the gap emerge, or become more obvious to more
01:05:17.060 | people about how many of the newfangled techniques
01:05:20.100 | actually work in practice, and yield a better product
01:05:23.340 | experience day to day.
01:05:24.780 | MARK MANDEL: Yeah.
01:05:25.940 | So speaking of which, obviously there's
01:05:27.880 | a bunch of other people trying to build AI tooling.
01:05:31.340 | What can you say about your AI stack?
01:05:34.320 | Obviously, you build a lot proprietary in-house,
01:05:36.900 | but what approaches-- so prompt engineering,
01:05:42.020 | do you have a prompt engineering management tool?
01:05:45.620 | What approaches there do you do?
01:05:48.540 | Pre-processing orchestration, do you use Airflow?
01:05:50.900 | Do you use something else?
01:05:52.540 | That kind of stuff.
01:05:53.580 | FRANCESC CAMPOY: Yeah.
01:05:54.500 | Ours is very duct-taped together at the moment.
01:05:58.780 | So in terms of stack, it's essentially
01:06:02.820 | Go and TypeScript, and now Rust.
01:06:06.460 | There's the knowledge graph, the code knowledge graph
01:06:09.220 | that we built, which is using indexers, many of which
01:06:12.620 | are open source, that speak the Skip protocol.
01:06:17.980 | And we have the code search back end.
01:06:21.860 | Traditionally, we supported regular expression search
01:06:24.540 | and string literal search with a trigram index.
01:06:28.060 | And we're also building more fuzzy search on top of that
01:06:31.300 | now, kind of like natural language or keyword-based
01:06:33.500 | search on top of that.
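
A toy version of the trigram-index idea behind that code search backend: map every three-character substring to the documents containing it, intersect the candidate sets for a query, then verify the exact match. Purely illustrative; the production index is far more sophisticated:

from collections import defaultdict


def trigrams(text: str) -> set[str]:
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id: str, content: str):
        self.docs[doc_id] = content
        for gram in trigrams(content):
            self.postings[gram].add(doc_id)

    def search(self, literal: str) -> list[str]:
        grams = trigrams(literal)
        if not grams:
            return list(self.docs)  # query too short to narrow down
        candidates = set.intersection(*(self.postings[g] for g in grams))
        # Verify candidates, since trigram overlap is necessary but not sufficient.
        return [d for d in candidates if literal in self.docs[d]]


index = TrigramIndex()
index.add("main.go", 'func main() { fmt.Println("hello") }')
index.add("util.go", "func helper() int { return 42 }")
print(index.search("Println"))
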
01:06:36.820 | And we use a variety of open source and proprietary models.
01:06:40.140 | We try to be pluggable with respect to different models,
01:06:42.820 | so we can easily swap the latest model in and out
01:06:46.580 | as they come online.
01:06:49.460 | I'm just hunting for, is there anything out there
01:06:52.620 | that you're like, these guys are really good.
01:06:55.420 | Everyone should check them out.
01:06:56.700 | So for example, you talked about recursive summarization,
01:06:59.500 | which is something that LangChain and LlamaIndex do.
01:07:01.780 | I presume you wrote your own.
01:07:03.940 | I presume--
01:07:04.500 | Yeah, we wrote our own.
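
For readers unfamiliar with the technique, here is a minimal sketch of recursive summarization in general (not Sourcegraph's implementation): chunk text that exceeds the context budget, summarize each chunk, then summarize the summaries, recursing until the result fits. The summarize_with_llm stub stands in for a real model call:

MAX_CHARS = 4_000  # stand-in for a token budget


def summarize_with_llm(text: str) -> str:
    # Placeholder for a call to whatever chat/completion model is configured.
    return text[: MAX_CHARS // 4]


def recursive_summarize(text: str) -> str:
    if len(text) <= MAX_CHARS:
        return summarize_with_llm(text)
    chunks = [text[i:i + MAX_CHARS] for i in range(0, len(text), MAX_CHARS)]
    partial = "\n".join(summarize_with_llm(c) for c in chunks)
    # The joined partial summaries may still be too long, so recurse.
    return recursive_summarize(partial)


print(recursive_summarize("some very long document ... " * 500))
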
01:07:05.500 | I think the stuff that LlamaIndex and LangChain
01:07:08.580 | are doing are super interesting.
01:07:10.780 | I think, from our point of view, it's
01:07:12.420 | like we're still in the application end user use case
01:07:16.020 | discovery phase.
01:07:17.060 | And so adopting an external infrastructure or middleware
01:07:25.020 | tool just seems overly constraining right now.
01:07:27.300 | We need full control.
01:07:28.540 | Yeah, we need full control, because we
01:07:29.540 | need to be able to iterate rapidly up and down the stack.
01:07:32.260 | But maybe at some point, there'll be a convergence,
01:07:34.620 | and we can actually merge some of our stuff into theirs
01:07:36.880 | and turn that into a common resource.
01:07:39.340 | In terms of other vendors that we use,
01:07:41.300 | I mean, obviously, nothing but good things
01:07:43.700 | to say about Anthropic and OpenAI,
01:07:46.700 | which we both kind of partner with and use.
01:07:50.620 | Also, plug for Fireworks as an inference platform.
01:07:55.020 | Their team was kind of like ex-Meta people
01:07:57.940 | who basically know all the bag of tricks
01:08:01.620 | for making inference fast.
01:08:02.820 | I met Lynn.
01:08:03.340 | So she was--
01:08:03.840 | Lynn is great.
01:08:05.180 | She was with Soumith.
01:08:06.140 | She was the co-manager of PyTorch for five years.
01:08:08.500 | Yeah, yeah, yeah.
01:08:10.540 | But is their main thing that we just
01:08:12.380 | do fastest inference on Earth?
01:08:14.940 | Is that what it is?
01:08:15.940 | I think that's the pitch.
01:08:17.980 | And it keeps getting faster somehow.
01:08:20.420 | We run StarCoder on top of Fireworks.
01:08:22.900 | And that's made it so that we just don't have
01:08:24.820 | to think about building up an inference stack.
01:08:27.860 | And so that's great for us, because it allows us to focus
01:08:30.340 | more on the data fetching, the knowledge graph,
01:08:35.500 | and model fine-tuning, which we've also invested a bit in.
01:08:40.260 | That's right.
01:08:40.820 | We've got multiple AI workstreams in progress now,
01:08:43.820 | because we hired a head of AI, finally.
01:08:45.860 | We spent close to a year, actually.
01:08:48.460 | I talked to probably 75 candidates.
01:08:51.700 | And the guy we hired, Rashab, is absolutely world-class.
01:08:56.140 | And he immediately started multiple workstreams,
01:08:58.740 | including he's fine-tuned StarCoder already.
01:09:01.860 | He's got Prompt Engineering workstream.
01:09:04.100 | He's got the Embeddings workstream.
01:09:06.780 | He's got Evaluation and Experimentation.
01:09:09.100 | Benchmarking-- wouldn't it be nice
01:09:10.820 | if Cody was on Hugging Face with a benchmark
01:09:14.820 | that anybody could say, well, we'll
01:09:17.140 | run against the benchmark, or we'll make our own benchmark
01:09:19.740 | if we don't like yours.
01:09:20.740 | But we'll be forcing people into the quantitative comparisons.
01:09:24.740 | And that's all happening under the AI program
01:09:26.820 | that he's building for us.
01:09:28.420 | Yeah.
01:09:29.060 | I should mention, by the way, I've
01:09:30.420 | heard that there's a v2 of StarCoder coming out.
01:09:33.860 | So you guys should talk to Hugging Face.
01:09:35.660 | Cool.
01:09:36.440 | Awesome.
01:09:36.940 | Great.
01:09:37.940 | I actually visited their offices in Paris,
01:09:39.740 | which is where I heard it.
01:09:40.700 | That's awesome.
01:09:41.320 | Can you guys believe how amazing it is that the open source
01:09:44.420 | models are competitive with GPT and Anthropic?
01:09:49.060 | I mean, it's nuts, right?
01:09:50.260 | I mean, that one Googler that was predicting that open source
01:09:53.420 | would catch up, at least he was right for completions.
01:09:57.700 | Yeah, I mean, for completions, open source
01:09:59.660 | is state of the art right now.
01:10:01.300 | You were on OpenAI, then you went to Claude,
01:10:03.100 | and now you've shifted again.
01:10:05.100 | Yeah, for completions.
01:10:06.100 | We still use Claude and GPT-4 for chat and also commands.
01:10:11.980 | But the ecosystem is going to continue to evolve.
01:10:17.100 | We obviously love the open source ecosystem.
01:10:19.620 | And a huge shout out to Hugging Face.
01:10:21.740 | And also Meta Research, we love the work
01:10:24.620 | that they're doing in kind of driving the ecosystem forward.
01:10:27.300 | Yeah, you didn't mention Code Llama.
01:10:29.220 | We're not using Code Llama currently.
01:10:31.300 | It's always kind of like a constant evaluation process.
01:10:33.980 | I don't want to come out and say, hey, this model's
01:10:36.140 | the best because we chose it.
01:10:37.340 | It's basically like we did a bunch of tests
01:10:39.580 | for the sorts of context that we're fetching now
01:10:42.460 | and given the way that our prompt's constructed now.
01:10:44.580 | And at the end of the day, it was like a judgment call.
01:10:47.000 | Like, StarCoder seemed to work the best,
01:10:48.700 | and that's why we adopted it.
01:10:50.380 | But it's sort of like a continual process
01:10:52.340 | of revisitation.
01:10:53.140 | Like, if someone comes up with a neat new context fetching
01:10:55.680 | mechanism-- and we have a couple coming online soon--
01:10:59.060 | then it's always like, OK, let's try that
01:11:00.820 | against the kind of array of models that are available
01:11:04.860 | and see how this moves the needle across that set.
01:11:09.980 | Yeah.
01:11:10.920 | What do you wish someone else built?
01:11:14.260 | What did we have to build that we wish we could have used?
01:11:17.900 | Is that the question?
01:11:18.940 | Interesting.
01:11:19.740 | This is a request for startups.
01:11:21.060 | [LAUGHTER]
01:11:24.060 | I mean, if someone could just provide
01:11:25.700 | like a very nice, clean data set of both naturally occurring
01:11:32.700 | and synthetic code data out there.
01:11:34.820 | Yeah, could someone please give us their data moat?
01:11:36.980 | [LAUGHTER]
01:11:37.860 | Well, not even the data moat.
01:11:39.100 | It's just like, I feel like most models today,
01:11:41.380 | they still use a combination of the stack and the pile
01:11:44.060 | as their training corpus.
01:11:47.780 | But you can only stretch that so far.
01:11:50.500 | At some point, we need more data.
01:11:52.340 | And I don't know.
01:11:55.020 | I think there's still more alpha in synthetic data.
01:11:59.020 | We have a couple efforts where we
01:12:01.020 | think fine-tuning some models on specific coding tasks
01:12:03.300 | will yield alpha, will yield more kind
01:12:05.020 | of like reliable code generation of the sort
01:12:08.500 | where it's reliable enough that we can fully automate it,
01:12:11.260 | at least like the one hop thing.
01:12:14.700 | And synthetic data is playing a part of that.
01:12:17.060 | But I mean, if there were like a synthetic data provider--
01:12:19.760 | I don't think you could construct a provider that has
01:12:21.980 | access to some proprietary code base.
01:12:25.200 | No company in the world would be able to sell that to you.
01:12:27.660 | But anyone who's just providing clean data
01:12:29.980 | sets off of the publicly available data,
01:12:33.700 | that would be nice.
01:12:35.940 | I don't know if there's a business around that.
01:12:37.860 | But that's something that we definitely love to use.
01:12:40.200 | Oh, for sure.
01:12:40.820 | My god.
01:12:41.320 | I mean, but that's also like the secret weapon, right?
01:12:44.580 | For any AI is the data that you've curated.
01:12:48.220 | So I doubt people are going to be, oh, we'll see.
01:12:52.740 | But we can maybe contribute if we want
01:12:54.900 | to have a benchmark of our own.
01:12:56.480 | Yeah.
01:12:57.100 | Yeah.
01:12:57.940 | I would say that would be the bull case for Repl.it,
01:13:01.500 | that you want to be a coding platform where you also offer
01:13:04.540 | bounties.
01:13:05.980 | And then you eventually bootstrap your own proprietary
01:13:08.940 | set of coding data.
01:13:10.300 | I don't think they'll ever share it.
01:13:11.800 | And the rumor is--
01:13:14.580 | this is from nobody at Repl.it that I'm hearing.
01:13:17.680 | But also, they're just not leveraging that actively.
01:13:21.660 | They're actually just betting on OpenAI to do a lot of that,
01:13:25.220 | which banking on OpenAI, I think,
01:13:27.860 | has been a winning strategy so far.
01:13:30.540 | Yeah, they're definitely great at executing and--
01:13:33.860 | Executing their CEO.
01:13:37.260 | And then bring him back in four days.
01:13:38.980 | Yeah.
01:13:39.480 | He won.
01:13:39.980 | That was a whole, like, I don't know.
01:13:42.620 | Did you guys-- yeah, was the company just
01:13:45.700 | obsessed by the drama?
01:13:47.500 | We were unable to work.
01:13:48.460 | I just walked in after it happened.
01:13:50.340 | And this whole room in the Newton was just like,
01:13:52.560 | everyone's just staring at their phones.
01:13:54.220 | I mean, it's a bit difficult to ignore.
01:13:58.060 | I mean, it would have real implications for us, too.
01:14:00.220 | Because we're using them.
01:14:01.300 | And so there's a very real question of,
01:14:03.060 | do we have to do a quick--
01:14:04.220 | Yeah, did you-- yeah, Microsoft.
01:14:05.600 | You just moved to Microsoft, right?
01:14:07.140 | Yeah, I mean, that would have been the break glass plan.
01:14:10.620 | If the worst case played out, then I
01:14:13.180 | think we'd have a lot of customers the day after being
01:14:16.140 | like, how can you guarantee the reliability of your services
01:14:19.500 | if the company itself isn't stable?
01:14:22.020 | But I'm really happy they got things sorted out
01:14:24.540 | and things are stable now.
01:14:26.380 | Because they build really cool stuff,
01:14:27.940 | and we love using their tech.
01:14:30.260 | Yeah, awesome.
01:14:31.340 | So we kind of went through everything, right?
01:14:33.980 | Sourcegraph, Cody, why agents don't work,
01:14:37.300 | why inline completion is better, all of these things.
01:14:42.180 | How does that bubble up to who manages the people, right?
01:14:46.820 | Because as engineering managers, and I never--
01:14:50.780 | I didn't write much code.
01:14:52.140 | I was mostly helping people write their own code.
01:14:55.020 | So even if you have the best inline completion,
01:14:57.140 | it doesn't help me do my job.
01:14:59.620 | What's kind of the future of Sourcegraph
01:15:02.580 | in the engineering org?
01:15:04.220 | Yeah, so that's a really interesting question.
01:15:07.580 | And I think it sort of gets at this issue, which
01:15:10.420 | is I think basically every AI dev tools creator or producer
01:15:19.140 | these days, I think us included, we're
01:15:22.700 | kind of focusing on the wrong problem in a way.
01:15:26.340 | Because the real problem of modern software development,
01:15:30.340 | I think, is not how quickly can you write more lines of code.
01:15:34.180 | It's really about managing the emergent complexity
01:15:37.980 | of code bases as they evolve and grow,
01:15:41.340 | and how to make efficient development tractable again.
01:15:47.060 | Because the bulk of your time becomes more about understanding
01:15:51.540 | how the system works and how the pieces fit together currently
01:15:56.140 | so that you can update it in a way that gets you
01:16:00.220 | your added functionality, doesn't break anything,
01:16:03.340 | and doesn't introduce a lot of additional complexity
01:16:05.580 | that will slow you down in the future.
01:16:08.100 | And if anything, the inner loop developer tools
01:16:11.140 | that are all about generating lines of code,
01:16:15.020 | yes, they help you get your feature done faster.
01:16:17.780 | They generate a lot of boilerplate for you.
01:16:19.780 | But they might make this problem of managing large complex code
01:16:24.180 | bases more challenging.
01:16:25.820 | Just because now, instead of having a pistol,
01:16:29.620 | you'll have a machine gun in terms
01:16:31.020 | of being able to write code.
01:16:33.100 | And there's going to be a bunch of natural language prompted
01:16:35.740 | code that is generated in the future that was produced
01:16:38.500 | by someone who doesn't even have an understanding of source
01:16:42.780 | code.
01:16:43.460 | And so how are you going to verify the quality of that
01:16:45.780 | and make sure it not only checks the low-level boxes,
01:16:49.820 | but also fits architecturally in a way that's
01:16:52.820 | sensible into your code base.
01:16:54.020 | And so I think as we look forward
01:16:56.180 | to the future of the next year, we
01:16:57.980 | have a lot of ideas around how to make code bases,
01:17:01.260 | as they evolve, more understandable and manageable
01:17:05.020 | to the people who really care about the code base as a whole--
01:17:08.300 | tech leads, engineering leaders, folks like that.
01:17:11.340 | And it is kind of like a return to our ultimate mission
01:17:16.820 | at Sourcegraph, which is to make code accessible to all.
01:17:19.340 | It's not really about enabling people to write code.
01:17:21.640 | And if anything, the original version of Sourcegraph
01:17:24.820 | was a rejection of, hey, let's stop
01:17:26.460 | trying to build the next best editor,
01:17:29.220 | because there's already enough people doing that.
01:17:32.100 | The real problem that we're facing--
01:17:34.700 | I mean, Quinn, myself, and you, Steve, at Google--
01:17:37.860 | was how do we make sense of the code that
01:17:39.920 | exists so we can understand enough to know
01:17:41.900 | what code needs to be written?
01:17:45.660 | Yeah.
01:17:46.300 | Well, I'll tell you what customers want--
01:17:48.980 | and what they're going to get.
01:17:50.060 | What they want is for Cody to have
01:17:51.820 | a monitor for developer productivity.
01:17:54.020 | And any developer who falls below a threshold,
01:17:56.180 | a button lights up where the admin can fire them.
01:17:58.860 | Or Cody will even press that button for you
01:18:01.300 | as the time passes.
01:18:02.940 | But I'm kind of only half tongue-in-cheek here.
01:18:06.260 | We've got some prospects who are kind of sniffing down
01:18:09.460 | that avenue.
01:18:10.180 | And we're like, no.
01:18:12.540 | But what they're going to get is much--
01:18:15.320 | like Beyang was saying-- much greater whole-codebase
01:18:17.700 | understanding, which is actually something that Cody is,
01:18:20.260 | I would argue, the best at today in the coding assistant space,
01:18:23.020 | right, because of our search engine and the techniques
01:18:25.480 | that we're using.
01:18:26.300 | And that whole-codebase understanding
01:18:27.880 | is so important for any sort of a manager who just
01:18:30.860 | wants to get a feel for the architecture
01:18:32.660 | or potential security vulnerabilities
01:18:34.340 | or whether people are writing code that's well-tested
01:18:37.140 | and et cetera, et cetera, right?
01:18:39.020 | And solving that problem is tricky, right?
01:18:42.580 | This is not the developer inner loop or outer loop.
01:18:44.900 | It's like the manager inner loop?
01:18:47.620 | No, outer loop.
01:18:48.540 | The manager inner loop is staring at your belly button,
01:18:51.580 | I guess.
01:18:52.820 | So in any case--
01:18:54.220 | Waiting for the next Slack message to arrive?
01:18:58.280 | What they really want is a batch mode for these assistants
01:19:00.700 | where you can actually take the coding assistant
01:19:02.780 | and shove its face into your code base.
01:19:04.980 | And 6 billion lines of code later,
01:19:08.180 | it's told you all the security vulnerabilities.
01:19:10.360 | That's what they really actually want.
01:19:11.980 | It's an insanely expensive proposition, right?
01:19:14.060 | You know, just the GPU cost, especially if you're
01:19:16.100 | doing it on a regular basis.
01:19:17.580 | So it's better to do it at the point the code enters
01:19:19.780 | the system.
01:19:20.380 | And so now we're starting to get into developer outer loop
01:19:22.720 | stuff.
01:19:23.220 | And I think that's where a lot of the-- to your question,
01:19:25.400 | right?
01:19:25.900 | A lot of the admins and managers and the decision makers,
01:19:28.820 | anybody who just kind of isn't coding but is involved,
01:19:32.540 | they're going to have, I think, well, a set of tools, right?
01:19:37.780 | And a set of--
01:19:38.780 | just like with code search today.
01:19:40.980 | Our code search actually serves that audience as well,
01:19:43.540 | the CIO types, right?
01:19:45.140 | Because they're just like, oh, hey,
01:19:46.640 | I want to see how we do SAML auth.
01:19:48.300 | And they use our search engine and they go find it.
01:19:50.380 | And AI is just going to make that so much easier for them.
01:19:53.780 | Yeah, I have a-- this is my perfect place
01:19:56.180 | to put my anecdote of how I used Cody yesterday.
01:19:59.380 | I was actually trying to build this Twitter scraper thing.
01:20:02.020 | And Twitter is notoriously very challenging to work with
01:20:06.200 | because they don't want to work with anyone.
01:20:09.000 | And there's a repo that I wanted to inspect.
01:20:11.960 | It was really big that had the Twitter scraper thing in it.
01:20:16.860 | And I pulled it into Copilot, didn't work.
01:20:20.420 | But then I noticed that on your landing page,
01:20:23.180 | you had a web version.
01:20:24.100 | Like, I typically think of Cody as a VS Code extension.
01:20:27.900 | But you have a web version where you just plug in any repo
01:20:30.580 | in there and just talk to it.
01:20:31.860 | And that's what I used to figure it out.
01:20:34.780 | Wow, Cody web is wild.
01:20:37.240 | Yeah.
01:20:37.840 | I mean, we've done a very poor job
01:20:39.680 | of making the existence of that feature--
01:20:42.880 | It's not easy to find.
01:20:43.840 | It's not easy to find.
01:20:44.800 | The search thing is like, oh, this is old Sourcegraph.
01:20:46.840 | You don't want to look at old Sourcegraph.
01:20:48.640 | You want to use new Sourcegraph, all the AI stuff.
01:20:50.520 | But old Sourcegraph has AI stuff.
01:20:52.120 | And it's Cody web.
01:20:53.920 | Yeah, there's a little Ask Cody button
01:20:55.880 | that's hidden in the upper right hand corner.
01:20:58.120 | We should make that more visible.
01:20:59.860 | It's definitely one of those aha moments
01:21:01.760 | when you can ask a question of--
01:21:03.120 | Of any repo, right?
01:21:04.140 | Because you already indexed it.
01:21:05.660 | Well, you didn't embed it, but you indexed it.
01:21:08.100 | And there's actually some use cases
01:21:09.720 | that have emerged among power users where they kind of do--
01:21:13.060 | like, you're familiar with v0.dev.
01:21:15.780 | You can kind of replicate that, but for arbitrary frameworks
01:21:18.260 | and libraries with Cody web.
01:21:20.340 | Because there's also an equally hidden toggle, which you may
01:21:22.900 | not have discovered yet, where you can actually
01:21:24.860 | tag in multiple repositories as context.
01:21:27.180 | And so you can do things like--
01:21:28.580 | we have a demo path where it's like, OK,
01:21:30.540 | let's say you want to build a stock ticker that's
01:21:33.280 | React-based, but uses this one tick data fetching API.
01:21:37.400 | It's like, you tag both repositories in.
01:21:39.320 | You ask it-- it's like two sentences.
01:21:41.200 | Like, build a stock tick app.
01:21:42.480 | Track the tick data of Bank of America, Wells Fargo
01:21:45.660 | over the past week.
01:21:47.040 | And it generates a code.
01:21:48.040 | You can paste that in.
01:21:49.440 | And it works magically.
01:21:53.280 | We'll probably invest in that more,
01:21:55.160 | just because the wow factor of that is just pretty incredible.
01:21:58.360 | It's like, what if you can speak apps into existence
01:22:00.800 | that use the frameworks and packages that you want to use?
01:22:06.220 | It's not even fine-tuning.
01:22:07.380 | It's just taking advantage of your RAG pipeline.
01:22:09.380 | Yeah, it's just RAG.
01:22:10.820 | RAG is all you need for many things.
01:22:14.420 | It's not just RAG.
01:22:15.540 | It's good RAG, right?
01:22:18.580 | RAG's good, not a fallback.
01:22:20.700 | Yeah, but I guess getting back to the original question,
01:22:23.300 | I think there's a couple of things
01:22:25.620 | I think would be interesting for engineering leaders.
01:22:27.780 | One is the use case that you called out,
01:22:29.440 | is all the stuff that you currently don't do
01:22:32.100 | that you really ought to be doing with respect to, like,
01:22:34.520 | ensuring code quality, or updating dependencies,
01:22:37.560 | or keeping things up to date, the things
01:22:42.680 | that humans find toilsome and tedious and just don't want
01:22:45.800 | to do, but would really help uplevel the quality, security,
01:22:49.880 | and robustness of your code base.
01:22:51.480 | Now we potentially have a way to do that with machines.
01:22:56.840 | I think there's also this other thing,
01:23:00.440 | and this gets back to the point of,
01:23:02.720 | how do you measure developer productivity?
01:23:04.520 | It's like the perennial age-old question.
01:23:06.960 | Every CFO in the world would love
01:23:08.520 | to do it in the same way that you can measure marketing,
01:23:11.920 | or sales, or other parts of the organization.
01:23:14.560 | And I think, what is the actual way you would do this
01:23:18.000 | that is good, if you had all the time in the world?
01:23:20.960 | I think, as an engineering manager or an engineering
01:23:23.320 | leader, what you would do is you would go read
01:23:25.660 | through the Git log, maybe like line by line.
01:23:28.160 | Be like, OK, you, Sean, these are the features
01:23:31.560 | that you built over the past six months or a year.
01:23:36.680 | These are the things that delivered that you helped drive.
01:23:39.120 | Here's the stuff that you did to help your teammates.
01:23:43.280 | Here are the reviews that you did
01:23:44.760 | that helped ensure that we have maintained
01:23:47.000 | a coherent and high-quality code base.
01:23:52.760 | Now connect that to the things that matter to the business.
01:23:55.220 | Like, what were we trying to drive this?
01:23:57.040 | Was it engagement?
01:23:58.160 | Was it revenue?
01:23:59.320 | Was it adoption of some new product line?
01:24:02.440 | And really weave that story together.
01:24:04.280 | The work that you did had this impact
01:24:05.960 | on the metrics that moved the needle for the business
01:24:08.200 | and ultimately show up in revenue, or stock price,
01:24:12.480 | or whatever it is that's at the very top of any for-profit
01:24:16.760 | organization.
01:24:18.080 | And you could, in theory, do all that today
01:24:22.440 | if you had all the time in the world.
01:24:24.360 | But as an engineering leader--
01:24:25.620 | You're busy building.
01:24:26.540 | Yeah, you're too busy building.
01:24:27.540 | You're too busy with a bunch of other stuff.
01:24:29.380 | Plus, it's also tedious, like reading through Git log
01:24:32.660 | and trying to understand what a change does and summarizing
01:24:35.280 | that.
01:24:35.780 | Yeah.
01:24:36.620 | It's just-- it's not the most exciting work in the world.
01:24:40.320 | But with the benefit of AI, I think
01:24:44.060 | you could conceive of a system that actually
01:24:46.260 | does a lot of the tedium and helps you actually
01:24:48.740 | tell that story.
01:24:50.140 | And I think that is maybe the ultimate answer to how
01:24:53.260 | we get at developer productivity in a way
01:24:55.580 | that a CFO would be like, OK, I can buy that.
01:24:59.380 | The work that you did impacted these core metrics
01:25:03.100 | because these features were tied to those.
01:25:05.620 | And therefore, we can afford to invest more
01:25:09.060 | in this part of the organization.
01:25:10.420 | And that's what we really want to drive towards.
01:25:12.020 | I think that's what we've been trying to build all along,
01:25:14.500 | in a way, with Sourcegraph.
01:25:15.700 | It's this code-based level of understanding.
01:25:18.420 | And the availability of LLMs and AI
01:25:21.820 | now just puts that much sooner in reach, I think.
01:25:26.020 | Yeah.
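A minimal sketch of the "read the git log for me" idea described above: collect one engineer's commits over a period and ask a model to tie them to the outcomes the team cares about. This is not a Sourcegraph feature; `summarize` is a hypothetical LLM wrapper, and the git flags shown are standard.

```python
# Sketch of git-log-to-impact-report summarization (illustrative only).
# `summarize` is a hypothetical LLM wrapper, not a real Sourcegraph API.
import subprocess


def summarize(prompt: str) -> str:
    """Hypothetical LLM call that returns a narrative summary."""
    raise NotImplementedError


def commits_for(author: str, since: str = "6 months ago") -> str:
    """Pull commit subjects and stats for one author from git."""
    return subprocess.run(
        ["git", "log", f"--author={author}", f"--since={since}",
         "--pretty=format:%h %ad %s", "--date=short", "--shortstat"],
        capture_output=True, text=True, check=True,
    ).stdout


def impact_report(author: str, goals: str) -> str:
    """Ask the model to connect the raw log to the goals the team cares about."""
    log = commits_for(author)
    return summarize(
        f"Team goals this period: {goals}\n\n"
        f"Git log for {author}:\n{log}\n\n"
        "Summarize what this engineer delivered, reviewed, and enabled, "
        "and connect it to the goals above."
    )
```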
01:25:26.740 | But I mean, we have to focus, also, small company.
01:25:30.460 | And so our short-term focus is lovability, right?
01:25:34.420 | Yeah.
01:25:34.920 | We absolutely have to make Cody lovable--
01:25:37.300 | something everybody wants, right?
01:25:39.420 | But absolutely, Sourcegraph is all
01:25:41.460 | about enabling all of the non-engineering roles,
01:25:46.340 | decision makers, and so on.
01:25:48.620 | And as Beyang says, I mean, I think
01:25:50.660 | there's just a lot of opportunity
01:25:52.180 | there once we've built a lovable Cody.
01:25:54.500 | Awesome.
01:25:56.260 | We want to jump into lightning round?
01:25:58.340 | Lightning round.
01:25:59.820 | Which we always forget to send the questions ahead of time.
01:26:04.300 | So we usually have three, one around acceleration,
01:26:07.180 | exploration, and then a final takeaway.
01:26:09.340 | So the acceleration one is, what's
01:26:11.780 | something that already happened in AI that is possible today
01:26:14.940 | that you thought would take much longer?
01:26:16.740 | I mean, just LLMs and how good the vision models are now.
01:26:22.300 | Like, I got my start--
01:26:23.340 | Oh, vision.
01:26:24.240 | Yeah.
01:26:24.740 | Well, I mean, back in the day, I got my start in machine learning
01:26:30.100 | and computer vision, circa 2009, 2010.
01:26:35.020 | And in those days, everything was statistical-based.
01:26:37.780 | Neural nets had not yet made their comeback.
01:26:40.940 | And so nothing really worked.
01:26:43.160 | And so I was very bearish after that experience
01:26:45.220 | on the future of computer vision.
01:26:46.660 | But man, the progress that's been
01:26:48.220 | made just in the past three or four years
01:26:51.660 | has just been absolutely astounding.
01:26:54.800 | So yeah, it came up faster than I expected it to.
01:26:59.580 | Yeah, multimodal in general, I think
01:27:02.700 | there's a lot more capability there
01:27:04.340 | that we're not tapping into, potentially even
01:27:06.740 | in the coding assistant space.
01:27:08.500 | And honestly, I think that the form factor
01:27:11.500 | that coding assistants have today
01:27:12.940 | is probably not the steady state that we'll see long-term.
01:27:17.060 | I mean, you'll always have completions,
01:27:18.900 | and you'll always have chat, and commands, and so on.
01:27:21.420 | But I think we're going to discover a lot more.
01:27:23.380 | And I think multimodal potentially opens up
01:27:25.820 | some kind of new ways to get your stuff done.
01:27:30.540 | So yeah, I think the capabilities are there today.
01:27:32.620 | And it's just shocking.
01:27:33.740 | I mean, I still am astonished.
01:27:35.720 | When I sit down, and I have a conversation with the LLM
01:27:38.540 | with the context, and it's like I'm
01:27:41.340 | talking to a senior engineer, or an architect, or somebody.
01:27:45.100 | And I can bounce ideas off it.
01:27:46.740 | And I think that people have very different working models
01:27:49.220 | with these assistants today.
01:27:50.460 | Some people are just completion, completion, completion.
01:27:52.740 | That's it.
01:27:53.500 | And if they want some code generated,
01:27:55.000 | they write a comment telling it what to do.
01:27:58.340 | But I truly think that there are other modalities that we're
01:28:01.040 | going to stumble across that are just kind of latently,
01:28:06.380 | inherently built into the LLMs today.
01:28:08.420 | We just haven't found them yet.
01:28:09.780 | They're more of a discovery than invention.
01:28:12.460 | Like other usage patterns?
01:28:14.180 | Absolutely.
01:28:14.960 | I mean, the one we talked about earlier, nonstop coding
01:28:17.260 | is one, where you could just kick off
01:28:19.140 | a whole bunch of requests to refactor, and so on.
01:28:22.580 | But there could be any number of others.
01:28:24.540 | We talk about agents, that's kind of out there.
01:28:26.540 | But I think there are kind of more inner loop type ones
01:28:29.780 | to be found.
01:28:31.220 | And we haven't looked at all that multimodal yet.
01:28:35.300 | Yeah.
01:28:36.820 | For sure, there's two that come to mind,
01:28:39.520 | just off the top of my head.
01:28:41.260 | One, which is effectively architecture diagrams
01:28:44.140 | and entity relationship diagrams.
01:28:47.180 | There's probably more alpha in synthesizing them
01:28:49.700 | for management to see, which is, you don't need AI for that.
01:28:55.260 | You can just use your reference graph.
01:28:57.420 | But then also doing it the other way around,
01:28:59.260 | when someone draws stuff on a whiteboard
01:29:00.940 | and actually generating code.
01:29:02.220 | Well, you can generate the diagram,
01:29:05.020 | and then explanations, as well.
01:29:07.340 | Yeah.
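Steve's point that you don't need AI to synthesize an architecture diagram from a reference graph can be shown in a few lines: given an edge list of who-calls-whom, emit a Mermaid diagram. A generic sketch with a hand-written edge list, not Sourcegraph's actual graph API.

```python
# Sketch: turn a reference/dependency edge list into a Mermaid diagram.
# The edge list here is hand-written; in practice it would come from your
# code-intelligence graph (e.g. whatever exposes "X references Y").
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render caller -> callee edges as a Mermaid flowchart."""
    lines = ["flowchart TD"]
    for caller, callee in edges:
        lines.append(f"    {caller} --> {callee}")
    return "\n".join(lines)


if __name__ == "__main__":
    edges = [
        ("web_frontend", "search_api"),
        ("search_api", "trigram_index"),
        ("search_api", "reference_graph"),
    ]
    print(to_mermaid(edges))
```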
01:29:08.260 | And then the other one is, there was a demo
01:29:10.140 | that went pretty viral two, three weeks ago,
01:29:13.260 | about how someone just had an always-on script,
01:29:16.540 | just screenshotting and sending it to GPT-4 Vision
01:29:20.140 | on some kind of time interval.
01:29:21.620 | And it would just autonomously suggest stuff.
01:29:23.900 | Yeah.
01:29:24.900 | So no trigger, just watching your screen,
01:29:27.300 | and just being a real co-pilot, rather than having
01:29:30.900 | you initiate with the chat.
01:29:32.420 | Yeah.
01:29:33.420 | So there's some--
01:29:34.620 | It's like the return of Clippy, right?
01:29:36.380 | Return of Clippy.
01:29:37.100 | But actually good.
01:29:39.660 | So the reason I know this is we actually did a hackathon,
01:29:41.980 | where we wrote that project, except it roasted you while you did
01:29:46.820 | it, so it's like, hey, you're on Twitter right now.
01:29:49.940 | You should be coding.
01:29:52.820 | And that can be a fun co-pilot thing, as well.
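The always-on co-pilot the hosts describe boils down to a timer loop: grab the screen, hand it to a vision-capable model, and surface whatever suggestion (or roast) comes back. A minimal sketch, assuming Pillow for screenshots and a hypothetical `suggest_from_image` helper standing in for GPT-4 Vision or any other vision model.

```python
# Sketch of the "always-on" demo: screenshot on an interval and ask a vision
# model for an unprompted suggestion. `suggest_from_image` is a hypothetical
# stand-in for a real vision-model API call.
import io
import time

from PIL import ImageGrab  # Pillow; screen capture works on macOS and Windows


def suggest_from_image(png_bytes: bytes) -> str:
    """Hypothetical call to a vision-capable LLM; returns a suggestion."""
    raise NotImplementedError


def copilot_loop(interval_seconds: int = 60) -> None:
    """Every interval, screenshot the display and print the model's nudge."""
    while True:
        shot = ImageGrab.grab()
        buf = io.BytesIO()
        shot.save(buf, format="PNG")
        print(suggest_from_image(buf.getvalue()))
        time.sleep(interval_seconds)
```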
01:29:55.340 | Yeah.
01:29:56.540 | OK, so I'll jump on.
01:29:57.860 | Exploration, what do you think is the most interesting
01:30:00.140 | unsolved question in AI?
01:30:02.900 | It used to be scaling, right, with CNNs and RNNs,
01:30:06.260 | and Transformers solved that.
01:30:07.540 | So what's the next big hurdle that's
01:30:09.020 | keeping GPT-10 from emerging?
01:30:12.180 | I mean, do you mean that like--
01:30:13.460 | Ooh, this is like a safetyist argument.
01:30:15.120 | I feel like-- do you mean like the pure model, like AI layer?
01:30:18.380 | No, it doesn't have to be--
01:30:19.540 | I mean, for me personally, it's like,
01:30:21.120 | how do you get reliable first try working code generation?
01:30:27.260 | Even like a single hop, like write
01:30:29.140 | a function that does this.
01:30:30.380 | Because I think if you want to get to the point
01:30:33.340 | where you can actually be truly agentic or multi-step
01:30:37.140 | automated, a necessary part of that
01:30:40.540 | is the single step has to be robust and reliable.
01:30:44.820 | And so I think that's the problem that we're
01:30:47.860 | focused on solving right now.
01:30:49.400 | Because once you have that, it's a building block
01:30:51.400 | that you can then compose into longer chains.
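One way to read that point: treat "write one function that passes its tests" as the unit of reliability, retry until it holds, and only then chain units together. A sketch under that reading, with a hypothetical `llm_write_function` helper; the retry-against-tests loop is the illustrative part, not any specific product's behavior.

```python
# Sketch: make the single step (one function, one spec) robust by checking it
# against tests and retrying, so it can later be composed into longer chains.
# `llm_write_function` is a hypothetical code-generating LLM call.
import subprocess
import sys
import tempfile
from pathlib import Path


def llm_write_function(spec: str, feedback: str = "") -> str:
    """Hypothetical LLM call returning Python source that satisfies the spec."""
    raise NotImplementedError


def reliable_step(spec: str, tests: str, max_tries: int = 3) -> str:
    """Generate code for one spec and keep retrying until its tests pass."""
    feedback = ""
    for _ in range(max_tries):
        code = llm_write_function(spec, feedback)
        with tempfile.TemporaryDirectory() as tmp:
            Path(tmp, "candidate.py").write_text(code)
            Path(tmp, "test_candidate.py").write_text(tests)
            result = subprocess.run(
                [sys.executable, "-m", "pytest", tmp, "-q"],
                capture_output=True, text=True,
            )
        if result.returncode == 0:
            return code  # this single step is now a trustworthy building block
        feedback = result.stdout + result.stderr  # feed failures back to the model
    raise RuntimeError(f"Could not satisfy spec after {max_tries} tries: {spec}")
```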
01:30:55.660 | And just to wrap things up, what's
01:30:57.100 | one message, takeaway that you want people
01:31:00.740 | to remember and think about?
01:31:02.780 | I mean, I think for me it's just like the best
01:31:09.540 | DevTools in the future are going to have
01:31:11.700 | to leverage many different forms of intelligence.
01:31:14.780 | Calling back to that, like, Normsky architecture,
01:31:18.300 | trying to make it catch on.
01:31:19.740 | You should call it something cool like S* or R*.
01:31:22.940 | Yes, yes, yes.
01:31:24.500 | Just one letter and then just let people speculate.
01:31:26.860 | Yeah, yeah, what could he mean?
01:31:30.460 | But I don't know, like in terms of trying
01:31:32.980 | to describe what we're building, we
01:31:34.260 | try to be a little bit more down to earth
01:31:35.980 | and straightforward.
01:31:37.660 | And I think Normsky encapsulates the two big technology areas
01:31:44.620 | that we're investing in that we think
01:31:46.140 | will be very important for producing really good DevTools.
01:31:51.460 | And I think it's a big differentiator that we
01:31:53.960 | view that Cody has right now.
01:31:57.060 | Yeah, and mine would be I know for a fact
01:32:00.900 | that not all developers today are using coding assistants.
01:32:04.700 | And that's probably because they tried it
01:32:08.380 | and it didn't immediately write a bunch of beautiful code
01:32:11.460 | for them.
01:32:12.060 | And they were like, ah, too much effort, and they left.
01:32:15.700 | Well, my big takeaway from this talk
01:32:17.420 | would be if you're one of those engineers,
01:32:19.860 | you better start planning another career.
01:32:24.400 | Because this stuff is the future.
01:32:26.240 | And honestly, it takes some effort
01:32:29.640 | to actually make coding assistants work today.
01:32:31.920 | You have to-- just like talking to GPT,
01:32:33.880 | they'll give you the runaround, just like doing a Google search
01:32:35.720 | sometimes.
01:32:36.720 | But if you're not putting that effort in and learning
01:32:39.560 | the sort of footprint and the characteristics of how
01:32:42.600 | LLMs behave under different query conditions and so on,
01:32:46.040 | if you're not getting a feel for the coding assistant,
01:32:48.560 | then you're letting this whole train just pull out
01:32:50.700 | of the station and leave you behind.
01:32:52.700 | Yeah.
01:32:53.200 | Cool.
01:32:53.700 | Absolutely.
01:32:54.560 | Yeah, thank you guys so much for coming on and being
01:32:57.120 | the first guest in the new studio.
01:32:59.240 | Our pleasure.
01:32:59.960 | Thanks for having us.
01:33:00.880 | [MUSIC PLAYING]