The "Normsky" architecture for AI coding agents — with Beyang Liu + Steve Yegge of Sourcegraph
Chapters
0:00 Intros & Backgrounds
6:20 How Steve's work on Grok inspired Sourcegraph for Beyang
8:53 From code search to AI coding assistant
13:18 Comparison of coding assistants and the capabilities of Cody
16:49 The importance of context (RAG) in AI coding tools
20:33 The debate between Chomsky and Norvig approaches in AI
25:02 Code completion vs Agents as the UX
30:06 Normsky: the Norvig + Chomsky models collision
36:00 How to build the right context for coding
42:00 The death of the DSL?
46:15 LSP, SCIP, Kythe, BFG, and all that fun stuff
62:00 The Sourcegraph internal stack
68:46 Building on open source models
74:35 Sourcegraph for engineering managers?
86:00 Lightning Round
This is Alessio, partner and CTO in residence at Decibel Partners. 00:00:07.600 |
And I'm joined by my co-host, Swyx, founder of Smol.ai. 00:00:10.760 |
Hey, and today we're christening our new podcast studio 00:00:16.200 |
And we have Beyang and Steve from Sourcegraph. 00:00:24.480 |
We also are just celebrating the one year anniversary of ChatGPT 00:00:30.360 |
But also we'll be talking about the GA of Cody later on today. 00:00:34.480 |
But we'll just do a quick intros of both of you. 00:00:37.320 |
Obviously, people can research you and check the show notes 00:00:40.880 |
But Beyang, you worked in computer vision at Stanford, 00:00:55.120 |
Well, the end user thing was Google Code Search. 00:00:58.100 |
That's what everyone called it, or just like CS. 00:01:00.680 |
But the brains of it were really the Trigram index and then 00:01:08.720 |
Today it's called Kythe, the open source Google one. 00:01:15.640 |
you've interviewed a bunch of other code search developers, 00:01:18.760 |
including the current developer of Kythe, right? 00:01:24.200 |
although we would love to if they're up for it. 00:01:27.480 |
We had Kelly Norton, who built a similar system at Etsy. 00:01:43.120 |
--I think heavily inspired by the Trigram index that 00:02:11.040 |
I guess the back story was, I used Google Code Search 00:02:19.360 |
and worked elsewhere, it was the single dev tool 00:02:23.840 |
I felt like my job was just a lot more tedious and much more 00:02:29.840 |
And so when Quinn and I started working together at Palantir, 00:02:32.420 |
he had also used various code search engines in open source 00:02:38.440 |
And it was just a pain point that we both felt, 00:02:49.120 |
large financial institutions, folks like that. 00:02:57.840 |
made our pain points feel small by comparison. 00:03:11.960 |
And revealed-- and you've told many, many stories. 00:03:15.160 |
I want every single listener of "Latent Space" 00:03:17.040 |
to check out Steve's YouTube, because he effectively 00:03:25.240 |
You just hit record and just went on a few rants. 00:03:34.640 |
had some interesting thoughts on just the overall Google 00:03:38.320 |
You joined Grab as head of Eng for a couple of years. 00:03:40.720 |
I'm from Singapore, so I have actually personally 00:04:04.560 |
about as a good startup that people admire or look up 00:04:08.880 |
to, on the league that you, with all your legendary experience, 00:04:18.440 |
They actually didn't even know that they were as good 00:04:22.600 |
They started hiring a bunch of people from Silicon Valley 00:04:28.880 |
could have been a little better, operational excellence 00:04:32.680 |
And the only thing about Grab is that they get criticized a lot 00:04:41.240 |
By Singaporeans who don't want to work there. 00:04:44.400 |
OK, well, I guess I'm biased because I'm here, 00:04:54.520 |
because they were more Westernized than the Sanders 00:04:57.880 |
I mean, they had their success because they are laser-focused. 00:05:02.960 |
I mean, they're executing really, really, really well. 00:05:23.200 |
because they're just out there with their sleeves rolled up, 00:05:35.400 |
Yeah, in the way that super apps don't exist in the West. 00:05:38.000 |
It's one of the greatest mysteries, enduring mysteries 00:05:48.160 |
And it was primarily because of bandwidth reasons 00:06:04.760 |
Any-- I think-- and that's also where you discover some need 00:06:11.360 |
Better programming languages, better databases, 00:06:15.000 |
I mean, I started in '95, where there was kind of nothing. 00:06:21.400 |
you first went to Grab, because you wrote that blog post, 00:06:41.560 |
Yeah, so I guess the back story, from my point of view, 00:06:44.880 |
is I had used Code Search and Grok while at Google. 00:06:49.360 |
But I didn't actually know that it was connected to you, Steve. 00:06:52.720 |
Like, I knew you from your blog posts, which were always 00:06:55.160 |
excellent, kind of like inside, very thoughtful takes on-- 00:06:59.640 |
from an engineer's perspective, on some of the challenges 00:07:08.000 |
within the context of code intelligence and code 00:07:10.120 |
understanding, was I watched a talk that you gave, 00:07:13.720 |
I think, at Stanford about Grok when you were first 00:07:20.640 |
who writes the extremely thoughtful, ranty blog posts, 00:07:27.520 |
And so that's how I knew you were kind of involved in that. 00:07:57.400 |
I had this dagger of jealousy stabbed through me, 00:08:00.400 |
piercingly, which I remember, because I am not 00:08:11.580 |
I got sucked back into the ads vortex and whatever. 00:08:14.440 |
So thank god, Sourcegraph actually kind of rescued me. 00:08:27.560 |
Is there anything else that people should know about you 00:08:51.840 |
this has been a company 10 years in the making. 00:08:54.480 |
And as Sean said, now you're at the right place. 00:08:59.520 |
Now exactly, you spent 10 years collecting all this code, 00:09:02.480 |
indexing, making it easy to surface it, and how-- 00:09:05.640 |
And also learning how to work with enterprises 00:09:07.960 |
and having them trust you with their code bases. 00:09:10.360 |
Because initially, you were only doing on-prem, right, like VPC, 00:09:15.880 |
So in the very early days, we were cloud only. 00:09:22.960 |
And that was, I think, related to the nature of the problem 00:09:27.600 |
just a critical, unignorable pain point once you're 00:09:32.920 |
And now Cody is going to be GA by the time this releases. 00:09:38.360 |
Congrats to your future self for launching this in two weeks. 00:09:42.440 |
Can you give a quick overview of just what Cody is? 00:09:45.280 |
I think everybody understands that it's an AI coding agent. 00:09:49.440 |
But a lot of companies say they have an AI coding agent. 00:09:57.680 |
from the several dozen other AI coding agents 00:10:04.320 |
when we thought about building a coding assistant that 00:10:08.360 |
would do things like code generation and question 00:10:11.800 |
think we came at it from the perspective of we've 00:10:14.600 |
spent the past decade building the world's best code 00:10:17.880 |
understanding engine for human developers, right? 00:10:26.280 |
if you want to go and dive into a large, complex code base. 00:10:30.360 |
And so our intuition was that a lot of the context 00:10:35.640 |
would also be useful context for AI developers to consume. 00:10:43.560 |
Cody is very similar to a lot of other assistants. 00:10:49.640 |
It does specific commands that automate tasks 00:10:55.640 |
like generating unit tests or adding detailed documentation. 00:11:01.080 |
But we think the core differentiator is really 00:11:08.280 |
It's a bit like saying, what's the difference between Google 00:11:12.520 |
There's not a quick checkbox list of features 00:11:15.880 |
But it really just comes down to all the attention and detail 00:11:19.000 |
that we've paid to making that context work well and be 00:11:24.760 |
For human devs, we're now kind of plugging into the AI coding 00:11:30.020 |
I mean, just to add, just to add my own perspective 00:11:40.920 |
that the LLM has available that knows about your code. 00:11:45.000 |
RAG provides basically a bridge to a lookup system 00:11:49.520 |
Whereas fine-tuning would be more like on-the-job training 00:11:54.000 |
If the LLM is a person, and you send them to a new job, 00:12:05.480 |
because the expert knows your particular code base, 00:12:12.620 |
And there's a chicken-and-egg problem, because we're like, 00:12:15.160 |
well, I'm going to ask the LLM about my code. 00:12:34.640 |
and using code search, and then starting to feel like without 00:12:40.760 |
Once you start using these-- do you guys use coding assistants? 00:12:44.400 |
I mean, we're getting to the point very quickly, right? 00:12:50.640 |
almost like you're programming without the internet, right? 00:12:53.480 |
It's like you're programming back in the '90s 00:12:59.480 |
who have no idea about coding assistants, what they are. 00:13:08.920 |
We had Codeium and Codium, very similar names. 00:13:13.180 |
Griblet, Phind, and then, of course, there's Copilot. 00:13:26.760 |
And I think it really shows the context improvement. 00:13:43.880 |
Versus Cody was like, oh, these are the major functions 00:13:51.280 |
And then the other one was, how do I start this up? 00:13:56.440 |
even though there was no start command in the package.json. 00:14:01.680 |
Most projects use npm start, so maybe this does too. 00:14:05.720 |
How do you think about open source models and private-- 00:14:12.520 |
And I think you guys use StarCoder, if I remember right. 00:14:21.080 |
I don't think they've officially announced what model they use. 00:14:24.000 |
- And I think they use a range of models based on what you're 00:14:28.960 |
No one uses the same model for inline completion 00:14:31.260 |
versus chat, because the latency requirements for-- 00:14:44.960 |
to get it to output just the code and not, like, hey, 00:14:48.480 |
here's the code you asked for, like that sort of text. 00:14:54.320 |
We've kind of designed Cody to be especially model-- 00:15:07.680 |
want to be able to integrate the best in class models, 00:15:11.040 |
whether they're proprietary or open source, into Kodi, 00:15:15.200 |
because the pace of innovation in the space is just so quick. 00:15:21.760 |
Like today, Cody uses StarCoder for inline completions. 00:15:25.640 |
And with the benefit of the context that we provide, 00:15:29.440 |
we actually show comparable completion acceptance rate 00:15:35.840 |
that folks use to evaluate inline completion quality. 00:15:39.840 |
what's the chance that you actually accept the completion 00:15:45.080 |
which is at the head of the industry right now. 00:15:47.920 |
And we've been able to do that with the Starcoder model, which 00:15:50.420 |
is open source, and the benefit of the context fetching stuff 00:15:55.020 |
And of course, a lot of like prompt engineering 00:16:03.640 |
"Cheating is All You Need" about what you're building. 00:16:07.460 |
that everybody's fighting on the same axis, which 00:16:10.000 |
is better UI and the IDE, maybe like a better chat response. 00:16:14.400 |
But data moats are kind of the most important thing. 00:16:22.280 |
How do you kind of think about what other companies are 00:16:31.840 |
I feel like you see so many people, oh, we just 00:16:34.560 |
got a new model, and it's like a better HumanEval score. 00:16:36.920 |
And it's like, wow, but maybe like that's not 00:16:42.960 |
the importance of like the actual RAG in code? 00:16:47.040 |
Yeah, I mean, I think that people weren't doing it much. 00:16:56.200 |
so within the last year, I've heard a lot of rumblings 00:16:59.840 |
Because they're undergoing a huge transformation 00:17:02.240 |
to try to, of course, get into the new world. 00:17:07.160 |
to go and train their own models or fine-tune their own models, 00:17:24.120 |
Google loves to compete with themselves, right? 00:17:27.440 |
And they had a paper on Duet, like, from a year ago. 00:17:29.880 |
And they were doing exactly what Copilot was doing, 00:17:32.040 |
which was just pulling in the local context, right? 00:17:38.440 |
because we were talking about the splitting of the models. 00:17:40.840 |
In the early days, it was the LLM did everything. 00:17:44.160 |
And then we realized that for certain use cases, 00:17:47.000 |
like completions, that a different, smaller, faster 00:17:53.040 |
actually, we expected to continue and proliferate, 00:17:56.440 |
Because fundamentally, we're a recommender engine right now. 00:18:02.080 |
We're saying, may I interest you in this code 00:18:04.200 |
right here so that you can answer my question? 00:18:09.180 |
I mean, who are the best recommenders, right? 00:18:11.020 |
There's YouTube, and Spotify, and Amazon, or whatever, right? 00:18:14.320 |
Yeah, and they all have many, many, many, many, many models, 00:18:20.640 |
and that's where we're headed in code, too, absolutely. 00:18:24.040 |
Yeah, we just did an episode we released on Wednesday, 00:18:26.880 |
in which we said RAG is like RecSys for LLMs. 00:18:30.720 |
You're basically just suggesting good content. 00:18:40.240 |
is you embed everything through a vector database. 00:18:42.720 |
You embed your query, and then you find the nearest neighbors, 00:18:49.720 |
there's sample diversity and that kind of stuff. 00:18:52.360 |
And then you're slowly gradient-descending yourself 00:18:58.040 |
which has been traditional ML for a long time, 00:19:02.840 |
Yeah, I almost think of it as a generalized search problem, 00:19:11.080 |
and get all the potential things that could be relevant, 00:19:13.840 |
and then there's typically a layer 2 re-ranking mechanism 00:19:20.240 |
to get the relevant stuff to the top of the results list. 00:19:24.400 |
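To make that two-layer setup concrete, here is a minimal sketch of a layer-1 candidate fetch over embeddings followed by a layer-2 re-rank. The embed() and rerank_score() functions are crude stand-ins, not Sourcegraph's actual implementation; only the shape of the pipeline follows the description above.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in embedding: a normalized character-frequency vector.
    # A real system would call an embedding model here.
    vec = [0.0] * 128
    for ch in text:
        vec[ord(ch) % 128] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def retrieve(query: str, corpus: list[str], k1: int = 50, k2: int = 5) -> list[str]:
    qv = embed(query)
    # Layer 1: high-recall nearest-neighbor fetch over the whole corpus.
    candidates = sorted(corpus, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k1]
    # Layer 2: re-rank candidates with a costlier relevance signal.
    # Here, exact keyword overlap; a real re-ranker would be a
    # cross-encoder or a heuristic ensemble.
    def rerank_score(doc: str) -> float:
        overlap = len(set(query.lower().split()) & set(doc.lower().split()))
        return overlap + cosine(qv, embed(doc))
    return sorted(candidates, key=rerank_score, reverse=True)[:k2]
```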
Have you discovered that ranking matters a lot? 00:19:26.400 |
So the context is that I think a lot of research 00:19:37.600 |
and then apparently, Claude uses the bottom better. 00:19:44.360 |
The skill with which models are able to take advantage 00:19:47.040 |
of context is always going to be dependent on how 00:19:49.720 |
that factors into the impact on the training loss. 00:19:53.400 |
So if you want long context window models to work well, 00:19:56.240 |
then you have to have a ton of data where it's 00:20:01.200 |
and I'm going to ask a question about something that's 00:20:04.080 |
embedded deeply into it, and give me the right answer. 00:20:09.560 |
then of course you're going to have variability in terms 00:20:15.320 |
the thing that you're talking about right now, 00:20:18.280 |
to be something that we talked about recently. 00:20:20.840 |
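The evaluation being described here is often called a needle-in-a-haystack test: plant a fact at varying depths of a long context and check whether the model can retrieve it. A hedged sketch, where ask_llm() is a hypothetical stand-in for whatever completion API you use:

```python
def build_haystack(needle: str, filler: str, depth: float, n_chunks: int = 200) -> str:
    # Bury the needle at a chosen relative depth inside filler text.
    chunks = [filler] * n_chunks
    chunks.insert(int(depth * n_chunks), needle)
    return "\n".join(chunks)

def recall_at_depths(ask_llm, needle: str, question: str, answer: str) -> dict[float, bool]:
    filler = "The quick brown fox jumps over the lazy dog."
    results = {}
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        context = build_haystack(needle, filler, depth)
        reply = ask_llm(f"{context}\n\nQuestion: {question}")
        results[depth] = answer.lower() in reply.lower()
    return results
```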
Did you really just say gradient descending yourself? 00:20:24.640 |
Actually, I love that it's entered the casual lexicon. 00:20:28.520 |
My favorite version of that is how you have to p-hack papers. 00:20:43.320 |
I think the other interesting thing that you have 00:20:45.360 |
is inline-assist UX that is, I wouldn't say async, 00:20:53.240 |
So you can ask Cody to make changes on a code block, 00:20:55.840 |
and you can still edit the same file at the same time. 00:21:08.040 |
messing each other up as they make changes in the code? 00:21:12.920 |
and what do you think about where the UX is going? 00:21:18.200 |
So we actually had this feature in the very first launch 00:21:25.040 |
And you could have multiple basically LLM requests 00:21:31.200 |
And he wrote a bunch of code to handle all of the diffing 00:21:40.960 |
And it just felt like it was just a little before its time. 00:21:47.480 |
was able to be reused for where inline's sitting today. 00:22:02.360 |
and have the code update, to really like targeted features 00:22:11.320 |
And the reason for that is, I think the challenge 00:22:16.120 |
and we do want to get to the point where you could just 00:22:18.440 |
fire it, forget, and have half a dozen of these running 00:22:24.720 |
early on that a lot of people are running into now 00:22:27.200 |
when they're trying to construct agents, which 00:22:29.920 |
is the reliability of working code generation 00:22:36.280 |
is just not quite there yet in today's language models. 00:22:40.920 |
And so that kind of constrains you to an interaction 00:22:45.360 |
where the human is always like in the inner loop, 00:22:56.840 |
have to constrain it to a domain where today's language models 00:23:02.120 |
So generating unit tests, that's like a well-constrained problem, 00:23:05.520 |
or fixing a bug that shows up as a compiler error or a test 00:23:15.440 |
this class that does x, y, and z using the libraries 00:23:21.080 |
even with the benefit of really good context. 00:23:46.120 |
you don't have to have a human in the loop every time. 00:23:48.440 |
And there's also kind of like an LLM call at each stage, 00:24:15.880 |
on the feasibility of agents with purely kind 00:24:20.680 |
To your original question, like the inline interactions 00:24:24.960 |
to be more targeted, like fix the current error 00:24:38.880 |
and this is based on the user feedback that we've gotten-- 00:24:45.680 |
you don't want to have a long chat conversation 00:24:50.200 |
You'd rather just have it write the right thing 00:24:52.900 |
and then move on with your life or not have to think about it. 00:24:55.480 |
And that's what we're trying to work towards. 00:24:57.360 |
I mean, yeah, we're not going in the agent direction. 00:25:03.600 |
Instead, we're working on sort of solidifying 00:25:06.640 |
our strength, which is bringing the right context in. 00:25:12.060 |
to plug in your own context, ways for you to control 00:25:16.440 |
happens before the request goes out, et cetera. 00:25:30.720 |
They really mean greater automation, fully automated. 00:25:36.720 |
And I don't have to think about it as a human. 00:25:41.840 |
I think it's specifically the approach of, hey, 00:25:59.440 |
It's just a reality of the behavior of language models 00:26:04.840 |
And I think that's just a reflection of reality. 00:26:08.680 |
Because if you look at the way that a lot of other AI tools 00:26:14.680 |
have implemented context fetching, for instance, 00:26:23.080 |
supposedly provides codebase-level context, 00:26:27.040 |
it has an agentic approach, where you kind of look 00:26:32.920 |
And it feels like they're making multiple requests to the LLM, 00:26:43.480 |
And it's a multi-hop step, so it takes a long while. 00:26:51.800 |
And then at the end of the day, the context it fetches 00:26:59.280 |
and then maybe crawl through the reference graph a little bit. 00:27:04.840 |
That doesn't require any sort of LLM invocation at all. 00:27:08.520 |
And we can pull in much better context very quickly. 00:27:13.040 |
So it's faster, it's more reliable, it's deterministic, 00:27:20.000 |
We just don't think you should cargo cult or naively go, 00:27:25.240 |
try to implement agents on top of the LLMs that exist today. 00:27:29.760 |
I think there are a couple of other technologies 00:27:35.800 |
before we can get into these multi-stage, fully automated 00:27:40.520 |
We're very much focused on developer inner loop right now. 00:27:50.680 |
tackling the agents problem that you don't want to tackle? 00:27:56.960 |
are after maybe like the same high level problem, which 00:28:05.320 |
And can an automated system go build that software for me? 00:28:20.440 |
Coding, in some senses, is similar and dissimilar to chess. 00:28:25.620 |
I think producing code is more difficult than playing chess 00:28:33.560 |
And if you look at the best AI chess players, 00:28:38.440 |
People have showed demos where it's like, oh, yeah, 00:28:40.560 |
GPT-4 is actually a pretty decent chess move suggester. 00:28:44.760 |
But you would never build a best-in-class chess player 00:28:58.400 |
And then you have a way to explore that search space 00:29:02.880 |
There's a bunch of search algorithms, essentially, 00:29:04.920 |
where you're doing tree search in various ways. 00:29:11.840 |
You might use an LLM to generate proposals in that space 00:29:18.840 |
But the backbone is still this more formalized tree search 00:29:31.800 |
that the way that we get to this more reliable multi-step 00:29:36.000 |
workflows that can do things beyond generate unit test, 00:29:41.400 |
it's really going to be like a search-based approach, where 00:29:43.960 |
you use an LLM as kind of like an advisor or a proposal 00:29:54.560 |
But it's probably not going to be the thing that 00:29:58.400 |
Because I guess it's not the right tool for that. 00:30:07.300 |
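A minimal sketch of that division of labor: a classical best-first search supplies the backbone, and the LLM appears only as the proposal function. Here propose() (the LLM call) and evaluate() (e.g. run the tests and return a score in [0, 1]) are hypothetical stand-ins:

```python
import heapq

def tree_search(initial_state, propose, evaluate, budget: int = 100):
    # Best-first search: the heap is ordered by negated score so the
    # most promising state (e.g. highest test pass rate) pops first.
    best_score = evaluate(initial_state)
    best_state = initial_state
    frontier = [(-best_score, 0, initial_state)]
    tie = 1  # unique tiebreaker so states are never compared directly
    for _ in range(budget):
        if not frontier:
            break
        neg_score, _, state = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_score, best_state = -neg_score, state
        if best_score >= 1.0:  # e.g. all tests pass: done
            break
        for child in propose(state):  # LLM proposes candidate edits
            heapq.heappush(frontier, (-evaluate(child), tie, child))
            tie += 1
    return best_state
```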
That takes us to the philosophical Peter Norvig-type discussion. 00:30:07.300 |
Maybe you want to introduce that divide in software. 00:30:11.560 |
They're probably familiar with the classic Chomsky 00:30:24.120 |
No, actually, I was prompting you to introduce that. 00:30:27.760 |
So if you look at the history of artificial intelligence, 00:30:33.800 |
I don't know, it's probably as old as modern computers, 00:30:40.680 |
to producing a general human level of intelligence. 00:30:51.320 |
which, roughly speaking, includes large language 00:30:58.840 |
Basically, any model that you learn from data 00:31:04.400 |
most of machine learning would fall under this umbrella. 00:31:06.700 |
And that school of thought says, just learn from the data. 00:31:10.800 |
That's the approach to reaching intelligence. 00:31:16.000 |
like compilers, and parsers, and formal systems. 00:31:22.320 |
about how to construct a formal, precise system. 00:31:26.120 |
And that will be the approach to how we build 00:31:31.080 |
Lisp, for instance, was originally an attempt to-- 00:31:38.400 |
could create rules-based systems that you would call AI. 00:31:42.360 |
Yeah, and for a long time, there was this debate. 00:31:47.840 |
and others that were more in the Norvig camp. 00:31:53.760 |
is that Norvig definitely has the upper hand right now 00:31:56.840 |
with the advent of LLMs, and diffusion models, 00:31:59.280 |
and all the other recent progress in machine learning. 00:32:03.840 |
But the Chomsky-based stuff is still really useful, 00:32:17.260 |
that you want to explore with your AI dev tool. 00:32:25.600 |
It's a lot of what we've invested in the past decade 00:32:28.040 |
at Sourcegraph, and what you built with Grok. 00:32:34.480 |
construct these very precise knowledge graphs that 00:32:37.640 |
are great context providers, and great guardrails enforcers, 00:32:41.400 |
and safety checkers for the output of a more data-driven, 00:32:48.720 |
fuzzier system that uses like the Norvig-based models. 00:32:57.500 |
Basically, it's like, OK, so when I was in college, 00:33:02.000 |
I was in college learning Lisp, and Prolog, and Planning, 00:33:04.500 |
and all the deterministic Chomsky approaches to AI. 00:33:08.240 |
And I was there when Norvig basically declared it dead. 00:33:12.440 |
I was there 3,000 years ago when Norvig and Chomsky 00:33:29.160 |
He's got so many famous short posts, amazing things. 00:33:32.080 |
He had a famous talk, "The Unreasonable Effectiveness of Data," which 00:33:38.560 |
convinced everybody that the deterministic approaches had 00:33:41.360 |
failed, and that heuristic-based, data-driven, 00:33:44.280 |
statistical, stochastic approaches were better. 00:33:53.360 |
--was that, well, the steam-powered engine-- no. 00:33:58.080 |
The reason was that the deterministic stuff didn't 00:34:01.800 |
They were using Prolog, man, constraint systems 00:34:07.400 |
Today, actually, these Chomsky-style systems do scale. 00:34:11.080 |
And that's, in fact, exactly what Sourcegraph has built. 00:34:19.240 |
the marriage of the Chomsky and the Norvig models, 00:34:22.360 |
conceptual models, because we have both of them. 00:34:26.260 |
And, in fact, there's this really interesting overlap 00:34:29.760 |
between them, where the AI or our graph or our search engine 00:34:33.400 |
could potentially provide the right context for any given 00:34:35.720 |
query, which is, of course, why ranking is important. 00:34:38.360 |
But what we've really signed ourselves up for 00:34:46.760 |
you were saying that GPT-4 tends to use the front of the context window better. 00:34:53.580 |
Yeah, and so that means that if we're actually 00:35:00.920 |
to test putting it at the beginning of the window 00:35:04.280 |
make the right decision based on the LLM that you've chosen. 00:35:15.400 |
We're generating tests, fill-in-the-middle-type tests, 00:35:19.320 |
to basically fine-tune Cody's behavior there, yeah? 00:35:25.080 |
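For reference, a sketch of what such a fill-in-the-middle (FIM) test can look like: split a known-good file around a hole, ask the model to fill it, and compare. The sentinel tokens below are the ones the StarCoder family documents; other models use different sentinels, and complete() is a hypothetical stand-in for the completion endpoint.

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    # StarCoder-style FIM sentinels; model-specific.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

def fim_test(complete, source: str, hole_start: int, hole_end: int) -> bool:
    # Carve a hole out of a known-good file and check the model's fill.
    prefix = source[:hole_start]
    expected = source[hole_start:hole_end]
    suffix = source[hole_end:]
    got = complete(fim_prompt(prefix, suffix))
    return got.strip() == expected.strip()
```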
I also want to add, I have an internal pet name 00:35:28.400 |
for this hybrid architecture that I'm trying to make catch on. 00:35:45.120 |
I mean, it's obviously a portmanteau of Norvig 00:35:52.280 |
and Chomsky. It stands for non-agentic, rapid, multi-source code intelligence. 00:36:07.000 |
that we're not trying to pitch you on agent hype, right? 00:36:12.040 |
The things it does are really just use developer tools 00:36:17.680 |
like parsers and really good search indexes and things 00:36:23.200 |
Rapid, because we place an emphasis on speed. 00:36:25.440 |
We don't want to sit there waiting for multiple LLM 00:36:28.920 |
requests to return to complete a simple user request. 00:36:35.600 |
about what pieces of information and knowledge 00:36:43.680 |
and then you add in the reference graph, which 00:36:49.920 |
But then even beyond that, sources of information, 00:37:01.680 |
in your production logging system, in your chat, 00:37:09.520 |
Like there's so much context that's embedded there. 00:37:12.840 |
and you're trying to be productive in your code base, 00:37:15.080 |
you're going to go to all these different systems 00:37:16.600 |
to collect the context that you need to figure out 00:37:21.520 |
And I don't think the AI developer will be any different. 00:37:32.760 |
We hope through kind of like an open protocol 00:37:38.420 |
And this is something else that should be, I guess, 00:37:41.960 |
like accessible by December 14th in kind of like a preview 00:37:48.400 |
this notion of the code graph beyond your Git repository 00:37:51.480 |
to all the other sources where technical knowledge 00:38:03.080 |
How do you guys think about the importance of-- 00:38:05.600 |
it's almost like data pre-processing in a way, 00:38:07.800 |
which is bring it all together, tie it together, make it ready. 00:38:14.640 |
that good, what some of the innovation you guys have made? 00:38:18.240 |
We talk a lot about the context fetching, right? 00:38:20.900 |
I mean, there's a lot of ways you could answer this question. 00:38:23.400 |
But we've spent a lot of time just in this podcast 00:38:33.340 |
and you've got more context than you can fit. 00:38:42.320 |
by an embedding or a graph call or something? 00:38:46.640 |
Or do you just need the top part of the function, 00:38:53.920 |
to get each piece of context down into its smallest state, 00:39:04.800 |
And so recursive summarization and all the other techniques 00:39:07.840 |
that you've got to use to stuff stuff into that context window 00:39:12.200 |
And you have to test them across every configuration of models 00:39:22.160 |
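As a sketch of the recursive-summarization idea under a token budget: if a snippet exceeds its share of the window, summarize it; if the summary is still too big, split, shrink the halves, and summarize the join. summarize() is a hypothetical LLM call and tokens() a crude whitespace counter standing in for a real tokenizer.

```python
def tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def shrink(text: str, budget: int, summarize) -> str:
    # Fit one snippet into its budget: summarize, recurse on halves if
    # needed, and hard-truncate as a last resort so the budget holds.
    if tokens(text) <= budget:
        return text
    summary = summarize(text)
    if tokens(summary) <= budget:
        return summary
    mid = len(text) // 2
    halves = [shrink(text[:mid], budget // 2, summarize),
              shrink(text[mid:], budget // 2, summarize)]
    combined = summarize("\n".join(halves))
    return combined if tokens(combined) <= budget \
        else " ".join(combined.split()[:budget])

def pack_context(snippets: list[str], window: int, summarize) -> str:
    # Naive equal split of the window across snippets; real systems
    # would weight by relevance rank instead.
    per_item = max(1, window // max(1, len(snippets)))
    return "\n---\n".join(shrink(s, per_item, summarize) for s in snippets)
```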
to a lot of the cool stuff that people are shipping today, 00:39:26.760 |
whether you're doing like RAG or fine tuning or pre-training. 00:39:34.800 |
because it is basically garbage in, garbage out, right? 00:39:39.440 |
Like if you're feeding in garbage to the model, 00:39:53.680 |
If you're not able to extract the key components of a particular file 00:39:58.320 |
of code, separate the function signature from the body, 00:40:00.760 |
from the doc string, what are you even doing? 00:40:17.760 |
We've had a tool since computers were invented 00:40:20.120 |
that understands the structure of source code 00:40:28.760 |
is to know about the code in terms of structure. 00:40:39.400 |
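That tool is a parser. As a small illustration of pulling out exactly the pieces mentioned — signature, doc string, body — here is a sketch using Python's standard ast module (production systems typically use tree-sitter-style parsers across many languages; this is just the idea):

```python
import ast

def dissect(source: str) -> list[dict]:
    # Split each function into signature, doc string, and body so a
    # prompt builder can include or drop each piece independently.
    parts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            doc = ast.get_docstring(node)
            body = node.body[1:] if doc else node.body  # skip doc stmt
            parts.append({
                "signature": f"def {node.name}({ast.unparse(node.args)})",
                "docstring": doc or "",
                "body": "\n".join(ast.unparse(stmt) for stmt in body),
            })
    return parts

print(dissect('def add(a, b):\n    """Add two numbers."""\n    return a + b'))
```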
just because now we have really good data-driven models that 00:40:45.800 |
When I called it a data moat in my cheating post, 00:40:53.000 |
because data moat sort of sounds like data lake 00:41:00.080 |
on this giant mountain of data that we had collected. 00:41:06.400 |
that can very quickly and scalably basically dissect 00:41:09.600 |
your entire code base into very small, fine-grained semantic 00:41:20.000 |
Yeah, if anything, we're hypersensitive to customer data 00:41:24.880 |
So it's not like we've taken a bunch of private data 00:41:42.000 |
I think that's a very real concern in today's day and age. 00:41:50.720 |
it's very easy both to extract that knowledge from the model 00:42:01.560 |
About a year ago, I wrote a post on LLMs for developers. 00:42:05.040 |
And one of the points I had was maybe the death of the DSL. 00:42:13.640 |
But it's not as performant, but it's really easy to read. 00:42:18.560 |
maybe they're faster, but they're more verbose. 00:42:21.760 |
And when you think about efficiency of the context 00:42:39.240 |
Do you see in the future the way we think about DSL and APIs 00:42:48.520 |
Whereas maybe it's harder to read for the human, 00:42:52.400 |
but the human is never going to write it anyway. 00:42:57.400 |
There are some data science things, like spin-up the spandex. 00:43:07.880 |
Well, so DSLs, they involve writing a grammar and a parser. 00:43:18.600 |
And we do them that way because we need them to compile, 00:43:23.240 |
and humans need to be able to read them, and so on. 00:43:30.600 |
more or less unstructured, and they'll deal with it. 00:43:35.600 |
for communicating with the LLM or packaging up 00:43:42.560 |
like that that are sort of peeking into DSL territory, 00:43:48.480 |
have to learn DSLs, like regular expressions, 00:43:53.600 |
I think you're absolutely right that the LLMs are really, 00:43:57.000 |
And I think you're going to see a lot less of people 00:44:01.080 |
They just have to know the broad capabilities, 00:44:07.560 |
I think we will see kind of like a revisiting of-- 00:44:13.400 |
is that it makes it easier to work with a lower level 00:44:17.320 |
language, but at the expense of introducing an abstraction 00:44:22.280 |
And in many cases today, without the benefit of AI code generation, 00:44:36.800 |
I think there's still places where that trade-off 00:44:40.280 |
But it's kind of like, how much of source code 00:44:45.320 |
through natural language prompting in the future? 00:44:56.200 |
Maybe for a large portion of the code that's written, 00:45:00.800 |
the DSL that is Ruby, or Python, or basically 00:45:04.840 |
any other programming language that exists today. 00:45:07.000 |
I mean, seriously, do you guys ever write SQL queries now 00:45:14.920 |
And so we have kind of passed that bridge, right? 00:45:18.200 |
Yeah, I think to me, the long-term thing is like, 00:45:25.360 |
It's like, hey-- the basic thing is like, hey, 00:45:33.080 |
And the follow-on question, do you need the engineer 00:45:38.880 |
That's kind of the agent's discussion in a way, 00:45:42.960 |
but slowly you're getting more of the atomic units 00:45:48.400 |
I kind of think of it as like, do you need a punch card 00:45:52.640 |
And so I think we're still going to have people 00:46:02.600 |
versus the higher-level, more creative tasks is going to 00:46:20.040 |
And the first step is the AI-enhanced engineer 00:46:22.440 |
that is that software developer that is no longer doing 00:46:28.280 |
because they're just enhanced by tools like yours. 00:46:35.960 |
And because we're releasing this as you go GA, 00:46:40.040 |
you hope for other people to take advantage of that? 00:46:48.820 |
to make your system, whether it's chat, or logging, 00:46:52.760 |
or whatever, accessible to an AI developer tool like Cody, 00:46:58.840 |
here is kind of like the schema by which you can provide 00:47:08.200 |
It's similar to what LSP did for kind of like standard code intelligence. 00:47:10.600 |
It's kind of like a lingua franca for providing 00:47:16.200 |
There might be also analogs to kind of the original OpenAI 00:47:20.720 |
kind of like plugins API, where it's like, hey, 00:47:27.440 |
that might be useful for an LLM-based system to consume. 00:47:31.520 |
And so at a high level, what we're trying to do 00:47:33.920 |
is define a common language for context providers 00:47:38.560 |
to provide context to other tools in the software development lifecycle. 00:47:43.640 |
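As a hedged sketch of what such a common context-provider language might look like — the type and field names here are illustrative guesses, not the actual protocol being shipped:

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    uri: str       # where the knowledge lives (repo file, wiki page, log line)
    title: str     # short human-readable label
    content: str   # the text to stuff into the prompt
    score: float   # provider's own relevance estimate, 0..1

class ContextProvider:
    """Anything that can answer: what do you know relevant to this query?"""
    def query(self, q: str, limit: int = 10) -> list[ContextItem]:
        raise NotImplementedError

def gather(providers: list[ContextProvider], q: str, limit: int = 20) -> list[ContextItem]:
    # Fan the query out to every provider, then merge by score.
    items = [item for p in providers for item in p.query(q)]
    return sorted(items, key=lambda i: i.score, reverse=True)[:limit]
```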
Do you have any critiques of LSP, by the way, 00:47:48.200 |
One of the authors wrote a really good critique recently. 00:47:59.360 |
I think LSP is great for what it did for the developer 00:48:08.120 |
it's much easier now to get code navigation up and running 00:48:13.440 |
--in a bunch of editors by speaking this protocol. 00:48:17.440 |
is looking at the different design decisions made, 00:48:30.560 |
I think the critique of LSP from a Kythe point of view 00:48:34.920 |
have an actual model, a symbolic model, of the code. 00:48:51.200 |
And that's the thing you feed into the language server. 00:48:56.860 |
that you should jump to if you click on that range. 00:48:59.000 |
So it kind of is intentionally ignorant of the fact 00:49:02.400 |
that there's a thing called a reference underneath your 00:49:04.760 |
cursor, and that's linked to a symbol definition. 00:49:07.100 |
Well, actually, that's the worst example you could have used. 00:49:09.640 |
You're right, but that's the one thing that it actually 00:49:18.240 |
Whereas Kythe attempts to model all these things explicitly. 00:49:25.520 |
And so Google's internal protocol is gRPC-based. 00:49:34.440 |
Basically, you make a heavy query to the back end, 00:49:40.920 |
So we've looked at LSP, and we think that it's just-- 00:49:45.960 |
I mean, it's a great protocol, lots and lots of support 00:49:48.740 |
But we need to push into the domain of exposing 00:49:59.160 |
developed a protocol of our own called SCIP, which is, I think, 00:50:02.020 |
at a very high level, trying to take some of the good ideas 00:50:04.440 |
from LSP and from Kythe, and merge that into a system that, 00:50:10.540 |
but I think in the long term, we hope it will 00:50:13.840 |
And I would say, OK, so here's what LSP did well. 00:50:20.840 |
"dumb" in air quotes, because I'm not ragging on it-- 00:50:30.060 |
to kind of bypass the hard problem of modeling language 00:50:35.040 |
So if all you want to do is jump to definition, 00:50:37.200 |
you don't have to come up with a universally unique naming 00:50:40.320 |
scheme for each symbol, which is actually quite challenging. 00:50:57.800 |
you're fetching this from, whether it's the public one 00:51:03.800 |
And by just going from a location-to-location-based 00:51:07.680 |
approach, you basically just throw that out the window. 00:51:11.720 |
Just make that work, and you can make that work 00:51:14.240 |
without having to deal with all the complex global naming 00:51:29.760 |
And I want to incorporate that semantic model of how 00:51:32.800 |
the code operates, or how the code relates to each other 00:51:35.880 |
at a static level, you can't do that with LSP, 00:51:44.560 |
in order to do a find references and then jump to definition, 00:51:53.600 |
And it just adds a lot of latency and complexity 00:51:58.000 |
this thing clearly references this other thing. 00:52:02.440 |
And I think that's the thing that Kythe does well. 00:52:04.440 |
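Two toy payloads make the contrast concrete: LSP answers in terms of file locations (URI plus range), while a Kythe-style model answers in terms of globally named symbols and typed edges between them. These shapes are illustrative, not the literal wire formats:

```python
# LSP-style "go to definition" answer: purely location-based.
lsp_definition_response = {
    "uri": "file:///src/auth.go",
    "range": {"start": {"line": 41, "character": 5},
              "end": {"line": 41, "character": 17}},
}

# Kythe-style fact: a named symbol, a typed edge, another named symbol.
kythe_style_fact = {
    "source": "kythe://corpus?lang=go#pkg/auth.ValidateToken",  # global symbol name
    "edge": "ref/call",
    "target": "kythe://corpus?lang=go#pkg/jwt.Parse",
}
```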
But then I think the issue that Kythe has had with adoption 00:52:07.520 |
is, because it's a more sophisticated schema, I think. 00:52:15.960 |
that you have to implement to get a Kythe implementation 00:52:24.280 |
Kythe also has the problem-- all these systems 00:52:26.560 |
have the problem, even SCIP, or at least the way 00:52:30.560 |
that they have to integrate with your build system 00:52:36.520 |
the code in a special mode to generate artifacts instead 00:52:41.440 |
by the way, earlier I was saying that xrefs were in LSP, 00:52:46.240 |
but it's actually-- I was thinking of LSP plus LSIF. 00:52:46.240 |
It's supposed to be sort of a model, a serialization 00:53:04.360 |
But it basically just does what LSP needs, the bare minimum. 00:53:13.440 |
to kind of quickly bootstrap from cold start. 00:53:15.840 |
But it's a graph model with all of the inconvenience of the API 00:53:23.960 |
So one of the things that we try to do with SCIP 00:53:32.120 |
some of the more symbolic characteristics of the code 00:53:34.960 |
that would allow us to essentially construct this 00:53:39.560 |
useful for both the human developer through SourceGraph 00:53:44.600 |
So anyway, just to finish off the graph comment 00:54:07.240 |
I should probably have to do a blog post about it 00:54:09.920 |
to walk you through exactly how they're doing it. 00:54:12.600 |
But it's a very AI-like, iterative, experimentation 00:54:16.800 |
sort of approach, where we're building a code graph based 00:54:23.640 |
But we're building it quickly with zero configuration, 00:54:25.880 |
and it doesn't have to integrate with your build system 00:54:30.680 |
And so it just happens when you install the plug-in 00:54:38.240 |
and providing that knowledge graph in the background 00:54:42.320 |
This is a bit of secret sauce that we haven't really-- 00:54:46.800 |
I don't know, we haven't advertised it very much lately. 00:54:49.800 |
But I am super excited about it, because what they do 00:54:52.480 |
is they say, all right, let's tackle function parameters 00:54:56.000 |
Cody's not doing a very good job of completing function call 00:54:58.800 |
arguments or function parameters in the definition, right? 00:55:03.840 |
And then we can actually reuse those tests for the AI context 00:55:07.760 |
So fortunately, things are kind of converging. 00:55:10.040 |
We have half a dozen really, really good context sources. 00:55:16.880 |
So anyway, BFG, you're going to hear more about it probably, 00:55:24.240 |
Yeah, I think it'll be online for December 14th. 00:55:29.640 |
BFG is probably not the public name we're going to go with. 00:55:32.720 |
I think we might call it Graph Context or something like that. 00:55:46.480 |
look at current AI inline code completion tools 00:55:50.760 |
and the errors that they make, a lot of the errors 00:55:53.400 |
that they make, even in kind of the easy single line case, 00:56:04.120 |
And it suggests a variable that you defined earlier, 00:56:08.480 |
And that's the sort of thing where it's like, well, 00:56:23.280 |
without the context of the types or any other broader 00:56:36.920 |
that any baseline intelligent human developer would 00:56:43.440 |
click some find references, and pull in that graph context 00:56:53.480 |
So that's sort of like the MVP of what BFG was. 00:57:02.920 |
that AI coding tools make just by pulling in that context. 00:57:06.840 |
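A sketch of that MVP behavior: before requesting a completion, resolve the definitions of identifiers visible near the cursor and prepend them to the prompt, so the model stops inventing out-of-scope names. find_definition() is a hypothetical hook into the code graph, not BFG's actual interface.

```python
def build_completion_prompt(file_prefix: str, identifiers: list[str], find_definition) -> str:
    # Look up each nearby identifier in the code graph; a hit yields
    # something like a function signature plus doc string.
    context_snippets = []
    for name in identifiers:
        defn = find_definition(name)
        if defn:
            context_snippets.append(f"# definition of {name}:\n{defn}")
    header = "\n\n".join(context_snippets)
    # Graph context goes ahead of the file prefix being completed.
    return f"{header}\n\n{file_prefix}" if header else file_prefix
```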
Yeah, but the graph is definitely our Chomsky side. 00:57:15.200 |
And I think it's just a very useful and also kind of nicely 00:57:18.960 |
nerdy way to describe the system that we're trying to build. 00:57:25.640 |
was trying to make earlier to your question, Alessio, about, 00:57:31.520 |
they thought, oh, are compilers going to replace programming? 00:57:36.920 |
And I think AI is just going to level us up again. 00:57:39.240 |
So programmers are still going to be building stuff 00:57:42.120 |
until agents come along, but I don't believe. 00:57:47.680 |
Yeah, to be clear, again, with the agent stuff 00:57:52.460 |
I think that's still the kind of long-term target. 00:57:57.160 |
you can have Kodi draft up an execution plan. 00:58:00.160 |
It's just not going to be the sort of thing where you can't 00:58:05.880 |
Like, we think that with Cody, it's like, you could ask Cody, 00:58:10.340 |
It would do a reasonable job of fetching context and saying, 00:58:16.480 |
can actually suggest code changes to make to those files. 00:58:19.200 |
And that's a very nice way to resolve issues, 00:58:21.640 |
because you're kind of on the rails for most of the time, 00:58:24.720 |
but then now and then you have to intervene as a human. 00:58:28.960 |
to get to complete automation, where it's like the sort 00:58:31.720 |
of thing where a non-software engineer, someone 00:58:41.520 |
that is still, I think, several key innovations away 00:58:47.400 |
And I don't think the pure transformer-based LLM 00:58:51.400 |
orchestrator model of agents that is kind of dominant today 00:58:58.960 |
Just what you're talking about triggered a thread 00:59:04.480 |
I've been working on for a little bit, which is, we're going 00:59:15.520 |
to need a bigger moat, which is a great Jaws reference for those 00:59:22.300 |
--how quickly models are evolving. 00:59:36.680 |
And actually, there's a pretty good cadence 00:59:39.240 |
from GPT-2, 3, and 4 that you can-- if you project out. 00:59:42.360 |
So this is based on George Hotz's concept of 20 petaflops being a person. 00:59:52.080 |
GPT-4 took about 100 years in terms of human years 01:00:10.680 |
And if you just project it out, GPT-9 is every human on Earth, 01:00:18.960 |
And he thinks he'll reach there by the end of the decade. 01:00:32.160 |
We're at the start of the curve with Moore's law. 01:00:37.080 |
Gordon Moore, I think, thought it would last 10 years. 01:00:45.600 |
And we're just trying to extrapolate the curve out 01:00:50.040 |
So all I'm saying is this agent stuff that we dealt 01:00:56.240 |
And I don't know how you plan when things are not 01:01:20.240 |
we hear things like things are not practical today, 01:01:30.220 |
I do think that there will be something like a Moore's law 01:01:34.920 |
I mean, definitely, I think, at the hardware level, like GPUs. 01:01:39.800 |
I think it gets a little fuzzier the higher you move up 01:01:44.400 |
But for instance, going back to the chess analogy, 01:01:50.000 |
at what point do we think that GPT-X or whatever, 01:01:54.520 |
a pure transformer-based LLM model will be state of the art 01:02:00.440 |
or outperform the best chess-playing algorithm today? 01:02:07.480 |
Where you completely overlap 01:02:13.960 |
I think that would kind of disprove the thesis that I just 01:02:16.320 |
stated, which is kind of like the pure transformer, 01:02:25.000 |
versus, oh, we actually have to take a step back and think-- 01:02:37.200 |
is going to be one piece of a system of intelligence 01:02:41.740 |
that's going to take advantage-- that we'll have to take 01:02:44.120 |
advantage of, like many other algorithms and approaches? 01:02:53.800 |
All right, sorry for that digression. 01:02:57.480 |
So one thing I did actually want to check in on, 01:03:00.000 |
because we talked a little bit about code graphs and reference 01:03:08.480 |
Well, I mean, how would you define a graph database? 01:03:18.420 |
that Postgres was performing as well as most of the graph 01:03:35.640 |
But we basically tried to dump a non-trivially sized data set, 01:03:40.260 |
but also not the whole universe of code, right? 01:03:46.180 |
compared to what we're indexing now into the database. 01:03:55.360 |
And we're like, OK, let's try another approach. 01:04:08.620 |
I mean, at the end of the day, all the databases, 01:04:14.660 |
If all your queries are single hops in this-- 01:04:20.060 |
Which they will be if you denormalize 01:04:27.100 |
Seventh normal form is just a bunch of files. 01:04:36.460 |
about the actual query load, or the traffic patterns, 01:04:46.020 |
just go with the tried and true, dumb, classic tools 01:04:52.260 |
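The "tried and true, dumb, classic tools" point fits in a few lines: store the reference graph as one denormalized edge table, and every query the product actually needs becomes a single indexed lookup rather than a graph traversal. A sketch with SQLite standing in for Postgres:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One denormalized edge table: symbol, its definition site, one reference site.
db.execute("""CREATE TABLE refs (
    symbol TEXT, def_path TEXT, def_line INT, ref_path TEXT, ref_line INT)""")
db.execute("CREATE INDEX refs_by_symbol ON refs(symbol)")
db.execute("INSERT INTO refs VALUES ('ValidateToken', 'auth.go', 42, 'server.go', 17)")

# "Find references" (and likewise "go to definition") is one indexed hop:
rows = db.execute(
    "SELECT ref_path, ref_line FROM refs WHERE symbol = ?", ("ValidateToken",)
).fetchall()
print(rows)  # [('server.go', 17)]
```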
I mean, there's a bunch of stuff 01:04:54.260 |
like that in the search domain, too, especially right now, 01:04:56.700 |
with embeddings, and vector search, and all that. 01:05:00.900 |
But classic search techniques still go very far. 01:05:04.020 |
And I don't know, I think in the next year or two maybe, 01:05:10.680 |
start to see the gap emerge, or become more obvious to more 01:05:17.060 |
people about how many of the newfangled techniques 01:05:20.100 |
actually work in practice, and yield a better product 01:05:27.880 |
a bunch of other people trying to build AI tooling. 01:05:34.320 |
Obviously, you build a lot of it proprietary, in-house, 01:05:42.020 |
do you have a prompt engineering management tool? 01:05:48.540 |
Pre-processing orchestration, do you use Airflow? 01:05:54.500 |
Ours is very duct-taped together at the moment. 01:06:06.460 |
There's the knowledge graph, the code knowledge graph 01:06:09.220 |
that we built, which is using indexers, many of which 01:06:12.620 |
are open source, that speak the SCIP protocol. 01:06:21.860 |
Traditionally, we supported regular expression search 01:06:24.540 |
and string literal search with a trigram index. 01:06:28.060 |
And we're also building more fuzzy search on top of that 01:06:31.300 |
now, kind of like natural language or keyword-based search. 01:06:36.820 |
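For readers unfamiliar with the trigram index mentioned here, a toy version: index every 3-character substring of each document, answer a literal-string query by intersecting the posting lists of the query's trigrams, then verify candidates with a direct scan. This is the idea only, nothing like a production-scale implementation:

```python
from collections import defaultdict

def trigrams(s: str) -> set[str]:
    return {s[i:i + 3] for i in range(len(s) - 2)}

class TrigramIndex:
    def __init__(self):
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: list[str] = []

    def add(self, text: str) -> None:
        doc_id = len(self.docs)
        self.docs.append(text)
        for t in trigrams(text):
            self.postings[t].add(doc_id)

    def search(self, literal: str) -> list[int]:
        # Candidate docs must contain every trigram of the query...
        lists = [self.postings.get(t, set()) for t in trigrams(literal)]
        candidates = set.intersection(*lists) if lists else set(range(len(self.docs)))
        # ...then a direct scan confirms real matches (no false positives).
        return [d for d in sorted(candidates) if literal in self.docs[d]]
```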
And we use a variety of open source and proprietary models. 01:06:40.140 |
We try to be pluggable with respect to different models, 01:06:42.820 |
so we can easily swap the latest model in and out 01:06:49.460 |
I'm just hunting for, is there anything out there 01:06:52.620 |
that you're like, these guys are really good. 01:06:56.700 |
So for example, you talked about recursive summarization, 01:06:59.500 |
which is something that LangChain and LlamaIndex do. 01:07:05.500 |
I think the stuff that LlamaIndex and LangChain 01:07:12.420 |
like we're still in the application end user use case 01:07:17.060 |
And so adopting an external infrastructure or middleware 01:07:25.020 |
tool just seems overly constraining right now. 01:07:29.540 |
need to be able to iterate rapidly up and down the stack. 01:07:32.260 |
But maybe at some point, there'll be a convergence, 01:07:34.620 |
and we can actually merge some of our stuff into theirs 01:07:50.620 |
Also, plug for Fireworks as an inference platform. 01:08:06.140 |
Their CEO was the co-manager of PyTorch for five years. 01:08:22.900 |
And that's made it so that we just don't have 01:08:24.820 |
to think about building up an inference stack. 01:08:27.860 |
And so that's great for us, because it allows us to focus 01:08:30.340 |
more on the data fetching, the knowledge graph, 01:08:35.500 |
and model fine-tuning, which we've also invested a bit in. 01:08:40.820 |
We've got multiple AI workstreams in progress now, 01:08:51.700 |
And the guy we hired, Rashab, is absolutely world-class. 01:08:56.140 |
And he immediately started multiple workstreams, 01:09:17.140 |
run against the benchmark, or we'll make our own benchmark 01:09:20.740 |
But we'll be forcing people into the quantitative comparisons. 01:09:24.740 |
And that's all happening under the AI program 01:09:30.420 |
heard that there's a v2 of StarCoder coming out. 01:09:41.320 |
Can you guys believe how amazing it is that the open source 01:09:44.420 |
models are competitive with GPT and Anthropic? 01:09:50.260 |
I mean, that one Googler that was predicting that open source 01:09:53.420 |
would catch up, at least he was right for completions. 01:10:06.100 |
We still use Claude and GPT-4 for chat and also commands. 01:10:11.980 |
But the ecosystem is going to continue to evolve. 01:10:24.620 |
that they're doing in kind of driving the ecosystem forward. 01:10:31.300 |
It's always kind of like a constant evaluation process. 01:10:33.980 |
I don't want to come out and say, hey, this model's 01:10:39.580 |
for the sorts of context that we're fetching now 01:10:42.460 |
and given the way that our prompt's constructed now. 01:10:44.580 |
And at the end of the day, it was like a judgment call. 01:10:53.140 |
Like, if someone comes up with a neat new context fetching 01:10:55.680 |
mechanism-- and we have a couple coming online soon-- 01:11:00.820 |
against the kind of array of models that are available 01:11:04.860 |
and see how this moves the needle across that set. 01:11:14.260 |
What did we have to build that we wish we could have used? 01:11:25.700 |
like a very nice, clean data set of both naturally occurring 01:11:34.820 |
Yeah, could someone please give us their data moat? 01:11:39.100 |
It's just like, I feel like most models today, 01:11:41.380 |
they still use a combination of The Stack and The Pile 01:11:55.020 |
I think there's still more alpha in synthetic data. 01:12:01.020 |
think fine-tuning some models on specific coding tasks 01:12:08.500 |
where it's reliable enough that we can fully automate it, 01:12:14.700 |
And synthetic data is playing a part of that. 01:12:17.060 |
But I mean, if there were like a synthetic data provider-- 01:12:19.760 |
I don't think you could construct a provider that has 01:12:25.200 |
No company in the world would be able to sell that to you. 01:12:35.940 |
I don't know if there's a business around that. 01:12:37.860 |
But that's something that we definitely love to use. 01:12:41.320 |
I mean, but that's also like the secret weapon, right? 01:12:48.220 |
So I doubt people are going to be, oh, we'll see. 01:12:57.940 |
I would say that would be the bull case for Repl.it, 01:13:01.500 |
that you want to be a coding platform where you also offer 01:13:05.980 |
And then you eventually bootstrap your own proprietary 01:13:14.580 |
this is from nobody at Repl.it that I'm hearing. 01:13:17.680 |
But also, they're just not leveraging that actively. 01:13:21.660 |
They're actually just betting on OpenAI to do a lot of that, 01:13:30.540 |
Yeah, they're definitely great at executing and-- 01:13:50.340 |
And this whole room in the new room was just like, 01:13:58.060 |
I mean, it would have real implications for us, too. 01:14:07.140 |
Yeah, I mean, that would have been the break glass plan. 01:14:13.180 |
think we'd have a lot of customers the day after being 01:14:16.140 |
like, how can you guarantee the reliability of your services 01:14:22.020 |
But I'm really happy they got things sorted out 01:14:31.340 |
So we kind of went through everything, right? 01:14:37.300 |
why inline completion is better, all of these things. 01:14:42.180 |
How does that bubble up to who manages the people, right? 01:14:46.820 |
Because as engineering managers, and I never-- 01:14:52.140 |
I was mostly helping people write their own code. 01:14:55.020 |
So even if you have the best inline completion, 01:15:04.220 |
Yeah, so that's a really interesting question. 01:15:07.580 |
And I think it sort of gets at this issue, which 01:15:10.420 |
is I think basically every AI dev tools creator or producer 01:15:22.700 |
kind of focusing on the wrong problem in a way. 01:15:26.340 |
Because the real problem of modern software development, 01:15:30.340 |
I think, is not how quickly can you write more lines of code. 01:15:34.180 |
It's really about managing the emergent complexity 01:15:41.340 |
and how to make efficient development tractable again. 01:15:47.060 |
Because the bulk of your time becomes more about understanding 01:15:51.540 |
how the system works and how the pieces fit together currently 01:15:56.140 |
so that you can update it in a way that gets you 01:16:00.220 |
your added functionality, doesn't break anything, 01:16:03.340 |
and doesn't introduce a lot of additional complexity 01:16:08.100 |
And if anything, the inner loop developer tools 01:16:15.020 |
yes, they help you get your feature done faster. 01:16:19.780 |
But they might make this problem of managing large complex code 01:16:25.820 |
Just because now, instead of having a pistol, 01:16:33.100 |
And there's going to be a bunch of natural language prompted 01:16:35.740 |
code that is generated in the future that was produced 01:16:38.500 |
by someone who doesn't even have an understanding of source 01:16:43.460 |
And so how are you going to verify the quality of that 01:16:45.780 |
and make sure it not only checks the low-level boxes, 01:16:49.820 |
but also fits architecturally in a way that's 01:16:57.980 |
have a lot of ideas around how to make code bases, 01:17:01.260 |
as they evolve, more understandable and manageable 01:17:05.020 |
to the people who really care about the code base as a whole-- 01:17:08.300 |
tech leads, engineering leaders, folks like that. 01:17:11.340 |
And it is kind of like a return to our ultimate mission 01:17:16.820 |
at Sourcegraph, which is to make code accessible to all. 01:17:19.340 |
It's not really about enabling people to write code. 01:17:21.640 |
And if anything, the original version of Sourcegraph 01:17:29.220 |
because there's already enough people doing that. 01:17:34.700 |
I mean, Quinn, myself, and you, Steve, at Google-- 01:17:54.020 |
And any developer who falls below a threshold, 01:17:56.180 |
a button lights up where the admin can fire them. 01:18:02.940 |
But I'm kind of only half tongue-in-cheek here. 01:18:06.260 |
We've got some prospects who are kind of sniffing down 01:18:15.320 |
like Beyang was saying-- much greater whole-codebase 01:18:17.700 |
understanding, which is actually something that Cody is, 01:18:20.260 |
I would argue, the best at today in the coding assistance space, 01:18:23.020 |
right, because of our search engine and the techniques 01:18:27.880 |
is so important for any sort of a manager who just 01:18:34.340 |
or whether people are writing code that's well-tested 01:18:42.580 |
This is not the developer inner loop or outer loop. 01:18:48.540 |
The manager inner loop is staring at your belly button, 01:18:54.220 |
Waiting for the next Slack message to arrive? 01:18:58.280 |
What they really want is a batch mode for these assistants 01:19:00.700 |
where you can actually take the coding assistant 01:19:08.180 |
it's told you all the security vulnerabilities. 01:19:11.980 |
It's an insanely expensive proposition, right? 01:19:14.060 |
You know, just the GPU cost, especially if you're 01:19:17.580 |
So it's better to do it at the point the code enters 01:19:20.380 |
And so now we're starting to get into developer outer loop 01:19:23.220 |
And I think that's where a lot of the-- to your question, 01:19:25.900 |
A lot of the admins and managers and the decision makers, 01:19:28.820 |
anybody who just kind of isn't coding but is involved, 01:19:32.540 |
they're going to have, I think, well, a set of tools, right? 01:19:40.980 |
Our code search actually serves that audience as well, 01:19:48.300 |
And they use our search engine and they go find it. 01:19:50.380 |
And AI is just going to make that so much easier for them. 01:19:56.180 |
to put my anecdote of how I used Cody yesterday. 01:19:59.380 |
I was actually trying to build this Twitter scraper thing. 01:20:02.020 |
And Twitter is notoriously very challenging to work with 01:20:11.960 |
There was this really big repo that had the Twitter scraper thing in it. 01:20:11.960 |
But then I noticed that on your landing page, 01:20:24.100 |
Like, I typically think of Cody as a VS Code extension. 01:20:27.900 |
But you have a web version where you just plug in any repo 01:20:44.800 |
The search thing is like, oh, this is old Sourcegraph. 01:20:55.880 |
that's hidden in the upper right hand corner. 01:21:05.660 |
Well, you didn't embed it, but you indexed it. 01:21:09.720 |
that have emerged among power users where they kind of do-- 01:21:15.780 |
You can kind of replicate that, but for arbitrary frameworks 01:21:20.340 |
Because there's also an equally hidden toggle, which you may 01:21:22.900 |
not have discovered yet, where you can actually 01:21:30.540 |
let's say you want to build a stock ticker that's 01:21:33.280 |
React-based, but uses this one tick data fetching API. 01:21:42.480 |
Track the tick data of Bank of America, Wells Fargo 01:21:55.160 |
just because the wow factor of that is just pretty incredible. 01:21:58.360 |
It's like, what if you can speak apps into existence 01:22:00.800 |
that use the frameworks and packages that you want to use? 01:22:07.380 |
It's just taking advantage of your RAG pipeline. 01:22:20.700 |
Yeah, but I guess getting back to the original question, 01:22:25.620 |
I think would be interesting for engineering leaders. 01:22:32.100 |
that you really ought to be doing with respect to, like, 01:22:34.520 |
ensuring code quality, or updating dependencies, 01:22:42.680 |
that humans find toilsome and tedious and just don't want 01:22:45.800 |
to do, but would really help uplevel the quality, security, 01:22:51.480 |
Now we potentially have a way to do that with machines. 01:23:08.520 |
to do it in the same way that you can measure marketing, 01:23:11.920 |
or sales, or other parts of the organization. 01:23:14.560 |
And I think, what is the actual way you would do this 01:23:18.000 |
that is good, if you had all the time in the world? 01:23:20.960 |
I think, as an engineering manager or an engineering 01:23:23.320 |
leader, what you would do is you would go read 01:23:25.660 |
through the Git log, maybe like line by line. 01:23:28.160 |
Be like, OK, you, Sean, these are the features 01:23:31.560 |
that you built over the past six months or a year. 01:23:36.680 |
These are the things that delivered that you helped drive. 01:23:39.120 |
Here's the stuff that you did to help your teammates. 01:23:52.760 |
Now connect that to the things that matter to the business. 01:24:05.960 |
on the metrics that moved the needle for the business 01:24:08.200 |
and ultimately show up in revenue, or stock price, 01:24:12.480 |
or whatever it is that's at the very top of any for-profit 01:24:29.380 |
Plus, it's also tedious, like reading through Git log 01:24:32.660 |
and trying to understand what a change does and summarizing 01:24:36.620 |
It's just-- it's not the most exciting work in the world. 01:24:46.260 |
does a lot of the tedium and helps you actually 01:24:50.140 |
And I think that is maybe the ultimate answer to how 01:24:55.580 |
that a CFO would be like, OK, I can buy that. 01:24:59.380 |
The work that you did impacted these core metrics 01:25:10.420 |
And that's what we really want to drive towards. 01:25:12.020 |
I think that's what we've been trying to build all along, 01:25:21.820 |
now just puts that much sooner in reach, I think. 01:25:26.740 |
But I mean, we have to focus, also, small company. 01:25:30.460 |
And so our short-term focus is lovability, right? 01:25:41.460 |
about enabling all of the non-engineering roles, 01:25:59.820 |
Which we always forget to send the questions ahead of time. 01:26:04.300 |
So we usually have three, one around acceleration, 01:26:11.780 |
something that already happened in AI that is possible today 01:26:16.740 |
I mean, just LLMs and how good the vision models are now. 01:26:24.740 |
Well, I mean, back in the day, I got my start in machine learning 01:26:35.020 |
And in those days, everything was statistical-based. 01:26:43.160 |
And so I was very bearish after that experience 01:26:54.800 |
So yeah, it came up faster than I expected it to. 01:27:04.340 |
that we're not tapping into, potentially even 01:27:12.940 |
is probably not the steady state that we're seeing long-term. 01:27:18.900 |
and you'll always have chat, and commands, and so on. 01:27:21.420 |
But I think we're going to discover a lot more. 01:27:25.820 |
some kind of new ways to get your stuff done. 01:27:30.540 |
So yeah, I think the capabilities are there today. 01:27:35.720 |
When I sit down, and I have a conversation with the LLM 01:27:41.340 |
talking to a senior engineer, or an architect, or somebody. 01:27:46.740 |
And I think that people have very different working models 01:27:50.460 |
Some people are just completion, completion, completion. 01:27:55.000 |
they write a comment, and then telling them what to do. 01:27:58.340 |
But I truly think that there are other modalities that we're 01:28:01.040 |
going to stumble across, and just kind of latently, 01:28:14.960 |
I mean, the one we talked about earlier, nonstop coding 01:28:19.140 |
a whole bunch of requests to refactor, and so on. 01:28:24.540 |
We talk about agents, that's kind of out there. 01:28:26.540 |
But I think there are kind of more inner loop type ones 01:28:31.220 |
And we haven't looked at all that multimodal yet. 01:28:41.260 |
One, which is effectively architecture diagrams 01:28:47.180 |
There's probably more alpha in synthesizing them 01:28:49.700 |
for management to see, which is, you don't need AI for that. 01:29:13.260 |
about how someone just had an always-on script, 01:29:16.540 |
just screenshotting and sending it to GPT-4 Vision 01:29:21.620 |
And it would just autonomously suggest stuff. 01:29:27.300 |
and just being a real co-pilot, rather than having 01:29:39.660 |
So the reason I know this is we actually did a hackathon, 01:29:41.980 |
where we wrote that project, but it roasted you while you did 01:29:46.820 |
it, so it's like, hey, you're on Twitter right now. 01:29:52.820 |
And that can be a fun co-pilot thing, as well. 01:29:57.860 |
Exploration, what do you think is the most interesting 01:30:02.900 |
It used to be scaling, right, with CNNs and RNNs, 01:30:15.120 |
I feel like-- do you mean like the pure model, like AI layer? 01:30:21.120 |
how do you get reliable first try working code generation? 01:30:30.380 |
Because I think if you want to get to the point 01:30:33.340 |
where you can actually be truly agentic or multi-step 01:30:40.540 |
is the single step has to be robust and reliable. 01:30:49.400 |
Because once you have that, it's a building block 01:30:51.400 |
that you can then compose into longer chains. 01:31:02.780 |
I mean, I think for me it's just like the best 01:31:11.700 |
to leverage many different forms of intelligence. 01:31:14.780 |
Calling back to that like Normsky architecture, 01:31:19.740 |
You should call it something cool like S* or R*. 01:31:24.500 |
Just one letter and then just let people speculate. 01:31:37.660 |
And I think Normsky encapsulates the two big technology areas 01:31:46.140 |
will be very important for producing really good DevTools. 01:31:51.460 |
And I think it's a big differentiator that we 01:32:00.900 |
that not all developers today are using coding assistants. 01:32:08.380 |
and it didn't immediately write a bunch of beautiful code 01:32:12.060 |
And they were like, ah, too much effort, and they left. 01:32:29.640 |
to actually make coding assistants work today. 01:32:33.880 |
they'll give you the runaround, just like doing a Google search 01:32:36.720 |
But if you're not putting that effort in and learning 01:32:39.560 |
the sort of footprint and the characteristics of how 01:32:42.600 |
LLMs behave under different query conditions and so on, 01:32:46.040 |
if you're not getting a feel for the coding assistant, 01:32:48.560 |
then you're letting this whole train just pull out 01:32:54.560 |
Yeah, thank you guys so much for coming on and being