
Building the platform for agent coordination — Tom Moor, Linear



00:00:00.640 | So yeah, I'm Tom. I lead the engineering team at Linear, and today I would love to talk to you a
00:00:22.320 | bit about our story with AI, how we think about AI as a company, some of the features we've built,
00:00:28.240 | and then how we see software development going from here and perhaps Linear's place in that future.
00:00:34.080 | So just for anybody in the room that hasn't heard of Linear and might not be familiar:
00:00:43.600 | Linear is a product development tool. It's disguised as an issue tracker, we like to say.
00:00:48.560 | We've spent the last five years obsessing over speed and clarity, removing friction, making it just
00:00:56.800 | the best tool for ICs to use to work every day. So yeah, it started as a simple tracker, and now we
00:01:05.360 | think of it as an operating system for engineering and product teams to build their products. And
00:01:11.920 | we're used by OpenAI, Ramp, Vercel, thousands of other modern software companies you've heard of
00:01:20.880 | and use Linear to kind of keep track of their work. So just a little bit of history of our AI
00:01:29.440 | journey, as it were. We spun up an internal Skunkworks team in early 2023, which I think was about GPT-3,
00:01:39.920 | if I remember rightly. Our initial focus was on kind of summarization, some similarity. We were looking at
00:01:48.240 | embeddings. Nobody on the team had any AI experience, so we're just kind of jumping in and figuring it out
00:01:54.960 | as we go. One of the things we realized really quickly was that many of the features that we needed
00:02:00.480 | to build needed a really solid search foundation. Almost everything you need to first find the
00:02:06.720 | relevant stuff, right? So we had Elasticsearch at that time, and they didn't have a very good vector
00:02:11.920 | offering. I think maybe they actually didn't have a vector offering at all back in 2023. So we looked around,
00:02:17.360 | and this was kind of a moment where, like, a hundred startups suddenly came out with vector
00:02:21.840 | databases, right? There was Pinecone, there was this, there was that. And so we looked at these,
00:02:26.480 | we evaluated a few. They all had a ton of trade-offs. So after experimenting with a bunch of things,
00:02:32.960 | we literally just ended up taking OpenAI embeddings, storing them in pgvector, and putting that pgvector
00:02:38.480 | on GCP. It was like the most classic Linear decision ever, because it was so pragmatic and just used
00:02:45.600 | the solid things. So on that base, we shipped some features, right? We shipped a V1 of
00:02:51.520 | similar issues, where we're kind of suggesting related issues. This was, in hindsight,
00:02:56.720 | two years later, so naive. We were just doing simple cosine embedding comparisons against the vector database.
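A minimal sketch of what that V1 approach amounts to: embed the issue text, then run a single cosine-distance query. The issue_embeddings table, column names, and model choice below are hypothetical, not Linear's actual schema.

```typescript
import { Client } from "pg";
import OpenAI from "openai";

const db = new Client({ connectionString: process.env.DATABASE_URL });
const openai = new OpenAI();

// Embed a new issue's text, then rank stored issues by cosine distance.
// "<=>" is pgvector's cosine-distance operator (smaller = more similar).
async function similarIssues(title: string, description: string, limit = 5) {
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002", // an embedding model of that era
    input: `${title}\n${description}`,
  });
  const vector = JSON.stringify(data[0].embedding);
  const { rows } = await db.query(
    `SELECT issue_id, 1 - (embedding <=> $1::vector) AS similarity
       FROM issue_embeddings
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [vector, limit],
  );
  return rows;
}
```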
00:03:06.400 | And we shipped natural language filters. Actually, I think this is one of the better ones,
00:03:11.840 | where you can just type, in natural language, "bugs assigned to me in the last two weeks that are closed,"
00:03:18.000 | and it will produce the filter. So it's very one-shot, very naive in comparison.
00:03:23.280 | Yeah, but pretty useful and kind of hidden, I would say, as well.
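A hedged sketch of what such a one-shot filter translation could look like: ask the model to turn the phrase into a structured filter object. The filter shape and prompt are illustrative assumptions, not Linear's implementation.

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Translate a natural-language phrase into a structured filter object.
async function parseFilter(query: string) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Translate the user's phrase into JSON with optional keys: " +
          '"type", "assignee", "state", "createdAfter" (ISO date).',
      },
      { role: "user", content: query },
    ],
  });
  return JSON.parse(completion.choices[0].message.content ?? "{}");
}

// parseFilter("bugs assigned to me in the last two weeks that are closed")
// might yield: { "type": "bug", "assignee": "me", "state": "closed",
//                "createdAfter": "2025-05-20" }
```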
00:03:28.560 | We also have another feature where, if you create an issue from a Slack thread, we will not just pass the text from the Slack message. We will
00:03:35.200 | try and produce the right issue from that automatically. And that was, like, so seamless
00:03:39.600 | and hidden that I think a lot of people didn't even realize it was happening.
00:03:43.840 | And we never shipped a co-pilot. We tried. It was co-pilot season. But
00:03:48.640 | the quality just wasn't there. You know, we have this quality bar, and it did not reach it. So
00:03:54.000 | I don't know if it was a lack of imagination for our team because we weren't like AI-pilled enough at the
00:03:59.680 | time, or it was like the capability of these early models. I think a bit of both, to be honest.
00:04:06.480 | So, you know, I think this was the right approach at the time, in a way. Like,
00:04:10.960 | a lot of people on Twitter kind of noticed. They're like, oh, these are very seamless features. You're
00:04:16.640 | not slapping AI in our face. Like, there were literally toothbrushes that said they had AI.
00:04:22.240 | I think it's probably much worse now, to be honest. But, you know, people kind of appreciated this
00:04:26.720 | approach of, like, small pragmatic value adds. And then, like, fast forward to 2024. And, you know,
00:04:33.760 | we've added a few things since then. But it really feels like at the end of 2024, we hit a
00:04:39.120 | turning point. You know, o3 coming out, the planning and reasoning models, the multimodal capabilities
00:04:46.320 | became available in the APIs. The context windows went through the roof. You know, you have, like, million-
00:04:51.280 | token contexts. Like, you can do crazy things with that. DeepSeek, of course, made a splash. And we felt
00:04:58.800 | like some of our experiments started to become a lot less brittle. And things actually felt smart.
00:05:06.240 | Things kind of clicked for the team a little bit more. We saw how deep this could go.
00:05:11.600 | So the first thing we did was rebuild our search index again. Which, I don't know,
00:05:20.640 | if you've ever, like, backfilled hundreds of millions of rows of embeddings,
00:05:26.000 | it takes a while. So we moved to a hybrid search approach. This was something that
00:05:30.480 | we had really felt was lacking over, like, the year and a half that we had pgvector sat on its
00:05:36.240 | own. We didn't put it in our main database because it was so huge. So it was kind of
00:05:41.920 | sat in its own thing. So we moved to Turbopuffer. If you've not heard of Turbopuffer, it's a really, really cool
00:05:47.520 | search index; I'd highly recommend giving it a look. And we moved our embeddings over to Cohere. After
00:05:52.720 | doing kind of a comparison, we felt that they were a lot better for our domain, at least, than OpenAI's.
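For flavor, a backfill step over the new embeddings might look roughly like this with Cohere's SDK; the model name and batching are illustrative choices, not confirmed details of Linear's setup.

```typescript
import { CohereClient } from "cohere-ai";

const cohere = new CohereClient({ token: process.env.COHERE_API_KEY! });

// Embed a batch of documents for indexing. At query time you would use
// inputType "search_query" instead of "search_document".
async function embedIssues(texts: string[]): Promise<number[][]> {
  const res = await cohere.embed({
    texts,
    model: "embed-english-v3.0",
    inputType: "search_document",
  });
  return res.embeddings as number[][];
}
```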
00:05:59.440 | So this kind of filled a gap in the search. And this actually just finished rolling out in, like,
00:06:06.880 | the last two weeks because the backfill took such a while. But now we thought, okay,
00:06:12.240 | we've got a really solid search foundation. What are we going to do with this?
00:06:15.520 | So first, we're building this feature called Product Intelligence.
00:06:20.720 | This is basically, like, similar issues V2. So instead of just doing simple cosine matching,
00:06:27.920 | we now have a pipeline. That pipeline is using query rewriting. It's using the hybrid search engine.
00:06:34.080 | It's re-ranking the results. We're using deterministic rules. And then out the other side,
00:06:38.720 | what we get is a map of relationships from any given issue to its related issues. And then how they are
00:06:46.560 | related, and why they are related.
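A rough sketch of the pipeline shape being described; every helper below is an assumption, stubbed as a signature, since the talk doesn't detail the internals.

```typescript
// Candidate issues as returned by retrieval; all names here are hypothetical.
type Candidate = { issueId: string; relation: string; reason: string };
declare function rewriteQuery(text: string): Promise<string>;
declare function hybridSearch(query: string, opts: { topK: number }): Promise<Candidate[]>;
declare function rerank(query: string, candidates: Candidate[]): Promise<Candidate[]>;

type Related = { issueId: string; relation: string; why: string };

async function relatedIssues(issue: { id: string; title: string; body: string }): Promise<Related[]> {
  // 1. Rewrite the raw issue text into a focused search query.
  const query = await rewriteQuery(`${issue.title}\n${issue.body}`);
  // 2. Hybrid retrieval: keyword plus vector search, merged.
  const candidates = await hybridSearch(query, { topK: 50 });
  // 3. Re-rank the candidates against the query.
  const ranked = await rerank(query, candidates);
  // 4. Deterministic rules, e.g. never suggest the issue itself.
  return ranked
    .filter((c) => c.issueId !== issue.id)
    .map((c) => ({ issueId: c.issueId, relation: c.relation, why: c.reason }));
}
```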
00:06:54.160 | And then what we're able to do with that is expose this in the product. I hope that's clear enough: we have suggested labels, suggested assignees,
00:07:00.480 | possible duplicates. And then on things like projects, it's like why this might be the right person to
00:07:08.960 | work on this issue or why this might be the right project for this. So, you know, we're working with,
00:07:13.360 | like, the OpenAIs of the world. They have thousands of tickets coming in. And they really have to have
00:07:17.760 | as much help as possible to kind of churn through them and get them into the hands of the
00:07:22.240 | right engineers. I think I skipped one. Yeah. So the next one was customer feedback analysis. This is
00:07:30.320 | something we're working on right now. So one of the other features of Linear is you can bring in all of
00:07:34.880 | the customer feedback from all of your channels and then use that to help to decide what you're going to
00:07:41.200 | build. And so obviously one of the steps there is, okay, we have hundreds of pieces of feedback.
00:07:46.800 | How do we figure out what to build from this, right? So of course, LLMs are great at analyzing text.
00:07:53.600 | And I think our head of product actually said that our analysis was able to beat
00:08:01.040 | 90% of the candidates he talks to in the interview process, in terms of the analysis they're
00:08:07.200 | able to do. So we're able to, yes, churn through hundreds or thousands of customer requests and then
00:08:12.320 | figure out for this given project, like, how might we split this up, what features might be created from
00:08:17.680 | this, which is pretty cool.
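As a sketch of the kind of analysis step being described: batch the feedback into one prompt and ask for proposed feature groupings. The prompt and output shape are assumptions, not Linear's actual pipeline.

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Group raw feedback snippets into candidate features for a project.
async function proposeFeatures(feedback: string[]) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          'Group the feedback into themes. Return JSON: {"features": ' +
          '[{"name", "rationale", "supportingFeedback": [indexes]}]}.',
      },
      { role: "user", content: feedback.map((f, i) => `${i}: ${f}`).join("\n") },
    ],
  });
  return JSON.parse(completion.choices[0].message.content ?? "{}");
}
```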
00:08:25.760 | Another feature we've already shipped is a daily or weekly pulse. This synthesizes all of the updates happening in your workspace and creates a summarized
00:08:34.000 | pulse. And then we also produce, like, an audio podcast version, which is pretty cool, because you
00:08:39.120 | can pull open our mobile app and then listen to that on your commute. I hope we have an RSS feed for
00:08:44.240 | it soon. I really want to just subscribe to it in a podcast player. So although I put podcast here, it's
00:08:49.040 | not quite a podcast. You have to have a mobile app or the desktop app. But this is great. You just, like,
00:08:54.000 | over breakfast, like, what has the team been up to while I was asleep?
00:09:03.120 | And then, yeah. So one other feature I'll go through here is this issue from video. So
00:09:14.240 | literally, so many bugs come in as video recordings from customers, right? Drop the video. We'll analyze
00:09:21.920 | it. We'll figure out the reproduction steps. And then we'll create the issue for you from that. This is
00:09:27.440 | maybe not the finest example of the feature, but it's another seamless but very powerful thing that saves a bunch of time.
00:09:34.320 | So, of course, we're baking as much into the platform as we can
00:09:40.560 | in terms of these things. But there's a limit to that, right? We can't put in everything. We don't know.
00:09:47.520 | Every team is different. Every team is shaped differently. So we want to make this pluggable.
00:09:52.080 | And this is kind of where agents come in. So the way we're thinking about agents is as
00:09:56.560 | infinitely scalable cloud-based teammates. So we launched a platform for this two weeks ago.
00:10:02.720 | We figure, you know, we're already doing a pretty good job of orchestrating humans.
00:10:07.680 | We are a communication tool for humans, after all. And if agents are going to be members of your team
00:10:14.080 | going forward, then they should also live in the same place where all of the human communication happens.
00:10:20.560 | So first, hopefully, if the internet stands up (I'm tethering), I've got some videos.
00:10:27.040 | Yeah. So CodeGen is one of the first coding agents that integrated with us.
00:10:34.080 | So they can -- is this going to play? Cool. Yeah. So CodeGen, you can assign it. You can mention it
00:10:42.320 | inside of Linear like any other user. And it will produce plans. It will produce PRs.
00:10:49.360 | You can see here, it's going to pop in -- boop. This is sped up, by the way. That took four minutes,
00:10:54.400 | not 20 seconds. But, yes, it will produce the PR. And then you can go and review it like you would any
other team member's work.
00:11:03.200 | This is really powerful, by the way. And you can -- because it's an agentic system in the background,
00:11:11.920 | you can also interact with it from -- not just from within Linear, but from within Slack or from
00:11:17.360 | other communication tools. And you can say, go and fix this ticket, and give it a Linear issue,
00:11:22.480 | and it will know how to connect it all up. Or you'll be able to interrupt it partway.
00:11:27.040 | Bucket is a feature flagging platform that integrated with the first version of our
00:11:34.640 | agents platform here. Let's see. Is this going to -- oh, no.
00:11:38.160 | All righty. Yeah. So in this case, you can just mention the Bucket agent and tell it to create a flag.
00:11:45.040 | It will create a feature flag for you. You can roll it out. You can check the status of things
00:11:50.080 | all within here. And, of course, because it's agentic, you don't have to go command by command.
00:11:54.400 | You can say, create a new flag, roll it out to 30% of users, and things like that.
00:12:00.000 | And then Charlie is another coding agent with access to your repository. It's really good at creating
00:12:07.760 | plans and doing, like, root cause analysis of bugs. So in this case, we have an issue here. It has a
00:12:15.120 | Sentry issue attached. We can just mention Charlie and ask it to do some research. So it can go and look at
00:12:23.120 | your recent commits. It can go look through the code base. And it can kind of figure out the cause of
00:12:29.040 | this issue. And you can imagine immediately, right, like, this has saved a lot of engineers'
00:12:35.760 | time. They can come in here and immediately see possible causes and regression reasons for this issue.
00:12:41.200 | So the examples I've shown so far have been kind of living in the comment area of an issue. Obviously,
00:12:50.000 | that's not quite where we want to be in the long term. So, you know, we're working on building
00:12:56.400 | additional surfaces for this in the product so that agents aren't just, like, the same as users on the
00:13:04.400 | team. They're kind of better because you can see what they're thinking. And I can't see what my teammates
00:13:09.200 | are thinking a lot of the time. So, yeah. So we'll have this surface where the agents can send you their
00:13:14.800 | observations. They can send you the tool calls. You're able to kind of go behind the scenes of the
00:13:19.840 | agent. You'll be able to interrupt it. And then this is kind of consistent across the whole workspace,
00:13:27.360 | right? So you have different coding agents. You have PM agents. One other company that's building an
00:13:34.640 | integration with us right now is Intercom, with the Fin agent. So you'll be able to do things like just say,
00:13:39.120 | hey, Fin, I fixed this bug. Can you go and reply to the 100 customers that reported it? And, you know,
00:13:45.840 | how much time did that just save? So we're building this interface out right now. And I expect to have
00:13:50.640 | it in a couple of weeks. But I've been really using these features a ton. And I've been hammering
00:13:57.520 | this for months. And I think it really changes the game. Think about the amount of bugs
00:14:02.560 | sitting in companies' backlogs: we kind of take for granted that you have this giant backlog
00:14:06.640 | that you're never going to get to the bottom of. I think there's just not going to be an excuse for
00:14:10.720 | that anymore. The agents can tackle it for you. There's nothing to stop you assigning every single
00:14:18.240 | issue in your backlog out to an agent. Have it do a first pass. Maybe 50% of them will be fixed by
00:14:22.640 | the end of the week. So I think, yes, we're really in this world now where you can build more. You can
00:14:28.720 | build higher quality because more of the grunt work is being done. And you can build faster.
00:14:33.600 | How much time we got? So I'll just talk a little bit about, like, the architecture of this.
00:14:41.680 | So, yeah, in Linear, agents are first-class users. They have identity. They have history. You can see
00:14:48.800 | everything they do. There's a full audit trail of those events. You install them via OAuth.
00:14:54.800 | And then once they're installed, kind of any admin on the team can manage that agent and its access.
00:15:03.360 | And they work fully transparently.
00:15:05.760 | So we have a very mature GraphQL API at this point, which basically enables agents to do anything in the
00:15:15.920 | product that a human could do, with granular scopes. And then we added brand-new webhooks for this
00:15:22.240 | specifically, where if you are developing an agent with Linear, you will get webhooks when events happen
00:15:29.760 | that are specific to your agent. So somebody replies to your agent. Your agent was triggered on this issue.
00:15:36.640 | We also added some additional scopes that you can opt into to choose whether your agent is mentionable or assignable.
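A minimal sketch of what receiving those agent webhooks could look like. The event type names in the switch are assumptions for illustration; the real payload shapes live in Linear's developer docs.

```typescript
import express from "express";

const app = express();
app.use(express.json());

app.post("/webhooks/linear", async (req, res) => {
  // Acknowledge immediately, then do the actual work asynchronously.
  res.sendStatus(200);
  const event = req.body;
  switch (event.type) {
    case "AgentMention": // someone @-mentioned the agent (assumed name)
    case "AgentAssigned": // the agent was assigned an issue (assumed name)
      await handleTrigger(event);
      break;
  }
});

// Assumed handler, stubbed as a signature.
declare function handleTrigger(event: unknown): Promise<void>;

app.listen(3000);
```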
00:15:45.360 | And then, as part of that kind of future UI that I just showed, we're also working on a new SDK to be released at the same time,
which will just make that really, really easy. So right now you can build all this stuff.
00:15:57.760 | It's on our existing API. And you kind of have to figure out a bit more yourself, I would say. So we're kind of
00:16:03.360 | building this abstraction layer, this sugar, where you can very, very easily integrate with the platform.
00:16:13.760 | So, yeah, I'll finish with some of the best practices that we found working with these partners over the last
00:16:21.840 | couple of months. You know, it really felt like we're kind of on the cutting edge here, building it
00:16:28.720 | while the agents themselves still hadn't launched in a lot of ways. You know, like Google and Codex only just
00:16:33.760 | launched theirs within the last couple of weeks. So the first is to respond very quickly and very
00:16:42.560 | precisely when folks trigger your agent. So if I mention your agent, it should respond as fast as
00:16:49.760 | possible. A lot of what we've seen is people using emoji reactions for that right now.
00:16:55.840 | Yeah, and then respond in a way that reassures the user that the agent
00:17:06.960 | understood the request. You know, so it's like, if you say, at-CodeGen, can you take care of this? The response
00:17:13.040 | should be something like: I will produce a PR for this specific thing you asked me. It's like, okay, you
00:17:17.920 | understood what I meant. Great.
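A sketch of that quick, precise acknowledgement using Linear's public SDK; createComment is part of @linear/sdk, but the exact acknowledgement flow here is illustrative.

```typescript
import { LinearClient } from "@linear/sdk";

const linear = new LinearClient({ apiKey: process.env.LINEAR_API_KEY! });

// Reply fast with a restatement of the request, so the user knows the
// agent understood before the long-running work starts.
async function acknowledge(issueId: string, request: string) {
  await linear.createComment({
    issueId,
    body: `On it. I'll open a PR that addresses: "${request}".`,
  });
}
```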
00:17:25.840 | Inhabit the platform. This is a little Linear-specific in this example, but I think it applies anywhere. We really expect that these agents are not Linear
00:17:32.000 | agents. They are the agents that live in the cloud, and one of the ways that they interact is through
00:17:37.200 | Linear, right? It's just a window into their behavior, and hopefully a really well-
00:17:42.320 | structured one where they get a lot of context. But we really think that, you know, if you're working
00:17:48.160 | within Slack, you should use the language of those platforms and not confuse
00:17:52.800 | things, and put great effort into that. And then, one of the things that we expect to
00:17:59.840 | happen inside of Linear is, if you're working on an issue, you should move that issue to In Progress.
00:18:04.480 | Don't just leave it in the backlog. You expect that of your teammates, and we expect that of agents as
00:18:08.960 | well. And then just, again, like natural behavior. So if somebody triggered you and then they replied
00:18:17.120 | in that thread, you shouldn't need to mention the agent again to get a response. It should be a natural
00:18:22.560 | behavior that if you reply to them, they will respond.
00:18:31.040 | Yeah, don't be clever. Clarify your intent before acting. I think we see a lot of attempts at
00:18:38.320 | one-shots. One pattern that we're seeing right now coming out of a lot of the coding agents is they'll
00:18:44.160 | form a plan before doing anything and communicate that plan up front and get clarification on it. So
00:18:52.240 | that's something that we definitely expect to happen.
00:18:55.360 | And finally, you know, be sure you're adding value. I think, you know, LLMs, they love to just
00:19:05.520 | produce tons of text. We don't want to see text splatted straight out of OpenAI into comments, into issues,
00:19:12.160 | into any other services. Be concise. Be useful. Be like a good teammate would be. You can always fall
00:19:20.480 | back on asking, like, what would a human do in this situation? And try your best to achieve that.
00:19:25.760 | Cool. That's it. Thanks for listening. And if you're interested in working with us on this platform or
00:19:35.360 | integrating with Linear, let me know.