
3 ingredients for building reliable enterprise agents - Harrison Chase, LangChain/LangGraph


00:00:01.000 | SPEAKER 1: I want to talk today a little bit
00:00:16.600 | about trying to build reliable agents in the enterprise.
00:00:19.920 | This is something we work on with a lot of people:
00:00:22.680 | both developers inside an enterprise
00:00:25.240 | looking to build agents for their company,
00:00:27.480 | and people looking to build solutions
00:00:32.020 | and sell them into enterprises.
00:00:35.060 | And so I wanted to talk a little bit about some
00:00:37.420 | of what we see kind of being the success tips and tricks
00:00:40.720 | for making this happen.
00:00:42.100 | So the vision of the future for agents that I,
00:00:46.880 | and I think other people, share is that there'll
00:00:49.600 | be a lot of them.
00:00:50.260 | They'll be running around the enterprise
00:00:51.580 | doing different things.
00:00:52.840 | There'll be an agent for every different task.
00:00:55.180 | We'll be coordinating with them.
00:00:56.440 | We'll be kind of like a manager, a supervisor.
00:00:58.880 | And so how do we get to that vision?
00:01:02.600 | And what parts of this will kind of arrive before the others?
00:01:10.960 | And so I was thinking about this question.
00:01:12.720 | What makes some agents kind of succeed in the enterprise
00:01:15.920 | and some fail?
00:01:17.040 | And I was chatting with my friend Asaf.
00:01:19.640 | He's the head of AI at Monday.
00:01:21.080 | He also wrote GPT Researcher.
00:01:22.580 | It's a great open source package.
00:01:24.880 | I was chatting with him a few weeks ago.
00:01:27.220 | And a lot of the ideas here are borrowed
00:01:29.800 | from that conversation.
00:01:30.820 | He'll probably write a blog post about this
00:01:33.500 | with a slightly different framing, which I would encourage
00:01:35.680 | everyone to check out.
00:01:36.880 | So I just want to give him a massive shout out.
00:01:38.520 | And if you have the opportunity to chat with him,
00:01:40.820 | you should definitely take that opportunity.
00:01:44.560 | Thinking about it from first principles,
00:01:46.400 | what makes agents successful in the enterprise?
00:01:51.680 | These probably aren't going to sound
00:01:52.620 | earth-shattering, but hopefully we'll get
00:01:54.480 | to some interesting points.
00:01:57.160 | The more value an agent provides when it's right,
00:02:02.060 | the more likely it is to be adopted.
00:02:04.220 | The more likely it is to succeed,
00:02:06.800 | the more likely it is to be adopted.
00:02:09.140 | And then there's the cost if it's wrong.
00:02:11.560 | If there are big costs when it's wrong,
00:02:13.260 | then it will be less likely to be adopted.
00:02:17.300 | So I think these are three ingredients,
00:02:19.640 | which are pretty simple and pretty basic,
00:02:21.580 | but I think provide an interesting first principles
00:02:23.960 | approach for how to think about building agents
00:02:26.100 | and what types of agents find success.
00:02:29.380 | And I say in the enterprise here,
00:02:31.120 | but I also think this applies just generally
00:02:33.080 | within society.
00:02:36.480 | If we want to try to put this into a fun little equation,
00:02:39.020 | we can multiply the probability that something succeeds times
00:02:43.180 | the value that you get when it succeeds,
00:02:44.740 | and then do the opposite for the cost when it's wrong.
00:02:47.560 | And of course, this needs to be greater
00:02:49.140 | than the cost of running the agent for you
00:02:51.040 | to want to put it into production.
00:02:52.440 | So yeah, a fun little stats-slash-math formula.
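
Written out (the notation below is mine; the talk only describes the formula verbally), the condition looks something like:

```latex
% Rough formalization of the verbal formula above (my notation):
%   p        = probability the agent succeeds
%   V        = value delivered when it succeeds
%   C_wrong  = cost incurred when it fails
%   C_run    = cost of running the agent
% Deploy when the expected value clears the running cost:
\[
  p \cdot V \;-\; (1 - p) \cdot C_{\text{wrong}} \;>\; C_{\text{run}}
\]
```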
00:02:58.720 | So how can we build agents that score higher on this?
00:03:01.680 | Because this hasn't been anything
00:03:03.480 | earth-shattering so far.
00:03:04.860 | Hopefully, we'll get to some fun insights
00:03:06.180 | when we talk about how to make that equation go up.
00:03:11.020 | So how can we increase the value of things when they go right?
00:03:17.980 | And what types of agents have higher value?
00:03:20.720 | So part of this is choosing problems
00:03:24.520 | where there is really high value.
00:03:26.980 | So a lot of the agents that have been successful so far--
00:03:30.120 | Harvey in the legal space is one of them.
00:03:33.880 | In the finance space, we see stuff around research
00:03:36.240 | and summarization.
00:03:36.980 | These are high value work tasks.
00:03:39.300 | People pay a lot of money for lawyers
00:03:42.320 | and for research and investment research.
00:03:45.620 | And so these are examples of what I would
00:03:47.160 | call high-value tasks.
00:03:51.120 | There are other ways to improve
00:03:53.080 | the value of what you're working on besides
00:03:55.520 | switching verticals completely.
00:03:56.880 | And I think we're starting to see some of this,
00:03:58.760 | especially more recently.
00:04:00.380 | So if we think about RAG, or about
00:04:02.360 | older-school question-answering
00:04:06.380 | solutions, they would often respond quickly,
00:04:11.220 | And we're starting to see a trend towards things
00:04:13.320 | like deep research, which go and run for an extended period
00:04:17.060 | of time.
00:04:17.720 | We're seeing the same with code.
00:04:19.060 | We started with Cursor.
00:04:19.900 | It has inline autocomplete,
00:04:21.600 | maybe some chat question answering there.
00:04:23.820 | In the past three weeks, there have been, what,
00:04:26.340 | seven different examples of these ambient agents
00:04:28.540 | that run in the background for hours at a time.
00:04:30.980 | And I think this speaks to ways that people are trying
00:04:33.240 | to get their agents to provide more value.
00:04:35.300 | They're getting them to do more work, pretty basic.
00:04:39.260 | But I do think that if you think about this
00:04:41.000 | future of agents working
00:04:43.440 | and what that means, that doesn't mean a copilot.
00:04:46.360 | That means something working more autonomously
00:04:48.320 | in the background, doing more amounts of work.
00:04:52.140 | So besides focusing on areas or verticals
00:04:56.280 | that provide value, I think you can also
00:04:59.380 | reshape the UI/UX, the interaction pattern
00:05:02.800 | of what you're building, to be more long-running
00:05:05.960 | and do more substantial work.
00:05:11.140 | Let's talk about now the probability of success.
00:05:13.120 | How do we make this go up?
00:05:15.200 | So there's two different aspects I want to talk about here.
00:05:20.000 | One, I think, is about the reliability of agents.
00:05:22.180 | If you've built agents before, it's
00:05:25.720 | easy to get something that works in a prototype.
00:05:27.800 | It runs once, great.
00:05:28.720 | You can make a video, put it on Twitter.
00:05:30.140 | But it's hard to make it work reliably
00:05:31.600 | and put it in production.
00:05:34.140 | And by the way, for some types of agents,
00:05:37.060 | that's totally fine.
00:05:43.300 | You can have agents that run for a while
00:05:47.560 | without you knowing exactly what they do, and that's totally fine.
00:05:51.580 | But especially in the enterprise, we see oftentimes
00:05:53.800 | that people want more predictability, more control
00:05:56.740 | over what steps actually happen inside the agents.
00:05:59.620 | Maybe they always want to do step A after step B.
00:06:02.760 | And so if you prompt an agent to do that, great.
00:06:05.080 | It might do that like 90% of the time.
00:06:07.080 | You don't know what the LLM will do.
00:06:08.580 | If you put that in a deterministic workflow or code,
00:06:12.120 | then it will always do that.
00:06:13.400 | And so especially in the enterprise,
00:06:14.740 | we see that there are workflow-like things
00:06:18.580 | where you need more controllability, more predictability,
00:06:21.980 | than you get by just prompting.
00:06:23.660 | And so what we've seen is the solution for this
00:06:26.300 | is basically make more and more of your agent deterministic.
00:06:30.200 | There is this concept of workflows versus agents.
00:06:32.840 | Anthropic wrote a great blog post on this
00:06:35.160 | that I'd encourage you to check out.
00:06:37.380 | I would argue that instead of workflows versus agents,
00:06:40.080 | it's oftentimes workflows and agents.
00:06:42.960 | We see that parts of an agentic system
00:06:46.120 | are sometimes looping, calling a tool.
00:06:48.440 | And sometimes they're just doing A after B after C.
00:06:50.880 | An example of this is when you think
00:06:52.140 | about multi-agent architectures.
00:06:53.780 | If you think about an architecture that has agent A,
00:06:56.880 | and then after agent A finishes, you always call agent B.
00:06:59.760 | Is that a workflow?
00:07:01.040 | Is that an agent?
00:07:01.780 | It's this middle ground.
00:07:03.580 | And so as we think about building tools for this future,
00:07:07.480 | one of the things that we've released is LangGraph.
00:07:10.380 | LangGraph is an agent framework.
00:07:11.900 | It's very different from other agent frameworks,
00:07:13.940 | where it really leans in to this spectrum of workflows
00:07:17.780 | and agents and allows you to be wherever is best
00:07:21.180 | for your application on that curve.
00:07:24.180 | And where on that curve is best totally
00:07:26.100 | depends on the application that you're building.
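
As a rough illustration of that workflows-and-agents spectrum, here is a minimal LangGraph sketch (the node names and placeholder bodies are invented for this example, not code from the talk):

```python
# Illustrative sketch: a fixed preprocessing step (the "workflow" part)
# feeding an LLM-driven tool loop (the "agent" part).
from typing import Annotated
from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list, add_messages]

def preprocess(state: State) -> dict:
    return {}  # deterministic step that always runs first

def agent(state: State) -> dict:
    return {}  # LLM call that may request tool calls

def tools(state: State) -> dict:
    return {}  # execute whichever tool the LLM requested

def should_continue(state: State) -> str:
    # loop back through tools while the last message requests them
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else END

builder = StateGraph(State)
builder.add_node("preprocess", preprocess)
builder.add_node("agent", agent)
builder.add_node("tools", tools)
builder.add_edge(START, "preprocess")    # fixed edge: step A always first
builder.add_edge("preprocess", "agent")  # then hand off to the agent
builder.add_conditional_edges("agent", should_continue)  # agentic loop
builder.add_edge("tools", "agent")
graph = builder.compile()
```

The fixed edges give the predictability described above; the conditional edge is where the LLM gets to decide.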
00:07:28.200 | There is another lever besides just building
00:07:34.280 | and changing the agent itself.
00:07:35.100 | I think there's oftentimes really high error bars
00:07:40.640 | that people have when they think about how likely an agent
00:07:43.060 | is to work.
00:07:43.740 | This technology is new, and when you're trying
00:07:47.660 | to get something built or approved or put into production
00:07:50.080 | inside an enterprise,
00:07:50.880 | there's a lot of uncertainty and fear around it.
00:07:56.540 | And I think that relates to this fundamental uncertainty
00:08:01.700 | around how the agent is performing.
00:08:04.080 | And so besides just making it better,
00:08:07.100 | a really important thing that we see
00:08:09.080 | to do inside the enterprise, whether you're
00:08:11.860 | bringing a third party agent and selling it as a service,
00:08:14.880 | or whether you're building inside the enterprise yourself,
00:08:18.800 | is to work to reduce the way that people see the error bars
00:08:23.800 | of how this agent performs.
00:08:25.740 | So what I mean by that specifically
00:08:27.700 | is that this is where observability and evals
00:08:30.160 | actually plays a slightly different role
00:08:32.920 | than we would maybe think or we would maybe intend.
00:08:35.740 | So we have an observability and eval solution called LangSmith.
00:08:39.400 | We built it for developers so that they could see what's
00:08:41.740 | going on inside their agent.
00:08:43.480 | It's also proved really, really valuable for communicating
00:08:46.940 | to external stakeholders what's going on inside the agent
00:08:51.000 | and how the agent performs and where it messes up
00:08:54.000 | and where it doesn't mess up and basically communicate
00:08:56.340 | these kind of patterns.
00:08:58.400 | And so again, the observability part,
00:09:00.660 | you can just see every step that's
00:09:02.040 | happening inside the agent.
00:09:03.460 | This reduces the uncertainty that people
00:09:05.860 | have around what the agent is actually doing.
00:09:07.920 | They can see that it's making three, five LLM calls.
00:09:11.140 | It's not just one.
00:09:11.820 | They're actually being really thoughtful about the steps
00:09:13.500 | that are happening.
00:09:14.420 | And then you can benchmark it against different things.
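
As a hedged sketch of what that benchmarking might look like with the LangSmith SDK (the dataset name, agent stub, and evaluator below are all invented for illustration):

```python
# Illustrative only: run an agent over a LangSmith dataset and score it.
# Assumes LANGSMITH_API_KEY is set and a dataset with this (invented)
# name already exists in your LangSmith workspace.
from langsmith import evaluate

def my_agent(inputs: dict) -> dict:
    # call your agent here; this stub just echoes the question
    return {"answer": f"stub answer to: {inputs['question']}"}

def correctness(outputs: dict, reference_outputs: dict) -> bool:
    # toy exact-match evaluator; real evals are often LLM-as-judge
    return outputs["answer"] == reference_outputs["answer"]

results = evaluate(
    my_agent,
    data="enterprise-qa-examples",  # hypothetical dataset name
    evaluators=[correctness],
)
```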
00:09:17.800 | And so there's a great story of a user of ours
00:09:20.960 | who used LangSmith initially to build the agent,
00:09:24.340 | but then brought it and showed it to the review panel
00:09:26.800 | as they were trying to get their agent approved
00:09:28.380 | to go into production.
00:09:29.680 | And they ended the meeting under time, which almost never
00:09:33.080 | happens if you've been to these review panels.
00:09:35.560 | And they showed them basically everything
00:09:38.260 | inside LangSmith.
00:09:39.060 | And it helped reduce the perceived risk
00:09:43.060 | that people had of these agents.
00:09:47.800 | And then the last thing I want to talk about
00:09:51.160 | is the cost of something if it's wrong.
00:09:56.620 | Similar to the probability of things being right,
00:10:00.260 | this plays an outsized role, especially
00:10:04.500 | in larger enterprises, among review boards and managers,
00:10:07.240 | in people's perception of these agents.
00:10:08.800 | People hear stories of agents going wild
00:10:10.920 | and causing brand damage or giving away things for free.
00:10:16.420 | I think there's an outsized perception
00:10:19.060 | of what could happen if things go bad.
00:10:25.780 | And so I think there's a few UI/UX tricks that people are doing
00:10:31.580 | and that successful agents have to just make this a non-issue.
00:10:36.220 | So one is just make it easy to reverse the changes
00:10:39.380 | that the agent makes.
00:10:40.700 | So if you think about code-- and this
00:10:42.040 | is a screenshot of Replit Agent--
00:10:44.000 | it generates a diff in a PR.
00:10:47.400 | Code's really easy to revert.
00:10:48.640 | You go back to the previous commit.
00:10:50.860 | And so I think that's part of the reason
00:10:52.580 | why we see code being one of the first real places
00:10:57.040 | that you can apply agents besides the fact
00:10:58.940 | that the models are trained on it.
00:11:00.080 | It's also that when you use these agents,
00:11:02.840 | you create all these commits.
00:11:04.520 | And well, it depends how you do it.
00:11:06.680 | Replit does it in a very clever way where every time they change
00:11:09.320 | a file, they save it as a new commit.
00:11:10.740 | So you can always go back.
00:11:11.660 | You can always revert what the agent does.
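
A small sketch of that commit-per-edit idea (my own illustration of the pattern, not Replit's actual implementation; paths and messages are made up):

```python
# Illustrative only: snapshot every agent edit as its own git commit
# so any individual change can be reverted later.
import subprocess

def commit_agent_edit(repo_dir: str, path: str, new_content: str) -> None:
    # Write the agent's change, then record it as a standalone commit.
    with open(f"{repo_dir}/{path}", "w") as f:
        f.write(new_content)
    subprocess.run(["git", "-C", repo_dir, "add", path], check=True)
    subprocess.run(
        ["git", "-C", repo_dir, "commit", "-m", f"agent: edit {path}"],
        check=True,
    )

# Reverting the agent's most recent change is then just:
# subprocess.run(["git", "-C", repo_dir, "revert", "--no-edit", "HEAD"],
#                check=True)
```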
00:11:15.080 | And then the second part is having a human in the loop.
00:11:18.440 | So rather than merging code changes into main directly,
00:11:24.140 | open up a PR.
00:11:24.920 | That's putting the human in the loop.
00:11:26.400 | And so the agent isn't directly making changes.
00:11:29.080 | There's a human who's approving
00:11:30.760 | what the agent does.
00:11:33.920 | And this seems maybe a little subtle,
00:11:37.120 | but I think it completely changes the cost calculations
00:11:40.320 | in people's minds about what the cost of the agent doing
00:11:42.720 | something bad is because now it's reversible,
00:11:45.260 | and you have a human who is going
00:11:46.620 | to prevent it from even going in in the first place
00:11:48.780 | if it's bad.
00:11:50.480 | And so human in the loop is one of the big things
00:11:52.620 | that we see people selling to these enterprises,
00:11:55.680 | and building inside them, really leaning into.
00:11:59.440 | So to make this a little bit more concrete,
00:12:01.340 | what are some examples of this?
00:12:02.860 | I think deep research is a pretty good example of this.
00:12:06.100 | If we think about this, there is a period of time
00:12:10.280 | upfront when you're messaging with deep research
00:12:12.000 | where you go back and forth.
00:12:12.940 | It asks you follow-up questions, and you calibrate
00:12:14.960 | on what you want to research.
00:12:16.560 | That puts the human in the loop.
00:12:18.960 | It also makes sure that it gets a better result.
00:12:21.340 | So it increases the value that you're
00:12:23.060 | going to get from the report because it's more aligned
00:12:24.960 | with what you actually want.
00:12:26.580 | And then deep research doesn't take this
00:12:29.660 | and publish it as a blog post on the internet,
00:12:31.560 | and it doesn't take it and email it to your clients.
00:12:33.220 | It produces just a report that you can read
00:12:35.740 | and decide what to do with.
00:12:36.800 | So it's not actually doing anything.
00:12:38.800 | It's up to you to take that and do things.
00:12:41.460 | I think similarly, code
00:12:44.200 | is another great example.
00:12:47.560 | Claude Code also has this ability
00:12:50.140 | where it asks questions.
00:12:51.140 | It clarifies things.
00:12:52.860 | This is to both keep the human in the loop,
00:12:55.360 | but also make sure that it yields better results.
00:12:57.360 | And then again, with code, maybe you're not making a commit
00:13:00.040 | every time you change things.
00:13:01.140 | But it's on a separate branch.
00:13:02.440 | You open up PR.
00:13:03.160 | You're not pushing directly to master.
00:13:05.700 | And so I think these are examples of things
00:13:09.000 | in the general industry that follow some of these patterns.
00:13:14.540 | So, OK.
00:13:15.040 | So we've figured out a few levers that we
00:13:16.600 | can pull to try to make our agents more compelling
00:13:21.620 | to deploy in the enterprise.
00:13:23.720 | What next?
00:13:24.260 | What next is how do we scale that?
00:13:27.160 | So if this has positive value, then what we really want to do
00:13:31.880 | is just multiply this a bunch and scale it up a bunch.
00:13:34.900 | And I think this speaks to the concept
00:13:37.520 | of ambient agents: in this futuristic view,
00:13:43.400 | agents working in an enterprise are doing
00:13:46.820 | things in the background.
00:13:49.160 | They're not being kicked off manually by humans.
00:13:52.900 | They're being triggered by different events.
00:13:56.440 | And I think the reason that this is so powerful
00:13:58.660 | is that it scales up this positive expected value
00:14:01.680 | thing even more than we can.
00:14:04.000 | Like I can only really have one--
00:14:05.660 | maybe I can have two chat boxes open at the same time.
00:14:08.440 | But now there can be hundreds of these
00:14:10.680 | running in the background.
00:14:12.220 | And so when we think about the difference
00:14:13.780 | between chat agents, which I would argue we've mostly seen,
00:14:17.080 | and ambient agents, one big difference is ambient agents
00:14:19.840 | are triggered by events.
00:14:21.140 | That lets us scale ourselves: instead of one-to-one,
00:14:23.300 | it's now one-to-many conversations
00:14:24.840 | that can be happening.
00:14:26.480 | And so the concurrency of these agents that can be running
00:14:28.880 | goes from one to unlimited.
00:14:32.600 | The latency requirements also change.
00:14:35.040 | So with chat, you have this UX expectation
00:14:37.900 | that it responds really, really quickly.
00:14:40.080 | And that's not the case with ambient agents,
00:14:42.140 | because they're triggered without you even knowing.
00:14:43.980 | So why would you even care how long it's running?
00:14:47.860 | And so what does this let you do?
00:14:49.760 | Why does this matter?
00:14:50.420 | This lets you do more complex operations.
00:14:52.420 | So you can do more things.
00:14:53.680 | So you can start to build up a bigger body of work.
00:14:57.260 | You can go from changing one line of code
00:14:59.600 | to changing a whole file or making a new repo or any of that.
00:15:02.960 | And so instead of this agent just responding directly
00:15:05.420 | or calling a single tool call, which usually happens
00:15:07.600 | in these chat applications because of the latency requirements,
00:15:10.100 | it can now do these more complex things.
00:15:12.020 | And so the value can start increasing in terms
00:15:14.860 | of what you're doing.
00:15:16.520 | And then the other thing that I want to emphasize
00:15:19.780 | is that there's still a UX for interacting
00:15:23.080 | with these agents.
00:15:24.780 | So ambient does not mean fully autonomous.
00:15:26.680 | And this is really, really important.
00:15:28.340 | Because autonomous-- when people hear autonomous,
00:15:30.340 | they think the cost of this thing doing something bad
00:15:32.920 | is really high.
00:15:34.480 | Because I'm not going to be able to oversee it.
00:15:37.020 | I don't know what's going on.
00:15:38.300 | How do-- it could go out there and run wild.
00:15:40.280 | And so ambient does not mean fully autonomous.
00:15:43.220 | And so there are a lot of different human-in-the-loop
00:15:46.060 | interaction patterns that you can bring
00:15:49.120 | into these background, ambient agents.
00:15:51.180 | There can be an approve-reject pattern where for certain tools,
00:15:54.220 | you want to explicitly say, yes, it's OK to call this tool.
00:15:57.180 | You might want to edit the tool that it's calling.
00:15:59.220 | So if it messes up a tool call, you
00:16:00.720 | can actually just correct it in the UI.
00:16:03.000 | You might want to give it the ability to ask
00:16:05.280 | questions so that you can answer them.
00:16:06.880 | You can provide more info if it gets stuck halfway
00:16:09.420 | through.
00:16:10.080 | And then time travel is something that we call human on the loop
00:16:13.340 | as well.
00:16:13.840 | So this is after the agent runs.
00:16:15.000 | If it messed up on step 10 out of 100,
00:16:17.620 | you can reverse back to step 10 and say, hey, no,
00:16:20.880 | resume from here but do this other thing slightly differently.
00:16:24.200 | And so human in the loop, we think, is super, super important.
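
A hedged sketch of the approve-reject pattern using LangGraph's interrupt mechanism (the node, payload shape, and resume value here are invented for illustration, not code from the talk):

```python
# Illustrative only: pause an agent before a risky action and resume
# once a human decides. Assumes a `draft` field exists in graph state.
from langgraph.types import interrupt, Command
from langgraph.checkpoint.memory import MemorySaver

def send_email(state: dict) -> dict:
    # Surface the proposed action to a human and pause the run here.
    decision = interrupt({"action": "send_email", "draft": state["draft"]})
    if decision.get("approved"):
        pass  # actually send the email
    return {}

# interrupt() requires a checkpointer so paused runs can be resumed:
#   graph = builder.compile(checkpointer=MemorySaver())
# A human later resumes the same thread with their decision:
#   graph.invoke(Command(resume={"approved": True}), config=thread_config)
#
# The time-travel / human-on-the-loop pattern described above works off
# the same checkpoints: list them with graph.get_state_history(config)
# and re-run from an earlier checkpoint's config.
```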
00:16:27.820 | The other thing that I want to call out just briefly
00:16:30.020 | is I think there's this intermediary state
00:16:33.700 | where we're starting to be right now.
00:16:35.880 | I wouldn't call deep research or Claude Code
00:16:39.300 | or any of these coding agents ambient agents
00:16:41.600 | because they're still triggered by a human.
00:16:43.600 | But I think these are good examples of sync to async agents.
00:16:48.120 | And so Factory is a coding agent.
00:16:50.580 | They use the term async coding agents.
00:16:53.060 | And I really like that.
00:16:54.700 | But I think this sync-to-async agent pattern
00:16:57.760 | is a natural progression if you think about it.
00:16:59.720 | Like right now-- or a year ago,
00:17:01.040 | everything was a sync agent.
00:17:03.280 | We were chatting with it.
00:17:04.100 | It was very much in the moment.
00:17:05.460 | The future is probably these autonomous agents working
00:17:07.520 | in the background, still pinging us when they need help.
00:17:09.640 | But there's this intermediate state where
00:17:11.460 | the human kicks it off, uses that kind of human in the loop
00:17:14.340 | at the start to calibrate on what you want it to do.
00:17:16.780 | And so I think that table I showed of chat and ambient
00:17:20.900 | is actually probably missing a column in the middle:
00:17:22.840 | these sync-to-async agents.
00:17:25.220 | Anyways, an example of a UX that we think
00:17:28.240 | can be interesting for these ambient agents
00:17:30.260 | is what we call the agent inbox, which
00:17:32.560 | is where you surface all the actions that the agent wants
00:17:34.960 | to take that need your approval.
00:17:36.080 | And then you can go in and approve, reject,
00:17:37.780 | leave feedback, things like that.
00:17:39.940 | Just to tie this together and make it really concrete
00:17:42.620 | what I mean by ambient agents:
00:17:44.500 | Email, I think, is a really natural place for ambient agents.
00:17:47.120 | These agents can listen to incoming emails.
00:17:49.580 | Those are events.
00:17:50.680 | They can run on however many emails come in.
00:17:53.020 | So that's, in theory, unlimited.
00:17:55.900 | But you still probably want the human, the user,
00:17:59.960 | to approve any emails that go out or any calendar events
00:18:03.220 | that get sent, depending on your level of comfort.
00:18:06.300 | And so this is a concrete thing.
00:18:09.200 | I actually built one that I have myself.
00:18:11.640 | We've used it to kind of test out a lot of these things.
00:18:14.240 | If people want to try it out, there is a QR code
00:18:16.280 | that you can scan and get the GitHub repo.
00:18:17.980 | It's all open source.
00:18:19.620 | And I think it's not the only example of ambient agents,
00:18:23.960 | but it's one that I've built myself,
00:18:26.040 | and so we talk about it a lot internally.
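
As a rough sketch of that event-triggered shape (my own illustration, not the actual code in the repo; the webhook route and helper functions are invented):

```python
# Illustrative only: one agent run per incoming email event, with a
# human approving outbound mail via an inbox instead of auto-sending.
from fastapi import FastAPI, Request

app = FastAPI()

async def triage_and_draft(event: dict) -> dict:
    # hypothetical agent call: classify the email and draft a reply
    return {"to": event.get("from"), "body": "..."}

async def send_to_agent_inbox(draft: dict) -> None:
    # hypothetical: queue the draft for human approval, don't send yet
    ...

@app.post("/webhooks/email")  # e.g. wired to a mail provider's push events
async def on_incoming_email(request: Request):
    event = await request.json()
    draft = await triage_and_draft(event)    # one run per incoming email
    await send_to_agent_inbox(draft)         # human stays in the loop
    return {"status": "queued"}
```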
00:18:29.820 | That's all I have.
00:18:31.120 | I'm not sure if there's time for questions or not.
00:18:33.700 | One or two questions, if people have them.
00:18:35.740 | AUDIENCE MEMBER: My question is: although everybody's
00:18:48.880 | talking about agents, only code-generating agents
00:18:52.640 | are the ones who are getting funding.
00:18:54.480 | Is it because you can measure what you have done
00:18:57.580 | and you can reverse what you have done?
00:18:59.380 | For all other agents, you can do a lot of stuff,
00:19:02.160 | but you cannot measure what you have done.
00:19:03.820 | You cannot reverse what you have done.
00:19:05.500 | SPEAKER 1: Yeah.
00:19:06.000 | I think there's a variety of reasons.
00:19:09.380 | Of those two, the measure thing probably more so.
00:19:15.380 | A lot of the large model labs train on a lot of coding
00:19:19.300 | data because you can test whether it's correct or not.
00:19:21.420 | You can run it, see if it compiles.
00:19:22.800 | Same with math data.
00:19:23.680 | Math is very-- it's verifiable, right?
00:19:25.680 | So math and code are two examples of verifiable domains.
00:19:28.420 | Essay writing is less verifiable.
00:19:29.920 | What does it mean for an essay to be correct?
00:19:31.580 | That's far more ambiguous.
00:19:33.080 | And so because of these verifiable things,
00:19:35.580 | you're able to bootstrap a lot of training data.
00:19:37.580 | And so there's a lot of training data in the models already
00:19:40.040 | about code.
00:19:40.540 | And so the models are better at that.
00:19:41.800 | That makes the agents that use those models better at that.
00:19:44.660 | Then the second part, I do think code lends itself
00:19:47.860 | naturally to this commit and this draft and this preview thing.
00:19:51.280 | I think that's more generalizable.
00:19:53.380 | So legal is a great example.
00:19:54.740 | Legal, you can have first drafts of things.
00:19:57.180 | That's very common.
00:19:57.840 | Same with essay writing.
00:19:59.040 | I think the concept of a first draft
00:20:01.260 | is actually a really good UX to aim for.
00:20:03.120 | It lets you do far more.
00:20:04.460 | It also puts the human in the loop.
00:20:05.920 | And so you get this dual kind of--
00:20:09.480 | like if you put the human in a loop at every step,
00:20:11.420 | that doesn't provide any value.
00:20:12.740 | Each step is so small.
00:20:14.080 | So the key is finding these UX patterns
00:20:16.800 | where the agent does a ton of work,
00:20:18.540 | but the human's still in the loop at key points.
00:20:20.920 | And first drafts, I think, are a great mental model for that.
00:20:24.460 | So anything where there are first drafts: legal, writing,
00:20:27.840 | code, I think that's a little bit more generalizable.
00:20:32.480 | The verifiable stuff, that's a little bit tougher.
00:20:35.840 | Yeah.
00:20:41.140 | AUDIENCE MEMBER: Oh, no, I'm good.
00:20:41.820 | I'll talk to you afterwards.
00:20:43.020 | ANDREW BROGDON: Cool.
00:20:43.840 | Yeah, more than happy to chat after.
00:20:45.300 | Thank you all.
00:20:46.220 | ANDREW BROGDON: Thank you.