
Beyond Conversation: Why Documents Transform Natural Language into Code - Filip Kozera


Chapters

0:00 Intro
0:19 Problems with chat-based systems
0:52 Context pollution
2:08 Documents
2:55 Concurrency
3:21 Background
3:46 Workflows
4:20 Human in the loop
5:07 Protocols
6:22 Agent Economy
7:21 Managing Agents
8:58 Conclusion

Whisper Transcript

00:00:00.080 | Hi, I'm Filip and I'm the CEO at Wordware. Today I want to talk to you about what sucks
00:00:06.080 | about chat-based interfaces, how documents can actually solve those issues, and how they lead
00:00:12.640 | to background agents that do tasks for you in the background. So firstly, let's start with
00:00:21.120 | the problems with chat-based systems. When I interact with Claude or OpenAI, it all seems very
00:00:27.840 | informal. I often end up creating workflows for myself using projects, or just copy-pasting stuff.
00:00:35.280 | And in that way, when I'm having these long conversations, I populate the context window,
00:00:40.960 | and I realize that a lot of what's in the context window is just gibberish and garbage. And so
00:00:50.640 | we get context pollution. I also don't get to iterate in a structured manner. If I'm working with
00:00:57.280 | artifacts, I end up making the context window dirty, without being able to
00:01:04.720 | change the one sentence that I really wanted to make precise.
00:01:08.960 | I also sometimes lack another level of forced clarity. ChatGPT often asks me that one particular
00:01:19.040 | question in deep research, "What would you like to actually find?" But it never asks me
00:01:26.480 | in the right way when it's not certain about something, and therefore it isn't forcing me to clarity.
00:01:34.800 | Also, there are a couple more issues: poor version control, limited reusability, and model laziness. The more the context grows,
00:01:45.760 | the worse the responses I'm getting. Also, chat interfaces don't support any logical grouping
00:01:53.200 | or nesting in any way. And we are also interacting with a single abstraction layer. We don't get to
00:02:00.080 | see and choose whether we want to specify every small detail of a particular task or just set it up in
00:02:07.200 | some way. Hence documents. Documents are actually the original way of specifying more complex systems.
00:02:16.400 | The first product requirements doc probably dates to Noah's Ark, around three and a half thousand years ago.
00:02:26.320 | Don't check me on that one, however. It's the first take on someone explaining a more
00:02:34.720 | complex system to somebody who does not know how to build it. And therefore, documents are the
00:02:43.600 | ultimate way for humans to communicate these more complex ideas. And in that way, we get a forced-clarity
00:02:50.880 | opportunity, which is great. But the next problem with chat, and one of the biggest problems with chat, is
00:02:56.560 | concurrency. We have a concurrency of one with all chat-based systems. We need to be sitting there, and
00:03:03.440 | we are getting inklings of what the future will look like when Manus or Deep Research are running in
00:03:11.840 | the background and actually doing things for us. It feels great. It feels great that there's something
00:03:17.920 | happening in the background. So now let's riff on this idea of the background agents that I'm going
00:03:25.760 | to introduce. So we've been doing work, and computers have been doing work for us in the background, for
00:03:33.760 | quite a while. We've created handcrafted workflows; you can
00:03:39.600 | think of the Zapiers of this world. And we've only just started to create specialized agents. That basically
00:03:47.440 | means that at some stage the workflow had an if-else statement that was somewhat fuzzy and made one or two
00:03:54.080 | decisions. And as we can see on this diagram, when the importance of some workflow is high and its
00:04:02.160 | occurrence is high, we end up using handcrafted workflows. We're only now entering an era where
00:04:09.120 | general agents are starting to take some decisions. But whenever the importance is higher, we don't let
00:04:16.720 | general agents enter our lives. So how can we remedy this? We remedy this by
00:04:25.760 | introducing a human in the loop. Essentially, with the human in the loop, the agent can do a bunch of work, and we
00:04:33.840 | get to approve it, reject it, change the way it creates the output, or even fix its logic entirely.
00:04:43.440 | Background agents normally react to some kind of implicit or explicit user intent or trigger. So you can
00:04:49.600 | think of these background agents as being activated by a sent email, a Slack message, or maybe a meeting.
00:04:56.640 | An implicit trigger could be that you had a meeting with a particular party, say investors, and
00:05:03.360 | that could prompt you to update your CRM. And with a bunch of these ambient agents, you end up
00:05:11.280 | having to create protocols for how humans and AIs communicate between themselves.
00:05:17.520 | That basically means that humans can control their agents and their outputs, but also that
00:05:24.640 | different agents can communicate with each other to educate themselves around sources of data.
00:05:32.480 | One agent could be communicating with a more general agent about what is in your Notion.
00:05:38.880 | And in that way we'll probably start prosumer-first, with a bunch of agents working in the
00:05:44.240 | background. But very soon it becomes about an organization, and we'll start having organizations
00:05:51.440 | which have their own agents and also external agents. And in that way, we might even get agents
00:05:58.080 | which manage humans, which sounds ridiculous at first, but that could be just
00:06:03.120 | an agent which creates Jira tickets for all of your engineers. And in that way we basically create the
00:06:08.880 | graph of the enterprise of the future. When I think about these background agents, firstly
00:06:16.480 | working for the prosumer, maybe managing your emails and just making sure that you have more time for
00:06:21.680 | yourself. I think this idea naturally represents a bottom-up movement toward enterprises,
00:06:28.720 | which will be slower-moving, trying to make sure that all of the agentic tools these agents
00:06:35.200 | use are verified, have the right authority, have the right permissions, and
00:06:41.120 | don't mess things up. So as I think about the future of the agent economy, I think we need to adopt a
00:06:49.200 | stochastic mindset, because we essentially need to lead with leverage over uncertainty.
00:06:57.760 | If something you don't fully understand closes your clients and delivers on business value
00:07:03.760 | 99.9% of the time, we're not going to care that we don't understand what's happening in
00:07:10.320 | the other 0.1% of the time. We're just going to make sure that the impact of that 0.1% is not
00:07:18.640 | catastrophic. I also think humans will manage a bunch of agents, and that's why taste and intent
00:07:26.800 | are so important. You will need to imbue your own personal brand onto agents and take responsibility for
00:07:33.840 | their actions. We'll also need a lot more communication protocols between humans and AI and also in between
00:07:39.680 | agents. MCP is the first protocol that sets this up, but I think it lacks information
00:07:46.480 | about what the constraints of a particular agent are, what authority it needs to
00:07:52.240 | have in order to act, and whether it needs approval from a human in the loop, etc.
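To make that concrete, here is a hypothetical manifest carrying the kind of metadata the talk says a protocol should include; these fields are invented for illustration and are not part of the actual MCP specification:

```python
# Hypothetical agent manifest -- NOT a real MCP structure, just a sketch of
# the constraints / authority / approval metadata the talk argues is missing.
agent_manifest = {
    "name": "crm-updater",
    "constraints": ["read-only on email", "no external sends"],
    "authority": ["crm:write"],           # scopes the agent must be granted to act
    "requires_human_approval": True,      # pause for a human before committing changes
}

def may_act(manifest: dict, granted: set[str]) -> bool:
    """An agent may act only if every authority it needs has been granted."""
    return set(manifest["authority"]) <= granted

assert may_act(agent_manifest, {"crm:write", "calendar:read"})
assert not may_act(agent_manifest, {"calendar:read"})
```

A runtime could check `requires_human_approval` before executing, routing the draft into the approve/reject loop described earlier.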
00:07:59.120 | Right now, when I think about humans managing agents, we only see it properly in coding.
00:08:07.840 | In coding, we see that engineers who are good both at IC work and at managing a team of interns
00:08:16.000 | are really, really able to take the benefit of the AI revolution. A lot of excellent IC engineers
00:08:23.760 | end up saying, "Oh, I don't want to use AI. It's actually not as good as people say."
00:08:30.640 | And it probably isn't, for them; their bar for code might be too high. They might want to have
00:08:38.160 | everything optimized in just the right manner. And this is the first time where
00:08:42.800 | engineers are managing a swarm of agents, and they need to be good at managing in order to
00:08:50.640 | distill leverage and benefit for their organization.
00:08:56.640 | So just to wrap up: I think the concurrency of one of chat-based systems, and the pollution you get
00:09:07.120 | from playing with them, make it almost like brainstorming. But after brainstorming, you need to sit down and
00:09:13.840 | create the right document to explain what an agent should be doing.
00:09:18.400 | This is only needed for repeatable processes. Once you have a repeatable process that you trust and you
00:09:24.640 | think will be very useful, you can hook it up to a trigger. That can be a cron job, or it
00:09:31.680 | could be a Gmail trigger, or it could be an implicit trigger. That agent is then able to act in the background.
00:09:39.520 | Latency matters a little bit less there, and the agent only surfaces issues to you once it is struggling with
00:09:46.560 | something or needs your approval. Therefore, your work is mostly around creating these assignments for
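As a toy sketch of hooking a trusted process to a trigger, the `on`/`dispatch` registry below is invented for illustration; it stands in for whatever scheduler or integration actually fires the agent:

```python
# Toy trigger registry: maps trigger names to the agent assignments they activate.
registry: dict[str, list] = {}

def on(trigger: str):
    """Register an agent function to run when `trigger` fires."""
    def decorator(fn):
        registry.setdefault(trigger, []).append(fn)
        return fn
    return decorator

@on("cron:daily")                      # a scheduled (cron) trigger
def summarize_inbox(event):
    return f"summarized inbox at {event['time']}"

@on("gmail:new_message")               # an explicit Gmail trigger
def triage_email(event):
    return f"triaged email from {event['sender']}"

def dispatch(trigger: str, event: dict) -> list[str]:
    """Fire every agent registered for this trigger and collect their results."""
    return [fn(event) for fn in registry.get(trigger, [])]

assert dispatch("cron:daily", {"time": "09:00"}) == ["summarized inbox at 09:00"]
assert dispatch("gmail:new_message", {"sender": "investor@example.com"}) \
       == ["triaged email from investor@example.com"]
```

An implicit trigger, such as a meeting that should prompt a CRM update, would be another key in the same registry, fired by whatever detects the meeting.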
00:09:53.040 | the agents, making sure that your taste is imbued there and then approving the results of the work,
00:10:00.240 | making sure you trust it more and more as you keep going. In that way, we are creating swarms of agents
00:10:06.800 | which are working in the background. And our main job is to swipe left and right as if it's Tinder and
00:10:13.040 | approve and edit the results of the work of the agents. I think from there, the prosumer market is
00:10:21.280 | going to adopt this much more widely and we're going to see it slowly entering the enterprise
00:10:28.000 | market. And I'm super excited about enterprises creating the most incredible agentic tools, which are
00:10:35.120 | going to be used by the newest state-of-the-art models, but the tools are going to be
00:10:40.800 | demoed. So that's a very clear progression for the future. Let's see if it's true. Thank you so much. I'm Filip,
00:10:48.160 | the co-founder and CEO of Wordware. And at Wordware, we actually enable these background agents to work.
00:10:54.160 | Come build yours.