Beyond Conversation: Why Documents Transform Natural Language into Code - Filip Kozera

Chapters
0:00 Intro
0:19 Problems with chat-based systems
0:52 Context pollution
2:08 Documents
2:55 Concurrency
3:21 Background
3:46 Workflows
4:20 Human in the loop
5:07 Protocols
6:22 Agent Economy
7:21 Managing Agents
8:58 Conclusion
Hi, I'm Filip and I'm the CEO at Wordware. Today I want to talk to you about what sucks about chat-based interfaces, how documents can actually solve those issues, and how they lead to background agents that do tasks for you in the background. So first, let's start with the problems with chat-based systems.
When I interact with Claude or OpenAI, it all seems very informal. I often end up creating workflows for myself using projects or just copy-pasting stuff. In these long conversations I populate the context window, and I realize that a lot of what's in there is just gibberish and garbage. So we get context pollution. I also don't get to iterate in a structured manner: if I'm working with artifacts, I end up making the context window dirty enough that I can't just change the one sentence I really wanted to make precise.
I also sometimes lack another level of forced clarity. ChatGPT often asks me that one particular question in deep research: "What would you like to actually find?" But it never asks in the right way when it's uncertain about something, and therefore it never forces me to be clear. There are a couple more issues too: poor version control, limited reusability, and model laziness. The more the context grows, the worse the responses I get. Chat interfaces also don't support any logical grouping or nesting, and we're interacting with a single abstraction layer. We don't get to see and choose whether we want to specify every small detail of a particular task or just set it up loosely.
Hence documents. Documents are actually the original way of specifying more complex systems. The first product requirements doc probably dates back to Noah's Ark, around three and a half thousand years ago; don't quote me on that one, however. It's the first take on someone explaining a more complex system to somebody who doesn't know how to build it. Documents are the ultimate way for humans to communicate these more complex ideas, and with them we get forced clarity as an opportunity, which is great.
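To make "document as spec" a bit more concrete, here is a minimal, hypothetical sketch in Python: each section of the document becomes a structured field, which is exactly where the forced clarity comes from. The field names and the CRM example are illustrative assumptions, not Wordware's actual format.

```python
# A minimal, hypothetical sketch of "document as spec": each section of the
# document becomes a structured field an agent runner could act against.
from dataclasses import dataclass, field

@dataclass
class AgentSpec:
    goal: str                                              # what the agent is for
    inputs: list[str] = field(default_factory=list)        # data it may read
    steps: list[str] = field(default_factory=list)         # the workflow, step by step
    constraints: list[str] = field(default_factory=list)   # what it must never do

crm_update_spec = AgentSpec(
    goal="After an investor meeting, draft a CRM update for my approval.",
    inputs=["calendar event", "meeting notes"],
    steps=[
        "Extract attendees and key discussion points from the notes.",
        "Draft a CRM entry with stage, next steps, and owner.",
        "Send the draft to me for approval before writing anything to the CRM.",
    ],
    constraints=["Never write to the CRM without explicit approval."],
)
```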
But the next problem with chat, and one of the biggest, is concurrency. We have a concurrency of one with all chat-based systems: we need to be sitting there. We're getting inklings of how the future will look when Manus or Deep Research run in the background and actually do things for us. It feels great that there's something happening in the background.
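The "concurrency of one" point can be shown with a toy sketch (the agent tasks below are stand-ins, not any particular product's API): a chat session blocks on one exchange at a time, while background agents can run many jobs concurrently.

```python
# Toy contrast: a chat loop has a concurrency of one (you wait on each reply),
# while background agents can run several jobs at once and report back later.
import asyncio

async def background_agent(task: str, seconds: float) -> str:
    await asyncio.sleep(seconds)          # stands in for a long-running agent job
    return f"done: {task}"

async def main() -> None:
    # Chat-style: one thing at a time, and you sit there while it runs.
    print(await background_agent("answer one question", 1.0))

    # Background-agent style: several jobs run concurrently while you do other work.
    results = await asyncio.gather(
        background_agent("deep research on a market", 1.0),
        background_agent("draft follow-up emails", 1.0),
        background_agent("update the CRM", 1.0),
    )
    print(results)

asyncio.run(main())
```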
So now let's riff off this idea of background agents. Computers have been doing work for us in the background for quite a while. We've created workflows which are handcrafted; think of the Zapiers of this world. And we've only just barely started to create specialized agents, which basically means that at some stage there was a somewhat fuzzy if-else statement making one or two decisions. As we can see on this diagram, when the importance of a workflow is high and its occurrence is high, we end up using handcrafted workflows. We're only now entering an area where general agents are starting to make some decisions. But whenever the importance is higher, we don't let those general agents into our lives.
So how can we remedy this? We remedy it by introducing a human in the loop. With a human in the loop, the agent can do a bunch of work and we get to approve it, reject it, change the output it created, or even fix its logic entirely. These background agents normally react to some kind of implicit or explicit user intent or trigger. You can think of them as being activated by a sent email, a sent Slack message, or maybe a meeting. An implicit trigger could be that you had a meeting with a named party, say investors, and that prompts the agent to update your CRM.
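Here is a minimal sketch of that trigger-plus-approval flow (the class and function names are assumptions for illustration, not a specific product's API): an implicit trigger such as an investor meeting ending kicks off a background run, and the result waits for the human to approve, reject, or edit it.

```python
# Minimal human-in-the-loop sketch: a background agent is activated by an
# implicit trigger (here, a meeting ending) and its output waits for the
# human to approve, reject, or edit it before anything is written anywhere.
from dataclasses import dataclass

@dataclass
class MeetingEnded:                       # implicit trigger
    attendees: list[str]
    notes: str

@dataclass
class PendingAction:
    description: str
    status: str = "awaiting_approval"     # -> "approved" | "rejected" | "edited"

def crm_update_agent(event: MeetingEnded) -> PendingAction:
    """Drafts a CRM update from the meeting; stands in for a real agent run."""
    draft = f"Log meeting with {', '.join(event.attendees)}: {event.notes[:80]}"
    return PendingAction(description=draft)

def review(action: PendingAction, decision: str, edited_text: str = "") -> PendingAction:
    """The human-in-the-loop step: approve, reject, or edit the agent's output."""
    if decision == "edit" and edited_text:
        action.description, action.status = edited_text, "edited"
    else:
        action.status = "approved" if decision == "approve" else "rejected"
    return action

# The trigger fires, the agent drafts, and nothing touches the CRM until a human decides.
pending = crm_update_agent(MeetingEnded(attendees=["Acme Ventures"], notes="Discussed Series A terms"))
pending = review(pending, "approve")
```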
With a bunch of these ambient agents, you end up having to create protocols for how humans and AIs communicate with each other. That means humans can control their agents and their outputs, but different agents can also communicate with each other to learn about sources of data. One agent could be asking a more general agent about what is in your Notion.
We'll probably start prosumer-first, with a bunch of agents working in the background for individuals. But very soon this becomes about organizations: we'll have organizations with their own agents and also with external agents. We might even get agents which manage humans, which sounds ridiculous at first, but that could just be an agent which creates Jira tickets for all of your engineers. In that way we basically create the graph of the enterprise of the future. When I think about these background agents, they'll first work for the prosumer, maybe managing your emails and making sure you have more time for yourself. This idea naturally represents a bottom-up movement toward enterprises, which will be slower moving, trying to make sure that all the agentic tools these agents use are verified, have the right authority and the right permissions, and don't mess things up.
So as I think about the future of the agent economy, I think we need to adopt a stochastic mindset, because we essentially need to lead with leverage over uncertainty. If something you don't fully understand closes your clients and delivers business value 99.9% of the time, we're not going to care that we don't understand what's happening in the other 0.1% of cases. We're just going to make sure that the impact of that 0.1% is not catastrophic.
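A rough way to see why bounding the downside matters more than understanding every case is a toy expected-value check. The 99.9%/0.1% split comes from above; the gain and loss-cap figures are made up purely for illustration.

```python
# Toy expected-value check: a mostly-reliable agent is worth running as long as
# the rare failure case is capped at a non-catastrophic cost.
success_rate = 0.999          # the 99.9% of runs that deliver business value
value_per_success = 1_000     # hypothetical value of a successful run
loss_cap = 50_000             # hypothetical worst-case cost of a failed run

expected_value = success_rate * value_per_success - (1 - success_rate) * loss_cap
print(f"Expected value per run: {expected_value:,.0f}")  # ~949: positive, leverage wins

# If the failure mode were catastrophic (say 10,000,000), the same agent is a bad bet:
print(f"{success_rate * value_per_success - (1 - success_rate) * 10_000_000:,.0f}")  # ~-9,001
```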
I also think humans will manage a bunch of agents, and that's why taste and intent are so important. You will need to imbue your own personal brand onto agents and take responsibility for their actions. We'll also need many more communication protocols between humans and AI, and between agents. MCP is the first protocol that sets this up, but I think it lacks information about the constraints of a particular agent, the authority it needs in order to act, whether it needs approval from a human in the loop, and so on.
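As a sketch of the kind of metadata that seems to be missing, here is a hypothetical agent manifest. None of these fields come from the actual MCP specification; they are assumptions that just illustrate the gaps named above.

```python
# Hypothetical agent manifest: the extra metadata a richer protocol might
# carry on top of tool definitions (constraints, authority, human approval).
from dataclasses import dataclass, field

@dataclass
class AgentManifest:
    name: str
    constraints: list[str] = field(default_factory=list)       # what the agent must never do
    required_scopes: list[str] = field(default_factory=list)   # authority it needs to act
    requires_human_approval: bool = True                        # gate risky actions on a human

crm_agent = AgentManifest(
    name="crm-updater",
    constraints=["read-only access to email", "never contact external parties"],
    required_scopes=["crm:write", "calendar:read"],
    requires_human_approval=True,
)
```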
Right now, when I think about humans managing agents, we only see this properly in coding. In coding, we see that the engineers who are good both at IC work and at managing a team of interns are really able to reap the benefits of the AI revolution. A lot of excellent IC engineers end up saying, "Oh, I don't want to use AI. It's not as good as people say." And it probably isn't, for them; their bar for code might be too high, and they might want everything optimized in just the right way. But this is the first time engineers are managing a swarm of agents, and they need to be good at managing in order to actually distill leverage and benefit for their organization.
So, just to wrap up: the concurrency of one in chat-based systems, and the pollution you get from playing with them, make chat feel almost like brainstorming. But after brainstorming, you need to sit down and create the right document to explain what an agent should be doing. This is only needed for repeatable processes. Once you have a repeatable process that you trust and think will be very useful, you can hook it up to a trigger. That can be a cron job, a Gmail trigger, or an implicit trigger. Then that agent is able to act in the background.
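A minimal sketch of what hooking a trusted, repeatable process up to a trigger might look like follows; the trigger classes, registry, and dispatch function are assumptions for illustration, with a real scheduler or webhook layer sitting underneath them.

```python
# Hypothetical sketch: a trusted, repeatable process gets bound to a trigger
# (cron schedule or inbound email) and then runs in the background unattended.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CronTrigger:
    schedule: str          # e.g. "0 9 * * MON" -- every Monday at 9am

@dataclass
class EmailTrigger:
    from_filter: str       # e.g. only emails from a given sender

def update_crm_after_meeting(event: dict) -> str:
    """Stands in for a document-defined agent run; just returns a draft here."""
    return f"Draft CRM update for: {event.get('subject', 'unknown event')}"

# Each trusted process gets bound to the trigger that should wake it up.
REGISTRY: list[tuple[object, Callable[[dict], str]]] = [
    (CronTrigger(schedule="0 9 * * MON"), update_crm_after_meeting),
    (EmailTrigger(from_filter="investors@example.com"), update_crm_after_meeting),
]

def dispatch(event: dict) -> list[str]:
    """Run every registered process whose trigger matches the incoming event."""
    results = []
    for trigger, process in REGISTRY:
        if isinstance(trigger, EmailTrigger) and event.get("from") == trigger.from_filter:
            results.append(process(event))
        elif isinstance(trigger, CronTrigger) and event.get("cron") == trigger.schedule:
            results.append(process(event))
    return results
```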
Latency matters a little less there, and the agent only surfaces issues to you when it is struggling with something or needs your approval. Therefore, your work is mostly about creating these assignments for the agents, making sure your taste is imbued in them, and then approving the results of their work, trusting them more and more as you keep going. In that way, we are creating swarms of agents which work in the background, and our main job is to swipe left and right, as if it's Tinder, to approve and edit the results of the agents' work. From there, I think the prosumer market is going to adopt this much more widely, and we're going to see it slowly entering the enterprise market.
And I'm super excited about enterprises creating the most incredible agentic tools, which are going to be used by the newest state-of-the-art models, but the tools are going to be the moat. So that's a very clear progression for the future. Let's see if it's true. Thank you so much. I'm Filip, the co-founder and CEO of Wordware. And at Wordware, we actually enable these background agents to work.