AX is the only Experience that Matters - Ivan Burazin, Daytona

00:00:15.440 |
The number of agents in the world, I believe, 00:00:20.120 |
but they will basically be the number of humans 00:00:29.320 |
are using multiple of them, more are spinning up more. 00:00:38.920 |
I mean, we don't have a lot of data around this. 00:00:41.000 |
I actually researched a lot and tried to find what we have. 00:00:46.480 |
is that the first one may be not that important, 00:00:49.300 |
but 25% of YC startups say that AI writes 95% of their code. 00:00:54.340 |
But the second part is actually quite interesting, 00:00:56.460 |
whereas 37% of the latest YC batch are actually building agents 00:01:09.940 |
And I think that's actually quite interesting and aligns 00:01:14.000 |
But the most uncomfortable truth for people building tools 00:01:17.600 |
for agents, which we are and I know a lot of people are as well, 00:01:20.780 |
is that most of the tools that are built today 00:01:25.000 |
break the moment you remove a human from the loop. 00:01:30.280 |
As we're not entering a world where agents are assisting 00:01:33.420 |
developers, agents will definitely be the developers. 00:01:36.320 |
I mean, they will be the number one people using that. 00:01:39.320 |
So basically, if you're not building dev tools-- 00:01:51.300 |
And what I think about when we build our things, 00:01:56.360 |
what does it mean to actually build tools for the future? 00:02:03.620 |
And the term for building for agents is agent experience. 00:02:07.680 |
And a good friend of mine, Matt, which I guess a lot of you 00:02:17.260 |
the evolution of, like, a user experience, a customer 00:02:19.160 |
experience, and mostly a developer experience. 00:02:23.060 |
But the definition that I sort of found that works the best 00:02:29.200 |
that works at Netlify, so not Matt necessarily, 00:02:38.380 |
And how easily can they operate within digital environments 00:02:51.560 |
who is actually implementing agent experience? 00:02:58.680 |
there's more than the ones I put in here, of course. 00:03:00.540 |
So like, if you guys are building something like this, 00:03:03.740 |
Like, I just wanted to put a few that I've talked to and looked at. 00:03:07.860 |
And so I think seamless authentication is a really, really important one. 00:03:13.520 |
And if your agent-- if you give an agent a task to do and it has to log in, for the most part, 00:03:19.760 |
And moreover, you don't want to give your passwords to the agent. 00:03:24.700 |
And so a company like Arcade, which randomly, if he's around-- I met the founder yesterday. 00:03:29.760 |
I had not met him before that he was ready to talk. 00:03:33.900 |
What they solve is if your agent tries to authenticate into, like, Delta or Booking.com or wherever 00:03:39.480 |
it is, it can fall back to the user, the user can log in, and the agent can go off to its job. 00:03:45.620 |
Moreover, you can also add credentials inside of Arcade 00:03:49.480 |
beforehand, so your agent can go and do what needs to be done. 00:03:53.380 |
So authenticate once, agent goes off, does their job. 00:03:57.540 |
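The "authenticate once, then the agent goes off" pattern can be sketched roughly as follows. This is a minimal illustration and not Arcade's actual API: `CredentialBroker`, `token_for`, and `fake_login` are hypothetical names, and the human login step is simulated by a callback that returns a token.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CredentialBroker:
    """Hypothetical broker: holds tokens so the agent never sees a password."""

    # Callback that hands control to the human user and returns a token.
    ask_user: Callable[[str], str]
    _tokens: Dict[str, str] = field(default_factory=dict)

    def token_for(self, service: str) -> str:
        # Fall back to the user only the first time a service is needed.
        if service not in self._tokens:
            self._tokens[service] = self.ask_user(service)
        return self._tokens[service]


prompts = []  # records how often the human was actually interrupted


def fake_login(service: str) -> str:
    # Stand-in for a real login flow completed by the user.
    prompts.append(service)
    return f"token-for-{service}"


broker = CredentialBroker(ask_user=fake_login)
first = broker.token_for("booking.com")
second = broker.token_for("booking.com")  # served from the cache, no prompt
```

The design point is that the user is interrupted once per service, and every later call is served without a human in the loop.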
A second one is-- this is the easiest one, which I'm sure pretty much all of you do here-- 00:04:05.860 |
Basically, any doc URL, just append a .md, and you'll get a clean markdown file, no fluff. 00:04:12.680 |
The agent can consume it very, very easily. 00:04:15.620 |
But moreover, the llms.txt standard-- I'm sure all of you have seen it. 00:04:23.700 |
So if you don't have an llms.txt in your docs, you definitely should. 00:04:28.080 |
It makes it super easy for the agent to go and read what you all are doing. 00:04:34.980 |
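As a small sketch of what that looks like from the agent's side, the helpers below derive the markdown variant of a docs page and the location of `llms.txt`. The function names and the `docs.example.com` host are made up for illustration, and the `.md` suffix only works on docs sites that actually serve it.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(doc_url: str) -> str:
    """Append .md to a docs page URL (assumes the site supports this convention)."""
    parts = urlsplit(doc_url)
    path = parts.path.rstrip("/")
    if not path:
        raise ValueError("expected a URL that points at a specific docs page")
    if not path.endswith(".md"):
        path += ".md"
    # Keep the query string, drop the fragment: it only matters in a browser.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def llms_txt_url(site_url: str) -> str:
    """Return the conventional llms.txt location at the root of the site."""
    parts = urlsplit(site_url)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


page_md = markdown_url("https://docs.example.com/guides/quickstart/")
index = llms_txt_url("https://docs.example.com/guides/quickstart/")
```

An agent would fetch `index` first to discover what documentation exists, then request individual pages in markdown form.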
The third thing we found was API-first design. 00:04:37.700 |
And I think this is actually the most critical. 00:04:39.620 |
Basically, if an agent can't see the functionality, it's really hard for it to do. 00:04:44.640 |
And I strongly believe that the best way for an agent to interact with any tool is through 00:04:52.340 |
And I put API here, and I won't talk about MCP. 00:05:00.400 |
But basically, an API is the underlying tech that you have to have, have to be exposed, 00:05:05.240 |
so the agent can access what they want the most efficiently. 00:05:11.100 |
And so Neon and Netlify and Supabase and these companies do a really good job of this. 00:05:16.460 |
As far as I know, all the key things that an agent needs to use their services can be accessed 00:05:23.580 |
And now, all these things are really great, and I think this is all the right direction. 00:05:31.240 |
Is it just, like, make your docs more readable? 00:05:35.220 |
And so what I wanted to think about is the first definition of Sean's. 00:05:38.300 |
And I think there's missing one key part for all of us who are building tools to think about. 00:05:46.340 |
So how easily can an agent autonomously access, autonomously understand, autonomously operate 00:05:52.500 |
I think this word is actually quite important, and it makes you sort of think about what you're 00:05:59.960 |
Because if you give your agent a task, and it can do a lot, but it always has to fall back 00:06:05.780 |
to you to achieve its task, I don't think that is the future. 00:06:09.900 |
And I think we should all go back to the drawing board and think about this. 00:06:14.280 |
Basically, what happens if there are no humans around to click the buttons, to debug errors, 00:06:21.060 |
I think that's where our work starts as tool builders. 00:06:26.340 |
And so if you're not actually solving that, then I actually think you are just porting for 00:06:31.080 |
And just as a note, swyx basically coerced me into doing this talk, because he said that 00:06:39.100 |
agent experience doesn't exist, it's just a wrap-around developer experience. 00:06:42.540 |
And so I think if you're not thinking about how an agent can autonomously do his task, then 00:06:51.220 |
And just really quickly, I sort of skipped off who I am and why the hell I'm talking about 00:06:54.480 |
these things, and why I might know something, might not. 00:06:59.480 |
So first company in the early 2000s, so I'm dated, very old person, started by building data centers 00:07:06.620 |
and server rooms, HP servers, IBMs, hypervisors, VMware, all these things. 00:07:11.160 |
So actually screwing in servers and running cables and installing Windows servers via CDs. 00:07:21.360 |
After that, sort of sold that, created the very first browser-based IDE in 2009. 00:07:28.600 |
That was super early, so we had to create our own IDEs, our own orchestrators, our own isolation. 00:07:34.360 |
More recently, led the developer experience at a company called Infobip. 00:07:39.920 |
It's a multi-billion-dollar company that competes with Twilio, which you haven't really heard 00:07:45.040 |
But basically, it's a communications platform as a service. 00:07:47.540 |
So one API for sending emails, SMS, voice, and all those nice things. 00:07:52.960 |
And as of late, working at Daytona with a bunch of people that I've worked with in my 00:07:59.360 |
And I have to say, if anyone is a founder or founding a company, if you can work with 00:08:04.480 |
people that you've worked with historically, you should do that. 00:08:11.860 |
Daytona basically is a secure and elastic infrastructure purposely built for running AI-generated code. 00:08:18.580 |
Basically that means we created an agent-native runtime. 00:08:21.440 |
What agent-native means, I'll hopefully explain a bit. 00:08:27.840 |
So we, as Daytona, give agents a computing environment that they can use to run code, 00:08:33.400 |
do data analysis, reinforcement learning, computer use, or more recently, I've seen people use 00:08:38.560 |
it for agents to play video games, like Counter-Strike and whatnot. 00:08:43.020 |
So people are doing funny things with these things. 00:08:46.580 |
And so Daytona is basically what a laptop is to a human. 00:08:49.820 |
That is what a Daytona runtime is for an agent, sort of. 00:08:54.300 |
And so when we were starting to build this new company or this new product, we looked at what 00:09:00.860 |
And we took the principles of what agent experience is. 00:09:03.920 |
And these are the first things that sort of we built through that. 00:09:08.860 |
Basically, if you have a tool for an agent and your agent is in interactive mode, so think 00:09:14.420 |
of like Claude or ChatGPT, if you're the user, you don't want to waste time for tools to turn 00:09:20.580 |
So we created something that spins up in like 27 milliseconds. 00:09:23.420 |
Obviously, API first, so an agent can, via an API, turn on a machine, turn it off, you know, clone 00:09:31.660 |
After that, we thought about what more things can it do? 00:09:33.760 |
So like, it's really fast, the agent can spin it up, but what happens when it gets inside? 00:09:38.100 |
And so we thought it wasn't ideal for an agent to parse an output from a terminal. 00:09:42.080 |
So we preloaded all of them with headless tools. 00:09:45.080 |
So a file explorer, Git clients, LSP, terminal, and all these things. 00:09:49.280 |
And so this sort of aligns with the original definition is like we're helping agents do 00:09:54.660 |
And we thought like that was it until we started getting like really interesting users and customers 00:10:01.080 |
that said like, oh shit, like our agent can't do this. 00:10:07.220 |
And so I'm gonna walk you through some of these new primitives, maybe a bigger word for that, 00:10:13.080 |
maybe not, or features that we found with new customers building agents on top of us. 00:10:20.460 |
And these, just to be clear, I don't think these features are something that everyone can replicate, 00:10:24.580 |
but it's more of like how we thought about solving these problems and how we've seen these problems 00:10:29.020 |
and maybe inspires you to think about how you're building your tools for agents. 00:10:34.880 |
So the first thing is a declarative image builder. 00:10:39.000 |
So Daytona is a sandbox that uses any Docker image off the shelf. 00:10:43.640 |
So as a template, then the agent can, you know, spin up a sandbox with that template. 00:10:48.380 |
Obviously, if an agent needs to add a dependency, it can use, you know, the API terminal and just 00:10:57.800 |
But if an agent has to now install like 20 new things and do it over and over again, that's 00:11:01.600 |
just like a waste of time and resources. And the way you would solve that right now is like a human 00:11:06.320 |
goes in to Daytona, you know, creates the new Docker container, or goes to the laptop, creates a new Docker container, 00:11:11.060 |
pushes to our container registry, and the agent can do that. But that like takes time and effort. 00:11:15.440 |
The other option is like an agent tries to build a Docker image on its own and then pushes it to the container 00:11:20.720 |
registry, which is brittle, takes time, breaks, and probably won't do very well. So how do you think about that? 00:11:28.240 |
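A declarative alternative to both of those options can be sketched as a small spec that names a base image, the packages to install, and the commands to run, and is then turned into a build recipe. This is an illustration under stated assumptions and not Daytona's actual SDK: `ImageSpec` and `render_dockerfile` are hypothetical names, and rendering to a Dockerfile is just one plausible way such a spec could be built.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ImageSpec:
    """Hypothetical declarative description of the environment an agent wants."""

    base_image: str
    pip_packages: List[str] = field(default_factory=list)
    run_commands: List[str] = field(default_factory=list)


def render_dockerfile(spec: ImageSpec) -> str:
    """Turn the spec into Dockerfile text that a build service could consume."""
    lines = [f"FROM {spec.base_image}"]
    if spec.pip_packages:
        lines.append("RUN pip install --no-cache-dir " + " ".join(spec.pip_packages))
    lines.extend(f"RUN {cmd}" for cmd in spec.run_commands)
    return "\n".join(lines) + "\n"


spec = ImageSpec(
    base_image="python:3.12-slim",
    pip_packages=["pandas", "numpy"],
    run_commands=["mkdir -p /workspace"],
)
dockerfile = render_dockerfile(spec)
```

The agent only states what it needs. Building, storing, and launching the image is left to the platform, so there is no local Docker build for the agent to get wrong.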
Well, we created something like a declarative image builder where an agent can say, 00:11:31.520 |
Oh, I need a net new one. This is my base image. These are things I want installed. Run these 00:11:37.440 |
commands. We build it on the fly and open up a sandbox. Basically, the agent can now at any moment 00:11:43.360 |
in time say, I need something net new. I can make it on my own, and I can launch it on my own. Solves it 00:11:50.000 |
end to end for itself. The second thing is what we humans take for granted is if you're programming on your 00:11:57.840 |
laptop, your environments are usually probably Docker containers on your machine. And if you have this 00:12:02.960 |
large data set, you can really easily share it with all the environments that you have on your machine. 00:12:09.120 |
But if an agent, and we found some users that really need this, they need like 100 gigabyte data sets in 00:12:13.760 |
every machine that they have, there is no local laptop context. Every environment is isolated on its own. 00:12:21.920 |
So how do you make it more efficient that an agent doesn't have to download something from or upload 00:12:27.360 |
from S3 every single goddamn time? And so basically, we created something called Daytona Volumes. 00:12:33.120 |
The agent can invoke one of these, any size it wants, uploads it once, and maps it as a network drive, 00:12:40.960 |
or mounts it as a network drive on every single machine. The last thing I'm just going to go through, 00:12:47.440 |
and then we'll finish off on these feature stuff, is quite different from the others. But I put it in 00:12:52.160 |
here because I feel it's very much unique to agents. And that is the ability to execute things in parallel. 00:12:59.520 |
So an agent, unlike a human, we as humans can work on one machine, maybe two if we're super dialed in. 00:13:04.640 |
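To make the parallel idea concrete, here is a minimal sketch of fanning out several attempts from one starting state and keeping the best result. Everything in it is a stand-in: `attempt` is a hypothetical function, a deep copy of a dictionary plays the role of forking a machine, and the scoring rule is a toy.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple


def attempt(base_state: dict, candidate: int) -> Tuple[int, int]:
    """Try one candidate on a private copy of the state and score the outcome."""
    state = copy.deepcopy(base_state)  # stand-in for forking a machine
    state["tried"].append(candidate)
    score = -((candidate - 7) ** 2)  # toy objective: closer to 7 is better
    return score, candidate


base = {"tried": []}
candidates = range(10)

# Run every attempt concurrently instead of try, fail, go back, try again.
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(lambda c: attempt(base, c), candidates))

best_score, best_candidate = max(results)
```

Because each attempt works on its own copy, a failed attempt never has to be undone, and the original state stays untouched for the next round.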
An agent can try things in parallel all at once. So instead of going, try this outcome, it sucks, go back, 00:13:16.000 |
try it again, it sucks, go back. It can basically take a machine, fork it five times, 10 times, 100,000 times, 00:13:22.240 |
go through all of these, and then come to an output. And so again, 00:13:28.000 |
these are just things that we have seen as we've worked with users, and that they need their agents, 00:13:34.560 |
what their agents need to get their job done. And so if you ask, like, what else is there in 00:13:39.520 |
agent experience? The short answer is, like, I have no idea. The reason I say that is because 00:13:47.840 |
people that are building agents are building them right now. And you're only now, or we are only now, 00:13:54.560 |
finding out what they actually need to get the task done. And so what I want to instill in you all 00:14:01.120 |
is, if you haven't thought about, or if you have a tool that at some point needs a human in the loop, 00:14:07.920 |
then you probably haven't solved or thought about how to solve this. So the beginning of the talk is 00:14:12.800 |
called, basically, the agent experience is the only experience that matters. And I say this not because 00:14:18.240 |
we as humans are disappearing, but agents by far will be the largest user out there. And less and 00:14:23.920 |
less we'll have people behind the screen reading the logs, clicking on buttons, typing into terminals. 00:14:29.120 |
And lastly, just one thing I'll leave you with is, when we started Daytona, basically, we wanted the best 00:14:35.360 |
developer experience that we could have. And we invested a lot in, like, the terminal. And there was a talk 00:14:40.240 |
earlier here today about someone that has amazing, you know, terminal UI. And I think that's great. 00:14:44.960 |
We started with that as well. But as a small company, you sort of focus on what you need, 00:14:49.120 |
and on what's most important. And right now, what's most important is, yeah, great to have a CLI. 00:14:55.360 |
But can your agent do your task end to end? Because basically, if your agent can't use your product 00:15:01.120 |
in the future, absolutely no one will. So with that, I thank you for your time. We, if anyone has any 00:15:10.720 |
interest in this, you can take a look here, use our GitHub, it's open source. Also, we have a booth