How we hacked YC Spring 2025 batch’s AI agents — Rene Brandel, Casco

Chapters
0:00 Introduction to Casco and AI Agents
1:31 Evolution of Agent Stacks and Security Concerns
2:56 Why Casco Hacked AI Agents
4:00 Common Issue 1: Cross-User Data Access (IDOR)
7:38 Common Issue 2: Arbitrary Code Execution
12:38 Common Issue 3: Server-Side Request Forgery (SSRF)
14:48 Key Takeaways
15:28 Casco's Solution and Contact Information
15:56 Q&A
So let me first introduce myself a little bit. I'm Rene, I'm the CEO of Casco. We're a YC company, and we specialize in red teaming AI agents and apps.
I spent my previous time at AWS working on AI agents and building voice-to-code, and I won Europe's largest hackathon. And so I would talk to it and say, build me a blog post. It did things like loading pictures from San Francisco. And you can see how horribly slow the APIs were back then. I can give you a sense of that by showing you the architecture diagram of that thing; these things were extremely difficult to do. But it really gave me a glimpse of what the future could look like, even back then, as technology gets better, right?
Two months ago, I quit AWS and worked out of the garage. And so from there, we also looked into how else things have changed. Well, this was my architecture diagram from back then. You could see there were three different cloud providers, including IBM Watson, which was like the forefront at the time. And before that, it was Microsoft LUIS, which was some natural language understanding service. And you can see it was just a lot of piecing things together. And that was already kind of difficult to do.
But nowadays, we see the stacks normalize significantly more, right? I think this is probably what the average agent stack looks like these days: you talk to an API server that talks to an LLM, which connects up with tools, and then you have a bunch of data sources associated with it.
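As a rough sketch of that diagram in code (hypothetical, not anything shown in the talk; `call_llm` and the tool entries are stand-ins), the loop usually looks something like this:

```python
# Minimal sketch of the "average agent stack": client -> API server -> LLM -> tools -> data.
# call_llm and the TOOLS entries are hypothetical stand-ins, not a specific vendor's API.

def call_llm(history, tools):
    """Stand-in for your LLM provider's chat API; may return tool calls."""
    raise NotImplementedError

TOOLS = {
    "lookup_user": lambda args: f"user record for {args['user_id']}",
    "run_code":    lambda args: "result of executing generated code",
}

def handle_request(message: str) -> str:
    history = [{"role": "user", "content": message}]
    while True:
        reply = call_llm(history, tools=list(TOOLS))
        if not reply.get("tool_calls"):              # model answered directly
            return reply["content"]
        for call in reply["tool_calls"]:             # every arrow here is attack surface
            result = TOOLS[call["name"]](call["args"])
            history.append({"role": "tool", "name": call["name"], "content": result})
```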
So this kind of normalization of the agent stack is actually really good. It makes many things easier, definitely better than my HackerFarm project 10 years ago. But we need to think about the security posture around these systems. And my general impression over the last few years is that the primary discussions around LLM security have really been: hey, can you do prompt injection? Can you get it to produce harmful content? Which is all really important. But the reality with security is that you need to look at all the different arrows in your system, and that is typically where the real damage happens, right? And so this is really agent security, and that is what I want to talk about today.
Now, one question is: why did we even hack a bunch of agents? The answer is, quite frankly, that we wanted to launch internally at Y Combinator. And fun fact, we have the second highest upvoted launch post inside Y Combinator of all time.

At the time, we were looking at which agents were already live. Then we set a timer for 30 minutes each, because we didn't want to waste too much time on this, figured out what their system prompts were, and just tried to understand them. And I had a feeling when I was creating this meme that this could be true, and it turns out it is. Then we looked at what kind of tool definitions they have: is it supposed to access data, supposed to run code? And then we just tried to exploit them and see what's going on.

And it was really fun, because out of 16 agents that were launched, we hacked seven of them within 30 minutes each, and there were three common issues we kept seeing. So I hope that we will all learn today what the most common issues are, so you don't make the same mistakes. And also, this is going to be the best investment if you're a VC listening in, because they're all... I mean, you guys were just here at the OF talk. You know where this is going to head, right?
So we first leaked this company's system prompt, and we saw, huh, it has a bunch of interesting tools attached to it, including looking up user info by ID (suspicious), looking up a document by ID, and so on. And when you see this, you just want to be like, oh yeah, there's this thing called IDOR, Insecure Direct Object Reference. It's basically when you make a request, the server validates that the token is valid, and then just lets the request through, right? And you're kind of betting on the fact that the ID cannot be guessed.
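To make IDOR concrete, here is a minimal sketch of that vulnerable pattern as a hypothetical Flask endpoint (not the actual company's code): the token is checked, but nothing ties the requested ID to the caller.

```python
# Sketch of the IDOR anti-pattern as a hypothetical Flask endpoint (illustration only).
from flask import Flask, request, abort, jsonify

app = Flask(__name__)
DOCUMENTS = {"doc-123": {"owner": "user-1", "body": "secret roadmap"}}

def verify_token(header):
    """Stand-in: return the authenticated user id, or None if the token is bad."""
    return "user-2" if header else None

@app.get("/documents/<doc_id>")
def get_document(doc_id):
    user = verify_token(request.headers.get("Authorization"))   # authentication only
    if user is None:
        abort(401)
    doc = DOCUMENTS.get(doc_id) or abort(404)
    # BUG (IDOR): nothing checks that `user` owns doc_id, so any logged-in user
    # who guesses or scrapes an ID (say, from a demo video URL) can read it.
    return jsonify(doc)
```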
We looked up a product demo video that they recorded, and we found the user ID in the URL. We were able to find their personal information, including their email, nickname, whatever. Well, it gets better, because these things are also interconnected. So you had not only the user ID, but also, like, the chat ID. And these things ultimately link up together and allow you to traverse their entire system.
There was a really comprehensive talk on this literally right before this one. You need to think about how you authenticate but also authorize the request. And the second thing, which is what we see in this Supabase era with row-level security: just make sure that you have some sort of access control matrix somewhere that checks that the object matches up with whoever is making the request.
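Continuing the same hypothetical sketch, the fix is one extra check after authentication: verify the object actually belongs to the caller (or consult your access control matrix) before returning it. Row-level security pushes the same check down into the database so every query is filtered automatically.

```python
# Same hypothetical sketch as above, now with an authorization check (illustration only).
@app.get("/v2/documents/<doc_id>")
def get_document_authorized(doc_id):
    user = verify_token(request.headers.get("Authorization"))
    if user is None:
        abort(401)                        # authentication: who is calling?
    doc = DOCUMENTS.get(doc_id) or abort(404)
    if doc["owner"] != user:              # authorization: may *this* caller see it?
        abort(403)
    return jsonify(doc)
```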
Now, you can see this was an issue that was already there, right? It's not around the LLM and the API server. And yeah, there are a lot of arrows in this diagram.

So the next thing to remember, as you're thinking about these tools and how you're building them: agents actually act like users, not API servers. When we were debugging this issue, we actually asked a bunch of Y Combinator companies why this happens, because clearly they can build a web app properly, right? But I think, as developers, we have this natural pattern matching in our heads: oh yeah, this thing runs on a server, so it should be like a service, and then I'm going to give it service-level permissions. So everything that applies to users applies to agents, too. Make sure that your LLM is not the thing determining your authorization pattern. Second, it should probably not act with service-level permissions.
And then, just like with users, you should make sure you don't just accept any input. A lot of these are the traditional web application security things that you just need to really, really internalize for this new world.
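One way to internalize "agents act like users" in code is to make every tool call carry the end user's identity or a token scoped to that user, never a blanket service credential. A minimal sketch, with hypothetical names (`dispatch_tool`, `lookup_document`):

```python
# Sketch: tool dispatch that acts on behalf of the end user, not as a service.
# dispatch_tool, lookup_document, and the token plumbing are hypothetical.

def lookup_document(doc_id: str, user_token: str):
    # The data layer re-checks authorization against the user's token,
    # exactly as it would for a request coming straight from the browser.
    ...

def dispatch_tool(call: dict, user_token: str):
    # The LLM chooses the tool and its arguments, but never the credential:
    # every downstream call is made with the end user's scoped token.
    if call["name"] == "lookup_document":
        return lookup_document(call["args"]["doc_id"], user_token=user_token)
    raise ValueError(f"unknown tool {call['name']!r}")
```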
So this next one is not as common, but the damage is bigger. The pattern we see is that there are a lot of code tools that agents use. There's a chart that basically shows the distribution of Claude usage by industry. Yeah, so us nerds, we make up 3.4% of the world, but we're 37% of Claude's usage, because we love computers and we love coding, right? But it's not just us who use agents with coding tools. In fact, many agents create code on demand to do things, right? Like, some agents just generate a calculator on demand to make a calculation. And so there are a lot of these code execution sandboxes out there, which are interesting. And if you think about that, there's actually a critical path in your system, because you've got a tool that talks to another container. And when you have arbitrary compute, many things can happen.
So we ran the same script and leaked the system prompt. But as an attacker, you always think about the things that are like, huh, that's kind of odd. Like, oh wait, it runs code and never outputs it to the user. Oh yeah, and it runs it at most once. And so you try to basically invert whatever the system prompt is saying, because that is exactly what they are trying to prevent.

So we figured out, oh, this thing does have a code tool. And we tried running something, and it turns out it only allows me to write Python, and it doesn't allow me to run these really dangerous function calls. And it had two kind of innocent permissions: write a Python file and read some files. Because what if we just looked around the file system now, right? So we asked it to build a little tree functionality and return the entire file tree.
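That "little tree functionality" needs nothing beyond the innocent read permission. A sketch of what such a snippet could look like (an illustrative reconstruction, not the code from the talk):

```python
# Sketch: mapping the sandbox's file system with nothing but read access.
# Illustrative reconstruction, not the code used in the talk.
import os

def tree(root: str = ".", max_depth: int = 3) -> None:
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        if depth > max_depth:
            dirnames[:] = []                     # stop descending past max_depth
            continue
        print(f"{'  ' * depth}{os.path.basename(dirpath) or dirpath}/")
        for name in filenames:
            print(f"{'  ' * (depth + 1)}{name}")

tree()   # an agent with a "read files" permission can return this whole listing
```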
And then we saw, oh, it has two endpoints: write file and execute file. These endpoints are hidden behind a VPC, so we cannot hit them directly. But the app.py file is where all the protections for their code live. And so we could just overwrite that app.py file, replacing all the security checks with empty strings.
The thing with arbitrary code execution, once you're inside a container, is that you can do a lot more than the tool intended. There's this thing called service endpoint discovery, or metadata discovery. Basically, it lets you discover what other devices are on the network and what other resources are there. And you can also just fetch the user token -- sorry, the service token -- and see what's going on.
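The metadata discovery he mentions is a standard cloud primitive: from inside the workload you can ask the provider's metadata server who you are and what token you hold. On GCP, for example, the documented endpoint looks like this (what scopes come back depends entirely on how the service account was configured):

```python
# Sketch: fetching the attached service account's token from inside a GCP workload.
# This is the documented GCP metadata server; it is only reachable from inside GCP.
import requests

METADATA = "http://metadata.google.internal/computeMetadata/v1"

resp = requests.get(
    f"{METADATA}/instance/service-accounts/default/token",
    headers={"Metadata-Flavor": "Google"},   # required header for the metadata server
    timeout=2,
)
print(resp.json())   # access_token, expires_in, token_type for whatever roles were granted
```

With broad scopes on that token, querying an internal data warehouse is one authenticated API call away, which is why over-permissioned service accounts inside code sandboxes are so dangerous.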
Who has really, really spent time configuring service-level tokens and their permissions in a granular manner, does it all the time, and never forgets to set something wrong? So we just queried BigQuery, which has a great interface for that. Getting code sandboxes right is very hard, because you can move laterally across the infrastructure, and that is just very, very dangerous. And so, kind of like "don't roll your own auth" in the web world: don't roll your own code sandbox. There's one in our YC batch that I personally just genuinely really love, and what I love about them is they have an MCP server that's just as easy to plug into. Don't do, you know, your own Python app.py thing.
So that leads into the third attack vector: server-side request forgery. It's a very long phrase, and it really bugs me that "SSRF" didn't fit on the previous line. This is what happens when you can get a tool to call another endpoint that you, well, that the service itself didn't intend you to call. And you can pull out a lot of information just through that workflow.
It's like, huh, it pulls the database schema from a private GitHub repository. That means whatever request goes to that private GitHub repository must carry the Git credentials; otherwise, how could it pull from a private repository? So I guess I can just put in whatever string I want and coerce it into providing that. So let's set up a test.git repo on badactor.com and just see what credentials come through. And, yep, it comes across with the Git credentials. And so now you can actually take those Git credentials and just download the entire code base that was supposed to stay private.
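On the defensive side, a hedged sketch: if a tool accepts anything URL- or repo-shaped from the user (or from the LLM), validate the host against an allowlist before any credential-bearing request goes out. The names below (`ALLOWED_GIT_HOSTS`, `validate_repo_url`) are hypothetical.

```python
# Sketch: allowlist outbound hosts before a tool makes any credentialed request.
# ALLOWED_GIT_HOSTS and validate_repo_url are hypothetical names for illustration.
from urllib.parse import urlparse

ALLOWED_GIT_HOSTS = {"github.com"}

def validate_repo_url(url: str) -> str:
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError("only https repository URLs are allowed")
    host = (parsed.hostname or "").lower()
    if host not in ALLOWED_GIT_HOSTS:
        # Refuse before any request is made, so credentials never travel
        # to an attacker-controlled host like badactor.com.
        raise ValueError(f"{host!r} is not an allowed git host")
    return url

# Usage: validate first, then let the tool clone or fetch with its credentials.
repo_url = validate_repo_url("https://github.com/example-org/example-repo.git")
```

A fuller defense also blocks private IP ranges and the cloud metadata endpoints shown earlier, and keeps long-lived credentials out of any tool that takes attacker-influenced URLs.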
Now, we told our batchmates immediately, and they told us, don't worry, bro. That company's secure, if you're a VC listening in. But with that, it is really important to think about the implications of what your agent can actually reach. I love vibe coding, not gonna lie, but you've got to really think about where all these arrows are and whether you've configured those things correctly. So with that, always sanitize your inputs and outputs. This could be a web dev conference talk from 20 years ago. We just need to keep those good security practices that we have, hopefully, learned to love over the years, and carry them forward to a new technology paradigm.
And then, ultimately, I want you to take away three things. First, agent security is bigger than just LLM security. Make sure you understand how these threat vectors apply inside your overall system. Second, treat agents as users, and that applies to authentication, to sanitization of user inputs, and many of the other things. And last, definitely don't roll your own code sandbox. You know, it very quickly turns from an intern project into a nightmare. And these are just the most basic issues that we've seen come across, right? And if you don't know exactly what your agent's security posture is, you can go to casco.com. We built an AI agent that actively attacks other AI agents and tells you where they break. And, yeah, feel free to connect with me on LinkedIn or on Twitter; every now and then, I have some good stuff to post.
We could have time for, like, one or two quick questions if you're game for it.

There are a lot of open techniques out there. The best one that I've seen is from hiddenlayer.com; they have a great blog post on what's called a policy puppetry attack.

For coding agents, how do you make sure, given that the coding agent itself is not compromised, that it's actually not running something dangerous? If you try to whitelist commands, there are so many creative ways to get around that.

Are you talking about running it locally or server-side? I mean, locally is even more dangerous, because they run with the privileges of the user running them. So locally, I think right now the industry either goes full YOLO mode or asks for approval on every command. And then on the server side, use a code sandbox, because ultimately they have constraints around the internal networks, and they also have constraints around how long they can live as a sandbox. They typically use something called Firecracker under the hood, which is a better isolation layer. And if you just use containers, by the way, that's not an isolation layer, in case anybody's wondering.