- - So yeah, who's ready to hack some agents? Yeah, oh wow, all right. So let me first introduce myself a little bit. I'm Rene, I'm the CEO of Casco, we're a YC company, and we specialize in red teaming AI agents and apps. And so we spent-- I spent my previous time at AWS working on AI agents, but I've always really loved working on AI.
In fact, there's a video of me 10 years ago building voice to code, and I won Europe's largest hackathon by doing that. And so I would talk to it, say, build me a blog post, and it would generate the sites. And it was actually-- it was kind of fun.
It did things like loading pictures from San Francisco. And you can see how horribly slow the APIs were back then. And I'm about to give you a nightmare by showing you the architecture diagram of that thing. But yeah, it kind of did the job. And this was like 10 years ago.
Obviously, back then was no generative AI, and these things were extremely difficult to do. But it really gave me a glimpse of what the future could look like, even back then as technology gets better, right? So obviously, many things have changed. Two months ago, I quit AWS and worked out of the garage with my co-founder.
And we got into Y Combinator. So yay! That's awesome. And so from there, we also looked into how else have things evolved. Well, this was my architecture diagram from back then. You could see there was three different cloud providers, including IBM Watson, which was like forefront at the time.
That's true. And before that, it was Microsoft LUIS, which was a natural language understanding service. And you can see it was just a lot of piecing things together. And that was already kind of difficult to do. But nowadays, we see the stacks normalize significantly more, right? I think this is probably what the average agent stack looks like these days.
Got some server front end. You talk to an API server that talks to an LLM, connects up with tools. And then you have a bunch of data sources associated to it. So this kind of normalization of agent stack is actually really good. That makes many things easier, definitely better than my HackerFarm project 10 years ago.
But we need to think about the security posture around these systems. And my general impression over the last few years is like primary discussions around LLM security, really like, hey, can you do prompt injection? Can you get it to do harmful content, which is all really important. But the reality with security is you need to look at all the different arrows in your system.
And that is typically where real damage happens, right? And so this is really agent security, and that is what I want to talk about today. Now, one thing is like, why did we even hack a bunch of agents? It's kind of a weird thing to do. The answer is, quite frankly, we wanted to launch internally at Y Combinator, and we wanted a splashy headline.
And so we're like, uh-oh, what do we do? And fun fact, we have the second highest upvoted launch post inside Y Combinator of all time. So, higher than Rippling. Yes. Okay. So, we did basically this approach. At a time, we were looking at, oh, which agents are already live?
And then let's just set a timer for 30 minutes. We don't want to waste too much time on this. And then, you know, let's figure out what their system prompts are and just kind of understand how they're working. And I have a feeling when I was creating this meme that this could be true, but it turns out it is true.
And then we looked at, oh, what kind of tool definitions do they have, right? Like, you know, what is it supposed to do? Is it supposed to access data, supposed to run code, right? And then we just tried to exploit them and see what's going on. And it was really fun, because out of 16 agents that were launched,
we hacked seven of them within 30 minutes each, and there are three common issues we saw across all of them. So I hope that we will all learn today what the most common issues are, so you don't make the same mistakes. And also, this is going to be the best investment batch if you're a VC, because they're all secure now.
So first issue, cross-user data access. I mean, you guys were just here at the OAuth talk. You know where this is going to head, right? So we first leaked this company's system prompt, and we saw, huh, it has a bunch of interesting tools attached to it, including looking up user info by ID, suspicious, document by ID, and a bunch of other things.
And then, you know, like, when you see this, you just want to be like, oh, yeah, there's this thing called IDOR, like Insecure Direct Object Reference. It's basically when you make a request, and you validate that, hey, the token's valid, and then you just let the request through, right?
And you're kind of betting on the fact that the ID cannot be guessed. Well, that's obviously not good. So, yeah. We looked up a product demo video that they recorded, and we found the user ID in the URL bar. And just, like, tried to plug it in. This is a different ID, by the way.
Don't worry, guys. This is my co-founder's ID now. And yeah. We were able to find their personal information, including their email, nickname, whatever. Well, it gets better because these things are also interconnected. So you had not only the user ID, but you also had, like, oh, the chat ID.
Oh. And their document ID. And then these things ultimately linked up together and allowed you to traverse the entire system. Right? Okay. It's not good. So what's the fix for that? There was a really comprehensive talk literally right before this one. Sorry for the folks that missed it. But this is the basic fix for it, right?
You need to think about how do you authenticate but also authorize the request. It's really two checks, right? Make sure your token is valid. Good job, team. I got that. And then the second thing is, like, this is what we see in this Supabase era with row-level security.
Just make sure that you have some sort of access control matrix somewhere that checks that it matches up with whoever's making the request. Okay? Super, super important. Authenticate and authorize. Now you can see this was actually, you know, an issue that was kind of there, right? It's not, like, around the LLM and the API server.
It's really what is happening downstream. And yeah, there's a lot of errors in this diagram. We're going to look at all of them. So the next thing is to remember, as you're thinking about these tools and how you're building it, like, agents actually act like users, not API servers.
When we were, like, debugging this issue, like, we actually asked a bunch of Y-Combinator companies, like, why did you build it this way? Because clearly they can build a web app properly, right? But it's just, like, I think, as developers, we have this natural pattern matching in our heads.
It's like, oh, yeah, this thing runs in a server, so it should be like a service. And then I'm going to give it service-level permissions. But actually, agents are like users, right? So everything that applies to users applies to agents, too. So make sure that, you know, your LLM should probably not determine your authorization pattern.
That's bad. That's a red flag. Second thing is it should probably not act with service-level permission. Listen to your previous talk on OAuth. That's great. And then, just like users, you should make sure you don't just accept any input. You should sanitize them. Same with outputs, right? A lot of these are like the traditional web application security things that you just need to, like, really, really internalize for this new world.
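To make that first lesson concrete, here's a minimal sketch of the authenticate-then-authorize pattern for an agent tool. Every name here (the token map, the ownership table, `get_document`) is a hypothetical stand-in, not the actual company's code:

```python
# Toy access-control table: document id -> owning user.
DOCUMENT_OWNERS = {
    "doc-123": "user-alice",
    "doc-456": "user-bob",
}

def verify_token(token: str):
    """Authentication: return the user id the token belongs to, or None.
    Stand-in for real JWT/session validation."""
    return {"tok-alice": "user-alice", "tok-bob": "user-bob"}.get(token)

def get_document(token: str, doc_id: str) -> str:
    user = verify_token(token)
    if user is None:
        raise PermissionError("unauthenticated")   # check 1: who are you?
    if DOCUMENT_OWNERS.get(doc_id) != user:
        raise PermissionError("forbidden")          # check 2: is this yours?
    return f"contents of {doc_id}"
```

An IDOR is exactly what you get when only check 1 exists: any valid token can fetch any guessable `doc_id`.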
Now that was interesting. And so the second one was even better. So this is not as common, but the damage is bigger. So in the pattern we see, there are a lot of code tools that agents use. And there's an Anthropic paper here. It basically breaks down which industries use Claude and how much.
And there's, like, this one outlier here. I'll zoom it in for you. Yeah, so us nerds, we make up 3.4% of the world, but we're 37% of Claude's usage. Ooh, why is that? Because we love computers and we love coding, right? And so we found immediately the value of it.
But it's not just us that use agents with coding tools. In fact, many agents create code on demand to do some things, right? Like some agents just generate a calculator on demand to make a calculation, right? And so there's a lot of these code execution sandboxes out there that are interesting.
And so if you think about that, there's actually a critical path in your system, because you've got a tool that talks to another container. A container is arbitrary compute. And when you have arbitrary compute, many things can happen. Many bad things, many good things, right? But let's talk about the bad things today.
So we did the same script, did the system prompt. Again, the system prompt itself, great. I mean, it doesn't cause any damage. But as an attacker, you always think about the things that are like, huh, that's kind of suspicious, right? It's like, oh, wait, it runs code and never outputted it to the user.
OK, let's output it to the user. Oh, yeah, and it's supposed to run it at most once. Let's run it all the time. And so you try to basically invert what the system prompt is saying, because that is exactly what the developer didn't want you to do. And that is how bad actors think, right?
So we figured out, oh, this thing does have a code tool. And so we tried running something that's like, oh, it only allows me to write Python. And I love JavaScript. And it doesn't allow me to run these really dangerous function calls. Oh, OK. And it restricts which Python files to run.
That's also not good. So, yeah, but we looked at what it could do. And it had two kind of innocent permissions, write a Python file and read some files. You can do a lot with that. This is great. Because what if we just looked around the file system now, right?
We can read files. So we looked at, oh, build me a little tree functionality and, you know, return me the entire file system tree to see what's going on. Oh, my God, there's an app.py file. That's probably important. And then we looked at, oh, it has two endpoints, write file and execute file.
Ah, OK. These endpoints are hidden behind a VPC, so we cannot hit it directly. That's OK. But, huh, we can write files. Huh, we can write files. There's an app.py file. Huh. Let's look into that. Oh, wait. That's where all the protections are for their code. And so we can just override the app.py file with empty strings around all the security checks.
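For what it's worth, the protections living in that app.py were presumably something like the path-and-filename guard sketched below (workspace path and protected names are made up). The whole problem is that the file holding the guard was itself writable:

```python
from pathlib import Path

# Hypothetical sandbox root and protected file names.
WORKSPACE = Path("/tmp/agent-workspace").resolve()
PROTECTED_NAMES = {"app.py"}  # the tool must never rewrite its own guardrails

def safe_write(relative_path: str, content: str) -> Path:
    target = (WORKSPACE / relative_path).resolve()
    # Reject paths that escape the workspace (e.g. "../app.py").
    if not target.is_relative_to(WORKSPACE):
        raise PermissionError(f"path escapes workspace: {relative_path}")
    # Reject writes to the application code itself.
    if target.name in PROTECTED_NAMES:
        raise PermissionError(f"refusing to overwrite {target.name}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target
```

A guard like this only helps if it's enforced outside the writable filesystem; if the checks live in a file the tool can reach, one write call erases them.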
And whoopsie. We got in. So now we can Bitcoin mine all day. That's great, right? Yeah? No, it gets much worse. So the thing with arbitrary code execution, once you're inside a container, is that you can do many things. Like there's this thing called service endpoint discovery, metadata discovery.
Y'all heard of that? No? OK. Basically, it allows you to discover what are other devices on the network? What other resources are there on the network? And you can also just fetch the user token -- sorry, the service token, just see what's going on. What's the project name? And you start looking around.
It's like, oh, OK. Yeah, OK. I can also fetch the scopes. So I can do many things with this token. That's awesome. Who has really, really spent time configuring service-level tokens and their permissions in a granular manner and does it all the time and never forgets to set something wrong?
OK. One guy. One guy. There. OK. Whoopsie. So that's -- and we just queried BigQuery, which has a great interface for that. Isn't that cool? Yeah. So, yeah. Getting code sandboxes right is very hard, because you can move laterally across the infrastructure, and that is just very, very dangerous.
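The metadata-discovery step is depressingly simple. On GCP, for instance, any process inside the box can ask the well-known metadata server for the service account's OAuth token. The sketch below only constructs the request, and you should obviously not point this at infrastructure you don't own:

```python
import urllib.request

# Well-known GCP metadata endpoint for the default service account's
# OAuth token; AWS and Azure have equivalents.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)

def build_token_request() -> urllib.request.Request:
    # The only "authentication" is this header proving you're on the box.
    return urllib.request.Request(
        METADATA_URL, headers={"Metadata-Flavor": "Google"}
    )

# Inside an unprotected sandbox, urlopen(build_token_request()) returns a
# bearer token carrying whatever scopes the service account was given,
# which is how you end up querying BigQuery from someone's code sandbox.
```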
OK? And so, kind of like "don't roll your own auth" in the web world, don't roll your own code sandboxes, please. It's just very hard. It's very, very hard. And so use an out-of-the-box solution. There are many of them. E2B is, I think, a very popular one. Some folks have probably heard of it.
There's one in our YC batch that I personally just genuinely really love. They have observability built in. They boot up super quickly. And what I love about them is they have an MCP server that's just as easy to plug into, right? So just easier for your agents to work with.
So please do that. Don't do, you know, your own Python, app.py thing. It's not good. Trust me. So that leads into a third part of an attack vector around server-side request forgery. It's a very long word and really bugs me that the SSRF didn't fit on the previous line.
It really triggers me. Yeah, I know. So this is what happens when you can kind of get a tool to call another endpoint that you didn't, you know, that the service itself didn't intend you to call. And you can pull out a lot of information just through that workflow.
So let me give you an example. So this is exactly. Extract a system prompt. Great. Oh, this thing can create databases. That sounds exciting. And then you look into it. It's like, huh, it pulls the database schema from a private GitHub repository. Isn't that great? That means whatever request goes to that private GitHub repository must have the Git credentials.
Right? Otherwise, how can it pull that from a private repository? So, yeah, and it's just a string. So I guess I can just put in whatever string I want and coerce it into providing that. So let's set up a badactor.com test.git repo and just see what credentials come through.
And, yep, it comes across with the Git credentials. And so now you can actually take those Git credentials and just download your entire code base that was behind the private repo. Isn't that crazy? Isn't that crazy? Yeah. I mean, it's awesome for me to do this. Right? You get paid to do this.
Oh, my. It's amazing. Now, we told our batchmates immediately, and they told us, don't worry, bro. It's already fixed. It's okay, guys. That company's secure if you're a VC listening in. But with that, though, it is really important to think about the implications of what your system is doing.
Right? I love vibe coding, not gonna lie, but, like, you gotta really think about where all these arrows are and if you've configured those things correctly. So with that, always sanitize your inputs and outputs. This could be, like, a web dev conference from 20 years ago. But it applies to agents, too, right?
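As a concrete instance of input sanitization, that SSRF dies if credentials are only ever attached to requests whose destination host sits on an explicit allowlist. A hedged sketch, with an assumed allowlist and function name:

```python
from urllib.parse import urlparse

# Assumption for illustration: schemas only ever live on github.com.
ALLOWED_GIT_HOSTS = {"github.com"}

def check_git_remote(url: str) -> str:
    """Validate an agent-supplied repo URL before credentials go near it."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"insecure scheme: {parsed.scheme!r}")
    if parsed.hostname not in ALLOWED_GIT_HOSTS:
        raise ValueError(f"host not allowed: {parsed.hostname!r}")
    return url  # only now is it safe to attach the Git credentials
```

With this check in front of the clone step, the badactor.com/test.git trick gets rejected before any credential leaves the service.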
Like, we just need to make sure we keep those good security practices that we have learned to love, hopefully, over the years to take it forward to a new technology paradigm. And then, ultimately, I want you to take away three things. So first thing is, agent security is bigger than just LLM security.
Make sure you understand how these threat vectors apply inside your overall system. Second thing is, treat agents as users, and that applies to authentication, to sanitization of user inputs, and many of the other things. And last thing, definitely don't roll your own code sandbox. That is just so dangerous.
And, you know, it very quickly turns from, like, an intern project into, like, a nightmare. So be very, very careful of that. And these are the most basic ones that we've seen come across, right? There's obviously many more security issues. And if you don't know exactly what your agent's security posture is, you can go to casco.com.
You can book a demo with us. We built an AI agent that actively attacks other AI agents and tells you where they break. Isn't that great? And, yeah, feel free to connect with me on LinkedIn or on Twitter. And I've, every now and then, some good stuff to post.
Yeah. Awesome. Thanks, Rene. Does anyone have any questions? We have time for, like, one or two quick questions if you're game for it. Sure. How do you leak the system prompts? How do I leak system prompts? There's a lot of just, like, open techniques. The best one that I've seen is from hiddenlayer.com.
Have you guys checked those guys out? They have a great blog post on, like, it's the Policy Puppetry attack. Yeah. It's great. Cool. Awesome. Oh? Yeah. If you're using, like, coding agents, how do you make sure the coding agent is not compromised, like, how do you make sure that it's actually running the proper commands?
Like, this is a super tough thing to do. Like, if you try to whitelist them, there's so many creative ways to get around it. But, like, how are you-- Yeah. Are you talking about locally or server-side? On both, like-- Yeah. I mean, locally is even more dangerous because they have the privileges of the user running the code.
Yeah, no, very much so. So locally, I think right now the industry is either you go full YOLO mode or you ask every time, right? I mean, I'm not joking. Cursor's thing is called YOLO mode, right? And then on server-side, use a code sandbox because ultimately they have constraints around the internal networks, but also they have constraints around how long they can live as a sandbox.
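On the "whitelisting is hopeless" point: a naive allowlist checks the binary name, but the binary name doesn't bound what the command actually does. A toy sketch (the allowlist contents are assumptions):

```python
import shlex

# A plausible-looking, and useless, command allowlist.
ALLOWED_BINARIES = {"ls", "cat", "python"}

def naive_allow(command: str) -> bool:
    # Checks only the first token of the command line.
    return shlex.split(command)[0] in ALLOWED_BINARIES

# "python" looks harmless, but `python -c '...'` is arbitrary code, and
# `cat` can read secrets out of the environment or config files. This is
# why isolation (a real sandbox) beats enumeration (an allowlist).
```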
Yeah. Okay, so sandboxes that use VMs actually? Yeah. So they typically use something called Firecracker under the hood, which is a better isolation layer. Yeah, if you just use containers, by the way, that's not an isolation layer, in case anybody's wondering. Yeah. Yeah, don't use containers for isolation. Yeah.
We'll see you next time.