
How we hacked YC Spring 2025 batch’s AI agents — Rene Brandel, Casco


Chapters

0:00 Introduction to Casco and AI Agents
1:31 Evolution of Agent Stacks and Security Concerns
2:56 Why Casco Hacked AI Agents
4:00 Common Issue 1: Cross-User Data Access (IDOR)
7:38 Common Issue 2: Arbitrary Code Execution
12:38 Common Issue 3: Server-Side Request Forgery (SSRF)
14:48 Key Takeaways
15:28 Casco's Solution and Contact Information
15:56 Q&A

Whisper Transcript

00:00:02.000 | - So yeah, who's ready to hack some agents?
00:00:17.720 | Yeah, oh wow, all right.
00:00:19.800 | So let me first introduce myself a little bit.
00:00:22.040 | I'm Rene, I'm the CEO of Casco, we're a YC company,
00:00:25.120 | and we specialize in red teaming AI agents and apps.
00:00:28.600 | And so we spent--
00:00:30.400 | I spent my previous time at AWS working on AI agents,
00:00:33.740 | but I've always really loved working on AI.
00:00:36.940 | In fact, there's a video of me 10 years ago
00:00:39.680 | building voice to code, and I won Europe's largest hackathon
00:00:43.180 | by doing that.
00:00:44.200 | And so I would talk to it, say, build me a blog post,
00:00:47.040 | and it would generate the sites.
00:00:48.740 | And it was actually-- it was kind of fun.
00:00:50.500 | It did things like loading pictures from San Francisco.
00:00:55.100 | And you can see how horribly slow the APIs were back then.
00:00:58.160 | And I'm about to give you a nightmare
00:00:59.580 | by showing you the architecture diagram of that thing.
00:01:02.420 | But yeah, it kind of did the job.
00:01:04.580 | And this was like 10 years ago.
00:01:06.260 | Obviously, back then there was no generative AI,
00:01:08.120 | and these things were extremely difficult to do.
00:01:11.080 | But it really gave me a glimpse of what the future could look
00:01:14.720 | like, even back then as technology gets better, right?
00:01:18.140 | So obviously, many things have changed.
00:01:20.140 | Two months ago, I quit AWS and worked out of the garage
00:01:23.560 | with my co-founder.
00:01:24.780 | And we got into Y Combinator.
00:01:26.720 | So yay!
00:01:27.560 | That's awesome.
00:01:28.760 | And so from there, we also looked into how else have things
00:01:32.180 | evolved.
00:01:32.980 | Well, this was my architecture diagram from back then.
00:01:36.140 | You could see there was three different cloud providers,
00:01:38.680 | including IBM Watson, which was like forefront at the time.
00:01:41.900 | That's true.
00:01:43.600 | And before, it was like Microsoft LUIS, which was like some natural language
00:01:48.320 | understanding thing.
00:01:49.100 | And you can see it was just a lot of like piecing things together.
00:01:51.560 | And that was already kind of difficult to do.
00:01:54.100 | But nowadays, we see the stacks normalize significantly more, right?
00:01:57.940 | I think this is probably what the average agent stack looks like these days.
00:02:03.240 | Got some server front end.
00:02:05.040 | You talk to an API server that talks to an LLM, connects up with tools.
00:02:09.200 | And then you have a bunch of data sources associated to it.
00:02:12.120 | So this kind of normalization of agent stack is actually really good.
00:02:16.000 | That makes many things easier, definitely better than my hackathon project 10 years ago.
00:02:20.620 | But we need to think about the security posture around these systems.
00:02:24.620 | And my general impression over the last few years is like primary discussions around LLM security,
00:02:32.220 | really like, hey, can you do prompt injection?
00:02:34.800 | Can you get it to do harmful content, which is all really important.
00:02:39.640 | But the reality with security is you need to look at all the different arrows in your system.
00:02:45.340 | And that is typically where real damage happens, right?
00:02:49.560 | And so this is really agent security, and that is what I want to talk about today.
00:02:55.560 | Now, one thing is like, why did we even hack a bunch of agents?
00:02:59.320 | It's kind of a weird thing to do.
00:03:01.480 | The answer is, quite frankly, we wanted to launch internally at Y Combinator, and we wanted
00:03:06.360 | a splashy headline.
00:03:07.500 | And so we're like, uh-oh, what do we do?
00:03:10.840 | And fun fact, we have the second highest upvoted launch post inside Y Combinator of all time.
00:03:15.840 | So, higher than Rippling.
00:03:17.840 | Okay.
00:03:18.840 | So, we did basically this approach.
00:03:21.840 | At the time, we were looking at, oh, which agents are already live?
00:03:25.600 | And then let's just set a timer for 30 minutes.
00:03:27.600 | We don't want to waste too much time on this.
00:03:29.600 | And then, you know, let's figure out what their system prompts are and just kind of understand
00:03:33.440 | how they're working.
00:03:34.840 | And I have a feeling when I was creating this meme that this could be true, but it turns
00:03:38.960 | out it is true.
00:03:40.720 | And then we looked at, oh, what kind of tool definitions do they have, right?
00:03:45.300 | Like, you know, what is it supposed to do?
00:03:46.840 | Is it supposed to access data, supposed to run code, right?
00:03:50.640 | And then we just tried to exploit them and see what's going on.
00:03:54.660 | And it was really fun, because out of 16 agents that were launched,
00:03:58.840 | within 30 minutes each, we hacked seven of them, and there are three common issues we
00:04:04.280 | saw across all of them.
00:04:06.140 | So I hope that we will all learn today what the most common issues are, so you don't make
00:04:10.720 | the same mistakes.
00:04:11.800 | And also, this is going to be the best investment if you're a VC in this batch, because they're all
00:04:16.200 | secure now.
00:04:17.580 | So first issue, cross-user data access.
00:04:21.040 | I mean, you guys were just here at the OAuth talk.
00:04:23.640 | You know where this is going to head into, right?
00:04:26.840 | So we first leaked this company's system prompt, and we saw, huh, it has a bunch of interesting
00:04:32.300 | tools attached to it, including looking up user info by ID, suspicious, document by ID,
00:04:40.140 | and a bunch of other things.
00:04:41.140 | And then, you know, like, when you see this, you just want to be like, oh, yeah, there's
00:04:44.740 | this thing called IDOR, like Insecure Direct Object Reference.
00:04:49.340 | It's basically when you make a request, and you validate that, hey, the token's valid, and
00:04:53.840 | then you just let the request through, right?
00:04:55.680 | And you're kind of betting on the fact that the ID cannot be guessed.
00:04:58.900 | Well, that's obviously not good.
00:05:01.260 | So, yeah.
00:05:03.140 | We looked up a product demo video that they recorded, and we found the user ID in the URL.
00:05:08.980 | And just, like, tried to plug it in.
00:05:10.480 | This is a different ID, by the way.
00:05:11.980 | Don't worry, guys.
00:05:12.980 | This is my co-founder's ID now.
00:05:14.720 | And yeah.
00:05:15.720 | We were able to find their personal information, including their email, nickname, whatever.
00:05:19.980 | Well, it gets better because these things are also interconnected.
00:05:24.440 | So you had not only the user ID, but you also had, like, oh, the chat ID.
00:05:30.560 | And their document ID.
00:05:31.780 | And then these things ultimately link up together and allow you to traverse the entire
00:05:35.200 | system.
00:05:36.200 | Right?
00:05:37.200 | Okay.
00:05:38.200 | It's not good.
00:05:39.200 | So what's the fix for that?
00:05:41.420 | There was a really comprehensive talk literally right before this.
00:05:44.280 | Sorry for the folks that missed it.
00:05:45.660 | But this is the basic fix for it, right?
00:05:48.100 | You need to think about how do you authenticate but also authorize the request.
00:05:51.640 | It's really two checks, right?
00:05:53.180 | Make sure your token is valid.
00:05:55.020 | Good job, team.
00:05:56.020 | I got that.
00:05:57.020 | And then the second thing is, like, this is what we see in this Supabase era with row-level
00:06:00.720 | security.
00:06:01.720 | Just make sure that you have some sort of access control matrix somewhere that checks that it
00:06:06.400 | matches up with whoever's making the request.
00:06:08.940 | Okay?
00:06:09.940 | Super, super important.
00:06:11.400 | Authenticate and authorize.
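[The two checks he describes — authenticate the token, then authorize against an access-control matrix — can be sketched like this. All names, the toy ACL, and the token table are hypothetical stand-ins for a real database and a real token verifier such as JWT signature validation:]

```python
# Toy sketch of the two checks for a "look up document by ID" tool.
ACL = {  # access-control matrix: object ID -> owning user ID
    "doc-123": "user-alice",
    "doc-456": "user-bob",
}

TOKENS = {"alice-token": "user-alice", "bob-token": "user-bob"}

def authenticate(token):
    # Check 1: is the token valid, and which user does it belong to?
    return TOKENS.get(token)

def get_document(token, doc_id):
    user_id = authenticate(token)
    if user_id is None:
        raise PermissionError("invalid token")
    # Check 2: is THIS user allowed to read THIS object? Skipping this
    # check is exactly the IDOR bug -- a valid token plus a guessed ID
    # would return someone else's document.
    if ACL.get(doc_id) != user_id:
        raise PermissionError("not authorized for this document")
    return f"contents of {doc_id}"
```

[With only check 1, Alice's valid token plus Bob's guessed `doc-456` would leak Bob's data; check 2 is what row-level security gives you at the database layer.]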
00:06:14.400 | Now you can see this was actually, you know, an issue that was kind of there, right?
00:06:18.520 | It's not, like, around the LLM and the API server.
00:06:21.180 | It's really what is happening downstream.
00:06:23.840 | And yeah, there's a lot of errors in this diagram.
00:06:26.420 | We're going to look at all of them.
00:06:28.420 | So the next thing is to remember, as you're thinking about these tools and how you're building
00:06:33.600 | it, like, agents actually act like users, not API servers.
00:06:38.480 | When we were, like, debugging this issue, like, we actually asked a bunch of Y-Combinator companies,
00:06:42.680 | like, why did you build it this way?
00:06:44.940 | Because clearly they can build a web app properly, right?
00:06:48.240 | But it's just, like, I think, as developers, we have this natural pattern matching in our
00:06:52.320 | heads.
00:06:53.320 | It's like, oh, yeah, this thing runs in a server, so it should be like a service.
00:06:55.480 | And then I'm going to give it service-level permissions.
00:06:57.480 | But actually, agents are like users, right?
00:07:01.200 | So everything that applies to users applies to agents, too.
00:07:05.320 | So make sure that, you know, your LLM should probably not determine your authorization pattern.
00:07:09.360 | That's bad.
00:07:10.360 | That's a red flag.
00:07:11.360 | Second thing is it should probably not act with service-level permission.
00:07:14.440 | Listen to your previous talk on OAuth.
00:07:15.440 | That's great.
00:07:16.440 | And then, just like users, you should make sure you don't just accept any input.
00:07:21.480 | You should sanitize them.
00:07:23.240 | Same with outputs, right?
00:07:24.440 | A lot of these are like the traditional web application security things that you just need
00:07:28.440 | to, like, really, really internalize for this new world.
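[One way to read the "agents act like users" rule in code — hypothetical names, sketching the idea rather than any particular framework — is that a tool call should carry the requesting user's credentials downstream, never a service key:]

```python
# The downstream API receives the same bearer token the user would
# present directly, so its own authorization checks still apply to
# everything the agent does on that user's behalf.
def call_tool_as_user(user_token, tool, args):
    headers = {"Authorization": f"Bearer {user_token}"}
    return tool(args, headers=headers)

# Anti-pattern, for contrast: every tool call runs with service-level
# permissions, so the LLM's output effectively decides authorization.
def call_tool_as_service(service_key, tool, args):
    headers = {"Authorization": f"Bearer {service_key}"}  # red flag
    return tool(args, headers=headers)
```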
00:07:32.500 | Now that was interesting.
00:07:34.340 | And so the second one was even better.
00:07:36.940 | So this is not as common, but the damage is bigger.
00:07:40.920 | So in the pattern we see, there are a lot of code tools that agents use.
00:07:46.220 | And there's an anthropic paper here.
00:07:49.340 | It basically talks about what's the distribution of which industry and how much do they use Claude.
00:07:55.320 | And there's, like, this one outlier here.
00:07:56.580 | I'll zoom it in for you.
00:07:59.120 | Yeah, so us nerds, we make up 3.4% of the world, but we're 37% of Claude's usage.
00:08:05.540 | Ooh, why is that?
00:08:06.460 | Because we love computers and we love coding, right?
00:08:08.520 | And so we found immediately the value of it.
00:08:10.940 | But it's not just us that use agents with coding tools.
00:08:15.060 | In fact, many agents create code on demand to do some things, right?
00:08:19.480 | Like some agents just generate a calculator on demand to make a calculation, right?
00:08:24.220 | And so there's a lot of these code execution sandboxes out there that are interesting.
00:08:29.780 | And so if you think about that, there's actually a critical path in your system,
00:08:34.620 | because you've got a tool that talks to another container.
00:08:37.460 | A container is arbitrary compute.
00:08:39.180 | And when you have arbitrary compute, many things can happen.
00:08:42.300 | Many bad things, many good things, right?
00:08:44.180 | But let's talk about the bad things today.
00:08:46.040 | So we did the same script, did the system prompt.
00:08:48.680 | Again, the system prompt itself, great.
00:08:50.440 | I mean, it doesn't cause any damage.
00:08:52.320 | But as an attacker, you always think about the things that are like, huh, that's kind of
00:08:57.340 | suspicious, right?
00:08:58.340 | It's like, oh, wait, it runs code and never outputted it to the user.
00:09:03.180 | OK, let's output it to the user.
00:09:04.560 | Oh, yeah, and mostly run it mostly at most once.
00:09:08.640 | Let's run it all the time.
00:09:09.940 | And so you try to basically invert what the system prompt is saying, because that is exactly
00:09:14.940 | what the developer didn't want you to do.
00:09:16.820 | And that is how bad actors think, right?
00:09:19.700 | So we figured out, oh, this thing does have a code tool.
00:09:22.740 | And so we tried running something that's like, oh, it only allows me to write Python.
00:09:28.120 | And I love JavaScript.
00:09:30.000 | And it doesn't allow me to run these really dangerous function calls.
00:09:33.600 | Oh, OK.
00:09:34.600 | And it restricts which Python files to run.
00:09:37.100 | That's also not good.
00:09:38.600 | So, yeah, but we looked at what it could do.
00:09:41.880 | And it had two kind of innocent permissions, write a Python file and read some files.
00:09:49.300 | You can do a lot with that.
00:09:50.300 | This is great.
00:09:51.300 | Because what if we just looked around the file system now, right?
00:09:54.560 | We can read files.
00:09:55.740 | So we looked at, oh, build me a little tree functionality and, you know, return me the entire
00:10:00.600 | file system tree to see what's going on.
00:10:02.600 | Oh, my God, there's an app.py file.
00:10:04.900 | That's probably important.
00:10:05.900 | And then we looked at, oh, it has two endpoints, write file and execute file.
00:10:10.880 | Ah, OK.
00:10:11.880 | These endpoints are hidden behind a VPC, so we cannot hit it directly.
00:10:15.080 | That's OK.
00:10:16.080 | But, huh, we can write files.
00:10:19.200 | Huh, we can write files.
00:10:21.200 | There's an app.py file.
00:10:24.420 | Let's look into that.
00:10:25.420 | Oh, wait.
00:10:26.420 | That's where all the protections are for their code.
00:10:29.840 | And so we can just override the app.py file with empty strings around all the security
00:10:35.620 | checks.
00:10:37.260 | And whoopsie.
00:10:39.440 | We got in.
00:10:40.360 | So now we can Bitcoin mine all day.
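[The pattern just described reduces to this toy reconstruction (hypothetical code, not the actual company's service): a "sandbox" exposing write-file and execute-file, with its safety checks living on the same writable filesystem the attacker can reach.]

```python
import importlib.util
import os
import subprocess
import sys
import tempfile

SANDBOX = tempfile.mkdtemp()

def write_file(name, content):
    with open(os.path.join(SANDBOX, name), "w") as f:
        f.write(content)

def _load_checks():
    # the service reloads its checker module from disk on every call
    spec = importlib.util.spec_from_file_location(
        "checks", os.path.join(SANDBOX, "checks.py"))
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

def execute_file(name):
    path = os.path.join(SANDBOX, name)
    with open(path) as f:
        code = f.read()
    if not _load_checks().is_safe(code):  # the "protection"
        return "blocked"
    out = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return out.stdout

# the shipped checker: a naive substring blocklist
write_file("checks.py", "def is_safe(code):\n    return 'import os' not in code\n")

# the attacker's payload is blocked at first...
write_file("payload.py", "import os\nprint('pwned')\n")
assert execute_file("payload.py") == "blocked"

# ...until the same write primitive overwrites the checker itself
write_file("checks.py", "def is_safe(code):\n    return True\n")
assert execute_file("payload.py") == "pwned\n"
```

[Because the checks ride along in attacker-writable space, no amount of blocklisting helps once you have a write primitive.]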
00:10:42.420 | That's great, right?
00:10:43.420 | Yeah?
00:10:44.420 | No, it gets much worse.
00:10:45.580 | So the thing with arbitrary code execution, once you're inside a container, is that you can
00:10:51.280 | do many things.
00:10:52.660 | Like there's this thing called service endpoint discovery, metadata discovery.
00:10:56.400 | Y'all heard of that?
00:10:59.820 | Basically, it allows you to discover what are other devices on the network?
00:11:03.720 | What other resources are there on the network?
00:11:05.680 | And you can also just fetch the user token -- sorry, the service token -- just to see what's going on.
00:11:11.820 | What's the project name?
00:11:12.820 | And you start looking around.
00:11:13.820 | It's like, oh, OK.
00:11:14.820 | Yeah, OK.
00:11:15.820 | I can also fetch the scopes.
00:11:17.060 | So I can do many things with this token.
00:11:19.180 | That's awesome.
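[On GCP, for example, the metadata service answers plain HTTP from inside the instance (other clouds have close equivalents), so any code running in the container can build requests like these — shown here as request construction only, without sending anything:]

```python
from urllib.request import Request

METADATA = "http://metadata.google.internal/computeMetadata/v1"

def metadata_request(path):
    # the only gate is this header, which any in-instance code can set
    return Request(f"{METADATA}/{path}", headers={"Metadata-Flavor": "Google"})

# what lateral discovery from inside a compromised sandbox would fetch:
token_req   = metadata_request("instance/service-accounts/default/token")
scopes_req  = metadata_request("instance/service-accounts/default/scopes")
project_req = metadata_request("project/project-id")
```

[Sending those three requests yields a live service token, its scopes, and the project name — everything needed to start querying services like BigQuery with whatever permissions the token carries.]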
00:11:21.180 | Who has really, really spent time configuring service-level tokens and their permissions
00:11:26.900 | in a granular manner and does it all the time and never forgets to set something wrong?
00:11:33.180 | One guy.
00:11:34.180 | One guy.
00:11:35.180 | There.
00:11:37.180 | Whoopsie.
00:11:38.180 | So that's -- and we just queried BigQuery, which has a great interface for that.
00:11:41.180 | Isn't that cool?
00:11:42.180 | Yeah.
00:11:43.180 | So, yeah.
00:11:44.180 | Making sure you have code sandboxes correctly is very hard, because you can move laterally
00:11:48.980 | across the infrastructure, and that is just very, very dangerous.
00:11:53.180 | And so, kind of like don't roll your own auth in the web world.
00:11:57.080 | Don't roll your own code sandboxes, please.
00:11:59.080 | It's just very hard.
00:12:00.080 | It's very, very hard.
00:12:02.080 | And so use an out-of-the-box solution.
00:12:04.080 | There are many of them.
00:12:05.080 | E2B is, I think, a very popular one.
00:12:07.080 | Some folks have probably heard of it.
00:12:09.080 | There's one in our YC batch that I personally just genuinely really love.
00:12:13.080 | They have observability built in.
00:12:14.080 | They boot up super quickly.
00:12:16.080 | And what I love about them is they have an MCP server that's just as easy to plug into,
00:12:19.080 | right?
00:12:20.080 | So just easier for your agents to work with.
00:12:22.080 | So please do that.
00:12:24.080 | Don't do, you know, your own Python, app.py thing.
00:12:27.080 | It's not good.
00:12:28.080 | Trust me.
00:12:30.080 | So that leads into a third part of an attack vector around server-side request forgery.
00:12:37.080 | It's a very long word and really bugs me that the SSRF didn't fit on the previous line.
00:12:42.080 | It really triggers me.
00:12:44.080 | Yeah, I know.
00:12:46.080 | So this is what happens when you can kind of get a tool to call another endpoint that you
00:12:53.080 | didn't, you know, that the service itself didn't intend you to call.
00:12:56.080 | And you can pull out a lot of information just through that workflow.
00:13:00.080 | So let me give you an example.
00:13:02.080 | So this is exactly.
00:13:04.080 | Extract a system prompt.
00:13:05.080 | Great.
00:13:06.080 | Oh, this thing can create databases.
00:13:08.080 | That sounds exciting.
00:13:09.080 | And then you look into it.
00:13:11.080 | It's like, huh, it pulls the database schema from a private GitHub repository.
00:13:18.080 | Isn't that great?
00:13:20.080 | That means whatever request goes to that private GitHub repository must have the Git credentials.
00:13:25.080 | Right?
00:13:26.080 | Otherwise, how can it pull that from a private repository?
00:13:28.080 | So, yeah, and it's just a string.
00:13:31.080 | So I guess I can just put in whatever string I want and coerce it into providing that.
00:13:35.080 | So let's set up a badactor.com test.git repo and just see what credentials come through.
00:13:41.080 | And, yep, it comes across with the Git credentials.
00:13:45.080 | And so now you can actually take those Git credentials and just download your entire code base that was
00:13:50.080 | behind the private repo.
00:13:51.080 | Isn't that crazy?
00:13:52.080 | Isn't that crazy?
00:13:53.080 | Yeah.
00:13:54.080 | I mean, it's awesome for me to do this.
00:13:57.080 | Right?
00:13:58.080 | You get paid to do this.
00:13:59.080 | Oh, my.
00:14:00.080 | It's amazing.
00:14:01.080 | Now, we told our batchmates immediately, and they told us, don't worry, bro.
00:14:04.080 | It's already fixed.
00:14:05.080 | It's okay, guys.
00:14:06.080 | That company's secure if you're a VC listening in.
00:14:09.080 | But with that, though, it is really important to think about the implications of what your
00:14:15.080 | system is doing.
00:14:16.080 | Right?
00:14:17.080 | I love Vibe coding, not gonna lie, but, like, you gotta really think about where all these
00:14:22.080 | arrows are and if you've configured those things correctly.
00:14:25.080 | So with that, always sanitize your inputs and outputs.
00:14:29.080 | This could be, like, a web dev conference from 20 years ago.
00:14:33.080 | But it applies to agents, too, right?
00:14:36.080 | Like, we just need to make sure we keep those good security practices that we have learned
00:14:41.080 | to love, hopefully, over the years to take it forward to a new technology paradigm.
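[For the SSRF case, "sanitize your inputs" concretely means validating any user-influenced URL before the server fetches it. A minimal sketch — the https-plus-allowlist policy here is an assumption for illustration, not the fixed company's actual code:]

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"github.com"}  # only the hosts this tool is meant to reach

def is_safe_fetch_url(url):
    parsed = urlparse(url)
    if parsed.scheme != "https":              # rejects http://, file://, gopher://
        return False
    if parsed.hostname not in ALLOWED_HOSTS:  # rejects badactor.com, metadata IPs
        return False
    return True

assert is_safe_fetch_url("https://github.com/org/repo.git")
assert not is_safe_fetch_url("https://badactor.com/test.git")
assert not is_safe_fetch_url("http://169.254.169.254/latest/meta-data/")
```

[And never attach stored credentials, like the Git token here, to a request whose destination the user can influence.]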
00:14:46.080 | And then, ultimately, I want you to take away three things.
00:14:50.080 | So first thing is, agent security is bigger than just LLM security.
00:14:55.080 | Make sure you understand how these threat vectors apply inside your overall system.
00:15:00.080 | Second thing is, treat agents as users, and that applies to authentication, to sanitization
00:15:05.080 | of user inputs, and many of the other things.
00:15:08.080 | And last thing, definitely don't roll your own code sandbox.
00:15:11.080 | That is just so dangerous.
00:15:13.080 | And, you know, it very quickly turns from, like, an intern project into, like, a nightmare.
00:15:17.080 | So be very, very careful of that.
00:15:20.080 | And these are the most basic ones that we've seen come across, right?
00:15:24.080 | There's obviously many more security issues.
00:15:26.080 | And if you don't know exactly what your agent's security posture is, you can go to casco.com.
00:15:31.080 | You can book a demo with us.
00:15:33.080 | We built an AI agent that actively attacks other AI agents and tells you where they break.
00:15:38.080 | Isn't that great?
00:15:39.080 | And, yeah, feel free to connect with me on LinkedIn or on Twitter.
00:15:42.080 | And every now and then, I have some good stuff to post.
00:15:45.080 | Yeah.
00:15:46.080 | Awesome.
00:15:51.080 | Thanks, Rene.
00:15:52.080 | Does anyone have any questions?
00:15:53.080 | We could have time for, like, one or two quick questions if you're game for it.
00:15:57.080 | Sure.
00:15:58.080 | How do you look at the system prompts?
00:16:01.080 | How do I look at system prompts?
00:16:02.080 | There's a lot of just, like, open techniques.
00:16:04.080 | The best one that I've seen is from hiddenlayer.com.
00:16:07.080 | Have you guys checked those guys out?
00:16:08.080 | They have a great blog post on, like, it's a Policy Puppetry attack.
00:16:13.080 | Yeah.
00:16:14.080 | It's great.
00:16:15.080 | Cool.
00:16:16.080 | Awesome.
00:16:18.080 | Yeah.
00:16:19.080 | If you're using, like, coding agents, how do you make sure the coding agent
00:16:25.080 | is not compromised, like, how do you make sure that it's actually running
00:16:28.080 | the proper commands?
00:16:29.080 | Like, this is a super tough thing to do.
00:16:31.080 | Like, if you try to whitelist them, like, there's so many creative ways to get
00:16:36.080 | around it.
00:16:37.080 | But, like, how are you--
00:16:38.080 | Yeah.
00:16:39.080 | Are you talking about it locally or server-side?
00:16:41.080 | They're on both, like--
00:16:43.080 | Yeah.
00:16:44.080 | I mean, locally is even more dangerous because they have the privileges of the user running
00:16:48.080 | the code.
00:16:49.080 | Yeah, no, very much so.
00:16:50.080 | So locally, I think right now the industry is either you go full YOLO mode or you ask every
00:16:56.080 | time, right?
00:16:57.080 | I mean, I'm not joking.
00:16:58.080 | Cursor's thing is called YOLO mode, right?
00:17:00.080 | And then on server-side, use a code sandbox because ultimately they have constraints around
00:17:06.080 | the internal networks, but also they have constraints around how long they can live as a sandbox.
00:17:10.080 | Yeah.
00:17:11.080 | Okay, so sandboxes that use VMs actually?
00:17:14.080 | Yeah.
00:17:15.080 | So they typically use something called Firecracker under the hood, which is a better isolation layer.
00:17:19.080 | Yeah, if you just use containers, by the way, that's not an isolation layer in case anybody's
00:17:23.080 | wondering.
00:17:24.080 | Yeah.
00:17:25.080 | Yeah, don't use containers for isolation.
00:17:26.080 | Yeah.
00:17:27.580 | We'll see you next time.