
How we hacked YC Spring 2025 batch’s AI agents — Rene Brandel, Casco


Chapters

0:00 Introduction to Casco and AI Agents
1:31 Evolution of Agent Stacks and Security Concerns
2:56 Why Casco Hacked AI Agents
4:00 Common Issue 1: Cross-User Data Access (IDOR)
7:38 Common Issue 2: Arbitrary Code Execution
12:38 Common Issue 3: Server-Side Request Forgery (SSRF)
14:48 Key Takeaways
15:28 Casco's Solution and Contact Information
15:56 Q&A

Whisper Transcript

00:00:02.000 | - So yeah, who's ready to hack some agents?
00:00:17.720 | Yeah, oh wow, all right.
00:00:19.800 | So let me first introduce myself a little bit.
00:00:22.040 | I'm Rene, I'm the CEO of Casco, we're a YC company,
00:00:25.120 | and we specialize in red teaming AI agents and apps.
00:00:28.600 | And so we spent--
00:00:30.400 | I spent my previous time at AWS working on AI agents,
00:00:33.740 | but I've always really loved working on AI.
00:00:36.940 | In fact, there's a video of me 10 years ago
00:00:39.680 | building voice to code, and I won Europe's largest hackathon
00:00:43.180 | by doing that.
00:00:44.200 | And so I would talk to it, say, build me a blog post,
00:00:47.040 | and it would generate the sites.
00:00:48.740 | And it was actually-- it was kind of fun.
00:00:50.500 | It did things like loading pictures from San Francisco.
00:00:55.100 | And you can see how horribly slow the APIs were back then.
00:00:58.160 | And I'm about to give you a nightmare
00:00:59.580 | by showing you the architecture diagram of that thing.
00:01:02.420 | But yeah, it kind of did the job.
00:01:04.580 | And this was like 10 years ago.
00:01:06.260 | Obviously, back then there was no generative AI,
00:01:08.120 | and these things were extremely difficult to do.
00:01:11.080 | But it really gave me a glimpse of what the future could look
00:01:14.720 | like, even back then as technology gets better, right?
00:01:18.140 | So obviously, many things have changed.
00:01:20.140 | Two months ago, I quit AWS and worked out of the garage
00:01:23.560 | with my co-founder.
00:01:24.780 | And we got into Y Combinator.
00:01:26.720 | So yay!
00:01:27.560 | That's awesome.
00:01:28.760 | And so from there, we also looked into how else have things
00:01:32.180 | evolved.
00:01:32.980 | Well, this was my architecture diagram from back then.
00:01:36.140 | You could see there was three different cloud providers,
00:01:38.680 | including IBM Watson, which was like forefront at the time.
00:01:41.900 | That's true.
00:01:43.600 | And before, it was like Microsoft LUIS, which was like some natural language
00:01:48.320 | understanding thing.
00:01:49.100 | And you can see it was just a lot of like piecing things together.
00:01:51.560 | And that was already kind of difficult to do.
00:01:54.100 | But nowadays, we see the stacks normalize significantly more, right?
00:01:57.940 | I think this is probably what the average agent stack looks like these days.
00:02:03.240 | Got some server front end.
00:02:05.040 | You talk to an API server that talks to an LLM, connects up with tools.
00:02:09.200 | And then you have a bunch of data sources associated to it.
00:02:12.120 | So this kind of normalization of agent stack is actually really good.
00:02:16.000 | That makes many things easier, definitely better than my hackathon project 10 years ago.
00:02:20.620 | But we need to think about the security posture around these systems.
00:02:24.620 | And my general impression over the last few years is like primary discussions around LLM security,
00:02:32.220 | really like, hey, can you do prompt injection?
00:02:34.800 | Can you get it to do harmful content, which is all really important.
00:02:39.640 | But the reality with security is you need to look at all the different arrows in your system.
00:02:45.340 | And that is typically where real damage happens, right?
00:02:49.560 | And so this is really agent security, and that is what I want to talk about today.
00:02:55.560 | Now, one thing is like, why did we even hack a bunch of agents?
00:02:59.320 | It's kind of a weird thing to do.
00:03:01.480 | The answer is, quite frankly, we wanted to launch internally at Y Combinator, and we wanted
00:03:06.360 | a splashy headline.
00:03:07.500 | And so we're like, uh-oh, what do we do?
00:03:10.840 | And fun fact, we have the second highest upvoted launch post inside Y Combinator of all time.
00:03:15.840 | So, higher than Rippling.
00:03:17.840 | Okay.
00:03:18.840 | So, we did basically this approach.
00:03:21.840 | At the time, we were looking at, oh, which agents are already live?
00:03:25.600 | And then let's just set a timer for 30 minutes.
00:03:27.600 | We don't want to waste too much time on this.
00:03:29.600 | And then, you know, let's figure out what their system prompts are and just kind of understand
00:03:33.440 | how they're working.
00:03:34.840 | And I have a feeling when I was creating this meme that this could be true, but it turns
00:03:38.960 | out it is true.
00:03:40.720 | And then we looked at, oh, what kind of tool definitions do they have, right?
00:03:45.300 | Like, you know, what is it supposed to do?
00:03:46.840 | Is it supposed to access data, supposed to run code, right?
00:03:50.640 | And then we just tried to exploit them and see what's going on.
00:03:54.660 | And it was really fun, because out of 16 agents that were launched,
00:03:58.840 | within 30 minutes each, we hacked seven of them, and there are three common issues we
00:04:04.280 | saw across all of them.
00:04:06.140 | So I hope that we will all learn today what the most common issues are, so you don't make
00:04:10.720 | the same mistakes.
00:04:11.800 | And also, this is going to be the best investment if you're a VC in this batch, because they're all
00:04:16.200 | secure now.
00:04:17.580 | So first issue, cross-user data access.
00:04:21.040 | I mean, you guys were just here at the OAuth talk.
00:04:23.640 | You know where this is going to head into, right?
00:04:26.840 | So we first leaked this company's system prompt, and we saw, huh, it has a bunch of interesting
00:04:32.300 | tools attached to it, including looking up user info by ID, suspicious, document by ID,
00:04:40.140 | and a bunch of other things.
00:04:41.140 | And then, you know, like, when you see this, you just want to be like, oh, yeah, there's
00:04:44.740 | this thing called IDOR, like Insecure Direct Object Reference.
00:04:49.340 | It's basically when you make a request, and you validate that, hey, the token's valid, and
00:04:53.840 | then you just let the request through, right?
00:04:55.680 | And you're kind of betting on the fact that the ID cannot be guessed.
00:04:58.900 | Well, that's obviously not good.
00:05:01.260 | So, yeah.
00:05:03.140 | We looked up a product demo video that they recorded, and we found the user ID in the URL.
00:05:08.980 | And just, like, tried to plug it in.
00:05:10.480 | This is a different ID, by the way.
00:05:11.980 | Don't worry, guys.
00:05:12.980 | This is my co-founder's ID now.
00:05:14.720 | And yeah.
00:05:15.720 | We were able to find their personal information, including their email, nickname, whatever.
00:05:19.980 | Well, it gets better because these things are also interconnected.
00:05:24.440 | So you had not only the user ID, but you also had, like, oh, the chat ID.
00:05:30.560 | And their document ID.
00:05:31.780 | And then these things ultimately link up together and allow you to traverse the entire
00:05:35.200 | system.
00:05:36.200 | Right?
00:05:37.200 | Okay.
00:05:38.200 | It's not good.
00:05:39.200 | So what's the fix for that?
00:05:41.420 | There was a really comprehensive talk literally right before this.
00:05:44.280 | Sorry for the folks that missed it.
00:05:45.660 | But this is the basic fix for it, right?
00:05:48.100 | You need to think about how do you authenticate but also authorize the request.
00:05:51.640 | It's really two checks, right?
00:05:53.180 | Make sure your token is valid.
00:05:55.020 | Good job, team.
00:05:56.020 | I got that.
00:05:57.020 | And then the second thing is, like, this is what we see in this Supabase era with row-level
00:06:00.720 | security.
00:06:01.720 | Just make sure that you have some sort of access control matrix somewhere that checks that it
00:06:06.400 | matches up with whoever's making the request.
00:06:08.940 | Okay?
00:06:09.940 | Super, super important.
00:06:11.400 | Authenticate and authorize.
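[The two checks he describes — authenticate the token, then authorize against an access-control matrix — can be sketched like this. All names, the toy ACL, and the token table are hypothetical stand-ins for a real database and a real token verifier such as JWT signature validation:]

```python
# Toy sketch of the two checks for a "look up document by ID" tool.
ACL = {  # access-control matrix: object ID -> owning user ID
    "doc-123": "user-alice",
    "doc-456": "user-bob",
}

TOKENS = {"alice-token": "user-alice", "bob-token": "user-bob"}

def authenticate(token):
    # Check 1: is the token valid, and which user does it belong to?
    return TOKENS.get(token)

def get_document(token, doc_id):
    user_id = authenticate(token)
    if user_id is None:
        raise PermissionError("invalid token")
    # Check 2: is THIS user allowed to read THIS object? Skipping this
    # check is exactly the IDOR bug -- a valid token plus a guessed ID
    # would return someone else's document.
    if ACL.get(doc_id) != user_id:
        raise PermissionError("not authorized for this document")
    return f"contents of {doc_id}"
```

[With only check 1, Alice's valid token plus Bob's guessed `doc-456` would leak Bob's data; check 2 is what row-level security gives you at the database layer.]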
00:06:14.400 | Now you can see this was actually, you know, an issue that was kind of there, right?
00:06:18.520 | It's not, like, around the LLM and the API server.
00:06:21.180 | It's really what is happening downstream.
00:06:23.840 | And yeah, there's a lot of errors in this diagram.
00:06:26.420 | We're going to look at all of them.
00:06:28.420 | So the next thing is to remember, as you're thinking about these tools and how you're building
00:06:33.600 | it, like, agents actually act like users, not API servers.
00:06:38.480 | When we were, like, debugging this issue, like, we actually asked a bunch of Y-Combinator companies,
00:06:42.680 | like, why did you build it this way?
00:06:44.940 | Because clearly they can build a web app properly, right?
00:06:48.240 | But it's just, like, I think, as developers, we have this natural pattern matching in our
00:06:52.320 | heads.
00:06:53.320 | It's like, oh, yeah, this thing runs in a server, so it should be like a service.
00:06:55.480 | And then I'm going to give it service-level permissions.
00:06:57.480 | But actually, agents are like users, right?
00:07:01.200 | So everything that applies to users applies to agents, too.
00:07:05.320 | So make sure that, you know, your LLM should probably not determine your authorization pattern.
00:07:09.360 | That's bad.
00:07:10.360 | That's a red flag.
00:07:11.360 | Second thing is it should probably not act with service-level permission.
00:07:14.440 | Listen to your previous talk on OAuth.
00:07:15.440 | That's great.
00:07:16.440 | And then, just like users, you should make sure you don't just accept any input.
00:07:21.480 | You should sanitize them.
00:07:23.240 | Same with outputs, right?
00:07:24.440 | A lot of these are like the traditional web application security things that you just need
00:07:28.440 | to, like, really, really internalize for this new world.
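[One way to read the "agents act like users" rule in code — hypothetical names, sketching the idea rather than any particular framework — is that a tool call should carry the requesting user's credentials downstream, never a service key:]

```python
# The downstream API receives the same bearer token the user would
# present directly, so its own authorization checks still apply to
# everything the agent does on that user's behalf.
def call_tool_as_user(user_token, tool, args):
    headers = {"Authorization": f"Bearer {user_token}"}
    return tool(args, headers=headers)

# Anti-pattern, for contrast: every tool call runs with service-level
# permissions, so the LLM's output effectively decides authorization.
def call_tool_as_service(service_key, tool, args):
    headers = {"Authorization": f"Bearer {service_key}"}  # red flag
    return tool(args, headers=headers)
```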
00:07:32.500 | Now that was interesting.
00:07:34.340 | And so the second one was even better.
00:07:36.940 | So this is not as common, but the damage is bigger.
00:07:40.920 | So in the pattern we see, there are a lot of code tools that agents use.
00:07:46.220 | And there's an anthropic paper here.
00:07:49.340 | It basically talks about what's the distribution of which industry and how much do they use Claude.
00:07:55.320 | And there's, like, this one outlier here.
00:07:56.580 | I'll zoom it in for you.
00:07:59.120 | Yeah, so us nerds, we make up 3.4% of the world, but we're 37% of Claude's usage.
00:08:05.540 | Ooh, why is that?
00:08:06.460 | Because we love computers and we love coding, right?
00:08:08.520 | And so we found immediately the value of it.
00:08:10.940 | But it's not just us that use agents with coding tools.
00:08:15.060 | In fact, many agents create code on demand to do some things, right?
00:08:19.480 | Like some agents just generate a calculator on demand to make a calculation, right?
00:08:24.220 | And so there's a lot of these code execution sandboxes out there that are interesting.
00:08:29.780 | And so if you think about that, there's actually a critical path in your system,
00:08:34.620 | because you've got a tool that talks to another container.
00:08:37.460 | A container is arbitrary compute.
00:08:39.180 | And when you have arbitrary compute, many things can happen.
00:08:42.300 | Many bad things, many good things, right?
00:08:44.180 | But let's talk about the bad things today.
00:08:46.040 | So we did the same script, did the system prompt.
00:08:48.680 | Again, the system prompt itself, great.
00:08:50.440 | I mean, it doesn't cause any damage.
00:08:52.320 | But as an attacker, you always think about the things that are like, huh, that's kind of
00:08:57.340 | suspicious, right?
00:08:58.340 | It's like, oh, wait, it runs code and never outputted it to the user.
00:09:03.180 | OK, let's output it to the user.
00:09:04.560 | Oh, yeah, and mostly run it mostly at most once.
00:09:08.640 | Let's run it all the time.
00:09:09.940 | And so you try to basically invert what the system prompt is saying, because that is exactly
00:09:14.940 | what the developer didn't want you to do.
00:09:16.820 | And that is how bad actors think, right?
00:09:19.700 | So we figured out, oh, this thing does have a code tool.
00:09:22.740 | And so we tried running something that's like, oh, it only allows me to write Python.
00:09:28.120 | And I love JavaScript.
00:09:30.000 | And it doesn't allow me to run these really dangerous function calls.
00:09:33.600 | Oh, OK.
00:09:34.600 | And it restricts which Python files to run.
00:09:37.100 | That's also not good.
00:09:38.600 | So, yeah, but we looked at what it could do.
00:09:41.880 | And it had two kind of innocent permissions, write a Python file and read some files.
00:09:49.300 | You can do a lot with that.
00:09:50.300 | This is great.
00:09:51.300 | Because what if we just looked around the file system now, right?
00:09:54.560 | We can read files.
00:09:55.740 | So we looked at, oh, build me a little tree functionality and, you know, return me the entire
00:10:00.600 | file system tree to see what's going on.
00:10:02.600 | Oh, my God, there's an app.py file.
00:10:04.900 | That's probably important.
00:10:05.900 | And then we looked at, oh, it has two endpoints, write file and execute file.
00:10:10.880 | Ah, OK.
00:10:11.880 | These endpoints are hidden behind a VPC, so we cannot hit it directly.
00:10:15.080 | That's OK.
00:10:16.080 | But, huh, we can write files.
00:10:19.200 | Huh, we can write files.
00:10:21.200 | There's an app.py file.
00:10:24.420 | Let's look into that.
00:10:25.420 | Oh, wait.
00:10:26.420 | That's where all the protections are for their code.
00:10:29.840 | And so we can just override the app.py file with empty strings around all the security
00:10:35.620 | checks.
00:10:37.260 | And whoopsie.
00:10:39.440 | We got in.
00:10:40.360 | So now we can Bitcoin mine all day.
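[The pattern just described reduces to this toy reconstruction (hypothetical code, not the actual company's service): a "sandbox" exposing write-file and execute-file, with its safety checks living on the same writable filesystem the attacker can reach.]

```python
import importlib.util
import os
import subprocess
import sys
import tempfile

SANDBOX = tempfile.mkdtemp()

def write_file(name, content):
    with open(os.path.join(SANDBOX, name), "w") as f:
        f.write(content)

def _load_checks():
    # the service reloads its checker module from disk on every call
    spec = importlib.util.spec_from_file_location(
        "checks", os.path.join(SANDBOX, "checks.py"))
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

def execute_file(name):
    path = os.path.join(SANDBOX, name)
    with open(path) as f:
        code = f.read()
    if not _load_checks().is_safe(code):  # the "protection"
        return "blocked"
    out = subprocess.run([sys.executable, path], capture_output=True, text=True)
    return out.stdout

# the shipped checker: a naive substring blocklist
write_file("checks.py", "def is_safe(code):\n    return 'import os' not in code\n")

# the attacker's payload is blocked at first...
write_file("payload.py", "import os\nprint('pwned')\n")
assert execute_file("payload.py") == "blocked"

# ...until the same write primitive overwrites the checker itself
write_file("checks.py", "def is_safe(code):\n    return True\n")
assert execute_file("payload.py") == "pwned\n"
```

[Because the checks ride along in attacker-writable space, no amount of blocklisting helps once you have a write primitive.]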
00:10:42.420 | That's great, right?
00:10:43.420 | Yeah?
00:10:44.420 | No, it gets much worse.
00:10:45.580 | So the thing with arbitrary code execution, once you're inside a container, is that you can
00:10:51.280 | do many things.
00:10:52.660 | Like there's this thing called service endpoint discovery, metadata discovery.
00:10:56.400 | Y'all heard of that?
00:10:59.820 | Basically, it allows you to discover what are other devices on the network?
00:11:03.720 | What other resources are there on the network?
00:11:05.680 | And you can also just fetch the user token -- sorry, the service token -- just to see what's going on.
00:11:11.820 | What's the project name?
00:11:12.820 | And you start looking around.
00:11:13.820 | It's like, oh, OK.
00:11:14.820 | Yeah, OK.
00:11:15.820 | I can also fetch the scopes.
00:11:17.060 | So I can do many things with this token.
00:11:19.180 | That's awesome.
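[On GCP, for example, the metadata service answers plain HTTP from inside the instance (other clouds have close equivalents), so any code running in the container can build requests like these — shown here as request construction only, without sending anything:]

```python
from urllib.request import Request

METADATA = "http://metadata.google.internal/computeMetadata/v1"

def metadata_request(path):
    # the only gate is this header, which any in-instance code can set
    return Request(f"{METADATA}/{path}", headers={"Metadata-Flavor": "Google"})

# what lateral discovery from inside a compromised sandbox would fetch:
token_req   = metadata_request("instance/service-accounts/default/token")
scopes_req  = metadata_request("instance/service-accounts/default/scopes")
project_req = metadata_request("project/project-id")
```

[Sending those three requests yields a live service token, its scopes, and the project name — everything needed to start querying services like BigQuery with whatever permissions the token carries.]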
00:11:21.180 | Who has really, really spent time configuring service-level tokens and their permissions
00:11:26.900 | in a granular manner and does it all the time and never forgets to set something wrong?
00:11:33.180 | One guy.
00:11:34.180 | One guy.
00:11:35.180 | There.
00:11:37.180 | Whoopsie.
00:11:38.180 | So that's -- and we just queried BigQuery, which has a great interface for that.
00:11:41.180 | Isn't that cool?
00:11:42.180 | Yeah.
00:11:43.180 | So, yeah.
00:11:44.180 | Making sure you have code sandboxes correctly is very hard, because you can move laterally
00:11:48.980 | across the infrastructure, and that is just very, very dangerous.
00:11:53.180 | And so, kind of like don't roll your own auth in the web world.
00:11:57.080 | Don't roll your own code sandboxes, please.
00:11:59.080 | It's just very hard.
00:12:00.080 | It's very, very hard.
00:12:02.080 | And so use an out-of-the-box solution.
00:12:04.080 | There are many of them.
00:12:05.080 | E2B is, I think, a very popular one.
00:12:07.080 | Some folks have probably heard of it.
00:12:09.080 | There's one in our YC batch that I personally just genuinely really love.
00:12:13.080 | They have observability built in.
00:12:14.080 | They boot up super quickly.
00:12:16.080 | And what I love about them is they have an MCP server that's just as easy to plug into,
00:12:19.080 | right?
00:12:20.080 | So just easier for your agents to work with.
00:12:22.080 | So please do that.
00:12:24.080 | Don't do, you know, your own Python, app.py thing.
00:12:27.080 | It's not good.
00:12:28.080 | Trust me.
00:12:30.080 | So that leads into a third part of an attack vector around server-side request forgery.
00:12:37.080 | It's a very long word and really bugs me that the SSRF didn't fit on the previous line.
00:12:42.080 | It really triggers me.
00:12:44.080 | Yeah, I know.
00:12:46.080 | So this is what happens when you can kind of get a tool to call another endpoint that you
00:12:53.080 | didn't, you know, that the service itself didn't intend you to call.
00:12:56.080 | And you can pull out a lot of information just through that workflow.
00:13:00.080 | So let me give you an example.
00:13:02.080 | So this is exactly.
00:13:04.080 | Extract a system prompt.
00:13:05.080 | Great.
00:13:06.080 | Oh, this thing can create databases.
00:13:08.080 | That sounds exciting.
00:13:09.080 | And then you look into it.
00:13:11.080 | It's like, huh, it pulls the database schema from a private GitHub repository.
00:13:18.080 | Isn't that great?
00:13:20.080 | That means whatever request goes to that private GitHub repository must have the Git credentials.
00:13:25.080 | Right?
00:13:26.080 | Otherwise, how can it pull that from a private repository?
00:13:28.080 | So, yeah, and it's just a string.
00:13:31.080 | So I guess I can just put in whatever string I want and coerce it into providing that.
00:13:35.080 | So let's set up a badactor.com test.git repo and just see what credentials come through.
00:13:41.080 | And, yep, it comes across with the Git credentials.
00:13:45.080 | And so now you can actually take those Git credentials and just download your entire code base that was
00:13:50.080 | behind the private repo.
00:13:51.080 | Isn't that crazy?
00:13:52.080 | Isn't that crazy?
00:13:53.080 | Yeah.
00:13:54.080 | I mean, it's awesome for me to do this.
00:13:57.080 | Right?
00:13:58.080 | You get paid to do this.
00:13:59.080 | Oh, my.
00:14:00.080 | It's amazing.
00:14:01.080 | Now, we told our batchmates immediately, and they told us, don't worry, bro.
00:14:04.080 | It's already fixed.
00:14:05.080 | It's okay, guys.
00:14:06.080 | That company's secure if you're a VC listening in.
00:14:09.080 | But with that, though, it is really important to think about the implications of what your
00:14:15.080 | system is doing.
00:14:16.080 | Right?
00:14:17.080 | I love Vibe coding, not gonna lie, but, like, you gotta really think about where all these
00:14:22.080 | arrows are and if you've configured those things correctly.
00:14:25.080 | So with that, always sanitize your inputs and outputs.
00:14:29.080 | This could be, like, a web dev conference from 20 years ago.
00:14:33.080 | But it applies to agents, too, right?
00:14:36.080 | Like, we just need to make sure we keep those good security practices that we have learned
00:14:41.080 | to love, hopefully, over the years to take it forward to a new technology paradigm.
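[For the SSRF case, "sanitize your inputs" concretely means validating any user-influenced URL before the server fetches it. A minimal sketch — the https-plus-allowlist policy here is an assumption for illustration, not the fixed company's actual code:]

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"github.com"}  # only the hosts this tool is meant to reach

def is_safe_fetch_url(url):
    parsed = urlparse(url)
    if parsed.scheme != "https":              # rejects http://, file://, gopher://
        return False
    if parsed.hostname not in ALLOWED_HOSTS:  # rejects badactor.com, metadata IPs
        return False
    return True

assert is_safe_fetch_url("https://github.com/org/repo.git")
assert not is_safe_fetch_url("https://badactor.com/test.git")
assert not is_safe_fetch_url("http://169.254.169.254/latest/meta-data/")
```

[And never attach stored credentials, like the Git token here, to a request whose destination the user can influence.]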
00:14:46.080 | And then, ultimately, I want you to take away three things.
00:14:50.080 | So first thing is, agent security is bigger than just LLM security.
00:14:55.080 | Make sure you understand how these threat vectors apply inside your overall system.
00:15:00.080 | Second thing is, treat agents as users, and that applies to authentication, to sanitization
00:15:05.080 | of user inputs, and many of the other things.
00:15:08.080 | And last thing, definitely don't roll your own code sandbox.
00:15:11.080 | That is just so dangerous.
00:15:13.080 | And, you know, it very quickly turns from, like, an intern project into, like, a nightmare.
00:15:17.080 | So be very, very careful of that.
00:15:20.080 | And these are the most basic ones that we've seen come across, right?
00:15:24.080 | There's obviously many more security issues.
00:15:26.080 | And if you don't know exactly what your agent's security posture is, you can go to casco.com.
00:15:31.080 | You can book a demo with us.
00:15:33.080 | We built an AI agent that actively attacks other AI agents and tells you where they break.
00:15:38.080 | Isn't that great?
00:15:39.080 | And, yeah, feel free to connect with me on LinkedIn or on Twitter.
00:15:42.080 | And every now and then, I have some good stuff to post.
00:15:45.080 | Yeah.
00:15:46.080 | Awesome.
00:15:51.080 | Thanks, Rene.
00:15:52.080 | Does anyone have any questions?
00:15:53.080 | We could have time for, like, one or two quick questions if you're game for it.
00:15:57.080 | Sure.
00:15:58.080 | How do you look at the system prompts?
00:16:01.080 | How do I look at system prompts?
00:16:02.080 | There's a lot of just, like, open techniques.
00:16:04.080 | The best one that I've seen is from hiddenlayer.com.
00:16:07.080 | Have you guys checked those guys out?
00:16:08.080 | They have a great blog post on, like, it's a Policy Puppetry attack.
00:16:13.080 | Yeah.
00:16:14.080 | It's great.
00:16:15.080 | Cool.
00:16:16.080 | Awesome.
00:16:18.080 | Yeah.
00:16:19.080 | If you're using, like, coding agents, how do you make sure the coding agent
00:16:25.080 | is not compromised, like, how do you make sure that it's actually running
00:16:28.080 | the proper commands?
00:16:29.080 | Like, this is a super tough thing to do.
00:16:31.080 | Like, if you try to whitelist them, like, there's so many creative ways to get
00:16:36.080 | around it.
00:16:37.080 | But, like, how are you--
00:16:38.080 | Yeah.
00:16:39.080 | Are you talking about it locally or server-side?
00:16:41.080 | They're on both, like--
00:16:43.080 | Yeah.
00:16:44.080 | I mean, locally is even more dangerous because they have the privileges of the user running
00:16:48.080 | the code.
00:16:49.080 | Yeah, no, very much so.
00:16:50.080 | So locally, I think right now the industry is either you go full YOLO mode or you ask every
00:16:56.080 | time, right?
00:16:57.080 | I mean, I'm not joking.
00:16:58.080 | Cursor's thing is called YOLO mode, right?
00:17:00.080 | And then on server-side, use a code sandbox because ultimately they have constraints around
00:17:06.080 | the internal networks, but also they have constraints around how long they can live as a sandbox.
00:17:10.080 | Yeah.
00:17:11.080 | Okay, so sandboxes that use VMs actually?
00:17:14.080 | Yeah.
00:17:15.080 | So they typically use something called Firecracker under the hood, which is a better isolation layer.
00:17:19.080 | Yeah, if you just use containers, by the way, that's not an isolation layer in case anybody's
00:17:23.080 | wondering.
00:17:24.080 | Yeah.
00:17:25.080 | Yeah, don't use containers for isolation.
00:17:26.080 | Yeah.
00:17:27.580 | We'll see you next time.