How to Build Trustworthy AI

00:00:01.000 | Hi, my name is Allie Howe. I am a VCISO for Growth Cyber. We are a business that helps other companies build trustworthy AI. We sit at the intersection of AI security and compliance.

00:00:14.620 | Today we're going to be talking about a variety of different topics, namely what is trustworthy AI, what goes into building trustworthy AI, and why you should care about trustworthy AI.

00:00:28.720 | So to start off, who even needs trustworthy AI? Why do we care about it? Well, it's been in the news a lot, whether you realize it or not.

00:00:36.740 | All the way back in 2023, we all saw that case with the Chevy Tahoe incident where a user was able to be offered a Chevy Tahoe from a chatbot for a dollar.

00:00:49.420 | So that chatbot did not operate as it was intended to by the company, and it was in a position to be taken advantage of by a user.

00:00:57.440 | Another instance, in 2024, Slack was able to be tricked into leaking data from private channels via a prompt injection.

00:01:05.740 | So again, that system did not operate as intended and had some pretty strict consequences as well for probably Slack AI, and the companies, whoever, whatever company was, had that data leakage happen.

00:01:16.900 | Very recently, a couple weeks ago, we saw Darth Vader, an NPC being released into Fortnite by Epic Labs.

00:01:25.340 | That was really interesting. I think this was the first case of a voice agent being used in a video game.

00:01:30.200 | So users were able to interact within AI Vader and ask it all sorts of crazy questions.

00:01:34.960 | You know, at first, Vader exhibited a lot of bad behaviors, sort of reminiscent of the Microsoft Taye chatbot back from 2006.

00:01:42.540 | It was saying things that were racist, homophobic.

00:01:44.860 | It since has improved dramatically.

00:01:48.080 | That's not what I'm diving into specifically today.

00:01:50.600 | But as you can see, there's many cases in the news where AI was not necessarily trustworthy.

00:01:55.220 | So it's something that is happening quite often and something that needs your attention.

00:02:03.620 | There's a lot of debate around who is responsible for trustworthy AI, and it really boils down to you.

00:02:10.100 | You are the one responsible.

00:02:11.300 | There was a lawsuit the other day on May 20th, so very recently, where a radio host was suing OpenAI over false statements that were generated by chat GPT.

00:02:23.440 | However, chat GPT was able to get the case dismissed, or they won the case, or OpenAI did not end up in trouble, simply because it states that chat GPT can make wrong outputs from time to time, and it's up to the user to understand that and to proceed with caution.

00:02:40.540 | So if you're using AI, it's likely you that will be responsible, both on paper and also just from a brand and reputational standpoint, you are the one that needs to be aware that you are likely responsible and will be taking the fall for anything your AI application does that's incorrect, wrong, or inappropriate.

00:03:00.960 | So when we talk about building trustworthy AI, that's something that both product engineering and security teams are focused on.

00:03:08.900 | Product teams probably care that your AI application is outputting the right topics, it's relevant, it's generally helpful.

00:03:15.360 | Security teams are probably thinking about, you know, your AI application's not going to be saying anything that's going to be inappropriate or off-topic as well, but they're also looking out for things like prompt injections and jailbreaks.

00:03:26.780 | Engineering is helping out with these as well, probably cross-functioning is also thinking about things like cost and latency as well.

00:03:32.740 | So it's a really big team that comes together to be able to put together trustworthy AI specifically.

00:03:38.720 | And the recipe for trustworthy AI is AI security and AI safety.

00:03:44.200 | So what's the difference?

00:03:45.780 | AI security is how does the outside world harm my AI application?

00:03:51.500 | AI safety is how does my AI application harm the world?

00:03:55.480 | We'll talk into, we'll go into detail about both of those shortly.

00:03:59.100 | So there's this new paradigm out there that AI engineering has introduced.

00:04:05.740 | Traditionally, we had DevSecOps where our scanning tools, security tools like our SaaS tools were integrated within our CI CD pipelines.

00:04:15.020 | They were able to capture most of the vulnerabilities, such as software dependency, supply chain issues, and insecure code.

00:04:20.460 | But now, thanks to AI engineering and, you know, data scientists and machine learning engineers, they don't work in our traditional CI CD platforms.

00:04:28.400 | They work in things like Databricks.

00:04:30.000 | They work in things like Jupyter Notebooks.

00:04:31.800 | So we need a new model for what AI engineering DevSecOps looks like and how we're actually going to build trustworthy AI.

00:04:40.680 | Traditionally, there's been a big focus on shifting, but thanks to prompt injections changing rapidly, AI models being deployed and switched out rapidly,

00:04:51.040 | there's now a big focus on shifting right for AI security, which is why runtime security is particularly important.

00:04:58.440 | I mean, all three of these boxes here, build, test, and run are, build, test, and deploy are,

00:05:03.420 | but with AI security specifically, especially because it's so non-deterministic,

00:05:07.440 | having something on the rightmost side at the time of runtime is incredibly important.

00:05:13.020 | You know, no longer are we just caring about shift left.

00:05:15.860 | We're really worried about this entire lifecycle.

00:05:18.120 | So on the leftmost side, we've got our build.

00:05:20.660 | That's where we're going to be doing some sort of like model scanning, thinking about model providence,

00:05:24.420 | looking at AI or ML bombs.

00:05:27.100 | So that's machine learning security operations.

00:05:29.180 | It's kind of a play on the term DevSecOps.

00:05:31.020 | That's important.

00:05:31.800 | And then in the middle, we've got AI Red Team.

00:05:34.220 | We're going to test our AI applications for both AI security and AI safety concerns.

00:05:40.460 | And then on the rightmost side, we've got AI runtime security,

00:05:43.520 | where we can validate AI inputs and outputs as they come into our AI system and our models.

00:05:51.840 | So DevSecOps is out.

00:05:53.800 | MLSecOps is in.

00:05:56.280 | So basically, MLSecOps is machine learning security operations.

00:06:00.600 | As I mentioned before, MLSecOps is able to look into and take into consideration places that traditional DevSecOps does not.

00:06:09.220 | As I mentioned, AI engineers live in Databricks and Jupyter Notebooks, not in traditional CICT pipelines.

00:06:14.720 | So it's important to focus there as well and look for exposed secrets that could be in those Databricks or Jupyter Notebooks.

00:06:22.580 | And also part of MLSecOps is understanding things like model providence.

00:06:27.740 | So, you know, who built this model?

00:06:30.600 | Where did it come from?

00:06:31.460 | What data was it trained on?

00:06:32.980 | Those are helpful things to think about for compliance as well.

00:06:35.760 | If you have, you know, access to that data the model was trained on,

00:06:39.840 | if you're supposed to be using it or safeguarding it, understanding maybe, you know, maybe a nation state made that model.

00:06:46.760 | What are the implications of that?

00:06:48.340 | Is that something you need to be worried about?

00:06:49.880 | One of the biggest risks from models that you may be using, especially like open source ones you can get your hands on, are model serialization attacks.

00:06:58.640 | A model serialization attack is when models are, code is saved into the model at runtime, at, sorry, at time of serialization.

00:07:06.360 | And then when you go to de-serialize the model, that code will automatically be run.

00:07:10.280 | So you're now talking about arbitrary code execution.

00:07:13.500 | And we see this with the pickle serialization format.

00:07:16.360 | That's one of the most well-known ones for serializing models.

00:07:20.420 | However, if you look at the pickles documentation, they do say that the module is not secure.

00:07:27.600 | And you're not supposed to unpickle data that you do not inherently trust.

00:07:32.320 | So you're getting these models, say, from a model repository or a model zoo on the web, and you're just downloading it.

00:07:38.300 | You need to scan those to see if they have any unsafe operators in them, if you're at risk to model serialization attacks.

00:07:44.820 | A lot of serialization attacks run as soon as the model is de-serialized.

00:07:47.920 | So that arbitrary code could be causing data loss, credential loss, or model poisoning.

00:07:51.820 | So it's really easy to scan models.

00:07:54.280 | So I really encourage you to do this and be practicing this within your organization.

00:07:59.160 | One example is ModelScan from Protect AI.

00:08:02.480 | And Protect AI also created this MLS SecOps community where you can learn more about MLS SecOps.

00:08:07.780 | I've personally learned a lot from it, so I really encourage you to check it out.

00:08:11.060 | But I can show you ModelScan really quickly.

00:08:13.120 | This is the repo.

00:08:15.100 | It's open source, free to use, really easy to download.

00:08:17.960 | Just pip install it and then run it like this.

00:08:20.180 | They also have an example of a model serialization attack within this code base that you can go ahead and check out.

00:08:26.040 | So as you can see here, when the model gets saved, we are adding our unsafe payload in there, which is this command to basically output the AWS Secrets that we have.

00:08:36.380 | And so when this model gets run or loaded, we can now see that this OS Keys has been outputted, which, you know, then we have a credential leak.

00:08:44.080 | That's not great.

00:08:44.800 | And then if you had used ModelScan to scan your model before, you know, this had happened, you would have seen that there is a critical vulnerability, unsafe operator in use here.

00:08:54.320 | So that's one example of, you know, a potential solution for scanning your models that's open source free that you can use.

00:09:00.120 | There's other ones out there as well.

00:09:03.200 | They also have a partnership with HuggingFace.

00:09:05.940 | So ModelScan is used within HuggingFace to do scans of files and model data.

00:09:12.420 | So if you ever see something that's like unsafe here, you can go learn about it.

00:09:16.300 | So keep a lookout for that as well as you're pulling models off of ModelZoes.

00:09:20.060 | So now that we've talked about the left most side, which is MLSecOps, let's now talk about the middle, which is AI red teaming.

00:09:32.620 | And AI red teaming, we can use that to simulate both adversarial threats and also AI safety concerns.

00:09:40.020 | So during AI red teaming, we can test for things like prompt injections, jailbreaks, but we can also test for AI safety concerns.

00:09:47.800 | Like, so if a user asks, you know, how can I build a bomb?

00:09:50.240 | How could I create chemical weapons?

00:09:52.060 | Those are things that your model should not, you know, be answering.

00:09:54.960 | It should be safeguarding against.

00:09:56.200 | It also should be, you know, it should be biased.

00:09:58.900 | It shouldn't do anything that's like homophobic or racist.

00:10:02.040 | Like we saw in the Vader example from the very beginning.

00:10:04.420 | These are things we can both test during AI red teaming.

00:10:07.260 | And we should be continuously testing our models because, you know, as we know, models change as users interact with them, not just with code deploys like traditional software has.

00:10:16.320 | Another benefit of AI red teaming is you can use them to influence runtime guardrails.

00:10:21.700 | So if we see that, you know, our model is particularly vulnerable to this type of prompt injection or it's continuously saying like this sort of racist thing, well, we can block certain topics and prompts that would elicit those responses to make sure that our model behaves as expected during runtime.

00:10:38.660 | Another benefit of AI red teaming is the ability to compare LLMs to each other.

00:10:43.580 | In some cases, models can have back doors built into them where you're not going to see this during sort of just for testing.

00:10:50.940 | But if you start to compare LLMs to each other, you might start to see differences that might suggest that one model has a back door built into it, whereas another does not.

00:11:00.140 | So if you ask every single model the same question, but one had a really different response, you might be like, OK, maybe this model like one either doesn't work appropriately.

00:11:08.060 | Maybe this was by design.

00:11:09.560 | There's some sort of backdoor deceptive behavior that was built into this model.

00:11:13.000 | So I'm going to use a different model instead.

00:11:15.000 | So there's a lot of reasons to do AI red teaming that can be particularly helpful, both with model selection and helping you influence what runtime guardrails to put in place during runtime.

00:11:27.700 | So in terms of runtime security, if there was one area to invest in, I would pick runtime security because AI red teaming can be particularly time consuming and also expensive to if you're going to like retrain models based on the results.

00:11:41.840 | It might be particularly difficult or a lot of overhead for your organization if you're starting, you know, an AI security practice.

00:11:47.560 | AI runtime security is easy to implement.

00:11:50.580 | Typically, it's done with just including an API or installing another like Python module.

00:11:57.100 | It's pretty easy to get deployed with a lot of benefits.

00:11:59.560 | So.

00:12:00.340 | We should really focus on AI security, runtime security, because it's important to shift right.

00:12:06.940 | That's where we're going to see things like the prompt injections being thrown at our models and different types of prompt attacks.

00:12:12.940 | These can be indirect prompt injections where, say, you have a system that is using RAG or calling out and scraping a website.

00:12:20.140 | Web data can have hidden prompt injections in them.

00:12:23.080 | The same can documents that you've used with your RAG set up.

00:12:25.960 | So you could get something that way.

00:12:27.220 | You could also have a direct and prompt injection straight from the user to the chatbot or to the AI agent.

00:12:31.900 | You can also see jailbreaks as well, which in text, those look like things that are semantically strange, pretty chaotic looking.

00:12:39.260 | This is an example here on the slide of a jailbreak from LaCara AI.

00:12:42.760 | And also at runtime, your application can see off topic or unsafe prompts, such as ones that might, you know, get your application to output something that is unsafe.

00:12:52.800 | So something that's like, you know, instructions for how to build a bomb or something that's, you know, inappropriate or racist or, you know, something that's bad to say.

00:13:00.940 | You can check that at runtime, both on the input side, looking at the prompts and then either deflecting it and not allowing your model to answer it in the first place.

00:13:09.360 | Or if your model answers it and something that's incorrect or inappropriate comes back, you can go ahead and block that from being ever sent to the user.

00:13:17.860 | So it's a super nice solution to have in place to make sure that your AI you've deployed is behaving in a trustworthy manner.

00:13:28.120 | So let's take a look at that beta solution from our setup from the beginning, where we've got that beta NPC running in Fortnite.

00:13:35.520 | It was a really interesting setup that I tried to imagine.

00:13:38.960 | So this is just an estimated guess of what their architecture looks like.

00:13:41.920 | So basically, the users that are using Fortnite are sending proximity audio, user audio.

00:13:48.240 | There's like other contacts being sent to the servers likely about like player skins or other player configurations.

00:13:55.180 | And also, of course, the user's audio feed, where we see the, where they're talking to beta go into probably Fortnite servers.

00:14:00.940 | And then it's being passed to 11 labs for voice to text transformation, eventually being passed to Gemini to craft beta's response.

00:14:09.380 | And then we're going to see beta's response being sent back in text to 11 labs, then sent back as audio to Fortnite servers, and eventually all the way back to the user.

00:14:18.300 | So there's a lot going on here.

00:14:20.820 | And if I was going to be inserting AI runtime security, I'd probably insert it as close to the model as possible.

00:14:26.840 | So if we get something that's like unsafe, if the user said something that was maybe against user terms or conditions, or, you know, the user said something inappropriate, maybe we don't want to like allow Vader to answer that.

00:14:36.940 | So we can validate our inputs here at this, at this location.

00:14:40.100 | So let's say we answer Vader's question or topic, and it did come back as something that was off topic or inappropriate, we could flag it there.

00:14:48.020 | So what I'm talking about right now is AI safety concerns, where the user has said something inappropriate, we don't want Vader to behave in a way that's inappropriate either.

00:14:56.280 | So we can check for that, but of course, we can also check for, you know, prompt ejections, jailbreaks, other, you know, AI security concerns as well.

00:15:03.660 | So AI runtime is both for AI safety and also for AI security.

00:15:07.480 | But of course, also, you know, include AI runtime security here as well before we even get to 11 labs.

00:15:14.500 | So if the Fortnite servers come back and say, hey, like, you know, this looks like something that's inappropriate, like, I don't even want to do a call to 11 labs, there's another API call, I'm sure it's expensive as well, because they're using other services.

00:15:24.880 | So you could add it here as well, same here, checking it again after it comes out of 11 labs, checking the audio, making sure that that still looks good.

00:15:33.020 | However, there's tradeoffs at play here, we have to think about cost, latency and accuracy.

00:15:38.480 | That's what a good AI runtime security solution has, it has all of those things, low latency, low cost, or, you know, acceptable cost.

00:15:45.600 | It's also like highly accurate as well.

00:15:47.380 | A lot of AI runtime security solutions are backed by really well put together and established AI research teams, which are constantly finding new prompt ejections, new jailbreaks.

00:15:57.720 | That's not something you have to worry about when you're building your product, trying to keep up with the latest jailbreaks and prompt ejections, you can rely on your vendor for that.

00:16:05.580 | So just to show you an example of AI runtime security at work, this is Pillar, it's an application security lifecycle application, which I have been using to do some of my work.

00:16:22.200 | This is just an example of one, there's also different AI runtime security solutions out there.

00:16:26.860 | But basically, I had an application that I was using for finding patients that would be suitable for ALS clinical trials.

00:16:40.540 | And as you can see here, this is supposed to be an acceptable use case here, where we are getting patients in our database we would recommend for different studies that we found on the web.

00:16:52.780 | So basically, the goal of this agent or this multi-agent system is to output a list of patients suitable for each trial.

00:16:59.720 | It's not supposed to answer questions about individual patients or modify patient data in the database.

00:17:06.280 | So in this example, I have asked this system to change a patient's FEC percentage in the database to 50.

00:17:14.580 | The Pillar has blocked this for me, so my agent's not going to answer this.

00:17:18.860 | Because I have configured my guardrails to restrict this topic and keywords related to this so that we can make sure to not change patient data in the database and this application can behave as expected.

00:17:32.280 | So a lot of guardrails are where a lot of guardrails and applications or vendors will allow you to look out for things like PII, making sure that that's not being either input or output, depending on your configuration, making sure we're looking for things like AI safety concerns, toxic responses.

00:17:48.960 | But a really strong advantage of AI runtime security platforms or some of them is the ability to add custom guardrails.

00:17:55.860 | So like this one in this example, where I've got this very specific functionality where I don't want my system to be able to update database information.

00:18:04.140 | So I'm able to use that for this, which, of course, is a security concern.

00:18:07.960 | But I could also restrict topics around like, you know, say if I was Tesla and I didn't want to recommend Ford, for example, because that's a competing car company, I have restrict that topic to make sure that my outputs are outputs that are reflecting my business goals, as well as my AI security and AI safety goals.

00:18:24.960 | So runtime solutions can be particularly helpful and impactful in that way.

00:18:28.760 | And then once we have an AI security runtime solution, we can go ahead and verify that in different GRC platforms.

00:18:41.220 | So as I mentioned before, I help with compliance as well.

00:18:43.960 | So if we're doing all this work to build trustworthy AI, we might as well demonstrate it and we might as well show that to our customers so that they can trust in what we build and also help us with our sales cycles.

00:18:54.520 | So for instance, I have this risk in my risk register, which having a risk assessment is something that is required for SOC 2, at least annually.

00:19:04.860 | So if I have this custom risk of, you know, what if my company or patient service damage to do a harmful or off-topic output from an AI application?

00:19:13.140 | In Vanta, I made a custom control for validating AI outputs.

00:19:21.380 | So in this control, I have submitted evidence that I'm going to be validating all of my different AI outputs.

00:19:28.280 | And so I can go ahead and throw that into a trust center and show that I'm passing controls for validating AI outputs and inputs.

00:19:35.700 | So if someone comes and wants to, you know, work with my company or buy my solution, they can go ahead and go on my trust center and see not only controls that are applicable to SOC 2, but additional controls that I have created.

00:19:47.220 | For AI security that shows, hey, we're, you know, I'm using a runtime solution.

00:19:50.760 | So, I mean, I'm clearly taking AI security seriously.

00:19:53.480 | And so maybe you want to buy from me, or maybe you want to work with my company, or maybe this, you know, helps you not send me a, you know, 200 question page security questionnaire, for example.

00:20:03.460 | So something to call out if you're, you know, if you're building with AI and you are taking AI, trustworthy AI seriously, and you're building it, might as well show it, might as well use it as a competitive advantage in your sales cycles.

00:20:17.620 | So if you're not convinced you need to build with trustworthy AI yet, here's some other reasons why you might want to take it seriously.

00:20:23.300 | Cyber security risk and business risk have never been more aligned.

00:20:27.580 | So in the past, you could have built a product that was insecure by design, shipped it, made some revenue with it, and then added security layer.

00:20:35.740 | However, with AI, we're not seeing that.

00:20:38.200 | Making sure that AI applications output the correct outputs, that they are aligned, they are safe, they are on topic.

00:20:45.900 | That is as much of a cyber security concern as it is a business concern.

00:20:50.600 | AI that outputs things that are off topic or relevant, it's not going to be revenue generating.

00:20:55.340 | So getting AI security and trustworthy AI right from the beginning will not only be helpful for your security program, but it will be helpful for revenue as well.

00:21:03.000 | AI will also make missing cyber security best practices worse.

00:21:07.940 | So you aren't tracking things like, you know, what data you have trained on, where you're getting different models,

00:21:13.900 | how you're taking care of supply chain risk.

00:21:16.880 | AI is only going to magnify that and make that worse.

00:21:19.660 | And we're also seeing an increasing regulatory environment for compliance perspective.

00:21:25.460 | We're seeing ISO 42001 crop up, which is the first international standard around AI regulation or a compliance framework, sorry.

00:21:34.600 | And then the EO AI Act, of course, came out.

00:21:37.580 | And that's what this picture is about on the right.

00:21:39.360 | There was a case where an AI company was putting together a database of faces that they had scraped off the web.

00:21:46.440 | They ended up getting fined by the EU for about $20 million thanks to the EO AI Act.

00:21:51.240 | So it's important to take that seriously.

00:21:52.620 | If you are going to accidentally output patient data, if you're building a healthcare AI application, then you might be finding yourself with a HIPAA violation.

00:22:04.020 | So that's important to take into consideration.

00:22:05.780 | But also different guidelines that are specific to whatever you're building, whatever your industry is.

00:22:10.300 | So, for example, even the FDA came out with AI and ML guidelines earlier this year.

00:22:14.580 | So it's important to keep in mind what sort of regulations your company might be subject to.

00:22:19.280 | So it's worth getting ahead of those either regulations that exist now or will exist down the road by building trustworthy AI today.

00:22:26.920 | Trustworthy AI is super important because it's going to unlock a lot of revolutionary innovation.

00:22:37.100 | So, for instance, if we think about the healthcare industry, we can't use solutions that help us identify new proteins that can be able to create entirely new organs for transplant or allow us to use models that portraying on 1.3 million cells and has a lot of, like, very confidential data in them.

00:22:54.160 | We can't take advantage of the amazing technology that we're going to be able to create as a society if what we're building isn't trustworthy to begin with.

00:23:02.200 | These systems need compliance.

00:23:03.540 | They need AI safety.

00:23:04.860 | They need AI security.

00:23:06.600 | So if that doesn't motivate you to take trustworthy AI seriously, I hope it does because what will end up happening is we're going to be able to create things we never thought possible, but only if we had trust in place in the beginning.

00:23:18.560 | So the bottom line is you are responsible for building trustworthy AI.

00:23:26.140 | We've seen the news headlines.

00:23:27.480 | It happens all the time.

00:23:28.620 | We've also seen the lawsuits where it's often the user of the AI that's the one that's responsible.

00:23:35.140 | So, you know, don't wait to get yourself into a lawsuit.

00:23:37.600 | Start building trustworthy AI today.

00:23:39.560 | Trustworthy AI is AI security, which is how does the world harm your AI application, plus AI safety, which is how does your AI application harm the world?

00:23:51.100 | So you can build trustworthy AI by incorporating ML SecOps practices, by red teaming your AI applications, and by incorporating AI runtime security solution at the time of deployment in your AI system.

00:24:04.500 | So thanks for watching.

00:24:08.420 | If you want to find me, here's my handles.

00:24:10.240 | But it was super awesome to deliver this discussion to you today.

00:24:13.940 | If you've got any questions, please let me know.

00:24:15.880 | Happy to reach out, answer more questions online.

00:24:18.580 | But thanks.

00:24:19.080 | Thank you.

00:24:20.080 | Thank you.

How to Build Trustworthy AI — Allie Howe

Chapters