GitHub's AI Powered Security Platform: Sarah Khalife

Thank you all for joining today. Thank you for staying if you stayed from the previous session as well. I know there were a bunch of GitHub talks, so I'm very excited that you got to hear from all of us from all different places within the company, but also from all different teams.

So, you heard a lot about Copilot, a lot, a lot about Copilot. Copilot today, Copilot tomorrow, Copilot in the next year, maybe in a couple years. So, from workspaces to GitHub Copilot Enterprise to Copilot futures, there's so much more going on with Copilot itself. But what we really want to do within GitHub is not only incorporate new features and capabilities within Copilot, but also incorporate AI as part of the platform.

So, everything within the platform itself will start having AI in it. So, today, you've heard a lot about Copilot. Obviously, we're at an AI conference, so if you didn't, that would have been a bad thing. But collaboration, productivity, and security are some of the biggest aspects of the GitHub core platform capabilities.

Collaboration spans pull requests, as you heard with some of the code review things earlier on. From a productivity perspective, how do you gain momentum and get more developers excited to do their work across their developer platforms, especially if they're using GitHub Copilot?

And then, security. Security, I feel like, hasn't been talked about as much in terms of how we can improve AppSec with AI, rather than just how we talk about security around AI. Security around AI is probably talked about all the time, from legal to privacy to everything in between.

But what we want to do at GitHub is to also incorporate AI to make sure that security is also being progressed along with the new capabilities that we're seeing today. And everything that GitHub does, we want to do it with scale. We want to build our integrations and our APIs.

So anything that I talk about today is generally API backed, or there's a third-party integrator that also comes in and helps with a lot of the capabilities. There's never one path that answers everything. So we want to make sure that all of our customers have a path to a solution.

Today, if you didn't know, actually, this might be not the latest numbers, actually. But I think the numbers are more. I haven't updated the slide in a while. But there's over 100 million developers registered on GitHub. There's 4 million organizations, 90% of the Fortune 100 companies, which sometimes I'm surprised by the numbers because I know everybody uses GitHub.

But like 90% of the Fortune 100 is mind blowing. I work with customers day in, day out. But this is still mind blowing to see, because that is how we get a lot of our feedback. That's how we get a lot of our community part of the conversation. That's how we build a lot of our features.

Anything that we do, we incorporate that community back into the conversation. By way of introduction, I'm Sarah Khalife. I'm a principal solutions engineer here at GitHub. I've been at GitHub for about four and a half years now, which is a long time for GitHub's life. And it was pre-Copilot, pre some of the capabilities, almost pre-Actions, if you use Actions.

So, you know, we've seen the platform grow a lot during the last couple of years. But like I said, the community is what really makes it a bigger, better platform across the board. Our customers, our vendors, our partners, everybody that is part of that conversation is how we really get into the next iteration of what we're going to build.

And I'm lucky enough to work with a lot of customers, so I'm on the customer side of things. All my customers today are in financial services, but over the last four and a half years, I've been working with all types of customers from all different industries in this enterprise world that we live in.

So I did a quick introduction of what the GitHub platform is today. We talked about Copilot, but that's only one aspect. We talked a little about collaboration. You heard about it earlier on through Christina and Chris's session. You also heard a lot about productivity, especially with Copilot being in that conversation.

But today we're going to talk about GitHub Advanced Security and how we're incorporating AI into that. If you haven't heard of GitHub Advanced Security, don't worry, I'm going to cover what it is. And then we're going to talk about some of the new features coming along that bring AI into Advanced Security.

And then we'll do a live demo, because as you can see, all of our GitHub teams here love doing live demos, even if they don't work, or if they work better than they expect, which is a very great example of what happened in the earlier session today. Okay. So we'll be focusing on one aspect of the platform, and then we'll be talking about specifically security.

So how can we incorporate security in our day-to-day work? And then how can AI improve that experience for developers? Who here has used GitHub Advanced Security? Anybody? Nice. A couple people. So what is GitHub Advanced Security? GitHub Advanced Security allows you to incorporate, think of it as your AppSec aspects, into your developer platform day in, day out, with that GitHub experience.

So when we talk about Advanced Security, we have code scanning. Code scanning allows you to do SAST and other types of code scanning within the developer platform, within the GitHub ecosystem, to find vulnerabilities and detect different patterns that are vulnerable to then help remediate them faster. With code scanning today, there are two aspects that I would say are the more popular aspects to talk about.

And we can talk about a lot more if you want to come to the Microsoft and GitHub booth later on. But code scanning today allows you to detect vulnerabilities using CodeQL, which is our internal or proprietary language, I guess, but it's open source. So you can actually build your own queries yourself.

But with code scanning today, it allows you to detect different vulnerable patterns across your code. We'll find the data flow and be able to analyze that data flow to understand where the vulnerability is, and what type of source and exploits you have within that vulnerability. So that makes it super powerful, especially when we talk about some of the AI aspects that we're adding to it, because we have the context and we have that information in there.

With CodeQL today, there are more than I don't even know how many query packs that we offer internally or from the GitHub side. But we also bring in that community. So anything that we do is community backed. And sorry, I keep hitting the mic. But anything that we do is community backed.

So when we talk about CodeQL, it's not just our queries, it's Microsoft's queries, it's Google's queries, it's Uber's queries, and so forth. And we build that aspect of community in even in security, just because we know GitHub is never going to answer every single question that you may have, especially for the companies that you work for.

The other aspect of code scanning is to incorporate third-party integrations. So again, this is where our vendors and partnerships come through. Say you're using something like container scanning. Code scanning today, CodeQL specifically, is more focused on SAST. We're not going to do container scanning today, at least not in the near future.

So why not incorporate some of those results back into your pull request and get that information and feedback sooner rather than later? That's the third-party integration aspect: container scanning, infrastructure-as-code scanning, or any other third-party tools that you want to integrate with. You can do that through SARIF inputs to code scanning.
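To make the SARIF idea concrete, here is a minimal sketch of what a third-party tool's output might look like when converted to SARIF 2.1.0. The tool name, rule ID, and finding are hypothetical, and the helper `make_sarif` is an illustration, not part of any GitHub tooling.

```python
import json


def make_sarif(tool_name, findings):
    """Build a minimal SARIF 2.1.0 log from (rule_id, message, path, line) tuples."""
    results = []
    for rule_id, message, path, line in findings:
        results.append({
            "ruleId": rule_id,
            "level": "warning",
            "message": {"text": message},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": path},
                    "region": {"startLine": line},
                }
            }],
        })
    return {
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            "results": results,
        }],
    }


# Hypothetical finding from an imaginary container scanner.
sarif_log = make_sarif(
    "example-container-scanner",
    [("CONTAINER001", "Base image has a known vulnerable package", "Dockerfile", 1)],
)
print(json.dumps(sarif_log, indent=2))
```

A file like this is what gets uploaded to code scanning so the finding can show up next to the code it refers to.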

The biggest win, which all my customers have loved, and I don't know if you've felt it if you're doing even open source work, is secret scanning. Secret scanning is a lifesaver in many, many cases. How many times has somebody accidentally left some secrets, maybe AWS or Azure keys, in their log file without realizing? You know, it happens.

You do a test file and you forget to add it to your .gitignore. It saves a lot of Bitcoin miners being spun up within your environment. So secret scanning does a full-blown analysis of secrets across your repositories, from API keys to your own custom secrets, and, using AI that I'll be talking about in a little bit, detects other types of secrets that are just plain text, where it's really hard to reduce the amount of false positives.

AI can really help with that. So secret scanning has been the biggest win across all of my customers, and it's been a big, big discussion on how we handle things both reactively and proactively. Proactively, what we have incorporated is push protection. It allows you to block any pushes that are coming to the GitHub ecosystem before the secret is exposed, before it goes into your Git commit history, so you don't have to revoke it at that point.
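The push protection idea can be sketched as a check that runs over the changed files before a push is accepted. This is a simplified illustration, not how GitHub implements it: the two patterns and the `allow_push` helper are assumptions made for the example, and real secret scanning relies on many provider-supplied patterns.

```python
import re

# Illustrative patterns only; chosen for the example, not taken from GitHub.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}


def find_secrets(text):
    """Return (line_number, pattern_name) for every line that matches a pattern."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, name))
    return hits


def allow_push(changed_files):
    """changed_files maps path -> content. Block the push if any file has a hit."""
    findings = {}
    for path, content in changed_files.items():
        hits = find_secrets(content)
        if hits:
            findings[path] = hits
    return (not findings, findings)


# AWS's documented example key, not a real credential.
allowed, findings = allow_push({
    "test.log": 'key = "AKIAIOSFODNN7EXAMPLE"\n',
    "README.md": "hello",
})
print(allowed, findings)
```

The point of blocking at push time is that the secret never lands in commit history, so there is nothing to revoke afterwards.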

But what's in your git commit history? We don't recommend deleting anything in your git commit history especially if you work for a regulated industry that goes through audits and has to maintain a lot of historical aspects very precisely. That's where we just recommend, hey, we are able to detect not only your current state but all your git history and other issues, pull requests, comments and pull requests and so forth if there's any secrets in those and we really recommend revoking them.

And last but not least, supply chain security. You may have heard of Dependabot, which is one of the tools within the supply chain security aspect. Dependabot allows you to detect dependencies that are vulnerable today. So Dependabot gives you an opportunity to say, hey, I found that these dependencies are vulnerable.

So maybe we need to upgrade to the latest version. There are going to be some additional AI components added to that as well. But that's something that I won't be covering as much today because that's still early on in the stages there. Some of our secret scanning partners that we work with today are very common vendors that you might be working with throughout your day to day.

But this is where, again, we talk about the community and the vendors and the partners that we work with. Because what we do for secret scanning is not only incorporate their patterns, but we also push them to improve how they're doing the patterns, to make sure that their secrets, in general, are not going to create a lot of false positives.

So we create a kind of like a mechanism for them to add hashes and more kind of more specific information to be able to detect these secrets almost at 99%, 99.9 something percent. I don't know the exact average that we have today, but it's pretty high up there and it reduces the amount of false positives so significantly when you use some of our high fidelity partners that we work with.

So, at any given point in time, GitHub really believes security should be part of the day-to-day responsibilities of everybody. It's a shared responsibility. It's never just AppSec saying, hey, you need to fix these 10 vulnerabilities by tomorrow or else we can't deploy. It's never just the developer trying to figure out how to fix this vulnerability that they've never even heard of or maybe not even understand to then be able to deploy on time.

So it should be more of a shared responsibility. So our goal is to bridge that gap and make that conversation a lot easier. So anything that we do with advanced security, anything that we're doing with AI allows you to really add that aspect into it. So what can AI do for us?

How can AI benefit security? With AI, there's so much more that you can do, especially with generative AI, as you can see with Copilot, with all the new customers, vendors, partners that you're seeing here at this conference. There's just so many aspects to it.

So the first couple of things that we've noticed right off the bat is easier identification. How can we help, how can AI help us identify vulnerabilities or secrets much easier? How can we have faster remediation? So when you identify things, if you're not fixing them, then what's the point of identifying them half the time, right?

If you're not going to fix the vulnerabilities, that's where the actual issue is. It's easy to find vulnerabilities lately, a lot easier than it was before, but fixing them is the actual issue. And that's where the productivity aspect also comes into play. And last but not least, driving that productivity.

The faster you're able to fix vulnerabilities, the more productive you're able to be. You improve the security posture of where your company may be today, and you reduce the amount of turmoil that you have to hit by deploying earlier on with the fix, rather than waiting until production, or after production, or when a customer is using your product already.

But in general, this is where we see AI really helping introduce a lot more of those capabilities. So first, maybe not the most important, but probably the biggest one that we are very excited for, is code scanning autofix. With code scanning autofix, not only are we helping detect vulnerabilities with code scanning, but now we're providing a way to autofix those with AI.

So in the pull request, as you're working actively, it will actually provide a response back to say, hey, maybe you should be fixing this vulnerability this specific way. And it'll give you a suggestion. Obviously, it's AI. So it's going to give you a suggestion of what it thinks based on the context it has, you can always edit it, you can always fix it, or you can commit it and rerun your test and see if it actually fixes that vulnerability.

With code scanning autofix, CodeQL is what's providing a lot of that context. So CodeQL is finding the data flow of that vulnerability, and you're getting information about what the common fixes for that vulnerability are when you're doing code scanning. So by providing that context in the way that our backend system prompts that request, it's actually producing a really, really good autofix result.

From our customers and from all of my customers that have tested this today, they found that autofix has been pretty successful for, I don't know, maybe 70% of their use cases. But again, this is going to only get better as we're working with more customers as more people are starting to test this out.

And it's in public beta today, so you can actually test this out yourself if you're interested. Second to this is the secret scanning improvements. Creating a custom pattern requires a lot of work. I mean, I don't know, who knows regex to the point where they feel confident rolling out a regex scan across all their repositories, right?

I personally cannot claim that I do. My regex, I mean, it's not bad, but it's not the best that it can be. So why not have AI help us custom generate those regexes? So with custom pattern generation, you can provide AI capabilities to maybe suggest a different way to write some of your regexes.

So you can provide samples and examples of what you're looking for. And then secret scanning custom pattern generation would generate a custom pattern for you, to at least have a starting point if you don't think that's the full answer just yet. But it generates that custom pattern.

And it makes it so much easier to give more and more examples, because the more context it has, the better the answer it will provide. And it will generate a response back so you don't have to figure out how to write this or search for this type of regex to find a custom pattern.
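Whatever generates the regex, it still helps to check a candidate pattern against samples before rolling it out. Below is a minimal sketch of that check. The token format (`acme_` followed by 32 hex characters) and the helper `evaluate_pattern` are made up for illustration and are not part of the GitHub feature.

```python
import re


def evaluate_pattern(pattern, should_match, should_not_match):
    """Return the samples a candidate regex gets wrong: (missed, false_hits)."""
    regex = re.compile(pattern)
    missed = [s for s in should_match if not regex.fullmatch(s)]
    false_hits = [s for s in should_not_match if regex.fullmatch(s)]
    return missed, false_hits


# Hypothetical internal token format: "acme_" followed by 32 lowercase hex characters.
positives = ["acme_" + "0123456789abcdef" * 2]
negatives = ["acme_short", "acme_" + "Z" * 32]

strict = r"acme_[0-9a-f]{32}"
loose = r"acme_\w+"

print(evaluate_pattern(strict, positives, negatives))  # nothing missed, no false hits
print(evaluate_pattern(loose, positives, negatives))   # both negatives match the loose pattern
```

The more positive and negative samples you collect, the easier it is to see whether a suggested pattern is too loose or too strict.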

So this has simplified the process so much. And a lot of our customers have loved, loved, loved having this capability, because they were doing that anyway, probably on ChatGPT, or maybe going to Copilot in their IDE, or maybe they were doing this on Google and trying to find a cheat sheet for regexes.

Like there was so much work that was being done just to generate that. Now you can just have somebody alongside with you like a copilot to help you generate that custom secret. And last but not least, actually, I think this is one of the most important ones from a secret scanning perspective, is to detect unstructured passwords.

How often do you have password equals? I mean, very, very often. Let me tell you, let me give you that answer. Very often. How often is that actually vulnerable? Is that a real password? Is that actually being exploited? Probably not very often. Probably way less often than how often you have password equals somewhere.

But there's so many types of unstructured passwords like that, where you can define a password of sorts, but never know if it's actually being exploited. So what we're doing, and think of it as an AI analysis of the repository, is to understand if that password is a true positive.

So what we're doing is finding passwords, and we label them today as "other" because they are definitely going to have some false positives in them in the first iteration of this. But it's going to identify those passwords and make it easier for you to say, hey, these are actually vulnerable passwords that we have exposed in our Git history.
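To show why "password equals" is so noisy, here is a small rule-based sketch that separates obvious placeholders from values worth a second look. This is not the AI analysis described in the talk; the placeholder list, the length threshold, and the `classify` helper are all assumptions made up for this example.

```python
import re

# Matches things like: password = "value" or PASSWORD: 'value'
ASSIGNMENT = re.compile(r"""\bpassword\s*[=:]\s*["']([^"']+)["']""", re.IGNORECASE)

# Hypothetical list of values that are almost never real credentials.
PLACEHOLDERS = {"password", "changeme", "example", "xxx", "your_password_here"}


def classify(line):
    """Return None if no hardcoded password assignment is found, else a rough label."""
    match = ASSIGNMENT.search(line)
    if not match:
        return None
    value = match.group(1)
    if value.lower() in PLACEHOLDERS or value.startswith("${") or len(value) < 8:
        return "likely_placeholder"
    return "needs_review"


for sample in [
    'password = "changeme"',
    'PASSWORD: "s3cr3t-Pr0d-2024!"',
    'password = os.environ["DB_PASSWORD"]',
]:
    print(sample, "->", classify(sample))
```

Simple rules like these run out quickly, which is the gap the repository-level analysis is meant to close.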

So we need to revoke them, rotate them, and start storing them in our Azure Key Vault or wherever we want to. This is going to be such a game changer in terms of passwords that are internal to your company. This is going to be a game changer for passwords that aren't very structured in general, the kind that allow you to do things and that you shouldn't be storing in a Git repository.

But this is, again, a way of AI helping you find and discover things much more easily than you could have before. So: easier identification, faster remediation. The easier you can identify, the faster you can remediate. But let's go into a demo. So I have about maybe like 10 minutes left here.

So really quickly, I kind of want to just start off with a GitHub organization here. You can see there's a security tab at the top of your GitHub organization. If you have admin access, or if you have the security manager role, you'll be able to see the security tab.

If you have GitHub Advanced Security on, it will actually give you a lot more information. If you don't, you'll have some Dependabot information going on here. But this security tab gives you an overview of all the information that you have across this organization. So in this example here, what you're seeing is that there's so many open alerts.

This is our demo repository. This is not production code, so do not worry. This is not going to be deployed anywhere in Azure, so we are safe for today. But nonetheless, 73,000 alerts is a lot. But you can identify these alerts and find more patterns of what's going on, whether it's secrets being identified, vulnerabilities from your SAST scanning, vulnerabilities from your third-party integrations, or dependency vulnerabilities from Dependabot.

So you can see a lot more information and statistics on, hey, what's the age of some of the alerts? These alerts have been living for a long time.

What's the remediation timeline? How often are you remediating these? How often are you actually resolving these? And then you can understand the impact analysis of all the different repositories. This is one of the views. My actual favorite view is the coverage view, because the concern for a lot of customers is, how do we know what we don't know?

So in many cases, people believe that they have full coverage of everything. But it's usually done in CI and in the CI pipeline. So you're not actually getting coverage unless you're going through a CI pipeline. But there's so many more repositories that probably are just sitting there for just basic automation, something else that you're just running that it's not going through your CI pipeline.

So you're never actually running security scanning across all of those. So in this view here, you can get a true identification of how many of your repositories are actually covered. So you can see secret scanning, code scanning, and Dependabot. In many cases, secret scanning is an easy one-click button to turn on.

So obviously, there's just a lot more coverage across secret scanning, 99% here. We're hopefully going to get to 100 at some point. But for code scanning, it's 57. What do we need to do? Why is it 57? Does that make sense, or should we be scanning more things with code scanning?

And that's where I think this provides really the best value. That's from an organization perspective, or an admin, or if you're a security manager. As a developer, I want to go into my repository, similar to your repository. You can see a security tab here. This security tab really allows you to understand what's going on across that specific repository.

So let's go into one of these here. So I'm in my code scanning repository here, or sorry, in my repository in the code scanning alerts here. I can see all of the alerts listed out where I need to start fixing. So this is a lot more reactive work. We found these vulnerabilities.

How can we start fixing them? We have a big backlog. We have some tech debt. This is a place where I'm going to go and understand what's going on. So then I can go fix those. So when I go into one of them, for instance, actually, let's go into this one here.

When we go into one of them, for instance, we can see specifically what the vulnerability is. So if we click on show paths, you can see from the source all the way down to the sink of what the vulnerability is. So as a developer, I can understand where I need to start fixing these vulnerabilities.

But in many cases, there's more than one way to an exploit. There's more than one way to get to that exposure point. So how do we identify that and understand that? That's what CodeQL does really well. It identifies all the different paths. So if I go into maybe step eight, it looks like there's a different source, but it's the same sink in this example.
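When several paths end at the same sink, one way to pick a fix location is to look for the steps that every path passes through. The sketch below shows that idea on made-up data. The step names and the `shared_steps` helper are hypothetical and do not reflect how CodeQL represents paths internally.

```python
def shared_steps(paths):
    """Return the steps present in every path, in the order of the first path."""
    if not paths:
        return []
    common = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    return [step for step in paths[0] if step in common]


# Two hypothetical source-to-sink paths that differ only in their source.
paths = [
    ["req.query.name", "buildGreeting", "renderTemplate", "res.send"],
    ["req.body.comment", "buildGreeting", "renderTemplate", "res.send"],
]

common = shared_steps(paths)
print(common)
print("candidate fix location:", common[0])
```

A fix at the first shared step covers every path at once, which is usually cheaper than patching each source separately.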

So why not actually fix a vulnerability in step seven? So then I can find the common denominator across all of those. So that is the more reactive work. What we really want to do with AI is to be more proactive.

So now I am in a pull request. I, not me, but Mr. Left Rife left here, actually introduced a vulnerability. In this vulnerability, he, let's see what the vulnerability is, cross-site scripting. So he introduced some cross-site scripting, easy mistake to make, very common vulnerability, usually an easy fix. But as a developer, I never really knew what it was. So I can get a better understanding of what that is.

So GitHub Advanced Security will actually tell you, hey, this is the vulnerability. This is the information around that vulnerability. But the autofix aspect will actually be very specific: this is the specific solution to this vulnerability. So now, before you even merge your code into your production, main, or develop branches, you can get results and an answer back on how to resolve that vulnerability.

So you're finding that vulnerability and remediating it all within the pull request. And that's the power of AI in this case. So in this example here, it's asking to install the escape-html library and import that in, and that actually resolves your vulnerability fairly easily. And it could have saved me a couple minutes, a couple hours, a couple days, depending on how much I knew about this pull request, or how much I knew about the code, or how much I understood about this vulnerability to actually make that fix.
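The kind of fix suggested here comes down to escaping user input before it is written into HTML. The demo used a JavaScript library; the sketch below shows the same idea with Python's standard `html.escape` as a stand-in. The two render functions are hypothetical and are not the code from the demo.

```python
import html


def render_comment_unsafe(comment):
    """Vulnerable: user input is placed into the page as-is."""
    return "<p>" + comment + "</p>"


def render_comment_safe(comment):
    """Fixed: special characters are escaped before being placed into the page."""
    return "<p>" + html.escape(comment) + "</p>"


payload = "<script>alert(1)</script>"
print(render_comment_unsafe(payload))
print(render_comment_safe(payload))
```

In the safe version the browser shows the payload as text instead of running it as a script.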

In this case, it took me just reading through this. And obviously, I'm still the developer. I still want to do my analysis and understand if it's the right answer. But I can then decide to commit that fix. And as soon as I commit that fix, it will rerun all the scans.

So we can see if that vulnerability is actually remediated right off the bat. So that's our AI autofix. And I know we have only like one more minute left. But at the end of the day, what we really want to show is how AI can really improve that experience.

And this is just one example. There are more examples, like generating custom patterns for secrets with AI and detecting other types of secrets with AI. I can show all of that at the Microsoft booth if you want to stop by later on.

Awesome. Thank you. We'll see you next time.
